How to keep from inserting duplicated rows into a table?

I have a table MY_TABLE that contains records as follows:
NAME     ID1  ID2  FLAG
JESSICA  12   34   TRUE
NULL     12   34   TRUE
I want to insert the values from the last 3 columns of MY_TABLE into another table, TEST, but I don't want to duplicate the rows.
I am trying to do:
INSERT ALL
WHEN ID1 IS NOT NULL AND FLAG THEN
INTO TEST VALUES (
ID1,
ID2,
FLAG
)
SELECT *
FROM MY_TABLE
LEFT JOIN TEMP ON ID1;
This is resulting in my table looking like:
ID1  ID2  FLAG
12   34   TRUE
12   34   TRUE
instead of:
ID1  ID2  FLAG
12   34   TRUE
The issue I am running into is that these duplicated values for these last 3 columns are resulting from my join and I can't select only the last 3 columns in my query because I need the first column for another table I am inserting into as well (not shown and also is the reason why I need to use INSERT ALL here). Is there a way to solve this duplicate rows issue within the INSERT itself?

You can project a column with a row number for each group of rows and use the row number to decide which row to use in your multi-table insert.
create or replace temp table T1 as
select
COLUMN1::string as "NAME",
COLUMN2::string as "ID1",
COLUMN3::string as "ID2",
COLUMN4::string as "FLAG"
from (values
('JESSICA','12','34','TRUE'),
('NULL','12','34','TRUE')
);
select NAME
,ID1
,ID2
,FLAG
,row_number() over (partition by ID1, ID2, FLAG order by NAME nulls last) ROWNUMBER
from t1
;
That select will produce a result set like this:
NAME     ID1  ID2  FLAG  ROWNUMBER
JESSICA  12   34   TRUE  1
NULL     12   34   TRUE  2
In your multi-table insert, you can then key off the ROWNUMBER column:
INSERT ALL
WHEN ID1 IS NOT NULL AND FLAG AND ROWNUMBER = 1 THEN
[etc., etc.]
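As a rough, runnable illustration of the row-number idea, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no INSERT ALL, so a plain INSERT ... SELECT stands in for the multi-table insert; the lowercase table names, the integer FLAG column, and the `name IS NULL` trick to emulate NULLS LAST are assumptions made for this example only. It needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (name TEXT, id1 INTEGER, id2 INTEGER, flag INTEGER);
    INSERT INTO my_table VALUES ('JESSICA', 12, 34, 1), (NULL, 12, 34, 1);
    CREATE TABLE test (id1 INTEGER, id2 INTEGER, flag INTEGER);
""")

# Number the rows inside each (id1, id2, flag) group and insert only row 1.
conn.execute("""
    INSERT INTO test (id1, id2, flag)
    SELECT id1, id2, flag
    FROM (
        SELECT id1, id2, flag,
               ROW_NUMBER() OVER (
                   PARTITION BY id1, id2, flag
                   ORDER BY name IS NULL, name   -- emulates NULLS LAST
               ) AS rownumber
        FROM my_table
    )
    WHERE id1 IS NOT NULL AND flag AND rownumber = 1
""")

rows = conn.execute("SELECT id1, id2, flag FROM test").fetchall()
print(rows)
```

Only one row per (id1, id2, flag) group reaches the target table, while the unfiltered result set is still available for any other insert branch that needs every NAME.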

Related

How to assign the previous value in a column for each record

I have a table in which the data looks like this:
Request Id Field Id Current Key
1213 11 1001
1213 12 1002
1213 12 103
1214 13 799
1214 13 899
1214 13 7
When the loop starts for the first Request ID, it should check all the Field IDs for that particular Request ID. The data should then look like this:
Request Id Field Id Previous Key Current Key
1213 11 null 1001
1213 12 null 1002
1213 12 1002 103
1214 13 null 799
1214 13 799 899
1214 13 899 7
When the very first record for a Field ID of a particular Request ID comes, it should take a null value in the Previous Key column, and the Current Key remains the same.
When the second record comes for the same Field ID, it should take the Current Key of the first record in the Previous Key column; when the third record comes, it should take the value of the second record, and so on.
When a new Field ID comes, the same thing should be repeated.
Please let me know if you need any more info. Your help is much needed.
You can check this.
declare @t table (Request_Id int, Field_Id int, Current_Key int)
insert into @t values (1213, 11, 1001),(1213, 12, 1002), (1213, 12, 103) , (1214, 13, 799), (1214, 13, 899), (1214, 13, 7)
;with cte
as (
select 0 rowno,0 Request_Id, 0 Field_Id, 0 Current_Key
union
select ROW_NUMBER() over(order by request_id) rowno, * from @t
)
select
t1.Request_Id , t1.Field_Id ,
case when t1.Request_Id = t2.Request_Id and t1.Field_Id = t2.Field_Id
then t2.Current_Key
else null
end previous_key
, t1.Current_Key
from cte t1, cte t2
where t1.rowno = t2.rowno + 1
Refer link when you want to compare row value
"When the second record will come for same field ID..."
Tables don't work this way: there is no way to tell that 1213,12,1002 is the "previous" record of 1213,12,103 as you assume in your example.
Do you have any data you can use to sort your records properly? Request id isn't enough because, even if you guarantee that it increments monotonically for each operation, each operation can include multiple values for the same item id which need to be sorted relative to each other.
In SQL Server 2008
You do not have the benefit of the LAG and LEAD functions. Instead, you must use a correlated subquery for the new column. Number the rows with a row_num column, using the same ordering in both places, then select the row with the greatest row_num that is less than the current row_num and has the same request_id and field_id.
select a.request_id,
a.field_id,
(select top 1 x.current_key
from (select t.*, row_number() over (order by t.request_id, t.field_id) as row_num from your_table t) x
where x.request_id = a.request_id
and x.field_id = a.field_id
and x.row_num < a.row_num
order by x.row_num desc -- closest earlier row in the same group
) as previous_key,
a.current_key
from (select t.*, row_number() over (order by t.request_id, t.field_id) as row_num from your_table t) a
-- Note: rows tied on (request_id, field_id) are numbered arbitrarily; add a real ordering column to the ORDER BY if one exists.
In SQL Server 2012+
You can use the LAG or LEAD functions with the OVER clause to get the value from the previous or next nth row:
select
Request_Id,
Field_Id,
lag(Current_Key,1) over (partition by Request_ID, Field_ID order by timestampColumn) as Previous_Key
,Current_Key
from your_table
Pay attention to how the rows are ordered. LAG returns the value from the previous row in the order given by the ORDER BY inside the OVER clause, and SQL Server requires that ORDER BY for LAG. The query above assumes you have another column to order by, such as a datetime (timestampColumn here); without one, "previous" is not well defined.
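To see the LAG approach on the question's data, here is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25+ supports LAG). The `requests` table name and the auto-incrementing `seq` column are assumptions added for this example; `seq` plays the role of the ordering column that the original table would need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- seq is a hypothetical ordering column; it records insertion order.
    CREATE TABLE requests (
        seq INTEGER PRIMARY KEY,
        request_id INTEGER,
        field_id INTEGER,
        current_key INTEGER
    );
    INSERT INTO requests (request_id, field_id, current_key) VALUES
        (1213, 11, 1001), (1213, 12, 1002), (1213, 12, 103),
        (1214, 13, 799), (1214, 13, 899), (1214, 13, 7);
""")

# LAG looks one row back within each (request_id, field_id) group,
# following the order defined by seq; the first row of a group gets NULL.
result = conn.execute("""
    SELECT request_id, field_id,
           LAG(current_key) OVER (
               PARTITION BY request_id, field_id ORDER BY seq
           ) AS previous_key,
           current_key
    FROM requests
    ORDER BY seq
""").fetchall()

for row in result:
    print(row)
```

The first row of every group comes back with `None` as the previous key, matching the layout the question asks for.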
Try this:
declare @tb table (RequestId int,FieldId int, CurrentKey int)
insert into @tb (RequestId,FieldId,CurrentKey) values
(1213,11,1001),
(1213,12,1002),
(1213,12,103),
(1214,13,799),
(1214,13,899),
(1214,13, 7)
select RequestId,t.FieldId,
case when t.FieldId=t1.FieldId then t1.CurrentKey end as PreviousKey,t.CurrentKey from
(select *, ROW_NUMBER() over (order by RequestId,FieldId) as rno
from @tb) t left join
(select FieldId,CurrentKey,
ROW_NUMBER() over (order by RequestId,FieldId) as rno from @tb) t1 on t.rno=t1.rno+1

Delete duplicates from db

I have a table like the following:
id | a_id | b_id | success
--------------------------
1 34 43 1
2 34 84 1
3 34 43 0
4 65 43 1
5 65 84 1
6 93 23 0
7 93 23 0
I want to delete duplicates with the same a_id and b_id, but keep one record. If possible, the kept record should be one with success=1. So in the example table, the third record and either the sixth or the seventh should be deleted. How can I do this?
I'm using MySQL 5.1.
The task is simple:
Find the minimum number of records that should not be deleted.
Delete the other records.
The Oracle way:
delete from sample_table where id not in(
select id from
(
Select id, success,row_number()
over (partition by a_id,b_id order by success desc) rown
from sample_table
)
where rown = 1)
The solution in MySQL:
This query gives you the ids of the rows that should not be deleted:
Select id from (SELECT * FROM report ORDER BY success desc) t
group by t.a_id, t.b_id
Output:
ID
1
2
4
5
6
You can delete the other rows.
delete from report where id not in (the above query)
The consolidated DML:
delete from report
where id not in (Select id
from (SELECT * FROM report
ORDER BY success desc) t
group by t.a_id, t.b_id)
Now doing a Select on report:
ID A_ID B_ID SUCCESS
1 34 43 1
2 34 84 1
4 65 43 1
5 65 84 1
6 93 23 0
You can check the documentation of how the group by clause works when no aggregation function is provided:
When using this feature, all rows in each group should have the same
values for the columns that are omitted from the GROUP BY part. The
server is free to return any value from the group, so the results are
indeterminate unless all values are the same.
So ordering by success (descending) before the GROUP BY lets us get, in practice, the duplicate row with success = 1 first, although the documentation quoted above makes no guarantee about which row the server picks.
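If relying on indeterminate GROUP BY behaviour feels too fragile, a deterministic alternative is to delete every row for which a "better" row exists in the same (a_id, b_id) group. This is a different technique from the one in the answer above, sketched here with Python's built-in sqlite3 module; the alias `better` and the tie-break on the lowest id are choices made for this example. MySQL does not allow a subquery on the table being deleted from in this form, so there it would need a multi-table DELETE or a derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report (id INTEGER PRIMARY KEY, a_id INTEGER, b_id INTEGER, success INTEGER);
    INSERT INTO report VALUES
        (1, 34, 43, 1), (2, 34, 84, 1), (3, 34, 43, 0), (4, 65, 43, 1),
        (5, 65, 84, 1), (6, 93, 23, 0), (7, 93, 23, 0);
""")

# A row is deleted when another row of the same group has a higher success
# value, or the same success value and a lower id. The best row of each
# group never matches, so exactly one row per group survives.
conn.execute("""
    DELETE FROM report
    WHERE EXISTS (
        SELECT 1 FROM report AS better
        WHERE better.a_id = report.a_id
          AND better.b_id = report.b_id
          AND (better.success > report.success
               OR (better.success = report.success AND better.id < report.id))
    )
""")

kept_ids = [row[0] for row in conn.execute("SELECT id FROM report ORDER BY id")]
print(kept_ids)
```

On the sample data this keeps the same ids as the output listed above (1, 2, 4, 5, 6), but the choice of surviving row no longer depends on how the server happens to pick a row per group.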
How about this:
CREATE TABLE new_table
AS (SELECT * FROM old_table WHERE 1 AND success = 1 GROUP BY a_id,b_id);
DROP TABLE old_table;
RENAME TABLE new_table TO old_table;
This method will create a new table with a temporary name, and copy all the deduped rows which have success = 1 from the old table. The old table is then dropped and the new table is renamed to the name of the old table.
If I understand your question correctly, this is probably the simplest solution. (though I don't know if it's really efficient or not)
This should work:
If procedural programming is available to you, e.g. PL/SQL, it is fairly simple. If, on the other hand, you are looking for a pure SQL solution, it might be possible but not very "nice". Below is an example in PL/SQL:
begin
for x in ( select a_id, b_id
from table
having count(*) > 1
group by a_id, b_id )
loop
for y in ( select *
from table
where a_id = x.a_id
and b_id = x.b_id
order by success desc )
loop
delete from table
where a_id = y.a_id
and b_id = y.b_id
and id != y.id;
exit; -- Only do the first row
end loop;
end loop;
end;
This is the idea: For each duplicated combination of a_id and b_id select all the instances ordered so that any with success=1 is up first. Delete all of that combination except the first - being the successful one if any.
or perhaps:
declare
l_a_id integer := -1;
l_b_id integer := -1;
begin
for x in ( select *
from table
order by a_id, b_id, success desc )
loop
if x.a_id = l_a_id and x.b_id = l_b_id
then
delete from table where id = x.id;
end if;
l_a_id := x.a_id;
l_b_id := x.b_id;
end loop;
end;
In MySQL, if you don't care which record is kept, a single ALTER TABLE will work.
ALTER IGNORE TABLE tbl_name
ADD UNIQUE INDEX(a_id, b_id)
It ignores the duplicate records and keeps only the unique ones.
A useful link:
MySQL: ALTER IGNORE TABLE ADD UNIQUE, what will be truncated?

Delete duplicates where field1 and field2 are identical

I have a table like
productId retailerId
1 2
1 2
1 4
1 6
1 8
1 8
2 3
2 6
2 6
Now, I need to remove the duplicates. I've figured out how to remove duplicates when one field is the same. But I need to remove the duplicates such as 1 2, 1 8 and 2 6, where both fields are identical.
Any help would be very gratefully received.
Use mysql's multiple-table DELETE syntax as follows:
delete mytable
from mytable
join mytable t
on t.productId = mytable.productId
and t.retailerId = mytable.retailerId
and t.id < mytable.id
See this running on SQLFiddle.
Note that I have assumed that you have an id column as well.
Edit:
Since there is no id column, the simplest approach is to copy the desired data to a temporary table, delete all data, then copy it back, as follows:
CREATE TEMPORARY TABLE temptable
SELECT DISTINCT productId, retailerId
FROM mytable;
DELETE FROM mytable;
INSERT INTO mytable
SELECT *
FROM temptable;
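The copy-out, delete, copy-back steps can be tried end to end with the question's data. This is a minimal sketch using Python's built-in sqlite3 module rather than MySQL; wrapping the three steps in a transaction is an addition made here so that a failure part-way through does not leave the table empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (productId INTEGER, retailerId INTEGER);
    INSERT INTO mytable VALUES
        (1, 2), (1, 2), (1, 4), (1, 6), (1, 8), (1, 8), (2, 3), (2, 6), (2, 6);

    BEGIN;
    -- 1. Copy one instance of every (productId, retailerId) pair.
    CREATE TEMP TABLE temptable AS
        SELECT DISTINCT productId, retailerId FROM mytable;
    -- 2. Empty the original table.
    DELETE FROM mytable;
    -- 3. Copy the deduplicated rows back.
    INSERT INTO mytable SELECT productId, retailerId FROM temptable;
    COMMIT;
""")

deduped = conn.execute(
    "SELECT productId, retailerId FROM mytable ORDER BY productId, retailerId"
).fetchall()
print(deduped)
```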

How to delete only duplicate rows?

How can I create an SQL command that deletes rows where two or more specific columns have the same values, keeping one row and removing only the duplicates?
For example:
Id value1 value2
1 71 5
2 8 8
3 8 8
4 8 8
5 23 26
Ids 2, 3 and 4 have the same value1 and value2.
I need to delete all but one of them, i.e. (Id 3 and Id 4) or (Id 2 and Id 4) or (Id 2 and Id 3).
delete t
from table1 t
inner join table1 t2
on t.id>t2.id and t.value1=t2.value1 and t.value2=t2.value2
Since MySQL allows ungrouped fields in queries:
CREATE TEMPORARY TABLE ids AS
(SELECT id
FROM your_table
GROUP BY value1, value2);
DELETE FROM your_table
WHERE id NOT IN (SELECT id FROM ids);
What you can do is copy the distinct records into a new table by:
select distinct * into NewTable from MyTable

Slow query on update using select count(*)

I have to count how many times a number from table1 (column start) falls within the range table2.a to table2.b,
i.e. we want to know how many times we have this: a < start < b
I ran the following query :
UPDATE table2
SET occurrence =
(SELECT COUNT(*) FROM table1 WHERE start BETWEEN table2.a AND table2.b);
table2
ID a b occurrence
1 1 10
2 1 20
3 1 25
4 2 30
table1
ID start col1 col2 col3
1 1
2 7
3 10
4 21
5 25
6 27
7 30
table2 has
3 indexes on a, b and occurrence
1567 rows (so we will SELECT COUNT(*) over table2 1567 times..)
ID column as PK
table1 has
1 index on start
42,000,000 rows
Column start was "ordered by column start"
ID column as PK
==> it took 2.5hours to do 2/3 of it. I need to speed this up... any suggestions ? :)
You could try to add the id column to the index on table 1:
CREATE INDEX start_index ON table1 (start,id);
And rewrite the query to
UPDATE table2
SET occurrence =
(SELECT COUNT(id) FROM table1 WHERE start BETWEEN table2.a AND table2.b);
This is called "covering index": http://www.simple-talk.com/sql/learn-sql-server/using-covering-indexes-to-improve-query-performance/
-> The whole query on table 1 can be served through the data in the index -> no additional page lookup for the actual record.
Use a stored procedure. Keep the result from COUNT in a local variable, then use it to run the UPDATE query.
I would do this:
-- use one expensive join
create table tmp
select table2.id, count(*) as occurrence
from table2
inner join table1
on table1.start between table2.a and table2.b
group by table2.id;
update table2, tmp
set table2.occurrence=tmp.occurrence
where table2.id=tmp.id;
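For a quick check of the join-then-update idea on the question's sample rows, here is a minimal sketch using Python's built-in sqlite3 module. It is not a benchmark and says nothing about performance on 42,000,000 rows. The LEFT JOIN with COUNT(table1.id) is a variation introduced here so that ranges with no matching start get 0 instead of being skipped, and the correlated UPDATE replaces MySQL's multi-table UPDATE syntax, which SQLite lacks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, start INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, occurrence INTEGER);
    INSERT INTO table1 (start) VALUES (1), (7), (10), (21), (25), (27), (30);
    INSERT INTO table2 (a, b) VALUES (1, 10), (1, 20), (1, 25), (2, 30);

    -- One join and one aggregation instead of one COUNT per table2 row.
    CREATE TEMP TABLE tmp AS
    SELECT table2.id AS id, COUNT(table1.id) AS occurrence
    FROM table2
    LEFT JOIN table1 ON table1.start BETWEEN table2.a AND table2.b
    GROUP BY table2.id;

    -- Copy the precomputed counts back into table2.
    UPDATE table2
    SET occurrence = (SELECT tmp.occurrence FROM tmp WHERE tmp.id = table2.id);
""")

counts = conn.execute("SELECT id, occurrence FROM table2 ORDER BY id").fetchall()
print(counts)
```

Note that BETWEEN is inclusive at both ends, so a start equal to a or b is counted; use `start > a AND start < b` if the strict comparison from the question is what you need.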
I think count(*) makes the database read the data rows when in your case it only needs to read the index. Try:
UPDATE table2
SET occurrence =
(SELECT COUNT(1) FROM table1 WHERE start BETWEEN table2.a AND table2.b);