Delete duplicates where field1 and field2 are the identical - mysql

I have a table like
productId retailerId
1 2
1 2
1 4
1 6
1 8
1 8
2 3
2 6
2 6
Now, I need to remove the duplicates. I've figured out how to remove duplicates when one field is the same. But I need to remove the duplicates such as 1 2, 1 8 and 2 6, where both fields are identical.
Any help would be very gratefully received.

Use mysql's multiple-table DELETE syntax as follows:
delete mytable
from mytable
join mytable t
on t.productId = mytable.productId
and t.retailerId = mytable.retailerId
and t.id < mytable.id
See this running on SQLFiddle.
Note that I have assumed that you have an id column as well.
Edit:
Since there is no id column, there simplest approach is to copy the desired data to a temporary table, delete all data, then copy it back, as follows:
CREATE TEMPORARY TABLE temptable
SELECT DISTINCT productId, retailerId
FROM mytable;
DELEYE FROM mytable;
INSERT INTO mytable
SELECT *
FROM temptable;

Related

Mysql - Merge multiple columns into one while preserving each value

Is there a simple way in Mysql to convert a multi-column resultset into a single-column resultset where each row in the single column contains a single value from each cell in the multi-column resultset?
For instance, say I have a table like:
id | fk1 | fk2 | fk3
1 2 3 4
5 6 7 8
Ideally I'd like to be able to run a query along the lines of:
SELECT <some_function>(fk1, fk2, fk3) AS value FROM myTable;
...and then get an output like:
value
2
3
4
6
7
8
Is there a straightforward way of doing so, or is the only real option to walk the multi-column resultset in code and extract the values into the format that I want?
The end goal is to be able to use the first query as a subquery in a context where the input can only be a single column of values, like:
SELECT * FROM myOtherTable WHERE id IN
(SELECT <some_function>(fk1, fk2, fk3) AS value FROM myTable);
Thus a pure SQL solution is preferable, if one exists.
The simplest way is to use union all:
select fk1 from mytable union all
select fk2 from mytable union all
select fk3 from mytable;
There are other methods, but this is simplest.
For your particular query, you can use exists and or or in:
SELECT mo.*
FROM myOtherTable mo
WHERE EXISTS (SELECT 1
FROM MyTable m
WHERE mo.id IN (m.fk1, m.fk2, m.fk3)
);
EDIT:
For performance, but an index on each fk column and use multiple conditions in the where:
SELECT mo.*
FROM myOtherTable mo
WHERE EXISTS (SELECT 1 FROM MyTable m WHERE mo.id = m.fk1) OR
EXISTS (SELECT 1 FROM MyTable m WHERE mo.id = m.fk2) OR
EXISTS (SELECT 1 FROM MyTable m WHERE mo.id = m.fk3);
This will do index lookups, which should be much faster than the original form.
Consider you table name tab1.
Now you can do query like
select fk1 from (
(select fk1 from tab1)
union all
(select fk2 from tab1)) as rt
group by fk1;
This will combine your two column in to one. Contains will appear one after another.

Delete duplicates from db

I have table like following
id | a_id | b_id | success
--------------------------
1 34 43 1
2 34 84 1
3 34 43 0
4 65 43 1
5 65 84 1
6 93 23 0
7 93 23 0
I want delete duplicates with same a_id and b_id, but I want keep one record. If possible kept record should be with success=1. So in example table third and sixth/seventh record should be deleted. How to do this?
I'm using MySQL 5.1
The task is simple:
Find the minimum number of records that should not be deleted.
Delete the other records.
The Oracle way,
delete from sample_table where id not in(
select id from
(
Select id, success,row_number()
over (partition by a_id,b_id order by success desc) rown
from sample_table
)
where (success = 1 and rown = 1) or rown=1)
The solution in mysql:
Will give you the minimum ids that should not be deleted.:
Select id from (SELECT * FROM report ORDER BY success desc) t
group by t.a_id, t.b
o/p:
ID
1
2
4
5
6
You can delete the other rows.
delete from report where id not in (the above query)
The consolidated DML:
delete from report
where id not in (Select id
from (SELECT * FROM report
ORDER BY success desc) t
group by t.a_id, t.b_id)
Now doing a Select on report:
ID A_ID B_ID SUCCESS
1 34 43 1
2 34 84 1
4 65 43 1
5 65 84 1
6 93 23 0
You can check the documentation of how the group by clause works when no aggregation function is provided:
When using this feature, all rows in each group should have the same
values for the columns that are omitted from the GROUP BY part. The
server is free to return any value from the group, so the results are
indeterminate unless all values are the same.
So just performing an order by 'success before the group by would allow us to get the first duplicate row with success = 1.
How about this:
CREATE TABLE new_table
AS (SELECT * FROM old_table WHERE 1 AND success = 1 GROUP BY a_id,b_id);
DROP TABLE old_table;
RENAME TABLE new_table TO old_table;
This method will create a new table with a temporary name, and copy all the deduped rows which have success = 1 from the old table. The old table is then dropped and the new table is renamed to the name of the old table.
If I understand your question correctly, this is probably the simplest solution. (though I don't know if it's really efficient or not)
This should work:
If procedural programming is available to you like e.g. pl/sql it is fairly simple. If you on the other hand is looking for a clean SQL solution it might be possible but not very "nice". Below is an example in pl/sql:
begin
for x in ( select a_id, b_id
from table
having count(*) > 1
group by a_id, b_id )
loop
for y in ( select *
from table
where a_id = x.a_id
and b_id = x.b_id
order by success desc )
loop
delete from table
where a_id = y.a_id
and b_id = y.b_id
and id != x.id;
exit; // Only do the first row
end loop;
end loop;
end;
This is the idea: For each duplicated combination of a_id and b_id select all the instances ordered so that any with success=1 is up first. Delete all of that combination except the first - being the successful one if any.
or perhaps:
declare
l_a_id integer := -1;
l_b_id integer := -1;
begin
for x in ( select *
from table
order by a_id, b_id, success desc )
loop
if x.a_id = l_a_id and x.b_id = l_b_id
then
delete from table where id = x.id;
end if;
l_a_id := x.a_id;
l_b_id := x.b_id;
end loop;
end;
In MySQL, if you dont want to care about which record is maintained, a single alter table will work.
ALTER IGNORE TABLE tbl_name
ADD UNIQUE INDEX(a_id, b_id)
It ignores the duplicate records and maintain only the unique records.
A useful links :
MySQL: ALTER IGNORE TABLE ADD UNIQUE, what will be truncated?

Select a value from MySQL database only in case it exists only once

Lets say I have a MySQL table that has the following entries:
1
2
3
2
5
6
7
6
6
8
When I do an "SELECT * ..." I get back all the entries. But I want to get back only these entries, that exist only once within the table. Means the rows with the values 2 (exists two times) and 6 (exists three times) have to be dropped completely out of my result.
I found a keyword DISTINCT but as far as I understood it only avoids entries are shown twice, it does not filters them completely.
I think it can be done somehow with COUNT, but all I tried was not really successful. So what is the correct SQL statement here?
Edit: to clarify that, the result I want to get back is
1
3
5
7
8
You can use COUNT() in combination with a GROUP BY and a HAVING clause like this:
SELECT yourCol
FROM yourTable
GROUP BY yourCol
HAVING COUNT(*) < 2
Example fiddle.
You want to mix GROUP BY and COUNT().
Assuming the column is called 'id' and the table is called 'table', the following statement will work:
SELECT * FROM `table` GROUP BY id HAVING COUNT(id) = 1
This will filter out duplicate results entirely (e.g. it'll take out your 2's and 6's)
Three ways. One with GROUP BY and HAVING:
SELECT columnX
FROM tableX
GROUP BY columnX
HAVING COUNT(*) = 1 ;
one with a correlated NOT EXISTS subquery:
SELECT columnX
FROM tableX AS t
WHERE NOT EXISTS
( SELECT *
FROM tableX AS t2
WHERE t2.columnX = t.columnX
AND t2.pk <> t.pk -- pk is the primary key of the table
) ;
and an improvement on the first way (if you have a primary key pk column and an index on (columnX, pk):
SELECT columnX
FROM tableX
GROUP BY columnX
HAVING MIN(pk) = MAX(pk) ;
select id from foo group by id having count(*) < 2;

Adding one extra row to the result of MySQL select query

I have a MySQL table like this
id Name count
1 ABC 1
2 CDF 3
3 FGH 4
using simply select query I get the values as
1 ABC 1
2 CDF 3
3 FGH 4
How I can get the result like this
1 ABC 1
2 CDF 3
3 FGH 4
4 NULL 0
You can see Last row. When Records are finished an extra row in this format
last_id+1, Null ,0 should be added. You can see above. Even I have no such row in my original table. There may be N rows not fixed 3,4
The answer is very simple
select (select max(id) from mytable)+1 as id, NULL as Name, 0 as count union all select id,Name,count from mytable;
This looks a little messy but it should work.
SELECT a.id, b.name, coalesce(b.`count`) as `count`
FROM
(
SELECT 1 as ID
UNION
SELECT 2 as ID
UNION
SELECT 3 as ID
UNION
SELECT 4 as ID
) a LEFT JOIN table1 b
ON a.id = b.id
WHERE a.ID IN (1,2,3,4)
UPDATE 1
You could simply generate a table that have 1 column preferably with name (ID) that has records maybe up 10,000 or more. Then you could simply join it with your table that has the original record. For Example, assuming that you have a table named DummyRecord with 1 column and has 10,000 rows on it
SELECT a.id, b.name, coalesce(b.`count`) as `count`
FROM DummyRecord a LEFT JOIN table1 b
ON a.id = b.id
WHERE a.ID >= 1 AND
a.ID <= 4
that's it. Or if you want to have from 10 to 100, then you could use this condition
...
WHERE a.ID >= 10 AND
a.ID <= 100
To clarify this is how one can append an extra row to the result set
select * from table union select 123 as id,'abc' as name
results
id | name
------------
*** | ***
*** | ***
123 | abc
Simply use mysql ROLLUP.
SELECT * FROM your_table
GROUP BY Name WITH ROLLUP;
select
x.id,
t.name,
ifnull(t.count, 0) as count
from
(SELECT 1 AS id
-- Part of the query below, you will need to generate dynamically,
-- just as you would otherwise need to generate 'in (1,2,3,4)'
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
) x
LEFT JOIN YourTable t
ON t.id = x.id
If the id does not exist in the table you're selecting from, you'll need to LEFT JOIN against a list of every id you want returned - this way, it will return the null values for ones that don't exist and the true values for those that do.
I would suggest creating a numbers table that is a single-columned table filled with numbers:
CREATE TABLE `numbers` (
id int(11) unsigned NOT NULL
);
And then inserting a large amount of numbers, starting at 1 and going up to what you think the highest id you'll ever see plus a thousand or so. Maybe go from 1 to 1000000 to be on the safe side. Regardless, you just need to make sure it's more-than-high enough to cover any possible id you'll run into.
After that, your query can look like:
SELECT n.id, a.*
FROM
`numbers` n
LEFT JOIN table t
ON t.id = n.id
WHERE n.id IN (1,2,3,4);
This solution will allow for a dynamically growing list of ids without the need for a sub-query with a list of unions; though, the other solutions provided will equally work for a small known list too (and could also be dynamically generated).

Slow query on update using select count(*)

I have to count how many times a number from table2 occurs between the number in range table2.a and table2.b
i.e. we wanna know how many times we have this : a < start < b
I ran the following query :
UPDATE table2
SET occurrence =
(SELECT COUNT(*) FROM table1 WHERE start BETWEEN table2.a AND table2.b);
table2
ID a b occurrence
1 1 10
2 1 20
3 1 25
4 2 30
table1
ID start col1 col2 col3
1 1
2 7
3 10
4 21
5 25
6 27
7 30
table2 as
3 indexes on a, b and occurrence
1567 rows (so we will SELECT COUNT(*) over table2 1567 times..)
ID column as PK
table1 as
1 index on start
42,000,000 rows
Column start was "ordered by column start"
ID column as PK
==> it took 2.5hours to do 2/3 of it. I need to speed this up... any suggestions ? :)
You could try to add the id column to the index on table 1:
CREATE INDEX start_index ON table1 (start,id);
And rewrite the query to
UPDATE table2
SET occurrence =
(SELECT COUNT(id) FROM table1 WHERE start BETWEEN table2.a AND table2.b);
This is called "covering index": http://www.simple-talk.com/sql/learn-sql-server/using-covering-indexes-to-improve-query-performance/
-> The whole query on table 1 can be served through the data in the index -> no additional page lookup for the actual record.
Use a stored procedure. Keep the result from COUNT in a local variable, then use it to run the UPDATE query.
I will do this
// use one expensive join
create table tmp
select table2.id, count(*) as occurrence
from table1
inner join table1
on table1.start between table2.a and table2.b
group by table1.id;
update table2, tmp
set table2.occurrence=tmp.occurrence
where table2.id=tmp.id;
I think count(*) makes the database read the data rows when in your case it only needs to read the index. Try:
UPDATE table2
SET occurrence =
(SELECT COUNT(1) FROM table1 WHERE start BETWEEN table2.a AND table2.b);