SQL: Performing UPDATE IGNORE with duplicate removal - mysql

Let's assume I have the following table, where the PRIMARY KEY is (parent_id, child_id):
parent_id | child_id
----------+---------
1 | 2
1 | 3
I would like to perform a query that updates the rows and deletes any resulting duplicates, so that the following query:
UPDATE IGNORE table SET child_id = 2 WHERE child_id = 3;
would result in:
parent_id | child_id
----------+---------
1 | 2
I need some kind of general solution so that I can write more complex queries that achieve this.

If MySQL supported modifying a table from within a trigger defined on that same table, the solution would be very elegant:
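CREATE TRIGGER prevent_duplicate BEFORE UPDATE ON table FOR EACH ROW
BEGIN
-- hypothetical: MySQL refuses to run the DELETE when the trigger fires,
-- because a trigger may not modify the table that invoked it
IF (SELECT COUNT(*) FROM table WHERE (parent_id, child_id) = (NEW.parent_id, NEW.child_id)) > 0 THEN
DELETE FROM table WHERE (parent_id, child_id) = (OLD.parent_id, OLD.child_id);
END IF;
END;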
Since MySQL does not support this, you might need to change your application logic to achieve the same thing, or create a stored procedure for these updates.
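Without a trigger, one way to get the requested result is to let the update skip the rows that would collide and then delete whatever was skipped. In MySQL that would be UPDATE IGNORE followed by a DELETE of the rows still holding the old value. The sketch below only illustrates that idea with Python's built-in sqlite3 module, where SQLite spells the first statement UPDATE OR IGNORE; the table name relation and the helper repoint_child are hypothetical.

```python
import sqlite3

def repoint_child(conn, old_child, new_child):
    """Change child_id old_child -> new_child and drop rows that would collide.

    SQLite analogue of the MySQL pair:
        UPDATE IGNORE t SET child_id = new WHERE child_id = old;
        DELETE FROM t WHERE child_id = old;
    """
    if old_child == new_child:
        return  # nothing to do; the DELETE below would otherwise remove the rows
    with conn:  # both statements commit or roll back together
        # Rows whose new (parent_id, child_id) already exists are skipped, not updated.
        conn.execute(
            "UPDATE OR IGNORE relation SET child_id = ? WHERE child_id = ?",
            (new_child, old_child),
        )
        # Anything still pointing at old_child was a would-be duplicate: remove it.
        conn.execute("DELETE FROM relation WHERE child_id = ?", (old_child,))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relation (parent_id INTEGER NOT NULL, child_id INTEGER NOT NULL,"
    " PRIMARY KEY (parent_id, child_id))"
)
conn.executemany("INSERT INTO relation VALUES (?, ?)", [(1, 2), (1, 3), (5, 3)])
repoint_child(conn, old_child=3, new_child=2)
rows = conn.execute("SELECT parent_id, child_id FROM relation ORDER BY 1, 2").fetchall()
print(rows)  # [(1, 2), (5, 2)]
```

Row (1, 3) collides with (1, 2) and is removed, while (5, 3) has no counterpart and is simply updated.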

Related

How to detect circular dependency in mysql [duplicate]

So in my situation I have three tables: list, item and list_relation.
Each item will be linked to a list through the list_id foreign key.
The list_relation table looks like this:
CREATE TABLE list_relation
(
parent_id INT UNSIGNED NOT NULL,
child_id INT UNSIGNED NOT NULL,
UNIQUE(parent_id, child_id),
FOREIGN KEY (parent_id)
REFERENCES list (id)
ON DELETE CASCADE,
FOREIGN KEY (child_id)
REFERENCES list (id)
ON DELETE CASCADE
);
I want to be able to inherit from multiple lists as well (which includes the related items).
For example, I have lists 1, 2 and 3.
Is there any SQL way to prevent a circular relation? For example:
List 1 inherits from List 3, List 2 inherits from List 1, List 3 inherits from List 1.
1 -> 2 -> 3 -> 1
My current idea is to validate the desired inheritance first, to find out whether it would be circular, and only then insert it into the DB.
If you use MySQL 8.0 or MariaDB 10.2 (or higher) you can try recursive CTEs (common table expressions).
Assuming the following schema and data:
CREATE TABLE `list_relation` (
`child_id` int unsigned NOT NULL,
`parent_id` int unsigned NOT NULL,
PRIMARY KEY (`child_id`,`parent_id`)
);
insert into list_relation (child_id, parent_id) values
(2,1),
(3,1),
(4,2),
(4,3),
(5,3);
Now you try to insert a new row with child_id = 1 and parent_id = 4. But that would create cyclic relations (1->4->2->1 and 1->4->3->1), which you want to prevent. To find out if a reverse relation already exists, you can use the following query, which will show all parents of list 4 (including inherited/transitive parents):
set @new_child_id = 1;
set @new_parent_id = 4;
with recursive rcte as (
select *
from list_relation r
where r.child_id = @new_parent_id
union all
select r.*
from rcte
join list_relation r on r.child_id = rcte.parent_id
)
select * from rcte
The result would be:
child_id | parent_id
4 | 2
4 | 3
2 | 1
3 | 1
You can see in the result that list 1 is one of the parents of list 4, so you wouldn't insert the new record.
Since you only want to know whether list 1 is in the result, you can change the last line to
select * from rcte where parent_id = @new_child_id limit 1
or to
select exists (select * from rcte where parent_id = @new_child_id)
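For illustration only, here is the same check as a small helper run against SQLite through Python's sqlite3 module (SQLite also supports WITH RECURSIVE, so it stands in for MySQL 8.0 here). As an assumption of this sketch, it walks node ids with UNION instead of relation rows with UNION ALL, which keeps it from looping forever if the table already contains a cycle; would_create_cycle is a hypothetical name.

```python
import sqlite3

# Ancestors of the proposed parent, including the parent itself.
# UNION (not UNION ALL) drops repeats, so the walk terminates even on a cyclic graph.
CYCLE_SQL = """
WITH RECURSIVE ancestors(id) AS (
    SELECT ?
    UNION
    SELECT r.parent_id
    FROM list_relation AS r
    JOIN ancestors AS a ON r.child_id = a.id
)
SELECT EXISTS (SELECT 1 FROM ancestors WHERE id = ?)
"""

def would_create_cycle(conn, new_child_id, new_parent_id):
    """True if linking new_child_id under new_parent_id would close a loop."""
    (found,) = conn.execute(CYCLE_SQL, (new_parent_id, new_child_id)).fetchone()
    return bool(found)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE list_relation (child_id INTEGER NOT NULL, parent_id INTEGER NOT NULL,"
    " PRIMARY KEY (child_id, parent_id))"
)
conn.executemany(
    "INSERT INTO list_relation (child_id, parent_id) VALUES (?, ?)",
    [(2, 1), (3, 1), (4, 2), (4, 3), (5, 3)],
)
print(would_create_cycle(conn, 1, 4))  # True: 1 -> 4 -> 2 -> 1
print(would_create_cycle(conn, 5, 4))  # False: 5 is not an ancestor of 4
```

Because the anchor row is the parent itself, a self-link such as (4, 4) is reported as a cycle too.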
BTW: You can use the same query to prevent redundant relations.
Assume you want to insert the record with child_id = 4 and parent_id = 1. This would be redundant, since list 4 already inherits list 1 via list 2 and list 3. The following query shows that:
set @new_child_id = 4;
set @new_parent_id = 1;
with recursive rcte as (
select *
from list_relation r
where r.child_id = @new_child_id
union all
select r.*
from rcte
join list_relation r on r.child_id = rcte.parent_id
)
select exists (select * from rcte where parent_id = @new_parent_id)
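The redundancy check can be sketched the same way in SQLite via Python's sqlite3 module. The hypothetical is_redundant helper starts from the proposed child's direct parents, so the child itself is not counted as one of its own ancestors.

```python
import sqlite3

# Strict ancestors of the proposed child (the child itself is not included).
REDUNDANT_SQL = """
WITH RECURSIVE ancestors(id) AS (
    SELECT parent_id FROM list_relation WHERE child_id = ?
    UNION
    SELECT r.parent_id
    FROM list_relation AS r
    JOIN ancestors AS a ON r.child_id = a.id
)
SELECT EXISTS (SELECT 1 FROM ancestors WHERE id = ?)
"""

def is_redundant(conn, new_child_id, new_parent_id):
    """True if new_child_id already inherits new_parent_id, directly or transitively."""
    (found,) = conn.execute(REDUNDANT_SQL, (new_child_id, new_parent_id)).fetchone()
    return bool(found)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE list_relation (child_id INTEGER NOT NULL, parent_id INTEGER NOT NULL,"
    " PRIMARY KEY (child_id, parent_id))"
)
conn.executemany(
    "INSERT INTO list_relation (child_id, parent_id) VALUES (?, ?)",
    [(2, 1), (3, 1), (4, 2), (4, 3), (5, 3)],
)
print(is_redundant(conn, 4, 1))  # True: 4 already inherits 1 via 2 and 3
print(is_redundant(conn, 5, 2))  # False: 5 only inherits 3 and 1
```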
And you can use a similar query to get all inherited items:
set @list = 4;
with recursive rcte (list_id) as (
select @list
union distinct
select r.parent_id
from rcte
join list_relation r on r.child_id = rcte.list_id
)
select distinct i.*
from rcte
join item i on i.list_id = rcte.list_id
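A self-contained sketch of that item query, again in SQLite through Python's sqlite3 module. The item table's columns (id, list_id, name) and all rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE list_relation (child_id INTEGER NOT NULL, parent_id INTEGER NOT NULL,
                                PRIMARY KEY (child_id, parent_id));
    CREATE TABLE item (id INTEGER PRIMARY KEY, list_id INTEGER NOT NULL, name TEXT);
    INSERT INTO list_relation VALUES (2, 1), (3, 1), (4, 2), (4, 3), (5, 3);
    INSERT INTO item VALUES (1, 1, 'a'), (2, 2, 'b'), (3, 3, 'c'), (4, 4, 'd'), (5, 5, 'e');
    """
)

INHERITED_ITEMS_SQL = """
WITH RECURSIVE lists(list_id) AS (
    SELECT ?                    -- the list itself
    UNION                       -- drops repeats (list 1 is reached via 2 and via 3)
    SELECT r.parent_id
    FROM list_relation AS r
    JOIN lists AS l ON r.child_id = l.list_id
)
SELECT DISTINCT i.id, i.name
FROM lists
JOIN item AS i ON i.list_id = lists.list_id
ORDER BY i.id
"""

items = conn.execute(INHERITED_ITEMS_SQL, (4,)).fetchall()
print(items)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
```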
For those who do not have MySQL 8.0 or MariaDB and would like to use a recursive method in MySQL 5.7: I just hope you don't need to exceed the maximum recursion depth of 255 (see the manual) :)
MySQL does not allow recursive stored functions, but it does allow recursive stored procedures. By combining the two, you get a nice little function that you can use in any SELECT statement.
The recursive SP takes two input parameters and one output parameter. The first input is the ID you are searching the node tree for; the second is used by the SP to preserve results during execution. The third is the output parameter, which carries the end result.
CREATE DEFINER=`root`@`localhost` PROCEDURE `sp_list_relation_recursive`(
in itemId text,
in iPreserve text,
out oResult text
)
BEGIN
DECLARE ChildId text default null;
IF (coalesce(itemId,'') = '') then
-- when no id is received, return whatever we have in the preserve container
set oResult = iPreserve;
ELSE
-- add the received id to the preserving container
SET iPreserve = concat_ws(',',iPreserve,itemId);
SET oResult = iPreserve;
SET ChildId =
(
coalesce(
(
Select
group_concat(TNode.child_id separator ',') -- get all children
from
list_relation as TNode
WHERE
not find_in_set(TNode.child_id, iPreserve) -- if we don't already have'em
AND find_in_set(TNode.parent_id, itemId) -- from these parents
)
,'')
);
IF length(ChildId) >0 THEN
-- one or more child found, recursively search again for further child elements
CALL sp_list_relation_recursive(ChildId,iPreserve,oResult);
END IF;
END IF;
-- uncomment this to see the progress looping steps
-- select ChildId,iPreserve,oResult;
END
Test it:
SET MAX_SP_RECURSION_DEPTH = 250;
set @list = '';
call test.sp_list_relation_recursive(1,'',@list);
select @list;
+----------------+
| @list |
+----------------+
| ,1,2,3,6,4,4,5 |
+----------------+
Don't mind the duplicate entries or the leading comma; we just want to know whether an element exists in the node tree without lots of ifs and whens.
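To make the procedure's logic easier to follow, here is a rough Python port of the same frontier-expansion loop, run against SQLite purely for illustration; descendants_csv and find_in_set are hypothetical names. Like the procedure, it appends newly found children to a comma-separated string and stops when a round finds nothing new, so a node reached from two parents in the same round (such as 4 via 2 and 3) can appear twice.

```python
import sqlite3

def descendants_csv(conn, node_id):
    """Rough port of sp_list_relation_recursive: node_id plus all its descendants."""
    preserve = [str(node_id)]  # plays the role of iPreserve
    frontier = [node_id]       # plays the role of itemId / ChildId
    while frontier:
        marks = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT child_id FROM list_relation WHERE parent_id IN ({marks})",
            frontier,
        ).fetchall()
        collected = set(preserve)
        # Keep only children not collected in an earlier round, like the
        # NOT FIND_IN_SET(...) filter in the procedure.
        frontier = [child for (child,) in rows if str(child) not in collected]
        preserve.extend(str(child) for child in frontier)
    # concat_ws(',', '', ...) in the procedure leaves a leading comma; keep it.
    return "," + ",".join(preserve)

def find_in_set(needle, csv):
    """Python stand-in for a FIND_IN_SET() membership test on a comma list."""
    return str(needle) in csv.split(",")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list_relation (child_id INTEGER, parent_id INTEGER)")
conn.executemany(
    "INSERT INTO list_relation VALUES (?, ?)",
    [(1, 7), (2, 1), (3, 1), (4, 2), (4, 3), (5, 3), (6, 1), (51, 50)],
)
tree = descendants_csv(conn, 1)
print(tree)
```

The order within one round depends on the order in which the database returns rows, which is why only membership is meaningful here.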
Looks fine so far, but an SP can't be used in a SELECT statement, so we simply create a wrapper function for it.
CREATE DEFINER=`root`@`localhost` FUNCTION `fn_list_relation_recursive`(
NodeId int
) RETURNS text CHARSET utf8
READS SQL DATA
DETERMINISTIC
BEGIN
/*
Returns a tree of nodes
branches out all possible branches
*/
DECLARE mTree mediumtext;
SET MAX_SP_RECURSION_DEPTH = 250;
call sp_list_relation_recursive(NodeId,'',mTree);
RETURN mTree;
END
Now check it in action:
SELECT
*,
FN_LIST_RELATION_RECURSIVE(parent_id) AS parents_children
FROM
list_relation;
+----------+-----------+------------------+
| child_id | parent_id | parents_children |
+----------+-----------+------------------+
| 1 | 7 | ,7,1,2,3,6,4,4,5 |
| 2 | 1 | ,1,2,3,6,4,4,5 |
| 3 | 1 | ,1,2,3,6,4,4,5 |
| 4 | 2 | ,2,4 |
| 4 | 3 | ,3,4,5 |
| 5 | 3 | ,3,4,5 |
| 6 | 1 | ,1,2,3,6,4,4,5 |
| 51 | 50 | ,50,51 |
+----------+-----------+------------------+
Your inserts will look like this:
insert into list_relation (child_id,parent_id)
select
-- child, parent
1,6
from dual
where
-- parent must not be found in the child's children node
not find_in_set(6,fn_list_relation_recursive(1));
1,6 should add 0 records. However 1,7 should work.
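The guarded insert can also be written as one INSERT ... SELECT whose NOT EXISTS subquery walks the child's descendants with a recursive CTE instead of calling a stored function. This is a sketch in SQLite via Python's sqlite3 module, not a port of the procedure above. insert_if_acyclic is hypothetical, and the sample rows are the answer's data without the (1, 7) row, so that both cases from the text can be tried.

```python
import sqlite3

# Insert (child, parent) only if parent is neither the child itself nor one of
# its descendants; the CTE inside NOT EXISTS walks down from the child.
GUARDED_INSERT_SQL = """
INSERT INTO list_relation (child_id, parent_id)
SELECT ?, ?
WHERE NOT EXISTS (
    WITH RECURSIVE descendants(id) AS (
        SELECT ?
        UNION
        SELECT r.child_id
        FROM list_relation AS r
        JOIN descendants AS d ON r.parent_id = d.id
    )
    SELECT 1 FROM descendants WHERE id = ?
)
"""

def insert_if_acyclic(conn, child_id, parent_id):
    """Return the number of rows inserted (0 when the link would form a cycle)."""
    with conn:
        cur = conn.execute(GUARDED_INSERT_SQL, (child_id, parent_id, child_id, parent_id))
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE list_relation (child_id INTEGER NOT NULL, parent_id INTEGER NOT NULL,"
    " PRIMARY KEY (child_id, parent_id))"
)
conn.executemany(
    "INSERT INTO list_relation (child_id, parent_id) VALUES (?, ?)",
    [(2, 1), (3, 1), (4, 2), (4, 3), (5, 3), (6, 1)],
)
first = insert_if_acyclic(conn, 1, 6)   # 6 is below 1, so nothing is inserted
second = insert_if_acyclic(conn, 1, 7)  # 7 is unrelated, so the row goes in
print(first, second)  # 0 1
```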
As always, I'm just proving the concept. You are more than welcome to tweak the SP to return a parent's children node or a child's parent node, to have a separate SP for each node tree, or even to combine them so that a single ID returns all parents and children.
Try it, it's not that hard :)
Q: [is there] any SQL way to prevent a circular relation
A: SHORT ANSWER
There's no declarative constraint that would prevent an INSERT or UPDATE from creating a circular relation (as described in the question.)
But a combination of a BEFORE INSERT and BEFORE UPDATE trigger could prevent it, using queries and/or procedural logic to detect that successful completion of the INSERT or UPDATE would cause a circular relation.
When such a condition is detected, the triggers would need to raise an error to prevent the INSERT/UPDATE operation from completing.
Wouldn't it be better to put a parent_id column inside the list table?
Then you can get the list tree with a query that LEFT JOINs the list table to itself, matching parent_id with list_id, e.g.:
SELECT t1.list_id, t2.list_id, t3.list_id
FROM list AS t1
LEFT JOIN list as t2 ON t2.parent_id = t1.list_id
LEFT JOIN list as t3 ON t3.parent_id = t2.list_id
WHERE t1.list_id = #your_list_id#
Is it a solution to your case?
Anyway, I suggest you read about managing hierarchical data in MySQL; you can find a lot about this issue!
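A small SQLite sketch of that fixed-depth join, using Python's sqlite3 module and made-up sample rows. Each extra level of depth needs one more LEFT JOIN, and a single parent_id column gives every list at most one parent, so this shape does not by itself cover the multiple inheritance described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list (list_id INTEGER PRIMARY KEY, parent_id INTEGER)")
# Hypothetical tree: 1 is the root, 2 and 4 hang below 1, 3 hangs below 2.
conn.executemany("INSERT INTO list VALUES (?, ?)", [(1, None), (2, 1), (3, 2), (4, 1)])
rows = conn.execute(
    """
    SELECT t1.list_id, t2.list_id, t3.list_id
    FROM list AS t1
    LEFT JOIN list AS t2 ON t2.parent_id = t1.list_id
    LEFT JOIN list AS t3 ON t3.parent_id = t2.list_id
    WHERE t1.list_id = ?
    ORDER BY 2, 3
    """,
    (1,),
).fetchall()
print(rows)  # [(1, 2, 3), (1, 4, None)]
```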
Do you mind adding an additional table?
An efficient SQL way to do this is to create an additional table that contains ALL parents (direct and inherited) of every child, and then, before the inheritance is established, check whether the potential child already exists in the parent list of the potential parent.
The parent_list table would be something like this:
CREATE TABLE parent_list (
list_id INT UNSIGNED NOT NULL,
parent_list_id INT UNSIGNED NOT NULL,
PRIMARY KEY (list_id, parent_list_id)
);
Now, let's start at the very beginning.
2 inherits from 1 and 4.
parent_list is empty, which means both 1 and 4 have no parents. So it's fine in this case.
After this step, parent_list should be:
list_id, parent_list_id
2, 1
2, 4
3 inherits from 2.
2 has two parents, 1 and 4. 3 isn't one of them, so it's fine again.
Now parent_list becomes (note that 2's parents are also 3's parents):
list_id, parent_list_id
2, 1
2, 4
3, 1
3, 4
3, 2
4 inherits from 3.
4 exists in 3's parent list. This will lead to a cycle. NO WAY!
To check whether a cycle would occur, you just need one simple SQL query:
SELECT * FROM parent_list
WHERE list_id = potential_parent_id AND parent_list_id = potential_child_id;
Want to do all these things with one call? Apply a stored procedure:
CREATE PROCEDURE `inherit`(
IN in_parent_id INT UNSIGNED,
IN in_child_id INT UNSIGNED
)
BEGIN
DECLARE result INT DEFAULT 0;
DECLARE EXIT HANDLER FOR SQLEXCEPTION
BEGIN
ROLLBACK;
SELECT -1;
END;
START TRANSACTION;
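IF in_parent_id = in_child_id OR EXISTS(SELECT * FROM parent_list WHERE list_id = in_parent_id AND parent_list_id = in_child_id) THEN
SET result = 1; -- just some error code: the new link would create a cycle
ELSE
-- do your inserting here
-- update parent_list; IGNORE skips ancestors the child already has
-- (e.g. a shared grandparent reached through a second parent)
-- note: existing descendants of in_child_id are not updated, so add links top-down
INSERT IGNORE INTO parent_list SELECT in_child_id, parent_list_id FROM parent_list WHERE list_id = in_parent_id;
INSERT IGNORE INTO parent_list VALUES (in_child_id, in_parent_id);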
END IF;
COMMIT;
SELECT result;
END
For multiple inheritance, just call inherit multiple times.
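A Python/SQLite sketch of the same closure-table idea, using Python's sqlite3 module (table and column names follow the answer; inherit and make_db here are hypothetical helpers, not the procedure above). One deliberate difference: it adds the parent and the parent's ancestors to the child and to every existing descendant of the child, so links can be added in any order rather than only top-down.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE parent_list (list_id INTEGER NOT NULL,"
        " parent_list_id INTEGER NOT NULL, PRIMARY KEY (list_id, parent_list_id))"
    )
    return conn

def inherit(conn, parent_id, child_id):
    """Make child_id inherit from parent_id; return False if that would form a cycle."""
    with conn:
        (cyclic,) = conn.execute(
            "SELECT ? = ? OR EXISTS (SELECT 1 FROM parent_list"
            " WHERE list_id = ? AND parent_list_id = ?)",
            (parent_id, child_id, parent_id, child_id),
        ).fetchone()
        if cyclic:
            return False
        # The child and all of its descendants gain the parent and all of its
        # ancestors; OR IGNORE skips pairs that are already recorded.
        conn.execute(
            """
            INSERT OR IGNORE INTO parent_list (list_id, parent_list_id)
            SELECT d.id, a.id
            FROM (SELECT ? AS id
                  UNION SELECT list_id FROM parent_list WHERE parent_list_id = ?) AS d,
                 (SELECT ? AS id
                  UNION SELECT parent_list_id FROM parent_list WHERE list_id = ?) AS a
            """,
            (child_id, child_id, parent_id, parent_id),
        )
    return True

conn = make_db()
# Walkthrough: 2 inherits 1 and 4, 3 inherits 2, then 4 inheriting 3 is rejected.
results = [inherit(conn, 1, 2), inherit(conn, 4, 2), inherit(conn, 2, 3), inherit(conn, 3, 4)]
pairs = conn.execute("SELECT list_id, parent_list_id FROM parent_list ORDER BY 1, 2").fetchall()
print(results)  # [True, True, True, False]
print(pairs)    # [(2, 1), (2, 4), (3, 1), (3, 2), (3, 4)]
```

This is a single-connection sketch: the cycle check and the insert are not protected against a concurrent writer.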
In the example you provide, the errant relationship is simple: it's the 3 -> 1 and 1 -> 3 pair. You could simply look for the inverse relationship when inserting a new row. If it exists, don't insert the new row.
On the other hand, if you are looking at existing rows, you could identify the errant rows using a simple SQL statement like:
SELECT
a.parent_id,
a.child_id
FROM list_relation a
JOIN list_relation b
ON a.child_id = b.parent_id AND a.parent_id = b.child_id
If you add an auto-incrementing column, you could then identify the offending rows specifically.
Your question title includes the word "prevent", so I presume you want to avoid adding the rows. To do so, you would need a BEFORE INSERT trigger that checks for an existing inverse row and prevents the insert. You could also use a BEFORE UPDATE trigger to prevent existing rows from being changed to values that would be a problem.
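A sketch of such a trigger in SQLite, run through Python's sqlite3 module. In MySQL the trigger body would raise the error with SIGNAL SQLSTATE '45000' instead of SQLite's RAISE(ABORT, ...). Like the check described above, it only catches direct two-node loops such as 1 -> 3 -> 1, not longer cycles; the trigger name no_inverse_relation is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE list_relation (
        parent_id INTEGER NOT NULL,
        child_id  INTEGER NOT NULL,
        UNIQUE (parent_id, child_id)
    );
    -- Refuse a row whose mirror image (child -> parent) is already stored.
    CREATE TRIGGER no_inverse_relation
    BEFORE INSERT ON list_relation
    WHEN EXISTS (SELECT 1 FROM list_relation
                 WHERE parent_id = NEW.child_id AND child_id = NEW.parent_id)
    BEGIN
        SELECT RAISE(ABORT, 'inverse relation already exists');
    END;
    """
)
conn.execute("INSERT INTO list_relation (parent_id, child_id) VALUES (1, 3)")
rejected = False
try:
    conn.execute("INSERT INTO list_relation (parent_id, child_id) VALUES (3, 1)")
except sqlite3.IntegrityError as exc:  # RAISE(ABORT, ...) surfaces as a constraint error
    rejected = True
    print("rejected:", exc)

# The self-join from the answer: mirror-image pairs already stored (each shows up twice).
pairs = conn.execute(
    "SELECT a.parent_id, a.child_id FROM list_relation a"
    " JOIN list_relation b ON a.child_id = b.parent_id AND a.parent_id = b.child_id"
).fetchall()
print(pairs)  # [] because the trigger kept the inverse row out
```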

MySQL insert on duplicate update for non-PRIMARY key

I am a little confused by the INSERT ... ON DUPLICATE KEY UPDATE query.
I have a MySQL table with a structure like this:
record_id (PRIMARY, UNIQUE)
person_id (UNIQUE)
some_text
some_other_text
I want to update the some_text and some_other_text values for a person if their id exists in my table.person, or insert a new record into this table otherwise. How can this be done if person_id is not the PRIMARY key?
You need a query that checks whether a row with your record_id (or person_id) exists. If it does, update it; otherwise insert a new row. In MySQL, IF ... END IF is only allowed inside a stored program, for example:
IF EXISTS (SELECT * FROM table.person WHERE record_id = 'SomeValue') THEN
UPDATE table.person
SET some_text = 'new_some_text', some_other_text = 'new_some_other_text'
WHERE record_id = 'SomeValue';
ELSE
INSERT INTO table.person (record_id, person_id, some_text, some_other_text)
VALUES ('new_record_id', 'new_person_id', 'new_some_text', 'new_some_other_text');
END IF;
A better approach is:
UPDATE table.person SET (...) WHERE person_id = 'SomeValue';
IF ROW_COUNT() = 0 THEN
INSERT INTO table.person (...) VALUES (...);
END IF;
Your question is very valid. This is a very common requirement. And most people get it wrong, due to what MySQL offers.
The requirement: Insert unless the PRIMARY key exists, otherwise update.
The common approach: ON DUPLICATE KEY UPDATE
The result of that approach, disturbingly: Insert unless the PRIMARY or any UNIQUE key exists, otherwise update!
What can go horribly wrong with ON DUPLICATE KEY UPDATE? You insert a supposedly new record, with a new PRIMARY key value (say a UUID), but you happen to have a duplicate value for its UNIQUE key.
What you want is a proper exception, indicating that you are trying to insert a duplicate into a UNIQUE column.
But what you get is an unwanted UPDATE! MySQL will take the conflicting record and start overwriting its values. If this happens unintentionally, you have mutilated an old record, and any incoming references to the old record are now referencing the new record. And since you probably won't tell the query to update the PRIMARY column, your new UUID is nowhere to be found. If you ever encounter this data, it will probably make no sense and you will have no idea where it came from.
We need a solution to actually insert unless the PRIMARY key exists, otherwise update.
We will use a query that consists of two statements:
Update where the PRIMARY key value matches (affects 0 or 1 rows).
Insert if the PRIMARY key value does not exist (inserts 1 or 0 rows).
This is the query:
UPDATE my_table SET
unique_name = 'one', update_datetime = NOW()
WHERE id = 1;
INSERT INTO my_table
SELECT 1, 'one', NOW()
FROM my_table
WHERE id = 1
HAVING COUNT(*) = 0;
Only one of these queries will have an effect. The UPDATE is easy. As for the INSERT: WHERE id = 1 results in a row if the id exists, or no row if it does not. HAVING COUNT(*) = 0 inverts that, resulting in a row if the id is new, or no row if it already exists.
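A runnable sketch of the same two-statement pattern in SQLite via Python's sqlite3 module. As an assumption for portability, the HAVING COUNT(*) = 0 trick is rewritten as a one-row derived table filtered by WHERE c.n = 0, because older SQLite versions reject HAVING without GROUP BY; the column layout mirrors my_table above, and upsert_by_primary_key is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, unique_name TEXT UNIQUE,"
    " update_datetime TEXT)"
)

def upsert_by_primary_key(conn, row_id, unique_name):
    """Update the row with this PRIMARY KEY, or insert it if the id is absent."""
    with conn:  # both statements commit or roll back together
        conn.execute(
            "UPDATE my_table SET unique_name = ?, update_datetime = datetime('now')"
            " WHERE id = ?",
            (unique_name, row_id),
        )
        # The derived table always has exactly one row holding the count;
        # WHERE c.n = 0 keeps it, and so inserts, only when the id is absent.
        conn.execute(
            "INSERT INTO my_table (id, unique_name, update_datetime)"
            " SELECT ?, ?, datetime('now')"
            " FROM (SELECT COUNT(*) AS n FROM my_table WHERE id = ?) AS c"
            " WHERE c.n = 0",
            (row_id, unique_name, row_id),
        )

upsert_by_primary_key(conn, 1, "one")  # id 1 is new: INSERT
upsert_by_primary_key(conn, 1, "uno")  # id 1 exists: UPDATE
try:
    upsert_by_primary_key(conn, 2, "uno")  # new id, but a duplicate UNIQUE value
except sqlite3.IntegrityError as exc:
    print("rejected instead of overwriting id 1:", exc)
rows = conn.execute("SELECT id, unique_name FROM my_table").fetchall()
print(rows)  # [(1, 'uno')]
```

The duplicate UNIQUE value produces an error instead of silently rewriting the row with id 1, which is the behavior the answer argues for.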
I have explored other variants of the same idea, such as with a LEFT JOIN and WHERE, but they all looked more convoluted. Improvements are welcome.
13.2.5.3 INSERT ... ON DUPLICATE KEY UPDATE Syntax
If you specify ON DUPLICATE KEY UPDATE, and a row is inserted that
would cause a duplicate value in a UNIQUE index or PRIMARY KEY, MySQL
performs an UPDATE of the old row.
Example:
DELIMITER //
DROP PROCEDURE IF EXISTS `sp_upsert`//
DROP TABLE IF EXISTS `table_test`//
CREATE TABLE `table_test` (
`record_id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
`person_id` INT UNSIGNED NOT NULL,
`some_text` VARCHAR(50),
`some_other_text` VARCHAR(50),
UNIQUE KEY `record_id_index` (`record_id`),
UNIQUE KEY `person_id_index` (`person_id`)
)//
INSERT INTO `table_test`
(`person_id`, `some_text`, `some_other_text`)
VALUES
(1, 'AAA', 'XXX'),
(2, 'BBB', 'YYY'),
(3, 'CCC', 'ZZZ')//
CREATE PROCEDURE `sp_upsert`(
`p_person_id` INT UNSIGNED,
`p_some_text` VARCHAR(50),
`p_some_other_text` VARCHAR(50)
)
BEGIN
INSERT INTO `table_test`
(`person_id`, `some_text`, `some_other_text`)
VALUES
(`p_person_id`, `p_some_text`, `p_some_other_text`)
ON DUPLICATE KEY UPDATE `some_text` = `p_some_text`,
`some_other_text` = `p_some_other_text`;
END//
DELIMITER ;
mysql> CALL `sp_upsert`(1, 'update_text_0', 'update_text_1');
Query OK, 2 rows affected (0.00 sec)
mysql> SELECT
-> `record_id`,
-> `person_id`,
-> `some_text`,
-> `some_other_text`
-> FROM
-> `table_test`;
+-----------+-----------+---------------+-----------------+
| record_id | person_id | some_text | some_other_text |
+-----------+-----------+---------------+-----------------+
| 1 | 1 | update_text_0 | update_text_1 |
| 2 | 2 | BBB | YYY |
| 3 | 3 | CCC | ZZZ |
+-----------+-----------+---------------+-----------------+
3 rows in set (0.00 sec)
mysql> CALL `sp_upsert`(4, 'new_text_0', 'new_text_1');
Query OK, 1 row affected (0.00 sec)
mysql> SELECT
-> `record_id`,
-> `person_id`,
-> `some_text`,
-> `some_other_text`
-> FROM
-> `table_test`;
+-----------+-----------+---------------+-----------------+
| record_id | person_id | some_text | some_other_text |
+-----------+-----------+---------------+-----------------+
| 1 | 1 | update_text_0 | update_text_1 |
| 2 | 2 | BBB | YYY |
| 3 | 3 | CCC | ZZZ |
| 5 | 4 | new_text_0 | new_text_1 |
+-----------+-----------+---------------+-----------------+
4 rows in set (0.00 sec)
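For comparison, SQLite (3.24 and later) has a similar upsert, INSERT ... ON CONFLICT (...) DO UPDATE, where excluded.col plays the role of MySQL's VALUES(col). One difference worth noting: SQLite's form names the constraint it reacts to, here person_id, so a clash on any other unique column still raises an error. This is a sketch through Python's sqlite3 module; upsert_person is a hypothetical stand-in for sp_upsert, and auto-increment gaps like record_id 5 above are not modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_test ("
    " record_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " person_id INTEGER NOT NULL UNIQUE,"
    " some_text TEXT,"
    " some_other_text TEXT)"
)
with conn:
    conn.executemany(
        "INSERT INTO table_test (person_id, some_text, some_other_text) VALUES (?, ?, ?)",
        [(1, "AAA", "XXX"), (2, "BBB", "YYY"), (3, "CCC", "ZZZ")],
    )

UPSERT_SQL = """
INSERT INTO table_test (person_id, some_text, some_other_text)
VALUES (?, ?, ?)
ON CONFLICT (person_id) DO UPDATE SET
    some_text = excluded.some_text,
    some_other_text = excluded.some_other_text
"""

def upsert_person(conn, person_id, some_text, some_other_text):
    """Rough analogue of the sp_upsert procedure above, keyed on person_id."""
    with conn:
        conn.execute(UPSERT_SQL, (person_id, some_text, some_other_text))

upsert_person(conn, 1, "update_text_0", "update_text_1")  # person 1 exists: update
upsert_person(conn, 4, "new_text_0", "new_text_1")        # person 4 is new: insert
final = conn.execute(
    "SELECT person_id, some_text, some_other_text FROM table_test ORDER BY person_id"
).fetchall()
print(final)
```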
How about my approach?
Let's say you have one table with an autoincrement id and three text columns. You want to insert/update the value of column3, with the values in column1 and column2 being a (non-unique) key.
I use this query (without explicitly locking the table):
insert into myTable (id, col1, col2, col3)
select tmp.id, 'col1data', 'col2data', 'col3data' from
(select id from myTable where col1 = 'col1data' and col2 = 'col2data' union select null as id limit 1) tmp
on duplicate key update col3 = values(col3)
Anything wrong with that? For me it works the way I want.
A flexible solution should retain the atomicity offered by INSERT ... ON DUPLICATE KEY UPDATE, work regardless of whether autocommit is enabled, and not depend on a transaction with an isolation level of REPEATABLE READ or higher.
Any solution performing check-then-act across multiple statements would not satisfy this.
Here are the options:
If there tends to be more inserts than updates:
INSERT INTO table (record_id, ..., some_text, some_other_text) VALUES (...);
IF <duplicate entry for primary key error>
UPDATE table SET some_text = ..., some_other_text = ... WHERE record_id = ...;
IF affected-rows = 0
-- retry from INSERT OR ignore this conflict and defer to the other session
If there tends to be more updates than inserts:
UPDATE table SET some_text = ..., some_other_text = ... WHERE record_id = ...;
IF affected-rows = 0
INSERT INTO table (record_id, ..., some_text, some_other_text) VALUES (...);
IF <duplicate entry for primary key error>
-- retry from UPDATE OR ignore this conflict and defer to the other session
If you don't mind a bit of ugliness, you can actually use INSERT ... ON DUPLICATE KEY UPDATE and do this in a single statement:
INSERT INTO table (record_id, ..., some_text, some_other_text) VALUES (...)
ON DUPLICATE KEY UPDATE
some_text = if(record_id = VALUES(record_id), VALUES(some_text), some_text),
some_other_text = if(record_id = VALUES(record_id), VALUES(some_other_text), some_other_text)
IF affected-rows = 0
-- handle this as a unique check constraint violation
Note: affected-rows in these examples means affected rows, not found rows. The two are easily confused because a single client flag (CLIENT_FOUND_ROWS) switches which of these values is returned to the client.
Also note, if some_text and some_other_text are not actually modified (and the record is not otherwise changed) when you perform the update, those checks on affected-rows = 0 will misfire.
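The second option (more updates than inserts) might look like this in Python with sqlite3. The retry count, the save name and the person table layout are assumptions, and every IntegrityError is treated as a lost race, which real code would narrow down. SQLite's change count includes rows whose values did not change, so the misfire described in the last note does not arise in this particular sketch.

```python
import sqlite3

def save(conn, record_id, some_text, attempts=3):
    """UPDATE first; INSERT if no row matched; retry if the INSERT loses a race."""
    for _ in range(attempts):
        try:
            with conn:
                cur = conn.execute(
                    "UPDATE person SET some_text = ? WHERE record_id = ?",
                    (some_text, record_id),
                )
                # SQLite counts matched rows here, even when the value is unchanged.
                if cur.rowcount == 0:
                    conn.execute(
                        "INSERT INTO person (record_id, some_text) VALUES (?, ?)",
                        (record_id, some_text),
                    )
            return
        except sqlite3.IntegrityError:
            # Assumed to be a duplicate PRIMARY KEY from a concurrent insert:
            # go back and retry the UPDATE.
            continue
    raise RuntimeError(f"could not save record {record_id} after {attempts} attempts")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (record_id INTEGER PRIMARY KEY, some_text TEXT)")
save(conn, 1, "first")   # no row yet: INSERT
save(conn, 1, "second")  # row exists: UPDATE
save(conn, 1, "second")  # unchanged value still counts as a matched row
rows = conn.execute("SELECT record_id, some_text FROM person").fetchall()
print(rows)  # [(1, 'second')]
```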
I came across this post because I needed what the title describes, and I found a pretty handy solution that no one has mentioned here, so I thought I'd share it. Note that this solution is most handy when you are first creating your database tables. In that case, when you create the table, define your primary key etc. as usual, and for the combination of columns you want to be unique, simply add
UNIQUE(column_name1,column_name2,...)
at the end of your CREATE TABLE statement. According to this page, "MySQL uses the combination of values in both column column_name1 and column_name2 to evaluate the uniqueness", and it reports an error if you try to insert a row whose combination of column_name1 and column_name2 values already exists. Combining this way of creating a table with the corresponding INSERT ... ON DUPLICATE KEY UPDATE syntax turned out to be the most suitable solution for me. You just need to think it through carefully when setting up your tables, before you actually start using them.
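A small sketch of a composite UNIQUE constraint driving an upsert, shown in SQLite through Python's sqlite3 module with ON CONFLICT (...) DO UPDATE (SQLite 3.24+) in place of MySQL's ON DUPLICATE KEY UPDATE. The readings table and its value column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    " id INTEGER PRIMARY KEY,"
    " column_name1 TEXT NOT NULL,"
    " column_name2 TEXT NOT NULL,"
    " value INTEGER,"
    " UNIQUE (column_name1, column_name2))"  # uniqueness is judged on the pair
)
UPSERT_SQL = (
    "INSERT INTO readings (column_name1, column_name2, value) VALUES (?, ?, ?)"
    " ON CONFLICT (column_name1, column_name2) DO UPDATE SET value = excluded.value"
)
with conn:
    conn.execute(UPSERT_SQL, ("a", "x", 1))
    conn.execute(UPSERT_SQL, ("a", "y", 2))  # same column_name1, new pair: insert
    conn.execute(UPSERT_SQL, ("a", "x", 3))  # existing pair: update
rows = conn.execute(
    "SELECT column_name1, column_name2, value FROM readings ORDER BY 1, 2"
).fetchall()
print(rows)  # [('a', 'x', 3), ('a', 'y', 2)]
```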
For anyone else like me who is a DB noob: the approaches above didn't work for me. I have a primary key and a unique key, and I wanted to insert only if the unique key didn't exist. After a LOT of Stack Overflow and Google searching I didn't find many results for this, but I did find a site that gave me a working answer: https://thispointer.com/insert-record-if-not-exists-in-mysql/
For ease of reading, here is the answer from that site:
INSERT INTO table (unique_key_column_name)
SELECT * FROM (SELECT 'unique_value' AS unique_key_column_name) AS temp
WHERE NOT EXISTS (
SELECT unique_key_column_name FROM table
WHERE unique_key_column_name = 'unique_value'
) LIMIT 1;
Note that the values are wrapped in ' quotes because I am using strings in this case.
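The same insert-if-absent statement can be tried in SQLite through Python's sqlite3 module. tags is a hypothetical table, and the LIMIT 1 is dropped because the derived table always holds exactly one row. Keeping a UNIQUE constraint on the column as a backstop is a reasonable precaution when several sessions write at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")

INSERT_IF_MISSING_SQL = """
INSERT INTO tags (name)
SELECT * FROM (SELECT ? AS name) AS temp
WHERE NOT EXISTS (SELECT 1 FROM tags WHERE tags.name = temp.name)
"""

def insert_if_missing(conn, name):
    """Return 1 if the row was inserted, 0 if the value already existed."""
    with conn:
        cur = conn.execute(INSERT_IF_MISSING_SQL, (name,))
    return cur.rowcount

first = insert_if_missing(conn, "unique_value")
second = insert_if_missing(conn, "unique_value")
print(first, second)  # 1 0
```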

Recursive mysql select?

I saw this answer and I hope it is incorrect, just as someone was incorrect when they told me a primary key can only be on one column and can't span multiple columns.
Here is my table
create table Users(id INT primary key AUTO_INCREMENT,
parent INT,
name TEXT NOT NULL,
FOREIGN KEY(parent)
REFERENCES Users(id)
);
+----+--------+---------+
| id | parent | name |
+----+--------+---------+
| 1 | NULL | root |
| 2 | 1 | one |
| 3 | 1 | 1down |
| 4 | 2 | one_a |
| 5 | 4 | one_a_b |
+----+--------+---------+
I'd like to select user id 2 and recurse so that I get all of its direct and indirect children (i.e. ids 4 and 5).
How do I write this so that it works? I have seen recursion in PostgreSQL and SQL Server.
DELIMITER //
CREATE DEFINER = 'root'@'localhost'
PROCEDURE test.GetHierarchyUsers(IN StartKey INT)
BEGIN
-- prepare a hierarchy level variable
SET @hierlevel := 00000;
-- prepare a variable for total rows so we know when no more rows found
SET @lastRowCount := 0;
-- pre-drop temp table
DROP TABLE IF EXISTS MyHierarchy;
-- now, create it as the first level you want...
-- ie: a specific top level of all "no parent" entries
-- or parameterize the function and ask for a specific "ID".
-- add extra column as flag for next set of ID's to load into this.
CREATE TABLE MyHierarchy AS
SELECT U.ID
, U.Parent
, U.`name`
, 00 AS IDHierLevel
, 00 AS AlreadyProcessed
FROM
Users U
WHERE
U.ID = StartKey;
-- how many rows are we starting with at this tier level
-- START the cycle, only IF we found rows...
-- (count the seed rows explicitly rather than relying on FOUND_ROWS() after CREATE TABLE ... SELECT)
SELECT COUNT(*) INTO @lastRowCount FROM MyHierarchy;
-- we need to have a "key" for updates to be applied against,
-- otherwise our UPDATE statement will nag about an unsafe update command
CREATE INDEX MyHier_Idx1 ON MyHierarchy (IDHierLevel);
-- NOW, keep cycling through until we get no more records
WHILE @lastRowCount > 0
DO
UPDATE MyHierarchy
SET
AlreadyProcessed = 1
WHERE
IDHierLevel = @hierlevel;
-- NOW, load in all entries found from full-set NOT already processed
INSERT INTO MyHierarchy
SELECT DISTINCT U.ID
, U.Parent
, U.`name`
, @hierlevel + 1 AS IDHierLevel
, 0 AS AlreadyProcessed
FROM
MyHierarchy mh
JOIN Users U
ON mh.Parent = U.ID
WHERE
mh.IDHierLevel = @hierlevel;
-- preserve latest count of records accounted for from above query
-- now, how many actual rows DID we insert from the select query
SET @lastRowCount := ROW_COUNT();
-- only mark the LOWER level we just joined against as processed,
-- and NOT the new records we just inserted
UPDATE MyHierarchy
SET
AlreadyProcessed = 1
WHERE
IDHierLevel = @hierlevel;
-- now, update the hierarchy level
SET @hierlevel := @hierlevel + 1;
END WHILE;
-- return the final set now
SELECT *
FROM
MyHierarchy;
-- and we can clean-up after the query of data has been selected / returned.
-- drop table if exists MyHierarchy;
END //
DELIMITER ;
It might appear cumbersome, but to use this, do
call GetHierarchyUsers( 5 );
(or whatever key ID you want to find UP the hierarchical tree for).
The premise is to start with the one KEY you are working with. Then use it as the basis for joining to the Users table AGAIN, this time on the first entry's PARENT ID. Once a level has been joined, mark it as processed in the temp table so the next cycle does not try to join on those keys again. Keep going until no more "parent" ID keys can be found.
This returns the entire chain of records up to the top-level parent, no matter how deep the nesting. If you only want the FINAL parent, you can use the @hierlevel variable (or the IDHierLevel column) to return only the most recently added row, or use ORDER BY and LIMIT 1.
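To make the loop easier to follow, here is a pure-Python model of what the procedure does, assuming the five sample rows from the question. It is not MySQL code: the dict `USERS` and the function `hierarchy_up` are hypothetical names, and each step is commented with the SQL it mirrors.

```python
# Hypothetical in-memory copy of the Users table from the question: id -> (parent, name).
USERS = {
    1: (None, "root"),
    2: (1, "one"),
    3: (1, "1down"),
    4: (2, "one_a"),
    5: (4, "one_a_b"),
}


def hierarchy_up(start_key):
    """Model of GetHierarchyUsers: returns (id, parent, name, level) rows, walking UP."""
    if start_key not in USERS:
        return []  # the seed SELECT found nothing, so the WHILE loop never runs
    parent, name = USERS[start_key]
    rows = [(start_key, parent, name, 0)]  # CREATE TABLE MyHierarchy AS SELECT ...
    level = 0
    while True:
        new_rows = []
        for _, parent_id, _, row_level in rows:
            # only rows of the current level are joined (WHERE mh.IDHierLevel = @hierlevel)
            if row_level != level or parent_id not in USERS:
                continue
            p_parent, p_name = USERS[parent_id]  # JOIN Users U ON mh.Parent = U.ID
            candidate = (parent_id, p_parent, p_name, level + 1)
            if candidate not in new_rows:  # SELECT DISTINCT
                new_rows.append(candidate)
        if not new_rows:  # ROW_COUNT() = 0 ends the loop
            break
        rows.extend(new_rows)
        level += 1
    return rows


for row in hierarchy_up(5):
    print(row)
```

For key 5 this yields ids 5, 4, 2 and 1 at levels 0 to 3; the row with the highest level is the FINAL parent.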
There is probably a better and more efficient answer above, but this snippet takes a slightly different approach and provides both ancestors and children.
The idea is to keep inserting related row ids into a temporary table, then take one row at a time and look up its relatives, repeating until all rows are processed. The query could probably be optimized to use only one temporary table.
Here is a working SQL Fiddle example.
DELIMITER //
CREATE TABLE Users
(`id` int, `parent` int,`name` VARCHAR(10))//
INSERT INTO Users
(`id`, `parent`, `name`)
VALUES
(1, NULL, 'root'),
(2, 1, 'one'),
(3, 1, '1down'),
(4, 2, 'one_a'),
(5, 4, 'one_a_b')//
CREATE PROCEDURE getAncestors (in ParRowId int)
BEGIN
DECLARE tmp_parentId int;
-- drop leftovers from an earlier call in the same session
DROP TEMPORARY TABLE IF EXISTS tmp, results;
CREATE TEMPORARY TABLE tmp (parentId INT NOT NULL);
CREATE TEMPORARY TABLE results (parentId INT NOT NULL);
INSERT INTO tmp SELECT ParRowId;
WHILE (SELECT COUNT(*) FROM tmp) > 0 DO
SET tmp_parentId = (SELECT MIN(parentId) FROM tmp);
DELETE FROM tmp WHERE parentId = tmp_parentId;
INSERT INTO results SELECT parent FROM Users WHERE id = tmp_parentId AND parent IS NOT NULL;
INSERT INTO tmp SELECT parent FROM Users WHERE id = tmp_parentId AND parent IS NOT NULL;
END WHILE;
SELECT * FROM Users WHERE id IN (SELECT * FROM results);
END//
CREATE PROCEDURE getChildren (in ParRowId int)
BEGIN
DECLARE tmp_childId int;
-- drop leftovers from an earlier call in the same session
DROP TEMPORARY TABLE IF EXISTS tmp, results;
CREATE TEMPORARY TABLE tmp (childId INT NOT NULL);
CREATE TEMPORARY TABLE results (childId INT NOT NULL);
INSERT INTO tmp SELECT ParRowId;
WHILE (SELECT COUNT(*) FROM tmp) > 0 DO
SET tmp_childId = (SELECT MIN(childId) FROM tmp);
DELETE FROM tmp WHERE childId = tmp_childId;
INSERT INTO results SELECT id FROM Users WHERE parent = tmp_childId;
INSERT INTO tmp SELECT id FROM Users WHERE parent = tmp_childId;
END WHILE;
SELECT * FROM Users WHERE id IN (SELECT * FROM results);
END//
DELIMITER ;
Usage:
CALL getChildren(2);
-- returns
id parent name
4 2 one_a
5 4 one_a_b
CALL getAncestors(5);
-- returns
id parent name
1 (null) root
2 1 one
4 2 one_a
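On MySQL 8.0 and later, the same two lookups can also be written with a recursive common table expression (WITH RECURSIVE) instead of stored procedures with temporary tables. This is a swapped-in technique, not what the answer above uses. The sketch runs it through Python's sqlite3 module, since SQLite accepts the same WITH RECURSIVE syntax; the function names are hypothetical, and it assumes the hierarchy has no cycles (UNION ALL would keep recursing on cyclic data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Users (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT);
    INSERT INTO Users (id, parent, name) VALUES
        (1, NULL, 'root'), (2, 1, 'one'), (3, 1, '1down'),
        (4, 2, 'one_a'), (5, 4, 'one_a_b');
    """
)

CHILDREN = """
WITH RECURSIVE descendants(id) AS (
    SELECT id FROM Users WHERE parent = ?                        -- direct children
    UNION ALL
    SELECT u.id FROM Users u JOIN descendants d ON u.parent = d.id  -- their children, etc.
)
SELECT id, parent, name FROM Users WHERE id IN (SELECT id FROM descendants) ORDER BY id
"""

ANCESTORS = """
WITH RECURSIVE ancestors(id) AS (
    SELECT parent FROM Users WHERE id = ? AND parent IS NOT NULL  -- direct parent
    UNION ALL
    SELECT u.parent FROM Users u JOIN ancestors a ON u.id = a.id   -- parent of parent, etc.
    WHERE u.parent IS NOT NULL
)
SELECT id, parent, name FROM Users WHERE id IN (SELECT id FROM ancestors) ORDER BY id
"""


def get_children(row_id):
    return conn.execute(CHILDREN, (row_id,)).fetchall()


def get_ancestors(row_id):
    return conn.execute(ANCESTORS, (row_id,)).fetchall()


print(get_children(2))
print(get_ancestors(5))
```

With the sample rows, `get_children(2)` returns ids 4 and 5 and `get_ancestors(5)` returns ids 1, 2 and 4, matching the procedure output above.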

How to delete duplicates on a MySQL table?

I need to DELETE duplicated rows for a specified SID in a MySQL table.
How can I do this with an SQL query?
DELETE (DUPLICATED TITLES) FROM table WHERE SID = "1"
Something like this, but I don't know how to do it.
This removes duplicates in place, without making a new table.
ALTER IGNORE TABLE `table_name` ADD UNIQUE (title, SID)
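Note: this only works well if the index fits in memory. Also, the IGNORE clause of ALTER TABLE was removed in MySQL 5.7, so this only works on MySQL 5.6 and earlier.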
Suppose you have a table employee with the following columns, including an id primary key that the query relies on:
employee (id, first_name, last_name, start_date)
In order to delete the rows with a duplicate first_name, keeping the one with the lowest id:
delete
from employee using employee,
employee e1
where employee.id > e1.id
and employee.first_name = e1.first_name;
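Multi-table DELETE ... USING is MySQL-specific. As a hedged illustration of the rule it applies (delete a row when another row with the same first_name has a smaller id), here is the equivalent correlated-subquery form, run through Python's sqlite3 module on an in-memory SQLite database with made-up sample rows. MySQL itself rejects this correlated form (error 1093, because the subquery reads from the table being deleted from), which is why the answer uses a self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, start_date TEXT);
    INSERT INTO employee (id, first_name, last_name, start_date) VALUES
        (1, 'Ann', 'Lee',  '2020-01-01'),
        (2, 'Bob', 'Kim',  '2020-02-01'),
        (3, 'Ann', 'Park', '2021-03-01'),
        (4, 'Ann', 'Cho',  '2022-04-01'),
        (5, 'Bob', 'Ray',  '2022-05-01');
    """
)

# Same condition as "employee.id > e1.id AND employee.first_name = e1.first_name":
# a row goes if some other row with its first_name has a smaller id.
conn.execute(
    """
    DELETE FROM employee
    WHERE EXISTS (
        SELECT 1 FROM employee AS e1
        WHERE e1.first_name = employee.first_name
          AND e1.id < employee.id
    )
    """
)
remaining = conn.execute("SELECT id, first_name FROM employee ORDER BY id").fetchall()
print(remaining)  # [(1, 'Ann'), (2, 'Bob')]
```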
Walkthrough: deleting duplicate rows in MySQL in place (assuming you have a timestamp column to sort by).
Create the table and insert some rows:
create table penguins(foo int, bar varchar(15), baz datetime);
insert into penguins values(1, 'skipper', now());
insert into penguins values(1, 'skipper', now());
insert into penguins values(3, 'kowalski', now());
insert into penguins values(3, 'kowalski', now());
insert into penguins values(3, 'kowalski', now());
insert into penguins values(4, 'rico', now());
select * from penguins;
+------+----------+---------------------+
| foo | bar | baz |
+------+----------+---------------------+
| 1 | skipper | 2014-08-25 14:21:54 |
| 1 | skipper | 2014-08-25 14:21:59 |
| 3 | kowalski | 2014-08-25 14:22:09 |
| 3 | kowalski | 2014-08-25 14:22:13 |
| 3 | kowalski | 2014-08-25 14:22:15 |
| 4 | rico | 2014-08-25 14:22:22 |
+------+----------+---------------------+
6 rows in set (0.00 sec)
Remove the duplicates in place:
delete a
from penguins a
left join(
select max(baz) maxtimestamp, foo, bar
from penguins
group by foo, bar) b
on a.baz = maxtimestamp and
a.foo = b.foo and
a.bar = b.bar
where b.maxtimestamp IS NULL;
Query OK, 3 rows affected (0.01 sec)
select * from penguins;
+------+----------+---------------------+
| foo | bar | baz |
+------+----------+---------------------+
| 1 | skipper | 2014-08-25 14:21:59 |
| 3 | kowalski | 2014-08-25 14:22:15 |
| 4 | rico | 2014-08-25 14:22:22 |
+------+----------+---------------------+
3 rows in set (0.00 sec)
You're done: the duplicate rows are removed and the last one by timestamp is kept.
For those of you without a timestamp or unique column.
You don't have a timestamp or a unique index column to sort by? You're living in a state of degeneracy. You'll have to do additional steps to delete duplicate rows.
create the penguins table and add some rows
create table penguins(foo int, bar varchar(15));
insert into penguins values(1, 'skipper');
insert into penguins values(1, 'skipper');
insert into penguins values(3, 'kowalski');
insert into penguins values(3, 'kowalski');
insert into penguins values(3, 'kowalski');
insert into penguins values(4, 'rico');
select * from penguins;
# +------+----------+
# | foo | bar |
# +------+----------+
# | 1 | skipper |
# | 1 | skipper |
# | 3 | kowalski |
# | 3 | kowalski |
# | 3 | kowalski |
# | 4 | rico |
# +------+----------+
make a clone of the first table and copy into it.
drop table if exists penguins_copy;
create table penguins_copy as ( SELECT foo, bar FROM penguins );
#add an autoincrementing primary key:
ALTER TABLE penguins_copy ADD moo int AUTO_INCREMENT PRIMARY KEY first;
select * from penguins_copy;
# +-----+------+----------+
# | moo | foo | bar |
# +-----+------+----------+
# | 1 | 1 | skipper |
# | 2 | 1 | skipper |
# | 3 | 3 | kowalski |
# | 4 | 3 | kowalski |
# | 5 | 3 | kowalski |
# | 6 | 4 | rico |
# +-----+------+----------+
The max aggregate operates upon the new moo index:
delete a from penguins_copy a left join(
select max(moo) myindex, foo, bar
from penguins_copy
group by foo, bar) b
on a.moo = b.myindex and
a.foo = b.foo and
a.bar = b.bar
where b.myindex IS NULL;
#drop the extra column on the copied table
alter table penguins_copy drop moo;
select * from penguins_copy;
#drop the first table and put the copy table back:
drop table penguins;
create table penguins select * from penguins_copy;
observe and cleanup
drop table penguins_copy;
select * from penguins;
+------+----------+
| foo | bar |
+------+----------+
| 1 | skipper |
| 3 | kowalski |
| 4 | rico |
+------+----------+
Elapsed: 1458.359 milliseconds
What's that big SQL delete statement doing?
Table penguins with alias 'a' is left joined to a subset of table penguins with alias 'b'. The right-hand subset 'b' finds the max timestamp [or max moo] grouped by columns foo and bar, and is matched to the left-hand table 'a'. The left side (foo, bar, baz) has every row in the table, while each row of 'b' (maxtimestamp, foo, bar) matches only the left row that IS the max.
Every row that is not that max gets a maxtimestamp of NULL. Filter down to those NULL rows and you have the set of all rows, per foo and bar, that aren't the latest timestamp baz. Delete those.
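To see why the LEFT JOIN ... IS NULL filter leaves exactly the latest row per group, here is a small pure-Python model of the same logic using the six sample rows above. It is a sketch, not MySQL code, and the function name `split_by_latest` is hypothetical. It also shows one edge case: if two rows in a group share the maximum timestamp, both match the join and both are kept.

```python
# The six sample rows from the walkthrough: (foo, bar, baz).
rows = [
    (1, "skipper", "2014-08-25 14:21:54"),
    (1, "skipper", "2014-08-25 14:21:59"),
    (3, "kowalski", "2014-08-25 14:22:09"),
    (3, "kowalski", "2014-08-25 14:22:13"),
    (3, "kowalski", "2014-08-25 14:22:15"),
    (4, "rico", "2014-08-25 14:22:22"),
]


def split_by_latest(rows):
    """Return (kept, deleted) the way the LEFT JOIN ... IS NULL delete does."""
    # derived table b: SELECT max(baz) AS maxtimestamp, foo, bar ... GROUP BY foo, bar
    max_ts = {}
    for foo, bar, baz in rows:
        if (foo, bar) not in max_ts or baz > max_ts[(foo, bar)]:
            max_ts[(foo, bar)] = baz
    kept, deleted = [], []
    for foo, bar, baz in rows:
        # ON a.baz = maxtimestamp AND a.foo = b.foo AND a.bar = b.bar
        if baz == max_ts[(foo, bar)]:
            kept.append((foo, bar, baz))
        else:
            deleted.append((foo, bar, baz))  # b.maxtimestamp IS NULL -> deleted
    return kept, deleted


kept, deleted = split_by_latest(rows)
for row in kept:
    print(row)

# Edge case: a tie on the maximum timestamp keeps both rows.
tie_kept, _ = split_by_latest([(1, "skipper", "t1"), (1, "skipper", "t1")])
print(len(tie_kept))  # 2
```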
Make a backup of the table before you run this.
Prevent this problem from ever happening again on this table:
If you got this to work, and it put out your "duplicate row" fire. Great. Now define a new composite unique key on your table (on those two columns) to prevent more duplicates from being added in the first place.
Like a good immune system, the bad rows shouldn't even be allowed in to the table at the time of insert. Later on all those programs adding duplicates will broadcast their protest, and when you fix them, this issue never comes up again.
The following removes duplicates for all SIDs, not only a single one.
With temp table
CREATE TABLE table_temp AS
SELECT * FROM table GROUP BY title, SID;
DROP TABLE table;
RENAME TABLE table_temp TO table;
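Since table_temp is freshly created, it has no indexes, so you'll need to recreate them after removing duplicates. You can check which indexes the table has with SHOW INDEXES IN table. Note that SELECT * with a GROUP BY on only some of the columns is rejected when ONLY_FULL_GROUP_BY is enabled, which is the default since MySQL 5.7.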
Without temp table:
DELETE FROM `table` WHERE id IN (
SELECT all_duplicates.id FROM (
SELECT id FROM `table` WHERE (`title`, `SID`) IN (
SELECT `title`, `SID` FROM `table` GROUP BY `title`, `SID` having count(*) > 1
)
) AS all_duplicates
LEFT JOIN (
SELECT id FROM `table` GROUP BY `title`, `SID` having count(*) > 1
) AS grouped_duplicates
ON all_duplicates.id = grouped_duplicates.id
WHERE grouped_duplicates.id IS NULL
)
After running into this issue myself, on a huge database, I wasn't completely impressed with the performance of any of the other answers. I want to keep only the latest duplicate row, and delete the rest.
In a single statement, without a temp table, this worked best for me:
DELETE e.*
FROM employee e
WHERE id IN
(SELECT id
FROM (SELECT MIN(id) as id
FROM employee e2
GROUP BY first_name, last_name
HAVING COUNT(*) > 1) x);
The only caveat is that I have to run the query multiple times: each run deletes only the lowest-id row of each duplicate group, so a group with n copies needs n-1 runs. Even so, I found it worked better for me than the other options.
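A hedged sketch of why repeated runs are needed, using made-up rows in an in-memory SQLite database via Python's sqlite3 module. The DELETE e.* alias syntax is MySQL-specific, so the statement is written as a plain DELETE FROM with the same subquery; the loop simply reruns it until it affects no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    INSERT INTO employee (first_name, last_name) VALUES
        ('Ann', 'Lee'), ('Ann', 'Lee'), ('Ann', 'Lee'),
        ('Bob', 'Kim'), ('Bob', 'Kim'),
        ('Cid', 'Ng');
    """
)

# Removes the lowest-id row of every group that still has more than one row.
DELETE_OLDEST_DUPLICATE = """
DELETE FROM employee
WHERE id IN (
    SELECT id FROM (
        SELECT MIN(id) AS id
        FROM employee e2
        GROUP BY first_name, last_name
        HAVING COUNT(*) > 1
    ) x
)
"""

passes = 0
while conn.execute(DELETE_OLDEST_DUPLICATE).rowcount > 0:
    passes += 1
conn.commit()

remaining = conn.execute(
    "SELECT id, first_name, last_name FROM employee ORDER BY id"
).fetchall()
print(passes, remaining)
```

The largest group has three copies, so two effective passes are needed, and the surviving row of each group is the one with the highest (latest) id.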
This always seems to work for me:
CREATE TABLE NoDupeTable LIKE DupeTable;
INSERT NoDupeTable SELECT * FROM DupeTable group by CommonField1,CommonFieldN;
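This keeps all the non-dupe records plus one row from each group of dupes. In my experience that is the row with the lowest ID, but MySQL does not guarantee which row's values it returns for columns that are not in the GROUP BY.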
I've also taken to doing the following so that the dupe issue no longer occurs after the removal:
CREATE TABLE NoDupeTable LIKE DupeTable;
Alter table NoDupeTable Add Unique `Unique` (CommonField1,CommonField2);
INSERT IGNORE NoDupeTable SELECT * FROM DupeTable;
In other words, I create a copy of the first table, add a unique index on the fields I don't want duplicated, and then do an INSERT IGNORE. A normal INSERT would fail the first time it tried to add a record that duplicates those two fields; INSERT IGNORE simply skips such records instead.
Going forward, it becomes impossible to create duplicate records based on those two fields.
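A minimal sketch of the same idea on an in-memory SQLite database through Python's sqlite3 module, with made-up sample rows. SQLite spells MySQL's INSERT IGNORE as INSERT OR IGNORE, and the ORDER BY id is an added assumption so that the lowest-id copy of each pair is the one kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DupeTable (id INTEGER PRIMARY KEY, CommonField1 TEXT, CommonField2 TEXT);
    INSERT INTO DupeTable (id, CommonField1, CommonField2) VALUES
        (1, 'a', 'x'), (2, 'a', 'x'), (3, 'b', 'y'), (4, 'a', 'y'), (5, 'b', 'y');
    -- copy of the structure plus a unique constraint on the two fields
    CREATE TABLE NoDupeTable (
        id INTEGER PRIMARY KEY, CommonField1 TEXT, CommonField2 TEXT,
        UNIQUE (CommonField1, CommonField2)
    );
    -- rows that would violate the constraint are skipped instead of aborting
    INSERT OR IGNORE INTO NoDupeTable SELECT * FROM DupeTable ORDER BY id;
    """
)
rows = conn.execute("SELECT * FROM NoDupeTable ORDER BY id").fetchall()

# Going forward, a plain INSERT of an existing pair is refused.
try:
    conn.execute("INSERT INTO NoDupeTable (CommonField1, CommonField2) VALUES ('a', 'x')")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True

print(rows, blocked)
```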
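The following works for any table, but it only removes rows that are identical in every column (so a table with an auto-increment id column is left unchanged):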
CREATE TABLE `noDup` LIKE `Dup` ;
INSERT `noDup` SELECT DISTINCT * FROM `Dup` ;
DROP TABLE `Dup` ;
ALTER TABLE `noDup` RENAME `Dup` ;
Here is a simple answer:
delete a from target_table a left JOIN (select max(id_field) as id_field, field_being_repeated
from target_table GROUP BY field_being_repeated) b
on a.field_being_repeated = b.field_being_repeated
and a.id_field = b.id_field
where b.id_field is null;
This works for me to remove old records:
delete from table where id in
(select min(e.id)
from (select * from table) e
group by column1, column2
having count(*) > 1
);
Replace min(e.id) with max(e.id) to remove the newest records instead. Each run deletes one row per duplicate group, so repeat it until no rows are affected.
delete p from
product p
inner join (
select max(id) as id, url from product
group by url
having count(*) > 1
) unik on unik.url = p.url and unik.id != p.id;
I find Werner's solution above to be the most convenient because it works regardless of whether there is a primary key, doesn't mess with tables, uses future-proof plain SQL, and is very easy to understand.
As I stated in my comment, that solution hasn't been properly explained though.
So this is mine, based on it.
1) add a new boolean column
alter table mytable add tokeep boolean;
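2) add a constraint on the duplicated columns AND the new column (this succeeds despite the duplicates because the new column is NULL on every row, and NULLs never collide in a unique constraint)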
alter table mytable add constraint preventdupe unique (mycol1, mycol2, tokeep);
3) set the boolean column to true. Because of the new constraint, this succeeds on only one row of each duplicate group; IGNORE skips the others
update ignore mytable set tokeep = true;
4) delete rows that have not been marked as tokeep
delete from mytable where tokeep is null;
5) drop the added column
alter table mytable drop tokeep;
I suggest that you keep the constraint you added, so that new duplicates are prevented in the future.
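The five steps can be sketched on an in-memory SQLite database through Python's sqlite3 module, with made-up rows. Two dialect substitutions are assumptions of this sketch: SQLite has no ALTER TABLE ... ADD CONSTRAINT, so a CREATE UNIQUE INDEX plays that role, and MySQL's UPDATE IGNORE is spelled UPDATE OR IGNORE. Step 5 (dropping the column) is left out because older SQLite versions cannot drop columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE mytable (mycol1 TEXT, mycol2 TEXT);
    INSERT INTO mytable VALUES
        ('a', 'x'), ('a', 'x'), ('a', 'x'), ('b', 'y'), ('c', 'z'), ('c', 'z');
    -- 1) new boolean column, NULL on every existing row
    ALTER TABLE mytable ADD COLUMN tokeep BOOLEAN;
    -- 2) uniqueness over the duplicated columns plus tokeep; NULLs do not collide
    CREATE UNIQUE INDEX preventdupe ON mytable (mycol1, mycol2, tokeep);
    -- 3) only the first row of each group can take tokeep = 1; the rest are skipped
    UPDATE OR IGNORE mytable SET tokeep = 1;
    -- 4) delete everything that was not marked
    DELETE FROM mytable WHERE tokeep IS NULL;
    """
)
rows = conn.execute("SELECT mycol1, mycol2 FROM mytable ORDER BY mycol1").fetchall()
print(rows)  # [('a', 'x'), ('b', 'y'), ('c', 'z')]
```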
This procedure removes all duplicates (including multiples) in a table, keeping the last duplicate. It is an extension of Retrieving last record in each group.
Hope this is useful to someone.
DROP TABLE IF EXISTS UniqueIDs;
CREATE Temporary table UniqueIDs (id Int(11));
INSERT INTO UniqueIDs
(SELECT T1.ID FROM Table T1 LEFT JOIN Table T2 ON
(T1.Field1 = T2.Field1 AND T1.Field2 = T2.Field2 #Comparison Fields
AND T1.ID < T2.ID)
WHERE T2.ID IS NULL);
DELETE FROM Table WHERE id NOT IN (SELECT ID FROM UniqueIDs);
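A hedged sketch of the same LEFT JOIN on T1.ID < T2.ID trick, run through Python's sqlite3 module on an in-memory SQLite database. The table name `t` and its rows are made up (`Table` is a reserved word). A row survives only if no other row with the same comparison fields has a larger ID, so the last duplicate of each group is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t (ID INTEGER PRIMARY KEY, Field1 TEXT, Field2 TEXT);
    INSERT INTO t (ID, Field1, Field2) VALUES
        (1, 'a', 'x'), (2, 'a', 'x'), (3, 'a', 'x'),
        (4, 'b', 'y'), (5, 'b', 'y'), (6, 'c', 'z');
    CREATE TEMPORARY TABLE UniqueIDs (id INTEGER);
    -- T2 finds a later row with the same fields; where none exists, T1 is the last one
    INSERT INTO UniqueIDs
        SELECT T1.ID FROM t T1 LEFT JOIN t T2
            ON T1.Field1 = T2.Field1 AND T1.Field2 = T2.Field2 AND T1.ID < T2.ID
        WHERE T2.ID IS NULL;
    DELETE FROM t WHERE ID NOT IN (SELECT id FROM UniqueIDs);
    """
)
kept_ids = [r[0] for r in conn.execute("SELECT ID FROM t ORDER BY ID")]
print(kept_ids)  # [3, 5, 6]
```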
Another easy way, using UPDATE IGNORE:
You need a UNIQUE index that covers the columns defining a duplicate plus a new temporary marker column.
The marker column must allow NULL: NULLs never collide in a unique index, so the index can be created even though duplicates exist. You then mark one row per duplicate group by updating it with the IGNORE clause. Step by step:
Add a nullable temporary column to mark the uniques, plus a unique index over the duplicate columns and that column (col1 and col2 stand in for your own columns):
ALTER TABLE `yourtable` ADD `unique` VARCHAR(3) NULL AFTER `lastcolname`;
ALTER TABLE `yourtable` ADD UNIQUE INDEX `tmp_marker` (`col1`, `col2`, `unique`);
=> this adds the column and the index to your table.
Update the table, trying to mark every row as unique, but ignore the errors caused by duplicate keys (those rows are skipped):
UPDATE IGNORE `yourtable` SET `unique` = 'Yes' WHERE 1;
=> only one row of each set of duplicate records is marked with unique = 'Yes'; the others keep NULL.
Delete everything that's not marked:
DELETE FROM `yourtable` WHERE `unique` IS NULL;
=> This removes all duplicate records. (A condition like `unique` <> 'Yes' would not match the NULL rows.)
Drop the index and the column:
ALTER TABLE `yourtable` DROP INDEX `tmp_marker`, DROP `unique`;
If you want to keep the row with the lowest id value:
DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id > n2.id AND n1.email = n2.email;
If you want to keep the row with the highest id value:
DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id < n2.id AND n1.email = n2.email;
Deleting duplicates from MySQL tables is a common issue that usually comes with specific needs. In case anyone is interested, here (Remove duplicate rows in MySQL) I explain how to use a temporary table to delete MySQL duplicates reliably and quickly, in a way that also handles big data sources (with examples for different use cases).
Ali, in your case, you can run something like this:
-- create a new temporary table
CREATE TABLE tmp_table1 LIKE table1;
-- add a unique constraint
ALTER TABLE tmp_table1 ADD UNIQUE(sid, title);
-- scan over the table to insert entries
INSERT IGNORE INTO tmp_table1 SELECT * FROM table1 ORDER BY sid;
-- rename tables
RENAME TABLE table1 TO backup_table1, tmp_table1 TO table1;
delete from `table` where `table`.`SID` in
(
select SID from (select distinct t.SID from `table` t join `table` t1 on t.title = t1.title where t.SID > t1.SID) as dupes
)
Love @eric's answer, but it doesn't seem to work on a really big table (I get The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay when I try to run it). So I limited the join query to consider only the duplicate rows and ended up with:
DELETE a FROM penguins a
LEFT JOIN (SELECT COUNT(baz) AS num, MIN(baz) AS keepBaz, foo
FROM penguins
GROUP BY foo HAVING num > 1) b
ON a.baz != b.keepBaz
AND a.foo = b.foo
WHERE b.foo IS NOT NULL;
Rows without a duplicate, and the first instance of each duplicate (excluded by a.baz != b.keepBaz), find no match in b, so the WHERE clause skips them and only the subsequent duplicates are deleted. Change MIN(baz) to MAX(baz) to keep the last instance instead of the first.
This works for large tables. As written, it deletes the row with the highest id from each duplicate group, one row per group per run:
CREATE Temporary table duplicates AS select max(id) as id, url from links group by url having count(*) > 1;
DELETE l from links l inner join duplicates ld on ld.id = l.id WHERE ld.id IS NOT NULL;
To delete the oldest row instead, change max(id) to min(id).
This makes column_name the primary key and, in the meantime, ignores duplicate-key errors, so it deletes the rows with a duplicate value for column_name and keeps one of each. It requires MySQL 5.6 or earlier, since ALTER IGNORE was removed in 5.7.
ALTER IGNORE TABLE `table_name` ADD PRIMARY KEY (`column_name`);
I think this works by copying the table, emptying it, and then putting only the distinct values back into it, but please double-check it before running it on large amounts of data.
Creates a carbon copy of your table
create table temp_table like oldtablename;
insert temp_table select * from oldtablename;
Empties your original table
DELETE FROM oldtablename;
Copies all distinct values from the copied table back to your original table
INSERT oldtablename SELECT * from temp_table group by firstname,lastname,dob;
Deletes your temp table.
DROP TABLE temp_table;
You need to group by all the fields that you want to keep distinct.
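DELETE T2
FROM table_name T1
JOIN table_name T2 ON (T1.title = T2.title AND T1.ID < T2.ID);
This keeps the row with the lowest ID for each title; joining on T1.ID <> T2.ID instead would delete every copy, including the one you meant to keep.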
Here is how I usually eliminate duplicates:
add a temporary column and name it whatever you want (I'll refer to it as active)
group by the fields that you think shouldn't be duplicated and set active to 1 for the row each group selects; grouping returns only one row per group, so the duplicates are not selected
delete the rows where active is still 0
drop the column active
optionally (if it fits your purposes), add a unique index on those columns so that duplicates can't appear again
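The steps above can be sketched on an in-memory SQLite database through Python's sqlite3 module, with made-up rows. Two assumptions are added: the table has an id column, and MIN(id) decides which row of each group is marked active (the steps don't say which row grouping should pick). The dropping of the column is left out because older SQLite versions cannot drop columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t (id INTEGER PRIMARY KEY, title TEXT, sid INTEGER);
    INSERT INTO t (id, title, sid) VALUES
        (1, 'alpha', 1), (2, 'alpha', 1), (3, 'beta', 1), (4, 'alpha', 2), (5, 'beta', 1);
    -- temporary marker column, 0 on every existing row
    ALTER TABLE t ADD COLUMN active INTEGER NOT NULL DEFAULT 0;
    -- one row per (title, sid) group gets active = 1; the extra derived table
    -- should also keep MySQL from raising error 1093 on the same statement
    UPDATE t SET active = 1
    WHERE id IN (SELECT id FROM (SELECT MIN(id) AS id FROM t GROUP BY title, sid) AS keepers);
    -- delete the rows that stayed inactive
    DELETE FROM t WHERE active = 0;
    """
)
remaining = conn.execute("SELECT id, title, sid FROM t ORDER BY id").fetchall()
print(remaining)  # [(1, 'alpha', 1), (3, 'beta', 1), (4, 'alpha', 2)]
```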
You could just use a DISTINCT clause to select the "cleaned up" list (and here is a very easy example of how to do that).
Could it work if you count them and then add a LIMIT to your delete query, leaving just one?
For example, if you have two, write your query like this (with n copies, use LIMIT n-1):
DELETE FROM table WHERE SID = 1 LIMIT 1;
There are just a few basic steps when removing duplicate data from your table:
Back up your table!
Find the duplicate rows
Remove the duplicate rows
Here is the full tutorial: https://blog.teamsql.io/deleting-duplicate-data-3541485b3473