Delete with join not completing due to high number of rows - mysql

I have a table with approximately 550k rows. I'm trying to execute this query:
DELETE t1 FROM categories t1
INNER JOIN categories t2
WHERE t1.id < t2.id
AND t1.name = t2.name
AND t1.book_id = t2.book_id
Unfortunately the shell freezes up and I can tell by counting the rows in another shell that nothing is happening.
Is there any way to buffer this query, or solve this issue in another way?
Any help is appreciated.

If you need to delete a large number of rows, it is usually more efficient to move the rows you want to retain to another table, truncate the original table, then insert back into it:
-- move the rows we want to keep (MAX(id) matches the original DELETE,
-- which removes every row that has a higher-id duplicate)
create table categories_tmp as
select name, book_id, max(id) id from categories group by name, book_id;
-- empty the table - back it up first!
truncate table categories;
-- insert back into the table
insert into categories (id, name, book_id)
select id, name, book_id from categories_tmp;
-- drop the temporary table
drop table categories_tmp;

This index should help:
INDEX(name, book_id, id)
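As a rough sketch of how that index could be added to the table from the question, and how the original DELETE could then be run in smaller pieces: the index name idx_name_book_id and the 50000-row id ranges are assumptions chosen for illustration, and the ranges assume id is a numeric key.

```sql
-- Add the suggested composite index (the index name is hypothetical).
ALTER TABLE categories
  ADD INDEX idx_name_book_id (name, book_id, id);

-- Same delete as in the question, restricted to one slice of ids.
-- Rerun with the next range (50001-100000, and so on) until the whole
-- table has been covered.
DELETE t1
FROM categories t1
INNER JOIN categories t2
  ON  t1.name = t2.name
  AND t1.book_id = t2.book_id
  AND t1.id < t2.id
WHERE t1.id BETWEEN 1 AND 50000;
```

With autocommit on, each slice commits on its own, so progress can be checked from another session between runs.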

Related

MySQL: delete query taking too long

I am trying to delete records with duplicate column values from a table, but it is taking forever. It gets stuck and gives no response for hours. The table is fairly large, with over 1.3M records. Is the query inefficient? Is there any way to optimize it?
delete n1 from ids n1, ids n2 where n1.id > n2.id and n1.user_id = n2.user_id
The database is remote, and I am using PuTTY to run queries.
Add an index:
ALTER TABLE ids ADD INDEX (user_id, id);
This makes it efficient to find all the rows with the same user ID and higher IDs.
It will also help to join with a subquery.
DELETE n1
FROM ids AS n1
JOIN (SELECT user_id, MIN(id) AS minid
FROM ids
GROUP BY user_id) AS n2
ON n1.user_id = n2.user_id AND n1.id > n2.minid
This will still be faster with the above index.
Yes, that query is very inefficient. Even if you used explicit joins, keep in mind that every row "N" is being matched up with every row before "N", and every row "N-1" is being matched up with the rows before it, and so on.
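To get a feel for how much work the self-join does, the following sketch counts how many (n1, n2) pairs the original DELETE has to match. It assumes id is unique; the aliases cnt, per_user and matched_pairs are made up for this example.

```sql
-- A user_id that occurs cnt times produces cnt * (cnt - 1) / 2 pairs
-- with n1.id > n2.id, so heavily duplicated values dominate the cost.
SELECT SUM(cnt * (cnt - 1) / 2) AS matched_pairs
FROM (
    SELECT user_id, COUNT(*) AS cnt
    FROM ids
    GROUP BY user_id
) AS per_user;
```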
Try something like this:
DROP TEMPORARY TABLE IF EXISTS keeps;
CREATE TEMPORARY TABLE keeps (
user_id INT,
keepID INT,
INDEX (user_id, keepID)
);
INSERT INTO keeps (user_id, keepID)
SELECT user_id, MIN(id) As keepID
FROM ids
GROUP BY user_id;
DELETE FROM ids WHERE (user_id, id) NOT IN (SELECT user_id, keepID FROM keeps);
DROP TEMPORARY TABLE IF EXISTS keeps;
I'm also tempted to suggest trying something like the below, but I can't remember if MySQL allows subquerying the delete table in the delete query ... which is why I suggested the temp table in the first one.
DELETE a
FROM ids AS a
WHERE EXISTS (
SELECT *
FROM ids AS b
WHERE b.id < a.id
AND b.user_id = a.user_id
)

Update/Insert column values with another table's group by result in mysql

I have an empty table (t1) and I want to insert or update the t1.uid column from another table's (t2) GROUP BY uid values.
So far I have tried like this:
UPDATE table1 t1 JOIN
(SELECT uid FROM table2 GROUP BY uid) t2
SET t1.uid = t2.uid;
but it's not working for me.
N.B. The data set is massive: grouping table2 by uid yields 1114732 results, all of which I have to insert or update in table1's uid column.
Please try this:
Insert into table1(uid)
select distinct uid from table2
If table1 is empty, then UPDATE is not the correct verb. Would this suit your needs?
INSERT into table1 SELECT distinct uid from table2;
INSERT ... SELECT docs

Alternative to in operator in mysql

I am looking for an alternative to the IN operator in MySQL.
I have nearly 25,000 ids and I am using the IN operator on them, which gives me a StackOverflow exception. Is there any other alternative to the IN operator in MySQL?
Thanks in advance.
If the IDs are in another table:
SELECT * FROM table1 WHERE id IN (SELECT id FROM table2);
then you can use a join instead:
SELECT table1.* FROM table1 INNER JOIN table2 ON table1.id = table2.id;
You could do the following:
1 - Create a MySQL Temporary Table
CREATE TEMPORARY TABLE tempIdTable (id int unsigned not null primary key);
2 - Insert All Your ids into the Temporary Table
For every id in your list:
insert ignore into tempIdTable (id) values (anId);
(this will have the added bonus of de-duplicating your list of ids ready for the final step)
3 - Join Against the Temporary Table
SELECT t1.* FROM myTable1 t1 INNER JOIN tempIdTable tt ON t1.id = tt.id;
The temporary table will disappear as soon as your connection is dropped, so you don't have to worry about dropping it before you create it next time.
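Step 2 does not have to be one statement per id. As a sketch, MySQL accepts several rows in a single INSERT, so the list can be loaded in a few batches; the literal ids below are placeholders.

```sql
-- Load many ids per statement instead of one INSERT per id.
INSERT IGNORE INTO tempIdTable (id) VALUES
  (101), (102), (103), (101);  -- the repeated 101 is skipped by IGNORE
```

Statement size is bounded by max_allowed_packet, so a list of 25,000 ids may still need to be split across a handful of statements.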

deleting duplicate records on mysql?

I have this MySQL query that finds duplicates and the number of occurrences for each topic:
SELECT name,
COUNT(name) AS NumOccurrences
FROM topics
GROUP BY name
HAVING ( COUNT(name) > 1 )
What I want to do is delete all the duplicates that are found. I only want one unique name for each topic, with no duplicates. Thanks!
DELETE t2
FROM topics t1
JOIN topics t2
ON t2.name = t1.name
AND t2.id < t1.id
I would copy all the unique entries to a new table:
CREATE TABLE new_table as
SELECT * FROM old_table WHERE 1 GROUP BY unique_column_name;
Check the data, then delete your old table when you're sure everything's good and rename the new table to the old one.
Then make the name column unique so you won't have to do this again.
Cheers
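On servers where the ONLY_FULL_GROUP_BY SQL mode is enabled, SELECT * combined with GROUP BY on a single non-unique column is typically rejected. A possible variant for the topics table from the question, assuming it has a unique id column (as the other answer does) and using the hypothetical names topics_new, topics_old and uq_topics_name:

```sql
-- Empty copy with the same columns and indexes.
CREATE TABLE topics_new LIKE topics;

-- Keep one full row per name: the one with the lowest id.
INSERT INTO topics_new
SELECT t.*
FROM topics AS t
JOIN (
    SELECT MIN(id) AS id
    FROM topics
    GROUP BY name
) AS keepers ON keepers.id = t.id;

-- After checking the data, swap the tables and enforce uniqueness.
RENAME TABLE topics TO topics_old, topics_new TO topics;
ALTER TABLE topics ADD UNIQUE KEY uq_topics_name (name);
```

The old data stays available as topics_old until it is dropped by hand.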

How to remove duplicate entries from a mysql db?

I have a table with some ids + titles. I want to make the title column unique, but it has over 600k records already, some of which are duplicates (sometimes several dozen times over).
How do I remove all duplicates, except one, so I can add a UNIQUE key to the title column after?
This command adds a unique key, and drops all rows that generate errors (due to the unique key). This removes duplicates.
ALTER IGNORE TABLE table_name ADD UNIQUE KEY idx1 (title);
Edit: Note that this command may not work for InnoDB tables for some versions of MySQL. See this post for a workaround. (Thanks to "an anonymous user" for this information.)
Create a new table with just the distinct rows of the original table. There may be other ways but I find this the cleanest.
CREATE TABLE tmp_table AS SELECT DISTINCT [....] FROM main_table
More specifically:
The faster way is to insert distinct rows into a temporary table. Using delete, it took me a few hours to remove duplicates from a table of 8 million rows. Using insert and distinct, it took just 13 minutes.
CREATE TABLE tempTableName LIKE tableName;
CREATE INDEX ix_all_id ON tableName(cellId,attributeId,entityRowId,value);
INSERT INTO tempTableName(cellId,attributeId,entityRowId,value) SELECT DISTINCT cellId,attributeId,entityRowId,value FROM tableName;
TRUNCATE TABLE tableName;
INSERT tableName SELECT * FROM tempTableName;
DROP TABLE tempTableName;
Since MySQL's ALTER IGNORE TABLE has been deprecated, you need to actually delete the duplicate data before adding an index.
First write a query that finds all the duplicates. Here I'm assuming that email is the field that contains duplicates.
SELECT
s1.email,
s1.id,
s1.created,
s2.id,
s2.created
FROM
student AS s1
INNER JOIN
student AS s2
WHERE
/* Emails are the same */
s1.email = s2.email AND
/* DON'T select both accounts,
only select the one created later.
The serial id could also be used here */
s2.created > s1.created
;
Next select only the unique duplicate ids:
SELECT
DISTINCT s2.id
FROM
student AS s1
INNER JOIN
student AS s2
WHERE
s1.email = s2.email AND
s2.created > s1.created
;
Once you are sure the result contains only the duplicate ids you want to delete, run the delete. You have to wrap the table as (SELECT * FROM tblname) so that MySQL doesn't complain about selecting from the table that is being modified.
DELETE FROM
student
WHERE
id
IN (
SELECT
DISTINCT s2.id
FROM
(SELECT * FROM student) AS s1
INNER JOIN
(SELECT * FROM student) AS s2
WHERE
s1.email = s2.email AND
s2.created > s1.created
);
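Before adding the unique index, it can be worth confirming that no duplicates are left. The comparison s2.created > s1.created does not separate two rows that share both email and created, so such pairs would survive the delete. A simple check, with the alias copies being an assumption of this sketch:

```sql
-- Should return no rows once every email is unique.
SELECT email, COUNT(*) AS copies
FROM student
GROUP BY email
HAVING COUNT(*) > 1;
```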
Then create the unique index:
ALTER TABLE
student
ADD UNIQUE INDEX
idx_student_unique_email(email)
;
The query below can be used to delete all duplicates except the row with the lowest "id" value:
DELETE t1 FROM table_name t1, table_name t2 WHERE t1.id > t2.id AND t1.name = t2.name
Similarly, we can keep the row with the highest "id" value:
DELETE t1 FROM table_name t1, table_name t2 WHERE t1.id < t2.id AND t1.name = t2.name
This shows how to do it in SQL Server 2000. I'm not completely familiar with MySQL syntax, but I'm sure there's something comparable.
create table #titles (iid int identity (1, 1), title varchar(200))
-- Repeat this step many times to create duplicates
insert into #titles(title) values ('bob')
insert into #titles(title) values ('bob1')
insert into #titles(title) values ('bob2')
insert into #titles(title) values ('bob3')
insert into #titles(title) values ('bob4')
DELETE T FROM
#titles T left join
(
select title, min(iid) as minid from #titles group by title
) D on T.title = D.title and T.iid = D.minid
WHERE D.minid is null
Select * FROM #titles
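A rough MySQL counterpart of the SQL Server statement above might look like this. It is a sketch, not a tested translation: the table name titles and the column id are assumed stand-ins for #titles and iid.

```sql
-- Delete every row whose id is not the lowest id for its title.
DELETE t
FROM titles AS t
LEFT JOIN (
    SELECT title, MIN(id) AS minid
    FROM titles
    GROUP BY title
) AS d
  ON t.title = d.title AND t.id = d.minid
WHERE d.minid IS NULL;
```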
delete from student where id in (
SELECT distinct(s1.`student_id`) from student as s1 inner join student as s2
where s1.`sex` = s2.`sex` and
s1.`student_id` > s2.`student_id` and
s1.`sex` = 'M'
ORDER BY `s1`.`student_id` ASC
)
The solution posted by Nitin seems to be the most elegant / logical one.
However it has one issue:
ERROR 1093 (HY000): You can't specify target table 'student' for update in FROM clause
This can however be resolved by using (SELECT * FROM student) instead of student:
DELETE FROM student WHERE id IN (
SELECT distinct(s1.`student_id`) FROM (SELECT * FROM student) AS s1 INNER JOIN (SELECT * FROM student) AS s2
WHERE s1.`sex` = s2.`sex` AND
s1.`student_id` > s2.`student_id` AND
s1.`sex` = 'M'
ORDER BY `s1`.`student_id` ASC
)
Give your +1's to Nitin for coming up with the original solution.
Deleting duplicates from MySQL tables is a common issue that usually comes with specific needs. In case anyone is interested, here (Remove duplicate rows in MySQL) I explain how to use a temporary table to delete MySQL duplicates in a reliable and fast way (with examples for different use cases).
In this case, something like this should work:
-- create a new temporary table
CREATE TABLE tmp_table1 LIKE table1;
-- add a unique constraint on the column that must not repeat
ALTER TABLE tmp_table1 ADD UNIQUE(title);
-- scan over the table to insert entries; duplicates are skipped, lowest id wins
INSERT IGNORE INTO tmp_table1 SELECT * FROM table1 ORDER BY id;
-- rename tables
RENAME TABLE table1 TO backup_table1, tmp_table1 TO table1;