I'm using this query to delete unique records from one table.
DELETE FROM TABLE1 WHERE ID NOT IN (SELECT ID FROM TABLE2)
But the problem is that both tables have millions of records, and using a subquery will be very slow.
Can anyone suggest an alternative?
Delete t1
from table_1 t1
left join table_2 t2 on t1.id = t2.id
where t2.id is null
Subqueries are really slow; in fact, that is why joins exist!
DELETE table1
FROM table1 LEFT JOIN table2 ON table1.id = table2.id
WHERE table2.id is null
Deleting millions of records from a table always has performance issues; you need to check whether the table has:
1. Constraints
2. Triggers
3. Indexes
These will make your delete even slower.
Please disable them before this activity. You should also check the ratio of the "to be deleted" records to the entire table volume. If the number of records to be deleted is more than 50% of the entire table volume, then you should consider the approach below:
Create a temporary table containing records that you want to retain from the original table.
Drop the original table.
Rename temporary table to original table.
Before going for the above approach, please make sure that you have a copy of the definition of each of the objects dependent on this original table like the constraints, indexes, triggers etc. You may also need to check if the table that you are going to delete has any children.
Once this activity is complete, you can enable the constraints, indexes, triggers again!
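The copy-and-swap steps above could be sketched in MySQL roughly as follows, using the tables from the question. This is only a sketch under assumptions: table1_new and table1_old are hypothetical names, a regular table is used instead of a TEMPORARY one so that it survives the session, and nothing else is assumed to write to table1 while it runs. CREATE TABLE ... LIKE copies column definitions and indexes but not foreign keys or triggers, so those still have to be recreated from the saved definitions.

```sql
-- table1_new and table1_old are hypothetical names.
CREATE TABLE table1_new LIKE table1;

-- Copy only the rows to retain: those whose id also exists in table2.
INSERT INTO table1_new
SELECT t1.*
FROM table1 t1
WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id);

-- Swap the tables in a single statement.
RENAME TABLE table1 TO table1_old,
             table1_new TO table1;

-- Only after the new table has been checked:
DROP TABLE table1_old;
```

Keeping the old table around until the new one has been verified gives a cheap way back if the filter turns out to be wrong.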
Thanks,
Aditya
Related
I have a MySQL database with just 1 table:
Fields are: blocknr (not unique), btcaddress (not unique), txid (not unique), vin, vinvoutnr, netvalue.
Indexes exist on both btcaddress and txid.
Data in it looks like this:
I need to delete all "deletable" record pairs. An example is given in red.
Conditions are:
txid must be the same (there can be more than 2 records with same txid)
vinvoutnr must be the same
vin must be different (it can have only 2 values, 0 and 1, so one must be 0 and the other must be 1)
In a table of 36M records, about 33M records will be deleted.
I've used this:
delete t1
from registration t1
inner join registration t2
where t1.txid=t2.txid and t1.vinvoutnr=t2.vinvoutnr and t1.vin<>t2.vin;
It works but takes 5 hours.
Maybe this would work too (not tested yet):
delete t1
from registration as t1, registration as t2
where t1.txid=t2.txid and t1.vinvoutnr=t2.vinvoutnr and t1.vin<>t2.vin;
Or should I forget about a delete query and instead create a new table containing all the non-deletable rows, then drop the original?
Database can be offline for this delete query.
Based on your question, you are deleting most of the rows in the table. That is just really expensive. A better approach is to empty the table and re-populate it:
create table temp_registration as
<query for the rows to keep here>;
truncate table registration;
insert into registration
select *
from temp_registration;
Your logic is a bit hard to follow, but I think the logic on the rows to keep is:
select r.*
from registration r
where not exists (select 1
from registration r2
where r2.txid = r.txid and
r2.vinvoutnr = r.vinvoutnr and
r2.vin <> r.vin
);
For best performance, you want an index on registration(txid, vinvoutnr, vin).
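As a sketch, the suggested index could be created like this. The index name idx_txid_vinvoutnr_vin is made up for illustration, and the statement assumes the three columns are short enough to fit within the storage engine's index key length limit.

```sql
-- idx_txid_vinvoutnr_vin is a hypothetical index name.
CREATE INDEX idx_txid_vinvoutnr_vin
    ON registration (txid, vinvoutnr, vin);
```

With all three columns in the index, the NOT EXISTS lookup can probably be answered from the index alone, without reading the table rows.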
Given that you expect to remove the majority of your data it does sound like the simplest approach would be to create a new table with the correct data and then drop the original table as you suggest. Otherwise ADyson's corrections to the JOIN query might help to alleviate the performance issue.
I'm trying to adapt the solutions here (SQL Delete Rows Based on Another Table) to my needs. E.g.,
DELETE
FROM complete_set
WHERE slice_name IN (SELECT slice_name FROM changes
GROUP BY slice_name HAVING COUNT(slice_name) > 1);
Table definitions:
Table1 ... Name: changes, Fields: Id, slice_name, slice_value, Rows: Approx. 100 Thousand.
Table2 ... Name: complete_set, Fields: Id, slice_name, slice_value, Rows: Approx. 3 million.
While running the query's components individually is extremely fast ...
E.g.,
SELECT slice_name
FROM changes
GROUP BY slice_name
HAVING COUNT(slice_name) > 1;
(off-the-cuff about a second), and
DELETE FROM complete_set
WHERE slice_name = 'ABC'
(also about a second, or so)
The above solution (with the subquery) takes too long to execute to be useful. Is there an optimization I can apply here?
Thanks for the assist.
One possible explanation for the slow delete is that it takes some time for MySQL to look up each slice_name in the complete_set table against the values in the subquery. We can try speeding this up as follows. First, create a new table to replace the subquery, which will serve as a materialized view:
CREATE TEMPORARY TABLE changes_view
(PRIMARY KEY pkey (slice_name))
SELECT slice_name
FROM changes
GROUP BY slice_name
HAVING COUNT(slice_name) > 1;
Now phrase your delete using a join:
DELETE t1
FROM complete_set t1
INNER JOIN changes_view t2
ON t1.slice_name = t2.slice_name;
The (intended) trick here is that the delete join should run fast because MySQL can quickly look up a slice_name value from the complete_set table in the materialized view table, since the latter has an index on slice_name.
If the table is too big, the above query will definitely take a lot of time, because the inner query runs for every outer query row during the deletion.
The deletion would be much quicker if the individual DELETE statements were defined separately and executed in a batch or sequentially.
I have two tables T1 and T2 and want to update one field of T1 from T2 where T2 holds massive data.
What is more efficient?
Updating T1 in a for loop iteration over the values
or
Left join it with T2 and update.
Please note that I'm updating these tables in a shell script.
In general, a JOIN will perform much better than a loop. The size should not be an issue if the tables are properly indexed.
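A minimal sketch of such a join-based update in MySQL, assuming hypothetical columns id (the join key, indexed in T2) and some_field (the field being copied), neither of which is named in the question:

```sql
-- id and some_field are hypothetical column names.
UPDATE T1
INNER JOIN T2 ON T2.id = T1.id
SET T1.some_field = T2.some_field;
```

An INNER JOIN leaves T1 rows without a match in T2 untouched. With a LEFT JOIN, as mentioned in the question, those unmatched rows would have some_field set to NULL, which may or may not be what is wanted.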
There is no simple answer as to which will be more efficient; it depends on the table size and on how much data you are going to update in one go.
Suppose you are using the InnoDB engine and frequently update 1,000 or more rows in one go with a join of two heavy tables. That is not a good idea on a production server, because it will hold locks on your table for some time, and this locking can block other operations on the server.
Option 1: If you are updating only a few rows, based on properly indexed fields (preferably the primary key), then you can go with a join.
Option 2: If you are updating a large amount of data based on a join of multiple tables, then the option below will be better:
Step 1: Create a stored procedure.
Step 2: Store the results of the query below in a cursor.
Suppose you want to update field1 of table1 with the corresponding field2 data from table2:
SELECT a.primary_key, b.field2 FROM table1 a JOIN table2 b ON a.primary_key = b.foreign_key WHERE [place condition here, if any];
Step 3: Now update the rows one by one, based on the primary key, using the values stored in the cursor.
Step 4: You can call this stored procedure from your script.
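The four steps above might look roughly like this as a MySQL stored procedure. It is only a sketch: the procedure name update_field1_row_by_row is hypothetical, and the variable types (INT and VARCHAR(255)) are assumptions that would have to match the real column definitions.

```sql
DELIMITER $$
-- update_field1_row_by_row is a hypothetical name.
CREATE PROCEDURE update_field1_row_by_row()
BEGIN
    DECLARE done INT DEFAULT FALSE;
    DECLARE v_pk INT;                 -- assumed type of table1.primary_key
    DECLARE v_field2 VARCHAR(255);    -- assumed type of table2.field2

    -- Step 2: the join result is held in a cursor.
    DECLARE cur CURSOR FOR
        SELECT a.primary_key, b.field2
        FROM table1 a
        JOIN table2 b ON a.primary_key = b.foreign_key;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

    OPEN cur;
    read_loop: LOOP
        FETCH cur INTO v_pk, v_field2;
        IF done THEN
            LEAVE read_loop;
        END IF;
        -- Step 3: update one row at a time by primary key.
        UPDATE table1 SET field1 = v_field2 WHERE primary_key = v_pk;
    END LOOP;
    CLOSE cur;
END $$
DELIMITER ;

-- Step 4: call it from the script.
CALL update_field1_row_by_row();
```

Each single-row UPDATE holds its locks only briefly when autocommit is on, which is the trade-off this answer is aiming for; the total run time will usually be longer than a single join-based UPDATE.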
Let's say I have 5 MyISAM tables in my database. Each table has a key, let's call it "id_num" ...
"id_num" is the field which I use to connect all the tables together. A certain value of "id_num" may appear in all tables or sometimes only a subset of the tables.
If I want to delete all instances of a certain "id_num" in the database, can I just make a DELETE command on all tables or should I check to see if that value for "id_num" exists?
DELETE FROM table1 WHERE id_num = 123;
DELETE FROM table2 WHERE id_num = 123;
DELETE FROM table3 WHERE id_num = 123;
DELETE FROM table4 WHERE id_num = 123;
DELETE FROM table5 WHERE id_num = 123;
Or should I perform a SELECT command first on each table to check if these rows exist in the table before deletion? What is best practice?
(I am using MyISAM so cascading delete is not an option here.)
To answer your question about first running SELECT, there's no advantage to doing so. If there's no row in a given table, then the DELETE will simply affect zero rows. If there are matching rows, then doing the SELECT first and then the DELETE would just be doing double the work of finding the rows. So just do the DELETE and get it over with.
Are you aware that MySQL has multi-table DELETE syntax?
If you are certain that table1 has a matching row, you can use outer joins for the others:
DELETE table1.*, table2.*, table3.*, table4.*, table5.*
FROM table1
LEFT OUTER JOIN table2 USING (id_num)
LEFT OUTER JOIN table3 USING (id_num)
LEFT OUTER JOIN table4 USING (id_num)
LEFT OUTER JOIN table5 USING (id_num)
WHERE table1.id_num = 123;
I'm assuming id_num is indexed in all these tables, otherwise doing the JOIN will perform poorly. But doing the DELETE without the aid of an index to find the rows would perform poorly too.
Sounds like you need to change your design as follows - have a table with id_num as a PK and make id_num a FK in the other tables, with on-delete-cascade. This will allow you to only run a single delete statement to delete all applicable data (and this is also generally the more correct way of doing things).
The above apparently doesn't work in MyISAM, but there is a workaround using triggers (but now it does seem like a less appealing option).
But I believe your queries above should work. There is no need to check whether something exists first; DELETE will simply do nothing if there are no matching rows.
Most APIs provide you with some sort of rows affected count if you'd like to see whether data was actually deleted.
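In plain MySQL, without relying on a client API, the same information is available from the ROW_COUNT() function, as long as it is called on the same connection directly after the DELETE. A small sketch using the table and value from the question:

```sql
DELETE FROM table1 WHERE id_num = 123;

-- Number of rows removed by the DELETE just executed on this connection.
SELECT ROW_COUNT() AS deleted_rows;
```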
You should not execute a SELECT query before deleting from the table, as the SELECT will put extra load on the server. However, after executing the DELETE query you can check how many rows have been deleted using the mysql_affected_rows() function in PHP.
I have table1(id_table1) and table2(id_table2, id_table1). I'd like to remove records in table2 (under a given condition) but then also remove items in table1 that have no more relationships to table2. What is the most efficient way to do that in SQL? I'm using MySQL.
Thanks in advance!
If you use InnoDB, add a foreign key constraint with an ON DELETE CASCADE. This will automatically delete the rows if the relationship is no longer correct. That way, you don't have to query the database after deleting rows in table2 to check if the relation is still intact.
Foreign key constraints
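A sketch of what such a constraint could look like for the two tables in the question, assuming both use InnoDB and that id_table1 is the primary key of table1. The constraint name fk_table2_table1 and the id value 42 are made up for illustration.

```sql
-- Assumes InnoDB tables and table1.id_table1 as the primary key.
ALTER TABLE table2
    ADD CONSTRAINT fk_table2_table1   -- hypothetical constraint name
    FOREIGN KEY (id_table1) REFERENCES table1 (id_table1)
    ON DELETE CASCADE;

-- Deleting a parent row now also removes its rows in table2.
DELETE FROM table1 WHERE id_table1 = 42;  -- 42 is an arbitrary example id
```

Note that the cascade runs from table1 to table2: deleting a table1 row removes its table2 rows, while deleting table2 rows on its own does not touch table1.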
In addition to cularis's answer, here is a less efficient option for when you're using MyISAM, which has no foreign key constraints.
Create a trigger:
DELIMITER $$
CREATE TRIGGER ad_table1_each AFTER DELETE ON table2 FOR EACH ROW
BEGIN
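-- Remove the parent row only when no other table2 rows still reference it.
DELETE FROM table1
WHERE table1.id_table1 = OLD.id_table1
AND NOT EXISTS (SELECT 1 FROM table2 WHERE table2.id_table1 = OLD.id_table1);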
END $$
DELIMITER ;
http://dev.mysql.com/doc/refman/5.5/en/triggers.html
http://dev.mysql.com/doc/refman/5.5/en/create-trigger.html
Assuming you did not set up any cascading deletes, and since you asked how to do it in SQL, I can see two options:
1) delete from table2 where (condition)
delete from table1 where id_table1 not in (select distinct id_table1 from table2)
2) delete from table1 where id_table1 in (select distinct id_table1 from table2 where (condition))
delete from table2 where id_table1 not in (select id_table1 from table1)
Assuming table2 is much larger than table1, and the condition removes a considerable share of its rows:
method 1 scans the full table2 once, deletes many records, then scans the reduced table2 once more
method 2 scans the full table2 twice
This makes me think method 1 is a little more efficient if the tables are very large.