Consider a table table1 and its corresponding lookup table table2:
`table1`
columns: col1 | id
number of rows: ~2Mi
`table2`
columns: id <primary key> | col2
number of rows: ~1Mi
Since there are several ids in table2 which are not being used in table1, they can be deleted. This can be done with the following query:
DELETE FROM `table2` WHERE `id` IN (
SELECT * FROM (
SELECT `table2`.`id` FROM `table2`
LEFT JOIN `table1`
ON `table1`.`id` = `table2`.`id`
WHERE `table1`.`id` IS NULL
) AS p
)
However, this query is very slow (~100 rows per minute).
Question
Why is this query so slow? How can it be improved?
You could rewrite it as:
DELETE FROM `table2`
WHERE NOT EXISTS(SELECT 1 FROM `table1` WHERE `table1`.id = `table2`.id);
Adding an index on table1(id) will also help. table2(id) is already indexed, since it is the primary key.
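A minimal sketch of the NOT EXISTS delete on a tiny data set. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and the sample rows and the index name idx_table1_id are made up for illustration. It shows the result of the query, not its performance on MySQL.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, col2 TEXT);
    CREATE TABLE table1 (col1 TEXT, id INTEGER);
    CREATE INDEX idx_table1_id ON table1 (id);
    INSERT INTO table2 (id, col2) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO table1 (col1, id) VALUES ('x', 1), ('y', 1), ('z', 3);
""")

# Delete lookup rows that no row in table1 refers to.
cur = conn.execute("""
    DELETE FROM table2
    WHERE NOT EXISTS (SELECT 1 FROM table1 WHERE table1.id = table2.id)
""")
deleted = cur.rowcount
remaining_ids = [row[0] for row in conn.execute("SELECT id FROM table2 ORDER BY id")]
print(deleted, remaining_ids)
```

Ids 2 and 4 are unused in table1, so only those two lookup rows are removed.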
Related
I have two tables, table1 and table2. Their definitions are:
CREATE TABLE `table1` (
`table1_id` int(11) NOT NULL AUTO_INCREMENT,
`table1_name` VARCHAR(256),
PRIMARY KEY (`table1_id`)
);
CREATE TABLE `table2` (
`table2_id` int(11) NOT NULL AUTO_INCREMENT,
`table1_id` int(11) NOT NULL,
`table1_name` VARCHAR(256),
PRIMARY KEY (`table2_id`),
FOREIGN KEY (`table1_id`) REFERENCES `table1` (`table1_id`)
);
I want to know the number of rows in table1 that are NOT referenced in table2. That can be done with:
SELECT COUNT(t1.table1_id) FROM table1 t1
WHERE t1.table1_id NOT IN (SELECT t2.table1_id FROM table2 t2)
Is there a more efficient way of performing this query?
Upgrade to MySQL 5.6, which optimizes semi-joins against subqueries better.
See http://dev.mysql.com/doc/refman/5.6/en/subquery-optimization.html
Or else use an exclusion join:
SELECT COUNT(t1.table1_id) FROM table1 t1
LEFT OUTER JOIN table2 t2 USING (table1_id)
WHERE t2.table1_id IS NULL
Also, make sure table2.table1_id has an index on it.
Try using EXISTS; it's generally more efficient than IN. Note that this first query counts the rows in table1 that ARE referenced in table2:
SELECT COUNT(t1.table1_id)
FROM table1 t1
WHERE EXISTS
( SELECT 1
FROM table2 t2
WHERE t2.table1_id <=> t1.table1_id
)
To count the rows that are NOT referenced, which is what the question asks for, use NOT EXISTS:
SELECT COUNT(t1.table1_id)
FROM table1 t1
WHERE NOT EXISTS
( SELECT 1
FROM table2 t2
WHERE t2.table1_id = t1.table1_id
)
EXISTS is generally faster because the engine can stop searching as soon as it finds a match, since the condition has then been proved true. The problem with IN is that it collects all the results from the subquery before further processing, and that takes longer.
As @billkarwin noted in the comments, EXISTS uses a dependent subquery. Here is the EXPLAIN output for my two queries and for the OP's query: http://sqlfiddle.com/#!2/53199d/5
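As a quick sanity check, the three formulations discussed here (NOT IN, exclusion join, NOT EXISTS) can be run side by side to confirm they return the same count. The sketch below uses Python's sqlite3 module in place of MySQL, which is an assumption for the sake of a self-contained example; the sample rows and the index name are hypothetical. It says nothing about relative speed, which depends on the MySQL version and its optimizer.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (
        table1_id INTEGER PRIMARY KEY,
        table1_name TEXT
    );
    CREATE TABLE table2 (
        table2_id INTEGER PRIMARY KEY,
        table1_id INTEGER NOT NULL REFERENCES table1 (table1_id),
        table1_name TEXT
    );
    CREATE INDEX idx_table2_table1_id ON table2 (table1_id);
    INSERT INTO table1 (table1_id, table1_name)
        VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e');
    INSERT INTO table2 (table1_id, table1_name)
        VALUES (1, 'a'), (1, 'a'), (3, 'c');
""")

queries = {
    "not_in": """
        SELECT COUNT(t1.table1_id) FROM table1 t1
        WHERE t1.table1_id NOT IN (SELECT t2.table1_id FROM table2 t2)
    """,
    "left_join": """
        SELECT COUNT(t1.table1_id) FROM table1 t1
        LEFT OUTER JOIN table2 t2 ON t2.table1_id = t1.table1_id
        WHERE t2.table1_id IS NULL
    """,
    "not_exists": """
        SELECT COUNT(t1.table1_id) FROM table1 t1
        WHERE NOT EXISTS (
            SELECT 1 FROM table2 t2 WHERE t2.table1_id = t1.table1_id
        )
    """,
}

# Run each formulation and collect its single COUNT value.
counts = {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
print(counts)
```

Rows 2, 4 and 5 of table1 are unreferenced in this sample, so each query reports 3.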
How can I construct a SQL query that deletes the rows I want?
I have two tables.
Table 1.
ID: Some Random Not Significant To This Question Columns : DateTime : UserID
Table 2.
ID: Some Random Not Significant To This Question Columns : DateTime : UserID
The two tables are related by DateTime and UserID
Is there any way I can write a query that deletes a row from table 2 if no row in table1 has a matching DateTime and UserID?
Thanks
You can use LEFT JOIN. Because table2 is given the alias t2, the DELETE must name the alias:
DELETE t2
FROM table2 t2 LEFT JOIN table1 t1 ON t1.`DateTime` = t2.`DateTime`
AND t1.`UserID` = t2.`UserID`
WHERE t1.`UserID` IS NULL
DELETE t2
FROM table2 t2
WHERE NOT EXISTS
(
SELECT NULL
FROM table1 t1
WHERE (t1.userId, t1.dateTime) = (t2.userId, t2.dateTime)
)
First of all: create a backup before you delete lots of records :)
The idea:
DELETE FROM
table1
WHERE
NOT EXISTS (SELECT 1 FROM table2 WHERE table1.referenceColumn = table2.referenceColumn)
You can check which records will be deleted by replacing the DELETE with SELECT *.
And now the solution:
DELETE FROM
table2
WHERE
NOT EXISTS (
SELECT 1 FROM
table1
WHERE
table2.UserID = table1.UserID
AND table2.DateTime = table1.DateTime
)
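The preview-then-delete approach described above can be sketched as follows. This is an illustration only: it runs on Python's sqlite3 module instead of MySQL (an assumption), and the timestamps and user ids are made-up sample data. The same WHERE clause is reused for the SELECT preview and the DELETE, so the preview shows exactly the rows that will go.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER PRIMARY KEY, DateTime TEXT, UserID INTEGER);
    CREATE TABLE table2 (ID INTEGER PRIMARY KEY, DateTime TEXT, UserID INTEGER);
    INSERT INTO table1 (DateTime, UserID) VALUES
        ('2020-01-01 10:00:00', 1),
        ('2020-01-02 12:00:00', 2);
    INSERT INTO table2 (DateTime, UserID) VALUES
        ('2020-01-01 10:00:00', 1),
        ('2020-01-01 10:00:00', 2),
        ('2020-01-03 09:30:00', 1);
""")

# Shared condition: no row in table1 matches on both UserID and DateTime.
where = """
    WHERE NOT EXISTS (
        SELECT 1 FROM table1
        WHERE table2.UserID = table1.UserID
          AND table2.DateTime = table1.DateTime
    )
"""

# Step 1: preview the rows that would be deleted.
preview = conn.execute(
    "SELECT DateTime, UserID FROM table2" + where + " ORDER BY ID"
).fetchall()

# Step 2: run the actual delete with the same condition.
conn.execute("DELETE FROM table2" + where)
remaining = conn.execute("SELECT DateTime, UserID FROM table2 ORDER BY ID").fetchall()
print(preview, remaining)
```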
Okay so I have a table that has xid. Each xid can have several pids. I am trying to delete everything except the row that has the highest pid for each xid.
I am trying:
DELETE FROM table WHERE `pid` NOT IN
( SELECT MAX(`pid`)
FROM table
GROUP BY `xid`
)
If I use the same query but with SELECT instead of DELETE, I get all of the records that I want to delete. When the DELETE is there, I get the error:
#1093 - You can't specify target table 'mod_personnel' for update in FROM clause
Use a JOIN rather than NOT IN:
DELETE t1.* FROM table t1
LEFT JOIN (SELECT xid, MAX(pid) pid
FROM table
GROUP BY xid) t2
ON t1.pid = t2.pid
WHERE t2.pid IS NULL
DELETE FROM table WHERE `pid` NOT IN
(SELECT maxpid FROM
( SELECT MAX(`pid`) as maxpid
FROM table
GROUP BY `xid`
) AS m
)
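A small sketch of the wrapped-subquery form, showing which rows survive. It assumes a hypothetical table name, personnel, and made-up rows, and it runs on Python's sqlite3 module instead of MySQL. SQLite does not raise MySQL's error #1093, so this only illustrates the result of the query, not the error or the workaround itself.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); "personnel" is a hypothetical name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE personnel (pid INTEGER PRIMARY KEY, xid INTEGER);
    INSERT INTO personnel (pid, xid) VALUES
        (1, 10), (2, 10), (3, 20), (4, 10), (5, 20);
""")

# Keep only the row with the highest pid for each xid. The inner SELECT is
# wrapped in a derived table (alias m), as in the answer above.
conn.execute("""
    DELETE FROM personnel WHERE pid NOT IN (
        SELECT maxpid FROM (
            SELECT MAX(pid) AS maxpid
            FROM personnel
            GROUP BY xid
        ) AS m
    )
""")
remaining = conn.execute("SELECT pid, xid FROM personnel ORDER BY pid").fetchall()
print(remaining)
```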
This might be something very simple to do. If so, I apologize. I'm still learning MySQL.
Say, I have two tables:
Table1:
`id` int autoincrement primary key
`Name` tinytext
`Phone` tinytext
`Date` etc.
and
Table2:
`id` int autoincrement primary key
`itmID` int
Each row in Table2 specifies the order in which elements should be selected out of Table1. The itmID field in Table2 is linked to the id field in Table1.
So right at this moment to select elements from Table1 I do this:
SELECT * FROM `Table1`;
But how do you order them according to Table2, something like this?
SELECT * FROM `Table1` ORDER BY <itmID's in Table2> ASC;
If all ids of Table1 have an entry in Table2, use an INNER JOIN and order by Table2's own id, which defines the sequence:
SELECT * FROM Table1 t1
INNER JOIN Table2 t2 ON t1.id = t2.itmID
ORDER BY t2.id
If not all of them have an entry, use a LEFT JOIN instead:
SELECT * FROM Table1 t1
LEFT JOIN Table2 t2 ON t1.id = t2.itmID
ORDER BY t2.id
Select from the first table, join it to the second, and order by the second. Something like:
SELECT *
FROM table1
LEFT JOIN table2 ON table1.id = table2.id
ORDER BY table2.itmID
Ryan's answer is almost right
SELECT *
FROM table1
INNER JOIN table2 on table1.id = table2.itmID
ORDER BY table2.id
http://dev.mysql.com/doc/refman/5.5/en/join.html
SELECT * FROM `Table1`
INNER JOIN `Table2` ON `Table1`.`id` = `Table2`.`itmID`
ORDER BY `Table2`.`id` ASC
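To make the ordering concrete, here is a small sketch that joins on itmID and sorts by Table2's own id. It assumes that Table2.id is what defines the sequence, uses made-up names and rows, and runs on Python's sqlite3 module rather than MySQL.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Table2 (id INTEGER PRIMARY KEY, itmID INTEGER);
    INSERT INTO Table1 (id, Name) VALUES (1, 'A'), (2, 'B'), (3, 'C');
    -- Table2 rows, in order: item 3 first, then item 1, then item 2.
    INSERT INTO Table2 (itmID) VALUES (3), (1), (2);
""")

# Join each Table1 row to its position row in Table2, then sort by that position.
rows = conn.execute("""
    SELECT t1.Name
    FROM Table1 t1
    INNER JOIN Table2 t2 ON t1.id = t2.itmID
    ORDER BY t2.id
""").fetchall()
names = [row[0] for row in rows]
print(names)
```

Sorting by t2.itmID instead would simply reproduce Table1's id order, because itmID equals Table1.id for every joined row.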
I created a simple test case:
CREATE TABLE `t1` (
`id` int NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
)
CREATE TABLE `t2` (
`id2` int NOT NULL AUTO_INCREMENT,
`id1` int,
PRIMARY KEY (`id2`)
)
CREATE TABLE `t3` (
`id3` int NOT NULL AUTO_INCREMENT,
`id1` int,
PRIMARY KEY (`id3`)
)
insert into t1 (id) values (1);
insert into t2 (id1) values (1),(1);
insert into t3 (id1) values (1),(1),(1),(1);
I need to select all DISTINCT data from t1 left join t2 and DISTINCT data from t1 left join t3, returning a total of 6 rows, 1 x (2 [from t2] + 4 [from t3]) = 6, but because of the nature of this join I get 8 rows, 1 [from t1] x 2 [from t2] x 4 [from t3] = 8.
select * from t1 left join t2 on (t1.id = t2.id1);
2 rows in set (0.00 sec)
select * from t1 left join t3 on (t1.id = t3.id1);
4 rows in set (0.00 sec)
select * from t1 left join t2 on (t1.id = t2.id1) left join t3 on (t1.id = t3.id1);
8 rows in set (0.00 sec)
select * from t1 left join t2 on (t1.id = t2.id1) union select * from t1 left join t3 on (t1.id = t3.id1);
4 rows in set (0.00 sec)
What query should I use to get just the 6 rows I need? Is it possible without subqueries, or do I need them? (That would be more complicated in the big query where I need this.)
I need this for a big query where I already get data from 8 tables, but I need to get data from 2 more to have everything in one single query. When I join the 9th table, the returned data gets duplicated (in this simple test case, the 9th table would be t3 and the 8th would be t2).
I hope someone could show me the right path to follow.
Thank you.
UPDATE SOLVED:
I really don't know how to do this test case in one select, but in my BIG query I solved it this way: because I used group_concat and group by, I split the value into multiple group_concat(DISTINCT ... ) calls and concatenated all of them, like this:
-- instead of this
... group_concat(DISTINCT concat(val1, val2, val3)) ...
-- I did this
concat(group_concat(DISTINCT val1,val2), group_concat(DISTINCT val1,val3)) ...
So the DISTINCT on small groups of values prevents all of those duplicates.
I'm not sure if this is the solution you're looking for:
select * from t1 left join t2 on (t1.id = t2.id1);
union all
select * from t1 left join t3 on (t1.id = t3.id1);
I think there is a small mistake in @nick rulez's query. Written like this, it really does return 6 rows:
(SELECT * FROM t1 LEFT JOIN t2 ON (t1.id = t2.id1))
UNION ALL
(SELECT * FROM t1 LEFT JOIN t3 ON (t1.id = t3.id1))
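The row counts from the test case can be reproduced with a short script. This sketch uses Python's sqlite3 module instead of MySQL (an assumption). SQLite does not accept parentheses around the members of a UNION, so the queries are written without them; otherwise they follow the test case above.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); data matches the test case above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY);
    CREATE TABLE t2 (id2 INTEGER PRIMARY KEY, id1 INTEGER);
    CREATE TABLE t3 (id3 INTEGER PRIMARY KEY, id1 INTEGER);
    INSERT INTO t1 (id) VALUES (1);
    INSERT INTO t2 (id1) VALUES (1), (1);
    INSERT INTO t3 (id1) VALUES (1), (1), (1), (1);
""")

left_t2 = "SELECT * FROM t1 LEFT JOIN t2 ON (t1.id = t2.id1)"
left_t3 = "SELECT * FROM t1 LEFT JOIN t3 ON (t1.id = t3.id1)"

def count_rows(sql):
    """Return the number of rows a query produces."""
    return len(conn.execute(sql).fetchall())

# Joining both tables at once multiplies the matches: 2 x 4 rows.
both_joins = count_rows(
    "SELECT * FROM t1 LEFT JOIN t2 ON (t1.id = t2.id1) "
    "LEFT JOIN t3 ON (t1.id = t3.id1)"
)
# UNION removes duplicate rows, so the two t2 rows collapse into the t3 rows.
union_distinct = count_rows(left_t2 + " UNION " + left_t3)
# UNION ALL keeps every row from both halves: 2 + 4 rows.
union_all = count_rows(left_t2 + " UNION ALL " + left_t3)
print(both_joins, union_distinct, union_all)
```

Plain UNION returns 4 rows here because the t2 rows (1, 1, 1) and (1, 2, 1) are identical to the first two t3 rows and are dropped as duplicates.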