How can we optimize this DELETE query?
DELETE FROM student_score
WHERE lesson_id IS NOT NULL
  AND id NOT IN (SELECT MaxID FROM temp)
ORDER BY id
LIMIT 1000
The subquery SELECT MaxID FROM temp returns about 35k rows, and temp is a temporary table.
SELECT * FROM student_score WHERE lesson_id IS NOT NULL returns around 500k rows.
I tried using LIMIT and ORDER BY clauses, but that did not make the delete any faster.
IN (SELECT ...) is, in many situations, really inefficient.
Use a multi-table DELETE. This involves a LEFT JOIN ... IS NULL, which is much more efficient.
Once you have mastered that, you might be able to get rid of the temp and simply fold it into the query.
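For example, a minimal sketch of the multi-table form, assuming temp.MaxID holds the ids you want to keep (as your NOT IN implies):

DELETE s
FROM student_score AS s
LEFT JOIN temp AS t ON t.MaxID = s.id
WHERE s.lesson_id IS NOT NULL
  AND t.MaxID IS NULL;   -- no match in temp, so the row is deletable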
Also more efficient is
WHERE NOT EXISTS ( SELECT 1 FROM temp
                   WHERE student_score.id = temp.MaxID )
Also, DELETEing a large number of rows is inherently slow. 1000 is not so bad; 35K is. The reason is the need to save all the potentially-deleted rows until "commit" time.
Other techniques for big deletes: http://mysql.rjweb.org/doc.php/deletebig
Note that one of them explains a more efficient way to walk through the PRIMARY KEY (via id). Note that your query may have to step over lots of ids whose lesson_id IS NULL; that is, the LIMIT 1000 is not doing what you expected.
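As a rough sketch of that chunked approach (the id range here is an assumption; your application loops, advancing the range each pass until it goes past MAX(id)):

-- one pass; the next pass uses 1001..2000, and so on
DELETE s
FROM student_score AS s
LEFT JOIN temp AS t ON t.MaxID = s.id
WHERE s.id BETWEEN 1 AND 1000
  AND s.lesson_id IS NOT NULL
  AND t.MaxID IS NULL;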
You can do it without ORDER BY:
DELETE FROM student_score
WHERE lesson_id IS NOT NULL
AND id NOT IN (SELECT MaxID FROM temp)
Or like this, using a LEFT JOIN, which is faster:
DELETE s
FROM student_score s
LEFT JOIN temp t1 ON s.id = t1.MaxID
WHERE s.lesson_id IS NOT NULL AND t1.MaxID IS NULL;
I am looking to optimize the query below, which has a subquery on a relation table and an ORDER BY on the subquery's count. Please see the query:
SELECT table1.*,
( SELECT COUNT(*)
FROM table2
WHERE table2.user_id=table1.id
AND table2.deleted = 0) AS table2_total
FROM table1
WHERE table1.parent_id = 0
ORDER BY table2_total DESC LIMIT 0, 50
This query works well, but it gets stuck when table2 has more than 50K rows. I have also tried a LEFT JOIN instead of the subquery, but that is even slower:
SELECT table1.*,
       COUNT(DISTINCT table2.id) AS table2_total
FROM table1
LEFT JOIN table2 ON table2.user_id = table1.id
                AND table2.deleted = 0
WHERE table1.parent_id = 0
GROUP BY table1.id
ORDER BY table2_total DESC LIMIT 0, 50
table2 already has indexes on the user_id and deleted columns.
Is there any way to optimize this query further?
As written, it will go through the entirety of table1, and probe table2 that many times.
Add this composite index to table2: INDEX(user_id, deleted) and remove the INDEX(user_id) that you currently seem to have.
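Something like the following, assuming the existing single-column index is literally named user_id (check SHOW CREATE TABLE table2 for its real name first):

ALTER TABLE table2
    ADD INDEX idx_user_deleted (user_id, deleted),
    DROP INDEX user_id;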
You can try adding an index to the columns table2.deleted and table1.parent_id. Keep in mind that an index will impact the performance of INSERTs.
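For example (the index names here are illustrative):

ALTER TABLE table2 ADD INDEX idx_deleted (deleted);
ALTER TABLE table1 ADD INDEX idx_parent_id (parent_id);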
Does anyone know why joining to an empty temp table is very slow? When I have data in the temp table, the query runs in 0.2 seconds, but when the temp table is empty it takes 62 seconds to return an empty result. In my code, table1 is the empty table. Joining to an empty table should always produce an empty result, so why does this take so long?
drop table if exists table1;
CREATE TEMPORARY TABLE IF NOT EXISTS table1 AS
(
select
username, channelnumber, LINKEDCHANNELDATA.ID
from
voijavuusers.tbluserdata USERDATA
left join
voijavuusers.tbllinkedchanneldata LINKEDCHANNELDATA ON USERDATA.userguid = LINKEDCHANNELDATA.userguid
where
USERDATA.username = 'tatatata'
);
select
CALLDATA.id,
CALLDATA.chanid,
POPUPDATA.textboxfield1
from
trmsmain.tblcalldata CALLDATA
left join trmsmain.tblpopupdata POPUPDATA on CALLDATA.recordguid = POPUPDATA.recordguid
join
(select
username,
channelnumber,
ID
from
table1
where
username = 'tatatata') LINKEDCHANNELS ON CALLDATA.chanid = LINKEDCHANNELS.channelnumber
order by CALLDATA.id desc limit 100000;
This is a question about how the query is optimized -- the query plan. I would suggest removing the subquery around TABLE1. This should help the optimizer:
select cd.id, cd.chanid, pud.textboxfield1
from trmsmain.tblcalldata cd left join
trmsmain.tblpopupdata pud
on cd.recordguid = pud.recordguid join
table1 t1
on cd.chanid = t1.channelnumber
where t1.username = 'tatatata'
order by cd.id desc
limit 100000;
This doesn't guarantee that the optimizer will choose the right execution path, but it gives it more information to go on. It shouldn't really make a difference, but I would also be inclined to put table1 first in the from clause.
Do you think a query like this will create problems in the execution of my software?
I need to delete everything in the table except the last 2 groups of entries, where a group is the set of rows that share the same insert time.
DELETE FROM tableA WHERE time NOT IN
(
    SELECT time FROM
    (SELECT DISTINCT time FROM tableA ORDER BY time DESC LIMIT 2
    ) AS tmptable
);
Do you have a better solution? I'm using MySQL 5.5.
I don't see anything wrong with your query, but I prefer using an OUTER JOIN/NULL check (plus it alleviates the need for one of the nested subqueries):
delete a
from tableA a
left join
(
select distinct time
from tableA
order by time desc
limit 2
) b on a.time = b.time
where b.time is null
SQL Fiddle Demo
I'm coming from a Postgres background and trying to convert my application to MySQL. I have a query which is very fast on Postgres and very slow on MySQL. After doing some analysis, I have determined that one cause of the drastic speed difference is nested queries. The following pseudo query takes 170 ms on Postgres and 5.5 seconds on MySQL.
SELECT * FROM (
SELECT id FROM a INNER JOIN b
) AS first LIMIT 10
On both MySQL and Postgres the speed is the same for the following query (less than 10 ms)
SELECT id FROM a INNER JOIN b LIMIT 10
I have the exact same tables, indices, and data on both databases, so I really have no idea why this is so slow.
Any insight would be greatly appreciated.
Thanks
EDIT
Here is one specific example of why I need to do this. I need to get the sum of per-group maximums, and to do that I need a subselect, as shown in the query below.
SELECT SUM(a) AS a
FROM (
SELECT table2.b, MAX(table1.a) AS a
FROM table1
INNER JOIN table2 ON table2.abc_id = table1.abc_id
AND table1.read_datetime >= table2.issuance_datetime
AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
GROUP BY table2.id, b
) AS first
GROUP BY b
LIMIT 10
Again this query takes 14 seconds on MySQL and 238 ms on Postgres. Here is the output from explain on MySQL:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,PRIMARY,<derived2>,ALL,\N,\N,\N,\N,25584,Using temporary; Using filesort
2,DERIVED,table2,index,PRIMARY,index_table2_on_b,index_table2_on_d,index_table2_on_issuance_datetime,index_table2_on_unassignment_datetime,index_table2_on_e,PRIMARY,4,\N,25584,Using where
2,DERIVED,tz,ref,index_table1_on_d,index_table1_on_read_datetime,index_table1_on_d_and_read_datetime,index_table1_on_4,4,db.table2.dosimeter_id,1,Using where
Jon, answering your comment, here is an example:
drop table if exists temp_preliminary_table;
create temporary table temp_preliminary_table
SELECT table2.b, MAX(table1.a) AS a
FROM table1
INNER JOIN table2 ON table2.abc_id = table1.abc_id
AND table1.read_datetime >= table2.issuance_datetime
AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
GROUP BY table2.id, b;
-- I suggest you add indexes to this temp table
alter table temp_preliminary_table
add index idx_b(b); -- Add as many indexes as you need
-- Now perform your query on this temp_table
SELECT SUM(a) AS a
FROM temp_preliminary_table
GROUP BY b
LIMIT 10;
This is just an example, splitting your query in three steps.
You need to remember that temp tables in MySQL are only visible to the connection that created them, so other connections won't see them (for better or worse).
This "divide-and-conquer" approach has saved me many headaches. I hope it helps you.
In the nested query, MySQL does the whole join before applying the limit, while PostgreSQL is smart enough to figure out that it only needs to join 10 tuples.
Correct me if I am wrong, but why don't you try:
SELECT * FROM a INNER JOIN b LIMIT 10;
Given that table2.id is the primary key, this query, with the limit in the inner query, is functionally equivalent to yours where the limit is in the outer query, and that is what the PostgreSQL planner figured out.
SELECT table2.b, MAX(table1.a) AS a
FROM table1
INNER JOIN table2 ON table2.abc_id = table1.abc_id
AND table1.read_datetime >= table2.issuance_datetime
AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
GROUP BY table2.id, b
order by a desc
LIMIT 10