optimize subquery with count and order by field - mysql

I am looking to optimize the query below, which uses a subquery against a related table and orders by that subquery's count. Please see the query:
SELECT table1.*,
( SELECT COUNT(*)
FROM table2
WHERE table2.user_id=table1.id
AND table2.deleted = 0) AS table2_total
FROM table1
WHERE table1.parent_id = 0
ORDER BY table2_total DESC LIMIT 0, 50
This query works well, but it gets stuck when table2 has more than 50K rows. I have also tried using a LEFT JOIN instead of the subquery, but that is even slower:
SELECT table1.*,
COUNT(DISTINCT table2.id) as table2_total
FROM table1
LEFT JOIN table2 ON table2.user_id=table1.id
AND table2.deleted = 0
WHERE table1.parent_id = 0
GROUP BY table1.id
ORDER BY table2_total DESC LIMIT 0, 50
table2 already has separate indexes on the user_id and deleted columns.
Is there any way to optimize this query further?

As written, it will go through the entirety of table1, and probe table2 that many times.
Add this composite index to table2: INDEX(user_id, deleted) and remove the INDEX(user_id) that you currently seem to have.
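A minimal sketch of that change (the old index name is an assumption here; check SHOW INDEX FROM table2 for the actual name):
ALTER TABLE table2
ADD INDEX idx_user_deleted (user_id, deleted),
DROP INDEX user_id;  -- assumed name of the existing single-column index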

You can also try adding indexes on the table2.deleted and table1.parent_id columns. Keep in mind that each index adds some overhead to inserts.

Related

delete multiple rows in mysqldb

How can we optimize this delete query?
delete FROM student_score
WHERE lesson_id IS NOT null
AND id NOT IN(SELECT MaxID FROM temp)
ORDER BY id
LIMIT 1000
The subquery SELECT MaxID FROM temp returns about 35k rows, and temp is a temporary table.
SELECT * FROM student_score WHERE lesson_id IS NOT NULL returns around 500k rows.
I tried using LIMIT and an ORDER BY clause, but that did not make it any faster.
IN (SELECT ...) is, in many situations, really inefficient.
Use a multi-table DELETE. This involves a LEFT JOIN ... IS NULL, which is much more efficient.
Once you have mastered that, you might be able to get rid of the temp and simply fold it into the query.
Also more efficient is
WHERE NOT EXISTS ( SELECT 1 FROM temp
WHERE student_score.id = temp.MaxID )
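Put together as a full statement, that would look something like this (a sketch based on the columns named in the question):
DELETE FROM student_score
WHERE lesson_id IS NOT NULL
AND NOT EXISTS ( SELECT 1 FROM temp
WHERE temp.MaxID = student_score.id );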
Also, DELETEing a large number of rows is inherently slow. 1000 is not so bad; 35K is. The reason is the need to save all the potentially-deleted rows until "commit" time.
Other techniques for big deletes: http://mysql.rjweb.org/doc.php/deletebig
Note that one of them explains a more efficient way to walk through the PRIMARY KEY (via id). Note that your query may have to step over lots of ids whose rows have lesson_id IS NULL; that is, the LIMIT 1000 is not doing what you expected.
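A rough sketch of that chunking idea, assuming id is an AUTO_INCREMENT PRIMARY KEY (the range bounds below are only illustrative):
-- delete one chunk of ids at a time; advance the range and repeat until the end of the table
DELETE FROM student_score
WHERE id BETWEEN 1 AND 10000
AND lesson_id IS NOT NULL
AND NOT EXISTS ( SELECT 1 FROM temp
WHERE temp.MaxID = student_score.id );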
You can do it without the ORDER BY:
DELETE FROM student_score
WHERE lesson_id IS NOT null
AND id NOT IN (SELECT MaxID FROM temp)
Or like this, using a LEFT JOIN, which is faster:
DELETE s
FROM student_score s
LEFT JOIN temp t1 ON s.id = t1.MaxID
WHERE s.lesson_id IS NOT NULL AND t1.MaxID IS NULL;

MySQL joining to empty temp table is very slow

Does anyone know why joining to an empty temp table is very slow? When I have data in the temp table, the query runs in 0.2 seconds, but when the temp table is empty it takes 62 seconds to return an empty result. In my code, table1 is the empty table. Joining to an empty table should always produce an empty result, so why does this take so long?
drop table if exists table1;
CREATE TEMPORARY TABLE IF NOT EXISTS table1 AS
(
select
username, channelnumber, LINKEDCHANNELDATA.ID
from
voijavuusers.tbluserdata USERDATA
left join
voijavuusers.tbllinkedchanneldata LINKEDCHANNELDATA ON USERDATA.userguid = LINKEDCHANNELDATA.userguid
where
USERDATA.username = 'tatatata'
);
select
CALLDATA.id,
CALLDATA.chanid,
POPUPDATA.textboxfield1
from
trmsmain.tblcalldata CALLDATA
left join trmsmain.tblpopupdata POPUPDATA on CALLDATA.recordguid = POPUPDATA.recordguid
join
(select
username,
channelnumber,
ID
from
table1
where
username = 'tatatata') LINKEDCHANNELS ON CALLDATA.chanid = LINKEDCHANNELS.channelnumber
order by CALLDATA.id desc limit 100000;
This is a question about how the query is optimized -- the query plan. I would suggest removing the subquery around TABLE1. This should help the optimizer:
select cd.id, cd.chanid, pud.textboxfield1
from trmsmain.tblcalldata cd
left join trmsmain.tblpopupdata pud on cd.recordguid = pud.recordguid
join table1 t1 on cd.chanid = t1.channelnumber
where t1.username = 'tatatata'
order by cd.id desc
limit 100000;
This doesn't guarantee that the optimizer will choose the right execution path, but it gives it more information to go on. It shouldn't really make a difference, but I would also be inclined to put table1 first in the from clause.
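To check what plan the optimizer actually picks in the empty versus populated case, comparing EXPLAIN output for both runs is a quick sanity check; a sketch using the rewritten query:
EXPLAIN
select cd.id, cd.chanid, pud.textboxfield1
from trmsmain.tblcalldata cd
left join trmsmain.tblpopupdata pud on cd.recordguid = pud.recordguid
join table1 t1 on cd.chanid = t1.channelnumber
where t1.username = 'tatatata'
order by cd.id desc limit 100000;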

How to rewrite slow WHERE IN query

I am not very familiar with MySQL and was wondering how I might rewrite the following query to fix it and speed it up. I believe I'd have to use a JOIN (or something else), but I am not sure how to do this.
What I want to achieve is to perform a simple MATCH AGAINST query, and using the IDs from these results (in table2), retrieve all corresponding rows with the same IDs in table1.
SELECT *
FROM table1
WHERE id IN (
SELECT id
FROM table2
WHERE MATCH (gloss) AGAINST ('example' IN BOOLEAN MODE)
LIMIT 10
)
Note that I'm aware the subquery above doesn't work with a LIMIT.
Thank you for your time.
Try a simple inner join:
SELECT *
FROM table1
INNER JOIN table2
USING(id)
WHERE MATCH (gloss) AGAINST ('example' IN BOOLEAN MODE)
LIMIT 10
To run your IN with a LIMIT, you have to wrap it inside another query, like this:
SELECT *
FROM table1
WHERE id IN (SELECT id FROM
(SELECT id
FROM table2
WHERE MATCH (gloss) AGAINST ('example' IN BOOLEAN MODE)
LIMIT 10
)T1
)
To get exactly the same result as your IN with LIMIT 10 using an INNER JOIN, you can do this:
SELECT *
FROM table1
INNER JOIN
(SELECT id
FROM table2
WHERE MATCH (gloss) AGAINST ('example' IN BOOLEAN MODE)
LIMIT 10
)T1
USING (id)
Try adding an index on the id column of the table, like this:
ALTER TABLE TABLE_NAME ADD INDEX (COLUMN_NAME);
This should improve the performance of the query...
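For this particular query that would be something like the following (that detail isn't given in the question, so skip it if table1.id is already the primary key or indexed):
ALTER TABLE table1 ADD INDEX (id);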
Hope this helps... :)

MySQL index and preparing state

I have a complicated query:
SELECT * FROM
(
SELECT Transaction
FROM table1
WHERE
Transaction IN (SELECT Transaction FROM table2 WHERE Plugin='XXX' AND Server='XXX')
AND
Transaction NOT IN (SELECT Transaction FROM table1 WHERE Detail IN ('Monitor','Version','monitor','version'))
ORDER BY Date DESC, Millisecond DESC LIMIT 10)
AS res
I have an index on table1.Detail, and Transaction is the primary key of table2.
It takes a while (5-10 seconds) for the database to return the result. So I created another index, on table2.Plugin; the query itself is faster now, but a "preparing" state shows up that also takes 5-10 seconds. So after creating the new index, the total time has not changed at all.
Can someone tell me what's going on and how I can optimize this query? Thank you!
Could you not simply rewrite the query as follows:
SELECT a.Transaction
FROM table1 a
INNER JOIN table2 b ON b.Transaction = a.Transaction
WHERE (b.Plugin='XXX' AND b.Server='XXX')
AND a.Detail NOT IN ('Monitor','Version','monitor','version')
ORDER BY a.Date DESC, a.Millisecond DESC LIMIT 10
So you join table2 directly (which will be faster) and remove all the subqueries.
This should be much faster.
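If it is still slow, a composite index on table2 covering the filter columns may also help; this is only a suggestion, assuming no such index exists yet (InnoDB will append the Transaction primary key to it automatically):
ALTER TABLE table2 ADD INDEX idx_plugin_server (Plugin, Server);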

MySQL nested query speed

I'm coming from a Postgres background and trying to convert my application to MySQL. I have a query which is very fast on Postgres and very slow on MySQL. After doing some analysis, I have determined that one cause of the drastic speed difference is nested queries. The following pseudo query takes 170 ms on Postgres and 5.5 seconds on MySQL.
SELECT * FROM (
SELECT id FROM a INNER JOIN b
) AS first LIMIT 10
On both MySQL and Postgres the speed is the same for the following query (less than 10 ms)
SELECT id FROM a INNER JOIN b LIMIT 10
I have the exact same tables, indices, and data on both databases, so I really have no idea why this is so slow.
Any insight would be greatly appreciated.
Thanks
EDIT
Here is one specific example of why I need to do this. I need to get the sum of per-group maximums. In order to do this I need a subselect, as shown in the query below.
SELECT SUM(a) AS a
FROM (
SELECT table2.b, MAX(table1.a) AS a
FROM table1
INNER JOIN table2 ON table2.abc_id = table1.abc_id
AND table1.read_datetime >= table2.issuance_datetime
AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
GROUP BY table2.id, b
) AS first
GROUP BY b
LIMIT 10
Again this query takes 14 seconds on MySQL and 238 ms on Postgres. Here is the output from explain on MySQL:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,PRIMARY,<derived2>,ALL,\N,\N,\N,\N,25584,Using temporary; Using filesort
2,DERIVED,table2,index,PRIMARY,index_table2_on_b,index_table2_on_d,index_table2_on_issuance_datetime,index_table2_on_unassignment_datetime,index_table2_on_e,PRIMARY,4,\N,25584,Using where
2,DERIVED,tz,ref,index_table1_on_d,index_table1_on_read_datetime,index_table1_on_d_and_read_datetime,index_table1_on_4,4,db.table2.dosimeter_id,1,Using where
Jon, answering your comment, here is an example:
drop table if exists temp_preliminary_table;
create temporary table temp_preliminary_table
SELECT table2.b, MAX(table1.a) AS a
FROM table1
INNER JOIN table2 ON table2.abc_id = table1.abc_id
AND table1.read_datetime >= table2.issuance_datetime
AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
GROUP BY table2.id, b;
-- I suggest you add indexes to this temp table
alter table temp_preliminary_table
add index idx_b(b); -- Add as many indexes as you need
-- Now perform your query on this temp_table
SELECT SUM(a) AS a
FROM temp_preliminary_table
GROUP BY b
LIMIT 10;
This is just an example, splitting your query in three steps.
You need to remember that temp tables in MySQL are only visible to the connection that created them, so any other connection won't see temp tables created by another connection (for better or worse).
This "divide-and-conquer" approach has saved me many headaches. I hope it helps you.
In the nested query, MySQL is doing the whole join before applying the limit, while PostgreSQL is smart enough to figure out that it only needs to join any 10 tuples.
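In other words, when any 10 rows are acceptable, pushing the LIMIT into the derived table lets MySQL stop the join early; for the pseudo query that would be something like:
SELECT * FROM (
SELECT id FROM a INNER JOIN b LIMIT 10
) AS first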
Correct me if I am wrong, but why don't you try:
SELECT * FROM a INNER JOIN b LIMIT 10;
Given that table2.id is the primary key, this query with the limit in the inner query is functionally equivalent to yours with the limit in the outer query, and that is what the PostgreSQL planner figured out.
SELECT table2.b, MAX(table1.a) AS a
FROM table1
INNER JOIN table2 ON table2.abc_id = table1.abc_id
AND table1.read_datetime >= table2.issuance_datetime
AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
GROUP BY table2.id, b
order by a desc
LIMIT 10