I'm coming from a Postgres background and trying to convert my application to MySQL. I have a query which is very fast on Postgres and very slow on MySQL. After doing some analysis, I have determined that one cause of the drastic speed difference is nested queries. The following pseudo query takes 170 ms on Postgres and 5.5 seconds on MySQL.
SELECT * FROM (
SELECT id FROM a INNER JOIN b
) AS first LIMIT 10
On both MySQL and Postgres, the speed is the same for the following query (less than 10 ms):
SELECT id FROM a INNER JOIN b LIMIT 10
I have the exact same tables, indices, and data on both databases, so I really have no idea why this is so slow.
Any insight would be greatly appreciated.
Thanks
EDIT
Here is one specific example of why I need to do this. I need to get the sum of per-group maximums, and to do this I need a subselect, as shown in the query below.
SELECT SUM(a) AS a
FROM (
SELECT table2.b, MAX(table1.a) AS a
FROM table1
INNER JOIN table2 ON table2.abc_id = table1.abc_id
AND table1.read_datetime >= table2.issuance_datetime
AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
GROUP BY table2.id, b
) AS first
GROUP BY b
LIMIT 10
Again, this query takes 14 seconds on MySQL and 238 ms on Postgres. Here is the output from EXPLAIN on MySQL:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,PRIMARY,<derived2>,ALL,\N,\N,\N,\N,25584,Using temporary; Using filesort
2,DERIVED,table2,index,PRIMARY,index_table2_on_b,index_table2_on_d,index_table2_on_issuance_datetime,index_table2_on_unassignment_datetime,index_table2_on_e,PRIMARY,4,\N,25584,Using where
2,DERIVED,tz,ref,index_table1_on_d,index_table1_on_read_datetime,index_table1_on_d_and_read_datetime,index_table1_on_4,4,db.table2.dosimeter_id,1,Using where
Jon, answering your comment, here is an example:
drop table if exists temp_preliminary_table;
create temporary table temp_preliminary_table
SELECT table2.b, MAX(table1.a) AS a
FROM table1
INNER JOIN table2 ON table2.abc_id = table1.abc_id
AND table1.read_datetime >= table2.issuance_datetime
AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
GROUP BY table2.id, b;
-- I suggest you add indexes to this temp table
alter table temp_preliminary_table
add index idx_b(b); -- Add as many indexes as you need
-- Now perform your query on this temp_table
SELECT SUM(a) AS a
FROM temp_preliminary_table
GROUP BY b
LIMIT 10;
This is just an example, splitting your query into three steps.
Remember that temporary tables in MySQL are visible only to the connection that created them; no other connection will see them (for better or worse).
This "divide-and-conquer" approach has saved me many headaches. I hope it helps you.
In the nested query, MySQL does the whole join before applying the limit, while PostgreSQL is smart enough to figure out that it only needs to join any 10 tuples.
Correct me if I am wrong, but why don't you try:
SELECT * FROM a INNER JOIN b LIMIT 10;
Given that table2.id is the primary key, this query with the LIMIT in the inner query is functionally equivalent to yours with the LIMIT in the outer query, and that is what the PostgreSQL planner figured out.
SELECT table2.b, MAX(table1.a) AS a
FROM table1
INNER JOIN table2 ON table2.abc_id = table1.abc_id
AND table1.read_datetime >= table2.issuance_datetime
AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
GROUP BY table2.id, b
ORDER BY a DESC
LIMIT 10
Related
I am looking to optimize the query below, which has a subquery on a related table and an ORDER BY on that subquery's count. Please see the query:
SELECT table1.*,
( SELECT COUNT(*)
FROM table2
WHERE table2.user_id=table1.id
AND table2.deleted = 0) AS table2_total
FROM table1
WHERE table1.parent_id = 0
ORDER BY table2_total DESC LIMIT 0, 50
This query works well, but it gets stuck when table2 has more than 50K rows. I have also tried a LEFT JOIN instead of the subquery, but that is even slower:
SELECT table1.*,
COUNT(DISTINCT table2.id) as table2_total
FROM table1
LEFT JOIN table2 ON table2.user_id=table1.id
AND table2.deleted = 0
WHERE table1.parent_id = 0
GROUP BY table1.id
ORDER BY table2_total DESC LIMIT 0, 50
table2 already has indexes on the user_id and deleted columns.
Is there any way to optimize this query further?
As written, it will go through the entirety of table1, and probe table2 that many times.
Add this composite index to table2: INDEX(user_id, deleted) and remove the INDEX(user_id) that you currently seem to have.
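A minimal sketch of that change; the index names here are assumptions, so check SHOW INDEX FROM table2 for the actual name of the existing index before dropping it:
-- Composite index so each probe on user_id also covers the deleted filter
ALTER TABLE table2
  ADD INDEX idx_user_id_deleted (user_id, deleted),
  DROP INDEX user_id;  -- assumed name of the existing single-column index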
You can try adding indexes on the table2.deleted and table1.parent_id columns. Keep in mind that added indexes will impact the performance of inserts.
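A short sketch of that suggestion, with illustrative index names:
ALTER TABLE table2 ADD INDEX idx_deleted (deleted);      -- helps the deleted = 0 filter
ALTER TABLE table1 ADD INDEX idx_parent_id (parent_id);  -- helps the parent_id = 0 filter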
I want to use the same subquery multiple times in a UNION. This subquery is time-consuming, and I think that running it several times will increase the total execution time.
For example:
(SELECT * FROM (SELECT * FROM A INNER JOIN B ... AND SOME COMPLEX WHERE CONDITIONS) as T ORDER BY column1 DESC LIMIT 10)
UNION
(SELECT * FROM (SELECT * FROM A INNER JOIN B ... AND SOME COMPLEX WHERE CONDITIONS) as T ORDER BY column2 DESC LIMIT 10)
UNION
(SELECT * FROM (SELECT * FROM A INNER JOIN B ... AND SOME COMPLEX WHERE CONDITIONS) as T ORDER BY column3 DESC LIMIT 10)
Is the (SELECT * FROM A INNER JOIN B ... AND SOME COMPLEX WHERE CONDITIONS) executed 3 times?
If MySQL is smart enough, the inner subquery will be executed only once and I don't need any optimization; if not, I have to optimize it some other way (such as with a temporary table, which I want to avoid).
Do I have to rewrite this query with other syntax? Any suggestions?
In practice I want to filter some data out of a huge number of records and get some of them in 3 group-sections, each section in a different order.
Plan A:
A TEMPORARY TABLE cannot be referenced more than once in the same query. So, build a permanent table and DROP it when finished. (If multiple connections might be doing the same thing, it will be a hassle to make sure they are not using the same table name.)
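A sketch of Plan A, keeping the question's placeholder subquery and using an invented table name (work_t); pick a name that other connections are unlikely to reuse:
-- Materialize the expensive subquery once in a permanent work table
DROP TABLE IF EXISTS work_t;
CREATE TABLE work_t AS
SELECT * FROM A INNER JOIN B ... AND SOME COMPLEX WHERE CONDITIONS;

(SELECT * FROM work_t ORDER BY column1 DESC LIMIT 10)
UNION
(SELECT * FROM work_t ORDER BY column2 DESC LIMIT 10)
UNION
(SELECT * FROM work_t ORDER BY column3 DESC LIMIT 10);

DROP TABLE work_t;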
Plan B:
With MySQL 8.0, you can do
WITH T AS ( SELECT ... )
SELECT ... FROM T ORDER BY col1
UNION ...
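Spelled out against the query in the question (still with its placeholder join and WHERE conditions), that could look like this; the CTE is written once and referenced by all three branches:
WITH T AS (
  SELECT * FROM A INNER JOIN B ... AND SOME COMPLEX WHERE CONDITIONS
)
(SELECT * FROM T ORDER BY column1 DESC LIMIT 10)
UNION
(SELECT * FROM T ORDER BY column2 DESC LIMIT 10)
UNION
(SELECT * FROM T ORDER BY column3 DESC LIMIT 10);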
Plan C:
If it is possible to do this:
SELECT id FROM A
ORDER BY col1 LIMIT 10
You could use that as a 'derived' table inside
(SELECT * FROM A INNER JOIN B ... AND SOME COMPLEX WHERE CONDITIONS)
Something like
SELECT A.*, B.*
FROM ( SELECT id FROM A
ORDER BY col1 LIMIT 10 ) AS x1
JOIN A USING(id)
JOIN B ... AND SOME COMPLEX WHERE CONDITIONS
Similarly for the other two SELECTs, then UNION them together.
Better yet, UNION together the 3 sets of ids, then JOIN to A and B once.
This may have the advantage of dealing with fewer rows.
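A sketch of that last suggestion, assuming id is the primary key of A and keeping the question's placeholder conditions:
SELECT A.*, B.*
FROM (
      ( SELECT id FROM A ORDER BY col1 LIMIT 10 )
      UNION
      ( SELECT id FROM A ORDER BY col2 LIMIT 10 )
      UNION
      ( SELECT id FROM A ORDER BY col3 LIMIT 10 )
     ) AS x1
JOIN A USING(id)
JOIN B ... AND SOME COMPLEX WHERE CONDITIONS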
So... I have a query which looks like this:
SELECT t.* FROM (select * from sp2010 LIMIT 300000) t WHERE t.something = 1;
It works fine, but when I want to retrieve more than 300,000 rows it just gets stuck and my MySQL server stops responding.
What's going on?
Just to optimize memory usage a bit, you can do:
SELECT t.*
FROM sp2010 t
INNER JOIN (
SELECT id
FROM sp2010
LIMIT 500000) t2
ON t.id = t2.id
WHERE t.something = 1;
This way the subquery produces less data (only the id column), and the filtering is faster because the JOIN back to the indexed table works against that small in-memory result instead of applying the WHERE to the whole temporary dataset.
Do you think a query like this will create problems in the execution of my software?
I need to delete everything in the table except the last 2 groups of entries, where a group is the set of rows that share the same insert time.
delete from tableA WHERE time not in
(
SELECT time FROM
(select distinct time from tableA order by time desc limit 2
) AS tmptable
);
Do you have a better solution? I'm using MySQL 5.5.
I don't see anything wrong with your query, but I prefer using an OUTER JOIN/NULL check (plus it alleviates the need for one of the nested subqueries):
delete a
from tableA a
left join
(
select distinct time
from tableA
order by time desc
limit 2
) b on a.time = b.time
where b.time is null
I am trying to make a view of records in t1 where the source id from t1 is not in t2.
Like... "what records are not present in the other table?"
Do I need to include t2 in the FROM clause? Thanks
SELECT t1.fee_source_id, t1.company_name, t1.document
FROM t1
WHERE t1.fee_source_id NOT IN (
SELECT t1.fee_source_id
FROM t1 INNER JOIN t2 ON t1.fee_source_id = t2.fee_source
)
ORDER BY t1.aif_id DESC
You're looking to effect an anti-join, for which there are three possibilities in MySQL:
Using IN:
SELECT fee_source_id, company_name, document
FROM t1
WHERE fee_source_id NOT IN (SELECT fee_source FROM t2)
ORDER BY aif_id DESC
Using EXISTS:
SELECT fee_source_id, company_name, document
FROM t1
WHERE NOT EXISTS (
SELECT * FROM t2 WHERE t2.fee_source = t1.fee_source_id LIMIT 1
)
ORDER BY aif_id DESC
Using JOIN:
SELECT t1.fee_source_id, t1.company_name, t1.document
FROM t1 LEFT JOIN t2 ON t2.fee_source = t1.fee_source_id
WHERE t2.fee_source IS NULL
ORDER BY t1.aif_id DESC
According to Quassnoi's analysis:
Summary
MySQL can optimize all three methods to do a sort of NESTED LOOPS ANTI JOIN.
It will take each value from t_left and look it up in the index on t_right.value. In case of an index hit or an index miss, the corresponding predicate will immediately return FALSE or TRUE, respectively, and the decision to return the row from t_left or not will be made immediately without examining other rows in t_right.
However, these three methods generate three different plans, which are executed by three different pieces of code. The code that executes the EXISTS predicate is about 30% less efficient than the code that executes index_subquery and the LEFT JOIN optimized to use the Not exists method.
That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.
However, I'm not entirely sure how this analysis reconciles with the MySQL manual section on Optimizing Subqueries with EXISTS Strategy which (to my reading) suggests that the second approach above should be more efficient than the first.
Another option below (similar to an anti-join) is a set difference using MINUS. Note that MINUS is Oracle syntax and is not supported by MySQL; a MySQL EXCEPT sketch follows the query. Great answer above, though. Thanks!
SELECT D1.deptno, D1.dname
FROM dept D1
MINUS
SELECT D2.deptno, D2.dname
FROM dept D2, emp E2
WHERE D2.deptno = E2.deptno
ORDER BY 1;
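For completeness, here is the same set-difference idea applied to this question's tables, using EXCEPT (available from MySQL 8.0.31) and joining back to t1 for the full rows; this is only a sketch, not a drop-in replacement:
SELECT t1.fee_source_id, t1.company_name, t1.document
FROM t1
JOIN (
      SELECT fee_source_id FROM t1
      EXCEPT
      SELECT fee_source FROM t2
     ) AS missing USING (fee_source_id)
ORDER BY t1.aif_id DESC;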