MySQL queries performance comparison

Between these 2 queries, is one faster than the other in MySQL 5.7.19?
select {some columns}
from
table1 t1
join table2 t2 using ({some columns})
where t1.col1=1
vs
select {some columns}
from
(select {some columns} from table1 where col1=1) t1
join table2 t2 using ({some columns})
Assume that all indexes are set up correctly.

I've created a SQL Fiddle so we can experiment.
Your first query translates to:
select *
from
table1 t1
join table2 t2 on t2.table1_id = t1.id
where t1.col1=1
And the execution plan is:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE t1 ref id,col1 col1 4 const 2 100.00 Using index
1 SIMPLE t2 ref table1_id table1_id 4 db_9_0005cd.t1.id 1 100.00 Using index
This is pretty much as fast as it can possibly be.
Your second query becomes
select *
from
(select * from table1 where col1=1) as t1
join table2 t2 on t2.table1_id = t1.id
And the execution plan is:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY t2 index table1_id table1_id 4 3 100.00 Using index
1 PRIMARY <derived2> ref <auto_key0> <auto_key0> 4 db_9_0005cd.t2.table1_id 2 100.00
2 DERIVED table1 ref col1 col1 4 const 2 100.00 Using index
The difference here is that you're using a derived table, but it's still using the index. My expectation is that this would perform as quickly as version 1, as long as the database is not resource constrained. If you're bumping up against memory or CPU limits, the second query may behave slightly less predictably.
However...
The theoretical approach is no substitute for having a test environment with test data, and tuning this thing in representative conditions. I doubt that the real query you're building will be as simple as the examples...

For that simple pair of queries, the one involving the "derived table" (subquery) will definitely be no faster.
There are other cases where a derived table can be faster. These include cases where a GROUP BY or LIMIT decreases the number of rows before doing the JOIN.

Related

Super slow SQL query when `WHERE` and `OR` are used together [duplicate]

The following query takes MySQL almost 7 times longer to execute than getting the same result with two separate queries that avoid OR in the WHERE clause. I prefer using a single query, as I can sort and group everything.
Here is the problematic query:
EXPLAIN SELECT *
FROM `posts`
LEFT JOIN `teams_users`
ON (teams_users.team_id=posts.team_id
AND teams_users.user_id='7135')
WHERE (teams_users.status='1'
OR posts.user_id='7135');
Result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE posts ALL user_id NULL NULL NULL 169642
1 SIMPLE teams_users eq_ref PRIMARY PRIMARY 8 posts.team_id,const 1 Using where
Now if I run the following two queries instead, the combined execution time is, as said, about 7 times shorter:
EXPLAIN SELECT *
FROM `posts`
LEFT JOIN `teams_users`
ON (teams_users.team_id=posts.team_id
AND teams_users.user_id='7135')
WHERE (teams_users.status='1');
Result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE teams_users ref PRIMARY,status status 1 const 5822 Using where
1 SIMPLE posts ref team_id team_id 5 teams_users.team_id 9 Using where
and:
EXPLAIN SELECT *
FROM `posts`
LEFT JOIN `teams_users`
ON (teams_users.team_id=posts.team_id
AND teams_users.user_id='7135')
WHERE (posts.user_id='7135');
Result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE posts ref user_id user_id 4 const 142
1 SIMPLE teams_users eq_ref PRIMARY PRIMARY 8 posts.team_id,const 1
Obviously the number of scanned rows is much lower for the two separate queries.
Why is the initial query slow?
Thanks.
Yes, OR is frequently a performance-killer. A common work-around is to do UNION. For your example:
SELECT *
FROM `posts`
LEFT JOIN `teams_users`
ON (teams_users.team_id=posts.team_id
AND teams_users.user_id='7135')
WHERE (teams_users.status='1')
UNION DISTINCT
SELECT *
FROM `posts`
LEFT JOIN `teams_users`
ON (teams_users.team_id=posts.team_id
AND teams_users.user_id='7135')
WHERE (posts.user_id='7135');
If you are sure there are no dups, change to the faster UNION ALL.
If you are not fishing for missing teams_users rows, use JOIN instead of LEFT JOIN.
If you need ORDER BY, add some parens:
( SELECT ... )
UNION ...
( SELECT ... )
ORDER BY ...
Otherwise, the ORDER BY would apply only to the second SELECT. (If you also need 'pagination', see my blog.)
Please note that you might also need LIMIT in certain circumstances.
The queries without the OR clause are both sargable. That is, they both can be satisfied using indexes.
The query with the OR would be sargable if the MySQL query planner contained logic to figure out that it can be rewritten as the UNION ALL of two queries. But the MySQL query planner doesn't (yet) have that kind of logic.
So, it does table scans to get the result set. Those are often very slow.
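As a rough check that the UNION rewrite returns the same rows as the OR query, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the schema and sample rows are invented for illustration, so it only demonstrates that the results match, not the speed difference.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; schema and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, team_id INTEGER, user_id INTEGER);
CREATE TABLE teams_users (team_id INTEGER, user_id INTEGER, status INTEGER,
                          PRIMARY KEY (team_id, user_id));
INSERT INTO posts VALUES (1, 10, 7135), (2, 10, 42), (3, 20, 42), (4, 30, 7135);
INSERT INTO teams_users VALUES (10, 7135, 1), (20, 7135, 0), (30, 99, 1);
""")

# Shared FROM/JOIN part of every query in this thread.
JOIN = """
FROM posts
LEFT JOIN teams_users
  ON teams_users.team_id = posts.team_id AND teams_users.user_id = 7135
"""

# Original form: a single query with OR in the WHERE clause.
or_rows = conn.execute(
    "SELECT * " + JOIN + " WHERE teams_users.status = 1 OR posts.user_id = 7135"
).fetchall()

# Rewritten form: one SELECT per condition, combined with UNION (distinct).
union_rows = conn.execute(
    "SELECT * " + JOIN + " WHERE teams_users.status = 1 "
    "UNION "
    "SELECT * " + JOIN + " WHERE posts.user_id = 7135"
).fetchall()

print(sorted(row[0] for row in or_rows))
print(sorted(row[0] for row in union_rows))
```

Post 1 satisfies both conditions, which is why the plain UNION (distinct) is needed here; with UNION ALL it would appear twice.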

MYSQL Insert query fails with total number of locks exceeds the lock table size

The table table1 contains 1,500,000 rows and 80 fields. I want to remove the duplicates based on field1 and field2. The ID field is unique, so I keep the row with the maximum ID in each group.
Option1: Insert Option
insert into table2_unique
select * from table1 a
where a.id = ( select max(b.id) from table1 b
where a.field1 = b.field1
and a.field2 = b.field2 );
But the query fails because of the below error.
Error Code: 1206. The total number of locks exceeds the lock table size
Explain Statement:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 INSERT table2 NULL ALL NULL NULL NULL NULL NULL NULL NULL
1 PRIMARY a NULL ALL NULL NULL NULL NULL 1387764 100 Using where
2 DEPENDENT SUBQUERY b NULL ref field1x,field2x field1x 39 a.field1 537 10 Using where
Option2 DELETE Statement:
DELETE n1 FROM table1 n1, table1 n2 WHERE n1.id > n2.id AND n1.field1 = n2.field1 AND n1.field2 = n2.field2
When I execute it, a deadlock occurs.
I am not able to increase the buffer pool size, so please let me know whether I should write the query in a different way.
After increasing INNODB_BUFFER_POOL_SIZE in the my.ini file, the query ran in 27 minutes for the specified volume.
I'm not sure how it will impact the locks, but using dependent subqueries (i.e. pushed predicates) in MySQL has never worked very well in my experience. I would have written the first query as:
insert into table2_unique (id, col1, col2, ...col79)
select a.id, a.col1, a.col2, ...a.col79
from table1 a
Inner join (
Select max(b.id) as id
From table1 b
Group by b.col1, b.col2
) As dedup
On a.id=dedup.id;
Trying to update a table using a join is always a bit dodgy. When it's a self-join, it's not surprising that it fails. Using a temporary table and splitting the operation into 2 steps avoids this.
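To illustrate the GROUP BY / MAX(id) join, here is a small sketch run against an in-memory SQLite database through Python's sqlite3 module (a stand-in for MySQL). The four-column table and its rows are made up, and the grouping uses the question's field1 and field2 names. It shows which rows survive, not how the locking behaves.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the tiny schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT, payload TEXT);
CREATE TABLE table2_unique (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT, payload TEXT);
INSERT INTO table1 VALUES
  (1, 'a', 'x', 'old'), (2, 'a', 'x', 'new'),
  (3, 'b', 'y', 'only'), (4, 'a', 'z', 'other');
""")

# The derived table computes one MAX(id) per (field1, field2) group in a
# single pass; the outer join then fetches the full row for each kept id.
conn.execute("""
INSERT INTO table2_unique (id, field1, field2, payload)
SELECT a.id, a.field1, a.field2, a.payload
FROM table1 a
INNER JOIN (
    SELECT MAX(b.id) AS id
    FROM table1 b
    GROUP BY b.field1, b.field2
) AS dedup ON a.id = dedup.id
""")

kept_ids = [row[0] for row in conn.execute("SELECT id FROM table2_unique ORDER BY id")]
print(kept_ids)
```

Rows 1 and 2 share the same (field1, field2) pair, so only the one with the higher id is copied.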

MySQL: dependent subquery select type for uncorrelated subquery

There are DDL statements:
CREATE TABLE t1(
c1 INT NOT NULL
);
CREATE TABLE t2(
c2 INT NOT NULL
);
My query:
SELECT c1 FROM t1 WHERE c1 NOT IN (SELECT c2 from t2)
EXPLAIN output:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t1 ALL NULL NULL NULL NULL 3 Using where
2 DEPENDENT SUBQUERY t2 ALL NULL NULL NULL NULL 3 Using where
The subquery doesn't correlate with the outer query. Why is its select type DEPENDENT SUBQUERY?
UPD: query is SELECT c1 FROM t1 WHERE c1 NOT IN (SELECT c2 from t2)
The execution planner / optimizer of the MySQL version you are using rewrites the query internally as a correlated subquery (or, to be more accurate, both of these are transformed into the same execution plan):
SELECT c1
FROM t1
WHERE NOT EXISTS
(SELECT * from t2 WHERE c2 = t1.c1) ;
This type of query is called antijoin (or anti-semijoin) and can also be written in another way, with a LEFT JOIN / WHERE IS NULL, which produces in MySQL (5.1 and 5.5 versions) a slightly different Explain plan:
SELECT t1.c1
FROM t1
LEFT JOIN t2 ON t2.c2 = t1.c1
WHERE t2.c2 IS NULL ;
Notice that in other versions, like 5.6 (which is still in development), or in MariaDB (which has some optimizer improvements in recent versions), the queries may be rewritten differently.
Even in the same version, the final execution plan for the same query (and especially for more complex ones) may vary from execution to execution, depending on the indexes available, the sizes of the tables and several other factors.
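The three anti-join spellings above can be compared on a toy dataset. This is a sketch using Python's sqlite3 module in place of MySQL, with invented rows. The three forms return the same rows here because c1 and c2 are declared NOT NULL, as in the question's DDL; if the subquery could return NULL, NOT IN would return no rows at all.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the inserted values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (c1 INT NOT NULL);
CREATE TABLE t2 (c2 INT NOT NULL);
INSERT INTO t1 VALUES (1), (2), (3);
INSERT INTO t2 VALUES (2), (3), (4);
""")

queries = {
    "not_in": "SELECT c1 FROM t1 WHERE c1 NOT IN (SELECT c2 FROM t2)",
    "not_exists": """SELECT c1 FROM t1
                     WHERE NOT EXISTS (SELECT * FROM t2 WHERE c2 = t1.c1)""",
    "left_join": """SELECT t1.c1 FROM t1
                    LEFT JOIN t2 ON t2.c2 = t1.c1
                    WHERE t2.c2 IS NULL""",
}

# Run each spelling and collect the c1 values it returns.
results = {name: sorted(row[0] for row in conn.execute(sql))
           for name, sql in queries.items()}
print(results)
```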

mysql join performance IF multiple OR conditions

I hold a set of nodes in one mysql table1 and a table of edges in another one (table2). Nodes come with primary keys and edges use this "foreign key"
**table1**
id label
1 node1
2 node2
3 node3
**table2**
FK_first FK_sec rel
1 3 guardian
2 1 guardian
1 3 times
I know the db design is not perfect, but it's simple...
Now I want the number of 'rel' for every node and run a query like:
SELECT
label,
COUNT( rel ) as freq
FROM
`table1`
LEFT JOIN table2 ON (id=FK_first OR id=FK_second)
GROUP BY label
ORDER BY freq DESC
I have about 1000 nodes and 2000 edges. With ON (id=FK_first) alone, the query is way faster (<1 sec). The query with the OR needs about 6 sec, which is very slow.
I would appreciate some comments to speed this up a bit :-)
LEFT JOIN table2 ON (id=FK_first OR id=FK_second) ~6 sec
LEFT JOIN table2 ON (id=FK_first) ~0.16 sec
LEFT JOIN table2 ON (id=FK_second) ~0.16 sec
LEFT JOIN table2 ON id IN (FK_first,FK_second) ~6 sec
EXPLAIN 1:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table1 ALL NULL NULL NULL NULL 2571 Using temporary; Using filesort
1 SIMPLE table2 ALL FK_first,FK_second,FK_first_2 NULL NULL NULL 3858
EXPLAIN 2:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table1 index NULL PRIMARY 2 NULL 2571 Using index; Using temporary; Using filesort
1 SIMPLE table2 ref FK_first,FK_first_2 FK_first_2 4 table1.id 1
Try doing two joins and moving the "OR" into the COUNT() function.
For every row, this joins table2 once on FK_first, then again on FK_second (if it is not already joined to that row via FK_first). The COUNT then counts only rows where either join's rel column is non-null:
SELECT
label,
COUNT( table2A.rel || table2B.rel ) as freq
FROM
`table1`
LEFT JOIN
table2 as table2A
ON id=table2A.FK_first
LEFT JOIN
table2 as table2B
ON id=table2B.FK_second
AND table2A.FK_first != table2B.FK_first
GROUP BY label
ORDER BY freq DESC
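A different way to avoid OR in the join condition, not taken from the answer above, is to turn each edge into two rows (one per endpoint) with UNION ALL and join on a single column. Below is a sketch using Python's sqlite3 module as a stand-in for MySQL, loaded with the sample rows from the question; the alias names e and node_id are invented, and whether this is faster on the real MySQL tables would need to be measured.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; data copied from the question's sample.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE table2 (FK_first INTEGER, FK_second INTEGER, rel TEXT);
INSERT INTO table1 VALUES (1, 'node1'), (2, 'node2'), (3, 'node3');
INSERT INTO table2 VALUES (1, 3, 'guardian'), (2, 1, 'guardian'), (1, 3, 'times');
""")

# Each edge contributes one row per endpoint, so the outer join needs only
# a plain equality on node_id instead of an OR over two columns.
rows = conn.execute("""
SELECT t.label, COUNT(e.rel) AS freq
FROM table1 t
LEFT JOIN (
    SELECT FK_first AS node_id, rel FROM table2
    UNION ALL
    SELECT FK_second AS node_id, rel FROM table2
) e ON e.node_id = t.id
GROUP BY t.label
ORDER BY freq DESC, t.label
""").fetchall()
print(rows)
```

One difference from the OR join: an edge whose two endpoints are the same node would be counted twice here, while the OR version counts it once.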

MySQL query optimisation with group by and order by rand

I have a problem with the following query which is very slow :
SELECT A.* FROM B
INNER JOIN A ON A.id=B.fk_A
WHERE A.creationDate BETWEEN '20120309' AND '20120607'
GROUP BY A.id
ORDER BY RAND()
LIMIT 0,5
EXPLAIN :
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE B index fk_A fk_A 4 \N 58962 Using index; Using temporary; Using filesort
1 SIMPLE A eq_ref PRIMARY,creationDate PRIMARY 4 B.fk_A 1 Using where
INDEXES :
A.id (int) = PRIMARY index
A.creationDate (date) = index
B.fk_A = index
Do you see something to optimize ?
Thanks a lot for your advice
I think the RAND() function will create a RAND() value for every row (this is why "Using temporary" shows up), and the filesort appears because it can't use an index.
The best way would be to SELECT MAX(id) FROM a to get the max value.
Then create 5 random numbers between 1 and MAX(id) and do a SELECT ... WHERE a.id IN (...) query.
If the result has fewer than 5 rows (because a record has been deleted), repeat the procedure until you have enough (or initially create 100 random numbers and LIMIT the query to 5).
That is not a 100% MySQL solution, because you have to do the logic in your code, but it will be much faster, I believe.
Update
Just found an interesting article on the net that says basically the same thing: http://akinas.com/pages/en/blog/mysql_random_row/
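The procedure described above (get MAX(id), draw random candidate ids, look them up with IN, repeat if too few exist) might look roughly like this in application code. This is a sketch only: it uses Python's sqlite3 module with an in-memory table as a stand-in for MySQL, the function name pick_random_ids and its parameters are hypothetical, and the creationDate and B-table conditions from the question are left out (they would go into the WHERE clause of the lookup).

```python
import random
import sqlite3


def pick_random_ids(conn, wanted=5, batch=100, max_rounds=20, rng=random):
    """Return up to `wanted` existing ids from table A, chosen at random."""
    max_id = conn.execute("SELECT MAX(id) FROM A").fetchone()[0]
    if max_id is None:  # empty table
        return []
    found = set()
    for _ in range(max_rounds):  # bounded retries instead of looping forever
        # Draw candidate ids; some may not exist because rows were deleted.
        candidates = {rng.randint(1, max_id) for _ in range(batch)} - found
        if candidates:
            placeholders = ",".join("?" * len(candidates))
            rows = conn.execute(
                f"SELECT id FROM A WHERE id IN ({placeholders})",
                tuple(candidates),
            ).fetchall()
            found.update(row[0] for row in rows)
        if len(found) >= wanted:
            break
    return rng.sample(sorted(found), min(wanted, len(found)))


# Demo table with gaps: ids 1..50, with every multiple of 7 "deleted".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO A VALUES (?)",
                 [(i,) for i in range(1, 51) if i % 7 != 0])

sample = pick_random_ids(conn)
print(sample)
```

The lookup by primary key uses the index, so the cost depends on the batch size rather than on the number of rows in the table.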
One possible rewriting of the query:
SELECT A.*
FROM A
WHERE A.creationDate BETWEEN '20120309' AND '20120607'
AND EXISTS
( SELECT *
FROM B
WHERE A.id = B.fk_A
)
ORDER BY RAND()
LIMIT 0,5