Why is this MySQL query poor performance (DEPENDENT_SUBQUERY) - mysql

explain select id, nome from bea_clientes where id in (
select group_concat(distinct(bea_clientes_id)) as list
from bea_agenda
where bea_clientes_id>0
and bea_agente_id in(300006,300007,300008,300009,300010,300011,300012,300013,300014,300018,300019,300020,300021,300022)
)
When I try to do the above (without the explain), MySQL simply goes busy, using DEPENDENT SUBQUERY, which makes this slow as hell. The thing is why the optimizer calculates the subquery for each ids in client. I even put the IN argument in a group_concat believing that would be the same to put that result as a plain "string" to avoid scanning.
I thought this wouldn't be a problem for MySQL server which is 5.5+?
Testing in MariaDb also does the same.
Is this a known bug? I know I can rewrite this as a join, but still this is terrible.
Generated by: phpMyAdmin 4.4.14 / MySQL 5.6.26
Comando SQL: explain select id, nome from bea_clientes where id in ( select group_concat(distinct(bea_clientes_id)) as list from bea_agenda where bea_clientes_id>0 and bea_agente_id in(300006,300007,300008,300009,300010,300011,300012,300013,300014,300018,300019,300020,300021,300022) );
Lines: 2
Current selection does not contain a unique column. Grid edit, checkbox, Edit, Copy and Delete features are not available.
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|----|--------------------|--------------|-------|-------------------------------|---------------|---------|------|-------|------------------------------------|
| 1 | PRIMARY | bea_clientes | ALL | NULL | NULL | NULL | NULL | 30432 | Using where |
| 2 | DEPENDENT SUBQUERY | bea_agenda | range | bea_clientes_id,bea_agente_id | bea_agente_id | 5 | NULL | 2352 | Using index condition; Using where |

Obviously hard to test without the data but something like below.
Subqueries are just not good in mysql (though its my prefered engine).
I could also recommend indexing the relevant columns which will improve performance for both queries.
For clarity can I also advise expanding queries.
select t1.id,t1.nome from (
(select group_concat(distinct(bea_clientes_id)) as list from bea_agenda where bea_clientes_id>0 and bea_agente_id in (300006,300007,300008,300009,300010,300011,300012,300013,300014,300018,300019,300020,300021,300022)
) as t1
join
(select id, nome from bea_clientes) as t2
on t1.list=t2.id
)

Related

Optimizing MySQL select distinct order by limit safely

I have a problematic query that I know how to write faster, but technically the SQL is invalid and it has no guarantee of working correctly into the future.
The original, slow query looks like this:
SELECT sql_no_cache DISTINCT r.field_1 value
FROM table_middle m
JOIN table_right r on r.id = m.id
WHERE ((r.field_1) IS NOT NULL)
AND (m.kind IN ('partial'))
ORDER BY r.field_1
LIMIT 26
This takes about 37 seconds. Explain output:
+----+-------------+-------+--------+-----------------------+---------------+---------+---------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | rows | Extra |
+----+-------------+-------+--------+-----------------------+---------------+---------+---------+-----------------------------------------------------------+
| 1 | SIMPLE | r | range | PRIMARY,index_field_1 | index_field_1 | 9 | 1544595 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | m | eq_ref | PRIMARY,index_kind | PRIMARY | 4 | 1 | Using where; Distinct |
+----+-------------+-------+--------+-----------------------+---------------+---------+---------+-----------------------------------------------------------+
The faster version looks like this; the order by clause is pushed down into a subquery, which is joined on and is in turn limited with distinct:
SELECT sql_no_cache DISTINCT value
FROM (
SELECT r.field_1 value
FROM table_middle m
JOIN table_right r ON r.id = m.id
WHERE ((r.field_1) IS NOT NULL)
AND (m.kind IN ('partial'))
ORDER BY r.field_1
) t
LIMIT 26
This takes about 2.7 seconds. Explain output:
+----+-------------+------------+--------+-----------------------+------------+---------+---------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | rows | Extra |
+----+-------------+------------+--------+-----------------------+------------+---------+---------+-----------------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | 1346348 | Using temporary |
| 2 | DERIVED | m | ref | PRIMARY,index_kind | index_kind | 99 | 1539558 | Using where; Using index; Using temporary; Using filesort |
| 2 | DERIVED | r | eq_ref | PRIMARY,index_field_1 | PRIMARY | 4 | 1 | Using where |
+----+-------------+------------+--------+-----------------------+------------+---------+---------+-----------------------------------------------------------+
There are three million rows in table_right and table_middle, and all mentioned columns are individually indexed. The query should be read as having an arbitrary where clause - it's dynamically generated. The query can't be rewritten in any way that prevents the where clause being easily replaced, and similarly the indexes can't be changed - MySQL doesn't support enough indexes for the number of potential filter field combinations.
Has anyone seen this problem before - specifically, select / distinct / order by / limit performing very poorly - and is there another way to write the same query with good performance that doesn't rely on unspecified implementation behaviour?
(AFAIK MariaDB, for example, ignores order by in a subquery because it should not logically affect the set-theoretic semantics of the query.)
For the more incredulous
Here's how you can create a version of database for yourself! This should output a SQL script you can run with mysql command-line client:
#!/usr/bin/env ruby
puts "create database testy;"
puts "use testy;"
puts "create table table_right(id int(11) not null primary key, field_0 int(11), field_1 int(11), field_2 int(11));"
puts "create table table_middle(id int(11) not null primary key, field_0 int(11), field_1 int(11), field_2 int(11));"
puts "begin;"
3_000_000.times do |x|
puts "insert into table_middle values (#{x},#{x*10},#{x*100},#{x*1000});"
puts "insert into table_right values (#{x},#{x*10},#{x*100},#{x*1000});"
end
puts "commit;"
Indexes aren't important for reproducing the effect. The script above is untested; it's an approximation of a pry session I had when reproducing the problem manually.
Replace the m.kind in ('partial') with m.field_1 > 0 or something similar that's trivially true. Observe the large difference in performance between the two different techniques, and how the sorting semantics are preserved (tested using MySQL 5.5). The unreliability of the semantics are, of course, precisely the reason I'm asking the question.
Please provide SHOW CREATE TABLE. In the absence of that, I will guess that these are missing and may be useful:
m: (kind, id)
r: (field_1, id)
You can turn off MariaDB's ignoring of the subquery's ORDER BY.

mysql "in" clause quite slow(seems to hang)

I'm using mysql and a simple query like this seems to hang forever with cpu 100%.
select login_name, server, recharge
from role_info
where login_name in (select login_name
from role_info
group by login_name
having count(login_name) > 1)
the table role_info is small, with only 33535 rows.
below is the output of explain:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | PRIMARY | role_info | ALL | NULL | NULL | NULL | NULL | 33535 | Using where |
| 2 | DEPENDENT SUBQUERY | role_info | index | NULL | a | 302 | NULL | 1 | Using index |
show processlist reports the query keeps Sending data.
Query | 1135 | Sending data | select login_name, server, recharge from role_info where login_name in (select login_name from ...
I can switch to join instead of "in" and it works fine. but still curious why this simple query should act so abnormally.
I wonder how many people stumble over this bug again and again. It's long lasting MySQL bug#32665, where MySQL 6 is the target version. It means that MySQL will turn you uncorrelated subquery into correlated (executed per row of outer resultset). Just in case, in this blog post are some ideas how to workaround the limitation when you can't rewrite the query into using JOINs.
In some versions of MySQL, in with a subquery is quite inefficient. The subquery gets executed for each row being processed. In other words, the results are not cached. I think this is fixed in the most recent versions.
You know how to fix it, by using a join. In general, I gravitate to joins or exists in MySQL instead of using in with a subquery.

mysql slow complex query with order by

The below query even without the order by is very slow and I can't figure out why. I'm guessing it's the where date_affidavit_file but how can I make it fast with that order byas well? Perhaps a sublect on the job_id's that match the where and then pass that into the rest of the code but I still need to order by server the servername like this. Any suggestions?
explain select sql_no_cache court_county, job.id as jid, job_status,
DATE_FORMAT(job.datetime_served, '%m/%d/%Y') as dserved ,
CONCAT(server.namefirst, ' ', server.namelast) as servername, client_name,
DATE_FORMAT(job.datetime_received, '%m/%d/%Y') as dtrec ,
DATE_FORMAT(job.datetime_give2server, '%m/%d/%Y') as dtg2s,
DATE_FORMAT(date_kase_filed, '%m/%d/%Y') as dkf,
DATE_FORMAT(job.date_sent_to_court, '%m/%d/%Y') as dtstc ,
TO_DAYS(datetime_served )-TO_DAYS(date_kase_filed) as totaldays from job
left join kase on kase.id=job.kase_id
left join server on job.server_id=server.id
left join client on kase.client_id=client.id
left join LUcourt on LUcourt.id=kase.court_id
where date_affidavit_filed is not null and date_affidavit_filed !='' order by servername;
+----+-------------+---------+--------+----------------------+---------+---------+-----------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+----------------------+---------+---------+-----------------------+--------+----------------------------------------------+
| 1 | SIMPLE | job | ALL | date_affidavit_filed | NULL | NULL | NULL | 365212 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | kase | eq_ref | PRIMARY | PRIMARY | 4 | pserve.job.kase_id | 1 | |
| 1 | SIMPLE | server | eq_ref | PRIMARY | PRIMARY | 4 | pserve.job.server_id | 1 | |
| 1 | SIMPLE | client | eq_ref | PRIMARY | PRIMARY | 4 | pserve.kase.client_id | 1 | |
| 1 | SIMPLE | LUcourt | eq_ref | PRIMARY | PRIMARY | 4 | pserve.kase.court_id | 1 | |
+----+-------------+---------+--------+----------------------+---------+---------+-----------------------+--------+----------------------------------------------+
Check that you have indexes on the following columns. job.kase_id or job.server_id
Also you are ordering by a calculated field which is not optimal. Perhaps order by a field with index.
If you need to preserve that exact sort, you might want to add a field in the DB for that value. And populate it with appropriate values or set up a trigger on the DB to populate it for you automatically.
This can speed up the order by:
CREATE INDEX namefull ON server (namefirst,namelast);
if you do ORDER BY (server.namefirst, server.namelast) instead of ORDER BY servername, which should produce the same output.
You can also create indexes on each table on any field you are left joining, that can improve the performance of your query too.
When you write,
where date_affidavit_filed is not null and date_affidavit_filed !=''
you practically are selecting most of the rows. Or at least so many that it is not worthwhile to run through the indexing. The query planner sees that there is an index involving date_affidavit_filed, but decides not to use it and go with the WHERE clause, which only involves date_affidavit_filed; so we know it's not a key issue, it must be a cardinality issue.
| 1 | SIMPLE | job | ALL | date_affidavit_filed | NULL | NULL | NULL | 365212 | Using where; Using temporary; Using filesort |
You can try optimizing this by creating an index on
date_affidavit_filed, kase_id, server_id
in that order. How many rows are returned by the query?
You are selecting everything that isn't empty really.
That really means everything.
I don't know how many rows of data you have have but it's a lot to go through.
Try narrowing your query to a date range or specific client.
If you really need everything, don't output it one row after a time, but build up a big string in the software you use to output with all formatting, and then when you're finished looping through the results and you have constructed the data you wish to output you can output them in one big go.
You could also use paging.
Just add limit 0,30 on page 1, limit 30,30 on page two, etc.. and let the end user walk through the pages.

SQL Query Optimization

I am trying to speed up this django app (note: I didn't design this... just stuck maintaining it) and the biggest bottle neck seems to be these queries that are being generated by the admin. We have a content class that 4-5 other sub-classes inherit from and anytime the master list is pulled up in the admin a query like this is generated:
SELECT `content_content`.`id`,
`content_content`.`issue_id`,
`content_content`.`slug`,
`content_content`.`section_id`,
`content_content`.`priority`,
`content_content`.`group_id`,
`content_content`.`rotatable`,
`content_content`.`pub_status`,
`content_content`.`created_on`,
`content_content`.`modified_on`,
`content_content`.`old_pk`,
`content_content`.`content_type_id`,
`content_image`.`content_ptr_id`,
`content_image`.`caption`,
`content_image`.`kicker`,
`content_image`.`pic`,
`content_image`.`crop_x`,
`content_image`.`crop_y`,
`content_image`.`crop_side`,
`content_issue`.`id`,
`content_issue`.`special_issue_name`,
`content_issue`.`web_publish_date`,
`content_issue`.`issue_date`,
`content_issue`.`fm_name`,
`content_issue`.`arts_name`,
`content_issue`.`comments`,
`content_section`.`id`,
`content_section`.`name`,
`content_section`.`audiodizer_id`
FROM `content_image`
INNER
JOIN `content_content`
ON `content_image`.`content_ptr_id` = `content_content`.`id`
INNER
JOIN `content_issue`
ON `content_content`.`issue_id` = `content_issue`.`id`
INNER
JOIN `content_section`
ON `content_content`.`section_id` = `content_section`.`id`
WHERE NOT ( `content_content`.`pub_status` = -1 )
ORDER BY `content_issue`.`issue_date` DESC LIMIT 30
I ran an EXPLAIN on this and got the following:
+----+-------------+-----------------+--------+-------------------------------------------------------------------------------------------------+---------+---------+--------------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+--------+-------------------------------------------------------------------------------------------------+---------+---------+--------------------------------------+-------+---------------------------------+
| 1 | SIMPLE | content_image | ALL | PRIMARY | NULL | NULL | NULL | 40499 | Using temporary; Using filesort |
| 1 | SIMPLE | content_content | eq_ref | PRIMARY,issue_id,content_content_issue_id,content_content_section_id,content_content_pub_status | PRIMARY | 4 | content_image.content_ptr_id | 1 | Using where |
| 1 | SIMPLE | content_section | eq_ref | PRIMARY | PRIMARY | 4 | content_content.section_id | 1 | |
| 1 | SIMPLE | content_issue | eq_ref | PRIMARY | PRIMARY | 4 | content_content.issue_id | 1 | |
+----+-------------+-----------------+--------+-------------------------------------------------------------------------------------------------+---------+---------+--------------------------------------+-------+---------------------------------+
Now, from what I've read, I need to somehow figure out how to make the query to content_image not be terrible; however, I'm drawing a blank on where to start.
Currently, judging by the execution plan, MySQL is starting with content_image, retrieving all rows, and only thereafter using primary keys on the other tables: content_image has a foreign key to content_content, and content_content has foreign keys to content_issue and content_section. Also, only after all the joins are complete can it make much use of the ORDER BY content_issue.issue_date DESC LIMIT 30, since it can't tell which of these joins might fail, and therefore, how many records from content_issue will really be needed before it can get the first thirty rows of output.
So, I would try the following:
Change JOIN content_issue to JOIN (SELECT * FROM content_issue ORDER BY issue_date DESC LIMIT 30) content_issue. This will allow MySQL, if it starts with content_issue and works its way to the other tables, to grab a very small subset of content_issue.
Note: properly speaking, this changes the semantics of the query: it means that only records from at most the last 30 content_issues will be retrieved, and therefore that if some of those issues don't have published contents with images, then fewer than 30 records will be retrieved. I don't have enough information about your data to gauge whether this change of semantics would actually change the results you get.
Also note: I'm not suggesting to remove the ORDER BY content_issue.issue_date DESC LIMIT 30 from the end of the query. I think you want it in both places.
Add an index on content_issue.issue_date, to optimize the above subquery.
Add an index on content_image.content_ptr_id, so MySQL can work its way from content_content to content_image without doing a full table scan.

Cascading WHERE clause inside a VIEW with a UNION

This isn't solved, but I found out why: MySQL View containing UNION does not optimize well...In other words SLOW!
Original post:
I'm working with a database for a game. There are two identical tables equipment and safety_dep_box. To check if a player has a piece of equipment I'd like to check both tables.
Instead of doing two queries, I want to take advantage of the UNION functionality in MySQL. I've recently learned that I can create a VIEW. Here's my view:
CREATE VIEW vAllEquip AS SELECT * FROM equipment UNION SELECT * FROM safety_dep_box;
The view created just fine. However when I run
SELECT * FROM vAllEquip WHERE owner=<id>
The query takes forever, while independent select queries are quick. I think I know why, but I don't know how to fix it.
Thanks!
P.S. with Additional Information:
The two tables are identical in structure, but split because they are multi-100-million row tables.
The structure includes primary key on int id, multiple index on int owner.
What I don't understand is the speed difference between the following:
SELECT COUNT(*) FROM (SELECT * FROM equipment WHERE owner=1 UNION ALL SELECT * FROM safety_dep_box WHERE owner=1) AS uES;
0.42 sec
SELECT COUNT(*) FROM (SELECT * FROM equipment WHERE owner=1 UNION SELECT * FROM safety_dep_box WHERE owner=1) AS uES;
0.37 sec
SELECT COUNT(*) FROM vAllEquip WHERE owner=1;
aborted after 60 seconds
Version: 5.1.51
mysql> explain SELECT * FROM equipment UNION SELECT * FROM safety_dep_box;
+----+--------------+----------------+------+---------------+------+---------+------+---------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+----------------+------+---------------+------+---------+------+---------+-------+
| 1 | PRIMARY | equipment | ALL | NULL | NULL | NULL | NULL | 1499148 | |
| 2 | UNION | safety_dep_box | ALL | NULL | NULL | NULL | NULL | 867321 | |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+----------------+------+---------------+------+---------+------+---------+-------+
with a WHERE clause
mysql> explain SELECT * FROM equipment WHERE owner=1 UNION ALL SELECT * FROM safety_dep_box WHERE owner=1
-> ;
+----+--------------+----------------+------+-----------------------+-------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+----------------+------+-----------------------+-------+---------+-------+------+-------+
| 1 | PRIMARY | equipment | ref | owner,owner_2,owner_3 | owner | 4 | const | 1 | |
| 2 | UNION | safety_dep_box | ref | owner,owner_3 | owner | 4 | const | 1 | |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+----------------+------+-----------------------+-------+---------+-------+------+-------+
First off, you should probably be using UNION ALL instead of plain UNION. With plain UNION, the engine will try to de-duplicate your result set. That is likely the source of your problem.
Secondly, you'll need indexes on owner in both tables, not just one. And, ideally, they'll be integer columns.
Thirdly, Randolph is right that you should not be using "*" in your SELECT statement. List out all the columns you want included. That is especially important in a UNION because the columns must match up exactly and, if there's a disagreement in the column order in your two tables you may be forcing some type conversion to go on that is costing you some time.
Finally, the phrase "There are two identical tables" is almost always a tip-off that your database is not optimally designed. These should probably be a single table. To indicate ownership of an item, your safety_dep_box table should contain only the ownerID and itemID of the item (to relate equipment and players), and possibly an additional autonumbered integer key column.
First off, don't use SELECT * in views ever. It's lazy code. Secondly, without knowing what the base tables look like, we're even less likely to be able to help you.
The reason it takes forever is because it has to build the full result and then filter it. You'll want indexes on your owner fields, whatever they may be.