I'm working with MySQL 5.5.52 on a Debian 8 machine, and sometimes a query that usually takes 0.1s becomes slow (>3s). I've started with the EXPLAIN command to find out what is happening.
This is the query and its EXPLAIN output:
explain
SELECT
`box`.`message_id` ID
, `messages`.`tipo`
, `messages`.`text`
, TIME_TO_SEC(TIMEDIFF(NOW(), `messages`.`date`)) `date`
FROM (`box`)
INNER JOIN `messages` ON `messages`.`id` = `box`.`message_id`
WHERE `box`.`user_id` = '1010231' AND `box`.`deleted` = 0
AND `messages`.`deleted` = 0
AND `messages`.`date` + INTERVAL 10 MINUTE > NOW()
ORDER BY `messages`.`id` ASC LIMIT 100;
id| select_type| table | type | possible_keys | key | key_len| ref | rows | Extra
1|SIMPLE |box |ref |user_id,message_id|user_id| 4|const | 2200 |Using where; Using temporary; Using filesort
1|SIMPLE |messages|eq_ref|PRIMARY |PRIMARY| 4|box.message_id| 1 |Using where
I know that a temporary table and a filesort are bad things, and I suppose the problem is that the ORDER BY key doesn't belong to the first table in the query (box). Changing it to box.message_id, the EXPLAIN output is:
id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1 SIMPLE box index user_id,message_id message_id 4 443 Using where
1 SIMPLE messages eq_ref PRIMARY PRIMARY 4 box.message_id 1 Using where
It looks better, but I don't understand why it's using the message_id index, and worse, the query now takes 1.5s instead of the initial 0.1s.
Edit:
Forcing the query to use the user_id index, I get the same result (0.1s) as the initial query, but without the temporary table:
explain
SELECT
`box`.`message_id` ID
, `messages`.`tipo`
, `messages`.`text`
, TIME_TO_SEC(TIMEDIFF(NOW(), `messages`.`date`)) `date`
FROM (`box` use index(user_id) )
INNER JOIN `messages` ON `messages`.`id` = `box`.`message_id`
WHERE `box`.`user_id` = '1010231' AND `box`.`deleted` = 0
AND `messages`.`deleted` = 0
AND `messages`.`date` + INTERVAL 10 MINUTE > NOW()
ORDER BY `box`.`message_id` ASC LIMIT 100;
id| select_type| table | type | possible_keys | key | key_len| ref | rows | Extra
1|SIMPLE |box |ref |user_id,message_id|user_id| 4|const | 2200 |Using where; Using filesort
1|SIMPLE |messages|eq_ref|PRIMARY |PRIMARY| 4|box.message_id| 1 |Using where
I think that skipping the temporary table makes this a better solution than the initial query; the next step is to try the combined index that ysth recommends.
It is not a good idea to calculate on field values in a comparison; then you get a FULL TABLE SCAN, because MySQL must evaluate the expression for each row before it can check the condition. It's better to do the arithmetic on the constant side of the condition. Then MySQL can use an index (if there is one on this field).
change from
AND messages.date + INTERVAL 10 MINUTE > NOW()
to
AND messages.date > NOW() - INTERVAL 10 MINUTE
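If there is no index on that column yet, one could be added so the rewritten condition is able to use it; a minimal sketch (the index name is an assumption):
-- hypothetical index on messages.date so the rewritten range condition can use it
ALTER TABLE `messages` ADD INDEX `idx_messages_date` (`date`);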
Temporary and file sort are not bad here; they are needed because using the best index (user_id) doesn't naturally produce records sorted in the order you ask for.
It's possible you might do better having a combined user_id,message_id index, but that might also end up worse. Depends on your exact data.
It isn't clear to me whether you are seeing longer queries for certain user IDs, or the same user ID sometimes taking much longer.
Update: it seems likely that having a combined index and changing the ORDER BY to box.user_id, box.message_id will solve your problem, at least for users that don't have a large number of deleted messages.
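A sketch of what that might look like against the query from the question (the index name is an assumption; untested):
-- combined index so the user_id lookup also delivers rows in message_id order
ALTER TABLE `box` ADD INDEX `idx_user_message` (`user_id`, `message_id`);

SELECT
`box`.`message_id` ID
, `messages`.`tipo`
, `messages`.`text`
, TIME_TO_SEC(TIMEDIFF(NOW(), `messages`.`date`)) `date`
FROM (`box`)
INNER JOIN `messages` ON `messages`.`id` = `box`.`message_id`
WHERE `box`.`user_id` = '1010231' AND `box`.`deleted` = 0
AND `messages`.`deleted` = 0
AND `messages`.`date` + INTERVAL 10 MINUTE > NOW()
ORDER BY `box`.`user_id` ASC, `box`.`message_id` ASC LIMIT 100;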
Related
I'm trying to figure out why one of my queries is slow and how I can fix it, but I'm a bit puzzled by my results.
I have an orders table with around 80 columns and 775,179 rows, and I'm running the following query:
SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC LIMIT 200
which returns 38 rows in 4.5s
When removing the ORDER BY I'm getting a nice improvement:
SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL LIMIT 200
38 rows in 0.30s
But when removing the LIMIT without touching the ORDER BY I'm getting an even better result:
SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC
38 rows in 0.10s (??)
Why is my LIMIT so hungry?
GOING FURTHER
I was trying a few things before posting, and after noticing that I had an index on creation_date (which is a datetime), I removed it and the first query now runs in 0.10s. Why is that?
EDIT
Good guess, I have indexes on the other columns that are part of the WHERE.
mysql> explain SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC LIMIT 200;
+----+-------------+--------+-------+------------------------+---------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+------------------------+---------------+---------+------+------+-------------+
| 1 | SIMPLE | orders | index | id_state_idx,id_mp_idx | creation_date | 5 | NULL | 1719 | Using where |
+----+-------------+--------+-------+------------------------+---------------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql> explain SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC;
+----+-------------+--------+-------+------------------------+-----------+---------+------+-------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+------------------------+-----------+---------+------+-------+----------------------------------------------------+
| 1 | SIMPLE | orders | range | id_state_idx,id_mp_idx | id_mp_idx | 3 | NULL | 87502 | Using index condition; Using where; Using filesort |
+----+-------------+--------+-------+------------------------+-----------+---------+------+-------+----------------------------------------------------+
Indexes do not necessarily improve performance. To better understand what is happening, it would help if you included the explain for the different queries.
My best guess would be that you have an index on id_state, or even (id_state, id_mp), that can be used to satisfy the where clause. If so, the first query without the order by would use this index. It should be pretty fast. Even without an index, this requires a sequential scan of the pages in the orders table, which can still be pretty fast.
Then when you add the index on creation_date, MySQL decides to use that index instead for the order by. This requires reading each row in the index, then fetching the corresponding data page to check the where conditions and return the columns (if there is a match). This reading is highly inefficient, because it is not in "page" order but rather as specified by the index. Random reads can be quite inefficient.
Worse, even though you have a limit, you still have to read the entire table because the entire result set is needed. Although you have saved a sort on 38 records, you have created a massively inefficient query.
By the way, this situation gets significantly worse if the orders table does not fit in available memory. Then you have a condition called "thrashing", where each new record tends to generate a new I/O read. So, if a page has 100 records on it, the page might have to be read 100 times.
You can make all these queries run faster by having an index on orders(id_state, id_mp, creation_date). The where clause will use the first two columns and the order by will use the last.
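A sketch of that index (the index name is an assumption):
-- id_state and id_mp cover the WHERE clause, creation_date covers the ORDER BY
ALTER TABLE orders ADD INDEX idx_state_mp_created (id_state, id_mp, creation_date);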
The same problem happened in my project.
I did some tests and found out that LIMIT is slow because of row lookups.
See:
MySQL ORDER BY / LIMIT performance: late row lookups
So, the solution is:
(A) When using LIMIT, select not all columns but only the PK columns.
(B) Select all the columns you need, and then join with the result set of (A).
The SQL should look like this:
SELECT
*
FROM
orders O1 <=== this is what you want
JOIN
(
SELECT
ID <== fetch the PK column only, this should be fast
FROM
orders
WHERE
[your query condition] <== filter record by condition
ORDER BY
[your order by condition] <== control the record order
LIMIT 2000, 50 <== filter record by paging condition
) as O2
ON
O1.ID = O2.ID
ORDER BY
[your order by condition] <== control the record order
In my DB, the old SQL, which selects all columns and uses "LIMIT 21560, 20", costs about 4.484s.
The new SQL costs only 0.063s. The new one is about 71 times faster.
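Applied to the query from the question, the deferred-join version might look like this (assuming id is the primary key of orders; untested):
SELECT o1.*
FROM orders o1
JOIN (
    SELECT id                                   -- fetch only the PK in the inner query
    FROM orders
    WHERE id_state = 2 AND id_mp IS NOT NULL    -- same filter as the original query
    ORDER BY creation_date DESC
    LIMIT 200
) o2 ON o1.id = o2.id
ORDER BY o1.creation_date DESC;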
I had a similar issue on a table of 2.5 million records. With the LIMIT removed, the query took a few seconds; with the LIMIT, it got stuck forever.
I solved it with a subquery. In your case it would become:
SELECT *
FROM
(SELECT *
FROM orders
WHERE id_state = 2
AND id_mp IS NOT NULL
ORDER BY creation_date DESC) tmp
LIMIT 200
I noticed that the original query was fast when the number of matching rows was greater than the LIMIT parameter, but it became extremely slow when the LIMIT was effectively useless (fewer matching rows than the limit).
Another solution is to try forcing an index. In your case you can try:
SELECT *
FROM orders force index (id_mp_idx)
WHERE id_state = 2
AND id_mp IS NOT NULL
ORDER BY creation_date DESC
LIMIT 200
The problem is that MySQL is forced to sort the data on the fly. My query with a deep offset like:
ORDER BY somecol LIMIT 99990, 10
Took 2.5s.
I fixed it by creating a new table that contains only the ids, presorted by the column somecol; there the deep offset (with no need for ORDER BY) takes 0.09s.
0.1s is still not fast enough though; 0.01s would be better.
I will end up creating a table that holds the page number as a special indexed column, so instead of doing LIMIT x, y I will query WHERE page = Z.
I just tried it and it is as fast as 0.0013s. The only problem is that the offsetting is based on static numbers (presorted into pages of 10 items, for example). It's not that big a problem though; you can still get any data from any page.
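A rough sketch of that page-number table, with all names invented for illustration and the row numbering done via a user variable (a common pattern on MySQL 5.x):
-- presorted snapshot: only ids, grouped into pages of 10, rebuilt when the data changes
CREATE TABLE page_index (
    page INT NOT NULL,
    id   INT NOT NULL,
    PRIMARY KEY (page, id)
);

INSERT INTO page_index (page, id)
SELECT FLOOR(seq / 10) AS page, id              -- 10 items per page, pages start at 0
FROM (
    SELECT @rownum := @rownum + 1 AS seq, id
    FROM (SELECT id FROM mytable ORDER BY somecol) ordered
    CROSS JOIN (SELECT @rownum := -1) init
) numbered;

-- a deep offset such as LIMIT 99990, 10 becomes a plain equality lookup
SELECT t.*
FROM page_index p
JOIN mytable t ON t.id = p.id
WHERE p.page = 9999;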
We have a MyISAM table with approximately 75 million rows that has 5 columns:
id (int),
user_id(int),
page_id (int),
type (enum with 6 strings)
date_created(datetime).
We have a primary index on the id column, a unique index on (user_id, page_id, date_created), and a composite index on (page_id, date_created).
The problem is that the query below takes up to 90 seconds to complete
SELECT SQL_NO_CACHE user_id, count(id) nr
FROM `table`
WHERE `page_id`=301
and `date_created` BETWEEN '2012-01-03' AND '2012-02-03 23:59:59'
AND page_id<>user_id
group by `user_id`
This is the explain of this query
+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
| 1 | SIMPLE | table | range | page_id | page_id | 12 | NULL | 520024 | Using where; Using temporary; Using filesort |
+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
EDIT:
At the suggestion of ypercube I tried adding a new index on (page_id, user_id, date_created). However, MySQL does not use it by default, so I had to suggest it to the query optimizer. Here are the new query and the EXPLAIN:
SELECT SQL_NO_CACHE user_id, count(*) nr
FROM `table` USE INDEX (usridexp)
WHERE `page_id`=301
and `date_created` BETWEEN '2012-01-03' AND '2012-02-03 23:59:59'
AND page_id<>user_id
group by `user_id` ORDER BY NULL
+----+-------------+----------------------------+------+---------------+----------+---------+-------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------------+------+---------------+----------+---------+-------+---------+--------------------------+
| 1 | SIMPLE | table | ref | usridexp | usridexp | 4 | const | 3943444 | Using where; Using index |
+----+-------------+----------------------------+------+---------------+----------+---------+-------+---------+--------------------------+
Some changes that may improve the query:
Change COUNT(id) to COUNT(*). Since id is (I guess) the PRIMARY KEY and NOT NULL, the results will be identical.
Add an ORDER BY NULL after the GROUP BY clause. In MySQL, a GROUP BY operation also sorts the results unless you specify otherwise.
The (page_id, date_created) index is probably the best one MySQL can use for this query, but you could also try (page_id, user_id, date_created). (Can you also post the EXPLAIN if you add this index?)
Another thing not related to the performance of this query:
If your (user_id, page_id, date_created) index is UNIQUE and id is auto-generated (and not used for anything other than as a primary key), you can make that combination the PRIMARY KEY and drop the id column. One less index and 4 bytes less per row.
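If that applies, a sketch of the change (the name of the UNIQUE index is an assumption; back up first, untested):
ALTER TABLE `table`
    DROP PRIMARY KEY,                                  -- removes the surrogate id key
    DROP COLUMN id,
    DROP INDEX uniq_user_page_date,                    -- assumed name of UNIQUE(user_id, page_id, date_created)
    ADD PRIMARY KEY (user_id, page_id, date_created);  -- natural key becomes the primary key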
1) It depends on your data - but you should have multiple indexes available to allow MySQL to choose the best one. e.g. if the table had an index on page_id it wouldn't be scanning so many rows.
2) There is a way of optimising date searches. I haven't actually implemented this myself yet, but have a similar problem that I have thought about quite a bit.
Basically you are looking up data by day, but date compares are really slow. What you could do is create another table that stores the earliest and latest ID from the table for each day. That table would need to be populated at the end of each day.
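The cache table might look something like this (the column layout is a guess based on the queries below):
CREATE TABLE idCacheTable (
    `date`     DATE NOT NULL PRIMARY KEY,
    earliestID INT  NOT NULL,
    latestID   INT  NOT NULL
);

-- run at the end of each day to record that day's ID boundaries
INSERT INTO idCacheTable (`date`, earliestID, latestID)
SELECT CURDATE(), MIN(id), MAX(id)
FROM `table`
WHERE date_created >= CURDATE() AND date_created < CURDATE() + INTERVAL 1 DAY;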
After that you could break your query into two parts:
i) Find the IDs to search by running two queries:
select earliestID from idCacheTable where date = '2012-01-03';
select latestID from idCacheTable where date = '2012-02-03';
ii) You can then search directly on the primary key of the table, without doing a date compare on each row, which would be waaaaaay faster.
SELECT SQL_NO_CACHE user_id, count(id) nr
FROM table
WHERE page_id=301
and (id >= earliestID and id <= latestID)
AND page_id<>user_id
group by user_id;
The exact solution to your problem will depend on what your data looks like though, rather than one of those two things always being correct.
Sounds odd, but try adding a JOIN:
SELECT SQL_NO_CACHE user_id, count(id) nr
FROM `table` t
JOIN `table` t2 ON t.`user_id`= t2.`user_id`
WHERE t.`page_id`=301
and t.`date_created` BETWEEN '2012-01-03' AND '2012-02-03 23:59:59'
AND t.`page_id`<>t.`user_id`
group by t.`user_id`
For a similar problem, I got the query to execute 20 times faster (3-4s instead of 60+). The JOIN does not do anything smart; the speedup seems to come entirely from the internal MySQL implementation (tested on MySQL 5.1; the table has only rare user_id duplicates).
I feel like the following query is too slow:
(1679.1ms)
SELECT `media_files` . *
FROM `media_files`
INNER JOIN `playlist_media_files` ON `media_files`.`id` = `playlist_media_files`.`media_file_id`
WHERE `media_files`.`type`
IN (
'AudioFile'
)
AND `playlist_media_files`.`playlist_id` =7
ORDER BY media_files.artist ASC , media_files.release_year ASC , media_files.album ASC , media_files.disc_number ASC , media_files.position ASC
EXPLAIN:
+----+-------------+----------------------+--------+---------------------------------------------------------------------------------------+-------------------------------------------+---------+---------------------------------------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+--------+---------------------------------------------------------------------------------------+-------------------------------------------+---------+---------------------------------------------------------+------+---------------------------------+
| 1 | SIMPLE | playlist_media_files | ref | index_playlist_media_files_on_playlist_id,index_playlist_media_files_on_media_file_id | index_playlist_media_files_on_playlist_id | 4 | const | 3782 | Using temporary; Using filesort |
| 1 | SIMPLE | media_files | eq_ref | PRIMARY,index_media_files_on_type | PRIMARY | 4 | mydb.playlist_media_files.media_file_id | 1 | Using where |
+----+-------------+----------------------+--------+---------------------------------------------------------------------------------------+-------------------------------------------+---------+---------------------------------------------------------+------+---------------------------------+
Every column is indexed.
Can any MySQL expert tell how it can be improved by looking at the EXPLAIN?
The multiple ORDER BY is killing the performance.
Edit: removed private URLs from comments
Update: it seems I can do something like concat(..fields..) AS sort for a late ORDER BY sort.
For this query you would probably benefit from a composite index on (playlist_id, media_file_id) in the playlist_media_files table, which would let MySQL use only this index to know which rows to read from media_files, without having to read actual data from playlist_media_files to find the value of media_file_id for every row that satisfies the playlist_id = 7 condition (a lot of them do).
You should see an additional Using index for the first row of the EXPLAIN output.
MySQL would still have to create a temporary table to sort by so many columns, but sorting 4k rows in memory is not that expensive.
So basically try:
ALTER TABLE `playlist_media_files`
ADD INDEX `playlist_media_composite` ( `playlist_id` , `media_file_id` ) ;
and see the results.
Edit: I tried to simulate the same problem on my test DB, creating the same tables and filling them with 400k random rows using PHP, trying to get similar index cardinality.
Without the composite index the same query has the following execution plan:
1 SIMPLE playlist_media_files ref playlist_id,media_file_id playlist_id 4 const 3925 Using temporary; Using filesort
1 SIMPLE media_files eq_ref PRIMARY PRIMARY 4 test.playlist_media_files.media_file_id 1 Using where
And the average result is about:
Showing rows 0 - 29 ( 2,702 total, Query took 0.0359 sec)
After adding the composite index and running ANALYZE TABLE playlist_media_files, EXPLAIN shows:
1 SIMPLE playlist_media_files ref playlist_id,media_file_id,playlist_media_composite playlist_media_composite 4 const 3925 Using index; Using temporary; Using filesort
1 SIMPLE media_files eq_ref PRIMARY PRIMARY 4 test.playlist_media_files.media_file_id 1 Using where
And the average result:
Showing rows 0 - 29 ( 2,702 total, Query took 0.0176 sec)
However, in both cases the sorting was done in memory (and creating the tmp table plus sorting still takes 80% of the time here), while looking at your profiling screenshot most of the time is lost copying the temporary table to disk. That's where the difference comes from. My tables have only the columns required for this query, and my random strings probably weren't as long as yours, while you have a lot more columns and are selecting all of them while sorting on only a few. So your temporary table doesn't fit in memory, and obviously doing things on disk has to be a lot slower.
So your main focus here should be either on increasing buffer sizes to accommodate your big select, or on limiting the selected columns to the ones you actually need.
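If you go the buffer route, these are the session variables involved; the sizes below are arbitrary examples, not recommendations:
-- let larger temporary tables stay in memory and give the sort more room
SET SESSION tmp_table_size      = 64 * 1024 * 1024;
SET SESSION max_heap_table_size = 64 * 1024 * 1024;
SET SESSION sort_buffer_size    = 8 * 1024 * 1024;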
What's the meaning of the ORDER BYs? Ordering by that many columns just doesn't make sense in this case.
Why not just order by one thing?
You might be having problems with database normalization; are you familiar with that?
I have the following tables:
|ELEMENTS|
------------
|id_element|
|id_catalog|
|value|
|CATALOG|
------------
|id_catalog|
|catalog_name|
|show|
|status|
I tried adding different indices (several variants):
1) ELEMENT: pair(id_element, id_catalog) and id_element and id_catalog
2) ELEMENT: pair(id_element, id_catalog) and id_element
3) ELEMENT: pair(id_element, id_catalog) and id_catalog
4) ELEMENT: id_element and id_catalog
1) CATALOG: pair(show, status) and id_catalog
2) CATALOG: id_catalog and show and status
I execute the following select:
SELECT DISTINCT `id_element` FROM `ELEMENTS`
WHERE (id_catalog IN (SELECT `id_catalog` FROM `CATALOG` WHERE status=1 AND show = 1)) limit 10
If there are some rows it works very fast, but if the result is empty it takes more than 4 seconds.
At the same time, "SELECT `id_catalog` FROM `CATALOG` WHERE status=1 AND show = 1" on its own works fast whether it returns rows or is empty.
The ELEMENTS table has 100,000 records.
The CATALOG table has 15,000 records.
I also tried a JOIN, but it takes more time than before.
Why does the empty result take so long, and what should I do to speed it up?
Here is the EXPLAIN output:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | 'PRIMARY', |'ELEMENTS' | 'index' | '' | null | null | null | 270044 | 'Using where; Using temporary'
2 | 'DEPENDENT SUBQUERY' | 'CATALOG' | 'unique_subquery' | 'PRIMARY,pair,id_catalog' | 'PRIMARY' | '4' | 'func' | 1 | 'Using where'
I guess indexing CATALOG(status,show) would allow a quick answer to the sub-select.
And then some index on ELEMENTS(id_catalog) would speed up the answer to the main question.
Maybe it depends on the statistics of these columns: if they are not selective enough, you'll end up with many rows anyway.
Could you show the output of EXPLAIN when using the two indexes above?
Why not simply write a join to help the optimizer do its job?
SELECT DISTINCT id_element
FROM elements JOIN catalog ON elements.id_catalog=catalog.id_catalog
WHERE status=1 AND show = 1
LIMIT 10
(untested)
Well, the reason you're having the problem is that you're pulling up the entire catalog database for each request and finding every match between the element and the catalog. If MySQL finds 10 entries, it bails out, but if it never finds them it will continue to check your entire database. I would use an EXISTS query to try and get some performance increase.
SELECT DISTINCT(e.id_element)
FROM ELEMENTS e
WHERE EXISTS (
SELECT *
FROM CATALOG c
WHERE c.id_catalog = e.id_catalog
AND c.status = 1
AND c.show = 1)
LIMIT 10;
This will decrease the amount of time MySQL spends looking through the catalog for each element, since EXISTS effectively imposes a LIMIT 1 on the inner query, but you always run the risk of a long search time when there are no matches at all.
I would put these indices there:
CREATE INDEX idx_element_1 ON ELEMENT (id_catalog);
CREATE INDEX idx_catalog_1 ON CATALOG (status, show);
Also these, although they might not be needed for your query (these should probably be primary keys, unless you have duplicates):
CREATE INDEX idx_element_2 ON ELEMENT (id_element);
CREATE INDEX idx_catalog_2 ON CATALOG (id_catalog);
Could you drop other indices and create these and check back with the query results?
Thanks to all. I solved it by denormalizing the tables: there was too much data split across the two separate tables, so I decided to combine them into one table. Now it works perfectly; the query always takes 0.03 seconds.
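For reference, the combined table might look roughly like this (column types and the index are guesses based on the original two tables):
CREATE TABLE elements_catalog (
    id_element   INT NOT NULL,
    id_catalog   INT NOT NULL,
    value        VARCHAR(255),
    catalog_name VARCHAR(255),
    `show`       TINYINT NOT NULL,
    status       TINYINT NOT NULL,
    KEY idx_status_show (status, `show`, id_element)   -- serves the WHERE and the DISTINCT scan
);

SELECT DISTINCT id_element
FROM elements_catalog
WHERE status = 1 AND `show` = 1
LIMIT 10;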
I've got a very large table (~100 million records) in MySQL that contains information about files. One of the pieces of information is the modified date of each file.
I need to write a query that will count the number of files that fit into specified date ranges. To do that I made a small table that specifies these ranges (all in days) and looks like this:
DateRanges
range_id range_name range_start range_end
1 0-90 0 90
2 91-180 91 180
3 181-365 181 365
4 366-1095 366 1095
5 1096+ 1096 999999999
And wrote a query that looks like this:
SELECT r.range_name, sum(IF((DATEDIFF(CURDATE(),t.file_last_access) > r.range_start and DATEDIFF(CURDATE(),t.file_last_access) < r.range_end),1,0)) as FileCount
FROM `DateRanges` r, `HugeFileTable` t
GROUP BY r.range_name
However, quite predictably, this query takes forever to run. I think that is because I am asking MySQL to go through the HugeFileTable 5 times, each time performing the DATEDIFF() calculation on each file.
What I want to do instead is to go through the HugeFileTable record by record only once, and for each file increment the count in the appropriate range_name running total. I can't figure out how to do that....
Can anyone help out with this?
Thanks.
EDIT: MySQL Version: 5.0.45, Tables are MyISAM
EDIT2: Here's the describe that was asked for in the comments
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE r ALL NULL NULL NULL NULL 5 Using temporary; Using filesort
1 SIMPLE t ALL NULL NULL NULL NULL 96506321
First, create an index on HugeFileTable.file_last_access.
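For example (the index name is an assumption):
CREATE INDEX idx_file_last_access ON HugeFileTable (file_last_access);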
Then try the following query:
SELECT r.range_name, COUNT(t.file_last_access) as FileCount
FROM `DateRanges` r
JOIN `HugeFileTable` t
ON (t.file_last_access BETWEEN
CURDATE() - INTERVAL r.range_end DAY AND
CURDATE() - INTERVAL r.range_start DAY)
GROUP BY r.range_name;
Here's the EXPLAIN plan that I got when I tried this query on MySQL 5.0.75 (edited down for brevity):
+-------+-------+------------------+----------------------------------------------+
| table | type | key | Extra |
+-------+-------+------------------+----------------------------------------------+
| t | index | file_last_access | Using index; Using temporary; Using filesort |
| r | ALL | NULL | Using where |
+-------+-------+------------------+----------------------------------------------+
It's still not going to perform very well. By using GROUP BY, the query incurs a temporary table, which may be expensive. Not much you can do about that.
But at least this query eliminates the Cartesian product that you had in your original query.
Update: here's another query that uses a correlated subquery, but I have eliminated the GROUP BY.
SELECT r.range_name,
(SELECT COUNT(*)
FROM `HugeFileTable` t
WHERE t.file_last_access BETWEEN
CURDATE() - INTERVAL r.range_end DAY AND
CURDATE() - INTERVAL r.range_start DAY
) as FileCount
FROM `DateRanges` r;
The EXPLAIN plan shows no temporary table or filesort (at least with the trivial amount of rows I have in my test tables):
+----+--------------------+-------+-------+------------------+--------------------------+
| id | select_type | table | type | key | Extra |
+----+--------------------+-------+-------+------------------+--------------------------+
| 1 | PRIMARY | r | ALL | NULL | |
| 2 | DEPENDENT SUBQUERY | t | index | file_last_access | Using where; Using index |
+----+--------------------+-------+-------+------------------+--------------------------+
Try this query on your data set and see if it performs better.
Well, start by making sure that there is an index on file_last_access for the HugeFileTable table.
I'm not sure if this is possible/better, but try to compute the date limits first (files from date A to date B), then use a query with >= and <=. It will, theoretically at least, improve the performance.
The comparison would be something like:
t.file_last_access >= StartDate AND t.file_last_access <= EndDate
You could get a small improvement by removing CURDATE() and putting a literal date in the query, as this function is evaluated twice for each row in your SQL.
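One way to read that suggestion is to compute the date once into a user variable and reuse it, roughly like this (a sketch based on the question's original query, not part of the answer above):
SET @today := CURDATE();

SELECT r.range_name,
       SUM(IF(DATEDIFF(@today, t.file_last_access) > r.range_start
          AND DATEDIFF(@today, t.file_last_access) < r.range_end, 1, 0)) AS FileCount
FROM `DateRanges` r, `HugeFileTable` t
GROUP BY r.range_name;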