Why does this query cause lock wait timeouts? - mysql

Our team just spent the last week debugging, trying to find the source of many MySQL lock timeouts and extremely long-running queries. In the end it appears that this query is the culprit.
mysql> explain
SELECT categories.name AS cat_name,
COUNT(distinct items.id) AS category_count
FROM `items`
INNER JOIN `categories` ON `categories`.`id` = `items`.`category_id`
WHERE `items`.`state` IN ('listed', 'reserved')
AND (items.category_id IS NOT NULL)
GROUP BY categories.name
ORDER BY category_count DESC
LIMIT 10\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: items
type: range
possible_keys: index_items_on_category_id,index_items_on_state
key: index_items_on_category_id
key_len: 5
ref: NULL
rows: 119371
Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: categories
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: production_db.items.category_id
rows: 1
Extra:
2 rows in set (0.00 sec)
I can see that it is doing a nasty table scan and creating a temporary table to run.
Why would this query cause database response times to go up by a factor of ten and some queries that usually take 40-50ms (updates on items table), to explode to 50,000 ms and higher at times?

It's hard to tell without more information, such as:
Is that running inside a transaction?
If so, what's the isolation level?
How many categories are there?
How many items?
My guess would be that the query is too slow and is running inside a
transaction (which it probably is, since you have this problem), and that it is
issuing range locks on the items table which block writes from
proceeding, slowing the updates until they can get a lock
on the table.
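To check both of these from the MySQL client (the variable name below is the pre-8.0 one, tx_isolation; MySQL 8.0 renamed it to transaction_isolation):

```sql
-- Which isolation level is in effect? REPEATABLE READ (the default)
-- takes gap/next-key locks in more situations than READ COMMITTED.
SELECT @@tx_isolation;

-- Shows current lock waits and the transactions holding the locks:
SHOW ENGINE INNODB STATUS;
```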
And I have a couple of comments based on what I can see from your query and execution plan:
1) Your items.state would probably be better as a catalog (lookup table) instead of storing the string on every row in items: it is more space-efficient, and comparing IDs is far faster than comparing strings (regardless of whatever optimizations the engine may do).
2) I am guessing items.state is a column with low cardinality (few unique values), so an index on that column is probably hurting you more than helping you. Every index adds overhead when inserting/deleting/updating rows, since the indexes have to be maintained, and this particular index is probably not used enough to be worthwhile. Of course, I am just guessing; it depends on the rest of the queries.
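A sketch of what point 1 could look like; all table and column names below are illustrative, not from your schema:

```sql
-- Small catalog of possible states:
CREATE TABLE item_states (
  id   TINYINT UNSIGNED PRIMARY KEY,
  name VARCHAR(20) NOT NULL UNIQUE
);

INSERT INTO item_states (id, name)
VALUES (1, 'listed'), (2, 'reserved'), (3, 'sold');

-- items then stores a small integer instead of a string:
-- ALTER TABLE items
--   ADD COLUMN state_id TINYINT UNSIGNED,
--   ADD FOREIGN KEY (state_id) REFERENCES item_states (id);

-- and the filter becomes an integer comparison:
-- WHERE items.state_id IN (1, 2)
```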
SELECT
-- Grouping by name means comparing strings.
categories.name AS cat_name,
-- No need for DISTINCT: the same items.id cannot belong to different categories.
COUNT(distinct items.id) AS category_count
FROM `items`
INNER JOIN `categories` ON `categories`.`id` = `items`.`category_id`
WHERE `items`.`state` IN ('listed', 'reserved')
-- Not needed: the inner join gets rid of items with no category_id.
AND (items.category_id IS NOT NULL)
GROUP BY categories.name
ORDER BY category_count DESC
LIMIT 10\G
The way this query is structured, it basically has to scan the entire items table (it is using the category_id index), then filter on the WHERE clause, then join with the categories table, which means one index seek on the primary key (categories.id) per row in the items result set. It then groups by name (comparing strings) to count, and finally discards everything but 10 of the results.
I would write the query like:
SELECT categories.name, counts.n
FROM (SELECT category_id, COUNT(id) n
FROM items
WHERE state IN ('listed', 'reserved') AND category_id is not null
GROUP BY category_id ORDER BY COUNT(id) DESC LIMIT 10) counts
JOIN categories on counts.category_id = categories.id
ORDER BY counts.n desc
(Apologies if the syntax isn't perfect; I am not running MySQL.)
With this query, the engine will probably use the items.state index to get the 'listed' and 'reserved' items, group them by category_id (comparing numbers, not strings), keep only the 10 topmost counts, and then join with categories to get the names (using only 10 index seeks).

Related

Basic query is unexpectedly slow in MySQL

I am running a basic select on a table with 189,000 records. The table structure is:
items
id - primary key
ad_count - int, indexed
company_id - varchar, indexed
timestamps
the select query is:
select *
from `items`
where `company_id` is not null
and `ad_count` <= 100
order by `ad_count` desc, `items`.`id` asc
limit 50
On my production servers, just the MySQL portion of the execution takes 300 - 400ms
If I run an explain, I get:
select type: SIMPLE
table: items
type: range
possible_keys: items_company_id_index,items_ad_count_index
key: items_company_id_index
key_len: 403
ref: NULL
rows: 94735
Extra: Using index condition; Using where; Using filesort
When fetching this data in our application, we paginate it in groups of 50, but the above query is "the first page".
I'm not too familiar with dissecting explain queries. Is there something I'm missing here?
An ORDER BY clause with mixed sort directions (ASC on one column, DESC on another) can cause the creation of temporary tables and a filesort. MySQL up to and including v5.7 doesn't handle such scenarios well at all, and there is actually no point in indexing the fields in the ORDER BY clause, as MySQL's optimizer will never use them.
Therefore, if the application's requirements allow it, it's best to use the same direction for all columns in the ORDER BY clause.
So in this case:
order by `ad_count` desc, `items`.`id` asc
Will become:
order by `ad_count` desc, `items`.`id` desc
P.S. As a small tip to read more about: MySQL 8.0 is going to change things (it introduces true descending indexes), and these use cases might perform significantly better when it's released.
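On MySQL 8.0 or later, an index matching the mixed sort order becomes possible. A sketch (the index name is made up; note that before 8.0 the DESC keyword in an index definition was parsed but ignored):

```sql
-- MySQL 8.0+ only: a descending index that matches
-- ORDER BY ad_count DESC, id ASC exactly.
CREATE INDEX items_ad_count_id_index ON items (ad_count DESC, id ASC);
```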
Try replacing items_company_id_index with a multi-column index on (company_id, ad_count).
DROP INDEX items_company_id_index ON items;
CREATE INDEX items_company_id_ad_count_index ON items (company_id, ad_count);
This will allow it to use the index to test both conditions in the WHERE clause. Currently, it's using the index just to find non-null company_id, and then doing a full scan of those records to test ad_count. If most records have non-null company_id, it's scanning most of the table.
You don't need to retain the old index on just the company_id column, because a multi-column index is also an index on any prefix columns, because of the way B-trees work.
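In other words, after swapping in the composite index, queries filtering on company_id alone can still use it (the 'acme' value is just a placeholder):

```sql
-- Can use the (company_id, ad_count) index via its leading column:
SELECT * FROM items WHERE company_id = 'acme';

-- Can use both columns of the same index:
SELECT * FROM items WHERE company_id = 'acme' AND ad_count <= 100;
```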
I could be wrong here (depending on your SQL version this could be faster), but try an INNER JOIN with your companies table.
Like:
SELECT *
FROM items
INNER JOIN companies ON companies.id = items.company_id
WHERE items.ad_count <= 100
LIMIT 50;
Because of your high index count, maintaining the B-trees will slow the database down each time a new row is inserted. Maybe remove the index on ad_count? (This depends on how often you use that column in queries.)

Simple query optimization (WHERE + ORDER + LIMIT)

I have this query that runs unbelievably slow (4 minutes):
SELECT * FROM `ad` WHERE `ad`.`user_id` = USER_ID ORDER BY `ad`.`id` desc LIMIT 20;
Ad table has approximately 10 million rows.
SELECT COUNT(*) FROM `ad` WHERE `ad`.`user_id` = USER_ID;
Returns 10k rows.
Table has following indexes:
PRIMARY KEY (`id`),
KEY `idx_user_id` (`user_id`,`status`,`sorttime`),
EXPLAIN gives this:
id: 1
select_type: SIMPLE
table: ad
type: index
possible_keys: idx_user_id
key: PRIMARY
key_len: 4
ref: NULL
rows: 4249
Extra: Using where
I am failing to understand why it takes so long. Also, this query is generated by an ORM (pagination), so it would be nice to optimize it from the outside (maybe add some extra index).
BTW this query works fast:
select aa.*
from (select id from ad where user_id=USER_ID order by id desc limit 20) as a
join ad as aa on a.id = aa.id ;
Edit: I tried another user with far fewer rows (dozens) than the original one. I am wondering why the original query doesn't use idx_user_id:
EXPLAIN SELECT * FROM `ad` WHERE `ad`.`user_id` = ANOTHER_ID ORDER BY `ad`.`id` desc LIMIT 20;
id: 1
select_type: SIMPLE
table: ad
type: ref
possible_keys: idx_user_id
key: idx_user_id
key_len: 3
ref: const
rows: 84
Extra: Using where; Using filesort
Edit2: With Alexander's help I decided to try to force MySQL to use the index I want, and the following query is much faster (1 sec instead of 4 min):
SELECT *
FROM `ad` USE INDEX (idx_user_id)
WHERE `ad`.`user_id` = 1884774
ORDER BY `ad`.`id` desc LIMIT 20;
In the EXPLAIN output you can see that the key value is PRIMARY. This means that the MySQL optimizer decided it would be faster to scan all table records (which are already sorted by id) and pick out the first 20 records with the specific user_id value than to use the idx_user_id key, which it considered as a possible key and then rejected.
In your second query the optimizer sees that only id values are necessary in the subquery, and decides to use the idx_user_id index instead, as that index allows it to calculate the list of necessary ids without touching the table itself. Then only 20 records are retrieved by direct lookup on the primary key value, which is a very fast operation for that small number of records.
As your query with ANOTHER_ID shows, MySQL's wrong decision was based on the number of rows for the previous USER_ID value. This number was so big that the optimizer guessed it would find the first 20 records with this specific user_id faster just by scanning the table records themselves and skipping records with wrong user_id values.
If table rows are accessed by index, random access operations are required. On a typical HDD, random access is about 100 times slower than a sequential scan, so for an index to be useful it must reduce the row count to less than roughly 1% of the total. If the rows for the specific USER_ID value account for more than 1% of the total number of rows, it may be more efficient to do a full table scan than to use the index, if we want to retrieve all of these rows. But the MySQL optimizer doesn't take into account the fact that only 20 of these rows will be retrieved, so it mistakenly decided not to use the index and to do a full table scan instead.
In order to make your query fast for any user_id value you can add one more index which will allow the query execution in the fastest way possible:
create index idx_user_id_2 on ad(user_id, id);
This index allows MySQL to do both the filtering and the sorting. To achieve that, the columns used for filtering are placed first and the columns used for ordering second. MySQL should be smart enough to use this index, because it allows finding all the necessary records without skipping any.

Cost of a count statement for leaderboard ranking?

I'm using mysql for a game. I have a scores table of approximately 150,000 records. The table looks like:
fk_user_id | high_score
The high_score column is an int. It has an index on it. I want to figure out a user's rank by running the following:
SELECT COUNT(*) AS count FROM scores WHERE high_score >= [x]
so supplying a user's current high_score to the above, I can get their rank. The idea would be that every time the user looks at a profile page, I would run the above to get the rank.
I'm wondering how expensive this is, and if I should even go down this path. Is mysql scanning the entire table every time the query is issued? Is this a crazy idea?
Update: Here's what 'explain' says about the query:
id: 1
select_type: SIMPLE
table: scores
type: range
possible_keys: high_score
key: high_score
key_len: 5
ref: null
rows: 1
extra: Using where; Using index
Thanks
Without an index, MySQL would scan the entire table for every such query.
Why use COUNT(*)? Can't you use COUNT(DISTINCT user_id) or COUNT(user_id)?
You should already have that column indexed, and I'm sure it would return your results accurately.
SELECT COUNT(distinct user_ID) AS count FROM scores WHERE high_score >= [x]
If high_score is indexed, the cost is relatively small; if not, a full table scan is made.
"Relatively small" means it just reads the row IDs from the index and counts them, which is a very low cost.
You can always prefix the query with EXPLAIN to check exactly what the database is doing to fetch your data.

Force query optimizer to use Primary key

I'm running a query that spans 3 tables, none of which have > 55K rows. This query is taking 20+ seconds to run, which seems excessive:
SELECT
`cp`.`author`,
`cc`.`contents`
FROM
`challenge_properties` as `cp`,
`challenges` as `c`,
`challenge_contents` as `cc`
WHERE
`cp`.`followup_id` = `c`.`latest_followup` AND
`cp`.`status` = 'new' AND
`c`.`id` = `cp`.`challenge_id` AND
`c`.`id` = `cc`.`challenge_id`
This is the result of EXPLAINing that query:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: c
type: ALL
possible_keys: PRIMARY,latest_followup_index
key: NULL
key_len: NULL
ref: NULL
rows: 13817
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: cc
type: ref
possible_keys: challenge_id
key: challenge_id
key_len: 5
ref: cts.c.id
rows: 1
Extra: Using where
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: cp
type: ref
possible_keys: challenge_id,followup_id
key: followup_id
key_len: 5
ref: cts.c.latest_followup
rows: 1
Extra: Using where
As you can see, the first table, challenges has a primary key, but it's not being used. I've tried adding the FORCE KEY(PRIMARY) clause to the challenges table declaration, but it's still not used.
What can I do to speed up this query? Thanks.
Your query is selecting ALL records from the challenges table, therefore there is no need to use any index on that table. Basically MySQL is selecting every record in challenges, then finding matching records in the other two tables.
Couldn't you just leave out the challenges table altogether? You're not selecting data from that table, and the only time it would limit the data selected is if your other tables had invalid challenge_ids, which foreign keys can take care of...
SELECT
`cp`.`author`,
`cc`.`contents`
FROM
`challenge_properties` as `cp`,
`challenge_contents` as `cc`
WHERE
`cp`.`status` = 'new' AND
`cp`.`challenge_id` = `cc`.`challenge_id`
EDIT: You say you can't remove the challenges table from the query... I would try specifying your JOIN conditions in the JOIN clause instead of the WHERE:
SELECT
`cp`.`author`,
`cc`.`contents`
FROM `challenge_properties` as `cp`
JOIN `challenges` as `c`
ON `cp`.`challenge_id` = `c`.`id`
AND `cp`.`followup_id` = `c`.`latest_followup`
JOIN `challenge_contents` as `cc`
ON `cc`.`challenge_id` = `c`.`id`
WHERE `cp`.`status` = 'new'
The query optimizer might already do this for you, but it doesn't hurt to try it, and I think it's easier to see how the joins are happening with this syntax.
You could also try adding another index to challenge_properties on (challenge_id, followup_id), and another to challenges on (id, latest_followup); the composite keys might help MySQL work quicker. But it's also possible that the problem lies outside your query... usually, when you EXPLAIN and see only one table with big numbers in the rows column, your query is pretty well optimized. MySQL is only looking at one row in challenge_properties and one row in challenge_contents, and scanning every row in challenges to find a match.
EDIT 2:
Unfortunately, I'm not sure what else can be done to optimize this query. You can get slightly more performance if the indexes used (cc.challenge_id and cp.followup_id) are UNIQUE NOT NULL indexes, and you should get better performance with a complex index for cp on (cp.challenge_id, cp.followup_id). This would turn those type: ref into type: eq_ref, which is slightly better. But that's about it... do you not have problems with any other queries? Your query should theoretically return 13817 rows... is the amount of data possibly the problem? Does it speed up significantly if you just select COUNT(*) instead of returning all the rows?

MySQL query optimization. Avoiding temporary & filesort

Currently I have a table with close to 1 million rows which I need to query from. What I need to be able to do is stack-rank packages by the number of products they include from a given list of product IDs.
SELECT count(productID) AS commonProducts, packageID
FROM supply
WHERE productID IN (2,3,4,5,6,7,8,9,10)
GROUP BY packageID
ORDER BY commonProducts
DESC LIMIT 10
The query works fine, but I would like to improve upon it. I tried a multi-column index on productID and packageID, but it seemed to seek more rows than just having a separate index for each of the columns.
MySQL Explain
select_type: SIMPLE
table: supply
type: range
possible_keys: supplyID
key: supplyID
key_len: 3
ref: null
rows: 996
extra: Using where; Using temporary; Using filesort
My main concern is that the query uses a temporary table and a filesort. How could I go about optimizing this query? I presume the biggest issues are COUNT() and the ORDER BY on the result of COUNT().
You can remove the temp table using a dependent subquery:
select * from
(
SELECT COUNT(s.productID) AS commonProducts, s.packageID
FROM supply AS s
WHERE EXISTS
(
select 1 from supply as innerS
where innerS.productID IN (2,3,4,5,6,7,8,9,10)
and s.productID = innerS.productID
)
GROUP BY s.packageID
) AS t
ORDER BY t.commonProducts
DESC LIMIT 10
The inner query links to the outer query and preserves the index. You'll find that any query that sorts on commonProducts, including the one above, will use a filesort, since COUNT(*) is definitely not indexed. But fear not: filesort is just a fancy word for sort (MySQL can choose an efficient in-memory sort), and whether you do it now or as a merge sort on the way to an indexed temporary table, you have to pay for that sorting somewhere. This case is actually pretty good, because the filesort can stop sorting once it hits the LIMIT you've put in place. It will not sort the entire list of commonProducts.
Update
If this query is going to be run all the time, I would recommend (without getting too fancy) setting triggers on the supply table to update a legitimate table that tracks counters like this one.
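A rough illustration of that idea. Everything below is an assumption about the schema: the counter table, trigger name, and columns are invented for the sketch, a matching DELETE trigger is omitted, and the counter table would need to be seeded from the existing supply rows first.

```sql
-- Hypothetical counter table: one row per (packageID, productID) pair.
CREATE TABLE supply_counts (
  packageID INT NOT NULL,
  productID INT NOT NULL,
  n         INT NOT NULL DEFAULT 0,
  PRIMARY KEY (packageID, productID)
);

DELIMITER //
CREATE TRIGGER supply_after_insert
AFTER INSERT ON supply
FOR EACH ROW
BEGIN
  -- Bump the counter for the pair, creating the row on first sight.
  INSERT INTO supply_counts (packageID, productID, n)
  VALUES (NEW.packageID, NEW.productID, 1)
  ON DUPLICATE KEY UPDATE n = n + 1;
END//
DELIMITER ;
```

The ranking query can then aggregate the much smaller supply_counts table instead of scanning supply itself.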
Creating a temporary result set:
SELECT TMP.*
FROM ( SELECT count(productID) AS commonProducts, packageID
FROM supply
WHERE productID IN (2,3,4,5,6,7,8,9,10)
GROUP BY packageID
) AS TMP
ORDER BY commonProducts
DESC LIMIT 10
Perhaps it's not the most elegant way and I cannot guarantee it will be faster because everything depends on your particular data. But in some cases this gives much better results:
SELECT count(*) AS commonProducts, packageID
FROM (
SELECT packageID FROM supply WHERE productID = 2
UNION ALL
SELECT packageID FROM supply WHERE productID = 3
UNION ALL
.
.
.
SELECT packageID FROM supply WHERE productID = 10
) AS t
GROUP BY packageID
ORDER BY commonProducts DESC
LIMIT 10