Force query optimizer to use primary key - MySQL

I'm running a query that spans 3 tables, none of which have > 55K rows. This query is taking 20+ seconds to run, which seems excessive:
SELECT
`cp`.`author`,
`cc`.`contents`
FROM
`challenge_properties` as `cp`,
`challenges` as `c`,
`challenge_contents` as `cc`
WHERE
`cp`.`followup_id` = `c`.`latest_followup` AND
`cp`.`status` = 'new' AND
`c`.`id` = `cp`.`challenge_id` AND
`c`.`id` = `cc`.`challenge_id`
This is the result of EXPLAINing that query:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: c
type: ALL
possible_keys: PRIMARY,latest_followup_index
key: NULL
key_len: NULL
ref: NULL
rows: 13817
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: cc
type: ref
possible_keys: challenge_id
key: challenge_id
key_len: 5
ref: cts.c.id
rows: 1
Extra: Using where
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: cp
type: ref
possible_keys: challenge_id,followup_id
key: followup_id
key_len: 5
ref: cts.c.latest_followup
rows: 1
Extra: Using where
As you can see, the first table, challenges, has a primary key, but it's not being used. I've tried adding the FORCE KEY(PRIMARY) clause to the challenges table declaration, but it's still not used.
What can I do to speed up this query? Thanks.

Your query is selecting ALL records from the challenges table — therefore there is no need to use any index on the records from that table. Basically MySQL is selecting every record in challenges, then finding matching records in the other two tables.
Couldn't you just leave out the challenges table altogether? You're not selecting data from that table, and the only time that table would limit the data selected would be if your other tables had invalid challenge_ids, which foreign keys can take care of...
SELECT
`cp`.`author`,
`cc`.`contents`
FROM
`challenge_properties` as `cp`,
`challenge_contents` as `cc`
WHERE
`cp`.`status` = 'new' AND
`cp`.`challenge_id` = `cc`.`challenge_id`
EDIT: You say you can't remove the challenges table from the query... I would try specifying your JOIN conditions in the JOIN clause instead of the WHERE:
SELECT
`cp`.`author`,
`cc`.`contents`
FROM `challenge_properties` as `cp`
JOIN `challenges` as `c`
ON `cp`.`challenge_id` = `c`.`id`
AND `cp`.`followup_id` = `c`.`latest_followup`
JOIN `challenge_contents` as `cc`
ON `cc`.`challenge_id` = `c`.`id`
WHERE `cp`.`status` = 'new'
The query optimizer might already do this for you, but it doesn't hurt to try it, and I think it's easier to see how the joins are happening with this syntax.
You could also try adding another index to challenge_properties on (challenge_id, followup_id), and another to challenges on (id, latest_followup); the composite key might help MySQL work more quickly. But it's also possible that the problem might be outside your query... usually when you EXPLAIN and see only one table with big numbers in the rows column, your query is pretty well optimized. MySQL is only looking at one row in challenge_properties and one row in challenge_contents, and scanning every row in challenges to find a match.
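For reference, those two indexes could be added with statements like these (a sketch only; the index names are invented):
ALTER TABLE challenge_properties ADD INDEX idx_cp_challenge_followup (challenge_id, followup_id);
ALTER TABLE challenges ADD INDEX idx_c_id_latest_followup (id, latest_followup);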
EDIT 2:
Unfortunately, I'm not sure what else can be done to optimize this query. You can get slightly more performance if the indexes used (cc.challenge_id and cp.followup_id) are UNIQUE NOT NULL indexes, and you should get better performance with a composite index for cp on (cp.challenge_id, cp.followup_id). This would turn those type: ref lookups into type: eq_ref, which is slightly better. But that's about it... do you not also have problems with other queries? Your query should theoretically return 13817 rows... is the amount of data possibly the problem? Does it speed up significantly if you just select COUNT(*) instead of returning all the rows?
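A sketch of that refinement (assumptions on my side: the columns are nullable INTs, which the key_len of 5 suggests, each challenge has at most one challenge_contents row, and each (challenge_id, followup_id) pair appears at most once in challenge_properties; otherwise the UNIQUE keys cannot be created):
-- column type and key names are assumptions
ALTER TABLE challenge_contents MODIFY challenge_id INT NOT NULL;
ALTER TABLE challenge_contents ADD UNIQUE KEY uq_cc_challenge_id (challenge_id);
ALTER TABLE challenge_properties ADD UNIQUE KEY uq_cp_challenge_followup (challenge_id, followup_id);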

Related

Should using USE/FORCE INDEX change the EXPLAIN output in a MySQL query?

As the title suggests, should the EXPLAIN output change after explicitly using FORCE INDEX (index_1, index_2) in a query?
As an example I have the following query:
select
person_id,
role_id,
scope_id,
count(distinct qualification_id) as ncomps
from dw_rolepersonqualification
where ((mandatory = 'y') and (expiry_date > now()))
group by 1, 2, 3
When I run it with EXPLAIN, I get:
id: 1
select_type: SIMPLE
table: dw_rolepersonqualification
type: ALL
possible_keys: PRIMARY, idx_person, idx_role, idx_qualification, idx_scope, idx_mandatory
key: NULL
key_len: NULL
ref: NULL
rows: 8267852
Extra: Using where; Using filesort
When I add FORCE INDEX (idx_person, idx_role, idx_qualification, idx_scope) it does not change the output of EXPLAIN. Is this to be expected, or am I missing something?
The optimal index for that query is
INDEX(mandatory, -- tested with "=", so first
expiry_date) -- a range
Even so, it may decide that the index is not worth the effort. If the Optimizer estimates that more than ~20% of the table matches the WHERE clause, it will decide to scan the table rather than bouncing between the index's BTree and the data BTree.
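To get a rough idea of whether you are past that threshold, a quick count comparison (my addition, just a sanity check) shows how selective the WHERE clause actually is:
-- if matching_rows is well below ~20% of total_rows, the index above should be attractive
SELECT SUM(mandatory = 'y' AND expiry_date > NOW()) AS matching_rows,
       COUNT(*) AS total_rows
FROM dw_rolepersonqualification;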
A "covering" index may (or may not) be better:
INDEX(mandatory, -- tested with "=", so first
expiry_date, -- a range
person_id, role_id, scope_id,
qualification_id) -- all other touched columns (any order)
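As concrete statements, those two suggestions might look like the following (a sketch only; the index names are invented, and whether the covering variant pays off depends on the table):
ALTER TABLE dw_rolepersonqualification
  ADD INDEX idx_mand_expiry (mandatory, expiry_date);
ALTER TABLE dw_rolepersonqualification
  ADD INDEX idx_mand_expiry_covering (mandatory, expiry_date, person_id, role_id, scope_id, qualification_id);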
(Caveat: Some of what I say may not be valid; please provide SHOW CREATE TABLE.)
My Mantra: "Force Index may help today, but hurt tomorrow."

Simple SQL Query takes 10-20 times more with "ORDER BY"

Simple SQL query takes 10 to 20 times longer with "ORDER BY". How can I speed it up?
My first query was:
SELECT *
FROM wp_usermeta
WHERE meta_key = 'partner'
AND meta_value = 1
ORDER BY user_id DESC
LIMIT 5
It takes 0.2601 seconds. After some research I could optimize it to:
SELECT user_id
FROM wp_usermeta
WHERE meta_key = 'partner'
AND meta_value = '1'
ORDER BY umeta_id DESC
LIMIT 5
This query takes just 0.1491 seconds, but that is still too much. If I remove the ORDER BY, it takes only 0.0075 seconds.
I read a lot on Stack Overflow and other forums, but I could not get a better result. Does anyone have an idea?
It is a standard WordPress usermeta table.
The wp_usermeta table in WordPress is well-known and it has a single-column index on meta_key.
But this selects all rows with the specified key, which doesn't narrow down the search much. It also doesn't help the sorting, so the query must do extra work (a filesort) to sort the results:
mysql> explain SELECT * FROM wp_usermeta WHERE meta_key = 'partner' AND meta_value = 1 ORDER BY user_id DESC LIMIT 5\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: wp_usermeta
type: ref
possible_keys: meta_key
key: meta_key
key_len: 767
ref: const
rows: 1
Extra: Using where; Using filesort
Adding a new index should help:
mysql> alter table wp_usermeta add key (meta_key(191), meta_value(191), user_id);
mysql> explain SELECT * FROM wp_usermeta WHERE meta_key = 'partner' AND meta_value = 1 ORDER BY user_id DESC LIMIT 5\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: wp_usermeta
type: ref
possible_keys: meta_key_2,meta_key
key: meta_key_2
key_len: 767
ref: const
rows: 1
Extra: Using where; Using filesort
Even though this shows it's using the new index (meta_key_2), it's not helping. The key_len and ref indicate it's only using the first column of the index. Why can't it use both columns?
Because your query compares the integer value 1 to a string column meta_value. You must compare similar types, i.e. string '1' to the string column:
mysql> explain SELECT * FROM wp_usermeta WHERE meta_key = 'partner' AND meta_value = '1' ORDER BY user_id DESC LIMIT 5\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: wp_usermeta
type: ref
possible_keys: meta_key_2,meta_key
key: meta_key_2
key_len: 1534
ref: const,const
rows: 1
Extra: Using where
Now it's able to use the second column in the index to search for the value '1'; you can tell because key_len: 1534 and ref: const,const indicate it's using two columns of the index instead of one.
Then the optimizer realizes it's already reading the data in order by user_id, so there's no need to sort. The "Using filesort" goes away.
WP has an inefficient schema for its "meta" tables. But they can be fixed. In [here], I discuss several things that need fixing. And I explain the 191 kludge, plus 5 options for avoiding it.
And I don't get into "index merge intersect", since a composite index is always(?) better.

MySQL Slow query for 'COUNT'

The following query takes 0.7s on a 2.5 GHz dual-core Windows Server 2008 R2 Enterprise machine when run over a 4.5 GB MySQL database. sIndex10 is a varchar(1024) column:
SELECT COUNT(*) FROM e_entity
WHERE meta_oid=336799 AND sIndex10 = ''
An EXPLAIN shows the following information:
id: 1
select_type: SIMPLE
table: e_entity
type: ref
possible_keys: App_Parent,sIndex10
key: App_Parent
key_len: 4
ref: const
rows: 270066
Extra: Using where
There are 230060 rows matching the first condition and 124216 rows matching both conditions combined. meta_oid is indexed, and although sIndex10 is also indexed, I think MySQL is correctly not picking that index up, since FORCE INDEX (sIndex10) makes the query take longer.
We have had a look at configuration parameters such as innodb_buffer_pool_size and they look correct as well.
Given that this table already has 642532 records, have we reached the limit of the performance MySQL can offer? Is investing in hardware the only way forward at this point?
WHERE meta_oid=336799 AND sIndex10 = ''
begs for a composite index
INDEX(meta_oid, sIndex10) -- in either order
That is not the same as having separate indexes on the columns.
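A sketch of that statement (the index name is made up, and since sIndex10 is varchar(1024) a column prefix may be needed to stay under InnoDB's index key length limit; 191 characters is safe for utf8mb4 under the old 767-byte limit):
ALTER TABLE e_entity ADD INDEX idx_meta_oid_sindex10 (meta_oid, sIndex10(191));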
That's all there is to it.
Index Cookbook
One thing I always do is just count(id); since id is (nearly) always indexed, counting just the id only has to look at the index.
So try running this and see if it performs any better. You should also add SQL_NO_CACHE when testing to get a better idea of how the query performs.
SELECT SQL_NO_CACHE COUNT(id) FROM e_entity
WHERE meta_oid=336799 AND sIndex10 = ''
Note: This is probably not the complete answer for your question, but it was too long for just a comment.

Simple query optimization (WHERE + ORDER + LIMIT)

I have this query that runs unbelievably slow (4 minutes):
SELECT * FROM `ad` WHERE `ad`.`user_id` = USER_ID ORDER BY `ad`.`id` desc LIMIT 20;
The ad table has approximately 10 million rows.
SELECT COUNT(*) FROM `ad` WHERE `ad`.`user_id` = USER_ID;
This returns a count of about 10k rows.
Table has following indexes:
PRIMARY KEY (`id`),
KEY `idx_user_id` (`user_id`,`status`,`sorttime`),
EXPLAIN gives this:
id: 1
select_type: SIMPLE
table: ad
type: index
possible_keys: idx_user_id
key: PRIMARY
key_len: 4
ref: NULL
rows: 4249
Extra: Using where
I am failing to understand why it takes so long. Also, this query is generated by an ORM (pagination), so it would be nice to optimize it from the outside (maybe by adding some extra index).
BTW this query works fast:
select aa.*
from (select id from ad where user_id=USER_ID order by id desc limit 20) as a
join ad as aa on a.id = aa.id ;
Edit: I tried another user with far fewer rows (dozens) than the original one. I am wondering why the original query doesn't use idx_user_id:
EXPLAIN SELECT * FROM `ad` WHERE `ad`.`user_id` = ANOTHER_ID ORDER BY `ad`.`id` desc LIMIT 20;
id: 1
select_type: SIMPLE
table: ad
type: ref
possible_keys: idx_user_id
key: idx_user_id
key_len: 3
ref: const
rows: 84
Extra: Using where; Using filesort
Edit 2: with the help of Alexander I decided to try forcing MySQL to use the index I want, and the following query is much faster (1 sec instead of 4 mins):
SELECT *
FROM `ad` USE INDEX (idx_user_id)
WHERE `ad`.`user_id` = 1884774
ORDER BY `ad`.`id` desc LIMIT 20;
In the EXPLAIN output you can see that the key value is PRIMARY. This means that the MySQL optimizer decided it is faster to scan all table records (which are already sorted by id) and find the first 20 records with the specific user_id value than to use the idx_user_id key, which was considered by the optimizer as a possible key and then rejected.
In your second query the optimizer sees that only id values are needed in the subquery, so it decided to use the idx_user_id index instead, as that index allows it to build the list of necessary ids without touching the table itself. Then only 20 records are retrieved by a direct primary-key lookup, which is a very fast operation for that small number of records.
As your query with ANOTHER_ID shows, MySQL's wrong decision was based on the number of rows for the previous USER_ID value. This number was so big that the optimizer guessed it would find the first 20 records with this specific user_id faster just by scanning the table records themselves and skipping the rows with the wrong user_id values.
If table rows are accessed through an index, random access operations are required. For a typical HDD, random access is about 100 times slower than a sequential scan, so for an index to be useful it must reduce the row count to less than roughly 1% of the total. If the rows for the specific USER_ID value account for more than 1% of the total number of rows, it may be more efficient to do a full table scan than to use the index, if we want to retrieve all of those rows. But the MySQL optimizer doesn't take into account the fact that only 20 of these rows will be retrieved, so it mistakenly decided not to use the index and to do a full table scan instead.
In order to make your query fast for any user_id value, you can add one more index that allows the query to execute in the fastest way possible:
create index idx_user_id_2 on ad(user_id, id);
This index allows MySQL to do both the filtering and the sorting. For that to work, the columns used for filtering must come first and the columns used for ordering second. MySQL should be smart enough to use this index, because it lets MySQL find all the necessary records without skipping any.
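If the optimizer still prefers the PRIMARY key for that heavy user (it should not, but its estimates can be off), the same hint technique from Edit 2 can be pointed at the new index; this is just a sketch reusing the user_id from above:
SELECT *
FROM `ad` USE INDEX (idx_user_id_2)
WHERE `ad`.`user_id` = 1884774
ORDER BY `ad`.`id` desc LIMIT 20;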

Why does this query cause lock wait timeouts?

Our team just spent the last week debugging and trying to find the source of many MySQL lock timeouts and extremely long-running queries. In the end it appears that this query is the culprit.
mysql> explain
SELECT categories.name AS cat_name,
COUNT(distinct items.id) AS category_count
FROM `items`
INNER JOIN `categories` ON `categories`.`id` = `items`.`category_id`
WHERE `items`.`state` IN ('listed', 'reserved')
AND (items.category_id IS NOT NULL)
GROUP BY categories.name
ORDER BY category_count DESC
LIMIT 10\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: items
type: range
possible_keys: index_items_on_category_id,index_items_on_state
key: index_items_on_category_id
key_len: 5
ref: NULL
rows: 119371
Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: categories
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: production_db.items.category_id
rows: 1
Extra:
2 rows in set (0.00 sec)
I can see that it is doing a nasty table scan and creating a temporary table to run.
Why would this query cause database response times to go up by a factor of ten and some queries that usually take 40-50ms (updates on items table), to explode to 50,000 ms and higher at times?
It's hard to tell without more information, like:
Is that running inside a transaction?
If so, what's the isolation level?
How many categories are there?
How many items?
My guess would be that the query is slow and running inside a transaction (which it probably is, since you have this problem), and is probably issuing range locks on the items table. Those locks don't allow writes to proceed, which slows the updates until they can get a lock on the table.
And I have a couple of comments based on what I can see from your query and execution plan:
1) Your items.state would probably be better as a catalog (lookup) table, instead of having the string on every row in items; this saves space, and comparing IDs is much faster than comparing strings (regardless of whatever optimizations the engine may do). See the sketch after point 2.
2) I am guessing items.state is a column with low cardinality (few unique values), so an index on that column is probably hurting you more than helping you. Every index adds overhead when inserting/deleting/updating rows, since the indexes have to be maintained, and this particular index is probably not used enough to be worthwhile. Of course, I am just guessing; it depends on the rest of the queries.
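A minimal sketch of point 1), purely illustrative; the table name item_states and the column state_id are made up, and the only states I know about from your query are 'listed' and 'reserved':
-- hypothetical lookup ("catalog") table for item states
CREATE TABLE item_states (
  id   TINYINT UNSIGNED NOT NULL PRIMARY KEY,
  name VARCHAR(20) NOT NULL UNIQUE
);
INSERT INTO item_states (id, name) VALUES (1, 'listed'), (2, 'reserved');
-- items would then carry a small state_id instead of repeating the string:
-- ALTER TABLE items ADD COLUMN state_id TINYINT UNSIGNED NOT NULL,
--   ADD FOREIGN KEY (state_id) REFERENCES item_states (id);
Here is the original query again with my notes inline: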
SELECT
-- Grouping by name means comparing strings.
categories.name AS cat_name,
-- No need for distinct, the same item.id cannot belong to different categories
COUNT(distinct items.id) AS category_count
FROM `items`
INNER JOIN `categories` ON `categories`.`id` = `items`.`category_id`
WHERE `items`.`state` IN ('listed', 'reserved')
-- Not needed, the inner join gets rid of items with no category_id
AND (items.category_id IS NOT NULL)
GROUP BY categories.name
ORDER BY category_count DESC
LIMIT 10\G
The way this query is structured, it basically has to scan the entire items table (since it's using the category_id index), then filter by the WHERE clause, then join with the categories table, which means an index seek on the primary key (categories.id) for every row in the items result set. Then it groups by name (using string comparison) to count, and finally discards everything but 10 of the results.
I would write the query like:
SELECT categories.name, counts.n
FROM (SELECT category_id, COUNT(id) n
FROM items
WHERE state IN ('listed', 'reserved') AND category_id is not null
GROUP BY category_id ORDER BY COUNT(id) DESC LIMIT 10) counts
JOIN categories on counts.category_id = categories.id
ORDER BY counts.n desc
(I am sorry if the syntax ain't perfect; I am not running MySQL right now.)
With this query, what the engine will probably do is:
Use the items.state index to get the 'listed' and 'reserved' items, group by category_id (comparing numbers, not strings), keep only the 10 largest counts, and then join with categories to get the names (using only 10 index seeks).
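One more idea, not part of the plan above (an assumption on my side; it presumes InnoDB, where secondary indexes implicitly contain the primary key): a composite index on (state, category_id) would let the derived table be computed from the index alone, without touching the items rows at all. The index name here is made up:
ALTER TABLE items ADD INDEX idx_items_state_category (state, category_id);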