In the MySQL manual there is a page on index hinting that mentions that you can specify the index hinting for specific parts of the query.
You can specify the scope of an index hint by adding a FOR clause to the hint. This provides more fine-grained control over the optimizer's selection of an execution plan for various phases of query processing. To affect only the indexes used when MySQL decides how to find rows in the table and how to process joins, use FOR JOIN. To influence index usage for sorting or grouping rows, use FOR ORDER BY or FOR GROUP BY.
However, there is little to no further information about how this works or what it actually does inside the MySQL optimizer. In practice it also appears to make a negligible difference.
Here is a test query and what EXPLAIN says about it:
SELECT
`property`.`primary_id` AS `id`
FROM `California` `property`
USE INDEX FOR JOIN (`Zipcode Bedrooms`)
USE INDEX FOR ORDER BY (`Zipcode Bathrooms`)
INNER JOIN `application_zipcodes` `az`
ON `az`.`application_id` = '18'
AND `az`.`zipcode` = `property`.`zipcode`
WHERE `property`.`city` = 'San Jose'
AND `property`.`zipcode` = '95133'
AND `property`.`property_type` = 'Residential'
AND `property`.`style` = 'Condominium'
AND `property`.`bedrooms` = '3'
ORDER BY `property`.`bathrooms` ASC
LIMIT 15
;
Explain:
EXPLAIN SELECT `property`.`primary_id` AS `id` FROM `California` `property` USE INDEX FOR JOIN (`Zipcode Bedrooms`) USE INDEX FOR ORDER BY (`Zipcode Bathrooms`) INNER JOIN `application_zipcodes` `az` ON `az`.`application_id` = '18' AND `az`.`zipcode` = `property`.`zipcode` WHERE `property`.`city` = 'San Jose' AND `property`.`zipcode` = '95133' AND `property`.`property_type` = 'Residential' AND `property`.`style` = 'Condominium' AND `property`.`bedrooms` = '3' ORDER BY `property`.`bathrooms` ASC LIMIT 15\g
+------+-------------+----------+--------+---------------+---------+---------+------------------------------------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------+--------+---------------+---------+---------+------------------------------------+------+----------------------------------------------------+
| 1 | SIMPLE | Property | ref | Zip Bed | Zip Bed | 17 | const,const | 2364 | Using index condition; Using where; Using filesort |
| 1 | SIMPLE | az | eq_ref | PRIMARY | PRIMARY | 7 | const,Property.zipcode | 1 | Using where; Using index |
+------+-------------+----------+--------+---------------+---------+---------+------------------------------------+------+----------------------------------------------------+
2 rows in set (0.01 sec)
So to summarize, I am basically wondering how the index scope was meant to be used, as adding or removing the line USE INDEX FOR ORDER BY (`Zipcode Bathrooms`) doesn't seem to change anything.
I have yet to figure out how multiple hints can be used. MySQL will almost never use more than one index per table in a SELECT. The only exception I know of is "index merge", which is not relevant in your example.
The Optimizer usually focuses on finding a good index for the WHERE clause. If it entirely covers the WHERE, without any "ranges", then it checks to see if there are GROUP BY and ORDER BY fields, in the right order, to use. If it can handle all of WHERE, GROUP BY, and ORDER BY, then it can actually optimize the LIMIT (but not OFFSET).
If the Optimizer can't consume all of the WHERE, it may reach into the ORDER BY in hopes of avoiding the "filesort" that ORDER BY otherwise requires.
None of this allows for different indexes for different clauses. A single hint may encourage the use of one of the above cases in preference to the other; I don't know.
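To illustrate the "consume all of WHERE, then ORDER BY" case with the query above: every WHERE condition on `property` is an equality test, so a single compound index ending in the ORDER BY column could in principle satisfy both clauses. A minimal sketch, assuming the column names from the question (the index name is made up):
ALTER TABLE `California`
  ADD INDEX `zip_bed_filter_bath`
    (`zipcode`, `bedrooms`, `city`, `property_type`, `style`, `bathrooms`);
With the five equality columns leading and `bathrooms` last, the optimizer can read rows already in `bathrooms` order and stop after the LIMIT of 15, avoiding the filesort.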
Don't use utf8 for zipcode; it makes things bulkier than necessary (3 bytes per character). In general, shrinking the size of the table will help performance some. Or, if you have a huge dataset, it may help performance a lot. (Avoiding I/O is very important.) See the sketch below.
Bathrooms is not very selective; there is not much to gain even if it were possible.
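For example, a minimal sketch, assuming zipcode is currently a utf8 VARCHAR and only ever holds 5-character US zip codes:
ALTER TABLE `California`
  MODIFY `zipcode` CHAR(5) CHARACTER SET ascii NOT NULL;
This shrinks the column, and every index that contains it, from 3 bytes per character to 1.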
az.application_id is the big monkey wrench in the query; what is it?
Related
I'm running the following query on the table, changing the values in the WHERE condition; in one case it uses one index, and in the other case it picks a different (wrong?) index.
Row count for query 1 is 402954; it takes approx 1.5 sec.
Row count for query 2 is 52097; it takes approx 35 sec.
Queries 1 and 2 are identical; only the values in the WHERE condition change.
query 1
EXPLAIN SELECT
log_type,count(DISTINCT subscriber_id) AS distinct_count,
count(subscriber_id) as total_count
FROM campaign_logs
WHERE
domain = 'xxx' AND
campaign_id='123' AND
log_type IN ('EMAIL_SENT', 'EMAIL_CLICKED', 'EMAIL_OPENED', 'UNSUBSCRIBED') AND
log_time BETWEEN
CONVERT_TZ('2015-02-12 00:00:00','+05:30','+00:00') AND
CONVERT_TZ('2015-02-19 23:59:58','+05:30','+00:00')
GROUP BY log_type;
EXPLAIN of above query
+----+-------------+---------------+-------+------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+------+--------+-------------+
| 1 | SIMPLE | campaign_logs | range | campaign_id_index,domain_index,log_type_index,log_time_index,campaignid_domain_logtype_logtime_index | campaignid_domain_logtype_logtime_index | 468 | NULL | 402954 | Using where |
+----+-------------+---------------+-------+------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+------+--------+-------------+
query 2
EXPLAIN SELECT
log_type,count(DISTINCT subscriber_id) AS distinct_count,
count(subscriber_id) as total_count
FROM stats.campaign_logs
WHERE
domain = 'yyy' AND
campaign_id='345' AND
log_type IN ('EMAIL_SENT', 'EMAIL_CLICKED', 'EMAIL_OPENED', 'UNSUBSCRIBED') AND
log_time BETWEEN
CONVERT_TZ('2014-02-05 00:00:00','+05:30','+00:00') AND
CONVERT_TZ('2015-02-19 23:59:58','+05:30','+00:00')
GROUP BY log_type;
EXPLAIN of above query
+----+-------------+---------------+-------------+------------------------------------------------------------------------------------------------------+--------------------------------+---------+------+-------+------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------------+------------------------------------------------------------------------------------------------------+--------------------------------+---------+------+-------+------------------------------------------------------------------------------+
| 1 | SIMPLE | campaign_logs | index_merge | campaign_id_index,domain_index,log_type_index,log_time_index,campaignid_domain_logtype_logtime_index | campaign_id_index,domain_index | 153,153 | NULL | 52097 | Using intersect(campaign_id_index,domain_index); Using where; Using filesort |
+----+-------------+---------------+-------------+------------------------------------------------------------------------------------------------------+--------------------------------+---------+------+-------+------------------------------------------------------------------------------+
Query 1 is using the correct index because I have a composite index.
Query 2 is using an index merge, and it's taking a long time to execute.
Why is MySQL using different indexes for the same query?
I know we can specify USE INDEX in the query, but why is MySQL not picking the correct index in this case? Am I doing anything wrong?
No, you're not doing anything wrong.
As Chipmonkey stated in comments, sometimes MySQL will choose the wrong execution plan because of outdated table statistics. You can update the table statistics by performing ANALYZE TABLE.
Still, the MySQL optimizer isn't that sophisticated. It sees that in both cases it will have to visit a secondary index and then perform a lookup into the clustered index to get the actual table data. It guessed that the second query would have better selectivity using the two separate indexes merged together, so you can't blame it too much for guessing wrong.
I'm guessing that if you had a covering index, so that MySQL could perform the entire query with just the index, it would favor that index over performing a merge.
Try adding subscriber_id to the end of your multi-column index to get a covering index.
Otherwise, use USE INDEX or FORCE INDEX, because that's what they're there for. You know more about the data than MySQL does.
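A sketch of both options follows; the new index name is illustrative, and the FORCE INDEX variant reuses your existing composite index:
-- Option 1: make the composite index covering by appending subscriber_id
ALTER TABLE campaign_logs
  ADD INDEX cdlt_sub_covering
    (campaign_id, domain, log_type, log_time, subscriber_id);

-- Option 2: pin the existing composite index explicitly
SELECT log_type,
       COUNT(DISTINCT subscriber_id) AS distinct_count,
       COUNT(subscriber_id) AS total_count
FROM campaign_logs
FORCE INDEX (campaignid_domain_logtype_logtime_index)
WHERE domain = 'yyy'
  AND campaign_id = '345'
  AND log_type IN ('EMAIL_SENT', 'EMAIL_CLICKED', 'EMAIL_OPENED', 'UNSUBSCRIBED')
  AND log_time BETWEEN CONVERT_TZ('2014-02-05 00:00:00','+05:30','+00:00')
                   AND CONVERT_TZ('2015-02-19 23:59:58','+05:30','+00:00')
GROUP BY log_type;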
I suggest you try this:
Add this permutation of your compound index.
(campaign_id, domain, log_time, log_type, subscriber_id)
Change your query to remove the WHERE log_type IN() criterion, thus allowing the aggregate function to use all the records it finds in the range scan on log_time. Including subscriber_id in the index should allow the whole query to be satisfied directly from the index. That is, this is a covering index.
Finally, you can filter on your log_type values by wrapping the whole query in
SELECT *
FROM (/*the whole query*/) x
WHERE log_type IN
('EMAIL_SENT', 'EMAIL_CLICKED', 'EMAIL_OPENED', 'UNSUBSCRIBED')
ORDER BY log_type
This should give you better, and more predictable, performance.
(Unless the log_types you want are a tiny subset of the records, in which case please ignore this suggestion.)
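Putting those pieces together, a sketch of the full rewrite (the index name is made up):
ALTER TABLE campaign_logs
  ADD INDEX cd_time_type_sub
    (campaign_id, domain, log_time, log_type, subscriber_id);

SELECT *
FROM (
    SELECT log_type,
           COUNT(DISTINCT subscriber_id) AS distinct_count,
           COUNT(subscriber_id) AS total_count
    FROM campaign_logs
    WHERE domain = 'xxx'
      AND campaign_id = '123'
      AND log_time BETWEEN CONVERT_TZ('2015-02-12 00:00:00','+05:30','+00:00')
                       AND CONVERT_TZ('2015-02-19 23:59:58','+05:30','+00:00')
    GROUP BY log_type
) x
WHERE log_type IN ('EMAIL_SENT', 'EMAIL_CLICKED', 'EMAIL_OPENED', 'UNSUBSCRIBED')
ORDER BY log_type;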
When we log into our database with mysql-client and launch these queries:
first test query:
select a.*
from ads a
inner join searchs_titles s on s.id_ad = a.id
where match(s.label) against ('"bmw serie 3"' in boolean mode)
order by a.ranking asc limit 0, 10;
The result is:
10 rows in set (1 min 5.37 sec)
second test query:
select a.*
from ads a
inner join searchs_titles s on s.id_ad = a.id
where match(s.label) against ('"ford mondeo"' in boolean mode)
order by a.ranking asc limit 0, 10;
The result is:
10 rows in set (2 min 13.88 sec)
These queries are too slow. Is there a way to improve this?
The 'ads' table contains 2 million rows; triggers are set up to duplicate the data into 'searchs_titles', which contains the id, title, and label of each row in 'ads'.
Table 'ads' uses InnoDB, and 'searchs_titles' uses MyISAM with a fulltext index on the label field.
Do we have too many columns? Too many indexes? Too many rows?
Is it a bad query?
Thanks a lot for the time you will spend helping us!
Edit: added the EXPLAIN output:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | s | fulltext | id_ad,label | label | 0 | | 1 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | a | eq_ref | PRIMARY,id,id_2,id_3 | PRIMARY | 4 | XXXXXX.s.id_ad | 1 | |
Pro tip: Never use * in a SELECT statement in production software (unless you have a very good reason). By asking for all columns, you are denying the optimizer access to information about how best to exploit your indexes.
Observation: you're ordering by ads.ranking and taking ten results. But ads.ranking has very low cardinality -- according to that image in your question, it has 26 distinct values. Is your query working correctly?
Observation: You've said that the fulltext part of your search takes .77 seconds. I mean this part:
select s.id
from searchs_titles AS s
where match(s.label) against ('"ford mondeo"' in boolean mode)
That is good. It means we can focus on the rest of the query.
You also said you've been testing with the insertions to the table turned off. That's good because it rules out contention as a cause for the slow queries.
Suggestion: Create a suitable compound index for ads. For your present query, try an index on (id, ranking). This may allow your ORDER BY operation to avoid a full table scan.
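In DDL form, a one-line sketch (the index name is illustrative):
ALTER TABLE ads ADD INDEX id_ranking (id, ranking);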
Then, try this query to extract the set of ten a.id values you need, and then retrieve the data rows. This will exploit your compound index.
select z.*
from ads AS z
join ( select a.id, a.ranking
from ads AS a
inner join searchs_titles s on s.id_ad = a.id
where match(s.label) against ('"ford mondeo"' in boolean mode)
order by a.ranking asc
limit 0, 10
) AS b ON z.id = b.id
order by z.ranking
This uses a subquery to do the order by ... limit ... data-shuffling operation on a small subset of the columns. This should make the retrieval of the appropriate id values much faster. Then the outer query fetches the appropriate rows.
The bottom line is this: ORDER BY ... LIMIT ... can be a very expensive operation if it's done on lots of data. But if you can arrange for it to be done on a minimal choice of columns, and those columns are indexed correctly, it can be very fast.
I'm sure I must be doing something stupid, but as is often the case I can't figure out what it is.
I'm trying to run this query:
SELECT `f`.`FrenchWord`, `f`.`Pronunciation`, `e`.`EnglishWord`
FROM (`FrenchWords` f)
INNER JOIN `FrenchEnglishMappings` m ON `m`.`FrenchForeignKey`=`f`.`id`
INNER JOIN `EnglishWords` e ON `e`.`id`=`m`.`EnglishForeignKey`
WHERE `f`.`Pronunciation` = '[whatever]';
When I run it, what happens seems quite weird to me. I get the results of the query fine, 2 rows in about 0.002 seconds.
However, I also get a huge spike in CPU and SHOW PROCESSLIST shows two identical processes for that query with state 'Copying to tmp table on disk'. These seem to keep running endlessly until I kill them or the system freezes.
None of the tables involved is big - between 100k and 600k rows each. tmp_table_size and max_heap_table_size are both 16777216.
Edit: EXPLAIN on the statement gives:
Edit: reduced the key length of Pronunciation to 112.
+----+-------------+-------+--------+-------------------------------------------------------------+-----------------+---------+----------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------------------------------------------+-----------------+---------+----------------------------+------+----------------------------------------------+
| 1 | SIMPLE | f | ref | PRIMARY,Pronunciation | Pronunciation | 112 | const | 2 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | m | ref | tmpindex,CombinedIndex,FrenchForeignKey,EnglishForeignKey | tmpindex | 4 | dict.f.id | 1 | Using index |
| 1 | SIMPLE | e | eq_ref | PRIMARY,id | PRIMARY | 4 | dict.m.EnglishForeignKey | 1 | |
+----+-------------+-------+--------+-------------------------------------------------------------+-----------------+---------+----------------------------+------+----------------------------------------------+
I'd be grateful if someone could point out what might be causing this. What I really don't understand is what MySQL is doing - surely if the query is complete then it doesn't need to do anything else?
UPDATE
Thanks for all the responses. I learnt something from all of them. This query was made massively faster after following the advice of nrathaus. I added a PronunciationHash binary(16) column to FrenchWords that contains unhex( md5 ( Pronunciation ) ). That is indexed with a keylen of 16 (vs 600+ for the varchar index on Pronunciation), and queries are much faster now.
As the EXPLAIN shows, your key size is huge: 602 bytes. This forces MySQL to write the data out to disk.
You need to reduce the key length greatly; I believe the recommendation is to stay below 128.
I suggest you create a column called MD5_FrenchWord containing the MD5 hash of FrenchWord, and then use this column for the GROUP BY. This assumes that your GROUP BY only needs to match identical values rather than inspect the actual string.
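A sketch of that approach, using the PronunciationHash name the asker settled on in the update above (keeping the hash in sync on writes is left to a trigger or application code):
ALTER TABLE `FrenchWords`
  ADD COLUMN `PronunciationHash` BINARY(16),
  ADD INDEX `PronunciationHash_idx` (`PronunciationHash`);

UPDATE `FrenchWords`
SET `PronunciationHash` = UNHEX(MD5(`Pronunciation`));

-- then filter on the 16-byte hash instead of the 600+-byte varchar key:
-- WHERE `f`.`PronunciationHash` = UNHEX(MD5('[whatever]'))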
You are misusing GROUP BY. This clause is entirely pointless unless you also have a summary function such as MAX(something) or COUNT(*) in your SELECT clause.
Try removing GROUP BY and see if it helps.
It's not clear what you're trying to do with GROUP BY. But you might try SELECT DISTINCT if you're trying to dedup your result set.
Looking further at this question, it seems like you might benefit from a couple of compound indexes.
First, can you make sure your table declarations have NOT NULL in as many columns as possible?
Second, you're retrieving Pronunciation, FrenchWord, and id from your FrenchWords table, so try this compound index on that table. Your query will then be able to get what it needs directly from the index, saving a bunch of disk I/O. Notice that Pronunciation is mentioned first in the compound index declaration because that's the value you're searching for. This allows MySQL to do a lookup on the index and get the other information it needs directly from the index, without thrashing back to the table itself.
(Pronunciation, FrenchWord, id)
You're retrieving EnglishWord from EnglishWords, looking it up by id, so the same reasoning applies to this compound index:
(id, EnglishWord)
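In DDL form, a sketch of both suggestions (index names are illustrative):
ALTER TABLE `FrenchWords`
  ADD INDEX `pron_word_id` (`Pronunciation`, `FrenchWord`, `id`);

ALTER TABLE `EnglishWords`
  ADD INDEX `id_word` (`id`, `EnglishWord`);
(If Pronunciation is a long varchar, MySQL may require a prefix length here; see the key-length discussion above.)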
Finally, I can't tell what your ORDER BY is for, once you use SELECT DISTINCT. You might try getting rid of it. But it probably makes no difference.
Give this a try. If your MySQL server is still thrashing after you make these changes, you have some kind of configuration problem.
I was trying to optimize a NOT IN clause in MySQL. Somehow I ended up with the following query:
SELECT @i:=(SELECT correct_option_word_id FROM sent_question WHERE msisdn='abc');
SELECT * FROM word WHERE @i IS NULL OR word_id NOT IN (@i);
There is no relationship between the sent_question table and the word table, and I also cannot place an index on correct_option_word_id.
Can somebody please explain whether this method will actually optimize the query or not?
UPDATE: As mentioned here, both methods, NOT IN and LEFT JOIN/IS NULL, are almost equally efficient; that's why I don't want to use the LEFT JOIN/IS NULL method.
UPDATE 2:
Explain results for original query:
EXPLAIN SELECT * FROM word WHERE word_id NOT IN (SELECT correct_option_word_id FROM sent_question WHERE msisdn='abc');
+----+--------------------+---------------+------+-------------------------+-------------------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------------+------+-------------------------+-------------------------+---------+-------+------+-------------+
| 1 | PRIMARY | word | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
| 2 | DEPENDENT SUBQUERY | sent_question | ref | fk_question_subscriber1 | fk_question_subscriber1 | 48 | const | 1 | Using where |
+----+--------------------+---------------+------+-------------------------+-------------------------+---------+-------+------+-------------+
You're right that the NOT IN and LEFT JOIN/IS NULL methods are equally efficient. Unfortunately, there is no faster option, only slower ones (NOT EXISTS).
Here's your query, simplified:
SELECT *
FROM word
WHERE
word_id NOT IN (SELECT correct_option_word_id FROM sent_question WHERE msisdn='abc')
As you know, MySQL will do the subquery first and use the returned result set for the NOT IN clause. Then, it will scan through all of the rows in word to see if word_id is in the list for each row.
Unfortunately for this case, indexes are inclusive, not exclusive. They don't help with NOT queries. A covering index on word could potentially still be used to avoid accessing the actual table, and provide some IO benefits, but it won't be used in the traditional "lookup" sense. However, since you are returning all columns on the word table, it may not be viable to have such a large index.
The most important index that will be used here is an index on sent_question.msisdn for the subquery. Ensure that you have that index defined. A multi-column "covering" index on (msisdn, correct_option_word_id) would be best.
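For instance, a sketch (the index name is made up):
ALTER TABLE sent_question
  ADD INDEX msisdn_correct_option (msisdn, correct_option_word_id);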
If you share your design, we can probably offer some design solutions for optimization.
I doubt it'll work at all.
Try
SELECT *
FROM word AS w
LEFT JOIN sent_question AS sq
ON w.word_id = sq.correct_option_word_id AND sq.msisdn='abc'
WHERE sq.correct_option_word_id IS NULL
Give this simple query a try
SELECT
sent_question.*,
word.word_id AS foundWord
FROM sent_question
LEFT JOIN word
ON word.word_id = sent_question.correct_option_word_id
WHERE sent_question.msisdn='abc'
-- GROUP BY sent_question.correct_option_word_id -- shouldn't be needed, but included for completeness
HAVING foundWord IS NULL
Below is the structure of a table:
Article: ID, Title, Desc, PublishedDateTime, ViewsCount, Published
Primary Key: ID
Query Used:
Select Title FROM Article ORDER By ViewsCount DESC, PublishedDateTime ASC
As you can see, I am mixing ASC and DESC, and according to MySQL's ORDER BY optimization, indexes will not be used in that case.
I have thought of using a composite index on ViewsCount and PublishedDateTime. Do you recommend 2 separate keys instead of a composite index? I have read that a composite index is better than 2 separate keys (if both fields are going to be used).
Some more information shared:
The table contains more than 550K records, and adding and deleting indexes for testing purposes takes a very long time. What do you recommend? Should I test on a small sample?
Below are some more insights:
Indexes Used:
1) ViewsCount
2) PublishedDateTime.
3) ViewsCount & PublishedDateTime (named as ViewsDate_Index )
A) EXPLAIN Query mixing ASC and DESC:
EXPLAIN SELECT title FROM `article` ORDER BY ViewsCount DESC , PublishedDateTime ASC LIMIT 0 , 20
====+===============+=========+======+===============+=====+=========+======+========+================+
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | article | ALL | NULL | NULL| NULL | NULL | 550116 | Using filesort
====+===============+=========+======+===============+=====+=========+======+========+================+
B) EXPLAIN Query using the same sorting order:
EXPLAIN SELECT title FROM `article` ORDER BY ViewsCount DESC , PublishedDateTime DESC LIMIT 0 , 20
====+===============+=========+=======+===============+=================+=========+=============+========+================+
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | article | index | NULL | ViewsDate_Index | 16 | NULL | 550116 |
====+===============+=========+=======+===============+=================+=========+=============+========+================+
You can see that if ViewsCount and PublishedDateTime use the same sort order, then MySQL uses the ViewsDate_Index index. One thing I found strange is that possible_keys is NULL and yet an index is selected. Can someone explain the reason for this?
Also, any tips on adding indexes to this table would be welcome, because adding a new index takes a lot of time. Any workaround or help in this regard will be appreciated.
First of all, run the whole query live and see how it performs. When you've got some benchmarks down, plug the query into your MySQL console and prepend EXPLAIN to it. MySQL will not perform the query, but it will display its plan to execute it, including where it thinks it is important to optimize, which indexes it will use, how many rows it has to traverse, and how efficiently it will traverse each set of rows, among other things. The best way to gauge a performance problem is through benchmarking. Use it often.
In practice, indexes won't be used even for ORDER BY ViewsCount, PublishedDateTime here, since you select all columns and apply no condition. Is this a real query? Any conditions will spoil your optimizations.
If your table is so small that you are going to pull it in whole, indexes will only slow your query down. (This relates to the original query: SELECT * FROM article ORDER BY ViewsCount DESC, PublishedDateTime;)
UPD
In the case where you have 500K+ rows, I think you are going to use a LIMIT clause. I would do the following:
add an index on (ViewsCount, PublishedDateTime)
rewrite the query as follows:
SELECT Title
FROM (
SELECT id
FROM article
ORDER BY ViewsCount DESC, PublishedDateTime
LIMIT 100, 100
) ids
JOIN article
USING (id);
The ordering would benefit from operating on a subset of data from the covering index. The join will just obtain Titles by ids.
UPD2
Another query that might work much better when the cardinality of ViewsCount is rather small (though you should benchmark):
SELECT Title
FROM (
SELECT ViewsCount
FROM article
GROUP BY ViewsCount DESC) as groups
JOIN article USING (ViewsCount)
LIMIT 0, 100;
It likewise assumes you have the (ViewsCount, PublishedDateTime) index on the table.