MySQL join query not using Indexes? - mysql

I have this query:
SELECT
COUNT(*) AS `numrows`
FROM (`tbl_A`)
JOIN `tbl_B` ON `tbl_A`.`B_id` = `tbl_B`.`id`
WHERE
`tbl_B`.`boolean_value` <> 1;
I added three indexes for tbl_A.B_id, tbl_B.id and tbl_B.boolean_value but mysql still says it doesn't use indexes (in queries not using indexes log) and it examine whole of tables to retrieve the result.
I need to know what I should do to optimize this query.
EDIT:
Explain output:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tbl_B ALL PRIMARY,boolean_value NULL NULL NULL 5049 Using where
1 SIMPLE tbl_A ref B_id B_id 9 tbl_B.id 9 Using where; Using index

The explain show us that an index is used to make the join to tbl_B but no index is used to filter tbl_A on the boolean value.
An index was available but the engine choose not to use it. Why it happen:
maybe 5049 rows is not a big deal and the engine saw that using the index to filter something like 10% of the rows using the index would be as fast as doing it without using it.
Booleans takes only 3 values: 1, 0 or NULL. So the cardinality of the index will always be very low (3 max). Low cardinality index are usually dropped by the query analyser (which is quite right usually thinking this index won't help him a lot)
It would be interesting to see if the query analyser behaves the same way when you have a 50/50 repartition of true and false value for this boolean, or when you have just a few False.
Now usually boolean fields are useful only on indexes containing multiple keys, so that if your queries use all the fields of the index (in where or order by) the query analyser will trust that index to be really a good tool.
Note that indexes are slowing down your writes and takes extra-spaces, do not add useless indexes. Using logt-query-not-using-indexes is a good thing, but you should compensate that log information with the slow queries log.If the query is fast it's not a problem.

if boolean_value it's really boolean value indexing of it not so good idea. Index wouldn't be effective.

Related

Optimise Select Query containing inner join

Mysql Version - 5.5.39
I have these two tables Bugs and BugStatus
I want to fetch the Open and Closed bug counts for a given user.
I am currently using this query
SELECT BugStatus.name,
count(BugStatus.name) AS count
FROM bugs
INNER JOIN BugStatus ON bugs.status = bugstatus.id
WHERE bugs.assignee='irakam'
GROUP BY bugstatus.name;
Now let's assume I am going to have 100,000 rows in my Bugs table. Does this query still stand or how should I modify it. I did use Explain but I am still confused. So can this query be optimised?
SQLFiddle link - Click here
Select bs.name,
count(*) as count -- simply count(*) unless you are avoiding nulls
from bugs
inner join BugStatus AS bs ON bugs.status = bs.id
where bugs.assignee='irakam'
group by bs.name;
bugs: INDEX(assignee) -- since filtering occurs first
Index Cookbook
You can further optimize your table by creating an index on bugs.status and bugs.assignee:
CREATE INDEX idx_bugs_assignee_status on bugs(assignee, status);
As far as the execution plan goes:
Select Type: Simple
This means you are executing a simple query, without any subqueries or unions.
Type: ALL
This means that you are doing a full-table scan is being done on the bug status table (every row is inspected), should be avoided for large tables, but this is ok for the BugStatus table, since it only contains 2 rows.
Type: ref
This means all rows with the matching index values are read from the Bugs table, for each combination of rows found in BugStatus.
possible_keys
This lists out the possible indexes that might be used to answer your query (The primary key of BugStatus, and the foreign key on bugs.status)
Key
This is the actual index that the optimizer chose to answer your query (none in the case of the BugStatus table, since a full-table scan is being performed on it, and the foreign key on status in the case of the bugs table.)
ref
This shows the index that was used on the joined table to compare results.
rows
This column indicates the number of rows that were examined.
extra: Using temporary; Using filesort
'Using temporary' means that mysql needs to create a temporary table to sort your results, which is done because of your GROUP BY clause.
'Using filesort' this means the database had to perform an another pass over your results to figure out how to retrieve the sorted rows.
extra: Using where
Means you had a WHERE clause in your query.
See: https://dev.mysql.com/doc/refman/5.5/en/explain-output.html

understanding perf of mysql query using explain extended

I am trying to understand performance of an SQL query using MySQL.
With only indexes on the PK, the query failed to complete in over 10mins.
I have added indexes on all the columns used in the where clauses (timestamp, hostname, path, type) and the query now completes in approx 50seconds -- however this still seems a long time for what does not seem an overly complex query.
So, I'd like to understand what it is about the query that is causing this. My assumption is that my inner subquery is in someway causing an explosion in the number of comparisons necessary.
There are two tables involved:
storage (~5,000 rows / 4.6MB ) and machines (12 rows, <4k)
The query is as follows:
SELECT T.hostname, T.path, T.used_pct,
T.used_gb, T.avail_gb, T.timestamp, machines.type AS type
FROM storage AS T
JOIN machines ON T.hostname = machines.hostname
WHERE timestamp = ( SELECT max(timestamp) FROM storage AS st
WHERE st.hostname = T.hostname AND
st.path = T.path)
AND (machines.type = 'nfs')
ORDER BY used_pct DESC
An EXPLAIN EXTENDED for the query returns the following:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY machines ref hostname,type type 768 const 1 100.00 Using where; Using temporary; Using filesort
1 PRIMARY T ref fk_hostname fk_hostname 768 monitoring.machines.hostname 4535 100.00 Using where
2 DEPENDENT SUBQUERY st ref fk_hostname,path path 1002 monitoring.T.path 648 100.00 Using where
Noticing that the 'extra' column for Row 1 includes 'using filesort' and question:
MySQL explain Query understanding
states that "Using filesort is a sorting algorithm where MySQL isn't able to use an index for sorting and therefore can't do the complete sort in memory."
What is the nature of this query which is causing slow performance?
Why is it necessary for MySQL to use 'filesort' for this query?
Indexes don't get populated, they are there as soon as you create them. That's why inserts and updates become slower the more indexes you have on a table.
Your query runs fast after the first time because the whole result of the query is put into cache. To see how fast the query is without using the cache you can do
SELECT SQL_NO_CACHE T.hostname ...
MySQL uses filesort usually for ORDER BY or in your case to determine the maximum value for timestamp. Instead of going through all possible values and memorizing which value is the greatest, MySQL sorts the values descending and picks the first one.
So, why is your query slow? Two things jumped into my eye.
1) Your subquery
WHERE timestamp = ( SELECT max(timestamp) FROM storage AS st
WHERE st.hostname = T.hostname AND
st.path = T.path)
gets evaluated for every (hostname, path). Have a try with an index on timestamp (btw, I discourage naming columns like keywords / datatypes). If that alone doesn't help, try to rewrite your query. There are two excellent examples in the MySQL manual: The Rows Holding the Group-wise Maximum of a Certain Column.
2) This is a minor issue, but it seems you are joining on char/varchar fields. Numbers / IDs are much faster.

MySQL not using Index (despite FORCE INDEX)

I need to do a live search using PHP and jQuery to select cities and countries from the two tables cities (almost 3M rows) and countries (few hundred rows).
For a short moment I was thinking of using a MyISAM table for cities as InnoDB does not support FULLTEXT search, however decided it is not a way to go (frequent table crashes, all other tables are InnoDB etc and with MySQL 5.6+ InnoDB also starts to support FULLTEXT index).
So, right now I still use MySQL 5.1, and as most cities consist of one-word only or max 2-3 Words, but e.g. "New York" - most people will not search for "York" if they mean "New York". So, I just put an index on the city_real column (which is a varchar).
The following query (I tried it in different versions, without any JOIN and without ORDER BY, with USE INDEX and even with FORCE INDEX, I have tried LIKE instead equal (=) but another post said = was faster and if the wildcard is only at the end, it is OK to use it), in EXPLAIN it always says "using where, using filesort". The average time for the query is about 4sec, which you have to admit is a little bit to slow for a live search (user typing in text-box and seeing suggestions of cities and countries)...
Live search (jQuery ajax) searches if the user typed at least 3 characters...
SELECT ci.id, ci.city_real, co.country_name FROM cities ci LEFT JOIN countries co ON(ci.country_id=co.country_id) WHERE city_real='cit%' ORDER BY population DESC LIMIT 5
There is a PRIMARY on ci.id and an INDEX on ci.city_real. Any ideas why MySQL does not use the index? Or how I could speed up the query? Or where else I should/should not set an INDEX?
Thank you very much in advance for your help!
Here's the explain output
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ci range city_real city_real 768 NULL 1250 Using where; Using filesort
1 SIMPLE co eq_ref PRIMARY PRIMARY 6 fibsi_1.ci.country_id 1
You should use WHERE city_real LIKE 'cit%', not WHERE city_real='cit%'.
I have tried LIKE instead equal (=) but another post said = was faster and if the wildcard is only at the end, it is OK to use it
This is wrong. = doesn't support wildcards so it will give you the wrong results.
Or how I could speed up the query?
Make sure you have an index on country_id in both tables. Post the output of EXPLAIN SELECT ... if you need further help.
The query do use index as seen in the key field of explain output. The reason it uses filesort is the order by, and the reason it uses where is probably one of the fields (city_real, population) allows null values.

how to set indexes for join and group by queries

Let's say we have a common join like the one below:
EXPLAIN SELECT *
FROM visited_links vl
JOIN device_tracker dt ON ( dt.Client_id = vl.Client_id
AND dt.Device_id = vl.Device_id )
GROUP BY dt.id
if we execute an explain, it says:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE vl index NULL vl_id 273 NULL 1977 Using index; Using temporary; Using filesort
1 SIMPLE dt ref Device_id,Device_id_2 Device_id 257 datumprotect.vl.device_id 4 Using where
I know that sometimes it's difficult to choose the right indexes when you are using group by but, what indexes could I set for avoiding 'using temporary, using filesort' in this query? why is this happening? and specially, why this happens after using an index?
One point to mention is that the fields returned by the select (* in this case) should either be in the GROUP BY clause or be using agregate functions such as SUM() or MAX(). Otherwise unexpected results can occur. This is because if the database is not told how to choose fields that are not in the group by clause you may get any member of the group, pretty much at random.
The way I look at it is to break the query down into bits.
you have a join on (dt.Client_id = vl.Client_id and dt.Device_id = vl.Device_id) so all of those fields should be indexed in their respective tables.
You are using GROUP BY dt.id so you need an index that includes dt.id
BUT...
an index on (dt.client_id,dt.device_id,dt.id) will not work for the GROUP BY
and
an index on (dt.id, dt.client_id, dt.device_id) will not work for the join.
Sometimes you end up with a query which just can't use an index.
See also:
http://ntsrikanth.blogspot.com/2007/11/sql-query-order-of-execution.html
You didn't post your indices, but first of all, you'll want to have an index for (client_id, device_id) on visited_links, and (client_id, device_id, id) on device_tracker to make sure that query is fully indexed.
From page 191 of the excellent High Performance MySQL, 2nd Ed.:
MySQL has two kinds of GROUP BY strategies when it can't use an index: it can use a temporary table or a filesort to perform the grouping. Either one can be more efficient depending on the query. You can force the optimizer to choose one method or the other with the SQL_BIG_RESULT and SQL_SMALL_RESULT optimizer hints.
In your case, I think the issue stems from joining on multiple columns and using GROUP BY together, even after the suggested indices are in place. If you remove either (a) one of the join conditions or (b) the GROUP BY, this shouldn't need a filesort.
However, keep in mind that a filesort doesn't always use actual files, it can also happen entirely within a memory buffer if the result set is small enough, so the performance penalty may be minimal. Consider the wall-clock time for the query too.
HTH!

MySQL Indices and Order By Clause

Say I have a simple query like this:
SELECT * FROM topics ORDER BY last_post_id DESC
In accordance to MySQL's Order-By Optimization Docs, an index on last_post_id should be utilized. But EXPLAINing the query tells the contrary:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE topic_info ALL NULL NULL NULL NULL 13 Using filesort
Why is my index not being utilized?
Do you really only have 13 rows? The database may be deciding that simply sorting them is quicker than going via the index.
A basic principle of index optimization is to use representative data for testing. A few rows in a table has no predictive value about how either the index or the optimizer will work in real life.
If you really have only a few records, then the indexing will provide no effective benefit.
EXPLAIN only tells you about the selection process, in which case it's expected that the query would have to examine all 13 rows to see if they meet the WHERE clause (which you don't have, so it's useless information!). It only reports the indexes and keys used for this purpose (evaluating WHERE, JOIN, HAVING). So, regardless of whether the query uses an index to sort or not, EXPLAIN won't tell you about it, so don't get caught up in its results.
Yes, the query uses the index to sort quickly. I've noticed the same results from EXPLAIN (aka reporting all rows despite being sorted by an index and having a limit) and I doubt it's a result of the small number of rows in your table, but rather a limitation of the power of EXPLAIN.
Try selecting the specific columns in order as they are in the table. MySQL indexes don't hold when the order is changed.