I have a mysql query that I thought should be using my indexes but still seems to be needing to scan alot of rows (I think).
Here is my query:
SELECT DISTINCT DAY(broadcast_at) AS 'days'
from v3211062009
where month(broadcast_at) = 5 and
year(broadcast_at) = 2012
and deviceid = 337 order by days;
On my table I have an index setup on broadcast_at, deviceid. However the results of a explain on this query looks like:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE v3211062009 ref indx_deviceid,indx_tracking_query indx_tracking_query 4 const **172958** Using where; Using index; Using temporary; Using filesort
I don't understand why it needs to look up so many of the rows. The total amount of rows for this deviceid record is only 184085 so my query seems to be almost looking at all of them just to get the result set. Is the index on broadcast_at not working.
I'm obviously doing something fundamentally wrong but can't figure it out. Changing the order of the columns in my index didn't work.
I don't think MySQL can take advantage of the index on broadcast_at if you use functions on that field.
How does it perform if you do:
SELECT DISTINCT DAY(broadcast_at) AS 'days'
from v3211062009
where broadcast_at >= ('2012-05-01') AND
broadcast_at < ('2012-06-01')
and deviceid = 337 order by days;
Related
I am trying to understand performance of an SQL query using MySQL.
With only indexes on the PK, the query failed to complete in over 10mins.
I have added indexes on all the columns used in the where clauses (timestamp, hostname, path, type) and the query now completes in approx 50seconds -- however this still seems a long time for what does not seem an overly complex query.
So, I'd like to understand what it is about the query that is causing this. My assumption is that my inner subquery is in someway causing an explosion in the number of comparisons necessary.
There are two tables involved:
storage (~5,000 rows / 4.6MB ) and machines (12 rows, <4k)
The query is as follows:
SELECT T.hostname, T.path, T.used_pct,
T.used_gb, T.avail_gb, T.timestamp, machines.type AS type
FROM storage AS T
JOIN machines ON T.hostname = machines.hostname
WHERE timestamp = ( SELECT max(timestamp) FROM storage AS st
WHERE st.hostname = T.hostname AND
st.path = T.path)
AND (machines.type = 'nfs')
ORDER BY used_pct DESC
An EXPLAIN EXTENDED for the query returns the following:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY machines ref hostname,type type 768 const 1 100.00 Using where; Using temporary; Using filesort
1 PRIMARY T ref fk_hostname fk_hostname 768 monitoring.machines.hostname 4535 100.00 Using where
2 DEPENDENT SUBQUERY st ref fk_hostname,path path 1002 monitoring.T.path 648 100.00 Using where
Noticing that the 'extra' column for Row 1 includes 'using filesort' and question:
MySQL explain Query understanding
states that "Using filesort is a sorting algorithm where MySQL isn't able to use an index for sorting and therefore can't do the complete sort in memory."
What is the nature of this query which is causing slow performance?
Why is it necessary for MySQL to use 'filesort' for this query?
Indexes don't get populated, they are there as soon as you create them. That's why inserts and updates become slower the more indexes you have on a table.
Your query runs fast after the first time because the whole result of the query is put into cache. To see how fast the query is without using the cache you can do
SELECT SQL_NO_CACHE T.hostname ...
MySQL uses filesort usually for ORDER BY or in your case to determine the maximum value for timestamp. Instead of going through all possible values and memorizing which value is the greatest, MySQL sorts the values descending and picks the first one.
So, why is your query slow? Two things jumped into my eye.
1) Your subquery
WHERE timestamp = ( SELECT max(timestamp) FROM storage AS st
WHERE st.hostname = T.hostname AND
st.path = T.path)
gets evaluated for every (hostname, path). Have a try with an index on timestamp (btw, I discourage naming columns like keywords / datatypes). If that alone doesn't help, try to rewrite your query. There are two excellent examples in the MySQL manual: The Rows Holding the Group-wise Maximum of a Certain Column.
2) This is a minor issue, but it seems you are joining on char/varchar fields. Numbers / IDs are much faster.
I have a select query below, what it does is it selects all the products matching a certain attribute from a Virtuemart table. The attribute table is rather large (almost 6000 rows). Is there any way to optimize the query below or are there any other process that might be helpful, I already tried adding indexes to one and even two tables.
SELECT DISTINCT `jos_vm_product`.`product_id`,
`jos_vm_product_attribute`.`attribute_name`,
`jos_vm_product_attribute`.`attribute_value`,
`jos_vm_product_attribute`.`product_id`
FROM (`jos_vm_product`)
RIGHT JOIN `jos_vm_product_attribute`
ON `jos_vm_product`.`product_id` = `jos_vm_product_attribute`.`product_id`
WHERE ((`jos_vm_product_attribute`.`attribute_name` = 'Size')
AND ((`jos_vm_product_attribute`.`attribute_value` = '6.5')
OR (`jos_vm_product_attribute`.`attribute_value` = '10')))
GROUP BY `jos_vm_product`.`product_sku`
ORDER BY CONVERT(`jos_vm_product_attribute`.`attribute_value`, SIGNED INTEGER)
LIMIT 0, 24
Here is the results of the EXPLAIN table:
id select_type table type possible_keys key key_len ref rows Extras
1 SIMPLE jos_vm_product_attribute range idx_product_attribute_name,attribute_value,attribute_name attribute_value 765 NULL 333 Using where; Using temporary; Using filesort
1 SIMPLE jos_vm_product eq_ref PRIMARY PRIMARY 4 shoemark_com_shop.jos_vm_product_attribute.product_id
Any help would be greatly appreciated. Thanks.
Replacing the jos_vm_product_attribute.attribute_name index with a composite index on jos_vm_product_attribute.attribute_name and jos_vm_product_attribute.attribute_value (in that order) should help this query. Currently, it's only using an index in the WHERE condition for jos_vm_product_attribute.attribute_value, but this new index will be usable for both parts of the WHERE condition.
I have a database with a bunch of records and when I load up the page with the following SQL its really slow.
SELECT goal.title, max(updates.date_updated) as update_sort
FROM `goal`
LEFT OUTER JOIN `update` `updates` ON (`updates`.`goal_id`=`goal`.`goal_id`)
WHERE (goal.private=0)
GROUP BY updates.goal_id
ORDER BY update_sort desc
LIMIT 12
When I do an explain it says its not using any keys and that its searching every row. Also telling me its using "Using where; Using temporary; Using filesort".
Any help is much appreciated
Thanks
It needs to be grouped by goal_id because of the MAX() in the select is returning only one row.
What i'm trying to do is return the MAX date_updated row from the updates table for each goal and then sort it by that column.
Current indices are on goal.private and update.goal_id
output of EXPLAIN (can't upload images so have to put it here sorry if it isnt clear:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE goal ref private private 1 const 27 Using temporary; Using filesort
1 SIMPLE updates ref goal_id goal_id 4 goal.goal_id 1
SELECT u.date_updated, g.title
FROM updates u
JOIN goal g
ON g.goal_id = u.goal_id
WHERE u.id =
(
SELECT ui.id
FROM updates ui
WHERE ui.goal_id = u.goal_id
ORDER BY
ui.goal_id, ui.date_updated, ui.id
LIMIT 1
)
AND g.private = 0
ORDER BY
u.date_update, u.id
LIMIT 12
Create two indexes on updates for the query to work fast:
updates (goal_id, date_updated, id)
updates (date_updated, id)
My guess is that the MAX() function causes this behaviour. It needs to look at every record to decide what rows to select. What kind of grouping are you trying to accomplish?
How is
SELECT t.id
FROM table t
JOIN (SELECT(FLOOR(max(id) * rand())) AS maxid FROM table)
AS tt
ON t.id >= tt.maxid
LIMIT 1
faster than
SELECT * FROM `table` ORDER BY RAND() LIMIT 1
I am actually having trouble understanding the first. Maybe if I knew why one is faster than the other I would have a better understanding.
*original post # Difficult MySQL self-join please explain
You can use EXPLAIN on the queries, but basically:
In the first you're getting a random number (which isn't very slow), based on the maximum of a (i presume) indexed field. This is quite quick, i'd say maybe even near-constant time (depends on the implementation of the index hash?)
Then you're joining on that number and returning only the first row that's greater then, and because you're using an index again, this is lightning quick.
The second is ordering by some random function. This has to, but you'll need to look at the explain for that, do a FULL TABLE scan, and then return the first. This is ofcourse VERY expensive. You're not using any indexes because of that rand.
(the explain will look like this, showing that you're not using keys)
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table ALL NULL NULL NULL NULL 14 Using temporary; Using filesort
EXPLAIN EXTENDED SELECT `board` . *
FROM `board`
WHERE `board`.`category_id` = '5'
AND `board`.`board_id` = '0'
AND `board`.`display` = '1'
ORDER BY `board`.`order` ASC
The output of the above query is
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE board ref category_id_2 category_id_2 9 const,const,const 4 100.00 Using where
I'm a little confused by this because I have an index that contains the columns that I'm using in the same order they're used in the query...:
category_id_2 BTREE No No
category_id 33 A
board_id 33 A
display 33 A
order 66 A
The output of EXPLAIN can sometimes be misleading.
For instance, filesort has nothing to do with files, using where does not mean you are using a WHERE clause, and using index can show up on the tables without a single index defined.
Using where just means there is some restricting clause on the table (WHERE or ON), and not all record will be returned. Note that LIMIT does not count as a restricting clause (though it can be).
Using index means that all information is returned from the index, without seeking the records in the table. This is only possible if all fields required by the query are covered by the index.
Since you are selecting *, this is impossible. Fields other than category_id, board_id, display and order are not covered by the index and should be looked up.
It is actually using index category_id_2.
It's using the index category_id_2 properly, as shown by the key field of the EXPLAIN.
Using where just means that you're selecting only some rows by using the WHERE statement, so you won't get the entire table back ;)