Why is MySQL using the wrong index? - mysql

MySQL is using an index on (faver_profile_id,removed,notice_id) when it should be using the index on (faver_profile_id,removed,id). The weird thing is that for some values of faver_profile_id it does use the correct index. I can use FORCE INDEX, which drastically speeds up the query, but I'd like to figure out why MySQL is doing this.
This is a new table (35M rows) copied from another table using INSERT INTO ... SELECT FROM.
I did not run OPTIMIZE TABLE or ANALYZE TABLE afterwards. Could that help?
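For reference, these are the maintenance statements in question, sketched against the table used in the query below:

```sql
-- Refresh index statistics so the optimizer re-estimates cardinality;
-- relatively cheap, as it only samples the indexes.
ANALYZE TABLE favorites;

-- Heavier: rebuilds the table (and for InnoDB effectively re-analyzes it).
OPTIMIZE TABLE favorites;
```

After a bulk INSERT ... SELECT the statistics can be badly skewed, so ANALYZE TABLE is usually the first thing to try.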
SELECT `Item`.`id` , `Item`.`cached_image` , `Item`.`submitter_id` , `Item`.`source_title` , `Item`.`source_url` , `Item`.`source_image` , `Item`.`nudity` , `Item`.`tags` , `Item`.`width` , `Item`.`height` , `Item`.`tumblr_id` , `Item`.`tumblr_reblog_key` , `Item`.`fave_count` , `Item`.`file_size` , `Item`.`animated` , `Favorite`.`id` , `Favorite`.`created`
FROM `favorites` AS `Favorite`
LEFT JOIN `items` AS `Item` ON ( `Favorite`.`notice_id` = `Item`.`id` )
WHERE `faver_profile_id` =11619
AND `Favorite`.`removed` =0
AND `Item`.`removed` =0
AND `nudity` =0
ORDER BY `Favorite`.`id` DESC
LIMIT 26
Query execution plan: "idx_notice_id_profile_id" is an index on (faver_profile_id,removed,notice_id)
1 | SIMPLE | Favorite | ref | idx_faver_idx_id,idx_notice_id_profile_id,notice_id_idx | idx_notice_id_profile_id | 4 | const,const | 15742 | Using where; Using filesort |
1 | SIMPLE | Item | eq_ref | PRIMARY | PRIMARY | 4 | gragland_imgfave.Favorite.notice_id | 1 | Using where

I don't know if it's causing any confusion or not, but moving some of the AND qualifiers into the Item join might help, as they relate directly to Item and not to Favorite. In addition, I've explicitly qualified table.field references where they were otherwise missing.
SELECT
Item.id,
Item.cached_image,
Item.submitter_id,
Item.source_title,
Item.source_url,
Item.source_image,
Item.nudity,
Item.tags,
Item.width,
Item.height,
Item.tumblr_id,
Item.tumblr_reblog_key,
Item.fave_count,
Item.file_size,
Item.animated,
Favorite.id,
Favorite.created
FROM favorites AS Favorite
LEFT JOIN items AS Item
ON Favorite.notice_id = Item.id
AND Item.removed = 0
AND Item.nudity = 0
WHERE Favorite.faver_profile_id = 11619
AND Favorite.removed = 0
ORDER BY Favorite.id DESC
LIMIT 26
So now, from the "Favorites" table, the criteria are explicitly narrowed down to faver_profile_id, removed, and id (for the ORDER BY).
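If the optimizer still picks the wrong index after a statistics refresh, a fallback is to name the desired one explicitly. A sketch, assuming the (faver_profile_id, removed, id) index is called idx_faver_removed_id (substitute the real name from SHOW INDEX FROM favorites):

```sql
-- Column list abbreviated; same as the full query above.
SELECT Favorite.id, Favorite.created
FROM favorites AS Favorite FORCE INDEX (idx_faver_removed_id)
LEFT JOIN items AS Item
       ON Favorite.notice_id = Item.id
      AND Item.removed = 0
      AND Item.nudity = 0
WHERE Favorite.faver_profile_id = 11619
  AND Favorite.removed = 0
ORDER BY Favorite.id DESC
LIMIT 26;
```

With that index the rows already come out in Favorite.id order, so the filesort disappears and the LIMIT can stop after 26 index entries.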

Related

MySQL subquery much faster than join

I have the following queries which both return the same result and row count:
select * from (
select UNIX_TIMESTAMP(network_time) * 1000 as epoch_network_datetime,
hbrl.business_rule_id,
display_advertiser_id,
hbrl.campaign_id,
truncate(sum(coalesce(hbrl.ad_spend_network, 0))/100000.0, 2) as demand_ad_spend_network,
sum(coalesce(hbrl.ad_view, 0)) as demand_ad_view,
sum(coalesce(hbrl.ad_click, 0)) as demand_ad_click,
truncate(coalesce(case when sum(hbrl.ad_view) = 0 then 0 else 100*sum(hbrl.ad_click)/sum(hbrl.ad_view) end, 0), 2) as ctr_percent,
truncate(coalesce(case when sum(hbrl.ad_view) = 0 then 0 else sum(hbrl.ad_spend_network)/100.0/sum(hbrl.ad_view) end, 0), 2) as ecpm,
truncate(coalesce(case when sum(hbrl.ad_click) = 0 then 0 else sum(hbrl.ad_spend_network)/100000.0/sum(hbrl.ad_click) end, 0), 2) as ecpc
from hourly_business_rule_level hbrl
where (publisher_network_id = 31534)
and network_time between str_to_date('2017-08-13 17:00:00.000000', '%Y-%m-%d %H:%i:%S.%f') and str_to_date('2017-08-14 16:59:59.999000', '%Y-%m-%d %H:%i:%S.%f')
and (network_time IS NOT NULL and display_advertiser_id > 0)
group by network_time, hbrl.campaign_id, hbrl.business_rule_id
having demand_ad_spend_network > 0
OR demand_ad_view > 0
OR demand_ad_click > 0
OR ctr_percent > 0
OR ecpm > 0
OR ecpc > 0
order by epoch_network_datetime) as atb
left join dim_demand demand on atb.display_advertiser_id = demand.advertiser_dsp_id
and atb.campaign_id = demand.campaign_id
and atb.business_rule_id = demand.business_rule_id
I ran EXPLAIN EXTENDED, and these are the results:
+----+-------------+----------------------------+------+-------------------------------------------------------------------------------+---------+---------+-----------------+---------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------------------------+------+-------------------------------------------------------------------------------+---------+---------+-----------------+---------+----------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 1451739 | 100.00 | NULL |
| 1 | PRIMARY | demand | ref | PRIMARY,join_index | PRIMARY | 4 | atb.campaign_id | 1 | 100.00 | Using where |
| 2 | DERIVED | hourly_business_rule_level | ALL | _hourly_business_rule_level_supply_idx,_hourly_business_rule_level_demand_idx | NULL | NULL | NULL | 1494447 | 97.14 | Using where; Using temporary; Using filesort |
+----+-------------+----------------------------+------+-------------------------------------------------------------------------------+---------+---------+-----------------+---------+----------+----------------------------------------------+
and the other is:
select UNIX_TIMESTAMP(network_time) * 1000 as epoch_network_datetime,
hbrl.business_rule_id,
display_advertiser_id,
hbrl.campaign_id,
truncate(sum(coalesce(hbrl.ad_spend_network, 0))/100000.0, 2) as demand_ad_spend_network,
sum(coalesce(hbrl.ad_view, 0)) as demand_ad_view,
sum(coalesce(hbrl.ad_click, 0)) as demand_ad_click,
truncate(coalesce(case when sum(hbrl.ad_view) = 0 then 0 else 100*sum(hbrl.ad_click)/sum(hbrl.ad_view) end, 0), 2) as ctr_percent,
truncate(coalesce(case when sum(hbrl.ad_view) = 0 then 0 else sum(hbrl.ad_spend_network)/100.0/sum(hbrl.ad_view) end, 0), 2) as ecpm,
truncate(coalesce(case when sum(hbrl.ad_click) = 0 then 0 else sum(hbrl.ad_spend_network)/100000.0/sum(hbrl.ad_click) end, 0), 2) as ecpc
from hourly_business_rule_level hbrl
join dim_demand demand on hbrl.display_advertiser_id = demand.advertiser_dsp_id
and hbrl.campaign_id = demand.campaign_id
and hbrl.business_rule_id = demand.business_rule_id
where (publisher_network_id = 31534)
and network_time between str_to_date('2017-08-13 17:00:00.000000', '%Y-%m-%d %H:%i:%S.%f') and str_to_date('2017-08-14 16:59:59.999000', '%Y-%m-%d %H:%i:%S.%f')
and (network_time IS NOT NULL and display_advertiser_id > 0)
group by network_time, hbrl.campaign_id, hbrl.business_rule_id
having demand_ad_spend_network > 0
OR demand_ad_view > 0
OR demand_ad_click > 0
OR ctr_percent > 0
OR ecpm > 0
OR ecpc > 0
order by epoch_network_datetime;
and these are the results for the second query:
+----+-------------+----------------------------+------+-------------------------------------------------------------------------------+---------+---------+---------------------------------------------------------------+---------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------------------------+------+-------------------------------------------------------------------------------+---------+---------+---------------------------------------------------------------+---------+----------+----------------------------------------------+
| 1 | SIMPLE | hourly_business_rule_level | ALL | _hourly_business_rule_level_supply_idx,_hourly_business_rule_level_demand_idx | NULL | NULL | NULL | 1494447 | 97.14 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | demand | ref | PRIMARY,join_index | PRIMARY | 4 | my6sense_datawarehouse.hourly_business_rule_level.campaign_id | 1 | 100.00 | Using where; Using index |
+----+-------------+----------------------------+------+-------------------------------------------------------------------------------+---------+---------+---------------------------------------------------------------+---------+----------+----------------------------------------------+
The first one takes about 2 seconds while the second one takes over 2 minutes!
Why is the second query taking so long? What am I missing here?
Thanks.
Use a subquery whenever the subquery significantly shrinks the number of rows before ANY JOIN; this reinforces Rick James's Plan B below.
To reinforce Rick's and Paul's answers, which you have already documented: the answers by Rick and Paul deserve acceptance.
One possible reason is the number of rows that have to be joined with the second table.
The GROUP BY clause and the HAVING clause will limit the number of rows returned from your subquery.
Only those rows will be used for the join.
Without the subquery only the WHERE clause is limiting the number of rows for the JOIN.
The JOIN is done before the GROUP BY and HAVING clauses are processed.
Depending on group size and the selectivity of the HAVING conditions there would be much more rows that need to be joined.
Consider the following simplified example:
We have a table users with 1000 entries and the columns id, email.
create table users(
id smallint auto_increment primary key,
email varchar(50) unique
);
Then we have a (huge) log table user_actions with 1,000,000 entries and the columns id, user_id, timestamp, action
create table user_actions(
id mediumint auto_increment primary key,
user_id smallint not null,
timestamp timestamp,
action varchar(50),
index (timestamp, user_id)
);
The task is to find all users who have at least 900 entries in the log table since 2017-02-01.
The subquery solution:
select a.user_id, a.cnt, u.email
from (
select a.user_id, count(*) as cnt
from user_actions a
where a.timestamp >= '2017-02-01 00:00:00'
group by a.user_id
having cnt >= 900
) a
left join users u on u.id = a.user_id
The subquery returns 135 rows (users). Only those rows will be joined with the users table.
The subquery runs in about 0.375 seconds. The time needed for the join is almost zero, so the full query runs in about 0.375 seconds.
Solution without subquery:
select a.user_id, count(*) as cnt, u.email
from user_actions a
left join users u on u.id = a.user_id
where a.timestamp >= '2017-02-01 00:00:00'
group by a.user_id
having cnt >= 900
The WHERE condition filters the table to 866,081 rows.
The JOIN has to be done for all those 866K rows.
After the JOIN the GROUP BY and the HAVING clauses are processed and limit the result to 135 rows.
This query needs about 0.815 seconds.
So you can already see, that a subquery can improve the performance.
But let's make things worse and drop the primary key in the users table.
This way we have no index which can be used for the JOIN.
Now the first query runs in 0.455 seconds. The second query needs 40 seconds - almost 100 times slower.
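For completeness, one way to reproduce that second measurement (the AUTO_INCREMENT attribute must be dropped first, since MySQL requires an auto-increment column to be part of a key):

```sql
-- Remove the index the JOIN relied on; every joined row now costs
-- a scan of users instead of a primary-key lookup.
ALTER TABLE users MODIFY id smallint NOT NULL, DROP PRIMARY KEY;
```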
Notes
It's difficult to say if the same applies to your case. Reasons are:
Your queries are quite complex and far from being an MVCE.
I don't see anything being selected from the demand table, so it's unclear why you are joining it at all.
You use a LEFT JOIN in one query and an INNER JOIN in another one.
The relation between the two tables is unclear.
No information about indexes. You should provide the CREATE statements (SHOW CREATE TABLE table_name).
Test setup
drop table if exists users;
create table users(
id smallint auto_increment primary key,
email varchar(50) unique
)
select seq as id, rand(1) as email
from seq_1_to_1000
;
drop table if exists user_actions;
create table user_actions(
id mediumint auto_increment primary key,
user_id smallint not null,
timestamp timestamp,
action varchar(50),
index (timestamp, user_id)
)
select seq as id
, floor(rand(2)*1000)+1 as user_id
#, '2017-01-01 00:00:00' + interval seq*20 second as timestamp
, from_unixtime(unix_timestamp('2017-01-01 00:00:00') + seq*20) as timestamp
, rand(3) as action
from seq_1_to_1000000
;
MariaDB 10.0.19 with sequence plugin.
The queries are different. One says JOIN, the other says LEFT JOIN. You are not using demand, so the join is probably useless. However, in the case of JOIN, you are filtering out advertisers that are not in dim_demand; is that the intent?
But that does not address the question.
The EXPLAINs estimate that there are 1.5M rows in hbrl. But how many show up in the result? I would guess it is a lot fewer. From this, I can answer your question.
Consider these two:
SELECT ... FROM ( SELECT ... FROM a
GROUP BY or HAVING or LIMIT ) x
JOIN b
SELECT ... FROM a
JOIN b
GROUP BY or HAVING or LIMIT
The first will decrease the number of rows that need to join to b; the second will need to do a full 1.5M joins. I suspect that the time taken to do the JOIN (be it LEFT or not) is where the difference is.
Plan A: Remove demand from the query.
Plan B: Use a subquery whenever the subquery significantly shrinks the number of rows before the JOIN.
Indexing (may speed up both variants):
INDEX(publisher_network_id, network_time)
and get rid of this as being useless (since the between will fail anyway for NULL):
and network_time IS NOT NULL
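A sketch of the suggested index (name illustrative):

```sql
-- Equality column first, then the range column, so the WHERE clause
-- can be satisfied with a single index range scan.
CREATE INDEX idx_pub_time
    ON hourly_business_rule_level (publisher_network_id, network_time);
```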
Side note: I recommend simplifying and fixing this
and network_time
between str_to_date('2017-08-13 17:00:00.000000', '%Y-%m-%d %H:%i:%S.%f')
AND str_to_date('2017-08-14 16:59:59.999000', '%Y-%m-%d %H:%i:%S.%f')
to
and network_time >= '2017-08-13 17:00:00'
and network_time < '2017-08-13 17:00:00' + INTERVAL 24 HOUR

WHERE LIKE and ORDER BY on different tables/columns

For information: in the following examples, big_table contains millions of rows and small_table hundreds.
Here is the basic query I'm trying to do:
SELECT b.id
FROM big_table b
LEFT JOIN small_table s
ON b.small_id=s.id
WHERE s.name like 'something%'
ORDER BY b.name
LIMIT 10, 10;
This is slow, and I can understand why both indexes can't be used.
My initial idea was to split the query into parts.
This is fast:
SELECT id FROM small_table WHERE name like 'something%';
This is also fast:
SELECT id FROM big_table WHERE small_id IN (1, 2) ORDER BY name LIMIT 10, 10;
But, put together, it becomes slow:
SELECT id FROM big_table
WHERE small_id
IN (
SELECT id
FROM small_table WHERE name like 'something%'
)
ORDER BY name
LIMIT 10, 10;
Unless the subquery is re-evaluated for every row, it shouldn't be slower than executing both queries separately, right?
I'm looking for any help optimizing the initial query and understanding why the second one doesn't work.
EXPLAIN result for the last query :
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
| 1 | PRIMARY | small_table | range | PRIMARY, ix_small_name | ix_small_name | 768 | NULL | 1 | Using where; Using index; Using temporary; Using filesort |
| 1 | PRIMARY | big_table | ref | ix_join_foreign_key | ix_join_foreign_key | 9 | small_table.id | 11870 | |
Temporary solution:
SELECT id FROM big_table ignore index(ix_join_foreign_key)
WHERE small_id
IN (
SELECT id
FROM small_table ignore index(PRIMARY)
WHERE name like 'something%'
)
ORDER BY name
LIMIT 10, 10;
(result & explain is the same with an EXISTS instead of IN)
EXPLAIN output becomes:
| 1 | PRIMARY | big_table | index | NULL | ix_big_name | 768 | NULL | 20 | |
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 8 | func | 1 | |
| 2 | MATERIALIZED | small_table | range | ix_small_name | ix_small_name | 768 | NULL | 1 | Using where; Using index |
If anyone has a better solution, I'm still interested.
The problem that you are facing is that you have conditions on the small table but are trying to avoid a sort in the large table. In MySQL, I think you need to do at least a full table scan.
One step is to write the query using exists, as others have mentioned:
SELECT b.id
FROM big_table b
WHERE EXISTS (SELECT 1
FROM small_table s
WHERE s.name LIKE 'something%' AND s.id = b.small_id
)
ORDER BY b.name;
The question is: Can you trick MySQL into doing the ORDER BY using an index? One possibility is to use the appropriate index. In this case, the appropriate index is: big_table(name, small_id, id) and small_table(id, name). The ordering of the keys in the index is important. Because the first is a covering index, MySQL might read through the index in order by name, choosing the appropriate ids.
You are looking for an EXISTS or IN query. As MySQL is known to be weak on IN I'd try EXISTS in spite of liking IN better for its simplicity.
select id
from big_table b
where exists
(
select *
from small_table s
where s.id = b.small_id
and s.name like 'something%'
)
order by name
limit 10, 10;
It would be helpful to have a good index on big_table. It should contain small_id first, to find the match, then name, for the sorting. As far as I know, the primary key is automatically included in MySQL secondary indexes (otherwise it should also be added to the index). Thus you'd have an index containing all the fields needed from big_table, in the desired order (that is called a covering index), so all data can be read from the index alone and the table itself doesn't have to be accessed.
create index idx_big_quick on big_table(small_id, name);
You can try this:
SELECT b.id
FROM big_table b
JOIN small_table s
ON b.small_id = s.id
WHERE s.name like 'something%'
ORDER BY b.name;
or
SELECT b.id FROM big_table b
WHERE EXISTS(SELECT 1 FROM small_table s
WHERE s.name LIKE 'something%' AND s.id = b.small_id)
ORDER BY b.name;
NOTE: you don't seem to need a LEFT JOIN. A left outer join will almost always result in a full table scan of big_table.
PS: make sure you have an index on big_table.small_id
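A sketch of that index (name illustrative; skip it if small_id is already a declared foreign key, since InnoDB will have indexed it then):

```sql
-- Lets the join probe big_table by small_id instead of scanning it.
CREATE INDEX ix_big_small_id ON big_table (small_id);
```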
Plan A
SELECT b.id
FROM big_table b
JOIN small_table s ON b.small_id=s.id
WHERE s.name like 'something%'
ORDER BY b.name
LIMIT 10, 10;
(Note removal of LEFT.)
You need
small_table: INDEX(name, id)
big_table: INDEX(small_id), or, for 'covering': INDEX(small_id, name, id)
It will use the s index to find 'something%' and walk through. But it must find all such rows, and JOIN to b to find all such rows there. Only then can it do the ORDER BY, OFFSET, and LIMIT. There will be a filesort (which may happen in RAM).
The column order in the indexes is important.
Plan B
The other suggestion may work well; it depends on various things.
SELECT b.id
FROM big_table b
WHERE EXISTS
( SELECT *
FROM small_table s
WHERE s.name LIKE 'something%'
AND s.id = b.small_id
)
ORDER BY b.name
LIMIT 10, 10;
That needs these:
big_table: INDEX(name), or for 'covering', INDEX(name, small_id, id)
small_table: INDEX(id, name), which is 'covering'
(Caveat: If you are doing something other than SELECT b.id, my comments about covering may be wrong.)
Which is faster (A or B)? Cannot predict without understanding the frequency of 'something%' and how 'many' the many-to-1 mapping is.
Settings
If these tables are InnoDB, then be sure that innodb_buffer_pool_size is set to about 70% of available RAM.
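For example, on a server with 16 GB of RAM, something like the following (the exact figure is an assumption; size it to your machine):

```sql
-- ~70% of 16 GB. Dynamic since MySQL 5.7.5; on older versions set
-- innodb_buffer_pool_size in my.cnf and restart the server.
SET GLOBAL innodb_buffer_pool_size = 11 * 1024 * 1024 * 1024;
```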
Pagination
Your use of OFFSET implies that you are 'paging' through the data? OFFSET is an inefficient way to do it. See my blog on such, but note that only Plan B will work with it.
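The usual alternative is keyset ("seek") pagination: remember the sort key of the last row on the previous page and filter past it, instead of skipping OFFSET rows. A sketch, assuming pages are ordered by (name, id) so the sort key is unique; the literals are placeholders:

```sql
-- First page:
SELECT b.id, b.name
FROM big_table b
JOIN small_table s ON b.small_id = s.id
WHERE s.name LIKE 'something%'
ORDER BY b.name, b.id
LIMIT 10;

-- Next page: pass in the last (name, id) seen, e.g. ('carrot', 1234).
-- The row-constructor comparison skips straight past the previous page.
SELECT b.id, b.name
FROM big_table b
JOIN small_table s ON b.small_id = s.id
WHERE s.name LIKE 'something%'
  AND (b.name, b.id) > ('carrot', 1234)
ORDER BY b.name, b.id
LIMIT 10;
```

Unlike OFFSET, the cost per page stays constant no matter how deep the user pages.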

How to optimize this SQL in MySQL

I have a SQL query like this:
select t1.id,t1.type from collect t1
where t1.type='1' and t1.status='1'
and t1.total>(t1.totalComplate+t1.totalIng)
and id not in(
select tid from collect_log t2
where t2.tid=t1.id and t2.bid='1146')
limit 1;
It is OK, but its performance seems not very good, and if I add an ORDER BY clause:
select t1.id,t1.type from collect t1
where t1.type='1' and t1.status='1'
and t1.total>(t1.totalComplate+t1.totalIng)
and id not in(
select tid from collect_log t2
where t2.tid=t1.id and t2.bid='1146')
order by t1.id asc
limit 1;
it becomes even worse.
How can I optimize this?
Here is the EXPLAIN output:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+---------------+-----+---------+-----------------+------+-----------------------------+
| 1 | PRIMARY | t1 | ref | i2,i1,i3 | i1 | 4 | const,const | 99 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | t2 | ref | i5 | i5 | 65 | img.t1.id,const | 2 | Using where; Using index
1) If it's not already done, define an index on the collect.id column :
CREATE INDEX idx_collect_id ON collect (id);
Or possibly a unique index if you can (if id is never the same for any two lines) :
CREATE UNIQUE INDEX idx_collect_id ON collect (id);
Maybe you need an index on collect_log.tid or collect_log.bid, too. Or even on both columns, like so:
CREATE INDEX idx_collect_log_tidbid ON collect_log (tid, bid);
Make it UNIQUE if it makes sense, that is, if no two lines have the same values for the (tid, bid) pair in the table. For instance, if these two queries give the same result, it might be possible:
SELECT count(*) FROM collect_log;
SELECT count(DISTINCT tid, bid) FROM collect_log;
But don't make it UNIQUE if you're unsure what it means.
2) Verify the types of the columns collect.type, collect.status and collect_log.bid. In your query, you are comparing them with strings, but maybe they are defined as INT (or SMALLINT or TINYINT...) ? In this case I advise you to drop the quotes around the numbers, because string comparisons are painfully slow compared to integer comparisons.
select t1.id,t1.type from collect t1
where t1.type=1 and t1.status=1
and t1.total>(t1.totalComplate+t1.totalIng)
and id not in(
select tid from collect_log t2
where t2.tid=t1.id and t2.bid=1146)
order by t1.id asc
limit 1;
3) If that still doesn't help, just add EXPLAIN in front of your query, and you'll get the execution plan. Paste the results here and we can help you make some sense out of it. Actually, I would advise you to do this step before creating any new index.
I'd first try to get rid of the IN subquery using a LEFT JOIN anti-join: the bid condition goes into the ON clause, and the WHERE clause keeps only rows with no match via t2.tid IS NULL.
Something like this (untested):
select t1.id,t1.type
from collect t1
LEFT JOIN collect_log t2 ON t1.id = t2.tid AND t2.bid = '1146'
where t1.type='1'
and t1.status='1'
and t1.total>(t1.totalComplate+t1.totalIng)
and t2.tid IS NULL
order by t1.id asc
limit 1;
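A NOT EXISTS formulation is another common alternative; it avoids the NULL pitfalls of NOT IN and often optimizes to the same anti-join plan. A sketch:

```sql
SELECT t1.id, t1.type
FROM collect t1
WHERE t1.type = '1'
  AND t1.status = '1'
  AND t1.total > (t1.totalComplate + t1.totalIng)
  AND NOT EXISTS (
        SELECT 1
        FROM collect_log t2
        WHERE t2.tid = t1.id   -- correlated: checked per candidate row
          AND t2.bid = '1146')
ORDER BY t1.id ASC
LIMIT 1;
```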

Query works too slowly when there are no results. How can I improve it?

I have three tables
filters (id, name)
items(item_id, name)
items_filters(item_id, filter_id, value_id)
values(id, filter_id, filter_value)
about 20000 entries in items.
about 80000 entries in items_filters.
SELECT i.*
FROM items_filters itf INNER JOIN items i ON i.item_id = itf.item_id
WHERE (itf.filter_id = 1 AND itf.value_id = '1')
OR (itf.filter_id = 2 AND itf.value_id = '7')
GROUP BY itf.item_id
WITH ROLLUP
HAVING COUNT(*) = 2
LIMIT 0,10;
It takes 0.008s when there are entries that match the query and 0.05s when no entries match.
I tried different variations before:
SELECT * FROM items WHERE item_id IN (
SELECT `item_id`
FROM `items_filters`
WHERE (`filter_id`='1' AND `value_id`=1)
OR (`filter_id`='2' AND `value_id`=7)
GROUP BY `item_id`
HAVING COUNT(*) = 2
) LIMIT 0,6;
This completely freezes MySQL when there are no entries.
What I really don't get is that
SELECT i.*
FROM items_filters itf INNER JOIN items i ON i.item_id = itf.item_id
WHERE itf.filter_id = 1 AND itf.value_id = '1' LIMIT 0,1
takes ~0.05s when no entries are found and ~0.008s when there are some.
Explain
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | i | ALL | PRIMARY | NULL | NULL | NULL | 10 | Using temporary; Using filesort |
| 1 | SIMPLE | itf | ref | item_id | item_id | 4 | ss_stylet.i.item_id | 1 | Using where; Using index |
Aside from ensuring an index on items_filters covering (filter_id, value_id), I would prequalify your item IDs up front with a GROUP BY, THEN join to the items table. It looks like you are trying to find items that meet two specific conditions, and for those, grab the item details.
I've also left the GROUP BY in the inner query, which returns a single instance per ID. And since the inner query already applies the LIMIT 0,10, it's not throwing too many results to be joined to your items table.
However, since you are not doing any aggregates in the outer query, I believe the WITH ROLLUP is not really going to provide you any benefit and could be removed.
SELECT i.*
FROM
( select itf.item_id
from items_filters itf
WHERE (itf.filter_id = 1 AND itf.value_id = '1')
OR (itf.filter_id = 2 AND itf.value_id = '7')
GROUP BY itf.item_id
HAVING COUNT(*) = 2
LIMIT 0, 10 ) PreQualified
JOIN items i
ON PreQualified.item_id = i.item_id
Another approach MIGHT be to do a JOIN in the inner query so you don't even need a GROUP BY and HAVING. Since you are explicitly looking for exactly two filters, I would then try the following. The first qualifier is that the item MUST have an entry with filter_id = 1 and value_id = '1'; if it doesn't even hit THAT entry, the second is never considered. Then, by joining to the same table (aliased itf2) on the same item_id, it must also find the conditions for the second filter (filter_id = 2, value_id = '7'). This basically forces a single pass against the first entry before considering anything else, and would STILL produce your limited set of 10 before getting item details.
SELECT i.*
FROM
( select itf.item_id
from items_filters itf
join items_filters itf2
on itf.item_id = itf2.item_id
AND itf2.filter_id = 2
AND itf2.value_id = '7'
WHERE
itf.filter_id = 1 AND itf.value_id = '1'
LIMIT 0, 10 ) PreQualified
JOIN items i
ON PreQualified.item_id = i.item_id
I also removed the group by / with rollup as per your comment of duplicates (which is what I expected).
That looks like four tables to me.
Do an EXPLAIN PLAN on the query and look for a TABLE SCAN. If you see one, add indexes on the columns in the WHERE clauses. Those will certainly help.
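For the schema above, the suggested index would look something like this (name illustrative):

```sql
-- Matches the (filter_id, value_id) pairs in the WHERE clause and
-- carries item_id, so the per-item count is computed from the index alone.
CREATE INDEX ix_if_filter_value_item
    ON items_filters (filter_id, value_id, item_id);
```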

Optimizing unexplainably slow MySQL query

I'm losing hair over a stupid query. First, let me explain its goal. I have a set of values fetched every hour and stored in the DB. These values can increase or stay equal over time. This query extracts the latest value day by day for the latest 60 days (I have twin queries to extract the latest value by week and by month; they are similar). The query is self-explanatory:
SELECT l.value AS value
FROM atable AS l
WHERE l.time = (
SELECT MAX(m.time)
FROM atable AS m
WHERE DATE(l.time) = DATE(m.time)
LIMIT 1
)
ORDER BY l.time DESC
LIMIT 60
It looks nothing special. But it's extremely slow (> 30 secs), considering time is indexed and the table contains fewer than 5000 rows. And I'm sure the problem is in the sub-query.
Where is the noob mistake?
Update 1: Same situation if I avoid MAX() by using SELECT m.time ... ORDER BY m.time DESC.
Update 2: It seems not to be a problem with the DATE() function being called too many times. I tried creating a calculated field day DATE. The UPDATE atable SET day = DATE(time) runs in less than 2 secs. The modified query, with l.day = m.day (no functions!), runs in exactly the same time as before.
The main issue I see is using DATE() on the left side of the expression in the WHERE clause. Wrapping the column in DATE() prevents MySQL from using an index on the time field; instead, it must scan all rows to apply the function to each one.
Instead of this:
WHERE DATE(l.time) = DATE(m.time)
Try something like this:
WHERE l.time BETWEEN
DATE_SUB(m.time, INTERVAL TIME_TO_SEC(m.time) SECOND)
AND DATE_ADD(DATE_SUB(m.time, INTERVAL TIME_TO_SEC(m.time) SECOND), INTERVAL 86399 SECOND)
Maybe you know of a better way to turn m.time into a range like 2012-02-09 00:00:00 to 2012-02-09 23:59:59 than the above example, but the idea is to keep the left side of the expression as the raw column name, l.time in this case, and give it a range in the form of two constants (or two expressions that can be converted to constants) on the right side.
EDIT
I'm using your pre-calculated day field:
SELECT *
FROM atable a
WHERE a.time IN
(SELECT MAX(time)
FROM atable
GROUP BY day
ORDER BY day DESC
LIMIT 60)
At least here, the inner query is only run once, and then a lookup is done against the IN clause. You're still scanning the table, but just once, and the advantage of the inner query being run just once will probably make a huge dent.
If you know that you have values for every day, you could improve the inner query by adding a WHERE clause limiting it to the last 60 calendar days, and losing the LIMIT 60. Make sure that day and time are indexed.
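If they aren't already, a sketch of those indexes, using the pre-calculated day column from Update 2 (names illustrative):

```sql
-- (day, time) lets MAX(time) GROUP BY day be answered from the
-- index alone, one probe per day; time alone serves the outer lookup.
CREATE INDEX idx_day_time ON atable (day, time);
CREATE INDEX idx_time ON atable (time);
```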
Instead of using MAX(m.time) do the following in the sub-select
SELECT m.time
FROM table AS m
WHERE DATE(l.time) = DATE(m.time)
ORDER BY m.time DESC
LIMIT 1
This might help speed up the query, since it gives the query parser an alternative.
However, one other thing I noticed: you are using DATE(l.time) and DATE(m.time). Since your index is on the raw column, not on the expression DATE(m.time), the index cannot be used, which could cause the slowness.
Based on the feedback answer: if the entries are added sequentially by date/time, directly correlated with the auto-increment ID, you don't care about the TIME at all. Get the auto-increment number for an exact, non-ambiguous join:
select
A1.AutoID,
A1.time,
A1.Value
from
( select date( A2.time ) as SingleDate,
max( A2.AutoID ) as MaxAutoID
from aTable A2
where date( A2.Time ) >= date( date_sub( now(), interval 60 day ))
group by date( A2.time ) ) AS MaxPerDate
JOIN aTable A1
on MaxPerDate.MaxAutoID = A1.AutoID
order by
A1.AutoID DESC
You could use the EXPLAIN statement to get MySQL to tell you what it's doing:
EXPLAIN SELECT l.value AS value
FROM table AS l
WHERE l.time = (
SELECT MAX(m.time)
FROM table AS m
WHERE DATE(l.time) = DATE(m.time) LIMIT 1
)
ORDER BY l.time DESC LIMIT 60
That should at least give you an insight where to look further.
If you have an index on time, I would suggest getting the first row with ORDER BY ... LIMIT 1 instead of MAX(), as follows:
SELECT l.value AS value
FROM table AS l
WHERE l.time = (
SELECT m.time
FROM table AS m
WHERE DATE(l.time) = DATE(m.time)
ORDER BY m.time DESC LIMIT 1
)
ORDER BY l.time DESC LIMIT 60
Your outer query is using a filesort without indexes.
Try changing to InnoDB engine to see if it improves things.
Doing a quick test:
mysql> show create table atable\G
*************************** 1. row ***************************
Table: atable
Create Table: CREATE TABLE `atable` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`t` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `t` (`t`)
) ENGINE=InnoDB AUTO_INCREMENT=51 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql> explain SELECT id FROM atable AS l WHERE l.t = ( SELECT MAX(m.t) FROM atable AS m WHERE DATE(l.t) = DATE(m.t) LIMIT 1 ) ORDER BY l.t DESC LIMIT 50;
+----+--------------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| 1 | PRIMARY | l | index | NULL | t | 4 | NULL | 50 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | m | index | NULL | t | 4 | NULL | 50 | Using where; Using index |
+----+--------------------+-------+-------+---------------+------+---------+------+------+--------------------------+
2 rows in set (0.00 sec)
After changing to MyISAM:
mysql> explain SELECT id FROM atable AS l WHERE l.t = ( SELECT MAX(m.t) FROM atable AS m WHERE DATE(l.t) = DATE(m.t) LIMIT 1 ) ORDER BY l.t DESC LIMIT 50;
+----+--------------------+-------+-------+---------------+------+---------+------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+------+---------+------+------+-----------------------------+
| 1 | PRIMARY | l | ALL | NULL | NULL | NULL | NULL | 50 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | m | index | NULL | t | 4 | NULL | 50 | Using where; Using index |
+----+--------------------+-------+-------+---------------+------+---------+------+------+-----------------------------+
2 rows in set (0.00 sec)