mysql fastest way to get max value under specific threshold - mysql

I've got two tables and a slow query in mysql.
The tables:
Table clips with fields channel,start_time,end_time
Table shows with fields channel,start_time,end_time
both tables have indeces for field start_time.
I am trying to find the show that started just before the clip for many clips.
So far I've got this query:
SELECT (
SELECT shows.id
FROM shows
WHERE shows.starttime<=clips.starttime AND shows.channel=clips.channel
ORDER BY shows.starttime DESC
LIMIT 1) as show_id,
clips.*
FROM clips
For a small number of clips this works great but for large number of clips it gets too slow.
My understanding would be that the dependent subquery should be extra fast since there is an index on start_time and all that needs to be done is an index lookup. Nevertheless it is slow and explaining the query states "using where" instead of "using index".
Here is the output of explain
--+------------------+-----+-----+------------+---------+------+----+------+-----------------------+
id| select_type |table|type |possibleKeys| key |keylen|ref |rows | Extra |
--+------------------+-----+-----+------------+---------+------+----+------+-----------------------+
1|PRIMARY |clips|range| startDate |startDate| 8 |NULL| 9095 |Using where;Using index|
2|DEPENDENT SUBQUERY|shows|index| startDate |startDate| 8 |NULL|287896|Using where;Using index|
--+------------------+-----+-----+------------+---------+------+----+------+-----------------------+
Any suggestions on how to improve performace for this task would be greatly appreciated.

Try to rewrite the query as
SELECT max(shows.starttime) as show_start, shows.id as show_id, clips.*
FROM shows
INNER JOIN clips ON (clips.channel = shows.channel AND shows.starttime<=clips.starttime)
GROUP BY clips.id
Because clips are part of a show, you would expect them to be close together, you can limit the number of hits further by doing something like:
SELECT max(shows.starttime) as show_start, shows.id as show_id, clips.*
FROM shows
INNER JOIN clips ON (clips.channel = shows.channel
AND clips.starttime BETWEEN shows.starttime AND DATE_ADD(shows.starttime, INTERVAL 1 DAY) )
GROUP BY clips.id
This will prevent MySQL from running a full subquery with sort on every row of clips.

I think adding an index that uses both start_time and channel columns may improve the query performance to an acceptable value.
Johan's answer is great, but given your filters I think the index may improve the performance in any case.

Related

MySQL JOIN and ORDER BY - performance issue

I have this query that drives me crazy for quite some time. It has 3 tables (originally it has a lot more but I isolated the performance issue), 1 base table, 1 product table which adds more data, and 1 with product types.
The product types table contains a "max age" column which indicates the maximum age of a row I want to fetch (anything older is considered "archived") and its value is different according to the product type.
My poor performance query goes like this and it takes 50 seconds for a 250,000 rows base table:
(select d_baseservices.ID
from d_baseservices
inner join d_products on d_baseservices.ServiceID = d_products.ServiceID
inner join md_prodtypes on d_products.ProdType = md_prodtypes.ProdType
where
(d_baseservices.CreationDate > (curdate() - INTERVAL md_prodtypes.MaxAge DAY))
order by CreationDate desc
limit 750);
Here is the EXPLAIN of this query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE md_prodtypes index PRIMARY,ProdType_UNIQUE,ID_MAX_AGE MAX_AGE 5 23 Using index; Using temporary; Using filesort
1 SIMPLE d_products ref PRIMARY,ServiceID_UNIQUE,fk_Products_BaseServices1,fk_d_products_md_prodtypes1 fk_d_products_md_prodtypes1 4 combina.md_prodtypes.ProdType 8625
1 SIMPLE d_baseservices eq_ref PRIMARY,CreationDateDesc_index,CreationDate_index PRIMARY 8 combina.d_products.ServiceID 1 Using where
I found a clue a few days back, when I was able to determine that limiting the query to 750 records would cause is to go fast, but 751 would bring poor performance.
I tried creating indexes of many kinds, with no success.
I tried removing the reference to MAX_AGE and the curdate function and just set a fixed value, with little success as the query now takes 20 seconds:
(select d_baseservices.ID
from d_baseservices
inner join d_products on d_baseservices.ServiceID = d_products.ServiceID
inner join md_prodtypes on d_products.ProdType = md_prodtypes.ProdType
where
(d_baseservices.CreationDate > '2015-09-21 19:02:25')
order by CreationDate desc
limit 750);
And the EXPLAIN command output:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE md_prodtypes index PRIMARY,ProdType_UNIQUE,ID_MAX_AGE ProdType_UNIQUE 4 23 Using index; Using temporary; Using filesort
1 SIMPLE d_products ref PRIMARY,ServiceID_UNIQUE,fk_Products_BaseServices1,fk_d_products_md_prodtypes1 fk_d_products_md_prodtypes1 4 combina.md_prodtypes.ProdType 8625
1 SIMPLE d_baseservices eq_ref PRIMARY,CreationDateDesc_index,CreationDate_index PRIMARY 8 combina.d_products.ServiceID 1 Using where\
Can anyone please help? I'm stuck for almost a month
It's hard to say exactly what to do without knowing more about the specific data you have (how many rows in each table, how many rows you expect the query to return, the distribution of the data values, etc), but I'll make some educated guesses and hopefully point you in the right direction.
First an explanation about why taking md_prodtypes.MaxAge out of the query greatly reduced the run time: Prior to that change the database had no ability at all to filter using indexes because in order to see if rows are candidates for inclusion it had to join the three tables in order to compare CreationDate from the first table to MaxAge in the third table. There is simply no index that you can add to correlate these two values. You're forcing the database engine to look at every single row.
As to the 750 magic number - I'm guessing that past 750 results the database has to page data or that it's hitting some other memory limit based on the values in your specific MySQL configuration file. I wouldn't read too much into that 750 number.
Lastly I'd like to point out that the EXPLAIN of your second query is a bit strange since it's showing md_prodtypes as the first table despite the fact that you took MaxAge out of the WHERE. That means the database is starting from md_prodtypes then moving up to d_products and finally to d_baseservices and only then filtering based on the date. I'm guessing that you're expecting it to first filter on the date then join only when it's decided what baseservices records to include. It's impossible to know why this is happening with the information you've provided. Perhaps you are missing an index.
Another possibility may have to do with variance in your CreationDate column. Let me explain by example: Say you had a table of users, and each user had a gender column that could be either f or m. Let's pretend that we have a 50%/50% split of females and males. Now, if you add an index on the column gender and do a query filtered by WHERE gender='f' expecting that the index will filter out half of the records, you'd be surprised to see that the database will totally ignore the index and just scan the table. The reason being is that it's cheaper to just read the whole table if you know the index isn't filtering out enough (the alternative being jumping constantly from the index to the main table data). In your case, if the WHERE on the CreationDate column doesn't filter out enough records, then even if you have an index on it, it won't be used.
With a constant date...
INDEX(CreationDate)
That will encourage the optimizer to start with the table that can be filtered. Also, since the ORDER BY is on the same field, the WHERE, ORDER BY and LIMIT can all be done at the same time.
Otherwise, it must read all the relevant records from all 3 tables, sort them, then deliver 750 (or 751) of them.
Using MAX_AGE...
Now the optimizer won't know whether it is better to do as above or find all the rows, sort them, then deliver the LIMIT.

Query optimization order by

I have two tables
LangArticles | columns: id (INT) ,de (VARCHAR),en (VARCHAR),count_links(INT)
WikiLinks | columns: article_id,link_id,nr_in_article (all integer)
The name of an article is in the columns de (German) and en (English).
The id in the LangArticles table is the same as the ids article_id and link_id.
I want now to get all article names which links to another article. So I want all articles which links to 'abc'. 'abc' has the id = '1'
So my normal query (without an order by) looks like:
select distinct(LA.de),W.nr_in_article,LA.count_links from
LangArticles as LA inner join WikiLinks as W on W.article_id = LA.id
where W.link_id in ("1")
This maybe took 0.001 seconds and give me 100000 results. Actually I want the best 5 hits.
Best means in this case the most relevant ones. I want to sort it like this:
The articles which links to 'abc' at the beginning of an article (nr_in_article) and which has a lot of links itself (count_links) should have a high ranking.
I am using an
order by (1-(W.nr_in_article/LA.count_links)) desc
for this.
The problem is that I am not sure how to optimize this order by.
The Explain in mysql says that he has to use a temporary file and filesort and can't use the index on the order by keys. For testing I tried an "easy" order by W.nr_in_article so an normal order with one key.
For your information my indices are:
in LangArticles: id (primary),de (unique),en (unique), count_links(index)
in WikiLinks: article_id(index),link_id(index),nr_in_article(index)
But I tried this two multiindices link_id,nr_in_article & article_id,nr_in_article as well.
And the query with order by tooks approximately 5.5 seconds. :(
I think I know why MySql has to use a temporary file and filesort here because all 100,000 entries has to be found with one index (link_id) and afterwards it has to be sorted and in a temporary file it can't use an index.
But is there any way to make this faster?
Actually I only want the best 5 hits so there is no need to sort everything. I am not sure if sth. like the bad sort (bubble sort) would be faster for this than Quicksort which sorts the hole temporary table.
Since you only need the top 5 I think you could split it into two queries that should lead less results.
First like Sam pointed out,
order by (W.nr_in_article/LA.count_links) asc
should be equivalent to your
order by (1-(W.nr_in_article/LA.count_links)) desc
unless I'm overlooking some corner case here.
Furthermore, anything where
W.nr_in_article > LA.count_links
will be in the TOP 5 unless that result is empty, so I would try the query
select distinct(LA.de),W.nr_in_article,LA.count_links
from LangArticles as LA
inner join WikiLinks_2 as W on W.article_id = LA.id
and W.nr_in_article > LA.count_links
where W.link_id in ("1")
order by W.nr_in_article/La.count_links
limit 5
Only if this returns less than 5 results you have to further execute the query again with a changed where condition.
This however will not bring the runtime down by orders of magnitude, but should help a little. If you need more performance I don't see any other way than a materialized view, which I don't think is available in mysql, but can be simulated using triggers.

Slow MySQL query with AS and subquery

I have a problem with this slow query that runs for 10+ seconds:
SELECT DISTINCT siteid,
storyid,
added,
title,
subscore1,
subscore2,
subscore3,
( 1 * subscore1 + 0.8 * subscore2 + 0.1 * subscore3 ) AS score
FROM articles
WHERE added > '2011-10-23 09:10:19'
AND ( articles.feedid IN (SELECT userfeeds.siteid
FROM userfeeds
WHERE userfeeds.userid = '1234')
OR ( articles.title REGEXP '[[:<:]]keyword1[[:>:]]' = 1
OR articles.title REGEXP '[[:<:]]keyword2[[:>:]]' = 1 ) )
ORDER BY score DESC
LIMIT 0, 25
This outputs a list of stories based on the sites that a user added to his account. The ranking is determined by score, which is made up out of the subscore columns.
The query uses filesort and uses indices on PRIMARY and feedid.
Results of an EXPLAIN:
1 PRIMARY articles
range
PRIMARY,added,storyid
PRIMARY 729263 rows
Using where; Using filesort
2 DEPENDENT SUBQUERY
userfeeds
index_subquery storyid,userid,siteid_storyid
siteid func
1 row
Using where
Any suggestions to improve this query? Thank you.
I would move the calculation logic to the client and only load fields from the database. This makes your query and the calculation itself faster. It's not a good style to do such things in SQL code.
And also is the regex very slow, maybe another searching mode like 'LIKE' is faster.
Looking at your EXPLAIN, it doesn't appear your query is utilizing any index (thus the filesort). This is being caused by the sort on the calculated column (score).
Another barrier is the size of the table (729263 rows). You don't want to create an index that is too wide as it will take much more space and impact performance of your CUD operations. What we want to do is target the columns that are being selected, however, in this situation we can't since it's a calculated column. You can try creating a VIEW or either remove the sort or do it at the application layer.

Why does order by primary index make this query slow?

This query is getting the newest videos uploaded by the user's subscriptions, its running very slow so I rewrote it to use joins but It didn't make a difference and after tinkering with it I found out that removing ORDER BY would make it run fast (however it defeats the purpose of the query).
Query:
SELECT vid. *
FROM video AS vid
INNER JOIN subscriptions AS sub ON vid.uploader = sub.subscription_id
WHERE sub.subscriber_id = '1'
AND vid.privacy = 0 AND vid.blocked <> 1 AND vid.converted = 1
ORDER BY vid.id DESC
LIMIT 8
Running explain, it would show "Using temporary; Using filesort" in subscriptions table and its slow (0.0900 seconds).
Without ORDER BY vid.id DESC it doesn't show "Using temporary; Using filesort" so its fast (0.0004 seconds) but I don't understand how the other table can affect it like this.
All the fields are indexed (privacy blocked and converted fields don't affect performance by more than 10%).
I would paste the full explain information but I can't seem to make it fit nice in the layout of this site.
You're limiting the query to 8 results. When you run it without an order by, it can grab the first 8 rows it comes across that pass the condition, and then hand them back. Boom, it's done.
When you use the order by, you're not asking for any 8 records. You're asking for the first 8 records in terms of vid.id. So it has to figure out which those are, and the only way to do that is to look through the entire table and compare vid.id values. That's a lot more work.
Is there actually an index on the column? If so, it may be out of date. You could try rebuilding it.
Fixed it by suggesting that mysql use the primary index with USE_INDEX(PRIMARY)
SELECT vid. *
FROM video AS vid USE INDEX ( PRIMARY )
INNER JOIN subscriptions AS sub ON vid.uploader = sub.subscription_id
WHERE sub.subscriber_id = '1'
AND vid.privacy =0
AND vid.blocked <>1
AND vid.converted =1
ORDER BY vid.id DESC
LIMIT 8

Mysql: Best practice to get last records

I got a spicy question about mysql...
The idea here is to select the n last records from a table, filtering by a property, (possibly from another table). That simple.
At this point you wanna reply :
let n = 10
SELECT *
FROM huge_table
JOIN another_table
ON another_table.id = huge_table.another_table_id
AND another_table.some_interesting_property
ORDER BY huge_table.id DESC
LIMIT 10
Without the JOIN that's OK, mysql reads the index from the end and trow me 10 items, execution time is negligible
With the join, the execution time become dependent of the size of the table and in many case not negligible, the explain stating that mysql is : "Using where; Using index; Using temporary; Using filesort"
MySQL documentation (http://dev.mysql.com/doc/refman/5.1/en/order-by-optimization.html) states that :
"You are joining many tables, and the columns in the ORDER BY are not all from the first nonconstant table that is used to retrieve rows. (This is the first table in the EXPLAIN output that does not have a const join type.)"
explaining why MySQL can't use index to resolve my ORDER BY prefering a huge file sort ...
My question is : Is it natural to use ORDER BY ... LIMIT 10 to get last items ? Do you really do it while picking last 10 cards in an ascending ordered card deck ? Personally i just pick 10 from the bottom ...
I tried many possibilities but all ended giving the conclusion that i'ts really fast to query 10 first elements and slow to query 10 last cause of the ORDER BY clause.
Can a "Select last 10" really be fast ? Where i am wrong ?
Nice question, I think you should make order by column i.e., id a DESC index.
That should do the trick.
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
With the join you're now restricting rows to "some_interesting_property" and the ID's in your huge_table may no longer be consecutive... Try an index on another_table (some_interesting_property, id) and also huge_table (another_table_id, id) and see if your EXPLAIN gives you better hints.
I'm having trouble reproducing your situation. Whether I use ASC or DESC with my huge_table/another_table mock up, my EXPLAINs and execution time all show approx N rows read and a logical join. Which version of MySQL are you using?
Also, from the EXPLAIN doc, it states that Using index indicates
The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row
which doesn't correspond with the fact you're doing a SELECT *, unless you have an index which covers your whole table.
Perhaps you should show your schema, including indexes, and the EXPLAIN output.