MySQL: nested select speed problem

I have the following tables:
|ELEMENTS|
------------
|id_element|
|id_catalog|
|value|
|CATALOG|
------------
|id_catalog|
|catalog_name|
|show|
|status|
I tried adding different indexes (several variants):
1) ELEMENTS: pair(id_element, id_catalog) and id_element and id_catalog
2) ELEMENTS: pair(id_element, id_catalog) and id_element
3) ELEMENTS: pair(id_element, id_catalog) and id_catalog
4) ELEMENTS: id_element and id_catalog
1) CATALOG: pair(show, status) and id_catalog
2) CATALOG: id_catalog and show and status
Then I execute the following select:
SELECT DISTINCT `id_element` FROM `ELEMENTS`
WHERE (id_catalog IN (SELECT `id_catalog` FROM `CATALOG` WHERE status=1 AND `show` = 1)) LIMIT 10
If there are some matching rows, it works very fast. But if the result is empty, it takes more than 4 seconds.
At the same time, SELECT `id_catalog` FROM `CATALOG` WHERE status=1 AND `show` = 1 runs fast both when it returns rows and when it returns nothing.
In the table ELEMENTS there are 100.000 records
In the table CATALOG there are 15.000 records
I also tried a join, but it took even longer.
Why does the query take so long when the result is empty, and what should I do to speed it up?
Here is the EXPLAIN output:
+----+--------------------+----------+-----------------+-------------------------+---------+---------+------+--------+------------------------------+
| id | select_type        | table    | type            | possible_keys           | key     | key_len | ref  | rows   | Extra                        |
+----+--------------------+----------+-----------------+-------------------------+---------+---------+------+--------+------------------------------+
| 1  | PRIMARY            | ELEMENTS | index           |                         | NULL    | NULL    | NULL | 270044 | Using where; Using temporary |
| 2  | DEPENDENT SUBQUERY | CATALOG  | unique_subquery | PRIMARY,pair,id_catalog | PRIMARY | 4       | func | 1      | Using where                  |
+----+--------------------+----------+-----------------+-------------------------+---------+---------+------+--------+------------------------------+

I guess indexing CATALOG(status,show) would allow a quick answer to the sub-select.
And then some index on ELEMENTS(id_catalog) would speed up the answer to the main question.
It may depend on the statistics for these columns: if they are not selective enough, you'll end up with many rows anyway.
Could you show the output of EXPLAIN when using the two indexes above?

Why not simply write a join to help the optimizer do its job?
SELECT DISTINCT id_element
FROM elements JOIN catalog ON elements.id_catalog=catalog.id_catalog
WHERE status=1 AND `show` = 1
LIMIT 10
(untested)

Well, the reason you're having the problem is that you're pulling up the catalog for each request and finding every match between the element and the catalog. If MySQL finds 10 entries, it bails out, but if it never finds them it will continue to check the entire table. I would use an EXISTS query to try to get some performance increase.
SELECT DISTINCT(e.id_element)
FROM ELEMENTS e
WHERE EXISTS (
SELECT *
FROM CATALOG c
WHERE c.id_catalog = e.id_catalog
AND c.status = 1
AND c.`show` = 1)
LIMIT 10;
This will decrease the amount of time MySQL spends looking for the catalog for each element, because EXISTS stops at the first matching row (effectively a LIMIT 1 on the inner query), but you always run the risk of a long search time when there are no matches.
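To check that the IN, JOIN and EXISTS forms discussed in this thread really return the same rows, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere), and the sample rows are made up. It says nothing about MySQL timings, only about result equivalence.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE catalog (id_catalog INTEGER PRIMARY KEY,
                          status INTEGER, "show" INTEGER);
    CREATE TABLE elements (id_element INTEGER, id_catalog INTEGER);
    INSERT INTO catalog VALUES (1, 1, 1), (2, 1, 0), (3, 0, 1);
    INSERT INTO elements VALUES (10, 1), (11, 1), (12, 2), (13, 3), (10, 2);
""")

QUERIES = {
    "in": """SELECT DISTINCT id_element FROM elements
             WHERE id_catalog IN (SELECT id_catalog FROM catalog
                                  WHERE status = 1 AND "show" = 1)""",
    "join": """SELECT DISTINCT e.id_element FROM elements e
               JOIN catalog c ON c.id_catalog = e.id_catalog
               WHERE c.status = 1 AND c."show" = 1""",
    "exists": """SELECT DISTINCT e.id_element FROM elements e
                 WHERE EXISTS (SELECT 1 FROM catalog c
                               WHERE c.id_catalog = e.id_catalog
                                 AND c.status = 1 AND c."show" = 1)""",
}

# Run each variant and collect the sorted element ids it returns.
results = {name: sorted(row[0] for row in conn.execute(sql))
           for name, sql in QUERIES.items()}
print(results)
```

Since all three forms are equivalent in what they return, the choice between them comes down to which plan the optimizer picks for each, which is worth checking with EXPLAIN on the real server.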

I would put these indices there:
CREATE INDEX idx_element_1 ON ELEMENTS (id_catalog);
CREATE INDEX idx_catalog_1 ON CATALOG (status, `show`);
Also these, although they might not be needed for your query (these should probably be primary keys, unless you have duplicates):
CREATE INDEX idx_element_2 ON ELEMENTS (id_element);
CREATE INDEX idx_catalog_2 ON CATALOG (id_catalog);
Could you drop other indices and create these and check back with the query results?

Thanks to all. I solved it by denormalizing the tables, because there is too much data spread across these separate tables.
I decided to combine them into one table, and now it works perfectly: the query always takes 0.03 seconds.

Related

SQL query optimization - really nothing more to improve?

I have the following query. I picked it from mysql slow queries log:
SELECT AVG(item.duration) AS dur
FROM `item`
INNER JOIN item_step ON item_step.item_id = item.id
WHERE
item_step.number = '2' AND
(IS_OK(item_step.result) OR item_step.result2 IN ("R1", "R2")) AND
item.time >= '2015-03-01 07:00:00' AND
item.time < '2015-05-01 07:00:00';
As usual, I tried to inspect it using EXPLAIN:
+----+-------------+-----------+------+----------------------------+---------+---------+------------------+--------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------+----------------------------+---------+---------+------------------+--------+----------+-------------+
| 1 | SIMPLE | item | ALL | PRIMARY,time | NULL | NULL | NULL | 790464 | 38.74 | Using where |
| 1 | SIMPLE | item_step | ref | number,item_id,result2_idx | item_id | 4 | debug_db.item.id | 1 | 100.00 | Using where |
+----+-------------+-----------+------+----------------------------+---------+---------+------------------+--------+----------+-------------+
Adding an index on table item over id and time changed nothing.
The time column already has an index, and the tables are connected by foreign keys, which are indexed too.
I have no idea what to do here. Is it really impossible to optimize this query to avoid join type ALL?
Since you already seem to have a FK from item_step.item_id to item.id, the only option you have for improvement is focusing on the parts being used to filter out records.
Slightly reformatting your query we have :
SELECT AVG(item.duration) AS dur
FROM `item`
INNER JOIN item_step
ON item_step.item_id = item.id
AND item_step.number = '2'
AND (IS_OK(item_step.result) OR item_step.result2 IN ("R1", "R2"))
WHERE item.time >= '2015-03-01 07:00:00'
AND item.time < '2015-05-01 07:00:00';
First thing to notice is IS_OK(item_step.result). I have no clue what's behind this function, but I'm pretty sure it blocks the optimizer from using any index on this field efficiently. If the formula is something that can be written in the query directly, I would suggest doing so (e.g. IN (1, 4, 9), or IN (SELECT OK FROM result_values) etc.).
Going by the field names, I'm going to assume that we first want to reduce the item_id list to a minimum and then use that reduced list to work on the item_step table. To do so you'll need an index on the time field first. I'm assuming that the item_id field is automatically included in the index as it's the PK field, but I'm no MySQL specialist and it might also depend on your storage engine. Anyway, in MSSQL that's how it would work, YMMV.
The second thing to do then is to go with this list of item_ids to the item_step table and reduce the number of records there. For this you'll want a compound index on item_id, number, result2, result. If you manage to write the IS_OK() function 'inline' into the query you might want to try swapping the last two fields around... something you'll need to test.
From what I read here and there, MySQL does not support something like INCLUDE on indexes in the same way as MSSQL does. A way around that would be to create a 'covering' index on time, duration on item. That way, everything can be done from the index directly, at the cost of more disk-space and CPU requirements when adding data to the item table.
In short:
add index on item on time, duration
add index on item_step on item_id, number, result2, result
see if you can inline the IS_OK() function.
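To illustrate what "covering" means for the suggested (time, duration) index, here is a small Python sketch. It uses the standard-library sqlite3 module rather than MySQL (an assumption, purely so it is runnable), the index name idx_item_time_duration and the sample rows are made up, and SQLite's planner is not MySQL's. The point is only that a query touching just time and duration can be answered from the index alone.

```python
import sqlite3

# SQLite stands in for MySQL here; names and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, time TEXT, duration REAL);
    CREATE INDEX idx_item_time_duration ON item (time, duration);
""")
conn.executemany(
    "INSERT INTO item (time, duration) VALUES (?, ?)",
    [("2015-03-%02d 08:00:00" % day, float(day)) for day in range(1, 29)],
)

sql = """SELECT AVG(duration) FROM item
         WHERE time >= '2015-03-10 07:00:00'
           AND time <  '2015-03-20 07:00:00'"""

# The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
plan = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))
avg_duration = conn.execute(sql).fetchone()[0]
print(plan)
print(avg_duration)
```

On MySQL, the corresponding sign to look for would be "Using index" in the Extra column of EXPLAIN for the item table.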

retrieving top-ranking rows from large tables using FULLTEXT is very slow

When we log into our database with mysql-client and launch these queries:
first test query:
select a.*
from ads a
inner join searchs_titles s on s.id_ad = a.id
where match(s.label) against ('"bmw serie 3"' in boolean mode)
order by a.ranking asc limit 0, 10;
The result is:
10 rows in set (1 min 5.37 sec)
second test query:
select a.*
from ads a
inner join searchs_titles s on s.id_ad = a.id
where match(s.label) against ('"ford mondeo"' in boolean mode)
order by a.ranking asc limit 0, 10;
The result is:
10 rows in set (2 min 13.88 sec)
These queries are too slow. Is there a way to improve this?
The 'ads' table contains 2 million rows; triggers are set to duplicate the data into searchs_titles, which contains the id, title and label of each row in ads.
Table 'ads' uses InnoDB and 'searchs_titles' uses MyISAM, with a fulltext index on the label field.
Do we have too many columns? Too many indexes? Too many rows?
Is it a bad query?
Thanks a lot for the time you will spend helping us!
Edit: add explain
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | s | fulltext | id_ad,label | label | 0 | | 1 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | a | eq_ref | PRIMARY,id,id_2,id_3 | PRIMARY | 4 | XXXXXX.s.id_ad | 1 | |
Pro tip: Never use * in a SELECT statement in production software (unless you have a very good reason). By asking for all columns, you are denying the optimizer access to information about how best to exploit your indexes.
Observation: you're ordering by ads.ranking and taking ten results. But ads.ranking has very low cardinality -- according to that image in your question, it has 26 distinct values. Is your query working correctly?
Observation: You've said that the fulltext part of your search takes .77 seconds. I mean this part:
select s.id
from searchs_titles AS s
where match(s.label) against ('"ford mondeo"' in boolean mode)
That is good. It means we can focus on the rest of the query.
You also said you've been testing with the insertions to the table turned off. That's good because it rules out contention as a cause for the slow queries.
Suggestion: Create a suitable compound index for ads. For your present query, try an index on (id, ranking). This may allow your ORDER BY operation to avoid a full table scan.
Then, try this query to extract the set of ten a.id values you need, and then retrieve the data rows. This will exploit your compound index.
select z.*
from ads AS z
join ( select a.id, a.ranking
from ads AS a
inner join searchs_titles s on s.id_ad = a.id
where match(s.label) against ('"ford mondeo"' in boolean mode)
order by a.ranking asc
limit 0, 10
) AS b ON z.id = b.id
order by z.ranking
This uses a subquery to do the ORDER BY ... LIMIT ... data-shuffling operation on a small subset of the columns. This should make the retrieval of the appropriate id values much faster. Then the outer query fetches the appropriate rows.
The bottom line is this: ORDER BY ... LIMIT ... can be a very expensive operation if it's done on lots of data. But if you can arrange for it to be done on a minimal choice of columns, and those columns are indexed correctly, it can be very fast.

Why is MySQL slow when using LIMIT in my query?

I'm trying to figure out why one of my queries is slow and how I can fix it, but I'm a bit puzzled by my results.
I have an orders table with around 80 columns and 775179 rows, and I'm running the following query:
SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC LIMIT 200
which returns 38 rows in 4.5s
When removing the ORDER BY I'm getting a nice improvement :
SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL LIMIT 200
38 rows in 0.30s
But when removing the LIMIT without touching the ORDER BY I'm getting an even better result :
SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC
38 rows in 0.10s (??)
Why is my LIMIT so hungry ?
GOING FURTHER
I tried a few things before posting and, after noticing that I had an index on creation_date (which is a datetime), I removed it; the first query now runs in 0.10s. Why is that?
EDIT
Good guess, I have indexes on the other columns used in the WHERE clause.
mysql> explain SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC LIMIT 200;
+----+-------------+--------+-------+------------------------+---------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+------------------------+---------------+---------+------+------+-------------+
| 1 | SIMPLE | orders | index | id_state_idx,id_mp_idx | creation_date | 5 | NULL | 1719 | Using where |
+----+-------------+--------+-------+------------------------+---------------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql> explain SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC;
+----+-------------+--------+-------+------------------------+-----------+---------+------+-------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+------------------------+-----------+---------+------+-------+----------------------------------------------------+
| 1 | SIMPLE | orders | range | id_state_idx,id_mp_idx | id_mp_idx | 3 | NULL | 87502 | Using index condition; Using where; Using filesort |
+----+-------------+--------+-------+------------------------+-----------+---------+------+-------+----------------------------------------------------+
Indexes do not necessarily improve performance. To better understand what is happening, it would help if you included the explain for the different queries.
My best guess would be that you have an index in id_state or even id_state, id_mp that can be used to satisfy the where clause. If so, the first query without the order by would use this index. It should be pretty fast. Even without an index, this requires a sequential scan of the pages in the orders table, which can still be pretty fast.
Then when you add the index on creation_date, MySQL decides to use that index instead for the order by. This requires reading each row in the index, then fetching the corresponding data page to check the where conditions and return the columns (if there is a match). This reading is highly inefficient, because it is not in "page" order but rather as specified by the index. Random reads can be quite inefficient.
Worse, even though you have a limit, you still have to read the entire table: only 38 rows match, so the scan never collects 200 rows and cannot stop early. Although you have saved a sort on 38 records, you have created a massively inefficient query.
By the way, this situation gets significantly worse if the orders table does not fit in available memory. Then you have a condition called "thrashing", where each new record tends to generate a new I/O read. So, if a page has 100 records on it, the page might have to be read 100 times.
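You can make all these queries run faster by having an index on orders(id_state, id_mp, creation_date). The where clause will use the first two columns. Because id_mp IS NOT NULL is a range condition, the order by may still need a filesort instead of reading creation_date in index order, but only over the few matching rows.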
The same problem happened in my project.
I ran some tests and found that LIMIT is slow because of row lookups.
See:
MySQL ORDER BY / LIMIT performance: late row lookups
So, the solution is:
(A) when using LIMIT, select only the PK columns instead of all columns
(B) select all the columns you need, and join them with the result set of (A)
The SQL should look like this:
SELECT
O1.*
FROM
orders O1 -- the rows you actually want
JOIN
(
SELECT
ID -- fetch the PK column only; this should be fast
FROM
orders
WHERE
[your query condition] -- filter records by condition
ORDER BY
[your order by condition] -- control the record order
LIMIT 2000, 50 -- pick the page
) AS O2
ON
O1.ID = O2.ID
ORDER BY
[your order by condition] -- control the record order
In my DB,
the old SQL, which selects all columns using "LIMIT 21560, 20", takes about 4.484s.
The new SQL takes only 0.063s, so it is about 71 times faster.
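Here is a small runnable sketch of the late-row-lookup pattern above, written in Python with the standard-library sqlite3 module standing in for MySQL (an assumption so it runs without a server). The table layout, the idx_state_date index and the generated rows are all made up. It only checks that the rewritten query returns the same page as the direct one; the speedup itself has to be measured on the real MySQL table.

```python
import sqlite3

# SQLite stands in for MySQL; schema and data are invented for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, id_state INTEGER,"
    " creation_date TEXT, payload TEXT)"
)
conn.execute("CREATE INDEX idx_state_date ON orders (id_state, creation_date)")
conn.executemany(
    "INSERT INTO orders (id_state, creation_date, payload) VALUES (?, ?, ?)",
    [(i % 3, "2015-01-01 00:%02d:%02d" % divmod(i, 60), "row %d" % i)
     for i in range(600)],
)

# Direct form: sort and page over full rows.
direct_sql = """
    SELECT * FROM orders
    WHERE id_state = 2
    ORDER BY creation_date DESC
    LIMIT 5 OFFSET 50
"""
# Late row lookup: page over ids only, then join back for the full rows.
deferred_sql = """
    SELECT o1.* FROM orders o1
    JOIN (SELECT id FROM orders
          WHERE id_state = 2
          ORDER BY creation_date DESC
          LIMIT 5 OFFSET 50) o2 ON o1.id = o2.id
    ORDER BY o1.creation_date DESC
"""
direct_rows = conn.execute(direct_sql).fetchall()
deferred_rows = conn.execute(deferred_sql).fetchall()
print(direct_rows == deferred_rows)
```

The outer ORDER BY is repeated on purpose: the join does not guarantee that rows come back in the order of the inner query.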
I had a similar issue on a table of 2.5 million records. Without the limit part, the query took a few seconds. With the limit part, it hung forever.
I solved it with a subquery. In your case it would become:
SELECT *
FROM
(SELECT *
FROM orders
WHERE id_state = 2
AND id_mp IS NOT NULL
ORDER BY creation_date DESC) tmp
LIMIT 200
I noted that the original query was fast when the number of selected rows was greater than the limit parameter. So the query became extremely slow when the limit parameter had no effect.
Another solution is to try forcing an index. In your case you can try:
SELECT *
FROM orders force index (id_mp_idx)
WHERE id_state = 2
AND id_mp IS NOT NULL
ORDER BY creation_date DESC
LIMIT 200
The problem is that MySQL is forced to sort the data on the fly. My deep-offset query, like:
ORDER BY somecol LIMIT 99990, 10
took 2.5s.
I fixed it by creating a new table that contains only the ids, presorted by somecol; there the deep offset (with no need for ORDER BY) takes 0.09s.
0.1s is still not fast enough, though. 0.01s would be better.
I will end up creating a table that holds the page number as a special indexed column, so instead of doing LIMIT x, y I will query WHERE page = Z.
I just tried it and it takes as little as 0.0013s. The only problem is that the offsets are based on static numbers (presorted into pages of 10 items, for example). It's not a big problem, though: you can still get any data from any page.

Why does MySQL report that no key is used when I use rand() in WHERE?

I have a table that has 4,000,000 records.
The table was created like this: (user_id int, partner_id int, PRIMARY KEY (user_id)) engine=InnoDB;
I want to test the performance of selecting 100 records.
So I tested the following:
mysql> explain select user_id from MY_TABLE use index (PRIMARY) where user_id IN ( 1 );
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------------+
| 1 | PRIMARY | MY_TABLE | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index |
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------------+
1 row in set, 1 warning (0.00 sec)
This is OK.
But this query is cached by MySQL,
so the test is meaningless after the first run.
Then I thought of a query that selects by a random value.
I tested the following:
mysql> explain select user_id from MY_TABLE use index (PRIMARY) where user_id IN ( select ceil( rand() ) );
+----+-------------+----------+-------+---------------+---------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+--------------------------+
| 1 | PRIMARY | MY_TABLE | index | NULL | PRIMARY | 4 | NULL | 3998727 | Using where; Using index |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+--------------------------+
But this is bad.
EXPLAIN shows that possible_keys is NULL.
So a full index scan is planned, and in fact it is much slower than the previous query.
How can I write a query that selects by a random value and still uses an index lookup?
Thanks
Using rand() in SQL is usually a sure-fire way to make the query slow. A common theme here is people using it in ORDER BY to get a random sequence. It's slow because not only does it throw away the indexes, but it also reads through the whole table.
However in your case, the fact that the function calls are in a sub-query ought to allow the outer query to still use its indexes. The fact that it isn't seems quite odd (so I've given the question a +1 vote).
My theory is that perhaps MySQL's optimiser is getting it wrong -- it's seeing the functions in the inner query, and deciding incorrectly that it can't use an index.
The only thing I can suggest to work around that is using force index to push MySQL into using the index you want.
See the definition of rand().
If I understand correctly, you are trying to get a random record from the database. If that is the case, again from the rand() definition:
ORDER BY RAND() combined with LIMIT is useful for selecting a random sample from a set of rows:
SELECT * FROM table1, table2 WHERE a=b AND c<d ORDER BY RAND() LIMIT 1000;
It's a limitation of the MySQL optimizer, that it can't tell that the subquery returns exactly one value, it has to assume the subquery returns multiple rows with unpredictable values, potentially even all the values of user_id. Therefore it decides it's just going to do an index scan.
Here's a workaround:
mysql> explain select user_id from MY_TABLE use index (PRIMARY)
where user_id = ( select ceil( rand() ) );
Note that MySQL's RAND() function returns a value in the range 0 <= v < 1.0. If you CEIL() it, you'll likely get the value 1. Therefore you'll virtually always get the row where user_id=1. If you don't have such a row in your table, you'll get an empty set result. You certainly won't get a user chosen randomly among all your users.
To fix that problem, you'd have to multiply the rand() by the number of distinct user_id values. And that brings up the problem that you might have gaps, so a randomly chosen value won't match any existing user_id.
Re your comment:
You'll always see possible keys as NULL when you get an index scan (i.e., "type" is "index").
I tried your explain query on a similar table, and it appears that the optimizer can't figure out that the subquery is a constant expression. You can work around this limitation by calculating the random number in application code and then using the result as a constant value in your query:
select user_id from MY_TABLE use index (PRIMARY)
where user_id = $random;
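As a rough sketch of computing the random value in application code, here is a Python version. It uses the standard-library sqlite3 module in place of a MySQL driver (an assumption so it runs standalone); pick_random_user and the sample ids are hypothetical. To cope with gaps in user_id it takes the first id at or above the random candidate, which is a technique added here, not something from the question, and it favors ids that follow a gap, so it is not a uniform pick.

```python
import random
import sqlite3

def pick_random_user(conn, rng=random):
    """Return one existing user_id chosen via a random candidate value.

    Hypothetical helper: the candidate is bound as a constant parameter,
    so the lookup can be a plain primary-key range seek.
    """
    lo, hi = conn.execute(
        "SELECT MIN(user_id), MAX(user_id) FROM my_table").fetchone()
    if lo is None:  # empty table
        return None
    candidate = rng.randint(lo, hi)  # computed in the application
    row = conn.execute(
        "SELECT user_id FROM my_table"
        " WHERE user_id >= ? ORDER BY user_id LIMIT 1",
        (candidate,),
    ).fetchone()
    return row[0]

# Demo data with gaps in user_id (SQLite stands in for MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (user_id INTEGER PRIMARY KEY, partner_id INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                 [(i, i * 10) for i in (1, 2, 5, 9, 10)])

picked = [pick_random_user(conn, random.Random(seed)) for seed in range(50)]
print(sorted(set(picked)))
```

If a strictly uniform choice matters, one option is to retry on a miss with an exact user_id = ? lookup instead of using >=.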

Performance difference between DISTINCT and GROUP BY

My understanding is that in (My)SQL a SELECT DISTINCT should do the same thing as a GROUP BY on all columns, except that GROUP BY does implicit sorting, so these two queries should be the same:
SELECT boardID,threadID FROM posts GROUP BY boardID,threadID ORDER BY NULL LIMIT 100;
SELECT DISTINCT boardID,threadID FROM posts LIMIT 100;
They're both giving me the same results, and they're giving identical output from EXPLAIN:
+----+-------------+-------+------+---------------+------+---------+------+---------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+-----------------+
| 1 | SIMPLE | posts | ALL | NULL | NULL | NULL | NULL | 1263320 | Using temporary |
+----+-------------+-------+------+---------------+------+---------+------+---------+-----------------+
1 row in set
But on my table the query with DISTINCT consistently returns instantly and the one with GROUP BY takes about 4 seconds. I've disabled the query cache to test this.
There's 25 columns so I've also tried creating a separate table containing only the boardID and threadID columns, but the same problem and performance difference persists.
I have to use GROUP BY instead of DISTINCT so I can include additional columns without them being included in the evaluation of DISTINCT. So now I don't know how to proceed. Why is there a difference?
First of all, your queries are not quite the same: the GROUP BY query has an ORDER BY, but the DISTINCT query does not.
Note that in either case no index is used, and that cannot be good for performance.
I would suggest creating a compound index on (boardid, threadid). This should let both queries make use of the index, and both should become much faster.
EDIT: Explanation of why SELECT DISTINCT ... LIMIT 100 is faster than GROUP BY ... LIMIT 100 when you do not have indexes.
To execute the first statement (SELECT DISTINCT), the server only needs to fetch 100 rows, maybe slightly more, and can stop as soon as it has 100 different rows. There is no more work to do.
This is because the original SQL statement did not specify any order, so the server can deliver any 100 rows it pleases, as long as they are distinct. But if you were to impose any index-less ORDER BY on this before LIMIT 100, the query would immediately become slow.
To execute the second statement (SELECT ... GROUP BY ... LIMIT 100), MySQL always does an implicit ORDER BY on the same columns as were used in GROUP BY. In other words, it cannot stop quickly after fetching the first 100+ rows; it has to wait until all records are fetched, grouped and sorted. After that, it applies the ORDER BY NULL you added (which does not do much I guess, but dropping it may speed things up), and finally it takes the first 100 rows and throws away the remaining result. And of course, this is damn slow.
When you have a compound index, all these steps can be done very quickly in either case.
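The difference in how many rows each strategy has to read can be sketched in plain Python. This is a toy model of the reasoning above, not MySQL's actual executor; distinct_limit, group_sort_limit, counted and the generated posts data are all made up for illustration.

```python
def counted(rows, counter):
    """Yield rows while counting how many were actually read."""
    for row in rows:
        counter["read"] += 1
        yield row

def distinct_limit(rows, limit):
    """Stop reading as soon as `limit` distinct rows have been seen."""
    seen = {}  # dict keeps first-seen order
    for row in rows:
        seen.setdefault(row, None)
        if len(seen) == limit:
            break
    return list(seen)

def group_sort_limit(rows, limit):
    """Read every row, group and sort, and only then keep `limit` groups."""
    return sorted(set(rows))[:limit]

# Toy (boardID, threadID) pairs; the first 100 rows are already distinct.
posts = [(i % 500, i % 7) for i in range(100_000)]

distinct_reads = {"read": 0}
distinct_rows = distinct_limit(counted(posts, distinct_reads), 100)

group_reads = {"read": 0}
group_rows = group_sort_limit(counted(posts, group_reads), 100)

print(distinct_reads["read"], group_reads["read"])
```

With this data the early-exit version reads 100 rows, while the group-and-sort version has to read all 100,000 before it can return anything.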