Bad query performance. Where do I need to create the indexes? - mysql

I have a relatively simple query, but its performance is really bad.
I'm currently at something like 0.800 sec. per query.
Is there anything that I could change to make it faster?
I've already tried indexing the columns used in the WHERE clause and the join, but nothing works.
Here's the query that I'm using:
SELECT c.bloqueada, c.nomeF, c.confirmadoData
FROM encomendas_c_linhas el1
LEFT JOIN encomendas_c c ON c.lojaEncomenda=el1.lojaEncomenda
WHERE (el1.artigoE IN (342197) OR el1.artigoD IN (342197))
AND el1.desmembrado = 1
Here's the EXPLAIN:
As Bill Karwin asked, here follow the queries used to create the tables:
Table "encomendas_c_linhas"
https://pastebin.com/73WcsDnE
Table "encomendas_c"
https://pastebin.com/yCUx3wh0

In your EXPLAIN, we see that it's accessing the el1 table with type: ALL, which means it's scanning the whole table. The rows field shows an estimate of roughly 644,236 rows scanned.
In your table definition for the el1 table, you have this index:
KEY `order_item_id_desmembrado` (`order_item_id`,`desmembrado`),
Even though desmembrado appears in an index, that index will not help this query. Your query searches for desmembrado = 1, with no condition for order_item_id.
Think of a telephone book: I can search for people with last name 'Smith' and the order of the phone book helps me find them quickly. But if I search for people with the first name 'Sarah' the book doesn't help. Only if I search on the leftmost column(s) of the index does it help.
So you need an index with desmembrado as the leftmost column. Then the search for desmembrado = 1 might use the index to select those matching rows.
ALTER TABLE encomendas_c_linhas ADD INDEX (desmembrado);
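To make the leftmost-prefix rule concrete, here's a quick sketch (the literal 123 is only an illustration and assumes order_item_id is numeric; quote the value if it's a VARCHAR, for the reason explained further down):
-- These can use KEY order_item_id_desmembrado (order_item_id, desmembrado),
-- because the leftmost column of the index is constrained:
SELECT * FROM encomendas_c_linhas WHERE order_item_id = 123;
SELECT * FROM encomendas_c_linhas WHERE order_item_id = 123 AND desmembrado = 1;
-- This cannot use that index, but it can use the new single-column index on desmembrado:
SELECT * FROM encomendas_c_linhas WHERE desmembrado = 1;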
Note that if a large enough portion of the table matches, MySQL skips that index anyway; there's no advantage to using an index when it matches a large portion of the rows. In my experience, MySQL's optimizer avoids an index if the condition matches more than about 20% of the rows in the table.
The other conditions are in a disjunction (terms of an OR expression). There's no way to optimize these with a single index. Again, the telephone book example: Search for people with last name 'Smith' OR first name 'Sarah'. The last name lookup is optimized, but the first name lookup is not. No problem, we can make another index with first name listed first, so the first name lookup will be optimized. But in general, MySQL will use only one index per table reference per query. So optimizing the first name lookup will spoil the last name lookup.
Here's a workaround: Rewrite the OR condition into a UNION of two queries, with one term in each query:
SELECT c.bloqueada, c.nomeF, c.confirmadoData
FROM encomendas_c_linhas el1
LEFT JOIN encomendas_c c ON c.lojaEncomenda=el1.lojaEncomenda
WHERE el1.artigoD IN ('342197')
AND el1.desmembrado = 1
UNION
SELECT c.bloqueada, c.nomeF, c.confirmadoData
FROM encomendas_c_linhas el1
LEFT JOIN encomendas_c c ON c.lojaEncomenda=el1.lojaEncomenda
WHERE el1.artigoE IN ('342197')
AND el1.desmembrado = 1;
Make sure there's an index for each branch of the UNION:
ALTER TABLE encomendas_c_linhas
ADD INDEX des_artigoD (desmembrado, artigoD),
ADD INDEX des_artigoE (desmembrado, artigoE);
Each of these compound indexes can be used by the respective branch of the UNION, so the lookup on both columns is optimized in each case.
Also notice I put the values in quotes, like IN ('342197'), because the columns are VARCHAR, and you need to compare against a string to make use of the index. Comparing a VARCHAR column to an integer value will still match, but it will not use the index.
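A quick sketch of the difference (the result is the same either way; what changes is whether the index can be used):
-- Integer literal: rows still match, but MySQL has to cast artigoD on every row,
-- so the (desmembrado, artigoD) index cannot be used for the artigoD part:
SELECT COUNT(*) FROM encomendas_c_linhas WHERE desmembrado = 1 AND artigoD IN (342197);
-- String literal: same result, and the index can be used for both columns:
SELECT COUNT(*) FROM encomendas_c_linhas WHERE desmembrado = 1 AND artigoD IN ('342197');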
Here's an EXPLAIN I tested for the UNION query above; it shows the two new indexes are used, and ref: const,const, which means both columns of the index are used for the lookup.
+----+--------------+------------+------+-------------------------+---------------+---------+------------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+------+-------------------------+---------------+---------+------------------------+------+-----------------------+
| 1 | PRIMARY | el1 | ref | des_artigoD,des_artigoE | des_artigoD | 41 | const,const | 1 | Using index condition |
| 1 | PRIMARY | c | ref | lojaEncomenda | lojaEncomenda | 61 | test.el1.lojaEncomenda | 2 | NULL |
| 2 | UNION | el1 | ref | des_artigoD,des_artigoE | des_artigoE | 41 | const,const | 1 | Using index condition |
| 2 | UNION | c | ref | lojaEncomenda | lojaEncomenda | 61 | test.el1.lojaEncomenda | 2 | NULL |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | Using temporary |
+----+--------------+------------+------+-------------------------+---------------+---------+------------------------+------+-----------------------+
But we can do one step better. Sometimes adding other columns to the index is helpful, because if all columns needed by the query are included in the index, then it doesn't have to read the table rows at all. This is called a covering index and it's indicated if you see "Using index" in the EXPLAIN Extra field.
So here's the new index definition:
ALTER TABLE encomendas_c_linhas
ADD INDEX des_artigoD (desmembrado, artigoD, lojaEncomenda),
ADD INDEX des_artigoE (desmembrado, artigoE, lojaEncomenda);
The third column is not used for lookup, but it's used when joining to the other table c.
You can also get the same covering index effect by creating the second index suggested in the answer by @TheImpaler:
create index ix2 on encomendas_c (lojaEncomenda, bloqueada, nomeF, confirmadoData);
We see the EXPLAIN now shows the "Using index" note for all table references:
+----+--------------+------------+------+-------------------------+-------------+---------+------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+------+-------------------------+-------------+---------+------------------------+------+--------------------------+
| 1 | PRIMARY | el1 | ref | des_artigoD,des_artigoE | des_artigoD | 41 | const,const | 1 | Using where; Using index |
| 1 | PRIMARY | c | ref | lojaEncomenda,ix2 | ix2 | 61 | test.el1.lojaEncomenda | 1 | Using index |
| 2 | UNION | el1 | ref | des_artigoD,des_artigoE | des_artigoE | 41 | const,const | 1 | Using where; Using index |
| 2 | UNION | c | ref | lojaEncomenda,ix2 | ix2 | 61 | test.el1.lojaEncomenda | 1 | Using index |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | Using temporary |
+----+--------------+------------+------+-------------------------+-------------+---------+------------------------+------+--------------------------+

I would create the following indexes:
create index ix1 on encomendas_c_linhas (artigoE, artigoD, desmembrado);
create index ix2 on encomendas_c (lojaEncomenda, bloqueada, nomeF, confirmadoData);
The first one is the critical one. The second one will improve performance further.
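Whichever set of indexes you end up with, you can check what the optimizer actually picks by re-running EXPLAIN on the original query, for example:
EXPLAIN SELECT c.bloqueada, c.nomeF, c.confirmadoData
FROM encomendas_c_linhas el1
LEFT JOIN encomendas_c c ON c.lojaEncomenda = el1.lojaEncomenda
WHERE (el1.artigoE IN ('342197') OR el1.artigoD IN ('342197'))
AND el1.desmembrado = 1;
Look at which key is chosen and whether "Using index" appears in the Extra column, as discussed in the answer above.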

Related

Speed up MYSQL query that has order by

I got 300,000 rows in 'tblmessages', and I'm trying to run this query.
If I don't use "order by msgId desc" it runs very fast, but when I add the ORDER BY it's very, very slow.
What am I missing...?
my index
SELECT msgId
FROM tblmessages
left join wp_users as a on a.ID = (CASE WHEN (msgFromUserId=1) then tblmessages.msgToUserId else tblmessages.msgFromUserId END)
left join tblforum_users u1 on u1.user_ID =(CASE WHEN (msgFromUserId=1) then tblmessages.msgToUserId else tblmessages.msgFromUserId END)
where (msgFromUserId=1 or msgToUserId=1)
order by msgId desc limit 0,20
after Explain:
+----+-------------+-------------+------------+-------------+---------------------------------------+---------------------------------------+---------+------+--------+----------+-----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+-------------+---------------------------------------+---------------------------------------+---------+------+--------+----------+-----------------------------------------------------+
| 1 | SIMPLE | tblmessages | NULL | index_merge | IXtbl_messages_from,IX_tblmessages_to | IX_tblmessages_from,IX_tblmessages_to | 5,5 | NULL | 726454 | 100.00 | Using union(IX_tblmessages_from,IX_tblmessages_to); Using where; Using filesort |
| 1 | SIMPLE | a | NULL | eq_ref | PRIMARY | PRIMARY | 8 | func | 1 | 100.00 | Using where; Using index |
| 1 | SIMPLE | u1 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | func | 1 | 100.00 | Using where; Using index |
+----+-------------+-------------+------------+-------------+---------------------------------------+---------------------------------------+---------+------+--------+----------+-----------------------------------------------------+
and more information about tblmessages:
Data 12.1 GiB
Index 190.8 MiB
Total 12.3 GiB
There are 2 reasons for this:
If I don't use "order by msgId desc" it runs very fast, but when I add the ORDER BY it's very, very slow.
The 1st is that your query has a LIMIT 20 at the end of it. Ordering the results (via ORDER BY) requires all rows that satisfy the WHERE clause to be pulled, sorted, etc., before the correct 20 rows can be returned.
The 2nd is somewhat subjective, but I'll say your tables don't have the necessary indexing to support optimization of that query. Put another way, a query like that is generally going to run slowly because of the multiple joins on CASE expressions. My advice is to either simplify the query into multiple queries, or update your question with SHOW CREATE TABLE statements for the relevant tables along with one of those INFORMATION_SCHEMA queries that summarizes indexes, partitions, etc. From there we could potentially update the design (e.g., adding multi-column indexes).
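If it helps, here is a rough sketch of one such simplification, borrowing the OR-to-UNION rewrite shown in the first answer above. It assumes msgId is the InnoDB primary key of tblmessages, and it only covers the driving table (the LEFT JOINs are dropped because the query only selects msgId); treat it as an idea to adapt, not a drop-in replacement:
(SELECT msgId FROM tblmessages WHERE msgFromUserId = 1 ORDER BY msgId DESC LIMIT 20)
UNION
(SELECT msgId FROM tblmessages WHERE msgToUserId = 1 ORDER BY msgId DESC LIMIT 20)
ORDER BY msgId DESC
LIMIT 20;
If msgId really is the primary key, each branch can read its own index (IX_tblmessages_from or IX_tblmessages_to) in msgId order and stop after 20 rows, so the final sort handles at most 40 rows instead of 726k.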

Slow mysql query, join on huge table, not using indexes

SELECT archive.id, archive.file, archive.create_timestamp, archive.spent
FROM archive LEFT JOIN submissions ON archive.id = submissions.id
WHERE submissions.id is NULL
AND archive.file is not NULL
AND archive.create_timestamp < DATE_SUB(NOW(), INTERVAL 6 month)
AND spent = 0
ORDER BY archive.create_timestamp ASC LIMIT 10000
EXPLAIN result:
+----+-------------+--------------------+--------+--------------------------------+------------------+---------+--------------------------------------------+-----------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+--------+--------------------------------+------------------+---------+--------------------------------------------+-----------+--------------------------------------+
| 1 | SIMPLE | archive | range | create_timestamp,file_in | create_timestamp | 4 | NULL | 111288502 | Using where |
| 1 | SIMPLE | submissions | eq_ref | PRIMARY | PRIMARY | 4 | production.archive.id | 1 | Using where; Using index; Not exists |
+----+-------------+--------------------+--------+--------------------------------+------------------+---------+--------------------------------------------+-----------+--------------------------------------+
I've tried hinting use of indexes for archive table with:
USE INDEX (create_timestamp,file_in)
Archive table is huge, ~150mil records.
Any help with speeding up this query would be greatly appreciated.
You want to use a composite index. For this query:
create index archive_file_spent_createts on archive(file, spent, create_timestamp);
In such an index, you want the where conditions with = to come first, followed by up to one column with an inequality. In this case, I'm not sure if MySQL will use the index for the order by.
This assumes that the where conditions do, indeed, significantly reduce the size of the data.
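Following that rule, one variant worth trying (my own sketch, not part of the suggestion above) is to put the equality column first and the range/ORDER BY column second, so a single index can serve both the filter and the sort:
create index archive_spent_createts on archive(spent, create_timestamp);
With WHERE spent = 0 AND create_timestamp < ... ORDER BY create_timestamp ASC LIMIT 10000, MySQL should be able to walk that index in order, check file IS NOT NULL and the anti-join per row, and stop once the limit is reached.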

joining table in mysql not using index properly?

I have four tables that I am trying to join and output the result to a new table. My code looks like this:
create table tbl
select a.dte, a.permno, (ret - rf) f0_xs_ret, (xs_ret - (betav*xs_mkt)) f0_resid, mkt_cap last_year_mkt_cap, betav beta_value
from a inner join b using (dte)
inner join c on (year(a.dte) = c.yr and a.permno = c.permno)
inner join d on (a.permno = d.permno and year(a.dte)-1 = year(d.dte));
All of the tables have multiple indexes. For table a, (dte, permno) identifies a unique record; for table b, dte identifies a unique record; for table c, (yr, permno) identifies a unique record; and for table d, (dte, permno) identifies a unique record. The EXPLAIN for the SELECT part of the query is:
+----+-------------+-------+--------+-------------------+---------+---------+----------------------------------+--------+-------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------+---------+---------+----------------------------------+--------+-------------------+
| 1 | SIMPLE | d | ALL | idx1 | NULL | NULL | NULL | 264129 | |
| 1 | SIMPLE | c | ref | idx2 | idx2 | 4 | achernya.d.permno | 16 | |
| 1 | SIMPLE | b | ALL | PRIMARY,idx2 | NULL | NULL | NULL | 12336 | Using join buffer |
| 1 | SIMPLE | a | eq_ref | PRIMARY,idx1,idx2 | PRIMARY | 7 | achernya.b.dte,achernya.d.permno | 1 | Using where |
+----+-------------+-------+--------+-------------------+---------+---------+----------------------------------+--------+-------------------+
Why does MySQL have to read so many rows to process this? And if I am reading this correctly, it has to read (264129 * 16 * 12336) rows, which should take a good month.
Could someone please explain what's going on here?
MySQL has to read the rows because you're using functions as your join conditions. An index on dte will not help resolve YEAR(dte) in a query. If you want to make this fast, then put the year in its own column to use in joins and move the index to that column, even if that means some denormalization.
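A minimal sketch of that denormalization (the column and index names here are made up for illustration):
-- Store the year of dte in its own indexed column on table a
ALTER TABLE a ADD COLUMN dte_year SMALLINT;
UPDATE a SET dte_year = YEAR(dte);
CREATE INDEX idx_a_year_permno ON a (dte_year, permno);
-- Then join on the bare column: ... INNER JOIN c ON (a.dte_year = c.yr AND a.permno = c.permno)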
As for the other columns in your index that you don't apply functions to, they may not be used if the index won't provide much benefit, or they aren't the leftmost column in the index and you don't use the leftmost prefix of that index in your join condition.
Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.)
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

Why scan type is changed from ALL to RANGE when using LIMIT on SQL queries + Optimize query

I have this query
SELECT l.licitatii_id,
l.nume,
l.data_publicarii,
l.data_limita
FROM licitatii_ue l
INNER JOIN domenii_licitatii dl
ON l.licitatii_id = dl.licitatii_id
AND dl.tip_licitatie = '2'
INNER JOIN domenii d
ON dl.domenii_id = d.domenii_id
AND d.status = 1
AND d.tip_domeniu = '1'
WHERE l.status = 1
AND Unix_timestamp(TIMESTAMPADD(DAY, 1, CAST(From_unixtime(l.data_limita)
AS DATE)))
< '1300683793'
GROUP BY l.licitatii_id
ORDER BY data_publicarii DESC
Explain outputs:
+-----+--------------+--------+---------+-------------------------------------+----------+----------+---------------------------+-------+-----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
| 1 | SIMPLE | d | ALL | PRIMARY,key_status_tip_domeniu | NULL | NULL | NULL | 120 | 85.83 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | dl | ref | PRIMARY,tip_licitatie,licitatii_id | PRIMARY | 4 | web61db1.d.domenii_id | 6180 | 100.00 | Using where; Using index |
| 1 | SIMPLE | l | eq_ref | PRIMARY | PRIMARY | 4 | web61db1.dl.licitatii_id | 1 | 100.00 | Using where |
+-----+--------------+--------+---------+-------------------------------------+----------+----------+---------------------------+-------+-----------+----------------------------------------------+
As you can see, type=ALL for the d table.
Now if I add LIMIT 100 to the query, the plan changes to range:
+-----+--------------+--------+---------+-------------------------------------+-------------------------+----------+---------------------------+-------+-----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
| 1 | SIMPLE | d | range | PRIMARY,key_status_tip_domeniu | key_status_tip_domeniu | 9 | NULL | 103 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | dl | ref | PRIMARY,tip_licitatie,licitatii_id | PRIMARY | 4 | web61db1.d.domenii_id | 6180 | 100.00 | Using where; Using index |
| 1 | SIMPLE | l | eq_ref | PRIMARY | PRIMARY | 4 | web61db1.dl.licitatii_id | 1 | 100.00 | Using where |
+-----+--------------+--------+---------+-------------------------------------+-------------------------+----------+---------------------------+-------+-----------+----------------------------------------------+
Why does this happen?
Can this query be optimized further? Both queries take 13 seconds.
The table schema is visible on a GitHub gist.
MySQL chooses domenii as the leading table for the join.
This table is filtered on (status, tip_domeniu) = (1, 1).
It does not seem to be a very selective condition, so normally a full table scan with filtering would be preferred over the index scan.
We can see that MySQL expects 120 records to be returned from domenii for which this condition holds.
When you add a LIMIT, the number of records expected to be processed is decreased, and MySQL considers the index scan more efficient for this.
Note that this condition:
Unix_timestamp(TIMESTAMPADD(DAY, 1, CAST(From_unixtime(l.data_limita) AS DATE))) < '1300683793'
is not sargable, so you deprive the optimizer of the ability to use an index on data_limita.
Create the following indexes:
licitatii_ue (status, data_limita)
licitatii_ue (status, data_publicarii)
and rewrite the query like this:
SELECT l.licitatii_id,
l.nume,
l.data_publicarii,
l.data_limita
FROM licitatii_ue l
JOIN domenii_licitatii dl
ON l.licitatii_id = dl.licitatii_id
AND dl.tip_licitatie = '2'
JOIN domenii d
ON dl.domenii_id = d.domenii_id
AND d.status = 1
AND d.tip_domeniu = '1'
WHERE l.status = 1
AND l.data_limita < FROM_UNIXTIME(((1300683793 - 86400) div 86400) * 86400)
GROUP BY
l.licitatii_id
ORDER BY
data_publicarii DESC
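For reference, the two indexes suggested above could be created like this (the index names are just illustrative):
CREATE INDEX ix_licitatii_status_limita ON licitatii_ue (status, data_limita);
CREATE INDEX ix_licitatii_status_publicarii ON licitatii_ue (status, data_publicarii);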
Ah, the mysteries of the query optimizer are many and unknowable...
At a quick glance, the most obvious thing to optimize might be the
AND Unix_timestamp(TIMESTAMPADD(DAY, 1, CAST(From_unixtime(l.data_limita)
AS DATE)))
clause.
Depending on the number of records in the licitatii_ue table, this looks like an expensive operation, and it will bypass any available indexes.
ALL is a table scan; range is a range scan (due to the LIMIT). There's nothing inherently bad about that; in fact, it also causes a key to be used (key_status_tip_domeniu).
The reason you are slow is most likely the ORDER BY data_publicarii DESC (this is easy to test: just drop the ORDER BY and benchmark the query; I would expect a difference of a few orders of magnitude).
MySQL admits (in the Extra column of EXPLAIN) that it is using filesort (needed for the ORDER BY because it can't, or doesn't know how to, use an index for it). Adding yet another index to the mix might help, especially if you confirm that the ORDER BY is what makes it slow.
EDIT
Actually, you do have a cardinal sin in your query:
Unix_timestamp(TIMESTAMPADD(DAY, 1, CAST(From_unixtime(l.data_limita)
AS DATE)))
< '1300683793'
Avoid applying any functions to your field values if you can apply them to a constant. So switch it around and rewrite it as
l.data_limita < some_function('1300683793')
However complex some_function is, it will be calculated only once, and the MySQL planner will know it is a constant. The way you wrote it forces MySQL to apply UNIX_TIMESTAMP, TIMESTAMPADD, CAST and FROM_UNIXTIME to the value of data_limita in each row. On I/O-bound systems this usually just burns some extra CPU cycles while waiting for the disks to spin around (though it can become significant; your system might become CPU bound, and that is just a bad thing). The biggest difference is that you lose the possibility of using an index on data_limita.
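A minimal sketch of the principle (the cutoff arithmetic below is deliberately simplified and ignores the day-boundary truncation in the original condition; the first answer above shows one concrete rewrite for this case):
-- The subtraction is done once on the constant side, and the bare column
-- on the left means an index such as (status, data_limita) can be used:
SELECT l.licitatii_id, l.data_limita
FROM licitatii_ue l
WHERE l.status = 1
  AND l.data_limita < 1300683793 - 86400;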
Finally, all your indexes are single-field indexes, and MySQL does some index merging, but it is not stellar at it. You might want to try creating indexes that cover all your conditions and the sort order (in order of selectivity for the target query).

MySQL query optimization is driving me nuts! Almost same, but horribly different

I have the following two queries (*), which only differ in the field being restricted in the WHERE clause (name1 vs name2):
SELECT A.third_id, COUNT(DISTINCT B.fourth_id) AS num
FROM first A
JOIN second B ON A.third_id = B.third_id
WHERE A.name1 LIKE 'term%'
SELECT A.third_id, COUNT(DISTINCT B.fourth_id) AS num
FROM first A
JOIN second B ON A.third_id = B.third_id
WHERE A.name2 LIKE 'term%'
Both of the name fields have a single-column index on them. There is also an index on both third_id columns as well as fourth_id (which are all foreign keys into other tables, but it is not relevant here).
According to EXPLAIN, the first one behaves like this - which is what I want:
+----+-------------+-------+-------+---------------+----------+---------+---------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+----------+---------+---------------+------+-------------+
| 1 | SIMPLE | A | range | third_id,name | name | 767 | NULL | 3491 | Using where |
| 1 | SIMPLE | B | ref | third_id | third_id | 4 | db.A.third_id | 16 | |
+----+-------------+-------+-------+---------------+----------+---------+---------------+------+-------------+
The second one does this, which I definitely do not want:
+----+-------------+-------+------+----------------+----------+---------+---------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------+----------+---------+---------------+--------+-------------+
| 1 | SIMPLE | B | ALL | third_id | NULL | NULL | NULL | 507539 | |
| 1 | SIMPLE | A | ref | third_id,name2 | third_id | 4 | db.B.third_id | 1 | Using where |
+----+-------------+-------+------+----------------+----------+---------+---------------+--------+-------------+
What the heck is happening here? How do I make the second one behave properly (i.e. like the first one)?
(*) Actually, I don't. My real queries are a bit more complex; I have eliminated the extras for this post and distilled them to the minimal queries that still exhibit the problematic behaviour. Also, names were changed to protect the guilty.
Add CREATE TABLE statements to your post.
A real SELECT statement would be helpful too.
One possible reason is that name2 has a much higher percentage of values starting with 'term%'.
Try enforcing the order of the tables in the query by using STRAIGHT_JOIN:
SELECT A.third_id, COUNT(DISTINCT B.fourth_id) AS num
FROM first A
STRAIGHT_JOIN second B ON A.third_id = B.third_id
WHERE A.name2 LIKE 'term%'
How many records are in those tables? Check the cardinality/selectivity of the name2 column.
If the selectivity is low, try Naktibalda's STRAIGHT_JOIN suggestion or index hints: http://dev.mysql.com/doc/refman/5.0/en/index-hints.html
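For example, a sketch of an index hint using the index name that appears in the second EXPLAIN above (name2):
SELECT A.third_id, COUNT(DISTINCT B.fourth_id) AS num
FROM first A FORCE INDEX (name2)
JOIN second B ON A.third_id = B.third_id
WHERE A.name2 LIKE 'term%'
FORCE INDEX tells the optimizer to treat a table scan as very expensive, so it should fall back to the range scan on name2, like the first query does.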