I have two servers (both Linode 3072 VPS). The older one runs Ubuntu 11.04 with MySQL 5.5.32, and the newer one runs CentOS 6.2 with MySQL 5.5.36. The my.cnf files are identical. However, when I run the same query on the same database (a straight export/import), the two servers give different response times and different execution plans.
Older one with faster response.
1 SIMPLE ch ref PRIMARY,channel_name channel_name 122 const 1 Using where; Using temporary; Using filesort
1 SIMPLE t ref PRIMARY,channel_id channel_id 4 bcc.ch.channel_id 1554
1 SIMPLE p ref PRIMARY PRIMARY 4 bcc.t.entry_id 1 Using index
1 SIMPLE c eq_ref PRIMARY,group_id PRIMARY 4 bcc.p.cat_id 1 Using where
Newer one with slower response.
1 SIMPLE ch ref PRIMARY,channel_name channel_name 122 const 1 Using where; Using temporary; Using filesort
1 SIMPLE p index PRIMARY PRIMARY 8 NULL 25385 Using index; Using join buffer
1 SIMPLE t eq_ref PRIMARY,channel_id PRIMARY 4 bcc.p.entry_id 1 Using where
1 SIMPLE c eq_ref PRIMARY,group_id PRIMARY 4 bcc.p.cat_id 1 Using where
The big difference is in the second step. The first server uses the channel_id index on t and only has to scan 1554 rows, whereas the second server does a full index scan on p with a join buffer and has to scan 25385 rows. Any thoughts?
Queries like this one are adding a few seconds per page load on certain pages on the new server. I'm using Varnish to serve the front end, but I still want to fix this issue.
Here's the sql being run
select SQL_NO_CACHE cat_name,cat_url_title, count(p.entry_id) as count
from exp_categories as c
join exp_category_posts as p on c.cat_id = p.cat_id
join exp_channel_titles as t on t.entry_id = p.entry_id
join exp_channels as ch on ch.channel_id = t.channel_id
where channel_name IN ('resources')
AND group_id = 2
group by cat_name
order by count desc
limit 5
The query optimizer in MySQL picks the indexes to use based on the statistics it has on the indexes and tables. Sometimes its choice of index is not optimal, and the same query ends up with a different execution plan.
We have found on our database that at certain points in the day MySQL changes the execution path it uses for the same query.
You could try
analyze table exp_categories,exp_category_posts,exp_channel_titles,exp_channels ;
This sometimes improves the execution plan. Alternatively use index hints to determine which indexes are used.
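As a rough sketch of the index-hint option, the query from the question could be written as below. The index name channel_id is taken from the key column of the faster server's EXPLAIN output; whether forcing it actually restores the faster plan on the newer server is an assumption that needs checking with EXPLAIN.

```sql
-- Refresh the optimizer statistics first; this alone is sometimes enough.
ANALYZE TABLE exp_categories, exp_category_posts, exp_channel_titles, exp_channels;

-- If the plan is still poor, hint the index that the faster server chose for t.
SELECT SQL_NO_CACHE cat_name, cat_url_title, COUNT(p.entry_id) AS count
FROM exp_categories AS c
JOIN exp_category_posts AS p ON c.cat_id = p.cat_id
JOIN exp_channel_titles AS t FORCE INDEX (channel_id) ON t.entry_id = p.entry_id
JOIN exp_channels AS ch ON ch.channel_id = t.channel_id
WHERE channel_name IN ('resources')
  AND group_id = 2
GROUP BY cat_name
ORDER BY count DESC
LIMIT 5;
```

Run EXPLAIN on the hinted query on both servers and compare the rows column for t and p before keeping the hint.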
I am currently facing a rather slow query on a website, and it slows down the whole server under heavier traffic. How can I rewrite the query, or what index can I add, to avoid "Using temporary; Using filesort"? Without the ORDER BY everything is fast, but then I don't get the result order I want.
SELECT cams.name, models.gender, TIMESTAMPDIFF(YEAR, models.birthdate, CURRENT_DATE) AS age, lcs.viewers
FROM cams
LEFT JOIN cam_tags ON cams.id = cam_tags.cam_id
INNER JOIN tags ON cam_tags.tag_id = tags.id
LEFT JOIN model_cams ON cams.id = model_cams.cam_id
LEFT JOIN models ON model_cams.model_id = models.id
LEFT JOIN latest_cam_stats lcs ON cams.id = lcs.cam_id
WHERE tags.name = '?'
ORDER BY lcs.time_stamp_id DESC, lcs.viewers DESC
LIMIT 24 OFFSET 96;
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE tags NULL const PRIMARY,tags_name_uindex tags_name_uindex 766 const 1 100 Using temporary; Using filesort
1 SIMPLE cam_tags NULL ref PRIMARY,cam_tags_cams_id_fk,cam_tags_tags_id_fk cam_tags_tags_id_fk 4 const 75565047 100 Using where
1 SIMPLE cams NULL eq_ref PRIMARY PRIMARY 4 cam_tags.cam_id 1 100 NULL
1 SIMPLE model_cams NULL eq_ref model_platforms_platforms_id_fk model_platforms_platforms_id_fk 4 cam_tags.cam_id 1 100 NULL
1 SIMPLE models NULL eq_ref PRIMARY PRIMARY 4 model_cams.model_id 1 100 NULL
1 SIMPLE lcs NULL eq_ref PRIMARY,latest_cam_stats_cam_id_time_stamp_id_viewers_index PRIMARY 4 cam_tags.cam_id 1 100 NULL
There are many cases where it is effectively impossible to avoid "using temporary, using filesort".
"Filesort" does not necessarily involve a "file"; it is often done in RAM. Hence performance may not be noticeably hurt.
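One way to get a rough idea of whether the sort and the temporary table stay in RAM is to compare two session status counters before and after running the query. This is only a sketch: the counters are standard MySQL status variables, but how to interpret small changes on a busy session is a judgment call.

```sql
-- Note the current values, run the slow query, then run these again.
-- An increase in Sort_merge_passes suggests the sort outgrew the sort buffer.
SHOW SESSION STATUS LIKE 'Sort_merge_passes';

-- An increase here suggests the temporary table was written to disk.
SHOW SESSION STATUS LIKE 'Created_tmp_disk_tables';
```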
That said, I will assume your real question is "How can this query be sped up?".
Most of the tables are accessed via PRIMARY or "eq_ref" -- all good. But the second table involves touching an estimated 75M rows! Often that happens as the first table, not second. Hmmmm.
Sounds like cam_tags is a many-to-many mapping table? And it does not have any index starting with name? See this for proper indexes for such a table: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
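A minimal sketch of the index layout usually suggested for such a mapping table is shown below. The column names cam_id and tag_id come from the query in the question; the column types and the absence of other columns are assumptions, since the real table definition was not posted.

```sql
-- Hypothetical definition; the types are assumed, only the column names are from the question.
CREATE TABLE cam_tags (
    cam_id INT UNSIGNED NOT NULL,
    tag_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (cam_id, tag_id),  -- find the tags of one cam
    INDEX (tag_id, cam_id)         -- find the cams of one tag, covered by the index
) ENGINE = InnoDB;
```

With both composite indexes in place, a lookup in either direction can be answered from an index alone.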
Since the WHERE and ORDER BY reference more than one table, it is essentially impossible to avoid "using temporary, using filesort".
Worse than that, it needs to find all the ones with "name='?'", sort the list, skip 96 rows, and only finally deliver 24.
MOVED TO: MySQL Distinct performance
Basic Idea:
1) I have a Mysql Server with lots of data:
9 tables linked all with foreign keys in more or less linear way.
2) With a GUI I want to extract some results:
The GUI shows these 9 tables with only one variable for each table. Let's say:
Table 1: Frequency: 20,40,80,100
Table 2: Wavelength: 300,400,500,600
Table 3: ....
When I select Frequency -> 20 in Table 1, the database should check every other table for entries that were measured with frequency 20 and update all tables accordingly.
BUT: I only want to show distinct values in every table. And this DISTINCT takes 17 s, which is far too long for a GUI to wait.
Example Code:
SELECT wafer.ID
FROM product
JOIN chip ON chip.product_name=product.name
JOIN wafer ON wafer.ID = chip.wafer_ID
JOIN lot ON lot.ID = wafer.lot_ID
JOIN ROI ON ROI.ID_string = chip.ROI_ID
JOIN result ON result.chip_ID = chip.ID_string
JOIN setup ON setup.ID_md5 = result.setup_ID
JOIN dataset ON dataset.ID_md5 = result.dataset_ID
WHERE product.name IN ("GoodProduct")
Duration: 0.34 s
fetch: 17 s (1.5e6 rows)
Explain:
id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1 SIMPLE product const PRIMARY,name_UNIQUE PRIMARY 137 const 1 100.00 Using index
1 SIMPLE dataset index PRIMARY,ID_UNIQUE ID_UNIQUE 137 501 100.00 Using index
1 SIMPLE result ref dataset-result_idx,chip_ID_idx,setupID dataset-result_idx 137 databaseName.dataset.ID_md5 159 100.00
1 SIMPLE setup eq_ref PRIMARY PRIMARY 137 databaseName.result.setup_ID 1 100.00 Using index
1 SIMPLE chip eq_ref PRIMARY,ID_UNIQUE,Chip_UNIQUE,product_name_idx,ROI_ID PRIMARY 452 databaseName.result.chip_ID 1 49.99 Using where
1 SIMPLE ROI eq_ref PRIMARY,ID_UNIQUE PRIMARY 302 databaseName.chip.ROI_ID 1 100.00 Using index
1 SIMPLE wafer eq_ref PRIMARY,waferID_UNIQUE,number PRIMARY 62 databaseName.chip.wafer_ID 1 100.00
1 SIMPLE lot eq_ref PRIMARY,lotnumber_UNIQUE PRIMARY 62 databaseName.wafer.lot_ID 1 100.00 Using index
SELECT distinct wafer.ID {...same code as before}
Duration: 23 s
fetch: 0.000 s (54 rows)
Explain:
id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1 SIMPLE product const PRIMARY,name_UNIQUE PRIMARY 137 const 1 100.00 Using index; Using temporary
1 SIMPLE dataset index PRIMARY,ID_UNIQUE ID_UNIQUE 137 501 100.00 Using index
1 SIMPLE result ref dataset-result_idx,chip_ID_idx,setupID dataset-result_idx 137 databaseName.dataset.ID_md5 159 100.00
1 SIMPLE setup eq_ref PRIMARY PRIMARY 137 databaseName.result.setup_ID 1 100.00 Using index
1 SIMPLE chip eq_ref PRIMARY,ID_UNIQUE,Chip_UNIQUE,product_name_idx,ROI_ID PRIMARY 452 databaseName.result.chip_ID 1 49.99 Using where
1 SIMPLE ROI eq_ref PRIMARY,ID_UNIQUE PRIMARY 302 databaseName.chip.ROI_ID 1 100.00 Using index
1 SIMPLE wafer eq_ref PRIMARY,waferID_UNIQUE,number PRIMARY 62 databaseName.chip.wafer_ID 1 100.00
1 SIMPLE lot eq_ref PRIMARY,lotnumber_UNIQUE PRIMARY 62 databaseName.wafer.lot_ID 1 100.00 Using index; Distinct
I really wonder why this distinct takes so long. All rows here have indices.
This example only shows the code for one table. But I need 9 updating tables.
Is there any way to speed up this process or this "select distinct" query?
Btw: I'm not really capable of understanding the explain. If there is a big hint I wouldn't see it...
When asking a query performance question, you should show the tables structures and indexes so that it would be easier to help.
You are joining the 8 tables together and the sole limitation you have is that the product name has to be "GoodProduct". The product-table is joined against chip with the product_name so you should check if you have indexes on those name/product_name-columns. Depending on the number of rows in ROI and result, you might need a composite index on those.
Your query formatting is a bit complex and hard to read. You can simplify things by using this format:
SELECT wafer.ID
FROM product
JOIN chip ON chip.product_name=product.name
JOIN wafer ON wafer.ID = chip.wafer_ID
JOIN lot ON lot.ID = wafer.lot_ID
JOIN ROI ON ROI.ID_string = chip.ROI_ID
JOIN result ON result.chip_ID = chip.ID_string
JOIN setup ON setup.ID_md5 = result.setup_ID
JOIN dataset ON dataset.ID_md5 = result.dataset_ID
WHERE product.name IN ("GoodProduct")
Note that tables lot, ROI, result, setup and dataset are in the query only for the reason that there needs to be a row on each table that matches the "GoodProduct". If this is not a requirement, you could do the query with just product, chip and wafer-tables and the performance would be considerably better.
Most of those tables do not provide anything to the query. Remove lot, dataset and perhaps some more. OTOH, one thing they may be providing is an existence check, for example, whether there is a "lot" for the item. If you do not need that, won't this give you the desired answer?
SELECT DISTINCT wafer.ID
FROM product
JOIN chip ON chip.product_name = product.name
JOIN wafer ON wafer.ID = chip.wafer_ID
WHERE product.name IN ("GoodProduct")
These indexes might help, if you don't already have them:
product: (name)
result: (dataset_ID, setup_ID, chip_ID)
dataset: (ID_md5)
setup: (ID_md5)
chip: (ID_string, ROI_ID, wafer_ID, product_name)
ROI: (ID_string)
wafer: (lot_ID, ID)
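As a sketch, the composite entries from the list above could be created as follows. The index names are made up for illustration, and the EXPLAIN output suggests the single-column ones (product.name, dataset.ID_md5, setup.ID_md5, ROI.ID_string) are already covered by existing primary or unique keys. The key_len values in the EXPLAIN output also suggest wide string columns, so the chip index may run into key-length limits, depending on row format and character set.

```sql
-- Hypothetical index names; check SHOW CREATE TABLE first to avoid duplicates.
CREATE INDEX idx_result_dataset_setup_chip ON result (dataset_ID, setup_ID, chip_ID);
CREATE INDEX idx_chip_lookup ON chip (ID_string, ROI_ID, wafer_ID, product_name);
CREATE INDEX idx_wafer_lot_id ON wafer (lot_ID, ID);
```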
If I export our production database to a .sql file and then import it on my local machine, the same exact queries that run very fast (90ms) on the production database run insanely slow (100 seconds or more) locally.
production mysql version: 5.7.18
local mysql version: 5.7.21
I checked and it looks like all the indexes are there and the same in both databases. The only difference is the Cardinality values are different on the indexes. Would that make a difference?
Here's an example of a query that is slow:
SELECT DISTINCT v0.`id`
FROM `visibility__slugs` AS v0
INNER JOIN `visibility__business_type_slug` AS v5 ON v5.`slug_id` = v0.`id`
INNER JOIN `visibility__business_types` AS v1 ON (v5.`business_type_id` = v1.`id`) AND (v1.`type_name` != 'Other')
INNER JOIN `visibility__business_type_page` AS v6 ON v6.`business_type_id` = v1.`id`
INNER JOIN `visibility__pages` AS v2 ON v6.`page_id` = v2.`id`
INNER JOIN `visibility__page_postal_code` AS v7 ON v7.`page_id` = v2.`id`
INNER JOIN `visibility__postal_codes` AS v3 ON v7.`postal_code` = v3.`postal_code`
WHERE (v2.`is_live`)
Here's EXPLAIN
1 SIMPLE v0 NULL index PRIMARY,visibility__slugs_slug_unique visibility__slugs_slug_unique 767 NULL 1 100.00 Using index; Using temporary
1 SIMPLE v3 NULL index visibility__postal_codes_postal_code_unique,visibility__postal_codes_postal_code_index visibility__postal_codes_postal_code_index 767 NULL 1 100.00 Using index; Distinct; Using join buffer (Block Nested Loop)
1 SIMPLE v5 NULL ref visibility__business_type_slug_slug_id_business_type_id_unique,visibility__business_type_slug_slug_id_index,visibility__business_type_slug_business_type_id_index visibility__business_type_slug_slug_id_index 4 zipbooks_development.v0.id 1 100.00 Using index; Distinct
1 SIMPLE v1 NULL eq_ref PRIMARY PRIMARY 4 zipbooks_development.v5.business_type_id 1 90.00 Using where; Distinct
1 SIMPLE v6 NULL ref visibility__business_type_page_page_id_business_type_id_unique,visibility__business_type_page_page_id_index,visibility__business_type_page_business_type_id_index visibility__business_type_page_business_type_id_index 4 zipbooks_development.v5.business_type_id 15 100.00 Using index; Distinct
1 SIMPLE v2 NULL eq_ref PRIMARY PRIMARY 4 zipbooks_development.v6.page_id 1 90.00 Using where; Distinct
1 SIMPLE v7 NULL eq_ref visibility__page_postal_code_page_id_postal_code_unique,visibility__page_postal_code_page_id_index,visibility__page_postal_code_postal_code_index visibility__page_postal_code_page_id_postal_code_unique 771 zipbooks_development.v6.page_id,zipbooks_development.v3.postal_code 1 100.00 Using index; Distinct
Here's the slow query log:
Time Id Command Argument
# Time: 2018-02-14T18:12:40.968206Z
# User@Host: root[root] @ localhost [127.0.0.1] Id: 2
# Query_time: 191.781505 Lock_time: 0.000142 Rows_sent: 0 Rows_examined: 183768270
SET timestamp=1518631960;
SELECT DISTINCT v0.`id`
FROM `visibility__slugs` AS v0
INNER JOIN `visibility__business_type_slug` AS v5 ON v5.`slug_id` = v0.`id`
INNER JOIN `visibility__business_types` AS v1 ON (v5.`business_type_id` = v1.`id`) AND (v1.`type_name` != 'Other')
INNER JOIN `visibility__business_type_page` AS v6 ON v6.`business_type_id` = v1.`id`
INNER JOIN `visibility__pages` AS v2 ON v6.`page_id` = v2.`id`
INNER JOIN `visibility__page_postal_code` AS v7 ON v7.`page_id` = v2.`id`
INNER JOIN `visibility__postal_codes` AS v3 ON v7.`postal_code` = v3.`postal_code`
WHERE (v2.`is_live`);
It's also interesting to note that sometimes it works. Sometimes I import the backup and the queries are as fast as on production. I haven't seen a pattern in when the queries are fast and when they are very slow after import, but that's noteworthy. Could it be related to our use of --single-transaction --quick to dump production?
Can anyone think of some things I can try? Configuration values? What might be wrong here?
My indexes were corrupted. I found the offending table and ran
ALTER TABLE visibility__postal_codes ENGINE = InnoDB;
to rebuild the indexes, and now it's fixed.
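Since the question mentions that only the Cardinality values differed, a lighter-weight step worth trying before a full rebuild is to inspect and refresh the index statistics. This is a sketch only; whether it would have been enough in this particular case is not known.

```sql
-- Inspect the Cardinality column for each index on the suspect table.
SHOW INDEX FROM visibility__postal_codes;

-- Recalculate the index statistics without rebuilding the table.
ANALYZE TABLE visibility__postal_codes;
```

After that, rerun EXPLAIN on the slow query and check whether the join order and the rows estimates have changed.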
Please consider the following query
SELECT * FROM PC_SMS_OUTBOUND_MESSAGE AS OM
JOIN MM_TEXTOUT_SERVICE AS TOS ON TOS.TEXTOUT_SERVICE_ID = OM.SERVICE_ID
JOIN PC_SERVICE_NUMBER AS SN ON OM.TO_SERVICE_NUMBER_ID = SN.SERVICE_NUMBER_ID
JOIN PC_SUBSCRIBER AS SUB ON SUB.SERVICE_NUMBER_ID = SN.SERVICE_NUMBER_ID
JOIN MM_CONTACT CON ON CON.SUBSCRIBER_ID = SUB.SUBSCRIBER_ID
-- AND CON.MM_CLIENT_ID = 1
AND OM.CLIENT_ID= 1
AND OM.CREATED>='2013-05-08 11:47:53' AND OM.CREATED<='2014-05-08 11:47:53'
ORDER BY OM.SMS_OUTBOUND_MESSAGE_ID DESC LIMIT 50
To get the dataset I require I need to filter on the (commented out) CONTACTS client_id as well as the OUTBOUND_MESSAGES client_id but this is what changes the performance from milliseconds to tens of minutes.
Execution plan without "AND CON.MM_CLIENT_ID = 1":
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE OM index FK4E518EAA19F2EA2B,SERVICEID_IDX,CREATED_IDX,CLIENTID_IDX,CL_CR_ST_IDX,CL_CR_STYPE_ST_IDX,SID_TOSN_CL_CREATED_IDX PRIMARY 8 NULL 6741 3732.00 Using where
1 SIMPLE SUB ref PRIMARY,FKA1845E3459A7AEF FKA1845E3459A7AEF 9 mmlive.OM.TO_SERVICE_NUMBER_ID 1 100.00 Using where
1 SIMPLE SN eq_ref PRIMARY PRIMARY 8 mmlive.OM.TO_SERVICE_NUMBER_ID 1 100.00 Using where
1 SIMPLE CON ref FK2BEC061CA525D30,SUB_CL_IDX FK2BEC061CA525D30 8 mmlive.SUB.SUBSCRIBER_ID 1 100.00
1 SIMPLE TOS eq_ref PRIMARY,FKDB3DF298AB3EF4E2 PRIMARY 8 mmlive.OM.SERVICE_ID 1 100.00
Execution plan with "AND CON.MM_CLIENT_ID = 1":
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE CON ref FK2BEC061CA525D30,FK2BEC06134399E2A,SUB_CL_IDX FK2BEC06134399E2A 8 const 18306 100.00 Using temporary; Using filesort
1 SIMPLE SUB eq_ref PRIMARY,FKA1845E3459A7AEF PRIMARY 8 mmlive.CON.SUBSCRIBER_ID 1 100.00
1 SIMPLE OM ref FK4E518EAA19F2EA2B,SERVICEID_IDX,CREATED_IDX,CLIENTID_IDX,CL_CR_ST_IDX,CL_CR_STYPE_ST_IDX,SID_TOSN_CL_CREATED_IDX FK4E518EAA19F2EA2B 9 mmlive.SUB.SERVICE_NUMBER_ID 3 100.00 Using where
1 SIMPLE SN eq_ref PRIMARY PRIMARY 8 mmlive.SUB.SERVICE_NUMBER_ID 1 100.00 Using where
1 SIMPLE TOS eq_ref PRIMARY,FKDB3DF298AB3EF4E2 PRIMARY 8 mmlive.OM.SERVICE_ID 1 100.00
Any suggestions on how to format the above to make it a little easier on the eye would be good.
ID fields are primary keys.
There are indexes on all joining columns.
You may be able to fix this problem by using a subquery:
JOIN (SELECT C.* FROM CONTACTS C WHERE C.USER_ID = 1) C ON C.SUBSCRIBER_ID = SUB.ID
This will materialize the matching rows, which could have downstream effects on the query plan.
If this doesn't work, then edit your question and add:
The explain plans for both queries.
The indexes available on the table.
EDIT:
Can you try creating a composite index:
PC_SMS_OUTBOUND_MESSAGE(CLIENT_ID, CREATED, SERVICE_ID, TO_SERVICE_NUMBER_ID, SMS_OUTBOUND_MESSAGE_ID);
This may change both query plans to start on the OM table with the appropriate filtering, hopefully making the results stable and good.
I've solved the riddle! For my case anyway so I'll share.
This all came down to the join order changing once I added that extra clause, which you can clearly see in the execution plan. When the query is fast, the Outbound Messages are at the top of the plan but when slow (after adding the clause), the Contacts table is at the top.
I think this means that the Outbound Messages index can no longer be utilised for the sorting, which causes the dreaded "Using temporary; Using filesort".
By simply adding STRAIGHT_JOIN keyword directly after the select I could force the execution plan to join in the order denoted directly by the query.
Happy for anyone with a more intimate knowledge of this field to contradict any of the above in terms of what is actually happening but it definitely worked.
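For reference, a sketch of the query from the question with the keyword in place and the extra clause enabled. With STRAIGHT_JOIN the tables are joined in the order they are written, so PC_SMS_OUTBOUND_MESSAGE stays first.

```sql
SELECT STRAIGHT_JOIN *
FROM PC_SMS_OUTBOUND_MESSAGE AS OM
JOIN MM_TEXTOUT_SERVICE AS TOS ON TOS.TEXTOUT_SERVICE_ID = OM.SERVICE_ID
JOIN PC_SERVICE_NUMBER AS SN ON OM.TO_SERVICE_NUMBER_ID = SN.SERVICE_NUMBER_ID
JOIN PC_SUBSCRIBER AS SUB ON SUB.SERVICE_NUMBER_ID = SN.SERVICE_NUMBER_ID
JOIN MM_CONTACT CON ON CON.SUBSCRIBER_ID = SUB.SUBSCRIBER_ID
  AND CON.MM_CLIENT_ID = 1
  AND OM.CLIENT_ID = 1
  AND OM.CREATED >= '2013-05-08 11:47:53' AND OM.CREATED <= '2014-05-08 11:47:53'
ORDER BY OM.SMS_OUTBOUND_MESSAGE_ID DESC LIMIT 50;
```

The trade-off is that the join order is now fixed by hand, so it is worth re-checking the plan with EXPLAIN if the data distribution changes.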
For the last couple of days I have been working on normalizing our 600GB database.
I have broken out all redundant data into 4 separate tables plus a main entry table.
All is well so far, except for the last step: joining the new tables with the old data records and inserting the new normalized records into the database. For this I'm using "INSERT INTO ... SELECT". But now to the problem. If I run this query on the first 100 ids it takes 10 seconds, but if I run it on the first 300 rows it takes several minutes. What can I do to fix this?
SELECT * FROM oldDB.`unNormalized_tabel`
INNER JOIN `new_normalized_db`.`keyword` k ON `unNormalized_tabel`.`keyword` = k.`keyword`
INNER JOIN `new_normalized_db`.`project` p ON `unNormalized_tabel`.`awrProject` = p.`project`
INNER JOIN `new_normalized_db`.`searchEngine` s ON `unNormalized_tabel`.`searchEngine` = s.`searchEngine`
INNER JOIN `new_normalized_db`.`urlHash` u ON MD5(`unNormalized_tabel`.`url`) = u.`hash`
WHERE oldDB.`unNormalized_tabel`.`id` < 100
GROUP BY k.`id`, p.`id`, s.`id`,u.`id`
As of right now the old entries only have a primary key index. Should I add a full-text index to all the old data columns? I'm thinking this could take months on a 600 GB un-normalized database. And what about space: how much extra space would 4 new indexes take up?
id select_type table type possible_keys key key_len ref rows Extra
------ ----------- ----------------- ------ --------------------------------------------------------------- ------------ ------- -------------------------------- ------ ----------------------------------------------
1 SIMPLE p index (NULL) projectName 42 (NULL) 427 Using index; Using temporary; Using filesort
1 SIMPLE unormalized_tabel range PRIMARY,keyword_url_insDate,keyword,searchEngine,url,awrProject PRIMARY 4 (NULL) 358 Using where; Using join buffer
1 SIMPLE u ref url url 767 oldDB.unormalized_tabel.url 1
1 SIMPLE k index (NULL) keyword 42 (NULL) 107340 Using where; Using index; Using join buffer
1 SIMPLE s index (NULL) searchEngine 42 (NULL) 1155 Using where; Using index; Using join buffer
You can speed up your query by adding indexes.
The first join, on k, can be faster if there is an index on k.keyword and also an index on unNormalized_tabel.keyword. The same goes for project p (indexes on unNormalized_tabel.awrProject and p.project) and for s (indexes on s.searchEngine and unNormalized_tabel.searchEngine).
But the last join will be slow anyway, because a hash must be calculated at query time, which is very slow on large amounts of data. What you can do is hash the url before inserting into unNormalized_tabel, and then add an index on that hash field.
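A minimal sketch of that last suggestion follows. The column name url_hash and the index name idx_url_hash are made up for illustration, and on a table of this size both the ALTER and the backfill will take a long time, so the backfill is shown for one id range only.

```sql
-- Hypothetical column and index names; MD5() returns a 32-character hex string.
ALTER TABLE oldDB.unNormalized_tabel ADD COLUMN url_hash CHAR(32) NULL;

-- Backfill in id ranges rather than in one huge statement.
UPDATE oldDB.unNormalized_tabel
SET url_hash = MD5(url)
WHERE id BETWEEN 1 AND 100000;

ALTER TABLE oldDB.unNormalized_tabel ADD INDEX idx_url_hash (url_hash);

-- The join can then compare stored values instead of hashing at query time.
SELECT t.id, u.id
FROM oldDB.unNormalized_tabel AS t
INNER JOIN new_normalized_db.urlHash AS u ON t.url_hash = u.hash
WHERE t.id < 100;
```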