I have the following query that sometimes returns an empty set on the master but NEVER on the read replica, even though matching data exists in both databases. It happens at random, and I am wondering if there is a MySQL setting behind it, or something with the query cache. Running MySQL 5.6.40-log on RDS.
I have tried setting optimizer_switch="index_merge_intersection=off", but it didn't work.
UPDATE: optimizer_switch="index_merge_intersection=off" seems to have worked after all (I also cleared the query cache after making the change, and the problem seems to have resolved itself).
One really odd thing: the query worked via the mysql command line 100% of the time, but the web application didn't work until I cleared the query cache, even though it connects as the same user.
Once I run OPTIMIZE TABLE phppos_items it fixes things for a little while (about 3 minutes), and then it goes back to being erratic (mostly empty sets). These are all InnoDB tables.
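For reference, this is roughly what that change plus the cache clear look like (a sketch: on RDS the global switch is normally applied through the DB parameter group rather than SET GLOBAL, and RESET QUERY CACHE requires the RELOAD privilege):
SET SESSION optimizer_switch = 'index_merge_intersection=off';
-- Discard all cached result sets so nothing produced under the old plan survives:
RESET QUERY CACHE;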
settings:
https://gist.github.com/blasto333/82b18ef979438b93e4c39624bbf489d7
It seems to return an empty set more often during the busy time of day. The server is an RDS m4.large with 500 databases of 100 tables each.
Query:
SELECT SUM( phppos_sales_items.damaged_qty ) AS damaged_qty,
SUM( phppos_sales_items.subtotal ) AS subtotal,
SUM( phppos_sales_items.total ) AS total,
SUM( phppos_sales_items.tax ) AS tax,
SUM( phppos_sales_items.profit ) AS profit
FROM `phppos_sales`
JOIN `phppos_sales_items` ON `phppos_sales_items`.`sale_id` = `phppos_sales`.`sale_id`
JOIN `phppos_items` ON `phppos_sales_items`.`item_id` = `phppos_items`.`item_id`
WHERE `phppos_sales`.`deleted` =0
AND `sale_time` BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 23:59:59'
AND `phppos_sales`.`location_id` IN ( 1 )
AND `phppos_sales`.`store_account_payment` =0
AND `suspended` <2
AND `phppos_items`.`deleted` =0
AND `phppos_items`.`supplier_id` = '485'
GROUP BY `phppos_sales_items`.`sale_id`
Explain:
+----+-------------+--------------------+-------------+-----------------------------------------------------------------------------------------------+-----------------------------+---------+-------------------------------------------------------+------+---------------------------------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+-------------+-----------------------------------------------------------------------------------------------+-----------------------------+---------+-------------------------------------------------------+------+---------------------------------------------------------------------------------------------------------+
| 1 | SIMPLE | phppos_items | index_merge | PRIMARY,phppos_items_ibfk_1,deleted,deleted_system_item | phppos_items_ibfk_1,deleted | 5,4 | NULL | 44 | Using intersect(phppos_items_ibfk_1,deleted); Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | phppos_sales_items | ref | PRIMARY,item_id,phppos_sales_items_ibfk_3,phppos_sales_items_ibfk_4,phppos_sales_items_ibfk_5 | item_id | 4 | phppoint_customer.phppos_items.item_id | 16 | NULL |
| 1 | SIMPLE | phppos_sales | eq_ref | PRIMARY,deleted,location_id,sales_search,phppos_sales_ibfk_10 | PRIMARY | 4 | phppoint_customer.phppos_sales_items.sale_id | 1 | Using where |
+----+-------------+--------------------+-------------+-----------------------------------------------------------------------------------------------+-----------------------------+---------+-------------------------------------------------------+------+---------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)
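For what it's worth, one way to steer just this query away from the intersect plan, without touching server settings, is an index hint on the table that EXPLAIN shows using index_merge. A sketch (the index name comes from the plan above):
SELECT SUM(phppos_sales_items.damaged_qty) AS damaged_qty,
       SUM(phppos_sales_items.subtotal) AS subtotal,
       SUM(phppos_sales_items.total) AS total,
       SUM(phppos_sales_items.tax) AS tax,
       SUM(phppos_sales_items.profit) AS profit
FROM `phppos_sales`
JOIN `phppos_sales_items` ON `phppos_sales_items`.`sale_id` = `phppos_sales`.`sale_id`
-- IGNORE INDEX takes `deleted` out of consideration, so the optimizer cannot
-- build the intersect(phppos_items_ibfk_1, deleted) plan seen in the EXPLAIN:
JOIN `phppos_items` IGNORE INDEX (`deleted`) ON `phppos_sales_items`.`item_id` = `phppos_items`.`item_id`
WHERE `phppos_sales`.`deleted` = 0
  AND `sale_time` BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 23:59:59'
  AND `phppos_sales`.`location_id` IN (1)
  AND `phppos_sales`.`store_account_payment` = 0
  AND `suspended` < 2
  AND `phppos_items`.`deleted` = 0
  AND `phppos_items`.`supplier_id` = '485'
GROUP BY `phppos_sales_items`.`sale_id`;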
Related
We support both MSSQL and MySQL for Entity Framework 6 in an MVC 5 application. The problem I am having is that, when using the MySQL connector and LINQ, queries that have an INNER JOIN and an ORDER BY get pulled into a sub-select with the ORDER BY applied on the outside. This causes a substantial performance impact, and it does not happen when using the MSSQL connector. Here is an example:
SELECT
`Project3`.*
FROM
(SELECT
`Extent1`.*,
`Extent2`.`Name_First`
FROM
`ResultRecord` AS `Extent1`
LEFT OUTER JOIN `ResultInputEntity` AS `Extent2` ON `Extent1`.`Id` = `Extent2`.`Id`
WHERE
`Extent1`.`DateCreated` <= '4/4/2016 6:29:59 PM'
AND `Extent1`.`DateCreated` >= '12/31/2015 6:30:00 PM'
AND 0000 = `Extent1`.`CustomerId`
AND (`Extent1`.`InUseById` IS NULL OR 0000 = `Extent1`.`InUseById` OR `Extent1`.`LockExpiration` < '4/4/2016 6:29:59 PM')
AND `Extent1`.`DivisionId` IN (0000)
AND `Extent1`.`IsDeleted` != 1
AND EXISTS( SELECT
1 AS `C1`
FROM
`ResultInputEntityIdentification` AS `Extent3`
WHERE
`Extent1`.`Id` = `Extent3`.`InputEntity_Id`
AND 0 = `Extent3`.`Type`
AND '0000' = `Extent3`.`Number`
AND NOT (`Extent3`.`Number` IS NULL)
OR LENGTH(`Extent3`.`Number`) = 0)
AND EXISTS( SELECT
1 AS `C1`
FROM
`ResultRecordAssignment` AS `Extent4`
WHERE
1 = `Extent4`.`AssignmentType`
AND `Extent4`.`AssignmentId` = 0000
OR 2 = `Extent4`.`AssignmentType`
AND `Extent4`.`AssignmentId` = 0000
AND `Extent4`.`ResultRecordId` = `Extent1`.`Id`)) AS `Project3`
ORDER BY `Project3`.`DateCreated` ASC , `Project3`.`Name_First` ASC , `Project3`.`Id` ASC
LIMIT 0 , 25
This query simply times out when run against a few million rows. This is the explain for the above query:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |
| 1 | PRIMARY | Extent1 | ref | IX_ResultRecord_CustomerId,IX_ResultRecord_DateCreated,IX_ResultRecord_IsDeleted,IX_ResultRecord_InUseById,IX_ResultRecord_LockExpiration,IX_ResultRecord_DivisionId | IX_ResultRecord_CustomerId | 4 | const | 1 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | Extent2 | ref | PRIMARY | PRIMARY | 8 | Extent1.Id | 1 | |
| 4 | DEPENDENT SUBQUERY | Extent4 | ref | IX_RA_AT,IX_RA_A_ID,IX_RA_RR_ID | IX_RA_A_ID | 5 | const | 1 | Using where |
| 3 | DEPENDENT SUBQUERY | Extent3 | ALL | IX_InputEntity_Id,IX_InputEntityIdentification_Type,IX_InputEntityIdentification_Number | | | | 14341877 | Using where |
Now, written the way it would be generated for MSSQL (or with the sub-select simply removed and the ORDER BY applied directly), the improvement is dramatic!
SELECT
`Extent1`.*,
`Extent2`.`Name_First`
FROM
`ResultRecord` AS `Extent1`
LEFT OUTER JOIN `ResultInputEntity` AS `Extent2` ON `Extent1`.`Id` = `Extent2`.`Id`
WHERE
`Extent1`.`DateCreated` <= '4/4/2016 6:29:59 PM'
AND `Extent1`.`DateCreated` >= '12/31/2015 6:30:00 PM'
AND 0000 = `Extent1`.`CustomerId`
AND (`Extent1`.`InUseById` IS NULL
OR 0000 = `Extent1`.`InUseById`
OR `Extent1`.`LockExpiration` < '4/4/2016 6:29:59 PM')
AND `Extent1`.`DivisionId` IN (0000)
AND `Extent1`.`IsDeleted` != 1
AND EXISTS( SELECT
1 AS `C1`
FROM
`ResultInputEntityIdentification` AS `Extent3`
WHERE
`Extent1`.`Id` = `Extent3`.`InputEntity_Id`
AND 9 = `Extent3`.`Type`
AND '0000' = `Extent3`.`Number`
AND NOT (`Extent3`.`Number` IS NULL)
OR LENGTH(`Extent3`.`Number`) = 0)
AND EXISTS( SELECT
1 AS `C1`
FROM
`ResultRecordAssignment` AS `Extent4`
WHERE
1 = `Extent4`.`AssignmentType`
AND `Extent4`.`AssignmentId` = 0000
OR 2 = `Extent4`.`AssignmentType`
AND `Extent4`.`AssignmentId` = 0000
AND `Extent4`.`ResultRecordId` = `Extent1`.`Id`)
ORDER BY `Extent1`.`DateCreated` ASC , `Extent2`.`Name_First` ASC , `Extent1`.`Id` ASC
LIMIT 0 , 25
This query now runs in 0.10 seconds! And the explain plan is now this:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | | | | 1 | Using temporary; Using filesort |
| 1 | PRIMARY | Extent1 | ref | PRIMARY,IX_ResultRecord_CustomerId,IX_ResultRecord_DateCreated,IX_ResultRecord_IsDeleted,IX_ResultRecord_InUseById,IX_ResultRecord_LockExpiration,IX_ResultRecord_DivisionId | PRIMARY | 8 | Extent3.InputEntity_Id | 1 | Using where |
| 1 | PRIMARY | Extent4 | ref | IX_RA_AT,IX_RA_A_ID,IX_RA_RR_ID | IX_RA_RR_ID | 8 | Extent3.InputEntity_Id | 1 | Using where; Start temporary; End temporary |
| 1 | PRIMARY | Extent2 | ref | PRIMARY | PRIMARY | 8 | Extent3.InputEntity_Id | 1 | |
| 2 | MATERIALIZED | Extent3 | ref | IX_InputEntity_Id,IX_InputEntityIdentification_Type,IX_InputEntityIdentification_Number | IX_InputEntityIdentification_Type | 4 | const | 1 | Using where |
Now, I have had this issue many times across the system, and it is clear that the MySQL EF 6 connector always decides to wrap a query in a sub-select to apply the ORDER BY, but only when there is a join in the query. This is causing major performance issues. Some answers I have seen suggest modifying the connector source code, but that can be tedious. Has anyone had this same issue, found a workaround, already modified the connector, or have any other suggestions, besides simply moving to SQL Server and leaving MySQL behind? That is not an option.
Did you look at the SQL generated for SQL Server? Is the query structure different, or only the performance?
I ask because usually it is not the provider that decides the structure of the query (i.e. ordering a subquery); the provider just translates the structure of the query into the syntax of the DBMS. So in your case the problem could be the DBMS optimizer.
In issues similar to yours I have used a different approach, mapping a query to entities directly, i.e. using ObjectContext.ExecuteStoreQuery.
It turns out that to work around this with the MySQL driver, your entire lambda must be written in one go, meaning in ONE Where(...) predicate. That way the driver knows it is all one result set. If you instead build an initial IQueryable and keep appending Where clauses that access child tables, the driver assumes there are multiple result sets and wraps your entire query in a sub-select in order to sort and limit it.
I took over a project written in Laravel 4. We have MySQL 5.6.21 and PHP 5.4.30, currently running on Windows 8.1.
Every morning, on the first attempt to access the landing page (which runs about 5 queries on the backend), the site crashes with a PHP timeout (over 30 seconds for a response).
Using the approach from Laravel 4 - logging SQL queries, I got closer to the cause: one of the queries takes more than 25 seconds on the first call. After that it's always < 0.5 seconds.
The query has got 3 joins and 2 subselects wrapped in Cache::remember. I want to go into optimizing this so that on production it won't run into this problem.
So I want to test different SQLs
The problem is that the data gets cached somehow after the first run, so I can't tell whether my new SQL is better or not.
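As an aside, when timing rewrites it may help to add SQL_NO_CACHE so the query cache never serves the result; note it does not cool the InnoDB buffer pool, which is the other cache in play. A sketch using the placeholder table names from the EXPLAIN below:
SELECT SQL_NO_CACHE count(*) FROM table1;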
Now, since I guess it's a caching issue (on the first attempt it takes long, afterwards not) I did these:
MySQL: FLUSH TABLES;
restart MySQL
restart Apache
php artisan cache:clear
But still, the query runs fast. Then, after some period in which I don't access the database at all (I can't give an exact time; maybe 4 hours of inactivity), it happens again.
Explain says:
1 | Primary | table1 | ALL | 2 possible keys | NULL | ... | 1010000 | using where; using temporary; using filesort
1 | Primary | table2 | eq_ref | PRIMARY | PRIMARY | ... | 1 | using where; using index
1 | Primary | table3 | eq_ref | PRIMARY | PRIMARY | ... | 1 | using where; using index
1 | Primary | table4 | eq_ref | PRIMARY | PRIMARY | ... | 1 | NULL
3 | Dependent Subquery | table5 | ref | 2 possible keys | table1.id | ... | 17 | using where
2 | Dependent Subquery | table5 | ref | 2 possible keys | table1.id | ... | 17 | using where
So here the questions:
What's the reason for this long time?
How can I reproduce it? and
Is there a way to fix it?
I read mysql slow on first query, then fast for related queries. However that doesn't answer my question on how to reproduce this behaviour.
Update
I changed the SQL and now it is written like:
select
count(ec.id) as asdasda
from table1 ec force index for join (PRIMARY)
left join table2 e force index for join (PRIMARY) on ec.id = e.id
left join table3 v force index for join (PRIMARY) on e.id = v.id
where
v.col1 = 'aaa'
and v.col2 = 'bbb'
and v.col3 = 'ccc'
and e.datecol > curdate()
and e.col1 != 0
Now explain says:
+----+-------------+--------+--------+---------------+--------------+---------+-----------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+--------+---------------+--------------+---------+-----------------+--------+-------------+
| 1 | SIMPLE | table3 | ALL | PRIMARY | NULL | NULL | NULL | 114032 | Using where |
| 1 | SIMPLE | table2 | ref | PRIMARY | PRIMARY | 5 | table3.id | 11 | Using where |
| 1 | SIMPLE | table1 | eq_ref | PRIMARY | PRIMARY | 4 | table2.id | 1 | Using index |
+----+-------------+--------+--------+---------------+--------------+---------+-----------------+--------+-------------+
Is that as good as it can get?
The data might be cached in the InnoDB buffer pool or on Windows filesystem cache.
You can't explicitly flush the InnoDB cache but you can set the flushing parameters to more aggressive values:
SET GLOBAL innodb_old_blocks_pct = 5
SET GLOBAL innodb_max_dirty_pages_pct = 0
You can use the solution provided here to clear Windows filesystem cache: Clear file cache to repeat performance testing
But what you really need is an index on table3 (col1, col2, col3)
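A sketch of that index, using the placeholder names from the question (the index name is made up):
ALTER TABLE table3 ADD INDEX idx_col1_col2_col3 (col1, col2, col3);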
The query below is very slow even without the ORDER BY, and I can't figure out why. I'm guessing it's the WHERE on date_affidavit_filed, but how can I make it fast with that ORDER BY as well? Perhaps a subselect on the job ids that match the WHERE, passing those into the rest of the query? But I still need to order by the server name like this. Any suggestions?
explain select sql_no_cache court_county, job.id as jid, job_status,
DATE_FORMAT(job.datetime_served, '%m/%d/%Y') as dserved ,
CONCAT(server.namefirst, ' ', server.namelast) as servername, client_name,
DATE_FORMAT(job.datetime_received, '%m/%d/%Y') as dtrec ,
DATE_FORMAT(job.datetime_give2server, '%m/%d/%Y') as dtg2s,
DATE_FORMAT(date_kase_filed, '%m/%d/%Y') as dkf,
DATE_FORMAT(job.date_sent_to_court, '%m/%d/%Y') as dtstc ,
TO_DAYS(datetime_served )-TO_DAYS(date_kase_filed) as totaldays from job
left join kase on kase.id=job.kase_id
left join server on job.server_id=server.id
left join client on kase.client_id=client.id
left join LUcourt on LUcourt.id=kase.court_id
where date_affidavit_filed is not null and date_affidavit_filed !='' order by servername;
+----+-------------+---------+--------+----------------------+---------+---------+-----------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+----------------------+---------+---------+-----------------------+--------+----------------------------------------------+
| 1 | SIMPLE | job | ALL | date_affidavit_filed | NULL | NULL | NULL | 365212 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | kase | eq_ref | PRIMARY | PRIMARY | 4 | pserve.job.kase_id | 1 | |
| 1 | SIMPLE | server | eq_ref | PRIMARY | PRIMARY | 4 | pserve.job.server_id | 1 | |
| 1 | SIMPLE | client | eq_ref | PRIMARY | PRIMARY | 4 | pserve.kase.client_id | 1 | |
| 1 | SIMPLE | LUcourt | eq_ref | PRIMARY | PRIMARY | 4 | pserve.kase.court_id | 1 | |
+----+-------------+---------+--------+----------------------+---------+---------+-----------------------+--------+----------------------------------------------+
Check that you have indexes on the following columns: job.kase_id and job.server_id.
Also, you are ordering by a calculated field, which is not optimal. Perhaps order by a field with an index instead.
If you need to preserve that exact sort, you might want to add a column in the DB for that value, and either populate it with the appropriate values or set up a trigger on the DB to populate it for you automatically.
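A minimal sketch of that approach, assuming a new namefull column (all names here are hypothetical):
-- Persist the computed name once, then keep it in sync with triggers:
ALTER TABLE server ADD COLUMN namefull VARCHAR(255);
UPDATE server SET namefull = CONCAT(namefirst, ' ', namelast);
CREATE TRIGGER server_namefull_bi BEFORE INSERT ON server
FOR EACH ROW SET NEW.namefull = CONCAT(NEW.namefirst, ' ', NEW.namelast);
CREATE TRIGGER server_namefull_bu BEFORE UPDATE ON server
FOR EACH ROW SET NEW.namefull = CONCAT(NEW.namefirst, ' ', NEW.namelast);
With an index on namefull, ORDER BY namefull can then read rows in index order.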
This can speed up the order by:
CREATE INDEX namefull ON server (namefirst,namelast);
if you do ORDER BY server.namefirst, server.namelast instead of ORDER BY servername, which should produce the same output.
You can also create indexes on each table on any field you are left joining, that can improve the performance of your query too.
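To make the first tip concrete, a self-contained sketch of the reordered sort (only the sort-relevant columns shown):
-- With the namefull index above in place, sorting on the raw columns
-- instead of the CONCAT() alias lets MySQL read rows in index order:
SELECT CONCAT(namefirst, ' ', namelast) AS servername
FROM server
ORDER BY namefirst, namelast;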
When you write,
where date_affidavit_filed is not null and date_affidavit_filed !=''
you are selecting most of the rows, or at least so many that it is not worthwhile to go through the index. The query planner sees that there is an index involving date_affidavit_filed but decides not to use it, even though the WHERE clause involves only date_affidavit_filed; so we know it's not a missing-key issue, it must be a cardinality issue.
| 1 | SIMPLE | job | ALL | date_affidavit_filed | NULL | NULL | NULL | 365212 | Using where; Using temporary; Using filesort |
You can try optimizing this by creating an index on
date_affidavit_filed, kase_id, server_id
in that order. How many rows are returned by the query?
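A sketch of that index (the name is made up):
ALTER TABLE job ADD INDEX idx_affidavit_kase_server (date_affidavit_filed, kase_id, server_id);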
You are selecting everything that isn't empty, really.
That really means everything.
I don't know how many rows of data you have, but it's a lot to go through.
Try narrowing your query to a date range or a specific client (see the sketch below).
If you really need everything, don't output it one row at a time; build up one big string, with all the formatting, in the software you use for output, and once you've finished looping through the results output it in one big go.
You could also use paging.
Just add LIMIT 0,30 on page 1, LIMIT 30,30 on page 2, etc., and let the end user walk through the pages.
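A sketch combining the narrowing and paging suggestions (the date range is purely hypothetical):
select job.id as jid,
       concat(server.namefirst, ' ', server.namelast) as servername
from job
left join server on job.server_id = server.id
where date_affidavit_filed is not null
  and date_affidavit_filed >= '2013-01-01'  -- hypothetical range
  and date_affidavit_filed < '2013-02-01'
order by servername
limit 0, 30;  -- page 1; LIMIT 30,30 gives page 2, and so on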
First off, I've looked at several other questions about optimizing sql queries, but I'm still unclear for my situation what is causing my problem. I read a few articles on the topic as well and have tried implementing a couple possible solutions, as I'll describe below, but nothing has yet worked or even made an appreciable dent in the problem.
The application is a nutrition tracking system - users enter the foods they eat and based on an imported USDA database the application breaks down the foods to the individual nutrients and gives the user a breakdown of the nutrient quantities on a (for now) daily basis.
Here's a PDF of the abbreviated database schema, and here it is as a (perhaps poor quality) JPG. I made this in OpenOffice; if there are suggestions for better ways to visualize a database, I'm open to suggestions on that front as well! The blue tables are directly from the USDA, and the green and black tables are ones I've made. I've omitted a lot of data in order to not clutter things up unnecessarily.
Here's the query I'm trying to run that takes a very long time:
SELECT listing.date_time,listing.nutrdesc,data.total_nutr_mass,listing.units
FROM
(SELECT nutrdesc, nutr_no, date_time, units
FROM meals, nutr_def
WHERE meals.users_userid = '2'
AND date_time BETWEEN '2009-8-12' AND '2009-9-12'
AND (nutr_no <100000
OR nutr_no IN
(SELECT nutr_def_nutr_no
FROM nutr_rights
WHERE nutr_rights.users_userid = '2'))
) as listing
LEFT JOIN
(SELECT nutrdesc, date_time, nut_data.nutr_no, sum(ingred_gram_mass*entry_qty_num*nutr_val/100) AS total_nutr_mass
FROM nut_data, recipe_ingredients, food_entries, meals, nutr_def
WHERE nut_data.nutr_no = nutr_def.nutr_no
AND ndb_no = ingred_ndb_no
AND foods_food_id = entry_ident
AND meals_meal_id = meal_id
AND users_userid = '2'
AND date_time BETWEEN '2009-8-12' AND '2009-9-12'
GROUP BY date_time,nut_data.nutr_no ) as data
ON data.date_time = listing.date_time
AND listing.nutr_no = data.nutr_no
ORDER BY listing.date_time,listing.nutrdesc,listing.units
So I know that's rather complex - The first select gets a listing of all the nutrients that the user consumed within the given date range, and the second fills in all the quantities.
When I run them separately, the first query is really fast, but the second is slow and gets very slow as the date range grows. The join makes the whole thing ridiculously slow. I know that the 'main' problem is the join between these two derived tables (in this version of MySQL a derived table is materialized without any indexes, so joining two of them means repeated full scans of the materialized results), and I can get rid of it and do the join by hand in PHP much faster, but I'm not convinced that's the whole story.
For example: for 1 month of data, the query takes about 8 seconds, which is slow, but not completely terrible. Separately, each query takes ~.01 and ~2 seconds respectively. 2 seconds still seems high to me.
If I try to retrieve a year's worth of data, it takes several (>10) minutes to run the whole query, which is problematic: the client-server connection sometimes times out, and in any case I don't want to sit there with a spinning 'please wait' icon. Mainly, I feel like there's a problem because it takes more than 12x as long to retrieve 12x more information, when it should take less than 12x as long if I were doing things right.
Here's the 'explain' for each of the slow queries: (the whole thing, and just the second half).
Whole thing:
+----+--------------------+--------------------+----------------+-------------------------------+------------------+---------+-----------------------------------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------------+----------------+-------------------------------+------------------+---------+-----------------------------------------------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 5053 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 4341 | |
| 4 | DERIVED | meals | range | PRIMARY,day_ind | day_ind | 9 | NULL | 30 | Using where; Using temporary; Using filesort |
| 4 | DERIVED | food_entries | ref | meals_meal_id | meals_meal_id | 5 | nutrition.meals.meal_id | 15 | Using where |
| 4 | DERIVED | recipe_ingredients | ref | foods_food_id,ingred_ndb_no | foods_food_id | 4 | nutrition.food_entries.entry_ident | 2 | |
| 4 | DERIVED | nutr_def | ALL | PRIMARY | NULL | NULL | NULL | 174 | |
| 4 | DERIVED | nut_data | ref | PRIMARY | PRIMARY | 36 | nutrition.nutr_def.nutr_no,nutrition.recipe_ingredients.ingred_ndb_no | 1 | |
| 2 | DERIVED | meals | range | day_ind | day_ind | 9 | NULL | 30 | Using where |
| 2 | DERIVED | nutr_def | ALL | PRIMARY | NULL | NULL | NULL | 174 | Using where |
| 3 | DEPENDENT SUBQUERY | nutr_rights | index_subquery | users_userid,nutr_def_nutr_no | nutr_def_nutr_no | 19 | func | 1 | Using index; Using where |
+----+--------------------+--------------------+----------------+-------------------------------+------------------+---------+-----------------------------------------------------------------------+------+----------------------------------------------+
10 rows in set (2.82 sec)
Second chunk (data):
+----+-------------+--------------------+-------+-----------------------------+---------------+---------+-----------------------------------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+-------+-----------------------------+---------------+---------+-----------------------------------------------------------------------+------+----------------------------------------------+
| 1 | SIMPLE | meals | range | PRIMARY,day_ind | day_ind | 9 | NULL | 30 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | food_entries | ref | meals_meal_id | meals_meal_id | 5 | nutrition.meals.meal_id | 15 | Using where |
| 1 | SIMPLE | recipe_ingredients | ref | foods_food_id,ingred_ndb_no | foods_food_id | 4 | nutrition.food_entries.entry_ident | 2 | |
| 1 | SIMPLE | nutr_def | ALL | PRIMARY | NULL | NULL | NULL | 174 | |
| 1 | SIMPLE | nut_data | ref | PRIMARY | PRIMARY | 36 | nutrition.nutr_def.nutr_no,nutrition.recipe_ingredients.ingred_ndb_no | 1 | |
+----+-------------+--------------------+-------+-----------------------------+---------------+---------+-----------------------------------------------------------------------+------+----------------------------------------------+
5 rows in set (0.00 sec)
I've ANALYZEd all the tables involved in the query, and added an index on the datetime field that joins meals and food_entries; I called it 'day_ind'. I hoped that would accelerate things, but it didn't seem to make a difference. I also tried removing the SUM function, as I understand that having a function in the query can force a full table scan, which is obviously much slower. Unfortunately, removing the SUM didn't seem to make a difference either (well, about 3-5% or so, but not the order of magnitude I'm looking for).
I would love any suggestions and will be happy to provide any more information you need to help diagnose and improve this problem. Thanks in advance!
There are a few type ALL entries in your EXPLAIN, which indicate full table scans and hence the temporary table. You could add the missing indexes if they are not there already.
Sort and GROUP BY are usually the performance killers; you can adjust MySQL's memory settings to avoid physical I/O to the tmp table if you have extra memory available (see the sketch below).
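The knobs in question, as a sketch (the sizes are examples only; MySQL uses the smaller of the two for in-memory temporary tables, so raise them together):
SET GLOBAL tmp_table_size = 268435456;      -- 256 MB
SET GLOBAL max_heap_table_size = 268435456; -- 256 MB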
Lastly, try to make sure the data types of the join attributes match, i.e. that both sides of data.date_time = listing.date_time have the same data format.
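A quick way to check, as a sketch (here both sides ultimately come from meals.date_time, but in general compare the two source columns):
SHOW COLUMNS FROM meals LIKE 'date_time';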
Hope that helps.
Okay, so I eventually figured out what I'm gonna end up doing. I couldn't make the 'data' query any faster - that's still the bottleneck. But now I've made it so the total query process is pretty close to linear, not exponential.
I split the query into two parts and made each one into a temporary table. Then I added an index for each of those temp tables and did the join separately afterwards. This made the total execution time for 1 month of data drop from 8 to 2 seconds, and for 1 year of data from ~10 minutes to ~30 seconds. Good enough for now, I think. I can work with that.
Thanks for the suggestions. Here's what I ended up doing:
create table listing (
SELECT nutrdesc, nutr_no, date_time, units
FROM meals, nutr_def
WHERE meals.users_userid = '2'
AND date_time BETWEEN '2009-8-12' AND '2009-9-12'
AND (
nutr_no <100000 OR nutr_no IN (
SELECT nutr_def_nutr_no
FROM nutr_rights
WHERE nutr_rights.users_userid = '2'
)
)
);
create table data (
SELECT nutrdesc, date_time, nut_data.nutr_no, sum(ingred_gram_mass*entry_qty_num*nutr_val/100) AS total_nutr_mass
FROM nut_data, recipe_ingredients, food_entries, meals, nutr_def
WHERE nut_data.nutr_no = nutr_def.nutr_no
AND ndb_no = ingred_ndb_no
AND foods_food_id = entry_ident
AND meals_meal_id = meal_id
AND users_userid = '2'
AND date_time BETWEEN '2009-8-12' AND '2009-9-12'
GROUP BY date_time,nut_data.nutr_no
);
create index joiner on data(nutr_no, date_time);
create index joiner on listing(nutr_no, date_time);
SELECT listing.date_time,listing.nutrdesc,data.total_nutr_mass,listing.units
FROM listing
LEFT JOIN data
ON data.date_time = listing.date_time
AND listing.nutr_no = data.nutr_no
ORDER BY listing.date_time,listing.nutrdesc,listing.units;
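One caveat on the statements above: plain CREATE TABLE leaves listing and data behind, so a second run collides on the names. A hedged variation is to make them session-scoped:
-- TEMPORARY tables are dropped automatically when the connection closes,
-- so concurrent requests each get their own copy:
create temporary table listing ( /* same SELECT as above */ );
create temporary table data ( /* same SELECT as above */ );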
I ran into a problem last week moving from dev to testing, where one of my queries, which had run perfectly in dev, was crawling on my testing server.
It was fixed by adding FORCE INDEX on one of the indexes in the query.
Now I've loaded the same database onto the production server (and it's running with the FORCE INDEX command), and it has slowed again.
Any idea what would cause something like this to happen? The testing and prod are both running the same OS and version of mysql (unlike the dev).
Here's the query and the explain from it.
EXPLAIN SELECT showsdate.bid, showsdate.bandid, showsdate.date, showsdate.time,
    showsdate.title, showsdate.name, showsdate.address, showsdate.rank, showsdate.city,
    showsdate.state, showsdate.lat, showsdate.`long`, tickets.link, tickets.lowprice,
    tickets.highprice, tickets.source, tickets.ext, artistGenre, showsdate.img
FROM tickets
RIGHT OUTER JOIN (
    SELECT shows.bid, shows.date, shows.time, shows.title, artists.name, artists.img,
        artists.rank, artists.bandid, shows.address, shows.city, shows.state, shows.lat,
        shows.`long`, GROUP_CONCAT(genres.genre SEPARATOR ' | ') AS artistGenre
    FROM shows FORCE INDEX (biddate_idx)
    JOIN artists ON shows.bid = artists.bid
    JOIN genres ON artists.bid = genres.bid
    WHERE `long` BETWEEN -74.34926984058 AND -73.62463215942
      AND lat BETWEEN 40.39373515942 AND 41.11837284058
      AND shows.date >= '2009-03-02'
    GROUP BY shows.bid, shows.date
    ORDER BY shows.date, artists.rank DESC
    LIMIT 0, 30
) showsdate ON showsdate.bid = tickets.bid AND showsdate.date = tickets.date;
+----+-------------+------------+--------+---------------+-------------+---------+------------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+-------------+---------+------------------------------+--------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 30 | |
| 1 | PRIMARY | tickets | ref | biddate_idx | biddate_idx | 7 | showsdate.bid,showsdate.date | 1 | |
| 2 | DERIVED | genres | index | bandid_idx | bandid_idx | 141 | NULL | 531281 | Using index; Using temporary; Using filesort |
| 2 | DERIVED | shows | ref | biddate_idx | biddate_idx | 4 | activeHW.genres.bid | 5 | Using where |
| 2 | DERIVED | artists | eq_ref | bid_idx | bid_idx | 4 | activeHW.genres.bid | 1 | |
+----+-------------+------------+--------+---------------+-------------+---------+------------------------------+--------+----------------------------------------------+
I think I chimed in when you asked this question about the differences in dev -> test.
Have you tried rebuilding the indexes and recalculating statistics? Generally, forcing an index is a bad idea as the optimizer usually makes good choices as to which indexes to use. However, that assumes that it has good statistics to work from and that the indexes aren't seriously fragmented.
ETA:
To rebuild indexes, use:
REPAIR TABLE tbl_name QUICK;
To recalculate statistics:
ANALYZE TABLE tbl_name;
Does the test server have only 10 records and the production server 1,000,000,000 records?
That might also cause the different execution times.
Are the two servers configured the same? It sounds like you might be crossing a "tipping point" in MySQL's performance. I'd compare the MySQL configurations; there might be a memory parameter way different.
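A quick way to diff the two configurations, per the suggestion above (a sketch; run on both servers and compare the output):
SHOW VARIABLES LIKE '%buffer%';
SHOW VARIABLES LIKE '%cache%';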