Optimizing SQL query with inner join on 3 big tables - mysql

I have a SQL query that joins 3 tables on a remote MySQL database.
Two of these tables are about 15GB each (STEP_RESULT and meas_numericlimit).
Before any data is sent, a temporary table is created on the server, which takes about 2.5 hours to complete.
I am not the server admin, but I can observe my queries with MySQL Workbench.
The server is up to date and has 64GB of RAM.
How can I optimize this query?
Thank you.
My query:
select
t1.UUT_NAME,
t1.STATION_NUM,
t1.START_DATE_TIME,
t3.LOW_LIMIT,
t3.DATA,
t3.HIGH_LIMIT,
t3.UNITS,
t2b.STEP_NAME
from
meas_numericlimit t3
inner join STEP_RESULT t2a on t3.ID = t2a.STEP_ID
inner join STEP_RESULT t2b on t2a.STEP_PARENT = t2b.STEP_ID
inner join uut_result t1 on t2b.UUT_RESULT = t1.ID
where
t1.UUT_NAME like 'Variable1-1%' and
t1.STATION_NUM = 'variable2' and
t2b.STEP_NAME = 'variable3' and
t2b.STEP_TYPE = 'constant'
Here is the output of SHOW TABLES and EXPLAIN:
+--------------------+
| Tables_in_spectrum |
+--------------------+
| cal_dates |
| calibrage |
| execution_time |
| meas_numericlimit |
| station_feature |
| step_callexe |
| step_graph |
| step_msgjnl |
| step_msgpopup |
| step_passfail |
| step_result |
| step_seqcall |
| step_stringvalue |
| syst_event |
| uptime |
| users |
| uut_result |
+--------------------+
and
+----+-------------+-------+--------+-------------------------+---------+---------+-------------------------+----------+--------------------------------+
| id | select_type | table | type   | possible_keys           | key     | key_len | ref                     | rows     | Extra                          |
+----+-------------+-------+--------+-------------------------+---------+---------+-------------------------+----------+--------------------------------+
| 1  | SIMPLE      | t2a   | ALL    | NULL                    | NULL    | NULL    | NULL                    | 48120004 |                                |
| 1  | SIMPLE      | t3    | eq_ref | PRIMARY                 | PRIMARY | 40      | spectrum.t2a.STEP_ID    | 1        |                                |
| 1  | SIMPLE      | t2b   | ALL    | NULL                    | NULL    | NULL    | NULL                    | 48120004 | Using where; Using join buffer |
| 1  | SIMPLE      | t1    | eq_ref | PRIMARY,FK_uut_result_1 | PRIMARY | 40      | spectrum.t2b.UUT_RESULT | 1        | Using where                    |
+----+-------------+-------+--------+-------------------------+---------+---------+-------------------------+----------+--------------------------------+
Here is the SHOW CREATE TABLE output:
CREATE TABLE `uut_result` (
`ID` varchar(38) NOT NULL DEFAULT '',
`STATION_NUM` varchar(255) DEFAULT NULL,
`SOFTVER_ODTGEN` varchar(10) DEFAULT NULL,
`HARDVER_ODTGEN` varchar(10) DEFAULT NULL,
`NEXT_CAL_DATE` date DEFAULT NULL,
`UUT_NAME` varchar(255) DEFAULT NULL,
`UUT_SERIAL_NUMBER` varchar(255) DEFAULT NULL,
`UUT_VERSION` varchar(255) DEFAULT NULL,
`USER_LOGIN_NAME` varchar(255) DEFAULT NULL,
`USER_LOGIN_LOGIN` varchar(255) NOT NULL DEFAULT '',
`START_DATE_TIME` datetime DEFAULT NULL,
`EXECUTION_TIME` float DEFAULT NULL,
`UUT_STATUS` varchar(255) DEFAULT NULL,
`UUT_ERROR_CODE` int(11) DEFAULT NULL,
`UUT_ERROR_MESSAGE` varchar(1023) DEFAULT NULL,
`PAT_NAME` varchar(255) NOT NULL DEFAULT '',
`PAT_VERSION` varchar(10) NOT NULL DEFAULT '',
`TEST_LEVEL` varchar(50) DEFAULT NULL,
`INTERFACE_ID` int(10) unsigned NOT NULL DEFAULT '0',
`EXECUTION_MODE` varchar(45) DEFAULT NULL,
`LOOP_MODE` varchar(45) DEFAULT NULL,
`STOP_ON_FAIL` tinyint(4) unsigned NOT NULL DEFAULT '0',
`EXECUTION_COMMENT` text,
PRIMARY KEY (`ID`),
KEY `FK_uut_result_1` (`STATION_NUM`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
and
CREATE TABLE `meas_numericlimit` (
`ID` varchar(38) NOT NULL DEFAULT '',
`STEP_RESULT` varchar(38) NOT NULL DEFAULT '',
`NAME` varchar(255) DEFAULT NULL,
`COMP_OPERATOR` varchar(30) DEFAULT NULL,
`HIGH_LIMIT` double DEFAULT NULL,
`LOW_LIMIT` double DEFAULT NULL,
`UNITS` varchar(255) DEFAULT NULL,
`DATA` double DEFAULT NULL,
`STATUS` varchar(255) DEFAULT NULL,
`FORMAT` varchar(15) DEFAULT NULL,
`NANDATA` int(11) DEFAULT '0',
PRIMARY KEY (`ID`),
KEY `FK_meas_numericlimit_1` (`STEP_RESULT`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
and
CREATE TABLE `step_result` (
`ID` varchar(38) NOT NULL DEFAULT '',
`UUT_RESULT` varchar(38) NOT NULL DEFAULT '',
`STEP_PARENT` varchar(38) DEFAULT NULL,
`STEP_NAME` varchar(255) DEFAULT NULL,
`STEP_ID` varchar(38) NOT NULL DEFAULT '',
`STEP_TYPE` varchar(255) DEFAULT NULL,
`STATUS` varchar(255) DEFAULT NULL,
`REPORT_TEXT` text,
`DIAG` text,
`ERROR_OCCURRED` tinyint(1) NOT NULL DEFAULT '0',
`ERROR_CODE` int(11) DEFAULT NULL,
`ERROR_MESSAGE` varchar(1023) DEFAULT NULL,
`MODULE_TIME` float DEFAULT NULL,
`TOTAL_TIME` float DEFAULT NULL,
`NUM_LOOPS` int(11) DEFAULT NULL,
`NUM_PASSED` int(11) DEFAULT NULL,
`NUM_FAILED` int(11) DEFAULT NULL,
`ENDING_LOOP_INDEX` int(11) DEFAULT NULL,
`LOOP_INDEX` int(11) DEFAULT NULL,
`INTERACTIVE_EXENUM` int(11) DEFAULT NULL,
`STEP_GROUP` varchar(30) DEFAULT NULL,
`STEP_INDEX` int(11) DEFAULT NULL,
`ORDER_NUMBER` int(11) DEFAULT NULL,
PRIMARY KEY (`ID`),
KEY `FK_step_result_1` (`UUT_RESULT`),
KEY `IDX_step_parent` (`STEP_PARENT`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

First thing to note is that just because you write the joins in one order doesn't mean that they actually get executed in that order. (Look up declarative languages.)
For this reason I would start by creating indexes that would satisfy each where clause, then each join predicate, but in both directions...
From the WHERE clause, all compound/covering indexes should start with...
STEP_RESULT : (STEP_NAME, STEP_TYPE)
uut_result : (STATION_NUM, UUT_NAME)
Also accounting for the JOIN predicates, all compound/covering indexes should then be...
meas_numericlimit : (ID)
STEP_RESULT : (STEP_NAME, STEP_TYPE, STEP_ID)
STEP_RESULT : (STEP_NAME, STEP_TYPE, STEP_PARENT)
STEP_RESULT : (STEP_NAME, STEP_TYPE, UUT_RESULT)
uut_result : (STATION_NUM, UUT_NAME, ID)
Of those 5 indexes, you will likely only see four in use, so you may want to drop the one not in use, or keep it in case a change in statistics changes the explain plan.
This does depend somewhat on the nature of your data. For example, you may just want the index on uut_result to be "reversed" to be (ID, STATION_NUM, UUT_NAME). Without knowing anything about the behaviour of your data, it may be worth trying both. (The same applies to the other index suggestions.)
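As a rough sketch, the indexes listed above could be created as follows. The index names are made up for illustration, and meas_numericlimit (ID) is left out because it is already that table's primary key. On tables of this size each statement will likely run for a long time, so it may be worth trying on a copy first.

```sql
-- Hypothetical index names; the column lists follow the suggestions above.
ALTER TABLE step_result
  ADD INDEX ix_sr_name_type_stepid (STEP_NAME, STEP_TYPE, STEP_ID),
  ADD INDEX ix_sr_name_type_parent (STEP_NAME, STEP_TYPE, STEP_PARENT),
  ADD INDEX ix_sr_name_type_uut    (STEP_NAME, STEP_TYPE, UUT_RESULT);

ALTER TABLE uut_result
  ADD INDEX ix_uut_station_name_id (STATION_NUM, UUT_NAME, ID);

-- Afterwards, re-run EXPLAIN on the original query and compare the
-- "key" column with the names above to see which indexes are chosen.
```

Putting all three step_result indexes in one ALTER TABLE is a deliberate choice here: the table is then rebuilt once rather than three times.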

If these were InnoDB tables (and not MyISAM) I'd write the query like this:
SELECT t1.uut_name
, t1.station_num
, t1.start_date_time
, t3.low_limit
, t3.data
, t3.high_limit
, t3.units
, t2b.step_name
FROM uut_result t1
JOIN step_result t2b
ON t2b.uut_result = t1.id
AND t2b.step_type = 'constant'
AND t2b.step_name = 'variable3'
JOIN step_result t2a
ON t2a.step_parent = t2b.step_id
JOIN meas_numericlimit t3
ON t3.id = t2a.step_id
WHERE t1.station_num = 'variable2'
AND t1.uut_name LIKE 'Variable1-1%'
And I would create suitable covering indexes:
... uut_result_IX1 ON uut_result (station_num, uut_name, start_date_time, id)
... step_result_IX1 ON step_result (uut_result, step_type, step_name, step_id)
... step_result_IX2 ON step_result (step_parent, step_id)
For a possible incremental performance increase, I'd also consider one more covering index ...
... meas_numericlimit_IX1 ON meas_numericlimit (id, low_limit, data, high_limit, units)
(With InnoDB, the primary key column(s) are the cluster key, so there is less of a benefit here.)
With MyISAM, suitable indexes are important. But I don't think that covering indexes give the same kind of benefit as they do with InnoDB.
With MyISAM, a covering index isn't going to avoid a visit to the pages in the underlying table. So I think we'd be better off with shorter indexes, with just the columns used in the predicates:
... uut_result_IX1 ON uut_result (station_num, uut_name)
... step_result_IX1 ON step_result (uut_result, step_type, step_name)
... step_result_IX2 ON step_result (step_parent)
Increasing key_buffer_size to a larger value will allow MyISAM indexes to be cached; but don't over allocate. There's no caching of the MyISAM table pages, except the OS file system cache.
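A minimal sketch of how the key cache could be inspected and resized. The 2GB figure is an assumed example value, not a recommendation for this server, and changing it needs a privileged account.

```sql
-- Current size of the MyISAM key cache, in bytes.
SHOW VARIABLES LIKE 'key_buffer_size';

-- Key_reads counts index block reads from disk, Key_read_requests counts
-- requests served through the cache. A high Key_reads / Key_read_requests
-- ratio suggests the cache is too small for the indexes in use.
SHOW GLOBAL STATUS LIKE 'Key_read%';

-- Example value only (2GB). The setting is lost on restart unless it is
-- also written to the server configuration file.
SET GLOBAL key_buffer_size = 2 * 1024 * 1024 * 1024;
```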
Modifying other MyISAM configuration parameters has, in my limited experience with MyISAM, a net effect of just over-allocating (i.e. wasting) memory, with negligible or marginal impact on performance. So I wouldn't mess with those. (That's not to say that some haven't gleaned some performance improvement. I just haven't had any success with my test cases.)
Instead of mucking with MyISAM tuning, I'd spend my effort lobbying for these tables to be changed to the InnoDB storage engine, and then tune InnoDB.
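If the conversion is approved, it could look roughly like this. The schema name spectrum is taken from the EXPLAIN output in the question. Each ALTER copies the whole table, so on 15GB tables this is a maintenance-window operation rather than something to run casually.

```sql
-- Check which storage engine each table currently uses.
SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = 'spectrum'
  AND table_name IN ('uut_result', 'step_result', 'meas_numericlimit');

-- Convert one table at a time; every statement rebuilds the table.
ALTER TABLE uut_result        ENGINE=InnoDB;
ALTER TABLE step_result       ENGINE=InnoDB;
ALTER TABLE meas_numericlimit ENGINE=InnoDB;
```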
EDIT
The order of the columns in the indexes I suggested is based on having the leading columns with the equality comparisons... with a preference for the most "selective" columns before the columns with more repeated values.
These indexes were suggested with an execution plan in mind ... starting with t1 (uut_result) as the driving table, with a join to t2b, then the join to t2a, and finally the join to t3.
Moving the predicates from the WHERE clause to the ON clause wasn't for a performance gain... it was to keep the predicates on each table grouped together in the query, as an aid to the future reader.
I see the query as starting as a query against the t1 (uut_result) table
SELECT t1.uut_name
, t1.station_num
, t1.start_date_time
-- , t3.low_limit
-- , t3.data
-- , t3.high_limit
-- , t3.units
-- , t2b.step_name
FROM uut_result t1
-- JOIN step_result t2b
-- ON t2b.uut_result = t1.id
-- AND t2b.step_type = 'constant'
-- AND t2b.step_name = 'variable3'
-- JOIN step_result t2a
-- ON t2a.step_parent = t2b.step_id
-- JOIN meas_numericlimit t3
-- ON t3.id = t2a.step_id
WHERE t1.station_num = 'variable2'
AND t1.uut_name LIKE 'Variable1-1%'
And then un-commenting the lines that reference t2b ...
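A sketch of what that next step would presumably look like, with only the t2b join enabled. Running EXPLAIN after each step shows whether the new join uses the intended index before the next table is added.

```sql
-- Step 2: uut_result joined to the parent step rows only.
EXPLAIN
SELECT t1.uut_name
     , t1.station_num
     , t1.start_date_time
     , t2b.step_name
  FROM uut_result t1
  JOIN step_result t2b
    ON t2b.uut_result = t1.id
   AND t2b.step_type = 'constant'
   AND t2b.step_name = 'variable3'
 WHERE t1.station_num = 'variable2'
   AND t1.uut_name LIKE 'Variable1-1%';
```

The joins to t2a and t3 can then be un-commented one at a time in the same way.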

Related

MySQL query with ORDER BY takes long time to execute

I have a table named 'response_set' with following indexes (result of 'show create table response_set;'):
| response_set | CREATE TABLE `response_set` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`survey_id` int(11) NOT NULL DEFAULT '0',
`respondent_id` int(11) DEFAULT NULL,
`ext_ref` varchar(64) DEFAULT NULL,
`email_addr` varchar(128) DEFAULT NULL,
`ip` varchar(32) DEFAULT NULL,
`t` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`time_taken` int(11) DEFAULT NULL,
`category_id` int(11) DEFAULT NULL,
`duplicate` int(1) DEFAULT '0',
`email_group` varchar(30) DEFAULT NULL,
`external_email_id` int(11) DEFAULT NULL,
`geo_code_country` varchar(64) DEFAULT NULL,
`geo_code_country_code` varchar(2) DEFAULT NULL,
`terminated_survey` int(1) DEFAULT NULL,
`geo_code_region` varchar(128) DEFAULT NULL,
`geo_code_city` varchar(3) DEFAULT NULL,
`geo_code_area_code` varchar(3) DEFAULT NULL,
`geo_code_dma_code` varchar(3) DEFAULT NULL,
`restart_url` varchar(255) DEFAULT NULL,
`inset_list` varchar(1024) DEFAULT NULL,
`custom1` varchar(1024) DEFAULT NULL,
`custom2` varchar(1024) DEFAULT NULL,
`custom3` varchar(1024) DEFAULT NULL,
`custom4` varchar(1024) DEFAULT NULL,
`panel_member_id` int(11) DEFAULT NULL,
`external_id` int(11) DEFAULT NULL,
`weight` float DEFAULT NULL,
`custom5` varchar(1024) DEFAULT NULL,
`quota_overlimit` int(1) DEFAULT '0',
`panel_id` int(11) DEFAULT NULL,
`referer_url` varchar(255) DEFAULT NULL,
`referer_domain` varchar(64) DEFAULT NULL,
`user_agent` varchar(255) DEFAULT NULL,
`longitude` decimal(15,12) DEFAULT '0.000000000000',
`latitude` decimal(15,12) DEFAULT '0.000000000000',
`radius` decimal(7,2) DEFAULT '0.00',
`cx_business_unit_id` int(11) DEFAULT '0',
`survey_link_id` int(11) DEFAULT '0',
`data_quality_flag` int(1) DEFAULT '0',
`data_quality_score` double DEFAULT '0',
`extended_info_json` json DEFAULT NULL,
`updated_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`channel` int(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `panel_member_id` (`panel_member_id`),
KEY `panel_member_id_2` (`panel_member_id`),
KEY `email_group` (`email_group`),
KEY `email_group_2` (`email_group`),
KEY `survey_timestamp_idx` (`survey_id`,`t`),
KEY `cx_business_unit_id_idx` (`cx_business_unit_id`),
KEY `data_quality_flag_idx` (`data_quality_flag`),
KEY `data_quality_score_idx` (`data_quality_score`),
KEY `survey_timestamp_terminated_idx` (`survey_id`,`t`,`terminated_survey`),
KEY `survey_idx` (`survey_id`)
) ENGINE=InnoDB AUTO_INCREMENT=39759 DEFAULT CHARSET=utf8 |
Now I am executing the following query on a page to retrieve the response_set rows based on survey_id and order by id:
SELECT *
FROM response_set a
WHERE a.survey_id = 1602673827
ORDER BY a.id limit 100;
The issue is that the query sometimes takes more than 30 seconds to execute, and the behaviour is inconsistent across survey_id values: it sometimes happens with ORDER BY a.id and sometimes with ORDER BY a.id DESC (the user can view the response sets in ascending or descending order on the page).
There are approx 6.2 million records in the table and for the given survey_id (1602673827) there are 45,800 records. On using the EXPLAIN SELECT statement to understand the query execution plan, I got the following info:
+----+-------------+-------+------------+-------+------------------------------------------------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+------------------------------------------------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | a | NULL | index | survey_timestamp_idx,survey_timestamp_terminated_idx | PRIMARY | 4 | NULL | 6863 | 1.46 | Using where |
+----+-------------+-------+------------+-------+------------------------------------------------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
I don't understand why MySQL is not using the indexes survey_timestamp_idx and survey_timestamp_terminated_idx, even though they are present, and is opting for a full table scan instead. Also, when I modify the query as follows:
SELECT *
FROM response_set a USE INDEX (survey_timestamp_idx)
WHERE a.survey_id = 1602673827
ORDER BY a.id limit 100;
The query execution time is reduced to 0.17 seconds. On doing the EXPLAIN for the modified query, I get the following info:
+----+-------------+-------+------------+------+----------------------+----------------------+---------+-------+-------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+----------------------+----------------------+---------+-------+-------+----------+---------------------------------------+
| 1 | SIMPLE | a | NULL | ref | survey_timestamp_idx | survey_timestamp_idx | 4 | const | 87790 | 100.00 | Using index condition; Using filesort |
+----+-------------+-------+------------+------+----------------------+----------------------+---------+-------+-------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)
However, I don't want to explicitly use 'USE INDEX' in the query as the where clause is dynamic and may contain following combinations in where clause as per the user's choice of filter:
1. where survey_id = ?;
2. where survey_id = ? and t = ?; (t is timestamp)
3. where survey_id = ? and terminated_survey = ?;
4. where survey_id = ? and t = ? and terminated_survey = ?;
Also, if I remove the ORDER BY clause from the query, the query always uses index and gets executed very fast.
Is there any other way, so that the MySQL query engine chooses the correct (faster) execution plan (by using correct indexes) when ORDER BY clause is present in query?
I am using MySQL version : 5.7.22
I have read the MySQL official documentation for ORDER BY query optimization (https://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html) and tried adding composite index on (id, survey_id) and (survey_id, id) but it didn't work. Can somebody please help?
where survey_id = ?;
where survey_id = ? and t = ?; (t is timestamp)
where survey_id = ? and terminated_survey = ?;
where survey_id = ? and t = ? and terminated_survey = ?;
Assuming you have ORDER BY id ASC (or DESC), then you need 4 indexes to handle all of them optimally. Start with the 1, 2, or 3 columns (in any order) mentioned in the WHERE, then finish with id.
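Spelled out, the four indexes described above might look like this. The index names are invented for illustration. Each one starts with the columns tested with = and ends with id, so the matching rows can be read already in ORDER BY id order.

```sql
-- Hypothetical names; one index per WHERE-clause combination.
ALTER TABLE response_set
  ADD INDEX ix_survey_id        (survey_id, id),
  ADD INDEX ix_survey_t_id      (survey_id, t, id),
  ADD INDEX ix_survey_term_id   (survey_id, terminated_survey, id),
  ADD INDEX ix_survey_t_term_id (survey_id, t, terminated_survey, id);

-- Check that the plan for the simplest case no longer shows "Using filesort".
EXPLAIN
SELECT *
FROM response_set a
WHERE a.survey_id = 1602673827
ORDER BY a.id
LIMIT 100;
```

Every extra index adds write overhead, so once these are in place the existing survey_idx, survey_timestamp_idx and survey_timestamp_terminated_idx may turn out to be redundant.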
I cannot explain why KEY survey_idx (survey_id) was not used for the query in question, nor was that index a "possible_key" in the EXPLAIN. It is as if something changed between running the queries and posting this Question. Please recheck.
BTW, INT(1) still takes 4 bytes; you probably wanted the one-byte TINYINT UNSIGNED. Many of the other fields are bigger than necessary. Size plays into performance, at least a little.
0.17s -- Might be even faster with FORCE INDEX(survey_idx)
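For reference, that hint would be written as below. The reasoning is that an InnoDB secondary index entry also carries the primary key, so within a single survey_id the entries of survey_idx are stored in id order and the sort can potentially be skipped. Whether it actually beats the 0.17 seconds is something only a test on the real data can tell.

```sql
SELECT *
FROM response_set a FORCE INDEX (survey_idx)
WHERE a.survey_id = 1602673827
ORDER BY a.id
LIMIT 100;
```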
Starting with the PRIMARY KEY (as in (id, survey_id)) is almost always useless. An index should start with the columns that are tested with =, then move on to something tested as a range, or a GROUP BY, or (as in your case) an ORDER BY.
Cookbook: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

Running MySQL on RPI / optimising query

I have a Raspberry Pi 3 running MySQL 5.5.57; this is the only service running on the RPi.
My app makes a key query (below) which takes 5-7 seconds to execute on the MySQL server.
I have done a lot to optimize indexes and foreign keys, but it hasn't helped much. When I run EXPLAIN I see that it is using temporary and filesort, which I don't really understand.
Are there any configuration tweaks that should be made when running MySQL on an RPi? I don't know much about the various buffers...
Is there anything else I should do to optimise the query?
The table has about 30,000 rows and is growing...
This is the query:
SELECT SQL_NO_CACHE distinct `photos`.*
FROM `photos`
LEFT OUTER JOIN `facets` ON `photos`.`id` = `facets`.`photo_id`
WHERE (`photos`.`date_taken` <= '2017-08-24')
AND (photos.status != 1 or photos.status is NULL)
ORDER BY photos.date_taken DESC LIMIT 500 OFFSET 500;
This is the table setup:
CREATE TABLE `photos` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`date_taken` datetime DEFAULT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`file_extension` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`file_size` int(11) DEFAULT NULL,
`location_id` bigint(20) DEFAULT NULL,
`make` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`model` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`original_height` int(11) DEFAULT NULL,
`original_width` int(11) DEFAULT NULL,
`longitude` decimal(16,10) DEFAULT NULL,
`latitude` decimal(16,10) DEFAULT NULL,
`status` int(11) DEFAULT '0',
`phash` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`org_id` int(11) DEFAULT NULL,
`lg_id` int(11) DEFAULT NULL,
`md_id` int(11) DEFAULT NULL,
`tm_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_photos_on_location_id` (`location_id`),
KEY `index_photos_on_date_taken` (`date_taken`),
KEY `index_photos_on_status` (`status`),
KEY `index_photos_on_phash` (`phash`),
CONSTRAINT `fk_rails_47f4e5f105` FOREIGN KEY (`location_id`) REFERENCES `locations` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=25672 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
When I do an explain on the query then this is what I get:
+----+-------------+--------+-------+---------------------------------------------------+----------------------------+---------+-------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------------------------------------------+----------------------------+---------+-------------------+-------+----------------------------------------------+
| 1 | SIMPLE | photos | range | index_photos_on_date_taken,index_photos_on_status | index_photos_on_date_taken | 9 | NULL | 13147 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | facets | ref | index_facets_on_photo_id | index_facets_on_photo_id | 9 | pt_prod.photos.id | 1 | Using index; Distinct |
+----+-------------+--------+-------+---------------------------------------------------+----------------------------+---------+-------------------+-------+----------------------------------------------+
The query will at times be extended to:
SELECT DISTINCT `photos`.*
FROM `photos`
LEFT OUTER JOIN `facets` ON `photos`.`id` = `facets`.`photo_id`
LEFT OUTER JOIN `tags` ON `facets`.`source_id` = `tags`.`id`
LEFT OUTER JOIN `comments` ON `facets`.`source_id` = `comments`.`id`
WHERE `photos`.`date_taken` >= '2017-01-25'
AND `photos`.`date_taken` <= '2018-01-10'
AND `locations`.`country_id` = 16
AND `locations`.`city_id` = 21
OR `facets`.`source_id` = 9 AND `facets`.`type` = 'AlbumFacet'
OR `facets`.`source_id` = 9 AND `facets`.`type` = 'TagFacet'
THIS ANSWERS THE ORIGINAL VERSION OF THE QUESTION.
Your query only uses columns from the first table. I would write it as:
SELECT SQL_NO_CACHE p.*
FROM `photos` p
WHERE p.`date_taken` <= '2017-08-24'
AND (NOT p.status <=> 1)
-- EXISTS keeps only photos that have at least one facet. The original
-- LEFT JOIN kept every photo, so drop this condition if that is what you want.
AND EXISTS (SELECT 1 FROM facets f WHERE p.id = f.photo_id)
ORDER BY p.date_taken DESC
LIMIT 500 OFFSET 500;
Removing the SELECT DISTINCT should be a big win. You should also have an index on facets(photo_id).
An index on (date_taken, status) might help. However, it is not clear how selective your conditions are, so an index on photos might not be of much use.
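A sketch of how that index could be tried out. The index name is hypothetical, chosen to match the naming of the existing indexes.

```sql
-- Hypothetical name, following the existing index_photos_on_... pattern.
ALTER TABLE photos
  ADD INDEX index_photos_on_date_taken_and_status (date_taken, status);

-- Compare the "key" and "Extra" columns with the plan in the question.
EXPLAIN
SELECT p.*
FROM photos p
WHERE p.date_taken <= '2017-08-24'
  AND (NOT p.status <=> 1)
ORDER BY p.date_taken DESC
LIMIT 500 OFFSET 500;
```

If the plan does not change, the index is only costing write time and can be dropped again.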

Slow Query on Rails joins

The following Rails query shows up in the slow query log:
class ParserRun
  scope :active, -> {
    where(completed_at: nil)
      .joins('LEFT JOIN system_events ON parser_runs.id = system_events.parser_run_id')
      .where("system_events.created_at > '#{active_system_events_threshold}' OR parser_runs.created_at > '#{1.minute.ago.to_s(:db)}'")
  }
end
How can I optimize this?
Slow query log:
SELECT `parser_runs`.*
FROM `parser_runs`
INNER JOIN `system_events` ON `system_events`.`parser_run_id` = `parser_runs`.`id`
WHERE `parser_runs`.`type` IN ('DatasetParserRun')
AND `parser_runs`.`completed_at` IS NULL
AND (system_events.created_at <= '2017-08-05 04:03:09');
# Time: 170805 5:03:43
Output of 'show create table parser_runs;'
| parser_runs | CREATE TABLE `parser_runs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`customer_id` int(11) DEFAULT NULL,
`options` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`completed_at` datetime DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_parser_runs_on_customer_id` (`customer_id`)
) ENGINE=InnoDB AUTO_INCREMENT=143327 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
output of 'show create table system_events;'
| system_events | CREATE TABLE `system_events` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`log_level` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`customer_id` int(11) DEFAULT NULL,
`classification` int(11) DEFAULT NULL,
`information` text COLLATE utf8_unicode_ci,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`parser_run_id` int(11) DEFAULT NULL,
`notified` tinyint(1) DEFAULT '0',
`dataset_log_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_system_events_on_classification` (`classification`),
KEY `index_system_events_on_customer_id` (`customer_id`),
KEY `index_system_events_on_parser_run_id` (`parser_run_id`),
KEY `index_system_events_on_dataset_log_id` (`dataset_log_id`)
) ENGINE=InnoDB AUTO_INCREMENT=730539 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
Output of EXPLAIN:
EXPLAIN for: SELECT `parser_runs`.* FROM `parser_runs` LEFT JOIN system_events ON parser_runs.id = system_events.parser_run_id WHERE `parser_runs`.`completed_at` IS NULL AND (system_events.created_at > '2017-08-07 10:09:03')
+----+-------------+---------------+--------+--------------------------------------+---------+---------+--------------------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+--------+--------------------------------------+---------+---------+--------------------------------------+--------+-------------+
| 1 | SIMPLE | system_events | ALL | index_system_events_on_parser_run_id | NULL | NULL | NULL | 655946 | Using where |
| 1 | SIMPLE | parser_runs | eq_ref | PRIMARY | PRIMARY | 4 | ashblood.system_events.parser_run_id | 1 | Using where |
+----+-------------+---------------+--------+--------------------------------------+---------+---------+--------------------------------------+--------+-------------+
2 rows in set (0.00 sec)
The first step in the query execution plan (the output of EXPLAIN SELECT ...) indicates that the whole system_events table is being scanned in order to check which rows in the system_events table will be used in the join with the parser_runs table.
Please add an index on the created_at column of system_events and repeat the query. Then check the new execution plan to verify whether the whole table is still being scanned or the new index is being used.
In addition, although it is probably not the root of the problem, you could add an index on the type and completed_at columns of the parser_runs table. Note that I mean a single index on both columns (in the given order), not a separate index on each column.
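In plain SQL the two suggestions might look like this. The index names are assumptions that follow the Rails naming seen in the existing schema; in a Rails project they would normally be added through a migration instead.

```sql
-- Index for the created_at filter on system_events.
ALTER TABLE system_events
  ADD INDEX index_system_events_on_created_at (created_at);

-- One composite index on both columns, in this order.
ALTER TABLE parser_runs
  ADD INDEX index_parser_runs_on_type_and_completed_at (`type`, completed_at);

-- Re-check the plan for the query from the slow query log.
EXPLAIN
SELECT `parser_runs`.*
FROM `parser_runs`
INNER JOIN `system_events` ON `system_events`.`parser_run_id` = `parser_runs`.`id`
WHERE `parser_runs`.`type` IN ('DatasetParserRun')
  AND `parser_runs`.`completed_at` IS NULL
  AND (system_events.created_at <= '2017-08-05 04:03:09');
```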
INDEX(type, completed_at)
INDEX(completed_at, type)
INDEX(created_at, parser_run_id)
INDEX(parser_run_id, created_at)
It is not obvious which indexes the Optimizer will prefer; add all of those.
Don't use joins. Instead, break the join query into separate queries, store their results in variables, and build the desired result from that data.

MYSQL, very slow order by

I have two tables. One is a User table with a primary key on user_id, and the other table references the User table with a foreign key.
The User table has only one entry (for now) and the other table has one million entries.
The following join drives me mad:
SELECT p0_.*, p1_.*
FROM photo p0_, User p1_
WHERE p0_.user_id = p1_.user_id
ORDER BY p0_.uploaddate DESC Limit 10 OFFSET 100000
The query takes 12sec on a very fast machine with the order by and 0.0005 sec without the order by.
I've got an index on user_id (IDX_14B78418A76ED395) and a composite index ("search2") on user_id and uploaddate.
EXPLAIN shows the following:
+----+-------------+-------+------+------------------------------+----------------------+---------+---------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+------------------------------+----------------------+---------+---------------------+-------+---------------------------------+
| 1 | SIMPLE | p1_ | ALL | PRIMARY | NULL | NULL | NULL | 1 | Using temporary; Using filesort |
| 1 | SIMPLE | p0_ | ref | IDX_14B78418A76ED395,search2 | IDX_14B78418A76ED395 | 4 | odsfoto.p1_.user_id | 58520 | |
+----+-------------+-------+------+------------------------------+----------------------+---------+---------------------+-------+---------------------------------+
Table definitions:
CREATE TABLE `photo` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`album_id` int(11) DEFAULT NULL,
`exif_id` int(11) DEFAULT NULL,
`title` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`width` int(11) NOT NULL,
`height` int(11) NOT NULL,
`uploaddate` datetime NOT NULL,
`filesize` int(11) DEFAULT NULL,
`path` varchar(200) COLLATE utf8_unicode_ci NOT NULL,
`originalFilename` varchar(200) COLLATE utf8_unicode_ci NOT NULL,
`mimeType` varchar(200) COLLATE utf8_unicode_ci NOT NULL,
`description` longtext COLLATE utf8_unicode_ci,
`gpsData_id` int(11) DEFAULT NULL,
`views` int(11) DEFAULT NULL,
`likes` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UNIQ_14B78418B0FC9251` (`exif_id`),
UNIQUE KEY `UNIQ_14B7841867E96507` (`gpsData_id`),
KEY `IDX_14B78418A76ED395` (`user_id`),
KEY `IDX_14B784181137ABCF` (`album_id`),
KEY `search_idx` (`uploaddate`),
KEY `search2` (`user_id`,`uploaddate`),
KEY `search3` (`uploaddate`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `user` (
`user_id` int(11) NOT NULL,
`photoCount` int(11) NOT NULL,
`photoViews` int(11) NOT NULL,
`photoComments` int(11) NOT NULL,
`photoLikes` int(11) NOT NULL,
`username` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
What can I do to speed up this query?
Seems you're suffering from MySQL's inability to do late row lookups:
MySQL ORDER BY / LIMIT performance: late row lookups
Late row lookups: InnoDB
Try this:
SELECT p.*, u.*
FROM (
SELECT id
FROM photo
ORDER BY
uploaddate DESC, id DESC
LIMIT 10
OFFSET 100000
) pi
JOIN photo p
ON p.id = pi.id
JOIN user u
ON u.user_id = p.user_id
You need a separate index on uploaddate. The sort can take advantage of a composite index only if uploaddate is the first column in it.
You can also try to add user_id to ORDER BY:
....
ORDER BY p0_.user_id, p0_.uploaddate
You have two problems:
1. You need to create an INDEX(user_id, uploaddate), which will greatly increase the efficiency of the query.
2. You need to find a workaround to using LIMIT 10 OFFSET 100000. MySQL is creating a recordset with 100,000 records in it, then it pulls the last 10 records off the end... that is extremely inefficient.
https://www.percona.com/blog/2006/09/01/mysql-order-by-limit-performance-optimization/
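One common workaround, not taken from the answer above, is keyset (or "seek") pagination: instead of skipping 100,000 rows, the application remembers the last row of the previous page and asks for the rows that sort after it. The sketch below assumes the existing index on uploaddate is used; @last_uploaddate and @last_id are hypothetical user variables with made-up example values.

```sql
-- Values taken from the last row of the previous page (examples only).
SET @last_uploaddate = '2013-01-01 00:00:00';
SET @last_id = 12345;

-- Next page: rows that sort strictly after the remembered row.
-- id breaks ties between photos with the same uploaddate.
SELECT p.*, u.*
FROM photo p
JOIN user u
  ON u.user_id = p.user_id
WHERE p.uploaddate <= @last_uploaddate
  AND (p.uploaddate < @last_uploaddate OR p.id < @last_id)
ORDER BY p.uploaddate DESC, p.id DESC
LIMIT 10;
```

The trade-off is that this only supports "next page" style navigation; jumping directly to an arbitrary page number still needs an offset.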
First, fetch the keys with a query that has no join, then use that result to run the joined query.
For example:
$userIds=mysql::select("select user_id from photo p0_ ORDER BY p0_.uploaddate DESC Limit 10 OFFSET 100000");
$photoData=mysql::select("SELECT p0_.*, p1_.*
FROM photo p0_, User p1_
WHERE p0_.user_id = p1_.user_id and p0_.user_id in ($userIds->user_id) order by p0_.uploaddate");
Here we have divided the statement into two parts:
1. The first query can sort and fetch the keys easily, because it has no joins.
2. The second query filters on the fetched ids and sorts only a limited number of rows, so it returns in less time.
From 30 seconds to 0.015 sec / 0.000 sec using Quassnoi's answer!
This is what I call MySQL expertise!
I cut out one join from my personal project (no join with itself):
Select ser.id_table, ser.id_rec, ser.relevance, cnt, title, description, sell_url, medium_thumb,
unique_id_supplier, keywords, width, height, media_type
from (
Select ser.id_rec, ser.id_table, ser.relevance, ser.cnt
from searchEngineResults ser
where thisSearch = 16287
order by ser.relevance desc, cnt desc, id_rec
) ser
join photo_resell sou on sou.id = ser.id_rec
#join searchEngineResults ser on ser.id_rec = tmp.id_rec
limit 0, 9

Need help optimizing MYSQL query with join

I'm doing a join between the "favorites" table (3 million rows) and the "items" table (600k rows).
The query is taking anywhere from .3 seconds to 2 seconds, and I'm hoping I can optimize it some.
Favorites.faver_profile_id and Items.id are indexed.
Instead of using the faver_profile_id index I created a new index on (faver_profile_id,id), which eliminated the filesort needed when sorting by id. Unfortunately this index doesn't help at all and I'll probably remove it (yay, 3 more hours of downtime to drop the index..)
Any ideas on how I can optimize this query?
In case it helps:
Favorite.removed and Item.removed are "0" 98% of the time.
Favorite.collection_id is NULL about 80% of the time.
SELECT `Item`.`id`, `Item`.`source_image`, `Item`.`cached_image`, `Item`.`source_title`, `Item`.`source_url`, `Item`.`width`, `Item`.`height`, `Item`.`fave_count`, `Item`.`created`
FROM `favorites` AS `Favorite`
LEFT JOIN `items` AS `Item`
ON (`Item`.`removed` = 0 AND `Favorite`.`notice_id` = `Item`.`id`)
WHERE ((`faver_profile_id` = 1) AND (`collection_id` IS NULL) AND (`Favorite`.`removed` = 0) AND (`Item`.`removed` = '0'))
ORDER BY `Favorite`.`id` desc LIMIT 50;
+----+-------------+----------+--------+---------------------------------------------------------------+------------------+---------+-----------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+--------+---------------------------------------------------------------+------------------+---------+-----------------------------------------+------+-------------+
| 1 | SIMPLE | Favorite | ref | notice_id,faver_profile_id,collection_id_idx,idx_faver_idx_id | idx_faver_idx_id | 4 | const | 7910 | Using where |
| 1 | SIMPLE | Item | eq_ref | PRIMARY | PRIMARY | 4 | gragland_imgfavebeta.Favorite.notice_id | 1 | Using where |
+----+-------------+----------+--------+---------------------------------------------------------------+------------------+---------+-----------------------------------------+------+-------------+

| Table | Create Table |

| favorites | CREATE TABLE `favorites` (
`id` int(11) NOT NULL auto_increment COMMENT 'unique identifier',
`faver_profile_id` int(11) NOT NULL default '0',
`collection_id` int(11) default NULL,
`collection_order` int(8) default NULL,
`created` datetime NOT NULL default '0000-00-00 00:00:00' COMMENT 'date this record was created',
`modified` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP COMMENT 'date this record was modified',
`notice_id` int(11) NOT NULL default '0',
`removed` tinyint(1) NOT NULL default '0',
PRIMARY KEY (`id`),
KEY `notice_id` (`notice_id`),
KEY `faver_profile_id` (`faver_profile_id`),
KEY `collection_id_idx` (`collection_id`),
KEY `idx_faver_idx_id` (`faver_profile_id`,`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |


| Table | Create Table |

| items |CREATE TABLE `items` (
`id` int(11) NOT NULL auto_increment COMMENT 'unique identifier',
`submitter_id` int(11) NOT NULL default '0' COMMENT 'who made the update',
`source_image` varchar(255) default NULL COMMENT 'update content',
`cached_image` varchar(255) default NULL,
`source_title` varchar(255) NOT NULL default '',
`source_url` text NOT NULL,
`width` int(4) NOT NULL default '0',
`height` int(4) NOT NULL default '0',
`status` varchar(122) NOT NULL default '',
`popular` int(1) NOT NULL default '0',
`made_popular` timestamp NULL default NULL,
`fave_count` int(9) NOT NULL default '0',
`tags` text,
`user_art` tinyint(1) NOT NULL default '0',
`nudity` tinyint(1) NOT NULL default '0',
`created` datetime NOT NULL default '0000-00-00 00:00:00' COMMENT 'date this record was created',
`modified` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP COMMENT 'date this record was modified',
`removed` int(1) NOT NULL default '0',
`nofront` tinyint(1) NOT NULL default '0',
`test` varchar(10) NOT NULL default '',
`recs` text,
`recs_data` text,
PRIMARY KEY (`id`),
KEY `notice_profile_id_idx` (`submitter_id`),
KEY `content` (`source_image`),
KEY `idx_popular` (`popular`),
KEY `idx_madepopular` (`made_popular`),
KEY `idx_favecount_idx_id` (`fave_count`,`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |

First of all, you order by favorites.id, which is the clustered primary key of the favorites table. This will not be necessary if you join favorites to items instead of items to favorites.
Second, (Item.removed = '0') in the WHERE clause is redundant, because the same condition is already used in the JOIN.
Third, change the order of the conditions in the join to:
`Favorite`.`notice_id` = `Item`.`id` AND `Item`.`removed` = 0
The optimizer will then be able to use your primary key as the index. You may even consider creating an (id, removed) index on the items table.
Next, create a (faver_profile_id, removed) index on favorites (or, better, extend the existing faver_profile_id index) and change the order of the conditions in WHERE to the following:
(`faver_profile_id` = 1)
AND (`Favorite`.`removed` = 0)
AND (`collection_id` IS NULL)
UPD: Sorry, I missed that you already join favorites to items. In that case the ORDER BY is not needed. You should end up with something like the following:
SELECT
`Item`.`id`,
`Item`.`source_image`,
`Item`.`cached_image`,
`Item`.`source_title`,
`Item`.`source_url`,
`Item`.`width`,
`Item`.`height`,
`Item`.`fave_count`,
`Item`.`created`
FROM `favorites` AS `Favorite`
LEFT JOIN `items` AS `Item`
ON (`Favorite`.`notice_id` = `Item`.`id` AND `Item`.`removed` = 0)
WHERE `faver_profile_id` = 1
AND `Favorite`.`removed` = 0
AND `collection_id` IS NULL
LIMIT 50;
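A possible variant, offered as a sketch and not as the answerer's query: SQL does not guarantee any row order without ORDER BY, so if the 50 most recent favorites are wanted, the ORDER BY from the question can stay. The existing idx_faver_idx_id (faver_profile_id, id) index can be read in reverse for a fixed faver_profile_id, which matches the question's report that it removed the filesort. Also, the question's WHERE condition on Item.removed discards the NULL rows a LEFT JOIN would produce, so that query already behaves like an inner join; the sketch assumes this is intended and writes it explicitly.

```sql
-- Sketch only: keeps the question's ordering and makes the join type explicit.
SELECT
  `Item`.`id`,
  `Item`.`source_image`,
  `Item`.`cached_image`,
  `Item`.`source_title`,
  `Item`.`source_url`,
  `Item`.`width`,
  `Item`.`height`,
  `Item`.`fave_count`,
  `Item`.`created`
FROM `favorites` AS `Favorite`
INNER JOIN `items` AS `Item`
  ON `Item`.`id` = `Favorite`.`notice_id`
WHERE `Favorite`.`faver_profile_id` = 1
  AND `Favorite`.`removed` = 0
  AND `Favorite`.`collection_id` IS NULL
  AND `Item`.`removed` = 0        -- only non-removed items are returned
ORDER BY `Favorite`.`id` DESC     -- newest favorites first
LIMIT 50;
```

Running EXPLAIN on both versions shows whether "Using filesort" appears in the Extra column.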
And one more thing: when you have KEY idx_faver_idx_id (faver_profile_id,id) you do not need KEY faver_profile_id (faver_profile_id), because it just duplicates the leading column of idx_faver_idx_id. I hope you will extend the faver_profile_id index instead, as I suggested.
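One way to act on this, as a sketch: MySQL accepts several index changes in a single ALTER TABLE, so the redundant index can be dropped and the extended one added in one statement instead of two. The index name idx_faver_removed is hypothetical, and how long the statement runs (and whether it blocks writes) depends on the MySQL version and table size, so it is worth timing on a copy first.

```sql
-- Replace the single-column index with the extended one in one statement.
-- `idx_faver_removed` is a hypothetical name chosen for this example.
ALTER TABLE `favorites`
  DROP INDEX `faver_profile_id`,
  ADD INDEX `idx_faver_removed` (`faver_profile_id`, `removed`);

-- Confirm which indexes remain on the table.
SHOW INDEX FROM `favorites`;
```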
Get a copy of your table from a backup, and try creating an index on the favorites table that covers all of the WHERE and JOIN conditions, namely (removed, collection_id, faver_profile_id). Do the same with items. It might help, but it can make inserts noticeably slower.
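Testing this on a copy could look roughly like the following. The table name favorites_copy and the index name idx_cover_test are hypothetical, and the column order (faver_profile_id first) is my assumption, not something the suggestion specifies. The sketch clones the table on the server, which is an alternative when restoring from a backup is not practical.

```sql
-- Scratch copy: CREATE TABLE ... LIKE clones columns and indexes, not data.
CREATE TABLE `favorites_copy` LIKE `favorites`;
INSERT INTO `favorites_copy` SELECT * FROM `favorites`;

-- Candidate index covering the three filter columns on favorites.
ALTER TABLE `favorites_copy`
  ADD INDEX `idx_cover_test` (`faver_profile_id`, `removed`, `collection_id`);

-- Compare this plan with the plan for the original table.
EXPLAIN
SELECT `Item`.`id`
FROM `favorites_copy` AS `Favorite`
JOIN `items` AS `Item`
  ON `Item`.`id` = `Favorite`.`notice_id`
WHERE `Favorite`.`faver_profile_id` = 1
  AND `Favorite`.`removed` = 0
  AND `Favorite`.`collection_id` IS NULL
  AND `Item`.`removed` = 0
ORDER BY `Favorite`.`id` DESC
LIMIT 50;

-- Clean up once the comparison is done.
DROP TABLE `favorites_copy`;
```

If the plan and timings on the copy are no better than with idx_faver_idx_id, the extra index is probably not worth its write cost.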
The SQL engine won't use an index if it still has to do full table scan due to constraints, would it?