I ran the following query on MySQL 5.5 and MariaDB 10.2. It took only 4 seconds on MariaDB, but about 6 minutes on MySQL.
SELECT
' ' AS max_claim_amount,
claims_only_A.*
FROM
(
SELECT
employee.emp_number AS emp_number,
' ' AS emp_id,
' ' AS emp_name,
NULL AS estimate_id,
NULL AS estimate_submitted_date,
NULL AS estimate_state,
NULL AS currency_for_reimbursement,
NULL AS cash_in_advance,
NULL AS estimate_purpose,
NULL AS estimate_exp_type,
NULL AS estimate_foreign_currency,
NULL AS estimate_exchange_rate,
NULL AS estimate_amount,
NULL AS claim_id,
NULL AS claim_currency_for_reimbursement,
NULL AS claimed_date,
NULL AS claim_exp_type,
cety.id AS claim_exp_type_id,
claim_cc.currency_id AS claim_foreign_currency,
cex.exchange_rate AS claim_exchange_rate,
cex.amount AS claim_amount,
cex.remarks AS claim_remarks,
employee.deleted_at AS emp_deleted_at,
employee.purged_at AS emp_purged_at,
employee.termination_id AS emp_termination_id,
employee.emp_lastname AS emp_lastname,
el.location_id AS emp_location_id,
employee.job_title_code AS emp_job_title_code,
employee.work_station AS emp_work_station,
cr.request_id AS claim_request_id,
employee.emp_status AS emp_status,
NULL AS estimate_sort_id,
cex.id AS claim_exp_id
FROM
`claim_request` cr
LEFT JOIN `claim_expense` cex ON cex.request_id = cr.id
LEFT JOIN `claim_expense_type` cety ON cex.expense_type_id = cety.id
LEFT JOIN `_employee` AS employee ON cr.emp_number = employee.emp_number
LEFT JOIN claim_currency claim_cc ON (claim_cc.id = cex.currency_id)
LEFT JOIN claim_currency claim_req_cc ON (claim_req_cc.id = cr.currency_id)
LEFT JOIN _emp_locations el ON(employee.emp_number = el.emp_number)
WHERE
cr.id NOT IN (
SELECT
claim_request_id
FROM
`claim_estimation_claiming`
)
) AS claims_only_A
WHERE
(claim_request_id, claim_amount) NOT IN (
SELECT
claim_request_id,
MAX(claim_amount)
FROM
(
SELECT
cr.request_id AS claim_request_id,
cex.amount AS claim_amount,
cety.name AS claim_expense_type,
cex.id AS claim_exp_id
FROM
`claim_request` cr
LEFT JOIN `claim_expense` cex ON cex.request_id = cr.id
LEFT JOIN `claim_expense_type` cety ON cex.expense_type_id = cety.id
WHERE
cr.id NOT IN (
SELECT
claim_request_id
FROM
`claim_estimation_claiming`
)
) AS A
GROUP BY
claim_request_id,
claim_expense_type
)
The EXPLAIN output for each run was as follows:
-- MYSQL 5.5
+----+--------------------+--------------------------------+----------------+------------------+------------------+---------+-----------------------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------------------------+----------------+------------------+------------------+---------+-----------------------------------------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2876 | Using where |
| 4 | DEPENDENT SUBQUERY | <derived5> | ALL | NULL | NULL | NULL | NULL | 2876 | Using temporary; Using filesort |
| 5 | DERIVED | cr | ALL | NULL | NULL | NULL | NULL | 1131 | Using where |
| 5 | DERIVED | cex | ref | request_id | request_id | 5 | dbname.cr.id | 1 | |
| 5 | DERIVED | cety | eq_ref | PRIMARY | PRIMARY | 4 | dbname.cex.expense_type_id | 1 | |
| 6 | DEPENDENT SUBQUERY | claim_estimation_claiming | index_subquery | claim_request_id | claim_request_id | 5 | func | 2 | Using index |
| 2 | DERIVED | cr | ALL | NULL | NULL | NULL | NULL | 1131 | Using where |
| 2 | DERIVED | cex | ref | request_id | request_id | 5 | dbname.cr.id | 1 | |
| 2 | DERIVED | cety | eq_ref | PRIMARY | PRIMARY | 4 | dbname.cex.expense_type_id | 1 | Using index |
| 2 | DERIVED | employee | eq_ref | PRIMARY | PRIMARY | 4 | dbname.cr.emp_number | 1 | |
| 2 | DERIVED | claim_cc | eq_ref | PRIMARY | PRIMARY | 4 | dbname.cex.currency_id | 1 | |
| 2 | DERIVED | claim_req_cc | eq_ref | PRIMARY | PRIMARY | 4 | dbname.cr.currency_id | 1 | Using index |
| 2 | DERIVED | el | ref | PRIMARY | PRIMARY | 4 | dbname.employee.emp_number | 1 | Using index |
| 3 | DEPENDENT SUBQUERY | claim_estimation_claiming | index_subquery | claim_request_id | claim_request_id | 5 | func | 2 | Using index |
+----+--------------------+--------------------------------+----------------+------------------+------------------+---------+-----------------------------------------+------+---------------------------------+
-- MARIADB 10.2
+------+--------------+--------------------------------+--------+------------------+------------------+---------+-----------------------------------------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+--------------------------------+--------+------------------+------------------+---------+-----------------------------------------+------+------------------------------+
| 1 | PRIMARY | cr | ALL | NULL | NULL | NULL | NULL | 920 | Using where |
| 1 | PRIMARY | cex | ref | request_id | request_id | 5 | dbname.cr.id | 1 | Using where |
| 1 | PRIMARY | cety | eq_ref | PRIMARY | PRIMARY | 4 | dbname.cex.expense_type_id | 1 | Using where; Using index |
| 1 | PRIMARY | employee | eq_ref | PRIMARY | PRIMARY | 4 | dbname.cr.emp_number | 1 | Using where |
| 1 | PRIMARY | claim_cc | eq_ref | PRIMARY | PRIMARY | 4 | dbname.cex.currency_id | 1 | Using where |
| 1 | PRIMARY | el | ref | PRIMARY | PRIMARY | 4 | dbname.employee.emp_number | 1 | Using where; Using index |
| 4 | MATERIALIZED | cr | ALL | NULL | NULL | NULL | NULL | 920 | Using where; Using temporary |
| 4 | MATERIALIZED | cex | ref | request_id | request_id | 5 | dbname.cr.id | 1 | |
| 4 | MATERIALIZED | cety | eq_ref | PRIMARY | PRIMARY | 4 | dbname.cex.expense_type_id | 1 | Using where |
| 6 | MATERIALIZED | claim_estimation_claiming | index | claim_request_id | claim_request_id | 5 | NULL | 1 | Using index |
| 3 | MATERIALIZED | claim_estimation_claiming | index | claim_request_id | claim_request_id | 5 | NULL | 1 | Using index |
+------+--------------+--------------------------------+--------+------------------+------------------+---------+-----------------------------------------+------+------------------------------+
I tried running the subqueries separately, and they showed no delay in MySQL. The problem seems to occur only when the query is run as a whole.
As far as I can tell from the EXPLAIN output, the issue is that MySQL 5.5 has more ALL values in the type column (meaning MySQL has to scan every row of that table or derived table).
Does anyone have a better explanation, or a way to improve this query?
Turn
NOT IN ( SELECT ... )
into either
NOT EXISTS ( SELECT ... )
or
LEFT JOIN ... WHERE .. IS NULL
Then see if you can get rid of more subqueries.
If those don't speed it up enough, I'll look again.
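As a minimal sketch of the first rewrite, the inner filter on claim_estimation_claiming from the question could be expressed with NOT EXISTS. The alias cec is introduced here only for illustration. One assumption to check first: the two forms agree only if claim_estimation_claiming.claim_request_id never contains NULL, because NOT IN returns no rows at all when the subquery yields a NULL.

```sql
-- Original form:
--   WHERE cr.id NOT IN (SELECT claim_request_id FROM claim_estimation_claiming)
SELECT cr.id, cr.request_id
FROM claim_request cr
WHERE NOT EXISTS (
    SELECT 1
    FROM claim_estimation_claiming cec   -- cec is an illustrative alias
    WHERE cec.claim_request_id = cr.id
);
```

The same pattern would apply to both places where the question uses this NOT IN filter.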
The likely reason is that MariaDB has a feature that MySQL does not have: subquery caching. Also, MySQL has 3 major versions after 5.5, and there are some significant optimization improvements in them.
It would be interesting to see SHOW VARIABLES and SHOW GLOBAL STATUS from each server (plus RAM size). From those, I can probably confirm whether the caching is in use.
Meanwhile, my goal in the reformulation suggestions is to speed up MySQL (and maybe MariaDB).
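A few statements that could be run on each server to gather this, as a rough sketch. The subquery_cache and materialization flags in optimizer_switch and the Subquery_cache counters are MariaDB features; on MySQL 5.5 the status query is expected to return an empty set.

```sql
SELECT @@version;

-- On MariaDB, look for subquery_cache=on and materialization=on in the result.
SELECT @@optimizer_switch;

-- MariaDB only: hit/miss counters for the subquery cache.
SHOW GLOBAL STATUS LIKE 'Subquery_cache%';

-- Full dumps, as requested above.
SHOW VARIABLES;
SHOW GLOBAL STATUS;
```

The EXPLAIN output in the question already hints at the difference: MariaDB reports the subqueries as MATERIALIZED, while MySQL 5.5 reports them as DEPENDENT SUBQUERY.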
It looks like there is a redundant LEFT JOIN that isn't used anywhere, so it will be helpful to remove it: claim_req_cc
Try to modify NOT IN to NOT EXISTS, as Rick mentioned.
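Putting both suggestions together, the FROM clause of the inner query might look like the sketch below. It uses the LEFT JOIN ... IS NULL variant mentioned earlier in this thread; the column list is shortened for readability and the alias cec is hypothetical. It assumes claim_estimation_claiming.claim_request_id is never NULL, otherwise the result can differ from the NOT IN version.

```sql
SELECT
    cr.request_id AS claim_request_id,
    cex.id        AS claim_exp_id,
    cex.amount    AS claim_amount
FROM claim_request cr
LEFT JOIN claim_expense cex       ON cex.request_id = cr.id
LEFT JOIN claim_expense_type cety ON cex.expense_type_id = cety.id
LEFT JOIN _employee AS employee   ON cr.emp_number = employee.emp_number
LEFT JOIN claim_currency claim_cc ON claim_cc.id = cex.currency_id
-- The join to claim_currency AS claim_req_cc is dropped: none of its columns are selected.
LEFT JOIN _emp_locations el       ON employee.emp_number = el.emp_number
-- Anti-join: keep only claim requests with no row in claim_estimation_claiming.
LEFT JOIN claim_estimation_claiming cec ON cec.claim_request_id = cr.id
WHERE cec.claim_request_id IS NULL;
```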
I am trying to find a chain of exactly 2^1024 length; here is a simplified version:
CREATE VIEW full2 AS SELECT a.id1,b.id2 FROM gt a JOIN gt b ON a.id2=b.id1;
CREATE VIEW full3 AS SELECT a.id1,b.id2 FROM full2 a JOIN full2 b ON a.id2=b.id1;
CREATE VIEW full4 AS SELECT a.id1,b.id2 FROM full3 a JOIN full3 b ON a.id2=b.id1;
The problem is that a lot of rows die out after f2 (full2), but somehow MySQL's execution plan is
+----+-------------+-------+------------+-------+---------------+------+---------+--------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+--------------+------+----------+--------------------------+
| 1 | SIMPLE | gt1 | NULL | index | id1,id2 | id2 | 4 | NULL | 33 | 81.00 | Using where; Using index |
| 1 | SIMPLE | gt2 | NULL | ref | id1,id2 | id2 | 4 | picr.gt1.id1 | 1 | 90.00 | Using where; Using index |
| 1 | SIMPLE | gt2 | NULL | ref | id1,id2 | id1 | 4 | picr.gt1.id2 | 1 | 90.00 | Using where; Using index |
| 1 | SIMPLE | gt1 | NULL | ref | id1,id2 | id2 | 4 | picr.gt2.id1 | 1 | 90.00 | Using where; Using index |
| 1 | SIMPLE | gt1 | NULL | ref | id1,id2 | id1 | 4 | picr.gt2.id2 | 1 | 90.00 | Using where; Using index |
| 1 | SIMPLE | gt2 | NULL | ref | id1,id2 | id2 | 4 | picr.gt1.id1 | 1 | 90.00 | Using where; Using index |
| 1 | SIMPLE | gt2 | NULL | ref | id1 | id1 | 4 | picr.gt1.id2 | 1 | 100.00 | Using index |
| 1 | SIMPLE | gt1 | NULL | ref | id2 | id2 | 4 | picr.gt2.id1 | 1 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+------+---------+--------------+------+----------+--------------------------+
which I read as follows:
for each row in comparison1 whose id2 MIGHT exist in comparison.id1 {
    for each row in comparison2 whose id2 MIGHT exist in comparison.id1 and whose id1 MIGHT exist in comparison.id2 {
        for each row in comparison3 whose id2 MIGHT exist in comparison.id1 and whose id1 MIGHT exist in comparison.id2 {
            ...
            does comparison1.id2=comparison2.id1 and comparison2.id2=comparison3.id1 and comparison3.id2=comparison4.id1 ...
            ...
        }
    }
}
This is a huge drain on query time. I actually don't have any rows in f2 (not yet, anyway), and if f2 is empty, then joining f2 to any other query should immediately stop the calculation. (It might start from right to left, but the problem is the same, since the right side will also be empty.) Instead, calculation time rises exponentially with each new view.
How can I do something like this?
for each row in comparison1 {
    for each row in comparison2 whose id1 exists in comparison1.id2 {
        if fail then skip
        for each row in comparison3 whose id1 exists in comparison2.id2 {
            if fail then skip
            ...
            does comparison1.id2=comparison2.id1 and comparison2.id2=comparison3.id1 and comparison3.id2=comparison4.id1 ...
            ...
        }
    }
}
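One way to get this "stop as soon as a level is empty" behaviour, offered here as an assumption and not as something from the original post, is to materialize each level into a real table instead of stacking views. The table names full2_t and full3_t are hypothetical. Each level is computed once, so if full2_t comes out empty, the next statement joins two empty tables and returns immediately.

```sql
-- Level 2: chains of two gt rows, stored with indexes on both ends.
CREATE TABLE full2_t (INDEX (id1), INDEX (id2))
    SELECT a.id1, b.id2
    FROM gt a
    JOIN gt b ON a.id2 = b.id1;

-- Level 3: built from the stored level 2 rows, not from a nested view.
CREATE TABLE full3_t (INDEX (id1), INDEX (id2))
    SELECT a.id1, b.id2
    FROM full2_t a
    JOIN full2_t b ON a.id2 = b.id1;

-- A cheap check before building the next level.
SELECT COUNT(*) FROM full3_t;
```

The trade-off is extra storage and the need to rebuild these tables when gt changes; in return, the optimizer never has to plan the whole chain as a single multi-way join.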
It is not possible? Yes it is! This is a stranger query, and I won't go into details (it was supposed to join everything from length 1 to 1024 without any unions, it saves id1, id2 and the previous id2, and it has a huge WHERE condition), but in essence it does more, and faster!
mysql> describe select * from part4_v2_3colleftjoin;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+------------------------------------------------+
| 1 | SIMPLE | p | NULL | index | NULL | id2 | 4 | NULL | 33 | 100.00 | Using index |
| 1 | SIMPLE | f | NULL | ALL | id1,id2 | NULL | NULL | NULL | 33 | 100.00 | Range checked for each record (index map: 0x3) |
| 1 | SIMPLE | p | NULL | ALL | id1,id2 | NULL | NULL | NULL | 33 | 100.00 | Range checked for each record (index map: 0x3) |
| 1 | SIMPLE | f | NULL | ALL | id1,id2 | NULL | NULL | NULL | 33 | 100.00 | Range checked for each record (index map: 0x3) |
| 1 | SIMPLE | p | NULL | index | id1 | id2 | 4 | NULL | 33 | 100.00 | Using where; Using index |
| 1 | SIMPLE | f | NULL | ALL | id1,id2 | NULL | NULL | NULL | 33 | 100.00 | Range checked for each record (index map: 0x3) |
| 1 | SIMPLE | p | NULL | ALL | id1,id2 | NULL | NULL | NULL | 33 | 100.00 | Range checked for each record (index map: 0x3) |
| 1 | SIMPLE | f | NULL | ALL | id1,id2 | NULL | NULL | NULL | 33 | 100.00 | Range checked for each record (index map: 0x3) |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+------------------------------------------------+
I have a rather big query that collects a lot of information from several tables, using joins. The database holds GTFS transit information for a city's public transportation system.
I'm running the same query with a different WHERE clause, and the time taken can go from 200ms to 200s.
If you don't need the explanation, scroll down straight to The problem.
The database
The tables are:
routes
trips: connected to routes using route_id
stop_times: connected to trips using trip_id
stops: connected to stop_times using stop_id
stop_connections: connects two stop_id
My goal is to select journeys using 2 connections. Here's how my query looks on paper:
Explanation:
The black information is the tables, one table type per line (ie. top line is the trips table).
the red information is the alias in my query (s is stops, st is stop_times, t is trips, a is arrival, d is departure, and the 1/2/3 is the trip index)
the green information is the list of conditions on each table
Basically it's:
[s1d ] Start from a given stop_id
[st1d] Get departure time of trips from that stop
[t1 ] Limit those trips to the set of route_id we want
[st1a] Get the arrival time of the stop
[s1a ] Get stop information (stop name)
[cs1 ] Connect this stop to all other stops in a walking distance
Repeat this operation 3 times to get 3 trips (2 connections), and filter the arrival stops to the one that I want.
The query
select
s1d.stop_id as s1d_id, s1d.stop_name as s1d_name, s1d.stop_lat as s1d_lat, s1d.stop_lon as s1d_lon,
st1d.departure_time as st1d_dep,
t1.trip_id as t1_id, t1.trip_headsign as t1_headsign, t1.route_id as t1_route, t1.direction_id as t1_dir,
st1a.departure_time as st1a_dep,
s1a.stop_id as s1a_id, s1a.stop_name as s1a_name, s1a.stop_lat as s1a_lat, s1a.stop_lon as s1a_lon,
cs1.from_stop_id, cs1.to_stop_id,
s2d.stop_id as s2d_id, s2d.stop_name as s2d_name, s2d.stop_lat as s2d_lat, s2d.stop_lon as s2d_lon,
st2d.departure_time as st2d_dep,
t2.trip_id as t2_id, t2.trip_headsign as t2_headsign, t2.route_id as t2_route, t2.direction_id as t2_dir,
st2a.departure_time as st2a_dep,
s2a.stop_id as s2a_id, s2a.stop_name as s2a_name, s2a.stop_lat as s2a_lat, s2a.stop_lon as s2a_lon,
cs2.from_stop_id, cs2.to_stop_id,
s3d.stop_id as s3d_id, s3d.stop_name as s3d_name, s3d.stop_lat as s3d_lat, s3d.stop_lon as s3d_lon,
st3d.departure_time as st3d_dep,
t3.trip_id as t3_id, t3.trip_headsign as t3_headsign, t3.route_id as t3_route, t3.direction_id as t3_dir,
st3a.departure_time as st3a_dep,
s3a.stop_id as s3a_id, s3a.stop_name as s3a_name, s3a.stop_lat as s3a_lat, s3a.stop_lon as s3a_lon
from stops s1d
left join stop_times st1d on st1d.stop_id = s1d.stop_id
and st1d.departure_time > '07:33:00' and st1d.departure_time < '08:33:00'
left join trips t1 on t1.trip_id = st1d.trip_id
and t1.service_id in (select service_id from calendar where start_date <= 20141020 and end_date >= 20141020 and monday = 1)
and t1.route_id in ('11-0')
left join stop_times st1a on st1a.trip_id = t1.trip_id
and st1a.departure_time > st1d.departure_time
left join stops s1a on s1a.stop_id = st1a.stop_id
left join stop_connections cs1 on cs1.from_stop_id = st1a.stop_id
left join stops s2d on s2d.stop_id = cs1.to_stop_id
left join stop_times st2d on st2d.stop_id = s2d.stop_id
and st2d.departure_time > addtime(st1a.departure_time, '00:03:00')
and st2d.departure_time < addtime(st1a.departure_time, '01:03:00')
left join trips t2 on t2.trip_id = st2d.trip_id
and t2.service_id in (select service_id from calendar where start_date <= 20141020 and end_date >= 20141020 and monday = 1)
and t2.route_id in ('3-0', 'NA-0', '4-0', '2-0')
left join stop_times st2a on st2a.trip_id = t2.trip_id and st2a.departure_time > st2d.departure_time
left join stops s2a on s2a.stop_id = st2a.stop_id
left join stop_connections cs2 on cs2.from_stop_id = st2a.stop_id
left join stops s3d on s3d.stop_id = cs2.to_stop_id
left join stop_times st3d on st3d.stop_id = s3d.stop_id
and st3d.departure_time > addtime(st2a.departure_time, '00:03:00')
and st3d.departure_time < addtime(st2a.departure_time, '01:03:00')
left join trips t3 on t3.trip_id = st3d.trip_id
and t3.service_id in (select service_id from calendar where start_date <= 20141020 and end_date >= 20141020 and monday = 1)
and t3.route_id in ('36-0', '30-0', '97-0')
left join stop_times st3a on st3a.trip_id = t3.trip_id
and st3a.departure_time > st3d.departure_time
and st3a.stop_id in ('StopPoint:CLBO2',
'StopArea:CLBO',
'StopPoint:CLBO1',
'StopPoint:PLTI2',
'StopPoint:LCBU2',
'StopArea:LCBU',
'StopPoint:LCBU1',
'StopPoint:MHDI2',
'StopPoint:BILE2',
'StopArea:MHDI',
'StopPoint:MHDI1',
'StopPoint:MREZ2',
'StopArea:MRDI',
'StopPoint:MRDI1',
'StopArea:SORI',
'StopPoint:SORI1',
'StopArea:MREZ',
'StopPoint:MREZ1',
'StopPoint:SORI2',
'StopArea:BILE',
'StopPoint:BILE1',
'StopPoint:MRDI2',
'StopArea:PLTI',
'StopPoint:PLTI1',
'StopPoint:SEIL3',
'StopPoint:SEIL2',
'StopArea:SEIL',
'StopPoint:SEIL1')
left join stops s3a on s3a.stop_id = st3a.stop_id
where s1d.stop_id = 'StopPoint:DEMO1'
group by s1d_id, s3a_id
having s3a_id is not null
order by s1d_id asc, st1d_dep asc, st1a_dep asc, s1a_id asc, s2d_id asc, st2d_dep asc, st2a_dep asc, s2a_id asc, s3d_id asc, st3d_dep asc, st3a_dep asc, s3a_id asc
The problem
I run this query twice, the only difference is in the where clause at the end:
where s1d.stop_id = 'StopPoint:DEMO1': 13 rows in set (2 min 58.81 sec)
where s1d.stop_id = 'StopPoint:ECTE2': Empty set (0.25 sec)
Now that's very strange to me. Here's the explain for both queries:
Departing from DEMO1 (13 results, slow)
Using EXPLAIN SELECT…:
+----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+-------------+
| 1 | SIMPLE | s1d | ALL | NULL | NULL | NULL | NULL | 3411 | NULL |
| 1 | SIMPLE | st1d | ref | st_stop_id_idx | st_stop_id_idx | 302 | bicou_gtfs_nantes.s1d.stop_id | 163 | Using where |
| 1 | SIMPLE | t1 | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY | 302 | bicou_gtfs_nantes.st1d.trip_id | 1 | Using where |
| 1 | SIMPLE | calendar | eq_ref | PRIMARY,service_id | PRIMARY | 302 | bicou_gtfs_nantes.t1.service_id | 1 | Using where |
| 1 | SIMPLE | st1a | ref | st_trip_id_idx | st_trip_id_idx | 302 | bicou_gtfs_nantes.t1.trip_id | 14 | Using where |
| 1 | SIMPLE | s1a | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.st1a.stop_id | 1 | NULL |
| 1 | SIMPLE | cs1 | ref | from_to_stop_ids | from_to_stop_ids | 302 | bicou_gtfs_nantes.st1a.stop_id | 1 | Using index |
| 1 | SIMPLE | s2d | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.cs1.to_stop_id | 1 | NULL |
| 1 | SIMPLE | st2d | ref | st_stop_id_idx | st_stop_id_idx | 302 | bicou_gtfs_nantes.s2d.stop_id | 163 | Using where |
| 1 | SIMPLE | t2 | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY | 302 | bicou_gtfs_nantes.st2d.trip_id | 1 | Using where |
| 1 | SIMPLE | calendar | eq_ref | PRIMARY,service_id | PRIMARY | 302 | bicou_gtfs_nantes.t2.service_id | 1 | Using where |
| 1 | SIMPLE | st2a | ref | st_trip_id_idx | st_trip_id_idx | 302 | bicou_gtfs_nantes.t2.trip_id | 14 | Using where |
| 1 | SIMPLE | s2a | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.st2a.stop_id | 1 | NULL |
| 1 | SIMPLE | cs2 | ref | from_to_stop_ids | from_to_stop_ids | 302 | bicou_gtfs_nantes.st2a.stop_id | 1 | Using index |
| 1 | SIMPLE | s3d | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.cs2.to_stop_id | 1 | NULL |
| 1 | SIMPLE | st3d | ref | st_stop_id_idx | st_stop_id_idx | 302 | bicou_gtfs_nantes.s3d.stop_id | 163 | Using where |
| 1 | SIMPLE | t3 | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY | 302 | bicou_gtfs_nantes.st3d.trip_id | 1 | Using where |
| 1 | SIMPLE | calendar | eq_ref | PRIMARY,service_id | PRIMARY | 302 | bicou_gtfs_nantes.t3.service_id | 1 | Using where |
| 1 | SIMPLE | st3a | ref | st_stop_id_idx,st_trip_id_idx | st_trip_id_idx | 302 | bicou_gtfs_nantes.t3.trip_id | 14 | Using where |
| 1 | SIMPLE | s3a | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.st3a.stop_id | 1 | NULL |
+----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+-------------+
Using EXPLAIN EXTENDED…:
+----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+----------+---------------------------------+
| 1 | SIMPLE | s1d | const | PRIMARY | PRIMARY | 302 | const | 1 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | st1d | ref | st_stop_id_idx | st_stop_id_idx | 302 | const | 234 | 100.00 | Using where |
| 1 | SIMPLE | t1 | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY | 302 | bicou_gtfs_nantes.st1d.trip_id | 1 | 100.00 | Using where |
| 1 | SIMPLE | calendar | eq_ref | PRIMARY,service_id | PRIMARY | 302 | bicou_gtfs_nantes.t1.service_id | 1 | 100.00 | Using where |
| 1 | SIMPLE | st1a | ref | st_trip_id_idx | st_trip_id_idx | 302 | bicou_gtfs_nantes.t1.trip_id | 14 | 100.00 | Using where |
| 1 | SIMPLE | s1a | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.st1a.stop_id | 1 | 100.00 | NULL |
| 1 | SIMPLE | cs1 | ref | from_to_stop_ids | from_to_stop_ids | 302 | bicou_gtfs_nantes.st1a.stop_id | 1 | 100.00 | Using index |
| 1 | SIMPLE | s2d | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.cs1.to_stop_id | 1 | 100.00 | NULL |
| 1 | SIMPLE | st2d | ref | st_stop_id_idx | st_stop_id_idx | 302 | bicou_gtfs_nantes.s2d.stop_id | 163 | 100.00 | Using where |
| 1 | SIMPLE | t2 | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY | 302 | bicou_gtfs_nantes.st2d.trip_id | 1 | 100.00 | Using where |
| 1 | SIMPLE | calendar | eq_ref | PRIMARY,service_id | PRIMARY | 302 | bicou_gtfs_nantes.t2.service_id | 1 | 100.00 | Using where |
| 1 | SIMPLE | st2a | ref | st_trip_id_idx | st_trip_id_idx | 302 | bicou_gtfs_nantes.t2.trip_id | 14 | 100.00 | Using where |
| 1 | SIMPLE | s2a | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.st2a.stop_id | 1 | 100.00 | NULL |
| 1 | SIMPLE | cs2 | ref | from_to_stop_ids | from_to_stop_ids | 302 | bicou_gtfs_nantes.st2a.stop_id | 1 | 100.00 | Using index |
| 1 | SIMPLE | s3d | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.cs2.to_stop_id | 1 | 100.00 | NULL |
| 1 | SIMPLE | st3d | ref | st_stop_id_idx | st_stop_id_idx | 302 | bicou_gtfs_nantes.s3d.stop_id | 163 | 100.00 | Using where |
| 1 | SIMPLE | t3 | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY | 302 | bicou_gtfs_nantes.st3d.trip_id | 1 | 100.00 | Using where |
| 1 | SIMPLE | calendar | eq_ref | PRIMARY,service_id | PRIMARY | 302 | bicou_gtfs_nantes.t3.service_id | 1 | 100.00 | Using where |
| 1 | SIMPLE | st3a | ref | st_stop_id_idx,st_trip_id_idx | st_trip_id_idx | 302 | bicou_gtfs_nantes.t3.trip_id | 14 | 100.00 | Using where |
| 1 | SIMPLE | s3a | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.st3a.stop_id | 1 | 100.00 | NULL |
+----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+----------+---------------------------------+
Departing from ECTE2 (0 result, fast)
+----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+---------------------------------+
| 1 | SIMPLE | s1d | const | PRIMARY | PRIMARY | 302 | const | 1 | Using temporary; Using filesort |
| 1 | SIMPLE | st1d | ref | st_stop_id_idx | st_stop_id_idx | 302 | const | 234 | Using where |
| 1 | SIMPLE | t1 | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY | 302 | bicou_gtfs_nantes.st1d.trip_id | 1 | Using where |
| 1 | SIMPLE | calendar | eq_ref | PRIMARY,service_id | PRIMARY | 302 | bicou_gtfs_nantes.t1.service_id | 1 | Using where |
| 1 | SIMPLE | st1a | ref | st_trip_id_idx | st_trip_id_idx | 302 | bicou_gtfs_nantes.t1.trip_id | 14 | Using where |
| 1 | SIMPLE | s1a | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.st1a.stop_id | 1 | NULL |
| 1 | SIMPLE | cs1 | ref | from_to_stop_ids | from_to_stop_ids | 302 | bicou_gtfs_nantes.st1a.stop_id | 1 | Using index |
| 1 | SIMPLE | s2d | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.cs1.to_stop_id | 1 | NULL |
| 1 | SIMPLE | st2d | ref | st_stop_id_idx | st_stop_id_idx | 302 | bicou_gtfs_nantes.s2d.stop_id | 163 | Using where |
| 1 | SIMPLE | t2 | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY | 302 | bicou_gtfs_nantes.st2d.trip_id | 1 | Using where |
| 1 | SIMPLE | calendar | eq_ref | PRIMARY,service_id | PRIMARY | 302 | bicou_gtfs_nantes.t2.service_id | 1 | Using where |
| 1 | SIMPLE | st2a | ref | st_trip_id_idx | st_trip_id_idx | 302 | bicou_gtfs_nantes.t2.trip_id | 14 | Using where |
| 1 | SIMPLE | s2a | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.st2a.stop_id | 1 | NULL |
| 1 | SIMPLE | cs2 | ref | from_to_stop_ids | from_to_stop_ids | 302 | bicou_gtfs_nantes.st2a.stop_id | 1 | Using index |
| 1 | SIMPLE | s3d | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.cs2.to_stop_id | 1 | NULL |
| 1 | SIMPLE | st3d | ref | st_stop_id_idx | st_stop_id_idx | 302 | bicou_gtfs_nantes.s3d.stop_id | 163 | Using where |
| 1 | SIMPLE | t3 | eq_ref | PRIMARY,trip_service_id,trip_route_id_idx | PRIMARY | 302 | bicou_gtfs_nantes.st3d.trip_id | 1 | Using where |
| 1 | SIMPLE | calendar | eq_ref | PRIMARY,service_id | PRIMARY | 302 | bicou_gtfs_nantes.t3.service_id | 1 | Using where |
| 1 | SIMPLE | st3a | ref | st_stop_id_idx,st_trip_id_idx | st_trip_id_idx | 302 | bicou_gtfs_nantes.t3.trip_id | 14 | Using where |
| 1 | SIMPLE | s3a | eq_ref | PRIMARY | PRIMARY | 302 | bicou_gtfs_nantes.st3a.stop_id | 1 | NULL |
+----+-------------+----------+--------+-------------------------------------------+------------------+---------+----------------------------------+------+---------------------------------+
Obviously the engine handles the two queries differently. Now why is a different question.
The s1d object is from the table stops:
CREATE TABLE IF NOT EXISTS `stops` (
`stop_id` VARCHAR(100) NOT NULL,
`stop_code` VARCHAR(50) NULL DEFAULT NULL,
`stop_name` VARCHAR(255) NOT NULL,
`stop_desc` VARCHAR(255) NULL DEFAULT NULL,
`stop_lat` DECIMAL(10,6) NOT NULL,
`stop_lon` DECIMAL(10,6) NOT NULL,
`zone_id` VARCHAR(255) NULL DEFAULT NULL,
`stop_url` VARCHAR(255) NULL DEFAULT NULL,
`location_type` VARCHAR(2) NULL DEFAULT NULL,
`parent_station` VARCHAR(100) NOT NULL,
`stop_timezone` VARCHAR(50) NULL DEFAULT NULL,
`wheelchair_boarding` TINYINT(1) NULL DEFAULT NULL,
PRIMARY KEY (`stop_id`),
INDEX `zone_id` (`zone_id` ASC),
INDEX `stop_lat` (`stop_lat` ASC),
INDEX `stop_lon` (`stop_lon` ASC),
INDEX `location_type` (`location_type` ASC),
INDEX `parent_station` (`parent_station` ASC),
CONSTRAINT `stop_parent_station`
FOREIGN KEY (`parent_station`)
REFERENCES `stops` (`stop_id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
I don't understand why, when there is no data, the engine properly uses indexes and keys, but when there is data (13 rows), it doesn't use them and scans about three thousand rows instead of one.
Is there any way for me to force the engine to use keys on specific tables?
Also, why is the engine behaving like this?
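For reference, MySQL does accept index hints per table. The following is a minimal sketch on the first two tables of the query only, using the index names shown in the EXPLAIN output above (PRIMARY on stops, st_stop_id_idx on stop_times). Whether the hint actually changes the plan for the full query is an assumption that would need to be checked with EXPLAIN.

```sql
SELECT s1d.stop_id, s1d.stop_name, st1d.departure_time
FROM stops s1d FORCE INDEX (PRIMARY)
LEFT JOIN stop_times st1d FORCE INDEX (st_stop_id_idx)
       ON st1d.stop_id = s1d.stop_id
      AND st1d.departure_time > '07:33:00'
      AND st1d.departure_time < '08:33:00'
WHERE s1d.stop_id = 'StopPoint:DEMO1';
```

FORCE INDEX tells the optimizer to treat a table scan as very expensive compared with the named index; it does not fix the join order.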
Environment:
OS: Mac OS X 10.10
SQL client: mysql Ver 14.14 Distrib 5.6.17, for osx10.6 (i386) using EditLine wrapper
SQL server: 5.6.21 MySQL Community Server (GPL)
Hardware: MacBook Air, Intel Core i7, 8GB RAM, 256GB SSD (should be fast)
Table sizes:
+-------------------+------------+
| table_name | TABLE_ROWS |
+-------------------+------------+
| agency | 0 |
| calendar | 28 |
| calendar_dates | 1005 |
| fare_attributes | 0 |
| fare_rules | 0 |
| feed_info | 0 |
| frequencies | 0 |
| route_connections | 20919 |
| routes | 60 |
| shapes | 0 |
| stop_connections | 11617 |
| stop_times | 768682 |
| stops | 3411 |
| stops_routes | 16652 |
| transfers | 0 |
| trips | 31913 |
+-------------------+------------+
Row count after each joined table:
+---------+-------------+------------+
| Table | DEMO1 | ECTE2 |
+---------+-------------+------------+
| s1d | 1 | 1 |
| st1d | 16 | 18 |
| t1 | 16 | 18 |
| st1a | 271 | 117 |
| s1a | 271 | 117 |
| cs1 | 1286 | 495 |
| s2d | 1286 | 495 |
| st2d | 32958 | 5973 |
| t2 | 32958 | 5973 |
| st2a | 65891 | 5973 |
| s2a | 65891 | 5973 |
| cs2 | 206455 | 5973 |
| s3d | 206455 | 5973 |
| st3d | 4284871 | 5973 |
| t3 | 4284871 | 5973 |
| st3a | 4351249 | 5973 |
| s3a | 4351249 | 5973 |
| +having | 13 | 0 |
+---------+-------------+------------+
Two thoughts come to mind:
1) Make sure the relevant indices are BTREE indices. HASH indices are good for equal/unequal comparisons, not IN(...). (HASH is the default only for MEMORY tables; InnoDB tables normally use BTREE already.) See here
2) See what the optimizer made with your queries. Do a
EXPLAIN EXTENDED SELECT ...
on both queries. This will give you a warning containing the query optimizer output. You should see a difference here.
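Concretely, the rewritten query is read with SHOW WARNINGS in the same session, right after the EXPLAIN EXTENDED. A shortened sketch using only the first join of the question's query:

```sql
EXPLAIN EXTENDED
SELECT s1d.stop_id, st1d.departure_time
FROM stops s1d
LEFT JOIN stop_times st1d ON st1d.stop_id = s1d.stop_id
WHERE s1d.stop_id = 'StopPoint:DEMO1';

-- The Message column of the resulting Note holds the query as the optimizer rewrote it.
SHOW WARNINGS;
```

Running this once per stop_id and comparing the two messages shows where the optimizer's treatment diverges.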
I have 2 MySQL servers. One node is the master and the other acts as a slave, replicating from the master.
The 2 nodes have identical data and schema.
However, one particular query is executed differently by MySQL on the two nodes.
query
EXPLAIN SELECT t.*, COUNT(h.id)
FROM tags t
INNER JOIN tags2articles s
ON t.id = s.tag_id
INNER JOIN tag_hits h
ON h.id = s.tag_id
INNER JOIN articles art
ON art.id = s.`article_id`
WHERE art.source_id IN (SELECT id FROM feeds WHERE source_id = 15074)
AND time_added > DATE_SUB(NOW(), INTERVAL 1 DAY)
AND t.type = '1'
GROUP BY t.id
HAVING COUNT(h.id) > 4
ORDER BY COUNT(h.id) DESC
LIMIT 15
Below is the output of the EXPLAIN query on both nodes. Note that the master node produces the correct plan.
output on master node
+----+--------------------+-------+-----------------+-----------------------------+---------------------+---------+----------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-----------------+-----------------------------+---------------------+---------+----------------+--------+----------------------------------------------+
| 1 | PRIMARY | art | ALL | PRIMARY | NULL | NULL | NULL | 100270 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | s | ref | PRIMARY,FK_tags2articles | FK_tags2articles | 4 | art.id | 12 | Using index |
| 1 | PRIMARY | h | ref | tags_hits_idx | tags_hits_idx | 4 | s.tag_id | 1 | Using index |
| 1 | PRIMARY | t | eq_ref | PRIMARY,tags_type_idx | PRIMARY | 4 | s.tag_id | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | feeds | unique_subquery | PRIMARY,f_source_id_idx | PRIMARY | 4 | func | 1 | Using where |
+----+--------------------+-------+-----------------+-----------------------------+---------------------+---------+----------------+--------+----------------------------------------------+
output on slave node
+----+--------------------+-------+-----------------+-----------------------------+------------------+---------+--------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-----------------+-----------------------------+------------------+---------+--------------------+--------+----------------------------------------------+
| 1 | PRIMARY | t | ref | PRIMARY,tags_type_idx | tags_type_idx | 2 | const | 206432 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | h | ref | tags_hits_idx | tags_hits_idx | 4 | t.id | 1 | Using index |
| 1 | PRIMARY | s | ref | PRIMARY,FK_tags2articles | PRIMARY | 4 | h.id | 2 | Using where; Using index |
| 1 | PRIMARY | art | eq_ref | PRIMARY | PRIMARY | 4 | s.article_id | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | feeds | unique_subquery | PRIMARY,f_source_id_idx | PRIMARY | 4 | func | 1 | Using where |
+----+--------------------+-------+-----------------+-----------------------------+------------------+---------+--------------------+--------+----------------------------------------------+
I cannot understand why this discrepancy exists. Any help?
Thanks
The two servers can have different index statistics, and that leads the optimizer to choose different indexes and join orders. If possible, run ANALYZE TABLE on all participating tables (it locks the table, so it is not always recommended); after that the query plans are likely to be the same.
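As a rough sketch, using the table names from the query in the question, the statistics could be inspected and refreshed like this. Run it on both the master and the slave, ideally at a quiet time because of the locking mentioned above.

```sql
-- Compare the Cardinality column on both nodes before refreshing anything.
SHOW INDEX FROM tags;
SHOW INDEX FROM tags2articles;

-- Recompute the index statistics for every table in the query.
ANALYZE TABLE tags, tags2articles, tag_hits, articles, feeds;

-- Then re-run the EXPLAIN from the question on each node and compare the plans.
```

By default ANALYZE TABLE is written to the binary log, so running it on the master should also reach the slave; ANALYZE LOCAL TABLE keeps it on the server where it is issued.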
I have the following query:
SELECT mdg.id AS mdg_id, vdk.id AS vdisk_id, vdk.name AS vdisk_name, vdk.objKey AS vdisk_key,
vdc.type AS copy_type, vde.copy_id AS copy_id, mdk.id AS mdisk_id, vde.number_extents AS extent_count,
IFNULL(ter.name,'generic_hdd') AS mdisk_tier, mdg.snapKey, vde.objKey AS row_key
FROM mdisk_grp AS mdg
INNER JOIN vdiskcopy AS vdc ON mdg.snapKey = vdc.snapKey AND mdg.id = vdc.mdisk_grp_id
INNER JOIN vdisk AS vdk ON vdc.owner = vdk.objKey
INNER JOIN vdiskextent AS vde ON vdk.snapKey=vde.snapKey AND vdk.id = vde.vdisk_id AND vdc.copy_id = vde.copy_id
INNER JOIN mdisk AS mdk ON vde.id = mdk.id AND vde.snapKey = mdk.snapKey
LEFT JOIN tier_mdisk AS tmk ON tmk.mdisk_objKey = mdk.objKey
LEFT JOIN tier AS ter ON tmk.tier_objKey = ter.objKey
WHERE mdg.snapKey= '333';
I'm seeing it frequently in the slow query log (with different values of mdg.snapKey, of course); for example:
# User@Host: svcControl[svcControl] @ localhost []
# Query_time: 2577 Lock_time: 0 Rows_sent: 11469 Rows_examined: 354942843
The fact that it is running long isn't surprising, given the number of rows being examined.
However, when I EXPLAIN the query, I see
+----+-------------+-------+--------+---------------------------------------------------------+--------------------+---------+------------------------+------+---------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------------------------------------------+--------------------+---------+------------------------+------+---------------------+
| 1 | SIMPLE | tmk | system | mdisk_objKey_idx | NULL | NULL | NULL | 0 | const row not found |
| 1 | SIMPLE | ter | const | PRIMARY | NULL | NULL | NULL | 1 | |
| 1 | SIMPLE | mdg | ref | mdisk_grp_id_idx,snapIdx | snapIdx | 2 | const | 11 | |
| 1 | SIMPLE | vdc | ref | snapIdx,mdgIdIdx,ownerIdx | mdgIdIdx | 5 | svcObjects.mdg.id | 1 | Using where |
| 1 | SIMPLE | vdk | eq_ref | PRIMARY,vdisk_id_idx,snapIdx | PRIMARY | 3 | svcObjects.vdc.owner | 1 | |
| 1 | SIMPLE | mdk | ref | mdisk_id_idx,snapKey_idx | snapKey_idx | 2 | svcObjects.vdk.snapKey | 16 | |
| 1 | SIMPLE | vde | ref | vdiskextent_id_idx,vdiskextent_vdisk_id_idx,snapKey_idx | vdiskextent_id_idx | 5 | svcObjects.mdk.id | 23 | Using where |
+----+-------------+-------+--------+---------------------------------------------------------+--------------------+---------+------------------------+------+---------------------+
Multiplying all of those row estimates together comes to a little over 4000 (ignoring the 0 in the column).
It appears that the execution plan is being ignored.
Is there a way to run a query and then get a report of which execution plan was actually followed? Or am I misunderstanding this output?
UPDATE
I noticed that the Cardinality for some of my indexes was NULL, so I ANALYZEd all of the tables. The latest EXPLAIN reports:
+----+-------------+-------+--------+---------------------------------------------------------+--------------------------+---------+----------------------+------+---------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------------------------------------------+--------------------------+---------+----------------------+------+---------------------+
| 1 | SIMPLE | tmk | system | mdisk_objKey_idx | NULL | NULL | NULL | 0 | const row not found |
| 1 | SIMPLE | ter | const | PRIMARY | NULL | NULL | NULL | 1 | |
| 1 | SIMPLE | mdg | ref | mdisk_grp_id_idx,snapIdx | snapIdx | 2 | const | 8 | |
| 1 | SIMPLE | vdc | ref | snapIdx,mdgIdIdx,ownerIdx | snapIdx | 2 | const | 1580 | Using where |
| 1 | SIMPLE | vdk | eq_ref | PRIMARY,vdisk_id_idx,snapIdx | PRIMARY | 3 | svcObjects.vdc.owner | 1 | |
| 1 | SIMPLE | vde | ref | vdiskextent_id_idx,vdiskextent_vdisk_id_idx,snapKey_idx | vdiskextent_vdisk_id_idx | 5 | svcObjects.vdk.id | 300 | Using where |
| 1 | SIMPLE | mdk | ref | mdisk_id_idx,snapKey_idx | mdisk_id_idx | 5 | svcObjects.vde.id | 14 | Using where |
+----+-------------+-------+--------+---------------------------------------------------------+--------------------------+---------+----------------------+------+---------------------+
That multiplies up to 53,088,000, which is much better than the 354,942,843 that were examined in the slow query.
I guess my question now is this: is the behaviour I've seen explained by what I did? That is, if the index cardinalities are NULL, will the execution plan appear over-optimistic, and can it also result in 6-7 times the necessary number of rows being examined?
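For reference, the arithmetic above can be reproduced directly in the MySQL client. This only multiplies the rows estimates copied from the two EXPLAIN outputs (leaving out the 0 for tmk); it is a rough upper-bound estimate, not a measurement.

```sql
-- First EXPLAIN: ter, mdg, vdc, vdk, mdk, vde
SELECT 1 * 11 * 1 * 1 * 16 * 23 AS estimate_before_analyze;   -- 4048

-- Second EXPLAIN (after ANALYZE): ter, mdg, vdc, vdk, vde, mdk
SELECT 1 * 8 * 1580 * 1 * 300 * 14 AS estimate_after_analyze; -- 53088000

-- Rows actually examined (slow query log) relative to the new estimate
SELECT 354942843 / 53088000 AS examined_vs_estimate;
```

The last ratio comes out at roughly 6.7, which is where the "6-7 times" figure comes from.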
See whether this article answers your question: http://www.mysqlperformanceblog.com/2011/10/13/when-explain-estimates-can-go-wrong/