How can I optimise my MySQL query? - mysql

I'm having real difficulties optimising a MySQL query. I have to use the existing database structure, but I am getting an extremely slow response under certain circumstances.
My query is:
SELECT
`t`.*,
`p`.`trp_name`,
`p`.`trp_lname`,
`trv`.`trv_prosceslevel`,
`trv`.`trv_id`,
`v`.`visa_destcountry`,
`track`.`track_id`,
`track`.`track_datetoembassy`,
`track`.`track_expectedreturn`,
`track`.`track_status`,
`track`.`track_comments`
FROM
(SELECT
*
FROM
`_transactions`
WHERE
DATE(`tr_datecreated`) BETWEEN DATE('2011-07-01 00:00:00') AND DATE('2011-08-01 23:59:59')) `t`
JOIN
`_trpeople` `p` ON `t`.`tr_id` = `p`.`trp_trid` AND `p`.`trp_name` = 'Joe' AND `p`.`trp_lname` = 'Bloggs'
JOIN
`_trvisas` `trv` ON `t`.`tr_id` = `trv`.`trv_trid`
JOIN
`_visas` `v` ON `trv`.`trv_visaid` = `v`.`visa_code`
JOIN
`_trtracking` `track` ON `track`.`track_trid` = `t`.`tr_id` AND `p`.`trp_id` = `track`.`track_trpid` AND `trv`.`trv_id` = `track`.`track_trvid` AND `track`.`track_status` IN ('New','Missing_Info',
'En_Route',
'Ready_Pickup',
'Received',
'Awaiting_Voucher',
'Sent_Client',
'Closed')
ORDER BY `tr_id` DESC
The results of an EXPLAIN statement on the above are:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 164 Using temporary; Using filesort
1 PRIMARY track ALL status_index NULL NULL NULL 4677 Using where
1 PRIMARY p eq_ref PRIMARY PRIMARY 4 db.track.track_trpid 1 Using where
1 PRIMARY trv eq_ref PRIMARY PRIMARY 4 db.track.track_trvid 1 Using where
1 PRIMARY v eq_ref visa_code visa_code 4 db.trv.trv_visaid 1
2 DERIVED _transactions ALL NULL NULL NULL NULL 4276 Using where
The query times are acceptable until the value of 'Closed' is included in the very last track.track_status IN clause. The execution time then increases to about 10 to 15 times that of the other queries.
This makes sense, as the 'Closed' status refers to all the clients whose transactions have been dealt with, which corresponds to about 90% to 95% of the database.
The issue is that in some cases the search is taking about 45 seconds, which is ridiculous. I'm sure MySQL can do much better than that and it's just my query at fault, even if the tables do have 4000 rows, but I can't work out how to optimise this statement.
I'd be grateful for some advice about where I'm going wrong and how I should be implementing this query to produce a faster result.
Many thanks

Try this:
SELECT t.*,
p.trp_name,
p.trp_lname,
trv.trv_prosceslevel,
trv.trv_id,
v.visa_destcountry,
track.track_id,
track.track_datetoembassy,
track.track_expectedreturn,
track.track_status,
track.track_comments
FROM
_transactions t
JOIN _trpeople p ON t.tr_id = p.trp_trid
JOIN _trvisas trv ON t.tr_id = trv.trv_trid
JOIN _visas v ON trv.trv_visaid = v.visa_code
JOIN _trtracking track ON track.track_trid = t.tr_id
AND p.trp_id = track.track_trpid
AND trv.trv_id = track.track_trvid
WHERE DATE(t.tr_datecreated)
BETWEEN DATE('2011-07-01 00:00:00') AND DATE('2011-08-01 23:59:59')
AND track.track_status IN ('New','Missing_Info','En_Route','Ready_Pickup','Received','Awaiting_Voucher','Sent_Client', 'Closed')
AND p.trp_name = 'Joe' AND p.trp_lname = 'Bloggs'
ORDER BY tr_id DESC
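One further suggestion (an addition here, not part of the original answer): wrapping tr_datecreated in DATE() stops MySQL from using any index on that column. Assuming tr_datecreated is a DATETIME or TIMESTAMP and the schema allows adding an index, a sketch of a sargable alternative:
-- Sketch only: assumes an index may be added despite the fixed database structure.
ALTER TABLE `_transactions` ADD INDEX `idx_tr_datecreated` (`tr_datecreated`);

-- Then, in the query above, compare the raw column so the index can be used:
--   WHERE t.tr_datecreated >= '2011-07-01 00:00:00'
--     AND t.tr_datecreated <  '2011-08-02 00:00:00'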

Related

Why is MySQL not using the indexes I have created?

Any ideas why the index is not being used on table sd, and how to fix it?
I tried removing the GROUP BY and ORDER BY clauses, but the issue is still the same and I can't work out what the problem is.
Query
SELECT sd.filter_group_id, fgd.name AS group_name, sdc.filter_id AS filter_id, fd.name,
COUNT(DISTINCT p2c.product_id) AS total, f.sort_order, sd.sort_order AS sort,
(CASE
WHEN fgd.custom_order = 1
THEN COUNT(p2c.product_id)
END) AS custom_order
FROM oc_sd_filter sd
JOIN oc_product_to_category p2c ON p2c.category_id = sd.category_id
JOIN oc_product_filter sdc18 ON sdc18.product_id = p2c.product_id
JOIN oc_product_filter sdc21 ON sdc21.product_id = p2c.product_id
JOIN oc_product p ON p.product_id = p2c.product_id
JOIN oc_product_filter sdc ON sdc.product_id = p2c.product_id
JOIN oc_filter f ON sdc.filter_id = f.filter_id
JOIN oc_filter_description fd ON sdc.filter_id = fd.filter_id
JOIN oc_filter_group_description fgd ON fd.filter_group_id = fgd.filter_group_id
WHERE sd.category_id = '93'
AND p.status = '1'
AND sd.filter_group_id = fd.filter_group_id
AND sd.status = 1
AND sdc18.filter_id IN (199,200,120,321,611,451,380,542)
AND sdc21.filter_id IN (241,242)
GROUP BY fd.filter_id, fd.filter_group_id
ORDER BY sd.sort_order ASC,
(CASE
WHEN fgd.custom_order = 0
THEN f.sort_order
END) ASC,
(CASE
WHEN fgd.custom_order = 1
THEN COUNT(p2c.product_id)
END) DESC
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE sd ALL filter,cat,status NULL NULL NULL 11 Using where; Using temporary; Using filesort
1 SIMPLE fgd ref PRIMARY,filter_group_id PRIMARY 4 example_db.sd.filter_group_id 1
1 SIMPLE p2c ref PRIMARY,category_id, category_id 4 example_db.sd.category_id 59 Using index
1 SIMPLE p eq_ref PRIMARY,status,product_id PRIMARY 4 example_db.p2c.product_id 1 Using where
1 SIMPLE sdc ref PRIMARY PRIMARY 4 example_db.p2c.product_id 9 Using index
1 SIMPLE fd ref PRIMARY,filter PRIMARY 4 example_db.sdc.filter_id 1 Using where
1 SIMPLE f eq_ref PRIMARY PRIMARY 4 example_db.sdc.filter_id 1
1 SIMPLE sdc21 ref PRIMARY PRIMARY 4 example_db.p2c.product_id 9 Using where; Using index
1 SIMPLE sdc18 ref PRIMARY PRIMARY 4 example_db.p2c.product_id 9 Using where; Using index
Table
CREATE TABLE `oc_sd_filter` (
`id` int(11) NOT NULL,
`category_id` int(11) NOT NULL,
`filter_group_id` int(11) NOT NULL,
`status` int(11) NOT NULL,
`sort_order` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--
-- Indexes for table `oc_sd_filter`
--
ALTER TABLE `oc_sd_filter`
ADD PRIMARY KEY (`id`),
ADD KEY `filter` (`filter_group_id`),
ADD KEY `cat` (`category_id`),
ADD KEY `status` (`status`),
ADD KEY `sort_order` (`sort_order`);
Suggested composite indexes:
sd: INDEX(category_id, status, filter_group_id, sort_order)
fgd: INDEX(filter_group_id, name, custom_order)
sdc: INDEX(product_id, filter_id)
fd: INDEX(filter_group_id, filter_id, name)
p2c: INDEX(category_id, product_id)
f: INDEX(filter_id, sort_order)
oc_product_filter: INDEX(product_id, filter_id)
p: INDEX(status, product_id)
When adding a composite index, DROP index(es) with the same leading columns.
That is, when you have both INDEX(a) and INDEX(a,b), toss the former.
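For example, on oc_sd_filter above, that would look something like this (a sketch; the index name is arbitrary):
ALTER TABLE `oc_sd_filter`
  DROP KEY `cat`,  -- redundant once a composite index starting with category_id exists
  ADD KEY `cat_status_group_sort` (`category_id`, `status`, `filter_group_id`, `sort_order`);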
If that does not help enough, come back and we can talk about turning the query inside out -- so that the GROUP BY is done before most of the JOINs. But first, how many rows in the resultset? How many rows if you take out the GROUP BY clause?
Example (from Comment):
SELECT filter_group_id
FROM sd
WHERE status = 1
ORDER BY sort_order
The Optimal index is both composite and "covering"; the order is important:
INDEX(status, sort_order, filter_group_id)
Any longer index starting with those is essentially "as good". Any shorter index (eg, INDEX(status, sort_order) or starting with that) will be "good", but "not as good".
In particular, the 4-column index I provided above is not useful. It is OK to add both indexes; the Optimizer will decide which index to use for each SELECT.
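As a sketch, adding that covering index to oc_sd_filter would look like this (the index name is arbitrary; per the rule above, the single-column KEY `status` could then be dropped):
ALTER TABLE `oc_sd_filter`
  ADD KEY `status_sort_group` (`status`, `sort_order`, `filter_group_id`);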

Why does OR in subquery make query so much slower?

I'm using MySQL and have the following query that I was trying to improve:
SELECT
*
FROM
overpayments AS op
JOIN payment_allocations AS overpayment_pa ON overpayment_pa.allocatable_id = op.id
AND overpayment_pa.allocatable_type = 'Overpayment'
JOIN (
SELECT
pa.payment_source_type,
pa.payment_source_id,
ft.conversion_rate
FROM
payment_allocations AS pa
LEFT JOIN line_items AS li ON pa.payment_source_id = li.id
LEFT JOIN credit_notes AS cn ON li.parent_document_id = cn.id
LEFT JOIN financial_transactions AS ft ON (
ft.commercial_document_id = pa.payment_source_id
AND ft.commercial_document_type = pa.payment_source_type
)
OR (
ft.commercial_document_id = cn.id
AND ft.commercial_document_type = 'CreditNote'
)
WHERE
pa.allocatable_type = 'Overpayment'
AND pa.company_id = 14792
AND ft.company_id = 14792
) AS op_bank_transaction_ft ON op_bank_transaction_ft.payment_source_id = overpayment_pa.payment_source_id
AND op_bank_transaction_ft.payment_source_type = overpayment_pa.payment_source_type;
It takes 10s to run. I was able to improve it to 0.047s by removing the OR condition in the subquery and using COALESCE to get the result:
SELECT
*
FROM
overpayments AS op
JOIN payment_allocations AS overpayment_pa ON overpayment_pa.allocatable_id = op.id
AND overpayment_pa.allocatable_type = 'Overpayment'
JOIN (
SELECT
pa.payment_source_type,
pa.payment_source_id,
coalesce(ft_one.conversion_rate, ft_two.conversion_rate)
FROM
payment_allocations AS pa
LEFT JOIN line_items AS li ON pa.payment_source_id = li.id
LEFT JOIN credit_notes AS cn ON li.parent_document_id = cn.id
LEFT JOIN financial_transactions AS ft_one ON (
ft_one.commercial_document_id = pa.payment_source_id
AND ft_one.commercial_document_type = pa.payment_source_type
AND ft_one.company_id = 14792
)
LEFT JOIN financial_transactions AS ft_two ON (
ft_two.commercial_document_id = cn.id
AND ft_two.commercial_document_type = 'CreditNote'
AND ft_two.company_id = 14792
)
WHERE
pa.allocatable_type = 'Overpayment'
AND pa.company_id = 14792
) AS op_bank_transaction_ft ON op_bank_transaction_ft.payment_source_id = overpayment_pa.payment_source_id
AND op_bank_transaction_ft.payment_source_type = overpayment_pa.payment_source_type;
However, I don't really understand why that worked. The original subquery ran very quickly and only returned 2 results, so why would it slow down the query by so much? EXPLAIN on the first query returns the following:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE pa ref index_payment_allocations_on_payment_source_id,index_payment_allocations_on_company_id index_payment_allocations_on_company_id 5 const 191 10.00 Using where
1 SIMPLE overpayment_pa ref index_payment_allocations_on_payment_source_id,index_payment_allocations_on_allocatable_id index_payment_allocations_on_payment_source_id 5 rails.pa.payment_source_id 1 3.42 Using where
1 SIMPLE op eq_ref PRIMARY PRIMARY 4 rails.overpayment_pa.allocatable_id 1 100.00
1 SIMPLE li eq_ref PRIMARY PRIMARY 4 rails.pa.payment_source_id 1 100.00
1 SIMPLE cn eq_ref PRIMARY PRIMARY 8 rails.li.parent_document_id 1 100.00 Using where; Using index
1 SIMPLE ft ALL transactions_unique_by_commercial_doc NULL NULL NULL 12587878 0.00 Range checked for each record (index map: 0x2)
And for the second I get the following:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE pa ref index_payment_allocations_on_payment_source_id,index_payment_allocations_on_company_id index_payment_allocations_on_company_id 5 const 191 10.00 Using where
1 SIMPLE overpayment_pa ref index_payment_allocations_on_payment_source_id,index_payment_allocations_on_allocatable_id index_payment_allocations_on_payment_source_id 5 rails.pa.payment_source_id 1 3.42 Using where
1 SIMPLE op eq_ref PRIMARY PRIMARY 4 rails.overpayment_pa.allocatable_id 1 100.00
1 SIMPLE ft_one ref transactions_unique_by_commercial_doc,index_financial_transactions_on_company_id transactions_unique_by_commercial_doc 773 rails.pa.payment_source_id,rails.pa.payment_source_type 1 100.00 Using where
1 SIMPLE li eq_ref PRIMARY PRIMARY 4 rails.pa.payment_source_id 1 100.00
1 SIMPLE cn eq_ref PRIMARY PRIMARY 8 rails.li.parent_document_id 1 100.00 Using where; Using index
1 SIMPLE ft_two ref transactions_unique_by_commercial_doc,index_financial_transactions_on_company_id transactions_unique_by_commercial_doc 773 rails.cn.id,const 1 100.00 Using where
but I don't really know how to interpret those results.
Look at the right side of the last row of your first EXPLAIN. It didn't use an index, and it had to scan through megarows. That's slow. Your second query used indexes for every step of the query, so it was much faster.
If your second query yields correct results, use it and don't look back. Congratulations! You've optimized a query.
OR operations, especially in ON clauses, are harder than usual for the query planner module to satisfy, because they often mean it has to take the union of two separate subqueries. It looks like the planner chose to brute-force it in your case. (brute force === scanning many rows.)
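To see what that means concretely, here is a rough sketch of the union the planner effectively has to produce for the ON ... OR ... above. This is illustrative only, assumes inner-join semantics are acceptable, and is not a drop-in replacement for the original LEFT JOINs:
SELECT pa.payment_source_type, pa.payment_source_id, ft.conversion_rate
FROM payment_allocations AS pa
JOIN financial_transactions AS ft
    ON ft.commercial_document_id = pa.payment_source_id
    AND ft.commercial_document_type = pa.payment_source_type
WHERE pa.allocatable_type = 'Overpayment'
    AND pa.company_id = 14792
    AND ft.company_id = 14792
UNION
SELECT pa.payment_source_type, pa.payment_source_id, ft.conversion_rate
FROM payment_allocations AS pa
JOIN line_items AS li ON pa.payment_source_id = li.id
JOIN credit_notes AS cn ON li.parent_document_id = cn.id
JOIN financial_transactions AS ft
    ON ft.commercial_document_id = cn.id
    AND ft.commercial_document_type = 'CreditNote'
WHERE pa.allocatable_type = 'Overpayment'
    AND pa.company_id = 14792
    AND ft.company_id = 14792;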
Without knowing your indexes, it's hard to help you further.
Read this to learn more. https://use-the-index-luke.com
These may further speed up the second formulation:
overpayment_pa:
INDEX(payment_source_id, payment_source_type, allocatable_type, allocatable_id)
pa: INDEX(allocatable_type, company_id, payment_source_id, payment_source_type)
financial_transactions:
INDEX(commercial_document_id, commercial_document_type, company_id, conversion_rate)
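Written out as DDL, those suggestions would look roughly like this (a sketch; overpayment_pa and pa are both payment_allocations, and the index names are arbitrary):
ALTER TABLE payment_allocations
  ADD INDEX pa_source_alloc (payment_source_id, payment_source_type, allocatable_type, allocatable_id),
  ADD INDEX pa_alloc_company (allocatable_type, company_id, payment_source_id, payment_source_type);

ALTER TABLE financial_transactions
  ADD INDEX ft_doc_company (commercial_document_id, commercial_document_type, company_id, conversion_rate);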

Need help tuning sql query

My MySQL DB has become CPU-hungry trying to execute a particularly slow query. When I do an EXPLAIN, MySQL says "Using where; Using temporary; Using filesort". Please help me decipher and solve this puzzle.
Table structure:
CREATE TABLE `topsources` (
`USER_ID` varchar(255) NOT NULL,
`UPDATED_TIME` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`URL_ID` int(11) NOT NULL,
`SOURCE_SLUG` varchar(100) NOT NULL,
`FEED_PAGE_URL` varchar(255) NOT NULL,
`CATEGORY_SLUG` varchar(100) NOT NULL,
`REFERRER` varchar(2048) DEFAULT NULL,
PRIMARY KEY (`USER_ID`,`URL_ID`),
KEY `USER_ID` (`USER_ID`),
KEY `FEED_PAGE_URL` (`FEED_PAGE_URL`),
KEY `SOURCE_SLUG` (`SOURCE_SLUG`),
KEY `CATEGORY_SLUG` (`CATEGORY_SLUG`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
The table has 370K rows...sometimes higher. The below query takes 10+ seconds.
SELECT topsources.SOURCE_SLUG, COUNT(topsources.SOURCE_SLUG) AS VIEW_COUNT
FROM topsources
WHERE CATEGORY_SLUG = '/newssource'
GROUP BY topsources.SOURCE_SLUG
HAVING MAX(CASE WHEN topsources.USER_ID = 'xxxx' THEN 1 ELSE 0 END) = 0
ORDER BY VIEW_COUNT DESC;
Here's the extended explain:
+----+-------------+------------+------+---------------+---------------+---------+-------+--------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------+---------------+---------------+---------+-------+--------+----------+----------------------------------------------+
| 1 | SIMPLE | topsources | ref | CATEGORY_SLUG | CATEGORY_SLUG | 302 | const | 160790 | 100.00 | Using where; Using temporary; Using filesort |
+----+-------------+------------+------+---------------+---------------+---------+-------+--------+----------+----------------------------------------------+
Is there a way to improve this query? Also, are there any MySQL settings that can help reduce CPU load? I can allocate more of the memory that's available on my server.
The most likely thing to help the query is an index on CATEGORY_SLUG, especially if it takes on many values. (That is, if the query is highly selective.) The query needs to read the entire table to get the results -- although 10 seconds seems like a long time.
I don't think the HAVING clause would be affecting the query processing.
Does the query take just as long if you run it two times in a row?
If there are many rows that match your CATEGORY_SLUG criteria, it may be difficult to make this fast, but is this any quicker?
SELECT ts.SOURCE_SLUG, COUNT(ts.SOURCE_SLUG) AS VIEW_COUNT
FROM topsources ts
WHERE ts.CATEGORY_SLUG = '/newssource'
AND NOT EXISTS(SELECT 1 FROM topsources ts2
WHERE ts2.CATEGORY_SLUG = '/newssource'
AND ts.SOURCE_SLUG = TS2.SOURCE_SLUG
AND ts2.USER_ID = 'xxxx')
GROUP BY ts.SOURCE_SLUG
ORDER BY VIEW_COUNT DESC;
This should do the trick, if I read my SQL alteration correctly:
SELECT topsources.SOURCE_SLUG, COUNT(topsources.SOURCE_SLUG) AS VIEW_COUNT
FROM topsources
WHERE CATEGORY_SLUG = '/newssource' and
topsources.SOURCE_SLUG not in (
select distinct SOURCE_SLUG
from topsources
where USER_ID = 'xxxx'
)
GROUP BY topsources.SOURCE_SLUG
ORDER BY VIEW_COUNT DESC;
Always hard to optimise something when you can't just throw queries at the data yourself, but this would be my first attempt if I was doing it myself:
SELECT t.SOURCE_SLUG, COUNT(t.SOURCE_SLUG) AS VIEW_COUNT
FROM topsources t
LEFT JOIN (
SELECT SOURCE_SLUG
FROM topsources t
WHERE CATEGORY_SLUG = '/newssource'
AND USER_ID = 'xxx'
GROUP BY SOURCE_SLUG
) x USING (SOURCE_SLUG)
WHERE t.CATEGORY_SLUG = '/newssource'
AND x.SOURCE_SLUG IS NULL
GROUP BY t.SOURCE_SLUG
ORDER BY VIEW_COUNT DESC;
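If one of the anti-join versions helps, a composite index might also let the outer aggregation be resolved from the index alone. A sketch (the column order is a guess from the query, and the index name is arbitrary):
ALTER TABLE `topsources`
  ADD KEY `cat_source` (`CATEGORY_SLUG`, `SOURCE_SLUG`);  -- covers the WHERE on CATEGORY_SLUG and the GROUP BY on SOURCE_SLUG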

Optimizing SQL request

I'm using PDO with MySQL, and wrote a query to select the cheapest offers for a product in my database. It works fine; the only problem is that it is slow: for 200 offers (and still just 25 to return) it takes almost a second, which is a lot more than I'm aiming for.
I'm no expert in SQL, so I'm seeking your help on this matter. Here is the query, and I'll be happy to provide more info if needed:
SELECT
mo.id AS id,
mo.stock AS stock,
mo.price AS price,
mo.promotional_price AS promotional_price,
mo.picture_1 AS picture_1,
mo.picture_2 AS picture_2,
mo.picture_3 AS picture_3,
mo.picture_4 AS picture_4,
mo.picture_5 AS picture_5,
mo.title AS title,
mo.description AS description,
mo.state AS state,
mo.is_new AS is_new,
mo.is_original AS is_original,
c.name AS name,
u.id AS user_id,
u.username AS username,
u.postal_code AS postal_code,
p.name AS country_name,
ra.cache_rating_avg AS cache_rating_avg,
ra.cache_rating_nb AS cache_rating_nb,
GROUP_CONCAT(md.delivery_mode_id SEPARATOR ', ') AS delivery_mode_ids,
GROUP_CONCAT(ri.title SEPARATOR ', ') AS delivery_mode_titles
FROM
mp_offer mo, catalog_product_i18n c,
ref_country_i18n p, mp_offer_delivery_mode md,
ref_delivery_mode r,
ref_delivery_mode_i18n ri, user u
LEFT JOIN mp_user_review_rating_i18n ra
ON u.id = ra.user_id
WHERE (mo.product_id = c.id
AND mo.culture = c.culture
AND mo.user_id = u.id
AND u.country_id = p.id
AND mo.id = md.offer_id
AND md.delivery_mode_id = ri.id
AND mo.culture = ri.culture)
AND (mo.culture = 1
AND p.culture = 1)
AND mo.is_deleted = 0
AND mo.product_id = 60
AND ((u.holiday_start IS NULL)
OR (u.holiday_start = '0000-00-00')
OR (u.holiday_end IS NULL)
OR (u.holiday_end = '0000-00-00')
OR (u.holiday_start > '2012-05-03')
OR (u.holiday_end < '2012-05-03'))
AND mo.stock > 0
GROUP BY mo.id
ORDER BY IF (mo.promotional_price IS NULL,
mo.price,
LEAST(mo.price, mo.promotional_price)) ASC
LIMIT 25 OFFSET 0;
I take the offers for a particular product that have their "culture" set to 1, are not deleted, have some stock, and whose seller is not on holiday. I order by price (promotional_price when there is one).
Is LEAST a slow function?
Here is the output of EXPLAIN :
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE c const PRIMARY,catalog_product_i18n_product,catalog_product_i18n_culture PRIMARY 8 const,const 1 "Using temporary; Using filesort"
1 SIMPLE mo ref PRIMARY,culture,is_deleted,product_id,user_id culture 4 const 3 "Using where with pushed condition"
1 SIMPLE u eq_ref PRIMARY,user_country PRIMARY 4 database.mo.user_id 1 "Using where with pushed condition"
1 SIMPLE p eq_ref PRIMARY,ref_country_i18n_culture PRIMARY 8 database.u.country_id,const 1
1 SIMPLE r ALL NULL NULL NULL NULL 3 "Using join buffer"
1 SIMPLE ra ALL NULL NULL NULL NULL 4
1 SIMPLE md ref PRIMARY,fk_offer_has_delivery_mode_delivery_mode1,fk_offer_has_delivery_mode_offer1 PRIMARY 4 database.mo.id 2
1 SIMPLE ri eq_ref PRIMARY PRIMARY 2 database.md.delivery_mode_id,const 1
Thanks in advance for your help on optimizing this request.
J
You are not making any use of the ref_delivery_mode table that you have included in the FROM clause. It is causing a Cartesian product in the results.
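A minimal sketch of the fix, assuming ref_delivery_mode really is unused, is simply to drop it from the FROM list:
FROM
    mp_offer mo, catalog_product_i18n c,
    ref_country_i18n p, mp_offer_delivery_mode md,
    ref_delivery_mode_i18n ri, user u
    LEFT JOIN mp_user_review_rating_i18n ra
        ON u.id = ra.user_id
-- If ref_delivery_mode is actually needed, keep `ref_delivery_mode r` but add an
-- explicit join condition for it (e.g. AND md.delivery_mode_id = r.id, where the
-- exact key column name is a guess, since it is not shown in the question).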

Optimize a query

I've got this query. It takes ~0.0854 seconds to execute, which I find a little slow. Below is my EXPLAIN.
SELECT
stops.stop_number,
stops.stop_name_1,
stops.stop_name_2
FROM
tranzit.stops_times
INNER JOIN
tranzit.stops
ON
(
stops_times.stop_id = stops.stop_id
)
INNER JOIN
tranzit.trips
ON
(
stops_times.trip_id = trips.trip_id
)
WHERE
trips.route_id = 109 AND
trips.trip_direction = 1 AND
trips.trip_period_start <= "2011-11-24" AND
trips.trip_period_end >= "2011-11-24"
GROUP BY
stops.stop_id
ORDER BY
stops_times.time_sequence ASC
LIMIT
0, 200
Explain
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE trips index_merge trip_id,trip_period_start,trip_period_end,trip_dir... route_id,trip_direction 3,1 NULL 271 Using intersect(route_id,trip_direction); Using wh...
1 SIMPLE stops_times ref stop_id,trip_id trip_id 16 tranzit.trips.trip_id 24
1 SIMPLE stops ref stop_id stop_id 3 tranzit.stops_times.stop_id 1 Using where
And I have these indexes on trips:
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment
trips 1 agency_id 1 agency_id A 2 NULL NULL BTREE
trips 1 trip_id 1 trip_id A 9361 NULL NULL BTREE
trips 1 trip_period_start 1 trip_period_start A 2 NULL NULL BTREE
trips 1 trip_period_end 1 trip_period_end A 2 NULL NULL BTREE
trips 1 trip_direction 1 trip_direction A 2 NULL NULL BTREE
trips 1 route_id 1 route_id A 106 NULL NULL BTREE
trips 1 shape_id 1 shape_id A 520 NULL NULL BTREE
trips 1 trip_terminus 1 trip_terminus A 301 NULL NULL BTREE
Indexes on stops
stop_number BTREE Non Non stop_number 4626 A
agency_id BTREE Non Non agency_id 1 A
stop_id BTREE Non Non stop_id 4626 A
Thanks for any help
Given how many rows you have in the tables, it is already running pretty quick. You could try a few different approaches, such as adding more WHERE conditions, or performing a simple SELECT and then running a second query to get the needed join fields. But those aren't where you really need to focus.
The important question is how will this query behave in the wild. If you are running it 100 times every second you need to know if it is going to degrade and become a bottleneck. If it can run in 0.08 every time, then that still allows for a very responsive application.
The most important strategy, however, if it is possible and can be made effective, is using memcache or a similar option to avoid running the query all the time.
As people wrote before:
Split it into 2 queries.
First, the trip information, using GROUP_CONCAT to make it faster:
SELECT group_concat(trip_id) FROM trips WHERE
trips.route_id = 109 AND
trips.trip_direction = 1 AND
trips.trip_period_start <= "2011-11-24" AND
trips.trip_period_end >= "2011-11-24"
Next, the stop information:
SELECT
stops.stop_number,
stops.stop_name_1,
stops.stop_name_2
FROM
tranzit.stops_times,
tranzit.stops
WHERE
stops_times.stop_id = stops.stop_id
AND
stops_times.trip_id in ( ...)
GROUP BY, ...
I think it will be faster, as you don't need other information from trips table outside the query.
The trickiest part is the range condition on trip_period_start and trip_period_end.
I think you can consider a composite key like:
alter table trips
add index testing
(
route_id, trip_direction, trip_period_start, trip_period_end
);
It depends on how many unique values trip_direction has. If it always has only a few unique values, then:
alter table trips
add index testing
(
route_id, trip_period_start, trip_period_end, trip_direction
);
Already less than 1 tenth of a second and you want it faster? ok...
I would build a composite index on (route_id, trip_direction, trip_period_start), as those are the three critical elements of your query, and in that order so that the smallest granularity (the specific route) is at the front of the index, then the direction, then the dates. Next, I would swap the order of the query with the trips table up front, since you are doing INNER joins. Additionally, have an index on your stops_times table on trip_id. By starting with the first table with its qualifiers, then joining to the child-level tables via relations, you still get the same elements, but you are running against the smallest index set first on trips.
select STRAIGHT_JOIN
stops.stop_number,
stops.stop_name_1,
stops.stop_name_2
from
tranzit.trips
join tranzit.stops_times
on trips.trip_id = stops_times.trip_id
join tranzit.stops
on stops_times.stop_id = stops.stop_id
where
trips.route_id = 109
AND trips.trip_direction = 1
AND trips.trip_period_start <= "2011-11-24"
AND trips.trip_period_end >= "2011-11-24"
group by
stops.stop_id
ORDER BY
stops_times.time_sequence
LIMIT
0, 200
I found something that works like a charm. My result numbers are:
0.0011
0.0008
0.0017 (highest)
0.0006 (lowest)
0.0013
These results aren't from the cache. I moved all the WHERE conditions into t (trips.agency_id, trips.route_id, trips.trip_direction, trips.trip_period_start, trips.trip_period_end) and it is working very well! I can't explain why, but if someone can, I'd like to hear it. Thanks a lot everyone!
P.S. Even without trips.agency_id it is working great.
SELECT
stops.stop_number,
stops.stop_name_1,
stops.stop_name_2
FROM
tranzit.stops_times,
tranzit.stops,
(
SELECT
trips.trip_id
FROM
tranzit.trips
WHERE
trips.agency_id = 5 AND
trips.route_id = 109 AND
trips.trip_direction = 0 AND
trips.trip_period_start <= "2011-12-01" AND
trips.trip_period_end >= "2011-12-01"
LIMIT 1
) as t
WHERE
stops_times.stop_id = stops.stop_id AND
stops_times.trip_id in (t.trip_id)
GROUP BY
stops_times.stop_id
ORDER BY
stops_times.time_sequence ASC
LIMIT
0, 200
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> system NULL NULL NULL NULL 1 Using temporary; Using filesort
1 PRIMARY stops_times ref trip_id,stop_id trip_id 16 const 33 Using where
1 PRIMARY stops ref stop_id stop_id 3 tranzit.stops_times.stop_id 1 Using where
2 DERIVED trips ref testing testing 4 275 Using where