Improve indexing to speed up a slow query - MySQL

SELECT SQL_NO_CACHE TIME_FORMAT(ADDTIME(journey.departure
, SEC_TO_TIME(SUM(link2.elapsed))), '%H:%i') AS departure
FROM journey
JOIN journey_day
ON journey_day.journey = journey.code
JOIN pattern
ON pattern.code = journey.pattern
JOIN service
ON service.code = pattern.service
JOIN link
ON link.section = pattern.section
AND link.stop = "370023591"
JOIN link link2
ON link2.section = pattern.section
AND link2.id <= link.id
WHERE journey_day.day = 6
GROUP BY journey.id
ORDER BY journey.departure
The above query takes 1-2 seconds to run. I need to reduce this to roughly 100ms. Please note that I understand the service table isn't actually used in the query; it is only joined to keep the question simple.
Any ideas how I can speed this up? I can see that the link table is using filesort - is this causing the slowness in the query?

One thought is that you could explicitly optimize the selection of the "link" table record with the minimum "id" value.
A temporary table or a materialized WITH statement (CTE) are two ways to produce that intermediate result set. To get the row with the minimum "id", order by id, number the rows with a windowed ROW_NUMBER(), and select the row whose number is 1.
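A minimal sketch of that windowed ROW_NUMBER() approach, assuming MySQL 8.0+ (for CTEs and window functions) and the table/column names from your query:
-- Materialize the link rows for the stop once, numbering them by id within each section
WITH ranked_link AS (
    SELECT section, id, elapsed,
           ROW_NUMBER() OVER (PARTITION BY section ORDER BY id) AS rn
    FROM link
    WHERE stop = '370023591'
)
SELECT section, id
FROM ranked_link
WHERE rn = 1;  -- the row with the minimum id per section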

Well-planned indexes can be crucial to performance. From what you have presented, I would start with the following specific indexes. These are all covering indexes that qualify all the joins and criteria you will be working with; covering indexes help because the engine can get everything it needs without having to go to the raw data pages.
Most specifically, starting with your journey table, I would explicitly create the composite index on all 3 fields in the order shown: day first, as that is your WHERE criterion; then id, as that is the GROUP BY; and finally departure for your ORDER BY clause.
For the link table: section and stop first, as those are the join criteria against pattern; id next, as it is the basis of the join to link2; and finally elapsed, the column you aggregate in the SELECT.
table      index
journey    (day, id, departure)
link       (section, stop, id, elapsed)
pattern    (code, service, section)
service    (code)
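In MySQL DDL those suggestions would look roughly like this (index names are illustrative; note that in the query as posted the day column lives on journey_day, so that first index may belong on journey_day instead):
ALTER TABLE journey ADD INDEX day_id_departure (day, id, departure);
ALTER TABLE link ADD INDEX section_stop_id_elapsed (section, stop, id, elapsed);
ALTER TABLE pattern ADD INDEX code_service_section (code, service, section);
ALTER TABLE service ADD INDEX code (code);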

Related

Query Speed Issue with NOT EXISTS condition

I have a query that works, but it is slow. Is there a way to speed it up? Basically I have a table with timecard entries, and a second table with time breakdowns of each entry, related by the TimecardID. What I am looking for is timecards that have no breakdowns. I thought that cutting the criteria down to 2 months would speed it up. Thanks for your help.
SELECT * FROM Timecards
WHERE NOT EXISTS (SELECT TimeCardID FROM TimecardBreakdown WHERE Timecards.ID = TimecardBreakdown.TimeCardID)
AND Status <> 0
AND DateIn >= CURRENT_DATE() - INTERVAL 2 MONTH
It seems you want to know the TimecardIDs which do not exist in the TimecardBreakdown table, in which case you can use a LEFT OUTER JOIN.
SELECT a.*
FROM Timecards a
LEFT OUTER JOIN TimecardBreakdown b ON a.ID = b.TimeCardID
WHERE b.TimeCardID IS NULL
This would get rid of the subquery (which is expensive) and use a join (which is more efficient).
MySQL stinks at running correlated subqueries quickly. Try to make your subqueries independent and join them instead. You can use the LEFT JOIN ... IS NULL pattern to replace WHERE NOT EXISTS.
SELECT tc.*
FROM Timecards tc
LEFT JOIN TimecardBreakdown tcb ON tc.ID = tcb.TimeCardId
WHERE tc.DateIn >= CURRENT_DATE() - INTERVAL 2 MONTH
AND tc.Status <> 0
AND tcb.TimeCardId IS NULL
Some optimization points.
First, if you can change tc.Status <> 0 to tc.Status > 0, it makes an index range scan possible on that column.
Second, when you're optimizing stuff, SELECT * is considered harmful. Instead, if you can give the names of just the columns you need, things will be quicker. The database server has to sling around all the data you ask for; it can't tell if you're going to ignore some of it.
Third, this query will be helped by a compound index on Timecards (DateIn, Status, ID). That compound index can be used to do the heavy lifting of satisfying your query conditions.
That's called a covering index; it contains the data needed to satisfy much of your query. If you were to index just the DateIn column, then the query handler would have to bounce back to the main table to find the values of Status and ID. When those columns appear in the index, it saves that extra operation.
If you SELECT a certain set of columns rather than doing SELECT *, including those columns in the covering index can dramatically improve query performance. That's one of several reasons SELECT * is considered harmful.
(Some makes and model of DBMS have ways to specify lists of columns to ride along on indexes without actually indexing them. MySQL requires you to index them. But covering indexes still help.)
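In MySQL DDL, that compound index would be created roughly like this (the index name is illustrative):
ALTER TABLE Timecards ADD INDEX datein_status_id (DateIn, Status, ID);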
Read this: http://use-the-index-luke.com/

Optimize MySQL query for group_concat function

SELECT SQL_NO_CACHE link.stop, stop.common_name, locality.name, stop.bearing, stop.latitude, stop.longitude
FROM service
JOIN pattern ON pattern.service = service.code
JOIN link ON link.section = pattern.section
JOIN naptan.stop ON stop.atco_code = link.stop
JOIN naptan.locality ON locality.code = stop.nptg_locality_ref
GROUP BY link.stop
The above query takes roughly 800ms - 1000ms to run.
If I append a group_concat statement the query then takes 8 - 10 seconds:
SELECT SQL_NO_CACHE link.stop, stop.common_name, locality.name, stop.bearing, stop.latitude, stop.longitude, group_concat(service.line) lines
How can I change this query so that it runs in less than 2 seconds with the group_concat statement?
SQL Fiddle: http://sqlfiddle.com/#!9/414fe
EXPLAIN statements for both queries: http://i.imgur.com/qrURgzV.png
How long does this query take?
SELECT p.section, GROUP_CONCAT(s.line)
FROM pattern p join
service s
ON p.service = s.code
GROUP BY p.section
I am thinking that you can do the GROUP_CONCAT() in a subquery, so the outer query does not need an aggregation. This can speed up queries when there is one table in the subquery; in your case, there are two.
The final results would be something like:
SELECT SQL_NO_CACHE . . .,
(SELECT GROUP_CONCAT(s.line)
FROM pattern p join
service s
ON p.service = s.code
WHERE p.section = link.section
) as lines
FROM link JOIN
naptan.stop
ON stop.atco_code = link.stop JOIN
naptan.locality
ON locality.code = stop.nptg_locality_ref;
For this query, you want the following additional indexes: pattern(section, service) and service(code, line).
I don't know if this will work, but it is worth a try.
Note: this is assuming that you really don't need the group by for the rest of the columns.
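In MySQL DDL, the two indexes suggested above would be something like this (index names are illustrative):
ALTER TABLE pattern ADD INDEX section_service (section, service);
ALTER TABLE service ADD INDEX code_line (code, line);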
A remark: You're using the nonstandard MySQL extension to GROUP BY. It happens to work for you because link.stop is joined to stop.atco_code, which itself is a primary key. But you need to be very careful with this.
I suggest you add some compound indexes. You join in to pattern on service and join out based on section. So add this index.
ALTER TABLE pattern ADD INDEX service_section (service, section, line);
This will let the query use just the index, without having to hit the table itself to retrieve the information needed for the JOIN or your GROUP_CONCAT() operation. (You might also delete the index on just service; this new index makes it redundant.)
Similarly, you want to create an index (section, stop) on the link table, and get rid of the index on just section.
On stop, you're using most of the columns, and you already have an index (PK) on atco_code, so let this one be.
Finally, on locality put an index on (code,name).
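In the same style as the ALTER TABLE above, the link and locality suggestions would be roughly (index names illustrative; locality is qualified as naptan.locality per the query):
ALTER TABLE link ADD INDEX section_stop (section, stop);
ALTER TABLE naptan.locality ADD INDEX code_name (code, name);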
All this indexing monkey business should cut down the amount of work MySQL must do to satisfy your query.
Now look, as soon as you add WHERE anything = anything to the query, you may need to add a column to one or more of these indexes. You definitely should read up on multi-column indexing and grouping; good indexing is a critical success factor for your kind of data.
You should also run ANALYZE TABLE xxxx on each of your tables after inserting lots of rows, to make sure the query optimizer can see appropriate information about the content of the table and indexes.

Utilizing MySQL multi-column indexes with my JOIN statement

I'm having some trouble utilizing my MySQL multi-column indexes when joining. Maybe a bit of a newbie question, but I appreciate the help.
I have a multi-column index on the notification table across the "typeID", "statusID" and "timeSent" columns.
Running this query:
SELECT count(notificationID) FROM notification
WHERE statusID = 2
AND typeID = 1
AND timeSent BETWEEN "2014-01-01" AND "2014-02-01"
This uses my 3-column index and runs fast. If I want to get notifications for a specific client, I need to join in my user table.
SELECT COUNT(a.notificationID) FROM notification a
LEFT JOIN user b ON a.userID = b.userID
WHERE a.statusID = 2
AND a.typeID = 1
AND b.clientID = 1
AND a.timeSent BETWEEN "2014-01-01" AND "2014-02-01"
This query ignores my index altogether and screeches to a halt, as there are 15m records in the notification table. I've tried doing a sub-select instead, with the same results.
Any input would be greatly appreciated.
Your notification table is pretty big, so it's not cheap to goof around trying to add indexes. But I will suggest that anyhow.
First of all, COUNT(*) is well known to be faster than COUNT(someColumn). I think you can make that change without changing the meaning of your query.
Second, your (typeID, statusID, timeSent) covering index is perfect for your first query. MySQL can random-access the index to the first date, sequentially scan to the second date, and just count the index entries, and so satisfy your query.
Third, let's take a look at that clientId-finding query. What it actually does is look up a date-range of notifications for particular, single, values of statusID, typeID, and userID. You specify the first two of those explicitly, and the query pulls the third one from the joined table. So, another covering index on (userID, typeID, statusID, timeSent) might very well do the trick for you.
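In MySQL DDL, that covering index would be something like this (index name illustrative):
ALTER TABLE notification ADD INDEX user_type_status_sent (userID, typeID, statusID, timeSent);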

Optimize slow SQL query using indexes

I have a problem optimizing a really slow SQL query. I think it is an index problem, but I can't find which index I have to apply.
This is the query:
SELECT
cl.ID, cl.title, cl.text, cl.price, cl.URL, cl.ID AS ad_id, cl.cat_id,
pix.file_name, area.area_name, qn.quarter_name
FROM classifieds cl
/*FORCE INDEX (date_created) */
INNER JOIN classifieds_pix pix ON cl.ID = pix.classified_id AND pix.picture_no = 0
INNER JOIN zip_codes zip ON cl.zip_id = zip.zip_id AND zip.area_id = 132
INNER JOIN area_names area ON zip.area_id = area.id
LEFT JOIN quarter_names qn ON zip.quarter_id = qn.id
WHERE
cl.confirmed = 1
AND cl.country = 'DE'
AND cl.date_created <= NOW() - INTERVAL 1 DAY
ORDER BY cl.date_created DESC
LIMIT 7
MySQL takes about 2 seconds to get the result and begins its work on pix.picture_no, but if I force the date_created index the query goes much faster, taking only 0.030 s. The problem is that the INNER JOIN zip_codes ... is not always in the query, and when it is not, the forced index makes the query slow again.
I've been thinking of building a workaround with PHP conditions, but I would like to know what the problem with the indexes is.
Here are several suggestions on how to optimize your query.
NOW() function - You're using the NOW() function in your WHERE clause. Instead, I recommend using a constant date/timestamp, which allows the value to be cached and optimized; a non-deterministic call like NOW() disqualifies the query from caching (for example, the query cache). If you need a dynamic value, calculate the current timestamp in the application and inject it into the query as a constant before executing the query.
To test this recommendation before implementing the change, just replace NOW() with a constant timestamp and check for performance improvements.
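For example, a quick test might swap the expression for a literal computed ahead of time (the timestamp below is illustrative):
-- instead of:
AND cl.date_created <= NOW() - INTERVAL 1 DAY
-- try a constant precomputed in the application:
AND cl.date_created <= '2016-01-14 00:00:00'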
Indexes - in general, I would suggest adding an index that contains all the columns of your WHERE clause, in this case: confirmed, country, date_created. Start with the column that cuts down the data the most and move forward from there. Make sure the index's leading columns match the conditions you filter on; otherwise the index may not be used.
I used EverSQL SQL Query Optimizer to get these recommendations (disclaimer: I'm a co-founder of EverSQL and humbly provide these suggestions).
I would actually have a compound index on all elements of your WHERE, such as
(country, confirmed, date_created)
Having country first keeps the optimized index subset to one country; then, within that, the rows that are confirmed; and finally the date range itself. Don't query on just a date index alone. Since you are ordering by date, the index should be able to optimize that too.
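In MySQL DDL (index name illustrative):
ALTER TABLE classifieds ADD INDEX country_confirmed_created (country, confirmed, date_created);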
Add EXPLAIN in front of the query and run it again. This will show you which indexes are being used.
See: 13.8.2 EXPLAIN Statement
And for an explanation of explain see MySQL Explain Explained. Or: Optimizing MySQL: Queries and Indexes

How can I improve the performance of this MySQL query?

I have a MySQL query:
SELECT DISTINCT
c.id,
c.company_name,
cd.firstname,
cd.surname,
cis.description AS industry_sector
FROM (clients c)
JOIN clients_details cd ON c.id = cd.client_id
LEFT JOIN clients_industry_sectors cis ON cd.industry_sector_id = cis.id
WHERE c.record_type='virgin'
ORDER BY date_action, company_name asc, id desc
LIMIT 30
The clients table has about 60-70k rows and has an index for 'id', 'record_type', 'date_action' and 'company_name' - unfortunately the query still takes 5+ secs to complete. Removing the 'ORDER BY' reduces this to about 30ms since a filesort is not required. Is there any way I can alter this query to improve upon the 5+ sec response time?
See: http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
Especially:
In some cases, MySQL cannot use indexes to resolve the ORDER BY (..). These cases include the following:
(..)
You are joining many tables, and the columns in the ORDER BY are not all from the first nonconstant table that is used to retrieve rows. (This is the first table in the EXPLAIN output that does not have a const join type.)
You have an index for id, record_type, date_action. But if you want to order by date_action, you really need an index that has date_action as the first field, preferably matching the exact fields in the ORDER BY. Otherwise, yes, it will be a slow query.
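A sketch of such an index (name illustrative). One caveat: your ORDER BY mixes ASC and DESC, and before MySQL 8.0 an index cannot fully satisfy a mixed-direction sort; MySQL 8.0+ supports descending index columns:
-- MySQL 8.0+ (the DESC column matches the mixed-direction ORDER BY):
ALTER TABLE clients ADD INDEX date_company_id (date_action, company_name, id DESC);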
Without seeing all your tables and indexes, it's hard to tell. When asking a question about speeding up a query, the query is just part of the equation.
Does clients have an index on id?
Does clients have an index on record_type
Does clients_details have an index on client_id?
Does clients_industry_sectors have an index on id?
These are the minimum you need for this query to have any chance of working quickly.
Thanks so much for the input and suggestions. In the end I've decided to create a new DB table whose sole purpose is to serve results for this query, so no joins are required; I just update it when records are added to or deleted from the master clients table. Not ideal from a data-storage point of view, but it solves the problem and means I'm getting results fantastically fast. :)