Improving query performance/rewriting query to be faster on MySQL - mysql

I have a couple of queries that run very slowly (several minutes) with the data currently in my database, and I'd like to improve their performance. Unfortunately they're kind of complex so the info I'm getting via google isn't enough for me to figure out what indexes to add or if I need to rewrite my queries or what... I'm hoping someone can help. I don't think they should be this slow, if things were set up properly.
The first query is:
SELECT i.name, i.id, COUNT(c.id)
FROM cert_certificates c
JOIN cert_histories h ON h.cert_certificate_id = c.id
LEFT OUTER JOIN inspectors i ON h.inspector_id = i.id
LEFT OUTER JOIN cert_histories h2
ON (h2.cert_certificate_id = c.id AND h.date_changed < h2.date_changed)
WHERE (h.cert_status_ref_id = ? OR h.cert_status_ref_id = ?)
AND h2.id IS NULL
GROUP BY i.id, i.name
ORDER BY i.name
The second query is:
SELECT l.letter, c.number
FROM cert_certificates c
JOIN cert_type_letter_refs l ON c.cert_type_letter_ref_id = l.id
JOIN cert_histories h ON h.cert_certificate_id = c.id
LEFT OUTER JOIN cert_histories h2
ON (h2.cert_certificate_id = c.id AND h.date_changed < h2.date_changed)
WHERE h.cert_status_ref_id = ?
AND h2.id IS NULL
AND h.inspector_id = ?
ORDER BY l.letter, c.number
The cert_certificates table contains nearly 19k records as does the cert_histories table (although in the future this table is expected to grow to approximately 2-3x the size of the cert_certificates table). The other tables are all quite small; less than 10 records each.
The only indexes right now are on id for each table and on cert_certificates.number. I read in a couple of places (e.g. here) to add indices for foreign keys, but in the case of the cert_histories table that'd be nearly all the columns (cert_certificate_id, inspector_id, cert_status_ref_id) which is also not advisable (according to some of the answers on that question e.g. Markus Winand's), so I'm kinda lost.
Any help would be greatly appreciated.
ETA: The results from EXPLAIN on the first query are (sorry for the hideous formatting; I'm using SQLyog which presents it in a nice table but it seems StackOverflow doesn't support tables?):
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE h ALL NULL NULL NULL NULL 19740 Using where; Using temporary; Using filesort
1 SIMPLE i ref index_inspectors_on_id index_inspectors_on_id 768 marketing_development.h.inspector_id 1
1 SIMPLE c ref index_cert_certificates_on_id index_cert_certificates_on_id 768 marketing_development.h.cert_certificate_id 91 Using where; Using index
1 SIMPLE h2 ALL NULL NULL NULL NULL 19740 Using where
Second query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE h ALL NULL NULL NULL NULL 19795 Using where; Using temporary; Using filesort
1 SIMPLE c ref index_cert_certificates_on_id index_cert_certificates_on_id 768 marketing_development.h.cert_certificate_id 91 Using where
1 SIMPLE l ALL index_cert_type_letter_refs_on_id NULL NULL NULL 5 Using where; Using join buffer
1 SIMPLE h2 ALL NULL NULL NULL NULL 19795 Using where

You should create indices on your join fields:
cert_certificates.cert_type_letter_ref_id
cert_histories.cert_certificate_id
cert_histories.date_changed
cert_histories.inspector_id
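As a rough sketch, the indexes listed above could be created like this. The index names are made up for illustration. The composite index at the end is an extra assumption and not part of the list above: both queries self-join cert_histories on cert_certificate_id and compare date_changed, so a single index over both columns may serve that join better than two separate ones.

```sql
-- Index names are hypothetical; use whatever fits your naming scheme.
CREATE INDEX idx_certs_type_letter ON cert_certificates (cert_type_letter_ref_id);
CREATE INDEX idx_hist_certificate  ON cert_histories (cert_certificate_id);
CREATE INDEX idx_hist_date_changed ON cert_histories (date_changed);
CREATE INDEX idx_hist_inspector    ON cert_histories (inspector_id);

-- Assumption, not from the list above: a composite index for the h/h2 self-join.
-- If you keep this one, idx_hist_certificate becomes redundant (same leading column).
CREATE INDEX idx_hist_cert_date ON cert_histories (cert_certificate_id, date_changed);
```

Re-running EXPLAIN afterwards should show whether h and h2 are still read with type ALL.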

Related

Find employees latest activity is slow when adding ORDER BY

I am working on a legacy system in Laravel and I am trying to pull the latest action of some specific types of actions an employee has done.
Performance is good when I don't add ORDER BY. When adding it the query will go from something like 130 ms to 18 seconds. There are about 1.5 million rows in the actions table.
How do I fix the performance problem?
I have tried to isolate the problem by cutting out all the other parts of the query so it is more readable for you:
SELECT
employees.id,
(
SELECT DATE_FORMAT(actions.date, '%Y-%m-%d')
FROM pivot
JOIN actions
ON pivot.actions_id = actions.id
WHERE employees.id = pivot.employee_id
AND (actions.type = 'meeting'
OR (actions.type = 'phone_call'
AND JSON_VALID(actions.data) = 1
AND actions.data->>'$.update_status' = 1))
LIMIT 1
) AS latest_action
FROM employees
ORDER BY latest_action DESC
I tried using LEFT JOIN and MAX() instead but it didn't seem to solve my problem.
I only used a subquery because the original query is already very complex. But if you have an alternative suggestion I am all ears.
UPDATE
Result of EXPLAIN:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY employees NULL ALL NULL NULL NULL NULL 15217 10 Using where
2 DEPENDENT SUBQUERY pivot NULL ref actions_type_index,pivot_type_index pivot_type_index 4 dev.employees.id 104 11.11 Using index condition
2 DEPENDENT SUBQUERY actions NULL eq_ref PRIMARY,Logs PRIMARY 4 dev.pivot.actions_id 1 6.68 Using where
UPDATE 2
Here are the indexes. I don't think the employee_type part of the indexes matters for my specific query, but maybe it should be reworked?
# pivot table
KEY `actions_type_index` (`actions_id`,`employee_type`),
KEY `pivot_type_index` (`employee_id`,`employee_type`)
# actions table
KEY `Logs` (`type`,`id`,`is_log`)
# I tried to add `date` index to `actions` table but the problem remains.
KEY `date_index` (`date`)
First of all your query is very non-optimal.
I would rewrite it this way:
SELECT
e.id,
DATE_FORMAT(MAX(a.date), '%Y-%m-%d') AS latest_action
FROM employees e
LEFT JOIN pivot p ON p.employee_id = e.id
LEFT JOIN actions a ON p.actions_id = a.id AND (a.type = 'meeting'
OR (a.type = 'phone_call'
AND JSON_VALID(a.data) = 1
AND a.data->>'$.update_status' = 1))
GROUP BY e.id
ORDER BY latest_action DESC
Obviously there must be indexes on p.employee_id, p.actions_id and a.date. An index on a.type would also help.
It would also be good to replace a.data->>'$.update_status' with a plain column that has an index on it.
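One possible way to get such a plain, indexable column without changing the application is a stored generated column (available in MySQL 5.7 and later). This is only a sketch under assumptions: the column name update_status, its VARCHAR(32) type and the index name are invented here, and data is assumed to be a text column that may hold invalid JSON (which is why the original query calls JSON_VALID).

```sql
-- Hypothetical column and index names; adjust the type to the values you actually store.
ALTER TABLE actions
  ADD COLUMN update_status VARCHAR(32)
    GENERATED ALWAYS AS (
      -- Guard against rows whose data is not valid JSON
      IF(JSON_VALID(data),
         JSON_UNQUOTE(JSON_EXTRACT(data, '$.update_status')),
         NULL)
    ) STORED,
  ADD INDEX idx_actions_type_update_status (type, update_status);

-- The JSON part of the join condition could then become a plain comparison:
--   a.type = 'meeting' OR (a.type = 'phone_call' AND a.update_status = '1')
```

Because the column is a string, compare it with '1' rather than 1 so that the index can be used.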

Can't figure out why this MySQL query is slow

I have one particular MySQL query which is slow, and I can't figure out why.
SELECT
s.title,
p.minPrice,
s.booking, r.url
FROM shows s
INNER JOIN showResources r
ON r.showID = s.id
INNER JOIN performances p
ON p.showID = s.id
WHERE s.lastDate >= CURDATE()
AND r.type = 'rectangle-poster'
AND p.minPrice > 0
GROUP BY s.id
ORDER BY p.minPrice ASC
LIMIT 30
The EXPLAIN for this query is as follows:
id select_type table type possible_keys key key_len ref rows extra
1 SIMPLE s range PRIMARY,lastDate lastDate 4 NULL 291 Using index condition; Using temporary; Using filesort
1 SIMPLE r ref showID,type showID 5 thistle.s.id 1 Using where
1 SIMPLE p ref showID,minPrice showID 5 thistle.s.id 1 Using where
Other, seemingly far more complex queries on the same server are blisteringly fast - but this one typically takes about 4 seconds to run, and I just can't figure out why. I've even gone as far as deleting the tables and recreating them just in case it was some weird corruption, but no luck. Can a MySQL expert tell me what I'm doing wrong here?
Try this:
SELECT
s.id AS id,
s.title,
p.minPrice AS min_price,
s.booking,
r.url
FROM shows s
INNER JOIN showResources r
ON r.showID = s.id AND s.lastDate >= CURDATE() AND r.type = 'rectangle-poster'
INNER JOIN performances p
ON p.showID = s.id AND p.minPrice > 0
GROUP BY id
ORDER BY min_price ASC
LIMIT 30
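Separately from speed, note that grouping by the show id while selecting p.minPrice without an aggregate lets MySQL pick the price from an arbitrary performance of each show (and the statement is rejected when ONLY_FULL_GROUP_BY is enabled). A sketch of a deterministic variant, assuming the cheapest performance per show is what is wanted and that any one matching poster URL will do:

```sql
SELECT
  s.id,
  s.title,
  MIN(p.minPrice) AS min_price,  -- cheapest qualifying performance of the show
  s.booking,
  MIN(r.url) AS url              -- assumption: any one matching poster is acceptable
FROM shows s
INNER JOIN showResources r
  ON r.showID = s.id AND r.type = 'rectangle-poster'
INNER JOIN performances p
  ON p.showID = s.id AND p.minPrice > 0
WHERE s.lastDate >= CURDATE()
GROUP BY s.id, s.title, s.booking
ORDER BY min_price ASC
LIMIT 30;
```

This does not by itself remove the temporary table and filesort shown in the EXPLAIN output; it only makes the result well defined.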

MySQL Query Times out - Need to speed it up

I whipped up a query here that does something particular with retrieving results that do not match the join (as suggested by this SO question).
SELECT cf.f_id
FROM comments_following AS cf
INNER JOIN comments AS c ON cf.c_id = c.id
WHERE NOT EXISTS (
SELECT 1 FROM follows WHERE f_id = cf.f_id
)
Any ideas on how to speed this up? There are anywhere from 30k-200k rows it's looking through and appears to be using indexes, but the query times out.
EXPLAIN/DESCRIBE Info:
1 PRIMARY c ALL PRIMARY NULL NULL NULL 39119
1 PRIMARY cf ref c_id, c_id_2 c_id 8 ...c.id 11 Using where; Using index
2 DEPENDENT SUBQUERY following index NULL PRIMARY 8 NULL 35612 Using where; Using index
The comments table isn't used explicitly in the query. Is it being used for filtering? If not, try:
SELECT cf.f_id
FROM comments_following cf
WHERE NOT EXISTS (
SELECT 1 FROM follows WHERE follows.f_id = cf.f_id
)
By the way, if this generates a syntax error (because follows.f_id does not exist), then that is the problem. In that case, you would think you have a correlated subquery, but there is not really one.
Or the left outer join version:
SELECT cf.f_id
FROM comments_following cf left outer join
follows f
on f.f_id = cf.f_id
where f.f_id is null
Having an index on follows(f_id) should make both these versions run faster.
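A minimal sketch of that index (the index name is hypothetical):

```sql
-- Lets both the NOT EXISTS probe and the LEFT JOIN look up follows rows by f_id
CREATE INDEX idx_follows_f_id ON follows (f_id);
```

With it in place, EXPLAIN should no longer show a full index scan over tens of thousands of rows for every outer row.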
LEFT JOIN is sometimes faster than WHERE NOT EXISTS subqueries. Try:
SELECT cf.f_id
FROM comments_following AS cf
INNER JOIN comments AS c ON cf.c_id = c.id
LEFT JOIN follows AS f ON f.f_id = cf.f_id
WHERE f.f_id IS NULL
The answer to this problem was to place a second index on follows.f_id.

SQL optimization - slow query

Given SQL takes 1.2s:
SELECT DISTINCT contracts.id, jt0.id, jt1.id, jt2.id, jt3.id FROM contracts
LEFT JOIN accounts jt0 ON jt0.id = contracts.account_id AND jt0.deleted=0
LEFT JOIN manufacturers jt1 ON jt1.id = contracts.manufacturer_id AND jt1.deleted=0
LEFT JOIN products jt2 ON jt2.id = contracts.product_id AND jt2.deleted=0
LEFT JOIN users jt3 ON jt3.id = contracts.assigned_user_id AND jt3.deleted=0
WHERE contracts.deleted=0
ORDER BY contracts.application_number ASC
LIMIT 0,21
here is what explain extended returns:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE contracts ref idx_contracts_deleted idx_contracts_deleted 2 const 18968 100.00 Using where; Using temporary; Using filesort
1 SIMPLE jt0 eq_ref PRIMARY,idx_accnt_id_del,idx_accnt_assigned_del PRIMARY 108 xxx.contracts.account_id 1 100.00
1 SIMPLE jt1 eq_ref PRIMARY,idx_manufacturers_id_deleted,idx_manufacturers_deleted PRIMARY 108 xxx.contracts.manufacturer_id 1 100.00
1 SIMPLE jt2 eq_ref PRIMARY,idx_products_id_deleted,idx_products_deleted PRIMARY 108 xxx.contracts.product_id 1 100.00
1 SIMPLE jt3 eq_ref PRIMARY,idx_users_id_del,idx_users_id_deleted,idx_users_deleted PRIMARY 108 xxx.contracts.assigned_user_id 1 100.00
I need the DISTINCT, I need all the joins to be LEFT joins, and I need the ORDER BY and the LIMIT.
Can I optimize it somehow?
These are the only suggestions I've got:
I hope the id's are defined as primary keys and the foreign keys with the relation between the tables.
Maybe the application_number can be indexed (then the sort will be faster)
Maybe, if you are using MyISAM, the sql could be faster if you lock the tables before selecting (don't forget to unlock afterwards)
Try changing the indexes on the subsidiary tables to include the deleted column:
accounts(id, deleted)
manufacturers(id, deleted)
products(id, deleted)
users(id, deleted)
By including all the columns in the index, MySQL has a better opportunity to take advantage of the index.
Another suggestion is to figure out what is causing the duplication in values and to use subqueries to eliminate the duplicates, rather than distinct.
For instance, with the above indexes:
from contracts c left join
(select id
from accounts
where deleted = 0
group by id
) a
on c.account_id = a.id
. . .
The subquery should only use the index, which might speed things up.
First, create the necessary indexes on the following columns:
contracts.application_number, manufacturers.deleted, products.deleted, users.deleted
SELECT DISTINCT contracts.id, jt0.id, jt1.id, jt2.id, jt3.id
FROM contracts
LEFT JOIN accounts jt0
ON contracts.deleted=0 AND jt0.id = contracts.account_id
LEFT JOIN manufacturers jt1
ON jt1.deleted=0 AND jt1.id = contracts.manufacturer_id
LEFT JOIN products jt2
ON jt2.deleted=0 AND jt2.id = contracts.product_id
LEFT JOIN users jt3
ON jt3.deleted=0 AND jt3.id = contracts.assigned_user_id
ORDER BY contracts.application_number ASC
LIMIT 0,21
As you mentioned, you already have an index on contracts.deleted, so you can hint MySQL to use it:
FROM
(SELECT * FROM contracts USE INDEX (<deletedIndexName>) WHERE contracts.deleted = 0) AS contracts
LEFT JOIN
accounts jt0
ON
jt0.id = contracts.account_id
LEFT JOIN
...
Try a little referential integrity? I bet the query is much faster with inner joins. It should be, because the query optimizer has more to work with. You're paying the price at select time for not taking more care with create.
I would also remove deleted rows to their own tables, and strike the deleted columns. Your queries will be simpler and likely run faster.
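A rough sketch of moving soft-deleted rows into their own table, shown for accounts only. The archive table name is hypothetical, and it is assumed that any non-zero value of deleted marks a removed row (the query above only tells us that 0 means "live").

```sql
-- Hypothetical archive table with the same columns and indexes as accounts
CREATE TABLE accounts_deleted LIKE accounts;

-- The transaction only protects the move on a transactional engine such as InnoDB
START TRANSACTION;
INSERT INTO accounts_deleted SELECT * FROM accounts WHERE deleted <> 0;
DELETE FROM accounts WHERE deleted <> 0;
COMMIT;
```

If contracts still reference archived accounts, foreign key constraints (if you have any) may block the DELETE, so check the references first. Once every table is cleaned up this way, the deleted = 0 conditions can be dropped from the joins.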
Try these indexes:
CREATE INDEX PAW_IDX1921682121 ON products(deleted,id);
CREATE INDEX PAW_IDX1196677611 ON manufacturers(deleted,id);
CREATE INDEX PAW_IDX1360881332 ON users(deleted,id);
CREATE INDEX PAW_IDX1028958902 ON accounts(deleted,id);
CREATE INDEX PAW_IDX1564931998 ON contracts(deleted,application_number);

Left JOIN faster or Inner Join faster?

So... which one is faster? (NULL values are not an issue, and the join columns are indexed.)
SELECT * FROM A
JOIN B b ON b.id = a.id
JOIN C c ON c.id = b.id
WHERE A.id = '12345'
Using Left Joins:
SELECT * FROM A
LEFT JOIN B ON B.id=A.bid
LEFT JOIN C ON C.id=B.cid
WHERE A.id = '12345'
Here is the actual query
Here it is.. both return the same result
Query (0.2693sec) :
EXPLAIN EXTENDED SELECT *
FROM friend_events, zcms_users, user_events,
EVENTS WHERE friend_events.userid = '13006'
AND friend_events.state =0
AND UNIX_TIMESTAMP( friend_events.t ) >=1258923485
AND friend_events.xid = user_events.id
AND user_events.eid = events.eid
AND events.active =1
AND zcms_users.id = user_events.userid
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE zcms_users ALL PRIMARY NULL NULL NULL 43082
1 SIMPLE user_events ref PRIMARY,eid,userid userid 4 zcms_users.id 1
1 SIMPLE events eq_ref PRIMARY,active PRIMARY 4 user_events.eid 1 Using where
1 SIMPLE friend_events eq_ref PRIMARY PRIMARY 8 user_events.id,const 1 Using where
LEFT JOIN query (0.0393 sec):
EXPLAIN EXTENDED SELECT *
FROM `friend_events`
LEFT JOIN `user_events` ON user_events.id = friend_events.xid
LEFT JOIN `events` ON user_events.eid = events.eid
LEFT JOIN `zcms_users` ON user_events.userid = zcms_users.id
WHERE (
events.active =1
)
AND (
friend_events.userid = '13006'
)
AND (
friend_events.state =0
)
AND (
UNIX_TIMESTAMP( friend_events.t ) >=1258923485
)
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE friend_events ALL PRIMARY NULL NULL NULL 53113 Using where
1 SIMPLE user_events eq_ref PRIMARY,eid PRIMARY 4 friend_events.xid 1 Using where
1 SIMPLE zcms_users eq_ref PRIMARY PRIMARY 4 user_events.userid 1
1 SIMPLE events eq_ref PRIMARY,active PRIMARY 4 user_events.eid 1 Using where
It depends; run them both to find out; then run an 'explain select' for an explanation.
The actual performance difference may range from "virtually non-existent" to "pretty significant" depending on how many rows in A with id='12345' have no matching records in B and C.
Update (based on posted query plans)
When you use INNER JOIN it doesn't matter (results-wise, not performance-wise) which table to start with, so the optimizer tries to pick the one it thinks would perform best. It seems you have indexes on all appropriate PK / FK columns, and either you don't have an index on friend_events.userid or there are too many records with userid = '13006' for it to be used; either way the optimizer picks the table with fewer rows as the "base", which in this case is zcms_users.
When you use LEFT JOIN it does matter (results-wise) which table to start with; thus friend_events is picked. Now why it takes less time that way I'm not quite sure; I'm guessing friend_events.userid condition helps. If you were to add an index (is it really varchar, btw? not numeric?) on that, your INNER JOIN might behave differently (and become faster) as well.
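A sketch of what that could look like. The index name and the choice of a three-column index are assumptions, and t is assumed to be a DATETIME or TIMESTAMP column. The date condition is also rewritten so that the column is compared directly instead of being wrapped in UNIX_TIMESTAMP(), because MySQL cannot use an index on t for a range condition when t is passed through a function.

```sql
-- Hypothetical index covering the three filters on friend_events
CREATE INDEX idx_friend_events_user_state_t ON friend_events (userid, state, t);

SELECT *
FROM friend_events
INNER JOIN user_events ON user_events.id = friend_events.xid
INNER JOIN events      ON events.eid = user_events.eid
INNER JOIN zcms_users  ON zcms_users.id = user_events.userid
WHERE friend_events.userid = '13006'
  AND friend_events.state = 0
  -- same cut-off as UNIX_TIMESTAMP(friend_events.t) >= 1258923485
  AND friend_events.t >= FROM_UNIXTIME(1258923485)
  AND events.active = 1;
```

Running EXPLAIN on this version should show whether the optimizer now starts from friend_events instead of scanning zcms_users.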
The INNER JOIN has to do an extra check to remove any records from A that don't have matching records in B and C. Depending on the number of records initially returned from A it COULD have an impact.
LEFT JOIN returns all rows from A and adds data from B/C only where the join condition is true. INNER JOIN, on the other hand, has to do some extra checking on both tables. So, I guess that explains why LEFT JOIN is faster.
Use EXPLAIN to see the query plan. It's probably the same plan for both cases, so I doubt it makes much difference, assuming there are no rows that don't match. But these are two different queries so it really doesn't make sense to compare them - you should just use the correct one.
Why not use the "INNER JOIN" keyword instead of "LEFT JOIN"?