SQL optimization - slow query - mysql

Given SQL takes 1.2s:
SELECT DISTINCT contracts.id, jt0.id, jt1.id, jt2.id, jt3.id FROM contracts
LEFT JOIN accounts jt0 ON jt0.id = contracts.account_id AND jt0.deleted=0
LEFT JOIN manufacturers jt1 ON jt1.id = contracts.manufacturer_id AND jt1.deleted=0
LEFT JOIN products jt2 ON jt2.id = contracts.product_id AND jt2.deleted=0
LEFT JOIN users jt3 ON jt3.id = contracts.assigned_user_id AND jt3.deleted=0
WHERE contracts.deleted=0
ORDER BY contracts.application_number ASC
LIMIT 0,21
here is what explain extended returns:
id select_type table type possible_keys key key_len ref rows
1 SIMPLE contracts ref idx_contracts_deleted idx_contracts_deleted 2 const 18968 100.00 Using where; Using temporary; Using filesort
1 SIMPLE jt0 eq_ref PRIMARY,idx_accnt_id_del,idx_accnt_assigned_del PRIMARY 108 xxx.contracts.account_id 1 100.00
1 SIMPLE jt1 eq_ref PRIMARY,idx_manufacturers_id_deleted,idx_manufacturers_deleted PRIMARY 108 xxx.contracts.manufacturer_id 1 100.00
1 SIMPLE jt2 eq_ref PRIMARY,idx_products_id_deleted,idx_products_deleted PRIMARY 108 xxx.contracts.product_id 1 100.00
1 SIMPLE jt3 eq_ref PRIMARY,idx_users_id_del,idx_users_id_deleted,idx_users_deleted PRIMARY 108 xxx.contracts.assigned_user_id 1 100.00
I need the distinct, I need all the joins to be left, I need order by and i need limit.
Can i optimize it somehow?

These are the only suggestions i've got
I hope the id's are defined as primary keys and the foreign keys with the relation between the tables.
Maybe the application_number can be indexed (then the sort will be faster)
Maybe, if you are using MyISAM, the sql could be faster if you lock the tables before selecting (don't forget to unlock afterwards)

Try changing the indexes on the subsidiary tables to include the deleted column:
accounts(id, deleted)
manufacturers(id, deleted)
products(id, deleted)
users(id, deleted)
By including all the columns in the index, MySQL has a better opportunity to take advantage of the index.
Another suggestion is to figure out what is causing the duplication in values and to use subqueries to eliminate the duplicates, rather than distinct.
For instance, with the above indexes:
from contracts c left join
(select id
from accounts
where deleted = 0
group by id
) a
on c.account_id = a.id
. . .
The subquery should only use the index, which might speed things up.

First create necessary index on the following columns.
contracts.application_number, manufacturers.deleted, products.deleted, users.deleted
SELECT DISTINCT contracts.id, jt0.id, jt1.id, jt2.id, jt3.id
FROM contracts
LEFT JOIN accounts jt0
ON contracts.deleted=0 AND jt0.id = contracts.account_id
LEFT JOIN manufacturers jt1
ON jt1.deleted=0 AND jt1.id = contracts.manufacturer_id
LEFT JOIN products jt2
ON jt2.deleted=0 AND jt2.id = contracts.product_id
LEFT JOIN users jt3
ON jt3.deleted=0 AND jt3.id = contracts.assigned_user_id
ORDER BY contracts.application_number ASC
LIMIT 0,21
As you have mentioned you are have already index on contracts.deleted
FROM
(SELECT * FROM contracts WHERE contracts.deleted = 0 USE INDEX(<deletedIndexName>))
LEFT JOIN
accounts jt0
ON
jt0.id = contracts.account_id
LEFT JOIN
...

Try a little referential integrity? I bet the query is much faster with inner joins. It should be, because the query optimizer has more to work with. You're paying the price at select time for not taking more care with create.
I would also remove deleted rows to their own tables, and strike the deleted columns. Your queries will be simpler and likely run faster.

try those indexes
CREATE INDEX PAW_IDX1921682121 ON products(deleted,id);
CREATE INDEX PAW_IDX1196677611 ON manufacturers(deleted,id);
CREATE INDEX PAW_IDX1360881332 ON users(deleted,id);
CREATE INDEX PAW_IDX1028958902 ON accounts(deleted,id);
CREATE INDEX PAW_IDX1564931998 ON contracts(deleted,application_number);

Related

Find employees latest activity is slow when adding ORDER BY

I am working on a legacy system in Laravel and I am trying to pull the latest action of some specific types of actions an employee has done.
Performance is good when I don't add ORDER BY. When adding it the query will go from something like 130 ms to 18 seconds. There are about 1.5 million rows in the actions table.
How do I fix the performance problem?
I have tried to isolate the problem by cutting out all the other parts of the query so it is more readable for you:
SELECT
employees.id,
(
SELECT DATE_FORMAT(actions.date, '%Y-%m-%d')
FROM pivot
JOIN actions
ON pivot.actions_id = actions.id
WHERE employees.id = pivot.employee_id
AND (actions.type = 'meeting'
OR (actions.type = 'phone_call'
AND JSON_VALID(actions.data) = 1
AND actions.data->>'$.update_status' = 1))
LIMIT 1
) AS latest_action
FROM employees
ORDER BY latest_action DESC
I tried using LEFT JOIN and MAX() instead but it didn't seem to solve my problem.
I just added a subquery because it was the original query is already very complex. But if you have an alternative suggestion I am all ears.
UPDATE
Result of EXPLAIN:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY employees NULL ALL NULL NULL NULL NULL 15217 10 Using where
2 DEPENDENT SUBQUERY pivot NULL ref actions_type_index,pivot_type_index pivot_type_index 4 dev.employees.id 104 11.11 Using index condition
2 DEPENDENT SUBQUERY actions NULL eq_ref PRIMARY,Logs PRIMARY 4 dev.pivot.actions_id 1 6.68 Using where
UPDATE 2
Here is the indexes. The index employee_type I don't think is important for my specific query, but maybe it should be re-worked?
# pivot table
KEY `actions_type_index` (`actions_id`,`employee_type`),
KEY `pivot_type_index` (`employee_id`,`employee_type`)
# actions table
KEY `Logs` (`type`,`id`,`is_log`)
# I tried to add `date` index to `actions` table but the problem remains.
KEY `date_index` (`date`)
First of all your query is very non-optimal.
I would rewrite it this way:
SELECT
e.id,
DATE_FORMAT(vMAX(a.date), '%Y-%m-%d') AS latest_action
FROM employees e
LEFT JOIN pivot p ON p.employee_id = e.id
LEFT JOIN actions a ON p.actions_id = a.id AND (a.type = 'meeting'
OR (a.type = 'phone_call'
AND JSON_VALID(a.data) = 1
AND a.data->>'$.update_status' = 1))
GROUP BY e.id
ORDER BY latest_action DESC
Obviously there must be indexes on p.employee_id, p.actions_id, a.date. Also would be good on a.type.
Also it would be good to replace a.data->>'$.update_status' with some simple field with an index on it.

Optimizing self join in mysql

The following query takes forever to complete. I've added the indexes on all fields included in the join, tried putting the where conditions into the join and I thought I'd ask for advice before tinkering with FORCE/USE indexes. It just seems that indexes should be used on both sides of this join. Seems only i1 is being used.
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE a ALL i1,i2,i3 2399303 100.00 Using temporary
1 SIMPLE b ref i1,i2,i3 i1 5 db.a.bt 11996 100.00 Using where
create index i1 on obs(bt);
create index i2 on obs(st);
create index i3 on obs(bt,st);
create index i4 on obs(sid);
explain extended
select distinct b.sid
from obs a inner join obs b on a.bt = b.bt and a.st = b.st
where
a.sid != b.sid and
abs( datediff( b.sid_start_date , a.sid_expire_date ) ) < 60;
I've tried both ALTER TABLE and CREATE INDEX above to add indexes to obs.
Since you are not selecting any of the columns in table a, it may be better to use an exists. An exists allows you to check if the information you are looking for is in a specified table without using a join. Removing the join improves the performance. I also like exists because I think it makes the query easier to understand when you come back to it months later.
select distinct b.sid
from obs b
where exists (Select 1
From obs a
Where a.bt = b.bt
and a.st = b.st
and a.sid != b.sid
and abs( datediff( b.sid_start_date , a.sid_expire_date ) ) < 60);

How to make JOINS faster?

I had this query to start out with:
SELECT DISTINCT spentits.*
FROM `spentits`
WHERE (spentits.user_id IN
(SELECT following_id
FROM `follows`
WHERE `follows`.`follower_id` = '44'
AND `follows`.`accepted` = 1)
OR spentits.user_id = '44')
ORDER BY id DESC
LIMIT 15 OFFSET 0
This query takes 10ms to execute.
But once I add a simple join in:
SELECT DISTINCT spentits.*
FROM `spentits`
LEFT JOIN wishlist_items ON wishlist_items.user_id = 44 AND wishlist_items.spentit_id = spentits.id
WHERE (spentits.user_id IN
(SELECT following_id
FROM `follows`
WHERE `follows`.`follower_id` = '44'
AND `follows`.`accepted` = 1)
OR spentits.user_id = '44')
ORDER BY id DESC
LIMIT 15 OFFSET 0
This execute time increased by 11x. Now it takes around 120ms to execute. What's interesting is that if I remove either the LEFT JOIN clause or the ORDER BY id DESC , the time goes back to 10ms.
I am new to databases so I don't understand this. Why is it that removing either one of these clauses speeds it up 11x ? And how can I keep it as is but make it faster?
I have indexes on spentits.user_id, follows.follower_id, follows.accepted, and on primary ids of each table.
EXPLAIN:
1 PRIMARY spentits index index_spentits_on_user_id PRIMARY 4 NULL 15 Using where; Using temporary
1 PRIMARY wishlist_items ref index_wishlist_items_on_user_id,index_wishlist_items_on_spentit_id index_wishlist_items_on_spentit_id 5 spentit.spentits.id 1 Using where; Distinct
2 SUBQUERY follows index_merge index_follows_on_follower_id,index_follows_on_following_id,index_follows_on_accepted
index_follows_on_follower_id,index_follows_on_accepted 5,2 NULL 566 Using intersect(index_follows_on_follower_id,index_follows_on_accepted); Using where
You should have index also on:
wishlist_items.spentit_id
Because you are joining over that column
The LEFT JOIN is easy to explain: A cross product of all entries against all other entries is made. The conditions of the join (in your case: Take all entries on the left and find fitting ones on the right) are applied afterwards. So if your spentits table is large it will take the server some time. Would suggest you get rid of your subquery and make three joins. Start with the smallest table to avoid big amounts of data.
In the 2nd example the subselect runs for every spentits.user_id.
If you write is like this it will be faster because the subselect runs once:
SELECT DISTINCT spentits.*
FROM `spentits`, (SELECT following_id
FROM `follows`
WHERE `follows`.`follower_id` = '44'
AND `follows`.`accepted` = 1)
OR spentits.user_id = '44') as `follow`
LEFT JOIN wishlist_items ON wishlist_items.user_id = 44 AND wishlist_items.spentit_id = spentits.id
WHERE (spentits.user_id IN
(follow)
ORDER BY id DESC
LIMIT 15 OFFSET 0
As you can see the subselect moved to the FROM-part of the query and creates a imaginary tabel (or view).
This imaginary tabel is a inline-view.
JOINs and inline-views are faster every time than a subselect in the WHERE-part.

Improving query performance/rewriting query to be faster on MySQL

I have a couple of queries that run very slowly (several minutes) with the data currently in my database, and I'd like to improve their performance. Unfortunately they're kind of complex so the info I'm getting via google isn't enough for me to figure out what indexes to add or if I need to rewrite my queries or what... I'm hoping someone can help. I don't think they should be this slow, if things were set up properly.
The first query is:
SELECT i.name, i.id, COUNT(c.id)
FROM cert_certificates c
JOIN cert_histories h ON h.cert_certificate_id = c.id
LEFT OUTER JOIN inspectors i ON h.inspector_id = i.id
LEFT OUTER JOIN cert_histories h2
ON (h2.cert_certificate_id = c.id AND h.date_changed < h2.date_changed)
WHERE (h.cert_status_ref_id = ? OR h.cert_status_ref_id = ?)
AND h2.id IS NULL
GROUP BY i.id, i.name
ORDER BY i.name
The second query is:
SELECT l.letter, c.number
FROM cert_certificates c
JOIN cert_type_letter_refs l ON c.cert_type_letter_ref_id = l.id
JOIN cert_histories h ON h.cert_certificate_id = c.id
LEFT OUTER JOIN cert_histories h2
ON (h2.cert_certificate_id = c.id AND h.date_changed < h2.date_changed)
WHERE h.cert_status_ref_id = ?
AND h2.id IS NULL
AND h.inspector_id = ?
ORDER BY l.letter, c.number
The cert_certificates table contains nearly 19k records as does the cert_histories table (although in the future this table is expected to grow to approximately 2-3x the size of the cert_certificates table). The other tables are all quite small; less than 10 records each.
The only indexes right now are on id for each table and on cert_certificates.number. I read in a couple of places (e.g. here) to add indices for foreign keys, but in the case of the cert_histories table that'd be nearly all the columns (cert_certificate_id, inspector_id, cert_status_ref_id) which is also not advisable (according to some of the answers on that question e.g. Markus Winand's), so I'm kinda lost.
Any help would be greatly appreciated.
ETA: The results from EXPLAIN on the first query are (sorry for the hideous formatting; I'm using SQLyog which presents it in a nice table but it seems StackOverflow doesn't support tables?):
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE h ALL NULL NULL NULL NULL 19740 Using where; Using temporary; Using filesort
1 SIMPLE i ref index_inspectors_on_id index_inspectors_on_id 768 marketing_development.h.inspector_id 1
1 SIMPLE c ref index_cert_certificates_on_id index_cert_certificates_on_id 768 marketing_development.h.cert_certificate_id 91 Using where; Using index
1 SIMPLE h2 ALL NULL NULL NULL NULL 19740 Using where
Second query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE h ALL NULL NULL NULL NULL 19795 Using where; Using temporary; Using filesort
1 SIMPLE c ref index_cert_certificates_on_id index_cert_certificates_on_id 768 marketing_development.h.cert_certificate_id 91 Using where
1 SIMPLE l ALL index_cert_type_letter_refs_on_id NULL NULL NULL 5 Using where; Using join buffer
1 SIMPLE h2 ALL NULL NULL NULL NULL 19795 Using where
You should create indices on your join fields:
cert_certificates.cert_type_letter_ref_id
cert_histories.cert_certificate_id
cert_histories.date_changed
cert_histories.inspector_id

Left JOIN faster or Inner Join faster?

So... which one is faster (NULl value is not an issue), and are indexed.
SELECT * FROM A
JOIN B b ON b.id = a.id
JOIN C c ON c.id = b.id
WHERE A.id = '12345'
Using Left Joins:
SELECT * FROM A
LEFT JOIN B ON B.id=A.bid
LEFT JOIN C ON C.id=B.cid
WHERE A.id = '12345'
Here is the actual query
Here it is.. both return the same result
Query (0.2693sec) :
EXPLAIN EXTENDED SELECT *
FROM friend_events, zcms_users, user_events,
EVENTS WHERE friend_events.userid = '13006'
AND friend_events.state =0
AND UNIX_TIMESTAMP( friend_events.t ) >=1258923485
AND friend_events.xid = user_events.id
AND user_events.eid = events.eid
AND events.active =1
AND zcms_users.id = user_events.userid
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE zcms_users ALL PRIMARY NULL NULL NULL 43082
1 SIMPLE user_events ref PRIMARY,eid,userid userid 4 zcms_users.id 1
1 SIMPLE events eq_ref PRIMARY,active PRIMARY4 user_events.eid 1 Using where
1 SIMPLE friend_events eq_ref PRIMARY PRIMARY 8 user_events.id,const 1 Using where
LEFTJOIN QUERY: (0.0393 sec)
EXPLAIN EXTENDED SELECT *
FROM `friend_events`
LEFT JOIN `user_events` ON user_events.id = friend_events.xid
LEFT JOIN `events` ON user_events.eid = events.eid
LEFT JOIN `zcms_users` ON user_events.userid = zcms_users.id
WHERE (
events.active =1
)
AND (
friend_events.userid = '13006'
)
AND (
friend_events.state =0
)
AND (
UNIX_TIMESTAMP( friend_events.t ) >=1258923485
)
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE friend_events ALL PRIMARY NULL NULL NULL 53113 Using where
1 SIMPLE user_events eq_ref PRIMARY,eid PRIMARY 4 friend_events.xid 1 Using where
1 SIMPLE zcms_users eq_ref PRIMARY PRIMARY 4 user_events.userid 1
1 SIMPLE events eq_ref PRIMARY,active PRIMARY 4 user_events.eid 1 Using where
It depends; run them both to find out; then run an 'explain select' for an explanation.
The actual performance difference may range from "virtually non-existent" to "pretty significant" depending on how many rows in A with id='12345' have no matching records in B and C.
Update (based on posted query plans)
When you use INNER JOIN it doesn't matter (results-wise, not performance-wise) which table to start with, so optimizer tries to pick the one it thinks would perform best. It seems you have indexes on all appropriate PK / FK columns and you either don't have an index on friend_events.userid or there are too many records with userid = '13006' and it's not being used; either way optimizer picks the table with less rows as "base" - in this case it's zcms_users.
When you use LEFT JOIN it does matter (results-wise) which table to start with; thus friend_events is picked. Now why it takes less time that way I'm not quite sure; I'm guessing friend_events.userid condition helps. If you were to add an index (is it really varchar, btw? not numeric?) on that, your INNER JOIN might behave differently (and become faster) as well.
The INNER JOIN has to do an extra check to remove any records from A that don't have matching records in B and C. Depending on the number of records initially returned from A it COULD have an impact.
LEFT JOIN shows all data from A and only shows data from B/C only if the condition is true. As for INNER JOIN, it has to do some extra checking on both tables. So, I guess that explains why LEFT JOIN is faster.
Use EXPLAIN to see the query plan. It's probably the same plan for both cases, so I doubt it makes much difference, assuming there are no rows that don't match. But these are two different queries so it really doesn't make sense to compare them - you should just use the correct one.
Why not use the "INNER JOIN" keyword instead of "LEFT JOIN"?