MySQL and making subquery more efficient - mysql

I'm using MySQL and I have a query. There is also a subquery.
SELECT * FROM rg, list, status
WHERE (
(rg.required_status_id IS NULL AND rg.incorrect_status_id IS NULL) ||
(status.season_id = rg.required_status_id AND status.user_id = list.user_id) ||
(rg.incorrect_status_id IS NOT NULL AND
list.user_id NOT IN (SELECT user_id FROM status WHERE user_id = list.user_id AND season_id = rg.incorrect_status_id)
)
)
The problem is the following part of the code:
(rg.incorrect_status_id IS NOT NULL AND
list.user_id NOT IN (SELECT user_id FROM status WHERE user_id = list.user_id AND season_id = rg.incorrect_status_id)
)
How could I check if the table "status" has a row where user_id is same as list.user_id and season_id is same as rg.incorrect_status_id?
Update
Here is my current code, but it does not work at all. I do not know what to do.
SELECT * FROM rg, list, status
LEFT JOIN status AS stat
INNER JOIN rg AS rglist
ON rglist.incorrect_status_id = stat.season_id
ON stat.season_id = rglist.incorrect_status_id
WHERE (
(rg.required_status_id IS NULL AND rg.incorrect_status_id IS NULL) ||
(status.season_id = rg.required_status_id AND status.user_id = list.user_id) ||
(rg.incorrect_status_id IS NOT NULL AND stat.user_id IS NULL)
)
)
Update 2
I modified the names, but the basic idea is same.
FROM sarjojen_rglistat, sarjojen_rglistojen_osakilpailut, kilpailukausien_kilpailut, sarjojen_osakilpailuiden_rgpisteet
, sarjojen_kilpailukaudet, sarjojen_kilpailukausien_kilpailusysteemit
/* , kayttajien_ilmoittautumiset */
/* , sarjojen_kilpailukausien_pelaajastatukset */
LEFT OUTER JOIN sarjojen_kilpailukausien_pelaajastatukset
ON sarjojen_kilpailukausien_pelaajastatukset.sarjan_kilpailukausi_id = sarjojen_rglistat.vaadittu_pelaajastatus_id
LEFT OUTER JOIN kayttajien_ilmoittautumiset
ON kayttajien_ilmoittautumiset.kayttaja_id = sarjojen_kilpailukausien_pelaajastatukset.kayttaja_id
Now this says:
Column not found: 1054 Unknown column 'sarjojen_rglistat.vaadittu_pelaajastatus_id' in 'on clause'
Why is that so?
I have a table called "sarjojen_rglistat" and there is a column "vaadittu_pelaajastatus_id".

1) Simpler queries are easier for the query engine to interpret and produce an efficient plan.
If you pay careful attention to the following part of your query, you may realise something a little "weird" is going. This is a clue the approach is perhaps a little too complicated.
...(
list.user_id NOT IN (
SELECT user_id
FROM status
/* Note the sub-query cannot ever return a user_id different
to the one checked with "NOT IN" above */
WHERE user_id = list.user_id
AND season_id = rg.incorrect_status_id)
)
The query filtering where list.user_id is not in a result set that cannot contain user_id's other than list.user_id. Of course the sub-query could return zero results. So basically it boils down to a simple existence check.
So for a start, you should rather write:
...(
NOT EXISTS (
SELECT *
FROM status
WHERE user_id = list.user_id
AND season_id = rg.incorrect_status_id)
)
2) Be clear about your "what joins the tables together" (this refers back to 1 as well).
Your query selects from 3 tables without specifying any join conditions:
FROM rg, list, status
This would result in a cross join producing a result set that is a permutation combination of all possible row matches. If your WHERE clause were simple, the query engine might be able to implicitly promote certain filter conditions into join conditions, but that's not the case. So even if for example you have a very small number of rows in each table:
status 20
rg 100
list 1000
Your intermediate result set (before WHERE is applied),
would need 1000 * 100 * 20 = 2000000 rows!
It helps tremendously to make it clear with join conditions how the rows of each table are intended to match up. Not only does it make the query easier to read and understand, but it also helps avoid overlooking join conditions which can be the bane of performance considerations.
Note that when specifying join conditions, some rows might not have matches and this is where knowing and understanding the different types of joins is extremely important. Particularly in your case, most of the complexity in your WHERE clause seems to come from trying resolve when rows do/do not match. See this answer for some useful information.
Your FROM/WHERE clause should probably look more like the following. (Difficult to be certain because you haven't stated your table relationships or expected input/output of your query. But it should set you on the right track.)
FROM rg
/* Assumes rg rows form the base of the query, and not to have
some rg rows excluded due to non-matches in list or status. */
LEFT OUTER JOIN status ON
status.season_id = rg.required_status_id
LEFT OUTER JOIN list ON
status.user_id = list.user_id
WHERE rg.incorrect_status_id IS NULL
/* As Barmar commented, it may also be useful to break this
OR condition out as a separate query UNION to the above. */
OR (
rg.incorrect_status_id IS NOT NULL
AND NOT EXISTS (
SELECT *
FROM status
WHERE user_id = list.user_id
AND season_id = rg.incorrect_status_id)
)
Note that this query is very clear about the distinction between how the tables are joined, and what is used to filter the joined result set.
3) Finally and very importantly, even the best queries are of little benefit without the correct indexes!
A good query with bad indexes (or conversely a bad query with good indexes) is going to be inefficient either way. Computers are fast enough that you might not notice on small databases, but you do experiment with candidate indexes to find the best combination for your data and workload.
In the above query you likely need indexes on the following. (Some may already be covered by Primary Key constraints.)
status.season_id
status.user_id
list.user_id
rg.required_status_id
rg.incorrect_status_id

Use a UNION of subqueries that handle the 3 cases that you're combining with OR. You can then use explicit JOIN in each subquery to make it clear how the tables are related to each other (or not related at all when you're doing a full cross-product, as is the case when rg.required_status_id IS NULL AND rg.incorrect_status_id IS NULL).
SELECT rg.*, list.*, status.*
FROM rg
CROSS JOIN list
CROSS JOIN status
WHERE rg.required_status_id IS NULL AND rg.incorrect_status_id IS NULL
UNION ALL
SELECT rg.*, list.*, status.*
FROM rg
JOIN status ON rg.required_status_id = status.season_id
JOIN list ON status.user_id = list.user_id
UNION ALL
SELECT rg.*, list.*, status.*
FROM rg
CROSS JOIN list
LEFT JOIN status ON status.user_id = list.user_id AND status.season_id = rg.required_status_id
WHERE rg.incorrect_status_id IS NOT NULL AND status.season_id IS NULL

Related

Not quite a good enough JOIN? [duplicate]

I need to retrieve all default settings from the settings table but also grab the character setting if exists for x character.
But this query is only retrieving those settings where character is = 1, not the default settings if the user havent setted anyone.
SELECT `settings`.*, `character_settings`.`value`
FROM (`settings`)
LEFT JOIN `character_settings`
ON `character_settings`.`setting_id` = `settings`.`id`
WHERE `character_settings`.`character_id` = '1'
So i should need something like this:
array(
'0' => array('somekey' => 'keyname', 'value' => 'thevalue'),
'1' => array('somekey2' => 'keyname2'),
'2' => array('somekey3' => 'keyname3')
)
Where key 1 and 2 are the default values when key 0 contains the default value with the character value.
The where clause is filtering away rows where the left join doesn't succeed. Move it to the join:
SELECT `settings`.*, `character_settings`.`value`
FROM `settings`
LEFT JOIN
`character_settings`
ON `character_settings`.`setting_id` = `settings`.`id`
AND `character_settings`.`character_id` = '1'
When making OUTER JOINs (ANSI-89 or ANSI-92), filtration location matters because criteria specified in the ON clause is applied before the JOIN is made. Criteria against an OUTER JOINed table provided in the WHERE clause is applied after the JOIN is made. This can produce very different result sets. In comparison, it doesn't matter for INNER JOINs if the criteria is provided in the ON or WHERE clauses -- the result will be the same.
SELECT s.*,
cs.`value`
FROM SETTINGS s
LEFT JOIN CHARACTER_SETTINGS cs ON cs.setting_id = s.id
AND cs.character_id = 1
If I understand your question correctly you want records from the settings database if they don't have a join accross to the character_settings table or if that joined record has character_id = 1.
You should therefore do
SELECT `settings`.*, `character_settings`.`value`
FROM (`settings`)
LEFT OUTER JOIN `character_settings`
ON `character_settings`.`setting_id` = `settings`.`id`
WHERE `character_settings`.`character_id` = '1' OR
`character_settings`.character_id is NULL
You might find it easier to understand by using a simple subquery
SELECT `settings`.*, (
SELECT `value` FROM `character_settings`
WHERE `character_settings`.`setting_id` = `settings`.`id`
AND `character_settings`.`character_id` = '1') AS cv_value
FROM `settings`
The subquery is allowed to return null, so you don't have to worry about JOIN/WHERE in the main query.
Sometimes, this works faster in MySQL, but compare it against the LEFT JOIN form to see what works best for you.
SELECT s.*, c.value
FROM settings s
LEFT JOIN character_settings c ON c.setting_id = s.id AND c.character_id = '1'
For this problem, as for many others involving non-trivial left joins such as left-joining on inner-joined tables, I find it convenient and somewhat more readable to split the query with a with clause. In your example,
with settings_for_char as (
select setting_id, value from character_settings where character_id = 1
)
select
settings.*,
settings_for_char.value
from
settings
left join settings_for_char on settings_for_char.setting_id = settings.id;
The way I finally understand the top answer is realising (following the Order Of Execution of the SQL query ) that the WHERE clause is applied to the joined table thereby filtering out rows that do not satisfy the WHERE condition from the joined (or output) table. However, moving the WHERE condition to the ON clause applies it to the individual tables prior to joining. This enables the left join to retain rows from the left table even though some column entries of those rows (entries from the right tables) do not satisfy the WHERE condition.
The result is correct based on the SQL statement. Left join returns all values from the right table, and only matching values from the left table.
ID and NAME columns are from the right side table, so are returned.
Score is from the left table, and 30 is returned, as this value relates to Name "Flow". The other Names are NULL as they do not relate to Name "Flow".
The below would return the result you were expecting:
SELECT a.*, b.Score
FROM #Table1 a
LEFT JOIN #Table2 b
ON a.ID = b.T1_ID
WHERE 1=1
AND a.Name = 'Flow'
The SQL applies a filter on the right hand table.

MySQL Sum even if records doesnt exist [duplicate]

I need to retrieve all default settings from the settings table but also grab the character setting if exists for x character.
But this query is only retrieving those settings where character is = 1, not the default settings if the user havent setted anyone.
SELECT `settings`.*, `character_settings`.`value`
FROM (`settings`)
LEFT JOIN `character_settings`
ON `character_settings`.`setting_id` = `settings`.`id`
WHERE `character_settings`.`character_id` = '1'
So i should need something like this:
array(
'0' => array('somekey' => 'keyname', 'value' => 'thevalue'),
'1' => array('somekey2' => 'keyname2'),
'2' => array('somekey3' => 'keyname3')
)
Where key 1 and 2 are the default values when key 0 contains the default value with the character value.
The where clause is filtering away rows where the left join doesn't succeed. Move it to the join:
SELECT `settings`.*, `character_settings`.`value`
FROM `settings`
LEFT JOIN
`character_settings`
ON `character_settings`.`setting_id` = `settings`.`id`
AND `character_settings`.`character_id` = '1'
When making OUTER JOINs (ANSI-89 or ANSI-92), filtration location matters because criteria specified in the ON clause is applied before the JOIN is made. Criteria against an OUTER JOINed table provided in the WHERE clause is applied after the JOIN is made. This can produce very different result sets. In comparison, it doesn't matter for INNER JOINs if the criteria is provided in the ON or WHERE clauses -- the result will be the same.
SELECT s.*,
cs.`value`
FROM SETTINGS s
LEFT JOIN CHARACTER_SETTINGS cs ON cs.setting_id = s.id
AND cs.character_id = 1
If I understand your question correctly you want records from the settings database if they don't have a join accross to the character_settings table or if that joined record has character_id = 1.
You should therefore do
SELECT `settings`.*, `character_settings`.`value`
FROM (`settings`)
LEFT OUTER JOIN `character_settings`
ON `character_settings`.`setting_id` = `settings`.`id`
WHERE `character_settings`.`character_id` = '1' OR
`character_settings`.character_id is NULL
You might find it easier to understand by using a simple subquery
SELECT `settings`.*, (
SELECT `value` FROM `character_settings`
WHERE `character_settings`.`setting_id` = `settings`.`id`
AND `character_settings`.`character_id` = '1') AS cv_value
FROM `settings`
The subquery is allowed to return null, so you don't have to worry about JOIN/WHERE in the main query.
Sometimes, this works faster in MySQL, but compare it against the LEFT JOIN form to see what works best for you.
SELECT s.*, c.value
FROM settings s
LEFT JOIN character_settings c ON c.setting_id = s.id AND c.character_id = '1'
For this problem, as for many others involving non-trivial left joins such as left-joining on inner-joined tables, I find it convenient and somewhat more readable to split the query with a with clause. In your example,
with settings_for_char as (
select setting_id, value from character_settings where character_id = 1
)
select
settings.*,
settings_for_char.value
from
settings
left join settings_for_char on settings_for_char.setting_id = settings.id;
The way I finally understand the top answer is realising (following the Order Of Execution of the SQL query ) that the WHERE clause is applied to the joined table thereby filtering out rows that do not satisfy the WHERE condition from the joined (or output) table. However, moving the WHERE condition to the ON clause applies it to the individual tables prior to joining. This enables the left join to retain rows from the left table even though some column entries of those rows (entries from the right tables) do not satisfy the WHERE condition.
The result is correct based on the SQL statement. Left join returns all values from the right table, and only matching values from the left table.
ID and NAME columns are from the right side table, so are returned.
Score is from the left table, and 30 is returned, as this value relates to Name "Flow". The other Names are NULL as they do not relate to Name "Flow".
The below would return the result you were expecting:
SELECT a.*, b.Score
FROM #Table1 a
LEFT JOIN #Table2 b
ON a.ID = b.T1_ID
WHERE 1=1
AND a.Name = 'Flow'
The SQL applies a filter on the right hand table.

MySQL performance issue for concurrent users

When I am running a query on MySQL database, it is taking around 3 sec. When we execute the performance testing for 50 concurrent users, then the same query is taking 120 sec.
The query joins multiple tables with an order by clause and a limit condition.
We are using RDS instance (16 GB memory, 4 vCPU).
Can any one suggest how to improve the performance in this case?
Query:
SELECT
person0_.person_id AS person_i1_131_,
person0_.uuid AS uuid2_131_,
person0_.gender AS gender3_131_
CASE
WHEN
EXISTS( SELECT * FROM patient p WHERE p.patient_id = person0_.person_id)
THEN 1
ELSE 0
END AS formula1_,
CASE
WHEN person0_1_.patient_id IS NOT NULL THEN 1
WHEN person0_.person_id IS NOT NULL THEN 0
END AS clazz_
FROM
person person0_
LEFT OUTER JOIN
patient person0_1_ ON person0_.person_id = person0_1_.patient_id
INNER JOIN
person_attribute attributes1_ ON person0_.person_id = attributes1_.person_id
CROSS JOIN
person_attribute_type personattr2_
WHERE
attributes1_.person_attribute_type_id = personattr2_.person_attribute_type_id
AND personattr2_.name = 'PersonImageAttribute'
AND (person0_.person_id IN (SELECT
person3_.person_id
FROM
person person3_
INNER JOIN
person_attribute attributes4_ ON person3_.person_id = attributes4_.person_id
CROSS JOIN
person_attribute_type personattr5_
WHERE
attributes4_.person_attribute_type_id = personattr5_.person_attribute_type_id
AND personattr5_.name = 'LocationAttribute'
AND (attributes4_.value IN ('d31fe20e-6736-42ff-a3ed-b3e622e80842'))))
ORDER BY person0_1_.date_changed , person0_1_.patient_id
LIMIT 25
Plan
There appears to be some redundant query components, and what does not appear to be a proper context of CROSSS-JOIN when you have relation on specific patient and/or attribute info.
Your query getting the "clazz_" is based on a patient_id NOT NULL, but then again a person_id not null. Under what condition, would the person_id coming from the person table EVER be null. That sounds like a KEY ID and would NEVER be null, so why test for that. It seems like that is a duplicate field and in-essence is just the condition of a person actually being a patient vs not.
This query SHOULD get the same results otherwise and suggest the following specific indexes are available including
table index
person ( person_id )
person_attribute ( person_id, person_attribute_type_id )
person_attribute_type ( person_attribute_type_id, name )
patient ( patient_id )
select
p1.person_id AS person_i1_131_,
p1.uuid AS uuid2_131_,
p1.gender AS gender3_131_,
CASE WHEN p2.patient_id IS NULL
then 0 else 1 end formula1_,
-- appears to be a redunant result, just trying to qualify
-- some specific column value for later calculations.
CASE WHEN p2.patient_id IS NULL
THEN 0 else 1 end clazz_
from
-- pre-get only those people based on the P4 attribute in question
-- and attribute type of location. Get small list vs everything else
( SELECT distinct
pa.person_id
FROM
person_attribute pa
JOIN person_attribute_type pat
on pa.person_attribute_type_id = pat.person_attribute_type_id
AND pat.name = 'LocationAttribute'
WHERE
pa.value = 'd31fe20e-6736-42ff-a3ed-b3e622e80842' ) PQ
join person p1
on PQ.person_id = p1.person_id
LEFT JOIN patient p2
ON p1.person_id = p2.patient_id
JOIN person_attribute pa1
ON p1.person_id = pa1.person_id
JOIN person_attribute_type pat1
on pa1.person_attribute_type_id = pat1.person_attribute_type_id
AND pat1.name = 'PersonImageAttribute'
order by
p2.date_changed,
p2.patient_id
LIMIT
25
Finally, your query does an order by the date_changed and patient id which is based on the PATIENT table data having been changed. If that table is a left-join, you may have a bunch of PERSON records that are not patients and thus may not get
the expected records you really intent. So, just some personal review of what is presented in the question.
Speeding up the query is the best hope for handling more connections.
A simplification (but no speed difference), since TRUE=1 and FALSE=0:
CASE WHERE (boolean_expression) THEN 1 ELSE 0 END
-->
(boolean_expression)
Index suggestions:
person: INDEX(patient_id, date_changed)
person_attribute: INDEX(person_attribute_type_id, person_id)
person_attribute: INDEX(person_attribute_type_id, value, person_id)
person_attribute_type: INDEX(person_attribute_type_id, name)
If value is of type TEXT, then that cannot be used in an index.
Assuming that person has PRIMARY KEY(person_id) and patient -- patient_id, I have no extra recommendations for them.
The Entity-Attribute-Value schema pattern, which this seems to be, is hard to optimize when there are a large number of rows. Sorry.
The CROSS JOIN seems to be just an INNER JOIN, but with the condition in the WHERE instead of in ON, where it belongs.
person0_1_.patient_id can be NULL because of the LEFT JOIN, but I don't see how person0_.person_id can be NULL. Please check your logic.

Query optimization (multiple joins)

I would like to find a way to improve a query but it seems i've done it all. Let me give you some details.
Below is my query :
SELECT
`u`.`id` AS `id`,
`p`.`lastname` AS `lastname`,
`p`.`firstname` AS `firstname`,
COALESCE(`r`.`value`, 0) AS `rvalue`,
SUM(`rat`.`category` = 'A') AS `count_a`,
SUM(`rat`.`category` = 'B') AS `count_b`,
SUM(`rat`.`category` = 'C') AS `count_c`
FROM
`user` `u`
JOIN `user_customer` `uc` ON (`u`.`id` = `uc`.`user_id`)
JOIN `profile` `p` ON (`p`.`id` = `u`.`profile_id`)
JOIN `ad` FORCE INDEX (fk_ad_customer_idx) ON (`uc`.`customer_id` = `ad`.`customer_id`)
JOIN `ac` ON (`ac`.`id` = `ad`.`ac_id`)
JOIN `a` ON (`a`.`id` = `ac`.`a_id`)
JOIN `rat` ON (`rat`.`code` = `a`.`rat_code`)
LEFT JOIN `r` ON (`r`.`id` = `u`.`r_id`)
GROUP BY `u`.`id`
;
Note : Some table and column names are voluntarily hidden.
Now let me give you some volumetric data :
user => 6534 rows
user_customer => 12 923 rows
profile => 6511 rows
ad => 320 868 rows
ac => 4505 rows
a => 536 rows
rat => 6 rows
r => 3400 rows
And finally, my execution plan :
My query does currently run in around 1.3 to 1.7 seconds which is slow enough to annoy users of my application of course ... Also fyi result set is composed of 165 rows.
Is there a way I can improve this ?
Thanks.
EDIT 1 (answer to Rick James below) :
What are the speed and EXPLAIN when you don't use FORCE INDEX?
Surprisingly it gets faster when i don't use FORCE INDEX. To be honest, i don't really remember why i've done that change. I've probably found better results in terms of performance with it during one of my various tries and didn't remove it since.
When i don't use FORCE INDEX, it uses an other index ad_customer_ac_id_blocked_idx(customer_id, ac_id, blocked) and times are around 1.1 sec.
I don't really get it because fk_ad_customer_idx(customer_id) is the same when we talk about index on customer_id.
Get rid of FORCE INDEX. Even if it helped yesterday; it may hurt tomorrow.
Some of these indexes may be beneficial. (It is hard to predict; so simply add them all.)
a: (rat_code, id)
rat: (code, category)
ac: (a_id, id)
ad: (ac_id, customer_id)
ad: (customer_id, ac_id)
uc: (customer_id, user_id)
uc: (user_id, customer_id)
u: (profile_id, r_id, id)
(This assumes that id is the PRIMARY KEY of each table. Note that none have id first.) Most of the above are "covering".
Another approach that sometimes helps: Gather the SUMs before joining to any unnecessary table. But is seems that p is the only table not involved in getting from u (the target of GROUP BY) to r and rat (used in aggregates). It would look something like:
SELECT ..., firstname, lastname
FROM ( everything as above except for `p` ) AS most
JOIN `profile` `p` ON (`p`.`id` = most.`profile_id`)
GROUP BY most.id
This avoids hauling around firstname and lastname while doing most of the joins and the GROUP BY.
When doing JOINs and GROUP BY, be sure to sanity check the aggregates. Your COUNTs and SUMs may be larger than they should be.
First, you don't need to tick.everyTableAndColumn in your queries, nor result columns, aliases, etc. The tick marks are used primarily when you are in conflict with a reserved work so the parser knows you are referring to a specific column... like having a table with a COLUMN named "JOIN", but JOIN is part of SQL command... see the confusion it would cause. Helps clean readability too.
Next, and this is just personal preference and can help you and others following behind you on data and their relationships. I show the join as indented from where it is coming from. As you can see below, I see the chain on how do I get from the User (u alias) to the rat alias table... You get there only by going 5 levels deep, and I put the first table on the left-side of the join (coming from table) then = the table joining TO right-side of join.
Now, that I can see the relationships, I would suggest the following. Make COVERING indexes on your tables that have the criteria, and id/value where appropriate. This way the query gets as best it needs, the data from the index page vs having to go to the raw data. So here are suggestions for indexes.
table index
user_customer ( user_id, customer_id ) -- dont know what your fk_ad_customer_idx parts are)
ad ( customer_id, ac_id )
ac ( id, a_id )
a (id, rat_code )
rat ( code, category )
Reformatted query for readability and seeing relationships between the tables
SELECT
u.id,
p.lastname,
p.firstname,
COALESCE(r.value, 0) AS rvalue,
SUM(rat.category = 'A') AS count_a,
SUM(rat.category = 'B') AS count_b,
SUM(rat.category = 'C') AS count_c
FROM
user u
JOIN user_customer uc
ON u.id = uc.user_id
JOIN ad FORCE INDEX (fk_ad_customer_idx)
ON uc.customer_id = ad.customer_id
JOIN ac
ON ad.ac_id = ac.id
JOIN a
ON ac.a_id = a.id
JOIN rat
ON a.rat_code = rat.code
JOIN profile p
ON u.profile_id = p.id
LEFT JOIN r
ON u.r_id = r.id
GROUP BY
u.id

How to optimize query looking for rows where conditional join rows do not exist?

I've got a table of keywords that I regularly refresh against a remote search API, and I have another table that gets a row each each time I refresh one of the keywords. I use this table to block multiple processes from stepping on each other and refreshing the same keyword, as well as stat collection. So when I spin up my program, it queries for all the keywords that don't have a request currently in process, and don't have a successful one within the last 15 mins, or whatever the interval is. All was working fine for awhile, but now the keywords_requests table has almost 2 million rows in it and things are bogging down badly. I've got indexes on almost every column in the keywords_requests table, but to no avail.
I'm logging slow queries and this one is taking forever, as you can see. What can I do?
# Query_time: 20 Lock_time: 0 Rows_sent: 568 Rows_examined: 1826718
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT JOIN `keywords_requests` as KeywordsRequest
ON (
KeywordsRequest.keyword_id = Keyword.id
AND (KeywordsRequest.status = 'success' OR KeywordsRequest.status = 'active')
AND KeywordsRequest.source_id = '29'
AND KeywordsRequest.created > FROM_UNIXTIME(1234551323)
)
WHERE KeywordsRequest.id IS NULL
GROUP BY Keyword.id
ORDER BY KeywordsRequest.created ASC;
It seems your most selective index on Keywords is one on KeywordRequest.created.
Try to rewrite query this way:
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN (
SELECT *
FROM `keywords_requests` as kr
WHERE created > FROM_UNIXTIME(1234567890) /* Happy unix_time! */
) AS KeywordsRequest
ON (
KeywordsRequest.keyword_id = Keyword.id
AND (KeywordsRequest.status = 'success' OR KeywordsRequest.status = 'active')
AND KeywordsRequest.source_id = '29'
)
WHERE keyword_id IS NULL;
It will (hopefully) hash join two not so large sources.
And Bill Karwin is right, you don't need the GROUP BY or ORDER BY
There is no fine control over the plans in MySQL, but you can try (try) to improve your query in the following ways:
Create a composite index on (keyword_id, status, source_id, created) and make it so:
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN `keywords_requests` kr
ON (
keyword_id = id
AND status = 'success'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
)
WHERE keyword_id IS NULL
UNION
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN `keywords_requests` kr
ON (
keyword_id = id
AND status = 'active'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
)
WHERE keyword_id IS NULL
This ideally should use NESTED LOOPS on your index.
Create a composite index on (status, source_id, created) and make it so:
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN (
SELECT *
FROM `keywords_requests` kr
WHERE
status = 'success'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
UNION ALL
SELECT *
FROM `keywords_requests` kr
WHERE
status = 'active'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
)
ON keyword_id = id
WHERE keyword_id IS NULL
This will hopefully use HASH JOIN on even more restricted hash table.
When diagnosing MySQL query performance, one of the first things you need to analyze is the report from EXPLAIN.
If you learn to read the information EXPLAIN gives you, then you can see where queries are failing to make use of indexes, or where they are causing expensive filesorts, or other performance red flags.
I notice in your query, the GROUP BY is irrelevant, since there will be only one NULL row returned from KeywordRequests. Also the ORDER BY is irrelevant, since you're ordering by a column that will always be NULL due to your WHERE clause. If you remove these clauses, you'll probably eliminate a filesort.
Also consider rewriting the query into other forms, and measure the performance of each. For example:
SELECT k.id, k.keyword
FROM `keywords` AS k
WHERE NOT EXISTS (
SELECT * FROM `keywords_requests` AS kr
WHERE kr.keyword_id = k.id
AND kr.status IN ('success', 'active')
AND kr.source_id = '29'
AND kr.created > FROM_UNIXTIME(1234551323)
);
Other tips:
Is kr.source_id an integer? If so, compare to the integer 29 instead of the string '29'.
Are there appropriate indexes on keyword_id, status, source_id, created? Perhaps even a compound index over all four columns would be best, since MySQL will use only one index per table in a given query.
You did a screenshot of your EXPLAIN output and posted a link in the comments. I see that the query is not using an index from Keywords, which makes sense since you're scanning every row in that table anyway. The phrase "Not exists" indicates that MySQL has optimized the LEFT OUTER JOIN a bit.
I think this should be improved over your original query. The GROUP BY/ORDER BY was probably causing it to save an intermediate data set as a temporary table, and sorting it on disk (which is very slow!). What you'd look for is "Using temporary; using filesort" in the Extra column of EXPLAIN information.
So you may have improved it enough already to mitigate the bottleneck for now.
I do notice that the possible keys probably indicate that you have individual indexes on four columns. You may be able to improve that by creating a compound index:
CREATE INDEX kr_cover ON keywords_requests
(keyword_id, created, source_id, status);
You can give MySQL a hint to use a specific index:
... FROM `keywords_requests` AS kr USE INDEX (kr_cover) WHERE ...
Dunno about MySQL but in MSSQL the lines of attack I would take are:
1) Create a covering index on KeywordsRequest status, source_id and created
2) UNION the results tog et around the OR on KeywordsRequest.status
3) Use NOT EXISTS instead o the Outer Join (and try with UNION instead of OR too)
Try this
SELECT Keyword.id, Keyword.keyword
FROM keywords as Keyword
LEFT JOIN (select * from keywords_requests where source_id = '29' and (status = 'success' OR status = 'active')
AND source_id = '29'
AND created > FROM_UNIXTIME(1234551323)
AND id IS NULL
) as KeywordsRequest
ON (
KeywordsRequest.keyword_id = Keyword.id
)
GROUP BY Keyword.id
ORDER BY KeywordsRequest.created ASC;