I would like to find a way to improve a query, but it seems I've tried everything. Let me give you some details.
Below is my query:
SELECT
`u`.`id` AS `id`,
`p`.`lastname` AS `lastname`,
`p`.`firstname` AS `firstname`,
COALESCE(`r`.`value`, 0) AS `rvalue`,
SUM(`rat`.`category` = 'A') AS `count_a`,
SUM(`rat`.`category` = 'B') AS `count_b`,
SUM(`rat`.`category` = 'C') AS `count_c`
FROM
`user` `u`
JOIN `user_customer` `uc` ON (`u`.`id` = `uc`.`user_id`)
JOIN `profile` `p` ON (`p`.`id` = `u`.`profile_id`)
JOIN `ad` FORCE INDEX (fk_ad_customer_idx) ON (`uc`.`customer_id` = `ad`.`customer_id`)
JOIN `ac` ON (`ac`.`id` = `ad`.`ac_id`)
JOIN `a` ON (`a`.`id` = `ac`.`a_id`)
JOIN `rat` ON (`rat`.`code` = `a`.`rat_code`)
LEFT JOIN `r` ON (`r`.`id` = `u`.`r_id`)
GROUP BY `u`.`id`
;
Note: some table and column names are deliberately hidden.
Now let me give you some volumetric data:
user => 6,534 rows
user_customer => 12,923 rows
profile => 6,511 rows
ad => 320,868 rows
ac => 4,505 rows
a => 536 rows
rat => 6 rows
r => 3,400 rows
And finally, my execution plan:
My query currently runs in around 1.3 to 1.7 seconds, which is slow enough to annoy users of my application, of course... Also, FYI, the result set is composed of 165 rows.
Is there a way I can improve this?
Thanks.
EDIT 1 (answer to Rick James below):
What are the speed and EXPLAIN when you don't use FORCE INDEX?
Surprisingly, it gets faster when I don't use FORCE INDEX. To be honest, I don't really remember why I made that change. I probably found better performance with it during one of my various attempts and never removed it.
When I don't use FORCE INDEX, it uses another index, ad_customer_ac_id_blocked_idx (customer_id, ac_id, blocked), and times are around 1.1 sec.
I don't really get it, because fk_ad_customer_idx (customer_id) is the same as far as indexing customer_id goes.
Get rid of FORCE INDEX. Even if it helped yesterday, it may hurt tomorrow.
Some of these indexes may be beneficial. (It is hard to predict which, so simply add them all.)
a: (rat_code, id)
rat: (code, category)
ac: (a_id, id)
ad: (ac_id, customer_id)
ad: (customer_id, ac_id)
uc: (customer_id, user_id)
uc: (user_id, customer_id)
u: (profile_id, r_id, id)
(This assumes that id is the PRIMARY KEY of each table. Note that none have id first.) Most of the above are "covering".
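As DDL, those suggestions might look like this (a sketch only; the real table and column names are hidden, so these are the obfuscated ones from the query):
ALTER TABLE `a`   ADD INDEX (rat_code, id);
ALTER TABLE `rat` ADD INDEX (code, category);
ALTER TABLE `ac`  ADD INDEX (a_id, id);
ALTER TABLE `ad`  ADD INDEX (ac_id, customer_id), ADD INDEX (customer_id, ac_id);
ALTER TABLE `user_customer` ADD INDEX (customer_id, user_id), ADD INDEX (user_id, customer_id);
ALTER TABLE `user` ADD INDEX (profile_id, r_id, id);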
Another approach that sometimes helps: gather the SUMs before joining to any unnecessary table. But it seems that p is the only table not involved in getting from u (the target of the GROUP BY) to r and rat (used in the aggregates). It would look something like:
SELECT ..., firstname, lastname
FROM ( everything as above except for `p` ) AS most
JOIN `profile` `p` ON (`p`.`id` = most.`profile_id`)
GROUP BY most.id
This avoids hauling around firstname and lastname while doing most of the joins and the GROUP BY.
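Fleshed out, that sketch might look like this (just an illustration using the obfuscated names; here the aggregation is done inside the derived table, so the outer join to profile stays one row per user and the outer GROUP BY becomes unnecessary):
SELECT most.id, p.lastname, p.firstname,
       most.rvalue, most.count_a, most.count_b, most.count_c
FROM (
    SELECT u.id, u.profile_id,
           COALESCE(r.value, 0) AS rvalue,
           SUM(rat.category = 'A') AS count_a,
           SUM(rat.category = 'B') AS count_b,
           SUM(rat.category = 'C') AS count_c
    FROM `user` u
    JOIN user_customer uc ON u.id = uc.user_id
    JOIN ad  ON uc.customer_id = ad.customer_id
    JOIN ac  ON ac.id = ad.ac_id
    JOIN a   ON a.id = ac.a_id
    JOIN rat ON rat.code = a.rat_code
    LEFT JOIN r ON r.id = u.r_id
    GROUP BY u.id
) AS most
JOIN profile p ON p.id = most.profile_id;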
When doing JOINs and GROUP BY, be sure to sanity check the aggregates. Your COUNTs and SUMs may be larger than they should be.
First, you don't need to backtick every table and column in your queries, nor the result columns, aliases, etc. The tick marks are needed primarily when a name conflicts with a reserved word, so the parser knows you are referring to a specific column... like having a table with a column named "JOIN" when JOIN is part of the SQL syntax; see the confusion it would cause. Dropping them helps readability too.
Next, and this is just a personal preference that can help you, and others following behind you, see the data and its relationships: I indent each join from the table it is coming from. As you can see below, I can trace the chain of how to get from the user table (alias u) to the rat table... you get there only by going 5 levels deep. I put the first table (the one you are coming from) on the left side of the join, and the table being joined TO on the right side.
Now that I can see the relationships, I would suggest the following: make COVERING indexes on the tables that carry your criteria, including the id/value columns where appropriate. That way the query can get the data it needs from the index pages instead of having to go to the raw data. So here are suggestions for indexes:
table           index
user_customer   (user_id, customer_id)  -- I don't know what the parts of your fk_ad_customer_idx are
ad              (customer_id, ac_id)
ac              (id, a_id)
a               (id, rat_code)
rat             (code, category)
Reformatted query for readability, showing the relationships between the tables:
SELECT
u.id,
p.lastname,
p.firstname,
COALESCE(r.value, 0) AS rvalue,
SUM(rat.category = 'A') AS count_a,
SUM(rat.category = 'B') AS count_b,
SUM(rat.category = 'C') AS count_c
FROM
user u
JOIN user_customer uc
ON u.id = uc.user_id
JOIN ad FORCE INDEX (fk_ad_customer_idx)
ON uc.customer_id = ad.customer_id
JOIN ac
ON ad.ac_id = ac.id
JOIN a
ON ac.a_id = a.id
JOIN rat
ON a.rat_code = rat.code
JOIN profile p
ON u.profile_id = p.id
LEFT JOIN r
ON u.r_id = r.id
GROUP BY
u.id
Related
When I run a query on our MySQL database, it takes around 3 sec. When we execute performance testing with 50 concurrent users, the same query takes 120 sec.
The query joins multiple tables with an ORDER BY clause and a LIMIT condition.
We are using an RDS instance (16 GB memory, 4 vCPU).
Can anyone suggest how to improve the performance in this case?
Query:
SELECT
person0_.person_id AS person_i1_131_,
person0_.uuid AS uuid2_131_,
person0_.gender AS gender3_131_,
CASE
WHEN
EXISTS( SELECT * FROM patient p WHERE p.patient_id = person0_.person_id)
THEN 1
ELSE 0
END AS formula1_,
CASE
WHEN person0_1_.patient_id IS NOT NULL THEN 1
WHEN person0_.person_id IS NOT NULL THEN 0
END AS clazz_
FROM
person person0_
LEFT OUTER JOIN
patient person0_1_ ON person0_.person_id = person0_1_.patient_id
INNER JOIN
person_attribute attributes1_ ON person0_.person_id = attributes1_.person_id
CROSS JOIN
person_attribute_type personattr2_
WHERE
attributes1_.person_attribute_type_id = personattr2_.person_attribute_type_id
AND personattr2_.name = 'PersonImageAttribute'
AND (person0_.person_id IN (SELECT
person3_.person_id
FROM
person person3_
INNER JOIN
person_attribute attributes4_ ON person3_.person_id = attributes4_.person_id
CROSS JOIN
person_attribute_type personattr5_
WHERE
attributes4_.person_attribute_type_id = personattr5_.person_attribute_type_id
AND personattr5_.name = 'LocationAttribute'
AND (attributes4_.value IN ('d31fe20e-6736-42ff-a3ed-b3e622e80842'))))
ORDER BY person0_1_.date_changed , person0_1_.patient_id
LIMIT 25
Plan
There appear to be some redundant query components, and the CROSS JOIN does not appear to be in a proper context when you have a relation on specific patient and/or attribute info.
Your query derives "clazz_" from patient_id being NOT NULL, and failing that, from person_id being not null. Under what condition would the person_id coming from the person table EVER be null? That sounds like a key ID and would NEVER be null, so why test for it? It seems like a duplicate field that, in essence, is just the condition of a person actually being a patient vs not.
This query SHOULD get the same results otherwise; I suggest making sure the following specific indexes are available:
table                  index
person                 (person_id)
person_attribute       (person_id, person_attribute_type_id)
person_attribute_type  (person_attribute_type_id, name)
patient                (patient_id)
select
p1.person_id AS person_i1_131_,
p1.uuid AS uuid2_131_,
p1.gender AS gender3_131_,
CASE WHEN p2.patient_id IS NULL
then 0 else 1 end formula1_,
-- appears to be a redundant result; just trying to qualify
-- some specific column value for later calculations.
CASE WHEN p2.patient_id IS NULL
THEN 0 else 1 end clazz_
from
-- Pre-get only those people based on the attribute value in question
-- and the attribute type of location. Gets a small list vs everything else.
( SELECT distinct
pa.person_id
FROM
person_attribute pa
JOIN person_attribute_type pat
on pa.person_attribute_type_id = pat.person_attribute_type_id
AND pat.name = 'LocationAttribute'
WHERE
pa.value = 'd31fe20e-6736-42ff-a3ed-b3e622e80842' ) PQ
join person p1
on PQ.person_id = p1.person_id
LEFT JOIN patient p2
ON p1.person_id = p2.patient_id
JOIN person_attribute pa1
ON p1.person_id = pa1.person_id
JOIN person_attribute_type pat1
on pa1.person_attribute_type_id = pat1.person_attribute_type_id
AND pat1.name = 'PersonImageAttribute'
order by
p2.date_changed,
p2.patient_id
LIMIT
25
Finally, your query does an ORDER BY on date_changed and patient_id, which is based on the PATIENT table's data having been changed. Since that table is left-joined, you may have a bunch of PERSON records that are not patients and thus may not get the expected records you really intend. So, just some personal review of what is presented in the question.
Speeding up the query is the best hope for handling more connections.
A simplification (but no speed difference), since TRUE=1 and FALSE=0:
CASE WHEN (boolean_expression) THEN 1 ELSE 0 END
-->
(boolean_expression)
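Applied to the query above, the first CASE could simply become (a sketch):
EXISTS( SELECT * FROM patient p
        WHERE p.patient_id = person0_.person_id ) AS formula1_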
Index suggestions:
person: INDEX(patient_id, date_changed)
person_attribute: INDEX(person_attribute_type_id, person_id)
person_attribute: INDEX(person_attribute_type_id, value, person_id)
person_attribute_type: INDEX(person_attribute_type_id, name)
If value is of type TEXT, then that cannot be used in an index.
Assuming that person has PRIMARY KEY(person_id) and patient has PRIMARY KEY(patient_id), I have no extra recommendations for them.
The Entity-Attribute-Value schema pattern, which this seems to be, is hard to optimize when there are a large number of rows. Sorry.
The CROSS JOIN seems to be just an INNER JOIN, but with the condition in the WHERE instead of in ON, where it belongs.
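For example, this part of the original query:
CROSS JOIN person_attribute_type personattr2_
WHERE attributes1_.person_attribute_type_id = personattr2_.person_attribute_type_id
can be written as:
INNER JOIN person_attribute_type personattr2_
        ON attributes1_.person_attribute_type_id = personattr2_.person_attribute_type_id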
person0_1_.patient_id can be NULL because of the LEFT JOIN, but I don't see how person0_.person_id can be NULL. Please check your logic.
My table user contains these fields:
id,company_id,created_by,name,image
Table valet contains:
id,vid,dept_id
Table cart contains:
id,dept_id,map_id,purchase,time
To get the details I have written this MySQL query:
SELECT c.id, a.id, c.purchase, c.time
FROM user a
LEFT JOIN valet b ON a.vid = b.id
AND a.is_deleted = 0
LEFT JOIN cart c ON b.dept_id = c.dept_id
WHERE a.company_id = 18
AND a.created_by = 102
AND a.is_deleted = 0
AND c.time
IN ( SELECT MAX( time ) FROM cart WHERE dept_id = b.dept_id )
From these three tables I want to select the last updated row from cart, along with the id from the user table, which is mapped via the valet table.
This query works fine, but it takes almost 15 sec to retrieve the details.
Is there any way to improve this query, or maybe I am doing something wrong?
Any help would be appreciated.
For one thing, I can see that you're running the subquery for each row. Depending on what the optimiser does, that may have an impact. MAX is a pretty expensive operation (there's nothing for it but to read every row).
If you plan to update and use this query repeatedly, perhaps you should at least index the table on cart.time. This will make it much easier to find the maximum value.
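For example (a sketch; the index name is made up, and since the subquery takes MAX(time) per dept_id, a compound index starting with dept_id would let MySQL resolve the maximum from the index alone):
CREATE INDEX idx_cart_dept_time ON cart (dept_id, time);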
Rather than running that subquery once per row, MySQL lets you compute all the per-department maxima once in a derived table and join against it, and that might help:
SELECT c.id, a.id, c.purchase, c.time
FROM
user a
LEFT JOIN valet b ON a.vid = b.id AND a.is_deleted = '0'
LEFT JOIN cart c ON b.dept_id = c.dept_id
LEFT JOIN (SELECT dept_id,max(time) as mx FROM cart GROUP BY dept_id) m on m.dept_id=c.dept_id
WHERE
a.company_id = '18'
AND a.created_by = '102'
AND a.is_deleted = '0'
AND c.time=m.mx;
Note also:
Since you're only testing a single value (the max) for c.time, you should be using = rather than IN.
I'm not sure why you are using strings instead of integers. I should have thought that leaving off the quotes makes more sense.
Your JOIN includes AND a.is_deleted = '0', though you make no mention of that column in your table description. In any case, why is it in the JOIN and not in the WHERE clause?
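To illustrate that last point, with a LEFT JOIN a condition on the left-hand table behaves differently in ON than in WHERE. A sketch using this question's tables (note that is_deleted and vid do not appear in the table descriptions, so they are assumed from the query):
-- Condition in ON: deleted users still appear, just with NULL valet columns.
SELECT a.id AS user_id, b.id AS valet_id
FROM user a
LEFT JOIN valet b ON a.vid = b.id AND a.is_deleted = 0;

-- Condition in WHERE: deleted users are filtered out of the result entirely.
SELECT a.id AS user_id, b.id AS valet_id
FROM user a
LEFT JOIN valet b ON a.vid = b.id
WHERE a.is_deleted = 0;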
This is puzzling me, and no amount of Googling is helping; I'm hoping someone can point me in the right direction.
Please note that I have omitted some fields from the tables that don't relate to the question just to simplify things.
contacts: contact_id, name, email
contact_uuids: uuid, contact_id
visitor_activity: uuid, event
contact_communications: comm_id, contact_id, employee_id
Query
SELECT
`c`.*,
(SELECT COUNT(`log_id`) FROM `contact_communications` `cc` WHERE `cc`.`contact_id` = `c`.`contact_id`) as `num_comms`,
(SELECT MAX(`date`) FROM `contact_communications` `cc` WHERE `cc`.`contact_id` = `c`.`contact_id`) as `latest_date`,
(SELECT MIN(`date`) FROM `contact_communications` `cc` WHERE `cc`.`contact_id` = `c`.`contact_id`) as `first_date`,
(SELECT COUNT(`vaid`) FROM `visitor_activity` `va` WHERE `va`.`uuid` = `cu`.`uuid`) as `num_act`
FROM `contacts` `c`
LEFT JOIN `contact_uuids` `cu` ON `c`.`contact_id` = `cu`.`contact_id`
GROUP BY `c`.`contact_id`
ORDER BY `c`.`name` ASC
Some contacts have multiple UUIDs (upwards of 20 or 30).
When I perform the query WITHOUT the GROUP BY statement, I get the results I expect - a row returned for each UUID that exists for that contact, with the correct "num_comms" and "num_act" numbers.
However, when I add the GROUP BY statement, num_comms is a lot smaller than expected and num_act returns only the value from the first row of the un-grouped result.
I tried doing a "WHERE NOT IN" in the subquery, however that simply crashed the server as it was far too intense.
So - how do I get this to add up all the COUNT values from the LEFT JOIN and not just return the first value?
Also if anyone can help me optimize this that would be great.
Two problems:
GROUP BY c.contact_id does not include all the non-aggregate columns. This is a MySQL extension; what you get is arbitrary values for the columns other than contact_id.
The JOIN adds confusion. Your only use for visitor_activity is the COUNT(*) on it. But that does not make sense, since the subquery is limited to one uuid, whereas the row is limited to one contact_id. Rethink the purpose of that.
Remove this line:
(SELECT COUNT(`vaid`) FROM `visitor_activity` `va` WHERE `va`.`uuid` = `cu`.`uuid`) as `num_act`
and the rest may work ok.
I will continue with the assumption that you want the COUNT of all rows in visitor_activity for all the uuids associated with the one contact_id.
See if this:
( SELECT COUNT(*)
    FROM `contact_uuids` cu2
    JOIN `visitor_activity` USING(uuid)
    WHERE cu2.contact_id = c.contact_id ) AS num_act
will work for the last subquery. At the same time, remove the JOIN:
LEFT JOIN `contact_uuids` `cu` ON `c`.`contact_id` = `cu`.`contact_id`
Now, back to the other problem (the non-standard usage of GROUP BY). Assuming that contact_id is the PRIMARY KEY, then simply remove the
GROUP BY `c`.`contact_id`
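Putting those pieces together, the whole query might look something like this (a sketch, assuming contact_id is the PRIMARY KEY of contacts and that uuid lives in contact_uuids):
SELECT
    `c`.*,
    (SELECT COUNT(*)    FROM `contact_communications` `cc` WHERE `cc`.`contact_id` = `c`.`contact_id`) AS `num_comms`,
    (SELECT MAX(`date`) FROM `contact_communications` `cc` WHERE `cc`.`contact_id` = `c`.`contact_id`) AS `latest_date`,
    (SELECT MIN(`date`) FROM `contact_communications` `cc` WHERE `cc`.`contact_id` = `c`.`contact_id`) AS `first_date`,
    (SELECT COUNT(*)
       FROM `contact_uuids` `cu2`
       JOIN `visitor_activity` `va` USING(`uuid`)
      WHERE `cu2`.`contact_id` = `c`.`contact_id`) AS `num_act`
FROM `contacts` `c`
ORDER BY `c`.`name` ASC;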
We've tested with 1 million records in every table and the results were fine, always under 0.08 sec.
So we implemented it on our server, but it's very slow there, taking up to 36 secs.
We've asked for help before to optimize the query we were running on our test machine, we detailed the basic structure of our one to many relationship:
Problems to optimize large query and tables structure
That's the final query, the one we're using after getting help on the link above:
explain
SELECT
st.sid, st.title, st.summary, st.storynotes, st.thumb, st.completed, st.wordcount, st.rid, st.date, st.updated,
stats.total_reviews, stats.total_recommendations,
(SELECT GROUP_CONCAT(CAST(catid AS CHAR)) FROM fanfiction_stories_categories WHERE sid = st.sid) as categories,
(SELECT GROUP_CONCAT(CAST(genre_id AS CHAR)) FROM fanfiction_stories_genres WHERE sid = st.sid) as genres,
(SELECT GROUP_CONCAT(CAST(warning_id AS CHAR)) FROM fanfiction_stories_warnings WHERE sid = st.sid) as warnings
FROM
fanfiction_stories st
LEFT JOIN fanfiction_stories_stats stats ON st.sid = stats.sid
JOIN fanfiction_stories_categories cat ON st.sid = cat.sid AND cat.catid = 924
WHERE validated = 1
ORDER BY updated DESC
LIMIT 0, 15
That's the explain:
http://dl.dropbox.com/u/14508898/Printscreen/stackoverflow_explain_print_003.PNG
0 rows affected, 6 rows found. Duration for 1 query: 31.356 sec.
Updated
We removed some old indexes left over from the previous DB structure on fanfiction_stories and added new indexes to fanfiction_stories_categories; now it is much faster. That's the updated explain:
http://dl.dropbox.com/u/14508898/Printscreen/stackoverflow_explain_print_004.PNG
Sorry, the program that I use only format the explain table as HTML, CSV, etc, doesn't make an ASCII table to display here.
Can we optimize it even more? Any help is greatly appreciated.
Hi there, instead of a plain JOIN you might be better off using an explicit INNER JOIN, as below.
It might also be all the GROUP_CONCATs that you are doing; they are quite memory hungry.
SELECT
st.sid, st.title, st.summary, st.storynotes, st.thumb, st.completed, st.wordcount, st.rid, st.date, st.updated,
stats.total_reviews, stats.total_recommendations,
(SELECT GROUP_CONCAT(CAST(catid AS CHAR)) FROM fanfiction_stories_categories WHERE sid = st.sid) as categories,
(SELECT GROUP_CONCAT(CAST(genre_id AS CHAR)) FROM fanfiction_stories_genres WHERE sid = st.sid) as genres,
(SELECT GROUP_CONCAT(CAST(warning_id AS CHAR)) FROM fanfiction_stories_warnings WHERE sid = st.sid) as warnings
FROM
fanfiction_stories st
LEFT JOIN fanfiction_stories_stats stats ON st.sid = stats.sid
INNER JOIN fanfiction_stories_categories cat ON st.sid = cat.sid AND cat.catid = 924
WHERE validated = 1
ORDER BY updated DESC
LIMIT 0, 15
This should work, although I don't have table structures and sample data to simulate it. By removing each of the (SELECT ...) AS column subqueries and just leaving them as LEFT JOINs, grouping the entire outer query by sid should give the same result. I think it's more efficient than running each subquery per column. The GROUP_CONCATs are grouped on sid at the end anyway and should be retained. The only thing that might be an issue is NULL values in these concat fields, which you can wrap with an IFNULL() test.
I would ensure EACH of these tables has an index on the sid used for the join. Additionally, your main stories table should have an index on validated for its criteria = 1.
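As DDL, that might look like this sketch (index names assumed):
CREATE INDEX idx_cat_sid   ON fanfiction_stories_categories (sid);
CREATE INDEX idx_genre_sid ON fanfiction_stories_genres (sid);
CREATE INDEX idx_warn_sid  ON fanfiction_stories_warnings (sid);
CREATE INDEX idx_stats_sid ON fanfiction_stories_stats (sid);
CREATE INDEX idx_st_validated ON fanfiction_stories (validated);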
Based on your feedback, I would shift the criteria and put the categories table first. Get ONE category first, then see which stories are associated with it. Then, from only those stories, hook up the rest of the genres, warnings, comments, etc. You obviously have a smaller set of categories, so I would hit THAT as the primary table in the query. Let me know how this works.
SELECT STRAIGHT_JOIN
st.sid,
st.title,
st.summary,
st.storynotes,
st.thumb,
st.completed,
st.wordcount,
st.rid,
st.date,
st.updated,
stats.total_reviews,
stats.total_recommendations,
GROUP_CONCAT( DISTINCT cat.catid ) categories,
GROUP_CONCAT( DISTINCT genre.genre_id ) genres,
GROUP_CONCAT( DISTINCT warn.warning_id ) as warnings
FROM
fanfiction_stories_categories cat
JOIN fanfiction_stories st
ON cat.sid = st.sid
AND st.Validated = 1
LEFT JOIN fanfiction_stories_stats stats
ON st.sid = stats.sid
LEFT JOIN fanfiction_stories_genres genre
on st.sid = genre.sid
LEFT JOIN fanfiction_stories_warnings warn
on st.sid = warn.sid
WHERE
cat.catid = 924
group by
st.sid
ORDER BY
updated DESC
LIMIT
0, 15
I've got a table of keywords that I regularly refresh against a remote search API, and I have another table that gets a row each time I refresh one of the keywords. I use this table to block multiple processes from stepping on each other and refreshing the same keyword, as well as for stat collection. So when I spin up my program, it queries for all the keywords that don't have a request currently in process and don't have a successful one within the last 15 mins, or whatever the interval is.
All was working fine for a while, but now the keywords_requests table has almost 2 million rows in it and things are bogging down badly. I've got indexes on almost every column in the keywords_requests table, but to no avail.
I'm logging slow queries and this one is taking forever, as you can see. What can I do?
# Query_time: 20 Lock_time: 0 Rows_sent: 568 Rows_examined: 1826718
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT JOIN `keywords_requests` as KeywordsRequest
ON (
KeywordsRequest.keyword_id = Keyword.id
AND (KeywordsRequest.status = 'success' OR KeywordsRequest.status = 'active')
AND KeywordsRequest.source_id = '29'
AND KeywordsRequest.created > FROM_UNIXTIME(1234551323)
)
WHERE KeywordsRequest.id IS NULL
GROUP BY Keyword.id
ORDER BY KeywordsRequest.created ASC;
It seems your most selective index is the one on KeywordsRequest.created.
Try to rewrite the query this way:
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN (
SELECT *
FROM `keywords_requests` as kr
WHERE created > FROM_UNIXTIME(1234567890) /* Happy unix_time! */
) AS KeywordsRequest
ON (
KeywordsRequest.keyword_id = Keyword.id
AND (KeywordsRequest.status = 'success' OR KeywordsRequest.status = 'active')
AND KeywordsRequest.source_id = '29'
)
WHERE keyword_id IS NULL;
It will (hopefully) hash join two not-so-large sources.
And Bill Karwin is right: you don't need the GROUP BY or ORDER BY.
There is no fine control over the plans in MySQL, but you can try (try) to improve your query in the following ways:
Create a composite index on (keyword_id, status, source_id, created) and make it so:
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN `keywords_requests` kr
ON (
keyword_id = id
AND status = 'success'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
)
WHERE keyword_id IS NULL
UNION
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN `keywords_requests` kr
ON (
keyword_id = id
AND status = 'active'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
)
WHERE keyword_id IS NULL
This ideally should use NESTED LOOPS on your index.
Create a composite index on (status, source_id, created) and make it so:
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN (
SELECT *
FROM `keywords_requests` kr
WHERE
status = 'success'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
UNION ALL
SELECT *
FROM `keywords_requests` kr
WHERE
status = 'active'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
) AS KeywordsRequest
ON keyword_id = id
WHERE keyword_id IS NULL
This will hopefully use HASH JOIN on even more restricted hash table.
When diagnosing MySQL query performance, one of the first things you need to analyze is the report from EXPLAIN.
If you learn to read the information EXPLAIN gives you, then you can see where queries are failing to make use of indexes, or where they are causing expensive filesorts, or other performance red flags.
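For example, just prefix the query with EXPLAIN (shown here against this question's tables):
EXPLAIN
SELECT Keyword.id, Keyword.keyword
FROM `keywords` AS Keyword
LEFT JOIN `keywords_requests` AS KeywordsRequest
       ON KeywordsRequest.keyword_id = Keyword.id
WHERE KeywordsRequest.id IS NULL;
Watch the type, key, rows, and Extra columns; "Using temporary; Using filesort" in Extra is one of those red flags.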
I notice in your query that the GROUP BY is irrelevant, since there will be only one NULL row returned from KeywordsRequest. Also the ORDER BY is irrelevant, since you're ordering by a column that will always be NULL due to your WHERE clause. If you remove these clauses, you'll probably eliminate a filesort.
Also consider rewriting the query into other forms, and measure the performance of each. For example:
SELECT k.id, k.keyword
FROM `keywords` AS k
WHERE NOT EXISTS (
SELECT * FROM `keywords_requests` AS kr
WHERE kr.keyword_id = k.id
AND kr.status IN ('success', 'active')
AND kr.source_id = '29'
AND kr.created > FROM_UNIXTIME(1234551323)
);
Other tips:
Is kr.source_id an integer? If so, compare to the integer 29 instead of the string '29'.
Are there appropriate indexes on keyword_id, status, source_id, created? Perhaps even a compound index over all four columns would be best, since MySQL will use only one index per table in a given query.
You did a screenshot of your EXPLAIN output and posted a link in the comments. I see that the query is not using an index from Keywords, which makes sense since you're scanning every row in that table anyway. The phrase "Not exists" indicates that MySQL has optimized the LEFT OUTER JOIN a bit.
I think this should be improved over your original query. The GROUP BY/ORDER BY was probably causing it to save an intermediate data set as a temporary table and sort it on disk (which is very slow!). What you'd look for is "Using temporary; Using filesort" in the Extra column of the EXPLAIN information.
So you may have improved it enough already to mitigate the bottleneck for now.
I do notice that the possible keys probably indicate that you have individual indexes on four columns. You may be able to improve that by creating a compound index:
CREATE INDEX kr_cover ON keywords_requests
(keyword_id, created, source_id, status);
You can give MySQL a hint to use a specific index:
... FROM `keywords_requests` AS kr USE INDEX (kr_cover) WHERE ...
Dunno about MySQL, but in MSSQL the lines of attack I would take are:
1) Create a covering index on KeywordsRequest (status, source_id, created)
2) UNION the results to get around the OR on KeywordsRequest.status
3) Use NOT EXISTS instead of the Outer Join (and try UNION instead of OR there too)
Try this:
SELECT Keyword.id, Keyword.keyword
FROM keywords as Keyword
LEFT JOIN ( SELECT *
            FROM keywords_requests
            WHERE (status = 'success' OR status = 'active')
              AND source_id = '29'
              AND created > FROM_UNIXTIME(1234551323)
          ) as KeywordsRequest
       ON KeywordsRequest.keyword_id = Keyword.id
WHERE KeywordsRequest.id IS NULL
GROUP BY Keyword.id
ORDER BY KeywordsRequest.created ASC;