I have the following query…
SELECT DISTINCT * FROM
vPAS_Posts_Users
WHERE (post_user_id =:id AND post_type != 4)
AND post_updated >:updated
GROUP BY post_post_id
UNION
SELECT DISTINCT vPAS_Posts_Users.* FROM PAS_Follow
JOIN vPAS_Posts_Users ON
( PAS_Follow.folw_followed_user_id = vPAS_Posts_Users.post_user_id )
WHERE (( PAS_Follow.folw_follower_user_id =:id AND PAS_Follow.folw_deleted = 0 )
OR ( post_type = 4 AND post_passed_on_by = PAS_Follow.folw_follower_user_id
AND post_user_id !=:id ))
AND post_updated >:updated
GROUP BY post_post_id ORDER BY post_posted_date DESC LIMIT :limit
Where :id = 7, :updated = 0.0, and :limit = 40, for example.
My issue is that the query is taking about a minute to return results. Is there anything in this query that I can do to speed up the result?
I am using RDS
********EDIT*********
I was asked to run the query with an EXPLAIN the result is below
********EDIT**********
View Definition
CREATE ALGORITHM=UNDEFINED DEFINER=`MySQLUSer`@`%` SQL SECURITY DEFINER VIEW `vPAS_Posts_Users`
AS SELECT
`PAS_User`.`user_user_id` AS `user_user_id`,
`PAS_User`.`user_country` AS `user_country`,
`PAS_User`.`user_city` AS `user_city`,
`PAS_User`.`user_company` AS `user_company`,
`PAS_User`.`user_account_type` AS `user_account_type`,
`PAS_User`.`user_account_premium` AS `user_account_premium`,
`PAS_User`.`user_sign_up_date` AS `user_sign_up_date`,
`PAS_User`.`user_first_name` AS `user_first_name`,
`PAS_User`.`user_last_name` AS `user_last_name`,
`PAS_User`.`user_avatar_url` AS `user_avatar_url`,
`PAS_User`.`user_cover_image_url` AS `user_cover_image_url`,
`PAS_User`.`user_bio` AS `user_bio`,
`PAS_User`.`user_telephone` AS `user_telephone`,
`PAS_User`.`user_dob` AS `user_dob`,
`PAS_User`.`user_sector` AS `user_sector`,
`PAS_User`.`user_job_type` AS `user_job_type`,
`PAS_User`.`user_unique` AS `user_unique`,
`PAS_User`.`user_deleted` AS `user_deleted`,
`PAS_User`.`user_updated` AS `user_updated`,
`PAS_Post`.`post_post_id` AS `post_post_id`,
`PAS_Post`.`post_language_id` AS `post_language_id`,
`PAS_Post`.`post_type` AS `post_type`,
`PAS_Post`.`post_promoted` AS `post_promoted`,
`PAS_Post`.`post_user_id` AS `post_user_id`,
`PAS_Post`.`post_posted_date` AS `post_posted_date`,
`PAS_Post`.`post_latitude` AS `post_latitude`,
`PAS_Post`.`post_longitude` AS `post_longitude`,
`PAS_Post`.`post_location_name` AS `post_location_name`,
`PAS_Post`.`post_text` AS `post_text`,
`PAS_Post`.`post_media_url` AS `post_media_url`,
`PAS_Post`.`post_image_height` AS `post_image_height`,
`PAS_Post`.`post_link` AS `post_link`,
`PAS_Post`.`post_link_title` AS `post_link_title`,
`PAS_Post`.`post_unique` AS `post_unique`,
`PAS_Post`.`post_deleted` AS `post_deleted`,
`PAS_Post`.`post_updated` AS `post_updated`,
`PAS_Post`.`post_original_post_id` AS `post_original_post_id`,
`PAS_Post`.`post_original_type` AS `post_original_type`,
`PAS_Post`.`post_passed_on_by` AS `post_passed_on_by`,
`PAS_Post`.`post_passed_on_caption` AS `post_passed_on_caption`,
`PAS_Post`.`post_passed_on_fullname` AS `post_passed_on_fullname`,
`PAS_Post`.`post_passed_on_avatar_url` AS `post_passed_on_avatar_url`
FROM (`PAS_User` join `PAS_Post` on((`PAS_User`.`user_user_id` = `PAS_Post`.`post_user_id`)));
try this query:
SELECT *
FROM
vPAS_Posts_Users
WHERE
post_user_id =:id
AND post_type != 4
AND post_updated > :updated
UNION
SELECT u.*
FROM vPAS_Posts_Users u
JOIN PAS_Follow f ON f.folw_followed_user_id = u.post_user_id
WHERE
u.post_updated > :updated
AND ( (f.folw_follower_user_id = :id AND f.folw_deleted = 0)
OR (u.post_type = 4 AND u.post_passed_on_by = f.folw_follower_user_id AND u.post_user_id != :id)
)
ORDER BY post_posted_date DESC
LIMIT :limit;
Other improvements
Indices:
Be sure you have indices on the following columns (example statements for creating them follow the list):
PAS_User.user_user_id
PAS_Post.post_user_id
PAS_Post.post_type
PAS_Post.post_updated
PAS_Follow.folw_followed_user_id
PAS_Follow.folw_deleted
PAS_Post.post_passed_on_by
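As a minimal sketch, those indices could be added like this (index names are examples only; vPAS_Posts_Users is a view, so the post_* indices go on the underlying PAS_Post table, and you can skip any that already exist, such as primary keys):

ALTER TABLE PAS_User   ADD INDEX idx_user_id (user_user_id);
ALTER TABLE PAS_Post   ADD INDEX idx_post_user (post_user_id),
                       ADD INDEX idx_post_type (post_type),
                       ADD INDEX idx_post_updated (post_updated),
                       ADD INDEX idx_post_passed_on_by (post_passed_on_by);
ALTER TABLE PAS_Follow ADD INDEX idx_folw_followed (folw_followed_user_id),
                       ADD INDEX idx_folw_deleted (folw_deleted);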
Once that is done, please 1) check the performance again (with SQL_NO_CACHE) and 2) extract another EXPLAIN plan so we can adjust the query further.
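For the timing check, SQL_NO_CACHE goes right after SELECT so the query cache does not mask the real execution time (it has no effect from MySQL 8.0 on, where the query cache was removed). A sketch using the first branch of the query with the example parameter values:

SELECT SQL_NO_CACHE *
FROM vPAS_Posts_Users
WHERE post_user_id = 7
  AND post_type != 4
  AND post_updated > 0;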
EXPLAIN Results
Here are some suggestions for the query and the view. First of all, the UNION of the two result sets may be what makes your query slow; you can use UNION ALL instead.
Why am I suggesting UNION ALL?
Both UNION ALL and UNION use a temporary table to generate the result. The difference in execution speed comes from the fact that UNION requires an internal temporary table with an index (to skip duplicate rows), while UNION ALL creates the table without such an index. This explains the performance improvement when using UNION ALL.
UNION on its own removes any duplicate records, so there is no need for the DISTINCT clause. Also, apply a single GROUP BY to the whole result set built from the subqueries, rather than grouping in each subquery; this also reduces execution time.
Make sure you have added the right indexes, especially on the columns used in WHERE, ORDER BY, and GROUP BY. The data types should also be appropriate for the nature of the data in each column; for example, post_posted_date should be DATETIME or DATE, with an index as well.
Here is a rough idea of the query:
SELECT q.* FROM (
SELECT * FROM
vPAS_Posts_Users
WHERE (post_user_id =:id AND post_type != 4)
AND post_updated >:updated
UNION ALL
SELECT vPAS_Posts_Users.* FROM PAS_Follow
JOIN vPAS_Posts_Users ON
( PAS_Follow.folw_followed_user_id = vPAS_Posts_Users.post_user_id
AND vPAS_Posts_Users.post_updated >:updated)
WHERE (( PAS_Follow.folw_follower_user_id =:id AND PAS_Follow.folw_deleted = 0 )
OR ( post_type = 4 AND post_passed_on_by = PAS_Follow.folw_follower_user_id
AND post_user_id !=:id ))
) q
GROUP BY q.post_post_id ORDER BY q.post_posted_date DESC LIMIT :limit
References
Difference Between Union vs. Union All – Optimal Performance Comparison
Optimize Mysql Union
MySQL Performance Blog
From your EXPLAIN I can see that most of your tables don't have any key except the primary one. I would suggest adding extra keys on the columns you join on, for example PAS_Follow.folw_followed_user_id and vPAS_Posts_Users.post_user_id; just this will result in a big performance boost.
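To check which keys each table currently has before adding new ones, something like the following can be run (remember that vPAS_Posts_Users is a view, so any key for post_user_id has to be added to the base table PAS_Post):

SHOW INDEX FROM PAS_Follow;
SHOW INDEX FROM PAS_Post;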
Bye,
Gnagno
Related
I have a relatively basic query that fetches the most recent messages per conversation:
SELECT `message`.`conversation_id`, MAX(`message`.`add_time`) AS `max_add_time`
FROM `message`
LEFT JOIN `conversation` ON `message`.`conversation_id` = `conversation`.`id`
WHERE ((`conversation`.`receiver_user_id` = 1 AND `conversation`.`status` != -2)
OR (`conversation`.`sender_user_id` = 1 AND `conversation`.`status` != -1))
GROUP BY `conversation_id`
ORDER BY `max_add_time` DESC
LIMIT 12
The message table contains more than 911,000 records and the conversation table around 680,000. The execution time for this query varies between 4 and 10 seconds, depending on the load on the server, which is far too long.
Below is a screenshot of the EXPLAIN result:
The cause is apparently the MAX and/or the GROUP BY, because the following similar query only takes 10ms:
SELECT COUNT(*)
FROM `message`
LEFT JOIN `conversation` ON `message`.`conversation_id` = `conversation`.`id`
WHERE (`message`.`status`=0)
AND (`message`.`user_id` <> 1)
AND ((`conversation`.`sender_user_id` = 1 OR `conversation`.`receiver_user_id` = 1))
The corresponding EXPLAIN result:
I have tried adding different indices to both tables without any improvement, for example: conv_msg_idx(add_time, conversation_id) on message which seems to be used according to the first EXPLAIN result, however the query still takes around 10 seconds to execute.
Any help improving the indices or query to get the execution time down would be greatly appreciated.
EDIT:
I have changed the query to use an INNER JOIN:
SELECT `message`.`conversation_id`, MAX(`message`.`add_time`) AS `max_add_time`
FROM `message`
INNER JOIN `conversation` ON `message`.`conversation_id` = `conversation`.`id`
WHERE ((`conversation`.`receiver_user_id` = 1 AND `conversation`.`status` != -2)
OR (`conversation`.`sender_user_id` = 1 AND `conversation`.`status` != -1))
GROUP BY `conversation_id`
ORDER BY `max_add_time` DESC
LIMIT 12
But the execution time is still ~ 6 seconds.
You should create a multiple-column index on the columns that are in your WHERE clause and that you want to SELECT (except conversation_id); a sketch follows below. (reference)
conversation_id should be indexed in both tables.
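A rough sketch of what those indexes might look like (index names and the exact column order are assumptions to be verified against EXPLAIN):

ALTER TABLE conversation ADD INDEX idx_conv_receiver (receiver_user_id, status),
                         ADD INDEX idx_conv_sender (sender_user_id, status);
ALTER TABLE message      ADD INDEX idx_msg_conv_time (conversation_id, add_time);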
Try to avoid OR in an SQL query; it makes fetching slow. Instead, use UNION or other methods.
SELECT message.conversation_id, MAX(message.add_time) AS max_add_time
FROM message INNER JOIN conversation ON message.conversation_id = conversation.id
WHERE (conversation.sender_user_id = 1 AND conversation.status != -1)
GROUP BY conversation_id
UNION
SELECT message.conversation_id, MAX(message.add_time) AS max_add_time
FROM message INNER JOIN conversation ON message.conversation_id = conversation.id
WHERE (conversation.receiver_user_id = 1 AND conversation.status != -2)
GROUP BY conversation_id
ORDER BY max_add_time DESC LIMIT 12
Instead of depending on a single table message, have two tables: One for message, as you have, plus another thread that keeps the status of the thread of messages.
Yes, that requires a little more work when adding a new message -- update a column or two in thread.
But it eliminates the GROUP BY and MAX that are causing grief in this query.
While doing this split, see if some other columns would be better off in the new table.
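A hedged sketch of what that split could look like; the thread table, its columns, and the upsert below are illustrative assumptions, not part of the original schema:

CREATE TABLE thread (
    conversation_id INT UNSIGNED NOT NULL PRIMARY KEY,
    last_add_time   DATETIME NOT NULL,
    INDEX idx_last_add_time (last_add_time)
);

-- when a new message is stored, also refresh its thread row
INSERT INTO thread (conversation_id, last_add_time)
VALUES (?, NOW())
ON DUPLICATE KEY UPDATE last_add_time = VALUES(last_add_time);

The "latest 12 conversations" query then becomes a plain ORDER BY last_add_time DESC LIMIT 12 against thread, with no GROUP BY or MAX().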
SELECT `message`.`conversation_id`, MAX(`message`.`add_time`) AS `max_add_time`
FROM `message`
INNER JOIN `conversation` ON `message`.`conversation_id` = `conversation`.`id`
WHERE ((`conversation`.`receiver_user_id` = 1 AND `conversation`.`status` != -2)
OR (`conversation`.`sender_user_id` = 1 AND `conversation`.`status` != -1))
GROUP BY `conversation_id`
ORDER BY `max_add_time` DESC
LIMIT 12
You can try an INNER JOIN, if it does not affect your logic.
You can also modify this query to avoid MAX() by using a window function (available from MySQL 8.0):
select * from (
    select `message`.*,
           row_number() over (partition by conversation_id order by add_time desc) as p1
    from `message`
) t1 where t1.p1 = 1
The following sort of query runs on the server; it uses a derived table and a subquery. The constraint is that the subqueries are generated by multiple modules depending on the current situation, so I cannot really convert them into a join.
Please suggest a possible way to optimize the query.
SELECT COUNT(1)
AS total
FROM member tlb_m
where tlb_m.active = 1
and tlb_m.rank > 0
and tlb_m.member_id not in (5735,134,241,1055,348,272,476,43,7,804,7548,90,229,346,40895)
and tlb_m.type = 'M'
and (tlb_m.hometown_list_id in
(SELECT l2.list_id
FROM ((
SELECT t12.list_id
from list_tree_idx t12
INNER JOIN list_tree_idx t11
ON t12.list_parent_id=t11.list_id
where t11.list_parent_id='205546'
) UNION ALL (
SELECT list_id
from list_tree_idx
where list_parent_id='205546'
) ) as l2
) or tlb_m.hometown_list_id = 205546
)
I would suggest using a closure table for optimal hierarchical queries.
For example, with a closure table having columns ANCESTOR_ID, CHILD_ID, and DEPTH, your query would look like this:
SELECT COUNT(1) AS total
FROM member AS tlb_m
LEFT JOIN hometown_closure AS c ON c.child_id = tlb_m.hometown_list_id
where tlb_m.active = 1
and tlb_m.rank > 0
and tlb_m.member_id not in (5735,134,241,1055,348,272,476,43,7,804,7548,90,229,346,40895)
and tlb_m.type = 'M'
and c.ancestor_id = 205546
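A minimal sketch of such a closure table, using the column names mentioned above (the DDL and the sample rows are illustrative assumptions, not the actual schema):

CREATE TABLE hometown_closure (
    ancestor_id INT NOT NULL,
    child_id    INT NOT NULL,
    depth       INT NOT NULL,
    PRIMARY KEY (ancestor_id, child_id),
    KEY idx_child (child_id)
);

-- every node stores a self-row at depth 0, so c.ancestor_id = 205546 also
-- matches members whose hometown_list_id is 205546 itself
INSERT INTO hometown_closure (ancestor_id, child_id, depth) VALUES (205546, 205546, 0);

-- hypothetical example: when node 205547 is added under parent 205546,
-- copy the parent's ancestor rows with depth + 1, plus the new self-row
INSERT INTO hometown_closure (ancestor_id, child_id, depth)
SELECT c.ancestor_id, 205547, c.depth + 1
FROM hometown_closure AS c
WHERE c.child_id = 205546;
INSERT INTO hometown_closure (ancestor_id, child_id, depth) VALUES (205547, 205547, 0);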
I have something like this:
SELECT id, fruit, pip
FROM plant
WHERE COUNT(*) = 2;
This weird query is self-explanatory, I guess. COUNT(*) here means the number of rows in the plant table. My requirement is that I need to retrieve values from the specified fields only if the total number of rows in the table = 2. This doesn't work, though: invalid use of aggregate function COUNT.
I cannot do this:
SELECT COUNT(*) as cnt, id, fruit, pip
FROM plant
WHERE cnt = 2;
for one, it limits the number of rows outputted to 1, and two, it gives the same error: invalid use of aggregate function.
What I can do is instead:
SELECT id, fruit, pip
FROM plant
WHERE (
SELECT COUNT(*)
FROM plant
) = 2;
But then that subquery is the main query re-run. I'm presenting here a small example of the larger part of the problem, though I know an additional COUNT(*) subquery in the given example isn't that big an overhead.
Edit: I do not know why the question is downvoted. The COUNT(*) I'm trying to get is from a view (a temporary table) in a large query with 5 to 6 joins and additional WHERE clauses. Re-running the query as a subquery just to get the count is inefficient, and I can see the bottleneck as well.
Here is the actual query:
SELECT U.UserName, E.Title, AE.Mode, AE.AttemptNo,
IF(AE.Completed = 1, 'Completed', 'Incomplete'),
(
SELECT COUNT(DISTINCT(FK_QId))
FROM attempt_question AS AQ
WHERE FK_ExcAttemptId = #excAttemptId
) AS Inst_Count,
(
SELECT COUNT(DISTINCT(AQ.FK_QId))
FROM attempt_question AS AQ
JOIN `question` AS Q
ON Q.PK_Id = AQ.FK_QId
LEFT JOIN actions AS A
ON A.FK_QId = AQ.FK_QId
WHERE AQ.FK_ExcAttemptId = #excAttemptId
AND (
Q.Type = #descQtn
OR Q.Type = #actQtn
AND A.type = 'CTVI.NotImplemented'
AND A.IsDelete = #status
AND (
SELECT COUNT(*)
FROM actions
WHERE FK_QId = A.FK_QId
AND type != 'CTVI.NotImplemented'
AND IsDelete = #status
) = 0
)
) AS NotEvalInst_Count,
(
SELECT COUNT(DISTINCT(FK_QId))
FROM attempt_question AS AQ
WHERE FK_ExcAttemptId = #excAttemptId
AND Mark = #mark
) AS CorrectAns_Count,
E.AllottedTime, AE.TimeTaken
FROM attempt_exercise AS AE
JOIN ctvi_exercise_tblexercise AS E
ON AE.FK_EId = E.PK_EId
JOIN ctvi_user_table AS U
ON AE.FK_UId = U.PK_Id
JOIN ctvi_grade AS G
ON AE.FK_GId = G.PK_GId
WHERE AE.PK_Id = #excAttemptId
-- AND COUNT(AE.*) = #number --the portion in contention.
Kindly ignore the above query and guide me in the right direction based on the small example query I posted, thanks.
In MySQL, you can only do what you tried:
SELECT id, fruit, pip
FROM plant
WHERE (
SELECT COUNT(*)
FROM plant
) = 2;
or this variation:
SELECT id, fruit, pip
FROM plant
JOIN
(
SELECT COUNT(*) AS cnt
FROM plant
) AS c
ON c.cnt = 2;
Whether the 1st or the 2nd is more efficient depends on the version of MySQL (and the optimizer). I would bet on the 2nd one, on most versions.
In other DBMSs that have window functions, you can also use the first query that @Andomar suggests.
Here is a suggestion to avoid the bottleneck of calculating the derived table twice, once to get the rows and once more to get the count. If the derived table is expensive to calculate and its rows number in the thousands or millions, calculating them twice only to throw them away is indeed a problem. This may improve efficiency, as it limits the intermediately (twice) calculated rows to 3:
SELECT p.*
FROM
( SELECT id, fruit, pip
FROM plant
LIMIT 3
) AS p
JOIN
( SELECT COUNT(*) AS cnt
FROM
( SELECT 1
FROM plant
LIMIT 3
) AS tmp
) AS c
ON c.cnt = 2 ;
After re-reading your question, you're trying to return rows only if there are 2 rows in the entire table. In that case I think your own example query is already the best.
On another DBMS, you could use a Windowing function:
select *
from (
select *
, count(*) over () as cnt
from plant
) as SubQueryAlias
where cnt = 2
But the OVER clause is not supported in MySQL (window functions were only added in MySQL 8.0).
Old, wrong answer below:
The where clause works before grouping. It works on single rows, not groups of rows, so you can't use aggregates like count or max in the where clause.
To set filters that work on groups of rows, use the having clause. It works after grouping and can be used to filter with aggregates:
SELECT id, fruit, pip
FROM plant
GROUP BY
id, fruit, pip
HAVING COUNT(*) = 2;
The other answers do not fulfill the original question, which was to filter the results "without using a subquery".
You can actually do this by using a variable in 2 consecutive MySQL statements:
SET @count = 0;
SELECT * FROM
(
    SELECT id, fruit, pip, @count := @count + 1 AS count
    FROM plant
) tmp
WHERE @count = 2;
In the following query, I show the latest status of the sale (by stage, in this case stage number 3). The query is based on a subquery over the sale's status history:
SELECT v.id_sale,
IFNULL((
SELECT (CASE WHEN IFNULL( vec.description, '' ) = ''
THEN ve.name
ELSE vec.description
END)
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
WHERE veh.id_sale = v.id_sale
AND vec.id_stage = 3
ORDER BY veh.id_record DESC
LIMIT 1
), 'x') sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
WHERE 1 =1
AND v.flag =1
AND v.id_quarters =4
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
The query takes 0.0057 s and returns 1011 records.
Because I have to filter the sales by the name of the state, which would mean repeating the subquery in a WHERE clause, I decided to rewrite the same query using joins. In this case, I'm using the MAX function to obtain the latest status:
SELECT
v.id_sale,
IFNULL(veh3.State3,'x') AS sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
LEFT JOIN (
SELECT veh.id_sale,
(CASE WHEN IFNULL(vec.description,'') = ''
THEN ve.name
ELSE vec.description END) AS State3
FROM t_record veh
INNER JOIN (
SELECT id_sale, MAX(id_record) AS max_rating
FROM(
SELECT veh.id_sale, id_record
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign AND vec.id_stage = 3
) m
GROUP BY id_sale
) x ON x.max_rating = veh.id_record
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
) veh3 ON veh3.id_sale = v.id_sale
WHERE v.flag = 1
AND v.id_quarters = 4
This query shows the same results (1011 rows), but the problem is that it takes 0.0753 sec.
Reviewing the possibilities I have found the factor that makes the difference in the speed of the query:
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
If I remove this clause, both queries take the same time... Why does it work better? Is there any way to use this clause with the joins? I hope for your help.
EDIT
I will show the results of EXPLAIN for each query respectively:
q1:
q2:
Interesting, so that little statement basically determines if there is a match between t_record.id_sale and t_sale.id_sale.
Why is this making your query run faster? Because the WHERE clause is applied before the subselects in the SELECT list, so if there is no record to go with the sale, it doesn't bother processing the subselect, which saves you some time. That's why it works better.
Is it going to work in your join syntax? I don't really know without having your tables to test against but you can always just apply it to the end and find out. Add the keyword EXPLAIN to the beginning of your query and you will get a plan of execution which will help you optimize things. Probably the best way to get better results in your join syntax is to add some indexes to your tables.
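If you do go the index route, here is a rough sketch under assumed schema details (the index names are made up; confirm the useful columns against the EXPLAIN plans above):

ALTER TABLE t_record         ADD INDEX idx_record_sale (id_sale, id_record);
ALTER TABLE t_state_campaign ADD INDEX idx_campaign_stage (id_stage);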
But I ask you: is this even necessary? You have a query returning in under 8 hundredths of a second. Unless this query is getting run thousands of times an hour, it is not really taxing your DB at all, and your time is probably better spent making improvements elsewhere in your application.
I've got a table of keywords that I regularly refresh against a remote search API, and I have another table that gets a row each time I refresh one of the keywords. I use this table to block multiple processes from stepping on each other and refreshing the same keyword, as well as for stat collection. So when I spin up my program, it queries for all the keywords that don't have a request currently in process, and don't have a successful one within the last 15 minutes, or whatever the interval is. All was working fine for a while, but now the keywords_requests table has almost 2 million rows in it and things are bogging down badly. I've got indexes on almost every column in the keywords_requests table, but to no avail.
I'm logging slow queries and this one is taking forever, as you can see. What can I do?
# Query_time: 20 Lock_time: 0 Rows_sent: 568 Rows_examined: 1826718
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT JOIN `keywords_requests` as KeywordsRequest
ON (
KeywordsRequest.keyword_id = Keyword.id
AND (KeywordsRequest.status = 'success' OR KeywordsRequest.status = 'active')
AND KeywordsRequest.source_id = '29'
AND KeywordsRequest.created > FROM_UNIXTIME(1234551323)
)
WHERE KeywordsRequest.id IS NULL
GROUP BY Keyword.id
ORDER BY KeywordsRequest.created ASC;
It seems your most selective index for this query is the one on KeywordsRequest.created.
Try to rewrite the query this way:
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN (
SELECT *
FROM `keywords_requests` as kr
WHERE created > FROM_UNIXTIME(1234567890) /* Happy unix_time! */
) AS KeywordsRequest
ON (
KeywordsRequest.keyword_id = Keyword.id
AND (KeywordsRequest.status = 'success' OR KeywordsRequest.status = 'active')
AND KeywordsRequest.source_id = '29'
)
WHERE keyword_id IS NULL;
It will (hopefully) hash-join two not-so-large sources.
And Bill Karwin is right: you don't need the GROUP BY or ORDER BY.
There is no fine control over the plans in MySQL, but you can try (try) to improve your query in the following ways:
Create a composite index on (keyword_id, status, source_id, created) and make it so:
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN `keywords_requests` kr
ON (
kr.keyword_id = Keyword.id
AND status = 'success'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
)
WHERE keyword_id IS NULL
UNION
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN `keywords_requests` kr
ON (
kr.keyword_id = Keyword.id
AND status = 'active'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
)
WHERE keyword_id IS NULL
This ideally should use NESTED LOOPS on your index.
Create a composite index on (status, source_id, created) and make it so:
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN (
SELECT *
FROM `keywords_requests` kr
WHERE
status = 'success'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
UNION ALL
SELECT *
FROM `keywords_requests` kr
WHERE
status = 'active'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
) AS KeywordsRequest
ON KeywordsRequest.keyword_id = Keyword.id
WHERE KeywordsRequest.keyword_id IS NULL
This will hopefully use HASH JOIN on even more restricted hash table.
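For reference, the two composite indexes described above could be created like this (index names are examples only):

CREATE INDEX idx_kr_keyword_status ON keywords_requests (keyword_id, status, source_id, created);
CREATE INDEX idx_kr_status_source  ON keywords_requests (status, source_id, created);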
When diagnosing MySQL query performance, one of the first things you need to analyze is the report from EXPLAIN.
If you learn to read the information EXPLAIN gives you, then you can see where queries are failing to make use of indexes, or where they are causing expensive filesorts, or other performance red flags.
I notice in your query, the GROUP BY is irrelevant, since there will be only one NULL row returned from KeywordRequests. Also the ORDER BY is irrelevant, since you're ordering by a column that will always be NULL due to your WHERE clause. If you remove these clauses, you'll probably eliminate a filesort.
Also consider rewriting the query into other forms, and measure the performance of each. For example:
SELECT k.id, k.keyword
FROM `keywords` AS k
WHERE NOT EXISTS (
SELECT * FROM `keywords_requests` AS kr
WHERE kr.keyword_id = k.id
AND kr.status IN ('success', 'active')
AND kr.source_id = '29'
AND kr.created > FROM_UNIXTIME(1234551323)
);
Other tips:
Is kr.source_id an integer? If so, compare to the integer 29 instead of the string '29'.
Are there appropriate indexes on keyword_id, status, source_id, created? Perhaps even a compound index over all four columns would be best, since MySQL will use only one index per table in a given query.
You did a screenshot of your EXPLAIN output and posted a link in the comments. I see that the query is not using an index from Keywords, which makes sense since you're scanning every row in that table anyway. The phrase "Not exists" indicates that MySQL has optimized the LEFT OUTER JOIN a bit.
I think this should be improved over your original query. The GROUP BY/ORDER BY was probably causing it to save an intermediate data set as a temporary table, and sorting it on disk (which is very slow!). What you'd look for is "Using temporary; using filesort" in the Extra column of EXPLAIN information.
So you may have improved it enough already to mitigate the bottleneck for now.
I do notice that the possible keys probably indicate that you have individual indexes on four columns. You may be able to improve that by creating a compound index:
CREATE INDEX kr_cover ON keywords_requests
(keyword_id, created, source_id, status);
You can give MySQL a hint to use a specific index:
... FROM `keywords_requests` AS kr USE INDEX (kr_cover) WHERE ...
Dunno about MySQL but in MSSQL the lines of attack I would take are:
1) Create a covering index on KeywordsRequest status, source_id and created
2) UNION the results to get around the OR on KeywordsRequest.status
3) Use NOT EXISTS instead of the outer join (and try UNION instead of OR too)
Try this
SELECT Keyword.id, Keyword.keyword
FROM keywords as Keyword
LEFT JOIN (
    SELECT keyword_id, id, created
    FROM keywords_requests
    WHERE source_id = '29'
      AND (status = 'success' OR status = 'active')
      AND created > FROM_UNIXTIME(1234551323)
) AS KeywordsRequest
ON KeywordsRequest.keyword_id = Keyword.id
WHERE KeywordsRequest.id IS NULL
GROUP BY Keyword.id
ORDER BY KeywordsRequest.created ASC;