Why doesn't this query run? - mysql

I have this query that isn't finishing (I think the server runs out of memory)
SELECT fOpen.*, fClose.*
FROM (
    SELECT of.*
    FROM fixtures of
    JOIN (
        SELECT MIN(id) id
        FROM fixtures
        GROUP BY matchId, period, type
    ) ofi ON ofi.id = of.id
) fOpen
JOIN (
    SELECT cf.*
    FROM fixtures cf
    JOIN (
        SELECT MAX(id) id
        FROM fixtures
        GROUP BY matchId, period, type
    ) cfi ON cfi.id = cf.id
) fClose ON fClose.matchId = fOpen.matchId
        AND fClose.period = fOpen.period
        AND fClose.type = fOpen.type
This is the EXPLAIN of it:
Those 2 subqueries 'of' and 'cf' take about 1.5s each if I run them separately.
'id' is the PRIMARY KEY, and there is a BTREE index named 'matchPeriodType' that has those 3 columns in that order.
More info: MySQL 5.5, 512MB of server memory, and the table has about 400k records.

I tried to rewrite your query so that it is easier to read and can use your indexes. I hope I got it right; I could not test it without your data.
SELECT fOpen.*, fClose.*
FROM (
SELECT MIN(id) AS min_id, MAX(id) AS max_id
FROM fixtures
GROUP BY matchId, period, type
) ids
JOIN fixtures fOpen ON ( fOpen.id = ids.min_id )
JOIN fixtures fClose ON ( fClose.id = ids.max_id );
This one gets MIN(id) and MAX(id) per matchId, period, type (which should use your index) and joins the corresponding rows afterwards.
Appending id to your existing matchPeriodType index could also help, since the sub-query could then be executed using that index alone.
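A sketch of that index change (using the index name matchPeriodType from the question; untested against your schema):

```sql
-- Recreate the index with id appended, so the GROUP BY sub-query
-- can be resolved from the index alone (a covering index).
ALTER TABLE fixtures DROP INDEX matchPeriodType;
ALTER TABLE fixtures ADD INDEX matchPeriodType (matchId, period, type, id);
```

Note that on InnoDB the primary key columns are implicitly appended to every secondary index anyway, so depending on your storage engine this may already be covered.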

Not sure how unique matchId / period / type is. If unique, you are joining 400k records against 400k records, possibly with the indexes being lost.
However, the 2 main subselects appear unnecessary: you could just join fixtures against itself, and that against the subselects, to get the min and max.

Related

Why two mysql selects executed separately are much faster than one combined?

I want to understand the case where running two queries separately takes around 400ms in total, but combining them using a sub-select takes around 12 seconds.
I have two InnoDB tables:
event: 99,914 rows
event_prizes: 24,540,770 rows
Below are my queries:
SELECT
id
FROM
event e
WHERE
e.status != 'SCHEDULED';
-- takes 130ms, returns 2406 rows
SELECT
id, count(*)
FROM
event_prizes
WHERE event_id in (
-- 2406 ids returned from the previous query
)
GROUP BY
id;
-- takes 270ms, returns the same amount of rows
From the other side when I run the query from below:
SELECT
id, count(*)
FROM
event_prizes
WHERE event_id in (
SELECT
id
FROM
event e
WHERE
e.status != 'SCHEDULED'
)
GROUP BY
id;
-- takes 12seconds
I guess in the second case MySQL does a full scan of the event_prizes table?
Is there any better way to write a single query for this case?
You can use an INNER JOIN instead of a sub-select:
SELECT ep.id, COUNT(*)
FROM event_prizes ep INNER JOIN event e ON ep.event_id = e.id
WHERE e.status <> 'SCHEDULED'
GROUP BY ep.id
Make sure you are using
a PRIMARY KEY on event.id
a PRIMARY KEY on event_prizes.id
a FOREIGN KEY on event_prizes.event_id
You can also try the following indices at least:
event(status)
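For example, a sketch of that DDL (the constraint and index names here are made up; adapt them to your schema):

```sql
ALTER TABLE event ADD PRIMARY KEY (id);
ALTER TABLE event_prizes ADD PRIMARY KEY (id);
ALTER TABLE event_prizes
    ADD CONSTRAINT fk_event_prizes_event
    FOREIGN KEY (event_id) REFERENCES event (id);
CREATE INDEX idx_event_status ON event (status);
```

In InnoDB, adding the foreign key also gives you an index on event_prizes.event_id, which is exactly what the join needs.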

Extreme query optimization with IN clause and subquery

My table has more than 15 million rows right now.
I need to run this query:
SELECT ch1.* FROM citizens_dynamic ch1
WHERE ch1.id IN (4369943, ..., 4383420, 4383700)
AND ch1.update_id_to = (
SELECT MAX(ch2.update_id_to)
FROM citizens_dynamic ch2
WHERE ch1.id = ch2.id AND ch2.update_id_to < 812
)
Basically, for every citizen in the IN clause it searches for the row with the closest update_id_to below the specified value.
There is a PRIMARY KEY on the 2 columns update_id_to, id.
At the moment, this query executes in 0.9s (with 100 ids in the IN clause).
That's still too slow; I would need to run my scripts for 3 days to complete.
Below you can see my EXPLAIN output.
The id index is just like the PRIMARY KEY, but with the columns reversed: id, update_id_to.
Do you have any ideas how to make it even faster?
I've found that MySQL tends to perform better with JOIN than correlated subqueries.
SELECT ch1.*
FROM citizens_dynamic AS ch1
JOIN (SELECT id, MAX(update_id_to) AS update_id_to
      FROM citizens_dynamic
      WHERE id IN (4369943, ..., 4383420, 4383700)
        AND update_id_to < 812
      GROUP BY id) AS ch2
  ON ch1.id = ch2.id
 AND ch1.update_id_to = ch2.update_id_to
WHERE ch1.id IN (4369943, ..., 4383420, 4383700)
Also, see the other methods in this question:
Retrieving the last record in each group

Can't find a way to reduce 2 SQL queries to 1 without killing performance

I have the following SQL query which works absolutely fine:
SELECT COUNT(*), COUNT(DISTINCT `fk_match_id`)
FROM `pass`
WHERE `passer` IN ('48717','33305','49413','1640')
AND `receiver` IN ('48717','33305','49413','1640');
The numbers in the IN clause are player IDs, and can be obtained from another table in the database called player. Each row in that table has a player_id, a team_id and a match_id, which is a foreign key to the match table.
I would like to automatically obtain those player IDs using the match_id. I can do this as follows:
SELECT COUNT(*), COUNT(DISTINCT `fk_match_id`)
FROM `pass`
WHERE `passer` IN
(
    SELECT player_id
    FROM `player`
    WHERE `team_id` = someTeamID
    AND `match_id` = someMatchID
)
AND `receiver` IN
(
    SELECT player_id
    FROM `player`
    WHERE `team_id` = someTeamID
    AND `match_id` = someMatchID
)
However, using subqueries is apparently infamously slow, and indeed it's far too slow to use here. Even using a join, as follows, is far too slow:
SELECT COUNT(*), COUNT(DISTINCT `fk_match_id`)
from `pass` st1
INNER JOIN `player` st2
ON (st1.passer = st2.player_id OR st1.receiver = st2.player_id);
That, too, is far too slow. So I want to know if it is possible to do in just one query what I can do in 2 queries in effectively 0.0 seconds (fetching the player IDs in one query and then running the first query takes virtually no time at all), or if that is completely impossible.
Any help would be greatly appreciated.
Thanks a lot!
EDIT::
The relevant table structures are as follows:
Player:
Pass:
I want to calculate the number of passes every player has made to another player in a given line-up, over all of history. I have a match id and a team id. I can obtain the players involved in a particular match for a team by querying the player table:
SELECT player_id
FROM `player`
WHERE `team_id` = someTeamID
AND `match_id` = someMatchID
This returns something like:
1803,1930,13310,1764,58845,15157,51938,2160,18892,12002,4101,14668,80979,59013
I then want to query the pass table and return every row where one of those IDs is in the passer column and one is in the receiver column.
You need a composite index on (passer, receiver):
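For example (the index name is made up):

```sql
ALTER TABLE pass ADD INDEX idx_passer_receiver (passer, receiver);
```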
After adding it, try the JOIN:
SELECT COUNT(*), COUNT(DISTINCT fk_match_id)
FROM pass
INNER JOIN player AS p
ON pass.passer = p.player_id
INNER JOIN player AS r
ON r.player_id = pass.receiver ;
If you want these results for a specific (team_id, match_id) combination, add a (team_id, match_id, player_id) index and then use:
SELECT COUNT(*), COUNT(DISTINCT fk_match_id)
FROM pass
INNER JOIN player AS p
ON p.team_id = someTeamID
AND p.match_id = someMatchID
AND p.player_id = pass.passer
INNER JOIN player AS r
ON r.team_id = someTeamID
AND r.match_id = someMatchID
AND r.player_id = pass.receiver ;

How to use actual row count (COUNT(*)) in WHERE clause without writing the same query as subquery?

I have something like this:
SELECT id, fruit, pip
FROM plant
WHERE COUNT(*) = 2;
This weird query is self-explanatory, I guess. COUNT(*) here means the number of rows in the plant table. My requirement is that I need to retrieve values from the specified fields only if the total number of rows in the table = 2. This doesn't work; it gives: invalid use of aggregate function COUNT.
I cannot do this:
SELECT COUNT(*) as cnt, id, fruit, pip
FROM plant
WHERE cnt = 2;
for one, it limits the number of rows output to 1, and two, it gives the same error: invalid use of aggregate function.
What I can do is instead:
SELECT id, fruit, pip
FROM plant
WHERE (
SELECT COUNT(*)
FROM plant
) = 2;
But then that subquery is just the main query re-run. I'm presenting here a small example of the larger problem; I know that an additional COUNT(*) subquery in the given example isn't that big an overhead.
Edit: I do not know why the question was downvoted. The COUNT(*) I'm trying to get is from a view (a derived table) in a large query with 5 to 6 joins and additional WHERE clauses. Re-running that query as a subquery to get the count is inefficient, and I can see the bottleneck as well.
Here is the actual query:
SELECT U.UserName, E.Title, AE.Mode, AE.AttemptNo,
IF(AE.Completed = 1, 'Completed', 'Incomplete'),
(
SELECT COUNT(DISTINCT(FK_QId))
FROM attempt_question AS AQ
WHERE FK_ExcAttemptId = #excAttemptId
) AS Inst_Count,
(
SELECT COUNT(DISTINCT(AQ.FK_QId))
FROM attempt_question AS AQ
JOIN `question` AS Q
ON Q.PK_Id = AQ.FK_QId
LEFT JOIN actions AS A
ON A.FK_QId = AQ.FK_QId
WHERE AQ.FK_ExcAttemptId = #excAttemptId
AND (
Q.Type = #descQtn
OR Q.Type = #actQtn
AND A.type = 'CTVI.NotImplemented'
AND A.IsDelete = #status
AND (
SELECT COUNT(*)
FROM actions
WHERE FK_QId = A.FK_QId
AND type != 'CTVI.NotImplemented'
AND IsDelete = #status
) = 0
)
) AS NotEvalInst_Count,
(
SELECT COUNT(DISTINCT(FK_QId))
FROM attempt_question AS AQ
WHERE FK_ExcAttemptId = #excAttemptId
AND Mark = #mark
) AS CorrectAns_Count,
E.AllottedTime, AE.TimeTaken
FROM attempt_exercise AS AE
JOIN ctvi_exercise_tblexercise AS E
ON AE.FK_EId = E.PK_EId
JOIN ctvi_user_table AS U
ON AE.FK_UId = U.PK_Id
JOIN ctvi_grade AS G
ON AE.FK_GId = G.PK_GId
WHERE AE.PK_Id = #excAttemptId
-- AND COUNT(AE.*) = #number --the portion in contention.
Kindly ignore the above query and guide me to right direction from the small example query I posted, thanks.
In MySQL, you can only do what you tried:
SELECT id, fruit, pip
FROM plant
WHERE (
SELECT COUNT(*)
FROM plant
) = 2;
or this variation:
SELECT id, fruit, pip
FROM plant
JOIN
(
SELECT COUNT(*) AS cnt
FROM plant
) AS c
ON c.cnt = 2;
Whether the 1st or the 2nd is more efficient, depends on the version of MySQL (and the optimizer). I would bet on the 2nd one, on most versions.
In other DBMSs that have window functions, you can also do the first query that @Andomar suggests.
Here is a suggestion to avoid the bottleneck of calculating the derived table twice: once to get the rows, and once more to get the count. If the derived table is expensive to calculate and its rows number in the thousands or millions, calculating them twice only to throw them away is indeed a problem. This version may improve efficiency, as it limits the intermediately (twice) calculated rows to 3:
SELECT p.*
FROM
( SELECT id, fruit, pip
FROM plant
LIMIT 3
) AS p
JOIN
( SELECT COUNT(*) AS cnt
FROM
( SELECT 1
FROM plant
LIMIT 3
) AS tmp
) AS c
ON c.cnt = 2 ;
After re-reading your question, you're trying to return rows only if there are 2 rows in the entire table. In that case I think your own example query is already the best.
On another DBMS, you could use a Windowing function:
select *
from (
select *
, count(*) over () as cnt
from plant
) as SubQueryAlias
where cnt = 2
But the OVER clause is not supported in MySQL.
Old, wrong answer below:
The where clause works before grouping. It works on single rows, not groups of rows, so you can't use aggregates like count or max in the where clause.
To set filters that work on groups of rows, use the having clause. It works after grouping and can be used to filter with aggregates:
SELECT id, fruit, pip
FROM plant
GROUP BY
id, fruit, pip
HAVING COUNT(*) = 2;
The other answers do not fulfill the original question which was to filter the results "without using a subquery".
You can actually do this by using a variable in 2 consecutive MySQL statements:
SET @count = 0;
SELECT * FROM
(
SELECT id, fruit, pip, @count := @count + 1 AS count
FROM plant
) tmp
WHERE @count = 2;

How to optimize query looking for rows where conditional join rows do not exist?

I've got a table of keywords that I regularly refresh against a remote search API, and another table that gets a row each time I refresh one of the keywords. I use this table to block multiple processes from stepping on each other by refreshing the same keyword, as well as for stat collection. So when I spin up my program, it queries for all the keywords that don't have a request currently in process and don't have a successful one within the last 15 minutes (or whatever the interval is). All was working fine for a while, but now the keywords_requests table has almost 2 million rows in it and things are bogging down badly. I've got indexes on almost every column in the keywords_requests table, but to no avail.
I'm logging slow queries and this one is taking forever, as you can see. What can I do?
# Query_time: 20 Lock_time: 0 Rows_sent: 568 Rows_examined: 1826718
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT JOIN `keywords_requests` as KeywordsRequest
ON (
KeywordsRequest.keyword_id = Keyword.id
AND (KeywordsRequest.status = 'success' OR KeywordsRequest.status = 'active')
AND KeywordsRequest.source_id = '29'
AND KeywordsRequest.created > FROM_UNIXTIME(1234551323)
)
WHERE KeywordsRequest.id IS NULL
GROUP BY Keyword.id
ORDER BY KeywordsRequest.created ASC;
It seems the most selective index for this query is the one on KeywordsRequest.created.
Try to rewrite the query this way:
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN (
SELECT *
FROM `keywords_requests` as kr
WHERE created > FROM_UNIXTIME(1234567890) /* Happy unix_time! */
) AS KeywordsRequest
ON (
KeywordsRequest.keyword_id = Keyword.id
AND (KeywordsRequest.status = 'success' OR KeywordsRequest.status = 'active')
AND KeywordsRequest.source_id = '29'
)
WHERE keyword_id IS NULL;
It will (hopefully) hash join two not so large sources.
And Bill Karwin is right, you don't need the GROUP BY or ORDER BY
There is no fine control over the plans in MySQL, but you can try (try) to improve your query in the following ways:
Create a composite index on (keyword_id, status, source_id, created) and make it so:
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN `keywords_requests` kr
ON (
keyword_id = id
AND status = 'success'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
)
WHERE keyword_id IS NULL
UNION
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN `keywords_requests` kr
ON (
keyword_id = id
AND status = 'active'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
)
WHERE keyword_id IS NULL
This ideally should use NESTED LOOPS on your index.
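That composite index could be created like this (the index name is an assumption):

```sql
CREATE INDEX idx_kr_keyword_status_source_created
    ON keywords_requests (keyword_id, status, source_id, created);
```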
Create a composite index on (status, source_id, created) and make it so:
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN (
SELECT *
FROM `keywords_requests` kr
WHERE
status = 'success'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
UNION ALL
SELECT *
FROM `keywords_requests` kr
WHERE
status = 'active'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
)
ON keyword_id = id
WHERE keyword_id IS NULL
This will hopefully use HASH JOIN on even more restricted hash table.
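The composite index for this variant could be created like this (again, the name is an assumption):

```sql
CREATE INDEX idx_kr_status_source_created
    ON keywords_requests (status, source_id, created);
```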
When diagnosing MySQL query performance, one of the first things you need to analyze is the report from EXPLAIN.
If you learn to read the information EXPLAIN gives you, then you can see where queries are failing to make use of indexes, or where they are causing expensive filesorts, or other performance red flags.
I notice in your query, the GROUP BY is irrelevant, since there will be only one NULL row returned from KeywordRequests. Also the ORDER BY is irrelevant, since you're ordering by a column that will always be NULL due to your WHERE clause. If you remove these clauses, you'll probably eliminate a filesort.
Also consider rewriting the query into other forms, and measure the performance of each. For example:
SELECT k.id, k.keyword
FROM `keywords` AS k
WHERE NOT EXISTS (
SELECT * FROM `keywords_requests` AS kr
WHERE kr.keyword_id = k.id
AND kr.status IN ('success', 'active')
AND kr.source_id = '29'
AND kr.created > FROM_UNIXTIME(1234551323)
);
Other tips:
Is kr.source_id an integer? If so, compare to the integer 29 instead of the string '29'.
Are there appropriate indexes on keyword_id, status, source_id, created? Perhaps even a compound index over all four columns would be best, since MySQL will use only one index per table in a given query.
You did a screenshot of your EXPLAIN output and posted a link in the comments. I see that the query is not using an index from Keywords, which makes sense since you're scanning every row in that table anyway. The phrase "Not exists" indicates that MySQL has optimized the LEFT OUTER JOIN a bit.
I think this should be improved over your original query. The GROUP BY/ORDER BY was probably causing it to save an intermediate data set as a temporary table, and sorting it on disk (which is very slow!). What you'd look for is "Using temporary; using filesort" in the Extra column of EXPLAIN information.
So you may have improved it enough already to mitigate the bottleneck for now.
I do notice that the possible keys probably indicate that you have individual indexes on four columns. You may be able to improve that by creating a compound index:
CREATE INDEX kr_cover ON keywords_requests
(keyword_id, created, source_id, status);
You can give MySQL a hint to use a specific index:
... FROM `keywords_requests` AS kr USE INDEX (kr_cover) WHERE ...
Dunno about MySQL, but in MSSQL the lines of attack I would take are:
1) Create a covering index on KeywordsRequest (status, source_id, created)
2) UNION the results to get around the OR on KeywordsRequest.status
3) Use NOT EXISTS instead of the outer join (and try it with UNION instead of OR too)
Try this
SELECT Keyword.id, Keyword.keyword
FROM keywords AS Keyword
LEFT JOIN (
    SELECT keyword_id, created
    FROM keywords_requests
    WHERE (status = 'success' OR status = 'active')
    AND source_id = '29'
    AND created > FROM_UNIXTIME(1234551323)
) AS KeywordsRequest
ON (
    KeywordsRequest.keyword_id = Keyword.id
)
WHERE KeywordsRequest.keyword_id IS NULL
GROUP BY Keyword.id;