MySQL Query optimization with JOIN and COUNT - mysql

I have the following MySQL Query:
SELECT t1.id, t1.releaseid, t1.site, t1.date, t2.pos FROM `tracers` as t1
LEFT JOIN (
SELECT `releaseid`, `date`, COUNT(*) AS `pos`
FROM `tracers` GROUP BY `releaseid`
) AS t2 ON t1.releaseid = t2.releaseid AND t2.date <= t1.date
ORDER BY `date` DESC , `pos` DESC LIMIT 0 , 100
The idea being to select a release and count how many other sites had also released it prior to the recorded date, to get the position.
Explain says:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t1 ALL NULL NULL NULL NULL 498422 Using temporary; Using filesort
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 91661
2 DERIVED tracers index NULL releaseid 4 NULL 498422
Any suggestions on how to eliminate the Using temporary; Using filesort ? It takes a loooong time. The indexes I have thought of and tried haven't helped anything.

Try adding an index on tracers.releaseid and one on tracers.date

make sure you have an index on releaseid.
flip your JOIN, the sub-query must be on the left side in the LEFT JOIN.
put the ORDER BY and LIMIT clauses inside the sub-query.

Try having two indices, one on (date) and one on (releaseid, date).
Another thing is that your query does not seem to be doing what you describe it does. Does it actually count correctly?
Try rewriting it as:
SELECT t1.id, t1.releaseid, t1.site, t1.`date`
, COUNT(*) AS pos
FROM tracers AS t1
JOIN tracers AS t2
ON t2.releaseid = t1.releaseid
AND t2.`date` <= t1.`date`
GROUP BY t1.releaseid
ORDER BY t1.`date` DESC
, pos DESC
LIMIT 0 , 100
or as:
SELECT t1.id, t1.releaseid, t1.site, t1.`date`
, ( SELECT COUNT(*)
FROM tracers AS t2
WHERE t2.releaseid = t1.releaseid
AND t2.`date` <= t1.`date`
) AS pos
FROM tracers AS t1
ORDER BY t1.`date` DESC
, pos DESC
LIMIT 0 , 100

This answer below maybe not change explain output, however if your major problem is sorting data, which it identified by removing order clause will makes your query run faster, try to sort your subquery join table first and your query will be:
SELECT t1.id, t1.releaseid, t1.site, t1.date, t2.pos FROM `tracers` as t1
LEFT JOIN (
SELECT `releaseid`, `date`, COUNT(*) AS `pos`
FROM `tracers` GROUP BY `releaseid`
ORDER BY `pos` DESC -- additional order
) AS t2 ON t1.releaseid = t2.releaseid AND t2.date <= t1.date
ORDER BY `date` DESC , `pos` DESC LIMIT 0 , 100
Note: My db version is mysql-5.0.96-x64, maybe in another version you get different result.

Related

MySql is null vs is not null performance

I have a query where I am basically doing a left outer join and checking if the joined value is null
select count(T1.code)
from ( select code
from asset
where type = 'meter'
and creation_time <= '2022-04-29 00:00:00'
and (deactivation_time > '2022-04-28 00:00:00' or deactivation_time is null )
group by code
) as T1
left join ( select asset_code
from amr_midnight_data
where server_time between '2022-04-28 00:00:00' and '2022-04-29 00:00:00'
group by asset_code
) as T2 on T1.code = T2.asset_code
Where T2.asset_code is null;
This query takes 3 seconds to execute, but if I replace the is null at the end with is not null, it takes less then a second. Why is there a performance difference here and what alternatives do I have to make my original query faster?
Look at the EXPLAIN. A guess... Changing to IS NOT NULL lets the Optimizer change LEFT JOIN to JOIN, which lets it start with amr_midnight_data which might optimize better.
I think that the LEFT JOIN ( SELECT ... ) .. IS [NOT] NULL can be replaced with
WHERE [NOT] EXISTS ( SELECT 1 FROM amr_midnight_data
WHERE asset_code = T1.code
AND server_time >= '2022-04-28'
AND server_time < '2022-04-28' + INTERVAL 1 DAY )
That would like to have INDEX(asset_code, server_time)
EXISTS is faster than SELECT .. GROUP BY because it can stop as soon as one matching row is found.
asset would probably benefit from INDEX(type, creation_time) or (to make it "covering"):
INDEX(time, creation_time, deactivation_time, code)
If you wish to discuss further, please provide SHOW CREATE TABLE for both tables and EXPLAIN for each SELECT.

MySQL query is slow - difference in successive dates at the group level

Below is my MySQL query to find the difference between successive date for each account and then using the results to prepare a frequency count table. This query is of course very slow but before that am I doing the right thing? Please help if you can. Also embedded is a small data sample.
Appreciate your time.
OZooHA
ID DATE
403 2008-06-01
403 2012-06-01
403 2011-06-01
403 2010-06-01
403 2009-06-01
15028 2011-07-01
15028 2010-07-01
15028 2009-07-01
15028 2008-07-01
SELECT
month_diff,
count(*)
FROM
(SELECT t1.id,
t1.date,
MIN(t2.date) AS lag_date,
TIMESTAMPDIFF(MONTH, t1.date, MIN(t2.date)) AS month_diff
FROM tbl_name T1
INNER JOIN tbl_name T2
ON t1.id = t2.id
AND t2.date > t1.date
GROUP BY t1.id, t1.date
ORDER BY t1.id, t1.date
)
GROUP BY month_diff
ORDER BY month_diff
Likely, materializing the inline view is taking most of the time. Ensure you have suitable indexes available to improve performance of the join operation; a covering index ON tbl_name (id, date) would likely be optimal for this query.
With a suitable index available (as above) it may be possible to get better performance with a query something like this:
SELECT d.month_diff
, COUNT(*)
FROM ( SELECT IF(#prev_id = t.id
, TIMESTAMPDIFF(MONTH, t.date, #prev_date )
, NULL
) AS month_diff
, #prev_date := t.date
, #prev_id := t.id
FROM tbl_name t
CROSS
JOIN (SELECT #prev_date := NULL, #prev_id := NULL) i
GROUP BY t.id DESC, t.date DESC
) d
WHERE d.month_diff IS NOT NULL
GROUP BY d.month_diff
Note that the usage of MySQL user-defined variables is not guaranteed. But we do observe consistent behavior with queries written in a particular way. (Future versions of MySQL may change the behavior we observe.)
EDIT: I modified the query above, to replace the ORDER BY t.id, t.date with a GROUP BY t.id, t.date... It's not clear from the example data whether (id,date) is guaranteed to be unique. (If we do have that guarantee, then we don't need the GROUP BY, we can just use ORDER BY. Otherwise, we need the GROUP BY to get the same result returned by the original query.)

MySQL: Two moving averages in the same query?

Is it possible to get two different moving averages from the same MySQL dataset at the same time?
I'm trying to extract data from a MySQL database that gives me the 'raw' data, plus two different moving averages of the same data set. My best attempt is below, the problem is that the two moving averages appear to be producing identical results?
Also, is there a more efficient way of querying the data? The dataset is reasonably large and this query takes a little too long to run?
SELECT
t1.`DateTime`,
t1.`Positive` AS `RawData`,
(SELECT AVG(t2.`Positive`)
FROM `tbl_DATA_KeywordResults` as t2
WHERE t2.`DateTime` <= t1.`DateTime`
ORDER BY t2.`DateTime` DESC
LIMIT 96
) AS `DailyAverage`,
(SELECT AVG(t3.`Positive`)
FROM `tbl_DATA_KeywordResults` as t3
WHERE t3.`DateTime` <= t1.`DateTime`
ORDER BY t3.`DateTime` DESC
LIMIT 674
) AS `WeeklyAverage`
FROM `tbl_DATA_KeywordResults` AS t1
ORDER BY t1.`DateTime`;
You are taking the limit after you do the average. The correct form of the subquery would be:
(select avg(Positive)
from (SELECT t2.`Positive`
FROM `tbl_DATA_KeywordResults` as t2
WHERE t2.`DateTime` <= t1.`DateTime`
ORDER BY t2.`DateTime` DESC
LIMIT 96
) t
) AS `DailyAverage`
I'm not 100% sure that this will work as a subquery. I believe MySQL limits outer references (what's in the where clause) to one layer deep.
There are more painful ways of doing this in MySQL:
select t1.DateTime, t1.RawData,
avg(case when t2.DateTime between avg_96_dt and t1.DateTime then t2.Positive end) as avg96,
avg(case when t2.DateTime between avg_674_dt and t1.DateTime then t2.Positive end) as avg674
from (SELECT t1.`DateTime`, t1.`Positive` AS `RawData`,
(SELECT t2.DateTime
FROM `tbl_DATA_KeywordResults` t2
WHERE t2.`DateTime` <= t1.`DateTime`
ORDER BY t2.`DateTime` DESC
LIMIT 95, 1
) as avg_96_dt,
(SELECT t2.DateTime
FROM `tbl_DATA_KeywordResults` t2
WHERE t2.`DateTime` <= t1.`DateTime`
ORDER BY t2.`DateTime` DESC
LIMIT 673, 1
) as avg_674_dt
FROM `tbl_DATA_KeywordResults` t1
) t1 join
tbl_DATA_KeywordResults t2
group by t1.DateTime, t1.RawData
ORDER BY t1.`DateTime`;
That is, get the limits for the date time range and then do the average in a different step.

MySQL Nested Select Query?

Ok, so I have the following query:
SELECT MIN(`date`), `player_name`
FROM `player_playtime`
GROUP BY `player_name`
I then need to use this result inside the following query:
SELECT DATE(`date`) , COUNT(DISTINCT `player_name`)
FROM `player_playtime /*Use previous query result here*/`
GROUP BY DATE( `date`) DESC LIMIT 60
How would I go about doing this?
You just need to write the first query as a subquery (derived table), inside parentheses, pick an alias for it (t below) and alias the columns as well.
The DISTINCT can also be safely removed as the internal GROUP BY makes it redundant:
SELECT DATE(`date`) AS `date` , COUNT(`player_name`) AS `player_count`
FROM (
SELECT MIN(`date`) AS `date`, `player_name`
FROM `player_playtime`
GROUP BY `player_name`
) AS t
GROUP BY DATE( `date`) DESC LIMIT 60 ;
Since the COUNT is now obvious that is only counting rows of the derived table, you can replace it with COUNT(*) and further simplify the query:
SELECT t.date , COUNT(*) AS player_count
FROM (
SELECT DATE(MIN(`date`)) AS date
FROM player_playtime
GROUP BY player_name
) AS t
GROUP BY t.date DESC LIMIT 60 ;

Performance Issue On MySQL DataBase

I have table contains around 14 million records, and I have multiple SP's contain Dynamic SQL, and these SP's contain multiple parameters,and I build Indexes on my table, but the problem is I have a performance Issue, I tried to get the Query from Dynamic SQL and run it, but this query takes between 30 Seconds to 1 minute, my query contains just select from table and some queries contain join with another table with numeric values in where statement and grouping and order by.
I checked status result, I found the grouping by takes all time, and I checked Explain result, It's using right index.
So what I should doing to enhance my queries performance.
Thanks for your cooperation.
-- EDIT, Added queries directly into question instead of comment.
SELECT
CONCAT(column1, ' - ', column1 + INTERVAL 1 MONTH) AS DateRange,
cast(SUM(column2) as SIGNED) AS Alias1
FROM
Table1
INNER JOIN Table2 DD
ON Table1.Date = Table2.Date
WHERE
Table1.ID = 1
AND (Date BETWEEN 20110101 AND 20110201)
GROUP BY
MONTH(column1)
ORDER BY
Alias1 ASC
LIMIT 0, 10;
and this one:
SELECT
cast(column1 as char(30)) AS DateRange,
cast(SUM(column2) as SIGNED)
FROM
Table1
INNER JOIN Table2 DD
ON Table1.Date = Table2.Date
WHERE
Table1.ID = 1
AND (Date BETWEEN 20110101 AND 20110102)
GROUP BY
column1
ORDER BY
Alias1 ASC
LIMIT 0, 10;
For this query:
SELECT
CONCAT(column1, ' - ', column1 + INTERVAL 1 MONTH) AS DateRange <<--error? never mind
, cast(SUM(column2) as SIGNED)
FROM Table1
INNER JOIN Table2 DD ON Table1.Date = Table2.Date
WHERE Table1.ID = 1
AND (Date BETWEEN 20110101 AND 20110201)
GROUP BY MONTH(column1) <<-- problem 1.
ORDER BY column2 ASC <<-- problem 2.
LIMIT 0, 10;
If you group by a function MySQL cannot use an index. You can speed this up by adding an extra column YearMonth to the table1 that contains the year+month, put an index on that and then group by yearmonth.
The order by does not make sense. You are adding column2, ordering by that column serves no purpose. If you order by yearmonth asc the query will run much faster and make more sense.