MySQL query is slow - difference in successive dates at the group level - mysql

Below is my MySQL query to find the difference between successive dates for each account and then use the results to build a frequency count table. The query is, of course, very slow, but before tackling that: am I doing the right thing? Please help if you can. A small data sample is also included.
Appreciate your time.
OZooHA
ID DATE
403 2008-06-01
403 2012-06-01
403 2011-06-01
403 2010-06-01
403 2009-06-01
15028 2011-07-01
15028 2010-07-01
15028 2009-07-01
15028 2008-07-01
SELECT
    month_diff,
    COUNT(*)
FROM
    (SELECT t1.id,
            t1.date,
            MIN(t2.date) AS lag_date,
            TIMESTAMPDIFF(MONTH, t1.date, MIN(t2.date)) AS month_diff
     FROM tbl_name t1
     INNER JOIN tbl_name t2
         ON t1.id = t2.id
        AND t2.date > t1.date
     GROUP BY t1.id, t1.date
     ORDER BY t1.id, t1.date
    ) d  -- MySQL requires an alias on every derived table
GROUP BY month_diff
ORDER BY month_diff

Likely, materializing the inline view is taking most of the time. Ensure you have suitable indexes available to improve performance of the join operation; a covering index ON tbl_name (id, date) would likely be optimal for this query.
With a suitable index available (as above) it may be possible to get better performance with a query something like this:
SELECT d.month_diff
     , COUNT(*)
  FROM ( SELECT IF(@prev_id = t.id
                  , TIMESTAMPDIFF(MONTH, t.date, @prev_date)
                  , NULL
                ) AS month_diff
              , @prev_date := t.date
              , @prev_id := t.id
           FROM tbl_name t
          CROSS
           JOIN (SELECT @prev_date := NULL, @prev_id := NULL) i
          GROUP BY t.id DESC, t.date DESC
       ) d
 WHERE d.month_diff IS NOT NULL
 GROUP BY d.month_diff
Note that the behavior of MySQL user-defined variables, when they are assigned and read within the same statement, is not guaranteed. But we do observe consistent behavior with queries written in a particular way. (Future versions of MySQL may change the behavior we observe.)
EDIT: I modified the query above, to replace the ORDER BY t.id, t.date with a GROUP BY t.id, t.date... It's not clear from the example data whether (id,date) is guaranteed to be unique. (If we do have that guarantee, then we don't need the GROUP BY, we can just use ORDER BY. Otherwise, we need the GROUP BY to get the same result returned by the original query.)
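To check what either query should return, the computation can be reproduced outside the database. The following is a minimal Python sketch, not part of the original answer, run on the sample rows from the question. The helper names month_diff and month_diff_frequency are made up for illustration, and month_diff only approximates TIMESTAMPDIFF(MONTH, ...) for plain dates where the later date comes second.

```python
from collections import Counter
from datetime import date
from itertools import groupby

# Sample rows from the question: (id, date)
rows = [
    (403, date(2008, 6, 1)), (403, date(2012, 6, 1)), (403, date(2011, 6, 1)),
    (403, date(2010, 6, 1)), (403, date(2009, 6, 1)),
    (15028, date(2011, 7, 1)), (15028, date(2010, 7, 1)),
    (15028, date(2009, 7, 1)), (15028, date(2008, 7, 1)),
]


def month_diff(earlier, later):
    """Whole months from earlier to later (hypothetical helper, assumes later >= earlier)."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the last month is not complete yet
    return months


def month_diff_frequency(rows):
    """Count how often each month gap occurs between successive dates of the same id."""
    counts = Counter()
    ordered = sorted(set(rows))  # set() drops duplicate (id, date) pairs, like the GROUP BY
    for _, group in groupby(ordered, key=lambda r: r[0]):
        dates = [d for _, d in group]
        for prev, nxt in zip(dates, dates[1:]):
            counts[month_diff(prev, nxt)] += 1
    return dict(counts)


frequency = month_diff_frequency(rows)
print(frequency)
```

Each id is sorted once and walked once, which is the same idea as the variable-based query: a single ordered pass instead of a self-join.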

Related

Query taking lot of time to execute

I am trying to run a one-time query to copy data from a client database into our database, but the query takes a long time to execute when I change the ORDER BY from the primary key user_appoint.id to user_appoint.u_id. My query is below:
SELECT
CONCAT('D',user_appoint.`id`) AS ApptId,
user_appoint.`u_id`,
tbl_questions.CandAns,
tbl_questions.ExamAns,
tbl_questions.QueNote,
CONCAT("[",GROUP_CONCAT(CONCAT('"',`tbl_investigations`.`test_id`,'":"',tbl_investigations.`result`,'"')),"]") AS CandInv,
CONCAT("[",GROUP_CONCAT(CONCAT('"',`tbl_investigations`.`test_id`,'":"',tbl_investigations.`comments`,'"')),"]") AS IntComm,
IF(tbl_questions.LastUpdatedDateTime>MAX(tbl_investigations.`ModifiedAt`),tbl_questions.LastUpdatedDateTime,MAX(tbl_investigations.`ModifiedAt`)) AS LastUpdatedDateTime,
CONCAT('D',user_appoint.`id`) AS UniqueId
FROM user_appoint
LEFT JOIN tbl_investigations ON tbl_investigations.`appt_id`=user_appoint.`id` AND tbl_investigations.`ModifiedAt`>'2011-01-01 00:00:00'
LEFT JOIN tbl_questions ON tbl_questions.`appt_id` =user_appoint.`id` AND tbl_questions.`LastUpdatedDateTime`>'2011-01-01 00:00:00'
GROUP BY user_appoint.`id`
HAVING LastUpdatedDateTime>'2011-01-01 00:00:00'
ORDER BY user_appoint.`u_id`
LIMIT 0, 2000;
user_appoint.u_id is properly indexed.
Please check the EXPLAIN plan of your query. It is better to always share the EXPLAIN plan with your original question.
explain format=json
SELECT CONCAT('D',user_appoint.id) AS ApptId, user_appoint.u_id,
tbl_questions.CandAns, tbl_questions.ExamAns, tbl_questions.QueNote,
CONCAT("[",GROUP_CONCAT(CONCAT('"',tbl_investigations.test_id,'":"',tbl_investigations.result,'"')),"]")
AS CandInv,
CONCAT("[",GROUP_CONCAT(CONCAT('"',tbl_investigations.test_id,'":"',tbl_investigations.comments,'"')),"]")
AS IntComm,
IF(tbl_questions.LastUpdatedDateTime>MAX(tbl_investigations.ModifiedAt),tbl_questions.LastUpdatedDateTime,MAX(tbl_investigations.ModifiedAt))
AS LastUpdatedDateTime, CONCAT('D',user_appoint.id) AS UniqueId FROM
user_appoint LEFT JOIN tbl_investigations ON
tbl_investigations.appt_id=user_appoint.id AND
tbl_investigations.ModifiedAt>'2011-01-01 00:00:00' LEFT JOIN
tbl_questions ON tbl_questions.appt_id =user_appoint.id AND
tbl_questions.LastUpdatedDateTime>'2011-01-01 00:00:00' GROUP BY
user_appoint.id HAVING LastUpdatedDateTime>'2011-01-01 00:00:00'
ORDER BY user_appoint.u_id LIMIT 0, 2000;
Looking at your query, I can see many CONCAT calls, aggregate functions, and joins being performed in a single query.
These operations are performed for all 2000 records permitted by the LIMIT you have set.
This may be what slows the query down.
You have 2 identical columns with different aliases
CONCAT('D',user_appoint.`id`) AS ApptId,
CONCAT('D',user_appoint.`id`) AS UniqueId
(changed) Assuming NULLs may occur in these date columns then comparing the max() values will overcome any adverse impacts by NULL:
if(max(tbl_questions.lastupdateddatetime) > max(tbl_investigations.`modifiedat`) , max(tbl_questions.lastupdateddatetime), max(tbl_investigations.`modifiedat`)) AS LastUpdatedDateTime
Try this:
SELECT *
FROM (
SELECT
Concat('D', user_appoint.`id`) AS ApptId
, user_appoint.`u_id`
, tbl_questions.candans
, tbl_questions.examans
, tbl_questions.quenote
, Concat("[", Group_concat(Concat('"', `tbl_investigations`.`test_id`, '":"', tbl_investigations.`result`, '"')), "]") AS CandInv
, Concat("[", Group_concat(Concat('"', `tbl_investigations`.`test_id`, '":"', tbl_investigations.`comments`, '"')), "]") AS IntComm
, if(max(tbl_questions.lastupdateddatetime) > max(tbl_investigations.`modifiedat`) , max(tbl_questions.lastupdateddatetime), max(tbl_investigations.`modifiedat`) ) AS LastUpdatedDateTime
, Concat('D', user_appoint.`id`) AS UniqueId
FROM user_appoint
LEFT JOIN tbl_investigations
ON tbl_investigations.`appt_id` = user_appoint.`id`
AND tbl_investigations.`modifiedat` > '2011-01-01 00:00:00'
LEFT JOIN tbl_questions
ON tbl_questions.`appt_id` = user_appoint.`id`
AND tbl_questions.`lastupdateddatetime` > '2011-01-01 00:00:00'
GROUP BY user_appoint.`id`
HAVING lastupdateddatetime > '2011-01-01 00:00:00'
) d
ORDER BY `u_id`
LIMIT 0, 2000
;
HOWEVER
You are using a non-current and non-standard form of GROUP BY clause. MySQL started life allowing this bizarre situation where you could select many columns but only group by one of those. This is completely non-standard for SQL.
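In recent versions of MySQL (5.7.5 and later) the default settings have changed: ONLY_FULL_GROUP_BY is enabled by default, so selecting non-aggregated columns that are neither in the GROUP BY clause nor functionally dependent on it will cause an error.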
So, you may have to change the way you perform the grouping to
GROUP BY
user_appoint.`id`
, user_appoint.`u_id`
, tbl_questions.candans
, tbl_questions.examans
, tbl_questions.quenote
If none of these improve performance please provide the execution plan (as text).

SQL find rows where value is not increasing

I have a table with columns like this:
id | timestamp | ...
and I am looking for rows where the timestamp decreased since the previous row.
I tried a statement like this:
SELECT count(a.id)
FROM tbl AS a INNER JOIN tbl AS b ON a.id+1=b.id
WHERE a.timestamp<b.timestamp;
but it appears not to have worked. I get zero results even though I expect some. Any suggestions what is wrong?
I would also appreciate any ideas on a better way to write this query.
I am using MySQL.
You can get the previous value using a correlated subquery, and then use that for the comparison:
select t.*
from (select t.*,
(select t2.timestamp from tbl t2 where t2.id < t.id order by t2.id desc limit 1
) as prevts
from tbl t
) t
where timestamp < prevts;
The problem with your query is probably that the ids have gaps in them.
EDIT:
You can do this with variables. The challenge is getting the variable comparison and assignment in a single expression. This is needed because MySQL does not guarantee the order of evaluation of expressions in a select statement.
The following computes IsDecreasing and assigns the new value to the variable within the same expression:
select t.*
from (select t.*,
             if(@prev > timestamp, if(@prev := timestamp, 1, 1),
                if(@prev := timestamp, 0, 0)
               ) IsDecreasing
      from tbl t cross join
           (select @prev := -1) vars
      order by id
     ) t
where IsDecreasing = 1;
This should be faster than the previous method -- probably even when you have the right index.
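The effect of gaps in the ids can be seen with a small Python sketch (not from the original answer; the data and the function name decreasing_rows are hypothetical). It walks the rows in id order and compares each timestamp with the one before it, which is what both queries above do.

```python
def decreasing_rows(rows):
    """Return the (id, timestamp) rows whose timestamp is lower than the previous row's.

    "Previous" means the row with the next-lower id, whatever that id is,
    so gaps in the id sequence do not matter.
    """
    result = []
    prev_ts = None
    for row_id, ts in sorted(rows):  # order by id
        if prev_ts is not None and ts < prev_ts:
            result.append((row_id, ts))
        prev_ts = ts
    return result


# Hypothetical data: ids 3 and 6 are missing.
rows = [(1, 100), (2, 110), (4, 105), (5, 120), (7, 90)]
decreasing = decreasing_rows(rows)
print(decreasing)
```

With this data a join on a.id + 1 = b.id would only pair ids (1, 2) and (4, 5), so it would never compare the rows on either side of a gap and would miss both decreases.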

MySQL: Two moving averages in the same query?

Is it possible to get two different moving averages from the same MySQL dataset at the same time?
I'm trying to extract data from a MySQL database that gives me the 'raw' data, plus two different moving averages of the same data set. My best attempt is below; the problem is that the two moving averages appear to produce identical results.
Also, is there a more efficient way of querying the data? The dataset is reasonably large and this query takes a little too long to run.
SELECT
t1.`DateTime`,
t1.`Positive` AS `RawData`,
(SELECT AVG(t2.`Positive`)
FROM `tbl_DATA_KeywordResults` as t2
WHERE t2.`DateTime` <= t1.`DateTime`
ORDER BY t2.`DateTime` DESC
LIMIT 96
) AS `DailyAverage`,
(SELECT AVG(t3.`Positive`)
FROM `tbl_DATA_KeywordResults` as t3
WHERE t3.`DateTime` <= t1.`DateTime`
ORDER BY t3.`DateTime` DESC
LIMIT 674
) AS `WeeklyAverage`
FROM `tbl_DATA_KeywordResults` AS t1
ORDER BY t1.`DateTime`;
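You are taking the limit after you do the average: AVG() without a GROUP BY returns a single row, so the LIMIT has nothing left to trim and both subqueries average every row up to t1.DateTime. The correct form of the subquery would be: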
(select avg(Positive)
from (SELECT t2.`Positive`
FROM `tbl_DATA_KeywordResults` as t2
WHERE t2.`DateTime` <= t1.`DateTime`
ORDER BY t2.`DateTime` DESC
LIMIT 96
) t
) AS `DailyAverage`
I'm not 100% sure that this will work as a subquery. I believe MySQL limits outer references (what's in the where clause) to one layer deep.
There are more painful ways of doing this in MySQL:
select t1.DateTime, t1.RawData,
avg(case when t2.DateTime between avg_96_dt and t1.DateTime then t2.Positive end) as avg96,
avg(case when t2.DateTime between avg_674_dt and t1.DateTime then t2.Positive end) as avg674
from (SELECT t1.`DateTime`, t1.`Positive` AS `RawData`,
(SELECT t2.DateTime
FROM `tbl_DATA_KeywordResults` t2
WHERE t2.`DateTime` <= t1.`DateTime`
ORDER BY t2.`DateTime` DESC
LIMIT 95, 1
) as avg_96_dt,
(SELECT t2.DateTime
FROM `tbl_DATA_KeywordResults` t2
WHERE t2.`DateTime` <= t1.`DateTime`
ORDER BY t2.`DateTime` DESC
LIMIT 673, 1
) as avg_674_dt
FROM `tbl_DATA_KeywordResults` t1
) t1 join
tbl_DATA_KeywordResults t2
group by t1.DateTime, t1.RawData
ORDER BY t1.`DateTime`;
That is, get the limits for the date time range and then do the average in a different step.
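For comparison, here is what the two averages are meant to be, as a minimal Python sketch over values already sorted by DateTime (not from the original answer; trailing_average and the sample values are made up for illustration). Where fewer than `window` rows exist, it averages the rows available, as the LIMIT-inside-a-derived-table form does.

```python
def trailing_average(values, window):
    """For each position, average the last `window` values up to and including it."""
    averages = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        count = min(i + 1, window)
        averages.append(running / count)
    return averages


# Hypothetical series; the question uses windows of 96 and 674 rows.
values = [1, 2, 3, 4, 5, 6]
short_avg = trailing_average(values, 2)
long_avg = trailing_average(values, 4)
print(short_avg)
print(long_avg)
```

Both windows come out of one ordered pass each, so if the SQL remains too slow, computing the averages in application code after a single ordered SELECT is one option to consider.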

Performance Issue On MySQL DataBase

I have a table containing around 14 million records and multiple stored procedures that build dynamic SQL from several parameters. I have built indexes on the table, but I still have a performance issue. I extracted the query from the dynamic SQL and ran it directly, and it takes between 30 seconds and 1 minute. The queries are plain selects from the table; some of them join another table, with numeric values in the WHERE clause, plus GROUP BY and ORDER BY.
I checked the status output and found that the grouping takes nearly all of the time. I also checked the EXPLAIN output, and it is using the right index.
So what should I do to improve the performance of my queries?
Thanks for your cooperation.
-- EDIT, Added queries directly into question instead of comment.
SELECT
CONCAT(column1, ' - ', column1 + INTERVAL 1 MONTH) AS DateRange,
cast(SUM(column2) as SIGNED) AS Alias1
FROM
Table1
INNER JOIN Table2 DD
ON Table1.Date = Table2.Date
WHERE
Table1.ID = 1
AND (Date BETWEEN 20110101 AND 20110201)
GROUP BY
MONTH(column1)
ORDER BY
Alias1 ASC
LIMIT 0, 10;
and this one:
SELECT
cast(column1 as char(30)) AS DateRange,
cast(SUM(column2) as SIGNED)
FROM
Table1
INNER JOIN Table2 DD
ON Table1.Date = Table2.Date
WHERE
Table1.ID = 1
AND (Date BETWEEN 20110101 AND 20110102)
GROUP BY
column1
ORDER BY
Alias1 ASC
LIMIT 0, 10;
For this query:
SELECT
    CONCAT(column1, ' - ', column1 + INTERVAL 1 MONTH) AS DateRange  -- error? never mind
    , cast(SUM(column2) as SIGNED)
FROM Table1
INNER JOIN Table2 DD ON Table1.Date = Table2.Date
WHERE Table1.ID = 1
    AND (Date BETWEEN 20110101 AND 20110201)
GROUP BY MONTH(column1)  -- problem 1
ORDER BY column2 ASC     -- problem 2
LIMIT 0, 10;
Problem 1: if you group by a function, MySQL cannot use an index for the grouping. You can speed this up by adding an extra column YearMonth to Table1 that contains the year+month, putting an index on that column, and then grouping by YearMonth.
Problem 2: the ORDER BY does not make sense. You are summing column2, so ordering by that column serves no purpose. If you order by YearMonth ASC the query will run much faster and make more sense.

MySQL Query optimization with JOIN and COUNT

I have the following MySQL Query:
SELECT t1.id, t1.releaseid, t1.site, t1.date, t2.pos FROM `tracers` as t1
LEFT JOIN (
SELECT `releaseid`, `date`, COUNT(*) AS `pos`
FROM `tracers` GROUP BY `releaseid`
) AS t2 ON t1.releaseid = t2.releaseid AND t2.date <= t1.date
ORDER BY `date` DESC , `pos` DESC LIMIT 0 , 100
The idea being to select a release and count how many other sites had also released it prior to the recorded date, to get the position.
Explain says:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t1 ALL NULL NULL NULL NULL 498422 Using temporary; Using filesort
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 91661
2 DERIVED tracers index NULL releaseid 4 NULL 498422
Any suggestions on how to eliminate the Using temporary; Using filesort ? It takes a loooong time. The indexes I have thought of and tried haven't helped anything.
Try adding an index on tracers.releaseid and one on tracers.date
Make sure you have an index on releaseid.
Flip your JOIN; the sub-query must be on the left side in the LEFT JOIN.
Put the ORDER BY and LIMIT clauses inside the sub-query.
Try having two indices, one on (date) and one on (releaseid, date).
Another thing is that your query does not seem to be doing what you describe it does. Does it actually count correctly?
Try rewriting it as:
SELECT t1.id, t1.releaseid, t1.site, t1.`date`
, COUNT(*) AS pos
FROM tracers AS t1
JOIN tracers AS t2
ON t2.releaseid = t1.releaseid
AND t2.`date` <= t1.`date`
GROUP BY t1.id
ORDER BY t1.`date` DESC
, pos DESC
LIMIT 0 , 100
or as:
SELECT t1.id, t1.releaseid, t1.site, t1.`date`
, ( SELECT COUNT(*)
FROM tracers AS t2
WHERE t2.releaseid = t1.releaseid
AND t2.`date` <= t1.`date`
) AS pos
FROM tracers AS t1
ORDER BY t1.`date` DESC
, pos DESC
LIMIT 0 , 100
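To check whether a rewrite counts correctly, the intended position can be computed on a few rows by hand. Below is a minimal Python sketch (not from the original answers; the rows and the function name positions are hypothetical). It assumes dates are ISO-formatted strings, so string comparison matches date order.

```python
from bisect import bisect_right


def positions(rows):
    """Map each row id to its position within its release.

    rows: list of (id, releaseid, site, date).
    The position is the number of rows for the same release with a date
    on or before this row's date, matching t2.date <= t1.date.
    """
    by_release = {}
    for _, releaseid, _, d in rows:
        by_release.setdefault(releaseid, []).append(d)
    for dates in by_release.values():
        dates.sort()
    # In a sorted list, bisect_right gives the count of entries <= d.
    return {
        row_id: bisect_right(by_release[releaseid], d)
        for row_id, releaseid, _, d in rows
    }


# Hypothetical sample: release 10 traced on three sites, release 20 on one.
rows = [
    (1, 10, "a", "2012-01-01"),
    (2, 10, "b", "2012-01-03"),
    (3, 10, "c", "2012-01-02"),
    (4, 20, "a", "2012-01-05"),
]
pos = positions(rows)
print(pos)
```

Every row gets its own position, so a correct SQL version has to return one row per tracers row rather than one row per release.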
The change below may not alter the EXPLAIN output. However, if your main problem is sorting the data (which you can check by seeing whether removing the ORDER BY clause makes the query run faster), try sorting the joined sub-query first. Your query would then be:
SELECT t1.id, t1.releaseid, t1.site, t1.date, t2.pos FROM `tracers` as t1
LEFT JOIN (
SELECT `releaseid`, `date`, COUNT(*) AS `pos`
FROM `tracers` GROUP BY `releaseid`
ORDER BY `pos` DESC -- additional order
) AS t2 ON t1.releaseid = t2.releaseid AND t2.date <= t1.date
ORDER BY `date` DESC , `pos` DESC LIMIT 0 , 100
Note: My db version is mysql-5.0.96-x64, maybe in another version you get different result.