Is it possible to get two different moving averages from the same MySQL dataset at the same time?
I'm trying to extract data from a MySQL database that gives me the 'raw' data plus two different moving averages of the same data set. My best attempt is below; the problem is that the two moving averages appear to produce identical results.
Also, is there a more efficient way of querying the data? The dataset is reasonably large and this query takes a little too long to run.
SELECT
t1.`DateTime`,
t1.`Positive` AS `RawData`,
(SELECT AVG(t2.`Positive`)
FROM `tbl_DATA_KeywordResults` as t2
WHERE t2.`DateTime` <= t1.`DateTime`
ORDER BY t2.`DateTime` DESC
LIMIT 96
) AS `DailyAverage`,
(SELECT AVG(t3.`Positive`)
FROM `tbl_DATA_KeywordResults` as t3
WHERE t3.`DateTime` <= t1.`DateTime`
ORDER BY t3.`DateTime` DESC
LIMIT 674
) AS `WeeklyAverage`
FROM `tbl_DATA_KeywordResults` AS t1
ORDER BY t1.`DateTime`;
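You are taking the limit after you do the average. AVG() without a GROUP BY returns a single row, so LIMIT 96 and LIMIT 674 both keep that one row, and both subqueries end up averaging every row up to t1.DateTime. The correct form of the subquery would be: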
(select avg(Positive)
from (SELECT t2.`Positive`
FROM `tbl_DATA_KeywordResults` as t2
WHERE t2.`DateTime` <= t1.`DateTime`
ORDER BY t2.`DateTime` DESC
LIMIT 96
) t
) AS `DailyAverage`
I'm not 100% sure that this will work as a subquery. I believe MySQL limits outer references (what's in the where clause) to one layer deep.
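To make the difference concrete, here is a minimal sketch that runs both forms of the subquery against a tiny in-memory table. It uses Python's bundled SQLite rather than MySQL, so it only illustrates the LIMIT-versus-AVG ordering, not MySQL's rules about outer references. The table name `readings`, its columns, and the window sizes 2 and 3 are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for tbl_DATA_KeywordResults: six readings 10, 20, ..., 60.
conn.execute("CREATE TABLE readings (dt INTEGER PRIMARY KEY, positive REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(i, 10.0 * i) for i in range(1, 7)],
)

# Original shape: LIMIT is applied to the single row produced by AVG(),
# so it has no effect and both columns are the cumulative average.
naive = conn.execute("""
    SELECT t1.dt,
           (SELECT AVG(t2.positive) FROM readings t2
             WHERE t2.dt <= t1.dt ORDER BY t2.dt DESC LIMIT 2) AS short_avg,
           (SELECT AVG(t3.positive) FROM readings t3
             WHERE t3.dt <= t1.dt ORDER BY t3.dt DESC LIMIT 3) AS long_avg
      FROM readings t1
     ORDER BY t1.dt
""").fetchall()

# Corrected shape: LIMIT picks the most recent rows first,
# and AVG() is then taken over only those rows.
windowed = conn.execute("""
    SELECT t1.dt,
           (SELECT AVG(positive) FROM
               (SELECT t2.positive FROM readings t2
                 WHERE t2.dt <= t1.dt ORDER BY t2.dt DESC LIMIT 2)) AS short_avg,
           (SELECT AVG(positive) FROM
               (SELECT t3.positive FROM readings t3
                 WHERE t3.dt <= t1.dt ORDER BY t3.dt DESC LIMIT 3)) AS long_avg
      FROM readings t1
     ORDER BY t1.dt
""").fetchall()

print(naive[-1])     # both averages identical for the last row
print(windowed[-1])  # the two windows now differ
```

In the naive result every row has the same value in both average columns, which matches the symptom described in the question.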
There are more painful ways of doing this in MySQL:
select t1.DateTime, t1.RawData,
avg(case when t2.DateTime between avg_96_dt and t1.DateTime then t2.Positive end) as avg96,
avg(case when t2.DateTime between avg_674_dt and t1.DateTime then t2.Positive end) as avg674
from (SELECT t1.`DateTime`, t1.`Positive` AS `RawData`,
(SELECT t2.DateTime
FROM `tbl_DATA_KeywordResults` t2
WHERE t2.`DateTime` <= t1.`DateTime`
ORDER BY t2.`DateTime` DESC
LIMIT 95, 1
) as avg_96_dt,
(SELECT t2.DateTime
FROM `tbl_DATA_KeywordResults` t2
WHERE t2.`DateTime` <= t1.`DateTime`
ORDER BY t2.`DateTime` DESC
LIMIT 673, 1
) as avg_674_dt
FROM `tbl_DATA_KeywordResults` t1
) t1 join
tbl_DATA_KeywordResults t2
group by t1.DateTime, t1.RawData
ORDER BY t1.`DateTime`;
That is, get the limits for the date time range and then do the average in a different step.
Related
For a graph I have been asked to structure the data with a MySQL statement.
It needs the following output (3 columns):
AVG_points | Playerid | Date
In the database I don't have the average points per roundID, just the points scored per round per player.
It's easy to calculate the current average points with avg(points), but I need the average points at every round so it can be plotted in a graph.
I tried to write an SQL statement that gives me the averages for every round, but the result is not in a format the graph can plot. I read up on pivoting, but that is not what works in this situation. I think my SQL is too simple, and for every new round I would need to add more lines, which means manual edits on every table update to keep the graph working...
This is what I tried:
SELECT t1.playerid as player
,date_format(t1.CreatedTime, '%Y%m%d%H%i') as date
/* calculate average points per round */
,(select avg(points) from pokermax_scores t2 where tournamentid <= (select distinct(tournamentid) from pokermax_scores order by tournamentid desc limit 0,1) and t2.playerid = t1.playerid) as avg_current
,(select avg(points) from pokermax_scores t2 where tournamentid <= (select distinct(tournamentid) from pokermax_scores order by tournamentid desc limit 1,1) and t2.playerid = t1.playerid) as 1_avg_last
,(select avg(points) from pokermax_scores t2 where tournamentid <= (select distinct(tournamentid) from pokermax_scores order by tournamentid desc limit 2,1) and t2.playerid = t1.playerid) as 2_avg_last
,(select avg(points) from pokermax_scores t2 where tournamentid <= (select distinct(tournamentid) from pokermax_scores order by tournamentid desc limit 3,1) and t2.playerid = t1.playerid) as 3_avg_last
,(select avg(points) from pokermax_scores t2 where tournamentid <= (select distinct(tournamentid) from pokermax_scores order by tournamentid desc limit 4,1) and t2.playerid = t1.playerid) as 4_avg_last
,(select avg(points) from pokermax_scores t2 where tournamentid <= (select distinct(tournamentid) from pokermax_scores order by tournamentid desc limit 5,1) and t2.playerid = t1.playerid) as 5_avg_last
FROM pokermax_scores as t1, pokermax_players as t3
GROUP BY player
which gives the following output: < SEE SQLFIDDLE LINK >
But I need my data in this format so PHP can loop over it correctly:
http://i57.tinypic.com/10wists.png
Is there any SQL guru here who knows how I can edit my statement to make it come out as in the picture above?
Thanks for reading all this :)
Here is the SQLFIDDLE: http://sqlfiddle.com/#!9/a956f/2
It is unclear what your data really looks like. The following would seem to be a good place to start because it produces the output in the format you want:
SELECT date(ps.CreatedTime) as date,
ps.playerid as player,
avg(ps.points)
FROM pokermax_scores ps
GROUP BY date(ps.CreatedTime), ps.playerid;
EDIT:
The comment helps. You have nothing called "round" in the data.
I'm guessing it is tournamentid. The query is easily modified if it is something else. I think you want two levels of aggregation:
SELECT date, player, avg(score)
FROM (SELECT date(ps.CreatedTime) as date,
ps.playerid as player, tournamentid,
SUM(ps.points) as score
FROM pokermax_scores ps
GROUP BY date(ps.CreatedTime), ps.playerid, tournamentid
) dpt
GROUP BY date, player;
Here it is in a SQL Fiddle.
You could use this to get cascaded sums on tournament points:
set @score_accumulator=0;
set @accumulator=0;
SELECT sub.tournamentId,
sub.datePS,
-- SUM(@score_accumulator := @score_accumulator + sub.points) score,
-- @accumulator := @accumulator + 1 nrTour,
SUM(@score_accumulator := @score_accumulator + sub.points)/(@accumulator := @accumulator + 1) score
FROM
(SELECT DISTINCT tournamentId,
date(CreatedTime) AS datePS,
points points
FROM pokermax_scores ps2
ORDER BY CreatedTime) sub
GROUP BY sub.tournamentId ORDER BY sub.datePS
Similarly, you could do this for the evolution of each player's points.
That being said, this kind of logic should reside at application level and not DB level.
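As a rough illustration of what doing this at application level could look like, here is a small sketch that computes each player's average-so-far after every round. The asker's application is PHP; Python is used here only to show the logic. The function name, the tuple layout, and the sample scores are all invented for the example, and it assumes tournament ids sort in chronological order.

```python
from collections import defaultdict


def running_averages(rows):
    """Return (player_id, tournament_id, average_so_far) for every score row.

    rows: iterable of (tournament_id, player_id, points) in any order.
    Assumes tournament ids increase with time (an assumption, not from the data).
    """
    totals = defaultdict(float)  # points accumulated per player
    counts = defaultdict(int)    # rounds played per player
    result = []
    for tournament_id, player_id, points in sorted(rows):
        totals[player_id] += points
        counts[player_id] += 1
        result.append(
            (player_id, tournament_id, totals[player_id] / counts[player_id])
        )
    return result


# Hypothetical sample data: (tournament_id, player_id, points)
scores = [
    (1, "ann", 10), (1, "bob", 4),
    (2, "ann", 20), (2, "bob", 8),
    (3, "ann", 30),
]

for row in running_averages(scores):
    print(row)
```

Each output row is already in the "average | player | round" shape, so a new round adds rows rather than requiring a new column in the query.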
Below is my MySQL query to find the difference between successive dates for each account and then use the results to prepare a frequency count table. This query is of course very slow, but before addressing that: am I doing the right thing? Please help if you can. A small data sample is also included.
Appreciate your time.
OZooHA
ID DATE
403 2008-06-01
403 2012-06-01
403 2011-06-01
403 2010-06-01
403 2009-06-01
15028 2011-07-01
15028 2010-07-01
15028 2009-07-01
15028 2008-07-01
SELECT
month_diff,
count(*)
FROM
(SELECT t1.id,
t1.date,
MIN(t2.date) AS lag_date,
TIMESTAMPDIFF(MONTH, t1.date, MIN(t2.date)) AS month_diff
FROM tbl_name T1
INNER JOIN tbl_name T2
ON t1.id = t2.id
AND t2.date > t1.date
GROUP BY t1.id, t1.date
ORDER BY t1.id, t1.date
)
GROUP BY month_diff
ORDER BY month_diff
Likely, materializing the inline view is taking most of the time. Ensure you have suitable indexes available to improve performance of the join operation; a covering index ON tbl_name (id, date) would likely be optimal for this query.
With a suitable index available (as above) it may be possible to get better performance with a query something like this:
SELECT d.month_diff
, COUNT(*)
FROM ( SELECT IF(@prev_id = t.id
, TIMESTAMPDIFF(MONTH, t.date, @prev_date )
, NULL
) AS month_diff
, @prev_date := t.date
, @prev_id := t.id
FROM tbl_name t
CROSS
JOIN (SELECT @prev_date := NULL, @prev_id := NULL) i
GROUP BY t.id DESC, t.date DESC
) d
WHERE d.month_diff IS NOT NULL
GROUP BY d.month_diff
Note that the behavior of MySQL user-defined variables used this way is not guaranteed; the order in which expressions involving them are evaluated is undefined. But we do observe consistent behavior with queries written in a particular way. (Future versions of MySQL may change the behavior we observe.)
EDIT: I modified the query above, to replace the ORDER BY t.id, t.date with a GROUP BY t.id, t.date... It's not clear from the example data whether (id,date) is guaranteed to be unique. (If we do have that guarantee, then we don't need the GROUP BY, we can just use ORDER BY. Otherwise, we need the GROUP BY to get the same result returned by the original query.)
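The idea behind the user-variable query (walk the rows in order, remember the previous row, and only compute a difference when the id has not changed) can be sketched outside the database as follows. This is an illustration of the logic, not a replacement for the query. The function name is invented, the rows are the sample from the question, and the month difference ignores the day of the month, which is only equivalent to TIMESTAMPDIFF(MONTH, ...) when all dates fall on the same day, as they do in the sample.

```python
from collections import Counter
from datetime import date


def month_gap_counts(rows):
    """Count how often each month gap occurs between successive dates per id.

    rows: iterable of (id, date). Duplicate (id, date) pairs are collapsed,
    mirroring the GROUP BY in the query above.
    """
    counts = Counter()
    prev_id = prev_date = None
    for row_id, current in sorted(set(rows)):
        if row_id == prev_id:
            # Whole-month difference, ignoring the day of the month (assumption).
            gap = (current.year - prev_date.year) * 12 + (current.month - prev_date.month)
            counts[gap] += 1
        prev_id, prev_date = row_id, current
    return counts


sample = [
    (403, date(2008, 6, 1)), (403, date(2012, 6, 1)), (403, date(2011, 6, 1)),
    (403, date(2010, 6, 1)), (403, date(2009, 6, 1)),
    (15028, date(2011, 7, 1)), (15028, date(2010, 7, 1)),
    (15028, date(2009, 7, 1)), (15028, date(2008, 7, 1)),
]

print(month_gap_counts(sample))
```

For the sample data every successive pair is twelve months apart, so the frequency table has a single entry.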
My table has a NAME and a DISTANCE column. I'd like to figure out a way to list all the names that are within N units or less of another row with the same name, e.g. given:
NAME DISTANCE
a 2
a 4
a 3
a 7
a 1
b 3
b 1
b 2
b 5
(let's say N = 2)
I would like
a 2
a 4
a 3
a 1
...
...
Instead of
a 2
a 2 (because it double counts)
I'm trying to apply this method in order to solve for a customerID with claim dates (stored as number) that appear in clusters around each other. I'd like to be able to label the customerID and the claim date that is within say 10 days of another claim by that same customer. i.e., |a.claimdate - b.claimdate| <= 10. When I use this method
WHERE a.CUSTID = b.CUSTID
AND a.CLDATE BETWEEN b.CLDATE - 10 AND b.CLDATE + 10
AND a.CLAIMID <> b.CLAIMID
I double count. CLAIMID is unique.
Since you don't need the text, and just want the values, you can accomplish that using DISTINCT:
select distinct t.name, t.distance
from yourtable t
join yourtable t2 on t.name = t2.name
and (t.distance = t2.distance+1 or t.distance = t2.distance-1)
order by t.name
SQL Fiddle Demo
Given your edits, if you're looking for results between a certain distance, you can use >= and <= (or BETWEEN):
select distinct t.name, t.distance
from yourtable t
join yourtable t2 on t.name = t2.name
and t.distance >= t2.distance-1
and t.distance <= t2.distance+1
and t.distance <> t2.distance
order by t.name
You need to add the final criterion of t.distance <> t2.distance so you don't return the entire dataset (technically every distance falls within its own range). This would be better if you had a primary key to add to the join, but if you don't, you could use ROW_NUMBER() as well to achieve the same results.
with cte as (
select name, distance, row_number() over (partition by name order by (select null)) rn
from yourtable
)
select distinct t.name, t.distance
from cte t
join cte t2 on t.name = t2.name
and t.distance >= t2.distance-1
and t.distance <= t2.distance+1
and t.rn <> t2.rn
order by t.name
Updated SQL Fiddle
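I like @sgeddes' solution, but you can also get rid of the distinct and the or in the join condition like this:
select * from yourtable a
where exists (
select 1 from yourtable b
where b.name = a.name
and b.distance between a.distance - 1 and a.distance + 1
)
This also ensures that rows with equal distance get included and considers a whole range, not just the rows that have a distance difference of exactly n, as suggested by @HABO. Note that, as written, every row also matches itself inside the subquery, so a comparison on a unique key (for example b.CLAIMID <> a.CLAIMID for the claims table) is needed to exclude the row being tested.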
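Here is a small runnable sketch of the EXISTS approach on the sample data from the question, with N = 2. It uses Python's bundled SQLite instead of the asker's database, and it assumes a surrogate key column `id` (not present in the original table) so that a row is never compared with itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `id` is a hypothetical unique key added for the example.
conn.execute(
    "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, name TEXT, distance INTEGER)"
)
rows = [
    ("a", 2), ("a", 4), ("a", 3), ("a", 7), ("a", 1),
    ("b", 3), ("b", 1), ("b", 2), ("b", 5),
]
conn.executemany("INSERT INTO yourtable (name, distance) VALUES (?, ?)", rows)

N = 2
near = conn.execute(
    """
    SELECT a.name, a.distance
      FROM yourtable a
     WHERE EXISTS (SELECT 1
                     FROM yourtable b
                    WHERE b.name = a.name
                      AND b.id <> a.id  -- never match a row against itself
                      AND b.distance BETWEEN a.distance - ? AND a.distance + ?)
     ORDER BY a.id
    """,
    (N, N),
).fetchall()

for row in near:
    print(row)
```

Because EXISTS only asks whether at least one neighbour exists, each qualifying row appears once no matter how many neighbours it has, and the row `a 7` drops out because its nearest neighbour is 3 units away.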
I have a table containing around 14 million records, and multiple stored procedures that build dynamic SQL and take multiple parameters. I have built indexes on the table, but I still have a performance issue. I extracted the query from the dynamic SQL and ran it on its own, and it takes between 30 seconds and 1 minute. The queries are just a select from the table, and some of them join another table, with numeric values in the WHERE clause plus grouping and ORDER BY.
I checked the status result and found that the grouping takes all the time, and I checked the EXPLAIN result: it is using the right index.
So what should I do to improve the performance of my queries?
Thanks for your cooperation.
-- EDIT, Added queries directly into question instead of comment.
SELECT
CONCAT(column1, ' - ', column1 + INTERVAL 1 MONTH) AS DateRange,
cast(SUM(column2) as SIGNED) AS Alias1
FROM
Table1
INNER JOIN Table2 DD
ON Table1.Date = Table2.Date
WHERE
Table1.ID = 1
AND (Date BETWEEN 20110101 AND 20110201)
GROUP BY
MONTH(column1)
ORDER BY
Alias1 ASC
LIMIT 0, 10;
and this one:
SELECT
cast(column1 as char(30)) AS DateRange,
cast(SUM(column2) as SIGNED)
FROM
Table1
INNER JOIN Table2 DD
ON Table1.Date = Table2.Date
WHERE
Table1.ID = 1
AND (Date BETWEEN 20110101 AND 20110102)
GROUP BY
column1
ORDER BY
Alias1 ASC
LIMIT 0, 10;
For this query:
SELECT
CONCAT(column1, ' - ', column1 + INTERVAL 1 MONTH) AS DateRange <<--error? never mind
, cast(SUM(column2) as SIGNED)
FROM Table1
INNER JOIN Table2 DD ON Table1.Date = Table2.Date
WHERE Table1.ID = 1
AND (Date BETWEEN 20110101 AND 20110201)
GROUP BY MONTH(column1) <<-- problem 1.
ORDER BY column2 ASC <<-- problem 2.
LIMIT 0, 10;
Problem 1: if you group by a function, MySQL cannot use an index on the column for the grouping. You can speed this up by adding an extra column YearMonth to table1 that contains the year+month, putting an index on it, and then grouping by YearMonth.
Problem 2: the ORDER BY does not make sense. You are summing column2, so ordering by that column serves no purpose. If you order by YearMonth ASC the query will run much faster and make more sense.
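The shape of that change can be sketched as below. This runs on Python's bundled SQLite, so it says nothing about MySQL timings; it only shows a precomputed, indexed year-month column being used for filtering, grouping and ordering. The column name `yearmonth`, the index name, and the sample rows are assumptions made for the example, and dates are stored as yyyymmdd integers as in the question's WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER, date INTEGER, column2 INTEGER, yearmonth INTEGER)"
)
# Index that covers the filter column and the grouping column together.
conn.execute("CREATE INDEX idx_id_yearmonth ON table1 (id, yearmonth)")

# Hypothetical rows: (id, date as yyyymmdd, column2)
data = [
    (1, 20110105, 5),
    (1, 20110120, 7),
    (1, 20110203, 11),
    (2, 20110110, 100),
]
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    # Integer division drops the day: 20110105 // 100 == 201101
    [(i, d, v, d // 100) for i, d, v in data],
)

totals = conn.execute(
    """
    SELECT yearmonth, SUM(column2)
      FROM table1
     WHERE id = 1
       AND yearmonth BETWEEN 201101 AND 201102
     GROUP BY yearmonth
     ORDER BY yearmonth
    """
).fetchall()

print(totals)
```

The extra column has to be kept in sync on insert and update, which is the price paid for grouping on a plain indexed column instead of on MONTH(column1).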
I have the following MySQL Query:
SELECT t1.id, t1.releaseid, t1.site, t1.date, t2.pos FROM `tracers` as t1
LEFT JOIN (
SELECT `releaseid`, `date`, COUNT(*) AS `pos`
FROM `tracers` GROUP BY `releaseid`
) AS t2 ON t1.releaseid = t2.releaseid AND t2.date <= t1.date
ORDER BY `date` DESC , `pos` DESC LIMIT 0 , 100
The idea being to select a release and count how many other sites had also released it prior to the recorded date, to get the position.
Explain says:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t1 ALL NULL NULL NULL NULL 498422 Using temporary; Using filesort
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 91661
2 DERIVED tracers index NULL releaseid 4 NULL 498422
Any suggestions on how to eliminate the Using temporary; Using filesort ? It takes a loooong time. The indexes I have thought of and tried haven't helped anything.
Try adding an index on tracers.releaseid and one on tracers.date.
Make sure you have an index on releaseid.
Flip your JOIN; the sub-query must be on the left side of the LEFT JOIN.
Put the ORDER BY and LIMIT clauses inside the sub-query.
Try having two indices, one on (date) and one on (releaseid, date).
Another thing is that your query does not seem to be doing what you describe it does. Does it actually count correctly?
Try rewriting it as:
SELECT t1.id, t1.releaseid, t1.site, t1.`date`
, COUNT(*) AS pos
FROM tracers AS t1
JOIN tracers AS t2
ON t2.releaseid = t1.releaseid
AND t2.`date` <= t1.`date`
GROUP BY t1.id, t1.releaseid, t1.site, t1.`date`
ORDER BY t1.`date` DESC
, pos DESC
LIMIT 0 , 100
or as:
SELECT t1.id, t1.releaseid, t1.site, t1.`date`
, ( SELECT COUNT(*)
FROM tracers AS t2
WHERE t2.releaseid = t1.releaseid
AND t2.`date` <= t1.`date`
) AS pos
FROM tracers AS t1
ORDER BY t1.`date` DESC
, pos DESC
LIMIT 0 , 100
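To check that the correlated-subquery version counts per row rather than per release, here is a minimal sketch on a tiny in-memory table. It uses Python's bundled SQLite rather than MySQL, the sample rows and site names are invented, and `date` is stored as a plain integer for simplicity. The (releaseid, date) index mirrors the one suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracers (id INTEGER PRIMARY KEY, releaseid INTEGER, site TEXT, date INTEGER)"
)
conn.execute("CREATE INDEX idx_release_date ON tracers (releaseid, date)")

# Hypothetical data: release 1 appears on three sites, release 2 on one.
conn.executemany(
    "INSERT INTO tracers (releaseid, site, date) VALUES (?, ?, ?)",
    [(1, "alpha", 10), (1, "beta", 20), (1, "gamma", 30), (2, "alpha", 20)],
)

positions = conn.execute(
    """
    SELECT t1.releaseid, t1.site, t1.date,
           (SELECT COUNT(*)
              FROM tracers t2
             WHERE t2.releaseid = t1.releaseid
               AND t2.date <= t1.date) AS pos
      FROM tracers t1
     ORDER BY t1.date DESC, pos DESC
    """
).fetchall()

for row in positions:
    print(row)
```

Each row gets its own position within its release (1 for the first site, 2 for the second, and so on), which is what the question describes and what the single COUNT(*) per releaseid in the original derived table cannot produce.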
The change below may not alter the EXPLAIN output. However, if your main problem is the sorting (which you can check by removing the ORDER BY clause and seeing whether the query runs faster), try sorting the joined subquery first, so that your query becomes:
SELECT t1.id, t1.releaseid, t1.site, t1.date, t2.pos FROM `tracers` as t1
LEFT JOIN (
SELECT `releaseid`, `date`, COUNT(*) AS `pos`
FROM `tracers` GROUP BY `releaseid`
ORDER BY `pos` DESC -- additional order
) AS t2 ON t1.releaseid = t2.releaseid AND t2.date <= t1.date
ORDER BY `date` DESC , `pos` DESC LIMIT 0 , 100
Note: My DB version is mysql-5.0.96-x64; you may get a different result in another version.