MySQL - Select and compute running AVG date by date [duplicate]

I have a table like the one below:
id timestamp speed
1 11:00:01 100
2 11:05:01 110
3 11:10:01 90
4 11:15:01 80
I need to calculate a moving average like the one below:
id timestamp speed average
1 11:00:01 100 100
2 11:05:01 110 105
3 11:10:01 90 100
4 11:15:01 80 95
What I tried
SELECT
*,
(select avg(speed) from tbl t where tbl.timestamp<=t.timestamp) as avg
FROM
tbl
At first it looks quite easy, but as the table grows the query becomes too slow.
Is there a faster approach?

Your query is one way to do a running average:
SELECT t.*,
(select avg(speed) from tbl tt where tt.timestamp <= t.timestamp) as avg
FROM tbl t;
The alternative is to use variables:
select t.*, (sum_speed / cnt) as running_avg_speed
from (select t.*, (@rn := @rn + 1) as cnt, (@s := @s + speed) as sum_speed
      from tbl t cross join
           (select @rn := 0, @s := 0) params
      order by timestamp
     ) t;
An index on tbl(timestamp) should further improve performance.
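To see why the variable-based query scales better than the correlated subquery, here is a minimal Python sketch of the same single-pass idea. It is not MySQL code; the function name `running_average` and the hard-coded rows are illustrative assumptions. It keeps a running sum and count while walking the rows in timestamp order, so each row costs constant work instead of a rescan of all earlier rows.

```python
from typing import List, Tuple


def running_average(rows: List[Tuple[int, str, float]]) -> List[Tuple[int, str, float, float]]:
    """Return (id, timestamp, speed, running_avg) tuples in timestamp order."""
    result = []
    total = 0.0  # plays the role of @s in the query above
    count = 0    # plays the role of @rn in the query above
    # Assumption: timestamps are zero-padded HH:MM:SS strings, which sort chronologically.
    for row_id, ts, speed in sorted(rows, key=lambda r: r[1]):
        count += 1
        total += speed
        result.append((row_id, ts, speed, total / count))
    return result


rows = [(1, "11:00:01", 100), (2, "11:05:01", 110), (3, "11:10:01", 90), (4, "11:15:01", 80)]
averages = [avg for *_, avg in running_average(rows)]
print(averages)
```

For the sample rows this gives the averages 100, 105, 100 and 95 from the question.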

Does your MySQL version support window functions (they are available from MySQL 8.0)? If so:
select
id, timestamp, speed,
avg(speed) over (order by timestamp) as average
from tbl
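If it doesn't, this might work, although I doubt it's efficient (note that by default MySQL does not allow a space between an aggregate function name and its opening parenthesis):
select
min(t1.id) as id, t1.timestamp, min(t1.speed) as speed,
avg(t2.speed) as average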
from
tbl t1
join tbl t2 on
t2.id <= t1.id
group by
t1.timestamp
order by
t1.timestamp

Or slotting neatly between GL's two answers (performancewise anyway)...
SELECT x.*, AVG(y.speed) avg
FROM my_table x
JOIN my_table y
ON y.id <= x.id
GROUP
BY x.id;

What about a simple concurrent solution?
SET @summ = 0;
SET @counter = 0;
SELECT *, (@counter := @counter + 1) AS cnt, (@summ := @summ + speed) AS spd, (@summ / @counter) AS avg FROM tbl;

Related

get the range of sequence values in table column

I have a list of values in a column and want to query the ranges they form.
E.g. if the values are 1,2,3,4,5,9,11,12,13,14,17,18,19
I want to display
1-5,9,11-14,17-19
Assuming that each value is stored on a separate row, you can use a gaps-and-islands technique here:
select case when min(val) <> max(val)
then concat(min(val), '-', max(val))
else min(val)
end val_range
from (select val, row_number() over(order by val) rn from mytable) t
group by val - rn
order by min(val)
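The idea is to build groups of consecutive values by taking the difference between the value and an incrementing rank, which is computed using row_number() (available in MySQL 8.0). Within a run of consecutive values this difference stays constant, so it can serve as the grouping key.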
Demo on DB Fiddle:
| val_range |
| :-------- |
| 1-5 |
| 9 |
| 11-14 |
| 17-19 |
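The grouping trick can be checked outside the database. The following Python sketch is only an illustration under stated assumptions (the function name `value_ranges` is made up, and duplicate values are dropped first, which the SQL above does not do). It pairs each sorted value with its rank and groups on `value - rank`.

```python
from itertools import groupby
from typing import Iterable, List


def value_ranges(values: Iterable[int]) -> List[str]:
    """Collapse integers into 'lo-hi' ranges using the value-minus-rank key."""
    ordered = sorted(set(values))  # assumption: duplicates are not meaningful
    ranges = []
    # enumerate() supplies the incrementing rank (like row_number());
    # value - rank is constant inside a run of consecutive values.
    for _, run in groupby(enumerate(ordered, start=1), key=lambda pair: pair[1] - pair[0]):
        vals = [value for _, value in run]
        lo, hi = vals[0], vals[-1]
        ranges.append(str(lo) if lo == hi else f"{lo}-{hi}")
    return ranges


result = value_ranges([1, 2, 3, 4, 5, 9, 11, 12, 13, 14, 17, 18, 19])
print(",".join(result))
```

For the sample values the keys are 0 for 1..5, 3 for 9, 4 for 11..14 and 6 for 17..19, which yields the four groups shown in the demo output.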
In earlier versions, you can emulate row_number() with a correlated subquery, or a user variable. The second option goes like:
select case when min(val) <> max(val)
then concat(min(val), '-', max(val))
else min(val)
end val_range
from (select @rn := 0) x
cross join (
select val, @rn := @rn + 1 rn
from (select val from mytable order by val) t
) t
group by val - rn
order by min(val)
As a complement to other answers:
select dn.val as dnval, min(up.val) as upval
from mytable up
join mytable dn
on dn.val <= up.val
where not exists (select 1 from mytable a where a.val = up.val + 1)
and not exists (select 1 from mytable b where b.val = dn.val - 1)
group by dn.val
order by dn.val;
1 5
9 9
11 14
17 19
Needless to say, using an OLAP (window) function as @GNB does is orders of magnitude more efficient.
A short article on how to mimic OLAP functions in MySQL < 8 can be found at:
mysql-row_number
Fiddle
EDIT:
If another dimension is introduced (in this case p), something like:
select dn.p, dn.val as dnval, min(up.val) as upval
from mytable up
join mytable dn
on dn.val <= up.val
and dn.p = up.p
where not exists (select 1 from mytable a where a.val = up.val + 1 and a.p = up.p)
and not exists (select 1 from mytable b where b.val = dn.val - 1 and b.p = dn.p)
group by dn.p, dn.val
order by dn.p, dn.val;
can be used, see Fiddle2

Mysql query get SUM() specific row?

Is it possible to find a specific row in a query using something like SUM()?
Example:
id tickets range running_sum
1 10 1-10 10=10
2 35 11-45 10+35=45
3 45 46-90 10+35+45=90
4 110 91-200 10+35+45+110=200
Total: 200 tickets (by SUM). I need to get the ID of the row that holds a given ticket number, e.g. 23 (the output would be ID 2, because ID 2 covers tickets 11-45 in the running sum).
You can do it by defining a user variable in your select query (in the FROM clause), e.g.:
select id, @total := @total + tickets as seats
from test, (select @total := 0) t
Here is the SQL Fiddle.
You seem to want the row where "23" fits in. I think this does the trick:
select t.*
from (select t.*, (@total := @total + tickets) as running_total
from t cross join
(select @total := 0) params
order by id
) t
where 23 > running_total - tickets and 23 <= running_total;
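The WHERE clause above selects the first row whose running total reaches the ticket number. As a hedged illustration of that logic (plain Python, not part of the original answer; `find_ticket_owner` is a hypothetical name), the same lookup can be written with a prefix sum and a binary search:

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Optional, Tuple


def find_ticket_owner(rows: List[Tuple[int, int]], ticket: int) -> Optional[int]:
    """rows are (id, tickets); return the id whose cumulative range contains ticket."""
    ordered = sorted(rows)  # same role as ORDER BY id
    running_totals = list(accumulate(tickets for _, tickets in ordered))
    if not running_totals or ticket < 1 or ticket > running_totals[-1]:
        return None  # ticket number is outside 1..total
    # First position where running_total >= ticket, i.e.
    # running_total - tickets < ticket <= running_total.
    index = bisect_left(running_totals, ticket)
    return ordered[index][0]


rows = [(1, 10), (2, 35), (3, 45), (4, 110)]
owner = find_ticket_owner(rows, 23)
print(owner)
```

With the sample data the running totals are 10, 45, 90 and 200, so ticket 23 falls in the second row.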
SELECT
d.id
,d.tickets
,CONCAT(
TRIM(CAST(d.RunningTotal - d.tickets + 1 AS CHAR(10)))
,'-'
,TRIM(CAST(d.RunningTotal AS CHAR(10)))
) as TicketRange
,d.RunningTotal
FROM
(
SELECT
id
,tickets
,@total := @total + tickets as RunningTotal
FROM
test
CROSS JOIN (select @total := 0) var
ORDER BY
id
) d
This is similar to Darshan's answer, but there are a few key differences:
You shouldn't use implicit join syntax; explicit join syntax offers more functionality in the long run and has been a standard for more than 20 years.
ORDER BY makes a huge difference to a running total calculated with a variable. If you change the order, the result changes, so decide how you want the running total ordered (by date, by id, or by something else) and state it in the query.
Finally, I also calculated the ticket range.
And here is how you can do it without using variables:
SELECT
d.id
,d.tickets
,CONCAT(
TRIM(d.LowRange)
,'-'
,TRIM(
CAST(RunningTotal AS CHAR(10))
)
) as TicketRange
,d.RunningTotal
FROM
(
SELECT
t.id
,t.tickets
,CAST(COALESCE(SUM(t2.tickets),0) + 1 AS CHAR(10)) as LowRange
,t.tickets + COALESCE(SUM(t2.tickets),0) as RunningTotal
FROM
test t
LEFT JOIN test t2
ON t.id > t2.id
GROUP BY
t.id
,t.tickets
) d

MYSQL - Total registrations per day

I have the following structure in my user table:
id(INT) registered(DATETIME)
1 2016-04-01 23:23:01
2 2016-04-02 03:23:02
3 2016-04-02 05:23:03
4 2016-04-03 04:04:04
I want to get the total (accumulated) user count per day, for all days in the DB.
So the result should be something like
day total
2016-04-01 1
2016-04-02 3
2016-04-03 4
I tried some subqueries, but I have no idea how to achieve this with, if possible, one SQL statement. Of course I could group by day, count, and add the counts up programmatically, but I don't want to do that if I can avoid it.
You can use a GROUP BY that does all the counts, without doing anything programmatically. Please have a look at this query:
select
d.dt,
count(*) as total
from
(select distinct date(registered) dt from table1) d inner join
table1 r on d.dt>=date(r.registered)
group by
d.dt
order by
d.dt
The first subquery returns all distinct dates; we then join each date with all registrations made on or before it and count them, all in one query.
An alternative join condition that can give some improvements in performance is:
on d.dt + interval 1 day > r.registered
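The result the join produces is a per-day count followed by a running sum. A small Python sketch of that computation, offered only as an illustration (the name `cumulative_registrations` and the timestamp format string are assumptions based on the sample data):

```python
from collections import Counter
from datetime import datetime
from itertools import accumulate
from typing import List, Tuple


def cumulative_registrations(registered: List[str]) -> List[Tuple[str, int]]:
    """Return (day, accumulated user count) pairs ordered by day."""
    # Count registrations per calendar day (same role as date(registered)).
    per_day = Counter(
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date().isoformat()
        for ts in registered
    )
    days = sorted(per_day)
    # Running sum over the per-day counts.
    totals = accumulate(per_day[day] for day in days)
    return list(zip(days, totals))


sample = [
    "2016-04-01 23:23:01",
    "2016-04-02 03:23:02",
    "2016-04-02 05:23:03",
    "2016-04-03 04:04:04",
]
result = cumulative_registrations(sample)
print(result)
```

Like the SQL, this only reports days on which at least one user registered.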
I'm not sure why you wouldn't just use GROUP BY, since things get more complicated without it. Anyway, try this ;)
select
date_format(main.registered, '%Y-%m-%d') as `day`,
main.total
from (
select
table1.*,
@cnt := @cnt + 1 as total
from table1
cross join (select @cnt := 0) t
) main
inner join (
select
a.*,
if(@param = date_format(registered, '%Y-%m-%d'), @rowno := @rowno + 1 ,@rowno := 1) as rowno,
@param := date_format(registered, '%Y-%m-%d')
from (select * from table1 order by registered desc) a
cross join (select @param := null, @rowno := 0) tmp
having rowno = 1
) sub on main.id = sub.id
SQLFiddle DEMO

Partition data in Percentile range and assign different value for different Range

I have the table structure shown below:
Temp
Customer_id | sum
Now I have to create a view with an extra column customer_type and assign the value 1 if the customer lies in the top 10% of customers (in descending order of sum; the total number of customers may vary), 2 if the customer lies between 10% and 20%, 3 if between 20% and 60%, and 4 if between 60% and 100%. How can I do this?
I was only able to extract the top 10% and the 10%-20% data, but could not assign the values (source):
SELECT * FROM temp WHERE sum >= (SELECT sum FROM temp t1
WHERE(SELECT count(*) FROM temp t2 WHERE t2.sum >= t1.sum) <=
(SELECT 0.1 * count(*) FROM temp));
and (not efficient, just an extension of the code above)
select * from temp t1
where (select count(*) from temp t2 where t2.sum>=t1.sum)
>= (select 0.1 * count(*) from temp) and (select count(*) from temp t2 where t2.sum>=t1.sum)
<= (select 0.2 * count(*) from temp);
Sample data are available at sqlfiddle.com
This should help you. You need to get row number for sum and total number of rows. I'm sure you can figure out the rest easily.
SELECT
*,
@curRow := @curRow + 1 AS row_number,
(@curRow2 := @curRow2 + 1) / c as pct_row
FROM
temp t
JOIN (SELECT @curRow := 0) r
JOIN (SELECT @curRow2 := 0) r2
join (select count(*) c from temp) s
order by
sum desc
This is based on this answer
I solved it like this. Thanks to @twn08, whose answer guided me to this.
select customer_id,sum,case
when pct_row<=0.10 then 1
when pct_row>0.10 and pct_row<=0.20 then 2
when pct_row>0.20 and pct_row<=0.60 then 3
when pct_row>0.60 then 4
end as customer_label from (
select customer_id,sum,(@curRow := @curRow+1)/c as pct_row
from temp t
JOIN (SELECT @curRow := 0) r
JOIN (SELECT @curRow2 := 0) r2
join (select count(*) c from temp) s
order by sum desc) p;
I don't know whether this is an efficient method or not, but it works fine for a small data set.
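For checking the bucket boundaries independently of MySQL, here is a minimal Python sketch of the same labelling rule. It is an illustration only: the function name `label_customers` and the ten-row sample are made up, and ties in sum are ranked in arbitrary order, as in the query. It compares `rank * 10` against integer multiples of the row count, which is equivalent to testing `rank / n` against 0.10, 0.20 and 0.60 but avoids floating-point rounding at the boundaries.

```python
from typing import List, Tuple


def label_customers(rows: List[Tuple[int, float]]) -> List[Tuple[int, float, int]]:
    """rows are (customer_id, sum); return (customer_id, sum, customer_label)."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)  # ORDER BY sum DESC
    n = len(ordered)
    labelled = []
    for rank, (customer_id, total) in enumerate(ordered, start=1):
        # rank / n <= 0.10  is the same as  rank * 10 <= n, and so on.
        if rank * 10 <= n:
            label = 1
        elif rank * 10 <= 2 * n:
            label = 2
        elif rank * 10 <= 6 * n:
            label = 3
        else:
            label = 4
        labelled.append((customer_id, total, label))
    return labelled


# Hypothetical sample: ten customers with sums 100, 90, ..., 10.
sample = [(i, 110 - 10 * i) for i in range(1, 11)]
labels = [label for _, _, label in label_customers(sample)]
print(labels)
```

With ten customers, the top one gets label 1, the next one label 2, the following four label 3 and the remaining four label 4.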