MySQL - Calculate accumulation since reset event in a table

This question follows up on my other question.
A Python solution already exists; it works on an extract from the MySQL database (5.6.34) where the original data is stored.
My question is: is it possible to do this calculation directly in MySQL?
As a reminder:
There is a 'runners' table with the accumulated distance per runner and reset flags:
   runner  startdate   cum_distance  reset_event
0  1       2017-04-01  100           1
1  1       2018-04-20  125           0
2  1       2018-05-25  130           1
3  2       2015-04-05  10            1
4  2       2015-10-20  20            1
5  2       2016-11-29  50            0
I would like to calculate an accumulated distance per runner since the reset point (my comments in brackets ()):
   runner  startdate   cum_distance  reset_event  runner_dist_since_reset
0  1       2017-04-01  100           1            100  <-(no reset since begin)
1  1       2018-04-20  125           0            25   <-(125-100)
2  1       2018-05-25  130           1            30   <-(130-100)
3  2       2015-04-05  10            1            10   <-(no reset since begin)
4  2       2015-10-20  20            1            10   <-(20-10)
5  2       2016-11-29  50            0            30   <-(50-20)
So far I was able to calculate only differences between reset events:
SET @DistSinceReset=0;
SELECT
    runner,
    startdate,
    reset_event,
    IF(cum_distance - @DistSinceReset <0, cum_distance, cum_distance - @DistSinceReset) AS 'runner_dist_since_reset',
    @DistSinceReset := cum_distance AS 'cum_distance'
FROM
    runners
WHERE
    reset_event = 1
GROUP BY runner, startdate

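This answer requires MySQL 8.0 or later; the 5.6.34 mentioned in the question does not support window functions.
The information you need for each row is the cum_distance of the most recent earlier row for the same runner with reset_event = 1. In MySQL 8 you can get it with window functions.
Here is one method: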
select r.*,
       (cum_distance - coalesce(preceding_reset_cum_distance, 0)) as runner_dist_since_reset
from (select r.*,
             -- cum_distance at the latest reset strictly before the current row.
             -- max() is enough here on the assumption that cum_distance never
             -- decreases for a runner, so the largest value is also the latest.
             max(case when reset_event = 1 then cum_distance end) over
                 (partition by runner
                  order by startdate
                  rows between unbounded preceding and 1 preceding
                 ) as preceding_reset_cum_distance
      from runners r
     ) r;
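To make the intended logic easier to check, here is a minimal Python sketch of the same idea: walk each runner's rows in date order and subtract the cum_distance recorded at the latest earlier reset. The function name dist_since_reset and the in-memory row layout are illustrative assumptions, not part of the question or the answer.

```python
# Sample rows from the question: (runner, startdate, cum_distance, reset_event)
rows = [
    (1, "2017-04-01", 100, 1),
    (1, "2018-04-20", 125, 0),
    (1, "2018-05-25", 130, 1),
    (2, "2015-04-05", 10, 1),
    (2, "2015-10-20", 20, 1),
    (2, "2016-11-29", 50, 0),
]


def dist_since_reset(rows):
    """Return cum_distance minus the cum_distance at the latest earlier reset."""
    baseline = {}  # runner -> cum_distance at that runner's most recent reset so far
    result = []
    # ISO dates sort correctly as strings, so this orders rows by runner, then date.
    for runner, startdate, cum_distance, reset_event in sorted(rows):
        result.append(cum_distance - baseline.get(runner, 0))
        # Update the baseline only after using it, so a reset row is still
        # measured against the previous reset (the "1 preceding" frame bound).
        if reset_event == 1:
            baseline[runner] = cum_distance
    return result


distances = dist_since_reset(rows)
print(distances)  # [100, 25, 30, 10, 10, 30]
```

The printed values match the runner_dist_since_reset column the question asks for.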

Related

How to group data by datetime intervals?

There are many devices, and while a device is in use it uploads data every few seconds or minutes.
I want to get the date-time sections during which the device is in use.
Id  date-time            value
0   2021-07-08 14:46:46  1
1   2021-07-08 14:47:47  5
2   2021-07-08 14:48:48  2
3   2021-07-08 14:49:49  4
4   2021-07-08 15:30:01  7
5   2021-07-08 15:30:46  4
6   2021-07-08 15:30:46  4
7   2021-07-08 15:50:04  4
8   2021-07-08 15:50:05  6
Is it possible to group the data by an interval?
Let us say the interval is 1 minute; then the data should be split wherever the difference between two consecutive date-times is more than 1 minute.
Then Id=0, Id=1, Id=2 and Id=3 form one group, and Id=4, Id=5, Id=6, Id=7 and Id=8 form another.
What I want is for each group to contain date-times that are close together.
If the difference between two consecutive records is more than 1 minute, they belong to different groups; otherwise they belong to the same group.
In other words, within a group every time is less than 1 minute away from at least one other time in that group.
If a record is more than the interval (1 or 10 minutes) later than the previous record, it starts a new group.
I am using MySQL.
You can use the lag window function to obtain the previous date_time.
One way to calculate the time difference in seconds is to convert the timestamps to integers with the unix_timestamp function.
Create a newgroup flag that equals 1 if and only if the difference from the previous record is larger than 60*10 seconds (10 minutes).
The cumulative sum of newgroup then serves as the section group ID.
with tmp AS (
    SELECT
        *,
        coalesce(unix_timestamp(date_time) - unix_timestamp(lag(date_time) over (ORDER BY date_time)), 0) > 60*10 AS newgroup
    FROM
        tbl
)
,tmp2 AS (
    SELECT
        *,
        sum(newgroup) over (ORDER BY date_time) AS groupid
    FROM
        tmp
)
SELECT * FROM tmp2
This query returns:
id  date_time            value  newgroup  groupid
0   2021-07-08 14:46:46  1      0         0
1   2021-07-08 14:47:47  5      0         0
2   2021-07-08 14:48:48  2      0         0
3   2021-07-08 14:49:49  4      0         0
4   2021-07-08 15:30:01  7      1         1
5   2021-07-08 15:30:46  4      0         1
6   2021-07-08 15:30:46  4      0         1
7   2021-07-08 15:50:04  4      1         2
8   2021-07-08 15:50:05  6      0         2
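The lag-plus-cumulative-sum approach above can also be traced outside the database. The following Python sketch is only an illustration under stated assumptions: the helper name assign_groups and the max_gap parameter are made up for this example, and the timestamps are the sample rows from the question.

```python
from datetime import datetime, timedelta

# date_time values from the question, in Id order
stamps = [
    "2021-07-08 14:46:46",
    "2021-07-08 14:47:47",
    "2021-07-08 14:48:48",
    "2021-07-08 14:49:49",
    "2021-07-08 15:30:01",
    "2021-07-08 15:30:46",
    "2021-07-08 15:30:46",
    "2021-07-08 15:50:04",
    "2021-07-08 15:50:05",
]


def assign_groups(stamps, max_gap):
    """Start a new group whenever the gap to the previous row exceeds max_gap."""
    times = sorted(datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps)
    group_ids = []
    group_id = 0
    previous = None  # plays the role of lag(date_time)
    for current in times:
        if previous is not None and current - previous > max_gap:
            group_id += 1  # running sum of the newgroup flag
        group_ids.append(group_id)
        previous = current
    return group_ids


group_ids = assign_groups(stamps, timedelta(minutes=10))
print(group_ids)  # [0, 0, 0, 0, 1, 1, 1, 2, 2]
```

With a 10-minute threshold this reproduces the groupid column. With timedelta(minutes=1) the first four rows would each land in their own group, because they are 61 seconds apart, so the threshold has to be chosen with the upload cadence in mind.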
Hmmm . . . It sounds like you are looking for gaps that define groups of related rows, where the gaps are determined by the interval.
In MySQL 8+, this might look like:
select min(date_time), max(date_time), count(*), avg(value)
from (select t.*,
             sum(case when prev_date_time > date_time - interval 1 minute then 0 else 1 end) over (order by date_time) as grp
      from (select t.*,
                   lag(date_time) over (order by date_time) as prev_date_time
            from t
           ) t
     ) t
group by grp;

SQL subquery in SELECT clause

I'm trying to find admin activity within the last 30 days.
The accounts table stores the user data (username, password, etc.)
At the end of each day, if a user has logged in, a new entry with their updated data is created in the player_history table. This is so we can track progress over time.
accounts table:
id  username  admin
1   Michael   4
2   Steve     3
3   Louise    3
4   Joe       0
5   Amy       1
player_history table:
id  user_id  created_at  playtime
0   1        2021-04-03  10
1   2        2021-04-04  10
2   3        2021-04-05  15
3   4        2021-04-10  20
4   5        2021-04-11  20
5   1        2021-05-12  40
6   2        2021-05-13  55
7   3        2021-05-17  65
8   4        2021-05-19  75
9   5        2021-05-23  30
10  1        2021-06-01  60
11  2        2021-06-02  65
12  3        2021-06-02  67
13  4        2021-06-03  90
The following query
SELECT a.`username`,
       SEC_TO_TIME((MAX(h.`playtime`) - MIN(h.`playtime`))*60) as 'time'
FROM `player_history` h, `accounts` a
WHERE h.`created_at` > '2021-05-06'
  AND h.`user_id` = a.`id`
  AND a.`admin` > 0
GROUP BY h.`user_id`
Outputs this table, covering 2021-05-06 to present (yy-mm-dd).
Note that this is just admin activity, so Joe is not included in this data.
username  time
Michael   00:20:00
Steve     00:10:00
Louise    00:02:00
Amy       00:00:00
As you can see from this data, Amy's time is shown as 0 although she has played for 10 minutes in the last month. This is because she only has 1 entry starting from 2021-05-06, so there is nothing to compare it to: the maximum and the minimum are the same value, and their difference is 0.
Another flaw is that it doesn't account for all activity in the last month; it only subtracts the lowest value in the period from the highest.
So I tried fixing this by comparing the highest value after 2021-05-06 with the most recent entry before that date. I modified the query a bit:
SELECT a.`Username`,
       SEC_TO_TIME((MAX(h.`playtime`) -
                    (SELECT MAX(`playtime`)
                     FROM `player_history`
                     WHERE a.`id` = `user_id` AND `created_at` < '2021-05-06'))*60) as 'Time'
FROM `player_history` h, `accounts` a
WHERE h.`created_at` >= '2021-05-06'
  AND h.`user_id` = a.`id`
  AND a.`admin` > 0
GROUP BY h.`user_id`
So now it will output:
username  time
Michael   00:50:00
Steve     00:50:00
Louise    00:52:00
Amy       00:10:00
But I feel like this whole query is quite inefficient. Is there a better way to do this?
I think you want lag():
SELECT a.username,
       -- playtime is stored in minutes, so convert to seconds for SEC_TO_TIME()
       SEC_TO_TIME(SUM(h.playtime - COALESCE(h.prev_playtime, 0)) * 60) as time
FROM accounts a JOIN
     (SELECT h.*,
             -- previous snapshot for the same user; NULL on the user's first row
             LAG(playtime) OVER (PARTITION BY h.user_id ORDER BY h.created_at) as prev_playtime
      FROM player_history h
     ) h
     ON h.user_id = a.id
WHERE h.created_at > '2021-05-06' AND
      a.admin > 0
GROUP BY a.username;
In addition to the LAG() logic, note the other changes to the query:
The use of proper, explicit, standard, readable JOIN syntax.
The use of consistent columns for the SELECT and GROUP BY.
The removal of single quotes around the column alias.
The removal of backticks; they just clutter the query, making it harder to write and to read.
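As a rough cross-check of the LAG() idea, the sketch below recomputes the per-admin totals in plain Python from the sample tables in the question. The function name admin_minutes_since and the dictionary layout are assumptions made for illustration; the result is in minutes, the unit the question's queries convert with *60.

```python
from collections import defaultdict

# accounts: id -> (username, admin)
accounts = {
    1: ("Michael", 4),
    2: ("Steve", 3),
    3: ("Louise", 3),
    4: ("Joe", 0),
    5: ("Amy", 1),
}

# player_history: (user_id, created_at, playtime)
history = [
    (1, "2021-04-03", 10), (2, "2021-04-04", 10), (3, "2021-04-05", 15),
    (4, "2021-04-10", 20), (5, "2021-04-11", 20), (1, "2021-05-12", 40),
    (2, "2021-05-13", 55), (3, "2021-05-17", 65), (4, "2021-05-19", 75),
    (5, "2021-05-23", 30), (1, "2021-06-01", 60), (2, "2021-06-02", 65),
    (3, "2021-06-02", 67), (4, "2021-06-03", 90),
]


def admin_minutes_since(accounts, history, cutoff):
    """Sum per-row playtime increases after cutoff, for admins only."""
    previous = {}  # user_id -> playtime on the user's previous row (the LAG value)
    totals = defaultdict(int)
    for user_id, created_at, playtime in sorted(history, key=lambda r: (r[0], r[1])):
        # The delta is computed over all rows, before the date filter,
        # just as LAG() in the subquery sees rows older than the cutoff.
        delta = playtime - previous.get(user_id, 0)
        previous[user_id] = playtime
        username, admin = accounts[user_id]
        if created_at > cutoff and admin > 0:
            totals[username] += delta
    return dict(totals)


minutes = admin_minutes_since(accounts, history, "2021-05-06")
print(minutes)
```

Because each delta is taken against the previous snapshot, a user with a single row after the cutoff (Amy here) still gets a non-zero total.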

How to get cumulative total for previous month and up to this month?

ID  pcID  contractor  approver  claimed
-------------------------------------------
1   1     one         1000      900
2   1     two         200       100
3   1     three       1000      1000
4   1     six         100       11
5   2     six         100       22
6   3     six         120       1
7   4     three       102       10
From the above table, I need to get, per contractor, the cumulative approver and claimed amounts up to this month and up to the previous month, together with the current month's approver and claimed amounts, as in the table below.
ID  contractor  approver  claimed  uptothisMTApprover  uptothisMTClaimed  previousMTApprover  previousMTClaimed
-----------------------------------------------------------------------------------------------------------------
1   one         1000      900      1000                900                0                   0
2   two         200       100      200                 100                0                   0
3   three       102       10       1102                1010               1000                1000
4   six         120       1        320                 34                 200                 33
Thanks in advance..
You seem to want the latest row per contractor, as defined by pcID, and a cumulative sum of all previous months.
You can use window functions:
select contractor, approver, claimed,
       total_approver as uptothisMTApprover,
       total_claimed as uptothisMTClaimed,
       total_approver - approver as previousMTApprover,
       total_claimed - claimed as previousMTClaimed
from (
    select t.*,
           row_number() over(partition by contractor order by pcID desc) rn,
           sum(approver) over(partition by contractor) total_approver,
           sum(claimed) over(partition by contractor) total_claimed
    from mytable t
) t
where rn = 1
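The same "latest row plus per-contractor totals" logic can be traced in a few lines of Python. This is only a sketch on the sample rows from the question; the function name summarize and the tuple layout of its result are assumptions for illustration.

```python
# Sample rows from the question: (ID, pcID, contractor, approver, claimed)
rows = [
    (1, 1, "one", 1000, 900),
    (2, 1, "two", 200, 100),
    (3, 1, "three", 1000, 1000),
    (4, 1, "six", 100, 11),
    (5, 2, "six", 100, 22),
    (6, 3, "six", 120, 1),
    (7, 4, "three", 102, 10),
]


def summarize(rows):
    """Per contractor: latest row (highest pcID), running totals, previous totals."""
    latest = {}  # contractor -> (pcID, approver, claimed) of the row with rn = 1
    totals = {}  # contractor -> [sum of approver, sum of claimed]
    for _id, pc_id, contractor, approver, claimed in rows:
        if contractor not in latest or pc_id > latest[contractor][0]:
            latest[contractor] = (pc_id, approver, claimed)
        total = totals.setdefault(contractor, [0, 0])
        total[0] += approver
        total[1] += claimed
    summary = {}
    for contractor, (_pc_id, approver, claimed) in latest.items():
        total_approver, total_claimed = totals[contractor]
        summary[contractor] = (
            approver, claimed,
            total_approver, total_claimed,                       # up to this month
            total_approver - approver, total_claimed - claimed,  # previous months
        )
    return summary


summary = summarize(rows)
print(summary["six"])  # (120, 1, 320, 34, 200, 33)
```

The tuple for each contractor lines up with the columns of the desired output table.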

Calculate avg time in mysql from last 7 days

Table name: run_detail
I have to calculate the average time of jobs over the last 7 days, but in some cases a job has fewer than 7 runs; e.g. abc has only 2 run dates.
The averages would be (4.5+6+.....+7)/7=5.83 and (23.9+45.7)/2=34.8. The calculation also needs to be based on the latest 7 runs, e.g. 2020-07-04 to 2020-07-10, not from 2020-07-01.
Job_name  run_date    rownum  count  elapsed_time(sec)  avg_time
xyz       2020-07-01  1       10     4.5                ?
xyz       2020-07-02  2       10     6                  ?
.......
xyz       2020-07-10  10      10     7.0                ?
abc       2020-07-01  1       2      23.9               ?
abc       2020-07-02  2       2      45.7               ?
Desired output:
Job_name  run_date    rownum  count  elapsed_time(sec)  avg_time
xyz       2020-07-01  1       10     4.5                5.83
xyz       2020-07-02  2       10     6                  5.83
.......
xyz       2020-07-10  10      10     7.0                5.83
abc       2020-07-01  1       2      23.9               34.8
abc       2020-07-02  2       2      45.7               34.8
Could you please help me calculate this average time in MySQL?
If you want the average over the preceding 7 days, you can use a window function:
select t.*,
       avg(elapsed_time) over (partition by job_name
                               order by run_date
                               range between interval 6 day preceding and current row
                              ) as avg_time
from t;
Note: This assumes that you really want six preceding days plus the current date. If you really want 7 days before to 1 day before (the preceding week), then use:
range between interval 7 day preceding and interval 1 day preceding
EDIT:
In older versions of MySQL, you can use a correlated subquery:
select t.*,
       (select avg(t2.elapsed_time)
        from t t2
        where t2.job_name = t.job_name and
              t2.run_date <= t.run_date and
              t2.run_date > t.run_date - interval 7 day
       ) as avg_time
from t;
Adjust the date comparison to get exactly the period you want.
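To see what the date comparison selects, here is a small Python sketch of the same rolling window: for each run, average the runs of that job from the six preceding days plus the current day. The function name rolling_avg is an assumption, and the third abc row is a hypothetical run added only to show an old run falling out of the window.

```python
from datetime import date, timedelta

# (job_name, run_date, elapsed_time)
runs = [
    ("abc", date(2020, 7, 1), 23.9),   # from the question
    ("abc", date(2020, 7, 2), 45.7),   # from the question
    ("abc", date(2020, 7, 10), 10.0),  # hypothetical extra run, not in the question
]


def rolling_avg(runs, days=7):
    """Average elapsed_time per job over the `days` days ending at each run_date."""
    result = []
    for job, run_date, _elapsed in runs:
        start = run_date - timedelta(days=days - 1)  # six days before, for days=7
        window = [e for j, d, e in runs if j == job and start <= d <= run_date]
        result.append((job, run_date, sum(window) / len(window)))
    return result


averages = [round(avg, 2) for _job, _run_date, avg in rolling_avg(runs)]
print(averages)  # [23.9, 34.8, 10.0]
```

Note that this gives each row its own trailing average; it does not repeat a single per-job figure on every row as the desired output in the question does.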

mysql query items with the largest price increase

I have a table 'item_prices' with the columns:
resource_id, avg_price, time_stamp, samples
Each item gets a new average price inserted into the table every day. Old averages are not deleted.
How can I query the 10 items with the highest "percent increase in price" since yesterday's average? I would also like to check that samples is > 10 to ensure accuracy.
To clarify "percent increase in price":
percent_increase = (todays_avg_price - yesterdays_avg_price) / yesterdays_avg_price
Example:
resource_id | avg_price | time_stamp | samples
1             450         1380526003   12
2             650         1380526002   2
3             980         1380526001   68
1             400         1380440003   24
2             700         1380440002   13
3             400         1380440001   38
1             900         1380300003   11
2             250         1380300002   8
3             300         1380300001   4
Returns:
resource_id | percent_increase
3             1.45
1             0.125
select
    today.resource_id,
    (today.avg_price - yesterday.avg_price) / yesterday.avg_price as percent_increase
from
    item_prices today,
    item_prices yesterday
where today.resource_id = yesterday.resource_id
  -- $today and $yesterday are placeholders for date values supplied by the caller
  and DATE(FROM_UNIXTIME(today.time_stamp)) = $today
  and DATE(FROM_UNIXTIME(yesterday.time_stamp)) = $yesterday
  -- the question asks for samples > 10
  and today.samples > 10
order by percent_increase desc
limit 10
This is a self join; apologies for the archaic join syntax, it's a bad habit I find hard to shake.
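For readers who want to check the self-join logic against the sample data, here is a hedged Python sketch. It pairs each item's row for "today" with its row for the previous day and ranks by percent increase. The function name top_increases, the use of UTC day numbers (time_stamp // 86400) instead of DATE(FROM_UNIXTIME(...)), and applying the samples filter to today's row only are all assumptions made for this illustration.

```python
SECONDS_PER_DAY = 86400

# Sample rows from the question: (resource_id, avg_price, time_stamp, samples)
prices = [
    (1, 450, 1380526003, 12),
    (2, 650, 1380526002, 2),
    (3, 980, 1380526001, 68),
    (1, 400, 1380440003, 24),
    (2, 700, 1380440002, 13),
    (3, 400, 1380440001, 38),
    (1, 900, 1380300003, 11),
    (2, 250, 1380300002, 8),
    (3, 300, 1380300001, 4),
]


def top_increases(prices, today_day, min_samples=10, limit=10):
    """Rank items by (today - yesterday) / yesterday, largest increase first."""
    by_day = {}  # (resource_id, UTC day number) -> (avg_price, samples)
    for resource_id, avg_price, time_stamp, samples in prices:
        by_day[(resource_id, time_stamp // SECONDS_PER_DAY)] = (avg_price, samples)
    increases = []
    for (resource_id, day), (price, samples) in by_day.items():
        yesterday = by_day.get((resource_id, day - 1))  # the self join
        if day == today_day and yesterday is not None and samples > min_samples:
            increases.append((resource_id, (price - yesterday[0]) / yesterday[0]))
    increases.sort(key=lambda item: item[1], reverse=True)
    return increases[:limit]


# Treat the day of the newest row as "today" for this sample.
today_day = max(ts for _, _, ts, _ in prices) // SECONDS_PER_DAY
result = top_increases(prices, today_day)
print(result)
```

On the sample data this yields resource 3 first and resource 1 second, the same two rows as the expected result in the question; resource 2 is dropped because today's row has only 2 samples.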
Please try the query below:
select top 1 resource_id,avg_price from item_Prices
group by resource_id,avg_price
having count(resource_id,avg_price)>
10 order by time_stamp