MySQL statement history with user balance algorithm

I have a table with payment history:
payments:
id | consumer_id | amount | created_at
1 | 1 | 30 | 2021-05-11 13:01:36
2 | 1 | -10 | 2021-05-12 14:01:36
3 | 1 | -2.50 | 2021-05-13 13:01:36
4 | 1 | -4.50 | 2021-05-14 13:01:36
5 | 1 | 20 | 2021-05-15 13:01:36
In the final result I need to get the consumer balance after each transaction, so something like this:
id | consumer_id | amount | created_at | balance
1 | 1 | 30 | 2021-05-11 13:01:36 | 30.00
2 | 1 | -10 | 2021-05-12 14:01:36 | 20.00
3 | 1 | -2.50 | 2021-05-13 13:01:36 | 17.50
4 | 1 | -4.50 | 2021-05-14 13:01:36 | 13.00
5 | 1 | 20 | 2021-05-15 13:01:36 | 33.00
I'm using this query:
SET @balanceTotal = 0;
select amount, created_at, consumer_id, @balanceTotal := @balanceTotal + amount as balance
from payments
where consumer_id = 1
This works fine until I try to add some sorting or pagination.
Any suggestion on how to write a query with order by desc, limit, and offset to count balance properly?

That's just a window sum. In MySQL 8.0:
select p.*,
sum(amount) over(partition by consumer_id order by created_at) balance
from payments p
You can add the filtering on the customer in the where clause if you like (in which case the partition by clause is not really needed anymore).
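For instance, a minimal sketch of that filtered variant:
select p.*,
sum(amount) over(order by created_at) balance
from payments p
where consumer_id = 1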
In earlier versions of MySQL, an alternative uses a correlated subquery:
select p.*,
(
select sum(amount)
from payments p1
where p1.consumer_id = p.consumer_id and p1.created_at <= p.created_at
) balance
from payments p
I would not recommend user variables for this; although efficient, their behavior is quite tricky, and their use is deprecated in recent versions.

If you are using MySQL >= 8, a window sum is preferable:
select p.*, sum(amount) over(order by created_at) balance
from payments p
where consumer_id = 1
order by created_at desc
limit 0, 5;
If you are using MySQL < 8 then a user variable is significantly more efficient than the suggested correlated subquery. You can wrap it in a derived table for re-ordering and pagination:
select * from (
select p.*, @balanceTotal := @balanceTotal + amount as balance
from payments p, (SELECT @balanceTotal := 0) vars
where consumer_id = 1
order by created_at
) t
order by created_at desc
limit 0, 5;

Related

Get all transaction details of a user for their 2nd month of transactions

Trying to get the 2nd transaction month details for all the customers
Date User_id amount
2021-11-01 1 100
2021-11-21 1 200
2021-12-20 2 110
2022-01-20 2 200
2022-02-04 1 50
2022-02-21 1 100
2022-03-22 2 200
For every customer, get all the records in the month of their 2nd transaction (there can be multiple transactions in a month, and in a day, by a particular user).
Expected Output
Date User_id amount
2022-02-04 1 50
2022-02-21 1 100
2022-01-20 2 200
You can use dense_rank:
select Date, User_id, amount from
(select *, dense_rank() over(partition by User_id order by year(Date), month(date)) r
from table_name) t
where r = 2;
Fiddle
If dense_rank is an option you can:
with cte1 as (
select *, extract(year_month from date) as yyyymm
from t
), cte2 as (
select *, dense_rank() over (partition by user_id order by yyyymm) as dr
from cte1
)
select *
from cte2
where dr = 2
Note that it is possible to write the above using one cte.
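For example, a sketch of the single-CTE form, computing the year-month expression directly in the window's order by:
with cte as (
select *, dense_rank() over (partition by user_id order by extract(year_month from date)) as dr
from t
)
select *
from cte
where dr = 2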

Interpolate Multiseries Data In SQL

I have a system that stores the data only when they are changed. So, the dataset looks like below.
data_type_id | data_value | inserted_at
2 | 240 | 2022-01-19 17:20:52
1 | 30 | 2022-01-19 17:20:47
2 | 239 | 2022-01-19 17:20:42
1 | 29 | 2022-01-19 17:20:42
My data frequency is every 5 seconds. So, whether or not a row exists at a given timestamp, I need the result to assume that the value at each 5-second mark is the same as the previous stored value.
Since I only store the data when it changes, the dataset should in effect look like the one below.
data_type_id | data_value | inserted_at
2 | 240 | 2022-01-19 17:20:52
1 | 30 | 2022-01-19 17:20:52
2 | 239 | 2022-01-19 17:20:47
1 | 30 | 2022-01-19 17:20:47
2 | 239 | 2022-01-19 17:20:42
1 | 29 | 2022-01-19 17:20:42
I don't want to insert these rows into my table; I just want to retrieve the data like this in a SELECT statement.
Is there any way I can create this query?
PS: I have many data_types, so a query usually returns around a million rows.
EDIT:
Server version: 10.3.27-MariaDB-0+deb10u1 (Debian 10).
The user is going to determine the SELECT datetime range, so there is no fixed window between times.
As @Akina mentioned, sometimes there are gaps between the inserted_at values; the difference might be ~4 seconds or ~6 seconds instead of exactly 5 seconds. Since this won't happen frequently, it is okay to generate the series ignoring this fact.
With the help of a query that generates all the combinations of data_type_id and the 5-second moments you need, you can achieve the result using a subquery that fetches the most recent preceding data_value:
with recursive u as
(select '2022-01-19 17:20:42' as d
union all
select DATE_ADD(d, interval 5 second) from u
where d < '2022-01-19 17:20:52'),
v as
(select * from u cross join (select distinct data_type_id from table_name) t)
select v.data_type_id,
(select data_value from table_name where inserted_at <= d and data_type_id = v.data_type_id
order by inserted_at desc limit 1) as data_value,
d as inserted_at
from v
Fiddle
You can replace the recursive CTE with any query that gets you all the 5-second moments you need.
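For example, a sketch that derives the same 5-second grid from a small digits table instead of recursion (the bounds are hardcoded here purely for illustration):
select date_add('2022-01-19 17:20:42', interval 5 * (d1.n + 10 * d2.n) second) as d
from (select 0 n union all select 1 union all select 2 union all select 3 union all select 4
union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) d1
cross join (select 0 n union all select 1 union all select 2 union all select 3 union all select 4
union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) d2
where date_add('2022-01-19 17:20:42', interval 5 * (d1.n + 10 * d2.n) second) <= '2022-01-19 17:20:52'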
WITH RECURSIVE
cte1 AS ( SELECT @start_datetime dt
UNION ALL
SELECT dt + INTERVAL 5 SECOND FROM cte1 WHERE dt < @end_datetime),
cte2 AS ( SELECT *,
ROW_NUMBER() OVER (PARTITION BY test.data_type_id, cte1.dt
ORDER BY test.inserted_at DESC) rn
FROM cte1
LEFT JOIN test ON FIND_IN_SET(test.data_type_id, @data_type_ids)
AND cte1.dt >= test.inserted_at )
SELECT *
FROM cte2
WHERE rn = 1
https://dbfiddle.uk/?rdbms=mariadb_10.3&fiddle=380ad334de0c980a0ddf1b49bb6fa38e
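For reference, the user variables this query expects could be initialized like so (the values here are hypothetical):
SET @start_datetime = '2022-01-19 17:20:42',
@end_datetime = '2022-01-19 17:20:52',
@data_type_ids = '1,2'; -- comma-separated id list consumed by FIND_IN_SET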

MySQL : Selecting the rows with the highest group by count

I have a table with records that are updated every minute with a decimal value (10,2). To ignore measurement errors I want the value that has been inserted the most.
Therefore I tried:
SELECT date_time,max(sensor1),count(ID)
FROM `weigh_data`
group by day(date_time),sensor1
This way I get the number of records
Datetime sensor1 count(ID)
2020-03-19 11:49:12 33.22 3
2020-03-19 11:37:47 33.36 10
2020-03-20 07:32:02 32.54 489
2020-03-20 00:00:43 32.56 891
2020-03-20 14:20:51 32.67 5
2020-03-21 07:54:16 32.50 1
2020-03-21 00:00:58 32.54 1373
2020-03-21 01:15:16 32.56 9
2020-03-22 08:35:12 32.52 2
2020-03-22 00:00:40 32.54 575
2020-03-22 06:50:54 32.58 1
What I actually want is for each day one row which has the highest count(ID)
Anyone can help me out on this?
With newer MySQL (8.0 and later) you can use the RANK window function to rank the rows according to the count.
Note that this will return all "ties" which means if there are 100 readings of X and 100 readings of Y (and 100 is the max), both X and Y will be returned.
WITH cte AS (
SELECT
DATE(date_time), sensor1,
RANK() OVER (PARTITION BY DATE(date_time) ORDER BY COUNT(*) DESC) rnk
FROM `weigh_data` GROUP BY DATE(date_time), sensor1
)
SELECT * FROM cte WHERE rnk=1
If you just want to pick one (non-deterministic) row among the ties, you can instead use ROW_NUMBER in place of RANK, as sketched after the fiddle link.
A DBfiddle to test with.
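A sketch of that ROW_NUMBER variant, which keeps exactly one row per day even when counts tie:
WITH cte AS (
SELECT
DATE(date_time), sensor1,
ROW_NUMBER() OVER (PARTITION BY DATE(date_time) ORDER BY COUNT(*) DESC) rn
FROM `weigh_data` GROUP BY DATE(date_time), sensor1
)
SELECT * FROM cte WHERE rn = 1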
Here is a solution based on a correlated subquery, that works in all versions of MySQL:
select w.*
from weigh_data w
where w.date_time = (
select w1.date_time
from weigh_data w1
where w1.date_time >= date(w.date_time) and w1.date_time < date(w.date_time) + interval 1 day
order by sensor1 desc
limit 1
)
Just like the window function solution using rank(), this allows top ties.
For performance, you want an index on (date_time, sensor1).
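A minimal sketch (the index name is arbitrary):
create index idx_weigh_data_dt_s1 on weigh_data (date_time, sensor1);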

MySQL Error 1111 - Invalid use of group function when nesting window functions

I'm working to create a SQL report on an answers table:
id | created_at
1 | 2018-03-02 18:05:56
2 | 2018-04-02 18:05:56
3 | 2018-04-02 18:05:56
4 | 2018-05-02 18:05:56
5 | 2018-06-02 18:05:56
And the desired output is:
weeks_ago | record_count (# of rows per weekly cohort) | growth (%)
-4 | 21 | 22%
-3 | 22 | -12%
-2 | 32 | 2%
-1 | 2 | 20%
0 | 31 | 0%
My query is currently erring with:
1111 - Invalid use of group function
What am I doing wrong here?
SELECT floor(datediff(f.created_at, curdate()) / 7) AS weeks_ago,
count(DISTINCT f.id) AS "New Records in Cohort",
100 * (count(*) - lag(count(*), 1) over (order by f.created_at)) / lag(count(*), 1) over (order by f.created_at) || '%' as growth
FROM answers f
WHERE f.completed_at IS NOT NULL
GROUP BY weeks_ago
HAVING count(*) > 1;
I think you want a running count of all rows excluding the current row. You can ditch the LAG function as follows:
SELECT
COUNT(*) OVER (ORDER BY f.created_at ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) x, -- running count before current row
COUNT(*) OVER (ORDER BY f.created_at) y -- running count including current row
You can divide and multiply all you want.
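For instance, a sketch that derives a growth percentage from those two running counts (NULLIF guards the first row, where the preceding count is zero):
SELECT 100 * (y - x) / NULLIF(x, 0) AS growth_pct
FROM (
SELECT COUNT(*) OVER (ORDER BY f.created_at ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) x,
COUNT(*) OVER (ORDER BY f.created_at) y
FROM answers f
) t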
Nope. You simply need to separate the GROUP BY from the LAG ... OVER:
WITH cte AS (
SELECT
FLOOR(DATEDIFF(created_at, CURDATE()) / 7) AS weeks_ago,
COUNT(DISTINCT id) AS new_records
FROM answers
WHERE 1 = 1 -- todo: change this
GROUP BY weeks_ago
HAVING 1 = 1 -- todo: change this
)
SELECT
cte.*,
100 * (
new_records - LAG(new_records) OVER (ORDER BY weeks_ago)
) / LAG(new_records) OVER (ORDER BY weeks_ago) AS percent_increase
FROM cte
Fiddle
You can't nest the COUNT aggregate function inside LAG like this; an aggregate function containing another aggregate function isn't valid.
You can use a subquery to work around it:
SELECT weeks_ago,
NewRecords "New Records in Cohort",
concat(100 * (cnt - lag(cnt, 1) over (order by weeks_ago)) / lag(cnt, 1) over (order by weeks_ago), '%') as growth
FROM (
SELECT floor(datediff(f.created_at, curdate()) / 7) AS weeks_ago,
count(*) AS cnt,
count(DISTINCT f.id) AS NewRecords
FROM answers f
GROUP BY weeks_ago
) t1

Correct query to get average from top 5 of 7 days?

I'm tracking the number of steps/day. I want to get the average steps/day using the 5 best days out of a 7-day period. My end goal is to get an average of the best 5 out of 7 days for each week across a total of 16 weeks.
Here's my sqlfiddle - http://sqlfiddle.com/#!9/5e69bdf/2
Here is the query I'm currently using but I've discovered the result is not correct. It's taking the average of 7 days instead of selecting the 5 days that had the most steps. It's outputting 14,122 as an average instead of 11,606 based on my data as posted in the sqlfiddle.
SELECT SUM(a.steps) as StepsTotal, AVG(a.steps) AS AVGSteps
FROM (SELECT * FROM activities
JOIN Courses
WHERE activities.encodedid=? AND activities.activitydate BETWEEN
DATE_ADD(Courses.Startsemester, INTERVAL $y DAY) AND
DATE_ADD(Courses.Startsemester, INTERVAL $x DAY)
ORDER BY activities.steps DESC LIMIT 5
) a
GROUP BY a.encodedid
Here's the same query with the values filled in for testing:
SELECT SUM(a.steps) as StepsTotal, AVG(a.steps) AS AVGSteps
FROM (SELECT * FROM activities
JOIN Courses
WHERE activities.encodedid='42XPC3' AND activities.activitydate BETWEEN
DATE_ADD(Courses.Startsemester, INTERVAL 0 DAY) AND
DATE_ADD(Courses.Startsemester, INTERVAL 6 DAY)
ORDER BY activities.steps DESC LIMIT 5
) a
GROUP BY a.encodedid
As @SloanThrasher pointed out, the reason the query is not working is that you have multiple rows for the same course in the Courses table, each of which gets joined to the activities rows. Thus the subquery outputs the top value (16058) 3 times plus the second-highest value (11218) twice, for a total of 70610 and an average of 14122. You can work around this by modifying the query as follows:
SELECT SUM(a.steps) as StepsTotal, AVG(a.steps) AS AVGSteps
FROM (SELECT * FROM activities
JOIN (SELECT DISTINCT Startsemester FROM Courses) c
WHERE activities.encodedid='42XPC3' AND activities.activitydate BETWEEN
DATE_ADD(c.Startsemester, INTERVAL 0 DAY) AND
DATE_ADD(c.Startsemester, INTERVAL 6 DAY)
ORDER BY CAST(activities.steps AS UNSIGNED) DESC LIMIT 5
) a
GROUP BY a.encodedid
Now since there are actually only 3 days with activity (2018-07-16, 2018-07-17 and 2018-07-18) between the start of semester and 6 days later (2018-07-12 and 2018-07-18), this gives a total of 37553 (16058+11218+10277) and an average of 12517.7.
StepsTotal AVGSteps
37553 12517.666666666666
Ideally, you probably also want to add a constraint on the course chosen from Courses, e.g. change
(SELECT DISTINCT Startsemester FROM Courses)
to
(SELECT DISTINCT Startsemester FROM Courses WHERE CourseNumber='PHED1164')
Try this query:
SELECT @rn := 1, @weekAndYear := 0;
SELECT weekDayAndYear,
SUM(steps),
AVG(steps)
FROM (
SELECT @weekAndYear weekAndYearLag,
CASE WHEN @weekAndYear = YEAR(activitydate) * 100 + WEEK(activitydate)
THEN @rn := @rn + 1 ELSE @rn := 1 END rn,
@weekAndYear := YEAR(activitydate) * 100 + WEEK(activitydate) weekDayAndYear,
steps,
lightly_act_min,
fairly_act_min,
sed_act_min,
vact_min,
encodedid,
activitydate,
username
FROM activities
ORDER BY YEAR(activitydate) * 100 + WEEK(activitydate), CAST(steps AS UNSIGNED) DESC
) a WHERE rn <= 5
GROUP BY weekDayAndYear
Demo
With the additional variables, I imitate the SQL Server ROW_NUMBER function, numbering the days from 1 to 7 partitioned by week. This way I can filter the best 5 days and easily get an average grouping by the column weekDayAndYear, which is in the same format as the variable: yyyyww (I used an integer to avoid casting to varchar).
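For comparison, on MySQL 8.0 the same numbering could be done with a real window function; a sketch against the same activities table:
select weekAndYear, sum(steps) StepsTotal, avg(steps) AVGSteps
from (
select year(activitydate) * 100 + week(activitydate) weekAndYear,
steps,
row_number() over (partition by year(activitydate) * 100 + week(activitydate)
order by cast(steps as unsigned) desc) rn
from activities
) a
where rn <= 5
group by weekAndYear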
Consider the following:
DROP TABLE IF EXISTS my_table;
CREATE TABLE `my_table`
(id SERIAL PRIMARY KEY
,steps INT NOT NULL
);
insert into my_table (steps) values
(9),(5),(7),(7),(7),(8),(4);
select prev
, sum(steps) total
from (
select steps
, case when @prev = grp
then @j:=@j+1 else @j:=1 end j
, @prev:=grp prev
from (SELECT steps
, case when mod(@i,3)=0
then @grp := @grp+1 else @grp:=@grp end grp -- a 3 day week
, @i:=@i+1 i
from my_table
, (select @i:=0,@grp:=0) vars
order
by id) x
, (select @prev:= null, @j:=0) vars
order by grp,steps desc,i) a
where j <=2 -- top 2 (out of 3)
group by prev;
+------+-------+
| prev | total |
+------+-------+
| 1 | 16 |
| 2 | 15 |
| 3 | 4 |
+------+-------+
http://sqlfiddle.com/#!9/ee46d7/11