Group By specified numbers of ordered rows - mysql

I have such table in my MySQL database:
---------------------------
|fid | price | date |
---------------------------
| 1 | 1.23 | 2011-08-11 |
| 1 | 1.43 | 2011-08-12 |
| 1 | 1.54 | 2011-08-13 |
| 1 | 1.29 | 2011-08-14 |
| 1 | 1.60 | 2011-08-15 |
| 1 | 1.80 | 2011-08-16 |
fid - the product id
price - the price of the product on the given day
I want to calculate the average price of the product with fid=1 in groups of n rows: the average price of the first n=3 rows ordered by date, then the average price of the next 3 rows ordered by date, and so on.
In other words, the rows must first be sorted by date and then grouped n at a time, with an average calculated for each group.
If n=3, this should return the following result:
--------------
|fid | price |
--------------
| 1 | 1.40 | 2011-08-11 -> 2011-08-13 - average price for 3 days
| 1 | 1.56 | 2011-08-14 -> 2011-08-16 - average price for 3 days
How can I write an SQL query to do such calculations?
Thanks in advance.

Unfortunately, MySQL before version 8.0 doesn't offer analytic functions the way Oracle, SQL Server and PostgreSQL do, so there you have to play with variables to reach your goal.
create table mytest (
id int not null auto_increment primary key,
fid int,
price decimal(4,2),
fdate date
) engine = myisam;
insert into mytest (fid,price,fdate)
values
(1,1.23,'2011-08-11'),
(1,1.43,'2011-08-12'),
(1,1.54,'2011-08-13'),
(1,1.29,'2011-08-14'),
(1,1.60,'2011-08-15'),
(1,1.80,'2011-08-16');
select
concat_ws('/',min(fdate),max(fdate)) as rng,
format(avg(price),2) as average from (
select *,@riga:=@riga+1 as riga
from mytest,(select @riga:=0) as r order by fdate
) as t
group by ceil(riga/3);
+-----------------------+---------+
| rng | average |
+-----------------------+---------+
| 2011-08-11/2011-08-13 | 1.40 |
| 2011-08-14/2011-08-16 | 1.56 |
+-----------------------+---------+
2 rows in set (0.02 sec)
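On MySQL 8.0 and later, window functions are available, so the same bucketing can be written with ROW_NUMBER() instead of a session variable. The sketch below is only an illustration: it runs the query from Python against an in-memory SQLite database as a stand-in for MySQL (an assumption, made so the example is self-contained), and it relies on SQLite's integer division. In MySQL the grouping expression would be written (rn - 1) DIV 3 or CEIL(rn / 3).

```python
import sqlite3

# In-memory SQLite stands in for MySQL 8.0+ here (assumption for a runnable demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytest (fid INTEGER, price REAL, fdate TEXT)")
conn.executemany(
    "INSERT INTO mytest (fid, price, fdate) VALUES (?, ?, ?)",
    [
        (1, 1.23, "2011-08-11"),
        (1, 1.43, "2011-08-12"),
        (1, 1.54, "2011-08-13"),
        (1, 1.29, "2011-08-14"),
        (1, 1.60, "2011-08-15"),
        (1, 1.80, "2011-08-16"),
    ],
)

query = """
SELECT fid,
       MIN(fdate) AS first_day,
       MAX(fdate) AS last_day,
       ROUND(AVG(price), 2) AS average
FROM (
    -- Number the rows of each product in date order: 1, 2, 3, ...
    SELECT fid, price, fdate,
           ROW_NUMBER() OVER (PARTITION BY fid ORDER BY fdate) AS rn
    FROM mytest
) AS t
-- Rows 1-3 fall in bucket 0, rows 4-6 in bucket 1, and so on.
-- SQLite divides integers as integers; in MySQL write (rn - 1) DIV 3.
GROUP BY fid, (rn - 1) / 3
ORDER BY fid, first_day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Partitioning by fid means the numbering restarts for every product, so the query also works when the table holds more than one fid.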

Maybe you could use
GROUP BY FLOOR(UNIX_TIMESTAMP(date)/(60*60*24*3))
That is: convert to seconds, divide by the number of seconds in 3 days, and round down.
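Note that this groups by fixed 3-day calendar windows counted from the Unix epoch, not by every three rows, so a window need not start at the first row. The Python sketch below mirrors the FLOOR(UNIX_TIMESTAMP(date)/(60*60*24*3)) arithmetic for the six dates in the question. It assumes a UTC session time zone (UNIX_TIMESTAMP depends on the session time zone in MySQL), and the helper name epoch_bucket is made up for the illustration.

```python
from collections import defaultdict
from datetime import date

EPOCH = date(1970, 1, 1)
WINDOW_DAYS = 3


def epoch_bucket(d: date) -> int:
    """Mirror FLOOR(UNIX_TIMESTAMP(d) / (60*60*24*3)) for a date at midnight UTC."""
    return (d - EPOCH).days // WINDOW_DAYS


days = [date(2011, 8, n) for n in range(11, 17)]  # 2011-08-11 .. 2011-08-16

buckets = defaultdict(list)
for day in sorted(days):
    buckets[epoch_bucket(day)].append(day.isoformat())

groups = list(buckets.values())
for group in groups:
    print(group)
```

With these dates the windows hold one, three and two rows (08-11 alone, 08-12 to 08-14, 08-15 to 08-16), so the averages differ from the 3-row grouping asked for in the question.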

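SELECT AVG( price ) FROM my_table
WHERE fid = 1
-- DATEDIFF gives the day offset from the first date; subtracting two DATE values directly compares them as YYYYMMDD numbers
GROUP BY DATEDIFF( date, ( SELECT MIN( date ) FROM my_table WHERE fid = 1 ) ) DIV 3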

select fid, avg(price), min(date), max(date)
from
(select floor((@rownum := @rownum + 1)/3) as `rank`, prices.*
from prices, (SELECT @rownum:=-1) as r
order by date) as t
group by fid, `rank`

Related

Retrieving latest dates grouped by a key column value (MySql)

Given the following table with purchase data.
CREATE TABLE myTable (
id INT NOT NULL AUTO_INCREMENT,
date DATETIME NOT NULL,
subNo SMALLINT NOT NULL,
poNo INT NOT NULL,
PRIMARY KEY (id))
INSERT INTO myTable VALUES (0, '2022-11-01 12:43', 1, 800), (0, '2022-11-02 13:00', 1, 800), (0, '2022-11-03 12:43', 2, 800), (0, '2022-11-03 14:00', 1, 923), (0, '2022-11-03 15:00', 2, 800), (0, '2022-11-04 12:43', 1, 800)
Id | Date | SubNo | PO# |
----|------------------|-------|-----|
100 | 2022-11-01 12:43 | 1 | 800 |
101 | 2022-11-02 13:00 | 1 | 800 |
102 | 2022-11-03 12:43 | 2 | 800 |
103 | 2022-11-03 14:00 | 1 | 923 |
104 | 2022-11-03 15:00 | 2 | 800 |
105 | 2022-11-04 12:43 | 1 | 800 |
SubNo is the ordinal number of a subset or partial quantity of the purchase (PO#). There can be more than 30 subsets to a purchase.
I am looking for a query supplying for a given purchase for each of its subsets the latest date.
For PO 800 it would look like this:
Id | Date | SubNo | PO# |
----|------------------|-------|-----|
105 | 2022-11-04 12:43 | 1 | 800 |
104 | 2022-11-03 15:00 | 2 | 800 |
I haven't found a way to filter the latest dates.
A rough approach is
SELECT id, date, subNo
FROM myTable
WHERE poNo=800
GROUP BY subNo
ORDER BY subNo, date DESC
but DISTINCT and GROUP BY do not guarantee to return the latest date.
Then I tried to create a VIEW first, to be used in a later query.
CREATE VIEW myView AS
SELECT subNo s, (SELECT MAX(date) FROM myTable WHERE poNo=800 AND subNo=s) AS dd
FROM myTable
WHERE poNo=800
GROUP BY s
But although the query is ok, the result differs when used for a VIEW, probably due to VIEW restrictions.
Finally I tried a joined table
SELECT id, date, subNo s
FROM myTable my JOIN (SELECT MAX(date) AS d FROM myTable WHERE poNo=800 AND subNo=s) tmp ON my.date=tmp.d
WHERE poNo=800
but I get the error "Unknown column 's' in where clause".
My MySql version is 8.0.22
You can check if (date, subno) corresponds to one of the pairs of ( MAX(date), subno) :
SELECT id, date, subno
FROM mytable
WHERE pono = 800 AND (date, subno) IN (
SELECT MAX(date), subno
FROM mytable
WHERE pono = 800
GROUP BY subno
)
GROUP BY subno;
My result in a clean table :
+----+---------------------+-------+
| id | date | subno |
+----+---------------------+-------+
| 6 | 2022-11-04 12:43:00 | 1 |
| 5 | 2022-11-03 15:00:00 | 2 |
+----+---------------------+-------+
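Depending on how you want to handle several rows sharing the maximum date for the same subno, you might want to remove the last GROUP BY subno. With it, only one of them is shown; without it, all the tied rows are shown. Note that with ONLY_FULL_GROUP_BY enabled (the default in MySQL 8.0) the outer GROUP BY subno is rejected, because id and date are neither grouped nor aggregated.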
We use row_number(), partition by SubNo and PO and order by Date Desc.
select Id
,Date
,SubNo
,PO
from
(
select *
,row_number() over(partition by SubNo, PO order by Date desc) as rn
from t
) t
where rn = 1
Id  | Date                | SubNo | PO  |
----|---------------------|-------|-----|
105 | 2022-11-04 12:43:00 | 1     | 800 |
103 | 2022-11-03 14:00:00 | 1     | 923 |
104 | 2022-11-03 15:00:00 | 2     | 800 |

SQL Query with all data from left column and fill blank with previous row value

After searching a lot on this forum and the web, I have an issue that I cannot solve without your help.
The requirement looks simple, but the code is not :-(
Basically I need to make a report on cumulative sales by product by week.
I have a table with the calendar (including all the weeks) and a view which gives me all the cumulative values by product, sorted by week. What I need the query to do is give me all the weeks for each product and then add a column with the cumulative values from the view. If this value does not exist, it should give me the last known record.
Can you help?
Thanks,
The principle is to establish all the weeks in which a product could have had sales, sum the sales grouped by week, add the missing weeks, and use SUM as a window function to get a cumulative sum.
DROP TABLE IF EXISTS T;
CREATE TABLE T
(PROD INT, DT DATE, AMOUNT INT);
INSERT INTO T VALUES
(1,'2022-01-01', 10),(1,'2022-01-01', 10),(1,'2022-01-20', 10),
(2,'2022-01-10', 10);
WITH CTE AS
(SELECT MIN(YEARWEEK(DT)) MINYW, MAX(YEARWEEK(DT)) MAXYW FROM T),
CTE1 AS
(SELECT DISTINCT YEARWEEK(DTE) YW ,PROD
FROM DATES
JOIN CTE ON YEARWEEK(DTE) BETWEEN MINYW AND MAXYW
CROSS JOIN (SELECT DISTINCT PROD FROM T) C
)
SELECT CTE1.YW,CTE1.PROD
,SUMAMT,
SUM(SUMAMT) OVER(PARTITION BY CTE1.PROD ORDER BY CTE1.YW) CUMSUM
FROM CTE1
LEFT JOIN
(SELECT YEARWEEK(DT) YW,PROD ,SUM(AMOUNT) SUMAMT
FROM T
GROUP BY YEARWEEK(DT),PROD
) S ON S.PROD = CTE1.PROD AND S.YW = CTE1.YW
ORDER BY CTE1.PROD,CTE1.YW
;
+--------+------+--------+--------+
| YW | PROD | SUMAMT | CUMSUM |
+--------+------+--------+--------+
| 202152 | 1 | 20 | 20 |
| 202201 | 1 | NULL | 20 |
| 202202 | 1 | NULL | 20 |
| 202203 | 1 | 10 | 30 |
| 202152 | 2 | NULL | NULL |
| 202201 | 2 | NULL | NULL |
| 202202 | 2 | 10 | 10 |
| 202203 | 2 | NULL | 10 |
+--------+------+--------+--------+
8 rows in set (0.021 sec)
Your calendar date may be slightly different to mine but you should get the general idea.
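The gaps fill in on their own because a running SUM treats a missing week as adding nothing, so the previous total carries forward, and it stays NULL until the product's first sale. A small Python sketch of that behaviour, using the weekly sums from the output above (the function name running_totals is hypothetical, and None plays the role of NULL):

```python
from typing import Dict, List, Optional


def running_totals(weeks: List[int], weekly_sum: Dict[int, int]) -> List[Optional[int]]:
    """Cumulative sum per week; None until the first week that has sales."""
    totals: List[Optional[int]] = []
    current: Optional[int] = None
    for week in weeks:
        amount = weekly_sum.get(week)  # None when the product had no sales that week
        if amount is not None:
            current = amount if current is None else current + amount
        totals.append(current)  # a week without sales repeats the last known total
    return totals


# Every calendar week in range, and the per-product weekly sums (SUMAMT above).
weeks = [202152, 202201, 202202, 202203]
sales = {
    1: {202152: 20, 202203: 10},
    2: {202202: 10},
}

result = {prod: running_totals(weeks, sums) for prod, sums in sales.items()}
for prod, totals in result.items():
    print(prod, totals)
```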

Get return for the latest day

I am running a MariaDB database (10.1.39-MariaDB, mariadb.org binary distribution).
I have the following table:
| id | date | product_name | close |
|----|---------------------|--------------|-------|
| 1 | 2019-08-07 00:00:00 | Product 1 | 806 |
| 2 | 2019-08-06 00:00:00 | Product 1 | 982 |
| 3 | 2019-08-05 00:00:00 | Product 1 | 64 |
| 4 | 2019-08-07 00:00:00 | Product 2 | 874 |
| 5 | 2019-08-06 00:00:00 | Product 2 | 739 |
| 6 | 2019-08-05 00:00:00 | Product 2 | 555 |
| 7 | 2019-08-07 00:00:00 | Product 3 | 762 |
| 8 | 2019-08-06 00:00:00 | Product 3 | 955 |
| 9 | 2019-08-05 00:00:00 | Product 3 | 573 |
I want to get the following output:
| id | date | product_name | close | daily_return |
|----|---------------------|--------------|-------|--------------|
| 4 | 2019-08-07 00:00:00 | Product 2 | 874 | 0,182679296 |
| 1 | 2019-08-07 00:00:00 | Product 1 | 806 | -0,179226069 |
Basically I want to get the TOP 2 products with the highest return, where the return is calculated as (close_currentDay - close_previousDay)/close_previousDay for each product.
I tried the following:
SELECT
*,
(
CLOSE -(
SELECT
(t2.close)
FROM
prices t2
WHERE
t2.date < t1.date
ORDER BY
t2.date
DESC
LIMIT 1
)
) /(
SELECT
(t2.close)
FROM
prices t2
WHERE
t2.date < t1.date
ORDER BY
t2.date
DESC
LIMIT 1
) AS daily_return
FROM
prices t1
WHERE DATE >= DATE(NOW()) - INTERVAL 1 DAY
Which gives me the return for each product_name.
How to get the last product_name and sort this by the highest daily_return?
Problem Statement: Find the top 2 products with the highest returns on the latest date i.e. max date in the table.
Solution:
If you have an index on the date field, it would be super fast.
It scans the table only once and also uses a date filter (the index would allow MySQL to process only the rows in the given date range).
A user-defined variable @old_close is used to find the return. Note that here we need the data sorted by product and date.
SELECT *
FROM (
SELECT
prices.*,
CAST((`close` - @old_close) / @old_close AS DECIMAL(20, 10)) AS daily_return, -- Uses @old_close, which still holds the previous row's close; the next column sets it to the current close value.
@old_close:= `close` -- Set @old_close to the close value of this row, so it can be used in the next row
FROM prices
INNER JOIN (
SELECT
DATE(MAX(`date`)) - INTERVAL 1 DAY AS date_from, -- if you're not sure whether you have date before latest date or not, can keep date before 1/2/3 day.
@old_close:= 0 as o_c
FROM prices
) AS t ON prices.date >= t.date_from
ORDER BY product_name, `date` ASC
) AS tt
ORDER BY `date` DESC, daily_return DESC
LIMIT 2;
Another version which doesn't depend on this date parameter.
SELECT *
FROM (
SELECT
prices.*,
CAST((`close` - @old_close) / @old_close AS DECIMAL(20, 10)) AS daily_return, -- Uses @old_close, which still holds the previous row's close; the next column sets it to the current close value.
@old_close:= `close` -- Set @old_close to the close value of this row, so it can be used in the next row
FROM prices,
(SELECT @old_close:= 0 as o_c) AS t
ORDER BY product_name, `date` ASC
) AS tt
ORDER BY `date` DESC, daily_return DESC
LIMIT 2
You can do it with a self join:
select
p.*,
cast((p.close - pp.close) / pp.close as decimal(20, 10)) as daily_return
from prices p left join prices pp
on p.product_name = pp.product_name
and pp.date = date_add(p.date, interval -1 day)
order by p.date desc, daily_return desc, p.product_name
limit 2
Results:
| id | date | product_name | close | daily_return |
| --- | ------------------- | ------------ | ----- | ------------ |
| 4 | 2019-08-07 00:00:00 | Product 2 | 874 | 0.182679296 |
| 1 | 2019-08-07 00:00:00 | Product 1 | 806 | -0.179226069 |
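The figures can be checked by hand with the formula from the question, (close_currentDay - close_previousDay)/close_previousDay. Here is that arithmetic in Python for the two most recent days of each product in the sample table (a sketch only; the dictionary layout and the function name daily_return are assumptions for the example):

```python
# Closing prices for the latest day and the day before, taken from the sample table.
closes = {
    "Product 1": {"2019-08-06": 982, "2019-08-07": 806},
    "Product 2": {"2019-08-06": 739, "2019-08-07": 874},
    "Product 3": {"2019-08-06": 955, "2019-08-07": 762},
}


def daily_return(current: float, previous: float) -> float:
    """Relative change from the previous close to the current close."""
    return (current - previous) / previous


returns = {
    name: daily_return(c["2019-08-07"], c["2019-08-06"])
    for name, c in closes.items()
}

# Highest return first, keep the top two.
top2 = sorted(returns.items(), key=lambda item: item[1], reverse=True)[:2]
for name, value in top2:
    print(name, round(value, 9))
```

Product 3 has the lowest return of the three, which is why it drops out of the top two.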

Calculate balance per each category group

I have table my_table which contains groups of categories, each category has initial budget (original_budget):
I am trying to add a new column balance so it contains the balance after reducing expense from the original_budget in each category group. Something like:
my try:
SELECT category, expense, original_budget, (original_budget-expense) AS balance
FROM my_table GROUP BY category order by `trans_date`
MySQL version: innodb_version 5.7.25
10.2.23-MariaDB
If you are using MySQL 8+, then it is fairly straightforward to use SUM here as a window function:
SELECT
trans_date,
category,
expense,
original_budget,
original_budget - SUM(expense) OVER
(PARTITION BY category
ORDER BY trans_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) balance
FROM my_table
ORDER BY
category,
trans_date;
On earlier versions of MySQL, we can try to compute the rolling balance using a correlated subquery:
SELECT
trans_date,
category,
expense,
original_budget,
original_budget - (SELECT SUM(t2.expense) FROM my_table t2
WHERE t1.category = t2.category AND
t2.trans_date <= t1.trans_date) balance
FROM my_table t1
ORDER BY
category,
trans_date;
For All MySQL versions:
You can use a MySQL user-defined variable to keep the running balance for a category. For this, keep the records of each category together, sorted by date.
SELECT
category,
expense,
original_budget,
IF(@cat <> category, @budg:= original_budget - expense, @budg:= @budg - expense) AS balance,
@cat:= category -- Set category to current value so we can compare it in next iteration
FROM my_table,
(SELECT @cat:= '' AS c, @budg:= NULL AS b) AS t
ORDER BY category, `trans_date`;
Output:
| category | expense | original_budget | balance | @cat:= category |
| A | 10 | 100 | 90 | A |
| A | 2 | 100 | 88 | A |
| A | 1 | 100 | 87 | A |
| B | 12 | 300 | 288 | B |
| B | 1 | 300 | 287 | B |
| B | 1 | 300 | 286 | B |
| B | 1 | 300 | 285 | B |
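To try the SUM(expense) OVER (...) query from the first answer without a MySQL 8 server, the sketch below runs it from Python against an in-memory SQLite database. This is a stand-in, not the asker's setup: SQLite accepts the same window syntax, the trans_date values are invented because the question's table is not reproduced here, and the expense and budget figures are taken from the output above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL 8.0+ (assumption for a runnable demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    "trans_date TEXT, category TEXT, expense INTEGER, original_budget INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        # trans_date values are hypothetical; only their order matters.
        ("2019-01-01", "A", 10, 100),
        ("2019-01-02", "A", 2, 100),
        ("2019-01-03", "A", 1, 100),
        ("2019-01-01", "B", 12, 300),
        ("2019-01-02", "B", 1, 300),
        ("2019-01-03", "B", 1, 300),
        ("2019-01-04", "B", 1, 300),
    ],
)

query = """
SELECT category, expense, original_budget,
       original_budget - SUM(expense) OVER (
           PARTITION BY category
           ORDER BY trans_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS balance
FROM my_table
ORDER BY category, trans_date
"""
balances = [row[3] for row in conn.execute(query)]
print(balances)
```

The balances come out the same as in the variable-based output, without depending on the order in which the server evaluates user variables.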

How to get the average time between multiple dates

What I'm trying to do is bucket my customers based on their transaction frequency. I have the date recorded for every time they transact, but I can't work out how to get the average delta between consecutive dates. What I effectively want is a table showing me:
| User | Average Frequency
| 1 | 15
| 2 | 15
| 3 | 35
...
The data I currently have is formatted like this:
| User | Transaction Date
| 1 | 2018-01-01
| 1 | 2018-01-15
| 1 | 2018-02-01
| 2 | 2018-06-01
| 2 | 2018-06-18
| 2 | 2018-07-01
| 3 | 2019-01-01
| 3 | 2019-02-05
...
So basically, each customer will have multiple transactions and I want to understand how to get the delta between each date and then average of the deltas.
I know the datediff function and how it works, but I can't work out how to split the transactions up. I also know that the offset function is available in tools like Looker, but I don't know the syntax behind it.
Thanks
In MySQL 8+ you can use LAG to get a delayed Transaction Date and then use DATEDIFF to get the difference between two consecutive dates. You can then take the average of those values:
SELECT User, AVG(delta) AS `Average Frequency`
FROM (SELECT User,
DATEDIFF(`Transaction Date`, LAG(`Transaction Date`) OVER (PARTITION BY User ORDER BY `Transaction Date`)) AS delta
FROM transactions) t
GROUP BY User
Output:
User Average Frequency
1 15.5
2 15
3 35
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(user INT NOT NULL
,transaction_date DATE
,PRIMARY KEY(user,transaction_date)
);
INSERT INTO my_table VALUES
(1,'2018-01-01'),
(1,'2018-01-15'),
(1,'2018-02-01'),
(2,'2018-06-01'),
(2,'2018-06-18'),
(2,'2018-07-01'),
(3,'2019-01-01'),
(3,'2019-02-05');
SELECT user
, AVG(delta) avg_delta
FROM
( SELECT x.*
, DATEDIFF(x.transaction_date,MAX(y.transaction_date)) delta
FROM my_table x
JOIN my_table y
ON y.user = x.user
AND y.transaction_date < x.transaction_date
GROUP
BY x.user
, x.transaction_date
) a
GROUP
BY user;
+------+-----------+
| user | avg_delta |
+------+-----------+
| 1 | 15.5000 |
| 2 | 15.0000 |
| 3 | 35.0000 |
+------+-----------+
I don't know what to say other than use a GROUP BY.
SELECT User, AVG(DATEDIFF(...))
FROM ...
GROUP BY User
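Because the gaps between consecutive dates telescope, their average equals (last date - first date) / (number of transactions - 1), which in MySQL could be written per user as DATEDIFF(MAX(...), MIN(...)) / (COUNT(*) - 1) with a plain GROUP BY. A Python sketch checking that both calculations agree on the sample data (the function and variable names are illustrative only):

```python
from datetime import date
from typing import List

transactions = {
    1: [date(2018, 1, 1), date(2018, 1, 15), date(2018, 2, 1)],
    2: [date(2018, 6, 1), date(2018, 6, 18), date(2018, 7, 1)],
    3: [date(2019, 1, 1), date(2019, 2, 5)],
}


def avg_gap_pairwise(dates: List[date]) -> float:
    """Average of the day gaps between consecutive dates (needs at least 2 dates)."""
    ordered = sorted(dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def avg_gap_telescoped(dates: List[date]) -> float:
    """Same value from the first and last date only (needs at least 2 dates)."""
    return (max(dates) - min(dates)).days / (len(dates) - 1)


averages = {user: avg_gap_pairwise(ds) for user, ds in transactions.items()}
shortcut = {user: avg_gap_telescoped(ds) for user, ds in transactions.items()}
for user in transactions:
    print(user, averages[user], shortcut[user])
```

A user with a single transaction has no gap at all, so in SQL the divisor would need guarding, for example with NULLIF(COUNT(*) - 1, 0).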