How to GROUP BY in groups of N rows? - mysql

I have a table called PEOPLE with the following data:
MYNAME AGE MYDATE
==========================
MARIO 20 2015/02/03
MARIA 10 2015/02/02
PEDRO 40 2015/02/01
JUAN 15 2015/01/03
PEPE 20 2015/01/02
JULIA 30 2015/01/01
JUANI 50 2014/02/03
MARTIN 10 2014/02/03
NASH 21 2014/02/03
I want to get the average age, grouping the people in groups of 3 ordered by MYDATE descending.
I mean, the result that I'm looking for would be something like:
23,3
21,6
27
Where 23,3 is the average of the age of:
MARIO 20 2015/02/03
MARIA 10 2015/02/02
PEDRO 40 2015/02/01
And 21,6 is the average of the age of:
JUAN 15 2015/01/03
PEPE 20 2015/01/02
JULIA 30 2015/01/01
And 27 is the average of the age of:
JUANI 50 2014/02/03
MARTIN 10 2014/02/03
NASH 21 2014/02/03
How could I handle this? I know how to use GROUP BY but only to group for a particular field of the table.

SQL tables are inherently unordered, so I assume that you want to order by mydate descending. You can enumerate the rows using variables, use arithmetic to define the groups, and then get the average:
select avg(age)
from (select t.*, (@rn := @rn + 1) as seqnum
      from people t cross join
           (select @rn := 0) vars
      order by mydate desc
     ) t
group by floor((seqnum - 1) / 3);
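On engines with window functions (MySQL 8.0+, for instance), ROW_NUMBER() can stand in for the user variable. The following is only a sketch under assumptions, not part of the answer above: it loads the question's rows into an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25+ is assumed for window-function support), and the table name people and the ISO date strings are choices made for the example.

```python
import sqlite3

# Rows from the question; dates rewritten as ISO strings so they sort as text.
rows = [
    ("MARIO", 20, "2015-02-03"), ("MARIA", 10, "2015-02-02"), ("PEDRO", 40, "2015-02-01"),
    ("JUAN", 15, "2015-01-03"), ("PEPE", 20, "2015-01-02"), ("JULIA", 30, "2015-01-01"),
    ("JUANI", 50, "2014-02-03"), ("MARTIN", 10, "2014-02-03"), ("NASH", 21, "2014-02-03"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (myname TEXT, age INTEGER, mydate TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)

# seqnum runs 1..9 in descending date order; (seqnum - 1) / 3 is integer
# division in SQLite, so rows 1-3 fall in group 0, rows 4-6 in group 1, etc.
query = """
SELECT (seqnum - 1) / 3 AS grp, AVG(age) AS avg_age
FROM (
    SELECT age, ROW_NUMBER() OVER (ORDER BY mydate DESC) AS seqnum
    FROM people
) AS numbered
GROUP BY grp
ORDER BY grp
"""
averages = [round(avg_age, 2) for _grp, avg_age in conn.execute(query)]
print(averages)  # [23.33, 21.67, 27.0]
```

In MySQL the / operator returns a decimal, so the grouping expression needs FLOOR((seqnum - 1) / 3), as in the answer above, or the integer-division operator DIV.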

Try the following (it uses SQL Server-style syntax with a temporary table, #x). You can then also use Grp to see which three people each average relates to:
select MYNAME, AGE, MYDATE, RN / 3 as Grp into #x
from
(select MYNAME, AGE, MYDATE, ROW_NUMBER() over(order by MYDATE desc) + 2 as RN
from PEOPLE) x;
select Grp, AVG(AGE) as AvgAge
from #x
group by Grp;

Get original RANK() value based on row create date

Using MariaDB and trying to see if I can get pull original rankings for each row of a table based on the create date.
For example, imagine a scores table that has different scores for different users and categories (lower score is better in this case)
id  leaderboardId  userId  score  submittedAt ↓        rankAtSubmit
--  -------------  ------  -----  -------------------  ------------
9   15             555     50.5   2022-01-20 01:00:00  2
8   15             999     58.0   2022-01-19 01:00:00  3
7   15             999     59.1   2022-01-15 01:00:00  3
6   15             123     49.0   2022-01-12 01:00:00  1
5   15             222     51.0   2022-01-10 01:00:00  1
4   14             222     87.0   2022-01-09 01:00:00  1
5   15             555     51.0   2022-01-04 01:00:00  1
The "rankAtSubmit" column is what I'm trying to generate here if possible.
I want to take the best/smallest score of each user+leaderboard and determine what the rank of that score was when it was submitted.
My attempt at this failed because in MySQL you cannot reference outer level columns more than 1 level deep in a subquery resulting in an error trying to reference t.submittedAt in the following query:
SELECT *, (
    SELECT ranking FROM (
        SELECT id, RANK() OVER (PARTITION BY leaderboardId ORDER BY score ASC) ranking
        FROM scores x
        WHERE x.submittedAt <= t.submittedAt
        GROUP BY userId, leaderboardId
    ) ranks
    WHERE ranks.id = t.id
) rankAtSubmit
FROM scores t
Instead of using RANK(), I was able to accomplish this with a single subquery that counts the number of users that have a lower score submitted no later than the given score.
SELECT id, userId, score, leaderboardId, submittedAt,
(
SELECT COUNT(DISTINCT userId) + 1
FROM scores t2
WHERE t2.userId = t.userId AND
t2.leaderboardId = t.leaderboardId AND
t2.score < t.score AND
t2.submittedAt <= t.submittedAt
) AS rankAtSubmit
FROM scores t
As I understand your question, you want to know the minimum and maximum rank of each user.
Here is the code:
SELECT userId, leaderboardId, score, min(rankAtSubmit), max(rankAtSubmit)
FROM scores
GROUP BY userId,
         leaderboardId,
         score;

How to compute a cumulative sum in MySQL with GROUP BY and an interval?

Table Sales
date id_product total_sales
2018-10-01 1 40
2019-09-01 1 20
2019-11-01 1 5
2019-12-01 1 40
2020-01-01 1 10
2020-02-01 1 15
2020-03-01 1 20
2020-08-01 1 10
2021-01-01 1 5
2021-02-01 1 8
2021-04-01 1 12
Table Product
id name
1 Book
2 Pen
How can I query MySQL to get a sequential number and the cumulative total of sales over an interval of 3 years, with results like this?
number name date sum_sales cummulative_sales
1 Book 2018 40 40
2 Book 2019 65 105
3 Book 2020 55 160
4 Book 2021 25 145
Assuming you are running MySQL 8+, you may try using SUM as an analytic function:
SELECT
ROW_NUMBER() OVER (PARTITION BY p.name ORDER BY YEAR(s.date)) number,
p.name,
YEAR(s.date) AS date,
SUM(s.total_sales) AS sum_sales,
SUM(SUM(s.total_sales)) OVER (PARTITION BY p.name ORDER BY YEAR(s.date)) AS cummulative_sales
FROM Sales s
INNER JOIN Product p
ON p.id = s.id_product
GROUP BY
p.name,
YEAR(s.date)
ORDER BY
p.name,
YEAR(s.date);
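The expected output in the question (145 for 2021, which equals 65 + 55 + 25) looks like a rolling total over the current year and the two before it, not an unbounded running total. A window frame can express that. What follows is a minimal sketch under assumptions: it runs in SQLite through Python's sqlite3 module (SQLite 3.25+ assumed), uses strftime in place of MySQL's YEAR(), names the output column cumulative_sales, and assumes every year has at least one row, since ROWS counts rows rather than calendar years. MySQL 8.0+ accepts the same frame clause, ROWS BETWEEN 2 PRECEDING AND CURRENT ROW.

```python
import sqlite3

sales = [
    ("2018-10-01", 1, 40), ("2019-09-01", 1, 20), ("2019-11-01", 1, 5),
    ("2019-12-01", 1, 40), ("2020-01-01", 1, 10), ("2020-02-01", 1, 15),
    ("2020-03-01", 1, 20), ("2020-08-01", 1, 10), ("2021-01-01", 1, 5),
    ("2021-02-01", 1, 8), ("2021-04-01", 1, 12),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (date TEXT, id_product INTEGER, total_sales INTEGER)")
conn.execute("CREATE TABLE product (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales)
conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, "Book"), (2, "Pen")])

# Step 1 (yearly): one row per product and year with that year's total.
# Step 2: sum each year's total with the two preceding rows of the same product.
query = """
WITH yearly AS (
    SELECT p.name AS name,
           CAST(strftime('%Y', s.date) AS INTEGER) AS yr,
           SUM(s.total_sales) AS sum_sales
    FROM sales s
    JOIN product p ON p.id = s.id_product
    GROUP BY p.name, yr
)
SELECT ROW_NUMBER() OVER (PARTITION BY name ORDER BY yr) AS number,
       name, yr, sum_sales,
       SUM(sum_sales) OVER (
           PARTITION BY name ORDER BY yr
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS cumulative_sales
FROM yearly
ORDER BY name, yr
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With the question's data this yields 40, 105, 160 and 145 in the last column, matching the requested output.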
Using MySQL 5.7, this will give you a sum and a cumulative sum of sales in a window of 3 years.
SELECT (@row := @row + 1) AS number, name, YEAR(date) AS date,
  SUM(total_sales) AS sum_sales,
  IF(
    @window > 0,
    @cummulative_sales := @cummulative_sales + SUM(total_sales),
    @cummulative_sales := SUM(total_sales)
  ) AS cummulative_sales,
  @window := @window + 1,
  @window := @window % 3
FROM Sales
JOIN Product ON Sales.id_product = Product.id,
  (
    SELECT @row := 0, @cummulative_sales := 0, @window := 0
  ) AS a
GROUP BY name, YEAR(date);

Grouping by to find average differences for specific indexes in SQL

I have the following table:
person_index score year
3 76 2003
3 86 2004
3 86 2005
3 87 2006
4 55 2005
4 91 2006
I want to group by person_index, getting the average score difference between consecutive years, such that I end up with one row per person, indicating the average increase/decrease:
person_index avg(score_diff)
3 3.67
4 36
So for person with index 3 - there were changes over 3 years, one was 10pt, one was 0, and one was 1pt. Therefore, their average score_diff is 3.67.
EDIT: to clarify, scores can also decrease. And years aren't necessarily consecutive (one person might not get a score at a certain year, so could be 2013 followed by 2015).
The simplest way is to use LAG (MySQL 8.0+):
WITH cte AS (
SELECT *, score - LAG(score) OVER(PARTITION BY person_index ORDER BY year) AS diff
FROM tab
)
SELECT person_index, AVG(diff) AS avg_diff
FROM cte
GROUP BY person_index;
Output:
+---------------+----------+
| person_index | avg_diff |
+---------------+----------+
| 3 | 3.6667 |
| 4 | 36.0000 |
+---------------+----------+
If the scores only increase -- as in your example -- you can simply do:
select person_index,
       -- n rows give n - 1 consecutive differences, which sum to max - min here
       ( max(score) - min(score) ) / nullif(count(*) - 1, 0)
from t
group by person_index;
If they do not only increase, it is a bit trickier because you have to calculate the first and last scores:
select p.person_index,
       (tmax.score - tmin.score) / nullif(p.cnt - 1, 0)
from (select person_index, min(year) as miny, max(year) as maxy, count(*) as cnt
      from t
      group by person_index
     ) p join
     t tmin
     on tmin.person_index = p.person_index and tmin.year = p.miny join
     t tmax
     on tmax.person_index = p.person_index and tmax.year = p.maxy;
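The reason only the first and last scores matter is that consecutive differences telescope: their sum is the last score minus the first, so their average is that value divided by the number of differences (row count minus one). A small Python check against the question's data, offered as an illustration only; the helper name avg_diff is hypothetical:

```python
from collections import defaultdict

# (person_index, score, year) rows from the question
rows = [(3, 76, 2003), (3, 86, 2004), (3, 86, 2005), (3, 87, 2006),
        (4, 55, 2005), (4, 91, 2006)]

def avg_diff(scores_by_year):
    """Average of consecutive score differences, ordered by year."""
    ordered = [score for _year, score in sorted(scores_by_year)]
    if len(ordered) < 2:
        return None  # no differences to average (plays the role of NULLIF(..., 0))
    diffs = [b - a for a, b in zip(ordered, ordered[1:])]
    shortcut = (ordered[-1] - ordered[0]) / (len(ordered) - 1)
    # The telescoping shortcut must agree with the explicit average.
    assert abs(sum(diffs) / len(diffs) - shortcut) < 1e-9
    return shortcut

by_person = defaultdict(list)
for person, score, year in rows:
    by_person[person].append((year, score))

result = {person: round(avg_diff(vals), 2) for person, vals in by_person.items()}
print(result)  # {3: 3.67, 4: 36.0}
```

Because the divisor is the number of rows minus one, gaps between years (2013 followed by 2015, say) do not affect the result.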

Second Last records

I am trying to get the second-to-last record for each merchant using MySQL.
I did some research, but the samples I found assume a fixed gap between numbers or dates. In my situation the contract_id is not always +1 from the previous one. Any ideas? Thank you so much.
merchant_id contract_id start_date
10 501 2016-05-01
10 506 2016-06-01
13 456 2015-12-01
13 462 2016-01-01
14 620 2016-06-01
14 642 2016-07-01
14 656 2016-07-05
merchant_id Second_last_contract_id
10 501
13 456
14 642
contract_id != previous contract_id + X (X is not fixed).
start_date tells us the order in which the contracts were created.
Here's one option using user-defined variables to establish a row number per group of merchants and then filtering on the 2nd in each group ordered by contracts:
select *
from (
  select *,
    @rn := if(@prevMerchantId = merchant_id,
              @rn + 1,
              if(@prevMerchantId := merchant_id, 1, 1)
             ) as rn
  from yourtable cross join (select @rn := 0, @prevMerchantId := null) t
  order by merchant_id, contract_id desc
) t
where rn = 2
Here's another option, filtering the results of GROUP_CONCAT() using SUBSTRING_INDEX():
SELECT merchant_id,
SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(contract_id ORDER BY start_date DESC),
',', 2), ',', -1) AS Second_last_contract_id
FROM the_table
GROUP BY merchant_id
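Where window functions are available (MySQL 8.0+), a third option is to number each merchant's contracts by start_date descending and keep row 2. This is a sketch under assumptions rather than one of the answers above: it runs against an in-memory SQLite database via Python's sqlite3 module (SQLite 3.25+ assumed) and reuses the table name the_table from the previous query.

```python
import sqlite3

contracts = [
    (10, 501, "2016-05-01"), (10, 506, "2016-06-01"),
    (13, 456, "2015-12-01"), (13, 462, "2016-01-01"),
    (14, 620, "2016-06-01"), (14, 642, "2016-07-01"), (14, 656, "2016-07-05"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (merchant_id INTEGER, contract_id INTEGER, start_date TEXT)"
)
conn.executemany("INSERT INTO the_table VALUES (?, ?, ?)", contracts)

# rn = 1 is each merchant's newest contract, rn = 2 the one before it.
query = """
SELECT merchant_id, contract_id AS second_last_contract_id
FROM (
    SELECT merchant_id, contract_id,
           ROW_NUMBER() OVER (
               PARTITION BY merchant_id ORDER BY start_date DESC
           ) AS rn
    FROM the_table
) AS ranked
WHERE rn = 2
ORDER BY merchant_id
"""
second_last = conn.execute(query).fetchall()
print(second_last)  # [(10, 501), (13, 456), (14, 642)]
```

Merchants with only one contract simply drop out of the result, because they have no row with rn = 2.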

mysql group by query with average calculation

id originator revenue date
-- ---------- ------- ----------
1 acme 1 2013-09-15
2 acme 0 2013-09-15
3 acme 4 2013-09-14
4 acme 6 2013-09-13
5 acme -6 2013-09-13
6 hello 1 2013-09-15
7 hello 0 2013-09-14
8 hello 2 2013-09-13
9 hello 5 2013-09-14
I have the table above.
1) I would like to add a ranking column based on the total revenue each originator generated over the last 3 days.
The fields should be displayed as below:
originator revenue toprank
---------- ------- -------
hello 8 1
acme 5 2
2) Based on the above data, I would like to calculate the average revenue using the following criterion:
If the total revenue for a given date is 0 (zero), that date should not be counted when calculating the average.
a) The avg value for originator acme should be sum of revenue / count(number of dates where the revenue is non-zero), so (4+1)/2, i.e. 2.5
b) The avg value for originator hello should be sum of revenue / count(number of dates where the revenue is non-zero), so (5+2+1)/3, i.e. 2.6666
originator revenue toprank avg(3 days)
---------- ------- ------- -----------
hello 8 1 2.6666
acme 5 2 2.5
To ignore a row when averaging, give AVG a null value. The NULLIF function is good for this.
The ranking is problematic in MySQL before 8.0, which lacks the analytic (window) functions that make this a bit easier to do in Oracle, SQL Server, Teradata, etc. The most common workaround is to use a counter variable, and that requires an ordered set of rows, which means the total revenue must be calculated in an inner query.
SELECT originator, TotalRev, Avg3Days, @rnk := @rnk + 1 AS TopRank
FROM (
    SELECT
        originator,
        SUM(revenue) AS TotalRev,
        AVG(NULLIF(revenue, 0)) AS Avg3Days
    FROM myTable
    GROUP BY originator
    ORDER BY TotalRev DESC
) Tots, (SELECT @rnk := 0) Ranks
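Note that AVG(NULLIF(revenue, 0)) skips individual rows whose revenue is zero, whereas the question asks to skip dates whose total is zero (acme's 6 and -6 on 2013-09-13 cancel out). One way to get there is to sum per originator and date first and apply NULLIF to the daily totals. The sketch below is an illustration under assumptions: it runs in SQLite via Python's sqlite3 module (SQLite 3.25+ assumed), uses RANK() instead of a counter variable, and the CTE names daily and totals are made up for the example.

```python
import sqlite3

rows = [
    (1, "acme", 1, "2013-09-15"), (2, "acme", 0, "2013-09-15"),
    (3, "acme", 4, "2013-09-14"), (4, "acme", 6, "2013-09-13"),
    (5, "acme", -6, "2013-09-13"), (6, "hello", 1, "2013-09-15"),
    (7, "hello", 0, "2013-09-14"), (8, "hello", 2, "2013-09-13"),
    (9, "hello", 5, "2013-09-14"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (id INTEGER, originator TEXT, revenue INTEGER, date TEXT)"
)
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?)", rows)

# daily: one total per originator and date.
# totals: overall revenue plus the average of the non-zero daily totals
#         (NULLIF turns a zero total into NULL, which AVG ignores).
query = """
WITH daily AS (
    SELECT originator, date, SUM(revenue) AS day_total
    FROM myTable
    GROUP BY originator, date
),
totals AS (
    SELECT originator,
           SUM(day_total) AS total_rev,
           AVG(NULLIF(day_total, 0)) AS avg_3_days
    FROM daily
    GROUP BY originator
)
SELECT originator, total_rev, avg_3_days,
       RANK() OVER (ORDER BY total_rev DESC) AS toprank
FROM totals
ORDER BY toprank
"""
result = [(o, total, round(avg, 4), rnk) for o, total, avg, rnk in conn.execute(query)]
print(result)  # [('hello', 8, 2.6667, 1), ('acme', 5, 2.5, 2)]
```

A date filter for the last 3 days, as in the queries that follow, would go in the WHERE clause of the daily step.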
If you want to get the values for the last 3 days from today's date, try something like this:
SET @rank=0;
select originator, rev, @rank:=@rank+1 AS toprank, avg
FROM
(select originator, sum(revenue) as rev,
AVG(NULLIF(revenue, 0)) as avg
FROM t1
WHERE date >= DATE_ADD(CURDATE(), INTERVAL -3 DAY)
group by originator
order by 2 desc) as t2;
EDITED:
If you want to get the values for the last 3 days from the nearest date, try this:
SET @rank=0;
select originator, rev, @rank:=@rank+1 AS toprank, avg
from
(select originator, sum(revenue) as rev,
AVG(NULLIF(revenue, 0)) as avg
from t1
WHERE date >= DATE_ADD((select max(date) from t1), INTERVAL -3 DAY)
group by originator
order by 2 desc) as t2;