MySQL : Selecting the rows with the highest group by count - mysql

I have a table with records that are updated every minute with a decimal value (10,2). To ignore measurement errors, I want the value that has been inserted most often.
Therefore I tried:
SELECT date_time,max(sensor1),count(ID)
FROM `weigh_data`
group by day(date_time),sensor1
This way I get the number of records
Datetime sensor1 count(ID)
2020-03-19 11:49:12 33.22 3
2020-03-19 11:37:47 33.36 10
2020-03-20 07:32:02 32.54 489
2020-03-20 00:00:43 32.56 891
2020-03-20 14:20:51 32.67 5
2020-03-21 07:54:16 32.50 1
2020-03-21 00:00:58 32.54 1373
2020-03-21 01:15:16 32.56 9
2020-03-22 08:35:12 32.52 2
2020-03-22 00:00:40 32.54 575
2020-03-22 06:50:54 32.58 1
What I actually want is one row for each day: the one with the highest count(ID).
Can anyone help me out with this?

With newer MySQL (8.0 and later) you can use the RANK window function to rank the rows according to the count.
Note that this will return all "ties" which means if there are 100 readings of X and 100 readings of Y (and 100 is the max), both X and Y will be returned.
WITH cte AS (
SELECT
DATE(date_time), sensor1,
RANK() OVER (PARTITION BY DATE(date_time) ORDER BY COUNT(*) DESC) rnk
FROM `weigh_data` GROUP BY DATE(date_time), sensor1
)
SELECT * FROM cte WHERE rnk=1
If you just want to pick one of the ties (non-deterministically), you can use ROW_NUMBER in place of RANK.
A DBfiddle to test with.
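A sketch of the ROW_NUMBER variant mentioned above, for MySQL 8.0 and later. The secondary sort on sensor1 is an assumption added here so that ties resolve the same way on every run (the lowest value wins); the aliases reading_date, cnt and rn are illustrative names, not part of the original answer.

```sql
WITH cte AS (
    SELECT
        DATE(date_time) AS reading_date,
        sensor1,
        COUNT(*) AS cnt,
        -- Exactly one row per day gets rn = 1, even when counts tie.
        ROW_NUMBER() OVER (
            PARTITION BY DATE(date_time)
            ORDER BY COUNT(*) DESC, sensor1   -- assumed tie-breaker: lowest value first
        ) AS rn
    FROM weigh_data
    GROUP BY DATE(date_time), sensor1
)
SELECT reading_date, sensor1, cnt
FROM cte
WHERE rn = 1;
```

Dropping the tie-breaker brings back the non-deterministic choice described above.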

Here is a solution based on a correlated subquery, that works in all versions of MySQL:
select w.*
from weigh_data w
where w.date_time = (
select w1.date_time
from weigh_data w1
where w1.date_time >= date(w.date_time) and w1.date_time < date(w.date_time) + interval 1 day
order by w1.sensor1 desc
limit 1
)
Just like the window function solution using rank(), this allows top ties.
For performance, you want an index on (date_time, sensor1).
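Note that the query above orders by sensor1, so it selects the row(s) with the highest reading of each day rather than the most frequent one. If the goal is the most frequent value per day on a server without window functions, one possible approach is to aggregate once per day and value, then join against the per-day maximum count. This is only a sketch; the aliases reading_date, cnt, max_cnt, c, per_value and m are illustrative.

```sql
SELECT c.reading_date, c.sensor1, c.cnt
FROM (
    -- How often each value occurs on each day
    SELECT DATE(date_time) AS reading_date, sensor1, COUNT(*) AS cnt
    FROM weigh_data
    GROUP BY DATE(date_time), sensor1
) AS c
JOIN (
    -- The highest count reached on each day
    SELECT reading_date, MAX(cnt) AS max_cnt
    FROM (
        SELECT DATE(date_time) AS reading_date, sensor1, COUNT(*) AS cnt
        FROM weigh_data
        GROUP BY DATE(date_time), sensor1
    ) AS per_value
    GROUP BY reading_date
) AS m
  ON m.reading_date = c.reading_date
 AND m.max_cnt = c.cnt
ORDER BY c.reading_date;
```

Like the RANK() version, this returns every value that ties for the highest count on a given day.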

Related

MySQL join and group based on date ranges

I have table A
uid dt val_A
10 04/09/2012 34
10 08/09/2012 35
10 10/09/2012 36
100 04/09/2012 40
100 08/09/2012 41
and table B
uid date val_B
10 04/09/2012 1
10 05/09/2012 1
10 06/09/2012 2
10 07/09/2012 2
10 08/09/2012 1
100 07/09/2012 1
100 07/09/2012 3
I want to join them to get table C. I want to join them on uid. Furthermore, I want a new column val_C that holds the average of val_B where date in B is greater than or equal to the corresponding dt value in A AND less than the next higher dt value for this uid in table A. In other words, I want to aggregate the values in B based on date ranges defined in A. The joined table should look like this:
uid dt val_A val_C
10 04/09/2012 34 1.5
10 08/09/2012 35 1
10 10/09/2012 36 0
100 04/09/2012 40 2
100 08/09/2012 41 0
How can this be achieved?
//EDIT
What would a more general solution look like, where all dates in B2 that are greater than the greatest date in A are joined and aggregated to the greatest date in A? B2:
uid date val_B
10 04/09/2012 1
10 05/09/2012 1
10 06/09/2012 2
10 07/09/2012 2
10 08/09/2012 1
100 07/09/2012 1
100 07/09/2012 3
100 10/09/2012 4
100 11/09/2012 2
Desired output C2:
uid dt val_A val_C
10 04/09/2012 34 1.5
10 08/09/2012 35 1
10 10/09/2012 36 0
100 04/09/2012 40 2
100 08/09/2012 41 3
If you're on MySQL v8+ that supports LEAD() function, then you can try this:
WITH cte AS (
SELECT uid, dt, val_A,
IFNULL(LEAD(dt) OVER (PARTITION BY uid ORDER BY uid, dt),dt) dtRg
FROM tableA)
SELECT cte.uid, cte.dt, cte.val_A,
AVG(val_B) AS val_C
FROM cte
LEFT JOIN tableB tb1
ON cte.uid=tb1.uid
AND tb1.dt >= cte.dt
AND tb1.dt < cte.dtRg
GROUP BY cte.uid, cte.dt, cte.val_A
The query in common table expression (cte):
SELECT uid, dt, val_A,
IFNULL(LEAD(dt) OVER (PARTITION BY uid ORDER BY uid, dt),dt) dtRg
FROM tableA
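will give you a result like this for the sample tableA:
uid dt val_A dtRg
10 04/09/2012 34 08/09/2012
10 08/09/2012 35 10/09/2012
10 10/09/2012 36 10/09/2012
100 04/09/2012 40 08/09/2012
100 08/09/2012 41 08/09/2012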
As you can see, the dtRg column is generated using LEAD() function which takes the next row dt value according to the ORDER BY. Read more about LEAD() here.
After that, join the cte with tableB on matching uid, where tableB.dt is greater than or equal to cte.dt (the original tableA.dt) and lower than cte.dtRg (the next date in tableA, generated by LEAD()). Finally, add AVG(val_B) AS val_C.
Demo fiddle
On older MySQL versions, you can try this:
SELECT tA.uid, tA.dt, tA.val_A,
AVG(val_B) AS val_C
FROM
(SELECT uid, dt, val_A,
(SELECT dt FROM tableA ta1
WHERE ta1.uid=ta2.uid
AND ta1.dt > ta2.dt
ORDER BY ta1.dt LIMIT 1) AS dtRg
FROM tableA ta2) tA
LEFT JOIN tableB tB
ON tA.uid=tB.uid
AND tB.dt >= tA.dt
AND tB.dt < tA.dtRg
GROUP BY tA.uid, tA.dt, tA.val_A;
The differences are as follows:
Instead of using LEAD(), it uses a correlated subquery in SELECT to get the dt value of the next row for the same uid.
Instead of common table expression, it uses a derived table.
Fiddle for MySQL v5.7 version
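The follow-up in the question's edit (rows in B dated after the greatest dt in A should roll up into that last row) is not covered by the queries above. A possible sketch for MySQL 8+, assuming dt is stored as a DATE in both tables and that tableB's date column is named dt as in the queries above: leave dtRg as NULL on the last row per uid and treat NULL as an open-ended range. COALESCE turns the empty ranges into 0, as in the desired output.

```sql
WITH cte AS (
    SELECT uid, dt, val_A,
           -- NULL on the last row of each uid: no upper bound for that range
           LEAD(dt) OVER (PARTITION BY uid ORDER BY dt) AS dtRg
    FROM tableA
)
SELECT cte.uid, cte.dt, cte.val_A,
       COALESCE(AVG(tb1.val_B), 0) AS val_C
FROM cte
LEFT JOIN tableB tb1
       ON tb1.uid = cte.uid
      AND tb1.dt >= cte.dt
      AND (cte.dtRg IS NULL OR tb1.dt < cte.dtRg)
GROUP BY cte.uid, cte.dt, cte.val_A
ORDER BY cte.uid, cte.dt;
```

Against the B2 sample, the last row for uid 100 would pick up the values 4 and 2 and average them to 3, while the last row for uid 10 has no matches and falls back to 0.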

MySql Statement History with user balance algorithm

I have a table with payment history
payments:
id consumer_id amount created_at
1 1 30 2021-05-11 13:01:36
2 1 -10 2021-05-12 14:01:36
3 1 -2.50 2021-05-13 13:01:36
4 1 -4.50 2021-05-14 13:01:36
5 1 20 2021-05-15 13:01:36
In the final result I need the consumer balance after each transaction.
So something like this:
id consumer_id amount created_at balance
1 1 30 2021-05-11 13:01:36 30.00
2 1 -10 2021-05-12 14:01:36 20.00
3 1 -2.50 2021-05-13 13:01:36 17.50
4 1 -4.50 2021-05-14 13:01:36 13.00
5 1 20 2021-05-15 13:01:36 33.00
I am using this query:
SET @balanceTotal = 0;
select amount, created_at, consumer_id, @balanceTotal := @balanceTotal + amount as balance
from payments
where consumer_id = 1
This works fine until I try to add some sorting or pagination.
Any suggestion on how to write a query with order by desc, limit, and offset to count balance properly?
That's just a window sum. In MySQL 8.0:
select p.*,
sum(amount) over(partition by consumer_id order by created_at) balance
from payments p
You can add the filtering on the customer in the where clause if you like (in which case the partition by clause is not really needed anymore).
In earlier versions of MySQL, an alternative uses a correlated subquery:
select p.*,
(
select sum(amount)
from payments p1
where p1.consumer_id = p.consumer_id and p1.created_at <= p.created_at
) balance
from payments p
I would not recommend user variables for this; although efficient, their behavior is quite tricky, and their use is deprecated in recent versions.
If using MySQL >= 8 using a window sum is preferable -
select p.*, sum(amount) over(order by created_at) balance
from payments p
where consumer_id = 1
order by created_at desc
limit 0, 5;
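One detail worth checking with the window sum: when ORDER BY is given without an explicit frame, the default frame is RANGE-based, so two payments with the same created_at would share one balance value. A hedged variant for MySQL 8+, assuming id can serve as a tie-breaker for payments made at the same moment:

```sql
SELECT p.*,
       SUM(amount) OVER (
           ORDER BY created_at, id
           -- ROWS frame: accumulate strictly row by row, even for equal timestamps
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS balance
FROM payments p
WHERE consumer_id = 1
ORDER BY created_at DESC, id DESC
LIMIT 5 OFFSET 0;
```

The window function is evaluated before the outer ORDER BY and LIMIT, so each page still shows balances computed over the consumer's full history.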
If you are using MySQL < 8 then using a user variable for this is significantly more efficient than using the suggested correlated subquery. You can have it as a derived table for re-ordering and pagination -
select * from (
select p.*, @balanceTotal := @balanceTotal + amount as balance
from payments p, (SELECT @balanceTotal := 0) vars
where consumer_id = 1
order by created_at
) t
order by created_at desc
limit 0, 5;

Get original RANK() value based on row create date

I am using MariaDB and trying to see if I can pull the original ranking for each row of a table based on the create date.
For example, imagine a scores table that has different scores for different users and categories (lower score is better in this case)
id leaderboardId userId score submittedAt ↓ rankAtSubmit
9 15 555 50.5 2022-01-20 01:00:00 2
8 15 999 58.0 2022-01-19 01:00:00 3
7 15 999 59.1 2022-01-15 01:00:00 3
6 15 123 49.0 2022-01-12 01:00:00 1
5 15 222 51.0 2022-01-10 01:00:00 1
4 14 222 87.0 2022-01-09 01:00:00 1
5 15 555 51.0 2022-01-04 01:00:00 1
The "rankAtSubmit" column is what I'm trying to generate here if possible.
I want to take the best/smallest score of each user+leaderboard and determine what the rank of that score was when it was submitted.
My attempt at this failed because in MySQL you cannot reference outer level columns more than 1 level deep in a subquery resulting in an error trying to reference t.submittedAt in the following query:
SELECT *, (
SELECT ranking FROM (
SELECT id, RANK() OVER (PARTITION BY leaderboardId ORDER BY score ASC) ranking
FROM scores x
WHERE x.submittedAt <= t.submittedAt
GROUP BY userId, leaderboardId
) ranks
WHERE ranks.id = t.id
) rankAtSubmit
FROM scores t
Instead of using RANK(), I was able to accomplish this with a single subquery that counts the number of users who have a lower score that was submitted before the given score.
SELECT id, userId, score, leaderboardId, submittedAt,
(
SELECT COUNT(DISTINCT userId) + 1
FROM scores t2
WHERE t2.userId <> t.userId AND
t2.leaderboardId = t.leaderboardId AND
t2.score < t.score AND
t2.submittedAt <= t.submittedAt
) AS rankAtSubmit
FROM scores t
What I understand from your question is you want to know the minimum and maximum rank of each user.
Here is the code
SELECT userId, leaderboardId, score, min(rankAtSubmit),max(rankAtSubmit)
FROM scores
group BY userId,
leaderboardId,
score

Interpolate Multiseries Data In SQL

I have a system that stores the data only when they are changed. So, the dataset looks like below.
data_type_id data_value inserted_at
2 240 2022-01-19 17:20:52
1 30 2022-01-19 17:20:47
2 239 2022-01-19 17:20:42
1 29 2022-01-19 17:20:42
My data frequency is every 5 seconds. So whether or not a row exists for a given 5-second timestamp, I need a result that assumes the value at that moment is the same as the previous value.
Since I only store data that has changed, the full dataset should really look like this:
data_type_id data_value inserted_at
2 240 2022-01-19 17:20:52
1 30 2022-01-19 17:20:52
2 239 2022-01-19 17:20:47
1 30 2022-01-19 17:20:47
2 239 2022-01-19 17:20:42
1 29 2022-01-19 17:20:42
I don't want to insert into my table, I just want to retrieve the data like this on the SELECT statement.
Is there any way I can create this query?
PS. I have many data_types hence when the OP makes a query, it usually gets around a million rows.
EDIT:
Information about server Server version: 10.3.27-MariaDB-0+deb10u1 Debian 10
The user determines the date-time range of the SELECT, so there is no fixed time window.
As @Akina mentioned, there are sometimes gaps between the inserted_at values. The difference might be ~4 seconds or ~6 seconds instead of exactly 5 seconds. Since this does not happen frequently, it is okay to generate the result while ignoring this fact.
With the help of a query that gets you all the combinations of data_type_id and the 5-second moments you need, you can achieve the result you need using a subquery that gets you the closest data_value:
with recursive u as
(select '2022-01-19 17:20:42' as d
union all
select DATE_ADD(d, interval 5 second) from u
where d < '2022-01-19 17:20:52'),
v as
(select * from u cross join (select distinct data_type_id from table_name) t)
select v.data_type_id,
(select data_value from table_name where inserted_at <= d and data_type_id = v.data_type_id
order by inserted_at desc limit 1) as data_value,
d as inserted_at
from v
Fiddle
You can replace the recursive CTE with any query that gets you all the 5-second moments you need.
WITH RECURSIVE
cte1 AS ( SELECT @start_datetime dt
UNION ALL
SELECT dt + INTERVAL 5 SECOND FROM cte1 WHERE dt < @end_datetime),
cte2 AS ( SELECT *,
ROW_NUMBER() OVER (PARTITION BY test.data_type_id, cte1.dt
ORDER BY test.inserted_at DESC) rn
FROM cte1
LEFT JOIN test ON FIND_IN_SET(test.data_type_id, @data_type_ids)
AND cte1.dt >= test.inserted_at )
SELECT *
FROM cte2
WHERE rn = 1
https://dbfiddle.uk/?rdbms=mariadb_10.3&fiddle=380ad334de0c980a0ddf1b49bb6fa38e

Second Last records

I am trying to get the second-to-last record per merchant using MySQL.
I did some research, but the samples I found assume a fixed gap between numbers or dates. In my situation the contract_id is not always +1 from the previous one. Any ideas? Thank you so much.
merchant_id contract_id start_date
10 501 2016-05-01
10 506 2016-06-01
13 456 2015-12-01
13 462 2016-01-01
14 620 2016-06-01
14 642 2016-07-01
14 656 2016-07-05
merchant_id Second_last_contract_id
10 501
13 456
14 642
contract_id != previous contract_id + X. (The X is not fixed)
'start_date' tells us the order in which the contracts were created.
Here's one option using user-defined variables to establish a row number per group of merchants and then filtering on the 2nd in each group ordered by contracts:
select *
from (
select *,
@rn:=if(@prevMerchantId=merchant_id,
@rn+1,
if(@prevMerchantId:=merchant_id, 1, 1)
) as rn
from yourtable cross join (select @rn:=0, @prevMerchantId:=null) t
order by merchant_id, contract_id desc
) t
where rn = 2
SQL Fiddle Demo
Here's another option, filtering the results of GROUP_CONCAT() using SUBSTRING_INDEX():
SELECT merchant_id,
SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(contract_id ORDER BY start_date DESC),
',', 2), ',', -1) AS Second_last_contract_id
FROM the_table
GROUP BY merchant_id
See it on sqlfiddle.
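On MySQL 8.0+ (or a MariaDB version with window functions), the same idea can be written without user variables or string handling. This is a sketch under the assumption that start_date defines the contract order, as the question states; the_table is the placeholder table name from the answer above, and ranked and rn are illustrative aliases.

```sql
SELECT merchant_id, contract_id AS Second_last_contract_id
FROM (
    SELECT merchant_id, contract_id,
           -- rn = 1 is the newest contract per merchant, rn = 2 the one before it
           ROW_NUMBER() OVER (PARTITION BY merchant_id ORDER BY start_date DESC) AS rn
    FROM the_table
) AS ranked
WHERE rn = 2;
```

One difference in behavior: a merchant with only a single contract produces no row here, whereas the GROUP_CONCAT() version would return that single contract.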