Interpolate Multiseries Data In SQL - mysql

I have a system that stores data only when it changes, so the dataset looks like this:
data_type_id  data_value  inserted_at
2             240         2022-01-19 17:20:52
1             30          2022-01-19 17:20:47
2             239         2022-01-19 17:20:42
1             29          2022-01-19 17:20:42
My data frequency is 5 seconds. For every 5-second moment, whether or not a row exists for that timestamp, I need a value; when no row exists, the value should be assumed to be the same as the previous one.
Since I only store changes, the complete dataset should actually look like this:
data_type_id  data_value  inserted_at
2             240         2022-01-19 17:20:52
1             30          2022-01-19 17:20:52
2             239         2022-01-19 17:20:47
1             30          2022-01-19 17:20:47
2             239         2022-01-19 17:20:42
1             29          2022-01-19 17:20:42
I don't want to insert these rows into my table; I just want a SELECT statement that returns the data like this.
Is there any way I can write this query?
P.S. I have many data types, so a typical query returns around a million rows.
EDIT:
Server version: 10.3.27-MariaDB-0+deb10u1 Debian 10
The user chooses the datetime range for the SELECT, so there is no fixed start and end time.
As @Akina mentioned, there are sometimes gaps between inserted_at values: the difference might be ~4 or ~6 seconds instead of exactly 5. Since this doesn't happen often, it's fine to ignore it when generating the result.

First build every combination of data_type_id and the 5-second moments you need; then a correlated subquery picks, for each combination, the most recent data_value at or before that moment:
with recursive u as
(
-- every 5-second moment in the requested range
select '2022-01-19 17:20:42' as d
union all
select DATE_ADD(d, interval 5 second) from u
where d < '2022-01-19 17:20:52'),
v as
(
-- every (moment, data_type_id) combination
select * from u cross join (select distinct data_type_id from table_name) t)
select v.data_type_id,
-- last stored value at or before this moment (carried forward)
(select data_value from table_name where inserted_at <= d and data_type_id = v.data_type_id
order by inserted_at desc limit 1) as data_value,
d as inserted_at
from v
You can replace the recursive CTE with any query that gets you all the 5-second moments you need.
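To check the carry-forward logic without a MariaDB server, here is a sketch that runs the same pattern (recursive CTE for the 5-second moments, cross join with the distinct data types, correlated subquery for the latest value) in SQLite through Python's built-in sqlite3 module. It assumes SQLite's dialect, so datetime(d, '+5 seconds') stands in for DATE_ADD; table_name and the sample rows are taken from the question.

```python
import sqlite3

# In-memory SQLite database holding the question's sample rows.
# "table_name" is the placeholder table name used in the answer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (data_type_id INTEGER, data_value INTEGER, inserted_at TEXT)"
)
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        (2, 240, "2022-01-19 17:20:52"),
        (1, 30, "2022-01-19 17:20:47"),
        (2, 239, "2022-01-19 17:20:42"),
        (1, 29, "2022-01-19 17:20:42"),
    ],
)

QUERY = """
WITH RECURSIVE u(d) AS (
    SELECT '2022-01-19 17:20:42'
    UNION ALL
    -- SQLite equivalent of DATE_ADD(d, INTERVAL 5 SECOND)
    SELECT datetime(d, '+5 seconds') FROM u
    WHERE d < '2022-01-19 17:20:52'
),
v AS (
    -- every (moment, data_type_id) combination
    SELECT * FROM u CROSS JOIN (SELECT DISTINCT data_type_id FROM table_name) AS types
)
SELECT v.data_type_id,
       -- latest stored value at or before this moment (carried forward)
       (SELECT t.data_value FROM table_name AS t
         WHERE t.inserted_at <= v.d AND t.data_type_id = v.data_type_id
         ORDER BY t.inserted_at DESC
         LIMIT 1) AS data_value,
       v.d AS inserted_at
FROM v
ORDER BY v.d DESC, v.data_type_id DESC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The rows printed match the "complete dataset" table in the question. Timestamps are stored as ISO-8601 text here, which is why plain string comparison orders them correctly.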

WITH RECURSIVE
cte1 AS ( -- every 5-second moment between the two user-supplied datetimes
SELECT @start_datetime dt
UNION ALL
SELECT dt + INTERVAL 5 SECOND FROM cte1 WHERE dt < @end_datetime),
cte2 AS ( -- per data type and moment, number the rows at or before that moment, newest first
SELECT *,
ROW_NUMBER() OVER (PARTITION BY test.data_type_id, cte1.dt
ORDER BY test.inserted_at DESC) rn
FROM cte1
LEFT JOIN test ON FIND_IN_SET(test.data_type_id, @data_type_ids)
AND cte1.dt >= test.inserted_at )
SELECT *
FROM cte2
WHERE rn = 1
https://dbfiddle.uk/?rdbms=mariadb_10.3&fiddle=380ad334de0c980a0ddf1b49bb6fa38e

Related

MySql Statement History with user balance algorithm

I have a table with payment history
payments:
id  consumer_id  amount  created_at
1   1            30      2021-05-11 13:01:36
2   1            -10     2021-05-12 14:01:36
3   1            -2.50   2021-05-13 13:01:36
4   1            -4.50   2021-05-14 13:01:36
5   1            20      2021-05-15 13:01:36
In the final result, I need the consumer's balance after each transaction, something like this:
id  consumer_id  amount  created_at           balance
1   1            30      2021-05-11 13:01:36  30.00
2   1            -10     2021-05-12 14:01:36  20.00
3   1            -2.50   2021-05-13 13:01:36  17.50
4   1            -4.50   2021-05-14 13:01:36  13.00
5   1            20      2021-05-15 13:01:36  33.00
I'm using this query:
SET @balanceTotal = 0;
select amount, created_at, consumer_id, @balanceTotal := @balanceTotal + amount as balance
from payments
where consumer_id = 1
This works fine until I try to add some sorting or pagination.
Any suggestions on how to write a query with ORDER BY ... DESC, LIMIT, and OFFSET that still computes the balance correctly?
That's just a window sum. In MySQL 8.0:
select p.*,
sum(amount) over(partition by consumer_id order by created_at) balance
from payments p
You can add filtering on the consumer in the WHERE clause if you like (in which case the PARTITION BY clause is no longer really needed).
In earlier versions of MySQL, an alternative uses a correlated subquery:
select p.*,
(
select sum(amount)
from payments p1
where p1.consumer_id = p.consumer_id and p1.created_at <= p.created_at
) balance
from payments p
I would not recommend user variables for this; although efficient, their behavior is quite tricky, and their use is deprecated in recent versions.
If you are using MySQL 8.0 or later, a window sum is preferable:
select p.*, sum(amount) over(order by created_at) balance
from payments p
where consumer_id = 1
order by created_at desc
limit 0, 5;
If you are using a MySQL version earlier than 8.0, a user variable is significantly more efficient than the suggested correlated subquery. You can wrap it in a derived table for re-ordering and pagination:
select * from (
select p.*, @balanceTotal := @balanceTotal + amount as balance
from payments p, (SELECT @balanceTotal := 0) vars
where consumer_id = 1
order by created_at
) t
order by created_at desc
limit 0, 5;
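All of these answers rely on the same rule: compute the running total in chronological order first, and only then re-sort and paginate. Here is a minimal Python sketch of that rule using the question's sample payments; the function name balance_page is hypothetical, and Decimal is used to mirror exact DECIMAL arithmetic.

```python
from decimal import Decimal
from itertools import accumulate

# (id, consumer_id, amount, created_at) rows from the question.
payments = [
    (1, 1, Decimal("30"), "2021-05-11 13:01:36"),
    (2, 1, Decimal("-10"), "2021-05-12 14:01:36"),
    (3, 1, Decimal("-2.50"), "2021-05-13 13:01:36"),
    (4, 1, Decimal("-4.50"), "2021-05-14 13:01:36"),
    (5, 1, Decimal("20"), "2021-05-15 13:01:36"),
]


def balance_page(rows, consumer_id, limit, offset):
    """Return one page, newest first, with the balance after each payment."""
    # 1. Filter to one consumer and sort chronologically (ISO text sorts correctly).
    mine = sorted((r for r in rows if r[1] == consumer_id), key=lambda r: r[3])
    # 2. Running total in chronological order, like SUM(...) OVER (ORDER BY created_at).
    balances = accumulate(r[2] for r in mine)
    with_balance = [r + (b,) for r, b in zip(mine, balances)]
    # 3. Only now re-sort newest first and apply LIMIT/OFFSET.
    with_balance.sort(key=lambda r: r[3], reverse=True)
    return with_balance[offset:offset + limit]


for row in balance_page(payments, consumer_id=1, limit=2, offset=0):
    print(row)
```

Because the balance is attached before re-sorting, every page shows the same balance values as the full chronological listing.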

Get original RANK() value based on row create date

I'm using MariaDB and trying to see whether I can pull the original ranking of each row of a table, based on its creation date.
For example, imagine a scores table that holds different scores for different users and categories (a lower score is better in this case):
id  leaderboardId  userId  score  submittedAt ↓        rankAtSubmit
9   15             555     50.5   2022-01-20 01:00:00  2
8   15             999     58.0   2022-01-19 01:00:00  3
7   15             999     59.1   2022-01-15 01:00:00  3
6   15             123     49.0   2022-01-12 01:00:00  1
5   15             222     51.0   2022-01-10 01:00:00  1
4   14             222     87.0   2022-01-09 01:00:00  1
5   15             555     51.0   2022-01-04 01:00:00  1
The "rankAtSubmit" column is what I'm trying to generate here if possible.
I want to take the best/smallest score of each user+leaderboard and determine what the rank of that score was when it was submitted.
My attempt failed because in MySQL a subquery cannot reference outer-level columns more than one level deep, so the following query raises an error when it references t.submittedAt:
SELECT *, (
SELECT ranking FROM (
SELECT id, RANK() OVER (PARTITION BY leaderboardId ORDER BY score ASC) ranking
FROM scores x
WHERE x.submittedAt <= t.submittedAt
GROUP BY userId, leaderboardId
) ranks
WHERE ranks.id = t.id
) rankAtSubmit
FROM scores t
Instead of using RANK(), I was able to accomplish this with a single subquery that counts the number of other users who have a lower score submitted at or before the given score.
SELECT id, userId, score, leaderboardId, submittedAt,
(
-- other users on the same leaderboard with a lower score, submitted no later than this one
SELECT COUNT(DISTINCT userId) + 1
FROM scores t2
WHERE t2.userId <> t.userId AND
t2.leaderboardId = t.leaderboardId AND
t2.score < t.score AND
t2.submittedAt <= t.submittedAt
) AS rankAtSubmit
FROM scores t
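The counting idea can be sketched in Python to see what it produces on the example data: rank = 1 + the number of distinct other users on the same leaderboard with a strictly lower score submitted at or before this row. The rows are the question's table without the id column; the function name rank_at_submit is hypothetical.

```python
from datetime import datetime

# (leaderboardId, userId, score, submittedAt) in the order of the question's table.
scores = [
    (15, 555, 50.5, datetime(2022, 1, 20, 1)),
    (15, 999, 58.0, datetime(2022, 1, 19, 1)),
    (15, 999, 59.1, datetime(2022, 1, 15, 1)),
    (15, 123, 49.0, datetime(2022, 1, 12, 1)),
    (15, 222, 51.0, datetime(2022, 1, 10, 1)),
    (14, 222, 87.0, datetime(2022, 1, 9, 1)),
    (15, 555, 51.0, datetime(2022, 1, 4, 1)),
]


def rank_at_submit(rows, row):
    """1 + number of distinct other users with a lower score submitted no later."""
    leaderboard, user, score, submitted = row
    better_users = {
        other_user
        for (lb, other_user, other_score, other_time) in rows
        if lb == leaderboard
        and other_user != user
        and other_score < score
        and other_time <= submitted
    }
    return len(better_users) + 1


ranks = [rank_at_submit(scores, r) for r in scores]
print(ranks)
```

On this data the counting behaves like RANK(): users 555 and 222 tied at 51.0 both count, so user 999's rows get 4. The question's rankAtSubmit column shows 3 for those rows, which appears to follow DENSE_RANK-style numbering (counting distinct lower scores rather than users).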
From your question, I understand that you want the minimum and maximum rank of each user.
Here is the code:
SELECT userId, leaderboardId, score, min(rankAtSubmit),max(rankAtSubmit)
FROM scores
group BY userId,
leaderboardId,
score

Find MySql concurrent user per hour in a week

I want to find the number of concurrent users per hour for a given week.
I am trying to calculate the number of concurrent users in a time range. The input looks something like this:
Table
id user_id login_time
1 23 2016-06-08 09:10:00
2 24 2016-06-08 08:55:00
3 25 2016-06-08 09:29:00
4 26 2016-06-08 09:40:00
5 27 2016-06-08 09:08:00
6 28 2016-06-09 13:40:00
7 31 2016-06-09 14:04:00
How can I get the concurrent users in a time range?
Expected output table:
Date        Hour  User
2014-08-04  0     3
2014-08-04  1     2
2014-08-04  2     0
2014-08-05  0     1
Similar question: concurrent users sql
I created a DBFIDDLE.
At first I entered the data from your question; halfway through, I switched to the data given here: http://sqlfiddle.com/#!9/67356f/2
cte1 contains the first and last dates from users.
cte2 contains all the dates between StartDate and EndDate.
cte3 contains all 24 hours for each date.
After that, it is just a matter of counting the logins that fall in each hour.
WITH RECURSIVE cte1 AS (
-- first and last login dates
SELECT
DATE(MIN(login_time)) StartDate,
DATE(MAX(login_time)) EndDate
FROM users),
cte2 AS (
-- every date between StartDate and EndDate
SELECT cte1.StartDate
from cte1
union all
select DATE_ADD(cte2.StartDate, INTERVAL 1 DAY)
from cte2
cross join cte1 where cte2.StartDate < cte1.EndDate
),
cte3 AS (
-- hours 0..23 for every date
SELECT StartDate, 0 as H
FROM cte2
UNION ALL
SELECT StartDate, H+1 FROM cte3 WHERE H < 23
)
select * from (
select
StartDate as `Date`,
H as `hour`,
-- half-open interval [H:00, H+1:00), so a login exactly on the hour is counted once
(SELECT count(*) from users
WHERE login_time >= DATE_ADD(StartDate, interval H HOUR)
AND login_time < DATE_ADD(StartDate, interval (H+1) HOUR)
) as `Count`
from cte3) x
where x.`Count` <> 0
order by 1,2;
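The per-hour bucketing can be sketched in Python with the sample logins from the question. Each login falls into exactly one (date, hour) bucket covering [H:00, H+1:00), and, like the query's final filter, empty hours are simply absent. One deliberate difference, labeled here as an assumption: this sketch counts distinct users per hour (the SQL equivalent would be COUNT(DISTINCT user_id)), whereas the query counts login rows. The function name users_per_hour is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# (user_id, login_time) rows from the question's input table.
logins = [
    (23, "2016-06-08 09:10:00"),
    (24, "2016-06-08 08:55:00"),
    (25, "2016-06-08 09:29:00"),
    (26, "2016-06-08 09:40:00"),
    (27, "2016-06-08 09:08:00"),
    (28, "2016-06-09 13:40:00"),
    (31, "2016-06-09 14:04:00"),
]


def users_per_hour(rows):
    """Map (date, hour) -> number of distinct users who logged in during that hour."""
    buckets = defaultdict(set)
    for user_id, ts in rows:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        # A login belongs to exactly one half-open bucket [H:00, H+1:00).
        buckets[(t.date().isoformat(), t.hour)].add(user_id)
    return {key: len(users) for key, users in sorted(buckets.items())}


for (day, hour), count in users_per_hour(logins).items():
    print(day, hour, count)
```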
You can start with this, but in my opinion the result you are trying to get doesn't make much sense, because you need to take session time into account:
If a user logs in at 9:30, leaves at 9:35, and logs in again at 9:45, that is not a concurrent user, but this SQL counts it as one.
If a user logs in at 9:59 and again at 10:01, that is a concurrent user, but you won't see it with this per-hour logic.
The same goes for concurrent logins on different days (logins at 23:59 and 00:01).
In any case, the SQL you are asking for:
SELECT
up.user_id,
up.diff as TimeDiff
FROM
(
SELECT TIMESTAMPDIFF(HOUR,u1.login,u2.login) as diff, u1.user_id FROM users u1
JOIN users u2
ON u1.user_id = u2.user_id
AND u1.login < u2.login ) up
WHERE up.diff < 1
And without DIFF time (as you requested):
SELECT
g.id,
g.hour,
g.datelogin,
COUNT(*) as times
FROM
(SELECT HOUR(login) as hour, DATE(login) as datelogin, id FROM users) g
GROUP BY datelogin, hour, id
HAVING COUNT(*) > 1 -- show only counts greater than 1
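What the first query of this answer (the self-join with TIMESTAMPDIFF) returns can be sketched in Python. TIMESTAMPDIFF(HOUR, a, b) truncates to whole hours, so "diff < 1" keeps pairs of logins by the same user whose gap is under 60 minutes. The login data below is hypothetical (not from the question), and the function name is also hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import combinations

# Hypothetical logins (user_id, login time); not taken from the question.
logins = [
    (23, datetime(2016, 6, 8, 9, 10)),
    (23, datetime(2016, 6, 8, 9, 50)),
    (23, datetime(2016, 6, 8, 11, 0)),
    (24, datetime(2016, 6, 8, 8, 55)),
]


def relogins_within_hour(rows):
    """Pairs (user_id, earlier, later) of logins by one user less than an hour apart."""
    by_user = defaultdict(list)
    for user_id, ts in rows:
        by_user[user_id].append(ts)
    pairs = []
    for user_id, times in by_user.items():
        for a, b in combinations(sorted(times), 2):
            # Mirrors the join condition u1.login < u2.login plus a truncated-hour diff of 0.
            if a < b and b - a < timedelta(hours=1):
                pairs.append((user_id, a, b))
    return pairs


print(relogins_within_hour(logins))
```

Only the 09:10/09:50 pair qualifies; 09:50 to 11:00 is 70 minutes, which TIMESTAMPDIFF reports as 1 hour.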

MySQL : Selecting the rows with the highest group by count

I have a table whose records are updated every minute with a DECIMAL(10,2) value. To ignore measurement errors, I want the value that has been inserted most often.
Therefore I tried:
SELECT date_time,max(sensor1),count(ID)
FROM `weigh_data`
group by day(date_time),sensor1
This gives me the number of records for each value:
Datetime sensor1 count(ID)
2020-03-19 11:49:12 33.22 3
2020-03-19 11:37:47 33.36 10
2020-03-20 07:32:02 32.54 489
2020-03-20 00:00:43 32.56 891
2020-03-20 14:20:51 32.67 5
2020-03-21 07:54:16 32.50 1
2020-03-21 00:00:58 32.54 1373
2020-03-21 01:15:16 32.56 9
2020-03-22 08:35:12 32.52 2
2020-03-22 00:00:40 32.54 575
2020-03-22 06:50:54 32.58 1
What I actually want is one row per day: the one with the highest count(ID).
Can anyone help me with this?
With newer MySQL (8.0 and later) you can use the RANK window function to rank the rows according to the count.
Note that this will return all "ties" which means if there are 100 readings of X and 100 readings of Y (and 100 is the max), both X and Y will be returned.
WITH cte AS (
SELECT
DATE(date_time), sensor1,
RANK() OVER (PARTITION BY DATE(date_time) ORDER BY COUNT(*) DESC) rnk
FROM `weigh_data` GROUP BY DATE(date_time), sensor1
)
SELECT * FROM cte WHERE rnk=1
If you just want to pick one of the ties (non-deterministically), use ROW_NUMBER instead of RANK.
A DBfiddle to test with.
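The "rank within each day, keep rank 1" step can be sketched in Python starting from the (day, value, count) rows that the GROUP BY produces. The input below is the question's grouped output (with full dates as the day key); the function name is hypothetical. Like RANK() ... WHERE rnk = 1, it keeps every value tied at the day's top count.

```python
from collections import defaultdict
from decimal import Decimal

# (day, sensor1, count) as in the question's grouped output.
counts = [
    ("2020-03-19", Decimal("33.22"), 3),
    ("2020-03-19", Decimal("33.36"), 10),
    ("2020-03-20", Decimal("32.54"), 489),
    ("2020-03-20", Decimal("32.56"), 891),
    ("2020-03-20", Decimal("32.67"), 5),
    ("2020-03-21", Decimal("32.50"), 1),
    ("2020-03-21", Decimal("32.54"), 1373),
    ("2020-03-21", Decimal("32.56"), 9),
    ("2020-03-22", Decimal("32.52"), 2),
    ("2020-03-22", Decimal("32.54"), 575),
    ("2020-03-22", Decimal("32.58"), 1),
]


def most_frequent_per_day(rows):
    """Map day -> list of values whose count equals that day's maximum count."""
    by_day = defaultdict(list)
    for day, value, n in rows:
        by_day[day].append((value, n))
    result = {}
    for day, items in sorted(by_day.items()):
        top = max(n for _, n in items)
        # Keep all ties, like filtering RANK() = 1 (ROW_NUMBER would keep just one).
        result[day] = [value for value, n in items if n == top]
    return result


print(most_frequent_per_day(counts))
```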
Here is a solution based on a correlated subquery, that works in all versions of MySQL:
select w.*
from weigh_data w
where w.date_time = (
select w1.date_time
from weigh_data w1
where w1.date_time >= date(w.date_time) and w1.date_time < date(w.date_time) + interval 1 day
order by sensor1 desc
limit 1
)
Just like the window function solution using rank(), this allows top ties.
For performance, you want an index on (date_time, sensor1).

Get the last 2 rows of a table while grouping one of the column. MySQL

Consider Facebook. Facebook displays the latest 2 comments of any status. I want to do something similar.
I have a table with columns such as status_id, comment_id, comment, and timestamp.
Now I want to fetch the latest 2 comments for each status_id.
Currently I first do a GROUP_CONCAT of all columns, grouped by status_id, and then take SUBSTRING_INDEX with -2.
This fetches the latest 2 comments; however, building the GROUP_CONCAT of all the records for a status_id is overhead.
SELECT SUBSTRING_INDEX(GROUP_CONCAT('~', comment_id,
'~', comment,
'~', timestamp
ORDER BY timestamp
SEPARATOR '|~|'),
'|~|', -2)
FROM commenttable
GROUP BY status_id;
Can you help me with a better approach?
My table looks like this:
status_id comment_id comment timestamp
1 1 xyz1 3 hour
1 2 xyz2 2 hour
1 3 xyz3 1 hour
2 4 xyz4 2 hour
2 6 xyz6 1 hour
3 5 xyz5 1 hour
So I want this output:
1 2 xyz2 2 hour
1 3 xyz3 1 hour
2 4 xyz4 2 hour
2 6 xyz6 1 hour
3 5 xyz5 1 hour
Here is a great answer I came across here:
select status_id, comment_id, comment, timestamp
from commenttable
where (
-- number of newer comments on the same status
select count(*) from commenttable as f
where f.status_id = commenttable.status_id
and f.timestamp > commenttable.timestamp
) < 2;
This is not very efficient (O(n^2)) but it's a lot more efficient than concatenating strings and using substrings to isolate your desired result. Some would say that reverting to string operations instead of native database indexing robs you of the benefits of using a database in the first place.
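The correlated-count idea can be sketched in Python: a comment is kept when fewer than two comments on the same status are newer than it. The rows come from the question's sample, with the "N hour" column read as hours ago (so a smaller number means a newer comment); the function name is hypothetical. The nested loop also makes the O(n^2) cost visible.

```python
# (status_id, comment_id, comment, hours_ago) from the question's sample table.
comments = [
    (1, 1, "xyz1", 3),
    (1, 2, "xyz2", 2),
    (1, 3, "xyz3", 1),
    (2, 4, "xyz4", 2),
    (2, 6, "xyz6", 1),
    (3, 5, "xyz5", 1),
]


def latest_n_per_status(rows, n=2):
    """Keep rows that have fewer than n newer rows with the same status_id."""
    kept = []
    for row in rows:
        status_id, _, _, hours_ago = row
        # Count newer comments on the same status (smaller hours_ago = newer).
        newer = sum(1 for other in rows if other[0] == status_id and other[3] < hours_ago)
        if newer < n:
            kept.append(row)
    return kept


for row in latest_n_per_status(comments):
    print(row)
```

The kept rows are comments 2, 3, 4, 6 and 5, which matches the desired output in the question.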
After some struggle I found this solution.
The following gives me a row number for each comment within its status:
SELECT a.status_id,
a.comments_id,
COUNT(*) AS row_num
FROM comments a
JOIN comments b
ON a.status_id = b.status_id AND a.comments_id >= b.comments_id
GROUP BY a.status_id , a.comments_id
ORDER BY row_num DESC
This gives me the total number of rows per status:
SELECT com.status_id, COUNT(*) total
FROM comments com
GROUP BY com.status_id
Then, in the WHERE clause of the main SELECT:
row_num = total OR row_num = total - 1
This gives the latest 2 rows. You can modify the where clause to fetch more than 2 latest rows.