How to group results by timestamp interval - MySQL

I have a table that holds timestamped messages from a sender to a receiver, and it looks like:
mes  timestamp                sender  receiver
---  -----------------------  ------  --------
10   2014-04-13 12:22:25.000  1       72
10   2014-04-13 12:22:25.000  1       91
10   2014-04-13 12:22:25.000  1       58
16   2014-02-20 20:09:06.000  3       35
16   2014-02-20 20:09:06.000  3       54
17   2014-03-05 14:55:28.000  1       65
18   2014-03-07 14:55:28.000  2       97
19   2014-03-09 14:55:28.000  2       97
My table holds 3 million rows like these. I am trying to group the results by time interval, counting the number of messages per month for each sender-receiver pair. Something like:
timestamp  sender  receiver  count
---------  ------  --------  -----
2014-04    1       72        1
2014-04    1       91        1
2014-04    1       58        1
2014-02    3       35        1
2014-02    3       54        1
2014-03    1       65        1
2014-03    2       97        2
I really have no clue how to write this clause in MySQL, so I apologize for not providing a snippet of my non-working code. Should I use something like a switch statement and specify each time interval manually, or does MySQL have something more specific for tasks like this?

This is a fairly straightforward GROUP BY on multiple columns:
SELECT DATE_FORMAT(tstamp, '%Y-%m') AS ym, sender, receiver, COUNT(*) AS cnt
FROM yourtable
GROUP BY ym, sender, receiver;
DATE_FORMAT(tstamp, '%Y-%m') zero-pads the month, so the buckets come out as 2014-04 rather than 2014-4, matching the expected output.
I'm using tstamp as the column name instead of "timestamp" to avoid clashing with the TIMESTAMP keyword.
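To sanity-check the expected counts outside MySQL, here is a minimal Python sketch that builds the same (year-month, sender, receiver) buckets over the sample rows from the question. It is only an illustration: monthly_counts is a hypothetical helper, and the bucket key is formatted as YYYY-MM to match the expected output. On the real 3-million-row table the grouping should of course stay in the database.

```python
from collections import Counter
from datetime import datetime

# Sample rows copied from the question: (mes, timestamp, sender, receiver).
ROWS = [
    (10, "2014-04-13 12:22:25.000", 1, 72),
    (10, "2014-04-13 12:22:25.000", 1, 91),
    (10, "2014-04-13 12:22:25.000", 1, 58),
    (16, "2014-02-20 20:09:06.000", 3, 35),
    (16, "2014-02-20 20:09:06.000", 3, 54),
    (17, "2014-03-05 14:55:28.000", 1, 65),
    (18, "2014-03-07 14:55:28.000", 2, 97),
    (19, "2014-03-09 14:55:28.000", 2, 97),
]


def monthly_counts(rows):
    """Count rows per (year-month, sender, receiver) bucket."""
    counts = Counter()
    for _mes, ts, sender, receiver in rows:
        # Zero-padded year-month key, e.g. '2014-04'.
        month = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f").strftime("%Y-%m")
        counts[(month, sender, receiver)] += 1
    return counts


if __name__ == "__main__":
    for (month, sender, receiver), cnt in sorted(monthly_counts(ROWS).items()):
        print(month, sender, receiver, cnt)
```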

Related

Cumulative sum of counts is not working for distinct values of users

Table "users":
id   name    email                 created_at
---  ------  --------------------  -------------------
46   FSDSD2  FSDSD2@thebluedot.co  2022-05-29 14:19:21
47   Fxz3    Fxz3@gmail.com        2022-05-30 20:12:15
48   Fgh3    Fgh3@gmail.com        2022-05-31 20:12:15
49   Fghxc3  Fghxc3@gmail.com      2022-06-01 20:12:15
50   Fdx3    Fdx3@gmail.com        2022-06-02 20:12:15
51   Fg3q3   Fg3q3@gmail.com       2022-06-03 20:12:15
88   Fbhgt   Fbhgt@gmail.co        2022-05-23 16:38:41
112  Fht     Fht@gmail.com         2022-05-24 16:19:23
113  Y14gss  Y14gss@gmail.com      2022-05-25 16:42:44
114  sfhf    sfhf@gmail.com        2022-05-26 12:10:40
115  A2czu   A2czu@thebluedot.co   2022-05-27 14:00:31
116  Cc1sn   Cc1sn@gmail.com       2022-05-28 12:04:56
Table "oxygen_point_earns":
id  user_id  oxygen_point  created_at
--  -------  ------------  -------------------
2   116      50.00         2022-05-23 17:49:30
3   113      10.00         2022-05-24 07:49:46
4   114      10.00         2022-05-25 07:50:42
5   46       50.00         2022-05-26 07:55:19
6   47       40.00         2022-05-27 13:28:17
7   48       30.00         2022-05-28 13:32:19
8   49       10.00         2022-05-29 13:32:19
9   50       5.00          2022-05-30 13:32:19
10  51       10.00         2022-05-31 13:32:19
11  88       20.00         2022-06-01 13:32:19
12  112      50.00         2022-06-02 13:32:19
13  115      10.00         2022-06-03 13:32:19
14  112      20.00         2022-06-03 16:32:19
I have two tables:
"users", which stores users basic information
"oxygen_point_earns", which stores oxygen points earned by specific users
The "users" table has 12 rows, whereas the "oxygen_point_earns" table contains 13 records, which means that a user can earn points more than once.
I was trying to make some calculations across those tables (e.g. dividing the total points earned each week by the cumulative weekly user count). The problem occurs when I try to get the cumulative user count.
SELECT STR_TO_DATE(CONCAT(YEARWEEK(op.created_at), ' Sunday'), '%X%V %W') AS week,
SUM(COUNT(*)) OVER(ORDER BY MIN(op.created_at)) AS user_count,
SUM(op.oxygen_point) AS op_weekly
FROM users us
LEFT JOIN oxygen_point_earns op
ON us.id = op.user_id
GROUP BY week
ORDER BY week
This query gets me the following output:
As you can see, even though the points are computed correctly, the total user count is wrong in the second row: it should be 12 instead of 13. In the first week I had 6 users and the next week 6 more users registered, so my total user count is 12, and that is what the second row should show.
I tried DISTINCT and GROUP_CONCAT, but neither worked. How can I fix this query to get the correct user counts?
One straightforward option is to separate the two operations (aggregation and windowing) using a subquery or CTE:
WITH cte AS (
SELECT STR_TO_DATE(CONCAT(YEARWEEK(op.created_at), ' Sunday'), '%X%V %W') AS week,
COUNT(DISTINCT user_id) AS cnt,
SUM(op.oxygen_point) AS op_weekly
FROM users us
LEFT JOIN oxygen_point_earns op ON us.id = op.user_id
GROUP BY week
)
SELECT week,
SUM(cnt) OVER(ORDER BY week) AS user_count,
op_weekly
FROM cte
ORDER BY week
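Here is a minimal Python sketch of the two steps in this answer, run over the sample oxygen_point_earns rows: first a per-week distinct user count and point total (the CTE), then a running sum of the weekly counts (the window function). It assumes Sunday-first weeks, matching the '%X%V' format in the query, and it skips the LEFT JOIN because every user in the sample has at least one earning; week_start_sunday and weekly_report are hypothetical helpers.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# (user_id, oxygen_point, created_at) rows from the "oxygen_point_earns" sample.
EARNS = [
    (116, 50.00, "2022-05-23 17:49:30"),
    (113, 10.00, "2022-05-24 07:49:46"),
    (114, 10.00, "2022-05-25 07:50:42"),
    (46, 50.00, "2022-05-26 07:55:19"),
    (47, 40.00, "2022-05-27 13:28:17"),
    (48, 30.00, "2022-05-28 13:32:19"),
    (49, 10.00, "2022-05-29 13:32:19"),
    (50, 5.00, "2022-05-30 13:32:19"),
    (51, 10.00, "2022-05-31 13:32:19"),
    (88, 20.00, "2022-06-01 13:32:19"),
    (112, 50.00, "2022-06-02 13:32:19"),
    (115, 10.00, "2022-06-03 13:32:19"),
    (112, 20.00, "2022-06-03 16:32:19"),
]


def week_start_sunday(ts):
    """Return the Sunday that starts the week containing ts (Sunday-first weeks assumed)."""
    d = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date()
    # date.weekday(): Monday == 0 ... Sunday == 6, so this steps back to the previous Sunday.
    return d - timedelta(days=(d.weekday() + 1) % 7)


def weekly_report(earns):
    # Step 1 (the CTE): per week, collect DISTINCT users and sum the points.
    users = defaultdict(set)
    points = defaultdict(float)
    for user_id, pts, ts in earns:
        week = week_start_sunday(ts)
        users[week].add(user_id)
        points[week] += pts
    # Step 2 (the window function): running sum of the weekly distinct counts.
    report, running = [], 0
    for week in sorted(users):
        running += len(users[week])
        report.append((week.isoformat(), running, points[week]))
    return report


if __name__ == "__main__":
    for row in weekly_report(EARNS):
        print(row)
```

With the sample data the running user count comes out as 6 and then 12, because user 112 earns twice in the second week but is counted once.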

MySQL select earliest matching common date for multiple conditions

I have a table like the following.
id  dept_id  category_id  event_id  event_date
--  -------  -----------  --------  ----------
1   1        75           95        2022-07-11
2   1        75           96        2022-07-09
3   1        75           95        2022-07-12
4   1        75           96        2022-07-12
5   1        75           95        2022-07-13
6   1        75           96        2022-07-13
I need to find the earliest event_date shared by event_id 95 and event_id 96, where category_id is 75.
The result should be 2022-07-12 according to the table.
How could I achieve this in a single query?
Thanks in advance.
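No answer is given here. One common approach, offered as a sketch rather than a confirmed solution, is to filter to the two event IDs, group by date, keep only the dates on which both IDs appear (HAVING COUNT(DISTINCT event_id) = 2), and take the earliest. The snippet runs that query through Python's built-in sqlite3 module as a stand-in for MySQL; the table name events and the helper earliest_common_date are hypothetical, since the question does not name the table. The query uses only WHERE, GROUP BY, HAVING, ORDER BY and LIMIT, which MySQL also supports.

```python
import sqlite3

# Sample rows from the question: (id, dept_id, category_id, event_id, event_date).
ROWS = [
    (1, 1, 75, 95, "2022-07-11"),
    (2, 1, 75, 96, "2022-07-09"),
    (3, 1, 75, 95, "2022-07-12"),
    (4, 1, 75, 96, "2022-07-12"),
    (5, 1, 75, 95, "2022-07-13"),
    (6, 1, 75, 96, "2022-07-13"),
]

# Keep only dates on which every requested event_id occurs, then take the earliest.
QUERY = """
SELECT event_date
FROM events
WHERE category_id = 75
  AND event_id IN (95, 96)
GROUP BY event_date
HAVING COUNT(DISTINCT event_id) = 2
ORDER BY event_date
LIMIT 1
"""


def earliest_common_date(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (id INTEGER, dept_id INTEGER, category_id INTEGER,"
            " event_id INTEGER, event_date TEXT)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)
        row = conn.execute(QUERY).fetchone()
        return row[0] if row else None  # None when no date has both events
    finally:
        conn.close()


if __name__ == "__main__":
    print(earliest_common_date(ROWS))
```

If more event IDs are added to the IN list, the number in the HAVING clause has to be changed to match.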

Find and Remove SQL records, if they occurred within 8 hours (except most recent record)

I have an SQL table that contains some duplicate records I want to remove.
A record should be removed only when both of these conditions hold:
It has the same value in the score column as another record.
The two records occurred within 8 hours of each other.
Among all matching records, the older ones should be removed, so that only the most recent matching record remains in the query result.
So far I've only managed to write code that removes such duplicates when the records fall on the same calendar day, so it misses records that span two consecutive days. How can I solve this?
Original DB looks like:
user_id score visited_at visit_id
------- ---------------- ------------------- ----------
22 75.0 2018-05-14 23:39:14 169
22 75.0 2018-05-14 18:36:26 168
22 75.0 2018-05-13 02:04:46 166
2 55.0 2018-05-12 18:38:24 165
22 78.0 2018-05-12 18:14:34 164
22 75.0 2018-05-12 18:45:12 164
22 55.0 2018-05-08 12:36:12 161
SQL command to partly remove duplicates:
SELECT COUNT(*) AS ct
, it.user_id
, it.score
, UNIX_TIMESTAMP(CONVERT_TZ(it.visited_at,'+00:00',@@global.time_zone)) DIV 86400 AS diff
, it.visited_at
, it.visit_id
FROM `vw_items` it
GROUP
BY user_id
, score
, diff
ORDER
BY visited_at DESC
Result:
ct user_id score diff visited_at visit_id
------ ------- ---------------- ------ ------------------- ----------
2 22 75.0 17665 2018-05-14 23:39:14 169
1 22 75.0 17664 2018-05-13 02:04:46 166
1 2 55.0 17663 2018-05-12 18:38:24 165
1 22 78.0 17663 2018-05-12 18:14:34 164
1 22 75.0 17663 2018-05-12 18:45:12 164
1 22 55.0 17659 2018-05-08 12:36:12 161
But I need a command that will also remove this record:
1 22 75.0 17663 2018-05-12 18:45:12 164
because it has the same score as a more recent record that occurred within 8 hours of it:
1 22 75.0 17664 2018-05-13 02:04:46 166
I believe what you're looking for is the DATE_SUB function:
DATE_SUB(it.visited_at, INTERVAL 8 HOUR)
This returns the datetime 8 hours before visited_at, which you can compare against other records to find those within 8 hours of a given record. I'd write a fuller answer, but it looks like that's the only piece of the puzzle you're missing.
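The removal rule from the question can be checked on the sample data with a short Python sketch: a row is dropped when a strictly newer row with the same key is at most 8 hours later. It assumes duplicates are matched on both user_id and score (following the GROUP BY in the question's query) and that "within 8 hours" is inclusive; drop_superseded is a hypothetical helper. In MySQL the same condition would likely be expressed as a NOT EXISTS subquery comparing visited_at with DATE_SUB(..., INTERVAL 8 HOUR).

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=8)

# Sample rows from the question: (user_id, score, visited_at, visit_id).
VISITS = [
    (22, 75.0, "2018-05-14 23:39:14", 169),
    (22, 75.0, "2018-05-14 18:36:26", 168),
    (22, 75.0, "2018-05-13 02:04:46", 166),
    (2, 55.0, "2018-05-12 18:38:24", 165),
    (22, 78.0, "2018-05-12 18:14:34", 164),
    (22, 75.0, "2018-05-12 18:45:12", 164),
    (22, 55.0, "2018-05-08 12:36:12", 161),
]


def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def drop_superseded(visits, window=WINDOW):
    """Keep a row unless a newer row with the same (user_id, score) is at most `window` later."""
    kept = []
    for user_id, score, ts, visit_id in visits:
        t = parse(ts)
        # Pairwise check against every other row: O(n^2), fine for a small sample only.
        superseded = any(
            u == user_id and s == score and timedelta(0) < parse(other) - t <= window
            for u, s, other, _ in visits
        )
        if not superseded:
            kept.append((user_id, score, ts, visit_id))
    return kept


if __name__ == "__main__":
    for row in drop_superseded(VISITS):
        print(row)
```

Under this pairwise reading, the 2018-05-12 18:45:12 row is removed because the 2018-05-13 02:04:46 row follows it about 7 hours 20 minutes later, and the 2018-05-14 18:36:26 row is removed because of the 23:39:14 row that day.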

Select data based on two values matching in one column of a table

Suppose this is the structure of my table user:
id field_id user_id value
1 1 37 Lalit
4 2 37 Test
5 13 37 123
6 18 37 324
7 28 37 english
8 33 37 203
9 21 37 201
10 1 39 Mukesh
11 2 39 Test
12 13 39 523
13 18 39 245
14 28 39 French
15 33 39 278
16 21 39 2897
I want to get the users whose rows match two or three given values in the value column.
I wrote a query like this:
SELECT DISTINCT user_id FROM user where value =123 AND value=523;
But it is not working. How can I get the result?
A single row's value, as in your example, cannot be both 123 and 523 at the same time. You have to use OR:
SELECT DISTINCT(user_id) FROM user WHERE value=123 OR value=523;
Alternatively you can also use IN clause
SELECT DISTINCT user_id
FROM user
WHERE value IN (123, 523);
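To see the difference between the AND and IN versions, here is a small sketch that runs both through Python's built-in sqlite3 module as a stand-in for MySQL. It assumes the value column is text, so the literals are quoted here; the run helper is hypothetical, and "user" is double-quoted so SQLite always treats it as a plain table name.

```python
import sqlite3

# Sample rows from the question: (id, field_id, user_id, value).
ROWS = [
    (1, 1, 37, "Lalit"), (4, 2, 37, "Test"), (5, 13, 37, "123"),
    (6, 18, 37, "324"), (7, 28, 37, "english"), (8, 33, 37, "203"),
    (9, 21, 37, "201"), (10, 1, 39, "Mukesh"), (11, 2, 39, "Test"),
    (12, 13, 39, "523"), (13, 18, 39, "245"), (14, 28, 39, "French"),
    (15, 33, 39, "278"), (16, 21, 39, "2897"),
]

# Each row holds a single value, so AND on the same column can never be true.
AND_QUERY = """SELECT DISTINCT user_id FROM "user" WHERE value = '123' AND value = '523'"""
# IN (equivalent to OR here) matches rows holding either value.
IN_QUERY = """SELECT DISTINCT user_id FROM "user" WHERE value IN ('123', '523') ORDER BY user_id"""


def run(query):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE "user" (id INTEGER, field_id INTEGER, user_id INTEGER, value TEXT)')
        conn.executemany('INSERT INTO "user" VALUES (?, ?, ?, ?)', ROWS)
        return [r[0] for r in conn.execute(query)]
    finally:
        conn.close()


if __name__ == "__main__":
    print("AND:", run(AND_QUERY))
    print("IN: ", run(IN_QUERY))
```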

Select Distinct Column For Each Value In Another Column Then Group By

I am pretty new to MySQL and I have had a hard time over the past 2 days trying to get this to work. I do not know if the title accurately describes what I am trying to fix, so if it does not, please correct me.
Here is the deal:
I have 4 columns: id, number, package_id, and date.
id - increments every time a new row is inserted
number - just a 2-digit number
package_id - ID of the package
date - date and time the row was inserted
Here is what an example table looks like: (I omitted the time from the date)
id number package_id date
--- ------ ---------- ----
1 12 20 08-01-2013
2 12 21 08-01-2013
3 12 20 08-01-2013
4 45 20 08-02-2013
5 45 22 08-02-2013
6 45 22 08-03-2013
7 12 20 08-03-2013
8 70 25 08-03-2013
9 70 26 08-03-2013
10 70 25 08-03-2013
I am not only trying to select distinct values of number grouped by date; I also want this to hold separately for each unique value in the package_id column.
To explain better, this is what I want the output to look like when I SELECT *:
id number package_id date
--- ------ ---------- ----
1 12 20 08-01-2013
2 12 21 08-01-2013
4 45 20 08-02-2013
5 45 22 08-02-2013
6 45 22 08-03-2013
7 12 20 08-03-2013
8 70 25 08-03-2013
9 70 26 08-03-2013
As you can see, only rows 3 and 10 are not selected, because each has the same number and package_id as an earlier row on the same day.
How can I accomplish this?
Is this what you are looking for:
SELECT MIN(id) AS id, number, package_id, date
FROM MyTable
GROUP BY number, package_id, date
ORDER BY id;
It certainly satisfies your expected result set.
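The grouping can be checked against the expected output with Python's built-in sqlite3 module as a stand-in for MySQL; the query keeps the lowest id of each (number, package_id, date) group, and first_per_group is a hypothetical helper. One caveat: the sample omits the time, but the real date column also stores the time of day, so on the real table the grouping key would presumably need to be the calendar day (for example MySQL's DATE() function) rather than the raw column.

```python
import sqlite3

# Sample rows from the question: (id, number, package_id, date); time omitted as in the post.
ROWS = [
    (1, 12, 20, "08-01-2013"), (2, 12, 21, "08-01-2013"), (3, 12, 20, "08-01-2013"),
    (4, 45, 20, "08-02-2013"), (5, 45, 22, "08-02-2013"), (6, 45, 22, "08-03-2013"),
    (7, 12, 20, "08-03-2013"), (8, 70, 25, "08-03-2013"), (9, 70, 26, "08-03-2013"),
    (10, 70, 25, "08-03-2013"),
]

# One row per (number, package_id, date) group, keeping the lowest id in each group.
QUERY = """
SELECT MIN(id) AS id, number, package_id, date
FROM MyTable
GROUP BY number, package_id, date
ORDER BY id
"""


def first_per_group(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE MyTable (id INTEGER, number INTEGER, package_id INTEGER, date TEXT)")
        conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in first_per_group(ROWS):
        print(row)
```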