I have been trying to extract, from a MySQL table, the number of days a particular user spent in each status during a given month. The data is saved in log format, which makes it a bit hard to work with. For example, I need to calculate the number of days user 488 spent in each status during June 2022 only.
user_id  old_status  new_status  modified_on
488      3           10          31/05/2022 10:03
488      10          5           01/06/2022 13:05
488      5           16          07/06/2022 16:06
488      16          2           09/06/2022 08:26
488      2           6           30/06/2022 13:51
488      6           2           07/07/2022 09:44
488      2           6           08/08/2022 13:25
488      6           1           15/08/2022 10:37
488      1           11          02/09/2022 13:48
488      11          2           03/10/2022 07:26
488      2           10          10/10/2022 10:17
488      10          6           25/01/2023 17:50
488      6           1           01/02/2023 13:46
The output should look like this:
user  status  Days
488   5       6
488   16      2
488   2       21
I tried multiple ways to join the table with itself in order to find the solution, but had no luck. Any help will be appreciated.
Here is what I think you should do: first join the old_status field in the log table with the status table, then use the DATEDIFF function to subtract modified_on (log table) from created_at (or whichever field in status stores the creation time). You can filter the results with a WHERE clause to get specific users on specific dates.
This query might help (I don't know the structure of your tables, so if something is wrong, edit it to suit your needs):
SELECT *, DATEDIFF(log.modified_on, st.created_at) AS spent_time_on_status
FROM log_status AS log JOIN status AS st ON st.id = log.old_status
WHERE log.user_id = 488 AND EXTRACT(MONTH FROM st.created_at) = 6
This is a suggestion to get you started. It will not get you all the way (since there are several status changes to and from the same status...)
SELECT
    shfrom.userid,
    shfrom.new_status AS statusName,
    shfrom.modified_on AS fromdate,
    shto.modified_on AS todate,
    DATEDIFF(shto.modified_on, shfrom.modified_on) AS days_spent_in_status
FROM
    status_history AS shfrom
    INNER JOIN status_history AS shto
        ON shfrom.userid = shto.userid AND shfrom.new_status = shto.old_status
WHERE
    shfrom.modified_on < shto.modified_on
;
I created a table based on your question and loaded the data you provided, in MySQL format:
create table status_history(
userid int,
old_status int,
new_status int,
modified_on datetime
);
insert into status_history values
(488, 3,10, '2022-05-31 10:03'),
(488,10, 5, '2022-06-01 13:05'),
(488, 5,16, '2022-06-07 16:06'),
(488,16, 2, '2022-06-09 08:26'),
(488, 2, 6, '2022-06-30 13:51'),
(488, 6, 2, '2022-07-07 09:44'),
(488, 2, 6, '2022-08-08 13:25'),
(488, 6, 1, '2022-08-15 10:37'),
(488, 1,11, '2022-09-02 13:48'),
(488,11, 2, '2022-10-03 07:26'),
(488, 2,10, '2022-10-10 10:17'),
(488,10, 6, '2023-01-25 17:50'),
(488, 6, 1, '2023-02-01 13:46');
This produces the following result, where days_spent_in_status is the time spent:
userid  statusName  fromdate             todate               days_spent_in_status
488     10          2022-05-31 10:03:00  2022-06-01 13:05:00  1
488     5           2022-06-01 13:05:00  2022-06-07 16:06:00  6
488     16          2022-06-07 16:06:00  2022-06-09 08:26:00  2
488     2           2022-06-09 08:26:00  2022-06-30 13:51:00  21
488     6           2022-06-30 13:51:00  2022-07-07 09:44:00  7
488     2           2022-06-09 08:26:00  2022-08-08 13:25:00  60
488     2           2022-07-07 09:44:00  2022-08-08 13:25:00  32
488     6           2022-06-30 13:51:00  2022-08-15 10:37:00  46
488     6           2022-08-08 13:25:00  2022-08-15 10:37:00  7
488     1           2022-08-15 10:37:00  2022-09-02 13:48:00  18
488     11          2022-09-02 13:48:00  2022-10-03 07:26:00  31
488     2           2022-06-09 08:26:00  2022-10-10 10:17:00  123
488     2           2022-07-07 09:44:00  2022-10-10 10:17:00  95
488     2           2022-10-03 07:26:00  2022-10-10 10:17:00  7
488     10          2022-05-31 10:03:00  2023-01-25 17:50:00  239
488     10          2022-10-10 10:17:00  2023-01-25 17:50:00  107
488     6           2022-06-30 13:51:00  2023-02-01 13:46:00  216
488     6           2022-08-08 13:25:00  2023-02-01 13:46:00  177
488     6           2023-01-25 17:50:00  2023-02-01 13:46:00  7
You still need to filter out the rows that pair an early status change with a much later, non-adjacent one. I hope this gets you started.
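A different way to avoid that filtering step entirely, sketched here as an assumption rather than a drop-in answer: pair each transition with the *next* one per user via the LEAD() window function (available in MySQL 8.0+). The snippet below mocks this up in SQLite through Python; julianday(date(...)) emulates MySQL's DATEDIFF (which compares date parts only), and the month filter keeps intervals that both start and end in June 2022, which is what the expected output appears to show.

```python
import sqlite3

# Sample data from the question (dates normalized to ISO format).
rows = [
    (488, 3, 10, '2022-05-31 10:03'), (488, 10, 5, '2022-06-01 13:05'),
    (488, 5, 16, '2022-06-07 16:06'), (488, 16, 2, '2022-06-09 08:26'),
    (488, 2, 6, '2022-06-30 13:51'), (488, 6, 2, '2022-07-07 09:44'),
    (488, 2, 6, '2022-08-08 13:25'), (488, 6, 1, '2022-08-15 10:37'),
    (488, 1, 11, '2022-09-02 13:48'), (488, 11, 2, '2022-10-03 07:26'),
    (488, 2, 10, '2022-10-10 10:17'), (488, 10, 6, '2023-01-25 17:50'),
    (488, 6, 1, '2023-02-01 13:46'),
]

conn = sqlite3.connect(':memory:')
conn.execute("""CREATE TABLE status_history(
    userid INT, old_status INT, new_status INT, modified_on TEXT)""")
conn.executemany("INSERT INTO status_history VALUES (?,?,?,?)", rows)

# LEAD() pairs each transition with the next one for the same user, so each
# status occupies the half-open interval [entered_on, next_on).
result = conn.execute("""
    SELECT userid, status,
           CAST(julianday(date(next_on)) - julianday(date(entered_on)) AS INTEGER) AS days
    FROM (
        SELECT userid, new_status AS status, modified_on AS entered_on,
               LEAD(modified_on) OVER (PARTITION BY userid ORDER BY modified_on) AS next_on
        FROM status_history
    )
    WHERE strftime('%Y-%m', entered_on) = '2022-06'
      AND strftime('%Y-%m', next_on)   = '2022-06'
    ORDER BY entered_on
""").fetchall()

print(result)  # [(488, 5, 6), (488, 16, 2), (488, 2, 21)]
```

Because each row joins only to its immediate successor, no duplicate-pair rows are produced in the first place.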
I can't figure out how to combine data from three datasets. I have one static dataset and two matrix tables that I want to connect in one report. Every table has the same ID, which I can use to connect them (and the same number of rows as well), but I don't know how to do this. Is it possible to connect several datasets?
table1:
N  ID   St       From      To
1  541  7727549  08:30:00  14:00:00
2  631  7727575  07:00:00  15:00:00
3  668  7727552  09:00:00  17:00:00
4  679           18:00:00  00:00:00
5  721           17:00:00  00:00:00
table2:
ID   P1                   P2                   P3                   P4
541  12:00:00 - 12:10:00
631  08:45:00 - 08:55:00  11:30:00 - 11:40:00  13:00:00 - 13:15:00
668  12:05:00 - 12:15:00  13:45:00 - 13:55:00  14:55:00 - 15:10:00
679  21:15:00 - 21:30:00
721  20:40:00 - 20:50:00  21:50:00 - 22:05:00
table3:
ID   W1                   W2                   W3
541  11:28:58 - 11:39:13
631  08:46:54 - 08:58:43  11:07:04 - 11:17:05
668  11:26:11 - 11:41:44
679
721  11:07:19 - 11:17:06
This is not my query; it's a query that someone else wrote that I am now working with.
I have a database like so:
id  date                 high      low       open      close     open_id  close_id
1   2009-05-01 00:00:00  0.729125  0.729225  0.72889   0.72889   1        74
2   2009-05-01 00:01:00  0.72888   0.728895  0.72883   0.72887   75       98
3   2009-05-01 00:02:00  0.728865  0.72889   0.72881   0.72888   99       121
4   2009-05-01 00:03:00  0.72891   0.72901   0.72891   0.729     122      141
5   2009-05-01 00:04:00  0.728975  0.729115  0.728745  0.72878   142      225
6   2009-05-01 00:05:00  0.728785  0.72882   0.72867   0.72882   226      271
7   2009-05-01 00:06:00  0.72884   0.72887   0.728735  0.728785  272      293
8   2009-05-01 00:07:00  0.728775  0.728835  0.72871   0.728835  294      317
9   2009-05-01 00:08:00  0.728825  0.72899   0.728795  0.72897   318      338
10  2009-05-01 00:09:00  0.72898   0.729255  0.72898   0.72922   339      383
11  2009-05-01 00:10:00  0.72922   0.729325  0.72908   0.729105  384      437
12  2009-05-01 00:11:00  0.729115  0.72918   0.728635  0.72905   438      553
(this is 12 out of about 200k rows)
This is my query:
SELECT x.date, t.high, t.low, t.open, t.close, x.open_id, x.close_id
FROM (
    SELECT MIN(`date`) AS `date`, MAX(`close_id`) AS `close_id`, MIN(`open_id`) AS `open_id`
    FROM `AUDNZD_minutes`
    WHERE `date` >= '2011-03-07 00:00:00' AND `date` < '2011-03-11 12:00:00'
    GROUP BY ROUND(UNIX_TIMESTAMP(`date`) / 600)
    ORDER BY `date`
) AS x
INNER JOIN `AUDNZD_minutes` AS t ON x.close_id = t.close_id
It selects rows from that database in 10-minute intervals. However, I always get this anomaly:
2011-03-07 00:00:00 1.3761 1.375595 1.375815 1.37589 55180489 55181083
2011-03-07 00:05:00 1.376055 1.37568 1.375925 1.37594 55181084 55181751
2011-03-07 00:15:00 1.37609 1.375835 1.375835 1.37606 55181752 55182003
2011-03-07 00:25:00 1.37578 1.37526 1.375505 1.375555 55182004 55182615
2011-03-07 00:35:00 1.374645 1.374455 1.374535 1.374645 55182616 55183178
2011-03-07 00:45:00 1.37463 1.373775 1.374085 1.374025 55183179 55183820
You can see that the difference between the first row and the second is 5 minutes, and everything after that is 10 minutes. This happens with any interval I try.
For example, 20-minute intervals:
2011-03-07 00:00:00 1.376155 1.375915 1.37594 1.376025 55180489 55181434
2011-03-07 00:10:00 1.376105 1.37592 1.37593 1.376085 55181435 55182273
2011-03-07 00:30:00 1.374025 1.37388 1.373965 1.37401 55182274 55183429
2011-03-07 00:50:00 1.373895 1.373595 1.37365 1.373595 55183430 55184894
2011-03-07 01:10:00 1.37382 1.373505 1.37373 1.373715 55184895 55185885
2011-03-07 01:30:00 1.373305 1.373025 1.373265 1.373055 55185886 55187306
How can I correct this query?
The ROUND function rounds numbers using the basic math rules you probably learned in primary school:
select FROM_UNIXTIME(round(UNIX_TIMESTAMP('2009-05-01 00:04:00') / 600) * 600) from dual;
results in 2009-05-01 00:00:00, and
select FROM_UNIXTIME(round(UNIX_TIMESTAMP('2009-05-01 00:06:00') / 600) * 600) from dual;
results in 2009-05-01 00:10:00. So (on the provided dataset) you will always have only half of the interval in the first line if you keep using it.
Consider the CEIL or FLOOR functions instead.
As a side note, @Strawberry made a good point. Try using something like http://sqlfiddle.com/ to show some effort when asking a question.
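To make the rounding behaviour concrete, here is a small Python sketch (the helper name `bucket` is mine, not part of the query) that reproduces both behaviours on epoch seconds:

```python
from datetime import datetime, timezone
import math

def bucket(dt, seconds=600, mode="floor"):
    """Map a timestamp to the start of its interval.

    mode="round" reproduces the query's bug: timestamps in the second
    half of an interval get pushed into the *next* bucket.
    """
    ts = dt.replace(tzinfo=timezone.utc).timestamp()
    f = round if mode == "round" else math.floor
    start = f(ts / seconds) * seconds
    return datetime.fromtimestamp(start, tz=timezone.utc).replace(tzinfo=None)

# 00:04 is in the first half of the 00:00-00:10 interval, 00:06 in the second.
print(bucket(datetime(2009, 5, 1, 0, 4), mode="round"))  # 2009-05-01 00:00:00
print(bucket(datetime(2009, 5, 1, 0, 6), mode="round"))  # 2009-05-01 00:10:00
print(bucket(datetime(2009, 5, 1, 0, 6), mode="floor"))  # 2009-05-01 00:00:00
```

With floor, every timestamp in an interval maps to the same bucket start, which is what the GROUP BY needs.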
I have a table where the population number is given for each day at every hour. How can I get the row with the maximum population number for each day? Here is an example of the table:
date hour population
2015-07-11 10 205
2015-07-11 11 390
2015-07-11 12 579
2015-07-11 13 679
2015-07-11 14 699
2015-07-11 15 890
2015-07-11 16 816
2015-07-11 17 970
2015-07-11 18 835
2015-07-11 19 827
2015-07-11 20 753
2015-07-11 21 638
2015-07-11 22 327
2015-07-12 9 33
2015-07-12 10 151
2015-07-12 11 227
2015-07-12 12 419
2015-07-12 13 561
2015-07-12 14 683
2015-07-12 15 799
2015-07-12 16 830
2015-07-12 17 876
2015-07-12 18 844
2015-07-12 19 819
2015-07-12 20 626
2015-07-12 21 526
2015-07-12 22 235
Try using MAX() to get the maximum value of a column per group:
SELECT date, MAX(population)
FROM foo
GROUP BY date
EDIT :
If you want to have the hour that corresponds to your max population value, you can go with :
SELECT foo.*
FROM foo
INNER JOIN
(SELECT date, MAX(population) as MaxPop
FROM foo
GROUP BY date) max
ON foo.date = max.date
AND foo.population = max.MaxPop
Hope it helps.
I am not sure why my numbers are so drastically different from each other.
A query with no max id:
SELECT id, DATE_FORMAT(t_stamp, '%Y-%m-%d %H:00:00') as date, COUNT(*) as count
FROM test_ips
WHERE id > 0
AND viewip != ""
GROUP BY HOUR(t_stamp)
ORDER BY t_stamp ASC;
I get:
1 2012-07-18 19:00:00 1313
106 2012-07-18 20:00:00 1567
107 2012-07-19 09:00:00 847
225 2012-07-19 10:00:00 5095
421 2012-07-19 11:00:00 205
423 2012-07-19 12:00:00 900
461 2012-07-19 13:00:00 619
490 2012-07-20 15:00:00 729
575 2012-07-20 16:00:00 1682
1060 2012-07-20 17:00:00 2063
2260 2012-07-20 18:00:00 1417
5859 2012-07-20 21:00:00 1303
7060 2012-07-20 22:00:00 1340
8280 2012-07-20 23:00:00 1211
9149 2012-07-21 00:00:00 1675
10418 2012-07-21 01:00:00 721
11127 2012-07-21 02:00:00 825
But if I add a max id:
AND id <= 8279
I get:
1 2012-07-18 19:00:00 1313
106 2012-07-18 20:00:00 1201
107 2012-07-19 09:00:00 118
225 2012-07-19 10:00:00 196
421 2012-07-19 11:00:00 2
423 2012-07-19 12:00:00 38
461 2012-07-19 13:00:00 20
490 2012-07-20 15:00:00 85
575 2012-07-20 16:00:00 483
1060 2012-07-20 17:00:00 1200
2260 2012-07-20 18:00:00 1200
5859 2012-07-20 21:00:00 1201
7060 2012-07-20 22:00:00 1220
The numbers are WAY off from each other. Something is goofy.
EDIT: Here is my table structure:
id t_stamp bID viewip unique
1 2012-07-18 19:22:20 5 192.168.1.1 1
2 2012-07-18 19:22:21 1 192.168.1.1 1
3 2012-07-18 19:22:22 5 192.168.1.1 0
4 2012-07-18 19:22:22 3 192.168.1.1 1
You are not grouping by ID and I think you intend to.
Try:
SELECT id, DATE_FORMAT(t_stamp, '%Y-%m-%d %H:00:00') as date, COUNT(*) as count
FROM test_ips
WHERE id > 0
AND viewip != ""
GROUP BY id, DATE_FORMAT(t_stamp, '%Y-%m-%d %H:00:00')
ORDER BY t_stamp;
Your query is not consistent. In your SELECT statement you are displaying the full date, but you are grouping your data by the hour only. So your COUNT is taking the count of all the data for each hour of the day, across all dates.
As an example take your first result:
1 2012-07-18 19:00:00 1313
The count of 1313 contains the records for all of your dates (7/18, 7/19, 7/20, 7/21, 7/22, etc) that have an hour of 19:00.
But the way you have your query set up, it looks like it should be the count of all records for 2012-07-18 19:00:00.
So when you add AND id <= 8279, the dates of 7/21 and some of 7/20 are no longer being counted, so your count values are now lower.
I'm guessing you mean to group by the date and the hour, not just the hour.
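A tiny Python sketch of the difference (with hypothetical timestamps, not your data): grouping by hour alone merges the same hour from different days into one bucket, while grouping by (date, hour) keeps them apart.

```python
from collections import Counter
from datetime import datetime

stamps = [
    datetime(2012, 7, 18, 19, 22), datetime(2012, 7, 18, 19, 45),
    datetime(2012, 7, 19, 19, 5),  datetime(2012, 7, 20, 19, 30),
]

# GROUP BY HOUR(t_stamp): one bucket per hour-of-day, across all dates.
by_hour = Counter(t.hour for t in stamps)
print(by_hour[19])  # 4 -- all four rows land in the 19:00 bucket

# GROUP BY date + hour: one bucket per calendar hour.
by_date_hour = Counter((t.date(), t.hour) for t in stamps)
print(by_date_hour[(datetime(2012, 7, 18).date(), 19)])  # 2
```

This is exactly why the hour-only query over-counts: every 19:00 row from every date is folded into one group.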