MySQL find number of students in attendance broken down by time


I have a table containing arrival and departure times for students attending a class. Given data like this:
CREATE TABLE `attendance` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`class_id` int(11) DEFAULT NULL,
`student_id` int(11) NOT NULL DEFAULT '0',
`arrival` datetime DEFAULT NULL,
`departure` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `attendance` (`id`, `class_id`, `student_id`, `arrival`, `departure`)
VALUES
(1,1,1,'2013-01-01 16:00:00','2013-01-01 17:00:00'),
(2,1,2,'2013-01-01 16:00:00','2013-01-01 18:00:00'),
(3,1,3,'2013-01-01 17:00:00','2013-01-01 19:00:00'),
(4,1,4,'2013-01-01 17:00:00','2013-01-01 19:00:00'),
(5,1,5,'2013-01-01 17:30:00','2013-01-01 18:30:00');
I'm trying to get a breakdown of time in minutes and how many students are present during each period. From the data above, I'd like a result something like this:
Time Students
60 2 (the first hour from 16:00 to 17:00 has students 1 & 2)
30 3 (the next 30 minutes from 17:00 to 17:30 has students 2, 3 & 4)
30 4 (etc...)
30 3
30 2
The select statement I have so far is getting some way towards the answer but I can't quite get it working:
SELECT a.id, a.arrival, b.id, LEAST(a.departure,b.departure) AS departure,
TIMEDIFF((LEAST(a.departure,b.departure)),(a.arrival)) AS subtime
FROM attendance a
JOIN attendance b ON (a.id <> b.id and a.class_id=b.class_id
and a.arrival >= b.arrival and a.arrival < b.departure)
WHERE a.class_id=1
ORDER BY a.arrival, departure, b.id;
Thank you in advance to anyone who can help me get this right.

Using a derived table (a subquery in the FROM clause) you can create a virtual table (not the same as a temporary table, but a similar idea). You can then query against this virtual table just as if it really existed.
select clocks.clock, count( att.student_id ) as numStudents
from
(
( select arrival as clock from attendance )
union distinct
( select departure as clock from attendance )
)
as clocks
left outer join attendance att on att.arrival <= clocks.clock and clocks.clock < att.departure
group by clocks.clock
order by 1,2
;
This is almost what you are looking for. Rather than grouping by elapsed time, it uses the actual 'event' timestamps (arrivals and departures) and gives you a useful report:
clock numStudents
------------------- -----------
2013-01-01 16:00:00 2
2013-01-01 17:00:00 3
2013-01-01 17:30:00 4
2013-01-01 18:00:00 3
2013-01-01 18:30:00 2
2013-01-01 19:00:00 0
The report shows how many students are still 'here' at each event time.
Hopefully this is useful for you.
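To turn this event report into the elapsed-time breakdown the question asks for, each event time can be paired with the next one and the gap measured in minutes. The sketch below does that in Python rather than SQL, mainly as a way to check the expected result; the function name `attendance_segments` is hypothetical, and it assumes the same half-open rule as the join above (a student is present from arrival up to, but not including, departure).

```python
from datetime import datetime

# (student_id, arrival, departure) rows from the question's INSERT statement
ROWS = [
    (1, "2013-01-01 16:00:00", "2013-01-01 17:00:00"),
    (2, "2013-01-01 16:00:00", "2013-01-01 18:00:00"),
    (3, "2013-01-01 17:00:00", "2013-01-01 19:00:00"),
    (4, "2013-01-01 17:00:00", "2013-01-01 19:00:00"),
    (5, "2013-01-01 17:30:00", "2013-01-01 18:30:00"),
]

FMT = "%Y-%m-%d %H:%M:%S"


def attendance_segments(rows):
    """Return (minutes, students_present) for each gap between consecutive events."""
    spans = [(datetime.strptime(a, FMT), datetime.strptime(d, FMT)) for _, a, d in rows]
    # Every arrival and departure is an event time, like the clocks derived table.
    clocks = sorted({t for span in spans for t in span})
    segments = []
    for start, end in zip(clocks, clocks[1:]):
        # Same rule as the join condition: arrival <= clock < departure.
        present = sum(1 for arr, dep in spans if arr <= start < dep)
        minutes = int((end - start).total_seconds() // 60)
        segments.append((minutes, present))
    return segments


for minutes, present in attendance_segments(ROWS):
    print(minutes, present)
```

The last event (19:00) has no following event, so it produces no segment. In SQL the same pairing could be done by joining each clock with the smallest later clock (for example via a correlated MIN subquery) and measuring the gap with TIMESTAMPDIFF(MINUTE, ...).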

Related

MySQL - divide results of count function into columns derived from a separate column

I recently posted this within a different page of Stack Exchange but believe this to be the more appropriate place for it.
OK, the title may seem a bit confusing, but I am struggling to describe what I need this query to do, so it's best to explain it. I have 3 tables in my database (using MySQL Workbench), but for this query I'm only trying to use one. The table named service_data has the following columns:
Services_ID|Service_Type|Day|Time|Customer_ID(FK)
1001 |SERVICE1 |Mon|0950|1
1002 |SERVICE2 |Tue|1032|65
1003 |SERVICE3 |Wed|0859|4
The table contains approximately 200 records. My aim is to group the timings together, which I have managed to achieve by doing this:
select
case
WHEN (Delivery_Time between '08:00:00' and '09:00:00') then '0800-0900'
WHEN (Delivery_Time between '09:00:00' and '10:00:00') then '0900-1000'
WHEN (Delivery_Time between '10:00:00' and '11:00:00') then '1000-1100'
WHEN (Delivery_Time between '11:00:00' and '12:00:00') then '1100-1200'
WHEN (Delivery_Time between '12:00:00' and '13:00:00') then '1200-1300'
WHEN (Delivery_Time between '13:00:00' and '14:00:00') then '1300-1400'
WHEN (Delivery_Time between '14:00:00' and '15:00:00') then '1400-1500'
WHEN (Delivery_Time between '15:00:00' and '16:00:00') then '1500-1600'
WHEN (Delivery_Time between '16:00:00' and '17:00:00') then '1600-1700'
WHEN (Delivery_Time between '17:00:00' and '18:00:00') then '1700-1800'
WHEN (Delivery_Time between '18:00:00' and '19:00:00') then '1800-1900'
WHEN (Delivery_Time between '19:00:00' and '20:00:00') then '1900-2000'
WHEN (Delivery_Time between '20:00:00' and '21:00:00') then '2000-2100'
else 'Outside Opening Hours'
end as `Time Period`,
count(0) as 'count'
from service_data
group by `Time Period`
order by count desc
limit 20;
Which produces the below result:
TimePeriod Count
1700-1800 24
1500-1600 21
1200-1300 19
1400-1500 19
1800-1900 17
1100-1200 17
1300-1400 16
1600-1700 16
1000-1100 16
1900-2000 12
0800-0900 12
0900-1000 11
What I am now trying to do is split the count up so that there are 4 columns labelled SERVICE1, SERVICE2, SERVICE3 and SERVICE4 (the values within the Service_Type column), so that it looks something like this:
TimePeriod|SERVICE1|SERVICE2|SERVICE3|SERVICE4
1700-1800 | 6 | 7 | 10 | 1
1500-1600 | 5 | 9 | 1 | 6
1200-1300 | 0 | 4 | 2 | 13
Is this possible? I'm sure it must be, but I have been pulling my hair out trying to work it out; SQL isn't my first language! Any help would be appreciated.
My second issue is:
I would like a second query to do all of the above and also link the results to a customer_data table, whose primary key customer_id is a foreign key in service_data. Through customer_id I want to link each row to its quadrant (a column in customer_data with values NE, SE, SW, NW depending on coordinates) and group the count a second time by quadrant as well as by service, so it looks like this:
TimePeriod| SERVICE1 | SERVICE2 | SERVICE3 | SERVICE4 |
-----------|NE|SE|SW|NW|NE|SE|SW|NW|NE|SE|SW|NW|NE|SE|SW|NW|
1700-1800 |2 |1 | 0| 3|4 | 0| 0|3 |2 |5 |2 |1 |0 |1 | 0| 0|
Again, is this possible, or am I asking too much? I was wondering if I could use the SUM(IF) function in some way to achieve all this.

Here's something to get you started, although I do agree with @Strawberry that this needs a programming language to do your last step.
This is not in any way optimised for performance or elegance, but I have tested it with your data as given above.
Here's my CREATE TABLE statement:
CREATE TABLE `service_data` (
`services_id` int(11) NOT NULL,
`service_type` varchar(45) DEFAULT NULL,
`day` varchar(45) DEFAULT NULL,
`time` time DEFAULT NULL,
`customer_id` int(11) DEFAULT NULL,
PRIMARY KEY (`services_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
First get your time ranges. Each range runs from the start of an hour up to, but not including, the start of the next hour, so a time that is bang on the hour falls into exactly one range. DISTINCT is needed because otherwise the view returns one copy of a range for every row in that hour, which multiplies the counts in the join.
create view spans as
select distinct time(concat(hour(time),':00:00')) as fromtime,
addtime(time(concat(hour(time),':00:00')),'01:00:00') as totime
from service_data;
Now look up the time range for each row of data.
create view withspans as
select * from service_data s1
join spans s2
on s1.time >= s2.fromtime and s1.time < s2.totime;
Now summarise the data as input to the pivot.
create view summary as
select fromtime, totime, service_type, count(*) as spancount
from withspans
group by fromtime, totime, service_type;
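Now do the pivot via derived tables. The outer derived table w lists each time range once (summary holds one row per range and service type, so selecting from it directly would repeat each range), and COALESCE shows 0 instead of NULL where a service has no rows in a range.
select w.fromtime, w.totime,
coalesce(s1.spancount, 0) as service1,
coalesce(s2.spancount, 0) as service2,
coalesce(s3.spancount, 0) as service3,
coalesce(s4.spancount, 0) as service4
from (select distinct fromtime, totime from summary) w
left join (select * from summary where service_type = 'SERVICE1') s1
on s1.fromtime=w.fromtime and s1.totime=w.totime
left join (select * from summary where service_type = 'SERVICE2') s2
on s2.fromtime=w.fromtime and s2.totime=w.totime
left join (select * from summary where service_type = 'SERVICE3') s3
on s3.fromtime=w.fromtime and s3.totime=w.totime
left join (select * from summary where service_type = 'SERVICE4') s4
on s4.fromtime=w.fromtime and s4.totime=w.totime
order by w.fromtime;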
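As the question suggests, the same pivot can also be written with conditional aggregation, one SUM per service type, which avoids the four joins and returns 0 rather than NULL for missing combinations. The sketch below runs it against an in-memory SQLite database from Python so it can be tried without a MySQL server. SQLite has no IF() function, so it uses the portable SUM(CASE ...) form, and substr() stands in for MySQL's HOUR(); in MySQL, SUM(IF(service_type = 'SERVICE1', 1, 0)) is equivalent. Rows 1004 and 1005 are made-up test data.

```python
import sqlite3

# Conditional aggregation: each SUM counts only the rows of one service type.
PIVOT_SQL = """
SELECT substr(time, 1, 2) || ':00' AS period,
       SUM(CASE WHEN service_type = 'SERVICE1' THEN 1 ELSE 0 END) AS service1,
       SUM(CASE WHEN service_type = 'SERVICE2' THEN 1 ELSE 0 END) AS service2,
       SUM(CASE WHEN service_type = 'SERVICE3' THEN 1 ELSE 0 END) AS service3,
       SUM(CASE WHEN service_type = 'SERVICE4' THEN 1 ELSE 0 END) AS service4
  FROM service_data
 GROUP BY period
 ORDER BY period
"""

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE service_data (
    services_id INTEGER PRIMARY KEY, service_type TEXT,
    day TEXT, time TEXT, customer_id INTEGER)""")
conn.executemany(
    "INSERT INTO service_data VALUES (?, ?, ?, ?, ?)",
    [
        (1001, "SERVICE1", "Mon", "09:50:00", 1),   # rows 1001-1003 from the question
        (1002, "SERVICE2", "Tue", "10:32:00", 65),
        (1003, "SERVICE3", "Wed", "08:59:00", 4),
        (1004, "SERVICE1", "Thu", "09:05:00", 2),   # hypothetical extra rows
        (1005, "SERVICE4", "Fri", "09:00:00", 3),
    ],
)

for row in conn.execute(PIVOT_SQL):
    print(row)
```

The same pattern extends to the quadrant breakdown in the second part of the question: join customer_data on customer_id and add one SUM per (service, quadrant) pair, such as SUM(CASE WHEN service_type = 'SERVICE1' AND quadrant = 'NE' THEN 1 ELSE 0 END). The two-level column headers would still have to be produced by the presentation layer.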

Take previous row value if current row value is NULL

I have a table with transactions by wallet. Using data from this table, I need to get the balance of each wallet in each month.
The list of months must be derived from the transaction dates in the same table. So if the first transaction was at 2012-01-14 23:44:12 and the last one is in the current month (2015-10), I should have a list of year-months like this:
2012-01
2012-02
2012-03
...
2015-10
To get the list of available year-month combinations I use the following subquery:
SELECT DISTINCT DATE_FORMAT( `created` , '%Y-%m' ) AS `d` FROM `transactions`
and it should be good enough, because I'm sure there is at least one transaction every month (not for each wallet, but overall).
What I need to achieve is a list with a row for each available month + wallet_id + max balance for that month. If a wallet had no transaction in a given month, I need to take the balance from the previous month. So the result data must look like this:
Month | Wallet | Balance
2012-01 | 234 | 111.10
2012-02 | 234 | 45.29
2012-03 | 234 | 45.29 (no transaction in 2012-03, so take value from prev month)
2012-04 | 234 | 45.29 (no transaction in 2012-04, so take value from prev month)
2012-05 | 234 | 45.29 (no transaction in 2012-05, so take value from prev month)
2012-06 | 234 | 45.29 (no transaction in 2012-06, so take value from prev month)
2012-07 | 234 | 14.32 (new transaction in 2012-07, so calculate new value)
And I need this for every wallet ID.
The transactions table has the following structure:
CREATE TABLE IF NOT EXISTS `transactions` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`wallet_id` int(11) NOT NULL,
`credit` decimal(7,2) NOT NULL DEFAULT '0.00',
`bonus` decimal(7,2) NOT NULL DEFAULT '0.00',
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `wallet_id` (`wallet_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
So far I have problems even getting a list of all available year-months + wallet id + balance, where balance is NULL if there was no transaction for the wallet in that month. So for now I want at least a list like this (compared with the table above, which is what I eventually need):
Month | Wallet | Balance
2012-01 | 234 | 111.10
2012-02 | 234 | 45.29
2012-03 | 234 | NULL
2012-04 | 234 | NULL
2012-05 | 234 | NULL
2012-06 | 234 | NULL
2012-07 | 234 | 14.32
Once I have this, the next step is to take the value from the previous row when the current row has a NULL balance.
My query looks like this so far:
SELECT dates.*, `t`.*
FROM (
SELECT DISTINCT DATE_FORMAT( `created` , '%Y-%m' ) AS `d`
FROM `transactions`
) AS `dates`
LEFT JOIN (
SELECT `wallet_id` ,
MAX( `credit` + `bonus` ) AS `balance` ,
DATE_FORMAT( `created` , '%Y-%m' ) AS `date`
FROM `transactions`
GROUP BY `wallet_id` , `date`
) AS `t` ON ( `t`.`date` = `dates`.`d` )
ORDER BY `t`.`wallet_id`, `t`.`date`
But all months in which the wallet had no transaction are missing from the result. So I get something like this:
Month | Wallet | Balance
2012-01 | 234 | 111.10
2012-02 | 234 | 45.29
2012-07 | 234 | 14.32
How can I modify the query to get the list in the form I need?
And even better, how can I achieve my final goal, where all months are present and a NULL balance is replaced with the value from the previous row?
One important point: I'm going to use the result of the query as a kind of view (actually a derived table in a BI tool).

Generate sequence
As far as I can see, you need a row generator in MySQL, but there isn't a built-in one. See How do I make a row generator in MySQL?
Try the 'generate_series' procedure at:
generate many rows with mysql
Search existing DB for months sequence
As an alternative solution, I'd find a way to store every month value in the DB (so you don't have to generate the month sequence). If you don't have a lot of inserts, you can write a trigger that checks whether there is data for the previous months and creates an empty transaction if there isn't.
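Because the month list already comes from the transactions table, the final goal can also be sketched without a stored sequence: cross join the distinct months with the distinct wallets, then, for each pair, take MAX(credit + bonus) from the latest month, up to and including that one, in which the wallet had a transaction. The sketch below runs this in SQLite from Python so it can be tried anywhere; in MySQL, strftime('%Y-%m', created) would become DATE_FORMAT(created, '%Y-%m'). The sample rows are invented for the test, and correlated subqueries like these may be slow on a large table.

```python
import sqlite3

# For each (month, wallet) pair: balance from the wallet's latest month <= this month.
# Months before a wallet's first transaction come back as NULL (None in Python).
CARRY_FORWARD_SQL = """
SELECT m.d AS month,
       w.wallet_id,
       (SELECT MAX(t.credit + t.bonus)
          FROM transactions t
         WHERE t.wallet_id = w.wallet_id
           AND strftime('%Y-%m', t.created) =
               (SELECT MAX(strftime('%Y-%m', t2.created))
                  FROM transactions t2
                 WHERE t2.wallet_id = w.wallet_id
                   AND strftime('%Y-%m', t2.created) <= m.d)) AS balance
  FROM (SELECT DISTINCT strftime('%Y-%m', created) AS d FROM transactions) AS m
 CROSS JOIN (SELECT DISTINCT wallet_id FROM transactions) AS w
 ORDER BY w.wallet_id, m.d
"""

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE transactions (
    id INTEGER PRIMARY KEY, wallet_id INTEGER NOT NULL,
    credit REAL NOT NULL DEFAULT 0, bonus REAL NOT NULL DEFAULT 0,
    created TEXT NOT NULL)""")
conn.executemany(
    "INSERT INTO transactions (wallet_id, credit, bonus, created) VALUES (?, ?, ?, ?)",
    [
        (234, 111.10, 0, "2012-01-14 23:44:12"),
        (234, 45.29, 0, "2012-02-03 10:00:00"),
        (234, 14.32, 0, "2012-07-01 09:00:00"),
        (235, 10.00, 0, "2012-03-05 12:00:00"),   # wallet 235 supplies months 03-06
        (235, 20.00, 5, "2012-04-05 12:00:00"),
        (235, 30.00, 0, "2012-05-05 12:00:00"),
        (235, 40.00, 0, "2012-06-05 12:00:00"),
    ],
)

rows = conn.execute(CARRY_FORWARD_SQL).fetchall()
for row in rows:
    print(row)
```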

MySQL subquery for selecting latest items returns too many entities

I've been stuck on this problem for hours now. I hope you can help me.
I have a table which contains some articles on different inventory locations. There's also a column which describes the date when the current state was noticed.
I'm trying to write a query that returns the entities for
- a specific article
- for every inventory location
- only one entry for every inventory location, namely the latest entry up to a specific date.
So, this is my table:
CREATE TABLE `article_stock` (
`id` bigint(20) NOT NULL,
`stock` double NOT NULL,
`date` datetime DEFAULT NULL,
`inventory_location` varchar(255) DEFAULT NULL,
`article` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`),
CONSTRAINT `FK_krgmyglif194cjh9t1ndmse6n` FOREIGN KEY (`article`)
REFERENCES `article` (`article`)
);
So, I tried several approaches. But I can't solve my problem.
One more example:
I use this query:
SELECT * FROM article_stock WHERE article_stock.date <= "2015-10-12 00:00:00" AND article_stock.article = 5656
id stock date inventory_location article
6310 1058.68 2015-10-10 00:00:00 A64 5656
6311 561.08 2015-10-11 00:00:00 A64 5656
6312 140.92 2015-10-12 00:00:00 A64 5656
6314 20.06 2015-10-10 00:00:00 K16 5656
6315 600 2015-10-11 00:00:00 K16 5656
I want to get the IDs 6312 and 6315.
Can someone help me? :-(
Thank you! :-)
EDIT:
It seems like it's the same problem as described here: Retrieving the last record in each group
But that's not true.
The question there is how to retrieve the latest record. But I want to get the latest record up to a specific date FOR EVERY grouped element...
Let me explain:
I changed the most popular solution for fitting in my situation:
select
a.*
from
article_stock a
inner join
(select inventory_location, max(date) as datecol
from article_stock
WHERE date <= "2015-10-11 00:00:00"
group by inventory_location) as b
ON (a.inventory_location = b.inventory_location
AND a.date = b.datecol)
WHERE article = 5656;
It returns two rows:
id stock date inventory_location article
6311 561.08 2015-10-11 00:00:00 A64 5656
6315 600 2015-10-11 00:00:00 K16 5656
But when I change the date in the where clause to 2015-10-12 it returns only one single row:
id stock date inventory_location article
6312 140.92 2015-10-12 00:00:00 A64 5656
But the correct solution would be:
id stock date inventory_location article
6312 140.92 2015-10-12 00:00:00 A64 5656
6315 600 2015-10-11 00:00:00 K16 5656
I can't assume that every "inventory_location" change happened on the same date! :-(

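I think your later query is almost there, but you need to check the article number in the subquery as well (i.e., I presume the max date for an inventory location may differ between articles). Otherwise the subquery can pick up a date from another article's row at that location, which then matches no row of article 5656: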
SELECT a.*
FROM article_stock a
INNER JOIN
(
SELECT article, inventory_location, MAX(`date`) AS max_date
FROM article_stock
WHERE `date` <= "2015-10-12 00:00:00"
GROUP BY article, inventory_location
) b
ON a.article = b.article
AND a.inventory_location = b.inventory_location
AND a.`date` = b.max_date
WHERE a.article = 5656
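To see why the article has to be part of the subquery, the sketch below runs the corrected query in SQLite from Python against the rows from the question, plus one hypothetical row for a different article (9999) at location K16 dated 2015-10-12. Without article in the GROUP BY, that row would push K16's maximum date to 2015-10-12, a date article 5656 does not have at K16, and id 6315 would drop out; grouping by article as well keeps it.

```python
import sqlite3

# Latest row per (article, inventory_location) up to a cutoff date.
LATEST_PER_LOCATION_SQL = """
SELECT a.id
  FROM article_stock a
  JOIN (SELECT article, inventory_location, MAX(date) AS max_date
          FROM article_stock
         WHERE date <= ?
         GROUP BY article, inventory_location) b
    ON a.article = b.article
   AND a.inventory_location = b.inventory_location
   AND a.date = b.max_date
 WHERE a.article = ?
 ORDER BY a.id
"""

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE article_stock (
    id INTEGER PRIMARY KEY, stock REAL NOT NULL, date TEXT,
    inventory_location TEXT, article INTEGER)""")
conn.executemany(
    "INSERT INTO article_stock VALUES (?, ?, ?, ?, ?)",
    [
        (6310, 1058.68, "2015-10-10 00:00:00", "A64", 5656),
        (6311, 561.08, "2015-10-11 00:00:00", "A64", 5656),
        (6312, 140.92, "2015-10-12 00:00:00", "A64", 5656),
        (6314, 20.06, "2015-10-10 00:00:00", "K16", 5656),
        (6315, 600, "2015-10-11 00:00:00", "K16", 5656),
        (7000, 5.0, "2015-10-12 00:00:00", "K16", 9999),  # hypothetical other article
    ],
)

ids = [row[0] for row in conn.execute(LATEST_PER_LOCATION_SQL, ("2015-10-12 00:00:00", 5656))]
print(ids)
```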

most active time of day based on start and end time

I'm logging statistics of the gamers in my community. For both their online and in-game states I'm registering when they "begin" and when they "end". In order to show the most active day and hour of the day I'd like to use an SQL statement that measures the most active moments based on the "begin" and "end" datetime values.
Looking at SQL - select most 'active' time from db I can see similarities, but I need to also include the moments between the start and end time.
Perhaps the easiest way is to write a cron that does the calculations, but I hope this question might teach me how to address this issue in SQL instead.
I've been searching for an SQL statement that allows me to create a datetime period and use it to subtract single hours and days, but to no avail.
--- update
As I think more about this, I'm wondering whether it might be wise to run 24 queries, one for each hour of the day (for the most active hour), and several queries for the most active day. But that seems like a waste of performance. This approach might, however, make a query like this possible:
SELECT COUNT(`userID`), DATE_FORMAT(started, '%H') AS starthour,
DATE_FORMAT(ended, '%H') AS endhour
FROM gameactivity
WHERE HOUR(started) >= $hour
AND HOUR(ended) <= $hour GROUP BY `userID`
($hour is added for example purposes, of course I'm using PDO. Columns are also just for example purposes, whatever you think is easy for you to use in explaining that is identifiable as start and end is ok with me)
Additional information; PHP 5.5+, PDO, MySQL 5+
Table layout for ingame would be: gameactivity: activityid, userid, gameid, started, ended
DDL:
CREATE TABLE IF NOT EXISTS `steamonlineactivity` (
`activityID` int(13) NOT NULL AUTO_INCREMENT,
`userID` varchar(255) NOT NULL,
`online` datetime DEFAULT NULL,
`offline` datetime DEFAULT NULL,
PRIMARY KEY (`activityID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;

If I understood your requirements correctly, if this graph represents user activity:
Day
12/1 12/2 12/3 12/4 ...
Hour 0 xx x x xx
1 x xx xx
2 xxx x x xx
3 x x
4 x x
5 x x
6 x
...
You want to know that 02:00 is the time of the day with the highest average activity (a row with 7 x), and 12/4 was most active day (a column with 10 x). Note that this doesn't imply that 02:00 of 12/4 was the most active hour ever, as you can see in the example. If this is not what you want please clarify with concrete examples of input and desired result.
We make a couple of assumptions:
An activity record can start on one date and finish on the next one. For instance: online 2013-12-02 23:35, offline 2013-12-03 00:13.
No activity record has a duration longer than 23 hours, or the number of such records is negligible.
And we need to define what 'activity' means. I picked the criteria that were easiest to compute in each case; both can be made more accurate if needed, at the cost of more complex queries.
The most active time of day is the hour that overlaps the most activity records. Note that if a user starts and stops more than once during the hour, they will be counted more than once.
The most active day is the one with the most unique users who were active at any time during that day.
For the most active time of day we'll use a small auxiliary table holding the 24 possible hours. It can also be generated and joined on the fly with the techniques described in other answers.
CREATE TABLE hour ( hour tinyint not null, primary key(hour) );
INSERT hour (hour)
VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)
, (11), (12), (13), (14), (15), (16), (17), (18), (19), (20)
, (21), (22), (23);
Then the following queries give the required results:
SELECT hour, count(*) AS activity
FROM steamonlineactivity, hour
WHERE ( hour BETWEEN hour(online) AND hour(offline)
OR hour(online) BETWEEN hour(offline) AND hour
OR hour(offline) BETWEEN hour AND hour(online) )
GROUP BY hour
ORDER BY activity DESC;
SELECT date, count(DISTINCT userID) AS activity
FROM (
SELECT userID, date(online) AS date
FROM steamonlineactivity
UNION
SELECT userID, date(offline) AS date
FROM steamonlineactivity
) AS x
GROUP BY date
ORDER BY activity DESC;

You need a sequence to get values for hours with no activity events (e.g. hours in which nobody started or finished, but people who had started earlier and not yet finished were on-line). Unfortunately, there is no nice way to create a sequence in MySQL, so you will have to create it manually:
CREATE TABLE `hour_sequence` (
`ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`hour` datetime NOT NULL,
KEY (`hour`),
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
# this is not great
INSERT INTO `hour_sequence` (`hour`) VALUES
("2013-12-01 00:00:00"),
("2013-12-01 01:00:00"),
("2013-12-01 02:00:00"),
("2013-12-01 03:00:00"),
("2013-12-01 04:00:00"),
("2013-12-01 05:00:00"),
("2013-12-01 06:00:00"),
("2013-12-01 07:00:00"),
("2013-12-01 08:00:00"),
("2013-12-01 09:00:00"),
("2013-12-01 10:00:00"),
("2013-12-01 11:00:00"),
("2013-12-01 12:00:00");
Now create some test data:
CREATE TABLE `log_table` (
`ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`userID` bigint(20) unsigned NOT NULL,
`started` datetime NOT NULL,
`finished` datetime NOT NULL,
KEY (`started`),
KEY (`finished`),
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET latin1;
INSERT INTO `log_table` (`userID`,`started`,`finished`) VALUES
(1, "2013-12-01 00:00:12", "2013-12-01 02:25:00"),
(2, "2013-12-01 07:25:00", "2013-12-01 08:23:00"),
(1, "2013-12-01 04:25:00", "2013-12-01 07:23:00");
Now the query: for every hour we keep a tally (accumulation/running total/integral, etc.) of how many people had started a session before that hour:
SELECT
HS.hour as period_starting,
COUNT(LT.userID) AS starts
FROM `hour_sequence` HS
LEFT JOIN `log_table` LT ON HS.hour > LT.started
GROUP BY
HS.hour
And likewise how many people had gone off-line:
SELECT
HS.hour as period_starting,
COUNT(LT.userID) AS finishes
FROM `hour_sequence` HS
LEFT JOIN `log_table` LT ON HS.hour > LT.finished
GROUP BY
HS.hour
By subtracting the accumulation of people that had gone off-line at a point in time from the accumulation of people that have come on-line at that point in time we get the number of people who were on-line at that point in time (presuming there were zero people on-line when the data starts, of course).
SELECT
starts.period_starting,
starts.starts as users_started,
finishes.finishes as users_finished,
starts.starts - finishes.finishes as users_online
FROM
(
SELECT
HS.hour as period_starting,
COUNT(LT.userID) AS starts
FROM `hour_sequence` HS
LEFT JOIN `log_table` LT ON HS.hour > LT.started
GROUP BY
HS.hour
) starts
LEFT JOIN (
SELECT
HS.hour as period_starting,
COUNT(LT.userID) AS finishes
FROM `hour_sequence` HS
LEFT JOIN `log_table` LT ON HS.hour > LT.finished
GROUP BY
HS.hour
) finishes ON starts.period_starting = finishes.period_starting;
Now a few caveats. First of all, you will need a process to keep your sequence table populated with hourly timestamps as time progresses. Additionally, the accumulators do not scale well with large amounts of log data because of the inequality join (every hour row matches every earlier log row). It would be wise to constrain access to the log table by timestamp in both the starts and finishes subqueries, and the sequence table while you are at it.
SELECT
HS.hour as period_starting,
COUNT(LT.userID) AS finishes
FROM `hour_sequence` HS
-- the log_table filter goes in ON so that hours with no matching rows are still returned
LEFT JOIN `log_table` LT ON HS.hour > LT.finished
AND LT.finished BETWEEN ? AND ?
WHERE
HS.hour BETWEEN ? AND ?
GROUP BY
HS.hour
If you start constraining your log_table data to specific time ranges, bear in mind that you will have an offset issue if there were already people on-line at the point where you start looking at the log data. If 1000 people were on-line at that point and you then threw them all off the server, the query would make it look like we went from 0 people on-line to -1000 people on-line!
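The arithmetic behind the starts-minus-finishes query can be checked outside the database. The Python sketch below counts, for each hourly timestamp of the sequence table, how many sessions started before it and how many finished before it, and subtracts the two; `bisect_left` on the sorted start and finish times plays the role of the `HS.hour > LT.started` and `HS.hour > LT.finished` joins. It uses the test data above, the function name `users_online` is hypothetical, and like the query it assumes nobody was on-line before the first logged session.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# (userID, started, finished) rows inserted into log_table above
LOG = [
    (1, "2013-12-01 00:00:12", "2013-12-01 02:25:00"),
    (2, "2013-12-01 07:25:00", "2013-12-01 08:23:00"),
    (1, "2013-12-01 04:25:00", "2013-12-01 07:23:00"),
]


def users_online(log, first_hour, hours):
    """Return (hour, started_before, finished_before, online) for each hourly timestamp."""
    starts = sorted(datetime.strptime(row[1], FMT) for row in log)
    finishes = sorted(datetime.strptime(row[2], FMT) for row in log)
    result = []
    for i in range(hours):
        hour = first_hour + timedelta(hours=i)
        started = bisect_left(starts, hour)      # sessions with started < hour
        finished = bisect_left(finishes, hour)   # sessions with finished < hour
        result.append((hour.strftime("%H:%M"), started, finished, started - finished))
    return result


for row in users_online(LOG, datetime(2013, 12, 1), 13):
    print(row)
```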

@rsanchez had an amazing answer, but his query for the most active time of day behaves strangely for sessions that start and end within the same hour (a short session): it counts them as lasting 24 hours, because when the start and end hours are equal, the second and third BETWEEN conditions together match every hour.
Through trial and error I corrected that part of his query as follows:
SELECT hour, count(*) AS activity
FROM steamonlineactivity, hour
WHERE ( hour >= HOUR(online) AND hour <= HOUR(offline)
OR HOUR(online) > HOUR(offline) AND HOUR(online) <= hour
OR HOUR(offline) >= hour AND HOUR(offline) < HOUR(online) )
GROUP BY hour
ORDER BY activity DESC;
With the following structure:
CREATE TABLE hour ( hour tinyint not null, primary key(hour) );
INSERT hour (hour)
VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)
, (11), (12), (13), (14), (15), (16), (17), (18), (19), (20)
, (21), (22), (23);
CREATE TABLE `steamonlineactivity` (
`activityID` int(13) NOT NULL AUTO_INCREMENT,
`userID` varchar(255) NOT NULL,
`online` datetime DEFAULT NULL,
`offline` datetime DEFAULT NULL,
PRIMARY KEY (`activityID`)
);
INSERT INTO `steamonlineactivity` (`activityID`, `userID`, `online`, `offline`) VALUES
(1, '1', '2014-01-01 16:01:00', '2014-01-01 19:01:00'),
(2, '2', '2014-01-02 16:01:00', '2014-01-02 19:01:00'),
(3, '3', '2014-01-01 22:01:00', '2014-01-02 02:01:00'),
(4, '4', '2014-01-01 16:01:00', '2014-01-01 16:05:00');
The corrected query for the most active times outputs the following:
+------+----------+
| hour | activity |
+------+----------+
| 16 | 3 |
| 17 | 2 |
| 18 | 2 |
| 19 | 2 |
| 22 | 1 |
| 23 | 1 |
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
+------+----------+
whereas the original query gives the following erroneous result:
+------+----------+
| hour | activity |
+------+----------+
| 16 | 3 |
| 17 | 3 |
| 18 | 3 |
| 19 | 3 |
| 0 | 2 |
| 1 | 2 |
| 2 | 2 |
| 22 | 2 |
| 23 | 2 |
| 11 | 1 |
| 12 | 1 |
| 13 | 1 |
| 14 | 1 |
| 15 | 1 |
| 3 | 1 |
| 4 | 1 |
| 20 | 1 |
| 5 | 1 |
| 21 | 1 |
| 6 | 1 |
| 7 | 1 |
| 8 | 1 |
| 9 | 1 |
| 10 | 1 |
+------+----------+
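The corrected WHERE clause distinguishes two shapes of session: one whose start hour is not after its end hour covers every hour from start to end (including short sessions that start and end in the same hour), and one that wraps past midnight covers the hours from its start up to 23 and from 0 up to its end. The Python sketch below applies that rule to the sample rows so the counts can be checked against the corrected output above. The helper names are illustrative, and like the original answer it assumes no session lasts 24 hours or more.

```python
from collections import Counter
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (activityID, userID, online, offline) rows from the INSERT above
ACTIVITY = [
    (1, "1", "2014-01-01 16:01:00", "2014-01-01 19:01:00"),
    (2, "2", "2014-01-02 16:01:00", "2014-01-02 19:01:00"),
    (3, "3", "2014-01-01 22:01:00", "2014-01-02 02:01:00"),
    (4, "4", "2014-01-01 16:01:00", "2014-01-01 16:05:00"),
]


def hours_covered(on_hour, off_hour):
    """Hours of the day touched by a session assumed to last under 24 hours."""
    if on_hour <= off_hour:
        # Same-day case, including sessions that start and end in one hour.
        return list(range(on_hour, off_hour + 1))
    # Wrapped past midnight: on_hour..23, then 0..off_hour.
    return list(range(on_hour, 24)) + list(range(0, off_hour + 1))


def activity_by_hour(rows):
    counts = Counter()
    for _activity_id, _user_id, online, offline in rows:
        on = datetime.strptime(online, FMT).hour
        off = datetime.strptime(offline, FMT).hour
        counts.update(hours_covered(on, off))
    return counts


for hour, activity in activity_by_hour(ACTIVITY).most_common():
    print(hour, activity)
```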

This query is for Oracle, but you can get the idea from it:
SELECT
H, M,
COUNT(BEGIN)
FROM
-- temporary table that should return numbers from 0 to 1439
-- each number represents minute of the day, for example 0 represents 0:00, 100 represents 1:40, etc.
-- in oracle you can use CONNECT BY clause which is designated to do recursive queries
(SELECT LEVEL - 1 DAYMIN, FLOOR((LEVEL - 1) / 60) H, MOD((LEVEL - 1), 60) M FROM dual CONNECT BY LEVEL <= 1440) T LEFT JOIN
-- join stats to each row of T by discarding the date and converting the time to a minute of the day
STATS S ON 60 * TO_NUMBER(TO_CHAR(S.BEGIN, 'HH24')) + TO_NUMBER(TO_CHAR(S.BEGIN, 'MI')) <= T.DAYMIN AND
60 * TO_NUMBER(TO_CHAR(S.END, 'HH24')) + TO_NUMBER(TO_CHAR(S.END, 'MI')) > T.DAYMIN
GROUP BY H, M
HAVING COUNT(BEGIN) > 0
ORDER BY H, M
Fiddle: http://sqlfiddle.com/#!4/e5e31/9
The idea is to have a temp table or view with one row per time point, and to left join the stats to it. In my example there is one row for every minute of the day. In MySQL you can use user variables to create such a view on the fly.
MySQL version:
SELECT
FLOOR(T.DAYMIN / 60), -- hour
MOD(T.DAYMIN, 60), -- minute
-- T.DAYMIN, -- minute of the day
COUNT(S.BEGIN) -- count not null stats
FROM
-- temporary table that should return numbers from 0 to 1439
-- each number represents minute of the day, for example 0 represents 0:00, 100 represents 1:40, etc.
-- in mysql you must have some table which has at least 1440 rows;
-- I use INFORMATION_SCHEMA.COLLATIONS cross joined with itself for that purpose - it should exist
-- in every database
(
SELECT
@counter := @counter + 1 AS DAYMIN
FROM
INFORMATION_SCHEMA.COLLATIONS A CROSS JOIN
INFORMATION_SCHEMA.COLLATIONS B CROSS JOIN
(SELECT @counter := -1) C
LIMIT 1440
) T LEFT JOIN
-- join stats to each row of T by discarding the date and converting the time to a minute of the day
STATS S ON (
(60 * DATE_FORMAT(S.BEGIN, '%H')) + (1 * DATE_FORMAT(S.BEGIN, '%i')) <= T.DAYMIN AND
(60 * DATE_FORMAT(S.END, '%H')) + (1 * DATE_FORMAT(S.END, '%i')) > T.DAYMIN
)
GROUP BY T.DAYMIN
HAVING COUNT(S.BEGIN) > 0 -- filter empty counters
ORDER BY T.DAYMIN
Fiddle: http://sqlfiddle.com/#!2/de01c/1

I've been thinking this question over myself, and based on everyone's answers I think the following conclusions are reasonable:
In general, it's probably easiest to create a separate table holding the hours of the day and run inner selects against it. The examples without a separate table use many subselects, some nested four levels deep, which makes me believe they probably won't scale. Cron solutions came to mind as well, but the question was asked, out of curiosity, to focus on SQL queries rather than other solutions.
In my own case, and completely outside the scope of my own question, I believe the best solution is to create a separate table with three fields (hour [Y-m-d H], onlinecount, playingcount) that counts the number of people online and the number of people playing at a given hour. When a player stops playing or goes offline, we update the counts (+1) based on the start and end times. That way I can easily derive tables and graphs from this separate table.
Please let me know whether you come to the same conclusion. My thanks to @lolo, @rsanchez and @abasterfield. I wish I could split the bounty :)

sqlFiddle: this query gives you the period with the highest user count. The period can fall anywhere in the day; the query just returns the start time and end time of the period with the most users.
SELECT StartTime,EndTime,COUNT(*)as UserCount FROM
(
SELECT T3.StartTime,T3.EndTime,GA.Started,GA.Ended FROM
(SELECT starttime,(SELECT MIN(endtime) FROM
(SELECT DISTINCT started as endtime FROM gameactivity WHERE started BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
UNION
SELECT DISTINCT ended as endtime FROM gameactivity WHERE ended BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
)T1
WHERE T1.endtime > T2.starttime
)as endtime
FROM
(SELECT DISTINCT started as starttime FROM gameactivity WHERE started BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
UNION
SELECT DISTINCT ended as starttime FROM gameactivity WHERE ended BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
)T2
)T3,
GameActivity GA
WHERE T3.StartTime BETWEEN GA.Started AND GA.Ended
AND T3.EndTime BETWEEN GA.Started AND GA.Ended
)FinalTable
GROUP BY StartTime,EndTime
ORDER BY UserCount DESC
LIMIT 1
Just change the occurrences of '1970-01-01' to the date you're trying to get data from.
The query selects all the start and end times in the inner queries and creates intervals from them, then joins those intervals with GameActivity, counts the users active within each interval, and returns the interval with the highest user count (the most activity).
Here's an sqlFiddle with one less tier:
SELECT StartTime,EndTime,COUNT(*)as UserCount FROM
(
SELECT T3.StartTime,T3.EndTime,GA.Started,GA.Ended FROM
(SELECT DISTINCT started as starttime,(SELECT MIN(ended)as endtime FROM
gameactivity T1 WHERE ended BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
AND T1.ended > T2.started
)as endtime
FROM
gameactivity T2
WHERE started BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
)T3,
GameActivity GA
WHERE T3.StartTime BETWEEN GA.Started AND GA.Ended
AND T3.EndTime BETWEEN GA.Started AND GA.Ended
)FinalTable
GROUP BY StartTime,EndTime
ORDER BY UserCount DESC
LIMIT 1
Or, if, going by the query in your question above, you don't care about dates but only about hour statistics across all dates, the query below might do it for you (sqlFiddle). Note that your query just looks at the HOUR of started and ended, and so ignores users who play longer than one hour.
SELECT COUNT(*) as UserCount,
HOURSTABLE.StartHour,
HOURSTABLE.EndHour
FROM
(SELECT @hour as StartHour,
@hour:=@hour + 1 as EndHour
FROM
gameActivity as OrAnyTableWith24RowsOrMore,
(SELECT @hour:=0)as InitialValue
LIMIT 24) as HOURSTABLE,
gameActivity GA
WHERE HOUR(GA.started) >= HOURSTABLE.StartHour
AND HOUR(GA.ended) <= HOURSTABLE.EndHour
GROUP BY HOURSTABLE.StartHour,HOURSTABLE.EndHour
ORDER BY UserCount DESC
LIMIT 1
Just delete the LIMIT 1 if you want to see the user count for the other hours as well.

The easiest solution is to run a cron job at the top of each hour that counts the users who have a start time but no end time (a NULL end time, if you reset it when they log in) and logs that count. This gives you the number of users logged in at each hour without needing funky schema changes or wild queries.
When you check the next hour, anyone who has since logged out falls out of your results. This query works if you reset the end time when a user logs in:
SELECT CONCAT(CURDATE(), ' ', HOUR(NOW()), ' ', COUNT(*)) FROM activity WHERE DATE(start) = CURDATE() AND end IS NULL;
Then you can log this to your heart's content, to a file or to another table (of course, you might need to adjust the SELECT to suit your log table). For example, you can have a table that holds one entry per day.
Assume a log table like:
current_date | peak_hour | peak_count
SELECT IF(peak_count < $peak_count, true, false) FROM log WHERE DATE(`current_date`) = CURDATE();
where $peak_count is a variable coming from your cron job. If you find that you have a new, bigger peak count, do an update; if the record does not exist for the day, insert one into log. Otherwise you have not beaten the peak_hour from earlier in the day, so don't update. This means each day gives you only one row in your table. Then you don't need to do any aggregation: the date and peak hour are all right there for you to see over the course of a week, a month or whatever.
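The cron-side decision described above (insert the first reading of the day, update only when a new peak appears, otherwise leave the row alone) can be sketched as follows. This is Python with SQLite standing in for MySQL and the PHP cron script; the function name `record_peak`, the column name `day` (used instead of `current_date`, which is a reserved word in both MySQL and SQLite), and the sample counts are all illustrative assumptions.

```python
import sqlite3


def record_peak(conn, day, hour, count):
    """Keep one row per day holding the highest hourly count seen so far."""
    row = conn.execute("SELECT peak_count FROM log WHERE day = ?", (day,)).fetchone()
    if row is None:
        # First reading of the day: insert it.
        conn.execute("INSERT INTO log (day, peak_hour, peak_count) VALUES (?, ?, ?)",
                     (day, hour, count))
    elif count > row[0]:
        # New peak: overwrite the hour and the count.
        conn.execute("UPDATE log SET peak_hour = ?, peak_count = ? WHERE day = ?",
                     (hour, count, day))
    # Otherwise the earlier peak stands and nothing changes.


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (day TEXT PRIMARY KEY, peak_hour INTEGER, peak_count INTEGER)")

# Hypothetical hourly counts, as the cron job might report them.
for hour, count in [(9, 12), (10, 30), (11, 25)]:
    record_peak(conn, "2013-12-01", hour, count)

print(conn.execute("SELECT * FROM log").fetchall())
```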

How do I display the number of records per month (including zero) from a MySQL Database?

AMENDED 24/11/2012 based on comments below.
I have a MySQL database (v5.0.95) of members which records when they joined:
CREATE TABLE IF NOT EXISTS `members` (
`id` int(11) NOT NULL,
`group_id` int(11) NOT NULL,
`name` varchar(64) NOT NULL,
`joined` datetime NOT NULL,
KEY `id` (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
I also have a second table which will have records indicating which YYYY-MM
CREATE TABLE IF NOT EXISTS `blocker` (
`group_id` int(11) NOT NULL,
`YYYYMM` char(7) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Ideally, the following join
SELECT date_format( m.joined, '%Y-%m' ) AS DateJoined, count( * ) AS NumJoined
FROM members AS m
LEFT OUTER JOIN blocker AS b
ON b.YYYYMM = date_format( m.joined, '%Y-%m')
WHERE m.group_id =1637017 AND b.group_id =1637017
GROUP BY DateJoined
ORDER BY DateJoined ASC
would give me this
DateJoined NumJoined
2012-01 0
2012-02 0
2012-03 0
2012-04 17
2012-05 0
2012-06 12
2012-07 10
2012-08 10
2012-09 11
2012-10 14
2012-11 4
Unfortunately, it does not return the zero-count months; it gives me this:
DateJoined NumJoined
2012-04 17
2012-06 12
2012-07 10
2012-08 10
2012-09 11
2012-10 14
2012-11 4
Any pointers would be appreciated. Am I close...?

SQL databases can't produce data for you where it doesn't exist. If no one joined in a particular month, you can't have the database magically produce that month out of nothing.
If you want to force it, you'll need a temp table listing the individual months in the range you want; then you can join against that temp table and get your 0 counts.

You could create a Year/Month table and use a left join to the member table.

Create another table (named, for example, dateCalender) storing a value for each month, as follows:
2012-01
2012-02
2012-03
2012-04
2012-05
2012-06
2012-07
2012-08
2012-09
2012-10
2012-11
Now retrieve the results using a left outer join of this new table with your original table. Months with matching rows return their values; months without any return NULL (which can be shown as 0).
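A minimal sketch of the calendar-table approach, run from Python against SQLite (strftime stands in for MySQL's DATE_FORMAT(m.joined, '%Y-%m')). Two details matter: COUNT(m.id) counts only matched rows, so a month with no members reports 0 rather than NULL, and the group filter sits in the ON clause, because a WHERE condition on the joined table's columns (as in the question) discards the unmatched months again. The table name dateCalender follows this answer; its column name ym and the member rows are made up.

```python
import sqlite3

MONTHLY_JOINS_SQL = """
SELECT c.ym AS DateJoined,
       COUNT(m.id) AS NumJoined   -- counts matched rows only, so empty months give 0
  FROM dateCalender c
  LEFT JOIN members m
    ON strftime('%Y-%m', m.joined) = c.ym
   AND m.group_id = ?             -- filter in ON, not WHERE, to keep empty months
 GROUP BY c.ym
 ORDER BY c.ym
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dateCalender (ym TEXT PRIMARY KEY)")  # one row per month
conn.executemany("INSERT INTO dateCalender VALUES (?)",
                 [(f"2012-{month:02d}",) for month in range(1, 7)])
conn.execute("""CREATE TABLE members (
    id INTEGER, group_id INTEGER NOT NULL, name TEXT NOT NULL, joined TEXT NOT NULL)""")
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?, ?)",
    [
        (1, 1637017, "a", "2012-04-02 10:00:00"),
        (2, 1637017, "b", "2012-04-20 18:30:00"),
        (3, 1637017, "c", "2012-06-11 09:15:00"),
        (4, 42, "d", "2012-02-01 12:00:00"),  # different group: must not be counted
    ],
)

rows = conn.execute(MONTHLY_JOINS_SQL, (1637017,)).fetchall()
for row in rows:
    print(row)
```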
