Group by date from multiple columns? - mysql

First of all, sorry for the title, but I have no idea how to describe this:
I'm saving sessions in my table and I would like to get the count of sessions per hour to know how many sessions were active over the day. The sessions are specified by two timestamps: start and end.
Hopefully you can help me.
Here we go:
http://sqlfiddle.com/#!2/bfb62/2/0

While I'm still not sure how you'd like to compare the start and end dates, it looks like you could get your desired results using COUNT, YEAR, MONTH, DAY, and HOUR.
Possibly something similar to this:
SELECT YEAR(Start), MONTH(Start), DAY(Start), HOUR(Start), COUNT(ID)
FROM Sessions
GROUP BY YEAR(Start), MONTH(Start), DAY(Start), HOUR(Start)
And the SQL Fiddle.

What you want to do is rather hard in MySQL. You can, however, get an approximation without too much difficulty. The following counts up users who start and stop within one day:
select date(start), hours.hour,
sum(case when hours.hour between hour(s.start) and hour(s.end) then 1 else 0
end) as GoodEstimate
from sessions s cross join
(select 0 as hour union all
select 1 union all
. . .
select 23
) hours
group by date(start), hour
When a user spans multiple days, the query is harder. Here is one approach, that assumes that there exists a user who starts during every hour:
select thehour, count(*)
from (select distinct date(start), hour(start),
cast(date(start) as datetime) + interval hour(start) hour as thehour
from sessions
) dh left outer join
sessions s
on s.start <= thehour + interval 1 hour and
s.end >= thehour
group by thehour
Note: these are untested so might have syntax errors.

OK, this is another problem where the index table comes to the rescue.
An index table is something that everyone should have in their toolkit, preferably in the master database. It is a table with a single id int primary key indexed column containing sequential numbers from 0 to n where n is a number big enough to do what you need, 100,000 is good, 1,000,000 is better. You only need to create this table once but once you do you will find it has all kinds of applications.
For your problem you need to consider each hour and, if I understand correctly, count every session that started before the end of the hour and hasn't ended before that hour starts.
Here is the SQL fiddle for the solution.
What it does is use a known sequential number from the indextable (only 0 to 100 for this fiddle - just over 4 days - you can see why you need a big n) to link with your data at the top and bottom of the hour.
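Since the fiddle itself is not reproduced here, the following is only a rough sketch of the idea, not the original solution. It assumes a table sessions(id, start, end) and a numbers table indextable(id) holding 0, 1, 2, ...; the user variable @first_hour and its value are hypothetical and mark the first hour you want to report on.

```sql
-- Assumed schema: sessions(id, start, `end`), indextable(id) with 0..n.
-- @first_hour is a hypothetical starting point for the report.
SET @first_hour = '2013-01-01 00:00:00';

SELECT @first_hour + INTERVAL i.id HOUR AS hour_start,
       COUNT(s.id)                      AS active_sessions
FROM indextable AS i
LEFT JOIN sessions AS s
       -- session started before the end of this hour ...
       ON s.start < @first_hour + INTERVAL (i.id + 1) HOUR
       -- ... and had not ended before this hour started
      AND s.`end` >= @first_hour + INTERVAL i.id HOUR
WHERE i.id BETWEEN 0 AND 23            -- one day; widen for a longer range
GROUP BY i.id
ORDER BY i.id;
```

The LEFT JOIN together with COUNT(s.id) keeps hours in which no session was active and reports them as 0.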

Related

Query with three tables, no common column

I've just started a job and my boss wants me to learn MySQL, so please bear with me: I've been learning for only 2 days and I'm not that good at it yet.
So I've been given 3 tables and several tasks to do.
The tables are:
mobile_log_messages_sms
mobile_providers
service_instances
And in them I've got to:
Find out how many messages there were in the last 25 days and how much income they made.
Then I need to group them by day (so per day, excluding hours) and provider name.
Also I need to ignore all the messages that have an empty string under the service column.
Also I need to ignore the messages that made 0 income and count only those that have the column service_enabled = 1.
And then I need to sort it descending, by date.
in the tables
mobile_log_messages_sms:
message_id - used to count the messages
price - used for the price obviously, exclude those with 0
time - date in yyyy/mm/dd hh:mm:ss format
service - exclude all those that have an empty string (or null)
mobile_providers
provider_name - to use to group with
service_instances
enabled - only use if value is 1
I've started with:
SELECT message_id, price, time
FROM mobile_log_messages_sms
WHERE time BETWEEN '2017-02-26 00:00:00'
AND '2017-03-22 00:00:00'
But I need to change the date format and then use the JOIN commands, but I don't know how, and I know I need to add more to it, but I'm stumped even at the start. Also, what I've started with just lists the messages, but I need to count the total sum of the income (price) per day.
Can anyone point me in the right direction at least, since I'm still a noob? Many thanks in advance, and sorry if I worded something badly; English is not my first language.
Find out how many messages there were in the last 25 days and how much income did they make
1.
SELECT COUNT(message_id), SUM(price)
FROM mobile_log_messages_sms
WHERE CAST(time AS DATE) BETWEEN DATE_SUB(CURRENT_DATE,INTERVAL 25 DAY)
AND CURRENT_DATE;
2.
SELECT CAST(time AS DATE) AS day, COUNT(message_id), SUM(price)
FROM mobile_log_messages_sms
WHERE CAST(time AS DATE) BETWEEN DATE_SUB(CURRENT_DATE,INTERVAL 25 DAY)
AND CURRENT_DATE
GROUP BY CAST(time AS DATE);
3.
SELECT CAST(time AS DATE) AS day, COUNT(message_id), SUM(price)
FROM mobile_log_messages_sms
WHERE CAST(time AS DATE) BETWEEN DATE_SUB(CURRENT_DATE,INTERVAL 25 DAY)
AND CURRENT_DATE AND service IS NOT NULL AND service <> ''
GROUP BY CAST(time AS DATE);
The rest can't be done without a join, and a join needs at least one column that the tables have in common.
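If the tables do turn out to share key columns, the remaining tasks could be combined roughly as follows. This is only a sketch: the join columns provider_id and service_id are hypothetical, since the question does not say how the three tables relate, and must be replaced by whatever columns actually link them.

```sql
-- provider_id and service_id are HYPOTHETICAL join columns;
-- the question does not show how the tables are linked.
SELECT DATE(m.time)        AS day,
       p.provider_name,
       COUNT(m.message_id) AS messages,
       SUM(m.price)        AS income
FROM mobile_log_messages_sms AS m
JOIN mobile_providers  AS p ON p.provider_id = m.provider_id
JOIN service_instances AS s ON s.service_id  = m.service_id
WHERE m.time >= CURRENT_DATE - INTERVAL 25 DAY   -- last 25 days
  AND m.service <> ''    -- also drops NULLs, because NULL <> '' is not true
  AND m.price > 0        -- ignore messages that made no income
  AND s.enabled = 1      -- only enabled service instances
GROUP BY DATE(m.time), p.provider_name
ORDER BY day DESC;
```

Filtering on m.time directly, rather than on CAST(time AS DATE), lets MySQL use an index on that column if one exists.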

MySQL cumulative sum grouped by date

I know there have been a few posts related to this, but my case is a little bit different and I wanted to get some help on this.
I need to pull some data out of the database that is a cumulative count of interactions by day. Currently this is what I have:
SELECT
e.Date AS e_date,
count(e.ID) AS num_interactions
FROM example AS e
JOIN example e1 ON e1.Date <= e.Date
GROUP BY e.Date;
The output of this is close to what I want but not exactly what I need.
The problem I'm having is the dates are stored with the hour minute and second that the interaction happened, so the group by is not grouping days together.
This is what the output looks like.
On 12-23 there are 5 interactions, but they are not grouped because the timestamps differ. So I need to find a way to ignore the time and just look at the day.
If I try GROUP BY DAY(e.Date) it groups the data by the day only (i.e everything that happened on the 1st of any month is grouped into one row) and the output is not what I want at all.
GROUP BY DAY(e.Date), MONTH(e.Date) is splitting it up by month and the day of the month, but again the count is off.
I'm not a MySQL expert at all so I'm puzzled on what i'm missing
New Answer
At first, I didn't understand you were trying to do a running total. Here is how that would look:
SET @runningTotal = 0;
SELECT
e_date,
num_interactions,
@runningTotal := @runningTotal + totals.num_interactions AS runningTotal
FROM
(SELECT
DATE(e.Date) AS e_date,
COUNT(*) AS num_interactions
FROM example AS e
GROUP BY DATE(e.Date)) totals
ORDER BY e_date;
Original Answer
You could be getting duplicates because of your join. Maybe e1 has more than one match for some rows which is inflating your count. Either that or the comparison in your join is also comparing the seconds, which is not what you expect.
Anyhow, instead of chopping the datetime field into days and months, just strip the time from it. Here is how you do that.
SELECT
DATE(e.Date) AS e_date,
count(e.ID) AS num_interactions
FROM example AS e
JOIN example e1 ON DATE(e1.Date) <= DATE(e.Date)
GROUP BY DATE(e.Date);
I figured out what I needed to do last night... but since I'm new to this I couldn't post it then... what I did that worked was this:
SELECT
DATE(e.Date) AS e_date,
count(e.ID) AS num_daily_interactions,
(
SELECT
COUNT(id)
FROM example
WHERE DATE(Date) <= e_date
) as total_interactions_per_day
FROM example AS e
GROUP BY e_date;
Would that be less efficient than your query? I may just do the calculation in Python after pulling out the count per day if it's more efficient, because this will be on the scale of thousands to hundreds of thousands of rows returned.
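On efficiency: the correlated subquery rescans the table for every day, so it will likely get slow as the table grows. If you happen to be on MySQL 8.0 or later (an assumption, since nothing in this thread says which version is in use), a window function computes the running total in a single pass. A minimal sketch against the same example table:

```sql
-- Requires MySQL 8.0+ (window functions).
SELECT DATE(e.Date) AS e_date,
       COUNT(*)     AS num_interactions,
       -- running total of the per-day counts, in date order
       SUM(COUNT(*)) OVER (ORDER BY DATE(e.Date)) AS running_total
FROM example AS e
GROUP BY DATE(e.Date)
ORDER BY e_date;
```

On older versions, the user-variable query above or summing in application code after fetching the per-day counts are the usual alternatives.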

Calculating patient census by hour

I am trying to build a query that calculates the number of patients in the emergency room by hour. I have each patient's arrival and departure times. I tried building a boolean-style query, but all it did was give me the arrivals by hour using this logic
SELECT MRN
,CASE WHEN CAST(EDArrival AS TIME) between '00:00:00.000' and '00:59:59.000' then 1 else 0 end as Hour0
,CASE WHEN CAST(EDArrival AS TIME) between '01:00:00.000' and '01:59:59.000' then 1 else 0 end as Hour1
,CASE WHEN CAST(EDArrival AS TIME) between '02:00:00.000' and '02:59:59.000' then 1 else 0 end as Hour2
FROM EDArrivals
WHERE EDArrival between '2012-06-01' and '2013-07-01'
I was thinking maybe the query could place a column for each hour with a 1 or 0 if they were in the ED during those hours. What I ultimately want to get to is the average number of patients in the ED by hour over the course of a year. If anyone can think of an easier method I would greatly appreciate the help.
Thank you
This probably won't perform great, but it will give the average for each hour over the time span you specify. The perf issue will be because of the function in the JOIN criteria in the CTE. If you need to do this for a very large number of rows it probably makes sense to break that out to another table and populate a column with the hour.
DECLARE @Hours TABLE (Hr smallint);
INSERT INTO @Hours
(Hr)
VALUES
(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11)
,(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23);
WITH ByDate
AS
(
SELECT
CAST(ED.EDArrival AS date) AS 'Dt',h.Hr, COUNT(*) AS 'PatientCount'
FROM
EDArrivals ED
JOIN
@Hours AS h
ON DATEPART(HOUR, ED.EDArrival) = h.Hr
WHERE
ED.EDArrival BETWEEN '2012-06-01' AND '2013-07-01'
GROUP BY
CAST(ED.EDArrival AS date)
,h.Hr
)
SELECT
-- multiply by 1.0 so AVG does not truncate to an integer
Hr, AVG(1.0 * PatientCount) AS AvgPatients
FROM
ByDate
GROUP BY
hr
ORDER BY
hr
I should also note that though you don't list it in your requirements, it probably makes more sense to also filter on the departure time being >= the given hour. You likely need to know not just how many patients show up but how many are sticking around at any given time.
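A rough, untested sketch of that occupancy variant follows. It assumes SQL Server 2008 or later and a departure column named EDDeparture; the question mentions departure times but never names the column, so that name is hypothetical. Each patient is counted in every hour slot they overlap, and the slots are then averaged by hour of day.

```sql
-- EDDeparture is a HYPOTHETICAL column name for the departure time.
DECLARE @From datetime = '2012-06-01', @To datetime = '2013-07-01';

WITH Slots AS (            -- one row per hour in [@From, @To)
    SELECT @From AS SlotStart
    UNION ALL
    SELECT DATEADD(HOUR, 1, SlotStart)
    FROM Slots
    WHERE DATEADD(HOUR, 1, SlotStart) < @To
),
Census AS (                -- patients present during each slot
    SELECT s.SlotStart, COUNT(ED.MRN) AS PatientCount
    FROM Slots AS s
    LEFT JOIN EDArrivals AS ED
           ON ED.EDArrival   <  DATEADD(HOUR, 1, s.SlotStart)
          AND ED.EDDeparture >= s.SlotStart   -- NULL departures are not counted
    GROUP BY s.SlotStart
)
SELECT DATEPART(HOUR, SlotStart) AS Hr,
       AVG(CAST(PatientCount AS decimal(10, 2))) AS AvgPatients
FROM Census
GROUP BY DATEPART(HOUR, SlotStart)
ORDER BY Hr
OPTION (MAXRECURSION 0);   -- the recursive CTE needs more than 100 levels
```

The range join is likely to be slow on a large table, so a persisted table of hour slots may be a better fit than the recursive CTE.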
I managed to create an example of my comment in SQLFiddle.
http://sqlfiddle.com/#!6/5234e/6
It's similar to JNK answer (hey, I commented first!)
By the way, creating that table variable will not be great; consider keeping a domain table with the hours.
If you need performance, also consider persisting the date part values. Evaluating them for each row is a performance killer.
Also take care with null departure date-times and patients staying past midnight.
Have you tried using DateDiff:
SELECT DateDiff(n, startdate, enddate) FROM MyTable
SELECT COUNT(*) [TotalArrivals]
, DATEPART(hh, [EDArrival]) [Hour]
FROM [EDArrivals]
GROUP BY DATEPART(hh, [EDArrival])
This will get you the total arrivals grouped by hour. You can then use this to do your averages per hour or whatever other calculations you need. This won't give you the hours with no arrivals, but that should be easy to fit into your calculations at the end.

sum up multilple datediffs of datetimes in mysql

I have a table with one user and one day's worth of punches (clockin, breakout, breakin, clockout). Now say the user takes 2 or more breaks. I need to sum up the total time of all breaks taken. I have created a sqlfiddle to make it easier to show what I am trying to do. Here is my example: http://sqlfiddle.com/#!2/21542/6 Now I need to take (12:30:21 - 12:04:44) + (12:36:00 - 12:34:00) to get the total of all breaks taken. How can I do that in my query? Now pretend I have 10 users and 10 days in my table. I know I would then need to group by day and user.
I would start by finding some way to link the punch-out records with the punch-in records from the same table. We can then put this data into a table and use it for querying against.
CREATE TEMPORARY TABLE breakPunchInOut (
SELECT
DATE(punchout.PunchDateTime) AS ShiftDate,
punchout.EmpId,
punchout.PunchId AS PunchOutID,
(SELECT
PunchId
FROM
timeclock
WHERE
timeclock.EmpId = punchout.EmpId
AND
timeclock.`In-Out` = 1
AND
timeclock.PunchDateTime > punchout.PunchDateTime
AND
DATE(timeclock.PunchDateTime) = DATE(punchout.PunchDateTime)
ORDER BY
timeclock.PunchDateTime ASC
LIMIT 1
) AS PunchInID
FROM
timeclock AS punchout
WHERE
punchout.`In-Out` = 0
HAVING
PunchInID IS NOT NULL
);
The way this query works is looking for all the "punch-outs" in a specific day, for each of these it then looks for the next "punch-in" which happened on the same day, by the same person. The HAVING clause filters out records where there is no punch-in after a punch-out - so maybe where the employee goes home for the day. This is something to remember because if someone goes home halfway through a shift then their break time will not be added to the total.
It's important to point out that this approach will only work for shifts which start and end on the same day. If you have a night shift which starts in the evening and finishes in the morning the next day, then you'll have to alter the way that you join the punch outs and punch ins together.
Now that we have this linking table, it's relatively simple to use it to create a summary report for each employee and each shift:
SELECT
breakPunchInOut.ShiftDate,
breakPunchInOut.EmpId,
SUM(
TIMESTAMPDIFF(MINUTE, punchOut.PunchDateTime, punchIn.PunchDateTime)
) AS TotalBreakLengthMins
FROM
breakPunchInOut
INNER JOIN
timeclock AS punchOut
ON
punchOut.PunchId = breakPunchInOut.PunchOutId
INNER JOIN
timeclock AS punchIn
ON
punchIn.PunchId = breakPunchInOut.PunchInId
GROUP BY
breakPunchInOut.ShiftDate,
breakPunchInOut.EmpId
;
Notice we use the TIMESTAMPDIFF function, not the DATEDIFF. DATEDIFF only calculates the number of days between two dates - it's not used for time.
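To make the difference concrete, here is a small standalone comparison using the first break from the question. The date part is made up for illustration, since the question only quotes the times.

```sql
-- The date 2014-01-01 is an assumption; only the times come from the question.
SELECT DATEDIFF('2014-01-01 12:30:21', '2014-01-01 12:04:44')
           AS days_apart,      -- 0: same calendar day
       TIMESTAMPDIFF(MINUTE, '2014-01-01 12:04:44', '2014-01-01 12:30:21')
           AS minutes_apart,   -- 25: whole minutes only, the extra 37 s are dropped
       TIMESTAMPDIFF(SECOND, '2014-01-01 12:04:44', '2014-01-01 12:30:21')
           AS seconds_apart;   -- 1537
```

Because TIMESTAMPDIFF truncates each result to whole units, summing per-break minutes can under-count. Summing TIMESTAMPDIFF(SECOND, ...) and dividing by 60 at the end avoids that if the precision matters.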

Help needed optimizing MySQL SELECT query

I have a MySQL table like this one:
day int(11)
hour int(11)
amount int(11)
Day is an integer with a value that spans from 0 to 365, assume hour is a timestamp and amount is just a simple integer. What I want to do is to select the value of the amount field for a certain group of days (for example from 0 to 10), but I only need the last value of amount available for that day, which practically is where the hour field has its max value (inside that day). This doesn't sound too hard, but the solution I came up with is completely inefficient.
Here it is:
SELECT q.day, q.amount
FROM amt_table q
WHERE q.day >= 0 AND q.day <= 4 AND q.hour = (
SELECT MAX(p.hour) FROM amt_table p WHERE p.day = q.day
) GROUP BY day
It takes 5 seconds to execute that query on an 11k-row table, and it covers a span of just 5 days; I may need to select a span of an entire month or year, so this is not a valid solution.
Anybody who can help me find another solution or optimize this one is really appreciated
EDIT
No indexes are set, but (day, hour, amount) could be a PRIMARY KEY if needed
Use:
SELECT a.day,
a.amount
FROM AMT_TABLE a
JOIN (SELECT t.day,
MAX(t.hour) AS max_hour
FROM AMT_TABLE t
GROUP BY t.day) b ON b.day = a.day
AND b.max_hour = a.hour
WHERE a.day BETWEEN 0 AND 4
I think you're using the GROUP BY a.day just to get a single amount value per day, but it's not reliable because in MySQL, the values of columns that are not in the GROUP BY are arbitrary -- the value could change. Sadly, MySQL doesn't yet support analytics (ROW_NUMBER, etc), which is what you'd typically use for cases like these (window functions only arrived in MySQL 8.0).
Look at indexes on the primary keys first, then add indexes on the columns used to join tables together. Composite indexes (more than one column to an index) are an option too.
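As a minimal sketch of that advice for this table: a composite index on (day, hour) should let MySQL find the latest hour of each day without scanning the whole table. The index name is made up here, and how much it helps depends on your data.

```sql
-- Composite index on the columns used to filter and join
-- (idx_day_hour is a hypothetical name).
ALTER TABLE amt_table ADD INDEX idx_day_hour (day, hour);

-- Alternative from the question's EDIT, valid only if every
-- (day, hour, amount) combination is unique:
-- ALTER TABLE amt_table ADD PRIMARY KEY (day, hour, amount);

-- Check whether the index is actually used:
EXPLAIN
SELECT t.day, MAX(t.hour) AS max_hour
FROM amt_table t
WHERE t.day BETWEEN 0 AND 4
GROUP BY t.day;
```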
I think the problem is the subquery in the WHERE clause. MySQL will first calculate "SELECT MAX(p.hour) FROM amt_table p WHERE p.day = q.day" for the whole table and only afterwards select the days. Not quite efficient :-)