MySQL - How to combine the column data into one row

I am making an employee attendance record, and I have trouble merging the raw data into a time in / time out format.
From the "Raw Data" table below, I need to combine each employee's time in and time out into one row, like the "Merge Time in/out" sample below.
Also consider that there are two shift schedules: the day shift and the night shift.
Note that if an employee is on the night shift, the time-out date differs from the time-in date.
Day shift empid (ID001, ID002)
Night shift empid (ID003)
Raw Data Table
--------------------------------------------
empid  date        time[in/out]  in_out
--------------------------------------------
ID001  2014-08-01  7:00am        IN
ID002  2014-08-01  7:01am        IN
ID003  2014-08-01  8:05pm        IN    <-- Night Shift
ID001  2014-08-01  5:00pm        OUT
ID002  2014-08-01  5:01pm        OUT
ID003  2014-08-02  6:01am        OUT   <-- take note of date
Merge Time in/out Table
--------------------------------------------
empid  date        time_in  time_out
--------------------------------------------
ID001  2014-08-01  7:00am   5:00pm
ID002  2014-08-01  7:01am   5:01pm
ID003  2014-08-01  8:05pm   6:01am

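select r1.empid,
       r1.date,
       r1.time as time_in,
       r2.time as time_out
from raw_data r1
-- pairs each IN row with every OUT row of the same employee, so this is
-- only correct while the table holds a single shift per employee
inner join raw_data r2 on r1.empid = r2.empid
where r1.in_out = 'IN'
  and r2.in_out = 'OUT';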

OK, so you can tell that an employee worked the night shift when their time_out is AM. Here, that is the case for the last row.
What I did was derive a "real date" field: when you clock out from the night shift it is the day before, and in any other case it is the row's own date.
select empid,
       -- an OUT punch in the morning ends a night shift that began the day before
       IF(RIGHT(timeinout,2)='am' AND in_out='OUT',
          DATE_ADD(date, INTERVAL -1 DAY),
          date) as realdate,
       MAX(if(in_out='IN',timeinout,null)) as time_in,
       MAX(if(in_out='OUT',timeinout,null)) as time_out
from shifts
group by empid, realdate;
Depending on the table size, it might be worth using this approach just to save yourself a join. In almost any other case, a join is cleaner.
I assume you have no control over the format of the input, so you'll have to keep the times as text and compare the am/pm suffix in the last 2 characters. I find that rather error-prone, but let's hope the raw data sticks to that format.
This solution makes a few assumptions that I'd rather explain here to avoid further misunderstandings:
The workers can't work the night shift if they worked the day shift (since we're grouping by date, you would need an extra field to distinguish the day shift from the night shift for a given day).
Input will never list an OUT time earlier than an IN time for a given day/employee tuple (if it happens, you would need an extra verification step to guarantee consistent output).
Input will always include a time in and a time out for a given shift (if it didn't, you would need an extra step to discard orphan time entries).
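The "real date" idea above can be sketched outside the database. This is a minimal Python sketch, not the answer's query verbatim: it assumes rows shaped like the question's sample (the RAW list is illustrative) and applies the same rule, namely that an OUT punch whose text ends in "am" belongs to the previous day. It inherits the assumptions listed above; real_date and merge_shifts are hypothetical helpers.

```python
from datetime import date, timedelta

# Sample rows shaped like the question's "Raw Data" table:
# (empid, date, time text, in_out). Times stay as text, as in the source data.
RAW = [
    ("ID001", date(2014, 8, 1), "7:00am", "IN"),
    ("ID002", date(2014, 8, 1), "7:01am", "IN"),
    ("ID003", date(2014, 8, 1), "8:05pm", "IN"),
    ("ID001", date(2014, 8, 1), "5:00pm", "OUT"),
    ("ID002", date(2014, 8, 1), "5:01pm", "OUT"),
    ("ID003", date(2014, 8, 2), "6:01am", "OUT"),
]


def real_date(punch_date, time_text, in_out):
    """Attribute an OUT punch ending in 'am' to the previous day (night shift);
    every other punch keeps its own date."""
    if in_out == "OUT" and time_text.lower().endswith("am"):
        return punch_date - timedelta(days=1)
    return punch_date


def merge_shifts(rows):
    """Group by (empid, real date) and pivot IN/OUT into time_in/time_out."""
    merged = {}
    for empid, punch_date, time_text, in_out in rows:
        key = (empid, real_date(punch_date, time_text, in_out))
        slot = merged.setdefault(key, {"time_in": None, "time_out": None})
        slot["time_in" if in_out == "IN" else "time_out"] = time_text
    return [
        (empid, day, slot["time_in"], slot["time_out"])
        for (empid, day), slot in sorted(merged.items(), key=lambda kv: kv[0])
    ]


for row in merge_shifts(RAW):
    print(row)
```

Like the SQL, this groups by a derived key instead of joining the table to itself, so each employee/day pair is visited once.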

Try this query and tell me if it works
SELECT empid,
       date,
       MAX(CASE WHEN in_out = 'IN' THEN time ELSE '' END) time_in,
       MAX(CASE WHEN in_out = 'OUT' THEN time ELSE '' END) time_out
FROM raw_data
-- note: grouping by date puts a night shift's IN and OUT (e.g. ID003) in separate rows
GROUP BY empid, date;

Related

What's the difference between the two SQL statements?

This is a question from LeetCode. Using the second query, I got the question wrong, but I could not identify why:
SELECT user_id,
       max(time_stamp) as "last_stamp"
from logins
where year(time_stamp) = '2020'
group by user_id
and
select user_id,
       max(time_stamp) as "last_stamp"
from logins
where time_stamp between '2020-01-01' and '2020-12-31'
group by user_id
The first query uses a function on every row to extract the year (an integer) and compares that to a string. (It would be preferable to compare with an integer instead.) Whilst this may be sub-optimal (wrapping the column in a function typically prevents an index on time_stamp from being used), this query would accurately locate all rows that fall into the year 2020.
The second query could fail to locate all rows that fall into 2020. Here it is important to remember that days have a 24-hour duration: each day starts at midnight and concludes at midnight 24 hours later. That is, a day has a start point (midnight) and an end point (midnight + 24 hours).
However, a single date used in SQL code cannot be both the start point and the end point of the same day, so every date in SQL represents only the start point. Also note that between does NOT magically change the second given date into "the end of that day"; it simply cannot (and does not) do that.
So, when you use time_stamp between '2020-01-01' and '2020-12-31', you need to think of it as meaning "from the start of 2020-01-01 up to and including the start of 2020-12-31". Hence, it excludes everything on 2020-12-31 after midnight.
The safest way to deal with this is to NOT use between at all, instead write just a few characters more code which will be accurate regardless of the time precision used by any date/datetime/timestamp column:
where
time_stamp >= '2020-01-01' and time_stamp < '2021-01-01'
with the second date being "the start point of the day after the range" (here, 2021-01-01).
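To see the boundary problem concretely, here is a small Python sketch that models the two WHERE clauses on plain datetime values. It assumes, as described above, that a date-only bound such as '2020-12-31' stands for midnight at the start of that day; the helper names between and half_open are hypothetical.

```python
from datetime import datetime


def between(ts, lo, hi):
    """Models SQL `ts BETWEEN lo AND hi`: inclusive at both ends."""
    return lo <= ts <= hi


def half_open(ts, start, end):
    """Models `ts >= start AND ts < end`: includes start, excludes end."""
    return start <= ts < end


# A date-only bound is treated as midnight at the start of that day.
late_login = datetime(2020, 12, 31, 15, 30)

print(between(late_login, datetime(2020, 1, 1), datetime(2020, 12, 31)))  # False
print(half_open(late_login, datetime(2020, 1, 1), datetime(2021, 1, 1)))  # True
print(late_login.year == 2020)  # True, like YEAR(time_stamp) = 2020
```

The half-open form stays correct whatever the time precision of the column, because it never needs to name the "last instant" of a day.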
See the answer to SQL "between" not inclusive.

SQL query to calculate the number of active users at the end of every day

I have three columns User_ID, New_Status and DATETIME.
New_Status contains 0(inactive) and 1(active) for users.
Every user starts from active status - ie. 1.
Subsequently, the table stores their status and the datetime at which they were activated/deactivated.
How can I calculate the number of active users at the end of each day, including days on which no records were added to the table?
Sample data:
| ID | New_Status | DATETIME            |
+----+------------+---------------------+
|  1 |          1 | 2019-01-01 21:00:00 |
|  1 |          0 | 2019-02-05 17:00:00 |
|  1 |          1 | 2019-03-06 18:00:00 |
|  2 |          1 | 2019-01-02 01:00:00 |
|  2 |          0 | 2019-02-03 13:00:00 |
Format the DATETIME value as a date-only string and group by it:
SELECT DATE_FORMAT(DATETIME, '%Y-%m-%d') as day, COUNT(*) as active
FROM test
-- note: this counts activation records per day, not users still active at the end of the day
WHERE New_Status = 1
GROUP BY day
ORDER BY day
In MySQL 8 you can use the row_number() window function to get the last status of each user per day. Then filter for the rows that indicate the user was active, GROUP BY the day, and count them.
SELECT date(x.datetime),
       count(*)
FROM (SELECT date(t.datetime) datetime,
             t.new_status,
             -- number each user's rows per day, latest first
             row_number() OVER (PARTITION BY t.user_id, date(t.datetime)
                                ORDER BY t.datetime DESC) rn
      FROM elbat t) x
WHERE x.rn = 1
  AND x.new_status = 1
GROUP BY x.datetime;
If not all days are in the table, you need to create a (possibly derived) table with all days and cross join it.
Find out the last activity status, for each day, of the users whose activity changed on that day:
select User_ID, New_Status, DATE_FORMAT(DATETIME, '%Y-%m-%d')
from activity_table
where not exists
(
select 1
from activity_table at
where at.User_ID = activity_table.User_ID and
DATE_FORMAT(at.DATETIME, '%Y-%m-%d') = DATE_FORMAT(activity_table.DATETIME, '%Y-%m-%d') and
at.DATETIME > activity_table.DATETIME
)
order by DATE_FORMAT(activity_table.DATETIME, '%Y-%m-%d');
This is not the solution yet, but it is very useful information on the way to one. Note that not all dates are covered yet, and the values are individual records, more precisely each user's last value on each day, ordered by date.
Let's get aggregate numbers
Using the query above as a subquery aliased as a derived table, you can GROUP BY DATETIME and select SUM(New_Status) AS activity, COUNT(*) AS total, DATETIME. Then activity - (total - activity) is the change compared with the previous day.
Knowing the delta for each day present in the result
In the previous section we saw how the delta can be calculated. If you alias the whole query from the previous section, you can self join it using a LEFT JOIN on pairs of (previous date, current date); the date gaps are still there, but don't worry about that just yet. In the case of the first date, its activity is the delta. For each subsequent record, adding its delta to the previous day's total yields the result you need. To achieve this you can use a recursive query, supported by MySQL 8, or, alternatively, a subquery that sums the deltas of all previous days (paying special attention to the first date, as described earlier); adding the current date's delta to that sum yields the result we need.
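The running-total step can be sketched in Python, assuming the per-day deltas have already been computed as described above. The DELTAS values are derived by hand from the sample data in the question, treating each user as inactive before their first record; running_totals is a hypothetical helper.

```python
from datetime import date
from itertools import accumulate

# Per-day deltas (activations minus deactivations) for the sample data;
# only days with at least one change appear.
DELTAS = {
    date(2019, 1, 1): 1,   # user 1 activated
    date(2019, 1, 2): 1,   # user 2 activated
    date(2019, 2, 3): -1,  # user 2 deactivated
    date(2019, 2, 5): -1,  # user 1 deactivated
    date(2019, 3, 6): 1,   # user 1 activated again
}


def running_totals(deltas):
    """Active users at the end of each listed day: the first day's total is its
    own delta, and every later day adds its delta to the previous day's total."""
    days = sorted(deltas)
    return dict(zip(days, accumulate(deltas[d] for d in days)))


for day, active in running_totals(DELTAS).items():
    print(day, active)
```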
Fill the gaps
The previous section would already work perfectly (assuming there are no integrity problems) if there were activity changes on every day, but we will not keep that assumption. We know that the figures are correct for each date where a figure is present, and we just need to add the missing dates to the result. If the results are properly ordered, as they should be, you can use a cursor and loop over the results. At each record after the first one, we can determine which dates are missing; there might be zero or more such dates between two consecutive records. What we do know about the gaps is that their values are exactly the same as the previous record that does have data: if there were no activity changes on a given date, then the number of active users is exactly the same as on the previous day. Using some structure, such as a table, you can generate the results with the knowledge described here.
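A minimal Python sketch of the gap filling, assuming the input already holds the correct count for every day that had changes (for example, running totals computed as in the previous section). Instead of a cursor, it walks the calendar one day at a time and carries the last known count forward; fill_gaps is a hypothetical name.

```python
from datetime import date, timedelta


def fill_gaps(totals):
    """Expand {day: active_count}, which only has days with changes, into every
    day from the first to the last; a day without changes repeats the count of
    the most recent earlier day."""
    if not totals:
        return {}
    days = sorted(totals)
    filled = {}
    current, last_count = days[0], totals[days[0]]
    while current <= days[-1]:
        last_count = totals.get(current, last_count)
        filled[current] = last_count
        current += timedelta(days=1)
    return filled


print(fill_gaps({date(2019, 1, 1): 1, date(2019, 1, 4): 2}))
```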
Solving possible integrity problems
There are several possible sources of such problems:
First, a data item might have existed before this table started receiving records.
Second, bugs or other causes might have interrupted the creation of records for this activity table for a while.
Third, adding a user does not (or did not) necessarily generate an activity change, since its coming into existence leaves its previous activity state undefined and subject to human conventions, which might change over time.
Fourth, removing a user does not (or did not) necessarily generate an activity change, since its going out of existence leaves its current activity state undefined and subject to human conventions, which might change over time.
Fifth, there are countless other possible causes of data integrity issues.
To cope with these, you will need to analyze comprehensively whatever you can from the source code and the history of the project, including database records, logs and information available from people, to detect such anomalies and the periods in which they were in effect, and to work out how to fix them if they exist.
EDIT
In the meantime, I was thinking about the possibility of a user who was active at the start of the day being deactivated and then activated again by the end of the day. Similarly, a user who was inactive at the start of a day might be activated and then deactivated again by the end of the day. For users with more than one activity change during a day, we need to compare their status at the start and at the end of the day to find out what the difference was.
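Comparing each user's status at the start and at the end of the day can be sketched in Python as follows. This is an illustration under stated assumptions, not a drop-in query: events are (user_id, timestamp, new_status) tuples, a user counts as inactive before their first record, and users 3 and 4 in EVENTS are hypothetical. daily_deltas is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime


def daily_deltas(events):
    """For each day, sum over users of (status at end of day - status at start
    of day), so a user deactivated and re-activated on the same day adds 0."""
    by_user = defaultdict(list)
    for user_id, ts, new_status in events:
        by_user[user_id].append((ts, new_status))

    deltas = defaultdict(int)
    for rows in by_user.values():
        end_of_day = {}
        for ts, new_status in sorted(rows):
            end_of_day[ts.date()] = new_status  # later changes overwrite earlier ones
        previous = 0  # assumption: inactive before the first record
        for day in sorted(end_of_day):
            deltas[day] += end_of_day[day] - previous
            previous = end_of_day[day]
    return dict(deltas)


EVENTS = [
    (3, datetime(2019, 1, 1, 9, 0), 1),
    (3, datetime(2019, 1, 2, 10, 0), 0),  # deactivated ...
    (3, datetime(2019, 1, 2, 18, 0), 1),  # ... and re-activated the same day
    (4, datetime(2019, 1, 2, 12, 0), 1),
]
print(daily_deltas(EVENTS))
```

Counting raw activation and deactivation rows would report a change for user 3 on 2019-01-02; comparing start and end states reports none.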
SELECT
DATE(DATETIME),
COUNT(*)
FROM your_table
WHERE New_Status = 1
-- note: grouping by User_ID as well returns one row per user and day
GROUP BY User_ID,
DATE(DATETIME)
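For MySQL 8 (recursive CTEs and window functions):
WITH RECURSIVE
cte AS (
  -- calendar: every date from the first to the last record
  SELECT MIN(DATE(DT)) dt
  FROM src
  UNION ALL
  SELECT dt + INTERVAL 1 DAY
  FROM cte
  WHERE dt < ( SELECT MAX(DATE(DT)) dt
               FROM src )
),
cte2 AS
(
  -- one row per user and day: sum that day's changes first (so several changes
  -- on one day do not duplicate the row), then take the running total per user
  SELECT users.id,
         cte.dt,
         SUM( SUM( CASE src.New_Status WHEN 1 THEN 1
                                       WHEN 0 THEN -1
                                       ELSE 0
                   END ) ) OVER ( PARTITION BY users.id
                                  ORDER BY cte.dt ) status
  FROM cte
  CROSS JOIN ( SELECT DISTINCT id
               FROM src ) users
  LEFT JOIN src ON src.id = users.id
               AND DATE(src.dt) = cte.dt
  GROUP BY users.id, cte.dt
)
SELECT dt, SUM(status)
FROM cte2
GROUP BY dt;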
Do not forget to adjust the maximum recursion depth (cte_max_recursion_depth, 1000 by default) if the date range spans more days than that.
Here is what I believe is a good solution for this problem of yours:
SELECT SUM(New_Status) "Number of active users"
, DATE_FORMAT(DATEC, '%Y-%m-%d') "Date"
FROM TEST T1
WHERE DATE_FORMAT(DATEC,'%H:%i:%s') =
(SELECT MAX(DATE_FORMAT(T2.DATEC,'%H:%i:%s'))
FROM TEST T2
WHERE T2.ID = T1.ID
AND DATE_FORMAT(T1.DATEC, '%Y-%m-%d') = DATE_FORMAT(T2.DATEC, '%Y-%m-%d')
GROUP BY ID
, DATE_FORMAT(DATEC, '%Y-%m-%d'))
GROUP BY DATE_FORMAT(DATEC, '%Y-%m-%d');

How to store date and time ranges without overlap in MySQL

I'm trying to find the right query to check whether date and time ranges overlap in a MySQL table. Here is the table:
id  pickup_date  pickup_time  return_date  return_time
1   2016-05-01   12:00:00     2016-05-31   13:00:00
2   2016-07-01   12:00:00     2016-07-04   15:00:00
Here is the data for each incoming reservation, which needs to be checked against the "Reservations" table:
pickup_date = '2016-04-01';
pickup_time = '12:00:00';
return_date = '2016-05-01';
return_time = '13:00:00';
With this data, the new reservation overlaps the one in the database. Note that the new reservation can be in the past or in the future.
EDIT (as proposed by spencer7593, this is the working version so far):
SET @new_booking_pickup_date = '2016-04-01';
SET @new_booking_pickup_time = '12:00:00';
SET @new_booking_return_date = '2016-05-01';
SET @new_booking_return_time = '13:00:00';
SELECT * FROM Reservation WHERE NOT
( CONCAT(@new_booking_pickup_date,' ',@new_booking_pickup_time) > CONCAT(return_date,' ',return_time) + INTERVAL 0 DAY
  OR CONCAT(@new_booking_return_date,' ',@new_booking_return_time) < CONCAT(pickup_date,' ',pickup_time) + INTERVAL 0 DAY );
So this query returns:
id  pickup_date  pickup_time  return_date  return_time
1   2016-05-01   12:00:00     2016-05-31   13:00:00
It's pretty easy to determine if a given period doesn't overlap with another period.
For ease of expressing the comparison, for period 1, we'll let the begin and end be represented by b1 and e1. For period 2, b2 and e2.
There is no overlap if the following is true:
b1 > e2 OR e1 < b2
(We can quibble whether equality of b1 and e2 would be considered an overlap or not, and adjust as necessary.)
The negation of that test would return TRUE if there was an overlap...
NOT (b1 > e2 OR e1 < b2)
So, to find out if there is a row that overlaps with the proposed period, we would need a query that tests whether a row is returned...
Let's assume that the table we are going to check has columns st and et (DATETIME) representing the beginning and end of each period.
To find rows that overlap a proposed period bounded by b1 and e1:
SELECT t.* FROM t WHERE NOT (b1 > t.et OR e1 < t.st)
So for a query to just check for the existence of an overlapping row, we could do something like this:
SELECT EXISTS (SELECT 1 FROM t WHERE NOT (b1 > t.et OR e1 < t.st))
That's pretty simple.
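The test translates directly into code. This Python sketch applies NOT (b1 > e2 OR e1 < b2) to plain datetime values and, like the SQL above, treats touching endpoints as an overlap. The function names are hypothetical, and BOOKED holds the two reservations from the question.

```python
from datetime import datetime


def overlaps(b1, e1, b2, e2):
    """True when period [b1, e1] overlaps period [b2, e2]. There is no overlap
    only if one period starts after the other ends: b1 > e2 OR e1 < b2."""
    return not (b1 > e2 or e1 < b2)


def any_overlap(b1, e1, periods):
    """Like SELECT EXISTS (SELECT 1 FROM t WHERE NOT (b1 > t.et OR e1 < t.st))."""
    return any(overlaps(b1, e1, st, et) for st, et in periods)


# Existing reservations from the question, as (st, et) pairs.
BOOKED = [
    (datetime(2016, 5, 1, 12, 0), datetime(2016, 5, 31, 13, 0)),
    (datetime(2016, 7, 1, 12, 0), datetime(2016, 7, 4, 15, 0)),
]

print(any_overlap(datetime(2016, 4, 1, 12, 0), datetime(2016, 5, 1, 13, 0), BOOKED))  # True
print(any_overlap(datetime(2016, 6, 1, 12, 0), datetime(2016, 6, 2, 12, 0), BOOKED))  # False
```

Changing > and < to >= and <= would make periods that merely touch count as non-overlapping, which is the adjustment mentioned above.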
It's going to look a lot more complicated when we make the adjustment for the (inexplicable) split of the date and time components of a datetime into separate columns (as shown in the question).
It's just a straightforward matter of combining the separate date and time values together into a single value of DATETIME datatype.
All we need to do is substitute into our query above an appropriate conversion, e.g.
st => CONCAT(pickup_date,' ',pickup_time) + INTERVAL 0 DAY
et => CONCAT(return_date,' ',return_time) + INTERVAL 0 DAY
Same for b1 and e1.
Doing that substitution, coming up with the final query, is left as an exercise for whoever decided that storing date and time as separate columns was a good idea.
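For the split date/time columns, the substitution amounts to combining each date with its time before comparing. A minimal Python sketch of that idea, assuming rows shaped like the question's Reservation table: datetime.combine plays the role that CONCAT(...) + INTERVAL 0 DAY plays in the SQL, and overlapping_ids is a hypothetical helper.

```python
from datetime import date, datetime, time

# Rows shaped like the question's Reservation table, with date and time split:
# (id, pickup_date, pickup_time, return_date, return_time).
RESERVATIONS = [
    (1, date(2016, 5, 1), time(12, 0), date(2016, 5, 31), time(13, 0)),
    (2, date(2016, 7, 1), time(12, 0), date(2016, 7, 4), time(15, 0)),
]


def overlapping_ids(pickup_date, pickup_time, return_date, return_time, rows):
    """Combine every date/time pair into one datetime, then keep the rows for
    which NOT (b1 > et OR e1 < st) holds."""
    b1 = datetime.combine(pickup_date, pickup_time)
    e1 = datetime.combine(return_date, return_time)
    result = []
    for row_id, p_date, p_time, r_date, r_time in rows:
        st = datetime.combine(p_date, p_time)
        et = datetime.combine(r_date, r_time)
        if not (b1 > et or e1 < st):
            result.append(row_id)
    return result


print(overlapping_ids(date(2016, 4, 1), time(12, 0), date(2016, 5, 1), time(13, 0), RESERVATIONS))  # [1]
```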

MySQL First and last datetime within a day

I have a table with 3 days of data (about 4000 rows). Each of the 3 sets of data comes from a 30-minute session. I want the start and end time of each session.
I currently use this SQL, but it's quite slow (even with only 4000 records). The datetime column is indexed, but I think the index is not used properly because of the conversion from datetime to date.
The table layout is fixed, so I cannot change any part of that. The query takes about 20 seconds to run (and longer every day). Does anyone have some good tips to make it faster?
select distinct
date(a.datetime) datetime,
(select max(b.datetime) from bike b where date(b.datetime) = date(a.datetime)),
(select min(c.datetime) from bike c where date(c.datetime) = date(a.datetime))
from bike a
Maybe I'm missing something, but...
Isn't the result returned by the OP query equivalent to the result from this query:
SELECT DATE(a.datetime) AS datetime
, MAX(a.datetime) AS max_datetime
, MIN(a.datetime) AS min_datetime
FROM bike a
GROUP BY DATE(a.datetime)
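The GROUP BY version computes both bounds in a single pass over the rows, rather than running two correlated subqueries for each row of bike a. A Python sketch of that single pass, assuming the rows are just datetime values (SAMPLE is made-up data and first_and_last_per_day a hypothetical helper):

```python
from datetime import datetime


def first_and_last_per_day(timestamps):
    """One pass, like SELECT DATE(dt), MIN(dt), MAX(dt) ... GROUP BY DATE(dt)."""
    bounds = {}
    for ts in timestamps:
        day = ts.date()
        if day in bounds:
            first, last = bounds[day]
            bounds[day] = (min(first, ts), max(last, ts))
        else:
            bounds[day] = (ts, ts)
    return bounds


SAMPLE = [  # hypothetical readings from two sessions
    datetime(2014, 3, 1, 10, 5),
    datetime(2014, 3, 1, 10, 0),
    datetime(2014, 3, 1, 10, 30),
    datetime(2014, 3, 2, 18, 15),
    datetime(2014, 3, 2, 18, 45),
]

for day, (first, last) in sorted(first_and_last_per_day(SAMPLE).items()):
    print(day, first.time(), last.time())
```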
Alex, warning: this was typed "freehand", so it may have some syntax problems, but it shows roughly what I was trying to convey.
select distinct
date(a.datetime) datetime,
(select max(b.datetime) from bike b where b.datetime between date(a.datetime) and (date(a.datetime) + interval 1 day - interval 1 second)),
(select min(c.datetime) from bike c where c.datetime between date(a.datetime) and (date(a.datetime) + interval 1 day - interval 1 second))
from bike a
Instead of comparing date(b.datetime), it compares the actual b.datetime against a range calculated from a.datetime, so the index on datetime can be used for the range. Hopefully this helps you out and does not make things murkier.

Time difference between dates, including business hour and excluding holidays

How can I calculate the time difference between two dates, considering:
Only Monday to Friday;
Only the time between 9am and 5:30pm;
Excluding holidays.
Example:
d1 = 2012-10-05 17:00:00
d2 = 2012-10-09 12:00:00
Calculation Steps:
2012-10-05 = 00:30:00
2012-10-06 = 00:00:00
2012-10-07 = 00:00:00
2012-10-08 = 07:30:00
2012-10-09 = 04:00:00
ddiff(d2,d1) = 12:00:00
I know how to do it for Monday to Friday only, as described here. And I am talking about MySQL.
I've come up with a solution that's relatively straightforward for calculating the time difference for the full interim dates. However, it's a bit messy to use MySQL to calculate the time difference for the start and end dates. I have included them in my solution, but with a number of assumptions.
In any case, here's the SQL:
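SET @startdate:='2012-12-24 17:00:00';
SET @enddate:='2013-01-02 12:00:00';
SELECT
-- hours on the start day: from the start time until 17:30
TIME_TO_SEC(TIMEDIFF(CONCAT(DATE(@startdate),' 17:30:00'), @startdate))/3600 as startday_time,
-- hours on the end day: from 09:00 until the end time
TIME_TO_SEC(TIMEDIFF(@enddate, CONCAT(DATE(@enddate),' 09:00:00')))/3600 as endday_time,
-- 7.5 hours for every full working day strictly between the start and end days
SUM(daily_hours) as otherdays_time from
(
SELECT 7.5 as daily_hours, id, DATE_ADD(DATE(@startdate),INTERVAL id-1 DAY) as idate from numbers
) dates
LEFT JOIN holidays on DATE(dates.idate) = DATE(holidays.date)
WHERE
-- exclude the start and end days, which are counted separately above
idate > DATE(@startdate) AND idate < DATE(@enddate)
AND holidays.date IS NULL
-- DAYOFWEEK: 1 = Sunday, 7 = Saturday
AND DAYOFWEEK(idate) <> 7 AND DAYOFWEEK(idate) <> 1;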
sqlfiddle here:
http://sqlfiddle.com/#!2/ff3f3/1/2
To get the valid interim dates, we'll need two tables - a holidays table listing all the holiday dates and a numbers table that contains a series of integers which is very useful for joining against to get a sequential series of dates (with no gaps).
Note: In the sqlfiddle, I've populated the numbers table only up to 12 to cover the dates used in my example - it will probably need to be populated to a higher number depending on the range of dates you'll be working with.
For the start day time & end day time, I've made the following assumptions:
that start date & end date are both valid dates that should be counted towards the total time
that the time on the start date is between lunch and 17.30
that the time on the end date is between lunch and 17.30
If these assumptions are wrong, you're getting into serious conditional territory (with lots of IFs), and you might be better off doing this in PHP (or whatever language you use).
Note: I've left the three times (which are in hours) un-added for illustration purposes.
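For comparison, here is a Python sketch that treats the start day, the interim days and the end day uniformly: for each Monday-to-Friday day that is not a holiday, it clips that day's working window to [start, end] and adds up the results. It assumes a single 9:00-17:30 window per day and does not model a lunch break (the 7.5 daily_hours in the SQL suggests one is excluded), so adjust day_start/day_end or subtract a break as needed. business_time and its parameters are hypothetical.

```python
from datetime import date, datetime, time, timedelta


def business_time(start, end, holidays=frozenset(),
                  day_start=time(9, 0), day_end=time(17, 30)):
    """Working time between start and end, counting only Monday to Friday,
    only the day_start..day_end window of each day, and skipping holidays."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5 and day not in holidays:  # Monday=0 ... Friday=4
            window_start = datetime.combine(day, day_start)
            window_end = datetime.combine(day, day_end)
            # Clip the working window to the requested [start, end] interval.
            lo = max(window_start, start)
            hi = min(window_end, end)
            if hi > lo:
                total += hi - lo
        day += timedelta(days=1)
    return total


d1 = datetime(2012, 10, 5, 17, 0)
d2 = datetime(2012, 10, 9, 12, 0)
print(business_time(d1, d2))
print(business_time(d1, d2, holidays={date(2012, 10, 8)}))
```

Clipping each day's window to the interval removes the special cases for the start and end days, so the "time on the start date is after lunch" style assumptions are not needed here.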