I'm rather stuck on how to transform my table (using MySQL). Let me start by describing my table.
ID, ObservationDate(DATETIME), Total score, whole bunch of parameters that determine the total score
The IDs stand for patients; each ID may occur multiple times, on different observation dates. I want to add 3 new columns: t_start, t_end, t_total
T_start: would be 0 for the very first observation or the previous observations' ObservationDate
T_end: is the current observation date
T_Total: Total time elapsed since start until last observation date.
These columns would have to be in LONG format, so preferably in hours.
Any idea on how to do this?
Kind regards
Edit: since I'm getting downvoted and people seem to need more info, here we go:
Table:
CREATE TABLE `dataset_origineel` (
`PatientId` int(11) NOT NULL,
`ObservationDate` varchar(255) DEFAULT NULL,
`EWS_Total` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Data set:
INSERT INTO `dataset_origineel` (`PatientId`, `ObservationDate`, `EWS_Total`) VALUES
(30, '2019-05-01 13:27:50.0000000',0),
(30, '2019-05-01 15:27:44.0000000',5),
(30, '2019-05-01 15:54:27.0000000',4),
(30, '2019-05-01 16:07:27.0000000',2),
(31, '2019-05-01 17:03:16.0000000',1),
(31, '2019-05-01 18:02:29.0000000',0),
(31, '2019-05-01 19:23:49.0000000',0),
(32, '2019-05-01 21:07:36.0000000',0),
(32, '2019-05-01 21:08:05.0000000',4),
(32, '2019-05-01 21:12:11.0000000',3),
(32, '2019-05-01 21:13:21.0000000',2),
(32, '2019-05-01 23:12:50.0000000',0),
(32, '2019-05-02 00:28:57.0000000',3);
What I want is:
PatientId, ObservationDate, t_start, t_end, t_total
30 2019-05-01 13:27:50 0 120 170
30 2019-05-01 15:27:44 120 147 170
30 2019-05-01 15:54:27 147 170 170
30 2019-05-01 16:07:27 170 170
And so on. Hope this is more clear.
Also: thanks to the people helping with the formatting, I'm rather new to SO as a whole :)
You seem to want lag():
select t.*,
lag(observationdate) over(partition by id order by observationdate) as t_start,
observationdate as t_end,
timestampdiff(
second,
lag(observationdate) over(partition by id order by observationdate),
observationdate
) / 60 / 60 as t_total
from mytable t
This window function is available in MySQL 8.0.
Note that I computed the time difference in seconds, then converted it to hours. This gives you a decimal count of hours, which is more accurate than passing the hour unit to timestampdiff(): with hour, the function returns only the number of complete hours between the two timestamps, as an integer, and drops the fractional part.
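To make the difference between the two units concrete, here is a small sketch using the first two timestamps of patient 30 from the question's sample data. The aliases whole_hours and decimal_hours are made up for illustration.

```sql
-- The two observations are 1 h 59 min 54 s apart (7194 seconds).
SELECT
  -- HOUR unit: integer result, only complete hours are counted
  TIMESTAMPDIFF(HOUR,   '2019-05-01 13:27:50', '2019-05-01 15:27:44') AS whole_hours,
  -- SECOND unit divided by 3600: keeps the fractional part
  TIMESTAMPDIFF(SECOND, '2019-05-01 13:27:50', '2019-05-01 15:27:44') / 3600 AS decimal_hours;
-- whole_hours is 1, decimal_hours is just under 2
```

If minutes are wanted instead (the expected output in the question looks like minutes), dividing the seconds by 60 works the same way.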
Related
I have a database with the following columns. I have added some sample data to show formatting.
date, time, amount
2021-10-14, 13:00, 15.40
2021-10-14, 13:01, 9.34
2021-10-14, 13:02, 10.12
2021-10-14, 13:03, 7.44
There are 2.6 million rows in the database spanning two years.
Each row is an increment of 1 minute.
I need to write SQL that will output and group rows that are consecutive by minute on the same date, where the amount is greater than 8.00 and there are at least 3 consecutive rows.
This would then find an example like:
2021-11-30, 14:44, 8.04
2021-11-30, 14:45, 9.41
2021-11-30, 14:46, 9.27
2021-11-30, 14:47, 10.54
2021-11-30, 14:48, 11.09
2022-03-13, 08:22, 36.44
2022-03-13, 08:23, 17.38
2022-03-13, 08:24, 11.86
So if I understand correctly you only want to select the rows that are part of a 3 minute (minimum) consecutive sequence where amount >= 8 ?
I'm not sure about the performance but this seems to work:
Setup:
CREATE TABLE series
(
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
date DATE NOT NULL,
time TIME NOT NULL,
datetime DATETIME GENERATED ALWAYS AS (TIMESTAMP(date, time)),
amount decimal(5, 2),
INDEX (amount)
);
INSERT INTO series (date, time, amount)
VALUES ('2021-11-30', '14:40', 7),
('2021-11-30', '14:41', 8),
('2021-11-30', '14:42', 8),
('2021-11-30', '14:43', 8),
('2021-11-30', '14:44', 8),
('2021-11-30', '14:45', 7),
('2021-11-30', '14:46', 8),
('2021-11-30', '14:47', 8),
('2021-11-30', '14:48', 8),
('2021-11-30', '14:49', 7),
('2021-11-30', '14:50', 8),
('2021-11-30', '14:51', 8),
('2021-11-30', '14:52', 7)
;
The solution:
SELECT date, time, T.amount
FROM (SELECT date,
time,
datetime,
amount,
LAG(datetime, 2) OVER (order by datetime) AS tmin2,
LAG(datetime, 1) OVER (order by datetime) AS tmin1,
LEAD(datetime, 1) OVER (order by datetime) AS tplus1,
LEAD(datetime, 2) OVER (order by datetime) AS tplus2
FROM series
WHERE amount >= 8) T
WHERE TIME_TO_SEC(TIMEDIFF(T.datetime, T.tmin2)) = 120
OR TIME_TO_SEC(TIMEDIFF(T.datetime, T.tplus2)) = -120
OR (TIME_TO_SEC(TIMEDIFF(T.datetime, T.tmin1)) = 60 AND TIME_TO_SEC(TIMEDIFF(T.datetime, T.tplus1)) = -60)
ORDER BY datetime;
Explanation:
First we filter out the values < 8 using a WHERE clause.
Then we peek at the previous two and next two rows, ordered by datetime, to see whether the current row is part of a sequence of at least 3 consecutive minutes, and filter on that criterion.
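If the minimum run length needs to change, one alternative is the usual "gaps and islands" trick, sketched below. This is not the query above but a different technique. It assumes MySQL 8.0, the series table from the setup above, and at most one row per minute. The names filtered, sized, island and run_length are made up for illustration.

```sql
WITH filtered AS (
  SELECT date, time, datetime, amount,
         -- Within one unbroken run, datetime and the row number both
         -- advance by 1 per row, so this difference stays constant.
         datetime - INTERVAL ROW_NUMBER() OVER (PARTITION BY date ORDER BY datetime) MINUTE AS island
  FROM series
  WHERE amount >= 8
),
sized AS (
  SELECT f.*,
         COUNT(*) OVER (PARTITION BY f.date, f.island) AS run_length
  FROM filtered f
)
SELECT date, time, amount
FROM sized
WHERE run_length >= 3   -- minimum number of consecutive minutes
ORDER BY datetime;
```

With the sample rows above this should keep 14:41 to 14:44 and 14:46 to 14:48 and drop the two-row run at 14:50 to 14:51. Only the constant 3 has to change for a different minimum length.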
Here is my solution:
Table Definition :
CREATE TABLE YourTbl (
date date DEFAULT NULL,
time time DEFAULT NULL,
amount decimal(4,2) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO YourTbl VALUES
('2021-10-12','16:30', 20.40),
('2021-10-12','14:21', 19.34),
('2021-10-14','13:00', 15.40),
('2021-10-14','13:01', 9.34),
('2021-10-14','13:02', 10.12),
('2021-10-14','13:03', 7.44),
('2021-11-30', '14:44', 8.04),
('2021-11-30', '14:45', 9.41),
('2021-11-30', '14:46', 9.27),
('2021-11-30', '14:47', 10.54),
('2021-11-30', '14:48', 11.09),
('2022-03-13', '08:22', 36.44),
('2022-03-13', '08:23', 17.38),
('2022-03-13', '08:24', 11.86);
Let's test the query:
SELECT
date,time,amount FROM
(SELECT date,time,amount,
LEAD(minute(time),1) OVER(PARTITION BY date ORDER BY time) as leadtime,
LAG(minute(time),1) OVER(PARTITION BY date ORDER BY time) as lagtime,
(minute(time) - LAG(minute(time),1) OVER(PARTITION BY date ORDER BY time)) as minute_forward_difference,
(LEAD(minute(time),1) OVER(PARTITION BY date ORDER BY time) - minute(time)) as minute_backward_difference
FROM YourTbl
WHERE amount > 8.0
) as tblder
WHERE (minute_forward_difference = 1) OR (minute_backward_difference = 1)
GROUP BY date,time,amount;
Please find the sample database below:
CREATE TABLE ipay
(Ticket int(11) Primary Key,Login int(11), Profit double, opentime datetime);
INSERT INTO ipay
(Ticket,Login,Profit,opentime)
VALUES
(1,100,100,'2020-01-01 00:00:00'),
(2,100,100,'2020-02-01 00:00:00'),
(3,100,-200,'2019-01-01 00:00:00'),
(4,100,-50,'2020-01-02 00:00:00'),
(5,101,200,'2020-02-02 00:00:00'),
(6,101,200,'2020-03-02 00:00:00'),
(7,101,-10,'2020-04-02 00:00:00'),
(8,101,-200,'2020-05-02 00:00:00');
When Profit > 0, you can think of the record as a deposit; when Profit < 0, you can think of the record as a withdrawal.
I need to get all withdrawals that happened after the first deposit, for each individual login. So that the expected output would be:
Ticket Login Profit opentime
4 100 -50 2020-01-02 00:00:00
7 101 -10 2020-04-02 00:00:00
8 101 -200 2020-05-02 00:00:00
(For login 100, Ticket 3 is filtered out as it was made before Ticket 1;
For login 101, both Ticket 7 and 8 are included, since both were made after Ticket 5)
I have managed to identify the time when the first deposit was made:
SELECT LOGIN, TICKET, PROFIT, SUM(PROFIT), MIN(OPENTIME)
FROM ipay
WHERE PROFIT>0
GROUP BY LOGIN
I am stuck as there is more than one MIN(opentime).
I'm currently using MySQL version 5.7.34. Please do not hesitate to let me know if any clarification is needed. Any ideas would be much appreciated!
Please try this:
select a.*
from ipay as a
join (select login, min(opentime) as firsttime -- first deposit per login
from ipay
where profit > 0
group by login) as b
on a.login = b.login
where a.opentime > b.firsttime and a.profit < 0;
See it in action here: sqlfiddle
I am developing an employee login system in which users' check-in and check-out times are recorded. I have the following MySQL table schema, from which I would like to query the total working hours of an employee for a particular month.
AttendanceId UserId Operation CreatedDate
24 4 1 2016-03-20 23:18:59
25 4 2 2016-03-20 23:19:50
26 4 1 2016-03-20 23:20:28
27 4 2 2016-03-20 23:20:31
Operation 1 is for check-in and operation 2 is for check-out. Can anyone help me build this query?
A pleasingly complicated question, thanks. My query deals with:
Attendances that aren't precisely measured in hours. The number of seconds is totalled and divided by 3600 at the end of the calculation.
Attendances that span the month boundary at either end (thanks strawberry)
Attendances in the current month that have started (there is an entry with operation "1") but not yet finished (there is no corresponding operation "2").
I used the following data for testing:
INSERT INTO Attendance(UserId, Operation, CreatedDate) VALUES
(4, 1, '2016-01-01 15:00:00'),
(4, 2, '2016-01-01 19:00:00'),
(4, 1, '2016-01-31 23:00:00'),
(4, 2, '2016-02-01 01:00:00'),
(4, 1, '2016-02-20 23:18:59'),
(4, 2, '2016-02-20 23:19:50'),
(4, 1, '2016-02-20 23:20:28'),
(4, 2, '2016-02-20 23:20:31'),
(4, 1, '2016-02-29 23:00:00'),
(4, 2, '2016-03-01 01:00:00'),
(4, 1, '2016-03-02 15:00:00'),
(4, 2, '2016-03-02 18:00:00'),
(4, 1, '2016-03-22 10:00:00');
The query selects all users' hours for a specific month. Selecting results for more than one month in one query is more complicated, because attendances can span month boundaries. If that is required, it might be simplest to iterate over the months and run the query repeatedly, adjusting the four dates in the SQL each time.
The innermost query selects all arrival times and the corresponding departure time for all users. The outer query then restricts them to the current month, calculates the difference between the two times, and sums them by user.
SELECT UserId, SUM(TIMESTAMPDIFF(
SECOND,
GREATEST(TimeIn, '2016-02-01'),
LEAST(COALESCE(TimeOut, NOW()), '2016-03-01'))) / 3600 HoursInMonth
FROM (SELECT TimeIn.UserId, TimeIn.CreatedDate TimeIn, MIN(TimeOut.CreatedDate) TimeOut
FROM Attendance TimeIn
LEFT JOIN Attendance TimeOut ON TimeOut.UserId = TimeIn.UserId
AND TimeOut.Operation = 2
AND TimeOut.CreatedDate > TimeIn.CreatedDate
WHERE TimeIn.operation = 1
GROUP BY TimeIn.AttendanceId
ORDER BY TimeIn.CreatedDate) TimeInOut
WHERE DATE_FORMAT(TimeIn, '%Y-%m') = '2016-02'
OR DATE_FORMAT(TimeOut, '%Y-%m') = '2016-02'
OR (DATE_FORMAT(TimeIn, '%Y-%m') < '2016-02' AND TimeOut IS NULL)
GROUP BY UserId;
I have a database table with an auto-update column, which is required to be in the TIMESTAMP format; it saves dates in the form YYYY-MM-DD HH:mm:ss each time a row is updated.
On reading statements that date comparisons are (possibly very) processor heavy, the preferred method seems to be to use MySQL BETWEEN statement to check and return updates that have occurred in the last 24 hours.
A reference: https://stackoverflow.com/a/14104364/3536236
My SQL
I have removed some details that take up space that are outside the scope of this question, such as some columns
-- Generation Time: Oct 14, 2015 at 04:54 PM
-- Server version: 5.5.45-cll
-- PHP Version: 5.4.31
--
-- Table structure for table `check_log`
--
CREATE TABLE IF NOT EXISTS `check_log` (
`table_id` int(8) NOT NULL AUTO_INCREMENT,
`last_action` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`ip_addr` varchar(60) NOT NULL,
`submit_fail` varchar(1) NOT NULL,
PRIMARY KEY (`table_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 AUTO_INCREMENT=14 ;
--
-- Dumping data for table `check_log`
--
INSERT INTO `check_log` (`table_id`, `last_action`, `ip_addr`, `submit_fail`) VALUES
(2, '2015-10-14 14:08:30', '92.99.252.185', 'N'),
(3, '2015-10-14 14:09:23', '92.99.252.185', 'N'),
(4, '2015-10-14 14:09:25', '92.99.252.185', 'N'),
(5, '2015-10-14 14:09:38', '92.99.252.185', 'N'),
(6, '2015-10-14 14:14:22', '92.99.252.185', 'N'),
(7, '2015-10-14 14:17:13', '92.99.252.185', 'N'),
(8, '2015-10-14 14:20:51', '92.99.252.185', 'N'),
(9, '2015-10-14 14:20:52', '92.99.252.185', 'N'),
(10, '2015-10-14 14:50:34', '92.99.252.185', 'N'),
(11, '2015-10-14 15:29:07', '92.99.252.185', 'N'),
(12, '2015-10-14 15:31:04', '92.99.252.185', 'N'),
(13, '2015-10-14 15:32:00', '92.99.252.185', 'N');
My Query
Now, my query wants to return all the rows that fit the criteria that have been updated in the last 24hours. So:
SELECT * FROM `check_log` WHERE `ip_addr` = '92.99.252.185' AND
(`last_action` BETWEEN date_sub(CURDATE() , INTERVAL -1 DAY ) AND CURDATE())
AND `submit_fail` = 'N'
I wrote the query in this shape because I wanted to explore how BETWEEN ... AND ... handled other ANDS in the same query, and hence for my own clarity I encased the BETWEEN statement in brackets ().
I have tried a range of minorly different syntaxes for this query including:
SELECT * FROM `check_log` WHERE `ip_addr` = '92.99.252.185' AND
(DATE_FORMAT(`last_action`, '%Y-%m-%d') BETWEEN date_sub(CURDATE() , INTERVAL -1 DAY ) AND CURDATE())
and pure date check:
SELECT * FROM `check_log` WHERE
`last_action` BETWEEN date_sub(CURDATE() , INTERVAL -1 DAY ) AND CURDATE()
Each time, MySQL returns zero rows: not an error, just no rows found.
I have viewed and compared at least a dozen similar answers on SO about the comparison of dates and am at a bit of a loss how I'm not getting the rows returned that I'm expecting with my query.
(I am ideally wanting to use the BETWEEN form as this table will, when in use be reaching several thousands of rows. )
What can I do to make the comparison work?
How does the BETWEEN clause handle other ANDs, and is it suitable to encase it in brackets (for clarity)?
Is there a more efficient / suitable method to compare timestamp column dates?
It appears that DATE_SUB() is subtraction, so I did not need the -1 in the INTERVAL <value> DAY part of the SQL. INTERVAL does accept negative values, but subtracting a negative value is an addition, so my query was asking for an interval of +1 day.
I had originally thought that DATE_SUB stood for substitution: since negative values are allowed in the value part, it seemed to me that there was no need for a separate date addition function.
I wasted half a day reading up and trying to work out how this logic worked.
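As a sketch of what this means in practice: the first statement below shows the sign flip with a fixed date, and the second is one possible form of the "last 24 hours" query against the check_log table above. It also swaps CURDATE() for NOW(), on the assumption that "last 24 hours" should count back from the current moment. CURDATE() is midnight at the start of today, so an upper bound of CURDATE() leaves out everything logged later today. The alias lower_bound is made up for illustration.

```sql
-- Subtracting a negative interval moves forward in time:
SELECT DATE_SUB('2015-10-14', INTERVAL -1 DAY) AS lower_bound;  -- 2015-10-15
-- so BETWEEN <tomorrow> AND <today> describes an empty range.

-- Rows updated in the last 24 hours.
-- BETWEEN ... AND ... binds more tightly than the logical AND,
-- so the brackets are optional and only there for readability.
SELECT *
FROM `check_log`
WHERE `ip_addr` = '92.99.252.185'
  AND (`last_action` BETWEEN DATE_SUB(NOW(), INTERVAL 1 DAY) AND NOW())
  AND `submit_fail` = 'N';
```

Because the comparison is made against the bare last_action column rather than DATE_FORMAT(last_action, ...), an index on that column, if one is added, can be used for the range.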
Each staff member already has a list of available time slots in the AvailSlots table, like this:
Staff_ID Avail_Slots_Datetime
1 2015-1-1 09:00:00
1 2015-1-1 10:00:00
1 2015-1-1 11:00:00
2 2015-1-1 09:00:00
2 2015-1-1 10:00:00
2 2015-1-1 11:00:00
3 2015-1-1 09:00:00
3 2015-1-1 12:00:00
3 2015-1-1 15:00:00
I need to find out which staff has, for example, 2 (or 3, 4, etc) CONSECUTIVE avail time slots at each time slot. As a novice, the INNER JOIN codes below is all I know to write if the query is for 2 consecutive time slots.
SELECT a.start_time, a.person
FROM a_free a, a_free b
WHERE (b.start_time = addtime( a.start_time, '01:00:00' )) and (a.person = b.person)
But, obviously, doing it that way, I would have to add more INNER JOIN codes - for each case - depending on whether the query is for 3, or 4, or 5 , etc consecutive available time slots at a given date/hour. Therefore, I want to learn a more efficient and flexible way to do the same. Specifically, the query code I need (in natural language) would be this:
For each time slot in AvailSlots, list one staff that has X (where X can
be any number I specify per query, from 1 to 24) consecutive datetime
slot starting from that datetime. In case more than one staff can meet
that criteria, the tie break is their "rank" which is kept in a
separate table below:
Ranking Table (lower number = higher rank)
Staff_ID Rank
1 3
2 1
3 2
If the answer is to use things like "mysql variables", "views", etc, please kindly explain how those things work. Again, as a total mysql novice, "select", "join", "where", "group by" are all I know so far. I am eager to learn more but have trouble understanding more advanced mysql concepts so far. Many thanks in advance.
Using a bit more data than you posted, I found a query that might do what you need. It does use the variables as you predicted :) but I hope it's pretty self-explanatory. Let's start with the table:
CREATE TABLE a_free
(`Staff_ID` int, `Avail_Slots_Datetime` datetime)
;
INSERT INTO a_free
(`Staff_ID`, `Avail_Slots_Datetime`)
VALUES
(1, '2015-01-01 09:00:00'),
(1, '2015-01-01 10:00:00'),
(1, '2015-01-01 11:00:00'),
(1, '2015-01-01 13:00:00'),
(2, '2015-01-01 09:00:00'),
(2, '2015-01-01 10:00:00'),
(2, '2015-01-01 11:00:00'),
(3, '2015-01-01 09:00:00'),
(3, '2015-01-01 12:00:00'),
(3, '2015-01-01 15:00:00'),
(3, '2015-01-01 16:00:00'),
(3, '2015-01-01 17:00:00'),
(3, '2015-01-01 18:00:00')
;
Then there's a query to find the consecutive slots. It lists start times of each pair, and marks each group of consecutive slots with a unique number. The case expression is where the magic happens, see the comments:
select
Staff_ID,
Avail_Slots_Datetime as slot_start,
case
when @slot_group is null then @slot_group:=0 -- initialize the variable
when @prev_end <> Avail_Slots_Datetime then @slot_group:=@slot_group+1 -- increment if the previous slot's end does not match the current slot's start
else @slot_group -- otherwise just keep the value
end as slot_group,
@prev_end:= Avail_Slots_Datetime + interval 1 hour as slot_end -- store the current slot's end to compare with the next row
from a_free
order by Staff_ID, Avail_Slots_Datetime asc;
Having the list with slot groups identified, we can wrap the query above in another one to get the lengths of each slot group. The results of the first query are treated as any other table:
select
Staff_ID,
slot_group,
min(slot_start) as group_start,
max(slot_end) as group_end,
count(*) as group_length
from (
select
Staff_ID,
Avail_Slots_Datetime as slot_start,
case
when @slot_group is null then @slot_group:=0
when @prev_end <> Avail_Slots_Datetime then @slot_group:=@slot_group+1
else @slot_group
end as slot_group,
@prev_end:= Avail_Slots_Datetime + interval 1 hour as slot_end
from a_free
order by Staff_ID, Avail_Slots_Datetime asc
) groups
group by Staff_ID, slot_group;
Note: if you use the same DB connection to execute the query again, the variables would not be reset, so the slot_groups numbering will continue to grow. This normally should not be a problem, but to be on the safe side, you need to execute something like this before or after:
select @prev_end:=null, @slot_group:=null;
Play with the fiddle if you like: http://sqlfiddle.com/#!2/0446c8/15
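On MySQL 8.0 or later, the original request (one staff member per start time with X consecutive slots, ties broken by rank) could also be approached with window functions instead of variables. The following is only a sketch of that alternative, for X = 3. It assumes hourly slots, no duplicate slots per staff member, and a ranking table named staff_rank with columns Staff_ID and Rank. The table name and the aliases runs, candidates, last_slot_in_run and pick are made up for illustration. Rank is quoted with backticks because RANK is a reserved word in MySQL 8.0.

```sql
WITH runs AS (
  SELECT Staff_ID,
         Avail_Slots_Datetime AS slot_start,
         -- the slot X-1 = 2 rows further on for the same staff member
         LEAD(Avail_Slots_Datetime, 2)
           OVER (PARTITION BY Staff_ID ORDER BY Avail_Slots_Datetime) AS last_slot_in_run
  FROM a_free
),
candidates AS (
  SELECT r.slot_start,
         r.Staff_ID,
         -- best-ranked staff member first (lower number = higher rank)
         ROW_NUMBER() OVER (PARTITION BY r.slot_start ORDER BY k.`Rank`) AS pick
  FROM runs r
  JOIN staff_rank k ON k.Staff_ID = r.Staff_ID
  -- if the slot 2 rows ahead is exactly 2 hours later, nothing is missing in between
  WHERE r.last_slot_in_run = r.slot_start + INTERVAL 2 HOUR
)
SELECT slot_start, Staff_ID
FROM candidates
WHERE pick = 1
ORDER BY slot_start;
```

For a different X, both the LEAD offset and the interval become X - 1. With the sample rows above and the ranks from the question, this should pick staff 2 for 09:00 and staff 3 for 15:00 and 16:00.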