How to merge tables in standard sql92 or in mysql - mysql

Suppose I have the following set in a table:
empid  start_time  end_time
1      8           9
1      9           10
1      11          12
1      12          13
1      13          14
1      14          15
I want an SQL query (or an SQL procedure) that converts the previous set into the following set:
empid  start_time  end_time
1      8           10
1      11          15
That is, if the end_time of a record equals the start_time of the next record, the two records should be merged into a single record that spans both (without modifying the main table, of course).

This is a type of gaps-and-islands problem. In this case, you can use LAG() to see where an "island" starts, then use a cumulative sum to assign the same number to every row within an island, and aggregate. Window functions such as LAG() require MySQL 8.0 or later:
select empid, min(start_time), max(end_time)
from (select t.*,
sum(case when prev_end_time = start_time then 0 else 1 end) over (partition by empid order by start_time) as island
from (select t.*,
lag(end_time) over (partition by empid order by start_time) as prev_end_time
from t
) t
) t
group by empid, island;
Here is a db<>fiddle.
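For readers who want to trace the lag-plus-cumulative-sum idea outside the database, here is a minimal Python sketch of the same steps. The function name `merge_adjacent` and the tuple layout are assumptions made for illustration; they are not part of the original answer.

```python
def merge_adjacent(rows):
    """Merge rows whose start_time equals the previous row's end_time.

    rows: iterable of (empid, start_time, end_time) tuples.
    """
    islands = {}   # (empid, island number) -> (min start, max end)
    island = 0     # running island counter, like the cumulative SUM
    prev = None    # previous row, playing the role of LAG()
    for empid, start, end in sorted(rows):
        # A new island starts unless the previous row belongs to the same
        # employee and ends exactly where this row starts.
        if prev is None or prev[0] != empid or prev[2] != start:
            island += 1
        lo, hi = islands.get((empid, island), (start, end))
        islands[(empid, island)] = (min(lo, start), max(hi, end))
        prev = (empid, start, end)
    return [(empid, lo, hi) for (empid, _), (lo, hi) in sorted(islands.items())]


rows = [(1, 8, 9), (1, 9, 10), (1, 11, 12), (1, 12, 13), (1, 13, 14), (1, 14, 15)]
print(merge_adjacent(rows))  # [(1, 8, 10), (1, 11, 15)]
```

Sorting first corresponds to the `partition by empid order by start_time` clause, and the final list comprehension corresponds to the `group by empid, island` step.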

Related

how to get current streak in mysql

I am trying to create a query for getting the current streak in MySQL based on status
ID  Dated       Status
1   2022-03-08  1
2   2022-03-09  1
3   2022-03-10  0
4   2022-03-11  1
5   2022-03-12  0
6   2022-03-13  1
7   2022-03-14  1
8   2022-03-16  1
9   2022-03-18  0
10  2022-03-19  1
11  2022-03-20  1
In the above table the current streak should be 2 (i.e. 2022-03-19 to 2022-03-20), based on status 1. Any help or ideas would be greatly appreciated!
WITH cte AS (
SELECT SUM(Status) OVER (ORDER BY Dated DESC) s1,
SUM(NOT Status) OVER (ORDER BY Dated DESC) s2
FROM table
)
SELECT MAX(s1)
FROM cte
WHERE NOT s2;
SELECT DATEDIFF(MAX(CASE WHEN Status THEN Dated END),
MAX(CASE WHEN NOT Status THEN Dated END))
FROM table
and so on...
This is a gaps and islands problem. In your case, you want the island of status 1 records which occurs last. We can use the difference in row numbers method, assuming you are using MySQL 8+.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY Dated) rn1,
ROW_NUMBER() OVER (PARTITION BY Status ORDER BY Dated) rn2
FROM yourTable
),
cte2 AS (
SELECT *, RANK() OVER (ORDER BY rn1 - rn2 DESC) rnk
FROM cte
WHERE Status = 1
)
SELECT ID, Dated, Status
FROM cte2
WHERE rnk = 1
ORDER BY Dated;
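The difference-in-row-numbers method can be easier to follow when traced by hand. The sketch below reproduces it in Python under the assumption that rows are `(id, dated, status)` tuples with ISO date strings (which sort chronologically); `current_streak` is a hypothetical helper name, not something from the answer.

```python
def current_streak(rows, wanted=1):
    """Return the rows of the last island of `wanted` status.

    rows: list of (id, dated, status), dated as an ISO 'YYYY-MM-DD' string.
    """
    ordered = sorted(rows, key=lambda r: r[1])
    per_status = {}   # running row number within each status (rn2)
    tagged = []
    for rn1, (rid, dated, status) in enumerate(ordered, start=1):
        per_status[status] = per_status.get(status, 0) + 1
        # rn1 - rn2 is constant inside an island and grows with each new one
        tagged.append((rn1 - per_status[status], rid, dated, status))
    candidates = [t for t in tagged if t[3] == wanted]
    if not candidates:
        return []
    top = max(t[0] for t in candidates)   # RANK() ... DESC, rnk = 1
    return [(rid, dated, status) for diff, rid, dated, status in candidates if diff == top]


sample = [
    (1, "2022-03-08", 1), (2, "2022-03-09", 1), (3, "2022-03-10", 0),
    (4, "2022-03-11", 1), (5, "2022-03-12", 0), (6, "2022-03-13", 1),
    (7, "2022-03-14", 1), (8, "2022-03-16", 1), (9, "2022-03-18", 0),
    (10, "2022-03-19", 1), (11, "2022-03-20", 1),
]
print(current_streak(sample))  # [(10, '2022-03-19', 1), (11, '2022-03-20', 1)]
```

Like the query, this returns the most recent island of status 1 even when the very latest record has status 0, so it may need an extra check if "current" must include the latest row.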
We can use two single-row CTEs to find the latest date on which the status differed from the most recent status, and then count the records after that date.
**Schema (MySQL v8.0)**
create table t(
ID int,
Dated date,
Status int);
insert into t values
(1,'2022-03-08',1),
(2,'2022-03-09',1),
(3,'2022-03-10',0),
(4,'2022-03-11',1),
(5,'2022-03-12',0),
(6,'2022-03-13',1),
(7,'2022-03-14',1),
(8,'2022-03-16',1),
(9,'2022-03-18',0),
(10,'2022-03-19',1),
(11,'2022-03-20',1);
---
**Query #1**
with latest AS
(SELECT
dated lastDate,
status lastStatus
from t
order by dated desc
limit 1 ),
lastDiff as
(select MAX(dated) diffDate
from t,latest
where not status = lastStatus
)
select count(*)
from t ,lastDiff
where dated > diffDate;
| count(*) |
| -------- |
| 2 |
---
[View on DB Fiddle](https://www.db-fiddle.com/)
We could also use DATEDIFF() to measure how many days the streak has lasted, which may be more informative than COUNT() because some days have no record.
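To make the difference between counting records and measuring days concrete, here is a small Python sketch. It assumes rows are `(dated, status)` pairs and defines the day span as the inclusive number of days from the first to the last record of the streak; that definition and the name `streak_count_and_days` are choices made for this illustration only.

```python
from datetime import date


def streak_count_and_days(rows):
    """Return (number of records, inclusive day span) of the latest streak.

    rows: non-empty list of (dated: date, status: int).
    """
    ordered = sorted(rows)
    last_status = ordered[-1][1]
    streak = []
    # Walk backwards until the status differs from the most recent one.
    for dated, status in reversed(ordered):
        if status != last_status:
            break
        streak.append(dated)
    days = (streak[0] - streak[-1]).days + 1
    return len(streak), days


sample = [(date(2022, 3, d), s) for d, s in
          [(8, 1), (9, 1), (10, 0), (11, 1), (12, 0), (13, 1),
           (14, 1), (16, 1), (18, 0), (19, 1), (20, 1)]]
print(streak_count_and_days(sample))      # (2, 2)
print(streak_count_and_days(sample[:8]))  # (3, 4)
```

The second call stops at 2022-03-16, where the streak has three records (the 13th, 14th and 16th) but spans four days, which is exactly the case where the two measures disagree.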

MySQL find rows where yesterday's value is > X AND where last 5 days value < X exists

Let's say I have the following table:
date | name | value
----------------------------
2020-09-01 | name1 | 10
2020-09-02 | name1 | 9
2020-09-03 | name1 | 12
2020-09-04 | name1 | 11
2020-09-05 | name1 | 11
I would like to identify names where the latest value >= 10 AND where over the last 5 days it has ever dropped below 10. In the example table above, name1 would be returned because the latest date has a value of 11 (which is > 10), and over the last 5 days it has dropped below 10 at least once.
Here is my SELECT statement, but it always returns zero rows:
SELECT
name,
count(value) as count
FROM table_name
WHERE
(date = @date AND value >= 10) AND
date BETWEEN date_sub(@date, interval 5 day) AND @date AND value < 10
GROUP BY name
HAVING count < 5
ORDER BY name
I understand why it's failing, but I don't know what to change.
In MySQL 8.0, you could use window functions and aggregation:
select name
from (
select t.*, row_number() over(partition by name order by date desc) rn
from mytable t
where date >= @date - interval 5 day and date <= @date
) t
group by name
having max(case when rn = 1 then value end) >= 10 and min(value) < 10
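The condition being tested (latest value in the window at or above the threshold, and at least one value in the window below it) can be checked with a short Python sketch. The function name `recovered_names`, the `(date, name, value)` tuple layout, and the default threshold and window size are assumptions for illustration.

```python
from datetime import date, timedelta


def recovered_names(rows, as_of, threshold=10, days=5):
    """Names whose latest value in the window is >= threshold
    and that dropped below threshold at least once in the window.

    rows: iterable of (date, name, value).
    """
    window_start = as_of - timedelta(days=days)
    by_name = {}
    for d, name, value in rows:
        if window_start <= d <= as_of:
            by_name.setdefault(name, []).append((d, value))
    result = []
    for name, items in sorted(by_name.items()):
        latest_value = max(items)[1]             # value on the most recent date
        lowest_value = min(v for _, v in items)  # lowest value in the window
        if latest_value >= threshold and lowest_value < threshold:
            result.append(name)
    return result


rows = [(date(2020, 9, 1), "name1", 10), (date(2020, 9, 2), "name1", 9),
        (date(2020, 9, 3), "name1", 12), (date(2020, 9, 4), "name1", 11),
        (date(2020, 9, 5), "name1", 11)]
print(recovered_names(rows, date(2020, 9, 5)))  # ['name1']
```

The per-name dictionary plays the role of grouping by name, and the two comparisons correspond to the two conditions in the HAVING clause.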
How about something like this:
SELECT Name, COUNT(*) AS Ct FROM
(SELECT A.*,B.mdate,
CASE WHEN A.date=B.mdate AND A.value >= 10 THEN 1
WHEN A.date >= B.mdate - INTERVAL 5 DAY AND A.date <> B.mdate AND A.value < 10 THEN 1
ELSE 0 END AS Chk
FROM table_name A
JOIN (SELECT Name,MAX(DATE) AS mdate FROM table_name GROUP BY Name) B ON A.Name=B.Name
HAVING Chk <> 0) V
GROUP BY Name
HAVING Ct >= 2
Here's a fiddle for reference: https://www.db-fiddle.com/f/jX4GktCdTrUbqHBf7ZQwdr/0
And here's a breakdown of what the query above is doing.
It joins table_name with a sub-query on the same table that returns the MAX(DATE) value per name for comparison.
It uses a CASE expression to check your conditions: a row that matches returns 1, otherwise 0. The HAVING clause then excludes every row for which the CASE expression returned 0.
It wraps that query as a sub-query (aliased as V), counts the occurrences per name with COUNT(*), and uses HAVING again to keep only names with 2 or more occurrences.

Calculate total working hour of employee in sql with only 1 column

Data in table rider_status will be like:
rider_id online_status date_time
2 1 2019-10-17 08:00:40
3 1 2019-10-17 09:30:30
2 0 2019-10-17 12:30:40
2 1 2019-10-17 14:50:50
2 0 2019-10-17 18:50:50
Online status 0 = not working
Online status 1 = working
Now I want to calculate the total working hours of rider '2' for a particular date (for example '2019-10-17'). I also want to calculate the total hours of that rider for a particular date range (for example '2019-10-05' to '2019-10-30').
My answer for rider_id '2' should be like:
12:30:40 - 08:00:40 = 04:30:00
18:50:50 - 14:50:50 = 04:00:00
--------
Total working hour = 08:30:00
Assuming that the online status always alternates 1, 0, 1, 0 and so on, you can use LEAD() and LAG() to match each log-off to its log-on. Be sure to partition the window by the rider ID.
https://www.geeksforgeeks.org/mysql-lead-and-lag-function/
LEAD() and LAG() are also available in SQL Server from version 2012 onwards.
You can then use TIMEDIFF() to get the difference between the two times.
https://www.w3resource.com/mysql/date-and-time-functions/mysql-timediff-function.php
In SQL Server you would use DATEDIFF(MINUTE, Time1, Time2).
Assuming that the 1s and 0s are interleaved for each person, you can use:
select rider_id,
sum(timestampdiff(second, date_time, next_date_time)) as time_in_seconds
from (select t.*,
lead(date_time) over (partition by rider_id order by date_time) as next_date_time
from t
) t
where online_status = 1
group by rider_id;
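The pairing of each online row with the next row for the same rider can be traced with the Python sketch below. It assumes rows are `(rider_id, online_status, date_time)` tuples and that statuses alternate per rider; `working_seconds` is a hypothetical name. Unlike the query, it simply omits riders that have no completed online period.

```python
from datetime import datetime


def working_seconds(rows):
    """Total online seconds per rider.

    rows: iterable of (rider_id, online_status, date_time).
    """
    by_rider = {}
    for rider_id, status, ts in rows:
        by_rider.setdefault(rider_id, []).append((ts, status))
    totals = {}
    for rider_id, events in by_rider.items():
        events.sort()
        # zip(events, events[1:]) pairs each row with the next one, like LEAD()
        for (ts, status), (next_ts, _) in zip(events, events[1:]):
            if status == 1:
                seconds = int((next_ts - ts).total_seconds())
                totals[rider_id] = totals.get(rider_id, 0) + seconds
    return totals


rows = [
    (2, 1, datetime(2019, 10, 17, 8, 0, 40)),
    (3, 1, datetime(2019, 10, 17, 9, 30, 30)),
    (2, 0, datetime(2019, 10, 17, 12, 30, 40)),
    (2, 1, datetime(2019, 10, 17, 14, 50, 50)),
    (2, 0, datetime(2019, 10, 17, 18, 50, 50)),
]
print(working_seconds(rows))  # {2: 30600}
```

To restrict the total to one day or a date range, the rows could be filtered on `date_time` before calling the function, which corresponds to adding a WHERE condition on date_time in the query.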
SELECT rider_id,
date_time AS online_from,
-- pair each online row with the offline row that follows it for the same rider
CASE WHEN online_status = 1 AND LEAD(online_status) OVER w = 0
THEN TIMEDIFF(LEAD(date_time) OVER w, date_time)
END AS time_worked
FROM rider_status
WINDOW w AS (PARTITION BY rider_id ORDER BY date_time)

Correct query to get average from top 5 of 7 days?

I'm tracking number of steps/day. I want to get the average steps/day using the 5 best days out of a 7 day period. My end goal is going to be to get an average for the best 5 out of 7 days for a total of 16 weeks.
Here's my sqlfiddle - http://sqlfiddle.com/#!9/5e69bdf/2
Here is the query I'm currently using but I've discovered the result is not correct. It's taking the average of 7 days instead of selecting the 5 days that had the most steps. It's outputting 14,122 as an average instead of 11,606 based on my data as posted in the sqlfiddle.
SELECT SUM(a.steps) as StepsTotal, AVG(a.steps) AS AVGSteps
FROM (SELECT * FROM activities
JOIN Courses
WHERE activities.encodedid=? AND activities.activitydate BETWEEN
DATE_ADD(Courses.Startsemester, INTERVAL $y DAY) AND
DATE_ADD(Courses.Startsemester, INTERVAL $x DAY)
ORDER BY activities.steps DESC LIMIT 5
) a
GROUP BY a.encodedid
Here's the same query with the values filled in for testing:
SELECT SUM(a.steps) as StepsTotal, AVG(a.steps) AS AVGSteps
FROM (SELECT * FROM activities
JOIN Courses
WHERE activities.encodedid='42XPC3' AND activities.activitydate BETWEEN
DATE_ADD(Courses.Startsemester, INTERVAL 0 DAY) AND
DATE_ADD(Courses.Startsemester, INTERVAL 6 DAY)
ORDER BY activities.steps DESC LIMIT 5
) a
GROUP BY a.encodedid
As @SloanThrasher pointed out, the query fails because the Courses table has multiple rows for the same course, and each of them is joined to the activities table. The subquery therefore returns the top value (16058) 3 times plus the second highest value (11218) twice, for a total of 70610 and an average of 14122. You can work around this by modifying the query as follows:
SELECT SUM(a.steps) as StepsTotal, AVG(a.steps) AS AVGSteps
FROM (SELECT * FROM activities
JOIN (SELECT DISTINCT Startsemester FROM Courses) c
WHERE activities.encodedid='42XPC3' AND activities.activitydate BETWEEN
DATE_ADD(c.Startsemester, INTERVAL 0 DAY) AND
DATE_ADD(c.Startsemester, INTERVAL 6 DAY)
ORDER BY CAST(activities.steps AS UNSIGNED) DESC LIMIT 5
) a
GROUP BY a.encodedid
Since there are actually only 3 days with activity (2018-07-16, 2018-07-17 and 2018-07-18) between the start of the semester and 6 days later (2018-07-12 to 2018-07-18), this gives a total of 37553 (16058+11218+10277) and an average of 12517.7.
StepsTotal AVGSteps
37553 12517.666666666666
Ideally, you probably also want to add a constraint on the Course chosen from Courses e.g. change
(SELECT DISTINCT Startsemester FROM Courses)
to
(SELECT DISTINCT Startsemester FROM Courses WHERE CourseNumber='PHED1164')
Try this query:
SELECT @rn := 1, @weekAndYear := 0;
SELECT weekDayAndYear,
SUM(steps),
AVG(steps)
FROM (
SELECT @weekAndYear weekAndYearLag,
CASE WHEN @weekAndYear = YEAR(activitydate) * 100 + WEEK(activitydate)
THEN @rn := @rn + 1 ELSE @rn := 1 END rn,
@weekAndYear := YEAR(activitydate) * 100 + WEEK(activitydate) weekDayAndYear,
steps,
lightly_act_min,
fairly_act_min,
sed_act_min,
vact_min,
encodedid,
activitydate,
username
FROM activities
ORDER BY YEAR(activitydate) * 100 + WEEK(activitydate), CAST(steps AS UNSIGNED) DESC
) a WHERE rn <= 5
GROUP BY weekDayAndYear
With the additional variables I imitate SQL Server's ROW_NUMBER() function, numbering the days within each week from 1 to 7 in descending order of steps. That lets me keep only the best 5 days and take the average per week by grouping on the weekDayAndYear column, which has the same format as the variable: yyyyww (I used an integer to avoid casting to varchar).
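The "number the days within each week, keep the top 5, then average" idea can be sketched in Python as follows. This is an illustration under assumptions: `best_days_average` is a made-up name, rows are `(activitydate, steps)` pairs, and weeks are ISO weeks, which do not necessarily match the numbering of MySQL's WEEK() in its default mode. Only the first three step counts come from the question's data; the other four are invented for the example.

```python
from collections import defaultdict
from datetime import date


def best_days_average(rows, keep=5):
    """Per (ISO year, ISO week): total and average of the `keep` best days.

    rows: iterable of (activitydate: date, steps: int).
    """
    weeks = defaultdict(list)
    for day, steps in rows:
        iso = day.isocalendar()           # (ISO year, ISO week, ISO weekday)
        weeks[(iso[0], iso[1])].append(steps)
    result = {}
    for week, steps in sorted(weeks.items()):
        best = sorted(steps, reverse=True)[:keep]   # rn <= keep
        result[week] = (sum(best), sum(best) / len(best))
    return result


# 16058, 11218 and 10277 are from the question; the rest are invented.
steps = [16058, 11218, 10277, 5000, 4000, 3000, 2000]
rows = [(date(2018, 7, 16 + i), s) for i, s in enumerate(steps)]
print(best_days_average(rows))
```

If a week has fewer than `keep` days with data, the average is taken over the days that exist, which matches what the `rn <= 5` filter does in the query.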
Consider the following:
DROP TABLE IF EXISTS my_table;
CREATE TABLE `my_table`
(id SERIAL PRIMARY KEY
,steps INT NOT NULL
);
insert into my_table (steps) values
(9),(5),(7),(7),(7),(8),(4);
select prev
, sum(steps) total
from (
select steps
, case when @prev = grp
then @j:=@j+1 else @j:=1 end j
, @prev:=grp prev
from (SELECT steps
, case when mod(@i,3)=0
then @grp := @grp+1 else @grp:=@grp end grp -- a 3 day week
, @i:=@i+1 i
from my_table
, (select @i:=0,@grp:=0) vars
order
by id) x
, (select @prev:= null, @j:=0) vars
order by grp,steps desc,i) a
where j <=2 -- top 2 (out of 3)
group by prev;
+------+-------+
| prev | total |
+------+-------+
| 1 | 16 |
| 2 | 15 |
| 3 | 4 |
+------+-------+
http://sqlfiddle.com/#!9/ee46d7/11

mysql select rows by consecutive date

I have a table of available date blocks (7 days in my case) which may or may not be consecutive:
start_date end_date booked id room_id
2012-07-14 2012-07-21 0 1 6
2012-07-21 2012-07-28 0 2 6
2012-07-28 2012-08-04 1 3 6
2012-08-04 2012-08-11 0 4 6
What I'd like to do is be able to get a result set that gives me one row per X weeks of consecutive unbooked dates, within a date range.
So, for 2 week blocks starting on the 14th of July and using the above table data, I would expect the following:
start_date end_date booked
2012-07-14 2012-07-28 0
The second block of 2 weeks would not be returned as one of the component weeks is booked.
Here are a few ideas I've tried already:
SELECT
MIN(start_date) AS start_date_min,
MAX(end_date) AS end_date_max,
CAST(GROUP_CONCAT(id) AS CHAR) AS ids,
SUM(booked) AS booked
FROM
available_dates
WHERE
(start_date>=20120714 AND end_date<=DATE_ADD(20120714, INTERVAL 14 DAY))
GROUP BY
room_id
HAVING
end_date_max=DATE_ADD(20120714, INTERVAL 14 DAY)
This gets me part of the way, however doesn't get me the consecutive results - that is the important part. It also only returns a single result (probably because of the HAVING clause) when I widen the test data.
Can anyone point me in the right direction?
If you have a calendar or a numbers table:
CREATE TABLE num
( i INT NOT NULL
, PRIMARY KEY (i)
) ;
INSERT INTO num
(i)
VALUES
(0), (1), (2), ..., (1000) ;
You could use something like this:
SELECT
avail.room_id,
MIN(avail.start_date) AS start_date_min,
MAX(avail.end_date) AS end_date_max,
CAST(GROUP_CONCAT(avail.id) AS CHAR) AS ids,
SUM(avail.booked) AS booked
FROM
available_dates AS avail
CROSS JOIN
( SELECT DATE('2012-07-14') AS start_date_check
, 52 AS max_week_check
) AS param
JOIN
num
ON avail.start_date = param.start_date_check + INTERVAL num.i WEEK
AND num.i < param.max_week_check
WHERE
avail.booked = 0
GROUP BY
avail.room_id,
( num.i DIV 2 ) -- integer division, so weeks 0-1, 2-3, ... share a group
HAVING
COUNT(*) = 2
You could also have this:
WHERE
1 = 1 -- no WHERE condition
GROUP BY
avail.room_id,
( num.i DIV 2 )
HAVING -- and optionally
SUM(avail.booked) = 0 -- this
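As a way to check the grouping logic by hand, the following Python sketch mirrors the first variant of the query (filter out booked weeks, bucket the remaining weeks into blocks, keep only complete blocks). The name `free_blocks` and its parameters are assumptions for illustration, and it assumes at most one row per room per week.

```python
from datetime import date


def free_blocks(rows, first_start, weeks_per_block=2, max_weeks=52):
    """Blocks of `weeks_per_block` consecutive unbooked weeks per room.

    rows: iterable of (room_id, start_date, end_date, booked).
    """
    groups = {}
    for room_id, start, end, booked in rows:
        offset = (start - first_start).days
        # Keep only unbooked weeks that start on a whole-week offset
        # inside the checked range (the join against the numbers table).
        if booked or offset < 0 or offset % 7 or offset // 7 >= max_weeks:
            continue
        block = (offset // 7) // weeks_per_block   # like num.i DIV 2
        groups.setdefault((room_id, block), []).append((start, end))
    result = []
    for (room_id, _), spans in sorted(groups.items()):
        if len(spans) == weeks_per_block:          # HAVING COUNT(*) = 2
            result.append((room_id,
                           min(s for s, _ in spans),
                           max(e for _, e in spans)))
    return result


rows = [
    (6, date(2012, 7, 14), date(2012, 7, 21), 0),
    (6, date(2012, 7, 21), date(2012, 7, 28), 0),
    (6, date(2012, 7, 28), date(2012, 8, 4), 1),
    (6, date(2012, 8, 4), date(2012, 8, 11), 0),
]
print(free_blocks(rows, date(2012, 7, 14)))
```

With the sample data only the block from 2012-07-14 to 2012-07-28 is returned, because the second two-week block contains a booked week and therefore has only one qualifying row.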