I am trying to pair records in a SQL table. My table looks similar to this:
UID DATE TIME MateID
---------------------------------------
1 2013-06-07 08:00 NULL
2 2013-06-07 10:00 NULL
3 2013-06-07 13:00 NULL
4 2013-06-07 17:00 NULL
5 2013-06-08 07:00 NULL
6 2013-06-08 11:00 NULL
7 2013-06-08 14:00 NULL
8 2013-06-08 18:00 NULL
I know I can do this with a cursor, but I wanted to know if there was a set-based solution that could give me this output:
UID DATE TIME MateID
---------------------------------------
1 2013-06-07 08:00 2
2 2013-06-07 10:00 1
3 2013-06-07 13:00 4
4 2013-06-07 17:00 3
5 2013-06-08 07:00 6
6 2013-06-08 11:00 5
7 2013-06-08 14:00 8
8 2013-06-08 18:00 7
The UID field won't be consecutive; the records will be ordered by DATE and TIME. The table will contain about 50k records.
Edit: Sorry, I should have been a bit clearer. MateID is the UID of the previous/next record. Records are grouped by DATE and ordered by TIME ASC, so the first and second records of a DATE are a pair, and the third and fourth records of that DATE are paired too. Please let me know if you need me to explain anything else. There will always be an even number of records per date.
Thanks
You can use ROW_NUMBER() and some simple maths to generate PairIDs:
declare @Tab table (UID int not null, Date date not null, Time time not null)
insert into @Tab (UID,Date,Time) values
(1,'20130607','08:00'),
(2,'20130607','10:00'),
(3,'20130607','13:00'),
(4,'20130607','17:00'),
(5,'20130608','07:00'),
(6,'20130608','11:00'),
(7,'20130608','14:00'),
(8,'20130608','18:00')
-- (ROW_NUMBER() + 1) / 2 is integer division: rows 1-2 get PairID 1, rows 3-4 get PairID 2, ...
;With PairedRows as (
select UID,Date,Time,
(ROW_NUMBER() OVER (ORDER BY Date,Time) + 1) / 2 as PairID
from @Tab
)
select p1.UID,p1.Date,p1.Time,p2.UID as MateID
from
PairedRows p1
inner join
PairedRows p2
on
p1.PairID = p2.PairID and
p1.UID != p2.UID
(I've done this as a SELECT, but it's easy enough to switch it to an UPDATE if this is meant to be a permanent pairing - it's not really clear from your question)
It may better match your model to PARTITION BY Date and only ORDER BY Time in the ROW_NUMBER() function. If you do, also add p1.Date = p2.Date to the join condition, because the PairID values then restart at 1 on every date. Since you've stated that every date has an even number of rows, the pairs come out the same either way, but the partitioned version documents your requirements better.
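A minimal sketch of the partitioned variant, assuming SQLite (3.25 or later, for window functions) as a stand-in for SQL Server so it can run through Python's built-in sqlite3 module. The table name `tab` and the helper `mate_ids` are hypothetical. Because PairID restarts on each date here, the join also matches on Date.

```python
import sqlite3

# Sample rows from the question: (UID, Date, Time).
ROWS = [
    (1, "2013-06-07", "08:00"), (2, "2013-06-07", "10:00"),
    (3, "2013-06-07", "13:00"), (4, "2013-06-07", "17:00"),
    (5, "2013-06-08", "07:00"), (6, "2013-06-08", "11:00"),
    (7, "2013-06-08", "14:00"), (8, "2013-06-08", "18:00"),
]

# Number rows per date in time order; (rn + 1) / 2 is integer division,
# so the 1st and 2nd rows of a date share PairID 1, the 3rd and 4th PairID 2.
PAIR_SQL = """
WITH PairedRows AS (
    SELECT UID, Date, Time,
           (ROW_NUMBER() OVER (PARTITION BY Date ORDER BY Time) + 1) / 2 AS PairID
    FROM tab
)
SELECT p1.UID, p2.UID AS MateID
FROM PairedRows AS p1
JOIN PairedRows AS p2
  ON p1.Date = p2.Date
 AND p1.PairID = p2.PairID
 AND p1.UID <> p2.UID
ORDER BY p1.UID
"""


def mate_ids(rows):
    """Return {UID: MateID} for the given (UID, Date, Time) rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab (UID INTEGER, Date TEXT, Time TEXT)")
    conn.executemany("INSERT INTO tab VALUES (?, ?, ?)", rows)
    result = dict(conn.execute(PAIR_SQL).fetchall())
    conn.close()
    return result
```

Calling `mate_ids(ROWS)` pairs 1 with 2, 3 with 4, and so on; UIDs do not need to be consecutive because only the Time order within each Date matters.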
I have two tables in my schema. The first contains a list of recurring appointments: default_appointments. The second table is actual_appointments; these can be generated from the defaults or created individually, in which case they are not linked to any default entry.
Example:
default_appointments
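+----+-------------+---------+------------------------+----------------------+
| id | day_of_week | user_id | appointment_start_time | appointment_end_time |
+----+-------------+---------+------------------------+----------------------+
|  1 |           1 |       1 | 10:00:00               | 16:00:00             |
|  2 |           4 |       1 | 11:30:00               | 17:30:00             |
|  3 |           6 |       5 | 09:00:00               | 17:00:00             |
+----+-------------+---------+------------------------+----------------------+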
actual_appointments
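+----+------------------------+---------+---------------------+---------------------+
| id | default_appointment_id | user_id | appointment_start   | appointment_end     |
+----+------------------------+---------+---------------------+---------------------+
|  1 |                      1 |       1 | 2021-09-13 10:00:00 | 2021-09-13 16:00:00 |
|  2 |                   NULL |       1 | 2021-09-13 11:30:00 | 2021-09-13 13:30:00 |
|  3 |                      6 |       5 | 2021-09-18 09:00:00 | 2021-09-18 17:00:00 |
+----+------------------------+---------+---------------------+---------------------+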
I'm looking to calculate the total minutes that were planned (scheduled by the defaults) against the total minutes that were actually created/generated. So ultimately I'd end up with a query result containing this data:
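+---------+------------------+-----------------------+----------------------+
| user_id | appointment_date | total_planned_minutes | total_actual_minutes |
+---------+------------------+-----------------------+----------------------+
|       1 | 2021-09-13       |                   360 |                  480 |
|       1 | 2021-09-16       |                   360 |                    0 |
|       5 | 2021-09-18       |                   480 |                  480 |
+---------+------------------+-----------------------+----------------------+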
What would be the best approach here? Hopefully the above makes sense.
Edit
OK, so the default_appointments table contains all appointments that are "standard" and are automatically generated. These are the appointments that "should" happen every week. So e.g. ID 1 should occur between 10am and 4pm every Monday, and ID 2 should occur between 11:30am and 5:30pm every Thursday.
The actual_appointments table contains a list of all of the appointments that actually occurred. When a default_appointment is first set up, it automatically generates an instance of itself in the actual_appointments table. A non-NULL default_appointment_id indicates that the row links to a default and has not been changed, so the times on both rows are the same. The user is free to change appointments that were generated by a default, which sets default_appointment_id to NULL, or they can add new appointments unrelated to a default.
So, if on a Monday (day_of_week = 1) I should normally have a default appointment from 10am to 4pm, the total planned minutes based on the defaults are 360. Regardless of what's in the actual_appointments table, I should be planned for those 360 minutes every Monday without fail. If I then record in the system that I didn't actually have an appointment from 10am to 4pm and change it to 10am to 2pm, the actual_appointments table will contain the actual time for that day, and the actual minutes appointed would be 240.
What I need is to group each of these by the date and user to understand how much time the user had planned for appointments in the default_appointments table vs how much they actually appointed.
Adjusted based on new detail in the question.
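Note: I used day_of_week values compatible with default MySQL behavior, where DAYOFWEEK() returns 1 = Sunday, 2 = Monday, ..., 7 = Saturday. With that numbering, the question's values 1, 4 and 6 (Monday, Thursday, Saturday) become 2, 5 and 7. To keep the question's Monday = 1 numbering instead, compare against WEEKDAY(adate) + 1, since WEEKDAY() returns 0 for Monday.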
The first CTE term (args) provides the search parameters, start date and number of days. The second CTE term (drange) calculates the dates in the range to allow generation of the scheduled appointments within that range.
allrows combines the scheduled and actual appointments via UNION to prepare for aggregation. There are other ways to set this up.
Finally, we aggregate the results per user_id and date.
The test case:
WITH RECURSIVE args (startdate, days) AS (
SELECT DATE('2021-09-13'), 7
)
, drange (adate, days) AS (
SELECT startdate, days-1 FROM args UNION ALL
SELECT adate + INTERVAL '1' DAY, days-1 FROM drange WHERE days > 0
)
, allrows AS (
SELECT da.user_id
, dr.adate
, ROUND(TIME_TO_SEC(TIMEDIFF(da.appointment_end_time, da.appointment_start_time))/60, 0) AS planned
, 0 AS actual
FROM drange AS dr
JOIN default_appointments AS da
ON da.day_of_week = dayofweek(adate)
UNION ALL  -- ALL keeps duplicate rows, e.g. two equal-length appointments on the same day
SELECT user_id
, DATE(appointment_start) AS xdate
, 0 AS planned
, TIMESTAMPDIFF(MINUTE, appointment_start, appointment_end)
FROM drange AS dr
JOIN actual_appointments aa
ON DATE(appointment_start) = dr.adate
)
SELECT user_id, adate
, SUM(planned) AS planned
, SUM(actual) AS actual
FROM allrows
GROUP BY adate, user_id
ORDER BY user_id, adate
;
Result:
+---------+------------+---------+--------+
| user_id | adate | planned | actual |
+---------+------------+---------+--------+
| 1 | 2021-09-13 | 360 | 480 |
| 1 | 2021-09-16 | 360 | 0 |
| 5 | 2021-09-18 | 480 | 480 |
+---------+------------+---------+--------+
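The same approach can be sketched in SQLite through Python's sqlite3 module, assuming SQLite as a stand-in for MySQL. DAYOFWEEK(), TIMEDIFF() and TIMESTAMPDIFF() are MySQL functions, so this version uses strftime('%w') (0 = Sunday, 1 = Monday, ..., 6 = Saturday) and julianday() differences instead. Because %w maps Monday to 1, the question's original day_of_week values (1, 4, 6) can be used unchanged. The helper `planned_vs_actual` is hypothetical, and allrows uses UNION ALL so identical rows are not collapsed.

```python
import sqlite3

# Sample data from the question (day_of_week: 1 = Monday, as in the question).
SETUP_SQL = """
CREATE TABLE default_appointments (
    id INTEGER, day_of_week INTEGER, user_id INTEGER,
    appointment_start_time TEXT, appointment_end_time TEXT);
CREATE TABLE actual_appointments (
    id INTEGER, default_appointment_id INTEGER, user_id INTEGER,
    appointment_start TEXT, appointment_end TEXT);
INSERT INTO default_appointments VALUES
    (1, 1, 1, '10:00:00', '16:00:00'),
    (2, 4, 1, '11:30:00', '17:30:00'),
    (3, 6, 5, '09:00:00', '17:00:00');
INSERT INTO actual_appointments VALUES
    (1, 1, 1, '2021-09-13 10:00:00', '2021-09-13 16:00:00'),
    (2, NULL, 1, '2021-09-13 11:30:00', '2021-09-13 13:30:00'),
    (3, 6, 5, '2021-09-18 09:00:00', '2021-09-18 17:00:00');
"""

REPORT_SQL = """
WITH RECURSIVE drange(adate, days) AS (
    SELECT date(:start), :days - 1
    UNION ALL
    SELECT date(adate, '+1 day'), days - 1 FROM drange WHERE days > 0
),
allrows AS (
    -- one planned row per default appointment whose weekday falls in the range
    SELECT da.user_id, dr.adate,
           CAST(round((julianday(da.appointment_end_time)
                       - julianday(da.appointment_start_time)) * 1440) AS INTEGER) AS planned,
           0 AS actual
    FROM drange AS dr
    JOIN default_appointments AS da
      ON da.day_of_week = CAST(strftime('%w', dr.adate) AS INTEGER)
    UNION ALL
    -- one actual row per appointment starting on a date in the range
    SELECT aa.user_id, date(aa.appointment_start), 0,
           CAST(round((julianday(aa.appointment_end)
                       - julianday(aa.appointment_start)) * 1440) AS INTEGER)
    FROM drange AS dr
    JOIN actual_appointments AS aa
      ON date(aa.appointment_start) = dr.adate
)
SELECT user_id, adate, SUM(planned) AS planned, SUM(actual) AS actual
FROM allrows
GROUP BY user_id, adate
ORDER BY user_id, adate
"""


def planned_vs_actual(start, days):
    """Return (user_id, date, planned, actual) rows for `days` days from `start`."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP_SQL)
    rows = conn.execute(REPORT_SQL, {"start": start, "days": days}).fetchall()
    conn.close()
    return rows
```

`planned_vs_actual("2021-09-13", 7)` covers the same week as the MySQL version above.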
The MySQL table we work on has data in the following format:
entityId status updated_date
-------------------------------
1 1 29/05/2017 12:00
1 2 29/05/2017 03:00
1 3 29/05/2017 07:00
1 4 29/05/2017 14:00
1 5 30/05/2017 02:00
1 6 30/05/2017 08:00
2 1 31/05/2017 03:00
2 2 31/05/2017 05:00
.
.
So every entity id has 6 statuses, and every status has an update datetime. Each status has an activity attached to it.
For example 1 - Started journey
2 - Reached first destination
3 - Left point A, moving towards B, etc.
I need output in the format below for specific statuses, e.g. 3 and 4, with the time for status 3 and the time for status 4 in separate columns.
entity_id time_started_journey time_reached_first_destination
(update time of status 3) (update time of status 4)
--------------------------------------------------------------
1 29/05/2017 7:00 29/05/2017 14:00
2 30/05/2017 7:00 30/05/2017 16:00
Later I need to calculate the total time, which would be the difference between the two.
How can I achieve the desired result using MySQL?
I tried using the UNION operator but could not get the results into separate columns.
I also tried using a CASE expression in the query below, but it failed.
select distinct entityid,
(case status when 3 then freight_update_time else 0 end)
as starttime,
(case status when 4 then freight_update_time else 0 end) as endtime
from table ;
Can anyone throw light on this?
Conditional aggregation is one way to return a resultset that looks like that.
SELECT t.entityid
, MAX(IF(t.status=3,t.updated_date,NULL)) AS time_started_journey
, MAX(IF(t.status=4,t.updated_date,NULL)) AS time_reached_first_destination
FROM mytable t
WHERE t.status IN (3,4)
GROUP BY t.entityid
ORDER BY t.entityid
This is just one suggestion; the specification is unclear about what the query should do with duplicated status values for a given entityid.
There are other query patterns that will return similar results.
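A minimal sketch of the conditional aggregation plus the duration the question asks for, assuming SQLite via Python's sqlite3 as a stand-in for MySQL. It uses CASE instead of MySQL's IF(), and assumes updated_date is stored as an ISO-style 'YYYY-MM-DD HH:MM' string (the question's dd/mm/yyyy text would neither sort nor parse correctly). In MySQL, TIMESTAMPDIFF(MINUTE, start, end) would take the place of the julianday() arithmetic. The helper `durations` is hypothetical.

```python
import sqlite3

# Sample rows from the question, with updated_date rewritten as
# 'YYYY-MM-DD HH:MM' (assumption: the real column is a DATETIME).
ROWS = [
    (1, 1, "2017-05-29 12:00"), (1, 2, "2017-05-29 03:00"),
    (1, 3, "2017-05-29 07:00"), (1, 4, "2017-05-29 14:00"),
    (1, 5, "2017-05-30 02:00"), (1, 6, "2017-05-30 08:00"),
    (2, 1, "2017-05-31 03:00"), (2, 2, "2017-05-31 05:00"),
]

# Pivot statuses 3 and 4 into columns, then compute the gap in minutes.
PIVOT_SQL = """
SELECT entityId, start_time, end_time,
       CAST(round((julianday(end_time) - julianday(start_time)) * 1440) AS INTEGER)
           AS minutes
FROM (
    SELECT t.entityId,
           MAX(CASE WHEN t.status = 3 THEN t.updated_date END) AS start_time,
           MAX(CASE WHEN t.status = 4 THEN t.updated_date END) AS end_time
    FROM mytable AS t
    WHERE t.status IN (3, 4)
    GROUP BY t.entityId
) AS s
ORDER BY entityId
"""


def durations(rows):
    """Return (entityId, status-3 time, status-4 time, minutes) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (entityId INTEGER, status INTEGER, updated_date TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = conn.execute(PIVOT_SQL).fetchall()
    conn.close()
    return result
```

If an entity has status 3 but no status 4 yet, the end time and the minutes come back as NULL rather than a misleading number.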
My query in MySQL:
SELECT
e3.entityId,
e3.updated_date AS sta3,
e4.updated_date AS sta4
FROM
`prueba` AS e3
LEFT JOIN prueba AS e4
ON
e3.entityId = e4.entityId AND e4.status = 4
WHERE
e3.status = 3
I am working on a space booking system and am trying to find out whether a space is available for a given date range. I have two tables:
space_master
id space_address start_time end_time
space_availability
space_id days_of_week
Note -
(a) start_time and end_time are DATETIME fields in MySQL
(b) days_of_week are numbers where Sunday is represented by 1 and so on
(c) one space can be available on multiple days (example follows)
space_master
id space_address start_time end_time
1 Florida 2012-03-18 10:21:00 2012-03-29 4:21:00
2 London 2012-04-21 09:00:00 2012-06-18 10:00:00
space_availability
space_id days_of_week
1 1
1 2
2 4
2 5
2 6
This means the first space (with id 1) is available between 2012-03-18 10:21:00 and 2012-03-29 4:21:00, but only on Sunday and Monday. Now I am trying to write a function that takes booking_start_time and booking_end_time (both DATETIME in MySQL) as input, scans the available spaces table, and returns the available spaces. Something like this:
getBooking(2012-03-19 10:21:00, 2012-03-19 15:21:00) - returns the space with id 1 (as 19th March 2012 was a Monday and hence available)
getBooking(2012-03-19 10:21:00, 2012-03-20 15:21:00) - returns nothing since 20th March is a Tuesday on which the space is not available.
Any idea how to do this? I was first trying to do this in a single query. But is that even possible?
EDIT: the sqlfiddle link follows -
http://sqlfiddle.com/#!9/8a6e1a
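One possible single-query approach, sketched under assumptions: a recursive CTE lists every calendar date the booking touches, and a space qualifies only if the booking lies inside its start_time/end_time window and none of those dates falls on a weekday missing from space_availability. It runs on SQLite through Python's sqlite3 as a stand-in for MySQL; strftime('%w') returns 0 for Sunday, so 1 is added to match the question's Sunday = 1 numbering. Timestamps are assumed to be zero-padded ISO strings (so 4:21:00 becomes 04:21:00), and `get_booking` is a hypothetical name modelled on the question's getBooking.

```python
import sqlite3

SETUP_SQL = """
CREATE TABLE space_master (id INTEGER, space_address TEXT, start_time TEXT, end_time TEXT);
CREATE TABLE space_availability (space_id INTEGER, days_of_week INTEGER);
INSERT INTO space_master VALUES
    (1, 'Florida', '2012-03-18 10:21:00', '2012-03-29 04:21:00'),
    (2, 'London',  '2012-04-21 09:00:00', '2012-06-18 10:00:00');
INSERT INTO space_availability VALUES (1, 1), (1, 2), (2, 4), (2, 5), (2, 6);
"""

BOOKING_SQL = """
WITH RECURSIVE booking_days(d) AS (
    -- every calendar date from the booking's first day to its last day
    SELECT date(:bstart)
    UNION ALL
    SELECT date(d, '+1 day') FROM booking_days WHERE d < date(:bend)
)
SELECT m.id
FROM space_master AS m
WHERE m.start_time <= :bstart
  AND m.end_time >= :bend
  -- reject the space if any booking day is not one of its available weekdays
  AND NOT EXISTS (
      SELECT 1
      FROM booking_days AS b
      WHERE NOT EXISTS (
          SELECT 1
          FROM space_availability AS a
          WHERE a.space_id = m.id
            AND a.days_of_week = CAST(strftime('%w', b.d) AS INTEGER) + 1
      )
  )
ORDER BY m.id
"""


def get_booking(bstart, bend):
    """Return ids of spaces free for the whole booking (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP_SQL)
    params = {"bstart": bstart, "bend": bend}
    ids = [row[0] for row in conn.execute(BOOKING_SQL, params)]
    conn.close()
    return ids
```

With this data, a Monday-only booking on 2012-03-19 returns space 1, while a booking that runs into Tuesday 2012-03-20 returns nothing, matching the two examples in the question.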
I have a table like this
id plan_id cancel_date paid_date
9 2 2015-08-05 2014-09-13
10 2 2015-09-08 2015-09-03
10 3 NULL 2015-09-10
11 3 NULL 2015-09-13
14 3 2015-09-28 2015-09-14
And I would like to select ids where there is less than 30 days' difference between cancel_date and paid_date (for a given plan), and where the user didn't acquire a new plan within 30 days.
In this case, this would mean returning id 14 only.
Update:
Whenever a user buys a new plan, we insert it into the table with a different paid_date (paid_date is the date the plan was first acquired).
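A sketch of one reading of the requirement, assuming that "acquired a new plan in less than 30 days" means another row for the same id with a later paid_date that falls less than 30 days after this row's cancel_date. It runs on SQLite through Python's sqlite3 (julianday() differences stand in for MySQL's DATEDIFF()); the table name `plans` and the helper `quick_cancellations` are hypothetical.

```python
import sqlite3

ROWS = [  # (id, plan_id, cancel_date, paid_date) from the question
    (9, 2, "2015-08-05", "2014-09-13"),
    (10, 2, "2015-09-08", "2015-09-03"),
    (10, 3, None, "2015-09-10"),
    (11, 3, None, "2015-09-13"),
    (14, 3, "2015-09-28", "2015-09-14"),
]

QUERY_SQL = """
SELECT DISTINCT t.id
FROM plans AS t
WHERE t.cancel_date IS NOT NULL
  -- cancelled less than 30 days after paying
  AND julianday(t.cancel_date) - julianday(t.paid_date) < 30
  -- and no later plan for the same id bought within 30 days of cancelling
  AND NOT EXISTS (
      SELECT 1
      FROM plans AS n
      WHERE n.id = t.id
        AND n.paid_date > t.paid_date
        AND julianday(n.paid_date) - julianday(t.cancel_date) < 30
  )
ORDER BY t.id
"""


def quick_cancellations(rows):
    """Return ids that cancelled quickly and did not re-subscribe within 30 days."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE plans (id INTEGER, plan_id INTEGER, cancel_date TEXT, paid_date TEXT)")
    conn.executemany("INSERT INTO plans VALUES (?, ?, ?, ?)", rows)
    ids = [row[0] for row in conn.execute(QUERY_SQL)]
    conn.close()
    return ids
```

On the sample rows, id 10 is excluded because its plan 3 was paid two days after plan 2 was cancelled, which leaves id 14 as the question expects.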
I have a table that contains the following entries:
completed_time | BOOK_CNT
---------------+---------
2013-07-23     | 2
2013-07-22     | 1
2013-07-19     | 3
2013-07-16     | 5
2013-07-12     | 4
2013-07-11     | 2
2013-07-02     | 9
2013-06-30     | 5
Now, I want to use the above entries for data analysis.
Let's say DAYS_FROM, DAYS_TO and PERIOD are three variables.
I need to run queries of the following sort:
"Total books from DAYS_FROM to DAYS_TO in intervals of PERIOD."
DAYS_FROM is a date in the format YYYY-MM-DD,
DAYS_TO is a date in the format YYYY-MM-DD,
PERIOD is one of {1W,2W,1M,2M,1Y},
where W, M and Y represent WEEK, MONTH and YEAR.
Example: the query with DAYS_FROM=2013-07-23, DAYS_TO=2013-07-03 and PERIOD=1W should return:
ith week - total
1 - 3
2 - 8
3 - 6
4 - 14
Explanation:
1-3 means (The total book from 2013-07-21(sun) to 2013-07-23(tue) is 3 )
2-8 means (The total book from 2013-07-14(sun) to 2013-07-21(sun) is 8 )
3-6 means (The total book from 2013-07-07(sun) to 2013-07-14(sun) is 6 )
4-14 means (The total book from 2013-07-03(wed) to 2013-07-07(sun) is 14 )
How can I run such a query?
What I tried:
SELECT DAY(completed_time), COUNT(total) AS Total
FROM my_tab
WHERE completed_time BETWEEN '2013-07-23' - INTERVAL 1 WEEK AND '2013-07-03'
GROUP BY DAY(completed_time);
The above query subtracted 7 days from 2013-07-23 and thus treated 2013-07-16 to 2013-07-23 as the first week, 2013-07-09 to 2013-07-16 as the second week, and so on.
A simple starting point would be something like the query below; of course, you may want to adjust the ith value to suit your needs:
SET @period='1M';
SELECT CASE WHEN @period='1Y' THEN YEAR(completed_time)
WHEN @period='1M' THEN YEAR(completed_time)*100+MONTH(completed_time)
-- two-month blocks Jan-Feb, Mar-Apr, ..., Nov-Dec, labelled by the block's first month
WHEN @period='2M' THEN YEAR(completed_time)*100+FLOOR((MONTH(completed_time)-1)/2)*2+1
WHEN @period='1W' THEN YEARWEEK(completed_time)
WHEN @period='2W' THEN FLOOR(YEARWEEK(completed_time)/2)*2
END ith,
SUM(BOOK_CNT) Total
FROM my_tab
GROUP BY ith
ORDER BY ith DESC;
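For the 1W case specifically, here is a sketch that applies the DAYS_FROM/DAYS_TO filter and buckets rows into Sunday-to-Saturday weeks, assuming SQLite via Python's sqlite3 as a stand-in for MySQL. The expression date(d, '-6 days', 'weekday 0') moves each date back to the Sunday on or before it, and ROW_NUMBER() numbers the buckets newest first. Weeks with no rows produce no bucket, so with the sample data the range 2013-07-03 to 2013-07-23 yields three rows. The helper `weekly_totals` is hypothetical.

```python
import sqlite3

ROWS = [  # (completed_time, BOOK_CNT) from the question
    ("2013-07-23", 2), ("2013-07-22", 1), ("2013-07-19", 3),
    ("2013-07-16", 5), ("2013-07-12", 4), ("2013-07-11", 2),
    ("2013-07-02", 9), ("2013-06-30", 5),
]

WEEKLY_SQL = """
SELECT ROW_NUMBER() OVER (ORDER BY week_start DESC) AS ith,
       week_start,
       total
FROM (
    -- '-6 days' then 'weekday 0' lands on the Sunday on or before each date
    SELECT date(completed_time, '-6 days', 'weekday 0') AS week_start,
           SUM(BOOK_CNT) AS total
    FROM my_tab
    WHERE completed_time BETWEEN :days_to AND :days_from
    GROUP BY week_start
)
ORDER BY ith
"""


def weekly_totals(days_from, days_to):
    """Return (ith, week_start_sunday, total) rows, newest week first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_tab (completed_time TEXT, BOOK_CNT INTEGER)")
    conn.executemany("INSERT INTO my_tab VALUES (?, ?)", ROWS)
    params = {"days_from": days_from, "days_to": days_to}
    result = conn.execute(WEEKLY_SQL, params).fetchall()
    conn.close()
    return result
```

Other periods would need a different bucket expression in place of the week-start calculation, along the lines of the CASE in the answer above.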