Select rows with closest timestamp - mysql

I have a table that looks something like the following - essentially containing a timestamp as well as some other columns:
WeatherTable
+---------------------+---------+----------------+-----+
| TS                  | MonthET | InsideHumidity | ... |
+---------------------+---------+----------------+-----+
| 2014-10-27 14:24:22 |       0 |             54 |     |
| 2014-10-27 14:24:24 |       0 |             54 |     |
| 2014-10-27 14:24:26 |       0 |             52 |     |
| 2014-10-27 14:24:28 |       0 |             54 |     |
| 2014-10-27 14:24:30 |       0 |             53 |     |
| 2014-10-27 14:24:32 |       0 |             55 |     |
| 2014-10-27 14:24:34 |       9 |             54 |     |
| ...                 |         |                |     |
I'm trying to formulate a SQL query that returns all rows within a certain timeframe (no problem here) with a certain arbitrary granularity, for instance, every 15 seconds. The number is always specified in seconds but is not limited to values less than 60. To complicate things further, the timestamps don't necessarily fall on the granularity required, so it's not a case of simply selecting the timestamp of 14:24:00, 14:24:15, 14:24:30, etc. - the row with the closest timestamp to each value needs to be included in the result.
For example, if the starting time was given as 14:24:30, the end time as 14:32:00, and the granularity was 130, the ideal times would be:
14:24:30
14:26:40
14:28:50
14:31:00
However, timestamps may not exist for each of those times, in which case the row with the closest timestamp to each of those ideal timestamps should be selected. In the case of two timestamps which are equally far away from the ideal timestamp, the earlier one should be selected.
The database is part of a web service, so presently I'm just ignoring the granularity in the SQL query and filtering the unwanted results out in (Java) code later. However, this seems far from ideal in terms of memory consumption and performance.
Any ideas?

You could try to do it like this:
First create a list of time intervals. Using the stored procedure make_intervals from the question "Get a list of dates between two dates", create a temporary table by calling it like this:
call make_intervals(@startdate, @enddate, 15, 'SECOND');
You will then have a table time_intervals, one of whose two columns is named interval_start. Use it to find the closest timestamp to each interval, something like this:
-- The copy is needed because MySQL cannot refer to a temporary table twice in one query.
CREATE TEMPORARY TABLE IF NOT EXISTS time_intervals_copy
AS (SELECT * FROM time_intervals);
SELECT
  time_intervals.interval_start,
  WeatherTable.*
FROM time_intervals
JOIN WeatherTable
  ON WeatherTable.TS BETWEEN @startdate AND @enddate
JOIN (SELECT
        time_intervals.interval_start AS interval_start,
        -- TIMESTAMPDIFF gives a true difference in seconds; plain subtraction of DATETIMEs does not
        MIN(ABS(TIMESTAMPDIFF(SECOND, time_intervals.interval_start, WeatherTable.TS))) AS ts_diff
      FROM time_intervals_copy AS time_intervals
      JOIN WeatherTable
      WHERE WeatherTable.TS BETWEEN @startdate AND @enddate
      GROUP BY time_intervals.interval_start) AS closest
  ON closest.interval_start = time_intervals.interval_start AND
     ABS(TIMESTAMPDIFF(SECOND, time_intervals.interval_start, WeatherTable.TS)) = closest.ts_diff
-- Collapses ties to one row per interval; requires ONLY_FULL_GROUP_BY to be disabled
GROUP BY time_intervals.interval_start;
This will find the closest timestamp to every time interval. Note: a row in WeatherTable can be listed more than once if the interval used is less than half the interval of the stored data (or thereabouts; you get the point ;)).
Note: I did not test the queries; they are written from memory. Please adjust them to your use case and correct any minor mistakes that might be in there...
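To cross-check whatever query you end up with, the selection rule from the question can be written out in a few lines of Python. This is only a reference sketch, not part of the SQL above: the names ideal_times and closest_rows are made up for illustration, and rows are assumed to be non-empty (timestamp, payload) tuples already restricted to the requested timeframe.

```python
from datetime import datetime, timedelta

def ideal_times(start, end, step_seconds):
    """Yield start, start + step, ... for as long as the value is <= end."""
    t = start
    while t <= end:
        yield t
        t += timedelta(seconds=step_seconds)

def closest_rows(rows, start, end, step_seconds):
    """For each ideal time, pick the row with the nearest timestamp.

    rows: non-empty list of (timestamp, payload) tuples (hypothetical layout).
    Ties go to the earlier timestamp, as the question requires.
    """
    picked = []
    for ideal in ideal_times(start, end, step_seconds):
        # Sort key: distance first, then timestamp, so the earlier row wins a tie.
        best = min(rows, key=lambda r: (abs(r[0] - ideal), r[0]))
        picked.append((ideal, best))
    return picked

day = datetime(2014, 10, 27)
# (TS, InsideHumidity) pairs taken from the sample table in the question.
rows = [(day.replace(hour=14, minute=24, second=s), h)
        for s, h in [(22, 54), (24, 54), (26, 52), (28, 54),
                     (30, 53), (32, 55), (34, 54)]]
start = day.replace(hour=14, minute=24, second=25)
end = day.replace(hour=14, minute=24, second=35)
result = closest_rows(rows, start, end, 4)

# The question's own example: 14:24:30 to 14:32:00 in steps of 130 seconds.
example = list(ideal_times(day.replace(hour=14, minute=24, second=30),
                           day.replace(hour=14, minute=32, second=0), 130))
```

Every ideal time in the small run (14:24:25, :29, :33) sits exactly between two stored rows, so it exercises the "earlier one wins" rule.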

For testing purposes, I extended your dataset to the following timestamps. The column in my database is called time_stamp.
2014-10-27 14:24:24
2014-10-27 14:24:26
2014-10-27 14:24:28
2014-10-27 14:24:32
2014-10-27 14:24:34
2014-10-27 14:24:25
2014-10-27 14:24:32
2014-10-27 14:24:34
2014-10-27 14:24:36
2014-10-27 14:24:37
2014-10-27 14:24:39
2014-10-27 14:24:44
2014-10-27 14:24:47
2014-10-27 14:24:53
I've summarized the idea, but let me explain in more detail before providing the solution I was able to work out.
The requirement is to consider timestamps within +/- a given time of each ideal timestamp. Since we must look in both directions, we take the interval and split it in half. Then -1/2 of the interval to +1/2 of the interval defines a "bin" to consider.
The bin for a given time, measured from a given start time with an interval of @seconds, is then given by this MySQL expression:
((floor(((t1.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds)
NOTE: The + 1 is there so that we do not end up with a bin index of -1 (it will start at zero). All times are calculated from the start time so that intervals of >= 60 seconds work.
Within each bin, we need to know how far each timestamp is from the center of the bin. That is done by taking the number of seconds from the start, subtracting the bin from it, and taking the absolute value.
At this stage we have all times "binned up" and ordered within their bin.
To filter these results, we LEFT JOIN the table to itself and set up the join conditions to match the undesirable rows. After the LEFT JOIN, the desirable rows are the ones with a NULL match in the joined table.
I have replaced the start, end, and seconds with variables, which is a bit of a hack, but only for readability. MySQL-style comments in the LEFT JOIN's ON clause identify the conditions.
SET @seconds = 7;
SET @time_start = TIMESTAMP('2014-10-27 14:24:24');
SET @time_end = TIMESTAMP('2014-10-27 14:24:52');
#Note: subtracting DATETIME values compares their numeric YYYYMMDDHHMMSS forms,
#so wrap both operands in UNIX_TIMESTAMP() if the window crosses a minute boundary
SELECT t1.*
FROM temp t1
LEFT JOIN temp t2 ON
  #Condition 1: Only considering rows in the same "bin"
  ((floor(((t1.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds)
  = ((floor(((t2.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds)
  AND
  (
    #Condition 2 (Part A): "Filter" by removing rows which are farther from the center of the bin than others
    abs(
      (t1.time_stamp - @time_start)
      - (floor(((t1.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds
    )
    >
    abs(
      (t2.time_stamp - @time_start)
      - (floor(((t2.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds
    )
    OR
    #Condition 2 (Part B1): "Filter" by removing rows which are the same distance from the center of the bin
    (
      abs(
        (t1.time_stamp - @time_start)
        - (floor(((t1.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds
      )
      =
      abs(
        (t2.time_stamp - @time_start)
        - (floor(((t2.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds
      )
      #Condition 2 (Part B2): And are in the future from the other match
      AND
      (t1.time_stamp - @time_start)
      >
      (t2.time_stamp - @time_start)
    )
  )
WHERE t1.time_stamp - @time_start >= 0
  AND @time_end - t1.time_stamp >= 0
  #Condition 3: All rows which have a match are undesirable, so those
  #with a NULL for the primary key (in this case temp_id) are selected
  AND t2.temp_id IS NULL
There may be a more succinct way to write the query, but it did filter the results down to what was needed with one notable exception -- I purposefully put in a duplicate entry. This query will return both such entries as they do meet the criteria as stated.
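As a sanity check on the bin arithmetic, here is a rough Python rendering of the same expression applied to the test timestamps. It is a sketch under assumptions rather than a replacement for the query: bin_center is a made-up helper name, the offsets are the test timestamps expressed as seconds after 14:24:24, and min() keeps only one of two identical rows, whereas the query returns both duplicates.

```python
import math
from collections import defaultdict

def bin_center(offset, seconds):
    """Python rendering of the bin expression; offset = seconds since the start time."""
    return (math.floor((offset - seconds / 2) / seconds) + 1) * seconds

seconds = 7
# Offsets (seconds after 14:24:24) of the test timestamps that fall inside the
# 14:24:24 .. 14:24:52 window; 14:24:53 is outside it and is left out.
offsets = [0, 2, 4, 8, 10, 1, 8, 10, 12, 13, 15, 20, 23]

bins = defaultdict(list)
for off in offsets:
    bins[bin_center(off, seconds)].append(off)

# Within each bin keep the offset nearest the center; the earlier one wins a tie.
winners = {center: min(members, key=lambda o: (abs(o - center), o))
           for center, members in bins.items()}
```

With these inputs the winners are offsets 0, 8, 13 and 20, that is 14:24:24, 14:24:32, 14:24:37 and 14:24:44. The bin centered on 14 contains a genuine tie (offsets 13 and 15), which is resolved in favor of the earlier row.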

Related

Calculate total scheduled against total actual in two separate tables

I have two tables in my schema. The first contains a list of recurring appointments - default_appointments. The second table is actual_appointments - these can be generated from the defaults or individually created so not linked to any default entry.
Example:
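default_appointments
+----+-------------+---------+------------------------+----------------------+
| id | day_of_week | user_id | appointment_start_time | appointment_end_time |
+----+-------------+---------+------------------------+----------------------+
| 1  | 1           | 1       | 10:00:00               | 16:00:00             |
| 2  | 4           | 1       | 11:30:00               | 17:30:00             |
| 3  | 6           | 5       | 09:00:00               | 17:00:00             |
+----+-------------+---------+------------------------+----------------------+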
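actual_appointments
+----+------------------------+---------+---------------------+---------------------+
| id | default_appointment_id | user_id | appointment_start   | appointment_end     |
+----+------------------------+---------+---------------------+---------------------+
| 1  | 1                      | 1       | 2021-09-13 10:00:00 | 2021-09-13 16:00:00 |
| 2  | NULL                   | 1       | 2021-09-13 11:30:00 | 2021-09-13 13:30:00 |
| 3  | 6                      | 5       | 2021-09-18 09:00:00 | 2021-09-18 17:00:00 |
+----+------------------------+---------+---------------------+---------------------+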
I'm looking to calculate the total minutes that were scheduled in against the total that were actually created/generated. So ultimately I'd end up with a query result with this data:
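+---------+------------------+-----------------------+----------------------+
| user_id | appointment_date | total_planned_minutes | total_actual_minutes |
+---------+------------------+-----------------------+----------------------+
| 1       | 2021-09-13       | 360                   | 480                  |
| 1       | 2021-09-16       | 360                   | 0                    |
| 5       | 2021-09-18       | 480                   | 480                  |
+---------+------------------+-----------------------+----------------------+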
What would be the best approach here? Hopefully the above makes sense.
Edit
OK, so the default_appointments table contains all appointments that are "standard" and are automatically generated. These are the appointments that "should" happen every week. So, for example, the appointment with ID 1 should occur between 10am and 4pm every Monday, and the one with ID 2 should occur between 11:30am and 5:30pm every Thursday.
The actual_appointments table contains a list of all of the appointments which actually occurred. What happens is that a default_appointment automatically generates an instance of itself in the actual_appointments table when it is initially set up. The corresponding default_appointment_id indicates that the row links to a default and has not been changed, so the times on both remain the same. The user is free to change appointments that were generated by a default, which sets the default_appointment_id to NULL, or to add new appointments unrelated to any default.
So, if on a Monday (day_of_week = 1) I should normally have a default appointment at 10am - 4pm, the total minutes I should have planned based on the defaults are 360 minutes, regardless of what's in the actual_appointments table, I should be planned for those 360 minutes every Monday without fail. If in the system I say - well actually, I didn't have an appointment from 10am - 4pm and instead change it to 10am - 2pm, actual_appointments table will then contain the actual time for the day, and the actual minutes appointed would be 240 minutes.
What I need is to group each of these by the date and user to understand how much time the user had planned for appointments in the default_appointments table vs how much they actually appointed.
Adjusted based on new detail in the question.
Note: I used day_of_week values compatible with default MySQL behavior, where Monday = 2.
The first CTE term (args) provides the search parameters, start date and number of days. The second CTE term (drange) calculates the dates in the range to allow generation of the scheduled appointments within that range.
allrows combines the scheduled and actual appointments via UNION to prepare for aggregation. There are other ways to set this up.
Finally, we aggregate the results per user_id and date.
The test case:
Working Test Case (Updated)
WITH RECURSIVE args (startdate, days) AS (
SELECT DATE('2021-09-13'), 7
)
, drange (adate, days) AS (
SELECT startdate, days-1 FROM args UNION ALL
SELECT adate + INTERVAL '1' DAY, days-1 FROM drange WHERE days > 0
)
, allrows AS (
SELECT da.user_id
, dr.adate
, ROUND(TIME_TO_SEC(TIMEDIFF(da.appointment_end_time, da.appointment_start_time))/60, 0) AS planned
, 0 AS actual
FROM drange AS dr
JOIN default_appointments AS da
ON da.day_of_week = dayofweek(adate)
UNION ALL -- keep duplicates: two identical appointments must both be counted
SELECT user_id
, DATE(appointment_start) AS xdate
, 0 AS planned
, TIMESTAMPDIFF(MINUTE, appointment_start, appointment_end)
FROM drange AS dr
JOIN actual_appointments aa
ON DATE(appointment_start) = dr.adate
)
SELECT user_id, adate
, SUM(planned) AS planned
, SUM(actual) AS actual
FROM allrows
GROUP BY adate, user_id
;
Result:
+---------+------------+---------+--------+
| user_id | adate | planned | actual |
+---------+------------+---------+--------+
| 1 | 2021-09-13 | 360 | 480 |
| 1 | 2021-09-16 | 360 | 0 |
| 5 | 2021-09-18 | 480 | 480 |
+---------+------------+---------+--------+
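
The same aggregation can be traced by hand with a small Python sketch, which may help when checking the query against other date ranges. Everything here is illustrative: the helper names are invented, the sample rows are typed in from the question, and day_of_week is assumed to be renumbered to the Monday = 2 convention mentioned above.

```python
from collections import defaultdict
from datetime import date, datetime, time, timedelta

def mysql_dayofweek(d):
    """Mimic MySQL DAYOFWEEK(): 1 = Sunday, 2 = Monday, ..., 7 = Saturday."""
    return d.isoweekday() % 7 + 1

def minutes_between(start, end):
    return int((end - start).total_seconds() // 60)

# Sample rows; day_of_week uses the Monday = 2 convention (assumption).
defaults = [  # (day_of_week, user_id, start_time, end_time)
    (2, 1, time(10, 0), time(16, 0)),
    (5, 1, time(11, 30), time(17, 30)),
    (7, 5, time(9, 0), time(17, 0)),
]
actuals = [  # (user_id, appointment_start, appointment_end)
    (1, datetime(2021, 9, 13, 10, 0), datetime(2021, 9, 13, 16, 0)),
    (1, datetime(2021, 9, 13, 11, 30), datetime(2021, 9, 13, 13, 30)),
    (5, datetime(2021, 9, 18, 9, 0), datetime(2021, 9, 18, 17, 0)),
]

def planned_vs_actual(start_date, days):
    totals = defaultdict(lambda: [0, 0])  # (user_id, date) -> [planned, actual]
    dates = [start_date + timedelta(days=i) for i in range(days)]
    # Planned minutes: expand each default onto every matching date in the range.
    for d in dates:
        for dow, user, t0, t1 in defaults:
            if dow == mysql_dayofweek(d):
                totals[(user, d)][0] += minutes_between(
                    datetime.combine(d, t0), datetime.combine(d, t1))
    # Actual minutes: sum every actual appointment that starts within the range.
    for user, a0, a1 in actuals:
        if a0.date() in dates:
            totals[(user, a0.date())][1] += minutes_between(a0, a1)
    return dict(totals)

report = planned_vs_actual(date(2021, 9, 13), 7)
```

For the week starting 2021-09-13 this reproduces the three rows of the result table above.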

Cant combine two selects together

I have a table containing timestamps and the direction(In/Out)
CASE 1:in, 2016-07-06 08:00:00, I
CASE 1:out, 2016-07-06 17:00:00, O
CASE 2:in, 2016-07-12 08:00:00, I
CASE 2:out, 2016-07-13 17:00:00, O
CASE 3:in, 2016-07-14 08:00:00, I
CASE 3:out, 2016-07-18 17:00:00, O
I would like to create a select that shows me all rows that were In at 3 am each morning. It should also show me every day of the month on which it was In.
As you see from the example, Case 1 should not be returned because it was not In at 03:00 am.
So far I have created a select statement to check for a particular day:
Select t.* from
(SELECT * FROM movement
where scan_date < '2016-07-06 03:00:00'
order by scan_date desc limit 1) t
WHERE t.direction='I';
So far so good. I would now like to run that check for the whole month.
For simplicity I have created a table with all 3 am entries for the whole month:
Select day from month
returns
2017-07-01 03:00:00
...and so on...
For some reason I am not able to combine these two selects.
I can't use the second table in the < clause because MySQL tells me that it returns more than one row.
The problem sounds simple, but I haven't found a solution so far.
If anybody has any ideas, I would be happy to hear them.
Given your updated problem statement, the following should get you started:
SELECT
check_time
, CAST(LEFT(checktime_direction, 19) AS DATETIME) scan_time
, RIGHT(checktime_direction, 1) direction
FROM (
SELECT
CP.p check_time
, (SELECT CONCAT(DATE_FORMAT(scan_date, '%Y-%m-%d %H:%i:%s'), direction)
FROM Movement
WHERE scan_date = (SELECT MAX(scan_date)
FROM Movement
WHERE scan_date <= CP.p
)
) checktime_direction
FROM Checkpoint CP
) T
WHERE RIGHT(checktime_direction, 1) = 'I'
ORDER BY check_time
;
For your sample data it returns:
| check_time | scan_time | direction |
|------------------------|------------------------|-----------|
| July, 13 2016 03:00:00 | July, 12 2016 08:00:00 | I |
| July, 15 2016 03:00:00 | July, 14 2016 08:00:00 | I |
| July, 16 2016 03:00:00 | July, 14 2016 08:00:00 | I |
| July, 17 2016 03:00:00 | July, 14 2016 08:00:00 | I |
| July, 18 2016 03:00:00 | July, 14 2016 08:00:00 | I |
See it in action: SQL Fiddle.
NB: The helper table has been called Checkpoint rather than month, as the latter is also a MySQL keyword and function name.
How does it work?
The overall solution approach is to use a correlated (or "synchronized") subquery.
Generally speaking, the inner query pulls up the most recent direction change for the respective outer checkpoint. This (intermediate) result set in turn is filtered to retain just the "I" records.
In detail:
SELECT
CP.p check_time
[...]
FROM Checkpoint CP
is the outer query, which sets the scene. (It is "outer" in terms of the correlation. To produce the final result it is itself wrapped in another query, but that wrapper only filters and involves no correlation.)
SELECT MAX(scan_date)
FROM Movement
WHERE scan_date <= CP.p
is linked to it via CP.p.
As you asked for both scan_date and direction to be pulled up for your final result, another level is introduced in between the two:
SELECT CONCAT(DATE_FORMAT(scan_date, '%Y-%m-%d %H:%i:%s'), direction)
FROM Movement
WHERE scan_date = ([...]
)
This query gets two columns - but its output needs to fit into a single column in the outer query. Therefore, the two columns are combined into one - in a way ensuring correct disassembly later on.
From here, all that's left to do is to get rid of the "O" records (and put the remaining ones into the correct order). The scan_time and direction values that were combined into one column earlier still need to be separated, and the former cast back to DATETIME:
SELECT
check_time
, CAST(LEFT(checktime_direction, 19) AS DATETIME) scan_time
, RIGHT(checktime_direction, 1) direction
FROM (
[...]
) T
WHERE RIGHT(checktime_direction, 1) = 'I'
ORDER BY check_time
;
You might want to use the Fiddle, break apart the query, play with the pieces, and watch the show...
Mind the pair of round brackets / parentheses surrounding each sub-select. These are required by the SQL syntax.
Please comment if and as this requires adjustment / further detail.
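If it helps to see the logic outside SQL, the same "latest scan at or before each checkpoint" lookup can be sketched in Python. This is only an illustration: in_at is an invented name, the movements are the sample rows from the question, and one 03:00 checkpoint per day of July 2016 stands in for the Checkpoint table.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

movements = sorted([  # (scan_date, direction), sample data from the question
    (datetime(2016, 7, 6, 8), "I"), (datetime(2016, 7, 6, 17), "O"),
    (datetime(2016, 7, 12, 8), "I"), (datetime(2016, 7, 13, 17), "O"),
    (datetime(2016, 7, 14, 8), "I"), (datetime(2016, 7, 18, 17), "O"),
])
scan_dates = [m[0] for m in movements]

def in_at(checkpoints):
    """Return (check_time, scan_time) pairs whose latest scan at or before check_time is an 'I'."""
    hits = []
    for cp in checkpoints:
        i = bisect_right(scan_dates, cp)  # number of scans with scan_date <= cp
        if i and movements[i - 1][1] == "I":
            hits.append((cp, movements[i - 1][0]))
    return hits

# One 03:00 checkpoint per day of July 2016 (stand-in for the Checkpoint table).
checkpoints = [datetime(2016, 7, 1, 3) + timedelta(days=d) for d in range(31)]
hits = in_at(checkpoints)
```

The bisect call plays the role of the inner MAX(scan_date) subquery, and the direction test plays the role of the outer WHERE filter.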
Select movement.*
from month
inner join movement ON scan_date < month.day AND direction='I'
GROUP BY month.day
ORDER BY movement.scan_date DESC
I hope this will work for you.
select *, extract(day from scan_date) as days_of_month
from movement where extract(hour from scan_date) = 3
and direction='I'
order by scan_date desc;
If you need anything more, please explain it along with the table structure.

MySQL - Select row with column + X > column

We have a database for patients that shows the details of their various visits to our office, such as their weight during that visit. I want to generate a report that returns the visit (a row from the table) based on the difference between the date of that visit and the patient's first visit being the largest value possible but not exceeding X number of days.
That's confusing, so let me try an example. Let's say I have the following table called patient_visits:
visit_id | created | patient_id | weight
---------+---------------------+------------+-------
1 | 2006-08-08 09:00:05 | 10 | 180
2 | 2006-08-15 09:01:03 | 10 | 178
3 | 2006-08-22 09:05:43 | 10 | 177
4 | 2006-08-29 08:54:38 | 10 | 176
5 | 2006-09-05 08:57:41 | 10 | 174
6 | 2006-09-12 09:02:15 | 10 | 173
In my query, if I were wanting to run this report for "30 days", I would want to return the row where visit_id = 5, because it's 28 days into the future, and the next row is 35 days into the future, which is too much.
I've tried a variety of things, such as joining the table to itself, or creating a subquery in the WHERE clause to try to return the max value of created WHERE it is equal to or less than created + 30 days, but I seem to be at a loss at this point. As a last resort, I can just pull all of the data into a PHP array and build some logic there, but I'd really rather not.
The bigger picture is this: The database has about 5,000 patients, each with any number of office visits. I want to build the report to tell me what the average weight loss has been for all patients combined when going from their first visit to X days out (that is, X days from each individual patient's first visit, not an arbitrary X-day period). I'm hoping that if I can get the above resolved, I'll be able to work the rest out.
You can get the date of the first visit and of the last visit within the window using a query like this (untested; treat it as an outline and adjust the interval as needed):
select
  first_visits.patient_id,
  first_visits.first_date,
  max(next_visit.created) as next_date
from (
  -- earliest visit per patient
  select patient_id, min(created) as first_date
  from patient_visits
  group by patient_id
) as first_visits
inner join patient_visits next_visit
  on (next_visit.patient_id = first_visits.patient_id
  -- only visits inside the 30-day window after the first visit
  and next_visit.created between first_visits.first_date
                             and first_visits.first_date + interval 30 day)
group by first_visits.patient_id, first_visits.first_date
So basically you need to find start date using grouping by patient_id and then join patient_visits and find max date that is within the 30 days window.
Then you can join the result to patient_visits to get start and end weights and calculate the loss.
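To make the target numbers concrete, here is the same two-step idea (first visit, then last visit within the window) as a short Python sketch over the sample rows. The function name loss_within and the tuple layout are assumptions made for the example; it is meant for checking query output, not as the reporting code itself.

```python
from datetime import datetime, timedelta

visits = [  # (visit_id, created, patient_id, weight), from the question's sample table
    (1, datetime(2006, 8, 8, 9, 0, 5), 10, 180),
    (2, datetime(2006, 8, 15, 9, 1, 3), 10, 178),
    (3, datetime(2006, 8, 22, 9, 5, 43), 10, 177),
    (4, datetime(2006, 8, 29, 8, 54, 38), 10, 176),
    (5, datetime(2006, 9, 5, 8, 57, 41), 10, 174),
    (6, datetime(2006, 9, 12, 9, 2, 15), 10, 173),
]

def loss_within(visits, days):
    """Per patient: (visit_id of the last visit within `days` of the first, weight lost by then)."""
    by_patient = {}
    for v in sorted(visits, key=lambda v: v[1]):
        by_patient.setdefault(v[2], []).append(v)
    result = {}
    for pid, vs in by_patient.items():
        first = vs[0]
        cutoff = first[1] + timedelta(days=days)
        # The first visit always qualifies, so this list is never empty.
        last = [v for v in vs if v[1] <= cutoff][-1]
        result[pid] = (last[0], first[3] - last[3])
    return result

report = loss_within(visits, 30)
```

For the sample patient and a 30-day window this selects visit 5, a loss of 6 from the first visit's weight. Averaging the second tuple element over all patients gives the figure the report is after.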

CASE w/ DATEADD range to SUM column multiple times for future earnings estimate

EDIT: The original post follows, but it's a bit long and wordy. This edit presents a simplified question.
I'm trying to SUM 1 column multiple times; from what I've found, my options are either CASE or (SELECT). I am trying to SUM based on a date range and I can't figure out if CASE allows that.
table.number | table.date
2 2014/12/18
2 2014/12/19
3 2015/01/11
3 2015/01/12
7 2015/02/04
7 2015/02/05
As separate queries, it would look like this:
SELECT SUM(number) as alpha FROM table WHERE date >= 2014/12/01 AND date<= DATE_ADD (2014/12/01, INTERVAL 4 WEEKS)
SELECT SUM(number) as beta FROM table WHERE date >= 2014/12/29 AND date<= DATE_ADD (2014/12/01, INTERVAL 4 WEEKS)
SELECT SUM(number) as gamma FROM table WHERE date >= 2014/01/19 AND date<= DATE_ADD (2014/12/01, INTERVAL 4 WEEKS)
Looking for result set
alpha | beta | gamma
2 6 14
ORIGINAL:
I'm trying to return SUM of payments that will be due within my budgeting time frame (4 weeks) for the current budgeting period and 2 future periods. Some students pay every 4 weeks, others every 12. Here are the relevant fields in my tables:
client.name | client.ppid | client.last_payment
john | 1 | 12/01/14
jack | 2 | 11/26/14
jane | 3 | 10/27/14
pay_profile.id | pay_profile.price | pay_profile.interval (in weeks)
1 140 4
2 399 4
3 1 12
pay_history.name | pay_history.date | pay_history.amount
john | 12/02/14 | 140
jerry | more historical | data
budget.period_start |
12/01/14
I think the most efficient way of doing this is:
1.) SUM all students who pay every 4 weeks as base_pay
2.) SUM all students who pay every 12 weeks and whose DATEADD(client.last_payment, INTERVAL pay_profile.interval WEEKS) is >= budget.period_start and <= DATEADD(budget.period_start, INTERVAL 28 DAYS) as accounts_receivable
3.) As the above step will miss people who've already paid in this budgeting period (as this updates their last_payment date, putting them out of the range specified in #2), I'll also need to SUM pay_history over the date range above as paid_in_full
4.) Repeat step 2 above, adjusting the range and column name for future periods (i.e. accounts_receivable_2)
5.) Use PHP to SUM base_pay, accounts_receivable, and pay_history, repeating the process for future periods.
I'm guessing the easiest way would be to use CASE, which I've not done before. Here was my best guess, which fails due to a syntax error. I'm assuming I can use DATE_ADD in the WHEN statement.
SELECT
CASE
DATE_ADD(client.last_payment, INTERVAL pay_profile.interval WEEK) >= budget.period_start
AND
DATE_ADD(client.last_payment, INTERVAL pay_profile.interval WEEK) <=
DATE_ADD(budget.period_start,INTERVAL 28 DAY) THEN SUM(pay_profile.price) as base_pay
FROM client
LEFT OUTER JOIN pay_profile ON client.ppid = pay_profile.ppid
LEFT OUTER JOIN budget ON client.active = 1
WHERE
client.active = 1
Thanks.

Need an array of date blocks from MySql Database

Ok, I have a database table of rows with a StartDate and an EndDate. What I need to do is return blocks of consumed time from that.
So, for example, if I have 3 rows as follows:
RowID StartDate EndDate
1 2011-01-01 2011-02-01
2 2011-01-30 2011-02-20
3 2011-03-01 2011-04-01
then the blocks of used time would be as follows:
2011-01-01 to 2011-02-20 and 2011-03-01 to 2011-04-01
Is there an easy method of extracting that from a MySql database? Any suggestions welcome!
Look at the diagram below which represents some overlapping time periods
X----| |--------| |------X
|-------X X------------X
|----| X----|
The beginning or end of any contiguous time period, marked with an X, doesn't fall inside any other time period. If we identify these times, we can make some progress.
This query identifies the boundaries.
SELECT boundary, type FROM
(
-- find all the lower bounds
SELECT d1.StartDate AS boundary, 'lower' as type
FROM dates d1
LEFT JOIN dates d2 ON (
d1.StartDate > d2.StartDate
AND
d1.StartDate < d2.EndDate
)
WHERE d2.RowId IS NULL
GROUP BY d1.StartDate
UNION
-- find all the upper bounds
SELECT d1.EndDate AS boundary, 'upper' as type
FROM dates d1
LEFT JOIN dates d2 ON (
d1.EndDate > d2.StartDate
AND
d1.EndDate < d2.EndDate
)
WHERE d2.RowId IS NULL
GROUP BY d1.EndDate
) as boundaries
ORDER BY boundary ASC
The result of this query on your data is
boundary | type
------------------
2011-01-01 | lower
2011-02-20 | upper
2011-03-01 | lower
2011-04-01 | upper
The date ranges you are after lie between consecutive lower and upper bounds shown above. With a bit of post-processing, these can easily be found.
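That post-processing step might look roughly like the following in application code. It is a sketch that assumes the rows arrive as (boundary, type) pairs already ordered by boundary; the name to_blocks is made up for the example.

```python
from datetime import date

# (boundary, type) rows as returned by the boundary query, ordered by boundary.
boundaries = [
    (date(2011, 1, 1), "lower"), (date(2011, 2, 20), "upper"),
    (date(2011, 3, 1), "lower"), (date(2011, 4, 1), "upper"),
]

def to_blocks(rows):
    """Pair each lower bound with the upper bound that follows it."""
    blocks, start = [], None
    for boundary, kind in rows:
        if kind == "lower":
            start = boundary
        elif start is not None:
            blocks.append((start, boundary))
            start = None
    return blocks

blocks = to_blocks(boundaries)
```

For the sample data this yields the two blocks from the question, 2011-01-01 to 2011-02-20 and 2011-03-01 to 2011-04-01.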
Have you tried MySQL's GROUP_CONCAT?
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat
It would return a comma-separated string, but you would still have to initialize that as an array in your application.