Retrieve Timediff between current row and next row in subquery - mysql

Why I am getting more than 24 hours? I am trying to get the timediff between each row in the sub-query if the timediff is greater than 10 min. then sum the result per day.
My goal is to figure out for each user the total of every brake thats longer than 10 min. and list that among the amount of calls on that particular day?
SELECT DATE_FORMAT(last_call, '%d, %W') AS DAY
, COUNT(call_id) AS calls
, ( SELECT SEC_TO_TIME(SUM((
SELECT timestampdiff(SECOND, c.last_call, c2.last_call)
FROM calls c2
WHERE c2.calling_agent = c.calling_agent
AND c2.last_call > c.last_call
AND timestampdiff(SECOND, c.last_call, c2.last_call) > 600
ORDER BY c2.last_call LIMIT 1
)))
FROM calls AS c
WHERE EXTRACT(DAY FROM c.last_call) = EXTRACT(DAY FROM calls.last_call)
) AS `brakes`
FROM calls
WHERE 9 IN (calls.reg_calling_agent)
AND last_call > DATE_SUB(now() , INTERVAL 12 MONTH)
GROUP BY EXTRACT(DAY FROM last_call)
ORDER BY EXTRACT(DAY FROM last_call) DESC

You're getting more than 24 hours because
1) the row retrieved from c2 could be from a different day. There's no guarantee that the next call (10 minutes after the previous call) isn't the first call made/received by an agent after a week long vacation.
2) that same "gap" of over 10 minutes is going to reported for the last call the agent made/received. And you're also going to get a "gap" between the call the agent made immediately before the one before the gap, and the one before that. That is, there's no provision to made exclude the calls that DID have a subsequent call within 10 minutes. (The subquery is just looking for any subsequent call that is 10 minutes after a call.)
3) you are getting getting an aggregate total (SUM) of all of those gaps in a given day, irregardless of the agent; all the gaps for all agents are being totaled.
4) the outer query is getting a years worth of calls, (for all agents?) but is grouping by day of month (1 through 31). So, you're getting back one row for the 5th of the month, but there will be multiple agents and multiple "days" (Jan 5, Feb 5, March 5, etc.), multiple values of 'brakes', and only one of those values is going to be included in the result,. It's indeterminate which of those row values will be returned. (Other RDBMS's would balk with this construct, a non-aggregate expression in the SELECT list which not included in the GROUP BY, but by default, MySQL allows it.)
--
FOLLOWUP
Q: could you please post the corrected query?
A: I don't have the table schema, or sample data, or a specification, so it's impossible for me to provide a "corrected" query.
For example, it's not at all clear why there's a predicate on reg_calling_agent in the outermost query, but the subqueries don't have any reference to that column, or any other column from the table in the outer query, except for the last_call column. The query to find a subsequent call is relying on the calling_agent column, not reg_calling_agent, but that's being performed for ALL calls in a given day of month.
I can take a shot a query that may be closer to what you are looking for, but there is absolutely no guarantee that this is "correct" in terms of matching the schema, the datatypes, the actual data, or the expected output. A query that returns unexpected results is not an adequate specification.
SELECT a.calling_agent
, DATE_FORMAT(a.last_call,'%d, %W') AS `day`
, COUNT(a.call_id) AS `calls`
, SEC_TO_TIME(
SUM(
SELECT IF(TIMESTAMPDIFF(SECOND, a.last_call, c.last_call) > 600
,TIMESTAMPDIFF(SECOND, a.last_call, c.last_call)
,NULL
) AS `gap`
FROM calls c
WHERE c.calling_agent = a.calling_agent
AND c.last_call > a.last_call
AND c.last_call < DATE(a.last_call)+INTERVAL 1 DAY
ORDER BY c.last_call
LIMIT 1
)
) AS `breaks`
FROM calls a
WHERE a.reg_calling_agent = 9
AND a.last_call > DATE(NOW()) - INTERVAL 12 MONTH
GROUP BY a.calling_agent, DATE_FORMAT(a.last_call,'%d, %W')
ORDER BY a.calling_agent, DATE_FORMAT(a.last_call,'%d, %W') DESC
UNPACKING THE QUERY
I thought I might provide some insight as to the design of this query, what it's intended to do. I retained the FROM and WHERE clauses from the original outer query. I just gave an alias to the calls table, and re-wrote the predicates to a form that I think is simpler, and that I'm more used to using.
For the GROUP BY, I added calling_agent, since it doesn't seem to make sense that we would want to lump all of the agents together. (It's really up to you to decide whether that matches the spec or not.) I did this because calling_agent is NOT referenced in the WHERE clause. (There's an equality predicate on reg_calling_agent, but that's a different column.)
I replaced the EXTRACT(DAY FROM ) expression, since that's only returning an integer value between 1 and 31. And it just doesn't seem to make sense to lump together all the "4th day" of all months. I chose to use the expression that's in the SELECT list; because that's the normative pattern... returning the expressions used in the GROUP BY clause in the SELECT list, so the client will be able to distinguish which row in the result belongs to which group identifier.
I also qualified all column references with a table alias, as an aid to the future reader. We're familiar following that pattern in complex queries. It's natural that we extend that same pattern to simpler queries, even when it's not required.
The big change is to the derived breaks column. (I renamed that from 'brakes', because it seems like what this query is doing is finding out when calling_agents weren't making/receiving calls, when workers were "taking a break". (That's entirely a guess on my part.)
There's a SEC_TO_TIME function, all that's doing is reformatting the result.
There's a SUM() aggregate. This is just going to total up the values, for each row in a that's in a "group".
The real "meat" is the correlated subquery. What that does... for each row returned by the outer query (i.e. every row from calls that satisfies the WHERE clause on the outer query)... we are going to run another SELECT. And it's going to look for the very "next" call made/received by the same calling_agent. To do that, the calling_agent on the "next" call needs to match the value from row from the outer query...
WHERE c.calling_agent = a.calling_agent
Also, the datetime/timestamp of the subsequent "call" needs to be anytime after the datetime/timestamp of the row from the outer query...
AND c.last_call > a.last_call
And, we only want to look for calls that are on the same calendar date (year, month, day) as the previous call. (This prevents us from considering a call made four days later as a "subsequent" call.)
AND c.last_call < DATE(a.last_call)+INTERVAL 1 DAY
And, out of all those potential subsequent calls, we only want the first one, so we order them by datetime/timestamp, and then take just the first one.
ORDER BY c.last_call
LIMIT 1
If we don't get a row, the subquery will return a NULL. If we do get a row, the next thing we want to do is check if the datetime/timestamp on this call is more than 10 minutes after the previous call. We use the same TIMESTAMPDIFF expression from the original query, to derive the number of seconds between the calls, and we compare that to 10 minutes. If the gap is greater than 10 minutes, we consider this as a "break", and we return the difference as number of seconds. Otherwise, we just return a NULL, as if we hadn't found a "next" row.
IF(TIMESTAMPDIFF(SECOND, a.last_call, c.last_call) > 600
,TIMESTAMPDIFF(SECOND, a.last_call, c.last_call)
,NULL
) AS `gap`
That's MySQL-specific shorthand for the ANSI-standard form:
CASE
WHEN TIMESTAMPDIFF(SECOND, a.last_call, c.last_call) > 600
THEN TIMESTAMPDIFF(SECOND, a.last_call, c.last_call)
ELSE NULL
END AS `gap`
(NOTE: the ELSE NULL could be omitted, that would be functionally equivalent because NULL is the default when ELSE is omitted. I include it here for completeness, and for comparison to the MySQL IF() function.)
Finally, we include all of the expressions in the GROUP BY clause in the SELECT list. (This isn't required, but it's the usual pattern. If those expressions are omitted, there should be a pretty obvious reason why they are omitted. For example, if the outer query had an equality predicate on calling_agent, e.g.
AND a.calling_agent = 86
Then we'd know that any row returned by the query would have a value of 86 returned for calling_agent, so we could omit the expression from the SELECT list. But if we omit an equality predicate, or change it so that more than one calling_agent could be returned, something like:
AND (a.calling_agent = 86 OR a.calling_agent = 99)
then without calling_agent in the SELECT list, we won't be able to tell which rows are for which calling_agent. If we're going to the bother of doing a GROUP BY on the expression, we usually want to include the expression in the SELECT list; that's the normal pattern.

Related

SQL: Reuse function result in query without using sub-query

In a MySQL DB table that stores sale orders, I have a LastReviewed column that holds the last date and time when the sale order was modified (type timestamp, default value CURRENT_TIMESTAMP). I'd like to plot the number of sales that were modified each day, for the last 90 days, for a particular user.
I'm trying to craft a SELECT that returns the number of days since LastReviewed date, and how many records fall within that range. Below is my query, which works just fine:
SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND DATEDIFF(CURDATE(),LastReviewed)<=90
GROUP BY days
ORDER BY days ASC
Notice that I am computing the DATEDIFF() as well as CURDATE() multiple times for each record. This seems really ineffective, so I'd like to know how I can reuse the results of the previous computation. The first thing I tried was:
SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND days<=90
GROUP BY days
ORDER BY days ASC
Error: Unknown column 'days' in 'where clause'. So I started to look around the net. Based on another discussion (Can I reuse a calculated field in a SELECT query?), I next tried the following:
SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND (SELECT days)<=90
GROUP BY days
ORDER BY days ASC
Error: Unknown column 'days' in 'field list'. I'm also tried the following:
SELECT #days := DATEDIFF(CURDATE(), LastReviewed) AS days,
COUNT(*) AS number FROM sales
WHERE UserID=123 AND #days <=90
GROUP BY days
ORDER BY days ASC
The query returns zero result, so #days<=90 seems to return false even though if I put it in the SELECT clause and remove the WHERE clause, I can see some results with #days values below 90.
I've gotten things to work by using a sub-query:
SELECT * FROM (
SELECT DATEDIFF(CURDATE(),LastReviewed) AS sales ,
COUNT(*) AS number FROM sales
WHERE UserID=123
GROUP BY days
) AS t
WHERE days<=90
ORDER BY days ASC
However I odn't know whether it's the most efficient way. Not to mention that even this solution computes CURDATE() once per record even though its value will be the same from the start to the end of the query. Isn't that wasteful? Am I overthinking this? Help would be welcome.
Note: Mods, should this be on CodeReview? I posted here because the code I'm trying to use doesn't actually work
There are actually two problems with your question.
First, you're overlooking the fact that WHERE precedes SELECT. When the server evaluates WHERE <expression>, it then already knows the value of the calculations done to evaluate <expression> and can use those for SELECT.
Worse than that, though, you should almost never write a query that uses a column as an argument to a function, since that usually requires the server to evaluate the expression for each row.
Instead, you should use this:
WHERE LastReviewed < DATE_SUB(CURDATE(), INTERVAL 90 DAY)
The optimizer will see this and get all excited, because DATE_SUB(CURDATE(), INTERVAL 90 DAY) can be resolved to a constant, which can be used on one side of a < comparison, which means that if an index exists with LastReviewed as the leftmost relevant column, then the server can immediately eliminate all of the rows with LastReviewed >= that constant value, using the index.
Then DATEDIFF(CURDATE(), LastReviewed) AS days (still needed for SELECT) will only be evaluated against the rows we already know we want.
Add a single index on (UserID, LastReviewed) and the server will be able to pinpoint exactly the relevant rows extremely quickly.
Builtin functions are much less costly than, say, fetching rows.
You could get a lot more performance improvement with the following 'composite' index:
INDEX(UserID, LastReviewed)
and change to
WHERE UserID=123
AND LastReviewed >= CURRENT_DATE() - INTERVAL 90 DAY
Your formulation is 'hiding' LastRevieded in a function call, making it unusable in an index.
If you are still not satisfied with that improvement, then consider a nightly query that computes yesterday's statistics and puts them in a "Summary table". From there, the SELECT you mentioned can run even faster.

How do I subtract two declared variables in MYSQL

The question I am working on is as follows:
What is the difference in the amount received for each month of 2004 compared to 2003?
This is what I have so far,
SELECT #2003 = (SELECT sum(amount) FROM Payments, Orders
WHERE YEAR(orderDate) = 2003
AND Payments.customerNumber = Orders.customerNumber
GROUP BY MONTH(orderDate));
SELECT #2004 = (SELECT sum(amount) FROM Payments, Orders
WHERE YEAR(orderDate) = 2004
AND Payments.customerNumber = Orders.customerNumber
GROUP BY MONTH(orderDate));
SELECT MONTH(orderDate), (#2004 - #2003) AS Diff
FROM Payments, Orders
WHERE Orders.customerNumber = Payments.customerNumber
Group By MONTH(orderDate);
In the output I am getting the months but for Diff I am getting NULL please help. Thanks
I cannot test this because I don't have your tables, but try something like this:
SELECT a.orderMonth, (a.orderTotal - b.orderTotal ) AS Diff
FROM
(SELECT MONTH(orderDate) as orderMonth,sum(amount) as orderTotal
FROM Payments, Orders
WHERE YEAR(orderDate) = 2004
AND Payments.customerNumber = Orders.customerNumber
GROUP BY MONTH(orderDate)) as a,
(SELECT MONTH(orderDate) as orderMonth,sum(amount) as orderTotal FROM Payments, Orders
WHERE YEAR(orderDate) = 2003
AND Payments.customerNumber = Orders.customerNumber
GROUP BY MONTH(orderDate)) as b
WHERE a.orderMonth=b.orderMonth
Q: How do I subtract two declared variables in MySQL.
A: You'd first have to DECLARE them. In the context of a MySQL stored program. But those variable names wouldn't begin with an at sign character. Variable names that start with an at sign # character are user-defined variables. And there is no DECLARE statement for them, we can't declare them to be a particular type.
To subtract them within a SQL statement
SELECT #foo - #bar AS diff
Note that MySQL user-defined variables are scalar values.
Assignment of a value to a user-defined variable in a SELECT statement is done with the Pascal style assignment operator :=. In an expression in a SELECT statement, the equals sign is an equality comparison operator.
As a simple example of how to assign a value in a SQL SELECT statement
SELECT #foo := '123.45' ;
In the OP queries, there's no assignment being done. The equals sign is a comparison, of the scalar value to the return from a subquery. Are those first statements actually running without throwing an error?
User-defined variables are probably not necessary to solve this problem.
You want to return how many rows? Sounds like you want one for each month. We'll assume that by "year" we're referring to a calendar year, as in January through December. (We might want to check that assumption. Just so we don't find out way too late, that what was meant was the "fiscal year", running from July through June, or something.)
How can we get a list of months? Looks like you've got a start. We can use a GROUP BY or a DISTINCT.
The question was... "What is the difference in the amount received ... "
So, we want amount received. Would that be the amount of payments we received? Or the amount of orders that we received? (Are we taking orders and receiving payments? Or are we placing orders and making payments?)
When I think of "amount received", I'm thinking in terms of income.
Given the only two tables that we see, I'm thinking we're filling orders and receiving payments. (I probably want to check that, so when I'm done, I'm not told... "oh, we meant the number of orders we received" and/or "the payments table is the payments we made, the 'amount we received' is in some other table"
We're going to assume that there's a column that identifies the "date" that a payment was received, and that the datatype of that column is DATE (or DATETIME or TIMESTAMP), some type that we can reliably determine what "month" a payment was received in.
To get a list of months that we received payments in, in 2003...
SELECT MONTH(p.payment_received_date)
FROM payment_received p
WHERE p.payment_received_date >= '2003-01-01'
AND p.payment_received_date < '2004-01-01'
GROUP BY MONTH(p.payment_received_date)
ORDER BY MONTH(p.payment_received_date)
That should get us twelve rows. Unless we didn't receive any payments in a given month. Then we might only get 11 rows. Or 10. Or, if we didn't receive any payments in all of 2003, we won't get any rows back.
For performance, we want to have our predicates (conditions in the WHERE clause0 reference bare columns. With an appropriate index available, MySQL will make effective use of an index range scan operation. If we wrap the columns in a function, e.g.
WHERE YEAR(p.payment_received_date) = 2003
With that, we will be forcing MySQL to evaluate that function on every flipping row in the table, and then compare the return from the function to the literal. We prefer not do do that, and reference bare columns in predicates (conditions in the WHERE clause).
We could repeat the same query to get the payments received in 2004. All we need to do is change the date literals.
Or, we could get all the rows in 2003 and 2004 all together, and collapse that into a list of distinct months.
We can use conditional aggregation. Since we're using calendar years, I'll use the YEAR() shortcut (rather than a range check). Here, we're not as concerned with using a bare column inside the expression.
SELECT MONTH(p.payment_received_date) AS `mm`
, MAX(MONTHNAME(p.payment_received_date)) AS `month`
, SUM(IF(YEAR(p.payment_received_date)=2004,p.payment_amount,0)) AS `2004_month_total`
, SUM(IF(YEAR(p.payment_received_date)=2003,p.payment_amount,0)) AS `2003_month_total`
, SUM(IF(YEAR(p.payment_received_date)=2004,p.payment_amount,0))
- SUM(IF(YEAR(p.payment_received_date)=2003,p.payment_amount,0)) AS `2004_2003_diff`
FROM payment_received p
WHERE p.payment_received_date >= '2003-01-01'
AND p.payment_received_date < '2005-01-01'
GROUP
BY MONTH(p.payment_received_date)
ORDER
BY MONTH(p.payment_received_date)
If this is a homework problem, I strongly recommend you work on this problem yourself. There are other query patterns that will return an equivalent result.
I think this is the problem:
In #2003 and #2004, you select only the sum. And even if you group by the month you still select one column i.e. each row does not say what month it is select for. So when you try to subtract SQL asks which row in #2003 should be subtracted from #2004.
So I think the solution is to select the month with the sum and do the subtract later based on the month.

SQL query to select values grouped by hour(col) and weekday(row) based on the timestamp

I have searched SO for this question and found slightly similar posts but was unable to adapt to my needs.
I have a database with server requests since forever, each one with a timestamp and i'm trying to come up with a query that allows me to create a heatmatrix chart (CCC HeatGrid).
The sql query result must represent the server load grouped by each hour of each weekday.
Like this: Example table
I just need the SQL query, i know how to create the chart.
Thank you,
Those looks like "counts" of rows.
One of the issues is "sparse" data, we can address that later.
To get the day of the week ('Sunday','Monday',etc.) returned, you can use the DATE_FORMAT function. To get those ordered, we need to include an integer value 0 through 6, or 1 through 7. We can use an ORDER BY clause on that expression to get the rows returned in the order we want.
To get the "hour" across the top, we can use expressions in the SELECT list that conditionally increments the count.
Assuming your timestamp column is named ts, and assuming you want to pull all rows from the year 2014, we start with something like this:
SELECT DAYOFWEEK(t.ts)
, DATE_FORMAT(t.ts,'%W')
FROM mytable t
WHERE t.ts >= '2014-01-01'
AND t.ts < '2015-01-01'
GROUP BY DAYOFWEEK(t.ts)
ORDER BY DAYOFWEEK(t.ts)
(I need to check the MySQL documentation, WEEKDAY and DAYOFWEEK are real similar, but we want the one that returns lowest value for Sunday, and highest value for Saturday... i think we want DAYOFWEEK, easy enough to fix later)
The "trick" now is the columns across the top.
We can extract the "hour" from timestamp using the DATE_FORMAT() function, the HOUR() function, or an EXTRACT() function... take your pick.
The expressions we want are going to return a 1 if the timestamp is in the specified hour, and a zero otherwise. Then, we can use a SUM() aggregate to count up the 1. A boolean expression returns a value of 1 for TRUE and 0 for FALSE.
, SUM( HOUR(t.ts)=0 ) AS `h0`
, SUM( HOUR(t.ts)=1 ) AS `h1`
, SUM( HOUR(t.ts)=2 ) AS `h2`
, '...'
, SUM( HOUR(t.ts)=22 ) AS `h22`
, SUM( HOUR(t.ts)=23 ) AS `h23`
A boolean expression can also evaluate to NULL, but since we have a predicate (i.e. condition in the WHERE clause) that ensures us that ts can't be NULL, that won't be an issue.
The other issue we can encounter (as I mentioned earlier) is "sparse" data. To illustrate that, consider what happens (with our query) if there are no rows that have a ts value for a Monday. What happens is that we don't get a row in the resultset for Monday. If it does happen that a row is "missing" for Monday (or any day of the week), we do know that all of the hourly counts across the "missing" Monday row would all be zero.

MySQL DATE_ADD running too slow with dynamic interval

I have the following query that's running pretty slow when executing it on thousands of records.
SELECT
name,
id
FROM
meetings
WHERE
meeting_date < '2014-09-20 11:00:00' AND (
meeting_date >= '2014-09-20 09:00:00' OR
DATE_ADD(meeting_date, INTERVAL meeting_length SECOND) > '2014-09-20 09:00:00'
)
The query checks if meeting_date overlaps in anyway between 2014-09-20 09:00:00 and 2014-09-20 11:00:00. The above query covers all the possible overlapping cases. However, DATE_ADD adds a lot of overhead.
Anyway to optimize DATE_ADD? Removing DATE_ADD greatly boosts the performance but it won't cover all overlapping cases.
I recommend you eliminate the OR.
MySQL won't (can't) perform a range scan operation on an index on column meeting_date when that column is wrapped in a function.
When the comparison is against the bare column, MySQL can do a range scan. But with the comparison to an expression, MySQL has to evaluate that expression for every row in the table, and then comapare.
For a large table, we'd get optimal performance with an index with leading column of meeting_date.
I think the "trick" to getting better performance is to rewrite the query to introduce some additional domain knowledge. Specifically, what are the MINIMUM and MAXIMUM values for meeting_length?
I think it's pretty safe to assume it won't be negative. And we probably don't expect it to be zero. But even if the minimum length is greater than zero, we can use zero as our "known" minimum. (It's going to turn out to be more convenient than some other non-zero value.)
What we really need to know is the MAXIMUM value for meeting_length. If that's a known constant value, that would be great, because we're going to include that value in the query. let's assume the maximum value of meeting_length is the number of seconds in 7 days.
As a demonstration of what I'm thinking:
SELECT m.name
, m.id
FROM meetings m
WHERE m.meeting_date < '2014-09-20 11:00:00'
AND m.meeting_date > '2014-09-20 09:00:00' + INTERVAL -7 DAY
HAVING m.meeting_date + INTERVAL meeting_length SECOND
> '2014-09-20 09:00:00'
Let's unwrap that a bit.
The first predicate is the same as in your original query... the "start" time of the meeting is before the "end" of the specified period.
The third predicate is the same as in your query too... the "end" of the meeting is after the beginning of the specified period. (My personal preference is to use the + INTERVAL form to add a duration to datetime.)
So, just like the original query we're looking for overlap.
I'm suggesting that we include another sargable predicate. The addition of this predicate doesn't really change the check for the overlap, given that we have a known minimum of 0 for meeting_length. What it does do is add a fixed lower bound that we can check against.
To explain that a little bit... if a meeting row that satisfies the condition "meeting end is after the period start", then we also know, for that row, that "meeting start is after (period start MINUS meeting length)". And we also know that "meeting start is after (period start MINUS the MAXIMUM possible value of meeting length.
And for most rows, that's going to be a bigger range... but the "trick" is the the predicate that checks that can compare a "bare" column against a constant.
And that means MySQL will be able to use an index range scan operation to satisfy that. The query is of the form:
WHERE meeting_date > const
AND meeting_date < const
And that's perfect for an index range scan. That should benefit performance... assuming there's a suitable index and that significantly limits the number of rows that need to be checked.
But by itself, that returns more rows than we need, we're going to get some meetings that start and end before the start of the period.
So we still need the additional check, to further filter down the rows. But that won't have to be evaluated for every row, only the rows that are pass through the first two predicates.
AND meeting_date + length > const
We just need to MySQL to recognize that it length won't ever be negative; to recognize that this is actually a "stricter" range, not a broader range. It might work with the AND, but we can force MySQL to evaluate that condition later, by including it in the HAVING clause.
HAVING meeting_date + length > const
But, all of that is really just a guess.
We'd really need to take a look at the EXPLAIN output.
If that index with the leading column of meeting_date also includes the id and name columns, then MySQL could satisfy the query entirely from the index, without any need to reference pages in the underlying table. (If that happens, we'll see "Using index" in the EXPLAIN output.)
Earlier, I said it would be convenient if we had a known constant for maximum meeting_length.
We could also use a query to determine that from the data:
SELECT MAX(meeting_length) FROM meetings
(And index with meeting_length as the leading column will avoid having to do an expensive full scan of the table)
We use that value to derive the "constant" value in the predicate.
We could include that query (as an inline view or a subquery), but that might impact performance. (We'd need to test how "smart" MySQL optimizer is...
We could try it as a subquery:
SELECT m.name
, m.id
FROM meetings m
WHERE m.meeting_date < '2014-09-20 11:00:00'
AND m.meeting_date > '2014-09-20 09:00:00'
- INTERVAL (SELECT MAX(l.meeting_length) FROM meetings l) DAY
HAVING m.meeting_date + INTERVAL meeting_length SECOND
> '2014-09-20 09:00:00'
Or try it as an inline view:
SELECT m.name
, m.id
FROM ( SELECT MAX(l.meeting_length) AS max_seconds
FROM meetings l
) d
CROSS
JOIN meetings m
WHERE m.meeting_date < '2014-09-20 11:00:00'
AND m.meeting_date > '2014-09-20 09:00:00'
- INTERVAL d.max_seconds SECOND
HAVING m.meeting_date + INTERVAL meeting_length SECOND
> '2014-09-20 09:00:00'

Calculating time difference between activity timestamps in a query

I'm reasonably new to Access and having trouble solving what should be (I hope) a simple problem - think I may be looking at it through Excel goggles.
I have a table named importedData into which I (not so surprisingly) import a log file each day. This log file is from a simple data-logging application on some mining equipment, and essentially it saves a timestamp and status for the point at which the current activity changes to a new activity.
A sample of the data looks like this:
This information is then filtered using a query to define the range I want to see information for, say from 29/11/2013 06:00:00 AM until 29/11/2013 06:00:00 PM
Now the object of this is to take a status entry's timestamp and get the time difference between it and the record on the subsequent row of the query results. As the equipment works for a 12hr shift, I should then be able to build a picture of how much time the equipment spent doing each activity during that shift.
In the above example, the equipment was in status "START_SHIFT" for 00:01:00, in status "DELAY_WAIT_PIT" for 06:08:26 and so-on. I would then build a unique list of the status entries for the period selected, and sum the total time for each status to get my shift summary.
You can use a correlated subquery to fetch the next timestamp for each row.
SELECT
i.status,
i.timestamp,
(
SELECT Min([timestamp])
FROM importedData
WHERE [timestamp] > i.timestamp
) AS next_timestamp
FROM importedData AS i
WHERE i.timestamp BETWEEN #2013-11-29 06:00:00#
AND #2013-11-29 18:00:00#;
Then you can use that query as a subquery in another query where you compute the duration between timestamp and next_timestamp. And then use that entire new query as a subquery in a third where you GROUP BY status and compute the total duration for each status.
Here's my version which I tested in Access 2007 ...
SELECT
sub2.status,
Format(Sum(Nz(sub2.duration,0)), 'hh:nn:ss') AS SumOfduration
FROM
(
SELECT
sub1.status,
(sub1.next_timestamp - sub1.timestamp) AS duration
FROM
(
SELECT
i.status,
i.timestamp,
(
SELECT Min([timestamp])
FROM importedData
WHERE [timestamp] > i.timestamp
) AS next_timestamp
FROM importedData AS i
WHERE i.timestamp BETWEEN #2013-11-29 06:00:00#
AND #2013-11-29 18:00:00#
) AS sub1
) AS sub2
GROUP BY sub2.status;
If you run into trouble or need to modify it, break out the innermost subquery, sub1, and test that by itself. Then do the same for sub2. I suspect you will want to change the WHERE clause to use parameters instead of hard-coded times.
Note the query Format expression would not be appropriate if your durations exceed 24 hours. Here is an Immediate window session which illustrates the problem ...
' duration greater than one day:
? #2013-11-30 02:00# - #2013-11-29 01:00#
1.04166666667152
' this Format() makes the 25 hr. duration appear as 1 hr.:
? Format(#2013-11-30 02:00# - #2013-11-29 01:00#, "hh:nn:ss")
01:00:00
However, if you're dealing exclusively with data from 12 hr. shifts, this should not be a problem. Keep it in mind in case you ever need to analyze data which spans more than 24 hrs.
If subqueries are unfamiliar, see Allen Browne's page: Subquery basics. He discusses correlated subqueries in the section titled Get the value in another record.