I'm trying to get the time difference between rows. I tried this, but it's not working:
SELECT id, locationDate,
    TIMESTAMPDIFF(SECOND,
        (SELECT MAX(locationDate) FROM location WHERE locationDate < t.locationDate),
        created_at
    ) AS secdiff
FROM location t
WHERE tagCode = 24414
  AND locationDate >= '2017-05-10 16:00:01'
  AND locationDate <= '2017-05-10 16:59:59';
What should I do to calculate the time difference between consecutive rows?
You can find the sample structure and data on sqlfiddle.
I am guessing you just want a correlated subquery:
select l.id, l.locationDate,
    TIMESTAMPDIFF(SECOND,
        (SELECT MAX(l2.locationDate)
         FROM location l2
         WHERE l2.locationDate < l.locationDate
           AND l2.tagCode = l.tagCode
        ),
        l.locationDate
    ) as secdiff
from location l
where l.tagCode = 24414
  and l.locationDate > '2017-05-10 16:00:00'
  and l.locationDate < '2017-05-10 17:00:00';
I modified the date/time constants to be a bit more reasonable (from my perspective). If you really care about one second before or after a time, then you can use your original formulation.
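If you happen to be on MySQL 8+, the LAG() window function expresses this more directly; a sketch using the same table and constants as above:

-- LAG() returns the previous row's locationDate within each tagCode,
-- ordered by locationDate, so no correlated subquery is needed.
SELECT id, locationDate,
       TIMESTAMPDIFF(SECOND,
                     LAG(locationDate) OVER (PARTITION BY tagCode
                                             ORDER BY locationDate),
                     locationDate) AS secdiff
FROM location
WHERE tagCode = 24414
  AND locationDate > '2017-05-10 16:00:00'
  AND locationDate < '2017-05-10 17:00:00';

Note that LAG() only sees rows that survive the WHERE clause, so the first row in the window gets NULL instead of a diff against an earlier reading, whereas the correlated subquery looks at the whole table.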
Related
I am trying to return GPS coordinates for the last 3 visits to a store.
Current Visit
Previous Visit
Visit Prior to that.
The challenge is that there are 15,000+ stores, each of which would have been visited on different dates (within a year).
I have only written my query as far as returning the last two visits (Current & Previous), and already I am facing efficiency problems.
The query I have returns the correct dataset, but it takes extremely long to run (a number of hours).
SELECT
MAX(ActionDate) 'VisitDate'
, Store 'Store'
, Route 'Route'
, GPS 'GPS'
FROM
sys_data.mod_visit AA
WHERE
    ActionDate = (SELECT MAX(ActionDate)
                  FROM sys_data.mod_visit MX
                  WHERE ActionDate < (SELECT MAX(ActionDate)
                                      FROM sys_data.mod_visit
                                      WHERE Store = MX.Store)
                      AND MX.Store = AA.Store
                  GROUP BY Store)
AND ActionDate < CURDATE()
AND YEAR(ActionDate) = YEAR(CURDATE())
Both the Store and ActionDate columns are indexed.
I need to find a way to run this select more efficiently so that I can use the query daily.
Since ActionDate = (subquery) already restricts the date, you don't need AND ActionDate < CURDATE() (it is already applied in the subquery).
And instead of a WHERE subquery you could try an INNER JOIN:
SELECT
    AA.ActionDate 'VisitDate'
    , AA.Store 'Store'
    , AA.Route 'Route'
    , AA.GPS 'GPS'
FROM sys_data.mod_visit AA
INNER JOIN (
    SELECT Store, MAX(ActionDate) max_date
    FROM sys_data.mod_visit
    WHERE ActionDate < CURDATE()
    GROUP BY Store
) MX ON MX.max_date = AA.ActionDate
    AND MX.Store = AA.Store
WHERE YEAR(AA.ActionDate) = YEAR(CURDATE())
Anyway, be sure you have a proper composite index on table sys_data.mod_visit, columns (Store, ActionDate).
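For example (the index name is mine):

-- Composite index so the per-store MAX(ActionDate) lookup can be
-- resolved from the index alone.
CREATE INDEX idx_store_actiondate ON sys_data.mod_visit (Store, ActionDate);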
In MySQL 8 you can use the ROW_NUMBER() window function to enumerate the rows per Store in the order of ActionDate. Then you just pick the first three (per store):
SELECT *
FROM (
SELECT
ActionDate 'VisitDate'
, Store 'Store'
, Route 'Route'
, GPS 'GPS'
, ROW_NUMBER() OVER (PARTITION BY Store ORDER BY ActionDate DESC) rn
FROM
sys_data.mod_visit AA
WHERE ActionDate < CURDATE()
AND YEAR(ActionDate) = YEAR(CURDATE())
) x
WHERE rn <= 3
You should have a composite index on (Store, ActionDate). But it's difficult to say how the optimizer is using indexes for window functions.
Also I would rewrite
AND YEAR(ActionDate) = YEAR(CURDATE())
to
AND ActionDate >= DATE_FORMAT(CURDATE(), '%Y-01-01')
so the condition is sargable and the index on ActionDate can actually be used for a range scan.
I have a database that's set up like this:
(Schema Name)
Historical
-CID int UQ AI NN
-ID Int PK
-Location Varchar(255)
-Status Varchar(255)
-Time datetime
So an entry might look like this
433275 | 97 | MyLocation | OK | 2013-08-20 13:05:54
My question is: if I'm expecting 5-minute-interval data from each of my sites, how can I determine how long a site has been down?
For example, if MyLocation didn't send any interval data from 13:05:54 until 14:05:54, it would have missed 60 minutes' worth of intervals. How could I find this downtime and report on it easily?
Thanks,
*Disclaimer: I'm assuming that your time column determines the order of the entries in your table, and that you can't easily (and without heavy performance loss) self-join the table on the auto_increment column, since it can contain gaps.*
Either you create a table containing simply datetime values and do a
FROM datetime_table d
LEFT JOIN your_table y ON DATE_FORMAT(d.datetimevalue, '%Y-%m-%d %H:%i:00') = DATE_FORMAT(y.`time`, '%Y-%m-%d %H:%i:00')
WHERE y.some_column IS NULL
(date_format() function is used here to get rid of the seconds part in the datetime values).
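A minimal sketch of such a helper table (the names are mine, and the recursive CTE requires MySQL 8+; on older versions you would populate it from a numbers table or a loop):

CREATE TABLE datetime_table (
    datetimevalue DATETIME NOT NULL PRIMARY KEY
);

-- One day's worth of expected 5-minute slots (288 rows).
INSERT INTO datetime_table (datetimevalue)
WITH RECURSIVE slots (dt) AS (
    SELECT TIMESTAMP('2013-08-20 00:00:00')
    UNION ALL
    SELECT dt + INTERVAL 5 MINUTE
    FROM slots
    WHERE dt < '2013-08-20 23:55:00'
)
SELECT dt FROM slots;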
Or you use user-defined variables:
SELECT * FROM (
    SELECT
        y.*,
        TIMESTAMPDIFF(MINUTE, @prevDT, `Time`) AS timedifference,
        @prevDT := `Time`
    FROM your_table y,
        (SELECT @prevDT := (SELECT MIN(`Time`) FROM your_table)) vars
    ORDER BY `Time`
) sq
WHERE timedifference > 5
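On MySQL 8+, where assigning user variables inside SELECT is deprecated, the same check can be written with a window function; a sketch:

SELECT * FROM (
    SELECT
        y.*,
        TIMESTAMPDIFF(MINUTE,
                      LAG(`Time`) OVER (ORDER BY `Time`),
                      `Time`) AS timedifference
    FROM your_table y
) sq
WHERE timedifference > 5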
EDIT: I thought you wanted to scan the whole table (or parts of it) for rows where the time difference to the previous row is greater than 5 minutes. To check for a specific ID (still under the same assumptions as in the disclaimer) you'd have to take a different approach:
SELECT
    TIMESTAMPDIFF(MINUTE,
        (SELECT sy.`Time` FROM your_table sy
         WHERE sy.ID = y.ID AND sy.`Time` < y.`Time`
         ORDER BY sy.`Time` DESC LIMIT 1),
        y.`Time`) AS timedifference
FROM your_table y
WHERE y.ID = whatever
EDIT 2:
When you say "if the ID is currently down", is there already an entry in your table or not? If not, you can simply check this via:
SELECT TIMESTAMPDIFF(MINUTE, (SELECT MAX(`Time`) FROM your_table WHERE ID = whatever), NOW());
So I assume you are going to have some sort of cron job running to check this table. If that is the case, you can simply fetch the highest time value for each id/location and compare it against the current time, flagging any ids whose most recent time is older than the specified threshold. You can do that like this:
SELECT id, location, MAX(time) AS most_recent_time
FROM Historical
GROUP BY id, location
HAVING most_recent_time < DATE_SUB(NOW(), INTERVAL 5 MINUTE)
Something like this:
SELECT h1.ID, h1.location, h1.time, min(h2.time)
FROM Historical h1 LEFT JOIN Historical h2
ON (h1.ID = h2.ID AND h2.CID > h1.CID)
WHERE now() > h1.time + INTERVAL 301 SECOND
GROUP BY h1.ID, h1.location, h1.time
HAVING min(h2.time) IS NULL
OR min(h2.time) > h1.time + INTERVAL 301 SECOND
I have three tables, with the following setup:
TEMPERATURE_1
time
zone (FK)
temperature
TEMPERATURE_2
time
zone (FK)
temperature
TEMPERATURE_3
time
zone (FK)
temperature
The data in each table is updated periodically, but not necessarily concurrently (i.e., the time entries are not identical).
I want to be able to access the closest reading from each table for each time, i.e.:
TEMPERATURES
time
zone (FK)
temperature_1
temperature_2
temperature_3
In other words, for every unique time across my three tables, I want a row in the TEMPERATURES table, where the temperature_n values are the temperature reading closest in time from each original table.
At the moment, I've set this up using two views:
create view temptimes
as select time, zone
from temperature_1
union
select time, zone
from temperature_2
union
select time, zone
from temperature_3;
create view temperatures
as select tt.time,
    tt.zone,
    (select temperature
     from temperature_1
     order by abs(timediff(time, tt.time))
     limit 1) as temperature_1,
    (select temperature
     from temperature_2
     order by abs(timediff(time, tt.time))
     limit 1) as temperature_2,
    (select temperature
     from temperature_3
     order by abs(timediff(time, tt.time))
     limit 1) as temperature_3
from temptimes as tt
order by tt.time;
This approach works, but it is far too slow to use in production (it takes minutes or more for small data sets of ~1000 records per temperature table).
I'm not great with SQL, so I'm sure I'm missing the correct way to do this. How should I approach the problem?
The expensive part is where the correlated subqueries have to compute the time difference for every single row of each temperature_* table to find just one closest row for one column of one row in the main query.
It would be dramatically faster if you could just pick one row after and one row before the current time according to an index and only compute the time difference for these two candidates. All you need for that to be fast is an index on the column time in your tables.
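For example (the index names are mine):

CREATE INDEX temperature_1_time_idx ON temperature_1 (`time`);
CREATE INDEX temperature_2_time_idx ON temperature_2 (`time`);
CREATE INDEX temperature_3_time_idx ON temperature_3 (`time`);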
I am ignoring the column zone, since its role remains unclear in the question and it just adds more noise to the core problem. It should be easy to add to the query.
Without an additional view, this query does all at once:
SELECT time
,COALESCE(temp1
,CASE WHEN timediff(time, time1a) > timediff(time1b, time) THEN
(SELECT t.temperature
FROM temperature_1 t
WHERE t.time = y.time1b)
ELSE
(SELECT t.temperature
FROM temperature_1 t
WHERE t.time = y.time1a)
END) AS temp1
,COALESCE(temp2
,CASE WHEN timediff(time, time2a) > timediff(time2b, time) THEN
(SELECT t.temperature
FROM temperature_2 t
WHERE t.time = y.time2b)
ELSE
(SELECT t.temperature
FROM temperature_2 t
WHERE t.time = y.time2a)
END) AS temp2
,COALESCE(temp3
,CASE WHEN timediff(time, time3a) > timediff(time3b, time) THEN
(SELECT t.temperature
FROM temperature_3 t
WHERE t.time = y.time3b)
ELSE
(SELECT t.temperature
FROM temperature_3 t
WHERE t.time = y.time3a)
END) AS temp3
FROM (
SELECT time
,max(t1) AS temp1
,max(t2) AS temp2
,max(t3) AS temp3
,CASE WHEN max(t1) IS NULL THEN
(SELECT t.time FROM temperature_1 t
WHERE t.time < x.time
ORDER BY t.time DESC LIMIT 1) ELSE NULL END AS time1a
,CASE WHEN max(t1) IS NULL THEN
(SELECT t.time FROM temperature_1 t
WHERE t.time > x.time
ORDER BY t.time LIMIT 1) ELSE NULL END AS time1b
,CASE WHEN max(t2) IS NULL THEN
(SELECT t.time FROM temperature_2 t
WHERE t.time < x.time
ORDER BY t.time DESC LIMIT 1) ELSE NULL END AS time2a
,CASE WHEN max(t2) IS NULL THEN
(SELECT t.time FROM temperature_2 t
WHERE t.time > x.time
ORDER BY t.time LIMIT 1) ELSE NULL END AS time2b
,CASE WHEN max(t3) IS NULL THEN
(SELECT t.time FROM temperature_3 t
WHERE t.time < x.time
ORDER BY t.time DESC LIMIT 1) ELSE NULL END AS time3a
,CASE WHEN max(t3) IS NULL THEN
(SELECT t.time FROM temperature_3 t
WHERE t.time > x.time
ORDER BY t.time LIMIT 1) ELSE NULL END AS time3b
FROM (
SELECT time, temperature AS t1, NULL AS t2, NULL AS t3 FROM temperature_1
UNION ALL
SELECT time, NULL AS t1, temperature AS t2, NULL AS t3 FROM temperature_2
UNION ALL
SELECT time, NULL AS t1, NULL AS t2, temperature AS t3 FROM temperature_3
) AS x
GROUP BY time
) y
ORDER BY time;
->sqlfiddle
Explanation
Subquery x replaces your view temptimes and brings the temperature into the result. If all three tables are in sync and have temperatures for all the same points in time, the rest is not even needed and extremely fast.
For every point in time where one of the three tables has no row, the temperature is being fetched as instructed: take the "closest" one from each table.
Subquery y aggregates the rows from x and fetches the previous time (time1a) and the next time (time1b) relative to the current time from each table where the temperature is missing. These lookups should be fast using the index.
The final query fetches the temperature from the row with the closest time for each temperature that's actually missing.
This query could be simpler if MySQL allowed referencing columns from more than one level above the current subquery, but it cannot. It works just fine in PostgreSQL: ->sqlfiddle
It also would be simpler if one could return more than one column from a correlated subquery, but I don't know how to do that in MySQL.
And it would be much simpler with CTEs and window functions, but MySQL doesn't know these modern SQL features (unlike other relevant RDBMS).
The reason that this is slow is that it requires 3 table scans to calculate and order the differences.
I assume that you already have indexes on the time and zone columns; at the moment they won't help, because of the table-scan problem.
There are a number of options to avoid this depending on what you need and what the data collection rates are.
You have already said that the data is collected periodically but not concurrently. This suggests a few options.
To what level of significance do you need the temperature data: the day, the hour, the minute, etc.? Store the time info to that level of significance only (or have another column that does) and run your queries on that.
If you know that the 3 closest times will be within a certain time frame (hour, day, etc.), put in a WHERE clause to limit the calculation to those times that are potential candidates; see the sketch after these options. You are effectively constructing histogram-type buckets, and you will need a calendar table to do this efficiently.
Make the comparison unidirectional, i.e. limit consideration to only those times after the time you are looking for; so if you are looking for 12:00:00, then 13:45:32 is a candidate but 11:59:59 isn't.
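As an illustration of the second option, each correlated subquery in your view could be windowed like this (the one-hour window is an assumption about your collection rate):

(select temperature
 from temperature_1
 where time between tt.time - interval 1 hour
                and tt.time + interval 1 hour
 order by abs(timediff(time, tt.time))
 limit 1) as temperature_1,

With an index on time, the BETWEEN predicate narrows the scan to a small range instead of the whole table.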
I understand what you are trying to accomplish - ask yourself why, and whether a simpler solution will meet your needs.
My suggestion is that you don't take the closest time, but you take the first time on or before a given time. The reason for this is simple: generally the data for a given time is what is known at that time. Incorporating future information is generally not a good idea for most purposes.
With this change, you can modify your query to take advantage of an index on time. The problem in your original query is that the function applied to the column precludes the use of the index.
So, if you want the most recent temperature, use this instead for each variable:
(select temperature
from temperature_1 t2
where t2.time <= tt.time
order by t2.time desc
limit 1
) as temperature_1,
Actually, you can also construct it like this:
(select time
from temperature_1 t2
where t2.time <= tt.time
order by t2.time desc
limit 1
) as time_1,
And then join the information for the temperature back in. This will be efficient, with the use of an index.
With that in mind, you could actually have two variables time_1_before and time_1_after, for the best time on or before and the best time on or after. You can use logic in the select to choose the nearest value. The joins back to the temperature should be efficient using an index.
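A sketch of that idea for the first table (the aliases are mine; the CASE picks whichever neighbour is closer, and the join back uses the index on time):

SELECT b.time,
       t1.temperature AS temperature_1
FROM (
    SELECT tt.time,
           (SELECT t2.time FROM temperature_1 t2
            WHERE t2.time <= tt.time
            ORDER BY t2.time DESC LIMIT 1) AS time_1_before,
           (SELECT t2.time FROM temperature_1 t2
            WHERE t2.time > tt.time
            ORDER BY t2.time ASC LIMIT 1) AS time_1_after
    FROM temptimes tt
) b
LEFT JOIN temperature_1 t1
       ON t1.time = CASE
              WHEN b.time_1_after IS NULL THEN b.time_1_before
              WHEN b.time_1_before IS NULL THEN b.time_1_after
              WHEN TIMEDIFF(b.time, b.time_1_before)
                   <= TIMEDIFF(b.time_1_after, b.time) THEN b.time_1_before
              ELSE b.time_1_after
          END;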
But, I will reiterate, I think the last temperature on or before may be the best choice.
I have stored temperatures in a MySQL database. The table is called temperatures. It contains, for example, the columns dtime and temperature. The first one is the time the temperature was measured (the column type is DATETIME) and the latter, well, apparently the temperature (the type is FLOAT).
At the moment I use the following query to fetch the temperatures in a certain period.
SELECT dtime, temperature
FROM temperatures
WHERE dtime BETWEEN "2012-11-15 00:00:00" AND "2012-11-30 23:59:59"
ORDER BY dtime DESC
I'd like to add the average temperature of the day in the results. I tried the following.
SELECT
dtime AS cPVM,
temperature,
(
SELECT AVG(temperature)
FROM temperatures
WHERE DATE(dtime) = DATE(cPVM)
) AS avg
FROM temperatures
WHERE dtime BETWEEN "2012-11-15 00:00:00" AND "2012-11-30 23:59:59"
ORDER BY dtime DESC
Works ok, but this is really, really slow. Fetching the results in that period takes about 5 seconds, when the first one (without the averages) is done in 0.03 seconds.
SELECT DATE(dtime), AVG(temperature)
FROM temperatures
WHERE DATE(dtime) BETWEEN "2012-11-15" AND "2012-11-30"
GROUP BY DATE(dtime)
ORDER BY dtime DESC
This one however is done in 0.04 seconds.
How do I fetch the average temperatures more efficiently?
Use a join instead of a correlated subquery:
SELECT dtime, temperature, avg_temperature
FROM temperatures
JOIN (
SELECT DATE(dtime) AS date_dtime, AVG(temperature) AS avg_temperature
FROM temperatures
WHERE dtime >= '2012-11-15' AND dtime < '2012-12-01'
GROUP BY DATE(dtime)
) AS avg_t
ON date_dtime = DATE(dtime)
WHERE dtime >= '2012-11-15' AND dtime < '2012-12-01'
ORDER BY dtime DESC
Since your first query is very efficient already, let's use it as a starting point. Depending on the size of the result sets it produces, querying the results of your first query can still be very efficient.
Your third query also seems to run very efficiently, so you can fall back to that if my proposed query doesn't perform well enough. The reason I like it is because you can take the original query as a parameter of sorts (minus the ORDER BY) and plug it into this one, which shows the average temperature from the date range of the original query:
SELECT
DATE(dtime) AS day_of_interest,
AVG(temperature) AS avg_temperature
FROM
(
-- Your first query is here, minus the ORDER BY clause
SELECT
dtime,
temperature
FROM
temperatures
WHERE
dtime BETWEEN "2012-11-15 00:00:00" AND "2012-11-30 23:59:59"
-- ORDER BY irrelevant inside subqueries, only slows you down
-- ORDER BY
-- dtime DESC
) AS temperatures_of_interest
GROUP BY
day_of_interest
ORDER BY
day_of_interest DESC
If this query runs "efficiently enough" ™ for you, then this could potentially be an easier solution to code up and automate than perhaps some others.
Hope this helps!
What's the best practice for using subqueries versus repeating calculations multiple times? I've used subqueries until now, but they seem ridiculous when you just need a variable calculated from the previous query (in the following example we're talking about a query with a subquery within a subquery).
So which is the right / best-practice method? Personally, being a programmer, everything in me tells me to use method A, since it seems stupid to copy-paste calculations; but at the same time, subqueries aren't always good, since they can make the query use a filesort instead of an index sort (correct me if I'm wrong on this, please).
Method a - subqueries:
SELECT
tmp2.*
FROM
(
SELECT
tmp.*,
(NOW() < tmp.expire_time) as `active`
FROM
(
SELECT
tr.orderid,
tr.transactiontime,
pa.months as `months`,
DATE_ADD(tr.transactiontime, INTERVAL pa.months MONTH) as `expire_time`
FROM
`transactions` as `tr`
INNER JOIN
`packages` as `pa`
ON
tr.productid = pa.productid
WHERE
tr.isprocessed = '1'
ORDER BY
tr.transactiontime ASC
) as `tmp`
) as `tmp2`
WHERE
tmp2.active = 1
Explain:
Method b - reusing calculations:
SELECT
tr.orderid,
tr.transactiontime,
pa.months as `months`,
DATE_ADD(tr.transactiontime, INTERVAL pa.months MONTH) as `expire_time`,
(NOW() < DATE_ADD(tr.transactiontime, INTERVAL pa.months MONTH)) as `active`
FROM
`transactions` as `tr`
INNER JOIN
`packages` as `pa`
ON
tr.productid = pa.productid
WHERE
tr.isprocessed = '1'
AND
(NOW() < DATE_ADD(tr.transactiontime, INTERVAL pa.months MONTH))
ORDER BY
tr.transactiontime ASC
Explain:
Notice how DATE_ADD(tr.transactiontime, INTERVAL pa.months MONTH) is repeated 3 times, and (NOW() < DATE_ADD(tr.transactiontime, INTERVAL pa.months MONTH)) is repeated 2 times.
With the EXPLAINs it seems that method B is much better, but I still dislike the fact that it has to do the same calculation 3 times (I'm assuming it does, rather than caching the result and reusing it).
You should look at MySQL's EXPLAIN command:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
which tells you how MySQL executes the queries.
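For example, prefixing method B (trimmed here) with EXPLAIN shows whether an index is used and whether a filesort appears:

EXPLAIN
SELECT
    tr.orderid,
    tr.transactiontime,
    DATE_ADD(tr.transactiontime, INTERVAL pa.months MONTH) AS `expire_time`
FROM `transactions` AS `tr`
INNER JOIN `packages` AS `pa` ON tr.productid = pa.productid
WHERE tr.isprocessed = '1'
  AND NOW() < DATE_ADD(tr.transactiontime, INTERVAL pa.months MONTH)
ORDER BY tr.transactiontime ASC;

Check the type, key, and Extra columns of the output; "Using filesort" in Extra confirms the sorting concern mentioned in the question.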