I am looking to compare two sets of data that are stored in the same table. I am sorry if this is a duplicate SO post; I have read some other posts but have not been able to apply them to my problem.
I am running a query to show all Athletes and times for the most recent date (2017-05-20):
SELECT `eventID`,
`location`,
`date`,
`barcode`,
`runner`,
`Gender`,
`time` FROM `TableName` WHERE `date`='2017-05-20'
I would like to compare the time achieved on the 20th May with the previous time for each athlete.
SELECT `time` FROM `TableName` WHERE `date`='2017-05-13'
How can I structure my query to show all of the ATHLETES, TIME on 13th, TIME on 20th?
I have tried some methods, such as UNION ALL.
You can get the previous time using a correlated subquery:
SELECT t.*,
(SELECT t2.time
FROM TableName t2
WHERE t2.runner = t.runner AND t2.eventId = t.eventId AND
t2.date < t.date
ORDER BY t2.date DESC
LIMIT 1
) prev_time
FROM `TableName` t
WHERE t.date = '2017-05-20';
For performance, you want an index on (runner, eventid, date, time).
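As a rough, runnable illustration of the correlated subquery above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table name `results`, the sample rows, and the index name `idx_prev` are made up for the demo; only the query shape comes from the answer.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (eventID INTEGER, date TEXT, runner TEXT, time TEXT)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?)",
    [
        (1, "2017-05-06", "Ann", "00:25:10"),
        (1, "2017-05-13", "Ann", "00:24:30"),
        (1, "2017-05-20", "Ann", "00:24:05"),
        (1, "2017-05-13", "Bob", "00:30:00"),
        (1, "2017-05-20", "Bob", "00:29:40"),
        (1, "2017-05-20", "Cat", "00:27:15"),  # first run, so no previous time
    ],
)
# Composite index in the same column order as the one suggested above.
conn.execute("CREATE INDEX idx_prev ON results (runner, eventID, date, time)")

rows = conn.execute(
    """
    SELECT t.runner, t.time,
           (SELECT t2.time
            FROM results t2
            WHERE t2.runner = t.runner AND t2.eventID = t.eventID
              AND t2.date < t.date
            ORDER BY t2.date DESC
            LIMIT 1) AS prev_time
    FROM results t
    WHERE t.date = '2017-05-20'
    ORDER BY t.runner
    """
).fetchall()

for row in rows:
    print(row)
```

A runner with no earlier result simply gets NULL (None in Python) for prev_time, so first-time athletes still appear in the output.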
In this table, there are 100 different receipt_ids. Each receipt_id has multiple statuses. I want to calculate the time difference between the status_code DWRESULT_INIT and DWRESULT_SAVED. I want to group the results by receipt_id so I can see the time difference for all 100 receipt_ids in the DB. I am new to mysql, and am not sure how to accomplish this.
Use MySQL's timestampdiff() function to calculate the difference, and use the from_unixtime() function to convert each timestamp into a datetime value. To get the two status codes into a single record, use a subquery that selects only the init records and join it back to your table, filtered to the saved records:
select t1.receipt_id, timestampdiff(second, from_unixtime(t2.event_time), from_unixtime(t1.event_time)) as diff
from yourtable t1
inner join
(select receipt_id, event_time
from yourtable
where status_code='DWRESULT_INIT') t2 on t1.receipt_id=t2.receipt_id
where t1.status_code='DWRESULT_SAVED'
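The self-join above can be tried out with the following sketch, which uses Python's sqlite3 module in place of MySQL. It assumes event_time holds Unix epoch seconds (as the use of from_unixtime() suggests), so plain subtraction stands in for timestampdiff(second, ...); the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (receipt_id INTEGER, status_code TEXT, event_time INTEGER)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        (1, "DWRESULT_INIT", 1500000000),
        (1, "DWRESULT_SAVED", 1500000042),
        (2, "DWRESULT_INIT", 1500000100),
        (2, "DWRESULT_SAVED", 1500000107),
        (3, "DWRESULT_INIT", 1500000200),  # never saved: dropped by the inner join
    ],
)

diffs = conn.execute(
    """
    SELECT t1.receipt_id,
           t1.event_time - t2.event_time AS diff  -- seconds between INIT and SAVED
    FROM yourtable t1
    INNER JOIN (SELECT receipt_id, event_time
                FROM yourtable
                WHERE status_code = 'DWRESULT_INIT') t2
            ON t1.receipt_id = t2.receipt_id
    WHERE t1.status_code = 'DWRESULT_SAVED'
    ORDER BY t1.receipt_id
    """
).fetchall()

print(diffs)
```

Because this is an inner join, receipts that have only one of the two statuses do not appear in the result.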
SELECT
T1.type,
if(TIMESTAMPDIFF(MINUTE, FROM_UNIXTIME(T1.start_time),FROM_UNIXTIME(T1.end_time)) IS NULL,0,TIMESTAMPDIFF(MINUTE, FROM_UNIXTIME(T1.start_time),FROM_UNIXTIME(T1.end_time))) AS totalTimeInMinute,
T1.start_time AS startTime,
T1.end_time AS endTime
FROM (SELECT type,createdAt AS start_time,(SELECT createdAt FROM Worktime WHERE date='2020-10-08' AND createdAt > start_time LIMIT 1) AS end_time FROM Worktime
WHERE date='2020-10-08') AS T1
I need to select first value for every hour from my db. But I don't know how to reverse order on GROUP BY statement.
How can I rewrite my query (right now it selects the last value in each hour)?
SELECT HOUR(`time`) as hour, mytable.*
FROM mytable
WHERE DATE(`time`) ="2015-09-12" GROUP BY HOUR(`time`) ORDER BY `time` ASC;
This query gave me expected result:
SELECT HOUR(`time`) as hour, sortedTable.* FROM
(SELECT electrolysis.* FROM electrolysis
WHERE DATE(`time`)='2015-09-12' ORDER BY `time`) as sortedTable
GROUP BY HOUR(`time`);
You can select the minimum time for each hour in a subquery. Try this query:
SELECT * from mytable WHERE `time` IN (
SELECT MIN(`time`)
FROM mytable
WHERE DATE(`time`) ="2015-09-12"
GROUP BY HOUR(`time`) ) ORDER BY `time` ASC;
You can do something like this:
SELECT sub0.min_time,
mytable.*
FROM mytable
INNER JOIN
(
SELECT MIN(`time`) AS min_time
FROM mytable
WHERE DATE(`time`) ="2015-09-12"
GROUP BY HOUR(`time`)
) sub0
ON mytable.`time` = sub0.min_time
ORDER BY `time` ASC
This is using a sub query to get the smallest time in each hour. This is then joined back against your main table on this min time to get the record that has this time.
Note that there is a potential problem here if multiple records share the same time as the smallest one for an hour. There are ways around this, but they depend on your data (e.g., if you have a unique id field that always ascends with time, you could select the min id for each hour and join on that).
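The tie-breaking idea mentioned above (joining on the minimum id per hour rather than the minimum time) might look like the sketch below. It uses Python's sqlite3 module as a stand-in for MySQL, assumes a hypothetical auto-incrementing `id` column that ascends with time, and uses SQLite's strftime('%H', ...) where MySQL would use HOUR().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `id` is an assumed auto-incrementing key that ascends with time.
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, time TEXT, value REAL)")
conn.executemany(
    "INSERT INTO mytable (time, value) VALUES (?, ?)",
    [
        ("2015-09-12 08:05:00", 1.0),
        ("2015-09-12 08:05:00", 1.5),  # same time as the row above (a tie)
        ("2015-09-12 08:40:00", 2.0),
        ("2015-09-12 09:10:00", 3.0),
        ("2015-09-13 08:00:00", 9.0),  # different day, must be excluded
    ],
)

first_per_hour = conn.execute(
    """
    SELECT m.id, m.time, m.value
    FROM mytable m
    INNER JOIN (
        SELECT MIN(id) AS min_id
        FROM mytable
        WHERE time >= '2015-09-12 00:00:00' AND time < '2015-09-13 00:00:00'
        GROUP BY strftime('%H', time)   -- HOUR(time) in MySQL
    ) sub0 ON m.id = sub0.min_id
    ORDER BY m.time
    """
).fetchall()

print(first_per_hour)
```

Joining on the id returns exactly one row per hour even when two rows carry the same earliest timestamp.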
You can use the query below, which is better optimized because the range condition on `time` can use an index; just make sure the time column is indexed.
SELECT HOUR(m.time), m.*
FROM mytable AS m
JOIN
(
SELECT MIN(`time`) AS tm
FROM mytable
WHERE `time` >= '2015-09-12 00:00:00' AND `time` <= '2015-09-12 23:59:59'
GROUP BY HOUR(`time`)
) AS a ON m.time=a.tm
GROUP BY HOUR(m.time)
ORDER BY m.time;
I have a TABLE with Columns: USER_ID,TIMESTAMP and ACTION
Every row tells me which user did what action at a certain time-stamp.
Example:
Alice starts the application at 2014-06-12 16:37:46
Alice stops the application at 2014-06-12 17:48:55
I want a list of users with the time difference between the first row in which they start the application and the last row in which they close it.
Here is how I'm trying to do it:
SELECT USER_ID,DATEDIFF(
(SELECT timestamp FROM MOBILE_LOG WHERE ACTION="START_APP" AND USER_ID="Alice" order by TIMESTAMP LIMIT 1),
(SELECT timestamp FROM MOBILE_LOG WHERE ACTION ="CLOSE_APP" AND USER_ID="Alice" order by TIMESTAMP LIMIT 1)
) AS Duration FROM MOBILE_LOG AS t WHERE USER_ID="Alice";
I ask for the DATEDIFF between two SELECT queries, but I just get a list of rows for Alice with -2 as Duration.
Am I on the right track?
I think you should group the table by USER_ID and find the minimum "START_APP" time and the maximum "CLOSE_APP" time for each user. Also, pass the CLOSE_APP time to DATEDIFF first and the START_APP time second; that way you get a positive result.
SELECT USER_ID,
DATEDIFF(MAX(CASE WHEN ACTION="CLOSE_APP" THEN timestamp END),
MIN(CASE WHEN ACTION="START_APP" THEN timestamp END)
) AS Duration
FROM MOBILE_LOG AS t
GROUP BY USER_ID
SELECT user_id, start_time, close_time, DATEDIFF(close_time, start_time) duration
FROM
(SELECT MIN(timestamp) start_time, user_id FROM MOBILE_LOG WHERE action="START_APP" GROUP BY user_id) start_action
JOIN
(SELECT MAX(timestamp) close_time, user_id FROM MOBILE_LOG WHERE ACTION ="CLOSE_APP" GROUP BY user_id) close_action
USING (user_id)
WHERE USER_ID="Alice";
You make two "tables" with the earliest time for start for each user, and the latest time for close for each user. Then join them so that the actions of the same user are together.
Now that you have everything set up, you can easily subtract one from the other.
You get an integer because DATEDIFF returns the number of days between two dates. If you want the hours, minutes, and seconds between them, use TIMEDIFF instead.
Try this:
select t1.USER_ID, TIMEDIFF(t2.timestamp, t1.timestamp)
from MOBILE_LOG t1, MOBILE_LOG t2
where (t1.action,t1.timestamp) in (select action, max(timestamp) from MOBILE_LOG t where t.ACTION = "START_APP" group by USER_ID)
and (t2.action,t2.timestamp) in (select action, max(timestamp) from MOBILE_LOG t where t.ACTION = "CLOSE_APP" group by USER_ID)
and t1.USER_ID = t2.USER_ID
It will show you the difference between the two latest dates (start date, end date) for each user.
P.S. Sorry, I wrote this without access to a database, so there may be some mistakes. If you have problems with (t1.action,t1.timestamp) in (select ...), split it into two conditions: where t1.action in (select ...) and t1.timestamp in (select ...)
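To make the difference between a day-level result (DATEDIFF) and an elapsed-time result (TIMEDIFF) concrete, here is a small sketch. It runs the conditional aggregation from the answers in SQLite via Python's sqlite3 module, then computes both kinds of difference in Python, since SQLite has neither MySQL function. The row for Bob is invented to show a session that crosses midnight.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MOBILE_LOG (USER_ID TEXT, TIMESTAMP TEXT, ACTION TEXT)")
conn.executemany(
    "INSERT INTO MOBILE_LOG VALUES (?, ?, ?)",
    [
        ("Alice", "2014-06-12 16:37:46", "START_APP"),
        ("Alice", "2014-06-12 17:48:55", "CLOSE_APP"),
        ("Bob", "2014-06-12 23:50:00", "START_APP"),   # hypothetical user
        ("Bob", "2014-06-13 00:10:00", "CLOSE_APP"),
    ],
)

# First START_APP and last CLOSE_APP per user, as in the GROUP BY answer.
spans = conn.execute(
    """
    SELECT USER_ID,
           MIN(CASE WHEN ACTION = 'START_APP' THEN TIMESTAMP END) AS start_time,
           MAX(CASE WHEN ACTION = 'CLOSE_APP' THEN TIMESTAMP END) AS close_time
    FROM MOBILE_LOG
    GROUP BY USER_ID
    ORDER BY USER_ID
    """
).fetchall()

fmt = "%Y-%m-%d %H:%M:%S"
report = {}
for user, start, close in spans:
    start_dt = datetime.strptime(start, fmt)
    close_dt = datetime.strptime(close, fmt)
    whole_days = (close_dt.date() - start_dt.date()).days  # DATEDIFF-style: date parts only
    elapsed = close_dt - start_dt                          # TIMEDIFF-style: full elapsed time
    report[user] = (whole_days, str(elapsed))

print(report)
```

Alice's session of just over an hour counts as 0 days, while Bob's 20-minute session counts as 1 day because it crosses midnight, which is why a day-level difference is a poor fit for measuring session length.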
Need help with SQL Query (MySQL)
Say I have a table with data as..
The table has the Latitude and Longitude locations logged for a person at some time intervals (TIME column), And DISTANCE_TRAVELLED column has the distance traveled from its previous record.
If I want to know how many minutes a person was not moving (i.e. DISTANCE_TRAVELLED <= 0.001),
what query should I use?
Can we also group the data by date? Basically, I want to know how many minutes the person was idle on a specific day.
You need to get the previous time for each record. I like to do this using a correlated subquery:
select t.*,
(select t2.time
from table t2
where t2.device = t.device and t2.time < t.time
order by time desc
limit 1
) as prevtime
from table t;
Now you can get the number of minutes not moved, as something like:
select t.*, TIMESTAMPDIFF(MINUTE, prevtime, time) as minutes
from (select t.*,
(select t2.time
from table t2
where t2.device = t.device and t2.time < t.time
order by time desc
limit 1
) as prevtime
from table t
) t
The rest of what you request is just adding the appropriate where clause or group by clause. For instance:
select device, date(time), sum(TIMESTAMPDIFF(MINUTE, prevtime, time)) as minutes
from (select t.*,
(select t2.time
from table t2
where t2.device = t.device and t2.time < t.time
order by time desc
limit 1
) as prevtime
from table t
) t
where distance_travelled <= 0.001
group by device, date(time)
EDIT:
For performance, create an index on table(device, time).
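The idle-minutes query above can be exercised with the sketch below, again using Python's sqlite3 module as a stand-in for MySQL. The table name `locations` and the sample rows are hypothetical, and the difference of strftime('%s', ...) values divided by 60 plays the role of TIMESTAMPDIFF(MINUTE, ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations (device TEXT, time TEXT, distance_travelled REAL)")
conn.executemany(
    "INSERT INTO locations VALUES (?, ?, ?)",
    [
        ("d1", "2013-06-01 10:00:00", 0.5),
        ("d1", "2013-06-01 10:10:00", 0.0),  # idle for 10 minutes
        ("d1", "2013-06-01 10:25:00", 0.0),  # idle for 15 minutes
        ("d1", "2013-06-01 10:30:00", 1.2),  # moving
        ("d1", "2013-06-02 09:00:00", 2.0),  # moving
        ("d1", "2013-06-02 09:20:00", 0.0),  # idle for 20 minutes
    ],
)
# Index suggested in the answer: lets the subquery find the previous row quickly.
conn.execute("CREATE INDEX idx_device_time ON locations (device, time)")

idle = conn.execute(
    """
    SELECT device, date(time) AS day,
           SUM((CAST(strftime('%s', time) AS INTEGER)
                - CAST(strftime('%s', prevtime) AS INTEGER)) / 60) AS minutes
    FROM (SELECT t.*,
                 (SELECT t2.time
                  FROM locations t2
                  WHERE t2.device = t.device AND t2.time < t.time
                  ORDER BY t2.time DESC
                  LIMIT 1) AS prevtime
          FROM locations t) t
    WHERE distance_travelled <= 0.001
    GROUP BY device, date(time)
    ORDER BY device, day
    """
).fetchall()

print(idle)
```

Each idle interval is attributed to the day of the later record, so an idle period that spans midnight would be counted entirely on the second day.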
I have three tables, with the following setup:
TEMPERATURE_1
time
zone (FK)
temperature
TEMPERATURE_2
time
zone (FK)
temperature
TEMPERATURE_3
time
zone (FK)
temperature
The data in each table is updated periodically, but not necessarily concurrently (ie, the time entries are not identical).
I want to be able to access the closest reading from each table for each time, ie:
TEMPERATURES
time
zone (FK)
temperature_1
temperature_2
temperature_3
In other words, for every unique time across my three tables, I want a row in the TEMPERATURES table, where the temperature_n values are the temperature reading closest in time from each original table.
At the moment, I've set this up using two views:
create view temptimes
as select time, zone
from temperature_1
union
select time, zone
from temperature_2
union
select time, zone
from temperature_3;
create view temperatures
as select tt.time,
tt.zone,
(select temperature
from temperature_1
order by abs(timediff(time, tt.time))
limit 1) as temperature_1,
(select temperature
from temperature_2
order by abs(timediff(time, tt.time))
limit 1) as temperature_2,
(select temperature
from temperature_3
order by abs(timediff(time, tt.time))
limit 1) as temperature_3
from temptimes as tt
order by tt.time;
This approach works, but is too slow to use in production (it takes minutes+ for small data sets of ~1000 records for each temperature).
I'm not great with SQL, so I'm sure I'm missing the correct way to do this. How should I approach the problem?
The expensive part is where the correlated subqueries have to compute the time difference for every single row of each temperature_* table to find just one closest row for one column of one row in the main query.
It would be dramatically faster if you could just pick one row after and one row before the current time according to an index and only compute the time difference for these two candidates. All you need for that to be fast is an index on the column time in your tables.
I am ignoring the column zone, since its role remains unclear in the question and it just adds noise to the core problem. It should be easy to add to the query.
Without an additional view, this query does it all at once:
SELECT time
,COALESCE(temp1
,CASE WHEN timediff(time, time1a) > timediff(time1b, time) THEN
(SELECT t.temperature
FROM temperature_1 t
WHERE t.time = y.time1b)
ELSE
(SELECT t.temperature
FROM temperature_1 t
WHERE t.time = y.time1a)
END) AS temp1
,COALESCE(temp2
,CASE WHEN timediff(time, time2a) > timediff(time2b, time) THEN
(SELECT t.temperature
FROM temperature_2 t
WHERE t.time = y.time2b)
ELSE
(SELECT t.temperature
FROM temperature_2 t
WHERE t.time = y.time2a)
END) AS temp2
,COALESCE(temp3
,CASE WHEN timediff(time, time3a) > timediff(time3b, time) THEN
(SELECT t.temperature
FROM temperature_3 t
WHERE t.time = y.time3b)
ELSE
(SELECT t.temperature
FROM temperature_3 t
WHERE t.time = y.time3a)
END) AS temp3
FROM (
SELECT time
,max(t1) AS temp1
,max(t2) AS temp2
,max(t3) AS temp3
,CASE WHEN max(t1) IS NULL THEN
(SELECT t.time FROM temperature_1 t
WHERE t.time < x.time
ORDER BY t.time DESC LIMIT 1) ELSE NULL END AS time1a
,CASE WHEN max(t1) IS NULL THEN
(SELECT t.time FROM temperature_1 t
WHERE t.time > x.time
ORDER BY t.time LIMIT 1) ELSE NULL END AS time1b
,CASE WHEN max(t2) IS NULL THEN
(SELECT t.time FROM temperature_2 t
WHERE t.time < x.time
ORDER BY t.time DESC LIMIT 1) ELSE NULL END AS time2a
,CASE WHEN max(t2) IS NULL THEN
(SELECT t.time FROM temperature_2 t
WHERE t.time > x.time
ORDER BY t.time LIMIT 1) ELSE NULL END AS time2b
,CASE WHEN max(t3) IS NULL THEN
(SELECT t.time FROM temperature_3 t
WHERE t.time < x.time
ORDER BY t.time DESC LIMIT 1) ELSE NULL END AS time3a
,CASE WHEN max(t3) IS NULL THEN
(SELECT t.time FROM temperature_3 t
WHERE t.time > x.time
ORDER BY t.time LIMIT 1) ELSE NULL END AS time3b
FROM (
SELECT time, temperature AS t1, NULL AS t2, NULL AS t3 FROM temperature_1
UNION ALL
SELECT time, NULL AS t1, temperature AS t2, NULL AS t3 FROM temperature_2
UNION ALL
SELECT time, NULL AS t1, NULL AS t2, temperature AS t3 FROM temperature_3
) AS x
GROUP BY time
) y
ORDER BY time;
Explain
Subquery x replaces your view temptimes and brings the temperature into the result. If all three tables are in sync and have temperatures for all the same points in time, the rest is not even needed and the query is extremely fast.
For every point in time where one of the three tables has no row, the temperature is being fetched as instructed: take the "closest" one from each table.
Subquery y aggregates the rows from x and fetches the previous time (time1a) and the next time (time1b) according to the current time from each table where the temperature is missing. These lookups should be fast using the index.
The final query fetches the temperature from the row with the closest time for each temperature that's actually missing.
This query could be simpler if MySQL allowed referencing columns from more than one level above the current subquery, but it does not. It works just fine in PostgreSQL.
It also would be simpler if one could return more than one column from a correlated subquery, but I don't know how to do that in MySQL.
And it would be much simpler with CTEs and window functions, but MySQL versions before 8.0 do not support these modern SQL features (unlike other relevant RDBMS).
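The core idea of this answer (look up one candidate row before and one after the target time via the index, then compare only those two) can be sketched for a single table as follows. This uses Python's sqlite3 module as a stand-in for MySQL; the helper name `closest_temperature`, the index name, and the sample rows are hypothetical, and ties are resolved in favor of the earlier reading as an arbitrary choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temperature_1 (time TEXT, temperature REAL)")
conn.execute("CREATE INDEX idx_t1_time ON temperature_1 (time)")
conn.executemany(
    "INSERT INTO temperature_1 VALUES (?, ?)",
    [
        ("2012-01-01 10:00:00", 20.0),
        ("2012-01-01 10:10:00", 21.0),
        ("2012-01-01 10:30:00", 23.0),
    ],
)


def closest_temperature(conn, when):
    """Return the temperature nearest in time to `when`, or None if the table is empty."""
    row = conn.execute(
        """
        SELECT temperature FROM (
            -- candidate 1: last reading at or before the target time
            SELECT * FROM (SELECT time, temperature FROM temperature_1
                           WHERE time <= :t ORDER BY time DESC LIMIT 1)
            UNION ALL
            -- candidate 2: first reading after the target time
            SELECT * FROM (SELECT time, temperature FROM temperature_1
                           WHERE time > :t ORDER BY time LIMIT 1)
        )
        -- only these two candidates need a time difference computed
        ORDER BY abs(strftime('%s', time) - strftime('%s', :t)), time
        LIMIT 1
        """,
        {"t": when},
    ).fetchone()
    return row[0] if row else None


print(closest_temperature(conn, "2012-01-01 10:04:00"))
print(closest_temperature(conn, "2012-01-01 10:06:00"))
```

Each candidate lookup is a range condition with ORDER BY and LIMIT 1 on the indexed column, so the expensive distance calculation runs on at most two rows instead of the whole table.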
The reason that this is slow is that it requires 3 table scans to calculate and order the differences.
I assume that you already have indexes on the time and zone columns. At the moment they won't help because of the table scan problem.
There are a number of options to avoid this depending on what you need and what the data collection rates are.
You have already said that the data is collected periodically but not concurrently. This suggests a few options.
To what level of significance do you need the temperature data: the day, the hour, the minute, etc.? Store the time information to that level of significance only (or have another column that does) and run your queries on that.
If you know that the 3 closest times will be within a certain time frame (hour, day, etc.), put in a where clause to limit the calculation to those times that are potential candidates. You are effectively constructing histogram-type buckets; you will need a calendar table to do this efficiently.
Make the comparison unidirectional i.e. limit consideration to only those times after the time you are looking for, so if you are looking for 12:00:00 then 13:45:32 is a candidate but 11:59:59 isn't.
I understand what you are trying to accomplish. Ask yourself why, and whether a simpler solution would meet your needs.
My suggestion is that you don't take the closest time, but you take the first time on or before a given time. The reason for this is simple: generally the data for a given time is what is known at that time. Incorporating future information is generally not a good idea for most purposes.
With this change, you can modify your query to take advantage of an index on time. The problem with your current query is that wrapping the column in a function, as in abs(timediff(time, tt.time)), precludes the use of the index.
So, if you want the most recent temperature, use this instead for each variable:
(select temperature
from temperature_1 t2
where t2.time <= tt.time
order by t2.time desc
limit 1
) as temperature_1,
Actually, you can also construct it like this:
(select time
from temperature_1 t2
where t2.time <= tt.time
order by t2.time desc
limit 1
) as time_1,
And then join the information for the temperature back in. This will be efficient, with the use of an index.
With that in mind, you could actually have two variables time_1_before and time_1_after, for the best time on or before and the best time on or after. You can use logic in the select to choose the nearest value. The joins back to the temperature should be efficient using an index.
But, I will reiterate, I think the last temperature on or before may be the best choice.
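A minimal sketch of the "last reading on or before" approach, run in SQLite through Python's sqlite3 module as a stand-in for MySQL. Only two of the three tables are shown to keep it short; the index names and sample rows are invented, and the derived table `tt` plays the role of the temptimes view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE temperature_1 (time TEXT, temperature REAL);
    CREATE TABLE temperature_2 (time TEXT, temperature REAL);
    CREATE INDEX idx_t1 ON temperature_1 (time);
    CREATE INDEX idx_t2 ON temperature_2 (time);
    INSERT INTO temperature_1 VALUES ('10:00:00', 20.0), ('10:10:00', 21.0);
    INSERT INTO temperature_2 VALUES ('10:05:00', 30.0), ('10:12:00', 31.0);
    """
)

rows = conn.execute(
    """
    SELECT tt.time,
           (SELECT temperature FROM temperature_1 t2
            WHERE t2.time <= tt.time
            ORDER BY t2.time DESC LIMIT 1) AS temperature_1,
           (SELECT temperature FROM temperature_2 t2
            WHERE t2.time <= tt.time
            ORDER BY t2.time DESC LIMIT 1) AS temperature_2
    FROM (SELECT time FROM temperature_1
          UNION
          SELECT time FROM temperature_2) tt
    ORDER BY tt.time
    """
).fetchall()

for row in rows:
    print(row)
```

The first row has no temperature_2 value because no reading from that table exists at or before 10:00:00, which is the expected behavior when future information is deliberately excluded.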