Use MySQL to calculate difference between two entries in the same table

I have a series of meter readings stored in a table, each identified by a building ID, a meter ID and the time at which it was recorded.
For each entry I would like to find the entry that has the same ID numbers and the closest previous time. I would then like to use that previous time and previous reading to calculate the length of the time step and the difference between the two readings.
So, I currently have:
BuildingID | MeterID | Date_and_Time | Reading
and I would like to produce:
BuildingID | MeterID | Date_and_Time | Time_Since_Previous_Read | Accumulation_Since_Previous_Read
Two typical entries might look like this:
1 | 1 | 2010-10-09 17:56:20 | 119.6
1 | 1 | 2010-10-09 18:01:08 | 157.4
And I would like to produce:
1 | 1 | 2010-10-09 18:01:08 | 00:04:48 | 37.8
If no previous entry exists (i.e. for the first reading), I would like to return zeros for the time elapsed and the accumulation.
I would very much appreciate any help on this. I made a concerted effort to find the answer in previous posts but to no avail; feel free to direct me to a good source if this has already been solved elsewhere.
Thank you

Maybe like this?
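SELECT a.BuildingID, a.MeterID, a.Date_and_Time,
TIMEDIFF(a.Date_and_Time, MAX(b.Date_and_Time)) AS `Time_Since_Previous_Read`,
-- assumes readings never decrease, so the latest earlier reading is also the largest
a.Reading - MAX(b.Reading) AS `Accumulation`,
MAX(b.Date_and_Time) AS `otherdateandtime`
FROM `TABLENAME` AS `a`, `TABLENAME` AS `b`
WHERE a.BuildingID = b.BuildingID AND a.MeterID = b.MeterID
AND a.Date_and_Time > b.Date_and_Time
-- the first reading of each meter has no earlier row, so it is not returned
GROUP BY a.BuildingID, a.MeterID, a.Date_and_Time, a.Reading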
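On MySQL 8.0 or later, the window function LAG() can fetch the previous reading of the same building and meter directly, so no self-join or GROUP BY is needed, and COALESCE can give the first reading its zeros. This is a different technique from the answer above, sketched through Python's sqlite3 module so it runs without a MySQL server (it assumes SQLite 3.25 or newer, which has LAG). The table name meter_readings is hypothetical, and because SQLite has no TIMEDIFF the gap is computed in seconds with strftime('%s', ...); on MySQL you would use TIMEDIFF or TIMESTAMPDIFF instead.

```python
import sqlite3

# Hypothetical table name meter_readings; the rows are the question's two samples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meter_readings (BuildingID INTEGER, MeterID INTEGER,"
             " Date_and_Time TEXT, Reading REAL)")
conn.executemany("INSERT INTO meter_readings VALUES (?, ?, ?, ?)", [
    (1, 1, "2010-10-09 17:56:20", 119.6),
    (1, 1, "2010-10-09 18:01:08", 157.4),
])

# LAG(x) OVER (PARTITION BY building, meter ORDER BY time) is x from the
# previous reading of the same meter, or NULL for that meter's first reading;
# COALESCE turns those NULLs into the zeros the question asks for.
# SQLite has no TIMEDIFF, so the gap is computed in seconds via strftime('%s').
QUERY = """
SELECT BuildingID, MeterID, Date_and_Time,
       COALESCE(strftime('%s', Date_and_Time) - strftime('%s', LAG(Date_and_Time)
                OVER (PARTITION BY BuildingID, MeterID ORDER BY Date_and_Time)), 0)
           AS Seconds_Since_Previous_Read,
       COALESCE(ROUND(Reading - LAG(Reading)
                OVER (PARTITION BY BuildingID, MeterID ORDER BY Date_and_Time), 1), 0)
           AS Accumulation_Since_Previous_Read
FROM meter_readings
ORDER BY BuildingID, MeterID, Date_and_Time
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The second row should show 288 seconds (the question's 00:04:48) and an accumulation of 37.8; ROUND only hides floating-point noise from the subtraction.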

Try this:
/* 1*/SELECT
/* 2*/ r_B.BuildingID,
/* 3*/ r_B.MeterID,
/* 4*/ r_B.Date_and_Time,
/* 5*/ COALESCE(TIMEDIFF(r_B.Date_and_Time, r_A.Date_and_Time), '00:00:00') AS Time_Since_Previous_Read,
/* 6*/ COALESCE(r_B.Reading-r_A.Reading, 0) AS Accumulation_Since_Previous_Read
/* 7*/ FROM meterdata r_B
/* 8*/ LEFT OUTER JOIN meterdata r_A
/* 9*/ ON r_B.BuildingID = r_A.BuildingID AND r_B.MeterID = r_A.MeterID AND r_B.Date_and_Time > r_A.Date_and_Time
/*10*/ WHERE NOT EXISTS (SELECT nonelater.Date_and_Time FROM meterdata nonelater WHERE nonelater.BuildingID = r_B.BuildingID AND nonelater.MeterID = r_B.MeterID AND nonelater.Date_and_Time > r_A.Date_and_Time AND nonelater.Date_and_Time < r_B.Date_and_Time)
/*11*/ORDER BY r_B.BuildingID, r_B.MeterID, r_B.Date_and_Time
Here's how the design works:
Lines 7 and 8: the core of the query is a self-JOIN on the meterdata table. You need that to be able to find the difference between values in one row and values in another row. r_B is the later one, r_A the earlier one.
Line 8: Making it a LEFT OUTER JOIN means that it works even if there isn't an earlier r_A row; that part of the join will return NULLs.
Line 9: this constrains the JOIN to only join rows for the same building and meter, and makes sure the two rows are the right way around.
Line 10: If you didn't change the join any more, you'd have each r_B row joining to every single past row. To make sure that r_B matches only the most recent past row, this line checks that there isn't another row that is more recent than r_A but still earlier than r_B.
Line 6: this calculates the difference between the readings; if there isn't an earlier r_A row, this calculation will return NULL, so you need the COALESCE function to change that to zero.
Line 5 does the same thing for the time interval. MySQL's DATEDIFF only returns a difference in whole days, so this uses TIMEDIFF, which returns the gap as a TIME value such as 00:04:48. If you would rather have a number, TIMESTAMPDIFF(SECOND, r_A.Date_and_Time, r_B.Date_and_Time) returns the gap in seconds. Again, if there isn't an r_A row, COALESCE changes the NULL to zero.
TIMEDIFF's HH:MM:SS output already matches the format in the question. Bear in mind that a TIME value tops out at 838:59:59, so for gaps longer than roughly 35 days, use TIMESTAMPDIFF and format the result yourself.
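The LEFT OUTER JOIN plus NOT EXISTS pattern can be checked without a MySQL server by running it through Python's sqlite3 module. This is a sketch under assumptions: the table is named meterdata as in the answer, the data is the question's two sample readings plus one hypothetical third reading, and since SQLite has no TIMEDIFF the interval is computed in seconds with strftime('%s', ...). ROUND(..., 1) only hides floating-point noise in the subtraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meterdata (BuildingID INTEGER, MeterID INTEGER,"
             " Date_and_Time TEXT, Reading REAL)")
conn.executemany("INSERT INTO meterdata VALUES (?, ?, ?, ?)", [
    (1, 1, "2010-10-09 17:56:20", 119.6),   # sample rows from the question
    (1, 1, "2010-10-09 18:01:08", 157.4),
    (1, 1, "2010-10-09 18:10:00", 160.0),   # hypothetical extra reading
])

QUERY = """
SELECT r_B.BuildingID, r_B.MeterID, r_B.Date_and_Time,
       -- NULL when there is no earlier row; COALESCE turns it into 0
       COALESCE(strftime('%s', r_B.Date_and_Time)
                - strftime('%s', r_A.Date_and_Time), 0) AS Seconds_Since_Previous_Read,
       COALESCE(ROUND(r_B.Reading - r_A.Reading, 1), 0) AS Accumulation_Since_Previous_Read
FROM meterdata r_B
LEFT OUTER JOIN meterdata r_A
  ON r_B.BuildingID = r_A.BuildingID AND r_B.MeterID = r_A.MeterID
 AND r_B.Date_and_Time > r_A.Date_and_Time
-- keep only the r_A row with no other reading between it and r_B
WHERE NOT EXISTS (SELECT 1 FROM meterdata nonelater
                  WHERE nonelater.BuildingID = r_B.BuildingID
                    AND nonelater.MeterID = r_B.MeterID
                    AND nonelater.Date_and_Time > r_A.Date_and_Time
                    AND nonelater.Date_and_Time < r_B.Date_and_Time)
ORDER BY r_B.BuildingID, r_B.MeterID, r_B.Date_and_Time
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With the extra reading, the NOT EXISTS test is what discards the pairing of the 18:10:00 row with the 17:56:20 row, leaving only its immediate predecessor (a gap of 532 seconds).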

Related

How to Find First Valid Row in SQL Based on Difference of Column Values

I am trying to find a reliable query which returns the first instance of an acceptable insert range.
Research:
Some of the links below address similar questions, but I could not get any of them to work for me.
Find first available date, given a date range in SQL
Find closest date in SQL Server
MySQL difference between two rows of a SELECT Statement
How to find a gap in range in SQL
and more...
Objective Query Function:
InsertRange(1) = (StartRange(i) - EndRange(i-1)) > NewValue
Where InsertRange(1) is the value the query should return. In other words, this would be the first instance where the above condition is satisfied.
Table Structure:
Primary Key: StartRange
StartRange(i-1) < StartRange(i)
EndRange(i-1) < StartRange(i)
Example Dataset
Below is an example User table (3 columns) with a set range distribution. StartRanges are always in strictly ascending order, UserIDs are arbitrary strings, and only the sequences of StartRange and EndRange matter:
StartRange EndRange UserID
312 6896 user0
7134 16268 user1
16877 22451 user2
23137 25142 user3
25955 28272 user4
28313 35172 user5
35593 38007 user6
38319 38495 user7
38565 45200 user8
46136 48007 user9
My current Query
I am trying to use this query at the moment:
SELECT t2.StartRange, t2.EndRange
FROM user AS t1, user AS t2
WHERE (t1.StartRange - t2.StartRange+1) > NewValue
ORDER BY t1.EndRange
LIMIT 1
Example Case
Given the table, if NewValue = 800, then the returned answer should be 23137. This means, the first available slot would be between user3 and user4 (with an actual slot size = 813):
InsertRange(1) = (StartRange(i) - EndRange(i-1)) > NewValue
InsertRange = (StartRange(4) - EndRange(3)) > NewValue, counting rows from 0 (user0 is row 0)
25955 - 25142 = 813 > 800, so the answer is StartRange(3) = 23137 (user3)
More Comments
My query above seemed to work for the special case where StartRanges were tightly packed (i.e. StartRange(i) = StartRange(i-1) + EndRange(i-1) + 1). It no longer works with a less tightly packed set of StartRanges.
Keep in mind that SQL tables have no implicit row order. It seems fair to order your table by StartRange value, though.
We can start to solve this by writing a query that pairs each row with the row preceding it. In MySQL before 8.0, it's hard to do this elegantly because there is no row-numbering function.
This works (http://sqlfiddle.com/#!9/4437c0/7/0). It may have nasty performance because it generates O(n^2) intermediate rows. There's no row for user0; it can't be paired with any preceding row because there is none.
select MAX(a.StartRange) SA, MAX(a.EndRange) EA,
b.StartRange SB, b.EndRange EB , b.UserID
from user a
join user b ON a.EndRange <= b.StartRange
group by b.StartRange, b.EndRange, b.UserID
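To see what this pairing query returns, it can be run against the question's example dataset through Python's sqlite3 module; it uses only a plain join and GROUP BY, so SQLite should behave like MySQL here. The table name user is double-quoted as a precaution, and an ORDER BY is added only to make the output order predictable. As the answer says, user0 gets no row because nothing precedes it.

```python
import sqlite3

# Example dataset from the question: (StartRange, EndRange, UserID)
DATA = [
    (312, 6896, "user0"), (7134, 16268, "user1"), (16877, 22451, "user2"),
    (23137, 25142, "user3"), (25955, 28272, "user4"), (28313, 35172, "user5"),
    (35593, 38007, "user6"), (38319, 38495, "user7"), (38565, 45200, "user8"),
    (46136, 48007, "user9"),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "user" (StartRange INTEGER PRIMARY KEY,'
             ' EndRange INTEGER, UserID TEXT)')
conn.executemany('INSERT INTO "user" VALUES (?, ?, ?)', DATA)

# For every row b, MAX over all rows a that end before b starts picks the
# immediately preceding range, because the ranges are ascending and disjoint.
PAIRS = """
SELECT MAX(a.StartRange) SA, MAX(a.EndRange) EA,
       b.StartRange SB, b.EndRange EB, b.UserID
FROM "user" a
JOIN "user" b ON a.EndRange <= b.StartRange
GROUP BY b.StartRange, b.EndRange, b.UserID
ORDER BY SB
"""
pairs = conn.execute(PAIRS).fetchall()
gaps = {user: sb - ea for sa, ea, sb, eb, user in pairs}
print(gaps)
```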
Then you can use that as a subquery and apply your conditions: a gap of at least 800 (WHERE SB-EA >= 800), the first matching row, i.e. the lowest StartRange value (ORDER BY SB), and only one row (LIMIT 1).
Here's the query (http://sqlfiddle.com/#!9/4437c0/11/0)
SELECT SB-EA Gap,
EA+1 Beginning_of_gap, SB-1 Ending_of_gap,
UserId UserID_after_gap
FROM (
select MAX(a.StartRange) SA, MAX(a.EndRange) EA,
b.StartRange SB, b.EndRange EB , b.UserID
from user a
join user b ON a.EndRange <= b.StartRange
group by b.StartRange, b.EndRange, b.UserID
) pairs
WHERE SB-EA >= 800
ORDER BY SB
LIMIT 1
Notice that you may actually want the smallest matching gap instead of the first matching gap. That's called best fit, rather than first fit. To get that you use ORDER BY SB-EA instead.
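First fit and best fit differ only in the ORDER BY. The sketch below wraps the answer's query in a hypothetical Python helper, find_gap, run through sqlite3 against the question's example data, with the minimum gap passed as a bound parameter. The SB tie-breaker in the best-fit ordering is an addition of this sketch, so that equal gaps resolve to the earliest one.

```python
import sqlite3

DATA = [
    (312, 6896, "user0"), (7134, 16268, "user1"), (16877, 22451, "user2"),
    (23137, 25142, "user3"), (25955, 28272, "user4"), (28313, 35172, "user5"),
    (35593, 38007, "user6"), (38319, 38495, "user7"), (38565, 45200, "user8"),
    (46136, 48007, "user9"),
]
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "user" (StartRange INTEGER PRIMARY KEY,'
             ' EndRange INTEGER, UserID TEXT)')
conn.executemany('INSERT INTO "user" VALUES (?, ?, ?)', DATA)

# The answer's query with the minimum gap as a parameter; {order} is filled in
# from a fixed choice below, never from user input.
GAP_QUERY = """
SELECT SB - EA AS Gap, EA + 1 AS Beginning_of_gap, SB - 1 AS Ending_of_gap,
       UserID AS UserID_after_gap
FROM (SELECT MAX(a.StartRange) SA, MAX(a.EndRange) EA,
             b.StartRange SB, b.EndRange EB, b.UserID
      FROM "user" a
      JOIN "user" b ON a.EndRange <= b.StartRange
      GROUP BY b.StartRange, b.EndRange, b.UserID) pairs
WHERE SB - EA >= ?
ORDER BY {order}
LIMIT 1
"""

def find_gap(min_gap, best_fit=False):
    """Return (Gap, Beginning_of_gap, Ending_of_gap, UserID_after_gap) or None."""
    order = "SB - EA, SB" if best_fit else "SB"   # best fit: smallest gap first
    return conn.execute(GAP_QUERY.format(order=order), (min_gap,)).fetchone()

print(find_gap(800))                 # first fit
print(find_gap(400, best_fit=True))  # best fit
```

With a minimum of 800 both orderings pick the 813-wide gap before user4, matching the question; with 400, first fit returns the 609-wide gap before user2 while best fit returns the 421-wide gap before user6.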
Edit: There is another way to use MySQL to join adjacent rows that doesn't have the O(n^2) performance issue. It involves employing user variables to simulate a row_number() function. The query involved is a hairball (that's a technical term). It's described in the third alternative of the answer to the question "How do I pair rows together in MYSQL?"

Setting column equal to lagged value of another column in the same table

I have a column where the date is recorded and I want to set another column to the lagged version of the date column. In other words, for every date I want the new column to have the previous date.
I tried a lot of things, mostly stupid, and got nowhere. My main issue was that I was updating a column based on WHERE clauses that referenced the same table and the same column, and MySQL doesn't allow that.
An example of the data follows below. My goal is to update column PREVDATE with the DATA_DATE of the previous row, on the condition that GVKEY is the same for both rows. I define the previous row as follows: order by GVKEY and DATA_DATE ASC, and for every row (given that GVKEY is the same) I want the one before it.
+--------------+--------+---------+-------+----------+-------------+
| DATA_DATE |PREVDATE| PRICE | GVKEY | CUR_DEBT | LT_DEBT |
+--------------+--------+---------+-------+----------+-------------+
| 1965-05-31 | NULL | -17.625 | 1004 | 0.198 | 1.63 |
| 1970-05-31 | NULL | -18.375 | 1004 | 2.298 | 1.58 |
+--------------+--------+---------+-------+----------+-------------+
Here's one approach that makes use of MySQL user-defined variables and relies on behavior that is not guaranteed, but which I have observed to be consistent (at least in MySQL 5.1, 5.5 and 5.6).
WARNING: this returns every row in the table. You may want to consider doing this for a limited range of gvkey values, for testing. Add a WHERE clause...
SELECT IF(r.gvkey=@prev_gvkey,@prev_ddate,NULL) AS prev_date
, @prev_gvkey := r.gvkey AS gvkey
, @prev_ddate := r.data_date AS data_date
FROM (SELECT @prev_ddate := NULL, @prev_gvkey := NULL) i
CROSS
JOIN mytable r
ORDER BY r.gvkey, r.data_date
The order of the expressions in the SELECT list is important: we need to compare the value of the current row to the value "saved" from the previous row before we save the current values in the @prev_ variables for the next row.
We need a conditional test to make sure we're still working on the same gvkey. The first data_date for a gvkey isn't going to have a "previous" data_date, so we need to return a NULL.
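The row-by-row logic that the user variables implement can be sketched in plain Python: walk the rows in (gvkey, data_date) order, compare against the values saved from the previous row, and only then overwrite them. This is an illustration of what the query computes, not a substitute for it; the function name prev_dates and the gvkey 1005 row are hypothetical.

```python
from datetime import date

def prev_dates(rows):
    """rows: iterable of (gvkey, data_date). Returns (prev_date, gvkey, data_date)
    tuples in (gvkey, data_date) order, mirroring the SELECT list above."""
    prev_gvkey = prev_ddate = None          # like the initialised prev_ variables
    out = []
    for gvkey, data_date in sorted(rows):   # ORDER BY r.gvkey, r.data_date
        # compare with the saved values *before* overwriting them
        prev_date = prev_ddate if gvkey == prev_gvkey else None
        prev_gvkey, prev_ddate = gvkey, data_date
        out.append((prev_date, gvkey, data_date))
    return out

rows = [
    (1004, date(1970, 5, 31)),   # rows from the question, deliberately unsorted
    (1004, date(1965, 5, 31)),
    (1005, date(1968, 1, 31)),   # hypothetical second gvkey
]
for r in prev_dates(rows):
    print(r)
```

Swapping the two statements inside the loop would make every row report its own date, which is exactly why the order of expressions in the SELECT list matters.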
For best performance, we'll want to have a covering index, with gvkey and data_date as the leading columns:
... ON mytable (gvkey,data_date)
The index can include additional columns after those, but we need those two columns first, in that order. That will allow MySQL to return the rows "in order" using the index and avoid an expensive "Using filesort" operation. (The Extra column of the EXPLAIN output will show "Using index".)
Once we get that working correctly, we can use that as an inline view in an UPDATE statement.
For example:
UPDATE mytable t
JOIN (
SELECT IF(r.gvkey=@prev_gvkey,@prev_ddate,NULL) AS prev_date
, @prev_gvkey := r.gvkey AS gvkey
, @prev_ddate := r.data_date AS data_date
FROM (SELECT @prev_ddate := NULL, @prev_gvkey := NULL) i
CROSS
JOIN mytable r
ORDER BY r.gvkey, r.data_date
) s
ON t.gvkey = s.gvkey
AND t.data_date = s.data_date
SET t.prev_date = s.prev_date
(Again, for a very large table, we probably want to break that transaction up into smaller chunks, by including a predicate on gvkey in the inline view, to limit the number of rows returned/updated.)
Doing this in batches of gvkey ranges is a reasonable approach, e.g.:
/* first batch */ WHERE r.gvkey >= 1 AND r.gvkey < 100
/* second run */ WHERE r.gvkey >= 100 AND r.gvkey < 200
/* third batch */ WHERE r.gvkey >= 200 AND r.gvkey < 300
Obviously, there are other approaches/SQL patterns to accomplish an equivalent result. I've had success with this approach.
To emphasize an earlier IMPORTANT note: this relies on behavior that is not guaranteed, and which the MySQL Reference Manual warns against (using user-defined variables like this.)
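On MySQL 8.0 or later, the window function LAG() returns the previous data_date within each gvkey without depending on the evaluation order of user variables. The sketch below runs that SELECT through Python's sqlite3 module (SQLite 3.25+ supports the same LAG syntax) against the answer's hypothetical mytable with a prev_date column, and writes the values back from Python with executemany instead of MySQL's multi-table UPDATE. On MySQL 8.0 the same SELECT should also be usable as the derived table in an UPDATE ... JOIN like the one above, but verify that on your server version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table following the answer's names (mytable, prev_date).
conn.execute("CREATE TABLE mytable (gvkey INTEGER, data_date TEXT, prev_date TEXT)")
conn.executemany("INSERT INTO mytable (gvkey, data_date) VALUES (?, ?)", [
    (1004, "1965-05-31"), (1004, "1970-05-31"),   # rows from the question
    (1005, "1968-01-31"),                          # hypothetical second gvkey
])

# LAG looks one row back within the same gvkey, ordered by date;
# the first date of each gvkey gets NULL.
LAGGED = """
SELECT gvkey, data_date,
       LAG(data_date) OVER (PARTITION BY gvkey ORDER BY data_date) AS prev_date
FROM mytable
"""
updates = [(prev, gvkey, ddate) for gvkey, ddate, prev in conn.execute(LAGGED)]
conn.executemany(
    "UPDATE mytable SET prev_date = ? WHERE gvkey = ? AND data_date = ?", updates)
conn.commit()

result = conn.execute(
    "SELECT gvkey, data_date, prev_date FROM mytable ORDER BY gvkey, data_date"
).fetchall()
print(result)
```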

MySQL find minimum and maximum date associated with a record in another table

I am trying to write a query to find the number of miles on a bicycle fork. This number is calculated by taking the distance_reading associated with the date the fork was installed (the minimum reading_date on or after the Bicycle_Fork.start_date of the Bicycle_Fork record) and subtracting it from the distance_reading associated with the date the fork was removed (the maximum reading_date on or before Bicycle_Fork.end_date or, if that is NULL, the reading closest to today's date). I've managed to restrict the odometer_readings to the appropriate ones, but I cannot figure out how to find the minimum and maximum date for each odometer that represents when the fork was installed. It was easy when I only had to look at records matching the start_date or end_date, but the user is not required to enter a new odometer reading for each date on which a part is changed. I've been working on this query for several hours now, and I can't find a way to use MIN() that doesn't just take the single smallest date out of all the results.
Question: How can I find the minimum reading_date and the maximum reading_date associated with each odometer_id while maintaining the restrictions created by my WHERE clause?
If this is not possible, I plan to store the values retrieved from the first query in an array in PHP and deal with it from there, but I would like to be able to find a solution solely in MySQL.
Here is an SQL fiddle with the database schema and the current state of the query: http://sqlfiddle.com/#!2/015642/1
SELECT OdometerReadings.distance_reading, OdometerReadings.reading_date,
OdometerReadings.odometer_id, Bicycle_Fork.fork_id
FROM Bicycle_Fork
INNER JOIN (Bicycles, Odometers, OdometerReadings)
ON (Bicycles.bicycle_id = Bicycle_Fork.bicycle_id
AND Odometers.bicycle_id = Bicycles.bicycle_id AND OdometerReadings.odometer_id = Odometers.odometer_id)
WHERE (OdometerReadings.reading_date >= Bicycle_Fork.start_date) AND
((Bicycle_Fork.end_date IS NOT NULL AND OdometerReadings.reading_date<= Bicycle_Fork.end_date) XOR (Bicycle_Fork.end_date IS NULL AND OdometerReadings.reading_date <= CURRENT_DATE()))
This is the old query that didn't take into account the possibility of the database lacking a record that corresponded with the start_date or end_date:
SELECT MaxReadingOdo.distance_reading, MinReadingOdo.distance_reading
FROM
(SELECT OdometerReadings.distance_reading, OdometerReadings.reading_date,
OdometerReadings.odometer_id
FROM Bicycle_Fork
LEFT JOIN (Bicycles, Odometers, OdometerReadings)
ON (Bicycles.bicycle_id = Bicycle_Fork.bicycle_id
AND Odometers.bicycle_id = Bicycles.bicycle_id AND OdometerReadings.odometer_id = Odometers.odometer_id)
WHERE Bicycle_Fork.start_date = OdometerReadings.reading_date) AS MinReadingOdo
INNER JOIN
(SELECT OdometerReadings.distance_reading, OdometerReadings.reading_date,
OdometerReadings.odometer_id
FROM Bicycle_Fork
LEFT JOIN (Bicycles, Odometers, OdometerReadings)
ON (Bicycles.bicycle_id = Bicycle_Fork.bicycle_id AND Odometers.bicycle_id
= Bicycles.bicycle_id AND OdometerReadings.odometer_id = Odometers.odometer_id)
WHERE Bicycle_Fork.end_date = OdometerReadings.reading_date) AS
MaxReadingOdo
ON MinReadingOdo.odometer_id = MaxReadingOdo.odometer_id
I'm trying to get the following to return from the SQL schema:
I will eventually sum these into one number, but I've been working with them separately to make it easier to check the values.
min_distance_reading | max_distance_reading | odometer_id
=============================================================
75.5 | 2580.5 | 1
510.5 | 4078.5 | 2
17.5 | 78.5 | 3
I don't understand the final part of the puzzle, but this seems close...
SELECT MIN(ro.distance_reading) min_val
, MAX(ro.distance_reading) max_val
, ro.odometer_id
FROM OdometerReadings ro
JOIN odometers o
ON o.odometer_id = ro.odometer_id
JOIN Bicycle_Fork bf
ON bf.bicycle_id = o.bicycle_id
AND bf.start_date <= ro.reading_date
GROUP
BY ro.odometer_id;
http://sqlfiddle.com/#!2/015642/8
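The query above applies only the start_date bound. Below is a sketch that also applies the end bound described in the question (on or before end_date, or today's date when end_date is NULL) and groups per fork and odometer, run through Python's sqlite3 module. The schema is cut down to the columns the query touches, and the dates and readings are made up for illustration since the fiddle's data is not reproduced here. It also assumes odometer readings never decrease, so that MIN and MAX of distance_reading are the readings at the earliest and latest qualifying dates. COALESCE(bf.end_date, CURRENT_DATE) is valid MySQL as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, hypothetical schema and data: only the columns the query needs.
conn.executescript("""
CREATE TABLE Bicycle_Fork (fork_id INTEGER, bicycle_id INTEGER,
                           start_date TEXT, end_date TEXT);
CREATE TABLE Odometers (odometer_id INTEGER, bicycle_id INTEGER);
CREATE TABLE OdometerReadings (odometer_id INTEGER, reading_date TEXT,
                               distance_reading REAL);
INSERT INTO Bicycle_Fork VALUES (1, 1, '2013-03-01', '2013-06-30'),
                                (2, 2, '2013-09-01', NULL);   -- still fitted
INSERT INTO Odometers VALUES (1, 1), (2, 2);
INSERT INTO OdometerReadings VALUES
  (1, '2013-01-15', 10.0),      -- before fork 1 was installed: excluded
  (1, '2013-03-05', 75.5),
  (1, '2013-06-28', 2580.5),
  (1, '2013-08-01', 3000.0),    -- after fork 1 was removed: excluded
  (2, '2013-09-02', 17.5),
  (2, '2013-10-01', 78.5),
  (2, '2999-01-01', 99999.0);   -- future-dated: excluded by CURRENT_DATE
""")

# Both bounds sit in the join condition; a NULL end_date falls back to today.
QUERY = """
SELECT bf.fork_id, ro.odometer_id,
       MIN(ro.distance_reading) AS min_distance_reading,
       MAX(ro.distance_reading) AS max_distance_reading
FROM OdometerReadings ro
JOIN Odometers o     ON o.odometer_id = ro.odometer_id
JOIN Bicycle_Fork bf ON bf.bicycle_id = o.bicycle_id
                    AND ro.reading_date >= bf.start_date
                    AND ro.reading_date <= COALESCE(bf.end_date, CURRENT_DATE)
GROUP BY bf.fork_id, ro.odometer_id
ORDER BY bf.fork_id
"""
rows = conn.execute(QUERY).fetchall()
for fork_id, odometer_id, lo, hi in rows:
    print(fork_id, odometer_id, lo, hi, "miles on fork:", hi - lo)
```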

SELECT to get two entries, from same table, differentiated by date, in one row

I have a table in which I keep different meters (water meter, electricity meter), and in another table I keep the readings for each meter.
The table structure is like this:
The meter table
MeterID | MeterType | MeterName
The readings Table:
ReadingID | MeterID | Index | DateOfReading
The meters are read monthly. What I am trying to do now is get the meter information, the current reading and the previous reading in a single row. So if I had a query, the following row would result:
MeterID | MeterType | MeterName | CurrentIndex | LastIndex
I have the following query so far:
SELECT Meter.MeterID, Meter.MeterType, Meter.MeterName, CurrentReading.Index, PreviousReading.Index
FROM Meters AS Meter
LEFT OUTER JOIN Readings AS CurrentReading ON Meter.MeterID = CurrentReading.MeterID
LEFT OUTER JOIN Readings AS PreviousReading ON Meter.MeterID = PreviousReading.MeterID
WHERE CurrentReading.ReadingID != PreviousReading.ReadingID AND TIMESTAMPDIFF(MONTH, CurrentReading.DateOfReading, PreviousReading.DateOfReading)=-1
The problem is that I may not have the current reading or the previous one, or both, but I still need the meter information to be retrieved. It is perfectly acceptable for me to get NULL columns, but I still need a row :)
Use:
SELECT m.meterid,
m.metertype,
m.metername,
current.index,
previous.index
FROM METER m
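LEFT JOIN READING current ON current.meterid = m.meterid
AND EXTRACT(YEAR_MONTH FROM current.dateofreading) = EXTRACT(YEAR_MONTH FROM NOW())
LEFT JOIN READING previous ON previous.meterid = m.meterid
AND EXTRACT(YEAR_MONTH FROM previous.dateofreading) = EXTRACT(YEAR_MONTH FROM NOW() - INTERVAL 1 MONTH)
Because these are OUTER JOINs, the month filters belong in the ON clause: there, a meter with no matching reading still returns a row with NULL columns, whereas the same filter in the WHERE clause would discard that meter's row. Comparing YEAR_MONTH values rather than MONTH() alone also keeps the filter correct across year boundaries, e.g. in January.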
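The difference between filtering in ON and in WHERE can be seen with a small sketch run through Python's sqlite3 module. Assumptions: hypothetical Meters and Readings tables and data, a fixed reference month instead of NOW() so the result is repeatable, and SQLite's strftime('%Y-%m', ...) standing in for MySQL's month extraction. The Index column is quoted because INDEX is a keyword in both MySQL and SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Meters (MeterID INTEGER PRIMARY KEY, MeterType TEXT, MeterName TEXT);
CREATE TABLE Readings (ReadingID INTEGER PRIMARY KEY, MeterID INTEGER,
                       "Index" REAL, DateOfReading TEXT);
-- hypothetical data
INSERT INTO Meters VALUES (1, 'water', 'Kitchen'), (2, 'electricity', 'Main'),
                          (3, 'water', 'Garden');
INSERT INTO Readings VALUES
  (1, 1, 100.0, '2024-01-10'),
  (2, 1, 112.5, '2024-02-10'),
  (3, 2, 5000.0, '2024-02-03');   -- meter 2: current month only; meter 3: none
""")

CURRENT_MONTH, PREVIOUS_MONTH = "2024-02", "2024-01"  # stand-ins for NOW()

# Month tests in the ON clauses: a meter without a matching reading keeps its
# row and gets NULL for that index.
QUERY = """
SELECT m.MeterID, m.MeterType, m.MeterName,
       cur."Index" AS CurrentIndex, prev."Index" AS LastIndex
FROM Meters m
LEFT JOIN Readings cur  ON cur.MeterID = m.MeterID
                       AND strftime('%Y-%m', cur.DateOfReading) = ?
LEFT JOIN Readings prev ON prev.MeterID = m.MeterID
                       AND strftime('%Y-%m', prev.DateOfReading) = ?
ORDER BY m.MeterID
"""
rows = conn.execute(QUERY, (CURRENT_MONTH, PREVIOUS_MONTH)).fetchall()
print(rows)

# Same filter moved to WHERE: the NULL from the outer join fails the test.
WHERE_VARIANT = """
SELECT m.MeterID FROM Meters m
LEFT JOIN Readings cur ON cur.MeterID = m.MeterID
WHERE strftime('%Y-%m', cur.DateOfReading) = ?
ORDER BY m.MeterID
"""
where_rows = conn.execute(WHERE_VARIANT, (CURRENT_MONTH,)).fetchall()
print(where_rows)
```

Meter 3 has no readings but still appears with NULL indexes in the first result; in the WHERE variant it disappears.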
You could use a subquery to grab the value from a month ago:
select *
, (
select `Index`
from Readings r2
where r2.MeterID = m.MeterID
and TIMESTAMPDIFF(MONTH, r1.DateOfReading,
r2.DateOfReading) = -1
) as LastIndex
from Meter m
left join
Readings r1
on r1.MeterID = m.MeterID
Another solution is to allow the second left join to fail. You can do that by just changing your where clause to:
WHERE PreviousReading.ReadingID is null
or
(
CurrentReading.ReadingID != PreviousReading.ReadingID
and
TIMESTAMPDIFF(MONTH, CurrentReading.DateOfReading,
PreviousReading.DateOfReading) = -1
)
Well, SQL philosophy is to store what you know. If you don't know it, then there isn't any row for it. If you filter the record set you search for and find nothing, then there is no reading for that month. Or I didn't understand the question.

MySQL schedule conflicts

Hey, I stumbled upon this site looking for solutions for event overlaps in MySQL tables. I was SO impressed with the solution (which is helping already) that I thought I'd see if I could get some more help...
Okay, so Joe wants to swap shifts with someone at work. He has a court date. He goes to the shift swap form and it pulls up this week's schedule (or what's left of it). This is done with a DB query. No sweat. He picks a shift. From this point, it gets prickly.
So, first, the form passes the shift start and shift end to the script. It runs a query for anyone who has a shift that overlaps this shift. They can't work two shifts at once, so all user IDs from this query are put on a black list. This query looks like:
SELECT DISTINCT user_id FROM shifts
WHERE
FROM_UNIXTIME('$swap_shift_start') < shiftend
AND FROM_UNIXTIME('$swap_shift_end') > shiftstart
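The condition in this query is the standard interval-overlap test: two time ranges overlap exactly when each starts before the other ends. A minimal Python sketch of the same predicate (the function name overlaps and the dates are hypothetical) shows why it needs no BETWEEN and no subtracted minute: a shift that starts at the exact moment another ends is not treated as overlapping.

```python
from datetime import datetime

def overlaps(a_start, a_end, b_start, b_end):
    """Same test as `swap_start < shiftend AND swap_end > shiftstart`:
    two intervals overlap iff each starts strictly before the other ends."""
    return a_start < b_end and a_end > b_start

swap = (datetime(2011, 3, 7, 9), datetime(2011, 3, 7, 17))
print(overlaps(*swap, datetime(2011, 3, 7, 13), datetime(2011, 3, 7, 21)))  # True: partial overlap
print(overlaps(*swap, datetime(2011, 3, 7, 17), datetime(2011, 3, 8, 1)))   # False: back-to-back
```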
Next, we run a query for all shifts that are a) the same length (company policy), and b) don't overlap with any other shifts Joe is working.
What I currently have is something like this:
SELECT *
FROM shifts
WHERE shiftstart BETWEEN FROM_UNIXTIME('$startday') AND FROM_UNIXTIME('$endday')
AND user_id NOT IN ($busy_users)
AND (TIME_TO_SEC(TIMEDIFF(shiftend,shiftstart)) = '$swap_shift_length')
$conflict_dates
ORDER BY shiftstart, lastname
Now, you are probably wondering "what is $conflict_dates???"
Well, when Joe submits the swap shift, it reloads his shifts for the week in case he decides to check out another shift's potential. So when it does that first query, while the script is looping through and outputting his choices, it is also building a string that looks kind of like:
AND NOT(
'joe_shift1_start' < shiftend
AND 'joe_shift1_end' > shiftstart)
AND NOT(
'joe_shift2_start' < shiftend
AND 'joe_shift2_end' > shiftstart)
...etc
So the database ends up getting a pretty long query along the lines of:
SELECT *
FROM shifts
WHERE shiftstart BETWEEN FROM_UNIXTIME('$startday') AND FROM_UNIXTIME('$endday')
AND user_id NOT IN ('blacklisteduser1', 'blacklisteduser2',...etc)
AND (TIME_TO_SEC(TIMEDIFF(shiftend,shiftstart)) = '$swap_shift_length')
AND NOT(
'joe_shift1_start' < shiftend
AND 'joe_shift1_end' > shiftstart)
AND NOT(
'joe_shift2_start' < shiftend
AND 'joe_shift2_end' > shiftstart)
AND NOT(
'joe_shift3_start' < shiftend
AND 'joe_shift3_end' > shiftstart)
AND NOT(
'joe_shift4_start' < shiftend
AND 'joe_shift4_end' > shiftstart)
...etc
ORDER BY shiftstart, lastname
So, my hope is that either SQL has some genius way of dealing with this more simply, or that someone can point out a fantastic logical principle that accounts for the potential conflicts in a much smarter way. (Notice the use of 'start < shiftend AND end > shiftstart'; before I found that, I was using BETWEEN and had to subtract a minute from both ends.)
Thanks!
A
I think you should be able to exclude Joe's other shifts using an inner select instead of the generated string, something like:
SELECT *
FROM shifts s1
WHERE shiftstart BETWEEN FROM_UNIXTIME('$startday') AND FROM_UNIXTIME('$endday')
AND user_id NOT IN ($busy_users)
AND (TIME_TO_SEC(TIMEDIFF(shiftend,shiftstart)) = '$swap_shift_length')
AND (SELECT COUNT(1) FROM shifts s2
WHERE s2.user_id = $joes_user_id
AND s1.shiftstart < s2.shiftend
AND s2.shiftstart < s1.shiftend) = 0
ORDER BY shiftstart, lastname
Basically, each row has an inner query for the count of Joe's shifts which overlap, and makes sure that it's zero. Thus, only rows which don't overlap with any of Joe's existing shifts will be returned.
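The same idea can be tried through Python's sqlite3 module. This sketch assumes a cut-down, hypothetical shifts table (shift_id, user_id, shiftstart, shiftend), Joe as user_id 1, and ISO-8601 text timestamps, which compare correctly as strings. It uses NOT EXISTS, which states "no overlapping shift of Joe's" like the COUNT(1) ... = 0 test but lets the database stop at the first match, and it passes Joe's ID as a bound parameter instead of interpolating it into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shifts (shift_id INTEGER PRIMARY KEY, user_id INTEGER,"
             " shiftstart TEXT, shiftend TEXT)")
conn.executemany("INSERT INTO shifts VALUES (?, ?, ?, ?)", [
    (1, 1, "2011-03-07 09:00", "2011-03-07 17:00"),  # Joe (hypothetical)
    (2, 1, "2011-03-09 09:00", "2011-03-09 17:00"),  # Joe
    (3, 2, "2011-03-07 13:00", "2011-03-07 21:00"),  # clashes with shift 1
    (4, 3, "2011-03-08 09:00", "2011-03-08 17:00"),  # a day Joe is free
    (5, 4, "2011-03-09 17:00", "2011-03-10 01:00"),  # starts as shift 2 ends
])

# Keep other people's shifts for which none of Joe's shifts overlaps.
QUERY = """
SELECT s1.shift_id
FROM shifts s1
WHERE s1.user_id <> :joe
  AND NOT EXISTS (SELECT 1 FROM shifts s2
                  WHERE s2.user_id = :joe
                    AND s1.shiftstart < s2.shiftend
                    AND s2.shiftstart < s1.shiftend)
ORDER BY s1.shiftstart
"""
candidates = [row[0] for row in conn.execute(QUERY, {"joe": 1})]
print(candidates)
```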
You could load the joe_shift{1,2,3} values into a TEMPORARY table and then do a query to join against it, using an outer join to find only the shifts that don't match any:
CREATE TEMPORARY TABLE joes_shifts (
shiftstart DATETIME,
shiftend DATETIME
);
INSERT INTO joes_shifts (shiftstart, shiftend) VALUES
('$joe_shift1_start', '$joe_shift1_end'),
('$joe_shift2_start', '$joe_shift2_end'),
('$joe_shift3_start', '$joe_shift3_end'),
('$joe_shift4_start', '$joe_shift4_end');
-- make sure you have validated these variables to prevent SQL injection
SELECT s.*
FROM shifts s
LEFT OUTER JOIN joes_shifts j
ON (j.shiftstart < s.shiftend AND j.shiftend > s.shiftstart)
WHERE j.shiftstart IS NULL
AND s.shiftstart BETWEEN FROM_UNIXTIME('$startday') AND FROM_UNIXTIME('$endday')
AND s.user_id NOT IN ('blacklisteduser1', 'blacklisteduser2',...etc)
AND (TIME_TO_SEC(TIMEDIFF(s.shiftend,s.shiftstart)) = '$swap_shift_length');
Because of the LEFT OUTER JOIN, when there is no matching row in joes_shifts, the columns are NULL.
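The temporary-table anti-join can be checked the same way. This sketch, again through Python's sqlite3 with hypothetical shifts and with Joe's shifts loaded via bound parameters, builds the join from the overlap test joined with AND; free_shifts is a hypothetical helper. Swapping in OR shows why the connective matters: almost any pair of shifts satisfies one of the two comparisons, so every shift finds a match and the anti-join returns nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shifts (shift_id INTEGER PRIMARY KEY, user_id INTEGER,"
             " shiftstart TEXT, shiftend TEXT)")
conn.executemany("INSERT INTO shifts VALUES (?, ?, ?, ?)", [
    (3, 2, "2011-03-07 13:00", "2011-03-07 21:00"),
    (4, 3, "2011-03-08 09:00", "2011-03-08 17:00"),
    (5, 4, "2011-03-09 17:00", "2011-03-10 01:00"),
])
conn.execute("CREATE TEMPORARY TABLE joes_shifts (shiftstart TEXT, shiftend TEXT)")
conn.executemany("INSERT INTO joes_shifts VALUES (?, ?)", [
    ("2011-03-07 09:00", "2011-03-07 17:00"),
    ("2011-03-09 09:00", "2011-03-09 17:00"),
])

def free_shifts(join_condition):
    """Anti-join: keep shifts for which no row of joes_shifts matches.
    join_condition is a fixed string from this script, not user input."""
    sql = ("SELECT s.shift_id FROM shifts s LEFT OUTER JOIN joes_shifts j "
           f"ON ({join_condition}) WHERE j.shiftstart IS NULL ORDER BY s.shift_id")
    return [row[0] for row in conn.execute(sql)]

overlap = "j.shiftstart < s.shiftend AND j.shiftend > s.shiftstart"
print(free_shifts(overlap))                           # [4, 5]
print(free_shifts(overlap.replace(" AND ", " OR ")))  # []
```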