We have a statistics database of which we would like to group some results. Every entry has a timestamp 'tstarted'.
We would like to group by every quarter of the day. For each quarter, we would like to know the day count where we have > 0 results (for that quarter).
We could resolve this by using a subquery:
select quarter, sum(q), count(quarter), sum(q) / count(quarter) as average
from (
select SEC_TO_TIME((TIME_TO_SEC(tstarted) DIV 900) * 900) as quarter, sum(qdelivered) as q
from statistics
where stat_field = 1
group by SEC_TO_TIME((TIME_TO_SEC(tstarted) DIV 900) * 900), date(tstarted)
order by SEC_TO_TIME((TIME_TO_SEC(tstarted) DIV 900) * 900) asc
) as sub
group by quarter
My question: is there a more efficient way to retrieve this result (e.g. join or other way)?
Efficiency could be improved by eliminating the inline view (derived table aliased as sub), and doing all the work in a single query. (This is because of the way that MySQL processes the inline view, creating and populating a temporary MyISAM table.)
At first I didn't understand why the expression date(tstarted) needed to be included in the GROUP BY clause, but I do now see its effect in the inline view query: it yields one row per quarter per day, so COUNT(quarter) in the outer query counts the days that have rows for that quarter.
I think this query returns the same result as the original:
SELECT SEC_TO_TIME((TIME_TO_SEC(s.tstarted) DIV 900) * 900) AS `quarter`
, SUM(s.qdelivered) AS `q`
, COUNT(DISTINCT DATE(s.tstarted)) AS `day_count`
, SUM(s.qdelivered) / COUNT(DISTINCT DATE(s.tstarted)) AS `average`
FROM statistics s
WHERE s.stat_field = 1
GROUP BY SEC_TO_TIME((TIME_TO_SEC(s.tstarted) DIV 900) * 900)
This should be more efficient since it avoids materializing an intermediate derived table.
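To make the single-pass logic concrete outside SQL, here is a minimal Python sketch of what the query above computes. The function name quarter_stats and the sample rows are made up for illustration; the bucketing expression mirrors (TIME_TO_SEC(tstarted) DIV 900) * 900, and a set of dates per quarter plays the role of COUNT(DISTINCT DATE(tstarted)).

```python
from collections import defaultdict
from datetime import datetime


def quarter_stats(rows):
    """rows: iterable of (tstarted, qdelivered) pairs (hypothetical input shape).

    Returns {quarter_start_in_seconds: (total, day_count, average)}.
    """
    totals = defaultdict(int)
    days = defaultdict(set)
    for tstarted, qdelivered in rows:
        seconds = tstarted.hour * 3600 + tstarted.minute * 60 + tstarted.second
        quarter = (seconds // 900) * 900      # start of the 15-minute bucket
        totals[quarter] += qdelivered         # SUM(qdelivered)
        days[quarter].add(tstarted.date())    # COUNT(DISTINCT DATE(tstarted))
    return {
        q: (totals[q], len(days[q]), totals[q] / len(days[q]))
        for q in totals
    }


# Illustrative data only: two days, two different quarters.
rows = [
    (datetime(2014, 6, 30, 0, 5), 4),
    (datetime(2014, 6, 30, 0, 10), 6),
    (datetime(2014, 7, 1, 0, 1), 2),
    (datetime(2014, 7, 1, 0, 20), 8),
]
stats = quarter_stats(rows)
```

Each row is visited once and no intermediate per-day result is built, which is the same reason the single query avoids the derived table.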
Your question said you wanted a "day count"; that sounds like you want a count of each day that had a row within a particular quarter hour.
To get that, you could just add an aggregate expression to the SELECT list,
, COUNT(DISTINCT DATE(s.tstarted)) AS `day_count`
I would be tempted to set up a table of quarters in the day. Use this table and LEFT JOIN your statistics table to it.
CREATE TABLE quarters
(
id INT,
start_qtr INT,
end_qtr INT
);
INSERT INTO quarters (id, start_qtr, end_qtr) VALUES
(1,0,899),
(2,900,1799),
(3,1800,2699),
(4,2700,3599),
(5,3600,4499),
(6,4500,5399),
(7,5400,6299),
(8,6300,7199),
etc;
Your query can then be:-
SELECT SEC_TO_TIME(quarters.start_qtr) AS quarter,
sum(statistics.qdelivered),
count(statistics.qdelivered),
sum(statistics.qdelivered) / count(statistics.qdelivered) as average
FROM quarters
LEFT OUTER JOIN statistics
ON TIME_TO_SEC(statistics.tstarted) BETWEEN quarters.start_qtr AND quarters.end_qtr
AND statistics.stat_field = 1
AND DATE(statistics.tstarted) = '2014-06-30'
GROUP BY quarter
ORDER BY quarter;
Advantage of this is that it will give you entries with a count of 0 (and an average of NULL) for quarters where there are no statistics, and it saves some of the calculations.
You could save more calculations by adding time columns to the quarters table:-
CREATE TABLE quarters
(
id INT,
start_qtr INT,
end_qtr INT,
start_qtr_time TIME,
end_qtr_time TIME
);
INSERT INTO quarters (id, start_qtr, end_qtr, start_qtr_time, end_qtr_time) VALUES
(1,0,899, '00:00:00', '00:14:59'),
(2,900,1799, '00:15:00', '00:29:59'),
(3,1800,2699, '00:30:00', '00:44:59'),
(4,2700,3599, '00:45:00', '00:59:59'),
(5,3600,4499, '01:00:00', '01:14:59'),
(6,4500,5399, '01:15:00', '01:29:59'),
(7,5400,6299, '01:30:00', '01:44:59'),
(8,6300,7199, '01:45:00', '01:59:59'),
etc
This saves the SEC_TO_TIME() conversion in the SELECT list, although the JOIN still applies TIME() to tstarted:-
SELECT start_qtr_time AS quarter,
sum(statistics.qdelivered),
count(statistics.qdelivered),
sum(statistics.qdelivered) / count(statistics.qdelivered) as average
FROM quarters
LEFT OUTER JOIN statistics
ON TIME(statistics.tstarted) BETWEEN quarters.start_qtr_time AND quarters.end_qtr_time
AND statistics.stat_field = 1
AND DATE(statistics.tstarted) = '2014-06-30'
GROUP BY quarter
ORDER BY quarter;
These both assume you are interested in a particular day.
I have the following two tables:
movie_sales (provided daily)
movie_id
date
revenue
movie_rank (provided every few days or weeks)
movie_id
date
rank
The tricky thing is that every day I have data for sales, but only data for ranks once every few days. Here is an example of sample data:
`movie_sales`
- titanic (ID), 2014-06-01 (date), 4.99 (revenue)
- titanic (ID), 2014-06-02 (date), 5.99 (revenue)
`movie_rank`
- titanic (ID), 2014-05-14 (date), 905 (rank)
- titanic (ID), 2014-07-01 (date), 927 (rank)
And, because the movie_rank.date of 2014-05-14 is closer to the two sales dates, the output should be:
id date revenue closest_rank
titanic 2014-06-01 4.99 905
titanic 2014-06-02 5.99 905
The following query works to get the results by getting the min date difference in the sub-select:
SELECT
id,
date,
revenue,
(SELECT rank from movie_rank where id=s.id ORDER BY ABS(DATEDIFF(date, s.date)) ASC LIMIT 1)
FROM
movie_sales s
But I'm afraid that this would have terrible performance as it will literally be doing millions of subselects...on millions of rows. What would be a better way to do this, or is there really no proper way to do this since an index can not be properly done with a DATEDIFF ?
Unfortunately, you are right. The movie rank table must be searched for each movie sale and, of all matching movie rows, the closest must be picked.
With an index on movie_rank(id) the DBMS finds the movie rows quickly, but an index on movie_rank(id, date) would be better, because the date could be read from the index and only the one best match would be read from the table.
But you also say that there are new ranks every few days. If it is guaranteed to find a rank in a certain range, e.g. for each date there will be at least one rank in the twenty days before and at least one rank in the twenty days after, you can limit the search accordingly. (The index on movie_rank(id, date) would be essential for this, though.)
SELECT
id,
date,
revenue,
(
select r.rank
from movie_rank r
where r.id = s.id
and r.date between s.date - interval 20 day
and s.date + interval 20 day
order by abs(datediff(r.date, s.date)) asc
limit 1
)
FROM movie_sales s;
This is difficult to get quick with SQL. In a programming language I would choose this algorithm:
1. Sort the two tables by date and point to the first rows.
2. Move the rank pointer forward until we match the sales date or are beyond it. (If we aren't there already.)
3. Compare the sales date with the rank date we are pointing at and with the rank date of the previous row. Take the closer one.
4. Move the sales pointer one row forward.
5. Go to 2.
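The steps above can be sketched in Python as follows. This is only an illustration for a single movie: the function name closest_ranks and the tuple layout are assumptions, both lists must already be sorted by date, and at least one rank row is assumed to exist. On a tie the earlier rank is taken, which is a choice of this sketch rather than something the question specifies.

```python
from datetime import date


def closest_ranks(sales, ranks):
    """sales: [(date, revenue)], ranks: [(date, rank)], both sorted by date.

    Returns [(sale_date, revenue, closest_rank)] for one movie.
    Assumes ranks is not empty.
    """
    result = []
    i = 0  # rank pointer; it only ever moves forward
    for sale_date, revenue in sales:
        # Step 2: advance until the rank date reaches or passes the sale date
        # (or we run out of rank rows).
        while i < len(ranks) - 1 and ranks[i][0] < sale_date:
            i += 1
        best = ranks[i]
        # Step 3: compare with the previous rank row and take the closer one.
        if i > 0:
            prev = ranks[i - 1]
            if abs((sale_date - prev[0]).days) <= abs((best[0] - sale_date).days):
                best = prev
        result.append((sale_date, revenue, best[1]))
    return result


# Sample data from the question.
sales = [(date(2014, 6, 1), 4.99), (date(2014, 6, 2), 5.99)]
ranks = [(date(2014, 5, 14), 905), (date(2014, 7, 1), 927)]
matched = closest_ranks(sales, ranks)
```

Because neither pointer ever moves backwards, the work is linear in the number of sales rows plus rank rows once both lists are sorted.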
With this algorithm we would already be in about the position we want to be. Let's see if we can do the same with SQL. Iterations are done with recursive queries in SQL. These are available in MySQL as of version 8.0.
We start with sorting the rows, i.e. giving them numbers. Then we iterate through both data sets.
with recursive
sales as
(
select *, row_number() over (partition by movie_id order by date) as rn
from movie_sales
),
ranks as
(
select *, row_number() over (partition by movie_id order by date) as rn
from movie_rank
),
cte (movie_id, revenue, srn, rrn, sdate, rdate, rrank, closest_rank) as
(
select
movie_id, s.revenue, s.rn, r.rn, s.date, r.date, r.ranking,
case when s.date <= r.date then r.ranking end
from (select * from sales where rn = 1) s
join (select * from ranks where rn = 1) r using (movie_id)
union all
select
cte.movie_id,
cte.revenue,
coalesce(s.rn, cte.srn),
coalesce(r.rn, cte.rrn),
coalesce(s.date, cte.sdate),
coalesce(r.date, cte.rdate),
coalesce(r.ranking, cte.rrank),
case when coalesce(r.date, cte.rdate) >= coalesce(s.date, cte.sdate) then
case when abs(datediff(coalesce(r.date, cte.rdate), coalesce(s.date, cte.sdate))) <
abs(datediff(cte.rdate, coalesce(s.date, cte.sdate)))
then coalesce(r.ranking, cte.rrank)
else cte.rrank
end
end
from cte
left join sales s on s.movie_id = cte.movie_id and s.rn = cte.srn + 1 and cte.closest_rank is not null
left join ranks r on r.movie_id = cte.movie_id and r.rn = cte.rrn + 1 and cte.rdate < cte.sdate
where s.movie_id is not null or r.movie_id is not null
-- where cte.closest_rank is null
)
select
movie_id,
sdate,
revenue,
closest_rank
from cte
where closest_rank is not null;
(BTW: I named the column ranking, because rank is a reserved word in SQL.)
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=e994cb56798efabc8f7249fd8320e1cf
This is probably still slow. The reason for this is: there are no pointers to a row in SQL. If we want to go from row #1 to row #2, we must search that row, while in a programming language we would really just move the pointer one step forward. If the tables had an ID, we could build a chain (next_row_id) instead of using row numbers. That could speed this process up. But well, I guess you already notice: this is not an algorithm made for SQL.
Another approach... Avoid the problem by cleansing the data.
Make sure the rank is available for every day. When a new date comes in, find the previous rank, then fill in all the rows for the intervening days.
(This will take some initial effort to 'fix' all the previous missing dates. After that, it is a small effort when a new list of ranks comes in.)
The "report" would be a simple JOIN on the date. You would probably need a 2-column INDEX(movie_id, date) or something like that.
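As a rough illustration of the fill-in step, the following Python sketch carries each known rank forward until the next known rank date. The function name fill_daily_ranks and the last_day parameter are hypothetical; in practice this would be an INSERT job run whenever a new list of ranks arrives.

```python
from datetime import date, timedelta


def fill_daily_ranks(ranks, last_day):
    """ranks: [(date, rank)] sorted by date, for one movie.

    Returns {day: rank} for every day from the first rank date to last_day,
    repeating the previous known rank on days without one.
    """
    one_day = timedelta(days=1)
    filled = {}
    for idx, (start, rank) in enumerate(ranks):
        # The current rank is valid until the day before the next known rank.
        if idx + 1 < len(ranks):
            end = ranks[idx + 1][0]
        else:
            end = last_day + one_day
        day = start
        while day < end:
            filled[day] = rank
            day += one_day
    return filled


# Illustrative data only.
daily = fill_daily_ranks(
    [(date(2014, 5, 14), 905), (date(2014, 5, 17), 927)],
    last_day=date(2014, 5, 18),
)
```

With one row per movie per day, the report can join sales to ranks on (movie_id, date) directly.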
Ultimate solution would be not to calculate all the ranks every time, but store them (in a new column, or even in a new table if you don't want to change existing tables).
Each time you update you could look for sales data without rank and calculate only for those.
With the above approach you always get the last available rank BEFORE the sales date (e.g. if you have data 14 days before and 1 day after, the one before would still be used).
If you strictly need to use ranking closest in time, then you need to run UPDATE also for newly arrived ranking info. I believe it would still be more efficient in the long run.
I have a number of stores where I would like to sum the energy consumption so far this year compared with the same period last year. My challenge is that in the current year the stores have different date intervals in terms of delivered data. That means that store A may have data between 01.01.2018 and 20.01.2018, and store B may have data between 01.01.2018 and 28.01.2018. I would like to sum the same date intervals current year versus previous year.
Data looks like this
Store Date Sum
A 01.01.2018 12
A 20.01.2018 11
B 01.01.2018 33
B 28.01.2018 32
But there are millions of rows, and I would like to use these dates as references to get the same sums for the previous year.
This is my (erroneous) try:
SET @curryear = (SELECT YEAR(MAX(start_date)) FROM energy_data);
SET @maxdate_curryear = (SELECT MAX(start_date) FROM energy_data WHERE YEAR(start_date) = @curryear);
SET @mindate_curryear = (SELECT MIN(start_date) FROM energy_data WHERE YEAR(start_date) = @curryear);
-- the same date intervals last year
SET @maxdate_prevyear = (@maxdate_curryear - INTERVAL 1 YEAR);
SET @mindate_prevyear = (@mindate_curryear - INTERVAL 1 YEAR);
-- sums current year
CREATE TABLE t_sum_curr AS
SELECT name AS name_curr, SUM(kwh) AS sum_curr, MIN(start_date) AS min_date_curr,
MAX(start_date) AS max_date_curr, COUNT(DISTINCT start_date) AS ant_timer
FROM energy_data
WHERE agg_type = 'timesnivå'
AND start_date >= @mindate_curryear AND start_date <= @maxdate_curryear
GROUP BY name;
-- also seems fair, the same dates one year ago, figured I should find those first and in the next query use that to sum each stores between those date intervals
CREATE TABLE t_sum_prev AS
SELECT name_curr as name_curr2, (min_date_curr - INTERVAL 1 YEAR) AS
min_date_prev, (max_date_curr - INTERVAL 1 YEAR) as max_date_prev FROM
t_sum_curr;
-- getting into trouble!
CREATE TABLE the_results AS
SELECT name, start_date, SUM(kwh) AS sum_prev
FROM energy_data
WHERE agg_type = 'timesnivå'
AND start_date >= @mindate_prevyear AND start_date <= @maxdate_prevyear
GROUP BY name
HAVING start_date BETWEEN (SELECT min_date_prev FROM t_sum_prev)
AND (SELECT max_date_prev FROM t_sum_prev);
This last query just tells me that my sub query returns more than 1 row and throws an error message.
I assume what you have is a list of energy consumption figures, where bills or readings have been taken at irregular times, so the consumption covers irregular periods.
The basic approach you need to take is to regularise the consumption periods: establish which days each period covers, then break each reading down into as many days as it covers, with the consumption for each day being a daily average of the period.
I'm assuming the consumption periods are entirely sequential (as a bill or reading normally would be), and not overlapping.
Because of the volume of rows involved (you say millions even in its current form), you might not want to leave the data in daily form - it might suffice to regroup them into regular weekly, monthly, or quarterly periods, depending on what level of granularity you require for comparison.
Once you have your regular periods, comparison will be as easy as cake.
If this is part of a report that will be run on an ongoing basis, you'd probably want to implement some logic that calculates a "regularised consumption" incrementally and on a scheduled basis and stores it in a summary table, with appropriate columns and indexes, so that you aren't having to process many millions of historical rows each time the report is run.
Trying to work around the irregular periods (if indeed it can be done) with fancy joins and on-the-fly averages, rather than tackling them head on, will likely lead to very difficult logic, and particularly on a data set of this size, dire performance.
EDIT: from the comments below.
@Alexander, I've knocked together an example of a query. I haven't tested it and I've written it all in a text editor, so excuse any small syntax errors. What I've come up with seems a bit complex (more complex than I imagined when I began), but I'm also a little bit tired, so I'm not sure whether it could be simplified further.
The only point I would make is that the performance of this query (or any such query), because of the nature of what it has to do in traversing date ranges, is likely to be appalling on a table with millions of rows. I stand by my earlier remarks, that proper indexing of the source data will be crucial, and summarising the source data into a larger granularity will massively aid performance (at the expense of a one-off hit to summarise it). Even daily granularity will reduce the number of rows by a factor of 24!
WITH energy_data_ext AS
(
SELECT
ed.name AS store_name
,YEAR(ed.start_date) AS reading_year
,ed.start_date AS reading_date
,ed.kwh AS reading_kwh
FROM
energy_data AS ed
)
,available_stores AS
(
SELECT ede.store_name
FROM energy_data_ext AS ede
GROUP BY ede.store_name
)
,current_reading_yr_per_store AS
(
SELECT
ede.store_name
,MAX(ede.reading_year) AS current_reading_year
FROM
energy_data_ext AS ede
GROUP BY
ede.store_name
)
,latest_reading_ranges_per_year AS
(
SELECT
ede.store_name
,ede.reading_year
,MAX(ede.reading_date) AS latest_reading_date_of_yr
FROM
energy_data_ext AS ede
GROUP BY
ede.store_name
,ede.reading_year
)
,store_reading_ranges AS
(
SELECT
avs.store_name
,lryps.current_reading_year
,lyrr.latest_reading_date_of_yr AS current_year_latest_reading_date
,(lryps.current_reading_year - 1) AS prev_reading_year
,(lyrr.latest_reading_date_of_yr - INTERVAL 1 YEAR) AS prev_year_latest_reading_date
FROM
available_stores AS avs
LEFT JOIN
current_reading_yr_per_store AS lryps
ON (lryps.store_name = avs.store_name)
LEFT JOIN
latest_reading_ranges_per_year AS lyrr
ON (lyrr.store_name = avs.store_name)
AND (lyrr.reading_year = lryps.current_reading_year)
)
-- at this stage, we should have all the calculations we need to
-- establish the range for the latest year, and the range for the year prior to that
,current_year_consumption AS
(
SELECT
avs.store_name
,SUM(cyed.reading_kwh) AS latest_year_kwh
FROM
available_stores AS avs
LEFT JOIN
store_reading_ranges AS srs
ON (srs.store_name = avs.store_name)
LEFT JOIN
energy_data_ext AS cyed
ON (cyed.store_name = avs.store_name)
AND (cyed.reading_year = srs.current_reading_year)
AND (cyed.reading_date <= srs.current_year_latest_reading_date)
GROUP BY
avs.store_name
)
,prev_year_consumption AS
(
SELECT
avs.store_name
,SUM(pyed.reading_kwh) AS prev_year_kwh
FROM
available_stores AS avs
LEFT JOIN
store_reading_ranges AS srs
ON (srs.store_name = avs.store_name)
LEFT JOIN
energy_data_ext AS pyed
ON (pyed.store_name = avs.store_name)
AND (pyed.reading_year = srs.prev_reading_year)
AND (pyed.reading_date <= srs.prev_year_latest_reading_date)
GROUP BY
avs.store_name
)
SELECT
avs.store_name
,srs.current_reading_year
,srs.current_year_latest_reading_date
,lyc.latest_year_kwh
,srs.prev_reading_year
,srs.prev_year_latest_reading_date
,pyc.prev_year_kwh
FROM
available_stores AS avs
LEFT JOIN
store_reading_ranges AS srs
ON (srs.store_name = avs.store_name)
LEFT JOIN
current_year_consumption AS lyc
ON (lyc.store_name = avs.store_name)
LEFT JOIN
prev_year_consumption AS pyc
ON (pyc.store_name = avs.store_name)
I have 2 tables, one with hostels (effectively a single-room hotel with lots of beds), and the other with bookings.
Hostel table: unique ID, total_spaces
Bookings table: start_date, end_date, num_guests, hostel_ID
I need a (My)SQL query to generate a list of all hostels that have at least num_guests free spaces between start_date and end_date.
Logical breakdown of what I'm trying to achieve:
For each hostel:
Get all bookings that overlap start_date and end_date
For each day between start_date and end_date, sum the total bookings for that day (taking into account num_guests for each booking) and compare with total_spaces, ensuring that there are at least num_guests spaces free on that day (if there aren't on any day then that hostel can be discounted from the results list)
Any suggestions on a query that would do this please? (I can modify the tables if necessary)
I built an example for you here, with more comments, which you can test out:
http://sqlfiddle.com/#!9/10219/9
What's probably tricky for you is to join ranges of overlapping dates. The way I would approach this problem is with a DATES table. It's kind of like a tally table, but for dates. If you join to the DATES table, you basically break down all the booking ranges into bookings for individual dates, and then you can filter and sum them all back up to the particular date range you care about. Helpful code for populating a DATES table can be found here: Get a list of dates between two dates and that's what I used in my example.
Other than that, the query basically follows the logical steps you've already outlined.
OK, if you are using MySQL 8.0.2 or above, you can use window functions. In that case you can use the solution below. This solution does not need to compute the number of guests for each day in the query interval; it only looks at days when there is some change in the number of hostel guests. Therefore, no helper table with dates is needed.
with query as
(
select * from bookings where end_date > '2017-01-02' and start_date < '2017-01-05'
)
select hostel.*, bookingsSum.intervalMax
from hostel
join
(
select tmax.id, max(tmax.intervalCount) intervalMax
from
(
select hostel.id, t.dat, sum(coalesce(sum(t.gn),0)) over (partition by t.id order by t.dat) intervalCount
from hostel
left join
(
select id, start_date dat, guest_num as gn from query
union all
select id, end_date dat, -1 * guest_num as gn from query
) t on hostel.id = t.id
group by hostel.id, t.dat
) tmax
group by tmax.id
) bookingsSum on hostel.id = bookingsSum.id and hostel.total_spaces >= bookingsSum.intervalMax + <num_of_people_you_want_accomodate>
It uses a simple trick, where each start_date adds +guest_num to the overall number of guests and each end_date adds -guest_num to the overall number of guests. We then do the necessary summations in order to find the peak number of guests (intervalMax) in the query interval.
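The same +guests / -guests idea can be sketched in Python for a single hostel. The names peak_guests and has_room and the sample bookings are illustrative only, and end_date is treated as the checkout day, which is what the overlap filter in the query implies. As in the query, the peak is taken over all bookings that overlap the interval, so it can be reached slightly outside the interval itself.

```python
from datetime import date


def peak_guests(bookings, query_start, query_end):
    """bookings: [(start_date, end_date, num_guests)] for one hostel."""
    per_day = {}
    for start, end, guests in bookings:
        # Same overlap filter as the CTE: end_date > start and start_date < end.
        if end > query_start and start < query_end:
            per_day[start] = per_day.get(start, 0) + guests   # guests arrive
            per_day[end] = per_day.get(end, 0) - guests       # guests leave
    running = 0
    peak = 0
    for day in sorted(per_day):       # running sum ordered by date
        running += per_day[day]
        peak = max(peak, running)
    return peak


def has_room(total_spaces, bookings, query_start, query_end, wanted):
    """True if the hostel can take `wanted` more guests at its peak."""
    return total_spaces >= peak_guests(bookings, query_start, query_end) + wanted


# Illustrative data only.
bookings = [
    (date(2017, 1, 1), date(2017, 1, 3), 4),
    (date(2017, 1, 2), date(2017, 1, 4), 3),
    (date(2017, 1, 10), date(2017, 1, 12), 9),   # outside the query interval
]
peak = peak_guests(bookings, date(2017, 1, 2), date(2017, 1, 5))
```

Only dates on which the occupancy changes are visited, which is why no helper table of dates is needed.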
If you change '2017-01-05' in my query to '2017-01-06', only two hostels are in the result, and if you use '2017-01-07', just hostel id 3 is in the result, since it does not have any bookings yet.
I currently have an employee logging sql table that has 3 columns
fromState: String,
toState: String,
timestamp: DateTime
fromState is either In or Out. In means employee came in and Out means employee went out. Each row can only transition from In to Out or Out to In.
I'd like to generate a temporary table in SQL to keep track, hour by hour, of how many employees are in the company. That is, the resulting table has columns HourBucket, NumEmployees.
In non-SQL code I can do this by initializing the numEmployees as 0 and go through the table row by row (sorted by timestamp) and add (employee came in) or subtract (went out) to numEmployees (bucketed by timestamp hour).
I'm clueless as how to do this in SQL. Any clues?
Use a COUNT ... GROUP BY query. Can't see what you're using toState from your description though! Also, assuming you have an employeeID field.
E.g.
SELECT fromState AS 'Status', COUNT(*) AS 'Number'
FROM StaffinBuildingTable
INNER JOIN (SELECT employeeID AS 'empID', MAX(timestamp) AS 'latest' FROM StaffinBuildingTable GROUP BY employeeID) AS LastEntry ON StaffinBuildingTable.employeeID = LastEntry.empID AND StaffinBuildingTable.timestamp = LastEntry.latest
GROUP BY fromState
The LastEntry subquery will produce a list of employeeIDs limited to the last timestamp for each employee.
The INNER JOIN will limit the main table to just the employeeIDs that match both sides.
The outer GROUP BY produces the count.
SELECT HOUR(SBT.timestamp) AS 'Hour', SBT.fromState AS 'Status', COUNT(*) AS 'Number'
FROM StaffinBuildingTable AS SBT
INNER JOIN (
SELECT SBIJ.employeeID AS 'empID', MAX(timestamp) AS 'latest'
FROM StaffinBuildingTable AS SBIJ
WHERE DATE(SBIJ.timestamp) = CURDATE()
GROUP BY SBIJ.employeeID) AS LastEntry ON SBT.employeeID = LastEntry.empID AND SBT.timestamp = LastEntry.latest
GROUP BY SBT.fromState, HOUR(SBT.timestamp)
Replace CURDATE() with whatever date you are interested in.
Note this is non-optimal as it calculates the HOUR twice - once for the data and once for the group.
Again you are using the INNER JOIN to limit the number of returned rows, this time to the last timestamp on a given day.
To me your description of FromState and ToState seems the wrong way round; I'd expect to do this based on ToState. But assuming I'm wrong on that, the following should point you in the right direction:
First, I create a "Numbers" table containing 24 rows one for each hour of the day:
create table tblHours
(Number int);
insert into tblHours values
(0),(1),(2),(3),(4),(5),(6),(7),
(8),(9),(10),(11),(12),(13),(14),(15),
(16),(17),(18),(19),(20),(21),(22),(23);
Then for each date in your employee logging table, I create a row in another new table to contain your counts:
create table tblDailyHours
(
HourBucket datetime,
NumEmployees int
);
insert into tblDailyHours (HourBucket, NumEmployees)
select distinct
date_add(date(t.timeStamp), interval h.Number HOUR) as HourBucket,
0 as NumEmployees
from
tblEmployeeLogging t
CROSS JOIN tblHours h;
Then I update this table to contain all the relevant counts:
update tblDailyHours h
join
(select
h2.HourBucket,
sum(case when el.fromState = 'In' then 1 else -1 end) as cnt
from
tblDailyHours h2
join tblEmployeeLogging el on
h2.HourBucket >= el.timeStamp
group by h2.HourBucket
) cnt ON
h.HourBucket = cnt.HourBucket
set NumEmployees = cnt.cnt;
You can now retrieve the counts with
select *
from tblDailyHours
order by HourBucket;
The counts give the number on site at each of the times displayed, if you want during the hour in question, we'd need to tweak this a little.
There is a working version of this code (using not very realistic data in the logging table) here: rextester.com/DYOR23344
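For comparison, the row-by-row approach described in the question can be sketched in Python like this. The function name hourly_headcount and the sample log are made up; as in the UPDATE above, a row with fromState = 'In' is counted as +1 and anything else as -1, which is an assumption carried over from that query, and each bucket holds the head count at the start of the hour.

```python
from datetime import datetime, timedelta


def hourly_headcount(log, first_hour, last_hour):
    """log: [(timestamp, from_state)] sorted by timestamp.

    Returns [(hour_bucket, num_employees)] for every hour from first_hour
    to last_hour inclusive.
    """
    result = []
    count = 0
    i = 0
    bucket = first_hour
    while bucket <= last_hour:
        # Consume every log row up to and including this hour boundary,
        # matching the join condition HourBucket >= timeStamp.
        while i < len(log) and log[i][0] <= bucket:
            count += 1 if log[i][1] == 'In' else -1
            i += 1
        result.append((bucket, count))
        bucket += timedelta(hours=1)
    return result


# Illustrative data only.
log = [
    (datetime(2024, 1, 1, 8, 30), 'In'),
    (datetime(2024, 1, 1, 8, 45), 'In'),
    (datetime(2024, 1, 1, 9, 10), 'Out'),
    (datetime(2024, 1, 1, 10, 0), 'In'),
]
buckets = hourly_headcount(log, datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 11))
```

Unlike the SQL join, which re-reads all earlier log rows for every bucket, this walks the log once while keeping a running total.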
Original Answer (Based on a single over all count)
If you're happy to search over all rows, and want the current "head count" you can use this:
select
sum(case when t.FromState = 'In' then 1 else -1 end) as Heads
from
MyTable t
But if you know that there will always be no-one there at midnight, you can add a where clause to prevent it looking at more rows than it needs to:
where
date(t.timestamp) = curdate()
Again, on the assumption that the head count reaches zero at midnight, you can generalise that method to get a headcount at any time as follows:
where
date(t.timestamp) = "CENSUS DATE" AND
t.timestamp <= "CENSUS DATETIME"
Obviously you'd need to replace my quoted strings with code which returned the date and datetime of interest. If the headcount doesn't return to zero at midnight, you can achieve the same by removing the first line of the where clause.
I have one table which is having four fields:
trip_paramid, creation_time, fuel_content,vehicle_id
I want to find the difference between two rows. My table has a field fuel_content. Every two minutes I receive a packet and insert it into the database. From this I want to find the total refuel quantity. If the fuel content increases by more than 2 between two packets, I treat the increase as a refuelling quantity. Multiple refuels may happen on the same day, so I want to find the total refuel quantity per day for a vehicle. I created the table schema and sample data in sqlfiddle. Can anyone help me find a solution? Here is the link for the table schema: http://www.sqlfiddle.com/#!2/4cf36
Here is a good query.
The parameters (vehicle_id=13) and (date='2012-11-08') are hard-coded in the query; change them as needed.
Note that I have chosen an expression using creation_time>=... and creation_time<... instead of DATE(creation_time)='...', because the first expression can use an index on creation_time while the second cannot.
SELECT
SUM(fuel_content-prev_content) AS refuel_tot
, COUNT(*) AS refuel_nbr
FROM (
SELECT
p.trip_paramid
, fuel_content
, creation_time
, (
SELECT ps.fuel_content
FROM trip_parameters AS ps
WHERE (ps.vehicle_id=p.vehicle_id)
AND (ps.trip_paramid<p.trip_paramid)
ORDER BY trip_paramid DESC
LIMIT 1
) AS prev_content
FROM trip_parameters AS p
WHERE (p.vehicle_id=13)
AND (creation_time>='2012-11-08')
AND (creation_time<DATE_ADD('2012-11-08', INTERVAL 1 DAY))
ORDER BY p.trip_paramid
) AS log
WHERE (fuel_content-prev_content)>2
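The core of the query above is "compare each reading with the previous one and add up the increases larger than 2". A minimal Python sketch of that step, with a hypothetical function name total_refuel and made-up readings, looks like this. Unlike the query, it does not look back to the last reading of the previous day for the first row.

```python
def total_refuel(fuel_readings, threshold=2):
    """fuel_readings: fuel_content values for one vehicle and one day,
    ordered by trip_paramid.

    Returns (refuel_total, refuel_count).
    """
    total = 0
    count = 0
    # Pair every reading with the one immediately before it.
    for prev, curr in zip(fuel_readings, fuel_readings[1:]):
        diff = curr - prev
        if diff > threshold:      # same filter as (fuel_content-prev_content)>2
            total += diff
            count += 1
    return total, count


# Illustrative readings only: two refuels of 22 and 21.5.
refuel_tot, refuel_nbr = total_refuel([50, 49, 48, 70, 69, 68.5, 90])
```

Small drops and rises within the threshold are ignored, so normal consumption and sensor noise do not count as refuelling.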
Test it:
select sum(t2.fuel_content-t1.fuel_content) TotalFuel,t1.vehicle_id,t1.trip_paramid as rowIdA,
t2.trip_paramid as rowIdB,
t1.creation_time as timeA,
t2.creation_time as timeB,
t2.fuel_content fuel2,
t1.fuel_content fuel1,
(t2.fuel_content-t1.fuel_content) diffFuel
from trip_parameters t1, trip_parameters t2
where t1.trip_paramid<t2.trip_paramid
and t1.vehicle_id=t2.vehicle_id
and t1.vehicle_id=13
and t2.fuel_content-t1.fuel_content>2
order by rowIdA,rowIdB
where (rowIdA,rowIdB) are all possible tuples without repetition, diffFuel is the difference between fuel quantities and TotalFuel is the sum of all refuel quantities.
The query compares all fuel content differences for the same vehicle (in this example, the vehicle with id=13) and only sums the fuel quantity when the difference is >2.
Regards.