start, end, category
2022:10:14 17:13:00, 2022:10:14 17:19:00, A
2022:10:01 16:29:00, 2022:10:01 16:49:00, B
2022:10:19 18:55:00, 2022:10:19 19:03:00, A
2022:10:31 07:52:00, 2022:10:31 07:58:00, A
2022:10:13 18:41:00, 2022:10:13 19:26:00, B
The table above is sample data about trips.
The goal is to calculate the total time spent per category, e.g. category A = 02:18:02.
First, I changed the timestamp format in the CSV file to YYYY/MM/DD HH:MM:SS to match MySQL, and removed the headers.
Then I created a table in MySQL Workbench with the following code:
CREATE TABLE trip (
start TIMESTAMP,
end TIMESTAMP,
category VARCHAR(6)
);
Then, to calculate the time spent, I wrote:
SELECT category, SUM(TIMEDIFF(end, start)) as length
FROM trip
GROUP BY CATEGORY;
The result was plain numbers: A = 34900 and B = 38000.
So I added CONVERT(..., TIME) as follows:
SELECT category, Convert(SUM(TIMEDIFF(end, start)), Time) as length
FROM trip
GROUP BY category;
The result was fine for category A (03:49:00), but category B came back as NULL instead of 03:08:00.
What have I done wrong, and what different approach should I have taken?
You can do it as follows.
This version gets around MySQL's TIME value limit of 838:59:59:
SELECT category,
CONCAT(FLOOR(SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))/3600),":",FLOOR((SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))%3600)/60),":",(SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))%3600)%60) as `length`
FROM trip
GROUP BY category;
This version zero-pads each part, giving 00:20:00 instead of 0:20:0:
SELECT category,
CONCAT(
if(FLOOR(SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))/3600) >= 10, FLOOR(SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))/3600), CONCAT('0',FLOOR(SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))/3600)) ) ,
":",
if(FLOOR((SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))%3600)/60) >= 10, FLOOR((SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))%3600)/60), CONCAT('0', FLOOR((SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))%3600)/60) ) ),
":",
if( (SUM(TIMESTAMPDIFF(SECOND, `start`, `end`) )%3600)%60 >= 10, (SUM(TIMESTAMPDIFF(SECOND, `start`, `end`) )%3600)%60, concat('0', (SUM(TIMESTAMPDIFF(SECOND, `start`, `end`) )%3600)%60))
) as `length`
FROM trip
GROUP BY category;
Calculate the length of each trip in seconds, sum the lengths per category, then convert the seconds to a time. Note that TIMESTAMPDIFF(unit, a, b) returns b minus a, so `start` goes first:
SELECT category, SEC_TO_TIME(SUM(TIMESTAMPDIFF(SECOND, `start`, `end`))) as `length`
FROM trip
GROUP BY category;
If the SUM() exceeds the limit of the TIME data type (838:59:59), that maximum value is returned instead.
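A possible explanation for the NULL in the question, sketched in Python rather than SQL: when a TIME value is used in a numeric context such as SUM(), MySQL reads it as the number HHMMSS, so the sum is plain digit arithmetic and not time arithmetic. The helper names below are made up for illustration, and TRIPS holds only the five sample rows shown in the question.

```python
from datetime import datetime

FMT = "%Y:%m:%d %H:%M:%S"

# The five sample rows from the question: (start, end, category).
TRIPS = [
    ("2022:10:14 17:13:00", "2022:10:14 17:19:00", "A"),
    ("2022:10:01 16:29:00", "2022:10:01 16:49:00", "B"),
    ("2022:10:19 18:55:00", "2022:10:19 19:03:00", "A"),
    ("2022:10:31 07:52:00", "2022:10:31 07:58:00", "A"),
    ("2022:10:13 18:41:00", "2022:10:13 19:26:00", "B"),
]


def seconds(start, end):
    """Length of one trip in whole seconds."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


def as_hhmmss_number(secs):
    """Mimic a TIME value read as a number: 00:45:00 becomes 4500."""
    hours, rem = divmod(secs, 3600)
    minutes, s = divmod(rem, 60)
    return hours * 10000 + minutes * 100 + s


def is_valid_hhmmss(num):
    """A number only reads back as a time if minutes and seconds are below 60."""
    return (num // 100) % 100 < 60 and num % 100 < 60


def totals(trips):
    """Return {category: (sum of HHMMSS numbers, sum of seconds)}."""
    out = {}
    for start, end, category in trips:
        secs = seconds(start, end)
        num, total = out.get(category, (0, 0))
        out[category] = (num + as_hhmmss_number(secs), total + secs)
    return out
```

On these rows category B adds 2000 and 4500 to get 6500, which would read as 00:65:00 and is not a valid time, while the 3900 summed seconds convert cleanly. The same check fits the 34900 (valid) and 38000 (invalid) reported in the question.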
For totals that exceed the limit of a TIME value, use:
SELECT category,
CONCAT_WS(':',
secs DIV (60 * 60),
LPAD(secs DIV 60 MOD 60, 2, 0),
LPAD(secs MOD 60, 2, 0)) AS `length`
FROM (
SELECT category, SUM(TIMESTAMPDIFF(SECOND, `start`, `end`)) AS secs
FROM trip
GROUP BY category
) subquery
;
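If the total is fetched as plain seconds, the formatting can also happen outside the database, where there is no 838:59:59 ceiling. A small Python sketch of the same DIV/MOD arithmetic; the function name is hypothetical, it assumes a non-negative integer, and unlike the query above it also pads the hours to two digits.

```python
def format_duration(total_seconds):
    """Format a non-negative number of seconds as HH:MM:SS with unbounded hours."""
    hours, rem = divmod(total_seconds, 3600)   # same as secs DIV 3600
    minutes, secs = divmod(rem, 60)            # secs DIV 60 MOD 60, secs MOD 60
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```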
I need to get users visits duration for each day in MySQL.
I have table like:
user_id,date,time_start, time_end
1, 2018-09-01, 09:00:00, 12:30:00
2, 2018-09-01, 13:00:00, 15:10:00
1, 2018-09-03, 09:30:00, 12:30:00
2, 2018-09-03, 13:00:00, 15:10:00
and need to get:
user_id,2018-09-01_duration,2018-09-03_duration
1,03:30:00,03:00:00
2,02:10:00,02:10:00
So the columns need to be dynamic, since some dates can be missing (2018-09-02).
Is it possible to do this with one query, without an explicit join for each day (as some days can be null)?
Update #1
Yes, I can generate the columns on the application side, but I still end up with a terrible query like:
SELECT user_id, d1.dt AS "2018-08-01_duration", d2.dt AS "2018-08-03_duration"...
FROM (SELECT
user_id,
time_format(TIMEDIFF(TIMEDIFF(time_out,time_in),time_norm),"%H:%i") AS dt
FROM visits
WHERE date = "2018-09-01") d1
LEFT JOIN(
SELECT
user_id,
time_format(TIMEDIFF(TIMEDIFF(time_out,time_in),time_norm),"%H:%i") AS dt
FROM visits
WHERE date = "2018-09-03") d3
ON users.id = d3.user_id...
Update #2
Yes, data like
select user_id, date, SEC_TO_TIME(SUM(TIME_TO_SEC(time_out) - TIME_TO_SEC(time_in))) as total
from visits
group by user_id, date;
is correct, but in this case each user's data comes back as consecutive rows. I am hoping there is a way to get users as rows and dates as columns (as in the example above).
Try something like this:
select user_id, date, sum(time_end - time_start)
from table
group by user_id, date;
You will need to do some tweaking, as you didn't mention the RDBMS provider, but it should give you a clear idea on how to do it.
There is no built-in way to pivot dynamically in MySQL, but you might use the following for your case:
create table t(user_id int, time_start timestamp, time_end timestamp);
insert into t values(1,'2018-09-01 09:00:00', '2018-09-01 12:30:00');
insert into t values(2,'2018-09-01 13:00:00', '2018-09-01 15:10:00');
insert into t values(1,'2018-09-03 09:30:00', '2018-09-03 12:30:00');
insert into t values(2,'2018-09-03 13:00:00', '2018-09-03 15:10:00');
select min(q.user_id) as user_id,
min(CASE WHEN (q.date='2018-09-01') THEN q.time_diff END) as '2018-09-01_duration',
min(CASE WHEN (q.date='2018-09-03') THEN q.time_diff END) as '2018-09-03_duration'
from
(
select user_id, date(time_start) date,
concat(concat(lpad(hour(timediff(time_end, time_start)),2,'0'),':'),
concat(lpad(minute(timediff(time_end, time_start)),2,'0'),':'),
lpad(second(timediff(time_end, time_start)),2,'0')) as time_diff
from t
) q
group by user_id;
If you know the dates that you want in the result set, you don't need a dynamic query. You can just use conditional aggregation:
select user_id,
SEC_TO_TIME(SUM(CASE WHEN date = '2018-09-01' THEN TIME_TO_SEC(time_out) - TIME_TO_SEC(time_in) END)) as total_20180901,
SEC_TO_TIME(SUM(CASE WHEN date = '2018-09-02' THEN TIME_TO_SEC(time_out) - TIME_TO_SEC(time_in) END)) as total_20180902,
SEC_TO_TIME(SUM(CASE WHEN date = '2018-09-03' THEN TIME_TO_SEC(time_out) - TIME_TO_SEC(time_in) END)) as total_20180903
from visits
group by user_id;
You only need dynamic SQL if you don't know the dates you want in the result set. In that case, I would suggest following the same structure with the dates that you do want.
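If the pivot is done on the application side instead, the grouped query from Update #2 can stay simple and the date columns can be built from whatever dates come back. A minimal Python sketch of that idea; the function names and the in-memory VISITS rows (copied from the question's sample) are assumptions for illustration only.

```python
from collections import defaultdict

# (user_id, date, time_start, time_end), as in the question's sample table.
VISITS = [
    (1, "2018-09-01", "09:00:00", "12:30:00"),
    (2, "2018-09-01", "13:00:00", "15:10:00"),
    (1, "2018-09-03", "09:30:00", "12:30:00"),
    (2, "2018-09-03", "13:00:00", "15:10:00"),
]


def to_seconds(clock):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    hours, minutes, secs = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + secs


def fmt(secs):
    """Format seconds as HH:MM:SS."""
    hours, rem = divmod(secs, 3600)
    minutes, s = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{s:02d}"


def pivot(visits):
    """Return (header, rows) with one row per user and one column per date seen."""
    dates = sorted({date for _, date, _, _ in visits})
    totals = defaultdict(lambda: defaultdict(int))
    for user_id, date, start, end in visits:
        totals[user_id][date] += to_seconds(end) - to_seconds(start)
    header = ["user_id"] + [f"{d}_duration" for d in dates]
    rows = [
        # None marks a date on which this user has no visits.
        [user_id] + [fmt(totals[user_id][d]) if d in totals[user_id] else None
                     for d in dates]
        for user_id in sorted(totals)
    ]
    return header, rows
```

Dates with no visits at all (2018-09-02 in the sample) simply never become columns, which is the dynamic behaviour asked for.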
The following query solves your problem. It is dynamic, and you can improve on it.
I wrote it in T-SQL, but the same idea can be used in MySQL.
declare
@columns as nvarchar(max),
@query as nvarchar(max)
select
@columns =
stuff
((
select
distinct
',' + quotename([date])
from
table_test
for xml path(''), type
).value('.', 'nvarchar(max)'), 1, 1, '')
--select @columns
set @query =
'with
cte_result
as
(
select
[user_id] ,
[date] ,
time_start ,
time_end ,
datediff(minute, time_start, time_end) as duration
from
table_test
)
select
[user_id], ' + @columns + '
from
(
select
[user_id] ,
[date] ,
duration
from
cte_result
)
sourceTable
pivot
(
sum(duration)
for [date] in (' + @columns + ')
)
pivotTable'
execute(@query)
I have the following columns in a table called meetings: meeting_id - int, start_time - time, end_time - time. Assuming that this table has data for one calendar day only, what is the minimum number of rooms I need to accommodate all the meetings? Room size and the number of people attending don't matter.
Here's the solution:
select * from
(select t.start_time,
t.end_time,
count(*) - 1 overlapping_meetings,
count(*) minimum_rooms_required,
group_concat(distinct concat(y.start_time,' to ',t.end_time)
separator ' // ') meeting_details from
(select 1 meeting_id, '08:00' start_time, '09:15' end_time union all
select 2, '13:20', '15:20' union all
select 3, '10:00', '14:00' union all
select 4, '13:55', '16:25' union all
select 5, '14:00', '17:45' union all
select 6, '14:05', '17:45') t left join
(select 1 meeting_id, '08:00' start_time, '09:15' end_time union all
select 2, '13:20', '15:20' union all
select 3, '10:00', '14:00' union all
select 4, '13:55', '16:25' union all
select 5, '14:00', '17:45' union all
select 6, '14:05', '17:45') y
on t.start_time between y.start_time and y.end_time
group by start_time, end_time) z;
My question: is there anything wrong with this answer? Even if there is nothing wrong with it, can someone share a better one?
Let's say you have a table called 'meetings' like this -
Then you can use this query to get the minimum number of meeting rooms required to accommodate all meetings.
select max(minimum_rooms_required)
from (select count(*) minimum_rooms_required
from meetings t
left join meetings y on t.start_time >= y.start_time and t.start_time < y.end_time group by t.id
) z;
This is clearer and simpler, and it works fine.
Meetings can "overlap". So, GROUP BY start_time, end_time can't figure this out.
Not every algorithm can be done in SQL. Or, at least, it may be grossly inefficient.
I would use a real programming language for the computation, leaving the database for what it is good at -- being a data repository.
Build an array of 1440 (minutes in a day) entries; initialize to 0.
Foreach meeting:
Foreach minute in the meeting (excluding last minute):
increment element in array.
Find the largest element in the array -- the number of rooms needed.
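The steps above could be sketched in Python roughly as follows. The function names are hypothetical, times are assumed to be 'HH:MM' strings within one day, and the end minute is treated as free, matching "excluding last minute".

```python
def to_minute(clock):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = clock.split(":")
    return int(hours) * 60 + int(minutes)


def min_rooms(meetings):
    """meetings: iterable of (start, end) 'HH:MM' pairs on a single day."""
    busy = [0] * 1440                      # one counter per minute of the day
    for start, end in meetings:
        # range() excludes the end minute, so back-to-back meetings share a room.
        for minute in range(to_minute(start), to_minute(end)):
            busy[minute] += 1
    return max(busy)                       # peak number of simultaneous meetings


# The six meetings from the question.
MEETINGS = [
    ("08:00", "09:15"), ("13:20", "15:20"), ("10:00", "14:00"),
    ("13:55", "16:25"), ("14:00", "17:45"), ("14:05", "17:45"),
]
```

For the question's six meetings the peak is reached from 14:05, when meetings 2, 4, 5 and 6 are all in progress.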
CREATE TABLE [dbo].[Meetings](
[id] [int] NOT NULL,
[Starttime] [time](7) NOT NULL,
[EndTime] [time](7) NOT NULL
) ON [PRIMARY]
GO
Sample data set:
INSERT INTO Meetings VALUES (1,'8:00','09:00')
INSERT INTO Meetings VALUES (2,'8:00','10:00')
INSERT INTO Meetings VALUES (3,'10:00','11:00')
INSERT INTO Meetings VALUES (4,'11:00','12:00')
INSERT INTO Meetings VALUES (5,'11:00','13:00')
INSERT INTO Meetings VALUES (6,'13:00','14:00')
INSERT INTO Meetings VALUES (7,'13:00','15:00')
To find the minimum number of rooms required, run the query below:
create table #TempMeeting
(
id int,Starttime time,EndTime time,MeetingRoomNo int,Rownumber int
)
insert into #TempMeeting select id, Starttime,EndTime,0 as MeetingRoomNo,ROW_NUMBER()
over (order by starttime asc) as Rownumber from Meetings
declare @RowCounter int
select top 1 @RowCounter=Rownumber from #TempMeeting order by Rownumber
WHILE @RowCounter<=(Select count(*) from #TempMeeting)
BEGIN
update #TempMeeting set MeetingRoomNo=1
where Rownumber=(select top 1 Rownumber from #TempMeeting where
Rownumber>@RowCounter and Starttime>=(select top 1 EndTime from #TempMeeting
where Rownumber=@RowCounter) and MeetingRoomNo=0)
set @RowCounter=@RowCounter+1
END
select count(*) from #TempMeeting where MeetingRoomNo=0
Consider a table meetings with columns id, start_time and end_time. Then the following query should give the correct answer.
with mod_meetings as (select id, to_timestamp(start_time, 'HH24:MI')::TIME as start_time,
to_timestamp(end_time, 'HH24:MI')::TIME as end_time from meetings)
select CASE when max(a_cnt)>1 then max(a_cnt)+1
when max(a_cnt)=1 and max(b_cnt)=1 then 2 else 1 end as rooms
from
(select count(*) as a_cnt, a.id, count(b.id) as b_cnt from mod_meetings a left join mod_meetings b
on a.start_time>b.start_time and a.start_time<b.end_time group by a.id) join_table;
Sample DATA:
DROP TABLE IF EXISTS meeting;
CREATE TABLE "meeting" (
"meeting_id" INTEGER NOT NULL UNIQUE,
"start_time" TEXT NOT NULL,
"end_time" TEXT NOT NULL,
PRIMARY KEY("meeting_id")
);
INSERT INTO meeting values (1,'08:00','14:00');
INSERT INTO meeting values (2,'09:00','10:30');
INSERT INTO meeting values (3,'11:00','12:00');
INSERT INTO meeting values (4,'12:00','13:00');
INSERT INTO meeting values (5,'10:15','11:00');
INSERT INTO meeting values (6,'12:00','13:00');
INSERT INTO meeting values (7,'10:00','10:30');
INSERT INTO meeting values (8,'11:00','13:00');
INSERT INTO meeting values (9,'11:00','14:00');
INSERT INTO meeting values (10,'12:00','14:00');
INSERT INTO meeting values (11,'10:00','14:00');
INSERT INTO meeting values (12,'12:00','14:00');
INSERT INTO meeting values (13,'10:00','14:00');
INSERT INTO meeting values (14,'13:00','14:00');
Solution:
DROP VIEW IF EXISTS Final;
CREATE VIEW Final AS SELECT time, group_concat(event), sum(num) num from (
select start_time time, 's' event, 1 num from meeting
union all
select end_time time, 'e' event, -1 num from meeting)
group by 1
order by 1;
select max(room) AS Min_Rooms_Required FROM (
select
a.time,
sum(b.num) as room
from
Final a
, Final b
where a.time >= b.time
group by a.time
order by a.time
);
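The view-based query above is essentially a sweep over start and end events with a running sum. For comparison, here is a hedged Python sketch of the same idea; the function name is made up, and it assumes zero-padded 'HH:MM' strings so that string order equals time order.

```python
def min_rooms_sweep(meetings):
    """meetings: iterable of (start, end) zero-padded 'HH:MM' pairs."""
    events = []
    for start, end in meetings:
        events.append((start, 1))    # a meeting starts: one more room in use
        events.append((end, -1))     # a meeting ends: one room is freed
    # Tuples sort by time first; at equal times -1 sorts before +1,
    # so a room freed at 12:00 can be reused by a meeting starting at 12:00.
    events.sort()
    running = best = 0
    for _, delta in events:
        running += delta
        best = max(best, running)
    return best
```

This needs one sort plus one pass, instead of the self-join on `a.time >= b.time` used to build the running sum in SQL.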
Here's an explanation of gashu's nicely working code (or, put another way, a non-code explanation of how to solve it in any language).
First, renaming the variable 'minimum_rooms_required' to 'overlap' would make the whole thing much easier to understand, because for each start or end time we want to know the number of overlapping ongoing meetings. Once we have found the maximum, there is no way to get by with fewer rooms than that, because those meetings overlap.
By the way, I think there might be a mistake in the code. It should check whether t.start_time or t.end_time is between y.start_time and y.end_time. Counterexample: meeting 1 starts at 8:00 and ends at 11:00, and meeting 2 starts at 10:00 and ends at 12:00.
(I'd post this as a comment on gashu's answer, but I don't have enough reputation.)
I'd go for the LEAD() analytic function:
select
sum(needs_room_ind) as min_rooms
from (
select
id,
start_time,
end_time,
case when lead(start_time,1) over (order by start_time asc) between start_time
and end_time then 1 else 0 end as needs_room_ind
from
meetings
) a
In my opinion, the answer is the difference between how many meetings have started and how many have ended by the time each meeting_id starts (assuming meetings start and end on time).
My code looks like this:
with alpha as
(
select a.meeting_id,a.start_time,
count(distinct b.meeting_id) ttl_meeting_start_before,
count(distinct c.meeting_id) ttl_meeting_end_before
from meeting a
left join
(
select meeting_id,start_time from meeting
) b
on a.start_time > b.start_time
left join
(
select meeting_id,end_time from meeting
) c
on a.start_time > c.end_time
group by a.meeting_id,a.start_time
)
select max(ttl_meeting_start_before-ttl_meeting_end_before) max_meeting_room
from alpha
Table car_log
Speed LogDate
5 2013-04-30 10:10:09 ->row1
6 2013-04-30 10:12:15 ->row2
4 2013-04-30 10:13:44 ->row3
17 2013-04-30 10:15:32 ->row4
22 2013-04-30 10:18:19 ->row5
3 2013-04-30 10:22:33 ->row6
4 2013-04-30 10:24:14 ->row7
15 2013-04-30 10:26:59 ->row8
2 2013-04-30 10:29:19 ->row9
I want to know how long the car's speed was under 10.
My idea is to sum the LogDate difference between row1 and row4 (because at 10:14:44, between row3 and row4, the speed is still 4) and the LogDate difference between row6 and row8. I am not sure whether that is right.
How can I calculate this in a MySQL query? Thank you.
For every row, find the first row with a later LogDate. If the speed in the current row is less than 10, take the difference between this row's date and the next row's date; otherwise use 0.
A query that lists the values computed this way could look like:
SELECT ( SELECT IF( c1.speed <10, unix_timestamp( c2.LogDate ) - unix_timestamp( c1.logdate ) , 0 )
FROM car_log c2
WHERE c2.LogDate > c1.LogDate
ORDER BY c2.LogDate
LIMIT 1
) AS seconds_below_10
FROM car_log c1
Now it's just a matter of summing it up:
SELECT sum( seconds_below_10) FROM
( SELECT ( SELECT IF( c1.speed <10, unix_timestamp( c2.LogDate ) - unix_timestamp( c1.logdate ) , 0 )
FROM car_log c2
WHERE c2.LogDate > c1.LogDate
ORDER BY c2.LogDate
LIMIT 1
) AS seconds_below_10
FROM car_log c1 ) seconds_between_logs
Update after comment about adding CarId:
When you have more than one car, you need one more WHERE condition inside the dependent subquery (we want the next log for that exact car, not just any next log), and you group the whole rowset by CarId, possibly adding CarId to the select list to show it too.
SELECT sbl.carId, sum( sbl.seconds_below_10 ) as `seconds_with_speed_less_than_10` FROM
( SELECT c1.carId,
( SELECT IF( c1.speed <10, unix_timestamp( c2.LogDate ) - unix_timestamp( c1.logdate ) , 0 )
FROM car_log c2
WHERE c2.LogDate > c1.LogDate AND c2.carId = c1.carId
ORDER BY c2.LogDate
LIMIT 1 ) AS seconds_below_10
FROM car_log c1 ) sbl
GROUP BY sbl.carId
See an example at Sqlfiddle.
If the type of column 'LogDate' is a MySQL DATETIME type, you can use the timestampdiff() function in your select statement to get the difference between timestamps. The timestampdiff function is documented in the manual at:
http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_timestampdiff
You need to break the query down into subqueries, and then use the TIMESTAMPDIFF function.
The function takes three arguments: the unit you want the result in (e.g. SECOND, MINUTE, DAY), then the earlier value, then the later value. The result is the third argument minus the second.
To get the maximum value for LogDate where speed is less than 10 use:
select MAX(LogDate) from <yourtable> where Speed<10
To get the minimum value for LogDate where speed is less than 10 use:
select MIN(LogDate) from <yourtable> where Speed<10
Now, combine these into a single query with the TIMESTAMPDIFF function:
select TIMESTAMPDIFF(SECOND, (select MIN(LogDate) from <yourtable> where Speed<10), (select MAX(LogDate) from <yourtable> where Speed<10));
If LogDate is of a different type, there are other Date/Time Diff functions to handle math between any of these types. You will just need to change 'TIMESTAMPDIFF' to the correct function for your column type.
Additional ref: http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html
Try this SQL:
;with data as (
select *
from ( values
( 5, convert(datetime,'2013-04-30 10:10:09') ),
( 6, convert(datetime,'2013-04-30 10:12:15') ),
( 4, convert(datetime,'2013-04-30 10:13:44') ),
(17, convert(datetime,'2013-04-30 10:15:32') ),
(22, convert(datetime,'2013-04-30 10:18:19') ),
( 3, convert(datetime,'2013-04-30 10:22:33') ),
( 4, convert(datetime,'2013-04-30 10:24:14') ),
(15, convert(datetime,'2013-04-30 10:26:59') ),
( 2, convert(datetime,'2013-04-30 10:29:19') )
) data(speed,logDate)
)
, durations as (
select
duration = case when speed<10
then datediff(ss, logDate, endDate)
else 0
end
from (
select
t1.speed, t1.logDate, endDate = (
select top 1 logDate
from data
where data.logDate > t1.logDate
order by data.logDate
)
from data t1
) T
where endDate is not null
)
select TotalDuration = sum(duration)
from durations
which calculates 589 seconds from the sample data provided.
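As a cross-check of that figure outside the database, the "pair each row with the next one" idea can be sketched in a few lines of Python. The function name and the LOG list (copied from the question's sample) are illustrative only.

```python
from datetime import datetime

# (speed, LogDate) rows from the question.
LOG = [
    (5, "2013-04-30 10:10:09"), (6, "2013-04-30 10:12:15"),
    (4, "2013-04-30 10:13:44"), (17, "2013-04-30 10:15:32"),
    (22, "2013-04-30 10:18:19"), (3, "2013-04-30 10:22:33"),
    (4, "2013-04-30 10:24:14"), (15, "2013-04-30 10:26:59"),
    (2, "2013-04-30 10:29:19"),
]


def seconds_below(log, limit=10):
    """Sum the gaps to the next reading for every reading with speed < limit."""
    rows = sorted(
        (datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), speed)
        for speed, stamp in log
    )
    total = 0
    # zip pairs each reading with its successor; the last reading has none.
    for (when, speed), (next_when, _) in zip(rows, rows[1:]):
        if speed < limit:
            total += int((next_when - when).total_seconds())
    return total
```

Like the queries above, this assumes the speed stays at the logged value until the next reading, and the final row contributes nothing because there is no later reading to measure against.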
MySQL 5.5.29
Here is a MySQL query I am working on, without success:
SELECT ID, Bike,
(SELECT IF( MIN( ABS( DATEDIFF( '2011-1-1', Reading_Date ) ) ) = ABS( DATEDIFF( '2011-1-1', Reading_Date ) ) , Reading_Date, NULL ) FROM odometer WHERE Bike=10 ) AS StartDate,
(SELECT IF( MIN( ABS( DATEDIFF( '2011-1-1', Reading_Date ) ) ) = ABS( DATEDIFF( '2011-1-1', Reading_Date ) ) , Miles, NULL ) FROM odometer WHERE Bike=10 ) AS BeginMiles,
(SELECT IF( MIN( ABS( DATEDIFF( '2012-1-1', Reading_Date ) ) ) = ABS( DATEDIFF( '2012-1-1', Reading_Date ) ) , Reading_Date, NULL ) FROM odometer WHERE Bike=10 ) AS EndDate,
(SELECT IF( MIN( ABS( DATEDIFF( '2012-1-1', Reading_Date ) ) ) = ABS( DATEDIFF( '2012-1-1', Reading_Date ) ) , Miles, NULL ) FROM odometer WHERE Bike=10 ) AS EndMiles
FROM `odometer`
WHERE Bike =10;
And the result is:
ID Bike StartDate BeginMiles EndDate EndMiles
14 10 [->] 2011-04-15 27.0 NULL NULL
15 10 [->] 2011-04-15 27.0 NULL NULL
16 10 [->] 2011-04-15 27.0 NULL NULL
Motorcycle owners enter odometer readings once a year, on or near January 1. I want to calculate the total mileage per motorcycle.
Here is what the data in the table odometer looks like:
(source: bmwmcindy.org)
So to calculate the mileage for this bike for 2011, I need to determine which of these records is closest to Jan. 1, 2011, and that is record 14. The starting mileage would be 27. I need to find the record closest to Jan. 1, 2012, and that is record 15. The ending mileage for 2011 is 10657 (which will also be the starting odometer reading when 2012 is calculated).
Here is the table:
DROP TABLE IF EXISTS `odometer`;
CREATE TABLE IF NOT EXISTS `odometer` (
`ID` int(3) NOT NULL AUTO_INCREMENT,
`Bike` int(3) NOT NULL,
`is_MOA` tinyint(1) NOT NULL,
`Reading_Date` date NOT NULL,
`Miles` decimal(8,1) NOT NULL,
PRIMARY KEY (`ID`),
KEY `Bike` (`Bike`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=22 ;
data for table odometer
INSERT INTO `odometer` (`ID`, `Bike`, `is_MOA`, `Reading_Date`, `Miles`) VALUES
(1, 1, 0, '2012-01-01', 5999.0),
(2, 6, 0, '2013-02-01', 14000.0),
(3, 7, 0, '2013-03-01', 53000.2),
(6, 1, 1, '2012-04-30', 10001.0),
(7, 1, 0, '2013-01-04', 31000.0),
(14, 10, 0, '2011-04-15', 27.0),
(15, 10, 0, '2011-12-31', 10657.0),
(16, 10, 0, '2012-12-31', 20731.0),
(19, 1, 1, '2012-09-30', 20000.0),
(20, 6, 0, '2011-12-31', 7000.0),
(21, 7, 0, '2012-01-03', 23000.0);
I am trying to get dates and miles from different records so that I can subtract the beginning miles from the ending miles to get total miles for a particular bike (in the example Bike=10) for a particular year (in this case 2011).
I have read quite a bit about aggregate functions and the problems of getting values from the correct record. I thought the answer was somehow in subqueries. But when I try the query above, I get data from only the first record. In this case the ending miles should come from the second record.
I hope someone can point me in the right direction.
Miles should be steadily increasing. It would be nice if something like this worked:
select year(Reading_Date) as yr,
max(miles) - min(miles) as MilesInYear
from odometer o
where bike = 10
group by year(reading_date)
Alas, your logic is really much harder than you think. This would be easier in a database such as SQL Server 2012 or Oracle that has the lead and lag functions.
My approach is to find the first and last reading dates for each year. You can calculate this using a correlated subquery:
select o.*,
(select max(o2.Reading_Date) from odometer o2 where o2.Bike = o.Bike and o2.Reading_Date <= date_sub(o.Reading_Date, interval dayofyear(o.Reading_Date) - 1 day)
) ReadDateForYear
from odometer o
Next, summarize this at the bike and year levels. If there is no read date on or before the beginning of the year, use the first date:
select bike, year(Reading_Date) as yr,
coalesce(min(ReadDateForYear), min(Reading_Date)) as FirstReadDate,
coalesce(min(ReadDateForNextYear), max(Reading_Date)) as LastReadDate
from (select o.*,
(select max(o2.Reading_Date) from odometer o2 where o2.Bike = o.Bike and o2.Reading_Date <= date_sub(o.Reading_Date, interval dayofyear(o.Reading_Date) - 1 day)
) ReadDateForYear,
(select max(o2.Reading_Date) from odometer o2 where o2.Bike = o.Bike and o2.Reading_Date <= date_add(date_sub(o.Reading_Date, interval dayofyear(o.Reading_Date) - 1 day), interval 1 year)
) ReadDateForNextYear
from odometer o
) o
group by bike, year(Reading_Date)
Let me call this <q>. To get the final results, you need something like:
select the fields you need
from <q> q join
odometer s
on s.bike = q.bike and s.Reading_Date = q.FirstReadDate join
odometer e
on e.bike = q.bike and e.Reading_Date = q.LastReadDate
Note: this SQL is untested. I'm sure there are syntax errors.
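To make the target result concrete, here is a small Python sketch of the rule stated in the question (take the reading closest to January 1 at each end of the year). It is not a translation of the query above; the function names are hypothetical and READINGS copies the sample rows as (ID, Bike, Reading_Date, Miles).

```python
from datetime import date

READINGS = [
    (1, 1, date(2012, 1, 1), 5999.0),
    (2, 6, date(2013, 2, 1), 14000.0),
    (3, 7, date(2013, 3, 1), 53000.2),
    (6, 1, date(2012, 4, 30), 10001.0),
    (7, 1, date(2013, 1, 4), 31000.0),
    (14, 10, date(2011, 4, 15), 27.0),
    (15, 10, date(2011, 12, 31), 10657.0),
    (16, 10, date(2012, 12, 31), 20731.0),
    (19, 1, date(2012, 9, 30), 20000.0),
    (20, 6, date(2011, 12, 31), 7000.0),
    (21, 7, date(2012, 1, 3), 23000.0),
]


def closest_reading(readings, bike, target):
    """Return the reading for `bike` whose date is nearest to `target`."""
    candidates = [r for r in readings if r[1] == bike]
    return min(candidates, key=lambda r: abs((r[2] - target).days))


def miles_in_year(readings, bike, year):
    """Ending miles minus starting miles, each taken from the reading nearest Jan 1."""
    start = closest_reading(readings, bike, date(year, 1, 1))
    end = closest_reading(readings, bike, date(year + 1, 1, 1))
    return end[3] - start[3]
```

For Bike 10 and 2011 this picks record 14 as the start and record 15 as the end, as described in the question.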
Quest
After a day of running (against nearly 1 GB of data), a set of statements has slowed to 40 inserts per second. I am looking to increase that by an order of magnitude or two.
SQL Code
The code to insert the information comes in two parts: a master record and detail records. The master record:
INSERT INTO MONTH_REF (DISTRICT_ID, STATION_ID, CATEGORY_ID, YEAR, MONTH) VALUES
('101', '0066', '010', 1984, 07);
The detail records:
INSERT INTO DAILY (MONTH_REF_ID, AMOUNT, DAILY_FLAG_ID, DAY) VALUES ((SELECT ID
FROM MONTH_REF M WHERE M.DISTRICT_ID = '101' AND M.STATION_ID = '0066' AND M.CATEGORY_ID = '010' AND M.YEAR = 1984 AND M.MONTH = 07), 0, ' ', 1);
INSERT INTO DAILY (MONTH_REF_ID, AMOUNT, DAILY_FLAG_ID, DAY) VALUES ((SELECT ID
FROM MONTH_REF M WHERE M.DISTRICT_ID = '101' AND M.STATION_ID = '0066' AND M.CATEGORY_ID = '010' AND M.YEAR = 1984 AND M.MONTH = 07), 0.5, ' ', 2);
INSERT INTO DAILY (MONTH_REF_ID, AMOUNT, DAILY_FLAG_ID, DAY) VALUES ((SELECT ID
FROM MONTH_REF M WHERE M.DISTRICT_ID = '101' AND M.STATION_ID = '0066' AND M.CATEGORY_ID = '010' AND M.YEAR = 1984 AND M.MONTH = 07), 0, 'T', 3);
Proposed Solution
The proposed solution eliminates looking up each MONTH_REF_ID by storing it in a local variable, as follows:
INSERT INTO MONTH_REF (DISTRICT_ID, STATION_ID, CATEGORY_ID, YEAR, MONTH) VALUES
('101', '0066', '010', 1984, 07);
SET @month_ref_id := (SELECT LAST_INSERT_ID());
The detail statements then become:
INSERT INTO DAILY (MONTH_REF_ID, AMOUNT, DAILY_FLAG_ID, DAY) VALUES (@month_ref_id, 0, ' ', 1);
INSERT INTO DAILY (MONTH_REF_ID, AMOUNT, DAILY_FLAG_ID, DAY) VALUES (@month_ref_id, 0.5, ' ', 2);
INSERT INTO DAILY (MONTH_REF_ID, AMOUNT, DAILY_FLAG_ID, DAY) VALUES (@month_ref_id, 0, 'T', 3);
Constraints
The MONTH_REF table has an AUTO_INCREMENT primary key and is indexed on it. The DAILY table has no index and no primary key. A primary key can be added to the DAILY table, if it would help.
Question
What is a more efficient way to execute the (billion or so) insert statements than the proposed solution?
Thank you!
This solution works:
INSERT INTO MONTH_REF (DISTRICT_ID,STATION_ID,CATEGORY_ID,YEAR,MONTH) VALUES('101','QFEG','012',1973,08);
SET @month_ref_id := (SELECT LAST_INSERT_ID());
INSERT INTO DAILY (MONTH_REF_ID,AMOUNT,DAILY_FLAG_ID,DAY) VALUES(@month_ref_id,0,' ',1),(@month_ref_id,0,' ',2),(@month_ref_id,0,' ',3);
Inserts went up about four orders of magnitude.
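The pattern in this answer (read the generated key once, then send all detail rows together) can be sketched from application code as well. The example below uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption made only to keep it runnable; cursor.lastrowid plays the role of LAST_INSERT_ID() and executemany() the role of the multi-row VALUES list. The table layouts are guessed from the column names in the question, and both function names are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with guessed versions of the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE month_ref (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "district_id TEXT, station_id TEXT, category_id TEXT, "
        "year INTEGER, month INTEGER)"
    )
    conn.execute(
        "CREATE TABLE daily (month_ref_id INTEGER, amount REAL, "
        "daily_flag_id TEXT, day INTEGER)"
    )
    return conn


def load_month(conn, master, details):
    """Insert one master row, then all of its detail rows in a single batch."""
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO month_ref (district_id, station_id, category_id, year, month) "
        "VALUES (?, ?, ?, ?, ?)",
        master,
    )
    month_ref_id = cur.lastrowid          # fetched once, reused for every detail row
    cur.executemany(
        "INSERT INTO daily (month_ref_id, amount, daily_flag_id, day) "
        "VALUES (?, ?, ?, ?)",
        [(month_ref_id, amount, flag, day) for amount, flag, day in details],
    )
    conn.commit()                         # one commit per month, not per row
    return month_ref_id
```

Committing once per batch instead of once per statement is a separate saving that often matters as much as dropping the per-row subquery, though how much depends on the storage engine and settings.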