Related
I am trying to build a system that will track vehicle fuelings, and have run into a problem with one report; determining fuel efficiency in distance/fuel. Sample data is:
odometer
fuel
partial_fillup
61290
10.3370
0
61542
6.4300
0
61735
4.3600
0
61994
7.5000
0
62242
5.4070
0
62452
8.1100
0
62713
5.7410
1
62876
9.4850
0
63243
6.1370
1
63499
10.7660
0
Where odometer is the total distance the vehicle has traveled, fuel is the number of gallons or liters put in, and partial_fillup is a boolean meaning the fuel tank was not completely filled if non-zero.
If the user fills the tank each time the query I can use is:
set #a = null;
select
odometer,
odometer-previousOdometer distance,
fuel,
(odometer-previousOdometer)/fuel mpg,
partial_fillup
from
(
select
#a as previousOdometer,
#a:=odometer,
odometer,
fuel/1000 fuel,
partial_fillup
from fuel
where
vehicle_id =1
and odometer >= 61290
order by odometer
) as readings
where readings.previousOdometer is not null;
However, when the user only partially fills the tank, the correct procedure would be to subtract the last full fueling from current odometer reading, then divide by the sum of all fuel since the previous odometer reading, so at odometer 63499, the calculate would be (63499-62876)/(10.7660+6.1370)
This will get the average used on the last ride:
select
odometer,
odometer-lag(odometer) over (order by odometer) as distance,
fuel,
(odometer-lag(odometer) over (order by odometer))/fuel as mpg
from fuel
output:
odometer
distance
fuel
mpg
61290
10.3370
61542
252
6.4300
39.1913
61735
193
4.3600
44.2661
61994
259
7.5000
34.5333
62242
248
5.4070
45.8665
62452
210
8.1100
25.8940
62713
261
5.7410
45.4625
62876
163
9.4850
17.1850
63243
367
6.1370
59.8012
63499
256
10.7660
23.7786
Or you can calculate the total drive distance, and the total amount of fuel used:
select
distance,
sum_fuel,
distance/sum_fuel as mpg
from (
select
f.odometer,
f.odometer-(select min(odometer) from fuel) as distance,
fuel,
sum_fuel
from fuel f
inner join (
select
odometer,
sum(fuel) over (order by R) as sum_fuel
from (
select
odometer,
fuel,
row_number() over (order by odometer) R
from fuel) x
) x on x.odometer = f.odometer
) x2
which will get next output, which will get closer to an average after a longer time of measurement:
distance
sum_fuel
mpg
0
10.3370
0.0000
252
16.7670
15.0295
445
21.1270
21.0631
704
28.6270
24.5922
952
34.0340
27.9720
1162
42.1440
27.5721
1423
47.8850
29.7170
1586
57.3700
27.6451
1953
63.5070
30.7525
2209
74.2730
29.7416
DBFIDDLE
I was able to figure it out after studying Luuk's answer. I'm sure there is a more efficient way to do this; I am not used to using variables in SQL. But, the answers are correct in the test data.
set #oldOdometer = null;
set #totalFuel = 0;
select
s.odometer,
format(fuel, 3) fuel,
s.distance,
format( distance / fuel, 2) as mpg
from (
select
partial_fillup as partial,
odometer,
(fuel+#totalFuel) as fuel,
#totalFuel as totalFuel,
#oldOdometer oldOdometer,
if ( partial_fillup, null,odometer - #oldOdometer ) as distance,
#totalFuel := if ( partial_fillup, #totalFuel + fuel, 0) as pastFuel,
#oldOdometer := if (partial_fillup,#oldOdometer,odometer ) as runningOdometer
from
fuel
order by
odometer ) s
where s.distance is not null
order by s.odometer
limit 1,999;
limit 1,999 simply there to skip the first row returned, since there is not enough data to calculate distance or mpg. On my copy of MySQL, doing this means you do not need to initialize the two variables (you don't have to include the set commands at the beginning), so it works with my reporting tool very well. If you do initialize them, you do not need the limit statement. Works assuming you don't have more than 999 rows returned.
Just checking to see if any of you would have a solution for this – from the below text like this I want to eliminate all text within any parenthesis.
Input –
PAY - addition,FILES (aaaaaaaaaaaaaa/bbbbbbbbbbbs i.e. ssss,ffff – i.e. cccccc),DED (ppppppp, llllll, fffff gggg),LOSS (ddddd, hhhhhh – i.e.),F TO G ( “F” is switching to “G”)
Output –
PAY - addition,FILES,DED,LOSS,F TO G
If you are running MySQL 8.0, you can do this with regexp_replace():
regexp_replace(mytext, '\\([^)]*\\)', '')
This works as long as there are no nested parentheses in the expression (which is consistent with your sample data).
Demo on DB Fiddle:
select regexp_replace(
'PAY - addition,FILES (aaaaaaaaaaaaaa/bbbbbbbbbbbs i.e. ssss,ffff – i.e. cccccc),DED (ppppppp, llllll, fffff gggg),LOSS (ddddd, hhhhhh – i.e.),F TO G ( “F” is switching to “G”)',
'\\([^)]*\\)',
''
) val
| val |
| :--------------------------------------- |
| PAY - addition,FILES ,DED ,LOSS ,F TO G |
Another on for MYSQL8.0:
SET #input:="PAY - addition,FILES (aaaaaaaaaaaaaa/bbbbbbbbbbbs i.e. ssss,ffff – i.e. cccccc),DED (ppppppp, llllll, fffff gggg),LOSS (ddddd, hhhhhh – i.e.),F TO G ( “F” is switching to “G”)";
with recursive cte as (
select
0 i,
#input as text
union all
select
i+1,
CASE WHEN instr(text,'(') >0 AND instr(text,')')>instr(text,'(') THEN REPLACE(text, substring(text,instr(text,'('),instr(text,')')-instr(text,'(')+1), '') ELSE '' END
from cte
where i<10
) select text from cte where text<>'' order by i desc limit 1;
output:
+------------------------------------------+
| text |
+------------------------------------------+
| PAY - addition,FILES ,DED ,LOSS ,F TO G |
+------------------------------------------+
1 row in set (0.00 sec)
I have the following table:
id (integer, primary key)
amount_low (integer)
amount_high (integer)
fixedprice (decimal 4,2 Null)
percentadjust (decimal 4,2 Null)
itemname (varchar 50)
A record will have a value in either the "fixedprice" or "percentadjust" field, but not both. One will be NULL, and the other will have a value.
I need to get records based on a single input amount, "X":
If the "fixedprice" field has a value, I need to get the record if X is >= (fixedprice * amount_low) AND X is <= (fixedprice * amount_high).
If the "percentadjust" field has a value, I need to get the record if X is >= ((((percentadjust / 100) + 1) * 3.5) * amount_low) AND X is <= ((((percentadjust / 100) + 1) * 3.5) * amount_high).
The "3.5" is a value that changes on occasion and I'm not too concerned about that part.
What is a good way to do this in MySQL?
Sample data: (also see http://sqlfiddle.com/#!9/922a0 )
id amount_low amount_high fixedprice percentadjust itemname
-----------------------------------------------------------------
1 20 25 2.25 NULL A
2 50 75 2.38 NULL B
3 23 32 NULL 9.75 C
4 14 22 NULL 9.12 D
5 96 112 2.58 NULL E
Assuming your X was entered as 111 it would be
select * from tblItems
where (fixedprice is not null and 111>=(fixedprice * amount_low) and 111 <= (fixedprice * amount_high) )
OR (percentadjust is not null and 111>=((((percentadjust / 100) + 1) * 3.5) * amount_low) AND 111<=((((percentadjust / 100) + 1) * 3.5) * amount_high))
Note you can always write it as where xyz between A and B to simplify somethings slightly.
Remember that a lot of time can be wasted debugging logic operators when AND and OR are used and safe wrappers with parentheses are not used. So, if you intermingle AND with OR, wrap things well.
I make a cohort analysis processor. Input parameters: time range and step, condition (initial event) to exctract cohorts, additional condition (retention event) to check after each N hours/days/months. Output parameters: cohort analysis grid, like this:
0h | 16h | 32h | 48h | 64h | 80h | 96h |
cohort #00 15 | 6 | 4 | 1 | 1 | 2 | 2 |
cohort #01 1 | 35 | 8 | 0 | 2 | 0 | 1 |
cohort #02 0 | 3 | 31 | 11 | 5 | 3 | 0 |
cohort #03 0 | 0 | 4 | 27 | 7 | 6 | 2 |
cohort #04 0 | 1 | 1 | 4 | 29 | 4 | 3 |
Basically:
fetch cohorts: unique users who did something 1 in every period from time_begin every time_step.
find how many of them (in each cohort) did something 2 after N seconds, N*2 seconds, N*3, and so on until now.
In short - I have 2 solutions. One works too slow and includes a heavy select with joins for each data step: 1 day, 2 day, 3 day, etc. I want to optimize it by joining result for every data step to cohorts - and it's the second solution. It looks like it works but I'm not sure it's the best way and that it will give the same result even if cohorts will intersect. Please check it out.
Here's the whole story.
I have a table of > 100,000 events, something like this:
#user-id, timestamp, event_name
events_view (uid varchar(64), tm int(11), e varchar(64))
example input row:
"user_sampleid1", 1423836540, "level_end:001:win"
To make a cohort analisys first I extract cohorts: for example, users, who send special event '1st_launch' in 10 hour periods starting from 2015-02-13 and ending with 2015-02-16. All code in this post is simplified and shortened to see the idea.
DROP TABLE IF EXISTS tmp_c;
create temporary table tmp_c (uid varchar(64), tm int(11), c int(11) );
set beg = UNIX_TIMESTAMP('2015-02-13 00:00:00');
set en = UNIX_TIMESTAMP('2015-02-16 00:00:00');
select min(tm) into t_start from events_view ;
select max(tm) into t_end from events_view ;
if beg < t_start then
set beg = t_start;
end if;
if en > t_end then
set en = t_end;
end if;
set period = 3600 * 10;
set cnt_c = ceil((en - beg) / period) ;
/*works quick enough*/
WHILE i < cnt_c DO
insert into tmp_c (
select uid, min(tm), i from events_view where
locate("1st_launch", e) > 0 and tm > (beg + period * i)
AND tm <= (beg + period * (i+1)) group by uid );
SET i = i+1;
END WHILE;
Cohorts may consist the same user ids, though usually one user is exist only in one cohort. And in each cohort users are unique.
Now I have temp table like this:
user_id | 1st timestamp | cohort_no
uid1 1423836540 0
uid2 1423839540 0
uid3 1423841160 1
uid4 1423841460 2
...
uidN 1423843080 M
Then I need to again divide time range on periods and calculate for each period how many users from each cohort have sent event "level_end:001:win".
For each small period I select all unique users who have sent "level_end:001:win" event and left join them to tmp_c cohorts table. So I have something like this:
user_id | 1st timestamp | cohort_no | user_id | other fields...
uid1 1423836540 0 uid1
uid2 1423839540 0 null
uid3 1423841160 1 null
uid4 1423841460 2 uid4
...
uidN 1423843080 M null
This way I see how many users from my cohorts are in those who have sent "level_end:001:win", exclude not found by where clause: where t2.uid is not null.
Finally I perform grouping and have counts of users in each cohort, who have sent "level_end:001:win" in this particluar period.
Here's the code:
DROP TABLE IF EXISTS tmp_res;
create temporary table tmp_res (uid varchar(64) CHARACTER SET cp1251 NOT NULL, c int(11), cnt int(11) );
set i = 0;
set cnt_c = ceil((t_end - beg) / period) ;
WHILE i < cnt_c DO
insert into tmp_res
select concat(beg + period * i, "_", beg + period * (i+1)), c, count(distinct(uid)) from
(select t1.uid, t1.c from tmp_c t1 left join
(select uid, min(tm) from events_view where
locate("level_end:001:win", e) > 0 and
tm > (beg + period * i) AND tm <= (beg + period * (i+1)) group by uid ) t2
on t1.uid = t2.uid where t2.uid is not null) t3
group by c;
SET i = i+1;
END WHILE;
/*getting result of the first method: tooo slooooow!*/
select * from tmp_res;
The result I've got (it's ok that some cohorts are not appear on some periods):
"1423832400_1423890000","1","35"
"1423832400_1423890000","2","3"
"1423832400_1423890000","3","1"
"1423832400_1423890000","4","1"
"1423890000_1423947600","1","21"
"1423890000_1423947600","2","50"
"1423890000_1423947600","3","2"
"1423947600_1424005200","1","9"
"1423947600_1424005200","2","24"
"1423947600_1424005200","3","70"
"1423947600_1424005200","4","6"
"1424005200_1424062800","1","7"
"1424005200_1424062800","2","15"
"1424005200_1424062800","3","21"
"1424005200_1424062800","4","32"
"1424062800_1424120400","1","7"
"1424062800_1424120400","2","13"
"1424062800_1424120400","3","24"
"1424062800_1424120400","4","18"
"1424120400_1424178000","1","10"
"1424120400_1424178000","2","12"
"1424120400_1424178000","3","18"
"1424120400_1424178000","4","14"
"1424178000_1424235600","1","6"
"1424178000_1424235600","2","7"
"1424178000_1424235600","3","9"
"1424178000_1424235600","4","12"
"1424235600_1424293200","1","6"
"1424235600_1424293200","2","8"
"1424235600_1424293200","3","9"
"1424235600_1424293200","4","5"
"1424293200_1424350800","1","5"
"1424293200_1424350800","2","3"
"1424293200_1424350800","3","11"
"1424293200_1424350800","4","10"
"1424350800_1424408400","1","8"
"1424350800_1424408400","2","5"
"1424350800_1424408400","3","7"
"1424350800_1424408400","4","7"
"1424408400_1424466000","2","6"
"1424408400_1424466000","3","7"
"1424408400_1424466000","4","3"
"1424466000_1424523600","1","3"
"1424466000_1424523600","2","4"
"1424466000_1424523600","3","8"
"1424466000_1424523600","4","2"
"1424523600_1424581200","2","3"
"1424523600_1424581200","3","3"
It works but it takes too much time to process because there are many queries here instead of one, so I need to rewrite it.
I think it can be rewritten with joins, but I'm still not sure how.
I decided to make a temporary table and write period boundaries in it:
DROP TABLE IF EXISTS tmp_times;
create temporary table tmp_times (tm_start int(11), tm_end int(11));
set cnt_c = ceil((t_end - beg) / period) ;
set i = 0;
WHILE i < cnt_c DO
insert into tmp_times values( beg + period * i, beg + period * (i+1));
SET i = i+1;
END WHILE;
Then I get periods-to-events mapping (user_id + timestamp represent particular event) to temp table and left join it to cohorts table and group the result:
SELECT Concat(tm_start, "_", tm_end) per,
t1.c coh,
Count(DISTINCT( t2.uid ))
FROM tmp_c t1
LEFT JOIN (SELECT *
FROM tmp_times t3
LEFT JOIN (SELECT uid,
tm
FROM events_view
WHERE Locate("level_end:101:win", e) > 0)
t4
ON ( t4.tm > t3.tm_start
AND t4.tm <= t3.tm_end )
WHERE t4.uid IS NOT NULL
ORDER BY t3.tm_start) t2
ON t1.uid = t2.uid
WHERE t2.uid IS NOT NULL
GROUP BY per,
coh
ORDER BY per,
coh;
In my tests this returns the same result as method #1. I can't check the result manually, but I understand how method #1 work more and as far I can see it gives what I want. Method #2 is faster, but I'm not sure it's the best way and it will give the same result even if cohorts will intersect.
Maybe there are well-known common methods to perform a cohort analysis in SQL? Is method #1 I use more reliable than method #2? I work with joins not that often, that's why still do not fully understand joins magic yet.
Method #2 looks like pure magic, and I used to not believe in what I don't understand :)
Thanks for answers!
I was looking around and found no solution to this. I´d be glad if someone could help me out here:
I have a table, e.g. that has among others, following columns:
Vehicle_No, Stop1_depTime, Segment_TravelTime, Stop_arrTime, Stop_Sequence
The data might look something like this:
Vehicle_No Stop1_DepTime Segment_TravelTime Stop_Sequence Stop_arrTime
201 13000 60 1
201 13000 45 2
201 13000 120 3
201 13000 4
202 13300 240 1
202 13300 60 2
...
and I need to calculate the arrival time at each stop from the departure time at the first stop and the travel times in between for each vehicle. What I need in this case would look like this:
Vehicle_No Stop1_DepTime Segment_TravelTime Stop_Sequence Stop_arrTime
201 13000 60 1
201 13000 45 2 13060
201 13000 120 3 13105
201 13000 4 13225
202 13300 240 1
202 13300 60 2 13540
...
I have tried to find a solution for some time but was not successful - Thanks for any help you can give me!
Here is the query that still does not work - I am sure I did something wrong with getting the table from the database into this but dont know where. Sorry if this is a really simple error, I have just begun working with MSSQL.
Also, I have implemented the solution provided below and it works. At this point I mainly want to understand what went wrong here to learn about it. If it takes too much time, please do not bother with my question for too long. Otherwise - thanks a lot :)
;WITH recCTE
AS
(
SELECT ZAEHL_2011.dbo.L32.Zaehl_Fahrt_Id, ZAEHL_2011.dbo.L32.PlanAbfahrtStart, ZAEHL_2011.dbo.L32.Fahrzeit, ZAEHL_2011.dbo.L32.Sequenz, ZAEHL_2011.dbo.L32.PlanAbfahrtStart AS Stop_arrTime
FROM ZAEHL_2011.dbo.L32
WHERE ZAEHL_2011.dbo.L32.Sequenz = 1
UNION ALL
SELECT t. ZAEHL_2011.dbo.L32.Zaehl_Fahrt_Id, t. ZAEHL_2011.dbo.L32.PlanAbfahrtStart, t. ZAEHL_2011.dbo.L32.Fahrzeit,t. ZAEHL_2011.dbo.L32.Sequenz, r.Stop_arrTime + r. ZAEHL_2011.dbo.L32.Fahrzeit AS Stop_arrTime
FROM recCTE AS r
JOIN ZAEHL_2011.dbo.L32 AS t
ON t. ZAEHL_2011.dbo.L32.Zaehl_Fahrt_Id = r. ZAEHL_2011.dbo.L32.Zaehl_Fahrt_Id
AND t. ZAEHL_2011.dbo.L32.Sequenz = r. ZAEHL_2011.dbo.L32.Sequenz + 1
)
SELECT ZAEHL_2011.dbo.L32.Zaehl_Fahrt_Id, ZAEHL_2011.dbo.L32.PlanAbfahrtStart, ZAEHL_2011.dbo.L32.Fahrzeit, ZAEHL_2011.dbo.L32.Sequenz, ZAEHL_2011.dbo.L32.PlanAbfahrtStart,
CASE WHEN Stop_arrTime = ZAEHL_2011.dbo.L32.PlanAbfahrtStart THEN NULL ELSE Stop_arrTime END AS Stop_arrTime
FROM recCTE
ORDER BY ZAEHL_2011.dbo.L32.Zaehl_Fahrt_Id, ZAEHL_2011.dbo.L32.Sequenz
A recursive CTE solution - assumes that each Vehicle_No appears in the table only once:
DECLARE #t TABLE
(Vehicle_No INT
,Stop1_DepTime INT
,Segment_TravelTime INT
,Stop_Sequence INT
,Stop_arrTime INT
)
INSERT #t (Vehicle_No,Stop1_DepTime,Segment_TravelTime,Stop_Sequence)
VALUES(201,13000,60,1),
(201,13000,45,2),
(201,13000,120,3),
(201,13000,NULL,4),
(202,13300,240,1),
(202,13300,60,2)
;WITH recCTE
AS
(
SELECT Vehicle_No, Stop1_DepTime, Segment_TravelTime,Stop_Sequence, Stop1_DepTime AS Stop_arrTime
FROM #t
WHERE Stop_Sequence = 1
UNION ALL
SELECT t.Vehicle_No, t.Stop1_DepTime, t.Segment_TravelTime,t.Stop_Sequence, r.Stop_arrTime + r.Segment_TravelTime AS Stop_arrTime
FROM recCTE AS r
JOIN #t AS t
ON t.Vehicle_No = r.Vehicle_No
AND t.Stop_Sequence = r.Stop_Sequence + 1
)
SELECT Vehicle_No, Stop1_DepTime, Segment_TravelTime,Stop_Sequence, Stop1_DepTime,
CASE WHEN Stop_arrTime = Stop1_DepTime THEN NULL ELSE Stop_arrTime END AS Stop_arrTime
FROM recCTE
ORDER BY Vehicle_No, Stop_Sequence
EDIT
Corrected version of OP's query - note that it's not necessary to fully qualify the column names:
;WITH recCTE
AS
(
SELECT Zaehl_Fahrt_Id, PlanAbfahrtStart, Fahrzeit, L32.Sequenz, PlanAbfahrtStart AS Stop_arrTime
FROM ZAEHL_2011.dbo.L32
WHERE Sequenz = 1
UNION ALL
SELECT t.Zaehl_Fahrt_Id, t.PlanAbfahrtStart, t.Fahrzeit,t.Sequenz, r.Stop_arrTime + r.Fahrzeit AS Stop_arrTime
FROM recCTE AS r
JOIN ZAEHL_2011.dbo.L32 AS t
ON t.Zaehl_Fahrt_Id = r.Zaehl_Fahrt_Id
AND t.Sequenz = r.Sequenz + 1
)
SELECT Zaehl_Fahrt_Id, PlanAbfahrtStart, Fahrzeit, Sequenz, PlanAbfahrtStart,
CASE WHEN Stop_arrTime = PlanAbfahrtStart THEN NULL ELSE Stop_arrTime END AS Stop_arrTime
FROM recCTE
ORDER BY Zaehl_Fahrt_Id, Sequenz
I'm quite sure this works:
SELECT a.Vehicle_No, a.Stop1_DepTime,
a.Segment_TravelTime, a.Stop_Sequence, a.Stop1_DepTime +
(SELECT SUM(b.Segment_TravelTime) FROM your_table b
WHERE b.Vehicle_No = a.Vehicle_No AND b.Stop_Sequence < a.Stop_Sequence)
FROM your_table a
ORDER BY a.Vehicle_No