Fuel Consumption data via OBD2 is wrong - can you help me out? - obd-ii

So I am trying to get real-time fuel consumption data from my car (2021 Kia Sorento PHEV) via OBD2. I've read up on the topic and it seems simple enough:
Fuel consumption in liters per hour (PID 5E(hex)/94(dec) "Engine fuel rate") divided by speed in km/h gives liters per km; multiplied by 100, that's liters/100km.
The problem is: the results are... absurd. When I coast around town at ~50 km/h and the gauge cluster reads an instantaneous fuel consumption of ~3-4 liters/100km, the OBD2 data suggests a usage of ~17-21 liters/100km.
I've started to calculate the fuel rate in l/h manually using MAP, AFR, etc. data from the OBDII port, and I arrive at the same value for liters/hour and therefore at the same absurd instantaneous consumption values.
OBD2 Bluetooth dongles and popular apps like "Car Scanner" or Torque also report this implausibly high instantaneous fuel consumption.
So I am asking you guys: is there some alternate formula for fuel consumption that I (and the developers of all those Android apps) am not aware of?
Thanks :)

Instantaneous consumption can show some "wild" results.
Top Gear's Richard Hammond made reference to this in one series when he pointed out he was getting 99 mpg going downhill.
If you want an accurate check of fuel consumption, the most accurate method I know of is to "brim" the tank, drive, then "brim" the tank again. The distance driven and the fuel added then give you the consumption.
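For what it's worth, here is a minimal Python sketch of both calculations: the instantaneous figure built from PID 5E (engine fuel rate, L/h) and PID 0D (vehicle speed, km/h), and the brim-to-brim check described above. The function names and sample values are hypothetical; only the arithmetic comes from this thread.

def instant_consumption_l_per_100km(fuel_rate_l_per_h, speed_km_per_h):
    # instantaneous consumption: (L/h) / (km/h) = L/km, then x100 for L/100km
    if speed_km_per_h <= 0:
        return float("inf")  # undefined while stationary
    return fuel_rate_l_per_h / speed_km_per_h * 100.0

def brim_to_brim_l_per_100km(litres_refilled, km_driven):
    # the tank-to-tank check: fill up, drive, fill up again,
    # divide the fuel added by the distance driven
    return litres_refilled / km_driven * 100.0

print(instant_consumption_l_per_100km(2.0, 50.0))  # 4.0 L/100 km
print(brim_to_brim_l_per_100km(35.0, 700.0))       # 5.0 L/100 km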


Is there a way to combine these variables in a way that makes sense?

Hello Stack Overflow community!
I am a sociology student working on a thesis project comparing home value appreciation and neighborhood racial composition over time.
I'm currently using two separate data sources and trying to combine them in a way that makes sense without aggregating anything.
The first data source is GIS data which has information on home sales in each year by home. The second is census data which has yearly estimates of racial composition by census tract. Both are in .csv formats.
My goal is to create a set of variables for each home row in the GIS data which represents the racial composition for the tract the home is in at the year it was sold (e.g. home 1 | 2010 | $500,000 | Census tract 10 | 10% white).
I began doing this by going into Stata and using the following strategy:
For example, if I'm looking at a home sold in 2010 in Census tract 10 and I find that this tract was 10% white in 2010, using something like
replace percentwhite = 10 if censustract == 10 & year == 2010
However, this seemed incredibly time-consuming, as I'm using data that go back decades and cover a couple dozen Census tracts.
Does anyone have any suggestions on how I might do this smarter, not harder? The first thought I had was to aggregate the data by census tract and year, but was hoping to avoid that if possible. Thank you so much in advance for your help and have a terrific day and start to the new year!
It sounds like you can simply merge census data onto your GIS data. That will be much less painful than using -replace-. Here's an example:
*GIS data: information on home sales in each year by home
clear
input censustract house_id year house_value_k
10 100 2010 200
11 101 2020 500
11 102 1980 100
end
tempfile GIS_data
save `GIS_data'
*census data: yearly estimates of racial composition by census tract
clear
input censustract year percentwhite
10 2010 20
10 2000 10
11 2010 25
11 2000 5
end
tempfile census_data
save `census_data'
*easy method: merge the census data onto your GIS data
use `GIS_data', clear
merge m:1 censustract year using `census_data'
drop if _merge==2
list
*hard method: use -replace-
use `GIS_data', clear
gen percentwhite=.
replace percentwhite=20 if censustract==10 & year==2010
replace percentwhite=10 if censustract==10 & year==2000
replace percentwhite=25 if censustract==11 & year==2010
replace percentwhite=5 if censustract==11 & year==2000
list
Both methods "work", but using -merge- is much easier and less prone to errors.
Note: I intentionally created the data sets so that the merge wouldn't be perfect. You will likely want to drop some of the observations in that case; in the code above I dropped observations with _merge==2.
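Since both of your sources are .csv files anyway, the same merge can also be sketched in Python/pandas. This is a minimal sketch, assuming hypothetical file names and the column names from the example above:

import pandas as pd

# hypothetical file names; columns as in the Stata example above
gis = pd.read_csv("gis_home_sales.csv")         # censustract, house_id, year, house_value_k
census = pd.read_csv("census_composition.csv")  # censustract, year, percentwhite

# a left join keeps every home sale and attaches the tract-year composition;
# homes sold in a tract-year with no census row get NaN
# (the equivalent of keeping _merge==1 and dropping _merge==2)
merged = gis.merge(census, on=["censustract", "year"], how="left")
print(merged)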

Small sample size for a regression marketing model

I have sales, advertising spend, and price data for 10 brands in the same industry from 2013-2018. I want to develop an equation to predict 2019 sales.
The variables I have (price and ad spend by type) are: PricePerUnit, Magazine, News, Outdoor, Broadcasting, Print.
My confusion is that I am not sure whether to run the regression using only 2018 data, with 2018 sales as the target variable, adding an additional variable like Past_2Years_Sales (2016-17) to the price and ad spend variables above (for clarity, refer to the image of the data). With this type of data I will have a sample size of only 10, as there are only 10 brands. I think this is too low for linear regression to give correct results.
A second option (which would increase the sample size) is that instead of treating a brand as an observation, I take brand+year as an observation, which increases my sample size to 60 - e.g. Brand A has 6 observations (A-2013, A-2014, ..., A-2018), B has B-2013, B-2014, ..., B-2018, and so on for 10 brands (refer to the image for the data).
Is the second option a valid way to run the regression? What is the right way to run a regression in such situations of small sample size?
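To make the second option concrete, here is a minimal sketch in Python with statsmodels of a pooled brand-year regression. The file and column names are hypothetical; brand dummies are added so each brand keeps its own intercept rather than treating all 60 rows as fully independent:

import pandas as pd
import statsmodels.formula.api as smf

# hypothetical long-format panel: one row per brand-year, 2013-2018
# columns: brand, year, sales, price_per_unit, magazine, news,
#          outdoor, broadcasting, print_spend
df = pd.read_csv("brand_panel.csv")

# pooled OLS with brand fixed effects via C(brand)
model = smf.ols(
    "sales ~ price_per_unit + magazine + news + outdoor"
    " + broadcasting + print_spend + C(brand)",
    data=df,
).fit()
print(model.summary())

Note that the nine brand dummies plus six spend/price variables already consume a fair share of the 60 degrees of freedom, so pooling helps but does not make the small-sample concern disappear.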

SARSA and Q-Learning (reinforcement learning) don't converge to the optimal policy

I have a question about my own project for testing reinforcement learning techniques. First let me explain the purpose. I have an agent which can take 4 actions during 8 steps. At the end of these eight steps, the agent can be in 5 possible victory states. The goal is to find the minimum cost. To reach these 5 victory states (with different cost values: 50, 50, 0, 40, 60), the agent does not take the same path (like a graph). The blue states are the fail states (sorry for the image quality), and the episode stops there.
The real good path is: DCCBBAD.
Now my question: I don't understand why in SARSA & Q-Learning (mainly in Q-Learning) the agent finds a path, but not the optimal one, after 100,000 iterations (always DACBBAD/DACBBCD). Sometimes when I run it again, the agent does land on the good path (DCCBBAD). So I would like to understand why it sometimes finds it and sometimes does not. Also, is there anything I can look at in order to stabilize my agent?
Thank you a lot,
Tanguy
TL;DR:
Set your epsilon so that you explore a lot for a large number of episodes, e.g. linearly decaying from 1.0 to 0.1.
Set your learning rate to a small constant value, such as 0.1.
Don't stop your algorithm based on the number of episodes but on changes to the action-value function.
More detailed version:
Q-learning is only guaranteed to converge under the following conditions:
1. You must visit all state-action pairs infinitely often.
2. The sum of the learning rates over all timesteps must be infinite: Σ_t α_t = ∞.
3. The sum of the squares of the learning rates over all timesteps must be finite: Σ_t α_t² < ∞.
To hit 1, just make sure your epsilon is not decaying to a low value too early. Make it decay very, very slowly, and perhaps never all the way to 0.
To hit 2 and 3, you must take care of 1 so that you collect infinite learning rates, but you must also pick your learning rate so that the sum of its squares is finite. That basically means α ≤ 1. If your environment is deterministic, you should try α = 1. A deterministic environment here means that taking an action a in a state s leads to the same state s' every time, for all states and actions in your environment. If your environment is stochastic, you can try a low number, such as 0.05-0.3.
Maybe check out https://youtu.be/wZyJ66_u4TI?t=2790 for more info.
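To make the recipe above concrete, here is a minimal tabular Q-learning sketch in Python with a linearly decaying epsilon and a small constant learning rate. The environment interface (reset()/step() returning state, reward, done, plus an n_actions attribute) is hypothetical; for a minimum-cost task like yours, the costs would enter as negative rewards.

import random
from collections import defaultdict

def q_learning(env, n_episodes=100_000, alpha=0.1, gamma=1.0,
               eps_start=1.0, eps_end=0.1):
    # tabular Q-learning with linearly decaying epsilon-greedy exploration
    Q = defaultdict(lambda: [0.0] * env.n_actions)
    for episode in range(n_episodes):
        frac = episode / max(n_episodes - 1, 1)
        eps = eps_start + frac * (eps_end - eps_start)  # linear decay
        s, done = env.reset(), False
        while not done:
            if random.random() < eps:
                a = random.randrange(env.n_actions)  # explore
            else:
                a = max(range(env.n_actions), key=lambda i: Q[s][i])  # exploit
            s2, r, done = env.step(a)  # hypothetical interface; r = -cost here
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

The key point for an 8-step episodic task is that exploration stays high long enough for the rarely visited DCC... prefix to be tried many times before epsilon bottoms out; with gamma = 1.0 the episode is short enough that no discounting is needed.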

scrapy xpath not returning desired results. Any idea?

Please look at this page http://164.100.47.132/LssNew/psearch/QResult16.aspx?qref=15845. As you would have guessed, I am trying to scrape all the fields on this page. All fields are yield-ed properly except the Answer field. What I find odd is that the page structure for the question and answer is almost the same (Table[1] and Table[2]); the question scrapes perfectly but the Answer does not. Here are my xpaths:
question:
item['q_main'] = Selector(response).xpath('//*[@id="ctl00_ContPlaceHolderMain_GridView2"]/tbody/tr/td/table[1]/tbody/tr/td/text()').extract()
This works perfectly.
Answer:
item['q_answer'] = Selector(response).xpath('//*[@id="ctl00_ContPlaceHolderMain_GridView2"]/tbody/tr/td/table[2]/tbody/tr[2]/td/text()').extract()
This returns a blank. I have reproduced the full xpath, as returned by/verified in XPath Helper and the console.
What am i overlooking? What am I not able to see?
It seems like your xpath has some problem.
Check out this demo from the scrapy shell:
In [1]: response.xpath('//tr[td[@class="mainheaderq" and contains(font/text(), "ANSWER")]]/following-sibling::tr/td[@class="griditemq"]//text()').extract()
Out[1]:
[u'\r\n\r\n',
u'MINISTER OF STATE(I/C) FOR COAL, POWER AND NEW & RENEWABLE ENERGY (SHRI PIYUSH GOYAL)\r\n\r\n ',
u'(a) & (b): So far 29 coal mines have been auctioned under the provisions of Coal Mines (Special Provisions) \r\nAct, 2015 and the Rules made thereunder. The auction process for non-regulated sector viz. Iron and Steel, \r\nCement and Captive Power was based on forward bidding process where bidders had to submit their final price \r\noffer above the applicable floor price. In case of Power sector which is a regulated one, reverse bidding \r\nmethodology was adopted where bidders had to submit bids below the applicable ceiling price, which shall be \r\ntaken as fuel cost in determination of power tariff. In case, bid price reaches Rs. zero in reverse bidding, \r\nthe bidding is based on additional premium payable to the concerned State Government, over and above the \r\nfixed reserve price of Rs. 100/- per tonne.\r\n\r\n',
u'\r\nRevenue which would accrue to the coal bearing State Government concerned comprises of Upfront payment \r\nas prescribed in the tender document, Auction proceeds and Royalty on per tonne of coal production. State-wise \r\ndetails of 29 coal mines auctioned so far along-with specified end-uses and estimated revenue which would accrue \r\nto coal bearing state during the life of mine/lease period as given below:\r\n',
u'\r\n\r\nS.No\tState\t\tSpecified End \u2013Use\t\t\tName of Coal Mine\t\tEstimated Revenueduring \r\n\t\t\t\t\t\t\t\t\t\t\t\tthe life of mine/lease \r\n\t\t\t\t\t\t\t\t\t\t\t\tperiod (Rs. In Crores)\r\n1\tChattishgarh\tNon-Regualted Sector\t\t\tChotia\t\t\t\t51596\r\n\t\t\t\t\t\t\t\tGare Palma IV-4\t\r\n\t\t\t\t\t\t\t\tGare Palma IV-5\t\r\n\t\t\t\t\t\t\t\tGare Palma IV-7\t\r\n\t\t\t\t\t\t\t\tGare-Palma Sector-IV/8\r\n2\tJharkhand\tNon-Regualted Sector\t\t\tBrinda and Sasai\t\t49272\r\n\t\t\t\t\t\t\t\tDumri\r\n\t\t\t\t\t\t\t\tKathautia\r\n\t\t\t\t\t\t\t\tLohari\r\n\t\t\t\t\t\t\t\tMeral\r\n\t\t\t\t\t\t\t\tMoitra\r\n\t\t\tPower\t\t\t\t\tGaneshpur\r\n\t\t\t\t\t\t\t\tJitpur\r\n\t\t\t\t\t\t\t\tTokisud North\r\n3\tMadhya Pradesh\tNon-Regualted Sector\t\t\tBicharpur\t\t\t42811\r\n\t\t\t\t\t\t\t\tMandla North\r\n\t\t\t\t\t\t\t\tMandla-South\r\n\t\t\t\t\t\t\t\tSialGhoghri\r\n\t\t\tPower\t\t\t\t\tAmelia North\r\n4\tMaharashtra\tNon-Regualted Sector\t\t\tBelgaon\t\t\t\t2738\r\n\t\t\t\t\t\t\t\tMarkiMangli III\r\n\t\t\t\t\t\t\t\tNerad Malegaon\r\n5\tOdisha\t\tPower\t\t\t\t\tMandakini\t\t\t33741\r\n\t\t\t\t\t\t\t\tTalabira-I\r\n\t\t\t\t\t\t\t\tUtkal - C\r\n6\tWest Bengal\tNon-Regualted Sector\t\t\tArdhagram\t\t\t13354\r\n\t\t\tPower\t\t\t\t\tSarisatolli\r\n\t\t\t\t\t\t\t\tTrans Damodar\r\n\tTotal\t\t\t\t\t\t\t(29) coal blocks\t\t193512\r\n',
u'\r\n\r\n\r\nCoal mine has been assigned to successful bidder as Designated Custodian in view of a court case.\r\n\r\n',
u'\r\nIn addition, an estimated amount of Rs. 1,41,854 Crores would accrue to coal bearing States from allotment \r\nof 38 coal mines to Central and State PSU\u2019s.\r\n\r\n',
u'Out of these 29 coal mines, 16 are operational coal mines included in Schedule-II of the Act and 13 are \r\nnon-operational included in Schedule-III of the Act. Milestones for development and production of coal \r\nfrom the auctioned coal mines have been prescribed under the Coal Mines Development and Production Agreement \r\nsigned with the Successful Bidder. \r\n\r\n ',
u'(c) & (d): Yes, Sir. A few complaints were received regarding cartelization in bidding. It is not possible to \r\nconclusively establish the same until investigation are carried out by Competent Authority. ',
u'\r\n\r\n\r\nThe Government has not approved the recommendation of NA for declaration of successful bidder in case of \r\n4 coal mines namely Gare Palma IV/2&3, Gare Palma IV/1 and Tara as final closing bid price was not found \r\nto be reflecting fair value. ',
u'\r\n\r\n\r\n']
This sometimes happens when you are dealing with tables; for more information you can refer to this.
At least part of the source of your difficulty lies in the fact that the code you see in the console is not the source html that your spider gets as a response (and on which the selectors operate).
In particular, it is extremely common for a <table> to not include a <tbody>; but when your browser translates the html to the DOM tree, it slaps in <tbody> tags. And there was a time when much of the layout of webpages was actually accomplished with (crazily) nested tables. As a result, the DOM of such a website will typically have many more <tbody> elements than the html source.
What this means in practical terms is that:
It is generally a good idea to find a relatively simple xpath (or CSS selector, or ...) for the element(s) you want to select -- not the behemoth you sometimes get from your developer tools.
It is generally a bad idea to include /tbody in your xpath (unless there is an associated attribute, indicating that the tag exists in the source html).
For the site in question,
response.xpath('//td[@class="griditemq"]').extract()
returns a list with the first element the question and the second element the answer.
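Putting the two answers together, a parse callback along these lines might work; a minimal sketch, with hypothetical field names and the class-based XPath from above:

def parse(self, response):
    # question and answer both live in td[@class="griditemq"]; no /tbody/ needed
    cells = response.xpath('//td[@class="griditemq"]')
    yield {
        'q_main': cells[0].xpath('.//text()').extract(),
        'q_answer': cells[1].xpath('.//text()').extract(),
    }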

Explain process noise terminology in Kalman Filter

I am just learning the Kalman filter, and I am having some difficulty with the process noise terminology. Process noise seems to be ignored in many concrete examples (most focus on measurement noise). If someone can point me to an introductory-level link that describes process noise well with examples, that'd be great.
Let’s use a concrete scalar example for my question, given:
x_j = a·x_{j-1} + b·u_j + w_j
Let's say x_j models the temperature within a fridge over time. It is 5 degrees and should stay that way, so we model it with a = 1. If at some point t = 100 the temperature of the fridge becomes 7 degrees (i.e. hot day, poor insulation), then I believe the process noise at this point is 2 degrees. So our state variable x_100 = 7 degrees, and this is the true value of the system.
Question 1:
If I then paraphrase the phrase I often see describing the Kalman filter, "we filter the signal x so that the effects of the noise w are minimized" (http://www.swarthmore.edu/NatSci/echeeve1/Ref/Kalman/ScalarKalman.html): if we minimize the effects of the 2 degrees, are we trying to get rid of the 2-degree difference? But the true state at t = 100 is x_100 = 7 degrees. What are we doing to the process noise w exactly when we Kalman filter?
Question 2:
The process noise has a variance of Q. In the simple fridge example it seems easy to model, because you know the underlying true state is 5 degrees and you can take Q as the deviation from that state. But if the true underlying state fluctuates with time, what part of this would be considered state fluctuation vs. "process noise" when you model it? And how do we go about determining a good Q (again, an example would be nice)?
I have found that Q is always added to the covariance prediction no matter which time step you are at (see the covariance prediction formula at http://greg.czerniak.info/guides/kalman1/), so if you select an overly large Q, it doesn't seem like the Kalman filter would be well-behaved.
Thanks.
EDIT1: My interpretation
My interpretation of the term process noise is the difference between the actual state of the system and the state modeled from the state-transition matrix (i.e. a·x_{j-1}). What the Kalman filter tries to do is bring the prediction closer to the actual state. In that sense, it actually partially "incorporates" the process noise into the prediction through the residual feedback mechanism, rather than "eliminating" it, so that it can predict the actual state better. I have not read such an explanation anywhere in my search, and I would appreciate anyone commenting on this view.
In Kalman filtering the "process noise" represents the idea/feature that the state of the system changes over time, but we do not know the exact details of when/how those changes occur, and thus we need to model them as a random process.
In your refrigerator example:
- the state of the system is the temperature;
- we obtain measurements of the temperature on some time interval, say hourly, by looking at the thermometer dial.
Note that you usually need to represent the uncertainties involved in the measurement process in Kalman filtering, but you didn't focus on this in your question, so let's assume that these errors are small.
At time t you look at the thermometer and see that it says 7 degrees; since we've assumed the measurement errors are very small, that means the true temperature is (very close to) 7 degrees.
Now the question is: what is the temperature at some later time, say 15 minutes after you looked?
If we don't know if/when the condenser in the refrigerator turns on, we could have:
1. the temperature at the later time is yet higher than 7 degrees (15 minutes manages to get close to the maximum temperature in a cycle),
2. lower, if the condenser is/has been running, or even
3. just about the same.
This idea that there is a distribution of possible outcomes for the real state of the system at some later time is the "process noise".
Note: my qualitative model for the refrigerator is: while the condenser is not running, the temperature goes up until it reaches a threshold temperature a few degrees above the nominal target temperature (there is a sensor involved, so there may be noise in the temperature at which the condenser turns on); the condenser then stays on until the temperature gets a few degrees below the set temperature. Also note that if someone opens the door, there will be a jump in the temperature; since we don't know when someone might do this, we model it as a random process.
Yeah, I don't think that sentence is a good one. The primary purpose of a Kalman filter is to minimize the effects of observation noise, not process noise. I think the author may be conflating Kalman filtering with Kalman control (where you ARE trying to minimize the effect of process noise).
The state does not "fluctuate" over time, except through the influence of process noise.
Remember, a system does not generally have an inherent "true" state. A refrigerator is a bad example, because it's already a control system, with nonlinear properties. A flying cannonball is a better example. There is some place where it "really is", but that's not intrinsic to the state-transition model A. In this example, you can think of wind as a kind of "process noise". (Not a great example, since it's not white noise, but work with me here.) The wind is a 3-dimensional process noise affecting the cannonball's velocity; it does not directly affect the cannonball's position.
Now, suppose that the wind in this area always blows northwest. We should see a positive covariance between the north and west components of wind. A deviation of the cannonball's velocity northwards should make us expect to see a similar deviation to westward, and vice versa.
Think of Q more as covariance than as variance; the autocorrelation aspect of it is almost incidental.
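As a tiny numeric illustration of that covariance view (all numbers hypothetical), a process-noise matrix for the (north, west) wind components with a positive off-diagonal term says exactly that northward deviations tend to come with westward ones:

import numpy as np

# hypothetical process-noise covariance for (north, west) wind components;
# the positive off-diagonal term encodes the northwest-wind correlation
Q = np.array([[1.0, 0.6],
              [0.6, 1.0]])
rng = np.random.default_rng(0)
# draw correlated wind disturbances from N(0, Q)
w = rng.multivariate_normal(mean=[0.0, 0.0], cov=Q, size=1000)
print(np.cov(w.T))  # approximately recovers Q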
It's a good discussion going on here. I would like to add that the concept of process noise is that whatever prediction is made based on the model has some error, and that error is represented using the Q matrix. If you look at the KF equation for the prediction of the covariance matrix (P_predict), which is actually the mean squared error of the state being predicted, Q is simply added to it: P_predict = A·P·A' + Q. It would give good insight to work through the derivation of the KF equations.
If your state-transition model were exact, the process noise would be zero. In the real world it is nearly impossible to capture the exact state transition with a mathematical model; the process noise captures that uncertainty.
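Tying the equations in this thread together, here is a minimal scalar Kalman filter sketch in Python for the fridge example (a = 1, no control input; the noise variances q and r and the temperature readings are hypothetical). It shows where Q enters the covariance prediction (P_predict = A·P·A' + Q) and how the residual feedback pulls the estimate toward the measured 7 degrees, as the EDIT1 interpretation describes:

def scalar_kalman(measurements, a=1.0, q=0.1, r=0.5, x0=5.0, p0=1.0):
    # scalar model: x_j = a*x_{j-1} + w_j with w ~ N(0, q),
    # measurement:  z_j = x_j + v_j with v ~ N(0, r)
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # predict: propagate the state and add the process-noise variance q
        x = a * x
        p = a * p * a + q
        # update: the residual (z - x) feeds back into the estimate
        k = p / (p + r)       # Kalman gain
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

# fridge sits at 5 degrees, then warms to 7 (hypothetical readings)
zs = [5.0, 5.1, 4.9, 7.0, 7.1, 7.0]
print(scalar_kalman(zs))

With a larger q the filter trusts its model less and tracks the jump to 7 degrees quickly; with q close to 0 it would lag behind and average the change away, which is the trade-off Question 2 asks about.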