Explain process noise terminology in Kalman Filter - kalman-filter

I am just learning Kalman filter. In the Kalman Filter terminology, I am having some difficulty with process noise. Process noise seems to be ignored in many concrete examples (most focused on measurement noise). If someone can point me to some introductory level link that described process noise well with examples, that’d be great.
Let’s use a concrete scalar example for my question, given:
$x_j = a x_{j-1} + b u_j + w_j$
Let’s say x_j models the temperature within a fridge with time. It is 5 degrees and should stay that way, so we model with a = 1. If at some point t = 100, the temperature of the fridge becomes 7 degrees (ie. hot day, poor insulation), then I believe the process noise at this point is 2 degrees. So our state variable x_100 = 7 degrees, and this is the true value of the system.
Question 1:
To paraphrase a phrase I often see describing the Kalman filter, “we filter the signal x so that the effects of the noise w are minimized” (http://www.swarthmore.edu/NatSci/echeeve1/Ref/Kalman/ScalarKalman.html): if we minimize the effects of the 2 degrees, are we trying to get rid of the 2 degree difference? But the true state is x_100 = 7 degrees. What exactly are we doing to the process noise w when we Kalman filter?
Question 2:
The process noise has a variance of Q. In the simple fridge example, it seems easy to model because you know the underlying true state is 5 degrees and you can take Q as the deviation from that state. But if the true underlying state is fluctuating with time, when you model, what part of this would be considered state fluctuation vs. “process noise”. And how do we go about determining a good Q (again example would be nice)?
Since Q is added to the covariance prediction at every time step (see the covariance prediction formula at http://greg.czerniak.info/guides/kalman1/), it seems that if you select an overly large Q, the Kalman filter would not be well-behaved.
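A minimal scalar sketch of that concern, assuming the model x_j = a x_{j-1} + w_j with a direct measurement z_j = x_j + v_j, Var(w) = Q and Var(v) = R (the function name `steady_state_gain` is illustrative). Although Q is added at every prediction, the update step shrinks P again, so in this scalar case P settles to a steady value and a large Q mainly pushes the Kalman gain towards 1 rather than making the filter diverge.

```python
def steady_state_gain(a, q, r, p0=1.0, steps=500):
    """Run the scalar covariance recursion and return the final Kalman gain.

    Assumed model: x_j = a * x_{j-1} + w_j with Var(w) = q,
    measured directly as z_j = x_j + v_j with Var(v) = r.
    """
    p = p0
    k = 0.0
    for _ in range(steps):
        p_pred = a * p * a + q        # covariance prediction: Q is added at every step
        k = p_pred / (p_pred + r)     # Kalman gain
        p = (1.0 - k) * p_pred        # covariance update shrinks P again
    return k


for q in (0.0, 0.01, 1.0, 100.0):
    print(f"Q = {q:>6}: steady-state gain = {steady_state_gain(1.0, q, r=1.0):.3f}")
```

With R = 1 the gain rises from almost 0 (Q = 0) towards 1 (Q = 100). A gain near 1 means each estimate is essentially the latest measurement, so in this sketch an overly large Q makes the output noisy rather than unstable.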
Thanks.
EDIT1 My Interpretation
My interpretation of the term process noise is the difference between the actual state of the system and the state predicted by the state transition model (i.e. a * x_j-1). What the Kalman filter tries to do is bring the prediction closer to the actual state. In that sense, it partially "incorporates" the process noise into the estimate through the residual feedback mechanism, rather than "eliminating" it, so that it can predict the actual state better. I have not seen such an explanation anywhere in my search, and I would appreciate comments on this view.

In Kalman filtering the "process noise" represents the idea/feature that the state of the system changes over time, but we do not know the exact details of when/how those changes occur, and thus we need to model them as a random process.
In your refrigerator example:
the state of the system is the temperature;
we obtain measurements of the temperature at some interval, say hourly, by looking at the thermometer dial. Note that you usually need to represent the uncertainties involved in the measurement process in Kalman filtering, but you didn't focus on this in your question. Let's assume that these errors are small;
at time t you look at the thermometer and see that it says 7 degrees; since we've assumed the measurement errors are very small, the true temperature is (very close to) 7 degrees.
Now the question is: what is the temperature at some later time, say 15 minutes after you looked?
If we don't know if/when the condenser in the refrigerator turns on, we could have:
1. the temperature at the later time is even higher than 7 degrees (within those 15 minutes it gets close to the maximum temperature of a cycle),
2. lower, if the condenser is or has been running, or even
3. just about the same.
This idea that there is a distribution of possible outcomes for the real state of the system at some later time is the "process noise".
Note: my qualitative model for the refrigerator is: when the condenser is not running, the temperature rises until it reaches a threshold a few degrees above the nominal target temperature (this threshold is set by a sensor, so there may be noise in the temperature at which the condenser turns on); the condenser then stays on until the temperature drops a few degrees below the set temperature. Also note that if someone opens the door, the temperature jumps; since we don't know when someone might do this, we model it as a random process.
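To make the "distribution of possible outcomes" concrete, here is a small Monte Carlo sketch. It assumes, purely for illustration, that the fridge temperature behaves like a random walk x_j = x_{j-1} + w_j with one step per minute and Q = 0.05 degrees² per step; these numbers are not from the answer. Starting from the observed 7 degrees, the spread of temperatures after 15 steps has a variance close to 15 Q.

```python
import random
import statistics


def simulate_temperatures(x0, q, steps, n_runs, seed=0):
    """Propagate the assumed random walk x_j = x_{j-1} + w_j, w_j ~ N(0, q), n_runs times.

    Returns the final temperature of each run.
    """
    rng = random.Random(seed)
    sigma = q ** 0.5
    finals = []
    for _ in range(n_runs):
        x = x0
        for _ in range(steps):
            x += rng.gauss(0.0, sigma)   # one unknown disturbance per step
        finals.append(x)
    return finals


finals = simulate_temperatures(x0=7.0, q=0.05, steps=15, n_runs=10_000)
print(f"mean {statistics.fmean(finals):.2f}, variance {statistics.pvariance(finals):.3f}")
```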

Yeah, I don't think that sentence is a good one. The primary purpose of a Kalman filter is to minimize the effects of observation noise, not process noise. I think the author may be conflating Kalman filtering with Kalman control (where you ARE trying to minimize the effect of process noise).
The state does not "fluctuate" over time, except through the influence of process noise.
Remember, a system does not generally have an inherent "true" state. A refrigerator is a bad example, because it's already a control system, with nonlinear properties. A flying cannonball is a better example. There is some place where it "really is", but that's not intrinsic to A. In this example, you can think of wind as a kind of "process noise". (Not a great example, since it's not white noise, but work with me here.) The wind is a 3-dimensional process noise affecting the cannonball's velocity; it does not directly affect the cannonball's position.
Now, suppose that the wind in this area always blows northwest. We should see a positive covariance between the north and west components of wind. A deviation of the cannonball's velocity northwards should make us expect to see a similar deviation to westward, and vice versa.
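A sketch of such a correlated Q with made-up numbers (unit variance per axis, covariance 0.8 between the north and west components). Sampling 2-D noise through a hand-written Cholesky factor of Q reproduces the positive covariance described above; the helper names are hypothetical and only the standard library is used.

```python
import math
import random


def sample_wind(q, n, seed=0):
    """Draw n samples of 2-D (north, west) noise with covariance matrix q."""
    # 2x2 Cholesky factor L with L @ L.T == q
    l11 = math.sqrt(q[0][0])
    l21 = q[1][0] / l11
    l22 = math.sqrt(q[1][1] - l21 ** 2)
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)   # independent standard normals
        samples.append((l11 * z1, l21 * z1 + l22 * z2))     # correlated via L
    return samples


def sample_cov(samples):
    """Sample covariance between the north and west components."""
    n = len(samples)
    mean_n = sum(s[0] for s in samples) / n
    mean_w = sum(s[1] for s in samples) / n
    return sum((s[0] - mean_n) * (s[1] - mean_w) for s in samples) / n


# Hypothetical Q for a wind that mostly blows northwest
Q = [[1.0, 0.8], [0.8, 1.0]]
wind = sample_wind(Q, 20_000)
print(f"sample north/west covariance: {sample_cov(wind):.2f}")
```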
Think of Q more as covariance than as variance; the autocorrelation aspect of it is almost incidental.

It's a good discussion going on here. I would like to add that the concept of process noise is that whatever prediction is made from the model has some error, and that error is represented by the Q matrix. If you look at the KF equation for predicting the covariance matrix (P_prediction), which is actually the mean squared error of the predicted state, Q is simply added to it: $P_{\text{predict}} = A P A' + Q$. I suggest finding a derivation of the KF equations; it would give you good insight.
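The covariance prediction P_predict = A P A' + Q can be written out directly. This sketch assumes a 2-state constant-velocity model (position, velocity) with dt = 1 and uses the common piecewise-constant white-acceleration form of Q with unit acceleration variance; the helper names are illustrative.

```python
def matmul(x, y):
    """Plain-list matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


def predict_covariance(a, p, q):
    """P_predict = A P A' + Q."""
    apat = matmul(matmul(a, p), transpose(a))
    return [[apat[i][j] + q[i][j] for j in range(len(q))] for i in range(len(q))]


dt = 1.0
A = [[1.0, dt], [0.0, 1.0]]                 # constant-velocity state transition
P = [[1.0, 0.0], [0.0, 1.0]]                # current state covariance
Q = [[dt ** 4 / 4, dt ** 3 / 2],            # white-acceleration process noise
     [dt ** 3 / 2, dt ** 2]]
print(predict_covariance(A, P, Q))
```

Even with a perfectly known current state (P = 0), the predicted covariance is Q itself, which is exactly the model error that the process noise stands for.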

If your state-transition model is exact, process noise would be zero. In real-world, it would be nearly impossible to capture the exact state-transition with a mathematical model. The process noise captures that uncertainty.

Related

LSTM Evolution Forecast

I am confused about how LSTM networks work when forecasting without a fixed horizon, i.e. when I want a prediction at any arbitrary time in the future. In physical terms I would call this the evolution of the system.
Suppose I have a time series $y(t)$ (output) I want to forecast, and some external inputs $u_1(t), u_2(t),\cdots u_N(t)$ on which the series $y(t)$ depends.
It's common to use the lagged value of the output $y(t)$ as input for the network, such that I schematically have something like (let's consider for simplicity just lag 1 for the output and no lag for the external input):
$$[y(t-1), u_1(t), u_2(t), \cdots, u_N(t)] \to y(t)$$
With the network set up this way, a recursive forecast is forced to use the predicted value from the previous step as the input for the next step. This propagates errors, which makes the long-term forecast behave badly.
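The error-propagation effect can be reproduced without any neural network. This toy sketch (an illustrative assumption, not an LSTM) uses a linear AR(1) model y(t) = phi y(t-1) + u(t) whose coefficient is slightly mis-estimated (0.95 instead of the true 0.9). The one-step error stays at 0.5 when the observed previous value is fed in, but grows towards 10 when the model's own predictions are fed back recursively.

```python
def recursive_forecast(phi, y0, inputs):
    """Feed each prediction back in as the lagged value for the next step."""
    preds = []
    y_prev = y0
    for u in inputs:
        y_prev = phi * y_prev + u
        preds.append(y_prev)
    return preds


def one_step_forecast(phi, y_observed, inputs):
    """Use the observed previous value at every step instead of the prediction."""
    return [phi * y_prev + u for y_prev, u in zip(y_observed[:-1], inputs)]


H = 50
inputs = [1.0] * H
truth = recursive_forecast(0.9, 10.0, inputs)      # the "real" system (stays at 10)
model_phi = 0.95                                   # slightly wrong learned coefficient
recursive = recursive_forecast(model_phi, 10.0, inputs)
one_step = one_step_forecast(model_phi, [10.0] + truth, inputs)

rec_err = [abs(p - t) for p, t in zip(recursive, truth)]
step_err = [abs(p - t) for p, t in zip(one_step, truth)]
print(f"horizon 1:  recursive {rec_err[0]:.2f}, one-step {step_err[0]:.2f}")
print(f"horizon {H}: recursive {rec_err[-1]:.2f}, one-step {step_err[-1]:.2f}")
```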
Now, my confusion is this: I think of an RNN as a (simple) implementation of a state-space model, where I have the inputs, my output, and one or more state variables responsible for the memory of the system. These variables are hidden and not observed.
So my question is: if these variables already take previous states of the system into account, why would I need to use the lagged output value as an input to my network/model?
If I drop it, would my long-term forecast be better, since I would no longer expect the error of the forecasted output to propagate? (I guess there will still be an error propagating through the internal state.)
Thanks !
Please see DeepAR, an LSTM forecaster for more than one step into the future.
The main contributions of the paper are twofold: (1) we propose an RNN
architecture for probabilistic forecasting, incorporating a negative
Binomial likelihood for count data as well as special treatment for
the case when the magnitudes of the time series vary widely; (2) we
demonstrate empirically on several real-world data sets that this
model produces accurate probabilistic forecasts across a range of
input characteristics, thus showing that modern deep learning-based
approaches can effective address the probabilistic forecasting
problem, which is in contrast to common belief in the field and the
mixed results
In this paper, they forecast multiple steps into the future to counter exactly what you describe, namely error propagation.
Skipping several steps allows you to get more accurate predictions further into the future.
One more thing done in this paper is predicting percentiles, and interpolating, rather than predicting the value directly. This adds stability, and an error assessment.
Disclaimer - I read an older version of this paper.

I wonder why some have inputs and some don't in Kalman filter

Whenever I study the Kalman filter, I see two kinds of algorithms.
One has an input matrix; the other doesn't.
This always confuses me.
Please let me know what the difference is.
The 'inputs' are often called 'controls', and that is a typical case. For example, if the system were a car, u might represent the position of the steering wheel, the position of the accelerator, and so on. These do not help to determine the position directly, but they do help when predicting the new state from the old. In some cases the Kalman filter may be part of a larger system that actually produces these control signals.
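A sketch of how the input changes only the state prediction step, x_pred = A x + B u. It assumes a hypothetical 1-D car with state (position, velocity), dt = 1 s, and a control u interpreted as a commanded acceleration. Without access to u, the B u term is simply dropped, and its effect then has to be covered by the process noise Q.

```python
def predict_state(a, x, b=None, u=None):
    """x_pred = A x, plus B u when the control inputs are known."""
    x_pred = [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]
    if b is not None and u is not None:
        for i in range(len(x_pred)):
            x_pred[i] += sum(b[i][k] * u[k] for k in range(len(u)))
    return x_pred


dt = 1.0
A = [[1.0, dt], [0.0, 1.0]]          # position, velocity
B = [[0.5 * dt ** 2], [dt]]          # hypothetical: u is a commanded acceleration
x = [0.0, 10.0]
print(predict_state(A, x))           # no control information available
print(predict_state(A, x, B, [2.0])) # accelerator mapped to +2 m/s^2 (assumption)
```

Here the known acceleration of 2 m/s² moves the predicted position from 10.0 to 11.0 and the predicted velocity from 10.0 to 12.0.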
However one may not have access to such information. For example if one was estimating the position (and so on) of a car externally, for example using radar readings, the state of the car controls may be unknown, and so cannot be included.
Note that there is another difference between the two images, and that is the occurrence (in the input case) of a matrix Q that is added to the predicted state covariance. This is not related to the presence of the inputs and indeed I think that its omission from the no-input case is a mistake. Without such a term the state error covariance matrix will collapse over time to 0 and the filter will fail.
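A scalar sketch of that failure mode. It assumes the random-walk model x_j = x_{j-1} + w_j, R = 1, a true state that drifts by 0.1 per step, and noiseless measurements so the run is deterministic. With Q = 0, P shrinks roughly like 1/k, the gain goes to zero and the estimate falls further and further behind; a modest Q keeps P, and therefore the gain, bounded away from zero.

```python
def track_ramp(q, r=1.0, steps=100, slope=0.1):
    """Track a state drifting by `slope` per step; return (final error, final P)."""
    x_hat, p = 0.0, 1.0
    truth = 0.0
    for _ in range(steps):
        truth += slope
        z = truth                       # noiseless measurement (simplifying assumption)
        p_pred = p + q                  # only Q re-inflates the covariance
        k = p_pred / (p_pred + r)       # Kalman gain
        x_hat = x_hat + k * (z - x_hat)
        p = (1.0 - k) * p_pred
    return abs(truth - x_hat), p


for q in (0.0, 0.1):
    err, p = track_ramp(q)
    print(f"Q = {q}: final error {err:.2f}, final P {p:.4f}")
```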
The second image is not a good representation of the Kalman filter. As dmuir said, the initialization process basically gives the filter information about the current states. In the linear case, errors in the initial state are tolerable, whereas in the nonlinear case, i.e. the Extended KF, which is the most used variant of the Kalman filter, initial conditions are very important. Most of the time, if you cannot initialize your filter with a close approximation of the real initial state, your filter will diverge.
But it looks like you need to study the whole filter again. I would suggest this site, which helped me greatly through my MSc: How Kalman Filters Work

Sarsa and Q Learning (reinforcement learning) don't converge to the optimal policy

I have a question about my own project for testing reinforcement learning techniques. First, let me explain the purpose. I have an agent that can take 4 actions during 8 steps. At the end of these eight steps, the agent can be in 5 possible victory states. The goal is to find the minimum cost. To reach these 5 victory states (with different costs: 50, 50, 0, 40, 60), the agent does not take the same path (like a graph). The blue states are the fail states (sorry for the quality), where the episode is stopped.
The optimal path is: DCCBBAD
Now my question: I don't understand why in SARSA and Q-learning (mainly in Q-learning) the agent finds a path, but not the optimal one, after 100 000 iterations (always DACBBAD/DACBBCD). Sometimes when I run it again, the agent finds the good path (DCCBBAD). So I would like to understand why the agent sometimes finds it and sometimes doesn't. Is there a way to stabilize my agent?
Thank you a lot,
Tanguy
TL;DR
Set your epsilon so that you explore a bunch for a large number of episodes. E.g. Linearly decaying from 1.0 to 0.1.
Set your learning rate to a small constant value, such as 0.1.
Don't stop your algorithm based on number of episodes but on changes to the action-value function.
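A sketch of the first point: a linear epsilon schedule over a hypothetical number of episodes `n_decay`, plus an epsilon-greedy action choice. The function names are illustrative.

```python
import random


def linear_epsilon(episode, n_decay, eps_start=1.0, eps_end=0.1):
    """Linearly decay epsilon from eps_start to eps_end over n_decay episodes, then hold it."""
    frac = min(episode / n_decay, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


for episode in (0, 2500, 5000, 10000):
    print(episode, round(linear_epsilon(episode, n_decay=5000), 3))
```

Because epsilon never drops below `eps_end`, every action keeps a non-zero chance of being tried, which is what the exploration condition below asks for.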
More detailed version:
Q-learning is only guaranteed to converge under the following conditions:
1. You must visit all state-action pairs infinitely often.
2. The sum of the learning rates over all timesteps must be infinite, that is $\sum_t \alpha_t = \infty$.
3. The sum of the squares of the learning rates over all timesteps must be finite, that is $\sum_t \alpha_t^2 < \infty$.
To hit 1, just make sure your epsilon is not decaying to a low value too early. Make it decay very very slowly and perhaps never all the way to 0. You can try , too.
To hit 2 and 3, you must ensure you take care of 1, so that the learning rates sum to infinity, but also pick your learning rate so that the sum of its squares is finite. That basically means keeping the learning rate ≤ 1. If your environment is deterministic, you should try a learning rate of 1. A deterministic environment here means that taking action a in state s always leads to the same state s', for all states and actions in your environment. If your environment is stochastic, you can try a low value, such as 0.05-0.3.
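A sketch of the tabular Q-learning update with an explicit learning rate `alpha`, using a dictionary keyed by (state, action); the names are illustrative. Since the question minimizes cost, the reward is assumed to be the negative cost. Returning the size of each update makes it easy to follow the "stop on changes to the action-value function" advice: stop once the largest change over a whole episode falls below a small tolerance.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, n_actions,
                      alpha=0.1, gamma=1.0, terminal=False):
    """Apply one tabular Q-learning update and return the size of the change."""
    # Target: reward plus the discounted best action value in the next state
    best_next = 0.0 if terminal else max(q[(next_state, a)] for a in range(n_actions))
    target = reward + gamma * best_next
    change = alpha * (target - q[(state, action)])
    q[(state, action)] += change
    return abs(change)


q = defaultdict(float)                       # unseen (state, action) pairs start at 0
q[("s1", 0)], q[("s1", 1)] = -5.0, -2.0      # toy values for a 2-action example
delta = q_learning_update(q, "s0", 0, reward=-1.0, next_state="s1",
                          n_actions=2, alpha=1.0)
print(q[("s0", 0)], delta)
```

In a deterministic environment, `alpha=1.0` sets Q(s, a) straight to its target (-1 + max(-5, -2) = -3 here) in a single update.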
Maybe check out https://youtu.be/wZyJ66_u4TI?t=2790 for more info.

Kalman Filter corrected by known path

I am trying to get filtered velocity/spatial data from noisy position data from a tracked vehicle. I have a set of noisy position/time data = (x_i,y_i,t_i) and a known curve along which the vehicle is traveling, curve = (x(s),y(s)), where s is total distance along the curve. I can run a Kalman filter on the data, but I don't know how to constrain it to the 'road' without throwing out data that is too far from the road, which I don't want to do.
Alternatively, I'm trying to estimate the value of s along the constrained path from position data that is noisy in x and y.
Does anyone have an idea of how to merge the two types of data?
Thanks!
Do you understand what a Kalman filter does? Fundamentally, it assigns a probability to each possible state given just the observables. In simple cases, this doesn't use a priori knowledge. But in your case, you can simply set the off-road estimates to zero and renormalize the remaining probabilities.
Note: this isn't throwing out observables which are too far off the road, or even discarding outcomes which are too far off. It means that an apparent off-road position strongly increases the probabilities of an outcome on, but near the edge of the road.
If you want the model to allow small excursions away from the road, you can use a fast decaying function to model the low but non-zero probability of a car being off the road.
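A one-dimensional sketch of this idea under simplifying assumptions: the road is the band |y| <= half_width around a straight centre line, candidate lateral offsets form a fixed grid, and the measurement likelihood is Gaussian. With `decay=None` off-road candidates get zero prior weight (the "set to zero and renormalize" version); passing a decay length gives the fast-decaying exp(-(d/decay)²) variant. All names and numbers are hypothetical.

```python
import math


def road_posterior(candidates_y, z_y, sigma, half_width, decay=None):
    """Combine a Gaussian measurement likelihood with a road prior and renormalize."""
    weights = []
    for y in candidates_y:
        likelihood = math.exp(-0.5 * ((y - z_y) / sigma) ** 2)
        d = max(0.0, abs(y) - half_width)        # distance outside the road
        if d == 0.0:
            prior = 1.0
        elif decay is None:
            prior = 0.0                          # off-road states are impossible
        else:
            prior = math.exp(-((d / decay) ** 2))  # small but non-zero excursions
        weights.append(likelihood * prior)
    total = sum(weights)
    return [w / total for w in weights]


ys = [i * 0.25 for i in range(-16, 17)]          # candidate offsets from -4 m to +4 m
post = road_posterior(ys, z_y=3.0, sigma=1.0, half_width=1.0)
best = ys[max(range(len(ys)), key=lambda i: post[i])]
print(f"most probable offset: {best} m")
```

For an observation 3 m from the centre of a 2 m wide road, the most probable offset becomes the road edge at 1 m, matching the "on, but near the edge of the road" behaviour described above.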
You could have as states the distance s along the path, and the rate of change of s. The position observations X and Y will then be non-linear functions of the state (assuming your track is not a line) so you'll need to use an extended or unscented filter.
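A minimal extended Kalman filter sketch of this suggestion. It assumes, for illustration only, a circular track of radius 50 m, so x(s) = R cos(s/R) and y(s) = R sin(s/R), a constant-speed model for (s, ds/dt), and 2 m position noise. For a real road you would replace `curve()` and `curve_jacobian()` with your own parameterization and its derivative. Plain lists keep it dependency-free.

```python
import math
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(r) for r in zip(*a)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(len(a[0]))] for i in range(len(a))]

def inv2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


RADIUS = 50.0  # hypothetical circular track


def curve(s):
    return [RADIUS * math.cos(s / RADIUS), RADIUS * math.sin(s / RADIUS)]


def curve_jacobian(s):
    # d(x, y)/d(s, s_dot); the position does not depend on s_dot
    return [[-math.sin(s / RADIUS), 0.0], [math.cos(s / RADIUS), 0.0]]


def ekf_step(x, p, z, dt, q, r):
    """One predict/update cycle for state x = [s, s_dot] and measurement z = [x_obs, y_obs]."""
    a = [[1.0, dt], [0.0, 1.0]]
    x = [x[0] + dt * x[1], x[1]]                         # constant-speed prediction
    p = add(matmul(matmul(a, p), transpose(a)), q)
    h = curve_jacobian(x[0])                             # linearize around the prediction
    s_mat = add(matmul(matmul(h, p), transpose(h)), r)   # innovation covariance
    k = matmul(matmul(p, transpose(h)), inv2(s_mat))     # Kalman gain (2x2)
    pred = curve(x[0])
    innov = [z[0] - pred[0], z[1] - pred[1]]
    x = [x[0] + k[0][0] * innov[0] + k[0][1] * innov[1],
         x[1] + k[1][0] * innov[0] + k[1][1] * innov[1]]
    kh = matmul(k, h)
    p = matmul([[1.0 - kh[0][0], -kh[0][1]], [-kh[1][0], 1.0 - kh[1][1]]], p)
    return x, p


rng = random.Random(1)
true_s, true_v, dt = 0.0, 10.0, 1.0
x, p = [0.0, 5.0], [[10.0, 0.0], [0.0, 25.0]]   # deliberately poor initial speed guess
q = [[0.01, 0.0], [0.0, 0.001]]
r = [[4.0, 0.0], [0.0, 4.0]]                    # 2 m standard deviation in x and y
for _ in range(50):
    true_s += true_v * dt
    tx, ty = curve(true_s)
    z = [tx + rng.gauss(0.0, 2.0), ty + rng.gauss(0.0, 2.0)]
    x, p = ekf_step(x, p, z, dt, q, r)
print(f"true s = {true_s:.1f}, estimated s = {x[0]:.1f}, estimated speed = {x[1]:.2f}")
```

Because every estimate is a distance s along the curve, the filtered position always lies on the road, while off-road measurements still contribute through their component along the track.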

How to represent stereo audio data for FFT

How should stereo (2 channel) audio data be represented for FFT? Do you
A. Take the average of the two channels and assign it to the real component of a number and leave the imaginary component 0.
B. Assign one channel to the real component and the other channel to the imag component.
Is there a reason to do one or the other? I searched the web but could not find any definite answers on this.
I'm doing some simple spectrum analysis and, not knowing any better, used option A). This gave me an unexpected result, whereas option B) went as expected. Here are some more details:
I have a WAV file of a piano "middle-C". By definition, middle-C is 260Hz, so I would expect the peak frequency to be at 260Hz and smaller peaks at harmonics. I confirmed this by viewing the spectrum via an audio editing software (Sound Forge). But when I took the FFT myself, with option A), the peak was at 520Hz. With option B), the peak was at 260Hz.
Am I missing something? The explanation I have come up with so far is that representing stereo data with real and imag components implies that the two channels are independent, which I suppose they're not, hence the mess-up.
I don't think you're taking the average correctly. :-)
C. Process each channel separately, assigning the amplitude to the real component and leaving the imaginary component as 0.
Option B does not make sense. Option A, which amounts to convert the signal to mono, is OK (if you are interested in a global spectrum).
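A quick check that option A does not move the peak, using a naive DFT and a synthetic tone (256 Hz at an assumed 2048 Hz sample rate, chosen so that it falls exactly on bin 32; these numbers are not from the question). Each channel processed separately (option C) and the averaged mono signal (option A) all peak at the same frequency.

```python
import cmath
import math


def dft(samples):
    """Naive O(N^2) DFT of a real-valued sequence (imaginary parts are 0)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def peak_bin(samples):
    spectrum = dft(samples)
    half = len(samples) // 2             # real input: upper bins mirror the lower half
    return max(range(1, half), key=lambda k: abs(spectrum[k]))


n, fs, f0 = 256, 2048.0, 256.0           # hypothetical test tone
left = [math.sin(2 * math.pi * f0 * t / fs) for t in range(n)]
right = [0.5 * math.sin(2 * math.pi * f0 * t / fs + 0.3) for t in range(n)]
mono = [(l + r) / 2 for l, r in zip(left, right)]   # option A: average to mono

for name, channel in (("left", left), ("right", right), ("mono", mono)):
    k = peak_bin(channel)
    print(f"{name}: peak at bin {k} = {k * fs / n:.0f} Hz")
```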
Your problem (double freq) is surely related to some misunderstanding in the use of your FFT routines.
Once you take the FFT you need to get the Magnitude of the complex frequency spectrum. To get the magnitude you take the absolute of the complex spectrum |X(w)|. If you want to look at the power spectrum you square the magnitude spectrum, |X(w)|^2.
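A sketch of the magnitude and power computation on a complex spectrum, again using a naive DFT so that only the standard library is needed. For a cosine of amplitude A sitting exactly on bin k0 of an N-point DFT, |X[k0]| = A N / 2, which gives a simple sanity check (96 for A = 3, N = 64). The test signal is an assumption for illustration.

```python
import cmath
import math


def dft(samples):
    """Naive O(N^2) DFT."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
            for k in range(n)]


def magnitude_and_power(spectrum):
    """Return |X(w)| and |X(w)|^2 for every frequency bin."""
    mags = [abs(c) for c in spectrum]
    return mags, [m * m for m in mags]


n, k0, amp = 64, 5, 3.0                  # hypothetical tone exactly on bin 5
signal = [amp * math.cos(2 * math.pi * k0 * t / n) for t in range(n)]
mags, power = magnitude_and_power(dft(signal))
print(f"|X[{k0}]| = {mags[k0]:.2f}, power = {power[k0]:.2f}")
```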
In terms of your frequency shift I think it has to do with you setting the imaginary parts to zero.
Imagine the complex frequency spectrum as a series of complex vectors, or position vectors, in a Cartesian space. If you took one discrete frequency bin X(w), there would be one real component representing its direction along the real axis (x-direction), and one imaginary component along the imaginary axis (y-direction). There are four important values for this discrete frequency: 1. real value, 2. imaginary value, 3. magnitude, and 4. phase. If you take just the real value and set the imaginary part to 0, you are setting magnitude = |real| and phase = 0° or 180°. You have thereby modified the resulting spectrum and applied a bias to every frequency bin. Take a look at the wiki on the magnitude of a vector, also called the Euclidean norm of a vector, to brush up on your understanding. Leonbloy was correct, but I hope this was more informative.
Think of the FFT as a way to get information from a single signal. What you are asking is what is the best way to display data from two signals. My answer would be to treat each independently, and display an FFT for each.
If you want a really fast streaming FFT you can read about an algorithm I wrote here: www.depthcharged.us/?p=176