How to verify performance of designed Kalman Filter? - kalman-filter

In the literature I have come across two ways to verify the performance of a Kalman filter.
(a) During the run of the Kalman filter you keep track of the innovation $e_k = y_k - \hat{y}_k$ in every step $k$, where $y_k$ is the observed output of the system and $\hat{y}_k$ is the predicted output in the same step. Then you take the autocovariance of the innovation, and depending on whether the function exceeds certain bounds, you can conclude something about the performance of your Kalman filter.
But that's where the explanation ends. How do I decide what bounds it shouldn't exceed and what does it say about the Kalman filter design?
(b) Aside from the innovation, you also keep track of the innovation covariance matrix, and in each step you compute a trust region. Again, there is supposed to be a bound over which the function shouldn't grow. But this is where the explanation ends, and I'm left not knowing how to determine the bound or what it says about my Kalman filter.
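To make (a) concrete, here is a minimal sketch of how such a check is often set up, as far as I understand it: for a scalar innovation sequence of length N, the normalized sample autocorrelation at non-zero lags is compared against the band ±1.96/√N, which is the usual large-sample approximation of a 95% band for a white sequence. The band, the function names and the synthetic test signals are my assumptions, not something taken from the literature quoted above.

```python
import math
import random


def innovation_autocorrelation(innov, max_lag):
    """Normalized sample autocorrelation of a scalar innovation sequence (lags 1..max_lag)."""
    n = len(innov)
    mean = sum(innov) / n
    centered = [e - mean for e in innov]
    c0 = sum(e * e for e in centered) / n
    rho = []
    for lag in range(1, max_lag + 1):
        c = sum(centered[k] * centered[k - lag] for k in range(lag, n)) / n
        rho.append(c / c0)
    return rho


def fraction_outside_band(innov, max_lag=20):
    """Fraction of lags whose autocorrelation leaves the approximate 95% band."""
    bound = 1.96 / math.sqrt(len(innov))  # assumed large-sample whiteness band
    rho = innovation_autocorrelation(innov, max_lag)
    return sum(abs(r) > bound for r in rho) / max_lag


# Synthetic stand-ins for recorded innovations (hypothetical data).
rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(2000)]
colored = [0.0]
for w in white[1:]:
    colored.append(0.9 * colored[-1] + w)  # strongly correlated sequence

white_fraction = fraction_outside_band(white)
colored_fraction = fraction_outside_band(colored)
```

For a white sequence only a small share of lags (around 5%) should leave the band; a clearly larger share, as for the correlated sequence here, suggests the filter is leaving structure in the innovations.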

Related

How to preprocess data for Kalman filter

I am reading about Kalman filter techniques and thinking about how to use them, but I am not sure I understand the whole process of using the measured data in the Kalman data step.
Let's assume that you have an accelerometer and you want to estimate speed. You create a Kalman filter with acceleration, velocity and bias states, and a time step and a data step with their equations. You know the noise variances of the accelerometer and of the bias.
Now, the Kalman filter assumes by definition that the measurement noise is white, which is flawed because most of the time it is colored in some way. My questions:
If you use a low-pass filter to get rid of most of the high-frequency noise, the noise won't be white, so that is not going to work either, right?
Is the data preprocessing worth focusing on?
Do you know any techniques/books/articles to fix that? What information do you need about the problem to decide which actions to take?
Are there any data rejection algorithms that are used?
I read a little about using a Kalman filter as a whitening filter too, but I am not sure how to incorporate that into the Kalman filter I described above.

Why is a target network required?

I am having trouble understanding why a target network is necessary in DQN. I’m reading the paper “Human-level control through deep reinforcement learning”.
I understand Q-learning. Q-learning is a value-based reinforcement learning algorithm that learns an “optimal” probability distribution over state-action pairs that will maximize its long-term discounted reward over a sequence of timesteps.
Q-learning is updated using the Bellman equation, and a single step of the Q-learning update is given by
$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$
where $\alpha$ and $\gamma$ are the learning rate and the discount factor.
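As a minimal sketch of that single update step in the tabular case (the dictionary-based table, the state and action names and the numbers are made up for illustration; the bootstrap term uses the maximum over next actions, as in standard Q-learning):

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the new value."""
    best_next = max(q[(next_state, a)] for a in actions)  # max over a' of Q(s', a')
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action example.
q = defaultdict(float)            # unseen (state, action) pairs start at 0
actions = ["left", "right"]
q[("s1", "right")] = 2.0
new_value = q_learning_update(q, "s0", "right", reward=1.0, next_state="s1",
                              actions=actions, alpha=0.5, gamma=0.9)
```

Here the target is 1.0 + 0.9 * 2.0 = 2.8, and with alpha = 0.5 the old value 0 moves halfway towards it, to about 1.4. Only the single entry for ("s0", "right") changes.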
I can understand that the reinforcement learning algorithm will become unstable and diverge.
The experience replay buffer is used so that we do not forget past experiences and to de-correlate datasets provided to learn the probability distribution.
This is where I fail.
Let me break the paragraph from the paper down here for discussion
The fact that small updates to $Q$ may significantly change the policy and therefore change the data distribution: I understood this part. Frequent changes to the Q-network may lead to instability and changes in the distribution, for example if we always take a left turn or something like this.
and the correlations between the action-values ($Q$) and the target values $r + \gamma \max_{a'} Q(s', a')$: this says that the target is the reward plus gamma times my prediction of the return, given that I take what I think is the best action in the next state and follow my policy from then on.
We used an iterative update that adjusts the action-values (Q) towards target values that are only periodically updated, thereby reducing correlations with the target.
So, in summary, a target network is required because the network keeps changing at each timestep and the “target values” are being updated at each timestep?
But I do not understand how it is going to solve it?
The difference between Q-learning and DQN is that you have replaced an exact value function with a function approximator. With Q-learning you are updating exactly one state/action value at each timestep, whereas with DQN you are updating many, which you understand. The problem this causes is that you can affect the action values for the very next state you will be in instead of guaranteeing them to be stable as they are in Q-learning.
This happens basically all the time with DQN when using a standard deep network (a bunch of fully connected layers of the same size). The effect you typically see with this is referred to as "catastrophic forgetting" and it can be quite spectacular. If you are doing something like moon lander with this sort of network (the simple one, not the pixel one) and track the rolling average score over the last 100 games or so, you will likely see a nice curve up in score, then all of a sudden it completely craps out and starts making awful decisions again, even as your alpha gets small. This cycle will continue endlessly regardless of how long you let it run.
Using a stable target network as your error measure is one way of combating this effect. Conceptually it's like saying, "I have an idea of how to play this well, I'm going to try it out for a bit until I find something better" as opposed to saying "I'm going to retrain myself how to play this entire game after every move". By giving your network more time to consider many actions that have taken place recently instead of updating all the time, it hopefully finds a more robust model before you start using it to make actions.
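To make the mechanics concrete, here is a rough sketch of the idea, not the DQN paper's implementation: a tiny linear Q-function stands in for the deep network, the bootstrapped target is computed from a frozen copy, and that copy is only synchronized every few steps. The class name, the sync interval and the toy transition are all assumptions for illustration.

```python
import copy


class LinearQ:
    """Tiny linear Q-function, Q(s, a) = w[a] . s (stand-in for a deep network)."""

    def __init__(self, n_features, n_actions):
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def values(self, state):
        return [sum(w_i * s_i for w_i, s_i in zip(row, state)) for row in self.w]


def train_step(online, target, transition, alpha=0.1, gamma=0.9):
    """One update of the online weights; the bootstrap comes from the frozen copy."""
    state, action, reward, next_state, done = transition
    bootstrap = 0.0 if done else max(target.values(next_state))
    td_error = reward + gamma * bootstrap - online.values(state)[action]
    for i, s_i in enumerate(state):
        online.w[action][i] += alpha * td_error * s_i
    return td_error


def train(transitions, sync_every=4):
    online = LinearQ(n_features=2, n_actions=2)
    target = copy.deepcopy(online)
    sync_steps = []
    for step, transition in enumerate(transitions, start=1):
        train_step(online, target, transition)
        if step % sync_every == 0:
            # Periodic sync: only now do the targets "see" the new weights.
            target.w = copy.deepcopy(online.w)
            sync_steps.append(step)
    return online, target, sync_steps


# Hypothetical data: the same transition replayed six times.
transitions = [([1.0, 0.0], 0, 1.0, [0.0, 1.0], False)] * 6
online, target, sync_steps = train(transitions)
```

After six steps the online weights have moved on, while the target copy still holds the values from the sync at step 4. In a real DQN the sync interval is typically thousands of steps, and the transitions come from the replay buffer.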
On a side note, DQN is essentially obsolete at this point, but the themes from that paper were the fuse leading up to the RL explosion of the last few years.

Reinforcement Learning, ϵ-greedy approach vs optimal action

In Reinforcement Learning, why should we select actions according to an ϵ-greedy approach rather than always selecting the optimal action?
We use an epsilon-greedy method for exploration during training. This means that when an action is selected during training, it is either the action with the highest Q-value or, with probability epsilon, a random action.
The choice between the two is random and governed by the value of epsilon. Initially, lots of random actions are taken, which means we start by exploring the space, but as training progresses and epsilon is decreased, more actions with the maximum Q-value are taken and we gradually pay less attention to actions with low Q-values.
During testing, we use this epsilon-greedy method, but with epsilon at a very low value, such that there is a strong bias towards exploitation over exploration, favoring the action with the highest Q-value over a random action. However, random actions are still sometimes chosen.
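A minimal sketch of the selection rule described above, with a linear decay of epsilon during training. The decay schedule and its parameters are assumptions chosen for illustration; other schedules are common too.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=q_values.__getitem__)   # exploit


def decayed_epsilon(step, start=1.0, end=0.05, decay_steps=1000):
    """Linearly anneal epsilon from start to end over decay_steps (assumed schedule)."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)


q_values = [0.1, 0.9, 0.3]
greedy_action = epsilon_greedy(q_values, epsilon=0.0)            # always index 1
rng = random.Random(0)
random_actions = [epsilon_greedy(q_values, 1.0, rng) for _ in range(50)]
```

With epsilon = 0 the call always returns the greedy index, with epsilon = 1 every action is drawn uniformly at random, and during testing one would pass a small fixed epsilon.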
All this is because we want to eliminate the negative effects of over-fitting or under-fitting.
Using epsilon of 0 (always choosing the optimal action) is a fully exploitative choice. For example, consider a labyrinth game where the agent’s current Q-estimates have converged to the optimal policy except for one grid cell, where it greedily chooses to move toward a boundary (which its current estimates rate as optimal) and therefore remains in the same cell. If the agent reaches any such state and it always chooses the maximum-Q action, it will be stuck there. However, keeping a small epsilon factor in its policy allows it to get out of such states.
There wouldn't be much learning happening if you already knew what the best action was, right ? :)
ϵ-greedy is "on-policy" learning, meaning that you are learning the optimal ϵ-greedy policy while exploring with an ϵ-greedy policy. You can also learn "off-policy" by selecting moves that are not aligned with the policy that you are learning; an example is always exploring randomly (same as ϵ=1).
I know this can be confusing at first: how can you learn anything if you just move randomly? The key bit of knowledge here is that the policy that you learn is not defined by how you explore, but by how you calculate the sum of future rewards (in the case of regular Q-Learning it's the max(Q[next_state]) piece in the Q-Value update).
This all works assuming you are exploring enough. If you don't try out new actions, the agent will never be able to figure out which ones are the best in the first place.
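Here is a small sketch of that point on a made-up four-state chain (the environment, constants and names are all hypothetical): the agent explores purely at random (ϵ=1), yet because the update bootstraps from max(Q[next_state]), the greedy policy read off the learned table still heads straight for the goal.

```python
import random

N_STATES, GOAL = 4, 3          # states 0..3 in a row; state 3 is terminal
LEFT, RIGHT = 0, 1


def step(state, action):
    """Deterministic chain: reward 1 only when the goal is entered."""
    next_state = max(state - 1, 0) if action == LEFT else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def learn_with_random_exploration(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):                       # cap on episode length
            action = rng.randrange(2)              # behaviour: uniformly random
            next_state, reward, done = step(state, action)
            bootstrap = 0.0 if done else max(q[next_state])   # target: greedy
            q[state][action] += alpha * (reward + gamma * bootstrap - q[state][action])
            if done:
                break
            state = next_state
    return q


q = learn_with_random_exploration()
greedy_policy = [max((LEFT, RIGHT), key=lambda a: q[s][a]) for s in range(GOAL)]
```

The learned values for moving right approach 0.81, 0.9 and 1.0 for states 0, 1 and 2, so the greedy policy is "right" everywhere even though the agent never once acted greedily while learning.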

How to handle uncertainty in position?

I am working on a car-following problem and the measurements I am receiving are uncertain (I know that the noise model is Gaussian and its variance is also known). How do I select my next action under this kind of uncertainty?
Basically, how should I change my cost function so that I can optimize my plan by selecting an appropriate action?
Vanilla reinforcement learning is meant for Markov decision processes, where it's assumed that you can fully observe the state. Because your states are noisy, you have a partially observable Markov decision process (POMDP). Theoretically speaking you should be looking at a different category of RL approaches.
Practically, since you have so much information about the parameters of the uncertainty, you should consider using a Kalman or particle filter to perform state estimation. Then, use the most likely state estimate as the true state in your RL problem. The estimate will be wrong at times, of course, but if you're using a function approximation approach for the value function, the experience can generalize across similar states and you'll be able to learn. The learning performance will depend on the quality of your state estimate.
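As a rough sketch of that pipeline, assuming a scalar state (the gap to the lead car), a known measurement variance and a gap that changes slowly between steps: a one-dimensional Kalman update produces the estimate, and the estimate is then passed to the policy as if it were the true state. The `choose_action` rule is a hypothetical placeholder for whatever your learned policy does.

```python
import math
import random


def kalman_update(mean, var, measurement, meas_var, process_var=0.0):
    """One predict/update cycle of a scalar Kalman filter with a constant-state model."""
    var_pred = var + process_var               # predict: uncertainty grows
    gain = var_pred / (var_pred + meas_var)    # Kalman gain
    new_mean = mean + gain * (measurement - mean)
    new_var = (1.0 - gain) * var_pred
    return new_mean, new_var


def choose_action(estimated_gap, desired_gap=10.0):
    """Hypothetical placeholder policy acting on the state estimate."""
    return "accelerate" if estimated_gap > desired_gap else "brake"


# Simulated noisy measurements of a true gap of 12 m (made-up numbers).
rng = random.Random(1)
true_gap, meas_var = 12.0, 4.0
mean, var = 0.0, 1e6                           # very uncertain initial guess
for _ in range(200):
    z = true_gap + rng.gauss(0.0, math.sqrt(meas_var))
    mean, var = kalman_update(mean, var, z, meas_var)

action = choose_action(mean)
```

For a moving target you would set `process_var` above zero, or use a multi-state model with position and velocity; the filter variance `var` could also be fed to the agent as an extra input if you want it to act more cautiously when the estimate is poor.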

Using a 10-node tetrahedron, is strain continuous between neighbouring tetrahedra?

I'm trying to implement a Finite Element Analysis algorithm. I solve K u = f to get the displacement u, then calculate the strain from u, and then the stress. Finally, I use the stress to calculate the von Mises stress and visualize it. From the result I find that the strain is not continuous between tetrahedra.
I use the 10-node tetrahedron as the element, so the displacement is a second-order polynomial in every element. The displacement should be enforced to be continuous between tetrahedra. And the strain, which consists of first-order derivatives of the displacement, should be continuous inside every tetrahedron. But I'm not sure: is this true across the interface between tetrahedra?
Only the components of strain tangent to the adjoining face are guaranteed continuous.
This follows from the displacement continuity: when you take derivatives in directions tangent to the interface, they are the same on both sides.
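Spelled out a little more (the notation $u^{+}$, $u^{-}$ for the displacement in the two elements sharing a face $F$ with unit normal $n$ is mine): continuity means $u^{+}(x) = u^{-}(x)$ for all $x \in F$. For a vector $t$ tangent to $F$ and $x$ in the interior of $F$, the point $x + ht$ stays on $F$ for small $h$, so

$$\nabla u^{+}(x)\, t = \lim_{h \to 0} \frac{u^{+}(x + ht) - u^{+}(x)}{h} = \lim_{h \to 0} \frac{u^{-}(x + ht) - u^{-}(x)}{h} = \nabla u^{-}(x)\, t .$$

The jump $[\![\nabla u]\!] = \nabla u^{+} - \nabla u^{-}$ therefore annihilates every tangent vector, which means it has the form $a \otimes n$ for some vector $a$. The strain jump is then

$$[\![\varepsilon]\!] = \tfrac{1}{2}\left( a \otimes n + n \otimes a \right), \qquad t_1^{\top} [\![\varepsilon]\!]\, t_2 = 0 \ \text{ for all tangent } t_1, t_2 ,$$

so the purely tangential components are continuous, while any component involving $n$ can jump unless $a = 0$.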
Commercial FEM programs typically do some post process averaging to make the other components look continuous. Note the strain components normal to an element boundary are only expected to be continuous if the underlying constitutive model is continuous, so such averaging is not always appropriate.
You should not compute the stress and strain at the nodes but inside the elements. You can choose for example 4 Gauss points and compute the values there. You then have to think about a scheme on how to get the values computed at the Gauss points onto the tet nodes.
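A minimal sketch of one such scheme, with several simplifications that are my assumptions: the standard 4-point Gauss rule for tetrahedra in barycentric coordinates, one scalar value (e.g. von Mises stress) per Gauss point, and plain unweighted averaging of element means onto the corner nodes. Real codes often extrapolate with the shape functions and weight by element volume instead.

```python
import math

# Standard 4-point rule for a tetrahedron (exact for quadratics), equal weights.
A = (5.0 + 3.0 * math.sqrt(5.0)) / 20.0    # ~0.5854
B = (5.0 - math.sqrt(5.0)) / 20.0          # ~0.1382
GAUSS_BARYCENTRIC = [(A, B, B, B), (B, A, B, B), (B, B, A, B), (B, B, B, A)]


def gauss_points(vertices):
    """Map the 4 barycentric Gauss points to physical coordinates of one tet."""
    return [tuple(sum(l * v[d] for l, v in zip(bary, vertices)) for d in range(3))
            for bary in GAUSS_BARYCENTRIC]


def average_to_nodes(elements, gauss_values, n_nodes):
    """Node value = unweighted mean of the Gauss-point means of adjacent elements."""
    sums = [0.0] * n_nodes
    counts = [0] * n_nodes
    for nodes, values in zip(elements, gauss_values):
        element_mean = sum(values) / len(values)
        for node in nodes:
            sums[node] += element_mean
            counts[node] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Check the rule on the reference tet: the integral of x^2 should be 1/60.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
volume = 1.0 / 6.0
integral_x2 = sum(volume / 4.0 * p[0] ** 2 for p in gauss_points(reference))

# Two tets sharing the face (1, 2, 3); corner nodes only, made-up stress values.
elements = [(0, 1, 2, 3), (1, 2, 3, 4)]
gauss_values = [[1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0]]
nodal = average_to_nodes(elements, gauss_values, n_nodes=5)
```

The shared nodes end up with the average of the two element values (2.0 here), which is exactly the kind of smoothing that hides the jump across the interface, so it is worth plotting the raw Gauss-point values as well when checking a result.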
There is a Mathematica application example which illustrates this. Unfortunately the web page is no longer available, but the notebooks are here. You'll find the example in the application example section under Finite Element Method, Structural Mechanics 3D (in the old HelpBrowser). If you have difficulties I could convert it to PDF and send it to you.