Applying a Kalman filter on a leg follower robot

I was asked to create a leg follower robot (which I have already done), and in the second part of the assignment I have to develop a Kalman filter to improve the robot's following behavior. The robot measures the distance from the person to itself and also the angle to her (a relative angle, because the reference frame is the robot itself, not absolute x-y coordinates).
About this assignment I have a serious doubt. Everything I have read and every example I have seen about the Kalman filter has been in one dimension (a car covering distance, or a rock falling from a building), but according to the task I would have to apply it in two dimensions. Is it possible to apply a Kalman filter like this?
If it is possible to compute a Kalman filter in two dimensions, then I understand that the task is to follow the legs in a linearized way, even though a person walks erratically (with random movements). About this I have another doubt: how do I establish the state transition function/matrix? Could anyone tell me how to do this, or point me to more information on it?
thanks.

Well, you should read up on the Kalman filter. Basically, what it does is estimate a state through its mean and variance. The state can be whatever you want: you can have local coordinates in your state, but also global coordinates.
Note that the latter will certainly result in nonlinear system dynamics, in which case you could use the Extended Kalman Filter, or, to be more correct, the continuous-discrete Kalman filter, where you treat the system dynamics in a continuous manner and the measurements in discrete time.
Example with global coordinates:
Assume you have a small cubic mass which can drive forward with velocity v. You could simply model the dynamics in local coordinates only, where your state s would be s = [v], which is a linear model.
But you could also incorporate the global coordinates x and y, assuming we are moving on a plane only. Then you would have s = [x, y, phi, v]'. We need phi to keep track of the current orientation, since the cube can only move forward with respect to its orientation, of course. Let's define phi as the angle between the cube's forward direction and the x-axis. In other words: with phi = 0 the cube would move along the x-axis, with phi = 90° it would move along the y-axis.
The nonlinear system dynamics with global coordinates can then be written as
s_dot = [x_dot, y_dot, phi_dot, v_dot]'
with
x_dot = cos(phi) * v
y_dot = sin(phi) * v
phi_dot = ...
v_dot = ... (Newton's Law)
In the EKF (Extended Kalman Filter) prediction step, you would use the (discretized) equations above to predict the mean of the state, and the linearized (and discretized) equations to predict its variance.
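To make this concrete, here is a minimal sketch of that prediction step in Python/NumPy, assuming for simplicity that phi and v are driven only by process noise (phi_dot = 0, v_dot = 0 plus noise); dt and the process noise covariance Q are placeholders you would choose yourself:
import numpy as np

def ekf_predict(s, P, Q, dt):
    # Discretized nonlinear dynamics: predict the state mean.
    x, y, phi, v = s
    s_pred = np.array([x + np.cos(phi) * v * dt,
                       y + np.sin(phi) * v * dt,
                       phi,
                       v])
    # Linearized (Jacobian) dynamics: predict the covariance.
    F = np.array([[1, 0, -np.sin(phi) * v * dt, np.cos(phi) * dt],
                  [0, 1,  np.cos(phi) * v * dt, np.sin(phi) * dt],
                  [0, 0,  1,                    0],
                  [0, 0,  0,                    1]])
    P_pred = F @ P @ F.T + Q
    return s_pred, P_pred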
There are two things to keep in mind when you decide what your state vector s should look like:
You might be tempted to use my linear example s = [v] and then integrate the velocity outside of the Kalman Filter in order to obtain the global coordinate estimates. This would work, but you would lose the awesomeness of the Kalman Filter since you would only integrate the mean of the state, not its variance. In other words, you would have no idea what the current uncertainties for your global coordinates are.
The second step of the Kalman filter, the measurement or correction update, requires that you can describe your sensor output as a function of your states. So you may have to add states to your representation just so that you can express your measurements correctly as z[k] = h(s[k], w[k]), where z are the measurements and w is a noise vector with Gaussian distribution. A sketch of such an update follows below.
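To connect this to the original question: the robot measures exactly such a z, a distance and a relative angle. Here is a minimal sketch of the corresponding correction step for the state s = [x, y, phi, v]', assuming for illustration that the measured point sits at a known position (px, py) (a hypothetical landmark, not part of the example above):
import numpy as np

def ekf_update_range_bearing(s_pred, P_pred, z, R_noise, px, py):
    x, y, phi, v = s_pred
    dx, dy = px - x, py - y
    r = np.hypot(dx, dy)
    # Expected measurement h(s): range and bearing relative to the robot.
    h = np.array([r, np.arctan2(dy, dx) - phi])
    # Jacobian of h with respect to [x, y, phi, v].
    H = np.array([[-dx / r,    -dy / r,     0.0, 0.0],
                  [ dy / r**2, -dx / r**2, -1.0, 0.0]])
    innov = z - h
    innov[1] = np.arctan2(np.sin(innov[1]), np.cos(innov[1]))  # wrap the angle
    S = H @ P_pred @ H.T + R_noise
    K = P_pred @ H.T @ np.linalg.inv(S)
    s_new = s_pred + K @ innov
    P_new = (np.eye(4) - K @ H) @ P_pred
    return s_new, P_new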

Related

Why is the W_q matrix in torch.nn.MultiheadAttention quadratic?

I am trying to implement nn.MultiheadAttention in my network. According to the docs,
embed_dim  – total dimension of the model.
However, according to the source file,
embed_dim must be divisible by num_heads
and
self.q_proj_weight = Parameter(torch.Tensor(embed_dim, embed_dim))
If I understand properly, this means each head takes only a part of the features of each query, as the matrix is quadratic. Is this a bug in the implementation, or is my understanding wrong?
Each head uses a different part of the projected query vector. You can imagine it as if the query gets split into num_heads vectors that are independently used to compute the scaled dot-product attention. So, each head operates on a different linear combination of the features in queries (and keys and values, too). This linear projection is done using the self.q_proj_weight matrix and the projected queries are passed to F.multi_head_attention_forward function.
In F.multi_head_attention_forward, it is implemented by reshaping and transposing the query vector, so that the independent attentions for individual heads can be computed efficiently by matrix multiplication.
The attention head sizes are a design decision of PyTorch. In theory, you could have a different head size, so the projection matrix would have a shape of embed_dim × (num_heads * head_dim). Some implementations of transformers (such as the C++-based Marian for machine translation, or Huggingface's Transformers) allow that.
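As a rough illustration of that reshaping (the dimensions here are made up, and this mirrors what F.multi_head_attention_forward does internally rather than calling into it):
import torch

embed_dim, num_heads = 512, 8
head_dim = embed_dim // num_heads           # 64, hence the divisibility requirement

batch, seq_len = 2, 10
q = torch.randn(batch, seq_len, embed_dim)

# One "quadratic" projection, like self.q_proj_weight:
W_q = torch.randn(embed_dim, embed_dim)
q_proj = q @ W_q.T                          # still (batch, seq_len, embed_dim)

# Split into heads by reshaping and transposing:
q_heads = q_proj.view(batch, seq_len, num_heads, head_dim).transpose(1, 2)
print(q_heads.shape)                        # torch.Size([2, 8, 10, 64])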

Determining the values of the filter matrices in a CNN

I am getting started with deep learning and have a basic question on CNN's.
I understand how gradients are adjusted using backpropagation according to a loss function.
But I thought the values of the convolving filter matrices (in CNNs) needed to be determined by us.
I'm using Keras and this is how (from a tutorial) the convolution layer was defined:
from keras.models import Sequential
from keras.layers import Conv2D

classifier = Sequential()
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
There are 32 filter matrices with dimensions 3x3 in use.
But how are the values for these 32 3x3 matrices determined?
It's not the gradients that are adjusted: the gradient calculated with the backpropagation algorithm is just the group of partial derivatives with respect to each weight in the network, and these components are in turn used to adjust the network weights in order to minimize the loss.
Take a look at this introductory guide.
The weights in the convolution layer in your example will be initialized to random values (according to a specific method), and then tweaked during training, using the gradient at each iteration to adjust each individual weight. Same goes for weights in a fully connected layer, or any other layer with weights.
EDIT: I'm adding some more details to the answer above.
Let's say you have a neural network with a single layer, which has some weights W. Now, during the forward pass, you calculate your output yHat for your network, compare it with your expected output y for your training samples, and compute some cost C (for example, using the quadratic cost function).
Now, you're interested in making the network more accurate, i.e. you'd like to minimize C as much as possible. Imagine you want to find the minimum value of a simple function like f(x) = x^2. You can start at some random point (as you did with your network), then compute the slope of the function at that point (i.e. the derivative) and move down that direction, until you reach a minimum value (a local minimum at least).
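As a minimal sketch of that idea (the learning rate and iteration count are arbitrary):
import random

x = random.uniform(-10, 10)    # start at some random point
learning_rate = 0.1

for _ in range(100):
    slope = 2 * x              # derivative of f(x) = x^2
    x -= learning_rate * slope # move a bit down the slope

print(x)                       # close to 0, the minimum of f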
With a neural network it's the same idea, with the difference that your inputs are fixed (the training samples), and you can see your cost function C as having n variables, where n is the number of weights in your network. To minimize C, you need the slope of the cost function C in each direction (i.e. with respect to each variable, each weight w), and that vector of partial derivatives is the gradient.
Once you have the gradient, the part where you "move a bit following the slope" is the weights update part, where you update each network weight according to its partial derivative (in general, you subtract some learning rate multiplied by the partial derivative with respect to that weight).
A trained network is just a network whose weights have been adjusted over many iterations in such a way that the value of the cost function C over the training dataset is as small as possible.
This is the same for a convolutional layer too: you first initialize the weights at random (i.e. you place yourself at a random position on the plot of the cost function C), then compute the gradients, then "move downhill", i.e. you adjust each weight following the gradient in order to minimize C.
The only difference between a fully connected layer and a convolutional layer is how they calculate their outputs, and how the gradient is in turn computed, but the part where you update each weight with the gradient is the same for every weight in the network.
So, to answer your question, those filters in the convolutional kernels are initially random and are later adjusted with the backpropagation algorithm, as described above.
Hope this helps!
Sergio0694 states, "The weights in the convolution layer in your example will be initialized to random values." So if they are random and, say, I want 10 filters, every execution of the algorithm could find different filters. Also, say I have the MNIST dataset; digits are formed of edges and curves. Is it guaranteed that there will be an edge filter or a curve filter among the 10?
I mean, are the first 10 filters the most meaningful, most distinctive filters we can find?
best

Obtaining multiple outputs in regression using deep learning

Given an RGB image of a hand and the 3D positions of the hand's keypoints as a dataset, I want to treat this as a regression problem in DL. In this case the input will be the RGB image, and the output should be the estimated 3D positions of the keypoints.
I have seen some info about regression, but most examples try to estimate one single value. Is it possible to estimate multiple values (or outputs) all at once?
For now I have referred to this code. This guy is trying to estimate the age of a person in the image.
The output vector from a neural net can represent anything as long as you define the loss function well. Say you want to detect the (x, y, z) coordinates of 10 keypoints: then just have a 30-element output vector, say (x1, y1, z1, x2, y2, z2, ..., x10, y10, z10), where xi, yi, zi denote the coordinates of the i-th keypoint; you can use any ordering you find convenient. Just be careful with your loss function. Say you want to calculate an RMSE loss: you would have to extract the triples correctly and then calculate the RMSE loss for each keypoint, or, if you are familiar with linear algebra, just reshape the output into a 3x10 matrix, have your targets also as a 3x10 matrix, and then just use
loss = tf.sqrt(tf.reduce_mean(tf.squared_difference(Y1, Y2)))
But once you have formulated your net you will have to stick to it.
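As a minimal sketch of such a network in tf.keras, assuming 10 keypoints (30 outputs) and, say, 64x64 RGB inputs; the architecture itself is just a placeholder:
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(64, 64, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(30),   # x1, y1, z1, ..., x10, y10, z10
])
model.compile(optimizer='adam', loss='mse')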

Kalman Filter corrected by known path

I am trying to get filtered velocity/spatial data from noisy position data from a tracked vehicle. I have a set of noisy position/time data = (x_i, y_i, t_i) and a known curve along which the vehicle is traveling, curve = (x(s), y(s)), where s is the total distance along the curve. I can run a Kalman filter on the data, but I don't know how to constrain it to the 'road' without throwing out data that is too far from the road, which I don't want to do.
Alternatively, I'm trying to estimate the value of s along the constrained path from position data that is noisy in x and y.
Does anyone have an idea of how to merge the two types of data?
Thanks!
Do you understand what a Kalman filter does? Fundamentally, it assigns a probability to each possible state given just the observables. In simple cases, this doesn't use a priori knowledge. But in your case, you can simply set the off-road estimates to zero and renormalize the remaining probabilities.
Note: this isn't throwing out observables which are too far off the road, or even discarding outcomes which are too far off. It means that an apparent off-road position strongly increases the probability of an outcome on, but near the edge of, the road.
If you want the model to allow small excursions away from the road, you can use a fast decaying function to model the low but non-zero probability of a car being off the road.
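As a toy illustration of that zero-and-renormalize idea on a discretized state (the numbers are made up):
import numpy as np

belief = np.array([0.05, 0.10, 0.40, 0.30, 0.10, 0.05])    # discrete state probabilities
on_road = np.array([False, True, True, True, True, False]) # which states lie on the road

belief = np.where(on_road, belief, 0.0)  # zero out off-road states
belief /= belief.sum()                   # renormalize the rest
print(belief)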
You could have as states the distance s along the path, and the rate of change of s. The position observations X and Y will then be non-linear functions of the state (assuming your track is not a line) so you'll need to use an extended or unscented filter.
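A minimal sketch of such a measurement model, using a circular road as a stand-in for the known curve (in practice you would fit splines to your (x(s), y(s)) data and use their derivatives):
import numpy as np

R = 50.0  # radius of the stand-in circular road

def h(state):
    # Expected (X, Y) measurement for state [s, s_dot] on the road.
    s, s_dot = state
    return np.array([R * np.cos(s / R), R * np.sin(s / R)])

def H_jacobian(state):
    # Jacobian of h w.r.t. [s, s_dot]: the road tangent in the first column.
    s, s_dot = state
    return np.array([[-np.sin(s / R), 0.0],
                     [ np.cos(s / R), 0.0]])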

Kalman filter used in an IMU: what signals does the fusion process combine?

From what I have read, the Kalman filter basically tries to "reconcile" the predictions for one variable, based on the history of that variable, with actual observations of that variable. In the case of finding the position from an IMU, I would imagine that we need the speed readout, so that we can predict x_(k+1) = v * dt + x_(k), and we would also need the direct readout for z_(k+1).
But in fact on an IMU we don't have this z readout, so what exactly does Kalman filtering on an IMU do?
thanks
Yang
OK, after reading more related resources, I figured it out:
The "baseline" reference is the vertical vector of Earth's gravity, measured by the 3-axis accelerometer, which really measures force: either force induced by gravity or force induced by acceleration. Noise (including the "noise" introduced by actual horizontal acceleration) is corrected using the gyro's rotation rate readout, in a Kalman filter.
Heading is obtained from the Earth's magnetic field, through the compass (only in the horizontal plane; the compass doesn't tell you the tilt away from the Earth's z-axis). Again, the Kalman filter uses the gyro rotation rate to correct the compass readout.
Once you have the Earth's vertical axis and the north axis, you have your full orientation.
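A minimal one-angle sketch of that repair loop (the noise variances q and r are made-up placeholders):
def tilt_kf_step(angle, P, gyro_rate, accel_angle, dt, q=0.01, r=0.1):
    # Predict: integrate the gyro's rotation rate.
    angle = angle + gyro_rate * dt
    P = P + q                      # uncertainty grows with gyro drift

    # Correct: the accelerometer's gravity vector gives a direct, noisy angle.
    K = P / (P + r)                # Kalman gain
    angle = angle + K * (accel_angle - angle)
    P = (1.0 - K) * P
    return angle, P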