LSTM: Diffrent y size after splitting X on timesteps - deep-learning

After following some tutorials on LSTM networks, I've decided to put my knowledge in practice by training a LSTM model on my own dataset.
Here is a view of my data:
As you can observe, I have same number of samples and labels.
Let's say that I have 10 samples and 10 labels for those samples and I want to split those samples in 2 timesteps.
After spliting I would have 5 samples, each having 2 timesteps, but I would still have 10 labels.
Am I right?
How you guys deal with this problem?
If I'm trying to feed the data in this form, I will get a "Data cardinality is ambiguous" exception.

In an LSTM, every input sequence has one one label (in the case of simple classification, at least). So in your case you would have your data be two samples of the position, and then a single label.

Related

Using cnn models trained on different datasets

I am having 3 different datasets, 3 of them were all blood smear image stained with the same chemical substance. Blood smear images are images that capture your blood, include Red, White blood cells inside.
The first dataset contain 2 classes : normal vs blood cancer
The second dataset contain 2 classes: normal vs blood infection
The third dataset contain 2 classes: normal vs sickle cell disease
So, what i want to do is : when i input a blood smear image, the AI system will tell me whether it was : normal , or blood cancer or blood infection or sickle cell disease (4 classes classification task)
What should i do?
Should i mix these 3 datasets and train only 1 model to detect 4 classes ?
Or should i train 3 different models and them combine them? If yes, what method should i use to combine?
Update : i searched for a while. Can this task called "Learning without forgetting?"
I think it depends on the data.
You may use three different models and make three binary predictions on each image. So you get a vote (probability) for each x vs. normal. If binary classifications are accurate, this should deliver okay results. But you kind of get a cummulated missclassification or error in this case.
If you can afford, you can train a four class model and compare the test error to the series of binary classifications. I understand that you already have three models. So training another one may be not too expensive.
If ONLY one of the classes can occur, a four class model might be the way to go. If in fact two (or more) classes can occur jointly, a series of binary classifications would make sense.
As #Peter said it is totally data dependent. If the images of the 4 classes, namely normal ,blood cancer ,blood infection ,sickle cell disease are easily distinguishable with your naked eyes and there is no scope of confusion among all the classes then you should simply go for 1 model which gives out probabilities of all the 4 classes(as mentioned by #maxi marufo). If there is confusion between classes and the images are NOT distinguishable with naked eyes or there is a lot of scope of confusion between the classes then you should use 3 different models but then you'll need. You simply get the predicted probabilities from all the 3 models suppose p1(normal) and p1(c1), p2(normal) and p2(c2), p3(normal) and p3(c3). Now you can average(p1(normal),p2(normal),p3(normal)) and the use a softmax for p(normal), p1(c1), p2(c2), p3(c3) . Out of multiple ways you could try, the above could be one.
This is a multiclass classification problem. You can train just one model, with the final layer being a full connected (dense) layer of 4 units (i.e. output dimension) and softmax activation function.

Machine Learning for gesture recognition with Myo Armband

I'm trying to develop a model to recognize new gestures with the Myo Armband. (It's an armband that possesses 8 electrical sensors and can recognize 5 hand gestures). I'd like to record the sensors' raw data for a new gesture and feed it to a model so it can recognize it.
I'm new to machine/deep learning and I'm using CNTK. I'm wondering what would be the best way to do it.
I'm struggling to understand how to create the trainer. The input data looks like something like that I'm thinking about using 20 sets of these 8 values (they're between -127 and 127). So one label is the output of 20 sets of values.
I don't really know how to do that, I've seen tutorials where images are linked with their label but it's not the same idea. And even after the training is done, how can I avoid the model to recognize this one gesture whatever I do since it's the only one it's been trained for.
An easy way to get you started would be to create 161 columns (8 columns for each of the 20 time steps + the designated label). You would rearrange the columns like
emg1_t01, emg2_t01, emg3_t01, ..., emg8_t20, gesture_id
This will give you the right 2D format to use different algorithms in sklearn as well as a feed forward neural network in CNTK. You would use the first 160 columns to predict the 161th one.
Once you have that working you can model your data to better represent the natural time series order it contains. You would move away from a 2D shape and instead create a 3D array to represent your data.
The first axis shows the number of samples
The second axis shows the number of time steps (20)
The thirst axis shows the number of sensors (8)
With this shape you're all set to use a 1D convolutional model (CNN) in CNTK that traverses the time axis to learn local patterns from one step to the next.
You might also want to look into RNNs which are often used to work with time series data. However, RNNs are sometimes hard to train and a recent paper suggests that CNNs should be the natural starting point to work with sequence data.

How can we define an RNN - LSTM neural network with multiple output for the input at time "t"?

I am trying to construct a RNN to predict the possibility of a player playing the match along with the runs score and wickets taken by the player.I would use a LSTM so that performance in current match would influence player's future selection.
Architecture summary:
Input features: Match details - Venue, teams involved, team batting first
Input samples: Player roster of both teams.
Output:
Discrete: Binary: Did the player play.
Discrete: Wickets taken.
Continous: Runs scored.
Continous: Balls bowled.
Question:
Most often RNN uses "Softmax" or"MSE" in the final layers to process "a" from LSTM -providing only a single variable "Y" as output. But here there are four dependant variables( 2 Discrete and 2 Continuous). Is it possible to stitch together all four as output variables?
If yes, how do we handle mix of continuous and discrete outputs with loss function?
(Though the output from LSTM "a" has multiple features and carries the information to the next time-slot, we need multiple features at output for training based on the ground-truth)
You just do it. Without more detail on the software (if any) in use it is hard to give more detasmail
The output of the LSTM unit is at every times step on of the hidden layers of your network
You can then input it in to 4 output layers.
1 sigmoid
2 i'ld messarfound wuth this abit. Maybe 4x sigmoid(4 wickets to an innnings right?) Or relu4
3,4 linear (squarijng it is as lso an option,e or relu)
For training purposes your loss function is the sum of your 4 individual losses.
Since f they were all MSE you could concatenat your 4 outputs before calculating the loss.
But sincd the first is cross-entropy (for a decision sigmoid) yould calculate seperately and sum.
You can still concatenate them after to have a output vector

How to deal with ordinal labels in keras?

I have data with integer target class in the range 1-5 where one is the lowest and five the highest. In this case, should I consider it as regression problem and have one node in the output layer?
My way of handling it is:
1- first I convert the labels to binary class matrix
labels = to_categorical(np.asarray(labels))
2- in the output layer, I have five nodes
main_output = Dense(5, activation='sigmoid', name='main_output')(x)
3- I use 'categorical_crossentropy with mean_squared_error when compiling
model.compile(optimizer='rmsprop',loss='categorical_crossentropy',metrics=['mean_squared_error'],loss_weights=[0.2])
Also, can anyone tells me: what is the difference between using categorical_accuracy and 'mean_squared_error in this case?
Regression and classification are vastly different things. If you reimagine this as a regression task than the difference of predicting 2 when the ground truth is 4 will be rated more than if you predict 3 instead of 4. If you have class like car, animal, person you do not care for the ranking between those classes. Predicting car is just as wrong as animal, iff the image shows a person.
Metrics do not impact your learning at all. It is just something that is computed additionally to the loss to show the performance of the model. Here the accuracy makes sense, because this is mostly the metric that we care about. Mean squared error does not tell you how well your model performs. If you get something like 0.0015 mean squared error it sounds good, but it is hard to visualize just how well this performs. In contrast using accuracy and achieving 95% accuracy for example is meaningful.
One last thing you should use softmax instead of sigmoid as your final output to get a probability distribution in your final layer. Softmax will output percentages for every class that sum up to 1. Then crossentropy calculates the difference of the probability distribution of your network output and the ground truth.

Why Caffe's Accuracy layer's bottoms consist of InnerProduct and Label?

I'm new to caffe, in the MNIST example, I was thought that label should compared with softmax layers, but it was not the situation in lenet.prototxt.
I wonder why use InnerProduct result and label to get the accuracy, it seems unreasonable. Was it because I missed something in the layers?
The output dimension of the last inner product layer is 10, which corresponds to the number of classes (digits 0~9) of your problem.
The loss layer, takes two blobs, the first one being the prediction(ip2) and the second one being the label provided by the data layer.
loss_layer = SoftmaxLossLayer(name="loss", bottoms=[:ip2,:label])
It does not produce any outputs - all it does is to compute the loss function value, report it when backpropagation starts, and initiates the gradient with respect to ip2. This is where all magic starts.
After training ( in TEST phase), In the last layer desire results come from multiply weights and ip1( that are computed in last layer ); And each of class( one of 10 neurons) has max value is choosen.