First of all, thank you in advance for the answers to come. I am confused about the use of K-Fold Cross Validation (CV) with CNN.
When working with CV under normal conditions, as seen in the link below, the original dataset is first split as test and training.
https://miro.medium.com/max/875/1*pJ5jQHPfHDyuJa4-7LR11Q.png
Then, the training dataset is divided into training and validation in the K cycle according to the determined K value. In short, if we say K = 5, our training is repeated 5 times, and each time a newly trained model is formed.
Question 1: How can we calculate the overall training-validation accuracy and loss values of 5 different models. Do we need to add up and average the success of all models?
Question 2: We separated the TEST data set from the original data set at the beginning of the training. How can we test the TEST data set on 5 different models? Should we test on 5 models then get their average accuracy or should we test only on the most successful model?
Related
After following some tutorials on LSTM networks, I've decided to put my knowledge in practice by training a LSTM model on my own dataset.
Here is a view of my data:
As you can observe, I have same number of samples and labels.
Let's say that I have 10 samples and 10 labels for those samples and I want to split those samples in 2 timesteps.
After spliting I would have 5 samples, each having 2 timesteps, but I would still have 10 labels.
Am I right?
How you guys deal with this problem?
If I'm trying to feed the data in this form, I will get a "Data cardinality is ambiguous" exception.
In an LSTM, every input sequence has one one label (in the case of simple classification, at least). So in your case you would have your data be two samples of the position, and then a single label.
I am trying to use DL4J for deep learning and have provided the training data with the labels. I am then trying to send a test data by assigning a dummy label. Without providing a dummy label, it gives runtime error. I dont understand why we need to assign label to test data.
Additionally, I want to know what is the accuracy of the prediction made. From what I saw in the dl4j docs, there is something known as a confusion matrix which is generated. I understand that this just gives us an idea of how well the training data has trained the system. Is there a way to get the accuracy of prediction on test data? Since we are giving a dummy label for the test data, I feel that the confusion matrix is also not generated correctly.
First, how can you test if the network outputs the correct labels if you don't know what the correct labels are? You should always have a labels when training and testing because that way you can assert if the output is correct.
Second question, I've found this on dl4j webpage:
Evaluation eval = new Evaluation(3);
INDArray output = model.output(testData.getFeatures());
eval.eval(testData.getLabels(), output);
log.info(eval.stats());
There is stated that this .stats() method displays the confusion matrix entries (one per line), Accuracy, Precision, Recall and F1 Score. Additionally the Evaluation Class can also calculate and return the following values:
Confusion Matrix
False Positive/Negative Rate
True Positive/Negative
Class Counts
F-beta, G-measure, Matthews Correlation Coefficient and more
I hope this helps you.
You may find people who can respond to your question in the DL4J dev community here: https://gitter.im/deeplearning4j/deeplearning4j/tuninghelp
I am trying to construct a RNN to predict the possibility of a player playing the match along with the runs score and wickets taken by the player.I would use a LSTM so that performance in current match would influence player's future selection.
Architecture summary:
Input features: Match details - Venue, teams involved, team batting first
Input samples: Player roster of both teams.
Output:
Discrete: Binary: Did the player play.
Discrete: Wickets taken.
Continous: Runs scored.
Continous: Balls bowled.
Question:
Most often RNN uses "Softmax" or"MSE" in the final layers to process "a" from LSTM -providing only a single variable "Y" as output. But here there are four dependant variables( 2 Discrete and 2 Continuous). Is it possible to stitch together all four as output variables?
If yes, how do we handle mix of continuous and discrete outputs with loss function?
(Though the output from LSTM "a" has multiple features and carries the information to the next time-slot, we need multiple features at output for training based on the ground-truth)
You just do it. Without more detail on the software (if any) in use it is hard to give more detasmail
The output of the LSTM unit is at every times step on of the hidden layers of your network
You can then input it in to 4 output layers.
1 sigmoid
2 i'ld messarfound wuth this abit. Maybe 4x sigmoid(4 wickets to an innnings right?) Or relu4
3,4 linear (squarijng it is as lso an option,e or relu)
For training purposes your loss function is the sum of your 4 individual losses.
Since f they were all MSE you could concatenat your 4 outputs before calculating the loss.
But sincd the first is cross-entropy (for a decision sigmoid) yould calculate seperately and sum.
You can still concatenate them after to have a output vector
What exactly do kur test and kur evaluate differ?
The differences we see from console
(dlnd-tf-lab) ->kur evaluate mnist.yml
Evaluating: 100%|████████████████████████████| 10000/10000 [00:04<00:00, 2417.95samples/s]
LABEL CORRECT TOTAL ACCURACY
0 949 980 96.8%
1 1096 1135 96.6%
2 861 1032 83.4%
3 868 1010 85.9%
4 929 982 94.6%
5 761 892 85.3%
6 849 958 88.6%
7 935 1028 91.0%
8 828 974 85.0%
9 859 1009 85.1%
ALL 8935 10000 89.3%
Focus on one: /Users/Natsume/Downloads/kur/examples
(dlnd-tf-lab) ->kur test mnist.yml
Testing, loss=0.458: 100%|█████████████████████| 3200/3200 [00:01<00:00, 2427.42samples/s]
Without understanding the source codes behind kur test and kur evaluate, how can we understand what exactly do they differ?
#ajsyp the developer of Kur (deep learning library) provided the following answer, which I found to be very helpful.
kur test is used when you know what the "correct answer" is, and you
simply want to see how well your model performs on a held-out sample.
kur evaluate is pure inference: it is for generating results from
your trained model.
Typically in machine learning you split your available data into 3
sets: training, validation, and testing (people sometimes call these
different things, just so you're aware). For a particular model
architecture / selection of model hyperparameters, you train on the
training set, and use the validation set to measure how well the model
performs (is it learning correctly? is it overtraining? etc). But you
usually want to compare many different model hyperparameters: maybe
you tweak the number of layers, or their size, for example.
So how do you select the "best" model? The most naive thing to do is
to pick the model with the lowest validation loss. But then you run
the risk of optimizing/tweaking your model to work well on the
validation set.
So the test set comes into play: you use the test set as a very final,
end of the day, test of how well each of your models is performing.
It's very important to hide that test set for as long as possible,
otherwise you have no impartial way of knowing how good your model is
or how it might compare to other models.
kur test is intended to be used to run a test set through the model
to calculate loss (and run any applicable hooks).
But now let's say you have a trained model, say an image recognition
model, and now you want to actually use it! You get some new data (you
probably don't even have "truth" labels for them, just the raw
images), and you want the model to classify the images. That's what
kur evaluate is for: it takes a trained model and uses it "in
production mode," where you don't have/need truth values.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
What is out of bag error in Random Forests?
Is it the optimal parameter for finding the right number of trees in a Random Forest?
I will take an attempt to explain:
Suppose our training data set is represented by T and suppose data set has M features (or attributes or variables).
T = {(X1,y1), (X2,y2), ... (Xn, yn)}
and
Xi is input vector {xi1, xi2, ... xiM}
yi is the label (or output or class).
summary of RF:
Random Forests algorithm is a classifier based on primarily two methods -
Bagging
Random subspace method.
Suppose we decide to have S number of trees in our forest then we first create S datasets of "same size as original" created from random resampling of data in T with-replacement (n times for each dataset). This will result in {T1, T2, ... TS} datasets. Each of these is called a bootstrap dataset. Due to "with-replacement" every dataset Ti can have duplicate data records and Ti can be missing several data records from original datasets. This is called Bootstrapping. (en.wikipedia.org/wiki/Bootstrapping_(statistics))
Bagging is the process of taking bootstraps & then aggregating the models learned on each bootstrap.
Now, RF creates S trees and uses m (=sqrt(M) or =floor(lnM+1)) random subfeatures out of M possible features to create any tree. This is called random subspace method.
So for each Ti bootstrap dataset you create a tree Ki. If you want to classify some input data D = {x1, x2, ..., xM} you let it pass through each tree and produce S outputs (one for each tree) which can be denoted by Y = {y1, y2, ..., ys}. Final prediction is a majority vote on this set.
Out-of-bag error:
After creating the classifiers (S trees), for each (Xi,yi) in the original training set i.e. T, select all Tk which does not include (Xi,yi). This subset, pay attention, is a set of boostrap datasets which does not contain a particular record from the original dataset. This set is called out-of-bag examples. There are n such subsets (one for each data record in original dataset T). OOB classifier is the aggregation of votes ONLY over Tk such that it does not contain (xi,yi).
Out-of-bag estimate for the generalization error is the error rate of the out-of-bag classifier on the training set (compare it with known yi's).
Why is it important?
The study of error estimates for bagged classifiers in Breiman
[1996b], gives empirical evidence to show that the out-of-bag estimate
is as accurate as using a test set of the same size as the training
set. Therefore, using the out-of-bag error estimate removes the need
for a set aside test set.1
(Thanks #Rudolf for corrections. His comments below.)
In Breiman's original implementation of the random forest algorithm, each tree is trained on about 2/3 of the total training data. As the forest is built, each tree can thus be tested (similar to leave one out cross validation) on the samples not used in building that tree. This is the out of bag error estimate - an internal error estimate of a random forest as it is being constructed.