I want to split my data into 3 partitions for regression: 70% training, 15% validation and 15% test. scikit-learn's cross_validation.train_test_split only provides a way to split into training and testing. Any ideas?
Use cross_validation.train_test_split twice.
First split (70, 30) => (training, validation_test), then split the second part (50, 50) => (validation, test).
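The two-step split described above can be sketched as follows. This is a minimal example on made-up data, and it assumes a recent scikit-learn release, where train_test_split lives in sklearn.model_selection (the older cross_validation module has since been removed). The random_state values are arbitrary and only there for reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy regression data: 100 samples, 3 features (made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Step 1: 70% training, 30% held back for validation + test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Step 2: split the held-back 30% in half, giving 15% validation and 15% test.
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, random_state=42
)

print(len(X_train), len(X_val), len(X_test))
```

With 100 samples this yields 70, 15 and 15 rows. Note that the second call uses 0.50 because it is a fraction of the remaining 30%, not of the full data set.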
I trained an image classification model with 10 classes by fine-tuning EfficientNet-B4 for 100 epochs. I split my training data 70/30. I used stochastic gradient descent with Nesterov momentum of 0.9 and a starting learning rate of 0.001. The batch size is 10. The test loss seemed to be stuck at 84% for the next 50 epochs (51st to 100th). I do not know whether the model was stuck in a local minimum or was overfitting. Below is an image of the test and train loss from the 51st to the 100th epoch. I need your help. Thanks.
[Image: train and test loss from the 51st to the 100th epoch]
From the graph you provided, both validation and training losses are still going down, so your model is still learning and there is no sign of overfitting. If your test set is stuck at the same accuracy, the reason is probably that the data you are using for training/validation does not generalize well enough to your test dataset (in your graph the validation only reached 50% accuracy while your test set reached 84% accuracy).
I looked at your training and validation graph. Yes, your model is training and the losses are going down, but your validation error is near 50%, which is still poor (for 10 balanced classes, purely random guessing would give about 90% error).
Possible reasons:
1- From your training error (shown in the image for epochs 50-100), the error is going down on average, but it is noisy: your error at epoch 100 is pretty much the same as at epoch 70. This could be because your dataset is too simple and you are forcing a huge network like EfficientNet to overfit it.
2- It could also be the way you are fine-tuning, for example which layers you froze and which layers receive gradients during backpropagation. I am assuming you are using pre-trained weights.
3- Optimizer issue. Try Adam.
It would be great if you could provide the losses for all epochs (1 to 100).
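To judge whether a noisy loss curve is really flat (the "epoch 100 looks like epoch 70" observation above), one option is to compare smoothed values instead of single epochs. The sketch below uses a plain moving average on hypothetical loss values; the numbers and the window size are made up for illustration and are not taken from the question's graph.

```python
def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical per-epoch losses: a slow downward trend plus alternating noise.
losses = [1.0 - 0.002 * epoch + (0.05 if epoch % 2 else -0.05) for epoch in range(50)]

smoothed = moving_average(losses, window=10)
raw_change = losses[-1] - losses[0]
trend = smoothed[-1] - smoothed[0]  # negative means the loss is still falling overall
print(f"raw change: {raw_change:+.3f}, smoothed change: {trend:+.3f}")
```

In this toy series the first and last raw values are almost identical, while the smoothed curve still shows a clear decrease, which is why comparing two individual epochs can be misleading.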
I am new to data science and would like to ask for help with model selection.
I have built 8 models to predict salary from years of experience, position name and location.
Then I tried to compare the 8 models by RMSE, but I am not sure which model I should select. (In my mind, I prefer model 8: after testing with random forest, the result was better than regression, so I used the whole data set to make the final version. However, it is more difficult to interpret than regression coefficients.)
Which model would you prefer, and why?
And in reality, do data scientists follow a process like this, or do they have an automatic way to deal with it?
1 RMSElm1: model: linear regression; data: train 80%, test 20%; no imputation
= 22067.58
2 RMSElm2: model: linear regression; data: train 80%, test 20%; imputation for some locations that I think indicate the same salary level
= 22115.64
3 RMSElm3: model: linear regression + stepwise; data: train 80%, test 20%; no imputation
= 22081.06
4 RMSEdeep1: model: deep learning (H2O package, activation = 'Rectifier', hidden = c(5,5), epochs = 100); data: train 80%, test 20%; no imputation
= 16265.13
5 RMSErf1: model: random forest (ntree = 10); data: train 80%, test 20%; no imputation
= 14669.92
6 RMSErf2: model: random forest (ntree = 500); data: train 80%, test 20%; no imputation
= 14669.92
7 RMSErf3: model: random forest (ntree = 10); data: 10-fold cross-validation; no imputation
= 14440.82
8 RMSErf4: model: random forest (ntree = 10); data: all of the data set; no imputation
= 13532.74
In regression problems, MSE or RMSE is a way to measure how well your model is doing; lower is better. So go with the model that gives the lowest MSE or RMSE and try it on test data. Ensemble methods often give the best results; XGBoost is often used in competitions.
There might be a case of overfitting, where you get a very low RMSE on the training data but a high RMSE on the test data. That is why it is considered good practice to use cross-validation.
You might want to check this: https://stats.stackexchange.com/questions/56302/what-are-good-rmse-values
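As an illustration of the cross-validation advice, here is a minimal NumPy sketch of a k-fold RMSE estimate. It fits ordinary least squares on synthetic data that is made up for this example (it is not the salary data from the question), and the helper names rmse and kfold_rmse are hypothetical. The point is that every score comes from rows the model did not see during fitting. An RMSE measured on the same rows the model was fit on, as model 8 appears to be, is not comparable with such held-out scores.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def kfold_rmse(X, y, k=10, seed=0):
    """Average held-out RMSE of an ordinary least-squares fit over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Add an intercept column and solve the least-squares problem.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        scores.append(rmse(y[test_idx], A_test @ coef))
    return float(np.mean(scores))

# Synthetic data (made up): salary depends linearly on years of experience.
rng = np.random.default_rng(1)
years = rng.uniform(0, 20, size=200)
salary = 30000 + 2500 * years + rng.normal(scale=5000, size=200)
X = years.reshape(-1, 1)

cv_score = kfold_rmse(X, salary, k=10)
print(f"10-fold RMSE: {cv_score:.0f}")
```

The same loop works for any model: replace the least-squares fit with the model's own fit and predict steps, and compare the candidates on their cross-validated RMSE.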
I am training a Deep CNN on a very unbalanced data set for a binary classification problem. I have 90% 0's and 10% 1's. To penalize the misclassification of 1, I am using a class_weight that was determined by sklearn's compute_class_weight(). In the validation tuple passed to the fit_generator(), I am using a sample_weight that was computed by sklearn's compute_sample_weight().
The network seems to be learning fine, but the validation accuracy stays at either 90% or 10% after every epoch. How can I solve this class imbalance issue in Keras, considering the steps I have already taken to overcome it?
[Picture: the fit_generator() call]
[Picture: log outputs]
It's very strange that your val_accuracy jumps from 0.9 to 0.1 and back. Is your learning rate appropriate? Try lowering it even more.
My advice: also track the F1 metric.
How did you split the data? Do the classes appear at the same rate in the train set as in the test set?
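A validation accuracy of exactly 90% or 10% on a 90/10 data set is what a model produces when it predicts a single class for every sample, and F1 exposes this where accuracy does not. The pure-Python sketch below shows the arithmetic. The helper names are hypothetical, and the weight formula n_samples / (n_classes * count) is the "balanced" heuristic documented for sklearn's compute_class_weight.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight per class: n_samples / (n_classes * count of that class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: treat F1 as 0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 90% zeros and 10% ones, as in the question.
y_true = [0] * 90 + [1] * 10
all_zeros = [0] * 100  # degenerate model that always predicts 0
all_ones = [1] * 100   # degenerate model that always predicts 1

weights = balanced_class_weights(y_true)
print(weights)
print(accuracy(y_true, all_zeros), f1_score(y_true, all_zeros))
print(accuracy(y_true, all_ones), f1_score(y_true, all_ones))
```

The always-0 model reaches 90% accuracy with an F1 of 0, and the always-1 model reaches 10% accuracy with an F1 of about 0.18, so a val_accuracy flipping between 0.9 and 0.1 suggests the network is switching between these two degenerate predictions.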
How exactly do kur test and kur evaluate differ?
The differences we see from the console:
(dlnd-tf-lab) ->kur evaluate mnist.yml
Evaluating: 100%|████████████████████████████| 10000/10000 [00:04<00:00, 2417.95samples/s]
LABEL CORRECT TOTAL ACCURACY
0 949 980 96.8%
1 1096 1135 96.6%
2 861 1032 83.4%
3 868 1010 85.9%
4 929 982 94.6%
5 761 892 85.3%
6 849 958 88.6%
7 935 1028 91.0%
8 828 974 85.0%
9 859 1009 85.1%
ALL 8935 10000 89.3%
Focus on one: /Users/Natsume/Downloads/kur/examples
(dlnd-tf-lab) ->kur test mnist.yml
Testing, loss=0.458: 100%|█████████████████████| 3200/3200 [00:01<00:00, 2427.42samples/s]
Without reading the source code behind kur test and kur evaluate, how can we understand exactly how they differ?
#ajsyp, the developer of Kur (a deep learning library), provided the following answer, which I found very helpful.
kur test is used when you know what the "correct answer" is, and you
simply want to see how well your model performs on a held-out sample.
kur evaluate is pure inference: it is for generating results from
your trained model.
Typically in machine learning you split your available data into 3
sets: training, validation, and testing (people sometimes call these
different things, just so you're aware). For a particular model
architecture / selection of model hyperparameters, you train on the
training set, and use the validation set to measure how well the model
performs (is it learning correctly? is it overtraining? etc). But you
usually want to compare many different model hyperparameters: maybe
you tweak the number of layers, or their size, for example.
So how do you select the "best" model? The most naive thing to do is
to pick the model with the lowest validation loss. But then you run
the risk of optimizing/tweaking your model to work well on the
validation set.
So the test set comes into play: you use the test set as a very final,
end of the day, test of how well each of your models is performing.
It's very important to hide that test set for as long as possible,
otherwise you have no impartial way of knowing how good your model is
or how it might compare to other models.
kur test is intended to be used to run a test set through the model
to calculate loss (and run any applicable hooks).
But now let's say you have a trained model, say an image recognition
model, and now you want to actually use it! You get some new data (you
probably don't even have "truth" labels for them, just the raw
images), and you want the model to classify the images. That's what
kur evaluate is for: it takes a trained model and uses it "in
production mode," where you don't have/need truth values.
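The train / validation / test workflow described in the answer can be sketched in plain NumPy, independently of Kur. Everything here is an assumption made for illustration: the data is synthetic, the "model" is a polynomial fit, and the hyperparameter being tuned is the polynomial degree. What matters is the order of operations: fit on the training set, choose on the validation set, and touch the test set only once at the end.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=300)
y = np.sin(x) + rng.normal(scale=0.2, size=300)

# 70% train, 15% validation, 15% test.
idx = rng.permutation(len(x))
train, val, test = idx[:210], idx[210:255], idx[255:]

def mse(coeffs, rows):
    """Mean squared error of a fitted polynomial on the given rows."""
    return float(np.mean((np.polyval(coeffs, x[rows]) - y[rows]) ** 2))

# Fit one model per hyperparameter value, using only the training set.
candidates = [1, 2, 3, 5, 7]
fits = {d: np.polyfit(x[train], y[train], deg=d) for d in candidates}

# Model selection uses the validation set only.
val_loss = {d: mse(fits[d], val) for d in candidates}
best_degree = min(val_loss, key=val_loss.get)

# The test set is touched once, after the choice has been made.
test_loss = mse(fits[best_degree], test)
print(best_degree, round(val_loss[best_degree], 4), round(test_loss, 4))
```

Because best_degree was picked to minimise the validation loss, that validation number is an optimistic estimate; the test loss is the impartial one.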
So basically one splits the dataset into training/testing. Let's say 2/3 for training and the rest for testing.
Then in Caffe we split our training data into batches of a chosen size. Let's say that we have 100 batches of 50 images each, so we have 5000 training images. Now let's say that we have 50 testing batches of 50 images each.
Now let's say that Caffe ran 1 epoch and then tests with the testing batches. How does Caffe do this?
Does it take the first training batch and use it to try to predict the labels of every testing batch?
Like:
training_batch_1 : testing_batch_1 = accuracy xxxx;
training_batch_1 : testing_batch_2 = accuracy xxxx;
....
training_batch_1 : testing_batch_50 = accuracy xxxx;
And then it extracts the mean accuracy for training_batch_1, and then does the same thing with training_batch_2 and so on?
A test simply runs the input vector through a single forward pass of the trained model. Does the top predicted label match the given test value? If so, score 1 point. At the end of the batch, divide total points by batch size, and that's the batch accuracy.
At the end of the testing run, take the mean of the batch accuracies; that's the testing accuracy.
Is that what you needed to know?