Large number of training steps results in poor performance in transfer learning - deep-learning

I have a question. I have used transfer learning to retrain GoogLeNet on my image classification problem. I have 80,000 images belonging to 14 categories, and I set the number of training steps to 200,000. I think the code provided by TensorFlow has dropout implemented and trains with random shuffling of the dataset and a cross-validation approach. I do not see any overfitting in the training and classification curves, and I get high cross-validation accuracy and high test accuracy, but when I apply my model to a new dataset I get poor classification results. Does anybody know what is going on? Thanks!
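One way to narrow this down is to measure the gap directly: evaluate the same trained model on the original held-out test set and on a sample of the new data. The sketch below is a minimal tf.keras illustration; the saved-model name, directory paths, and image size are placeholders, not taken from the question.

```python
import tensorflow as tf

# Hypothetical saved model and directories; adjust to your own setup.
model = tf.keras.models.load_model("retrained_model")

test_data = tf.keras.utils.image_dataset_from_directory(
    "original_test_set/",   # placeholder: held-out data from the original 80,000 images
    image_size=(224, 224),
    batch_size=32,
)
new_data = tf.keras.utils.image_dataset_from_directory(
    "new_dataset/",         # placeholder: the new data that is classified poorly
    image_size=(224, 224),
    batch_size=32,
)

# A large gap between these two scores suggests the new data differs from the
# training/test distribution, rather than overfitting during training.
print("original test set:", model.evaluate(test_data, verbose=0))
print("new dataset:      ", model.evaluate(new_data, verbose=0))
```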

Related

Ways to prevent underfitting and overfitting when using data augmentation to train a transposed CNN

I'm training a CNN (one using a series of ConvTranspose2D layers in PyTorch) that uses input data from JSON to construct an image. Unlike natural language, the input data can be in any order, as it contains info about various sprites in a scene.
In my first attempts to train the model, I didn't change the order of the input data (meaning, on each epoch, each sprite was represented in the same place in the input data). The model learned for about 10 epochs, but then there started to be divergence between the training loss (which continued to go down) and the test loss. So classic overfitting.
I tried to solve this by doing a form of data augmentation where the output data (in this case an image) stayed the same but I shuffled the order of the input data. As I have around 400 sprites, there are up to 400! possible orderings, so theoretically this can vastly expand the amount of training data. For example, instead of 100k JSON documents corresponding to 100k images, by shuffling the order of sprites in the input data you have 400! * 100,000 training data points. In practice, of course, this amount of data is impractical, so I went with around 2 million data points for an initial test. The issue I ran into here was that the model was not learning at all: after reaching a certain loss very quickly (within the first few mini-batches), it made no further progress for around 4 epochs. So classic underfitting.
Like Goldilocks, I'd like to find the "just right" between the initial overfitting and the subsequent underfitting, and I'm wondering what other strategies I could try. One idea I had was letting the model train on a predetermined order of sprites (the overfitting case) and then, once overfitting starts (i.e., two straight epochs with divergence between the test and training loss), shuffling the data. I can also play with changing the model, although it can only be so big because of hardware constraints and the fact that inference needs to happen in under 20 ms.
Are there any papers or techniques that are recommended in this scenario where data augmentation can lead to vastly more data points but results in a model ceasing to learn? Thanks in advance for any tips!
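The "shuffle only once overfitting starts" idea can be implemented as an on-the-fly augmentation flag rather than by materialising millions of pre-shuffled copies. Below is a minimal PyTorch sketch; the JSON field names (`sprites`, `x`, `y`, `sprite_id`) and the fixed-length per-sprite features are assumptions about the data layout, not the asker's actual schema.

```python
import json
import torch
from torch.utils.data import Dataset


class SpriteSceneDataset(Dataset):
    """Pairs a scene's JSON sprite list with its target image tensor."""

    def __init__(self, json_paths, image_tensors, shuffle_sprites=False):
        self.json_paths = json_paths          # one JSON document per scene
        self.image_tensors = image_tensors    # target images, same order
        self.shuffle_sprites = shuffle_sprites

    def __len__(self):
        return len(self.json_paths)

    def __getitem__(self, idx):
        with open(self.json_paths[idx]) as f:
            scene = json.load(f)
        # Each sprite becomes a fixed-length feature vector (assumed fields).
        feats = torch.tensor(
            [[s["x"], s["y"], s["sprite_id"]] for s in scene["sprites"]],
            dtype=torch.float32,
        )
        if self.shuffle_sprites:
            # Same target image, freshly permuted sprite order on every access.
            feats = feats[torch.randperm(feats.size(0))]
        return feats.flatten(), self.image_tensors[idx]


# Phase 1: train with shuffle_sprites=False (fixed order).
# Once validation loss diverges for two consecutive epochs, flip the flag and
# keep training from the same weights:
#     dataset.shuffle_sprites = True
```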

Should I split the only dataset into train and test, or can I use the whole of it for a regression problem?

In Kaggle competitions, we have a train and a test dataset, so we usually develop a model on the training dataset and evaluate it with a test dataset that is unseen by the algorithm. I was wondering what the best method for validating a regression problem is if just one dataset is given to us without any test dataset. I think there might be two approaches:
In the first approach, after importing the dataset, it is split into train and test sets; this way the test set is not seen by the algorithm until the last step. After performing preprocessing and feature engineering, we can use cross-validation techniques on the training dataset, or a train-test split, to improve the error of our model. Finally, the quality of the model can be checked on the unseen data.
The second approach is one I have seen some data scientists use for regression problems: they use the whole dataset for testing and validation, I mean they use all the data at the same time.
Could you please tell me which strategy is better, especially when a recruiter gives us just a dataset and asks us to develop a model to predict the target variable?
Thanks,
Med
You must divide the dataset into two parts: a training and a validation dataset.
Then train your model on the training dataset and validate it on the validation dataset. The more data you have, the better your model can be fitted. Quality checking of the model can be done with the validation dataset that was split off earlier. You can also check the quality of your model with accuracy and other scoring metrics.
When checking the quality of the model, you can also create your own custom dataset with values similar to the original dataset.
On Kaggle, when the competition is about to close, the actual test dataset on which the model is ranked is released.
The reason is that with more data, the algorithm has more feature-label pairs to train and validate on, which improves the model.
Approach 2 described in the question is better:
"Also, I saw that for regression problems, some data scientists use the whole dataset for testing and validation, I mean they use all the data at the same time."
Approach one is not preferred because, on a competitive platform, your model has to perform as well as possible, so having less training and validation data can hurt the accuracy.
Divide your one dataset into a training dataset and a testing dataset.
While training your model, divide your training dataset into training, validation, and testing splits, run the model, check the accuracy, and save the model.
Then load the saved model and predict on the held-out testing dataset.
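For reference, here is a minimal scikit-learn sketch of the split-first workflow described above (the synthetic data and the choice of model are placeholders, not from the question):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, cross_val_score

# Stand-in data; replace with the single dataset you were given.
X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)

# 1. Hold out a test set up front; it is not touched until the very end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2. Tune and compare models with cross-validation on the training portion only.
model = RandomForestRegressor(random_state=42)
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")
print("cross-validated R^2:", cv_scores.mean())

# 3. Fit on the full training portion and report a final score on the held-out test set.
model.fit(X_train, y_train)
print("held-out test R^2:", model.score(X_test, y_test))
```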

Deep Learning: Is validation dataset used in training?

In supervised learning, the original data is divided into three parts: the training dataset, the validation dataset, and the test dataset.
The training dataset is used to train a model.
The test dataset is used to evaluate the model at the end, so it is not used in the training process.
The validation dataset is used for tuning parameters of the model while training, I think.
What I want to know is whether the validation dataset is used for training or not. Is it used for calculating weights and bias?
Yes, as you said, you use the validation data for hyperparameter tuning. Another use of the validation data is to check whether you are overfitting on the training data.
In supervised learning, the validation dataset is used during the training phase, but not in the same way the training dataset is used.
Since the goal is to get a model that can predict/classify new instances with high precision and/or accuracy, it's very important to minimize the error.
Thus, the training dataset is used to calculate the weights and biases of the neural network. The validation dataset is used to calculate the error on data the model has not been fitted to: after each epoch, the validation instances are used to predict the labels, the predictions are compared with the actual labels, and the resulting error/accuracy is used to decide things like when to stop training and how to adjust the hyperparameters. It does not directly drive the weight and bias updates; those come only from the training data.
Hope this helps you clarify this topic. You can also refer to some textbooks.
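A minimal tf.keras sketch of this split in practice (the model, data, and hyperparameters are illustrative placeholders): the gradients that update the weights come only from the training data, while the validation data is only scored after each epoch for monitoring and early stopping.

```python
import numpy as np
from tensorflow import keras

# Placeholder data standing in for a real training/validation split.
rng = np.random.default_rng(0)
x_train, y_train = rng.normal(size=(1000, 20)), rng.integers(0, 2, size=1000)
x_val, y_val = rng.normal(size=(200, 20)), rng.integers(0, 2, size=200)

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",            # watched on the validation split only
    patience=5,
    restore_best_weights=True,
)

# x_train/y_train drive the weight updates; x_val/y_val are only evaluated.
model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=100,
    callbacks=[early_stop],
)
```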

Training a small amount of data on a large-capacity network

Currently I am using convolutional neural networks to solve a binary classification problem. The data I use is 2D images, and the number of training examples is only about 20,000-30,000. In deep learning, it is generally known that overfitting can arise if the model is too complex relative to the amount of training data. So, to prevent overfitting, a simplified model or transfer learning is usually used.
Previous developers in the same field did not use high-capacity models (high-capacity means a large number of model parameters) due to the small amount of training data. Most of them used small-capacity models and transfer learning.
But when I trained high-capacity models (based on ResNet50, InceptionV3, DenseNet101) from scratch, which have about 10 to 20 million parameters, I got high accuracy on the test set.
(Note that the training set and the test set were exclusively separated, and I used early stopping to prevent overfitting)
In the ImageNet image classification task, the training data is about 10 million. So, I also think that the amount of my training data is very small compared to the model capacity.
Here I have two questions.
1) Even though I got high accuracy, is there any reason why I should not use a small amount of data on the high-capacity model?
2) Why does it perform well? Even though there is a (very) large gap between the amount of data and the number of model parameters, do techniques like early stopping overcome the problem?
1) You're completely right that small amounts of training data can be problematic when working with a large model. Given that your ultimate goal is to achieve "high accuracy", this theoretical limitation shouldn't bother you too much if the practical performance is satisfactory for you. Of course, you might always do better, but I don't see a problem with your workflow if the score on the test data is legitimate and you're happy with it.
2) First of all, I believe ImageNet consists of 1.X million images so that puts you a little closer in terms of data. Here are a few ideas I can think of:
Your problem is easier to solve than ImageNet
You use image augmentation to synthetically increase your image data (see the sketch after this answer)
Your test data is very similar to the training data
Also, don't forget that 30,000 samples means (30,000 * 224 * 224 * 3 =) 4.5 billion values. That should make it quite hard for a 10 million parameter network to simply memorize your data.
3) Welcome to StackOverflow
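Regarding the image augmentation point above, here is a minimal tf.keras sketch of on-the-fly augmentation; the specific layers, parameters, and the tiny model around them are illustrative assumptions, not the asker's pipeline (the augmentation layers used here live under tf.keras.layers in TensorFlow 2.6+).

```python
import tensorflow as tf
from tensorflow.keras import layers

# Random transforms applied on the fly; each epoch sees different variants.
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),   # mirror images left/right
    layers.RandomRotation(0.1),        # rotate by up to ~36 degrees
    layers.RandomZoom(0.1),            # zoom in/out by up to 10%
])

# Placed inside the model, these layers are active only during training,
# so evaluation on the test set sees the original, unmodified images.
inputs = tf.keras.Input(shape=(224, 224, 3))
x = data_augmentation(inputs)
x = layers.Rescaling(1.0 / 255)(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)   # binary classification head
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```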

Keras pass data through layers explicitly

I am trying to implement a pairwise learning-to-rank model with Keras, where the features are computed by a deep neural network.
In the pairwise L2R model, while training, I give the query, one positive result, and one negative result, and the model is trained with a classification loss on the difference of the feature vectors.
I am able to compile and fit the model successfully, but the problem is actually using this model on test data.
In the pairwise L2R model, at testing time I would have only query-sample pairs (no separate positives and negatives), and I can use the calculated value before the softmax to rank the samples.
Is there any way I can use Keras to pass data manually at test time through particular trained layers? (In short, I have 3 sets of inputs at training time and 2 at testing time.)
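One common pattern for this (a sketch under assumptions about the feature size and loss, not the asker's exact architecture) is to wrap the scoring network in its own keras.Model, reuse that sub-model for all three training inputs, and then call the same sub-model directly at test time with only the query and the sample:

```python
from tensorflow import keras
from tensorflow.keras import layers

FEAT_DIM = 128  # assumed size of the query/sample feature vectors

def build_scorer():
    """Shared tower mapping a (query, candidate) pair to a single relevance score."""
    query = keras.Input(shape=(FEAT_DIM,), name="query")
    cand = keras.Input(shape=(FEAT_DIM,), name="candidate")
    x = layers.Concatenate()([query, cand])
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dense(64, activation="relu")(x)
    score = layers.Dense(1, name="score")(x)
    return keras.Model([query, cand], score, name="scorer")

scorer = build_scorer()

# Training model: three inputs, trained on the score difference with a
# RankNet-style logistic loss (label 1 = positive should outrank negative).
q_in = keras.Input(shape=(FEAT_DIM,), name="q")
pos_in = keras.Input(shape=(FEAT_DIM,), name="pos")
neg_in = keras.Input(shape=(FEAT_DIM,), name="neg")
diff = layers.Subtract()([scorer([q_in, pos_in]), scorer([q_in, neg_in])])
prob = layers.Activation("sigmoid")(diff)
train_model = keras.Model([q_in, pos_in, neg_in], prob)
train_model.compile(optimizer="adam", loss="binary_crossentropy")
# train_model.fit([q, pos, neg], ones_labels, ...)

# Test time: only two inputs are needed. Call the shared sub-model directly;
# its weights were updated while training the three-input model above.
# scores = scorer.predict([q_test, samples_test])
```

At test time the three-input training model is never called; only the shared scorer sub-model is, which is exactly the "pass data through particular trained layers" behaviour being asked about.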