I encountered a perplexing situation: I trained a Transformer model for conditional music generation and found that the training loss decreases while the validation loss keeps increasing from the very beginning, which is quite strange.
I attached a TensorBoard screenshot for reference. Here is the data allocation of the roughly 2.7k songs: train:valid:test = 7:2:1.
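(For reference, a 7:2:1 split like this can be produced in two passes with scikit-learn; a minimal sketch, where the `songs` list is just a stand-in for the real dataset:)

```python
from sklearn.model_selection import train_test_split

songs = list(range(2700))  # stand-in for the ~2.7k songs

# First carve off 30%, then split that 30% into 20% valid / 10% test.
train, rest = train_test_split(songs, test_size=0.3, random_state=42)
valid, test = train_test_split(rest, test_size=1/3, random_state=42)
print(len(train), len(valid), len(test))  # 1890 540 270
```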
I even tried changing the learning rate from 0.01 to 0.00001, but the situation stayed the same. Given this, should I use more data for training, or are there other possible solutions?
I am working on reproducing the results reported in this paper. A U-Net-based network is used for estimating a sound speed map from raw ultrasound channel data. I have been stuck trying to further reduce the train/val loss for a long time.
Basically, I followed their methods of data simulation and preprocessing, and used the same network architecture and hyperparameters (including the kernel initializer, batch size, decay rate, etc.). The input size is 128×1024 rather than 192×2048 to match my ultrasound probe (according to their recent paper, the input size shouldn't affect performance).
So my question is: based on your experience, do you have any suggestions for investigating this problem further?
I attached my results and theirs for comparison: [figures: RMSE loss curves and estimated sound speed maps, mine vs. the paper's].
It seems my network fails to converge comparably in the background region, which could explain why my initial loss is larger.
PS: Unfortunately, the paper didn't provide code, so I have no clue about some details of the data simulation and training. I have contacted the author but haven't gotten a response yet.
The author mentioned somewhere that instead of using a pixel-wise MSE, one should try a larger window size such as 3×3 or 5×5. I am not clear whether this is meant for training or for metric evaluation; is there any reference for the former? (One possible reading is sketched below.)
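For what it's worth, here is a minimal sketch of one plausible interpretation of a "windowed" MSE training loss: compare 3×3 local averages of prediction and target instead of raw pixels. This is my assumption, not the paper's definition:

```python
import tensorflow as tf

def windowed_mse(window=3):
    """Compare local averages of prediction and target instead of raw
    pixels. Hypothetical reading of the 'windowed' MSE; the paper does
    not spell out the exact formulation."""
    pool = tf.keras.layers.AveragePooling2D(
        pool_size=window, strides=1, padding="same")

    def loss(y_true, y_pred):
        return tf.reduce_mean(tf.square(pool(y_true) - pool(y_pred)))

    return loss

# Usage (assuming a standard Keras image-regression model):
# model.compile(optimizer="adam", loss=windowed_mse(3))
```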
I am asking this question because I noticed that in competitions people tend to minimize the loss toward 0. I have an "image binary classification" problem, and I have already reduced the binary cross-entropy loss to 0.003 with a "train from scratch" transfer learning model. How can I further reduce it to 0? Should I fine-tune the model again, or should I go back and do image feature engineering?
Additionally, judging from the picture here, I suppose I am encountering vanishing gradients rather than overfitting. If so, what should I do next?
Thank you!
Since you are performing image binary classification, if you can minimize both your training and validation loss to 0, that basically means your network is "perfectly" trained to recognize all the validation images using just the training images. When this happens, I think it's better to get "harder" data for your network to learn from.
From your image, I think you should continue training your model for more epochs, since val_loss has not converged yet; accordingly, there are no indications of overfitting.
Regarding vanishing gradients, it's not possible to tell from your picture, since the common sign of vanishing gradients is weights dying down to 0. To check for this problem, I think you should track the weight distributions of your model in addition to the losses.
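A minimal sketch of one way to do this in Keras: log weight histograms to TensorBoard every epoch and watch whether they collapse toward zero. The toy model and random data below are placeholders for your own:

```python
import numpy as np
import tensorflow as tf

# Toy binary classifier just to illustrate the callback; replace with
# your own model and data.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

x = np.random.rand(256, 8).astype("float32")
y = np.random.randint(0, 2, size=(256, 1))

# histogram_freq=1 logs weight histograms every epoch; then run
# `tensorboard --logdir logs` and check whether weights drift to 0.
tb = tf.keras.callbacks.TensorBoard(log_dir="logs", histogram_freq=1)
model.fit(x, y, validation_split=0.2, epochs=5, callbacks=[tb])
```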
I have a dataset of around 6K chemical formulas, which I am preprocessing via Keras tokenization to perform binary classification. I am currently using a 1D convolutional neural network with dropout and am obtaining 82% training accuracy and 80% validation accuracy after only two epochs. No matter what I try, the model just plateaus there and doesn't seem to improve at all. The exact same accuracies are reached with a vanilla LSTM too. The losses differ by only 0.04. What else can I try to improve my accuracies? Both models use an embedding layer, and changing its output dimension isn't having an effect either.
Based on your description, I believe your model has high bias and low variance (see this link for further details). Thus, your model is not fitting your data very well, which causes underfitting. So I suggest three things:
Train your model a little longer: I believe two epochs are too few to give your model a chance to learn the patterns in the data. Try lowering the learning rate and increasing the number of epochs.
Try a different architecture: you may change the number of convolutions, filters, and layers. You can also use different activation functions and other layers such as max pooling.
Make an error analysis: once you finish training, apply your model to the test set and look at the errors. How many false positives and false negatives do you have? Is your model better at classifying one class than the other? Can you see a pattern in the errors that may be related to your data? (A short sketch follows after this list.)
Finally, if none of these suggestions helps, you may also try increasing the number of features, if possible.
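For the error analysis, here is a minimal sketch with scikit-learn; the label and probability arrays are dummy stand-ins for your real test labels and model predictions (e.g. `y_prob = model.predict(x_test)`):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

# Dummy stand-ins for illustration; replace with your real data.
rng = np.random.default_rng(0)
y_test = rng.integers(0, 2, size=200)
y_prob = rng.random(200)

y_pred = (y_prob > 0.5).astype(int)
print(confusion_matrix(y_test, y_pred))       # rows = true, cols = predicted
print(classification_report(y_test, y_pred))  # per-class precision/recall
```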
Hello all experts,
I am new to CNNs and Caffe. I have a binary classification task between two classes. The dataset I have collected is very small: about 50 images for class A and 50 for class B (I know that is very, very small). They are images of humans.
I took the BVLC model and made changes such as the batch size for testing and training, and also the maximum number of iterations. I tried many different setups, but the model doesn't work.
Any ideas about how to come up with an appropriate model or settings, or other solutions?
Remark: I once randomly made a change to the BVLC model setup and it worked, but I lost the setup file.
I got the train.prototxt and Solve.prototxt from Adil Moujahid.
I tried training batch sizes of 32, 64, 128, and 256, and test batch sizes of 5, 20, and 30, but they all failed.
As for the dataset, it consists of images of normal women and beautiful women, which I want to classify, but Stack Overflow does not allow me to add more than two links.
I wonder whether there is any formula, equation, or procedure I can follow to choose the right model settings.
Thank you in advance.
What do you mean by "doesn't work"? Does the loss stay too high? Does training converge but accuracy remain low? Andrew Ng has an excellent session on "debugging" CNNs: Nuts and Bolts of Building Applications using Deep Learning (NIPS slides, summary, additional summary).
My humble guess is that your network has an overfitting problem: it learns the specific examples and can't generalize. So increasing the training dataset, adding regularization, or using data augmentation can help.
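As one concrete option, here is a minimal data augmentation sketch, shown in Keras rather than Caffe for brevity; the directory layout is a hypothetical example. Random flips, shifts, and rotations multiply the effective size of a ~100-image dataset:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random geometric perturbations applied on the fly at training time.
datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
)

# Hypothetical directory layout: data/train/<class_name>/*.jpg
train_iter = datagen.flow_from_directory(
    "data/train", target_size=(227, 227), batch_size=16,
    class_mode="binary")
```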
My apologies if my question sounds stupid, but I am quite new to deep learning and Caffe.
How can we tell how many iterations are required to fine-tune a pre-trained network on our own dataset? For example, I am running fcn32 on my own data with 5 classes. When can I stop the fine-tuning process by looking at the loss and accuracy of the training phase?
Many thanks
You shouldn't do it by looking at the loss or accuracy of the training phase. Theoretically, the training accuracy should always increase (and the training loss should always decrease), because you train the network to decrease the training loss. But high training accuracy doesn't necessarily mean high test accuracy; that is what we refer to as overfitting. So what you need to find is the point where the accuracy on the test set (or validation set, if you have one) stops increasing. You can do this simply by specifying a relatively large number of iterations at first and monitoring the test accuracy or test loss; if the test accuracy stops increasing (or the loss stops decreasing) for N consecutive iterations (or epochs), where N could be 10 or another number of your choosing, stop the training process.
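A framework-agnostic sketch of that stopping rule; the `train_step` and `evaluate` callables are placeholders for your own training and validation code:

```python
def train_with_patience(train_step, evaluate, max_iters, eval_every,
                        patience=10):
    """Stop once validation accuracy hasn't improved for `patience`
    consecutive evaluations -- the rule described above."""
    best_acc, since_best = 0.0, 0
    for it in range(max_iters):
        train_step()
        if (it + 1) % eval_every == 0:
            acc = evaluate()
            if acc > best_acc:
                best_acc, since_best = acc, 0
            else:
                since_best += 1
            if since_best >= patience:
                break
    return best_acc
```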
The best thing to do is to track training and validation accuracy and store snapshots of the weights every k iterations. To compute validation accuracy, you need a separate set of held-out data which you do not use for training.
Then you can stop once the validation accuracy stops increasing or starts decreasing. This is called early stopping in the literature. Keras, for example, provides functionality for this: https://keras.io/callbacks/#earlystopping
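A minimal sketch of that callback in use, together with per-epoch weight snapshots as suggested above; the toy model and random data are placeholders for your own:

```python
import numpy as np
import tensorflow as tf

# Toy model and data just to make the snippet runnable; substitute your own.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
x = np.random.rand(200, 4).astype("float32")
y = np.random.randint(0, 2, size=(200, 1))

callbacks = [
    # Stop when val_loss hasn't improved for 10 epochs; keep best weights.
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True),
    # Snapshot the weights every epoch.
    tf.keras.callbacks.ModelCheckpoint("epoch_{epoch:02d}.weights.h5",
                                       save_weights_only=True),
]
model.fit(x, y, validation_split=0.2, epochs=200, callbacks=callbacks)
```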
Also, it's good practice to plot the above quantities, because it gives you important insights into the training process. See http://cs231n.github.io/neural-networks-3/#accuracy for a great illustration (not specific to early stopping).
Hope this helps
Normally you converge to a specific validation accuracy for your model. In practice, you usually stop training if the validation loss has not decreased for x epochs. Depending on your epoch duration, x commonly varies between 5 and 20.
Edit:
An epoch is one iteration over your training dataset, in ML terms. You do not seem to have a validation set. Normally the data is split into training and validation data so you can see how well your model performs on unseen data and make decisions about which model to take by looking at it. You might want to look at http://caffe.berkeleyvision.org/gathered/examples/mnist.html to see the usage of a validation set, even though they call it a test set.