I designed and trained an Inception-ResNet model for image recognition. The network has learned the training set well; however, the test accuracy is very unstable.
Here are the main hyperparameters and settings I used for training:
Training samples: 40,000 images.
Test samples: 15,000 images.
Learning rate: 0.001 for the first 50 epochs, 0.0001 for the next 50 epochs, and 0.00001 for the rest (a schedule sketch follows below).
Batch size: 128.
Dropout rate: 0.2.
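For reference, the step schedule I described looks roughly like this in code (a minimal sketch assuming PyTorch and SGD; the optimizer choice and placeholder model are assumptions, while the epoch boundaries 50/100 and the x0.1 decay match the rates above):

```python
import torch

# Sketch of the step learning-rate schedule described above, assuming
# PyTorch. SGD and the placeholder model are assumptions; the milestones
# (epochs 50 and 100) and decay factor 0.1 match the rates in the post.
model = torch.nn.Linear(10, 2)                       # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[50, 100], gamma=0.1)      # 1e-3 -> 1e-4 -> 1e-5

for epoch in range(150):
    # ... one training epoch here ...
    scheduler.step()                                 # advance the schedule
```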
After 150 epochs, the learning curves (training loss and test accuracy) look like this:
[figure: training loss and test accuracy]
I tried increasing the batch size, but that did not solve the problem.
Thank you in advance for any help you might be able to provide.
Regards,
An Nhien
I have a scheduling problem where the state/observation is a 125x100-pixel image. The action is a sequence of three sub-actions the agent can take:
MultiDiscrete([1 to 20, 0 to 5, 0 to 5]), which gives a total of 20 * 6 * 6 = 720 possible actions.
I am currently using a DQN algorithm to train the agent, and at every step the action-value function Q(s, a) is updated for only one of these 720 actions, which makes the updates very sparse. I trained for 100,000 iterations but it didn't converge.
How should I train the agent with DQN in this situation? How much does the training time increase due to such a large action space?
Is there an alternative algorithm that works better in these scenarios? In future problems the action space may grow even larger.
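For concreteness, the mapping between my MultiDiscrete space and the single flat Discrete(720) index that a standard DQN output head would select looks roughly like this (a numpy sketch; the dimension sizes are the ones above, and the first sub-action's values 1 to 20 are represented as indices 0 to 19):

```python
import numpy as np

# Sketch: flatten a MultiDiscrete([20, 6, 6]) action into a single
# Discrete(720) index for a standard DQN head, and map it back.
DIMS = (20, 6, 6)  # sizes of the three sub-action spaces

def flatten_action(a):
    """Map a sub-action tuple (a0, a1, a2) to one flat index in [0, 720)."""
    return int(np.ravel_multi_index(a, DIMS))

def unflatten_action(idx):
    """Map a flat index back to the sub-action tuple (a0, a1, a2)."""
    return tuple(int(v) for v in np.unravel_index(idx, DIMS))

assert flatten_action((0, 0, 0)) == 0
assert flatten_action((19, 5, 5)) == 719
assert unflatten_action(flatten_action((3, 1, 4))) == (3, 1, 4)
```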
I am training a network to classify psychosis (binary classification: either healthy or psychosis) given an MRI scan of a subject. My dataset has 500 items, of which I use 350 for training and 150 for validation. Around 44% of the dataset is healthy, and ~56% has psychosis.
When I train the network without data augmentation, the training loss begins decreasing immediately while validation loss never changes. The red line in the accuracy graph below is the dominant class percentage (56%).
When I retrain with data augmentation applied 80% of the time (random affine, blur, noise, flip), the overfitting is prevented, but now nothing is learned at all.
So I suppose my question is: what are some ideas for getting the validation accuracy to increase, i.e., for getting the network to learn without overfitting?
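One idea I've considered is weighting the loss by inverse class frequency so the ~56% majority class doesn't dominate; a minimal sketch, assuming PyTorch and approximate class counts for my 500 items:

```python
import torch
import torch.nn as nn

# Sketch, assuming PyTorch: weight the cross-entropy loss by inverse class
# frequency so the majority (psychosis, ~56%) class doesn't dominate.
# The counts below approximate the 44% / 56% split of a 500-item dataset.
counts = torch.tensor([220.0, 280.0])        # [healthy, psychosis]
weights = counts.sum() / (2.0 * counts)      # inverse-frequency weights
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 2)                   # dummy batch of model outputs
labels = torch.randint(0, 2, (8,))           # dummy ground-truth labels
loss = criterion(logits, labels)
print(weights, loss.item())
```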
I am training a ViT-based classification model and studying the model's behaviour as I increase the number of fully connected (fc) layers. I noticed that with more than 2 fc layers, the loss decreases sharply after only a few iterations, while with one or two fc layers the loss curve is smoother and decreases slowly. I am adding the loss curves for reference (left: 3 layers, right: 2 layers).
I have read that increasing the number of layers can raise training accuracy but may also cause overfitting. Judging by the loss curves, however, the model does not appear to be overfitting (each fc layer has 1000 neurons).
Can someone explain this behaviour? Thanks in advance.
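For reference, the classifier heads I'm comparing look roughly like this (a PyTorch sketch; the 768-dim embedding and 10 classes are assumptions, only the 1000-neuron width of the hidden fc layers is from my setup):

```python
import torch.nn as nn

# Sketch of the classifier heads being compared. The 768-dim embedding and
# 10 classes are assumptions; the 1000-neuron hidden width is from the post.
def make_head(num_fc, embed_dim=768, hidden=1000, num_classes=10):
    layers, in_dim = [], embed_dim
    for _ in range(num_fc - 1):                    # hidden fc layers
        layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
        in_dim = hidden
    layers.append(nn.Linear(in_dim, num_classes))  # final classification layer
    return nn.Sequential(*layers)

head_2fc = make_head(2)  # smoother, slowly decreasing loss
head_3fc = make_head(3)  # loss drops sharply after a few iterations
```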
My questions are about underfitting and overfitting, and they relate to the following results: here
In this scenario, can a flat validation loss combined with a decreasing training loss be considered a symptom of overfitting? I would have expected the validation loss to start increasing.
Moreover, towards the end the training loss was flattening, so is it correct to say that the model cannot learn any more with these hyperparameters? Is that, instead, a symptom of underfitting?
I'm working on this dataset (here).
I implemented a convolutional neural network with 7 conv layers and 2 FC layers (similar to VGG: 64-P-128-128-P-256-256-P-512-512, then a hidden FC layer of 256 neurons and a final FC layer for classification), clearly not aiming for a state-of-the-art score (currently about 75%).
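For reference, that description maps to roughly the following (a PyTorch sketch; the 3-channel input, 3x3 convolutions, ReLU, 2x2 max-pooling for "P", global average pooling before the FC layers, and 10 classes are all assumptions):

```python
import torch.nn as nn

# Sketch of the VGG-like net described above. Only the channel pattern
# 64-P-128-128-P-256-256-P-512-512 and the 256-neuron hidden FC come from
# the post; everything else (input channels, kernel size, pooling, class
# count) is an assumption for illustration.
def conv(in_c, out_c):
    return nn.Sequential(nn.Conv2d(in_c, out_c, 3, padding=1), nn.ReLU())

model = nn.Sequential(
    conv(3, 64), nn.MaxPool2d(2),                      # 64-P
    conv(64, 128), conv(128, 128), nn.MaxPool2d(2),    # 128-128-P
    conv(128, 256), conv(256, 256), nn.MaxPool2d(2),   # 256-256-P
    conv(256, 512), conv(512, 512),                    # 512-512
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(512, 256), nn.ReLU(),                    # hidden FC, 256 neurons
    nn.Linear(256, 10),                                # classification FC
)
```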
It seems strange to me to talk about underfitting and overfitting in the same training run, so I'm pretty sure there's something I'm missing. Could you help me understand these results?
Thanks for your attention
I've found a similar question, but it didn't help (here).
I want to know whether it is possible to estimate the training time of a convolutional neural network given parameters like depth, number of filters, input size, etc.
For instance, I am working on a 3D convolutional neural network whose structure is as follows:
a (20x20x20) convolutional layer with stride of 1 and 8 filters
a (20x20x20) max-pooling layer with stride of 20
a fully connected layer mapping to 8 nodes
a fully connected layer mapping to 1 output
I am running 100 epochs and printing the loss (mean squared error) every 10 epochs. It has now been running for 24 hours with no loss printed, so I assume it has not completed 10 epochs yet. By the way, I am not using a GPU.
Is it possible to estimate the training time with a formula or something similar? Does it depend on the time complexity, or on my hardware? I also found the following paper; will it give me some information?
https://ai2-s2-pdfs.s3.amazonaws.com/140f/467566e799f32831db6913d84ccdbdcac0b2.pdf
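Independently of the paper, one practical approach would be timing a few training steps and extrapolating to the full run; a minimal sketch, assuming PyTorch, with an illustrative 60x60x60 input and batch size 4 (both assumptions):

```python
import time
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch, assuming PyTorch: time a few training steps and extrapolate,
# which on CPU is usually more reliable than a FLOP-count formula. The
# 60x60x60 input and batch size 4 are assumptions; layer shapes follow
# the structure in the post.
model = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=20, stride=1),   # (20x20x20) conv, 8 filters
    nn.MaxPool3d(kernel_size=20, stride=20),     # (20x20x20) pool, stride 20
    nn.Flatten(),
    nn.Linear(8 * 2 * 2 * 2, 8), nn.ReLU(),      # 64 features for a 60^3 input
    nn.Linear(8, 1),                             # single output
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
x = torch.randn(4, 1, 60, 60, 60)                # dummy batch
y = torch.randn(4, 1)

start = time.time()
for _ in range(5):                               # a few timed steps
    optimizer.zero_grad()
    loss = F.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
per_step = (time.time() - start) / 5
print(f"~{per_step:.2f} s/step; total ~= steps_per_epoch * 100 * per_step")
```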
Thanks in advance.