Estimating the training time of a convolutional neural network

I want to know whether it is possible to estimate the training time of a convolutional neural network, given parameters such as depth, filters, input size, etc.
For instance, I am working on a 3D convolutional neural network whose structure is like:
a (20x20x20) convolutional layer with stride of 1 and 8 filters
a (20x20x20) max-pooling layer with stride of 20
a fully connected layer mapping to 8 nodes
a fully connected layer mapping to 1 output
I am running 100 epochs and printing the loss (mean squared error) every 10 epochs. It has now been running for 24 hours and no loss has been printed (I suppose it has not completed 10 epochs yet). By the way, I am not using a GPU.
Is it possible to estimate the training time with a formula or something similar? Is it related to the network's time complexity or to my hardware? I also found the following paper; will it give me some information?
https://ai2-s2-pdfs.s3.amazonaws.com/140f/467566e799f32831db6913d84ccdbdcac0b2.pdf
Thanks in advance.
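
One practical way to estimate the total training time, rather than deriving a formula, is to time a few batches and extrapolate; the per-batch time already folds in both the network's complexity and the hardware. A minimal sketch, assuming a Keras model (`model`, `x_train`, `y_train`, and the batch size below are placeholders, not taken from the question):

```python
import time

# Placeholders: `model`, `x_train`, `y_train` are assumed to exist already.
batch_size = 32                                 # assumption
n_warmup, n_timed = 2, 10                       # batches to warm up / to time
steps_per_epoch = len(x_train) // batch_size
epochs = 100

# Warm-up pass (the first batches are slower due to graph building/caching).
model.fit(x_train[:n_warmup * batch_size], y_train[:n_warmup * batch_size],
          batch_size=batch_size, epochs=1, verbose=0)

# Time a handful of batches and extrapolate to the full run.
start = time.perf_counter()
model.fit(x_train[:n_timed * batch_size], y_train[:n_timed * batch_size],
          batch_size=batch_size, epochs=1, verbose=0)
sec_per_batch = (time.perf_counter() - start) / n_timed

estimate = sec_per_batch * steps_per_epoch * epochs
print(f"~{sec_per_batch:.2f} s per batch, ~{estimate / 3600:.1f} h for {epochs} epochs")
```

On a CPU, a 3D convolution with a 20x20x20 kernel is very expensive per output position, so long per-batch times like the ones you are seeing are not surprising.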

Related

How to model a MultiDiscrete action space with 720 possible combinatorial actions?

I have a scheduling problem where the state/observation is an image of 125x100 pixels. The action space is a sequence of three actions the agent can take:
MultiDiscrete [1 to 20, 0 to 5, 0 to 5]. These give a total of 20 * 6 * 6 = 720 possible actions.
I am currently using a DQN algorithm to train the agent, and at every step the action-value function is updated for only one of these actions, which makes the learning signal very sparse. I trained for 100,000 iterations, but it didn't converge.
How do I train the agent using DQN in these situations? How much does the training time increase due to such a large action space?
Is there an alternative algorithm that works better in these scenarios? In future problems the action space may grow even larger.
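
One common workaround is to flatten the MultiDiscrete space into a single Discrete space so that a standard DQN head with 720 outputs can be used. A minimal sketch, assuming a Gymnasium-style environment (the wrapper name and the [20, 6, 6] sizes are illustrative):

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class FlattenMultiDiscrete(gym.ActionWrapper):
    """Expose a MultiDiscrete action space as one Discrete space for DQN."""
    def __init__(self, env):
        super().__init__(env)
        self.nvec = env.action_space.nvec                             # e.g. [20, 6, 6]
        self.action_space = spaces.Discrete(int(np.prod(self.nvec)))  # 720 actions

    def action(self, flat_action):
        # Map the flat index chosen by the DQN back to the three sub-actions.
        return np.array(np.unravel_index(int(flat_action), self.nvec))
```

For action spaces that grow much larger, factored Q-heads (one output per sub-action dimension) or policy-gradient methods that support MultiDiscrete spaces natively in common RL libraries (e.g. PPO) are frequent alternatives.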

Can a flat validation loss and a decreasing training loss be considered a symptom of overfitting?

My questions are about underfitting/overfitting and relate to the following results (here).
In this scenario, can a flat validation loss combined with a decreasing training loss be considered a symptom of overfitting? I would have expected the validation loss to start increasing.
Moreover, at the end the training loss was flattening out, so is it correct to say that the model cannot learn any more with these hyperparameters? Or is this, instead, a symptom of underfitting?
I'm working on this dataset (here).
I implemented a convolutional neural network with 7 conv layers and 2 FC layers (similar to VGG: 64-P-128-128-P-256-256-P-512-512, then a hidden FC layer of 256 neurons and a final FC layer for classification), clearly not aiming for a state-of-the-art score (currently about 75%).
It seems strange to me to talk about underfitting and overfitting in the same training process, so I'm pretty sure there's something I'm missing. Could you help me understand these results?
Thanks for your attention.
I've found a similar question but it didn't help (here).
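
For reference, here is a minimal Keras sketch of the architecture described above (64-P-128-128-P-256-256-P-512-512, a hidden FC of 256, and a final FC for classification); the input shape and the number of classes are assumptions, not taken from the post.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(input_shape=(32, 32, 3), num_classes=10):   # shapes are assumptions
    return keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),                                # P
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),                                # P
        layers.Conv2D(256, 3, padding="same", activation="relu"),
        layers.Conv2D(256, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),                                # P
        layers.Conv2D(512, 3, padding="same", activation="relu"),
        layers.Conv2D(512, 3, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),                 # hidden FC of 256 neurons
        layers.Dense(num_classes, activation="softmax"),      # classification layer
    ])
```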

Large number of training steps results in poor performance in transfer learning

I have a question. I have used transfer learning to retrain GoogLeNet on my image classification problem. I have 80,000 images belonging to 14 categories, and I set the number of training steps to 200,000. I believe the code provided by TensorFlow has dropout implemented and trains with random shuffling of the dataset and a cross-validation approach. I do not see any overfitting in the training and classification curves, and I get high cross-validation accuracy and high test accuracy, but when I apply my model to a new dataset I get poor classification results. Does anybody know what is going on? Thanks!
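
For context, the kind of setup described above usually looks like the Keras sketch below, with InceptionV3 standing in for GoogLeNet; the input size, dropout rate, and frozen base are assumptions, not the poster's exact code.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Pretrained feature extractor (stand-in for GoogLeNet), frozen during retraining.
base = keras.applications.InceptionV3(weights="imagenet", include_top=False,
                                      input_shape=(299, 299, 3))
base.trainable = False

model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.5),                        # dropout, as mentioned in the post
    layers.Dense(14, activation="softmax"),     # 14 categories
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

High validation and test accuracy combined with poor results on a genuinely new dataset usually points to a distribution shift between the training images and the new images rather than to the number of training steps.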

Keras: Why is my testing accuracy unstable?

I designed and trained an Inception-ResNet model for image recognition. The network learns well on the training dataset; however, the test accuracy is very unstable.
Here are some parameters and important details of the training process:
The number of training samples: 40,000 images.
The number of test samples: 15,000 images.
Learning rate: 0.001 for the first 50 epochs, 0.0001 for the next 50 epochs, and 0.00001 for the rest (see the sketch after this list)
Batch size: 128
Dropout rate: 0.2
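
A minimal sketch (assuming Keras) of the step learning-rate schedule listed above; the commented `fit` call is illustrative, not the poster's code.

```python
from tensorflow import keras

def step_schedule(epoch, lr):
    # 0.001 for epochs 0-49, 0.0001 for epochs 50-99, 0.00001 afterwards.
    if epoch < 50:
        return 1e-3
    elif epoch < 100:
        return 1e-4
    return 1e-5

lr_callback = keras.callbacks.LearningRateScheduler(step_schedule)
# model.fit(x_train, y_train, epochs=150, batch_size=128, callbacks=[lr_callback])
```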
After 150 epochs, the learning curves (training loss and test accuracy) look like this:
[Figure: training loss and test accuracy]
I tried increasing the batch size; however, it did not solve the problem.
Thank you in advance for any help you might be able to provide.
Regards,
An Nhien

Does ResNet have fully connected layers?

In my understanding, a fully connected layer (FC for short) is used for prediction.
For example, VGGNet uses 2 FC layers, both with dimension 4096. The last softmax layer has the same dimension as the number of classes: 1000.
ResNet, on the other hand, uses global average pooling and takes the pooled result of the last convolutional layer as the input.
But it still has an FC layer! Is this layer really an FC layer, or does it only map the input into a feature vector whose length equals the number of classes? Does this layer produce the prediction?
In short, how many FC layers do ResNet and VGGNet have? Do VGGNet's 1st, 2nd, and 3rd FC layers have different functions?
VGG has three FC layers: two with 4096 neurons and one with 1000 neurons that outputs the class probabilities.
ResNet has only one FC layer, with 1000 neurons, which again outputs the class probabilities. In an NN classifier the standard choice for the output is a softmax; some authors make this explicit in the diagram while others do not.
In essence, the Microsoft team behind ResNet favors more convolutional layers over fully connected ones and therefore omits the extra fully connected layers. Global average pooling also decreases the feature size dramatically, which reduces the number of parameters going from the convolutional part to the fully connected part.
I would argue that the performance difference from this choice is quite slim, but one of ResNet's main accomplishments is its dramatic reduction in parameters, and those two design choices helped achieve that.
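
To make the comparison concrete, here is a small sketch (assuming Keras) of the two classifier heads described above; the 7x7x512 and 7x7x2048 feature-map shapes correspond to the standard VGG16 and ResNet50 configurations for 224x224 inputs.

```python
from tensorflow import keras
from tensorflow.keras import layers

# VGG-style head: flatten, then three FC layers (4096, 4096, 1000).
vgg_head = keras.Sequential([
    keras.Input(shape=(7, 7, 512)),
    layers.Flatten(),                          # 7 * 7 * 512 = 25,088 features
    layers.Dense(4096, activation="relu"),
    layers.Dense(4096, activation="relu"),
    layers.Dense(1000, activation="softmax"),
])

# ResNet-style head: global average pooling, then a single FC layer (1000).
resnet_head = keras.Sequential([
    keras.Input(shape=(7, 7, 2048)),
    layers.GlobalAveragePooling2D(),           # 2,048 features
    layers.Dense(1000, activation="softmax"),
])

print(vgg_head.count_params())     # ~124 million parameters in the head alone
print(resnet_head.count_params())  # ~2 million parameters
```

The parameter counts printed at the end show why swapping the flatten-plus-FC stack for global average pooling removes the bulk of the classifier's parameters.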