I'm trying to solve a deep learning problem in which I have 8000 data points in the training set, a batch size of 10, and 100 epochs, and the progress counter reads 5359/5359 at every epoch.
What is the relation between
batch size,
epoch,
the total number of iterations,
number of input features and
the number of passes in deep learning model training?
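For reference, a minimal sketch of the standard arithmetic, using the numbers from the question (note that 8000/10 gives 800 iterations per epoch, so a count of 5359 suggests the framework is iterating over something other than the raw 8000 examples):

```python
import math

# With T training examples and batch size B, one epoch is ceil(T / B)
# iterations, and the total number of weight updates is iterations per
# epoch times the number of epochs.
T, B, epochs = 8000, 10, 100
iters_per_epoch = math.ceil(T / B)       # 800 for these values, not 5359
total_iters = iters_per_epoch * epochs   # 80,000 updates over the whole run
print(iters_per_epoch, total_iters)
```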
I have a scheduling problem where the state/observation is an image of 125x100 pixels. The action space is a sequence of three actions the agent can take:
MultiDiscrete [1 to 20, 0 to 5, 0 to 5]. These give a total of 20 * 6 * 6 = 720 possible actions.
I am currently using a DQN algorithm to train the agent, and at every 'step' the value function V(s) is updated for only one of these actions, which makes the updates very sparse. I trained for 100,000 iterations but it didn't converge.
How can I train the agent using DQN in these situations? How much does the training time increase with such a large action space?
Is there an alternative algorithm that works better in these scenarios? In future problems the action space may grow even larger.
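For context, a minimal sketch (hypothetical names, NumPy only) of the usual workaround for DQN, which needs a flat discrete output head: enumerate the MultiDiscrete [20, 6, 6] space as 720 flat indices and map back with unravel_index:

```python
import numpy as np

dims = (20, 6, 6)               # sizes of the three sub-actions
n_actions = int(np.prod(dims))  # 720 flat actions for the Q-network head

def flat_to_multi(a):
    # Map a flat index 0..719 back to the three sub-actions.
    i, j, k = np.unravel_index(a, dims)
    return i + 1, j, k          # first sub-action ranges 1..20 in the question

print(n_actions, flat_to_multi(719))  # 720 (20, 5, 5)
```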
For certain reasons, during training:
I want to store the checkpoint after every epoch and start training the next epoch from the stored checkpoint.
I want the training state to continue across epochs. For example, when training epoch 2 from the checkpoint of epoch 1, the learning rate schedule, epoch number, etc. should be the same as if I had trained epochs 1 and 2 together (the vanilla training process).
My implementation uses the --recover argument. AllenNLP stores a checkpoint after every epoch, so for every epoch after the first I add --recover to the training command, hoping that the model's parameters and training state will be restored.
However, this implementation seems wrong because, in my testing, training epoch 2 from the checkpoint of epoch 1 gives different results from training epochs 1 and 2 together.
I have read the AllenNLP documentation carefully but could not figure out the problem. Does anyone have comments on my implementation, or another way to fulfill these requirements? Thanks a lot!
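For comparison, a minimal PyTorch-style sketch (not AllenNLP's actual internals) of what "continuing the training state" requires: the optimizer and LR scheduler state must be restored alongside the model weights:

```python
import torch

def save_ckpt(path, model, optimizer, scheduler, epoch):
    # Persist everything that defines the training state, not just weights.
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "scheduler": scheduler.state_dict(),
                "epoch": epoch}, path)

def load_ckpt(path, model, optimizer, scheduler):
    state = torch.load(path)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    scheduler.load_state_dict(state["scheduler"])
    return state["epoch"] + 1  # next epoch to run
```

Even with all of this restored, the data-loader shuffling order and other RNG state can differ between a resumed run and an uninterrupted one, which is a common reason the two produce different results.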
In Caffe, I am trying to implement a fully convolutional network for semantic segmentation. I was wondering: is there a specific strategy for setting your 'solver.prototxt' values for the following hyper-parameters:
test_iter
test_interval
iter_size
max_iter
Does it depend on the number of images you have for your training set? If so, how?
In order to set these values in a meaningful manner, you need to have a few more bits of information regarding your data:
1. Training set size: the total number of training examples you have; let's call this quantity T.
2. Training batch size: the number of training examples processed together in a single batch, usually set by the input data layer in the 'train_val.prototxt'. For example, in this file the train batch size is set to 256. Let's denote this quantity by tb.
3. Validation set size: the total number of examples you set aside for validating your model; let's denote this by V.
4. Validation batch size: the value set in batch_size for the TEST phase. In this example it is set to 50. Let's call this vb.
Now, during training, you would like to get an unbiased estimate of the performance of your net every once in a while. To do so, you run your net on the validation set for test_iter iterations. To cover the entire validation set you need test_iter = V/vb.
How often would you like to get this estimate? It's really up to you. If you have a very large validation set and a slow net, validating too often will make the training process too long. On the other hand, not validating often enough may prevent you from noticing if and when your training process fails to converge. test_interval determines how often you validate: usually for large nets you set test_interval on the order of 5K; for smaller and faster nets you may choose lower values. Again, it is all up to you.
In order to cover the entire training set (completing an "epoch") you need to run T/tb iterations. Usually one trains for several epochs, thus max_iter=#epochs*T/tb.
Regarding iter_size: this allows averaging gradients over several training mini-batches; see this thread for more information.
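Putting the formulas together, a small sketch of the arithmetic (the concrete numbers are illustrative only; T, tb, V, vb are as defined above):

```python
import math

T, tb = 50_000, 256      # training set size, training batch size
V, vb = 10_000, 50       # validation set size, validation batch size

test_iter = math.ceil(V / vb)        # iterations to cover the validation set
iters_per_epoch = math.ceil(T / tb)  # iterations per training epoch
epochs = 50
max_iter = epochs * iters_per_epoch  # total solver iterations
test_interval = 5000                 # validate every ~5K iterations (large nets)
print(test_iter, iters_per_epoch, max_iter)
```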
I want to know whether it is possible to estimate the training time of a convolutional neural network given parameters like depth, number of filters, size of the input, etc.
For instance, I am working on a 3D convolutional neural network with the following structure (see the sketch after this list):
a (20x20x20) convolutional layer with stride of 1 and 8 filters
a (20x20x20) max-pooling layer with stride of 20
a fully connected layer mapping to 8 nodes
a fully connected layer mapping to 1 output
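For concreteness, here is a rough PyTorch sketch of that stack. The question does not state the input size or channel count, so a single-channel 40x40x40 volume is assumed purely for illustration:

```python
import torch
import torch.nn as nn

class Net3D(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv3d(1, 8, kernel_size=20, stride=1)  # 8 filters, 20^3 kernel
        self.pool = nn.MaxPool3d(kernel_size=20, stride=20)    # 20^3 pooling, stride 20
        # For an assumed 40^3 input: conv -> 21^3, pool -> 1^3, so 8*1*1*1 = 8 features.
        self.fc1 = nn.Linear(8, 8)   # fully connected layer mapping to 8 nodes
        self.fc2 = nn.Linear(8, 1)   # fully connected layer mapping to 1 output

    def forward(self, x):
        x = torch.relu(self.conv(x))
        x = self.pool(x).flatten(1)
        return self.fc2(torch.relu(self.fc1(x)))

net = Net3D()
print(net(torch.randn(2, 1, 40, 40, 40)).shape)  # torch.Size([2, 1])
```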
I am running 100 epochs and printing the loss (mean squared error) every 10 epochs. It has now been running for 24 hours with no loss printed (I suppose it has not completed 10 epochs yet). By the way, I am not using a GPU.
Is it possible to estimate the training time with a formula or something similar? Is it related to time complexity or to my hardware? I also found the following paper; will it give me some useful information?
https://ai2-s2-pdfs.s3.amazonaws.com/140f/467566e799f32831db6913d84ccdbdcac0b2.pdf
Thanks in advance.
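In practice, rather than a closed-form formula, a common approach is to time a handful of real training steps and extrapolate. A minimal sketch, with a hypothetical `train_step` standing in for one forward/backward/update pass of your own loop:

```python
import time

def estimate_hours(train_step, n_probe, batches_per_epoch, epochs):
    train_step()                    # warm-up: first call pays one-time setup costs
    t0 = time.perf_counter()
    for _ in range(n_probe):
        train_step()
    per_step = (time.perf_counter() - t0) / n_probe
    return per_step * batches_per_epoch * epochs / 3600.0

# Demo with a dummy step; substitute your real training step.
hours = estimate_hours(lambda: time.sleep(0.01), n_probe=5,
                       batches_per_epoch=1000, epochs=100)
print(f"estimated total training time: {hours:.1f} hours")
```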