Multiprocessing in StableBaselines3 - how is batch size distributed? - reinforcement-learning

I am following this multiprocessing notebook. I want to understand how the batch_size parameter of the model is distributed across the multiple environments.
I have a model trained with 1 worker on 1 environment with a batch_size = 64, I understand that the network is updated in batches of 64 samples/timesteps.
Now what if I have that same model but trained with 4 workers on 4 environments, with parameter batch_size set to 64? Is the model now actually being updated with 64*4 samples/timesteps? Or is the 64 batch size being split 4 ways, so model updated with 64 samples, but 16 from each environment?
Thank you!

Related

YOLO - change parameters for multi gpu

The goal is to train YOLO with multi-GPU. According to Darknet AlexeyAB, we should train YOLO with single GPU for 1000 iterations first, and then continue it with multi-GPU from saved weight (1000_iter.weigts). So, we don't need to change any parameters in .cfg file?
Here is my .cfg when I trained my model with single GPU:
[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=16
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1
AlexyAB says: modify .cfg "if you get Nan". In my case, I'm not getting Nan, but my loss is fluctuating. Shouldn't we change anything when we continue training with multi-GPU? batch? subdivisions? learning_rate? burn_in? We just need to continue training with same configurations?
You will need to change burn_in, max_batches and steps between the two cases, for example, if your final target is 500200, your first .cfg file should have this:
burn_in=100
max_batches = 50000
policy=steps
steps=40000,45000
and the second file like this:
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
You need only to change learning_rate if you get a Nan according to this, then you should divide learning_rate by the number of GPUs and multiply burn_in by the same number.

How BatchSize in Keras works ? LSTM-WithState-Time Series

I am working on Time series problem using LSTM (Stateful) on Keras.
I have 40,000 samples and using batch size of 64 and look back is 7 days. So my tensor shape is (64, 7, 6) 6 is number of features.
My question is when I say batch size = 64; How are samples selected in Keras LSTM. Is it first 64 samples followed by next 64 samples or does it divide samples to 625 windows (40000/64) and send corresponding 64 samples from each window ?
Is this important as I am working on time series problem with state LSTM as forecasting depends on previous days.

How to determine amount of augmented images in Keras?

I am working with Keras 2.0.0 and I'd like to train a deep model with a huge amount of parameters on a GPU.
As my data are big, I have to use the ImageDataGenerator. To be honest, I want to abuse the ImageDataGenerator in that sense, that I don't want to perform any augmentations. I just want to put my training images into batches (and rescale them), so I can feed them to model.fit_generator.
I adapted the code from here and did some small changes according to my data (i.e. changing binary classification to categorical. But this doesn't matter for this problem which should be discussed here).
I have 15000 train images and the only 'augmentation' I want to perform, is rescaling to scope [0,1] by train_datagen = ImageDataGenerator(rescale=1./255).
After creating my 'train_generator' :
train_generator = train_datagen.flow_from_directory(
train_data_dir,
target_size=(img_width, img_height),
batch_size=batch_size,
class_mode='categorical',
shuffle = True,
seed = 1337,
save_to_dir = save_data_dir)
I fit the model by using model.fit_generator().
I set amount of epochs to: epochs = 1
And batch_size to: batch_size = 60
What I expect to see in the directory where my augmented (i.e. resized) images are stored: 15.000 rescaled images per epoch, i.e. with only one epoch: 15.000 rescaled images. But, mysteriously, there are 15.250 images.
Is there a reason for this amount of images?
Do I have the power to control the amount of augmented images?
Similar problems:
Model fit_generator not pulling data samples as expected (respectively at stackoverflow: Keras - How are batches and epochs used in fit_generator()?)
A concrete example for using data generator for large datasets such as ImageNet
I appreciate your help.

How does testing work in caffe framework?

So basically one splits the database in training/testing. Let's say 2/3 training and the rest is set for testing.
Then in caffe we split our training data in batches of different sizes, let's say that we have 100 batches of 50 images each, so we have 5000 training images. Now let's say that we have 50 testing batches of 50 images each.
Now let' say that caffe did 1 epoch and then test with the testing batches. How does caffe do this?
It takes first training batch and with it, it tries to predict the labels of every testing batch?
Like:
training_batch_1 : testing_batch_1 = accuracy xxxx;
training_batch_1 : testing_batch_2 = accuracy xxxx;
....
training_batch_1 : testing_batch_50 = accuracy xxxx;
And then it extract the mean accuracy for training_batch_1. Then does the same thing with training_batch_2 and so on?
A test simply runs the input vector through a single forward pass of the trained model. Does the top predicted label match the given test value? If so, score 1 point. At the end of the batch, divide total points by batch size, and that's the batch accuracy.
At the end of the testing run, take the mean of the batch accuracies; that's the testing accuracy.
Is that what you needed to know?

Training time on GeForce GTX Titan X with CUDA 7.5

I'm running the Caffe library on GeForce GTX Titan X with CUDA 7.5 (Ubuntu 14). I'm not sure whether Caffe is properly configured for my setup. My dataset consists of images with 256 x 256 pixels (3 channels), 100000 training / 10000 test samples. For the very first test I'm using AlexNet with new_height=256, new_width=256, crop_size=227. Running 1000 training iterations on one Titan X with batch_size=256 takes about 17 minutes... Is it not too slow for this hardware?
Any help and advices are kindly appreciated!
Running 1000 iterations on a batch of 256 images:
(256 height* 256 width* 256 batch size * 1000 iteration * 3 channels) bytes / ((1024*1024)MB * (17*60)seconds) = 47MBps compute speed.
The following may improve the performance:
If the original images are of bigger resolution, try to preprocess them to 256x256 thus reducing a lot of pixel reads from the harddisk.
Compile Caffe using Cudnn flag. This may lead to a 30% improvement in speed
Try creating an LMDB dataset of the input set and use the LMDB data for training.
Try using an SSD instead of a SATA harddisk.
No, it is not. Check out this link for Caffe performance and hardware configuration.