Splitting Dataset over Multiple GPUs - deep-learning

I'm training a large network that takes 512x512 images as input and produces 512x512 images as output. At the moment, I have 2 Tesla A100 GPUs with 40 GB of memory each, and a dataset comprising 10,000 input/output pairs. This adds up to roughly 38 GB of training data, so I run out of memory when I send all of it to the "cuda" device to create my dataset. I am simply using DataParallel to distribute the training.
How can I split my dataset up over the two GPUs to avoid running out of memory?

Here is my solution. Open to others, especially more memory-efficient options!
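import torch
from torch.utils.data import Dataset

# `device` is assumed to be defined elsewhere, e.g. torch.device("cuda")
# Move a single array to the device only when it is requested
to_t = lambda array: torch.tensor(array, device=device)

class CustomDataset(Dataset):
    def __init__(self, image, label):
        # Keep the full arrays in host (CPU) memory
        self.image = image
        self.label = label

    def __len__(self):
        return len(self.label)

    def __getitem__(self, idx):
        image = self.image[idx]
        label = self.label[idx]
        return to_t(image).float(), to_t(label).float()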
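One possible lower-memory variation, sketched under assumptions: keep the arrays in host memory (or in a NumPy memmap on disk) and move only the current batch to the GPU inside the training loop. The class name `HostMemoryDataset`, the toy array shapes, and the batch size are illustrative and not taken from the original post.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class HostMemoryDataset(Dataset):
    """Hypothetical dataset that leaves all samples on the CPU."""

    def __init__(self, images, labels):
        # images/labels can be ndarrays or np.memmap objects backed by disk
        self.images = images
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Convert one sample at a time; nothing is sent to the GPU here
        image = torch.as_tensor(np.asarray(self.images[idx]), dtype=torch.float32)
        label = torch.as_tensor(np.asarray(self.labels[idx]), dtype=torch.float32)
        return image, label


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Small stand-in arrays; the real data would be 10,000 pairs of 512x512 images
images = np.random.rand(8, 1, 16, 16).astype(np.float32)
labels = np.random.rand(8, 1, 16, 16).astype(np.float32)

loader = DataLoader(
    HostMemoryDataset(images, labels),
    batch_size=4,
    shuffle=True,
    pin_memory=torch.cuda.is_available(),  # speeds up host-to-GPU copies
)

batch_shapes = []
for x, y in loader:
    # Only the current batch occupies GPU memory
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
    batch_shapes.append(tuple(x.shape))
```

With DataParallel, a batch moved to the default device is then split along the batch dimension across the GPUs by the wrapper, so the dataset itself never has to live on either GPU.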

Related

How to enlarge dataset using augmentations in Pytorch

If I enlarge my dataset using augmentations, will I get a better result?
For example, I have one class, a dog class, with 4 images. I applied augmentations to the 4 images. Now some of these images are augmented and some are not, but I still have 4 images.
Would it be better to add the augmented images to the original images, giving 8 images in the dataset? I tried this by changing my "Custom Dataset", but with a lot of images (100000) Colab runs out of memory.
Does it matter whether I apply augmentations before creating the dataset or afterwards, in the training loop, like this:
for x, y in train_loader:
    aug_x = aug(x)
    ...
    output = model(aug_x)
    loss = ...
    loss.backward()
    ...
I suppose I need to choose one way to apply augmentations to my images, either before creating the dataset or in the training loop. Am I wrong? Please post your suggestions with code below. Thank you!
Appropriately chosen augmentations usually lead to better results.
You are right that augmenting the dataset in advance and saving the augmented images can exhaust disk space or memory for big datasets.
So it makes sense to apply augmentations dynamically, on the fly.
Simple pytorch example:
import cv2
import numpy as np
from torch.utils.data import DataLoader, Dataset

class MyDataset(Dataset):
    def __init__(self, image_paths, size):
        self._image_paths = image_paths
        self._size = size  # number of samples per epoch

    def __getitem__(self, idx):
        # Cycle through the available paths so idx may exceed their count
        path = self._image_paths[idx % len(self._image_paths)]
        image = cv2.imread(path)
        # Insert here your augmentations
        if np.random.rand() < 0.5:
            image = cv2.flip(image, 0)  # vertical flip
        if np.random.rand() < 0.5:
            image = cv2.flip(image, 1)  # horizontal flip
        return image

    def __len__(self):
        return self._size

image_paths = ["1.png"]
loader = DataLoader(MyDataset(image_paths, 10), batch_size=4)
for batch in loader:
    # The default collate function returns a tensor; convert back for OpenCV
    batch_images = np.hstack([image.numpy() for image in batch])
    cv2.imshow("image", batch_images)
    cv2.waitKey()
One special case where this approach works poorly is when the augmentation itself takes a lot of time, for example when you need to render 3D objects with a complex Blender pipeline. Such augmentations will become the bottleneck during training, so it makes sense to save the augmented data to disk first and then use it to enlarge the dataset during training.
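A minimal sketch of that offline variant, assuming the slow augmentation can be run once ahead of training. The function names `expensive_augment` and `precompute_augmentations`, the noise-based stand-in augmentation, and the `.npy` cache layout are all hypothetical choices for illustration.

```python
import os
import tempfile

import numpy as np


def expensive_augment(image, rng):
    """Stand-in for a slow augmentation (e.g. an offline rendering step)."""
    noise = rng.normal(0.0, 5.0, size=image.shape)
    return np.clip(image + noise, 0, 255).astype(np.uint8)


def precompute_augmentations(images, out_dir, copies_per_image=2, seed=0):
    """Run the slow augmentation once and save every result as a .npy file."""
    rng = np.random.default_rng(seed)
    paths = []
    for i, image in enumerate(images):
        for k in range(copies_per_image):
            path = os.path.join(out_dir, f"img{i}_aug{k}.npy")
            np.save(path, expensive_augment(image, rng))
            paths.append(path)
    return paths


# Toy data: four 8x8 RGB images
images = [np.full((8, 8, 3), 100, dtype=np.uint8) for _ in range(4)]
cache_dir = tempfile.mkdtemp()
cached_paths = precompute_augmentations(images, cache_dir)

# During training, a dataset would simply load the cached files
sample = np.load(cached_paths[0])
```

Cheap augmentations such as flips can still be applied on the fly on top of the cached files, so only the slow step is paid for in advance.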
The choice of augmentations depends heavily on the domain of your data. Weak augmentations may give little or no accuracy gain. Very heavy augmentations can distort the training distribution too much, which decreases quality.
If you are interested in image augmentations you can check out these projects:
https://github.com/aleju/imgaug
https://github.com/albumentations-team/albumentations
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html

PyTorch: Inference on a single very large image using multiple GPUs?

I want to perform inference (i.e. semantic segmentation) on a very large satellite image without splitting it into pieces. I have access to 4 GPUs (each having 15 GBs of memory) and was wondering if it is possible to somehow use all the memory of these GPUs combined (i.e. 60 GB) for inference on the image in PyTorch?
You are looking for model parallelism.
Basically, you assign different parts of your model to different GPUs and then take care of the "bookkeeping" yourself, i.e. moving the intermediate tensors between devices.
This solution is very model-specific and task-specific; therefore, there are no "generic" wrappers for it (as opposed to data parallelism).
For example:
import torch
import torch.nn as nn

class MyModelParallelNetwork(nn.Module):
    def __init__(self, ...):
        super().__init__()
        # define the network
        self.part_one = ...  # some nn.Module
        self.part_two = ...  # additional nn.Module
        self.part_three = ...
        self.part_four = ...
        # important part - "send" the different parts to different GPUs
        # (PyTorch device strings are 'cuda:N', not 'gpu:N')
        self.part_one.to(torch.device('cuda:0'))
        self.part_two.to(torch.device('cuda:1'))
        self.part_three.to(torch.device('cuda:2'))
        self.part_four.to(torch.device('cuda:3'))

    def forward(self, x):
        # forward through model parts and GPUs:
        p1 = self.part_one(x.to(torch.device('cuda:0')))
        p2 = self.part_two(p1.to(torch.device('cuda:1')))
        p3 = self.part_three(p2.to(torch.device('cuda:2')))
        y = self.part_four(p3.to(torch.device('cuda:3')))
        return y  # result is on cuda:3 device
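For readers who want something they can run directly, here is a small self-contained sketch of the same idea with two parts instead of four. The layers, the class name `TwoStageNet`, and the helper `pick_devices` are hypothetical; the helper falls back to the CPU when fewer than two GPUs are visible so the example still executes.

```python
import torch
import torch.nn as nn


def pick_devices(n_parts):
    """Use one GPU per part if enough are visible, otherwise fall back to CPU."""
    if torch.cuda.device_count() >= n_parts:
        return [torch.device(f"cuda:{i}") for i in range(n_parts)]
    return [torch.device("cpu")] * n_parts


class TwoStageNet(nn.Module):
    """Hypothetical two-part network split across two devices."""

    def __init__(self):
        super().__init__()
        self.dev0, self.dev1 = pick_devices(2)
        self.part_one = nn.Conv2d(3, 8, kernel_size=3, padding=1).to(self.dev0)
        self.part_two = nn.Conv2d(8, 2, kernel_size=1).to(self.dev1)

    def forward(self, x):
        h = torch.relu(self.part_one(x.to(self.dev0)))
        # Activations are copied to the second device between stages
        return self.part_two(h.to(self.dev1))


model = TwoStageNet().eval()
with torch.no_grad():  # inference only, so nothing is kept for a backward pass
    out = model(torch.randn(1, 3, 32, 32))
```

Each stage's parameters and activations live on that stage's device, which lowers the peak memory per GPU. The activations of any single stage still have to fit on one device, so this may or may not be enough for a very large image.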

Proper way to extract embedding weights for CBOW model?

I'm currently trying to implement the CBOW model and have managed to get training and testing running, but I am confused about the "proper" way to finally extract the weights from the model to use as word embeddings.
Model
import torch
import torch.nn as nn

class CBOW(nn.Module):
    def __init__(self, config, vocab):
        super().__init__()  # required before registering submodules
        self.config = config  # Basic config file to hold arguments.
        self.vocab = vocab
        self.vocab_size = len(self.vocab.token2idx)
        self.window_size = self.config.window_size
        self.embed = nn.Embedding(num_embeddings=self.vocab_size, embedding_dim=self.config.embed_dim)
        self.linear = nn.Linear(in_features=self.config.embed_dim, out_features=self.vocab_size)

    def forward(self, x):
        x = self.embed(x)
        x = torch.mean(x, dim=0)  # Average out the embedding values.
        x = self.linear(x)
        return x
Main process
After I run my model through a Solver with the training and testing data, I basically told the train and test functions to also return the model that's used. Then I assigned the embedding weights to a separate variable and used those as the word embeddings.
Training and testing was conducted using cross entropy loss, and each training and testing sample is of the form ([context words], target word).
def run(solver, config, vocabulary):
    for epoch in range(config.num_epochs):
        loss_train, model_train = solver.train()
        loss_test, model_test = solver.test()
    embeddings = model_train.embed.weight
I'm not sure if this is the correct way of going about extracting and using the embeddings. Is there usually another way to do this? Thanks in advance.
Yes, model_train.embed.weight will give you a torch tensor that stores the embedding weights. Note, however, that this tensor is a parameter that still tracks gradients. If you don't want or need them, model_train.embed.weight.data will give you the weights only (in current PyTorch versions, model_train.embed.weight.detach() is the recommended way to get the same values without gradient tracking).
A more generic option is to call model_train.embed.parameters(). This gives you a generator over all the parameter tensors of the layer. In general, a layer can have several parameter tensors (for example a weight and a bias), and weight gives you only one of them. Embedding happens to have only one, so here it doesn't matter which option you use.
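As a small illustration of the options above, the sketch below uses a standalone nn.Embedding as a stand-in for model_train.embed. The sizes and the looked-up token index are arbitrary.

```python
import torch
import torch.nn as nn

# Stand-in for model_train.embed: 5 tokens, 3-dimensional vectors
embed = nn.Embedding(num_embeddings=5, embedding_dim=3)

# detach() drops gradient tracking; clone() makes the copy independent
vectors = embed.weight.detach().clone()
vectors_np = vectors.cpu().numpy()

# An Embedding layer has exactly one parameter tensor
n_params = len(list(embed.parameters()))

# Look up the vector for token index 2 (index chosen arbitrarily)
vec = vectors_np[2]
```

In the CBOW setting, the row index would come from the vocabulary mapping (vocab.token2idx in the question), so that each word can be matched to its row of the matrix.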

Need to change GPU option to CPU in a python pytorch based code

The code basically trains the usual MNIST image dataset but it does the training on a GPU. I need to change this option so the code trains the model using my laptop computer. I need to substitute the .cuda() at the second line for the equivalent in CPU.
I know there are many examples online on how to train neural networks using the MNIST database but what is special about this code is that it does the optimization using a PID controller (commonly used in industry) and I need the code as part of my research.
net = Net(input_size, hidden_size, num_classes)
net.cuda()
net.train()
# Loss and Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = PIDOptimizer(net.parameters(), lr=learning_rate, weight_decay=0.0001, momentum=0.9, I=I, D=D)
# Train the Model
for epoch in range(num_epochs):
    train_loss_log = AverageMeter()
    train_acc_log = AverageMeter()
    val_loss_log = AverageMeter()
    val_acc_log = AverageMeter()
    for i, (images, labels) in enumerate(train_loader):
        # Convert torch tensor to Variable
        images = Variable(images.view(-1, 28*28).cuda())
        labels = Variable(labels.cuda())
I need to be able to run the code without the .cuda() calls, which are for training on a GPU, so that it runs on my PC.
Here's the source code in case needed.
https://github.com/tensorboy/PIDOptimizer
Many thanks, community!
It is better to upgrade to a recent PyTorch version (1.0.x).
With it, managing the "device" is easier.
Below is a simple example.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
#Now send existing model to device.
model_ft = model_ft.to(device)
#Now send input to device and so on.
inputs = inputs.to(device)
With this construct, your code automatically uses the appropriate device.
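Applied to a training step like the one in the question, the pattern could look roughly like this. This is only a sketch: a small nn.Sequential and torch.optim.SGD stand in for the Net class and PIDOptimizer from the original code, and the random tensors stand in for one MNIST batch.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in model and optimizer; the original post uses Net and PIDOptimizer
net = nn.Sequential(nn.Linear(28 * 28, 32), nn.ReLU(), nn.Linear(32, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

# Stand-in for one batch from train_loader
images = torch.randn(16, 1, 28, 28)
labels = torch.randint(0, 10, (16,))

net.train()
# .to(device) replaces .cuda(); wrapping tensors in Variable is no longer needed
inputs = images.view(-1, 28 * 28).to(device)
targets = labels.to(device)

optimizer.zero_grad()
loss = criterion(net(inputs), targets)
loss.backward()
optimizer.step()
```

On a laptop without a CUDA GPU, `device` resolves to the CPU and the same lines run unchanged.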
Hope this helps!

Why does Keras LSTM batch size used for prediction have to be the same as fitting batch size?

When using a Keras LSTM to predict on time series data I've been getting errors when I'm trying to train the model using a batch size of 50, while then trying to predict on the same model using a batch size of 1 (ie just predicting the next value).
Why am I not able to train and fit the model with multiple batches at once, and then use that model to predict with anything other than the same batch size? It doesn't seem to make sense, but I could easily be missing something.
Edit: this is the model. batch_size is 50, sl is sequence length, which is set at 20 currently.
model = Sequential()
model.add(LSTM(1, batch_input_shape=(batch_size, 1, sl), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=epochs, batch_size=batch_size, verbose=2)
Here is the line for predicting on the training set for RMSE:
# make predictions
trainPredict = model.predict(trainX, batch_size=batch_size)
Here is the actual prediction of unseen time steps:
for i in range(test_len):
    print('Prediction %s: ' % str(pred_count))
    next_pred_res = np.reshape(next_pred, (next_pred.shape[1], 1, next_pred.shape[0]))
    # make predictions
    forecastPredict = model.predict(next_pred_res, batch_size=1)
    forecastPredictInv = scaler.inverse_transform(forecastPredict)
    forecasts.append(forecastPredictInv)
    next_pred = next_pred[1:]
    next_pred = np.concatenate([next_pred, forecastPredict])
    pred_count += 1
The issue is with the line:
forecastPredict = model.predict(next_pred_res, batch_size=batch_size)
The error when batch_size here is set to 1 is:
ValueError: Cannot feed value of shape (1, 1, 2) for Tensor 'lstm_1_input:0', which has shape '(10, 1, 2)'
The same error is thrown when batch_size here is set to 50, like the other batch sizes.
The full traceback is:
    forecastPredict = model.predict(next_pred_res, batch_size=1)
  File "/home/entelechy/tf_keras/lib/python3.5/site-packages/keras/models.py", line 899, in predict
    return self.model.predict(x, batch_size=batch_size, verbose=verbose)
  File "/home/entelechy/tf_keras/lib/python3.5/site-packages/keras/engine/training.py", line 1573, in predict
    batch_size=batch_size, verbose=verbose)
  File "/home/entelechy/tf_keras/lib/python3.5/site-packages/keras/engine/training.py", line 1203, in _predict_loop
    batch_outs = f(ins_batch)
  File "/home/entelechy/tf_keras/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2103, in __call__
    feed_dict=feed_dict)
  File "/home/entelechy/tf_keras/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/home/entelechy/tf_keras/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 944, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 1, 2) for Tensor 'lstm_1_input:0', which has shape '(10, 1, 2)'
Edit: Once I set the model to stateful=False then I am able to use different batch sizes for fitting/training and prediction. What is the reason for this?
Unfortunately, what you want to do is impossible with Keras... I have also struggled with this problem for a long time, and the only way is to go down the rabbit hole and work with TensorFlow directly to do rolling LSTM prediction.
First, to be clear on terminology, batch_size usually means the number of sequences that are trained together, and num_steps means how many time steps are trained together. When you say batch_size=1 and "just predicting the next value", I think you mean predicting with num_steps=1.
Otherwise, it should be possible to train and predict with batch_size=50, meaning you are training on 50 sequences and making 50 predictions every time step, one for each sequence (meaning training/prediction num_steps=1).
However, I think what you mean is that you want to use a stateful LSTM to train with num_steps=50 and do prediction with num_steps=1. Theoretically this makes sense and should be possible, and it is possible with TensorFlow, just not Keras.
The problem: Keras requires an explicit batch size for stateful RNN. You must specify batch_input_shape (batch_size, num_steps, features).
The reason: Keras must allocate a fixed-size hidden state vector in the computation graph with shape (batch_size, num_units) in order to persist the values between training batches. On the other hand, when stateful=False, the hidden state vector can be initialized dynamically with zeroes at the beginning of each batch so it does not need to be a fixed size. More details here: http://philipperemy.github.io/keras-stateful-lstm/
Possible workaround: Train and predict with num_steps=1. Example: https://github.com/keras-team/keras/blob/master/examples/lstm_stateful.py. This might or might not work for your problem, as the gradient for backpropagation will be computed on only one time step. See: https://github.com/fchollet/keras/issues/3669
My solution: use TensorFlow. In TensorFlow you can train with batch_size=50, num_steps=100, then do predictions with batch_size=1, num_steps=1. This is possible by creating different model graphs for training and prediction that share the same RNN weight matrices. See this example for next-character prediction: https://github.com/sherjilozair/char-rnn-tensorflow/blob/master/model.py#L11 and blog post http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Note that one graph can still only work with one specified batch_size, but you can set up multiple model graphs sharing weights in TensorFlow.
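To make the shape argument concrete, here is a minimal NumPy sketch (not Keras or TensorFlow code) of a plain tanh RNN step, used as a simplified stand-in for an LSTM. All names and sizes are illustrative. It shows that the weight matrices depend only on the feature and unit counts, while the persistent hidden state has shape (batch_size, num_units).

```python
import numpy as np

num_units, features = 3, 2
rng = np.random.default_rng(0)

# Weights depend only on the feature and unit counts, not on the batch size
W_x = rng.normal(size=(features, num_units))
W_h = rng.normal(size=(num_units, num_units))
b = np.zeros(num_units)


def rnn_step(x, h):
    """One step of a plain tanh RNN cell (a simplification of an LSTM)."""
    return np.tanh(x @ W_x + h @ W_h + b)


# Training-like use: 50 sequences at once, so the stored state is (50, 3)
h_train = np.zeros((50, num_units))
h_train = rnn_step(rng.normal(size=(50, features)), h_train)

# Prediction-like use: one sequence, state is (1, 3), same weights reused
h_pred = np.zeros((1, num_units))
h_pred = rnn_step(rng.normal(size=(1, features)), h_pred)
```

Because the weight shapes never mention the batch size, the same weights can in principle serve graphs or models built for different batch sizes; only the stored state has to be allocated per batch size.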
Sadly what you wish for is impossible because you specify the batch_size when you define the model...
However, I found a simple way around this problem: create 2 models! The first is used for training and the second for predictions, and have them share weights:
train_model = Sequential([Input(batch_input_shape=(batch_size,...),
<continue specifying your model>])
predict_model = Sequential([Input(batch_input_shape=(1,...),
<continue specifying exact same model>])
train_model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam())
predict_model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam())
Now you can use any batch size you want. After you fit your train_model, just save its weights and load them with the predict_model:
train_model.save_weights('lstm_model.h5')
predict_model.load_weights('lstm_model.h5')
Notice that you only want to save and load the weights, and not the whole model (which includes the architecture, optimizer, etc.). This way you get the trained weights but can predict with a batch size of 1.
More on saving and loading Keras models:
https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model
Notice that you need to install h5py to use save_weights.
Another easy workaround is:
def create_model(batch_size):
    model = Sequential()
    model.add(LSTM(1, batch_input_shape=(batch_size, 1, sl), stateful=True))
    model.add(Dense(1))
    return model

model_train = create_model(batch_size=50)
model_train.compile(loss='mean_squared_error', optimizer='adam')
model_train.fit(trainX, trainY, epochs=epochs, batch_size=batch_size)

model_predict = create_model(batch_size=1)
weights = model_train.get_weights()
model_predict.set_weights(weights)
The best solution to this problem is "Copy Weights". It can be really helpful if you want to train & predict with your LSTM model with different batch sizes.
For example, once you have trained your model with 'n' batch size as shown below:
# configure network
n_batch = len(X)
n_epoch = 1000
n_neurons = 10
# design network
model = Sequential()
model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
And now you want to predict fewer values than your batch size, e.g. with n=1.
What you can do is copy the weights of your fitted model into a new LSTM model with the same architecture and a batch size of 1.
# re-define the batch size
n_batch = 1
# re-define model
new_model = Sequential()
new_model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
new_model.add(Dense(1))
# copy weights
old_weights = model.get_weights()
new_model.set_weights(old_weights)
Now you can easily predict and train LSTMs with different batch sizes.
For more information please read: https://machinelearningmastery.com/use-different-batch-sizes-training-predicting-python-keras/
I found the article below helpful (and fully in line with the above). The section "Solution 3: Copy Weights" worked for me:
How to use Different Batch Sizes when Training and Predicting with LSTMs, by Jason Brownlee
n_neurons = 10
# design network
model = Sequential()
model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
# fit network
for i in range(n_epoch):
    model.fit(X, y, epochs=1, batch_size=n_batch, verbose=1, shuffle=False)
    model.reset_states()
# re-define the batch size
n_batch = 1
# re-define model
new_model = Sequential()
new_model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
new_model.add(Dense(1))
# copy weights
old_weights = model.get_weights()
new_model.set_weights(old_weights)
# compile model
new_model.compile(loss='mean_squared_error', optimizer='adam')
I also had the same problem and resolved it.
Another way is to save your weights; when you test your result, you can rebuild your model with the same architecture, set the batch size to 1, and reload the weights, as below:
n_neurons = 10
# design network with a batch size of 1 for prediction
model = Sequential()
model.add(LSTM(n_neurons, batch_input_shape=(1, X.shape[1], X.shape[2]), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
# load the weights saved after training
model.load_weights("w.h5")
It will work well. I hope it will be helpful for you.
If you don't have access to the code that created the model or if you just don't want your prediction/validation code to depend on your model creation and training code there is another way:
You could create a new model from a modified version of the loaded model's config like this:
loaded_model = tf.keras.models.load_model('model_file.h5')
config = loaded_model.get_config()
old_batch_input_shape = config['layers'][0]['config']['batch_input_shape']
# keep every non-batch dimension of the original input shape
config['layers'][0]['config']['batch_input_shape'] = (new_batch_size, *old_batch_input_shape[1:])
new_model = loaded_model.__class__.from_config(config)
new_model.set_weights(loaded_model.get_weights())
This works well for me in a situation where I have several different models with stateful RNN layers working together in a graph network but trained separately with different networks, which leads to different batch sizes. It allows me to experiment with the model structures and training batches without needing to change anything in my validation script.