I'm working on an audio classification task where the inputs are raw audio samples and the outputs are class labels. For this question, I want to augment only the samples in the training split.
Q1: Is it good practice to augment the same audio sample more than once?
E.g., apply to the same record x first aug1, which yields record_x_aug1_sample, and later aug2, which yields record_x_aug2_sample.
Then the training set will hold both [record_x_aug1_sample, record_x_aug2_sample], and a model will train on this training set.
Q2: Is it good practice to also add the original record x to the training set?
It is perfectly fine to augment the same audio more than once. Moreover, it is good practice for reducing overfitting, because your model sees a slightly different version of the same sample each time.
Yes, it's fine. You can also construct two datasets: 1. the original samples without augmentation; 2. a dataset with augmentations. Comparing model quality on these two datasets gives you a sense of how strong your augmentations are. It can also show the benefit of adding augmentations to your training process.
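A minimal NumPy sketch of the offline option: every training record is kept (Q2) and also stored once per augmentation (Q1), while the test split is left untouched. The two augmentation functions and `expand_train_set` are hypothetical placeholders, not part of any library; in practice a library such as audiomentations would supply the transforms.

```python
import numpy as np

def add_noise(x, rng):
    # Hypothetical augmentation: add low-level Gaussian noise.
    return x + rng.normal(0.0, 0.005, size=x.shape).astype(x.dtype)

def random_gain(x, rng):
    # Hypothetical augmentation: scale the waveform by a random gain.
    return x * rng.uniform(0.7, 1.3)

def expand_train_set(X, y, augmentations, keep_original=True, seed=0):
    """Return a training set holding each record's augmented copies
    (and optionally the original). Apply this to the training split only."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for x, label in zip(X, y):
        if keep_original:
            xs.append(x)
            ys.append(label)
        for aug in augmentations:
            xs.append(aug(x, rng))
            ys.append(label)  # augmentation does not change the class label
    return np.stack(xs), np.array(ys)

# Dummy training split: 4 records of 1 s of audio at 16 kHz.
X_train = np.random.default_rng(1).uniform(-0.2, 0.2, size=(4, 16000)).astype(np.float32)
y_train = np.array([0, 1, 0, 2])
X_aug, y_aug = expand_train_set(X_train, y_train, [add_noise, random_gain])
print(X_aug.shape, y_aug.shape)  # (12, 16000) (12,)
```

With `keep_original=False` the same call yields only the 8 augmented records, which is one way to build the "with augmentations" dataset for the comparison described above.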
You may also consider augmenting your samples on the fly if you use an iterative training process (like a neural network fitted with SGD), so the samples are slightly different every time. Pseudo-code:
for sample in dataset:
    augmented_sample = augment(sample)
    model.train(augmented_sample)
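The pseudo-code above can be made concrete with a small NumPy generator. This is a sketch: `augment` is a hypothetical gain-plus-noise transform, and the training step is left as a comment because it depends on your framework. Each call draws new random parameters, so the same record looks slightly different in every epoch.

```python
import numpy as np

def augment(x, rng):
    # Hypothetical augmentation: random gain plus low-level Gaussian noise.
    gain = rng.uniform(0.8, 1.2)
    noise = rng.normal(0.0, 0.005, size=x.shape)
    return (gain * x + noise).astype(np.float32)

def augmented_batches(X, y, batch_size, rng):
    """Yield shuffled mini-batches, augmenting every sample at draw time,
    so each epoch sees a freshly perturbed version of each record."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        batch = np.stack([augment(X[i], rng) for i in idx])
        yield batch, y[idx]

rng = np.random.default_rng(0)
X = np.random.default_rng(1).uniform(-0.2, 0.2, size=(10, 8000)).astype(np.float32)
y = np.arange(10)
for epoch in range(2):
    for xb, yb in augmented_batches(X, y, batch_size=4, rng=rng):
        pass  # your framework's training step on (xb, yb) goes here
```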
Another approach that may improve performance is to first train on the augmented dataset and then fine-tune the model on the clean original samples for a few epochs.
Some libraries for audio augmentation:
https://github.com/iver56/audiomentations
https://github.com/asteroid-team/torch-audiomentations
Usage:
from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift
import numpy as np
augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(min_fraction=-0.5, max_fraction=0.5, p=0.5),
])
# Generate 2 seconds of dummy audio for the sake of example
samples = np.random.uniform(low=-0.2, high=0.2, size=(32000,)).astype(np.float32)
# Augment/transform/perturb the audio data
augmented_samples = augment(samples=samples, sample_rate=16000)
First, thank you to anyone who considers reading this question, and I apologize if my question is stubborn, and for my poor English.
I'm currently working on a recommendation-system problem, and my approach is matrix factorization with implicit feedback using BPR (arXiv:1205.2618). I discovered that training my model (BPRMF) with a large batch size (4096) resulted in a worse BPR loss than training with a smaller batch size (1024). My training log for a few epochs:
I noticed that the larger batch size gives faster training, since it uses GPU memory more efficiently, but the higher loss is something I'm not willing to trade for it. As far as I know, a large batch size carries more information for each gradient-descent step, so it should help convergence; the usual problems with large batches are memory and resources, not the loss.
I did some research on this and found "Large Batch Training Results in Poor Generalization" (and another source here), but in my case the problem is a poor loss during training.
My best guess is that with a large batch size, taking the mean of the loss shrinks the gradient flowing to each user and item embedding by the 1/batch_size coefficient, which makes it hard to escape local minima during training. Is that the answer in this case? (However, I've seen recent studies showing that local minima are not necessarily bad, so...)
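A small NumPy check of the arithmetic behind this guess, on dummy data: with mean reduction, the BPR gradient reaching an embedding row that appears once in the batch is divided by the batch size B, so going from 1024 to 4096 shrinks it by a factor of 4. The helper names are hypothetical, and the sketch only measures the raw gradient, not what the optimizer does with it.

```python
import numpy as np

def mean_bpr_loss(U, P, N):
    # Mean over the batch of -log sigmoid(u.p - u.n), written as softplus(-x).
    x = np.sum(U * (P - N), axis=1)
    return np.mean(np.logaddexp(0.0, -x))

def grad_wrt_row0(U, P, N):
    # Analytic gradient of mean_bpr_loss w.r.t. U[0], i.e. for a user that
    # occurs only in row 0 of the batch (illustrative assumption).
    x0 = U[0] @ (P[0] - N[0])
    sig = 1.0 / (1.0 + np.exp(-x0))
    return -(1.0 - sig) * (P[0] - N[0]) / len(U)  # the mean contributes 1/B

def random_batch(B, first_rows, d=64, seed=0):
    # Dummy embeddings: row 0 is fixed, the other B-1 rows are random.
    rng = np.random.default_rng(seed)
    return [np.vstack([r, rng.normal(0, 0.1, size=(B - 1, d))]) for r in first_rows]

rng = np.random.default_rng(123)
first = rng.normal(0, 0.1, size=(3, 64))  # u0, p0, n0
norms = {}
for B in (1024, 4096):
    U, P, N = random_batch(B, first)
    norms[B] = np.linalg.norm(grad_wrt_row0(U, P, N))
print(norms[1024] / norms[4096])  # ≈ 4.0
```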
I'd really appreciate anyone helping me understand why a large batch size ends up with these anomalous results.
Side note (might be another stupid question): as you can see in the code below, the L2 loss is not normalized by batch size, so I expected it to at least double or quadruple when I multiply the batch size by 4, but that does not seem to be the case in the log above.
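For the side note, a NumPy sketch of the L2 term in the same form as `l2_loss` in the question's `CF_loss` (summed over the batch, not averaged), on dummy embedding tables. With the same embeddings, quadrupling the batch roughly quadruples the sum: the ratio is 4 times the ratio of the average squared row norms of the sampled embeddings, so if a logged L2 term does not scale this way, those norms (or the way the value is logged) differ between the two runs.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d = 5000, 8000, 64
users_e = rng.normal(0, 0.1, size=(n_users, d))  # dummy embedding tables
items_e = rng.normal(0, 0.1, size=(n_items, d))

def batch_l2(B):
    # Summed (not averaged) squared norms over the batch, as in the question.
    u = users_e[rng.integers(0, n_users, B)]
    i_pos = items_e[rng.integers(0, n_items, B)]
    i_neg = items_e[rng.integers(0, n_items, B)]
    return (u ** 2).sum() + (i_pos ** 2).sum() + (i_neg ** 2).sum()

l2_small, l2_large = batch_l2(1024), batch_l2(4096)
print(l2_large / l2_small)  # close to 4 for the same embedding tables
```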
Here is my code
from typing import Tuple
import torch
from torch.nn.parameter import Parameter
import torch.nn.functional as F
from .PretrainedModel import PretrainedModel

class BPRMFModel(PretrainedModel):
    def __init__(self, n_users: int, n_items: int, u_embed: int, l2: float,
                 dataset: str, u_i_pretrained_dir, use_pretrained=0, **kwargs) -> None:
        super().__init__(n_users=n_users, n_items=n_items, u_embed=u_embed, dataset=dataset,
                         u_i_pretrained_dir=u_i_pretrained_dir, use_pretrained=use_pretrained,
                         **kwargs)
        self.l2 = l2
        self.reset_parameters()
        self.items_e = Parameter(self._items_e)
        self.users_e = Parameter(self._users_e)

    def forward(self, u: torch.Tensor, i: torch.Tensor) -> torch.Tensor:
        u = F.embedding(u, self.users_e)
        i = F.embedding(i, self.items_e)
        return torch.matmul(u, i.T)

    def CF_loss(self, u: torch.Tensor, i_pos: torch.Tensor, i_neg: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        # u, i_pos, i_neg shape is [batch_size,]
        u = F.embedding(u, self.users_e)
        i_pos = F.embedding(i_pos, self.items_e)
        i_neg = F.embedding(i_neg, self.items_e)
        pos_scores = torch.einsum("ij,ij->i", u, i_pos)
        neg_scores = torch.einsum("ij,ij->i", u, i_neg)
        # loss = torch.mean(
        #     F.softplus(-(pos_scores - neg_scores))
        # )
        loss = torch.neg(
            torch.mean(
                F.logsigmoid(pos_scores - neg_scores)
            )
        )
        l2_loss = (
            u.pow(2).sum() +
            i_pos.pow(2).sum() +
            i_neg.pow(2).sum()
        )
        return loss, self.l2 * l2_loss

    def get_users_rating_for_each_items(self, u: torch.Tensor, i: torch.Tensor) -> torch.Tensor:
        return self(u, i)

    def save_pretrained(self):
        self._items_e = self.items_e.data
        self._users_e = self.users_e.data
        return super().save_pretrained()
PretrainedModel is just a base class that helps me save and load the model weights.
Thanks a lot to anyone who bore with me to the end.
I'm studying deep learning (supervised learning) for estimating depth images from monocular images.
The dataset is KITTI: the RGB input images come from the KITTI Raw data, and the ground truth comes from the following link.
I designed a simple encoder-decoder network, but the results are not very good, so I am trying various approaches.
While searching, I found that because the ground truth contains many invalid areas (values that cannot be used), as shown in the image below, the model should be trained only on the valid areas, using a mask.
So I trained with masking, but I am curious why this result keeps coming out.
And this is the training part of my code.
How can I fix this problem?
for epoch in range(num_epoch):
    model.train()  ### train ###
    for batch_idx, samples in enumerate(tqdm(train_loader)):
        x_train = samples['RGB'].to(device)
        y_train = samples['groundtruth'].to(device)
        pred_depth = model.forward(x_train)
        valid_mask = y_train != 0  #### Here is masking
        valid_gt_depth = y_train[valid_mask]
        valid_pred_depth = pred_depth[valid_mask]
        loss = loss_RMSE(valid_pred_depth, valid_gt_depth)
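For reference, a framework-free sketch of what the masked loss above computes, assuming (as the mask in the code does) that a ground-truth value of 0 marks a pixel without depth, and that `loss_RMSE` is the square root of the mean squared error over the selected pixels; its definition isn't shown, so that part is an assumption. Only valid pixels contribute, whatever the network predicts elsewhere.

```python
import numpy as np

def masked_rmse(pred, gt):
    """RMSE over valid ground-truth pixels only (gt == 0 marks missing depth,
    mirroring the mask in the training loop)."""
    valid = gt != 0
    if not np.any(valid):
        raise ValueError("no valid ground-truth pixels in this batch")
    diff = pred[valid] - gt[valid]
    return float(np.sqrt(np.mean(diff ** 2)))

gt = np.array([[0.0, 10.0], [20.0, 0.0]])    # 0 = no ground-truth depth
pred = np.array([[5.0, 12.0], [18.0, 7.0]])
print(masked_rmse(pred, gt))  # 2.0: only the two valid pixels count
```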
As far as I understand, you are trying to estimate depth from an RGB image. This is an ill-posed problem, since the same input image can correspond to many plausible depth maps. You would need to integrate specific techniques to estimate accurate depth from RGB images, instead of simply regressing depth with a plain L1 or L2 loss against the ground-truth depth image.
I would suggest going through some papers on estimating depth from single images, such as Depth Map Prediction from a Single Image using a Multi-Scale Deep Network, where one network first estimates the global structure of the given image and a second network refines the local scene information. Instead of a simple RMSE loss, as you used, they use a scale-invariant error function that measures the relationships between points.
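The scale-invariant error from that paper can be sketched as follows. With d_i = log y_i − log y*_i over the n valid pixels, D = (1/n) Σ d_i² − (λ/n²)(Σ d_i)²; λ = 1 makes it fully invariant to a global scale factor, and the paper reports training with λ = 0.5. This NumPy version applies the same 0-means-invalid mask as the question's code and assumes strictly positive predictions; it is an illustration, not the paper's code.

```python
import numpy as np

def scale_invariant_loss(pred, gt, lam=0.5):
    """Scale-invariant log error: mean(d^2) - lam * mean(d)^2,
    with d = log(pred) - log(gt), over pixels where gt != 0."""
    valid = gt != 0
    d = np.log(pred[valid]) - np.log(gt[valid])  # pred must be > 0 here
    n = d.size
    return float(np.sum(d ** 2) / n - lam * np.sum(d) ** 2 / n ** 2)

gt = np.array([[0.0, 10.0], [20.0, 40.0]])
pred = 3.0 * gt  # right structure, wrong global scale
print(scale_invariant_loss(pred, gt, lam=1.0))  # close to 0: global scale is not penalized
print(scale_invariant_loss(pred, gt, lam=0.5))  # half of (log 3)^2
```

An RMSE on the same prediction would be large, which is the behavior the scale-invariant term is designed to avoid.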
When using a Keras LSTM to predict on time-series data, I get errors when I train the model with a batch size of 50 and then try to predict with the same model using a batch size of 1 (i.e., just predicting the next value).
Why can't I train and fit the model with multiple batches at once and then use that model to predict with anything other than the same batch size? It doesn't seem to make sense, but I could easily be missing something.
Edit: this is the model. batch_size is 50, sl is sequence length, which is set at 20 currently.
from keras.models import Sequential
from keras.layers import LSTM, Dense
model = Sequential()
model.add(LSTM(1, batch_input_shape=(batch_size, 1, sl), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=epochs, batch_size=batch_size, verbose=2)
Here is the line for predicting on the training set (for RMSE):
# make predictions
trainPredict = model.predict(trainX, batch_size=batch_size)
Here is the actual prediction of unseen time steps:
for i in range(test_len):
    print('Prediction %s: ' % str(pred_count))
    next_pred_res = np.reshape(next_pred, (next_pred.shape[1], 1, next_pred.shape[0]))
    # make predictions
    forecastPredict = model.predict(next_pred_res, batch_size=1)
    forecastPredictInv = scaler.inverse_transform(forecastPredict)
    forecasts.append(forecastPredictInv)
    next_pred = next_pred[1:]
    next_pred = np.concatenate([next_pred, forecastPredict])
    pred_count += 1
The issue is with this line:
forecastPredict = model.predict(next_pred_res, batch_size=batch_size)
The error when batch_size here is set to 1 is:
ValueError: Cannot feed value of shape (1, 1, 2) for Tensor 'lstm_1_input:0', which has shape '(10, 1, 2)'
The same error is thrown when batch_size here is set to 50, and likewise for other batch sizes.
The full traceback is:
forecastPredict = model.predict(next_pred_res, batch_size=1)
File "/home/entelechy/tf_keras/lib/python3.5/site-packages/keras/models.py", line 899, in predict
return self.model.predict(x, batch_size=batch_size, verbose=verbose)
File "/home/entelechy/tf_keras/lib/python3.5/site-packages/keras/engine/training.py", line 1573, in predict
batch_size=batch_size, verbose=verbose)
File "/home/entelechy/tf_keras/lib/python3.5/site-packages/keras/engine/training.py", line 1203, in _predict_loop
batch_outs = f(ins_batch)
File "/home/entelechy/tf_keras/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2103, in __call__
feed_dict=feed_dict)
File "/home/entelechy/tf_keras/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/home/entelechy/tf_keras/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 944, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 1, 2) for Tensor 'lstm_1_input:0', which has shape '(10, 1, 2)'
Edit: Once I set stateful=False on the model, I am able to use different batch sizes for fitting/training and prediction. What is the reason for this?
Unfortunately, what you want to do is impossible with Keras... I've also struggled with this problem for a long time, and the only way is to go down the rabbit hole and work with TensorFlow directly to do rolling LSTM prediction.
First, to be clear on terminology: batch_size usually means the number of sequences that are trained together, and num_steps means how many time steps are trained together. When you say batch_size=1 and "just predicting the next value", I think you mean predicting with num_steps=1.
Otherwise, it should be possible to train and predict with batch_size=50, meaning you train on 50 sequences and make 50 predictions at every time step, one for each sequence (i.e., training/prediction with num_steps=1).
However, I think what you mean is that you want to use a stateful LSTM to train with num_steps=50 and predict with num_steps=1. Theoretically this makes sense and should be possible, and it is possible with TensorFlow, just not with Keras.
The problem: Keras requires an explicit batch size for stateful RNN. You must specify batch_input_shape (batch_size, num_steps, features).
The reason: Keras must allocate a fixed-size hidden state vector in the computation graph with shape (batch_size, num_units) in order to persist the values between training batches. On the other hand, when stateful=False, the hidden state vector can be initialized dynamically with zeroes at the beginning of each batch so it does not need to be a fixed size. More details here: http://philipperemy.github.io/keras-stateful-lstm/
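The reason above can be illustrated with a toy NumPy RNN (a tanh Elman cell, not Keras's LSTM implementation; `TinyRNN` is a hypothetical name). In stateful mode the layer keeps a state buffer of shape (batch_size, n_units) between calls, so a batch of any other size cannot be combined with it; in stateless mode the state is recreated as zeros for each batch, so any batch size works.

```python
import numpy as np

class TinyRNN:
    """Toy tanh RNN layer, used only to illustrate why a stateful layer
    needs a fixed batch size. Not how Keras implements LSTM."""

    def __init__(self, n_features, n_units, stateful=False, batch_size=None, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0, 0.1, size=(n_features, n_units))  # input weights
        self.U = rng.normal(0, 0.1, size=(n_units, n_units))     # recurrent weights
        self.stateful = stateful
        if stateful:
            # The persisted state must be allocated up front: (batch_size, n_units).
            self.state = np.zeros((batch_size, n_units))

    def __call__(self, x):
        # x has shape (batch, timesteps, features)
        if self.stateful:
            if x.shape[0] != self.state.shape[0]:
                raise ValueError(f"built for batch size {self.state.shape[0]}, got {x.shape[0]}")
            h = self.state
        else:
            h = np.zeros((x.shape[0], self.U.shape[0]))  # fresh zero state per batch
        for t in range(x.shape[1]):
            h = np.tanh(x[:, t, :] @ self.W + h @ self.U)
        if self.stateful:
            self.state = h  # carried over to the next call
        return h

stateless = TinyRNN(n_features=2, n_units=4)
print(stateless(np.ones((50, 1, 2))).shape, stateless(np.ones((1, 1, 2))).shape)  # (50, 4) (1, 4)

stateful = TinyRNN(n_features=2, n_units=4, stateful=True, batch_size=50)
stateful(np.ones((50, 1, 2)))      # works
try:
    stateful(np.ones((1, 1, 2)))   # batch of 1 against a state sized for 50
except ValueError as err:
    print("ValueError:", err)
```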
Possible workaround: train and predict with num_steps=1. Example: https://github.com/keras-team/keras/blob/master/examples/lstm_stateful.py. This may or may not work for your problem, as the gradient for backpropagation will be computed on only one time step. See: https://github.com/fchollet/keras/issues/3669
My solution: use TensorFlow. In TensorFlow you can train with batch_size=50, num_steps=100, then predict with batch_size=1, num_steps=1. This is possible by creating different model graphs for training and prediction that share the same RNN weight matrices. See this example for next-character prediction: https://github.com/sherjilozair/char-rnn-tensorflow/blob/master/model.py#L11 and the blog post http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Note that one graph can still only work with one specified batch_size, but you can set up multiple model graphs sharing weights in TensorFlow.
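The weight-sharing idea can be sketched without TensorFlow: one weights object and two stateful "graphs", each with its own fixed batch size and state buffer. The classes here are hypothetical and use a plain tanh RNN step rather than an LSTM; the point is only that the batch size lives in the state buffer, not in the weights.

```python
import numpy as np

class SharedWeights:
    """One set of RNN weights, shared by several runners (illustrative)."""
    def __init__(self, n_features, n_units, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0, 0.1, size=(n_features, n_units))
        self.U = rng.normal(0, 0.1, size=(n_units, n_units))

class StatefulRunner:
    """A stateful RNN 'graph' with its own fixed batch size and state buffer,
    reading its weights from a shared SharedWeights object."""
    def __init__(self, weights, batch_size):
        self.weights = weights
        self.state = np.zeros((batch_size, weights.U.shape[0]))

    def step(self, x_t):
        # x_t: (batch_size, n_features), one time step
        self.state = np.tanh(x_t @ self.weights.W + self.state @ self.weights.U)
        return self.state

weights = SharedWeights(n_features=2, n_units=4)
train_runner = StatefulRunner(weights, batch_size=50)   # e.g. used while training
predict_runner = StatefulRunner(weights, batch_size=1)  # used for 1-step rolling prediction

weights.W += 0.01  # stand-in for a training update; both runners see it
h = predict_runner.step(np.ones((1, 2)))
print(h.shape)  # (1, 4)
```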
Sadly, what you wish for is impossible, because you specify the batch_size when you define the model...
However, I found a simple way around this problem: create two models! The first is used for training and the second for predictions, and they share weights:
train_model = Sequential([Input(batch_input_shape=(batch_size, ...)),
                          <continue specifying your model>])
predict_model = Sequential([Input(batch_input_shape=(1, ...)),
                            <continue specifying exact same model>])
train_model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam())
predict_model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam())
Now you can use any batch size you want. After you fit your train_model, just save its weights and load them into the predict_model:
train_model.save_weights('lstm_model.h5')
predict_model.load_weights('lstm_model.h5')
Notice that you only want to save and load the weights, not the whole model (which includes the architecture, optimizer, etc.). This way you get the trained weights, but you can input one batch at a time...
more on keras save/load models:
https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model
Note that you need h5py installed to use save_weights with .h5 files.
Another easy workaround is:
def create_model(batch_size):
    model = Sequential()
    model.add(LSTM(1, batch_input_shape=(batch_size, 1, sl), stateful=True))
    model.add(Dense(1))
    return model

model_train = create_model(batch_size=50)
model_train.compile(loss='mean_squared_error', optimizer='adam')
model_train.fit(trainX, trainY, epochs=epochs, batch_size=batch_size)
model_predict = create_model(batch_size=1)
weights = model_train.get_weights()
model_predict.set_weights(weights)
The best solution to this problem is "Copy Weights". It can be really helpful if you want to train & predict with your LSTM model with different batch sizes.
For example, once you have trained your model with 'n' batch size as shown below:
# configure network
n_batch = len(X)
n_epoch = 1000
n_neurons = 10
# design network
model = Sequential()
model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
Now suppose you want to predict with a batch size smaller than the training one, e.g. n = 1.
What you can do is copy the weights of your fitted model and reinitialize a new LSTM model with the same architecture and the batch size set to 1.
# re-define the batch size
n_batch = 1
# re-define model
new_model = Sequential()
new_model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
new_model.add(Dense(1))
# copy weights
old_weights = model.get_weights()
new_model.set_weights(old_weights)
Now you can easily predict and train LSTMs with different batch sizes.
For more information please read: https://machinelearningmastery.com/use-different-batch-sizes-training-predicting-python-keras/
I found the following helpful (and fully in line with the above). The section "Solution 3: Copy Weights" worked for me:
How to use Different Batch Sizes when Training and Predicting with LSTMs, by Jason Brownlee
n_neurons = 10
# design network
model = Sequential()
model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
# fit network
for i in range(n_epoch):
    model.fit(X, y, epochs=1, batch_size=n_batch, verbose=1, shuffle=False)
    model.reset_states()
# re-define the batch size
n_batch = 1
# re-define model
new_model = Sequential()
new_model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
new_model.add(Dense(1))
# copy weights
old_weights = model.get_weights()
new_model.set_weights(old_weights)
# compile model
new_model.compile(loss='mean_squared_error', optimizer='adam')
I had the same problem and resolved it.
Alternatively, you can save your weights and, when you test your results, reload them into a model with the same architecture but with a batch size of 1, as below:
n_neurons = 10
# design network
model = Sequential()
model.add(LSTM(n_neurons, batch_input_shape=(1, X.shape[1], X.shape[2]), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.load_weights("w.h5")
It works well. I hope it is helpful for you.
If you don't have access to the code that created the model, or if you just don't want your prediction/validation code to depend on your model-creation and training code, there is another way:
You could create a new model from a modified version of the loaded model's config like this:
import tensorflow as tf
loaded_model = tf.keras.models.load_model('model_file.h5')
config = loaded_model.get_config()
old_batch_input_shape = config['layers'][0]['config']['batch_input_shape']
# new_batch_size is the batch size you want to predict with;
# keep every non-batch dimension (timesteps, features, ...) unchanged
config['layers'][0]['config']['batch_input_shape'] = (new_batch_size, *old_batch_input_shape[1:])
new_model = loaded_model.__class__.from_config(config)
new_model.set_weights(loaded_model.get_weights())
This works well for me in a situation where I have several different models with stateful RNN layers working together in a graph network, but trained separately in different networks, which leads to different batch sizes. It allows me to experiment with model structures and training batches without changing anything in my validation script.
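The config edit above is plain dictionary manipulation, so it can be sketched and tested without a saved model. This hypothetical helper works on a deep copy and keeps every non-batch dimension; the toy config only mimics the `config['layers'][0]['config']['batch_input_shape']` path used above, and the exact key names depend on your Keras version (newer versions may use a different key), so check your own `get_config()` output.

```python
import copy

def with_new_batch_size(config, new_batch_size):
    """Return a copy of a Sequential-style model config whose first layer's
    batch_input_shape uses new_batch_size, keeping every other dimension."""
    new_config = copy.deepcopy(config)  # leave the original config untouched
    layer_cfg = new_config['layers'][0]['config']
    old_shape = layer_cfg['batch_input_shape']
    layer_cfg['batch_input_shape'] = (new_batch_size, *old_shape[1:])
    return new_config

# Toy config mimicking the structure accessed above (assumed, not a real export).
config = {'name': 'sequential',
          'layers': [{'class_name': 'LSTM',
                      'config': {'batch_input_shape': (50, 1, 20), 'stateful': True}},
                     {'class_name': 'Dense', 'config': {'units': 1}}]}
print(with_new_batch_size(config, 1)['layers'][0]['config']['batch_input_shape'])  # (1, 1, 20)
```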
I am working with Keras 2.0.0 and I'd like to train a deep model with a huge number of parameters on a GPU.
As my data are big, I have to use the ImageDataGenerator. To be honest, I want to abuse the ImageDataGenerator in the sense that I don't want to perform any augmentation. I just want to put my training images into batches (and rescale them), so I can feed them to model.fit_generator.
I adapted the code from here and made some small changes according to my data (i.e., changing binary classification to categorical, but this doesn't matter for the problem discussed here).
I have 15,000 training images, and the only 'augmentation' I want to perform is rescaling to the range [0, 1] with train_datagen = ImageDataGenerator(rescale=1./255).
After creating my 'train_generator' :
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=True,
    seed=1337,
    save_to_dir=save_data_dir)
I fit the model by using model.fit_generator().
I set amount of epochs to: epochs = 1
And batch_size to: batch_size = 60
What I expect to see in the directory where my augmented (i.e., rescaled) images are stored: 15,000 rescaled images per epoch, i.e., with only one epoch, 15,000 rescaled images. But, mysteriously, there are 15,250 images.
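In Keras 2, the steps_per_epoch argument of fit_generator counts batches (generator yields), not samples. For the numbers above, the batches needed for one full pass over the data are:

```python
import math

def steps_per_epoch(n_samples, batch_size):
    # Batches needed to cover every sample once; the last batch may be partial.
    return math.ceil(n_samples / batch_size)

n_train, batch_size = 15000, 60
steps = steps_per_epoch(n_train, batch_size)
print(steps)               # 250
print(steps * batch_size)  # 15000 images in exactly 250 full batches
```

So 250 steps of 60 images cover exactly 15,000 images; saving more than that in one epoch means the generator produced batches beyond those steps, or that steps_per_epoch was set to a different value.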
Is there a reason for this amount of images?
Do I have the power to control the amount of augmented images?
Similar problems:
Model fit_generator not pulling data samples as expected (also on Stack Overflow: Keras - How are batches and epochs used in fit_generator()?)
A concrete example for using data generator for large datasets such as ImageNet
I appreciate your help.