I have time series data of shape (10000, 6).
I windowed it into sequences of length 10.
The whole dataset then has shape (9990, 10, 6).
If I use a batch size of 20 and feed it into an LSTM(batch_first=True), is the output (20, 10, n)?
I don't understand why n comes out as hidden_size in PyTorch.
How is the output of an RNN unit determined?
Is it 6 inputs and 1 output per unit?
If that's right, shouldn't the output be (batch_size, hidden_size, 1)?
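For reference, a minimal sketch of the shapes PyTorch actually produces (hidden_size=32 here is an arbitrary illustrative choice): each time step emits the full hidden-state vector, not a single scalar, which is why n equals hidden_size.

import torch
from torch import nn

# hidden_size=32 is an arbitrary choice for illustration
lstm = nn.LSTM(input_size=6, hidden_size=32, batch_first=True)

x = torch.randn(20, 10, 6)  # (batch_size, seq_len, num_features)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([20, 10, 32]) = (batch, seq_len, hidden_size)
print(h_n.shape)     # torch.Size([1, 20, 32])  = (num_layers, batch, hidden_size)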
I want to implement an LSTM on top of a CNN in PyTorch. My data is a time series, i.e. frames of video for heart rate detection, and I am struggling with the input and output dimensions of the LSTM: what the parameters/arguments at the input of the LSTM should be and how to configure them properly, since it is quite confusing once time steps, hidden state, etc. are involved.
My output from the CNN is “2 batches of 256 frames”, which is now the input to the LSTM:
batch size = 2
features = 256
The desired output is also a batch of 2 with 256 frames.
Generally, the input shape of sequential data takes the form (batch_size, seq_len, num_features). Based on your explanation, I assume your input has the form (2, 256), where 2 is the batch size and 256 is the sequence length of scalar (1-dimensional) features. Therefore, you should reshape your input to (2, 256, 1) via inputs.unsqueeze(2).
To declare and use an LSTM model, simply try
from torch import nn
model = nn.LSTM(
    input_size=1,      # 1-dimensional features
    batch_first=True,  # batch is the first (zero-th) dimension
    hidden_size=some_hidden_size,  # maybe 64, 128, etc.
    num_layers=some_num_layers,    # maybe 1 or 2
    proj_size=1,       # output should also be 1-dimensional
)
outputs, (hidden_state, cell_state) = model(inputs)
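A quick shape check, as a sketch (hidden_size=64 and num_layers=1 are placeholder choices; note proj_size must be smaller than hidden_size):

import torch
from torch import nn

model = nn.LSTM(input_size=1, batch_first=True, hidden_size=64, num_layers=1, proj_size=1)
inputs = torch.rand(2, 256).unsqueeze(2)  # (2, 256) -> (2, 256, 1)
outputs, (hidden_state, cell_state) = model(inputs)
print(outputs.shape)  # torch.Size([2, 256, 1]) = (batch, seq_len, proj_size)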
I have a question regarding the role of the batch size. My MLP model has 2 Dense layers with the "softmax" activation function:
# Create my MLP model:
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Activation

model = Sequential()
model.add(Dense(units=64, input_dim=100))
model.add(BatchNormalization())
model.add(Activation("softmax"))
model.add(Dense(units=64))
model.add(BatchNormalization())
model.add(Activation("softmax"))
model.add(Dense(units=1))
[Plot of training results for three batch sizes: green = batch size 2, pink = batch size 8, red = batch size 5]
The dataset has 84000 samples. Each sample consists of 100 input values and 1 output value. Each sample describes a different subprocess, so there is no relationship between the samples. I have evaluated the training process with different batch sizes. Why does the training result look better when the batch size is increased to 8? Is there a relationship in my data samples that I was not aware of?
First of all, you are using batch norm, which, as the name suggests, normalises samples based on statistics computed over the batch; it will therefore work better if the sample size (batch size) is bigger. Apart from this, a higher batch size also means lower variance of your gradient estimator, which is often good.
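To illustrate the first point with a toy sketch (the data here is synthetic, not from the question): the per-batch statistics that batch norm normalises with are much noisier estimates at batch size 2 than at 8, shrinking roughly as 1/sqrt(batch_size).

import torch

torch.manual_seed(0)
data = torch.randn(84000, 100)  # toy data whose true mean is 0

for batch_size in (2, 8, 64):
    # the per-batch means are the statistics batch norm would normalise with
    batch_means = torch.stack([b.mean() for b in data.split(batch_size)])
    print(batch_size, batch_means.std().item())  # spread around the true mean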
When I train a neural network consisting of 2 convolutional and 2 fully connected layers on the MNIST handwritten digits task, I receive the following train loss curve:
The dataset contains 235 batches, and I plotted the loss after each batch for 1500 batches in total, so the model was trained for a little more than 6 epochs. The batches are sampled using the following PyTorch code:
sampler = torch.utils.data.RandomSampler(train_dataset, replacement=False)
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, sampler=sampler)
So, in each epoch every batch is looked at exactly once, and each epoch gets a fresh shuffling of the data. As you can see in the plot, the loss drops rapidly at the start of every epoch.
I have never seen this behavior before and I was wondering why that is the case.
For comparison, when I run the same code, but I choose to sample with replacement, I get the following (normal looking) loss curve:
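For reference, the only change in the with-replacement run is the sampler; a sketch (setting num_samples to len(train_dataset) is my assumption, chosen so an "epoch" still spans the same number of samples):

sampler = torch.utils.data.RandomSampler(train_dataset, replacement=True, num_samples=len(train_dataset))
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, sampler=sampler)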
My thoughts so far:
Since at the start of each epoch the specific samples have already been seen before, the network could have an easier task if it memorized the specific batches. Still, since every epoch uses a different shuffling, the network would have to memorize all batches, which in my experience does not happen within 2 epochs.
I have also tried it on another dataset and with a variation of the model with the same results.
My model structure is the following:
self.conv_part = nn.Sequential(
    nn.Conv2d(1, 32, (5, 5), stride=1, padding=2),   # 1x28x28 -> 32x28x28
    nn.ReLU(),
    nn.MaxPool2d((2, 2)),                            # -> 32x14x14
    nn.Conv2d(32, 64, (5, 5), stride=1, padding=2),  # -> 64x14x14
    nn.ReLU(),
    nn.MaxPool2d((2, 2))                             # -> 64x7x7
)
self.linear_part = nn.Sequential(
    nn.Linear(7*7*64, 1024),
    nn.ReLU(),
    nn.Linear(1024, 10)  # 10 MNIST classes
)
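Presumably the forward pass flattens the conv output before the linear part; a sketch of what that would look like (the forward method below is my assumption, not from the original post):

def forward(self, x):
    x = self.conv_part(x)       # (batch, 64, 7, 7) for 28x28 MNIST inputs
    x = x.flatten(start_dim=1)  # (batch, 7*7*64)
    return self.linear_part(x)  # (batch, 10) logits, one per digit class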
When the model is simplified by reducing the number of channels in each layer significantly, the problem vanishes almost completely. I have marked the epochs to get a clearer view.
When I read some classical papers about CNNs, like the Inception family, ResNet, VGGNet and so on, I notice the terms concatenation, summation and aggregation, which confuse me (summation is easy for me to understand). Could someone tell me what the differences are among them? Maybe in a more specific way, for example using examples to illustrate the differences in dimensionality and representation ability.
Concatenation generally consists of taking 2 or more output tensors from different network layers and concatenating them along the channel dimension.
Aggregation consists of taking 2 or more output tensors from different network layers and applying a chosen multivariate function to them to aggregate the results.
Summation is a special case of aggregation where the function is a sum.
This implies that we lose information by doing aggregation. On the other hand, concatenation makes it possible to retain all the information, at the cost of greater memory usage.
E.g. in PyTorch:
import torch
batch_size = 8
num_channels = 3
h, w = 512, 512
t1 = torch.rand(batch_size, num_channels, h, w) # A tensor with shape [8, 3, 512, 512]
t2 = torch.rand(batch_size, num_channels, h, w) # A tensor with shape [8, 3, 512, 512]
torch.cat((t1, t2), dim=1) # A tensor with shape [8, 6, 512, 512]
t1 + t2 # A tensor with shape [8, 3, 512, 512]
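To make the "chosen multivariate function" concrete, any elementwise reduction can serve as the aggregation; elementwise maximum and mean are common alternatives to summation (same tensors as above):

torch.maximum(t1, t2)  # elementwise max, shape [8, 3, 512, 512]
(t1 + t2) / 2          # elementwise mean, shape [8, 3, 512, 512]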
I'm working on 128 x 128 x 3 cell images and want to segment them into 5 classes, including background. I first made target images of size 128 x 128 with values in {0, 1, 2, 3, 4}. But then I found I had to make my ground-truth target a 5-channel image with all values 0 or 1: if a pixel has a 1 in the nth channel, it should be classified as the nth class.
But when I run my model into a Unet model which I forked from GitHub, I found there's an error while calculating cross-entropy loss.
I initially set up the number of channels in the input to be 3 and the number of classes in the output to be 5. And batch size = 2
Here is my code:
for i, (x, y) in batch_iter:
    input, target = x.to(self.device), y.to(self.device)  # send to device (GPU or CPU)
    self.optimizer.zero_grad()  # zero-grad the parameters
    out = self.model(input)  # one forward pass
    loss = self.criterion(out, target)  # calculate loss
    loss_value = loss.item()
    train_losses.append(loss_value)
    loss.backward()  # one backward pass
    self.optimizer.step()  # update the parameters
    batch_iter.set_description(f'Training: (loss {loss_value:.4f})')  # update progressbar

self.training_loss.append(np.mean(train_losses))
self.learning_rate.append(self.optimizer.param_groups[0]['lr'])
batch_iter.close()
And the error message:
RuntimeError: 1only batches of spatial targets supported (3D tensors) but got targets of size: : [2, 5, 128, 128]
How can I solve this?
It seems you are using either nn.CrossEntropyLoss or nn.functional.cross_entropy, which expects class-index targets of shape [batch, H, W] (here [2, 128, 128]), not one-hot targets of shape [2, 5, 128, 128].
I also faced the same error.
CrossEntropyLoss is usually used for multi-class classification use cases.
If your targets are normalized tensors with values in [0, 1], you could instead use nn.BCELoss, or nn.functional.binary_cross_entropy_with_logits if your model outputs raw logits. This worked in my case, as we use a separate mask for each class: it becomes a binary cross-entropy problem per channel.
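A sketch of both options, using the shapes from the question (out stands in for the model output; the tensors here are random placeholders):

import torch
import torch.nn.functional as F

out = torch.randn(2, 5, 128, 128)  # logits: (batch, num_classes, H, W)

# Option 1: class-index mask of shape (2, 128, 128) with values in {0..4}
target_idx = torch.randint(0, 5, (2, 128, 128))
loss_ce = F.cross_entropy(out, target_idx)

# Option 2: one-hot 5-channel mask of shape (2, 5, 128, 128),
# treating each channel as an independent binary segmentation
target_onehot = F.one_hot(target_idx, num_classes=5).permute(0, 3, 1, 2).float()
loss_bce = F.binary_cross_entropy_with_logits(out, target_onehot)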