RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x3072 and 1024x512)

I'm trying to create a PyTorch neural network and keep getting this error:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x3072 and 1024x512)
Here is the code where I create the model:
# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(32*32, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()
print(model)
An answer would be much appreciated

There can be two problems here:
Either the tensor is not being flattened as expected, which is possible for a tensor of shape such as 64x1x3072,
or the problem lies not in your model but in the training loop. Please add that part of the code as well.
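The shapes in the error message already narrow things down. Below is a minimal sketch of the arithmetic, assuming the inputs are 3-channel 32x32 images (for example CIFAR-10; the question does not say which dataset is used) and a batch size of 64. The helper names are illustrative and not part of PyTorch:

```python
from math import prod

def flattened_features(sample_shape):
    """Number of features per sample after nn.Flatten() (all dims except batch)."""
    return prod(sample_shape)

def can_multiply(mat1_shape, mat2_shape):
    """An (n x k) matrix can only be multiplied with a (k x m) matrix."""
    return mat1_shape[1] == mat2_shape[0]

batch_size = 64
rgb_features = flattened_features((3, 32, 32))   # 3072, matches "64x3072" in the error
gray_features = flattened_features((1, 32, 32))  # 1024, what nn.Linear(32*32, 512) expects

print(can_multiply((batch_size, rgb_features), (gray_features, 512)))  # False
print(can_multiply((batch_size, rgb_features), (3 * 32 * 32, 512)))    # True
```

Under that assumption, changing the first layer to nn.Linear(3*32*32, 512), or converting the images to a single channel before they reach the model, would make the two shapes agree.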

Related

CNN-LSTM performance identical to LSTM, is there a mistake in my code? (Pytorch)

I am trying to recreate the models from a study in which CNN-LSTM outperformed LSTM, but my CNN-LSTM produces nearly identical results to the LSTM. So it seems like the addition of the convolutional layers is not doing anything. The study describes the CNN-LSTM model like this:
The model is constructed by a single LSTM layer and two CNN layers. To form the CNN part, two 1D convolutional neural networks are stacked without any pooling layer. The second CNN layer is followed by a Rectified Linear Unit (ReLU) activation function. Each of the flattened output of the CNN’s ReLU layer and the LSTM layer is projected to the same dimension using a fully connected layer. Finally, a dropout layer is placed before the output layer.
Did I make a mistake in the implementation? The results of my CNN-LSTM are almost exactly the same as when I use the LSTM on its own. The LSTM on its own is the exact same code as below, just without the two Conv1d layers and without the ReLU activation function.
class CNN_LSTM(nn.Module):
    def __init__(self, input_size, seq_len, params, output_size):
        super(CNN_LSTM, self).__init__()
        self.n_hidden = params['lstm_hidden']  # neurons in each lstm layer
        self.seq_len = seq_len  # length of the input sequence
        self.n_layers = 1  # nr of recurrent layers in the lstm
        self.n_filters = params['n_filters']  # nr of filters (output channels) of the first conv layer
        self.c1 = nn.Conv1d(in_channels=1, out_channels=params['n_filters'], kernel_size=1, stride=1)
        self.c2 = nn.Conv1d(in_channels=params['n_filters'], out_channels=1, kernel_size=1, stride=1)
        self.lstm = nn.LSTM(
            input_size=input_size,  # nr of input features
            hidden_size=params['lstm_hidden'],
            num_layers=1
        )
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(in_features=seq_len*params['lstm_hidden'], out_features=params['dense_hidden'])
        self.dropout = nn.Dropout(p=.4)
        self.fc2 = nn.Linear(in_features=params['dense_hidden'], out_features=output_size)  # output_size = nr of output features

    def reset_hidden_state(self):
        self.hidden = (
            torch.zeros(self.n_layers, self.seq_len, self.n_hidden).to(device=device),
            torch.zeros(self.n_layers, self.seq_len, self.n_hidden).to(device=device),
        )

    def forward(self, sequences):
        out = self.c1(sequences.view(len(sequences), 1, -1))
        out = self.c2(out.view(len(out), self.n_filters, -1))
        out = F.relu(out)
        out, self.hidden = self.lstm(
            out.view(len(out), self.seq_len, -1),
            self.hidden
        )
        out = self.flatten(out)
        out = self.fc1(out)
        out = self.dropout(out)
        out = self.fc2(out)
        return out
Source for the study I am using.
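One observation that may be relevant, offered as a sketch rather than a diagnosis: with kernel_size=1, a Conv1d layer never looks at neighbouring positions, and because there is no activation between c1 and c2, the two layers together reduce to a single scale-and-shift of every input value. The snippet below checks this numerically; n_filters=8 and the input shape are arbitrary values chosen for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
n_filters = 8  # arbitrary, for illustration only

c1 = nn.Conv1d(in_channels=1, out_channels=n_filters, kernel_size=1, stride=1)
c2 = nn.Conv1d(in_channels=n_filters, out_channels=1, kernel_size=1, stride=1)

x = torch.randn(4, 1, 20)  # (batch, channels, length)

with torch.no_grad():
    stacked = c2(c1(x))

    # Collapse both layers into one scalar weight and one scalar bias.
    w1 = c1.weight.view(-1)  # shape (n_filters,)
    w2 = c2.weight.view(-1)  # shape (n_filters,)
    scale = torch.dot(w2, w1)
    shift = torch.dot(w2, c1.bias) + c2.bias[0]
    collapsed = scale * x + shift

# True if the two conv layers are equivalent to scale * x + shift
print(torch.allclose(stacked, collapsed, atol=1e-5))
```

If the CNN branch is meant to capture local patterns along the sequence, a kernel size larger than 1 would be needed for the convolution to combine adjacent values. Whether that explains the identical results here is an assumption.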

apply LSTM on BERT embedding

I use the code below to classify toxic tweets.
I want to modify it to use an LSTM architecture,
so that the BERT embeddings are fed to LSTM layers.
class BertClassifier(nn.Module):
    def __init__(self, freeze_bert=False):
        super(BertClassifier, self).__init__()
        # Specify hidden size of BERT, hidden size of our classifier, and number of labels
        D_in, H, D_out = 768, 50, 2
        # Instantiate BERT model
        self.bert = BertModel.from_pretrained('aubmindlab/bert-base-arabertv02-twitter')
        # Instantiate a one-layer feed-forward classifier
        self.classifier = nn.Sequential(
            nn.Linear(D_in, H),
            nn.ReLU(),
            #nn.Dropout(0.5),
            nn.Linear(H, D_out)
        )
        if freeze_bert:
            for param in self.bert.parameters():
                param.requires_grad = False
When I use an LSTM, an error appears saying that the forward function must be modified:
    def forward(self, input_ids, attention_mask):
        #return logits (torch.Tensor): an output tensor with shape (batch_size,num_labels)
        outputs = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask)
        last_hidden_state_cls = outputs[0][:, 0, :]
        logits = self.classifier(last_hidden_state_cls)
        return logits
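A minimal sketch of one way to put an LSTM on top of the BERT output, under stated assumptions: the class name LSTMHead and its parameters are hypothetical, and a random tensor stands in for outputs[0] (the last hidden state, of shape (batch_size, seq_len, 768)) so that the snippet runs without downloading the model. nn.LSTM returns a tuple (output, (h_n, c_n)) rather than a single tensor, so it cannot simply be dropped into nn.Sequential in place of nn.Linear.

```python
import torch
from torch import nn

class LSTMHead(nn.Module):
    """Hypothetical classifier head: LSTM over token embeddings, then a linear layer."""

    def __init__(self, d_in=768, hidden=50, d_out=2):
        super().__init__()
        # batch_first=True so the input is (batch_size, seq_len, d_in)
        self.lstm = nn.LSTM(input_size=d_in, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, d_out)

    def forward(self, token_embeddings):
        # output: (batch_size, seq_len, hidden); h_n: (num_layers, batch_size, hidden)
        output, (h_n, c_n) = self.lstm(token_embeddings)
        # Use the final hidden state of the last layer as the sequence summary
        return self.fc(h_n[-1])

# Random stand-in for outputs[0] from BertModel: (batch_size, seq_len, 768)
fake_last_hidden_state = torch.randn(4, 16, 768)
head = LSTMHead()
logits = head(fake_last_hidden_state)
print(logits.shape)  # torch.Size([4, 2])
```

In the BertClassifier above, this would correspond to passing outputs[0] (all token positions) to the head instead of only the [CLS] slice outputs[0][:, 0, :]. Padded positions are not masked in this sketch.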

Gradio - Pytorch MNIST Digit Recognizer

I watched the following video on YouTube, https://www.youtube.com/watch?v=jx9iyQZhSwI, which shows that Gradio can be used with a model trained on the MNIST dataset in TensorFlow. I have read that it is also possible to use PyTorch with Gradio, but I am having trouble implementing it. Does anyone have an idea how to do this?
My PyTorch CNN code:
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(
                in_channels=1,
                out_channels=16,
                kernel_size=5,
                stride=1,
                padding=2,
            ),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(16, 32, 5, 1, 2),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # fully connected layer, output 10 classes
        self.out = nn.Linear(32 * 7 * 7, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        # flatten the output of conv2 to (batch_size, 32 * 7 * 7)
        x = x.view(x.size(0), -1)
        output = self.out(x)
        return output, x  # return x for visualization
From the video, I gather that I need to change the function that Gradio uses:
def predict_image(img):
    img_3d = img.reshape(-1, 28, 28)
    im_resize = img_3d / 255.0
    prediction = CNN(im_resize)
    pred = np.argmax(prediction)
    return pred

I'm sorry if I got your question wrong, but from what I understand you are getting an error when trying to predict the digit with your predict_image function.
Here are two possible hints. Maybe you have implemented them already, but I can't tell from such a small code snippet.
First, have you set your model to evaluation mode? Note that eval() has to be called on a model instance, not on the CNN class itself:
model = CNN()
model.eval()
Do this after you have finished training your model and want to evaluate inputs without training it further.
Second, you may need to add a fourth dimension to your input tensor "im_resize". Your model expects input of shape (batch size, number of channels, height, width).
In addition, I cannot tell whether your input is a torch.Tensor. If it is not, convert your array into a tensor first.
You can add a batch dimension to your input tensor by using
im_resize = im_resize.unsqueeze(0)
I hope that I understood your question correctly and was able to help you.
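These hints can be combined into one prediction function. A minimal sketch, assuming the input arrives as a 28x28 NumPy array with values in 0..255 and that the model returns a tuple (output, x) as in the question. SmallCNN below is a hypothetical, untrained stand-in for the trained CNN, so the predicted digit itself is meaningless here:

```python
import numpy as np
import torch
from torch import nn

class SmallCNN(nn.Module):
    """Hypothetical stand-in with the same input/output contract as the question's CNN."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=4),  # 28x28 -> 7x7
        )
        self.out = nn.Linear(4 * 7 * 7, 10)

    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.size(0), -1)
        return self.out(x), x

model = SmallCNN()  # in practice: the trained model with its weights loaded
model.eval()        # evaluation mode, called on the instance

def predict_image(img):
    # Float tensor scaled to [0, 1] with shape (batch, channels, height, width)
    x = torch.as_tensor(np.asarray(img), dtype=torch.float32).reshape(1, 1, 28, 28) / 255.0
    with torch.no_grad():     # no gradients needed for inference
        logits, _ = model(x)  # the model returns a tuple
    return int(logits.argmax(dim=1).item())

print(predict_image(np.zeros((28, 28), dtype=np.uint8)))
```

With Gradio, predict_image would be passed as the fn argument of gr.Interface. The input component and the array shape it delivers depend on the Gradio version, so that part is left out here.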

Pytorch model running out of memory on both CPU and GPU, can’t figure out what I’m doing wrong

Trying to implement a simple multi-label image classifier using Pytorch Lightning. Here's the model definition:
import torch
from torch import nn
import pytorch_lightning as pl

# creates network class
class Net(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # defines conv layers
        self.conv_layer_b1 = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=32,
                      kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Flatten(),
        )
        # passes dummy x matrix to find the input size of the fc layer
        x = torch.randn(1, 3, 800, 600)
        self._to_linear = None
        self.forward(x)
        # defines fc layer
        self.fc_layer = nn.Sequential(
            nn.Linear(in_features=self._to_linear,
                      out_features=256),
            nn.ReLU(),
            nn.Linear(256, 5),
        )
        # defines accuracy metric
        self.accuracy = pl.metrics.Accuracy()
        self.confusion_matrix = pl.metrics.ConfusionMatrix(num_classes=5)

    def forward(self, x):
        x = self.conv_layer_b1(x)
        if self._to_linear is None:
            # does not run fc layer if input size is not determined yet
            self._to_linear = x.shape[1]
        else:
            x = self.fc_layer(x)
        return x

    def cross_entropy_loss(self, logits, y):
        criterion = nn.CrossEntropyLoss()
        return criterion(logits, y)

    def training_step(self, train_batch, batch_idx):
        x, y = train_batch
        logits = self.forward(x)
        train_loss = self.cross_entropy_loss(logits, y)
        train_acc = self.accuracy(logits, y)
        train_cm = self.confusion_matrix(logits, y)
        self.log('train_loss', train_loss)
        self.log('train_acc', train_acc)
        self.log('train_cm', train_cm)
        return train_loss

    def validation_step(self, val_batch, batch_idx):
        x, y = val_batch
        logits = self.forward(x)
        val_loss = self.cross_entropy_loss(logits, y)
        val_acc = self.accuracy(logits, y)
        return {'val_loss': val_loss, 'val_acc': val_acc}

    def validation_epoch_end(self, outputs):
        avg_val_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        avg_val_acc = torch.stack([x['val_acc'] for x in outputs]).mean()
        self.log("val_loss", avg_val_loss)
        self.log("val_acc", avg_val_acc)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=0.0008)
        return optimizer
The issue is probably not the machine since I'm using a cloud instance with 60 GBs of RAM and 12 GBs of VRAM. Whenever I run this model even for a single epoch, I get an out of memory error. On the CPU it looks like this:
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 1966080000 bytes. Error code 12 (Cannot allocate memory)
and on the GPU it looks like this:
RuntimeError: CUDA out of memory. Tried to allocate 7.32 GiB (GPU 0; 11.17 GiB total capacity; 4.00 KiB already allocated; 2.56 GiB free; 2.00 MiB reserved in total by PyTorch)
Clearing the cache and reducing the batch size did not work. I'm a novice so clearly something here is exploding but I can't tell what. Any help would be appreciated.
Thank you!

Indeed, it's not a machine issue; the model itself is simply unreasonably big. Typically, if you take a look at common CNN models, the fc layers occur near the end, after the inputs already pass through quite a few convolutional blocks (and have their spatial resolutions reduced).
Assuming inputs are of shape (batch, 3, 800, 600), the feature map leaving conv_layer_b1 has shape (batch, 32, 400, 300) after the MaxPool operation. After flattening, the inputs become (batch, 32 * 400 * 300), i.e., (batch, 3840000).
The immediately following fc_layer thus contains nn.Linear(3840000, 256), which is simply absurd. This single linear layer contains ~983 million trainable parameters! For reference, popular image classification CNNs roughly have 3 to 30 million parameters on average, with larger variants reaching 60 to 80 million. Few ever really cross the 100 million mark.
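The figure of roughly 983 million can be reproduced by hand. A short sketch in plain Python (the helper name linear_params is illustrative), which also estimates the memory the weights alone would take as 32-bit floats:

```python
def linear_params(in_features, out_features):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return in_features * out_features + out_features

flattened = 32 * 400 * 300  # channels * height * width after the pooling layer
first_fc = linear_params(flattened, 256)

print(f"{flattened:,}")  # 3,840,000
print(f"{first_fc:,}")   # 983,040,256
# Memory for these parameters alone in 32-bit floats (4 bytes each)
print(f"{first_fc * 4 / 2**30:.2f} GiB")  # 3.66 GiB
```

During training, the gradients need the same amount of memory again, and Adam keeps two running averages per parameter on top of that, so the total footprint is several times the size of the weights.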
You can count your model params with this:
def count_params(model):
    return sum(map(lambda p: p.data.numel(), model.parameters()))
My advice: 800 x 600 is really a massive input size. Reduce it to something like 400 x 300, if possible. Furthermore, add several convolutional blocks similar to conv_layer_b1, before the FC layer. For example:
def get_conv_block(C_in, C_out):
    return nn.Sequential(
        nn.Conv2d(in_channels=C_in, out_channels=C_out,
                  kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2)
    )

class Net(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # defines conv layers
        self.conv_layer_b1 = get_conv_block(3, 16)
        self.conv_layer_b2 = get_conv_block(16, 32)
        self.conv_layer_b3 = get_conv_block(32, 64)
        self.conv_layer_b4 = get_conv_block(64, 128)
        self.conv_layer_b5 = get_conv_block(128, 256)
        # passes dummy x matrix to find the input size of the fc layer
        x = torch.randn(1, 3, 800, 600)
        self._to_linear = None
        self.forward(x)
        # defines fc layer
        self.fc_layer = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=self._to_linear,
                      out_features=256),
            nn.ReLU(),
            nn.Linear(256, 5)
        )
        # defines accuracy metric
        self.accuracy = pl.metrics.Accuracy()
        self.confusion_matrix = pl.metrics.ConfusionMatrix(num_classes=5)

    def forward(self, x):
        x = self.conv_layer_b1(x)
        x = self.conv_layer_b2(x)
        x = self.conv_layer_b3(x)
        x = self.conv_layer_b4(x)
        x = self.conv_layer_b5(x)
        if self._to_linear is None:
            # does not run fc layer if input size is not determined yet
            self._to_linear = nn.Flatten()(x).shape[1]
        else:
            x = self.fc_layer(x)
        return x
Here, because more conv-relu-pool layers are applied, the input is reduced to a feature map of a much smaller shape, (batch, 256, 25, 18), and the overall number of trainable parameters is reduced to about 30 million.
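The final feature-map shape and the parameter total can be traced without building the model. A sketch in plain Python, assuming the five blocks and the 800 x 600 dummy input shown above (the helper names are illustrative):

```python
def conv_params(c_in, c_out, kernel_size=3):
    """Parameters of a Conv2d layer: one kernel per channel pair plus biases."""
    return c_in * c_out * kernel_size * kernel_size + c_out

def linear_params(in_features, out_features):
    """Parameters of a Linear layer: weights plus biases."""
    return in_features * out_features + out_features

height, width = 800, 600
channels = [3, 16, 32, 64, 128, 256]

total = 0
for c_in, c_out in zip(channels, channels[1:]):
    total += conv_params(c_in, c_out)
    # A 3x3 conv with padding=1 keeps the size; 2x2 max pooling halves it (rounding down)
    height, width = height // 2, width // 2

flattened = channels[-1] * height * width
total += linear_params(flattened, 256) + linear_params(256, 5)

print((channels[-1], height, width))  # (256, 25, 18)
print(f"{flattened:,}")               # 115,200
print(f"{total:,}")                   # 29,885,349
```

Almost all of the remaining parameters still sit in the first fully connected layer, so reducing the input resolution or adding a further pooling step shrinks the model more than changing the convolutional blocks would.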

GAN, generate regression output by the real image, not from the random noise

Can this concept be implemented with a GAN?
I want the GAN to generate a regression output (G-Value) of shape (4,) from a real image rather than from random noise, and to discriminate the G-Value from the real regression value (R-Value) of the same shape (4,). The R-Values come from the "y-train" dataset.
That is, if an image contains a pattern such as a circle, it generally has the 4 features position x, y, z, and alpha. I call these the Real-Value (R-Value), and I want the GAN to generate a fake value (G-Value) that fools the discriminator.
I have tried to implement it as below.
class UTModel:
    def __init__(self):
        optimizer__ = Adam(2e-4)
        self.__dropout = .3
        self.optimizerGenerator = Adam(1e-4)
        self.optimizerDiscriminator = Adam(1e-4)
        self.generator, self.discriminator = self.build()

    def build(self):
        # build the generator
        g = Sequential()
        g.add(Conv2D(512, kernel_size=3, strides=2, input_shape=(128, 128, 1), padding='same'))
        g.add(BatchNormalization(momentum=0.8))
        g.add(LeakyReLU(alpha=0.2))
        g.add(Dropout(self.__dropout))
        g.add(Conv2D(256, kernel_size=3, strides=2, padding='same'))
        g.add(BatchNormalization(momentum=0.8))
        g.add(LeakyReLU(alpha=0.2))
        g.add(Dropout(self.__dropout))
        g.add(Conv2D(128, kernel_size=3, strides=2, padding='same'))
        g.add(BatchNormalization(momentum=0.8))
        g.add(LeakyReLU(alpha=0.2))
        g.add(Dropout(self.__dropout))
        g.add(Conv2D(64, kernel_size=3, strides=1, padding='same'))
        g.add(BatchNormalization(momentum=0.8))
        g.add(LeakyReLU(alpha=0.2))
        g.add(Dropout(self.__dropout))
        g.add(Flatten())
        g.add(Dense(4, activation='linear'))
        # build the discriminator
        d = Sequential()
        d.add(Dense(128, input_shape=(4,)))
        d.add(LeakyReLU(alpha=0.2))
        d.add(Dropout(self.__dropout))
        d.add(Dense(64))
        d.add(LeakyReLU(alpha=0.2))
        d.add(Dropout(self.__dropout))
        d.add(Dense(64))
        d.add(LeakyReLU(alpha=0.2))
        d.add(Dropout(self.__dropout))
        d.add(Dense(32))
        d.add(LeakyReLU(alpha=0.2))
        d.add(Dropout(self.__dropout))
        d.add(Dense(1, activation='sigmoid'))
        return g, d

    def computeLosses(self, rValid, fValid):
        bce = BinaryCrossentropy(from_logits=True)
        # Discriminator loss
        rLoss = bce(tf.ones_like(rValid), rValid)
        fLoss = bce(tf.zeros_like(fValid), fValid)
        dLoss = rLoss + fLoss
        # Generator loss
        gLoss = bce(tf.zeros_like(fValid), fValid)
        return dLoss, gLoss

    def train(self, images, rValues):
        with tf.GradientTape() as gTape, tf.GradientTape() as dTape:
            gValues = self.generator(images, training=True)
            rValid = self.discriminator(rValues, training=True)
            fValid = self.discriminator(gValues, training=True)
            dLoss, gLoss = self.computeLosses(rValid, fValid)
        dGradients = dTape.gradient(dLoss, self.discriminator.trainable_variables)
        gGradients = gTape.gradient(gLoss, self.generator.trainable_variables)
        self.optimizerDiscriminator.apply_gradients(zip(dGradients, self.discriminator.trainable_variables))
        self.optimizerGenerator.apply_gradients(zip(gGradients, self.generator.trainable_variables))
        print(dLoss, gLoss)

class UTTrainer:
    def __init__(self):
        self.env = 3DPatterns()
        self.model = UTModel()

    def start(self):
        if not self.env.available:
            return
        batch = 32
        for epoch in range(1):
            # set new episod
            while self.env.setEpisod():
                for i in range(0, self.env.episodelen, batch):
                    self.model.train(self.env.episode[i:i+batch], self.env.y[i:i+batch])
But the G-Values are not generated as valid values: they always converge to 1 or -1. A proper value should look like [-0.192798, 0.212887, -0.034519, -0.015000]. Please help me find the right way.
Thank you.
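One detail in computeLosses may be worth checking, offered as an observation rather than a confirmed diagnosis: gLoss is computed with the same all-zeros target as fLoss, so minimising it rewards the generator when the discriminator labels its output as fake. In the usual GAN formulation, the generator loss uses a target of ones for the generated samples. The plain-Python sketch below illustrates the difference with a hand-written binary cross-entropy on probabilities; the function name and the sample values are illustrative only:

```python
from math import log

def binary_cross_entropy(target, prob, eps=1e-7):
    """BCE for one sample, where prob is the discriminator's probability of 'real'."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * log(prob) + (1.0 - target) * log(1.0 - prob))

for d_out in (0.1, 0.5, 0.9):  # discriminator output for a generated sample
    loss_zeros = binary_cross_entropy(0.0, d_out)  # target used for gLoss in the question
    loss_ones = binary_cross_entropy(1.0, d_out)   # target in the usual generator loss
    print(f"D(fake)={d_out:.1f}  zeros-target={loss_zeros:.3f}  ones-target={loss_ones:.3f}")
```

With the zeros target the loss is lowest when the discriminator is confident the sample is fake; with the ones target it is lowest when the discriminator is fooled. Separately, the discriminator ends in a sigmoid while BinaryCrossentropy is created with from_logits=True, which expects raw scores rather than probabilities.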
