In some deep learning models which analyse temporal data (e.g. audio, or video), we use a "time-distributed dense" (TDD) layer. What this creates is a fully-connected (dense) layer which is applied separately to every time-step.
In Keras this can be done using the TimeDistributed wrapper, which is actually slightly more general. In PyTorch it's been an open feature request for a couple of years.
How can we implement time-distributed dense manually in PyTorch?
Specifically for time-distributed dense (and not time-distributed anything else), we can hack it by using a convolutional layer.
Look at the diagram you've shown of the TDD layer. We can re-imagine it as a convolutional layer, where the convolutional kernel has a "width" (in time) of exactly 1, and a "height" that matches the full height of the tensor. If we do this, while also making sure that our kernel is not allowed to move beyond the edge of the tensor, it should work:
self.tdd = nn.Conv2d(1, num_of_output_channels, (num_of_input_channels, 1))
You may need to do some rearrangement of tensor axes. The "input channels" for this line of code are in fact coming from the "freq" axis (the "image's y axis") of your tensor, and the "output channels" will indeed be arranged on the "channel" axis. (The "y axis" of the output will be a singleton dimension of height 1.)
As pointed out in the discussion you referred to:
Meanwhile this #1935 will make TimeDistributed/Bottle unnecessary for Linear layers.
For TDD layer, it would be applying the linear layer directly on the inputs with time slices.
In [1]: import torch
In [2]: m = torch.nn.Linear(20, 30)
In [3]: input = torch.randn(128, 5, 20)
In [4]: output = m(input)
In [5]: print(output.size())
torch.Size([128, 5, 30])
The Following is a short illustration of the computational results
In [1]: import torch
In [2]: m = torch.nn.Linear(2, 3, bias=False)
...:
...: for name, param in m.named_parameters():
...: print(name)
...: print(param)
...:
weight
Parameter containing:
tensor([[-0.3713, -0.1113],
[ 0.2938, 0.4709],
[ 0.2791, 0.5355]], requires_grad=True)
In [3]: input = torch.stack([torch.ones(3, 2), 2 * torch.ones(3, 2)], dim=0)
...: print(input)
tensor([[[1., 1.],
[1., 1.],
[1., 1.]],
[[2., 2.],
[2., 2.],
[2., 2.]]])
In [4]: m(input)
Out[4]:
tensor([[[-0.4826, 0.7647, 0.8145],
[-0.4826, 0.7647, 0.8145],
[-0.4826, 0.7647, 0.8145]],
[[-0.9652, 1.5294, 1.6291],
[-0.9652, 1.5294, 1.6291],
[-0.9652, 1.5294, 1.6291]]], grad_fn=<UnsafeViewBackward>)
More details of the operation of nn.Linear can be seen from the torch.matmul. Note, you may need to add another non-linear function like torch.tanh() to get exact same layer as Dense() in Keras, where they support such non-linearity as keyword argument activation='tanh'.
For Timedistributed with e.g., CNN layers, maybe the snippet from the PyTorch forum could be useful.
Related
I watched the following video on YouTube https://www.youtube.com/watch?v=jx9iyQZhSwI where it was shown that it is possible to use Gradio and the learned model of MNIST dataset in Tensorflow. I have read and written that it is possible to use Pytorch in Gradio, but I have problems with its implementation. Does anyone have an idea how to do this?
My Pytorch code of cnn
import torch.nn as nn
class CNN(nn.Module):
def __init__(self):
super(CNN, self).__init__()
self.conv1 = nn.Sequential(
nn.Conv2d(
in_channels=1,
out_channels=16,
kernel_size=5,
stride=1,
padding=2,
),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2),
)
self.conv2 = nn.Sequential(
nn.Conv2d(16, 32, 5, 1, 2),
nn.ReLU(),
nn.MaxPool2d(2),
)
# fully connected layer, output 10 classes
self.out = nn.Linear(32 * 7 * 7, 10)
def forward(self, x):
x = self.conv1(x)
x = self.conv2(x)
# flatten the output of conv2 to (batch_size, 32 * 7 * 7)
x = x.view(x.size(0), -1)
output = self.out(x)
return output, x # return x for visualization
By watching I find that I need to change function that Gradio use
def predict_image(img):
img_3d=img.reshape(-1,28,28)
im_resize=img_3d/255.0
prediction=CNN(im_resize)
pred=np.argmax(prediction)
return pred
Im sorry if I got your question wrong, but from what I understand you are getting an error when trying to predict the digit using your function predict image.
So here are two possible hints. Maybe you have implemented them already, but I don't know because of the very small code snippet.
First of all. Have you set your model into evaluation mode using
CNN.eval()
Do after you finished training your model and want to evaluate inputs without training the model.
Second of all, maybe you need to add a fourth dimension to your input tensor "im_resize". Normally your model expects a dimension for the number of channels, the batch size, the height and the width of your input.
In addition I can not tell if your input is a of the datatype torch.tensor . If not transform your array into a tensor first.
You can add a batch dimension to your input tensor by using
im_resize = im_resize.unsqueeze(0)
I hope that I understand your question correctly and was able to help you.
I am trying to use resnet18 and densenet121 as the pretrained models and have added 1 FC layer at the end of each network two change dimensions to 512 of both network and then have concatenated the outputs and passed to two final FC layers.
this can be seen here:
class classifier(nn.Module):
def __init__(self,num_classes):
super(classifier,self).__init__()
self.resnet=models.resnet18(pretrained=True)
self.rfc1=nn.Linear(512,512)
self.densenet=models.densenet121(pretrained=True)
self.dfc1=nn.Linear(1024,512)
self.final_fc1=nn.Linear(1024,512)
self.final_fc2=nn.Linear(512,num_classes)
self.dropout=nn.Dropout(0.2)
def forward(self,x):
y=x.detach().clone()
x=self.resnet.conv1(x)
x=self.resnet.bn1(x)
x=self.resnet.relu(x)
x=self.resnet.maxpool(x)
x=self.resnet.layer1(x)
x=self.resnet.layer2(x)
x=self.resnet.layer3(x)
x=self.resnet.layer4(x)
x=self.resnet.avgpool(x)
x=x.view(x.size(0),-1)
x=nn.functional.relu(self.rfc1(x))
y=self.densenet.features(y)
y=y.view(y.size(0),-1)
y=nn.functional.relu(self.dfc1(y))
x=torch.cat((x,y),0)
x=nn.functional.relu(self.final_fc1(x))
x=self.dropout(x)
x=self.final_fc2(x)
return x
Error:
size mismatch, m1: [1048576 x 1], m2: [1024 x 512] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:283
but as i could see the densenet outputs 1024 feature to the last layer
Questions:
Is the implementation correct ?
Can this implementation work as a Ensemble
In keras, is it possible to share weights between two layers, but to have other parameters differ? Consider the following (admittedly a bit contrived) example:
conv1 = Conv2D(64, 3, input_shape=input_shape, padding='same')
conv2 = Conv2D(64, 3, input_shape=input_shape, padding='valid')
Notice that the layers are identical except for the padding. Can I get keras to use the same weights for both? (i.e. also train the network accordingly?)
I've looked at the keras doc, and the section on shared layers seems to imply that sharing works only if the layers are completely identical.
To my knowledge, this cannot be done by the common "API level" of Keras usage.
However, if you dig a bit deeper, there are some (ugly) ways to share the weights.
First of all, the weights of the Conv2D layers are created inside the build() function, by calling add_weight():
self.kernel = self.add_weight(shape=kernel_shape,
initializer=self.kernel_initializer,
name='kernel',
regularizer=self.kernel_regularizer,
constraint=self.kernel_constraint)
For your provided usage (i.e., default trainable/constraint/regularizer/initializer), add_weight() does nothing special but appending the weight variables to _trainable_weights:
weight = K.variable(initializer(shape), dtype=dtype, name=name)
...
self._trainable_weights.append(weight)
Finally, since build() is only called inside __call__() if the layer hasn't been built, shared weights between layers can be created by:
Call conv1.build() to initialize the conv1.kernel and conv1.bias variables to be shared.
Call conv2.build() to initialize the layer.
Replace conv2.kernel and conv2.bias by conv1.kernel and conv1.bias.
Remove conv2.kernel and conv2.bias from conv2._trainable_weights.
Append conv1.kernel and conv1.bias to conv2._trainable_weights.
Finish model definition. Here conv2.__call__() will be called; however, since conv2 has already been built, the weights are not going to be re-initialized.
The following code snippet may be helpful:
def create_shared_weights(conv1, conv2, input_shape):
with K.name_scope(conv1.name):
conv1.build(input_shape)
with K.name_scope(conv2.name):
conv2.build(input_shape)
conv2.kernel = conv1.kernel
conv2.bias = conv1.bias
conv2._trainable_weights = []
conv2._trainable_weights.append(conv2.kernel)
conv2._trainable_weights.append(conv2.bias)
# check if weights are successfully shared
input_img = Input(shape=(299, 299, 3))
conv1 = Conv2D(64, 3, padding='same')
conv2 = Conv2D(64, 3, padding='valid')
create_shared_weights(conv1, conv2, input_img._keras_shape)
print(conv2.weights == conv1.weights) # True
# check if weights are equal after model fitting
left = conv1(input_img)
right = conv2(input_img)
left = GlobalAveragePooling2D()(left)
right = GlobalAveragePooling2D()(right)
merged = concatenate([left, right])
output = Dense(1)(merged)
model = Model(input_img, output)
model.compile(loss='binary_crossentropy', optimizer='adam')
X = np.random.rand(5, 299, 299, 3)
Y = np.random.randint(2, size=5)
model.fit(X, Y)
print([np.all(w1 == w2) for w1, w2 in zip(conv1.get_weights(), conv2.get_weights())]) # [True, True]
One drawback of this hacky weight-sharing is that the weights will not remain shared after model saving/loading. This will not affect prediction, but it may be problematic if you want to load the trained model for further fine-tuning.
In PyTorch, we can define architectures in multiple ways. Here, I'd like to create a simple LSTM network using the Sequential module.
In Lua's torch I would usually go with:
model = nn.Sequential()
model:add(nn.SplitTable(1,2))
model:add(nn.Sequencer(nn.LSTM(inputSize, hiddenSize)))
model:add(nn.SelectTable(-1)) -- last step of output sequence
model:add(nn.Linear(hiddenSize, classes_n))
However, in PyTorch, I don't find the equivalent of SelectTable to get the last output.
nn.Sequential(
nn.LSTM(inputSize, hiddenSize, 1, batch_first=True),
# what to put here to retrieve last output of LSTM ?,
nn.Linear(hiddenSize, classe_n))
Define a class to extract the last cell output:
# LSTM() returns tuple of (tensor, (recurrent state))
class extract_tensor(nn.Module):
def forward(self,x):
# Output shape (batch, features, hidden)
tensor, _ = x
# Reshape shape (batch, hidden)
return tensor[:, -1, :]
nn.Sequential(
nn.LSTM(inputSize, hiddenSize, 1, batch_first=True),
extract_tensor(),
nn.Linear(hiddenSize, classe_n)
)
According to the LSTM cell documentation the outputs parameter has a shape of (seq_len, batch, hidden_size * num_directions) so you can easily take the last element of the sequence in this way:
rnn = nn.LSTM(10, 20, 2)
input = Variable(torch.randn(5, 3, 10))
h0 = Variable(torch.randn(2, 3, 20))
c0 = Variable(torch.randn(2, 3, 20))
output, hn = rnn(input, (h0, c0))
print(output[-1]) # last element
Tensor manipulation and Neural networks design in PyTorch is incredibly easier than in Torch so you rarely have to use containers. In fact, as stated in the tutorial PyTorch for former Torch users PyTorch is built around Autograd so you don't need anymore to worry about containers. However, if you want to use your old Lua Torch code you can have a look to the Legacy package.
As far as I'm concerned there's nothing like a SplitTable or a SelectTable in PyTorch. That said, you are allowed to concatenate an arbitrary number of modules or blocks within a single architecture, and you can use this property to retrieve the output of a certain layer. Let's make it more clear with a simple example.
Suppose I want to build a simple two-layer MLP and retrieve the output of each layer. I can build a custom class inheriting from nn.Module:
class MyMLP(nn.Module):
def __init__(self, in_channels, out_channels_1, out_channels_2):
# first of all, calling base class constructor
super().__init__()
# now I can build my modular network
self.block1 = nn.Linear(in_channels, out_channels_1)
self.block2 = nn.Linear(out_channels_1, out_channels_2)
# you MUST implement a forward(input) method whenever inheriting from nn.Module
def forward(x):
# first_out will now be your output of the first block
first_out = self.block1(x)
x = self.block2(first_out)
# by returning both x and first_out, you can now access the first layer's output
return x, first_out
In your main file you can now declare the custom architecture and use it:
from myFile import MyMLP
import numpy as np
in_ch = out_ch_1 = out_ch_2 = 64
# some fake input instance
x = np.random.rand(in_ch)
my_mlp = MyMLP(in_ch, out_ch_1, out_ch_2)
# get your outputs
final_out, first_layer_out = my_mlp(x)
Moreover, you could concatenate two MyMLP in a more complex model definition and retrieve the output of each one in a similar way.
I hope this is enough to clarify, but in case you have more questions, please feel free to ask, since I may have omitted something.
I am working on image OCR with my own dataset, I have 1000 images of variable length and I want to feed in images in form of patches of 46X1. I have generated patches of my images and my label values are in Urdu text, so I have encoded them as utf-8. I want to implement CTC in the output layer. I have tried to implement CTC following the image_ocr example at github. But I get the following error in my CTC implementation.
'numpy.ndarray' object has no attribute 'get_shape'
Could anyone guide me about my mistakes? Kindly suggest the solution for it.
My code is:
X_train, X_test, Y_train, Y_test =train_test_split(imageList, labelList, test_size=0.3)
X_train_patches = np.array([image.extract_patches_2d(X_train[i], (46, 1))for i in range (700)]).reshape(700,1,1) #(Samples, timesteps,dimensions)
X_test_patches = np.array([image.extract_patches_2d(X_test[i], (46, 1))for i in range (300)]).reshape(300,1,1)
Y_train=np.array([i.encode("utf-8") for i in str(Y_train)])
Label_length=1
input_length=1
####################Loss Function########
def ctc_lambda_func(args):
y_pred, labels, input_length, label_length = args
# the 2 is critical here since the first couple outputs of the RNN
# tend to be garbage:
y_pred = y_pred[:, 2:, :]
return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
#Building Model
model =Sequential()
model.add(LSTM(20, input_shape=(None, X_train_patches.shape[2]), return_sequences=True))
model.add(Activation('relu'))
model.add(TimeDistributed(Dense(12)))
model.add(Activation('tanh'))
model.add(LSTM(60, return_sequences=True))
model.add(Activation('relu'))
model.add(TimeDistributed(Dense(40)))
model.add(Activation('tanh'))
model.add(LSTM(100, return_sequences=True))
model.add(Activation('relu'))
loss_out = Lambda(ctc_lambda_func, name='ctc')([X_train_patches, Y_train, input_length, Label_length])
The way CTC is modelled currently in Keras is that you need to implement the loss function as a layer, you did that already (loss_out). Your problem is that the inputs you give that layer are not tensors from Theano/TensorFlow but numpy arrays.
To change that one option is to model these values as inputs to your model. This is exactly what the implementation does that you copied the code from:
labels = Input(name='the_labels', shape=[img_gen.absolute_max_string_len], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')
# Keras doesn't currently support loss funcs with extra parameters
# so CTC loss is implemented in a lambda layer
loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])
To make this work you need to ditch the Sequential model and use the functional model API, exactly as done in the code linked above.