Model doesn't predict correctly in eval() mode on the same dataset - deep-learning

I created a simple model to detect a moving object and output its coordinates. The dataset consisted of thermal videos captured from above, in which the target was almost a dot. My model predicted coordinates to within 1 pixel on 80% of all training frames.
However, when I switched the model to eval() mode, just to try it, and fed it the same inputs as during training, the results were completely different. How is this possible, and what can I do to restore the previous accuracy?
Here's my model:
(conv1): Conv2d(1, 40, kernel_size=(6, 6), stride=(5, 5))
(conv2): Conv2d(40, 40, kernel_size=(5, 5), stride=(2, 2), padding=(1, 1))
(conv3): Conv2d(40, 120, kernel_size=(3, 4), stride=(3, 3))
(relu0): ReLU()
(maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(flatten1): Flatten(start_dim=1, end_dim=-1)
(lstm): LSTM(20, 175, num_layers=2)
(flatten2): Flatten(start_dim=0, end_dim=-1)
(linear3): Linear(in_features=21000, out_features=2, bias=True)
And optimizer:
Adam (
Parameter Group 0
amsgrad: False
betas: (0.9, 0.999)
capturable: False
differentiable: False
eps: 1e-08
foreach: None
fused: False
lr: 0.0005
maximize: False
weight_decay: 0
)

A common source of such behavior is Dropout or BatchNorm, since these layers behave differently during training and evaluation; as far as I can tell, however, that is not the case in your model. Instead of model.eval() you can also try model.train(mode=False) to see whether the behavior is consistent. It can also happen that a single batch reaches 80% accuracy while the average stays below that, so make sure the two data sources you are testing on are identical and that the model is not reinitialized or otherwise altered between training and testing.
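A quick way to confirm whether mode-dependent layers are the culprit is to run the exact same batch through the network in both modes and compare the outputs. This is only a minimal sketch; the names model and batch are assumptions standing for your trained network and one batch of training inputs.

import torch

# Sanity check (sketch): run the same batch in training and evaluation mode.
model.train()
with torch.no_grad():
    out_train = model(batch)

model.eval()  # equivalent to model.train(mode=False)
with torch.no_grad():
    out_eval = model(batch)

# Should print True unless the model contains mode-dependent layers
# (Dropout, BatchNorm, LSTM dropout, ...).
print(torch.allclose(out_train, out_eval))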

Related

Training and validation accuracy producing overfitting

Below is the code of my CNN model. The issue is that the training accuracy is 96% while the validation accuracy is only 69%. Please help me increase the validation accuracy.
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 1), padding='same', name='Conv_1'))
model.add(MaxPooling2D((2, 2), name='MaxPool_1'))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', name='Conv_2'))
model.add(MaxPooling2D((2, 2), name='MaxPool_2'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same', name='Conv_3'))
model.add(Flatten(name='Flatten'))
model.add(Dropout(0.5, name='Dropout'))
model.add(Dense(128, kernel_initializer='normal', activation='relu', name='Dense_1'))
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid', name='Dense_2'))
model.summary()
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
history = model.fit(x_train2, y_train2, epochs=25, batch_size=10, verbose=2, validation_data=(x_test, y_test))
Findings:
Train: accuracy = 0.937500 ; loss = 0.125126
Test: accuracy = 0.662508 ; loss = 1.089228
First, try increasing the number of epochs.
Second, you could add another Dense layer before the final classification layer, or add a Dropout layer after the last Conv2D layer (see the sketch below).
I hope this helps you get better results.
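A hedged sketch of those two suggestions follows; the extra Dropout rate (0.25) and the width of the additional Dense layer (256) are illustrative choices, not values from the original post.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 1), padding='same', name='Conv_1'))
model.add(MaxPooling2D((2, 2), name='MaxPool_1'))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', name='Conv_2'))
model.add(MaxPooling2D((2, 2), name='MaxPool_2'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same', name='Conv_3'))
model.add(Dropout(0.25, name='Dropout_Conv'))  # suggested: Dropout after the last Conv2D layer
model.add(Flatten(name='Flatten'))
model.add(Dropout(0.5, name='Dropout'))
model.add(Dense(256, activation='relu', name='Dense_0'))  # suggested: extra Dense layer before the classifier
model.add(Dense(128, kernel_initializer='normal', activation='relu', name='Dense_1'))
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid', name='Dense_2'))
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])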

How to select an action from a matrix in Q learning when using multiple frames as input

When using deep Q-learning I am trying to capture motion by passing a number of grayscale frames as the input, each with dimensions 90x90. Four 90x90 frames are passed in to allow the network to detect motion. These frames should be treated as a single state rather than a batch of 4 states. How can I get a vector of actions as a result instead of a matrix?
I am using PyTorch, and the network returns a 4x7 matrix: a row of action values for each frame. Here is the network:
self.conv1 = Conv2d(self.channels, 32, 8)
self.conv2 = Conv2d(32, 64, 4)
self.conv3 = Conv2d(64, 128, 3)
self.fc1 = Linear(128 * 52 * 52, 64)
self.fc2 = Linear(64, 32)
self.output = Linear(32, action_space)
Select the action with the highest value.
Let the output tensor be called action_values. Then:
action = torch.argmax(action_values.data)
or
action = np.argmax(action_values.cpu().data.numpy())
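As a hedged sketch of how this fits together (the names net and frames are assumptions: net is the Q-network above with self.channels set to 4, and frames is a list of four 90x90 grayscale tensors), the four frames are stacked as channels of one state with a batch dimension of 1, so argmax yields a single action index rather than one per frame:

import torch

# Stack the four frames along the channel axis so they form ONE state,
# then add a batch dimension of size 1: (4, 90, 90) -> (1, 4, 90, 90).
state = torch.stack(frames).unsqueeze(0)

with torch.no_grad():
    action_values = net(state)               # shape (1, action_space), not (4, action_space)

action = torch.argmax(action_values).item()  # index of the highest-valued action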

Having trouble with input dimensions for Pytorch LSTM with torchtext

Problem
I'm trying to build a text classifier network using LSTM. The error I'm getting is:
RuntimeError: Expected hidden[0] size (4, 600, 256), got (4, 64, 256)
Details
The data is json and looks like this:
{"cat": "music", "desc": "I'm in love with the song's intro!", "sent": "h"}
I'm using torchtext to load the data.
from torchtext import data
from torchtext import datasets
TEXT = data.Field(fix_length = 600)
LABEL = data.Field(fix_length = 10)
BATCH_SIZE = 64
fields = {
'cat': ('c', LABEL),
'desc': ('d', TEXT),
'sent': ('s', LABEL),
}
My LSTM looks like this
EMBEDDING_DIM = 64
HIDDEN_DIM = 256
N_LAYERS = 4
MyLSTM(
(embedding): Embedding(11967, 64)
(lstm): LSTM(64, 256, num_layers=4, batch_first=True, dropout=0.5)
(dropout): Dropout(p=0.3, inplace=False)
(fc): Linear(in_features=256, out_features=8, bias=True)
(sig): Sigmoid()
)
I end up with the following dimensions for the inputs and labels
batch = list(train_iterator)[0]
inputs, labels = batch
print(inputs.shape) # torch.Size([600, 64])
print(labels.shape) # torch.Size([100, 2, 64])
And my initialized hidden tensor looks like:
hidden # [torch.Size([4, 64, 256]), torch.Size([4, 64, 256])]
Question
I'm trying to understand what the dimensions at each step should be.
Should the hidden dimension be initialized to (4, 600, 256) or (4, 64, 256)?
The documentation of nn.LSTM - Inputs explains what the dimensions are:
h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. If the LSTM is bidirectional, num_directions should be 2, else it should be 1.
Therefore, your hidden state should have size (4, 64, 256), so you did that correctly. On the other hand, you are not providing the correct size for the input.
input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence() or torch.nn.utils.rnn.pack_sequence() for details.
The documentation says the input should have size (seq_len, batch, input_size), but you've set batch_first=True in your LSTM, which swaps batch and seq_len, so your input should instead have size (batch_size, seq_len, input_size). That is not the case here: your input has seq_len first (600) and batch second (64). This seq-first layout is torchtext's default, because it is the more common representation and also matches the default behaviour of nn.LSTM.
You need to set batch_first=False in your LSTM.
Alternatively, if you prefer having batch as the first dimension in general, torchtext's data.Field also has a batch_first option.
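A minimal sketch of the two equivalent fixes, keeping the rest of the setup from the question:

import torch.nn as nn
from torchtext import data

# Option 1: keep torchtext's default (seq_len, batch) layout and drop
# batch_first=True from the LSTM (False is the default).
lstm = nn.LSTM(input_size=64, hidden_size=256, num_layers=4, dropout=0.5)

# Option 2: keep batch_first=True in the LSTM and ask torchtext to emit
# batch-first tensors instead.
TEXT = data.Field(fix_length=600, batch_first=True)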

How is Siamese network realized with Pytorch if it is single input during inference?

I am trying to train a CNN model with PyTorch so that the output behaves differently for different types of inputs (i.e. if the input images are human beings, it outputs pattern A, but if the input is some other animal, it outputs pattern B).
After some online searching, it seems a Siamese network is related to this, so I have the following two questions:
(1) Is a Siamese network really a good way to train such a model?
(2) From the implementation point of view, how should I implement the code in PyTorch?
class SiameseNetwork(nn.Module):
    def __init__(self):
        super(SiameseNetwork, self).__init__()
        self.cnn1 = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(1, 4, kernel_size=3),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(4),
            nn.ReflectionPad2d(1),
            nn.Conv2d(4, 8, kernel_size=3),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(8),
            nn.ReflectionPad2d(1),
            nn.Conv2d(8, 8, kernel_size=3),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(8),
        )
        self.fc1 = nn.Sequential(
            nn.Linear(8 * 100 * 100, 500),
            nn.ReLU(inplace=True),
            nn.Linear(500, 500),
            nn.ReLU(inplace=True),
            nn.Linear(500, 5))

    def forward_once(self, x):
        output = self.cnn1(x)
        output = output.view(output.size()[0], -1)
        output = self.fc1(output)
        return output

    def forward(self, input1, input2):
        output1 = self.forward_once(input1)
        output2 = self.forward_once(input2)
        return output1, output2
Currently, I am trying an existing implementation I found online, like the class definition above. It works, but the model always takes two inputs and produces two outputs. I agree that this is convenient for training, but ideally it should take only one input (and produce one, or two, outputs) during inference.
Could someone provide some guidance on how to modify the code so it works with a single input?
You can call forward_once during inference: it takes a single input and returns a single output. Note that explicitly calling forward_once will not invoke any hooks you might have registered on your module's forward/backward calls.
Alternatively, you can make forward_once your module's forward function and have your training function call the model twice (which arguably makes more sense: the Siamese setup is a training method, not part of the network's architecture).
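A hedged sketch of both options; net is assumed to be the SiameseNetwork above and img a preprocessed batch of shape (N, 1, 100, 100), and contrastive_loss is a placeholder name for whatever pairwise loss you train with.

import torch

# Option 1: keep the model as-is and call forward_once directly at inference time.
net.eval()
with torch.no_grad():
    embedding = net.forward_once(img)   # one input -> one embedding

# Option 2: make forward_once the module's forward() and pair the outputs in the
# training loop instead:
#   emb1 = net(img1)
#   emb2 = net(img2)
#   loss = contrastive_loss(emb1, emb2, label)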

Can I share weights between keras layers but have other parameters differ?

In Keras, is it possible to share weights between two layers, but to have other parameters differ? Consider the following (admittedly a bit contrived) example:
conv1 = Conv2D(64, 3, input_shape=input_shape, padding='same')
conv2 = Conv2D(64, 3, input_shape=input_shape, padding='valid')
Notice that the layers are identical except for the padding. Can I get Keras to use the same weights for both (i.e. also train the network accordingly)?
I've looked at the Keras docs, and the section on shared layers seems to imply that sharing works only if the layers are completely identical.
To my knowledge, this cannot be done at the common "API level" of Keras usage.
However, if you dig a bit deeper, there are some (ugly) ways to share the weights.
First of all, the weights of the Conv2D layers are created inside the build() function, by calling add_weight():
self.kernel = self.add_weight(shape=kernel_shape,
                              initializer=self.kernel_initializer,
                              name='kernel',
                              regularizer=self.kernel_regularizer,
                              constraint=self.kernel_constraint)
For your provided usage (i.e., default trainable/constraint/regularizer/initializer), add_weight() does nothing special other than appending the weight variables to _trainable_weights:
weight = K.variable(initializer(shape), dtype=dtype, name=name)
...
self._trainable_weights.append(weight)
Finally, since build() is only called inside __call__() if the layer hasn't been built, shared weights between layers can be created by:
1. Call conv1.build() to initialize the conv1.kernel and conv1.bias variables to be shared.
2. Call conv2.build() to initialize the layer.
3. Replace conv2.kernel and conv2.bias with conv1.kernel and conv1.bias.
4. Remove conv2.kernel and conv2.bias from conv2._trainable_weights.
5. Append conv1.kernel and conv1.bias to conv2._trainable_weights.
6. Finish the model definition. Here conv2.__call__() will be called; however, since conv2 has already been built, the weights are not going to be re-initialized.
The following code snippet may be helpful:
def create_shared_weights(conv1, conv2, input_shape):
    with K.name_scope(conv1.name):
        conv1.build(input_shape)
    with K.name_scope(conv2.name):
        conv2.build(input_shape)
    conv2.kernel = conv1.kernel
    conv2.bias = conv1.bias
    conv2._trainable_weights = []
    conv2._trainable_weights.append(conv2.kernel)
    conv2._trainable_weights.append(conv2.bias)
# check if weights are successfully shared
input_img = Input(shape=(299, 299, 3))
conv1 = Conv2D(64, 3, padding='same')
conv2 = Conv2D(64, 3, padding='valid')
create_shared_weights(conv1, conv2, input_img._keras_shape)
print(conv2.weights == conv1.weights) # True
# check if weights are equal after model fitting
left = conv1(input_img)
right = conv2(input_img)
left = GlobalAveragePooling2D()(left)
right = GlobalAveragePooling2D()(right)
merged = concatenate([left, right])
output = Dense(1)(merged)
model = Model(input_img, output)
model.compile(loss='binary_crossentropy', optimizer='adam')
X = np.random.rand(5, 299, 299, 3)
Y = np.random.randint(2, size=5)
model.fit(X, Y)
print([np.all(w1 == w2) for w1, w2 in zip(conv1.get_weights(), conv2.get_weights())]) # [True, True]
One drawback of this hacky weight-sharing is that the weights will not remain shared after model saving/loading. This will not affect prediction, but it may be problematic if you want to load the trained model for further fine-tuning.