I'm trying to convert the ResNet50 encoder to a ResNet18 encoder for the U-Net model from this repository: https://github.com/kevinlu1211/pytorch-unet-resnet-50-encoder/blob/master/u_net_resnet_50_encoder.py. It's all new to me, and I don't understand why changing the network like this does not work.
self.bridge = Bridge(512, 512)
up_blocks.append(UpBlockForUNetWithResNet50(512, 256))
up_blocks.append(UpBlockForUNetWithResNet50(256, 128))
up_blocks.append(UpBlockForUNetWithResNet50(128,64))
The error received is:
Given groups=1, weight of size [128, 192, 3, 3], expected input[2, 256, 64, 64] to have 192 channels, but got 256 channels instead.
Is Next-Frame Video Prediction with Convolutional LSTMs a regression problem or a classification?
Why regression/classification?
Why do we use the last Conv3D layer?
I would consider it closer to a regression problem than a classification problem, since its inputs are all the previous frames, from which it learns the trend or function to fit. In this case it learns the direction in which the MNIST digit might be moving and then predicts the next most likely location.
Since it is NOT trying to classify a set of available digit positions as next_location or NOT_next_location, it doesn't seem like a classification problem.
The last layer is defined as:
x = layers.Conv3D(filters=1, kernel_size=(3, 3, 3), activation="sigmoid", padding="same")(x)
It essentially takes in all past 2D frames (individual MNIST images), so it's (height,width,frame_num), and compresses them to predict the next single frame.
If you go to the Colab notebook link in the Keras tutorial you mentioned, you can run the first 4 cells and add a model.summary() to see this:
Model: "model"
_________________________________________________________________________________
Layer (type)                                 Output Shape                Param #
=================================================================================
input_1 (InputLayer)                         [(None, None, 64, 64, 1)]   0
conv_lstm2d (ConvLSTM2D)                     (None, None, 64, 64, 64)    416256
batch_normalization (BatchNormalization)     (None, None, 64, 64, 64)    256
conv_lstm2d_1 (ConvLSTM2D)                   (None, None, 64, 64, 64)    295168
batch_normalization_1 (BatchNormalization)   (None, None, 64, 64, 64)    256
conv_lstm2d_2 (ConvLSTM2D)                   (None, None, 64, 64, 64)    33024
conv3d (Conv3D)                              (None, None, 64, 64, 1)     1729
=================================================================================
Total params: 746,689
Trainable params: 746,433
Non-trainable params: 256
The Conv3D layer here maps the 64 feature channels down to a single channel, so each predicted frame has dimensions (64,64,1).
Connor Shorten has also made a video explanation: Youtube Tutorial Link
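As a rough cross-check of the summary above, the parameter counts can be reproduced with a few lines of plain Python. This is only a sketch: the ConvLSTM2D kernel sizes (5x5, 3x3 and 1x1) do not appear in the summary and are assumptions that happen to reproduce the printed counts, and the helper names are made up for illustration. The Conv3D kernel (3, 3, 3) comes from the layer definition quoted above.

```python
def conv_lstm2d_params(kernel_h, kernel_w, in_channels, filters):
    # Four gates; each convolves over the concatenated [input, hidden state]
    # channels and has one bias per filter.
    return 4 * (kernel_h * kernel_w * (in_channels + filters) * filters + filters)


def conv3d_params(kernel_d, kernel_h, kernel_w, in_channels, filters):
    # One weight per kernel position and input channel, plus one bias per filter.
    return kernel_d * kernel_h * kernel_w * in_channels * filters + filters


def batch_norm_params(channels):
    # gamma and beta are trainable; moving mean and variance are not.
    return 4 * channels


# Kernel sizes of the ConvLSTM2D layers are ASSUMED (5x5, 3x3, 1x1).
counts = [
    conv_lstm2d_params(5, 5, 1, 64),    # conv_lstm2d
    batch_norm_params(64),              # batch_normalization
    conv_lstm2d_params(3, 3, 64, 64),   # conv_lstm2d_1
    batch_norm_params(64),              # batch_normalization_1
    conv_lstm2d_params(1, 1, 64, 64),   # conv_lstm2d_2
    conv3d_params(3, 3, 3, 64, 1),      # conv3d
]
total = sum(counts)
non_trainable = 2 * (2 * 64)  # moving mean + variance of both BatchNorm layers

print(counts)
print(total, total - non_trainable, non_trainable)
```

The Conv3D count depends only on its kernel and on the 64 incoming channels, which fits the reading that it collapses the feature channels of each frame into one output channel.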
I've been stuck on this for hours now and can't figure out where the mistake is or what I am doing wrong. It's basically a simple neural network. I had a list of lists in which each element is an image, and as far as I understand I converted it to an np.array. It's called x now. I printed all the shapes, but I still can't figure out how to fix it.
The training data contains 4057 images of size 32*32, so that's 1024 values per image. I have 27 classes, so that's the size of the last output layer.
image_shape = (1024,1)
classes_num = 27
batch = 256
epoch=50
model = Sequential()
model.add(Dense(1024, activation='relu', input_shape=image_shape,kernel_regularizer=reg))
model.add(Dropout(dropout))
model.add(Dense(512,kernel_regularizer=reg))
model.add(Dropout(dropout))
model.add(Dense(512,kernel_regularizer=reg))
model.add(Dropout(dropout))
model.add(Dense(classes_num, activation='softmax'))
model.summary()
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
# the originals were lists of lists, so I made np.arrays from them
x=np.array([np.array(xi) for xi in splitDataDict['train_x']])
y=np.array([np.array(xi) for xi in splitDataDict['train_y']])
val_x=np.array([np.array(xi) for xi in splitDataDict['val_x']])
val_y=np.array([np.array(xi) for xi in splitDataDict['val_y']])
print(x.shape)
print(y.shape)
print(val_x.shape)
print(val_y.shape)
history = model.fit(x, y, validation_data=(val_x, val_y), epochs=epoch, batch_size=batch)
Shapes
x = (4057, 1024, 1)
y= (4057,)
val_x = (508, 1024, 1)
val_y = (508,)
Output
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 1024, 1024) 2048
_________________________________________________________________
dense_2 (Dense) (None, 1024, 512) 524800
_________________________________________________________________
dense_3 (Dense) (None, 1024, 512) 262656
_________________________________________________________________
dense_4 (Dense) (None, 1024, 27) 13851
=================================================================
Total params: 803,355
Trainable params: 803,355
Non-trainable params: 0
_________________________________________________________________
Error
ValueError: Error when checking target: expected dense_4 to have 3 dimensions, but got array with shape (4057, 1)
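One possible reading of the summary and the error above, offered as a sketch rather than a confirmed diagnosis: a Dense layer only transforms the last axis of its input, so with input_shape=(1024, 1) every layer keeps the 1024 axis and the final output stays 3-D, while the targets are not. The plain-Python helpers below (hypothetical names, no Keras involved) reproduce the shapes and parameter counts from the summary and compare them with a flat (1024,) input.

```python
def dense_output_shape(input_shape, units):
    # Dense replaces only the last axis; leading axes pass through unchanged.
    return input_shape[:-1] + (units,)


def dense_params(input_shape, units):
    # One weight per (last-axis feature, unit) pair, plus one bias per unit.
    return input_shape[-1] * units + units


def dense_stack(input_shape, units_per_layer):
    """Return (output_shape, param_count) for each Dense layer in order."""
    rows = []
    shape = input_shape
    for units in units_per_layer:
        rows.append((dense_output_shape(shape, units), dense_params(shape, units)))
        shape = dense_output_shape(shape, units)
    return rows


units_per_layer = [1024, 512, 512, 27]

# As posted: each sample has shape (1024, 1).
as_posted = dense_stack((None, 1024, 1), units_per_layer)

# Assumed alternative: each sample is a flat vector of 1024 pixels.
flat_input = dense_stack((None, 1024), units_per_layer)

for row in as_posted:
    print("as posted :", row)
for row in flat_input:
    print("flat input:", row)
```

Under this reading, reshaping x and val_x to (num_samples, 1024) and using input_shape=(1024,) would give a (None, 27) output. Separately, categorical_crossentropy expects one-hot targets of shape (num_samples, 27), whereas sparse_categorical_crossentropy accepts integer labels of shape (num_samples,).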
I'm building a DQN that takes a 24x10 array of 0, 1, 2 (representing a Tetris board) and an int 0-5 (representing the current playable tetromino).
I flatten my array and convert it to a Tensor before passing it to my convolutional layers, but this is the error I keep getting:
Expected 4-dimensional input for 4-dimensional weight [16, 3, 240, 240], but got 1-dimensional input of size [240] instead
I've tried reducing the kernel size and stride, as well as not flattening the array, but neither has worked.
For reference, this is my DQN:
class DQN(nn.Module):
    def __init__(self):
        super(DQN, self).__init__()
        self.conv1_board = nn.Conv2d(3, 16, kernel_size=240, stride=1)  # 3 input channels for 0,1,2. kernel_size 240 for length of tensor
        self.conv2_board = nn.Conv2d(16, 32, kernel_size=240, stride=1)
        self.conv3_board = nn.Conv2d(32, 6, kernel_size=240, stride=1)
        self.conv1_piece = nn.Conv2d(6, 16, kernel_size=240, stride=1)  # in channels 6 as 6 possible values
        self.conv2_piece = nn.Conv2d(16, 6, kernel_size=240, stride=1)
        self.fc1 = nn.Linear(1, 32)
        self.fc2 = nn.Linear(32, 6)
        self.flatten = nn.Flatten()

    def flt_totns(self, arr):
        flt = []
        for l in arr:
            flt.extend(l)
        return torch.FloatTensor(flt)

    def forward(self, states):  # inputs to conv layers should be Tensors not list. convert list => tensor
        board, piece = states
        board = self.flt_totns(board)
        embed_board = flatten(self.conv3_board(self.conv2_board(self.conv1_board(board))))
        embed_piece = flatten(self.conv2_piece(self.conv1_piece(piece)))
        embed_joined = torch.cat([embed_board, embed_piece])
        return self.fc2(self.fc1(embed_joined))
I'm very new to CNNs in PyTorch, so I'm sure a lot of my reasoning is faulty. For example, I'm still not sure how exactly the kernel size relates to the shape of the input, or whether input channels still apply to array inputs. But any help would be greatly appreciated.
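On the question of how kernel size relates to input shape, here is a minimal plain-Python sketch (no PyTorch). It rests on two assumptions that are not in the post: the board is fed as a 2-D grid rather than flattened, and the cell values 0, 1, 2 are turned into three binary planes so that "3 input channels" has something to refer to. The helper names are hypothetical. The size rule is the standard one for a convolution with dilation 1: out = floor((in + 2*padding - kernel) / stride) + 1.

```python
def conv2d_output_size(height, width, kernel_size, stride=1, padding=0):
    """Spatial output size of a square-kernel 2-D convolution (dilation 1)."""
    out_h = (height + 2 * padding - kernel_size) // stride + 1
    out_w = (width + 2 * padding - kernel_size) // stride + 1
    if out_h < 1 or out_w < 1:
        raise ValueError(
            f"kernel {kernel_size} does not fit a {height}x{width} input "
            f"with padding {padding}"
        )
    return out_h, out_w


def one_hot_board(board, num_values=3):
    """Turn a rows x cols grid of ints into num_values binary planes."""
    return [
        [[1.0 if cell == value else 0.0 for cell in row] for row in board]
        for value in range(num_values)
    ]


# A 24x10 board with two occupied cells, purely for illustration.
board = [[0] * 10 for _ in range(24)]
board[23][4] = 1
board[22][4] = 2

planes = one_hot_board(board)
board_shape = (len(planes), len(planes[0]), len(planes[0][0]))
print(board_shape)  # channels, height, width

# A 3x3 kernel with padding 1 keeps the 24x10 spatial size.
print(conv2d_output_size(24, 10, kernel_size=3, padding=1))

# A 240x240 kernel cannot slide over a 24x10 grid at all.
try:
    conv2d_output_size(24, 10, kernel_size=240)
except ValueError as err:
    print("kernel too large:", err)
```

nn.Conv2d expects a 4-D input laid out as (batch, channels, height, width), so planes like these would still need a leading batch axis, for example via torch.tensor(planes).unsqueeze(0), giving shape (1, 3, 24, 10).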
I'm having some difficulty understanding the input-output flow of layers in stacked LSTM networks. Let's say I have created a stacked LSTM network like the one below:
# parameters
time_steps = 10
features = 2
input_shape = [time_steps, features]
batch_size = 32
# model
model = Sequential()
model.add(LSTM(64, input_shape=input_shape, return_sequences=True))
model.add(LSTM(32,input_shape=input_shape))
where our stacked LSTM network consists of 2 LSTM layers with 64 and 32 hidden units respectively. In this scenario, we expect that at each time step the 1st LSTM layer, LSTM(64), will pass as input to the 2nd LSTM layer, LSTM(32), a vector of size [batch_size, time-step, hidden_unit_length], which would represent the hidden state of the 1st LSTM layer at the current time step. What confuses me is:
Does the 2nd LSTM layer, LSTM(32), receive as X(t) (its input) the hidden state of the 1st layer, LSTM(64), which has the size [batch_size, time-step, hidden_unit_length], and pass it through its own hidden network, in this case consisting of 32 nodes?
If the first is true, why is the input_shape of the 1st layer, LSTM(64), and the 2nd layer, LSTM(32), the same, when the 2nd only processes the hidden state of the 1st layer? Shouldn't it, in our case, have input_shape set to [32, 10, 64]?
I found the LSTM visualization below very helpful (found here), but it doesn't expand on stacked LSTM networks:
Any help would be highly appreciated.
Thanks!
The input_shape is only required for the first layer. Each subsequent layer takes the output of the previous layer as its input (and so its input_shape argument is ignored).
The model below
model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model.add(LSTM(32))
represents the architecture below,
which you can verify from model.summary():
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_26 (LSTM) (None, 5, 64) 17152
_________________________________________________________________
lstm_27 (LSTM) (None, 32) 12416
=================================================================
Replacing the line
model.add(LSTM(32))
with
model.add(LSTM(32, input_shape=(1000000, 200000)))
will still give you the same architecture (verify using model.summary()), because the input_shape is ignored: the layer takes as its input the tensor output by the previous layer.
And if you need a sequence-to-sequence architecture like the one below,
you should use the code:
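The parameter counts in the summary above can be checked by hand, which also shows which input size the second layer really uses. This is a small plain-Python sketch with hypothetical helper names; it uses the standard LSTM count of four gates, each with input weights, recurrent weights and a bias.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias.
    return 4 * ((input_dim + units) * units + units)


def lstm_output_shape(time_steps, units, return_sequences):
    # With return_sequences=True the layer emits one vector per time step.
    if return_sequences:
        return (None, time_steps, units)
    return (None, units)


time_steps, features = 5, 2

first_shape = lstm_output_shape(time_steps, 64, return_sequences=True)
first_params = lstm_params(features, 64)

# The second layer sees the 64 features produced by the first layer,
# not the 2 features of the original input.
second_shape = lstm_output_shape(time_steps, 32, return_sequences=False)
second_params = lstm_params(first_shape[-1], 32)

# What the count would be if the layer had really consumed 2 features.
second_params_if_raw_input = lstm_params(features, 32)

print(first_shape, first_params)
print(second_shape, second_params)
print(second_params_if_raw_input)
```

The count of 12416 only comes out when the second layer's input dimension is 64, which is consistent with its own input_shape argument being ignored.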
model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model.add(LSTM(32, return_sequences=True))
which should produce this model:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_32 (LSTM) (None, 5, 64) 17152
_________________________________________________________________
lstm_33 (LSTM) (None, 5, 32) 12416
=================================================================
The Keras documentation says the input is [batch_size, time-step, input_dim], rather than [batch_size, time-step, hidden_unit_length], so I think 64 and 32 are feature counts: the input X to LSTM(32) has 64 features per time step, and LSTM(32) itself produces 32 features per time step.
I want to train a convolutional neural network on my data. I have reshaped my data:
These are the parameters that I have used:
'x_train.shape'=(500000, 3253)
'y_train.shape', (500000,)
'y_test.shape', (20000,)
'y_train[0]', 97
'y_test[0]', 99
'y_train.shape', (500000, 256)
'y_test.shape', (20000, 256)
This is how I define my model architecture:
# 3. Define model architecture
model = Sequential()
model.add(Conv1D(64, 8, strides=1, padding='valid',
                 dilation_rate=1, activation=None, use_bias=True, kernel_initializer='glorot_uniform',
                 bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None,
                 activity_regularizer=None, kernel_constraint=None, bias_constraint=None, input_shape=x_train.shape))
# input_traces=N_Features
# input_shape=(batch_size, trace_length, num_of_channels)
model.add(MaxPooling1D(pool_size=2,strides=None, padding='valid'))
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(1, activation='relu'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(x_train, y_train, batch_size=100, epochs=500,verbose=2)
But I got two errors:
1-
ValueError: Error when checking input: expected conv1d_1_input to have shape (None, 500000, 3253) but got array with shape (500000, 3253, 1)
2-
With model.fit()
How do I resolve this problem?
The input shape is wrong; it should be input_shape = (1, 3253) for Theano or (3253, 1) for TensorFlow. The input shape doesn't include the number of samples.
Then you need to reshape your data to include the channels axis:
x_train = x_train.reshape((500000, 1, 3253))
Or put the channels dimension last, i.e. reshape to (500000, 3253, 1), if you use TensorFlow. After these changes it should work.
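To make the shape bookkeeping concrete, here is a plain-Python sketch that assumes the TensorFlow (channels-last) layout. It follows the shapes through the Conv1D(64, 8), MaxPooling1D(2) and Flatten layers from the question. The helper names are hypothetical and no Keras call is involved.

```python
def conv1d_output_length(length, kernel_size, stride=1):
    # 'valid' padding: the kernel must fit entirely inside the sequence.
    return (length - kernel_size) // stride + 1


def pool1d_output_length(length, pool_size, stride=None):
    # When no stride is given, it defaults to the pool size.
    stride = pool_size if stride is None else stride
    return (length - pool_size) // stride + 1


x_train_shape = (500000, 3253)

# Add a trailing channels axis: (samples, steps, channels).
reshaped_shape = x_train_shape + (1,)

# input_shape describes ONE sample, so the sample count is dropped.
input_shape = reshaped_shape[1:]

conv_length = conv1d_output_length(input_shape[0], kernel_size=8)
pool_length = pool1d_output_length(conv_length, pool_size=2)
flattened_features = pool_length * 64  # 64 filters from Conv1D(64, 8)

print(reshaped_shape, input_shape)
print(conv_length, pool_length, flattened_features)
```

With these numbers the first layer would be declared with input_shape=(3253, 1) and the data reshaped to (500000, 3253, 1), matching the array shape reported in the error message.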
input_shape = (3253, 1)
This must be the input_shape of the first convolution layer, Conv1D.
You got the error with model.fit() because your model has not been built yet.