In a Keras implementation, I once saw the last two fully connected layers defined as follows:
outX = Dense(300, activation='relu')(outX)
outX = Flatten()(outX)
predictions = Dense(1,activation='linear')(outX)
There is a Flatten layer between the two Dense layers. Why do we need a Flatten operation between two fully connected layers? Is it always required?
Short answer: a Flatten layer has no learnable parameters of its own. However, adding a Flatten layer can increase the number of learnable parameters in the layers that follow it, because it changes the shape of their input.
Example: try to figure out the difference between these two models:
1) Without Flatten:
inp = Input(shape=(20,10,))
A = Dense(300, activation='relu')(inp)
#A = Flatten()(A)
A = Dense(1, activation='relu')(A)
m = Model(inputs=inp,outputs=A)
m.summary()
Output:
input_9 (InputLayer) (None, 20, 10) 0
dense_20 (Dense) (None, 20, 300) 3300
dense_21 (Dense) (None, 20, 1) 301
Total params: 3,601
Trainable params: 3,601
Non-trainable params: 0
2) With Flatten:
inp = Input(shape=(20,10,))
A = Dense(300, activation='relu')(inp)
A = Flatten()(A)
A = Dense(1, activation='relu')(A)
m = Model(inputs=inp,outputs=A)
m.summary()
Output:
input_10 (InputLayer) (None, 20, 10) 0
dense_22 (Dense) (None, 20, 300) 3300
flatten_9 (Flatten) (None, 6000) 0
dense_23 (Dense) (None, 1) 6001
Total params: 9,301
Trainable params: 9,301
Non-trainable params: 0
Finally, whether to add a Flatten layer depends on the data at hand. Having more parameters to learn can lead to a more accurate model, or it can cause overfitting. So one answer is: "try both, choose the best."
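The parameter counts in the two summaries above can be reproduced by hand. The sketch below is plain Python, not Keras; dense_params and flatten_shape are hypothetical helpers written only for this illustration, and they assume a Dense layer with a bias that acts on the last axis of its input.

```python
def dense_params(input_shape, units):
    """Parameter count of a Dense layer: it only looks at the last axis."""
    last_axis = input_shape[-1]
    return last_axis * units + units  # weights + biases


def flatten_shape(input_shape):
    """Shape after Flatten: all non-batch axes merged into one."""
    size = 1
    for dim in input_shape:
        size *= dim
    return (size,)


inp = (20, 10)      # per-sample input shape, batch axis omitted
hidden = (20, 300)  # Dense(300) keeps the leading axis of 20

first = dense_params(inp, 300)                         # 10*300 + 300
without_flatten = dense_params(hidden, 1)              # 300*1 + 1
with_flatten = dense_params(flatten_shape(hidden), 1)  # 6000*1 + 1

print(first, without_flatten, with_flatten)
```

The three numbers correspond to the 3300, 301 and 6001 entries in the summaries: Flatten itself adds nothing, but the Dense layer after it sees 6000 inputs instead of 300.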
I am trying to implement Stabnet, which uses the DeepStab dataset and a Siamese network to output two homography matrices and stabilize a video.
I define the network like this:
import keras
from keras.layers import MaxPooling2D, TimeDistributed, LSTM, GlobalAveragePooling2D, Dense
from keras.applications import ResNet50V2
def build_resnet(shape):
    model = ResNet50V2(
        include_top=False,
        input_shape=shape,
        weights='imagenet')
    # Keep the last 9 layers trainable and freeze the rest
    trainable = 9
    for layer in model.layers[:-trainable]:
        layer.trainable = False
    for layer in model.layers[-trainable:]:
        layer.trainable = True
    return model
def get_embeded():
    model = keras.Sequential()
    resnet = build_resnet(Config.shape[1:])
    model.add(TimeDistributed(resnet, batch_input_shape=(None,) + Config.shape))
    model.add(TimeDistributed(GlobalAveragePooling2D()))
    model.add(LSTM(64))
    model.add(Dense(8))
    return model
input1 = keras.layers.Input(Config.shape)
input2 = keras.layers.Input(Config.shape)
embeded = get_embeded()
upper = embeded(input1)
lower = embeded(input2)
outputs = [upper, lower]
Siamese = keras.models.Model(inputs = [input1, input2], outputs =outputs)
Siamese.summary()
The inputs to each branch are input1 = Current unsteady frame + 5 steady frames from the previous second, and for the lower branch, the same thing but for the previous unsteady frame. At the end, I get two homographies which are supposed to warp each frame so as to stabilize the video. The paper uses a complicated loss function, but for testing purposes, I am only interested in the mean squared error between the steady frame and the current unsteady warped with the homography.
def Lpixel(y_true, y_pred):
    T = y_true[0]    # [ground truth steady frame, unsteady frame] for time t
    T_1 = y_true[1]  # [ground truth steady frame, unsteady frame] for time t-1
    It_gt = T[0]
    It = T[1]
    It_1_gt = T_1[0]
    It_1 = T_1[1]
    It_gt = tf.image.convert_image_dtype(It_gt, tf.float32)
    It = tf.image.convert_image_dtype(It, tf.float32)
    It_1_gt = tf.image.convert_image_dtype(It_1_gt, tf.float32)
    It_1 = tf.image.convert_image_dtype(It_1, tf.float32)
    h1 = y_pred[0]
    h2 = y_pred[1]
    h1 = tf.Variable(h1, dtype='float32')
    h2 = tf.Variable(h2, dtype='float32')
    warped_current = tfa.image.transform(It, h1, output_shape=Config.shape[1:3], interpolation='NEAREST')
    warped_prev = tfa.image.transform(It_1, h2, output_shape=Config.shape[1:3], interpolation='NEAREST')
    mse_current = tf.reduce_mean(tf.square(It_gt - warped_current))
    mse_prev = tf.reduce_mean(tf.square(It_1_gt - warped_prev))
    loss = tf.Variable(0.5 * mse_current + 0.5 * mse_prev)
    return loss
I am getting the following error
ValueError: in user code:
File "c:\Users\VINY\anaconda3\envs\cv\lib\site-packages\keras\engine\training.py", line 1249, in train_function *
return step_function(self, iterator)
...
File "c:\Users\VINY\anaconda3\envs\cv\lib\site-packages\keras\optimizers\optimizer_v2\utils.py", line 77, in filter_empty_gradients
raise ValueError(
ValueError: No gradients provided for any variable: (['conv5_block3_2_conv/kernel:0', 'conv5_block3_2_bn/gamma:0', 'conv5_block3_2_bn/beta:0', 'conv5_block3_3_conv/kernel:0', 'conv5_block3_3_conv/bias:0', 'post_bn/gamma:0', 'post_bn/beta:0', 'lstm/lstm_cell/kernel:0', 'lstm/lstm_cell/recurrent_kernel:0', 'lstm/lstm_cell/bias:0', 'dense/kernel:0', 'dense/bias:0'],). Provided `grads_and_vars` is ((None, <tf.Variable 'conv5_block3_2_conv/kernel:0' shape=(3, 3, 512, 512) dtype=float32>), (None, <tf.Variable 'conv5_block3_2_bn/gamma:0' shape=(512,) dtype=float32>), (None, <tf.Variable 'conv5_block3_2_bn/beta:0' shape=(512,) dtype=float32>), (None, <tf.Variable 'conv5_block3_3_conv/kernel:0' shape=(1, 1, 512, 2048) dtype=float32>), (None, <tf.Variable 'conv5_block3_3_conv/bias:0' shape=(2048,) dtype=float32>), (None, <tf.Variable 'post_bn/gamma:0' shape=(2048,) dtype=float32>), (None, <tf.Variable 'post_bn/beta:0' shape=(2048,) dtype=float32>), (None, <tf.Variable 'lstm/lstm_cell/kernel:0' shape=(2048, 256) dtype=float32>), (None, <tf.Variable 'lstm/lstm_cell/recurrent_kernel:0' shape=(64, 256) dtype=float32>), (None, <tf.Variable 'lstm/lstm_cell/bias:0' shape=(256,) dtype=float32>), (None, <tf.Variable 'dense/kernel:0' shape=(64, 8) dtype=float32>), (None, <tf.Variable 'dense/bias:0' shape=(8,) dtype=float32>)).
From what I understand, the warping operation is not differentiable. Is there any other approach? (I do not want to predict homographies and compare them with ground truth homographies.)
Is Next-Frame Video Prediction with Convolutional LSTMs a regression problem or a classification?
Why regression/classification?
Why do we use the last Conv3D layer?
I would consider it closer to a regression problem than a classification problem, since its inputs are all the previous frames, from which it learns the trend or function to fit. In this case it learns the direction in which the MNIST digit is moving and then predicts the most likely next location.
Since it is NOT trying to classify each of a set of available digit positions as next_location or NOT_next_location, it doesn't seem like a classification problem.
The last layer is defined as:
x = layers.Conv3D(filters=1, kernel_size=(3, 3, 3), activation="sigmoid", padding="same")(x)
It essentially takes the features computed from the past 2D frames (individual MNIST images), stacked along the frame axis, and compresses their 64 channels into a single channel to produce the predicted frame.
If you go to the Colab notebook linked in the Keras tutorial you mentioned, you can run the first 4 cells and add a model.summary() to see this:
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, None, 64, 64, 1) 0
]
conv_lstm2d (ConvLSTM2D) (None, None, 64, 64, 64) 416256
batch_normalization (BatchN (None, None, 64, 64, 64) 256
ormalization)
conv_lstm2d_1 (ConvLSTM2D) (None, None, 64, 64, 64) 295168
batch_normalization_1 (Batc (None, None, 64, 64, 64) 256
hNormalization)
conv_lstm2d_2 (ConvLSTM2D) (None, None, 64, 64, 64) 33024
conv3d (Conv3D) (None, None, 64, 64, 1) 1729
=================================================================
Total params: 746,689
Trainable params: 746,433
Non-trainable params: 256
The Conv3D layer here outputs predicted frames of dimensions (64, 64, 1), one per time step, as its (None, None, 64, 64, 1) output shape shows.
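As a rough check on the last row of the summary, the Conv3D parameter count and output shape can be worked out by hand. This is a plain-Python sketch, not Keras code; conv3d_params and conv3d_same_output_shape are hypothetical helpers, and the sketch assumes stride 1 and a bias term, which is what the 1729 in the summary is consistent with.

```python
def conv3d_params(kernel_size, in_channels, filters):
    """Weights plus biases of a Conv3D layer."""
    kd, kh, kw = kernel_size
    return kd * kh * kw * in_channels * filters + filters


def conv3d_same_output_shape(input_shape, filters):
    """With padding="same" and stride 1, only the channel axis changes."""
    return input_shape[:-1] + (filters,)


# (frames, height, width, channels) coming out of the last ConvLSTM2D layer;
# None stands for the variable number of frames, as in the summary.
features = (None, 64, 64, 64)

params = conv3d_params((3, 3, 3), in_channels=64, filters=1)
out_shape = conv3d_same_output_shape(features, filters=1)
print(params, out_shape)
```

So the layer keeps the frame, height and width axes and only reduces the 64 feature channels to one grayscale channel; the sigmoid then maps each pixel to the range 0 to 1.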
Connor Shorten has made a video explanation as well: YouTube Tutorial Link
I've been stuck on this for hours and can't figure out what I am doing wrong. It is a simple neural network. My data was a list of lists in which each element is an image, and, as far as I understand, I converted it to an np.array, now called x. I printed all the shapes, but I still can't figure out how to fix it.
The training data contains 4057 images of size 32*32, so that's 1024 values per image. I have 27 classes, so that's the size of the last output layer.
image_shape = (1024,1)
classes_num = 27
batch = 256
epoch=50
model = Sequential()
model.add(Dense(1024, activation='relu', input_shape=image_shape,kernel_regularizer=reg))
model.add(Dropout(dropout))
model.add(Dense(512,kernel_regularizer=reg))
model.add(Dropout(dropout))
model.add(Dense(512,kernel_regularizer=reg))
model.add(Dropout(dropout))
model.add(Dense(classes_num, activation='softmax'))
model.summary()
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
# The originals were lists of lists, so I made np.arrays from them.
x=np.array([np.array(xi) for xi in splitDataDict['train_x']])
y=np.array([np.array(xi) for xi in splitDataDict['train_y']])
val_x=np.array([np.array(xi) for xi in splitDataDict['val_x']])
val_y=np.array([np.array(xi) for xi in splitDataDict['val_y']])
print(x.shape)
print(y.shape)
print(val_x.shape)
print(val_y.shape)
history = model.fit(x, y, validation_data=(val_x, val_y), epochs=epoch, batch_size=batch)
Shapes
x = (4057, 1024, 1)
y= (4057,)
val_x = (508, 1024, 1)
val_y = (508,)
Output
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 1024, 1024) 2048
_________________________________________________________________
dense_2 (Dense) (None, 1024, 512) 524800
_________________________________________________________________
dense_3 (Dense) (None, 1024, 512) 262656
_________________________________________________________________
dense_4 (Dense) (None, 1024, 27) 13851
=================================================================
Total params: 803,355
Trainable params: 803,355
Non-trainable params: 0
_________________________________________________________________
Error
ValueError: Error when checking target: expected dense_4 to have 3 dimensions, but got array with shape (4057, 1)
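One way to read the summary and the error together: a Dense layer transforms only the last axis, so with input_shape=(1024, 1) every layer keeps the leading 1024 axis and the model ends with a 3D output, while y has one label per image. The sketch below traces the shapes in plain Python. It assumes, as a possible alternative, that each image is fed as a flat vector of 1024 values; dense_output_shape and trace are hypothetical helpers, not Keras functions.

```python
def dense_output_shape(input_shape, units):
    """Dense replaces the last axis with `units` and leaves the others alone."""
    return input_shape[:-1] + (units,)


def trace(input_shape, layer_units):
    """Output shape (with batch axis None) after each Dense layer in turn."""
    shapes = []
    shape = (None,) + input_shape
    for units in layer_units:
        shape = dense_output_shape(shape, units)
        shapes.append(shape)
    return shapes


units = [1024, 512, 512, 27]

as_posted = trace((1024, 1), units)  # input_shape=(1024, 1), as in the question
flat = trace((1024,), units)         # assumed alternative: input_shape=(1024,)

print(as_posted[-1])
print(flat[-1])
```

Under that assumption x would have to be reshaped to (4057, 1024). With categorical_crossentropy the labels would also need to be one-hot encoded to shape (4057, 27); sparse_categorical_crossentropy accepts integer labels of shape (4057,) instead.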
I'm currently coding a GAN to generate sequences. Both the generator and the discriminator work when trained standalone. As soon as I combine them into the complete GAN model (to train the generator with the discriminator's weights frozen), the following error occurs, and the graph does not seem to be connected between the generator and the discriminator.
ValueError: No gradients provided for any variable: ['generator_lstm/kernel:0', 'generator_lstm/recurrent_kernel:0', 'generator_lstm/bias:0', 'generator_softmax/kernel:0', 'generator_softmax/bias:0'].
At first I thought my custom activation function was causing the issue. But since it works standalone, I think the two submodels are not connected correctly. I'm not sure if it's important, but in the TensorBoard graph there is no connection between the two models.
Tensorboard Graph to my model
The error occurs on exactly the last line of the train() function.
I already tried TF versions 2.1 and 2.4.1, it makes no difference.
# softargmax and build[...]() functions are located in my "gan" python module
# custom softargmax implementation
#tf.function
def softargmax(values, beta = 1000000.0):
    # tf.range over all possible indices
    range_tensor = tf.range(54, dtype=tf.float32)
    range_tensor = tf.reshape(range_tensor, [-1, 1])
    # softmax activation of (input*beta)
    values = tf.cast(values, dtype=tf.float32)
    beta = tf.cast(beta, dtype=tf.float32)
    softmax = tf.nn.softmax(((values*beta) - tf.reduce_max(values*beta)))
    # weighted sum over the indices, giving one value per position
    return softmax @ range_tensor
callable_softargmax = tf.function(softargmax)
get_custom_objects().update({'custom_activation': Activation(callable_softargmax)})
def build_generator(z_dim, seq_len, num_of_words):
    gen_input = Input(shape=(z_dim,), name="generator_input")
    gen_repeat = RepeatVector(seq_len, name="generator_repeat")(gen_input)
    gen_lstm = LSTM(128, activation="relu", return_sequences=True, name="generator_lstm")(gen_repeat)
    gen_softmax = Dense(num_of_words, name="generator_softmax")(gen_lstm)
    #gen_activation = tf.keras.layers.Activation(callable_softargmax)(gen_softmax)
    gen_soft_argmax = Lambda(callable_softargmax, name="generator_soft_argmax")(gen_softmax)
    generator = Model(gen_input, gen_soft_argmax, name="generator_model")
    generator.summary()
    return generator
def build_discriminator(seq_len, num_of_words, embedding_len):
    embedding = np.load(PATH + MODELS + "embedding_ae.npy")
    discriminator = Sequential(name="gan_discriminator")
    discriminator.add(tf.keras.layers.InputLayer(input_shape=(seq_len,1), name="discriminator_input"))
    discriminator.add(Reshape(target_shape=[18,], dtype=tf.float32, name="discriminator_reshape"))
    discriminator.add(Embedding(input_dim=num_of_words, output_dim=embedding_len, input_length=seq_len, mask_zero=False,
        embeddings_initializer=tf.keras.initializers.Constant(embedding), trainable=False, name="discriminator_emb"))
    discriminator.add(Bidirectional(LSTM(128, activation="tanh", recurrent_activation="sigmoid", recurrent_dropout=0, unroll=False, use_bias=True,
        return_sequences=True), name="discriminator_lstm"))
    discriminator.add(Dropout(0.2, name="discriminator_dropout"))
    discriminator.add(LSTM(128, activation="tanh", recurrent_activation="sigmoid", recurrent_dropout=0, unroll=False, use_bias=True,
        name="discriminator_lstm2"))
    discriminator.add(Dropout(0.2, name="discriminator_dropout2"))
    discriminator.add(Dense(1, activation="sigmoid", name="discriminator_output"))
    discriminator.summary()
    return discriminator
def build_gan(generator, discriminator):
    gan = Sequential(name="gan")
    gan.add(generator)
    gan.add(discriminator)
    return gan
def train(train_data, generator, discriminator, gan, iterations, batch_size, z_dim):
    real = np.ones((batch_size, 1))
    fake = np.zeros((batch_size, 1))
    for iteration in range(iterations):
        idx = np.random.randint(0, train_data.shape[0], batch_size)
        train_samples = train_data[idx]
        train_samples = np.reshape(train_samples, [batch_size, 18, 1])
        # train discriminator
        z = np.random.normal(0, 1, (batch_size, z_dim))
        z = np.reshape(z, [batch_size, z_dim])
        gen_samples = generator.predict(z)
        d_loss_real = discriminator.train_on_batch(train_samples, real)
        d_loss_fake = discriminator.train_on_batch(gen_samples, fake)
        d_loss, accuracy, = 0.5 * np.add(d_loss_real, d_loss_fake)
        # train generator
        z = np.random.normal(0, 1, (batch_size, z_dim))
        gen_samples = generator.predict(z)
        g_loss = gan.train_on_batch(z, real)
# compiling and running models in main.py
discriminator = gan.build_discriminator(seq_len=18, num_of_words=54, embedding_len=200)
discriminator.compile(loss="binary_crossentropy", optimizer=tf.keras.optimizers.Adam(lr=0.001), metrics=["accuracy"])
discriminator.trainable = False
generator = gan.build_generator(z_dim, seq_len=18, num_of_words=54)
gan_model = gan.build_gan(generator, discriminator)
gan_model.compile(loss="binary_crossentropy", optimizer=tf.keras.optimizers.Adam(lr=0.001))
gan.train(train_data=train, generator=generator, discriminator=discriminator,
gan=gan_model, iterations=iterations, batch_size=batch_size, z_dim=z_dim)
Model: "gan_discriminator"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
discriminator_reshape (Resha (None, 18) 0
_________________________________________________________________
discriminator_emb (Embedding (None, 18, 200) 10800
_________________________________________________________________
discriminator_lstm (Bidirect (None, 18, 256) 336896
_________________________________________________________________
discriminator_dropout (Dropo (None, 18, 256) 0
_________________________________________________________________
discriminator_lstm2 (LSTM) (None, 128) 197120
_________________________________________________________________
discriminator_dropout2 (Drop (None, 128) 0
_________________________________________________________________
discriminator_output (Dense) (None, 1) 129
=================================================================
Total params: 544,945
Trainable params: 534,145
Non-trainable params: 10,800
_________________________________________________________________
Model: "generator_model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
generator_input (InputLayer) [(None, 128)] 0
_________________________________________________________________
generator_repeat (RepeatVect (None, 18, 128) 0
_________________________________________________________________
generator_lstm (LSTM) (None, 18, 128) 131584
_________________________________________________________________
generator_softmax (Dense) (None, 18, 54) 6966
_________________________________________________________________
generator_soft_argmax (Lambd (None, 18, 1) 0
=================================================================
Total params: 138,550
Trainable params: 138,550
Non-trainable params: 0
_________________________________________________________________
Model: "gan"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
generator_model (Model) (None, 18, 1) 138550
_________________________________________________________________
gan_discriminator (Sequentia (None, 1) 544945
=================================================================
Total params: 683,495
Trainable params: 138,550
Non-trainable params: 544,945
_________________________________________________________________
Do you have any suggestions on the model and what might possibly be wrong?
I've trained a GRU with Keras. I get the following error when I run:
nxt = model.predict([features,embedding_matrix[enc_map[cur]]])
ValueError: Error when checking : expected input_2 to have shape (512,) but got array with shape (1,)
But
features.shape
(512,)
And
embedding_matrix[enc_map[cur]].shape
(50,)
Here's the summary of the model:
model.summary()
________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
================================================================================
input_2 (InputLayer) (None, 512) 0
________________________________________________________________________________
input_1 (InputLayer) (None, 50) 0
________________________________________________________________________________
merge_1 (Merge) (None, 562) 0 input_2[0][0]
input_1[0][0]
________________________________________________________________________________
reshape_1 (Reshape) (None, 1, 562) 0 merge_1[0][0]
________________________________________________________________________________
gru_1 (GRU) (None, 128) 265344 reshape_1[0][0]
_______________________________________________________________________________
dense_1 (Dense) (None, 50) 6450 gru_1[0][0]
================================================================================
Total params: 271,794
Trainable params: 271,794
Non-trainable params: 0
Each input must be a numpy array with a leading batch dimension; for input_2 that means shape (any, 512).
Check the shape of the X_train data you used for training; the data you pass to predict must follow the same rules.
If this results in a correct shape, you can:
input_data = np.array([features,embedding_matrix[enc_map[cur]]])
But if there is anything wrong with this data and it doesn't fit the required (any,512), the model will not be able to use it.
You need to reshape the input_data array to (512,1) and also transpose it.
input_data = input_data.reshape(512,1).T
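A minimal sketch of the batch-axis idea, assuming the model takes two separate inputs as its summary suggests (input_2 with 512 values and input_1 with 50), so that each array gets its own leading axis instead of being stacked into one array. The zero arrays are placeholders standing in for features and embedding_matrix[enc_map[cur]]; word_vector, features_batch and word_batch are names made up for this illustration.

```python
import numpy as np

# Stand-ins for the real data; only the shapes matter here.
features = np.zeros(512, dtype="float32")    # shape (512,)
word_vector = np.zeros(50, dtype="float32")  # shape (50,), e.g. embedding_matrix[enc_map[cur]]

# Add a batch axis of size 1 in front of each input.
features_batch = features.reshape(1, -1)
word_batch = word_vector[np.newaxis, :]

print(features_batch.shape, word_batch.shape)

# The call would then look like this (same input order as the original call;
# `model` is the trained model from the question and is not defined here):
# nxt = model.predict([features_batch, word_batch])
```

Without the extra axis, a 1D array of 512 values may be interpreted as 512 samples with one value each, which would match the "got array with shape (1,)" part of the error message.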