Data Flow in CNN - deep-learning

What would be the output shape of a network with 32 filters in the 1st convolutional layer, followed by a pooling layer with stride=2 and k=2, followed by another convolutional layer with 64 filters, followed by another pooling layer with stride=2 and k=2? The input image dimensions are 28*28*1.

I will assume that you are using 'same' padding in each convolution; the output of your model will then have shape (7, 7, 64). Each 'same' convolution preserves the 28×28 spatial size, each 2×2 pooling with stride 2 halves it (28 → 14 → 7), and the second convolution sets the depth to 64. You can use code similar to this one in Keras if you don't want to compute the size by hand:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D

model = Sequential()
model.add(Conv2D(32, 3, padding="same", input_shape=(28, 28, 1)))  # -> (28, 28, 32)
model.add(MaxPooling2D(2, 2))                                      # -> (14, 14, 32)
model.add(Conv2D(64, 3, padding="same"))                           # -> (14, 14, 64)
model.add(MaxPooling2D(2, 2))                                      # -> (7, 7, 64)
print(model.summary())
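If you do want the arithmetic by hand: a 'same' convolution preserves height and width, and a pooling layer with window k and stride s maps a spatial size n to floor((n - k) / s) + 1. A minimal sketch of that rule (my own helper, not part of Keras):

def pool_out(n, k=2, s=2):
    # output spatial size of a k-wide pooling window with stride s
    return (n - k) // s + 1

size = 28              # input is 28x28
size = pool_out(size)  # conv keeps 28, pooling gives 14
size = pool_out(size)  # conv keeps 14, pooling gives 7
print(size)            # 7, so the final feature map is (7, 7, 64)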

Related

Training accuracy is not improving in recognizing violence with CNN + LSTM

I'm trying to implement a violence recognizer with CNN + LSTM, but when I train the model it produces the same accuracy over the epochs, around 55% over 20 epochs.
The data consists of 100 videos (violence/non-violence) from the hockey dataset. It was preprocessed by cropping dark frames, scaling pixel values to [0, 1], and normalizing to zero mean and unit standard deviation. The input shape is (100, 20, 160, 160, 3): number of videos, frames per video, frame height, frame width, and RGB channels, respectively. The labels are a tensor of shape (100, 2), where each row is a one-hot vector [0 1] or [1 0]; both arrays are given to the model as floats.
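For reference, a dummy-data sketch of the shapes described above (my own code, with assumed dtypes and an assumed label ordering):

import numpy as np
# 100 videos, 20 frames each, 160x160 RGB frames
fvideo = np.random.random((100, 20, 160, 160, 3)).astype('float32')
# one one-hot row per video, e.g. [1 0] = non-violence, [0 1] = violence (assumed)
flabels = np.eye(2)[np.random.randint(0, 2, size=100)].astype('float32')
print(fvideo.shape, flabels.shape)  # (100, 20, 160, 160, 3) (100, 2)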
The code of the model
import numpy as np
import tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, TimeDistributed, Flatten, LSTM,
                                     BatchNormalization, Dense,
                                     GlobalAveragePooling1D)
from tensorflow.keras.optimizers import Adam

def CNN_LSTM():
    input_shapes = (NUMBER_OF_FRAMES, IMAGE_SIZE, IMAGE_SIZE, 3)
    np.random.seed(1234)
    vg19 = tensorflow.keras.applications.vgg19.VGG19
    base_model = vg19(include_top=False, weights='imagenet',
                      input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
    for layer in base_model.layers:
        layer.trainable = True
    cnn = TimeDistributed(base_model,
                          input_shape=(NUMBER_OF_FRAMES, IMAGE_SIZE, IMAGE_SIZE, 3))
    model = Sequential()
    model.add(Input(shape=(NUMBER_OF_FRAMES, IMAGE_SIZE, IMAGE_SIZE, IMAGE_CHANNELS)))
    model.add(cnn)
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(NUMBER_OF_FRAMES, return_sequences=True))
    model.add(BatchNormalization())
    model.add(TimeDistributed(Dense(90)))
    model.add(BatchNormalization())
    model.add(GlobalAveragePooling1D())
    model.add(BatchNormalization())
    model.add(Dense(512, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(2, activation="sigmoid"))
    adam = Adam(lr=0.01)
    model.compile(loss='binary_crossentropy', optimizer=adam, metrics=["accuracy"])
    return model

x = CNN_LSTM()
x.summary()
history = x.fit(fvideo, flabels, batch_size=10, epochs=15)
How can I solve the problem of the accuracy staying the same? Thanks in advance.

What's the input of each LSTM layer in a stacked LSTM network?

I'm having some difficulty understanding the input-output flow of layers in stacked LSTM networks. Let's say I have created a stacked LSTM network like the one below:
# parameters
time_steps = 10
features = 2
input_shape = [time_steps, features]
batch_size = 32
# model
model = Sequential()
model.add(LSTM(64, input_shape=input_shape, return_sequences=True))
model.add(LSTM(32, input_shape=input_shape))
where our stacked LSTM network consists of 2 LSTM layers with 64 and 32 hidden units respectively. In this scenario, we expect that at each time step the 1st LSTM layer -LSTM(64)- will pass to the 2nd LSTM layer -LSTM(32)- a tensor of size [batch_size, time_steps, hidden_unit_length], representing the hidden states of the 1st LSTM layer at each time step. What confuses me is:
Does the 2nd LSTM layer -LSTM(32)- receive as X(t) (as input) the hidden state of the 1st layer -LSTM(64)-, which has size [batch_size, time_steps, hidden_unit_length], and pass it through its own hidden network, in this case consisting of 32 nodes?
If the first is true, why is the input_shape of the 1st -LSTM(64)- and the 2nd -LSTM(32)- the same, when the 2nd only processes the hidden state of the 1st layer? Shouldn't input_shape in our case be set to [32, 10, 64]?
I found the linked LSTM visualization very helpful, but it doesn't expand on stacked-LSTM networks.
Any help would be highly appreciated.
Thanks!
The input_shape is only required for the first layer. The subsequent layers take the output of the previous layer as their input, so their input_shape argument value is ignored.
The model below
model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model.add(LSTM(32))
represents a two-layer stacked LSTM, which you can verify from model.summary():
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_26 (LSTM)               (None, 5, 64)             17152
_________________________________________________________________
lstm_27 (LSTM)               (None, 32)                12416
=================================================================
Replacing the line
model.add(LSTM(32))
with
model.add(LSTM(32, input_shape=(1000000, 200000)))
will still give you the same architecture (verify using model.summary()), because the input_shape is ignored: the layer takes as input the tensor output of the previous layer.
And if you need a sequence-to-sequence architecture, where the second LSTM also returns its full output sequence, you should be using the code:
model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model.add(LSTM(32, return_sequences=True))
which should return a model like this:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_32 (LSTM)               (None, 5, 64)             17152
_________________________________________________________________
lstm_33 (LSTM)               (None, 5, 32)             12416
=================================================================
The Keras documentation says the LSTM input is [batch_size, time_steps, input_dim], rather than [batch_size, time_steps, hidden_unit_length]. So 64 and 32 describe output features: the second layer's X-input has 64 features per time step (the first layer's output), and LSTM(32) produces 32 features per time step.
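To make this concrete, here is a minimal standalone sketch (my own, not from the original post) that prints each layer's output shape, confirming that the second LSTM consumes the full sequence of 64-dimensional hidden states:

from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model.add(LSTM(32))
for layer in model.layers:
    # first layer: (None, 5, 64), a 64-dim hidden state per time step
    # second layer: (None, 32), only the final 32-dim hidden state
    print(layer.name, layer.output_shape)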

CNN-LSTM in Keras: Dimension Error

In keras 1.2.2, I have made a dataset which has these dimensions:
X_train: (2000, 100, 32, 32, 3)
y_train: (2000,1)
Here, 2000 is the number of instances (sequences), 100 is the number of samples (frames) in each sequence, 32 is the number of image rows and columns, and 3 is the number of channels (RGB).
I have written this code, which applies an LSTM after a CNN, using TimeDistributed layers and taking the average of the LSTM output.
I want the LSTM to work on each sequence of 100 frames, and then I take the average of the LSTM output over that sequence. So, my total output (my labels) is a (2000, 1) vector.
I get this error:
Error when checking model target: expected lambda_14 to have 2
dimensions, but got array with shape (2000L, 100L, 1L)
This is my code:
# -*- coding: utf-8 -*-
import keras
from keras.layers import Input, Dense, Dropout, Activation, LSTM
from keras.layers import Lambda, Convolution2D, MaxPooling2D, Flatten, Reshape
from keras.models import Sequential
from keras.layers.wrappers import TimeDistributed
from keras.layers.pooling import GlobalAveragePooling1D
from keras.optimizers import SGD
from keras.utils import np_utils
from keras.models import Model
import keras.backend as K
import numpy as np

timesteps = 100
number_of_samples = 2500
nb_samples = number_of_samples
frame_row = 32
frame_col = 32
channels = 3
nb_epoch = 1
batch_size = timesteps

data = np.random.random((2500, timesteps, frame_row, frame_col, channels))
label = np.random.random((2500, timesteps, 1))
X_train = data[0:2000, :]
y_train = label[0:2000]
X_test = data[2000:, :]
y_test = label[2000:, :]
#%%
model = Sequential()
model.add(TimeDistributed(Convolution2D(32, 3, 3, border_mode='same'),
                          input_shape=X_train.shape[1:]))
model.add(TimeDistributed(Activation('relu')))
model.add(TimeDistributed(Convolution2D(32, 3, 3)))
model.add(TimeDistributed(Activation('relu')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Dropout(0.25)))
model.add(TimeDistributed(Flatten()))
model.add(TimeDistributed(Dense(512)))
# output dimension here is (None, 100, 512)
model.add(TimeDistributed(Dense(35, name="first_dense")))
# output dimension here is (None, 100, 35)
model.add(LSTM(output_dim=20, return_sequences=True))
# output dimension here is (None, 100, 20)
time_distributed_merge_layer = Lambda(function=lambda x: K.mean(x, axis=1, keepdims=False))
model.add(time_distributed_merge_layer)
# output dimension here is (None, 20)
# model.add(Flatten())
model.add(Dense(1, activation='sigmoid', input_shape=(None, 20)))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train, y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          validation_data=(X_test, y_test))
If we follow the track of the shapes in your model, we get to shape = (None, 100, 35) after the TimeDistributed Dense(). Then you feed that to an LSTM() with return_sequences=True, which returns the whole sequence of hidden vectors of length 20, so you get shape = (None, 100, 20). Taking the mean over axis 1 gives shape = (None, 20), and the final Dense(1, activation='sigmoid') gives shape = (None, 1).
The problem is with the targets of that network: the model predicts one value per sequence, shape = (None, 1), but the y_train you pass in has shape (2000, 100, 1), one label per time step, which is exactly what the error message complains about. So either reduce your labels to one per sequence, or, if you want a single hidden vector instead of an average, change to LSTM(output_dim=20, return_sequences=False) followed by Dense(1, activation='sigmoid') to get your predictions.
Either way the model output and target shapes must match; right now it cannot work.
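As a hedged sketch of the averaging option (standalone; the (100, 35) LSTM input shape is assumed from the TimeDistributed stack above):

import keras.backend as K
from keras.models import Sequential
from keras.layers import LSTM, Dense, Lambda

model = Sequential()
model.add(LSTM(20, return_sequences=True, input_shape=(100, 35)))
# average the 100 hidden vectors over time: (None, 100, 20) -> (None, 20)
model.add(Lambda(lambda x: K.mean(x, axis=1), output_shape=(20,)))
model.add(Dense(1, activation='sigmoid'))  # (None, 1)
model.compile(loss='binary_crossentropy', optimizer='adam')
print(model.output_shape)  # (None, 1), so targets must be one label per sequence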

How to split a tensor column-wise in Keras to implement STFCN

I'd like to implement the Spatiotemporal Fully Convolutional Network (STFCN) in Keras. I need to feed each depth column of a 3D convolutional output, e.g. a tensor with shape (64, 16, 16), as input to a separate LSTM.
To make this clear, I have a (64 x 16 x 16) tensor of dimensions (channels, height, width). I need to split (explicitly or implicitly) the tensor into 16 * 16 = 256 tensors of shape (64 x 1 x 1).
Here is a diagram from the STFCN paper to illustrate the Spatio-Temporal module. What I described above is the arrow between 'Spatial Features' and 'Spatio-Temporal Module'.
How would this idea be best implemented in Keras?
You can use tf.split from TensorFlow inside a Keras Lambda layer.
Use the Lambda to split a tensor of shape (64, 16, 16) into shape (64, 1, 1, 256), and then subset any indices you need.
import numpy as np
import tensorflow as tf
import keras.backend as K
from keras.models import Model
from keras.layers import Input, Lambda

# input data
data = np.ones((3, 64, 16, 16))

# define lambda function to split
def lambda_fun(x):
    x = K.expand_dims(x, 4)       # (batch, 64, 16, 16, 1)
    split1 = tf.split(x, 16, 2)   # 16 tensors of shape (batch, 64, 1, 16, 1)
    x = K.concatenate(split1, 4)  # (batch, 64, 1, 16, 16)
    split2 = tf.split(x, 16, 3)   # 16 tensors of shape (batch, 64, 1, 1, 16)
    x = K.concatenate(split2, 4)  # (batch, 64, 1, 1, 256)
    return x

# check that the splitting works fine
input = Input(shape=(64, 16, 16))
ll = Lambda(lambda_fun)(input)
model = Model(inputs=input, outputs=ll)
res = model.predict(data)
print(np.shape(res))  # (3, 64, 1, 1, 256)
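An alternative sketch (my own, not from the answer above): since each of the 256 depth columns is just a 64-vector, Reshape and Permute give the same information as a (256, 64) tensor, one 64-dimensional vector per spatial position, ready to feed to a recurrent layer:

from keras.models import Model
from keras.layers import Input, Reshape, Permute

inp = Input(shape=(64, 16, 16))  # (channels, height, width)
x = Reshape((64, 256))(inp)      # flatten the 16x16 grid into 256 positions
x = Permute((2, 1))(x)           # -> (256, 64): one depth column per position
model = Model(inputs=inp, outputs=x)
print(model.output_shape)        # (None, 256, 64)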

Keras ImageDataGenerator not working as expected

I'm trying to build an autoencoder using Keras, based on this example from the docs. Because my data is large, I'd like to use a generator to avoid loading it all into memory.
My model looks like:
model = Sequential()
model.add(Convolution2D(16, 3, 3, activation='relu', border_mode='same', input_shape=(3, 256, 256)))
model.add(MaxPooling2D((2, 2), border_mode='same'))
model.add(Convolution2D(8, 3, 3, activation='relu', border_mode='same'))
model.add(MaxPooling2D((2, 2), border_mode='same'))
model.add(Convolution2D(8, 3, 3, activation='relu', border_mode='same'))
model.add(MaxPooling2D((2, 2), border_mode='same'))
model.add(Convolution2D(8, 3, 3, activation='relu', border_mode='same'))
model.add(UpSampling2D((2, 2)))
model.add(Convolution2D(8, 3, 3, activation='relu', border_mode='same'))
model.add(UpSampling2D((2, 2)))
model.add(Convolution2D(16, 3, 3, activation='relu'))
model.add(UpSampling2D((2, 2)))
model.add(Convolution2D(1, 3, 3, activation='sigmoid', border_mode='same'))
model.compile(optimizer='adadelta', loss='binary_crossentropy')
My generator:
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory('IMAGE DIRECTORY', color_mode='rgb', class_mode='binary', batch_size=32, target_size=(256, 256))
And then fitting the model:
model.fit_generator(
    train_generator,
    samples_per_epoch=1,
    nb_epoch=1,
    verbose=1,
)
I'm getting this error:
Exception: Error when checking model target: expected convolution2d_76 to have 4 dimensions, but got array with shape (32, 1)
That looks like the size of my batch rather than a sample. What am I doing wrong?
The error is most likely due to class_mode='binary'. It makes the generator produce binary class labels, so the target has shape (batch_size, 1), while your model produces a four-dimensional output (since the last layer is a convolution).
I guess that you want your label to be the image itself. Based on the source of flow_from_directory and the DirectoryIterator it uses, this is impossible to do by just changing the class_mode. A possible solution would be along the lines of:
train_iterator_ = train_datagen.flow_from_directory('IMAGE DIRECTORY', color_mode='rgb', class_mode=None, batch_size=32, target_size=(256, 256))

def train_generator():
    for x in train_iterator_:
        yield x, x
Note that I set class_mode to None. It makes the generator return just the image instead of the tuple (image, label). I then define a new generator that returns the image as both the input and the target.
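A hedged usage sketch (Keras 1 API to match the question; samples_per_epoch is an assumed value, and the model's output shape must match the images for the autoencoder to train). Note that train_generator is now called, not passed by name:

model.fit_generator(
    train_generator(),
    samples_per_epoch=2048,
    nb_epoch=10,
    verbose=1,
)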