Training accuracy is not improving in recognizing violence with CNN + LSTM - deep-learning

I'm trying to implement a violence recognizer with CNN + LSTM but when I train the model it produces the same accuracy over the epochs as the following picture 55% over 20 epochs Right here
The data consists of 100 videos violence/non violence from the hockey dataset and it was preprocessed by cropping dark frames, scaled pixel values [0 to 1] and normalized in which the mean is zero with same standard deviation. The input shape is (100, 20, 160, 160, 3) which is number of videos, frames per video, frame height, frame width, RGB channels respectively. And the labels tensor of shape (100,2) which represents a vector [0 1] or [1 0] both arrays are given into the model as floats
The code of the model
def CNN_LSTM():
input_shapes=(NUMBER_OF_FRAMES,IMAGE_SIZE,IMAGE_SIZE,3)
np.random.seed(1234)
vg19 = tensorflow.keras.applications.vgg19.VGG19
base_model=vg19(include_top=False,weights='imagenet',input_shape=(IMAGE_SIZE, IMAGE_SIZE,3))
for layer in base_model.layers:
layer.trainable = True
cnn = TimeDistributed(base_model, input_shape=(NUMBER_OF_FRAMES, IMAGE_SIZE, IMAGE_SIZE, 3))
model = Sequential()
model.add(Input(shape=(NUMBER_OF_FRAMES, IMAGE_SIZE, IMAGE_SIZE, IMAGE_CHANNELS)))
model.add(cnn)
model.add(TimeDistributed(Flatten()))
model.add(LSTM(NUMBER_OF_FRAMES , return_sequences= True))
model.add(BatchNormalization())
model.add(TimeDistributed(Dense(90)))
model.add(BatchNormalization())
model.add(GlobalAveragePooling1D())
model.add(BatchNormalization())
model.add(Dense(512, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(2, activation="sigmoid"))
adam = Adam(lr=0.01)
model.compile(loss='binary_crossentropy', optimizer=adam, metrics=["accuracy"])
return model
x = CNN_LSTM()
x.summary()
history = x.fit(fvideo,flabels, batch_size=10, epochs=15)`
How can I solve the problem of the same accuracy? Thanks in advance

Related

Inputting 2D array in to conv2D layer

I'm building a DQN that takes a 24x10 array of 0,1,2 (representing a tetris board) and a int 0-5 (representing the current playable tetramino)
I flatten my array and convert it to a Tensor before inputting it to my convolutional layers but this is the error I keep on getting
Expected 4-dimensional input for 4-dimensional weight [16, 3, 240, 240], but got 1-dimensional input of size [240] instead
I've tried reducing the Kernel size and stride as well as not flattening the array but neither has worked.
For reference this is my DQN
class DQN(nn.Module):
def __init__(self):
super(DQN, self).__init__()
self.conv1_board = nn.Conv2d(3, 16, kernel_size=240, stride=1) #3 input channels for 0,1,2 . kernel_size 240 for length of tensor
self.conv2_board = nn.Conv2d(16, 32, kernel_size=240, stride=1)
self.conv3_board = nn.Conv2d(32, 6, kernel_size=240, stride=1)
self.conv1_piece = nn.Conv2d(6, 16, kernel_size=240, stride=1) #in channels 6 as 6 possible values
self.conv2_piece = nn.Conv2d(16, 6, kernel_size=240, stride=1)
self.fc1 = nn.Linear(1, 32)
self.fc2 = nn.Linear(32, 6)
self.flatten = nn.Flatten()
def flt_totns(self, arr):
flt = []
for l in arr:
flt.extend(l)
return torch.FloatTensor(flt)
def forward(self, states): #inputs to conv layers should be Tensors not list. convert list => tensor
board, piece = states
board = self.flt_totns(board)
embed_board = flatten(self.conv3_board(self.conv2_board(self.conv1_board(board))))
embed_piece = flatten(self.conv2_piece(self.conv1_piece(piece)))
embed_joined = torch.cat([embed_board, embed_piece])
return self.fc2(self.fc1(embed_joined))
I'm very new to CNNs in pytorch so I'm sure a lot of my reasoning is faulty. For example I'm still not sure how Kernel size exactly relates to the shape of your input, or if input channels still applies to array inputs. Buy any help would be greatly appreciated.

Data Flow in CNN

What would be the output of a network with 32 units in 1st convolutional layer followed by a pooling layer with stride=2 and k=2 followed by another convolutional layer with 64 units followed by another pooling layer with stride = 2 and k=2 , input image dimensions are 28*28*1
I will assume that you are using same padding in each convolution, then the output of you model will be of shape (7, 7, 64). You can use a code similar to this one in keras if you don't want to compute the size by hand:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
model = Sequential()
model.add(Conv2D(32, 3, padding="same", input_shape=(28, 28, 1)))
model.add(MaxPooling2D(2, 2))
model.add(Conv2D(64, 3, padding="same"))
model.add(MaxPooling2D(2, 2))
print(model.summary())

What's the input of each LSTM layer in a stacked LSTM network?

I'm having some difficulty understanding the input-output flow of layers in stacked LSTM networks. Let's say i have created a stacked LSTM network like the one below:
# parameters
time_steps = 10
features = 2
input_shape = [time_steps, features]
batch_size = 32
# model
model = Sequential()
model.add(LSTM(64, input_shape=input_shape, return_sequences=True))
model.add(LSTM(32,input_shape=input_shape))
where our stacked-LSTM network consists of 2 LSTM layers with 64 and 32 hidden units respectively. In this scenario, we expect that at each time-step the 1st LSTM layer -LSTM(64)- will pass as input to the 2nd LSTM layer -LSTM(32)- a vector of size [batch_size, time-step, hidden_unit_length], which would represent the hidden state of the 1st LSTM layer at the current time-step. What confuses me is:
Does the 2nd LSTM layer -LSTM(32)- receives as X(t) (as input) the hidden state of the 1st layer -LSTM(64)- that has the size [batch_size, time-step, hidden_unit_length] and passes it through it's own hidden network - in this case consisting of 32 nodes-?
If the first is true, why the input_shape of the 1st -LSTM(64)- and 2nd -LSTM(32)- is the same, when the 2nd only processes the hidden state of the 1st layer? Shouldn't in our case have input_shape set to be [32, 10, 64]?
I found the LSTM visualization below very helpful (found here) but it doesn't expand on stacked-lstm networks:
Any help would be highly appreciated.
Thanks!
The input_shape is only required for the first layer. The subsequent layers take the output of previous layer as its input (as so their input_shape argument value is ignored)
The model below
model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model.add(LSTM(32))
represent the below architecture
Which you can verify it from model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_26 (LSTM) (None, 5, 64) 17152
_________________________________________________________________
lstm_27 (LSTM) (None, 32) 12416
=================================================================
Replacing the line
model.add(LSTM(32))
with
model.add(LSTM(32, input_shape=(1000000, 200000)))
will still give you the same architecture (verify using model.summary()) because the input_shape is ignore as it takes as input the tensor output of the previous layer.
And If you need a sequence to sequence architecture like below
you should be using the code:
model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model.add(LSTM(32, return_sequences=True))
which should return a model
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_32 (LSTM) (None, 5, 64) 17152
_________________________________________________________________
lstm_33 (LSTM) (None, 5, 32) 12416
=================================================================
In keras document, mentioned the input is [batch_size, time-step, input_dim], rather than [batch_size, time-step, hidden_unit_length], so I think 64, 32 coorresponding the X-input's has 64 features and LSTM-32 has 32 features for each time-step.

ValueError: Error when checking input: expected conv1d_1_input to have shape (None, 500000, 3253) but got array with shape (500000, 3253, 1)

I want to train my data with a convolution neural network, I have reshaped my data:
Those are parameters that I have used:
'x_train.shape'=(500000, 3253)
'y_train.shape', (500000,)
'y_test.shape', (20000,)
'y_train[0]', 97
'y_test[0]', 99
'y_train.shape', (500000, 256)
'y_test.shape', (20000, 256)
This is how I define my model architecture:
# 3. Define model architecture
model = Sequential()
model.add(Conv1D(64, 8, strides=1, padding='valid',
dilation_rate=1, activation=None, use_bias=True, kernel_initializer='glorot_uniform',
bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None,
activity_regularizer=None, kernel_constraint=None, bias_constraint=None, input_shape=x_train.shape))
# input_traces=N_Features
# input_shape=(batch_size, trace_lenght,num_of_channels)
model.add(MaxPooling1D(pool_size=2,strides=None, padding='valid'))
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(1, activation='relu'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(x_train, y_train, batch_size=100, epochs=500,verbose=2)
But i got two Errors :
1-
ValueError: Error when checking input: expected conv1d_1_input to have shape (None, 500000, 3253) but got array with shape (500000, 3253, 1)
2-
With model.fit()
How do I resolve this problem?
The input shape is wrong, it should be input_shape = (1, 3253) for Theano or (3253, 1) for TensorFlow. The input shape doesn't include the number of samples.
Then you need to reshape your data to include the channels axis:
x_train = x_train.reshape((500000, 1, 3253))
Or move the channels dimension to the end if you use TensorFlow. After these changes it should work.
input_shape = (3253, 1)
this must be Input_shape of first Convolution layer Conv1D
You got error with model.fit() Because you still don't build your model yet.

How to reshape my input to feed it into 1D Convolutional layer for sequence classification?

I have a csv file with 339732 rows and two columns :
the first being 29 feature values, i.e. X
the second being a binary label value, i.e. Y
dataframe = pd.read_csv("features.csv", header = None)
dataset = dataframe.values
X = dataset[:, 0:29].astype(float)
Y = dataset[:,29]
X_train, y_train, X_test, y_test = train_test_split(X,Y, random_state = 42)
I am trying to train it on a 1D convolutional layer:
model = Sequential()
model.add(Conv1D(64, 3, activation='relu', input_shape=(X_train.shape[0], 29)))
model.add(Conv1D(64, 3, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(128, 3, activation='relu'))
model.add(Conv1D(128, 3, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=16, epochs=2)
score = model.evaluate(X_test, y_test, batch_size=16)
Since, the Conv1D layer expects a 3-D input, I transformed my input as follows:
X_train = np.reshape(X_train, (1, X_train.shape[0], X_train.shape[1]))
X_test = np.reshape(X_test, (1, X_test.shape[0], X_test.shape[1]))
However, this still throws error:
ValueError: Negative dimension size caused by subtracting 3 from 1 for 'conv1d_1/convolution/Conv2D' (op: 'Conv2D') with input shapes: [?,1,1,29], [1,3,29,64].
Is there any way to feed my input correctly?
As far as I know 1D Convolution layer accepts inputs of the form Batchsize x Width x Channels. You are reshaping with
X_train = np.reshape(X_train, (1, X_train.shape[0], X_train.shape[1]))
But X_train.shape[0] is your batchsize I guess.I think the problem is somewhere here. Can you please tell what is the shape of X_train before reshape?
You have to think about if your data have some progression relation between the 339732 entries or the 29 features, this means if the order matters. If not I don't think that CNN is suitable for this case.
If the 29 features "indicates the progression of something":
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1],1))
If the 29 features are independent, then is like the channels on the image, but doesn't make sense convolute with only 1.
X_train = X_train.reshape((X_train.shape[0],1, X_train.shape[1]))
If you want to pick the 339732 entries like in blocks where the order matters (clip the 339732 or add zero padding in order to be divisible by timesteps):
X_train = X_train.reshape((int(X_train.shape[0]/timesteps),timesteps, X_train.shape[1],1))