Training and validation accuracy producing overfitting - deep-learning

Below is the code of my CNN model. The issue is that the training accuracy is 96% while the validation accuracy is only 69%. Please help me increase the validation accuracy.
`model = Sequential()`
`model.add(Conv2D(32, (3, 3), activation = 'relu', input_shape=(128,128,1), padding ='same', name='Conv_1'))`
`model.add(MaxPooling2D((2,2),name='MaxPool_1'))`
`model.add(Conv2D(64, (3, 3), activation = 'relu',padding ='same', name='Conv_2'))`
`model.add(MaxPooling2D((2,2),name='MaxPool_2'))`
`model.add(Conv2D(128, (3, 3), activation = 'relu', padding ='same', name='Conv_3'))`
`model.add(Flatten(name='Flatten'))`
`model.add(Dropout(0.5,name='Dropout'))`
`model.add(Dense(128, kernel_initializer='normal', activation='relu', name='Dense_1'))`
`model.add(Dense(1, kernel_initializer='normal', activation='sigmoid', name='Dense_2'))`
`model.summary()`
`model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])`
`history = model.fit(x_train2, y_train2, epochs=25, batch_size=10, verbose=2, validation_data=(x_test, y_test))`
Findings:
Train: accuracy = 0.937500 ; loss = 0.125126
Test: accuracy = 0.662508 ; loss = 1.089228

First, you could try training for more epochs.
Second, you could add another Dense layer before the final classification layer, or add a Dropout layer after the last Conv2D layer.
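For example, a minimal sketch of the second suggestion (illustrative only; the extra layer names, sizes, and dropout rate are assumptions you would need to tune):

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 1), padding='same', name='Conv_1'))
model.add(MaxPooling2D((2, 2), name='MaxPool_1'))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', name='Conv_2'))
model.add(MaxPooling2D((2, 2), name='MaxPool_2'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same', name='Conv_3'))
model.add(Dropout(0.25, name='Dropout_conv'))   # dropout added after the last Conv2D layer
model.add(Flatten(name='Flatten'))
model.add(Dropout(0.5, name='Dropout'))
model.add(Dense(128, kernel_initializer='normal', activation='relu', name='Dense_1'))
model.add(Dense(64, kernel_initializer='normal', activation='relu', name='Dense_1b'))   # extra Dense layer before the classifier
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid', name='Dense_2'))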
I hope this helps.

Model doesn't predict in eval mode using the same dataset

I have created a simple model to detect a moving object and display its coordinates. The dataset consisted of thermal videos captured from above, where the target was almost a dot. My model was predicting coordinates with an accuracy of 1 pixel in 80% of all training frames.
However, when I switched the model to eval() mode, just to try it, and gave it the same inputs as for training, the results were very different. How is this possible, and what can I do to restore the previous accuracy?
Here's my model:
(conv1): Conv2d(1, 40, kernel_size=(6, 6), stride=(5, 5))
(conv2): Conv2d(40, 40, kernel_size=(5, 5), stride=(2, 2), padding=(1, 1))
(conv3): Conv2d(40, 120, kernel_size=(3, 4), stride=(3, 3))
(relu0): ReLU()
(maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(flatten1): Flatten(start_dim=1, end_dim=-1)
(lstm): LSTM(20, 175, num_layers=2)
(flatten2): Flatten(start_dim=0, end_dim=-1)
(linear3): Linear(in_features=21000, out_features=2, bias=True)
And optimizer:
Adam (
Parameter Group 0
amsgrad: False
betas: (0.9, 0.999)
capturable: False
differentiable: False
eps: 1e-08
foreach: None
fused: False
lr: 0.0005
maximize: False
weight_decay: 0
)
A common source of such behavior is Dropout or BatchNorm layers (since they lead to different network behavior during training and evaluation); as far as I can tell, however, that is not the case in your model. Instead of using model.eval() you can also try model.train(mode=False) to see if the behavior is consistent. Obtaining a single batch with 80% accuracy while the average remains below this can, however, happen. So make sure that the two data sources you are testing on are identical and that the model is not reinitialized or otherwise altered between training and testing. A minimal consistency check is sketched below.
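A sketch of such a check (the names model and sample are assumptions here: model is your trained network, sample is one input batch from the training data):

import torch

model.train(mode=False)                       # equivalent to model.eval()
with torch.no_grad():
    out_eval = model(sample)

model.train()                                 # back to training mode; forward pass only, weights unchanged
with torch.no_grad():
    out_train = model(sample)

print(torch.allclose(out_eval, out_train))    # True if no layer behaves differently between modes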

Image regression - estimating sensors from images

I am trying to use images to predict the sensor data of a racing game. Being a bit of a newcomer, I have multiple questions. Any help or suggestions are appreciated.
Dataset
The dataset looks something like:
image: 160x120 grayscale from the bumper view - here is an example of some images
sensors: vector of 21 elements, all normalized between [0, 1], representing the 3 sensors. Those sensors are:
angle between the car and the track axis (sensors[0])
19 rangefinders, returning the distance from the car to the track limit, spanning from -pi to pi (sensors[1:20])
distance from track axis (sensors[20])
Here is an example of a sensor vector.
[
0.01011692 # angle
0.059058 0.299319 0.23943199 0.20102449 0.18029851
0.1706595 0.165723 0.161521 0.15858699 0.15570949 0.15288849
0.150124 0.146348 0.142166 0.1347065 0.121228 0.102669
0.08340649 0.04948675 # rangefinders
0.00183716 # distance from center
]
I generated about 50000 entries in the dataset. In case this is not enough, increasing the size of the dataset is trivial as its creation is an entirely automated process.
Reasons and goal
The end goal is to use the sensors predicted from game frames to drive the car in real time. This way the car can be driven using only images.
The quality of the estimation from both models is not good enough, as the "driver" behaves erratically or has no idea what to do.
Progress and results
I started with a plain CNN:
model = Sequential()
model.add(Conv2D(8, (4, 4), input_shape = (img_height, img_width, stack_depth), padding="same", activation = "relu"))
model.add(BatchNormalization())
model.add(Conv2D(8, (4, 4), padding="same", strides = 2, activation = "relu"))
model.add(Conv2D(8, (4, 4), padding="same", activation = "relu"))
model.add(BatchNormalization())
model.add(Conv2D(8, (4, 4), padding="same", strides = 2, activation = "relu"))
model.add(Conv2D(16, (3, 3), padding="same", activation = "relu"))
model.add(BatchNormalization())
model.add(Conv2D(16, (3, 3), padding="same", strides = 2, activation = "relu"))
model.add(Conv2D(16, (3, 3), padding="same", activation = "relu"))
model.add(BatchNormalization())
model.add(Conv2D(16, (3, 3), padding="same", strides = 2, activation = "relu"))
model.add(Conv2D(32, (3, 3), padding="same", activation = "relu"))
model.add(BatchNormalization())
model.add(Conv2D(32, (3, 3), padding="same", strides = 2, activation = "relu"))
model.add(Conv2D(32, (3, 3), padding="same", activation = "relu"))
model.add(BatchNormalization())
model.add(Conv2D(32, (3, 3), padding="same", strides = 2, activation = "relu"))
model.add(Activation("relu"))
model.add(Flatten())
model.add(Dense(192, activation="relu"))
model.add(Dense(96, activation="relu"))
model.add(Dense(48, activation="relu"))
model.add(Dense(output_size, activation="linear"))
adam = Adam(learning_rate=1e-5)
model.compile(loss="mean_squared_error", optimizer=adam)
After completing the training, the model has:
loss (MSE) of 0.02 on "easy" tracks and >0.1 on harder ones (more detailed)
R^2 index of 0.7 on easy and <0.3 on hard
Because the performance of the CNN was not good enough, I also tried a residual network. The model is far bigger (4 million parameters) and has slightly better results on detailed tracks:
loss (MSE) of ~0.03 on easy and ~0.05 on hard
R^2 index of 0.7 on easy and ~0.5 on hard
On simple tracks there is almost no difference; if anything, the plain CNN performs better.
If you want, the code for both models is here.
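For reference, an R^2 value like the ones above can be computed along these lines (a minimal sketch assuming model is the trained network and (x_val, y_val) is a held-out split of the dataset):

from sklearn.metrics import r2_score

pred = model.predict(x_val)           # predicted sensor vectors, shape (n_samples, 21)
mse = ((pred - y_val) ** 2).mean()    # mean squared error over all outputs
r2 = r2_score(y_val, pred)            # R^2 averaged uniformly over the 21 outputs
print(mse, r2)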
Questions
First of all, are there any errors in my code?
Could a larger dataset improve the quality of the predictions?
Could higher resolution images and/or RGB help? Could a different camera angle (e.g. the camera set higher up) also help?
The dataset is generated on different tracks and with "noisy" driving (swerving from left to right, braking randomly, etc.). Is this a good idea or does it just slow down the training?
In my opinion the sensor vector is (loosely) structured and its values are related to each other to some extent. Can this property be used?
Is there any other recommended architecture or strategy for this kind of regression with images?
Thanks in advance for any answer.

Training accuracy is not improving in recognizing violence with CNN + LSTM

I'm trying to implement a violence recognizer with a CNN + LSTM, but when I train the model it produces the same accuracy (about 55%) across all 20 epochs.
The data consists of 100 violence/non-violence videos from the hockey dataset. It was preprocessed by cropping dark frames, scaling pixel values to [0, 1], and normalizing so that the mean is zero with the same standard deviation. The input shape is (100, 20, 160, 160, 3), which is the number of videos, frames per video, frame height, frame width, and RGB channels respectively. The labels are a tensor of shape (100, 2), each representing a vector [0 1] or [1 0]; both arrays are given to the model as floats (a sketch of this preprocessing is shown below).
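A minimal sketch of that preprocessing (illustrative only; videos is assumed to be a NumPy array of shape (100, 20, 160, 160, 3) with dark frames already cropped, and labels the (100, 2) one-hot array):

import numpy as np

videos = videos.astype("float32") / 255.0           # scale pixel values to [0, 1]
videos = (videos - videos.mean()) / videos.std()    # shift to zero mean, scale by the standard deviation
labels = labels.astype("float32")                   # one-hot vectors [0 1] / [1 0] as floats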
The code of the model
def CNN_LSTM():
    input_shapes = (NUMBER_OF_FRAMES, IMAGE_SIZE, IMAGE_SIZE, 3)
    np.random.seed(1234)
    vg19 = tensorflow.keras.applications.vgg19.VGG19
    base_model = vg19(include_top=False, weights='imagenet', input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
    for layer in base_model.layers:
        layer.trainable = True
    cnn = TimeDistributed(base_model, input_shape=(NUMBER_OF_FRAMES, IMAGE_SIZE, IMAGE_SIZE, 3))
    model = Sequential()
    model.add(Input(shape=(NUMBER_OF_FRAMES, IMAGE_SIZE, IMAGE_SIZE, IMAGE_CHANNELS)))
    model.add(cnn)
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(NUMBER_OF_FRAMES, return_sequences=True))
    model.add(BatchNormalization())
    model.add(TimeDistributed(Dense(90)))
    model.add(BatchNormalization())
    model.add(GlobalAveragePooling1D())
    model.add(BatchNormalization())
    model.add(Dense(512, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(2, activation="sigmoid"))
    adam = Adam(lr=0.01)
    model.compile(loss='binary_crossentropy', optimizer=adam, metrics=["accuracy"])
    return model

x = CNN_LSTM()
x.summary()
history = x.fit(fvideo, flabels, batch_size=10, epochs=15)
How can I solve the problem of the accuracy staying the same? Thanks in advance.

How to include batch normalization in non-sequential keras model

I am new to DL and Keras. Currently I am trying to implement a U-Net-like CNN, and now I want to include batch normalization layers in my non-sequential model but do not really know how.
This is my current attempt at including it:
input_1 = Input((X_train.shape[1],X_train.shape[2], X_train.shape[3]))
conv1 = Conv2D(16, (3,3), strides=(2,2), activation='relu', padding='same')(input_1)
batch1 = BatchNormalization(axis=3)(conv1)
conv2 = Conv2D(32, (3,3), strides=(2,2), activation='relu', padding='same')(batch1)
batch2 = BatchNormalization(axis=3)(conv2)
conv3 = Conv2D(64, (3,3), strides=(2,2), activation='relu', padding='same')(batch2)
batch3 = BatchNormalization(axis=3)(conv3)
conv4 = Conv2D(128, (3,3), strides=(2,2), activation='relu', padding='same')(batch3)
batch4 = BatchNormalization(axis=3)(conv4)
conv5 = Conv2D(256, (3,3), strides=(2,2), activation='relu', padding='same')(batch4)
batch5 = BatchNormalization(axis=3)(conv5)
conv6 = Conv2D(512, (3,3), strides=(2,2), activation='relu', padding='same')(batch5)
drop1 = Dropout(0.25)(conv6)
upconv1 = Conv2DTranspose(256, (3,3), strides=(1,1), padding='same')(drop1)
upconv2 = Conv2DTranspose(128, (3,3), strides=(2,2), padding='same')(upconv1)
upconv3 = Conv2DTranspose(64, (3,3), strides=(2,2), padding='same')(upconv2)
upconv4 = Conv2DTranspose(32, (3,3), strides=(2,2), padding='same')(upconv3)
upconv5 = Conv2DTranspose(16, (3,3), strides=(2,2), padding='same')(upconv4)
upconv5_1 = concatenate([upconv5,conv2], axis=3)
upconv6 = Conv2DTranspose(8, (3,3), strides=(2,2), padding='same')(upconv5_1)
upconv6_1 = concatenate([upconv6,conv1], axis=3)
upconv7 = Conv2DTranspose(1, (3,3), strides=(2,2), activation='linear', padding='same')(upconv6_1)
model = Model(outputs=upconv7, inputs=input_1)
Is the batch normalization used in the right way? In the keras documentation I read that you typically want to normalize the "features axis"!?
This is a short snippet out of the model summary:
====================================================================================================
input_1 (InputLayer) (None, 512, 512, 9) 0
____________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 256, 256, 16) 1312 input_1[0][0]
____________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 128, 128, 32) 4640 conv2d_1[0][0]
____________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 64, 64, 64) 18496 conv2d_2[0][0]
____________________________________________________________________________________________________
In this case my features axis is axis 3 (start counting at 0), right?
I have read discussions about whether you should apply batch normalization before or after the activation function. In this case it is used after the activation function, right? Is there a possibility to use it before the activation function?
Thank you very much for your help and feedback! Really appreciate it!
Part 1: Is the batch normalization used in the right way?
The way you've called the BatchNormalization layer is correct; axis=3 is what you want, as recommended by the documentation.
Keep in mind that in the case of your model, axis=3 is equivalent to the default setting, axis=-1, so you do not need to specify it explicitly.
Part 2: In this case it is used after the activation function, right? Is there a possibility to use it before the activation function?
Yes, batch normalization is commonly applied after the activation layer as a means of reducing internal covariate shift, the problem described in the 2015 research paper by Ioffe and Szegedy. Your code applies the batchnorm after the activations on your convolutional layers. Its use after the activation layer can be thought of as a "pre-processing step" for the information before it reaches the next layer as an input.
For that reason, batch normalization can also serve as a data pre-processing step, which you can use immediately after your input layer (as discussed in this response.) However, as that answer mentions, batchnorm should not be abused; it's computationally expensive and can force your model into approximately linear behavior (this answer goes into more detail about this issue).
Using batchnorm in some other step in the model (not after activation layer or input layer) would have poorly-understood effects on model performance; it's a process intended explicitly to be applied to the outputs of the activation layer.
In my experience with u-nets, I've had a lot of success applying batchnorm only after the convolutional layers before max pooling; this effectively doubles the computational "bang for my buck" on normalization, since these tensors are re-used in the u-net architecture. Aside from that, I don't use batchnorm (except maybe on the inputs if the mean pixel intensities per image are super heterogeneous.)
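If you do want to try batch normalization before the activation, one option (a minimal sketch, not part of the model above) is to leave the activation out of the Conv2D call and add it as a separate layer after the normalization:

from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, Activation

input_1 = Input((512, 512, 9))                                     # shape taken from the summary above
x = Conv2D(16, (3, 3), strides=(2, 2), padding='same')(input_1)   # no activation here
x = BatchNormalization(axis=3)(x)                                  # normalize the pre-activations
x = Activation('relu')(x)                                          # activation applied after batchnorm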
axis 3 = axis -1, which is the default parameter.

How to reshape my input to feed it into 1D Convolutional layer for sequence classification?

I have a csv file with 339732 rows and two columns:
the first being 29 feature values, i.e. X
the second being a binary label value, i.e. Y
dataframe = pd.read_csv("features.csv", header = None)
dataset = dataframe.values
X = dataset[:, 0:29].astype(float)
Y = dataset[:,29]
X_train, y_train, X_test, y_test = train_test_split(X,Y, random_state = 42)
I am trying to train it on a 1D convolutional layer:
model = Sequential()
model.add(Conv1D(64, 3, activation='relu', input_shape=(X_train.shape[0], 29)))
model.add(Conv1D(64, 3, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(128, 3, activation='relu'))
model.add(Conv1D(128, 3, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=16, epochs=2)
score = model.evaluate(X_test, y_test, batch_size=16)
Since, the Conv1D layer expects a 3-D input, I transformed my input as follows:
X_train = np.reshape(X_train, (1, X_train.shape[0], X_train.shape[1]))
X_test = np.reshape(X_test, (1, X_test.shape[0], X_test.shape[1]))
However, this still throws error:
ValueError: Negative dimension size caused by subtracting 3 from 1 for 'conv1d_1/convolution/Conv2D' (op: 'Conv2D') with input shapes: [?,1,1,29], [1,3,29,64].
Is there any way to feed my input correctly?
As far as I know, a 1D convolution layer accepts inputs of the form batch_size x width x channels. You are reshaping with
X_train = np.reshape(X_train, (1, X_train.shape[0], X_train.shape[1]))
but X_train.shape[0] is your batch size, I guess. I think the problem is somewhere here. Can you tell us what the shape of X_train is before the reshape?
You have to think about whether your data has some progression relation between the 339732 entries or between the 29 features, that is, whether the order matters. If not, I don't think a CNN is suitable for this case.
If the 29 features indicate the progression of something:
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1],1))
If the 29 features are independent, then they are like the channels of an image, but it doesn't make much sense to convolve with a width of only 1:
X_train = X_train.reshape((X_train.shape[0],1, X_train.shape[1]))
If you want to group the 339732 entries into blocks where the order matters (clip the 339732 entries or add zero padding so that the count is divisible by timesteps):
X_train = X_train.reshape((int(X_train.shape[0]/timesteps),timesteps, X_train.shape[1],1))
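Putting the first option together, a minimal end-to-end sketch (an illustration only, assuming the order of the 29 features matters; note that train_test_split returns the splits in the order X_train, X_test, y_train, y_test):

import pandas as pd
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, GlobalAveragePooling1D, Dropout, Dense

dataframe = pd.read_csv("features.csv", header=None)
dataset = dataframe.values
X = dataset[:, 0:29].astype(float)
Y = dataset[:, 29]

# train_test_split returns X_train, X_test, y_train, y_test in this order
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=42)

# treat the 29 features as 29 "timesteps" with a single channel
X_train = X_train.reshape((X_train.shape[0], 29, 1))
X_test = X_test.reshape((X_test.shape[0], 29, 1))

model = Sequential()
model.add(Conv1D(64, 3, activation='relu', input_shape=(29, 1)))   # per-sample shape, no batch dimension
model.add(Conv1D(64, 3, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(128, 3, activation='relu'))
model.add(Conv1D(128, 3, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

model.fit(X_train, y_train, batch_size=16, epochs=2)
score = model.evaluate(X_test, y_test, batch_size=16)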