Low accuracy of a deep CNN

I have generated a dataset from EMNIST and mathematical symbols in which each image contains either one or two characters. There are 72 possible characters in the dataset. Each image is 28x56 (h x w).
Ex: [single-character and double-character example images omitted]
There are 5256 (72 * 73) possible classes considering all combinations of the characters: 72 possible characters in the first position and 73 possible characters (including a blank) in the second position of the label. I have made sure that each class has around 540-600 images; the total dataset has around 3 million images.
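For reference, a minimal sketch of one way such a combined label could be encoded as a single class id. The exact scheme isn't stated above, so the index convention here is an assumption:

# Hypothetical label encoding for the 72 * 73 = 5256 classes described above.
# first_char:  index 0..71 over the 72 possible characters
# second_char: index 0..72, where 72 means "blank" (no second character)
NUM_FIRST = 72
NUM_SECOND = 73  # 72 characters + 1 blank

def encode_label(first_char, second_char):
    """Map a (first, second) character pair to a single class id in [0, 5255]."""
    return first_char * NUM_SECOND + second_char

def decode_label(class_id):
    """Inverse of encode_label: recover the (first, second) character indices."""
    return divmod(class_id, NUM_SECOND)

assert decode_label(encode_label(10, 72)) == (10, 72)  # character 10 followed by a blank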
The CNN models I have tried:
from keras.models import Sequential
from keras.layers import (Conv2D, MaxPooling2D, Dropout, Flatten, Dense,
                          Activation, BatchNormalization)
from keras.optimizers import SGD

input_shape = (28, 56, 1)

model = Sequential()

# Block 1
model.add(Conv2D(filters=32, kernel_size=(3, 3), padding='same', use_bias=False,
                 input_shape=input_shape))
model.add(Activation('relu'))
model.add(Conv2D(filters=32, kernel_size=(3, 3), padding='same', use_bias=False))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(.2))

# Block 2
model.add(Conv2D(filters=64, kernel_size=(3, 3), padding='same', use_bias=False))
model.add(Activation('relu'))
model.add(Conv2D(filters=64, kernel_size=(3, 3), padding='same', use_bias=False))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(.2))

# Block 3
model.add(Conv2D(filters=128, kernel_size=(3, 3), padding='same', use_bias=False))
model.add(Activation('relu'))
model.add(Conv2D(filters=128, kernel_size=(3, 3), padding='same', use_bias=False))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(.3))

# Classifier head
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(4096, activation='relu'))
model.add(Dense(units=5256, activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=sgd,
              metrics=['sparse_categorical_accuracy'])
I even tried a model with two Dense layers of 10512 units as well, but was only able to achieve an accuracy of around 66%. I have tried various batch sizes (32, 64, 256) and the Adam optimizer with various learning rates. It would be great if someone could point out what I am doing wrong here, or give some tips on increasing the accuracy.

Following Ruslan S.'s recommendation, I trained the model on ResNet50 (retraining the whole network, not just the last layers). This gave a significant improvement: I was able to reach around 96% accuracy.
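For readers who want to try the same thing, here is a minimal sketch of wiring ResNet50 up to this kind of input in Keras. The upscaling to 56x112, the grayscale-to-RGB channel replication, and training from scratch (weights=None) are all assumptions made for the sketch; the original post doesn't give those details, and it is also possible the author fine-tuned ImageNet weights instead. A reasonably recent TensorFlow (2.6+) is assumed for layers.Resizing:

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 5256

# 28x56 grayscale inputs; ResNet50 expects 3-channel images of at least 32x32,
# so the resizing and channel replication below are assumptions for illustration.
inputs = layers.Input(shape=(28, 56, 1))
x = layers.Resizing(56, 112)(inputs)        # upscale to a ResNet-friendly size
x = layers.Concatenate()([x, x, x])         # replicate the grayscale channel -> 3 channels

backbone = tf.keras.applications.ResNet50(
    include_top=False,
    weights=None,          # train the whole network, no pretrained weights
    input_tensor=x,
    pooling='avg')

outputs = layers.Dense(NUM_CLASSES, activation='softmax')(backbone.output)
model = models.Model(inputs, outputs)

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'])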


How can I get my TensorFlow GAN output to not be fuzzy?

[Images omitted: loss output and epoch 996 sample output]
I have been working on a deep convolutional generative adversarial network (DCGAN) that generates pictures of cats (RGB, 64x64 pixels). It seems to learn rather quickly: the images are clearly cats by around the 300th epoch. For some reason, even after 1000 epochs they still have a good amount of blur, which keeps them from looking properly sharp at full resolution. I am almost certain the issue is in my generator network structure, so I have attached it below.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential()
model.add(layers.Dense(8*8*256, use_bias=False, input_shape=(100,)))
model.add(layers.BatchNormalization())
model.add(layers.LeakyReLU())
model.add(layers.Reshape((8, 8, 256)))
assert model.output_shape == (None, 8, 8, 256)
model.add(layers.Conv2DTranspose(256, (5, 5), strides=(1, 1), padding='same', use_bias=False))
assert model.output_shape == (None, 8, 8, 256)
model.add(layers.BatchNormalization())
model.add(layers.LeakyReLU())
model.add(layers.Conv2DTranspose(128, (5, 5), strides=(2, 2), padding='same', use_bias=False))
assert model.output_shape == (None, 16, 16, 128)
model.add(layers.BatchNormalization())
model.add(layers.LeakyReLU())
model.add(layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False))
assert model.output_shape == (None, 32, 32, 64)
model.add(layers.BatchNormalization())
model.add(layers.LeakyReLU())
# activation is tanh so we don't squash out the negatives that we've been keeping through LeakyReLU
model.add(layers.Conv2DTranspose(3, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'))
print("model.output_shape=", model.output_shape)
assert model.output_shape == (None, 64, 64, 3)
I suspect the problem results from the checkerboard artifacts that my Conv2DTranspose layers generate, but is it worth switching to upsampling followed by a convolutional layer? I feel like it would do less learning that way.
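For comparison, here is a minimal sketch of what one such block could look like if the stride-2 Conv2DTranspose layers were swapped for nearest-neighbour upsampling followed by a regular convolution. The filter count and kernel size simply mirror the generator above; this is an illustration, not a tested replacement:

from tensorflow.keras import layers

def upsample_block(x, filters):
    # Nearest-neighbour upsampling + same-padding convolution, a common
    # substitute for a stride-2 Conv2DTranspose that avoids checkerboard artifacts.
    x = layers.UpSampling2D(size=(2, 2))(x)
    x = layers.Conv2D(filters, (5, 5), padding='same', use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    return x

# e.g. the 8x8x256 -> 16x16x128 stage of the generator above would become
# x = upsample_block(x, 128) in a functional-API rewrite of the model.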

InvalidArgumentError: input depth must be evenly divisible by filter depth: 3 vs 6

input_shape=(100,100,6)
input_tensor=keras.Input(input_shape)
model.add(Conv2D(32, 3, padding='same', activation='relu', input_shape=input_shape))
model.add((Conv1D(filters=32, kernel_size=2, activation='relu', padding='same')))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))
model.add(Conv2D(64, 3, padding='same', activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))
model.add(Conv2D(128, 3, padding='same', activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.compile(loss='categorical_crossentropy', optimizer="adam", metrics=['accuracy'])
training_set = train_datagen.flow_from_directory(
    '/content/gdrive/My Drive/Data/training_set',
    target_size=(128, 128),
    batch_size=32,
    class_mode='categorical')

history = model.fit(training_set,
                    steps_per_epoch=nb_train_images // batch_size,
                    epochs=100,
                    validation_data=test_set,
                    validation_steps=nb_test_images // batch_size,
                    callbacks=callbacks)

history = model.fit(training_set,
                    steps_per_epoch=nb_train_images // batch_size,
                    epochs=40,
                    validation_data=test_set,
                    validation_steps=nb_test_images // batch_size,
                    callbacks=callbacks)
I have 6 different classes to classify. Where am I going wrong? I have set the input shape above to (100, 100, 6); can someone help me understand this issue?
This was happening to me too. The following is my code. The way I fixed it was that, instead of using some other input shape, I just made the input shape match the training data's image shape.
from tensorflow.keras.regularizers import l2  # needed for the kernel_regularizer below

train = image_gen.flow_from_directory(
    train_path,
    target_size=(500, 500),
    color_mode='grayscale',
    class_mode='binary',
    batch_size=16
)

# and then later, when I build the model
model.add(Conv2D(filters[0], (5, 5), padding='same', kernel_regularizer=l2(0.001),
                 activation='relu', input_shape=train.image_shape))
# the important part is input_shape=train.image_shape
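Applied to the question above, the same idea would look roughly like the sketch below, where training_set is the generator defined in the question. Note that flow_from_directory only yields 1-, 3- or 4-channel arrays (grayscale, rgb, rgba), so a Conv2D declared with a 6-channel input can never match what that generator produces; the "3 vs 6" in the error is exactly that mismatch. The snippet is illustrative, not the asker's actual pipeline:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

# Inspect what the generator actually produces and build the model to match it.
print(training_set.image_shape)   # e.g. (128, 128, 3) with the settings above

model = Sequential()
model.add(Conv2D(32, 3, padding='same', activation='relu',
                 input_shape=training_set.image_shape))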

Image regression - estimating sensors from images

I am trying to use images to predict the sensor data of a racing game. Being a bit of a newcomer, I have multiple questions. Any help or suggestions are appreciated.
Dataset
The dataset looks something like:
image: 160x120 grayscale from the bumper view - here is an example of some images
sensors: vector of 21 elements, all normalized between [0, 1], representing the 3 sensors. Those sensors are:
angle between the car and the track axis (sensors[0])
19 rangefinders, returning the distance from the car to the track limit, spanning from -pi to pi (sensors[1:20])
distance from track axis (sensors[20])
Here is an example of a sensor vector.
[
0.01011692 # angle
0.059058 0.299319 0.23943199 0.20102449 0.18029851
0.1706595 0.165723 0.161521 0.15858699 0.15570949 0.15288849
0.150124 0.146348 0.142166 0.1347065 0.121228 0.102669
0.08340649 0.04948675 # rangefinders
0.00183716 # distance from center
]
I generated about 50000 entries in the dataset. In case this is not enough, increasing the size of the dataset is trivial as its creation is an entirely automated process.
Reasons and goal
The end goal is to use the sensors predicted from game frames to drive the car in real time. This way the car can be driven using only images.
The quality of the estimation from both models is not good enough, as the "driver" behaves weirdly or has no idea what to do.
Progress and results
I started with a plain CNN:
from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, Activation, Flatten, Dense
from keras.optimizers import Adam

model = Sequential()
model.add(Conv2D(8, (4, 4), input_shape=(img_height, img_width, stack_depth), padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(Conv2D(8, (4, 4), padding="same", strides=2, activation="relu"))
model.add(Conv2D(8, (4, 4), padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(Conv2D(8, (4, 4), padding="same", strides=2, activation="relu"))
model.add(Conv2D(16, (3, 3), padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(Conv2D(16, (3, 3), padding="same", strides=2, activation="relu"))
model.add(Conv2D(16, (3, 3), padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(Conv2D(16, (3, 3), padding="same", strides=2, activation="relu"))
model.add(Conv2D(32, (3, 3), padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(Conv2D(32, (3, 3), padding="same", strides=2, activation="relu"))
model.add(Conv2D(32, (3, 3), padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(Conv2D(32, (3, 3), padding="same", strides=2, activation="relu"))
model.add(Activation("relu"))
model.add(Flatten())
model.add(Dense(192, activation="relu"))
model.add(Dense(96, activation="relu"))
model.add(Dense(48, activation="relu"))
model.add(Dense(output_size, activation="linear"))

adam = Adam(learning_rate=1e-5)
model.compile(loss="mean_squared_error", optimizer=adam)
After completing the training, the model has:
loss (MSE) of 0.02 on "easy" tracks and >0.1 on harder (more detailed) ones
R^2 index of 0.7 on easy and <0.3 on hard
Because the performance of the CNN was not enough, I also tried a residual network. The model is far bigger (4 million parameters) and has slightly better results on detailed tracks:
loss (MSE) of ~0.03 on easy and ~0.05 on hard
R^2 index of 0.7 on easy and ~0.5 on hard
On simple tracks there is almost no difference; if anything, the plain CNN performs better.
If you want, the code for both models is here.
Questions
First of all, are there any errors in my code?
Could a larger dataset improve the quality of the predictions?
Could higher resolution images and/or RGB help? Could a different camera angle (e.g. a camera set higher up) also help?
The dataset is generated on different tracks and with "noisy" driving (swerving from left to right, braking randomly, etc.). Is this a good idea or does it just slow down the training?
In my opinion the sensor vector is (loosely) structured and its values are related to each other to some extent. Can this property be exploited?
Is there any other recommended architecture or strategy for this kind of regression with images?
Thanks in advance for any answer.

Training hyperspectral data using Tensorflow & Keras

I am looking for an approach to train on hyperspectral image data in TensorFlow.
The training samples are encoded in CSV and have arbitrary x-y dimensions but a constant depth:
The data looks like this:
Sample1.csv: 50x4x220 (Row 1-50 is supposed to be aligned with row 51-100, 101-150, and 151-200)
Sample2.csv: 18x71x220 (Row 1-18 is supposed to be aligned with row 19-36, etc.)
Sample3.csv: 33x41x220 (same as above)
....
Sample100.csv: 15x8x220 (same as above)
Is there any project example that I can use? Thanks in advance.
Here is a survey on DL algorithms used to classify hyperspectral data.
Since you have data of varying sizes, you will have to create patches; you cannot feed inputs of different sizes to the network.
For example you could feed patches of (16, 16, 220) to your network.
I worked on a CNN with multispectral-band images; I had fewer bands than you have, and the size of the patches was obviously important. I used a U-Net for image segmentation.
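To make the patching idea concrete, here is a minimal sketch of cutting fixed-size patches out of a variable-size hyperspectral cube. The 16x16 patch size and the non-overlapping stride are arbitrary illustration choices, not values prescribed above:

import numpy as np

def extract_patches(cube, patch_size=16, stride=16):
    """Cut a (H, W, 220) hyperspectral cube into (patch_size, patch_size, 220) patches.

    Borders that don't fill a whole patch are simply dropped in this sketch.
    """
    h, w, depth = cube.shape
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(cube[y:y + patch_size, x:x + patch_size, :])
    return np.stack(patches) if patches else np.empty((0, patch_size, patch_size, depth))

# Example: a 33x41x220 sample (like Sample3.csv) yields 2x2 = 4 full 16x16 patches.
sample = np.random.rand(33, 41, 220).astype(np.float32)
print(extract_patches(sample).shape)   # (4, 16, 16, 220)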
Edit with an example using (None, None, 220) as input:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Dense, GlobalMaxPooling2D
from keras.optimizers import Adam

model = Sequential()
# this applies 32 convolution filters of size 3x3 each.
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(None, None, 220)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# model.add(Flatten())
# Replace Flatten by a global pooling layer, for example:
model.add(GlobalMaxPooling2D())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
adam = Adam(lr=1e-4)
model.compile(loss='categorical_crossentropy', optimizer=adam)

Reshaping image data in Keras to match CNN requirements

I've created a CNN designed to recognize objects. Here is how I load a new image and run a prediction:
from keras.preprocessing.image import img_to_array, load_img
img = load_img('newimage.jpg')
x = img_to_array(img)
x = x.reshape( (1,) + x.shape )
scores = model.predict(x, verbose=1)
print(scores)
However I'm getting:
expected convolution2d_input_1 to have shape (None, 3, 108, 192) but got array with shape (1, 3, 192, 108)
My model:
def create_model():
    model = Sequential()
    model.add(Convolution2D(32, 3, 3, input_shape=(3, img_width, img_height)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Convolution2D(32, 3, 3))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Convolution2D(64, 3, 3))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(64))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
    return model
I've looked at related answers and the documentation, but I'm at a loss as to how to reshape the array to match what's expected.
I guess the problem is with setting up the image width and height. As the error says:
expected convolution2d_input_1 to have shape (None, 3, 108, 192) # expected width = 108 and height = 192
but got array with shape (1, 3, 192, 108) # width = 192, height = 108
Update: I tested your code with a small change and it worked!
I am giving just the changed lines:
img_width, img_height = 960, 717
model.add(Convolution2D(32, 3, 3, input_shape=(img_height, img_width, 3)))
This is the main change - input_shape=(img_height, img_width, 3)
The image I used to run this code was 960 wide and 717 high. I have updated my previous answer, as part of it was wrong. Sorry for that!
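For completeness, a minimal sketch of the loading side under the same channels-last assumption. load_img's target_size takes (height, width), so resizing at load time makes the array match the model's declared input shape before predicting; the file name and dimensions are just the ones mentioned above:

from keras.preprocessing.image import img_to_array, load_img
import numpy as np

img_width, img_height = 960, 717

# Resize at load time so the array already matches the model's input shape.
img = load_img('newimage.jpg', target_size=(img_height, img_width))
x = img_to_array(img)            # shape: (img_height, img_width, 3) with channels-last Keras
x = np.expand_dims(x, axis=0)    # add the batch dimension -> (1, img_height, img_width, 3)
scores = model.predict(x, verbose=1)
print(scores)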