Image regression - estimating sensors from images - deep-learning

I am trying to use images to predict the sensor data of a racing game. Being a bit of a newcomer, I have multiple questions. Any help or suggestions are appreciated.
Dataset
The dataset looks something like:
image: 160x120 grayscale frame from the bumper view - here is an example of some images
sensors: vector of 21 elements, all normalized to [0, 1], representing three sensor types:
angle between the car and the track axis (sensors[0])
19 rangefinders, returning the distance from the car to the track limit, spanning from -pi to pi (sensors[1:20])
distance from the track axis (sensors[20])
Here is an example of a sensor vector.
[
0.01011692 # angle
0.059058 0.299319 0.23943199 0.20102449 0.18029851
0.1706595 0.165723 0.161521 0.15858699 0.15570949 0.15288849
0.150124 0.146348 0.142166 0.1347065 0.121228 0.102669
0.08340649 0.04948675 # rangefinders
0.00183716 # distance from center
]
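To make the layout explicit, this is how one of these vectors splits into the three sensor groups (a quick illustrative snippet, not from my actual code; the variable names are made up):

import numpy as np

# Hypothetical unpacking of one normalized sensor vector (values copied from above).
sensors = np.array([
    0.01011692,                                                      # angle
    0.059058, 0.299319, 0.23943199, 0.20102449, 0.18029851, 0.1706595,
    0.165723, 0.161521, 0.15858699, 0.15570949, 0.15288849, 0.150124,
    0.146348, 0.142166, 0.1347065, 0.121228, 0.102669, 0.08340649,
    0.04948675,                                                      # 19 rangefinders
    0.00183716,                                                      # distance from center
])
angle = sensors[0]            # angle between the car and the track axis
rangefinders = sensors[1:20]  # 19 rangefinder readings, spanning -pi to pi
dist_from_axis = sensors[20]  # distance from the track axis
assert sensors.shape == (21,)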
I generated about 50000 entries in the dataset. In case this is not enough, increasing the size of the dataset is trivial as its creation is an entirely automated process.
Reasons and goal
The end goal is to use the sensors predicted from game frames to drive the car in real time. This way the car can be driven using only images.
So far the quality of the estimation from both models (described below) is not good enough: the "driver" behaves erratically or has no idea what to do.
Progress and results
I started with a plain CNN:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, Flatten, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential()
# Six blocks of: conv -> batch norm -> strided conv (each strided conv halves the spatial size)
model.add(Conv2D(8, (4, 4), input_shape=(img_height, img_width, stack_depth), padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(Conv2D(8, (4, 4), padding="same", strides=2, activation="relu"))
model.add(Conv2D(8, (4, 4), padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(Conv2D(8, (4, 4), padding="same", strides=2, activation="relu"))
model.add(Conv2D(16, (3, 3), padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(Conv2D(16, (3, 3), padding="same", strides=2, activation="relu"))
model.add(Conv2D(16, (3, 3), padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(Conv2D(16, (3, 3), padding="same", strides=2, activation="relu"))
model.add(Conv2D(32, (3, 3), padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(Conv2D(32, (3, 3), padding="same", strides=2, activation="relu"))
model.add(Conv2D(32, (3, 3), padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(Conv2D(32, (3, 3), padding="same", strides=2, activation="relu"))
model.add(Activation("relu"))
model.add(Flatten())
# Regression head: 21 linear outputs, one per sensor value
model.add(Dense(192, activation="relu"))
model.add(Dense(96, activation="relu"))
model.add(Dense(48, activation="relu"))
model.add(Dense(output_size, activation="linear"))
adam = Adam(learning_rate=1e-5)
model.compile(loss="mean_squared_error", optimizer=adam)
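Training then runs on the generated dataset. A minimal sketch of the fit call, assuming the images and sensor vectors are loaded into NumPy arrays X and Y (these names and hyperparameters are illustrative, not from my actual script):

# X: (N, img_height, img_width, stack_depth) images; Y: (N, 21) sensor vectors.
history = model.fit(
    X, Y,
    validation_split=0.1,  # assumed hold-out fraction
    epochs=50,             # assumed epoch count
    batch_size=64,         # assumed batch size
)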
After training, the model reaches:
loss (MSE) of 0.02 on "easy" tracks and >0.1 on harder (more detailed) ones
R^2 of 0.7 on easy tracks and <0.3 on hard ones
Because the CNN's performance was not good enough, I also tried a residual network. The model is far bigger (4 million parameters) and has slightly better results on detailed tracks:
loss (MSE) of ~0.03 on easy and ~0.05 on hard tracks
R^2 of 0.7 on easy and ~0.5 on hard tracks
On simple tracks there is almost no difference between the two; if anything, the plain CNN performs better.
If you want, the code for both models is here.
Questions
First of all, are there any errors in my code?
Could a larger dataset improve the quality of the predictions?
Could higher-resolution images and/or RGB help? Could a different camera angle (e.g. a camera set higher up) also help?
The dataset is generated on different tracks and with "noisy" driving (swerving from left to right, braking randomly, etc.). Is this a good idea, or does it just slow down the training?
In my opinion the sensor vector is (loosely) structured and the values are correlated with one another to some extent. Can this property be exploited?
Is there any other recommended architecture or strategy for this kind of regression with images?
Thanks in advance for any answer.

Related

Training and validation accuracy producing overfitting

Below is the code of my CNN model. The issue is that the training accuracy is 96% while the validation accuracy is only 69%. Help me increase the validation accuracy.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 1), padding='same', name='Conv_1'))
model.add(MaxPooling2D((2, 2), name='MaxPool_1'))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', name='Conv_2'))
model.add(MaxPooling2D((2, 2), name='MaxPool_2'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same', name='Conv_3'))
model.add(Flatten(name='Flatten'))
model.add(Dropout(0.5, name='Dropout'))
model.add(Dense(128, kernel_initializer='normal', activation='relu', name='Dense_1'))
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid', name='Dense_2'))
model.summary()
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
history = model.fit(x_train2, y_train2, epochs=25, batch_size=10, verbose=2, validation_data=(x_test, y_test))
Findings:
Train: accuracy = 0.937500 ; loss = 0.125126
Test: accuracy = 0.662508 ; loss = 1.089228
First, you could try increasing the number of epochs.
Second, try adding another Dense layer before the classification Dense, or adding a Dropout layer after the last Conv2D layer, as sketched below.
I hope you get better results.
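For illustration, a minimal sketch of that second suggestion applied to the model above (the extra layers and their sizes are my assumptions, not something I tested on this data):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 1), padding='same', name='Conv_1'))
model.add(MaxPooling2D((2, 2), name='MaxPool_1'))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', name='Conv_2'))
model.add(MaxPooling2D((2, 2), name='MaxPool_2'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same', name='Conv_3'))
model.add(Dropout(0.25, name='Dropout_conv'))  # assumed rate: dropout right after the last conv
model.add(Flatten(name='Flatten'))
model.add(Dropout(0.5, name='Dropout'))
model.add(Dense(128, activation='relu', name='Dense_1'))
model.add(Dense(64, activation='relu', name='Dense_extra'))  # assumed extra Dense before the classifier
model.add(Dense(1, activation='sigmoid', name='Dense_2'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])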

InvalidArgumentError: input depth must be evenly divisible by filter depth: 3 vs 6

from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Conv1D, MaxPooling2D, Dropout, Flatten, Dense

input_shape = (100, 100, 6)
input_tensor = keras.Input(input_shape)
model = Sequential()
model.add(Conv2D(32, 3, padding='same', activation='relu', input_shape=input_shape))
model.add(Conv1D(filters=32, kernel_size=2, activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))
model.add(Conv2D(64, 3, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))
model.add(Conv2D(128, 3, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.compile(loss='categorical_crossentropy', optimizer="adam", metrics=['accuracy'])
training_set = train_datagen.flow_from_directory(
    '/content/gdrive/My Drive/Data/training_set',
    target_size=(128, 128),
    batch_size=32,
    class_mode='categorical')

history = model.fit(
    training_set,
    steps_per_epoch=nb_train_images // batch_size,
    epochs=100,
    validation_data=test_set,
    validation_steps=nb_test_images // batch_size,
    callbacks=callbacks)

history = model.fit(
    training_set,
    steps_per_epoch=nb_train_images // batch_size,
    epochs=40,
    validation_data=test_set,
    validation_steps=nb_test_images // batch_size,
    callbacks=callbacks)
I have 6 different classes to classify. Where am I going wrong? I set the input shape above, where I mentioned (100, 100, 6). Can someone help me understand this issue?
This was happening to me too. The following is my code. The way I fixed it was that instead of specifying some different input shape, I just made the input shape match the training data's image shape.
from tensorflow.keras.regularizers import l2

train = image_gen.flow_from_directory(
    train_path,
    target_size=(500, 500),
    color_mode='grayscale',
    class_mode='binary',
    batch_size=16
)

# and then later, when I build the model
model.add(Conv2D(filters[0], (5, 5), padding='same', kernel_regularizer=l2(0.001), activation='relu', input_shape=train.image_shape))
# the important part is the input_shape=train.image_shape
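Applied to the question above: flow_from_directory with the default color_mode yields RGB batches, so with target_size=(128, 128) the first layer should expect 3 channels, not 6. A hypothetical sketch of the same fix (the Dense(6) output layer is my addition; the question's model is missing a classification layer for its 6 classes):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense

model = Sequential()
# The generator yields batches of shape (batch, 128, 128, 3); the input shape must match:
model.add(Conv2D(32, 3, padding='same', activation='relu', input_shape=(128, 128, 3)))
# ... middle of the network as in the question ...
model.add(Flatten())
# The 6 classes belong in the output layer, not in the input channels:
model.add(Dense(6, activation='softmax'))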

Low accuracy of a Deep CNN

I have generated a dataset using EMNIST and mathematical symbols that has one character per image or two characters per image. There are 72 possible characters in the dataset. Each image is sized 28x56 (h x w).
Ex: single character, double character
There are 5256 (72*73) possible classes considering all combinations of the characters: 72 possible characters in the first part of the label and 73 (including a blank) in the second part. I have made sure that each class has around 540-600 images. The total dataset has around 3 million images.
The CNN models I have tried:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, MaxPooling2D, Dropout, Flatten, Dense
from tensorflow.keras.optimizers import SGD

input_shape = (28, 56, 1)
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3), padding='same', use_bias=False,
input_shape=input_shape))
model.add(Activation('relu'))
model.add(Conv2D(filters=32, kernel_size=(3, 3), padding='same', use_bias=False))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(.2))
model.add(Conv2D(filters=64, kernel_size=(3, 3), padding='same', use_bias=False))
model.add(Activation('relu'))
model.add(Conv2D(filters=64, kernel_size=(3, 3), padding='same', use_bias=False))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(.2))
model.add(Conv2D(filters=128, kernel_size=(3, 3), padding='same', use_bias=False))
model.add(Activation('relu'))
model.add(Conv2D(filters=128, kernel_size=(3, 3), padding='same', use_bias=False))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(.3))
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(4096, activation='relu'))
model.add(Dense(units=5256, activation='softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='sparse_categorical_crossentropy', optimizer=sgd, metrics=['sparse_categorical_accuracy'])
I even tried a model with two Dense layers of 10512 units as well. I was only able to achieve an accuracy of around 66%. I have tried various batch sizes (32, 64, 256) and the Adam optimizer with various learning rates as well. It would be great if someone could point out what I am doing wrong here, or give some tips on increasing the accuracy.
Following Ruslan S.'s recommendation, I trained the model on ResNet50 (retraining the whole network, not just the last layers). I was able to achieve a significant improvement in accuracy, reaching around 96%.
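For reference, a minimal sketch of what such a full fine-tune could look like in tf.keras; the resizing to 64x128, the grayscale-to-RGB replication, and all hyperparameters are my assumptions, not the poster's exact setup:

import tensorflow as tf
from tensorflow.keras import layers

num_classes = 5256
inputs = tf.keras.Input(shape=(28, 56, 1))
x = layers.Resizing(64, 128)(inputs)   # assumed upscale so ResNet50's 32x downsampling still fits
x = layers.Concatenate()([x, x, x])    # replicate grayscale to 3 channels for the ImageNet weights
backbone = tf.keras.applications.ResNet50(include_top=False, weights='imagenet', input_tensor=x)
backbone.trainable = True              # retrain the whole network, not just the last layers
x = layers.GlobalAveragePooling2D()(backbone.output)
outputs = layers.Dense(num_classes, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'])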

Training hyperspectral data using Tensorflow & Keras

I am looking for an approach to train on hyperspectral image data with TensorFlow.
Each training sample is encoded as CSV and has arbitrary x-y dimensions but a constant depth of 220 bands:
The data looks like this:
Sample1.csv: 50x4x220 (Row 1-50 is supposed to be aligned with row 51-100, 101-150, and 151-200)
Sample2.csv: 18x71x220 (Row 1-18 is supposed to be aligned with row 19-36, etc.)
Sample3.csv: 33x41x220 (same as above)
....
Sample100.csv: 15x8x220 (same as above)
Is there any project example that I can use? Thanks in advance.
Here is a survey on DL algorithms used to classify hyperspectral data.
Since you have data of varying size, you will have to create patches; you cannot feed inputs of different sizes in the same batch.
For example, you could feed patches of shape (16, 16, 220) to your network, as sketched below.
I worked on a CNN with images of multispectral bands. I had fewer bands than you have, and the size of the patches was obviously important; I used a U-Net for image segmentation.
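A minimal sketch of such patch extraction, assuming each sample has already been read from its CSV into a NumPy array of shape (height, width, 220); the function and variable names are illustrative:

import numpy as np

def extract_patches(cube, patch=16):
    # Yield non-overlapping (patch, patch, 220) windows from one sample.
    h, w, _ = cube.shape
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            yield cube[i:i + patch, j:j + patch, :]

sample = np.random.rand(33, 41, 220)               # stand-in for e.g. Sample3.csv
patches = np.stack(list(extract_patches(sample)))  # shape: (4, 16, 16, 220)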
Edit: here is an example using (None, None, 220) as the input shape:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, GlobalMaxPooling2D, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential()
# this applies 32 convolution filters of size 3x3 each.
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(None, None, 220)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# model.add(Flatten())
# Replace flatten by GlobalPooling example :
model.add(GlobalMaxPooling2D())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
adam = Adam(lr=1e-4)
model.compile(loss='categorical_crossentropy', optimizer=adam)

Keras ImageDataGenerator not working as expected

I'm trying to build an autoencoder using Keras, based on this example from the docs. Because my data is large, I'd like to use a generator to avoid loading it all into memory.
My model looks like:
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, UpSampling2D

model = Sequential()
model.add(Convolution2D(16, 3, 3, activation='relu', border_mode='same', input_shape=(3, 256, 256)))
model.add(MaxPooling2D((2, 2), border_mode='same'))
model.add(Convolution2D(8, 3, 3, activation='relu', border_mode='same'))
model.add(MaxPooling2D((2, 2), border_mode='same'))
model.add(Convolution2D(8, 3, 3, activation='relu', border_mode='same'))
model.add(MaxPooling2D((2, 2), border_mode='same'))
model.add(Convolution2D(8, 3, 3, activation='relu', border_mode='same'))
model.add(UpSampling2D((2, 2)))
model.add(Convolution2D(8, 3, 3, activation='relu', border_mode='same'))
model.add(UpSampling2D((2, 2)))
model.add(Convolution2D(16, 3, 3, activation='relu'))
model.add(UpSampling2D((2, 2)))
model.add(Convolution2D(1, 3, 3, activation='sigmoid', border_mode='same'))
model.compile(optimizer='adadelta', loss='binary_crossentropy')
My generator:
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory('IMAGE DIRECTORY', color_mode='rgb', class_mode='binary', batch_size=32, target_size=(256, 256))
And then fitting the model:
model.fit_generator(
train_generator,
samples_per_epoch=1,
nb_epoch=1,
verbose=1,
)
I'm getting this error:
Exception: Error when checking model target: expected convolution2d_76 to have 4 dimensions, but got array with shape (32, 1)
That looks like the size of my batch rather than a sample. What am I doing wrong?
The error is most likely due to class_mode='binary'. It makes the generator produce binary classes, so the output has shape (batch_size, 1), while your model produces a four-dimensional output (since the last layer is a convolution).
I guess that you want your label to be the image itself. Based on the source of flow_from_directory and the DirectoryIterator it uses, this is impossible to achieve by just changing the class_mode. A possible solution would be along the lines of:
train_iterator = train_datagen.flow_from_directory('IMAGE DIRECTORY', color_mode='rgb', class_mode=None, batch_size=32, target_size=(256, 256))

def train_generator():
    for x in train_iterator:
        yield x, x
Note that I set class_mode to None. This makes the generator return just the image instead of the tuple (image, label). I then define a new generator that returns the image as both the input and the label.
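The wrapped generator can then be passed to fit_generator in place of the original iterator (a sketch using the same Keras 1 API as the question; the sample and epoch counts are assumed placeholders):

# Feed (image, image) pairs to the autoencoder.
model.fit_generator(
    train_generator(),
    samples_per_epoch=2048,  # assumed; set this to the number of training images
    nb_epoch=10,             # assumed epoch count
    verbose=1,
)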