Related
I am doing a deep learning project (binary classification) with a ResNet on a small dataset (896 images in total). I have tried several models, and ResNet gives me the best performance, even though the model suffers from exploding gradients with SGD optimizers (Adam converges faster but fluctuates much more). (Code source)
However, the model performs better when I use ResNet50 in the transfer-learning setup but without loading any pretrained weights (weights = None).
To my understanding, the two models should perform similarly; due to my limited coding experience, I cannot figure out why I get different results.
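For reference, the built-in model without pretrained weights is created roughly like this (a sketch; the input shape and the sigmoid head are assumptions, not necessarily the exact call I used):

import tensorflow as tf
from tensorflow.keras import layers, models

# Built-in ResNet50 without pretrained weights (weights=None), for comparison
# with the hand-written version below. Input shape and head are assumptions.
base = tf.keras.applications.ResNet50(include_top=False,
                                      weights=None,
                                      input_shape=(256, 256, 3),
                                      pooling='avg')
out = layers.Dense(1, activation='sigmoid', name='fc')(base.output)
model_app = models.Model(inputs=base.input, outputs=out, name='resnet50_app')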
import os
from timeit import default_timer as timer  # assumed; timer() is used for timing below

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.layers import Input
from tensorflow.keras.callbacks import (EarlyStopping, ModelCheckpoint,
                                        TensorBoard, LearningRateScheduler)

def identity_block(input_tensor, kernel_size, filters, stage, block):
    filters1, filters2, filters3 = filters
    bn_axis = 3
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = layers.Conv2D(filters1, (1, 1),
                      kernel_initializer='he_normal',
                      name=conv_name_base + '2a')(input_tensor)
    x = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + '2a')(x)
    x = layers.Activation('relu')(x)

    x = layers.Conv2D(filters2, kernel_size,
                      padding='same',
                      kernel_initializer='he_normal',
                      name=conv_name_base + '2b')(x)
    x = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + '2b')(x)
    x = layers.Activation('relu')(x)

    x = layers.Conv2D(filters3, (1, 1),
                      kernel_initializer='he_normal',
                      name=conv_name_base + '2c')(x)
    x = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + '2c')(x)

    x = layers.add([x, input_tensor])
    x = layers.Activation('relu')(x)
    return x
def conv_block(input_tensor,
               kernel_size,
               filters,
               stage,
               block,
               strides=(2, 2)):
    filters1, filters2, filters3 = filters
    bn_axis = 3
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = layers.Conv2D(filters1, (1, 1), strides=strides,
                      kernel_initializer='he_normal',
                      name=conv_name_base + '2a')(input_tensor)
    x = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + '2a')(x)
    x = layers.Activation('relu')(x)

    x = layers.Conv2D(filters2, kernel_size, padding='same',
                      kernel_initializer='he_normal',
                      name=conv_name_base + '2b')(x)
    x = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + '2b')(x)
    x = layers.Activation('relu')(x)

    x = layers.Conv2D(filters3, (1, 1),
                      kernel_initializer='he_normal',
                      name=conv_name_base + '2c')(x)
    x = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + '2c')(x)

    shortcut = layers.Conv2D(filters3, (1, 1), strides=strides,
                             kernel_initializer='he_normal',
                             name=conv_name_base + '1')(input_tensor)
    shortcut = layers.BatchNormalization(
        axis=bn_axis, name=bn_name_base + '1')(shortcut)

    x = layers.add([x, shortcut])
    x = layers.Activation('relu')(x)
    return x
def ResNet50(input_shape, classes):
    bn_axis = 3

    img_input = Input(input_shape)
    x = layers.ZeroPadding2D(padding=(3, 3), name='conv1_pad')(img_input)
    x = layers.Conv2D(64, (7, 7),
                      strides=(2, 2),
                      padding='valid',
                      kernel_initializer='he_normal',
                      name='conv1')(img_input)
    x = layers.BatchNormalization(axis=bn_axis, name='bn_conv1')(x)
    x = layers.Activation('relu')(x)
    x = layers.ZeroPadding2D(padding=(1, 1), name='pool1_pad')(x)
    x = layers.MaxPooling2D((3, 3), strides=(2, 2))(x)

    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')

    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')

    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f')

    x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a')
    x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b')
    x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c')

    x = layers.GlobalAveragePooling2D(name='avg_pool')(x)
    x = layers.Dense(classes, activation='sigmoid', name='fc')(x)

    # Create model.
    model = models.Model(inputs=img_input, outputs=x, name='resnet50')
    return model
model_resnet = ResNet50(input_shape=(3, 256, 256), classes=1)

# compile model
model_resnet.compile(loss='binary_crossentropy',
                     optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
                     metrics=['accuracy', 'Recall', 'Precision'])

# make directory for logs
logdir = os.path.join('logs', model_name)
# os.mkdir(logdir)

from math import floor

N_FOLDS = 5
INIT_LR = 1e-4  # 0.001
T_BS = 16
V_BS = 16
decay_rate = 0.95
decay_step = 1

# early stopping
cp = EarlyStopping(monitor='val_loss', mode='min', verbose=2, patience=PATIENCE, restore_best_weights=True)
mc = ModelCheckpoint(model_name, monitor='val_loss', mode='min', verbose=2, save_best_only=True)
tsb = TensorBoard(log_dir=logdir)
lrs = LearningRateScheduler(lambda epoch: INIT_LR * pow(decay_rate, floor(epoch / decay_step)))

# training
start = timer()

# Fit the model
history_resnet = model_resnet.fit(train_g1,
                                  epochs=1000,
                                  steps_per_epoch=len(train_g),
                                  validation_data=val_g,
                                  validation_steps=len(val_g),
                                  callbacks=[cp, mc, tsb])

end = timer()
elapsed = end - start
print('Total Time Elapsed: ', int(elapsed // 60), ' minutes ', round(elapsed % 60), ' seconds')
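For the exploding gradients with SGD mentioned above, gradient clipping is a common mitigation. A minimal sketch (the clipnorm and momentum values are assumptions, not tuned settings):

# Hedged sketch: clip the gradient norm so SGD updates stay bounded.
sgd_clipped = tf.keras.optimizers.SGD(learning_rate=1e-4,
                                      momentum=0.9,
                                      clipnorm=1.0)
model_resnet.compile(loss='binary_crossentropy',
                     optimizer=sgd_clipped,
                     metrics=['accuracy', 'Recall', 'Precision'])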
I have images (X_train) and mask data (y_train).
I want to train a U-Net network. I am currently using an IoU metric, and the validation IoU is very low and constant!
I am not sure whether I am handling the scaling preprocessing of the images and masks correctly.
I have tried (a) using only rescale=1.0/255 in the generator, (b) scaling only the X_train and X_val (image) values and not the mask values, and (c) scaling inside the U-Net model (s = Lambda(lambda x: x / 255.0)(inputs)). I am not sure if that is the problem, just wondering.
Here you can download the X_train and y_train data.
import tensorflow as tf
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Conv2DTranspose, \
Dropout, Input, Concatenate, Lambda
from imgaug import augmenters as iaa
from tensorflow.keras import backend as K
# gpu setup
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

X_train = np.load('./X_train.npy')
y_train = np.load('./y_train.npy')
X_train = X_train.astype('uint8')
y_train = y_train.astype('uint8')
BATCH_SIZE=8
SEED=123
VAL_SPLIT = 0.2
IMG_HEIGHT = 256
IMG_WIDTH = 256
def augment(images):
    seq = iaa.Sequential([
        iaa.Fliplr(0.5),  # horizontal flips
        iaa.Flipud(0.5),  # vertical flips
        iaa.Sometimes(
            0.1,
            iaa.GaussianBlur(sigma=(0, 0.5))
        ),
        iaa.LinearContrast((0.75, 1.5)),
        iaa.Sharpen(alpha=(0, 1.0), lightness=(0.75, 1.5)),
        iaa.BlendAlphaSimplexNoise(
            iaa.EdgeDetect(0.3),
            upscale_method="linear"),
    ], random_order=True)
    return seq.augment_image(images)
def create_gen(X,
               y,
               batch_size=BATCH_SIZE,
               seed=SEED):
    X_train, X_val, y_train, y_val = \
        train_test_split(X,
                         y,
                         test_size=VAL_SPLIT)

    # Image data generator
    data_gen_args = dict(rescale=1.0 / 255,
                         preprocessing_function=augment)
    data_gen_args_masks = dict(preprocessing_function=augment)
    X_datagen = ImageDataGenerator(**data_gen_args)
    y_datagen = ImageDataGenerator(**data_gen_args_masks)
    X_datagen.fit(X_train, augment=True, seed=seed)
    y_datagen.fit(y_train, augment=True, seed=seed)
    X_train_augmented = X_datagen.flow(X_train,
                                       batch_size=batch_size,
                                       shuffle=True,
                                       seed=seed)
    y_train_augmented = y_datagen.flow(y_train,
                                       batch_size=batch_size,
                                       shuffle=True,
                                       seed=seed)

    # Validation data generator
    data_gen_args_val = dict(rescale=1.0 / 255)
    X_datagen_val = ImageDataGenerator(**data_gen_args_val)
    y_datagen_val = ImageDataGenerator()
    X_datagen_val.fit(X_val, augment=True, seed=seed)
    y_datagen_val.fit(y_val, augment=True, seed=seed)
    X_val_after = X_datagen_val.flow(X_val,
                                     batch_size=batch_size,
                                     shuffle=False)
    y_val_after = y_datagen_val.flow(y_val,
                                     batch_size=batch_size,
                                     shuffle=False)

    train_generator = zip(X_train_augmented, y_train_augmented)
    val_generator = zip(X_val_after, y_val_after)

    steps_per_epoch = X_train_augmented.n // X_train_augmented.batch_size
    validation_steps = X_val_after.n // X_val_after.batch_size
    return train_generator, val_generator, steps_per_epoch, validation_steps

train_generator, val_generator, steps_per_epoch, validation_steps = \
    create_gen(X_train,
               y_train,
               batch_size=BATCH_SIZE)
# Build U-Net model
inputs = Input((IMG_HEIGHT, IMG_WIDTH, 3))
#s = Lambda(lambda x: x / 255) (inputs) # rescale inputs
c1 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (inputs)
c1 = Dropout(0.1) (c1)
c1 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (c1)
p1 = MaxPooling2D((2, 2)) (c1)
c2 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (p1)
c2 = Dropout(0.1) (c2)
c2 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (c2)
p2 = MaxPooling2D((2, 2)) (c2)
c3 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (p2)
c3 = Dropout(0.2) (c3)
c3 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (c3)
p3 = MaxPooling2D((2, 2)) (c3)
c4 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (p3)
c4 = Dropout(0.2) (c4)
c4 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (c4)
p4 = MaxPooling2D(pool_size=(2, 2)) (c4)
c5 = Conv2D(256, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (p4)
c5 = Dropout(0.3) (c5)
c5 = Conv2D(256, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (c5)
u6 = Conv2DTranspose(128, (2, 2), strides=(2, 2), padding='same') (c5)
u6 = Concatenate()([u6, c4])
c6 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (u6)
c6 = Dropout(0.2) (c6)
c6 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (c6)
u7 = Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same') (c6)
u7 = Concatenate()([u7, c3])
c7 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (u7)
c7 = Dropout(0.2) (c7)
c7 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (c7)
u8 = Conv2DTranspose(32, (2, 2), strides=(2, 2), padding='same') (c7)
u8 = Concatenate()([u8, c2])
c8 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (u8)
c8 = Dropout(0.1) (c8)
c8 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (c8)
u9 = Conv2DTranspose(16, (2, 2), strides=(2, 2), padding='same') (c8)
u9 = Concatenate()([u9, c1])
c9 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (u9)
c9 = Dropout(0.1) (c9)
c9 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (c9)
outputs = Conv2D(1, (1, 1), activation='sigmoid') (c9)
model = Model(inputs=[inputs], outputs=[outputs])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=[iouMetric])
EPOCHS = 40
model.fit(train_generator,
          validation_data=val_generator,
          batch_size=BATCH_SIZE,
          steps_per_epoch=steps_per_epoch,
          validation_steps=validation_steps,
          epochs=EPOCHS)
Code for the IoU metric:
def castF(x):
    return K.cast(x, K.floatx())

def castB(x):
    return K.cast(x, bool)

def iou_loss_core(true, pred):  # this can be used as a loss if you make it negative
    intersection = true * pred
    notTrue = 1 - true
    union = true + (notTrue * pred)
    return (K.sum(intersection, axis=-1) + K.epsilon()) / (K.sum(union, axis=-1) + K.epsilon())

def iouMetric(true, pred):
    thresholds = [0.5 + (i * 0.05) for i in range(5)]

    # flattened images (batch, pixels)
    true = K.batch_flatten(true)
    pred = K.batch_flatten(pred)
    pred = castF(K.greater(pred, 0.5))

    # total white pixels - (batch,)
    trueSum = K.sum(true, axis=-1)
    predSum = K.sum(pred, axis=-1)

    # has mask or not per image - (batch,)
    true1 = castF(K.greater(trueSum, 1))
    pred1 = castF(K.greater(predSum, 1))

    # to get images that have a mask in both true and pred
    truePositiveMask = castB(true1 * pred1)

    # separating only the possible true positives to check IoU
    testTrue = tf.boolean_mask(true, truePositiveMask)
    testPred = tf.boolean_mask(pred, truePositiveMask)

    # getting IoU and threshold comparisons
    iou = iou_loss_core(testTrue, testPred)
    truePositives = [castF(K.greater(iou, tres)) for tres in thresholds]

    # mean over thresholds for true positives, then total sum
    truePositives = K.mean(K.stack(truePositives, axis=-1), axis=-1)
    truePositives = K.sum(truePositives)

    # to get images that don't have a mask in both true and pred
    trueNegatives = (1 - true1) * (1 - pred1)  # = 1 - true1 - pred1 + true1*pred1
    trueNegatives = K.sum(trueNegatives)

    return (truePositives + trueNegatives) / castF(K.shape(true)[0])
I tried other metrics as well; the dice loss is also constant and very low. Accuracy is around 79% and constant.
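For reference, the dice coefficient I mean is roughly the following (a sketch; the smoothing constant is an assumption and it may differ slightly from the exact version I tried):

# Hedged sketch of a dice coefficient metric (not necessarily the exact one used above).
def dice_coef(y_true, y_pred, smooth=1.0):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)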
The problem is with the pre-processing. According to the tf.keras.preprocessing.image.ImageDataGenerator documentation:
preprocessing_function: function that will be applied on each input. The function will run after the image is resized and augmented. The function should take one argument: one image (Numpy tensor with rank 3), and should output a Numpy tensor with the same shape.
So this function runs in addition to the augmentation you configured inside ImageDataGenerator. The problem is that you have already scaled the images with 1.0 / 255, so the augment() function is getting a scaled image. But according to the imgaug documentation, it expects an unscaled image (see the comment inside the example):
'images' should be either a 4D numpy array of shape (N, height, width, channels)
or a list of 3D numpy arrays, each having shape (height, width, channels).
Grayscale images must have shape (height, width, 1) each.
All images must have numpy's dtype uint8. Values are expected to be in
range 0-255.
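So one way to side-step this is to drop rescale from the generator and do both steps inside the preprocessing function, augmenting on the uint8 image first and scaling afterwards. A minimal sketch (untested on your data; the helper name is mine):

# Sketch: augment on uint8 values first, then scale to [0, 1] afterwards,
# so imgaug receives the 0-255 range it expects.
def augment_and_rescale(image):
    image = augment(image.astype('uint8'))   # imgaug works on uint8, 0-255
    return image.astype('float32') / 255.0

data_gen_args = dict(preprocessing_function=augment_and_rescale)  # no rescale= here
# The mask generator would need a matching treatment (and mask values scaled to {0, 1}).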
Edit:
In the output layer you are using the sigmoid activation function, which forces the output to always lie within [0, 1]. But you are not scaling the mask, which means the mask values will be within [0, 255]. For obvious reasons, the model will never be able to output such large values, as it is restricted to [0, 1]. So remove the sigmoid from the last layer and see what happens.
I am doing a little experiment on the VGG network with Keras.
The dataset I use is the flowers dataset with 5 classes: rose, sunflower, dandelion, tulip and daisy.
There is something I could not figure out:
When I used a small CNN (not VGG; see the code below), it converged quickly and reached a validation accuracy of about 75% after only about 8 epochs.
Then I switched to the VGG network (the commented-out area in the code). The loss and accuracy of the network just did not change at all; it output something like:
Epoch 1/50
402/401 [==============================] - 199s 495ms/step - loss: 13.3214 - acc: 0.1713 - val_loss: 13.0144 - val_acc: 0.1926
Epoch 2/50
402/401 [==============================] - 190s 473ms/step - loss: 13.3473 - acc: 0.1719 - val_loss: 13.0144 - val_acc: 0.1926
Epoch 3/50
402/401 [==============================] - 204s 508ms/step - loss: 13.3423 - acc: 0.1722 - val_loss: 13.0144 - val_acc: 0.1926
Epoch 4/50
402/401 [==============================] - 190s 472ms/step - loss: 13.3522 - acc: 0.1716 - val_loss: 13.0144 - val_acc: 0.1926
Epoch 5/50
402/401 [==============================] - 189s 471ms/step - loss: 13.3364 - acc: 0.1726 - val_loss: 13.0144 - val_acc: 0.1926
Epoch 6/50
402/401 [==============================] - 189s 471ms/step - loss: 13.3453 - acc: 0.1720 - val_loss: 13.0144 - val_acc: 0.1926
Epoch 7/50
402/401 [==============================] - 189s 471ms/step - loss: 13.3503 - acc: 0.1717 - val_loss: 13.0144 - val_acc: 0.1926
PS: I did this experiment with other datasets and frameworks as well (the Places365 dataset with TensorFlow and slim), and the result is just the same. I have looked into the VGG paper (Simonyan & Zisserman); it says a deep network like VGG is trained in multiple stages, from configuration A to configuration E with different network structures. I am not sure whether I have to train my VGG network the same way as described in the paper, and other online courses did not mention this complex training process either.
Does anyone have any ideas?
My code:
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K
# dimensions of our images.
img_width, img_height = 224, 224
train_data_dir = './data/train'
validation_data_dir = './data/val'
nb_train_samples = 3213
nb_validation_samples = 457
epochs = 50
batch_size = 8
if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)
# random cnn model:
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(5))
model.add(Activation('softmax'))
# vgg model:
'''model = Sequential([
    Conv2D(64, (3, 3), input_shape=input_shape, padding='same',
           activation='relu'),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    Conv2D(256, (3, 3), activation='relu', padding='same'),
    Conv2D(256, (3, 3), activation='relu', padding='same'),
    Conv2D(256, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    Conv2D(512, (3, 3), activation='relu', padding='same'),
    Conv2D(512, (3, 3), activation='relu', padding='same'),
    Conv2D(512, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    Conv2D(512, (3, 3), activation='relu', padding='same'),
    Conv2D(512, (3, 3), activation='relu', padding='same'),
    Conv2D(512, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    Flatten(),
    Dense(256, activation='relu'),
    Dense(256, activation='relu'),
    Dense(5, activation='softmax')
])'''
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)

model.save_weights('flowers.h5')
Problem solved: I changed my learning rate to 0.0001, and it starts to learn now.
It seems that 0.001 was not small enough.
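In code, the change is roughly the following (a sketch; RMSprop is assumed because the model was compiled with optimizer='rmsprop', and the keyword may be learning_rate instead of lr depending on the Keras version):

from keras.optimizers import RMSprop

# Same compile call as above, but with an explicit, smaller learning rate.
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(lr=0.0001),
              metrics=['accuracy'])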
So just to set up the problem I am trying to solve: I have around 200k 64x64x3 RGB images of patches of terrain that a robot drove over. Each patch has a corresponding label for the roughness of that image patch. The roughness values range from 0-160. The data was collected with the robot driving at varying speeds, hence the range of the roughness values. My aim is to be able to predict the roughness of a patch. I am using the VGG-16 network, with the last layer modified to do regression. My batch size is 1024, the loss is mean squared error, and the optimizer is RMSprop. The network is shown below. My problem is that after training, the network predicts the exact same value for every test image. Another point to note is that the training loss is always higher than the validation loss, which is odd. Lastly, I tried other optimizers such as SGD and Adam, as well as varying batch sizes. Right now I am trying to train the network from scratch, but it does not seem too promising. I am not sure what is going wrong here, and I would really appreciate any help I can get. Thanks.
if input_tensor is None:
    img_input = Input(shape=input_shape)
else:
    if not K.is_keras_tensor(input_tensor):
        img_input = Input(tensor=input_tensor, shape=input_shape)
    else:
        img_input = input_tensor
# Block 1
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)
# Block 2
x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)
# Block 3
x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)
# Block 4
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)
# Block 5
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)
x = Flatten(name='flatten')(x)
x = Dense(4096, activation='relu', name='fc1')(x)
x = Dense(4096, activation='relu', name='fc2')(x)
x = Dense(1,name='regression_dense')(x)
According to your explanation, you have roughness values ranging from 0 to 160. Normalize these values to [-1, 1]. You can stick to the linear activation function in the last layer.
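A minimal sketch of that normalization (y is a placeholder for your label array; invert the mapping when reading predictions back):

import numpy as np

# Map roughness labels from [0, 160] to [-1, 1]; keep the linear output layer.
y_norm = (y / 160.0) * 2.0 - 1.0

# To turn predictions back into roughness values:
# roughness = (pred + 1.0) / 2.0 * 160.0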
But in general, I think you can solve this problem with a much shallower architecture that has just a fraction of the parameters.
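For example, something along these lines might already be enough (a rough sketch; all layer sizes are assumptions):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Rough sketch of a much smaller regression CNN for 64x64x3 patches.
small_model = Sequential([
    Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(64, 64, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(1)  # linear output for the normalized roughness value
])
small_model.compile(optimizer='adam', loss='mse')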
Cheers
Very briefly, my question relates to the image size not remaining the same as the input image size after a max-pooling layer when I use padding='same' in Keras code. I am going through the Keras blog post Building Autoencoders in Keras and am building a convolutional autoencoder. The autoencoder code is as follows:
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D

input_layer = Input(shape=(28, 28, 1))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_layer)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# at this point the representation is (4, 4, 8) i.e. 128-dimensional
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
As per autoencoder.summary(), the output after the very first Conv2D(16, (3, 3), activation='relu', padding='same')(input_layer) layer is 28 x 28 x 16, i.e. the same spatial size as the input image. This is because padding is 'same'.
In [49]: autoencoder.summary()
(The layer numbering is mine and is not part of the actual output.)
_________________________________________________________________
Layer (type)                     Output Shape          Param #
=================================================================
1.  input_1 (InputLayer)         (None, 28, 28, 1)     0
_________________________________________________________________
2.  conv2d_1 (Conv2D)            (None, 28, 28, 16)    160
_________________________________________________________________
3.  max_pooling2d_1 (MaxPooling2 (None, 14, 14, 16)    0
_________________________________________________________________
4.  conv2d_2 (Conv2D)            (None, 14, 14, 8)     1160
_________________________________________________________________
5.  max_pooling2d_2 (MaxPooling2 (None, 7, 7, 8)       0
_________________________________________________________________
6.  conv2d_3 (Conv2D)            (None, 7, 7, 8)       584
_________________________________________________________________
7.  max_pooling2d_3 (MaxPooling2 (None, 4, 4, 8)       0
_________________________________________________________________
8.  conv2d_4 (Conv2D)            (None, 4, 4, 8)       584
_________________________________________________________________
9.  up_sampling2d_1 (UpSampling2 (None, 8, 8, 8)       0
_________________________________________________________________
10. conv2d_5 (Conv2D)            (None, 8, 8, 8)       584
_________________________________________________________________
11. up_sampling2d_2 (UpSampling2 (None, 16, 16, 8)     0
_________________________________________________________________
12. conv2d_6 (Conv2D)            (None, 14, 14, 16)    1168
_________________________________________________________________
13. up_sampling2d_3 (UpSampling2 (None, 28, 28, 16)    0
_________________________________________________________________
14. conv2d_7 (Conv2D)            (None, 28, 28, 1)     145
=================================================================
The next layer (layer 3) is MaxPooling2D((2, 2), padding='same')(x). The summary() shows the output size of this layer as 14 x 14 x 16. But the padding in this layer is also 'same'. So how come the output image size does not remain 28 x 28 x 16 with padded zeros?
Also, it is not clear how the output shape changes to (14 x 14 x 16) after layer 12, when the input shape coming from the layer above it is (16 x 16 x 8).
The next layer (layer 3) is MaxPooling2D((2, 2), padding='same')(x). The summary() shows the output size of this layer as 14 x 14 x 16. But the padding in this layer is also 'same'. So how come the output image size does not remain 28 x 28 x 16 with padded zeros?
There seems to be a misunderstanding of what padding does. Padding just takes care of the corner cases (what to do next to the boundary of the image). But you have a 2x2 max-pooling operation, and in Keras the default stride equals the pool size, so the stride is 2, which halves the image size. You need to specify strides=1 by hand to avoid that. From the Keras docs:
pool_size: integer or tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2, 2) will halve the input in both spatial dimension. If only one integer is specified, the same window length will be used for both dimensions.
strides: Integer, tuple of 2 integers, or None. Strides values. If None, it will default to pool_size.
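A small sketch of the difference (shapes in the comments assume a 28x28x16 input, as in your summary):

from keras.layers import Input, MaxPooling2D

inp = Input((28, 28, 16))
# Default strides == pool_size, so even with padding='same' the map is halved: 28x28 -> 14x14.
pooled_default = MaxPooling2D((2, 2), padding='same')(inp)
# With explicit strides of 1 the spatial size stays 28x28; padding only handles the border.
pooled_stride1 = MaxPooling2D((2, 2), strides=(1, 1), padding='same')(inp)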
For the second question:
Also, it is not clear how the output shape changes to (14 x 14 x 16) after layer 12, when the input shape coming from the layer above it is (16 x 16 x 8).
Layer 12 does not have padding='same' specified, so it uses the default 'valid' padding and the 3x3 convolution shrinks each spatial dimension by 2.
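In other words (a sketch; shapes refer to the summary above):

from keras.layers import Input, Conv2D

inp = Input((16, 16, 8))
# Layer 12 as written: default padding='valid', so 16x16x8 -> 14x14x16.
valid_out = Conv2D(16, (3, 3), activation='relu')(inp)
# With padding='same' the spatial size would have stayed 16x16.
same_out = Conv2D(16, (3, 3), activation='relu', padding='same')(inp)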