I am new to DL and Keras. I am currently trying to implement a U-Net-like CNN, and now I want to include batch normalization layers in my non-sequential model, but I do not really know how.
This is my current attempt to include it:
input_1 = Input((X_train.shape[1],X_train.shape[2], X_train.shape[3]))
conv1 = Conv2D(16, (3,3), strides=(2,2), activation='relu', padding='same')(input_1)
batch1 = BatchNormalization(axis=3)(conv1)
conv2 = Conv2D(32, (3,3), strides=(2,2), activation='relu', padding='same')(batch1)
batch2 = BatchNormalization(axis=3)(conv2)
conv3 = Conv2D(64, (3,3), strides=(2,2), activation='relu', padding='same')(batch2)
batch3 = BatchNormalization(axis=3)(conv3)
conv4 = Conv2D(128, (3,3), strides=(2,2), activation='relu', padding='same')(batch3)
batch4 = BatchNormalization(axis=3)(conv4)
conv5 = Conv2D(256, (3,3), strides=(2,2), activation='relu', padding='same')(batch4)
batch5 = BatchNormalization(axis=3)(conv5)
conv6 = Conv2D(512, (3,3), strides=(2,2), activation='relu', padding='same')(batch5)
drop1 = Dropout(0.25)(conv6)
upconv1 = Conv2DTranspose(256, (3,3), strides=(1,1), padding='same')(drop1)
upconv2 = Conv2DTranspose(128, (3,3), strides=(2,2), padding='same')(upconv1)
upconv3 = Conv2DTranspose(64, (3,3), strides=(2,2), padding='same')(upconv2)
upconv4 = Conv2DTranspose(32, (3,3), strides=(2,2), padding='same')(upconv3)
upconv5 = Conv2DTranspose(16, (3,3), strides=(2,2), padding='same')(upconv4)
upconv5_1 = concatenate([upconv5,conv2], axis=3)
upconv6 = Conv2DTranspose(8, (3,3), strides=(2,2), padding='same')(upconv5_1)
upconv6_1 = concatenate([upconv6,conv1], axis=3)
upconv7 = Conv2DTranspose(1, (3,3), strides=(2,2), activation='linear', padding='same')(upconv6_1)
model = Model(outputs=upconv7, inputs=input_1)
Is the batch normalization used in the right way? In the Keras documentation I read that you typically want to normalize the "features axis".
This is a short snippet out of the model summary:
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
input_1 (InputLayer)             (None, 512, 512, 9)   0
____________________________________________________________________________________________________
conv2d_1 (Conv2D)                (None, 256, 256, 16)  1312        input_1[0][0]
____________________________________________________________________________________________________
conv2d_2 (Conv2D)                (None, 128, 128, 32)  4640        conv2d_1[0][0]
____________________________________________________________________________________________________
conv2d_3 (Conv2D)                (None, 64, 64, 64)    18496       conv2d_2[0][0]
____________________________________________________________________________________________________
In this case my features axis is axis 3 (counting from 0), right?
I have read discussions about whether you should apply batch normalization before or after the activation function. In this case it is used after the activation function, right? Is there a possibility to use it before the activation function?
Thank you very much for your help and feedback! Really appreciate it!
Part 1: Is the batch normalization used in the right way?
The way you've called the BatchNormalization layer is correct; axis=3 is what you want, as recommended by the documentation.
Keep in mind that in the case of your model, axis=3 is equivalent to the default setting, axis=-1, so you do not need to set it explicitly.
Part 2: In this case it is used after the activation function, right? Is there a possibility to use it before the activation function?
Yes. Batch normalization, introduced in the 2015 research paper by Ioffe and Szegedy as a means of reducing internal covariate shift, is commonly used after the activation layer. Your code applies the batchnorm after the activations of your convolutional layers. Its use after the activation layer can be thought of as a "pre-processing step" for the information before it reaches the next layer as an input.
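That said, if you want to experiment with batchnorm before the activation, a minimal sketch (illustrative, not your code; the input shape is taken from your summary) is to leave the Conv2D linear and add an explicit Activation layer after the normalization:
from keras.layers import Input, Conv2D, BatchNormalization, Activation

# Sketch: batchnorm *before* the activation. The Conv2D has no activation of
# its own; the nonlinearity is applied as a separate layer after the norm.
input_1 = Input((512, 512, 9))  # input shape taken from the summary above
conv1 = Conv2D(16, (3,3), strides=(2,2), padding='same')(input_1)
batch1 = BatchNormalization()(conv1)  # axis=-1 (the default) == axis 3 here
act1 = Activation('relu')(batch1)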
For that reason, batch normalization can also serve as a data pre-processing step, which you can use immediately after your input layer (as discussed in this response). However, as that answer mentions, batchnorm should not be abused; it is computationally expensive and can force your model into approximately linear behavior (this answer goes into more detail about this issue).
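As a minimal sketch of that input-normalization pattern (again illustrative, with the shapes from your summary):
from keras.layers import Input, Conv2D, BatchNormalization

# Sketch: batchnorm as a data pre-processing step, applied directly to the
# input tensor before the first convolution.
input_1 = Input((512, 512, 9))
normed = BatchNormalization()(input_1)
conv1 = Conv2D(16, (3,3), strides=(2,2), activation='relu', padding='same')(normed)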
Using batchnorm at some other step in the model (not after an activation layer or the input layer) would have poorly-understood effects on model performance; it is a process meant to be applied to the outputs of an activation layer.
In my experience with u-nets, I've had a lot of success applying batchnorm only after the convolutional layers and before max pooling; this effectively doubles the computational "bang for my buck" on normalization, since these tensors are re-used in the u-net architecture. Aside from that, I don't use batchnorm (except maybe on the inputs, if the mean pixel intensities per image are very heterogeneous).
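A sketch of that pattern, assuming a conventional u-net encoder block with max pooling (your model uses strided convolutions instead, so this is purely illustrative):
from keras.layers import Input, Conv2D, BatchNormalization, MaxPooling2D

# Sketch: normalize once after the convolution; the same normalized tensor
# then serves both the skip connection and the pooled downsampling path.
inp = Input((512, 512, 9))
x = Conv2D(64, (3,3), activation='relu', padding='same')(inp)
x = BatchNormalization()(x)
skip = x                 # later concatenated with the decoder path
x = MaxPooling2D((2,2))(x)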
Axis 3 = axis -1, which is the default parameter.
I am trying to implement the architecture I have attached.
The output of my DataLoader has size: torch.Size([128, 192, 9, 1])
I am using a batch size of 128
View just reshapes the output of the dense layer
model = nn.Sequential(nn.Conv2d(192, 32, 3),
                      nn.ReLU(),
                      nn.Conv2d(32, 64, 3),
                      nn.ReLU(),
                      nn.Conv2d(64, 64, 3),
                      nn.MaxPool2d(2),
                      nn.Dropout(0.25),
                      nn.Flatten(),
                      nn.Linear(5952, 128),
                      nn.ReLU(),
                      nn.Dropout(0.5),
                      nn.Linear(128, 126),
                      View((6, 21)),
                      nn.Softmax(dim=1))
This is the architecture I currently have, and I don't know if my inputs to the Conv2d layers are correct.
I keep getting errors with my dimensions and kernel sizes, and I am unsure how to proceed.
[Image: CNN Architecture]
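One way to locate such dimension errors is to push a dummy batch through the layers one at a time and print the shapes; a minimal sketch, assuming the [128, 192, 9, 1] batch from the DataLoader above:
import torch
import torch.nn as nn

# Sketch: feed a dummy batch layer by layer and print the resulting shape,
# so the first failing layer is easy to spot.
x = torch.zeros(128, 192, 9, 1)  # (batch, channels, height=9, width=1)
layers = [nn.Conv2d(192, 32, 3), nn.ReLU(), nn.Conv2d(32, 64, 3)]
for layer in layers:
    try:
        x = layer(x)
        print(type(layer).__name__, tuple(x.shape))
    except RuntimeError as err:
        # A 3x3 kernel cannot fit along a width-1 axis, so Conv2d fails here.
        print(type(layer).__name__, "failed:", err)
        break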
I'm having some difficulty understanding the input-output flow of layers in stacked LSTM networks. Let's say I have created a stacked LSTM network like the one below:
# parameters
time_steps = 10
features = 2
input_shape = [time_steps, features]
batch_size = 32
# model
model = Sequential()
model.add(LSTM(64, input_shape=input_shape, return_sequences=True))
model.add(LSTM(32,input_shape=input_shape))
where our stacked-LSTM network consists of 2 LSTM layers with 64 and 32 hidden units respectively. In this scenario, we expect that at each time-step the 1st LSTM layer -LSTM(64)- will pass to the 2nd LSTM layer -LSTM(32)- an input of size [batch_size, time-step, hidden_unit_length], which would represent the hidden state of the 1st LSTM layer at the current time-step. What confuses me is:
Does the 2nd LSTM layer -LSTM(32)- receive as X(t) (as input) the hidden state of the 1st layer -LSTM(64)-, which has the size [batch_size, time-step, hidden_unit_length], and pass it through its own hidden network, in this case consisting of 32 nodes?
If the first is true, why is the input_shape of the 1st -LSTM(64)- and the 2nd -LSTM(32)- the same, when the 2nd only processes the hidden state of the 1st layer? Shouldn't the input_shape in our case be [32, 10, 64]?
I found the LSTM visualization below very helpful (found here), but it doesn't expand on stacked-LSTM networks.
Any help would be highly appreciated.
Thanks!
The input_shape is only required for the first layer. The subsequent layers take the output of the previous layer as their input (and so their input_shape argument value is ignored).
The model below
model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model.add(LSTM(32))
represents the architecture below, which you can verify from model.summary():
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_26 (LSTM)               (None, 5, 64)             17152
_________________________________________________________________
lstm_27 (LSTM)               (None, 32)                12416
=================================================================
Replacing the line
model.add(LSTM(32))
with
model.add(LSTM(32, input_shape=(1000000, 200000)))
will still give you the same architecture (verify using model.summary()), because the input_shape is ignored: the layer takes as its input the tensor output of the previous layer.
And if you need a sequence-to-sequence architecture like the one below,
you should be using the code:
model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model.add(LSTM(32, return_sequences=True))
which should return a model
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_32 (LSTM)               (None, 5, 64)             17152
_________________________________________________________________
lstm_33 (LSTM)               (None, 5, 32)             12416
=================================================================
The Keras documentation mentions that the input is [batch_size, time-step, input_dim] rather than [batch_size, time-step, hidden_unit_length], so I think 64 and 32 correspond to the feature dimension: the X input to the second layer has 64 features, and LSTM-32 outputs 32 features for each time-step.
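To make the shape flow concrete, a minimal sketch (same toy shapes as the answer above) that prints what the second layer actually receives:
from keras.models import Sequential
from keras.layers import LSTM

# Sketch: with return_sequences=True the first layer emits its full
# hidden-state sequence, and that (None, 5, 64) tensor, not the raw
# (None, 5, 2) input, is what feeds the second layer.
model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model.add(LSTM(32))
print(model.layers[1].input_shape)   # (None, 5, 64)
print(model.layers[1].output_shape)  # (None, 32)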
I am looking for a way to numerically evaluate the results of my unet-like CNN.
The CNN is trained to remove artifacts from grayscale images. The CNN therefore gets a "9-channel" grayscale image containing artifacts in each channel as input (9 grayscale images with partially redundant data but different artifacts, concatenated --> dimensions [numTrainInputs, 512, 512, 9]) and should output a single grayscale image without artifacts [numTrainInputs, 512, 512, 1]. The CNN is trained using MSE as the loss function, Adam as the optimizer, and Keras. So far, so good.
Visually the CNN provides good results when compared to an artifact-free "ground truth" image --> dimensions [numTrainInputs, 512, 512, 1], but the accuracy during training remains at 0%. I think this is because none of the result images perfectly fits the ground truth, right!?
But how can I numerically evaluate the results? I searched for numerical evaluations in the field of autoencoders but couldn't find a proper way. Can someone give me a hint?
The CNN looks like this:
input_1 = Input((X_train.shape[1],X_train.shape[2], X_train.shape[3]))
conv1 = Conv2D(16, (3,3), strides=(2,2), activation='elu', use_bias=True, padding='same')(input_1)
conv2 = Conv2D(32, (3,3), strides=(2,2), activation='elu', use_bias=True, padding='same')(conv1)
conv3 = Conv2D(64, (3,3), strides=(2,2), activation='elu', use_bias=True, padding='same')(conv2)
conv4 = Conv2D(128, (3,3), strides=(2,2), activation='elu', use_bias=True, padding='same')(conv3)
conv5 = Conv2D(256, (3,3), strides=(2,2), activation='elu', use_bias=True, padding='same')(conv4)
conv6 = Conv2D(512, (3,3), strides=(2,2), activation='elu', use_bias=True, padding='same')(conv5)
upconv1 = Conv2DTranspose(256, (3,3), strides=(1,1), activation='elu', use_bias=True, padding='same')(conv6)
upconv2 = Conv2DTranspose(128, (3,3), strides=(2,2), activation='elu', use_bias=True, padding='same')(upconv1)
upconv3 = Conv2DTranspose(64, (3,3), strides=(2,2), activation='elu', use_bias=True, padding='same')(upconv2)
upconv3_1 = concatenate([upconv3, conv4], axis=3)
upconv4 = Conv2DTranspose(32, (3,3), strides=(2,2), activation='elu', use_bias=True, padding='same')(upconv3_1)
upconv4_1 = concatenate([upconv4, conv3], axis=3)
upconv5 = Conv2DTranspose(16, (3,3), strides=(2,2), activation='elu', use_bias=True, padding='same')(upconv4_1)
upconv5_1 = concatenate([upconv5,conv2], axis=3)
upconv6 = Conv2DTranspose(8, (3,3), strides=(2,2), activation='elu', use_bias=True, padding='same')(upconv5_1)
upconv6_1 = concatenate([upconv6,conv1], axis=3)
upconv7 = Conv2DTranspose(1, (3,3), strides=(2,2), activation='linear', use_bias=True, padding='same')(upconv6_1)
model = Model(outputs=upconv7, inputs=input_1)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=1, epochs=100, shuffle=True, validation_split=0.01, callbacks=[tbCallback])
Thank you very much for your help!
You are using the wrong metric for this problem.
In regression, 'accuracy' as a metric makes no sense.
Change it to MSE, for example:
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_squared_error'])
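You can then get a number for the reconstruction quality on held-out data with model.evaluate; a minimal sketch, assuming X_val and Y_val are held-out arrays (not from the question) with the same shapes as the training data:
# Sketch: compile with regression metrics, then score the hold-out set.
model.compile(loss='mean_squared_error', optimizer='adam',
              metrics=['mean_squared_error', 'mean_absolute_error'])
scores = model.evaluate(X_val, Y_val, batch_size=1)
print(list(zip(model.metrics_names, scores)))  # loss, MSE, MAE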
I want to train my data with a convolutional neural network; I have reshaped my data.
These are the parameters I have used:
x_train.shape = (500000, 3253)
y_train.shape = (500000,)
y_test.shape = (20000,)
y_train[0] = 97
y_test[0] = 99
y_train.shape = (500000, 256)
y_test.shape = (20000, 256)
This is how I define my model architecture:
# 3. Define model architecture
model = Sequential()
model.add(Conv1D(64, 8, strides=1, padding='valid',
                 dilation_rate=1, activation=None, use_bias=True,
                 kernel_initializer='glorot_uniform', bias_initializer='zeros',
                 kernel_regularizer=None, bias_regularizer=None,
                 activity_regularizer=None, kernel_constraint=None,
                 bias_constraint=None, input_shape=x_train.shape))
# input_traces=N_Features
# input_shape=(batch_size, trace_lenght,num_of_channels)
model.add(MaxPooling1D(pool_size=2,strides=None, padding='valid'))
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(1, activation='relu'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(x_train, y_train, batch_size=100, epochs=500, verbose=2)
But I got two errors:
1-
ValueError: Error when checking input: expected conv1d_1_input to have shape (None, 500000, 3253) but got array with shape (500000, 3253, 1)
2-
With model.fit()
How do I resolve this problem?
The input shape is wrong: it should be input_shape = (1, 3253) for Theano or (3253, 1) for TensorFlow. The input shape doesn't include the number of samples.
Then you need to reshape your data to include the channels axis:
x_train = x_train.reshape((500000, 1, 3253))
Or move the channels dimension to the end if you use TensorFlow. After these changes it should work.
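A minimal sketch of the TensorFlow (channels-last) variant, reusing x_train from the question:
from keras.models import Sequential
from keras.layers import Conv1D

# Sketch: for channels-last ordering, the channel axis goes at the end and
# input_shape is (width, channels) without the sample count.
x_train = x_train.reshape((500000, 3253, 1))
model = Sequential()
model.add(Conv1D(64, 8, strides=1, padding='valid', input_shape=(3253, 1)))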
input_shape = (3253, 1)
This must be the input_shape of the first Conv1D layer.
You got the error with model.fit() because your model had not been built correctly yet.
I have a csv file with 339732 rows; each row consists of two parts:
the first being 29 feature values, i.e. X
the second being a binary label value, i.e. Y
dataframe = pd.read_csv("features.csv", header = None)
dataset = dataframe.values
X = dataset[:, 0:29].astype(float)
Y = dataset[:,29]
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=42)
I am trying to train it on a 1D convolutional layer:
model = Sequential()
model.add(Conv1D(64, 3, activation='relu', input_shape=(X_train.shape[0], 29)))
model.add(Conv1D(64, 3, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(128, 3, activation='relu'))
model.add(Conv1D(128, 3, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=16, epochs=2)
score = model.evaluate(X_test, y_test, batch_size=16)
Since the Conv1D layer expects a 3-D input, I transformed my input as follows:
X_train = np.reshape(X_train, (1, X_train.shape[0], X_train.shape[1]))
X_test = np.reshape(X_test, (1, X_test.shape[0], X_test.shape[1]))
However, this still throws error:
ValueError: Negative dimension size caused by subtracting 3 from 1 for 'conv1d_1/convolution/Conv2D' (op: 'Conv2D') with input shapes: [?,1,1,29], [1,3,29,64].
Is there any way to feed my input correctly?
As far as I know, a 1D convolution layer accepts inputs of the form batch_size x width x channels. You are reshaping with
X_train = np.reshape(X_train, (1, X_train.shape[0], X_train.shape[1]))
but X_train.shape[0] is your batch size, I guess. I think the problem is somewhere here. Can you please tell what the shape of X_train is before the reshape?
You have to think about whether your data has some progression relation between the 339732 entries or across the 29 features, that is, whether the order matters. If not, I don't think a CNN is suitable for this case.
If the 29 features "indicate the progression of something":
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1],1))
If the 29 features are independent, then they are like the channels of an image, but it doesn't make sense to convolve over a width of only 1:
X_train = X_train.reshape((X_train.shape[0],1, X_train.shape[1]))
If you want to pick the 339732 entries in blocks where the order matters (clip the 339732 rows or add zero padding so the count is divisible by timesteps; see the padding sketch after this list):
X_train = X_train.reshape((int(X_train.shape[0]/timesteps),timesteps, X_train.shape[1],1))
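A minimal sketch of the zero-padding idea from the last case; timesteps is an assumed illustrative value, not from the question:
import numpy as np

# Sketch: pad with zero rows until the row count is divisible by `timesteps`,
# then reshape into blocks of `timesteps` consecutive entries.
timesteps = 4
n, f = X_train.shape            # e.g. (339732, 29)
pad = (-n) % timesteps          # number of zero rows needed
X_padded = np.vstack([X_train, np.zeros((pad, f))])
X_blocks = X_padded.reshape((X_padded.shape[0] // timesteps, timesteps, f, 1))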