I am using Keras to implement a simple network for binary classification. I have a dataset with 2 categories and I am trying to train my network using this data. I don't have a huge dataset: the total number of images across both categories is around 500.
The network is as below:
self.model = Sequential()
self.model.add(Conv2D(128, (2, 2), padding='same', input_shape=dataset.X_train.shape[1:]))
self.model.add(Activation('relu'))
self.model.add(MaxPooling2D(pool_size=(2, 2)))
self.model.add(Dropout(0.25))
self.model.add(Conv2D(64, (2, 2), padding='same'))
self.model.add(Activation('relu'))
self.model.add(MaxPooling2D(pool_size=(2, 2)))
self.model.add(Dropout(0.25))
SGD config:
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
I am using binary_crossentropy
The model training and loss graph look as below:
I am just wondering why there are a lot of big peaks in the graphs and what I can do to optimize it.
I am a newbie, so any comments and suggestions will be appreciated.
Thanks!
If you look at the end of each epoch in the training/test curves, it seems that the accuracy drops (and the loss increases), which suggests that the order of your dataset doesn't change between epochs. That might not lead to better generalization of the model. In my opinion, what you should do at each epoch is shuffle your dataset (batches) in the training phase; for the testing phase you can leave it as is, since the model isn't doing any learning there anymore.
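For concreteness, here is a minimal sketch of per-epoch shuffling in Keras; the array names X_train, y_train, X_test and y_test are placeholders, not taken from the asker's code:

# Minimal sketch: shuffle=True makes Keras reshuffle the training samples before
# every epoch (it is the default, but worth making explicit). Array names are placeholders.
model.fit(X_train, y_train,
          batch_size=32,
          epochs=50,
          shuffle=True,
          validation_data=(X_test, y_test))   # validation data is not shuffled, and doesn't need to be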
I believe these peaks actually coincide with the start of a new epoch. Throughout one epoch, gradients of previous batches are used to compute the current update when you use momentum. This explains why the loss decreases steadily throughout one epoch and spikes at the beginning of the next one: when the new epoch starts, the optimiser doesn't use gradients computed for batches in previous epochs.
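For intuition, the classical SGD-with-momentum update can be sketched like this (a schematic illustration of the update rule, not the actual Keras implementation):

# Schematic sketch of SGD with momentum: the velocity term accumulates past
# gradients, so each parameter update also depends on gradients from earlier batches.
w = 0.0                                      # a single toy parameter
velocity = 0.0
momentum, lr = 0.9, 0.01
per_batch_gradients = [0.5, 0.4, 0.6, 0.3]   # hypothetical gradients, one per batch
for g in per_batch_gradients:
    velocity = momentum * velocity - lr * g
    w = w + velocity                         # update uses the accumulated velocity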
Related
I built an LSTM to forecast precipitation, but it doesn't work well.
My code is very simple and the data is very short, only 720 points.
I use MinMaxScaler to scale the data.
This is my code, seq_len = 12:
from tensorflow.keras import Sequential, layers

model = Sequential([
    layers.LSTM(2, input_shape=(SEQ_LEN, 1)),
    layers.Dense(1)])
My data looks like this:
and the output compared with the true values looks like this:
I use Adam and an MAE loss function, epochs=10.
Is it underfitting? Or can this simple net just not do this job?
The r2_score is no more than 0.55.
Please tell me how to adjust it. Thanks.
There are many options:
First of all, it would be better to find the optimal window size by changing the length (period) of the input sequences.
The second option would be changing the batch size used for training.
Change the optimizer to SGD because of the small number of data points, and before training the model find good values for the learning rate by setting a learning-rate schedule callback (see the sketch after this list).
Try another model architecture, e.g. with convolutional layers.
Sometimes it helps model performance to add a Lambda layer after the last layer to scale the values up, because the LSTM's default activation function is tanh.
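As an illustration of the learning-rate schedule callback mentioned above, here is a minimal sketch; the decay rule, batch size and the x_train/y_train names are assumptions, not tuned values:

# Minimal sketch of a learning-rate schedule in Keras; the schedule itself and
# the array names (x_train, y_train) are placeholders, not tuned values.
import tensorflow as tf

def schedule(epoch, lr):
    return lr if epoch < 3 else lr * 0.9   # keep the initial rate briefly, then decay

lr_callback = tf.keras.callbacks.LearningRateScheduler(schedule)

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9), loss='mae')
model.fit(x_train, y_train, epochs=10, batch_size=16, callbacks=[lr_callback])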
I have a question regarding the role of the batch size. My MLP model has 2 Dense layers with the "softmax" activation function:
# Create my MLP model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Activation

model = Sequential()
model.add(Dense(units=64, input_dim=100))
model.add(BatchNormalization())
model.add(Activation("softmax"))
model.add(Dense(units=64))
model.add(BatchNormalization())
model.add(Activation("softmax"))
model.add(Dense(units=1))
Green: batch size = 2, pink: batch size = 8, red: batch size = 5
The dataset has 84,000 samples. Each sample consists of 100 input values and 1 output value. Each sample describes a different subprocess, so there should be no relationship between the samples. I have evaluated the training process with different batch_size values. What is the reason that the training result looks better when the batch size is increased to 8? As far as I can tell the samples are independent, so is there a relationship in my data samples that I was not aware of?
First of all, you are using batch normalization, which, as the name suggests, normalizes samples based on statistics within the batch, so it works better when the sample size (batch size) is bigger. Apart from this, a higher batch size also means lower variance of your gradient estimator, which is often good.
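To make the first point concrete, here is a tiny NumPy sketch of the training-time batch-norm statistics (it ignores the learned gamma/beta parameters and the moving averages used at inference):

# NumPy sketch of training-time batch normalization: each feature is normalised
# with the mean and variance estimated from the current batch, so tiny batches
# (e.g. 2 samples) give very noisy estimates, while larger ones (e.g. 8) are more stable.
import numpy as np

def batch_norm_train(x, eps=1e-3):
    mean = x.mean(axis=0)            # per-feature mean over the batch
    var = x.var(axis=0)              # per-feature variance over the batch
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(8, 100)          # a hypothetical batch of 8 samples with 100 features
print(batch_norm_train(x).std(axis=0)[:3])   # roughly 1 for each feature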
I am working on a project to predict soccer player values from a set of inputs. The data consists of about 19,000 rows and 8 columns (7 columns for input and 1 column for the target), all numerical values.
I am using a fully connected neural network for the prediction, but the problem is that the loss is not decreasing as it should.
The loss is very large (1e+13) and doesn't decrease as it should; it just fluctuates.
This is the function I am using to run the model:
import torch

def gradient_descent(model, learning_rate, num_epochs, data_loader, criterion):
    losses = []
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)  # pass the learning rate to the optimizer
    for epoch in range(num_epochs):  # one epoch
        for inputs, outputs in data_loader:  # one iteration
            inputs, outputs = inputs.to(torch.float32), outputs.to(torch.float32)
            logits = model(inputs)
            loss = criterion(torch.squeeze(logits), outputs)  # forward-pass
            optimizer.zero_grad()  # zero out the gradients
            loss.backward()  # compute the gradients (backward-pass)
            optimizer.step()  # take one step
            losses.append(loss.item())
        loss = sum(losses[-len(data_loader):]) / len(data_loader)  # mean loss over this epoch
        print(f'Epoch #{epoch}: Loss={loss:.3e}')
    return losses
The model is a fully connected neural network with 4 hidden layers, each with 7 neurons. The input layer has 7 neurons and the output layer has 1. I am using MSE as the loss function. I tried changing the learning rate but it is still bad.
What could be the reason behind this?
Thank you!
It is difficult to diagnose your problem from the information you provided, but I'll try to point you in some useful directions.
Data Normalization:
The way we initialize the weights in a deep NN has a significant effect on the training process. See, e.g.:
He, K., Zhang, X., Ren, S. and Sun, J., Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (ICCV 2015).
Most initialization methods assume the inputs have zero mean and unit variance (or similar statistics). If your inputs violate these assumptions, you will find it difficult to train. See, e.g., this post.
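A minimal sketch of standardising the 7 input features, assuming the data is available as tensors named X_train / X_val (placeholder names, not from the asker's code); the statistics are computed on the training split only:

# Sketch: standardise inputs to zero mean / unit variance using training-set
# statistics only. X_train and X_val are placeholder names.
mean = X_train.mean(dim=0, keepdim=True)
std = X_train.std(dim=0, keepdim=True)
X_train = (X_train - mean) / (std + 1e-8)
X_val = (X_val - mean) / (std + 1e-8)    # reuse the training statistics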
Normalize the Targets:
You are trying to solve a regression problem (MSE loss); it might be the case that your targets are poorly scaled, causing very large loss values. Try to normalize the targets so they span a more compact range.
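A sketch of scaling the targets, again with placeholder names (y_train, inputs); remember to invert the scaling when reading off predicted player values:

# Sketch: scale targets to a compact range and undo the scaling on predictions.
y_mean, y_std = y_train.mean(), y_train.std()
y_train = (y_train - y_mean) / y_std      # train against the scaled targets

# after training, map predictions back to the original scale
preds = model(inputs).squeeze() * y_std + y_mean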
Learning Rate:
Try adjusting your learning rate: both increasing it and decreasing it by orders of magnitude.
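For example, one could sweep the rate over a few orders of magnitude using the gradient_descent function above; build_model here is a hypothetical helper that re-initialises the network each time:

# Sketch: try learning rates spanning several orders of magnitude and compare
# the resulting loss curves. build_model() is a hypothetical helper.
for lr in (1e-1, 1e-2, 1e-3, 1e-4):
    model = build_model()
    gradient_descent(model, lr, 5, data_loader, criterion)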
For fun, I tried to create a neural network that can detect the time difference (t2 - t1) of two consecutive bounces of a ball within 1.5 seconds (disregarding the third bounce). "The idea is that if you have the time difference of the first two bounces, you can calculate the initial rebounce height, through a physics formula."
Input for the CNN was a spectrogram image as shown below. The output is one neuron, which will output the time difference between the first bounce and the second bounce (t2, the time of the second bounce, minus t1, the time of the first bounce). Overall there are 1000 samples for this CNN.
The first two bounces can have the same time difference but be placed somewhere else in time. For example, one sample might be t2-t1=0.810-0.530=0.280 and another sample might be 0.980-0.700=0.280. This is clear in Example 1 and Example 2.
Example 1 of Spectrogram
Example 2 of Spectrogram
Here is the full code (it isn't much):
https://www.codepile.net/pile/Al51wXl6
Here's the network structure:
import tensorflow as tf

cnn = tf.keras.models.Sequential()
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=5, activation='relu', input_shape=[1025, 65, 1]))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu'))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
cnn.add(tf.keras.layers.Flatten())
cnn.add(tf.keras.layers.Dense(units=128, activation='relu'))
cnn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
cnn.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
cnn.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=30)
The output was far off the accuracy I was hoping for:
Mean absolute error is: ~0.3
So my question is, am I misunderstanding CNNs, or why can't my CNN perform this task?
Most critical mistake
Choice of loss function and output unit
You have a regression task (predicting the continuous variable time difference), but your loss function is binary_crossentropy, which is for classification. You must use something like "mean_squared_error" instead.
The output neuron's non-linearity is sigmoid, which is for classification (or other things that should saturate between 0.0 and 1.0). I recommend using linear instead.
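Concretely, with the asker's network only the last layer and the compile call would need to change, roughly like this (a sketch; everything else stays as-is):

# Sketch of the two changes: a linear output unit and a regression loss.
cnn.add(tf.keras.layers.Dense(units=1, activation='linear'))    # instead of sigmoid
cnn.compile(optimizer='adam',
            loss='mean_squared_error',                          # instead of binary_crossentropy
            metrics=['mean_absolute_error'])
cnn.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=30)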
I am using an example of a Variational Autoencoder on MNIST data (2D images) made by others (http://louistiao.me/posts/implementing-variational-autoencoders-in-keras-beyond-the-quickstart-tutorial/) and changed it to use it on music, but I have a problem.
I chose this example because the author doesn't use convolutional layers, only Dense layers, so it is easy to adapt to time series.
Image values are in the interval [0,1] ([0,255]/255).
Musical values are in the interval [-1,1].
So I changed the musical values to fit the model: x = (x+1)/ 2 -> [0,1].
Result: the network isn't learning (my val_loss doesn't decrease).
I have no idea why !!!
PS: I tried changing the relu/sigmoid activations to tanh instead of changing the data. The loss was decreasing, but it was negative... and I didn't get good results either.