I am learning about designing convolutional neural networks using Keras. I have developed a simple model using VGG16 as the base. My dataset has 6 classes of images. Here are the code and a description of my model.
model = models.Sequential()
conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
conv_base.trainable = False
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(6, activation='sigmoid'))
Here is the code for compiling and fitting the model:
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])
model.summary()
callbacks = [
    EarlyStopping(monitor='acc', patience=1, mode='auto'),
    ModelCheckpoint(monitor='val_loss', save_best_only=True, filepath=model_file_path)
]
history = model.fit_generator(
    train_generator,
    steps_per_epoch=10,
    epochs=EPOCHS,
    validation_data=validation_generator,
    callbacks=callbacks,
    validation_steps=10)
Here is the code for making a prediction on a new image:
img = image.load_img(img_path, target_size=(IMAGE_SIZE, IMAGE_SIZE))
plt.figure(index)
imgplot = plt.imshow(img)
x = image.img_to_array(img)
x = x.reshape((1,) + x.shape)
prediction = model.predict(x)[0]
# print(prediction)
Often the model.predict() method predicts more than one class, for example:
[0 1 1 0 0 0]
I have a couple of questions:
Is it normal for a multiclass classification model to predict more than one output?
How is accuracy measured during training time if more than one class was predicted?
How can I modify the neural network so that only one class is predicted?
Any help is appreciated. Thank you so much!
You are not doing multi-class classification, but multi-label. This is caused by the use of a sigmoid activation at the output layer. To do multi-class classification properly, use a softmax activation at the output, which will produce a probability distribution over classes.
Taking the class with the biggest probability (argmax) will produce a single class prediction, as expected.
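For example, here is a minimal sketch of the change, reusing the model and x from the question (everything else stays the same):

import numpy as np

# Final layer: softmax turns the 6 outputs into a probability distribution over the classes
model.add(layers.Dense(6, activation='softmax'))

# At prediction time, take the single most probable class
probs = model.predict(x)[0]              # e.g. something like [0.05, 0.71, 0.12, 0.04, 0.05, 0.03]
predicted_class = int(np.argmax(probs))

Softmax is also the standard pairing with the categorical_crossentropy loss you are already using for single-label, multi-class problems.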
I have a convolutional neural network that is trying to solve an image classification problem (2 classes, so binary classification), using a sigmoid output.
To evaluate the model I use:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
path_dir = '../../dataset/train'
parth_dir_test = '../../dataset/test'
datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2)

test_set = datagen.flow_from_directory(parth_dir_test,
                                       target_size=(150, 150),
                                       batch_size=64,
                                       class_mode='binary')
score = classifier.evaluate(test_set, verbose=0)
print('Test Loss', score[0])
print('Test accuracy', score[1])
And it outputs:
When I try to print the classification report I use:
yhat_classes = classifier.predict_classes(test_set, verbose=0)
yhat_classes = yhat_classes[:, 0]
print(classification_report(test_set.classes,yhat_classes))
But now I get this accuracy:
If I print test_set.classes, it shows the first 344 entries of the array as 0 and the next 344 as 1. Is this test_set shuffled before being fed into the network?
I think your model is doing just fine in both training and evaluation. The evaluation accuracy is computed from the predictions, so you may be making a logical mistake when using model.predict_classes(). Please check that you are using the trained model weights, and not a randomly initialized model, when evaluating.
What evaluate() does: model.evaluate() runs the trained model on the given data and returns the loss and any other compiled metrics. Its output is accuracy or loss, not predictions for your input data.
What predict() does: it generates output predictions for the input samples. model.predict() actually predicts, and its output is the target values predicted from your input data.
FYI: if your accuracy in a binary classification problem is less than 50%, it is worse than randomly predicting one of the two classes (which gives 50% accuracy)!
I needed to add shuffle=False. The code that works is:
test_set = datagen.flow_from_directory(parth_dir_test,
                                       target_size=(150, 150),
                                       batch_size=64,
                                       class_mode='binary',
                                       shuffle=False)
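With shuffling disabled, the order of the predictions matches test_set.classes, so the classification report lines up with the evaluate() result. As a minimal sketch of the full comparison (assuming classifier is your trained model; predict_classes has been removed in newer Keras versions, so thresholding the sigmoid output is shown instead):

import numpy as np
from sklearn.metrics import classification_report

probs = classifier.predict(test_set, verbose=0)    # sigmoid outputs, shape (n_samples, 1)
yhat_classes = (probs[:, 0] > 0.5).astype(int)     # threshold at 0.5 for the binary labels
print(classification_report(test_set.classes, yhat_classes))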
I can't find the problem in my code. I'm training a GAN; the GAN loss and discriminator loss are very low (around 0.04) and it seems to be converging well, but (a) the generated pictures don't look very good, and (b) the actual problem: when I run gan.predict(noise) the output is very close to 1, yet when I run discriminator.predict(generator(noise)) it is very close to 0, although the two should be identical. Here's my code:
Generator code:
def create_generator():
    generator = tf.keras.Sequential()
    #generator.add(layers.Dense(units=50176, input_dim=25))
    generator.add(layers.Dense(units=12544, input_dim=100))
    #generator.add(layers.Dropout(0.2))
    generator.add(layers.Reshape([112, 112, 1]))  # 112x112
    generator.add(layers.Conv2D(32, kernel_size=3, padding='same', activation='relu'))
    generator.add(layers.UpSampling2D())  # 224x224
    generator.add(layers.Conv2D(1, kernel_size=4, padding='same', activation='tanh'))
    generator.compile(loss='binary_crossentropy', optimizer=adam_optimizer())
    return generator
g=create_generator()
g.summary()
Discriminator code:
#IMAGE DISCRIMINATOR
def create_discriminator():
    discriminator = tf.keras.Sequential()
    discriminator.add(layers.Conv2D(64, kernel_size=2, padding='same', activation='relu', input_shape=[224, 224, 1]))
    discriminator.add(layers.Dropout(0.5))
    discriminator.add(layers.Conv2D(32, kernel_size=2, padding='same', activation='relu'))
    discriminator.add(layers.Dropout(0.5))
    discriminator.add(layers.Conv2D(16, kernel_size=2, padding='same', activation='relu'))
    discriminator.add(layers.Dropout(0.5))
    discriminator.add(layers.Conv2D(8, kernel_size=2, padding='same', activation='relu'))
    discriminator.add(layers.Dropout(0.5))
    discriminator.add(layers.Conv2D(1, kernel_size=2, padding='same', activation='relu'))
    discriminator.add(layers.Dropout(0.5))
    discriminator.add(layers.Flatten())
    discriminator.add(layers.Dense(units=1, activation='sigmoid'))
    discriminator.compile(loss='binary_crossentropy', optimizer=tf.optimizers.Adam(lr=0.0002))
    return discriminator
d =create_discriminator()
d.summary()
Gan code:
def create_gan(discriminator, generator):
    discriminator.trainable = False
    gan_input = tf.keras.Input(shape=(100,))
    x = generator(gan_input)
    gan_output = discriminator(x)
    gan = tf.keras.Model(inputs=gan_input, outputs=gan_output)
    #gan.compile(loss='binary_crossentropy', optimizer='adam')
    gan.compile(loss='binary_crossentropy', optimizer=adam_optimizer())
    return gan
gan = create_gan(d,g)
gan.summary()
Training code (I purposely don't call train_on_batch on the gan because I wanted to see whether the gradients zero out):
#@tf.function
def training(epochs=1, batch_size=128, rounds=50):
    batch_count = X_bad.shape[0] / batch_size

    # Creating the GAN
    generator = create_generator()
    discriminator = create_discriminator()
    ########### if you want to continue training an already trained gan
    #discriminator.set_weights(weights)
    gan = create_gan(discriminator, generator)

    start = time.time()
    for e in range(1, epochs + 1):
        #print("Epoch %d" % e)
        #for _ in tqdm(range(batch_size)):

        # Generate random noise as an input to initialize the generator
        noise = np.random.normal(0, 1, [batch_size, 100])

        # Generate fake MNIST images from the noised input
        generated_images = generator.predict(noise)
        #print('gen im shape: ', np.shape(generated_images))

        # Get a random set of real images
        image_batch = X_bad[np.random.randint(low=0, high=X_bad.shape[0], size=batch_size)]
        #print('im batch shape: ', image_batch.shape)

        # Construct different batches of real and fake data
        X = np.concatenate([image_batch, generated_images])

        # Labels for generated and real data
        y_dis = np.zeros(2 * batch_size)
        y_dis[:batch_size] = 0.99

        # Pre-train the discriminator on fake and real data before starting the gan
        discriminator.trainable = True
        discriminator.train_on_batch(X, y_dis)

        # Tricking the noised input of the generator as real data
        noise = np.random.normal(0, 1, [batch_size, 100])
        y_gen = np.ones(batch_size)

        # During the training of the gan, the weights of the discriminator should be fixed.
        # We can enforce that by setting the trainable flag.
        discriminator.trainable = False

        # Training the GAN by alternating the training of the discriminator
        # and training the chained GAN model with the discriminator's weights frozen
        #gan.train_on_batch(noise, y_gen)
        with tf.GradientTape() as tape:
            pred = gan(noise)
            loss_val = tf.keras.losses.mean_squared_error(y_gen, pred)
            # loss_val = gan.test_on_batch(noise, y_gen)
            # loss_val = tf.cast(loss_val, dtype=tf.float32)
        grads = tape.gradient(loss_val, gan.trainable_variables)
        optimizer.apply_gradients(zip(grads, gan.trainable_variables))

        if e == 1 or e % rounds == 0:
            end = time.time()
            loss_value = discriminator.test_on_batch(X, y_dis)
            print("Epoch {:03d}: Loss: {:.3f}".format(e, loss_value))
            gen_loss = gan.test_on_batch(noise, y_gen)
            print('gan loss: ', gen_loss)
            #print('Epoch: ', e, ' Loss: ')
            print('Time for ', rounds, ' epochs: ', end - start, ' seconds')
            local_time = time.ctime(end)
            print('Printing time: ', local_time)
            plot_generated_images(e, generator, examples=5)
            start = time.time()
    return discriminator, generator, grads
Now, the final losses after around 2000 epochs are 0.039 for the generator and 0.034 for the discriminator. The thing is, when I run the following, look at what I get:
print('disc for bad train: ',np.mean(discriminator(X_bad[:50])))
#output
disc for bad train: 0.9995248
noise= np.random.normal(0,1, [500, 100])
generated_images = generator.predict(noise)
print('disc for gen: ',np.mean(discriminator(generated_images[:50])))
print('gan for gen: ',np.mean(gan(noise[:50])))
#output
disc for gen: 0.0018724388
gan for gen: 0.96554756
Can anyone find the problem?
Thanks!
If anyone stumbles upon this, I figured it out (although I haven't fixed it yet). What's happening is that my gan object is training only the generator weights. The discriminator is a separate object that is trained on its own, but when I train the gan I never update the discriminator weights inside the gan, only the generator weights. That is why, when I feed noise into the gan, the output is close to one as I wanted, yet when I feed the generated images into the discriminator, which was trained separately (its weights were never changed inside the gan), the result is close to zero. A better approach is to implement the GAN as a class, so that the discriminator and the generator are both reachable from within the gan and both sets of weights can be updated there.
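For illustration, here is a minimal sketch of that idea, assuming the create_generator() and create_discriminator() functions above. It subclasses tf.keras.Model so that both sub-models are reachable, and trainable, from inside a single GAN object; the optimizers and losses here are placeholder choices, not the original code:

import tensorflow as tf

class GAN(tf.keras.Model):
    def __init__(self, discriminator, generator, latent_dim=100):
        super().__init__()
        self.discriminator = discriminator
        self.generator = generator
        self.latent_dim = latent_dim
        self.d_opt = tf.keras.optimizers.Adam(1e-4)
        self.g_opt = tf.keras.optimizers.Adam(1e-4)
        self.bce = tf.keras.losses.BinaryCrossentropy()

    def train_step(self, real_images):
        batch_size = tf.shape(real_images)[0]
        noise = tf.random.normal([batch_size, self.latent_dim])

        # Update the discriminator on real vs. generated images
        fake_images = self.generator(noise, training=False)
        with tf.GradientTape() as tape:
            real_pred = self.discriminator(real_images, training=True)
            fake_pred = self.discriminator(fake_images, training=True)
            d_loss = (self.bce(tf.ones_like(real_pred), real_pred) +
                      self.bce(tf.zeros_like(fake_pred), fake_pred))
        d_grads = tape.gradient(d_loss, self.discriminator.trainable_variables)
        self.d_opt.apply_gradients(zip(d_grads, self.discriminator.trainable_variables))

        # Update the generator through the *same* discriminator object
        with tf.GradientTape() as tape:
            fake_pred = self.discriminator(self.generator(noise, training=True), training=False)
            g_loss = self.bce(tf.ones_like(fake_pred), fake_pred)
        g_grads = tape.gradient(g_loss, self.generator.trainable_variables)
        self.g_opt.apply_gradients(zip(g_grads, self.generator.trainable_variables))
        return {"d_loss": d_loss, "g_loss": g_loss}

# Usage sketch: gan = GAN(create_discriminator(), create_generator()); gan.compile(); gan.fit(real_image_dataset, epochs=...)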
I want to do binary classification and I used DenseNet from PyTorch.
Here is my predict code:
densenet = torch.load(model_path)
densenet.eval()
output = densenet(input)
print(output)
And here is the output:
Variable containing:
54.4869 -54.3721
[torch.cuda.FloatTensor of size 1x2 (GPU 0)]
I want to get the probabilities of each class. What should I do?
I have noticed that torch.nn.Softmax() could be used when there are many categories, as discussed here.
import torch.nn as nn
Add a softmax layer to the classifier:
i.e., the typical classifier head:
num_ftrs = model_ft.classifier.in_features
model_ft.classifier = nn.Linear(num_ftrs, num_classes)
updated with a softmax:
model_ft.classifier = nn.Sequential(nn.Linear(num_ftrs, num_classes),
                                    nn.Softmax(dim=1))
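Alternatively (this is not in the original answer, just a common pattern), you can leave the model unchanged and apply the softmax only at inference time, which is what you typically want when the training loss (e.g. nn.CrossEntropyLoss) expects raw logits:

import torch
import torch.nn.functional as F

densenet.eval()
with torch.no_grad():
    logits = densenet(input)            # e.g. tensor([[54.4869, -54.3721]])
    probs = F.softmax(logits, dim=1)    # per-row probabilities that sum to 1
print(probs)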
I am new to deep learning and currently working on using LSTMs for language modeling. I was looking at the PyTorch documentation and was confused by it.
If I create a
nn.LSTM(input_size, hidden_size, num_layers)
where hidden_size = 4 and num_layers = 2, I think I will have an architecture something like:
op0 op1 ....
LSTM -> LSTM -> h3
LSTM -> LSTM -> h2
LSTM -> LSTM -> h1
LSTM -> LSTM -> h0
x0 x1 .....
If I do something like
nn.LSTM(input_size, hidden_size, 1)
nn.LSTM(input_size, hidden_size, 1)
I think the network architecture will look exactly like the one above. Am I wrong? If so, what is the difference between the two?
The multi-layer LSTM is better known as a stacked LSTM, where multiple layers of LSTM are stacked on top of each other.
Your understanding is correct. The following two definitions of a stacked LSTM are the same.
nn.LSTM(input_size, hidden_size, 2)
and
nn.Sequential(OrderedDict([
    ('LSTM1', nn.LSTM(input_size, hidden_size, 1)),
    ('LSTM2', nn.LSTM(hidden_size, hidden_size, 1))
]))
Here, the input is fed into the lowest LSTM layer, the output of the lowest layer is forwarded to the next layer, and so on. Please note that the output size of the lowest LSTM layer, and the input size of all the remaining LSTM layers, is hidden_size.
However, you may have seen people define a stacked LSTM in the following way:
rnns = nn.ModuleList()
for i in range(nlayers):
    input_size = input_size if i == 0 else hidden_size
    rnns.append(nn.LSTM(input_size, hidden_size, 1))
The reason people sometimes use the above approach is that if you create a stacked LSTM using the first two approaches, you can't get the hidden states of each individual layer. Check out what LSTM returns in PyTorch.
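For reference, here is a small sketch of what the built-in multi-layer nn.LSTM returns (the sizes are made up for illustration): output contains only the top layer's hidden states for every time step, while h_n and c_n contain only the last time step for each layer.

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=4, num_layers=2)
x = torch.randn(7, 3, 10)             # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)

print(output.shape)   # torch.Size([7, 3, 4]) -> top layer only, all time steps
print(h_n.shape)      # torch.Size([2, 3, 4]) -> last time step only, one slice per layer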
So, if you want to have the intermediate layers' hidden states, you have to declare each LSTM layer as a single-layer LSTM and run through a loop to mimic the multi-layer LSTM operations. For example:
outputs = []
for i in range(nlayers):
    if i != 0:
        sent_variable = F.dropout(sent_variable, p=0.2, training=True)
    output, hidden = rnns[i](sent_variable)
    outputs.append(output)
    sent_variable = output
In the end, outputs will contain all the hidden states of each individual LSTM layer.
I am working on image OCR with my own dataset. I have 1000 images of variable length, and I want to feed the images in as patches of 46x1. I have generated the patches of my images, and my label values are Urdu text, so I have encoded them as UTF-8. I want to implement CTC in the output layer. I have tried to implement CTC following the image_ocr example on GitHub, but I get the following error in my CTC implementation:
'numpy.ndarray' object has no attribute 'get_shape'
Could anyone point out my mistake and suggest a solution?
My code is:
X_train, X_test, Y_train, Y_test = train_test_split(imageList, labelList, test_size=0.3)
X_train_patches = np.array([image.extract_patches_2d(X_train[i], (46, 1)) for i in range(700)]).reshape(700, 1, 1)  # (samples, timesteps, dimensions)
X_test_patches = np.array([image.extract_patches_2d(X_test[i], (46, 1)) for i in range(300)]).reshape(300, 1, 1)

Y_train = np.array([i.encode("utf-8") for i in str(Y_train)])

Label_length = 1
input_length = 1
#################### Loss Function ####################
def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    # the 2 is critical here since the first couple outputs of the RNN
    # tend to be garbage:
    y_pred = y_pred[:, 2:, :]
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
# Building the model
model = Sequential()
model.add(LSTM(20, input_shape=(None, X_train_patches.shape[2]), return_sequences=True))
model.add(Activation('relu'))
model.add(TimeDistributed(Dense(12)))
model.add(Activation('tanh'))
model.add(LSTM(60, return_sequences=True))
model.add(Activation('relu'))
model.add(TimeDistributed(Dense(40)))
model.add(Activation('tanh'))
model.add(LSTM(100, return_sequences=True))
model.add(Activation('relu'))
loss_out = Lambda(ctc_lambda_func, name='ctc')([X_train_patches, Y_train, input_length, Label_length])
The way CTC is currently modelled in Keras is that you need to implement the loss function as a layer; you did that already (loss_out). Your problem is that the inputs you give that layer are not tensors from Theano/TensorFlow but numpy arrays.
To change that, one option is to model these values as inputs to your model. This is exactly what the implementation that you copied the code from does:
labels = Input(name='the_labels', shape=[img_gen.absolute_max_string_len], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')
# Keras doesn't currently support loss funcs with extra parameters
# so CTC loss is implemented in a lambda layer
loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])
To make this work you need to ditch the Sequential model and use the functional model API, exactly as done in the code linked above.
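For illustration, here is a minimal sketch of that rewiring for a model like yours (the layer sizes, max_string_len and num_classes are assumptions, not values from your question; the actual arrays are then passed to fit() rather than baked into the graph):

from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, LSTM, Dense, TimeDistributed, Activation, Lambda
from tensorflow.keras.models import Model

# Symbolic inputs instead of numpy arrays
input_data = Input(name='the_input', shape=(None, 46), dtype='float32')
labels = Input(name='the_labels', shape=[max_string_len], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')

x = LSTM(100, return_sequences=True)(input_data)
x = TimeDistributed(Dense(num_classes + 1))(x)      # +1 for the CTC blank symbol
y_pred = Activation('softmax', name='softmax')(x)

loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')(
    [y_pred, labels, input_length, label_length])

model = Model(inputs=[input_data, labels, input_length, label_length], outputs=loss_out)
# The CTC loss is already computed inside the Lambda layer, so the compiled
# loss function simply passes that value through.
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='adam')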