best_score_ vs. accuracy_score_ vs. AUROC performance score for classification models (binary)


I have these three metrics for my classification task. Can someone tell me in plain English what the differences are, which one(s) to use, and when to use them?
Thank you
for name, model in fitted_models.items():
    print(name, model.best_score_)
l1
0.8493863035326624
l2
0.8493863035326624
rf
0.9796513913558318
gb
0.9752980461811722
///////////////////////////////////////////////
for name, model in fitted_models.items():
    pred = model.predict(X_test)
    print(name, accuracy_score(y_test, pred))
l1
0.8603411513859275
l2
0.8603411513859275
rf
0.9790334044065387
gb
0.9758351101634684
///////////////////////////////////////////////
for name, model in fitted_models.items():
    pred = model.predict_proba(X_test)
    pred = [p[1] for p in pred]
    print(name, roc_auc_score(y_test, pred))
l1
0.9015388373737675
l2
0.9015381433597084
rf
0.9915194952019338
gb
0.988678201643009

1. First, model.best_score_ is computed from the training data: for a search object such as GridSearchCV it is the mean cross-validated score of the best parameter setting found. The other two metrics in this question are computed on the test data.
2. model.best_score_ (with the default scoring for a classifier) and accuracy_score(y_test, pred) both give the accuracy over all classes:
accuracy = (tp + tn) / (tp + fp + fn + tn)
roc_auc_score(y_test, pred), by contrast, is computed from the predicted probabilities rather than the predicted labels, and is the area under the ROC curve.
3. Accuracy on the training data is not a good metric for judging a model, and ROC AUC is usually more informative than accuracy when predict_proba() is available.
4. Finally, precision and recall are often more useful than accuracy, and you have to choose between the two based on your needs.
I suggest classification_report, which returns these metrics for each class along with their averages:
from sklearn.metrics import classification_report
print(classification_report(y_pred=y_pred, y_true=y_test))
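To make the difference between the three numbers concrete, here is a minimal sketch. It assumes the fitted models are GridSearchCV-style objects; the synthetic dataset, the LogisticRegression estimator and the small grid over C are illustrative choices, not the asker's setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary data (an assumption; it stands in for the asker's dataset).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Hypothetical search: tune C with 5-fold cross-validation on the training split.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X_train, y_train)

# Mean cross-validated accuracy of the best C; only the training split is used.
cv_accuracy = search.best_score_

# Accuracy of the hard 0/1 predictions on the held-out test split.
test_accuracy = accuracy_score(y_test, search.predict(X_test))

# ROC AUC needs scores, so use the predicted probability of the positive class.
positive_proba = search.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, positive_proba)

print(f"best_score_ (CV accuracy on train): {cv_accuracy:.3f}")
print(f"accuracy_score (test):              {test_accuracy:.3f}")
print(f"roc_auc_score (test):               {test_auc:.3f}")
```

The first number comes only from the training split, while the last two both use the test split but measure different things: label correctness at the default threshold versus how well the probabilities rank positives above negatives.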

Related

Overcoming Overfitting: How to Improve Video Classification AI Training Accuracy

I am developing an AI for video classification, which classifies a video file into one of three labels: Normal, Violent, or Pornography.
Here is a summary of my efforts so far to improve the accuracy of the model:
1. Dataset: I have collected a training dataset of 50,000 videos, consisting of 5000 original videos and 45,000 augmented videos, evenly split between the three labels.
2. Pre-processing: I have used an InceptionV3 model pre-trained on the ImageNet dataset to extract features from the videos for feeding into my main model.
3. Model Architecture: I have tried many different model architectures, but all of them resulted in overfitting problems after a maximum of 15 epochs.
4. Regularization: I have added L1 and L2 regularization, but they did not help improve the model.
5. Early Stopping: I have implemented early stopping, but it stopped training when the validation values were still not good enough to achieve good accuracy.
6. Model Complexity: I have tried both complex and less complex models, but both still resulted in overfitting.
7. Batch Normalization: I have added batch normalization, but it did not solve the overfitting problem.
8. Learning Rate Scheduler: I have tried using ReduceLROnPlateau and LearningRateScheduler, both together and alone, but still no luck.
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, verbose=0, mode='min', min_delta=0.0001, cooldown=0, min_lr=0)
lr_schedule = keras.callbacks.LearningRateScheduler(
    lambda epoch: 0.0005 * tf.math.exp(-0.05 * epoch),
    verbose=True)
9. Computing Resources: I am running the training on AWS SageMaker ml.t3.2xlarge with 32 GB of RAM.
10. Dataset Size: I would prefer to avoid increasing the size of the dataset as I am running short on time for the project delivery. However, if this is my only option, I am open to suggestions.
11. Tuning the Regularizer: I have gradually increased the regularization value in each layer to fine-tune the model.
Please note that these are just examples of the models I have tried, I have experimented with many others with similar results.
x = keras.layers.GRU(32, return_sequences=True, kernel_regularizer=keras.regularizers.l2(0.001))(
    frame_features_input, mask=mask_input
)
x = keras.layers.GRU(16, kernel_regularizer=keras.regularizers.l2(0.001))(x)
x = keras.layers.Dropout(0.4)(x)
x = keras.layers.Dense(1024, activation="relu",
                       kernel_regularizer=keras.regularizers.l2(0.001))(x)
x = keras.layers.Dropout(0.5)(x)
x = keras.layers.Dense(256, activation="relu",
                       kernel_regularizer=keras.regularizers.l2(0.001))(x)
x = keras.layers.Dropout(0.5)(x)
x = keras.layers.Dense(128, activation="relu",
                       kernel_regularizer=keras.regularizers.l2(0.001))(x)
output = keras.layers.Dense(len(class_vocab), activation="softmax")(x)
rnn_model = keras.Model([frame_features_input, mask_input], output)
opt = keras.optimizers.experimental.AdamW(
    learning_rate=0.0001,  # 0.001
    weight_decay=0.004,  # .004 best perform
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07,
    amsgrad=False,
    clipnorm=None,
    clipvalue=None,
    global_clipnorm=None,
    use_ema=False,
    ema_momentum=0.99,
    ema_overwrite_frequency=None,
    jit_compile=True,
    name="AdamW")
rnn_model.compile(
    loss="sparse_categorical_crossentropy", optimizer=opt, metrics=["accuracy"]
)
x = keras.layers.GRU(128, return_sequences=True, recurrent_dropout=0.3)(frame_features_input)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.GRU(64, return_sequences=False, recurrent_dropout=0.3)(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Dense(32, activation="relu", kernel_regularizer=keras.regularizers.l2(0.01))(x)
x = keras.layers.Dropout(0.5)(x)
x = keras.layers.BatchNormalization()(x)
output = keras.layers.Dense(len(class_vocab), activation="softmax")(x)
x = keras.layers.GRU(256, return_sequences=True, recurrent_dropout=0.3)(frame_features_input)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.GRU(128, return_sequences=True, recurrent_dropout=0.3)(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.GRU(64, return_sequences=False, recurrent_dropout=0.3)(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Dense(32, activation="relu", kernel_regularizer=keras.regularizers.l2(0.01))(x)
x = keras.layers.Dropout(0.5)(x)
x = keras.layers.BatchNormalization()(x)
output = keras.layers.Dense
[Image: the results]
[Image: the results using the learning rate scheduler]
Tried different model architectures, adding regularization, early stopping, and batch normalization, but still faced overfitting issue. Expected improved accuracy, but actual results show overfitting.
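As a side note on item 8, the exponential schedule passed to LearningRateScheduler can be evaluated without Keras. The sketch below uses plain Python; the helper name scheduled_lr is made up for illustration, while the constants 0.0005 and 0.05 are taken from the lambda in the question.

```python
import math


def scheduled_lr(epoch, initial_lr=0.0005, decay=0.05):
    """Same formula as the lambda in the question: 0.0005 * exp(-0.05 * epoch)."""
    return initial_lr * math.exp(-decay * epoch)


# Learning rate at a few epochs within the 15-epoch window mentioned above.
for epoch in (0, 5, 10, 15):
    print(epoch, scheduled_lr(epoch))
```

Because this schedule depends only on the epoch number and ignores the optimizer's current learning rate, it would probably overwrite any reduction made by ReduceLROnPlateau when both callbacks are used together. That is an assumption worth checking against the logged learning rate rather than a confirmed diagnosis.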

Why Is accuracy so different when I use evaluate() and predict()?

I have a Convolutional Neural Network, and it's trying to resolve a classification problem using images (2 classes, so binary classification), using sigmoid.
To evaluate the model I use:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
path_dir = '../../dataset/train'
parth_dir_test = '../../dataset/test'
datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2)
test_set = datagen.flow_from_directory(parth_dir_test,
                                       target_size=(150, 150),
                                       batch_size=64,
                                       class_mode='binary')
score = classifier.evaluate(test_set, verbose=0)
print('Test Loss', score[0])
print('Test accuracy', score[1])
And it outputs:
When I try to print the classification report I use:
yhat_classes = classifier.predict_classes(test_set, verbose=0)
yhat_classes = yhat_classes[:, 0]
print(classification_report(test_set.classes,yhat_classes))
But now I get this accuracy:
If I print the test_set.classes, it shows the first 344 numbers of the array as 0, and the next 344 as 1. Is this test_set shuffled before feeding into the network?

I think your model is doing just fine in both "training" and "evaluating". The evaluation accuracy is computed from predictions, so maybe you are making a logical mistake when using model.predict_classes(). Please check that you are using the trained model weights, not a randomly initialized model, when evaluating it.
What "evaluate" does: model.evaluate() is for evaluating your trained model. It computes the loss and any other metrics the model was compiled with on the data you pass in, so its output is the loss and metric values, not predictions for your input data. (This is different from the validation data in fit(), which the model sets apart during training, does not train on, and uses to evaluate the loss and metrics after each epoch.)
predict: Generates output predictions for the input samples. model.predict() actually predicts, and its output is the target value predicted from your input data.
FYI: if your accuracy in a binary classification problem with balanced classes is less than 50%, it is worse than randomly predicting one of the classes (acc = 50%)!

I needed to add shuffle=False. The code that works is:
test_set = datagen.flow_from_directory(parth_dir_test,
                                       target_size=(150, 150),
                                       batch_size=64,
                                       class_mode='binary',
                                       shuffle=False)
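The effect of the missing shuffle=False can be reproduced without Keras. The sketch below is a simplified simulation, not the asker's pipeline: it assumes a perfect classifier and uses made-up names (true_labels, shuffled_predictions) to show what happens when predictions arrive in shuffled order but are compared against labels kept in directory order, as test_set.classes is.

```python
import random

# 344 samples of class 0 followed by 344 of class 1, like test_set.classes.
true_labels = [0] * 344 + [1] * 344

# Pretend the model is perfect: each prediction equals the sample's label.
# With shuffle=False the predictions come back in the same order as the labels.
ordered_predictions = list(true_labels)

# With shuffling enabled, samples are fed in a random order, so the predictions
# follow that order while the label array stays sorted by class.
rng = random.Random(0)
shuffled_order = list(range(len(true_labels)))
rng.shuffle(shuffled_order)
shuffled_predictions = [true_labels[i] for i in shuffled_order]


def accuracy(y_true, y_pred):
    """Fraction of positions where the label and the prediction agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


aligned_accuracy = accuracy(true_labels, ordered_predictions)
misaligned_accuracy = accuracy(true_labels, shuffled_predictions)
print("aligned:", aligned_accuracy)
print("misaligned:", misaligned_accuracy)
```

Even a perfect model scores near chance in the misaligned case, which fits a classification report that disagrees with evaluate(): evaluate() pairs each batch with its own labels, so shuffling does not affect it.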

Difference between WGAN and WGAN-GP (Gradient Penalty)

I just found that in the code here:
https://github.com/NUS-Tim/Pytorch-WGAN/tree/master/models
The "generator" loss, G, between WGAN and WGAN-GP is different, for WGAN:
g_loss = self.D(fake_images)
g_loss = g_loss.mean().mean(0).view(1)
g_loss.backward(one) # !!!
g_cost = -g_loss
But for WGAN-GP:
g_loss = self.D(fake_images)
g_loss = g_loss.mean()
g_loss.backward(mone) # !!!
g_cost = -g_loss
Why does one use one=1 and the other mone=-1?

You might have misread the source code: the first sample you gave does not average the result of D to compute its loss but instead uses the binary cross-entropy.
To be more precise:
The first method ("GAN") uses the BCE loss to compute the loss terms for D and G. The standard GAN optimization objective for D is to maximize E_x[log(D(x))] + E_z[log(1-D(G(z)))], which is equivalent to minimizing the BCE loss. Source code:
outputs = self.D(images)
d_loss_real = self.loss(outputs.flatten(), real_labels) # <- bce loss
real_score = outputs
# Compute BCELoss using fake images
fake_images = self.G(z)
outputs = self.D(fake_images)
d_loss_fake = self.loss(outputs.flatten(), fake_labels) # <- bce loss
fake_score = outputs
# Optimizie discriminator
d_loss = d_loss_real + d_loss_fake
self.D.zero_grad()
d_loss.backward()
self.d_optimizer.step()
For d_loss_real you optimize towards 1s (output is considered real), while d_loss_fake optimizes towards 0s (output is considered fake).
The second ("WCGAN") uses the Wasserstein loss (ref), whereby D maximises E_x[D(x)] - E_z[D(G(z))]. Source code:
# Train discriminator
# WGAN - Training discriminator more iterations than generator
# Train with real images
d_loss_real = self.D(images)
d_loss_real = d_loss_real.mean()
d_loss_real.backward(mone)
# Train with fake images
z = self.get_torch_variable(torch.randn(self.batch_size, 100, 1, 1))
fake_images = self.G(z)
d_loss_fake = self.D(fake_images)
d_loss_fake = d_loss_fake.mean()
d_loss_fake.backward(one)
# [...]
Wasserstein_D = d_loss_real - d_loss_fake
By calling d_loss_real.backward(mone) you backpropagate with a gradient of the opposite sign, i.e. it is gradient ascent, and you end up maximizing d_loss_real.
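The sign argument can be written out explicitly. As a sketch, assume a plain gradient-descent optimizer with learning rate \eta, parameters \theta and a scalar loss L (these symbols are introduced here for illustration). Calling L.backward(g) accumulates g times the gradient of L into the parameters' .grad, so:

```latex
\begin{aligned}
\texttt{L.backward(one)}:&\quad \theta.\mathrm{grad} = +\frac{\partial L}{\partial \theta}
  &&\Rightarrow\quad \theta \leftarrow \theta - \eta\,\frac{\partial L}{\partial \theta}
  \quad\text{(descent: } L \text{ decreases)} \\
\texttt{L.backward(mone)}:&\quad \theta.\mathrm{grad} = -\frac{\partial L}{\partial \theta}
  &&\Rightarrow\quad \theta \leftarrow \theta + \eta\,\frac{\partial L}{\partial \theta}
  \quad\text{(ascent: } L \text{ increases)}
\end{aligned}
```

With d_loss_real backpropagated through mone and d_loss_fake through one, the discriminator step pushes E_x[D(x)] up and E_z[D(G(z))] down, matching the objective stated above.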

To update the D network:
lossD = Expectation of D(fake data) - Expectation of D(real data) + gradient penalty
Decreasing lossD means increasing D(real data),
so you need to multiply the gradient of the real-data term by minus one during backpropagation.

More than one prediction in multi-classification in Keras?

I am learning about designing Convolutional Neural Networks using Keras. I have developed a simple model using VGG16 as the base. I have about 6 classes of images in the dataset. Here are the code and description of my model.
model = models.Sequential()
conv_base = VGG16(weights='imagenet' ,include_top=False, input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
conv_base.trainable = False
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(6, activation='sigmoid'))
Here is the code for compiling and fitting the model:
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])
model.summary()
callbacks = [
    EarlyStopping(monitor='acc', patience=1, mode='auto'),
    ModelCheckpoint(monitor='val_loss', save_best_only=True, filepath=model_file_path)
]
history = model.fit_generator(
    train_generator,
    steps_per_epoch=10,
    epochs=EPOCHS,
    validation_data=validation_generator,
    callbacks=callbacks,
    validation_steps=10)
Here is the code for prediction of a new image
img = image.load_img(img_path, target_size=(IMAGE_SIZE, IMAGE_SIZE))
plt.figure(index)
imgplot = plt.imshow(img)
x = image.img_to_array(img)
x = x.reshape((1,) + x.shape)
prediction = model.predict(x)[0]
# print(prediction)
Often the model.predict() method predicts more than one class:
[0 1 1 0 0 0]
I have a couple of questions
Is it normal for a multiclass classification model to predict more than one output?
How is accuracy measured during training time if more than one class was predicted?
How can I modify the neural network so that only one class is predicted?
Any help is appreciated. Thank you so much!

You are not doing multi-class classification, but multi-label. This is caused by the use of a sigmoid activation at the output layer. To do multi-class classification properly, use a softmax activation at the output, which will produce a probability distribution over classes.
Taking the class with the biggest probability (argmax) will produce a single class prediction, as expected.
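The difference between the two output activations can be seen on a single vector of raw scores. This is a minimal sketch in plain Python; the logits are invented for illustration, and thresholding the sigmoid outputs at 0.5 is an assumption about how the 0/1 vector in the question was obtained.

```python
import math

# Hypothetical raw scores (logits) for the 6 classes, before any activation.
logits = [-2.0, 1.5, 0.8, -1.0, -3.0, -0.5]


def sigmoid(z):
    """Squash one score into (0, 1), independently of the other classes."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Turn all scores into one probability distribution that sums to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Sigmoid: each class is decided on its own, so several can exceed 0.5.
sigmoid_scores = [sigmoid(z) for z in logits]
multi_label = [1 if s > 0.5 else 0 for s in sigmoid_scores]

# Softmax + argmax: exactly one class is selected.
softmax_probs = softmax(logits)
predicted_class = max(range(len(softmax_probs)), key=softmax_probs.__getitem__)

print(multi_label)      # [0, 1, 1, 0, 0, 0]
print(predicted_class)  # 1
```

With sigmoid, classes 1 and 2 both pass the threshold, which reproduces the kind of output shown in the question; with softmax the same scores yield a single winning class.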

prioritized experience replay in deep Q-learning

I was implementing DQN on the mountain car problem from OpenAI Gym. This problem is special because the positive reward is very sparse, so I thought of implementing prioritized experience replay as proposed in this paper by Google DeepMind.
There are certain things that are confusing me:
How do we store the replay memory? I get that p_i is the priority of a transition and that there are two ways to define it, but what is P(i)?
If we follow the rules given, won't P(i) change every time a sample is added?
What does it mean when it says "we sample according to this probability distribution"? What is the distribution?
Finally, how do we sample from it? I get that if we store it in a priority queue we can sample directly, but we are actually storing it in a sum tree.
Thanks in advance

According to the paper, there are two ways of calculating p_i, and your implementation differs depending on which you choose. I assume you selected proportional prioritization, in which case you should use a "sum-tree" data structure to store each transition together with its priority. P(i) is just the normalized version of p_i, and it shows how important that transition is, or in other words how effective that transition is for improving your network. When P(i) is high, the transition is very surprising to the network, so it can really help the network tune itself.
You should add each new transition with infinite priority (in practice, the maximum priority seen so far) to make sure it is replayed at least once; there is no need to update the whole experience replay memory for each incoming transition. During the experience replay process, you select a mini-batch and update the priorities of the experiences in that mini-batch.
Each experience has a probability, so all of the experiences together form a distribution, and we select the next mini-batch according to this distribution.
You can sample from your sum-tree with this procedure:
def retrieve(n, s):
    if n is leaf_node: return n
    if n.left.val >= s: return retrieve(n.left, s)
    else: return retrieve(n.right, s - n.left.val)
I have taken the code from here.
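The retrieve procedure above can be turned into a small runnable sum tree. This is a minimal sketch rather than the paper's or any library's implementation: the class name SumTree, the flat-list layout and the assumption that capacity is a power of two are choices made here for illustration. Drawing s uniformly from [0, total) and calling retrieve(s) selects leaf i with probability p_i / sum_k p_k, which is P(i) once the priorities have been raised to the exponent alpha.

```python
import random


class SumTree:
    """Binary tree in a flat list; every parent stores the sum of its children.

    Assumes capacity is a power of two (a simplification for this sketch).
    """

    def __init__(self, capacity):
        self.capacity = capacity
        # Index 1 is the root, capacity..2*capacity-1 are the leaves.
        self.nodes = [0.0] * (2 * capacity)

    def update(self, index, priority):
        """Set the priority of leaf `index` and refresh the sums above it."""
        pos = index + self.capacity
        self.nodes[pos] = priority
        pos //= 2
        while pos >= 1:
            self.nodes[pos] = self.nodes[2 * pos] + self.nodes[2 * pos + 1]
            pos //= 2

    def total(self):
        """Sum of all priorities, stored at the root."""
        return self.nodes[1]

    def retrieve(self, s):
        """Iterative version of retrieve(n, s): walk down to the leaf covering s."""
        pos = 1
        while pos < self.capacity:
            left = 2 * pos
            if self.nodes[left] >= s:
                pos = left
            else:
                s -= self.nodes[left]
                pos = left + 1
        return pos - self.capacity


tree = SumTree(capacity=4)
for i, p in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(i, p)

# Sample one transition index in proportion to its priority.
sampled_index = tree.retrieve(random.uniform(0.0, tree.total()))
print(tree.total(), sampled_index)
```

Only the leaves touched by a mini-batch need update() calls after their TD errors are recomputed, which is why P(i) never has to be renormalized across the whole buffer.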

You can reuse the code in OpenAI Baselines, or use a SumTree:
import numpy as np
import random
from baselines.common.segment_tree import SumSegmentTree, MinSegmentTree
class ReplayBuffer(object):
    def __init__(self, size):
        """Create Replay buffer.
        Parameters
        ----------
        size: int
            Max number of transitions to store in the buffer. When the buffer
            overflows the old memories are dropped.
        """
        self._storage = []
        self._maxsize = size
        self._next_idx = 0
    def __len__(self):
        return len(self._storage)
    def add(self, obs_t, action, reward, obs_tp1, done):
        data = (obs_t, action, reward, obs_tp1, done)
        if self._next_idx >= len(self._storage):
            self._storage.append(data)
        else:
            self._storage[self._next_idx] = data
        self._next_idx = (self._next_idx + 1) % self._maxsize
    def _encode_sample(self, idxes):
        obses_t, actions, rewards, obses_tp1, dones = [], [], [], [], []
        for i in idxes:
            data = self._storage[i]
            obs_t, action, reward, obs_tp1, done = data
            obses_t.append(np.array(obs_t, copy=False))
            actions.append(np.array(action, copy=False))
            rewards.append(reward)
            obses_tp1.append(np.array(obs_tp1, copy=False))
            dones.append(done)
        return np.array(obses_t), np.array(actions), np.array(rewards), np.array(obses_tp1), np.array(dones)
    def sample(self, batch_size):
        """Sample a batch of experiences.
        Parameters
        ----------
        batch_size: int
            How many transitions to sample.
        Returns
        -------
        obs_batch: np.array
            batch of observations
        act_batch: np.array
            batch of actions executed given obs_batch
        rew_batch: np.array
            rewards received as results of executing act_batch
        next_obs_batch: np.array
            next set of observations seen after executing act_batch
        done_mask: np.array
            done_mask[i] = 1 if executing act_batch[i] resulted in
            the end of an episode and 0 otherwise.
        """
        idxes = [random.randint(0, len(self._storage) - 1) for _ in range(batch_size)]
        return self._encode_sample(idxes)
class PrioritizedReplayBuffer(ReplayBuffer):
    def __init__(self, size, alpha):
        """Create Prioritized Replay buffer.
        Parameters
        ----------
        size: int
            Max number of transitions to store in the buffer. When the buffer
            overflows the old memories are dropped.
        alpha: float
            how much prioritization is used
            (0 - no prioritization, 1 - full prioritization)
        See Also
        --------
        ReplayBuffer.__init__
        """
        super(PrioritizedReplayBuffer, self).__init__(size)
        assert alpha >= 0
        self._alpha = alpha
        it_capacity = 1
        while it_capacity < size:
            it_capacity *= 2
        self._it_sum = SumSegmentTree(it_capacity)
        self._it_min = MinSegmentTree(it_capacity)
        self._max_priority = 1.0
    def add(self, *args, **kwargs):
        """See ReplayBuffer.store_effect"""
        idx = self._next_idx
        super().add(*args, **kwargs)
        self._it_sum[idx] = self._max_priority ** self._alpha
        self._it_min[idx] = self._max_priority ** self._alpha
    def _sample_proportional(self, batch_size):
        res = []
        p_total = self._it_sum.sum(0, len(self._storage) - 1)
        every_range_len = p_total / batch_size
        for i in range(batch_size):
            mass = random.random() * every_range_len + i * every_range_len
            idx = self._it_sum.find_prefixsum_idx(mass)
            res.append(idx)
        return res
    def sample(self, batch_size, beta):
        """Sample a batch of experiences.
        compared to ReplayBuffer.sample
        it also returns importance weights and idxes
        of sampled experiences.
        Parameters
        ----------
        batch_size: int
            How many transitions to sample.
        beta: float
            To what degree to use importance weights
            (0 - no corrections, 1 - full correction)
        Returns
        -------
        obs_batch: np.array
            batch of observations
        act_batch: np.array
            batch of actions executed given obs_batch
        rew_batch: np.array
            rewards received as results of executing act_batch
        next_obs_batch: np.array
            next set of observations seen after executing act_batch
        done_mask: np.array
            done_mask[i] = 1 if executing act_batch[i] resulted in
            the end of an episode and 0 otherwise.
        weights: np.array
            Array of shape (batch_size,) and dtype np.float32
            denoting importance weight of each sampled transition
        idxes: np.array
            Array of shape (batch_size,) and dtype np.int32
            indexes in buffer of sampled experiences
        """
        assert beta > 0
        idxes = self._sample_proportional(batch_size)
        weights = []
        p_min = self._it_min.min() / self._it_sum.sum()
        max_weight = (p_min * len(self._storage)) ** (-beta)
        for idx in idxes:
            p_sample = self._it_sum[idx] / self._it_sum.sum()
            weight = (p_sample * len(self._storage)) ** (-beta)
            weights.append(weight / max_weight)
        weights = np.array(weights)
        encoded_sample = self._encode_sample(idxes)
        return tuple(list(encoded_sample) + [weights, idxes])
    def update_priorities(self, idxes, priorities):
        """Update priorities of sampled transitions.
        sets priority of transition at index idxes[i] in buffer
        to priorities[i].
        Parameters
        ----------
        idxes: [int]
            List of idxes of sampled transitions
        priorities: [float]
            List of updated priorities corresponding to
            transitions at the sampled idxes denoted by
            variable `idxes`.
        """
        assert len(idxes) == len(priorities)
        for idx, priority in zip(idxes, priorities):
            assert priority > 0
            assert 0 <= idx < len(self._storage)
            self._it_sum[idx] = priority ** self._alpha
            self._it_min[idx] = priority ** self._alpha
            self._max_priority = max(self._max_priority, priority)
