Usage of class_weights in CatBoostClassifier - catboost

How do I use 'class_weights' with CatBoostClassifier for a multiclass problem? The documentation says it should be a list, but in what order do I need to put the weights? I have a label array with 15 classes ranging from -2 to +2, including decimal values, and class 0 has a much higher density than the others.
I tried it for the binary case, which is easier to work with, but I have no clue about multiclass.
Please help.
Thanks,
cb_model_step1 = run_catboost(X_train, y_train_new, X_test, y_test_new, n_estimators = 1000, verbose=100, eta = 0.3, loss_function = 'MultiClassOneVsAll', class_weights = counter_new)
cb = CatBoostClassifier(thread_count=4, n_estimators=n_estimators, max_depth=10, class_weights = class_weights, eta=eta, loss_function = loss_function)

Now it is possible to pass a dictionary with labels and corresponding weights.
Suppose we have X_train, y_train and a multiclass classification problem. Then we can do the following:
import numpy as np
from catboost import CatBoostClassifier
from sklearn.utils.class_weight import compute_class_weight
classes = np.unique(y_train)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_train)
class_weights = dict(zip(classes, weights))
clf = CatBoostClassifier(loss_function='MultiClassOneVsAll', class_weights=class_weights)
clf.fit(X_train, y_train)
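For reference, the 'balanced' option of compute_class_weight gives each class n_samples / (n_classes * count_of_that_class), so rare classes get proportionally larger weights. Below is a minimal NumPy-only sketch of that same rule, handy for checking what ends up in the dictionary passed to CatBoost. The helper name balanced_class_weights is made up for illustration, and the toy labels (decimal values with a dominant class 0, as in the question) are invented.

```python
import numpy as np

def balanced_class_weights(y):
    """Hypothetical helper: {label: n_samples / (n_classes * count)} for each label in y."""
    classes, counts = np.unique(np.asarray(y), return_counts=True)
    weights = len(y) / (len(classes) * counts)
    # .item() turns NumPy scalars into plain Python numbers for the dict
    return {c.item(): w.item() for c, w in zip(classes, weights)}

# Invented toy labels: class 0 dominates, as in the question
y_toy = np.array([0, 0, 0, 0, 0, 0, -2, -2, 0.5, 1.5])
toy_weights = balanced_class_weights(y_toy)
print(toy_weights)  # class 0 gets the smallest weight, the single-sample classes the largest
```

The keys are the labels exactly as they appear in y, which is what the dictionary form of class_weights maps from.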

You need to fit the model without any weights on your dataset first, then look at the fitted model's classes_ attribute (model_multiclass.classes_ below). It shows the order of the classes in CatBoost:
model_multiclass = CatBoostClassifier(iterations=1000,
depth=4,
learning_rate=0.05,
loss_function='MultiClass',
verbose=True,
early_stopping_rounds = 200,
bagging_temperature = 1,
metric_period = 100)
model_multiclass.fit(X_train, Y_train)
model_multiclass.classes_
Result:['35мр', '4мр', 'вывод на ИП', 'вывод на кк', 'вывод на фл', 'транзит']
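Following that idea, once you know the order from classes_, you can build the list so that each weight lines up with its class. This is a minimal sketch that assumes, as this answer suggests, that a list passed as class_weights is read in the same order as classes_. The helper weights_in_class_order and the example labels and weights are hypothetical.

```python
def weights_in_class_order(class_order, weight_by_label):
    """Hypothetical helper: return a list of weights aligned with class_order.

    class_order: e.g. model_multiclass.classes_ from a model fitted without weights
    weight_by_label: dict {label: weight} chosen per class
    """
    missing = [c for c in class_order if c not in weight_by_label]
    if missing:
        raise KeyError(f"no weight given for classes: {missing}")
    return [weight_by_label[c] for c in class_order]

# Hypothetical labels and weights, just to show the alignment
class_order = ['a', 'b', 'c']
weight_by_label = {'c': 3.0, 'a': 1.0, 'b': 2.0}
ordered_weights = weights_in_class_order(class_order, weight_by_label)
print(ordered_weights)  # [1.0, 2.0, 3.0]
```

The resulting list would then go into CatBoostClassifier(..., class_weights=ordered_weights). Passing a dictionary instead, as in the previous answer, avoids the ordering question altogether.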

Related

Building neural network using k-fold cross validation

I am new to deep learning and am trying to implement a neural network using 4-fold cross-validation for training, testing, and validation. The task is to classify vehicles using an existing dataset.
The accuracy result is 0.7.
(Screenshots: training accuracy plot; example output for the epochs.)
I also don't know whether the code is correct or what to do to increase the accuracy.
Here is the code:
!pip install category_encoders
import tensorflow as tf
from sklearn.model_selection import KFold
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
import category_encoders as ce
from category_encoders import OrdinalEncoder
car_data = pd.read_csv('car_data.csv')
car_data.columns = ['Purchasing', 'Maintenance', 'No_Doors','Capacity','BootSize','Safety','Evaluation']
# Extract the features and labels from the dataset
X = car_data.drop(['Evaluation'], axis=1)
Y = car_data['Evaluation']
encoder = ce.OrdinalEncoder(cols=['Purchasing', 'Maintenance', 'No_Doors','Capacity','BootSize','Safety'])
X = encoder.fit_transform(X)
X = X.to_numpy()
Y_df = pd.DataFrame(Y, columns=['Evaluation'])
encoder = OrdinalEncoder(cols=['Evaluation'])
Y_encoded = encoder.fit_transform(Y_df)
Y = Y_encoded.to_numpy()
input_layer = tf.keras.layers.Input(shape=(X.shape[1],))
# Define the hidden layers
hidden_layer_1 = tf.keras.layers.Dense(units=64, activation='relu', kernel_initializer='glorot_uniform')(input_layer)
hidden_layer_2 = tf.keras.layers.Dense(units=32, activation='relu', kernel_initializer='glorot_uniform')(hidden_layer_1)
# Define the output layer
output_layer = tf.keras.layers.Dense(units=1, activation='sigmoid', kernel_initializer='glorot_uniform')(hidden_layer_2)
# Create the model
model = tf.keras.Model(inputs=input_layer, outputs=output_layer)
# Initialize the 4-fold cross-validation
kfold = KFold(n_splits=4, shuffle=True, random_state=42)
# Initialize a list to store the scores
scores = []
quality_weights = []
# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'],
              sample_weight_mode='temporal')
for train_index, test_index in kfold.split(X, Y):
    # Split the data into train and test sets
    X_train, X_test = X[train_index], X[test_index]
    Y_train, Y_test = Y[train_index], Y[test_index]
    # Fit the model on the training data
    history = model.fit(X_train, Y_train, epochs=300, batch_size=64, sample_weight=quality_weights)
    # Evaluate the model on the test data
    score = model.evaluate(X_test, Y_test)
    # Append the score to the scores list
    scores.append(score[1])
# Plot the training accuracy of the last fold
plt.plot(history.history['accuracy'])
plt.title('Model Training Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train'], loc='upper left')
plt.show()
# Print the mean and standard deviation of the scores
print(f'Mean accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}')
The first thing that caught my attention was here:
model.fit(X_train, Y_train, epochs=300, batch_size=64, sample_weight=quality_weights)
Your quality_weights should be a NumPy array with one weight per training sample, i.e. the same length as X_train; right now it is an empty list.
Refer here: https://keras.io/api/models/model_training_apis/#fit-method
If changing that doesn't help, then maybe your network isn't learning from the data. A few possible reasons:
The network may be a bit too shallow. Try adding one more hidden layer to see if that improves anything.
From the code I can't see the size of your input data. Does it have enough data points for 4-fold cross-validation? Can you somehow augment the data?
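To make the first point concrete: Keras' fit accepts sample_weight as a flat 1D array with one entry per training sample. Here is a minimal NumPy sketch that builds such an array from class frequencies with the 'balanced' rule n_samples / (n_classes * count). The helper name and toy labels are hypothetical, and weighting by class frequency is only one possible choice for quality_weights. The question also compiles with sample_weight_mode='temporal', which is meant for per-timestep (2D) weights on sequence data, so with a flat per-sample array like this one it is probably better left out.

```python
import numpy as np

def balanced_sample_weights(y):
    """Hypothetical helper: one weight per sample, n_samples / (n_classes * count[class])."""
    y = np.asarray(y).ravel()  # accepts shape (n,) or (n, 1), like Y from .to_numpy()
    classes, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    per_class = len(y) / (len(classes) * counts)
    return per_class[inverse]  # look up each sample's class weight

# Toy labels shaped like Y_train in the question: a column vector
Y_train = np.array([[1], [1], [1], [2], [3], [3]])
quality_weights = balanced_sample_weights(Y_train)
print(quality_weights.shape)  # (6,), one weight per row of Y_train
```

Inside the cross-validation loop, quality_weights would be recomputed from each fold's Y_train just before calling model.fit, so its length always matches X_train.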

How to increase Emotion Detection Validation Accuracy on VGG16 model? [Transfer Learning]

import pandas as pd
import numpy as np
import keras
import tensorflow
from keras.models import Model
from keras.layers import Dense
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing import image
trdata = ImageDataGenerator()
traindata = trdata.flow_from_directory(directory="path",target_size=(224,224))
tsdata = ImageDataGenerator()
testdata = tsdata.flow_from_directory(directory="path", target_size=(224,224))
from keras.applications.vgg16 import VGG16
vggmodel = VGG16(weights='imagenet', include_top=True)
vggmodel.summary()
for layers in (vggmodel.layers)[:19]:
    print(layers)
    layers.trainable = False
#flatten_out = tensorflow.keras.layers.Flatten()(vggmodel.output)
#fc1 = tensorflow.keras.layers.Dense(units=4096,activation="relu")(flatten_out)
#fc2 = tensorflow.keras.layers.Dense(units=4096,activation="relu")(fc1)
#fc3 = tensorflow.keras.layers.Dense(units=256,activation="relu")(fc2)
#predictions = tensorflow.keras.layers.Dense(units=3, activation="softmax")(fc3)
X= vggmodel.layers[-2].output
predictions = Dense(units=3, activation="softmax")(X)
model_final = Model(vggmodel.input, predictions)
model_final.compile(loss = "categorical_crossentropy", optimizer = optimizers.SGD(lr=0.001, momentum=0.9), metrics=["accuracy"])
model_final.summary()
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, TensorBoard, EarlyStopping
checkpoint = ModelCheckpoint("vgg16_1.h5", monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=False, mode='auto', period=1)
early = EarlyStopping(monitor='val_acc', min_delta=0, patience=40, verbose=1, mode='auto')
model_final.fit_generator(generator= traindata, steps_per_epoch= 95, epochs= 100, validation_data= testdata, validation_steps=7, callbacks=[checkpoint,early])
I am classifying emotions as positive, negative, or neutral.
I am using the VGG16 transfer-learning model.
However, I'm still not getting good validation accuracy.
Things I've tried:
increasing the amount of training data
layers.trainable=False/True
learning rate: 0.0001, 0.001, 0.01
activation function = relu/softmax
batch size = 64
optimizers = adam/sgd
loss fn = categorical_crossentropy / sparse_categorical_crossentropy
momentum = 0.09 / 0.9
I also tried converting my dataset to grayscale, and somehow it gave better accuracy than the color images, but it is still not satisfactory.
I also changed my code to add dropout layers, but there was still no progress.
I tried the FER2013 dataset, and it gave me pretty decent accuracy.
These are the results on the FER dataset:
accuracy: 0.9997 - val_accuracy: 0.7105
But on my own dataset (which is pretty good), validation accuracy doesn't go above 66%.
What else can I do to increase val_accuracy?
I think your model is more complex than necessary. I would remove the fc1 and fc2 layers, include regularization in the fc3 layer, and add a dropout layer after fc3. In your early-stopping callback, change patience to 4. I also recommend the Keras callback ReduceLROnPlateau (reduce learning rate on plateau). The full recommendations are in the code below:
from keras import regularizers
from keras.layers import Dense, Dropout
from keras.callbacks import ReduceLROnPlateau, EarlyStopping
# New head on VGG16's second-to-last layer: no extra fc1/fc2, a regularized fc3, then dropout
X = vggmodel.layers[-2].output
fc3 = Dense(units=256, kernel_regularizer=regularizers.l2(0.016), activity_regularizer=regularizers.l1(0.006),
            bias_regularizer=regularizers.l1(0.006), activation='relu')(X)
x = Dropout(rate=.4, seed=123)(fc3)
predictions = Dense(units=3, activation="softmax")(x)
model_final = Model(vggmodel.input, predictions)
model_final.compile(loss="categorical_crossentropy", optimizer=optimizers.SGD(lr=0.001, momentum=0.9), metrics=["accuracy"])
# Stop sooner, and cut the learning rate when val_loss plateaus
early = EarlyStopping(monitor='val_acc', min_delta=0, patience=4, verbose=1, mode='auto')
rlronp = ReduceLROnPlateau(monitor='val_loss', factor=0.4, patience=2, verbose=0, mode='auto')
callbacks = [rlronp, checkpoint, early]
model_final.fit_generator(generator=traindata, steps_per_epoch=95, epochs=100, validation_data=testdata, validation_steps=7, callbacks=callbacks)
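To see what ReduceLROnPlateau(monitor='val_loss', factor=0.4, patience=2) does to the learning rate, here is a simplified pure-Python model of its rule: whenever val_loss has gone patience epochs without improving, the learning rate is multiplied by factor. This is a sketch for intuition, not the Keras implementation; the real callback also has min_delta and cooldown settings, which are left out here, and the validation losses are invented.

```python
def simulate_reduce_lr_on_plateau(val_losses, lr=0.001, factor=0.4, patience=2, min_lr=0.0):
    """Learning rate in effect after each epoch under a simplified plateau rule.

    Simplifications vs. the Keras callback: any strictly lower val_loss counts
    as an improvement (no min_delta), and there is no cooldown period.
    """
    best = float("inf")
    wait = 0  # epochs since the last improvement
    lrs = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # shrink the learning rate
                wait = 0
        lrs.append(lr)
    return lrs

# Invented validation losses: improvements, then two plateaus
val_losses = [1.0, 0.8, 0.81, 0.82, 0.79, 0.80, 0.80]
print(simulate_reduce_lr_on_plateau(val_losses))
```

With these losses the rate stays at 0.001 for three epochs, drops to about 0.0004 after the fourth, and to about 0.00016 after the seventh.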
I do not like VGG: it is a very large model, and a bit old and slow. I think you will get better and faster results using EfficientNet models; EfficientNetB3 should work fine.
If you want to try that, get rid of all the VGG code and use:
import tensorflow as tf
from tensorflow.keras import regularizers
from tensorflow.keras.layers import BatchNormalization, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adamax
lr=.001
img_shape=(256,256,3)  # height, width, channels
class_count=3  # positive, negative, neutral
base_model=tf.keras.applications.efficientnet.EfficientNetB3(include_top=False,
    weights="imagenet",input_shape=img_shape, pooling='max')
base_model.trainable=True
x=base_model.output
x=BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(x)
x = Dense(256, kernel_regularizer=regularizers.l2(0.016), activity_regularizer=regularizers.l1(0.006),
          bias_regularizer=regularizers.l1(0.006), activation='relu')(x)
x=Dropout(rate=.4, seed=123)(x)
output=Dense(class_count, activation='softmax')(x)
model=Model(inputs=base_model.input, outputs=output)
model.compile(Adamax(learning_rate=lr), loss='categorical_crossentropy', metrics=['accuracy'])
NOTE: EfficientNet models expect pixels in the range 0 to 255, so don't rescale the pixels. Also note that I make the base model trainable. The usual advice is NOT to do that, but in many experiments I have found that training the base model from the outset leads to faster convergence and a lower final validation loss.

How to feed an LSTM/GRU model multiple independent Time Series?

To explain it simply: I have measurements from 53 oil-producing wells. Each well has been measured every day for 6 years, and we recorded multiple variables (pressure, water production, gas production, etc.). Our main component (the one we want to study and forecast) is the oil production rate. How can I use all the data to train my LSTM/GRU model, knowing that the oil wells are independent and that the measurements have been done at the same time for each one?
The knowledge that "the measurements have been done at the same time for each [well]" is not necessary if you want to assume that the wells are independent. (Why do you think that knowledge is useful?)
So if the wells are considered independent, treat them as individual samples. Split them into a training set, validation set, and test set, as usual. Train a usual LSTM or GRU on the training set.
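A minimal NumPy sketch of that setup, assuming the data is already arranged as an array of shape (wells, days, variables) with the oil rate in one column: whole wells are assigned to train/validation/test, and sliding windows are cut within each well, so no window mixes two wells. The helper names, window length and split sizes are hypothetical, and the data below is random numbers standing in for the real measurements (with far fewer days than 6 years). The resulting X arrays have the (samples, timesteps, features) layout that Keras LSTM/GRU layers expect.

```python
import numpy as np

def make_windows(series, window, target_col):
    """Cut one well's (days, variables) array into (past window, next-day target) pairs."""
    X, y = [], []
    for t in range(window, len(series)):
        X.append(series[t - window:t])   # the previous `window` days, all variables
        y.append(series[t, target_col])  # next day's value of the target (e.g. oil rate)
    return np.array(X), np.array(y)

def split_wells(data, window, target_col, n_val, n_test, seed=0):
    """Assign whole wells to train/val/test, then window each well separately."""
    rng = np.random.default_rng(seed)
    wells = rng.permutation(data.shape[0])  # shuffled well indices
    groups = {
        "test": wells[:n_test],
        "val": wells[n_test:n_test + n_val],
        "train": wells[n_test + n_val:],
    }
    out = {}
    for name, idx in groups.items():
        pairs = [make_windows(data[w], window, target_col) for w in idx]
        X = np.concatenate([p[0] for p in pairs])  # (samples, window, variables)
        y = np.concatenate([p[1] for p in pairs])
        out[name] = (X, y)
    return out

# Random stand-in for the real data: 53 wells, 100 days, 4 variables (column 0 = oil rate)
data = np.random.default_rng(42).normal(size=(53, 100, 4))
splits = split_wells(data, window=10, target_col=0, n_val=5, n_test=5)
X_train, y_train = splits["train"]
print(X_train.shape, y_train.shape)
```

X_train and y_train can then be passed to the fit method of a usual LSTM or GRU model whose input shape is (window, number of variables).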
By the way, you might want to use the attention mechanism instead of recurrent networks. It is easier to train and usually yields comparable results.
Even convolutional networks might be good enough. See methods like WaveNet if you suspect long-range correlations.
These well measurements sound like specific and independent events. I work in the finance sector. We always look at different stocks, modelling each stock's specific time series with an LSTM, but never 10 stocks mashed together. Here's some code to analyze a specific stock; modify it to suit your needs.
from pandas_datareader import data as wb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pylab import rcParams
from sklearn.preprocessing import MinMaxScaler
start = '2019-06-30'
end = '2020-06-30'
tickers = ['GOOG']
thelen = len(tickers)
price_data = []
for ticker in tickers:
    prices = wb.DataReader(ticker, start = start, end = end, data_source='yahoo')[['Open','Adj Close']]
    price_data.append(prices.assign(ticker=ticker)[['ticker', 'Open', 'Adj Close']])
#names = np.reshape(price_data, (len(price_data), 1))
df = pd.concat(price_data)
df.reset_index(inplace=True)
for col in df.columns:
    print(col)
#used for setting the output figure size
rcParams['figure.figsize'] = 20,10
#to normalize the given input data
scaler = MinMaxScaler(feature_range=(0, 1))
#to read input data set (place the file name inside ' ') as shown below
df['Adj Close'].plot()
plt.legend(loc=2)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()
ntrain = 80
df_train = df.head(int(len(df)*(ntrain/100)))
ntest = -80
df_test = df.tail(int(len(df)*(ntest/100)))
#importing the packages
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
#dataframe creation
seriesdata = df.sort_index(ascending=True, axis=0)
new_seriesdata = pd.DataFrame(index=range(0,len(df)),columns=['Date','Adj Close'])
length_of_data=len(seriesdata)
for i in range(0,length_of_data):
    new_seriesdata.loc[i, 'Date'] = seriesdata['Date'][i]
    new_seriesdata.loc[i, 'Adj Close'] = seriesdata['Adj Close'][i]
#setting the index again
new_seriesdata.index = new_seriesdata.Date
new_seriesdata.drop('Date', axis=1, inplace=True)
#creating train and test sets this comprises the entire data’s present in the dataset
myseriesdataset = new_seriesdata.values
totrain = myseriesdataset[0:255,:]
tovalid = myseriesdataset[255:,:]
#converting dataset into x_train and y_train
scalerdata = MinMaxScaler(feature_range=(0, 1))
scale_data = scalerdata.fit_transform(myseriesdataset)
x_totrain, y_totrain = [], []
length_of_totrain=len(totrain)
for i in range(60,length_of_totrain):
    x_totrain.append(scale_data[i-60:i,0])
    y_totrain.append(scale_data[i,0])
x_totrain, y_totrain = np.array(x_totrain), np.array(y_totrain)
x_totrain = np.reshape(x_totrain, (x_totrain.shape[0],x_totrain.shape[1],1))
#LSTM neural network
lstm_model = Sequential()
lstm_model.add(LSTM(units=50, return_sequences=True, input_shape=(x_totrain.shape[1],1)))
lstm_model.add(LSTM(units=50))
lstm_model.add(Dense(1))
lstm_model.compile(loss='mean_squared_error', optimizer='adadelta')
lstm_model.fit(x_totrain, y_totrain, epochs=10, batch_size=1, verbose=2)
#predicting next data stock price
myinputs = new_seriesdata[len(new_seriesdata) - (len(tovalid)+1) - 60:].values
myinputs = myinputs.reshape(-1,1)
myinputs = scalerdata.transform(myinputs)
tostore_test_result = []
for i in range(60,myinputs.shape[0]):
    tostore_test_result.append(myinputs[i-60:i,0])
tostore_test_result = np.array(tostore_test_result)
tostore_test_result = np.reshape(tostore_test_result,(tostore_test_result.shape[0],tostore_test_result.shape[1],1))
myclosing_priceresult = lstm_model.predict(tostore_test_result)
myclosing_priceresult = scalerdata.inverse_transform(myclosing_priceresult)
totrain = df_train
tovalid = df_test
#predicting next data stock price
myinputs = new_seriesdata[len(new_seriesdata) - (len(tovalid)+1) - 60:].values
# Printing the next day’s predicted stock price.
print(len(tostore_test_result))
print(myclosing_priceresult)
Final result:
1
[[1396.532]]

Getting polynomial regression to overfit with TensorFlow

The sklearn documentation contains an example of polynomial regression that beautifully illustrates the idea of overfitting.
The third plot shows a 15th-order polynomial that overfits the simulated data. I replicated this model in TensorFlow, but I cannot get it to overfit.
Even when tuning the learning rate and the number of training epochs, I cannot get the model to overfit. What am I missing?
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
def true_fun(X):
    return np.cos(1.5 * np.pi * X)
# Generate dataset
n_samples = 30
np.random.seed(0)
x_train = np.sort(np.random.rand(n_samples)) # Draw from uniform distribution
y_train = true_fun(x_train) + np.random.randn(n_samples) * 0.1
x_test = np.linspace(0, 1, 100)
y_true = true_fun(x_test)
# Helper function
def run_dir(base_dir, dirname='run'):
    "Number log directories incrementally"
    import os
    import re
    pattern = re.compile(dirname + r'_(\d+)')
    try:
        previous_runs = os.listdir(base_dir)
    except FileNotFoundError:
        previous_runs = []
    run_number = 0
    for name in previous_runs:
        match = pattern.search(name)
        if match:
            number = int(match.group(1))
            if number > run_number:
                run_number = number
    run_number += 1
    logdir = os.path.join(base_dir, dirname + '_%02d' % run_number)
    return(logdir)
# Define the polynomial model
def model(X, w):
    """Polynomial model
    param X: data
    param w: coefficients in the polynomial regression
    returns: Polynomial function Y(X, w)
    """
    terms = []
    for i in range(int(w.shape[0])):
        term = tf.multiply(w[i], tf.pow(X, i))
        terms.append(term)
    return(tf.add_n(terms))
# Create the computation graph
order = 15
tf.reset_default_graph()
X = tf.placeholder("float")
Y = tf.placeholder("float")
w = tf.Variable([0.]*order, name="parameters")
lambda_reg = tf.placeholder('float', shape=[])
learning_rate_ph = tf.placeholder('float', shape=[])
y_model = model(X, w)
loss = tf.div(tf.reduce_mean(tf.square(Y-y_model)), 2) # Square error
loss_rg = tf.multiply(lambda_reg, tf.reduce_sum(tf.square(w))) # L2 penalty
loss_total = tf.add(loss, loss_rg)
loss_hist1 = tf.summary.scalar('loss', loss)
loss_hist2 = tf.summary.scalar('loss_rg', loss_rg)
loss_hist3 = tf.summary.scalar('loss_total', loss_total)
summary = tf.summary.merge([loss_hist1, loss_hist2, loss_hist3])
train_op = tf.train.GradientDescentOptimizer(learning_rate_ph).minimize(loss_total)
init = tf.global_variables_initializer()
def train(sess, x_train, y_train, lambda_val=0, epochs=2000, learning_rate=0.01):
    feed_dict={X: x_train, Y: y_train, lambda_reg: lambda_val, learning_rate_ph: learning_rate}
    logdir = run_dir("logs/polynomial_regression2/")
    writer = tf.summary.FileWriter(logdir)
    sess.run(init)
    for epoch in range(epochs):
        _, summary_str = sess.run([train_op, summary], feed_dict=feed_dict)
        writer.add_summary(summary_str, global_step=epoch)
    final_cost, final_cost_rg, w_learned = sess.run([loss, loss_rg, w], feed_dict=feed_dict)
    return final_cost, final_cost_rg, w_learned
def plot_test(w_learned, x_test, x_train, y_train):
    y_learned = calculate_y(x_test, w_learned)
    plt.scatter(x_train, y_train)
    plt.plot(x_test, y_true, label="true function")
    plt.plot(x_test, y_learned,'r', label="learned function")
    #plt.title('$\lambda = {:03.2f}$'.format(lambda_values[i]))
    plt.ylabel('y')
    plt.xlabel('x')
    plt.legend()
    plt.show()
def calculate_y(x, w):
    y = 0
    for i in range(w.shape[0]):
        y += w[i] * np.power(x, i)
    return y
sess = tf.Session()
final_cost, final_cost_rg, w_learned = train(sess, x_train, y_train, lambda_val=0,
learning_rate=0.3, epochs=2000)
sess.close()
plot_test(w_learned, x_test, x_train, y_train)
I have the same problem. When I do polynomial regression with gradient descent (GD) in TensorFlow, I also can't overfit the data.
Then I compared the coefficients (weights) with those from sklearn's LinearRegression and found that for a high polynomial degree, the high-order coefficients are very small (e.g. 1e-4) while the low-order ones are relatively large (e.g. 0.1).
That means that when you use GD to search for the best weights, the high-order coefficients become extremely sensitive to changes in value, while the low-order coefficients are not.
My guess is that the best (overfitting) coefficients are large for the low-order terms and tiny for the high-order terms. With a large learning rate it's impossible to find the right answer, and with a tiny learning rate you need lots of iterations.
This becomes obvious when you try to use GD to overfit a small dataset.
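A quick way to check this explanation is to compare gradient descent with a direct least-squares solve on the same features. Below is a NumPy-only sketch (not the question's TensorFlow graph) that reuses the question's data generation, learning rate and number of steps. It uses a degree-15 polynomial with 16 coefficients, as in the sklearn example, whereas the question's w has 15 entries (degree 14). The matrix of powers 1, x, ..., x^15 on [0, 1] is very badly conditioned, so plain gradient descent with a fixed step makes fast progress only along a few well-scaled directions. Since least squares minimizes the training MSE exactly, its error is a lower bound for anything gradient descent can reach, and the gap shows how far 2000 steps are from the overfitting solution.

```python
import numpy as np

def true_fun(X):
    return np.cos(1.5 * np.pi * X)

# Same data as in the question
n_samples = 30
np.random.seed(0)
x_train = np.sort(np.random.rand(n_samples))
y_train = true_fun(x_train) + np.random.randn(n_samples) * 0.1

degree = 15
A = np.vander(x_train, degree + 1, increasing=True)  # columns 1, x, x^2, ..., x^15

# Direct least-squares solve: minimizes the training MSE exactly
w_ls, *_ = np.linalg.lstsq(A, y_train, rcond=None)
mse_ls = np.mean((A @ w_ls - y_train) ** 2)

# Plain gradient descent on the loss mean((A w - y)^2) / 2, as in the question
w_gd = np.zeros(degree + 1)
learning_rate = 0.3
for _ in range(2000):
    grad = A.T @ (A @ w_gd - y_train) / n_samples
    w_gd -= learning_rate * grad
mse_gd = np.mean((A @ w_gd - y_train) ** 2)

print(f"condition number of A: {np.linalg.cond(A):.1e}")
print(f"training MSE, least squares:    {mse_ls:.2e}")
print(f"training MSE, gradient descent: {mse_gd:.2e}")
```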

How to use keras to fine-tune inception v3 to do multi-class classification?

I want to use Keras to do two-class image classification using the Cat vs. Dog dataset from Kaggle.com.
But I have a problem with the "class_mode" parameter in the code below.
If I use "binary" mode, accuracy is about 95%, but if I use "categorical", accuracy is abnormally low, only just above 50%.
Binary mode means there is only one output in the last layer, with a sigmoid activation; each sample's label is a single integer.
Categorical mode means there are two outputs in the last layer, with a softmax activation; each sample's label is one-hot encoded, e.g. (1,0) or (0,1).
I think these two approaches should give similar results. Does anyone know the reason for the difference? Thanks very much!
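In principle the two heads can represent the same classifier: a two-unit softmax whose logits are [0, z] gives exactly sigmoid(z) for the second class, and its categorical cross-entropy on one-hot labels equals the binary cross-entropy on integer labels. A small NumPy check of that identity, with made-up logits and labels:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # subtract row max for stability
    return e / e.sum(axis=1, keepdims=True)

z = np.array([-3.0, -0.5, 0.0, 2.0, 4.0])  # made-up logits, one per sample
y = np.array([0, 0, 1, 1, 1])              # made-up integer labels (binary mode)
y_onehot = np.eye(2)[y]                    # the same labels one-hot encoded (categorical mode)

# Binary head: one sigmoid unit + binary cross-entropy
p_binary = sigmoid(z)
bce = -np.mean(y * np.log(p_binary) + (1 - y) * np.log(1 - p_binary))

# Categorical head: two softmax units with logits [0, z] + categorical cross-entropy
p_two = softmax(np.stack([np.zeros_like(z), z], axis=1))
cce = -np.mean(np.sum(y_onehot * np.log(p_two), axis=1))

print(np.allclose(p_binary, p_two[:, 1]), np.isclose(bce, cce))  # True True
```

So the gap is more likely down to how training behaves (for example the optimizer and learning rate, as the answer below suggests) than to the output formulation itself.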
import os
import sys
import glob
import argparse
import matplotlib.pyplot as plt
from keras import __version__
from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import SGD
# set some params here
IM_WIDTH, IM_HEIGHT = 299, 299 #fixed size for InceptionV3
NB_EPOCHS = 1
BAT_SIZE = 32
FC_SIZE = 1024
NB_IV3_LAYERS_TO_FREEZE = 172
loss_mode = "binary_crossentropy"
def get_nb_files(directory):
    """Get number of files by searching directory recursively"""
    if not os.path.exists(directory):
        return 0
    cnt = 0
    for r, dirs, files in os.walk(directory):
        for dr in dirs:
            cnt += len(glob.glob(os.path.join(r, dr + "/*")))
    return cnt
# transfer_learn, keep the weights in inception v3
def setup_to_transfer_learn(model, base_model):
    """Freeze all layers and compile the model"""
    for layer in base_model.layers:
        layer.trainable = False
    model.compile(optimizer='rmsprop', loss=loss_mode, metrics=['accuracy'])
# Add last layer to do two classes classification.
def add_new_last_layer(base_model, nb_classes):
    """Add last layer to the convnet
    Args:
        base_model: keras model excluding top
        nb_classes: # of classes
    Returns:
        new keras model with last layer
    """
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(FC_SIZE, activation='relu')(x) #new FC layer, random init
    if args.class_mode == "binary":
        predictions = Dense(1, activation='sigmoid')(x) #new sigmoid layer
    else:
        predictions = Dense(nb_classes, activation='softmax')(x) #new softmax layer
    model = Model(inputs=base_model.input, outputs=predictions)
    return model
# Freeze the bottom NB_IV3_LAYERS and retrain the remaining top layers,
# and fine tune weights.
def setup_to_finetune(model):
    """Freeze the bottom NB_IV3_LAYERS and retrain the remaining top layers.
    note: NB_IV3_LAYERS corresponds to the top 2 inception blocks in the inceptionv3 arch
    Args:
        model: keras model
    """
    for layer in model.layers[:NB_IV3_LAYERS_TO_FREEZE]:
        layer.trainable = False
    for layer in model.layers[NB_IV3_LAYERS_TO_FREEZE:]:
        layer.trainable = True
    model.compile(optimizer="rmsprop", loss=loss_mode, metrics=['accuracy'])
    #model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy', metrics=['accuracy'])
def train(args):
    """Use transfer learning and fine-tuning to train a network on a new dataset"""
    nb_train_samples = get_nb_files(args.train_dir)
    nb_classes = len(glob.glob(args.train_dir + "/*"))
    nb_val_samples = get_nb_files(args.val_dir)
    nb_epoch = int(args.nb_epoch)
    batch_size = int(args.batch_size)
    print("nb_classes:{}".format(nb_classes))
    # data prepare
    train_datagen = ImageDataGenerator(
        preprocessing_function=preprocess_input,
        rotation_range=30,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True
    )
    test_datagen = ImageDataGenerator(
        preprocessing_function=preprocess_input,
        rotation_range=30,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True
    )
    train_generator = train_datagen.flow_from_directory(
        args.train_dir,
        target_size=(IM_WIDTH, IM_HEIGHT),
        batch_size=batch_size,
        #class_mode='binary'
        class_mode=args.class_mode
    )
    validation_generator = test_datagen.flow_from_directory(
        args.val_dir,
        target_size=(IM_WIDTH, IM_HEIGHT),
        batch_size=batch_size,
        #class_mode='binary'
        class_mode=args.class_mode
    )
    # setup model
    base_model = InceptionV3(weights='imagenet', include_top=False) #include_top=False excludes final FC layer
    model = add_new_last_layer(base_model, nb_classes)
    # transfer learning
    setup_to_transfer_learn(model, base_model)
    #model.summary()
    history_tl = model.fit_generator(
        train_generator,
        epochs=nb_epoch,
        steps_per_epoch=nb_train_samples//batch_size,
        validation_data=validation_generator,
        validation_steps=nb_val_samples//batch_size)
    # fine-tuning
    setup_to_finetune(model)
    history_ft = model.fit_generator(
        train_generator,
        steps_per_epoch=nb_train_samples//batch_size,
        epochs=nb_epoch,
        validation_data=validation_generator,
        validation_steps=nb_val_samples//batch_size)
    model.save(args.output_model_file)
    if args.plot:
        plot_training(history_ft)
def plot_training(history):
    acc = history.history['acc']
    val_acc = history.history['val_acc']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    epochs = range(len(acc))
    plt.plot(epochs, acc, 'r.')
    plt.plot(epochs, val_acc, 'r')
    plt.title('Training and validation accuracy')
    plt.figure()
    plt.plot(epochs, loss, 'r.')
    plt.plot(epochs, val_loss, 'r-')
    plt.title('Training and validation loss')
    plt.show()
# main func
if __name__=="__main__":
    a = argparse.ArgumentParser()
    a.add_argument("--train_dir", default="train2")
    a.add_argument("--val_dir", default="test2")
    a.add_argument("--nb_epoch", default=NB_EPOCHS)
    a.add_argument("--batch_size", default=BAT_SIZE)
    a.add_argument("--output_model_file", default="inceptionv3-ft.model")
    a.add_argument("--plot", action="store_true")
    a.add_argument("--class_mode", default="binary")
    args = a.parse_args()
    if args.train_dir is None or args.val_dir is None:
        a.print_help()
        sys.exit(1)
    if args.class_mode != "binary" and args.class_mode != "categorical":
        print("set class_mode as 'binary' or 'categorical'")
        sys.exit(1)
    if args.class_mode == "categorical":
        loss_mode = "categorical_crossentropy"
    #set class_mode
    print("class_mode:{}, loss_mode:{}".format(args.class_mode, loss_mode))
    if (not os.path.exists(args.train_dir)) or (not os.path.exists(args.val_dir)):
        print("directories do not exist")
        sys.exit(1)
    train(args)
I had this problem on several tasks when the learning rate was too high. Try something like 0.0001 or even less.
According to the Keras documentation, the default learning rate is 0.001:
keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)
See https://keras.io/optimizers/#rmsprop
I found that if I use the SGD or Adam optimizer, the accuracy goes up normally. So is there something wrong with using the RMSprop optimizer with the default learning rate of 0.001?