What is the correct order and usage of train_test_split, PolynomialFeatures, StandardScaler, Lasso? - regression

I am a little bit confused here; any help is highly appreciated.
I want to use train_test_split, PolynomialFeatures, StandardScaler, and Lasso regression on a dataset.
What is the correct order and usage of train_test_split, PolynomialFeatures, StandardScaler, and Lasso?
Also, can you explain why we should use that specific order? An explanation would be great, or any link that explains this.
Thanks in advance. My code is given below:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Lasso

s = StandardScaler()
pf = PolynomialFeatures(degree=2, include_bias=False)
las = Lasso()

# Expand the features on the full dataset, then split, then scale
X_pf = pf.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_pf, y, test_size=0.3, random_state=72018)
X_train_s = s.fit_transform(X_train)
las.fit(X_train_s, y_train)
X_test_s = s.transform(X_test)
y_pred = las.predict(X_test_s)
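A sketch of the conventional order, assuming the goal is to avoid any leakage from the test set: split the raw data first, then fit PolynomialFeatures and StandardScaler on the training portion only, and merely transform the test portion. scikit-learn's Pipeline enforces this ordering automatically:
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Lasso

# Split the raw features first so the test set stays completely unseen
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=72018)

# fit() calls fit_transform on each step using the training data only;
# predict() applies the already-fitted transforms to the test data
pipe = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), StandardScaler(), Lasso())
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)
The reason for this order: any step with a fit stage (here mainly the scaler; PolynomialFeatures only records the number of input features when fitted) should see training data exclusively, so that the test score estimates performance on genuinely unseen data.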

Related

Why does my ML model always show the same result?

I've already trained several models for a binary classification problem, basing my selection on F-score and AUC. The code I used is the following:
svm = StandardScaler()  # note: despite the name, this is the scaler, not the SVM
svm.fit(feat_train)
feat_train_std = svm.transform(feat_train)
feat_test_std = svm.transform(feat_test)
model_10 = BalancedBaggingClassifier(base_estimator=SVC(C=1.0, random_state=1, kernel='linear'),
                                     sampling_strategy='auto',
                                     replacement=False,
                                     random_state=0)
model_10.fit(feat_train_std, target_train)
pred_target_10 = model_10.predict(feat_test)
mostrar_resultados(target_test, pred_target_10)
pred_target_10 = model_10.predict_proba(feat_test)[:, 1]
average_precision_10 = average_precision_score(target_test, pred_target_10)
precision_10, recall_10, thresholds = precision_recall_curve(target_test, pred_target_10)
auc_precision_recall_10 = auc(recall_10, precision_10)
disp_10 = plot_precision_recall_curve(model_10, feat_test, target_test)
disp_10.ax_.set_title('Binary class Precision-Recall curve: '
                      'AUC={0:0.2f}'.format(auc_precision_recall_10))
Afterwards, I save and load the model as follows:
modelo_pickle = 'modelo_pickle.pkl'
joblib.dump(model_10, modelo_pickle)
loaded_model = joblib.load(modelo_pickle)
Then, the aim is to load a new dataset, whose columns are the same as the model's variables, and make a prediction for each row:
lista_x = x.to_numpy().tolist()
resultados = []
for i in lista_x:
    pred = loaded_model.predict([i])
    resultados.append(pred)
print(resultados)
However, every single result is equal to 1, which does not make any sense. Would anyone tell me what I am missing, please?
Thank you in advance.
Regards,
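One thing that stands out (an observation, not a confirmed diagnosis): the classifier was trained on standardized features, but predict is called on the raw feat_test and on the raw new dataset, so the new inputs live on a different scale than anything seen during training. A minimal sketch of persisting and reapplying the fitted scaler, with an illustrative file name scaler_pickle.pkl:
import joblib

# Save the fitted scaler together with the model (file names are illustrative)
joblib.dump(svm, 'scaler_pickle.pkl')
joblib.dump(model_10, 'modelo_pickle.pkl')

loaded_scaler = joblib.load('scaler_pickle.pkl')
loaded_model = joblib.load('modelo_pickle.pkl')

# Standardize the new data with the training-time scaler before predicting
x_std = loaded_scaler.transform(x)
resultados = loaded_model.predict(x_std)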

Snake Species Image Classification performing poorly

I just started out with deep learning and I have a lot to learn yet. My first project tries to classify 5 different species of snake using a total of 17,389 images across the 5 classes (about 3,500 per class). I have used a pretty small model; I am sure an even smaller one would have worked fine. But my accuracy never goes above 30% (max 50%) and the loss stays above 1. I am totally new to this, with just some knowledge of how different activations and layers work. I have tried tweaking the model, but it does not improve.
I haven't evaluated on the test set because the results are already pretty bad.
I have done whatever basic preprocessing I know about.
Any sort of help would be greatly appreciated :)
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    data_format="channels_last",
    validation_split=0.25)
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical',
    subset='training',
    shuffle=True)
validation_generator = train_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical',
    subset='validation',
    shuffle=True)

from keras.utils.np_utils import to_categorical
train_labels = train_generator.classes
num_classes = len(train_generator.class_indices)
train_labels = to_categorical(train_labels, num_classes=num_classes)
print(train_labels)

# Creating a Sequential model
model = Sequential()
model.add(Conv2D(kernel_size=(3, 3), filters=32, activation='tanh', input_shape=(150, 150, 3)))
model.add(Conv2D(filters=30, kernel_size=(3, 3), activation='tanh'))
model.add(MaxPool2D(2, 2))
model.add(Conv2D(filters=30, kernel_size=(3, 3), activation='tanh'))
model.add(MaxPool2D(2, 2))
model.add(Conv2D(filters=30, kernel_size=(3, 3), activation='tanh'))
model.add(Flatten())
model.add(Dense(20, activation='relu'))
model.add(Dense(15, activation='relu'))
model.add(Dense(5, activation='softmax'))
model.compile(
    loss='categorical_crossentropy',
    metrics=['acc'],
    optimizer='adam')
history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // validation_generator.batch_size)
Can you please help me with where I am going wrong? I guess everywhere.
When you pass class_mode='categorical', your labels are already one-hot encoded, so you don't need train_labels = to_categorical(train_labels, num_classes=num_classes) a second time. For detailed info you can refer to the docs.
Also, your Conv2D layers use a tanh activation; it is better to use relu:
model.add(Conv2D(filters=30,kernel_size = (3,3),activation='relu'))
And try to increase your filters as you go deeper into the network, like this:
model.add(Conv2D(filters=64,kernel_size = (3,3),activation='relu'))
model.add(MaxPool2D(2,2))
model.add(Conv2D(filters=128,kernel_size = (3,3),activation='relu'))
model.add(MaxPool2D(2,2))
model.add(Conv2D(filters=256,kernel_size = (3,3),activation='relu'))
After flattening, use a Dense layer with more units:
model.add(Dense(128,activation='relu'))
model.add(Dense(5,activation = 'softmax'))
You also don't need to set steps_per_epoch or validation_steps explicitly: validation_generator.samples // validation_generator.batch_size is equivalent to len(validation_generator) in this case, which Keras infers on its own.
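With those arguments dropped, the training call simplifies to the following (same variable names as above):
history = model.fit(
    train_generator,
    epochs=epochs,
    validation_data=validation_generator)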

Sequence to Sequence Loss

I'm trying to figure out how sequence-to-sequence loss is calculated. I am using the huggingface transformers library in this case, but this might actually be relevant to other DL libraries.
So to get the required data we can do:
from transformers import EncoderDecoderModel, BertTokenizer
import torch
import torch.nn.functional as F
torch.manual_seed(42)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
MAX_LEN = 128
tokenize = lambda x: tokenizer(x, max_length=MAX_LEN, truncation=True, padding=True, return_tensors="pt")
model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased') # initialize Bert2Bert from pre-trained checkpoints
input_seq = ["Hello, my dog is cute", "my cat cute"]
output_seq = ["Yes it is", "ok"]
input_tokens = tokenize(input_seq)
output_tokens = tokenize(output_seq)
outputs = model(
    input_ids=input_tokens["input_ids"],
    attention_mask=input_tokens["attention_mask"],
    decoder_input_ids=output_tokens["input_ids"],
    decoder_attention_mask=output_tokens["attention_mask"],
    labels=output_tokens["input_ids"],
    return_dict=True)
idx = output_tokens["input_ids"]
logits = F.log_softmax(outputs["logits"], dim=-1)
mask = output_tokens["attention_mask"]
Edit 1
Thanks to @cronoik I was able to replicate the loss calculated by huggingface as being:
output_logits = logits[:, :-1, :]
output_mask = mask[:, :-1]
label_tokens = output_tokens["input_ids"][:, 1:].unsqueeze(-1)
select_logits = torch.gather(output_logits, -1, label_tokens).squeeze()
huggingface_loss = -select_logits.mean()
However, since the last two tokens of the second output sequence are just padding, shouldn't we calculate the loss as:
seq_loss = (select_logits * output_mask).sum(dim=-1, keepdims=True) / output_mask.sum(dim=-1, keepdims=True)
seq_loss = -seq_loss.mean()
This takes into account the length of each output sequence and masks out the padding. I think this is especially useful when we have batches of varying-length outputs.
OK, I found out where I was making the mistakes. This is all thanks to this thread in the HuggingFace forum.
The output labels need to be set to -100 at the masked positions; the transformers library does not do it for you.
One silly mistake I made was with the mask. It should have been output_mask = mask[:, 1:] instead of mask[:, :-1].
1. Using Model
We need to set the masked positions of the labels to -100. It is important to use clone, as shown below:
labels = output_tokens["input_ids"].clone()
labels[output_tokens["attention_mask"]==0] = -100
outputs = model(
    input_ids=input_tokens["input_ids"],
    attention_mask=input_tokens["attention_mask"],
    decoder_input_ids=output_tokens["input_ids"],
    decoder_attention_mask=output_tokens["attention_mask"],
    labels=labels,
    return_dict=True)
2. Calculating Loss
So the final way to replicate it is as follows:
idx = output_tokens["input_ids"]
logits = F.log_softmax(outputs["logits"], dim=-1)
mask = output_tokens["attention_mask"]
# shift things
output_logits = logits[:, :-1, :]
label_tokens = idx[:, 1:].unsqueeze(-1)
output_mask = mask[:, 1:]
# gather the log-probabilities at the label indices, mask out padding;
# the masked mean should match the loss returned by the model
select_logits = torch.gather(output_logits, -1, label_tokens).squeeze()
-select_logits[output_mask == 1].mean(), outputs["loss"]
The above, however, ignores the fact that the tokens come from two different sequences. So an alternative way of calculating the loss, averaging each sequence first, could be:
seq_loss = (select_logits * output_mask).sum(dim=-1, keepdims=True) / output_mask.sum(dim=-1, keepdims=True)
-seq_loss.mean()  # negate, as above, since select_logits are log-probabilities
Thanks for sharing. However, the current version of transformers actually does not "shift" anymore, so the following is no longer needed:
# shift things
output_logits = logits[:, :-1, :]
label_tokens = idx[:, 1:].unsqueeze(-1)
output_mask = mask[:, 1:]
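If the shift has indeed been dropped, the replication presumably reduces to gathering at aligned positions (a sketch under that assumption, reusing logits, idx, and mask from above):
# no shift: logits and labels are assumed to be aligned position-by-position
label_tokens = idx.unsqueeze(-1)
select_logits = torch.gather(logits, -1, label_tokens).squeeze()
loss = -select_logits[mask == 1].mean()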

Why is my loss sometimes negative when I use WGAN?

When I use WGAN, sometimes the loss is negative. Is that expected?
My loss code:
self.g_loss = -tf.reduce_mean(d_logits_fake)
self.d_loss = tf.reduce_mean(d_logits_fake) - tf.reduce_mean(d_logits_real) + GP
I think that calculating g_loss and d_loss needs to use Model.train_on_batch. I don't know where d_logits_fake comes from.
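For context (background knowledge, not from the question): in typical WGAN implementations, d_logits_fake is the critic's output on generator samples, the critic outputs are unbounded scores rather than probabilities, and d_loss is a difference of their means, so negative values are normal. A toy computation with made-up critic scores illustrates this:
import tensorflow as tf

# Toy critic outputs: the critic scores real samples higher than fakes
d_logits_real = tf.constant([1.5, 2.0, 1.0])
d_logits_fake = tf.constant([-0.5, 0.0, -1.0])
GP = 0.1  # gradient-penalty term (illustrative value)

d_loss = tf.reduce_mean(d_logits_fake) - tf.reduce_mean(d_logits_real) + GP
print(float(d_loss))  # -1.9: negative, which is expected for a Wasserstein critic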

Keras sequence-to-sequence encoder-decoder part-of-speech tagging example with attention mechanism

I have a sequence of indexed words w_1, ..., w_n. Since I'm new to deep learning, I am looking for a simple implementation of a seq2seq POS-tagging model in Keras which uses an attention mechanism and produces a sequence of POS tags t_1, ..., t_n from my word sequence.
To be specific, I don't know how to gather the outputs of the encoder's LSTM hidden layers (since they are TimeDistributed) and how to feed the decoder LSTM layer at each time step with the outputs from step t-1 in order to generate the output at step t.
The model I'm thinking about looks like the one in this paper: http://arxiv.org/abs/1409.0473.
I think this issue will help you, though there is no attention mechanism.
https://github.com/fchollet/keras/issues/2654
The code included in the issue is as follows:
B = self.igor.batch_size
R = self.igor.rnn_size
S = self.igor.max_sequence_len
V = self.igor.vocab_size
E = self.igor.embedding_size
emb_W = self.igor.embeddings.astype(theano.config.floatX)

## dropout parameters
p_emb = self.igor.p_emb_dropout
p_W = self.igor.p_W_dropout
p_U = self.igor.p_U_dropout
p_dense = self.igor.p_dense_dropout
w_decay = self.igor.weight_decay

M = Sequential()
M.add(Embedding(V, E, batch_input_shape=(B, S),
                W_regularizer=l2(w_decay),
                weights=[emb_W], mask_zero=True, dropout=p_emb))
#for i in range(self.igor.num_lstms):
M.add(LSTM(R, return_sequences=True, dropout_W=p_W, dropout_U=p_U,
           U_regularizer=l2(w_decay), W_regularizer=l2(w_decay)))
M.add(Dropout(p_dense))
M.add(LSTM(R * int(1 / p_dense), return_sequences=True, dropout_W=p_W, dropout_U=p_U))
M.add(Dropout(p_dense))
M.add(TimeDistributed(Dense(V, activation='softmax',
                            W_regularizer=l2(w_decay), b_regularizer=l2(w_decay))))

print("compiling")
optimizer = Adam(self.igor.LR, clipnorm=self.igor.max_grad_norm,
                 clipvalue=5.0)
#optimizer = SGD(lr=0.01, momentum=0.5, decay=0.0, nesterov=True)
M.compile(loss='categorical_crossentropy', optimizer=optimizer,
          metrics=['accuracy', 'perplexity'])
print("compiled")
self.model = M
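Note that the snippet above uses the old Keras 1 API (W_regularizer, dropout_W, and a custom 'perplexity' metric). For orientation, here is a minimal sketch of the same sequence-tagging idea in current Keras, still without attention; vocab_size, num_tags, max_len, and the layer sizes are illustrative assumptions:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, TimeDistributed, Dense

vocab_size, num_tags, max_len = 10000, 17, 50  # illustrative values

model = Sequential([
    Embedding(vocab_size, 128, mask_zero=True),
    # return_sequences=True keeps one hidden state per time step,
    # which TimeDistributed then maps to a tag distribution per word
    Bidirectional(LSTM(64, return_sequences=True)),
    TimeDistributed(Dense(num_tags, activation='softmax')),
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
model.build(input_shape=(None, max_len))
model.summary()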