How to fine-tune on the CoLA dataset using transformers and PyTorch? - deep-learning

I am trying to use RobertaForSequenceClassification as my backbone and torch.nn.parallel.DistributedDataParallel for data-parallel training.
My questions are as follows.
Is the Matthews correlation metric used for both training and evaluation, or just for evaluation? Is the loss function for the CoLA dataset nn.CrossEntropyLoss or the Matthews correlation?
What should I feed into the model? Is the code below OK?
train_dataset.set_format(type='torch', columns=['input_ids','labels','attention_mask'])
val_dataset.set_format(type='torch', columns=['input_ids','labels','attention_mask'])
In the RobertaForSequenceClassification source code
https://github.com/huggingface/transformers/blob/198c335d219a5eb4d3f124fdd1ce1a9cd9f78a9b/src/transformers/models/roberta/modeling_roberta.py#L526
is the attention_mask the same after being passed through each layer_module in the RobertaEncoder loop?
If you could give me some PyTorch & Hugging Face code that doesn't use the Hugging Face Trainer, that would be great!
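Below is a minimal sketch of a manual fine-tuning loop without the Hugging Face Trainer, under these assumptions: the tokenizer has already padded every example to the same length, CrossEntropyLoss is the training objective (RobertaForSequenceClassification computes it internally when labels are passed), and Matthews correlation is computed only for evaluation. The DistributedDataParallel setup (process group init, DistributedSampler) is omitted for brevity, and the hyperparameters are illustrative.

import torch
from torch.utils.data import DataLoader
from transformers import RobertaForSequenceClassification
from sklearn.metrics import matthews_corrcoef

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32)

for epoch in range(3):
    model.train()
    for batch in train_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)          # passing labels makes the model compute CrossEntropyLoss
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    model.eval()                          # Matthews correlation is only an evaluation metric
    preds, refs = [], []
    with torch.no_grad():
        for batch in val_loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            logits = model(**batch).logits
            preds.extend(logits.argmax(dim=-1).cpu().tolist())
            refs.extend(batch["labels"].cpu().tolist())
    print(epoch, matthews_corrcoef(refs, preds))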

Related

Use LSTM to forecast Precipitation

I built an LSTM to forecast precipitation, but it doesn't work well.
My code is very simple and the data is very short, containing only 720 points.
I use MinMaxScaler to scale the data.
This is my code, with seq_len = 12:
from tensorflow.keras import Sequential, layers

model = Sequential([
    layers.LSTM(2, input_shape=(SEQ_LEN, 1)),
    layers.Dense(1)])
My data looks like this, and the output compared with the true values looks like this.
I use Adam and the MAE loss function, with epochs=10.
Is it underfitting? Or is it that this simple net can't do this work?
The r2_score is no more than 0.55.
Please tell me how to adjust it. Thanks.
There are so many options:
First of all, it would be better to find the optimal window size by changing the length of the input sequences.
The second option would be changing the batch size of the dataset.
Change the optimizer to SGD because of the small number of data points, and before training the model find a good learning rate by setting a learning rate schedule callback (a rough sketch is shown after this list).
Try another model architecture, for example with convolutional layers.
Sometimes it helps to add a Lambda layer after the last layer to scale up the values, because the LSTM's default activation function is tanh.
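As a concrete example of the learning-rate-schedule suggestion, here is a rough Keras sketch that sweeps the learning rate upward over epochs so you can pick a good fixed value from the loss curve; the architecture, the schedule, and the x_train / y_train names are illustrative assumptions, not the original code.

import tensorflow as tf
from tensorflow.keras import Sequential, layers

SEQ_LEN = 12
model = Sequential([
    layers.LSTM(16, input_shape=(SEQ_LEN, 1)),
    layers.Dense(1),
])

# increase the learning rate a little every epoch; plotting loss against the
# learning rate afterwards suggests a good value to train with for real
lr_schedule = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-4 * 10 ** (epoch / 10))

model.compile(optimizer=tf.keras.optimizers.SGD(momentum=0.9), loss="mae")
history = model.fit(x_train, y_train, epochs=30, batch_size=16, callbacks=[lr_schedule])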

Train an LSTM text generator against a function instead of data

I have a model (G) that generates random molecules in their string representation such as COc1ccc2[C##H]3.
However, these generated molecules are not guaranteed to be valid (chemically).
For this, I have a function that checks whether a given molecule is valid or not in the following form:
def check_validity(molecule_string):
    ...
    ...
    if valid:
        return 1
    else:
        return 0
My question is: how can I train my model (G) against the check_validity function in an adversarial way, in order to force it to generate valid molecules? Which loss function is the most suitable, and how do I include it in a training loop?
Note: I am using Pytorch.
Training adversarially involves training a discriminator model (essentially a separate model from your generator model). You can directly train a model with input: the string representation of a molecule, and output: a classification as a real or fake molecule. This would have to be trained with enough data to allow the model to infer the rules that make molecules valid or invalid.
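A minimal PyTorch sketch of such a discriminator, assuming molecules are encoded as padded sequences of token IDs; the encoding step, the sizes, and the names below are illustrative assumptions, not part of the original question.

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    # classifies a token-ID sequence as a valid (1) or invalid (0) molecule
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids):
        emb = self.embed(token_ids)          # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(emb)         # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1]).squeeze(-1)  # raw logits, shape (batch,)

disc = Discriminator(vocab_size=64)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(disc.parameters(), lr=1e-3)

# token_ids: a batch of encoded molecule strings; targets: 0/1 labels from check_validity
# loss = criterion(disc(token_ids), targets.float()); loss.backward(); optimizer.step()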

What should be the loss function for classification problem in pytorch if sigmoid is used in the output layer

I am trying to implement a model for a binary classification problem. Until now, I have been using the softmax function (at the output layer) together with torch.nn.NLLLoss to calculate the loss. However, now I want to use the sigmoid function (instead of softmax) at the output layer. If I do that, should I also change the loss function (to BCELoss or binary_cross_entropy), or may I still use torch.nn.NLLLoss?
If you use the sigmoid function, then you can only do binary classification. It's not possible to do multi-class classification. The reason for this is that the sigmoid function always returns a value in the range between 0 and 1. So, for instance, one can threshold the value at 0.5 and separate (or classify) inputs into two classes based on the obtained values.
Regarding the objective function: NLLLoss is the Negative Log Likelihood Loss. It just learns the data distribution, so it's not a problem as long as that is what you're trying to achieve during training.
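For reference, here is a small sketch of how the two pairings are typically wired up in PyTorch; BCEWithLogitsLoss is used because it folds the sigmoid into the loss for numerical stability, and the tensors are random placeholders rather than the original poster's data.

import torch
import torch.nn as nn

logits = torch.randn(8, 1)                       # raw scores for 8 examples, 1 output unit
targets = torch.randint(0, 2, (8, 1)).float()

# Option 1: single sigmoid output unit -> binary cross-entropy
bce = nn.BCEWithLogitsLoss()                     # applies sigmoid internally
loss_bce = bce(logits, targets)

# Option 2: two output units -> log-softmax + NLLLoss (same as CrossEntropyLoss on raw logits)
logits2 = torch.randn(8, 2)
targets2 = targets.squeeze(1).long()
loss_nll = nn.NLLLoss()(nn.functional.log_softmax(logits2, dim=1), targets2)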

Predicting continuous valued output

I am working on predicting Semantic Textual Similarity (SemEval 2017 Task 1) between a pair of texts. The similarity score (output) is a continuous value in [0,5]. The neural network model (link below), therefore, has 6 units in the final layer for predicting values in [0,5]. The objective function used is the Pearson correlation coefficient, and softmax activation is used. Now, in order to train the model, how can I give the target output values to the model? Since there are 6 output classes, I should probably send one-hot-encoded vectors of the output. In that case, how can we convert the output (which might be a float value such as 2.33) to a one-hot vector of length 6? Or is there any other way of specifying the target output and training the model?
Paper: http://nlp.arizona.edu/SemEval-2017/pdf/SemEval016.pdf
If the value you're trying to predict is continuously-defined, you might be better off configuring this as a regression architecture. This will be simpler to train and interpret and will give you non-integer predictions (which you can then bucket or threshold however you please).
In order to do this, replace your softmax layer with a layer containing a single neuron with a linear activation function. Then you can simply train this network using your real-valued similarity numbers as the targets. For the loss function, you can use MSE / L2 unless you have a reason to do otherwise.
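A minimal sketch of that change, assuming a Keras-style model; the 300-dimensional input and the single hidden layer stand in for whatever encoder the paper's model uses and are purely illustrative.

import tensorflow as tf
from tensorflow.keras import Sequential, layers

model = Sequential([
    layers.Dense(128, activation="relu", input_shape=(300,)),  # placeholder for the real encoder
    layers.Dense(1, activation="linear"),  # single linear unit instead of a 6-way softmax
])

# train directly on the real-valued similarity scores in [0, 5]
model.compile(optimizer="adam", loss="mse")
# model.fit(x_train, y_train, epochs=10)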

How to extract rows of the same value in 2D keras tensor?

I am new to Keras and am currently working on a project that needs to extract rows with the same labels. Explained below:
k_tensor = [Batch, Labels]
where Batch is the batch size in deep learning training and the labels could be thought of as categories. I want to extract the rows whose labels have the same value.
How can I do this in Keras? :)
Thanks
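One possible approach, assuming the labels are available as a 1-D tensor alongside the batch; the tensor names and values below are made up for illustration.

import tensorflow as tf

# a batch of 4 rows and the label assigned to each row (illustrative values)
k_tensor = tf.constant([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]])
labels = tf.constant([1, 0, 1, 2])

# keep only the rows whose label equals 1
mask = tf.equal(labels, 1)
same_label_rows = tf.boolean_mask(k_tensor, mask)  # -> [[0.1, 0.2], [0.5, 0.6]]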