What to do if my model gets stuck? Natural Language Processing Model [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 8 days ago.
I have been changing some parameters, adding layers, and adding standardization, but the model seems to get stuck at a constant high accuracy after 1-2 epochs. It is currently at 0.90 accuracy on both the training and validation data.
The samples are heavily imbalanced, with class proportions of 10:3:2.
I am using a GRU as my model, as in the code below:
# Using GRU
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import GRU, Dropout, Dense

seed = 11
tf.keras.backend.clear_session()
np.random.seed(seed)
tf.random.set_seed(seed)
# text_vectorization and embedding are layers defined elsewhere (not shown in the question)
model = Sequential()
model.add(text_vectorization)
model.add(embedding)
model.add(GRU(32, return_sequences=True))
model.add(Dropout(0.5))
model.add(tf.keras.layers.BatchNormalization())
model.add(GRU(32))
model.add(Dropout(0.5))
model.add(tf.keras.layers.BatchNormalization())
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model_gru = model.fit(X_train, y_train, epochs=20, validation_data=(X_test, y_test))
I expected the accuracy to keep changing rather than stay constant after 1 or 2 epochs.
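As a rough sanity check on the 10:3:2 class proportions mentioned in the question, the sketch below computes the accuracy of a model that always predicts the largest class, plus a set of inverse-frequency class weights. The weighting formula (total / (n_classes * count)) is a common heuristic and an assumption here, not something the question uses.

```python
# Class proportions taken from the question (10:3:2).
counts = [10, 3, 2]
total = sum(counts)

# Accuracy obtained by always predicting the largest class.
majority_baseline = max(counts) / total

# Inverse-frequency weights: total / (n_classes * count). This is a common
# heuristic and an assumption, not part of the original question.
class_weight = {i: total / (len(counts) * c) for i, c in enumerate(counts)}

print(round(majority_baseline, 3))
print(class_weight)
```

The reported 0.90 accuracy is above this majority-class baseline, which suggests the model is not simply predicting the largest class. A dictionary like `class_weight` could be passed to the `class_weight` argument of Keras `model.fit` if reweighting is wanted.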

Related

What is the Mathematical formula for sparse categorical cross entropy loss? [closed]

Closed. This question is not about programming or software development. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 6 months ago.
Can anyone help me with the mathematics of the sparse categorical cross entropy loss function? I have searched for a derivation or mathematical explanation but couldn't find any.
I know this is not the right place to ask a question like this, but I am stuck.

It is just cross entropy loss. The "sparse" refers to the representation of the target that it expects, for efficiency reasons. E.g. in Keras the label is expected to be an integer i*, the index for which target[i*] = 1.
CE(target, pred) = -1/n SUM_k [ SUM_i target_ki log pred_ki ]
Here k runs over the n samples and i over the classes. Since the target is sparse (exactly one entry of each target row is 1), we have
sparse-CE(int_target, pred) = -1/n SUM_k [ log pred_k{int_target_k} ]
So instead of summing over the label dimension we just index, since we know all the remaining entries are 0 anyway.
Overall, as long as each target belongs to exactly one class, we have:
CE(target, pred) = CE(onehot(int_target), pred) = sparse-CE(int_target, pred)
The only reason for this distinction is efficiency. For regular classification with ~10-100 classes it does not really matter, but imagine word-level language models where we have thousands of classes.
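The equivalence above can be checked numerically. The following is a minimal sketch in plain Python rather than the Keras implementation; the function names and the example probabilities are made up for illustration, and predictions are assumed to be strictly positive.

```python
import math

def cross_entropy(targets, preds):
    """Dense form: sum over samples k and classes i of target_ki * log(pred_ki)."""
    n = len(targets)
    total = sum(t * math.log(p)
                for t_row, p_row in zip(targets, preds)
                for t, p in zip(t_row, p_row))
    return -total / n

def sparse_cross_entropy(int_targets, preds):
    """Sparse form: index the predicted probability of the true class directly."""
    n = len(int_targets)
    return -sum(math.log(p_row[k]) for k, p_row in zip(int_targets, preds)) / n

def one_hot(index, num_classes):
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

# Hypothetical predictions for three samples and three classes.
preds = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
int_targets = [0, 1, 2]

dense = cross_entropy([one_hot(k, 3) for k in int_targets], preds)
sparse = sparse_cross_entropy(int_targets, preds)
print(math.isclose(dense, sparse))
```

The dense version touches every class for every sample, while the sparse version does a single lookup per sample, which is where the efficiency gain for large vocabularies comes from.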

Confusion about the shape of the output logits from Resnet [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 1 year ago.
I am trying to understand why the shape of the output logits from the Resnet18 model I am working with is (27, 19). The 19 I understand: that is the number of classes I have set the model to predict. The 27 is the part that I am confused about. I have a batch size of 64, so I would have thought the shape of the logits would be (64, 19), because that would give me 1 prediction vector for each image in the batch...

Turns out I was looking at the logits from the last batch in my epoch, and there weren't enough images left to fill up the entire 64 batch size so it only has 27 images left to train on.

You got it.
The Torch DataLoader did this because its drop_last argument defaults to False. If you set it to True, the incomplete final batch is dropped, so every batch of logits has shape (64, 19).
https://pytorch.org/docs/stable/data.html
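The batch-size arithmetic behind this can be sketched without PyTorch. The helper below is hypothetical and only mirrors how a loader splits a dataset into batches; the dataset size is also an assumption, since the question does not state it.

```python
def batch_sizes(num_samples, batch_size, drop_last=False):
    """Return the size of every batch produced in one pass over the data."""
    full, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        sizes.append(remainder)  # the smaller, final batch
    return sizes

# Hypothetical dataset size chosen so that 27 samples are left over.
num_samples = 10 * 64 + 27

print(batch_sizes(num_samples, 64)[-1])                  # 27
print(batch_sizes(num_samples, 64, drop_last=True)[-1])  # 64
```

With drop_last enabled the 27 leftover samples are skipped in that epoch, so whether to enable it is a trade-off between uniform batch shapes and using every sample.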

How do I use two loss for two dataset in one pytorch nn? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
I am quite new to PyTorch and deep learning. Here is my question. I have two different datasets with the same feature domain, sharing one neural network for a regression problem. The input is the features and the output is the target value. The first dataset uses a normal loss, while for the second dataset I am trying to create a new loss.
I have searched for multi-loss problems; people usually sum the two losses for the backward pass. But I want to use the losses in turn. (When I train on the first dataset, the network uses the first loss, and when I train on the second dataset, the network uses the other loss.)
Is this possible to do? I would appreciate it if anyone has some ideas.

The loss function does not necessarily have to do with network topology. You can use the corresponding loss with each dataset you use, e.g.
if first_task:
    dataloader = torch.utils.data.DataLoader(first_dataset)
    loss_fn = first_loss_fn
else:
    dataloader = torch.utils.data.DataLoader(second_dataset)
    loss_fn = second_loss_fn
# The pytorch training loop, very roughly
for batch in dataloader:
    x, y = batch
    optimizer.zero_grad()
    loss = loss_fn(network(x), y)  # calls the corresponding loss function
    loss.backward()
    optimizer.step()
You can do this for the two datasets sequentially (meaning you interleave by epochs):
for batch in dataloader_1:
    ...
    loss = first_loss_fn(...)
for batch in dataloader_2:
    ...
    loss = second_loss_fn(...)
or better (note that ChainDataset only accepts iterable-style datasets, i.e. IterableDataset instances)
dataset = torch.utils.data.ChainDataset([first_dataset, second_dataset])
dataloader = torch.utils.data.DataLoader(dataset)
You can also train on both simultaneously (interleave by examples). The standard way, I think, would be to use torch.utils.data.ConcatDataset
dataset = torch.utils.data.ConcatDataset([first_dataset, second_dataset])
dataloader = torch.utils.data.DataLoader(dataset)
Note that here you need each sample to store information about the dataset it comes from so you can determine which cost to apply.
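One way to carry that information is a thin wrapper that appends a source id to every sample. The sketch below uses plain Python lists as stand-ins for the datasets; the class name `TaggedDataset`, the example values, and the two toy loss functions are hypothetical. In PyTorch the wrapper would normally subclass torch.utils.data.Dataset.

```python
class TaggedDataset:
    """Map-style wrapper that appends a source id to every (x, y) sample."""

    def __init__(self, dataset, source_id):
        self.dataset = dataset
        self.source_id = source_id

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        x, y = self.dataset[index]
        return x, y, self.source_id

# Hypothetical (x, y) pairs standing in for the two datasets.
first_dataset = [(0.1, 1.0), (0.2, 2.0)]
second_dataset = [(0.3, 3.0)]
tagged = [TaggedDataset(first_dataset, 0), TaggedDataset(second_dataset, 1)]

# Toy per-sample losses, selected by the source id of each sample.
loss_fns = {
    0: lambda pred, y: (pred - y) ** 2,
    1: lambda pred, y: abs(pred - y),
}

sources = []
for ds in tagged:
    for i in range(len(ds)):
        x, y, source = ds[i]
        loss = loss_fns[source](x, y)  # x stands in for the model output here
        sources.append(source)

print(sources)
```

Inside a real training loop the source id would arrive as a per-batch tensor, so the two losses would be applied through a mask over the batch rather than a Python loop.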
A simpler way would be to interleave by batches (then you apply the same cost to the entire batch). For this case, one proposed approach is to use separate dataloaders (this way you get flexibility in how often to sample each of them).
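Interleaving by batches with separate loaders could look roughly like the generator below. It is a sketch under assumptions: the loaders can be any iterables (lists stand in for DataLoader objects here), the alternation is strictly one batch from each, and all names are illustrative.

```python
from itertools import zip_longest

def interleave_batches(loader_1, loader_2, loss_fn_1, loss_fn_2):
    """Yield (batch, loss_fn) pairs, alternating between the two loaders.

    When one loader runs out, the remaining batches of the other are
    yielded on their own.
    """
    missing = object()  # sentinel marking an exhausted loader
    for b1, b2 in zip_longest(loader_1, loader_2, fillvalue=missing):
        if b1 is not missing:
            yield b1, loss_fn_1
        if b2 is not missing:
            yield b2, loss_fn_2

# Placeholder losses and batches, for illustration only.
def first_loss_fn(pred, y):
    return (pred - y) ** 2

def second_loss_fn(pred, y):
    return abs(pred - y)

loader_1 = ["a1", "a2", "a3"]
loader_2 = ["b1"]

schedule = [(batch, fn.__name__)
            for batch, fn in interleave_batches(loader_1, loader_2,
                                                first_loss_fn, second_loss_fn)]
print(schedule)
```

In a training loop, each yielded pair would replace the `batch` and `loss_fn` used in the loop shown earlier. Sampling one dataset more often than the other would only require changing the alternation pattern inside the generator.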

Is there a many to many convolution in Pytorch? is this a thing? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
I have been thinking about convolutions recently. There are the common 3x3 convs, where the information under a (3,3) kernel is weighted and aggregated to supply information to a single spatial point on the output. There are also 3x3 upconvs, where a single spatial point on the input supplies weighted information to a 3x3 output space.
The conv is a many-to-one relationship and the upconv is a one-to-many relationship.
However, I have never heard of a many-to-many conv. Is there such a thing? For example, a 3x3 kernel supplying information to another 3x3 kernel. I would like to experiment with it in PyTorch. My internet searching has not revealed anything.

You can combine pixel shuffle and averaging to get what you want.
For example, if you want a 3x3 -> 3x3 mapping from in_channels to out_channels:
from torch import nn
import torch.nn.functional as nnf

class ManyToManyConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, in_kernel, out_kernel):
        super().__init__()
        self.out_kernel = out_kernel
        self.conv = nn.Conv2d(in_channels, out_channels * out_kernel * out_kernel, in_kernel)

    def forward(self, x):
        y = self.conv(x)  # all the output kernels are "folded" into the channel dim
        y = nnf.pixel_shuffle(y, self.out_kernel)  # "unfold" the out_kernel - image size is out_kernel times bigger
        y = nnf.avg_pool2d(y, self.out_kernel)
        return y

Why decimal numbers are not stored as expected in MySQL [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
I'm trying to save numbers in a field with the decimal data type, defined as (10,4), but they are not stored as expected, e.g. 13850 is changed to 13.0000. Any help?
This is my code:
$c_price = $unit_price*$rate;
$expense->c_price = number_format($c_price, 4);
$expense->c_total = number_format($quantity*$c_price, 4);
Here the c_price and c_total values are the ones that get changed.

Increase the length of digits:
(19,4)
it will work.
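Another possible cause, offered here as an assumption rather than something stated in the answer above: PHP's number_format inserts a comma as the thousands separator by default, so 13850 becomes the string "13,850.0000". When such a string is converted to a number, only the leading numeric part may be kept, which would match the reported 13.0000. The Python sketch below only imitates that behavior; it is not PHP or MySQL code.

```python
import re

c_price = 13850

# Imitates PHP's number_format($c_price, 4): 4 decimals, comma thousands separator.
formatted = f"{c_price:,.4f}"

# Imitates a lenient string-to-number conversion that keeps only the
# leading numeric prefix and ignores everything from the comma onward.
prefix = re.match(r"[+-]?\d+(\.\d+)?", formatted).group(0)

print(formatted)  # 13,850.0000
print(prefix)     # 13
```

If this is the cause, passing an empty thousands separator in PHP, e.g. number_format($c_price, 4, '.', ''), or storing the unformatted number would avoid the comma.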
