Confusion about the shape of the output logits from Resnet [closed] - deep-learning

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 1 year ago.
I am trying to understand why the shape of the output logits from the Resnet18 model I am working with is (27, 19). The 19 I understand: that is the number of classes I have set the model to predict. But the 27 is the part I am confused about. I have a batch size of 64, so I would have thought the shape of the logits would be (64, 19), because that would give me one prediction vector for each image in the batch...

Turns out I was looking at the logits from the last batch in my epoch; there weren't enough images left to fill an entire batch of 64, so it only had the remaining 27 images to train on.

You got it.
The PyTorch DataLoader does this because its drop_last argument defaults to False. If you set it to True, the incomplete final batch is dropped and the logits will always have shape (64, 19).
https://pytorch.org/docs/stable/data.html
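A minimal sketch of the difference (the dataset size of 155 = 2 × 64 + 27 and the image dimensions are made up for illustration; the Resnet18 forward pass is omitted):
import torch
from torch.utils.data import DataLoader, TensorDataset

# 155 fake images: two full batches of 64 plus a leftover batch of 27.
images = torch.randn(155, 3, 224, 224)
labels = torch.randint(0, 19, (155,))
dataset = TensorDataset(images, labels)

# drop_last=False (the default): the last batch holds only 27 samples,
# so the last logits tensor would have shape (27, 19).
# drop_last=True: the incomplete final batch is discarded entirely.
loader = DataLoader(dataset, batch_size=64, shuffle=True, drop_last=True)

for batch_images, batch_labels in loader:
    print(batch_images.shape)  # always torch.Size([64, 3, 224, 224])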

Related

What to do if my model gets stuck? Natural Language Processing Model [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 8 days ago.
I have been changing some params, adding layers, and also standardizing the data. But the model seems to get stuck at a constant high accuracy after 1-2 epochs. It is currently at 0.90 accuracy on both the training and validation data.
Samples are heavily imbalanced, class proportion: 10:3:2.
I am using a GRU as my model, as in the code below:
# Using GRU
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dropout, Dense

seed = 11
tf.keras.backend.clear_session()
np.random.seed(seed)
tf.random.set_seed(seed)

model = Sequential()
model.add(text_vectorization)  # TextVectorization layer defined elsewhere
model.add(embedding)           # Embedding layer defined elsewhere
model.add(GRU(32, return_sequences=True))
model.add(Dropout(0.5))
model.add(tf.keras.layers.BatchNormalization())
model.add(GRU(32))
model.add(Dropout(0.5))
model.add(tf.keras.layers.BatchNormalization())
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model_gru = model.fit(X_train, y_train, epochs=20, validation_data=(X_test, y_test))
I expected the accuracy to keep changing rather than staying constant after 1 or 2 epochs.

What is the Mathematical formula for sparse categorical cross entropy loss? [closed]

Closed. This question is not about programming or software development. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 6 months ago.
Can anyone help me with the mathematics of the sparse categorical cross-entropy loss function? I have searched for a derivation and a mathematical explanation but couldn't find any.
I know this is not the right place to ask a question like this, but I am helpless.
It is just cross entropy loss. The "sparse" refers to the representation it is expecting, for efficiency reasons. E.g. in Keras it is expected that the label provided is an integer i*, the index for which target[i*] = 1.
CE(target, pred) = -1/n SUM_k [ SUM_i target_ki log pred_ki ]
and since we have a sparse target, we have
sparse-CE(int_target, pred) = -1/n SUM_k [ log pred_k{int_target_k} ]
So instead of summing over the label dimension we just index, since we know all the remaining entries are 0s anyway.
And overall, as long as each target is exactly one class, we have:
CE(target, pred) = CE(onehot(int_target), pred) = sparse-CE(int_target, pred)
The only reason for this distinction is efficiency. For regular classification with ~10-100 classes it does not really matter, but imagine word-level language models where we have thousands of classes.
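As a quick sanity check (a small sketch with made-up numbers, using the Keras loss classes), the sparse and one-hot versions give the same value when each target is exactly one class:
import numpy as np
import tensorflow as tf

# Made-up predictions for 3 samples over 4 classes (each row sums to 1).
pred = np.array([[0.7, 0.1, 0.1, 0.1],
                 [0.2, 0.5, 0.2, 0.1],
                 [0.1, 0.1, 0.1, 0.7]])

int_target = np.array([0, 1, 3])        # integer labels
onehot_target = np.eye(4)[int_target]   # the same labels, one-hot encoded

sparse_ce = tf.keras.losses.SparseCategoricalCrossentropy()
dense_ce = tf.keras.losses.CategoricalCrossentropy()

# Both reduce to -1/n SUM_k log pred_k{int_target_k}.
print(float(sparse_ce(int_target, pred)))    # ~0.4688
print(float(dense_ce(onehot_target, pred)))  # same value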

Octave - How to plot an "infinite" (= defined on [0:35916] for me) sawtooth function [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 4 years ago.
I know how to plot a sawtooth function (thanks to another forum), but only on the domain [0:10], thanks to the following code, which actually works:
t=0:0.04:10;
A=1;
T=1;
rho= mod(t * A / T, A);
plot(t,rho)
A = the amplitude, T = the period, t = the time interval.
The problem is that I need the same function on the domain [0:35916], but when I try to adapt this code to do so (e.g. by extending the time interval), I get an error and I don't understand why.
error: plt2vv: vector lengths must match
error: called from
    plt>plt2vv at line 487 column 5
    plt>plt2 at line 246 column 14
    plt at line 113 column 17
    plot at line 222 column 10
Simply modifying the original upper limit of your interval from 10 to 35916 should do the trick:
t=0:0.04:35916;
A=1;
T=1;
rho= mod(t * A / T, A);
plot(t,rho)
The code above yields the expected sawtooth plot over the whole [0:35916] range (image omitted).
Of course it is up to you to adjust A and T to suit your needs.

Why decimal numbers are not stored as expected in MySQL [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
I'm trying to save numbers in a DECIMAL(10,4) field, but they are not stored as expected, e.g. 13850 gets changed to 13.0000. Any help?
This is my code:
$c_price = $unit_price*$rate;
$expense->c_price = number_format($c_price, 4);
$expense->c_total = number_format($quantity*$c_price, 4);
Here, the c_price and c_total values are the ones being changed.
Increase the precision of the column:
DECIMAL(19,4)
and it will work.

Vectorization or sum as matrix operations [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Let there be the following definition of the gradient descent cost function
J(theta) = 1/(2m) SUM_i (h_theta(x_i) - y_i)^2
with the hypothesis function defined as
h_theta(x) = theta' * x
What I've come up with for multivariate linear regression is
theta = theta - alpha * 1/m * ([theta', -1]*[X';y']*X)';
h_theta = 1/(2*m)* (X*theta - y)'*(X*theta-y);
(Octave notation: ' means matrix transpose, [A, n] means appending a new column with scalar value n to matrix A, [A; B] means appending matrix B to matrix A row-wise.)
It's doing its job correctly as far as I can tell (the plots look OK), however I have a strong feeling that it's unnecessarily complicated.
How can I write it with as few matrix operations as possible (and no element-wise operations, of course)?
I don't think that is unnecessarily complicated; instead, this is what you want. Matrix operations are good because you don't have to loop over elements yourself or do element-wise operations. I remember taking an online course, and my solution looks pretty similar.
The way you have it is the most efficient way of doing it, as it is fully vectorized. It could be done with a for loop over the summation instead, but that is very inefficient in terms of processing power.
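For comparison, here is a small NumPy sketch (the data and variable names are made up) of the algebraically equivalent form of the same update, theta = theta - alpha/m * X'*(X*theta - y), which is what the bracketed expression above expands to:
import numpy as np

# Made-up regression data: m samples, n features.
m, n = 100, 3
rng = np.random.default_rng(0)
X = rng.normal(size=(m, n))
y = rng.normal(size=(m, 1))

theta = np.zeros((n, 1))
alpha = 0.01

for _ in range(1000):
    # Vectorized update: theta = theta - alpha/m * X' * (X*theta - y)
    theta = theta - (alpha / m) * X.T @ (X @ theta - y)

# Cost, matching h_theta = 1/(2*m) * (X*theta - y)' * (X*theta - y)
cost = float((X @ theta - y).T @ (X @ theta - y) / (2 * m))
print(theta.ravel(), cost)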