DenseNet with Hinge loss on CIFAR dataset

I am trying to use hinge loss with DenseNet on the CIFAR-100 dataset. Training converges to a certain point and then stops improving. The accuracy is much lower than DenseNet trained with the cross-entropy loss. I have tried different learning rates and weight decays.
Any ideas on why I am unable to train DenseNet properly with hinge loss? I am able to use hinge loss with ResNet without any problem.
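For reference, a minimal sketch of one way to set this up in PyTorch, assuming torchvision's DenseNet-121 and nn.MultiMarginLoss as the multi-class hinge criterion (the original post does not say which implementation was used):

```python
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical setup: torchvision DenseNet-121 with a 100-way output head,
# trained with a multi-class margin (hinge) loss instead of cross-entropy.
model = models.densenet121(num_classes=100)

# nn.MultiMarginLoss averages max(0, margin - x[y] + x[i]) over classes i != y.
criterion = nn.MultiMarginLoss(margin=1.0)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

def train_step(images, labels):
    optimizer.zero_grad()
    logits = model(images)           # raw scores; no softmax for a hinge loss
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that a hinge loss is applied directly to the raw scores against a fixed margin, so the scale of the logits (and hence the learning rate and weight decay) tends to matter more than it does with cross-entropy.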

Related

Deep Learning Training Loss does not decrease, and Validation Accuracy fluctuates

My CNN-based deep learning model's validation accuracy fluctuates at certain epochs. I trained the model on frequency-domain images. Do I need to normalize them or not?
In addition, the training loss does not decrease and stays flat. I trained for 80 epochs, but the training loss and validation accuracy are still stuck at the level they reached around epoch 20; there is no improvement.
I have already tried tuning hyperparameters such as the learning rate, and I have also added regularization.
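On the normalization question, a minimal sketch of per-channel standardization in PyTorch; the tensor shapes and values below are placeholders, not taken from the original data:

```python
import torch
from torchvision import transforms

# Hypothetical sketch: standardize frequency-domain images per channel.
# 'train_images' is assumed to be an (N, C, H, W) float tensor; the
# statistics must be computed from the training set only.
def per_channel_stats(train_images):
    mean = train_images.mean(dim=(0, 2, 3))
    std = train_images.std(dim=(0, 2, 3))
    return mean, std

train_images = torch.rand(100, 1, 64, 64)            # placeholder data
mean, std = per_channel_stats(train_images)
normalize = transforms.Normalize(mean.tolist(), std.tolist())

x = normalize(train_images[0])                        # one normalized image
```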

How to reduce the difference between training and validation loss curves?

I have used a Transformer model to train on a time series dataset, but there is always a gap between the training and validation curves in my loss plot. I have tried different learning rates, batch sizes, dropout rates, numbers of heads, dim_feedforward values, and numbers of layers, but none of them close the gap. Can anyone give me some ideas on how to reduce it?
I also asked the question on the PyTorch forum but didn't get any reply.
Also, how should I design a decoder for time series regression with a Transformer?
Since you are overfitting your model here:
1. Try using more data.
2. Try adding dropout layers.
3. Try L1 (lasso) or L2 (ridge) regularization; see the sketch after this list.
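As an illustration of points 2 and 3, a minimal PyTorch sketch; the layer sizes, dropout rate, and weight decay value are placeholders rather than tuned settings:

```python
import torch
import torch.nn as nn

# Point 2: dropout inside the Transformer encoder layers.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=64, nhead=4, dim_feedforward=128, dropout=0.3)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

# Point 3: L2 (ridge-style) regularization via weight decay in the optimizer.
# An L1 (lasso-style) penalty would need to be added to the loss by hand.
optimizer = torch.optim.AdamW(encoder.parameters(), lr=1e-4, weight_decay=1e-2)
```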

Model Performance with Learning Curves

I am a beginner in machine learning and I need your help, please.
I have created an LSTM forecasting model on multivariate data. The model was trained, the RMSE is 0.109, and the loss curve and the plot of predicted vs. actual values are shown below. Is this model good enough to use, or should it be updated?
[Figures: actual vs. predicted values; model loss]
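For reading such curves, a minimal sketch of plotting training against validation loss, assuming a Keras-style history object returned by model.fit (the original post does not state the framework):

```python
import matplotlib.pyplot as plt

# 'history' is assumed to be the object returned by model.fit(...)
# with validation_data or validation_split set.
def plot_learning_curves(history):
    plt.plot(history.history["loss"], label="training loss")
    plt.plot(history.history["val_loss"], label="validation loss")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.legend()
    plt.show()
```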

Different generation quality with one model

I am training a beta-VAE on the BDD-100K driving dataset. Here are my hyperparameters: Adam optimizer, learning rate 0.0001, latent dimension 16; the loss function is the reconstruction loss (MSE) plus the KL divergence multiplied by the beta factor. After training for a while, the model seems to have learned something, but the exact same model's output quality is completely different on different samples. Can anyone give me a hint about how to understand what is going on there? Thanks! Here are examples of the same model generating different results.
[Figures: bad generation; good generation]
I would really appreciate any advice you can give me. Thank you!
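For reference, a minimal sketch of the beta-VAE objective as described (MSE reconstruction plus a beta-weighted KL term); the function name and the beta value are illustrative:

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(recon, x, mu, logvar, beta=4.0):
    # Reconstruction term: MSE between the reconstruction and the input.
    recon_loss = F.mse_loss(recon, x, reduction="sum")
    # KL divergence between q(z|x) = N(mu, sigma^2) and the N(0, I) prior.
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # The beta factor scales the KL term relative to reconstruction.
    return recon_loss + beta * kld
```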

Training a model in eval() mode gives better results in PyTorch?

I have a model with Dropout layers (p=0.6). I trained the model once in .eval() mode and again in .train() mode, and I find that training in .eval() mode gave better accuracy and quicker loss reduction on the training data:
train(): Train loss : 0.832, Validation Loss : 0.821
eval(): Train loss : 0.323, Validation Loss : 0.251
Why is this so?
It seems that the model architecture is simple and, in train mode, it is not able to capture the features in the data, so it underfits.
eval() disables dropout and switches batch normalization to its running statistics, among other module-level changes.
This means the model trains better without dropout, since more neurons stay active and the model can learn the data better. Increasing the layer size, increasing the number of layers, or decreasing the dropout probability should also help.
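A minimal sketch of how a Dropout layer behaves in the two modes (an isolated layer for illustration, not the poster's model):

```python
import torch
import torch.nn as nn

dropout = nn.Dropout(p=0.6)
x = torch.ones(1, 10)

dropout.train()    # training mode: ~60% of activations are zeroed,
print(dropout(x))  # survivors are scaled by 1/(1-p); output is stochastic

dropout.eval()     # eval mode: dropout is a no-op
print(dropout(x))  # returns x unchanged
```

Because the train-mode loss is computed on the thinned, noisier network, it is also expected to read higher than the eval-mode loss on the same data.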