Different generation quality with one model - deep-learning

I am training a Beta-VAE on the BDD-100k driving dataset. Here are my hyperparameters: Adam optimizer, learning rate 0.0001, latent dimension 16; the loss function is the reconstruction loss (MSE) plus the KLD loss multiplied by the beta factor. After training for a while, the model seems to have learned something, but on different samples the exact same model's performance is completely different. Can anyone give me a hint about what is going on? Thanks! Here are examples of the same model generating different results.
bad generation
good generation
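For reference, the loss I am using is essentially the following (a minimal sketch in TensorFlow; the variable names and the beta value are illustrative, not copied from my code):

```python
import tensorflow as tf

def beta_vae_loss(x, x_recon, z_mean, z_log_var, beta=4.0):
    # Pixel-wise MSE reconstruction term, summed over the image dimensions
    recon = tf.reduce_sum(tf.square(x - x_recon), axis=[1, 2, 3])
    # KL divergence between q(z|x) = N(z_mean, exp(z_log_var)) and N(0, I)
    kld = -0.5 * tf.reduce_sum(
        1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1)
    # Beta-VAE objective: reconstruction + beta * KL, averaged over the batch
    return tf.reduce_mean(recon + beta * kld)
```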
I would really appreciate it if you could leave me some advice!
Thank you!

Related

Why are my generator and discriminator losses converging at higher values in WGAN-GP?

This is the loss plot of WGAN-GP after training for 14000 iterations. My image size is 128 by 128. Though the loss plot seems to be converging, the generator loss at iteration 14000 is -26646 and critic loss is -249909.
Loss plot
Batch normalization in the discriminator breaks Wasserstein GANs with gradient penalty. The authors themselves advocate the use of layer normalization instead, and this is written in bold in their paper (https://papers.nips.cc/paper/7159-improved-training-of-wasserstein-gans.pdf). It is hard to say whether there are other bugs in your code, but I urge you to thoroughly read the DCGAN and Wasserstein GAN papers and really take notes on the hyperparameters. Getting them wrong really destroys the performance of the GAN, and a hyperparameter search gets expensive quite quickly.
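For example, a critic block along these lines swaps batch normalization for layer normalization (a minimal sketch assuming a Keras implementation; the filter counts and kernel sizes are illustrative, not taken from your code):

```python
from tensorflow.keras import layers

def critic_block(x, filters):
    x = layers.Conv2D(filters, kernel_size=4, strides=2, padding="same")(x)
    # Layer normalization instead of batch normalization, as recommended
    # for WGAN-GP critics in the paper linked above
    x = layers.LayerNormalization()(x)
    x = layers.LeakyReLU(0.2)(x)
    return x
```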
By the way, transposed convolutions produce checkerboard ("stairway") artifacts in your output images. Use image resizing followed by a regular convolution instead. For an in-depth explanation of that phenomenon I can recommend the following resource (https://distill.pub/2016/deconv-checkerboard/).
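A generator upsampling block in that style might look roughly like this (again a sketch with illustrative sizes):

```python
from tensorflow.keras import layers

def upsample_block(x, filters):
    # Nearest-neighbour (or bilinear) resize followed by a stride-1 convolution
    # avoids the checkerboard pattern produced by strided transposed convolutions
    x = layers.UpSampling2D(size=2, interpolation="nearest")(x)
    x = layers.Conv2D(filters, kernel_size=3, padding="same")(x)
    x = layers.LeakyReLU(0.2)(x)
    return x
```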
This is an interesting find as well, which may help you: Accelerated WGAN update strategy with loss change rate balancing.

In deep learning, can I change the weight of loss dynamically?

Call for experts in deep learning.
Hey, I have recently been working on training images with TensorFlow in Python for tone mapping. To get better results, I focused on the perceptual loss introduced in this paper by Justin Johnson.
In my implementation, I made use of all three parts of the loss: a feature loss extracted from VGG16, an L2 pixel-level loss between the transferred image and the ground-truth image, and the total variation loss. I summed them up as the loss for backpropagation.
From the function
$\hat{y} = \arg\min_{y} \; \lambda_c \, \ell_{\text{content}}(y, y_c) + \lambda_s \, \ell_{\text{style}}(y, y_s) + \lambda_{TV} \, \ell_{TV}(y)$
in the paper, we can see that there are three weights on the losses, the λ's, to balance them. The values of the three λ's are presumably fixed throughout training.
My question is: does it make sense to dynamically change the λ's every epoch (or every few epochs) to adjust the relative importance of these losses?
For instance, the perceptual loss converges drastically in the first several epochs, yet the pixel-level L2 loss converges fairly slowly. So maybe the weight should initially be higher for the content loss, say 0.9, and lower for the others. As training progresses, the pixel-level loss becomes increasingly important for smoothing the image and minimizing artifacts, so it might be better to raise its weight a bit. This would be just like changing the learning rate across epochs.
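Concretely, what I have in mind is roughly the following (a sketch; the weight values, the schedule, and the names are made up for illustration, not taken from my code):

```python
import tensorflow as tf

# Loss weights stored as variables so a callback can change them during training
lambda_content = tf.Variable(0.9, trainable=False, dtype=tf.float32)
lambda_pixel = tf.Variable(0.05, trainable=False, dtype=tf.float32)
lambda_tv = tf.Variable(0.05, trainable=False, dtype=tf.float32)

def total_loss(feature_loss, pixel_loss, tv_loss):
    # Weighted sum of the three terms described above
    return (lambda_content * feature_loss
            + lambda_pixel * pixel_loss
            + lambda_tv * tv_loss)

class LossWeightScheduler(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        # Example schedule: shift weight from the feature loss to the pixel loss
        if epoch >= 10:
            lambda_content.assign(0.5)
            lambda_pixel.assign(0.45)
```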
The postdoc who supervises me strongly opposes my idea. He thinks it amounts to dynamically changing the training objective and could make the training inconsistent.
So, pros and cons, I need some ideas...
Thanks!
It's hard to answer this without knowing more about the data you're using, but in short, dynamic loss weighting should not really have much effect and may even have the opposite effect.
If you are using Keras, you could simply run a hyperparameter tuner similar to the following in order to see if there is any effect (change the loss accordingly):
https://towardsdatascience.com/hyperparameter-optimization-with-keras-b82e6364ca53
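A rough sketch of what such a search over the loss weights could look like with keras-tuner (build_tone_mapping_model and combined_loss are placeholders for your own model and loss code, and the weight ranges are just examples):

```python
import keras_tuner as kt

def build_model(hp):
    # The loss weights become hyperparameters to search over
    lambda_content = hp.Float("lambda_content", 0.1, 1.0)
    lambda_pixel = hp.Float("lambda_pixel", 0.01, 1.0)
    lambda_tv = hp.Float("lambda_tv", 1e-4, 1e-1, sampling="log")
    # build_tone_mapping_model / combined_loss are placeholders for your own code
    model = build_tone_mapping_model()
    model.compile(optimizer="adam",
                  loss=combined_loss(lambda_content, lambda_pixel, lambda_tv))
    return model

tuner = kt.RandomSearch(build_model, objective="val_loss", max_trials=20)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val), epochs=10)
```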
I've only done this on smaller models (it gets way too time-consuming otherwise), but in essence it's best to keep the weights constant, and also to avoid angering your supervisor :D
If you are using a different ML or DL library, there are hyperparameter optimizers for each; just Google them. It may be best to run these on a cluster overnight, but they usually give you a well-enough optimized version of your model.
Hope that helps and good luck!

How do I verify that my model is actually functioning correctly in deep learning?

I have a dataset of around 6K chemical formulas which I am preprocessing via Keras' tokenization to perform binary classification. I am currently using a 1D convolutional neural network with dropouts and am obtaining an accuracy of 82% and validation accuracy of 80% after only two epochs. No matter what I try, the model just plateaus there and doesn't seem to be improving at all. Those same exact accuracies are reached with a vanilla LSTM too. What else can I try to improve my accuracies? Losses only have a difference of 0.04... Anyone have any ideas? Both models use an embedding layer and changing the output dimension isn't having an effect either.
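For reference, the convolutional model is roughly the following (a sketch with illustrative layer sizes, not my exact configuration):

```python
from tensorflow.keras import layers, models

# vocab_size is a placeholder for the size of the Keras tokenizer's vocabulary
vocab_size = 10000

model = models.Sequential([
    layers.Embedding(input_dim=vocab_size, output_dim=64),
    layers.Conv1D(64, kernel_size=3, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),   # binary classification
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```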
From your description, I believe your model has high bias and low variance (see this link for further details). Thus, your model is not fitting your data very well, i.e. it is underfitting. So I suggest three things:
Train your model a little longer: I believe two epochs are too few for your model to pick up the patterns in the data. Try lowering the learning rate and increasing the number of epochs.
Try a different architecture: you may change the number of convolutions, filters, and layers. You can also use different activation functions and other layers such as max pooling.
Do an error analysis: once you have finished training, apply your model to the test set and take a look at the errors. How many false positives and false negatives do you have? Is your model better at classifying one class than the other? Can you see a pattern in the errors that may be related to your data? (A minimal sketch of this is given below.)
Finally, if none of these suggestions helps, you may also try to increase the number of features, if possible.
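For the error analysis, something as simple as the following can already be revealing (a sketch using scikit-learn; model, x_test, and y_test stand in for your own objects):

```python
from sklearn.metrics import confusion_matrix, classification_report

# Threshold the sigmoid outputs to get hard predictions
y_prob = model.predict(x_test).ravel()
y_pred = (y_prob > 0.5).astype(int)

# Rows are true classes, columns are predicted classes:
# the off-diagonal entries are the false negatives / false positives
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```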

Deep learning, Loss does not decrease

I have tried to fine-tune a pretrained model using a training set that has 20 classes. The important thing to mention is that, even though I have 20 classes, one class makes up 1/3 of the training images. Is that the reason my loss does not decrease and the testing accuracy is almost 30%?
Thank you for any advice.
I had a similar problem. I resolved it by increasing the variance of the initial values of the neural network weights. This serves as pre-conditioning for the network and prevents the weights from dying out during backprop.
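In Keras terms, that amounts to something like the following (a sketch; the standard deviation is just an example value to tune, and whether this helps depends on your architecture):

```python
from tensorflow.keras import layers, initializers

# Wider-than-default normal initializer for the weights
init = initializers.RandomNormal(mean=0.0, stddev=0.1)

hidden = layers.Dense(256, activation="relu", kernel_initializer=init)
```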
I came across the neural network lectures from Prof. Jenny Orr's course and found them very informative. (I just realized that Jenny co-authored many papers with Yann LeCun and Léon Bottou in the early years of neural network training.)
Hope it helps!
Yes, it is very possible that your net is overfitting to the unbalanced labels. One solution is to perform data augmentation on the other labels to balance them out. For example, if you have image data, you can do random crops, horizontal/vertical flips, and a variety of other techniques.
Edit:
One way to check whether you are overfitting to the unbalanced labels is to compute a histogram of your net's predicted labels. If it is highly skewed towards the over-represented class, try the data augmentation method above, retrain your net, and see if that helps.
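For example (a sketch; model and x_val are placeholders for your trained net and validation data, and I assume the 20 classes from your question):

```python
import numpy as np

# Predicted class per validation example
pred_labels = np.argmax(model.predict(x_val), axis=1)

# Count how often each of the 20 classes is predicted; a strong skew towards
# the over-represented class suggests the net is collapsing onto it
print(np.bincount(pred_labels, minlength=20))
```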

Markov Regime Switching Regression Models - Time-Varying Probabilities

I am looking into estimating a Markov regime-switching model with time-varying probabilities. Please help me if you know of a simpler way to estimate such a model.
This paper might answer your needs.