Thanks for looking at this question!
I attempted to train a simple DCGAN to generate room designs from a dataset of 215 colour images of size 128x128. My attempt can be summarised as follows:
Generator: 5 deconvolution layers from (100x1) noise input to (128x128x1) grayscale image output
Discriminator: 4 convolution layers from (128x128x1) grayscale image input
Optimizer: Adam at learning rate of 0.002 for both Generator and Discriminator
Batch size: 21 images/batch
Epoch: 100 epochs with 10 batches/epoch
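For concreteness, here is a minimal sketch of this setup. PyTorch is an assumption (the framework isn't named above), and the channel widths are illustrative rather than the exact ones used:

import torch
import torch.nn as nn

generator = nn.Sequential(
    # 100-dim noise as a 100x1x1 tensor -> 8x8 feature map
    nn.ConvTranspose2d(100, 256, 8, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),  # 16x16
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),    # 32x32
    nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),     # 64x64
    nn.ConvTranspose2d(32, 1, 4, 2, 1), nn.Tanh(),                          # 128x128x1
)

discriminator = nn.Sequential(
    nn.Conv2d(1, 32, 4, 2, 1), nn.LeakyReLU(0.2),      # 128x128 -> 64x64
    nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),     # -> 32x32
    nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),    # -> 16x16
    nn.Conv2d(128, 1, 16), nn.Flatten(), nn.Sigmoid()  # -> real/fake probability
)

opt_g = torch.optim.Adam(generator.parameters(), lr=0.002)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=0.002)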
Results:
1. D-loss is close to 0 and G-loss is close to 1. After that, I cut my discriminator down by two convolution layers and reduced the Adam learning rate to 0.00002, hoping the discriminator would no longer overpower my generator.
2. After (1), D-loss and G-loss hover around 0.5-1.0. However, the generated images still look like pure noise even after 100 epochs.
Questions:
Is there something wrong in terms of how I trained my GAN?
How should I modify my approach to successfully train the GAN?
Thank you so much, everyone, for your help. Really looking forward to your suggestions!
I'm investigating the task of training a neural network to predict one future value given a sinusoidal input. For example, as seen in the Figure, the input signal is x and the expected output signal is y. The model's output is y^. The regression task itself is fairly straightforward, and there are many modeling choices for it. I'm using a simple recurrent neural network with mean-squared error (MSE) loss between y and y^.
Additionally, suppose I know that the sinusoid is made up of N modalities, e.g., at some points, the wave oscillates at 5 Hz, then 10 Hz, then back to 5 Hz, then up to 15 Hz maybe—i.e., N=3.
In this case, I have ground-truth class labels in a vector k and the model does both regression and classification, additionally outputting a vector k^. An example is shown in the Figure. As this is a multi-class problem with exclusivity, I figured binary cross entropy (BCE) loss should be relevant here.
I'm sure there is a lot of research about combining loss functions, but does just adding MSE and BCE make sense? Scaling one up or down by a factor of 10 doesn't seem to change the learning outcome too much. So I was wondering what is considered the standard approach to problems where there is a joint classification and regression objective.
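To make the question concrete, this is essentially what I mean by "just adding" the losses (a PyTorch-style sketch; joint_loss and lambda_cls are hypothetical names, and I use cross-entropy here because the classes are mutually exclusive, though BCE over one-hot targets would slot in the same way):

import torch
import torch.nn.functional as F

lambda_cls = 1.0  # relative weight of the classification term; tuned on validation data

def joint_loss(y_hat, y, k_logits, k):
    reg = F.mse_loss(y_hat, y)          # regression term on the predicted value
    cls = F.cross_entropy(k_logits, k)  # classification term on the modality labels
    return reg + lambda_cls * cls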
Additionally, on top of just BCE, I want to penalize k^ for quickly jumping around between classes; for example, if the model guesses one class, I'd like it to stay in that one class and switch only when it's necessary. See how in the Figure, there are fast dark blue blips in k^. I would like the same solid bands as seen in k, and naive BCE loss doesn't account for that.
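One idea I had is a total-variation-style term on the predicted class probabilities over time, added to the loss above. This is only a sketch of the idea, not a standard named loss, and switching_penalty is a hypothetical helper:

import torch

def switching_penalty(k_logits):
    # k_logits: (batch, time, num_classes) predicted class scores over time
    probs = torch.softmax(k_logits, dim=-1)
    # penalize change between consecutive time steps to discourage fast blips
    return (probs[:, 1:] - probs[:, :-1]).abs().mean()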
Appreciate any and all advice!
I am training VGG11 on a custom image dataset for 3-way 5-shot image classification using MAML from learn2learn. I am encapsulating the whole VGG11 model with MAML, i.e., not just the classification head. My hyperparameters are as follows:
Meta LR: 0.001
Fast LR: 0.5
Adaptation steps: 1
First order: False
Meta Batch Size: 5
Optimizer: AdamW
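For reference, my training loop follows the standard learn2learn pattern. A minimal sketch with the hyperparameters above (dummy tensors stand in for my real episodes):

import torch
import torch.nn.functional as F
import learn2learn as l2l
from torchvision.models import vgg11

model = vgg11(num_classes=3)
maml = l2l.algorithms.MAML(model, lr=0.5, first_order=False)  # Fast LR
opt = torch.optim.AdamW(maml.parameters(), lr=0.001)          # Meta LR

opt.zero_grad()
for _ in range(5):                               # meta batch of 5 tasks
    learner = maml.clone()                       # task-specific copy of the model
    support_x = torch.randn(15, 3, 224, 224)     # dummy 3-way 5-shot support set
    support_y = torch.arange(3).repeat(5)
    learner.adapt(F.cross_entropy(learner(support_x), support_y))  # 1 adaptation step
    query_x = torch.randn(15, 3, 224, 224)       # dummy query set
    query_y = torch.arange(3).repeat(5)
    F.cross_entropy(learner(query_x), query_y).backward()
opt.step()                                       # the outer-loop AdamW step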
During training, I noticed that after taking the first outer-loop optimization step, i.e., AdamW.step(), the loss skyrockets to very large values, in the tens of thousands. Is this normal? I am also measuring the micro F1 score as my accuracy metric; its curve for meta-training/validation is as follows:
It fluctuates too much in my opinion. Is this normal? What could be the reason for this?
Thanks
I figured it out. I was using VGG11 with vanilla BatchNorm layers from PyTorch, which do not work properly in the meta-training setup. I removed the BatchNorm layers and now it works as expected.
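In case it helps others, a sketch of this kind of change. Note the hedge: instead of deleting the layers outright, this swaps each BatchNorm2d for a GroupNorm, which keeps no batch statistics; the group count of 8 is an arbitrary choice that happens to divide all of VGG's channel widths:

import torch.nn as nn
from torchvision.models import vgg11_bn

def replace_batchnorm(module: nn.Module) -> None:
    # recursively swap every BatchNorm2d for a statistics-free GroupNorm
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, nn.GroupNorm(8, child.num_features))
        else:
            replace_batchnorm(child)

model = vgg11_bn(weights=None)
replace_batchnorm(model)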
I trained an image classification model with 10 classes by fine-tuning EfficientNet-B4 for 100 epochs. I split my data 70/30 into training and validation sets. I used stochastic gradient descent with Nesterov momentum of 0.9 and a starting learning rate of 0.001. The batch size is 10. The test accuracy seemed to be stuck at 84% for the last 50 epochs (51st to 100th). I do not know whether the model was stuck in a local minimum or had overfitted. Below is an image of the test and train loss from the 51st epoch to the 100th. I would really appreciate your help. Thanks.
[Figure: train/test loss from epoch 51 to 100]
From the graph you provided, both validation and training losses are still going down, so your model is still training and there is no overfitting. If your test accuracy is stuck, the reason is probably that the data you are using for training/validation does not generalize well to your test set (in your graph the validation only reached 50% accuracy, while your test set reached 84%).
I looked at your training and validation graph. Yes, your model is training and the losses are going down, but your validation error is near 50%, which amounts to a random guess.
Possible reasons:
1. From your training error (shown in the image between epochs 50 and 100), the error is going down on average, but noisily: your error at epoch 100 is pretty much the same as at epoch 70. This could be because your dataset is too simple and you are forcing a huge network like EfficientNet to overfit it.
2. It could also be down to how you are fine-tuning: which layers you froze and which layers receive gradients during backpropagation (see the sketch after this list). I am assuming you are using pre-trained weights.
3. An optimizer issue; try Adam.
It would be great if you could provide the full loss curves (epochs 1 to 100).
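For point 2, a minimal sketch of what I mean by controlling which layers receive gradients. Torchvision's EfficientNet-B4 is assumed here, and replacing the head for 10 classes is illustrative:

import torch
import torch.nn as nn
from torchvision.models import efficientnet_b4

model = efficientnet_b4(weights="IMAGENET1K_V1")

# freeze all pretrained parameters so backprop does not update them
for param in model.parameters():
    param.requires_grad = False

# replace the classifier head for 10 classes; only these weights will train
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 10)

# give the optimizer only the trainable parameters
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=0.001, momentum=0.9, nesterov=True,
)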
I have an interesting problem. I am working on a project in which I am trying to classify 15 classes of logos (14 logo classes + 1 non-logo class). The dataset is our own. I am using DIGITS 5/6, which employs Caffe; my Caffe is NVIDIA's 0.15.14 fork.
I have trained AlexNet and GoogLeNet, which ship with DIGITS. The models built both from scratch and by fine-tuning seem OK (GoogLeNet: 90% accuracy, AlexNet: 80%). The fine-tuned versions start from the ImageNet-pretrained models.
My problem is that I wanted to extend my study to cover ResNet-32, DenseNet-121, and VGG16/19. Whenever I train these models, their top-1 accuracy is very poor (generally 0). You might guess (as I did) that this stems from building from scratch. However, as far as I know, the model should still converge to some limit, yet I always see a flat accuracy line (generally 0) after 2-3 epochs, and the loss value climbs to 87 after a few epochs.
I have searched for possible causes and tried the following:
1. I changed the weight_filler param to "xavier"; nothing changed.
2. I increased the learning rate, but nothing changed.
3. I even used a pretrained model to fine-tune VGG16, but the result is still the same.
4. I tried the CIFAR-10 dataset, upscaling the images to 224x224, but the results were very similar to those on my logo dataset.
I am struggling to find the right approach. I am not an expert, but it seems so odd to me to get such bad results after the nice ones with AlexNet and GoogLeNet.
Why do my models not converge on these more recent networks? I need your advice.
By the way, my training data contains 400 images per class, and for the non-logo class I have collected 1200 images. The validation data contains a different number of images per logo class and a different set of 1200 non-logo images. In total I have 5204 training, 579 test (10% of training), and 4729 validation images.
I am attaching the train/val prototxt for my ResNet-32 model.
So what is my problem?
Thanks in advance
resnet32_train_val.prototxt
https://github.com/slavaglaps/ResNet_cifar10/blob/master/resnet.ipynb
This is my model, trained for 100 epochs.
Accuracy on similar models with similar data reaches 90%.
What is my problem?
I think it is worth reducing the learning rate as the epochs progress.
What do you think could help me?
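Something like this step decay is what I have in mind (a Keras sketch, since the notebook uses Keras; the drop points at epochs 50 and 75 are illustrative, not tuned):

from tensorflow.keras.callbacks import LearningRateScheduler

def step_decay(epoch, lr=None):
    # drop the learning rate by 10x at epochs 50 and 75
    base_lr = 0.1
    if epoch >= 75:
        return base_lr * 0.01
    if epoch >= 50:
        return base_lr * 0.1
    return base_lr

# model.fit(..., epochs=100, callbacks=[LearningRateScheduler(step_decay)])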
There are a few subtle differences.
You are trying to apply an ImageNet-style architecture to CIFAR-10. In the CIFAR-10 variant of ResNet, the first convolution is 3x3, not 7x7; there is no max-pooling layer; and the image is downsampled purely by using stride-2 convolutions.
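A minimal sketch of that stem (in Keras, to match the notebook; the 16 starting filters follow section 4.2 of the paper):

from tensorflow.keras import layers, Input, Model

inputs = Input(shape=(32, 32, 3))
# 3x3 first convolution instead of ImageNet's 7x7, with no max-pooling after it
x = layers.Conv2D(16, 3, padding="same", use_bias=False)(inputs)
x = layers.BatchNormalization()(x)
x = layers.ReLU()(x)
# ... residual blocks at 16 filters, then downsample with a stride-2 convolution ...
x = layers.Conv2D(32, 3, strides=2, padding="same", use_bias=False)(x)
stem = Model(inputs, x)  # output shape: (None, 16, 16, 32)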
You should probably do mean-centering by setting featurewise_center=True in ImageDataGenerator.
Do not use a very high number of filters such as [512, 1024, 2048]. There are only 50,000 images to train on, unlike ImageNet, which has about a million.
In short, read section 4.2 of the deep residual network paper and try to replicate that network. You may also read this blog.