Problem Description
I have constructed a fully connected model, as shown in the code below:
auto* model = new Sequential({
    new Linear(3072, 120),
    new ReLU(120),
    new Linear(120, 32),
    new ReLU(32),
    new Linear(32, 10),
    new Softmax(10)
});
OptimizerInfo* info = new OPTIMIZER_SGD( /*lr=*/ 0.003);
model->construct(info);
model->randInit();
model->setLoss(crossEntropyLoss);
The model works as expected on the MNIST dataset (with the first Linear layer's input size set to 784): the loss decreases as more and more batches are trained. The graph looks like this:
The model performance on MNIST
However, when I switched to the CIFAR-10 dataset, the loss does not decrease over time; it fluctuates around 2.30, and each element of the output vector approaches 0.10. This behavior persists through all subsequent epochs:
The model performance on CIFAR-10
From the docs and papers I have looked through while working on this, a layered fully connected model on CIFAR-10 should reach around 0.8 accuracy with a cross-entropy loss below 0.5, so something has clearly gone wrong here.
Things I have checked:
Both datasets (MNIST and CIFAR-10) are read into the model and normalized to [0, 1] correctly, and there are no misalignments between labels and their corresponding training samples.
No NaN or Inf values appear anywhere in the model or in the loss.
Although the network is built on an engine I wrote myself with no external dependencies, I have compared the results of all of its functions (GEMM, ReLU, Softmax, ...) against PyTorch on the same inputs and found no algorithmic errors in my project (a reference sketch follows this list).
Adjusting the hyperparameters (learning rate, batch size, choice of optimizer, etc.) makes no difference to the model's behavior.
The model runs in FP32 and uses CUDA for computation.
For any reasonable network configuration, no memory leaks or other memory issues occur.
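For comparison, here is a minimal PyTorch sketch of the same architecture (a 3072-120-32-10 MLP trained with SGD at lr = 0.003 and cross-entropy); the CIFAR-10 loading, batch size, and epoch count are my own assumptions rather than code from the project. Note that nn.CrossEntropyLoss applies log-softmax internally, so this reference model ends with raw logits instead of an explicit Softmax layer.

# Minimal PyTorch reference for side-by-side comparison with the custom engine.
# Assumptions: torchvision CIFAR-10, inputs flattened to 3072, scaled to [0, 1] by ToTensor().
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3072, 120), nn.ReLU(),
    nn.Linear(120, 32), nn.ReLU(),
    nn.Linear(32, 10),            # raw logits; CrossEntropyLoss adds log-softmax itself
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.003)
criterion = nn.CrossEntropyLoss()

train_set = datasets.CIFAR10(root="data", train=True, download=True,
                             transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=64, shuffle=True)

for epoch in range(5):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.4f}")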
Does anyone have an idea of what is happening here?
My project link: https://github.com/SeanEngine/SEANN_2
Plz help :3
Related
When we build and train a model, the initial weights are randomly initialized unless we specify a seed.
As we know, there are a variety of parameters we can adjust, like epochs, optimizer, batch_size, etc., to find the "best" model.
The concept I have trouble with is this: even if we do find the best model after tuning, the weights will be different on the next run, yielding a different model and different results. So the best model from one run might not be the best if we compiled and ran it again with the same "best parameters". If we seed the weights for reproducibility, we don't know whether those seeded weights are the best ones. On the other hand, if we tune the weights, then the "best parameters" won't be the best parameters anymore. I am stuck in a loop. Is there a general guideline on which parameters to tune first as opposed to others?
Or is this whole logic flawed somewhere and am I way overthinking this?
We initialize weights randomly to ensure that each node behaves differently from the others (breaking symmetry).
Depending on the hyperparameters (epochs, batch size, number of iterations, etc.), the weights are updated until training finishes. In the end, we call the resulting set of weights the model.
The seed is used to control the randomness of the initialization. If I'm not wrong, a good learning algorithm (objective function plus optimizer) converges irrespective of the seed value.
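As a rough illustration of what a seed controls (the framework here is just one common choice, not something from the question), fixing the seeds makes the "random" initialization reproducible across runs:

# Sketch: fixing random seeds so the initial weights are identical on every run.
import random
import numpy as np
import torch

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)              # CPU RNG; add torch.cuda.manual_seed_all(SEED) for GPUs

layer = torch.nn.Linear(10, 5)       # same initial weights each time the script runs
print(layer.weight[0, :3])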
Again, a good model means tuning all the hyperparameters and making sure the model is not underfitting.
On the other hand, the model shouldn't overfit either.
There is no such thing as the "best" parameters (weights, biases); we need to keep tuning the model until the results are satisfactory, and the main effort lies in data processing.
What is going wrong with this code? I have generated adversarial images using the cleverhans API (the generate_np method) and am using the default cleverhans CNN classifier to classify them. The test accuracy is very low, as expected, when I use the model right after generating the images. But if I save and reload the model, the accuracy is too high. Please check the code here.
https://github.com/csesivakumar/Adversarial_Defense/blob/master/Cleverhans_generatenp.ipynb
Python: 3.6
Pasting my answer from the GitHub issue tracker in case others are facing the same issue:
From your code, it looks like you are initializing the model's weights, defining the TF session, etc. after having trained the model using Keras. My guess is that the adv_x array does not contain images that are actually adversarial. This would explain why the accuracy output by cell [22] is close to random: the model weights are random at that point. When you restore the model, its weights are set back to the values learned during training, so the accuracy is restored (because the images are not adversarial).
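To make the ordering concrete, a rough sketch of the intended flow with the TF1-era cleverhans API is shown below; build_keras_cnn and the data arrays are placeholders for the notebook's own code, not part of it. The point is simply that training finishes before the attack is built, and no variable re-initialization happens in between.

# Sketch of the intended order of operations (TF1-era cleverhans, ~v3.x API).
# build_keras_cnn, x_train/y_train, x_test/y_test are placeholders for the notebook's own code.
import tensorflow as tf
from cleverhans.attacks import FastGradientMethod
from cleverhans.utils_keras import KerasModelWrapper

sess = tf.Session()
tf.keras.backend.set_session(sess)

model = build_keras_cnn()                      # a compiled Keras CNN
model.fit(x_train, y_train, epochs=5, batch_size=128)

# Wrap the already-trained model. Running a global variable initializer at this point
# would overwrite the learned weights with random ones, which is the bug described above.
wrap = KerasModelWrapper(model)
attack = FastGradientMethod(wrap, sess=sess)
adv_x = attack.generate_np(x_test, eps=0.1, clip_min=0.0, clip_max=1.0)

print("accuracy on adv_x:", model.evaluate(adv_x, y_test, verbose=0)[1])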
I'm quite new to Caffe, and this could be a nonsense question.
I have trained my network from scratch. It trains well and reaches a reasonable accuracy in tests. My question is about retraining or fine-tuning this network. Suppose you have new image samples from the same original categories and you want to teach the net with these new images (because, for example, the net fails to predict correctly on these particular images).
As far as I know, it is possible to resume training from a snapshot and solverstate, or to fine-tune using only the weights of the trained model. Which is the better option in this case? Or is it better to retrain the net with the original images and the new ones together?
Think of a possible "incremental training" scheme, because not all the cases for a particular category are available in the initial training. Is it possible to retrain the net with only the new samples? Should I change the learning rate or keep some parameters fixed in order to preserve the original prediction accuracy? The net should behave the same way on the original image set after fine-tuning.
Thanks in advance.
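For reference, the two options mentioned above correspond roughly to the following pycaffe calls; the file names are placeholders, and this is only a sketch of the mechanics, not a recommendation of one option over the other.

# Sketch of the two options via the pycaffe interface (file names are placeholders).
import caffe

caffe.set_mode_gpu()
solver = caffe.get_solver("solver.prototxt")

# Option 1: resume training exactly where it stopped
# (restores weights AND solver state, e.g. momentum history and iteration count).
solver.restore("snapshot_iter_10000.solverstate")

# Option 2: fine-tune from the learned weights only
# (fresh solver state; typically combined with a smaller base_lr in solver.prototxt).
# solver.net.copy_from("snapshot_iter_10000.caffemodel")

solver.solve()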
I have a pretrained CNN (ResNet-18) trained on ImageNet, and now I want to extend it to my own dataset of video frames. The problem is that all the tutorials I have found on fine-tuning require the dataset to be organized into classes like
class1/train/
class1/test/
class2/train/
class2/test/
but I only have frames from many videos, so how do I train my CNN on them?
Can anyone point me in the right direction, to a tutorial or paper, etc.?
PS: My final task is to get deep features for all the frames I provide at test time.
For training a network, you need some 'label' (sometimes called y) for your input data; from there, the network calculates a loss between the logits (the network's answer) and the given label.
The network then revises itself using that loss value through backpropagation; that process is what we call 'training'.
Because you only have input data and no labels, you can only get the logits, which means a loss cannot be calculated.
Fine-tuning is almost the same thing as 'additional training', so you cannot fine-tune your pre-trained network without labeled data.
As for the train set and test set, that is not the problem right now.
If you have enough labeled input data, you can divide it with some ratio
(e.g. 80% of the data for training, 20% for testing).
The reason we divide the data into these two sets is to check the performance of the trained network in a more general, unseen situation.
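For example, such a split is a one-liner with scikit-learn (X and y here stand for your labeled frames and labels; any equivalent split works):

# 80/20 train/test split with scikit-learn; X = inputs, y = labels (placeholders).
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)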
However, if you just feed your data into the pre-trained network (the encoder part), it will give you a deep feature. It may not fit your task exactly, but it is still a deep feature.
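For the deep-feature part specifically, a minimal PyTorch sketch could look like the following; the pretrained torchvision ResNet-18 and the single PIL image called frame are my assumptions, not the asker's code.

# Sketch: extract 512-d deep features from a video frame with a pretrained ResNet-18.
import torch
import torchvision
from torchvision import transforms

model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Identity()       # drop the 1000-class ImageNet head, keep the 512-d features
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

with torch.no_grad():
    x = preprocess(frame).unsqueeze(0)   # frame: a PIL image of one video frame (placeholder)
    features = model(x)                  # shape (1, 512)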
Added:
Unsupervised pre-training for convolutional neural network in theano
This is the method you need: a deep feature encoder for the unsupervised setting. I hope it helps.
I have a dataset of around 6K chemical formulas which I am preprocessing with Keras' tokenization to perform binary classification. I am currently using a 1D convolutional neural network with dropout and am obtaining 82% training accuracy and 80% validation accuracy after only two epochs. No matter what I try, the model just plateaus there and doesn't seem to improve at all. The exact same accuracies are reached with a vanilla LSTM too. What else can I try to improve my accuracies? The losses only differ by 0.04... Anyone have any ideas? Both models use an embedding layer, and changing its output dimension isn't having an effect either.
Based on your description, I believe your model has high bias and low variance (see this link for further details). Thus, your model is not fitting your data very well, which causes underfitting. So, I suggest three things:
Train your model a little longer: I believe two epochs are too few to give your model a chance to learn the patterns in the data. Try lowering the learning rate and increasing the number of epochs.
Try a different architecture: you can change the number of convolutions, filters and layers, and you can also use different activation functions and other layers such as max pooling (see the sketch after this list).
Do an error analysis: once training is finished, apply your model to the test set and look at the errors. How many false positives and false negatives do you have? Is your model better at classifying one class than the other? Can you see a pattern in the errors that may be related to your data?
Finally, if none of these suggestions helps, you may also try to increase the number of features, if possible.
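To make suggestions 1 and 2 concrete, a minimal Keras sketch of that kind of change could look like the code below; VOCAB_SIZE, MAX_LEN, the layer sizes, and the training arrays are placeholders to adapt to your tokenized formulas.

# Sketch: a slightly deeper 1D CNN with a lower learning rate and more epochs.
# VOCAB_SIZE, MAX_LEN, x_train/y_train, x_val/y_val are placeholders.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Embedding(input_dim=VOCAB_SIZE, output_dim=64, input_length=MAX_LEN),
    layers.Conv1D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(128, kernel_size=3, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])

# Train longer than two epochs and keep the best weights on validation loss.
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=50, batch_size=32,
          callbacks=[tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)])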