Assume I have a trained torch.nn.Module model, and from now on I only need it for evaluation.
Does PyTorch pass my data through all the layers, or does it compress the model so that it only computes an equivalent function?
If not, is there a way to do this so that the computation is faster and the model is lighter in memory?
I have been looking online for a similar question and didn't find a suitable answer.
You should do two things during inference: set your model to evaluation mode with model.eval() and wrap the forward passes in torch.no_grad(), which disables gradient computation. Note that PyTorch still runs the data through every layer; eval() only switches layers such as dropout and batch norm to their inference behavior, and no_grad() skips building the autograd graph. Together they make your code faster and more memory-efficient.
In practice this looks like:
model.eval()
with torch.no_grad():
    # your inference code here
There are several options, and the right one depends on your specific case.
One option is to convert the model to TorchScript.
Another option is to quantize the model.
Third, you could perform knowledge distillation, transferring the knowledge from your existing model to a smaller one.
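A minimal sketch of the first two options, assuming model is your trained torch.nn.Module and example_input is a representative input tensor (both hypothetical names):

import torch

model.eval()

# Option 1: compile to TorchScript by tracing with a representative input.
scripted = torch.jit.trace(model, example_input)
scripted.save("model_scripted.pt")  # reload later with torch.jit.load

# Option 2: dynamic quantization of the Linear layers to int8 weights,
# which shrinks the model and can speed up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

Distillation requires training a second, smaller model, so it is more involved than the two sketches above.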
Related
I am a student currently studying deep learning on my own. I would like to ask for clarification regarding transfer learning.
For example, with MobileNetV2 (https://keras.io/api/applications/mobilenet/#mobilenetv2-function): if the weights parameter is set to None, then I am not doing transfer learning, since the weights are randomly initialized. If I want to do transfer learning, I should set the weights parameter to 'imagenet'. Is this understanding correct?
Clarification and explanation regarding deep learning
Yes. When you initialize the weights with random values, you are only reusing the architecture and training the model from scratch. The goal of transfer learning is to reuse the knowledge already gained by a previously trained model, either to get better results or to use fewer computational resources.
There are different ways to use transfer learning:
You can freeze the learned weights of the base model, replace its last layer with one suited to your problem, and train only that new layer (a minimal sketch follows below).
You can start from the learned weights and fine-tune them (let them keep changing during training). Many people do this because it often makes training faster and gives better results, since the weights already encode a lot of information.
You can use the first layers to extract basic features like colors, edges, and circles, and add your own layers after them. This way, your resources go into learning the high-level features.
There are more variants, but I hope this gives you an idea.
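A minimal Keras sketch of the first option, assuming a hypothetical binary classification task with 224x224 RGB inputs:

from tensorflow import keras

# Load MobileNetV2 pretrained on ImageNet, without its classification head.
base = keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet"
)
base.trainable = False  # freeze the learned weights

# Add a new head for the (hypothetical) binary classification task.
model = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_data, epochs=5)  # only the new Dense layer is trained

Setting weights=None instead of "imagenet" would give you the same architecture with random initialization, i.e., training from scratch.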
When we build and train a model, the initial weights are randomly initialized unless we specify otherwise (e.g., by fixing a seed).
As we know, there are a variety of hyperparameters we can adjust, like the number of epochs, the optimizer, the batch_size, etc., to find the "best" model.
The concept I have trouble with is this: even if we find the best model after tuning, the weights will be different on every run, yielding different models and results. So the "best" model might not be the best if we compiled and ran it again with the same "best" hyperparameters. If we fix a seed for reproducibility, we don't know whether those weights are the best possible ones. On the other hand, if we keep tuning, the "best" hyperparameters may no longer be the best. I am stuck in a loop. Is there a general guideline on which parameters to tune first?
Or is this whole logic flawed somewhere and am I way overthinking it?
We initialize weights randomly to break symmetry, so that each node learns something different from the others.
Depending on the hyperparameters (epochs, batch size, number of iterations, etc.), the weights are updated until training ends. The final set of updated weights is what we call the model.
The seed is used to control the randomness of the initialization. If I'm not wrong, a good learning setup (objective function and optimizer) converges to similar performance irrespective of the seed value.
Again, a good model comes from tuning all the hyperparameters and making sure the model neither underfits nor overfits.
There is no such thing as the single best set of parameters (weights and biases); we keep tuning the model until the results are satisfactory, and data processing is usually the most important part.
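If you do want the weight initialization to be reproducible, a minimal sketch (assuming TensorFlow 2.x with Keras):

import random
import numpy as np
import tensorflow as tf

# Fix the seeds of the random number generators Keras relies on,
# so the random weight initialization is the same on every run.
random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)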
I have a dataset of around 6K chemical formulas which I am preprocessing via Keras' tokenization to perform binary classification. I am currently using a 1D convolutional neural network with dropout and am obtaining an accuracy of 82% and a validation accuracy of 80% after only two epochs. No matter what I try, the model just plateaus there and doesn't seem to improve at all. The exact same accuracies are reached with a vanilla LSTM too. What else can I try to improve my accuracy? The losses only differ by 0.04... Does anyone have any ideas? Both models use an embedding layer, and changing its output dimension isn't having an effect either.
From your description, I believe your model has high bias and low variance (see this link for further details). In other words, your model is not fitting the data well; it is underfitting. I suggest three things:
Train your model longer: two epochs are too few for the model to learn the patterns in the data. Try lowering the learning rate and increasing the number of epochs.
Try a different architecture: you can change the number of convolutions, filters, and layers. You can also use different activation functions and other layers such as max pooling.
Do an error analysis: once training has finished, apply your model to the test set and look at the errors (a small sketch follows below). How many false positives and false negatives do you have? Is your model better at classifying one class than the other? Can you see a pattern in the errors that may be related to your data?
Finally, if none of these suggestions helps, you may also try to increase the number of features, if possible.
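A minimal sketch of the error-analysis step, assuming a trained Keras binary classifier named model and held-out arrays x_test / y_test (hypothetical names):

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Predicted probabilities from the binary classifier, thresholded at 0.5.
probs = model.predict(x_test)
preds = (probs > 0.5).astype(int).ravel()

# The confusion matrix shows the false positives / false negatives per class.
print(confusion_matrix(y_test, preds))
print(classification_report(y_test, preds))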
I am using Keras with the Theano backend. I have a trained DNN and have dumped its weights to a file. I am performing some operations on these weights and dumping the converted weights into another file.
Now I am loading my DNN model with these converted weights and want to compare the results between the two.
I used Keras' evaluate method, but I find the accuracy to be exactly the same even though the weights are different.
Is there another approach with which I can compare the accuracies?
Thanks.
Keras performs some under-the-hood operations on each batch depending on your batch_size, including normalization. So if you only scaled and translated your image, the result will stay the same.
In any case, you can call model.predict(sample, batch_size=1) and write your own evaluation metric to circumvent this issue.
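A minimal sketch of such a manual comparison, assuming two already-loaded models model_a and model_b (original vs. converted weights) and test arrays x_test / y_test, all hypothetical names:

import numpy as np

def manual_accuracy(model, x, y, threshold=0.5):
    # Predict one sample per batch and compute accuracy ourselves,
    # bypassing Keras' built-in evaluate().
    probs = model.predict(x, batch_size=1).ravel()
    preds = (probs > threshold).astype(int)
    return np.mean(preds == y)

print("original weights:", manual_accuracy(model_a, x_test, y_test))
print("converted weights:", manual_accuracy(model_b, x_test, y_test))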
Suppose there are parameters in the network that I would like to change manually in pycaffe rather than have them updated automatically by the solver. For example, suppose we would like to penalize dense activations; this can be implemented as an additional loss layer. Across the training process, we would like to change the strength of this penalty by multiplying the loss by a coefficient that evolves in a pre-specified way. What would be a good way to do this in Caffe? Is it possible to specify this in the prototxt definition? In the pycaffe interface?
Update: I suppose setting lr_mult and decay_mult to 0 might be a solution, but it seems like a clumsy one. Maybe a DummyDataLayer providing the parameters as a blob would be a better option. But there is so little documentation that it's quite a struggle to write for someone new to Caffe.
Maybe this is a trivial question, but just in case someone else is interested, here is the implementation I ended up using successfully.
In the layer's prototxt definition, set lr_mult and decay_mult to 0, which means that we want to neither learn nor decay the parameters. Use a filler to set the initial values. To change the parameters in Python during training of the network, use a statement like
net.params['name'][index].data[...] = something
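A minimal sketch of this inside a training loop, assuming a solver definition at solver.prototxt and a frozen loss-coefficient layer named 'penalty_weight' (both hypothetical):

import caffe

solver = caffe.SGDSolver('solver.prototxt')  # hypothetical solver definition

for it in range(1000):
    # Pre-specified schedule for the penalty coefficient (hypothetical).
    coeff = 0.01 * (1.0 + it / 1000.0)
    # Overwrite the parameter blob by hand; since lr_mult and decay_mult
    # are 0, the solver will not modify it during the update step.
    solver.net.params['penalty_weight'][0].data[...] = coeff
    solver.step(1)  # one forward/backward/update iteration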