A CatBoost model can use several eval_sets for overfitting detection.
But how does the overfitting detector work when multiple datasets are passed?
I've read the docs and googled it. I thought I'd find something, but I didn't...
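To make it concrete, here is roughly what I mean (the data splits and detector settings below are placeholders, not my real ones):

from catboost import CatBoostClassifier, Pool

train_pool = Pool(X_train, y_train)
eval_pool_1 = Pool(X_val1, y_val1)
eval_pool_2 = Pool(X_val2, y_val2)

model = CatBoostClassifier(iterations=1000, od_type="Iter", od_wait=50)
# two eval sets are passed -- which one does the overfitting detector watch?
model.fit(train_pool, eval_set=[eval_pool_1, eval_pool_2])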
Assume I have a trained torch.nn.Module model, and from now on I only need it for evaluation.
Does PyTorch pass my data through all the layers, or does it compress the model so that it only computes an equivalent function?
If not, is there a way to do that, to make the computation faster and the model lighter in memory?
I've looked on the internet for a similar question and didn't find any suitable answer.
You should do two things during inference: set your model to evaluation mode with model.eval() and wrap the forward passes in torch.no_grad(), which disables gradient computation. This will make your code faster and more memory-efficient.
In practice this will look like:
model.eval()  # put layers such as dropout and batch norm into evaluation mode
with torch.no_grad():  # disable gradient tracking inside this block
    output = model(inputs)  # your inference code here; `inputs` is a placeholder
There are many options, and the right one depends on your specific case; a short sketch of the first two follows at the end of this answer.
One option is to convert the model to TorchScript.
Another option is to quantize the model.
Third, you could perform knowledge distillation, transferring the knowledge from your existing model to a smaller one.
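For example, a minimal sketch of the first two options, assuming model is your trained torch.nn.Module and the input shape is just a placeholder:

import torch

# Option 1: convert to TorchScript by tracing with an example input
example_input = torch.randn(1, 3, 224, 224)  # placeholder shape
scripted = torch.jit.trace(model, example_input)
scripted.save("model_scripted.pt")  # can be loaded later with torch.jit.load

# Option 2: dynamic quantization (mainly helps Linear/LSTM-heavy models)
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)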
I am trying to train a model on two different OSes (ubuntu:18.04, macOS 11.6.5) and get the same result. I use pytorch_lightning.seed_everything as well as
Trainer(deterministic=True, ..)
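Concretely, the full setup looks roughly like this (the seed value and the fit arguments are placeholders):

import pytorch_lightning as pl

pl.seed_everything(42, workers=True)  # seeds the Python, NumPy and torch RNGs
trainer = pl.Trainer(deterministic=True, max_epochs=10)
trainer.fit(model, datamodule=dm)  # model/dm stand for my LightningModule and DataModule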
Both models are initialized identically, so the seeds are working correctly, and both train on the CPU.
When training on data with nice continuous values, I get identical models at the end. However, if I use data with a lot of one-hot features, I get similar models on both OSes, but as the epochs go up they slowly diverge, probably due to small differences in floating-point precision adding up.
Does anyone have any idea what could cause this issue? Any ideas on how to fix it?
When we build a model and train it, the initial weights are randomly initialized, unless a seed is specified.
As we know, there are a variety of parameters we can adjust, like epochs, optimizer, batch size, etc., to find the "best" model.
The concept I have trouble with is this: even if we do find the best model after tuning, the weights will be different on the next run, yielding a different model and different results. So the best model from this run might not be the best if we compiled and ran it again with the "best parameters". If we seed the weights for reproducibility, we don't know whether those would be the best weights. On the other hand, if we tune the weights, then the "best parameters" won't be the best parameters anymore. I am stuck in a loop. Is there a general guideline on which parameters to tune first?
Or is this whole logic flawed somewhere and am I way overthinking this?
We initialize weights randomly to break symmetry, so that each node behaves differently from the others.
Depending on the hyperparameters (epochs, batch size, number of iterations, etc.), the weights are updated until training finishes. In the end, the set of updated weights is what we call the model.
The seed is used to control the randomness of the initialization. If I'm not wrong, a good learning setup (objective function and optimizer) converges regardless of the seed value.
Again, a good model means tuning all the hyperparameters and making sure that the model is not underfitting.
On the other hand, the model shouldn't overfit either.
There is no such thing as the "best" parameters (weights, biases); we need to keep tuning the model until the results are satisfactory, and data processing is the main part of that.
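As a rough illustration of the usual practice (fix the seed, then tune the hyperparameters on top of it), with placeholder names and a stand-in model throughout:

import random
import numpy as np
import torch

def train_with_config(lr, batch_size, seed=0):
    # fix every RNG so the same config always starts from the same weights
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    model = torch.nn.Linear(10, 1)  # stand-in for the real architecture
    # ... training loop using lr and batch_size would go here ...
    return model

# tune hyperparameters with the seed held constant so runs are comparable
for lr in (1e-2, 1e-3):
    for batch_size in (32, 64):
        model = train_with_config(lr, batch_size, seed=0)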
I have a dataset of around 6K chemical formulas which I am preprocessing via Keras' tokenization to perform binary classification. I am currently using a 1D convolutional neural network with dropout and am obtaining 82% training accuracy and 80% validation accuracy after only two epochs. No matter what I try, the model just plateaus there and doesn't seem to improve at all. The exact same accuracies are reached with a vanilla LSTM too. What else can I try to improve my accuracies? The training and validation losses only differ by 0.04... Anyone have any ideas? Both models use an embedding layer, and changing its output dimension isn't having an effect either.
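For context, the model is roughly of this shape (the layer sizes here are placeholders, not my exact values):

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Embedding(input_dim=vocab_size, output_dim=64),  # vocab_size comes from the Tokenizer
    layers.Conv1D(64, kernel_size=3, activation="relu"),
    layers.Dropout(0.3),
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid"),  # binary classification
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])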
From your description, I believe your model has high bias and low variance (see this link for further details). In other words, your model is not fitting the data very well and is underfitting. So I suggest three things:
Train your model a little longer: I believe two epochs are too few for your model to pick up the patterns in the data. Try lowering the learning rate and increasing the number of epochs.
Try a different architecture: you could change the number of convolutions, filters, and layers. You can also use different activation functions and other layers, such as max pooling.
Do an error analysis: once training is finished, apply your model to the test set and look at the errors. How many false positives and false negatives do you have? Is your model better at classifying one class than the other? Can you see a pattern in the errors that may be related to your data? (A sketch of this step follows the list.)
Finally, if none of these suggestions helps, you can also try to increase the number of features, if possible.
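For the error-analysis step, a minimal sketch, assuming y_true are the test labels and y_prob are the probabilities returned by model.predict on the test set:

from sklearn.metrics import confusion_matrix, classification_report

# threshold the predicted probabilities at 0.5 to get binary labels
y_pred = (y_prob > 0.5).astype(int).ravel()

# rows = true class, columns = predicted class
print(confusion_matrix(y_true, y_pred))
# per-class precision/recall shows whether one class is handled worse than the other
print(classification_report(y_true, y_pred))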
I have an h2o deep learning model, "model1", that generalizes very well. Unfortunately, I forgot to set export_weights_and_biases = TRUE when building it.
I've tried to retrain numerous models with exactly the same parameters, seed, and dataset as the original model1, this time with export_weights_and_biases = TRUE.
Unfortunately, none of these new models generalizes well at all. In fact, they all fail miserably to generalize, although they all train, validate, cross-validate, and test very well. I've even tried checkpointing the original model1 so I could add export_weights_and_biases = TRUE, but because I did not use Modulo cross-validation, I'm unable to checkpoint.
This irreproducibility is giving me a huge headache. In order to productionize, I need to somehow extract the weights and biases of the original, working model1, despite export_weights_and_biases originally being set to FALSE.
I've looked at the mean weights and biases of model1 and they simply do not match the mean weights and biases of any of my retrained models with the same parameters, seed, dataset, etc. I'm not sure whether the mean weights and biases can be used somehow to force reproducibility.
I've read that downloading model1 as a POJO may give access to the weights and biases, but I'm uncertain about this; I don't know Java and I haven't seen any example Java code to help me along.
Any suggestions or other possible solutions/workarounds?
Thank you in advance for any help.
I partially resolved this on my own: after downloading the model as a POJO, I opened the file with a text editor based on Darren Cook's suggestion (thank you), and I think I can see all the weights and biases there.
I'm not certain, however, because I'm unfamiliar with the POJO format.
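For anyone following along, downloading the POJO looks roughly like this from the h2o Python client (the path is a placeholder); the generated .java file is plain text, which is how the weight and bias arrays become visible in an editor:

import h2o

h2o.init()
# model1 is the trained deep learning model already loaded in the cluster
h2o.download_pojo(model1, path="/tmp/model1_pojo")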