learn = vision_learner(dls, models.resnet18)
In the code snippet above, I am creating a vision_learner with a ResNet-18 model using fastai and passing in a DataLoaders object containing my data.
I am wondering whether any transfer learning is performed within this call, since I am passing my data into the vision learner.
It is important for the task I am carrying out that no transfer learning is performed at this stage.
FastAI's vision_learner has a pretrained argument designed specifically for that purpose. By default it is set to True, so in your case you would want to disable it:
learn = vision_learner(dls, models.resnet18, pretrained=False)
When you create a Learner (a fastai object that combines the data and a model for training), it uses transfer learning to fine-tune a pretrained model in just two lines of code:
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
If you want to make a prediction on a new image, you can use learn.predict
If normalize and pretrained are True, this function adds a Normalization transform to the dls
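To make the "no transfer learning" case concrete, here is a minimal sketch of my own (not from the fastai docs); the epoch count and image path are placeholders:

from fastai.vision.all import *

# pretrained=False gives a randomly initialized ResNet-18, so no pretrained weights
# (and therefore no transfer learning) are involved
learn = vision_learner(dls, resnet18, pretrained=False, metrics=error_rate)

# train all layers from scratch; fine_tune() would assume a pretrained backbone,
# so fit_one_cycle() is the more natural choice here
learn.fit_one_cycle(5)

# predicting on a new image afterwards
pred, pred_idx, probs = learn.predict(PILImage.create("some_image.jpg"))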
I have an optimization algorithm deployed as a live deployment. It takes a set of objects and returns a set of objects of potentially different size. This works just fine when I'm using the REST API.
The problem is, I want to let the user of the workshop app query the model with a set of objects. The returned objects need to be written back to the ontology.
I looked into an action-backed function, but it seems like I can't query the model from a function?!
I looked into webhooks, but they don't seem to fit the purpose; I would also need to handle the API key, and I can't write back to the ontology?!
I know how to query the model with scenarios, but that works per sample, which does not fit the purpose, plus I can't write back to the ontology.
My questions:
Is there any way to call the model from a function and write the return back to the ontology?
Is there any way to call a model from workshop with a set of objects and write back to the ontology?
Is modeling objectives just the wrong place for this use case?
Do I need to implement the optimization in Functions itself?
I have answered the questions below, and also tried to address some of the earlier points.
Q: "I looked into an action backed function but it seems like I can't query the model from a function?!"
A: That is correct, at this time you can't query a model from a function. However, there are JavaScript-based linear optimization libraries that can be used in a function.
Q: "I looked into webhooks but it seems to not fit the purpose and I also would need to handle the API key and can't write back to the ontology?!"
A: Webhooks are for hitting resources on networks where a Magritte agent is installed. So if you have, say, a Flask app on your corporate network, you could hit that app to conduct the optimization. Then set the webhook as "writeback" on an action and use the webhook outputs as inputs for an ontology edit function.
Q: "I know how to query the model with scenarios but it is per sample and that does not fit the purpose, plus I cant write back to the ontology."
A: When querying a model via Workshop, you can pass in a single object as well as any objects linked in a 1:1 relationship with that object. This linking is defined in the modeling objective's modeling API. You are correct that you can't pass in an arbitrary collection of objects. You can write back to the ontology; however, you have to set up an action to apply the scenario back to the ontology (https://www.palantir.com/docs/foundry/workshop/scenarios-apply/).
Q: "Is there any way to call the model from a function and write the return back to the ontology?"
A: Not from an ontology edit function
Q: "Is there any way to call a model from workshop with a set of objects and write back to the ontology?"
A: Only object sets where the objects have 1:1 links within the ontology. You can write back by applying the scenario (https://www.palantir.com/docs/foundry/workshop/scenarios-apply/).
Q: "Is modeling objectives just the wrong place for this usecase? Do I need to implement the optimization in Functions itself?"
A: If you can write the optimization in an ontology edit function, it will be quite a bit more straightforward. The main limitation is that you have to use TypeScript, which is not as commonly used for this kind of thing as Python. There are some basic linear optimization libraries available for JS/TS.
I am using PyTorch Lightning to fine-tune a T5 transformer on a specific task. However, I was not able to understand how the fine-tuning works. I always see this code:
tokenizer = AutoTokenizer.from_pretrained(hparams.model_name_or_path)
model = AutoModelForSeq2SeqLM.from_pretrained(hparams.model_name_or_path)
I don't get how the fine-tuning is done. Are they freezing the whole model and training only the head (and if so, how can I change the head), or are they using the pre-trained model only as a weight initialization? I have been looking for an answer for a couple of days already. Any links or help are appreciated.
If you are using PyTorch Lightning, it won't freeze anything unless you tell it to do so. Lightning has a callback which you can use to freeze your backbone and train only the head module; see BackboneFinetuning.
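As a small illustration of my own (not part of the original answer), nothing loaded via from_pretrained is frozen by default, and freezing a submodule by hand is straightforward; "t5-small" is just an example checkpoint:

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# from_pretrained() only initializes the weights; every parameter is still trainable
print(all(p.requires_grad for p in model.parameters()))  # True

# to freeze the encoder yourself and train only the rest (decoder + lm_head):
for p in model.encoder.parameters():
    p.requires_grad = False

BackboneFinetuning does something similar for you: it expects the part to freeze to be exposed as self.backbone on your LightningModule and unfreezes it again after a configurable number of epochs.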
Also check out Lightning Flash: it allows you to quickly build models for various text tasks and uses the Transformers library for the backbone. You can use its Trainer to specify which kind of fine-tuning you want to apply for your training.
Thanks
I am pretty new to AllenNLP and I am struggling to build a model that does not seem to fit perfectly into the standard way of building models in AllenNLP.
I want to build an NLP pipeline model. The pipeline basically consists of two models, let's call them A and B. First A is trained, and then B is trained on A's predictions over the full training set.
What I have seen is that people define two separate models and train both using the command-line interface (allennlp train ...) in a shell script that looks like this:
# set a bunch of environment variables
...
allennlp train -s $OUTPUT_BASE_PATH_A --include-package MyModel --force $CONFIG_MODEL_A
# prepare environment variables for model b
...
allennlp train -s $OUTPUT_BASE_PATH_B --include-package MyModel --force $CONFIG_MODEL_B
I have two concerns about that:
This code is hard to debug
It's not very flexible. When I want to do a forward pass of the fully trained model, I have to write yet another bash script that does that.
Any ideas on how to do that in a better way?
I thought about using a Python script instead of a shell script and invoking allennlp.commands.main(..) directly. That way you at least have a single Python module you can run with a debugger.
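Roughly, that could look like the sketch below (my own; the config file names and output directories are placeholders). Overriding sys.argv and calling allennlp.commands.main() reproduces the two CLI calls inside one debuggable module:

import sys
from allennlp.commands import main

for config, out_dir in [("config_model_a.jsonnet", "out/model_a"),
                        ("config_model_b.jsonnet", "out/model_b")]:
    # equivalent to: allennlp train <config> -s <out_dir> --include-package MyModel --force
    sys.argv = ["allennlp", "train", config,
                "-s", out_dir,
                "--include-package", "MyModel",
                "--force"]
    main()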
There are two possibilities.
If you're really just plugging the output of one model into the input of another, you could merge them together into one model and run it that way. You can do this with two already-trained models if you initialize the combined model from the two trained models, loading each one from its saved files. Doing it at training time is a little harder, but not impossible: you would train the first model like you do now, and for the second step you train the combined model directly, with the inner first model's weights frozen.
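A rough sketch of that first option (mine, not the answerer's code): load_archive and Model.register are real AllenNLP APIs, but the PipelineModel class itself and the way A's output is passed to B are hypothetical and depend on your two models:

import torch
from allennlp.data import Vocabulary
from allennlp.models import Model
from allennlp.models.archival import load_archive

@Model.register("pipeline_model")
class PipelineModel(Model):
    def __init__(self, vocab: Vocabulary, model_a_archive: str, model_b: Model):
        super().__init__(vocab)
        # load the already-trained model A from its archive and freeze it
        self.model_a = load_archive(model_a_archive).model
        for p in self.model_a.parameters():
            p.requires_grad = False
        self.model_b = model_b  # trained as part of this combined model

    def forward(self, **inputs):
        with torch.no_grad():
            a_output = self.model_a(**inputs)             # predictions from frozen A
        return self.model_b(a_output=a_output, **inputs)  # B consumes A's output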
The other thing you can do is use AllenNLP as a library, without the config files. We have a template up on GitHub that shows you how to do this. The basic insight is that everything you configure in one of the Jsonnet configuration files corresponds 1:1 to a Python class that you can use directly from Python. There is no requirement to use the configuration files. If you use AllenNLP this way, you have much more flexibility, including chaining things together.
I have a PPO policy-based model that I train with RLlib using the Ray Tune API on some standard gym environments (with no fancy preprocessing). I have model checkpoints saved which I can load from and restore for further training.
Now, I want to export my model for production onto a system that should ideally have no dependencies on Ray or RLLib. Is there a simple way to do this?
I know that there is an export_model interface in the rllib.policy.tf_policy class, but it doesn't seem particularly easy to use. For instance, after calling export_model('savedir') in my training script and, in another context, loading via model = tf.saved_model.load('savedir'), the resulting model object is troublesome to feed the correct inputs into for evaluation (something like model.signatures['serving_default'](gym_observation) doesn't work). I'm ideally looking for a method that allows easy, out-of-the-box model loading and evaluation on observation objects.
Once you have restored from checkpoint with agent.restore(checkpoint_path), you can use agent.export_policy_model(output_dir) to export the model as a .pb file and variables folder.
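A sketch of that flow (mine, assuming an older Ray/RLlib 1.x API where PPOTrainer lives in ray.rllib.agents.ppo; the checkpoint path, output directory, and CartPole environment are placeholders):

import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()
agent = PPOTrainer(config={"framework": "tf"}, env="CartPole-v0")
agent.restore("path/to/checkpoint")            # load the trained weights
agent.export_policy_model("exported_policy")   # writes saved_model.pb + variables/

# the export is a plain TensorFlow SavedModel, so production code only needs TF:
import tensorflow as tf
policy = tf.saved_model.load("exported_policy")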
I was curious if it is possible to use transfer learning in text generation, and re-train/pre-train it on a specific kind of text.
For example, having a pre-trained BERT model and a small corpus of medical (or any "type") text, make a language model that is able to generate medical text. The assumption is that you do not have a huge amount of "medical texts" and that is why you have to use transfer learning.
Putting it as a pipeline, I would describe this as:
Using a pre-trained BERT tokenizer.
Obtaining new tokens from my new text and adding them to the existing pre-trained language model (i.e., vanilla BERT).
Re-training the pre-trained BERT model on the custom corpus with the combined tokenizer.
Generating text that resembles the text within the small custom corpus.
Does this sound familiar? Is it possible with hugging-face?
I have not heard of the pipeline you just mentioned. In order to construct an LM for your use-case, you have basically two options:
Further training the BERT (-base/-large) model on your own corpus. This process is called domain adaptation, as also described in this recent paper. This will adapt the learned parameters of the BERT model to your specific domain (bio/medical text). Nonetheless, for this setting you will need quite a large corpus to help the BERT model better update its parameters.
Using a pre-trained language model that was pre-trained on a large amount of domain-specific text, either from scratch or fine-tuned from the vanilla BERT model. As you might know, the vanilla BERT model released by Google was trained on Wikipedia and BookCorpus text. After the vanilla BERT, researchers have tried to train the BERT architecture on other domains besides the initial data collections. You may be able to use these pre-trained models, which have a deep understanding of domain-specific language. For your case, there are models such as BioBERT, BlueBERT, and SciBERT.
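For example, SciBERT can be loaded directly from the Hugging Face hub (the hub ID below is the commonly used one; double-check it on the hub):

from transformers import AutoTokenizer, AutoModel

# SciBERT: BERT pre-trained from scratch on scientific text with its own vocabulary
tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")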
Is it possible with hugging-face?
I am not sure if the Hugging Face developers have a robust approach for pre-training the BERT model on custom corpora, as they claim their code is still in progress, but if you are interested in doing this step, I suggest using Google Research's BERT code, which is written in TensorFlow and is quite robust (released by BERT's authors). In their README, under the "Pre-training with BERT" section, the exact process is described. This will provide you with a TensorFlow checkpoint, which can easily be converted to a PyTorch checkpoint if you'd like to work with PyTorch/Transformers.
It is entirely possible to both pre-train and further pre-train BERT (or almost any other model that is available in the huggingface library).
Regarding the tokenizer: if you are pre-training on a small custom corpus (and therefore starting from a trained BERT checkpoint), then you have to use the tokenizer that was used to train BERT. Otherwise, you will just confuse the model.
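To make the further pre-training step concrete, here is a minimal sketch of my own (not the answerer's code) of domain-adaptive masked language modelling with the Transformers and Datasets libraries; the corpus path and hyperparameters are placeholders:

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # keep BERT's own tokenizer
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# plain-text corpus, one document/sentence per line
dataset = load_dataset("text", data_files={"train": "my_medical_corpus.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

# dynamic masking, as in BERT's original MLM objective
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(output_dir="bert-domain-adapted",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)

Trainer(model=model, args=args,
        train_dataset=tokenized["train"],
        data_collator=collator).train()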
If your use case is text generation (from some initial sentence or part of a sentence), then I advise you to check out GPT-2 (https://huggingface.co/gpt2). I haven't used GPT-2, but with some basic research I think you can do:
from transformers import GPT2Tokenizer, TFGPT2Model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = TFGPT2Model.from_pretrained('gpt2')
and follow this tutorial: https://towardsdatascience.com/train-gpt-2-in-your-own-language-fc6ad4d60171 on how to train the gpt-2 model.
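One caveat I would add (mine, not the answerer's): TFGPT2Model has no language-modelling head, so for actual text generation you would use the LM-head variant, roughly like this (the prompt is just an example):

from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = TFGPT2LMHeadModel.from_pretrained('gpt2')

inputs = tokenizer("The patient presented with", return_tensors="tf")
output_ids = model.generate(inputs["input_ids"], max_length=40, do_sample=True, top_k=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))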
Note: I am not sure if DeBERTa-V3, for example, can be pre-trained as usual. I have checked their github repo and it seems that for V3 there is no official pre-training code (https://github.com/microsoft/DeBERTa/issues/71). However, I think that using huggingface we can actually do it. Once I have time I will run a pre-training script and verify this.