How to use mxnet RNN symbol to generate lstm - deep-learning

I've found RNN symbol is added in mxnet v0.7 python lib.
Now, I'm trying to use it to impl lstm in example/rnn with python.
But I have no idea because there's no document or any information of the input and output.
Can anyone give me any advice?
thanks

I think they have already given a simple example on how to use RNN symbol to build LSTM (they have mode option for this, i.e., mode='lstm'). Here is the example, check it out: https://github.com/dmlc/mxnet/blob/master/example/rnn/rnn_cell_demo.py

Related

How does the finetune on transformer (t5) work?

I am using pytorch lightning to finetune t5 transformer on a specific task. However, I was not able to understand how the finetuning works. I always see this code :
tokenizer = AutoTokenizer.from_pretrained(hparams.model_name_or_path) model = AutoModelForSeq2SeqLM.from_pretrained(hparams.model_name_or_path)
I don't get how the finetuning is done, are they freezing the whole model and training the head only, (if so how can I change the head) or are they using the pre-trained model as a weight initializing? I have been looking for an answer for couple days already. Any links or help are appreciated.
If you are using PyTorch Lightning, then it won't freeze the head until you specify it do so. Lightning has a callback which you can use to freeze your backbone and training only the head module. See Backbone Finetuning
Also checkout Ligthning-Flash, it allows you to quickly build model for various text tasks and uses Transformers library for backbone. You can use the Trainer to specify which kind of finetuning you want to apply for your training.
Thanks

How can I start writing the code for my layer?

I have seen that researchers are adding some functionalities to the original version of Caffe and use those layers and functionalities according to what they need and then these versions are shared through Github. If I am not mistaken, there are two ways: 1) by recompiling Caffe after adding c++ and Cuda versions of layers. 2) writing a python code for the functionality and call it as python layer in Caffe.
I want to add a new layer to Caffe based on my research problem. I really do not from which point should I start writing the new layer and which steps I should consider.
My questions are:
1) Is there any documentation or any learning materials that I can use it for writing the layer?
2) Which way of above-mentioned methods of adding a new layer is preferred?
I really appreciate any help and guidance
Thanks a lot
For research purposes, for "playing around", it is usually more convenient to write a python layer: saves you the hustle of compiling etc.
You can find a short tutorial on "Python" layer here.
On the other hand, if you want better performance you should write a native c++ code for your layer.
You can find a short explanation about it here.

INFO:tensorflow:Summary name conv2d_1/kernel:0 is illegal

I am trying to use the tensorboard callback in keras. When I run the pretrained inceptionv3 model with the tensorboard callback I am getting the following warning:
INFO:tensorflow:Summary name conv2d_95/kernel:0 is illegal; using conv2d_95/kernel_0 instead.
I saw a comment on Github addressing this issue. SeaFX on his comment pointed out that he solved it by replacing variable.name with variable.name.replace(':','_'). I am unsure how to do that. Can anyone please help me. Thanks in advance :)
Not sure on getting name replacement to work however a workaround that may be sufficient for your needs is:
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.WARN)
import keras
This will turn off all INFO level logging but keep warnings, errors etc.
See this question for a discussion on the various log levels and changing them. Personally I found setting the TF_CPP_MIN_LOG_LEVEL environment variable didn't work under Jupyter notebook but I haven't tested on base Python.

In keras(deep learning library), sush custom embedding layer possible?

I just moved in recently from theano, lasagne to keras.
When I in theano, I used such custom embedding layer.
How to keep the weight value to zero in a particular location using theano or lasagne?
It' was useful when deal of variable length input by adding padding.
In keras, such custom embedding layer possible?
Then, how can I make it?
And, such embedding layer may be wrong?
This may not be exactly what you want, but the solution I personally use as it is used in Keras examples (e.g. this one) is to pad the data to a constant length before feeding it to network.
Keras itself provide this pre-processing tool for sequences in keras.preprocessing.sequence.pad_sequences(seq, length)

Weka: Limitations on what one can output as source?

I was consulting several references to discover how I may output trained Weka models into Java source code so that I may use the classifiers I am training in actual code for research applications I have been developing.
As I was playing with Weka 3.7, I noticed that while it does output Java code to its main text buffer when use simpler classification (supervised in my case this time) methods such as J48 decision tree, it removes the option (rather, it voids it by removing the ability to checkmark it and fades the text) to output Java code for RandomTree and RandomForest (which are the ones that give me the best performance in my situation).
Note: I am clicking on the "More Options" button and checking "Output source code:".
Does Weka not allow you to output RandomTree or RandomForest as Java code? If so, why? Or if it does and just doesn't put it in the output buffer (since RF is multiple decision trees which I imagine it doesn't want to waste buffer space), how does one go digging up where in the file system Weka outputs java code by default?
Are there any tricks to get Weka to give me my trained RandomForest as Java code? Or is Serialization of the output *.model files my only hope when it comes to RF and RandomTree?
Thanks in advance to those who provide help.
NOTE: (As an addendum to the answer provided below) If you run across a similar situation (requiring you to use your trained classifier/ML model in your code), I recommend following the links posted in the answer that was provided in response to my question. If you do not specifically need the Java code for the RandomForest, as an example, de-serializing the model works quite nicely and fits into Java application code, fulfilling its task as a trained model/hardened algorithm meant to predict future unlabelled instances.
RandomTree and RandomForest can't be output as Java code. I'm not sure for the reasoning why, but they don't implement the "Sourceable" interface.
This explains a little about outputting a classifier as Java code: Link 1
This shows which classifiers can be output as Java code: Link 2
Unfortunately I think the easiest route will be Serialization, although, you could maybe try implementing "Sourceable" for other classifiers on your own.
Another, but perhaps inconvenient solution, would be to use Weka to build the classifier every time you use it. You wouldn't need to load the ".model" file, but you would need to load your training data and relearn the model. Here is a starters guide to building classifiers in your own java code http://weka.wikispaces.com/Use+WEKA+in+your+Java+code.
Solved the problem for myself by turning the output of WEKA's -printTrees option of the RandomForest classifier into Java source code.
http://pielot.org/2015/06/exporting-randomforest-models-to-java-source-code/
Since I am using classifiers with Android, all of the existing options had disadvantages:
shipping Android apps with serialized models didn't reliably work across devices
computing the model on the phone took too much resources
The final code will consist of three classes only: the class with the generated model + two classes to make the classification work.