Information about Embeddings in the Allen Coreference Model - allennlp

I'm an Italian student approaching the NLP world.
First of all, I'd like to thank you for the amazing work you've done with the paper "Higher-order Coreference Resolution with Coarse-to-fine Inference".
I am using the model provided by allennlp library and I have two questions for you.
At https://demo.allennlp.org/coreference-resolution it says that the embedding used is SpanBERT. Is this a BERT embedding trained independently of the coreference task? I mean, could I use this embedding simply as a pretrained English-language model to embed sentences? (e.g. like https://huggingface.co/facebook/bart-base )
Is it possible to modify the code so that it returns the aforementioned embeddings of each sentence along with the coreference prediction?
I really hope you can help me.
Thank you in advance for your time.
Sincerely,
Emanuele Gusso

SpanBERT is a version of BERT pre-trained to produce useful embeddings on text spans. SpanBERT itself has nothing to do with coreference resolution. The original paper is https://arxiv.org/abs/1907.10529, and the original source code is https://github.com/facebookresearch/SpanBERT, though you might have an easier time using the huggingface version at https://huggingface.co/SpanBERT.
It is definitely possible to get the embeddings as output, along with the coreference predictions. I recommend cloning https://github.com/allenai/allennlp-models, getting it to run in your environment, and then changing the code until it gives you the output you want.
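If the goal is one vector per sentence, one common option (an assumption here, not something the coreference model does for you) is to average the per-token vectors the encoder produces while ignoring padding. The sketch below uses plain Python lists in place of real SpanBERT outputs; `token_vectors` and `mask` are hypothetical stand-ins for the encoder's per-token hidden states and its attention mask.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]], mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, skipping padding."""
    if len(token_vectors) != len(mask):
        raise ValueError("token_vectors and mask must have the same length")
    kept = [vec for vec, m in zip(token_vectors, mask) if m == 1]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Toy stand-in for encoder output: two real tokens and one padding position.
token_vectors = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(token_vectors, mask)
print(sentence_vector)  # [2.0, 3.0]
```

With a real encoder the same pooling would be applied to its per-token output for each sentence; whether mean pooling gives useful sentence vectors for your task is something to check empirically.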

Related

Stacking/chaining CNNs for different use cases

So I'm getting more and more into deep learning using CNNs.
I was wondering if there are examples of "chained" (I don't know what the correct term would be) CNNs - what I mean by that is, using e.g. a first CNN to perform a semantic segmentation task while using its output as input for a second CNN which for example performs a classification task.
My questions would be:
What is the correct term for this sequential use of neural networks?
Is there a way to pack multiple networks into one "big" network that can be trained in a single step, instead of training two models and combining them?
Also if anyone could maybe provide a link so I could read about that kind of stuff, I'd really appreciate it.
Thanks a lot in advance!
Sequential use of independent neural networks can have different interpretations:
The first model may be viewed as a feature extractor and the second one is a classifier.
It may be viewed as a special case of stacking (stacked generalization) with a single model on the first level.
It is common practice in deep learning to chain multiple models together and train them jointly. This is usually called end-to-end learning. Please see this answer about it: https://ai.stackexchange.com/questions/16575/what-does-end-to-end-training-mean
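To make joint training concrete, here is a deliberately tiny sketch in plain Python. Two chained one-parameter "models" stand in for the two networks (an assumption for illustration; real CNNs would be trained with a framework's autograd). The error measured at the output of the second stage is pushed back through the chain rule into the first stage, so both are updated by the same loss.

```python
# Stage 1 computes h = w1 * x; stage 2 computes y_hat = w2 * h.
# Toy data: the chain as a whole should learn y = 6 * x.
xs = [1.0, 2.0, 3.0]
ys = [6.0 * x for x in xs]


def train_jointly(steps: int = 200, lr: float = 0.01):
    w1, w2 = 1.0, 1.0
    losses = []
    n = len(xs)
    for _ in range(steps):
        grad_w1 = grad_w2 = loss = 0.0
        for x, y in zip(xs, ys):
            h = w1 * x              # output of the first model
            y_hat = w2 * h          # output of the second model
            err = y_hat - y
            loss += err * err / n   # mean squared error
            grad_w2 += 2.0 * err * h / n
            # Chain rule: the gradient reaches w1 *through* the second stage.
            grad_w1 += 2.0 * err * w2 * x / n
        losses.append(loss)
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2
    return w1, w2, losses


w1, w2, losses = train_jointly()
print(f"w1 * w2 = {w1 * w2:.4f}")
print(f"first loss = {losses[0]:.2f}, last loss = {losses[-1]:.2e}")
```

Training the two stages separately would need an intermediate target for `h`; here only the final target is required, which is the practical appeal of end-to-end training.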

Where to find deep learning based prediction model

I need to find a deep learning based prediction model, where can I find it?
You can use PyTorch and TensorFlow pretrained models.
https://pytorch.org/docs/stable/torchvision/models.html
They can be downloaded automatically. There is also some sample code that you can try:
https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py
If you are interested in deep learning, I suggest you review the basics in Stanford's CS231n course. Your question is a bit odd, because you first need to define your task specifically. "Prediction" is not a good description. You could look for models for classification, segmentation, object detection, sequence-to-sequence tasks (like translation), and so on.
Then you need to know how to search through projects on GitHub, you need to know Python (in most cases), and you need to either use a pretrained model or use your own dataset to train or fine-tune a model for that task. After that, you need to validate the results on a test set to find out whether the model is actually good for your task. Deploying a model in real-life scenarios is another matter with many more things to consider, and you usually need some strategy for continuing to learn from new data, such as online learning or Federated Learning. I hope this helps.

Deep Learning methods for Text Generation (PyTorch)

Greetings to everyone,
I want to design a system that is able to generate stories or poetry based on a large dataset of text, without needing to feed it a text description/start/summary as input at inference time.
So far I have done this using RNNs, but as you know they have a lot of flaws. My question is, what are currently the best methods for this task?
I looked into using attention mechanisms, but it turns out that they are suited to translation tasks.
I know about GPT-2, BERT, Transformer, etc., but all of them need a text description as input before generation, and this is not what I'm looking for. I want a system able to generate stories from scratch after training.
Thanks a lot!
edit
so the comment was: I want to generate text from scratch, not starting from a given sentence at inference time. I hope it makes sense.
Yes, you can do that. It is just a simple code change on top of ready-made models, be it BERT, GPT-2 or an LSTM-based RNN.
How? You have to provide random input to the model. Such random input can be a randomly chosen word or phrase, or just a vector of zeros.
Hope it helps.
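The inference loop described above can be sketched without any neural network. In the example below a word-level bigram Markov chain stands in for the trained model (a swapped-in technique chosen only to keep the code self-contained), and generation starts from a fixed start symbol rather than from user-supplied text. The names `START`, `END`, `train_bigrams` and `generate` are hypothetical.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"


def train_bigrams(lines):
    """Record, for every token, the tokens that followed it in the corpus."""
    table = defaultdict(list)
    for line in lines:
        tokens = [START] + line.split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            table[prev].append(nxt)
    return table


def generate(table, rng, max_tokens=20):
    """Sample a sequence starting from START; no prompt is required."""
    token, out = START, []
    for _ in range(max_tokens):
        token = rng.choice(table[token])
        if token == END:
            break
        out.append(token)
    return " ".join(out)


corpus = [
    "the moon rises over the sea",
    "the sea sings to the moon",
    "a bird sings over the hills",
]
table = train_bigrams(corpus)
rng = random.Random(0)  # fixed seed so the run is repeatable
print(generate(table, rng))
```

With a neural language model the loop has the same shape: start from whatever begin-of-sequence symbol the model was trained with, sample the next token, and feed it back in until an end symbol or a length limit is reached.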
You have mixed up several things here.
You can achieve what you want using either an LSTM-based or a transformer-based architecture.
When you said you did it with an RNN, you probably mean that you tried an LSTM-based sequence-to-sequence model.
You also mention attention. You can use attention to improve your RNN, but it is not required. If you use a transformer architecture, however, attention is built into the transformer blocks.
GPT-2 is nothing but a transformer-based model. Its building block is the transformer architecture.
BERT is another transformer-based architecture.
So to answer your question, you can and should try an LSTM-based or transformer-based architecture to achieve what you want. Depending on how it is realized, such an architecture is sometimes called GPT-2 and sometimes BERT.
I encourage you to read this classic from Karpathy, if you understand it then you have cleared most of your questions:
http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Affective Demonstratives and POS Tagging

Is there a way to accurately tag affective demonstratives in a corpus? Attempting a project using a Twitter corpus and I need to be able to sort through 200,000+ tweets to pick out the ones with affective demonstratives. I'd rather not do it by hand!
I'm using NLTK and Twython with this whole process if that helps at all.
I don't know of an off-the-shelf solution, but this sounds like a classic NLP classification task. You'll need a sizeable corpus in which you (or someone else) have marked up the "affective demonstratives", and then you'll need to train a classifier and experiment with different features or feature selection algorithms. Look over the nltk book for details.
You would probably want to start by using a standard tagger to POS-tag your corpus; then you can use these tags (and anything else you think might be useful) as input features for your classifier.
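As a rough sketch of the feature step, the code below assumes each tweet has already been tagged into `(word, tag)` pairs with Penn Treebank tags, which is the shape a standard tagger such as NLTK's produces. The feature names are hypothetical guesses at what might help; which ones actually separate affective from ordinary demonstratives has to be found by experiment on labelled data.

```python
DEMONSTRATIVES = {"this", "that", "these", "those"}


def demonstrative_features(tagged, i):
    """Build a feature dict for the token at position i and its neighbours."""
    word, tag = tagged[i]
    prev_tag = tagged[i - 1][1] if i > 0 else "<START>"
    next_tag = tagged[i + 1][1] if i + 1 < len(tagged) else "<END>"
    return {
        "word": word.lower(),
        "tag": tag,
        "prev_tag": prev_tag,
        "next_tag": next_tag,
        "next_is_noun": next_tag.startswith("NN"),
    }


def candidate_featuresets(tagged):
    """Return one feature dict per demonstrative found in a tagged tweet."""
    return [
        demonstrative_features(tagged, i)
        for i, (word, _) in enumerate(tagged)
        if word.lower() in DEMONSTRATIVES
    ]


# Hand-tagged example so the sketch runs without downloading tagger data.
tweet = [("I", "PRP"), ("love", "VBP"), ("this", "DT"), ("song", "NN")]
for feats in candidate_featuresets(tweet):
    print(feats)
```

Each dict, paired with a hand-assigned label, is the kind of input that dictionary-based classifiers (including the ones described in the NLTK book) are trained on.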

How to create a CNN model for image recognition with TensorFlow to compare with Inception v3

I'm studying image recognition with TensorFlow. I have already read the guide "How to retrain Inception's Layer for new categories" on Tensorflow.org, which uses the trained Inception v3 model.
Now I would like to create my own CNN model in order to compare it with Inception v3, but I don't know how to begin.
Does anyone know of a step-by-step guide for this?
I'd appreciate any suggestions.
Thanks in advance
First baby steps
The gold standard for getting started in image recognition is processing MNIST images. TensorFlow has a great tutorial on how to get started and also how to move to convolutional networks.
From there it is a long, hard road to compete with Inception without just copying someone else's graph. You'll probably want to get a feel for what the different convolution layers do. I created a basic TensorFlow tutorial which contains an example Python file that demos different convolution graphs and their resulting accuracy.
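To get a feel for what a single convolution does before stacking layers, here is a minimal plain-Python sketch (not taken from the tutorial mentioned above). It slides a small kernel over an image without padding, which is the cross-correlation that deep learning frameworks call convolution. The kernel values are hand-picked for illustration; in a CNN they would be learned.

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# 4x4 image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Responds where brightness increases from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
print(conv2d_valid(image, vertical_edge))
# [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is non-zero only in the column where the dark and bright halves meet, which is the sense in which a convolution filter "detects" a local pattern.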
Going deeper
After conquering MNIST you'll need a lot of images (you can get them from ImageNet), a lot of GPU time (to run all your training), and a software setup that lets you run and test not just one model but dozens (if not hundreds) of variations to explore your hyperparameters (such as learning rate, convolution size, and dropout). Remember, it took a team of leading machine learning experts many months (possibly years) of iteration, and thousands of CPU/GPU hours, to arrive at something like Inception.
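One simple way to organise those variations is to enumerate a grid of settings up front and launch one training run per entry. The sketch below only generates the configurations; the parameter names and values are hypothetical placeholders, and the training call itself is left out.

```python
from itertools import product

# Hypothetical search space; pick values that make sense for your model.
search_space = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "kernel_size": [3, 5],
    "dropout": [0.25, 0.5],
}


def grid(space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


configs = list(grid(search_space))
print(len(configs))  # 12
print(configs[0])    # {'learning_rate': 0.01, 'kernel_size': 3, 'dropout': 0.25}
```

The grid grows multiplicatively with every added hyperparameter, which is one reason serious searches consume so many GPU hours.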
If you are trying to understand what is going on and what makes a good graph, then trying to recreate Inception is a great idea. If you just want an excellent Image recognition model, then reuse an existing one.
If you are trying to have fun, just do it!
Cheers-