I'm trying to build a model that classifies sentences. I'm using a recurrent neural network (RNN) model, "GRUcell", and I get the following graph. The loss function I'm using is cross-entropy.
Can you please explain why the loss, after getting close to 0, spikes back up to 1 after each iteration?
I can't find any explanation for this. Thank you.
According to the information you have provided, it looks like the loss goes down by the end of a batch and then jumps back up at the start of the next batch. This can be caused by a learning rate that is too high, with not enough decay over time.
Try lowering the learning rate or increasing its decay, and see if that helps.
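To make the decay suggestion a bit more concrete, here is a minimal sketch of an exponential learning-rate schedule in plain Python. The function name `exponential_decay` and all the numbers (initial rate 0.01, halving every 1000 steps) are assumptions chosen for illustration, not values taken from the question.

```python
def exponential_decay(initial_lr, decay_rate, step, decay_steps):
    """Return the learning rate to use at training step `step`.

    The rate is multiplied by `decay_rate` once every `decay_steps` steps
    (hypothetical helper; most frameworks ship an equivalent schedule).
    """
    return initial_lr * decay_rate ** (step / decay_steps)


# Illustrative values only: start at 0.01 and halve every 1000 steps.
for step in (0, 1000, 2000, 5000):
    print(step, exponential_decay(0.01, 0.5, step, 1000))
```

You would feed the returned value to your optimizer at each step; if the spikes persist, a smaller `initial_lr` is probably the first thing to try.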
Cheers
So I'm getting more and more into deep learning using CNNs.
I was wondering if there are examples of "chained" (I don't know what the correct term would be) CNNs - what I mean by that is, using e.g. a first CNN to perform a semantic segmentation task while using its output as input for a second CNN which for example performs a classification task.
My questions would be:
What is the correct term for this sequential use of neural networks?
Is there a way to pack multiple networks into one "big" network which can be trained in a single step, instead of training 2 models and combining them?
Also if anyone could maybe provide a link so I could read about that kind of stuff, I'd really appreciate it.
Thanks a lot in advance!
Sequential use of independent neural networks can have different interpretations:
The first model may be viewed as a feature extractor and the second one is a classifier.
It may be viewed as a special case of stacking (stacked generalization) with a single model on the first level.
It is common practice in deep learning to chain multiple models together and train them jointly. This is usually called end-to-end learning. Please see this answer about it: https://ai.stackexchange.com/questions/16575/what-does-end-to-end-training-mean
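As a rough illustration of joint training, here is a toy sketch in plain Python: two tiny "networks" (a one-unit tanh stage feeding a one-unit linear stage) are chained, and a single loss on the final output updates the parameters of both stages. Everything here (the function `train_end_to_end`, the toy data, the hyperparameters) is an assumption made for the example; real CNN pipelines would use a deep learning framework's automatic differentiation instead of hand-written gradients.

```python
import math
import random


def train_end_to_end(data, epochs=200, lr=0.05, seed=0):
    """Jointly train stage 1 (h = tanh(w1*x + b1)) and stage 2 (y = w2*h + b2)."""
    rng = random.Random(seed)
    w1, b1 = rng.uniform(-0.5, 0.5), 0.0
    w2, b2 = rng.uniform(-0.5, 0.5), 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, target in data:
            # Forward pass through both stages.
            h = math.tanh(w1 * x + b1)
            y = w2 * h + b2
            err = y - target
            total += err * err
            # Backward pass: the gradient of the single loss flows
            # from stage 2 back into stage 1.
            grad_y = 2.0 * err
            grad_h = grad_y * w2
            grad_z = grad_h * (1.0 - h * h)
            w2 -= lr * grad_y * h
            b2 -= lr * grad_y
            w1 -= lr * grad_z * x
            b1 -= lr * grad_z
        losses.append(total / len(data))
    return (w1, b1, w2, b2), losses


# Toy regression data (hypothetical): targets come from a known tanh curve.
data = [(x / 10.0, 2.0 * math.tanh(0.5 * x / 10.0)) for x in range(-20, 21)]
params, losses = train_end_to_end(data)
print(losses[0], losses[-1])
```

The point of the sketch is only that neither stage gets its own separate training target: stage 1 is shaped entirely by the error measured at the output of stage 2.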
I'm an Italian student approaching the NLP world.
First of all I'd like to thank you for the amazing work you've done with the paper " Higher-order Coreference Resolution with Coarse-to-fine Inference".
I am using the model provided by allennlp library and I have two questions for you.
At https://demo.allennlp.org/coreference-resolution it is written that the embedding used is SpanBERT. Is this a BERT embedding trained independently of the coreference task? I mean, could I use this embedding simply as a model pretrained on the English language to embed sentences? (e.g. like https://huggingface.co/facebook/bart-base )
Is it possible to modify the code so that it returns, along with the coreference prediction, the aforementioned embeddings of each sentence?
I really hope you can help me.
Meanwhile I thank you in advance for your great availability.
Sincerely,
Emanuele Gusso
SpanBERT is a version of BERT pre-trained to produce useful embeddings on text spans. SpanBERT itself has nothing to do with coreference resolution. The original paper is https://arxiv.org/abs/1907.10529, and the original source code is https://github.com/facebookresearch/SpanBERT, though you might have an easier time using the huggingface version at https://huggingface.co/SpanBERT.
It is definitely possible to get the embeddings as output, along with the coreference predictions. I recommend cloning https://github.com/allenai/allennlp-models, getting it to run in your environment, and then changing the code until it gives you the output you want.
I have a small dataset of about 1000 images and am training my model to detect 8 classes. I divided my dataset in an 80:20 ratio (training:validation) and wanted to apply k-fold cross-validation to make the most of my dataset.
#1: Is this line of thinking proper or am I misunderstanding something? In another post about K-fold cross-validation in object detection, someone mentioned that since we have confidence scores, we don't require k fold cross-validation. However, I don't see a correlation between training my model on the 'k' number of folds and confidence scores.
#2: Is this something that has to be manually done or does tensorflow 2.x have the means to add k fold cross-validation?
Any clarification would be greatly appreciated! Thanks!
Regarding your questions #1 and #2:
In my opinion, it would be proper to do k-fold. Note that splitting the dataset in an 8:2 ratio is called the holdout method; as far as I know, it is not k-fold. When you want to do k-fold, there are some things you probably need to consider, such as the class distribution, the bounding box distribution, etc. However, as you don't provide any sample data or code, here is a similar discussion that might help you.
It has to be done manually. K-fold is a resampling procedure used to evaluate machine learning models on a limited data sample. It is not built into TensorFlow's training API: you generate the folds yourself (or with a utility such as scikit-learn's KFold) and train the model once per fold.
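For reference, a manual k-fold split can be sketched in a few lines of plain Python. The helper name `k_fold_indices` and the choice of 5 folds are assumptions for illustration. Note that this simple version shuffles and splits uniformly: it does not balance class or bounding box distributions across folds, which you may need for object detection.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, val_idx


# 1000 images and 5 folds: each fold trains on 800 and validates on 200.
for fold, (train_idx, val_idx) in enumerate(k_fold_indices(1000, 5)):
    print(fold, len(train_idx), len(val_idx))
```

For each pair you would build a fresh model, train it on the images selected by `train_idx`, evaluate on `val_idx`, and average the metric over the folds.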
I'm new to this forum. I looked at this simple reinforcement learning SARSA code (this is the code link).
What I am unable to see is how to store its model, the way we store the weights of a CNN in deep learning, so that we can just load the model and use it without needing to train it every time. Is it possible to achieve this here? Thanks a lot.
Hi and welcome #BetaLearner. In the example at the link, the Q-function is stored as a table instead of using a neural network or another kind of function approximator. So, you can simply save the table (actually stored as a defaultdict) and load it later without needing to train again.
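A minimal sketch of saving and reloading such a table with `pickle` is shown below. It assumes the table is a `defaultdict` whose default is created by a lambda and that there are four actions; both are assumptions about the linked code, and the helper names are hypothetical. Because a lambda default factory cannot be pickled, the sketch saves a plain `dict` copy and rebuilds the `defaultdict` on load.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def new_q_table():
    # Assumed layout: one list of action values per state (4 actions here).
    return defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])


def save_q_table(q_table, path):
    # Convert to a plain dict first: the lambda default factory is not picklable.
    with open(path, "wb") as f:
        pickle.dump(dict(q_table), f)


def load_q_table(path):
    with open(path, "rb") as f:
        saved = pickle.load(f)
    q_table = new_q_table()
    q_table.update(saved)
    return q_table


q_table = new_q_table()
q_table["[0, 0]"][1] = 0.5  # pretend this value was learned during training
path = os.path.join(tempfile.gettempdir(), "q_table.pkl")
save_q_table(q_table, path)
restored = load_q_table(path)
print(restored["[0, 0]"])
```

After loading, the agent can act greedily from `restored` right away, and unseen states still fall back to the default values.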
Greetings to everyone,
I want to design a system that is able to generate stories or poetry based on a large dataset of text, without needing to feed it a text description/start/summary as input at inference time.
So far I have done this using RNNs, but as you know they have a lot of flaws. My question is: what are the best methods to achieve this task at the moment?
I looked into possibilities using attention mechanisms, but it turns out that they are suited to translation tasks.
I know about GPT-2, BERT, Transformer, etc., but all of them need a text description as input before the generation, and this is not what I'm seeking. I want a system able to generate stories from scratch after training.
Thanks a lot!
Edit:
So the comment was: I want to generate text from scratch, not starting from a given sentence at inference time. I hope it makes sense.
Yes, you can do that. It is just simple code manipulation on top of the ready-made models, be it BERT, GPT-2 or an LSTM-based RNN.
How? You have to provide random input to the model. Such random input can be a randomly chosen word or phrase, or just a vector of zeros.
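The idea of generating without a user-supplied prompt can be sketched with a deliberately simple stand-in model. The example below uses a word-level bigram (Markov chain) model rather than an RNN or transformer, and begins every generation from a fixed start token, so the first word is sampled at random from the words that opened a training line. The token names, the tiny corpus, and the function names are all assumptions for illustration.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical boundary tokens


def train_bigram(lines):
    """Record, for every word, the words that followed it in the corpus."""
    table = defaultdict(list)
    for line in lines:
        tokens = [START] + line.split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            table[prev].append(nxt)
    return table


def generate(table, rng, max_words=20):
    """Sample text from the start token; no prompt is supplied by the user."""
    word, out = START, []
    while len(out) < max_words:
        word = rng.choice(table[word])
        if word == END:
            break
        out.append(word)
    return " ".join(out)


# Toy corpus, for illustration only.
corpus = [
    "the moon rises over the silent sea",
    "the sea sings to the sleeping moon",
    "a silent bird dreams over the sea",
]
table = train_bigram(corpus)
print(generate(table, random.Random(0)))
```

With a neural model the same pattern applies: prepend a start token (or a random seed word) during training and begin sampling from it at inference time.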
Hope it helps.
You have mixed up several things here.
You can achieve what you want using either an LSTM-based or a transformer-based architecture.
When you said you did it with an RNN, you probably meant that you tried an LSTM-based sequence-to-sequence model.
You also mention attention in your question. You can use attention to improve your RNN, but it is not a requirement. However, if you use a transformer architecture, attention is built into the transformer blocks.
GPT-2 is nothing but a transformer-based model. Its building block is the transformer architecture.
BERT is another transformer-based architecture.
So, to answer your question: you can and should try an LSTM-based or transformer-based architecture to achieve what you want. GPT-2 and BERT are names of particular transformer-based models; they differ in how the architecture is realized.
I encourage you to read this classic from Karpathy. If you understand it, most of your questions will be answered:
http://karpathy.github.io/2015/05/21/rnn-effectiveness/