Is there a notation named sensor in nltk? - nltk

I am taking Stanford CS224N: Natural Language Processing with Deep Learning.
Chris said
"very fine-grain differences between sensors that are a human being
can barely understand the difference between them and relate to"
in Lecture 1 while illustrating a piece of NLTK code.
Is there a notation named sensor in NLTK? If yes, what does it mean?

I think the automatic captioning on YouTube is wrong and that the lecturer pronounced the word synset.
And yes, there is a notation for synsets in NLTK; in fact, the notation comes from WordNet.
You can get a synset with:
from nltk.corpus import wordnet as wn
dog = wn.synset('dog.n.01')
where dog is the morphological stem of one of the lemmas, n is the part of speech (noun in this case), and 01 is an index.
According to the NLTK documentation:
Synset(wordnet_corpus_reader)
Create a Synset from a lemma.pos.number string where: lemma is the word's morphological stem, pos is one of the module attributes ADJ, ADJ_SAT, ADV, NOUN or VERB, and number is the sense number, counting from 0.
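For completeness, here is a short illustration (standard NLTK WordNet calls, not part of the original answer) of what you can do with a synset once you have it:
from nltk.corpus import wordnet as wn

dog = wn.synset('dog.n.01')
print(dog.definition())    # a member of the genus Canis ...
print(dog.lemma_names())   # ['dog', 'domestic_dog', 'Canis_familiaris']
print(wn.synsets('dog'))   # all senses of "dog", each one a separate synset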

Related

word synonym / antonym detection

I need to create a classifier that takes 2 words and determines if they are synonyms or antonyms. I tried nltk's antsyn-net but it doesn't have enough data.
example:
capitalism <-[antonym]-> socialism
capitalism =[synonym]= free market
god <-[antonym]-> atheism
political correctness <-[antonym]-> free speech
advertising =[synonym]= marketing
I was thinking about taking a BERT model, because maybe some of the relations are already embedded in it, and transfer-learning on a dataset that I found.
I would suggest the following pipeline:
Construct a training set from an existing dataset of synonyms and antonyms (taken e.g. from the WordNet thesaurus). You'll need to craft negative examples carefully.
Take a pretrained model such as BERT and fine-tune it on your task. If you choose BERT, it should probably be BertForNextSentencePrediction, where you use your words/phrases instead of sentences and predict 1 if they are synonyms and 0 if they are not; the same for antonyms.
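As a rough starting point, here is a minimal sketch of that fine-tuning step using the Hugging Face transformers library. It uses BertForSequenceClassification with word-pair inputs rather than BertForNextSentencePrediction (a common alternative for pair classification), and the two-pair toy dataset below is made up purely for illustration:
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# toy examples: label 1 = synonyms, label 0 = not synonyms (e.g. antonyms)
pairs = [("capitalism", "free market", 1),
         ("capitalism", "socialism", 0)]

model.train()
for word_a, word_b, label in pairs:
    enc = tokenizer(word_a, word_b, return_tensors="pt")   # encodes the pair with [SEP]
    out = model(**enc, labels=torch.tensor([label]))
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
You would train one such classifier for synonymy and a second one for antonymy (or use three labels in a single classifier).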

How is the self-attention mechanism in Transformers able to learn how the words are related to each other?

Given the sentence The animal didn't cross the street because it was too tired, how is self-attention able to assign a higher score to the word animal instead of the word street?
I'm wondering if that might be a consequence of the word embedding vectors fed into the network, which somehow already encapsulate some degree of distance among the words.
Word embeddings are first added to a positional encoding, which adds information about each word's position in the sequence. Then, through each layer of the encoder stack (6 to be precise), the embeddings undergo multiple transformations and are refined into better representations before they are passed on to the decoder.
The modifications to the embeddings as they pass through the encoder stack are learnable. Sometimes it may appear that some attention heads in the top layers are doing something that looks like coreference resolution, which is what you pointed out in your example. Attending more to the word "animal" simply results in a better representation than attending to "street".
How do we know which representations are better? The ones that minimize the loss or produce a better output, of course!
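To make the "refinement" concrete, here is a minimal sketch (not from the original answer) of single-head scaled dot-product self-attention; the projection matrices are random here, whereas in a trained Transformer they are learned so that useful tokens like "animal" end up with higher weights:
import torch
import torch.nn.functional as F

d_model, seq_len = 64, 10          # embedding size, number of tokens in the sentence
x = torch.randn(seq_len, d_model)  # token embeddings + positional encodings

W_q = torch.randn(d_model, d_model)  # learned in practice, random in this sketch
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)

Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / d_model ** 0.5    # pairwise compatibility of every token with every other
weights = F.softmax(scores, dim=-1)  # each row is one token's attention distribution (sums to 1)
output = weights @ V                 # each token's new, refined representation
The row of weights for the token "it" is exactly where a trained model can put more mass on "animal" than on "street".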

ECMAScript 2017, 5 Notational Conventions: What are productions, terminal and nonterminal symbols? [duplicate]

Can someone explain to me what a context free grammar is? After looking at the Wikipedia entry and then the Wikipedia entry on formal grammar, I am left utterly and totally befuddled. Would someone be so kind as to explain what these things are?
I am wondering this because I wish to investigate parsing, and also, on the side, the limitations of a regex engine.
I'm not sure if these terms are directly programming related, or if they are related more to linguistics in general. If that is the case, I apologize; perhaps this could be moved?
A context free grammar is a grammar which satisfies certain properties. In computer science, grammars describe languages; specifically, they describe formal languages.
A formal language is just a set (mathematical term for a collection of objects) of strings (sequences of symbols... very similar to the programming usage of the word "string"). A simple example of a formal language is the set of all binary strings of length three, {000, 001, 010, 011, 100, 101, 110, 111}.
Grammars work by defining transformations you can make to construct a string in the language described by a grammar. Grammars will say how to transform a start symbol (usually S) into some string of symbols. A grammar for the language given before is:
S -> BBB
B -> 0
B -> 1
The way to interpret this is to say that S can be replaced by BBB, and B can be replaced by 0, and B can be replaced by 1. So to construct the string 010 we can do S -> BBB -> 0BB -> 01B -> 010.
A context-free grammar is simply a grammar where the thing that you're replacing (left of the arrow) is a single "non-terminal" symbol. A non-terminal symbol is any symbol you use in the grammar that can't appear in your final strings. In the grammar above, "S" and "B" are non-terminal symbols, and "0" and "1" are "terminal" symbols. Grammars like
S -> AB
AB -> 1
A -> AA
B -> 0
are not context free, since they contain rules like AB -> 1 that have more than one non-terminal symbol on the left.
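As a side note (not part of the original answer), NLTK can express and parse the first grammar above directly, which makes the derivation S -> BBB -> ... -> 010 easy to see:
import nltk

grammar = nltk.CFG.fromstring("""
S -> B B B
B -> '0'
B -> '1'
""")
parser = nltk.ChartParser(grammar)
for tree in parser.parse(list("010")):
    print(tree)   # (S (B 0) (B 1) (B 0))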
Language theory is related to the theory of computation, which is the more philosophical side of computer science: deciding which programs are possible, which will ever be possible to write, and what types of problems are impossible to write an algorithm to solve.
A regular expression is a way of describing a regular language. A regular language is a language which can be decided by a deterministic finite automaton.
You should read the article on Finite State Machines: http://en.wikipedia.org/wiki/Finite_state_machine
And Regular languages:
http://en.wikipedia.org/wiki/Regular_language
All Regular Languages are Context Free Languages, but there are Context Free Languages that are not regular. A Context Free Language is the set of all strings accepted by a Context Free Grammar or a Pushdown Automaton, which is a Finite State Machine with a single stack: http://en.wikipedia.org/wiki/Pushdown_automaton#PDA_and_Context-free_Languages
There are more complicated languages that require a Turing Machine (Any possible program you can write on your computer) to decide if a string is in the language or not.
Language theory is also very related to the P vs. NP problem, and some other interesting stuff.
My Introduction to Computer Science third year text book was pretty good at explaining this stuff: Introduction to the Theory of Computation. By Michael Sipser. But, it cost me like $160 to buy it new and it's not very big. Maybe you can find a used copy or find a copy at a library or something it might help you.
EDIT:
The limitations of Regular Expressions and higher language classes have been researched a ton over the past 50 years or so. You might be interested in the pumping lemma for regular languages. It is a means of proving that a certain language is not regular:
http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages
If a language isn't regular it may be Context Free, which means it could be described by a Context Free Grammar, or it may be in an even higher language class. You can prove it's not Context Free with the pumping lemma for Context Free Languages, which is similar to the one for regular languages.
A language can even be undecidable, which means even a Turing machine (any program your computer can run) can't be programmed to decide if a string should be accepted as in the language or rejected.
I think the part you're most interested in is Finite State Machines (both deterministic and nondeterministic), to see what languages a Regular Expression can decide, and the pumping lemma to prove which languages are not regular.
Basically a language isn't regular if deciding it needs some sort of memory or ability to count. The language of matching parentheses is not regular, for example, because the machine needs to remember how many parentheses it has opened in order to know how many still have to be closed.
The language of all strings using the letters a and b that contain at least three b's is a regular language: abababa, for example.
The language of all strings using the letters a and b that contain more b's than a's is not regular.
Also, you should note that all finite languages are regular, for example:
The language of all strings less than 50 characters long using the letters a and b that contain more b's than a's is regular, since it is finite: we know it could be described as (b|abb|bab|bba|aabbb|ababb|...) etc. until all the possible combinations are listed.
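To make the contrast concrete, here is a small illustration (mine, not the answerer's): a single regular expression decides "at least three b's", while "more b's than a's" requires counting, which is exactly the kind of memory a finite automaton does not have:
import re

at_least_three_bs = re.compile(r"a*(?:ba*){3,}")      # regular: no counting needed
print(bool(at_least_three_bs.fullmatch("abababa")))   # True  (three b's)
print(bool(at_least_three_bs.fullmatch("aabaa")))     # False (only one b)

def more_bs_than_as(s):
    # Deciding this language needs unbounded counting, so no regex/DFA can do it.
    return s.count("b") > s.count("a")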

wordnet on different text?

I am new to NLTK, and I find the WordNet functionality pretty useful. It gives synsets, hypernyms, similarity, etc. However, it fails to give similarity between locations like 'Delhi' and 'Hyderabad', obviously because these words are not in the WordNet corpus.
So, I would like to know if I can somehow update the WordNet corpus, OR create a WordNet over a different corpus, e.g. a set of pages extracted from Wikipedia related to travel? If we can create a WordNet over a different corpus, then what would be the format, the steps to do so, and any limitations?
Please can you point me to links that describe the above concerns? I have searched the internet, googled, and read portions of the NLTK book, but I don't have a single hint for the above question.
Pardon me if the question sounds completely ridiculous.
For flexibility in measuring the semantic similarity of very specific terms like Delhi or Hyderabad, what you want is not something hand-crafted like WordNet, but an automatically learned similarity measure from a very large database. These are statistical similarity approaches. Of course, you want to avoid having to train such a model on data yourself...
Thus one thing that may be useful is the Google Distance (wikipedia, original paper). It seems fairly simple to implement such a measure in a language like R (code), and the original paper reports 87% agreement with WordNet.
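For reference, here is a quick sketch of the Normalized Google Distance formula as it is usually stated (my transcription, not the answerer's code); f(x), f(y) and f(x, y) are page-hit counts for each term and for the two terms together, and N is roughly the total number of indexed pages:
from math import log

def ngd(fx, fy, fxy, n):
    # Smaller values mean the two terms co-occur more often, i.e. are more related.
    return (max(log(fx), log(fy)) - log(fxy)) / (log(n) - min(log(fx), log(fy)))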
The similarity measures in Wordnet work as expected because Wordnet measures semantic similarity. In that sense, both are cities, so they are very similar. What you are looking for is probably called geographic similarity.
from nltk.corpus import wordnet as wn

delhi = wn.synsets('Delhi', 'n')[0]
print(delhi.definition())
# a city in north central India
hyderabad = wn.synsets('Hyderabad', 'n')[0]
print(hyderabad.definition())
# a city in southern Pakistan on the Indus River
print(delhi.wup_similarity(hyderabad))
# 0.9
melon = wn.synsets('melon', 'n')[0]
print(delhi.wup_similarity(melon))
# 0.3
There is a Wordnet extension, called Geowordnet. I kind of had the same problem as you at one point and tried to unify Wordnet with some of its extensions: wnext. Hope that helps.

Find similarity of a sentence with 6 basic emotions using wordnet

I'm working on a project and a part of it needs to detect the emotion of the text we work on.
For example,
He is happy to go home.
I'll be taking two words from the above sentence, i.e. happy and home.
I'll be having a table containing 6 basic emotions (happy, sad, fear, anger, disgust, surprise).
Each of these emotions will have some synsets associated with them.
I need to find the similarity between these synsets and the word happy, and then the similarity between these synsets and the word home.
I tried to use WordNet for this purpose but couldn't understand how WordNet works, as I'm new to this.
I think you want to find words in the sentence that are similar to any of the words representing any of the 6 basic given emotions. If I am correct, I think you can use the following solution.
First, extract the synset of each of the word senses representing the 6 basic emotions. Now form the vectorized representation of each of these synsets (a synset is a collection of synonymous words). You can do this using the word2vec tool available at https://code.google.com/archive/p/word2vec/, e.g.:
Suppose "happy" has the word senses a1, a2, a3 as its synonymous words then
1. First train the word2vec tool on any large English corpus, e.g. the Bojar corpus.
2. Then, using the trained word2vec model, obtain word embeddings (vectorized representations) of each synonymous word a1, a2, a3.
3. The vectorized representation of the synset of "happy" would then be the average of the vectorized representations of a1, a2, a3.
4. In this way you can have a vectorized representation of the synset of each of the 6 basic emotions.
Now, for the given sentence, find the vectorized representation of each word using the vocabulary generated by the trained word2vec model. You can then use cosine similarity
(https://en.wikipedia.org/wiki/Cosine_similarity) to find the distance (similarity) of each word from the synsets of the 6 basic emotions. In this way you can determine the (basic-level) emotion of the sentence.
Source of the technique: the research paper "Unsupervised Most Frequent Sense Detection using Word Embeddings" by Sudha et al. (http://www.aclweb.org/anthology/N15-1132)
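Putting the steps together, here is a minimal sketch (my illustration, not the answerer's code) using gensim for the word2vec side; the model path "word2vec.bin" is a hypothetical placeholder for whatever pretrained vectors you load:
import numpy as np
from gensim.models import KeyedVectors
from nltk.corpus import wordnet as wn

w2v = KeyedVectors.load_word2vec_format("word2vec.bin", binary=True)  # hypothetical path

def synset_vector(word):
    # Average the vectors of the lemmas in the word's first synset (step 3 above).
    lemmas = wn.synsets(word)[0].lemma_names()
    vecs = [w2v[l] for l in lemmas if l in w2v]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

emotions = ["happy", "sad", "fear", "anger", "disgust", "surprise"]
emotion_vecs = {e: synset_vector(e) for e in emotions}

for word in ["happy", "home"]:   # the two words taken from the example sentence
    if word in w2v:
        scores = {e: cosine(w2v[word], v) for e, v in emotion_vecs.items()}
        print(word, max(scores, key=scores.get), scores)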