NLTK WordNet not lemmatizing word even with POS tag

When I call wnl.lemmatize('promotional', 'a') or wnl.lemmatize('promotional', wordnet.ADJ), I just get 'promotional' back, when I expected 'promotion'. I supplied the correct POS, so why isn't it working? What can I do?

Lemmatization only changes between inflected forms, so the noun "promotion" isn't a lemma of the adjective "promotional".
Note, however, that WordNet lists your noun as a pertainym of the adjective's lemma:
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('promotional')[0].lemmas()[0]
Lemma('promotional.a.01.promotional')
>>> wn.synsets('promotional')[0].lemmas()[0].pertainyms()
[Lemma('promotion.n.01.promotion')]

Related

how to determine past perfect tense from POS tags

The past perfect form of 'I love.' is 'I had loved.' I am trying to identify such past perfects from POS tags (using NLTK, spaCy, or Stanford CoreNLP). What POS tag should I be looking for? Or should I instead look for the past form of the word "have"? Would that be exhaustive?
I PRP PRON
had VBD VERB
loved VBN VERB
. . PUNCT
The complete POS tag list used by CoreNLP (and I believe all the other libraries trained on the same data) is available at https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
I think your best bet is to let the library annotate a list of sentences containing the verbal form you want to identify, and then manually derive a series of rules (e.g., sequences of POS tags) that match what you need. For example, you could look for VBD ("I loved"), VBD VBN ("I had loved"), VBD VBG ("I was loving somebody"), etc.
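As a rough illustration of such a rule, here is a minimal sketch that scans already-tagged tokens for "had" tagged VBD followed by a VBN participle, skipping any adverbs in between. The function name find_past_perfect and the adverb-skipping step are assumptions made for this example; they are not part of any of the libraries mentioned, and the rule is not claimed to be exhaustive.

```python
from typing import List, Tuple

TaggedToken = Tuple[str, str]  # (word, Penn Treebank tag)


def find_past_perfect(tagged: List[TaggedToken]) -> List[Tuple[int, int]]:
    """Return (auxiliary_index, participle_index) pairs for 'had ... VBN'."""
    matches = []
    for i, (word, tag) in enumerate(tagged):
        # The auxiliary must be the word "had" tagged as a past-tense verb.
        if word.lower() != "had" or tag != "VBD":
            continue
        j = i + 1
        # Skip adverbs between auxiliary and participle ("had never loved").
        while j < len(tagged) and tagged[j][1].startswith("RB"):
            j += 1
        # The next non-adverb token must be a past participle.
        if j < len(tagged) and tagged[j][1] == "VBN":
            matches.append((i, j))
    return matches


# Hypothetical input mirroring the tagged sentence from the question.
sentence = [("I", "PRP"), ("had", "VBD"), ("loved", "VBN"), (".", ".")]
print(find_past_perfect(sentence))
```

A rule like this will miss questions ("Had you seen it?"), where the subject sits between the auxiliary and the participle, and contracted forms such as "I'd", so it is best refined against your own annotated sentences.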

Html entity for true minus symbol

I'm a typophile adding mathematical equations to my pages.
I've found questions like this one that explain how to use &times; (×) instead of 'x' for a true multiplication symbol. But I can't find any questions that indicate whether an HTML entity exists for a true minus symbol, as opposed to a hyphen, en-dash or em-dash.
Any help would be much appreciated.
According to this reference, the HTML entity is &minus; which renders as − (U+2212 MINUS SIGN).
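If you want to check which character a named entity actually decodes to, Python's standard library can do it. This is only a small sketch; the helper name describe is made up for this example.

```python
import html
import unicodedata


def describe(entity: str) -> tuple:
    """Decode a named HTML entity and return (code point, Unicode name)."""
    char = html.unescape(entity)
    return f"U+{ord(char):04X}", unicodedata.name(char)


# Compare the true minus sign with the look-alikes mentioned in the question.
for entity in ("&minus;", "&times;", "&ndash;", "&mdash;"):
    print(entity, *describe(entity))

# The ASCII hyphen is a different character from the minus sign.
print(html.unescape("&minus;") == "-")
```

The last line prints False, which is the point: the minus sign is its own code point and not just a restyled hyphen.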

Idiomatic Proof by Contradiction in Isabelle?

So far I have written proofs by contradiction in the following style in Isabelle (using a pattern by Jeremy Siek):
lemma "<expression>"
proof -
  {
    assume "¬ <expression>"
    then have False sorry
  }
  then show ?thesis by blast
qed
Is there a way that works without the nested raw proof block { ... }?
There is the rule ccontr for classical proofs by contradiction:
have "<expression>"
proof (rule ccontr)
  assume "¬ <expression>"
  then show False sorry
qed
It may sometimes help to use by contradiction to prove the last step.
There is also the rule classical (which looks less intuitive):
have "<expression>"
proof (rule classical)
  assume "¬ <expression>"
  then show "<expression>" sorry
qed
For further examples using classical, see $ISABELLE_HOME/src/HOL/Isar_Examples/Drinker.thy
For better understanding of rule classical it can be printed in structured Isar style like this:
print_statement classical
Output:
theorem classical:
obtains "¬ thesis"
Thus the pure evil to intuitionists appears a bit more intuitive: in order to prove some arbitrary thesis, we may assume that its negation holds.
The corresponding canonical proof pattern is this:
notepad
begin
  have A
  proof (rule classical)
    assume "¬ ?thesis"
    then show ?thesis sorry
  qed
end
Here ?thesis is the concrete thesis of the above claim of A, which may be an arbitrarily complex statement. This quasi abstraction via the abbreviation ?thesis is typical for idiomatic Isar, to emphasize the structure of reasoning.

pos_tag in NLTK does not tag sentences correctly

I have used this code:
# Step 1: tokenize the text
from nltk.tokenize import word_tokenize
text = "John is very nice. Is John very nice?"
words = word_tokenize(text)
# Step 2: assign a POS tag to each token
from nltk.tag import pos_tag
tags = pos_tag(words)
to tag two sentences:
John is very nice. Is John very nice?
John was tagged NN in the first sentence but VB in the second! So how can we correct the pos_tag function without training back-off taggers?
Modified question:
I have seen the demonstration of NLTK taggers here: http://text-processing.com/demo/tag/. When I tried the option "English Taggers & Chunkers: Treebank" or "Brown Tagger", I got the correct tags. So how can I use the Brown Tagger, for example, without training it?
Short answer: you can't. Slightly longer answer: you can override specific words using a manually created UnigramTagger. See my answer for custom tagging with nltk for details on this method.
I tried to reproduce the bug using NLTK v3.0, and I think nltk.pos_tag() is now fixed. As @Jacob mentioned, you can use the Brown Corpus to train a tagger as follows:
import nltk
from nltk.corpus import brown
# Train a unigram tagger on the tagged sentences of the Brown Corpus
train_sents = brown.tagged_sents()
unigram_tagger = nltk.UnigramTagger(train_sents)
tokens = nltk.word_tokenize("Is John very nice?")
tagged = unigram_tagger.tag(tokens)
tagged
But note that the tag set depends on the corpus that was used to train the tagger. The default tagger behind nltk.pos_tag() uses the Penn Treebank tag set.
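For the manual-override approach mentioned in the first answer, a minimal sketch might look like the following. The override dictionary and the choice of DefaultTagger as the backoff are assumptions for illustration only; in practice the backoff would be whichever tagger you already use for the remaining words.

```python
from nltk.tag import DefaultTagger, UnigramTagger

# Hypothetical overrides: force these exact tokens to the given Penn Treebank tags.
overrides = {"John": "NNP", "Is": "VBZ", "is": "VBZ"}

# DefaultTagger is only a stand-in backoff for this sketch; it tags every
# token the override model does not know as NN.
fallback = DefaultTagger("NN")

# Passing model= builds the tagger from the dictionary, with no training data.
override_tagger = UnigramTagger(model=overrides, backoff=fallback)

print(override_tagger.tag(["Is", "John", "very", "nice", "?"]))
```

Because the overrides are looked up first and everything else falls through to the backoff, this needs no corpus download, but it only corrects the exact tokens you list.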

What do the abbreviations in POS tagging etc mean?

Say I have the following Penn Tree:
(S (NP-SBJ the steel strike)
(VP lasted
(ADVP-TMP (ADVP much longer)
(SBAR than
(S (NP-SBJ he)
(VP anticipated
(SBAR *?*))))))
.)
What do abbreviations like VP and SBAR mean? Where can I find these definitions? What are these abbreviations called?
Those are Penn Treebank labels; VP and SBAR are phrase-level (constituent) labels rather than word-level POS tags. For example, VP means "Verb Phrase". The full list can be found here
The full list of Penn Treebank POS tags (so-called tagset) including examples can be found on https://www.sketchengine.eu/penn-treebank-tagset/
If you are interested in detailed information on POS tags or POS tagging, see the brief manual for beginners at https://www.sketchengine.co.uk/pos-tags/
VP means verb phrase. These are standard abbreviations in the treebank.
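To make the labels in the example tree concrete, here is a small sketch that pulls every bracket label out of the tree text and expands it. The glossary covers only the labels that occur in this tree, and the function name describe_labels is made up for the example. A label such as NP-SBJ combines a syntactic category (NP) with a function tag (SBJ).

```python
import re

tree_text = """
(S (NP-SBJ the steel strike)
   (VP lasted
       (ADVP-TMP (ADVP much longer)
                 (SBAR than
                       (S (NP-SBJ he)
                          (VP anticipated
                              (SBAR *?*))))))
   .)
"""

# Glossary restricted to the labels that appear in the tree above.
CATEGORIES = {
    "S": "simple declarative clause",
    "NP": "noun phrase",
    "VP": "verb phrase",
    "ADVP": "adverb phrase",
    "SBAR": "clause introduced by a subordinating conjunction",
}
FUNCTION_TAGS = {"SBJ": "subject", "TMP": "temporal"}


def describe_labels(text):
    """Map each distinct bracket label to a plain-English description."""
    described = {}
    # A label is the run of non-space characters right after an opening bracket.
    for label in re.findall(r"\(([^\s()]+)", text):
        category, *functions = label.split("-")
        parts = [CATEGORIES.get(category, "unknown")]
        parts += [FUNCTION_TAGS.get(f, "unknown") for f in functions]
        described[label] = ", ".join(parts)
    return described


for label, meaning in describe_labels(tree_text).items():
    print(f"{label}: {meaning}")
```

Splitting on the hyphen works for this tree, but it is a simplification and would need adjusting for labels that themselves contain hyphens.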