NLTK multiple grammar rules optional CFG

I'm playing with NLTK and I've run into a problem with grammars.
(I couldn't find any existing topics about this problem.)
For example with a grammar gram.cfg:
S -> NP VP
NP -> 'I'
VP -> V ADJ WHO
ADJ -> 'tall' | 'big' | 'white'
V -> 'am'
WHO -> 'Groot'
and the sentence
"I am tall Groot", parsing works.
I would like to have a grammar like
VP -> V (ADJ)* WHO
so that it can also accept sentences like:
"I am white big tall Groot"
"I am big Groot"
"I am tall white Groot"
using only the single VP rule.
How can I make one grammar rule cover multiple possibilities (as shown in the example)?
Is there any documentation about this (dynamic rules, optional rules, an arbitrary number of repetitions, etc.)?

If you can use a recursive grammar, the following may work:
ADJ -> 'tall' | 'big' | 'white' | ADJ ADJ
Each use of the ADJ ADJ alternative adds one more adjective, so the VP rule can stay as VP -> V ADJ WHO.

How can I make this function more elegant

I have the function:
wrap :: Text -> [Text] -> Text
wrap x = intercalate "" . map ((<> x) . (x <>))
The purpose of which is to wrap each element of a list with a given string and join them all together.
The brackets around the first argument to map annoy me, and so does the use of "". So I wonder: is there a more elegant (or more generic, I guess) way to express this function?
(Copied from my comment so the question can be marked as answered.)
You could use foldMap f instead of intercalate "" . map f. Note that intercalate "" is equivalent to Data.Text.concat.
Just to put my hat in the ring... Since the pattern is
xexxexxex
(where the es are placeholders for elements of the original list), another way you can build this output is by putting two xs between each element, and wrapping the bookends manually. So:
wrap x es = x <> intercalate (x <> x) es <> x
One small but nice feature of this rewrite is that for input lists of length n, this will incur only n+2 calls to (<>) rather than 3n-1 as in theindigamer's answer.
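To compare the two formulations side by side, here is a small Python sketch (not Haskell; the names wrap_each and wrap_bookends are made up for this illustration). It mirrors the original map-then-concatenate version and the bookend rewrite. One thing it brings out is that the two agree on non-empty lists but differ on the empty list, where the bookend version still emits the two outer copies of x.

```python
def wrap_each(x, es):
    # Mirrors the original: wrap every element in x, then concatenate.
    return "".join(x + e + x for e in es)


def wrap_bookends(x, es):
    # Mirrors the rewrite: x <> intercalate (x <> x) es <> x
    return x + (x + x).join(es) + x


print(wrap_each("|", ["a", "b", "c"]))      # |a||b||c|
print(wrap_bookends("|", ["a", "b", "c"]))  # |a||b||c|

# Edge case: the two versions disagree on an empty list.
print(repr(wrap_each("|", [])))      # ''
print(repr(wrap_bookends("|", [])))  # '||'
```

If the empty-list behaviour matters, the bookend version would need a guard for that case.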

How does the Logical Equivalence Distributive Law make sense?

I understand that a truth table can prove the Distributive Law as a Logical Equivalence:
p V (q ^ r) <=> (p V q) ^ (p V r)
However, this makes no intuitive sense to me. Here is the contradiction I see: if p and q are both true, then wouldn't that result in p ^ q? That can work with the expression on the right, but it doesn't seem to work with the expression on the left. As I see it (and there must be something wrong with how I see it), either only p is true, or only q and r are true, according to the left expression.
Is anyone able to explain to me how this makes sense?
Let me know if I need to clarify anything.
The left-hand expression says that either p is true, or q and r are both true. It does not say that either only p is true, or only q and r are true.
In your example, p ^ q => p (it also implies q, and p V q), which makes both sides true.
For example, in English the first equation says that at least one of the following is true
Pablo can swim OR
Quincy and Reginald can swim
If all three of them are true the statement is also true.
The one on the right says both of the following are true
Pablo or Quincy can swim AND
Pablo or Reginald can swim
If we have Pablo and Quincy can swim (your example), then we see that both statements hold. Pablo can swim so the first expression works because of its first clause. For the second expression since Pablo can swim both of its parts are true so it also holds.
I suspect you are using a colloquial meaning of "or", in the sense of "one or the other, but not both." E.g., "Choose the red pen or the blue pen." The meaning of "or" in formal logic is "at least one is true." In your hypothetical, certainly p ^ q holds, but the values of q and r are irrelevant when p is true.
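For anyone who wants to see the inclusive "or" at work, the following minimal Python sketch enumerates all eight truth assignments and compares both sides of the law. The function names lhs and rhs are just labels chosen for this illustration.

```python
from itertools import product


def lhs(p, q, r):
    # p V (q ^ r)
    return p or (q and r)


def rhs(p, q, r):
    # (p V q) ^ (p V r)
    return (p or q) and (p or r)


rows = list(product([False, True], repeat=3))
for p, q, r in rows:
    print(p, q, r, lhs(p, q, r), rhs(p, q, r))

# The two columns agree on every row, including the rows where p and q are both true.
all_equal = all(lhs(p, q, r) == rhs(p, q, r) for p, q, r in rows)
print(all_equal)  # True
```

The row p = True, q = True, r = False is the case from the question: both sides are true there because p alone is enough.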

HTML Layout: mixing [Element] and [Signal Element] in an Elm web page

I am reading about flow down, which is supposed to let us stack elements vertically on our web site. What are you supposed to do when parts of your website are signals? I would picture a web site like this:
Introduction
Dynamic Component
More Static Text
The type of flow down is [Element] -> Element, so I can't just mix in a [Signal Element] as I would like. I have previously seen solutions involving lift, so here's what I came up with:
import Random
main = column <~ (constant "5") ~ (Random.range 0 100 (every second))
column x y = flow down [asText x, asText y]
Here I just stack the number 5 on top of a randomly changing number. Or perhaps it depends on the window size:
import Random
import Window
main = column <~ (constant "5") ~ Window.dimensions
column x y = flow down [asText x, asText y]
Is this considered good practice or are there better ways of doing layout in Elm?
Extracting a non-signal function and lifting it is generally good practice. In this case you could also use Signal.Extra.combine : [Signal a] -> Signal [a] if you like:
main = flow down <~ combine [constant (asText "5"), asText <~ Window.dimensions]
As you can see, there is a lot more lifting going on than in your solution, just to get it into a one-liner. So I don't think it's ideal. But combine can be handy in other (more dynamic) situations.
Full disclosure: I'm the author of the library function that I linked to.
Up-to-date answer:
Either use combine, which is now in the Signal-extra library, or, for this simple case:
column x y =
    Signal.map (flow down) <|
        Signal.map2 (\a b -> [a, b]) (show x) (show y)

Get the most probable color from a words set

Are there any existing libraries or methods that let you figure out the most probable color for a set of words? For example, for cucumber, apple, grass, it would give me green. Has anyone worked in that direction before?
If I had to do that, I would try searching for images based on the words, using Google Images or a similar service, and find the most common color in the top n results.
That sounds like a pretty reasonable NLP problem, and one that's very easy to handle via map-reduce.
Identify a list of words and phrases that you call colors ['blue', 'green', 'red', ...].
Go over a large corpus of sentences, and for the sentences that mention a particular color, for every other word in that sentence, note down (word, color_name) in a file. (Map Step)
Then for each word you have seen in your corpus, aggregate all the colors you have seen for it to get something like {'cucumber': {'green': 300, 'yellow': 34, 'blue': 2}, 'tomato': {'red': 900, 'green': 430}...} (Reduce Step)
Provided you use a large enough corpus (something like Wikipedia), and you figure out how to prune really small counts and rare words, you should be able to build a pretty comprehensive and robust dictionary mapping millions of items to their colors.
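The map and reduce steps above can be sketched on a single machine as follows. This is only an illustration under stated assumptions: the COLORS set, the five-sentence toy corpus and all function names are made up here, and a real run would use a large corpus and some pruning of rare words and small counts.

```python
from collections import Counter, defaultdict

COLORS = {"blue", "green", "red", "yellow"}  # assumed color vocabulary

# Toy stand-in for a large corpus (hypothetical sentences).
corpus = [
    "the green cucumber lay on the table",
    "a cucumber is usually green",
    "the ripe tomato was red",
    "an unripe tomato is green",
    "she sliced a red tomato",
]


def map_step(sentence):
    """Yield (word, color) for every non-color word in a sentence that mentions a color."""
    words = sentence.lower().split()
    mentioned = [w for w in words if w in COLORS]
    for color in mentioned:
        for word in words:
            if word not in COLORS:
                yield word, color


def reduce_step(pairs):
    """Aggregate (word, color) pairs into {word: Counter({color: count})}."""
    counts = defaultdict(Counter)
    for word, color in pairs:
        counts[word][color] += 1
    return counts


counts = reduce_step(pair for s in corpus for pair in map_step(s))


def most_probable_color(words):
    """Sum the color counts of all given words and return the top color, or None."""
    total = Counter()
    for w in words:
        total.update(counts.get(w, Counter()))
    return total.most_common(1)[0][0] if total else None


print(dict(counts["tomato"]))                       # {'red': 2, 'green': 1}
print(most_probable_color(["cucumber", "tomato"]))  # green
```

Note that filler words such as "the" also pick up color counts in this naive version, which is one reason the pruning step matters.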
Another way to do that is to do a text search in google for combinations of colors and the word in question and take the combination with the highest number of results. Here's a quick Python script for that:
import json
import urllib.parse
import urllib.request
def google_count(q):
    # return the estimated number of search results for the query q
    query = urllib.parse.urlencode({'q': q})
    url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
    search_response = urllib.request.urlopen(url)
    search_results = search_response.read()
    results = json.loads(search_results)
    data = results['responseData']
    return int(data['cursor']['estimatedResultCount'])
colors = ['yellow', 'orange', 'red', 'purple', 'blue', 'green']
# get a list of google search counts
res = [google_count('"%s grass"' % c) for c in colors]
# pair the results with their corresponding colors
res2 = list(zip(res, colors))
# get the color with the highest score
print("%s is %s" % ('grass', sorted(res2)[-1][1]))
This will print:
grass is green
Daniel's and Xi.lin's answers are very good ideas. Along the same lines, we could combine both with an approach similar to Xilin's but simpler: query Google Images with the word whose color you want to find, plus a "Color" filter (see the lower left bar), and see which color yields the most results.
I would suggest using a tightly defined set of sources if possible such as Wikipedia and Wordnet.
Here, for example, is Wordnet for "panda":
S: (n) giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca
(large black-and-white herbivorous mammal of bamboo forests of China and Tibet;
in some classifications considered a member of the bear family or of a separate
family Ailuropodidae)
S: (n) lesser panda, red panda, panda, bear cat, cat bear,
Ailurus fulgens (reddish-brown Old World raccoon-like carnivore;
in some classifications considered unrelated to the giant pandas)
Because of the concise, carefully constructed language it is highly likely that any colour words will be important. Here you can see that pandas are both black-and-white and reddish-brown.
If you identify subsections of Wikipedia (e.g. "Botanical Description") this will help to increase the relevance of your results. Also the first image in Wikipedia is very likely to be the best "definitive" one.
But, as with all statistical methods, you will get false positives (and negatives, though these are probably less of a problem).

Given an R,G,B triplet and a factor F, how do I calculate a “watermark” version of the color?

I have an (R, G, B) triplet, where each component is between 0.0 and 1.0. Given a factor F (0.0 means the original color and 1.0 means white), I want to calculate a new triplet that is the “watermarked” version of the color.
I use the following expression (pseudo-code):
for each c in R, G, B:
    new_c ← c + F × (1 - c)
This produces something that looks okayish, but I understand this introduces deviations to the hue of the color (checking the HSV equivalent before and after the transformation), and I don't know if this is to be expected.
Is there a “standard” (with or without quotes) algorithm to calculate the “watermarked” version of the color? If yes, which is it? If not, what other algorithms to the same effect can you tell me?
Actually this looks like it should give the correct hue, minus small variations for arithmetic rounding errors.
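A quick way to check the hue claim is to convert a color to HSV before and after the blend. The sketch below uses Python's standard colorsys module; the watermark function name and the sample color are arbitrary choices for this illustration. Blending toward white scales every difference between channels by the same factor (1 - F), which is why the hue should come out unchanged while saturation drops and value rises.

```python
import colorsys


def watermark(rgb, f):
    """Blend an (r, g, b) triplet in [0, 1] toward white by factor f."""
    return tuple(c + f * (1.0 - c) for c in rgb)


original = (0.8, 0.3, 0.1)  # arbitrary sample color
washed = watermark(original, 0.5)

h0, s0, v0 = colorsys.rgb_to_hsv(*original)
h1, s1, v1 = colorsys.rgb_to_hsv(*washed)

print("hue before/after:       ", h0, h1)
print("saturation before/after:", s0, s1)
print("value before/after:     ", v0, v1)
```

Any difference between the two hue values should be on the order of floating-point rounding error.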
This is certainly a reasonable, simple way to achieve a watermark effect. I don't know of any other "standard" ones, but there are a few ways you could do it.
Alternatives are:
Blend with white but do it non-linearly on F, e.g. new_c = c + sqrt(F)*(1-c), or you could use other non-linear functions - it might help the watermark look more or less "flat".
You could do it more efficiently by doing the following (where F takes the range 0..INF):
new_c = 1 - (1-c)/pow(2, F)
for real pixel values (0..255) and integer F this would convert into:
new_c = 255 - ((255-c)>>F)
Not only is that reasonably fast in integer arithmetic, but you may be able to do it on several channels in parallel within a 32-bit integer.
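As a rough sketch of the integer variant for a single 8-bit channel (the function name watermark_shift is made up here, and F is assumed to be a non-negative integer): each increment of F halves the remaining distance to white. The shift is parenthesised because subtraction binds more tightly than >> in Python, as it does in C.

```python
def watermark_shift(c, f):
    """Lighten an 8-bit channel value c (0..255) by halving its distance to white f times."""
    return 255 - ((255 - c) >> f)


for f in range(4):
    print(f, watermark_shift(55, f))
# 0 55
# 1 155
# 2 205
# 3 230
```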
Why not just?
new_c = F*c
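I think you should first go over the pixels to be watermarked and figure out whether the result should be darker or lighter.
For lighter, the formula might be
new_c = 1 - F*(1-c)
(with this convention, F = 1 keeps the original color and F = 0 gives white).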