What is an ambiguous context-free grammar? - terminology

I'm not really very clear on the concept of ambiguity in context free grammars. If anybody could help me out and explain the concept or provide a good resource I'd greatly appreciate it.

T * U;
Is that a pointer declaration or a multiplication? You can't tell until you know what T and U actually are.
So the syntax of the expression depends on the semantics (meaning) of the expression. That's not context-free -- in a context-free language, that could only be one thing, not two. (This is why they didn't let expressions like that be valid statements in D.)
Another example:
T<U> V;
Is that a template usage, or a greater-than and a less-than comparison? (This is why they changed the syntax to T!(U) V in D -- parentheses have only one use there, whereas angle brackets also serve as comparison operators.)

How would you parse this:
if condition_1 then if condition_2 then action_1 else action_2
To which "if" does the "else" belong?
In Python, the two readings are written unambiguously by indentation:
if condition_1:
    if condition_2:
        action_1
    else:
        action_2
and:
if condition_1:
    if condition_2:
        action_1
else:
    action_2

Consider an input string recognized by a context-free grammar. The string is derived ambiguously if it has two or more different leftmost derivations (or parse trees, if you wish). A grammar is ambiguous if it generates some string ambiguously.
For example, the grammar E -> E + E | E * E | x is ambiguous, as it derives the string x + x * x ambiguously; in other words, there is more than one parse tree for the expression (two, in fact).
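Concretely, here are the two leftmost derivations of x + x * x under that grammar, one grouping the multiplication first and the other the addition:
E => E + E => x + E => x + E * E => x + x * E => x + x * x
E => E * E => E + E * E => x + E * E => x + x * E => x + x * x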
The grammar can be made non-ambiguous by changing the grammar to:
E -> E + T | T
T -> T * F | F
F -> (E) | x
The refactored grammar will always derive the string unambiguously, i.e. the derivation will always produce the same parse tree.
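Under the refactored grammar there is exactly one leftmost derivation of x + x * x, in which * binds tighter than +:
E => E + T => T + T => F + T => x + T => x + T * F => x + F * F => x + x * F => x + x * x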

Related

why aren't anonymous functions in Lua expressions?

Can anyone explain to me why the anonymous function construct in Lua isn't a fully fledged expression? To me this seems an oddity: it goes (slightly) against the idea that functions should be first-class objects, and is (not often, but occasionally) an inconvenience in what is mostly a really well-thought-out and elegant language.
Example, using the command-line Lua interpreter, with a workaround:
Lua 5.3.3 Copyright (C) 1994-2016 Lua.org, PUC-Rio
> function(x) return x*x end (2)
stdin:1: <name> expected near '('
> square = function(x) return x*x end
> square(2)
4
Lua's function call syntax has some syntactic sugar built into it. You can call functions with 3 things:
A parenthesized list of values.
A table constructor (the function will take the table as a single argument).
A string literal.
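For example (a minimal sketch; show is just an illustrative name):
local function show(t) print(t) end
show("parenthesized")     -- a parenthesized list of values
show {1, 2, 3}            -- a table constructor as the single argument
show "a string literal"   -- a string literal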
Lua wants to be somewhat regular in its grammar. So if there's a thing which you can call as a function in one of these ways, then it should make sense to be able to call it in any of these ways.
Consider the following code:
local value = function(args)
    -- does some stuff
end "I'm a literal" .. foo
If we allow arbitrary, unparenthesized expressions to be called just like any other function call, then this means to create a function, invoke it with the string literal, concatenate the result of that function call with foo, and store that in value.
But... do we actually want that to work? That is, do we want people to be able to write that and have it be valid Lua code?
If such code is considered unsightly or confusing, then there are a few options.
Lua could just not allow function calls with string literals; you're only saving two parentheses, after all. Perhaps disallow table constructors as well, though those are less unsightly and far less confusing. Make everyone use parentheses for all function calls.
Lua could make it so that only in the cases of lambdas are function calls with string literals prevented. This would require substantially de-regularizing the grammar.
Lua could force you to parenthesize any construct where calling a function is not an obviously intended result of the preceding text.
Now, one might argue that table_name[var_name] "literal" is already rather confusing as to what is going on. But again, preventing that specifically would require de-regularizing the grammar. You'd have to add in all of these special cases where something like name "literal" is a function call but name.name "literal" is not. So option 2 is out.
The ability to call a function with a string literal is hardly limited to Lua. JavaScript can do it, but you have to use a specific literal syntax to get it. Plus, being able to type require "module_name" feels like a good idea. Since such calls are considered an important piece of syntactic sugar, supported by several languages, option #1 is out.
So your only option is #3: make people parenthesize expressions that they want to call.
Oh I see.. round brackets are needed, sorry.
(function(x) return x*x end) (2)
I still don't see why it is designed like that.
Short Answer
To call a function, the function expression must be either a name, an indexed value, another function call, or an expression inside parentheses.
Long Answer
I don't know why it's designed that way, but I did look up the grammar to see exactly how it works. Here's the entry for a function call:
functioncall ::= prefixexp args | prefixexp ‘:’ Name args
"args" is just a list of arguments in parentheses. The relevant part is "prefixexp".
prefixexp ::= var | functioncall | ‘(’ exp ‘)’
Ok, so we can call another "functioncall". "exp" is just a normal expression:
exp ::= nil | false | true | Numeral | LiteralString | ‘...’ | functiondef | prefixexp | tableconstructor | exp binop exp | unop exp
So we can call any expression as long as it's inside parentheses. "functiondef" covers anonymous functions:
functiondef ::= function funcbody
funcbody ::= ‘(’ [parlist] ‘)’ block end
So an anonymous function is an "exp", but not a "prefixexp", so we do need parentheses around it.
What is "var"?
var ::= Name | prefixexp ‘[’ exp ‘]’ | prefixexp ‘.’ Name
"var" is either a name or an indexed value (usually a table). Note that the indexed value must be a "prefixexp", which means a string literal or table constructor must be in parentheses before we can index them.
To sum up: A called function must be either a name, an indexed value, a function call, or some other expression inside parentheses.
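A hedged illustration of each callable form (all names here are made up):
local t = { f = print }                        -- t.f is just print, for illustration
print("a name")                                -- calling a name
t.f("an indexed value")                        -- calling an indexed value
local g = function() return print end
g()("the result of another call")              -- calling a function call's result
local sq = (function(x) return x * x end)(2)   -- a parenthesized expression
print(sq)                                      --> 4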
The big question is: Why is "prefixexp" treated differently from "exp"? I don't know. I suspect it has something to do with keeping function calls and indexing outside the normal operator precedence, but I don't know why that's necessary.

How to turn prolog predicates into JSON?

I wonder if there's a way to return a JSON object in SWI-Prolog, such that the predicate names become the keys, and the instantiated variables become the values. For example:
get_fruit(JS_out) :-
    apple(A),
    pear(P),
    to_json(..., JS_out). % How to write this part?
apple("Gala").
pear("Bartlett").
I'm expecting JS_out to be:
JS_out = {"apple": "Gala", "pear": "Bartlett"}.
I couldn't figure out how to achieve this using prolog_to_json/3 or other built-in predicates. While there are lots of posts on reading JSON into Prolog, I can't find many for the other way around. Any help is appreciated!
Given hard coded facts as shown, the simple solution is:
get_fruit(JS_out) :- apple(A), pear(B), JS_out = {"apple" : A, "pear" : B}.
However, in Prolog, you don't need the extra variable. You can write this as:
get_fruit({"apple" : A, "pear": B}) :- apple(A), pear(P).
You could generalize this somewhat based upon two fruits of any kind:
get_fruit(Fruit1, Fruit2, {Fruit1 : A, Fruit2 : B}) :-
    call(Fruit1, A),
    call(Fruit2, B).
With a bit more work, it could be generalized to any number of fruits.
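For instance, here is a sketch using SWI-Prolog's dict_pairs/3 (note this builds an SWI dict rather than the curly-brace term above, and get_fruits is a made-up name):
get_fruits(Names, JS_out) :-
    findall(Name-Value,
            ( member(Name, Names),
              call(Name, Value) ),
            Pairs),
    dict_pairs(JS_out, fruits, Pairs).
With the facts above, get_fruits([apple, pear], JS) yields JS = fruits{apple:"Gala", pear:"Bartlett"}.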
As an aside, it is a common beginner's mistake to think that is/2 is some kind of general assignment operator, but it is not. It is strictly for arithmetic expression evaluation and assumes that the second argument is a fully instantiated and evaluable arithmetic expression using arithmetic operators supported by Prolog. The first argument is a variable or a numeric value. Anything not meeting these criteria will always fail or generate an error.
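For example:
?- X is 2 + 3.   % arithmetic evaluation: X = 5.
?- X = 2 + 3.    % unification: X is bound to the unevaluated term 2+3.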

Haskell function definition convention

I am a beginner in Haskell.
The convention used for function definitions in my school material is as follows:
function_name arguments_separated_by_spaces = code_to_do
ex :
f a b c = a * b +c
As a mathematics student I am habituated to use the functions like as follows
function_name(arguments_separated_by_commas) = code_to_do
ex :
f(a,b,c) = a * b + c
It works in Haskell.
My question is whether this works in all cases.
I mean, can I use the traditional mathematical convention in Haskell function definitions as well?
If not, in which specific cases does the convention go wrong?
Thanks in advance :)
Let's say you want to define a function that computes the square of the hypotenuse of a right triangle. Either of the following definitions is valid:
hyp1 a b = a * a + b * b
hyp2(a,b) = a * a + b * b
However, they are not the same function! You can tell by looking at their types in GHCi:
>> :type hyp1
hyp1 :: Num a => a -> a -> a
>> :type hyp2
hyp2 :: Num a => (a, a) -> a
Taking hyp2 first (and ignoring the Num a => part for now), the type tells you that the function takes a pair (a, a) and returns another a (e.g. it might take a pair of integers and return another integer, or a pair of real numbers and return another real number). You use it like this:
>> hyp2 (3,4)
25
Notice that the parentheses aren't optional here! They ensure that the argument is of the correct type, a pair of as. If you don't include them, you will get an error (which will probably look really confusing to you now, but rest assured that it will make sense when you've learned about type classes).
Now looking at hyp1 one way to read the type a -> a -> a is it takes two things of type a and returns something else of type a. You use it like this
>> hyp1 3 4
25
Now you will get an error if you do include parentheses!
So the first thing to notice is that the way you use the function has to match the way you defined it. If you define the function with parens, you have to use parens every time you call it. If you don't use parens when you define the function, you can't use them when you call it.
So it seems like there's no reason to prefer one over the other - it's just a matter of taste. But actually I think there is a good reason to prefer one over the other, and you should prefer the style without parentheses. There are three good reasons:
It looks cleaner and makes your code easier to read if you don't have parens cluttering up the page.
You will take a performance hit if you use parens everywhere, because you need to construct and deconstruct a pair every time you use the function (although the compiler may optimize this away - I'm not sure).
You want to get the benefits of currying, aka partially applied functions*.
The last point is a little subtle. Recall that I said that one way to understand a function of type a -> a -> a is that it takes two things of type a, and returns another a. But there's another way to read that type, which is a -> (a -> a). That means exactly the same thing, since the -> operator is right-associative in Haskell. The interpretation is that the function takes a single a, and returns a function of type a -> a. This allows you to just provide the first argument to the function, and apply the second argument later, for example
>> let f = hyp1 3
>> f 4
25
This is practically useful in a wide variety of situations. For example, the map function lets you apply some function to every element of a list:
>> :type map
map :: (a -> b) -> [a] -> [b]
Say you have the function (++ "!") which adds a bang to any String. But you have lists of Strings and you'd like them all to end with a bang. No problem! You just partially apply the map function
>> let bang = map (++ "!")
Now bang is a function of type**
>> :type bang
bang :: [String] -> [String]
and you can use it like this
>> bang ["Ready", "Set", "Go"]
["Ready!", "Set!", "Go!"]
Pretty useful!
I hope I've convinced you that the convention used in your school's educational material has some pretty solid reasons for being used. As someone with a math background myself, I can see the appeal of using the more 'traditional' syntax but I hope that as you advance in your programming journey, you'll be able to see the advantages in changing to something that's initially a bit unfamiliar to you.
* Note for pedants - I know that currying and partial application are not exactly the same thing.
** Actually GHCI will tell you the type is bang :: [[Char]] -> [[Char]] but since String is a synonym for [Char] these mean the same thing.
f(a,b,c) = a * b + c
The key difference to understand is that the above function takes a triple and gives the result. What you are actually doing is pattern matching on a triple. The type of the above function is something like this:
(a, a, a) -> a
If you write functions like this:
f a b c = a * b + c
You get automatic curry in the function.
You can write things like let b = f 3 2 and it will typecheck, but the same thing will not work with your initial version. Also, currying helps a lot when composing functions using (.), which again cannot be achieved with the former style unless you compose functions on triples.
Mathematical notation is not consistent. If all functions were given arguments using (,), you would have to write (+)((*)(a,b),c) to pass a*b and c to function + - of course, a*b is worked out by passing a and b to function *.
It is possible to write everything in tupled form, but it is much harder to define composition. As things stand, a single type a -> b covers functions of any arity, so composition can be defined once as a function of type (b -> c) -> (a -> b) -> (a -> c). With tuples it is much trickier: a -> b would only mean a function of one argument, and you could no longer compose a function of many arguments with another function of many arguments. So, technically possible, but it would need a language feature to make it simple and convenient.
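As a hedged sketch of that point, curried functions compose with the ordinary (.) operator (shout and addBang are made-up names):
import Data.Char (toUpper)

addBang :: String -> String
addBang = (++ "!")          -- append a bang

shout :: String -> String
shout = map toUpper         -- uppercase every character

-- Composition is just (.), because each piece is a plain a -> b function:
shoutBang :: String -> String
shoutBang = addBang . shout

main :: IO ()
main = putStrLn (shoutBang "go")   -- prints "GO!"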

Erlang pattern matching with functions

As Erlang is an almost pure functional programming language, I'd imagine this was possible:
case X of
    foo(Z) -> ...
end.
where foo(Z) is a decidable-invertible pure (side-effect free) bijective function, e.g.:
foo(input) -> output.
Then, in the case that X = output, Z would match as input.
Is it possible to use such semantics, with or without other syntax than my example, in Erlang?
No, what you want is not possible.
To do something like this you would need to be able to find the inverse of an arbitrary bijective function, which is undecidable in general.
I guess the reason why that is not allowed is that you want to guarantee the lack of side effects. Given the following structure:
case Expr of
    Pattern1 [when GuardSeq1] ->
        Body1;
    ...;
    PatternN [when GuardSeqN] ->
        BodyN
end
After you evaluate Expr, the patterns are sequentially matched against the result of Expr. Imagine your foo/1 function contains a side effect (e.g. it sends a message):
foo(input) ->
    some_process ! some_msg,
    output.
Even if the first pattern wouldn't match, you would have sent the message anyway and you couldn't recover from that situation.
No, Erlang only supports literal patterns!
And your original request is not an easy one. Just because an inverse exists doesn't mean it is easy to find. In practice it would mean the compiler had to generate two versions of every such function.
What you can do is:
Y = foo(Z),
case X of
    Y -> ...
end.
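A hedged sketch of that workaround in context (check/1 and the returned atoms are illustrative):
foo(input) -> output.

check(X) ->
    Y = foo(input),        % compute the value up front
    case X of
        Y -> matched;      % Y is already bound, so this compares X against it
        _ -> no_match
    end.
Here check(output) returns matched, and anything else returns no_match.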

Why do programming languages not allow spaces in identifiers?

This may seem like a dumb question, but still I don't know the answer.
Why do programming languages not allow spaces in names (for instance, method names)?
I understand it is to facilitate parsing, and that at some point it would be impossible to parse anything if spaces were allowed.
Nowadays we are so used to it that the norm is not to see spaces.
For instance:
object.saveData( data );
object.save_data( data )
object.SaveData( data );
[object saveData:data];
etc.
Could be written as:
object.save data( data ) // looks ugly, but that's the "natural" way.
If it is only for parsing, I guess the identifier could sit between the . and the ( ; of course, procedural languages couldn't use this because there is no '.', but OO languages could.
I wonder whether parsing is the only reason, and if it is, how important it is (I assume it is, and that it would be impossible otherwise, unless all the programming language designers just... forgot the option).
EDIT
I'm OK with the point that spaces in identifiers in general (as in the FORTRAN example) are a bad idea. Narrowing to OO languages, and specifically to methods, I don't see (which doesn't mean there isn't) a reason why it should be that way. After all, the . and the first ( could serve as delimiters.
And forget the saveData method , consider this one:
key.ToString().StartsWith("TextBox")
as:
key.to string().starts with("textbox");
Be cause i twoul d makepa rsing suc hcode reallydif ficult.
I used an implementation of ALGOL (c. 1978) which—extremely annoyingly—required quoting of what is now known as reserved words, and allowed spaces in identifiers:
"proc" filter = ("proc" ("int") "bool" p, "list" l) "list":
"if" l "is" "nil" "then" "nil"
"elif" p(hd(l)) "then" cons(hd(l), filter(p,tl(l)))
"else" filter(p, tl(l))
"fi";
Also, FORTRAN (the capitalized form means F77 or earlier), was more or less insensitive to spaces. So this could be written:
799 S = FLO AT F (I A+I B+I C) / 2 . 0
A R E A = SQ R T ( S *(S - F L O ATF(IA)) * (S - FLOATF(IB)) *
+ (S - F LOA TF (I C)))
which was syntactically identical to
799 S = FLOATF (IA + IB + IC) / 2.0
AREA = SQRT( S * (S - FLOATF(IA)) * (S - FLOATF(IB)) *
+ (S - FLOATF(IC)))
With that kind of history of abuse, why make parsing difficult for humans? Let alone complicate computer parsing.
Yes, it's the parsing - both human and computer. It's easier to read and easier to parse if you can safely assume that whitespace doesn't matter. Otherwise, you can have potentially ambiguous statements, statements where it's not clear how things go together, statements that are hard to read, etc.
Such a change would make for an ambiguous language in the best of cases. For example, in a C99-like language:
if not foo(int x) {
    ...
}
is that equivalent to:
A function definition of foo that returns a value of type ifnot:
ifnot foo(int x) {
    ...
}
A call to a function called notfoo with a variable named intx:
if notfoo(intx) {
    ...
}
A negated call to a function called foo (with C99's not which means !):
if not foo(intx) {
    ...
}
This is just a small sample of the ambiguities you might run into.
Update: I just noticed that obviously, in a C99-like language, the condition of an if statement would be enclosed in parentheses. Extra punctuation can help with ambiguities if you choose to ignore whitespace, but your language will end up having lots of extra punctuation wherever you would normally have used whitespace.
Before the interpreter or compiler can build a parse tree, it must perform lexical analysis, turning the stream of characters into a stream of tokens. Consider how you would want the following parsed:
a = 1.2423 / (4343.23 * 2332.2);
Think about how your rule above would work on it. It is hard to know how to tokenize that without already understanding the meaning of the tokens, and it would be really hard to build a parser that did lexical analysis at the same time.
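Roughly, a conventional lexer turns that line into a token stream like:
IDENT(a) ASSIGN NUMBER(1.2423) DIV LPAREN NUMBER(4343.23) STAR NUMBER(2332.2) RPAREN SEMI
If spaces could occur inside identifiers, the lexer could not even decide where one token ends and the next begins without consulting the parser.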
There are a few languages which allow spaces in identifiers. The fact that nearly all languages constrain the set of characters in identifiers is because parsing is easier and most programmers are accustomed to the compact no-whitespace style.
I don't think there's a real reason.
Check out Stroustrup's classic Generalizing Overloading for C++2000.
We were allowed to put spaces in filenames back in the 1960's, and computers still don't handle them very well (everything used to break, then most things, now it's just a few things - but they still break).
We simply can't wait another 50 years before our code will work again.
:-)
(And what everyone else said, of course. In English, we use spaces and punctuation to separate the words. The same is true for computer languages, except that computer parsers define "words" in a slightly different sense)
Using space as part of an identifier makes parsing really murky (is that a syntactic space or part of an identifier?), but the same sort of "natural reading" behavior is achieved with keyword arguments: object.save(data: something, atomically: true)
The TikZ language for creating graphics in LaTeX allows whitespace in parameter names (also known as 'keys'). For instance, you see things like
\shade[
    top color=yellow!70,
    bottom color=red!70,
    shading angle={45},
]
In this restricted setting of a comma-separated list of key-value pairs, there's no parsing difficulty. In fact, I think it's much easier to read than the alternatives like topColor, top_color or topcolor.