ANTLR: MismatchedTokenException with similar literals - exception

I have the following
rule : A B;
A : 'a_e' | 'a';
B : '_b';
Input:
a_b // doesn't work
a_e_b // works
Why is the lexer having trouble matching this? When ANTLR matches the 'a_' in 'a_b', shouldn't it backtrack or use lookahead to see that it can't match token A as 'a_e', then decide to match token A as 'a' and proceed to match token B as '_b'?
I think I've misunderstood something very basic about how ANTLR works. I've tried to read up on it in the ANTLR docs and on Google, but I have little experience working with lexers and parsers.
Thank you very much for any help.

You need to use a syntactic predicate so that the lexer only commits to 'a_e' when that whole literal is actually ahead, and otherwise matches plain 'a'.
The following will work:
grammar T;
rule : A B;
B : '_b';
A : ('a_e')=>'a_e'
| 'a' ;
This parses 'a_e_b' and 'a_b' as you expect.
I recommend reading chapter 13 of The Definitive ANTLR Reference.
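To make the effect of the predicate concrete, here is a rough hand-written Python sketch. It is not ANTLR-generated code; the function name tokenize and the tuple layout are made up for illustration. It only mimics the decision the predicate expresses: check that the full 'a_e' literal is ahead before choosing that alternative, otherwise fall back to 'a'.

```python
def tokenize(text):
    """Split text into tokens for:  A : 'a_e' | 'a' ;   B : '_b' ;"""
    tokens = []
    pos = 0
    while pos < len(text):
        # Counterpart of ('a_e')=> : only take the long alternative
        # when the whole literal is really there.
        if text.startswith("a_e", pos):
            tokens.append(("A", "a_e"))
            pos += 3
        elif text.startswith("a", pos):
            tokens.append(("A", "a"))
            pos += 1
        elif text.startswith("_b", pos):
            tokens.append(("B", "_b"))
            pos += 2
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
    return tokens


print(tokenize("a_b"))    # A('a') followed by B('_b')
print(tokenize("a_e_b"))  # A('a_e') followed by B('_b')
```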

Related

Failed Predicate Exception

Can anyone give me some examples of the Failed Predicate exception in ANTLR, along with examples that clearly explain the difference between the input mismatch, failed predicate and no viable alt exceptions? Thanks in advance.
The names of the exceptions pretty much explain when they can appear. Only the failed predicate exception is a bit special.
NoViableAlt:
Thrown when you try to match a block with several alternatives and none of them matches. Example:
r: 'a' | 'b';
Input: 'c'.
InputMismatch:
Thrown when you have input that partially matches. Example:
r: 'a' 'b' 'c' EOF;
Input: 'a' 'b' or 'a' 'c' EOF.
FailedPredicateException:
Thrown in certain situations where a path is guarded by a predicate and this path is the only possible match (or a required match), but the predicate prevents matching it. For example:
... | a ({condition}? b)
However if the block with the predicate is optional (so it's not required) then no predicate exception is thrown, like with:
... | a ({condition}? b)?
For more advanced use of these exceptions for generating better error messages see this error listener.
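As a toy illustration of the three cases, here is a hand-written Python sketch. It does not use the ANTLR runtime and does not reproduce its prediction algorithm; the exception classes and function names are invented for this example and merely borrow ANTLR's vocabulary.

```python
class NoViableAlt(Exception):
    """No alternative of a block can start with the current token."""


class InputMismatch(Exception):
    """A specific token was expected but something else was found."""


class FailedPredicate(Exception):
    """The only possible path is guarded by a predicate that is false."""


def parse_alternatives(tokens):
    """r: 'a' | 'b';"""
    if tokens[:1] not in (["a"], ["b"]):
        raise NoViableAlt(f"no alternative matches {tokens[:1]}")
    return tokens[0]


def parse_sequence(tokens):
    """r: 'a' 'b' 'c' EOF;"""
    stream = list(tokens) + ["EOF"]
    for want, got in zip(["a", "b", "c", "EOF"], stream):
        if got != want:
            raise InputMismatch(f"expected {want!r}, got {got!r}")
    return True


def parse_guarded(tokens, condition, optional=False):
    """r: 'a' ({condition}? 'b')   or, with optional=True,   'a' ({condition}? 'b')?"""
    if tokens[:1] != ["a"]:
        raise InputMismatch("expected 'a'")
    if condition and tokens[1:2] == ["b"]:
        return ["a", "b"]
    if optional:
        return ["a"]  # the guarded block is simply skipped
    if not condition:
        raise FailedPredicate("predicate rejected the only path")
    raise InputMismatch("expected 'b'")
```

With these definitions, parse_alternatives(["c"]) raises NoViableAlt, parse_sequence(["a", "b"]) raises InputMismatch, and parse_guarded(["a", "b"], condition=False) raises FailedPredicate unless optional=True is passed.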

How to turn prolog predicates into JSON?

I wonder if there's a way to return a JSON object in SWI-Prolog, such that the predicate names become the keys, and the instantiated variables become the values. For example:
get_fruit(JS_out):-
apple(A),
pear(P),
to_json(..., JS_out). % How to write this part?
apple("Gala").
pear("Bartlett").
I'm expecting JS_out to be:
JS_out = {"apple": "Gala", "pear": "Bartlett"}.
I couldn't figure out how to achieve this by using prolog_to_json/3 or other built-in predicates. While there are lots of posts on reading JSON into Prolog, I can't find many for the other way around. Any help is appreciated!
Given hard coded facts as shown, the simple solution is:
get_fruit(JS_out) :- apple(A), pear(P), JS_out = {"apple" : A, "pear": P}.
However, in Prolog, you don't need the extra variable. You can write this as:
get_fruit({"apple" : A, "pear": P}) :- apple(A), pear(P).
You could generalize this somewhat based upon two fruits of any kind:
get_fruit(Fruit1, Fruit2, {Fruit1 : A, Fruit2 : B}) :-
call(Fruit1, A),
call(Fruit2, B).
With a bit more work, it could be generalized to any number of fruits.
As an aside, it is a common beginner's mistake to think that is/2 is some kind of general assignment operator, but it is not. It is strictly for arithmetic expression evaluation and assumes that the second argument is a fully instantiated and evaluable arithmetic expression using arithmetic operators supported by Prolog. The first argument is a variable or a numeric value. Anything not meeting these criteria will always fail or generate an error.

What type of programming is this?

I just finished taking an entry placement test for computer science in college. I passed, but missed a bunch of questions in a specific category: variable assignment. I want to make sure I understand this before moving on.
It started out with easy things, like "set age equal to age"
int age = 18, pretty simple
But then, it had a question which I had no clue how to approach. It went something like...
"Determine if character c is in alphabet and assign to a variable"
I could easily do that with a function, but the issue is that it gave me literally one line to write my entire answer (so about 50 characters max).
My first thought was to do something like
in_alphabet = function(c) {
var alphabet = ["a", "b" ... "z"]
if(alphabet.indexOf(c) != -1)
return true;
}
But this solution has two issues:
How can I set the "c" value when the whole function is equal to in_alphabet?
I can't fit this into the small answer box. I am 99% sure they were looking for something else. Does anybody know what they were looking for? I can't think of a one line solution for this
Language doesn't matter (although a solution in java/c++ would be preferred). I would appreciate any guidance (doesn't have to be a solution, I just don't even know where to begin)
The question "Determine if character c is in alphabet and assign to a variable" does not ask you to create a function (although in many languages this would be the best way to do this).
In R you could do something like:
inAlphabet <- c %in% letters
So you can certainly do it in one line in some real-world languages. Note that letters is a built-in character vector containing the 26 lowercase letters.
Here is a VBA solution that returns C in the variable:
LetterC = Mid("ABCDEFGHIJKLMNOPQRSTUVWXYZ", InStr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", "C"), 1)
Is that what you're after?
Many languages have a data type that represents a single character, and they often can be compared using binary operators like < > <= >=, wherein the characters are compared numerically.
So something like this should suffice:
in_alphabet = c >= 'a' && c <= 'z'
And some languages already have built in methods to do things similar to this (e.g., Character.isLetter).
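For comparison, a small Python sketch of the same two ideas: a range comparison and a built-in method. The function names are made up for illustration. Python has no separate character type, so the sketch also checks that the string has length 1.

```python
def in_lowercase_alphabet(c):
    # Range comparison, like c >= 'a' && c <= 'z' above.
    return len(c) == 1 and "a" <= c <= "z"


def in_ascii_alphabet(c):
    # Built-in check; isascii() keeps accented and other
    # non-English letters from being accepted.
    return len(c) == 1 and c.isascii() and c.isalpha()


c = "q"
in_alphabet = in_lowercase_alphabet(c)  # the one-line assignment the test asked for
```

Note that the range comparison only covers lowercase letters, while the isalpha() version accepts both cases.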
Adapted from How to check if character is a letter in Javascript?
in_alphabet = c.length === 1 && /[a-z]/i.test(c)
In Java, Character.isLetter(c)
In .NET, Char.IsLetter(c)
Perhaps you were being tested on knowledge of basic data types and some of the facilities they provide.

what is an ambiguous context free grammar?

I'm not really very clear on the concept of ambiguity in context free grammars. If anybody could help me out and explain the concept or provide a good resource I'd greatly appreciate it.
T * U;
Is that a pointer declaration or a multiplication? You can't tell until you know what T and U actually are.
So the syntax of the expression depends on the semantics (meaning) of the expression. That's not context-free -- in a context-free language, that could only be one thing, not two. (This is why they didn't let expressions like that be valid statements in D.)
Another example:
T<U> V;
Is that a template usage or is that a less-than followed by a greater-than operation? (This is why they changed the syntax to T!(U) V in D -- parentheses only have one use, whereas angle brackets double as comparison operators.)
How would you parse this:
if condition_1 then if condition_2 then action_1 else action_2
To which "if" does the "else" belong?
In Python, they are:
if condition_1:
    if condition_2:
        action_1
    else:
        action_2
and:
if condition_1:
    if condition_2:
        action_1
else:
    action_2
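The two readings really do behave differently. Here is a small runnable sketch, with function names invented for illustration, that returns the name of the action that would run (or None if nothing runs):

```python
def else_binds_to_inner(condition_1, condition_2):
    # Reading 1: the else belongs to the inner if.
    if condition_1:
        if condition_2:
            return "action_1"
        else:
            return "action_2"
    return None


def else_binds_to_outer(condition_1, condition_2):
    # Reading 2: the else belongs to the outer if.
    if condition_1:
        if condition_2:
            return "action_1"
    else:
        return "action_2"
    return None


for c1 in (True, False):
    for c2 in (True, False):
        print(c1, c2, else_binds_to_inner(c1, c2), else_binds_to_outer(c1, c2))
```

The two functions agree only when both conditions are true. In every other case, one of them runs action_2 and the other does nothing.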
Consider an input string recognized by a context-free grammar. The string is derived ambiguously if it has two or more different leftmost derivations (or parse trees, if you wish). A grammar is ambiguous if it generates some string ambiguously.
For example, the grammar E -> E + E | E * E | x is ambiguous because it derives the string x + x * x ambiguously; in other words, there is more than one parse tree for the expression (there are two, actually).
The grammar can be made unambiguous by rewriting it as:
E -> E + T | T
T -> T * F | F
F -> (E) | x
The refactored grammar will always derive the string unambiguously, i.e. the derivation will always produce the same parse tree.
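One way to check this claim is to count parse trees by brute force. The Python sketch below tries every operator position as the top-level split. The function names are invented for this example, and the ambiguous grammar is assumed to include x as a terminal alternative (E -> E + E | E * E | x).

```python
def tokenize(text):
    """Single-character tokens; whitespace is dropped."""
    return [ch for ch in text if not ch.isspace()]


def count_ambiguous(tokens):
    """Number of parse trees under E -> E + E | E * E | x."""
    if tokens == ["x"]:
        return 1
    total = 0
    for i, tok in enumerate(tokens):
        if tok in ("+", "*"):
            total += count_ambiguous(tokens[:i]) * count_ambiguous(tokens[i + 1:])
    return total


def count_e(tokens):
    """E -> E + T | T"""
    total = count_t(tokens)
    for i, tok in enumerate(tokens):
        if tok == "+":
            total += count_e(tokens[:i]) * count_t(tokens[i + 1:])
    return total


def count_t(tokens):
    """T -> T * F | F"""
    total = count_f(tokens)
    for i, tok in enumerate(tokens):
        if tok == "*":
            total += count_t(tokens[:i]) * count_f(tokens[i + 1:])
    return total


def count_f(tokens):
    """F -> (E) | x"""
    if tokens == ["x"]:
        return 1
    if len(tokens) >= 3 and tokens[0] == "(" and tokens[-1] == ")":
        return count_e(tokens[1:-1])
    return 0


expr = tokenize("x + x * x")
print(count_ambiguous(expr))  # 2
print(count_e(expr))          # 1
```

This is exponential in the worst case and only meant for tiny inputs, but it shows the two trees collapsing to one once precedence is encoded in the grammar.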

Why do programming languages not allow spaces in identifiers?

This may seem like a dumb question, but still I don't know the answer.
Why do programming languages not allow spaces in the names ( for instance method names )?
I understand it is to facilitate ( allow ) the parsing, and at some point it would be impossible to parse anything if spaces were allowed.
Nowadays we are so used to it that the norm is not to see spaces.
For instance:
object.saveData( data );
object.save_data( data )
object.SaveData( data );
[object saveData:data];
etc.
Could be written as:
object.save data( data ) // looks ugly, but that's the "natural" way.
If it is only for parsing, I guess the identifier could be whatever sits between the . and the ( . Of course, procedural languages wouldn't be able to use this because there is no '.', but OO languages could.
I wonder if parsing is the only reason, and if it is, how important it is ( I assume that it will be and it will be impossible to do it otherwise, unless all the programming language designers just... forget the option )
EDIT
I agree that allowing spaces in identifiers in general (as in the Fortran example) is a bad idea. Narrowing to OO languages, and specifically to methods, I don't see a reason (I don't mean there is none) why it should be that way. After all, the . and the first ( could be used as delimiters.
And forget the saveData method , consider this one:
key.ToString().StartsWith("TextBox")
as:
key.to string().starts with("textbox");
Be cause i twoul d makepa rsing suc hcode reallydif ficult.
I used an implementation of ALGOL (c. 1978) which—extremely annoyingly—required quoting of what is now known as reserved words, and allowed spaces in identifiers:
"proc" filter = ("proc" ("int") "bool" p, "list" l) "list":
"if" l "is" "nil" "then" "nil"
"elif" p(hd(l)) "then" cons(hd(l), filter(p,tl(l)))
"else" filter(p, tl(l))
"fi";
Also, FORTRAN (the capitalized form means F77 or earlier), was more or less insensitive to spaces. So this could be written:
799 S = FLO AT F (I A+I B+I C) / 2 . 0
A R E A = SQ R T ( S *(S - F L O ATF(IA)) * (S - FLOATF(IB)) *
+ (S - F LOA TF (I C)))
which was syntactically identical to
799 S = FLOATF (IA + IB + IC) / 2.0
AREA = SQRT( S * (S - FLOATF(IA)) * (S - FLOATF(IB)) *
+ (S - FLOATF(IC)))
With that kind of history of abuse, why make parsing difficult for humans? Let alone complicate computer parsing.
Yes, it's the parsing - both human and computer. It's easier to read and easier to parse if you can safely assume that whitespace doesn't matter. Otherwise, you can have potentially ambiguous statements, statements where it's not clear how things go together, statements that are hard to read, etc.
Such a change would make for an ambiguous language in the best of cases. For example, in a C99-like language:
if not foo(int x) {
...
}
is that equivalent to:
A function definition of foo that returns a value of type ifnot:
ifnot foo(int x) {
...
}
A call to a function called notfoo with a variable named intx:
if notfoo(intx) {
...
}
A negated call to a function called foo (with C99's not which means !):
if not foo(intx) {
...
}
This is just a small sample of the ambiguities you might run into.
Update: I just noticed that obviously, in a C99-like language, the condition of an if statement would be enclosed in parentheses. Extra punctuation can help with ambiguities if you choose to ignore whitespace, but your language will end up having lots of extra punctuation wherever you would normally have used whitespace.
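The three readings above can be enumerated mechanically. Here is a minimal Python sketch, assuming a made-up vocabulary of names that happen to be in scope and a hypothetical helper called segmentations. Once spaces carry no meaning, the text before the parenthesis is just the character run ifnotfoo, and every way of cutting it into known words is a candidate reading.

```python
def segmentations(text, words):
    """Return every way to split text into a sequence of known words."""
    if not text:
        return [[]]
    results = []
    for word in words:
        if text.startswith(word):
            for rest in segmentations(text[len(word):], words):
                results.append([word] + rest)
    return results


# Hypothetical names in scope: two keywords, one type, two functions.
WORDS = ["if", "not", "foo", "ifnot", "notfoo"]

for reading in segmentations("ifnotfoo", WORDS):
    print(" | ".join(reading))
# if | not | foo
# if | notfoo
# ifnot | foo
```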
Before the interpreter or compiler can build a parse tree, it must perform lexical analysis, turning the stream of characters into a stream of tokens. Consider how you would want the following parsed:
a = 1.2423 / (4343.23 * 2332.2);
And consider how your rule above would work on it. It is hard to know how to tokenize it without understanding the meaning of the tokens, and it would be really hard to build a parser that did the lexing at the same time.
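A minimal sketch of such a lexical analysis step in Python, using the standard re module. The token categories and the function name lex are assumptions for illustration and do not come from any particular compiler. Whitespace is skipped between tokens, which is exactly why the lexer can never tell that two adjacent names were meant as one identifier.

```python
import re

# One alternative per token kind; leading whitespace is skipped.
TOKEN = re.compile(
    r"\s*(?:(?P<number>\d+\.\d+|\d+)"
    r"|(?P<name>[A-Za-z_]\w*)"
    r"|(?P<op>[=/*+\-();.]))"
)


def lex(source):
    """Turn a string of characters into a list of (kind, text) tokens."""
    source = source.rstrip()
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at position {pos}")
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


print(lex("a = 1.2423 / (4343.23 * 2332.2);"))
print(lex("save data"))  # two separate name tokens, not one identifier
```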
There are a few languages which allow spaces in identifiers. Nearly all languages constrain the set of characters allowed in identifiers because parsing is easier that way and most programmers are accustomed to the compact no-whitespace style.
I don’t think there’s a real reason.
Check out Stroustrup's classic Generalizing Overloading for C++2000.
We were allowed to put spaces in filenames back in the 1960's, and computers still don't handle them very well (everything used to break, then most things, now it's just a few things - but they still break).
We simply can't wait another 50 years before our code will work again.
:-)
(And what everyone else said, of course. In English, we use spaces and punctuation to separate the words. The same is true for computer languages, except that computer parsers define "words" in a slightly different sense)
Using a space as part of an identifier makes parsing really murky (is that a syntactic space or part of an identifier?), but the same sort of "natural reading" behavior is achieved with keyword arguments. object.save(data: something, atomically: true)
The TikZ language for creating graphics in LaTeX allows whitespace in parameter names (also known as 'keys'). For instance, you see things like
\shade[
top color=yellow!70,
bottom color=red!70,
shading angle={45},
]
In this restricted setting of a comma-separated list of key-value pairs, there's no parsing difficulty. In fact, I think it's much easier to read than the alternatives like topColor, top_color or topcolor.
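To show why this setting is easy, here is a deliberately naive Python sketch of parsing such a list. It is not how TikZ/pgfkeys actually works (a real parser would, for example, respect commas inside braces), and the function name parse_options is made up. Because commas and the equals sign are the only delimiters, spaces inside a key need no special treatment.

```python
def parse_options(text):
    """Parse 'key one=value, key two=value,' into a dict."""
    options = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        key, _, value = item.partition("=")
        options[key.strip()] = value.strip()
    return options


print(parse_options("top color=yellow!70, bottom color=red!70, shading angle={45},"))
```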