How do I find character positions in ANTLR 2?

I have a simple grammar, and have produced a pair of C# classes using ANTLR 2.7.7. When the parser finds an error with a token, it throws an exception; I want to find out how many characters into the parsed stream the token came. How do I do that?

It's been a long time since I played with ANTLR, but if I remember correctly, to do what you want I had to subclass the parser and keep a character counter that was incremented each time a new token was consumed (by the token's length, of course).
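Something along these lines, as a sketch only (untested; it assumes the Java-style API that the ANTLR 2.7.7 C# runtime mirrors - consume(), LT(1), IToken.getText() - and MyGeneratedParser stands in for your generated parser class; if consume() is not overridable in your build, you can do the same counting wherever you read tokens):
public class CountingParser : MyGeneratedParser
{
    // Running count of characters consumed so far. Note this counts token
    // text only; whitespace and comments skipped by the lexer are not included.
    public int CharsConsumed;

    public CountingParser(antlr.TokenBuffer buffer) : base(buffer) { }

    public override void consume()
    {
        antlr.IToken token = LT(1); // the token about to be consumed
        if (token != null && token.getText() != null)
            CharsConsumed += token.getText().Length;
        base.consume();
    }
}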

You ought to read chapter 10 ("Error Reporting and Recovery") of Terence Parr's book "The Definitive ANTLR Reference".
Not knowing what target language you're using, it'll be hard to tell you exactly what to do. But I'll assume you're using the Java target, and you can correct me if I'm wrong.
When an ANTLR recognizer fails to match an input string, it throws a very specific exception based on the failure context. (There are nine different kinds of exceptions: RecognitionException is the root type, and it has eight subclasses of its own: MismatchedTokenException, MismatchedTreeNodeException, NoViableAltException, EarlyExitException, FailedPredicateException, MismatchedRangeException, MismatchedSetException, MismatchedNotSetException.)
The root exception type (RecognitionException) has a few handy public fields that you might want to take a look at (specifically: "index", "line" and "charPositionInLine"). The "index" field tells you the exact character position where the error was found. The "line" and "charPositionInLine" fields are pretty self-explanatory. Here's the JavaDoc:
http://www.antlr.org/api/Java/classorg_1_1antlr_1_1runtime_1_1_recognition_exception.html
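For example, with hypothetical generated classes MyLexer/MyParser and a start rule named prog (note that by default the generated code catches RecognitionException internally and reports it, so you may need to disable that recovery for the exception to escape like this):
import org.antlr.runtime.*;

public class Demo {
    public static void main(String[] args) throws Exception {
        String input = "your input here";
        MyLexer lexer = new MyLexer(new ANTLRStringStream(input));
        MyParser parser = new MyParser(new CommonTokenStream(lexer));
        try {
            parser.prog();
        } catch (RecognitionException e) {
            // index is the position in the stream (token index for parsers,
            // char index for lexers); line is 1-based, charPositionInLine is 0-based
            System.err.printf("error at index %d (line %d, col %d)%n",
                              e.index, e.line, e.charPositionInLine);
        }
    }
}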

Why is my %h is List = 1,2; a valid assignment?

While finalizing my upcoming Raku Advent Calendar post on sigils, I decided to double-check my understanding of the type constraints that sigils create. The docs describe sigil type constraints with a table which, in short, says that @ constrains a variable to Positional, % to Associative, & to Callable, and $ imposes no constraint.
Based on this table (and my general understanding of how sigils and containers work), I strongly expected this code
my %percent-sigil is List = 1,2;
my @at-sigil is Map = :k<v>;
to throw an error.
Specifically, I expected that is List would attempt to bind the %-sigiled variable to a List, and that this would throw an X::TypeCheck::Binding error – the same error that my %h := 1,2 throws.
But it didn't error. The first line created a List that seemed perfectly ordinary in every way, other than the sigil on its variable. And the second created a seemingly normal Map. Neither of them secretly had Scalar intermediaries, at least as far as I could tell with VAR and similar introspection.
I took a very quick look at the World.nqp source code, and it seems at least plausible that discarding the % type constraint with is List is intended behavior.
So, is this behavior correct/intended? If so, why? And how does that fit in with the type constraints and other guarantees that sigils typically provide?
(I have to admit, seeing an %-sigiled variable that doesn't support Associative indexing kind of shocked me…)
I think this is a grey area, somewhere between DIHWIDT (Doctor, It Hurts When I Do This) and an oversight in implementation.
Thing is, you can create your own class and use that in the is trait. Basically, that overrides the type with which the object will be created, from the default Hash (for %) or Array (for @ sigils). As long as you provide the interface methods, it (currently) works. For example:
class Foo {
method AT-KEY($) { 42 }
}
my %h is Foo;
say %h<a>; # 42
However, if you want to pass such an object as an argument to a sub with a % sigil in the signature, it will fail, because the class does not do the Associative role:
sub bar(%) { 666 }
say bar(%h);
===SORRY!=== Error while compiling -e
Calling bar(Foo) will never work with declared signature (%)
I'm not sure why the test for Associative (for the % sigil) and Positional (for the @ sigil) is not enforced at compile time with the is trait. I would assume it was an oversight, maybe something to be fixed in 6.e.
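For what it's worth, a quick sketch (untested, assuming current Rakudo behaviour) suggests that explicitly doing the role is all it takes to satisfy that signature check:
class Foo does Associative {
    method AT-KEY($) { 42 }
}
my %h is Foo;
sub bar(%) { 666 }
say bar(%h); # 666 - compiles now that Foo does Associative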
Quoting the Parameters and arguments section of the S06 specification/speculation document about the related issue of binding arguments to routine parameters:
Array and hash parameters are simply bound "as is". (Conjectural: future versions ... may do static analysis and forbid assignments to array and hash parameters that can be caught by it. This will, however, only happen with the appropriate use declaration to opt in to that language version.)
Sure enough the Rakudo compiler implemented some rudimentary static analysis (in its AOT compilation optimization pass) that normally (but see footnote 3 in this SO answer) insists on binding @ routine parameters to values that do the Positional role and % ones to Associatives.
I think this was the case from the first official Raku supporting release of Rakudo, in 2016, but regardless, I'm pretty sure the "appropriate use declaration" is any language version declaration, including none. If your/our druthers are static typing for the win for @ and % sigils, and I think they are, then that's presumably very appropriate!
Another source is the IRC logs. A quick search got me nothing.
Hmm. Let's check the blame for the above verbiage so I can find when it was last updated and maybe spot contemporaneous IRC discussion. Oooh.
That is an extraordinary read.
"oversight" isn't the right word.
I don't have time tonight to search the IRC logs to see what led up to that commit, but I daresay it's interesting. The previous text was talking about a PL design I really liked the sound of in terms of immutability, such that code could become increasingly immutable by simply swapping out one kind of scalar container for another. Very nice! But reality is important, and Jonathan switched the verbiage to the implementation reality. The switch toward static typing certainty is welcome, but has it seriously harmed the performance and immutability options? I don't know. Time for me to go to sleep and head off for seasonal family visits. Happy holidays...

What mechanism shows the component ID during chisel3 elaboration?

Chisel throws an exception with an elaboration error message. The following is an example produced by my code.
chisel3.core.Binding$ExpectedHardwareException: data to be connected 'chisel3.core.Bool@81' must be hardware, not a bare Chisel type. Perhaps you forgot to wrap it in Wire(_) or IO(_)?
This exception message is interesting because the 81 after chisel3.core.Bool@ looks like an ID, not a hashcode.
Indeed, the Data type extends the HasId trait, which has an _id field, and
_id seems to hold a unique ID for each component.
I thought the Data type overrode toString to produce a string of the form type@ID, but it does not override it. That is why $node in the code below should not be able to show the ID.
throw Binding.ExpectedHardwareException(s"$prefix'$node' must be hardware, " +
"not a bare Chisel type. Perhaps you forgot to wrap it in Wire(_) or IO(_)?")
Instead of toString, a toNamed method exists in Data. However, that method seems to be called to generate FIRRTL code, not to convert a component into a string.
Why can the Data type show its ID?
If it is actually the hashcode, not an ID, this question comes from my misunderstanding.
I think you should take a look at Chisel PR #985. It changes the way that Data's toString method is implemented. I'm not sure if it answers your question directly but it's possible this will make the meaning and location of the error clearer. If not you should comment on it.
Scala classes come with a default toString method that is of the form className@hashCode.
As you noted, the chisel3.core.Bool@81 sure looks like it's using the _id rather than the hashCode. That's because in the most recently published version of Chisel (3.1.6), the hashcode was the id! You can see this if you inspect the source file at the tag for that version: https://github.com/freechipsproject/chisel3/blob/dc4200f8b622e637ec170dc0728c7887a7dbc566/chiselFrontend/src/main/scala/chisel3/internal/Builder.scala#L81
This is no longer the case on master, which is probably the source of any confusion! As Chick noted, we have just changed the .toString method to be more informative than the default; expect more informative representations in 3.2.0!

How To Raise Exception with Message in Ada

I'm struggling a little bit to raise exceptions the way I'm used to coming from a C# background.
I have a utility function that expects input values to be in a very specific range, as defined by an external standard. If a value is supplied outside of that range (and there is one value in the middle of the range that is also invalid), then I want to raise an exception to the caller so that they break.
From what I can tell, the syntax is raise Invalid_Argument;
But- is it possible to supply a message with the exception? e.g. the Invalid_Argument exception is somewhat self-explanatory, but I could see further detail specifying what was wrong with the argument. How do I write a brief error message to be stuck into the exception?
It used to be (in Ada 95) that you had to write
Ada.Exceptions.Raise_Exception (Invalid_Argument'Identity,
                                "message");
(see Ada95RM 11.4.2(6)) but since Ada 2005 you’ve been able to say
raise Invalid_Argument with "message";
(A2005RM 11.3).
Note that the string part is a string expression, so you can add something to describe the actual invalid value if that would be useful.
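For example, a minimal complete sketch (the exception name, range, and magic value are made up for illustration; Ada.Exceptions.Exception_Message retrieves the string on the handling side):
with Ada.Text_IO;
with Ada.Exceptions;
procedure Demo is
   Invalid_Argument : exception;
   procedure Check (Value : Integer) is
   begin
      if Value not in 23 .. 2001 or else Value = 42 then
         raise Invalid_Argument
           with "bad value:" & Integer'Image (Value);
      end if;
   end Check;
begin
   Check (42);
exception
   when E : Invalid_Argument =>
      Ada.Text_IO.Put_Line (Ada.Exceptions.Exception_Message (E));
end Demo;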
First, you can define a subtype:
subtype Valid_Range_For_X is Integer range 23 .. 2001;
This will catch most invalid values automatically. If you're using Ada 2012, you can add a predicate:
subtype Valid_Range_For_X is Integer range 23 .. 2001 with
   Dynamic_Predicate => Valid_Range_For_X /= 42;
which will catch the internal invalid value as well. It's usually better to let the language do such checks for you than to do them manually.
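With the subtype in place, the checks fire on their own. A sketch (assuming assertion checking is enabled, e.g. GNAT's -gnata; a plain range violation raises Constraint_Error, while a failed Dynamic_Predicate raises Assertion_Error):
declare
   X : Valid_Range_For_X := 100;  -- fine
begin
   X := 42;    -- raises Assertion_Error (Dynamic_Predicate fails)
   X := 2002;  -- would raise Constraint_Error (out of range)
end;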
If you're using an earlier version of Ada, then you'll have to manually check for the internal value. I usually prefer fine-grained exceptions to a general exception used for many things, differentiated by the exception message. So I would raise something like
X_Is_42
rather than
Invalid_Argument with "X is 42"
This makes it easier to distinguish the 42 case from the (often many) other kinds of invalid arguments. I realize not everyone agrees with this.

Types of Errors during Compilation and at Runtime

I have this question in a homework assignment for my Computer Languages class. I'm trying to figure out what each one means, but I'm getting stuck.
Errors in a computer program can be classified according to when they are detected and, if they are detected at compile time, what part of the compiler detects them. Using your favorite programming language, give an example of:
(a) A lexical error, detected by the scanner.
(b) A syntax error, detected by the parser.
(c) A static semantic error, detected (at compile-time) by semantic analysis.
(d) A dynamic semantic error, detected (at run-time) by code generated by the compiler.
For (a), I think this would be correct: int char foo;
For (b), int foo (no semicolon)
For (c) and (d), I'm not sure what is being asked.
Thanks for the help.
I think it's important to understand what a scanner is, what a parser is and how they are involved in the compilation process.
(I'll try my best at a high-level explanation)
The scanner takes a sequence of characters (a source file) and converts it to a sequence of tokens. E.g., it sees the text if 234 ) and converts it to the tokens IF INTEGER RPAREN (there's more to it, but that should be enough for the example).
Another way to think of how the scanner works is that it takes the text and makes sure you use the correct keywords and don't make them up. It has to be able to convert the entire source file to the associated language's recognized tokens, and this varies from language to language. In other words, "Does every piece of text correspond to a construct the language understands?" Or, better put with an example, "Do all these words found in a book belong to the English language?"
The parser takes a sequence of tokens (usually from the scanner) and (among other things) sees if it is well formed. e.g., a C variable declaration is in the form Type Identifier SEMICOLON.
The parser checks "Does this sequence of tokens in this order make sense to me?" And similarly the analogy, "Does this sequence of English words (with punctuation) form complete sentences?"
(c) asks for errors that can be found when compiling the program. (d) asks for errors that you see when running the program after it compiled successfully. You should hopefully be able to distinguish these two by now.
I hope this helps you get a better understanding and make answering these easier.
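To make the four categories concrete, here is one hedged example of each in Java (any comparable language would do):
int #foo = 1;     // (a) lexical: '#' cannot start any Java token, so the scanner rejects it
int foo           // (b) syntax: the tokens are fine, but the parser expects a ';'
int x = "hello";  // (c) static semantic: parses fine, but semantic analysis rejects the type mismatch
int[] a = new int[2];
int y = a[5];     // (d) dynamic semantic: compiles, but the compiler-generated bounds check
                  //     throws ArrayIndexOutOfBoundsException at run time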
I'll give it a shot. Here's what I think:
a. int foo+; (foo+ is an invalid identifier because + is not a valid char in identifiers)
b. foo int; (Syntax error is any error where the syntax is invalid - either due to misplacement of words, bad spelling, missing semicolons etc.)
c. Static semantic errors are logical errors, e.g. passing a float as the index of an array - arr[1.5] should be a static semantic error.
d. I think exceptions like NullReferenceException might be an example of a dynamic semantic error. Not completely sure, but covariant returns that fail a type check at run time (in some languages) might also come in this category. Also, storing the wrong type of object in another (like putting a Cat in a Person variable at runtime) might qualify. The simplest example would be trying to access an index that is out of bounds of the array.
Hope this helps.

flex3 type casting

Does anyone know the real difference between the two ways of type casting in Flex 3?
var myObject1:MyObject = variable as MyObject;
var myObject2:MyObject = MyObject(variable);
I prefer to use the second method because it will throw an Error when type cast fails, whereas the first method will just return null. But are there any other differences? Perhaps any advantages to using the first method?
The second type of casting has different behaviour for top-level (http://livedocs.adobe.com/flex/2/langref/) types. E.g., Array(obj) does not cast in the straightforward way you describe; it creates a new Array from obj if possible, even if obj is already an Array.
I'm sure the times this would cause unexpected behaviour would be rare but I always use "as" for this reason. It means if I do
int(str)
I know it's a cast in the "attempt to convert" sense of the word not in the "I promise it is" sense.
ref: got some confirmation on this from http://raghuonflex.wordpress.com/2007/07/27/casting-vs-the-as-operator/
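For example (untested, but this matches the documented behaviour of the top-level Array() function):
var arr:Array = [1, 2];
trace((arr as Array).length); // 2 - 'as' leaves the existing Array alone
trace(Array(arr).length);     // 1 - Array() builds a new Array with arr as its only element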
The as operator returns null if the cast fails.
The () method throws an error if the cast fails.
If the value of variable is not compatible with MyObject, myObject1 will contain null and you will be surprised by a null pointer error (1009: cannot access a property or method of a null object reference.) somewhere later in the program when you try to access it. Whereas if you cast using the MyObject(variable) syntax, you will get a type coercion error (1034: Type Coercion failed: cannot convert _ to _) on that same line - which is more helpful than getting a 1009 somewhere later and wondering what went wrong.
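A minimal illustration of the difference (using MyObject from the question):
var obj:Object = new Object();
var a:MyObject = obj as MyObject; // a is null; you get error 1009 later, wherever you first use it
var b:MyObject = MyObject(obj);   // throws TypeError: Error #1034 right here on this line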
I think I read somewhere on this site that as is slightly faster than (), but I can't find the question again.
Besides the fact that this question has been asked many times, you will find a more in-depth answer here.
I recently discovered the very useful [] tag syntax when searching on StackOverflow; it allows you to search only in questions tagged with the specified tag(s). So you could do a search like [actionscript-3] as vs cast. There are more search tips here: https://stackoverflow.com/search.
And no; the irony in that I can not find the question about performance and write about how to search is not lost on me ;)
I think as returns the base class, rather than null, when the cast fails, and () throws an error.