Resolving a reduce/reduce conflict in a simple Yacc grammar

I'd like to know how to resolve the reduce/reduce conflict in the grammar below. The problematic rule is "LDA HASH expression"; when I remove it, the conflict disappears. I have tried breaking the LDA xxx parsing rules up into separate rules, but I still get a conflict. What exactly is the problem with this rule, and how do I fix it?
%%
program:
program statement NEWLINE { }
|
;
statement:
instruction {
printf("Opcode: %c Address Mode: %d Operand: %d\n", opcode[0], address_mode, operand);
numInstructions++;
}
| assignment {
}
;
instruction: ADC expression {
}
| CLC {
address_mode = IMPLIED_MODE;
opcode[0] = 0x18;
fwrite(opcode, 1, 1, output);
}
| CLD {
address_mode = IMPLIED_MODE;
opcode[0] = 0xd8;
fwrite(opcode, 1, 1, output);
}
| LDA expression {
opcode[0] = 0xad;
}
| LDA HASH expression {
opcode[0] = 0xa9;
address_mode = IMMEDIATE_MODE;
if( operand > 255 ) {
printf("syntax error. Immediate value too large. The value must lie in the range between 0..255. Line: %d\n", yylineno-1);
exit(1);
}
}
| LDA expression COMMA IDENTIFIER {
// LDA operand, x | LDA operand, y
opcode[0] = 0xbd;
}
| STA expression {
}
| RTS {
address_mode = IMPLIED_MODE;
opcode[0] = 0x60;
fwrite(opcode, 1, 1, output);
}
;
expression: number { $$ = $1;
operand = $1;
}
| HASH number {
operand = $2;
}
;
number:
NUMBER { $$ = $1; }
| HEXADECIMAL { $$ = $1; }
| BINARY { $$ = $1; }
;
assignment:
IDENTIFIER EQUALS expression { $$ = $3; }
;
%%

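I have just spotted the error in the grammar. The input "LDA HASH number" can be parsed in two ways: as "LDA HASH expression", with the expression reduced from "number", or as "LDA expression", with the expression reduced from "HASH number". Once the parser has read "HASH number" it cannot tell which reduction to apply, which results in the reduce/reduce conflict. The way to resolve it is to delete the "HASH number" alternative from the expression rule.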
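To make the ambiguity concrete, here is a small Python sketch that counts the parse trees for the token sequence LDA HASH NUMBER. It uses a cut-down copy of the grammar above (only the LDA rules, and number reduced to a single NUMBER token), and the helper name count_parses is purely illustrative. It is not how Yacc works internally; it only shows that the original grammar admits two derivations and the repaired one admits a single derivation.

```python
from functools import lru_cache

def count_parses(grammar, start, tokens):
    """Count the distinct parse trees that derive `tokens` from `start`.

    Assumes a grammar with no empty rules and no left recursion.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # Number of ways `symbol` derives tokens[i:j].
        if symbol not in grammar:  # terminal symbol
            return 1 if j - i == 1 and tokens[i] == symbol else 0
        return sum(derive_seq(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def derive_seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` derives tokens[i:j].
        if not rhs:
            return 1 if i == j else 0
        head, rest = rhs[0], rhs[1:]
        return sum(derive(head, i, k) * derive_seq(rest, k, j)
                   for k in range(i + 1, j + 1))

    return derive(start, 0, len(tokens))

# Simplified subset of the grammar from the question.
ORIGINAL = {
    "instruction": [("LDA", "expression"), ("LDA", "HASH", "expression")],
    "expression": [("number",), ("HASH", "number")],
    "number": [("NUMBER",)],
}

# Same grammar with the "HASH number" alternative removed from expression.
FIXED = dict(ORIGINAL, expression=[("number",)])

TOKENS = ["LDA", "HASH", "NUMBER"]
print("original grammar:", count_parses(ORIGINAL, "instruction", TOKENS))
print("fixed grammar:   ", count_parses(FIXED, "instruction", TOKENS))
```

With more than one parse tree for the same input, an LALR(1) generator has no way to pick a reduction, so it reports a conflict.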

Related

CRC16 calculation in Tcl

I'm trying to compute the CRC16 of a binary file.
I started by reading 2 bytes from the binary file and computing a CRC16 with polynomial 0x1021 and initial value 0xFFFF. I took some C code and tried to translate it to Tcl. I couldn't use the raw bytes because the computation gave me an error about using a non-numeric string, so I converted the bytes to strings.
proc main {}{
# open binary file
set file_read [open "$input_file" rb]
while {1} {
if {! [eof $fr]} {
append binary_data [read $file_read 2]
}
binary scan [string range $binary_data 0 1] H4 str_bin_data
set CRC_data [CRC_calculation $str_bin_data]
puts " CRC_data := $CRC_data"
}
}
proc CRC_calculation {str_bin_data} {
set Polynome 0x1021
set Highbit 0x8000
set CRC_data 0xFFFF
set byte 0
set bit 0
set data_ln [string length $str_bin_data]
# puts " data_ln := $data_ln"
for {set byte 0} {$byte < $data_ln} {incr byte} {
set CRC_data [ expr {$CRC_data ^ ([lindex $str_bin_data $byte] << 8)} ]
for {set bit 8} {$bit > 0} {incr bit -1} {
if {($CRC_data && $Highbit)} {
set CRC_data [expr {($CRC_data << 1) ^ $Polynome}]
} else {
set CRC_data [expr {$CRC_data << 1}]
}
}
puts " byte_index := $byte"
puts " CRC_data := $CRC_data"
}
return $CRC_data
}
In C, when I define a byte array, for example (first 8 bytes in the binary file):
unsigned char bytes[3]= {0x55,0x55,0x55,0x55};
then CRC = 0x82b8
In Tcl I don't get the correct value, not even a 32-bit CRC value.
Here is the C code that I'm using:
#include<stdio.h>
#define Polynom 0x1021
#define Highbit 0x8000
unsigned short getCRC(const unsigned char data[])
{
unsigned short rem = 0xFFFF;
unsigned long byte = 0;
int bit = 0;
for (byte = 0; byte < 3; ++byte)
{
rem ^= (data[byte]<< 8);
for (bit = 8; bit > 0; --bit)
{
if (rem & Highbit)
rem = (rem << 1) ^ Polynom;
else
rem = (rem << 1);
}
}
return (rem);
}
int main() {
int rem ;
unsigned char data[]= {0x55,0x55,0x55,0x55};
rem = getCRC (data);
printf("%x", rem);
}
There are a few problems. Firstly, and most importantly, the scanning of the binary data isn't right, as we want to end up with unsigned bytes (to parallel that C code) and not hex characters. You'd be better off with:
# I'm assuming you've got Tcl 8.6, this is how you read 2 bytes as unsigned chars
binary scan [read $file_read 2] "cu*" str_bin_data
# Process the list of parsed byte data here; I'm not sure if you want this in the loop or not
The other big problem is that your CRC calculation isn't correct.
proc CRC_calculation {str_bin_data} {
set Polynom 0x1021
set Highbit 0x8000
set MASK 0xFFFF; # 16 bit mask; for clamping to C unsigned short range
set rem 0xFFFF
# Assume str_bin_data holds a list of unsigned char values
foreach byte $str_bin_data {
set rem [expr {$rem ^ ($byte << 8)}]
foreach _ {7 6 5 4 3 2 1 0} {
set rem [expr {
(($rem << 1) ^ ($rem & $Highbit ? $Polynom : 0)) & $MASK
}]
}
}
return $rem
}
Key observation here? Tcl's numbers are arbitrary precision integers (and IEEE doubles, though not relevant here). This means that you need to clamp the range. Minimally, that would be an AND with 0xFFFF (16-bit mask) after any operation that can increase the number of bits in use, which is just << in this algorithm. That, plus the problems with converting the binary data in the first place, are why things weren't working for you. I've also switched to using foreach as that's fast and clearer for operations where “do every one of them” is the fundamental idea, and merged the inner bits into a single expr (yes, expr expressions can be multiline if you want).
The biggest single problem was that you were passing entirely the wrong thing to the CRC_calculation code. Changing the binary scan is vital.
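The clamping point is easy to see in any language with arbitrary-precision integers. Below is a minimal Python sketch of the same algorithm (polynomial 0x1021, initial value 0xFFFF, most significant bit first), next to a deliberately broken variant that omits the mask. The function names are mine, not from the question. The parameters match what is commonly catalogued as CRC-16/CCITT-FALSE, whose published check value for the ASCII string "123456789" is 0x29B1.

```python
def crc16(data, poly=0x1021, init=0xFFFF):
    """Bitwise CRC16, MSB first, clamped to 16 bits after every shift."""
    rem = init
    for byte in data:
        rem ^= byte << 8
        for _ in range(8):
            if rem & 0x8000:
                rem = ((rem << 1) ^ poly) & 0xFFFF
            else:
                rem = (rem << 1) & 0xFFFF
    return rem

def crc16_unmasked(data, poly=0x1021, init=0xFFFF):
    """Same loop WITHOUT the 16-bit mask, to show how the value grows."""
    rem = init
    for byte in data:
        rem ^= byte << 8
        for _ in range(8):
            if rem & 0x8000:
                rem = (rem << 1) ^ poly
            else:
                rem = rem << 1
    return rem

sample = b"123456789"
print(hex(crc16(sample)))
print(crc16_unmasked(sample).bit_length(), "bits without masking")
```

In C the unsigned short type does the clamping implicitly on assignment; in Tcl (and Python) you have to write the mask yourself.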

Converting an AND operation on a C# property to Tcl

How do I write the AND operation in the one-line statement below in Tcl, where pcieDeviceControlRegister is a property defined as in the reference code?
Code:
pcieDeviceControlRegister = cfgSpace.pcieDeviceControlRegister & (~((uint)0xF));
The reference for pcieDeviceControlRegister is:
public uint pcieDeviceControlRegister
{
get
{
if (pcieCapabilityOffset != 0)
return (ReadDW((int)(pcieCapabilityOffset + 8) / 4, 0xF)) & 0xFFFF;
else
return 0;
}
set
{
if (pcieCapabilityOffset != 0)
{
uint val = ReadDW((int)(pcieCapabilityOffset + 8) / 4, 0xF)& 0xFFFF0000;
val |= value;
// write should be done with byte enables !!!
WriteDW((int)(pcieCapabilityOffset + 8) / 4, val, 0xF);
}
}
}
You'll have to arrange for the mapping of ReadDW and WriteDW into Tcl, probably by writing a little C or C++ code that makes commands (with the same names) that do those operations. I'm assuming that you've already done that. (SWIG can generate the glue code if you need it.)
Then, we define a command like this:
proc pcieDeviceControlRegister {{newValue ""}} {
global pcieCapabilityOffset
# Filter the bogus setup case early; if this is really an error case though,
# it is better to actually throw an error instead of struggling on badly.
if {$pcieCapabilityOffset == 0} {
return 0
# error "PCIE capability offset is zero"
}
set offset [expr {($pcieCapabilityOffset + 8) / 4}]
if {$newValue eq ""} {
# This is a read operation
return [expr {[ReadDW $offset 0xF] & 0xFFFF}]
} else {
# This is a write operation
set val [expr {[ReadDW $offset 0xF] & 0xFFFF0000}]
# Note that we do the bit filtering HERE
set val [expr {$val | ($newValue & 0xFFFF)}]
WriteDW $offset $val 0xF
return
}
}
With that, which you should be able to see is a pretty simple translation of the C# property code (with a bit of minor refactoring), you can then write your calling code like this:
pcieDeviceControlRegister [expr {[pcieDeviceControlRegister] & ~0xF}]
With Tcl, you don't write casts to different types of integers: Tcl just has numbers (which are theoretically of infinite width) so instead you need to do a few more bit masks in key places.
The conversion of the above code to a method on an object is left as an exercise. It doesn't change very much…
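For comparison, here is the same read-modify-write pattern as a small Python sketch. Python integers are also unbounded, so the masks play the same role as in the Tcl version. FakeConfigSpace and its dictionary of dwords are hypothetical stand-ins for whatever ReadDW and WriteDW talk to; they are not part of the original C# code.

```python
class FakeConfigSpace:
    """Hypothetical in-memory stand-in for the device behind ReadDW/WriteDW."""

    def __init__(self, capability_offset, dwords):
        self.capability_offset = capability_offset
        self.dwords = dict(dwords)  # dword index -> 32-bit value

    def _index(self):
        # Same index computation as the C# property: (offset + 8) / 4
        return (self.capability_offset + 8) // 4

    def read_control(self):
        if self.capability_offset == 0:
            return 0
        return self.dwords[self._index()] & 0xFFFF

    def write_control(self, value):
        if self.capability_offset == 0:
            return
        upper = self.dwords[self._index()] & 0xFFFF0000
        # ~0xF is -16 in an unbounded integer type, so clamp to 16 bits here
        self.dwords[self._index()] = upper | (value & 0xFFFF)

cfg = FakeConfigSpace(0x40, {(0x40 + 8) // 4: 0xABCD1234})
cfg.write_control(cfg.read_control() & ~0xF)  # clear the low four bits
print(hex(cfg.dwords[(0x40 + 8) // 4]))
```

The mask inside write_control is what the (uint) cast did for free in C#: without it, a negative value such as ~0xF would spill into the upper half of the dword.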

systemtap userspace function tracing

I have a simple c++ program
main.cpp
#include <iostream>
using namespace std;
int addition (int a, int b)
{
int r;
r=a+b;
return r;
}
int main ()
{
int z;
z = addition (5,3);
cout << "The result is " << z;
}
I want to generate a function trace for this:
- print function names with their inputs, outputs and return types
My systemtap script: para-callgraph.stp
#! /usr/bin/env stap
function trace(entry_p, extra) {
%( $# > 1 %? if (tid() in trace) %)
printf("%s%s%s %s\n",
thread_indent (entry_p),
(entry_p>0?"->":"<-"),
probefunc (),
extra)
}
probe $1.call { trace(1, $$parms) }
probe $1.return { trace(-1, $$return) }
My C++ executable is called a (compiled with g++ -g main.cpp).
The command I run:
stap para-callgraph.stp 'process("a").function("*")' -c "./a > /dev/null"
0 a(15119):->_GLOBAL__I__Z8additionii
27 a(15119): ->__static_initialization_and_destruction_0 __initialize_p=0x0 __priority=0x0
168 a(15119): <-__static_initialization_and_destruction_0
174 a(15119):<-_GLOBAL__I__Z8additionii
0 a(15119):->main
18 a(15119): ->addition a=0x0 b=0x400895
30 a(15119): <-addition return=0x8
106 a(15119):<-main return=0x0
Here ->addition a=0x0 b=0x400895 shows addresses and not the actual values, i.e. 5 and 3, which I want.
How should I modify my stap script?
This appears to be a systemtap bug. It should print the value of b, not its address. Please report it to the systemtap@sourceware.org mailing list (with compiler/etc. versions and other info, as outlined in man error::reporting).
As to changing the script, the $$parms part is where the local variables are being transformed into a pretty-printed string. It could be changed to something like...
trace(1, $$parms . (@defined($foobar) ? (" foobar=".$foobar$) : ""))
to append foobar=XYZ to the trace record, wherever a parameter foobar is available. To work around the systemtap bug in question, you could try
trace(1, $$parms . (@defined($b) ? (" *b=".user_int($b)) : ""))
to dereference the b variable as if it were an int *.

Write a program to create a formula matching input set to output set

I just saw an explanation of an Obfuscated C entry on StackOverflow. I find the contest very interesting, as I learn something new about C, and about programming in general, every day from it.
One interesting one is Explain 1-liner party of a US President from IOCCC 2013
As you can see, the program uses a series of modulo operations (i.e. (integer)%4796%275%4) to hash an input number to one of two output numbers. This "hash" function was found by trial and error to work correctly for a limited set of inputs.
I think this would be a great thing to have to minimize the size of your programs when you need to map one result to another. For example, you can have {apple, lettuce, carrot, pear, orange, asparagus} which, when converted to numbers and passed through the function, turn into {1,0,0,1,1,0}, and this function takes less space than a full database to search a string and ascertain its fruit-ness.
So my question is... how do you go about creating a program to brute-force stacked modulo operations that will work for a given input set and output set?
Continuing the fruit example, you can convert the first two letters of the fruits into a number like so:
apple -> ap -> 0x6170
so the list becomes {0x6170, 0x6c65, 0x6361, 0x7065, 0x6f72, 0x6173}
So you want to generate a function consisting of repeated modulos to map {0x6170, 0x6c65, 0x6361, 0x7065, 0x6f72, 0x6173} to {1,0,0,1,1,0}
How would one go about doing so?
Interesting challenge, but way beyond my mathematical skills. So, here is a brute force implementation in plain C :P
I don't have a mathematical proof, so I cannot guarantee this will always find a solution, but it works for your input. Adding the 'fruit' "clinton" and 'vegetable' "mccaine" makes it run for a bit longer; adding "apricot" will signal a clash, because "apricot" and "apple" share the same first two letters! (The clash is reported, but the program doesn't say where. Note that a clash inside similar items -- fruits or vegetables -- should actually be allowed. I did not write the code for that, but it should be relatively simple.)
To see a longer sequence, add "banana". For some reason, this needs to go up to 163,13,2. Funny enough, then adding "kumquat" (I'm running out of fruits) doesn't take much longer, and even with another "coconut" it doesn't take much more time.
Implementation notes:
You can limit the search to the maximum input 2-byte code, because a modulus larger than every input leaves the inputs unchanged. For consistency, this C code starts with testing all 1-mod values (although I did not find anything useful). Then it tests 2-mod values, and finally 3-mod values. If it does not find a match, you could extend it to look for 4-mod and 5-mod long sequences, but the search time will increase exponentially (!) with the length of the sequence.
It should be possible to categorize on more than 2 values; that needs very minor rewriting of the code. I suspect looking for a match for all will then take much longer.
It should also be possible to return other 'indexes' than 0 and 1; as I'm writing this, the rest of the CPU is working hard on 2 and 0. It seems that's going to take quite a bit of time, though.
C implementation, brute force:
#include <stdio.h>
#include <stdlib.h>
int main (void)
{
char *fruits[] = { "apple", "pear", "orange"} ; //, "clinton" };
char *veggies[] = { "lettuce", "carrot", "asparagus" }; // , "mccaine" };
int num_f, num_v, found_match;
unsigned short *code[2], max_code = 0;
int i,j, mod_value1, mod_value2, mod_value3;
num_f = sizeof(fruits)/sizeof(fruits[0]);
num_v = sizeof(veggies)/sizeof(veggies[0]);
code[0] = malloc((num_f+num_v)*sizeof(short));
code[1] = malloc((num_f+num_v)*sizeof(short));
for (i=0; i<num_f; i++)
{
code[0][i] = (fruits[i][0]<<8)+fruits[i][1];
code[1][i] = 1;
if (code[0][i] > max_code)
max_code = code[0][i];
}
for (i=0; i<num_v; i++)
{
code[0][num_f+i] = (veggies[i][0]<<8)+veggies[i][1];
code[1][num_f+i] = 0;
if (code[0][num_f+i] > max_code)
max_code = code[0][num_f+i];
}
for (i=0; i<num_f+num_v; i++)
{
for (j=i+1; j<num_f+num_v; j++)
{
if (code[0][i] == code[0][j])
{
printf ("clash!\n");
exit(-1);
}
}
}
printf ("calculating...\n");
for (mod_value1=1; mod_value1<max_code; mod_value1++)
{
found_match = 1;
for (i=0; i<num_f+num_v; i++)
{
if (code[0][i] % mod_value1 != code[1][i])
{
found_match = 0;
break;
}
}
if (found_match)
{
printf ("mod %d should work\n", mod_value1);
break;
}
}
if (found_match)
{
/* single-modulus match: only mod_value1 has been set at this point */
for (i=0; i<num_f; i++)
{
printf ("%s -> %d\n", fruits[i], code[0][i] % mod_value1);
}
for (i=0; i<num_v; i++)
{
printf ("%s -> %d\n", veggies[i], code[0][num_f+i] % mod_value1);
}
} else
{
for (mod_value1=1; mod_value1<max_code; mod_value1++)
{
for (mod_value2=mod_value1+1; mod_value2<max_code; mod_value2++)
{
found_match = 1;
for (i=0; i<num_f+num_v; i++)
{
if (code[0][i] % mod_value2 % mod_value1 != code[1][i])
{
found_match = 0;
break;
}
}
if (found_match)
{
printf ("mod %d mod %d should work\n", mod_value2, mod_value1);
break;
}
}
if (found_match)
break;
}
if (found_match)
{
for (i=0; i<num_f; i++)
{
printf ("%s -> %d\n", fruits[i], code[0][i] % mod_value2 % mod_value1);
}
for (i=0; i<num_v; i++)
{
printf ("%s -> %d\n", veggies[i], code[0][num_f+i] % mod_value2 % mod_value1);
}
} else
{
for (mod_value1=1; mod_value1<max_code; mod_value1++)
{
for (mod_value2=mod_value1+1; mod_value2<max_code; mod_value2++)
{
for (mod_value3=mod_value2+1; mod_value3<max_code; mod_value3++)
{
found_match = 1;
for (i=0; i<num_f+num_v; i++)
{
if (code[0][i] % mod_value3 % mod_value2 % mod_value1 != code[1][i])
{
found_match = 0;
break;
}
}
if (found_match)
{
printf ("mod %d mod %d mod %d should work\n", mod_value3, mod_value2, mod_value1);
break;
}
}
if (found_match)
break;
}
if (found_match)
break;
}
if (found_match)
{
for (i=0; i<num_f; i++)
{
printf ("%s -> %d\n", fruits[i], code[0][i] % mod_value3 % mod_value2 % mod_value1);
}
for (i=0; i<num_v; i++)
{
printf ("%s -> %d\n", veggies[i], code[0][num_f+i] % mod_value3 % mod_value2 % mod_value1);
}
}
}
}
return 0;
}
Result for your original 3 fruit, 3 veggie list:
mod 25 mod 2 should work
apple -> 1
pear -> 1
orange -> 1
lettuce -> 0
carrot -> 0
asparagus -> 0
Late Result, for a wanted output of 2 and 0:
mod 507 mod 8 mod 3 should work
apple -> 2
pear -> 2
orange -> 2
lettuce -> 0
carrot -> 0
asparagus -> 0
Later result yet, with the 'disputable' "tomato" added as a Category 2:
mod 103 mod 7 mod 3 should work
apple -> 1
pear -> 1
orange -> 1
lettuce -> 0
carrot -> 0
asparagus -> 0
tomato -> 2
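The three nested stages in the C program can be folded into one loop over chain lengths. The Python sketch below is my own restatement, not a translation endorsed by the answer: the names apply_chain and find_chain are made up, and it keeps the same assumptions as the C code (strictly decreasing moduli, all below the largest input code, two-letter codes built from the first two characters).

```python
from itertools import combinations

def apply_chain(value, chain):
    """Apply value % m for each modulus m in the chain, left to right."""
    for m in chain:
        value %= m
    return value

def find_chain(codes, wanted, max_len=3):
    """Return the first chain of strictly decreasing moduli mapping every
    code to its wanted output, or None if nothing up to max_len works."""
    limit = max(codes)
    for length in range(1, max_len + 1):
        # combinations() yields increasing tuples; reverse them so the
        # largest modulus is applied first, as in the C loops.
        for combo in combinations(range(1, limit), length):
            chain = combo[::-1]
            if all(apply_chain(c, chain) == w for c, w in zip(codes, wanted)):
                return chain
    return None

def two_letter_code(word):
    return (ord(word[0]) << 8) + ord(word[1])

words = ["apple", "pear", "orange", "lettuce", "carrot", "asparagus"]
wanted = [1, 1, 1, 0, 0, 0]
codes = [two_letter_code(w) for w in words]

chain = find_chain(codes, wanted)
print("chain:", chain)
for word, code in zip(words, codes):
    print(word, "->", apply_chain(code, chain))
```

Raising max_len extends the search to longer chains without new nesting, though the running time grows just as quickly as it does in the C version.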

How do I unescape a html attribute value in Prolog?

I found the predicate xml_quote_attribute/2 in library(sgml)
of SWI-Prolog. This predicate works with the first argument
as input and the second argument as output:
?- xml_quote_attribute('<abc>', X).
X = '&lt;abc&gt;'.
But I couldn't figure out how I can do the reverse conversion.
For example the following query doesn't work:
?- xml_quote_attribute(X, '&lt;abc&gt;').
ERROR: Arguments are not sufficiently instantiated
Is there another predicate that does the job?
Bye
This is what Ruud's solution looks like in DCG notation with pushback lists (semicontext notation).
:- use_module(library(dcg/basics)).
html_unescape --> sgml_entity, !, html_unescape.
html_unescape, [C] --> [C], !, html_unescape.
html_unescape --> [].
sgml_entity, [C] --> "&#", integer(C), ";".
sgml_entity, "<" --> "&lt;".
sgml_entity, ">" --> "&gt;".
sgml_entity, "&" --> "&amp;".
Using DCGs makes the code a bit more readable. It also does away with some of the superfluous backtracking that Cookie Monster noted is the result of using append/3 for this.
Here's the naive solution, using lists of character codes. Most likely it will not give you the best performance possible, but for strings that are not extremely long, it might just be alright.
html_unescape("", "") :- !.
html_unescape(Escaped, Unescaped) :-
append("&", _, Escaped),
!,
append(E1, E2, Escaped),
sgml_entity(E1, U1),
!,
html_unescape(E2, U2),
append(U1, U2, Unescaped).
html_unescape(Escaped, Unescaped) :-
append([C], E2, Escaped),
html_unescape(E2, U2),
append([C], U2, Unescaped).
sgml_entity(Escaped, [C]) :-
append(["&#", L, ";"], Escaped),
catch(number_codes(C, L), error(syntax_error(_), _), fail),
!.
sgml_entity("&lt;", "<").
sgml_entity("&gt;", ">").
sgml_entity("&amp;", "&").
You will have to complete the list of SGML entities yourself.
Sample output:
?- html_unescape("&lt;a&gt; &#26361;&#25805;", L), format('~s', [L]).
<a> 曹操
L = [60, 97, 62, 32, 26361, 25805].
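For readers who want to check the intended behaviour outside Prolog, here is a rough Python sketch of the same idea: one pass over the text, decimal numeric entities plus a three-entry table of named entities. The table and the function name mirror the Prolog code above and are just as incomplete. (Python's standard library also ships html.unescape, which knows the full entity list; the hand-rolled version is only here to show the mechanics.)

```python
import re

# Same three named entities as the Prolog version; extend as needed.
NAMED_ENTITIES = {"lt": "<", "gt": ">", "amp": "&"}

# "&#" digits ";"  or  "&" name ";"
ENTITY_PATTERN = re.compile(r"&(#[0-9]+|[A-Za-z]+);")

def html_unescape(text):
    """Replace known entities in one left-to-right pass; leave others alone."""
    def replace(match):
        body = match.group(1)
        if body.startswith("#"):
            try:
                return chr(int(body[1:]))
            except (ValueError, OverflowError):
                return match.group(0)  # not a valid code point
        return NAMED_ENTITIES.get(body, match.group(0))
    return ENTITY_PATTERN.sub(replace, text)

print(html_unescape("&lt;a&gt; &#26361;&#25805;"))
```

Because the substitution is a single pass, text that was escaped twice, such as "&amp;lt;", is only unescaped once, which is also what the Prolog clauses do.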
If you don't mind linking a foreign module, then you can make a very efficient implementation in C.
html_unescape.pl:
:- module(html_unescape, [ html_unescape/2 ]).
:- use_foreign_library(foreign('./html_unescape.so')).
html_unescape.c:
#include <stdio.h>
#include <string.h>
#include <SWI-Prolog.h>
static int to_utf8(char **unesc, unsigned ccode)
{
int ok = 1;
if (ccode < 0x80)
{
*(*unesc)++ = ccode;
}
else if (ccode < 0x800)
{
*(*unesc)++ = 192 + ccode / 64;
*(*unesc)++ = 128 + ccode % 64;
}
else if (ccode - 0xd800u < 0x800)
{
ok = 0;
}
else if (ccode < 0x10000)
{
*(*unesc)++ = 224 + ccode / 4096;
*(*unesc)++ = 128 + ccode / 64 % 64;
*(*unesc)++ = 128 + ccode % 64;
}
else if (ccode < 0x110000)
{
*(*unesc)++ = 240 + ccode / 262144;
*(*unesc)++ = 128 + ccode / 4096 % 64;
*(*unesc)++ = 128 + ccode / 64 % 64;
*(*unesc)++ = 128 + ccode % 64;
}
else
{
ok = 0;
}
return ok;
}
static int numeric_entity(char **esc, char **unesc)
{
int consumed = 0; /* stays 0 if sscanf stops before reaching %n */
unsigned ccode;
int ok = (sscanf(*esc, "&#%u;%n", &ccode, &consumed) > 0 ||
sscanf(*esc, "&#x%x;%n", &ccode, &consumed) > 0) &&
consumed > 0 &&
to_utf8(unesc, ccode);
if (ok)
{
*esc += consumed;
}
return ok;
}
static int symbolic_entity(char **esc, char **unesc, char *name, int ccode)
{
int ok = strncmp(*esc, name, strlen(name)) == 0 &&
to_utf8(unesc, ccode);
if (ok)
{
*esc += strlen(name);
}
return ok;
}
static foreign_t pl_html_unescape(term_t escaped, term_t unescaped)
{
char *esc;
if (!PL_get_chars(escaped, &esc, CVT_ATOM | REP_UTF8))
{
PL_fail;
}
else if (strchr(esc, '&') == NULL)
{
return PL_unify(escaped, unescaped);
}
else
{
char buffer[strlen(esc) + 1];
char *unesc = buffer;
while (*esc != '\0')
{
if (*esc != '&' || !(numeric_entity(&esc, &unesc) ||
symbolic_entity(&esc, &unesc, "&lt;", '<') ||
symbolic_entity(&esc, &unesc, "&gt;", '>') ||
symbolic_entity(&esc, &unesc, "&amp;", '&')))
// TODO: more entities...
{
*unesc++ = *esc++;
}
}
return PL_unify_chars(unescaped, PL_ATOM | REP_UTF8, unesc - buffer, buffer);
}
}
install_t install_html_unescape()
{
PL_register_foreign("html_unescape", 2, pl_html_unescape, 0);
}
The following statement will build a shared library html_unescape.so from html_unescape.c. Tested on Ubuntu 14.04; may be different on Windows.
swipl-ld -shared -o html_unescape html_unescape.c
Start up SWI-Prolog:
swipl html_unescape.pl
Sample output:
?- html_unescape('&lt;a&gt; &#26361;&#25805;', S).
S = '<a> 曹操'.
With special thanks to the SWI-Prolog documentation and source code, and to the question "C library to convert unicode code points to UTF8?".
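The arithmetic in to_utf8 is easier to trust once it has been compared against a reference encoder. The sketch below restates the same branches in Python (returning None where the C function returns 0) and checks a few code points against Python's built-in UTF-8 codec. It assumes a non-negative code point, as the unsigned parameter in the C code guarantees.

```python
def to_utf8(code_point):
    """Encode one code point as UTF-8 bytes, or return None if it is a
    surrogate or lies beyond U+10FFFF (mirrors the C function's branches)."""
    if code_point < 0x80:
        return bytes([code_point])
    if code_point < 0x800:
        return bytes([192 + code_point // 64,
                      128 + code_point % 64])
    if 0xD800 <= code_point < 0xE000:
        return None  # surrogates are not valid scalar values
    if code_point < 0x10000:
        return bytes([224 + code_point // 4096,
                      128 + code_point // 64 % 64,
                      128 + code_point % 64])
    if code_point < 0x110000:
        return bytes([240 + code_point // 262144,
                      128 + code_point // 4096 % 64,
                      128 + code_point // 64 % 64,
                      128 + code_point % 64])
    return None

for cp in (0x41, 0xE9, 0x66F9, 0x64CD, 0x1F600):
    print(hex(cp), to_utf8(cp) == chr(cp).encode("utf-8"))
```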
This is not meant to be the ultimate answer, since it doesn't give
a solution for SWI-Prolog. For a Java based interpreter the problem
is that XML escaping is not part of J2SE, at least not in a simple
form (I didn't figure out how to use Xerces or the like).
A possible route would be to interface to StringEscapeUtils ( * ) from
Apache Commons. But then again this would not be necessary on
Android since there is a class TextUtils. So we rolled our own ( * * )
little conversion. It works as follows:
?- text_escape('<abc>', X).
X = '&lt;abc&gt;'
?- text_escape(X, '&lt;abc&gt;').
X = '<abc>'
Note the use of the Java methods codePointAt() and charCount(),
and respectively appendCodePoint(), in the Java source code. So it
could also escape and unescape code points above the basic
plane, i.e. in the range >0xFFFF (currently not implemented,
left as an exercise).
On the other hand the Apache libraries, at least version 2.6, are
NOT surrogate pair aware and will emit two decimal entities per
code point instead of one.
Bye
( * ) Java: Class StringEscapeUtils Source
http://grepcode.com/file/repo1.maven.org/maven2/commons-lang/commons-lang/2.6/org/apache/commons/lang/Entities.java#Entities.escape%28java.io.Writer,java.lang.String%29
( * * ) Jekejeke Prolog: Module xml
http://www.jekejeke.ch/idatab/doclet/prod/en/docs/05_run/10_docu/05_frequent/07_theories/20_system/03_xml.html