Lexical Analysis of Preprocessed Code - language-agnostic

I have programmed an assembler with a preprocessor for the MOS 6502 microprocessor. The assembler spits out the correct binary and the preprocessor performs constant substitution, inclusions and conditional inclusions. The problem is retaining file positions of the included files. At this point the preprocessor emits a file directive just before and after a file is included. Here is an example.
Proggie.asm
JSR init
JSR loop
JSR end
%include "Init.asm"
%include "Loop.asm"
%include "End.asm"
Init.asm
init:
LDX #$00
RTS
Loop.asm
loop:
INX
CPX #$05
BNE loop
RTS
End.asm
end:
BRK
Preprocessor result
%file "D:\Proggie.asm" 1
JSR init
JSR loop
JSR end
%file "D:\Init.asm" 1
init:
LDX #$00
RTS%file "D:\Init.asm" 2
%file "D:\Loop.asm" 1
loop:
INX
CPX #$05
BNE loop
RTS%file "D:\Loop.asm" 2
%file "D:\End.asm" 1
end:
BRK%file "D:\End.asm" 2
%file "D:\Proggie.asm" 2
This idea comes from the output that GCC's preprocessor produces. The %file directive tells the lexical analyzer that a file has just been entered or exited. The number after the file path says whether the analyzer is entering (1) or exiting (2) the given file. My lexical analyzer kind of works with this, but it is still a bit off when reporting the current line number.
So my question is: Is this the way to go? Or is there another algorithm I could use?

Gcc's preprocessor fabricates line control directives which look like this:
# 122 "/usr/include/x86_64-linux-gnu/bits/types.h" 2 3 4
Here, the 122 is the line number in the file /usr/include/x86_64-linux-gnu/bits/types.h. Including the line number means that a downstream lexer doesn't need to track the include stack in order to tell which line it is on.
The rest of the line consists of flags, which are similar to your approach with the addition of a couple of gcc-specific flags:
'1' This indicates the start of a new file.
'2' This indicates returning to a file (after having included another file).
'3' This indicates that the following text comes from a system header file, so certain warnings should be suppressed.
'4' This indicates that the following text should be treated as being wrapped in an implicit 'extern "C"' block.
These allow the downstream lexer to track the include stack if it wishes, and the gcc lexer does so in order to produce more informative (or at least more wordy) error messages.
I think the logic is easier with the preprocessor maintaining the stack, but it doesn't make a huge amount of difference, particularly if you're also going to want to generate "included from" notes in your error messages.
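To make the trade-off concrete, here is a minimal Python sketch of a downstream lexer front end that tracks the include stack itself, using the asker's %file "path" 1|2 format. The names FILE_DIRECTIVE and annotate are hypothetical, and the sketch assumes that every directive sits on its own line and that each %include occupied exactly one line in the including file. If the preprocessor emitted the line number in the directive, as gcc does, the stack and the counter arithmetic would not be needed.

```python
import re

# Assumed directive format, modelled on the question: %file "path" 1|2
FILE_DIRECTIVE = re.compile(r'^%file\s+"([^"]+)"\s+([12])\s*$')


def annotate(preprocessed_lines):
    """Yield (path, line_number, text) for every non-directive line."""
    stack = []  # each entry is [path, next_line_number]
    for raw in preprocessed_lines:
        text = raw.rstrip("\n")
        match = FILE_DIRECTIVE.match(text)
        if match:
            path, flag = match.group(1), match.group(2)
            if flag == "1":
                # Entering a file. Assumption: the %include that caused this
                # took up exactly one line in the including file.
                if stack:
                    stack[-1][1] += 1
                stack.append([path, 1])
            else:
                # Leaving a file: resume counting in the including file.
                stack.pop()
            continue
        path, line_number = stack[-1]
        yield path, line_number, text
        stack[-1][1] = line_number + 1


if __name__ == "__main__":
    sample = [
        '%file "main.asm" 1',
        "JSR init",
        '%file "init.asm" 1',
        "init:",
        "RTS",
        '%file "init.asm" 2',
        "BRK",
        '%file "main.asm" 2',
    ]
    for path, line_number, text in annotate(sample):
        print(f"{path}:{line_number}: {text}")
```

In the sample, BRK is reported at main.asm line 3 because line 2 of that file was the %include directive. Keeping the stack also gives the lexer what it needs for "included from" notes.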

Related

Using Systemverilog to read then print binary file. First bytes read & print ok, trouble when a byte containing a 1 in the ms bit position is encountered

The Systemverilog code below is a single file testbench which reads a binary file into a memory using $fread then prints the memory contents. The binary file is 16 bytes and a view of it is included below (this is what I expect the Systemverilog code to print).
The output printed matches what I expect for the first 6 (0-5) bytes. At that point the expected output is 0x80, however the printed output is a sequence of 3 bytes starting with 0xef which are not in the stimulus file. After those 3 bytes the output matches the stimulus again. It seems that the error occurs whenever bit 7 of the byte read is 1. It is almost as if the data is being treated as signed, but it is not; it is binary data printed as hex. The memory is defined as type logic, which is unsigned.
This is similar to a question/answer in this post:
Read binary file data in Verilog into 2D Array.
However, my code does not have the same issue (I use "rb" in the $fopen statement), so that
solution does not apply to this issue.
The Systemverilog spec 1800-2012 states in section 21.3.4.4 Reading binary data that $fread can be used to read a binary file and goes on to say how. I believe this example is compliant to what is stated in that section.
The code is posted on EDA Playground so that users can see it and run it.
https://www.edaplayground.com/x/5wzA
You need a login to run it and download. The login is free. It provides
access to full cloud-based versions of the industry standard tools for HDL simulation.
Also tried running 3 different simulators on EDA Playground. They all produce the same result.
Have tried re-arranging the stim.bin file so that the 0x80 value occurs at the beginning of the file rather than in the middle. In that case the error also occurs at the beginning of the testbench printing output.
Maybe the Systemverilog code is fine and the problem is the binary file? I have provided a screenshot of what emacs hexl mode shows for its contents. I also viewed it in another viewer and it looked the same. You can download it when running on EDA Playground to examine it in another editor. The binary file was generated by GNU Octave.
I would prefer a solution which uses Systemverilog $fread rather than something else, in order to debug the original rather than work around it (learning). This will be developed into a Systemverilog testbench which applies stimulus read from a binary file generated in Octave/Matlab to a Systemverilog DUT. Binary file I/O is preferred because of the file access speed.
Why does the Systemverilog testbench print 0xef rather than 0x80 for mem[6]?
module tb();
  // file descriptors
  int read_file_descriptor;
  // memory
  logic [7:0] mem [15:0];
  // ---------------------------------------------------------------------------
  // Open the file
  // ---------------------------------------------------------------------------
  task open_file();
    $display("Opening file");
    read_file_descriptor=$fopen("stim.bin","rb");
  endtask
  // ---------------------------------------------------------------------------
  // Read the contents of file descriptor
  // ---------------------------------------------------------------------------
  task readBinFile2Mem ();
    int n_Temp;
    n_Temp = $fread(mem, read_file_descriptor);
    $display("n_Temp = %0d",n_Temp);
  endtask
  // ---------------------------------------------------------------------------
  // Close the file
  // ---------------------------------------------------------------------------
  task close_file();
    $display("Closing the file");
    $fclose(read_file_descriptor);
  endtask
  // ---------------------------------------------------------------------------
  // Shut down testbench
  // ---------------------------------------------------------------------------
  task shut_down();
    $stop;
  endtask
  // ---------------------------------------------------------------------------
  // Print memory contents
  // ---------------------------------------------------------------------------
  task printMem();
    foreach(mem[i])
      $display("mem[%0d] = %h",i,mem[i]);
  endtask
  // ---------------------------------------------------------------------------
  // Main execution loop
  // ---------------------------------------------------------------------------
  initial
  begin :initial_block
    open_file;
    readBinFile2Mem;
    close_file;
    printMem;
    shut_down;
  end :initial_block
endmodule
Binary Stimulus File:
Actual output:
Opening file
n_Temp = 16
Closing the file
mem[15] = 01
mem[14] = 00
mem[13] = 50
mem[12] = 60
mem[11] = 71
mem[10] = 72
mem[9] = 73
mem[8] = bd
mem[7] = bf
mem[6] = ef
mem[5] = 73
mem[4] = 72
mem[3] = 71
mem[2] = 60
mem[1] = 50
mem[0] = 00
Update:
An experiment was run in order to test whether the binary file is getting modified during the process of uploading to EDA playground. There is no Systemverilog code involved in these steps, it's just a file upload/download.
Steps:
(Used https://hexed.it/ to create and view the binary file)
Create/save binary file with the hex pattern 80 00 80 00 80 00 80 00
Create new playground
Upload new created binary file to the new playground
Check the 'download files after run' box on the playground
Save playground
Run playground
Save/unzip the results from the playground run
View the binary file, in my case it has been modified during the process of
upload/download. A screenshot of the result is shown below:
This experiment was conducted on two different Windows workstations.
Based on these results and the comments I am going to close this issue, with the disposition that this is not a Systemverilog issue, but is related to upload/download of binary files to EDA playground. Thanks to those who commented.
The unexpected output produced by the testbench is due to modifications that occur to the binary stimulus file during/after upload to EDA playground. The Systemverilog testbench performs as intended to print the contents of the binary file.
This conclusion is based on community comments and experimental results which are provided at the end of the updated question. A detailed procedure is given so that others can repeat the experiment.
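The three unexpected bytes ef bf bd are the UTF-8 encoding of U+FFFD, the Unicode replacement character. One plausible mechanism, which the thread does not confirm, is that somewhere in the upload path the file is decoded as UTF-8 text with invalid bytes replaced and then written back out. The Python sketch below reproduces that effect on the first eight bytes of the stimulus as inferred from the printed output; it illustrates the suspected corruption and is not a description of what EDA playground actually does.

```python
# First eight stimulus bytes, inferred from the testbench output (0x80 expected at index 6).
original = bytes([0x00, 0x50, 0x60, 0x71, 0x72, 0x73, 0x80, 0x73])

# Treat the binary data as UTF-8 text. 0x80 is not valid on its own in UTF-8,
# so it is replaced with U+FFFD, which encodes back to the bytes EF BF BD.
round_tripped = original.decode("utf-8", errors="replace").encode("utf-8")

print(original.hex(" "))       # 00 50 60 71 72 73 80 73
print(round_tripped.hex(" "))  # 00 50 60 71 72 73 ef bf bd 73
```

Every byte with the most significant bit set that does not form a valid UTF-8 sequence would be expanded in the same way, which matches the observation that the error follows the 0x80 value wherever it is placed in the file.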

Fuzzing command line arguments [argv]

I have a binary I've been trying to fuzz with AFL. The only problem is that AFL only fuzzes stdin and file inputs, and this binary takes its input through command-line arguments: pass_read [input1] [input2]. Are there any methods or fuzzers that allow fuzzing in this manner?
I do not have the source code, so making a harness is not really applicable.
Michal Zalewski, the creator of AFL, states in this post:
AFL doesn't support argv fuzzing, because TBH, it's just not horribly useful in
practice. There is an example in experimental/argv_fuzzing/ showing how to do it
in a general case if you really want to.
Link to the mentioned example on GitHub: https://github.com/google/AFL/tree/master/experimental/argv_fuzzing
There are some instructions in the file argv-fuzz-inl.h (haven't tried myself).
Bash only Solution
As an example, let's generate 10 random strings and store them in a file
cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 10 | head -n 10 > string-file.txt
Next, let's read two lines at a time from string-file.txt and pass them to our application
# open string-file.txt for reading on file descriptor 3
exec 3< string-file.txt
while read -r string1 <&3 ; do
    read -r string2 <&3
    pass_read "$string1" "$string2" >> crash_file.txt
done
# close file descriptor 3
exec 3<&-
We then have the program's output for every run stored in crash_file.txt, which can be checked for crashes in further analysis.
This may not be the most elegant solution, but perhaps it gives you an idea of some other possibilities if no tool fulfills the current requirements.
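The same idea can be sketched in Python, which makes it easier to detect a crash directly instead of inspecting output afterwards. This is only an illustration under stated assumptions: the helper names random_argument, crashed and fuzz are hypothetical, the target is assumed to take exactly two arguments as pass_read does, and a crash is taken to mean "killed by a signal", which on POSIX systems shows up as a negative return code. It is blind random testing with no coverage feedback, so it is far weaker than AFL.

```python
import random
import string
import subprocess
import sys

ALPHABET = string.ascii_letters + string.digits


def random_argument(rng, length=10):
    """Return one random alphanumeric argument."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def crashed(command, timeout=5):
    """Run the command once; report True if it was killed by a signal (POSIX)."""
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Hangs are not counted as crashes in this sketch.
        return False
    return completed.returncode < 0


def fuzz(target, rounds, seed=0):
    """Call target with two random arguments per round; return crashing pairs."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(rounds):
        args = [random_argument(rng), random_argument(rng)]
        if crashed([*target, *args]):
            crashes.append(args)
    return crashes
```

Assuming the binary sits in the current directory, a call such as fuzz(["./pass_read"], rounds=1000) would return the argument pairs that killed the process, which can then be replayed by hand.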
I looked at the AFLplusplus repo on GitHub. Inside AFLplusplus/utils/argv_fuzzing/, there is a Makefile. If you run it, you will get a .so file (a shared library) that you can use to do argv fuzzing, even if you only have the binary. Obviously, you must use AFL_PRELOAD. You can read more in the README.

How to access VHDL signal attributes in ModelSim via TCL?

I am developing a CPU in VHDL. I am using ModelSim for simulation and testing. In the simulation script I load a program from a binary file to the instruction memory. Now I want to automatically check if the program fits into memory and abort simulation if it doesn't. Since the memory is basically an array of std_logic_vectors, all I would have to do is read the corresponding signal attribute for use in a comparison. My problem is: How do I access a VHDL signal attribute in TCL inside ModelSim?
The closest I have gotten so far is to use the describe command:
describe sim/:tb:uut:imem:mem_array
which prints something like
# Array(0 to 255) [length 256] of
# Array(31 downto 0) [length 32] of
# VHDL standard subtype STD_LOGIC
Now, of course I could parse the length out of there via string operations. But that would not be a very generic solution. Ideally I would like to have something like this:
set mem_size [get_attribute sim/:tb:uut:imem:mem_array'length]
I have searched stackoverflow, googled up and down and searched through the commands in the command reference manual, but I could not find a solution. I am confident there must be a rather easy solution and I just lack the proper wording to successfully search for it. To me, this doesn't look overly specific and I am sure this could come in handy on many occasions when automating design testing. I am using version 10.6.
I would be very grateful if an experienced ModelSim user could help me out.
Disclaimer: I'm not a Tcl expert, so there's probably a more optimized solution out there.
There's a command called examine that you can use to get the value of objects.
I created a similar testbench here with a 256 x 32 array, the results were
VSIM> examine -radix hex sim/:tb:uut:imem:mem_array
# {32'hXXXXXXXX} {32'hXXXXXXXX} {32'hXXXXXXXX} {32'hXXXXXXXX} {32'hXXXXXXXX} ...
This is the value of sim/:tb:uut:imem:mem_array at the last simulation step (i.e.,
now).
The command returns a list of values for each match (you can use wildcards), so
in our case, it's a list with a single item. You can get the depth by counting
the number of elements it returns:
VSIM> llength [lindex [examine sim/:tb:uut:imem:mem_array] 0]
# 256
You can get the bit width of the first element by using examine -showbase -radix hex,
which will return 32'hFFFFFFFF, where 32'h is the part you want to parse. Wrapping
that into a function would look like
proc get_bit_width { signal } {
    set first_element [lindex [lindex [examine -radix hex -showbase $signal] 0] 0]
    # Remove everything from 'h onwards, including 'h itself, to return only the width
    return [regsub "'h.*" $first_element ""]
}
Hope this gives some pointers!
So, I actually found an easy solution. While further studying of the command reference manual brought to light that it is only possible to access a few special signal attributes and length is not one of them, I noticed that ModelSim automatically adds a size object to its object database for the memory array. So I can easily use
set ms [examine sim/:tb:uut:imem:mem_array_size]
to obtain the size and then check if the program fits.
This is just perfect for me, elegant and easy.

When defining a variable in a julia function, I get an error about an undefined variable on that line

Problem
I'm writing a Julia script, and in the function there is a while loop. Inside the while loop there is a variable. That line is throwing errors about the variable being undefined when in fact that is the very line defining the variable.
The code
The error is on line 65
function cleanTexLoop(fileName::String)
    f = open(fileName, "r")
    while ! eof(f)
        line = readline(f), <-- line 65
        #line = sentenceFilter(line)
        println(line)
    end
    close(f)
end
The function opens a file which IS getting passed into a loop. The loop runs until the end of file. While looping the file is read line by line. Each time it is read the line is stored in variable line and the file advances. In the proper version, that one line (66) isn't commented out, however for debugging it is. line is then taken as input into a filter which modifies the line before storing it again as line. The final version of this script will have four filters, but for now, I'd be happy to get this to run with just zero filters.
(Note that a user has kindly pointed out the comma that, after hours of looking at the code, continued to elude me. I'm waiting for that user to write up an answer)
The error message
cleanTexLoop("test.tex")
ERROR: UndefVarError: line not defined
Stacktrace:
[1] cleanTexLoop(::String) at /home/nero/myScripts/latexCleaner.jl:65
[2] macro expansion at ./REPL.jl:97 [inlined]
[3] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at > ./event.jl:73
Previous working version
I had previously written another version of this which works in its entirety, however I needed to make some substantial changes to its structure in order to better suit future plans. Note that some of the names aren't up to the normal naming convention. Namely, I use "!" when no variables are actually being changed.
function CleanTexLoop(fileName::String,regX::String,sub::String)
    f = open(fileName, "r")
    while ! eof(f)
        println(applySub!(f,regX,sub))
    end
    close(f)
end
function applySub!(file::IOStream,regX::String,sub::String)
    return replace(
        readline(file),
        Base.Regex(regX),
        Base.SubstitutionString(sub)
    )
end
A simple loop which demonstrates why this should work
x = 0
while x < 4
    y = x
    println(y)
    x = x+1
end
As expected, this prints zero to three, and is, as far as I can tell, representative of what I am doing. In both cases I am passing some variable into the loop which, through some action, defines another variable inside the loop which is then printed. Why this works and the other doesn't is beyond me.
What I've seen on Google.
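From looking this problem up, it appears as if this problem arises when defining variables outside of a loop, or similar environment, as a result of them failing to be passed into the environment. However, this isn't what's happening in my case. In my case the variable is being defined for the first time.
As mentioned in the comments, the problem was an errant comma. The trailing comma after readline(f) most likely makes Julia continue the expression onto the next statement, so the right-hand side becomes a tuple that includes println(line), and line is read before it has been assigned.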

How to source a script file by passing arguments?

Say I have a tcl script and I want to pass some arguments to the second script file which is being sourced in the first tcl:
#first tcl file
source second.tcl
I want to control the flow of second.tcl from first.tcl, and I read that tcl source does not accept arguments. How can I do this?
source does not accept any additional arguments. But you can use (global) variables to pass arguments, e.g.:
# first tcl file
set ::some_variable some_value
source second.tcl
The second TCL file can reference the variable, e.g.:
# second tcl file
puts $::some_variable
Remark:
Sourcing a file means that the content of the sourced script is executed in the current context. That means that the sourced script has access to all variables existing in that context. The above code is the same as:
# one joint tcl file
set ::some_variable some_value
puts $::some_variable
Regarding the "::" thing -- see the explanation here (sorry, I don't have enough rep. to leave comments yet).
I should also add that the original question discusses a problem which appears to be quite odd: it seems that it could be better to provide a specific procedure in your second source file that would set up a state pertaining to what is defined by that script.
Something like:
source file2.tcl
setup_state $foo $bar $baz
Making [source] behave differently based on some global variables looks too obscure to me. Of course you might have legitimate reasons to do this, but anyway...