Break on namespace function in gdb (llvm) - namespaces

I'm trying to step through llvm's opt program (for an assignment) and the instructor suggested setting a breakpoint at runOnFunction. I see this in one of the files:
bool InstCombiner::runOnFunction(Function &F) { /* (Code removed for SO) */ }
but gdb does not seem to find the runOnFunction breakpoint. It occurred to me that the problem might be namespaces? I tried this but gdb never breaks, it just creates the fooOpt.s file:
(gdb) b runOnFunction
Function "runOnFunction" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (runOnFunction) pending.
(gdb) r -S -instcombine -debug -o ~/Desktop/fooOpt.s ~/Desktop/foo.s
I'm on a Mac, so I don't have objdump. otool produces 5.6 million lines, and wading through that for a starting point does not seem reasonable, as runOnFunction appears more than once there.

gdb has several built-in commands for finding the names of such functions. The first is info functions, which takes an optional regexp argument and lists every available function whose name matches it (https://sourceware.org/gdb/current/onlinedocs/gdb/Symbols.html):
info functions regexp
Print the names and data types of all defined functions whose names contain a match for regular expression regexp. Thus, ‘info fun step’ finds all functions whose names include step; ‘info fun ^step’ finds those whose names start with step. If a function name contains characters that conflict with the regular expression language (e.g. ‘operator*()’), they may be quoted with a backslash.
So, you can try info functions runOnFunction to get the full name. It can sometimes help to put quotes around the name when passing it to the break command.
The other way is to use the rbreak command instead of break (b). rbreak does a regexp search over function names and may set several breakpoints (https://sourceware.org/gdb/current/onlinedocs/gdb/Set-Breaks.html#Set-Breaks):
rbreak regex
Set breakpoints on all functions matching the regular expression regex. This command sets an unconditional breakpoint on all matches, printing a list of all breakpoints it set. ...
The syntax of the regular expression is the standard one used with tools like grep. Note that this is different from the syntax used by shells, so for instance foo* matches all functions that include an fo followed by zero or more os. There is an implicit .* leading and trailing the regular expression you supply, so to match only functions that begin with foo, use ^foo.
(or even rbreak file:regex to limit search to single source file)
PS: if you want, you can turn C++ function name demangling on or off with set print demangle on or set print demangle off (https://sourceware.org/gdb/current/onlinedocs/gdb/Debugging-C-Plus-Plus.html#Debugging-C-Plus-Plus). With demangling turned off, it is easier to copy a function name into the break command.
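The matching rules quoted above (a pattern matches anywhere in the name unless anchored with ^) can be tried outside gdb. The following is only a sketch in Python: the function names are hypothetical stand-ins for what info functions might list, and Python's re module is not gdb's regex engine, although for patterns this simple the two should behave the same way.

```python
import re

# Hypothetical demangled names, standing in for what gdb might list.
names = [
    "llvm::InstCombiner::runOnFunction(llvm::Function&)",
    "OtherPass::runOnFunction(llvm::Function&)",
    "runOnFunctionHelper()",
    "main",
]


def matching(pattern, candidates):
    """Return the candidates that contain a match for pattern anywhere."""
    return [name for name in candidates if re.search(pattern, name)]


# Unanchored: matches every name that contains the text.
unanchored = matching(r"runOnFunction", names)

# Anchored with ^: only names that begin with the text.
anchored = matching(r"^runOnFunction", names)

# Including the class name narrows the search to a single function.
narrowed = matching(r"InstCombiner::runOnFunction", names)

if __name__ == "__main__":
    for label, found in (("unanchored", unanchored),
                         ("anchored", anchored),
                         ("narrowed", narrowed)):
        print(label, found)
```

This is why a bare rbreak runOnFunction can set many breakpoints at once, while a pattern that includes the class name is more likely to set just one.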

Use of brackets and space character in Tcl

I am still confused about the use of the brackets (), [] and {} in Tcl. I always get caught out using the wrong bracket, missing brackets where they are required, or using too many of them. Besides this, I am also confused by Tcl giving me different results depending on the presence or absence of a space character (in math expressions), and on whether I have used more than one space character in succession.
Can someone please give me the basic rules that I must keep in mind to get out of this mess? Brackets have always been simple to use in C and some other languages, but here they are totally different.
At the level you're looking at, Tcl is very different to any other language you've ever worked with. The heart of Tcl is defined by the Tcl(n) manual page, which states that (among other things):
Whitespace separates words. Every command takes its arguments as a sequence of words. Newlines and semicolons separate command calls; they're totally equivalent, but good style is to use a newline instead of a semicolon.
{braces} are used mainly for quoting text so that it is passed to commands with no substitutions or word separation performed on it. They nest properly. Braces are also used after $ to do variable substitution in a few cases: that's a rare use.
"double quotes" are used for quoting text so that it is passed to commands with substitutions applied, but no word separation.
[brackets] are a command substitution. They are replaced with the result of running the script inside the bracket. The script is usually a single command.
(parentheses) only have one base language use: for (associative) array elements. Thus, $a(b) is a variable substitution that will use the value of the b element in the a array.
The rest of what people call Tcl is really just a standard library, a set of commands to get you started. Some are fundamental. For example:
if is a conditional command, evaluating a branch (a script) if a condition is true. In order for this to be meaningful, the branch has to be not evaluated until the condition has been evaluated and tested; that pretty much requires putting it in braces.
while is a looping command, and not only do you want to brace its body (that's probably going to be evaluated over and over) but you also want to put the condition expression in braces, as you definitely want that to be reevaluated each time round the loop.
proc is a command that makes your own custom commands. The body of the procedure definitely is something you want to evaluate later; it goes in braces.
expr is a general expression evaluation command. Under all normal circumstances, you'll want to put its expression in braces so that the code can be compiled and won't have double substitution problems. Note that expressions often make heavy use of parentheses: they have additional meanings in expression syntax. In particular, apart from being array element lookups, they're also used for function calls and grouping.
Note that if and while also use that same expression evaluation engine. They just use the result of the expression to decide what to do.
Scoping is a matter for commands to decide. The usual commands for dealing with introducing a scope are proc and namespace eval. This is nothing like C, C++, Java, C#, or Javascript; they have different rules. Variables are local to their procedure unless you explicitly say otherwise.
The community practice is to do calls like this:
if { $foo(bar) > (17 + $grill) * 7 } {
    # This is a comment; it lasts to the end of the line
    puts "the foobar $foo(bar) is too large"
    set foo(bar) [ComputeSmallerValue $grill]
}
That is, barewords (if and puts) are unquoted, expressions and inner scripts are brace-quoted, parentheses are used where meaningful but mostly for arrays and expressions, whitespace separates all words, inner scripts are indented (usually by 4) for clarity (it doesn't have semantic meaning, but it sure helps with reading), and “blocks” use Egyptian braces so that you don't have to add backslashes all over the place.
You don't have to follow these rules (they're guidelines, not the law) but they make your life easier if you do. Sometimes you do need to break the rules, but then you should know to be careful.
You cannot compare Tcl to C. In C, {} defines scope. In Tcl, {} is a grouping operator.
In Tcl, {} may group a string:
{hello world}
Or a list:
{a b c d e f g h}
Or a script:
{
puts -nonewline {hello }
puts world\n
}
Every command is simply a series of groups (which may be a word, a list,
an expression or a script):
{if} {true} { puts "hello\n" }
Of course, you don't need to put braces around every word,
but you do need braces to enclose a script:
if true { puts hello\n }
Generally, for the if statement, not bracing the expression is a bad idea,
so this is better:
if { true } { puts hello\n }
This simple rule creates Tcl's remarkably simple syntax. Every command is simply
a series of groups, whether a word, an expression, a list or script:
if expr script
while expr script
proc name argument-list script
puts string
for initialization condition nextloop script
The one important thing to remember is whenever an expression is wanted, it
should be enclosed within braces in order to prevent early substitution. e.g.:
set i 0
while { $i < 10 } {
    incr i
}
The square brackets, [], are replaced with the output of a command enclosed
by the square brackets:
set output [expr {2**5}]
Parentheses are used within expressions as usual:
set output [expr {(2**5)+2}]
And for arrays:
set i 0
while { $i < 5 } {
    set output($i) [expr {2**$i}]
    incr i
}
parray output

Sublime build exec command array

I am confused as to why Sublime Text 2 build systems tend to put the exec command as an array. Though this is suggested in the docs (and works), just putting the command as a string works just as well, and is (in my opinion) more straightforward.
The Sublime Text build system uses subprocess.Popen, whose documentation recommends passing the command as a sequence (an array). Otherwise the interpretation is platform-dependent.
Quoted from the Python 2 subprocess documentation:
args should be a sequence of program arguments or else a single string. By default, the program to execute is the first item in args if args is a sequence. If args is a string, the interpretation is platform-dependent (...). Unless otherwise stated, it is recommended to pass args as a sequence.
Another important quote (thanks @Dimpl for pointing that out):
The shell argument (which defaults to False) specifies whether to use the shell as the program to execute. If shell is True, it is recommended to pass args as a string rather than as a sequence.
The shell argument is set to True if you use shell_cmd and to False if you use cmd. Hence, based on these quotes, I would suggest using an array for cmd and a string for shell_cmd.
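The difference between the two forms can be seen by calling subprocess directly. This is a minimal sketch of the two calling conventions, not Sublime Text's actual code; it assumes Python 3.7 or later (for capture_output) rather than the Python 2 the quotes come from, and the commands are only illustrations.

```python
import subprocess
import sys

# A sequence with shell=False (the default): each item reaches the program
# as exactly one argument, so "two words" is not split at the space.
as_list = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", "two words"],
    capture_output=True, text=True, check=True,
)
list_output = as_list.stdout.strip()

# A single string with shell=True: the platform's shell parses the line,
# so splitting and quoting follow that shell's rules.
as_string = subprocess.run(
    "echo two words",
    shell=True, capture_output=True, text=True, check=True,
)
string_output = as_string.stdout.strip()

if __name__ == "__main__":
    print(list_output)
    print(string_output)
```

With the sequence form there is no shell in between, which is why it behaves the same on every platform; the string form depends on whichever shell ends up interpreting it.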

Tcl: Is parameter evaluation guaranteed to be left-to-right?

I have a Tcl program where I often find expressions of the following kind:
proc func {} {...}
...
lappend arr([set v [func]]) $v
The intended meaning of the last line is
set v [func]
lappend arr($v) $v
It obviously works. What I would like to know: Does it work "by accident", or does Tcl guarantee, that the first parameter passed to lappend is evaluated before the second?
Tcl is always evaluated from left to right, as you can read in the documentation. I quote the relevant part:
Substitutions take place from left to right, and each substitution is evaluated completely before attempting to evaluate the next. Thus, a sequence like:
set y [set x 0][incr x][incr x]
will always set the variable y to the value, 012.
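As a rough analogy only, the same sequence can be modelled in Python, which also evaluates the operands of an expression from left to right. The helper names set_var and incr are hypothetical stand-ins for the Tcl commands, and the code relies on Python's own evaluation order, not on Tcl's parser.

```python
# Rough Python model of:  set y [set x 0][incr x][incr x]
variables = {}


def set_var(name, value):
    """Stand-in for Tcl's set: store the value and return it as a string."""
    variables[name] = value
    return str(value)


def incr(name):
    """Stand-in for Tcl's incr: add one to the variable and return the result."""
    variables[name] = int(variables[name]) + 1
    return str(variables[name])


# Each call finishes completely before the next one starts, so the pieces
# are produced in the order 0, 1, 2 and concatenated in that order.
y = set_var("x", 0) + incr("x") + incr("x")

if __name__ == "__main__":
    print(y)
```

The lappend arr([set v [func]]) $v line from the question depends on the same ordering: the substitution in the first word has finished, and v is set, before $v in the second word is substituted.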
I agree with Jerry; here is some more detail.
Tcl commands are evaluated in two steps: parsing and execution.
First the Tcl interpreter parses the command string into words, performing substitutions along the way.
Then a command procedure processes the words to produce a result string. Each command has a separate command procedure.
Let us consider the following code.
% set input "The cat in the hat"
The cat in the hat
% string match "*at in*" $input
1
In the parsing step the Tcl interpreter applies the rules described in this chapter to divide the command up into words and perform substitutions.
Parsing is done in exactly the same way for every command. During the parsing step the Tcl interpreter does not apply any meaning to the values of the words. Tcl just performs a set of simple string operations such as replacing the characters $a with the string stored in variable a. Tcl does not know or care whether a or the resulting word is a number or the name of a widget or anything else.
In the execution step meaning is applied to the words of the command. Tcl treats the first word as a command name, checking to see if the command is defined and locating a command procedure to carry out its function. If the command is defined then the Tcl interpreter invokes its command procedure, passing all of the words of the command to the command procedure. The command procedure is free to interpret the words in any way that it pleases, and different commands apply very different meanings to their arguments.
The major rule to remember here:
Tcl parses a command and makes substitutions in a single pass from left to right. Each character is scanned exactly once.
At most a single layer of substitution occurs for each character; the result of one substitution is not scanned for further
substitutions.
Reference: Tcl and the Tk Toolkit

Converting Tcl to C++

I am trying to convert a Tcl script into a C++ program. I don't have much experience with Tcl and am hoping someone could explain what the following lines of the script are actually doing:
1) set rtn [true_test_sfm $run_dir]
2) cd [glob $run_dir]
3) set pwd [pwd]
Is the first one just checking if true_test_sfm directory exists in run_dir?
Also, I am programming on a windows machine. Would the system function be the equivalent to exec statements in tcl? And if so how would I print the result of the system function call to stdout?
In Tcl, square brackets indicate "evaluate the code between the square brackets". The result of that evaluation is substituted for the entire square-bracketed expression. So, the first line invokes the function true_test_sfm with a single argument $run_dir; the result of that function call is then assigned to the variable rtn. Unfortunately, true_test_sfm is not a built-in Tcl function, which means it's user-defined, which means there's no way we can tell you what the effect of that function call will be based on the information you've provided here.
glob is a built-in Tcl function which takes a file pattern as an argument and then lists files that match that pattern. For example, if a directory contains files "foo", "bar" and "baz", glob b* would return a list of two files, "bar" and "baz". Therefore the second line is looking for any files that match the pattern given by $run_dir, then using the cd command (another Tcl built-in) to change to the directory found by glob. Probably $run_dir is not actually a file pattern but an explicit file name (i.e., no globbing characters like * or ? in the string); otherwise this code may break unexpectedly when the pattern matches more than one name. On Windows, some combination of FindFirstFile/FindNextFile in C++ could be used as a substitute for glob in Tcl, and SetCurrentDirectory could substitute for cd.
pwd is another built-in Tcl function which returns the process's current working directory as an absolute path. So the last line is querying the current working directory and saving the result in a variable named pwd. Here you could use GetCurrentDirectory as a substitute for pwd.
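Before porting, it may help to see the behaviour of lines 2 and 3 written out step by step. The sketch below models them in Python rather than C++, purely to show the semantics; the function name enter_run_dir is hypothetical, and treating anything other than exactly one match as an error is an assumption about what the script expects.

```python
import glob
import os
import tempfile


def enter_run_dir(run_dir_pattern):
    """Model `cd [glob $run_dir]` followed by `set pwd [pwd]`."""
    # Like Tcl's glob: the list of existing paths that match the pattern.
    matches = glob.glob(run_dir_pattern)
    # Assumption: the script only makes sense with exactly one match.
    if len(matches) != 1:
        raise ValueError(
            "expected exactly one match, got %d" % len(matches))
    os.chdir(matches[0])   # like Tcl's cd
    return os.getcwd()     # like Tcl's pwd: an absolute path


if __name__ == "__main__":
    original = os.getcwd()
    with tempfile.TemporaryDirectory() as base:
        os.mkdir(os.path.join(base, "run_01"))
        print(enter_run_dir(os.path.join(base, "run_*")))
        os.chdir(original)  # leave the temp dir before it is deleted
```

A C++ version would follow the same three steps, using the Windows calls named above in place of glob.glob, os.chdir and os.getcwd.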

sed: modify function arguments

I'm trying to write a sed command that will allow me to modify a function's arguments. The number of arguments can be variable.
If this is my function:
int myFunction(int arg1, int arg2, Dog arg3) {
// function implementation
}
I would like to be able to perform addition operations on int arg1, int arg2, ...
Here's what I have that does not work:
sed -e '/^[a-zA-Z0-9_]\+\s\+[a-zA-Z0-9_]\+(/ , /)[\n\s]*{/ {
# arguments should be listed here
}'
Any help is appreciated. Go easy on me, it's my first attempt at sed / shell scripting.
Thanks.
Ultimately, sed is not the correct tool for this job - because of your comment about 'the number of arguments can be variable'. If you are dealing with a fixed number of arguments of a fixed type, you can kludge your way around it, but any more general processing requires a more general processor (than sed).
I suggest trying a different task as your introduction to shell scripting and sed.
If you must do it, then maybe:
sed 's/^[A-Za-z_][A-Za-z0-9_]* *[A-Za-z_][A-Za-z0-9_]* *([A-Za-z_][A-Za-z0-9_]* *\([A-Za-z_][A-Za-z0-9_]*\) *, *[A-Za-z_][A-Za-z0-9_]* *\([A-Za-z_][A-Za-z0-9_]*\)[ ,)].*{/&\
return \1 + \2;/' $file
That horror of a match contains the sequence [A-Za-z_][A-Za-z0-9_]* 6 times; it matches an identifier each time. The segment '[ ,)].*{' matches the rest of the argument list, including a third or subsequent arguments. The spaces in the pattern should, perhaps, be '[<blank><tab>]' character classes, but they're a pain to enter on StackOverflow. The regex then matches a function definition, and captures (in the '\(<identifier>\)' parts) the names of the two variables (arg1 and arg2 in your example). The s command is used because the remembered names \1 and \2 are only available in its replacement text. When the line is recognized, the replacement consists of:
& - the whole of the text that was recognized, so the original line is kept unchanged.
A backslash followed by a newline - a line break in the output, followed by one line containing a return statement that is the sum of the two remembered argument names.
Some versions of sed support more powerful regular expressions than others. As far as I know, GNU sed offers extended regular expressions (with -E or -r) but not PCRE (Perl-Compatible Regular Expressions), and it would take something like PCRE to significantly reduce the regex.
Note that this script leaves the comment line '// function implementation' untouched. It's your call what you do with that.
Finally, remember that if you write more than one function to add two integers together, you are wasting code. Therefore, this is not a plausible transformation. Each function should do something different, somehow. Granted, if the types are different each time, then maybe it has its uses, but even so, it would be easier to write a generator than to parse the skeleton and fill in the bits. And that might be a good scripting exercise.
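To illustrate what a more general processor than sed might look like, here is a minimal sketch in Python that copes with a variable number of arguments. It assumes the whole function header is on one line and that every parameter has the simple form "type name" (no pointers, qualifiers or default values); the names int_argument_names and sum_statement are made up for this example.

```python
import re

# Return type, function name, parenthesised parameter list, opening brace.
SIGNATURE = re.compile(r"^\s*(\w+)\s+(\w+)\s*\(([^)]*)\)\s*\{")


def int_argument_names(signature_line):
    """Return the names of all int parameters in a one-line function header."""
    match = SIGNATURE.match(signature_line)
    if match is None:
        return []
    names = []
    for parameter in match.group(3).split(","):
        parts = parameter.split()
        # Keep only parameters of the exact form "int name".
        if len(parts) == 2 and parts[0] == "int":
            names.append(parts[1])
    return names


def sum_statement(signature_line):
    """Build a return statement adding up every int parameter, or None."""
    names = int_argument_names(signature_line)
    if not names:
        return None
    return "    return " + " + ".join(names) + ";"


if __name__ == "__main__":
    header = "int myFunction(int arg1, int arg2, Dog arg3) {"
    print(header)
    print(sum_statement(header))
    print("}")
```

Because the parameters are split and inspected one at a time, the number of int arguments no longer has to be fixed in the pattern, which is the part that sed makes painful.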