Sed: snake case functions - function

I need a sed script to automatically convert C functions to lower snake case.
What I have so far is the following which will separate camel case words with underscores, but it doesn't lower case them and it affects everything.
sed -i -e 's/\([a-z0-9]\)\([A-Z]\)/\1_\L\2/g' `find source/ -type f`
How do I make it only apply on functions? I.e. only on strings followed by the character '('.
Also, what do I need to make the strings go lower case?
For example, If I have this code:
void destroyPoolLender(PoolLender *lender)
{
while (!isListEmpty(&lender->pools)) {
MemoryPool *myPool = listPop(&this->pool);
if (pool->inUse) {
logError("%s memory pool still in use. Pool not released.", pool->lenderName);
} else {
free(pool);
}
}
listDestroy(&this->pool);
}
It should look like this once converted:
void destroy_pool_lender(PoolLender *lender)
{
while (!is_list_empty(&lender->pools)) {
MemoryPool *myPool = list_pop(&this->pool);
if (pool->inUse) {
log_error("%s memory pool still in use. Pool not released.", pool->lenderName);
} else {
free(pool);
}
}
list_destroy(&lender->pools);
}
Notice how myPool is untouched because it isn't a function name.

We can do this with sed. The trick is to match everything up to and including the ( as capture group 2, and use \l rather than \L, to downcase only the first matched character:
s/\([a-z0-9]\)\([A-Z][A-Za-z0-9]*(\)/\1_\l\2/
We can't just use the /g modifier, because subsequent replacements may overlap, so use it in a loop:
#!/bin/sed -rf
:loop
s/([a-z0-9])([A-Z][A-Za-z0-9]*\()/\1_\l\2/
tloop
(I used -r for GNU sed to reduce the number of backslashes I needed).
A further simplification is to match a non-word-boundary; this removes the need for two capture groups:
#!/bin/sed -rf
:loop
s/\B[A-Z]\w*\(/_\l&/
tloop
Demo:
$ sed -r ':loop;s/\B[A-Z]\w*\(/_\l&/;tloop' \
<<<'SomeType *myFoo = callMyFunction(myBar, someOtherFunction());'
SomeType *myFoo = call_my_function(myBar, some_other_function());
Note that this only modifies function calls and definitions - it can be hard to identify which names are functions, if you're storing or passing function pointers. You might choose to fix those up manually (reacting to compilation errors) if you only have 70k lines to deal with. If you're working with 1M+, you might want a proper refactoring tool.

Solution for bash. It uses information from object files by the nm command. See man nm.
To creating object files from sources you are needing run gcc with -c option for the each source file (may be you have them already, created by the make command. Then, you can skip this step):
gcc -c one.c -o one.o
gcc -c two.c -o two.o
Usage: ./convert.sh one.o two.o
#!/bin/bash
# store original function names to the variable.
orig_func_names=$(
# get list symbols from all object files
nm -f sysv "$#" |
# picks the functions and removes all information except names.
sed -n '/FUNC/s/\s.*//p' |
# selects only functions, which contain the uppercase letter in the name.
sed -n '/[A-Z]/p'
);
# convert camel case names to snake case names and store new names to the variable.
new_func_names=$(sed 's/[A-Z]/_\l&/g' <<< "$orig_func_names")
# create file, containing substitute commands for 'sed'.
# Example of commands from this file:
# s/\boneTwo\b/one_two/g
# s/\boneTwoThree\b/one_two_three/g
# etc. One line to the each function name.
paste -d'/' <(printf 's/\\b%s\\b\n' ${orig_func_names}) <(printf '%s/g\n' ${new_func_names}) > command_file.txt
# do converting
# change object file extenstions '.o' to C source - '.c' file extensions.
# were this filenames: one.o two.o three.o
# now they are: one.c two.c three.c
# this 'sed' command creates backup for the each file and change the source files.
sed -i_backup -f command_file.txt "${#/.o/.c}"
Should note, that the time of execution grows exponentially in this solution.
For example, if we have 70000 lines and 1000 functions, then it needed do 70 millions checks (70 000 lines * 1000 functions). It would be interesting to know, how much time it will take.
Testing
Input
file one.c
#include <stdio.h>
int one();
int oneTwo();
int oneTwoThree();
int oneTwoThreeFour();
int one() {
puts("");
return 0;
}
int oneTwo() {
printf("%s", "hello");
one();
return 0;
}
int oneTwoThree() {
oneTwo();
return 0;
}
int oneTwoThreeFour() {
oneTwoThree();
return 0;
}
int main() {
return 0;
}
file two.c
#include <stdio.h>
int two() {
return 0;
}
int twoThree() {
two();
return 0;
}
int twoThreeFour() {
twoThree();
return 0;
}
Output
file one.c
#include <stdio.h>
int one();
int one_two();
int one_two_three();
int one_two_three_four();
int one() {
puts("");
return 0;
}
int one_two() {
printf("%s", "hello");
one();
return 0;
}
int one_two_three() {
one_two();
return 0;
}
int one_two_three_four() {
one_two_three();
return 0;
}
int main() {
return 0;
}
file two.c
#include <stdio.h>
int two() {
return 0;
}
int two_three() {
two();
return 0;
}
int two_three_four() {
two_three();
return 0;
}

Related

Support for std::tuple in swig?

When calling a swig generated function returning std::tuple, i get a swig object of that std::tuple.
Is there a way to use type-maps or something else to extract the values? I have tried changing the code to std::vector for a small portion of the code, and that works. (using %include <std_vector.i> and templates) But i don't want to make too many changes in the C++ part.
Edit: here is a minimal reproducible example:
foo.h
#pragma once
#include <tuple>
class foo
{
private:
double secret1;
double secret2;
public:
foo();
~foo();
std::tuple<double, double> return_thing(void);
};
foo.cpp
#include "foo.h"
#include <tuple>
foo::foo()
{
secret1 = 1;
secret2 = 2;
}
foo::~foo()
{
}
std::tuple<double, double> foo::return_thing(void) {
return {secret1, secret2};
}
foo.i
%module foo
%{
#include"foo.h"
%}
%include "foo.h"
When compiled on my linux using
-:$ swig -python -c++ -o foo_wrap.cpp foo.i
-:$ g++ -c foo.cpp foo_wrap.cpp '-I/usr/include/python3.8' '-fPIC' '-std=c++17' '-I/home/simon/Desktop/test_stack_overflow_tuple'
-:$ g++ -shared foo.o foo_wrap.o -o _foo.so
I can import it in python as shown:
test_module.ipynb
import foo as f
Foo = f.foo()
return_object = Foo.return_thing()
type(return_object)
print(return_object)
Outputs is
SwigPyObject
<Swig Object of type 'std::tuple< double,double > *' at 0x7fb5845d8420>
Hopefully this is more helpful, thank you for responding
To clarify i want to be able to use the values in python something like this:
main.cpp
#include "foo.h"
#include <iostream>
//------------------------------------------------------------------------------'
using namespace std;
int main()
{
foo Foo = foo();
auto [s1, s2] = Foo.return_thing();
cout << s1 << " " << s2 << endl;
}
//------------------------------------------------------------------------------
Github repo if anybody is interested
https://github.com/simon-cmyk/test_stack_overflow_tuple
Our goal is to make something like the following SWIG interface work intuitively:
%module test
%include "std_tuple.i"
%std_tuple(TupleDD, double, double);
%inline %{
std::tuple<double, double> func() {
return std::make_tuple(0.0, 1.0);
}
%}
We want to use this within Python in the following way:
import test
r=test.func()
print(r)
print(dir(r))
r[1]=1234
for x in r:
print(x)
i.e. indexing and iteration should just work.
By re-using some of the pre-processor tricks I used to wrap std::function (which were themselves originally from another answer here on SO) we can define a neat macro that "just wraps" std::tuple for us. Although this answer is Python specific it should in practice be fairly simple to adapt for most other languages too. I'll post my std_tuple.i file, first and then annotate/explain it after:
// [1]
%{
#include <tuple>
#include <utility>
%}
// [2]
#define make_getter(pos, type) const type& get##pos() const { return std::get<pos>(*$self); }
#define make_setter(pos, type) void set##pos(const type& val) { std::get<pos>(*$self) = val; }
#define make_ctorargN(pos, type) , type v##pos
#define make_ctorarg(first, ...) const first& v0 FOR_EACH(make_ctorargN, __VA_ARGS__)
// [3]
#define FE_0(...)
#define FE_1(action,a1) action(0,a1)
#define FE_2(action,a1,a2) action(0,a1) action(1,a2)
#define FE_3(action,a1,a2,a3) action(0,a1) action(1,a2) action(2,a3)
#define FE_4(action,a1,a2,a3,a4) action(0,a1) action(1,a2) action(2,a3) action(3,a4)
#define FE_5(action,a1,a2,a3,a4,a5) action(0,a1) action(1,a2) action(2,a3) action(3,a4) action(4,a5)
#define GET_MACRO(_1,_2,_3,_4,_5,NAME,...) NAME
%define FOR_EACH(action,...)
GET_MACRO(__VA_ARGS__, FE_5, FE_4, FE_3, FE_2, FE_1, FE_0)(action,__VA_ARGS__)
%enddef
// [4]
%define %std_tuple(Name, ...)
%rename(Name) std::tuple<__VA_ARGS__>;
namespace std {
struct tuple<__VA_ARGS__> {
// [5]
tuple(make_ctorarg(__VA_ARGS__));
%extend {
// [6]
FOR_EACH(make_getter, __VA_ARGS__)
FOR_EACH(make_setter, __VA_ARGS__)
size_t __len__() const { return std::tuple_size<std::decay_t<decltype(*$self)>>{}; }
%pythoncode %{
# [7]
def __getitem__(self, n):
if n >= len(self): raise IndexError()
return getattr(self, 'get%d' % n)()
def __setitem__(self, n, val):
if n >= len(self): raise IndexError()
getattr(self, 'set%d' % n)(val)
%}
}
};
}
%enddef
This is just the extra includes we need for our macro to work
These apply to each of the type arguments we supply to our %std_tuple macro invocation, we need to be careful with commas here to keep the syntax correct.
This is the mechanics of our FOR_EACH macro, which invokes each action per argument in our variadic macro argument list
Finally the definition of %std_tuple can begin. Essentially this is manually doing the work of %template for each specialisation of std::tuple we care to name inside of the std namespace.
We use our macro for each magic to declare a constructor with arguments for each element of the correct type. The actual implementation here is the default one from the C++ library which is exactly what we need/want though.
We use our FOR_EACH macro twice to make a member function get0, get1, getN of the correct type of each tuple element and the correct number of them for the template argument size. Likewise for setN. Doing it this way allows the usual SWIG typemaps for double, etc. or whatever types your tuple contains to be applied automatically and correctly for each call to std::get<N>. These are really just an implementation detail, not intended to be part of the public interface, but exposing them makes no real odds.
Finally we need an implementation of __getitem__ and a corresponding __setitem__. These simply look up and call the right getN/setN function on the class and call that instead. We take care to raise IndexError instead of the default exception if an invalid index is used as this will stop iteration correctly when we try to iterate of the tuple.
This is then sufficient that we can run our target code and get the following output:
$ swig3.0 -python -c++ -Wall test.i && g++ -shared -o _test.so test_wrap.cxx -I/usr/include/python3.7 -m32 && python3.7 run.py
<test.TupleDD; proxy of <Swig Object of type 'std::tuple< double,double > *' at 0xf766a260> >
['__class__', '__del__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', '__swig_destroy__', '__swig_getmethods__', '__swig_setmethods__', '__weakref__', 'get0', 'get1', 'set0', 'set1', 'this']
0.0
1234.0
Generally this should work as you'd hope in most input/output situations in Python.
There are a few improvements we could look to make:
Implement repr
Implement slicing so that tuple[n:m] type indexing works
Handle unpacking like Python tuples.
Maybe do some more automatic conversions for compatible types?
Avoid calling __len__ for every get/setitem call, either by caching the value in the class itself, or postponing it until the method lookup fails?

Running AFL-Fuzzer buffer overflow

I am trying to learn about AFL-fuzzer and I have some questions:
I saw a video shows that if for instance there are two inputs in the code, so in the test case each line is for each input. Is that correct? Since I want put a full message (for example HTTP request) into one variable, so how do I do it?
I don't understand when to put ##.
For example I am trying to fuzz this code:
void Check_buffer(char* data)
{
char buffer[5];
strcpy(buffer, data);
}
int main(int argc, char* argv[])
{
char tmp_data = argv[1];
Check_buffer(argv[1]);
return 0;
}
I have created the in and out folders. In the in folder I have created a txt file with this content: "AAA".
The command line I have executed is:afl-clang -fno-stack-protector -z execstack 4.c -o vul4
Then I run:afl-fuzz -m none -i in/ -o out/ ./vul4 ##
I get the following error:perform_dry_run(), afl-fuzz.c:2852
If I run the command like this:afl-fuzz -m none -i in/ -o out/ ./vul4 AA
it runs good but it does not find any new path and does not find crashes.
As well as, I am trying to understand the concepts of this. If I want to inject code in specific location, how do I do it?
You are trying to get data from command line arguments, but the AFL does not work with argv[] (unless your program reads files like ./prog file.txt ).
Instead use something like
#define INPUTSIZE 100
char input[INPUTSIZE] = {0};
read (STDIN_FILENO, input, INPUTSIZE)
If you are still interested in getting data from argv[], you can use the experimental method from the AFL repository afl argv experimental
## is used when your program accepts a file via the command line
this means that the fuzzer will take the file, mutate it, and substitute it into the program instead ##
p.s.
#include <stdio.h>
#include <unistd.h>
#define INPUTSIZE 100
void Check_buffer(char* data)
{
char buffer[5];
strcpy(buffer, data);
}
int main(int argc, char* argv[])
{
char input[INPUTSIZE] = {0};
read (STDIN_FILENO, input, INPUTSIZE);
Check_buffer(input);
return 0;
}
AFL result image

How to develop tool in C/C++ whose command interface is Tcl shell?

Suppose a tool X need to developed which are written in C/C++ and having Tcl commanline interface, what will the steps or way?
I know about Tcl C API which can be used to extend Tcl by writing C extension for it.
What you're looking to do is embedding Tcl (totally a supported use case; Tcl remembers that it is a C library) but still making something tclsh-like. The simplest way of doing this is:
Grab a copy of tclAppInit.c (e.g., this is the current one in the Tcl 8.6 source tree as I write this) and adapt it, probably by putting the code to register your extra commands, linked variables, etc. in the Tcl_AppInit() function; you can probably trim a bunch of stuff out simply enough. Then build and link directly against the Tcl library (without stubs) to get effectively your own custom tclsh with your extra functionality.
You can use Tcl's API more extensively than that if you're not interested in interactive use. The core for non-interactive use is:
// IMPORTANT: Initialises the Tcl library internals!
Tcl_FindExecutable(argv[0]);
Tcl_Interp *interp = Tcl_CreateInterp();
// Register your custom stuff here
int code = Tcl_Eval(interp, "your script");
// Or Tcl_EvalFile(interp, "yourScriptFile.tcl");
const char *result = Tcl_GetStringResult(interp);
if (code == TCL_ERROR) {
// Really good idea to print out error messages
fprintf(stderr, "ERROR: %s\n", result);
// Probably a good idea to print error traces too; easier from in Tcl
Tcl_Eval(interp, "puts stderr $errorInfo");
exit(1);
}
// Print a non-empty result
if (result[0]) {
printf("%s\n", result);
}
That's about all you need unless you're doing interactive use, and that's when Tcl_Main() becomes really useful (it handles quite a few extra fiddly details), which the sample tclAppInit.c (mentioned above) shows how to use.
Usually, SWIG (Simplified Wrapper and Interface Generator) is the way to go.
SWIG HOMEPAGE
This way, you can write code in C/C++ and define which interface you want to expose.
suppose you have some C functions you want added to Tcl:
/* File : example.c */
#include <time.h>
double My_variable = 3.0;
int fact(int n) {
if (n <= 1) return 1;
else return n*fact(n-1);
}
int my_mod(int x, int y) {
return (x%y);
}
char *get_time()
{
time_t ltime;
time(&ltime);
return ctime(&ltime);
}
Now, in order to add these files to your favorite language, you need to write an "interface file" which is the input to SWIG. An interface file for these C functions might look like this :
/* example.i */
%module example
%{
/* Put header files here or function declarations like below */
extern double My_variable;
extern int fact(int n);
extern int my_mod(int x, int y);
extern char *get_time();
%}
extern double My_variable;
extern int fact(int n);
extern int my_mod(int x, int y);
extern char *get_time();
At the UNIX prompt, type the following:
unix % swig -tcl example.i
unix % gcc -fpic -c example.c example_wrap.c \
-I/usr/local/include
unix % gcc -shared example.o example_wrap.o -o example.so
unix % tclsh
% load ./example.so example
% puts $My_variable
3.0
% fact 5
120
% my_mod 7 3
1
% get_time
Sun Feb 11 23:01:07 2018
The swig command produces a file example_wrap.c that should be compiled and linked with the rest of the program. In this case, we have built a dynamically loadable extension that can be loaded into the Tcl interpreter using the 'load' command.
Taken from http://www.swig.org/tutorial.html

systemtap userspace function tracing

I have a simple c++ program
main.cpp
#include <iostream>
using namespace std;
int addition (int a, int b)
{
int r;
r=a+b;
return r;
}
int main ()
{
int z;
z = addition (5,3);
cout << "The result is " << z;
}
I want to generate the function tracing for this
- print function names and its input output and return types
My systemtap script : para-callgraph.stp
#! /usr/bin/env stap
function trace(entry_p, extra) {
%( $# > 1 %? if (tid() in trace) %)
printf("%s%s%s %s\n",
thread_indent (entry_p),
(entry_p>0?"->":"<-"),
probefunc (),
extra)
}
probe $1.call { trace(1, $$parms) }
probe $1.return { trace(-1, $$return) }
My C++ Exec is called : a ( compiled as g++ -g main.cpp)
Command I run
stap para-callgraph.stp 'process("a").function("*")' -c "./a > /dev/null"
0 a(15119):->_GLOBAL__I__Z8additionii
27 a(15119): ->__static_initialization_and_destruction_0 __initialize_p=0x0 __priority=0x0
168 a(15119): <-__static_initialization_and_destruction_0
174 a(15119):<-_GLOBAL__I__Z8additionii
0 a(15119):->main
18 a(15119): ->addition a=0x0 b=0x400895
30 a(15119): <-addition return=0x8
106 a(15119):<-main return=0x0
Here ->addition a=0x0 b=0x400895 : its address and not actual values ie 5, 3 which I want.
How to modify my stap script?
This appears to be a systemtap bug. It should print the value of b, not its address. Please report it to the systemtap#sourceware.org mailing list (with compiler/etc. versions and other info, as outlined in man error::reporting.
As to changing the script, the $$parms part is where the local variables are being transformed into a pretty-printed string. It could be changed to something like...
trace(1, $$parms . (#defined($foobar) ? (" foobar=".$foobar$) : ""))
to append foobar=XYZ to the trace record, whereever a parameter foobar is available. To work around the systemtap bug in question, you could try
trace(1, $$parms . (#defined($b) ? (" *b=".user_int($b)) : ""))
to dereference the b variable as if it were an int *.

Emulating execvp - Is There a Better Way To Do This?

I'm currently wrapping a command line tool (espeak) with Tcl/Tk, and I have figured this out so far:
load ./extensions/system.so
package require Tk
package require Tclx
set chid 0
proc kill {id} {
exec kill -9 $id
}
proc speak {} {
global chid
set chid [fork]
if {$chid == 0} {
execvp espeak [.text get 1.0 end]
}
}
proc silent {} {
global chid
kill $chid
}
Where system.so is an extension I hacked together to be able to use execvp:
#include <tcl.h>
#include <tclExtend.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
static int
execvp_command(ClientData cdata, Tcl_Interp *interp, int argc, const char* argv[])
{
if (argc == 1)
{
interp->result = "execvp command ?args ...?";
return TCL_ERROR;
}
execvp(argv[1], argv + 1);
return TCL_OK;
}
int System_Init(Tcl_Interp* interp)
{
if (Tcl_InitStubs(interp, "8.1", 0) == NULL)
return TCL_ERROR;
Tcl_CreateCommand(interp, "execvp", execvp_command, NULL, NULL);
Tcl_PkgProvide(interp, "system", "1.0");
return TCL_OK;
}
The reason I need execvp is because a subprocess created by exec (Tcl) seems to keep going when the process dies (I can confirm this by ^C'ing out of the GUI), whereas if I use execvp, espeak dies properly.
Thus, all I really need out of this script is to be able to start a subprocess and kill it on demand.
Is there another library that can do this properly, like Expect?
Tcl uses execvp internally (really; I've just checked the source) so the difference lies elsewhere. That elsewhere will be in signals; the subprocess created by the exec command (and other things that use the same underlying engine) will have the majority of signals forced to use the default signal handlers, but since the only signal it sets to non-default is SIGPIPE, I wonder what else is going on.
That said, the definitive extension for working with this sort of thing is TclX. That gives you access to all the low-level POSIX functionality that you've been using partially. (Expect may also be able to do it.)