Getting full binary control flow graph from Radare2 - reverse-engineering

I want to get a full control flow graph of a binary (malware) using radare2.
I followed this post from another question on SO. Is there a command other than ag that gives the control flow graph of the whole binary, and not only the graph of one function?

First of all, make sure to install radare2 from the git repository and use the newest version:
$ git clone https://github.com/radare/radare2.git
$ cd radare2
$ ./sys/install.sh
After you've downloaded and installed radare2, open your binary and perform analysis on it using the aaa command:
$ r2 /bin/ls
-- We fix bugs while you sleep.
[0x004049a0]> aaa
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze function calls (aac)
[x] Analyze len bytes of instructions for references (aar)
[x] Check for objc references
[x] Check for vtables
[x] Type matching analysis for all functions (aaft)
[x] Propagate noreturn information
[x] Use -AA or aaaa to perform additional experimental analysis.
Adding ? after almost any command in radare2 will output its subcommands. For example, the ag command and its subcommands produce visual graphs, so adding ? to ag lists them:
[0x00000000]> ag?
Usage: ag<graphtype><format> [addr]
Graph commands:
| aga[format] Data references graph
| agA[format] Global data references graph
| agc[format] Function callgraph
| agC[format] Global callgraph
| agd[format] [fcn addr] Diff graph
... <truncated> ...
Output formats:
| <blank> Ascii art
| * r2 commands
| d Graphviz dot
| g Graph Modelling Language (gml)
| j json ('J' for formatted disassembly)
| k SDB key-value
| t Tiny ascii art
| v Interactive ascii art
| w [path] Write to path or display graph image (see graph.gv.format and graph.web)
You're looking for the agCd command, which outputs a full call graph of the program in dot format.
[0x004049a0]> agCd > output.dot
The dot utility is part of the Graphviz software, which can be installed using sudo apt-get install graphviz.
You can view the output in any offline dot viewer, paste it into an online Graphviz viewer, or convert the dot file to PNG:
$ r2 /bin/ls
[0x004049a0]> aa
[x] Analyze all flags starting with sym. and entry0 (aa)
[0x004049a0]> agCd > output.dot
[0x004049a0]> !!dot -Tpng -o callgraph.png output.dot
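The interactive steps above can also be scripted. The following is a minimal sketch, assuming r2 is on the PATH and that its -q (quit after running commands) and -c (run commands) flags are used to run the analysis and agCd non-interactively; the function names build_r2_command and export_callgraph are hypothetical helpers, not part of radare2.

```python
import subprocess
from pathlib import Path
from typing import List


def build_r2_command(binary: str, analysis: str = "aaa") -> List[str]:
    """Build an r2 invocation that analyses `binary` and prints the global call graph as dot."""
    # -q: quit once the commands have run; -c: commands to execute (';' separates them)
    return ["r2", "-q", "-c", f"{analysis}; agCd", binary]


def export_callgraph(binary: str, dot_path: str = "output.dot") -> Path:
    """Run radare2 and save whatever agCd printed on stdout (requires r2 to be installed)."""
    result = subprocess.run(
        build_r2_command(binary), capture_output=True, text=True, check=True
    )
    out = Path(dot_path)
    out.write_text(result.stdout)
    return out
```

Calling export_callgraph("/bin/ls") should leave an output.dot that can then be rendered with dot -Tpng -o callgraph.png output.dot, as in the session above.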


Match text in the first X lines of a CSV file, keep the last X matches, and get a value in Lua

I'm translating a bash script to a Lua program. In the bash script there is a line:
mapfile -t vol < <( cat csv_file | head -$id | grep locateme | tail -3 | cut -f6 -d\,)
the result of that is:
vol[0]=22
vol[1]=33
vol[2]=44
the csv_file is like:
16,a,b,c,d,9,16,0,3,65,0,0,locateme
16,a,b,c,d,11,16,0,3,65,0,0,notme
16,a,b,c,d,22,16,0,3,65,0,0,locateme
16,a,b,c,d,33,16,0,3,65,0,0,locateme
16,a,b,c,d,32,16,0,3,65,0,0,notme
16,a,b,c,d,44,16,0,3,65,0,0,locateme
I need a table with the same results as in bash:
vol[1]=22
vol[2]=33
vol[3]=44
Please, I have no idea how to start with this.
Instead of a Bash array you're going to use a Lua table.
local vol = {}
You'll need a generic for loop and the file:lines(...) iterator. It is a good idea to read through the whole io library.
This will allow you to get each line of the csv file as a string for further processing.
Now you'll need Lua's string library. There are multiple ways to do this. One option is to use another generic for loop with string.gmatch and a suitable string pattern that captures the value you're interested in.

Proj: Solving "Grid *.gsb needed but not found on the system."

I would like to transform some coordinates from EPSG 21781 to EPSG 2056. If I run projinfo on these two projections, I get the information that a certain grid is needed but not found on the system (see below).
$ projinfo -o PROJ -s EPSG:21781 -t EPSG:2056
Candidate operations found: 1
-------------------------------------
Operation No. 1:
unknown id, Inverse of Swiss Oblique Mercator 1903M + CH1903 to CH1903+ (1) + Swiss Oblique Mercator 1995, 0.2 m, Europe - Liechtenstein and Switzerland, at least one grid missing
PROJ string:
+proj=pipeline +step +inv +proj=somerc +lat_0=46.9524055555556 +lon_0=7.43958333333333 +k_0=1 +x_0=600000 +y_0=200000 +ellps=bessel +step +proj=hgridshift +grids=CHENyx06a.gsb +step +proj=somerc +lat_0=46.9524055555556 +lon_0=7.43958333333333 +k_0=1 +x_0=2600000 +y_0=1200000 +ellps=bessel
Grid CHENyx06a.gsb needed but not found on the system. Can be obtained from the proj-datumgrid-europe package at https://download.osgeo.org/proj/proj-datumgrid-europe-1.5.zip
After I have downloaded the specified file (proj-datumgrid-europe-1.5.zip), what do I need to do with it? This does not seem to be described in the docs.
I'm working on Ubuntu 20.04 and proj 6.3.1
$ pkg-config --modversion proj
6.3.1
For the PROJ version you are using (6.3.1), the proj-datumgrid docs say that you just need to unzip the file into the PROJ data directory, which is either /usr/local/share/proj or /usr/share/proj.
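The unzip step can be sketched in Python as follows. This is an illustration only: install_grids is a hypothetical helper, the data directory has to be whichever of the two paths above your installation actually uses, and writing there normally requires root.

```python
import zipfile
from pathlib import Path


def install_grids(zip_path, data_dir):
    """Unzip a proj-datumgrid archive into the PROJ data directory.

    Returns the names of the .gsb grid files present in data_dir afterwards.
    """
    data_dir = Path(data_dir)
    with zipfile.ZipFile(zip_path) as archive:
        # Same effect as running `unzip` inside the data directory
        archive.extractall(data_dir)
    return sorted(p.name for p in data_dir.glob("*.gsb"))
```

For example, install_grids("proj-datumgrid-europe-1.5.zip", "/usr/share/proj") should list CHENyx06a.gsb among the results if the archive contains it; re-running the projinfo command is then a quick way to check that the warning is gone.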

Octave parallel package: User-defined function call issues

I have trouble calling a user-defined function with pararrayfun (likewise for parcellfun). When I execute the following code:
pkg load parallel
function retval = mul(x,y)
retval = x*y;
endfunction
vector_x = 1:2^3;
vector_y = 1:2^3;
vector_z = pararrayfun(nproc, @(x,y) mul(x,y), vector_x, vector_y)
vector_z = pararrayfun(nproc, @(x,y) x*y, vector_x, vector_y)
I get the following output:
vector_z =
-1 -1 -1 -1 -1 -1 -1 -1
vector_z =
1 4 9 16 25 36 49 64
That is, the call to the user-defined function does not seem to work, whereas the same as an anonymous function is working.
The machine is x86_64 with Debian bullseye and 5.10.0-1-amd64 kernel. Octave's version is 6.1.1~hg.2020.12.27-1. The pkg list command gives me:
Package Name | Version | Installation directory
--------------+---------+-----------------------
dataframe | 1.2.0 | /usr/share/octave/packages/dataframe-1.2.0
parallel *| 4.0.0 | /usr/share/octave/packages/parallel-4.0.0
struct *| 1.0.16 | /usr/share/octave/packages/struct-1.0.16
Funny thing is that the same code works flawlessly on armv7l with Debian buster and the 4.14.150-odroidxu4 kernel. That is, the calls with the user-defined function and with the anonymous function both produce the expected output:
parcellfun: 8/8 jobs done
vector_z =
1 4 9 16 25 36 49 64
parcellfun: 8/8 jobs done
vector_z =
1 4 9 16 25 36 49 64
On that machine Octave's version is 4.4.1 and pkg list gives:
Package Name | Version | Installation directory
--------------+---------+-----------------------
dataframe | 1.2.0 | /usr/share/octave/packages/dataframe-1.2.0
parallel *| 3.1.3 | /usr/share/octave/packages/parallel-3.1.3
struct *| 1.0.15 | /usr/share/octave/packages/struct-1.0.15
What is wrong and how can I fix this behavior?
This is probably a bug, but do note that the new version of parallel has introduced a few limitations as per its documentation (also see the latest release news) which may relate to what's happening here.
Having said that, I want to clarify this sentence:
the call to the user-defined function does not seem to work, whereas the same as an anonymous function is working.
That's not what's happening. You're passing an anonymous function in both cases. It's just that the first calls mul inside, and the second calls mtimes.
As for your error (bug?) this may have something to do with mul being a 'command-line' function. It's not clear from the documentation if command-line functions are a limitation and this is simply an oversight in the docs, or if ill-treatment of command-line functions is a genuine bug. I think if you put it in its own file it should work fine. (and equally, if you do, it's worth passing it as a handle directly, rather than wrapping it inside another anonymous function).
Having said that, I think the -1's you see are basically "error returns" from inside pararrayfun's guts. The reason for this is the following: if instead of creating mul as a command-line function, you make it an anonymous function:
mul = @(x,y) x * y
Observe what the three calls below return:
x = pararrayfun( nproc, @(x,y) mul(x,y), vector_x, vector_y ) # now it works as expected.
x = pararrayfun( nproc, mul, vector_x, vector_y ) # same: mul is a valid handle expecting two inputs
x = pararrayfun( nproc, @mul, vector_x, vector_y ) # x=-1,-1,-1,-1,-1,-1,-1,-1
If you had tried the last command using plain arrayfun, you would have seen an error relating to the fact that you accidentally passed @mul instead of mul, when mul is already a proper handle. pararrayfun just carries on with the calculation, and presumably -1 is the return value from an internal error.
I don't know exactly why a command-line function fails, but presumably it has something to do with the fact that pararrayfun creates separate octave instances under the hood, which need access to all function definitions, and perhaps command-line functions cannot be transferred / compiled in the new instance as easily as in the parent instance, because of the way they are created / compiled in the current session.
In any case, I think you'll solve your problem if instead of a command-line function definition, you create an external function or (if dealing with simple enough functions) a handle to an anonymous function.
However, I would still submit a bug to the octave bug tracker to help the project :)

SQLite extension binaries

sqlite.org provides windows binaries for the core functions. Are there any pre-built DLLs for the various standard extensions - free text search, virtual tables and JSON in particular? I notice that the command shell as distributed does not support the table-valued JSON functions.
This seems a very obvious request, given the ready availability of binaries for SQLite in other respects, but I can't find anywhere online hosting pre-built extension libraries.
The command-line shell, as distributed, does support the table-valued JSON functions:
sqlite> select * from json_tree('["hello",["world"]]');
key value type atom id parent fullkey path
---------- ------------------- ---------- ---------- ---------- ---------- ---------- ----------
["hello",["world"]] array 0 $ $
0 hello text hello 1 0 $[0] $
1 ["world"] array 2 0 $[1] $
0 world text world 3 2 $[1][0] $[1]
Anyway, the SQLite library is meant to be embedded into your application, i.e., the sqlite3.c file (and any needed extensions not already included in the amalgamation) is to be directly compiled together with your other sources.
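To see which optional features a particular embedded build was compiled with, you can ask the library itself. The sketch below uses Python's bundled sqlite3 module purely as one example of an application that embeds SQLite, and queries PRAGMA compile_options; compiled_extensions is a hypothetical helper name. Note that the absence of an ENABLE_ entry does not always mean a feature is missing, since some releases may build features in by default.

```python
import sqlite3


def compiled_extensions(prefix="ENABLE_"):
    """List the ENABLE_* compile-time options of the SQLite library linked into Python."""
    conn = sqlite3.connect(":memory:")
    try:
        # Each row holds one option name, reported without the SQLITE_ prefix
        rows = conn.execute("PRAGMA compile_options").fetchall()
    finally:
        conn.close()
    return sorted(opt for (opt,) in rows if opt.startswith(prefix))


print(compiled_extensions())
```

The same pragma can be run in the command-line shell to check what the distributed binary includes.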

Parse ClamAV logs in Bash script using Regex to insert in MySQL

Morning/Evening all,
I'm writing a script for work that uses ClamAV to scan for malware and then places its results in MySQL, using grep with awk to pull the right parts of the resulting ClamAV log into variables. Whilst I have done the summary OK, the syntax of the detection lines makes them slightly more difficult. I'm no expert at regex by any means and this is a bit of a learning experience, so there is probably a far better way of doing it than I have!
The lines I'm trying to parse looks like these:
/net/nas/vol0/home/recep/SG4rt.exe: Worm.SomeFool.P FOUND
/net/nas/vol0/home/recep/SG4rt.exe: moved to '/srv/clamav/quarantine/SG4rt.exe'
As far as I was able to establish, I need a positive lookbehind to match what happens after and before the colon, without actually matching the colon or the space after it, and I can't see a clear way of doing it from RegExr without it thinking I'm trying to look for two colons. To make matters worse, we sometimes get these too...
WARNING: Can't open file /net/nas/vol0/home/laser/samples/sample1.avi: Permission denied
The goal is to build a MySQL query that inserts the path, the malware found and where it was moved to or, if there was an error, the path and the error encountered, with each element captured into a variable inside a while loop.
I've done the scan summary as follows:
Summary looks like:
----------- SCAN SUMMARY -----------
Known viruses: 329
Engine version: 0.97.1
Scanned directories: 17350
Scanned files: 50342
Infected files: 3
Total errors: 1
Data scanned: 15551.73 MB
Data read: 16382.67 MB (ratio 0.95:1)
Time: 3765.236 sec (62 m 45 s)
Parsing like this:
SCANNED_DIRS=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Scanned directories" | awk '{gsub("Scanned directories: ", "");print}')
SCANNED_FILES=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Scanned files" | awk '{gsub("Scanned files: ", "");print}')
INFECTED=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Infected files" | awk '{gsub("Infected files: ", "");print}')
DATA_SCANNED=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Data scanned" | awk '{gsub("Data scanned: ", "");print}')
DATA_READ=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Data read" | awk '{gsub("Data read: ", "");print}')
TIME_TAKEN=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Time" | awk '{gsub("Time: ", "");print}')
END_TIME=$(date +%s)
mysql -u scanner_parser --password=removed sc_live -e "INSERT INTO bs.live.bs_jobstat VALUES (NULL, '$CURRTIME', '$PID', '$IY', '$SCANNED_DIRS', '$SCANNED_FILES', '$INFECTED', '$DATA_SCANNED', '$DATA_READ', '$TIME_TAKEN', '$END_TIME');"
rm -f /srv/clamav/$IY-scan-$LOGTIME.log
Some of those variables are from other parts of the script and can be ignored. The reason I'm doing this is to save logfile clutter and have a simple web based overview of the status of the system.
Any clues? Am I going about all this the wrong way? Thanks for help in advance, I do appreciate it!
From what I can determine from the question, it seems like you are asking how to distinguish the lines you want from the logger lines that start with WARNING, ERROR, INFO.
You can do this without getting too fancy with lookahead or lookbehind. Just grep for lines beginning with
"/net/nas/vol0/home/recep/SG4rt.exe: "
then using awk you can extract the remainder of the line. Or you can gsub the prefix out like you are doing in the summary processing section.
As far as the question about processing the summary goes, what strikes me most is that you are processing the entire file multiple times, each time pulling out one kind of line. For tasks like this, I would use Perl, Ruby, or Python and make one pass through the file, collecting the pieces of each line after the colon, storing them in regular programming language variables (not env variables), and forming the MySQL insert string using interpolation.
Bash is great for some things but IMHO you are justified in using a more general scripting language (Perl, Python, Ruby come to mind).
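As a rough illustration of the single-pass approach in Python, here is a minimal sketch. It assumes the log contains only the line shapes shown in the question (FOUND lines, "moved to" lines, "WARNING: Can't open file" lines and the SCAN SUMMARY block); the regular expressions and the parse_clamav_log name are my own, not anything ClamAV provides.

```python
import re

# Patterns for the line shapes quoted in the question (assumed, not exhaustive)
FOUND_RE = re.compile(r"^(?P<path>/.+?): (?P<signature>\S+) FOUND$")
MOVED_RE = re.compile(r"^(?P<path>/.+?): moved to '(?P<dest>.+)'$")
ERROR_RE = re.compile(r"^WARNING: Can't open file (?P<path>/.+?): (?P<error>.+)$")
SUMMARY_RE = re.compile(r"^(?P<key>[A-Z][A-Za-z ]+): (?P<value>.+)$")


def parse_clamav_log(lines):
    """Make one pass over the log and return (detections, errors, summary)."""
    detections = {}  # path -> {"signature": ..., "moved_to": ...}
    errors = []      # (path, error message) tuples
    summary = {}     # e.g. "Infected files" -> "3"
    in_summary = False
    for raw in lines:
        line = raw.rstrip("\n")
        if "SCAN SUMMARY" in line:
            in_summary = True
            continue
        if in_summary:
            m = SUMMARY_RE.match(line)
            if m:
                summary[m["key"]] = m["value"]
            continue
        m = FOUND_RE.match(line)
        if m:
            entry = detections.setdefault(m["path"], {"signature": None, "moved_to": None})
            entry["signature"] = m["signature"]
            continue
        m = MOVED_RE.match(line)
        if m:
            entry = detections.setdefault(m["path"], {"signature": None, "moved_to": None})
            entry["moved_to"] = m["dest"]
            continue
        m = ERROR_RE.match(line)
        if m:
            errors.append((m["path"], m["error"]))
    return detections, errors, summary
```

Passing an open file object, for example parse_clamav_log(open(log_path)), reads the log once; the three returned structures hold everything needed to build the inserts.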