Fuzzing command line arguments [argv] - reverse-engineering

I have a binary I've been trying to fuzz with AFL. The problem is that AFL only fuzzes STDIN and file inputs, and this binary takes its input through command-line arguments: pass_read [input1] [input2]. Are there any methods or fuzzers that allow fuzzing in this manner?
I don't have the source code, so writing a harness is not really applicable.

Michal Zalewski, the creator of AFL, states in this post:
AFL doesn't support argv fuzzing, because TBH, it's just not horribly useful in
practice. There is an example in experimental/argv_fuzzing/ showing how to do it
in a general case if you really want to.
Link to the mentioned example on GitHub: https://github.com/google/AFL/tree/master/experimental/argv_fuzzing
There are some instructions in the file argv-fuzz-inl.h (haven't tried myself).

Bash only Solution
As an example, let's generate 10 random strings and store them in a file:
cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 10 | head -n 10 > string-file.txt
Next, let's read two lines at a time from string-file.txt and pass them to our application:
exec {handle}< string-file.txt
while read -r string1 <&"$handle"; do
  read -r string2 <&"$handle"
  pass_read "$string1" "$string2" >> crash_file.txt
done
exec {handle}<&-
The output of each run, including anything printed before a crash, is then stored in crash_file.txt for further analysis.
This may not be the most elegant solution, but perhaps it gives you an idea of other possibilities if no tool fulfills your requirements.
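The loop above only captures standard output, so a run that crashes without printing anything leaves no trace. A minimal sketch of one way to record crashing argument pairs instead, assuming the shell convention that an exit status above 128 means the process was killed by a signal. The names is_crash_status, check_pairs and TARGET are hypothetical and not part of the original answer:

```shell
#!/bin/sh
# Hypothetical helper: a status above 128 means the process died from a signal
# (for example 139 = 128 + SIGSEGV), which is what a crash usually looks like.
is_crash_status() {
  [ "$1" -gt 128 ]
}

# TARGET is an assumption: point it at the real binary, e.g. TARGET=./pass_read.
# It defaults to `true` so the sketch runs harmlessly on its own.
TARGET=${TARGET:-true}

# Read stdin two lines at a time and print every argument pair that crashed.
check_pairs() {
  while read -r string1 && read -r string2; do
    "$TARGET" "$string1" "$string2" >/dev/null 2>&1
    status=$?
    if is_crash_status "$status"; then
      printf '%s %s (status %s)\n' "$string1" "$string2" "$status"
    fi
  done
}
```

It could be used as check_pairs < string-file.txt > crash_file.txt, so that crash_file.txt lists only the inputs worth a closer look.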

I looked at the AFLplusplus repo on GitHub. Inside AFLplusplus/utils/argv_fuzzing/, there is a Makefile. If you run it, you will get a .so file (a shared library) that you can use to do argv fuzzing, even if you only have the binary. Obviously, you must use AFL_PRELOAD. You can read more in the README.

Related

Extracting CREATE TABLE definitions from MySQL dump?

I have a MySQL dump file over 1 terabyte big. I need to extract the CREATE TABLE statements from it so I can provide the table definitions.
I purchased Hex Editor Neo but I'm kind of disappointed I did. I created a regex CREATE\s+TABLE(.|\s)*?(?=ENGINE=InnoDB) to extract the CREATE TABLE clause, and that seems to work well when tested in Notepad++.
However, the ETA of extracting all instances is over 3 hours, and I cannot even be sure that it is doing it correctly. I don't even know if those lines can be exported when done.
Is there a quick way I can do this on my Ubuntu box using grep or something?
UPDATE
Ran this overnight and output file came blank. I created a smaller subset of data and the procedure is still not working. It works in regex testers however, but grep is not liking it and yielding an empty output. Here is the command I'm running. I'd provide the sample but I don't want to breach confidentiality for my client. It's just a standard MySQL dump.
grep -oP "CREATE\s+TABLE(.|\s)+?(?=ENGINE=InnoDB)" test.txt > plates_schema.txt
UPDATE
It seems to not match on new lines right after the CREATE\s+TABLE part.
You can use Perl for this task... this should be really fast.
Perl's .. (range) operator is stateful - it remembers state between evaluations.
This means that if each table definition starts with CREATE TABLE and ends with something like ENGINE=InnoDB DEFAULT CHARSET=utf8; then the command below will do what you want.
perl -ne 'print if /CREATE TABLE/../ENGINE=InnoDB/' INPUT_FILE.sql > OUTPUT_FILE.sql
EDIT:
Since you are working with a really large file and would probably like to know the progress, pv can give you this also:
pv INPUT_FILE.sql | perl -ne 'print if /CREATE TABLE/../ENGINE=InnoDB/' > OUTPUT_FILE.sql
This will show you a progress bar, the speed and an ETA.
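The same stateful range idea can also be expressed with sed, which is a different tool from the Perl one-liner above and not what the answer used. A minimal sketch, assuming each definition spans several lines (sed looks for the end pattern only on lines after the one that opened the range). The function name extract_create_tables is hypothetical:

```shell
#!/bin/sh
# Print every line from one matching /CREATE TABLE/ through the next line
# matching /ENGINE=InnoDB/, and nothing else (-n suppresses default output).
extract_create_tables() {
  sed -n '/CREATE TABLE/,/ENGINE=InnoDB/p'
}
```

It reads standard input, so it can be used as extract_create_tables < INPUT_FILE.sql > OUTPUT_FILE.sql or placed after pv in a pipeline.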
You can use the following:
grep -ioP "^CREATE\s+TABLE[\s\S]*?(?=ENGINE=InnoDB)" file.txt > output.txt
If you can run mysqldump again, simply add --no-data.
Got it! grep does not support matching across multiple lines. I found this question helpful and I ended up using pcregrep instead.
pcregrep -M "CREATE\s+TABLE(.|\n|\s)+?(?=ENGINE=InnoDB)" test.txt > plates.schema.txt

What does "the composition of UNIX byte streams" mean?

On the opening page of the book "Lisp In Small Pieces", there is a paragraph that goes like this:
Based on the idea of "function", an idea that has matured over
several centuries of mathematical research, applicative languages are
omnipresent in computing; they appear in various forms, such as the
composition of Un*x byte streams, the extension language for the Emacs
editor, as well as other scripting languages.
Can anyone elaborate a bit on "the composition of Unix byte streams"? What does it mean, and how is it related to applicative/functional programming?
Thanks,
/bruin
My guess is that this is a reference to something like a pipe under linux.
cal | wc
The | symbol creates a pipe between two applications. A pipe is a feature provided by the kernel, and it connects the standard output of one program to the standard input of the next.
In this example, cal is the utility that prints a calendar and wc is a utility that counts the lines, words and bytes in its input. Here that input is the output of cal, piped into wc. This makes things easier because it is more functional: you only care about what each application does, not about, for example, the name of an argument or where to allocate a temporary file to hold the data in between.
Without pipes you would have to do something like
cal > temp.txt
wc temp.txt
rm temp.txt
to obtain pretty much the same information. This second solution can also cause problems. What if temp.txt already exists? By what rationale should your script pick a name for the temporary file? What if another process modifies the file between the calls to cal and wc?
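To make the link to function composition more concrete, here is a minimal sketch in which each stage is a small "function" from a byte stream to a byte stream, and a pipeline is their composition. The stage names (to_words, lowercase, count_unique, word_frequencies) are made up for illustration:

```shell
#!/bin/sh
# Each stage reads stdin and writes stdout, so stages compose with |.
to_words()     { tr -s '[:space:]' '\n'; }        # one word per line
lowercase()    { tr '[:upper:]' '[:lower:]'; }    # normalise case
count_unique() { sort | uniq -c | sort -rn; }     # frequency table, highest first

# The composition count_unique . lowercase . to_words, written left to right.
word_frequencies() { to_words | lowercase | count_unique; }

printf 'The cat and the hat\n' | word_frequencies
```

No stage knows about the others or about any intermediate file; word_frequencies is defined purely by chaining them, much as one would compose functions.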

Extracting URLs from large text/HTML files

I have a lot of text that I need to process for valid URLs.
The input is vaguely HTMLish, in that it's mostly HTML. However, it's not really valid HTML.
I've been trying to do it with regex, and having issues.
Before you say (or possibly scream - I've read the other HTML + regex questions) "use a parser", there is one thing you need to consider:
The files I am working with are about 5 GB in size
I don't know of any parser that can handle that without failing or taking days. Furthermore, because the content is largely HTML but not necessarily valid HTML, it would require a very tolerant parser. Lastly, not all links are necessarily in <a> tags (some may be just plaintext).
Given that I don't really care about document structure, are there any better alternatives WRT extracting links?
Right now I'm using the regex:
\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))) (in grep -E)
but even with that, I gave up after letting it run for about 3 hours.
Are there significant differences in Regex engine performance? I'm using MacOS's command-line grep. If there are other compatible implementations with better performance, that might be an option.
I don't care too much about language/platform, though MacOS/command line would be nice.
I wound up stringing a couple of grep commands together:
pv -cN source allContent | grep -oP "(?:\"([^\"' ]*?)\")|(?:'([^\"' ]*?)')|(?:([^\"' ]*?) )" | grep -E "(http)|(www)|(\.com)|(\.net)|(\.to)|(\.cc)|(\.info)|(\.org)" | pv -cN out > extrLinks1
I used pv to give me a progress indicator.
grep -oP "(?:\"([^\"' ]*?)\")|(?:'([^\"' ]*?)')|(?:([^\"' ]*?) )"
Pulls out anything that looks like a word or quoted text, and has no spaces.
grep -E "(http)|(www)|(\.com)|(\.net)|(\.to)|(\.cc)|(\.info)|(\.org)"
Filters the output for anything that looks like it could be a URL.
Finally,
pv -cN out > extrLinks1
Outputs it to a file, and gives a nice activity meter.
I'll probably push the generated file through sort -u to remove duplicate entries, but I didn't want to string that on the end because it would add another layer of complexity, and I'm pretty sure that sort will try to buffer the whole file, which could cause a crash.
Anyways, as it's running right now, it looks like it's going to take about 40 minutes. I didn't know about pv before. It's a really cool utility!
I think you are on the right track, and grep should be able to handle a 5 GB file. Try simplifying your regex: avoid the | operator and so many parentheses. Also, use the head command to grab the first 100 KB before running against the whole file, and chain the greps using pipes to achieve more specificity. For example,
head -c 100000 myFile | grep -E "(src|href)[[:space:]]*=[[:space:]]*[\"'][[:alnum:]_:/.]+[\"']"
That should be super fast, no?
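As a rough illustration of the "simpler regex, then pipe" advice, here is a minimal sketch that pulls out only http:// and https:// URLs wherever they occur, whether inside attributes or in plain text. It deliberately ignores bare www. links, which the question also wanted, and the function name extract_urls is hypothetical:

```shell
#!/bin/sh
# -o prints only the matched text, one match per line.
# A URL is taken to run until whitespace, a quote or an angle bracket.
# LC_ALL=C keeps the sort order byte-wise and predictable.
extract_urls() {
  grep -oE "https?://[^[:space:]\"'<>]+" | LC_ALL=C sort -u
}
```

For example, head -c 100000 myFile | extract_urls would try it on the first 100 KB before committing to the whole file.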

DIFF utility works for 2 files. How to compare more than 2 files at a time?

So the utility Diff works just like I want for 2 files, but I have a project that requires comparisons with more than 2 files at a time, maybe up to 10 at a time. This requires having all those files side by side to each other as well. My research has not really turned up anything, vimdiff seems to be the best so far with the ability to compare 4 at a time.
My question: Is there any utility to compare more than 2 files at a time, or a way to hack diff/vimdiff so it can do multiple comparisons? The files I will be comparing are relatively short so it should not be too slow.
Displaying 10 files side-by-side and highlighting differences can be easily done with Diffuse. Simply specify all files on the command line like this:
diffuse 1.txt 2.txt 3.txt 4.txt 5.txt 6.txt 7.txt 8.txt 9.txt 10.txt
Vim can already do this:
vim -d file1 file2 file3
But you're normally limited to 4 files. You can change that by modifying a single line in Vim's source, however. The constant DB_COUNT defines the maximum number of diffed files, and it's defined towards the top of diff.c in versions 6.x and earlier, or about two thirds of the way down structs.h in versions 7.0 and up.
diff has the built-in options --from-file and --to-file, which compare one operand to all the others.
--from-file=FILE1
Compare FILE1 to all operands. FILE1 can be a directory.
--to-file=FILE2
Compare all operands to FILE2. FILE2 can be a directory.
Note: argument name --to-file is optional.
e.g.
# this will compare foo with bar, then foo with baz .html files
$ diff --from-file foo.html bar.html baz.html
# this will compare src/base-main.js with all .js files in the git repo
# that have 'main' in their filename or path
$ git ls-files ':/*main*.js' | xargs diff -u --from-file src/base-main.js
Check out "Beyond Compare": http://www.scootersoftware.com/
It lets you compare entire directories of files, and it looks like it runs on Linux too.
If you're running multiple diffs against one base file, you could try writing a script with a for loop that goes through each directory and runs the diff. Although the results wouldn't be side by side, you could at least compare them quickly. Hope that helps.
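A minimal sketch of such a loop, assuming the files to compare are passed as arguments with the base file first. The function name compare_to_base is hypothetical:

```shell
#!/bin/sh
# compare_to_base BASE FILE...
# Prints a unified diff of BASE against every other file, one after another.
compare_to_base() {
  base=$1
  shift
  for other in "$@"; do
    printf '=== %s vs %s ===\n' "$base" "$other"
    # diff exits with status 1 when the files differ; that is expected here,
    # so the status is ignored and the loop carries on.
    diff -u "$base" "$other" || true
  done
}
```

For example, compare_to_base 1.txt 2.txt 3.txt prints the diff of 1.txt against 2.txt, then against 3.txt.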
Not answering the main question, but here's something similar to what Benjamin Neil has suggested but diffing all files:
Store the filenames in an array, then loop over the combinations of size two and diff (or do whatever you want).
files=(/path/of/files/some-prefix.*) # Array of files to compare
max=${#files[@]} # Take the length of that array
for ((idxA=0; idxA<max; idxA++)); do # iterate idxA from 0 to length
  for ((idxB=idxA + 1; idxB<max; idxB++)); do # iterate idxB from idxA + 1 to length
    echo "A: ${files[$idxA]}; B: ${files[$idxB]}" # Do whatever you're here for.
  done
done
Derived from @charles-duffy's answer: https://stackoverflow.com/a/46719215/1160428
There is a simple and good way to do this: grep.
Depending on the size of the text, you can copy and paste it, or you can redirect the file into the grep command. Use grep -vir /path for an inverted search or grep -ir /path for a regular one. This is the approach I use for certification exams.

FreeBSD, MySQL, Perl, bash: intermittent blocking on named pipes?

This is weird and I'm not sure who the culprit really is.
I'm doing some scripting on FreeBSD (6.2) which makes extensive use of the following bashism:
do_something <(mysql --skip-column-names -B -e 'select ... from ... where ...;')
... where do_something is a somewhat crufty utility (in Perl) that won't read from a pipeline. If I use a regular file it works fine. My bash scripts using things like exec 4< <(...) with these sorts of queries (followed by loops of the form while read x y z <&4; do ...) never seem to have any issues.
However, Perl (5.8.x) seems to periodically block (apparently forever). I tried changing out the chomp(my $data = <MYDATA>); with a routine that used sysread and I wrote some test cases in Python for comparison. These seem to block far less often than the idiomatic Perl code, but they still do it sometimes. (The Python code using f.read() or os.read(f.fileno()...) seems to behave about equally in this issue).
I've tried reproducing the issue using ... <(cat ...) (where I'm cating the regular file) and that never seems to reproduce that stall.
I've glanced at some ktrace/kdump data ... but I'm far more familiar with Linux strace or even Solaris truss ... so I haven't figured out what's going on from there yet, either.
I suppose we can mostly rule out Perl, because I've reproduced the same issue using Python ... I don't see how the bash could be doing anything wrong here (it's just creating a named pipe in /var/tmp/sh-np-xxx and wiring the processes up to that).
What could the mysql shell/utility be doing that might cause this? I don't think I've seen it from anything else (such as cat or dd). I haven't tested this scenario under Linux ... but I've used <(...) (process substitution) for years under Linux and don't recall ever seeing this.
Is it a FreeBSD issue?
Sure I can work around the issue using temporary files ... but I'd sure rather understand why it's doing this (and avoid some of the races and clean-up messiness that temporary files entail).
Any suggestions?
The big difference between operating on the output of mysql and directly on a file is timing. When the perl process is stalled, the big question is: "why is it not making forward progress"? You can use the "l" option to ps to see the wait channel for the perl process; that way you can see if it blocked on a read, or if something else is going on. If it is really blocked on pipe input, I expect the MWCHAN entry for perl to be "piperd".
The same information would be interesting for the mysql process.
What does your Python test code look like?
Another way of writing this avoids the bashism, which would allow you to rule out bash:
mysql --skip-column-names -B -e 'select ... from ... where ...;' | do_something /dev/stdin
Other interesting questions:
Does the --unbuffered option to mysql change anything?
Does piping the mysql output through dd change anything? (e.g. perlscript <(mysql ... | dd))
Summary: Need more information.
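If the temporary-file workaround mentioned in the question ends up being needed while the cause is investigated, the naming and clean-up concerns can at least be kept in one place. A minimal sketch, assuming mktemp is available. The function name run_with_tempfile is hypothetical, and cat and printf stand in for do_something and mysql:

```shell
#!/bin/sh
# run_with_tempfile CONSUMER PRODUCER [ARGS...]
# Runs the producer, stores its output in a private temporary file, hands that
# file's path to the consumer as its only argument, then removes the file.
run_with_tempfile() {
  consumer=$1
  shift
  tmp=$(mktemp /tmp/query.XXXXXX) || return 1   # unique name, created safely
  "$@" > "$tmp"
  "$consumer" "$tmp"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Stand-in usage: cat plays the role of do_something, printf the role of mysql.
run_with_tempfile cat printf 'row1\nrow2\n'
```

Because the consumer sees a regular file, this sidesteps the named pipe entirely, at the cost of waiting for the producer to finish before the consumer starts.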