How to capture network packets to MySQL

I'm going to design a network analyzer for WiFi (802.11).
Currently I use tshark to capture and parse the WiFi frames and then pipe the output to a Perl script that stores the parsed fields in a MySQL database.
I just found out that I miss a lot of frames in this process. I checked, and the frames seem to be lost in the pipe (when the output is delivered to Perl to be stored in MySQL).
Here is how it goes:
(Tshark) -------frames are lost----> (Perl) --------> (MySQL)
This is how I pipe the output of tshark to the script:
sudo tshark -i mon0 -t ad -T fields -e frame.time -e frame.len -e frame.cap_len -e radiotap.length | perl tshark-sql-capture.pl
This is a simplified template of the Perl script I use (tshark-sql-capture.pl):
# preparing the MySQL connection
use DBI;
my $dns = "DBI:mysql:capture;localhost";
my $dbh = DBI->connect($dns, "user", "pass");
my $db = "captured";
while (<STDIN>) {
    chomp($data = <STDIN>);
    ($time, $frame_len, $cap_len, $radiotap_len) = split " ", $data;
    my $sth = $dbh->prepare("INSERT INTO $db VALUES (str_to_date('$time','%M %d, %Y %H:%i:%s.%f'), '$frame_len', '$cap_len', '$radiotap_len')");
    $sth->execute;
}
#Terminate MySQL
$dbh->disconnect;
Any idea that can help improve the performance is appreciated, or maybe there is an alternative mechanism that can do better.
Right now my performance is about 50%, meaning I can store in MySQL only around half of the packets I've captured.

Things written into a pipe don't get lost. What's probably really going on is that tshark tries to write to the pipe, but Perl+MySQL is too slow to process the input, so the pipe fills up, the write would block, and tshark just drops the packets.
The bottleneck could be either MySQL or Perl itself, but it is probably the DB. Check CPU usage and measure the insert rate, then pick a faster DB or write to multiple DBs. You can also try batch inserts and increasing the size of the pipe buffer.
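As a rough sketch of the buffering idea (assuming the pv utility is installed; its -B flag sets the size of an in-memory buffer and -q suppresses the progress display), you could put a large userspace buffer between tshark and the Perl script so short insert stalls no longer fill the pipe:
# buffer up to 64 MB between the capture and the slower DB writer
sudo tshark -i mon0 -t ad -T fields -e frame.time -e frame.len -e frame.cap_len -e radiotap.length | pv -q -B 64m | perl tshark-sql-capture.pl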
Update
while (<STDIN>)
this reads a line into $_, which you then ignore; the chomp($data = <STDIN>) in the loop body reads a second line from the pipe, so every other frame is thrown away, which matches the roughly 50% loss you are seeing. Read each line only once, either by using $_ inside the loop or by assigning to $data in the loop condition itself.

For pipe problems, you can improve the packet capture with Gulp: http://staff.washington.edu/corey/gulp/
From the man page:
1) reduce packet loss of a tcpdump packet capture:
(gulp -c works in any pipeline as it does no data interpretation)
tcpdump -i eth1 -w - ... | gulp -c > pcapfile
or if you have more than 2 CPUs, run tcpdump and gulp on different ones:
taskset -c 2 tcpdump -i eth1 -w - ... | gulp -c > pcapfile
(gulp uses CPUs #0,1 so use #2 for tcpdump to reduce interference)
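Applied to the pipeline from the question, that might look something like the following sketch (as noted above, gulp -c does no data interpretation, so it can simply buffer the text stream between tshark and the Perl script):
# gulp -c acts as a large buffer between the capture and the DB writer
sudo tshark -i mon0 -t ad -T fields -e frame.time -e frame.len -e frame.cap_len -e radiotap.length | gulp -c | perl tshark-sql-capture.pl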

You can use a FIFO file, then read the packets from it and insert them into MySQL using INSERT DELAYED.
sudo tshark -i mon0 -t ad -T fields -e frame.time -e frame.len -e frame.cap_len -e radiotap.length > MYFIFO
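A minimal sketch of the surrounding setup (reusing tshark-sql-capture.pl from the question as the reader; the DELAYED keyword would go inside that script's SQL, i.e. INSERT DELAYED INTO instead of INSERT INTO):
# create the FIFO and start the reader first, so the FIFO has a consumer
mkfifo MYFIFO
perl tshark-sql-capture.pl < MYFIFO &
# then run the tshark command above, which writes into MYFIFO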

Related

How can I limit strace output size?

I am running this command on Plesk Ubuntu via an SSH terminal:
strace -p 1234567 -Tf 2>&1 | grep -v select > /path/file.log
This traces a running process, filters out select commands, and writes the output to a file.
How can I limit the size of that file to 8M? The goal is to capture the last 8M of output before the process dies. The file grows quickly, so logrotate's daily cycle won't do, and it can't be a manual process because it may have to run for days. I tried piping into "tail -c 8M", but I think buffering is preventing any output from that. How can I accomplish this?

How to limit tcpdump to collect data for set time. ( Only collect for 60 sec for example)

I am trying to run tcpdump to collect all packets for a set time (e.g. 60 seconds), but I am not sure how to make it capture everything for that duration and then write it to a file.
So far I have tried:
tcpdump -s0 -i 0.0 -c 5 -vv -n host XXX.XXX.XXX.XXX -w /var/log/XXX.pcap -v
but I don't think that is the best option.
Any advice much appreciated!
You can combine the options -W (used in conjunction with the -G option, this will limit the number of rotated dump files that get created, exiting with status 0 when reaching the limit) and -G rotate_seconds to that effect, i.e. change -c 5 to -W 1 -G 60.
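Applied to the command from the question, that might look like this sketch (the host and output path placeholders are unchanged; with -W 1 and -G 60, tcpdump writes a single file and exits after 60 seconds):
# capture for 60 seconds, write one file, then exit with status 0
tcpdump -s0 -i 0.0 -vv -n host XXX.XXX.XXX.XXX -W 1 -G 60 -w /var/log/XXX.pcap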

Mysql cli not returning data in bash script run by crontab

I have a bash script that is executed via a cron job
#!/bin/bash
# abort on errors
set -e
ABS_DIR=/path/
# extract the creds for the mysql db
DB_USER="USERNAME"
DB_PASS="PASSWORD"
function extract_data() {
    file=$2
    sql_query=`cat $ABS_DIR/$1`
    data=`mysql -u $DB_USER --password="$DB_PASS" -D "database" -e "$sql_query" | tail -n +2`
    echo -e "Data:"
    echo -e "$data"
}
extract_data "sql_query.sql" "log.csv"
When running it manually with bash extract.sh the mysql cmd fetches the data correctly and I see the echo -e "$data" on the console.
When running the script via a cron job
* 12 * * * /.../extract.sh > /.../cron_log.txt
then I get an empty line saved to the cron_log.txt file!?
This is a common problem: a script behaves differently when run from the user shell and when run from crontab. The cause is typically a difference between the environment variables in the user shell and those in the crontab shell; by default, they are not the same.
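One quick way to see the difference is to dump the environment from both contexts and compare them (a sketch; the /tmp paths are just examples):
# temporary crontab entry to capture the environment cron sees
* * * * * env > /tmp/cron_env.txt
# and from your interactive shell
env > /tmp/shell_env.txt
diff /tmp/shell_env.txt /tmp/cron_env.txt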
To begin debugging this issue, you could direct stderr as well as stdout from crontab, hopefully to capture an error message:
extract.sh &> /.../cron_log.txt
(notice the &)
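Since cron usually runs jobs with /bin/sh, where the bash-only &> form may not behave as intended, the portable spelling in the crontab entry itself would be (a sketch, keeping the placeholder paths from the question):
* 12 * * * /.../extract.sh > /.../cron_log.txt 2>&1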
Also: you have three dots (/.../) in the paths; if that is literally what is in your crontab, it is likely a typo and could also be the cause.

Stress test API using multiple JSON files

I am trying to fire 40000 requests towards an API using 40000 different JSON files.
Normally I could do something like this:
for file in /dir/*.json
do
#ab -p $file -T application/json -c1 -n1 <url>
curl -X POST -d @"$file" <url> -H "Content-Type: application/json"
done;
My problem is that I want to run simultaneous requests, e.g. 100, and I want the total time it took to send all the requests recorded. I can't use -c 100 -n 40000 in ab since it's the same URL with different files.
The files/requests all look something like
{"source":"000000000000","type":"A"}
{"source":"000000000001","type":"A"}
{"source":"000000000003","type":"A"}
I was not able to find any tool that supports this out of the box (e.g. Apache Bench, ab).
I came across this example here on SO (modified for this question).
Not sure I understand why that example would "cat /tmp" when mkfifo tmp creates a file, not a directory. Might it work?
mkfifo tmp
counter=0
for file in /dir/*.json
do
    if [ $counter -lt 100 ]; then
        curl -X POST -H "Content-Type: application/json" -d @"$file" <url> &
        let $[counter++];
    else
        read x < tmp
        curl -X POST -H "Content-Type: application/json" -d @"$file" <url> &
    fi
done;
cat /tmp > /dev/null
rm tmp
How should I go about achieving this in perl, ksh, bash or similar or does anyone know any tools that supports this out of the box?
Thanks!
If your requirement is just to time the total time taken for sending these 40000 curl requests with a different JSON file each time, you can make good use of GNU parallel. The tool has great ways to achieve job concurrency by making use of multiple cores on your machine.
The installation procedure is quite simple. Follow How to install GNU parallel (noarc.rpm) on CentOS 7 for a quick and easy list of steps. The tool has a lot of more complicated flags to solve multiple use-cases. For your requirement though, just go to the folder containing these JSON files and do
parallel --dry-run -j10 curl -X POST -H "Content-Type: application/json" -d @{} <url> ::: *.json
The above command does a dry run of your command, showing how parallel sets up the flags, processes its arguments, and would start running your command. Here {} represents one of your JSON files. We've specified 10 jobs at a time; increase the number depending on how fast it runs on your machine and how many cores it has. There are also flags to limit the overall CPU that parallel is allowed to use, so that it doesn't totally choke your system.
Remove --dry-run to run your actual command. To clock the time taken for the whole process to complete, use the time command: just prefix it before the actual command, as in time parallel ...
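Putting that together, the timed run might look like this sketch (<url> is still the placeholder from the question, and -j100 matches the 100 simultaneous requests mentioned above):
# fire the requests 100 at a time and report the total wall-clock time
time parallel -j100 curl -X POST -H "Content-Type: application/json" -d @{} <url> ::: *.json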

Error Handling on mongoimport

I have a directory of roughly 45,000 json files. The total size is around 12.8 GB currently. This is website data from kissmetrics and its structure is detailed here.
The data:
Each file is multiple json documents separated by a newline
It will be updated every 12 hours with new additional files
I want to import this data to mongoDB using mongoimport. I've tried this shell script to make the process easier:
for filename in revisions/*;
do
echo $filename
mongoimport --host <HOSTNAME>:<PORT> --db <DBNAME> --collection <COLLECTIONNAME> \
--ssl --sslCAFile ~/mongodb.pem --username <USERNAME> --password <PASSWORD> \
--authenticationDatabase admin $filename
done
This will sometimes produce errors:
2016-06-18T00:31:10.781+0000 using 1 decoding workers
2016-06-18T00:31:10.781+0000 using 1 insert workers
2016-06-18T00:31:10.781+0000 filesize: 113 bytes
2016-06-18T00:31:10.781+0000 using fields:
2016-06-18T00:31:10.822+0000 connected to: <HOSTNAME>:<PORT>
2016-06-18T00:31:10.822+0000 ns: <DBNAME>.<COLLECTION>
2016-06-18T00:31:10.822+0000 connected to node type: standalone
2016-06-18T00:31:10.822+0000 standalone server: setting write concern w to 1
2016-06-18T00:31:10.822+0000 using write concern: w='1', j=false, fsync=false, wtimeout=0
2016-06-18T00:31:10.822+0000 standalone server: setting write concern w to 1
2016-06-18T00:31:10.822+0000 using write concern: w='1', j=false, fsync=false, wtimeout=0
2016-06-18T00:31:10.824+0000 Failed: error processing document #1: invalid character 'l' looking for beginning of value
2016-06-18T00:31:10.824+0000 imported 0 documents
I will potentially run into this error, and from my inspection it is not due to malformed data.
The error may happen hours into the import.
Can I parse the error from mongoimport and retry the same document? I don't know if the error will always have this same form, so I'm not sure if I can handle it in bash. Can I keep track of progress in bash and restart the import if it terminates early? Any suggestions on importing data of this size or on handling the error in the shell?
Typically a given command will return an error code when it fails (and the codes are hopefully documented on the man page for the command).
So if you want to do something hacky and just retry once:
cmd="mongoimport --foo --bar..."
$cmd
ret=$?
if [ $ret -ne 0 ]; then
echo "retrying..."
$cmd
if [ $? -ne 0 ]; then
"failed again. Sadness."
exit
fi
fi
Or if you really need what mongoimport outputs, capture it like this
results=`mongoimport --foo --bar...`
Now the variable $results will contain what was written to stdout. You might have to redirect stderr as well.
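Since mongoimport's log output typically goes to stderr, a sketch of capturing both streams together (the --foo --bar... flags are the same placeholders as above):
# capture stdout and stderr so the error text can be inspected in the script
results=$(mongoimport --foo --bar... 2>&1)
echo "$results"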