Simple way to verify valid BPF filter - libpcap

What is the simplest way to verify a BPF filter as a normal user?
Easiest I have found is to run tcpdump with a small pcap file as input to the -r option.
$ tcpdump -r one_packet.pcap -F invalid_bpf.conf 2> /dev/null ; echo $?
1
$ tcpdump -r one_packet.pcap -F valid_bpf.conf 2> /dev/null ; echo $?
0
This returns standard exit codes for valid and invalid BPF filters, but it requires that I have a pcap file to provide as input.
Is there a way to do this simple test without a PCAP file or special privileges?

If you have a shell whose built-in "echo" command supports escape sequences, one somewhat-perverse way of doing this would be to do
echo -en "\0324\0303\0262\0241\02\0\04\0\0\0\0\0\0\0\0\0\0377\0377\0\0\01\0\0\0"|\
./tcpdump -r - -F bpf.conf 2>/dev/null; echo $?
This worked for me on OS X 10.8, which has bash 3.2.48(1)-release (x86_64-apple-darwin12).
That "echo" command writes out a short pcap file with no packets in it, and with a link-layer header type of DLT_EN10MB. That will test whether the filter is valid for Ethernet; there are filters that are valid for some link-layer header types but not valid for others, such as "not broadcast", which is valid for Ethernet but not for PPP, so you'll need to choose some link-layer header type to use when testing.

Related

Why is JSON from aws rds run in Docker "malformed" according to other tools?

To my eyes the following JSON looks valid.
{
"DescribeDBLogFiles": [
{
"LogFileName": "error/postgresql.log.2022-09-14-00",
"LastWritten": 1663199972348,
"Size": 3032193
}
]
}
A) But jq, json_pp, and Python's json.tool module deem it invalid:
# jq 1.6
> echo "$logfiles" | jq
parse error: Invalid numeric literal at line 1, column 2
# json_pp 4.02
> echo "$logfiles" | json_pp
malformed JSON string, neither array, object, number, string or atom,
at character offset 0 (before "\x{1b}[?1h\x{1b}=\r{...") at /usr/bin/json_pp line 51
> python3 -m json.tool <<< "$logfiles"
Expecting value: line 1 column 1 (char 0)
B) On the other hand, if the above JSON is copied and pasted into an online validator, both 1 and 2 deem it valid.
As hinted by json_pp's error above, hexdump <<< "$logfiles" indeed shows additional, surrounding characters. Here's the prefix: 5b1b 313f 1b68 0d3d 1b7b ...., where 7b is {.
The JSON is output to a logfiles variable by this command:
logfiles=$(aws rds describe-db-log-files \
--db-instance-identifier somedb \
--filename-contains 2022-09-14)
# where `aws` is
alias aws='docker run --rm -it -v ~/.aws:/root/.aws amazon/aws-cli:2.7.31'
> bash --version
GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
I have perused this GitHub issue, yet I can't figure out the cause. I suspect that double quotes get mangled somehow when using echo; some reported that printf "worked" for them.
The use of the docker run --rm -it -v command to produce the JSON added some unprintable characters to the start of the JSON data. That makes the resulting $logfiles content invalid JSON.
The -t option allocates a TTY and the -i option keeps STDIN open (interactive mode). In this case the -t is what allows the shell to read login scripts (e.g. .bashrc). Something in your startup scripts is outputting ANSI escape codes. Often this is done to clear the screen, set up other things for the interactive shell, or make the output more visually appealing by colorizing portions of the data.
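A minimal sketch of the usual fix, assuming the alias from the question: drop -t so no pseudo-TTY is allocated (keep -i only if you actually need to pipe data into the container), and the escape sequences should disappear:
# same alias, without -t, so no TTY and no ANSI escape codes in the output
alias aws='docker run --rm -i -v ~/.aws:/root/.aws amazon/aws-cli:2.7.31'
logfiles=$(aws rds describe-db-log-files \
    --db-instance-identifier somedb \
    --filename-contains 2022-09-14)
echo "$logfiles" | jq .   # should now parse cleanly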

Setting the SGE cluster job name with Snakemake while using DRMAA?

Problem
I'm not sure the -N argument is being applied on my SGE cluster. Everything works except for the -N argument: Snakemake requires a valid -N call, but it doesn't set the job name properly and always reverts to the default name. This is my call, which gives the same results with or without the -N argument.
snakemake --jobs 100 --drmaa "-V -S /bin/bash -o log/mpileup/mpileupSPLIT -e log/mpileup/mpileupSPLIT -l h_vmem=10G -pe ncpus 1 -N {rule}.{wildcards}.varScan"
The only way I have found to influence the job name is to use --jobname.
snakemake --jobs 100 --drmaa "-V -S /bin/bash -o log/mpileup/mpileupSPLIT -e log/mpileup/mpileupSPLIT -l h_vmem=10G -pe ncpus 1 -N {rule}.{wildcards}.varScan" --jobname "{rule}.{wildcards}.{jobid}"
Background
I've tried a variety of things. Usually I just use a cluster configuration file, but that isn't working either, which is why in the code above I ditched the config file, to make sure it's the '-N' option that isn't being applied.
My usual call is:
snakemake --drmaa "{cluster.clusterSpec}" --jobs 10 --cluster-config input/config.json
1) If I use '-n' instead of '-N', I receive a workflow error:
drmaa.errors.DeniedByDrmException: code 17: ERROR! invalid option argument "-n"
2) If I use '-N', but give it an incorrect wildcard, say {rule.name}:
AttributeError: 'str' object has no attribute 'name'
3) I cannot use both --drmaa AND --cluster:
snakemake: error: argument --cluster/-c: not allowed with argument --drmaa
4) If I specify the {jobid} in the config.json file, then Snakemake doesn't know what to do with it.
RuleException in line 13 of /extscratch/clc/projects/tboyarski/gitRepo-LCR-BCCRC/Snakemake/modules/mpileup/mpileupSPLIT:
NameError: The name 'jobid' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print $1}}
EDIT Added #5 w/ Solution
5) I can set the job name using the config.json and just concatenate the jobid onto it in my snakemake call. That way I have a generic snakemake call (--jobname "{cluster.jobName}.{jobid}") and a highly configurable, specific job name ({rule}-{wildcards.sampleMPUS}_chr{wildcards.chrMPUS}), which results in:
mpileupSPLIT-Pfeiffer_chr19.1.e7152298
The 1 is the Snakemake jobid according to the DAG.
The 7152298 is my cluster's job number.
2nd EDIT - Just tried v3.12, same thing. The concatenation must occur in the snakemake call.
Alternative solution
I would also be okay with something like this:
snakemake --drmaa "{cluster.clusterSpec}" --jobname "{cluster.jobName}" --jobs 10 --cluster-config input/config.json
With my cluster file like this:
"mpileupSPLIT": {
"clusterSpec": "-V -S /bin/bash -o log/mpileup/mpileupSPLIT -e log/mpileup/mpileupSPLIT -l h_vmem=10G -pe ncpus 1 -n {rule}.{wildcards}.varScan",
"jobName": "{rule}-{wildcards.sampleMPUS}_chr{wildcards.chrMPUS}.{jobid}"
}
Documentation Reviewed
I've read the documentation but I was unable to figure it out.
http://snakemake.readthedocs.io/en/latest/executable.html?-highlight=job_name#cluster-execution
http://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#snakefiles-cluster-configuration
https://groups.google.com/forum/#!topic/snakemake/whwYODy_I74
System
Snakemake v3.10.2 (Will try newest conda version tomorrow)
Red Hat Enterprise Linux Server release 5.4
SGE Cluster
Solution
1) Use '--jobname' in your snakemake call instead of '-N' in your qsub parameter submission.
2) Set up your cluster config file to have a targetable parameter for the job-name suffix. In this case these are the overrides for my Snakemake rule named "mpileupSPLIT":
"mpileupSPLIT": {
"clusterSpec": "-V -S /bin/bash -o log/mpileup/mpileupSPLIT -e log/mpileup/mpileupSPLIT -l h_vmem=10G -pe ncpus 1",
"jobName": "{rule}-{wildcards.sampleMPUS}_chr{wildcards.chrMPUS}"
}
3) Utilize a generic Snakemake call which includes {jobid}. On a cluster (SGE), the 'jobid' variable contains both the Snakemake job # and the cluster job #; both are valuable, as the former corresponds to the Snakemake DAG and the latter is for cluster logging. (E.g. --jobname "{cluster.jobName}.{jobid}"; a full call is sketched below.)
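Putting those pieces together, a generic call would look roughly like this (job count, config path, and key names taken from the question; adjust to your setup):
# generic call: per-rule specifics live in input/config.json, jobid is appended here
snakemake --jobs 10 \
    --cluster-config input/config.json \
    --drmaa "{cluster.clusterSpec}" \
    --jobname "{cluster.jobName}.{jobid}"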
EDIT Added solution to resolve post.

Debug a large json file that is in one line

I have a 2 MB JSON file that is all on one line, and now I get an error using jq:
$ jq .<nodes.json
parse error: Invalid literal at line 1, column 377140
How do I debug this on the console? To look at the mentioned column, I tried this:
head -c 377139 nodes.json|tail -c 1000
But I cannot find any error with a wrong t there, so it seems it is not the correct way to reach the position in the file.
How can I debug such a one-liner?
Cut the file into more lines with
cat nodes.json|cut -f 1- -d} --output-delimiter=$'}\n'>/tmp/a.json
and analyse /tmp/a.json with jq; then you get an error with a line number:
parse error: Invalid literal at line 5995, column 47
Use less -N /tmp/a.json to find that line.
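If you only want to inspect the text around the reported column without splitting the file, cut -c can slice a character range straight out of the one-line file. A rough sketch, using the column number from the error with some context on either side:
# show ~100 characters before and after column 377140 of the single-line file
cut -c377040-377240 nodes.json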
I see you are on a shell prompt. So you could try perl, because your operating system has it pre-installed, presumably.
cat nodes.json | json_xs -f json -t json-pretty
This tells the json_xs command line program to parse the file and prettify it.
If you don't have json_xs installed, you could try json_pp (pp is for pure-perl).
If you have neither, you must install the JSON::XS perl module with this command:
sudo cpanm JSON::XS
[sudo] password for knb:
--> Working on JSON::XS
Fetching http://www.cpan.org/authors/id/M/ML/MLEHMANN/JSON-XS-3.01.tar.gz ... OK
Configuring JSON-XS-3.01 ... OK
Building and testing JSON-XS-3.01 ... OK
Successfully installed JSON-XS-3.01 (upgraded from 2.34)
1 distribution installed
This installs JSON::XS and a few helper scripts, among them json_xs and json_pp.
Then you can run this simple one-liner:
cat dat.json | json_xs -f json -t json-pretty
After misplacing a parenthesis to force a nesting error somewhere in the valid JSON file dat.json, I got this:
cat dat.json | json_xs -f json -t json-pretty
'"' expected, at character offset 1331 (before "{"A_DESC":"density i...") at /usr/local/bin/json_xs line 181, <STDIN> line 1.
Maybe this is more informative than the jq output.

Converting a bash command output into JSON and serving it over http on the fly

I want to convert the output of ifstat command into JSON and serve it over http on the fly to be used for a javascript graph app. Are there any lightweight -- sed or awk -- command-line solutions which I can use? I do not want to store JSON output on the disk and it would be good if the web-server was a small lightweight command line tool into which I can pipe JSON output.
EDIT 1:
This is the live streaming chart library which will use the data. I'm not keen on a specific web server; any webserver that does the job would be fine.
This is what I have tried.
Terminal #1
ifstat -n | awk 'NR>2{print systime(),$0; fflush()}' | tee ifstat.log
Terminal #2
while :
do
{
echo -e "HTTP/1.1 200 OK"
echo -e "Content-Type: application/json\n"
tail -n1 ifstat.log | awk '{ printf("{\"time\":%s, \"in\":%s, \"out\":%s}\n", $1, $2, $3) }'
} | nc -l 8000
done
firefox
open: http://localhost:8000
{"time":1332052321, "in":1.24, "out":2.62}
I know little about JSON, so the output may not be strictly valid; rewrite the awk command if needed.
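One way to check whether the generated line really is valid JSON is to pipe it through a validator before serving it, for example (assuming python3 is available; jq . would work the same way):
# validate the generated JSON line; json.tool exits non-zero on invalid input
tail -n1 ifstat.log | \
awk '{ printf("{\"time\":%s, \"in\":%s, \"out\":%s}\n", $1, $2, $3) }' | \
python3 -m json.tool
The sample line shown above parses fine, so the format itself looks valid.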

How To Capture network packets to MySQL

I'm going to design a network analyzer for WiFi (802.11).
Currently I use tshark to capture and parse the WiFi frames and then pipe the output to a Perl script that stores the parsed information in a MySQL database.
I just found out that I miss a lot of frames in this process. I checked, and the frames seem to be lost during the pipe (when the output is delivered to Perl to get stored in MySQL).
Here is how it goes
(Tshark) -------frames are lost----> (Perl) --------> (MySQL)
This is how I pipe the output of tshark to the script:
sudo tshark -i mon0 -t ad -T fields -e frame.time -e frame.len -e frame.cap_len -e radiotap.length | perl tshark-sql-capture.pl
This is a simple template of the Perl script I use (tshark-sql-capture.pl):
# preparing the MySQL
my $dns = "DBI:mysql:capture;localhost";
my $dbh = DBI->connect($dns,user,pass);
my $db = "captured";
while (<STDIN>) {
chomp($data = <STDIN>);
($time, $frame_len, $cap_len, $radiotap_len) = split " ", $data;
my $sth = $dbh-> prepare("INSERT INTO $db VALUES (str_to_date('$time','%M %d, %Y %H:%i:%s.%f'), '$frame_len', '$cap_len', '$radiotap_len'\n)" );
$sth->execute;
}
#Terminate MySQL
$dbh->disconnect;
Any idea which can help to make the performance better is appreciated. Or maybe there is an alternative mechanism which can do better.
Right now my performance is about 50%, meaning I can store in MySQL around half of the packets I've captured.
Things written to a pipe don't get lost. What's probably really going on is that tshark tries to write to the pipe, but perl+mysql is too slow to process the input, so the pipe fills up; the write would block, so tshark just drops the packets.
The bottleneck could be either MySQL or Perl itself, but probably the DB. Check CPU usage and measure the insert rate. Then pick a faster DB or write to multiple DBs. You can also try batch inserts and increasing the size of the pipe buffer.
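One quick way to see where the bottleneck is, assuming pv is installed, is to measure how many lines per second each stage can sustain; a rough sketch:
# raw capture rate, with no Perl/MySQL in the path (pv -l counts lines, -r shows the rate)
sudo tshark -i mon0 -t ad -T fields -e frame.time -e frame.len -e frame.cap_len -e radiotap.length | pv -lr > /dev/null
# rate actually consumed by the Perl+MySQL stage
sudo tshark -i mon0 -t ad -T fields -e frame.time -e frame.len -e frame.cap_len -e radiotap.length | pv -lr | perl tshark-sql-capture.pl
If the second rate is much lower, the database inserts are the limiting factor.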
Update
while (<STDIN>)
this reads a line into $_, which you then ignore; the chomp($data = <STDIN>) inside the loop reads the next line, so every other line from tshark is skipped. That alone accounts for roughly the 50% loss you are seeing.
For pipe problems, you can improve packet capture with GULP http://staff.washington.edu/corey/gulp/
From the Man pages:
1) reduce packet loss of a tcpdump packet capture:
(gulp -c works in any pipeline as it does no data interpretation)
tcpdump -i eth1 -w - ... | gulp -c > pcapfile
or if you have more than 2, run tcpdump and gulp on different CPUs
taskset -c 2 tcpdump -i eth1 -w - ... | gulp -c > pcapfile
(gulp uses CPUs #0,1 so use #2 for tcpdump to reduce interference)
You can use a FIFO file, then read the packets from it and insert them into MySQL using INSERT DELAYED.
sudo tshark -i mon0 -t ad -T fields -e frame.time -e frame.len -e frame.cap_len -e radiotap.length > MYFIFO
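A minimal sketch of that FIFO setup, assuming the script from the question and a FIFO named MYFIFO:
mkfifo MYFIFO                         # create the named pipe first
sudo tshark -i mon0 -t ad -T fields -e frame.time -e frame.len -e frame.cap_len -e radiotap.length > MYFIFO &
perl tshark-sql-capture.pl < MYFIFO   # the reader drains the FIFO and does the inserts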