How to grep a whole query from mysql.log? - mysql

Hello, I have a problem with filtering mysql.log (the general log). I am trying to pull out a whole query, but in the log file queries are split across newlines, so grep shows only part of the query.
Command
tail -n 2000000 mysql.log | grep '016198498'
Produces only this, without the UPDATE table SET etc., just a fragment of the query:
inm = '016198498',
Is there any way to grep the whole query together with its timestamp?

A solution has been found: you can grep lines before and after the match, e.g. 10 lines before and 10 lines after, which provides sufficient output for me in this case.
tail -n 3000000 mysql.log | grep -B 10 -A 10 '016198498'
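If the fixed -B/-A window is not enough, here is a sketch that prints each whole multi-line entry, timestamp included, whenever it contains the search string. It assumes the MySQL 5.7+ general-log format, where every entry starts with an ISO-8601 timestamp; adjust the regex for the older "yymmdd hh:mm:ss" format:
tail -n 3000000 mysql.log | awk -v pat='016198498' '
    /^[0-9]{4}-[0-9]{2}-[0-9]{2}T/ {      # a new log entry starts with a timestamp
        if (entry ~ pat) print entry      # print the previous entry if it matched
        entry = $0
        next
    }
    { entry = entry "\n" $0 }             # continuation lines belong to the current entry
    END { if (entry ~ pat) print entry }'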

Related

Separate the saved output of a command that processes a very large compressed JSON file

OK, let's start with the command line that I'm using:
curl --silent http://example.com/json.gz | pigz -dc | jq -r '[.name, .value] | @csv' > data.csv
curl downloads an 11.6 GB compressed JSON file, pigz decompresses it and writes the output to stdout, and jq reads the JSON and saves the output as a CSV file.
The problem is that the output saved as data.csv is extremely large, and afterwards I still need to analyze this data with a PHP script and insert it into MySQL in a special format (the data will then be very small).
But I have less than 60 GB of free space left on my server, so I am not even able to decompress the full data and save it to the CSV file.
So I had an idea: if I could save the output to separate files with different names (say, the current date or timestamp), then I could run the PHP script on each .csv file, save the data to the DB, and delete the file to free the space. I'm not sure this is the best way, but at least I'm trying to make it work.
So I modified my command line to:
curl --silent http://example.com/json.gz | pigz -dc | jq -r '[.name, .value] | @csv' > `date +"%S-%M-%d-%m-%Y"`_data.csv
But it saved everything in one file only. I thought it would produce multiple files, each with a different name, since the date keeps changing while the output is being written.
Also, any other working solutions are welcome, thanks!
save space with GNU split --filter
POSIX split creates output files from its input and so requires a lot of free space to store them (the size of the entire uncompressed input plus some overhead).
However, the GNU version of split has an extra --filter option which allows processing individual chunks of data in much less space as it does not need to create any temporary files:
| split -l $NUMLINES --filter='shell_command'
You can think of it like xargs -n $NUMLINES command, except that it passes data on stdin instead of as command-line arguments.
For example, to output the md5sum of each set of (up to) 7 lines of /etc/passwd and then output the number of chunks processed:
</etc/passwd split -l7 --filter='md5sum|tee /dev/tty' |\
{ echo Processed $(wc -l) chunks; }
To modify your command to work on 10000 lines at a time, you could do something like:
curl -L --silent "$URL" |\
pigz -dc |\
jq -r '[.name, .value] | @csv' |\
split -l 10000 --filter='save2db.php'
Your filter command save2db.php should read from stdin.
If you prefer to make it read from an actual file, you can do something like:
... |\
split -l 10000 --filter='cat >TMPFILE; save2db.php TMPFILE';
rm TMPFILE
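If you would rather have each chunk in its own uniquely named file (closer to your original date-stamped idea), GNU split exports the name it would have used to the filter in the FILE environment variable. A sketch along those lines, where save2db.php is your own script and the chunk_ prefix is arbitrary:
... |\
split -l 10000 -d --additional-suffix=.csv \
    --filter='cat >"$FILE"; save2db.php "$FILE"; rm -f "$FILE"' chunk_
Each chunk is written, processed and removed before the next one arrives, so only one chunk's worth of disk space is ever in use (--additional-suffix needs a reasonably recent coreutils).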
Warning: You'll need to ensure that it is safe to split your csv file on line boundaries. Some csv files contain fields with embedded literal newlines; they may become malformed if split mid-field.
Use the split command; see its man page.
Simple example (10MB to STDOUT):
# dd if=/dev/zero bs=1M count=10 | split - --bytes=1M -d -a3 out
Output files (10 files with size of 1MB read from STDIN):
# stat -c "%s %n" out00*
1048576 out000
1048576 out001
1048576 out002
1048576 out003
1048576 out004
1048576 out005
1048576 out006
1048576 out007
1048576 out008
1048576 out009
Or split the saved file with split --bytes=1M -d -a3 out out
Output:
# stat -c "%s %n" out*
10485760 out
1048576 out000
1048576 out001
1048576 out002
1048576 out003
1048576 out004
1048576 out005
1048576 out006
1048576 out007
1048576 out008
1048576 out009
I'd suggest using a program such as awk to do the partitioning, e.g. like so:
jq -rc '[.id, .value] | @csv' |
awk -v NUM=100000 '{n++; print > ("out." int((n+NUM)/NUM) ".csv")}'

How does this shell pipe magic (... | tee >(tail -c1 >$PULSE) | bzip2 | ...) work?

Here is the original source code (the relevant ~30 lines of bash are highlighted).
Here it is simplified (s3 is a binary that streams to object storage). The dots (...) stand for options not posted here.
PULSE=$(mktemp -t shield-pipe.XXXXX)
trap "rm -f ${PULSE}" QUIT TERM INT
set -o pipefail
mysqldump ... | tee >(tail -c1 >$PULSE) | bzip2 | s3 stream ...
How does that work exactly? Can you explain how these redirections and pipes work? How do I debug the error mysqldump: Got errno 32 on write? When invoked manually, mysqldump never fails with an error.
The tricky part is that:
tee writes to standard output as well as a file
>( cmd ) creates a writeable process substitution (a command that mimics the behaviour of a writeable file)
This is used to effectively pipe the output of mysqldump into two other commands: tail -c1 to print the last byte to a file and bzip2 to compress the stream.
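A harmless stand-in you can run to watch the mechanics, with printf in place of mysqldump and wc -c in place of s3 stream (everything here is illustrative only):
PULSE=$(mktemp)
printf 'dump data\n' | tee >(tail -c1 >"$PULSE") | bzip2 | wc -c
sleep 1                 # the >( ) process runs asynchronously; give it a moment to finish
od -c "$PULSE"          # shows \n, the last byte that passed through tee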
As Inian pointed out in the comments, the error 32 comes from a broken pipe. I guess that this comes from s3 stream terminating (maybe a timeout?) which in turn causes the preceding commands in the pipeline to fail.
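You can reproduce a broken pipe with a toy pipeline whose last stage exits early (a simulation only, not the actual s3 failure):
set -o pipefail
yes | tee >(tail -c1 >/dev/null) | bzip2 | head -c 100 >/dev/null
echo "stage exit codes: ${PIPESTATUS[@]}"   # typically something like 141 141 141 0; 141 = 128+SIGPIPE
Printing ${PIPESTATUS[@]} right after the real pipeline shows which stages were merely killed by SIGPIPE (141) and which stage failed on its own, which usually points at the s3 stream end.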

Grep single value after match

I have a file containing:
{"id":1,"jsonrpc":"2.0","result":{"speed":0}}
How would I be able to grep the 0 after "speed":?
I have tried 'grep -o -P "speed":{1}', but that is not what I am looking for.
You should use jq (sudo apt-get install jq on raspbian) for this task.
echo '{"id":1,"jsonrpc":"2.0","result":{"speed":0}}' | jq .result.speed
Result: 0
Since you said in your question that you have a file "containing" this line, you might want to use grep first to get only the line you're interested in; otherwise jq might throw an error.
Example file:
abc
{"id":1,"jsonrpc":"2.0","result":{"speed":0}}
123
Running grep "speed" yourfile.txt | jq .result.speed would output 0.
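Alternatively, jq can skip the non-JSON lines itself, so the extra grep is not strictly needed. A sketch against the same example file:
jq -R 'fromjson? | objects | .result.speed' yourfile.txt
Here -R reads each line as a raw string, fromjson? parses it and silently drops lines that are not valid JSON, and objects keeps only the parsed values that are objects.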

Bash for loop picking up filenames and a column from read -r and gnu plot

The top part of the following script works great: the .dat files are created via the MySQL command and work perfectly with gnuplot (via the command line). The problem is getting the bottom part (gnuplot) to work correctly. I'm pretty sure I have a couple of problems in the code: the variables and the array. I need to plot each .dat file, put the title in the graph (from the title column in customers.txt) and name the output (.png).
Any guidance would be appreciated. Thanks a lot -- RichR
#!/bin/bash
set -x
databases=""
titles=""
while read -r ipAddr dbName title; do
dbName=$(echo "$dbName" | sed -e 's/pacsdb//')
rm -f "$dbName.dat"
touch "$dbName.dat"
databases=("$dbName.dat")
titles="$titles $title"
while read -r period; do
mysql -uroot -pxxxx -h "$ipAddr" "pacsdb$dbName" -se \
"SELECT COUNT(*) FROM tables WHERE some.info BETWEEN $period;" >> "$dbName.dat"
done < periods.txt
done < customers.txt
for database in "${databases[@]}"; do
gnuplot << EOF
set a bunch of options
set output "/var/www/$dbName.png"
plot "$dbName.dat" using 2:xtic(1) title "$titles"
EOF
done
exit 0
customers.txt example line-
192.168.179.222 pacsdbgibsonia "Gibsonia Animal Hospital"
Error output.....
+ for database in '"${databases[@]}"'
+ gnuplot
line 0: warning: Skipping unreadable file ".dat"
line 0: No data in plot
+ exit 0
To initialise the databases array:
databases=()
To append $dbName.dat to the databases array:
databases+=("$dbName.dat")
To retrieve dbName, remove the suffix pattern .dat:
dbName=${database%.dat}
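Putting the three fixes together in a minimal, self-contained sketch (the database names here are made up; only the array handling and the suffix stripping matter):
#!/bin/bash
databases=()                        # 1. initialise the array once, before the loop
for dbName in gibsonia somewhere elsewhere; do
    databases+=("$dbName.dat")      # 2. append each file name instead of overwriting
done
for database in "${databases[@]}"; do
    dbName=${database%.dat}         # 3. strip the .dat suffix back off
    echo "would plot $database into /var/www/$dbName.png"
done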

Export pcap data to csv: timestamp, bytes, uplink/downlink, extra info [closed]

I was wondering if there is any tool that can parse pcap data and convert it to a csv file with the following information:
timestamp, bytes, uplink/downlink, some extra info..
Basically, the uplink/downlink direction can be inferred from the IP/MAC address, and the extra info is not strictly needed; what I mean by that is, for example, choosing a specific field of a packet.
I have been trying some tools but have not found a suitable one yet; otherwise I will write a small parser.
Thanks in advance!
TShark. Here are some examples:
$ tshark -r test.pcap -T fields -e frame.number -e eth.src -e eth.dst -e ip.src -e ip.dst -e frame.len > test1.csv
$ tshark -r test.pcap -T fields -e frame.number -e eth.src -e eth.dst -e ip.src -e ip.dst -e frame.len -E header=y -E separator=, > test2.csv
$ tshark -r test.pcap -R "frame.number>40" -T fields -e frame.number -e frame.time -e frame.time_delta -e frame.time_delta_displayed -e frame.time_relative -E header=y > test3.csv
$ tshark -r test.pcap -R "wlan.fc.type_subtype == 0x08" -T fields -e frame.number -e wlan.sa -e wlan.bssid > test4.csv
$ tshark -r test.pcap -R "ip.addr==192.168.1.6 && tcp.port==1696 && ip.addr==67.212.143.22 && tcp.port==80" -T fields -e frame.number -e tcp.analysis.ack_rtt -E header=y > test5.csv
$ tshark -r test.pcap -T fields -e frame.number -e tcp.analysis.ack_rtt -E header=y > test6.csv
Look no further, Wireshark is your best friend. It can open your pcap file and lets you add the extra columns you want; after that you can simply export them as CSV. On the main interface, right-click on any one of the columns and select "Column Preferences". This opens a new window which is very intuitive. Just add a new column and specify the field name. As simple as that.
I had tried tshark, but trust me, it becomes a bit annoying, especially with this:
tshark: Read filters were specified both with "-R" and with additional command-line arguments.
This message pops up if you include too many columns, or sometimes for no apparent reason.
It looks like you want Bro's connection logs:
bro -r trace.pcap
head conn.log
Output:
#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path conn
#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto service duration orig_bytes resp_bytes conn_state local_orig missed_bytes history orig_pkts orig_ip_bytes resp_pkts resp_ip_bytes
#types time string addr port addr port enum string interval count count string bool count string count count count count
1258531221.486539 gvuu4KIHDph 192.168.1.102 68 192.168.1.1 67 udp - 0.163820 301 300 SF - 0 Dd 1 329 1 328
1258531680.237254 6nWmFGj6kWg 192.168.1.103 137 192.168.1.255 137 udp dns 3.780125 350 0 S0 - 0 546 0 0
1258531693.816224 y2lMKyrnnO6 192.168.1.102 137 192.168.1.255 137 udp dns 3.748647 350 0 S0 - 0 546 0 0
Now parse the relevant fields:
bro-cut ts id.orig_h id.orig_p id.resp_h id.resp_p service orig_bytes resp_bytes < conn.log | head
1258531221.486539 192.168.1.102 68 192.168.1.1 67 - 301 300
1258531680.237254 192.168.1.103 137 192.168.1.255 137 dns 350 0
1258531693.816224 192.168.1.102 137 192.168.1.255 137 dns 350 0
1258531635.800933 192.168.1.103 138 192.168.1.255 138 - 560 0
1258531693.825212 192.168.1.102 138 192.168.1.255 138 - 348 0
1258531803.872834 192.168.1.104 137 192.168.1.255 137 dns 350 0
1258531747.077012 192.168.1.104 138 192.168.1.255 138 - 549 0
1258531924.321413 192.168.1.103 68 192.168.1.1 67 - 303 300
1258531939.613071 192.168.1.102 138 192.168.1.255 138 - - -
1258532046.693816 192.168.1.104 68 192.168.1.1 67 - 311 300
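bro-cut emits tab-separated values, so if you specifically need a .csv, converting the delimiter is enough (a sketch; pick whichever fields you need, and note that none of these fields contain commas themselves):
bro-cut ts id.orig_h id.orig_p id.resp_h id.resp_p service orig_bytes resp_bytes < conn.log \
    | tr '\t' ',' > conn.csv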
You can do this from the Wireshark application itself:
Make sure you have saved the file to disk already (File > Save), if you have just done a capture.
Go to File > Export Packet Dissections > As "CSV" [etc]
Then enter a filename (make sure you add .csv on the end, as Wireshark does not do this!)
Voila
Here is a Python tool that divides the pcap into flows and outputs the extracted features into a CSV file.
Try the flows_to_weka tool, written in Python.
It requires scapy to be installed on your system; it is best to copy the scapy folder inside the weka folder, and then copy the wfe.py, tcp_stream.py and entropy.py files into the scapy folder. Once you have done this,
your current directory should look something like this:
C:\Users\INKAKA\flows_to_weka\scapy
Copy the .pcap file into this folder and try running this command:
$ python wfe.py -i input.pcap -t csv > output.csv
You can also retrieve the features you want by adding the required features to tcp_stream.py and wfe.py.
For reference you can visit:
https://github.com/fichtner/flows_to_weka
Install argus via terminal
sudo apt-get install argus-client
Convert .pcap to .argus file format
argus -r filename.pcap -w filename.argus
-r <FILE> Read FILE
-w <FILE> Write FILE
Convert .argus to .csv file format while choosing which features to extract
ra -r filename.argus -u -s <features-comma-separated>
Example:
ra -r filename.argus -u -s rank, stime, ltime, dur
-r <FILE> Read FILE
-u Print time values using Unix time format (seconds from the Epoch).
-s Specify the fields to print.
The list of available fields to print can be found here
This information is copied from my original blog which you can read here
As noted in the comments to the question, to output the IP addresses for frames in a capture file in CSV format, use something like:
tshark -r <filename> -T fields -e ip.addr
See the tshark help for more information about options to set the separator and quoting characters in the csv output.
Field names can be determined by using Wireshark to examine the capture file and selecting a particular field in the details pane. The field name will be then shown in the status line at the bottom of the Wireshark window.
Is it possible to set a field separator other than a comma?
Because in my pcap file, if I set separator=, then the data in my output (.csv) file doesn't look good, because I have commas in most of the columns.
So I want to know whether there is any way to set the field separator to some other character, e.g. | (pipe).
Thanks
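For what it's worth, tshark's -E separator= option (used with a comma in the examples above) accepts other characters as well. A sketch with a pipe as the delimiter (the file names and field list are just placeholders):
tshark -r input.pcap -T fields -e frame.number -e ip.src -e ip.dst -e frame.len \
    -E header=y -E separator='|' -E quote=d > output.txt
Adding -E quote=d wraps each field in double quotes, which also helps when a field itself contains the chosen delimiter.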