Awk a JSON log file for lines whose time is greater than a given time

I have a log file in which each line is a long JSON dictionary. The logs do not all have the same length, but every one of them has a '_time_' key, which is an epoch time in milliseconds. I want to search this log file and extract the logs whose time is greater than a given time such as 1450616426 (seconds). Some example logs are:
{'id':Bob, 'last-login':'...', '_time_':1444211444123456, ...}
{'name':'ehsan', 'family':'toghian', 'last-login':'2015-4-12', '_time_': 1444215425123465, .....}
How can I write an awk command to do this? Thanks in advance.

$ cat tst.awk
{
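# isolate the digits that follow _time_ (a millisecond epoch) from the record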
milli = $0
sub(/.*_time_[^[:digit:]]+/,"",milli)
sub(/[^[:digit:]].*/,"",milli)
secs = milli / 1000
}
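# print the record when its timestamp, converted to seconds, is newer than tgt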
secs > tgt
$ awk -v tgt=1450616426 -f tst.awk file
{'id':Bob, 'last-login':'...', '_time_':1444211444123456, ...}
{'name':'ehsan', 'family':'toghian', 'last-login':'2015-4-12', '_time_': 1444215425123465, .....}
or with GNU awk for gensub():
$ awk -v tgt=1450616426 '(gensub(/.*_time_[^[:digit:]]+([[:digit:]]+).*/,"\\1",1) / 1000) > tgt' file
{'id':Bob, 'last-login':'...', '_time_':1444211444123456, ...}
{'name':'ehsan', 'family':'toghian', 'last-login':'2015-4-12', '_time_': 1444215425123465, .....}

gawk
awk -vl=1450616426 '{match($0,"_time_.: *([0-9]{10})[0-9]+",a);if(a[1]>l)print}' file

How to grep a specific value from a JSON file?

I have a JSON file with content like below:
[
{
"id":"54545-f919-4b0f-930c-0117d6e6c987",
"name":"Inventory_Groups",
"path":"/Groups",
"subGroups":[
{
"id":"343534-394b-429a-834e-f8774240d736",
"name":"UserGroup",
"path":"/Groups/UserGroup",
"subGroups":[
]
}
]
}
]
Now I want to grep the value of the key id from the subGroups area. How can I achieve this? If the id key were not duplicated, it could be done with:
grep -o '"id": "[^"]*' Group.json | grep -o '[^"]*$'
But in my case, how can I get the value of id when it appears twice?
A valid question to ask your employer is why you're in a position to use the shell but not to use appropriate Linux packages. Compare:
awk -F '[":,]+' '$2=="subGroups" {f=1} f && $2=="id" {print $3; exit}' file
(Brittle solution, will fail if the structure of your JSON changes)
To:
jq '.[].subGroups[].id' file
Which can handle compact JSON in addition to numerous other realistic complications.
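For example, feeding the same document minified onto a single line to that jq command (here via a bash here-string, just to illustrate) still extracts the id:
$ jq '.[].subGroups[].id' <<< '[{"id":"54545-f919-4b0f-930c-0117d6e6c987","name":"Inventory_Groups","path":"/Groups","subGroups":[{"id":"343534-394b-429a-834e-f8774240d736","name":"UserGroup","path":"/Groups/UserGroup","subGroups":[]}]}]'
"343534-394b-429a-834e-f8774240d736"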
Using just standard UNIX tools and assuming your sed can tolerate input without a terminating newline (otherwise we can swap out the tr for an awk command that keeps the last newline):
$ tr -d '\n' < file | sed 's/.*"subGroups":[^]}]*"id":"\([^"]*\)\".*/\1\n/'
343534-394b-429a-834e-f8774240d736
Alternatively with just a call to any awk:
$ awk '
{ rec = (NR>1 ? rec ORS : "") $0 }
END {
gsub(/.*"subGroups":[^]}]*"id":"|".*/,"",rec)
print rec
}
' file
343534-394b-429a-834e-f8774240d736

Sh script: JSON values from a JSON string

I have a file which contains a JSON object as a string:
{"STATUS":[{"STATUS":"S","When":1530779438,"Code":70,"Msg":"CGMiner stats","Description":"cgminer 4.9.0"}],"STATS":[{"CGMiner":"4.9.0","Miner":"9.0.0.5","CompileTime":"Sat May 26 20:42:30 CST 2018","Type":"Antminer Z9-Mini"},{"STATS":0,"ID":"ZCASH0","Elapsed":179818,"Calls":0,"Wait":0.000000,"Max":0.000000,"Min":99999999.000000,"GHS 5s":"16.39","GHS av":16.27,"miner_count":3,"frequency":"750","fan_num":1,"fan1":5760,"fan2":0,"fan3":0,"fan4":0,"fan5":0,"fan6":0,"temp_num":3,"temp1":41,"temp2":40,"temp3":43,"temp2_1":56,"temp2_2":53,"temp2_3":56,"temp_max":43,"Device Hardware%":0.0000,"no_matching_work":0,"chain_acn1":4,"chain_acn2":4,"chain_acn3":4,"chain_acs1":" oooo","chain_acs2":" oooo","chain_acs3":" oooo","chain_hw1":0,"chain_hw2":0,"chain_hw3":0,"chain_rate1":"5.18","chain_rate2":"5.34","chain_rate3":"5.87"}],"id":1}
Now I want to get the values of some keys in this object from within a sh script.
The following command works, but not for all keys.
This works (I get "750"):
grep -o '"frequency": *"[^"]*"' LXstats.txt | grep -o '"[^"]*"$'
But this does not (empty output):
grep -o '"fan_num": *"[^"]*"' LXstats.txt | grep -o '"[^"]*"$'
Same with this:
grep -o '"fan1": *"[^"]*"' LXstats.txt | grep -o '"[^"]*"$'
I am working on a Xilinx OS which has no Python, so "jq" will not work, and grep has no "-P" option. So does anyone have an idea for working with that? :)
Thanks and best regards,
dave
When you want to do more than just g/re/p you should be using awk, not combinations of greps+pipes, etc.
$ awk -v tag='frequency' 'match($0,"\""tag"\": *(\"[^\"]*|[0-9]+)") { val=substr($0,RSTART,RLENGTH); sub(/^"[^"]+": *"?/,"",val); print val }' file
750
$ awk -v tag='fan_num' 'match($0,"\""tag"\": *(\"[^\"]*|[0-9]+)") { val=substr($0,RSTART,RLENGTH); sub(/^"[^"]+": *"?/,"",val); print val }' file
1
$ awk -v tag='fan1' 'match($0,"\""tag"\": *(\"[^\"]*|[0-9]+)") { val=substr($0,RSTART,RLENGTH); sub(/^"[^"]+": *"?/,"",val); print val }' file
5760
The above will work with any awk in any shell on any UNIX box. If you have GNU awk for the 3rd arg to match() and gensub() you can write it a bit briefer:
$ awk -v tag='frequency' 'match($0,"\""tag"\": *(\"[^\"]*|[0-9]+)",a) { print gensub(/^"/,"",1,a[1]) }' file
750

Adding a column in multiple csv file using awk

I want to add a column to multiple (500) CSV files (all of the same dimensionality). The new column should act as an identifier for the individual file. I want to create a bash script using awk (I am a newbie in awk). The CSV files do come with headers.
For eg.
Input File1.csv
#name,#age,#height
A,12,4.5
B,13,5.0
Input File2.csv
#name,#age,#height
C,11,4.6
D,12,4.3
I want to add a new column "#ID" to both files, where the value of ID is the same within an individual file but differs between the files.
Expected Output
File1.csv
#name,#age,#height,#ID
A,12,4.5,1
B,13,5.0,1
Expected File2.csv
#name,#age,#height,#ID
C,11,4.6,2
D,12,4.3,2
Please suggest.
If you don't need to extract the id number from the filename, this should do.
$ c=1; for f in File*.csv;
do
sed -i '1s/$/,#ID/; 2,$s/$/,'$c'/' "$f";
c=$((c+1));
done
Note that this is an in-place edit. Perhaps make a backup or test first.
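If your sed accepts a backup suffix with -i (GNU sed does; this is an untested sketch), the same loop can keep the originals as File1.csv.bak and so on:
$ c=1; for f in File*.csv;
do
sed -i.bak '1s/$/,#ID/; 2,$s/$/,'$c'/' "$f";
c=$((c+1));
done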
UPDATE
If you don't need the individual files to be updated, this may work better for you
$ awk -v OFS=, 'BEGIN {f="allFiles.csv"}
FNR==1 {c++; print $0,"#ID" > f; next}
{print $0,c > f}' File*.csv
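With the two sample files this produces an allFiles.csv like the following; note that the FNR==1 rule re-prints the header for every input file, so it appears once per file:
$ cat allFiles.csv
#name,#age,#height,#ID
A,12,4.5,1
B,13,5.0,1
#name,#age,#height,#ID
C,11,4.6,2
D,12,4.3,2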
awk -F, -v OFS=, '
FNR == 1 {
$(NF + 1) = "#ID"
i++
f = FILENAME
sub(/Input/, "Output", f)
} FNR != 1 {
$(NF + 1) = i
} {
print > f
}' Input*.csv
With GNU awk for inplace editing and ARGIND:
awk -i inplace -v OFS=, '{print $0, (FNR==1 ? "#ID" : ARGIND)}' File*.csv

Bash loop to merge files in batches for mongoimport

I have a directory with 2.5 million small JSON files in it. It's 104GB on disk. They're multi-line files.
I would like to create a set of JSON arrays from the files so that I can import them using mongoimport in a reasonable amount of time. The files can be no bigger than 16MB, but I'd be happy even if I managed to get them in sets of ten.
So far, I can use this to do them one at a time at about 1000/minute:
for i in *.json; do mongoimport --writeConcern 0 --db mydb --collection all --quiet --file $i; done
I think I can use "jq" to do this, but I have no idea how to make the bash loop pass 10 files at a time to jq.
Note that using bash find results in an error as there are too many files.
With jq you can use --slurp to create arrays, and -c to make multiline json single line. However, I can't see how to combine the two into a single command.
Please help with both parts of the problem if possible.
Here's one approach. To illustrate, I've used awk as it can read the list of files in small batches and because it has the ability to execute jq and mongoimport. You will probably need to make some adjustments to make the whole thing more robust, to test for errors, and so on.
The idea is either to generate a script that can be reviewed and then executed, or to use awk's system() command to execute the commands directly. First, let's generate the script:
ls *.json | awk -v group=10 -v tmpfile=json.tmp '
function out() {
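# write the jq, mongoimport and cleanup commands for the current batch, then reset the file list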
print "jq -s . " files " > " tmpfile;
print "mongoimport --writeConcern 0 --db mydb --collection all --quiet --file " tmpfile;
print "rm " tmpfile;
files="";
}
BEGIN {n=1; files="";
print "test -r " tmpfile " && rm " tmpfile;
}
n % group == 0 {
out();
}
{ files = files " \""$0 "\"";
n++;
}
END { if (files) {out();}}
'
Once you've verified this works, you can either execute the generated script, or change the "print ..." lines to use system(...).
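For example (a sketch; batch.awk and import.sh are illustrative names, not part of the original answer), you could save the awk program above as batch.awk, write the generated commands to a file for review, and then run it:
$ ls *.json | awk -v group=10 -v tmpfile=json.tmp -f batch.awk > import.sh
$ sh import.sh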
Using jq to generate the script
Here's a jq-only approach for generating the script.
Since the number of files is very large, the following uses streaming features (inputs and foreach) that were only introduced in jq 1.5; this keeps its memory usage similar to that of the awk script above:
def read(n):
# state: [answer, hold]
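# emit the filenames read from stdin in arrays of n; the appended null flushes the final, possibly partial batch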
foreach (inputs, null) as $i
([null, null];
if $i == null then .[0] = .[1]
elif .[1]|length == n then [.[1],[$i]]
else [null, .[1] + [$i]]
end;
.[0] | select(.) );
"test -r json.tmp && rm json.tmp",
(read($group|tonumber)
| map("\"\(.)\"")
| join(" ")
| ("jq -s . \(.) > json.tmp", mongo("json.tmp"), "rm json.tmp") )
Invocation:
ls *.json | jq -nRr --arg group 10 -f generate.jq
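As with the awk version, the generated commands can be reviewed first and then piped to a shell, e.g.:
$ ls *.json | jq -nRr --arg group 10 -f generate.jq | sh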
Here is what I came up with. It seems to work and is importing at roughly 80 a second into an external hard drive.
#!/bin/bash
files=(*.json)
for((I=0;I<${#files[*]};I+=500)); do jq -c '.' "${files[@]:I:500}" | mongoimport --writeConcern 0 --numInsertionWorkers 16 --db mydb --collection all --quiet; echo $I; done
However, some are failing. I've imported 105k files but only 98547 appeared in the mongo collection. I think it's because some documents are > 16MB.
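To spot the offending files before importing (a sketch, assuming a find that supports -maxdepth and the M size suffix, as GNU find does), since MongoDB's BSON document limit is 16MB:
$ find . -maxdepth 1 -name '*.json' -size +16M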

awk CSV split with headers on Windows

OK, I have a CSV file I need to split based on a column value, which is fine, but I cannot get the headers to print in each file.
Currently I use:
awk "FS =\",\" {output=$3\".csv\"; print $0 > output}" test.csv
This splits the file based on column 3, but I don't know how to add the header to each file.
I've searched high and low but can't find a solution that will work as a one-liner...
UPDATE
OK, so far we have a working one-liner:
awk -F, "NR==1{hdr=$0;next}!($3 in files){files[$3]=1;print hdr>$3\".csv\"}{print>$3\".csv\"}" test.csv
Or in test.awk:
BEGIN{FS=","} NR==1 {hdr=$0;next}!($3 in files) {files[$3]=1;print hdr>$3".csv"}{print>$3".csv"}
The command used to run it:
awk -f test.awk test.csv
I really appreciate the help here; I've been trying for hours and have a few things left to work out:
1) Blank line inserted after the header
2) Sort the data on specified fields
Further down the line I also want to do a row count and cut a reference number from another file. Is this possible with awk, or am I using the wrong tool for the job?
Thanks again.
UPDATE #2: blank line after the header line.
UPDATE:
Try this:
On Unix/Cygwin (I tested on Cygwin):
awk -F, 'NR==1{hdr=$0;next}!($3 in files){files[$3]=1;print hdr"\n">$3".csv"}{print>$3".csv"}' test.csv
Or adding Kent's ideas:
awk -F, 'NR==1{hdr=$0;next}{out=$3".csv"}!($3 in files){files[$3];print hdr"\n">out}{print>out}' test.csv
On windows cmd (not tested):
awk -F, "NR==1{hdr=$0;next}!($3 in files){files[$3]=1;print hdr\"\n\">$3\".csv\"}{print>$3\".csv\"}" test.csv
This stores the header line of test.csv in hdr. For the following lines it checks whether the column 3 value has been seen already; if not, it records it in the files array and prints the header line to that value's output file. In either case it prints the whole line to the file.
Example file:
$ cat test.csv
A,B,C,D
1,2,a,3
4,5,b,4
Output
$ cat a.csv
A,B,C,D
1,2,a,3
$ cat b.csv
A,B,C,D
4,5,b,4
ADDED
If you would like to put the awk script into a file, you could try the following (I cannot test it, sorry).
test.awk
BEGIN{FS=","}
NR==1 {hdr=$0;next}
!($3 in files) {files[$3]=1;print hdr"\n">$3".csv"}
{print>"$3.csv"}
Then you may call it as:
awk -f test.awk test.csv
awk -F, 'NR==1{h=$0;next}{out=$3".csv";
if (!(out in a)) print h > out; print $0 > out; a[out]}' test.csv
Try something like this:
awk -F, '
BEGIN {
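# read the first line of test.csv (the header) before the main loop sees any records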
getline header
}
{
out=$3".csv"
if (!($3 in seen)) {
print header > out
}
print $0 > out
seen[$3]
}' test.csv
Windows version: (Not tested)
awk " FS =\",\"
BEGIN {
getline header
}
{
out=$3\".csv\"
if (!($3 in seen)) {
print header > out
}
print $0 > out
seen[$3]
}" test.csv
awk '{ output=$3".csv"; if( !($0 in a)) print "header" > output; a[$0]
print > output}' FS=, test.csv