Extract only one field for a specific flow from JSON

Below is the JSON I receive as a response from the URL:
{"flows":[{"version":"OF_13","cookie":"0","tableId":"0x0","packetCount":"24","byteCount":"4563","durationSeconds":"5747","priority":"0","idleTimeoutSec":"0","hardTimeoutSec":"0","flags":"0","match":{},"instructions":{"instruction_apply_actions":{"actions":"output=controller"}}},
{"version":"OF_13","cookie":"45036000240104713","tableId":"0x0","packetCount":"0","byteCount":"0","durationSeconds":"29","priority":"6","idleTimeoutSec":"0","hardTimeoutSec":"0","flags":"1","match":{"eth_type":"0x0x800","ipv4_src":"10.0.0.10","ipv4_dst":"10.0.0.12"},"instructions":{"none":"drop"}},
{"version":"OF_13","cookie":"45036000240104714","tableId":"0x0","packetCount":"0","byteCount":"0","durationSeconds":"3","priority":"7","idleTimeoutSec":"0","hardTimeoutSec":"0","flags":"1","match":{"eth_type":"0x0x800","ipv4_src":"10.0.0.10","ipv4_dst":"127.0.0.1"},"instructions":{"none":"drop"}},
{"version":"OF_13","cookie":"0","tableId":"0x1","packetCount":"0","byteCount":"0","durationSeconds":"5747","priority":"0","idleTimeoutSec":"0","hardTimeoutSec":"0","flags":"0","match":{},"instructions":{"instruction_apply_actions":{"actions":"output=controller"}}}]}
So I have, for example, four flows, and I want to extract only the field "byteCount" for a specific flow, identified by the ipv4_src and ipv4_dst that I give as input.
How can I do this?

json_array := JSON.parse(json_string)
foreach (element in json_array.flows):
    if (element.match.hasProperty('ipv4_src') && element.match.hasProperty('ipv4_dst')):
        if (element.match.ipv4_src == myValue && element.match.ipv4_dst == otherValue):
            print element.byteCount
The above is pseudo-code to find byteCount based on ipv4_src and ipv4_dst. Note that these two properties are within the match property, which may or may not contain them; hence, first check for their existence and then process.
Note: when formatted properly, each element in the array looks like
{
  "version":"OF_13",
  "cookie":"45036000240104713",
  "tableId":"0x0",
  "packetCount":"0",
  "byteCount":"0",
  "durationSeconds":"29",
  "priority":"6",
  "idleTimeoutSec":"0",
  "hardTimeoutSec":"0",
  "flags":"1",
  "match":{
    "eth_type":"0x0x800",
    "ipv4_src":"10.0.0.10",
    "ipv4_dst":"10.0.0.12"
  },
  "instructions":{
    "none":"drop"
  }
}
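If a quick script is preferred over jq, the pseudo-code above translates almost directly into Python. This is a minimal sketch using only the standard library; the sample data is abbreviated here to the two fields that matter:

```python
import json

json_string = '''{"flows":[
  {"byteCount":"4563","match":{}},
  {"byteCount":"0","match":{"eth_type":"0x0x800","ipv4_src":"10.0.0.10","ipv4_dst":"10.0.0.12"}}
]}'''

def byte_counts(json_string, src, dst):
    """Yield byteCount for every flow whose match has the given src/dst."""
    data = json.loads(json_string)
    for flow in data["flows"]:
        match = flow.get("match", {})
        # ipv4_src/ipv4_dst may be absent, so look them up safely before comparing.
        if match.get("ipv4_src") == src and match.get("ipv4_dst") == dst:
            yield flow["byteCount"]

for bc in byte_counts(json_string, "10.0.0.10", "10.0.0.12"):
    print(bc)  # prints "0" for the sample data
```

The dict.get calls play the role of the hasProperty checks in the pseudo-code: flows whose match lacks the two keys simply never compare equal.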

Here's how to perform the selection and extraction task using the command-line tool jq:
First create a file, say "extract.jq", with these three lines:
.flows[]
| select(.match.ipv4_src == $src and .match.ipv4_dst == $dst)
| [$src, $dst, .byteCount]
Next, assuming the desired src and dst are 10.0.0.10 and 10.0.0.12 respectively, and that the input is in a file named input.json, run this command:
jq -c --arg src 10.0.0.10 --arg dst 10.0.0.12 -f extract.jq input.json
This would produce one line per match; in the case of your example, it would produce:
["10.0.0.10","10.0.0.12","0"]
If the JSON is coming from some command (such as curl), you can use a pipeline along the following lines:
curl ... | jq -c --arg src 10.0.0.10 --arg dst 10.0.0.12 -f extract.jq

Related

Using jq how to pass multiple values as arguments to a function?

I have a json file test.json with the content:
[
  {
    "name": "Akshay",
    "id": "234"
  },
  {
    "name": "Amit",
    "id": "28"
  }
]
I have a shell script with content:
#!/bin/bash
function display
{
    echo "name is $1 and id is $2"
}
cat test.json | jq '.[].name,.[].id' | while read line; do display $line; done
I want the name and id of a single item to be passed together as arguments to the function display, but the output is something like this:
name is "Akshay" and id is
name is "Amit" and id is
name is "234" and id is
name is "28" and id is
What should be the correct way to implement the code?
PS: I specifically want to use jq so please base the answer in terms of jq
Two major issues, and some additional items that may not matter for your current example use case but can be important when you're dealing with real-world data from untrusted sources:
- Your current code iterates over all names before writing any ids.
- Your current code uses newline separators, but doesn't make any effort to read multiple lines into each while loop iteration.
- Your code uses newline separators, but newlines can be present inside strings; consequently, this is constraining the input domain.
- When you pipe into a while loop, that loop is run in a subshell; when the pipeline exits, the subshell does too, so any variables set by the loop are lost.
- Starting up a copy of /bin/cat and making jq read a pipe from its output is silly and inefficient compared to letting jq read from test.json directly.
We can fix all of those:
- To write names and ids in pairs, you'd want something more like jq '.[] | (.name, .id)'
- To read both a name and an id for each element of the loop, you'd want while IFS= read -r name && IFS= read -r id; do ... to iterate over those pairs.
- To switch from newlines to NULs (the NUL being the only character that can't exist in a C string, or thus a bash string), you'd want to use the -j argument to jq, and then add explicit "\u0000" elements to the content being written. To read this NUL-delimited content on the bash side, you'd need to add the -d '' argument to each read.
- To move the while read loop out of the subshell, we can use process substitution, as described in BashFAQ #24.
- To let jq read directly from test.json, use either <test.json to have the shell connect the file directly to jq's stdin, or pass the filename on jq's command line.
Doing everything described above in a manner robust against input data containing JSON-encoded NULs would look like the following:
#!/bin/bash
display() {
  echo "name is $1 and id is $2"
}

cat >test.json <<'EOF'
[
  { "name": "Akshay", "id": "234" },
  { "name": "Amit", "id": "28" }
]
EOF

while IFS= read -r -d '' name && IFS= read -r -d '' id; do
  display "$name" "$id"
done < <(jq -j '
  def stripnuls: gsub("\u0000"; "<NUL>");
  .[] | ((.name | stripnuls), "\u0000", (.id | stripnuls), "\u0000")
' <test.json)
You can see the above running at https://replit.com/#CharlesDuffy2/BelovedForestgreenUnits#main.sh
You can use string interpolation.
jq '.[] | "The name is \(.name) and id \(.id)"'
Result:
"The name is Akshay and id 234"
"The name is Amit and id 28"
If you want to get rid of the double-quotes from each object, then:
jq --raw-output '.[] | "The name is \(.name) and id \(.id)"'
https://jqplay.org/s/-lkpHROTBk0
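For comparison, when a shell pipeline isn't actually required, the same pairing is a couple of lines of Python with no delimiter concerns at all. A sketch assuming the test.json structure shown above:

```python
import json

test_json = '[{"name": "Akshay", "id": "234"}, {"name": "Amit", "id": "28"}]'

for item in json.loads(test_json):
    # Each array element is handled as a unit, so name and id stay paired.
    print(f"name is {item['name']} and id is {item['id']}")
```

Because each object is processed as a whole, the name/id interleaving problem from the original jq '.[].name,.[].id' pipeline cannot occur.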

Using jq to count

Using jq-1.5 if I have a file of JSON that looks like
[{... ,"sapm_score":40.776, ...} {..., "spam_score":17.376, ...} ...]
How would I get a count of the ones where sapm_score > 40?
Thanks,
Dan
Update:
I looked at the input file and the format is actually
{... ,"sapm_score":40.776, ...}
{..., "spam_score":17.376, ...}
...
Does this change how one needs to count?
[UPDATE: If the input is not an array, see the last section below.]
count/1
I'd recommend defining a count filter (and maybe putting it in your ~/.jq), perhaps as follows:
def count(s): reduce s as $_ (0;.+1);
With this, assuming the input is an array, you'd write:
count(.[] | select(.sapm_score > 40))
or slightly more efficiently:
count(.[] | (.sapm_score > 40) // empty)
This approach (counting items in a stream) is usually preferable to using length as it avoids the costs associated with constructing an array.
count/2
Here's another definition of count that you might like to use (and perhaps add to ~/.jq as well):
def count(stream; cond): count(stream | cond // empty);
This counts the elements of the stream for which cond is neither false nor null.
Now, assuming the input consists of an array, you can simply write:
count(.[]; .sapm_score > 40)
"sapm_score" vs "spam_score"
If the point is that you want to normalize "sapm_score" to "spam_score", then (for example) you could use count/2 as defined above, like so:
count(.[]; .spam_score > 40 or .sapm_score > 40)
This assumes all the items in the array are JSON objects. If that is not the case, then you might want to try adding "?" after the key names:
count(.[]; .spam_score? > 40 or .sapm_score? > 40)
Of course all the above assumes the input is valid JSON. If that is not the case, then please see https://github.com/stedolan/jq/wiki/FAQ#processing-not-quite-valid-json
If the input is a stream of JSON objects ...
The revised question indicates the input consists of a stream of JSON objects (whereas originally the input was said to be an array of JSON objects). If the input consists of a stream of JSON objects, then the above solutions can easily be adapted, depending on the version of jq that you have. If your version of jq has inputs then (2) is recommended.
(1) All versions: use the -s command-line option.
(2) If your jq has inputs: use the -n command line option, and change .[] above to inputs, e.g.
count(inputs; .spam_score? > 40 or .sapm_score? > 40)
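The same count is easy to sanity-check outside jq. Here is a pure-Python sketch for the newline-delimited form of the input (assuming one JSON object per line, as in the revised question):

```python
import json

data = '''{ "spam_score":40.776 }
{ "spam_score":17.376 }'''

count = sum(
    1
    for line in data.splitlines()
    # .get() tolerates objects that lack the key, much like "?" in jq.
    if json.loads(line).get("spam_score", 0) > 40
)
print(count)  # 1
```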
Filter the items that satisfy the condition then get the length.
map(select(.sapm_score > 40)) | length
Here is one way:
reduce .[] as $s(0; if $s.spam_score > 40 then .+1 else . end)
Try it online at jqplay.org
If instead of an array the input is a sequence of newline delimited objects (jsonlines)
reduce inputs as $s(0; if $s.spam_score > 40 then .+1 else . end)
will work if jq is invoked with the -n flag. Here is an example:
$ cat data.json
{ "spam_score":40.776 }
{ "spam_score":17.376 }
$ jq -Mn 'reduce inputs as $s(0; if $s.spam_score > 40 then .+1 else . end)' data.json
1
Try it online at tio.run
cat input.json | jq -c '. | select(.sapm_score > 40)' | wc -l
should do it.
The -c option prints a one-liner compact json representation of each match, and we count the number of lines jq prints.

Get the last element in JSON file

I have this JSON file:
{
  "system.timestamp": "{system.timestamp}",
  "error.state": "{error.state}",
  "system.timestamp": "{system.timestamp}",
  "error.state": "{error.state}",
  "system.timestamp": "{system.timestamp}",
  "error.state": "{error.state}",
  "error.content": "{custom.error.content}"
}
I would like to get only the last object of the JSON file, as I need to check that in every case the last object is error.content. The attached code is just a sample file; every file generated in reality will contain around 40 to 50 objects, so in every case I need to check that the last object is error.content.
I have calculated the length using jq '. | length'. How do I do this using the jq command on Linux?
Note: it's a plain JSON file without any arrays.
Objects with duplicate keys can be handled in jq using the --stream option, e.g.:
$ jq -s --stream '.[length-2] | { (.[0][0]): (.[1]) }' input.json
{
"error.content": "{custom.error.content}"
}
For large files, the following would probably be better, as it avoids "slurping" the input file: it streams the events and retains only the most recent [path, value] pair:
$ jq -n --stream 'reduce inputs as $e (null; if ($e | length) == 2 then $e else . end) | {(.[0][0]): .[1]}' input.json
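A quick way to double-check this outside jq: Python's json module can preserve duplicate keys if you override the pair handling, which makes the "last entry" check a one-liner. A sketch using only the standard library, with the sample abbreviated:

```python
import json

doc = '''{
  "system.timestamp": "{system.timestamp}",
  "error.state": "{error.state}",
  "system.timestamp": "{system.timestamp}",
  "error.content": "{custom.error.content}"
}'''

# object_pairs_hook receives every key/value pair, duplicates included,
# instead of collapsing them into a dict.
pairs = json.loads(doc, object_pairs_hook=lambda p: p)
last_key, last_value = pairs[-1]
print(last_key)  # error.content
```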

How to delete the last character of prior line with sed

I'm trying to delete a line, together with the last character of the prior line, using sed:
I have a json file :
{
  "name":"John",
  "age":"16",
  "country":"Spain"
}
I would like to delete country from all entries; to do that, I also have to delete the trailing comma on the prior line to keep the JSON syntax valid.
I'm using this pattern:
sed '/country/d' test.json
sed -n '/resolved//.$//{x;d;};1h;1!{x;p;};${x;p;}' test.json
Editor's note:
The OP later clarified the following additional requirements, which invalidated some of the existing answers:
- multiple occurrences of country properties should be removed
- across all levels of the object hierarchy
- whitespace variations should be tolerated
Using a proper JSON parser such as jq is generally the best choice (see below), but if installing a utility is not an option, try this GNU sed command:
$ sed -zr 's/,\s*"country":[^\n]+//g' test.json
{
"name":"John",
"age":"16"
}
-z splits the input into records by NULs, which, in this case means that the whole file is read at once, which enables cross-line substitutions.
-r enables extended regular expressions for a more modern syntax with more features.
s/,\s*"country":[^\n]+//g replaces all occurrences of a comma, followed by a (possibly empty) run of whitespace (possibly including a newline), then "country" and everything up to the end of that line, with the empty string, i.e., effectively removes the matched strings.
Note that this assumes that no other property or closing } follows such a country property on the same line.
To demonstrate a more robust solution based on jq.
Bertrand Martel's helpful answer contains a jq solution, which, however, does not address the requirement (added later) of replacing country attributes anywhere in the input object hierarchy.
In jq 1.6 and later, a builtin walk/1 function is available, which enables the following simple solution:
# Walk all nodes and remove a "country" property from any object.
jq 'walk(if type == "object" then del (.country) else . end)' test.json
In v1.5.2 and below, you can define a simplified variant of walk yourself:
jq '
# Define recursive function walk_objects/1 that walks all objects in the
# hierarchy.
def walk_objects(f): . as $in |
if type == "object" then
reduce keys[] as $key
( {}; . + { ($key): ($in[$key] | walk_objects(f)) } ) | f
elif type == "array" then map( walk_objects(f) )
else . end;
# Walk all objects and remove a "country" property, if present.
walk_objects(del(.country))
' test.json
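The walk logic is easy to mirror in other languages when jq isn't available. Here is a Python sketch of the same recursive removal; note that the nested address object in the sample is invented here purely to demonstrate the recursion:

```python
import json

def remove_key(node, key):
    """Recursively drop `key` from every object in the hierarchy."""
    if isinstance(node, dict):
        return {k: remove_key(v, key) for k, v in node.items() if k != key}
    if isinstance(node, list):
        return [remove_key(item, key) for item in node]
    return node

doc = json.loads('{"name":"John","age":"16","country":"Spain",'
                 '"address":{"city":"Madrid","country":"Spain"}}')
print(json.dumps(remove_key(doc, "country")))
```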
As pointed out before, you should really consider using a JSON parser to parse JSON.
That said, you can slurp the whole file, remove newlines, and then replace accordingly:
$ sed ':a;N;$!ba;s/\n//g;s/,"country"[^}]*//' test.json
{"name":"John","age":"16"}
Breakdown:
:a; # Define label 'a'
N; # Append next line to pattern space
$!ba; # Goto 'a' unless it's the last line
s/\n//g; # Replace all newlines with nothing
s/,"country"[^}]*// # Replace ',"country...' with nothing
This might work for you (GNU sed):
sed 'N;s/,\s*\n\s*"country".*//;P;D' file
Read two lines into the pattern space and remove the matched string by substitution.
N.B. This allows for spaces on either side of the newline.
You can use a JSON parser like jq to parse json file. The following will return the document without the country field and write the new document in result.json :
jq 'del(.country)' file.json > result.json

Splitting / chunking JSON files with JQ in Bash or Fish shell?

I have been using the wonderful JQ library to parse and extract JSON data to facilitate re-importing. I am able to extract a range easily enough, but am unsure as to how you could loop through in a script and detect the end of the file, preferably in a bash or fish shell script.
Given a JSON file that is wrapped in a "results" dictionary, how can I detect the end of the file?
From testing, I can see that I will get an empty array nested in my desired structure, but how can I detect that end-of-file condition?
jq '{ "results": .results[0:500] }' Foo.json > 0000-0500/Foo.json
Thanks!
I'd recommend using jq to split-up the array into a stream of the JSON objects you want (one per line), and then using some other tool (e.g. awk) to populate the files. Here's how the first part can be done:
def splitup(n):
def _split:
if length == 0 then empty
else .[0:n], (.[n:] | _split)
end;
if n == 0 then empty elif n > 0 then _split else reverse|splitup(-n) end;
# For the sake of illustration:
def data: { results: [range(0,20)]};
data | .results | {results: splitup(5) }
Invocation:
$ jq -nc -f splitup.jq
{"results":[0,1,2,3,4]}
{"results":[5,6,7,8,9]}
{"results":[10,11,12,13,14]}
{"results":[15,16,17,18,19]}
For the second part, you could (for example) pipe the jq output to:
awk '{ file="file."++n; print > file; close(file); }'
A variant you might be interested in would have the jq filter emit both the filename and the JSON on alternate lines; the awk script would then read the filename as well.
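If the split is easier to manage outside jq, the chunking itself is also straightforward in Python. This sketch slices .results into fixed-size pieces and prints one JSON document per chunk; in a real run each chunk would be written to its own numbered file (the file-name pattern below is illustrative):

```python
import json

def chunk_results(data, size):
    """Yield {"results": [...]} dicts of at most `size` elements each."""
    results = data["results"]
    for start in range(0, len(results), size):
        yield {"results": results[start:start + size]}

data = {"results": list(range(20))}
for n, chunk in enumerate(chunk_results(data, 5)):
    # In a real run: write each chunk to e.g. f"chunk-{n:04d}.json".
    print(json.dumps(chunk))
```

The end-of-file condition never has to be detected explicitly: the loop simply stops when the slice range is exhausted, which is the same idea as the length == 0 base case in the splitup jq filter above.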