Parse JSON field with jq and output repeatedly for every child - json

I have the following JSON:
{
"field1":"foo",
"array":[
{
child_field1:"c1_1",
child_field2:"c1_2"
},
{
child_field1:"c2_1",
child_field2:"c2_2"
}
]...
}
and using jq, I would like to return the following output, where the value of field1 is repeated for every child element.:
foo,c1_1,c1_2
foo,c2_1,c2_2
...
I can access each field separately, am having trouble returning the desired result above.
Can this be done with jq?

jq -r '.array[] as $a | [.field1, $a.child_field1, $a.child_field2] | #csv'
Does the right thing for the sample data you provided, but I freely admit there are lots of ways to do that kind of thing in jq, and that was only the first one which sprang to mind.
I fed it through #csv because it seemed like that was what you wanted, but if you prefer the actual output, exactly as you have written, then:
jq -r '.array[] as $a | "\(.field1),\($a.child_field1),\($a.child_field2)"'
will produce it

In cases like this, there's no need for reduce or `to_entries', or to list the fields explicitly -- one can simply exploit jq's backtracking behavior:
.field1 as $f
| .array[]
| [$f, .[]]
| #csv
As pointed out by #MatthewLDaniel, there are many alternatives to using #csv here.

jq solution:
jq -r '.field1 as $f1 | .array[]
| [$f1, .[]]
| join(",")' input.json
The output:
foo,c1_1,c1_2
foo,c2_1,c2_2

Related

Can this jq map be simplified?

Given this JSON:
{
"key": "/books/OL1000072M",
"source_records": [
"ia:daywithtroubadou00pern",
"bwb:9780822519157",
"marc:marc_loc_2016/BooksAll.2016.part25.utf8:103836014:1267"
]
}
Can the following jq code be simplified?
jq -r '.key as $olid | .source_records | map([$olid, .])[] | #tsv'
The use of variable assignment feels like cheating and I'm wondering if it can be eliminated. The goal is to map the key value onto each of the source_records values and output a two column TSV.
Instead of mapping into an array, and then iterating over it (map(…)[]) just create an array and collect its items ([…]). Also, you can get rid of the variable binding (as) by moving the second part into its own context using parens.
jq -r '[.key] + (.source_records[] | [.]) | #tsv'
Alternatively, instead of using #tsv you could build your tab-separated output string yourself. Either by concatenation (… + …) or by string interpolation ("\(…)"):
jq -r '.key + "\t" + .source_records[]'
jq -r '"\(.key)\t\(.source_records[])"'
Output:
/books/OL1000072M ia:daywithtroubadou00pern
/books/OL1000072M bwb:9780822519157
/books/OL1000072M marc:marc_loc_2016/BooksAll.2016.part25.utf8:103836014:1267
It's not much shorter, but I think it's clearer than the original and clearer than the other shorter answers.
jq -r '.key as $olid | .source_records[] | [ $olid, . ] | #tsv'

jq - Looping through json and concatenate the output to single string

I was currently learning the usage of jq. I have a json file and I am able to loop through and filter out the values I need from the json. However, I am running into issue when I try to combine the output into single string instead of having the output in multiple lines.
File svcs.json:
[
{
"name": "svc-A",
"run" : "True"
},
{
"name": "svc-B",
"run" : "False"
},
{
"name": "svc-C",
"run" : "True"
}
]
I was using the jq to filter to output the service names with run value as True
jq -r '.[] | select(.run=="True") | .name ' svcs.json
I was getting the output as follows:
svc-A
svc-C
I was looking to get the output as single string separated by commas.
Expected Output:
"svc-A,svc-C"
I tried to using join, but was unable to get it to work so far.
The .[] expression explodes the array into a stream of its elements. You'll need to collect the transformed stream (the names) back into an array. Then you can use the #csv filter for the final output
$ jq -r '[ .[] | select(.run=="True") | .name ] | #csv' svcs.json
"svc-A","svc-C"
But here's where map comes in handy to operate on an array's elements:
$ jq -r 'map(select(.run=="True") | .name) | #csv' svcs.json
"svc-A","svc-C"
Keep the array using map instead of decomposing it with .[], then join with a glue string:
jq -r 'map(select(.run=="True") | .name) | join(",")' svcs.json
svc-A,svc-C
Demo
If your goal is to create a CSV output, there is a special #csv command taking care of quoting, escaping etc.
jq -r 'map(select(.run=="True") | .name) | #csv' svcs.json
"svc-A","svc-C"
Demo

jq string manipulation on domain names and dns records

I am attempting to learn some jq but am running into trouble.
I am working with a dataset of dns records like {"timestamp":"1592145252","name":"0.127.9.109.rev.sfr.net","type":"a","value":"109.9.127.0"}
I cannot figure out how to
strip the subdomain details out of the name field. in this example i just want sfr.net
print the name backwards, eg: 0.127.9.109.rev.sfr.net would become ten.rfs.ver.901.9.721.0
my end goal is to print lines like this:
0.127.9.109.rev.sfr.net,ten.rfs.ver.901.9.721.0,a,sfr.net
Thanks SO!
To extract the "domain" part, you could use simple string manipulation methods to select it. Assuming anything after the .rev. part is the domain, you could do this:
split(".rev.")[1]
To reverse a string, jq doesn't have the operations to do it directly for strings. However it does have a function to reverse arrays. So you could convert to an array, reverse, then convert back.
split("") | reverse | join("")
To put it all together for your input:
.name | [
.,
(split("") | reverse | join("")),
(split(".rev.")[1])
] | join(",")
Here's one approach using reverse and capture:
jq -r '
.type as $type
| .name
| "\(.),\(explode|reverse|implode),\($type),"
+ capture("(?<subdomain>[^.]+[.][^.]+)$").subdomain'
Like this :
$ jq -r '.name' file.json | grep -oE '\w+\.\w+$'
sfr.net
$ jq -r '.name' file.json | rev
ten.rfs.ver.901.9.721.0

Optimize JSON denormalization using JQ - "cartesian product" from 1:N

I have a JSON database change log, output of wal2json. It looks like this:
{"xid":1190,"timestamp":"2018-07-19 17:18:02.905354+02","change":[
{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}},
{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}]}
...
Each top level entry (xid) is a transaction, each item in change is, well, a change. One row may change multiple times.
To import to an OLAP system with limited feature set, I need to have the order explicitly stated. So I need to add a sn for each change in a transaction.
Also, each change must be a top level entry - the OLAP can't iterate sub-items within one entry.
{"xid":1190, "sn":1, "kind":"update", "data":{"id":401,"name":"Update AA","age":20} }
{"xid":1190, "sn":2, "kind":"update", "data":{"id":401,"name":"Update BB","age":20} }
{"xid":1191, "sn":1, "kind":"insert", "data":{"id":625,"name":"Inserted","age":20} }
{"xid":1191, "sn":2, "kind":"delete", "data":{"id":625} }
(The reason is that the OLAP has limited ability to transform the data during import, and also doesn't have the order as a parameter.)
So, I do this using jq:
function transformJsonDataStructure {
## First let's reformat it to XML, then transform using XPATH, then back to JSON.
## Example input:
# {"xid":1074,"timestamp":"2018-07-18 17:49:54.719475+02","change":[
# {"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}},
# {"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}]}
cat "$1" | while read -r LINE ; do
XID=`echo "$LINE" | jq -c '.xid'`;
export SN=0;
#serr "{xid: $XID, changes: $CHANGES}";
echo "$LINE" | jq -c '.change[]' | while read -r CHANGE ; do
SN=$((SN+=1))
KIND=`echo "$CHANGE" | jq -c --raw-output .kind`;
TABLE=`echo "$CHANGE" | jq -c --raw-output .table`;
DEST_FILE="$TARGET_PATH-$TABLE.json";
case "$KIND" in
update|insert)
MAP=$(convertTwoArraysToMap "$(echo "$CHANGE" | jq -c ".columnnames")" "$(echo "$CHANGE" | jq -c ".columnvalues")") ;;
delete)
MAP=$(convertTwoArraysToMap "$(echo "$CHANGE" | jq -c ".oldkeys.keynames")" "$(echo "$CHANGE" | jq -c ".oldkeys.keyvalues")") ;;
esac
#echo "{\"xid\":$XID, \"table\":\"$TABLE\", \"kind\":\"$KIND\", \"data\":$MAP }" >> "$DEST_FILE"; ;;
echo "{\"xid\":$XID, \"sn\":$SN, \"kind\":\"$KIND\", \"data\":$MAP }" | tee --append "$DEST_FILE";
done;
done;
return;
}
The problem is the performance. I am calling jq few times per entry. This is quite slow, around 1000x times slower than without the transformation.
How can perform the transformation above using just one pass? (jq is not a must, other tool can be used too, but should be in CentOS packages. I want to avoid coding an extra tool for that.
From man jq it seems that it could be capable of processing the whole file (JSON entry per row) in one go. I could do it in XSLT but I can't wrap my head around jq. Especially the iteration of the change array and combining columnnames and columnvalues to a map.
For the iteration, I think map or map_values could be used.
For the 2 arrays to map, I see the from_entries and with_entries functions, but can't get it work.
Any jq master around to advise?
The following helper function converts the incoming array into an object using headers as the keys:
def objectify(headers):
[headers, .] | transpose | map({(.[0]): .[1]}) | add;
The trick now is to use range(0;length) to generate .sn:
{xid} +
(.change
| range(0;length) as $i
| .[$i]
| .columnnames as $header
| {sn: ($i + 1),
kind,
data: (.columnvalues|objectify($header)) } )
Output
For the given log entry, the output would be:
{"xid":1190,"sn":1,"kind":"update","data":{"id":401,"name":"Update AA","age":20}}
{"xid":1190,"sn":2,"kind":"update","data":{"id":401,"name":"Update BB","age":20}}
Moral
If a solution looks too complicated, it probably is.

jq add value of a key in nested array and given to a new key

I have a stream of JSON arrays like this
[{"id":"AQ","Count":0}]
[{"id":"AR","Count":1},{"id":"AR","Count":3},{"id":"AR","Count":13},
{"id":"AR","Count":12},{"id":"AR","Count":5}]
[{"id":"AS","Count":0}]
I want to use jq to get a new json like this
{"id":"AQ","Count":0}
{"id":"AR","Count":34}
{"id":"AS","Count":0}
34=1+3+13+12+5 which are in the second array.
I don't know how to describe it in detail. But the basic idea is shown in my example.
I use bash and prefer to use jq to solve this problem. Thank you!
If you want an efficient but generic solution that does NOT assume each input array has the same ids, then the following helper function makes a solution easy:
# Input: a JSON object representing the subtotals
# Output: the object augmented with additional subtotals
def adder(stream; id; filter):
reduce stream as $s (.; .[$s|id] += ($s|filter));
Assuming your jq has inputs, then the most efficient approach is to use it (but remember to use the -n command-line option):
reduce inputs as $row ({}; adder($row[]; .id; .Count) )
This produces:
{"AQ":0,"AR":34,"AS":0}
From here, it's easy to get the answer you want, e.g. using to_entries[] | {(.key): .value}
If your jq does not have inputs and if you don't want to upgrade, then use the -s option (instead of -n) and replace inputs by .[]
Assuming the .id is the same in each array:
first + {Count: map(.Count) | add}
Or perhaps more intelligibly:
(map(.Count) | add) as $sum | first | .Count = $sum
Or more declaratively:
{ id: (first|.id), Count: (map(.Count) | add) }
It's a bit kludgey, but given your input:
jq -c '
reduce .[] as $item ({}; .[($item.id)] += ($item.Count))
| to_entries
| .[] | {"id": .key, "Count": .value}
'
Yields the output:
{"id":"AQ","Count":0}
{"id":"AR","Count":34}
{"id":"AS","Count":0}