Struggling with parsing JSON with jq - json

I've read all the posts related to it, I'm playing around with it for hours, and still can't manage to get a grip of this tool which seems to be exactly what I need if I just find a way to make it work as I need...
So here's a sample of my JSON:
{
"res": "0",
"main": {
"All": [
{
"field1": "a",
"field2": "aa",
"field3": "aaa",
"field4": "0",
"active": "true",
"id": "1"
},
{
"field1": "b",
"field2": "bb",
"field3": "bbb",
"field4": "0",
"active": "false",
"id": "2"
},
{
"field1": "c",
"field2": "cc",
"field3": "ccc",
"field4": "0",
"active": "true",
"id": "3"
},
{
"field1": "d",
"field2": "dd",
"field3": "ddd",
"field4": "0",
"active": "true",
"id": "4"
}
]
}
}
I'd like to selectively extract some of the fields and get a csv output like this:
field1,field2,field3,id
a,aa,aaa,1
b,bb,bbb,2
c,cc,ccc,3
d,dd,ddd,4
Please notice I've skipped some fields and I'm also not interested in the parent arrays and such.
Thanks a lot in advance.

First, your JSON needs to be fixed as follows:
{
"main": {
},
"table": {
"All": [
{
"field1": "a",
"field2": "aa",
"field3": "aaa",
"field4": "0",
"active": "true",
"id": "1"
},
{
"field1": "b",
"field2": "bb",
"field3": "bbb",
"field4": "0",
"active": "false",
"id": "2"
},
{
"field1": "c",
"field2": "cc",
"field3": "ccc",
"field4": "0",
"active": "true",
"id": "3"
},
{
"field1": "d",
"field2": "dd",
"field3": "ddd",
"field4": "0",
"active": "true",
"id": "4"
}
]
},
"res": "0"
}
Second, with jq you can generate the table output using column as follows:
{ echo Field1 Field2 Field3 ID; jq -r '.table.All[] | (.field1, .field2, .field3, .id)' data.json | xargs -L4; } | column -t
Output:
Field1 Field2 Field3 ID
a aa aaa 1
b bb bbb 2
c cc ccc 3
d dd ddd 4
Using sed:
echo "field1,field2,field3,id" ;cat data.json | jq -r '.table.All[] | (.field1, .field2, .field3, .id)' | xargs -L4 | sed 's/ /,/g'
Output:
field1,field2,field3,id
a,aa,aaa,1
b,bb,bbb,2
c,cc,ccc,3
d,dd,ddd,4
Update:
Without using sed or xargs, jq can format the output as CSV itself:
cat data.json | jq -r '.table.All[] | [.field1, .field2, .field3, .id] | @csv'
Output:
"a","aa","aaa","1"
"b","bb","bbb","2"
"c","cc","ccc","3"
"d","dd","ddd","4"
Thanks to chepner, who mentioned in the comments that the header can be added directly in jq as follows:
jq -r '(([["field1", "field2", "field3", "id"]]) + [(.table.All[] | [.field1,.field2,.field3,.id])])[]|@csv' data.json
Output:
"field1","field2","field3","id"
"a","aa","aaa","1"
"b","bb","bbb","2"
"c","cc","ccc","3"
"d","dd","ddd","4"
This command should work correctly with the latest JSON data you have provided in your question:
jq -r '(([["field1", "field2", "field3", "id"]]) + [(.main.All[] | [.field1,.field2,.field3,.id])])[]|@csv' data.json
([["field1", "field2", "field3", "id"]]) : The first part of the command is the CSV header row.
[(.main.All[] | [.field1,.field2,.field3,.id])] : Since main is the
parent key in your JSON, you select it with .main, and .main.All
returns the array All. Appending [] iterates over that array, so
.main.All[] emits each of its objects in turn. We then pick the
needed keys by piping the output of .main.All[] into a new array
that lists only the keys we want:
[.field1,.field2,.field3,.id]

Here's an only-jq solution that only requires specifying the desired keys once, e.g. on the command line:
jq -r --argjson f '["field1", "field2", "field3", "id"]' '
$f, (.table.All[] | [getpath( $f[]|[.])]) | @csv'
Output:
"field1","field2","field3","id"
"a","aa","aaa","1"
"b","bb","bbb","2"
"c","cc","ccc","3"
"d","dd","ddd","4"
Losing the quotation marks
One way to avoid quoting the strings would be to pipe into join(",") (or join(", ")) instead of @csv:
field1,field2,field3,id
a,aa,aaa,1
b,bb,bbb,2
c,cc,ccc,3
d,dd,ddd,4
Of course, this might be unacceptable if the values contain commas. In general, if avoiding the quotation marks around strings is important, a good option to consider is @tsv.
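As a minimal sketch of the unquoted variants described above, the same filter can end in join(",") or @tsv instead of @csv. The inline sample is a trimmed copy of the question's JSON (so the path is .main.All rather than .table.All), the shell variable names are only illustrative, and --argjson assumes jq 1.5 or later.

```shell
# Trimmed copy of the question's sample data (two rows only).
json='{"main":{"All":[{"field1":"a","field2":"aa","field3":"aaa","field4":"0","id":"1"},{"field1":"b","field2":"bb","field3":"bbb","field4":"0","id":"2"}]}}'
# The wanted keys, specified once and passed to jq as a JSON array.
fields='["field1","field2","field3","id"]'

# Header row first, then one array per object; join(",") leaves strings unquoted.
plain_csv=$(printf '%s' "$json" | jq -r --argjson f "$fields" '$f, (.main.All[] | [getpath($f[] | [.])]) | join(",")')

# Same rows through @tsv: tab-separated, also without quotation marks.
plain_tsv=$(printf '%s' "$json" | jq -r --argjson f "$fields" '$f, (.main.All[] | [getpath($f[] | [.])]) | @tsv')

printf '%s\n' "$plain_csv"
printf '%s\n' "$plain_tsv"
```

Unlike join(","), @tsv escapes tabs and newlines inside values, which is why it is usually the safer of the two when the data is not fully under your control.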

Related

Encode JSON value while disabling line wrapping

I originally had an issue with a JSON (let's call it "raw_data.json") that looked like this:
[
{
"metadata": {
"project": [
"someProject1",
"someProject2"
],
"table_name": [
"someTable1",
"someTable2"
],
"sys_insertdatetime": "someDate",
"sys_sourcesystem": "someSourceSystem"
},
"data": {
"field1": 63533712,
"field2": "",
"field3": "hello",
"field4": "other",
"field5": 2022,
"field6": "0",
"field7": "0",
"field8": "0",
"field9": "0",
"field10": "0",
"field11": "0"
}
},
{
"metadata": {
"project": [
"someProject1",
"someProject2"
],
"table_name": [
"someTable1",
"someTable2"
],
"sys_insertdatetime": "someDate",
"sys_sourcesystem": "someSourceSystem"
},
"data": {
"field1": 63533713,
"field2": "Y2JjLTIwMjItdzA1LXRyZi1vZmZyZXMtcmVuZm9ydC1jaGVxdWllci13MDU=",
"field3": "A0AVB",
"field4": "other",
"field5": "HJlbmZvcnQgY2hlcXVpZXIgVzA1",
"field6": "",
"field7": "02/02/2022",
"field8": "14/02/2022",
"field9": "Ticket"
}
}
]
The dictionaries were pretty big and usually spanned several lines on my monitor. That was an issue because I needed to encode the whole thing in Base64 and send the result with an HTTP POST request. But the encoding tended to wrap lines by adding newline characters, which caused my POST request to fail.
I fortunately found a question with the following solution:
export DATA=$(cat 'raw_data.json')
export ENCODED_DATA=$(echo "$DATA" | jq -c . | base64 -w 0)
Problem is, my JSON has now changed to this:
{
"request_id": 1234,
"data_to_be_encoded": [
{
"metadata": {
"project": [
"someProject1",
"someProject2"
],
"table_name": [
"someTable1",
"someTable2"
],
"sys_insertdatetime": "someDate",
"sys_sourcesystem": "someSourceSystem"
},
"data": {
"field1": 63533712,
"field2": "",
"field3": "hello",
"field4": "other",
"field5": 2022,
"field6": "0",
"field7": "0",
"field8": "0",
"field9": "0",
"field10": "0",
"field11": "0"
}
},
{
"metadata": {
"project": [
"someProject1",
"someProject2"
],
"table_name": [
"someTable1",
"someTable2"
],
"sys_insertdatetime": "someDate",
"sys_sourcesystem": "someSourceSystem"
},
"data": {
"field1": 63533713,
"field2": "Y2JjLTIwMjItdzA1LXRyZi1vZmZyZXMtcmVuZm9ydC1jaGVxdWllci13MDU=",
"field3": "A0AVB",
"field4": "other",
"field5": "HJlbmZvcnQgY2hlcXVpZXIgVzA1",
"field6": "",
"field7": "02/02/2022",
"field8": "14/02/2022",
"field9": "Ticket"
}
}
]
}
Basically I needed to keep the request_id key-value pair as is, while what was inside the data_to_be_encoded key had to be encoded. Again, another post offered a nice solution:
export DATA=$(cat 'raw_data.json')
export ENCODED_DATA=$(echo "$DATA" | jq '.data_to_be_encoded |= @base64')
Except for the fact that this solution adds line wrapping and I haven't found a way to disable that feature like I managed to do with the first solution.
I did try this:
export ENCODED_DATA=$(echo "$DATA" | jq -c .data_to_be_encoded | base64 -w 0)
But it only returns the value inside the data_to_be_encoded key and not the whole JSON. So I'm back to square one.
How can I get the best of both worlds? In other words, how can I disable the line wrapping while at the same time encoding a specific part of my JSON?
Use jq's @base64 builtin rather than doing the conversion outside jq: an external base64 call can only encode the whole output or nothing, whereas @base64 can be applied to just the one field.
jq '.data_to_be_encoded |= @base64'
{
"request_id": 1234,
"data_to_be_encoded": "W3sibWV0YWRhdGEiOnsicHJvamVjdCI6WyJzb21lUHJvamVjdDEiLCJzb21lUHJvamVjdDIiXSwidGFibGVfbmFtZSI6WyJzb21lVGFibGUxIiwic29tZVRhYmxlMiJdLCJzeXNfaW5zZXJ0ZGF0ZXRpbWUiOiJzb21lRGF0ZSIsInN5c19zb3VyY2VzeXN0ZW0iOiJzb21lU291cmNlU3lzdGVtIn0sImRhdGEiOnsiZmllbGQxIjo2MzUzMzcxMiwiZmllbGQyIjoiIiwiZmllbGQzIjoiaGVsbG8iLCJmaWVsZDQiOiJvdGhlciIsImZpZWxkNSI6MjAyMiwiZmllbGQ2IjoiMCIsImZpZWxkNyI6IjAiLCJmaWVsZDgiOiIwIiwiZmllbGQ5IjoiMCIsImZpZWxkMTAiOiIwIiwiZmllbGQxMSI6IjAifX0seyJtZXRhZGF0YSI6eyJwcm9qZWN0IjpbInNvbWVQcm9qZWN0MSIsInNvbWVQcm9qZWN0MiJdLCJ0YWJsZV9uYW1lIjpbInNvbWVUYWJsZTEiLCJzb21lVGFibGUyIl0sInN5c19pbnNlcnRkYXRldGltZSI6InNvbWVEYXRlIiwic3lzX3NvdXJjZXN5c3RlbSI6InNvbWVTb3VyY2VTeXN0ZW0ifSwiZGF0YSI6eyJmaWVsZDEiOjYzNTMzNzEzLCJmaWVsZDIiOiJZMkpqTFRJd01qSXRkekExTFhSeVppMXZabVp5WlhNdGNtVnVabTl5ZEMxamFHVnhkV2xsY2kxM01EVT0iLCJmaWVsZDMiOiJBMEFWQiIsImZpZWxkNCI6Im90aGVyIiwiZmllbGQ1IjoiSEpsYm1admNuUWdZMmhsY1hWcFpYSWdWekExIiwiZmllbGQ2IjoiIiwiZmllbGQ3IjoiMDIvMDIvMjAyMiIsImZpZWxkOCI6IjE0LzAyLzIwMjIiLCJmaWVsZDkiOiJUaWNrZXQifX1d"
}
If you need all in one line, use the -c option as before:
jq -c '.data_to_be_encoded |= @base64'
{"request_id":1234,"data_to_be_encoded":"W3sibWV0YWRhdGEiOnsicHJvamVjdCI6WyJzb21lUHJvamVjdDEiLCJzb21lUHJvamVjdDIiXSwidGFibGVfbmFtZSI6WyJzb21lVGFibGUxIiwic29tZVRhYmxlMiJdLCJzeXNfaW5zZXJ0ZGF0ZXRpbWUiOiJzb21lRGF0ZSIsInN5c19zb3VyY2VzeXN0ZW0iOiJzb21lU291cmNlU3lzdGVtIn0sImRhdGEiOnsiZmllbGQxIjo2MzUzMzcxMiwiZmllbGQyIjoiIiwiZmllbGQzIjoiaGVsbG8iLCJmaWVsZDQiOiJvdGhlciIsImZpZWxkNSI6MjAyMiwiZmllbGQ2IjoiMCIsImZpZWxkNyI6IjAiLCJmaWVsZDgiOiIwIiwiZmllbGQ5IjoiMCIsImZpZWxkMTAiOiIwIiwiZmllbGQxMSI6IjAifX0seyJtZXRhZGF0YSI6eyJwcm9qZWN0IjpbInNvbWVQcm9qZWN0MSIsInNvbWVQcm9qZWN0MiJdLCJ0YWJsZV9uYW1lIjpbInNvbWVUYWJsZTEiLCJzb21lVGFibGUyIl0sInN5c19pbnNlcnRkYXRldGltZSI6InNvbWVEYXRlIiwic3lzX3NvdXJjZXN5c3RlbSI6InNvbWVTb3VyY2VTeXN0ZW0ifSwiZGF0YSI6eyJmaWVsZDEiOjYzNTMzNzEzLCJmaWVsZDIiOiJZMkpqTFRJd01qSXRkekExTFhSeVppMXZabVp5WlhNdGNtVnVabTl5ZEMxamFHVnhkV2xsY2kxM01EVT0iLCJmaWVsZDMiOiJBMEFWQiIsImZpZWxkNCI6Im90aGVyIiwiZmllbGQ1IjoiSEpsYm1admNuUWdZMmhsY1hWcFpYSWdWekExIiwiZmllbGQ2IjoiIiwiZmllbGQ3IjoiMDIvMDIvMjAyMiIsImZpZWxkOCI6IjE0LzAyLzIwMjIiLCJmaWVsZDkiOiJUaWNrZXQifX1d"}
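To check that the single-line result still carries the original data, one option is to decode it again. This is only a sketch: it assumes jq 1.6 or later for @base64d, and the payload below is a shortened stand-in for the question's JSON rather than the real document.

```shell
# Shortened stand-in for the question's JSON (assumption: same shape, fewer fields).
payload='{"request_id":1234,"data_to_be_encoded":[{"data":{"field1":63533712,"field3":"hello"}}]}'

# Encode only the one field; -c keeps the whole document on a single line.
encoded=$(printf '%s' "$payload" | jq -c '.data_to_be_encoded |= @base64')

# Reverse the step: decode the string and parse it back into JSON.
decoded=$(printf '%s' "$encoded" | jq -c '.data_to_be_encoded |= (@base64d | fromjson)')

printf '%s\n' "$encoded"
printf '%s\n' "$decoded"
```

If the decoded document matches the input, the encoded field was produced without any inserted newlines.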

jq: Include the lookup key as a field in the result value

I have a JSON object of the following form:
{
"vars": {
"node1": {"field1": "a", "field2": "b"},
"node2": {"field1": "x", "field2": "y"},
"unrelated": {"blah": "blah"}
},
"nodes": ["node1", "node2"]
}
Now, I can get the fields per node (excluding unrelated) using the following jq expression:
.vars[.nodes[]]
Output:
{
"field1": "a",
"field2": "b"
}
{
"field1": "x",
"field2": "y"
}
My question is, how do I include the vars key as a field in the output, i.e.
{
"node": "node1",
"field1": "a",
"field2": "b"
}
{
"node": "node2",
"field1": "x",
"field2": "y"
}
The name of the key (node in the example) is not important.
Based on this post I found an approximate solution:
.vars | to_entries | map_values(.value + {node: .key})[]
which outputs
{
"field1": "a",
"field2": "b",
"node": "node1"
}
{
"field1": "x",
"field2": "y",
"node": "node2"
}
{
"blah": "blah",
"node": "unrelated"
}
But it still includes the unrelated field, which it shouldn't.
Store the nodes array's elements in a variable for reference. Storing the elements rather than the whole array automatically also iterates for the next step. Then, just compose your desired output objects using the nodes array item as object {$node} added to the looked-up object in .vars[$node].
jq '.nodes[] as $node | {$node} + .vars[$node]'
{
"node": "node1",
"field1": "a",
"field2": "b"
}
{
"node": "node2",
"field1": "x",
"field2": "y"
}

Parse 2 files based on key value and recreate another json file [JQ]

I am new to jq.
I need to make a JSON file based on two other files.
I have worked on it the whole day and I am stuck here. I badly need this.
Here is file 1
{
"name": "foo",
"key": "1",
"id": "x"
}
{
"name": "bar",
"key": "2",
"id": "x"
}
{
"name": "baz",
"key": "3",
"id": "y"
}
file 2
{
"name": "a",
"key": "1"
}
{
"name": "b",
"key": "1"
}
{
"name": "c",
"key": "2"
}
{
"name": "d",
"key": "2"
}
{
"name": "e",
"key": "3"
}
Expected Result:
{
"x": {
"foo": [
"a",
"b"
],
"bar": [
"c",
"d"
]
},
"y": {
"baz": [
"e"
]
}
}
I can do it with python script but I need it with jq.
Thanks in advance.
Use reduce on the first file's items ($i) to successively build up the result object using setpath with fields from the item and values as a matching map on the secondary dictionary file ($d).
jq -s --slurpfile d file2 '
reduce .[] as $i ({}; setpath(
[$i.id, $i.name];
[$d[] | select(.key == $i.key).name]
))
' file1
For efficiency, the following solution first constructs a "dictionary" based on file2; furthermore, it does so without having to "slurp" it.
< file2 jq -nc --slurpfile file1 file1 '
(reduce inputs as {$name, $key} ({};
.[$key] += [$name])) as $dict
| reduce $file1[] as {$name, $key, $id} ({};
.[$id] += {($name): $dict[$key]} )
'
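To make the "dictionary" step concrete, here is a small sketch that runs only the first reduce, with file2's objects from the question fed on stdin instead of read from a file. The shell variable name dict is just illustrative.

```shell
# file2's objects from the question, supplied as a stream on stdin.
dict=$(printf '%s\n' \
  '{"name":"a","key":"1"}' \
  '{"name":"b","key":"1"}' \
  '{"name":"c","key":"2"}' \
  '{"name":"d","key":"2"}' \
  '{"name":"e","key":"3"}' |
  jq -nc 'reduce inputs as {$name, $key} ({}; .[$key] += [$name])')

# Each key now maps to the list of names that share it.
printf '%s\n' "$dict"
```

The second reduce then only needs one lookup per item of file1, instead of scanning all of file2 for every item.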

jq slurp and add key / value pairs

$DATA is a long string containing some Email addresses.
echo "$DATA" | grep -Eo "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" | sort | uniq | jq --slurp --raw-input 'split("\n")[:-1]'
Output:
[
"email1@mydomain.com",
"email2@mydomain.com",
"email3@mydomain.com",
"email4@mydomain.com"
]
Desired Output:
[
{
"email": "email1@mydomain.com",
"free": "0",
"used": "0"
},
{
"email": "email2@mydomain.com",
"free": "0",
"used": "0"
},
{
"email": "email3@mydomain.com",
"free": "0",
"used": "0"
},
{
"email": "email4@mydomain.com",
"free": "0",
"used": "0"
}
]
I guess it should be something like += {"free": "0"}
You can replace your current jq command with the following (the values are quoted so that they come out as the strings "0" shown in the desired output):
jq --slurp --raw-input 'split("\n")[:-1] | map({email: ., free: "0", used: "0"})'
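A possible alternative, sketched here under the assumption of jq 1.6 or later, is to combine -n and -R with inputs so that each line arrives as its own string and no split or [:-1] trimming is needed. The two addresses are placeholders standing in for the grep output.

```shell
# Two placeholder addresses stand in for the output of the grep | sort | uniq pipeline.
emails=$(printf '%s\n' 'email1@mydomain.com' 'email2@mydomain.com' |
  jq -ncR '[inputs | {email: ., free: "0", used: "0"}]')

# One object per input line, collected into a single array.
printf '%s\n' "$emails"
```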

Filter json file objects into a separate json file with jq

At the moment I'm successfully exporting curl output with JQ into a file with valid json.
The command is as below:
jsonValues=<curl command> | jq '.["issues"] | map({key: .key, type: .fields.issuetype.name, typeid: .fields.issuetype.id, status: .fields.status.name, summary: .fields.summary})' > FullClosedIssueList.json;
You can see that I'm doing two things with this one command:
Putting all the results into jsonValues.
Exporting to FullClosedIssueList.json.
I find that the objects in jsonValues are missing the enclosing [ and ] and the , separators:
{
"key": "ON-12345",
"type": "Bug",
"typeid": "1",
"status": "Closed",
"summary": "Some Bug Title"
}
{
"key": "ON-12346",
"type": "Bug",
"typeid": "1",
"status": "Closed",
"summary": "Some Other Bug Title"
}
Whereas the file output is valid json.
[
{
"key": "ON-12345",
"type": "Bug",
"typeid": "1",
"status": "Closed",
"summary": "Some Bug Title"
},
{
"key": "ON-12346",
"type": "Bug",
"typeid": "1",
"status": "Closed",
"summary": "Some Other Bug Title"
}
]
What command do I need to add to JQ such that the objects passed to the bash variable are valid json?
EDIT: This is the same problem as described here
You need the "map" command there too, with the select command enclosed in it. Since the file already holds an array, map is applied to it directly, without a leading .[]:
jq 'map(select(.typeid=="1"))' FullClosedIssueList.json
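On the original goal of filling a shell variable and writing the file with one command: a line of the form jsonValues=<curl command> | jq ... > file does not capture jq's output in the variable. One possible approach is command substitution combined with tee. In this sketch a made-up response stands in for the curl call, and a temporary file stands in for FullClosedIssueList.json.

```shell
# Made-up response standing in for the curl output (assumption: same shape as the real API).
response='{"issues":[{"key":"ON-12345","fields":{"issuetype":{"name":"Bug","id":"1"},"status":{"name":"Closed"},"summary":"Some Bug Title"}}]}'

# Temporary file standing in for FullClosedIssueList.json.
outfile=$(mktemp)

# tee writes the array to the file and passes it on, so $( ) captures the same text.
jsonValues=$(printf '%s' "$response" |
  jq -c '.["issues"] | map({key: .key, type: .fields.issuetype.name, typeid: .fields.issuetype.id, status: .fields.status.name, summary: .fields.summary})' |
  tee "$outfile")

printf '%s\n' "$jsonValues"
```

Because map(...) returns a single array, both the variable and the file hold the same valid JSON document.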