How to parse nested json to csv using command line - json

I want to parse a nested json to csv. The data looks similar to this.
{"tables":[{"name":"PrimaryResult","columns":[{"name":"name","type":"string"},{"name":"id","type":"string"},{"name":"custom","type":"dynamic"}]"rows":[["Alpha","1","{\"age\":\"23\",\"number\":\"xyz\"}]]]}
I want csv file as:
name id age number
alpha 1 23 xyz
I tried:
jq -r ".tables | .[] | .columns | map(.name)|@csv" demo.json > demo.csv
jq -r ".tables | .[] | .rows |.[]|@csv" demo.json >> demo.csv
But I am not getting expected result.
Output:
name id custom
alpha 1 {"age":"23","number":"xyz}
Expected:
name id age number
alpha 1 23 xyz

Assuming valid JSON input:
{
"tables": [
{
"name": "PrimaryResult",
"columns": [
{ "name": "name", "type": "string" },
{ "name": "id", "type": "string" },
{ "name": "custom", "type": "dynamic" }
],
"rows": [
"Alpha",
"1",
"{\"age\":\"23\",\"number\":\"xyz\"}"
]
}
]
}
And assuming fixed headers:
jq -r '["name", "id", "age", "number"],
(.tables[].rows | [.[0,1], (.[2] | fromjson | .age, .number)])
| @csv' input.json
Output:
"name","id","age","number"
"Alpha","1","23","xyz"
If any of the assumptions is wrong, you need to clarify your requirements, e.g.
How are column names determined?
What happens if the input contains multiple tables?
Is the "dynamic" object always of the same shape? Or can it sometimes contain fewer, more, or different columns?

Assuming that the .rows array is a 2D array of rows and fields, and that a column of type "dynamic" always holds a JSON-encoded object whose fields represent further columns, which may or may not be present in every row:
You could transpose the headers array and the rows array so that each column is processed together with its type, collecting all keys from the "dynamic" columns on the fly, and then transpose the result back to get row-based CSV output.
Input (I have added another row for illustration):
{
"tables": [
{
"name": "PrimaryResult",
"columns": [
{
"name": "name",
"type": "string"
},
{
"name": "id",
"type": "string"
},
{
"name": "custom",
"type": "dynamic"
}
],
"rows": [
[
"Alpha",
"1",
"{\"age\":\"23\",\"number\":\"123\"}"
],
[
"Beta",
"2",
"{\"age\":\"45\",\"word\":\"xyz\"}"
]
]
}
]
}
Filter:
jq -r '
.tables[] | [.columns, .rows[]] | transpose | map(
if first.type == "string" then first |= .name
elif first.type == "dynamic" then
.[1:] | map(fromjson)
| (map(keys[]) | unique) as $keys
| [$keys, (.[] | [.[$keys[]]])] | transpose[]
else empty end
)
| transpose[] | @csv
'
Output:
"name","id","age","number","word"
"Alpha","1","23","123",
"Beta","2","45",,"xyz"
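For readers who want to cross-check the transpose-based filter outside jq, here is a minimal Python sketch of the same idea. It assumes the input shape shown above; the function name flatten_table is hypothetical, and missing keys are filled with empty strings rather than jq's null.

```python
import json


def flatten_table(table):
    """Expand JSON-encoded "dynamic" columns into one column per key."""
    columns, rows = table["columns"], table["rows"]
    header = []
    out_rows = [[] for _ in rows]
    for idx, col in enumerate(columns):
        if col["type"] == "string":
            # Plain column: copy the name and the cell values unchanged.
            header.append(col["name"])
            for out, row in zip(out_rows, rows):
                out.append(row[idx])
        elif col["type"] == "dynamic":
            decoded = [json.loads(row[idx]) for row in rows]
            # Union of keys over all rows, sorted (like jq's `unique`).
            keys = sorted({key for obj in decoded for key in obj})
            header.extend(keys)
            for out, obj in zip(out_rows, decoded):
                out.extend(obj.get(key, "") for key in keys)
        # Columns of any other type are skipped, as in the jq filter.
    return [header] + out_rows
```

The returned list of rows can be passed to csv.writer from the standard library to get properly quoted CSV.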

Related

How to remove last "column" in a hash of an array of array of hashes JSON input using jq 1.5?

I want to convert the following JSON content stored in a file tmp.json
{
"results": [
[
{
"field": "field1",
"value": "value1-1"
},
{
"field": "field2",
"value": "value1-2"
},
{
"field": "field3",
"value": "value1-3"
}
],
[
{
"field": "field1",
"value": "value2-1"
},
{
"field": "field2",
"value": "value2-2"
},
{
"field": "field3",
"value": "value2-3"
}
],
[
{
"field": "field1",
"value": "value3-1"
},
{
"field": "field2",
"value": "value3-2"
},
{
"field": "field3",
"value": "value3-3"
}
]
]
}
into CSV output:
"field1","field2"
"value1-1","value1-2"
"value2-1","value2-2"
"value3-1","value3-2"
The closest jq expression I've come up with is this:
cat ./tmp.json | jq -r '.results | [ .[] | del(last) ] | (first | map(.field)), (.[] | map(.value)) | @csv'
It works for jq version 1.6, but for version 1.5, the last "column" is still included in the CSV result. How do I edit the jq code so that it works for version 1.5?
Note that the number of columns is not limited to 3; it can be more. The jq code should be able to remove the last column in the final CSV result.
You can use the slice operator to drop the last element of each inner array, collect the results into an array, and then apply @csv:
[ .results[][:-1] ] | (first | map(.field)), (.[] | map(.value)) | @csv
The part .results[][:-1] extracts all the elements of each inner array except the last one. From the manual:
Either index may be negative (in which case it counts backwards from the end of the array), or omitted (in which case it refers to the start or end of the array).
See jqplay for a working demo.
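Python list slices follow the same negative-index convention, so the logic of the filter can be sketched there as well. This is only an illustration under the input shape from the question; drop_last_column is a hypothetical name.

```python
def drop_last_column(results):
    """Return a header row plus value rows, without the last column."""
    # row[:-1] keeps every element except the last, like jq's .[:-1].
    trimmed = [row[:-1] for row in results]
    header = [cell["field"] for cell in trimmed[0]]
    body = [[cell["value"] for cell in row] for row in trimmed]
    return [header] + body
```

Because the slice is relative to the end of each row, the sketch works for any number of columns, not just three.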
I was able to reproduce the bug in jq 1.5. Several major bugs in del/1 were fixed as part of release 1.6, so consider upgrading if you want to use del/1, or use the slice expression shown in the answer above.

use jq to format json data into csv data

{
"Users": [
{
"Attributes": [
{
"Name": "sub",
"Value": "1"
},
{
"Name": "phone_number",
"Value": "1234"
},
{
"Name": "referral_code",
"Value": "abc"
}
]
},
{
"Attributes": [
{
"Name": "sub",
"Value": "2"
},
{
"Name": "phone_number",
"Value": "5678"
},
{
"Name": "referral_code",
"Value": "def"
}
]
}
]
}
How can I produce output like the following?
1,1234,abc
2,5678,def
jq '.Users[] .Attributes[] .Value' test.json
produces
1
1234
abc
2
5678
def
Not sure this is the cleanest way to handle this, but the following will get the desired output:
.Users[].Attributes | map(.Value) | @csv
Loop through all the deep Attributes with .Users[].Attributes
Use map() to get all the Values
Convert to @csv
If you don't need the output to be guaranteed to be CSV, and if you're sure the "Name" values are presented in the same order, you could go with:
.Users[].Attributes
| from_entries
| [.[]]
| join(",")
To be safe though it would be better to ensure consistency of ordering:
(.Users[0] | [.Attributes[] | .Name]) as $keys
| .Users[]
| .Attributes
| from_entries
| [.[ $keys[] ]]
| join(",")
Using join(",") will produce the comma-separated values as shown in the Q (without the quotation marks), but is not guaranteed to produce the expected CSV for all valid values of the input. If you don't mind the pesky quotation marks, you could use @csv, or if you want to skip the quotation marks around all numeric values:
map(tonumber? // .) | @csv
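The two points above (fix the column order from the first user, and prefer a real CSV writer over a plain join) can be sketched in Python as follows. This is an illustrative sketch only; attributes_to_rows and to_csv are hypothetical helper names, and the csv module's default minimal quoting differs from jq's @csv, which quotes every string.

```python
import csv
import io


def attributes_to_rows(users):
    """Build one row per user, ordered by the first user's attribute names."""
    keys = [attr["Name"] for attr in users[0]["Attributes"]]
    rows = []
    for user in users:
        by_name = {attr["Name"]: attr["Value"] for attr in user["Attributes"]}
        rows.append([by_name.get(key) for key in keys])
    return rows


def to_csv(rows):
    """Serialize rows with proper CSV quoting (unlike a plain ','.join)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

A value that itself contains a comma shows why the plain join is not safe: the writer quotes it, while a join would silently produce an extra column.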

using jq : how can i use the same search in other field without duplicate code?

I have the following json file, for example:
{
"FOO": {
"name": "Donald",
"location": "Stockholm"
},
"BAR": {
"name": "Walt",
"location": "Stockholm"
},
"BAZ": {
"name": "Jack",
"location": "Whereever"
}
}
and I have this jq command:
cat json | jq '.[] | {newname: (select(.location=="Stockholm") | .name), contains_w: (select(.location=="Stockholm") | .name | startswith("W"))}'
which gives me the result:
{
"newname": "Donald",
"contains_w": false
}
{
"newname": "Walt",
"contains_w": true
}
My question is: is there any way to DRY up my command?
I mean, how can I get the same result without duplicating this part:
select(.location=="Stockholm") | .name
How can I reuse the result of the newname field?
I have a really big file to work with, so I don't want to waste time and resources.
You are filtering multiple times during object construction. You could filter first and then do the construction on the filtered list, e.g.
map(select(.location=="Stockholm"))
| map({newname: .name, contains_w: (.name | startswith("W"))})
https://jqplay.org/s/aXjlgOEDnb
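The same "filter once, then build" pattern can be sketched in Python for comparison. This is a minimal illustration, assuming the object-of-objects input from the question; stockholm_names is a hypothetical name.

```python
def stockholm_names(people):
    """Filter by location once, then build each output object."""
    result = []
    for person in people.values():
        if person["location"] != "Stockholm":
            continue  # filtered out before any construction happens
        name = person["name"]  # looked up once, reused for both fields
        result.append({"newname": name, "contains_w": name.startswith("W")})
    return result
```

Like the jq answer, this returns a list of objects rather than a stream of separate ones.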

Combine JSON Field and add values using jq

I have to aggregate a few JSON results from a site. Because the site has a query concurrency limit and the queries time out, the time frame for the queries has to be divided. So I am left with JSON as follows:
{
"results": [
[
{
"field": "AccountId",
"value": "11352"
},
{
"field": "number_of_requests",
"value": "241398"
}
],
[
{
"field": "AccountId",
"value": "74923"
},
{
"field": "number_of_requests",
"value": "238566"
}
]
],
"statistics": {
"recordsMatched": 502870.0,
"recordsScanned": 165908292.0,
"bytesScanned": 744173091162.0
},
"status": "Complete"
}
{
"results": [
[
{
"field": "AccountId",
"value": "11352"
},
{
"field": "number_of_requests",
"value": "185096"
}
]
],
"statistics": {
"recordsMatched": 502870.0,
"recordsScanned": 165908292.0,
"bytesScanned": 744173091162.0
},
"status": "Complete"
}
I need to aggregate the results, match the values to the number of requests and print out the result in descending Order.
Desired Output:
AccountID : Number of Requests
11352 : 426494
74923 : 238566
Current Output:
AccountID : Number of Requests
11352 : 241398
11352 : 185096
74923 : 238566
The jq query I am currently running takes the file name as ResultsDir:
list=$(jq -S '.results[] | map( { (.field) : .value} ) | add ' $ResultsDir |
jq -s -c 'sort_by(.number_of_requests|tonumber) | reverse[] ' |
jq -r '"\(.AccountId) : \(.number_of_requests)"')
How do I combine the results of the same accounts before printing it out? The results also need to be in descending order of number of requests.
When possible, it's generally advisable to minimize the number of calls to jq. In this case, it's easy enough to achieve the desired output with just one call to jq.
Assuming the input is a valid stream of JSON objects along the lines shown in the Q, the following produces the desired output:
jq -nr '
[inputs | .results[] | map( { (.field) : .value} ) | add]
| group_by(.AccountId)
| map([.[0].AccountId, (map(.number_of_requests|tonumber) | add)])
| sort_by(.[1]) | reverse
| .[]
| join(" : ")
'
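As a cross-check of the group-and-sum logic, here is a rough Python equivalent. It assumes the input is a stream of concatenated JSON objects as shown in the Q; iter_json_documents and total_requests are hypothetical helper names.

```python
import json
from collections import defaultdict


def iter_json_documents(text):
    """Yield each top-level JSON value from a concatenated stream."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def total_requests(documents):
    """Sum number_of_requests per AccountId, largest total first."""
    totals = defaultdict(int)
    for doc in documents:
        for record in doc["results"]:
            fields = {cell["field"]: cell["value"] for cell in record}
            totals[fields["AccountId"]] += int(fields["number_of_requests"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Each (account, total) pair can then be printed with an f-string such as f"{account} : {total}" to match the desired layout.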

jq: Insert values according to mappings from external file

I was wondering how I can complete this task by command line jq. I make up a file with similar nested structure as follows:
{
"item": "item1",
"features": [
{
"feature": "feature_a",
"value": ""
},
{
"feature": "feature_b",
"value": ""
}
]
}
Now I have another file that maps the feature to value:
feature_a value_1
feature_b value_2
So I would like to insert the values into the first json file according to the mapping, resulting in the following output:
{
"item": "item1",
"features": [
{
"feature": "feature_a",
"value": "value_1"
},
{
"feature": "feature_b",
"value": "value_2"
}
]
}
How can I achieve the above operation with jq?
Thanks in advance!
Assuming the text file is in dict.txt and the JSON file is in source.json, the invocation
jq -Rs --argfile target source.json '
([ split("\n")[]
| select(length>0)
| split(" ")
| { (.[0]): .[1]} ]
| add) as $dict
| $target
| .features |= map(.value = $dict[.feature])' dict.txt
would yield the desired output.
The main reason for including select(length>0) is to skip any empty strings that might result from using split("\n") to split an entire file.
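The same two steps (build a dictionary from the text file, then fill in each value) can be sketched in Python to make the role of the empty-line check concrete. This is an illustration under the file formats shown in the question; load_mapping and fill_values are hypothetical names.

```python
def load_mapping(text):
    """Parse 'feature value' lines into a dict, skipping empty lines."""
    mapping = {}
    for line in text.split("\n"):
        if not line:
            # A trailing newline leaves an empty string after the split,
            # which is what select(length>0) guards against in jq.
            continue
        parts = line.split(" ")
        mapping[parts[0]] = parts[1] if len(parts) > 1 else None
    return mapping


def fill_values(item, mapping):
    """Set each feature's value from the mapping (None if unmapped)."""
    for feature in item["features"]:
        feature["value"] = mapping.get(feature["feature"])
    return item
```

Without the empty-line check, the trailing empty string would be parsed as a line with no key and no value.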