How to convert nested JSON to CSV using only jq

I have the following JSON:
{
"A": {
"C": {
"D": "T1",
"E": 1
},
"F": {
"D": "T2",
"E": 2
}
},
"B": {
"C": {
"D": "T3",
"E": 3
}
}
}
I want to convert it to CSV as follows:
A,C,T1,1
A,F,T2,2
B,C,T3,3
Description of the output: the parent keys are printed until a leaf object is reached; once it is reached, its values are printed.
I tried the following without success:
cat my.json | jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $rows[] | @csv'
It throws an error.
I can't hardcode the parent keys, as the actual JSON has too many records, but the structure is similar throughout. What am I missing?

Some of the requirements are unclear, but the following solves one interpretation of the problem:
paths as $path
| {path: $path, value: getpath($path)}
| select(.value|type == "object" )
| select( [.value[]][0] | type != "object")
| .path + ([.value[]])
| @csv
(This program could be optimized but the presentation here is intended to make the separate steps clear.)
Invocation:
jq -r -f leaves-to-csv.jq input.json
Output:
"A","C","T1",1
"A","F","T2",2
"B","C","T3",3
Unquoted strings
To avoid the quotation marks around strings, you could replace the last component of the pipeline above with:
join(",")
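Putting the steps together, here is a minimal sketch of a complete run with the unquoted variant. It assumes jq 1.5 or later; the input is inlined in a shell variable (a stand-in for my.json) and the program is passed on the command line rather than via leaves-to-csv.jq.

```shell
# Sample input from the question, inlined so the snippet is self-contained.
input='{"A":{"C":{"D":"T1","E":1},"F":{"D":"T2","E":2}},"B":{"C":{"D":"T3","E":3}}}'

result=$(printf '%s' "$input" | jq -r '
  paths as $path
  | {path: $path, value: getpath($path)}
  | select(.value | type == "object")          # keep objects only
  | select([.value[]][0] | type != "object")   # whose first value is a leaf
  | .path + [.value[]]                         # parent keys, then leaf values
  | join(",")
')
printf '%s\n' "$result"
```

Note that join(",") does no CSV escaping, so it is only safe when the values contain no commas, quotes, or newlines.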

Here is a solution using tostream and group_by:
[
tostream
| select(length == 2) # e.g. [["A","C","D"],"T1"]
| .[0][:-1] + [.[1]] # ["A","C","T1"]
]
| group_by(.[:-1]) # [[["A","C","T1"],["A","C",1]],...
| .[] # [["A","C","T1"],["A","C",1]]
| .[0][0:2] + map(.[-1]|tostring) # ["A","C","T1","1"]
| join(",") # "A,C,T1,1"
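The .[0][0:2] step above assumes exactly two parent keys. If the nesting depth varies, one possible tweak (an assumption on my part, not something the question asks for) is to take everything but the last element instead. The input below is hypothetical and mixes two depths:

```shell
# Hypothetical input: one leaf object under two keys, another under three.
input='{"A":{"C":{"D":"T1","E":1}},"X":{"Y":{"Z":{"D":"T4","E":4}}}}'

result=$(printf '%s' "$input" | jq -r '
  [ tostream
    | select(length == 2)                # [path, leaf] events only
    | .[0][:-1] + [.[1]] ]               # parent keys followed by the leaf value
  | group_by(.[:-1])                     # one group per leaf object
  | .[]
  | .[0][:-1] + map(.[-1] | tostring)    # parent keys at any depth, then the values
  | join(",")
')
printf '%s\n' "$result"
```

Because group_by sorts its groups by key, the rows come out in sorted order of the parent path rather than in input order.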

Related

How to get the first object after filtering an array in jq?

Given the following JSON
{
"tags": [
{
"key": "env",
"value": "foo"
},
{
"key": "env",
"value": "bar"
}
]
}
I am trying to find the first tag whose key is env. I have this:
.tags[] | select (.key=="env") |.[0]
but that gives me the error "Cannot index object with number".
Use first(expr), which emits only the first result of the expression you give it:
first(.tags[]? | select(.key == "env") .value)
You could wrap the results of your query in an array and then pick the first one
[.tags[] | select(.key=="env")] | .[0]
jq -r 'first( .tags[] | select(.key=="env") ).value'
.tags[] flattens the array into a stream of values, so .[0] is applied to each object in turn rather than to a filtered array. To filter an array, you'd use
.tags | map(select(...)) | .[0]
or
.tags | map(select(...)) | first
map(...) is a shorthand for [ .[] | ... ], so the above is equivalent to
.tags | [ .[] | select(...) ] | first
and
[ .tags[] | select(...) ] | first
Finally, [ ... ] | first can be written as first(...).
first( .tags[] | select(...) )
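The two forms are not quite interchangeable when nothing matches. The sketch below runs both against the sample input from the question; the key "nope" is a made-up value used only to force the no-match case.

```shell
# Sample input from the question, inlined for a self-contained run.
input='{"tags":[{"key":"env","value":"foo"},{"key":"env","value":"bar"}]}'

# First matching object, printed compactly.
first_tag=$(printf '%s' "$input" | jq -c 'first(.tags[] | select(.key == "env"))')

# No match: first(...) emits nothing at all, while the array form yields null.
none_stream=$(printf '%s' "$input" | jq -c 'first(.tags[] | select(.key == "nope"))')
none_array=$(printf '%s' "$input" | jq -c '[.tags[] | select(.key == "nope")] | .[0]')

printf '%s\n' "$first_tag" "[$none_stream]" "$none_array"
```

Which behavior is preferable depends on the caller: an empty result is convenient in pipelines, whereas an explicit null is easier to test for in scripts.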

Using jq to convert json to csv

I'm trying to come up with the correct jq syntax to convert json to csv.
Desired results:
<email>,<id>,<name>
e.g.
user1@whatever.nevermind.no,0,general
user2@whatever.nevermind.no,0,general
user1@whatever.nevermind.no,1,local
...
Note that objects with empty "agent_priorities" also need to be ignored.
Input
[
{
"id": 0,
"name": "General",
"agent_priorities": {
"user1@whatever.nevermind.no": "normal",
"user2@whatever.nevermind.no": "normal"
}
},
{
"id": 1,
"name": "local",
"agent_priorities": {
"user1@whatever.nevermind.no": "normal"
}
},
{
"id": 2,
"name": "Engineering"
}
]
The following variant of the accepted answer checks for the existence of the "agent_priorities" key as per the requirements, and uses keys_unsorted to preserve the order of the keys:
jq -r '
.[]
| select(has("agent_priorities"))
| .id as $id
| .name as $name
| .agent_priorities
| keys_unsorted[]
| [., $id, $name ]
| @csv
' file.json
Store the id and name in variables, then iterate over the keys of agent_priorities:
jq -r '.[]
| .id as $id
| .name as $name
| .agent_priorities
| keys
| .[]
| [., $id, $name ]
| @csv
' file.json
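As written, this second version applies keys to null for an object that lacks "agent_priorities" altogether, which jq rejects. A minimal sketch of one way around that, assuming it is acceptable to treat a missing key like an empty object; the example.com addresses are stand-ins for the ones in the question:

```shell
# Input modelled on the question; the third object has no "agent_priorities".
input='[
  {"id": 0, "name": "General",
   "agent_priorities": {"user1@example.com": "normal", "user2@example.com": "normal"}},
  {"id": 1, "name": "local",
   "agent_priorities": {"user1@example.com": "normal"}},
  {"id": 2, "name": "Engineering"}
]'

result=$(printf '%s' "$input" | jq -r '
  .[]
  | .id as $id
  | .name as $name
  | (.agent_priorities // {})   # a missing key behaves like an empty object
  | keys_unsorted[]             # emits nothing for an empty object
  | [., $id, $name]
  | @csv
')
printf '%s\n' "$result"
```

An empty "agent_priorities" object needs no special handling, since iterating over zero keys simply produces no rows.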

jq - converting json to csv - how to treat "null" as string?

I have the following json file which I would like to convert to csv:
{
"id": 1,
"date": "2014-05-05T19:07:48.577"
}
{
"id": 2,
"date": null
}
Converting it to csv with the following jq produces:
$ jq -sr '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv' < test.json
"date","id"
"2014-05-05T19:07:48.577",1
,2
Unfortunately, for the line with "id" equal to "2", the date column was not set to "null" - instead, it was empty. This in turn makes MySQL error on import if it's a datetime column (it expects a literal "null" if we don't have a date, and errors on "").
How can I make jq print the literal "null", and not ""?
I'd go with:
(map(keys_unsorted) | add | unique) as $cols
| $cols,
(.[] | [.[$cols[]]] | map(. // "null") )
| @csv
First, using keys_unsorted avoids useless sorting.
Second, [.[$cols[]]] is an important, recurrent and idiomatic pattern, used to ensure an array is constructed in the correct order without resorting to the reduce sledge-hammer.
Third, although map(. // "null") seems to be appropriate here, it should be noted that this expression will also replace false with "null", so, it would not be appropriate in general. Instead, to preserve false, one could write map(if . == null then "null" else . end).
Fourth, it should be noted that using map(. // "null") as above will also mask missing values of any of the keys, so if one wants some other behavior (e.g., raising an error if id is missing), then an alternative approach would be warranted.
The above assumes the stream of JSON objects shown in the question is "slurped", e.g. using jq's -s command-line option.
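To make the difference between the two replacements concrete, here is a small sketch that runs both on the same input. The boolean "ok" field is an assumption added purely for illustration; it is not part of the question's data.

```shell
# Hypothetical input: the two records from the question plus an assumed "ok" field.
input='{"id": 1, "date": "2014-05-05T19:07:48.577", "ok": false}
{"id": 2, "date": null, "ok": true}'

# Only null is replaced; false survives.
strict=$(printf '%s' "$input" | jq -sr '
  (map(keys_unsorted) | add | unique) as $cols
  | $cols,
    (.[] | [.[$cols[]]] | map(if . == null then "null" else . end))
  | @csv
')

# For comparison: the alternative operator also replaces false.
loose=$(printf '%s' "$input" | jq -sr '
  (map(keys_unsorted) | add | unique) as $cols
  | $cols,
    (.[] | [.[$cols[]]] | map(. // "null"))
  | @csv
')
printf '%s\n\n%s\n' "$strict" "$loose"
```

In the first result the "ok" column of record 1 stays false; in the second it becomes the string "null".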
Use // as alternative operator for your cell value:
jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.] // "null")) as $rows | $cols, $rows[] | @csv' < test.json
(The whole command is explained quite well here: https://stackoverflow.com/a/32965227/16174836)
You can "stringify" the value using tostring by changing map($row[.]) into map($row[.]|tostring):
$ cat so2332.json
{
"id": 1,
"date": "2014-05-05T19:07:48.577"
}
{
"id": 2,
"date": null
}
$ jq --slurp --raw-output '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.]|tostring)) as $rows | $cols, $rows[] | @csv' so2332.json
"date","id"
"2014-05-05T19:07:48.577","1"
"null","2"
Note that the use of tostring will cause the numbers to be converted to strings.

jq: error (at <stdin>:0): Cannot iterate over string, cannot execute unique problem

We are trying to convert a JSON file to a TSV file. We are having problems eliminating duplicate Ids with unique.
JSON file
[
{"Id": "101",
"Name": "Yugi"},
{"Id": "101",
"Name": "Yugi"},
{"Id": "102",
"Name": "David"}
]
cat getEvent_all.json | jq -cr '.[] | [.Id] | unique_by(.[].Id)'
jq: error (at <stdin>:0): Cannot iterate over string ("101")
A reasonable approach would be to use unique_by, e.g.:
unique_by(.Id)[]
| [.Id, .Name]
| @tsv
Alternatively, you could form the pairs first:
map([.Id, .Name])
| unique_by(.[0])[]
| @tsv
uniques_by/2
For very large arrays, though, or if you want to respect the original ordering, a sort-free alternative to unique_by should be considered. Here is a suitable, generic, stream-oriented alternative:
def uniques_by(stream; f):
foreach stream as $x ({};
($x|f) as $s
| ($s|type) as $t
| (if $t == "string" then $s
else ($s|tostring) end) as $y
| if .[$t][$y] then .emit = false
else .emit = true | (.item = $x) | (.[$t][$y] = true)
end;
if .emit then .item else empty end );
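The definition alone does not show how it is called, so here is a minimal usage sketch. It repeats the definition to stay self-contained and feeds it the array from the question (with the trailing comma removed so that it is valid JSON); the invocation uniques_by(.[]; .Id) is my reading of the intended call, with the array elements as the stream and .Id as the key.

```shell
# Array from the question, as valid JSON.
input='[{"Id":"101","Name":"Yugi"},{"Id":"101","Name":"Yugi"},{"Id":"102","Name":"David"}]'

result=$(printf '%s' "$input" | jq -r '
  def uniques_by(stream; f):
    foreach stream as $x ({};
      ($x | f) as $s
      | ($s | type) as $t
      | (if $t == "string" then $s else ($s | tostring) end) as $y
      | if .[$t][$y] then .emit = false
        else .emit = true | (.item = $x) | (.[$t][$y] = true)
        end;
      if .emit then .item else empty end);

  # Stream the elements, keep the first item seen for each Id, in input order.
  uniques_by(.[]; .Id)
  | [.Id, .Name]
  | @tsv
')
printf '%s\n' "$result"
```

Unlike unique_by, this keeps the rows in their original order and never builds a sorted copy of the array.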

Pad JSON array with JQ to obtain rectangular result

I have JSON that looks like this (jq play in the link), and in the end I want to build a CSV that looks like this (reproducible sample at the bottom):
"SO302993",items1,item2,item3.1,item3.2,item3.3, item3.4,...
"SO302994",items1,item2,item3.1,item3.2, , ,...
"SO302995",items1,item2,item3.1,item3.2,item3.3, ,...
The item3 elements are in an array, and my current solution:
.[] | [.number, .item1, .item2, .item3[]?]
gives me this:
"SO302993",items1,item2,item3.1,item3.2,item3.3, item3.4,...
"SO302994",items1,item2,item3.1,item3.2,...
"SO302995",items1,item2,item3.1,item3.2,item3.3,...
which will create an uneven number of columns in the csv.
I tried adding .item3[:]?, Python-style, but it didn't work.
Any help would be much appreciated! And if I wasn't clear do ask to clarify! My snippet and toy data are in the link above.
{
"items": [
{
"name": "Mr Simon Mackin",
"country_of_residence": "Scotland",
"natures_of_control": [
"voting-rights-25-to-50-percent-limited-liability-partnership",
"significant-influence-or-control-limited-liability-partnership"
],
"premises": "4"
}
]
}
{
"items": [
{
"name": "Mrs Simonne Mackinni",
"country_of_residence": "France",
"natures_of_control": [
"significant-influence-or-control-limited-liability-partnership"
],
"premises": "4"
}
]
}
With this query:
.items[] | [.name, .country_of_residence, .natures_of_control[]?, .premises] | @csv
I get these results:
"Mr Simon Mackin","Scotland","voting-rights","significant-influence","4"
"Mrs Simonne Mackinni","France","significant-influence","4"
But I'd like to get this (the second line has an extra comma after "significant-influence"):
"Mr Simon Mackin","Scotland","voting-rights","significant-influence","4"
"Mrs Simonne Mackinni","France","significant-influence",,"4"
Since you want a rectangular result, you will have to "pad" the "natures_of_control" array. Based on the sample input, you will need to "slurp" the input in order to obtain a global maximum.
To pad the array, you could use the helper function:
# emit a stream of exactly $n items
def pad($n): range(0;$n) as $i | .[$i];
The solution to the problem as posted on jqplay then becomes:
([.[] | .items[] | .natures_of_control | length] | max) as $mx
| .[]
| (.active_count) as $active_count
| (.ceased_count) as $ceased_count
| (.links.self | split("/")[2]) as $companyCode
| .items[]
| [$companyCode, $active_count, $ceased_count, .name, .country_of_residence, .nationality, .notified_on, (.natures_of_control | pad($mx))]
| @csv
Invocation
The appropriate invocation would look like this:
jq -sr -f program.jq input.json
Handling missing data
To ignore objects that have no "items" you could tweak the above, e.g. as follows:
([.[] | .items[]? | .natures_of_control | length] | max) as $mx
| .[]
| select(.items)
| (.active_count) as $active_count
| (.ceased_count) as $ceased_count
| (.links.self | split("/")[2]) as $companyCode
| .items[]
| [$companyCode, $active_count, $ceased_count, .name, .country_of_residence, .nationality, .notified_on, (.natures_of_control | pad($mx))]
| @csv
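For the simplified sample shown in the question, the same padding idea can be sketched as follows. The "natures_of_control" values are shortened here the way the question's expected output shortens them, and the program is passed inline rather than via program.jq; jq 1.5 or later is assumed for the def pad($n) syntax.

```shell
# Two input documents, as in the question (values shortened for readability).
input='{"items":[{"name":"Mr Simon Mackin","country_of_residence":"Scotland","natures_of_control":["voting-rights","significant-influence"],"premises":"4"}]}
{"items":[{"name":"Mrs Simonne Mackinni","country_of_residence":"France","natures_of_control":["significant-influence"],"premises":"4"}]}'

result=$(printf '%s' "$input" | jq -sr '
  # emit a stream of exactly $n items; positions past the end yield null
  def pad($n): range(0; $n) as $i | .[$i];

  ([.[] | .items[] | .natures_of_control | length] | max) as $mx
  | .[]
  | .items[]
  | [.name, .country_of_residence, (.natures_of_control | pad($mx)), .premises]
  | @csv
')
printf '%s\n' "$result"
```

The padding works because @csv renders null as an empty field, which is what produces the extra comma in the second row.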