JQ - Denormalize nested object - json

I've been trying to convert some JSON to CSV and have run into the following problem.
I have this input JSON:
{"id": 100, "a": [{"t" : 1,"c" : 2 }, {"t": 2, "c" : 3 }] }
{"id": 200, "a": [{"t": 2, "c" : 3 }] }
{"id": 300, "a": [{"t": 1, "c" : 3 }] }
And I expect the following CSV output:
id,t1,t2
100,2,3
200,,3
300,3,
Unfortunately, jq produces no output at all if one of the selects has no match.
Example:
echo '{ "id": 100, "a": [{"t" : 1,"c" : 2 }, {"t": 2, "c" : 3 }] }' | jq '{t1: (.a[] | select(.t==1)).c , t2: (.a[] | select(.t==2)).c }'
output:
{ "t1": 2, "t2": 3 }
but if one of the selects finds no match, the object is not emitted at all.
Example:
echo '{ "id": 100, "a": [{"t" : 1,"c" : 2 }] }' | jq '{t1: (.a[] | select(.t==1)).c , t2: (.a[] | select(.t==2)).c }'
Expected output:
{ "t1": 2, "t2": null }
Does anyone know how to achieve this with JQ?
EDIT:
Based on a comment made by @peak I found the solution that I was looking for:
jq -r '["id","t1","t2"], [.id, (.a[] | select(.t==1)).c // null, (.a[] | select(.t==2)).c // null] | @csv'
The alternative operator does exactly what I was looking for.
Alternative Operator
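For example, applying the alternative operator to the failing case above (the parenthesization is mine) yields the missing null instead of dropping the object:
echo '{ "id": 100, "a": [{"t" : 1,"c" : 2 }] }' | jq -c '{t1: ((.a[] | select(.t==1)).c // null), t2: ((.a[] | select(.t==2)).c // null)}'
output:
{"t1":2,"t2":null}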

Here's a simple solution that does not assume anything about the ordering of the items in the .a array, and easily generalizes to arbitrarily many .t values:
# Convert an array of {t, c} to a dictionary:
def tod: map({(.t|tostring): .c}) | add;
["id", "t1", "t2"], # header
(inputs
| (.a | tod) as $dict
| [.id, (range(1;3) as $i | $dict[$i|tostring]) ])
| @csv
Command-line options
Use the -n option (because inputs is being used), and the -r option (to produce CSV).
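For example, assuming the filter above is saved as tod.jq and the three sample records as input.json, the invocation and expected output would be roughly:
$ jq -nr -f tod.jq input.json
"id","t1","t2"
100,2,3
200,,3
300,3,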

This is an absolute mess, but it works:
$ cat tmp.json
{"id": 100, "a": [{"t" : 1,"c" : 2 }, {"t": 2, "c" : 3 }] }
{"id": 200, "a": [{"t": 2, "c" : 3 }] }
{"id": 300, "a": [{"t": 1, "c" : 3 }] }
$ cat filter.jq
def t(id):
.a |
map({key: "t\(.t)", value: .c}) |
({t1:null, t2:null, id:id} | to_entries) + . | from_entries
;
inputs |
map(.id as $id | t($id)) |
(.[0] | keys) as $hdr |
([$hdr] + map(to_entries |map(.value)))[]|
@csv
$ jq -rn --slurp -f filter.jq tmp.json
"id","t1","t2"
2,3,100
,3,200
3,,300
In short, you build an object directly from the values in your input, then combine it with a "default" object so that any missing keys are filled in with null.
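To see that default-merge trick in isolation (the command shape here is mine, not part of the answer):
$ echo '{"id": 100, "a": [{"t": 1, "c": 2}]}' | jq -c '({t1: null, t2: null} | to_entries) + (.a | map({key: "t\(.t)", value: .c})) | from_entries'
{"t1":2,"t2":null}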

Related

Using jq to convert json to csv

I'm trying to come up with the correct jq syntax to convert JSON to CSV.
Desired results:
<email>,<id>,<name>
e.g.
user1@whatever.nevermind.no,0,general
user2@whatever.nevermind.no,0,general
user1@whatever.nevermind.no,1,local
...
Note that I also need to ignore objects with no "agent_priorities".
Input
[
  {
    "id": 0,
    "name": "General",
    "agent_priorities": {
      "user1@whatever.nevermind.no": "normal",
      "user2@whatever.nevermind.no": "normal"
    }
  },
  {
    "id": 1,
    "name": "local",
    "agent_priorities": {
      "user1@whatever.nevermind.no": "normal"
    }
  },
  {
    "id": 2,
    "name": "Engineering"
  }
]
The following variant of the accepted answer checks for the existence of the "agent_priorities" key as per the requirements, and uses keys_unsorted to preserve the order of the keys:
jq -r '
.[]
| select(has("agent_priorities"))
| .id as $id
| .name as $name
| .agent_priorities
| keys_unsorted[]
| [., $id, $name ]
| @csv
' file.json
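Against the sample input above, this should print something like the following (note that @csv quotes string fields, unlike the unquoted desired output):
"user1@whatever.nevermind.no",0,"General"
"user2@whatever.nevermind.no",0,"General"
"user1@whatever.nevermind.no",1,"local"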
Store the id and name in variables, then iterate over the keys of agent_priorities:
jq -r '.[]
| .id as $id
| .name as $name
| .agent_priorities
| keys
| .[]
| [., $id, $name ]
| @csv
' file.json

Modify arrays within objects in jq

I have an array of objects, and want to filter the arrays in the b property to only have elements matching the a property of the object.
[
  {
    "a": 3,
    "b": [
      1,
      2,
      3
    ]
  },
  {
    "a": 5,
    "b": [
      3,
      5,
      4,
      3,
      5
    ]
  }
]
should produce
[
  {
    "a": 3,
    "b": [
      3
    ]
  },
  {
    "a": 5,
    "b": [
      5,
      5
    ]
  }
]
Currently, I've arrived at
[.[] | (.a as $a | .b |= [.[] | select(. == $a)])]
That works, but I'm wondering if there's a better (shorter, more readable) way.
I can think of two ways to do this with less code and both are variants of what you have already figured out on your own.
map(.a as $a | .b |= map(select(. == $a)))
del(.[] | .a as $a | .b[] | select(. != $a))
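A quick check of the map variant against the sample input (the filename is assumed):
$ jq -c 'map(.a as $a | .b |= map(select(. == $a)))' input.json
[{"a":3,"b":[3]},{"a":5,"b":[5,5]}]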

jq - Get a higher level key after a selection

Given a JSON like the following:
{
  "data": [{
    "id": "1a2b3c",
    "info": {
      "a": {
        "number": 0
      },
      "b": {
        "number": 1
      },
      "c": {
        "number": 2
      }
    }
  }]
}
I want to select on a number that is greater than or equal to 2 and for that selection I want to return the values of id and number. I did this like so:
$ jq -r '.data[] | .id as $ID | .info[] | select(.number >= 2) | [$ID, .number]' in.json
[
"1a2b3c",
2
]
Now I would also like to return a higher level key for my selection, in my case I need to return c. How can I accomplish this?
Assuming you want the string "c" instead of 2 in the output, this will work:
$ jq '.data[] | .id as $ID | .info | to_entries[] | select(.value.number >= 2) | [$ID, .key]' input.json
[
"1a2b3c",
"c"
]
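If you also want the number alongside the key, a straightforward extension of the same approach would be (a sketch, not part of the original answer):
$ jq -c '.data[] | .id as $ID | .info | to_entries[] | select(.value.number >= 2) | [$ID, .key, .value.number]' input.json
["1a2b3c","c",2]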

JSON to CSV conversion with values as headers

I have the JSON file below and need to convert it to CSV, with some values used as headers and the corresponding values populated beneath them. Here is a sample:
{
  "environments" : [ {
    "dimensions" : [ {
      "metrics" : [ {
        "name" : "count",
        "values" : [ "123" ]
      }, {
        "name" : "response_time",
        "values" : [ "15.7" ]
      } ],
      "name" : "abcd"
    }, {
      "metrics" : [ {
        "name" : "count",
        "values" : [ "456" ]
      }, {
        "name" : "response_time",
        "values" : [ "18.7" ]
      } ],
      "name" : "xyzz"
    } ]
  } ]
}
This is what I have tried already:
jq -r '.environments[].dimensions[] | .name as $p_name | .metrics[] | .name as $val_name | if $val_name == "response_time" then ($p_name,$val_name, .values[])' input.json
Expected output:
name,count,response_time
abcd, 123, 15.7
xyzz, 456, 18.7
If the goal is to rely on the JSON itself to supply the header names in whatever order the "metrics" arrays present them,
then consider:
.environments[].dimensions
| ["name", (.[0] | .metrics[] | .name)], # first emit the headers
( .[] | [.name, (.metrics[].values[0])] ) # ... and then the data rows
| @csv
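Assuming the sample is saved as input.json and the filter above as headers.jq, the expected output would be along these lines:
$ jq -r -f headers.jq input.json
"name","count","response_time"
"abcd","123","15.7"
"xyzz","456","18.7"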
Generating the headers is easy, so I'll focus on generating the rest of the CSV.
The following has the advantage of being straightforward and will hopefully be more-or-less self-explanatory, at least with the jq manual at the ready. A tweak with an eye to efficiency follows.
jq -r '
# name,count,response_time
.environments[].dimensions[]
| .name as $p_name
| .metrics
| [$p_name]
+ map(select(.name == "count") | .values[0] )
+ map(select(.name == "response_time") | .values[0] )
| @csv
'
Efficiency
Here's a variant of the above which would be appropriate if the .metrics array had a large number of items:
jq -r '
# name,count,response_time
.environments[].dimensions[]
| .name as $p_name
| INDEX(.metrics[]; .name) as $dict
| [$p_name, $dict["count"].values[0], $dict["response_time"].values[0]]
| @csv
'
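INDEX/2 is a builtin added in jq 1.6; if your jq lacks it, the same dictionary can be built with reduce (a sketch, not from the original answer):
jq -r '
  .environments[].dimensions[]
  | .name as $p_name
  | (reduce .metrics[] as $m ({}; .[$m.name] = $m)) as $dict
  | [$p_name, $dict["count"].values[0], $dict["response_time"].values[0]]
  | @csv
'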

How to convert nested JSON to CSV using only jq

I have the following JSON:
{
  "A": {
    "C": {
      "D": "T1",
      "E": 1
    },
    "F": {
      "D": "T2",
      "E": 2
    }
  },
  "B": {
    "C": {
      "D": "T3",
      "E": 3
    }
  }
}
I want to convert it into CSV as follows:
A,C,T1,1
A,F,T2,2
B,C,T3,3
Description of the output: the parent keys are printed until a leaf is reached; once a leaf is reached, its values are printed.
I've tried the following and couldn't get it to work:
cat my.json | jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $rows[] | @csv'
It throws an error.
I can't hardcode the parent keys, as the actual JSON has too many records, but the structure is similar. What am I missing?
Some of the requirements are unclear, but the following solves one interpretation of the problem:
paths as $path
| {path: $path, value: getpath($path)}
| select(.value|type == "object" )
| select( [.value[]][0] | type != "object")
| .path + ([.value[]])
| @csv
(This program could be optimized but the presentation here is intended to make the separate steps clear.)
Invocation:
jq -r -f leaves-to-csv.jq input.json
Output:
"A","C","T1",1
"A","F","T2",2
"B","C","T3",3
Unquoted strings
To avoid the quotation marks around strings, you could replace the last component of the pipeline above with:
join(",")
Here is a solution using tostream and group_by:
[
tostream
| select(length == 2) # e.g. [["A","C","D"],"T1"]
| .[0][:-1] + [.[1]] # ["A","C","T1"]
]
| group_by(.[:-1]) # [[["A","C","T1"],["A","C",1]],...
| .[] # [["A","C","T1"],["A","C",1]]
| .[0][0:2] + map(.[-1]|tostring) # ["A","C","T1","1"]
| join(",") # "A,C,T1,1"