jq: handle missing JSON fields

Normally, I want to do something like this:
echo '{"a": 123, "b": "{\"embedded\": 456}", "c": 789}' | jq '{a, "b2": .b | fromjson, c}'
which works and yields:
{
  "a": 123,
  "b2": {
    "embedded": 456
  },
  "c": 789
}
Sometimes field "b" is missing, and in that case the above jq errors:
echo '{"a": 123, "c": 789}' | jq '{a, "b2": .b | fromjson, c}'
That fails with a null error, which makes sense: b is missing, so .b is null, and fromjson errors on null. I want to write my jq command so that if b is missing from a record, b2 simply isn't output. I want to get just this:
{
  "a": 123,
  "c": 789
}
I've skimmed through the jq Reference manual. I'm having trouble getting the if/else/end construct to solve this. I don't see any other null handling functions or language constructs available.

There are several possible approaches, each with variants. Here are illustrations of a constructive approach, and a postprocessing one.
Constructive
{a} + ((select(has("b")) | {"b2": .b | fromjson}) // {}) + {c}
or more concisely but with slightly different semantics:
[{a}, (select(.b) | {b2: .b | fromjson}), {c}] | add
Postprocessing
{a, "b2": (.b | fromjson? // null), c}
| if .b2 == null then del(.b2) else . end
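A quick sanity check of the constructive variant, using the same sample documents as in the question (with jq -c added only to compact the output):

```shell
# Record with "b": b2 is constructed from the embedded JSON
echo '{"a": 123, "b": "{\"embedded\": 456}", "c": 789}' |
  jq -c '{a} + ((select(has("b")) | {"b2": .b | fromjson}) // {}) + {c}'
# {"a":123,"b2":{"embedded":456},"c":789}

# Record without "b": b2 is silently omitted
echo '{"a": 123, "c": 789}' |
  jq -c '{a} + ((select(has("b")) | {"b2": .b | fromjson}) // {}) + {c}'
# {"a":123,"c":789}
```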

You need something like the following: use an if condition to check whether .b is null, and form the structure with all the fields only when it isn't.
jq 'if .b != null then { a, "b2":(.b|fromjson), c } else {a, c} end'

This works best, combining suggestions from peak's and Inian's answers:
# example with b
echo '{"a": 123, "b": "{\"embedded\": 456}", "c": 789}' | jq '{a, c} + if has("b") then {"b": .b | fromjson} else {} end'
# example without b
echo '{"a": 123, "not_b": "{\"embedded\": 456}", "c": 789}' | jq '{a, c} + if has("b") then {"b": .b | fromjson} else {} end'

Related

jq: join string array whether it is empty or not

I have a JSON test.json as follows:
[
  {
    "a": "a",
    "b": [
      "a",
      "b",
      "c"
    ]
  },
  {
    "a": "a"
  }
]
And I want to join field b of each entry, handling the case where it is missing:
{
  "a": "a",
  "b": "a, b, c"
},
{
  "a": "a",
  "b": null
}
The following command works...
cat test.json |
jq '.[] | .b as $t | if $t then {a: .a, b: $t | join(", ")} else {a: .a, b: $t} end'
... but it's too long, as I have to write almost the same object constructor twice.
I have tried moving the if-then-else conditional, and even the // operator, into the {} construction, but both result in a syntax error.
Depending on how you want to handle null/empty values you could try these:
map(.b |= (. // [] | join(", ")))
map(if .b then .b |= join(", ") else . end)
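The two variants treat a missing b differently; here is a sketch of both against a test.json recreated from the question:

```shell
# Recreate the sample file from the question
cat > test.json <<'EOF'
[
  {"a": "a", "b": ["a", "b", "c"]},
  {"a": "a"}
]
EOF

# Variant 1: a missing (or null) b becomes an empty string
jq -c 'map(.b |= (. // [] | join(", ")))' test.json
# [{"a":"a","b":"a, b, c"},{"a":"a","b":""}]

# Variant 2: entries without b are passed through untouched
jq -c 'map(if .b then .b |= join(", ") else . end)' test.json
# [{"a":"a","b":"a, b, c"},{"a":"a"}]
```

To get "b": null exactly as in the desired output, the original if/else remains the most direct route; variant 1 produces an empty string instead, and variant 2 omits the key entirely.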

JQ - Denormalize nested object

I've been trying to convert some JSON to csv and I have the following problem:
I have the following input json:
{"id": 100, "a": [{"t" : 1,"c" : 2 }, {"t": 2, "c" : 3 }] }
{"id": 200, "a": [{"t": 2, "c" : 3 }] }
{"id": 300, "a": [{"t": 1, "c" : 3 }] }
And I expect the following CSV output:
id,t1,t2
100,2,3
200,,3
300,3,
Unfortunately, jq outputs nothing if one of the selects has no match.
Example:
echo '{ "id": 100, "a": [{"t" : 1,"c" : 2 }, {"t": 2, "c" : 3 }] }' | jq '{t1: (.a[] | select(.t==1)).c , t2: (.a[] | select(.t==2)).c }'
output:
{ "t1": 2, "t2": 3 }
but if one of the selects returns no match, the object isn't output at all.
Example:
echo '{ "id": 100, "a": [{"t" : 1,"c" : 2 }] }' | jq '{t1: (.a[] | select(.t==1)).c , t2: (.a[] | select(.t==2)).c }'
Expected output:
{ "t1": 2, "t2": null }
Does anyone know how to achieve this with JQ?
EDIT:
Based on a comment made by #peak I found the solution that I was looking for.
jq -r '["id","t1","t2"],[.id, (.a[] | select(.t==1)).c//null, (.a[] | select(.t==2)).c//null ]|@csv'
The alternative operator does exactly what I was looking for.
Alternative Operator
Here's a simple solution that does not assume anything about the ordering of the items in the .a array, and easily generalizes to arbitrarily many .t values:
# Convert an array of {t, c} to a dictionary:
def tod: map({(.t|tostring): .c}) | add;

["id", "t1", "t2"], # header
(inputs
 | (.a | tod) as $dict
 | [.id, (range(1;3) as $i | $dict[$i|tostring])])
| @csv
Command-line options
Use the -n option (because inputs is being used), and the -r option (to produce CSV).
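Putting the pieces together (with the three sample records written to a hypothetical tmp.json):

```shell
# Sample records (as in the question), one JSON object per line
cat > tmp.json <<'EOF'
{"id": 100, "a": [{"t": 1, "c": 2}, {"t": 2, "c": 3}]}
{"id": 200, "a": [{"t": 2, "c": 3}]}
{"id": 300, "a": [{"t": 1, "c": 3}]}
EOF

jq -nr '
  def tod: map({(.t|tostring): .c}) | add;
  ["id", "t1", "t2"],
  (inputs
   | (.a | tod) as $dict
   | [.id, (range(1;3) as $i | $dict[$i|tostring])])
  | @csv
' tmp.json
# "id","t1","t2"
# 100,2,3
# 200,,3
# 300,3,
```

Note how @csv renders a null field as an empty CSV cell, which is exactly what the missing t values require.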
This is an absolute mess, but it works:
$ cat tmp.json
{"id": 100, "a": [{"t" : 1,"c" : 2 }, {"t": 2, "c" : 3 }] }
{"id": 200, "a": [{"t": 2, "c" : 3 }] }
{"id": 300, "a": [{"t": 1, "c" : 3 }] }
$ cat filter.jq
def t(id):
  .a |
  map({key: "t\(.t)", value: .c}) |
  ({t1:null, t2:null, id:id} | to_entries) + . | from_entries
;
inputs |
map(.id as $id | t($id)) |
(.[0] | keys) as $hdr |
([$hdr] + map(to_entries | map(.value)))[] |
@csv
$ jq -rn --slurp -f filter.jq tmp.json
"id","t1","t2"
2,3,100
,3,200
3,,300
In short, you produce a direct object containing the values from your input, then add it to a "default" object to fill in the missing keys.

jq: error (at <stdin>:0): Cannot iterate over string, cannot execute unique problem

We are trying to parse a JSON file into a TSV file. We are having problems trying to eliminate duplicate Ids with unique.
JSON file
[
  {"Id": "101",
   "Name": "Yugi"},
  {"Id": "101",
   "Name": "Yugi"},
  {"Id": "102",
   "Name": "David"}
]
cat getEvent_all.json | jq -cr '.[] | [.Id] | unique_by(.[].Id)'
jq: error (at <stdin>:0): Cannot iterate over string ("101")
A reasonable approach would be to use unique_by, e.g.:
unique_by(.Id)[]
| [.Id, .Name]
| @tsv
Alternatively, you could form the pairs first:
map([.Id, .Name])
| unique_by(.[0])[]
| @tsv
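For example, applied to the input from the question (with the trailing comma removed so it parses):

```shell
echo '[{"Id": "101", "Name": "Yugi"},
       {"Id": "101", "Name": "Yugi"},
       {"Id": "102", "Name": "David"}]' |
  jq -r 'unique_by(.Id)[] | [.Id, .Name] | @tsv'
# 101	Yugi
# 102	David
```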
uniques_by/2
For very large arrays, though, or if you want to respect the original ordering, a sort-free alternative to unique_by should be considered. Here is a suitable, generic, stream-oriented alternative:
def uniques_by(stream; f):
  foreach stream as $x ({};
    ($x|f) as $s
    | ($s|type) as $t
    | (if $t == "string" then $s else ($s|tostring) end) as $y
    | if .[$t][$y] then .emit = false
      else .emit = true | .item = $x | .[$t][$y] = true
      end;
    if .emit then .item else empty end);
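A usage sketch, inlining the def against the same sample data:

```shell
echo '[{"Id": "101", "Name": "Yugi"},
       {"Id": "101", "Name": "Yugi"},
       {"Id": "102", "Name": "David"}]' |
  jq -r '
    def uniques_by(stream; f):
      foreach stream as $x ({};
        ($x|f) as $s
        | ($s|type) as $t
        | (if $t == "string" then $s else ($s|tostring) end) as $y
        | if .[$t][$y] then .emit = false
          else .emit = true | .item = $x | .[$t][$y] = true
          end;
        if .emit then .item else empty end);
    uniques_by(.[]; .Id) | [.Id, .Name] | @tsv
  '
# 101	Yugi
# 102	David
```

Unlike unique_by, this preserves the original order of first occurrences and never builds a sorted copy of the whole array.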

How can a sub-object be pruned from a json object using jq?

In the following, I am trying to delete one of the two objects in the "bar" array, the one where "v" == 2:
{
  "foo": {},
  "bar": [
    {
      "v": 2
    },
    {
      "v": 1
    }
  ]
}
I am able to first only keep the list, then delete the matching object:
.bar[] | select(.v ==2 | not)
returns:
{
  "v": 1
}
Is there a way to delete a sub-object to keep the enclosing object:
{
  "foo": {},
  "bar": [
    {
      "v": 1
    }
  ]
}
Along the lines of the given attempt, namely:
.bar[] | select(.v ==2 | not)
you would use the |= operator, e.g.:
.bar |= map(select(.v ==2 | not))
Or simply:
.bar |= map(select(.v != 2))
If you only wanted to delete the first match, you could write:
.bar |= (index({v:2}) as $i| .[:$i] + .[$i+1:])
or more robustly:
.bar |= (index({v:2}) as $i
| if $i then .[:$i] + .[$i+1:] else . end)
or if you prefer:
.bar |= ( ([.[].v]|index(2)) as $i
| if $i then del(.[$i]) else . end)
Use the del operator to delete the node you want:
<file jq 'del(.bar[] | select(.v==2))'
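For instance, the del form applied to the sample document keeps the enclosing object and removes only the matching array element:

```shell
echo '{"foo": {}, "bar": [{"v": 2}, {"v": 1}]}' |
  jq -c 'del(.bar[] | select(.v == 2))'
# {"foo":{},"bar":[{"v":1}]}
```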

How to convert nested JSON to CSV using only jq

I've following json,
{
  "A": {
    "C": {
      "D": "T1",
      "E": 1
    },
    "F": {
      "D": "T2",
      "E": 2
    }
  },
  "B": {
    "C": {
      "D": "T3",
      "E": 3
    }
  }
}
I want to convert it into csv as follows,
A,C,T1,1
A,F,T2,2
B,C,T3,3
Description of output: the parent keys are printed until a leaf object is reached; once at a leaf, its values are printed.
I've tried the following without success:
cat my.json | jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $rows[] | @csv'
and it throws an error.
I can't hardcode the parent keys, as the actual json has too many records. But the structure of the json is similar. What am I missing?
Some of the requirements are unclear, but the following solves one interpretation of the problem:
paths as $path
| {path: $path, value: getpath($path)}
| select(.value | type == "object")
| select([.value[]][0] | type != "object")
| .path + [.value[]]
| @csv
(This program could be optimized but the presentation here is intended to make the separate steps clear.)
Invocation:
jq -r -f leaves-to-csv.jq input.json
Output:
"A","C","T1",1
"A","F","T2",2
"B","C","T3",3
Unquoted strings
To avoid the quotation marks around strings, you could replace the last component of the pipeline above with:
join(",")
Here is a solution using tostream and group_by:
[
tostream
| select(length == 2) # e.g. [["A","C","D"],"T1"]
| .[0][:-1] + [.[1]] # ["A","C","T1"]
]
| group_by(.[:-1]) # [[["A","C","T1"],["A","C",1]],...
| .[] # [["A","C","T1"],["A","C",1]]
| .[0][0:2] + map(.[-1]|tostring) # ["A","C","T1","1"]
| join(",") # "A,C,T1,1"
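An end-to-end check of this filter, inlined against the sample document (the final join(",") means -r emits unquoted fields, as the question requested):

```shell
echo '{"A": {"C": {"D": "T1", "E": 1}, "F": {"D": "T2", "E": 2}},
       "B": {"C": {"D": "T3", "E": 3}}}' |
  jq -r '
    [ tostream
      | select(length == 2)
      | .[0][:-1] + [.[1]] ]
    | group_by(.[:-1])
    | .[]
    | .[0][0:2] + map(.[-1]|tostring)
    | join(",")
  '
# A,C,T1,1
# A,F,T2,2
# B,C,T3,3
```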