Is there a simple way to display a 2D JSON array using jq, so that each subarray is on its own line?
E.g. If I have this JSON:
{
"array": [
[1, 2, 3],
[4, 5],
[6, 7, 8, 9]
]
}
I'm trying to use jq to print out:
1 2 3
4 5
6 7 8 9
Use compact output (-c):
$ jq -c '.array[]' test.json
[1,2,3]
[4,5]
[6,7,8,9]
Or, if you also need to join the array values with a space character:
$ jq -rc '.array[] | join(" ")' test.json
1 2 3
4 5
6 7 8 9
I have the following file file.txt:
{"a": "a", "b": "a", "time": "20210210T10:10:00"}
{"a": "b", "b": "b", "time": "20210210T11:10:00"}
I extract the values with jq (I run this command on massive 100 GB files):
jq -r '[.a, .b, .time] | @tsv'
This returns good result of:
a a 20210210T10:10:00
b b 20210210T11:10:00
The output I would like is:
a a 2021-02-10 10:10:00
b b 2021-02-10 11:10:00
The problem is that I want to change the format of the date in the most efficient way possible.
How do I do that?
You can do it in sed, but you can also call sub directly in jq:
jq -r '[.a, .b,
( .time
| sub("(?<y>\\d{4})(?<m>\\d{2})(?<d>\\d{2})T";
.y+"-"+.m+"-"+.d+" ")
)
] | #tsv'
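As a quick sanity check, here is the same sub-based filter run on a single sample line (assuming jq is on PATH):

```shell
echo '{"a": "a", "b": "a", "time": "20210210T10:10:00"}' |
  jq -r '[.a, .b,
          ( .time
            | sub("(?<y>\\d{4})(?<m>\\d{2})(?<d>\\d{2})T";
                  .y + "-" + .m + "-" + .d + " ")
          )
         ] | @tsv'
# prints the three fields tab-separated, with the date rewritten as "2021-02-10 10:10:00"
```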
Use strptime for date interpretation and strftime for formatting:
parse.jq
[
.a,
.b,
( .time
| strptime("%Y%m%dT%H:%M:%S")
| strftime("%Y-%m-%d %H:%M:%S")
)
] | @tsv
Run it like this:
<input.json jq -rf parse.jq
Or as a one-liner:
<input.json jq -r '[.a,.b,(.time|strptime("%Y%m%dT%H:%M:%S")|strftime("%Y-%m-%d %H:%M:%S"))]|@tsv'
Output:
a a 2021-02-10 10:10:00
b b 2021-02-10 11:10:00
Since speed is an issue, and since there does not appear to be a need for anything more than string splitting, you could compare string splitting done with jq using
[.a, .b,
(.time | "\(.[:4])-\(.[4:6])-\(.[6:8]) \(.[9:])")]
vs similar splitting using jq with awk -F\\t 'BEGIN{OFS=FS} ....' (awk for ease of handling the TSV).
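For reference, a complete invocation of the slicing variant might look like this:

```shell
echo '{"a": "b", "b": "b", "time": "20210210T11:10:00"}' |
  jq -r '[.a, .b,
          (.time | "\(.[:4])-\(.[4:6])-\(.[6:8]) \(.[9:])")
         ] | @tsv'
# prints: b, b and "2021-02-10 11:10:00", tab-separated
```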
With sed:
$ echo "20210427T19:23:00" | sed -r 's|([[:digit:]]{4})([[:digit:]]{2})([[:digit:]]{2})T|\1-\2-\3 |'
2021-04-27 19:23:00
I'm attempting to reduce this list of names to a single line of text.
I have JSON like this:
{
"speakers": [
{
"firstName": "Abe",
"lastName": "Abraham"
},
{
"firstName": "Max",
"lastName": "Miller"
}
]
}
Expected output:
Abe Abraham and Max Miller
One of the many attempts I've made is this:
jq -r '.speakers[] | ["\(.firstName) \(.lastName)"] | join(" and ")'
The results are printed out on separate lines like this:
Abe Abraham
Max Miller
I think the join command is just joining the single-element array piped to it (one name per array). How can I get the full list of names passed to join as a single array, so I get the expected output shown above?
You're getting a separate array for each speaker that way. What you want is a single array containing all of the names, so that you can join them. That is done like this:
.speakers | map("\(.firstName) \(.lastName)") | join(" and ")
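Run against the JSON from the question (fed here via a here-doc instead of speakers.json), this produces the expected single line:

```shell
jq -r '.speakers | map("\(.firstName) \(.lastName)") | join(" and ")' <<'EOF'
{"speakers": [{"firstName": "Abe", "lastName": "Abraham"},
              {"firstName": "Max", "lastName": "Miller"}]}
EOF
# Abe Abraham and Max Miller
```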
$ jq -c '.speakers[] | [ "\(.firstName) \(.lastName)" ]' speakers.json
["Abe Abraham"]
["Max Miller"]
If you move your opening [ you get a single array with all the names.
$ jq -c '[ .speakers[] | "\(.firstName) \(.lastName)" ]' speakers.json
["Abe Abraham","Max Miller"]
Which you can pass to join()
$ jq -r '[ .speakers[] | "\(.firstName) \(.lastName)" ] | join(" and ")' speakers.json
Abe Abraham and Max Miller
If there are no other keys you can also write it like:
$ jq -r '[.speakers[] | join(" ")] | join(" and ")' speakers.json
Abe Abraham and Max Miller
I am trying to count the number of records that match some property.
If I have json like:
[
{
"id": 0,
"count": 1
},
{
"id": 1,
"count": 1
},
{
"id": 2,
"count": 0
}
]
I am trying to get the number of records with a count of 1.
I can get the matching records with:
$ jq '.[] | select(.count == 1)' in.json
{
"id": 0,
"count": 1
}
{
"id": 1,
"count": 1
}
But the output is a stream of two separate objects, so I cannot directly apply length to count them; piping to length instead gives the length of each object.
$ jq '.[] | select(.count == 1) | length' in.json
2
2
How can I count how many records were matched by select?
For efficiency, one should avoid using length on a constructed array. Instead, it's preferable to use a stream-oriented approach.
Here's one efficient solution, which, for convenience, uses the generic count function, defined as:
def count(stream): reduce stream as $i (0; .+1);
With this def, the solution is simply:
count(.[] | select(.count==1))
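For example, inlining the def so the whole thing fits on the command line (using the sample array from the question):

```shell
jq 'def count(stream): reduce stream as $i (0; . + 1);
    count(.[] | select(.count == 1))' <<'EOF'
[{"id": 0, "count": 1}, {"id": 1, "count": 1}, {"id": 2, "count": 0}]
EOF
# 2
```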
Use jq 'map(select(.count==1))|length' in.json.
See the select example in the jq manual: https://stedolan.github.io/jq/manual/#select(boolean_expression)
I've been trying to convert some JSON to csv and I have the following problem:
I have the following input json:
{"id": 100, "a": [{"t" : 1,"c" : 2 }, {"t": 2, "c" : 3 }] }
{"id": 200, "a": [{"t": 2, "c" : 3 }] }
{"id": 300, "a": [{"t": 1, "c" : 3 }] }
And I expect the following CSV output:
id,t1,t2
100,2,3
200,,3
300,3,
Unfortunately, jq produces no output for an object if one of the selects has no match.
Example:
echo '{ "id": 100, "a": [{"t" : 1,"c" : 2 }, {"t": 2, "c" : 3 }] }' | jq '{t1: (.a[] | select(.t==1)).c , t2: (.a[] | select(.t==2)).c }'
output:
{ "t1": 2, "t2": 3 }
but if one of the selects finds no match, the object isn't emitted at all.
Example:
echo '{ "id": 100, "a": [{"t" : 1,"c" : 2 }] }' | jq '{t1: (.a[] | select(.t==1)).c , t2: (.a[] | select(.t==2)).c }'
Expected output:
{ "t1": 2, "t2": null }
Does anyone know how to achieve this with jq?
EDIT:
Based on a comment made by @peak I found the solution that I was looking for.
jq -r '["id","t1","t2"],[.id, (.a[] | select(.t==1)).c//null, (.a[] | select(.t==2)).c//null ]|@csv'
The alternative operator does exactly what I was looking for.
Alternative Operator
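Applied to the object that previously produced no output, the // null fallback yields a complete CSV row:

```shell
echo '{ "id": 100, "a": [{"t": 1, "c": 2}] }' |
  jq -r '["id","t1","t2"],
         [.id, (.a[] | select(.t==1)).c // null,
               (.a[] | select(.t==2)).c // null
         ] | @csv'
# "id","t1","t2"
# 100,2,
```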
Here's a simple solution that does not assume anything about the ordering of the items in the .a array, and easily generalizes to arbitrarily many .t values:
# Convert an array of {t, c} to a dictionary:
def tod: map({(.t|tostring): .c}) | add;
["id", "t1", "t2"], # header
(inputs
| (.a | tod) as $dict
| [.id, (range(1;3) as $i | $dict[$i|tostring]) ])
| @csv
Command-line options
Use the -n option (because inputs is being used), and the -r option (to produce CSV).
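Putting it together, the full command (with the three sample lines fed via a here-doc instead of a file) would be:

```shell
jq -nr 'def tod: map({(.t|tostring): .c}) | add;
        ["id", "t1", "t2"],
        (inputs
         | (.a | tod) as $dict
         | [.id, (range(1;3) as $i | $dict[$i|tostring])])
        | @csv' <<'EOF'
{"id": 100, "a": [{"t" : 1,"c" : 2 }, {"t": 2, "c" : 3 }] }
{"id": 200, "a": [{"t": 2, "c" : 3 }] }
{"id": 300, "a": [{"t": 1, "c" : 3 }] }
EOF
# "id","t1","t2"
# 100,2,3
# 200,,3
# 300,3,
```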
This is an absolute mess, but it works:
$ cat tmp.json
{"id": 100, "a": [{"t" : 1,"c" : 2 }, {"t": 2, "c" : 3 }] }
{"id": 200, "a": [{"t": 2, "c" : 3 }] }
{"id": 300, "a": [{"t": 1, "c" : 3 }] }
$ cat filter.jq
def t(id):
  .a |
  map({key: "t\(.t)", value: .c}) |
  ({id: id, t1: null, t2: null} | to_entries) + . | from_entries
;
inputs |
map(.id as $id | t($id)) |
(.[0] | keys) as $hdr |
([$hdr] + map(to_entries | map(.value)))[] |
@csv
$ jq -rn --slurp -f filter.jq tmp.json
"id","t1","t2"
100,2,3
200,,3
300,3,
In short, you build an object with the values from your input, then merge it with a "default" object to fill in the missing keys.
Does anyone know how to use jq to find the duplicate(s) in a JSON array?
For example:
Input:
[{"foo": 1, "bar": 2}, {"foo": 1, "bar": 2}, {"foo": 4, "bar": 5}]
Output:
[{"foo": 1, "bar": 2}]
One of many possible solutions in jq:
group_by(.) | map(select(length>1) | .[0])
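For example, on the input from the question:

```shell
echo '[{"foo": 1, "bar": 2}, {"foo": 1, "bar": 2}, {"foo": 4, "bar": 5}]' |
  jq -c 'group_by(.) | map(select(length > 1) | .[0])'
# [{"foo":1,"bar":2}]
```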
Solutions involving the built-in group_by involve a sort and are therefore inefficient if the goal is simply to identify the duplicates. Here is a sort-free solution that uses a generic and powerful bagof function defined here on a stream:
# Create a two-level dictionary giving [item, n] where n
# is the multiplicity of the item in the stream
def bagof(stream):
reduce stream as $x ({};
($x | [type, tostring]) as $key
| getpath($key) as $entry
| if $entry then setpath($key; [$x, ($entry[1] + 1 )])
else setpath($key; [$x, 1])
end ) ;
# Emit a stream of duplicated items in the stream, s:
def duplicates(s): bagof(s) | .[][] | select(.[1]>1) | .[0];
# Input: an array
# Output: an array of items that are duplicated in the array
def duplicates: [duplicates(.[])];
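Pasting the defs together, the array form can be exercised like this:

```shell
echo '[{"foo": 1, "bar": 2}, {"foo": 1, "bar": 2}, {"foo": 4, "bar": 5}]' |
  jq -c 'def bagof(stream):
           reduce stream as $x ({};
             ($x | [type, tostring]) as $key
             | getpath($key) as $entry
             | if $entry then setpath($key; [$x, $entry[1] + 1])
               else setpath($key; [$x, 1])
               end);
         def duplicates(s): bagof(s) | .[][] | select(.[1] > 1) | .[0];
         [duplicates(.[])]'
# [{"foo":1,"bar":2}]
```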