2D array formatting in JQ - json

I'm using jq to reformat my files to be "nice and pretty". I'm using basic '.', but this is not working as I expect.
Let's say I have structure like this:
{ "foo": { "bar": { "baz": 123 },
"bislot":
[[1,2,3,4,5],
[6,7,8,9,10],
[11,12,13,14,15]]
}}
and after reformatting with jq . I'm getting output like this:
{
"foo": {
"bar": {
"baz": 123
},
"bislot": [
[
1,
2,
3,
4,
5
],
[
6,
7,
8,
9,
10
],
[
11,
12,
13,
14,
15
]
]
}
}
What I want to achieve is something like this:
{
"foo": {
"bar": {
"baz": 123
},
"bislot":
[[1,2,3,4,5],
[6,7,8,9,10],
[11,12,13,14,15]]
}
}
So for 2D arrays every item(array) should be in one line. Any ideas how can I do this?

As it has been pointed out already, from a JSON processor's perspective the formatting doesn't really matter. As you have noticed, jq by default pretty-prints its output. It also offers a --compact-output (or -c) option to output each JSON document into one line, which in your case would result in
jq -c . file.json
{"foo":{"bar":{"baz":123},"bislot":[[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15]]}}
If you care more about aesthetics than whether it is logically the same document or not, an easy way could be to encode all arrays within arrays as JSON, thus becoming strings, which when pretty-printing the whole document with jq won't be broken down anymore:
jq 'walk(arrays[] |= (arrays |= #json))' file.json
{
"foo": {
"bar": {
"baz": 123
},
"bislot": [
"[1,2,3,4,5]",
"[6,7,8,9,10]",
"[11,12,13,14,15]"
]
}
}
Demo
Going further, you could take the output from the previous call, read it line by line back into jq, decode lines with JSON-encoded arrays using fromjson, and output the modified lines (together masquerading as JSON document). This approach uses regex matching with look-ahead and look-behind to identify (the positions of) JSON-encoded arrays.
jq 'walk(arrays[] |= (arrays |= #json))' file.json \
| jq -Rr '(
match("^\\s+(?=\"\\[)").length as $a
| match("(?<=\\]\"),?$").offset as $b
| .[:$a] + (.[$a:$b] | fromjson) + .[$b:]
) // .'
{
"foo": {
"bar": {
"baz": 123,
"baz2": 123
},
"bislot": [
[1,2,3,4,5],
[6,7,8,9,10],
[11,12,13,14,15]
]
}
}
Demo
Disclaimer: Although this outputs a valid JSON document for your sample input, this should still be regarded as an aesthetic printout. If the validity of your JSON document is a first-class concern, the actual formatting shouldn't really matter. jq's default pretty-printer, and its compact equivalent generally should fit most needs.

You can consider combining jq with another tool (here perl):
jq . input.json | perl -0 -pe 's/(\[\s*)
(((\[[^\[\]]+\])[^\[\]]*)+)
/$v1=$1; $v2 = $2;
$v2 =~ s{\[[^\[\]]+\]}
{$v=$&,($v =~ s[\s][]g),$v}egsx;
$v1 . $v2
/egsx'

Related

How would you collect the first few entries of a list from a large json file using jq?

I am trying to process a large json file for testing purposes that has a few thousand entries. The json contains a long list of data to is too large for me to process in one go. Using a jq, is there an easy way to get a valid snippet of the json that only contains the first few entries from the data list? For example is there a query that would look at the whole json file and return to me a valid json that only contains the first 4 entries from data? Thank you!
{
"info":{
"name":"some-name"
},
"data":[
{...},
{...},
{...},
{...}
}
Based on your snippet, the relevant jq would be:
.data |= .[:4]
Here's an example using the --stream option:
$ cat input.json
{
"info": {"name": "some-name"},
"data": [
{"a":1},
{"b":2},
{"c":3},
{"d":4},
{"e":5},
{"f":6},
{"g":7}
]
}
jq --stream -n '
reduce (
inputs | select(has(1) and (.[0] | .[0] == "data" and .[1] < 4))
) as $in (
{}; .[$in[0][-1]] = $in[1]
)
' input.json
{
"a": 1,
"b": 2,
"c": 3,
"d": 4
}
Note: Using limit would have been more efficient in this case, but I tried to be more generic for the purpose of scalability.

jq denormalize a nested field

disclaimer: indeed, there are already different answers (like JQ Join JSON files by key or denormalizing JSON with jq) for but none of them helped me yet or did have different circumstances I was unable to derive a solution from ;/
I have 2 files, both are lists of objects where one of them ha field references to object ids of the other one
given
[
{
"id": "5b9f50ccdcdf200283f29052",
"reference": {
"id": "5de82d5072f4a72ad5d5dcc1"
}
}
]
and
[
{
"id": "5de82d5072f4a72ad5d5dcc1",
"name": "FooBar"
}
]
my goal would be to get a denormalized object list:
expected
[
{
"id": "5b9f50ccdcdf200283f29052",
"reference": {
"id": "5de82d5072f4a72ad5d5dcc1",
"name": "FooBar"
}
}
]
while I'm able to do the main parts, I didn't challenged to bring both together yet:
with
example 1
jq -s '(.[1][] | select(.id == "5de82d5072f4a72ad5d5dcc1"))' objects.json referredObjects.json
I get
{
"id": "5de82d5072f4a72ad5d5dcc1",
"name": "FooBar"
}
and with
example 2
jq -s '.[0][] | .reference = {}' objects.json referredObjects.json
I can manipulate any .reference getting
{
"id": "5b9f50ccdcdf200283f29052",
"reference": {}
}
(even I loose the list structure)
But: I can't do s.th. like
execpted "join"
jq -s '.[0][] as $obj | $obj.reference = (.[1][] | select(.id == $obj.reference.id))' objects.json referredObjects.json
even approaches with foreach or reduce looks promising
jq -s '[foreach .[0][] as $obj ({}; .reference.id = ""; . + $obj )]' objects.json referredObjects.json
=>
[
{
"reference": {
"id": "5de82d5072f4a72ad5d5dcc1"
},
"id": "5b9f50ccdcdf200283f29052"
}
]
where I expected to get the same as in second example
I end up in headaches and looking forward to write a ineffective while routine in any language ... hopefully I would appreciate any help on this
~Marcel
Transform the second file into an object where ids and names are paired and use it as a reference while updating the first file.
$ jq '(map({(.id): .}) | add) as $idx
| input
| map_values(.reference = $idx[.reference.id])' file2 file1
[
{
"id": "5b9f50ccdcdf200283f29052",
"reference": {
"id": "5de82d5072f4a72ad5d5dcc1",
"name": "FooBar"
}
}
]
The following solution uses the same strategy as used in the solution by #OguzIsmail but uses the built-in function INDEX/2 to construct the dictionary from the second file.
The important point is that this strategy allows the arrays in both files to be of arbitrary size.
Invocation
jq --argfile file2 file2.json -f program.jq file1.json
program.jq
INDEX($file2[]; .id) as $dict
| map(.reference.id as $id | .reference = $dict[$id])

JQ to merge new values between 2 json files

I'm a rookie wirh JQ.
I would like to merge 2 json files with JQ. But only for the present keys in first file.
First file (first.json)
{
"##locale": "en",
"foo": "bar1"
}
Second file (second.json)
{
"##locale": "en",
"foo": "bar2",
"oof": "rab"
}
I already tried.
edit: jq -n '.[0] * .[1]' first.json second.json
jq -s '.[0] * .[1]' first.json second.json
But the returned result is wrong.
{
"##locale": "en",
"foo": "bar2",
"oof": "rab"
}
"oof" entry should not be present.
Expected merged.
{
"##locale": "en",
"foo": "bar2"
}
Best regards.
And here's a one-liner, which happens to be quite efficient:
jq --argfile first first.json '. as $in | $first | with_entries(.value = $in[.key] )' second.json
Consider:
jq -n '.
| input as $first # read first input
| input as $second # read second input
| $first * $second # make the merger of the two the context item
| [ to_entries[] # ...then break it out into key/value pairs
| select($first[.key]) # ...and filter those for whether they exist in the first input
] | from_entries # ...before reassembling into a single object.
' first.json second.json
...which properly emits:
{
"##locale": "en",
"foo": "bar2"
}

JQ: key selection from numeric objects

I use jq 1.6 in a Windows 10 PowerShell enviroment and trying to select keys from coincidentally numeric json objects.
Json exampel:
{
"alliances_info":{
"744085325458334213":{
"emblem":3,
"name":"wellwell",
"member_count":1,
"level":1,
"military_might":1035,
"public":false,
"tag":"MELL",
"slogan":"",
"id":744085325458334213
},
"744128593839677958":{
"emblem":0,
"name":"Brave",
"member_count":1,
"level":1,
"military_might":1035,
"public":false,
"tag":"GABA",
"slogan":"",
"id":744128593839677958
},
"746034084459209223":{
"emblem":0,
"name":"Queen",
"member_count":1,
"level":1,
"military_might":1035,
"public":false,
"tag":"QUE",
"slogan":"",
"id":746034084459209223
},
"750446471312466445":{
"emblem":0,
"name":"Phoenix Inc",
"member_count":35,
"level":6,
"military_might":453369,
"public":true,
"tag":"PHOI",
"slogan":"",
"id":750446471312466445
},
"750446518934594062":{
"emblem":11,
"name":"Australia",
"member_count":44,
"level":8,
"military_might":957211,
"public":true,
"tag":"AUST",
"slogan":"Go Australia",
"id":750446518934594062
}
},
"server_version":"v7.190.4-master.000000006"
}
I tried several jq commands:
.alliances_info | .[] | [{alliance_name: .name, alliance_count: .member_count, alliance_level: .level, alliance_power: .military_might, alliance_tag: .tag, alliance_slogan: .slogan, alliance_id: .id}]
or
.alliances_info | .. | objects | [{alliance_name: .name, alliance_c
ount: .member_count, alliance_level: .level, alliance_power: .military_might, alliance_tag: .tag, alliance_slogan: .slog
an, alliance_id: .id}]
But Always get a jq error: parse error: Invalid numeric literal at line 1, column 3
I renounce on the object Building in the first command (and built only a Array) it works. But i need that objects. Any tips?
BR
Timo
Your first query works perfectly well with the given JSON sample. Perhaps you're invoking jq incorrectly. If you have the jq program in a file, say select.jq, you'd invoke jq like so:
jq -f select.jq sample.json
If that doesn't help, then try:
jq empty sample.json
If that fails, there might be something wrong with the encoding of the JSON.
I'm not sure I understand what you want.
Your first attempt works for me, but generates one output for JSON value in the input. That is, I created a file named so.json and put in it your JSON from above:
{
"alliances_info": {
"744085325458334213": {
"emblem": 3,
⋮
}
When I run your program , I get:
$ jq '.alliances_info | .[] | [{alliance_name: .name, alliance_count: .member_count, alliance_level: .level, alliance_power: .military_might, alliance_tag: .tag, alliance_slogan: .slogan, alliance_id: .id}]' so.json
[
{
"alliance_name": "wellwell",
"alliance_count": 1,
"alliance_level": 1,
"alliance_power": 1035,
"alliance_tag": "MELL",
"alliance_slogan": "",
"alliance_id": 744085325458334200
}
]
[
{
"alliance_name": "Brave",
⋮
]
If you want an array at all, you probably want one array containing all the alliances like this:
$ jq '.alliances_info | [ .[] | { alliance_name: .name, alliance_id: .id } ]' so.json
[
{
"alliance_name": "wellwell",
"alliance_id": 744085325458334200
},
{
"alliance_name": "Brave",
"alliance_id": 744128593839678000
},
{
"alliance_name": "Queen",
"alliance_id": 746034084459209200
},
{
"alliance_name": "Phoenix Inc",
"alliance_id": 750446471312466400
},
{
"alliance_name": "Australia",
"alliance_id": 750446518934594000
}
]
Starting from the left,
- .alliances_info looks in its input object for the field named "alliances_info" and outputs its value
- the | next says take the output from the left-hand side and pass those as inputs to the right-hand side.
- right after that first |, I have a [ «jq expressions» ] which tells jq to create one JSON array output for each input; the elements of that array are the outputs of that inner «jq expressions»
- that inner expression starts with .[] which means to produce one output for each JSON value (ignoring the keys) in the input object. For us, that will be the objects named "744085325458334213", "744128593839677958", …
- The next | uses those objects as input and for each, generates a JSON object { alliance_name: .name, alliance_id: .id }
That's why I end up with one JSON array containing 5 JSON objects.
As far as I can tell, you are mostly just renaming a bunch of the fields. For that, you could just do something like this:
$ jq --argjson renameMap '{ "name": "alliance_name", "member_count": "alliance_count", "level": "alliance_level", "military_might": "alliance_power", "tag": "alliance_tag", "slog": "alliance_slogan"}' '.alliances_info |= ( . | [ to_entries[] | ( .value |= ( . | [ to_entries[] | ( .key |= ( if $renameMap[.] then $renameMap[.] else . end ) ) ] | from_entries ) ) ] | from_entries )' so.json
{
"alliances_info": {
"744085325458334213": {
"emblem": 3,
"alliance_name": "wellwell",
"alliance_count": 1,
"alliance_level": 1,
"alliance_power": 1035,
"public": false,
"alliance_tag": "MELL",
"slogan": "",
"id": 744085325458334200
},
"744128593839677958": {
"emblem": 0,
"alliance_name": "Brave",
"alliance_count": 1,
"alliance_level": 1,
"alliance_power": 1035,
"public": false,
"alliance_tag": "GABA",
"slogan": "",
"id": 744128593839678000
},
⋮
},
"server_version": "v7.190.4-master.000000006"
}
well i am a idiot (to be here totally clear). I found the reason (and this is normally a nobrainer...). I read the input from a file and the funny thing is that the file is Unicode but no UTF8. after recoding the command is working fine. Thanks for the help.
BR
Timo

How to access nested json keys in jq --stream

I have a huge json file(15 GB) which looks like as follows:
{
"userActivities": {
"-L3ATRosRd-bDgSmX75Z": {
"deviceId": "60ee32c2fae8dcf0",
"dow": "Friday"
}
},
"users": {
"0GTDyAepIjcKMB1XulHCYLXylFS2": {
"ageRangeMin": 21,
"age_range": {
"min": 21
},
"gender": "male"
},
"0GTDyAepIjcKMB1S2": {
"ageRangeMin": 22,
"age_range": {
"min": 20
},
"gender": "male"
}
}
}
I want to extract the objects as if by .users[], but using the streaming parser (jq --stream). That is, I want my output to be as follows:
{"ageRangeMin":21,"age_range":{"min":21},"gender":"male"}
{"ageRangeMin":22,"age_range":{"min":20},"gender":"male"}
Any guidance/help is greatly appreciated. I'm unable to understand how jq --stream works.
If the goal is to just get objects at a certain depth of the json object tree, you can just truncate the stream.
$ jq --stream -nc 'fromstream(2|truncate_stream(inputs | select(.[0][:1] == ["users"])))'
Just make sure you're running the latest available jq. There's a bug in 1.5 for truncate_stream/1 that breaks for any other input greater than 1.
With your input in input.json, the following invocation:
$ jq -nc --stream '
fromstream(inputs|select(.[0][0] == "users"))|.[][]' input.json
yields:
{"ageRangeMin":21,"age_range":{"min":21},"gender":"male"}
{"ageRangeMin":22,"age_range":{"min":20},"gender":"male"}
The idea is to extract the "users" key-value pair first as a single-key object.
Note that the -n option must be used here.