I am trying to process a large JSON file for testing purposes that has a few thousand entries. The JSON contains a long list of data that is too large for me to process in one go. Using jq, is there an easy way to get a valid snippet of the JSON that only contains the first few entries from the data list? For example, is there a query that would look at the whole JSON file and return valid JSON that only contains the first 4 entries from data? Thank you!
{
"info":{
"name":"some-name"
},
"data":[
{...},
{...},
{...},
{...}
]
}
Based on your snippet, the relevant jq would be:
.data |= .[:4]
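For comparison, here is a minimal Python sketch of what that filter does: it keeps every top-level key and truncates only the list under "data". The helper name first_entries and the sample document are made up for illustration, and, like the jq filter, this assumes the whole file fits in memory.

```python
import json

def first_entries(doc, key="data", n=4):
    """Return a shallow copy of doc whose list under `key` keeps only its first n items."""
    trimmed = dict(doc)            # copy so the original document is left untouched
    trimmed[key] = doc[key][:n]    # same idea as jq's .data |= .[:4]
    return trimmed

# Hypothetical sample shaped like the question's file.
sample = {
    "info": {"name": "some-name"},
    "data": [{"a": 1}, {"b": 2}, {"c": 3}, {"d": 4}, {"e": 5}, {"f": 6}, {"g": 7}],
}

result = first_entries(sample)
print(json.dumps(result))
```

The result is still a complete, valid document, so it can be fed straight back into whatever consumes the full file.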
Here's an example using the --stream option:
$ cat input.json
{
"info": {"name": "some-name"},
"data": [
{"a":1},
{"b":2},
{"c":3},
{"d":4},
{"e":5},
{"f":6},
{"g":7}
]
}
jq --stream -n '
reduce (
inputs | select(has(1) and (.[0] | .[0] == "data" and .[1] < 4))
) as $in (
{}; .[$in[0][-1]] = $in[1]
)
' input.json
{
"a": 1,
"b": 2,
"c": 3,
"d": 4
}
Note: Using limit would have been more efficient in this case, but I tried to be more generic for the purpose of scalability.
Disclaimer: there are already several answers on this topic (like "JQ Join JSON files by key" or "denormalizing JSON with jq"), but none of them has helped me so far, or their circumstances were different enough that I was unable to derive a solution from them.
I have 2 files; both are lists of objects, and the objects in one of them have a field referencing object ids in the other.
given
[
{
"id": "5b9f50ccdcdf200283f29052",
"reference": {
"id": "5de82d5072f4a72ad5d5dcc1"
}
}
]
and
[
{
"id": "5de82d5072f4a72ad5d5dcc1",
"name": "FooBar"
}
]
my goal would be to get a denormalized object list:
expected
[
{
"id": "5b9f50ccdcdf200283f29052",
"reference": {
"id": "5de82d5072f4a72ad5d5dcc1",
"name": "FooBar"
}
}
]
While I'm able to do the main parts, I haven't managed to bring both together yet:
with
example 1
jq -s '(.[1][] | select(.id == "5de82d5072f4a72ad5d5dcc1"))' objects.json referredObjects.json
I get
{
"id": "5de82d5072f4a72ad5d5dcc1",
"name": "FooBar"
}
and with
example 2
jq -s '.[0][] | .reference = {}' objects.json referredObjects.json
I can manipulate any .reference getting
{
"id": "5b9f50ccdcdf200283f29052",
"reference": {}
}
(although I lose the list structure)
But I can't do something like
expected "join"
jq -s '.[0][] as $obj | $obj.reference = (.[1][] | select(.id == $obj.reference.id))' objects.json referredObjects.json
Even approaches with foreach or reduce looked promising:
jq -s '[foreach .[0][] as $obj ({}; .reference.id = ""; . + $obj )]' objects.json referredObjects.json
=>
[
{
"reference": {
"id": "5de82d5072f4a72ad5d5dcc1"
},
"id": "5b9f50ccdcdf200283f29052"
}
]
where I expected to get the same as in the second example.
I end up with a headache and am about to write an inefficient while loop in some other language ... so I would appreciate any help on this.
~Marcel
Transform the second file into an object that maps each id to its full object, and use it as a lookup table while updating the first file.
$ jq '(map({(.id): .}) | add) as $idx
| input
| map_values(.reference = $idx[.reference.id])' file2 file1
[
{
"id": "5b9f50ccdcdf200283f29052",
"reference": {
"id": "5de82d5072f4a72ad5d5dcc1",
"name": "FooBar"
}
}
]
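The same lookup-table strategy, sketched in Python to make the two steps explicit: build an index keyed by id, then replace each reference with the indexed object. The function name denormalize is hypothetical. As with the jq version, a reference whose id is missing from the second list ends up as null (None in Python).

```python
import json

def denormalize(objects, referred):
    """Replace each object's "reference" with the full referred object of the same id."""
    index = {item["id"]: item for item in referred}   # id -> full object
    return [
        {**obj, "reference": index.get(obj["reference"]["id"])}
        for obj in objects
    ]

# The two lists from the question.
objects = [{"id": "5b9f50ccdcdf200283f29052",
            "reference": {"id": "5de82d5072f4a72ad5d5dcc1"}}]
referred = [{"id": "5de82d5072f4a72ad5d5dcc1", "name": "FooBar"}]

joined = denormalize(objects, referred)
print(json.dumps(joined, indent=2))
```

Building the index once keeps the join linear in the size of both lists, instead of scanning the second list for every object in the first.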
The following solution uses the same strategy as the solution by @OguzIsmail, but uses the built-in function INDEX/2 to construct the dictionary from the second file.
The important point is that this strategy allows the arrays in both files to be of arbitrary size.
Invocation
jq --argfile file2 file2.json -f program.jq file1.json
program.jq
INDEX($file2[]; .id) as $dict
| map(.reference.id as $id | .reference = $dict[$id])
I'm a rookie with jq.
I would like to merge 2 JSON files with jq, but only for the keys present in the first file.
First file (first.json)
{
"##locale": "en",
"foo": "bar1"
}
Second file (second.json)
{
"##locale": "en",
"foo": "bar2",
"oof": "rab"
}
I already tried.
edit: jq -n '.[0] * .[1]' first.json second.json
jq -s '.[0] * .[1]' first.json second.json
But the returned result is wrong.
{
"##locale": "en",
"foo": "bar2",
"oof": "rab"
}
The "oof" entry should not be present.
Expected merged result:
{
"##locale": "en",
"foo": "bar2"
}
Best regards.
And here's a one-liner, which happens to be quite efficient:
jq --argfile first first.json '. as $in | $first | with_entries(.value = $in[.key] )' second.json
Consider:
jq -n '.
| input as $first # read first input
| input as $second # read second input
| $first * $second # make the merger of the two the context item
| [ to_entries[] # ...then break it out into key/value pairs
| select(.key as $k | $first | has($k)) # ...and keep only those whose key exists in the first input (has/1 also keeps keys whose value is false or null)
] | from_entries # ...before reassembling into a single object.
' first.json second.json
...which properly emits:
{
"##locale": "en",
"foo": "bar2"
}
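To spell out the intended semantics, here is a small Python sketch of "merge, but only for keys present in the first file". The function name merge_present_keys is made up for illustration. One assumption is stated loudly: when the second file lacks a key, this sketch keeps the first file's value (as the $first * $second variant does), whereas the with_entries one-liner would yield null for such a key.

```python
import json

def merge_present_keys(first, second):
    """For every key in first, take second's value if it has one, else keep first's."""
    return {key: second.get(key, value) for key, value in first.items()}

# The two objects from the question.
first = {"##locale": "en", "foo": "bar1"}
second = {"##locale": "en", "foo": "bar2", "oof": "rab"}

merged = merge_present_keys(first, second)
print(json.dumps(merged, indent=2))
```

Iterating over the first object's keys means "oof" is never considered, so no filtering step is needed afterwards.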
I use jq 1.6 in a Windows 10 PowerShell environment and am trying to select keys from JSON objects whose keys happen to be numeric.
JSON example:
{
"alliances_info":{
"744085325458334213":{
"emblem":3,
"name":"wellwell",
"member_count":1,
"level":1,
"military_might":1035,
"public":false,
"tag":"MELL",
"slogan":"",
"id":744085325458334213
},
"744128593839677958":{
"emblem":0,
"name":"Brave",
"member_count":1,
"level":1,
"military_might":1035,
"public":false,
"tag":"GABA",
"slogan":"",
"id":744128593839677958
},
"746034084459209223":{
"emblem":0,
"name":"Queen",
"member_count":1,
"level":1,
"military_might":1035,
"public":false,
"tag":"QUE",
"slogan":"",
"id":746034084459209223
},
"750446471312466445":{
"emblem":0,
"name":"Phoenix Inc",
"member_count":35,
"level":6,
"military_might":453369,
"public":true,
"tag":"PHOI",
"slogan":"",
"id":750446471312466445
},
"750446518934594062":{
"emblem":11,
"name":"Australia",
"member_count":44,
"level":8,
"military_might":957211,
"public":true,
"tag":"AUST",
"slogan":"Go Australia",
"id":750446518934594062
}
},
"server_version":"v7.190.4-master.000000006"
}
I tried several jq commands:
.alliances_info | .[] | [{alliance_name: .name, alliance_count: .member_count, alliance_level: .level, alliance_power: .military_might, alliance_tag: .tag, alliance_slogan: .slogan, alliance_id: .id}]
or
.alliances_info | .. | objects | [{alliance_name: .name, alliance_count: .member_count, alliance_level: .level, alliance_power: .military_might, alliance_tag: .tag, alliance_slogan: .slogan, alliance_id: .id}]
But I always get a jq error: parse error: Invalid numeric literal at line 1, column 3
If I drop the object construction in the first command (and build only an array), it works. But I need those objects. Any tips?
BR
Timo
Your first query works perfectly well with the given JSON sample. Perhaps you're invoking jq incorrectly. If you have the jq program in a file, say select.jq, you'd invoke jq like so:
jq -f select.jq sample.json
If that doesn't help, then try:
jq empty sample.json
If that fails, there might be something wrong with the encoding of the JSON.
I'm not sure I understand what you want.
Your first attempt works for me, but generates one output for each JSON value in the input. That is, I created a file named so.json and put in it your JSON from above:
{
"alliances_info": {
"744085325458334213": {
"emblem": 3,
⋮
}
When I run your program, I get:
$ jq '.alliances_info | .[] | [{alliance_name: .name, alliance_count: .member_count, alliance_level: .level, alliance_power: .military_might, alliance_tag: .tag, alliance_slogan: .slogan, alliance_id: .id}]' so.json
[
{
"alliance_name": "wellwell",
"alliance_count": 1,
"alliance_level": 1,
"alliance_power": 1035,
"alliance_tag": "MELL",
"alliance_slogan": "",
"alliance_id": 744085325458334200
}
]
[
{
"alliance_name": "Brave",
⋮
]
If you want an array at all, you probably want one array containing all the alliances like this:
$ jq '.alliances_info | [ .[] | { alliance_name: .name, alliance_id: .id } ]' so.json
[
{
"alliance_name": "wellwell",
"alliance_id": 744085325458334200
},
{
"alliance_name": "Brave",
"alliance_id": 744128593839678000
},
{
"alliance_name": "Queen",
"alliance_id": 746034084459209200
},
{
"alliance_name": "Phoenix Inc",
"alliance_id": 750446471312466400
},
{
"alliance_name": "Australia",
"alliance_id": 750446518934594000
}
]
Starting from the left,
- .alliances_info looks in its input object for the field named "alliances_info" and outputs its value
- the | next says take the output from the left-hand side and pass those as inputs to the right-hand side.
- right after that first |, I have a [ «jq expressions» ] which tells jq to create one JSON array output for each input; the elements of that array are the outputs of that inner «jq expressions»
- that inner expression starts with .[] which means to produce one output for each JSON value (ignoring the keys) in the input object. For us, that will be the objects named "744085325458334213", "744128593839677958", …
- The next | uses those objects as input and for each, generates a JSON object { alliance_name: .name, alliance_id: .id }
That's why I end up with one JSON array containing 5 JSON objects.
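The steps above can be sketched in Python as a single list comprehension over the object's values. The function name alliance_summaries and the two-entry sample are made up for illustration. The last line also illustrates why the ids in the jq output above end in zeros: that rounding is consistent with the numbers being held as 64-bit floats, which cannot represent integers of this size exactly, whereas Python integers stay exact.

```python
import json

def alliance_summaries(doc):
    """Build one list with one small object per alliance, ignoring the keys."""
    return [
        {"alliance_name": alliance["name"], "alliance_id": alliance["id"]}
        for alliance in doc["alliances_info"].values()   # like .alliances_info | .[]
    ]

# Hypothetical two-entry excerpt of the question's JSON.
doc = {
    "alliances_info": {
        "744085325458334213": {"name": "wellwell", "id": 744085325458334213},
        "744128593839677958": {"name": "Brave", "id": 744128593839677958},
    }
}

summaries = alliance_summaries(doc)
print(json.dumps(summaries, indent=2))

# A round trip through a 64-bit float does not give the original id back.
print(int(float(744085325458334213)) == 744085325458334213)
```

If the exact ids matter, reading them from the object keys (which are strings) avoids the rounding altogether.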
As far as I can tell, you are mostly just renaming a bunch of the fields. For that, you could just do something like this:
$ jq --argjson renameMap '{ "name": "alliance_name", "member_count": "alliance_count", "level": "alliance_level", "military_might": "alliance_power", "tag": "alliance_tag", "slogan": "alliance_slogan"}' '.alliances_info |= ( . | [ to_entries[] | ( .value |= ( . | [ to_entries[] | ( .key |= ( if $renameMap[.] then $renameMap[.] else . end ) ) ] | from_entries ) ) ] | from_entries )' so.json
{
"alliances_info": {
"744085325458334213": {
"emblem": 3,
"alliance_name": "wellwell",
"alliance_count": 1,
"alliance_level": 1,
"alliance_power": 1035,
"public": false,
"alliance_tag": "MELL",
"alliance_slogan": "",
"id": 744085325458334200
},
"744128593839677958": {
"emblem": 0,
"alliance_name": "Brave",
"alliance_count": 1,
"alliance_level": 1,
"alliance_power": 1035,
"public": false,
"alliance_tag": "GABA",
"alliance_slogan": "",
"id": 744128593839678000
},
⋮
},
"server_version": "v7.190.4-master.000000006"
}
Well, I am an idiot (to be totally clear here). I found the reason (and this is normally a no-brainer...): I read the input from a file, and the funny thing is that the file is Unicode but not UTF-8. After recoding, the command works fine. Thanks for the help.
BR
Timo
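For anyone hitting the same parse error, here is a rough Python sketch of the recoding step. It assumes the file was UTF-16, which is what Windows tools usually mean by "Unicode"; the original post does not say which encoding it actually was. The helper name recode_to_utf8 and the file names are hypothetical.

```python
import json
import os
import tempfile

def recode_to_utf8(src_path, dst_path, src_encoding="utf-16"):
    """Rewrite a text file as UTF-8 without a byte order mark."""
    with open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()                      # the utf-16 codec consumes the BOM
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)

# Demo: create a UTF-16 file, recode it, and check the result parses as JSON.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "sample-utf16.json")
    dst_path = os.path.join(tmp, "sample-utf8.json")
    with open(src_path, "w", encoding="utf-16") as f:
        f.write('{"alliances_info": {}}')
    recode_to_utf8(src_path, dst_path)
    with open(dst_path, "rb") as f:
        recoded = f.read()

print(json.loads(recoded.decode("utf-8")))
```

After recoding, the first byte of the file is the opening brace rather than a byte order mark, which is the kind of input jq expects.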
I have a huge JSON file (15 GB) which looks as follows:
{
"userActivities": {
"-L3ATRosRd-bDgSmX75Z": {
"deviceId": "60ee32c2fae8dcf0",
"dow": "Friday"
}
},
"users": {
"0GTDyAepIjcKMB1XulHCYLXylFS2": {
"ageRangeMin": 21,
"age_range": {
"min": 21
},
"gender": "male"
},
"0GTDyAepIjcKMB1S2": {
"ageRangeMin": 22,
"age_range": {
"min": 20
},
"gender": "male"
}
}
}
I want to extract the objects as if by .users[], but using the streaming parser (jq --stream). That is, I want my output to be as follows:
{"ageRangeMin":21,"age_range":{"min":21},"gender":"male"}
{"ageRangeMin":22,"age_range":{"min":20},"gender":"male"}
Any guidance/help is greatly appreciated. I'm unable to understand how jq --stream works.
If the goal is to just get objects at a certain depth of the json object tree, you can just truncate the stream.
$ jq --stream -nc 'fromstream(2|truncate_stream(inputs | select(.[0][:1] == ["users"])))'
Just make sure you're running the latest available jq. There's a bug in jq 1.5's truncate_stream/1 that breaks it for any depth greater than 1.
With your input in input.json, the following invocation:
$ jq -nc --stream '
fromstream(inputs|select(.[0][0] == "users"))|.[][]' input.json
yields:
{"ageRangeMin":21,"age_range":{"min":21},"gender":"male"}
{"ageRangeMin":22,"age_range":{"min":20},"gender":"male"}
The idea is to extract the "users" key-value pair first as a single-key object.
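Note that the -n option must be used here; without it, jq consumes the first streamed event as `.` before `inputs` runs, and that event is lost.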
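Since the question mentions not understanding how jq --stream works, here is a small Python model of the event format, written as a sketch of my understanding rather than jq's actual implementation. Each leaf becomes a [path, value] event, and each time a non-empty array or object closes, a one-element [path] event is emitted with the path of its last member. The function to_stream and the shortened sample document are made up for illustration.

```python
def to_stream(value, path=()):
    """Yield jq-style streaming events for an already-parsed JSON value."""
    if isinstance(value, dict) and value:
        keys = list(value)
        for key in keys:
            yield from to_stream(value[key], path + (key,))
        yield [list(path + (keys[-1],))]           # closing event for this object
    elif isinstance(value, list) and value:
        for i, item in enumerate(value):
            yield from to_stream(item, path + (i,))
        yield [list(path + (len(value) - 1,))]     # closing event for this array
    else:
        yield [list(path), value]                  # scalar or empty container

# Hypothetical, shortened document with the same top-level shape as the question's.
sample = {"other": {"x": 1}, "users": {"u1": {"gender": "male"}}}

events = list(to_stream(sample))
# Same test as select(.[0][0] == "users") in the jq program above.
users_events = [e for e in events if e[0] and e[0][0] == "users"]

for event in users_events:
    print(event)
```

Note that the final top-level closing event has the path ["users"], so it passes the filter too. That is what lets fromstream rebuild the "users" key-value pair as a single-key object, as described above.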