jq: How can I combine data from duplicate keys - json

I have a fairly complex JSON data structure that I've managed to use jq to filter down to certain keys and their values. I need to combine the results though, so duplicate keys have only one array of values.
e.g.
{
"1.NBT.B": [
{
"id": 545
},
{
"id": 546
}
]
},
{
"1.NBT.B": [
{
"id": 1281
},
{
"id": 1077
}
]
}
would result in
{
"1.NBT.B": [
{
"id": 545
},
{
"id": 546
},
{
"id": 1281
},
{
"id": 1077
}
]
},
...
or even better:
[{"1.NBT.B": [545, 546, 1281, 1077]}, ...]
I need to do it without having to put in the key ("1.NBT.B") directly, since there are hundreds of these keys. I think what has me most stuck is that the objects here aren't named -- the keys are not the same between objects.
Something like this only gives me the 2nd set of ids, completely skipping the first:
reduce .[] as $item ({}; . + $item)

Part 1
The following jq function combines an array of objects in the manner envisioned by the first part of the question.
# Given an array of objects, produce a single object with an array at
# every key, the array at each key, k, being formed from all the values at k.
def merge:
reduce .[] as $o ({}; reduce ($o|keys)[] as $key (.; .[$key] += $o[$key] ));
With this definition together with the line:
merge
in a file, and with the example input modified to be a valid JSON array,
the result is:
{
"1.NBT.B": [
{
"id": 545
},
{
"id": 546
},
{
"id": 1281
},
{
"id": 1077
}
]
}
Part 2
With merge as defined above, the filter:
merge | with_entries( .value |= map(.id) )
produces:
{
"1.NBT.B": [
545,
546,
1281,
1077
]
}
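For readers who want to check the expected result outside jq, here is a rough Python sketch of the same idea. It is not part of the original answer: the names `merge`, `data`, `merged` and `ids_only` are illustrative, and unlike jq's `keys` the sketch keeps keys in insertion order rather than sorting them.

```python
from collections import defaultdict


def merge(objects):
    """Concatenate the list found at each key across all objects."""
    merged = defaultdict(list)
    for obj in objects:
        for key, values in obj.items():
            # Same role as `.[$key] += $o[$key]` in the jq reduce.
            merged[key].extend(values)
    return dict(merged)


# The question's example, wrapped in a list so that it is valid JSON.
data = [
    {"1.NBT.B": [{"id": 545}, {"id": 546}]},
    {"1.NBT.B": [{"id": 1281}, {"id": 1077}]},
]

merged = merge(data)

# Part 2: keep only the ids, like `with_entries(.value |= map(.id))`.
ids_only = {key: [item["id"] for item in items] for key, items in merged.items()}
print(ids_only)
```

The plain `. + $item` from the question fails for the same reason a plain dict update would: the second object's value replaces the first instead of being appended to it.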

Related

Access to first key of a nested json using jq and obtain its value

I have the following JSON:
{
"query": "rest ec",
"elected_facts_mapping": {
"AWS": {
"ECS": {
"attachments": [
"restart_ecs"
],
"text": [
"Great!"
]
}
}
},
"top_facts_mapping": {
"AWS": {
"ECS": {
"attachments": [
"restart_ecs"
],
"text": [
"Great!"
]
},
"EC2": {
"attachments": [
"create_ec2"
],
"text": [
"Awesome"
]
}
},
"GitHub": {
"Pull": {
"attachments": [
"pull_req"
],
"text": [
"Be right on it"
]
}
},
"testtttt": {
"test": {
"attachments": [
"hello_world"
],
"text": [
"Be right on it"
]
}
},
"fgjgh": {
"fnfgj": {
"attachments": [
"hello_world"
],
"text": [
"Be right on it"
]
}
},
"tessttertre": {
"gfdgfdgfd": {
"attachments": [
"hello_world"
],
"text": [
"Great!"
]
}
}
},
"elected_facts_with_prefix_text": null
}
I want to access top_facts_mapping's first key (AWS) and then that key's first key (ECS).
I am trying to do this (in my DSL):
'.span | fromjson'
'.span_data.top_facts_mapping | keys[0]'
'.span_data.top_facts_mapping[${top_facts_prepare_top_fact_topic}] | keys[0]'
'.top_facts_prepare_top_fact_topic_subtopic[${top_facts_prepare_top_fact_topic}][${top_facts_prepare_top_fact_topic_subtopic}]'
You could use to_entries to turn the object into an array of key-value pairs, then select the first value using [0].value
.top_facts_mapping | to_entries[0].value | to_entries[0].value
{
"attachments": [
"restart_ecs"
],
"text": [
"Great!"
]
}
If at one level the object may be empty, you can prepend each to_entries with try (optionally followed by a catch clause)
Here's a stream-based approach which disassembles the input using the --stream option, filters for the "top_facts_mapping" key on top level .[0][0], truncates the stream to descend 3 levels, re-assembles the stream using fromstream, and outputs the first match:
jq --stream -n 'first(fromstream(3| truncate_stream(inputs | select(.[0][0] == "top_facts_mapping"))))'
{
"attachments": [
"restart_ecs"
],
"text": [
"Great!"
]
}
You could use the keys_unsorted builtin, since the underlying object is a dictionary and not a list
.top_facts_mapping | keys_unsorted[0] as $k | .[$k] | .[keys_unsorted[0]]
The above filter could be re-written with a simple function
def get_firstkey_val: keys_unsorted[0] as $k | .[$k];
.top_facts_mapping |
get_firstkey_val | get_firstkey_val
Or, with some jq trickery, assuming the path top_facts_mapping is guaranteed to exist:
getpath([ paths | select(.[-3] == "top_facts_mapping" ) ] | first)
Since the paths built-in produces each path in the input as an array of keys, we select the paths whose third-to-last element (denoted by .[-3]) is "top_facts_mapping", which yields the paths two levels below that key.
first then picks the first of those paths, i.e. the list below:
[
"top_facts_mapping",
"AWS",
"ECS"
]
Use getpath/1 to obtain the JSON value at the obtained path.
If there is a risk of the key top_facts_mapping not being present in the JSON, getpath/1 as written above could raise an error. Guard against that with an explicit check:
([ paths | select(.[-3] == "top_facts_mapping" ) ] | first) as $p |
if $p | length > 0 then getpath($p) else empty end
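As a point of comparison, the "first key of the first key" lookup can be sketched in Python as below. This is only an illustration under assumptions: `first_value` is a made-up helper, the input is a trimmed copy of the question's JSON, and "first" means insertion order, as with keys_unsorted.

```python
def first_value(mapping):
    """Return the value under the first key (insertion order), or None."""
    if not isinstance(mapping, dict):
        return None  # covers a missing or empty level further up
    return next(iter(mapping.values()), None)


# Trimmed copy of the question's input.
doc = {
    "query": "rest ec",
    "top_facts_mapping": {
        "AWS": {
            "ECS": {"attachments": ["restart_ecs"], "text": ["Great!"]},
            "EC2": {"attachments": ["create_ec2"], "text": ["Awesome"]},
        },
        "GitHub": {
            "Pull": {"attachments": ["pull_req"], "text": ["Be right on it"]},
        },
    },
}

# Two applications, like `get_firstkey_val | get_firstkey_val`.
result = first_value(first_value(doc.get("top_facts_mapping")))
print(result)
```

Returning None for a missing level plays the same role as the `else empty` guard in the jq version.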

Combining all key value pairs in one using jq filter or jq play

I want to transform JSON data using jq filter
Json data:
{
"main": [
{
"firstKey": "ABCD",
"id": "12345",
"data": [
{
"name": "first_id",
"value": "first_id_value"
},
{
"name": "second_id",
"value": "second_id_value"
},
{
"name": "third_id",
"value": "third_id_value"
}
]
}
]
}
Expected OUTPUT:
{
"firstKey": "ABCD",
"id": "12345",
"data.name.first_id": "first_id_value",
"data.name.second_id": "second_id_value",
"data.name.third_id": "third_id_value"
}
After much trial and error, I got close to the expected output with the following filter expression:
[.main[]|{"firstKey", "id"},foreach .data[] as $item (0; "data.name.\($item.name)" as $a|$item.value as $b| {($a): $b})][]
I used foreach because the objects under "data" are dynamic; the number of objects can differ.
The output for the above expression is:
{
"firstKey": "ABCD",
"id": "12345"
}
{
"data.name.first_id": "first_id_value"
}
{
"data.name.second_id": "second_id_value"
}
{
"data.name.third_id": "third_id_value"
}
But I want the objects of data to be under the same braces as 'firstKey' and 'id'.
Any suggestions will be helpful.
Since your structure is so rigid, you can cheat and use the built-in from_entries, which takes a list of {key, value} pairs and constructs an object:
.main[] |
{firstKey, id} +
(.data | map({key: "data.name.\(.name)", value}) |
from_entries)
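To make the shape of that transformation concrete, here is a small Python sketch of the same steps: copy the two fixed fields, then turn each name/value pair into one flattened key. The function name `flatten_entry` and the variable names are assumptions made for this illustration only.

```python
doc = {
    "main": [
        {
            "firstKey": "ABCD",
            "id": "12345",
            "data": [
                {"name": "first_id", "value": "first_id_value"},
                {"name": "second_id", "value": "second_id_value"},
                {"name": "third_id", "value": "third_id_value"},
            ],
        }
    ]
}


def flatten_entry(entry):
    """Merge the fixed fields with one 'data.name.<name>' key per data item."""
    flat = {"firstKey": entry["firstKey"], "id": entry["id"]}
    for pair in entry["data"]:
        # Same role as {key: "data.name.\(.name)", value} | from_entries in jq.
        flat[f"data.name.{pair['name']}"] = pair["value"]
    return flat


# One flattened object per element of "main", like `.main[] | ...`.
results = [flatten_entry(entry) for entry in doc["main"]]
print(results[0])
```

The key point is the same as in the jq answer: build a single object by adding the generated pairs to the fixed fields, rather than emitting each pair as a separate object.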

How to count occurrences of a key-value pair per individual object in JQ?

I could not find how to count occurrence of "title" grouped by "member_id"...
The json file is:
[
{
"member_id": 123,
"loans":[
{
"date": "123",
"media": [
{ "title": "foo" },
{ "title": "bar" }
]
},
{
"date": "456",
"media": [
{ "title": "foo" }
]
}
]
},
{
"member_id": 456,
"loans":[
{
"date": "789",
"media": [
{ "title": "foo"}
]
}
]
}
]
With this query I get loan entries for users with "title==foo"
jq '.[] | (.member_id) as $m | .loans[].media[] | select(.title=="foo") | {id: $m, title: .title}' member.json
{
"id": 123,
"title": "foo"
}
{
"id": 123,
"title": "foo"
}
{
"id": 456,
"title": "foo"
}
But I could not find how to get count by user (group by) for a title, to get a result like:
{
"id": 123,
"title": "foo",
"count": 2
}
{
"id": 456,
"title": "foo",
"count": 1
}
I got errors like jq: error (at member.json:31): object ({"title":"f...) and array ([[123]]) cannot be sorted, as they are not both arrays or similar...
When the main goal is to count, it is usually more efficient to avoid constructing an array if determining its length is the only reason for doing so. In the present case you could, for example, write:
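# count(s) counts the items in the stream s; starting from 0 yields 0 rather than null for an empty stream
def count(s): reduce s as $x (0; .+1);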
"foo" as $title | .[] | {
id: .member_id,
$title,
count: count(.loans[].media[] | select(.title == $title))
}
group_by has its uses, but it is well to be aware that it is inefficient even for grouping, because its implementation involves a sort, which is not strictly necessary if the goal is to "group by" some criterion. A completely generic sort-free "group by" function is a bit tricky to implement, but often a simple but non-generic version is sufficient, such as:
# sort-free variant of group_by/1
# f must always evaluate to an integer or always to a string, which
# could be achieved by using `tostring`.
# Output: an array in the former case, or an object in the latter case
def GROUP_BY(f): reduce .[] as $x (null; .[$x|f] += [$x] );
Using group_by:
jq 'map(
(.member_id) as $m
| .loans[].media[]
| select(.title=="foo")
| {id: $m, title: .title}
)
|group_by(.id)[]
|.[0] + { count: length }
' input-file
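The two approaches above differ in one detail that a short Python sketch may make easier to see: counting per member reports every member, whereas grouping the matches only reports members that have at least one match. The sketch follows the counting approach; `count_title` and `rows` are illustrative names, not anything from jq.

```python
members = [
    {
        "member_id": 123,
        "loans": [
            {"date": "123", "media": [{"title": "foo"}, {"title": "bar"}]},
            {"date": "456", "media": [{"title": "foo"}]},
        ],
    },
    {
        "member_id": 456,
        "loans": [
            {"date": "789", "media": [{"title": "foo"}]},
        ],
    },
]


def count_title(members, title):
    """Return one {id, title, count} row per member, without building groups."""
    rows = []
    for member in members:
        # Walk loans[].media[] and count matches, like count(...) in jq.
        n = sum(
            1
            for loan in member["loans"]
            for media in loan["media"]
            if media["title"] == title
        )
        rows.append({"id": member["member_id"], "title": title, "count": n})
    return rows


rows = count_title(members, "foo")
print(rows)
```

With a title that some member never borrowed, this version still emits a row with a count of 0 for that member; filter those rows out if the group_by behaviour is preferred.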

How to convert array of object into expected json key value object based on the path value

This is my sample input
Input
[
{
"label": "test1",
"value": 1,
"path": "data/testData/testDataLevel3/testDataLevel3_1/0/testDataLevel3_1_a2"
},
{
"label": "test2",
"value": 2,
"path": "data/testData/testDataLevel1/testDataLevel1_1"
}
]
This input needs to be converted like this using jq
Expected output:
{
"data": {
"testData": {
"testDataLevel1": { //object
"testDataLevel1_1": 2
},
"testDataLevel3": {
"testDataLevel3_1": [ //array
{
"testDataLevel3_1_a2": 1
}
]
}
}
}
}
The path can contain an array index as one of its segments, mixed in with the object keys.
You need to convert each .path to a form setpath can understand. The rest is straightforward.
reduce .[] as {$path, $value} (null;
setpath($path / "/" | map(tonumber? // .); $value)
)
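What setpath does with the converted path can be sketched in Python as follows. This is a simplified illustration, not jq's implementation: `parse_path` and `set_path` are hypothetical helpers, only non-negative integer segments are treated as array indices (jq's tonumber accepts more), and a container of the wrong type is silently replaced where jq would raise an error.

```python
def parse_path(path):
    """Split on '/' and turn purely numeric segments into list indices."""
    return [int(part) if part.isdigit() else part for part in path.split("/")]


def set_path(root, keys, value):
    """Set value at keys, creating dicts for string keys and lists for ints."""
    if not keys:
        return value
    head, rest = keys[0], keys[1:]
    if isinstance(head, int):
        container = root if isinstance(root, list) else []
        while len(container) <= head:
            container.append(None)  # pad with nulls, as jq does
        child = container[head]
    else:
        container = root if isinstance(root, dict) else {}
        child = container.get(head)
    container[head] = set_path(child, rest, value)
    return container


items = [
    {
        "label": "test1",
        "value": 1,
        "path": "data/testData/testDataLevel3/testDataLevel3_1/0/testDataLevel3_1_a2",
    },
    {"label": "test2", "value": 2, "path": "data/testData/testDataLevel1/testDataLevel1_1"},
]

# Start from nothing, like the `null` initial value of the jq reduce.
result = None
for item in items:
    result = set_path(result, parse_path(item["path"]), item["value"])
print(result)
```

The numeric segment is what makes testDataLevel3_1 an array rather than an object, which is why the path has to be converted before it is used.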

Parse JSON and JSON values with jq

I have an API that returns JSON - big blocks of it. Some of the key value pairs have more blocks of JSON as the value associated with a key. jq does a great job of parsing the main JSON levels. But I can't find a way to get it to 'recurse' into the values associated with the keys and pretty print them as well.
Here is the start of one of the JSON returns. Note it is only a small percent of the full return:
{
"code": 200,
"status": "OK",
"data": {
"PlayFabId": "xxxxxxx",
"InfoResultPayload": {
"AccountInfo": {
"PlayFabId": "xxxxxxxx",
"Created": "2018-03-22T19:23:29.018Z",
"TitleInfo": {
"Origination": "IOS",
"Created": "2018-03-22T19:23:29.033Z",
"LastLogin": "2018-03-22T19:23:29.033Z",
"FirstLogin": "2018-03-22T19:23:29.033Z",
"isBanned": false
},
"PrivateInfo": {},
"IosDeviceInfo": {
"IosDeviceId": "xxxxxxxxx"
}
},
"UserVirtualCurrency": {
"GT": 10,
"MB": 70
},
"UserVirtualCurrencyRechargeTimes": {},
"UserData": {},
"UserDataVersion": 15,
"UserReadOnlyData": {
"DataVersion": {
"Value": "6",
"LastUpdated": "2018-03-22T19:48:59.543Z",
"Permission": "Public"
},
"achievements": {
"Value": "[{\"id\":0,\"gamePack\":\"GAME.PACK.0.KK\",\"marblesAmount\":50,\"achievements\":[{\"id\":2,\"name\":\"Correct Round 4\",\"description\":\"Round 4 answered correctly\",\"maxValue\":10,\"increment\":1,\"currentValue\":3,\"valueUnit\":\"unit\",\"awardOnIncrement\":true,\"marbles\":10,\"image\":\"https://www.jamandcandy.com/kissinkuzzins/achievements/icons/sphinx\",\"SuccessKey\":[\"0_3_4_0\",\"0_5_4_0\",\"0_6_4_0\",\"0_7_4_0\",\"0_8_4_0\",\"0_9_4_0\",\"0_10_4_0\"],\"event\":\"Player_answered_round\",\"achieved\":false},{\"id\":0,\"name\":\"Complete
This was parsed using jq, but as you can see, when you get to
"achievements": { "Vales": "[{\"id\":0,\"gamePack\":\"GAME.PACK.0.KK\",\"marblesAmount\":50,\
jq does not parse the value any further, even though it is also JSON.
Is there a filter I am missing to get it to parse the values as well as the higher level structure?
Is there a filter I am missing ...?
The filter you'll need is fromjson, but it should only be applied to the stringified JSON; consider therefore using |= as illustrated using your fragment:
echo '{"achievements": { "Vales": "[{\"id\":0,\"gamePack\":\"GAME.PACK.0.KK\",\"marblesAmount\":50}]"}}' |
jq '.achievements.Vales |= fromjson'
{
"achievements": {
"Vales": [
{
"id": 0,
"gamePack": "GAME.PACK.0.KK",
"marblesAmount": 50
}
]
}
}
recursively/1
If you want to apply fromjson recursively wherever possible, then recursively is your friend:
def recursively(f):
. as $in
| if type == "object" then
reduce keys[] as $key
( {}; . + { ($key): ($in[$key] | recursively(f) )} )
elif type == "array" then map( recursively(f) )
else try (f as $f | if $f == . then . else ($f | recursively(f)) end) catch $in
end;
This would be applied as follows:
recursively(fromjson)
Example
{a: ({b: "xyzzy"}) | tojson} | tojson
| recursively(fromjson)
yields:
{
"a": {
"b": "xyzzy"
}
}
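The same recursive idea can be sketched in Python, which may help when checking what a blanket decode does to the data. `decode_nested` is a name invented for this sketch. As with recursively(fromjson), any string that happens to be valid JSON is decoded, so a value such as "6" becomes the number 6.

```python
import json


def decode_nested(value):
    """Recursively decode strings that contain JSON; leave other strings alone."""
    if isinstance(value, dict):
        return {key: decode_nested(item) for key, item in value.items()}
    if isinstance(value, list):
        return [decode_nested(item) for item in value]
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            return value  # not JSON: keep the original string
        # The decoded value may itself contain stringified JSON.
        return decode_nested(parsed)
    return value


# Same shape as the jq example: JSON stringified twice.
doubly_encoded = json.dumps({"a": json.dumps({"b": "xyzzy"})})
decoded = decode_nested(doubly_encoded)
print(decoded)
```

If converting strings like "6" is not wanted, decoding only the known fields, as in the |= example earlier in this answer, is the safer choice.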