How to group JSON using jq and then convert to YAML

I have this JSON that I want to convert:
[
{
"externalGroup": "another group admins",
"groupId": "da2e42c8-6423-4d32-99b5-5fc58f9f80b8"
},
{
"externalGroup": "another group users",
"groupId": "7c69cac1-4a70-4170-8251-cde3762fe498"
},
{
"externalGroup": "my group admin",
"groupId": "e08a1d9d-f108-4e87-bdb3-ee4f10c6752a"
},
{
"externalGroup": "my group users",
"groupId": "8370821e-edfa-4615-ac2e-47815b740f40"
},
{
"externalGroup": "some group",
"groupId": "e08a1d9d-f108-4e87-bdb3-ee4f10c6752a"
},
{
"externalGroup": "some group",
"groupId": "8370821e-edfa-4615-ac2e-47815b740f40"
},
{
"externalGroup": "some group",
"groupId": "7c69cac1-4a70-4170-8251-cde3762fe498"
}
]
I have tried this, which is pretty close:
jq '. | group_by(.externalGroup)[] | {(.[0].externalGroup): map(.groupId)}'
I get this:
{
"another group admins": [
"da2e42c8-6423-4d32-99b5-5fc58f9f80b8"
]
}
{
"another group users": [
"7c69cac1-4a70-4170-8251-cde3762fe498"
]
}
{
"my group admin": [
"e08a1d9d-f108-4e87-bdb3-ee4f10c6752a"
]
}
{
"my group users": [
"8370821e-edfa-4615-ac2e-47815b740f40"
]
}
{
"some group": [
"e08a1d9d-f108-4e87-bdb3-ee4f10c6752a",
"8370821e-edfa-4615-ac2e-47815b740f40",
"7c69cac1-4a70-4170-8251-cde3762fe498"
]
}
But this is a stream of separate objects, and it doesn't convert properly with yq. It would need to be a single object, something like this instead:
{
"another group admins": [
"da2e42c8-6423-4d32-99b5-5fc58f9f80b8"
],
"another group users": [
"7c69cac1-4a70-4170-8251-cde3762fe498"
],
"my group admin": [
"e08a1d9d-f108-4e87-bdb3-ee4f10c6752a"
],
"my group users": [
"8370821e-edfa-4615-ac2e-47815b740f40"
],
"some group": [
"e08a1d9d-f108-4e87-bdb3-ee4f10c6752a",
"8370821e-edfa-4615-ac2e-47815b740f40",
"7c69cac1-4a70-4170-8251-cde3762fe498"
]
}
so that yq can turn it into something like:
"another group admins":
- "da2e42c8-6423-4d32-99b5-5fc58f9f80b8"
"another group users":
- "7c69cac1-4a70-4170-8251-cde3762fe498"
"my group admin":
- "e08a1d9d-f108-4e87-bdb3-ee4f10c6752a"
"my group users":
- "8370821e-edfa-4615-ac2e-47815b740f40"
"some group":
- "e08a1d9d-f108-4e87-bdb3-ee4f10c6752a"
- "8370821e-edfa-4615-ac2e-47815b740f40"
- "7c69cac1-4a70-4170-8251-cde3762fe498"

The piece you are missing is from_entries, which builds a JSON object from an array of {key, value} pairs.
Instead of:
jq '. | group_by(.externalGroup)[] | {(.[0].externalGroup): map(.groupId)}'
Try:
jq 'group_by(.externalGroup) | map({key:.[0].externalGroup, value:map(.groupId)}) | from_entries'
{
"another group admins": [
"da2e42c8-6423-4d32-99b5-5fc58f9f80b8"
],
"another group users": [
"7c69cac1-4a70-4170-8251-cde3762fe498"
],
"my group admin": [
"e08a1d9d-f108-4e87-bdb3-ee4f10c6752a"
],
"my group users": [
"8370821e-edfa-4615-ac2e-47815b740f40"
],
"some group": [
"e08a1d9d-f108-4e87-bdb3-ee4f10c6752a",
"8370821e-edfa-4615-ac2e-47815b740f40",
"7c69cac1-4a70-4170-8251-cde3762fe498"
]
}
I made the following changes:
Removed the . | at the beginning because it doesn't change anything.
Removed the [] and used map(...) instead, because we want to keep things in an array to feed to from_entries.
Instead of assembling a one-entry object, we create {key:..., value:...} pairs to feed to from_entries.
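As a cross-check outside jq, here is a rough Python model of group_by followed by from_entries. It is a sketch, not jq itself: the function name group_to_object is hypothetical and the IDs are placeholders rather than the UUIDs from the question. Like jq's group_by, it sorts on the grouping key first; Python's sort is stable, so IDs keep their original order within each group.

```python
from itertools import groupby
from operator import itemgetter


def group_to_object(items):
    """Rough model of:
    group_by(.externalGroup) | map({key: .[0].externalGroup, value: map(.groupId)}) | from_entries
    """
    key = itemgetter("externalGroup")
    # group_by sorts on the key first; sorted() is stable, like jq's sort
    groups = [list(g) for _, g in groupby(sorted(items, key=key), key=key)]
    # One {key, value} entry per group...
    entries = [{"key": g[0]["externalGroup"], "value": [x["groupId"] for x in g]} for g in groups]
    # ...which from_entries turns into a single object
    return {e["key"]: e["value"] for e in entries}


# Placeholder IDs instead of the UUIDs from the question
items = [
    {"externalGroup": "some group", "groupId": "id-1"},
    {"externalGroup": "my group admin", "groupId": "id-2"},
    {"externalGroup": "some group", "groupId": "id-3"},
]
print(group_to_object(items))
# {'my group admin': ['id-2'], 'some group': ['id-1', 'id-3']}
```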
I checked, and was slightly surprised to discover that add is a bit faster than from_entries, even for very long lists. If you use add, you need to change even less of your solution:
jq 'group_by(.externalGroup) | map({(.[0].externalGroup):map(.groupId)}) | add'
Adding objects merges their contents (if two objects share a key, the value from the right-hand object wins). I tested with a 250,000-element list and it was slightly faster than from_entries. Given that it's also shorter and, in my opinion, just as clear, I think it's worth considering.
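A small Python model of map(...) | add on one-key objects shows why it gives the same result as from_entries here: each group name occurs exactly once, so the right-wins rule never comes into play. This is a sketch; add_objects is a hypothetical name, and unlike jq (where [] | add is null) an empty input yields {}.

```python
from functools import reduce


def add_objects(objs):
    """Rough model of jq's `add` on an array of objects:
    a left-to-right shallow merge where the right-hand value wins."""
    # jq's add starts from null; an empty dict plays that role here
    return reduce(lambda acc, o: {**acc, **o}, objs, {})


print(add_objects([{"g1": ["id-1"]}, {"g2": ["id-2", "id-3"]}]))
# {'g1': ['id-1'], 'g2': ['id-2', 'id-3']}
print(add_objects([{"k": 1}, {"k": 2}]))
# {'k': 2}
```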

An alternative worth considering for producing YAML is gojq, a Go implementation of jq, which can emit YAML directly with --yaml-output, e.g.
gojq --yaml-output '
group_by(.externalGroup)
| map({(.[0].externalGroup):map(.groupId)}) | add'
To avoid the overhead of map, you could use the following generic stream-oriented add that works for objects or arrays just as well as for numbers:
gojq --yaml-output '
def add(s): reduce s as $x (null; . + $x);
add( group_by(.externalGroup)[]
| {(.[0].externalGroup):map(.groupId)})'
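The point of add(s) is that it folds a stream into an accumulator that starts at null, without first collecting the stream into an array. Below is a Python analogue. It is only a sketch: jq_plus is a hypothetical helper that models jq's + for null, numbers, strings, arrays, and objects, and nothing more.

```python
from functools import reduce


def jq_plus(a, b):
    """Partial model of jq's `+`: null is the identity, numbers add, strings and
    arrays concatenate, objects merge shallowly with the right side winning."""
    if a is None:
        return b
    if b is None:
        return a
    if isinstance(a, bool) or isinstance(b, bool):
        raise TypeError("booleans cannot be added")
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a + b
    if isinstance(a, (str, list)) and type(a) is type(b):
        return a + b
    if isinstance(a, dict) and isinstance(b, dict):
        return {**a, **b}
    raise TypeError(f"cannot add {type(a).__name__} and {type(b).__name__}")


def add(stream):
    """Like `def add(s): reduce s as $x (null; . + $x);`, consuming the generator lazily."""
    return reduce(jq_plus, stream, None)


groups = {"g1": ["id-1"], "g2": ["id-2"]}
print(add({k: v} for k, v in groups.items()))  # {'g1': ['id-1'], 'g2': ['id-2']}
print(add(n for n in [1, 2, 3]))               # 6
print(add(iter([])))                           # None
```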

Related

Accessing the first key of a nested JSON object using jq and obtaining its value

I have the following JSON:
{
"query": "rest ec",
"elected_facts_mapping": {
"AWS": {
"ECS": {
"attachments": [
"restart_ecs"
],
"text": [
"Great!"
]
}
}
},
"top_facts_mapping": {
"AWS": {
"ECS": {
"attachments": [
"restart_ecs"
],
"text": [
"Great!"
]
},
"EC2": {
"attachments": [
"create_ec2"
],
"text": [
"Awesome"
]
}
},
"GitHub": {
"Pull": {
"attachments": [
"pull_req"
],
"text": [
"Be right on it"
]
}
},
"testtttt": {
"test": {
"attachments": [
"hello_world"
],
"text": [
"Be right on it"
]
}
},
"fgjgh": {
"fnfgj": {
"attachments": [
"hello_world"
],
"text": [
"Be right on it"
]
}
},
"tessttertre": {
"gfdgfdgfd": {
"attachments": [
"hello_world"
],
"text": [
"Great!"
]
}
}
},
"elected_facts_with_prefix_text": null
}
I want to access top_facts_mapping's first key, AWS, and then its first key, ECS.
I am trying to do this (in my DSL):
'.span | fromjson'
'.span_data.top_facts_mapping | keys[0]'
'.span_data.top_facts_mapping[${top_facts_prepare_top_fact_topic}] | keys[0]'
'.top_facts_prepare_top_fact_topic_subtopic[${top_facts_prepare_top_fact_topic}][${top_facts_prepare_top_fact_topic_subtopic}]'
You could use to_entries to turn the object into an array of key-value pairs, then select the first value using [0].value:
.top_facts_mapping | to_entries[0].value | to_entries[0].value
{
"attachments": [
"restart_ecs"
],
"text": [
"Great!"
]
}
If the object at some level may be empty, you can prepend each to_entries with try (optionally followed by a catch clause).
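Python dicts, like jq objects, keep their insertion order (guaranteed since Python 3.7), so the to_entries[0].value idea carries over directly. In this sketch the helper first_value is hypothetical; it returns None for an empty or missing object instead of raising, whereas jq's try would produce no output at all.

```python
def first_value(obj):
    """Value of the first key in insertion order, like `to_entries[0].value`.
    Returns None for an empty or missing object instead of raising."""
    if not obj:
        return None
    return next(iter(obj.values()))


data = {
    "top_facts_mapping": {
        "AWS": {
            "ECS": {"attachments": ["restart_ecs"], "text": ["Great!"]},
            "EC2": {"attachments": ["create_ec2"], "text": ["Awesome"]},
        },
        "GitHub": {"Pull": {"attachments": ["pull_req"], "text": ["Be right on it"]}},
    }
}
print(first_value(first_value(data["top_facts_mapping"])))
# {'attachments': ['restart_ecs'], 'text': ['Great!']}
print(first_value(first_value({})))  # None
```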
Here's a stream-based approach: it disassembles the input using the --stream option, keeps only the events whose top-level key (.[0][0]) is "top_facts_mapping", truncates the stream by 3 levels, reassembles it using fromstream, and outputs the first match:
jq --stream -n 'first(fromstream(3| truncate_stream(inputs | select(.[0][0] == "top_facts_mapping"))))'
{
"attachments": [
"restart_ecs"
],
"text": [
"Great!"
]
}
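To make the streaming pipeline concrete, here is a simplified Python model. It is an illustration under stated assumptions, not jq's implementation: it emits only (path, leaf) events (jq's --stream also emits closing events, which are omitted), it loads the whole document first, and instead of fromstream it rebuilds the first sub-document from consecutive events that share the same 3-element path prefix. All helper names are hypothetical.

```python
def leaf_events(value, path=()):
    """Yield (path, leaf) pairs in document order, like the leaf events of `jq --stream`."""
    if isinstance(value, dict) and value:
        for k, v in value.items():
            yield from leaf_events(v, path + (k,))
    elif isinstance(value, list) and value:
        for i, v in enumerate(value):
            yield from leaf_events(v, path + (i,))
    else:
        yield path, value  # scalars and empty containers are leaves


def set_in(root, path, leaf):
    """Rebuild nested containers along path; list items are assumed to arrive in index order."""
    node = root
    for key, nxt in zip(path, path[1:]):
        child = [] if isinstance(nxt, int) else {}
        if isinstance(node, list):
            if key == len(node):
                node.append(child)
            node = node[key]
        else:
            node = node.setdefault(key, child)
    if isinstance(node, list):
        node.append(leaf)
    else:
        node[path[-1]] = leaf


def first_truncated(doc, top_key, depth):
    """Model of: first(fromstream(depth | truncate_stream(... select(.[0][0] == top_key))))"""
    prefix, root = None, None
    for path, leaf in leaf_events(doc):
        if not path or path[0] != top_key or len(path) <= depth:
            continue  # filtered out by select, or dropped by truncate_stream
        head, rest = path[:depth], path[depth:]
        if prefix is None:
            prefix, root = head, ([] if isinstance(rest[0], int) else {})
        elif head != prefix:
            break  # the first sub-document is complete, so `first` stops here
        set_in(root, rest, leaf)
    return root


doc = {
    "query": "rest ec",
    "elected_facts_mapping": {"AWS": {"ECS": {"attachments": ["restart_ecs"], "text": ["Great!"]}}},
    "top_facts_mapping": {
        "AWS": {
            "ECS": {"attachments": ["restart_ecs"], "text": ["Great!"]},
            "EC2": {"attachments": ["create_ec2"], "text": ["Awesome"]},
        },
        "GitHub": {"Pull": {"attachments": ["pull_req"], "text": ["Be right on it"]}},
    },
    "elected_facts_with_prefix_text": None,
}
print(first_truncated(doc, "top_facts_mapping", 3))
# {'attachments': ['restart_ecs'], 'text': ['Great!']}
```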
You could use the keys_unsorted builtin, which returns an object's keys in their original order. keys, by contrast, sorts them, so keys[0] on the AWS object would give "EC2" rather than "ECS":
.top_facts_mapping | keys_unsorted[0] as $k | .[$k] | .[keys_unsorted[0]]
The above filter could be rewritten with a simple function:
def get_firstkey_val: keys_unsorted[0] as $k | .[$k];
.top_facts_mapping |
get_firstkey_val | get_firstkey_val
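The difference between keys and keys_unsorted is easy to see in a Python sketch: sorting by codepoint puts "EC2" first because "2" (U+0032) sorts before "S" (U+0053), while iterating the dict follows insertion order. The get_firstkey_val function below mirrors the jq helper above; like the jq version, it fails on an empty object.

```python
aws = {"ECS": {"text": ["Great!"]}, "EC2": {"text": ["Awesome"]}}

# jq's `keys` sorts by codepoint; Python's sorted() on str does the same
sorted_first = sorted(aws)[0]
# `keys_unsorted` keeps insertion order, like iterating a Python dict
unsorted_first = next(iter(aws))
print(sorted_first, unsorted_first)  # EC2 ECS


def get_firstkey_val(obj):
    """Python counterpart of `def get_firstkey_val: keys_unsorted[0] as $k | .[$k];`"""
    k = next(iter(obj))  # raises StopIteration on an empty dict
    return obj[k]


print(get_firstkey_val(get_firstkey_val({"AWS": aws})))  # {'text': ['Great!']}
```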
Or, with some jq trickery (assuming the top_facts_mapping path is guaranteed to exist):
getpath([ paths | select(.[-3] == "top_facts_mapping" ) ] | first)
Since the paths builtin emits the path from the root to every node as an array, we collect all paths whose third-to-last element (denoted by .[-3]) is "top_facts_mapping", i.e. the paths that end exactly two levels inside it.
first then picks the first of these paths, i.e. the list below:
[
"top_facts_mapping",
"AWS",
"ECS"
]
Use getpath/1 to obtain the JSON value at the obtained path.
If there is a risk that the key top_facts_mapping is not present in the JSON, the filter above raises an error: first of an empty array is null, and null is not a valid path for getpath/1. Fix it by adding a proper check:
([ paths | select(.[-3] == "top_facts_mapping" ) ] | first) as $p |
if $p | length > 0 then getpath($p) else empty end
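The same path-based lookup as a Python sketch: paths walks the document in depth-first pre-order, which is the order jq's paths uses, and the guarded lookup returns None where the jq version produces empty. The helper names are hypothetical.

```python
def paths(value, path=()):
    """Yield the path to every node below the root, depth-first pre-order, like jq's `paths`."""
    if isinstance(value, dict):
        children = value.items()
    elif isinstance(value, list):
        children = enumerate(value)
    else:
        children = ()
    for k, v in children:
        yield path + (k,)
        yield from paths(v, path + (k,))


def getpath(value, path):
    for k in path:
        value = value[k]
    return value


def first_two_below(doc, key):
    """Value at the first path whose third-to-last element is `key`, or None if there is none."""
    match = next((p for p in paths(doc) if len(p) >= 3 and p[-3] == key), None)
    return None if match is None else getpath(doc, match)


doc = {
    "query": "rest ec",
    "elected_facts_mapping": {"AWS": {"ECS": {"text": ["Great!"]}}},
    "top_facts_mapping": {
        "AWS": {
            "ECS": {"attachments": ["restart_ecs"], "text": ["Great!"]},
            "EC2": {"attachments": ["create_ec2"], "text": ["Awesome"]},
        }
    },
}
print(first_two_below(doc, "top_facts_mapping"))
# {'attachments': ['restart_ecs'], 'text': ['Great!']}
```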

How to count occurrences of a key-value pair per individual object in jq?

I could not find out how to count occurrences of "title" grouped by "member_id".
The JSON file is:
[
{
"member_id": 123,
"loans":[
{
"date": "123",
"media": [
{ "title": "foo" },
{ "title": "bar" }
]
},
{
"date": "456",
"media": [
{ "title": "foo" }
]
}
]
},
{
"member_id": 456,
"loans":[
{
"date": "789",
"media": [
{ "title": "foo"}
]
}
]
}
]
With this query I get loan entries for users with "title==foo"
jq '.[] | (.member_id) as $m | .loans[].media[] | select(.title=="foo") | {id: $m, title: .title}' member.json
{
"id": 123,
"title": "foo"
}
{
"id": 123,
"title": "foo"
}
{
"id": 456,
"title": "foo"
}
But I could not find out how to get a count per user (a group by) for a given title, to get a result like:
{
"id": 123,
"title": "foo",
"count": 2
}
{
"id": 456,
"title": "foo",
"count": 1
}
I got errors like jq: error (at member.json:31): object ({"title":"f...) and array ([[123]]) cannot be sorted, as they are not both arrays or similar...
When the main goal is to count, it is usually more efficient to avoid constructing an array if determining its length is the only reason for doing so. In the present case you could, for example, write:
def count(s): reduce s as $x (0; .+1);
"foo" as $title | .[] | {
id: .member_id,
$title,
count: count(.loans[].media[] | select(.title == $title))
}
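The same count-without-an-array idea in Python, as a sketch using the question's data: the generator expression is consumed one item at a time and never materialised as a list. Starting the count at 0 (rather than null) means a member with no matching titles gets 0.

```python
def count(stream):
    """Count the items of an iterable without building a list, like `reduce s as $x (0; .+1)`."""
    n = 0
    for _ in stream:
        n += 1
    return n


members = [
    {"member_id": 123, "loans": [
        {"date": "123", "media": [{"title": "foo"}, {"title": "bar"}]},
        {"date": "456", "media": [{"title": "foo"}]},
    ]},
    {"member_id": 456, "loans": [
        {"date": "789", "media": [{"title": "foo"}]},
    ]},
]
title = "foo"
result = [
    {
        "id": m["member_id"],
        "title": title,
        "count": count(md for loan in m["loans"] for md in loan["media"] if md["title"] == title),
    }
    for m in members
]
print(result)
# [{'id': 123, 'title': 'foo', 'count': 2}, {'id': 456, 'title': 'foo', 'count': 1}]
```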
group_by has its uses, but it is well to be aware that it is inefficient even for grouping, because its implementation involves a sort, which is not strictly necessary if the goal is to "group by" some criterion. A completely generic sort-free "group by" function is a bit tricky to implement, but often a simple but non-generic version is sufficient, such as:
# sort-free variant of group_by/1
# f must always evaluate to an integer or always to a string, which
# could be achieved by using `tostring`.
# Output: an array in the former case, or an object in the latter case
def GROUP_BY(f): reduce .[] as $x (null; .[$x|f] += [$x] );
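For comparison, the string-keyed case of GROUP_BY as a Python sketch: a single pass appends each item to its group, so no sort is involved and groups come out in first-seen order. The integer-keyed (array) case is not modelled, and group_by_unsorted is a hypothetical name.

```python
def group_by_unsorted(items, f):
    """Sort-free grouping in the spirit of `reduce .[] as $x (null; .[$x|f] += [$x])`
    for a string-valued f: one pass, groups in first-seen order."""
    groups = {}
    for x in items:
        groups.setdefault(f(x), []).append(x)
    return groups


rows = [
    {"id": 456, "title": "foo"},
    {"id": 123, "title": "foo"},
    {"id": 123, "title": "foo"},
]
# str() plays the role of jq's tostring here
grouped = group_by_unsorted(rows, lambda r: str(r["id"]))
print({k: len(v) for k, v in grouped.items()})  # {'456': 1, '123': 2}
```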
Using group_by:
jq 'map(
(.member_id) as $m
| .loans[].media[]
| select(.title=="foo")
| {id: $m, title: .title}
)
|group_by(.id)[]
|.[0] + { count: length }
' input-file
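A rough Python model of the last two steps, group_by(.id)[] | .[0] + {count: length}, applied to the rows the map(...) stage produces. As with jq's group_by, the rows are sorted by id before grouping; counts_by_id is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter


def counts_by_id(rows):
    """Rough model of `group_by(.id)[] | .[0] + {count: length}`."""
    by_id = itemgetter("id")
    result = []
    for _, grp in groupby(sorted(rows, key=by_id), key=by_id):
        grp = list(grp)
        # first row of the group, plus the size of the group
        result.append({**grp[0], "count": len(grp)})
    return result


rows = [{"id": 123, "title": "foo"}, {"id": 123, "title": "foo"}, {"id": 456, "title": "foo"}]
print(counts_by_id(rows))
# [{'id': 123, 'title': 'foo', 'count': 2}, {'id': 456, 'title': 'foo', 'count': 1}]
```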

jOOQ JSON formatting as array of objects

I have the following (simplified) jOOQ query:
val result = context.select(
jsonObject(
key("id").value(ITEM.ID),
key("title").value(ITEM.NAAM),
key("resources").value(
jsonArrayAgg(ITEM_INHOUD.RESOURCE_ID).absentOnNull()
)
)
).from(ITEM).fetch()
Now the output that I want is:
[
{
"id": "0da04cc5-f70c-4fb3-b5c7-dc645d342631",
"title": "Title1",
"resources": [
"8b0f6d5c-67fc-47ca-be77-d1735e7721ce",
"ea0316db-1cfd-46d7-8260-5c1a4e65a0cd"
]
},
{
"id": "0f7e67e6-5187-47e2-9f1d-dab08feba38b",
"title": "Title2"
}
]
result.formatJSON() gives the following output:
{
"fields": [
{
"name": "json_object",
"type": "JSON"
}
],
"records": [
[
{
"id": "0da04cc5-f70c-4fb3-b5c7-dc645d342631",
"title": "Title 1"
}
]
]
}
Disabling the headers with result.formatJSON(JSONFormat.DEFAULT_FOR_RECORDS) will get me:
[
[
{
"id": "0da04cc5-f70c-4fb3-b5c7-dc645d342631",
"title": "Title1",
"resources": [
"8b0f6d5c-67fc-47ca-be77-d1735e7721ce",
"ea0316db-1cfd-46d7-8260-5c1a4e65a0cd"
]
}
],
[
{
"id": "0f7e67e6-5187-47e2-9f1d-dab08feba38b",
"title": "Title2"
}
]
]
where I don't want the extra array.
Further customizing the JSONFormat with result.formatJSON(JSONFormat().header(false).recordFormat(JSONFormat.RecordFormat.OBJECT)), I get:
[
{
"json_object": {
"id": "0da04cc5-f70c-4fb3-b5c7-dc645d342631",
"title": "Title1",
"resources": [
"8b0f6d5c-67fc-47ca-be77-d1735e7721ce",
"ea0316db-1cfd-46d7-8260-5c1a4e65a0cd"
]
}
},
{
"json_object": {
"id": "0f7e67e6-5187-47e2-9f1d-dab08feba38b",
"title": "Title2"
}
}
]
where I don't want the object wrapped in json_object.
Is there a way to get the output I want?
Doing it with Result.formatJSON()
This is clearly a flaw in the jOOQ 3.14.0 implementation of Result.formatJSON(). In the special case where there is only one column, and that column is of type JSON or JSONB, the column name may not really matter, and thus its contents should be flattened into the object describing the row. I've created a feature request for this: https://github.com/jOOQ/jOOQ/issues/10953. It will be available in jOOQ 3.15.0 and 3.14.4. You will be able to do this:
result.formatJSON(JSONFormat().header(false).wrapSingleColumnRecords(false));
The RecordFormat is irrelevant here: this works the same way for RecordFormat.ARRAY and RecordFormat.OBJECT.
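On a jOOQ version without wrapSingleColumnRecords, one possible client-side workaround is to post-process the RecordFormat.OBJECT output and lift each row's single column value out of its wrapper. The sketch below does this in Python on JSON text shaped like the output in the question; the function name is hypothetical, the IDs are placeholders, and it assumes every row object has exactly one key (json_object).

```python
import json


def unwrap_single_column(json_text):
    """Turn [{"json_object": {...}}, ...] into [{...}, ...]."""
    out = []
    for row in json.loads(json_text):
        if len(row) != 1:
            raise ValueError("expected exactly one column per record")
        (value,) = row.values()  # the only column's value, whatever its name
        out.append(value)
    return out


formatted = (
    '[{"json_object": {"id": "id-1", "title": "Title1", "resources": ["r-1", "r-2"]}},'
    ' {"json_object": {"id": "id-2", "title": "Title2"}}]'
)
print(json.dumps(unwrap_single_column(formatted)))
```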
Doing it directly with SQL
Of course, you can always work around this by moving all the logic into SQL. You probably simplified your query by omitting a JOIN and GROUP BY. I'm assuming this is equivalent to what you want:
val result: JSON? = context.select(
jsonArrayAgg(jsonObject(
key("id").value(ITEM.ID),
key("title").value(ITEM.NAAM),
key("resources").value(
select(jsonArrayAgg(ITEM_INHOUD.RESOURCE_ID).absentOnNull())
.from(ITEM_INHOUD)
.where(ITEM_INHOUD.ITEM_ID.eq(ITEM.ID))
)
))
).from(ITEM).fetchSingle().value1()
Note that JSON_ARRAYAGG() aggregates empty sets into NULL, not into an empty []. If that's a problem, use COALESCE().

Non-destructive assignment with jq

I got the following data:
{
"things": [
{
"name": "lkj",
"something": [
"hike"
],
"more_data": "important",
"other_stuff": "very important"
},
{
"name": "iou",
"different_more_data": "very important too",
"more_different_data": [
"even more"
]
}
]
}
Each element of things has an id called "name". With jq I can edit it like this:
jq '(.things[]) |= {name,something:["changed"]}'
{
"things": [
{
"name": "lkj",
"something": [
"changed"
]
},
...
Unfortunately, I lose everything not declared on the right-hand side of the assignment.
Is there a way to make assignments without losing data, so that the result looks like this?
{
"things": [
{
"name": "lkj",
"something": [
"changed"
],
"more_data": "important",
"other_stuff": "very important"
},
{
"name": "iou",
"something": [
"changed"
],
"different_more_data": "very important too",
"more_different_data": [
"even more"
]
}
]
}
You can simply modify your query so that it looks like:
.things[] |= (.something = ["changed"])
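You can also use |= (or one of its siblings, such as +=) instead of = in the RHS expression, e.g.
.things[] |= (.something += ["changed"])
Note that += appends: "lkj" would end up with ["hike", "changed"], while "iou", which has no something key, gets ["changed"].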
If you want to update some, but not all, items, you can still use the above forms. A straightforward approach is to use if ... then ... else ... end, for example:
.things[] |= (if .name == "lkj" then .something = ["changed"] else . end)
Using select on the LHS of |=
jq (at least since version 1.4) supports the use of select on the LHS of |=, e.g.
(.things[] | select(.name=="lkj")) |= (.something += ["changed"])
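The difference between the destructive and non-destructive forms can be sketched in Python: rebuilding each item as {name, something} throws the other keys away, while assigning a single key keeps them. This is a model, not jq; set_something is a hypothetical name and pred stands in for the if ... then or select(...) condition.

```python
import copy


def set_something(doc, pred=lambda thing: True):
    """Rough model of
    .things[] |= (if PRED then .something = ["changed"] else . end)
    Only the "something" key is assigned; every other key is kept."""
    new = copy.deepcopy(doc)  # jq never mutates its input, so work on a copy
    for thing in new["things"]:
        if pred(thing):
            thing["something"] = ["changed"]
    return new


doc = {"things": [
    {"name": "lkj", "something": ["hike"], "more_data": "important", "other_stuff": "very important"},
    {"name": "iou", "different_more_data": "very important too", "more_different_data": ["even more"]},
]}
print(set_something(doc)["things"][1])
# {'name': 'iou', 'different_more_data': 'very important too', 'more_different_data': ['even more'], 'something': ['changed']}
print(set_something(doc, lambda t: t["name"] == "lkj")["things"][1].get("something"))  # None
```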
With jq's map function:
jq '.things |= map(.something = ["changed"])' jsonfile
map(x) applies the filter x to each item of the input array.
.something = ["changed"] sets the key something to the array ["changed"], leaving the other keys untouched.
The output:
{
"things": [
{
"name": "lkj",
"something": [
"changed"
],
"more_data": "important",
"other_stuff": "very important"
},
{
"name": "iou",
"different_more_data": "very important too",
"more_different_data": [
"even more"
],
"something": [
"changed"
]
}
]
}

jq: output values of ids instead of numbers

Here's my input json:
{
"channels": [
{ "id": 1, "name": "Pop"},
{ "id": 2, "name": "Rock"}
],
"links": [
{ "id": 2, "streams": [ {"url": "http://example.com/rock"} ] },
{ "id": 1, "streams": [ {"url": "http://example.com/pop"} ] }
]
}
This is what I want as an output:
"http://example.com/pop"
"Pop"
"http://example.com/rock"
"Rock"
So I need jq to replace .channels[].id with .links[].streams[0].url based on .links[].id
I don't know if it's right, but this is how I managed to output the urls:
(.channels[].id | tostring) as $ids | [.links[]] | map({(.id | tostring): .streams[0].url}) | add as $urls | $urls[$ids]
"http://example.com/pop"
"http://example.com/rock"
The question is, how do I add .channels[].name to it?
You sometimes have to be careful what you ask for, but this will produce the result you said you want:
.channels[] as $channel
| (.links[] | select(.id == $channel.id) | .streams[0].url),
$channel.name
Output for the given input:
"http://example.com/pop"
"Pop"
"http://example.com/rock"
"Rock"
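The same join written as a nested scan in Python, emitting each URL before its channel name as the question asks. It is a sketch (channel_lines is a hypothetical name); for every channel it scans all links, which is O(channels × links) and fine for small inputs.

```python
def channel_lines(data):
    """For each channel: the first stream URL of every link with the same id, then the name."""
    for channel in data["channels"]:
        for link in data["links"]:
            if link["id"] == channel["id"]:
                yield link["streams"][0]["url"]
        yield channel["name"]


data = {
    "channels": [{"id": 1, "name": "Pop"}, {"id": 2, "name": "Rock"}],
    "links": [
        {"id": 2, "streams": [{"url": "http://example.com/rock"}]},
        {"id": 1, "streams": [{"url": "http://example.com/pop"}]},
    ],
}
for line in channel_lines(data):
    print(line)
```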
Here is a solution which uses reduce and setpath to make a $urls lookup table from .links and then scans .channels generating corresponding urls and names.
(
reduce .links[] as $l (
{};
setpath([ $l.id|tostring ]; [$l.streams[].url])
)
) as $urls
| .channels[]
| $urls[ .id|tostring ][], .name
If multiple URLs are present in the "streams" attribute, this will print them all before printing the name. For example, if the input is
{
"channels": [
{ "id": 1, "name": "Pop"},
{ "id": 2, "name": "Rock"}
],
"links": [
{ "id": 2, "streams": [ {"url": "http://example.com/rock"},
{"url": "http://example.com/hardrock"} ] },
{ "id": 1, "streams": [ {"url": "http://example.com/pop"} ] }
]
}
the output will be
"http://example.com/pop"
"Pop"
"http://example.com/rock"
"http://example.com/hardrock"
"Rock"
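A Python sketch of the lookup-table version: the table from stringified id to all stream URLs is built once, then each channel is a single lookup. One deliberate difference, noted in the code: a channel with no matching link just yields its name here, whereas the jq filter would fail when iterating over null. The function name is hypothetical.

```python
def channel_lines_indexed(data):
    # Build the lookup table once: stringified id -> list of all stream URLs
    # (str() mirrors jq's tostring; a later link with the same id overwrites, as setpath does)
    urls = {}
    for link in data["links"]:
        urls[str(link["id"])] = [s["url"] for s in link["streams"]]
    for channel in data["channels"]:
        # Unlike the jq filter, a missing id yields no URLs instead of an error
        yield from urls.get(str(channel["id"]), [])
        yield channel["name"]


data = {
    "channels": [{"id": 1, "name": "Pop"}, {"id": 2, "name": "Rock"}],
    "links": [
        {"id": 2, "streams": [{"url": "http://example.com/rock"},
                              {"url": "http://example.com/hardrock"}]},
        {"id": 1, "streams": [{"url": "http://example.com/pop"}]},
    ],
}
for line in channel_lines_indexed(data):
    print(line)
```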