jq: count nest object values based on group by - json

Json:
[
{
"account": "1",
"cost": [
{
"usage":"low",
"totalcost": "2.01"
}
]
},
{
"account": "2",
"cost": [
{
"usage":"low",
"totalcost": "2.25"
}
]
},
{
"account": "1",
"cost": [
{
"usage":"low",
"totalcost": "15"
}
]
},
{
"anotheraccount": "a",
"cost": [
{
"usage":"low",
"totalcost": "2"
}
]
}
]
Results expected:
account cost
1 17.01
2 2.25
anotheraccount cost
a 2
I am able to pull out data but not sure how to aggregate it.
jq '.[] | {account,cost : .cost[].totalcost}'
Is there a way to do this in using jq, so I get all types of accounts and costs associated with them?

Two helper functions will help you get you to your destination:
def sigma( f ): reduce .[] as $o (null; . + ($o | f )) ;
def group( keyname ):
map(select(has(keyname)))
| group_by( .[keyname] )
| map({(keyname) : .[0][keyname],
cost: sigma(.cost[].totalcost | tonumber) })
;
With these, the following invocations:
group("account"),
group("anotheraccount")
yield:
[{"account":"1","cost":17.009999999999998},{"account":"2","cost":2.25}]
[{"anotheraccount":"a","cost":2}]
You should be able to manage the final formating step in jq.

Related

Access to first key of a nested json using jq and obtain its value

I have the following JSON:
{
"query": "rest ec",
"elected_facts_mapping": {
"AWS": {
"ECS": {
"attachments": [
"restart_ecs"
],
"text": [
"Great!"
]
}
}
},
"top_facts_mapping": {
"AWS": {
"ECS": {
"attachments": [
"restart_ecs"
],
"text": [
"Great!"
]
},
"EC2": {
"attachments": [
"create_ec2"
],
"text": [
"Awesome"
]
}
},
"GitHub": {
"Pull": {
"attachments": [
"pull_req"
],
"text": [
"Be right on it"
]
}
},
"testtttt": {
"test": {
"attachments": [
"hello_world"
],
"text": [
"Be right on it"
]
}
},
"fgjgh": {
"fnfgj": {
"attachments": [
"hello_world"
],
"text": [
"Be right on it"
]
}
},
"tessttertre": {
"gfdgfdgfd": {
"attachments": [
"hello_world"
],
"text": [
"Great!"
]
}
}
},
"elected_facts_with_prefix_text": null
}
And I want to access to top_facts_mapping's first key AWS and it's first key ECS
I am trying to do this (in my DSL):
'.span | fromjson'
'.span_data.top_facts_mapping | keys[0]'
'.span_data.top_facts_mapping[${top_facts_prepare_top_fact_topic}] | keys[0]'
'.top_facts_prepare_top_fact_topic_subtopic[${top_facts_prepare_top_fact_topic}][${top_facts_prepare_top_fact_topic_subtopic}]'
You could use to_entries to turn the object into an array of key-value pairs, then select the first value using [0].value
.top_facts_mapping | to_entries[0].value | to_entries[0].value
{
"attachments": [
"restart_ecs"
],
"text": [
"Great!"
]
}
Demo
If at one level the object may be empty, you can prepend each to_entries with try (optionally followed by a catch clause)
Here's a stream-based approach which disassembles the input using the --stream option, filters for the "top_facts_mapping" key on top level .[0][0], truncates the stream to descend 3 levels, re-assembles the stream using fromstream, and outputs the first match:
jq --stream -n 'first(fromstream(3| truncate_stream(inputs | select(.[0][0] == "top_facts_mapping"))))'
{
"attachments": [
"restart_ecs"
],
"text": [
"Great!"
]
}
You could use the keys_unsorted builtin, since the underlying object is a dictionary and not a list
.top_facts_mapping | keys_unsorted[0] as $k | .[$k] | .[keys_unsorted[0]]
The above filter could be re-written with a simple function
def get_firstkey_val: keys_unsorted[0] as $k | .[$k];
.top_facts_mapping |
get_firstkey_val | get_firstkey_val
Or with some jq trick-play, assumes the path provided top_facts_mapping is guaranteed to exist
getpath([ paths | select(.[-3] == "top_facts_mapping" ) ] | first)
Since the paths built-in constructs the root to leaf paths as arrays, we all paths containing the second to last field (denoted by .[-3]) as "top_facts_mapping" which returns paths inside it
From which first selects the first entity in the list i.e. below list
[
"top_facts_mapping",
"AWS",
"ECS"
]
Use getpath/1 to obtain the JSON value at the obtained path.
If there is a risk of the key top_facts_mapping not being present in the JSON, getpath/1 could return an error as written above. Fix it by adding a proper check
([ paths | select(.[-3] == "top_facts_mapping" ) ] | first) as $p |
if $p | length > 0 then getpath($p) else empty end

parsing jq returns null

I have a json output
{
"7": [
{
"devices": [
"/dev/sde"
],
"name": "osd-block-dcc9b386-529c-451e-9d84-8ccc4091102b",
"tags": {
"ceph.crush_device_class": "None",
"ceph.db_device": "/dev/nvme0n1p5",
"ceph.wal_device": "/dev/nvme0n1p6",
},
"type": "block",
"vg_name": "ceph-c4de9e90-853e-4569-b04f-8677ef9a8c7a"
},
{
"path": "/dev/nvme0n1p5",
"tags": {
"PARTUUID": "69712eb4-be52-4618-ba46-e317d6d3d76e"
},
"type": "db"
}
],
"41": [
{
"devices": [
"/dev/nvme1n1p13"
],
"name": "osd-block-97bce07f-ae98-4fdb-83a9-9fa2f35cee60",
"tags": {
"ceph.crush_device_class": "None",
},
"type": "block",
"vg_name": "ceph-c1d48671-2a33-4615-95e3-cc1b18783f0c"
}
],
"9": [
{
"devices": [
"/dev/sdf"
],
"name": "osd-block-35323eb8-17c1-460d-8cc5-565f549e6991",
"tags": {
"ceph.crush_device_class": "None",
"ceph.db_device": "/dev/nvme0n1p7",
"ceph.wal_device": "/dev/nvme0n1p8",
},
"type": "block",
"vg_name": "ceph-9488e8b8-ec18-4860-93d3-6a1ad91c698c"
},
{
"path": "/dev/nvme0n1p7",
"tags": {
"PARTUUID": "ef0e9588-2a20-4c2c-8b62-d73945e01322"
},
"type": "db"
}
]
}
Required output:
osd.7 /dev/sde /dev/nvme0n1p5 /dev/nvme0n1p6
osd.41 /dev/nvme1n1p13 n/a n/a
osd.9 /dev/sdf /dev/nvme0n1p7 /dev/nvme0n1p7
Problems:
When I try parsing using jq .[][].devices, I get null values:
$ cat json | jq .[][].devices
[
"/dev/sde"
]
null
[
"/dev/nvme1n1p13"
]
null
[
"/dev/sdf"
]
null
I can solve it via jq .[][].devices[]?.
However, this trick doesn't help me when I do want to see where there's no value (to print n/a instead):
$ cat json | jq '.[][].tags | ."ceph.db_device"'
"/dev/nvme0n1p5"
null
"/dev/nvme0n1p3"
null
null
"/dev/nvme0n1p7"
null
And finally, I try to create a table:
$ cat json | jq -r '["osd."+keys[]], [.[][].devices[]?], [.[][].tags."ceph.db_device" // ""] | #csv' | column -t -s,
"osd.7" "osd.41" "osd.9"
"/dev/sde" "/dev/nvme0n1p13" "/dev/sdf"
"/dev/nvme0n1p5" "/dev/nvme0n1p7"
So the obvious problem is that the 3rd row doesn't match the correct values.
And the final problem is how do I transpose it from columns to rows, as detailed in the required output?
Would this do what you want?
jq --raw-output '
to_entries[] | [
"osd." + .key,
( .value[0]
| .devices[],
( .tags
| ."ceph.db_device" // "n/a",
."ceph.wal_device" // "n/a"
)
)
]
| #tsv
'
osd.7 /dev/sde /dev/nvme0n1p5 /dev/nvme0n1p6
osd.41 /dev/nvme1n1p13 n/a n/a
osd.9 /dev/sdf /dev/nvme0n1p7 /dev/nvme0n1p8
Demo

How to count occurrences of a key-value pair per individual object in JQ?

I could not find how to count occurrence of "title" grouped by "member_id"...
The json file is:
[
{
"member_id": 123,
"loans":[
{
"date": "123",
"media": [
{ "title": "foo" },
{ "title": "bar" }
]
},
{
"date": "456",
"media": [
{ "title": "foo" }
]
}
]
},
{
"member_id": 456,
"loans":[
{
"date": "789",
"media": [
{ "title": "foo"}
]
}
]
}
]
With this query I get loan entries for users with "title==foo"
jq '.[] | (.member_id) as $m | .loans[].media[] | select(.title=="foo") | {id: $m, title: .title}' member.json
{
"id": 123,
"title": "foo"
}
{
"id": 123,
"title": "foo"
}
{
"id": 456,
"title": "foo"
}
But I could not find how to get count by user (group by) for a title, to get a result like:
{
"id": 123,
"title": "foo",
"count": 2
}
{
"id": 456,
"title": "foo",
"count": 1
}
I got errors like jq: error (at member.json:31): object ({"title":"f...) and array ([[123]]) cannot be sorted, as they are not both arrays or similar...
When the main goal is to count, it is usually more efficient to avoid constructing an array if determining its length is the only reason for doing so. In the present case you could, for example, write:
def count(s): reduce s as $x (null; .+1);
"foo" as $title | .[] | {
id: .member_id,
$title,
count: count(.loans[].media[] | select(.title == $title))
}
group_by has its uses, but it is well to be aware that it is inefficient even for grouping, because its implementation involves a sort, which is not strictly necessary if the goal is to "group by" some criterion. A completely generic sort-free "group by" function is a bit tricky to implement, but often a simple but non-generic version is sufficient, such as:
# sort-free variant of group_by/1
# f must always evaluate to an integer or always to a string, which
# could be achieved by using `tostring`.
# Output: an array in the former case, or an object in the latter case
def GROUP_BY(f): reduce .[] as $x (null; .[$x|f] += [$x] );
Using group_by :
jq 'map(
(.member_id) as $m
| .loans[].media[]
| select(.title=="foo")
| {id: $m, title: .title}
)
|group_by(.id)[]
|.[0] + { count: length }
' input-file

How to format a csv file using json data?

I have a json file that I need to convert to a csv file, but I am a little wary of trusting a json-to-csv converter site as the outputted data seems to be incorrect... so I was hoping to get some help here!
I have the following json file structure:
{
"GroupName": "GrpName13",
"Number": 3,
"Notes": "Test Group ",
"Units": [
{
"UnitNumber": "TestUnit13",
"DataSource": "Factory",
"ContractNumber": "TestContract13",
"CarNumber": "2",
"ControllerTypeMessageId" : 4,
"NumberOfLandings": 4,
"CreatedBy": "user1",
"CommissionModeMessageId": 2,
"Details": [
{
"DetailName": "TestFloor13",
"DetailNumber": "5"
}
],
"UnitDevices": [
{
"DeviceTypeMessageId": 1,
"CreatedBy": "user1"
}
]
}
]
}
The issue I think Im seeing is that the converters seem to not be able to comprehend the many nested data values. And the reason I think the converters are wrong is because when I try to convert back to json using them, I dont receive the same structure.
Does anyone know how to manually format this json into csv format, or know of a reliable converter than can handle nested values?
Try
www.json-buddy.com/convert-json-csv-xml.htm
if not working for you then you can try this tool
http://download.cnet.com/JSON-to-CSV/3000-2383_4-76680683.html
should be helpful!
I have tried your json on this for url:
http://www.convertcsv.com/json-to-csv.htm
As a result:
UnitNumber,DataSource,ContractNumber,CarNumber,ControllerTypeMessageId,NumberOfLandings,CreatedBy,CommissionModeMessageId,Details/0/DetailName,Details/0/DetailNumber,UnitDevices/0/DeviceTypeMessageId,UnitDevices/0/CreatedBy
TestUnit13,Factory,TestContract13,2,4,4,user1,2,TestFloor13,5,1,user1
Because it could save the path of the key,like the 'DeviceTypeMessageId' in list 'UnitDevices': it will named the columns name with 'UnitDevices/0/DeviceTypeMessageId', this could avoid the same name mistake, so you can get the columns name by its converter rules.
Hope helpful.
Here is a solution using jq
If the file filter.jq contains
def denormalize:
def headers($p):
keys_unsorted[] as $k
| if .[$k]|type == "array" then (.[$k]|first|headers("\($p)\($k)_"))
else "\($p)\($k)"
end
;
def setup:
[
keys_unsorted[] as $k
| if .[$k]|type == "array" then [ .[$k][]| setup ]
else .[$k]
end
]
;
def iter:
if length == 0 then []
elif .[0]|type != "array" then
[.[0]] + (.[1:] | iter)
else
(.[0][] | iter) as $x
| (.[1:] | iter) as $y
| [$x[]] + $y
end
;
[ headers("") ], (setup | iter)
;
denormalize | #csv
and data.json contains (note extra samples added)
{
"GroupName": "GrpName13",
"Notes": "Test Group ",
"Number": 3,
"Units": [
{
"CarNumber": "2",
"CommissionModeMessageId": 2,
"ContractNumber": "TestContract13",
"ControllerTypeMessageId": 4,
"CreatedBy": "user1",
"DataSource": "Factory",
"Details": [
{
"DetailName": "TestFloor13",
"DetailNumber": "5"
}
],
"NumberOfLandings": 4,
"UnitDevices": [
{
"CreatedBy": "user1",
"DeviceTypeMessageId": 1
},
{
"CreatedBy": "user10",
"DeviceTypeMessageId": 10
}
],
"UnitNumber": "TestUnit13"
},
{
"CarNumber": "99",
"CommissionModeMessageId": 99,
"ContractNumber": "Contract99",
"ControllerTypeMessageId": 99,
"CreatedBy": "user99",
"DataSource": "Another Factory",
"Details": [
{
"DetailName": "TestFloor99",
"DetailNumber": "99"
}
],
"NumberOfLandings": 99,
"UnitDevices": [
{
"CreatedBy": "user99",
"DeviceTypeMessageId": 99
}
],
"UnitNumber": "Unit99"
}
]
}
then the command
jq -M -r -f filter.jq data.json
will produce
"GroupName","Notes","Number","Units_CarNumber","Units_CommissionModeMessageId","Units_ContractNumber","Units_ControllerTypeMessageId","Units_CreatedBy","Units_DataSource","Units_Details_DetailName","Units_Details_DetailNumber","Units_NumberOfLandings","Units_UnitDevices_CreatedBy","Units_UnitDevices_DeviceTypeMessageId","Units_UnitNumber"
"GrpName13","Test Group ",3,"2",2,"TestContract13",4,"user1","Factory","TestFloor13","5",4,"user1",1,"TestUnit13"
"GrpName13","Test Group ",3,"2",2,"TestContract13",4,"user1","Factory","TestFloor13","5",4,"user10",10,"TestUnit13"
"GrpName13","Test Group ",3,"99",99,"Contract99",99,"user99","Another Factory","TestFloor99","99",99,"user99",99,"Unit99"

jq get the value of x based on y in a complex json file

jq strikes again. Trying to get the value of DATABASES_DEFAULT based on the name in a json file that has a whole lot of names and I'm completely lost.
My file looks like the following (output of an aws ecs describe-task-definition) only much more complex; I've stripped this to the most basic example I can where the structure is still intact.
{
"taskDefinition": {
"status": "bar",
"family": "bar2",
"volumes": [],
"taskDefinitionArn": "bar3",
"containerDefinitions": [
{
"dnsSearchDomains": [],
"environment": [
{
"name": "bar4",
"value": "bar5"
},
{
"name": "bar6",
"value": "bar7"
},
{
"name": "DATABASES_DEFAULT",
"value": "foo"
}
],
"name": "baz",
"links": []
},
{
"dnsSearchDomains": [],
"environment": [
{
"name": "bar4",
"value": "bar5"
},
{
"name": "bar6",
"value": "bar7"
},
{
"name": "DATABASES_DEFAULT",
"value": "foo2"
}
],
"name": "boo",
"links": []
}
],
"revision": 1
}
}
I need the value of DATABASES_DEFAULT where the name is baz. Note that there are a lot of keypairs with name, I'm specifically talking about the one outside of environment.
I've been tinkering with this but only got this far before realizing that I don't understand how to access nested values.
jq '.[] | select(.name==DATABASES_DEFAULT) | .value'
which is returning
jq: error: DATABASES_DEFAULT/0 is not defined at <top-level>, line 1:
.[] | select(.name==DATABASES_DEFAULT) | .value
jq: 1 compile error
Obviously this a) doesn't work, and b) even if it did, it's independant of the name value. My thought was to return all the db defaults and then identify the one with baz, but I don't know if that's the right approach.
I like to think of it as digging down into the structure, so first you open the outer layers:
.taskDefinition.containerDefinitions[]
Now select the one you want:
select(.name =="baz")
Open the inner structure:
.environment[]
Select the desired object:
select(.name == "DATABASES_DEFAULT")
Choose the key you want:
.value
Taken together:
parse.jq
.taskDefinition.containerDefinitions[] |
select(.name =="baz") |
.environment[] |
select(.name == "DATABASES_DEFAULT") |
.value
Run it like this:
<infile jq -f parse.jq
Output:
"foo"
The following seems to work:
.taskDefinition.containerDefinitions[] |
select(
select(
.environment[] | .name == "DATABASES_DEFAULT"
).name == "baz"
)
The output is the object with the name key mapped to "baz".
$ jq '.taskDefinition.containerDefinitions[] | select(select(.environment[]|.name == "DATABASES_DEFAULT").name=="baz")' tmp.json
{
"dnsSearchDomains": [],
"environment": [
{
"name": "bar4",
"value": "bar5"
},
{
"name": "bar6",
"value": "bar7"
},
{
"name": "DATABASES_DEFAULT",
"value": "foo"
}
],
"name": "baz",
"links": []
}