jq: group by into a single object with groups as keys

I have the following data:
[
{
"company.u_ats_region": "Region1",
"hostname": "host1",
"install_status": "1",
"os": "Windows",
"os_domain": "test.com"
},
{
"company.u_ats_region": "Region2",
"hostname": "host2",
"install_status": "1",
"os": "Windows",
"os_domain": "test.com"
},
{
"company.u_ats_region": "Region3",
"hostname": "host3",
"install_status": "7",
"os": "Windows",
"os_domain": "test.com"
}
]
And I've been using this query
{count: length,
regions: [group_by(."company.u_ats_region")[] |
{(.[0]."company.u_ats_region"): [.[] |
{name: (.hostname+"."+.os_domain),
os: .os}]}]}
to convert the data into the following:
{
"count": 3,
"regions": [
{
"Region1": [
{
"name": "host1.test.com",
"os": "Windows"
}
]
},
{
"Region2": [
{
"name": "host2.test.com",
"os": "Windows"
}
]
},
{
"Region3": [
{
"name": "host3.test.com",
"os": "Windows"
}
]
}
]
}
This is close to what I'm trying to achieve but I would like 'regions' to be a single object with each region being a key within that object like this:
{
"count": 3,
"regions": {
"Region1": [
{
"name": "host1.test.com",
"os": "Windows"
}
],
"Region2": [
{
"name": "host2.test.com",
"os": "Windows"
}
],
"Region3": [
{
"name": "host3.test.com",
"os": "Windows"
}
]
}
}
I have tried playing around with 'add' but that still didn't bring me any closer to the result I'm trying to achieve. Any help is appreciated!

Creating an object with key and value fields, then using from_entries would be one way:
{
count: length,
regions: group_by(."company.u_ats_region")
| map({
key: .[0]."company.u_ats_region",
value: map({name: "\(.hostname).\(.os_domain)", os})
})
| from_entries
}
{
"count": 3,
"regions": {
"Region1": [
{
"name": "host1.test.com",
"os": "Windows"
}
],
"Region2": [
{
"name": "host2.test.com",
"os": "Windows"
}
],
"Region3": [
{
"name": "host3.test.com",
"os": "Windows"
}
]
}
}

You can define a custom function which performs the grouping and then transform the result. Using a function avoids having to repeat the selector:
def group_to_obj(f):
group_by(f) | map({key:first|f, value:.}) | from_entries;
{
count: length,
regions: group_to_obj(."company.u_ats_region")
| map_values(map({name: "\(.hostname).\(.os_domain)", os}))
}
Output:
{
"count": 3,
"regions": {
"Region1": [
{
"name": "host1.test.com",
"os": "Windows"
}
],
"Region2": [
{
"name": "host2.test.com",
"os": "Windows"
}
],
"Region3": [
{
"name": "host3.test.com",
"os": "Windows"
}
]
}
}

Using reduce to iteratively build up the arrays would be another way:
{
count: length,
regions: (
reduce .[] as $i ({};
.[$i."company.u_ats_region"] += [$i | {name: "\(.hostname).\(.os_domain)", os}]
)
)
}
{
"count": 3,
"regions": {
"Region1": [
{
"name": "host1.test.com",
"os": "Windows"
}
],
"Region2": [
{
"name": "host2.test.com",
"os": "Windows"
}
],
"Region3": [
{
"name": "host3.test.com",
"os": "Windows"
}
]
}
}
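For comparison, here is a minimal Python sketch of the same accumulate-per-key idea used by the reduce version above. It is not part of the jq answers; the function name group_hosts and the variable names are made up for illustration, and the input is the sample data from the question.

```python
import json
from collections import defaultdict

# Sample records copied from the question.
hosts = [
    {"company.u_ats_region": "Region1", "hostname": "host1",
     "install_status": "1", "os": "Windows", "os_domain": "test.com"},
    {"company.u_ats_region": "Region2", "hostname": "host2",
     "install_status": "1", "os": "Windows", "os_domain": "test.com"},
    {"company.u_ats_region": "Region3", "hostname": "host3",
     "install_status": "7", "os": "Windows", "os_domain": "test.com"},
]


def group_hosts(records):
    """Collect records into one dict keyed by region (hypothetical helper)."""
    regions = defaultdict(list)
    for rec in records:
        # Append a trimmed-down entry to the list stored under this region.
        regions[rec["company.u_ats_region"]].append(
            {"name": f'{rec["hostname"]}.{rec["os_domain"]}', "os": rec["os"]}
        )
    return {"count": len(records), "regions": dict(regions)}


result = group_hosts(hosts)
print(json.dumps(result, indent=2))
```

Like the reduce filter, this walks the input once and appends to the list stored under each region key, so no separate grouping pass is needed.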

jq flatten a deeply nested json document

Considering the following deeply nested json object
[
{
"level1key": "level1value",
"children": [
{
"level2key": "level2value1",
"children": [
{
"level3key1": "ignored",
"level3key2": "ignored",
"level3key3": [
{
"level4key": "ignored"
}
]
},
{
"level3key1": "level3value1",
"level3key2": "level3value22",
"level3key3": [
{
"level4key": "level4value1"
}
]
},
{
"level3key1": "level3value2",
"level3key2": "level3value22",
"level3key3": [
{
"level4key": "level4value2"
}
]
}
]
},
{
"level2key": "level2value2",
"children": [
{
"level3key1": "ignored",
"level3key2": "ignored",
"level3key3": [
{
"level4key": "ignored"
}
]
},
{
"level3key1": "level3value3",
"level3key2": "level3value22",
"level3key3": [
{
"level4key": "level4value3"
}
]
},
{
"level3key1": "level3value4",
"level3key2": "level3value22",
"level3key3": [
{
"level4key": "level4value4"
}
]
}
]
}
]
}
]
I need to filter by "level3value22" at .[0].children[].children[].level3key2 and flatten this deeply nested JSON object into an array. The expected result is shown below. What should the jq expression look like?
[
{
"v1": "level1value",
"v2": "level2value1",
"v3": "level3value1",
"v4": "level4value1"
},
{
"v1": "level1value",
"v2": "level2value1",
"v3": "level3value2",
"v4": "level4value2"
},
{
"v1": "level1value",
"v2": "level2value2",
"v3": "level3value3",
"v4": "level4value3"
},
{
"v1": "level1value",
"v2": "level2value2",
"v3": "level3value4",
"v4": "level4value4"
}
]
Thanks in advance!
I'm not sure if I understood correctly what you were trying to achieve but this at least works on your sample data:
[
.[0] | [.level1key] + (
.children[] | [.level2key] + (
.children[] | select(.level3key2 == "level3value22") | [.level3key1] + (
.level3key3[] | [.level4key]
)
)
)
| with_entries(.key |= "v\(. + 1)")
]
[
{
"v1": "level1value",
"v2": "level2value1",
"v3": "level3value1",
"v4": "level4value1"
},
{
"v1": "level1value",
"v2": "level2value1",
"v3": "level3value2",
"v4": "level4value2"
},
{
"v1": "level1value",
"v2": "level2value2",
"v3": "level3value3",
"v4": "level4value3"
},
{
"v1": "level1value",
"v2": "level2value2",
"v3": "level3value4",
"v4": "level4value4"
}
]
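If it helps to see the traversal spelled out, the following is a rough Python sketch of the same filter-and-flatten logic. It is an illustration only: the function name flatten and its wanted parameter are invented here, the sample is a shortened version of the question's data, and unlike the jq filter (which looks only at .[0]) it loops over every top-level element.

```python
def flatten(doc, wanted="level3value22"):
    """Return one flat row per level-4 item under a matching level-3 node."""
    rows = []
    for top in doc:
        for mid in top.get("children", []):
            for leaf in mid.get("children", []):
                # Keep only level-3 nodes whose level3key2 matches the filter.
                if leaf.get("level3key2") != wanted:
                    continue
                for item in leaf.get("level3key3", []):
                    rows.append({
                        "v1": top["level1key"],
                        "v2": mid["level2key"],
                        "v3": leaf["level3key1"],
                        "v4": item["level4key"],
                    })
    return rows


# Shortened sample based on the question's data.
doc = [{
    "level1key": "level1value",
    "children": [{
        "level2key": "level2value1",
        "children": [
            {"level3key1": "ignored", "level3key2": "ignored",
             "level3key3": [{"level4key": "ignored"}]},
            {"level3key1": "level3value1", "level3key2": "level3value22",
             "level3key3": [{"level4key": "level4value1"}]},
        ],
    }],
}]

rows = flatten(doc)
print(rows)
```

Each nested loop corresponds to one `[]` iteration in the jq filter, and the `continue` plays the role of `select`.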

Sort complex JSON object by specific property

How can I sort the given JSON object by the count property? I want to sort the entire sub-objects, so that the one with the highest count value comes first, and so on.
{
"Resource": [
{
"details": [
{
"value": "3.70"
},
{
"value": "3.09"
}
],
"work": {
"count": 1
}
},
{
"details": [
{
"value": "4"
},
{
"value": "5"
}
],
"work": {
"count": 2
}
},
{
"details": [
{
"value": "5"
},
{
"value": "5"
}
],
"work": "null"
}
]
}
You can try this example to sort your data:
data = {
"data": {
"Resource": [
{
"details": [{"value": "3.70"}, {"value": "3.09"}],
"work": {"count": 1},
},
{"details": [{"value": "4"}, {"value": "5"}], "work": {"count": 2}},
]
}
}
# sort by 'work'/'count'
data["data"]["Resource"] = sorted(
data["data"]["Resource"], key=lambda r: r["work"]["count"]
)
# sort by 'details'/'value'
for r in data["data"]["Resource"]:
    r["details"] = sorted(r["details"], key=lambda k: float(k["value"]))
# pretty print:
import json
print(json.dumps(data, indent=4))
Prints:
{
"data": {
"Resource": [
{
"details": [
{
"value": "3.09"
},
{
"value": "3.70"
}
],
"work": {
"count": 1
}
},
{
"details": [
{
"value": "4"
},
{
"value": "5"
}
],
"work": {
"count": 2
}
}
]
}
}
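Note that the example above sorts in ascending order and leaves out the third entry from the question, whose "work" value is the string "null". A possible variation is sketched below, assuming the highest count should come first and that entries without a usable count should go last; the helper name count_of is made up for this sketch.

```python
import json

# The three "Resource" entries from the question, including the one
# whose "work" value is the string "null".
resources = [
    {"details": [{"value": "3.70"}, {"value": "3.09"}], "work": {"count": 1}},
    {"details": [{"value": "4"}, {"value": "5"}], "work": {"count": 2}},
    {"details": [{"value": "5"}, {"value": "5"}], "work": "null"},
]


def count_of(resource):
    """Sort key: the numeric count, or -inf when no usable count exists."""
    work = resource.get("work")
    if isinstance(work, dict) and isinstance(work.get("count"), (int, float)):
        return work["count"]
    return float("-inf")


# reverse=True puts the highest count first; -inf keys end up last.
ordered = sorted(resources, key=count_of, reverse=True)
print(json.dumps(ordered, indent=4))
```

Using a key function that falls back to negative infinity avoids the TypeError that r["work"]["count"] would raise on the "null" entry.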

How can I create a hierarchical json response in FLASK

I have a single table in my database (see the linked database table). I want to search for a child in the database and return hierarchical JSON to the front end in order to build a tree. How can I do that in Flask?
My expected JSON format should look like the linked expected JSON.
Since you have tagged your question with flask, this post assumes you are using Python as well. To format your database values as a JSON string, you can query the database and then use recursion:
import sqlite3, collections
d = list(sqlite3.connect('file.db').cursor().execute("select * from values"))
def get_tree(vals):
    _d = collections.defaultdict(list)
    for a, *b in vals:
        _d[a].append(b)
    return [{'name':a, **({} if not (c:=list(filter(None, b))) else {'children':get_tree(b)})} for a, b in _d.items()]
import json
print(json.dumps(get_tree(d), indent=4))
Output:
[
{
"name": "AA",
"children": [
{
"name": "BB",
"children": [
{
"name": "EE",
"children": [
{
"name": "JJ",
"children": [
{
"name": "EEV"
},
{
"name": "FFW"
}
]
},
{
"name": "KK",
"children": [
{
"name": "HHX"
}
]
}
]
}
]
},
{
"name": "CC",
"children": [
{
"name": "FF",
"children": [
{
"name": "LL",
"children": [
{
"name": "QQY"
}
]
},
{
"name": "MM",
"children": [
{
"name": "RRV"
}
]
}
]
},
{
"name": "GG",
"children": [
{
"name": "NN",
"children": [
{
"name": "SSW"
}
]
}
]
}
]
},
{
"name": "DD",
"children": [
{
"name": "HH",
"children": [
{
"name": "OO",
"children": [
{
"name": "TTZ"
}
]
}
]
},
{
"name": "II",
"children": [
{
"name": "PP",
"children": [
{
"name": "UUW"
}
]
}
]
}
]
}
]
}
]
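The recursion above is fairly dense, so here is a more verbose sketch of the same idea that can be run without an existing database file. Everything specific in it is an assumption made for illustration: the in-memory table paths, its columns l1 to l3, the sample rows, and the function name build_tree. It also treats NULL-padded columns as "no further children", which the one-liner above does not do.

```python
import json
import sqlite3
from collections import defaultdict


def build_tree(rows):
    """Turn rows of (level1, level2, ...) values into nested name/children dicts."""
    groups = defaultdict(list)
    for head, *rest in rows:
        # Group the remaining columns of each row under its first column.
        groups[head].append(rest)
    tree = []
    for name, tails in groups.items():
        node = {"name": name}
        # Assumption: an empty tail or a leading NULL means no deeper level.
        children = [t for t in tails if t and t[0] is not None]
        if children:
            node["children"] = build_tree(children)
        tree.append(node)
    return tree


# Hypothetical in-memory table standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE paths (l1 TEXT, l2 TEXT, l3 TEXT)")
conn.executemany(
    "INSERT INTO paths VALUES (?, ?, ?)",
    [("AA", "BB", "EE"), ("AA", "BB", "FF"), ("AA", "CC", "GG")],
)
rows = conn.execute("SELECT l1, l2, l3 FROM paths ORDER BY rowid").fetchall()
tree = build_tree(rows)
print(json.dumps(tree, indent=4))
```

In a Flask view, the resulting list could then be returned with flask.jsonify(tree) instead of being printed.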

Cloudformation template to create EMR cluster

I am trying to create EMR-5.30.1 clusters with applications such as Hadoop, Livy, Spark, ZooKeeper, and Hive using a CloudFormation template. The problem is that with this template I am able to create the cluster with only one application from the list above.
Below is the CloudFormation template:
{
"AWSTemplateFormatVersion": "2010-09-09",
"Description": "Best Practice EMR Cluster for Spark or S3 backed Hbase",
"Parameters": {
"EMRClusterName": {
"Description": "Name of the cluster",
"Type": "String",
"Default": "emrcluster"
},
"KeyName": {
"Description": "Must be an existing Keyname",
"Type": "String",
"Default": "keyfilename"
},
"MasterInstanceType": {
"Description": "Instance type to be used for the master instance.",
"Type": "String",
"Default": "m5.xlarge"
},
"CoreInstanceType": {
"Description": "Instance type to be used for core instances.",
"Type": "String",
"Default": "m5.xlarge"
},
"NumberOfCoreInstances": {
"Description": "Must be a valid number",
"Type": "Number",
"Default": 1
},
"SubnetID": {
"Description": "Must be Valid public subnet ID",
"Default": "subnet-ee15b3e0",
"Type": "String"
},
"LogUri": {
"Description": "Must be a valid S3 URL",
"Default": "s3://aws/elasticmapreduce/",
"Type": "String"
},
"S3DataUri": {
"Description": "Must be a valid S3 bucket URL ",
"Default": "s3://aws/elasticmapreduce/",
"Type": "String"
},
"ReleaseLabel": {
"Description": "Must be a valid EMR release version",
"Default": "emr-5.30.1",
"Type": "String"
},
"Applications": {
"Description": "Please select which application will be installed on the cluster this would be either Ganglia and spark, or Ganglia and s3 backed Hbase",
"Type": "String",
"AllowedValues": [
"Spark",
"Hbase",
"Hive",
"Livy",
"ZooKeeper"
]
}
},
"Mappings": {},
"Conditions": {
"Spark": {
"Fn::Equals": [
{
"Ref": "Applications"
},
"Spark"
]
},
"Hbase": {
"Fn::Equals": [
{
"Ref": "Applications"
},
"Hbase"
]
},
"Hive": {
"Fn::Equals": [
{
"Ref": "Applications"
},
"Hive"
]
},
"Livy": {
"Fn::Equals": [
{
"Ref": "Applications"
},
"Livy"
]
},
"ZooKeeper": {
"Fn::Equals": [
{
"Ref": "Applications"
},
"ZooKeeper"
]
}
},
"Resources": {
"EMRCluster": {
"DependsOn": [
"EMRClusterServiceRole",
"EMRClusterinstanceProfileRole",
"EMRClusterinstanceProfile"
],
"Type": "AWS::EMR::Cluster",
"Properties": {
"Applications": [
{
"Name": "Ganglia"
},
{
"Fn::If": [
"Spark",
{
"Name": "Spark"
},
{
"Ref": "AWS::NoValue"
}
]
},
{
"Fn::If": [
"Hbase",
{
"Name": "Hbase"
},
{
"Ref": "AWS::NoValue"
}
]
},
{
"Fn::If": [
"Hive",
{
"Name": "Hive"
},
{
"Ref": "AWS::NoValue"
}
]
},
{
"Fn::If": [
"Livy",
{
"Name": "Livy"
},
{
"Ref": "AWS::NoValue"
}
]
},
{
"Fn::If": [
"ZooKeeper",
{
"Name": "ZooKeeper"
},
{
"Ref": "AWS::NoValue"
}
]
}
],
"Configurations": [
{
"Classification": "hbase-site",
"ConfigurationProperties": {
"hbase.rootdir":{"Ref":"S3DataUri"}
}
},
{
"Classification": "hbase",
"ConfigurationProperties": {
"hbase.emr.storageMode": "s3"
}
}
],
"Instances": {
"Ec2KeyName": {
"Ref": "KeyName"
},
"Ec2SubnetId": {
"Ref": "SubnetID"
},
"MasterInstanceGroup": {
"InstanceCount": 1,
"InstanceType": {
"Ref": "MasterInstanceType"
},
"Market": "ON_DEMAND",
"Name": "Master"
},
"CoreInstanceGroup": {
"InstanceCount": {
"Ref": "NumberOfCoreInstances"
},
"InstanceType": {
"Ref": "CoreInstanceType"
},
"Market": "ON_DEMAND",
"Name": "Core"
},
"TerminationProtected": false
},
"VisibleToAllUsers": true,
"JobFlowRole": {
"Ref": "EMRClusterinstanceProfile"
},
"ReleaseLabel": {
"Ref": "ReleaseLabel"
},
"LogUri": {
"Ref": "LogUri"
},
"Name": {
"Ref": "EMRClusterName"
},
"AutoScalingRole": "EMR_AutoScaling_DefaultRole",
"ServiceRole": {
"Ref": "EMRClusterServiceRole"
}
}
},
"EMRClusterServiceRole": {
"Type": "AWS::IAM::Role",
"Properties": {
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": [
"elasticmapreduce.amazonaws.com"
]
},
"Action": [
"sts:AssumeRole"
]
}
]
},
"ManagedPolicyArns": [
"arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole"
],
"Path": "/"
}
},
"EMRClusterinstanceProfileRole": {
"Type": "AWS::IAM::Role",
"Properties": {
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": [
"ec2.amazonaws.com"
]
},
"Action": [
"sts:AssumeRole"
]
}
]
},
"ManagedPolicyArns": [
"arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role"
],
"Path": "/"
}
},
"EMRClusterinstanceProfile": {
"Type": "AWS::IAM::InstanceProfile",
"Properties": {
"Path": "/",
"Roles": [
{
"Ref": "EMRClusterinstanceProfileRole"
}
]
}
}
},
"Outputs": {}
}
I also want to add a bootstrap script to this template. Can anyone please help me with this?
As far as I know, Applications in your case should be an array like the one below, as mentioned in the documentation:
"Applications" : [ Application, ... ],
In your case, you can list the applications like this:
"Applications" : [
{"Name" : "Spark"},
{"Name" : "Hbase"},
{"Name" : "Hive"},
{"Name" : "Livy"},
{"Name" : "ZooKeeper"}
]
Each application entry accepts properties other than Name, such as Args and AdditionalInfo; see the documentation for details.
You can do it the following way.
If you set "ReleaseLabel", there is no need to specify the versions of the applications:
"Applications": [{
"Name": "Hive"
},
{
"Name": "Presto"
},
{
"Name": "Spark"
}
]
For the bootstrap script:
"BootstrapActions": [{
"Name": "setup",
"ScriptBootstrapAction": {
"Path": "s3://bucket/key/Bootstrap.sh"
}
}]
Define the resource like this to install all the applications at once:
{
"Type": "AWS::EMR::Cluster",
"Properties": {
"Applications": [
{
"Name": "Ganglia"
},
{
"Name": "Spark"
},
{
"Name": "Livy"
},
{
"Name": "ZooKeeper"
},
{
"Name": "JupyterHub"
}
]
}
}
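To show how the application list and the bootstrap action from the answers above fit into one resource, here is a small Python sketch that prints such a fragment as JSON. The helper emr_properties is hypothetical, and the S3 path is the placeholder from the answer above. The output is only a fragment: the other properties from the original template (Instances, JobFlowRole, Name, ServiceRole and so on) would still be needed.

```python
import json


def emr_properties(app_names, bootstrap_path):
    """Build the Applications and BootstrapActions parts of the resource."""
    return {
        # One {"Name": ...} entry per application, all in a single array.
        "Applications": [{"Name": name} for name in app_names],
        "BootstrapActions": [
            {
                "Name": "setup",
                "ScriptBootstrapAction": {"Path": bootstrap_path},
            }
        ],
    }


props = emr_properties(
    ["Ganglia", "Spark", "Livy", "ZooKeeper", "Hive"],
    "s3://bucket/key/Bootstrap.sh",  # placeholder path, replace with your own
)
fragment = {"Type": "AWS::EMR::Cluster", "Properties": props}
print(json.dumps(fragment, indent=2))
```

The point is that every application goes into the same Applications array unconditionally, rather than being chosen through a single-valued parameter and per-application conditions.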

Recurse for object if exists in JQ

I have the following structure:
{
"hits":
[
{
"_index": "main"
},
{
"_index": "main",
"accordions": [
{
"id": "1",
"accordionBody": "body1",
"accordionInnerButtonTexts": [
"button11",
"button12"
]
},
{
"id": "2",
"accordionBody": "body2",
"accordionInnerButtonTexts": [
"button21",
"button22"
]
}
]
}
]
}
I want to get to this structure:
{
"index": "main"
}
{
"index": "main",
"accordions":
[
{
"id": "1",
"accordionBody": "body1",
"accordionInnerButtonTexts": [
"button11",
"button12"
]
},
{
"id": "2",
"accordionBody": "body2",
"accordionInnerButtonTexts": [
"button21",
"button22"
]
}
]
}
Which means that I always want to include the _index-field as index, and I want to include the whole accordions-list IF IT EXISTS in the object. Here is my attempt:
.hits[] | {index: ._index, accordions: recurse(.accordions[]?)}
It does not produce what I want:
{
"index": "main",
"accordions": {
"_index": "main"
}
}
{
"index": "main",
"accordions": {
"_index": "main",
"accordions": [
{
"id": "1",
"accordionBody": "body1",
"accordionInnerButtonTexts": [
"button11",
"button12"
]
},
{
"id": "2",
"accordionBody": "body2",
"accordionInnerButtonTexts": [
"button21",
"button22"
]
}
]
}
}
{
"index": "main",
"accordions": {
"id": "1",
"accordionBody": "body1",
"accordionInnerButtonTexts": [
"button11",
"button12"
]
}
}
{
"index": "main",
"accordions": {
"id": "2",
"accordionBody": "body2",
"accordionInnerButtonTexts": [
"button21",
"button22"
]
}
}
It seems to create a list of all different permutations given by mixing the objects. This is not what I want. What is the correct jq command, and what is my mistake?
The problem as stated does not require any recursion. Using your attempt as a model, one could in fact simply write:
.hits[]
| {index: ._index}
+ (if has("accordions") then {accordions} else {} end)
Or, with quite different semantics:
.hits[] | {index: ._index} + . | del(._index)
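The difference between the two filters may be easier to see in a Python sketch with the same behaviour. The function names below are invented for illustration and the input is a shortened form of the question's data: the first function copies only accordions when it is present, while the second keeps every other field and merely renames _index.

```python
def with_optional_accordions(hit):
    """Mirror of the first filter: index, plus accordions only if present."""
    out = {"index": hit.get("_index")}
    if "accordions" in hit:
        out["accordions"] = hit["accordions"]
    return out


def rename_index(hit):
    """Mirror of the second filter: keep all fields, rename _index to index."""
    out = {"index": hit.get("_index")}
    out.update({k: v for k, v in hit.items() if k != "_index"})
    return out


# Shortened sample based on the question's data, with one extra field
# ("score", made up here) to show where the two variants differ.
hits = [
    {"_index": "main"},
    {"_index": "main", "score": 1, "accordions": [{"id": "1"}, {"id": "2"}]},
]

first = [with_optional_accordions(h) for h in hits]
second = [rename_index(h) for h in hits]
print(first)
print(second)
```

With the extra score field, the first variant drops it and the second keeps it, which is the "quite different semantics" mentioned above.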