Split JSON file into one object per file - json

I have a JSON file, with a structure like this:
{
"106" : {
"id54011" : [
{
"partno1" : "16690617"
},
{
"partno2" : "5899180"
}
],
"parts" : [
"0899180",
"16920617"
],
"id5632" : [
{
"partno1" : "090699180"
}
]
},
"560" : {
"id9452" : [
{
"partno2" : "1569855"
}
],
"parts" : [
"03653624",
"15899855"
],
"id578" : [
{
"partno3" : "0366393624"
},
{
"partno4" : "0363213624"
}
]
}
}
I need to split this JSON into multiple files, as follows:
Each JSON file will consist of one object. Using the example file above, I should end up with 000106.json and 000560.json. (All names must have 6 digits, so leading zeros must be added.)
I have tried to use an iteration grouper in Python, and jq, but no luck so far.
Expected output:
JSON file 1, named 000106.json:
{
"106" : {
"id54011" : [
{
"partno1" : "16690617"
},
{
"partno2" : "5899180"
}
],
"parts" : [
"0899180",
"16920617"
],
"id5632" : [
{
"partno1" : "090699180"
}
]
}
}
JSON file 2, named 000560.json:
{
"560" : {
"id9452" : [
{
"partno2" : "1569855"
}
],
"parts" : [
"03653624",
"15899855"
],
"id578" : [
{
"partno3" : "0366393624"
},
{
"partno4" : "0363213624"
}
]
}
}

Since this question has both jq and awk tags, I'd recommend using jq and awk as explained here: Split a JSON file into separate files
You can easily pad the key names in jq or awk.
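The answer above points to jq and awk. Since the question also mentions trying Python, here is a minimal alternative sketch in Python, not the jq/awk approach itself. The function name split_json, the width parameter, and the temporary demo directory are assumptions made for illustration; the padding uses str.zfill.

```python
import json
import tempfile
from pathlib import Path


def split_json(source: Path, out_dir: Path, width: int = 6) -> list[Path]:
    """Write one file per top-level key, named with the key zero-padded to `width`."""
    with source.open(encoding="utf-8") as fh:
        data = json.load(fh)

    written = []
    for key, value in data.items():
        # str.zfill pads on the left with zeros: "106" -> "000106"
        target = out_dir / f"{key.zfill(width)}.json"
        with target.open("w", encoding="utf-8") as fh:
            # Keep the original key as the single top-level entry
            json.dump({key: value}, fh, indent=2)
        written.append(target)
    return written


# Demonstration with a trimmed-down version of the sample data (hypothetical paths)
demo_dir = Path(tempfile.mkdtemp())
demo_input = demo_dir / "input.json"
demo_input.write_text(json.dumps({
    "106": {"parts": ["0899180", "16920617"]},
    "560": {"parts": ["03653624", "15899855"]},
}), encoding="utf-8")

created = split_json(demo_input, demo_dir)
print([p.name for p in created])
```

Each output file keeps the original key as its only top-level entry, matching the expected output in the question.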

Related

Parse and Map 2 Arrays with jq

I am working with a JSON file similar to the one below:
{ "Response" : {
"TimeUnit" : [ 1576126800000 ],
"metaData" : {
"errors" : [ ],
"notices" : [ "query served by:1"]
},
"stats" : {
"data" : [ {
"identifier" : {
"names" : [ "apiproxy", "response_status_code", "target_response_code", "target_ip" ],
"values" : [ "IO", "502", "502", "7.1.143.6" ]
},
"metric" : [ {
"env" : "dev",
"name" : "sum(message_count)",
"values" : [ 0.0]
} ]
} ]
} } }
My objective is to display a mapping of the identifier names to their values, like:
apiproxy=IO
response_status_code=502
target_response_code=502
target_ip=7.1.143.6
I have been able to parse both names and values with
.[].stats.data[] | (.identifier.names[]) and .[].stats.data[] | (.identifier.values[])
but I need help with the jq way to map the values.
The whole thing can be done in jq using the -r command-line option:
.[].stats.data[]
| [.identifier.names, .identifier.values]
| transpose[]
| "\(.[0])=\(.[1])"
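For readers more at home in Python, the same pairing can be sketched with zip, which plays the role of transpose here. This is only an illustration on a trimmed copy of the sample document; the variable names are assumptions. One difference to be aware of: jq's transpose pads the shorter array with null, while zip stops at the shorter list.

```python
# Trimmed copy of the sample document from the question
doc = {
    "Response": {
        "stats": {
            "data": [
                {
                    "identifier": {
                        "names": ["apiproxy", "response_status_code",
                                  "target_response_code", "target_ip"],
                        "values": ["IO", "502", "502", "7.1.143.6"],
                    }
                }
            ]
        }
    }
}

lines = []
for top in doc.values():                    # corresponds to .[]
    for entry in top["stats"]["data"]:      # corresponds to .stats.data[]
        ident = entry["identifier"]
        # zip pairs the i-th name with the i-th value
        for name, value in zip(ident["names"], ident["values"]):
            lines.append(f"{name}={value}")

print("\n".join(lines))
```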

parsing json with double colons

Beginner with JS here. I'm trying to parse a JSON string with Node 6. The interesting bit of JSON goes like this:
{
"Metadata" : {
"AWS::CloudFormation::Interface" : {
"ParameterGroups" : [
{
"Label" : {
"default": "Group1"
},
"Parameters" : [
"One",
"Two"
]
},
{
"Label" : {
"default": "Group2"
},
"Parameters" : [
"Three"
]
}
]
}
}
}
I'm trying to list all Parameters (One, Two, Three), but I cannot get through "AWS::CloudFormation::Interface". Accessing AWS::CloudFormation::Interface.ParameterGroups fails, and trying to walk the AWS::CloudFormation::Interface subtree with
for ( a in Metadata ) {
for ( b in a ) {}
}
gets me an array of single characters.
Thanks.
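Two things seem to be going on in the attempt above. A key containing colons is still an ordinary string, but it cannot be written as a dot-access identifier; in JavaScript it would be looked up with bracket notation, for example Metadata["AWS::CloudFormation::Interface"]. Also, the loops iterate over key names rather than values, so the inner loop walks the characters of a key. The lookup-by-string-key idea is sketched below in Python, where dictionaries behave the same way; the variable names template, interface and parameters are assumptions for illustration.

```python
import json

# Compact copy of the JSON from the question
template = json.loads("""
{ "Metadata": { "AWS::CloudFormation::Interface": { "ParameterGroups": [
  {"Label": {"default": "Group1"}, "Parameters": ["One", "Two"]},
  {"Label": {"default": "Group2"}, "Parameters": ["Three"]}
]}}}
""")

# The key is a plain string; index with it instead of using dot-style access
interface = template["Metadata"]["AWS::CloudFormation::Interface"]

# Walk the values (the group objects), not the key names
parameters = [
    name
    for group in interface["ParameterGroups"]
    for name in group["Parameters"]
]
print(parameters)
```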

Get the nth item of JSON array in MongoDB

Using MongoDB, how do you get the date and the 3rd "obs" from each "val" array below?
{ "data" : [
{ "val" : [
{ "obs" : "2/3/2016"
},
{ "obs" : 41.8599992990494
},
{ "obs" : 41.3111999630928
},
{ "obs" : 5.048
}
]
},
{ "val" : [
{ "obs" : "2/4/2016"
},
{ "obs" : 39.394998550415
},
{ "obs" : 41.8486998975277
},
{ "obs" : NumberInt(0)
}
]
},
{ "val" : [
{ "obs" : "2/5/2016"
},
{ "obs" : NumberInt(0)
},
{ "obs" : 40.2090013027191
},
{ "obs" : 24.2410004138947
},
{ "obs" : 3.629
}
]
}
]
}
Started with this:
db.myColl.find({},{"_id":0, "data.val.obs": 1, })
would like:
["2/3/2016", 41.3111], ["2/4/2016", 41.8486]
Here is how you could do this in MongoDB, starting from version 3.4:
db.getCollection('test').aggregate([
{
$addFields: {
data: {
$map: {
input: "$data",
as: "item",
in: {$concatArrays: [{$slice: ['$$item.val', 1]}, {$slice: ['$$item.val', 2, 1]}]}
}
}
}
}
]);
I'm using $addFields so that the other properties of the root document are not lost (you might need them). If you don't need them, you can switch to $project.
Example: collection records look like this: {_id: ..., data: [...], data_2: [...]}.
If you run the query as is, the 'data' array is filtered but data_2 is left unchanged. If you replace $addFields with $project, you lose data_2 (unless you explicitly tell MongoDB to keep it by passing data_2: true).
Then I map each element of the 'data' array and assign the result back to 'data', so the data property is in effect overwritten by the filtered array.
To get the 1st and 3rd elements I use $slice (each $slice returns an array containing one document), and then I join them into a single array with $concatArrays.
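To make the $slice and $concatArrays step concrete, here is a rough Python sketch of the same selection on an in-memory copy of the first two sample entries. It only mirrors the logic and does not talk to MongoDB; the helper name first_and_third is an assumption.

```python
doc = {
    "data": [
        {"val": [{"obs": "2/3/2016"}, {"obs": 41.8599992990494},
                 {"obs": 41.3111999630928}, {"obs": 5.048}]},
        {"val": [{"obs": "2/4/2016"}, {"obs": 39.394998550415},
                 {"obs": 41.8486998975277}, {"obs": 0}]},
    ]
}


def first_and_third(values):
    # values[:1] mirrors {$slice: [arr, 1]}: the first element, as a list
    # values[2:3] mirrors {$slice: [arr, 2, 1]}: one element starting at index 2
    # List concatenation mirrors $concatArrays
    return values[:1] + values[2:3]


# Mirrors $map over "$data"
projected = [first_and_third(item["val"]) for item in doc["data"]]

# Optional extra step: unwrap the "obs" values to get the shape asked for
pairs = [[entry["obs"] for entry in row] for row in projected]
print(pairs)
```

Note that the pipeline itself returns the small {"obs": ...} documents, as in projected; the unwrapping into bare values is an extra step added here for readability.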

Retrieve item list by checking multiple attribute values in MongoDB in golang

This question is about MongoDB: how do I retrieve items that match any of several values? It is like the IN condition in MySQL:
SELECT * FROM venuelist WHERE venueid IN (venueid1, venueid2)
I have attached the JSON data structure that I use [Ref: JSON STRUCTURE OF MONGODB].
As an example, it has a venueList, and each entry in the venue list has several attributes: a venue id, and a sum of user agents with the name and the total count as value. User agent means the user's OS, browser, and device information. In this case I used the OS distribution, so I counted the linux and ubuntu users for a particular venue id.
It looks like this:
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 4
}
],
Finally, I want to get the total count of linux users for a selected list of venue ids in one find query in MongoDB.
For example, I want to sum the count of linux users where the venue id is VID1212 or VID4343.
Ref: JSON STRUCTURE OF MONGODB
{
"_id" : ObjectId("57f940c4932a00aba387b0b0"),
"tenantID" : 1,
"date" : "2016-10-09 00:23:56",
"venueList" : [
{
"id" : "VID1212",
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 4
}
],
"ssidList" : [ // this is the list of SSIDs in the venue
{
"id" : "SSID1212",
"sum" : [
{
"name" : "linux",
"value" : 8
},
{
"name" : "ubuntu",
"value" : 6
}
],
"macList" : [ // this is the MAC list inside a particular SSID, e.g. the MAC list inside SSID1212
{
"id" : "12:12:12:12:12:12",
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 1
}
]
}
]
}
]
},
{
"id" : "VID4343",
"sum" : [
{
"name" : "linux",
"value" : 2
}
],
"ssidList" : [
{
"id" : "SSID4343",
"sum" : [
{
"name" : "linux",
"value" : 2
}
],
"macList" : [
{
"id" : "43:43:43:43:43:34",
"sum" : [
{
"name" : "linux",
"value" : 2
}
]
}
]
}
]
}
]
}
I am using Go to manipulate the data in MongoDB with the mgo.v2 package.
The expected output is:
output
linux : 12+2 = 14
ubuntu : 4+0 = 4
The inner lists inside venueList should not be considered.
You need to use the aggregation framework, running an aggregation pipeline that first filters the documents in the collection on the venueList ids using the $match operator.
The second stage flattens the venueList and sum subdocument arrays so that the data can be processed further down the pipeline as denormalised entries. The $unwind operator is useful here.
A further filter using $match is necessary after unwinding so that only the venues you want to aggregate pass into the next stage.
The main stage is the $group operator, which aggregates the filtered documents to create the desired sums using the accumulator operator $sum. For the desired result, you need a ternary operator like $cond to create the independent count fields, since it feeds either the value or 0 to the $sum expression depending on the name.
Putting this all together, consider running the following pipeline:
db.collection.aggregate([
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList" },
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList.sum" },
{
"$group": {
"_id": null,
"linux": {
"$sum": {
"$cond": [
{ "$eq": [ "$venueList.sum.name", "linux" ] },
"$venueList.sum.value", 0
]
}
},
"ubuntu": {
"$sum": {
"$cond": [
{ "$eq": [ "$venueList.sum.name", "ubuntu" ] },
"$venueList.sum.value", 0
]
}
}
}
}
])
For usage with mgo, you can convert the above pipeline using the guidance in http://godoc.org/labix.org/v2/mgo#Collection.Pipe
For a more flexible alternative that should also perform better than the above, and that handles names in the sum list that are not known in advance, run the following pipeline:
db.collection.aggregate([
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList" },
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList.sum" },
{
"$group": {
"_id": "$venueList.sum.name",
"count": { "$sum": "$venueList.sum.value" }
}
},
{
"$group": {
"_id": null,
"counts": {
"$push": {
"name": "$_id",
"count": "$count"
}
}
}
}
])
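To check what the unwind, match and group stages should produce, the same logic can be traced in plain Python on an in-memory copy of the sample document. This is only a sketch of the pipeline's semantics, not mgo or MongoDB code; the venue VID9999 is a hypothetical extra entry added to show that non-matching venues are skipped, and the inner ssidList data is left out because only the venue-level sums are wanted.

```python
from collections import defaultdict

documents = [
    {
        "tenantID": 1,
        "venueList": [
            {"id": "VID1212", "sum": [{"name": "linux", "value": 12},
                                      {"name": "ubuntu", "value": 4}]},
            {"id": "VID4343", "sum": [{"name": "linux", "value": 2}]},
            # Hypothetical venue, not in the question; it must be ignored
            {"id": "VID9999", "sum": [{"name": "linux", "value": 100}]},
        ],
    }
]

wanted = {"VID1212", "VID4343"}
counts = defaultdict(int)

for doc in documents:
    for venue in doc["venueList"]:           # $unwind "$venueList"
        if venue["id"] not in wanted:        # second $match on venueList.id
            continue
        for entry in venue["sum"]:           # $unwind "$venueList.sum"
            # $group by name, accumulating with $sum
            counts[entry["name"]] += entry["value"]

print(dict(counts))
```

Grouping by name, as in the second pipeline, means any new name that shows up in a sum list gets its own total without changing the code.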

Filtering JSONPath with given string value

If I have a JSON like so:
{
"data": [
{
"service" : { "id" : 1 }
},
{
"service" : { "id" : 2 }
},
{
"service" : {}
}
]
}
This query works:
$..service[?(@.id==2)]
And gives expected result:
[
{
"id" : 2
}
]
However, if I had strings as ids:
{
"data": [
{
"service" : { "id" : "a" }
},
{
"service" : { "id" : "b" }
},
{
"service" : {}
}
]
}
Running a similar query:
$..service[?(@.id == "a")]
Gives no results (empty array).
I am using this evaluator.
I was looking at docs here but could not find anything to point me in the right direction... Any help if someone knows how to write such query? Thanks :)
It works without the quotes:
$..service[?(@.id == b)]
which gives this result:
[
{
"id" : "b"
}
]
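Quoting rules differ between JSONPath evaluators, so the unquoted form may not carry over to other tools. As a cross-check of what the filter is meant to select, here is a small sketch in plain Python that does not use any JSONPath library; the helper name find_services is an assumption.

```python
doc = {
    "data": [
        {"service": {"id": "a"}},
        {"service": {"id": "b"}},
        {"service": {}},
    ]
}


def find_services(node):
    """Yield every value stored under a "service" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "service":
                yield value
            yield from find_services(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_services(item)


# Keep only service objects whose id equals the wanted string;
# .get returns None for the empty service object instead of raising
matches = [s for s in find_services(doc)
           if isinstance(s, dict) and s.get("id") == "b"]
print(matches)
```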