MongoDB aggregate and count json paths - json

I have a MongoDB Collection which contains data elements like this:
{
"_id" : "9878jr23geg",
"element" : {
"name" : "element7",
"Set" : [
{
"SubListA" : [
{
"name" : "AlbertEinstein",
"value" : "45"
},
{
"name" : "JohnDoe",
"value" : "34"
}
]
},
{
"MoreNames" : [
{
"name" : "TimMcGraw",
"value" : "39"
}
]
}
]
}
}
{
"_id" : "275678hfvd",
"element" : {
"name" : "element8",
"Set" : [
{
"SubListA" : [
{
"name" : "AlbertEinstein",
"value" : "45"
},
{
"name" : "JimmyKimmel",
"value" : "41"
}
]
}
]
}
}
I'm trying to count the occurrences of each unique name, grouped by the element of Set to which it belongs. For example, both documents above contain an object with name: "AlbertEinstein" inside element.Set.SubListA, so I'd expect a result along the lines of:
element.Set.SubListA.AlbertEinstein | 2
Essentially, I'd like a count for each distinct name when the data is grouped by the objects within element.Set.
Ideally, for the example given, I'd like all of:
element.Set.SubListA.AlbertEinstein | 2
element.Set.SubListA.JohnDoe | 1
element.Set.MoreNames.TimMcGraw | 1
element.Set.SubListA.JimmyKimmel | 1
I've tried several aggregate queries but none seems to achieve what I'm trying to do.
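To make the expected result concrete, here is a minimal client-side sketch in Python of the counting described above. It is not an aggregation pipeline; it only illustrates the grouping logic on the two sample documents. The function name count_name_paths is hypothetical.

```python
from collections import Counter

# Sample documents copied from the question.
docs = [
    {"_id": "9878jr23geg", "element": {"name": "element7", "Set": [
        {"SubListA": [{"name": "AlbertEinstein", "value": "45"},
                      {"name": "JohnDoe", "value": "34"}]},
        {"MoreNames": [{"name": "TimMcGraw", "value": "39"}]},
    ]}},
    {"_id": "275678hfvd", "element": {"name": "element8", "Set": [
        {"SubListA": [{"name": "AlbertEinstein", "value": "45"},
                      {"name": "JimmyKimmel", "value": "41"}]},
    ]}},
]


def count_name_paths(documents):
    """Count every name, keyed by the sub-list of element.Set that holds it."""
    counts = Counter()
    for doc in documents:
        for entry in doc["element"]["Set"]:
            # Each entry of Set is an object with one key per named sub-list.
            for list_name, people in entry.items():
                for person in people:
                    counts[f"element.Set.{list_name}.{person['name']}"] += 1
    return counts


counts = count_name_paths(docs)
for path, n in counts.items():
    print(f"{path} | {n}")
```

The sub-list names (SubListA, MoreNames) are dynamic keys, which is why the sketch iterates over the key/value pairs of each Set entry instead of addressing a fixed field.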

Related

JSON Path Combining

I have a JSON file like the one below.
{"trackId":610957461,"countryCode":"TR","deviceType":"IPHONE","date":"2020-10-01","rankings":
[
{"keyword":"boyner","rank":1},
{"keyword":"giyim","rank":1},
{"keyword":"ykm","rank":1},
{"keyword":"colin\\s","rank":1},
{"keyword":"erkek giyim","rank":1},
{"keyword":"boyner kart","rank":1},
{"keyword":"giyim siteleri","rank":1}
]}
When I set the JSON path to $, I see only the trackId, countryCode, deviceType and date columns.
I want the keyword and rank columns in addition to these.
What is the right JSON path for these columns?
Using this expression (with Jayway JsonPath)
$..["rankings"]..["keyword", "rank"]
outputs
[
{
"keyword" : "boyner",
"rank" : 1
},
{
"keyword" : "giyim",
"rank" : 1
},
{
"keyword" : "ykm",
"rank" : 1
},
{
"keyword" : "colin\\s",
"rank" : 1
},
{
"keyword" : "erkek giyim",
"rank" : 1
},
{
"keyword" : "boyner kart",
"rank" : 1
},
{
"keyword" : "giyim siteleri",
"rank" : 1
}
]
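The expression above returns only the ranking objects, without the parent fields. If the goal is one row per ranking that also carries trackId, countryCode, deviceType and date, a post-processing step is one option. The following Python sketch assumes the file has already been parsed into a dict; flatten_rankings is a hypothetical helper name and only three of the rankings are included.

```python
# Parsed form of the JSON file from the question (three rankings shown).
record = {
    "trackId": 610957461,
    "countryCode": "TR",
    "deviceType": "IPHONE",
    "date": "2020-10-01",
    "rankings": [
        {"keyword": "boyner", "rank": 1},
        {"keyword": "giyim", "rank": 1},
        {"keyword": "ykm", "rank": 1},
    ],
}


def flatten_rankings(rec):
    """Return one flat row per ranking, repeating the parent fields in each row."""
    parent = {key: value for key, value in rec.items() if key != "rankings"}
    return [{**parent, **ranking} for ranking in rec["rankings"]]


rows = flatten_rankings(record)
for row in rows:
    print(row)
```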

jq : mapping array in object to another object while keeping parent keys and adding new ones

I would like to map the following structure
{
"id" : "OUTER_ID",
"name" : "OUTER_NAME",
"items" : [
{
"id" : "INNER_ID_1",
"name" : "INNER_NAME_1"
},
{
"id" : "INNER_ID_2",
"name" : "INNER_NAME_2"
}
]
}
into this
{
"payload": [
{
"key" : "INNER_NAME_1_KEY",
"data" : {
"id" : "OUTER_ID",
"name" : "OUTER_NAME",
"items" : [
{
"id" : "INNER_ID_1",
"name" : "INNER_NAME_1"
}
]
}
},
{
"key" : "INNER_NAME_2_KEY",
"data" : {
"id" : "OUTER_ID",
"name" : "OUTER_NAME",
"items" : [
{
"id" : "INNER_ID_2",
"name" : "INNER_NAME_2"
}
]
}
}
]
}
So, for each item in the initial items array, I want to create an entry in the output's payload, i.e. I want to map items[i] to payload[i].data.items. The output also needs the payload, key and data keys, with payload[i].data.id and payload[i].data.name set to the input's outer id and name.
Can this be done with jq?
Sure, you can use the following filter:
.id as $id | .name as $name | {payload : [ .items[] | {key:.id, data:{id:$id, name: $name, items:[.]}} ] }
You can try it here.
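For readers less familiar with jq, the same mapping can be sketched in Python. This mirrors the filter above, so key is taken from the inner id exactly as the filter does; if the key should instead be derived from the inner name, that line would need to change. The function name to_payload is hypothetical.

```python
source = {
    "id": "OUTER_ID",
    "name": "OUTER_NAME",
    "items": [
        {"id": "INNER_ID_1", "name": "INNER_NAME_1"},
        {"id": "INNER_ID_2", "name": "INNER_NAME_2"},
    ],
}


def to_payload(outer):
    """Build one payload entry per inner item, copying the outer id and name."""
    return {
        "payload": [
            {
                "key": item["id"],  # same choice as key:.id in the jq filter
                "data": {
                    "id": outer["id"],
                    "name": outer["name"],
                    "items": [item],  # each entry keeps only its own item
                },
            }
            for item in outer["items"]
        ]
    }


result = to_payload(source)
print(result)
```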

Sort / filter multiple objects in JQ by date

I'm trying to use JQ to find the most recent artifact in a Nexus API query. Right now, my JSON output looks something like:
{
"items" : [ {
"downloadUrl" : "https://nexus.ama.org/repository/Snapshots/org/sso/browser-manager/1.0-SNAPSHOT/browser-manager-1.0-20180703.144121-1.jar",
"path" : "org/sso/browser-manager/1.0-SNAPSHOT/browser-manager-1.0-20180703.144121-1.jar",
"id" : "V0FEQS1TbmFwc2hvdHM6MzhjZDQ3NTQwMTBkNGJhOTY1N2JiOTEyMTM1ZGRjZWQ",
"repository" : "Snapshots",
"format" : "maven2",
"checksum" : {
"sha1" : "7ac324905fb1ff15ef6020f256fcb5c9f54113ca",
"md5" : "bb25c483a183001dfdc58c07a71a98ed"
}
}, {
"downloadUrl" : "https://nexus.ama.org/repository/Snapshots/org/sso/browser-manager/1.0-SNAPSHOT/browser-manager-1.0-20180703.204941-2.jar",
"path" : "org/sso/browser-manager/1.0-SNAPSHOT/browser-manager-1.0-20180703.204941-2.jar",
"id" : "V0FEQS1TbmFwc2hvdHM6MzhjZDQ3NTQwMTBkNGJhOWM4YjQ0NmRjYzFkODkxM2U",
"repository" : "Snapshots",
"format" : "maven2",
"checksum" : {
"sha1" : "b4ba2049ea828391c720f49b6668a66a8b0bca9c",
"md5" : "6757c55c0e6d933dc90e398204cca966"
}
} ],
"continuationToken" : null
}
I've managed to use JQ to repackage the data as:
.items[] | { "id" : .id, "date" : (.path | scan("[0-9]{8}\\.[0-9-]*")) }
output:
{
"id": "V0FEQS1TbmFwc2hvdHM6MzhjZDQ3NTQwMTBkNGJhOTY1N2JiOTEyMTM1ZGRjZWQ",
"date": "20180703.144121-1"
}
{
"id": "V0FEQS1TbmFwc2hvdHM6MzhjZDQ3NTQwMTBkNGJhOWM4YjQ0NmRjYzFkODkxM2U",
"date": "20180703.204941-2"
}
Now I'm a little stuck trying to figure out which of the two JSON objects is the most recent. How can I sort by date and extract the id for that object?
Is there a better way to filter/sort this data? My example has only 2 items[] in the JSON response, but there may be a larger number of them.
The filter sort_by/1 will sort your timestamps in chronological order, but it requires an array as input, so you could write:
.items
| map({ "id" : .id, "date" : (.path | scan("[0-9]{8}\\.[0-9-]*")) })
| sort_by(.date)
| .[-1]
The trailing .[-1] selects the last item, so with your input the result would be:
{
"id": "V0FEQS1TbmFwc2hvdHM6MzhjZDQ3NTQwMTBkNGJhOWM4YjQ0NmRjYzFkODkxM2U",
"date": "20180703.204941-2"
}
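The same idea can be sketched outside jq, for example in Python. The sketch below assumes the response has already been parsed and keeps only the id and path fields of each item. Sorting the extracted stamps as plain strings works here because the date and time parts have a fixed width; the trailing build number is compared as text, which is an assumption that may not hold once it reaches two digits. The names STAMP and latest_artifact are hypothetical.

```python
import re

# Same pattern as in the jq filter: 8 digits, a dot, then digits and dashes.
STAMP = re.compile(r"[0-9]{8}\.[0-9-]*")

items = [
    {"id": "V0FEQS1TbmFwc2hvdHM6MzhjZDQ3NTQwMTBkNGJhOTY1N2JiOTEyMTM1ZGRjZWQ",
     "path": "org/sso/browser-manager/1.0-SNAPSHOT/browser-manager-1.0-20180703.144121-1.jar"},
    {"id": "V0FEQS1TbmFwc2hvdHM6MzhjZDQ3NTQwMTBkNGJhOWM4YjQ0NmRjYzFkODkxM2U",
     "path": "org/sso/browser-manager/1.0-SNAPSHOT/browser-manager-1.0-20180703.204941-2.jar"},
]


def latest_artifact(entries):
    """Extract the timestamp from each path and return the newest entry."""
    stamped = [
        {"id": entry["id"], "date": STAMP.search(entry["path"]).group(0)}
        for entry in entries
    ]
    # Equivalent of sort_by(.date) | .[-1] in the jq filter.
    return sorted(stamped, key=lambda entry: entry["date"])[-1]


newest = latest_artifact(items)
print(newest["id"], newest["date"])
```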

Retrieve item list by checking multiple attribute values in MongoDB in golang

This question is about MongoDB: how can I retrieve items that match any of several values for an attribute? It is like the IN condition in MySQL:
SELECT * FROM venuelist WHERE venueid IN (venueid1, venueid2)
I have attached the JSON data structure that I use (see "Ref: JSON structure of the MongoDB document" below).
Each document has a venueList. Each venue in the list has a venue id and a sum array, whose entries hold a user-agent name and the total count as the value. User agents here mean the user's OS, browser and device information; in this case I used the OS distribution, so I counted the linux and ubuntu users for each venue id.
It looks like this:
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 4
}
],
Finally, I want to get the total count of linux users for a list of venue ids in a single find query in MongoDB.
For example, I want the total count of linux users where the venue id is VID1212 or VID4343.
Ref: JSON structure of the MongoDB document
{
"_id" : ObjectId("57f940c4932a00aba387b0b0"),
"tenantID" : 1,
"date" : "2016-10-09 00:23:56",
"venueList" : [
{
"id" : "VID1212",
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 4
}
],
"ssidList" : [ // this is the list of SSIDs in the venue
{
"id" : "SSID1212",
"sum" : [
{
"name" : "linux",
"value" : 8
},
{
"name" : "ubuntu",
"value" : 6
}
],
"macList" : [ // this is the MAC list inside a particular SSID, e.g. inside SSID1212
{
"id" : "12:12:12:12:12:12",
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 1
}
]
}
]
}
]
},
{
"id" : "VID4343",
"sum" : [
{
"name" : "linux",
"value" : 2
}
],
"ssidList" : [
{
"id" : "SSID4343",
"sum" : [
{
"name" : "linux",
"value" : 2
}
],
"macList" : [
{
"id" : "43:43:43:43:43:34",
"sum" : [
{
"name" : "linux",
"value" : 2
}
]
}
]
}
]
}
]
}
I am using Go to manipulate the data in MongoDB, with the mgo.v2 package.
The expected output is:
linux : 12+2 = 14
ubuntu : 4+0 = 4
Don't consider the inner lists (ssidList, macList) inside venueList.
You'd need to use the aggregation framework, running an aggregation pipeline that first filters the documents in the collection on the venueList ids using the $match operator.
The second stage flattens the venueList and sum subdocument arrays so that the data can be processed further down the pipeline as denormalised entries. The $unwind operator is useful here.
A further $match filter is necessary after unwinding venueList, so that only the venues you want to aggregate pass into the next stage.
The main stage is $group, which aggregates the filtered documents to create the desired sums using the accumulator operator $sum. For the desired result, you would need a ternary operator like $cond to create the independent count fields, since it feeds either the entry's value or 0 to the $sum expression depending on the name.
Putting this all together, consider running the following pipeline:
db.collection.aggregate([
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList" },
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList.sum" },
{
"$group": {
"_id": null,
"linux": {
"$sum": {
"$cond": [
{ "$eq": [ "$venueList.sum.name", "linux" ] },
"$venueList.sum.value", 0
]
}
},
"ubuntu": {
"$sum": {
"$cond": [
{ "$eq": [ "$venueList.sum.name", "ubuntu" ] },
"$venueList.sum.value", 0
]
}
}
}
}
])
For usage with mgo, you can convert the above pipeline using the guidance in http://godoc.org/labix.org/v2/mgo#Collection.Pipe
For a more flexible alternative that should also perform better, and that handles names in the sum list that are not known in advance, run the following pipeline:
db.collection.aggregate([
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList" },
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList.sum" },
{
"$group": {
"_id": "$venueList.sum.name",
"count": { "$sum": "$venueList.sum.value" }
}
},
{
"$group": {
"_id": null,
"counts": {
"$push": {
"name": "$_id",
"count": "$count"
}
}
}
}
])
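To check what the stages compute, the pipeline logic can be mimicked in plain Python on the sample document. This is only an illustration of the $match, $unwind and $group steps, not code that talks to MongoDB; the function name os_totals is hypothetical, and the nested ssidList data is trimmed because the pipeline never reads it.

```python
from collections import defaultdict

# Sample document from the question, trimmed to the fields the pipeline reads.
doc = {
    "tenantID": 1,
    "venueList": [
        {"id": "VID1212",
         "sum": [{"name": "linux", "value": 12}, {"name": "ubuntu", "value": 4}],
         "ssidList": [{"id": "SSID1212",
                       "sum": [{"name": "linux", "value": 8},
                               {"name": "ubuntu", "value": 6}]}]},
        {"id": "VID4343",
         "sum": [{"name": "linux", "value": 2}],
         "ssidList": []},
    ],
}


def os_totals(documents, venue_ids):
    """Sum the venue-level counts per OS name for the selected venues."""
    wanted = set(venue_ids)
    totals = defaultdict(int)
    for document in documents:
        for venue in document.get("venueList", []):   # $unwind: "$venueList"
            if venue["id"] not in wanted:              # $match on venueList.id
                continue
            for entry in venue.get("sum", []):         # $unwind: "$venueList.sum"
                totals[entry["name"]] += entry["value"]  # $group with $sum
    return dict(totals)


totals = os_totals([doc], ["VID1212", "VID4343"])
print(totals)
```

Only the venue-level sum arrays are read, so the counts inside ssidList and macList do not affect the totals.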

How to update a nested array value in mongodb?

I want to update an array value that is nested within another array value, i.e. set
status = enabled
where alerts.id = 2
{
"_id" : ObjectId("5496a8ed49847b6cd7c7b350"),
"name" : "joe",
"locations" : [
{
"name": "my location",
"alerts" : [
{
"id" : 1,
"status" : null
},
{
"id" : 2,
"status" : null
}
]
}
]
}
I would have used the positional $ operator, but it cannot be used twice in a statement; multiple positional operators are not supported yet: https://jira.mongodb.org/browse/SERVER-831
How do I issue a statement to only update the status field of an alert matching an id of 2?
UPDATE
If I change the schema as follows:
{
"_id" : ObjectId("5496ab2149847b6cd7c7b352"),
"name" : "joe",
"locations" : {
"my location" : {
"alerts" : [
{
"id" : 1,
"status" : "enabled"
},
{
"id" : 2,
"status" : "enabled"
}
]
},
"my other location" : {
"alerts" : [
{
"id" : 3,
"status" : null
},
{
"id" : 4,
"status" : null
}
]
}
}
}
I can then use:
update({"locations.my location.alerts.id":1},{$set: {"locations.my location.alerts.$.status": "enabled"}});
The problem is that I cannot create indexes on the alert id :-(
It may be better modelled as follows, especially if an index on location and/or alerts.id is needed.
{
"_id" : ObjectId("5496a8ed49847b6cd7c7b350"),
"name" : "joe",
"location" : "myLocation",
"alerts" : [{
"id" : 1,
"status" : null
},
{
"id" : 2,
"status" : null
}
]
}
{
"_id" : ObjectId("5496a8ed49847b6cd7c7b350"),
"name" : "joe",
"location" : "otherLocation",
"alerts" : [{
"id" : 1,
"status" : null
},
{
"id" : 2,
"status" : null
}
]
}
I think you are using the wrong tool for the job. What you have in your example is relational data, which is much easier to handle with a relational database. So I would suggest using an SQL database instead of Mongo.
But if you really want to do it with Mongo, then I guess the only option is to fetch the document, modify it and put it back.
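The "modify it" step of that fetch, modify and put back approach could look roughly like the Python sketch below. It works on an already fetched document held as a dict; the fetch and save calls are left out on purpose, the _id field is omitted from the sample, and set_alert_status is a hypothetical helper name.

```python
import copy

# Document from the question, as it would look after being fetched (_id omitted).
fetched = {
    "name": "joe",
    "locations": [
        {"name": "my location",
         "alerts": [{"id": 1, "status": None}, {"id": 2, "status": None}]},
    ],
}


def set_alert_status(document, alert_id, status):
    """Return a copy of the document with the matching alerts updated."""
    updated = copy.deepcopy(document)  # leave the fetched copy untouched
    for location in updated.get("locations", []):
        for alert in location.get("alerts", []):
            if alert["id"] == alert_id:
                alert["status"] = status
    return updated


modified = set_alert_status(fetched, 2, "enabled")
print(modified["locations"][0]["alerts"])
```

The modified document would then be written back in a separate step. A read-modify-write cycle like this is not atomic, so concurrent writers could overwrite each other's changes.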