I'm having trouble writing a query that returns a triple-nested value from a document. The documents I'm using are structured like this:
{
"areaname": "name1",
"places": [
{
"placename": "place1",
"objects": [
{
"objname": "obj1",
"tags": [
"tag1",
"tag2"
]
},
{
"objname": "obj2",
"tags": [
"tag6",
"tag7"
]
}
]
},
{
"placename": "place2",
"objects": [
{
"objname": "obj45",
"tags": [
"tag46",
"tag34"
]
},
{
"objname": "obj77",
"tags": [
"tag56",
"tag11"
]
}
]
}
]
}
It's actually quite simple, but I can't find a solution for a query like:
"return the objname of the object whose tags contain tag1"
So for the given document, if I use "tag1" as a parameter, the query should return "obj1".
It should give me the same result if I use "tag2" as a parameter.
Another example: using "tag56", it should return only "obj77".
Right now I have no problem returning the whole document, or a top-level field such as areaname, using dot notation:
db.users.find( {"places.objects.tags":"tag1"}, { areaname: 1, _id:0 } )
Is this even possible?
Keeping it simple:
[
{
"$match" : {
"places.objects.tags" : "tag1"
}
},
{
"$unwind" : "$places"
},
{
"$unwind" : "$places.objects"
},
{
"$match" : {
"places.objects.tags" : "tag1"
}
},
{
"$group" : {
"_id" : "$_id",
"obj_names" : {
"$push" : "$places.objects.objname"
}
}
}
]
Add any other fields you want to keep to the $group stage.
This can also be done without the double $unwind, but I chose this form for readability.
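To see what the pipeline above returns without a running database, here is a plain-JavaScript sketch that mimics the same steps ($unwind places, $unwind places.objects, $match on the tag, $group with $push) over an in-memory copy of the sample document. The function name objNamesWithTag and the _id value are illustrative assumptions, not part of MongoDB's API.

```javascript
// In-memory copy of the sample document. The _id is hypothetical;
// MongoDB would normally generate an ObjectId.
const areaDoc = {
  _id: "area1",
  areaname: "name1",
  places: [
    {
      placename: "place1",
      objects: [
        { objname: "obj1", tags: ["tag1", "tag2"] },
        { objname: "obj2", tags: ["tag6", "tag7"] },
      ],
    },
    {
      placename: "place2",
      objects: [
        { objname: "obj45", tags: ["tag46", "tag34"] },
        { objname: "obj77", tags: ["tag56", "tag11"] },
      ],
    },
  ],
};

// Mirrors the pipeline: unwind places, unwind places.objects,
// keep objects whose tags contain `tag`, then push each objname per _id.
function objNamesWithTag(docs, tag) {
  const groups = new Map(); // _id -> obj_names, like the $group stage
  for (const doc of docs) {
    for (const place of doc.places) {          // $unwind: "$places"
      for (const obj of place.objects) {       // $unwind: "$places.objects"
        if (!obj.tags.includes(tag)) continue; // $match on the tag
        if (!groups.has(doc._id)) groups.set(doc._id, []);
        groups.get(doc._id).push(obj.objname); // $push objname
      }
    }
  }
  return [...groups].map(([id, names]) => ({ _id: id, obj_names: names }));
}

console.log(objNamesWithTag([areaDoc], "tag56"));
// → [ { _id: 'area1', obj_names: [ 'obj77' ] } ]
```

Documents without any matching object produce no group at all, which is the same effect the first $match stage has in the real pipeline.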
I have this JSON:
{
"srv_config": [{
"name": "db1",
"servers": ["srv1", "srv2"],
"prop": [{"source":"aa"},"destination":"bb"},{"source":"cc"},"destination":"cc"},]
}, {
"name": "db2",
"servers": ["srv2", "srv2"],
"prop": [{"source":"dd"},"destination":"dd"},{"source":"ee"},"destination":"ee"},]
}
]
}
I'm trying to build a JMESPath expression that selects the prop array of each object in the main array, but only when a given string exists in its servers list.
To select all props, I can do:
*.props [*]
But how do I add condition that says "select only if srv1 is in servers list"?
You can use the contains function to filter based on an array containing something.
Given the query:
*[?contains(servers, `srv1`)].prop | [][]
This gives us:
[
{
"source": "aa",
"destination": "bb"
},
{
"source": "cc",
"destination": "cc"
}
]
Note that I'm also flattening the result here (the | [][] part).
All of this was run against a corrected version of your JSON:
{
"srv_config":[
{
"name":"db1",
"servers":[
"srv1",
"srv2"
],
"prop":[
{
"source":"aa",
"destination":"bb"
},
{
"source":"cc",
"destination":"cc"
}
]
},
{
"name":"db2",
"servers":[
"srv2",
"srv2"
],
"prop":[
{
"source":"dd",
"destination":"dd"
},
{
"source":"ee",
"destination":"ee"
}
]
}
]
}
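If you need the same filter outside JMESPath, the logic is easy to reproduce. Here is a sketch in plain JavaScript over the corrected JSON: keep the srv_config entries whose servers list contains the wanted server, then flatten their prop arrays (the job of `| [][]` in the expression). The function name propsForServer is hypothetical.

```javascript
// Corrected sample input (same data as the JSON above).
const config = {
  srv_config: [
    {
      name: "db1",
      servers: ["srv1", "srv2"],
      prop: [{ source: "aa", destination: "bb" }, { source: "cc", destination: "cc" }],
    },
    {
      name: "db2",
      servers: ["srv2", "srv2"],
      prop: [{ source: "dd", destination: "dd" }, { source: "ee", destination: "ee" }],
    },
  ],
};

// Mirrors *[?contains(servers, `srv1`)].prop | [][] for this input:
// filter entries by membership in servers, then flatten their prop lists.
function propsForServer(cfg, server) {
  return cfg.srv_config
    .filter((entry) => entry.servers.includes(server)) // contains(servers, ...)
    .flatMap((entry) => entry.prop);                    // flatten one level
}

console.log(propsForServer(config, "srv1"));
```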
I'm building a standard free-text search for a site that sells cars.
In the search box, the user can enter a search term that is passed to the query, where it is used to match both nested and non-nested properties.
I'm using inner_hits to limit the number of variants returned by the query (in this sample, variants is not removed from _source).
When matching on the nested property color, the inner_hits collection contains the correct variant, as expected.
However, when matching on the non-nested property title, the inner_hits collection is empty. I understand why it's empty.
Can you suggest a better way to structure the query?
Another option would be to always return at least one variant, but how can that be achieved?
Mappings
PUT test
{
"mappings": {
"car": {
"properties": {
"variants": {
"type": "nested"
}
}
}
}
}
Insert data
PUT test/car/1
{
"title": "VW Golf",
"variants": [
{
"color": "red",
"forsale": true
},
{
"color": "blue",
"forsale": false
}
]
}
Query by color
GET test/_search
{
"query": {
"nested": {
"path": "variants",
"query": {
"match": {
"variants.color": "blue"
}
},
"inner_hits": {}
}
}
}
Color query: works as expected!
"hits" : [
{
"_source" : {
"title" : "VW Golf",
"variants" : [
{
"color" : "red",
"forsale" : true
},
{
"color" : "blue",
"forsale" : false
}
]
},
"inner_hits" : {
"variants" : {
"hits" : {
"total" : 1,
"hits" : [
{
"_nested" : {
"field" : "variants",
"offset" : 1
},
"_source" : {
"color" : "blue",
"forsale" : false
}
}
]
}
}
}
}
]
Query by brand
GET test/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"title": "golf"
}
},
{
"nested": {
"path": "variants",
"query": {
"match": {
"variants.color": "golf"
}
},
"inner_hits": {}
}
}
]
}
}
}
Brand query result :-(
"hits" : [
{
"_source" : {
"title" : "VW Golf",
"variants" : [
{
"color" : "red",
"forsale" : true
},
{
"color" : "blue",
"forsale" : false
}
]
},
"inner_hits" : {
"variants" : {
"hits" : {
"total" : 0,
"hits" : [ ]
}
}
}
}
]
As you already know, inner_hits returns an empty array because no nested document matched the nested query.
A simple solution is to change the query so that the nested query always matches. This can be done by wrapping the nested query in a bool query and adding a match_all query.
If you set the boost of the match_all query to 0, it will not contribute to the score. Consequently, if a nested document matches the search term, it will come first in inner_hits.
Now inner_hits will not be empty, but there is a second problem: every document will match. You can either:
set a min_score with a very small value (e.g., 0.00000001) to discard documents with a score of 0, or
duplicate the original nested query and use a minimum_should_match of 2.
{
"query": {
"bool": {
// Ensure that at least 1 of the first 2 queries will match
// The third query will always match
"minimum_should_match": 2,
"should": [
{
"match": {
"title": <SEARCH_TERM>
}
},
{
"nested": {
"path": "variants",
"query": {
"match": {
"variants.color": <SEARCH_TERM>
}
}
}
},
{
"nested": {
"path": "variants",
"query": {
"bool": {
"should": [
{
"match": {
"variants.color": <SEARCH_TERM>
}
},
{
// Disable scoring
"match_all": { "boost": 0 }
}
]
}
},
"inner_hits": {}
}
}
]
}
}
}
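The query above implements the second option. The first option (min_score) can be sketched as a small JavaScript helper that builds the request body for a given search term; the helper name buildCarSearch and the 1e-8 threshold are illustrative assumptions (the threshold only needs to be above 0). Only one nested clause is needed: match_all with boost 0 keeps inner_hits populated, and min_score drops documents whose total score is 0, i.e. those matching neither title nor color.

```javascript
// Hypothetical helper: builds the body for GET test/_search (min_score variant).
function buildCarSearch(term) {
  return {
    min_score: 1e-8, // discard documents that only matched via match_all (score 0)
    query: {
      bool: {
        should: [
          { match: { title: term } },
          {
            nested: {
              path: "variants",
              query: {
                bool: {
                  should: [
                    { match: { "variants.color": term } },
                    { match_all: { boost: 0 } }, // always matches, adds no score
                  ],
                },
              },
              // Never empty now; variants matching the color score higher.
              inner_hits: {},
            },
          },
        ],
      },
    },
  };
}

console.log(JSON.stringify(buildCarSearch("golf"), null, 2));
```

Compared with duplicating the nested query, this keeps the body shorter, at the cost of relying on a score threshold.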
One way to do it is using a script_fields clause.
You would write a little script in painless that would do the following:
store the List you get from variants in a variable,
iterate over the Maps in this List,
if a Map has color blue, return that Map (if none evaluates to true, return an empty Map).
This creates an additional field per search result containing only those variants whose color is blue.
One important drawback is that this is a very heavy operation, especially if you have many records.
You can take this approach if it is something only you will ever do, maybe a few times a year outside peak hours. If your use case involves regular use by many users, I would change the mapping, return variants as a whole, or choose some other solution.
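A sketch of the request this approach might send, built in JavaScript and assuming Elasticsearch 6 or later (where the script body key is source). The Painless script reads the stored document through params['_source'] and collects every variant whose color equals a color parameter; unlike the description above, it returns an empty list rather than an empty Map when nothing matches. The field name matching_variants and the helper name are hypothetical.

```javascript
// Painless script: keep only the variants whose color matches params.color.
const PAINLESS_FILTER_VARIANTS = `
  def out = new ArrayList();
  for (def v : params['_source']['variants']) {
    if (v['color'] == params.color) {
      out.add(v);
    }
  }
  return out;
`;

// Hypothetical helper: builds a search body that adds a computed field.
function buildVariantFilterSearch(query, color) {
  return {
    query,
    _source: true, // explicitly keep the normal source next to the script field
    script_fields: {
      matching_variants: {
        script: {
          lang: "painless",
          source: PAINLESS_FILTER_VARIANTS,
          params: { color }, // passed as a parameter so the script can be cached
        },
      },
    },
  };
}

const body = buildVariantFilterSearch({ match: { title: "golf" } }, "blue");
console.log(JSON.stringify(body, null, 2));
```

Because the script loads _source for every hit, the cost grows with the number of results, which is the drawback mentioned above.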
Here is the sample JSON:
[
{
"_id": "123456789",
"YEAR": "2019",
"VERSION": "2019.Version",
"QUESTION_GROUPS": [
{
"QUESTIONS": [
{
"QUESTION_NAME": "STATE_CODE",
"QUESTION_VALUE": "MH"
},
{
"QUESTION_NAME": "COUNTY_NAME",
"QUESTION_VALUE": "IN"
}
]
},
{
"QUESTIONS": [
{
"QUESTION_NAME": "STATE_CODE",
"QUESTION_VALUE": "UP"
},
{
"QUESTION_NAME": "COUNTY_NAME",
"QUESTION_VALUE": "IN"
}
]
}
]
}
]
The query I am using:
db.collection.find({},
{
"QUESTION_GROUPS.QUESTIONS.QUESTION_NAME": "STATE_CODE"
})
My requirement is to retrieve every QUESTION_VALUE whose QUESTION_NAME equals STATE_CODE.
Thanks in advance.
If I understand you correctly, what you are trying to do is something like:
db.collection.find(
{
"QUESTION_GROUPS.QUESTIONS.QUESTION_NAME": "STATE_CODE"
},
{
"QUESTION_GROUPS.QUESTIONS.QUESTION_VALUE": 1
})
Attention: you will get ALL the QUESTION_VALUE entries of ANY document that has a QUESTION_GROUPS.QUESTIONS.QUESTION_NAME with that value, not only the matching ones.
Attention 2: you will also get the _id field, since it is included by default (you can exclude it with _id: 0 in the projection).
To avoid these issues, you may need to use the aggregation framework and $unwind "QUESTION_GROUPS", then "QUESTIONS". This way you can drop both the irrelevant results and the _id field.
It sounds like you want to unwind the arrays and get back only the question values.
Try this:
db.collection.aggregate([
{
$unwind: "$QUESTION_GROUPS"
},
{
$unwind: "$QUESTION_GROUPS.QUESTIONS"
},
{
$match: {
"QUESTION_GROUPS.QUESTIONS.QUESTION_NAME": "STATE_CODE"
}
},
{
$project: {
"QUESTION_GROUPS.QUESTIONS.QUESTION_VALUE": 1
}
}
])
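The effect of the two $unwind stages followed by $match can be sketched in plain JavaScript over the sample document: flatten QUESTION_GROUPS, then QUESTIONS, keep the entries with the requested name and read their values. The function name questionValues is hypothetical.

```javascript
// In-memory copy of the sample document.
const surveys = [
  {
    _id: "123456789",
    YEAR: "2019",
    VERSION: "2019.Version",
    QUESTION_GROUPS: [
      {
        QUESTIONS: [
          { QUESTION_NAME: "STATE_CODE", QUESTION_VALUE: "MH" },
          { QUESTION_NAME: "COUNTY_NAME", QUESTION_VALUE: "IN" },
        ],
      },
      {
        QUESTIONS: [
          { QUESTION_NAME: "STATE_CODE", QUESTION_VALUE: "UP" },
          { QUESTION_NAME: "COUNTY_NAME", QUESTION_VALUE: "IN" },
        ],
      },
    ],
  },
];

function questionValues(docs, questionName) {
  return docs
    .flatMap((doc) => doc.QUESTION_GROUPS)           // $unwind QUESTION_GROUPS
    .flatMap((group) => group.QUESTIONS)             // $unwind QUESTIONS
    .filter((q) => q.QUESTION_NAME === questionName) // $match on the name
    .map((q) => q.QUESTION_VALUE);                   // $project the value
}

console.log(questionValues(surveys, "STATE_CODE")); // → [ 'MH', 'UP' ]
```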
This question is about MongoDB: how do I retrieve selected items by matching multiple values, similar to the IN condition in MySQL?
SELECT * FROM venuelist WHERE venueid IN (venueid1, venueid2)
I have attached the JSON data structure I used [Ref: JSON STRUCTURE OF MONGODB].
As an example, it has a venueList; each entry in the venue list has a venue id and a sum attribute that holds user-agent names with their total counts as values. User agents here means the users' OS, browser and device information. In this case I used the OS distribution: I counted the linux and ubuntu users for a particular venueid.
It looks like this:
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 4
}
],
Ultimately, I want to get the total count of linux users for a list of venueids in one query in MongoDB.
For example, I want the total count of linux users where the venue id is VID1212 or VID4343.
Ref: JSON STRUCTURE OF MONGODB
{
"_id" : ObjectId("57f940c4932a00aba387b0b0"),
"tenantID" : 1,
"date" : "2016-10-09 00:23:56",
"venueList" : [
{
"id" : "VID1212",
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 4
}
],
"ssidList" : [ // list of SSIDs in the venue
{
"id" : "SSID1212",
"sum" : [
{
"name" : "linux",
"value" : 8
},
{
"name" : "ubuntu",
"value" : 6
}
],
"macList" : [ // MAC list inside a particular SSID, e.g. the MAC list inside SSID1212
{
"id" : "12:12:12:12:12:12",
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 1
}
]
}
]
}
]
},
{
"id" : "VID4343",
"sum" : [
{
"name" : "linux",
"value" : 2
}
],
"ssidList" : [
{
"id" : "SSID4343",
"sum" : [
{
"name" : "linux",
"value" : 2
}
],
"macList" : [
{
"id" : "43:43:43:43:43:34",
"sum" : [
{
"name" : "linux",
"value" : 2
}
]
}
]
}
]
}
]
}
I am using Go to manipulate the data in MongoDB via the mgo.v2 package.
The expected output is:
linux : 12+2 = 14
ubuntu : 4+0 = 4
Don't consider the inner lists (ssidList, macList) inside venueList.
You'd need to use the aggregation framework and run a pipeline that first filters the documents in the collection on the venueList ids using the $match operator.
Next, flatten the venueList array with $unwind, so that each venue becomes a separate, denormalised document that can be processed further down the pipeline.
A further $match after unwinding is necessary so that only the venues you asked for reach the next stage; a second $unwind then flattens the sum subdocument array.
The main stage is $group, which aggregates the filtered documents and computes the desired sums with the $sum accumulator. To get one field per OS name, use a ternary operator such as $cond: it feeds either the entry's value or 0 to the $sum expression, depending on the name value.
Putting this altogether, consider running the following pipeline:
db.collection.aggregate([
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList" },
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList.sum" },
{
"$group": {
"_id": null,
"linux": {
"$sum": {
"$cond": [
{ "$eq": [ "$venueList.sum.name", "linux" ] },
"$venueList.sum.value", 0
]
}
},
"ubuntu": {
"$sum": {
"$cond": [
{ "$eq": [ "$venueList.sum.name", "ubuntu" ] },
"$venueList.sum.value", 0
]
}
}
}
}
])
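To check the arithmetic (12+2 = 14 and 4+0 = 4), here is a plain-JavaScript sketch of what this pipeline computes. The in-memory copy keeps only the venue-level fields, since the pipeline ignores ssidList and macList; the function name linuxUbuntuTotals is hypothetical.

```javascript
// Venue-level part of the sample document (inner lists omitted).
const venueDocs = [
  {
    tenantID: 1,
    venueList: [
      { id: "VID1212", sum: [{ name: "linux", value: 12 }, { name: "ubuntu", value: 4 }] },
      { id: "VID4343", sum: [{ name: "linux", value: 2 }] },
    ],
  },
];

function linuxUbuntuTotals(docs, venueIds) {
  const totals = { linux: 0, ubuntu: 0 }; // the two $sum accumulators
  for (const doc of docs) {
    for (const venue of doc.venueList) {          // $unwind venueList
      if (!venueIds.includes(venue.id)) continue; // second $match with $in
      for (const entry of venue.sum) {            // $unwind venueList.sum
        // $cond: add the value only when the name matches, otherwise add 0.
        totals.linux += entry.name === "linux" ? entry.value : 0;
        totals.ubuntu += entry.name === "ubuntu" ? entry.value : 0;
      }
    }
  }
  return { _id: null, ...totals };
}

console.log(linuxUbuntuTotals(venueDocs, ["VID1212", "VID4343"]));
// → { _id: null, linux: 14, ubuntu: 4 }
```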
For usage with mgo, you can convert the above pipeline using the guidance in http://godoc.org/labix.org/v2/mgo#Collection.Pipe
For a more flexible and potentially faster alternative, which also takes unknown values in the sum list into account, run the following pipeline:
db.collection.aggregate([
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList" },
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList.sum" },
{
"$group": {
"_id": "$venueList.sum.name",
"count": { "$sum": "$venueList.sum.value" }
}
},
{
"$group": {
"_id": null,
"counts": {
"$push": {
"name": "$_id",
"count": "$count"
}
}
}
}
])
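The two $group stages of this alternative can be sketched in plain JavaScript: first total the values per name, then collect those totals into a single counts array. Any name that appears in sum is picked up without being listed in advance. The function name countsByName is hypothetical, and since MongoDB does not guarantee the order of $group output, the sketch sorts by name for a stable result.

```javascript
// Venue-level part of the sample document (inner lists omitted).
const statsDocs = [
  {
    venueList: [
      { id: "VID1212", sum: [{ name: "linux", value: 12 }, { name: "ubuntu", value: 4 }] },
      { id: "VID4343", sum: [{ name: "linux", value: 2 }] },
    ],
  },
];

function countsByName(docs, venueIds) {
  const byName = new Map(); // first $group: _id = "$venueList.sum.name"
  for (const doc of docs) {
    for (const venue of doc.venueList) {
      if (!venueIds.includes(venue.id)) continue;
      for (const { name, value } of venue.sum) {
        byName.set(name, (byName.get(name) ?? 0) + value); // $sum of the values
      }
    }
  }
  // Second $group: push every { name, count } pair into one array.
  const counts = [...byName]
    .map(([name, count]) => ({ name, count }))
    .sort((a, b) => a.name.localeCompare(b.name));
  return { _id: null, counts };
}

console.log(JSON.stringify(countsByName(statsDocs, ["VID1212", "VID4343"])));
```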
I have multiple documents in a collection; each document has this structure:
{
_id: "some object id",
data1: [
{
data2_id : 13233,
data2: [
{
sub_data1: "text1",
sub_data2: "text2",
sub_data3: "text3",
},
{
sub_data1: "text4",
sub_data2: "text5",
sub_data3: "text6",
}
]
},
{
data2_id : 53233,
data2: [
{
sub_data1: "text4",
sub_data2: "text5",
sub_data3: "text6",
}
...
]
},
{
data2_id : 56233,
data2: [
{
sub_data1: "text7",
sub_data2: "text8",
sub_data3: "text9",
}
...
]
},
{
data2_id : 53236,
data2: [
{
sub_data1: "text10",
sub_data2: "text22",
sub_data3: "text33",
}
...
]
}
]
}
For a set of ids that match some condition, I'd like to update only the matching sub-object within each document.
I've tried this:
db.collection.update({
"$and": [
{
"_id": {
"$in": [
{
"$id": "54369aca9bc25af3ca8b4568"
},
{
"$id": "54369aca9bc25af3ca8b4562"
}
]
}
},
{
"data1.data2": {
"$elemMatch": {
"sub_data1": "text4",
"sub_data2": "text5"
}
}
}
]
},
{
"data1.data2.$.sub_data3" : "text updated"
}
)
But I get the following error:
Update of data into MongoDB failed: dev.**.com:27017: cannot use the part (data2 of data1.data2.0.sub_data3) to traverse the element...
Any Ideas?
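There is an open issue that imposes a limitation here: the positional $ operator only matches one level of array, so it cannot be used to update elements of an array nested within another array.
Besides that, there are some improvements you can make:
For your query you don't need the $and, because the conditions in a single query object are already combined with an implicit AND: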
db.collection.update(
{
"_id": {
"$in": [
{"$id": "54369aca9bc25af3ca8b4568"},
{"$id": "54369aca9bc25af3ca8b4562"}
]},
"data1.data2": {
"$elemMatch": {
"sub_data1": "text4",
"sub_data2": "text5"
}
}
}, {..update...})
You might want to use $set:
db.collection.update(query,{ $set:{"name": "Mike"} })
Otherwise, the update document is treated as a replacement, and you may lose the rest of the data in your document.
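Since MongoDB 3.6, the nested-array limitation can be worked around with the all-positional operator $[] combined with the filtered positional operator $[<identifier>] and the arrayFilters option. Below is a sketch, assuming MongoDB 3.6 or later: the filter, update and options objects are what you would pass to db.collection.updateMany(filter, update, options) in the shell (the ids are plain strings here; in the shell they would be ObjectId("...") values), and applyNestedUpdate is a hypothetical in-memory helper that mirrors the effect on a trimmed copy of the sample document. The identifier d is an arbitrary name.

```javascript
// Arguments for db.collection.updateMany(filter, update, options), MongoDB 3.6+.
const filter = {
  _id: { $in: ["54369aca9bc25af3ca8b4568", "54369aca9bc25af3ca8b4562"] },
  "data1.data2": { $elemMatch: { sub_data1: "text4", sub_data2: "text5" } },
};
const update = {
  // $[] visits every data1 element; $[d] only the data2 elements passing arrayFilters.
  $set: { "data1.$[].data2.$[d].sub_data3": "text updated" },
};
const options = {
  arrayFilters: [{ "d.sub_data1": "text4", "d.sub_data2": "text5" }],
};

// In-memory emulation of that update on one document (hypothetical helper).
function applyNestedUpdate(doc, value) {
  for (const outer of doc.data1) {     // $[]: every data1 element
    for (const inner of outer.data2) { // $[d]: only the matching data2 elements
      // Same condition as the arrayFilters entry for identifier d.
      if (inner.sub_data1 === "text4" && inner.sub_data2 === "text5") {
        inner.sub_data3 = value;
      }
    }
  }
  return doc;
}

// Trimmed copy of the sample document.
const sample = {
  _id: "54369aca9bc25af3ca8b4568",
  data1: [
    {
      data2_id: 13233,
      data2: [
        { sub_data1: "text1", sub_data2: "text2", sub_data3: "text3" },
        { sub_data1: "text4", sub_data2: "text5", sub_data3: "text6" },
      ],
    },
    {
      data2_id: 53233,
      data2: [{ sub_data1: "text4", sub_data2: "text5", sub_data3: "text6" }],
    },
  ],
};

applyNestedUpdate(sample, update.$set["data1.$[].data2.$[d].sub_data3"]);
console.log(JSON.stringify(sample.data1, null, 2));
```

Unlike the single $ operator, this form updates every matching sub-object in the document, not just the first one.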