Can this JSON with "timestamp": Double format be aggregated for SUM, AVG in MongoDB?

I have imported the JSON data below into its own collection in a MongoDB database. I'm trying to aggregate the values (i.e. 40, 30, 30) and SUM and AVG them, as they reside in the innermost embedded document. I'm having a problem doing this: when I try using dot notation I can't get any values. I feel the unique timestamps (i.e. 1567544426000, 1567541464000, 1567541475000) are a problem. Is this JSON formatted correctly for aggregation, and how would I do so? Thanks for any help, or even a pointer in the right direction to find out how to do SUM, AVG, etc. on the data.
I've tried using NoSQLBooster and Query Assist for MongoDB.
{
"Barcode": "97-1908-577-1032-BE1-332",
"IP": "192.162.656.111",
"VFD": {
"CurrentPV": {
"Type": "Speed",
"Data": {
"1567544426000": 40,
"1567541464000": 30
"1567541475000": 30
}
},
"CurrentSP": {
"Type": "Speed",
"Data": {
"1567544426000": 55,
"1567541464000": 5
"1567541488000": 10
}
},
"Program_Running": {
"Type": "Active",
"Data": {
"1567544426000": 1,
"1567541464000": 0
"1567541475000": 3
}
}
},
"Equipment": "PieceOfEquipment",
"Location": "Garage",
"RunEnd": "NA",
"RunStart": 1533541438
}
I can't seem to reach the values even when I use dot notation down to the "Data" object (i.e. Equipment.VFD.CurrentPV.Data); no result sets are returned.

We can convert VFD.CurrentPV.Data into an array of key-value pairs using $objectToArray and then take the SUM and AVG of the values themselves.
The following query gets the expected output:
db.collection.aggregate([
{
$addFields:{
"data":{
$objectToArray: "$VFD.CurrentPV.Data"
}
}
},
{
$project:{
"sum":{
$sum:"$data.v"
},
"avg":{
$avg:"$data.v"
}
}
}
]).pretty()
Data set:
{
"_id" : ObjectId("5d830f3afb35a835fbd8638e"),
"Barcode" : "97-1908-577-1032-BE1-332",
"IP" : "192.162.656.111",
"VFD" : {
"CurrentPV" : {
"Type" : "Speed",
"Data" : {
"1567544426000" : 40,
"1567541464000" : 30,
"1567541475000" : 30
}
},
"CurrentSP" : {
"Type" : "Speed",
"Data" : {
"1567544426000" : 55,
"1567541464000" : 5,
"1567541488000" : 10
}
},
"Program_Running" : {
"Type" : "Active",
"Data" : {
"1567544426000" : 1,
"1567541464000" : 0,
"1567541475000" : 3
}
}
},
"Equipment" : "PieceOfEquipment",
"Location" : "Garage",
"RunEnd" : "NA",
"RunStart" : 1533541438
}
Output:
{
"_id" : ObjectId("5d830f3afb35a835fbd8638e"),
"sum" : 100,
"avg" : 33.333333333333336
}
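If you also need the same aggregates for the other Data maps (e.g. CurrentSP), the $objectToArray conversion can be applied to each map in a single pipeline. A minimal sketch, assuming a MongoDB version that supports $objectToArray (3.4.4+) and the field names from the sample document:
db.collection.aggregate([
{
$addFields: {
// convert each timestamp-keyed map into an array of { k, v } pairs
"pv": { $objectToArray: "$VFD.CurrentPV.Data" },
"sp": { $objectToArray: "$VFD.CurrentSP.Data" }
}
},
{
$project: {
"pvSum": { $sum: "$pv.v" },
"pvAvg": { $avg: "$pv.v" },
"spSum": { $sum: "$sp.v" },
"spAvg": { $avg: "$sp.v" }
}
}
]).pretty()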

Related

Retrieve item list by checking multiple attribute values in MongoDB in golang

This question is based on MongoDB: how to retrieve selected items by specifying multiple conditions, like the IN condition in MySQL:
SELECT * FROM venuelist WHERE venueid IN (venueid1, venueid2)
I have attached the JSON data structure that I have used [Ref: JSON STRUCTURE OF MONGODB].
As an example, it has a venueList; each entry in the venue list has a venue id and a sum of user-agent names with their total counts as values. User agents here means user OS, browser, and device information. In this case I used the OS distribution, counting Linux and Ubuntu occurrences for a particular venue id.
It looks like this:
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 4
}
],
Finally, I want to get the count of all Linux users by selecting a list of venue ids in one find query in MongoDB.
As an example, I want to select the count of all Linux users where the venue id is VID1212 or VID4343.
Ref: JSON STRUCTURE OF MONGODB
{
"_id" : ObjectId("57f940c4932a00aba387b0b0"),
"tenantID" : 1,
"date" : "2016-10-09 00:23:56",
"venueList" : [
{
"id" : “VID1212”,
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 4
}
],
"ssidList" : [ // this is the list of SSIDs in the venue
{
"id" : "SSID1212",
"sum" : [
{
"name" : "linux",
"value" : 8
},
{
"name" : "ubuntu",
"value" : 6
}
],
"macList" : [ // this is the MAC list inside a particular SSID, e.g. inside SSID1212
{
"id" : "12:12:12:12:12:12",
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 1
}
]
}
]
}
]
},
{
"id" : “VID4343”,
"sum" : [
{
"name" : "linux",
"value" : 2
}
],
"ssidList" : [
{
"id" : “SSID4343”,
"sum" : [
{
"name" : "linux",
"value" : 2
}
],
"macList" : [
{
"id" : “43:43:43:43:43:34”,
"sum" : [
{
"name" : "linux",
"value" : 2
}
]
}
]
}
]
}
]
}
I am using Golang to manipulate the data in MongoDB, using the mgo.v2 package.
The expected output is:
linux : 12+2 = 14
ubuntu : 4+0 = 4
Don't consider the inner lists in venueList.
You'd need to use the aggregation framework, running a pipeline that first filters the documents in the collection on the venueList ids using the $match operator.
The second stage entails flattening the venueList and sum subdocument arrays so that the data in the documents can be processed further down the pipeline as denormalised entries. The $unwind operator is useful here.
A further $match filter is necessary after unwinding so that only the documents you want to aggregate are allowed into the next stage.
The main stage is the $group operator, which aggregates the filtered documents to create the desired sums using the accumulator operator $sum. For the desired result, you need a ternary operator like $cond to create the independent count fields, since it feeds either the value or 0 to the $sum expression depending on the name value.
Putting this all together, consider running the following pipeline:
db.collection.aggregate([
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList" },
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList.sum" },
{
"$group": {
"_id": null,
"linux": {
"$sum": {
"$cond": [
{ "$eq": [ "$venueList.sum.name", "linux" ] },
"$venueList.sum.value", 0
]
}
},
"ubuntu": {
"$sum": {
"$cond": [
{ "$eq": [ "$venueList.sum.name", "ubuntu" ] },
"$venueList.sum.value", 0
]
}
}
}
}
])
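Against the sample document above (VID1212 has linux 12 and ubuntu 4, VID4343 has linux 2), this should return something along the lines of:
{ "_id" : null, "linux" : 14, "ubuntu" : 4 }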
For usage with mgo, you can convert the above pipeline using the guidance at http://godoc.org/labix.org/v2/mgo#Collection.Pipe
For a more flexible and better-performing alternative that executes much faster than the above, and which also takes into consideration unknown values in the sum list, run the following alternative pipeline:
db.collection.aggregate([
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList" },
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList.sum" },
{
"$group": {
"_id": "$venueList.sum.name",
"count": { "$sum": "$venueList.sum.value" }
}
},
{
"$group": {
"_id": null,
"counts": {
"$push": {
"name": "$_id",
"count": "$count"
}
}
}
}
])
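Run against the same sample data, this variant should push one entry per distinct name into the counts array, roughly:
{ "_id" : null, "counts" : [ { "name" : "linux", "count" : 14 }, { "name" : "ubuntu", "count" : 4 } ] }
(the order of the entries in counts is not guaranteed).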

Custom analyzer appearing in type mapping but not working in Elasticsearch

I'm trying to add a custom analyzer to my index while also mapping that analyzer to a property on a type. Here is my JSON object for doing this:
{ "settings" : {
"analysis" : {
"analyzer" : {
"test_analyzer" : {
"type" : "custom",
"tokenizer": "standard",
"filter" : ["lowercase", "asciifolding"],
"char_filter": ["html_strip"]
}
}
}
},
"mappings" : {
"test" : {
"properties" : {
"checkanalyzer" : {
"type" : "string",
"analyzer" : "test_analyzer"
}
}
}
}
}
I know this analyzer works because I've tested it using /wp2/_analyze?analyzer=test_analyzer -d '<p>Testing analyzer.</p>' and also it shows up as the analyzer for the checkanalyzer property when I check /wp2/test/_mapping. However, if I add a document like {"checkanalyzer": "<p>The tags should not show up</p>"}, the HTML tags don't get stripped out when I retrieve the document using the _search endpoint. Am I misunderstanding how the mapping works or is there something wrong with my JSON object? I'm dynamically creating the wp2 index and also the test type when I make this call to Elasticsearch, not sure if that matters.
The HTML doesn't get removed from the _source; it gets removed from the terms generated from that source. You can see this if you use a terms aggregation:
POST /test_index/_search
{
"aggs": {
"checkanalyzer_field_terms": {
"terms": {
"field": "checkanalyzer"
}
}
}
}
{
"took": 77,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "test",
"_id": "1",
"_score": 1,
"_source": {
"checkanalyzer": "<p>The tags should not show up</p>"
}
}
]
},
"aggregations": {
"checkanalyzer_field_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "not",
"doc_count": 1
},
{
"key": "should",
"doc_count": 1
},
{
"key": "show",
"doc_count": 1
},
{
"key": "tags",
"doc_count": 1
},
{
"key": "the",
"doc_count": 1
},
{
"key": "up",
"doc_count": 1
}
]
}
}
}
Here's some code I used to test it:
http://sense.qbox.io/gist/2971767aa0f5949510fa0669dad6729bbcdf8570
Now, if you want to completely strip out the HTML prior to indexing and store the content as is, you can use the mapper attachments plugin, where, when you define the mapping, you can set the content_type to "html".
The mapper attachment is useful for many things, especially if you are handling multiple document types, but most notably, I believe using it just for the purpose of stripping out the HTML tags is sufficient (which you cannot do with the html_strip char filter, since that only affects the indexed terms, not the stored source).
Just a forewarning though: NONE of the HTML tags will be stored. So if you do need those tags somehow, I would suggest defining another field to store the original content. Another note: you cannot specify multi-fields for mapper attachment documents, so you would need to store that outside of the mapper attachment document. See my working example below.
You'll want to end up with this mapping:
{
"html5-es" : {
"aliases" : { },
"mappings" : {
"document" : {
"properties" : {
"delete" : {
"type" : "boolean"
},
"file" : {
"type" : "attachment",
"fields" : {
"content" : {
"type" : "string",
"store" : true,
"term_vector" : "with_positions_offsets",
"analyzer" : "autocomplete"
},
"author" : {
"type" : "string",
"store" : true,
"term_vector" : "with_positions_offsets"
},
"title" : {
"type" : "string",
"store" : true,
"term_vector" : "with_positions_offsets",
"analyzer" : "autocomplete"
},
"name" : {
"type" : "string"
},
"date" : {
"type" : "date",
"format" : "strict_date_optional_time||epoch_millis"
},
"keywords" : {
"type" : "string"
},
"content_type" : {
"type" : "string"
},
"content_length" : {
"type" : "integer"
},
"language" : {
"type" : "string"
}
}
},
"hash_id" : {
"type" : "string"
},
"path" : {
"type" : "string"
},
"raw_content" : {
"type" : "string",
"store" : true,
"term_vector" : "with_positions_offsets",
"analyzer" : "raw"
},
"title" : {
"type" : "string"
}
}
}
},
"settings" : { //insert your own settings here },
"warmers" : { }
}
}
Then in NEST, I assemble the content as follows:
Attachment attachment = new Attachment();
attachment.Content = Convert.ToBase64String(File.ReadAllBytes("path/to/document"));
attachment.ContentType = "html";
Document document = new Document();
document.File = attachment;
document.RawContent = InsertRawContentFromString(originalText);
I have tested this in Sense - results are as follows:
"file": {
"_content": "PGh0bWwgeG1sbnM6TWFkQ2FwPSJodHRwOi8vd3d3Lm1hZGNhcHNvZnR3YXJlLmNvbS9TY2hlbWFzL01hZENhcC54c2QiPg0KICA8aGVhZCAvPg0KICA8Ym9keT4NCiAgICA8aDE+VG9waWMxMDwvaDE+DQogICAgPHA+RGVsZXRlIHRoaXMgdGV4dCBhbmQgcmVwbGFjZSBpdCB3aXRoIHlvdXIgb3duIGNvbnRlbnQuIENoZWNrIHlvdXIgbWFpbGJveC48L3A+DQogICAgPHA+wqA8L3A+DQogICAgPHA+YXNkZjwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD4xMDwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD5MYXZlbmRlci48L3A+DQogICAgPHA+wqA8L3A+DQogICAgPHA+MTAvNiAxMjowMzwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD41IDA5PC9wPg0KICAgIDxwPsKgPC9wPg0KICAgIDxwPjExIDQ3PC9wPg0KICAgIDxwPsKgPC9wPg0KICAgIDxwPkhhbGxvd2VlbiBpcyBpbiBPY3RvYmVyLjwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD5qb2c8L3A+DQogIDwvYm9keT4NCjwvaHRtbD4=",
"_content_length": 0,
"_content_type": "html",
"_date": "0001-01-01T00:00:00",
"_title": "Topic10"
},
"delete": false,
"raw_content": "<h1>Topic10</h1><p>Delete this text and replace it with your own content. Check your mailbox.</p><p> </p><p>asdf</p><p> </p><p>10</p><p> </p><p>Lavender.</p><p> </p><p>10/6 12:03</p><p> </p><p>5 09</p><p> </p><p>11 47</p><p> </p><p>Halloween is in October.</p><p> </p><p>jog</p>"
},
"highlight": {
"file.content": [
"\n <em>Topic10</em>\n\n Delete this text and replace it with your own content. Check your mailbox.\n\n  \n\n asdf\n\n  \n\n 10\n\n  \n\n Lavender.\n\n  \n\n 10/6 12:03\n\n  \n\n 5 09\n\n  \n\n 11 47\n\n  \n\n Halloween is in October.\n\n  \n\n jog\n\n "
]
}
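A search along these lines produces that kind of highlight on file.content (the match term here is only illustrative, taken from the sample document's title):
POST /html5-es/document/_search
{
"query": { "match": { "file.content": "Topic10" } },
"highlight": { "fields": { "file.content": {} } }
}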

Query for: How many elements of an array match within a string in MongoDB

Suppose my JSON is like the following:
{ "id":0,"keywords":"amount,debited,account,ticket,not,generated,now" }
{ "id":1,"keywords":"how,safe,gocash" }
{ "id":2,"keywords":"how,referral,program,gocash,works" }
If my array is like
array =["how","safe","gocash"];
then how do I get the count such that checking against the first document gives zero, the second three, and the third two? (That is, how many elements of the array are present in the string.)
Is this possible, and what approach should I adopt?
One way of solving this would require some form of modification to your schema by adding an extra field that holds the keywords in an array. This field becomes quite handy when running an aggregation pipeline to return the desired count of elements of an array that match the original string.
To add the additional field you would need the Bulk API operations to update the collection as follows:
var bulk = db.collection.initializeOrderedBulkOp(),
count = 0;
db.collection.find({"keywords": { "$exists": true, "$type": 2 }}).forEach(function(doc) {
var keywordsArray = doc.keywords.split(',');
bulk.find({ "_id": doc._id }).updateOne({
"$set": { "keywordsArray": keywordsArray }
});
count++;
if (count % 100 == 0) {
bulk.execute();
bulk = db.collection.initializeOrderedBulkOp();
}
});
if (count % 100 != 0) { bulk.execute(); }
The above creates an additional field, "keywordsArray", which is the result of splitting the keywords string into an array.
After the operation your sample collection would have the documents:
/* 0 */
{
"_id" : ObjectId("561e24e9ba53a16c763eaab4"),
"id" : 0,
"keywords" : "amount,debited,account,ticket,not,generated,now",
"keywordsArray" : [
"amount",
"debited",
"account",
"ticket",
"not",
"generated",
"now"
]
}
/* 1 */
{
"_id" : ObjectId("561e24e9ba53a16c763eaab5"),
"id" : 1,
"keywords" : "how,safe,gocash",
"keywordsArray" : [
"how",
"safe",
"gocash"
]
}
/* 2 */
{
"_id" : ObjectId("561e24e9ba53a16c763eaab6"),
"id" : 2,
"keywords" : "how,referral,program,gocash,works",
"keywordsArray" : [
"how",
"referral",
"program",
"gocash",
"works"
]
}
On to the next stage, the aggregation framework pipeline: run the following pipeline operation, which uses the $let, $size and $setIntersection operators to work out the desired count result:
var array = ["how","safe","gocash"];
db.collection.aggregate([
{
"$project": {
"id": 1, "keywords": 1,
"count": {
"$let": {
"vars": {
"commonToBoth": { "$setIntersection": [ "$keywordsArray", array ] }
},
"in": { "$size": "$$commonToBoth" }
}
}
}
}
])
Sample Output:
/* 0 */
{
"result" : [
{
"_id" : ObjectId("561e24e9ba53a16c763eaab4"),
"id" : 0,
"keywords" : "amount,debited,account,ticket,not,generated,now",
"count" : 0
},
{
"_id" : ObjectId("561e24e9ba53a16c763eaab5"),
"id" : 1,
"keywords" : "how,safe,gocash",
"count" : 3
},
{
"_id" : ObjectId("561e24e9ba53a16c763eaab6"),
"id" : 2,
"keywords" : "how,referral,program,gocash,works",
"count" : 2
}
],
"ok" : 1
}
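As a side note, on MongoDB 3.4+ the schema change can be avoided entirely, because $split can turn the keywords string into an array inside the pipeline itself. A minimal sketch of that variant:
var array = ["how","safe","gocash"];
db.collection.aggregate([
{
"$project": {
"id": 1, "keywords": 1,
// split the comma-separated string and intersect it with the input array
"count": {
"$size": {
"$setIntersection": [ { "$split": [ "$keywords", "," ] }, array ]
}
}
}
}
])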

Nested filter numerical range

I have the following json object:
{
"Title": "Terminator,
"Purchases": [
{"Country": "US", "Site": "iTunes", "Price": 4.99},
{"Country": "FR", "Site": "Google", "Price": 5.99}
]
}
I want to be able to find an object by specifying a Country + Site + PriceRange. For example, the above should return True for Country=US&Price<5.00, but should return False for Country=FR&Price<5.00. How would the index and query look to do this? Here is another answer that this is a follow-up question to: Search within array object.
Simply add a Range query to your Bool query logic tree. This will return documents that match US for country and have the Price field with a numeric value less than 5.
{ "query":
{ "nested" : {
"path" : "Purchases",
"score_mode" : "avg",
"query" : {
"bool" : {
"must" : [
{
"match" : {"Purchases.Country" : "US"}
},
{
"range" : "Purchases.Price":
{
"lte": 5
}
}
]
}
}
}
}
}
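For the index side of the question, the Purchases field has to be mapped as nested for the query above to work. A minimal sketch of such a mapping (the index and type names are placeholders, and the string/not_analyzed syntax assumes a pre-5.x Elasticsearch, as used elsewhere on this page):
PUT /videos
{
"mappings": {
"video": {
"properties": {
"Title": { "type": "string" },
"Purchases": {
"type": "nested",
"properties": {
"Country": { "type": "string", "index": "not_analyzed" },
"Site": { "type": "string", "index": "not_analyzed" },
"Price": { "type": "double" }
}
}
}
}
}
}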

Count links in Arrays in MongoDB collection

I have a collection with objects, which are linking to other objects in the array:
{
"_id" : ObjectId("53f75bedc5489f86666d305e"),
"id" : "2",
"links_to" : [
{
"id" : 1,
"label" : null,
},
{
"id" : 3,
"label" : null,
},
{
"id" : 60,
"label" : null,
},
{
"id" : 23,
"label" : null,
},
]
},
{
"_id" : ObjectId("53f75bedc5489f86666d305e"),
"id" : "3",
"links_to" : [
{
"id" : 4,
"label" : null,
},
{
"id" : 8,
"label" : null,
},
{
"id" : 23,
"label" : null,
},
{
"id" : 2,
"label" : null,
},
]
},
...
Now I would like to write a query which gives, as output for each id, the number of links, e.g.:
{"id": 1, "numberOfLinks": 21},
{"id": 2, "numberOfLinks": 15},
...
Thanks in advance.
The best approach is to keep the count on the document and update it when you either $push or $pull elements of the array using the $inc operator. In this way the field is maintained on the document itself:
{
"links_to": [],
"linkCount": 0
}
When you "push"
db.collecction.update(
{},
{ "$push": { "links_to": newLink }, "$inc": { "linkCount": 1 } }
)
And "pull":
db.collection.update(
{},
{ "$pull": { "links_to": newLink }, "$inc": { "linkCount": -1 } }
)
If you don't maintain such a counter, you can use the $size operator from the aggregation framework (MongoDB 2.6+) to get the array length:
db.collection.aggregate([
{ "$project": {
"numberOfLinks": { "$size": "$links_to" }
}}
])
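Note that this counts the array length of each document (its outgoing links), unlike the grouped in-link counts in the answer further down. Against the sample documents above, each of which has four links_to entries, it should return something like:
{ "_id" : ObjectId("53f75bedc5489f86666d305e"), "numberOfLinks" : 4 }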
Or in versions prior to MongoDB 2.6 you can count the array members after $unwind and $group:
db.collection.aggregate([
{ "$unwind": "$link_count" },
{ "$group": {
"_id": "$id",
"numberOfLinks": { "$sum": 1 }
}}
])
So usually unless you want something specifically "dynamic" then just maintain the count on the document. This avoids the overhead of calculation when you query.
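With the counter maintained on the document, reading it back is a plain projection rather than an aggregation, for example (using the linkCount field from the sketch above):
db.collection.find({}, { "id": 1, "linkCount": 1 })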
Actually this is fairly simple to achieve using aggregation:
db.foo.aggregate([
{$unwind: "$links_to" },
{$group: { _id: {"lti":"$links_to.id"}, numberOfLinks: {$sum: 1} } },
{$project: { _id:0, id: "$_id.lti", numberOfLinks: "$numberOfLinks" } }
])
produces the desired output, though in reversed order of fields, at least in the shell output:
{ "numberOfLinks" : 3, "id" : 3 }
{ "numberOfLinks" : 3, "id" : 2 }
{ "numberOfLinks" : 1, "id" : 5 }
{ "numberOfLinks" : 2, "id" : 4 }
{ "numberOfLinks" : 3, "id" : 1 }
If you can live with an output like:
{ "_id" : { "linksToId" : 3 }, "numberOfLinks" : 3 }
{ "_id" : { "linksToId" : 2 }, "numberOfLinks" : 3 }
{ "_id" : { "linksToId" : 5 }, "numberOfLinks" : 1 }
{ "_id" : { "linksToId" : 4 }, "numberOfLinks" : 2 }
{ "_id" : { "linksToId" : 1 }, "numberOfLinks" : 3 }
you can skip the $project step of the aggregation pipeline.
This is extremely efficient. I did a test doing basically the same thing over a collection of 5M documents with roughly 17M relations; it takes 18 seconds on a server that is not exactly high-performance.