Generating Mongo query from MySQL query - mysql

I have been using the following MySQL command to construct a heatmap from log data. However, I have a new data set that is stored in a Mongo database and I need to run the same command.
select concat(a.packages '&' b.packages) "Concurrent Packages",
count(*) "Count"
from data a
cross join data b
where a.packages<b.packages and a.jobID=b.jobID
group by a.packages, b.packages
order by a.packages, b.packages;
Keep in mind that the tables a and b do not exist prior to the query. However, they are created from the packages column of the data table, which has jobID as the field which I want to check for matches. In other words if two packages are within the same job I want to add an entry to the concurrent usage count. How can I generate a similar query in Mongo?

This is not a "join" of different documents; it is an operation within one document, and can be done in MongoDB.
You have a SQL TABLE "data" like this:
JobID TEXT,
package TEXT
The best way to store this in MongoDB will be a collection called "data", containing one document per JobID that contains an array of packages:
{
_id: <JobID>,
packages: [
"packageA",
"packageB",
....
]
}
[ Note: you could also implement your data table as only one document in MongoDB, containing an array of jobs which contain each an array of packages. This is not recommended, because you might hit the 16MB document size limit and nested arrays are not (yet) well supported by different queries - if you want to use the data for other purposes as well ]
Now, how to get a result like this ?
{ pair: [ "packageA", "packageB" ], count: 20 },
{ pair: [ "packageA", "packageC" ], count: 11 },
...
As there is no built-in "cross join" of two arrays in MongoDB, you'll have to program it out in the map function of a mapReduce(), emitting each pair of packages as a key:
mapf = function () {
that = this;
this.packages.forEach( function( p1 ) {
that.packages.forEach( function( p2 ) {
if ( p1 < p2 ) {
key = { "pair": [ p1, p2 ] };
emit( key, 1 );
};
});
});
};
[ Note: this could be optimized, if the packages arrays were sorted ]
The reduce function is nothing more than summing up the counters for each key:
reducef = function( key, values ) {
count = 0;
values.forEach( function( value ) { count += value } );
return count;
};
So, for this example collection:
> db.data.find()
{ "_id" : "Job01", "packages" : [ "pA", "pB", "pC" ] }
{ "_id" : "Job02", "packages" : [ "pA", "pC" ] }
{ "_id" : "Job03", "packages" : [ "pA", "pB", "pD", "pE" ] }
we get the following result:
> db.data.mapReduce(
... mapf,
... reducef,
... { out: 'pairs' }
... );
{
"result" : "pairs",
"timeMillis" : 443,
"counts" : {
"input" : 3,
"emit" : 10,
"reduce" : 2,
"output" : 8
},
"ok" : 1,
}
> db.pairs.find()
{ "_id" : { "pair" : [ "pA", "pB" ] }, "value" : 2 }
{ "_id" : { "pair" : [ "pA", "pC" ] }, "value" : 2 }
{ "_id" : { "pair" : [ "pA", "pD" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pA", "pE" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pB", "pC" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pB", "pD" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pB", "pE" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pD", "pE" ] }, "value" : 1 }
For more information on mapReduce consult: http://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/ and http://docs.mongodb.org/manual/applications/map-reduce/

You can't. Mongo doesn't do joins. Switching from SQL to Mongo is a lot more involved than migrating your queries.
Typically, you would include all the pertinent information in the same record (rather than normalize the information and select it with a join). Denormalize!

Related

Mongolite group by/aggregate on JSON object

I have a json document like this on my mongodb collection:
Updated document:
{
"_id" : ObjectId("59da4aef8c5d757027a5a614"),
"input" : "hi",
"output" : "Hi. How can I help you?",
"intent" : "[{\"intent\":\"greeting\",\"confidence\":0.8154089450836182}]",
"entities" : "[]",
"context" : "{\"conversation_id\":\"48181e58-dd51-405a-bb00-c875c01afa0a\",\"system\":{\"dialog_stack\":[{\"dialog_node\":\"root\"}],\"dialog_turn_counter\":1,\"dialog_request_counter\":1,\"_node_output_map\":{\"node_5_1505291032665\":[0]},\"branch_exited\":true,\"branch_exited_reason\":\"completed\"}}",
"user_id" : "50001",
"time_in" : ISODate("2017-10-08T15:57:32.000Z"),
"time_out" : ISODate("2017-10-08T15:57:35.000Z"),
"reaction" : "1"
}
I need to perform group by on intent.intent field and I'm using Rstudio with mongolite library.
What I have tried is :
pp = '[{"$unwind": "$intent"},{"$group":{"_id":"$intent.intent", "count": {"$sum":1} }}]'
stats <- chat$aggregate(
pipeline=pp,
options = '{"allowDiskUse":true}'
)
print(stats)
But it's not working, output for above code is
_id count
1 NA 727
If intent attribute type is string and keep the object as string.
We can split it to array with \" and use third item of array.
db.getCollection('test1').aggregate([
{ "$project": { intent_text : { $arrayElemAt : [ { $split: ["$intent", "\""] } ,3 ] } } },
{ "$group": {"_id": "$intent_text" , "count": {"$sum":1} }}
])
Result:
{
"_id" : "greeting",
"count" : 1.0
}

Retrieve item list by checking multiple attribute values in MongoDB in golang

This question based on MongoDB,How to retrieve selected items retrieve by selecting multiple condition.It is like IN condition in Mysql
SELECT * FROM venuelist WHERE venueid IN (venueid1, venueid2)
I have attached json data structure that I have used.[Ref: JSON STRUCTUE OF MONGODB ].
As an example, it has a venueList then inside the venue list, It has several attribute venue id and sum of user agents name and total count as value.user agents mean user Os,browser and device information. In this case I used os distribution.In that case i was count linux,ubuntu count on particular venueid.
it is like that,
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 4
}
],
Finally I want to get count of all linux user count by selecting venueid list in one find query in MongoDB.
As example, I want to select all count of linux users by conditioning if venue id VID1212 or VID4343
Ref: JSON STRUCTUE OF MONGODB
{
"_id" : ObjectId("57f940c4932a00aba387b0b0"),
"tenantID" : 1,
"date" : "2016-10-09 00:23:56",
"venueList" : [
{
"id" : “VID1212”,
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 4
}
],
“ssidList” : [ // this is list of ssid’s in venue
{
"id" : “SSID1212”,
"sum" : [
{
"name" : "linux",
"value" : 8
},
{
"name" : "ubuntu",
"value" : 6
}
],
“macList” : [ // this is mac list inside particular ssid ex: this is mac list inside the SSID1212
{
"id" : “12:12:12:12:12:12”,
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 1
}
]
}
]
}
]
},
{
"id" : “VID4343”,
"sum" : [
{
"name" : "linux",
"value" : 2
}
],
"ssidList" : [
{
"id" : “SSID4343”,
"sum" : [
{
"name" : "linux",
"value" : 2
}
],
"macList" : [
{
"id" : “43:43:43:43:43:34”,
"sum" : [
{
"name" : "linux",
"value" : 2
}
]
}
]
}
]
}
]
}
I am using golang as language to manipulation data with mongoldb using mgo.v2 package
expected out put is :
output
linux : 12+2 = 14
ubuntu : 4+0 = 4
Don't consider inner list in venuelist.
You'd need to use the aggregation framework where you would run an aggregation pipeline that first filters the documents in the collection based on
the venueList ids using the $match operator.
The second pipeline would entail flattening the venueList and sum subdocument arrays in order for the data in the documents to be processed further down the pipeline as denormalised entries. The $unwind operator is useful here.
A further filter using $match is necessary after unwinding so that only the documents you want to aggregate are allowed into the next pipeline.
The main pipeline would be the $group operator stage which aggregates the filtered documents to create the desired sums using the accumulator operator $sum. For the desired result, you would need to use a tenary operator like $cond to create the independent count fields since that will feed the number of documents to the $sum expression depending on the name value.
Putting this altogether, consider running the following pipeline:
db.collection.aggregate([
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList" },
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList.sum" },
{
"$group": {
"_id": null,
"linux": {
"$sum": {
"$cond": [
{ "$eq": [ "$venueList.sum.name", "linux" ] },
"$venueList.sum.value", 0
]
}
},
"ubuntu": {
"$sum": {
"$cond": [
{ "$eq": [ "$venueList.sum.name", "ubuntu" ] },
"$venueList.sum.value", 0
]
}
}
}
}
])
For usage with mGo, you can convert the above pipeline using the guidance in http://godoc.org/labix.org/v2/mgo#Collection.Pipe
For a more flexible and better performant alternative which executes much faster than the above, and also takes into consideration unknown values for the sum list, run the alternative pipeline as follows
db.collection.aggregate([
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList" },
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList.sum" },
{
"$group": {
"_id": "$venueList.sum.name",
"count": { "$sum": "$venueList.sum.value" }
}
},
{
"$group": {
"_id": null,
"counts": {
"$push": {
"name": "$_id",
"count": "$count"
}
}
}
}
])

Query for : How many elements of an array are matching within a string in mongoDb

Suppose my JSON is like following:
{ "id":0,"keywords":"amount,debited,account,ticket,not,generated,now" }
{ "id":1,"keywords":"how,safe,gocash" }
{ "id":2,"keywords":"how,referral,program,gocash,works" }
If my array is like
array =["how","safe","gocash"];
then how do I get the count that while checking with first; count should be zero, with second three and with third two. (That means how many elements of an array are present in the string)
Is it possible or what approach I should adopt?
One way of solving this would require some form of modification to your schema by adding an extra field that holds the keywords in an array. This field becomes quite handy when running an aggregation pipeline to return the desired count of elements of an array that match the original string.
To add the additional field you would need the Bulk API operations to update the collection as follows:
var bulk = db.collection.initializeOrderedBulkOp(),
count = 0;
db.collection.find({"keywords": { "$exists": true, "$type": 2 }}).forEach(function(doc) {
var keywordsArray = doc.keywords.split(',');
bulk.find({ "_id": doc._id }).updateOne({
"$set": { "keywordsArray": keywordsArray }
});
count++;
if (count % 100 == 0) {
bulk.execute();
bulk = db.collection.initializeUnorderedBulkOp();
}
});
if (count % 100 != 0) { bulk.execute(); }
The above creates an additional field "keywordsArray" that is a result of splitting the keywords string to an array.
After the operation your sample collection would have the documents:
/* 0 */
{
"_id" : ObjectId("561e24e9ba53a16c763eaab4"),
"id" : 0,
"keywords" : "amount,debited,account,ticket,not,generated,now",
"keywordsArray" : [
"amount",
"debited",
"account",
"ticket",
"not",
"generated",
"now"
]
}
/* 1 */
{
"_id" : ObjectId("561e24e9ba53a16c763eaab5"),
"id" : 1,
"keywords" : "how,safe,gocash",
"keywordsArray" : [
"how",
"safe",
"gocash"
]
}
/* 2 */
{
"_id" : ObjectId("561e24e9ba53a16c763eaab6"),
"id" : 2,
"keywords" : "how,referral,program,gocash,works",
"keywordsArray" : [
"how",
"referral",
"program",
"gocash",
"works"
]
}
On to the next stage, the aggregation framework pipeline, run the following pipeline operation which uses the $let, $size and $setIntersection operators to work out the the desired count result:
var array = ["how","safe","gocash"];
db.collection.aggregate([
{
"$project": {
"id": 1, "keywords": 1,
"count": {
"$let": {
"vars": {
"commonToBoth": { "$setIntersection": [ "$keywordsArray", array ] }
},
"in": { "$size": "$$commonToBoth" }
}
}
}
}
])
Sample Output:
/* 0 */
{
"result" : [
{
"_id" : ObjectId("561e24e9ba53a16c763eaab4"),
"id" : 0,
"keywords" : "amount,debited,account,ticket,not,generated,now",
"count" : 0
},
{
"_id" : ObjectId("561e24e9ba53a16c763eaab5"),
"id" : 1,
"keywords" : "how,safe,gocash",
"count" : 3
},
{
"_id" : ObjectId("561e24e9ba53a16c763eaab6"),
"id" : 2,
"keywords" : "how,referral,program,gocash,works",
"count" : 2
}
],
"ok" : 1
}

Finding JSON objects in mongoDB

I'm trying to find objects using the built it queries and It just doesn't work..
My JSON file is something like this:
{ "Text1":
{
"id":"2"
},
"Text2":
{
"id":"2,3"
},
"Text3":
{
"id":"1"
}
}
And I write this db.myCollection.find({"id":2})
And it doesn't find anything.
When I write db.myCollection.find() it shows all the data as it should.
Anyone knows how to do it correctly?
Its hard to change the data-structure but as you want just your matching sub-document and you don't know where is your target sub-document (for example the query should be on Text1 or Text2 , ...) there is a good data structure for this:
{
"_id" : ObjectId("548dd9261a01c68fab8d67d7"),
"pair" : [
{
"id" : "2",
"key" : "Text1"
},
{
"id" : [
"2",
"3"
],
"key" : "Text2"
},
{
"id" : "1",
"key" : "Text3"
}
]
}
and your query is:
db.myCollection.findOne({'pair.id' : "2"} , {'pair.$':1, _id : -1}).pair // there is better ways (such as aggregation instead of above query)
as result you will have:
{
"0" : {
"id" : "2",
"key" : "Text1"
}
}
Update 1 (newbie way)
If you want all the document not just one use this
var result = [];
db.myCollection.find({'pair.id' : "2"} , {'pair.$':1, _id : -1}).forEach(function(item)
{
result.push(item.pair);
});
// the output will be in result
Update 2
Use this query to get all sub-documents
db.myCollection.aggregate
(
{ $unwind: '$pair' },
{ $match : {'pair.id' : "2"} }
).result
it produce output as
{
"0" : {
"_id" : ObjectId("548deb511a01c68fab8d67db"),
"pair" : {
"id" : "2",
"key" : "Text1"
}
},
"1" : {
"_id" : ObjectId("548deb511a01c68fab8d67db"),
"pair" : {
"id" : [
"2",
"3"
],
"key" : "Text2"
}
}
}
Since your are query specify a field in a subdocument this is what will work. see .find() documentation.
db.myCollection.find({"Text1.id" : "2"}, {"Text1.id": true})
{ "_id" : ObjectId("548dd798e2fa652e675af11d"), "Text1" : { "id" : "2" } }
If the query is on "Text1" or "Text2" the best thing to do here as mention in the accepted answer is changing you document structure. This can be easily done using the "Bulk" API.
var bulk = db.mycollection.initializeOrderedBulkOp(),
count = 0;
db.mycollection.find().forEach(function(doc) {
var pair = [];
for(var key in doc) {
if(key !== "_id") {
var id = doc[key]["id"].split(/[, ]/);
pair.push({"key": key, "id": id});
}
}
bulk.find({"_id": doc._id}).replaceOne({ "pair": pair });
count++; if (count % 300 == 0){
// Execute per 300 operations and re-Init
bulk.execute();
bulk = db.mycollection.initializeOrderedBulkOp();
}
})
// Clean up queues
if (count % 300 != 0 )
bulk.execute();
Your document now look like this:
{
"_id" : ObjectId("55edddc6602d0b4fd53a48d8"),
"pair" : [
{
"key" : "Text1",
"id" : [
"2"
]
},
{
"key" : "Text2",
"id" : [
"2",
"3"
]
},
{
"key" : "Text3",
"id" : [
"1"
]
}
]
}
Running the following query:
db.mycollection.aggregate([
{ "$project": {
"pair": {
"$setDifference": [
{ "$map": {
"input": "$pair",
"as": "pr",
"in": {
"$cond": [
{ "$setIsSubset": [ ["2"], "$$pr.id" ]},
"$$pr",
false
]
}
}},
[false]
]
}
}}
])
returns:
{
"_id" : ObjectId("55edddc6602d0b4fd53a48d8"),
"pair" : [
{
"key" : "Text1",
"id" : [
"2"
]
},
{
"key" : "Text2",
"id" : [
"2",
"3"
]
}
]
}

mongo db remove json objects

I have a mongo json object as follows
{
"_id" : new BinData(3, "RDHABb22XESWvP83FplqJw=="),
"name" : "NEW NODE",
"host" : null,
"aet" : null,
"studies" : ["1.3.12.2.1107.5.99.3.30000008061114424970500000589"],
"testcases" : [new BinData(3, "Zhl+zIXomkqAd8NIkRiTjQ==")],
"sendentries" : [{
"_id" : "1.3.12.2.1107.5.99.3.30000008061114424970500000589",
"Index" : 0,
"Type" : "Study"
}, {
"_id" : "cc7e1966-e885-4a9a-8077-c3489118938d",
"Index" : 1,
"Type" : "TestCase"
}]
}
The fields "Studies" and "TestCases" are now obsolete and I am now storing that information in a new field called SendEntries. I would like to get rid of the Studies and TestCases from the old entries and unmap those fields going forward. I want to know how I can update my current collections to get rid of the Studies and TestCases fields.
I'm just few weeks into Mongo.
You can use the $unset operator with update.
db.collection.update({},
{ $unset: {
"studies": "",
"testcases": ""
},
{ "upsert": false, "muti": true }
)
And that will remove all of the fields from all of your documents in your collection
Use $unset, there's a manual page e.g.
db.yourCollection.update( { },
{ $unset: {
Studies: "",
testcases: ""
}
},
{ multi: true }
)