Expressing this SQL as a Mongo query

I want to figure out the most active users on my site.
I have records of the form
{
  "_id" : "db1855b0-f2f4-44eb-9dbb-81e27780c796",
  "createdAt" : 1360497266621,
  "profile" : { "name" : "test" },
  "services" : {
    "resume" : {
      "loginTokens" : [
        {
          "token" : "82c01cb8-796a-4765-9366-d07c98c64f4d",
          "when" : 1360497266624
        },
        {
          "token" : "0e4bc0a4-e139-4804-8527-c416fb20f6b1",
          "when" : 1360497474037
        }
      ]
    },
    "twitter" : {
      "accessToken" : "9314Sj9kKvSyosxTWPY5r57851C2ScZBCe",
      "accessTokenSecret" : "UiDcJfOfjH7g9UiBEOBs",
      "id" : 2933049,
      "screenName" : "testname"
    }
  }
}
I want to be able to select users and order by the number of loginTokens.
In MySQL it would be something like:
SELECT id, COUNT(logins) AS logins
FROM users
GROUP BY id ORDER BY logins DESC
I tried this on querymongo.com and got an error (it can't work with aliases / can't order by non-column names).
What's the Mongo way to do this?
Thanks!

I just converted:
SELECT id, COUNT(logins)
FROM users
GROUP BY id
To:
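db.users.group({
  "key": {
    "_id": true
  },
  "initial": {
    "countlogins": 0
  },
  "reduce": function(obj, prev) {
    // add this user's number of login tokens (not 1 per document)
    var tokens = (obj.services && obj.services.resume &&
                  obj.services.resume.loginTokens) || [];
    prev.countlogins += tokens.length;
  }
});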
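Note that group() does not sort its results (you would still sort by countlogins on the client), and it was deprecated in MongoDB 3.4 and removed in 4.2.
Hope this helps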

Here is the same query expressed with the aggregation framework:
db.users.aggregate([
{$unwind: '$services.resume.loginTokens'},
{$group: {_id: '$_id', logins: {$sum: 1}}},
{$sort: {logins: -1}}
])
This should do the trick.
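To make it clearer what the pipeline computes, here is a plain JavaScript sketch (not MongoDB code) that applies the same $unwind, $group and $sort steps to an in-memory array. The function name `mostActiveUsers` and the sample documents are hypothetical.

```javascript
// In-memory illustration of the three pipeline stages; not a MongoDB API.
function mostActiveUsers(users) {
  // $unwind: one entry per element of services.resume.loginTokens.
  // As with $unwind (without preserveNullAndEmptyArrays), users whose
  // loginTokens array is missing or empty produce no entries at all.
  const unwound = [];
  for (const user of users) {
    const tokens = (user.services && user.services.resume &&
                    user.services.resume.loginTokens) || [];
    for (const token of tokens) {
      unwound.push({ _id: user._id, token });
    }
  }

  // $group: { _id: '$_id', logins: { $sum: 1 } }
  const counts = new Map();
  for (const doc of unwound) {
    counts.set(doc._id, (counts.get(doc._id) || 0) + 1);
  }

  // $sort: { logins: -1 }
  return Array.from(counts, ([_id, logins]) => ({ _id, logins }))
    .sort((a, b) => b.logins - a.logins);
}

// Hypothetical documents shaped like the record in the question.
const sampleUsers = [
  { _id: "u1", services: { resume: { loginTokens: [{ token: "a" }] } } },
  { _id: "u2", services: { resume: { loginTokens: [{ token: "b" }, { token: "c" }] } } },
  { _id: "u3", services: { resume: { loginTokens: [] } } },
];

mostActiveUsers(sampleUsers);
// → [{ _id: "u2", logins: 2 }, { _id: "u1", logins: 1 }]
```

One consequence worth knowing: like the real $unwind, users with no login tokens (u3 here) drop out of the result instead of appearing with a count of 0.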

Related

Sort or Order by specific values with Elasticsearch 5

I'm trying to figure out how to sort Elasticsearch results so that documents with specific field values always show first. In this case, I want specific SKUs to show first on category pages (I'm using a bool query to generate the Elasticsearch results for category pages).
If I were trying to accomplish this with MySQL, I'd use a CASE expression:
ORDER BY CASE sku
WHEN 'sku1' then 1 WHEN 'sku2' then 2 WHEN 'sku3' then 3 ELSE 4 END
This query runs without errors:
{
"sort" : [
{
"_script": {
"type": "number",
"script": {
"inline" : "params.sortOrder.indexOf(doc['skuid_text'].value)",
"params": {
"sortOrder": [
"SKUID1",
"SKUID2",
"SKUID3"
]
}
},
"order": "asc"
}
}
],
"query" :{
"bool" : {
"must" : [
{
"term" : {
"category_codes" : "CATEGORY1"
}
}
]
}
}
}
But it's returning -1 as the sort value for all records, e.g.:
"sort": [
-1
]
Note: 'skuid_text' is the SKU field, which I have mapped as a "keyword" type. I have tried both doc['skuid_text'].value and doc['skuid_text'], and I have verified that the SKUs in the "sortOrder" array are definitely included in the result set.
What am I missing? Or, is there a completely different way to approach the problem?
Actually, the original query DOES kind of work; the problem was the "asc" ordering. indexOf returns -1 for every SKU that is not in sortOrder, so those documents sorted first and pushed all the SKUs in the sortOrder array to the end.
If you reverse the order of the SKUs in sortOrder and subtract the index from a large constant, it sorts correctly. Kind of hacky, though. I'd love to know if anyone can think of a better way.
{
"sort" : [
{
"_script": {
"type": "number",
"script": {
"inline" : "(9999)-params.sortOrder.indexOf(doc['skuid_text'].value)",
"params": {
"sortOrder": [
"SKUID3",
"SKUID2",
"SKUID1"
]
}
},
"order": "asc"
}
}
],
"query" :{
"bool" : {
"must" : [
{
"term" : {
"category_codes" : "CATEGORY1"
}
}
]
}
}
}
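The -1 behaviour comes from indexOf returning -1 for SKUs that are not in the list. This plain JavaScript sketch (not Elasticsearch or Painless code; `rawIndexKey`, `caseSortKey` and `sortSkus` are hypothetical names) compares the raw indexOf key with a CASE-like key that maps unlisted SKUs to sortOrder.length, which mirrors the SQL's ELSE branch without reversing the list or using the 9999 arithmetic.

```javascript
// Raw key, as in the first query: unlisted SKUs get -1 and sort first (asc).
function rawIndexKey(sku, sortOrder) {
  return sortOrder.indexOf(sku);
}

// CASE-like key: listed SKUs get their position, everything else gets
// sortOrder.length (the "ELSE 4" branch for three listed SKUs).
function caseSortKey(sku, sortOrder) {
  const i = sortOrder.indexOf(sku);
  return i === -1 ? sortOrder.length : i;
}

// Ascending sort by the chosen key. Copies the input so it is not mutated;
// Array.prototype.sort is stable, so equal keys keep their original order.
function sortSkus(skus, sortOrder, keyFn) {
  return [...skus].sort((a, b) => keyFn(a, sortOrder) - keyFn(b, sortOrder));
}

const sortOrder = ["SKUID1", "SKUID2", "SKUID3"];
const skus = ["OTHER1", "SKUID3", "OTHER2", "SKUID1", "SKUID2"];

sortSkus(skus, sortOrder, rawIndexKey);
// → ["OTHER1", "OTHER2", "SKUID1", "SKUID2", "SKUID3"]
sortSkus(skus, sortOrder, caseSortKey);
// → ["SKUID1", "SKUID2", "SKUID3", "OTHER1", "OTHER2"]
```

Presumably the same -1 check could be written inside the script itself so that "asc" works with the list in its natural order, but that Painless variant is not shown or tested here.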
This plugin did what you need for versions before 5.x. You could give it a try on your installation as is, or see what the author has to say about a 5.x update.
https://github.com/jprante/elasticsearch-functionscore-conditionalboost

Firebase: how to make this kind of query in Swift

I have this JSON structure:
{
"groups" : {
"-KBxo9-RoY0eowWKeHkU" : {
"author" : "rsenov",
"members" : {
"-KBxo7ZU6McsmDOxyias" : true,
"-KBxo8_TUTW6NZze6xcd" : true,
"rsenov" : true
},
"name" : "Prueba 3"
}
},
"users" : {
"-KBxo7ZU6McsmDOxyias" : {
"avatar" : "owl2",
"groups" : {
"-KBxo9-RoY0eowWKeHkU" : true
},
"isUser" : false,
"name" : "Pepa"
},
"-KBxo8_TUTW6NZze6xcd" : {
"avatar" : "monkey",
"groups" : {
"-KBxo9-RoY0eowWKeHkU" : true
},
"isUser" : false,
"name" : "Lolas"
},
"rsenov" : {
"avatar" : "guest",
"groups" : {
"-KBxo9-RoY0eowWKeHkU" : true
},
"isUser" : true,
"name" : "Ruben"
}
}
}
And the security rules file is:
{
"rules": {
".read": true,
".write": true,
"users": {
".indexOn": ["email", "groups"]
},
"groups": {
".indexOn": ["author", "name"]
}
}
}
I'm trying to run a query in order to get the ChildChanged snapshot:
DataService.dataService.USERS_REF.queryOrderedByChild("groups").queryEqualToValue(currentGroup.groupKey).observeEventType(.ChildChanged, withBlock: {snapshot in
print(snapshot)
})
DataService.dataService.USERS_REF corresponds to the URL that points to the "users" key, and currentGroup.groupKey is equal to -KBxo9-RoY0eowWKeHkU in this case.
According to this query, I should get the snapshot of the child that has changed. For example, if I change the user name "Pepa" to "Test", I should get the snapshot:
"-KBxo7ZU6McsmDOxyias" : {
"avatar" : "owl2",
"groups" : {
"-KBxo9-RoY0eowWKeHkU" : true
},
"isUser" : false,
"name" : "Test"
}
But the observer block never gets called...
Is there something wrong in my query?
"I'm trying to run a query in order to get the ChildChanged snapshot:" is a little odd.
You can query for data, or observe a node via ChildChanged.
If you just want to be notified of changes within the users node, add an observer to that node and when Pepa changes to Test, your app will be notified and provided a snapshot of the user node that changed.
// USERS_REF is already a Firebase reference to the "users" node
let ref = DataService.dataService.USERS_REF
ref.observeEventType(.ChildChanged, withBlock: { snapshot in
    print("the changed user is: \(snapshot.value)")
})
Oh, and no need for queryOrderedByChild since the snapshot will only contain the single node that changed.

MongoDB complex select count group by function

I have a collection called 'my_emails' where email addresses are stored:
[
{ email:"russel#gmail.com"},
{ email:"mickey#yahoo.com"},
{ email:"john#yahoo.com"},
]
and I'm trying to get the top 10 hostnames used:
[
{host: "gmail.com", count: 1000},
{host: "yahoo.com", count: 989}, ...
]
If I were using MySQL, I'd run this query:
SELECT substr(email,locate('#',email)+1,255) AS host,count(1) AS count
FROM my_emails
WHERE email like '%#%'
GROUP BY substr(email,locate('#',email)+1,255)
ORDER BY count(1) DESC
LIMIT 10
How can I do this with MongoDB?
I tried something like this, without success:
db.my_emails.aggregate([ { $group : {_id : "$host", count : { $sum : 1 }}}]);
I don't know how to produce the $host value without adding a new property to my records.
At the time of writing, MongoDB had no aggregation operator like LOCATE (MongoDB 3.4 later added $indexOfCP and $split), but you can use mapReduce to do this:
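db.my_emails.mapReduce(
  // map: emit the part after the first '#' as the key, with a count of 1
  function() {
    emit(this.email.substr(this.email.indexOf('#') + 1), 1);
  },
  // reduce: add up the counts emitted for each host
  function(host, counts) {
    return Array.sum(counts);
  },
  // only process addresses containing '#', like WHERE email LIKE '%#%'
  { query: { email: /#/ }, out: "hosts" }
)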
Then db.hosts.find().sort({ 'value': -1 }).limit(10) returns the top 10 hostnames:
{ "_id" : "yahoo.com", "value" : 2 }
{ "_id" : "gmail.com", "value" : 1 }
An alternative workaround is to modify your data structure by adding another field to your schema that holds only the domain part of the email address. This can be done with a bulk update using the Bulk API, whose write result gives more useful information about what actually happened during the update:
var bulk = db.my_emails.initializeUnorderedBulkOp(),
count = 0;
db.my_emails.find().forEach(function(doc) {
var domain = doc.email.replace(/.*#/, ""),
update = { domain: domain };
bulk.find({ "_id": doc._id }).updateOne({
"$set": update
})
count++;
if (count % 1000 == 0) {
bulk.execute();
bulk = db.my_emails.initializeUnorderedBulkOp();
}
})
if (count % 1000 != 0) { bulk.execute(); }
Bulk update response from sample:
BulkWriteResult({
"writeErrors" : [ ],
"writeConcernErrors" : [ ],
"nInserted" : 0,
"nUpserted" : 0,
"nMatched" : 3,
"nModified" : 3,
"nRemoved" : 0,
"upserted" : [ ]
})
After this update, a query on the collection db.my_emails.find().pretty() will yield:
{
"_id" : ObjectId("561618af645a64b1a70af2c5"),
"email" : "russel#gmail.com",
"domain" : "gmail.com"
}
{
"_id" : ObjectId("561618af645a64b1a70af2c6"),
"email" : "mickey#yahoo.com",
"domain" : "yahoo.com"
}
{
"_id" : ObjectId("561618af645a64b1a70af2c7"),
"email" : "john#yahoo.com",
"domain" : "yahoo.com"
}
Now, having the domain field will make it easier for the aggregation framework to give you the host count through the $sum operator in the $group pipeline. The following pipeline operation will return the desired outcome:
db.my_emails.aggregate([
  {
    "$group": {
      "_id": "$domain",
      "count": { "$sum": 1 }
    }
  },
  // ORDER BY count DESC LIMIT 10
  { "$sort": { "count": -1 } },
  { "$limit": 10 }
])
Output:
{ "_id" : "yahoo.com", "count" : 2 }
{ "_id" : "gmail.com", "count" : 1 }
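For comparison, here is a plain JavaScript sketch (not MongoDB code) of what the SQL query computes, run on the sample documents in memory; `topHosts` is a hypothetical helper name. Like LOCATE and the mapReduce's indexOf, it splits at the first '#'.

```javascript
// In-memory equivalent of:
//   WHERE email LIKE '%#%' GROUP BY host ORDER BY count DESC LIMIT n
function topHosts(emails, n) {
  const counts = new Map();
  for (const { email } of emails) {
    const at = email.indexOf("#");
    if (at === -1) continue;          // WHERE email LIKE '%#%'
    const host = email.slice(at + 1); // substr(email, locate('#', email) + 1)
    counts.set(host, (counts.get(host) || 0) + 1);
  }
  return Array.from(counts, ([host, count]) => ({ host, count }))
    .sort((a, b) => b.count - a.count) // ORDER BY count DESC
    .slice(0, n);                      // LIMIT n
}

const myEmails = [
  { email: "russel#gmail.com" },
  { email: "mickey#yahoo.com" },
  { email: "john#yahoo.com" },
];

topHosts(myEmails, 10);
// → [{ host: "yahoo.com", count: 2 }, { host: "gmail.com", count: 1 }]
```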

mongo db remove json objects

I have a MongoDB document as follows:
{
"_id" : new BinData(3, "RDHABb22XESWvP83FplqJw=="),
"name" : "NEW NODE",
"host" : null,
"aet" : null,
"studies" : ["1.3.12.2.1107.5.99.3.30000008061114424970500000589"],
"testcases" : [new BinData(3, "Zhl+zIXomkqAd8NIkRiTjQ==")],
"sendentries" : [{
"_id" : "1.3.12.2.1107.5.99.3.30000008061114424970500000589",
"Index" : 0,
"Type" : "Study"
}, {
"_id" : "cc7e1966-e885-4a9a-8077-c3489118938d",
"Index" : 1,
"Type" : "TestCase"
}]
}
The fields "studies" and "testcases" are now obsolete, and I am now storing that information in a new field called "sendentries". I would like to remove studies and testcases from the existing documents and unmap those fields going forward. How can I update my current collection to get rid of the studies and testcases fields?
I'm just a few weeks into Mongo.
You can use the $unset operator with update.
db.collection.update(
  {},
  { $unset: {
      "studies": "",
      "testcases": ""
    }
  },
  { "upsert": false, "multi": true }
)
That will remove both fields from every document in your collection.
Use $unset (there's a manual page for it), e.g.:
db.yourCollection.update(
  { },
  { $unset: {
      studies: "",
      testcases: ""
    }
  },
  { multi: true }
)
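The effect of $unset on each document can be pictured with this plain JavaScript sketch (not MongoDB code; `unsetFields` and the sample documents are hypothetical, and only top-level field names are handled, unlike $unset's dot notation). It also shows why the field names must match the stored case exactly.

```javascript
// Returns copies of the documents with the listed top-level fields removed.
// Deleting a field that is absent is a no-op, as with $unset.
function unsetFields(docs, fields) {
  return docs.map((doc) => {
    const copy = { ...doc };
    for (const field of fields) {
      delete copy[field];
    }
    return copy;
  });
}

const docs = [
  { _id: 1, name: "NEW NODE", studies: ["s1"], testcases: ["t1"], sendentries: [] },
  { _id: 2, name: "OTHER", sendentries: [] },
];

unsetFields(docs, ["studies", "testcases"]);
// → [{ _id: 1, name: "NEW NODE", sendentries: [] }, { _id: 2, name: "OTHER", sendentries: [] }]

// Field names are case-sensitive: "Studies" does not remove "studies".
unsetFields(docs, ["Studies"])[0].studies;
// → ["s1"]
```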

Generating Mongo query from MySQL query

I have been using the following MySQL command to construct a heatmap from log data. However, I have a new data set that is stored in a Mongo database and I need to run the same command.
select concat(a.packages, '&', b.packages) "Concurrent Packages",
count(*) "Count"
from data a
cross join data b
where a.packages<b.packages and a.jobID=b.jobID
group by a.packages, b.packages
order by a.packages, b.packages;
Keep in mind that the tables a and b do not exist prior to the query; they are both aliases for the data table, joined to itself on jobID, which is the field I want to check for matches. In other words, if two packages are within the same job, I want to add an entry to the concurrent usage count. How can I generate a similar query in Mongo?
This is not a "join" of different documents; it is an operation within one document, and can be done in MongoDB.
You have a SQL TABLE "data" like this:
JobID TEXT,
package TEXT
The best way to store this in MongoDB will be a collection called "data", containing one document per JobID that contains an array of packages:
{
_id: <JobID>,
packages: [
"packageA",
"packageB",
....
]
}
[ Note: you could also implement your data table as a single document in MongoDB, containing an array of jobs that each contain an array of packages. This is not recommended, because you might hit the 16MB document size limit, and nested arrays are not (yet) well supported by many query operators, which matters if you want to use the data for other purposes as well. ]
Now, how to get a result like this ?
{ pair: [ "packageA", "packageB" ], count: 20 },
{ pair: [ "packageA", "packageC" ], count: 11 },
...
As there is no built-in "cross join" of two arrays in MongoDB, you'll have to program it out in the map function of a mapReduce(), emitting each pair of packages as a key:
mapf = function () {
  var that = this;
  this.packages.forEach( function( p1 ) {
    that.packages.forEach( function( p2 ) {
      // emit each unordered pair once (p1 < p2 also skips p1 == p2)
      if ( p1 < p2 ) {
        var key = { "pair": [ p1, p2 ] };
        emit( key, 1 );
      }
    });
  });
};
[ Note: this could be optimized, if the packages arrays were sorted ]
The reduce function is nothing more than summing up the counters for each key:
reducef = function( key, values ) {
  var count = 0;
  values.forEach( function( value ) { count += value; } );
  return count;
};
So, for this example collection:
> db.data.find()
{ "_id" : "Job01", "packages" : [ "pA", "pB", "pC" ] }
{ "_id" : "Job02", "packages" : [ "pA", "pC" ] }
{ "_id" : "Job03", "packages" : [ "pA", "pB", "pD", "pE" ] }
we get the following result:
> db.data.mapReduce(
... mapf,
... reducef,
... { out: 'pairs' }
... );
{
"result" : "pairs",
"timeMillis" : 443,
"counts" : {
"input" : 3,
"emit" : 10,
"reduce" : 2,
"output" : 8
},
"ok" : 1
}
> db.pairs.find()
{ "_id" : { "pair" : [ "pA", "pB" ] }, "value" : 2 }
{ "_id" : { "pair" : [ "pA", "pC" ] }, "value" : 2 }
{ "_id" : { "pair" : [ "pA", "pD" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pA", "pE" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pB", "pC" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pB", "pD" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pB", "pE" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pD", "pE" ] }, "value" : 1 }
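To check this output without a database, here is a minimal in-memory stand-in for mapReduce in plain JavaScript. `runMapReduce`, `mapPairs` and `sumCounts` are hypothetical names; the stand-in only mimics the grouping and does not reproduce MongoDB's actual execution (for example, reduce being re-run on partial results).

```javascript
// Run map on every document, collect emitted values by key, then reduce.
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // JSON-encoded key -> { key, values }
  for (const doc of docs) {
    const emit = (key, value) => {
      const id = JSON.stringify(key);
      if (!groups.has(id)) groups.set(id, { key, values: [] });
      groups.get(id).values.push(value);
    };
    map.call(doc, emit);
  }
  // MongoDB does not call reduce for a key with a single value; mirror that.
  return Array.from(groups.values(), ({ key, values }) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

// Same logic as mapf/reducef above, with emit passed in explicitly.
function mapPairs(emit) {
  for (const p1 of this.packages) {
    for (const p2 of this.packages) {
      if (p1 < p2) emit({ pair: [p1, p2] }, 1);
    }
  }
}

function sumCounts(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

const data = [
  { _id: "Job01", packages: ["pA", "pB", "pC"] },
  { _id: "Job02", packages: ["pA", "pC"] },
  { _id: "Job03", packages: ["pA", "pB", "pD", "pE"] },
];

const pairs = runMapReduce(data, mapPairs, sumCounts);
// pairs has 8 entries: pA/pB and pA/pC have value 2, the other six have value 1
```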
For more information on mapReduce consult: http://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/ and http://docs.mongodb.org/manual/applications/map-reduce/
You can't. Mongo doesn't do joins. Switching from SQL to Mongo is a lot more involved than migrating your queries.
Typically, you would include all the pertinent information in the same record (rather than normalize the information and select it with a join). Denormalize!