Mix data from MySQL in MongoDB query?

I have the following query in MongoDB, which selects the last 10 conversations a user has participated in, sorted by the last message in each conversation. I have a way to paginate through the results using $gt in the $match stage; for now, this is commented out.
I limited the number of members to 3 (using $slice), because a group conversation can contain, for example, 30, 40, or 50 members.
The problem: I have a MySQL database that stores the relational information, including a table users and a table followers recording who follows whom. I want to bring this information into the MongoDB query somehow, so that out of a conversation's members I can find 3 that I am following and show relevant profile pictures of known people. Say a conversation has 50 members: I then want to retrieve up to 3 people I follow (if any), so I see familiar profile pictures. Is this possible in some way?
query
db.getCollection('conversations').aggregate([
    {
        $lookup: {
            from: "messages",
            localField: "_id",
            foreignField: "c_ID",
            as: "messages"
        }
    },
    { "$unwind": "$messages" },
    { "$sort": { "messages.t": -1 } },
    {
        "$group": {
            "_id": "$_id",
            "lastMessage": { "$first": "$messages" },
            "allFields": { "$first": "$$ROOT" }
        }
    },
    {
        "$replaceRoot": {
            "newRoot": {
                "$mergeObjects": [
                    "$allFields",
                    { "lastMessage": "$lastMessage" }
                ]
            }
        }
    },
    { $project: { messages: 0 } },
    {
        $match: {
            "members.uID": "1",
            //"lastMessage.t": { $gt: ISODate("2020-02-04 20:38:02.154Z") }
        }
    },
    { $sort: { "lastMessage.t": 1 } },
    { $limit: 10 },
    {
        $project: {
            members: {
                $slice: [
                    {
                        $filter: {
                            input: "$members",
                            as: "member",
                            cond: { $ne: ["$$member.uID", "1"] }
                        }
                    },
                    3
                ]
            }
        }
    }
])
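As a side note on the commented-out pagination line: the cursor value can be threaded into the $match stage conditionally. A minimal Python sketch of building that stage as a dict; the helper name `build_match` and the dict representation are assumptions for illustration, not part of the original query:

```python
def build_match(uid, after_ts=None):
    """Build the $match stage, adding a pagination cursor only when given."""
    match = {"members.uID": uid}
    if after_ts is not None:
        # continue after the `lastMessage.t` value the previous page ended on
        match["lastMessage.t"] = {"$gt": after_ts}
    return {"$match": match}

print(build_match("1"))
print(build_match("1", after_ts=1580591922))
```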
conversations document
{
    "_id" : ObjectId("5e35f2c840713a43aeeeb3d9"),
    "members" : [
        { "uID" : "1", "j" : 1580580922 },
        { "uID" : "4", "j" : 1580580922 },
        { "uID" : "5", "j" : 1580580922 }
    ]
}
messages document
{
    "_id" : ObjectId("5e35ee5f40713a43aeeeb1c5"),
    "c_ID" : ObjectId("5e35f2c840713a43aeeeb3d9"),
    "fromID" : "1",
    "msg" : "What's up?",
    "t" : 1580591922,
    "d" : { "4" : 1580592039 },
    "r" : { "4" : 1580592339 }
}
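One pragmatic option (an assumption, not the only possible design): since the follower graph lives in MySQL, fetch the list of IDs the user follows there first, then either pass that list into the pipeline's $filter, or do the "pick 3 known members" step in the application layer. A minimal Python sketch of the application-side variant; `pick_known_members` and the stub data are hypothetical:

```python
def pick_known_members(members, following_ids, me, limit=3):
    """Return up to `limit` conversation members that `me` follows."""
    known = [m for m in members
             if m["uID"] != me and m["uID"] in following_ids]
    return known[:limit]

members = [{"uID": str(i)} for i in range(1, 51)]   # a 50-member conversation
following = {"4", "17", "23", "42"}                 # IDs from the MySQL `followers` table

print(pick_known_members(members, following, me="1"))
# members "4", "17" and "23" are the first three followed members
```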

Related

Elastic Search: Multiple term search in one query

I'm new to Elasticsearch. I have documents under an index myindex with the following mappings:
http://host:port/myindex/_mapping
{
    "mappings": {
        "properties": {
            "en_US": {
                "type": "keyword"
            }
        }
    }
}
Let's say my 3 documents look like this:
{ "product": "p1", "subproduct": "p1.1" }
{ "product": "p1", "subproduct": "p1.2" }
{ "product": "p2", "subproduct": "p2.1" }
Now, I am querying for a single subproduct p1.1 with product p1 as below, and it works fine:
POST: http://host:port/myindex/_search
{
    "query": {
        "bool" : {
            "must" : {
                "term" : { "product" : "p1" }
            },
            "filter": {
                "term" : { "subproduct" : "p1.1" }
            }
        }
    }
}
My question is:
How can I query for 2 or more subproducts in one _search query, e.g. subproducts p1.1 and p1.2 under product p1?
The query should return all documents with product p1 and subproduct p1.1 or p1.2.
Simply change the term query in your filter clause to a terms query and search for multiple terms:
{
    "query": {
        "bool" : {
            "must" : {
                "term" : { "product" : "p1" }
            },
            "filter": {
                "terms" : { "subproduct" : ["p1.1", "p1.2"] }
            }
        }
    }
}
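To see which of the three sample documents this matches, here is a client-side simulation in Python; the `matches` helper is just an illustration of the bool semantics (must term AND filter terms), not an Elasticsearch API:

```python
docs = [
    {"product": "p1", "subproduct": "p1.1"},
    {"product": "p1", "subproduct": "p1.2"},
    {"product": "p2", "subproduct": "p2.1"},
]

def matches(doc, product, subproducts):
    # `must` term on product AND `filter` terms on subproduct
    return doc["product"] == product and doc["subproduct"] in subproducts

hits = [d for d in docs if matches(d, "p1", ["p1.1", "p1.2"])]
print(hits)  # the first two documents
```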

Retrieve item list by checking multiple attribute values in MongoDB in golang

This question is about MongoDB: how to retrieve items matching multiple condition values, like the IN condition in MySQL:
SELECT * FROM venuelist WHERE venueid IN (venueid1, venueid2)
I have attached the JSON data structure that I am using [Ref: JSON STRUCTURE OF MONGODB below].
A document contains a venueList; each venue in the list has an id and a sum array of user-agent names with total counts as values. "User agents" means user OS, browser and device information; in this case I use the OS distribution, i.e. counting linux and ubuntu users per venue id.
It looks like this:
"sum" : [
    { "name" : "linux", "value" : 12 },
    { "name" : "ubuntu", "value" : 4 }
],
Finally, I want the total count of linux users across a selected list of venue ids, in one query in MongoDB.
For example: get the count of linux users where the venue id is VID1212 or VID4343.
Ref: JSON STRUCTURE OF MONGODB
{
"_id" : ObjectId("57f940c4932a00aba387b0b0"),
"tenantID" : 1,
"date" : "2016-10-09 00:23:56",
"venueList" : [
{
"id" : "VID1212",
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 4
}
],
"ssidList" : [ // this is the list of SSIDs in the venue
{
"id" : "SSID1212",
"sum" : [
{
"name" : "linux",
"value" : 8
},
{
"name" : "ubuntu",
"value" : 6
}
],
"macList" : [ // this is the MAC list inside a particular SSID, e.g. SSID1212
{
"id" : "12:12:12:12:12:12",
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 1
}
]
}
]
}
]
},
{
"id" : "VID4343",
"sum" : [
{
"name" : "linux",
"value" : 2
}
],
"ssidList" : [
{
"id" : "SSID4343",
"sum" : [
{
"name" : "linux",
"value" : 2
}
],
"macList" : [
{
"id" : "43:43:43:43:43:34",
"sum" : [
{
"name" : "linux",
"value" : 2
}
]
}
]
}
]
}
]
}
I am using Go to manipulate the data in MongoDB via the mgo.v2 package.
The expected output is:
output
linux : 12+2 = 14
ubuntu : 4+0 = 4
The inner lists (ssidList, macList) inside venueList should not be considered.
You'd need to use the aggregation framework, running a pipeline that first filters the documents in the collection by the venueList ids using the $match operator.
The next stage flattens the venueList and sum subdocument arrays so that the data can be processed further down the pipeline as denormalised entries; the $unwind operator is useful here.
A further $match filter is necessary after unwinding so that only the entries you want to aggregate reach the next stage.
The main stage is the $group operator, which aggregates the filtered documents to create the desired sums using the $sum accumulator. For the desired result you need a ternary operator like $cond to create the independent count fields, since it feeds either the value or 0 into the $sum expression depending on the name field.
Putting this all together, consider running the following pipeline:
db.collection.aggregate([
    { "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
    { "$unwind": "$venueList" },
    { "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
    { "$unwind": "$venueList.sum" },
    {
        "$group": {
            "_id": null,
            "linux": {
                "$sum": {
                    "$cond": [
                        { "$eq": [ "$venueList.sum.name", "linux" ] },
                        "$venueList.sum.value",
                        0
                    ]
                }
            },
            "ubuntu": {
                "$sum": {
                    "$cond": [
                        { "$eq": [ "$venueList.sum.name", "ubuntu" ] },
                        "$venueList.sum.value",
                        0
                    ]
                }
            }
        }
    }
])
For usage with mgo, you can convert the above pipeline using the guidance in http://godoc.org/labix.org/v2/mgo#Collection.Pipe
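As a sanity check on the expected output (linux: 14, ubuntu: 4), the pipeline's $cond/$sum logic can be reproduced client-side in Python over the sample venues:

```python
venues = [
    {"id": "VID1212", "sum": [{"name": "linux", "value": 12},
                              {"name": "ubuntu", "value": 4}]},
    {"id": "VID4343", "sum": [{"name": "linux", "value": 2}]},
]
wanted = {"VID1212", "VID4343"}

# mirror of the $cond inside $sum: add the value when the name matches, else 0
linux = sum(s["value"] for v in venues if v["id"] in wanted
            for s in v["sum"] if s["name"] == "linux")
ubuntu = sum(s["value"] for v in venues if v["id"] in wanted
             for s in v["sum"] if s["name"] == "ubuntu")

print(linux, ubuntu)  # 14 4
```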
For a more flexible and better-performing alternative, which executes faster than the above and also takes into account unknown values in the sum list, run the following pipeline:
db.collection.aggregate([
    { "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
    { "$unwind": "$venueList" },
    { "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
    { "$unwind": "$venueList.sum" },
    {
        "$group": {
            "_id": "$venueList.sum.name",
            "count": { "$sum": "$venueList.sum.value" }
        }
    },
    {
        "$group": {
            "_id": null,
            "counts": {
                "$push": { "name": "$_id", "count": "$count" }
            }
        }
    }
])
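The same grouping can be checked client-side: because this version keys on the name first, OS names that were not known ahead of time are picked up automatically. A small Python sketch over the sample venues:

```python
from collections import defaultdict

venues = [
    {"id": "VID1212", "sum": [{"name": "linux", "value": 12},
                              {"name": "ubuntu", "value": 4}]},
    {"id": "VID4343", "sum": [{"name": "linux", "value": 2}]},
]

# mirror of the first $group: accumulate values per sum.name
counts = defaultdict(int)
for v in venues:
    for s in v["sum"]:
        counts[s["name"]] += s["value"]

print(dict(counts))  # {'linux': 14, 'ubuntu': 4}
```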

Elastic query to show exact match OR other fields if not found

I need some help rewriting my Elasticsearch query.
What i need is:
1- to show a single record if there is an exact match on the two fields verb and sessionid.raw (partial matches are not accepted).
"must": [
{ "match" : { "verb" : "login" } },
{ "term" : { "sessionid.raw" : strSessionID } },
]
OR
2- to show the top 5 records (sorted by _score DESC and #timestamp ASC) that match some other fields, giving a boost if the records are between the specified time range.
"must": [
{ "match" : { "verb" : "login" } },
{ "term" : { "pid" : strPID } },
],
"should": [
{ "match" : { "user.raw" : strUser } },
{ "range" : { "#timestamp" : {
"from" : QueryFrom,
"to" : QueryTo,
"format" : DateFormatElastic,
"time_zone" : "America/Sao_Paulo",
"boost" : 2 }
} },
]
The code below is almost doing what i want.
Right now it boosts sessionid.raw to the top if found, but the remaining records are not being discarded.
var objQueryy = {
    "fields" : [ "#timestamp", "program", "pid", "sessionid.raw", "user", "frontendip", "frontendname", "_score" ],
    "size" : ItemsPerPage,
    "sort" : [ { "_score" : { "order": "desc" } }, { "#timestamp" : { "order" : "asc" } } ],
    "query" : {
        "bool": {
            "must": [
                { "match" : { "verb" : "login" } },
                { "term" : { "pid" : strPID } },
                { "bool": {
                    "should": [
                        { "match" : { "user.raw" : strUser } },
                        { "match" : { "sessionid.raw": { "query": strSessionID, "boost" : 3 } } },
                        { "range" : { "#timestamp" : { "from": QueryFrom, "to": QueryTo, "format": DateFormatElastic, "time_zone": "America/Sao_Paulo" } } }
                    ]
                }}
            ]
        }
    }
}
Elasticsearch cannot "prune" your secondary results for you when an exact match is also found.
You would have to implement this discarding on the client side after all results have been returned.
You may find the cleanest implementation is to execute your two search strategies separately. Your search client would:
Run the first (exact match) query
Run the second (expanded) query only if no results found
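That two-step strategy can be sketched as follows in Python; `search` stands in for a real Elasticsearch call and `make_search` is a test stub, both assumptions for illustration:

```python
def find_login(search, session_id, fallback_query):
    """Run the exact-match query first; fall back to the expanded query."""
    exact = search({"bool": {"must": [
        {"match": {"verb": "login"}},
        {"term": {"sessionid.raw": session_id}},
    ]}})
    if exact:
        return exact[:1]                 # a single exact match wins
    return search(fallback_query)[:5]    # otherwise the top 5 expanded hits

def make_search(results):
    # stub: `results` maps "exact"/"expanded" to canned hit lists
    def search(query):
        kind = "exact" if "sessionid.raw" in str(query) else "expanded"
        return results[kind]
    return search

s = make_search({"exact": [], "expanded": ["a", "b", "c", "d", "e", "f"]})
print(find_login(s, "abc123", {"bool": {"should": []}}))  # first 5 expanded hits
```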

Elastic Search aggregation enhanced filtering for nested query

I have the following objects indexed:
{ "ProjectName" : "Project 1",
"Roles" : [
{ "RoleName" : "Role 1", "AddedAt" : "2015-08-14T17:11:31" },
{ "RoleName" : "Role 2", "AddedAt" : "2015-09-14T17:11:31" } ] }
{ "ProjectName" : "Project 2",
"Roles" : [
{ "RoleName" : "Role 1", "AddedAt" : "2015-10-14T17:11:31" } ] }
{ "ProjectName" : "Project 3",
"Roles" : [
{ "RoleName" : "Role 2", "AddedAt" : "2015-11-14T17:11:31" } ] }
I.e., a list of projects with different roles, added at different times.
(The Roles list is a nested field.)
What I need is an aggregation that counts how many projects exist per role, BUT only(!) if the role was added to the project within a certain period.
A classic query (without the date range filtering) looks like this (and works well):
{ // ... my main query here
    "aggs" : {
        "agg1" : {
            "nested" : { "path" : "Roles" },
            "aggs" : {
                "agg2": {
                    "terms": { "field" : "Roles.RoleName" },
                    "aggs": {
                        "agg3": { "reverse_nested": {} }
                    }
                }
            }
        }
    }
}
But once I add date filtering, this approach stops working for me: if I filter by dates starting from, say, '2015-09-01', both 'Role 1' and 'Role 2' would be counted for the first project, because the project as a whole matches via 'Role 2''s AddedAt date, even though its 'Role 1' was added before the range.
So I figure I should add the following condition somewhere:
"range": { "Roles.AddedAt": {
"gte": "2015-09-01T00:00:00",
"lte": "2015-12-02T23:59:59"
}}
But I cannot find the correct way to do that.
The results of the working query are (kind of) the following:
"aggregations": {
"agg1": {
"doc_count": 17,
"agg2": {
"buckets": [
{
"key": "Role 1",
"doc_count": 2,
"agg3": {
"doc_count": 2
}
},
{
"key": "Role 2",
"doc_count": 2,
"agg3": {
"doc_count": 2
}
},
Try this:
{
    "aggs": {
        "agg1": {
            "nested": { "path": "Roles" },
            "aggs": {
                "NAME": {
                    "filter": {
                        "query": {
                            "range": {
                                "Roles.AddedAt": {
                                    "gte": "2015-09-01T00:00:00",
                                    "lte": "2015-12-02T23:59:59"
                                }
                            }
                        }
                    },
                    "aggs": {
                        "agg2": {
                            "terms": { "field": "Roles.RoleName" },
                            "aggs": {
                                "agg3": { "reverse_nested": {} }
                            }
                        }
                    }
                }
            }
        }
    }
}
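To confirm what the filtered nested aggregation should report, here is a client-side Python rendition over the three sample projects: filter the nested roles by date, bucket by RoleName, and count distinct parent projects (which is what reverse_nested's doc_count gives you). ISO-8601 timestamps compare correctly as plain strings:

```python
projects = [
    {"ProjectName": "Project 1", "Roles": [
        {"RoleName": "Role 1", "AddedAt": "2015-08-14T17:11:31"},
        {"RoleName": "Role 2", "AddedAt": "2015-09-14T17:11:31"}]},
    {"ProjectName": "Project 2", "Roles": [
        {"RoleName": "Role 1", "AddedAt": "2015-10-14T17:11:31"}]},
    {"ProjectName": "Project 3", "Roles": [
        {"RoleName": "Role 2", "AddedAt": "2015-11-14T17:11:31"}]},
]

gte, lte = "2015-09-01T00:00:00", "2015-12-02T23:59:59"

# keep only role entries added inside the range, then count parent projects
buckets = {}
for p in projects:
    for r in p["Roles"]:
        if gte <= r["AddedAt"] <= lte:
            buckets.setdefault(r["RoleName"], set()).add(p["ProjectName"])

result = {name: len(ps) for name, ps in buckets.items()}
print(result)  # Role 1 counts only Project 2; Role 2 counts Projects 1 and 3
```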

Count links in Arrays in MongoDB collection

I have a collection with objects, which are linking to other objects in the array:
{
    "_id" : ObjectId("53f75bedc5489f86666d305e"),
    "id" : "2",
    "links_to" : [
        { "id" : 1, "label" : null },
        { "id" : 3, "label" : null },
        { "id" : 60, "label" : null },
        { "id" : 23, "label" : null }
    ]
},
{
    "_id" : ObjectId("53f75bedc5489f86666d305f"),
    "id" : "3",
    "links_to" : [
        { "id" : 4, "label" : null },
        { "id" : 8, "label" : null },
        { "id" : 23, "label" : null },
        { "id" : 2, "label" : null }
    ]
},
...
Now I would like to write a query which outputs, for each id, the number of links. E.g.:
{"id": 1, "numberOfLinks": 21},
{"id": 2, "numberOfLinks": 15},
...
Thanks in advance.
The best approach is to keep the count on the document and update it when you either $push or $pull elements of the array using the $inc operator. In this way the field is maintained on the document itself:
{
"links_to": [],
"linkCount": 0
}
When you "push"
db.collection.update(
    {},
    { "$push": { "links_to": newLink }, "$inc": { "linkCount": 1 } }
)
And "pull":
db.collection.update(
    {},
    { "$pull": { "links_to": newLink }, "$inc": { "linkCount": -1 } }
)
Without doing this, you can use the $size operator from the aggregation framework in modern MongoDB (2.6+) to get the array length:
db.collection.aggregate([
    { "$project": {
        "numberOfLinks": { "$size": "$links_to" }
    }}
])
Or in versions prior to MongoDB 2.6 you can count the array members after $unwind and $group:
db.collection.aggregate([
    { "$unwind": "$links_to" },
    { "$group": {
        "_id": "$id",
        "numberOfLinks": { "$sum": 1 }
    }}
])
So usually unless you want something specifically "dynamic" then just maintain the count on the document. This avoids the overhead of calculation when you query.
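The paired $push/$inc and $pull/$inc updates keep the counter consistent with the array. That invariant can be illustrated client-side in Python (the helper names here are made up for the sketch):

```python
doc = {"links_to": [], "linkCount": 0}

def push_link(doc, link):
    # mirrors $push + $inc: 1
    doc["links_to"].append(link)
    doc["linkCount"] += 1

def pull_link(doc, link):
    # mirrors $pull + $inc: -1, decrementing by the number of removed elements
    before = len(doc["links_to"])
    doc["links_to"] = [l for l in doc["links_to"] if l != link]
    doc["linkCount"] -= before - len(doc["links_to"])

push_link(doc, {"id": 1, "label": None})
push_link(doc, {"id": 3, "label": None})
pull_link(doc, {"id": 1, "label": None})
print(doc["linkCount"], len(doc["links_to"]))  # 1 1
```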
Actually this is fairly simple to achieve using aggregation:
db.foo.aggregate([
    { $unwind: "$links_to" },
    { $group: { _id: { "lti": "$links_to.id" }, numberOfLinks: { $sum: 1 } } },
    { $project: { _id: 0, id: "$_id.lti", numberOfLinks: "$numberOfLinks" } }
])
produces the desired output, though with the fields in reverse order, at least in the shell output:
{ "numberOfLinks" : 3, "id" : 3 }
{ "numberOfLinks" : 3, "id" : 2 }
{ "numberOfLinks" : 1, "id" : 5 }
{ "numberOfLinks" : 2, "id" : 4 }
{ "numberOfLinks" : 3, "id" : 1 }
If you can live with an output like:
{ "_id" : { "linksToId" : 3 }, "numberOfLinks" : 3 }
{ "_id" : { "linksToId" : 2 }, "numberOfLinks" : 3 }
{ "_id" : { "linksToId" : 5 }, "numberOfLinks" : 1 }
{ "_id" : { "linksToId" : 4 }, "numberOfLinks" : 2 }
{ "_id" : { "linksToId" : 1 }, "numberOfLinks" : 3 }
you can skip the $project step of the aggregation pipeline.
This is quite efficient. I ran a test doing basically the same thing over a collection of 5M documents with roughly 17M relations; it took 18 seconds on a not exactly high-performance server.