I'm running an aggregate through PyMongo.
The aggregate, formatted fairly nicely, looks like this:
[{
$match: {
syscode: {
$in: [598.0]
},
date: {
$gte: new Date(1509487200000),
$lte: new Date(1510264800000)
}
}
},
{
$group: {
_id: {
date: "$date",
start_date: "$start_date",
end_date: "$end_date",
daypart: "$daypart",
network: "$network"
},
syscode_data: {
$push: {
syscode: "$syscode",
cpm: "$cpm"
}
}
}
}]
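From the Python side I'm issuing it roughly like this (a minimal sketch; the connection string is a placeholder, and the dates are rebuilt from the millisecond values shown in the log further down):
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
coll = client["Customer"]["rate_cards"]

start = datetime.fromtimestamp(1509487200, tz=timezone.utc)  # new Date(1509487200000)
end = datetime.fromtimestamp(1510264800, tz=timezone.utc)    # new Date(1510264800000)

pipeline = [
    {"$match": {
        "syscode": {"$in": [598.0]},
        "date": {"$gte": start, "$lte": end},
    }},
    {"$group": {
        "_id": {
            "date": "$date",
            "start_date": "$start_date",
            "end_date": "$end_date",
            "daypart": "$daypart",
            "network": "$network",
        },
        "syscode_data": {"$push": {"syscode": "$syscode", "cpm": "$cpm"}},
    }},
]

cursor = coll.aggregate(pipeline, batchSize=1000)
results = list(cursor)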
It returns no results when I iterate over the returned cursor in Python.
When I run it through NoSQL Booster for MongoDB, I get the results back. That said, the Mongo log entries look the same as the ones from the PyMongo run.
Looking at the Mongo logs, there is an additional $group stage appended to the pipeline. Apparently NoSQL Booster knows what to do with this and I don't.
{ $group: { _id: null, count: { $sum: 1.0 } } }
This is the full log line I see.
2018-03-11T21:05:04.374+0200 I COMMAND [conn71] command Customer.weird_stuff command: aggregate { aggregate: "rate_cards", pipeline: [ { $match: { syscode: { $in: [ 598.0 ] }, date: { $gte: new Date(1509487200000), $lte: new Date(1510264800000) } } }, { $group: { _id: { date: "$date", start_date: "$start_date", end_date: "$end_date", daypart: "$daypart", network: "$network" }, syscode_data: { $push: { syscode: "$syscode", cpm: "$cpm" } } } }, { $group: { _id: null, count: { $sum: 1.0 } } } ], cursor: { batchSize: 1000.0 }, $db: "Customer" } planSummary: COLLSCAN keysExamined:0 docsExamined:102900 cursorExhausted:1 numYields:803 nreturned:1 reslen:134 locks:{ Global: { acquireCount: { r: 1610 } }, Database: { acquireCount: { r: 805 } }, Collection: { acquireCount: { r: 805 } } } protocol:op_query 122ms
What's going on? How do I handle this from the Python side?
Notes as I'm digging: this pipeline runs when I get lucky and use a plain (unordered) dictionary, which is the PyMongo default. When I run the input JSON through json.JSONDecoder with the line:
json.JSONDecoder(object_pairs_hook=OrderedDict).decode(parsed_param)
the output has a much more complex structure (necessary because the pipeline needs to maintain the order of its stages) and ends up sending that extra stage.
So, rather than dig into it further, I found a workaround. Examining the problem, I found that when I added an extra stage to the pipeline ({"$sort": {"_id": 1}}), the translation from Python dictionary to Mongo JSON aggregate no longer generated the extra JSON object.
This is a poor answer, but I think the root cause is that the conversion between complex ordered dictionaries and Mongo JSON queries in this particular environment has a subtle bug that affected this particular query.
I would be excited to track it down and examine it further, but I'm buried at a new job.
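For anyone hitting the same thing, a rough sketch of the workaround; parsed_param is the pipeline JSON string from the snippet above, and the connection details are placeholders:
import json
from collections import OrderedDict
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
coll = client["Customer"]["rate_cards"]

# Parse the incoming JSON (parsed_param) while preserving key order.
pipeline = json.JSONDecoder(object_pairs_hook=OrderedDict).decode(parsed_param)

# Appending an otherwise harmless $sort stage stopped the extra
# { $group: { _id: null, count: { $sum: 1 } } } stage from being generated.
pipeline.append({"$sort": {"_id": 1}})

results = list(coll.aggregate(pipeline, batchSize=1000))
On Python 3.7+ plain dicts preserve insertion order anyway, so the OrderedDict hook mostly matters on older interpreters.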
Related
I have a small collection with records of the format:
db.presentations =
[
{
"_id": "1",
"student": "A",
"presentationDate": {
"$date": "2023-01-17T00:00:00Z"
}
},
{
"_id": "2",
"student": "B",
"presentationDate": {
"$date": "2023-01-17T00:00:00Z"
}
},
...
,
{
"_id": "26",
"student": "Z",
"presentationDate": {
"$date": "2023-01-17T00:00:00Z"
}
},
]
Instead of all the presentationDates being the same, I want to set them to an ascending order. So, student A's presentationDate is 2023-01-17, student B's is 2023-01-18, student C's is 2023-01-19, and so on.
I've been exploring some functions that could do this, but none really seem to fit what I'm trying to do, e.g.:
$dateAdd: allows specification of the unit and amount (e.g., day, 3) by which to increase a date object, but it must be used as part of an aggregation pipeline. I don't see how to increment by a variable amount for each document.
forEach() / map(): allows flexibility in the function applied to each record, but again, I don't see how to increment by a variable (uniformly increasing) amount for each document. I'm also not sure it's possible to edit documents within a forEach?
Put another way, I'm basically trying to iterate through my cursor/collection and update each document, incrementing a global variable on each iteration.
I'm new to mongosh, so any ideas or feedback are appreciated!
Of course you could select the data, iterate over all documents, change the value, and save it back (a sketch of that approach is at the end of this answer). You can also do it with an aggregation pipeline like this:
db.collection.aggregate([
{
$setWindowFields: {
sortBy: { student: 1 },
output: {
pos: { $documentNumber: {} }
}
}
},
{
$set: {
presentationDate: {
$dateAdd: {
startDate: "$presentationDate",
unit: "day",
amount: "$pos"
}
}
}
}
])
If you want to actually modify the stored documents, note that $setWindowFields is not among the stages allowed in an update pipeline, so this cannot be passed to updateMany directly. Run the same aggregation and write the results back with a $merge stage instead:
db.collection.aggregate([
  {
    $setWindowFields: {
      sortBy: { student: 1 },
      output: {
        pos: { $documentNumber: {} }
      }
    }
  },
  {
    $set: {
      presentationDate: {
        $dateAdd: {
          startDate: "$presentationDate",
          unit: "day",
          amount: { $subtract: [ "$pos", 1 ] }
        }
      }
    }
  },
  { $unset: "pos" },
  { $merge: { into: "collection", whenMatched: "replace" } }
])
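The "iterate and save back" route mentioned at the top of this answer also works and matches the "increment a global variable on each iteration" idea from the question. A rough PyMongo sketch, assuming a placeholder connection string and database name:
from datetime import timedelta
from pymongo import MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
coll = client["school"]["presentations"]           # placeholder database name

# Walk the students in order; the enumerate index plays the role of the global counter.
updates = []
for offset, doc in enumerate(coll.find({}, sort=[("student", 1)])):
    updates.append(UpdateOne(
        {"_id": doc["_id"]},
        {"$set": {"presentationDate": doc["presentationDate"] + timedelta(days=offset)}},
    ))

if updates:
    coll.bulk_write(updates)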
I am trying to query a document in my MongoDB
Document:
{
_id: '111',
subEntities: [
{
subId: '999',
dateOfStart: '2098-01-01',
dateOfTermination: '2099-12-31'
},
{
subId: '998',
dateOfStart: '2088-01-01',
dateOfTermination: '2089-12-31'
}
]
}
My Query:
{"$and": [
{"subEntities.dateOfStart": {"$lte": "2098-01-02"}},
{"subEntities.dateOfTermination": {"$gte": "2099-12-30"}},
{"subEntities.subId": {"$in": ["998"]}}
]}
As you can see, I am trying to apply a date value and an ID to the subentities.
The date value should be between dateOfStart and dateOfTermination.
The query returns a match, although the date conditions only match the first subentity and the ID condition matches the second subentity.
How can I make it so that there is only one match when both queries match the same subentity?
Can I aggregate the subentities?
Thanks a lot!
When you query arrays, Mongo by default "flattens" them, which means each condition of the query gets evaluated independently.
You want to be using $elemMatch, which allows you to match entire objects within an array, like so:
db.collection.find({
subEntities: {
$elemMatch: {
dateOfStart: {
"$lte": "2098-01-02"
},
dateOfTermination: {
"$gte": "2099-12-30"
},
subId: {
"$in": [
"998"
]
}
}
}
})
Mongo Playground
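If you are querying from Python, the same filter carries over essentially unchanged; a minimal PyMongo sketch with placeholder connection and collection names:
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
coll = client["mydb"]["entities"]                  # placeholder names

# $elemMatch makes all three conditions apply to the same array element.
matches = list(coll.find({
    "subEntities": {
        "$elemMatch": {
            "dateOfStart": {"$lte": "2098-01-02"},
            "dateOfTermination": {"$gte": "2099-12-30"},
            "subId": {"$in": ["998"]},
        }
    }
}))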
If you want to filter dates between dateOfStart and dateOfTermination you should invert the $gte and $lte conditions:
{
"$and": [
{ "subEntities.dateOfStart": { "$gte": "2098-01-02" } },
{ "subEntities.dateOfTermination": { "$lte": "2099-12-30" } },
{ "subEntities.subId": { "$in": ["998"] } }
]
}
I have a MongoDB collection with over 2.8m documents of common passwords (hashed with SHA1) and their popularity.
Currently I've imported the documents with the following schema
{"_id":"5ded1a559015155eb8295f48","password":"20EABE5D64B0E216796E834F52D61FD0B70332FC:2512537"}
I'd like to split this so the popularity is its own field, and it would look something like this:
{"_id":"5ded1a559015155eb8295f48","password":"20EABE5D64B0E216796E834F52D61FD0B70332FC","popularity":2512537}
The question is, I'm unsure how I can split the password field into two fields, password and popularity, using : to split the string.
You can use the Aggregation Framework to split the current password into two fields. You need to start with $indexOfBytes to get the position of the :, and then use $substr to create the new fields based on the evaluated position (starting one byte past the colon for the popularity part).
db.collection.aggregate([
{
$addFields: {
colonPos: { $indexOfBytes: ["$password",":"] }
}
},
{
$addFields: {
password: { $substr: [ "$password", 0, "$colonPos" ] },
// start one byte after the colon so the ":" itself is excluded, and cast the digits to a number
popularity: { $toInt: { $substr: [ "$password", { $add: [ "$colonPos", 1 ] }, { $strLenBytes: "$password" } ] } }
}
},
{
$project: {
colonPos: 0
}
}
])
Mongo Playground
As a last step you can use $out, which takes all your aggregation results and writes them into a new or existing collection.
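For example, driven from PyMongo the whole pipeline, including the final $out, might look like the sketch below; the connection string, the database/source collection names, and the passwords_split target collection are placeholders:
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
db = client["mydb"]                                # placeholder database name

db["passwords"].aggregate([                        # placeholder source collection name
    {"$addFields": {"colonPos": {"$indexOfBytes": ["$password", ":"]}}},
    {"$addFields": {
        "password": {"$substr": ["$password", 0, "$colonPos"]},
        "popularity": {"$toInt": {"$substr": [
            "$password",
            {"$add": ["$colonPos", 1]},
            {"$strLenBytes": "$password"},
        ]}},
    }},
    {"$project": {"colonPos": 0}},
    # $out writes the reshaped documents into a new (or replaced) collection.
    {"$out": "passwords_split"},
])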
EDIT: Alternative approach using $split (thanks to @matthPen):
db.collection.aggregate([
{
$addFields: {
password: { $arrayElemAt: [ { "$split": [ "$password", ":"] }, 0 ] },
popularity: { $arrayElemAt: [ { "$split": [ "$password", ":"] }, 1 ] }
}
}
])
Mongo Playground
I have a list of objects in MongoDB:
[
  { "name": "Joy", "age": 23 },
  { "name": "Nick", "age": 26 },
  { "name": "Merry", "age": 27 },
  { "name": "Ben", "age": 20 }
]
I need a result with a list of the ages, like:
ages: [23, 26, 27, 20]
Actually these objects are just an example. I am using different objects where I can use $group and then $push to get this result, but can we get this result without $group?
If by "objects" you mean MongoDB documents then you can use the trick with grouping by null which will basically merge all your documents into one result:
db.collection.aggregate([
{
$group: {
_id: null,
ages: { $push: "$age" }
}
}
])
You'll get the following result:
{ "_id" : null, "ages" : [ 23, 26, 27, 20 ] }
I have created a collection with the following command:
db.device_states.documentKey.insert({"device":"nest_kitchen_thermo"})
When I try to execute the command below, I get an error:
db.device_states.watch( {
$match: {
"documentKey.device": {
$in : [ "nest_kitchen_thermo"]
},
operationType: "insert"
}
});
SyntaxError: missing : after property id
The change stream document I am expecting looks like this:
{
_id: <resume_token>,
operationType: 'insert',
ns: {db:'example',coll:"device_states"},
documentKey: { device:'nest_kitchen_thermo'},
fullDocument: {
_id : ObjectId(),
device: 'nest_kitchen_thermo',
temp: 68
}
}
In the Mongo shell, there is no such querying attribute as watch. $match is part of the aggregation pipeline and should be used within an aggregate operation.
The modified query (Mongo shell) is:
db.device_states.documentKey.aggregate([
{ $match:
{
"device": { $in: ["nest_kitchen_thermo"] },
operationType: "insert"
}
}
]);
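For completeness, the same aggregation issued from Python with PyMongo would look roughly like this; the connection string is a placeholder and the database/collection names mirror the ones above:
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
db = client["example"]                             # database name from the event's ns above

results = list(db["device_states.documentKey"].aggregate([
    {"$match": {
        "device": {"$in": ["nest_kitchen_thermo"]},
        "operationType": "insert",
    }}
]))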