I know that common table expressions (CTEs), a.k.a. "temporary named result sets", can be used in SQL to generate a temporary table, but can this be done in MongoDB? I want a document, but only for temporary use in my query.
Can you create a temporary table in MongoDB without creating a new collection?
For example, if I were to try to recreate the code below in Mongo...
Example CTE Table in SQL:
n  f1  f2
1  20  12
2  40  0.632
3  60  0.647
WITH RECURSIVE example (n, f1, f2) AS
( SELECT 1, 20, 12
  UNION ALL SELECT
    n + 1,
    n * 20,
    least(6*n, $globalVar * 100)
  FROM example WHERE n < 3
) SELECT * FROM example
It seems that there is no general equivalent for CTE in MongoDB. However, for OP's example, it is possible to wrangle the output of $range to produce a similar effect.
// any collection works here, as long as it contains at least 1 document
db.collection.aggregate([
{
// just take 1 document
"$limit": 1
},
{
// use $range to generate iterator [1, 2, 3]
"$addFields": {
"rg": {
"$range": [
1,
4
]
},
globalVar: 0.001
}
},
{
// do the mapping according to logic
"$addFields": {
"cte": {
"$map": {
"input": "$rg",
"as": "n",
"in": {
n: "$$n",
f1: {
"$multiply": [
"$$n",
20
]
},
f2: {
"$cond": {
"if": {
$lt: [
{
"$multiply": [
"$$n",
6
]
},
{
"$multiply": [
"$globalVar",
100
]
}
]
},
"then": {
"$multiply": [
"$$n",
6
]
},
"else": {
"$multiply": [
"$globalVar",
100
]
}
}
}
}
}
}
}
},
{
// wrangle back to expected form
"$unwind": "$cte"
},
{
"$replaceRoot": {
"newRoot": "$cte"
}
}
])
Here is the Mongo playground for your reference.
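As a sanity check on the pipeline above, the same rows can be reproduced in plain JavaScript, outside MongoDB. This is only a sketch: the function name `cteRows` and its parameters are made up for illustration, and `Math.min` stands in for the `$cond` comparison, which picks the smaller of the two products.

```javascript
// Hypothetical helper (not part of MongoDB): mirrors the $range + $map stages.
function cteRows(count, globalVar) {
  const rows = [];
  // $range: [1, count + 1] yields 1..count
  for (let n = 1; n <= count; n++) {
    rows.push({
      n: n,
      f1: n * 20,
      // same result as the $cond: the smaller of 6*n and globalVar*100
      f2: Math.min(6 * n, globalVar * 100),
    });
  }
  return rows;
}

console.log(cteRows(3, 0.125));
```

Each returned object corresponds to one document produced by the final `$unwind` and `$replaceRoot` stages.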
In my MongoDB instance (imported from a JSON file) I have a database "dab" with documents structured like this:
id:"1"
datetime:"2020-05-08 5:09:56"
name:"namea"
lat:55.826738
lon:45.0423412
analysis:"[{"0":0.36965591924860347},{"5":0.10391287134268598},{"10":0.086884394..."
I'm using that database for Spark analysis via the MongoDB Spark Connector.
My problem is the "analysis" field: I need the average of the values across all intervals ("0", "5", "10", ..., "1000"). So I have to sum 0.36965591924860347 + 0.10391287134268598 + 0.086884394 + ..., divide by the number of intervals (I have 200 intervals in every column), and finally multiply the result by 100.
My solution would be this one:
db.collection.aggregate([
    {
        $set: {
            analysis: {
                $map: {
                    input: "$analysis",
                    in: { $objectToArray: "$$this" }
                }
            }
        }
    },
    {
        $set: {
            analysis: {
                $map: {
                    input: "$analysis",
                    in: { $first: "$$this.v" }
                }
            }
        }
    },
    { $set: { average: { $multiply: [ { $avg: "$analysis" }, 100 ] } } }
])
Mongo playground
You can use $reduce on that array: sum the values, divide by the number of elements, and then multiply by 100.
db.collection.aggregate([
{
"$addFields": {
"average": {
"$multiply": [
{
"$divide": [
{
"$reduce": {
"input": "$analysis",
"initialValue": 0,
"in": {
"$let": {
"vars": {
"sum": "$$value",
"data": "$$this"
},
"in": {
"$add": [
"$$sum",
{
"$arrayElemAt": [
{
"$arrayElemAt": [
{
"$map": {
"input": {
"$objectToArray": "$$data"
},
"as": "m",
"in": [
"$$m.k",
"$$m.v"
]
}
},
0
]
},
1
]
}
]
}
}
}
}
},
{
"$size": "$analysis"
}
]
},
100
]
}
}
}
])
You can test the code here
But this code has one problem: you store the data in documents, and MongoDB doesn't have a function like get(document, $$k). The new MongoDB v5.0 has $getField, but it still accepts only constants, not variables.
That means something like getField(doc, key), with the key varying per element ("0", "5", "10", ...), is not possible in your case.
So we pay the cost of converting each document to an array.
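The arithmetic both answers perform (sum the single value in each object, divide by the number of elements, multiply by 100) can be sketched in plain JavaScript. The helper name `averagePercent` is hypothetical, and the `JSON.parse` branch is an assumption based on the sample document, where "analysis" appears as a quoted string rather than an array.

```javascript
// Hypothetical helper: average of the single value in each {interval: value}
// object, multiplied by 100.
function averagePercent(analysis) {
  // Assumption: if the field was imported as a JSON string, parse it first.
  const items = typeof analysis === "string" ? JSON.parse(analysis) : analysis;
  if (items.length === 0) return null; // avoid dividing by zero
  // Like $reduce: add the first (only) value of each object to the running sum.
  const sum = items.reduce((acc, obj) => acc + Object.values(obj)[0], 0);
  return (sum / items.length) * 100;
}

console.log(averagePercent([{ "0": 0.5 }, { "5": 0.25 }, { "10": 0.75 }])); // 50
```

`Object.values(obj)[0]` plays the role of `$objectToArray` followed by taking the first element's value, which is the conversion cost mentioned above.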
I am trying to build the following function for the function_score search query:
{
"filter": {
"range": {
"availabilityAverage": {
"gt": 0
}
}
},
"field_value_factor": {
"field": "availabilityAverage",
"factor": 1,
"modifier": "log1p"
},
"weight": 100
}
This is currently my .NET code:
.FieldValueFactor(ff => ff
    .Field(fff => fff.StandardPriceMin)
    .Factor(2)
    .Modifier(FieldValueFactorModifier.Log1P)
    .Weight(100)
    .Filter(faf => faf
        .Range(r => r
            .Field(rf => rf.AvailabilityAverage)
            .GreaterThan(0.0)
        )
    )
)
However, this is the result of the NEST query:
{
"filter": {
"range": {
"availabilityAverage": {
"gt": 0.0
}
}
},
"field_value_factor": {
"factor": 2.0,
"field": "standardPriceMin",
"modifier": "log1p",
"filter": {
"range": {
"availabilityAverage": {
"gt": 0.0
}
}
},
"weight": 100.0
},
"weight": 100.0
}
It correctly adds the filter and weight outside of field_value_factor, but it also includes 'filter' and 'weight' inside it as child elements. This does not happen for other functions such as RandomScore() written in exactly the same format, only for field_value_factor.
I tried several different combinations, but none produced the expected result. Is it normal for NEST to generate this JSON?
Thanks in advance.
It looks like there's a bug in how IFieldValueFactorFunction is being serialized, resulting in filter and weight being included twice: once outside "field_value_factor" and once inside it. I've opened a pull request to address it.
I have a MongoDB collection with over 2.8m documents of common passwords (hashed in SHA1) and their popularity.
Currently I've imported the documents with the following schema:
{"_id":"5ded1a559015155eb8295f48","password":"20EABE5D64B0E216796E834F52D61FD0B70332FC:2512537"}
However, I'd like to split this so that the popularity value becomes its own field, something like this:
{"_id":"5ded1a559015155eb8295f48","password":"20EABE5D64B0E216796E834F52D61FD0B70332FC","popularity":2512537}
My question: I'm unsure how to split the password field into two fields, password and popularity, using ":" as the separator.
You can use Aggregation Framework to split current password into two fields. You need to start with $indexOfBytes to get the position of : and then you need $substr to create new fields based on evaluated position.
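db.collection.aggregate([
    {
        $addFields: {
            // byte index of the ":" separator
            colonPos: { $indexOfBytes: [ "$password", ":" ] }
        }
    },
    {
        $addFields: {
            // everything before the colon
            password: { $substr: [ "$password", 0, "$colonPos" ] },
            // everything after the colon; start one byte past it so ":" is not included
            popularity: { $substr: [ "$password", { $add: [ "$colonPos", 1 ] }, { $strLenBytes: "$password" } ] }
        }
    },
    {
        $project: {
            colonPos: 0
        }
    }
])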
Mongo Playground
As a last step you can use $out which takes all your aggregation results and writes them into new or existing collection.
EDIT: Alternative approach using $split (thanks to @matthPen):
db.collection.aggregate([
{
$addFields: {
password: { $arrayElemAt: [ { "$split": [ "$password", ":"] }, 0 ] },
popularity: { $arrayElemAt: [ { "$split": [ "$password", ":"] }, 1 ] }
}
}
])
Mongo Playground
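To make the intended result concrete, here is a plain JavaScript sketch of the same split applied to a single document. The helper name `splitPassword` is hypothetical. Note that the string operators in the pipelines return popularity as a string; converting it to a number, as in the desired schema, is an assumption here (in a pipeline, wrapping the expression in `$toInt`, available since MongoDB 4.0, should do the same).

```javascript
// Hypothetical helper mirroring the $split approach on one document.
function splitPassword(doc) {
  const [hash, count] = doc.password.split(":");
  return {
    ...doc,
    password: hash,
    // Assumption: popularity should be stored as a number, as in the desired schema.
    popularity: Number.parseInt(count, 10),
  };
}

console.log(splitPassword({
  _id: "5ded1a559015155eb8295f48",
  password: "20EABE5D64B0E216796E834F52D61FD0B70332FC:2512537",
}));
```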
I migrated timeseries data from SQL to MongoDB. I'll give you an example:
Let's say we have a measurement device with an ID, where once per minute a value gets read. So per day, we have 24 hours * 60 minutes = 1440 values for that device.
In SQL, we have 1440 single rows for this device per day:
ID Timestamp Value
400001 01.01.2017 00:00:00 ...
"" 01.01.2017 00:01:00 ...
"" ... ...
"" 01.01.2017 23:59:00 ...
I migrated the data to MongoDB, where I now have one document per day. The values are distributed over an array of 24 hours, each of which contains 60 minute fields holding the values (with only one Timestamp per document, for the date XX-XX-XXXX 00:00:00):
{ ID: 400001,
Timestamp: 01.01.2017 00:00:00,
Hours:
[ 0: [0: ..., 1: ..., 2: ..., ....... 59: ... ],
1: [0: ..., 1: ..., 2: ..., ....... 59: ... ],
.
.
23: [0: ..., 1: ..., 2: ..., ....... 59: ... ]
]
}
My Problem is:
I want to transform the following SQL statement to mongoDB:
SELECT (Val) AS Val,
       (UNIX_TIMESTAMP(DATE_FORMAT(ArrivalTime, '%Y-%m-%d %H:%i:00'))) * 1000 AS timestmp
FROM database
WHERE ID = 400001
  AND ArrivalTime BETWEEN FROM_UNIXTIME(1470002400) AND FROM_UNIXTIME(1475272800)
ORDER BY ArrivalTime ASC
Since in MongoDB I only save the day Timestamp and then split the values in arrays, I don't have a Timestamp for each Value like in SQL. So if I want to for example, get the values between 01.01.2017 02:14:00 and 01.01.2017 18:38:00, how would I do that?
I made a MongoDB query that can give me the Values between two whole days:
db.getCollection('test').aggregate([
    { $match: {
        ID: '400001',
        $and: [
            { Timestamp_day: { $gte: new ISODate("2016-08-01 00:00:00.000Z") } },
            { Timestamp_day: { $lte: new ISODate("2016-10-01 00:00:00.000Z") } }
        ]
    } },
    { $unwind: "$Hours" },
    { $unwind: "$Hours" },
    { $group: { _id: '$Timestamp_day', Value: { $push: "$Hours" } } },
    { $sort: { _id: 1 } }
]);
But I need it to work like the SQL version: I also want to be able to return the values for just a few hours, with the correct timestamp for each value.
I hope you can help me.
This should get you going:
db.collection.aggregate([{
$match: {
"ID": '400001',
"Timestamp_day": {
$gte: new ISODate("2017-01-01T00:00:00.000Z"),
$lte: new ISODate("2017-01-01T00:00:00.000Z")
}
}
}, {
$unwind: {
path: "$Hours",
includeArrayIndex: "Hour"
}
}, {
$unwind: {
path: "$Hours",
includeArrayIndex: "Minute"
}
}, {
$project: {
"_id": 0, // remove the "_id" field
"Val": "$Hours", // rename "Hours" to "Val"
"Timestamp": { // "resolve" our timestamp...
$add: // ...by adding
[
{ $multiply: [ "$Hour", 60 * 60 * 1000 ] }, // ...the number of hours in milliseconds
{ $multiply: [ "$Minute", 60 * 1000 ] }, // ...plus the number of minutes in milliseconds
"$Timestamp_day", // to the "Timestamp_day" value
]
}
}
}, {
$sort: {
"Timestamp": 1 // oh well, sort by timestamp ascending
}
}]);
With an input document of
{
"_id" : ObjectId("5a0e7d096216d24dd605cdec"),
"ID" : "400001",
"Timestamp_day" : ISODate("2017-01-01T00:00:00.000Z"),
"Hours" : [
[
0.0,
0.1,
2.0
],
[
1.0,
1.1,
2.1
],
[
2.0,
2.1,
2.2
]
]
}
the results look like this:
/* 1 */
{
"Val" : 0.0,
"Timestamp" : ISODate("2017-01-01T00:00:00.000Z")
}
/* 2 */
{
"Val" : 0.1,
"Timestamp" : ISODate("2017-01-01T00:01:00.000Z")
}
/* 3 */
{
"Val" : 2.0,
"Timestamp" : ISODate("2017-01-01T00:02:00.000Z")
}
/* 4 */
{
"Val" : 1.0,
"Timestamp" : ISODate("2017-01-01T01:00:00.000Z")
}
/* 5 */
{
"Val" : 1.1,
"Timestamp" : ISODate("2017-01-01T01:01:00.000Z")
}
/* 6 */
{
"Val" : 2.1,
"Timestamp" : ISODate("2017-01-01T01:02:00.000Z")
}
/* 7 */
{
"Val" : 2.0,
"Timestamp" : ISODate("2017-01-01T02:00:00.000Z")
}
/* 8 */
{
"Val" : 2.1,
"Timestamp" : ISODate("2017-01-01T02:01:00.000Z")
}
/* 9 */
{
"Val" : 2.2,
"Timestamp" : ISODate("2017-01-01T02:02:00.000Z")
}
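The question also asks for a range that covers only part of a day. In the pipeline, adding a `$match` on the computed `Timestamp` after the `$project` stage should achieve that. The plain JavaScript sketch below shows the same timestamp arithmetic plus such a range filter, using the sample document from above; the helper name `valuesBetween` and its parameters are made up for illustration.

```javascript
// Hypothetical helper: flatten one day document into { Val, Timestamp } rows
// and keep only rows with from <= Timestamp <= to.
function valuesBetween(doc, from, to) {
  const rows = [];
  doc.Hours.forEach((minutes, hour) => {
    minutes.forEach((val, minute) => {
      // Same arithmetic as the $add in the $project stage.
      const ts = new Date(
        doc.Timestamp_day.getTime() + hour * 60 * 60 * 1000 + minute * 60 * 1000
      );
      if (ts >= from && ts <= to) rows.push({ Val: val, Timestamp: ts });
    });
  });
  return rows;
}

const day = {
  ID: "400001",
  Timestamp_day: new Date("2017-01-01T00:00:00.000Z"),
  Hours: [[0.0, 0.1, 2.0], [1.0, 1.1, 2.1], [2.0, 2.1, 2.2]],
};

const rows = valuesBetween(
  day,
  new Date("2017-01-01T00:01:00.000Z"),
  new Date("2017-01-01T01:01:00.000Z")
);
console.log(rows.map((r) => r.Val)); // [ 0.1, 2, 1, 1.1 ]
```

Because rows are produced hour by hour and minute by minute, they already come out in ascending timestamp order for a single day document.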
UPDATE:
Based on your comment you need to calculate the difference between any value and its respective preceding value. This can be done the following way - there might be nicer ways of achieving the same thing, though... The first part is almost identical to the solution above except it has an added $match stage to remove null values as per your specification.
db.collection.aggregate([{
$match: {
"ID": '400001',
"Timestamp_day": {
$gte: new ISODate("2017-01-01T00:00:00.000Z"),
$lte: new ISODate("2017-01-01T00:00:00.000Z")
}
}
}, {
$unwind: {
path: "$Hours",
includeArrayIndex: "Hour"
}
}, {
$unwind: {
path: "$Hours",
includeArrayIndex: "Minute"
}
}, {
$match: {
"Hours": { $ne: null } // get rid of all null values
}
}, {
$project: {
"_id": 0, // remove the "_id" field
"Val": "$Hours", // rename "Hours" to "Val"
"Timestamp": { // "resolve" our timestamp...
$add: // ...by adding
[
{ $multiply: [ "$Hour", 60 * 60 * 1000 ] }, // ...the number of hours in milliseconds
{ $multiply: [ "$Minute", 60 * 1000 ] }, // ...plus the number of minutes in milliseconds
"$Timestamp_day", // to the "Timestamp_day" value
]
}
}
}, {
$sort: {
"Timestamp": 1 // oh well, sort by timestamp ascending
}
}, {
$group: {
"_id": null, // throw all documents in the same aggregated document
"Docs": {
$push: "$$ROOT" // and store our documents in an array
}
}
}, {
$unwind: {
path: "$Docs", // we flatten the "Docs" array
includeArrayIndex: "Docs.Index", // this will give us the index of every element - there might be more elegant solutions using $map and $let...
}
}, {
$group: { // YES, unfortunately a second time... but this time we have the array index for each element
"_id": null, // throw all documents in the same aggregated document
"Docs": {
$push: "$Docs" // and store our documents in an array
}
}
}, {
$addFields: {
"Docs": {
$let: {
vars: { "shiftedArray": { $concatArrays: [ [ null ], "$Docs.Val" ] } }, // shift value array by one to the right and put a null object at the start
in: {
$map: {
input: "$Docs",
as: "d",
in: {
"Timestamp" : "$$d.Timestamp",
"Val": { $ifNull: [ { $abs: { $subtract: [ "$$d.Val", { $arrayElemAt: [ "$$shiftedArray", "$$d.Index" ] } ] } }, 0 ] }
}
}
}
}
}
}
}, {
$unwind: "$Docs"
}, {
$replaceRoot: {
newRoot: "$Docs"
}
}]);
The results using your sample data set look like this:
/* 1 */
{
"Timestamp" : ISODate("2017-01-01T00:00:00.000Z"),
"Val" : 0.0
}
/* 2 */
{
"Timestamp" : ISODate("2017-01-01T00:01:00.000Z"),
"Val" : 0.0
}
/* 3 */
{
"Timestamp" : ISODate("2017-01-01T00:02:00.000Z"),
"Val" : 2.0
}
/* 4 */
{
"Timestamp" : ISODate("2017-01-01T00:04:00.000Z"),
"Val" : 3.0
}
/* 5 */
{
"Timestamp" : ISODate("2017-01-01T00:05:00.000Z"),
"Val" : 0.0
}
/* 6 */
{
"Timestamp" : ISODate("2017-01-01T00:06:00.000Z"),
"Val" : 1.0
}
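The difference logic itself (skip null values, then take the absolute difference to the previous remaining value, with 0 for the first one) can be checked against the sample data with a short plain JavaScript sketch. The helper name `absDiffs` and the `Minute` field are hypothetical, and the sketch works on a single hour array rather than on full documents.

```javascript
// Hypothetical helper: absolute difference between each non-null value and the
// previous non-null value; the first value has no predecessor and yields 0.
function absDiffs(values) {
  const out = [];
  let prev = null;
  values.forEach((val, minute) => {
    if (val === null) return; // skipped, like the $match on { $ne: null }
    out.push({ Minute: minute, Val: prev === null ? 0 : Math.abs(val - prev) });
    prev = val;
  });
  return out;
}

console.log(absDiffs([1.0, 1.0, -1.0, null, 2.0, 2.0, 3.0]));
```

For this input the values come out as 0, 0, 2, 3, 0, 1 at minutes 0, 1, 2, 4, 5, 6, which matches the results listed above.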
Perhaps you could help me once more, @dnickless; even a hint would be enough. I need a query that gives me the absolute value of the difference from the previously measured value (within a given time range, for a given ID).
So, as an example:
Timestamp_day: ISODate("2017-01-01T01:00:00.000Z"),
Hours: [
[ 1.0, 1.0, -1.0, null, 2.0, 2.0, 3.0, ... ],
[ ... ],
...
]
And then as output:
{
'Timestamp' : ISODate("2017-01-01T00:00:00.000Z"),
'Val' : 0.0 /* nothing - 1.0 */
}
{
'Timestamp' : ISODate("2017-01-01T00:01:00.000Z"),
'Val' : 0.0 /* 1.0 - 1.0 */
}
{
'Timestamp' : ISODate("2017-01-01T00:02:00.000Z"),
'Val' : 2.0 /* 1.0 - -1.0 */
}
{
'Timestamp' : ISODate("2017-01-01T00:04:00.000Z"),
'Val' : 3.0 /* -1.0 - (null) - 2.0 */
}
{
'Timestamp' : ISODate("2017-01-01T00:05:00.000Z"),
'Val' : 0.0 /* 2.0 - 2.0 */
}
{
'Timestamp' : ISODate("2017-01-01T00:06:00.000Z"),
'Val' : 1.0 /* 2.0 - 3.0 */
}
I hope it is reasonably clear what I mean.