Using addToSet inside an array with MongoDB - json

I'm trying to track daily stats for an individual.
I'm having a hard time adding a new day inside "history" and can also use a pointer on updating "walkingSteps" as new data comes in.
My schema looks like:
{
"_id": {
"$oid": "50db246ce4b0fe4923f08e48"
},
"history": [
{
"_id": {
"$oid": "50db2316e4b0fe4923f08e12"
},
"date": {
"$date": "2012-12-24T15:26:15.321Z"
},
"walkingSteps": 10,
"goalStatus": 1
},
{
"_id": {
"$oid": "50db2316e4b0fe4923f08e13"
},
"date": {
"$date": "2012-12-25T15:26:15.321Z"
},
"walkingSteps": 5,
"goalStatus": 0
},
{
"_id": {
"$oid": "50db2316e4b0fe4923f08e14"
},
"date": {
"$date": "2012-12-26T15:26:15.321Z"
},
"walkingSteps": 8,
"goalStatus": 0
}
]
}
db.history.update( ? )
I've been browsing (and attempting) the mongodb documentation but they don't quite break it all the way down to dummies like myself... I couldn't quite translate their examples to my setup.
Thanks for any help.
E = noob trying to learn programming

Adding a day:
user = {_id: ObjectId("50db246ce4b0fe4923f08e48")}
day = {_id: ObjectId(), date: ISODate("2013-01-07"), walkingSteps:0, goalStatus: 0}
db.users.update(user, {$addToSet: {history:day}})
Updating walkingSteps:
user = ObjectId("50db246ce4b0fe4923f08e48")
day = ObjectId("50db2316e4b0fe4923f08e13") // second day in your example
query = {_id: user, 'history._id': day}
db.users.update(query, {$set: {"history.$.walkingSteps": 6}})
This uses the $ positional operator.
It might be easier to have a separate history collection though.
[Edit] On the separate collections:
Adding days grows the document in size and it might need to be relocated on the disk. This can lead to performance issues and fragmentation.
Deleting days won't shrink the document size on disk.
It makes querying easier/straightforward (e.g. searching for a period of time)

Even though #Justin Case puts the right answer he doesn't explain a few things in it extremely well.
You will notice first of all that he gets rid of the resolution on dates and moves their format to merely the date instead of date and time like so:
day = {_id: ObjectId(), date: ISODate("2013-01-07"), walkingSteps:0, goalStatus: 0}
This means that all your dates will have 00:00:00 for their time instead of the exact time you are using atm. This increases the ease of querying per day so you can do something like:
db.col.update(
{"_id": ObjectId("50db246ce4b0fe4923f08e48"),
"history.date": ISODate("2013-01-07")},
{$inc: {"history.$.walkingSteps":0}}
)
and other similar queries.
This also makes $addToSet actually enforce its rules, however since the data in this sub document could change, i.e. walkingSteps will increment $addToSet will not work well here anyway.
This is something I would change from the ticked answer. I would probably use $push or something else instead since $addToSet is heavier and won't really do anything useful here.
The reason for a separate history collection in my view would be what you said earlier with:
Yes, the amount of history items for that day.
So this array contains a set of days, which is fine but it sounds like the figure that you wish to get walkingSteps from, a set of history items, should be in another collection and you set walkingSteps according to the count of the amount of items in that other collection for today:
db.history_items.find({date: ISODate("2013-01-07")}).count();

Referring to MongoDB Manual, $ is the positional operator which identifies an element in an array field to update without explicitly specifying the position of the element in the array. The positional $ operator, when used with the update() method and acts as a placeholder for the first match of the update query selector.
So, if you issue a command to update your collection like this:
db.history.update(
{someCriterion: someValue },
{ $push: { "history":
{"_id": {
"$oid": "50db2316e4b0fe4923f08e12"
},
"date": {
"$date": "2012-12-24T15:26:15.321Z"
},
"walkingSteps": 10,
"goalStatus": 1
}
}
)
Mongodb might try to identify $oid and $date as some positional parameters. $ also is part of the atomic operators like $set and $push. So, it is better to avoid use this special character in Mongodb.

Related

Mongo query to get comma separated value

I have query which is traversing only in forward direction.
example:
{
"orderStatus": "SUBMITTED",
"orderNumber": "785654",
"orderLine": [
{
"lineNumber": "E1000",
**"trackingnumber": "12345,67890",**
"lineStatus": "IN-PROGRESS",
"lineStatusCode": 50
}
],
"accountNumber": 9076
}
find({'orderLine.trackingNumber' : { $regex: "^12345.*"} })**
When I use the above query I get the entire document. But I want to fetch the document when I search with 67890 value as well
At any part of time I will be always querying with single tracking number only.
12345 or 67890 Either with 12345 or 67890. There are chances tracking number value can extend it's value 12345,56789,01234,56678.
I need to pull the whole document no matter what the tracking number is in whatever position.
OUTPUT
should be whole document
{
"orderStatus": "SUBMITTED",
"orderNumber": "785654",
"orderLine": [
{
"lineNumber": "E1000",
"trackingnumber": "12345,67890",
"lineStatus": "IN-PROGRESS",
"lineStatusCode": 50
}
],
"accountNumber": 9076
}
Also I have done indexing for trackingNumber field. Need help here. Thanks in advance.
Following will search with either 12345 or 67890. It is similar to like condition
find({'orderLine.trackingNumber' : { $regex: /12345/} })
find({'orderLine.trackingNumber' : { $regex: /67890/} })
There's also an alternative way to do this
Create a text index
db.order.createIndex({'orderLine.trackingnumber':"text"})
You can make use of this index to search the value from trackingnumber field
db.order.find({$text:{$search:'12345'}})
--
db.order.find({$text:{$search:'67890'}})
--
//Do take note that you can't search using few in between characters
//like the following query won't give any result..
db.order.find({$text:{$search:'6789'}}) //have purposefully removed 0
To further understand how $text searches work, please go through the following link.

How to avoid Keys with Duplicate Values in Couchbase.Lite

Is it possible to tell CB.Lite to reject documents that contain values from a certain key repeated?
For instance, if i have the next document already in CB.Lite:
{
"Dog": {
"Name": "Dug",
"Color": "Blue",
"Age": 2
}
}
Is it possible to tell CB.Lite to reject any document with repeated Key "Name", so that if i try to add the next one:
{
"Dog": {
"Name": "Dug",
"Color": "Green",
"Age": 5
}
}
it would reject it?
I know It would be not much hassle to implement this functionality myself, but i was wondering if CB.Lite has already something Out of the Box.
Currently not at commit time (this is as of 1.4.x). The closest you could where Couchbase would do most of the work would be to create a View emitting the value you don't want repeated, then query and do the enforcement yourself.
This is assuming the docs themselves have different IDs. If you had what you showed using the same document ID, there are other possibilities. For example, you could trap this and reject it in Sync Gateway.

N1QL nested json, query on field inside object inside array

I have json documents in my Couchbase cluster that looks like this
{
"giata_properties": {
"propertyCodes": {
"provider": [
{
"code": [
{
"value": [
{
"name": "Country Code",
"value": "EG"
},
{
"name": "City Code",
"value": "HRG"
},
{
"name": "Hotel Code",
"value": "91U"
}
]
}
],
"providerCode": "gta",
"providerType": "gds"
},
{
"code": [
{
"value": [
{
"value": "071801"
}
]
},
{
"value": [
{
"value": "766344"
}
]
}
],
"providerCode": "restel",
"providerType": "gds"
},
{
"code": [
{
"value": [
{
"value": "HRG03Z"
}
]
},
{
"value": [
{
"value": "HRG04Z"
}
]
}
],
"providerCode": "5VF",
"providerType": "tourOperator"
}
]
}
}
}
I'm trying to create a query that fetches a single document based on the value of giata_properties.propertyCodes.provider.code.value.value and a specific providerType.
So for example, my input is 071801 and restel, I want a query that will fetch me the document I pasted above (because it contains these values).
I'm pretty new to N1QL so what I tried so far is (without the providerType input)
SELECT * FROM giata_properties AS gp
WHERE ANY `field` IN `gp.propertyCodes.provider.code.value` SATISFIES `field.value` = '071801' END;
This returns me an empty result set. I'm probably doing all of this wrongly.
edit1:
According to geraldss answer I was able to achieve my goal via 2 different queries
1st (More general) ~2m50.9903732s
SELECT * FROM giata_properties AS gp WHERE ANY v WITHIN gp SATISFIES v.`value` = '071801' END;
2nd (More specific) ~2m31.3660388s
SELECT * FROM giata_properties AS gp WHERE ANY v WITHIN gp.propertyCodes.provider[*].code SATISFIES v.`value` = '071801' END;
Bucket have around 550K documents. No indexes but the primary currently.
Question part 2
When I do either of the above queries, I get a result streamed to my shell very quickly, then I spend the rest of the query time waiting for the engine to finish iterating over all documents. I'm sure that I'll be only getting 1 result from future queries so I thought I can use LIMIT 1 so the engine stops searching on first result, I tried something like
SELECT * FROM giata_properties AS gp WHERE ANY v WITHIN gp SATISFIES v.`value` = '071801' END LIMIT 1;
But that made no difference, I get a document written to my shell and then keep waiting until the query finishes completely. How can this be configured correctly?
edit2:
I've upgraded to the latest enterprise 4.5.1-2844, I have only the primary index created on giata_properties bucket, when I execute the query along with the LIMIT 1 keyword it still takes the same time, it doesn't stop quicker.
I've also tried creating the array index you suggested but the query is not using the index and it keeps insisting on using the #primary index (even if I use USE INDEX clause).
I tried removing SELF from the index you suggested and it took a much longer time to build and now the query can use this new index, but I'm honestly not sure what I'm doing here.
So 3 questions:
1) Why LIMIT 1 using primary index doesn't make the query stop at first result?
2) What's the difference between the index you suggested with and without SELF? I tried to look for SELF keyword documentation but I couldn't find anything.
This is how both indexes look in Web ui
Index 1 (Your original suggestion) - Not working
CREATE INDEX `gp_idx1` ON `giata_properties`((distinct (array (`v`.`value`) for `v` within (array_star((((self.`giata_properties`).`propertyCodes`).`provider`)).`code`) end)))
Index 2 (Without SELF)
CREATE INDEX `gp_idx2` ON `giata_properties`((distinct (array (`v`.`value`) for `v` within (array_star(((self.`propertyCodes`).`provider`)).`code`) end)))
3) What would be the query for a specific giata_properties.propertyCodes.provider.code.value.value and a specific providerCode? I managed to do both separately but I wasn't successful in merging them.
Thanks for all your help dear
Here is a query without the providerType.
EXPLAIN SELECT *
FROM giata_properties AS gp
WHERE ANY v WITHIN gp.giata_properties.propertyCodes.provider[*].code SATISFIES v.`value` = '071801' END;
You can also index this in Couchbase 4.5.0 and above.
CREATE INDEX idx1 ON giata_properties( DISTINCT ARRAY v.`value` FOR v WITHIN SELF.giata_properties.propertyCodes.provider[*].code END );
Edit to answer question edits
The performance has been addressed in 4.5.x. You should try the following on Couchbase 4.5.1 and post the execution times here.
Test on 4.5.1.
Create the index.
Use the LIMIT. In 4.5.1, the limit is pushed down to the index.

Possible to chain results in N1ql?

I'm currently trying to do a bit of complex N1QL for a project I'm working on, theoretically I could do all of this processing in multiple N1QL calls and by parsing the results each time, however if possible I'd like for this to contained in one call.
What I would like to do is:
filter all documents that contain a "dataSync.test.id" field with more than 1 id
Read back all other ids in that list
Use that list to get other documents containing those ids
Get the "dataSync.test._channels" field for those documents (optionally a filter by docType might help parsing)
This would probably return a list of "dataSync.test._channels"
Is this possible in N1QL? It appears like it might be but I can't get the syntax right.
My data structures look a little like
{
"dataSync": {
"test": {
"_channels": [
"RP"
],
"id": [
"dataSync_user_1015",
"dataSync_user_1010",
"dataSync_user_1005"
],
"_lastUpdatedBy": "TEST"
}
},
...
}
{
"dataSync": {
"test": {
"_channels": [
"RSD"
],
"id": [
"dataSync_user_1010"
],
"_lastUpdatedBy": "TEST"
}
},
...
}
Yes. I think you can do all these.
Initial set of IDs with filtering can be retrieved as a subquery and then you can get subsquent documents by joins.
SELECT fulldoc
FROM (select meta().id as dockey from doc where a=1) as mydoc
INNER JOIN doc fulldoc ON KEYS mydoc.dockey;
There are optimizations that can be done here. Try the sequencing first to ensure you're get the job done.

How to enter multiple table data in mongoDB using json

I am trying to learn mongodb. Suppose there are two tables and they are related. For example like this -
1st table has
First name- Fred, last name- Zhang, age- 20, id- s1234
2nd table has
id- s1234, course- COSC2406, semester- 1
id- s1234, course- COSC1127, semester- 1
id- s1234, course- COSC2110, semester- 1
how to insert data in the mongo db? I wrote it like this, not sure is it correct or not -
db.users.insert({
given_name: 'Fred',
family_name: 'Zhang',
Age: 20,
student_number: 's1234',
Course: ['COSC2406', 'COSC1127', 'COSC2110'],
Semester: 1
});
Thank you in advance
This would be a assuming that what you want to model has the "student_number" and the "Semester" as what is basically a unique identifier for the entries. But there would be a way to do this without accumulating the array contents in code.
You can make use of the upsert functionality in the .update() method, with the help of of few other operators in the statement.
I am going to assume you are going this inside a loop of sorts, so everything on the right side values is actually a variable:
db.users.update(
{
"student_number": student_number,
"Semester": semester
},
{
"$setOnInsert": {
"given_name": given_name,
"family_name": family_name,
"Age": age
},
"$addToSet": { "courses": course }
},
{ "upsert": true }
)
What this does in an "upsert" operation is first looks for a document that may exist in your collection that matches the query criteria given. In this case a "student_number" with the current "Semester" value.
When that match is found, the document is merely "updated". So what is being done here is using the $addToSet operator in order to "update" only unique values into the "courses" array element. This would seem to make sense to have unique courses but if that is not your case then of course you can simply use the $push operator instead. So that is the operation you want to happen every time, whether the document was "matched" or not.
In the case where no "matching" document is found, a new document will then be inserted into the collection. This is where the $setOnInsert operator comes in.
So the point of that section is that it will only be called when a new document is created as there is no need to update those fields with the same information every time. In addition to this, the fields you specified in the query criteria have explicit values, so the behavior of the "upsert" is to automatically create those fields with those values in the newly created document.
After a new document is created, then the next "upsert" statement that uses the same criteria will of course only "update" the now existing document, and as such only your new course information would be added.
Overall working like this allows you to "pre-join" the two tables from your source with an appropriate query. Then you are just looping the results without needing to write code for trying to group the correct entries together and simply letting MongoDB do the accumulation work for you.
Of course you can always just write the code to do this yourself and it would result in fewer "trips" to the database in order to insert your already accumulated records if that would suit your needs.
As a final note, though it does require some additional complexity, you can get better performance out of the operation as shown by using the newly introduced "batch updates" functionality.For this your MongoDB server version will need to be 2.6 or higher. But that is one way of still reducing the logic while maintaining fewer actual "over the wire" writes to the database.
You can either have two separate collections - one with student details and other with courses and link them with "id".
Else you can have a single document with courses as inner document in form of array as below:
{
"FirstName": "Fred",
"LastName": "Zhang",
"age": 20,
"id": "s1234",
"Courses": [
{
"courseId": "COSC2406",
"semester": 1
},
{
"courseId": "COSC1127",
"semester": 1
},
{
"courseId": "COSC2110",
"semester": 1
},
{
"courseId": "COSC2110",
"semester": 2
}
]
}