How to get the latest document in a Couchbase bucket?

I have an activity bucket in my Couchbase DB and I need to retrieve the latest document for different types. My initial approach was this:
Document format: [ id, { val.time, val.type, val.load } ]
I then wrote different views to map a specific val.type and used reduce to get the latest val.time. However, I have the problem of the views not being updated (apparently the map function is only called on new or changed documents, while this approach needs all of the documents to be mapped and reduced).
What is the best practice / approach for time-based data in Couchbase (NoSQL) databases?

You can access the documents by time with a simple view like this:
Convert your time to an integer value, e.g. using JavaScript's Date.parse() method (see http://www.quackit.com/javascript/javascript_date_and_time_functions.cfm).
Use this integer value as the key of your view. You can then access the first or last document.
If you always need the latest document, it is faster to negate the time value so that the largest times sit at the start of the index.
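As a minimal sketch of such a view (assuming the documents carry fields named type and time, which is an assumption based on the question; the type is added as the first element of a compound key since the question needs the latest document per type):
// Map function: compound key of [type, negated time] so the newest document
// of a given type sorts first. Date.parse() turns the time into an integer.
function (doc, meta) {
  if (doc.type && doc.time) {
    emit([doc.type, -Date.parse(doc.time)], null);
  }
}
Querying the view with startkey=["someType"], endkey=["someType", {}] and limit=1 should then return the newest document of that type; dropping the type from the key gives a purely time-ordered index as described above.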

If you are using a development design document, it is expected that adding new keys to your bucket may have no effect on the view. Since only a subset of the keys in the bucket goes through the map/reduce pipeline, adding a key that is not part of that subset triggers no update to the view.
Either evaluate it on a production design document (by clicking the "Publish" button), or try adding more keys.

Related

How to create a simple view or query a subdocument

I have read the docs, but I can't seem to get my head around this (I'm a SQL guy).
1) I loaded a JSON file using cbdocloader:
[
  {
    "ID": "9e78f4a6-4061-48aa-8154-0b738d93461b",
    "More fields": ""
  }
]
2) There is now an object in my bucket called values100 (that was the name of the file).
3) How do I access the data in this object that I imported, through a query or a view?
SELECT * FROM mybucket returns one result that contains all the rows I loaded, but I really want to query the data inside it. Should I create a view? Should I query a view? My question is #3, but I am confused.
I believe there are a couple of things going on:
a. cbdocloader expects each document to be contained in a separate file. For the behaviour you want, each document should be its own file rather than one single document in Couchbase; the tool will then create multiple Couchbase documents which can be indexed. I'm not sure there is a way to split the file using the tool itself; you may have to write a script to do it for you (see the sketch after this list).
b. Couchbase is intended to be a document database, not a SQL database. As such, in the majority of cases you access a document via its document id, which should have some significance to your application. This is not to say you can't look up the document id in an index, but you may find a SQL database works better if you plan to do a lot of complex queries. If you need help creating an index, please post a new question.
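For example, a rough Node.js sketch of such a split script (the file name values100.json and the ID field are taken from the question; the out/ directory is an assumption):
// split.js - split one JSON array file into one file per document so that
// cbdocloader creates a separate Couchbase document for each entry.
var fs = require('fs');

var docs = JSON.parse(fs.readFileSync('values100.json', 'utf8'));
if (!fs.existsSync('out')) fs.mkdirSync('out');

docs.forEach(function (doc, i) {
  // Use the document's own ID as the file name (and thus document key) when present.
  var name = (doc.ID || 'doc_' + i) + '.json';
  fs.writeFileSync('out/' + name, JSON.stringify(doc));
});
Pointing cbdocloader at the out/ directory should then create one Couchbase document per file.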

Storing unconfirmed and confirmed data to a database

I am creating a web application using StrongLoop with a MySQL database connector.
I want it to be possible for a user to modify data in the application, but for that data not to be 'saved' until the user expressly chooses to save it.
On the other hand, this is a web application and I don't want to keep the data in the user's session or local storage - I want this data to be persisted immediately so it can be recovered easily if the user loses their session.
To implement this I am thinking of doing the following, but I'm not sure whether it is a good idea or whether there is a better way of doing it.
This is one way I could implement it without doing too much customization on an existing relation:
add a new generated index as the primary key for the table
add a new generated index that represents the item in the row
this would be generated for new items, or set to the old item's value for edits
add a boolean attribute 'saved'
Data will be written with saved=false. To 'save' the data, the new row is marked as saved and the old row is deleted. The old row can be looked up by its key, the second attribute in the row.
The way I was thinking of implementing it is to create a base entity called Saveable. Every database entity that extends Saveable will then also have the Saveable properties.
Saveable has:
A generated id number
A generated non-id number - the key for the real object
A 'saved' attribute
I would then put a method in Saveable.js to perform the save operation and expose it via the API, and a method to intercept new writes and store them as unsaved.
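For illustration, a rough sketch of what I have in mind for Saveable.js (assuming LoopBack-style model files; the property names itemKey and saved, and the method name markSaved, are just placeholders):
// common/models/saveable.js - sketch only
module.exports = function (Saveable) {
  // Promote the unsaved (draft) row for an item and delete the previously saved row.
  Saveable.markSaved = function (itemKey, cb) {
    Saveable.findOne({ where: { itemKey: itemKey, saved: false } }, function (err, draft) {
      if (err || !draft) return cb(err || new Error('no unsaved row for this item'));
      draft.updateAttribute('saved', true, function (err) {
        if (err) return cb(err);
        // Remove the old saved version of the same item, if there is one.
        Saveable.destroyAll({ itemKey: itemKey, saved: true, id: { neq: draft.id } }, cb);
      });
    });
  };

  Saveable.remoteMethod('markSaved', {
    accepts: { arg: 'itemKey', type: 'number' },
    http: { verb: 'post' }
  });
};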
My question is - is this a reasonable way to achieve what I want?

CouchDB override a base JSON document

Is there a standard strategy (or agreed best-practice) in CouchDB for creating a JSON document that:
Is based on another document.
Contains a small number of JSON properties that represent overrides to the original document.
?
On receiving a request, CouchDB would calculate a result JSON document with the overrides applied and return it as a response. The user should not need to know or care that it's a composite document.
That's a very good question, because you are asking about both the possibility AND the best practice. The answer is - it depends ;-)
Generally you can do it with a CouchDB _list. For example, you get the two docs from the _view the _list is based on, calculate the composite doc and return it as the response. The downside is that this server-side computation is performance-relevant: don't use it when your composite doc will be requested in, say, every user session. But if your use case is something like a request from another service once every night, that should be fine.
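A minimal sketch of that _list approach (it assumes the underlying view emits the default doc first and the override doc second under the same key, which is an assumption here):
// _list function: merge all rows for a key into one composite document.
function (head, req) {
  var row, merged = {};
  while ((row = getRow())) {
    var part = row.value;   // later rows overwrite earlier ones, so the overrides win
    for (var k in part) {
      merged[k] = part[k];
    }
  }
  send(JSON.stringify(merged));
}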
CouchDB will love you if you take an alternative approach, which results in the composite doc being stored ready-to-respond in an index.
If you want to store the composite doc exactly as it should be served, a CouchDB _update handler can be used. You get the custom properties of the doc in the payload and the default doc from the database, merge everything into the composite doc and store that under a unique id (or overwrite the default doc).
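A rough sketch of such an _update handler (the merge here is a simple shallow overwrite and error handling is minimal):
// _update handler: merge the overrides from the request body over the stored doc.
function (doc, req) {
  if (!doc) {
    // No doc with the requested id yet: nothing to override.
    return [null, JSON.stringify({ error: 'not_found' })];
  }
  var overrides = JSON.parse(req.body || '{}');
  for (var k in overrides) {
    doc[k] = overrides[k];
  }
  return [doc, JSON.stringify(doc)];
}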
Last but not least, you can also use two approaches based on a CouchDB _view. Neither delivers the composite doc itself, but both return the default doc and the custom overrides in one request. The first approach is to build a view with a compound key which groups the parent doc (default data) and its children (the overrides) together; the second is to create a view with linked documents: emit the custom settings as the value of the view row and set the value's _id to the _id of the default doc. When the view row is requested with the query parameter ?include_docs=true, the default data and the custom overrides are both included in the result.
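And a sketch of that linked-documents view (assuming each override doc stores the default doc's id in a field called default_id and its custom settings under overrides; both field names are assumptions):
// Map function: the row value points at the default doc via _id, so a request
// with ?include_docs=true returns the default doc alongside the overrides.
function (doc) {
  if (doc.type === 'override' && doc.default_id) {
    emit(doc.default_id, { _id: doc.default_id, overrides: doc.overrides });
  }
}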

Best DynamoDB Structure for Implementation

I am working on a save application: basically, the user can go to an article and click Save to store it in his profile. Instead of using a relational database, the application currently uses DynamoDB. Each article has a specific type. The structure currently used by the application is:
user-id [string][DynamoDBHashKey]
type-of-article [string] [DynamoDBRangeKey]
json [string]
user-id is the unique identifier for the user, type-of-article is, well... the type of the article, and json is all of the saved articles in JSON format:
[{article-id: timestamp}, {article-id: timestamp}]
(the first map is article #1, the second is article #2)
article-id is (again) the article's unique identifier and timestamp is the time at which that article was saved.
Note: this was done before DynamoDB started supporting JSON documents as Maps and Lists, and the code is not mine - it was already written.
So when the application needs to remove an article from the saved list, it calls DynamoDB to get the json, modifies it and stores it again; when it adds a new article it does the same thing. A problem appeared when I wanted to display all the articles ordered by timestamp: I had to fetch all the types and merge them into a dictionary to sort them (in the user profile I need to show all saved articles, of any type, sorted). The application is now taking 700-900 ms to respond.
Personally I don't think this is the best way to approach this, so I'm thinking of rewriting the previous code to use the new DynamoDB features (Lists and Maps). My idea for the structure in DynamoDB is this (a concrete item is sketched after the layout):
user-id [string] [DynamoDBHashKey]
saved-articles [List]
  article-type_1
    article_1 [Map] {id: article-id, timestamp: date}
    article_2 [Map] {id: article-id, timestamp: date}
  article-type_2
    article_1 [Map] {id: article-id, timestamp: date}
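For illustration, one reading of that layout as a concrete item (attribute names and values here are placeholders, with saved-articles modelled as a Map keyed by article type):
var item = {
  "user-id": "some-user-id",
  "saved-articles": {
    "article-type_1": [
      { "id": "article-id-1", "timestamp": "2015-01-01T10:00:00Z" },
      { "id": "article-id-2", "timestamp": "2015-01-02T11:30:00Z" }
    ],
    "article-type_2": [
      { "id": "article-id-3", "timestamp": "2015-01-03T09:15:00Z" }
    ]
  }
};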
But I'm relatively new to DynamoDB. I wrote some test code to store this in DynamoDB using Lists and Maps, both with the low-level API and with the Object Persistence Model.
Now, my question is: is this a better approach, and if not, why not? What would be the better approach?
This way I think I can use the low-level API to get only the saved articles of article-type #2, or fetch them all if I need them.
I would stick with a solution that is a bit more NoSQL-like. For NoSQL databases, nested data models and frequent updates of existing records are often indicators that your data model can be optimized. I really see two objects that your application is using, 'users' and 'articles'. I would avoid a nested data model and updates of existing records by doing the following:
'user' table
user id as hash key
'article' table
article id as hash key
timestamp as range key
user id (used in global secondary index described below)
article type and any other attributes would be non-key attributes
You'd also have a global secondary index on the article table that would allow you to search for articles by user id. Assuming you want a user's articles sorted by date, it would look something like this (a query sketch follows the list):
user id as hash key
timestamp as range key
article id as projected attribute
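As a sketch, a query against that index with the AWS SDK for JavaScript could look something like this (the table, index and attribute names are placeholders, not part of the model above):
// Fetch a user's saved articles, newest first, via the global secondary index.
var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient({ region: 'us-east-1' });

docClient.query({
  TableName: 'article',
  IndexName: 'user-id-timestamp-index',
  KeyConditionExpression: '#u = :userId',
  ExpressionAttributeNames: { '#u': 'user-id' },      // hyphenated names need a placeholder
  ExpressionAttributeValues: { ':userId': 'some-user-id' },
  ScanIndexForward: false                             // descending by range key (timestamp)
}, function (err, data) {
  if (err) return console.error(err);
  console.log(data.Items);                            // each item includes the projected article id
});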
With this model, you never need to go back and edit existing records; you just add records that are 'edited' as new records, and you take the one with the most recent timestamp as your current version.
One thing to remember with NoSQL is that storage space is cheap and reads are cheap, but editing existing records is usually an expensive and undesirable operation.

Tridion 2009 embedded metadata storage format in the broker

I'm fairly new to Tridion and I have to implement functionality that will allow a content editor to create a component and assign multiple date ranges (available dates) to it. These will need to be queried from the broker to provide search functionality.
Originally, this only required a single start and end date, so these were implemented as individual metadata fields.
I am proposing to use an embedded schema within the schema's 'available dates' metadata field to allow multiple start and end dates to be assigned.
However, as the field now allows multiple values, the data is stored in the broker as comma-separated values in the KEY_STRING_VALUE column rather than as a date value in the KEY_DATE_VALUE column, as it was when only a single start and end value was allowed.
eg.
KEY_NAME | KEY_STRING_VALUE
end_date | 2012-04-30T13:41:00, 2012-06-30T13:41:00
start_date | 2012-04-21T13:41:00, 2012-06-01T13:41:00
This is now causing issues with my broker querying, as I can no longer use simple query logic to retrieve the items I require for the search based on the dates.
Before I start writing C# logic to parse these comma-separated dates and search on those, I was wondering whether anyone has had similar requirements/experiences in the past and implemented this in a different way, to reduce the amount of parsing code required and let the broker query do the search.
I'm developing this on Tridion 2009 but using the 5.3 Broker (for legacy reasons) so the query currently looks like this (for the single start/end dates):
query.SetCustomMetaQuery("(KEY_NAME='end_date' AND KEY_DATE_VALUE>'" + startDateStr + "') AND (ITEM_ID IN (SELECT ITEM_ID FROM CUSTOM_META WHERE KEY_NAME='start_date' AND KEY_DATE_VALUE<'" + endDateStr + "'))");
Any help is greatly appreciated.
Just wanted to come back and give some details on how I finally approached this should anyone else face the same scenario.
I proposed the set number of fields to the client (as suggested by Miguel) but the client wasn't happy with that level of restriction.
Therefore, I ended up implementing the embeddable schema containing the start and end dates, which gave the most flexibility. However, limitations in the Broker API meant that I had to access the broker database directly - not ideal, but the client agreed to the approach to get the required functionality. Obviously this would need to be revisited should any upgrades be made in the future.
All the processing of dates and the available periods were done in C# which means the performance of the solution is actually pretty good.
One thing I did discover that caused some issues is that if you have multiple values for the field using the embedded schema (i.e. in this case, multiple start and end dates), the metadata is stored in the KEY_STRING_VALUE column of the CUSTOM_META table. However, if you only have a single value in the field (i.e. one start and end date), it is stored as dates in the KEY_DATE_VALUE column, just as if you had used single fields rather than an embeddable schema. It seems a sensible approach for Tridion to take, but it makes writing the queries and the parsing code slightly more complicated!
This is a complex scenario, as you will have to go through all the DCPs and parse those strings to determine whether they match the search criteria.
There is a way you could convert that comma-separated metadata into single values in the broker, but the names of the fields need to be different: Range1, Range2, ..., RangeN.
You can do that with a deployer extension, where you change the XML structure of the package and convert each of those strings into different values (1, 2, ..., n).
This extension can take some time if you are not familiar with deployer extensions, and it doesn't solve your scenario 100%.
The problem with this is that you still have to apply several conditions to retrieve those values, and there is always a limit you have to set (versus the user, who can add as many values as they want).
Sample:
query.SetCustomMetaQuery("(KEY_NAME='end_date1' ...");
query.SetCustomMetaQuery("(KEY_NAME='end_date2' ...");
query.SetCustomMetaQuery("(KEY_NAME='end_date3' ...");
query.SetCustomMetaQuery("(KEY_NAME='end_date4' ...");
Probably the fastest and easiest way to achieve this is, instead of using a multi-value field, to use different fields. I understand that is not the most generic scenario and that there are business-requirement implications, but it can simplify the development.
My previous comments are in the context of using only the Broker API, but you can take advantage of a search engine if one is part of your architecture.
You can index the Broker Database and massage the data.
Using the search engine's API you can extract the ids of the components/component templates and then use the Broker API to retrieve the proper information.