I am new to Couchbase and NoSQL. We have a central/main data center that contains all the documents, and several branches/offices that will each have their own specific documents. We want to replicate some documents between the branches and the main data center.
Each branch must see its own documents and no one else's.
I want to know: is there any document-level security in Couchbase?
Have you decided what mechanism you are going to use for replication yet? If you use Couchbase's included XDCR (Cross Data Center Replication), you could use XDCR filtering to replicate only a subset of documents from the main data center to the branches. That way, each branch server will hold only the specific documents you send to it. You could also send any updates on the data back to the main DC by setting up a second XDCR process from the branch back to the main DC.
Couchbase's XDCR filtering documentation explains the filtering setup quite well, and here are some more details about the XDCR setup in general. Hope that helps!
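For illustration, creating a filtered replication through the REST API might look roughly like this. The cluster address, credentials, bucket names, and key regex are all placeholders, and the exact parameters depend on your server version, so check the XDCR REST documentation:

```js
// Sketch only: addresses, credentials, buckets and the key regex below
// are made-up examples, not values from the question.
const params = new URLSearchParams({
  fromBucket: "main",                // source bucket on the main DC
  toCluster: "branch-a",             // a remote cluster reference you created
  toBucket: "branch-a-docs",         // destination bucket on the branch
  replicationType: "continuous",
  filterExpression: "^branch-a::",   // replicate only keys with this prefix
});

await fetch("http://main-dc:8091/controller/createReplication", {
  method: "POST",
  headers: { Authorization: "Basic " + btoa("Administrator:password") },
  body: params,
});
```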
Related
Is it possible to use Couchbase Sync Gateway in the following way:
1) The mobile client queries Couchbase for data.
2) No data is present in Couchbase, so this triggers an import of the needed data from, for example, a MySQL database into Couchbase.
3) The imported data is then transferred to the mobile client by Couchbase Sync Gateway.
4) The mobile client goes to sleep.
5) After 12 hours of inactivity, the data is removed from Couchbase.
6) The next day, the mobile client still holds the data offline and syncs again with Sync Gateway.
7) The data is imported into Couchbase Server again, and the diffs are synced with the client.
Does Couchbase provide hooks to implement such a flexible use case?
If yes, could somebody point me to the important API calls?
Many greetings
The preferred way to do this would be to run most things through Sync Gateway (the data imports from the external source in particular should go through Sync Gateway, not directly to Couchbase, and removing the data should go through SG as well).
Sync Gateway's sync function runs when SG receives documents. In this sense, there's no way to trigger something based on nothing being there.
One way you might solve this is by having the mobile client push a special-purpose document. Your sync function could catch this and react in several ways: fire a webhook request, start a replication, or you could set up something to monitor a changes feed and trigger from that.
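As a rough sketch of that idea, the sync function below routes such documents into a channel that an external worker could watch via the changes feed. The "import-request" type and the channel names are assumptions for the sketch, not anything built into SG:

```js
// Sync functions are plain JavaScript embedded in the Sync Gateway config.
function (doc, oldDoc) {
  if (doc.type === "import-request") {
    // Route the request into a channel that an external worker watches via
    // the changes feed; the worker then imports from MySQL and writes the
    // results back through SG's REST API.
    channel("import-requests");
  } else {
    channel(doc.channels);
  }
}
```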
Next you have the issue of removing the data on the Server side. Typically, applications write new revisions to SG, and these get synced to the client (and vice versa). If you remove everything on the Server side, you'll actually end up with what are called tombstone revisions showing the documents as deleted. (This is a result of the flexible conflict resolution technique used by Couchbase Mobile, which uses multiversion concurrency control.)
The question is a little unclear here, but it sounds like you don't want to store the data long term on the Server side. If that's right, I think you could do something like:
Delete the data (through SG; see the sketch after this list)
Have the mobile client push data to SG
Trigger SG again with some special document
Update the data from the external source
Have the client pull updates from SG
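For the delete step, a minimal sketch of removing a document through Sync Gateway's REST API (so it tombstones properly) rather than deleting it straight out of Couchbase Server. The database name, doc ID, and admin port are assumptions:

```js
// GET the current revision, then DELETE with that rev on the admin port.
const db = "http://localhost:4985/mydb";
const doc = await (await fetch(db + "/user-data-001")).json();
await fetch(db + "/user-data-001?rev=" + doc._rev, { method: "DELETE" });
```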
That's a very rough outline. This is too complicated to really work out in this format; I suggest you post questions on the Couchbase developer forum to get more details.
So the short answer is: yes, this seems feasible, but a full answer needs more detail on what you're doing and what your constraints are.
I have only heard of the Couchbase bucket; is there also a "basket"? I would like to have multiple apps use multiple buckets, but for Couchbase performance, is there something lighter-weight than a bucket, called a basket?
I've never heard of a basket in Couchbase. That being said, we strongly encourage people to add a type field to every document stored in a bucket. Before we had queries, we would tell you to support multiple applications by prefixing all your document keys with an app prefix. Now that we have N1QL and you can run queries based on document content, you should add such a field in the doc as well.
From a security perspective, you'll be mixing documents from different apps in the same bucket. We have no way to make any distinction between docs from one app or the other on the server side right now, which means your security model has to be handled at the client/application layer.
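For illustration, the key-prefix and type-field conventions might look like this (all names are made up for the sketch):

```js
// Key convention: "<app>::<type>::<id>", plus matching fields in the body.
const docKey = "billing::invoice::1042";
const docBody = { app: "billing", type: "invoice", amount: 99.5 };

// With N1QL, one app's documents can then be selected by content:
const query =
  'SELECT b.* FROM `shared-bucket` AS b ' +
  'WHERE b.app = "billing" AND b.type = "invoice"';
```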
I have 2 Couchbase clusters: one for real-time work and one for back-end data queries.
I wish to replicate only 10% of the data from the real-time bucket to the back-end cluster, because it's used for statistical analysis.
Note one: I know this is not possible through the UI; I'm looking for a way to write some kind of extension that could "sit" in the middle of the XDCR and filter it.
Note two: As I understand it, Elasticsearch uses the replication feature to get notified of changes on the cluster and build its own indexes. If I could "listen" for those notifications myself, I could take it from there, reading and sending the relevant data myself.
Any ideas on how I can make it work?
==NOTES==
I found the following link: http://blog.couchbase.com/xdcr-aspnet-and-nancy. It gives a basic example of a Sinatra-style project that XDCR can connect to, but there is no link to documentation on the REST API for someone who doesn't want to work with that stack.
As for @Cihan's question: replicating 10% of the data is the basic use case I'm after, and for that I could filter on the key alone. But in general I would probably like to manipulate the data and also be able to merge it into existing data; that would be the case if I had 2 real-time clusters replicating to 1 back-end cluster.
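To make the key-only case concrete, here is a sketch of the kind of deterministic filter I have in mind; hashing the key means the same document always gets the same verdict. The hash choice and threshold are illustrative:

```js
const crypto = require("crypto");

// Keep roughly `fraction` of all documents, decided purely by key.
function shouldReplicate(docKey, fraction) {
  const digest = crypto.createHash("sha1").update(docKey).digest();
  // First 4 bytes as an unsigned int, scaled into [0, 1).
  return digest.readUInt32BE(0) / 0x100000000 < fraction;
}

console.log(shouldReplicate("order::12345", 0.10)); // stable per key
```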
We don't have anything built in today to do this. You could set up XDCR and delete the data that you don't need on the destination cluster, but it may reappear as updates happen, so your cleanup would have to run continuously. Would a method like that work?
By the way, we do plan to have this facility in the future. One comment that would be helpful for me: what type of filtering would suffice in your case? Could we filter on a prefix only to achieve your case, or would you need a more sophisticated filtering expression?
Thanks,
Cihan Biyikoglu
http://pubapi.cryptsy.com/api.php?method=marketdatav2
I would like to synchronize market data on a continuous basis (e.g. from Cryptsy and other exchanges). I would like to show the latest buy/sell prices from the respective orders on these exchanges on a regular basis, as a historical time series.
What backend database should I use to store the retrieved data and render or plot any parameter from it as historical time-series data?
I'd suggest you look at a database tuned for handling time series data. The one that springs to mind is InfluxDB. This question has a more general take on time series databases.
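For example, writing one point with InfluxDB's line protocol could look roughly like this (the database, measurement, tags, and fields are made up for the sketch):

```js
// One point in line protocol: measurement,tags fields. Without an explicit
// timestamp, InfluxDB assigns the server's current time.
const point = "ticker,exchange=cryptsy,market=BTC_USD bid=425.10,ask=425.90";

await fetch("http://localhost:8086/write?db=market", {
  method: "POST",
  body: point,
});
```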
I think this needs more detail about the requirement.
It just says "it needs to sync time-series data". What is the scenario? What are the data source and destination?
Option 1.
If it is just a data synchronization issue between two databases, the easiest solution is the CouchDB NoSQL family (CouchDB, Couchbase, Cloudant).
They are all based on CouchDB, and in any case they provide a data-center-level replication feature (XDCR), so you can replicate the data to another CouchDB in another data center, or even to CouchDB on mobile devices.
I hope it will be useful to you.
Option 2.
Another approach is data integration. You can sync data by using an ETL batch job: a batch worker copies data to the destination periodically. This is the most common way to replicate data to another destination, and there are a lot of tools that support ETL, like Pentaho ETL, Spring Integration, and Apache Camel.
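A rough sketch of the batch-worker idea, with an illustrative interval and the load step left to whatever destination store you choose:

```js
// Extract from the exchange API, transform, load into the destination.
async function syncOnce() {
  const res = await fetch("http://pubapi.cryptsy.com/api.php?method=marketdatav2");
  const data = await res.json();
  // ...transform "data", then load it into the destination here.
}

// Copy to the destination every 60 seconds.
setInterval(() => syncOnce().catch(console.error), 60000);
```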
If you give me a more detailed scenario, I can help you in more detail.
Enjoy
-Terry
I think MongoDB is a good choice. Here is why:
You can easily scale out, and thus store a tremendous amount of data. When using an appropriate shard key, you might even be able to position the shards close to the exchange they follow in order to improve speed, if that should become a concern.
Replica sets offer automatic failover, which implicitly could be a concern for you.
Using the TTL feature, data can be automatically deleted after its TTL expires, effectively creating a round-robin database (see the sketch after this list).
Both the aggregation framework and map/reduce will be helpful.
There are some free classes at MongoDB University which will help you avoid the most common pitfalls.
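For the TTL point, a minimal mongo shell sketch might look like this (collection and field names are illustrative; 43200 seconds is 12 hours):

```js
// Documents expire 12 hours after their "fetchedAt" timestamp.
db.ticks.createIndex({ fetchedAt: 1 }, { expireAfterSeconds: 43200 });

db.ticks.insertOne({
  exchange: "cryptsy",
  pair: "BTC/USD",
  price: 425.5,
  fetchedAt: new Date(), // the TTL monitor removes this doc ~12 h from now
});
```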
I will be using Couchbase as the database for my website. I plan for the website to be international, so I will probably have data centers in the USA, Europe, and Australia to keep latency low. I also want to minimize bandwidth between data centers, so I am planning to fire off parallel updates (AJAX) to all data centers whenever the user stores data.
My question is then: if I insert the same data into all three clusters approximately simultaneously, is Couchbase smart enough to recognize that this data is identical and therefore does not need replicating between data centers?
I watched this video, and he explained that the CAS value is updated when a document is updated, and that this is used to determine which documents require replication. If the CAS value is updated when any document on the cluster is updated, then my guess is that the answer is "no", as it is very likely that I may be sending only some data to all 3 clusters at once, and any data which is sent to only one cluster will get the CAS temporarily out of sync for that cluster. However, if the CAS value is independent per document, then the answer may be "yes". Maybe there are some options which can be altered to make the CAS value independent per document?
Couchbase does not know anything about the body of the documents that you store. From its perspective, if you write the same document to 3 clusters (all linked bidirectionally with XDCR), it considers them 3 different mutations of the document with that ID. Couchbase will perform its normal conflict resolution process to choose which of the 3 is the "winner". This will result in the "winning" document being transferred to the other two clusters, despite the fact that it may have the exact same content as the "losing" revisions.
Anytime you write to the same document ID in different clusters, you have to be aware that conflict resolution will choose the winning revision. If you're not careful you can overwrite data you didn't mean to.
Typically a different approach is chosen for your use case. For each user, a "home" cluster is chosen, probably based on geography. All operations for that user are tied to this cluster. If that cluster is down, you can switch to another cluster. Using this approach you avoid writing to multiple clusters, and you would only change clusters under well-defined conditions.
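A tiny sketch of that routing idea, with made-up cluster addresses and a made-up per-user region field:

```js
const CLUSTERS = {
  us: "http://couchbase-us:8091",
  eu: "http://couchbase-eu:8091",
  au: "http://couchbase-au:8091",
};

function homeClusterFor(user) {
  // Every read and write for this user goes to one cluster; XDCR keeps
  // the other clusters up to date in the background for failover.
  return CLUSTERS[user.region] || CLUSTERS.us;
}
```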
The CAS value is just an opaque identifier of the revision. In your example above, all 3 document writes would end up with different CAS values (which is one of the reasons Couchbase sees them as different and has to choose a winner).
The conflict resolution process is documented in this section of the manual.