Two questions:
1) Is there any way to set up bidirectional replication of all Couchbase buckets between two clusters (without manually creating a replication for each bucket)?
2) I'm creating buckets dynamically using the Java SDK. I would like a bucket created on one cluster to be created and replicated automatically on the other. Is there any way to do this using the Java SDK?
There is not presently a trivial way to establish bidirectional replication from all buckets on one cluster to the corresponding buckets on another cluster. Nor is there a programmatic Java API for managing XDCR replications.
There is, however, a REST API for managing XDCR replications that could be used to create replications programmatically. With a little bit of work using other aspects of the REST API, this could be made to provide a solution for your first question as well.
The couchbase-cli tool is a command-line-friendly wrapper around a significant portion of the REST API, and it should be possible to write shell scripts with it to accomplish what you require if you are averse to interacting with the REST API directly.
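As a rough illustration of the REST approach, here is a minimal Python sketch that plans one replication per direction for every bucket present on both clusters. It assumes the bucket names were already fetched (for example from GET /pools/default/buckets on each cluster), that remote cluster references already exist on both sides, and that each form body is then POSTed to /controller/createReplication on the source cluster. The endpoint paths and field names are given as I recall them from the Couchbase REST documentation, and the reference names clusterA/clusterB are hypothetical, so check both against your server version.

```python
from urllib.parse import urlencode


def plan_bidirectional_replications(buckets_a, buckets_b, ref_to_b, ref_to_a):
    """Return (source_cluster, form_body) pairs: two per bucket that exists on both sides.

    ref_to_b is the remote cluster reference defined on cluster A that points at B,
    and ref_to_a is the reference defined on cluster B that points at A (assumed names).
    """
    shared = sorted(set(buckets_a) & set(buckets_b))
    plan = []
    for bucket in shared:
        for source, remote_ref in (("A", ref_to_b), ("B", ref_to_a)):
            # Form fields as recalled from the XDCR REST docs; verify before use.
            body = urlencode({
                "fromBucket": bucket,
                "toCluster": remote_ref,
                "toBucket": bucket,
                "replicationType": "continuous",
            })
            plan.append((source, body))
    return plan


if __name__ == "__main__":
    for source, body in plan_bidirectional_replications(
            ["users", "orders"], ["users", "orders"], "clusterB", "clusterA"):
        print(f"POST to cluster {source} /controller/createReplication: {body}")
```

For the second question, the same function could be called right after a bucket is created through the SDK, once the matching bucket exists on the other cluster.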
I'm looking for the best solution for my requirements. I would like to use MySQL with a lot of instances, so I need to be able to add as many master servers, each with slave servers, as might be needed in the future. There will also be sharding. So far I've found that GCP doesn't allow you to add more than one master server to a running instance. If so, what can I do? I need to create 3 or more master servers and add slave servers to them. If there is a new row in one of the master servers, the 3 slaves will receive that row and everything will be synchronized, so I'll be able to do a simple SELECT query on one of these slaves to get the current data. I'm sorry for my English, I'm not a native speaker :)
What you are looking for is called a read replica. Google Cloud SQL for MySQL lets you implement a setup like the one you are describing, and you can deploy multiple read replicas very quickly.
For the sharding part, you just need to deploy multiple masters, each with its own read replicas, and implement the logic in your application to find the data in the right instance.
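The application-side routing could look something like the following Python sketch. It is only one possible scheme (hash the sharding key, take it modulo the number of shards), and the host names are hypothetical placeholders rather than real Cloud SQL instances.

```python
import hashlib
import random

# Hypothetical topology: each shard is one master plus its own read replicas.
SHARDS = [
    {"master": "shard0-master", "replicas": ["shard0-replica1", "shard0-replica2"]},
    {"master": "shard1-master", "replicas": ["shard1-replica1"]},
    {"master": "shard2-master", "replicas": ["shard2-replica1"]},
]


def shard_for(key, shards=SHARDS):
    """Pick a shard deterministically from a stable hash of the sharding key."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def host_for_write(key, shards=SHARDS):
    """Writes always go to the master of the key's shard."""
    return shard_for(key, shards)["master"]


def host_for_read(key, shards=SHARDS):
    """Reads go to a random replica of the key's shard, or the master if it has none."""
    shard = shard_for(key, shards)
    return random.choice(shard["replicas"]) if shard["replicas"] else shard["master"]
```

Keep in mind that with plain modulo hashing, adding a shard later changes where most keys map, so existing rows would have to be moved.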
I have a full deployment of couchbase (server, sync gateway and lite) and have an API, mobile app and web app all using it.
It works very well, but I was wondering if there are any advantages to using the Sync Gateway API over the Couchbase SDK? Specifically I would like to know if Sync Gateway would handle larger numbers of operations better than the SDK, perhaps an internal queue/cache system, but can't seem to find definitive documentation for this.
At the moment the API uses the C# Couchbase SDK and we use SyncGateway very little (only really for synchronising the mobile app).
First, some relevant background info :
Every document that needs to be synced over to Couchbase Lite (CBL) clients needs to be processed by the Sync Gateway (SGW). This is true whether a doc is written via the SGW API or whether it comes in via a server write (N1QL or SDK). The latter case is referred to as "import processing", wherein the document that is written to the bucket (via N1QL) is read by SGW via the DCP feed. The document is then processed by SGW and written back to the bucket with the relevant sync metadata.
Prerequisite :
In order for the SGW to import documents written directly via N1QL/SDK, you must enable “shared bucket access” and import processing as discussed here
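For reference, the relevant part of a Sync Gateway 2.x database config might look roughly like the output of this Python sketch. The property names enable_shared_bucket_access and import_docs are as I remember them from the SGW configuration docs (some versions expected "continuous" instead of true for import_docs), and the database name, server URL and bucket are placeholders. Credentials and the sync function are left out.

```python
import json


def sync_gateway_db_config(db_name, server_url, bucket):
    """Build a minimal SGW database section with shared bucket access and import enabled."""
    return {
        "databases": {
            db_name: {
                "server": server_url,
                "bucket": bucket,
                # Assumed property names; verify against your SGW version.
                "enable_shared_bucket_access": True,
                "import_docs": True,
            }
        }
    }


if __name__ == "__main__":
    config = sync_gateway_db_config("db", "couchbase://localhost", "my-bucket")
    print(json.dumps(config, indent=2))
```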
Non-mobile documents :
If you have documents that are never going to be synced to the CBL clients, then the choice is obvious: use the server SDKs or N1QL.
Mobile documents (docs to sync to CBL clients) :
Assuming you are on SGW 2.x syncing with CBL 2.x clients
If you have documents written at server end that need to be synced to CBL clients, then consider the following
Server side write rate:
If you are looking at writes on the server side coming in at sustained rates significantly exceeding 1.5K/sec (let's say 5K/sec), then you should go the SGW API route. While it's easy enough to do a bulk update via a server N1QL query, remember that SGW still needs to keep up and do the import processing (as discussed in the background section).
Which means, if you are doing high volume updates through the SDK/N1QL, then you will have to rate limit it so the SGW can keep up (do batched updates via SDK)
That said, it is important to consider the fact that if SGW can't keep up with the write throughput on the DCP feed, it's going to result in latency, no matter how the writes are happening (SGW API or N1QL)
If your sustained write rate on the server isn't expected to be significantly high, then go with N1QL.
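To make the "rate limit and batch" advice concrete, here is a small Python sketch of a throttled batch writer. The write_batch callable is a stand-in for whatever bulk write your SDK offers, and the default of 1500 docs/sec simply reuses the figure mentioned above as an illustrative ceiling, not a measured limit.

```python
import time


def write_in_batches(docs, write_batch, max_per_sec=1500, batch_size=100,
                     sleep=time.sleep, clock=time.monotonic):
    """Send docs in batches, pausing so the average rate stays at or below max_per_sec.

    write_batch is a hypothetical callable that performs the actual SDK bulk write.
    sleep and clock are injectable so the pacing can be tested without waiting.
    """
    sent = 0
    for start in range(0, len(docs), batch_size):
        began = clock()
        batch = docs[start:start + batch_size]
        write_batch(batch)
        sent += len(batch)
        # Minimum time this batch should occupy at the target rate.
        budget = len(batch) / max_per_sec
        remaining = budget - (clock() - began)
        if remaining > 0:
            sleep(remaining)
    return sent
```

The right ceiling depends on how quickly your Sync Gateway nodes get through import processing, so treat the number as something to tune while watching import latency.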
Deletes Handling:
Does not matter. Under shared-bucket-access, deletes coming in via SDK or SGW API will result in a tombstone. Read more about it here
SGW specific config :
Naturally, if you are dealing with SGW specific config, creating SGW users, roles, then you will use the SGW API for that.
Conflict Handling :
In 2.x, it does not matter. Conflicts are handled on CBL side.
Challenge with SGW API
Probably the biggest challenge in a real-world scenario is that using the SGW API path means either storing information about SGW revision IDs in the external system, or performing every mutation as a read-then-write (since there is no way to PUT an update to an existing document without providing a revision ID).
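The read-then-write pattern looks roughly like this in Python. It is a sketch that assumes the SGW public REST API shape as I understand it (GET /{db}/{doc} returns the document with its _rev, and PUT /{db}/{doc}?rev=... writes a new revision). The base URL and database name are placeholders.

```python
import json
import urllib.request
from urllib.parse import quote


def get_json(url):
    """GET a URL and decode the JSON response."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def put_json(url, body):
    """PUT a JSON body to a URL and decode the JSON response."""
    req = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"}, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def update_via_sync_gateway(base_url, db, doc_id, changes, get=get_json, put=put_json):
    """Read the current revision, merge the changes, and write the document back."""
    doc_url = f"{base_url}/{db}/{quote(doc_id, safe='')}"
    current = get(doc_url)                      # the extra round trip
    rev = current["_rev"]
    body = {k: v for k, v in current.items() if k not in ("_id", "_rev")}
    body.update(changes)
    return put(f"{doc_url}?rev={quote(rev, safe='')}", body)
```

If another writer updates the document between the GET and the PUT, the PUT should be rejected as a conflict, and the caller would have to repeat the whole read-then-write.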
The short answer is that for backend operations, Couchbase SDK is your choice, and will perform much better. Sync Gateway is meant to be used by Mobile clients, with few exceptions (*).
Bulk/Batch operations
In my performance tests using the Java Couchbase SDK and bulk operations from AsyncBucket (link), I have updated up to 8 thousand documents per second. In .NET you can do batch operations too (link).
Sync Gateway also supports bulk operations, yet it is much slower because it relies on the REST API and requires you to provide the _rev of the previous version of each document you want to update. This usually means the backend has to do a GET before doing a PUT. Also, keep in mind that Sync Gateway is not a storage unit. It works as a proxy to Couchbase, managing mobile client access to segments of data based on the channels registered for each user, and it writes all of its metadata documents into the Couchbase Server bucket, including channel indexes, user registrations, document revisions and views.
Querying
Views are indexed, so queries over large data sets can respond very quickly. Whenever a document changes, the map function of every view gets the opportunity to map it. But when a view is created through the Sync Gateway REST API, code is added to your map function to handle user channels and permissions, making it slower than a plain view created directly in the Couchbase Admin UI. Querying views with compound keys using the startKey/endKey parameters is very powerful when you have hierarchical data, but this functionality and the use of reduce functions are not available to mobile clients.
N1QL can also be very fast when your query takes advantage of Couchbase indexes.
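As a small illustration of the compound-key range technique mentioned above, this Python sketch builds the query string for "every row whose array key starts with a given prefix". It assumes the view REST API takes JSON-encoded startkey and endkey parameters (lowercase on the wire, as far as I recall) and relies on the collation rule that an object sorts after strings and numbers, so appending {} acts as a high sentinel.

```python
import json
from urllib.parse import urlencode


def compound_key_range(prefix):
    """Return query params selecting all rows whose array key begins with `prefix`."""
    low = list(prefix)
    high = list(prefix) + [{}]   # {} collates after any string or number
    return urlencode({
        "startkey": json.dumps(low, separators=(",", ":")),
        "endkey": json.dumps(high, separators=(",", ":")),
    })


if __name__ == "__main__":
    # e.g. a view emitting keys like [customer, year, month]
    print(compound_key_range(["acme", "2019"]))
```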
Notes
(*) One exception to the rule is when you want to delete a document and have this reflected on mobile phones. The DELETE operation leaves an empty document with a _deleted: true attribute, and can only be done through Sync Gateway. The next time the mobile device synchronizes and finds this marker, it will delete the document from local storage. You can also set this attribute through a PUT operation, where you may also add an _exp: "2019-12-12T00:00:00.000Z" attribute to schedule a purge of the document at a future date, so that the server gets cleaned up as well. However, just purging a document through Sync Gateway is equivalent to deleting it through the Couchbase SDK, and this won't be reflected on mobile devices.
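For clarity, the body of such a PUT could be built like this. It is only a sketch of the payload described above: the _deleted and _exp attribute names come from the text, and the current revision still has to be supplied with the request as for any other update through Sync Gateway.

```python
def tombstone_body(purge_at=None):
    """Build the JSON body for a PUT that marks a document as deleted.

    purge_at is an optional ISO 8601 timestamp string; when given, it is sent as
    _exp so the tombstone itself is purged from the server at that time.
    """
    body = {"_deleted": True}
    if purge_at is not None:
        body["_exp"] = purge_at
    return body


if __name__ == "__main__":
    print(tombstone_body("2019-12-12T00:00:00.000Z"))
```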
NOTE: Prior to Sync Gateway 1.5 and Couchbase 5.0, all backend operations had to be done directly in Sync Gateway so that Sync Gateway and mobile clients could detect those changes. This has changed since shared_bucket_access option was introduced. More info here.
I am trying to sync PouchDB with Couchbase through Sync Gateway, but I only get data added by PouchDB, not the initial data added to Couchbase. For example, there are 750 docs in Couchbase but none of them are synced to PouchDB. Also, http://localhost:4985/_admin/db/db does not show the Couchbase docs either.
The problem is with adding data to Couchbase Server directly. Couchbase Mobile currently requires extra metadata in order to deal with replication and conflict resolution. This isn't handled by the Server SDKs.
The recommended approach is to do all database writes through Sync Gateway.
To simplify use with PHP, you may want to use a Swagger PHP client. (You can see an example of using clients autogenerated by Swagger in this post. The example uses JavaScript and Node.js, but the principles are the same.)
You can read from Couchbase Server directly if you want (to do a N1QL query, for example).
Another option is to use "bucket shadowing". This is trickier, and is likely to get deprecated at some point. I only list it for completeness.
I want to run Couchbase on AWS EC2. Since my traffic is cyclic in nature, can I run Couchbase under auto-scaling? Since there are a lot of steps required to add/remove a node, I was wondering if this is the right approach. Has anybody tried it?
It has been done before. Here is a high level list of the things you'd have to do:
Define which Couchbase metrics you need to use to base your scaling considerations on
Create a script to get those metrics from Couchbase and put them into Cloudwatch using Couchbase Rest API or CLI.
Create an AMI with Couchbase installed and OS configured.
Script the addition of one or more new nodes (using Couchbase Rest API or CLI), plus a rebalance, as a response to Auto-scaling
Script the removal and rebalance of nodes (using Couchbase Rest API or CLI), as a response to contraction in auto-scaling.
Because you are relying on rebalances here, you will have to watch how long they take and perhaps tune your cluster (e.g. move more vBuckets at once, among other settings) and your usage of Couchbase for faster rebalances (e.g. large views can slow rebalances down). Normally rebalances are meant to be a background process and take as long as they take, but that may not be appropriate in this particular use. Only you can answer that.
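The add/remove scripting steps above might be sketched like this in Python, building couchbase-cli command lines that an auto-scaling hook could run. The subcommands and flags (server-add, rebalance, --server-add, --server-remove) are written as I recall them from the couchbase-cli help. Newer versions may require extra options such as the services to run on the new node, so check `couchbase-cli server-add --help` first. Host names and credentials are placeholders.

```python
def scale_out_commands(cluster, user, password, new_node):
    """Commands to add a freshly launched node to the cluster and then rebalance."""
    auth = ["-c", cluster, "-u", user, "-p", password]
    add = ["couchbase-cli", "server-add", *auth,
           "--server-add", new_node,
           "--server-add-username", user,
           "--server-add-password", password]
    rebalance = ["couchbase-cli", "rebalance", *auth]
    return [add, rebalance]


def scale_in_commands(cluster, user, password, node):
    """Command to rebalance a node out of the cluster before it is terminated."""
    auth = ["-c", cluster, "-u", user, "-p", password]
    return [["couchbase-cli", "rebalance", *auth, "--server-remove", node]]


if __name__ == "__main__":
    for cmd in scale_out_commands("10.0.0.1:8091", "Administrator", "password", "10.0.0.5:8091"):
        print(" ".join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True). On scale-in, the instance should only be terminated once the rebalance command has returned successfully.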
We have built a LAMP-stack API application via PHP Laravel. This currently uses a local mySQL instance. We have mostly implemented views in AngularJS.
In order to use Firebase, we need to sync data between the authoritative store in mySQL with anything relevant that exists on Firebase, as close to real-time as possible. This means that other parts of the app which are not real-time and don't use Firebase can also serve up fresh content that's very recently been entered into the system.
I know that Firebase is essentially a noSQL database in the cloud. My question is - how do I write a wrapper or a means to sync the canonical version of my Firebase into my database of record - mySQL?
Update to answer - our final decision - ditching Firebase as an option
We have decided against this, as we can easily have a socket.io instance on the same server with an extremely low latency connection to mySQL, so that the two can remain in sync. There's no need to go across the web when resources and endpoints can exist on localhost. It also gives us the option to run our app without any internet connection, which is important if we sell an on-premise appliance to large companies.
A noSQL sync platform like Firebase is really just a temporary store that makes reads/writes faster in semi-real-time. If they attempt to get into the "we also persist everything for you" business - that's a whole different ask with much more commitment required.
The guarantee on eventual consistency between mySQL and Firebase is more important to get right first, to prevent problems down the line. Also, an RDBMS is essential to our app: it's the only way to attack a lot of data-heavy problems in our analytics/data mappings, and there are very strong reasons most of the world still uses an RDBMS like mySQL. You can make those very reliable too, through Amazon RDS and Google Cloud SQL.
There's no specific problem beyond scaling real-time sync that Firebase actually solves for us, which other open source frameworks don't already solve. If their JS lib actually handled offline scenarios (when you START offline) elegantly, I might have considered it, but it doesn't do that yet.
So, YMMV - but in our specific case, we're not considering Firebase for the reasons given above.
The entire topic is incredibly broad, definitely too broad to provide a simple answer to.
I'll stick to the use-case you provided in the comments:
Imagine that you have a checklist stored in mySQL, comprised of some attributes and a set of steps. The steps are stored in another table. When someone updates this checklist on Firebase - how would I sync mySQL as well?
If you insist on combining Firebase and mySQL for this use-case, I would:
Set up your Firebase as a work queue: var ref = new Firebase('https://my.firebaseio.com/workqueue')
have the client push a work item into Firebase: ref.push({ task: 'id-of-state', newState: 'newstate'})
set up a (nodejs) server that:
monitors the work queue (ref.on('child_added', ...))
updates the item in the mySQL database
removes the task from the queue
See this github project for an example of a work queue on top of Firebase: https://github.com/firebase/firebase-work-queue
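The server-side part of the steps above (apply the item to the database, then remove it from the queue) can be sketched in Python as follows. This is a stand-in, not Firebase code: a plain dict plays the role of the work queue, sqlite3 plays the role of mySQL so the example is runnable on its own, and the steps table is a hypothetical schema. The task and newState field names come from the push call shown earlier.

```python
import sqlite3


def drain_queue(queue, conn):
    """Apply each queued work item to the database, then remove it from the queue.

    queue maps a push id to an item like {"task": "<row id>", "newState": "<state>"}.
    An item is deleted only after its database transaction has committed.
    """
    for push_id in list(queue):
        item = queue[push_id]
        with conn:  # commits on success, rolls back if the UPDATE raises
            conn.execute("UPDATE steps SET state = ? WHERE id = ?",
                         (item["newState"], item["task"]))
        del queue[push_id]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE steps (id TEXT PRIMARY KEY, state TEXT)")
    conn.execute("INSERT INTO steps VALUES ('step-1', 'open')")
    conn.commit()
    pending = {"item-1": {"task": "step-1", "newState": "done"}}
    drain_queue(pending, conn)
    print(conn.execute("SELECT id, state FROM steps").fetchall(), pending)
```

Removing the item only after the commit means a crash in between leaves the item in the queue to be retried, so the update should be safe to apply twice.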