Limited Synchronization base on client information with Couchbase - couchbase

I'm thinking of putting together this scenario for couchbase and I'm not quite sure whether it's possible? I can't seem to find any reference to it. So here it is.
We have Bucket A as the main bucket. It contains consolidated information for Client A, Client B and Client C. Currently as it is, every one of them are connected directly to Couchbase main cluster.
What I'm thinking is if I could have Client A to have just their information in their own couchbase server but still synchronized with Bucket A in the main cluster, client B have their own couchbase server with their own information only and so is client C. Will this scenario be possible? and if it does, can someone help me with advise on how to accomplish it?
I read it somewhere that couchbase gateway, can pseudo does this except with one potential problem. It can't synchronize with the existing bucket that's already existed in Couchbase Main cluster. So what that means, is that Client A will have to create "Client Bucket A" and sync with Couchbase Main Cluster with bucket called "Client Bucket A". If that truely is the case, that scenario just doesn't work for us. Please advise? Many thanks everyone...
P.S.: I'm not quite sure whether this type of question falls under stackoverflow or ServerFault so pardon my ignorance.

Client A,B and C are separate couchbase clusters? You can't selectively replicate parts of a bucket, you'd need to have 3 separate buckets, one for each client that then replicates to the external client clusters.

Related

How to use more than one MySQL master server in Google Cloud Platform?

I'm looking for a best solution that suits my requirements. I would like to use MySQL with a lot of instances, so I need to be able to add as much master servers with slaves servers as might be needed in the future. There also will be sharding. Currently I've found out that GCP doesn't allow you to add more than one master server to a running instance. If so, what can I do then? I need to create 3 or more master servers and add slave servers to them. And if there is a new row in one of the master servers, the 3 slaves will receive that row and everything will by synchronized, so I'll be able to do a simple SELECT query in one of these slaves to get the actual data. I'm sorry for my english, I'm not a native speaker :)
What you are looking for it's called read replica. Using Google Cloud Cloud SQL for MySQL will let you implement a setup like the one you are describing, deploying multiples read replicas really fast
For the sharding part, you just need to deploy multiple masters with its own read replicas and on your application logic implement the needed code to find the data in the right instance.

What hooks does couchbase sync gateway provide for sync?

Is it possible to use couchbase syny gateway in the following way:
1) Mobile client queries couchbase for data.
2) No data in couchbase present so this triggers a import of the needed data from for example a mysql database into couchbase.
3) The imported data is then transfered to the mobile client by couchbase synch gateway.
4) The mobile client goes to sleep.
5) After 12 hours of inactivity the data is removed from couchbase.
6) Next day the mobile client still holds the data offline and syncs again which sync gateway
7) the data is again imported to couchbase server and the diffs are synced with the client
Does couchbase provide hooks to implement such an flexable usecase?
If yes could somebody point me to the important api calls?
Many Greetings
The preferred way to do this would run most things through Sync Gateway (the data imports from the external source in particular should go through Sync Gateway, not directly to Couchbase, and removing the data should go through SG also.)
Sync Gateway's sync function runs when SG receives documents. In this sense, there's no way to trigger something based on nothing being there.
One way you might solve this is by having the mobile client push a special purpose document. Your sync function could catch this and react in several ways (fire a webhook request, start a replication, or you could set up something to monitor a changes feed and trigger from that).
Next you have the issue of removing the data on the Server side. Here the question is a little unclear. Typically applications write new revisions to SG, and these get synced to the client (and vice versa). If you remove everything on the Server side, you'll actually end up with what are called tombstone revisions showing the document as deleted. (This is a result of the flexible conflict resolution technique used by Couchbase Mobile. It uses multiversion concurrency control.)
The question is a little unclear. It sounds like you don't want to store the data long term on the Server side. If that's right, I think you could do something like:
Delete the data (through SG)
Have the mobile client push data to SG
Trigger SG again with some special document
Update the data from the external source
Have the client pull updates from SG
That's a very rough outline. This is too complicated to really work out in this format. I suggest you post questions through the Couchbase developer forum to get more details.
So, the short answer, yes, this seems feasible, but a full answer needs more detail on what you're doing and what your constraints are.

Couchbase Sync Gateway

I am learning about couchbase. It's my first experience with NoSQL databases.
In the case of a central server and many users with mobile devices. I want every user on your database has different data.
I have doubts about the sync.
To synchronize, the server will have a database per user? A database per user sounds to many databases ... otherwise I do not understand how to differentiate user data.
Is it possible to communicate between server and device through couchdatabase?.
Is it good strategy for communication with the device to write to the replica that is in the server and Couchbase is responsible for making the communication? Where I can find an example of this?
Some of these questions are unclear, but I will attempt to answer:
The server will not have a database per user. It only has the databases (or "buckets") that you set up ahead of time. User data is separated by a mechanism called channels.
Communication between server and device is the exact point of Couchbase Lite. In this scenario, you must make all changes go through Sync Gateway in order for replication to function properly. This is handled for you by Couchbase Lite, so you just need to point it to a Sync Gateway instance and let it take care of replicating between various devices.

Couchbase - what happens if a node dies after writing data to disk but before it gets replicated

Here is the scenario.
I have two nodes under my couchbase server, Node A and B.I have replication on, so B will act as the node where replicated data of A should go.
Lets say that I try adding a new record and it happen to get saved at node A. Node A saves this data on RAM and on its disk successfully but UNFORTUNATELY, it crashes even before this data could get replicated to Node B
If I have configured automatic failover, Then all requests for Node A data will now go to Node B.
My question is Will I be able to get this new data which could not get replicated to node B but was successfully written over Node A's Disk? considering that Node A is down and all i have is Node B to communicate with
If yes, Please explain how. if no, Is there any official couchbase doc mentioning this behavior.
I tried looking for an answer in the official document and mostly it look like that answer is no, But thought of discussing this here before concluding that its data loss for sure.
Thanks in advance
In the scenario you described, yes the data will not be available, assuming you didn't check that the data had been successfully replicated. However note that replication will typically complete before perisistance, as the network is typically faster than disk.
Couchbase provides an observe API which allows you to verify that a particular mutation has been replicated and/or persisted. See Monitoring data using observe in the Couchbase developer guide.

Database replication: multiple geographical locations with local database, one main remote database

I've got a very specific use case and because I'm not too familiar with database replication, I am open to suggestions and ideas about how to accomplish the following in the best possible way:
A web application + database is running on a remote server. Let's call this set-up R for remote.
Now suppose there are 3 separate geographical locations which need read+write access to the database. I will call these locations L1, L2 and L3.
The main problem: the remote server might be unavailable or the internet connection of one of the locations might not always work, rendering the remote application unavailable; but we want the application to work as a high availability solution (on-site) even when the remote server is down or when there is an internet connection problem.
Partial solution: So I was thinking about giving each geographical location its own server with a local copy of the web application. The web application itself can get updated when needed from a version control system automatically (for example using git hooks).
So far so good... (at least I believe so?)
But what about our data? The really tricky part seems to be the database replication. Let's assume no DNS or IP failover and assume that the user first tries to access the remote server directly and if this does not work, the user can still use the local server on-site instead. This all happens inside a web browser (or similar client).
One possible (but unsatisfactory) solution would be to use master-slave replication from R (master) to L1, L2 and L3 (slaves). When doing this asynchronously this should be quite fast? I think this is a viable solution for temporary local read-only database access when the main server is broken or can't be accessed.
But... what about read-write support? I suppose we would need multi-master replication in this case, but I am afraid that synchronous replication using something like (for example) MySQL Cluster or Galera would slow things down, especially since L1, L2 and L3 are on lower bandwidth connections. And they are connected through WAN. (Also, L1, L2 or L3 might not always be online.)
The real question: How would you tackle this specific use case? At the moment I am leaning towards multi-master replication if it doesn't slow down things too much. The application itself will mainly be used by employees on-site but by some external people over WAN as well. Would multi-master replication work well? What if for example L1 is down for 24 hours and suddenly comes back on-line? What if R can't be accessed?
EXTRA: not my main question, but I also need the synchronized data to be sent securely over SSL, if possible, please take this into account for your answer.
Perhaps I am still forgetting some necessary details; if so, please respond with some feedback and I will try to update my question accordingly.
Please note that I haven't decided on a database yet and the database schema will be developed from scratch, so ideas using other databases or database engines are welcome as well. (At the moment I have most experience with MySQL and PostgreSQL)
As you are still undecided, I would strongly recommand you to have a look at MS-SQL merge replication. It is strong, highly reliable, replicates through LAN and HTTPS (so called web replication), and not that expensive.
Terminology differs from the mySql Master\Slave idea. We are here talking about one publisher, and multiple subscribers. All changes done at subscriber's level are collected and sent to the publisher, then redistributed to all subscribers (with, if needed, fancy options like 'filtered subscriptions').
Standard architecture will then be:
a publisher, somewhere on a server, which collects and redistributes changes between subscribers. Publisher might not be accessed by end users.
other database subscribers servers, either for local or web access, replicating with the publisher. Subscribers are accessed by end users.
We have been using this architecture for years, including:
one subscriber for internet access
one subscriber for intranet access
tens of subscribers for local access: some subscribers are on our constructions projects, somewhere in the desert ....
Such an architecture is not available "from the shelf" with MySQL. I guess it could be built, but it would then certainly be a lot more expensive than just buying the corresponding MS-SQL licenses. Do not forget that the free SQLEXPRESS version of MS-SQL can be a subscriber.
Be careful: If you are planning to go through such a configuration, I would (really) strongly advise you to have all primary keys set to uniqueIdentifier data type, and randomly generated. This will avoid the typical replication pitfall, where PK's are set to int with automatic increment, and where independant servers generate identical primary keys between two replications (MS-SQL proposes a tool to avoid such problems, where you can allocate PK ranges per server, but this solution is a real PITA ...).