How to ignore delete in couchbase unidirectional replication? - couchbase

I have bucket1 at datacenter1 and I am replicating it to datacenter2 via XDCR (unidirectional).
Now, if a document in bucket1 on datacenter1 is created or updated, it should be replicated to datacenter2, but if a document in bucket1 is deleted on datacenter1, the delete should not be replicated to datacenter2.
In simple terms, I want to ignore delete operations on documents during unidirectional replication.
How can I do that? Is there a setting in Couchbase (I am using Couchbase Server 3.0), or any other way to achieve this?

At the moment, there is no way to do that with XDCR in Couchbase. Is your goal to archive all data in one location but have just operational data in the other? It may be worth filing a feature request over at https://issues.couchbase.com.
Since the source is open and what you're talking about isn't a super hard feature, you could try implementing it at the receiving side. Most of the hard work is configuration/UI kinds of things.

Related

How to synchronize MySQL database with Amazon OpenSearch service

I am new to the Amazon OpenSearch service, and I would like to know if there is any way to sync a MySQL database with OpenSearch in real time. I thought of Logstash, but it seems it doesn't support delete and update operations, which means my OpenSearch cluster might not stay up to date.
I'm going to comment for Elasticsearch as that is the tag used for this question.
You can:
Read from the database (SELECT * FROM TABLE)
Convert each record to a JSON document
Send the JSON document to Elasticsearch, preferably using the _bulk API.
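As a rough illustration of those three steps in Python (assuming the pymysql and elasticsearch client libraries; the table name, index name, and credentials below are placeholders, not part of the original answer):

```python
# Rough sketch: read rows from MySQL, convert them to JSON documents,
# and send them to Elasticsearch via the _bulk API (helpers.bulk).
# Table/index names and credentials are placeholders.
import pymysql
from elasticsearch import Elasticsearch, helpers

db = pymysql.connect(host="localhost", user="app", password="secret",
                     database="mydb", cursorclass=pymysql.cursors.DictCursor)
es = Elasticsearch(["http://localhost:9200"])

def generate_actions():
    with db.cursor() as cursor:
        cursor.execute("SELECT * FROM my_table")   # 1. read from the database
        for row in cursor:
            yield {                                # 2. each row becomes a JSON document
                "_index": "my_index",
                "_id": row["id"],                  # assumes an `id` primary key
                "_source": row,
            }

helpers.bulk(es, generate_actions())               # 3. bulk-send to Elasticsearch
```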
Logstash can help with that. But I'd recommend modifying the application layer if possible and sending data to Elasticsearch in the same "transaction" as you send your data to the database.
I shared most of my thoughts there: http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/
Also have a look at this "live coding" recording.
Side note: If you want to run Elasticsearch, have a look at Cloud by Elastic, also available if needed from the AWS Marketplace, Azure Marketplace and Google Cloud Marketplace.
Cloud by Elastic is one way to have access to all the features, all managed by us. Think of what is already there, like Security, Monitoring, Reporting, SQL, Canvas, Maps UI, Alerting and the built-in solutions named Observability, Security, Enterprise Search, and what is coming next :) ...
Disclaimer: I'm currently working at Elastic.
Keep a column that indicates when the row was last modified; then you will be able to push updates to OpenSearch. Similarly for deletes, have a column indicating whether the row is deleted (a soft delete), and the date it was deleted.
With this database design, you can send "delete" or "update" actions to OpenSearch/Elasticsearch to update or delete the indexed documents based on the last modified/deleted date. You can later have a scheduled maintenance job to permanently delete these rows from the database table.
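As a rough sketch of that incremental approach (the last_modified and is_deleted column names, the sync timestamp, and the client library are assumptions for illustration; the same pattern works with the opensearch-py client):

```python
# Rough sketch of an incremental sync driven by soft-delete columns.
# Assumes hypothetical `last_modified` and `is_deleted` columns and an
# externally stored "last sync time"; all names are placeholders.
import pymysql
from elasticsearch import Elasticsearch, helpers  # opensearch-py offers the same pattern

db = pymysql.connect(host="localhost", user="app", password="secret",
                     database="mydb", cursorclass=pymysql.cursors.DictCursor)
es = Elasticsearch(["http://localhost:9200"])

def changed_rows(since):
    with db.cursor() as cursor:
        cursor.execute(
            "SELECT * FROM my_table WHERE last_modified > %s", (since,))
        for row in cursor:
            if row["is_deleted"]:
                # Soft-deleted rows become delete actions against the index.
                yield {"_op_type": "delete", "_index": "my_index", "_id": row["id"]}
            else:
                # New or updated rows are (re)indexed.
                yield {"_op_type": "index", "_index": "my_index",
                       "_id": row["id"], "_source": row}

# raise_on_error=False tolerates deletes of documents that were never indexed.
helpers.bulk(es, changed_rows(since="2024-01-01 00:00:00"), raise_on_error=False)
```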
Lastly, this article might be of help to you: How to keep Elasticsearch synchronized with a relational database using Logstash and JDBC.

What hooks does couchbase sync gateway provide for sync?

Is it possible to use Couchbase Sync Gateway in the following way:
1) Mobile client queries Couchbase for data.
2) No data is present in Couchbase, so this triggers an import of the needed data from, for example, a MySQL database into Couchbase.
3) The imported data is then transferred to the mobile client by Couchbase Sync Gateway.
4) The mobile client goes to sleep.
5) After 12 hours of inactivity the data is removed from Couchbase.
6) The next day the mobile client still holds the data offline and syncs again with Sync Gateway.
7) The data is again imported into Couchbase Server and the diffs are synced with the client.
Does Couchbase provide hooks to implement such a flexible use case?
If yes, could somebody point me to the important API calls?
Many Greetings
The preferred way to do this would be to run most things through Sync Gateway (the data imports from the external source in particular should go through Sync Gateway, not directly to Couchbase, and removing the data should go through SG as well).
Sync Gateway's sync function runs when SG receives documents. In this sense, there's no way to trigger something based on nothing being there.
One way you might solve this is by having the mobile client push a special purpose document. Your sync function could catch this and react in several ways (fire a webhook request, start a replication, or you could set up something to monitor a changes feed and trigger from that).
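As one hedged illustration of the "monitor a changes feed" option, a small watcher against Sync Gateway's REST _changes endpoint could look like the following; the host, default public port 4984, database name, and the "import-request" document type are assumptions for the sketch:

```python
# Rough sketch: watch Sync Gateway's _changes feed for a special-purpose
# "trigger" document pushed by the mobile client, and react to it
# (for example, by kicking off an import from the external source).
import requests

SG_URL = "http://localhost:4984/db"   # Sync Gateway public REST API (default port 4984)

def run_import(trigger_doc):
    # Placeholder: fetch the requested data from the external source and
    # write it back through Sync Gateway's REST API (PUT /db/<docid>).
    print("import requested:", trigger_doc)

def watch_changes():
    since = None
    while True:
        params = {"feed": "longpoll", "include_docs": "true"}
        if since is not None:
            params["since"] = since
        body = requests.get(SG_URL + "/_changes", params=params, timeout=90).json()
        for change in body.get("results", []):
            doc = change.get("doc") or {}
            if doc.get("type") == "import-request":   # hypothetical trigger document
                run_import(doc)
        since = body.get("last_seq", since)
```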
Next you have the issue of removing the data on the Server side. Here the question is a little unclear. Typically applications write new revisions to SG, and these get synced to the client (and vice versa). If you remove everything on the Server side, you'll actually end up with what are called tombstone revisions showing the document as deleted. (This is a result of the flexible conflict resolution technique used by Couchbase Mobile. It uses multiversion concurrency control.)
The question is a little unclear. It sounds like you don't want to store the data long term on the Server side. If that's right, I think you could do something like:
Delete the data (through SG)
Have the mobile client push data to SG
Trigger SG again with some special document
Update the data from the external source
Have the client pull updates from SG
That's a very rough outline. This is too complicated to really work out in this format. I suggest you post questions through the Couchbase developer forum to get more details.
So, the short answer, yes, this seems feasible, but a full answer needs more detail on what you're doing and what your constraints are.

couchbase document read date/time

I want to implement a feature that shows when a Couchbase document was last read.
Is this saved by default in the document metadata in Couchbase, or do I need to update the document with a field on every read so it can be retrieved later on?
There's nothing like that in the metadata, you'd have to update the document yourself.
Side note: For writes/updates, you could have made use of the auditing annotation feature of Spring Data (supported by Spring Data Couchbase since SDC 2.1.1) but not for reads.
Also note that performance will suffer, as you'd effectively have to perform a write for each read. There are also potential consistency side effects: what if a write of the same document is already happening in parallel?
To implement this, if you can wait for Couchbase Server 4.5, you may want to consider using the sub-document API (see this blog).
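If you do track it yourself, a minimal sketch with the Couchbase Python SDK 2.x might look like this; the last_read_at field name and bucket are only examples, and note that it turns every read into a read plus a small write:

```python
# Rough sketch: record a "last read" timestamp yourself on every read,
# using the sub-document API (Couchbase Server 4.5+, Python SDK 2.x) so
# only one field is written instead of the whole document.
from datetime import datetime, timezone

from couchbase.bucket import Bucket
import couchbase.subdocument as SD

bucket = Bucket("couchbase://localhost/default")

def get_and_touch_read_time(key):
    result = bucket.get(key)                          # the actual read
    bucket.mutate_in(key, SD.upsert("last_read_at",   # small sub-document write
                                    datetime.now(timezone.utc).isoformat()))
    return result.value
```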

Could I use Couchbase for token generation and validation systems?

I am designing a high-volume service which essentially works as a token creation and validation service. Today we use SQL-based databases, which are quickly failing to scale, and the alternative we are looking at is Couchbase Server (memcached). However, the token generated by this service is sent to other services, where it will be used for authentication. If the replication is not fast enough, the authentication fails. Is there a simpler means to achieve this via code? Any other alternatives are also welcome. This seems to us to be a "read-your-own-write" use case.
There is no consistency API for XDCR right now. It's probably somewhere on the dev roadmap, because it's one of the commonly requested features.
If you want to get RYOW consistency across data centers, your only option is to write to all the DCs simultaneously from the application code. Of course, it's not atomic, but you can work around that somewhat by waiting for all the clusters to acknowledge the write before proceeding.
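A rough sketch of that dual-write idea with the Couchbase Python SDK (the cluster addresses and bucket name are placeholders, and, as noted, this is not atomic):

```python
# Rough sketch: write the token to both data centers from the application,
# and only treat it as issued once both clusters have acknowledged the write.
# Cluster addresses and bucket names are placeholders.
from couchbase.bucket import Bucket

dc1 = Bucket("couchbase://dc1.example.com/tokens")
dc2 = Bucket("couchbase://dc2.example.com/tokens")

def store_token(token_id, token_doc):
    # Upsert to both clusters; if either raises, the token is not considered issued.
    results = [bucket.upsert(token_id, token_doc) for bucket in (dc1, dc2)]
    # Both writes were acknowledged, so either data center can validate the
    # token immediately (read-your-own-write across DCs, at the app level).
    return all(r.success for r in results)
```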
I think you might be misunderstanding what replication is for in Couchbase. Couchbase is strongly consistent for standard data operations. If you write a document or key/value pair, you can immediately read that object. No waiting for replication. The use case you are talking about is a very common one for Couchbase.
The only things that are eventually consistent in Couchbase are XDCR, for obvious reasons, and reading from views. Even then, this can be minimized with proper cluster sizing.

Couchbase - what happens if a node dies after writing data to disk but before it gets replicated

Here is the scenario.
I have two nodes in my Couchbase cluster, Node A and Node B. I have replication on, so B will act as the node where the replicated data of A goes.
Let's say I add a new record and it happens to get saved on Node A. Node A saves this data in RAM and on its disk successfully, but unfortunately it crashes before this data can be replicated to Node B.
If I have configured automatic failover, then all requests for Node A's data will now go to Node B.
My question is: will I be able to get this new data, which could not be replicated to Node B but was successfully written to Node A's disk, considering that Node A is down and all I have is Node B to communicate with?
If yes, please explain how. If no, is there any official Couchbase documentation mentioning this behavior?
I tried looking for an answer in the official documentation, and it mostly looks like the answer is no, but I thought I'd discuss it here before concluding that it's data loss for sure.
Thanks in advance
In the scenario you described, yes, the data will not be available, assuming you didn't check that the data had been successfully replicated. Note, however, that replication will typically complete before persistence, as the network is typically faster than disk.
Couchbase provides an observe API which allows you to verify that a particular mutation has been replicated and/or persisted. See Monitoring data using observe in the Couchbase developer guide.
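For example, with the Python SDK 2.x you can attach durability requirements to the write itself so it doesn't return until the mutation has reached a replica and/or disk; the exact keyword names vary across SDKs, so treat this as a sketch:

```python
# Rough sketch: require the mutation to be replicated (and persisted) before
# the write call returns, so a single-node failure cannot silently lose it.
# Keyword arguments are from the 2.x Python SDK; other SDKs differ.
from couchbase.bucket import Bucket
from couchbase.exceptions import CouchbaseError

bucket = Bucket("couchbase://localhost/default")

def safe_write(key, doc):
    try:
        # replicate_to=1: at least one replica has the mutation in memory.
        # persist_to=1:   at least one node has written it to disk.
        bucket.upsert(key, doc, replicate_to=1, persist_to=1)
        return True
    except CouchbaseError:
        # Durability requirement not met (e.g. replica unreachable); the
        # application decides whether to retry or surface the failure.
        return False
```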