What hooks does Couchbase Sync Gateway provide for sync?

Is it possible to use Couchbase Sync Gateway in the following way:
1) Mobile client queries Couchbase for data.
2) No data is present in Couchbase, so this triggers an import of the needed data from, for example, a MySQL database into Couchbase.
3) The imported data is then transferred to the mobile client by Couchbase Sync Gateway.
4) The mobile client goes to sleep.
5) After 12 hours of inactivity the data is removed from Couchbase.
6) The next day the mobile client still holds the data offline and syncs again with Sync Gateway.
7) The data is again imported to Couchbase Server and the diffs are synced with the client.
Does Couchbase provide hooks to implement such a flexible use case?
If yes, could somebody point me to the important API calls?
Many greetings

The preferred way to do this would be to run most things through Sync Gateway (in particular, the data imports from the external source should go through Sync Gateway, not directly to Couchbase Server, and removing the data should go through SG as well).
Sync Gateway's sync function runs when SG receives documents. In this sense, there's no way to trigger something based on nothing being there.
One way you might solve this is by having the mobile client push a special purpose document. Your sync function could catch this and react in several ways (fire a webhook request, start a replication, or you could set up something to monitor a changes feed and trigger from that).
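As a rough sketch (not the definitive design), the sync function could route such a hypothetical trigger document into a dedicated channel that an external import worker watches; the type value and channel names here are made up:
function (doc, oldDoc) {
  if (doc.type == "import-request") {
    // Hypothetical trigger document pushed by the mobile client.
    // Route it to a channel watched by an external import worker
    // (via a webhook event handler or the changes feed).
    channel("import-requests");
  } else {
    // Regular documents flow through your normal channel logic.
    channel(doc.channels);
  }
}
If you go the webhook route, SG's document_changed event handler could filter on the same doc.type condition and POST the trigger to your import service.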
Next you have the issue of removing the data on the Server side. Here the question is a little unclear. Typically applications write new revisions to SG, and these get synced to the client (and vice versa). If you remove everything on the Server side, you'll actually end up with what are called tombstone revisions showing the documents as deleted. (This is a result of the flexible conflict resolution technique used by Couchbase Mobile, which uses multiversion concurrency control.)
It sounds like you don't want to store the data long term on the Server side. If that's right, I think you could do something like:
Delete the data (through SG)
Have the mobile client push data to SG
Trigger SG again with some special document
Update the data from the external source
Have the client pull updates from SG
That's a very rough outline. This is too complicated to really work out in this format. I suggest you post questions through the Couchbase developer forum to get more details.
So, the short answer, yes, this seems feasible, but a full answer needs more detail on what you're doing and what your constraints are.


Using Couchbase SDK vs Sync Gateway API

I have a full deployment of Couchbase (Server, Sync Gateway, and Lite) and have an API, mobile app, and web app all using it.
It works very well, but I was wondering if there are any advantages to using the Sync Gateway API over the Couchbase SDK? Specifically, I would like to know whether Sync Gateway handles large numbers of operations better than the SDK, perhaps via an internal queue/cache system, but I can't seem to find definitive documentation on this.
At the moment the API uses the C# Couchbase SDK and we use Sync Gateway very little (really only for synchronising the mobile app).
First, some relevant background info:
Every document that needs to be synced over to Couchbase Lite (CBL) clients needs to be processed by the Sync Gateway (SGW). This is true whether a doc is written via the SGW API or whether it comes in via a server write (N1QL or SDK). The latter case is referred to as "import processing", wherein the document that is written to the bucket (via N1QL or the SDK) is read by SGW via the DCP feed. The document is then processed by SGW and written back to the bucket with the relevant sync metadata.
Prerequisite:
In order for SGW to import documents written directly via N1QL/SDK, you must enable "shared bucket access" and import processing, as discussed here.
Non-mobile documents:
If you have documents that are never going to be synced to the CBL clients, then the choice is obvious: use the server SDKs or N1QL.
Mobile documents (docs to sync to CBL clients):
Assuming you are on SGW 2.x syncing with CBL 2.x clients:
If you have documents written at the server end that need to be synced to CBL clients, then consider the following:
Server-side write rate:
If you are looking at writes on the server side coming in at sustained rates significantly exceeding 1.5K/sec (let's say 5K/sec), then you should go the SGW API route. While it's easy enough to do a bulk update via a server-side N1QL query, remember that SGW still needs to keep up and do the import processing (as discussed in the background).
That means if you are doing high-volume updates through the SDK/N1QL, you will have to rate limit them so that SGW can keep up (do batched updates via the SDK, as sketched below).
That said, it is important to consider that if SGW can't keep up with the write throughput on the DCP feed, you will see latency no matter how the writes happen (SGW API or N1QL).
If your sustained write rate on the server isn't expected to be significantly high, then go with N1QL.
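As a rough illustration (not an official recipe), a batched, rate-limited writer with the Node.js SDK 2.x might look like this; the credentials, bucket name, batch size, and the someDocs array of {id, ...} objects are all placeholders:
var couchbase = require('couchbase');
var cluster = new couchbase.Cluster('couchbase://localhost');
cluster.authenticate('app-user', 'password'); // RBAC credentials (CB 5.0+)
var bucket = cluster.openBucket('mobile-bucket'); // hypothetical bucket

// Upsert documents in modest batches with a pause between batches,
// so SGW's import processing can keep up with the DCP feed.
function writeBatched(docs, batchSize, pauseMs) {
  var batch = docs.splice(0, batchSize);
  if (batch.length === 0) return; // done
  var pending = batch.length;
  batch.forEach(function (doc) {
    bucket.upsert(doc.id, doc, function (err) {
      if (err) console.error('upsert failed for', doc.id, err);
      if (--pending === 0) {
        setTimeout(function () {
          writeBatched(docs, batchSize, pauseMs);
        }, pauseMs);
      }
    });
  });
}

writeBatched(someDocs, 500, 500); // roughly 1K docs/sec sustained; tune to taste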
Deletes Handling:
Does not matter. Under shared bucket access, deletes coming in via the SDK or the SGW API will result in a tombstone. Read more about it here.
SGW-specific config:
Naturally, if you are dealing with SGW-specific config, such as creating SGW users and roles, then you will use the SGW API for that.
Conflict Handling:
In 2.x, it does not matter; conflicts are handled on the CBL side.
Challenge with SGW API
Probably the biggest challenge in a real-world scenario is that using the SG API path means either storing information about SG revision IDs in the external system, or performing every mutation as a read-then-write (since there is no way to PUT a document without providing a revision ID).
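To make the read-then-write concrete, here is a sketch against the SG public REST API on port 4984; the host, database, and document names are placeholders, and fetch stands in for your HTTP client of choice:
var url = 'http://localhost:4984/db/mydoc'; // hypothetical SG endpoint

// 1. GET the document to learn its current _rev.
fetch(url)
  .then(function (res) { return res.json(); })
  .then(function (doc) {
    doc.state = 'updated'; // 2. mutate the body
    // 3. PUT the new revision, naming the parent rev just read.
    return fetch(url + '?rev=' + doc._rev, {
      method: 'PUT',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(doc)
    });
  });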
The short answer is that for backend operations, the Couchbase SDK is your choice, and it will perform much better. Sync Gateway is meant to be used by mobile clients, with a few exceptions (*).
Bulk/Batch operations
In my performance tests using the Java Couchbase SDK and bulk operations from AsyncBucket (link), I have updated up to 8 thousand documents per second. In .NET you can do batch operations too (link).
Sync Gateway also supports bulk operations, yet it is much slower because it relies on the REST API and it requires you to provide a _rev from the previous version of each document you want to update. This will usually result in the backend having to do a GET before doing a PUT. Also, keep in mind that Sync Gateway is not a storage unit. It just works as a proxy to Couchbase, managing mobile client access to segments of data based on the channels registered for each user, and it writes all of its metadata documents into the Couchbase Server bucket, including channel indexes, user registrations, document revisions, and views.
Querying
Views are indexed, so queries over large data sets may respond very fast. Whenever a document is changed, the map function of every view has the opportunity to map it. However, when a view is created through the Sync Gateway REST API, some code is added to your map function to handle user channels/permissions, making it slower than plain code created directly in the Couchbase Admin UI. Querying views with compound keys using startKey/endKey parameters is very powerful when you have hierarchical data, but this functionality and the use of a reduce function are not available to mobile clients.
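For instance, a map function emitting a compound key over hierarchical data (the field names here are invented) lets you select a whole subtree with startKey=["electronics"] and endKey=["electronics", {}]:
function (doc, meta) {
  // Compound key [category, subcategory, created] supports
  // hierarchical range queries via startKey/endKey.
  emit([doc.category, doc.subcategory, doc.created], null);
}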
N1QL can be very fast too, when your query takes advantage of Couchbase indexes.
Notes
(*) One exception to the rule is when you want to delete a document and have this reflected on mobile phones. The DELETE operation leaves an empty document with a _deleted: true attribute, and can only be done through Sync Gateway. The next time the mobile device synchronizes and finds this hint, it will delete the document from local storage. You can also set this attribute through a PUT operation, and at the same time add an _exp: "2019-12-12T00:00:00.000Z" attribute to schedule a purge of the document at a future date, so that the server also gets cleaned up. However, just purging a document through Sync Gateway is equivalent to deleting it through the Couchbase SDK, and this won't be reflected on mobile devices.
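A sketch of that PUT through the SG REST API (host, database, document ID, and the currentRev variable are placeholders); the body carries the _deleted tombstone marker plus an _exp date for a later server-side purge:
fetch('http://localhost:4984/db/mydoc?rev=' + currentRev, {
  method: 'PUT',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    _deleted: true,                   // tombstone: clients delete their local copy
    _exp: '2019-12-12T00:00:00.000Z'  // purge the tombstone server-side on this date
  })
});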
NOTE: Prior to Sync Gateway 1.5 and Couchbase 5.0, all backend operations had to be done directly through Sync Gateway so that Sync Gateway and mobile clients could detect those changes. This changed when the shared_bucket_access option was introduced. More info here.

Couchbase Sync Gateway - Server and Client API vs bucket shadowing

I am working on a project that uses Couchbase Server and Sync Gateway to synchronize the contents of a bucket with iOS and Android clients running Couchbase Lite. I also need read and write access to the Couchbase Server from a Node.js server application. From the research I've done, using shadowing is frowned upon (https://github.com/couchbase/sync_gateway/wiki/Bucket-Shadowing), which led me to look into the Sync Gateway API as a means to update the bucket from the Node.js application. Updating existing documents through the Sync Gateway API appears to require the most recent revision ID of the document to be passed in, requiring a separate read before the modification (http://mobile-couchbase.narkive.com/HT2kvBP0/cblite-sync-gateway-couchbase-server), which seems potentially inefficient. What is the best way to solve this problem?
Updating a document (which is really creating a new revision) requires the revision ID. Otherwise Couchbase can't associate the update with a parent. This breaks the whole approach to conflict resolution. (Couchbase uses a method known as multiversion concurrency control.)
The expectation is that you're updating the existing contents of a document. This implies you've read the document already, including the revision ID.
If for some reason you don't need the old contents to update the document, you still need the revision ID. If you work around it (for example, by purging a document through Sync Gateway and then pushing your new version) you can end up with two versions of the document in the system with no connection, which will cause a special kind of conflict.
So the short answer is no, there's no way to avoid this (without causing yourself other headaches).
I am not sure why your question was downvoted, as it seems like a reasonable question. You are correct: the Couchbase bucket that is used by Sync Gateway is probably best thought of as "opaque"; you should not be poking around in there and changing things. There are a number of implementations of Couchbase Lite, such as ones for Java, .NET, and Mac OS X. Have you considered making a web service that, on one side, is serving your application, and on the other side is itself a Couchbase Lite client? You should be able to separate your data as necessary using channels.

Set TTL for documents in Couchbase Server

I want to set a TTL (time to live) at Couchbase Server for all documents continuously pushed from mobile devices through replication. I did not find any clear explanation/example in the documentation for doing this.
What should be the approach to set a TTL for documents coming from mobile devices to the Server through Sync Gateway?
Approach 1:
One approach is to create a view on the server side that emits createdDate as the key. We would query that view with today's date as the key, which would return today's documents, and then set a TTL for those documents. But how and when would we call this view, and is it a good approach?
Approach 2:
Should I do it using webhooks, where a listener reacts to document changes (creations) made through Couchbase Lite push replications, sets a TTL for the new documents, and saves them back to Couchbase Server?
Is there any better way to do it?
Also, what is the way to set a TTL for only specific documents?
EDIT: My final approach:
I will create the following view on Couchbase Server:
function (doc, meta) {
  emit(dateToArray(doc.createdDate));
}
I will have a job that runs daily at EOD, queries the view to get documents created today, and sets a TTL for those documents (see the sketch below).
In this way I would be able to delete documents regularly.
Let me know if there is any issue with it or there is any better way.
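A minimal sketch of that daily job with the Node.js SDK 2.x, assuming the view above lives in a design document named docs under the name by_created_date (the bucket name and the 24-hour TTL are likewise assumptions):
var couchbase = require('couchbase');
var cluster = new couchbase.Cluster('couchbase://localhost');
var bucket = cluster.openBucket('mybucket'); // hypothetical bucket
var ViewQuery = couchbase.ViewQuery;

// dateToArray() emits [year, month, day, ...], so a [y, m, d] prefix
// range selects everything created today.
var now = new Date();
var day = [now.getFullYear(), now.getMonth() + 1, now.getDate()];
var q = ViewQuery.from('docs', 'by_created_date')
  .range(day, day.concat([{}]), true);

bucket.query(q, function (err, rows) {
  if (err) throw err;
  rows.forEach(function (row) {
    // touch() sets the expiry without rewriting the document body.
    bucket.touch(row.id, 24 * 60 * 60, function (e) {
      if (e) console.error('touch failed for', row.id, e);
    });
  });
});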
Hopefully someone from the mobile team can chime in and confirm this, but I don't think the mobile SDK allows setting an expiry at the moment.
I guess you could add the creation/update time to your document and create a batch job that uses the "core" SDKs to periodically find old documents (either via N1QL or a dedicated view) and remove them.
It is not currently possible to set a TTL for documents via Sync Gateway the way you can with a Couchbase Server smart client. There is a conceptual impedance mismatch between Sync Gateway and native-style TTLs on documents. This is because the Sync Gateway protocol functions on the basis of revision trees, and even when a document is 'deleted', there is still a document in place to mark that a deletion took place.
I would also be wary of workloads that might require TTLs (e.g. a cache): Sync Gateway documents take up space even after they've been deleted, so your dataset may continue to grow unbounded.
If you do require TTLs, and if you do not think the dataset size will be an issue, then the best way would be to store a field in the document that represents the time the document should expire. Then you would do two things:
When accessing the document, if it has expired, delete it manually.
Periodically iterate over the _all_docs endpoint and delete any documents you find with an expiry time in the past (see the sketch below).
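A rough sketch of that periodic sweep against the SG admin REST API on port 4985 (the database name and the expiresAt field are assumptions):
var base = 'http://localhost:4985/db'; // SG admin API, hypothetical db

fetch(base + '/_all_docs?include_docs=true')
  .then(function (res) { return res.json(); })
  .then(function (result) {
    var now = Date.now();
    result.rows.forEach(function (row) {
      var doc = row.doc;
      // Delete any document whose stored expiry time has passed.
      if (doc && doc.expiresAt && Date.parse(doc.expiresAt) < now) {
        fetch(base + '/' + row.id + '?rev=' + row.value.rev, { method: 'DELETE' });
      }
    });
  });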
Couchbase does not delete a document the moment its TTL is reached; instead, when you access an (expired) document, Couchbase checks the expiry and removes the document at that point.
http://developer.couchbase.com/documentation/server/4.0/developer-guide/expiry.html

mySQL authoritative database - combined with Firebase

We have built a LAMP-stack API application via PHP Laravel. This currently uses a local mySQL instance. We have mostly implemented views in AngularJS.
In order to use Firebase, we need to sync data between the authoritative store in mySQL with anything relevant that exists on Firebase, as close to real-time as possible. This means that other parts of the app which are not real-time and don't use Firebase can also serve up fresh content that's very recently been entered into the system.
I know that Firebase is essentially a noSQL database in the cloud. My question is - how do I write a wrapper or a means to sync the canonical version of my Firebase into my database of record - mySQL?
Update to answer - our final decision - ditching Firebase as an option
We have decided against this, as we can easily have a socket.io instance on the same server with an extremely low latency connection to mySQL, so that the two can remain in sync. There's no need to go across the web when resources and endpoints can exist on localhost. It also gives us the option to run our app without any internet connection, which is important if we sell an on-premise appliance to large companies.
A noSQL sync platform like Firebase is really just a temporary store that makes reads/writes faster in semi-real-time. If they attempt to get into the "we also persist everything for you" business - that's a whole different ask with much more commitment required.
The guarantee on eventual consistency between mySQL and Firebase is more important to get right first - to prevent problems down the line. Also, an RDBMS is essential to our app - it's the only way to attack a lot of data-heavy problems in our analytics/data mappings - there are very strong reasons most of the world still uses an RDBMS like mySQL, etc. You can make those very reliable too - through Amazon RDS and Google Cloud SQL.
There's no specific problem beyond scaling real-time sync that Firebase actually solves for us, which other open source frameworks don't already solve. If their JS lib actually handled offline scenarios (when you START offline) elegantly, I might have considered it, but it doesn't do that yet.
So, YMMV - but in our specific case, we're not considering Firebase for the reasons given above.
The entire topic is incredibly broad, definitely too broad to provide a simple answer to.
I'll stick to the use-case you provided in the comments:
Imagine that you have a checklist stored in mySQL, comprised of some attributes and a set of steps. The steps are stored in another table. When someone updates this checklist on Firebase - how would I sync mySQL as well?
If you insist on combining Firebase and mySQL for this use-case, I would:
Set up your Firebase as a work queue: var ref = new Firebase('https://my.firebaseio.com/workqueue')
Have the client push a work item into Firebase: ref.push({ task: 'id-of-state', newState: 'newstate'})
Set up a (Node.js) server that:
monitors the work queue (ref.on('child_added'))
updates the item in the mySQL database
removes the task from the queue
See this GitHub project for an example of a work queue on top of Firebase: https://github.com/firebase/firebase-work-queue
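For illustration, here is a sketch of that worker using the legacy Firebase SDK from the snippets above and the node mysql driver; the connection settings and the checklist table/column names are made up:
var Firebase = require('firebase'); // legacy SDK, matching new Firebase(...) above
var mysql = require('mysql');

var ref = new Firebase('https://my.firebaseio.com/workqueue');
var db = mysql.createConnection({
  host: 'localhost', user: 'app', password: 'secret', database: 'app'
});

// For each queued work item: apply the change to mySQL,
// then remove the task from the queue.
ref.on('child_added', function (snapshot) {
  var item = snapshot.val();
  db.query(
    'UPDATE checklist_steps SET state = ? WHERE id = ?', // hypothetical schema
    [item.newState, item.task],
    function (err) {
      if (err) { console.error('mySQL update failed', err); return; }
      snapshot.ref().remove(); // acknowledged: take the task off the queue
    }
  );
});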

Core Data and JSON question

I know this question has been posed before, but the explanation was a little unclear to me; my question is a little more general. I'm trying to conceptualize how one would periodically update data in an iPhone app using a remote web service. In theory, a portion of the data on the phone would be synced periodically (only when updated), while other data would require the user to be online and would be requested on the fly.
Conceptually, this seems possible using XML-RPC or JSON and Core data. I wonder if anyone has an opinion on the best way to implement this, I am a novice iPhone developer, but I understand much of the process conceptually.
Thanks
To synchronize a set of entities when you don't have control over the server, here is one approach:
Add a touched BOOL attribute to your entity description.
In a sync attempt, mark all entity instances as untouched (touched = [NSNumber numberWithBool:NO]).
Loop through your server-side (JSON) instances and add or update entities from your Core Data store to your server-side store, or vice versa. The direction of updating will depend on your synchronization policy, and on which data is "fresher" on either side. Either way, mark added, updated, or synced Core Data entities as touched (touched = [NSNumber numberWithBool:YES]).
Depending on your sync policy, delete all entity instances from your Core Data store which are still untouched. Untouched entities were presumably deleted from your server-side store, because no addition, update or sync event took place between the Core Data store and the server for those objects.
Synchronization is a fair amount of work to implement and will depend on what degree of synchronization you need to support. If you're just pulling data, step 3 is considerably simpler because you won't need to push object updates to the server.
Syncing is hard, very hard. Ideally you would want to receive deltas of the changes from the server and then, using a unique ID for each record in Core Data, update only those records that are new or changed.
Assuming you can do that, then the code is pretty straightforward. If you are syncing in both directions, things get more complicated because you need to track deltas on both sides and handle collisions.
Can you clarify what type of syncing you are wanting to accomplish? Is it bi-directional or pull only?
I have an answer, but it's sucky. I'm currently looking for a more acceptable/reliable solution (i.e. anything Marcus Zarra cooks up).
What I've done needs some work ... seriously, because it doesn't work all the time...
The mobile device has a JSON catalog of entities, their versions, and a URL pointing to a JSON file with the entity contents.
The server has the same setup, the catalog listing the entities, etc.
Whenever the mobile device starts, it compares the entity versions of its local catalog with the catalog on the server. If any of those versions on the server are newer, it offers the user an opportunity to download the entity updates.
When the user elects to update, the mobile device now has the URL for each of the new/changed entities and downloads them. Once downloaded, the app will blow away all objects for each of the changed entities, and then insert the new objects from JSON. In the event of an error, the deletions/insertions are rolled back to the pre-update state.
This works, sort of. I can't catch it in a debug session when it goes awry, so I'm not sure what might cause corruption or inconsistency in the process.