In Couchbase, are CAS values always increasing?

We are building an application that tries to avoid some locking and synchronization. The design requires us to know whether the CAS value in Couchbase always increases, or otherwise changes in a predictable manner, for a specific key. So is the CAS for a key always increasing, always decreasing, or is it random?

CAS values are opaque. The only assumption you can make is that every mutation of a document will generate a new (different) CAS value.
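A minimal sketch of what that assumption allows: compare CAS values for equality only, and retry when they differ. `ToyStore`, `CasMismatch` and `increment` are hypothetical names for an in-memory stand-in, not the Couchbase SDK; the random CAS generation is an assumption made here to show that the caller never relies on ordering.

```python
import secrets


class CasMismatch(Exception):
    """Raised when the caller's CAS no longer matches the stored one."""


class ToyStore:
    """In-memory stand-in for a key-value store with CAS (not the Couchbase SDK)."""

    def __init__(self):
        self._data = {}  # key -> (value, cas)

    def _new_cas(self, old_cas=None):
        # Deliberately random: the only guarantee is "different from before".
        while True:
            cas = secrets.randbits(64)
            if cas != 0 and cas != old_cas:
                return cas

    def get(self, key):
        return self._data[key]

    def upsert(self, key, value):
        old = self._data.get(key)
        cas = self._new_cas(old[1] if old else None)
        self._data[key] = (value, cas)
        return cas

    def replace(self, key, value, cas):
        # Equality check only; never "greater than" or "less than".
        _, current = self._data[key]
        if cas != current:
            raise CasMismatch(key)
        return self.upsert(key, value)


def increment(store, key, retries=10):
    """Optimistic read-modify-write: retry if someone else mutated the key."""
    for _ in range(retries):
        value, cas = store.get(key)
        try:
            store.replace(key, value + 1, cas)
            return value + 1
        except CasMismatch:
            continue
    raise RuntimeError("too much contention")
```

A design that needs ordering between revisions would have to carry its own counter or timestamp inside the document rather than read one out of the CAS.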

Related

MySQL trigger notifies a client

I have an Android frontend.
The Android client makes a request to my NodeJS backend server and waits for a reply.
The NodeJS server reads a value from a MySQL database record (without sending it back to the client) and waits for that value to change (another Android client changes it with a different request in less than 20 seconds). When that happens, the NodeJS server replies to the client with the new value.
My approach was to create a MySQL trigger that notifies the NodeJS server when there is an update in that table, but I don't know how to do it.
To give you an idea, I thought of two easier ways that use busy waiting:
the client sends a request every 100ms and the server replies with the SELECT of that value; when the client gets a different reply, it means that the value changed;
the client sends a request and the server runs a SELECT query every 100ms until it gets a different value, then replies to the client with that value.
Both are brute-force approaches, and I would like to avoid them for obvious reasons. Any ideas?
Thank you.
Welcome to StackOverflow. Your question is very broad and I don't think I can give you a very detailed answer here. However, I think I can give you some hints and ideas that may help you along the road.
MySQL has no built-in way to run external commands as a trigger action. To my knowledge there is a workaround in the form of an external plugin (UDF) that allows MySQL to do what you want. See Invoking a PHP script from a MySQL trigger and https://patternbuffer.wordpress.com/2012/09/14/triggering-shell-script-from-mysql/
However, I think going this route is a sign of using the wrong architecture or wrong design patterns for what you want to achieve.
The first idea that comes to mind is this: would it not be possible to introduce some sort of messaging from the second NodeJS request (the one that changes the DB) to the first one (the one that needs an update when the DB value changes)? That way the first NodeJS "process" only needs to query the DB on real changes, when it receives a message.
Another question is whether you actually need to use MySQL, or whether some other datastore might be better suited. Redis comes to mind, since with Redis you could implement the messaging to NodeJS at the same time...
In general, polling is not always the wrong choice, especially in high-load environments where you expect each poll to collect some data. Polling makes it impossible to overload the processing capacity of the retrieving side, since that process controls the maximum throughput. With pushing you give that control to the pushing side, and if there are many pushing sides, control is hard to achieve.
If I were you, I would look into Redis and learn how elegantly its publish/subscribe mechanism can be used as a messaging system in your context. See https://redis.io/topics/pubsub
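The messaging idea can be sketched without any database polling. The following is a minimal in-process illustration in Python (the question uses NodeJS, and a multi-process setup would need something like Redis pub/sub instead); `ValueWatcher` and its methods are hypothetical names, and the 20-second default simply mirrors the time window mentioned in the question.

```python
import threading


class ValueWatcher:
    """Lets a waiting request sleep until another request changes the value."""

    def __init__(self, initial):
        self._value = initial
        self._cond = threading.Condition()

    def set(self, new_value):
        # Called by the request that changes the value (after it writes the DB).
        with self._cond:
            self._value = new_value
            self._cond.notify_all()

    def wait_for_change(self, seen, timeout=20.0):
        # Blocks until the value differs from `seen`, or until the timeout.
        # Returns the new value, or None if nothing changed in time.
        with self._cond:
            changed = self._cond.wait_for(
                lambda: self._value != seen, timeout=timeout
            )
            return self._value if changed else None
```

The first request would call `wait_for_change` with the value it read, and the second request would call `set` right after its UPDATE succeeds, so the database is only touched when something really changes.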

Global variables and sessions in asp.net

I'm new to web development, and coming from the world of java and android I have a few questions. (I'm using asp.net).
Let's assume I have a simple webpage with a label showing a number and a button. When any user presses the button, the number gets incremented automatically for all the users viewing the site, even if they do not refresh the page. Would I use sessions to achieve this, or is there another concept I should look into?
I have 2 types of counters which I store in a mysql table with the following schema.
Counter_ID Increment_Value
Each counter is active for a set amount of time and only one instance of a counter can be active at one point in time. After this time, the counter is reset to 0 and a new instance of the counter is created. I store all the instances which are active as well as past instances in a table with this schema.
Instance_ID Counter_ID Counter_Value Status(Active/Complete) Time_Remaining
When a user opens a page dedicated to one of the two counter types, the information about the currently running instance of that counter needs to be loaded. Would I just execute a SQL query to read the information for active counters every time the counter page is loaded, or is there a way to store this information on the site so that the site "knows" which instance is currently active and does not require a SQL query for each request (using a global variable concept)? Obviously, the situations described above are just simplified examples which I use to explain my issue.
You can use ApplicationState to cache global values that are not user-specific. In your first example, since the number is incremented for all users you can transactionally store it in the database whenever it is incremented, and also cache it in ApplicationState so that it can be read quickly when rendering pages on the server. You will have to be careful to ensure you are handling concurrency properly so that each time the number is incremented the Database AND the cache are updated atomically.
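As a rough illustration of the "update the database AND the cache atomically" point, here is a minimal sketch in Python rather than ASP.NET. `CounterCache`, `load` and `save` are hypothetical names; the two callables stand in for the real database read and transactional write, and the lock only protects a single server process.

```python
import threading


class CounterCache:
    """Write-through cache for one global counter, guarded by a lock."""

    def __init__(self, load, save):
        self._save = save              # persists a new value (e.g. a DB UPDATE)
        self._lock = threading.Lock()
        self._value = load()           # read once from the DB at startup

    def increment(self):
        with self._lock:
            new_value = self._value + 1
            # Persist first: if the save raises, the cached value is untouched.
            self._save(new_value)
            self._value = new_value
            return new_value

    def read(self):
        # Fast path for page rendering: no database round trip.
        with self._lock:
            return self._value
```

With several web servers behind a load balancer, an in-process lock like this is not enough, and the database (or a shared cache) has to be the point of coordination.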
It's a little unclear from your question, but if your requirement is to also publish changes to the number in real-time to all users who are currently using your website you will need to look at real-time techniques. Websockets are good for this (if available on the server and client browser). Specifically, on the .NET platform SignalR is a great way to implement real-time communication from server to client and with graceful fall-back in case WebSockets are not supported.
Just to be clear, you would not use Session storage for this scenario (unless I have misinterpreted your question). Session is per-user and should typically not affect other users in the system. Your example is all about global values so Session is not the correct choice in this case.
For your second example, using ApplicationState and transactional DB commits you should be able to cache which counter is currently active and switch them around at will provided you lock all your resources while you perform the switch between them.
Hopefully that's enough information to get you heading in the right direction.

How can I configure the number of vBuckets on Couchbase

This document from Couchbase says:
The hashing function used by Couchbase Server to map keys to vBuckets is configurable – both the hashing algorithm and the output space (the total number of vBuckets output by the function).
However, I did not find any way to configure it. In particular, I'm interested in the number of vBuckets, not the hashing algorithm. Could someone please help?
Thank you!
If you just want to make the rebalance go faster, you don't need to change the number of vBuckets itself; you can just change the number of vBuckets moved concurrently during the rebalance. By default, rebalance moves 1 vBucket at a time.
Here is how you can change this setting: http://docs.couchbase.com/admin/admin/Tasks/rebalance-behind-the-scenes.html
If you have views defined, you can also disable consistent view querying for the duration of the rebalance to speed it up. Remember to turn it back on after the rebalance finishes! http://docs.couchbase.com/admin/admin/REST/rest-cluster-disable-query.html
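For context on the passage quoted in the question, the "output space" of a key-to-vBucket mapping can be illustrated with a generic hash-then-modulo sketch. This is not Couchbase's actual hashing function; `vbucket_for` and the use of `zlib.crc32` with a plain modulo are assumptions chosen only to show how the number of vBuckets bounds the result.

```python
import zlib


def vbucket_for(key: str, num_vbuckets: int) -> int:
    """Map a key to a bucket index in [0, num_vbuckets). Illustrative only."""
    if num_vbuckets <= 0:
        raise ValueError("num_vbuckets must be positive")
    # Same key and same num_vbuckets always give the same index.
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets
```

Changing `num_vbuckets` changes where existing keys land, which is one reason the value is not something to adjust casually on a cluster that already holds data.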

xdcr replication of identical data

I will be using Couchbase as the database for my website. I plan for the website to be international, so I will probably have datacenters in the USA, Europe and Australia to keep latency low. I also want to minimise bandwidth between datacenters, so I am planning to fire off parallel updates (AJAX) to all datacenters whenever the user stores data.
My question is then: if I insert the same data into all three clusters approximately simultaneously, is Couchbase smart enough to recognize that this data is identical and therefore does not need replicating between datacenters?
I watched this video, and he explained that the CAS value is updated when a document is updated and that this is used to determine which documents require replication. If the CAS value is updated when any document on the cluster is updated, then my guess is that the answer is "no", as it is very likely that I may be sending only some data to all 3 clusters at once, and any data which is sent to only one cluster will get the CAS temporarily out of sync for that cluster. However, if the CAS value is independent per document, then the answer may be "yes". Maybe there are some options which can be altered to make the CAS value independent per document?
Couchbase does not know anything about the body of the documents that you store. From its perspective, if you write the same document to 3 clusters (all linked bi-directionally with XDCR), it considers them 3 different mutations of the document with that ID. Couchbase will perform its normal conflict resolution process to choose which of the 3 is the "winner". This will result in the "winning" document being transferred to the other two clusters, despite the fact that it may have exactly the same content as the "losing" revisions.
Anytime you write to the same document ID in different clusters, you have to be aware that conflict resolution will choose the winning revision. If you're not careful you can overwrite data you didn't mean to.
Typically a different approach is chosen for your use case. For each user, a "home" cluster is chosen, probably based on geography. All operations for that user are tied to this cluster. If that cluster is down, you can switch to another cluster. Using this approach you avoid writing to multiple clusters, and you would only change clusters under well-defined conditions.
The CAS value is just an opaque identifier of the revision. In your example above, all 3 document writes would end up with different CAS values (which is one of the reasons Couchbase sees them as different and has to choose a winner).
The conflict resolution process is documented in this section of the manual.
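The "home cluster with fallback" idea can be sketched as a small routing function. The region names, the `PREFERENCE` table and `choose_cluster` are hypothetical, and how cluster health is detected is left out; the sketch only shows that every write for a user goes to exactly one cluster at a time.

```python
# Hypothetical fallback order per home region; the home cluster comes first.
PREFERENCE = {
    "us": ["us", "eu", "au"],
    "eu": ["eu", "us", "au"],
    "au": ["au", "us", "eu"],
}


def choose_cluster(home, healthy):
    """Return the single cluster that should receive this user's operations.

    `home` is the user's home region; `healthy` is the set of regions
    currently believed to be up (health checking is assumed to exist elsewhere).
    """
    for name in PREFERENCE[home]:
        if name in healthy:
            return name
    raise RuntimeError("no cluster available")
```

XDCR then carries each write from that one cluster to the others, so the same document is not mutated in several places at once.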

Database strategy for synchronization based on changes

I have a Spring+Hibernate+MySQL backend that exposes my model (8 different entities) to a desktop client. To stay synchronized, I want the client to regularly ask the server for recent changes. The process may be as follows:
Point A: The client connects for the first time and retrieves all the model from the server.
Point B: The client asks the server for all changes since Point A.
Point C: The client asks the server for all changes since Point B.
To retrieve the changes (points B and C) I could create an HQL query that returns all rows in all my tables that have been modified since my previous retrieval. However, I'm afraid this could be a heavy query and degrade performance if executed often.
For this reason I was considering other alternatives, such as keeping a separate table with recent updates for fast access. I have looked into using the L2 query cache, but it doesn't seem to serve my purpose.
Does someone know a good strategy for my purpose? My initial thought is to keep control of synchronization and avoid using "automatic" synchronization tools.
Many thanks
You can store changes in a queue table. Triggers can populate the queue on insert, update and delete. This preserves the order of the changes, for example insert, update, update, delete. Empty the queue after download.
Emptying the queue would cause issues if you have multiple clients, so you may need to think about a design that handles that case.
There are several designs you can go with, all with trade-offs. I have used the queue design before, but it was only copying data to a single destination, not multiple.