Couchbase consistency

When does Couchbase Server respond to a set command:
when the data has been saved to the single (master) node, or
when the data has been saved to all nodes?

The answer will vary slightly by the client library you're using. But generally speaking, a positive result from calling set implies only that there were no I/O or other errors between the client and server. In such a case, the data should be safely in memory on the master node for a given key.
In 2.0, Couchbase Server and the respective client libraries will support the Observe method, which will allow for durability checks. Calling observe, you'll be able to ask questions like:
Is a key in memory on its master node?
Is a key persisted to disk on its master node?
Has a key been replicated in memory?
Has a key been persisted to its replicas?
For more on Observe, see http://www.couchbase.com/wiki/display/couchbase/Observe.
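For illustration, here is a minimal sketch of requesting durability at write time. It assumes the later 2.x-style Python SDK, where mutation calls accept persist_to and replicate_to arguments built on top of Observe; the exact call names, connection string, and exception types are assumptions and vary by SDK and version:

```python
# Assumed API: Couchbase Python SDK 2.x style (names vary by SDK/version).
from couchbase.bucket import Bucket

bucket = Bucket('couchbase://localhost/default')  # hypothetical connection string

try:
    # Only report success once the key is persisted on its master node
    # and held in memory on at least one replica (Observe under the hood).
    bucket.upsert('foo', {'bar': 42}, persist_to=1, replicate_to=1)
except Exception as err:  # e.g. a timeout before durability was reached
    print('write was accepted, but the durability requirement was not met:', err)
```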
One other point, just to make sure it's clear... Nodes in a Couchbase cluster are all peers in terms of responsibilities, but have master/slave relationships in terms of keys and the replication of those keys. In other words, key "foo" has a single master node, but may be replicated to other nodes as slave copies of that key.

The default behavior of Couchbase is to write data only to the primary node for a key. Writes to the replica node(s) happen asynchronously, in a peer-to-peer fashion. Also, even on the primary node a write operation initially only updates the cache (RAM); the data gets persisted to disk later.
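To make that flow concrete, here is a purely conceptual sketch (not Couchbase code) of what acknowledging a set means under this model: the value becomes visible in RAM immediately, while disk persistence and replication are merely queued to happen later:

```python
from collections import deque

# Conceptual model only: a RAM cache plus asynchronous queues.
ram_cache = {}
disk_write_queue = deque()                  # drained later by a flusher
replica_queues = {"replica-node": deque()}  # drained later by replication

def handle_set(key, value):
    ram_cache[key] = value                  # immediately visible to reads
    disk_write_queue.append((key, value))   # persisted eventually
    for queue in replica_queues.values():   # replicated eventually
        queue.append((key, value))
    return "SUCCESS"                        # success == accepted into RAM

handle_set("foo", {"bar": 42})
```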

Related

Couchbase RAM quota and vBucket detail questions

I have a cluster which includes three nodes. We created a bucket and set the number of bucket replicas to 2. The RAM quota is set to 10 GB per node, that is, a total RAM quota of 30 GB.
I used a client to save data into this bucket. Hours later, the client printed a Temporary failure error, and the Couchbase web console showed that the bucket's RAM usage had reached 29 GB. Repeated compaction of the data did not reduce the RAM usage any further.
My questions are as follows:
1. I guess the keys in a bucket can only be kept in RAM and not on disk; is that right or wrong?
2. Is the 29 GB of data that cannot be written out to disk made up of keys, or not?
3. Is the replica information a node holds for other nodes kept on disk or not? If not, how is it kept?
4. Every time the client saves data, it uses a hash function to work out the vBucket and thereby which node the data will be saved on. Is this process carried out on the client side?
In response to your specific questions:
1. I guess the keys in a bucket can only be kept in RAM and not on disk; is that right or wrong?
If by hardware you mean disk, then yes: currently Couchbase must hold all document keys (along with some additional metadata) in RAM. This is to ensure that any request for a key can be answered immediately, both in the positive ("yes, this key exists and here's its value") and the negative ("no, such a key doesn't exist").
2. Is the 29 GB of data that cannot be written out to disk made up of keys, or not?
Some of this is probably the metadata. If you go to the Bucket tab and display a bucket's statistics by clicking on its name, you can see the amount of memory used - specifically under the VBucket Resources tab, to see how much is used for metadata and user data. See the Couchbase Admin Guide - Viewing Bucket and cluster statistics for more details.
3. Is the replica information a node holds for other nodes kept on disk or not? If not, how is it kept?
The replica metadata is also always kept in RAM, but the replica values (like active values) can be ejected to disk to free up memory.
4. Every time the client saves data, it uses a hash function to work out the vBucket and thereby which node the data will be saved on. Is this process carried out on the client side?
Yes, the vBucket hashing is done on the client - see the Architecture and Concepts - vBuckets section in the Admin guide.
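As a rough illustration of that client-side step: the key is run through a CRC32-based hash to pick one of a fixed number of vBuckets (1024 by default), and the vBucket is then looked up in the cluster map the client maintains to find the active node. The bit manipulation below is only an approximation of what libvbucket does, and the map is a hypothetical stand-in for the real cluster configuration:

```python
import zlib

NUM_VBUCKETS = 1024  # Couchbase Server default

def vbucket_id(key: bytes) -> int:
    # CRC32 of the key folded down to a vBucket id; the real clients use a
    # specific variant of this, so treat it as illustrative only.
    crc = zlib.crc32(key) & 0xFFFFFFFF
    return ((crc >> 16) & 0x7FFF) % NUM_VBUCKETS

# Hypothetical slice of the vBucket -> node map the SDK keeps up to date.
vbucket_map = {vb: {"active": "node%d" % (vb % 3 + 1), "replicas": []}
               for vb in range(NUM_VBUCKETS)}

vb = vbucket_id(b"foo")
print(vb, vbucket_map[vb]["active"])  # the node the client contacts for "foo"
```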
In general you may want to review the Sizing chapter in the Admin guide to determine how much of your memory is being used for storing key metadata - specifically the Memory Sizing section. The exact calculation depends on the version of Couchbase (so I won't duplicate it here).

Couchbase loses data after cluster node failure simulation

I have 3 nodes in Couchbase cluster with number of replicas set to 1.
While performing a multithreaded insert of 1M documents, I restart one of the nodes a couple of times.
The result is that at the end of insert operations, I am missing about 15% of the data.
Any idea how to prevent the data loss?
Firstly, did you failover the node when it went out of the cluster? Until you failover, the replica on the other nodes will not be promoted to active (and hence any replica data will not be accessible).
Secondly, are you checking the return value from your insert operations? If a node is inaccessible (but before a failover) operations will return an exception (likely "timeout") - you should ensure the application retries the insert.
Thirdly, by default most CRUD operations on Couchbase return as soon as the update has occurred on the master node, for maximum performance. As a consequence, if you do lose a node it's possible that the replica hasn't been written yet - so there would be no replica even if you did perform a failover. To prevent this you can use the observe operation to not report the operation "complete" until a replica node has a copy - see Monitoring Items using observe.
Note that using observe will result in a performance penalty, but this may be an acceptable tradeoff for modifications you particularly care about.
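A hedged sketch of the second and third points together (check every result, retry on failure, and require a replica copy before treating the insert as done). It assumes a 2.x-style Python SDK; the exception handling is deliberately broad because the concrete exception types differ between SDKs:

```python
import time

def insert_with_retry(bucket, key, doc, retries=5, backoff=1.0):
    """Assumed API: bucket.insert(..., persist_to=..., replicate_to=...)
    as in the 2.x Python SDK. Retries cover the window where a node is
    down but has not yet been failed over."""
    for attempt in range(retries):
        try:
            # Don't report success until at least one replica has the item.
            return bucket.insert(key, doc, persist_to=0, replicate_to=1)
        except Exception:                     # e.g. timeout / temporary failure
            time.sleep(backoff * (attempt + 1))
    raise RuntimeError("gave up inserting %s after %d attempts" % (key, retries))
```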

How Couchbase achieves strong consistency

I searched for an explanation of how Couchbase achieves strong consistency inside a cluster. Is all of this a result of using Membase?
Couchbase IS Membase, by the way. Couchbase is both a product and a company; the company is a merger of NorthScale (Membase) and the CouchDB founders, and the resulting name for both the company and the product was Couchbase.
Update operations (replace and [forced] set) update the RAM cache first, and subsequent reads return the new value; this is the consistency model.
Couchbase is an "eventually persisted" (EP) architecture, where CRUD operations update RAM cache first and are inserted into the EP queue for disk i/o. At the same time, when replicas are configured, they go into replica queues and are transferred to the other nodes. The EP architecture is what allows for immediate consistency and super high throughput as disk i/o is the slowest component of all systems.
As WiredPrairie mentioned, a single node is responsible/active for a given key. The key is hashed, and the result of the hash is the particular partition it should live in. The partition->couchbase-node map, which the SDKs maintain, allows them to go directly to the active node for each partition. Again, this reduces latency: requests don't go through a load balancer (the architecture itself balances the load), nor through a "master" node (each node is a master), nor through a "shard master" whose job is to redirect clients to a particular shard. By bypassing all of those, latency is reduced to a minimum.
Couchbase guarantees strong consistency by enforcing that all reads for a particular piece of data go to a single node in a cluster. You cannot read from a replica. If you could, you might end up with inconsistent data.
When using the 2.0 XDCR, Couchbase provides only eventual consistency.
I wouldn't say it's a "result" of anything other than a specific design requirement they had for their software.
There's some additional information in this blog post.
I don't think this is strong consistency: if the node holding the active vbucket reboots while data has not yet been replicated or persisted, that data will be lost.
Strong consistency requires W + R > N. Here R = 1 (reads only go to the active node), so we would need W = N, meaning every copy would have to acknowledge the write before it is considered successful.
We could call it fake strong consistency.
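As a quick numerical check of that W + R > N argument for a bucket configured with one replica (so N = 2 copies):

```python
def is_quorum_consistent(w: int, r: int, n: int) -> bool:
    """Classic quorum condition for strong consistency: W + R > N."""
    return w + r > n

# Reads always hit the active node (R = 1) and a set is acknowledged once
# the active node has it (W = 1), so the condition fails:
print(is_quorum_consistent(w=1, r=1, n=2))  # False
# Only by waiting for every copy (W = N = 2, e.g. via observe) does it hold:
print(is_quorum_consistent(w=2, r=1, n=2))  # True
```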

Why is separate Write and Read better?

I can't understand why separating writes and reads is better than handling both writes and reads on one server.
For example, I have a MySQL cluster with three machines: node1, node2, node3.
One possible architecture is:
All write requests to node1, but all read requests to node2 and node3.
The second possible architecture is:
All of these three nodes handle both writes and reads.
We can see that in architecture one the write pressure on node1 is very high, so I prefer architecture two.
Also, why does MongoDB separate writes (to the primary node) from reads (on the secondary nodes)?
This is an issue of scale for both MySQL and MongoDB. In the simplest application, with a small dataset and low traffic volume, having all writes and reads go to one server gives you a simple architecture. In a very high-volume read application with a low volume of writes, a single write node replicating to one or more read nodes gives you the ability to scale your reads just by adding another node.
In a high read AND write volume application you might consider sharding (in MySQL you do it yourself or find a tool to help); in MongoDB you run mongos, which handles sharding for you. Sharding will put records on a specific instance based on some key that you define; the key determines the instance each record should be stored on.
You can imagine that sharding would be more complicated to manage than a single server for read/write access. You would be right, even in a case like MongoDB that does the sharding for you once you define a key (or just use the default key).
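For illustration only, here is a minimal read/write-splitting router of the kind architecture one implies. The connection objects and their execute method are hypothetical placeholders rather than a specific MySQL or MongoDB driver API:

```python
import random

class ReadWriteRouter:
    """Send writes to the single primary; spread reads across the replicas."""

    def __init__(self, primary, replicas):
        # `primary` and `replicas` are assumed to be already-opened
        # connections exposing an execute(sql, params) method (hypothetical).
        self.primary = primary
        self.replicas = replicas

    def execute_write(self, sql, params=()):
        return self.primary.execute(sql, params)

    def execute_read(self, sql, params=()):
        # Simple random balancing; real deployments often push this into a
        # proxy or into driver-level read preferences instead.
        return random.choice(self.replicas).execute(sql, params)
```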
MySQL Cluster also supports auto-sharding - by default hashing the primary key, but users can feed in their own keys to provide more distribution awareness. Each node in the cluster is a master, and internal load balancing will distribute the load across nodes.
While very high level, the short demo posted here introduces you to the concepts of sharding in MySQL Cluster:
http://www.oracle.com/pls/ebn/swf_viewer.load?p_shows_id=11464419

Does the MySQL NDB Cluster consider node distance? Will it use the replicas if they are nearer?

I'm building a very small NDB cluster with only 3 machines. This means that machine 1 will serve as MGM server, MySQL server, and NDB data node all at once. The database is only 7 GB, so I plan to replicate each node at least once. Now, if a query could use data that is cached in the NDB node on machine one, even when that node isn't the primary source for the data, access would be much faster (for obvious reasons).
Does the NDB cluster work like that? Every example I see has at least 5 machines. The manual doesn't seem to mention how to handle node differences like this one.
There are a couple of questions here:
Availability / NoOfReplicas
MySQL Cluster can give high availability when data is replicated across 2 or more data node processes. This requires that the NoOfReplicas configuration parameter is set to 2 or greater. With NoOfReplicas=1, each row is stored in only one data node, and a data node failure would mean that some data is unavailable and therefore the database as a whole is unavailable.
Number of machines / hosts
For HA configurations with NoOfReplicas=2, there should be at least 3 separate hosts. One is needed for each of the data node processes, each of which has a copy of all of the data. A third is needed to act as an 'arbitrator' when communication between the 2 data node processes fails. This ensures that only one of the data nodes continues to accept write transactions, and avoids data divergence (split brain). With only two hosts, the cluster will only be resilient to the failure of one of the hosts; if the other host fails instead, the whole cluster will fail. The arbitration role is very lightweight, so this third machine can be used for almost any other task as well.
Data locality
In a 2 node configuration with NoOfReplicas=2, each data node process stores all of the data. However, this does not mean that only one data node process is used to read/write data. Both processes are involved with writes (as they must maintain copies), and generally, either process could be involved in a read.
Some work to improve read locality in a 2-node configuration is under consideration, but nothing is concrete.
This means that when MySQLD (or another NdbApi client) is colocated with one of the two data nodes, there will still be quite a lot of communication with the other data node.