I searched for an explanation on how Couchbase achieves strong consistency inside a cluster. Is all of this as a result of using membase?
Couchbase IS membase btw. Couchbase is a product and a company, the company is a merge of NorthScale (Membase) and the CouchDB founders, and the resulting name for both company and product was Couchbase.
Update operations (replace and [forced] set) update RAM cache first, and subsequent reads are the new value, this is the consistency model.
Couchbase is an "eventually persisted" (EP) architecture, where CRUD operations update RAM cache first and are inserted into the EP queue for disk i/o. At the same time, when replicas are configured, they go into replica queues and are transferred to the other nodes. The EP architecture is what allows for immediate consistency and super high throughput as disk i/o is the slowest component of all systems.
As WiredPrairie mentioned, a single node is responsible/active for a given key. The key is hashed and the result of the hash is a particular partition it should live in. The partition->couchbase-node map, which the sdk's maintain, allows them to go directly to the active node for each partition. Again, this reduces latency because it doesn't have to go through a load balancer, (it's load balanced by the architecture itself), nor does it go through a "master" node, each node is a master, nor does it go through a "shard master" whose job is to redirect clients to a particular shard. By bypassing all those, latency is reduced to a minimum.
Couchbase guarantees strong consistency by enforcing that all reads for a particular piece of data go to a single node in a cluster. You cannot read from a replica. If you could, you might end up with inconsistent data.
When using the 2.0 XDCR, Couchbase provides only eventual consistency.
I wouldn't say it's a "result" of anything other than a specific design requirement they had for their software.
There's some additional information in this blog post.
I don't think it is strong consistency, since if one node with active vbucket reboot, data has not been replicated or not persisted yet, it will lose the data ;
strong consistency need W+R>N, and R=1 here, so we need W=N which means all replicas should be ACID;
we could call it fake strong consistency
Related
We have an enterprise application that uses an SQL database. The database access characteristics are about 90% reads. The data that does get updated or created needs to be up-to-date immediately. The cache needs to be correctly invalidated with high certainty. The entities are referred to by their primary key for 98% of the cases.
The application is based on Node.js and is AWS-native. Since the application is AWS-native, I'd like to rely on managed services from AWS rather than hosting my own. One option is to implement our read-through Redis-based cache. Upon retrieving the entities, we'd check the cache and if the data is not cached we'd put it into the cache before turning it to the user. The parts of the code that update those entities will invalidate the cache by primary key.
Generally speaking, in computer science cache coherency is one of the most challenging problems to get right. I am of the opinion that rather than implementing a Redis cache and thinking through all of the possible scenarios for correctly invalidating it, it is wiser to instead configure an Aurora read-replica specifically for reading frequently accessed entities. The RDBMS will do a much better job at caching than anything we can build ourselves.
So, I am facing two options -- go through the effort of implementing my own caching, or use read replicas. My personal opinion is to use a read replica.
Any advice is greatly appreciated, as always.
Yes, you're right, cache invalidation is a tough problem. The simplest solution is to add code to your data writes, to replace the cached values. So they're always current. But this is easy only if the cached values have a pretty much 1-to-1 correlation with rows in your database.
An advantage of your own cache is that you can cache data that is not 1-to-1 with rows of data in the database. You might cache an entire HTML fragment for a drop-down menu for example. That could be the result of several SQL queries. It could be quite an advantage to cache data that is higher up the "food chain" so to speak. But cache invalidation becomes less straightforward. Best for storing results of queries that don't change often.
Using a read-replica is not a substitute for using a cache. Querying a read-replica still has overhead of making a database connection, authentication, SQL query parsing and optimization, locking, and all the other overhead that goes into RDBMS workings.
Querying data from a cache can be orders of magnitude faster.
Both have their place. It's best to use both a cache and a read-replica for different tasks. I would also add message queues as an important technology. I believe database, cache, and queue form a three-legged stool.
But you must have experience and judgment to know when each is the best tool for a given case.
I had a cluster which inculdes three nodes. We created a bucket inside and set the number of bucket replicas to be 2. Besides the RAM quota is set to be 10G per node, that is, the total RAM quota is 30G.
I used client-side to save data into this bucket. Hours later, the client-side printed Temporary failure error. and Couchbase web console showed that the bucket RAM reached 29G.Repeated data compression but the RAM didn't reduce anymore.
My questions is organized as follows.
1, I guess the key in bucket can only be saved into the RAM but not in hardware, right or wrong?
2,Wheter the 29G data, which can not be compressed into hardware ,is key or not?
3,Wheter each node that saves others node's replica information is saved in hardware or not? If not, how could it be saved.
4,Every time the client-side saves data, it will make use of hash function to evaluate vbucket in order to judge which nodes that the data will be saved in. Is the process carried on the client-side?
In response to your specific questions:
1, I guess the key in bucket can only be saved into the RAM but not in hardware, right or wrong?
If by hardware you mean disk; then yes, currently Couchbase must hold all document keys (along with some additional metadata) in RAM. This is to ensure that any request for a key can be answered immediately, both in the positive ("yes, this key exists and here's it's value) and the negative ("no, such a key doesn't exist.)"
2,Wheter the 29G data, which can not be compressed into hardware ,is key or not?
Some of this is probably the metadata. If you go to the Bucket tab and display it's statistics by clicking on it's name, you can see the amount of memory used - specifically under the VBucket Resources tab to see how much is used for metadata and user data. See the Couchbase Admin Guide - Viewing Bucket and cluster statistics for more details.
3,Wheter each node that saves others node's replica information is saved in hardware or not? If not, how could it be saved.
The replica metadata is also always kept in RAM, but the replica values (like active values) can be ejected to disk to free up memory.
4,Every time the client-side saves data, it will make use of hash function to evaluate vbucket in order to judge which nodes that the data will be saved in. Is the process carried on the client-side?
Yes the vbucket hashing is done on the client - see the Architecture and Concepts - Vbuckets section in the Admin guide.
In general you may want to review the Sizing chapter in the Admin guide to determine how much of you memory is being used for storing key metadata - specifically the Memory Sizing section. The exact calculation depends on the version of Couchbase (and so I won't duplicate here).
I am curious to know, how couchbase server support high concurrency and high throughput.
It's a very broad question to answer but I'll try to cover some of the key reasons for why Couchbase is fast and scalable.
Writes in Couchbase are by default asynchronous,replication and persistence happen in the background, and the smart clients (SKD's) are notified of success or failure. So basically any new documents or mutations to documents are written to ram and then asynchronously flushed to disk in the background and replicated to other nodes. This means that there is no waiting time or contention on IO/disk speed. (This means it is possible to write to ram and then the node to fall over before the request has been persisted to disk or replicated to a secondary/third node). It is possible to make writes synchronously but it will slow down throughput considerably.
When dealing with ram, writes and read are VERY fast (we've only pushed our cluster to 20k operations a second) but large companies easily hit upwards of 400k operations a second. LinkedIN sustain this ops rate with only 4 nodes ---> http://www.couchbase.com/customer-stories
In traditional database architectures usually the setup would be a master DB (Mysql/Postgres/Oracle) coupled with a slave DB for data redundancy, also writes/reads can be split between the 2 as load gets higher. Couchbase is meant to be used as a distributed system (Couchbase recommend at least 3 nodes in production). Data is automatically sharded between the nodes in a cluster thus spreading the writes/reads across multiple machines. In the case of needing higher throughput, adding a node in Couchbase is as simple as clicking add node and then rebalance cluster, the data will be automatically partitioned across the new cluster map.
So essentially writing/reading from ram with async disk persistence + distributed reads and writes == high throughput
Hope that helps!
#scalabilitysolved already gave a great overview, but if you want a longer (and more detailed) description take a look at the Couchbase_Server_Architecture_Review on couchbase.com
I can't understand why separate write and read is better than write and read in one server.
For example, I have a mysql cluster with three machines: node1, node2,node3.
One possible architecture is:
All write requests to node1, but all read requests to node2 and node3.
The second possible architecture is:
All of these three nodes handle both writes and reads.
We can see in architecture one that the write to node1 pressure very huge, so I prefer architecture two.
Also, why does mongodb separates writes to primary node and reads to secondary nodes.
This is an issue of scale for both MySQL and MongoDB. In the simplest application with a small dataset and low traffic volume, having all writes and reads go to one server gives you a simple architecture. In a very high volume read application with low volume writes, a single write node replicating to more than one read nodes gives you the ability to scale your reads just by adding another node. In a high read AND write volume application you might consider sharding (in MySQL you do it yourself or find a tool to help), in mongodb you run mongos that handles sharding for you. Sharding will put records on a specific instance based on some key that you define. The key will determine the instance each record should be stored on. You can imagine that sharding would be more complicated to manage than a single server for read/write access. You would be right, even in a case like mongodb that does the sharding for you once you define a key (or just use the default key).
MySQL Cluster also supports auto-sharding - by default hashing the primary key, but users can feed their own keys in to provide more distribution awareness. Each node in the cluster is a master, and internal load balancing will distribute the loads across nodes
While very high level, the short demo posted here introduces you to the concepts of sharding in MySQL Cluster:
http://www.oracle.com/pls/ebn/swf_viewer.load?p_shows_id=11464419
Which are the performance considerations I should keep in mind when I'm planning an SQL Azure application? Azure Storage, and the worker and the web roles looks very scalable, but if at the end they are using one database... it looks like the bottleneck.
I was trying to find numbers about:
How many concurrent connections does
SQL Azure support?
Which is the bandwidth?
But no luck.
For example, I'm planning and application that uses a very high level of inserts, but I need return the result of an aggregate function each time (e.g.: the sum of all records with same key in a column), so I can not go with table storage.
Batching is an option, but time response is critical as well, so I'm afraid the database will be bloated with lot of connections.
Sharding is another option, but even when the amount of inserts is massive, the amount of data is very small, 4 to 6 columns with one PK and no FK. So even a 1Gb DB would be an overkill (and an overpay :D) for a partition.
Which would be the performance keys I should keep in mind when I'm facing these kind of applications?
Cheers.
Achieving both scalability and performance can be very difficult, even in the cloud. Your question was primarily about scalability, so you may want to design your application in such a way that your data becomes "eventually" consistent, using queues for example. A worker role would listen for incoming insert requests and would perform the insert asynchronously.
To minimize the number of roundtrips to the database and optimize connection pooling make sure to batch your inserts as well. So you could send 100 inserts in one shot. Also keep in mind that SQL Azure now supports MARS (multiple active recordsets) so that you can return multiple SELECTs in a single batch back to the calling code. The use of batching and MARS should reduce the number of database connections to a minimum.
Sharding usually helps for Read operations; not so much for inserts (although I never benchmarked inserts with sharding). So I am not sure sharding will help you that much for your requirements.
Remember that the Azure offering is designed first for scalability and reasonable performance in a multitenancy environment, where your database is shared with others on the same server. So if you need strong performance with guaranteed response time you may need to reevaluate your hosting choices or indeed test the performance boundaries of Azure for your needs as suggested by tijmenvdk.
SQL Azure will throttle your connections if any form of resource contention occurs (this includes heavy load but might also occur when your database is physically moved around). Throttling is non-deterministic, meaning that you cannot predict if and when this happens. When throttling, SQL Azure will drop your connection, requiring you to perform a retry. Number of connections supported and bandwidth is not published "by design" due to the flexible nature of the underlying infrastructure. Having said that, the setup is optimized for high availability, not high throughput.
If the bursts happen at a known time, you might consider sharding just during those bursts and consolidating the data after the burst has happened. Another way to handle this, is to start queueing/batching writes if and only if throttling occurs. You can use an Azure Queue for that plus a worker role to empty the queue later. This "overflow mechanism" has the advantage of automatically engaging if throttling occurs.
As an alternative you could use Azure Table Storage and keep a separate table of running totals that you can report back instead of performing an aggregation over the data to return the required sum of all records (this might be tricky due to the lack of locking on the tables though).
Apologies for stating the obvious, but the first step would be to test if you run into throttling at all in your scenario. I would give the overflow solution a try.