I have a cluster that is written to very often, but I only need that data replicated every so often.
1) Is it possible to throttle XDCR in Couchbase at all?
2) Can I write a custom trigger for when XDCR should be attempted?
Is it possible to throttle XDCR in Couchbase at all?
Yes, there are a number of settings to control XDCR, including reducing the number of "Replications per Bucket" from the default of 32 to something lower. This should reduce the XDCR bandwidth usage (at the expense of slower replication). See "Providing XDCR advanced settings" in the Couchbase Admin Guide for details on that and other settings.
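For example, here is a minimal sketch (Python with the requests library, assuming a Couchbase 2.x cluster) of lowering that setting through the REST API; the host, credentials, and target value are placeholders for your own cluster:

    import requests

    # Lower the global "max concurrent replications per bucket" XDCR
    # setting via the REST API (down from the default of 32).
    resp = requests.post(
        "http://cb-node:8091/settings/replications",
        auth=("Administrator", "password"),
        data={"xdcrMaxConcurrentReps": 8},
    )
    resp.raise_for_status()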
Can I write a custom trigger for when XDCR should be attempted?
No, this isn't possible at present (CB 2.5.1).
It is not said explicitly, but I suppose that Dataflow could use Persistent Disk resources.
In any case, I cannot find confirmation of that.
I wonder if I can assume that the limitations and expected performance when using Timers are equal to those given here: https://cloud.google.com/compute/docs/disks/performance
Dataflow uses Persistent Disks to store timers, but there is also a significant amount of caching involved, so performance should be better than just reading from Persistent Disks.
I want to run Couchbase on AWS EC2. Since my traffic is cyclic in nature, can I run Couchbase under auto-scaling? Since there are a lot of steps required to add/remove a node, I was wondering if this is the right approach. Has anybody tried it?
It has been done before. Here is a high-level list of the things you'd have to do:
1) Define which Couchbase metrics you need to base your scaling decisions on.
2) Create a script that gets those metrics from Couchbase and puts them into CloudWatch, using the Couchbase REST API or CLI (see the first sketch after this list).
3) Create an AMI with Couchbase installed and the OS configured.
4) Script the addition of one or more new nodes (using the Couchbase REST API or CLI), plus a rebalance, as a response to auto-scaling expansion (see the second sketch below).
5) Script the removal and rebalance of nodes (using the Couchbase REST API or CLI) as a response to contraction in auto-scaling.
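For step 2, a rough sketch (Python with requests and boto3; the host, bucket, credentials, and chosen stats are placeholders) might look like this:

    import boto3
    import requests

    # Pull a couple of bucket stats from the Couchbase REST API and push
    # them to CloudWatch as custom metrics for the auto-scaling policy.
    samples = requests.get(
        "http://cb-node:8091/pools/default/buckets/default/stats",
        auth=("Administrator", "password"),
    ).json()["op"]["samples"]

    boto3.client("cloudwatch").put_metric_data(
        Namespace="Couchbase",
        MetricData=[
            {"MetricName": "ops_per_sec",
             "Value": float(samples["ops"][-1]), "Unit": "Count/Second"},
            {"MetricName": "mem_used",
             "Value": float(samples["mem_used"][-1]), "Unit": "Bytes"},
        ],
    )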
Since you're relying on rebalances here, you will have to watch how long your rebalances take and perhaps tune your cluster (e.g. move more vBuckets at once, among other settings) and your usage of Couchbase for faster rebalances (e.g. if you have large views, they can have an effect on rebalances). Normally rebalances are meant to be a background process that takes as long as it takes, but that may not be appropriate in this particular use case. Only you can answer that.
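For steps 4 and 5, node addition and rebalancing can also be driven over the REST API. A hedged sketch of the scale-out half (hosts and credentials are placeholders; error handling and rebalance-progress polling are omitted):

    import requests

    CB = "http://cb-node:8091"
    AUTH = ("Administrator", "password")

    # 1. Join the freshly booted node (its IP would come from the
    #    auto-scaling event) to the cluster.
    requests.post(f"{CB}/controller/addNode", auth=AUTH,
                  data={"hostname": "10.0.0.42",
                        "user": "Administrator",
                        "password": "password"}).raise_for_status()

    # 2. Rebalance across all nodes the cluster now knows about.
    nodes = requests.get(f"{CB}/pools/default", auth=AUTH).json()["nodes"]
    requests.post(f"{CB}/controller/rebalance", auth=AUTH,
                  data={"knownNodes": ",".join(n["otpNode"] for n in nodes),
                        "ejectedNodes": ""}).raise_for_status()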
I searched for an explanation of how Couchbase achieves strong consistency inside a cluster. Is all of this a result of using Membase?
Couchbase IS Membase, by the way. Couchbase is both a product and a company; the company is a merger of NorthScale (Membase) and the CouchDB founders, and the resulting name for both company and product was Couchbase.
Update operations (replace and [forced] set) update the RAM cache first, and subsequent reads return the new value; this is the consistency model.
Couchbase is an "eventually persisted" (EP) architecture, where CRUD operations update RAM cache first and are inserted into the EP queue for disk i/o. At the same time, when replicas are configured, they go into replica queues and are transferred to the other nodes. The EP architecture is what allows for immediate consistency and super high throughput as disk i/o is the slowest component of all systems.
As WiredPrairie mentioned, a single node is responsible/active for a given key. The key is hashed, and the result of the hash determines the particular partition it lives in. The partition->couchbase-node map, which the SDKs maintain, allows them to go directly to the active node for each partition. Again, this reduces latency because a request doesn't have to go through a load balancer (the traffic is balanced by the architecture itself), nor through a "master" node (each node is a master), nor through a "shard master" whose job is to redirect clients to a particular shard. By bypassing all of those, latency is reduced to a minimum.
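A simplified illustration of that key-to-node mapping (the real SDKs use a CRC32-based hash over 1024 vBuckets; the cluster map below is a made-up stand-in for the one the SDK fetches from the cluster):

    import zlib

    NUM_VBUCKETS = 1024  # Couchbase default

    def vbucket_id(key: bytes) -> int:
        # Simplified form of the CRC32-based hash the SDKs use to map a
        # key to its partition (vBucket).
        return (zlib.crc32(key) >> 16) & (NUM_VBUCKETS - 1)

    # The SDK looks the vBucket up in its partition->node map and talks
    # to the active node directly; here, a fake 4-node map for illustration.
    cluster_map = {vb: f"node-{vb % 4}" for vb in range(NUM_VBUCKETS)}
    print(cluster_map[vbucket_id(b"user::1234")])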
Couchbase guarantees strong consistency by enforcing that all reads for a particular piece of data go to a single node in a cluster. You cannot read from a replica. If you could, you might end up with inconsistent data.
When using the XDCR introduced in 2.0, Couchbase provides only eventual consistency between clusters.
I wouldn't say it's a "result" of anything other than a specific design requirement they had for their software.
There's some additional information in this blog post.
I don't think it is strong consistency: if a node holding an active vBucket reboots before its data has been replicated or persisted, that data is lost.
Strong consistency requires W + R > N, and R = 1 here, so we would need W = N, which means every replica would have to durably acknowledge each write.
We could call it fake strong consistency.
I am using RDS on Amazon with a MySQL interface. My application runs on EC2 nodes and reads from/updates the database, but the volume of reads and writes is too high, and that reduces performance. Most of the time the number of connections exceeds the allowed limit.
I was considering using ElastiCache to improve performance; however, I did not find resources on the web explaining how to configure the database to use it effectively.
Is this the best way to improve my read/write performance?
Any suggestions?
You can't just "turn on" memcache. You need to write code that interacts with memcache, such that your database query results are cached in memcache. Take a look at this users guide -- I think it will give you a good idea for how memcache is used:
http://www.memcachier.com/documentation/memcache-user-guide/
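To make that concrete, here is a hedged sketch of the usual cache-aside pattern in Python (using the pymemcache and PyMySQL client libraries; the endpoints, credentials, and query are placeholders):

    import json

    import pymysql
    from pymemcache.client.base import Client

    cache = Client(("my-elasticache-endpoint", 11211))
    db = pymysql.connect(host="my-rds-endpoint", user="app",
                         password="secret", database="shop",
                         cursorclass=pymysql.cursors.DictCursor)

    def get_car(car_id):
        key = "car:%d" % car_id
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)        # cache hit: skip the database
        with db.cursor() as cur:
            cur.execute("SELECT * FROM cars WHERE id = %s", (car_id,))
            row = cur.fetchone()             # cache miss: query MySQL
        if row is not None:
            # Cache for 5 minutes; default=str handles datetime columns.
            cache.set(key, json.dumps(row, default=str), expire=300)
        return row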
Performance is related to the type and structure of the queries used, so there might be some room for optimization; maybe you can provide more details about the exact queries used. However, you could tackle this from a different angle: if you had an auto-scaling capability, you could simply scale your database out to additional machines as needed, so you could accommodate a far larger number of connections even without any query optimization (though optimizing will still improve performance). This is not possible on RDS, but there are at least two other cloud DB providers running on EC2 that I'm aware of which offer auto-scaling: www.xeround.com and www.enterprisedb.com.
You can use ElastiCache as a second-level cache for your RDS database. If you use Java, you can use the hibernate-memcached library. But you still need to configure how and what to cache in the second-level cache, depending on your data.
Additionally, you could use an RDS read replica to at least split off the read traffic, as sketched below.
(What instance type are you using? Note that different instance types have different I/O capacities.)
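A rough sketch of that read/write split (PyMySQL; both endpoints are placeholders, and replica reads can lag slightly behind the primary):

    import pymysql

    primary = pymysql.connect(host="mydb.xxxx.rds.amazonaws.com",
                              user="app", password="secret", database="shop")
    replica = pymysql.connect(host="mydb-replica.xxxx.rds.amazonaws.com",
                              user="app", password="secret", database="shop")

    def run_read(sql, params=()):
        # Reads go to the replica; note they may be slightly stale.
        with replica.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()

    def run_write(sql, params=()):
        # All writes (and read-after-write paths) stay on the primary.
        with primary.cursor() as cur:
            cur.execute(sql, params)
        primary.commit()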
I've been reading up about MySQL Cluster 7, and it appears that there is some support for a memcache storage engine.
Does the implementation require any custom code in the application (making requests to memcache), or is it integrated to the point where I could run
    SELECT cars.* FROM cars WHERE cars.id = 100;
and MySQL Cluster + memcache would be able to "automatically" look in the memcache cache first and, if there wasn't a hit, look in MySQL?
Likewise with updates - would I manually have to set the data in memcache with every modification, or is there a mechanism that will do it for me?
Memcached would not provide the functionality that you describe. Memcached is key-value storage, and it does not automatically cache any query results. You would need to write code to store the results. Some frameworks make this easier.
MySQL's query caching can cache query results, but you're still hitting MySQL.
MySQL's NDB cluster is a clustered in-memory storage engine that is able to serve up relational data very fast thanks to load balancing and partitioning.
Take a look at this blog to learn more about the implementation and capabilities of the memcached API for MySQL Cluster:
http://www.clusterdb.com/mysql-cluster/scalabale-persistent-ha-nosql-memcache-storage-using-mysql-cluster/
Essentially, the API is implemented as a plug-in to the memcached server which can then communicate directly with the data nodes, via memcached commands, without going through an SQL layer - giving you very fast native access to your data, with full persistence, scalability, write throughput, and schema or schemaless data storage.
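In other words, the application still speaks plain memcached protocol; only the target server changes. A minimal sketch (Python with pymemcache; the host and the key prefix are placeholders, and how prefixes map to NDB tables or columns is configured on the server side):

    from pymemcache.client.base import Client

    # Standard memcached client pointed at the Cluster-bundled memcached
    # server; sets and gets are served from the NDB data nodes directly.
    client = Client(("mysql-cluster-memcached-host", 11211))
    client.set("demo:car:100", "Ford Focus")
    print(client.get("demo:car:100"))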