Does the MySQL NDB Cluster consider node distance? Will it use the replicas if they are nearer?

I'm building a very small NDB Cluster with only 3 machines. This means that machine 1 will serve as MGM server, MySQL server, and NDB data node all at once. The database is only 7 GB, so I plan to replicate each node at least once. Now, since a query might end up using data that is cached in the NDB node on machine 1, even if it isn't the primary source for that data, access would be much faster (for obvious reasons).
Does the NDB Cluster work like that? Every example I see has at least 5 machines. The manual doesn't seem to mention how to handle node locality in a setup like this one.

There are a couple of questions here:
Availability / NoOfReplicas
MySQL Cluster can give high availability when data is replicated across 2 or more data node processes. This requires that the NoOfReplicas configuration parameter is set to 2 or greater. With NoOfReplicas=1, each row is stored in only one data node, and a data node failure would mean that some data is unavailable and therefore the database as a whole is unavailable.
Number of machines / hosts
For HA configurations with NoOfReplicas=2, there should be at least 3 separate hosts. One is needed for each of the two data node processes, each of which has a copy of all of the data. A third is needed to act as an 'arbitrator' when communication between the 2 data node processes fails. This ensures that only one of the data nodes continues to accept write transactions, and avoids data divergence (split brain). With only two hosts, the cluster is resilient to the failure of only one specific host; if the other host fails instead, the whole cluster fails. The arbitration role is very lightweight, so this third machine can be used for almost any other task as well.
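For illustration only, a minimal config.ini along these lines might look like the sketch below; the host names are placeholders and the memory figure is arbitrary:

    # Sketch of a 3-host layout; host1/host2/host3 are hypothetical names.
    [ndbd default]
    NoOfReplicas=2            # each fragment of data stored on 2 data nodes
    DataMemory=8G             # arbitrary; size it to fit the ~7 GB dataset

    [ndb_mgmd]
    HostName=host3            # management node doubles as the arbitrator

    [ndbd]
    HostName=host1            # data node 1 (can be colocated with mysqld)

    [ndbd]
    HostName=host2            # data node 2

    [mysqld]
    HostName=host1            # SQL node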
Data locality
In a 2 node configuration with NoOfReplicas=2, each data node process stores all of the data. However, this does not mean that only one data node process is used to read/write data. Both processes are involved with writes (as they must maintain copies), and generally, either process could be involved in a read.
Some work to improve read locality in a 2-node configuration is under consideration, but nothing is concrete.
This means that when MySQLD (or another NdbApi client) is colocated with one of the two data nodes, there will still be quite a lot of communication with the other data node.

Related

Galera MariaDB Multi-Master Replication

I am trying to set up a cluster of 3 servers in 3 different locations: Dallas (US), London (UK), and Mumbai (India). In each location I have set up a web server and a DB server. On the DB servers I have configured a Galera MariaDB multi-master cluster to replicate the DB among all three servers. Each of my web servers connects over a local IP to its regional DB server. I am expecting that my Dallas web server will fetch DB records from the Dallas DB server, the London web server from the London DB server, and the Mumbai web server from the Mumbai DB server.
Everything is working, but I have found that a MySQL query takes a long time, above 100s, to fetch records. I have tried MariaDB as a single instance and it fetches the data within 5s.
What am I doing wrong?
It is possible to configure Galera to be single-master. It does not sound like you did that, but I suggest double-checking.
Given that all nodes are writable, here's a simplified view of what happens on each transaction:
1. Do all the work to store/update the data on the master you are connected to. (Presumably, it is the local machine.)
2. At COMMIT time, make a single round trip to each of the other nodes (probably ~200ms) to give them a chance to say "wait! that would cause a conflict".
Usually step 2 comes back with "it's OK". At this point, the COMMIT returns success to the client.
(Note: If you are not using BEGIN ... COMMIT, but instead autocommit=ON, then there is an implicit COMMIT at the end of each DML statement.)
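For illustration, batching several writes into one explicit transaction pays that cross-node round trip once at COMMIT rather than once per statement; the table names here are made up:

    -- Hypothetical tables; a sketch of batching writes under Galera.
    BEGIN;
    INSERT INTO orders (order_id, item) VALUES (1, 'apples');
    INSERT INTO orders (order_id, item) VALUES (2, 'oranges');
    UPDATE inventory SET qty = qty - 1 WHERE item IN ('apples', 'oranges');
    COMMIT;   -- the single certification round trip (~200ms) happens here,
              -- instead of once per statement as with autocommit=ON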
For a local read, the default action is to return immediately.
But maybe you are concerned about the "critical read" problem (cf. wsrep_sync_wait). In this case, you want to make sure that a write has propagated to your server before you read. This is likely to add a ~200ms delay to the read, because it waits until the "gcache" is caught up.
If you can assume that clients only read from the same server they write to, consider setting wsrep_sync_wait=0. If anyone does a cross-datacenter write followed by a read, they could hit the "critical read" problem. (This is where someone writes something but may not see it on the next read.)
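For example, assuming reads and writes do stay on the same node, the session-level setting would look like this:

    -- Sketch: relax causal-read waiting when clients read where they write.
    SET SESSION wsrep_sync_wait = 0;   -- do not wait before reads
    -- If some sessions read on a different node than they wrote to,
    -- enforce read-your-writes for those sessions instead:
    SET SESSION wsrep_sync_wait = 1;   -- wait for pending write-sets before SELECTs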

Does Couchbase persist data?

I am planning to store data in a Couchbase cluster.
I would like to know what will happen if my Couchbase goes down in the following scenarios:
[Consider that there were no active transactions happening]
1. A node goes down in the cluster. (My assumption is that after the node is fixed and back up, it will sync with the other nodes and the data will be there.) Also, let me know whether there can still be data loss after it has synced up.
2. The whole cluster went down, then was fixed and restarted.
Please explain the data persistence behavior for the scenarios above.
Yes, Couchbase persists data to disk. It writes change operations to append-only files on the Data Service nodes.
Data loss is unlikely for your two scenarios because there are no active transactions.
Data loss can occur if a node fails:
- while persisting a change to disk, or
- before completing replication to another node, if the bucket supports replicas.
Example: Three Node Cluster with Replication
Consider the case of a three node Couchbase cluster and a bucket with one replica for each document. That means that a single document will have a copy stored on two separate nodes, call those the active and the replica copies. Couchbase will shard the documents equitably across the nodes.
When a node goes down, about a third of the active and replica copies become unavailable.
A. If a brand new node is added and the cluster rebalanced, the new node will have the same active and replica copies as the old one did. Data loss will occur if replication was incomplete when the node failed.
B. If the node is failed over, then replicas for the active documents on the failed node will become active. Data loss will occur if replication was incomplete when the node failed.
C. If the failed node rejoins the cluster it can reuse its existing data so the only data loss would be due to a failure to write changes to disk.
When the cluster goes down, data loss may occur if there is a disk failure.
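If you want to see how much data is still waiting to be persisted (and would therefore be at risk in a sudden failure), the cbstats tool exposes the disk-write queue. This is only a sketch; the bucket name is made up and the exact flags vary between Couchbase versions:

    # Sketch: check how many items still await disk persistence.
    cbstats localhost:11210 all -b mybucket | grep ep_queue_size
    # ep_queue_size = items queued for disk; 0 means everything is persisted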

Where can I find the clear definitions for a Couchbase Cluster, Couchbase Node and a Couchbase Bucket?

I am new to Couchbase and NoSQL terminology. From my understanding, a Couchbase Node is a single system running the Couchbase Server application, and a collection of such nodes holding the same data through replication forms a Couchbase Cluster.
Also, a Couchbase Bucket is somewhat like a table in an RDBMS, in which you put your documents. But how can I relate the Node to the Bucket? Can someone please explain it to me in simple terms?
A Node is a single machine (one IP/hostname) that runs Couchbase Server.
A Cluster is a group of Nodes that talk to each other. Data is distributed between the nodes automatically, so the load is balanced. The cluster can also provide replication of data for resilience.
A Bucket is the "logical" entity where your data is stored. It is both a namespace (like a database schema) and a table, to some extent. You can store multiple types of data in a single bucket; it doesn't care what form the data takes as long as it is a key and its associated value (so you can store users, apples, and oranges in the same Bucket).
The bucket gives the level of granularity for things like configuration (how much of the available memory do you want to dedicate to this bucket?), replication factor (how many backup copies of each document do you want on other nodes?), password protection...
Remember that I said Buckets were a "logical" entity? They are in fact divided into 1024 virtual fragments (vBuckets) which are spread across all the nodes of the cluster (that's how data distribution is achieved).
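To make that concrete, creating a bucket is where those per-bucket settings get fixed. A sketch with couchbase-cli, where host, credentials, bucket name, and sizes are all placeholders:

    # Sketch: create a bucket with a 256 MB RAM quota and one replica.
    couchbase-cli bucket-create -c localhost:8091 \
      -u Administrator -p password \
      --bucket mybucket \
      --bucket-type couchbase \
      --bucket-ramsize 256 \
      --bucket-replica 1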

MySQL cluster for dummies

So what's the idea behind a cluster?
1. You have multiple machines with the same copy of the DB, and you spread the reads/writes across them? Is this correct?
2. How does this idea work? When I issue a SELECT query, does the cluster analyze which server has the fewest reads/writes and point my query to that server?
3. When should you start using a cluster? I know this is a tricky question, but maybe someone can give me an example, like 1 million visits and a 100-million-row DB.
1) Correct, in a way. Not every data node necessarily holds a full copy of the cluster data, but every single bit of data is stored on at least two nodes.
2) Essentially correct. MySQL Cluster supports distributed transactions.
3) When vertical scaling is not possible anymore, and replication becomes impractical :)
As promised, some recommended readings:
Setting Up Multi-Master Circular Replication with MySQL (simple tutorial)
Circular Replication in MySQL (higher-level warnings about conflicts)
MySQL Cluster Multi-Computer How-To (step-by-step tutorial, it assumes multiple physical machines, but you can run your test with all processes running on the same machine by following these instructions)
The MySQL Performance Blog is a reference in this field
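For a taste of the Multi-Computer How-To on a single machine, a test session could look roughly like this; the paths are assumptions, and config.ini must define two [ndbd] slots with distinct data directories:

    # Sketch: all cluster processes running on one machine.
    ndb_mgmd -f /etc/mysql-cluster/config.ini     # start the management node
    ndbd -c localhost:1186                        # start data node 1
    ndbd -c localhost:1186                        # start data node 2
    mysqld --ndbcluster --ndb-connectstring=localhost:1186 &
    ndb_mgm -e show                               # verify that all nodes connected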
1 -> Your first point is correct in a way, but if multiple machines all held the same data, that would be replication rather than clustering.
In clustering, the data is divided among the various machines by horizontal partitioning: the split is based on rows, and the records are distributed among the machines by some algorithm.
The data is divided in such a way that each record gets a unique key, just as in a key-value pair, and each machine has a unique machine ID that determines which key-value pairs go to which machine.
Each machine is a cluster node, consisting of an individual mysql-server process, its own data, and a cluster manager. There is also data sharing between all the cluster nodes, so that all of the data is available to every node at any time.
Retrieval can go through memcached servers for fast access, and there can also be a replication server for a particular cluster to save the data.
2 -> Yes, that is possible, because all of the data is shared among the cluster nodes. You can also use a load balancer to balance the load; load balancers are quite common and sit in front of most busy servers. But if you are trying this just for your own knowledge, there is no need: you will not generate the kind of load that requires a load balancer, and the cluster manager itself can do the whole thing.
3 -> RandomSeed is right: you feel the need for a cluster when replication becomes impractical. If you use the master server for writes and a slave for reads, then at some point the traffic becomes so heavy that the servers can no longer work smoothly, and you will feel the need for clustering, simply to speed up the whole process.
This is not the only case, just one example scenario.
Hope this is helpful for you!

Why is separate Write and Read better?

I can't understand why separating writes and reads is better than handling both on one server.
For example, I have a MySQL cluster with three machines: node1, node2, node3.
One possible architecture is:
All write requests to node1, but all read requests to node2 and node3.
The second possible architecture is:
All of these three nodes handle both writes and reads.
We can see that in architecture one the write pressure on node1 would be very high, so I prefer architecture two.
Also, why does MongoDB send writes to the primary node and reads to secondary nodes?
This is an issue of scale for both MySQL and MongoDB. In the simplest application, with a small dataset and low traffic volume, having all writes and reads go to one server gives you a simple architecture. In an application with a very high read volume and a low write volume, a single write node replicating to one or more read nodes gives you the ability to scale your reads just by adding another node.
In an application with high read AND write volume, you might consider sharding (in MySQL you do it yourself or find a tool to help; in MongoDB you run mongos, which handles sharding for you). Sharding puts records on a specific instance based on a key that you define; the key determines the instance each record is stored on. You can imagine that sharding is more complicated to manage than a single server for read/write access. You would be right, even in a case like MongoDB that does the sharding for you once you define a key (or just use the default key).
MySQL Cluster also supports auto-sharding, by default by hashing the primary key, but users can feed in their own keys to provide more distribution awareness. Each node in the cluster is a master, and internal load balancing will distribute the load across nodes.
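As an example of feeding in your own key, an NDB table can be partitioned on part of its primary key so that related rows land on the same node group; the table and column names below are made up:

    -- Sketch: distribution-aware sharding in MySQL Cluster.
    CREATE TABLE user_posts (
      user_id INT NOT NULL,
      post_id INT NOT NULL,
      body    TEXT,
      PRIMARY KEY (user_id, post_id)
    ) ENGINE=NDBCLUSTER
    PARTITION BY KEY (user_id);  -- all rows for one user land in the same partition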
While very high level, the short demo posted here introduces you to the concepts of sharding in MySQL Cluster:
http://www.oracle.com/pls/ebn/swf_viewer.load?p_shows_id=11464419