I intend to run Point of Sale software on a Galera cluster (Percona XtraDB Cluster). Each POS terminal would be its own cluster node, and there would also be an Amazon EC2 instance to help avoid split-brain scenarios.
Is the above an ideal cluster setup? My POS terminals could range from 1 to N nodes within a local network, and I will always have only 1 EC2 instance outside the network.
Yes. To provide automatic failover, 3 nodes are required. If you have 3 nodes in the same building, etc., then you are not safe against floods, earthquakes, tornadoes, data center failure, etc. "Within the local network" -- check what Amazon means by that, then read between the lines; it may or may not protect you from various possible disasters.
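To sketch how that third vote can work in this topology without putting data on EC2: Galera ships an arbitrator daemon (garbd) that joins the cluster and takes part in quorum but stores nothing, so it suits a small cloud instance. A minimal, illustrative my.cnf fragment for the local POS nodes follows; the cluster name, paths, and IP addresses are made up:

    [mysqld]
    wsrep_provider=/usr/lib/galera3/libgalera_smm.so
    wsrep_cluster_name=pos_cluster
    # two local POS nodes plus the EC2 host running garbd
    wsrep_cluster_address=gcomm://192.168.1.11,192.168.1.12,203.0.113.10
    wsrep_node_address=192.168.1.11      # this node's own address
    binlog_format=ROW
    default_storage_engine=InnoDB
    innodb_autoinc_lock_mode=2

On the EC2 side, garbd is started with the same cluster name and a gcomm:// list of the local nodes; it stores no data and simply provides the extra vote needed to keep a quorum when one POS node drops out.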
Do not plan on having "too many" nodes in the cluster -- all writes go to all other nodes; this can add up to a lot of network traffic. (I have not heard of more than something like a dozen nodes. But I don't know what the practical limit is.)
You could have multiple clusters and have data replicated off-cluster to some central server for reporting, etc. That replication would be ordinary MySQL replication, not the Galera type.
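For that central reporting copy, the usual approach is to let one Galera node write a binary log and point an ordinary asynchronous replica at it. A rough sketch of the extra my.cnf settings on that donor node (the server_id is arbitrary):

    [mysqld]
    server_id=11
    log_bin=mysql-bin
    log_slave_updates=ON      # also write changes that arrived via Galera into the binlog
    binlog_format=ROW

The central server is then set up as a normal MySQL replica of this node and used for reporting without touching the cluster.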
I intend to set up a Couchbase system with two clusters: the main cluster is active and the other one is a backup (using XDCR to replicate). HAProxy sits in front of this Couchbase system to switch (manually) from the active cluster to the backup cluster when the active cluster goes down.
Before testing, I want to ask for some advice on this topology. Is there any problem with it? Can it run smoothly in a production environment?
I think I cannot use a vBucket-aware client in this topology. Because the client only knows about HAProxy, I cannot send requests directly from the client to the Couchbase server that holds the vBucket for a specific document. Is that right?
From your scenario it sounds like unnecessary overhead. Why would you keep a "standby" cluster as a backup?
Instead, you can have all four Couchbase server instances in one cluster (each instance running on its own box), so you take full advantage of the natively managed vBucket architecture. If one of the instances goes down, you will have no loss of data, since intra-cluster replication keeps mirror copies on the other nodes.
We use this setup in production with no issues. From time to time we bring one of the instances down for maintenance; the rest of the cluster keeps running, and it's completely transparent to the Couchbase clients, i.e. no downtime!
In my opinion XDCR makes sense for geographically separated locations (so you keep one cluster in the Americas, another in EMEA, and so on). If all your instances are in the same location, then Couchbase cluster technology will deliver high availability (HA) with failover support already built in.
So what's the idea behind a cluster?
You have multiple machines with the same copy of the DB where you spread the read/write? Is this correct?
How does this idea work? When I make a SELECT query, does the cluster work out which server has fewer reads/writes and route my query to that server?
When should you start using a cluster? I know this is a tricky question, but maybe someone can give me an example, like 1 million visits and a 100-million-row DB.
1) Correct. Not every data node holds a full copy of the cluster data, but every single piece of data is stored on at least two nodes.
2) Essentially correct. MySQL Cluster supports distributed transactions.
3) When vertical scaling is not possible anymore, and replication becomes impractical :)
As promised, some recommended readings:
Setting Up Multi-Master Circular Replication with MySQL (simple tutorial)
Circular Replication in MySQL (higher-level warnings about conflicts)
MySQL Cluster Multi-Computer How-To (step-by-step tutorial, it assumes multiple physical machines, but you can run your test with all processes running on the same machine by following these instructions)
The MySQL Performance Blog is a reference in this field
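As a taste of what the circular-replication tutorials above walk through, here is a rough sketch of the relevant my.cnf settings on one of two masters in a ring (values are illustrative; the second master mirrors them):

    [mysqld]
    server_id=1                  # the other master uses server_id=2
    log_bin=mysql-bin
    log_slave_updates=ON
    auto_increment_increment=2   # number of masters in the ring
    auto_increment_offset=1      # the other master uses offset 2

Each server is then pointed at the other as its replication source; the auto_increment settings keep the two masters from generating colliding primary keys.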
1->your 1st point is correct in a way.But i think if multiple machines would share the same data it would be replication instead of clustering.
In clustering the data is divided among the various machines and there is horizontal partitioning means the dividing of the data is based on the rows,the records are divided by using some algorithm among those machines.
the dividing of data is done in such a way that each record will get a unique key just as in case of a key-value pair and each machine also has a unique machine_id related which is used to define which key value pair would go to which machine.
Each machine is a cluster node consisting of an individual MySQL server, its own data, and a cluster manager; there is also data sharing between all the cluster nodes, so that all the data is available to every node at any time.
Data retrieval is done through memcached servers for fast access, and there is also a replication server for a particular cluster to preserve the data.
2->yes, there is a possibility because there is a sharing of all the data among all the cluster nodes. and also you can use a load balancer to balance the load.But the idea of load balancer is quiet common because they are being used by most of the servers. but if you are trying you just for your knowledge then there is no need because you will not get to notice the type of load that creates the requirement of a load balancer the cluster manager itself can do the whole thing.
3->RandomSeed is right. you do feel the need of a cluster when your replication becomes impractical means if you are using the master server for writes and slave for reads then at some time when the traffic becomes huge such that the sever would not be able to work smoothly then you will feel the need of clustering. simply to speed up the whole process.
this is not the only case, this is just one of the scenario this is only just a case.
hope this is helpful for you!!
I'm building a very small NDB cluster with only 3 machines. This means that machine 1 will serve as MGM server, MySQL server, and NDB data node all at once. The database is only 7 GB, so I plan to replicate each node at least once. Now, since a query might end up using data that is cached in the NDB node on machine 1 even if that node isn't the primary source for that data, access would be much faster (for obvious reasons).
Does the NDB cluster work like that? Every example I see has at least 5 machines. The manual doesn't seem to mention how to handle node differences like this one.
There are a couple of questions here:
Availability / NoOfReplicas
MySQL Cluster can give high availability when data is replicated across 2 or more data node processes. This requires that the NoOfReplicas configuration parameter is set to 2 or greater. With NoOfReplicas=1, each row is stored in only one data node, and a data node failure would mean that some data is unavailable and therefore the database as a whole is unavailable.
Number of machines / hosts
For HA configurations with NoOfReplicas=2, there should be at least 3 separate hosts. One is needed for each of the data node processes, each of which has a copy of all of the data. A third is needed to act as an 'arbitrator' when communication between the 2 data node processes fails. This ensures that only one of the data nodes continues to accept write transactions, and avoids data divergence (split brain). With only two hosts, the cluster is resilient to the failure of only one particular host; if the other host fails instead, the whole cluster fails. The arbitration role is very lightweight, so this third machine can be used for almost any other task as well.
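As a rough sketch of that minimal 3-host layout, a cluster config.ini could look like the following (hostnames are placeholders). The lightweight management/arbitrator process sits on its own host, and the SQL node is co-located with a data node:

    [ndbd default]
    NoOfReplicas=2          # every row is stored on two data nodes

    [ndb_mgmd]
    HostName=host3          # management node, also acts as arbitrator

    [ndbd]
    HostName=host1          # data node 1

    [ndbd]
    HostName=host2          # data node 2

    [mysqld]
    HostName=host1          # SQL node co-located with a data node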
Data locality
In a 2 node configuration with NoOfReplicas=2, each data node process stores all of the data. However, this does not mean that only one data node process is used to read/write data. Both processes are involved with writes (as they must maintain copies), and generally, either process could be involved in a read.
Some work to improve read locality in a 2-node configuration is under consideration, but nothing is concrete.
This means that when MySQLD (or another NdbApi client) is colocated with one of the two data nodes, there will still be quite a lot of communication with the other data node.
We have an EC2 instance running both Apache and MySQL at the moment. I am wondering whether moving MySQL to another EC2 instance will increase or decrease the performance of the site. I am mostly worried about network speed issues between the two instances.
EC2 instances in the same availability zone are connected via a 10,000 Mbps network - that's faster than a good solid state drive on a SATA-3 interface (6Gb/s)
You won't see any performance drop by moving the database to another server; in fact, you'll probably see a performance increase because each server then has its own memory and CPU cores.
If your worry is network latency then forget about it - not a problem on AWS in the same availability zone.
Another consideration is that you're probably storing your website and DB files on an EBS-mounted volume. EBS storage is off-instance, so your data is already served from a storage array over that same super-fast 10 Gbps network.
So what I'm saying is: with EBS, your website and database are already talking across the network to get their data, so putting them on separate instances won't really change anything in that respect, besides giving more resources to both servers. More resources means more data cached in memory and better performance.
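If you do split them, the main DB-side change is simply making MySQL listen on the instance's private interface instead of localhost, and allowing port 3306 from the web instance in the security group. A minimal sketch (the private IP is made up):

    [mysqld]
    bind-address=10.0.1.20   # the DB instance's VPC-private IP, instead of 127.0.0.1

The web application then connects to that private IP (or the instance's private DNS name) rather than to localhost.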
The answer depends largely on what resources Apache and MySQL are using. They can happily cohabit if demand on your website is low and each is configured with enough memory that it doesn't swap to virtual memory. In that case, they are best kept together.
As traffic grows, or your application grows, you will benefit from splitting them out because they can then both run inside dedicated memory. Provided that the instances are in the same region, you should see fast performance between them. I have even run a web application in Europe with the DB in the USA and performance wasn't noticeably bad! I wouldn't recommend that, though!
Because AWS is easy and cheap, your best bet is to set it up and benchmark it!
I'm reading everywhere that the minimum for a MySQL Cluster 7 setup is 3 physical machines, but the cluster consists of 4 nodes:
1 mysql node
2 data nodes
1 management node
So this means at least 1 machine must host 2 types of nodes, but I cannot find anywhere which machine should share which nodes.
I've read that sharing MySQL and data nodes is not recommended, so is it the management node and the MySQL node that should share a machine?
Could anyone please advise me on this?
Just a small edit: I'm currently setting this up because we now have 1 ordinary MySQL server and we're pretty much hitting its limit. I'm mainly trying to set up the cluster for a performance gain (2 data/MySQL nodes should be faster than 1, right?); expanding it with more servers to gain redundancy is next on the list.
You can co-locate the management node with the SQL node to reduce your footprint to 3 physical hosts.
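A rough sketch of that co-location in config.ini (hostnames are placeholders): the management node and SQL node share host1, and each data node gets its own machine:

    [ndbd default]
    NoOfReplicas=2

    [ndb_mgmd]
    HostName=host1          # management node, sharing host1 with the SQL node

    [mysqld]
    HostName=host1          # SQL node

    [ndbd]
    HostName=host2          # data node 1

    [ndbd]
    HostName=host3          # data node 2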
I would recommend taking a look at the Evaluation Guide (note: it opens a PDF), which talks you through these factors and provides some tips and best practices for moving from a single MySQL server to a fully distributed storage engine such as MySQL Cluster:
http://dev.mysql.com/downloads/MySQL_Cluster_72_DMR_EvaluationGuide.pdf