I'm new to clustering and I'm doing a project on clustered databases. I want to make use of MySQL Cluster for a small-scale database, and this is my plan:
5 nodes:
1 management node,
2 SQL nodes,
2 API nodes.
My questions are:
1) Is my plan for the node process alright?
2) What should I do when I get the error "Failed to allocate node id..."?
3) Is it a requirement to use multi-threaded data node?
4) Where do I place my web server page for the user to access the database?
Please reply. Thank you so much.
This answer might be a little late but can be helpful for someone else getting started:
1) Is my plan for the node process alright?
Your plan for the node processes is OK for a small cluster. I would recommend adding an additional management node and 2 more data nodes if the number of replicas you have configured is 2. The reason is that, since you currently have only 2 data nodes, your cluster will not be fully functional once one of those nodes dies. This is because of the two-phase commit that takes place: in the case of a failure, only 1 data node will be able to persist the data, the other would be unreachable, and therefore the transaction would be marked as incomplete.
2) What should I do when I get the error "Failed to allocate node id..."?
This error is usually thrown when you have assigned the same node ID to more than one node in your configuration file. Each node must have a unique ID.
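For illustration, here is a minimal config.ini sketch for a small cluster with a unique ID for every node (one management node, two data nodes, two SQL nodes); the hostnames are placeholders, not from the original question:

```ini
# Hypothetical config.ini -- hostnames and IDs are illustrative only
[ndb_mgmd]
NodeId=1
HostName=mgm.example.com

[ndbd]
NodeId=2
HostName=data1.example.com

[ndbd]
NodeId=3
HostName=data2.example.com

[mysqld]
NodeId=4

[mysqld]
NodeId=5
```

If two sections accidentally share the same NodeId, the "Failed to allocate node id..." error is a typical symptom.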
3) Is it a requirement to use multi-threaded data node?
It is not a requirement, but it is recommended. Using a multi-threaded data node allows you to leverage modern computer architectures with multiple CPUs, so your queries and updates are processed much faster.
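As a sketch (the thread count below is illustrative, not a recommendation): you enable the multi-threaded data node by running the ndbmtd binary instead of ndbd, and you can tune its threading in config.ini:

```ini
[ndbd default]
# Number of execution threads used by ndbmtd (illustrative value)
MaxNoOfExecutionThreads=8
```

Then start each data node process with ndbmtd rather than ndbd; both binaries read the same cluster configuration.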
4) Where do I place my web server page for the user to access the database?
Hmm, I'm not sure what you want to achieve here. This would really be a separate question. Whether you are using PHP or any other language, you will need a web server configured; place your pages in the web server's document root to get started.
I have a MySQL database with some data in it. Now I want to create a cluster from this database.
I've already configured Galera, and I have just this single node. Now, how do I replicate the data to the other nodes?
Will it happen automatically via Galera, or should I dump the current data first, clean the DB on the first node, start all the other nodes, load the data into one of them, and wait for it to replicate?
A follow-up question: how do I add/remove nodes at runtime?
In case somebody is still interested in an answer to this.
The procedure is:
bootstrap the cluster by starting the first node
add other nodes as needed
Any new nodes will first receive the current state of the DB before becoming available.
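On a systemd-based MariaDB installation, the procedure above might look like this (an operational sketch, not runnable outside a real cluster; service names vary by distribution):

```shell
# On the first node only: bootstrap a brand-new cluster
galera_new_cluster

# On each additional node: a normal start; the node performs a
# state snapshot transfer (SST) from an existing member before joining
systemctl start mariadb

# From any node, verify that all members have joined
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
```

Removing a node is just a clean shutdown of its service; the remaining members keep serving as long as a quorum survives.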
https://mariadb.com/kb/en/mariadb/getting-started-with-mariadb-galera-cluster/#getting-started
Here is the scenario.
I have two nodes in my Couchbase cluster, Node A and Node B. I have replication enabled, so B acts as the node where the replicated data from A goes.
Let's say I try adding a new record and it happens to get saved on Node A. Node A saves this data in RAM and on its disk successfully but, unfortunately, it crashes before this data can be replicated to Node B.
If I have configured automatic failover, Then all requests for Node A data will now go to Node B.
My question is: will I be able to get this new data, which could not be replicated to Node B but was successfully written to Node A's disk, considering that Node A is down and all I have is Node B to communicate with?
If yes, please explain how. If no, is there any official Couchbase doc mentioning this behavior?
I tried looking for an answer in the official documentation, and it mostly looks like the answer is no, but I thought of discussing it here before concluding that it's data loss for sure.
Thanks in advance
In the scenario you described, yes, the data will not be available, assuming you didn't verify that the data had been successfully replicated. However, note that replication will typically complete before persistence, as the network is typically faster than disk.
Couchbase provides an observe API which allows you to verify that a particular mutation has been replicated and/or persisted. See Monitoring data using observe in the Couchbase developer guide.
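For example, with the (legacy 2.x) Couchbase Python SDK you can request durability at write time; this is a sketch that assumes a reachable cluster, and the connection string, credentials, and bucket name are placeholders:

```python
# Sketch only: requires a live Couchbase cluster and the couchbase 2.x SDK
from couchbase.cluster import Cluster, PasswordAuthenticator

cluster = Cluster('couchbase://localhost')  # placeholder address
cluster.authenticate(PasswordAuthenticator('user', 'password'))
bucket = cluster.open_bucket('example-bucket')  # placeholder bucket

# Block until the write has been replicated to at least 1 replica
# and persisted to disk on at least 1 node; raises on failure/timeout
bucket.upsert('doc-key', {'value': 42}, replicate_to=1, persist_to=1)
```

If the upsert returns without raising, you know the mutation survived on more than one node, so the crash scenario above would not lose it.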
I have a small hadoop/hive cluster (6 nodes in total).
Using "hadoop dfsadmin -report" I see that all datanodes are working well and connected.
Additionally, when I add data to a Hive table, I can see that the data is being distributed
across all the nodes (easy to check, as the disk space used increases).
I am trying to create some indexes on one table. From the JobTracker HTTP interface, I see only one node available. I tried to run multiple queries (I use MySQL for the metastore) but they appear to run only on the node where Hive is installed.
Basically, my question is how to make the JobTracker utilize the other nodes as well.
From what you describe, it looks like:
Datanodes are properly running on all nodes and are able to communicate with the namenode.
TaskTrackers are not running on any node except one, or are not able to communicate with the JobTracker for some reason.
After checking that the TaskTrackers are indeed running, read their logs to find out why they fail to communicate with the JobTracker.
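Assuming a Hadoop 1.x layout (the JobTracker/TaskTracker daemons in the question imply MRv1), the check might look like this sketch; exact paths depend on your installation:

```shell
# On each worker node: is a TaskTracker JVM running?
jps | grep TaskTracker

# If it isn't, start it on that node
hadoop-daemon.sh start tasktracker

# Then inspect its log for errors connecting to the JobTracker
tail -n 100 $HADOOP_HOME/logs/hadoop-*-tasktracker-*.log
```

Common causes are a wrong mapred.job.tracker address in mapred-site.xml or a firewall blocking the JobTracker port.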
I'm building a very small NDB cluster with only 3 machines. This means that machine 1 will serve as the MGM server, a MySQL server, and an NDB data node. The database is only 7 GB, so I plan to replicate each node at least once. Now, since a query might end up using data that is cached in the NDB node on machine 1, even if that node isn't the primary source for that data, access would be much faster (for obvious reasons).
Does the NDB cluster work like that? Every example I see has at least 5 machines. The manual doesn't seem to mention how to handle node differences like this one.
There are a couple of questions here :
Availability / NoOfReplicas
MySQL Cluster can give high availability when data is replicated across 2 or more data node processes. This requires that the NoOfReplicas configuration parameter is set to 2 or greater. With NoOfReplicas=1, each row is stored in only one data node, and a data node failure would mean that some data is unavailable and therefore the database as a whole is unavailable.
Number of machines / hosts
For HA configurations with NoOfReplicas=2, there should be at least 3 separate hosts. 1 is needed for each of the data node processes, which has a copy of all of the data. A third is needed to act as an 'arbitrator' when communication between the 2 data node processes fails. This ensures that only one of the data nodes continues to accept write transactions, and avoids data divergence (split brain). With only two hosts, the cluster will only be resilient to the failure of one of the hosts, if the other host fails instead, the whole cluster will fail. The arbitration role is very lightweight, so this third machine can be used for almost any other task as well.
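For reference, the arbitrator role is configured (or simply left at its default) on the management node; a config.ini fragment for that third host might look like this sketch, where the hostname is a placeholder:

```ini
[ndb_mgmd]
NodeId=1
HostName=third-host.example.com
# ArbitrationRank=1 is already the default for management nodes;
# it marks this node as a preferred arbitrator
ArbitrationRank=1
```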
Data locality
In a 2 node configuration with NoOfReplicas=2, each data node process stores all of the data. However, this does not mean that only one data node process is used to read/write data. Both processes are involved with writes (as they must maintain copies), and generally, either process could be involved in a read.
Some work to improve read locality in a 2-node configuration is under consideration, but nothing is concrete.
This means that when MySQLD (or another NdbApi client) is colocated with one of the two data nodes, there will still be quite a lot of communication with the other data node.
I've inherited a system that consists (for simplicity) of 2 application servers that write to a single master database. One application server performs quite a few operations (each taking a small amount of time, on the order of milliseconds) per unit of time. The other application server acts as an API server through which clients interact. This "API" server operates on half the tables in the database, most of which are not needed by the other application server. However, the "API" server does cause the other application server, through its interaction with the SQL server, to lose time and performance.
I wanted to know what would be a good approach in resolving this.
Ideas so far:
[1] Create a second database which will be master-master replicated with the current database, using the http://mysql-mmm.org/ scripts. (Concurrency?)
[2] Slowly begin moving tables from the "master" database into a new "API" database. (Lots of legacy code...)
[3] Some kind of SQL priority queue. (How fault-tolerant can this be?)
Step 1 - work out where your bottleneck is
Step 2 - decide where your best return on effort is
If you simply want to make it perform better, then you have to work out where the slow point is. Ideally you would use 3 hosts, one for each application server and one for the database. In this configuration, you should quickly be able to work out whether it is the database working the disks hard, or whether it's CPU load, lock contention, etc.
Once you know where the bottleneck is, you'll have a much more focused problem to fix. The options you have suggested may or may not help, depending on what the real bottleneck is.
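As a starting point for step 1, a few standard commands narrow down whether the database host is disk-bound, CPU-bound, or blocked on locks (a sketch; it assumes shell access to the database host and MySQL credentials):

```shell
# OS level: watch disk utilisation and CPU load on the DB host
iostat -x 5
top

# MySQL level: what are the sessions currently doing or waiting on?
mysql -u root -p -e "SHOW FULL PROCESSLIST"

# InnoDB internals: lock waits, pending I/O, buffer pool pressure
mysql -u root -p -e "SHOW ENGINE INNODB STATUS\G"
```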