What is the need for JournalNodes?
Why do we configure three JournalNodes in a high-availability setup?
Is it only for replication?
The role of the JournalNodes is to keep both NameNodes in sync and to avoid an HDFS split-brain scenario by allowing only the Active NameNode to write to the journals.
From the Apache Hadoop documentation:
Prior to Hadoop 2.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine was unavailable, the cluster on the whole would be unavailable until the NameNode was either restarted or started on a separate machine. In a classic HA cluster, two separate machines are configured as NameNodes. At any point, one of the NameNodes will be in Active state and the other will be in a Standby state. The Active NameNode is responsible for all client operations in the cluster, while the Standby is simply acting as a slave, maintaining enough state to provide a fast failover.
In order for the Standby node to keep its state coordinated with the Active node, both nodes communicate with a group of separate daemons called ‘JournalNodes’ (JNs). When any namespace modification is performed by the Active node, it logs a record of the changes made, in the JournalNodes. The Standby node is capable of reading the amended information from the JNs, and is regularly monitoring them for changes. As the Standby Node sees the changes, it then applies them to its own namespace. In case of a failover, the Standby will make sure that it has read all the changes from the JournalNodes before changing its state to ‘Active state’. This guarantees that the namespace state is fully synced before a failover occurs.
JournalNode machines - the machines on which you run the JournalNodes. The JournalNode daemon is relatively lightweight, so these daemons may reasonably be collocated on machines with other Hadoop daemons, for example NameNodes, the JobTracker, or the YARN ResourceManager. Note: There must be at least 3 JournalNode daemons, since edit log modifications must be written to a majority of JNs. This will allow the system to tolerate the failure of a single machine. You may also run more than 3 JournalNodes, but in order to actually increase the number of failures the system can tolerate, you should run an odd number of JNs, (i.e. 3, 5, 7, etc.). Note that when running with N JournalNodes, the system can tolerate at most (N - 1) / 2 failures and continue to function normally.
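The majority rule quoted above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function names `majority` and `tolerated_failures` are made up for this example and are not part of any Hadoop API.

```python
def majority(n: int) -> int:
    """Smallest number of JournalNodes that forms a majority of n."""
    if n < 1:
        raise ValueError("need at least one JournalNode")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Failures a quorum of n JournalNodes can survive: (n - 1) / 2, rounded down."""
    if n < 1:
        raise ValueError("need at least one JournalNode")
    return (n - 1) // 2


if __name__ == "__main__":
    for n in (3, 4, 5, 6, 7):
        print(n, "JNs: majority", majority(n), "tolerates", tolerated_failures(n))
```

Going from 3 to 4 JournalNodes raises the majority size from 2 to 3 but leaves the number of tolerated failures at 1, which is why an odd count is recommended.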
Here are also some good external links about JournalNodes:
https://www.edureka.co/blog/namenode-high-availability-with-quorum-journal-manager-qjm/
https://community.hortonworks.com/articles/27225/how-qjm-works-in-namenode-ha.html
Related
My understanding is that Zookeeper is often used to solve the problem of "keeping track of which node plays a particular role" in a distributed system (e.g. master node in a DB or in a MapReduce cluster, etc).
For simplicity, say we have a DB with one master and multiple replicas and the current master node in the DB goes down. In this scenario, one would, in principle, make one of the replica nodes a new master node. At this point my understanding is:
If we didn't have Zookeeper
The application servers may not know that we have a new master node, so they would not know where to send writes unless we have some custom logic on the app server itself to detect / correct this problem.
If we have Zookeeper
Zookeeper would somehow detect this failure, and update the value for the corresponding master key. Moreover, application servers can (optionally?) register hooks in Zookeeper, so Zookeeper can notify them of this failure, so that the app servers can update (e.g. in memory), which DB node is the new master.
My questions are:
1. How does Zookeeper know which node to make master? Is Zookeeper responsible for this choice?
2. How is this information propagated to nodes that need to interact with Zookeeper? E.g. if one of the Zookeeper nodes goes down, how would the application servers know which Zookeeper node to hit in this scenario? Does Zookeeper manage this differently from competing solutions like e.g. etcd?
The answer to both 1. and 2. is the leader election process, which briefly works as follows:
When a process starts in a cluster managed by ZK, the cluster enters an election state. If there is already a leader, there is an established hierarchy and the existing leader is simply verified. If there is no leader (say the master is down), ZK forces the znodes to use sequence flags to look for a new leader. Each node talks to its peers and sends a message containing the node's identifier (sid) and the most recent transaction it executed (zxid). These messages are called votes. When a node receives a vote, it either keeps or discards it depending on the zxid: if the received zxid is newer than its own, it keeps the vote; if it is older, it discards it. If the zxids tie, the vote with the highest sid wins. Eventually all nodes hold the same vote, and the sid in that vote identifies the new leader. This is how ZK elects a new leader node.
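The vote comparison described above can be sketched in a few lines. This is a simplified illustration of the rule as stated in this answer (newest zxid wins, ties broken by highest sid), not ZooKeeper's actual implementation; the names `Vote`, `is_better` and `elect` are hypothetical.

```python
from typing import Iterable, NamedTuple


class Vote(NamedTuple):
    sid: int   # identifier of the node being voted for
    zxid: int  # most recent transaction that node has executed


def is_better(candidate: Vote, current: Vote) -> bool:
    """A vote wins if its zxid is newer, or on a zxid tie if its sid is higher."""
    return (candidate.zxid, candidate.sid) > (current.zxid, current.sid)


def elect(votes: Iterable[Vote]) -> Vote:
    """Return the vote every node would converge on after exchanging all votes."""
    votes = list(votes)
    if not votes:
        raise ValueError("no votes to compare")
    best = votes[0]
    for vote in votes[1:]:
        if is_better(vote, best):
            best = vote
    return best


if __name__ == "__main__":
    # Nodes 1 and 2 are equally up to date; node 3 is behind.
    winner = elect([Vote(sid=1, zxid=100), Vote(sid=2, zxid=100), Vote(sid=3, zxid=90)])
    print("new leader sid:", winner.sid)
```

In this example node 3 has the highest sid but an older zxid, so it loses to node 2, which wins the tie against node 1 on sid.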
I am planning to store data in a Couchbase cluster.
I would like to know what will happen if Couchbase goes down in the following scenarios:
[Consider that there were no active transactions happening]
A node goes down from the cluster.(My assumption is after the node is fixed and is up, it will sync up with other nodes, and the data would be there). Also, let me know after it is synced up will there still be any data loss?
The cluster went down and is fixed and restarted.
Please let me know how data persistence works in the above scenarios.
Yes, Couchbase persists data to disk. It writes change operations to append-only files on the Data Service nodes.
Data loss is unlikely for your two scenarios because there are no active transactions.
Data loss can occur if a node fails
while persisting a change to disk or
before completing replication to another node if the bucket supports replicas.
Example: Three Node Cluster with Replication
Consider the case of a three node Couchbase cluster and a bucket with one replica for each document. That means that a single document will have a copy stored on two separate nodes, call those the active and the replica copies. Couchbase will shard the documents equitably across the nodes.
When a node goes down, about a third of the active and replica copies become unavailable.
A. If a brand new node is added and the cluster rebalanced, the new node will have the same active and replica copies as the old one did. Data loss will occur if replication was incomplete when the node failed.
B. If the node is failed over, then replicas for the active documents on the failed node will become active. Data loss will occur if replication was incomplete when the node failed.
C. If the failed node rejoins the cluster it can reuse its existing data so the only data loss would be due to a failure to write changes to disk.
When the cluster goes down, data loss may occur if there is a disk failure.
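To make scenario B concrete, here is a toy model of the three-node example with one replica per document. It is a sketch under stated assumptions: the round-robin placement and the names `place` and `fail_over` are invented for illustration, and real Couchbase distributes documents through vBuckets rather than this scheme.

```python
from typing import Dict, List, Optional, Tuple

Layout = Dict[str, Tuple[str, Optional[str]]]  # doc id -> (active node, replica node)


def place(doc_ids: List[str], nodes: List[str]) -> Layout:
    """Assumed placement: active copies round-robin, replica on the next node."""
    layout: Layout = {}
    for i, doc in enumerate(doc_ids):
        active = nodes[i % len(nodes)]
        replica = nodes[(i + 1) % len(nodes)]
        layout[doc] = (active, replica)
    return layout


def fail_over(layout: Layout, failed: str) -> Layout:
    """Promote replicas of documents whose active copy lived on the failed node."""
    result: Layout = {}
    for doc, (active, replica) in layout.items():
        if active == failed:
            result[doc] = (replica, None)   # replica becomes the active copy
        elif replica == failed:
            result[doc] = (active, None)    # active copy survives, replica is lost
        else:
            result[doc] = (active, replica)
    return result


if __name__ == "__main__":
    layout = place([f"d{i}" for i in range(6)], ["n1", "n2", "n3"])
    print(fail_over(layout, "n1"))
```

With six documents on three nodes, failing one node affects two active copies and two replica copies, matching the "about a third" figure above. Any change that had not reached the replica before the failure is what would be lost on promotion.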
In Couchbase documentation: https://developer.couchbase.com/documentation/server/current/concepts/distributed-data-management.html
There is no concept of master nodes, slave nodes, config nodes, name nodes, head nodes, etc, and all the software loaded on each node is identical
But in my logs I get the message found in post:
https://forums.couchbase.com/t/havent-heard-from-a-higher-priority-node-or-a-master-so-im-taking-over/5924
Haven't heard from a higher priority node or a master, so I'm taking over. mb_master 000 ns_1#10.200.0.10 1:07:38 AM Tue Feb 7, 2017
and
Somebody thinks we're master. Not forcing mastership takover over ourselves mb_master 000 ns_1#10.200.0.10 1:07:28 AM Tue Feb 7, 2017
I am having trouble finding what the master does, because any search about a master results in the comment of couchbase not having a master node.
The error messages seem to originate from the cluster management which should look like this (I didn't manage to find the Couchbase implementation of it). The link points to the implementation of membase which is the predecessor of Couchbase.
While all nodes are equal in Couchbase this is not the case when there is some redistribution of data. As described in detail in this document a master is chosen to manage the redistribution. The log messages you get are caused by this process.
The Master Node in the cluster manager is also known as the orchestrator.
Straight from the Couchbase Server 4.6 documentation, https://developer.couchbase.com/documentation/server/4.6/concepts/distributed-data-management.html
Although each node runs its own local Cluster Manager, there is only
one node chosen from among them, called the orchestrator, that
supervises the cluster at a given point in time. The orchestrator
maintains the authoritative copy of the cluster configuration, and
performs the necessary node management functions to avoid any
conflicts from multiple nodes interacting. If a node becomes
unresponsive for any reason, the orchestrator notifies the other nodes
in the cluster and promotes the relevant replicas to active status.
This process is called failover, and it can be done automatically or
manually. If the orchestrator fails or loses communication with the
cluster for any reason, the remaining nodes detect the failure when
they stop receiving its heartbeat, so they immediately elect a new
orchestrator. This is done immediately and is transparent to the
operations of the cluster.
I intend to set up a Couchbase system with two clusters: the main cluster is active and the other one is for backup (using XDCR to replicate). HAProxy sits in front of this Couchbase system so I can switch (manually) from the active cluster to the backup cluster when the active cluster is down.
Before testing, I would like some advice on this topology. Is there any problem with it? Can I run it smoothly in a production environment?
I think I cannot use a vBucket-aware client in this topology. Because the client only knows HAProxy, I cannot send a request directly from the client to the Couchbase server that holds the vBucket for a specific document. Is that right?
From your scenario it sounds like overhead. Why would you keep "stand by" cluster as a backup?
Instead, you can have all four instances of Couchbase servers as one cluster (each instance running on its own box), so you take full advantage of the vBucket architecture, which is then managed natively. If one of the instances is down, you will have no loss of data, since with replication enabled there is a mirror copy on the other nodes.
We use this setup in production with no issues. From time to time we bring one of the instances down for maintenance and the rest of the cluster still runs, and it's completely transparent to the Couchbase clients, e.g. no downtime!
In my opinion XDCR makes sense for geographically separated locations (so you keep one cluster in the Americas, another in EMEA, and so on). If all your instances are in the same location, then Couchbase cluster technology will deliver high availability (HA) with failover support already built in.
I'm building a very small NDB cluster with only 3 machines. This means that machine 1 will serve as MGM server, MySQL server, and NDB data node all at once. The database is only 7 GB, so I plan to replicate each node at least once. Now, since a query might end up using data that is cached in the NDB node on machine 1, even if that node isn't the primary source for that data, access would be much faster (for obvious reasons).
Does the NDB cluster work like that? Every example I see has at least 5 machines. The manual doesn't seem to mention how to handle node differences like this one.
There are a couple of questions here:
Availability / NoOfReplicas
MySQL Cluster can give high availability when data is replicated across 2 or more data node processes. This requires that the NoOfReplicas configuration parameter is set to 2 or greater. With NoOfReplicas=1, each row is stored in only one data node, and a data node failure would mean that some data is unavailable and therefore the database as a whole is unavailable.
Number of machines / hosts
For HA configurations with NoOfReplicas=2, there should be at least 3 separate hosts. One is needed for each of the two data node processes, each of which has a copy of all of the data. A third is needed to act as an 'arbitrator' when communication between the 2 data node processes fails. This ensures that only one of the data nodes continues to accept write transactions, and avoids data divergence (split brain). With only two hosts, the cluster will only be resilient to the failure of one of the hosts; if the other host fails instead, the whole cluster will fail. The arbitration role is very lightweight, so this third machine can be used for almost any other task as well.
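The arbitration idea can be illustrated with a toy model. This is not MySQL Cluster's actual protocol, only a sketch of the principle described above: when the two data nodes lose contact with each other, each asks the arbitrator, and only the first one to ask is allowed to continue. The class name `Arbitrator` and its `request` method are hypothetical.

```python
from typing import Optional


class Arbitrator:
    """Grants the right to keep accepting writes to exactly one data node."""

    def __init__(self) -> None:
        self.winner: Optional[str] = None

    def request(self, node_id: str) -> bool:
        """Return True if node_id may continue, False if it must shut down."""
        if self.winner is None:
            self.winner = node_id  # the first requester wins the arbitration
        return self.winner == node_id


if __name__ == "__main__":
    arbitrator = Arbitrator()
    # Both data nodes notice the lost connection and ask the arbitrator.
    print("node A continues:", arbitrator.request("A"))
    print("node B continues:", arbitrator.request("B"))
```

Because only one side ever receives a positive answer, the two data nodes cannot both keep accepting writes, which is what prevents the split-brain divergence.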
Data locality
In a 2 node configuration with NoOfReplicas=2, each data node process stores all of the data. However, this does not mean that only one data node process is used to read/write data. Both processes are involved with writes (as they must maintain copies), and generally, either process could be involved in a read.
Some work to improve read locality in a 2-node configuration is under consideration, but nothing is concrete.
This means that when MySQLD (or another NdbApi client) is colocated with one of the two data nodes, there will still be quite a lot of communication with the other data node.