I am planning to store data in a Couchbase cluster.
I would like to know what will happen if my Couchbase cluster goes down in the following scenarios:
[Consider that there were no active transactions happening.]
A node in the cluster goes down. (My assumption is that after the node is fixed and brought back up, it will sync with the other nodes and the data will be there.) Also, let me know whether, after it has synced up, there could still be any data loss.
The whole cluster goes down, then is fixed and restarted.
Please explain the data persistence behavior for the above scenarios.
Yes, Couchbase persists data to disk. It writes change operations to append-only files on the Data Service nodes.
Data loss is unlikely for your two scenarios because there are no active transactions.
Data loss can occur if a node fails while persisting a change to disk, or before completing replication to another node (if the bucket is configured with replicas).
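To make that loss window concrete, here is a minimal sketch. This is not Couchbase's actual storage engine, just the append-only pattern described above: a change is safe only once both the append and the fsync have completed.

```python
import json
import os

# Conceptual sketch only -- NOT Couchbase's actual storage engine.
# A crash after write() but before fsync() loses the change: that is
# the "while persisting a change to disk" window described above.
def append_change(log_path: str, key: str, value: dict) -> None:
    record = json.dumps({"key": key, "value": value}) + "\n"
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(record)        # still in OS buffers at this point
        f.flush()
        os.fsync(f.fileno())   # durable on disk only after this returns

append_change("changes.log", "user::42", {"name": "Ada"})
```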
Example: Three Node Cluster with Replication
Consider the case of a three-node Couchbase cluster and a bucket with one replica for each document. That means that a single document will have a copy stored on two separate nodes; call those the active and the replica copies. Couchbase shards the documents evenly across the nodes (a simplified sketch of this follows the list below).
When a node goes down, about a third of the active and replica copies become unavailable.
A. If a brand new node is added and the cluster rebalanced, the new node will have the same active and replica copies as the old one did. Data loss will occur if replication was incomplete when the node failed.
B. If the node is failed over, then replicas for the active documents on the failed node will become active. Data loss will occur if replication was incomplete when the node failed.
C. If the failed node rejoins the cluster, it can reuse its existing data, so the only data loss would be from changes that had not yet been written to disk.
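For illustration, here is a simplified sketch of how keys map to nodes. Couchbase hashes each document key to one of 1024 vBuckets and assigns vBuckets (and their replicas) to nodes via a vBucket map; the plain modulo below is a stand-in for that map, not the real assignment logic, and the node names are placeholders.

```python
import zlib

NUM_VBUCKETS = 1024                  # Couchbase Server default
NODES = ["node1", "node2", "node3"]  # placeholder node names

def vbucket_for_key(key: str) -> int:
    # Couchbase hashes the document key (CRC32) to pick a vBucket.
    return zlib.crc32(key.encode()) % NUM_VBUCKETS

def active_node(key: str) -> str:
    # Simplified stand-in: a real cluster consults a vBucket map that
    # also records a replica node per vBucket, rather than a modulo.
    return NODES[vbucket_for_key(key) % len(NODES)]

for k in ("user::1", "user::2", "order::17"):
    print(k, "->", active_node(k))
```

When a node fails, roughly a third of the vBuckets in this sketch lose their active copy, which is why the replica promotion in options A and B matters.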
When the cluster goes down, data loss may occur if there is a disk failure.
As per How To Set Up Replication in MySQL,
Once the replica instance has been initialized, it creates two threaded processes. The first, called the IO thread, connects to the source MySQL instance and reads the binary log events line by line, and then copies them over to a local file on the replica's server called the relay log. The second thread, called the SQL thread, reads events from the relay log and then applies them to the replica instance as fast as possible.
Isn't this contradictory to the theory of master-slave database replication, in which the master copies data to the slaves?
Reliability. (A mini-history of MySQL's efforts.)
When a write occurs on the Primary, N+1 extra actions occur:
One write to the binlog -- this is to allow for any Replicas that happen to be offline (for any reason); they can come back later and request data from this file. (Also see sync_binlog)
N network writes, one per Replica. These are to get the data to the Replicas ASAP.
Normally, if you want more than a few Replicas, you can "fan out" through several levels, thereby allowing for an unlimited number of Replicas. (10 per level would give you 1000 Replicas in 3 layers.)
The product called Orchestrator carries this to an extra level -- the binlog is replicated to an extra server and the network traffic occurs from there. This offloads the Primary. (Booking.com uses it to handle literally hundreds of replicas.)
On the Replica's side, the two threads were added about 20 years ago because of the following scenario:

1. The Replica is busy doing one query at a time.
2. It gets busy with some long query (say an ALTER).
3. Lots of activity backs up on the Primary.
4. The Primary dies.

Now the Replica finishes the ALTER but, with a single thread, has nothing else queued locally to work on, so it is very "behind" and will take extra time to "catch up" once the Primary comes back online.
Hence, the 2-thread Replica "helps" keep things in sync, but it is still not fully synchronous.
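A toy model of the two threads may help (the thread structure follows the quoted description; the event contents are made up). The relay log is modeled as a queue: the IO thread keeps copying events from the Primary even while the SQL thread is stuck applying a slow one, so the backlog is available locally even if the Primary then dies.

```python
import queue
import threading
import time

# Toy model of the Replica's two threads; events are illustrative.
relay_log = queue.Queue()

def io_thread(binlog_events):
    for event in binlog_events:   # reads the Primary's binlog
        relay_log.put(event)      # appends to the local relay log

def sql_thread():
    while True:
        event = relay_log.get()
        if event is None:         # None = sentinel meaning "stop"
            break
        time.sleep(0.01)          # "applying" the event
        print("applied:", event)

events = [f"statement {i}" for i in range(5)] + [None]
t_io = threading.Thread(target=io_thread, args=(events,))
t_sql = threading.Thread(target=sql_thread)
t_io.start(); t_sql.start()
t_io.join(); t_sql.join()
```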
Later there was "semi-synchronous" replication and multiple SQL threads in the Replica (still a single I/O thread).
Finally, InnoDB Cluster and Galera became available to provide [effectively] synchronous replication. But they come with other costs.
"master-slave database replication in which the master copies data to the slaves" - it's just a concept - data from a leader is copied to followers. There are many options how this could be done. Some of those are the write ahead log replication, blocks replication, rows replication.
Another interesting approach is to use a replication system completely separate from the storage. An example for this would be Bucardo - replication system for PostgreSQL. In that case nighter leader or follower actually do work.
In the Couchbase documentation (https://developer.couchbase.com/documentation/server/current/concepts/distributed-data-management.html) it says:
There is no concept of master nodes, slave nodes, config nodes, name nodes, head nodes, etc, and all the software loaded on each node is identical
But in my logs I get the messages described in this post:
https://forums.couchbase.com/t/havent-heard-from-a-higher-priority-node-or-a-master-so-im-taking-over/5924
Haven't heard from a higher priority node or a master, so I'm taking over. mb_master 000 ns_1#10.200.0.10 1:07:38 AM Tue Feb 7, 2017
and
Somebody thinks we're master. Not forcing mastership takover over ourselves mb_master 000 ns_1#10.200.0.10 1:07:28 AM Tue Feb 7, 2017
I am having trouble finding out what this master does, because any search about a master turns up the statement that Couchbase does not have a master node.
The error messages seem to originate from the cluster management code, which should look like this (I didn't manage to find the Couchbase implementation of it; the link points to the implementation of Membase, the predecessor of Couchbase).
While all nodes are equal in Couchbase, this is not the case when there is some redistribution of data. As described in detail in this document, a master is chosen to manage the redistribution. The log messages you get are caused by this process.
The Master Node in the cluster manager is also known as the orchestrator.
Straight from the Couchbase Server 4.6 documentation, https://developer.couchbase.com/documentation/server/4.6/concepts/distributed-data-management.html
Although each node runs its own local Cluster Manager, there is only one node chosen from among them, called the orchestrator, that supervises the cluster at a given point in time. The orchestrator maintains the authoritative copy of the cluster configuration, and performs the necessary node management functions to avoid any conflicts from multiple nodes interacting. If a node becomes unresponsive for any reason, the orchestrator notifies the other nodes in the cluster and promotes the relevant replicas to active status. This process is called failover, and it can be done automatically or manually. If the orchestrator fails or loses communication with the cluster for any reason, the remaining nodes detect the failure when they stop receiving its heartbeat, so they immediately elect a new orchestrator. This is done immediately and is transparent to the operations of the cluster.
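A heavily simplified sketch of the heartbeat-based takeover the quote describes (the node names, timeout, and "lowest name wins" priority rule below are illustrative assumptions, not Couchbase's actual mb_master logic):

```python
import time

# Illustrative sketch only: timeout and election rule are assumptions.
HEARTBEAT_TIMEOUT = 5.0   # seconds of silence before a node is suspect
last_heartbeat = {
    "ns_1@10.200.0.10": time.time() - 10.0,  # orchestrator, gone silent
    "ns_1@10.200.0.11": time.time(),
    "ns_1@10.200.0.12": time.time(),
}

def alive(node: str) -> bool:
    return time.time() - last_heartbeat[node] < HEARTBEAT_TIMEOUT

def elect_orchestrator() -> str:
    live = [n for n in last_heartbeat if alive(n)]
    return min(live)      # deterministic rule, so all nodes agree

if not alive("ns_1@10.200.0.10"):
    print("haven't heard from the orchestrator; electing:",
          elect_orchestrator())
```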
I have 3 nodes in a Couchbase cluster with the number of replicas set to 1.
While performing a multithreaded insert of 1M documents, I restarted one of the nodes a couple of times.
The result is that at the end of the insert operations, I am missing about 15% of the data.
Any idea how to prevent the data loss?
Firstly, did you fail over the node when it went out of the cluster? Until you fail over, the replicas on the other nodes will not be promoted to active (and hence any replica data will not be accessible).
Secondly, are you checking the return value from your insert operations? If a node is inaccessible (but before a failover), operations will return an exception (likely "timeout"); you should ensure the application retries the insert.
Thirdly, by default most CRUD operations on Couchbase return as soon as the update has occurred on the active (master) node, for maximum performance. As a consequence, if you do lose a node, it's possible that the replica hasn't been written yet, so there would be no replica even if you did perform a failover. To prevent this you can use the observe operation to not report the operation "complete" until a replica node has a copy; see Monitoring Items using observe.
Note that using observe will result in a performance penalty, but this may be an acceptable tradeoff for modifications you particularly care about.
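Putting the second and third points together, a sketch along these lines (written against the 2.x-era Couchbase Python SDK; the endpoint and retry policy are assumptions) both checks the outcome of each insert and waits for a replica copy:

```python
from couchbase.bucket import Bucket
from couchbase.exceptions import TimeoutError

# Endpoint is a placeholder; adjust bucket/credentials for your cluster.
bucket = Bucket("couchbase://10.0.0.1/default")

def safe_insert(key, doc, retries=3):
    for _ in range(retries):
        try:
            # replicate_to=1 blocks until one replica also holds the
            # mutation (the observe mechanism described above).
            return bucket.insert(key, doc, replicate_to=1)
        except TimeoutError:
            continue   # node may be down mid-failover; retry
    raise RuntimeError("insert failed after retries: " + key)

safe_insert("doc::1", {"value": 1})
```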
I am using Amazon RDS for my database services and want to use the read replica feature to distribute traffic amongst my read replicas. I currently store the connection information for my database in a single config file. So my idea is that I could create a function that randomly picks from a list of my read-replica endpoints/addresses in my config file any time my application performs a read.
Is there a problem with this idea as long as I don't perform it on a write?
My guess is that if you have a service with enough traffic to need multiple RDS read replicas to balance load across, then you also have multiple application servers in front of it operating behind a load balancer.
As such, you are probably better off having certain clusters of app server instances each pointing at a specific read replica. Perhaps you do this by availability zone.
The thought here is that your load balancer will then serve as the mechanism for properly distributing the incoming requests that ultimately lead to database reads. If the DB reads were instead randomized across different replicas, you could have unexpected spikes where too much traffic happens to be directed to one DB replica, causing latency spikes in your service.
The biggest challenge is that there is no guarantee that the read replicas will be up to date with the master, or with each other, when updates are made. If you pick a different read replica each time you do a read, you could see some strangeness if one of the replicas is behind: one out of N reads would get stale data, giving an inconsistent view of the system.
Choosing a random read replica per transaction or session might be easier to deal with from the consistency perspective.
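As a sketch of that idea, picking once per session from the config file described in the question (the config structure and endpoint names below are hypothetical):

```python
import json
import random

# Hypothetical config layout: one writer endpoint plus a list of
# read-replica endpoints, as described in the question.
config = json.loads("""
{
  "writer":  "mydb.abc123.us-east-1.rds.amazonaws.com",
  "readers": ["replica-1.abc123.us-east-1.rds.amazonaws.com",
              "replica-2.abc123.us-east-1.rds.amazonaws.com"]
}
""")

def reader_for_session() -> str:
    # Chosen once per session rather than per query, so every read in
    # the session sees the same (possibly lagged) replica consistently.
    return random.choice(config["readers"])

print("routing this session's reads to:", reader_for_session())
```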
I'm building a very small NDB cluster with only 3 machines. This means that machine 1 will serve as the MGM server, a MySQL server, and an NDB data node all at once. The database is only 7 GB, so I plan to replicate each node at least once. Now, since a query might end up using data that is cached in the NDB node on machine 1, even if that node isn't the primary source for that data, access would be much faster (for obvious reasons).
Does the NDB cluster work like that? Every example I see has at least 5 machines. The manual doesn't seem to mention how to handle node differences like this one.
There are a couple of questions here:
Availability / NoOfReplicas
MySQL Cluster can give high availability when data is replicated across 2 or more data node processes. This requires that the NoOfReplicas configuration parameter is set to 2 or greater. With NoOfReplicas=1, each row is stored in only one data node, and a data node failure would mean that some data is unavailable and therefore the database as a whole is unavailable.
Number of machines / hosts
For HA configurations with NoOfReplicas=2, there should be at least 3 separate hosts. One host is needed for each of the two data node processes, each of which has a copy of all of the data. A third is needed to act as an 'arbitrator' when communication between the 2 data node processes fails. This ensures that only one of the data nodes continues to accept write transactions, and avoids data divergence (split brain). With only two hosts, the cluster would only be resilient to the failure of one particular host; if the other host failed instead, the whole cluster would fail. The arbitration role is very lightweight, so this third machine can be used for almost any other task as well.
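A minimal config.ini sketch of that 3-host layout (hostnames are placeholders; the management node acts as arbitrator by default):

```ini
[ndbd default]
# Each fragment of data is stored on 2 data nodes.
NoOfReplicas=2

[ndb_mgmd]
# Management node; also serves as the lightweight arbitrator.
HostName=host1

[ndbd]
# Data node 1 (holds a full copy of the data).
HostName=host2

[ndbd]
# Data node 2 (holds a full copy of the data).
HostName=host3

[mysqld]
# SQL node; can share the lightweight third host.
HostName=host1
```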
Data locality
In a 2-node configuration with NoOfReplicas=2, each data node process stores all of the data. However, this does not mean that only one data node process is used to read/write data. Both processes are involved in writes (as they must maintain copies), and generally, either process could be involved in a read.
Some work to improve read locality in a 2-node configuration is under consideration, but nothing is concrete.
This means that when MySQLD (or another NdbApi client) is colocated with one of the two data nodes, there will still be quite a lot of communication with the other data node.