Every createReplication call returns the same replication ID. How can I get a unique ID? The requirement is to check metrics per replication created. We create a replication programmatically and monitor it by periodically calling the stats API; when changes_left reaches 0, we stop the replication programmatically by calling cancelreplication.
The issue is that Couchbase returns the same replication ID every time. How is the replication ID calculated?
Replication ID is ultimately composed in the following format:
<Remote Cluster UUID>/<SourceBucketName>/<TargetBucketName>
XDCR internally has a randomly generated Internal ID to differentiate between instances, but it is not exposed to the outside world.
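To make the consequence concrete, here is a minimal sketch of that format. The UUID and bucket names are placeholders, and `run_key` is a hypothetical application-side helper (not a Couchbase API) showing one way to tell successive runs apart when the ID itself never changes. The percent-encoding step applies whenever the ID is embedded in a URL path segment.

```python
from urllib.parse import quote


def build_replication_id(remote_cluster_uuid: str, source_bucket: str, target_bucket: str) -> str:
    """Compose the ID in the <UUID>/<source>/<target> format described above."""
    return f"{remote_cluster_uuid}/{source_bucket}/{target_bucket}"


def url_encode_id(replication_id: str) -> str:
    """Percent-encode the ID, slashes included, so it fits in one URL path segment."""
    return quote(replication_id, safe="")


def run_key(replication_id: str, run_number: int) -> str:
    """Hypothetical application-side key: the stable ID plus our own run counter."""
    return f"{replication_id}#run-{run_number}"


# Placeholder values, not taken from a real cluster.
rid_first = build_replication_id("0123abcd", "orders", "orders_backup")
rid_second = build_replication_id("0123abcd", "orders", "orders_backup")

# Same cluster and buckets always give the same ID, so metrics must be
# keyed by something the application adds itself.
print(rid_first == rid_second)
print(url_encode_id(rid_first))
print(run_key(rid_first, 1), run_key(rid_second, 2))
```

Because the ID depends only on the remote cluster and the two bucket names, recreating a replication between the same pair cannot produce a new ID; any per-run distinction has to live on the application side.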
I am running a MySQL 5.5 Master-Slave setup. To avoid too many hits on my master server, I am thinking of having one or maybe more additional MySQL servers. Incoming requests would first hit HAProxy, which forwards them using round robin or any other scheduling algorithm defined in HAProxy. The setup would look like this:
APP -> API Gateway/Server -> HAProxy -> Master Server1/Master Server2
What are the pros and cons of this setup?
Replication in MySQL is asynchronous by default, so you can't always assume that the replicas are in sync with their source.
If you intend to use your load-balancer to split writes over the two master instances, you could get into trouble with that because of MySQL's asynchronous replication.
Say you commit a row on master1 to a table that has a unique key. Then you commit a row with the same unique value to the same table on master2, before the change on master1 has been applied through replication. Both servers allowed the row to be committed, because as far as they knew, it did not violate the unique constraint. But then as replication tries to apply the change on each server, those changes do conflict with the row committed. This is called split-brain, and it's incredibly difficult to recover from.
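The sequence above can be reproduced with a toy simulation. This is only a sketch: the `Master` class is a hypothetical in-memory stand-in, not MySQL, and it models nothing beyond a single unique key and a queue of changes that have not been replicated yet.

```python
class Master:
    """Toy stand-in for one MySQL master with a single unique-key table."""

    def __init__(self, name):
        self.name = name
        self.rows = {}    # unique key -> row payload
        self.outbox = []  # committed changes not yet sent to the other master

    def commit(self, key, payload):
        """Local commit: the unique key is checked against local data only."""
        if key in self.rows:
            raise ValueError(f"{self.name}: duplicate key {key!r}")
        self.rows[key] = payload
        self.outbox.append((key, payload))

    def apply_from(self, other):
        """Apply the other master's pending changes; return the conflicting keys."""
        conflicts = []
        for key, payload in other.outbox:
            if key in self.rows and self.rows[key] != payload:
                conflicts.append(key)  # a different row already holds this key
            else:
                self.rows[key] = payload
        return conflicts


master1, master2 = Master("master1"), Master("master2")
master1.commit("alice@example.com", "row written on master1")
# Accepted, because master1's row has not been replicated to master2 yet.
master2.commit("alice@example.com", "row written on master2")

conflicts_on_1 = master1.apply_from(master2)
conflicts_on_2 = master2.apply_from(master1)
print(conflicts_on_1, conflicts_on_2)
```

Both commits succeed locally, and the conflict only surfaces when each side tries to apply the other's change, at which point the two servers hold different rows for the same key.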
If your load-balancer randomly sends some read queries to another instance, they might not return data that you just committed on the other instance. This is called replication lag.
This may or may not be a problem for your app, but it's likely that in your app, at least some of the queries require strong consistency, i.e. reading outdated results is not permitted. Other cases even with the same app may be more tolerant of some replication lag.
I wrote a presentation some years ago about splitting queries between source and replica MySQL instances: https://www.percona.com/sites/default/files/presentations/Read%20Write%20Split.pdf. The presentation goes into more details about the different types of tolerance for replication lag.
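As a rough illustration of splitting queries by lag tolerance, here is a minimal routing sketch. The function name, the `needs_fresh_data` flag, and the one-second default threshold are all assumptions made for the example; a real application would measure replica lag itself and pick thresholds per query type.

```python
def choose_instance(is_write, needs_fresh_data, replica_lag_seconds, max_lag_seconds=1.0):
    """Return "source" or "replica" for one query.

    replica_lag_seconds is None when the lag is unknown, for example
    because replication is not running.
    """
    if is_write or needs_fresh_data:
        return "source"   # writes and strongly consistent reads go to the source
    if replica_lag_seconds is None or replica_lag_seconds > max_lag_seconds:
        return "source"   # replica is too far behind for this query
    return "replica"      # lag-tolerant read and the replica is close enough


print(choose_instance(is_write=True, needs_fresh_data=False, replica_lag_seconds=0.0))
print(choose_instance(is_write=False, needs_fresh_data=False, replica_lag_seconds=0.2))
print(choose_instance(is_write=False, needs_fresh_data=False, replica_lag_seconds=30.0))
```

A load balancer that picks instances at random cannot make this decision, which is why the choice usually has to live in the application or in a query-aware proxy.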
MySQL offers a more sophisticated solution for all of these problems, available since 5.7.17 and in 8.0. It's called Group Replication, and it does its best to ensure that all instances are in sync all the time, so you don't have the risk of reading stale data or creating write conflicts. The downside of Group Replication is that to ensure no replication lag occurs, it may need to constrain your transaction throughput. In other words, COMMITs may be blocked until the other instances in the replication cluster respond.
Read more about Group Replication here: https://dev.mysql.com/doc/refman/8.0/en/group-replication.html
P.S.: Whichever solution you decide to pursue, I recommend you do upgrade your version of MySQL. MySQL 5.5 passed its end-of-life in 2018, so it will no longer get updates even for security flaws.
I am planning to store data in a Couchbase cluster.
I would like to know what will happen if Couchbase goes down in the following scenarios:
[Consider that there were no active transactions happening]
A node goes down from the cluster. (My assumption is that after the node is fixed and is up, it will sync with the other nodes, and the data will be there.) Also, once it has synced, can there still be any data loss?
The whole cluster goes down, is fixed, and is restarted.
Please explain how data persistence works in the above scenarios.
Yes, Couchbase persists data to disk. It writes change operations to append-only files on the Data Service nodes.
Data loss is unlikely for your two scenarios because there are no active transactions.
Data loss can occur if a node fails while persisting a change to disk, or before completing replication to another node (if the bucket supports replicas).
Example: Three Node Cluster with Replication
Consider the case of a three node Couchbase cluster and a bucket with one replica for each document. That means that a single document will have a copy stored on two separate nodes, call those the active and the replica copies. Couchbase will shard the documents equitably across the nodes.
When a node goes down, about a third of the active and replica copies become unavailable.
A. If a brand new node is added and the cluster rebalanced, the new node will have the same active and replica copies as the old one did. Data loss will occur if replication was incomplete when the node failed.
B. If the node is failed over, then replicas for the active documents on the failed node will become active. Data loss will occur if replication was incomplete when the node failed.
C. If the failed node rejoins the cluster it can reuse its existing data so the only data loss would be due to a failure to write changes to disk.
When the cluster goes down, data loss may occur if there is a disk failure.
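The three-node example can be sketched numerically. This is a simplified model under stated assumptions: the round-robin placement below is made up for illustration and is not Couchbase's actual vBucket map, and 1024 is used because that is Couchbase's usual vBucket count per bucket.

```python
NUM_NODES = 3
NUM_SHARDS = 1024  # Couchbase's usual vBucket count per bucket


def place(shard):
    """Assumed placement: active copy on one node, replica on the next node."""
    active = shard % NUM_NODES
    replica = (shard + 1) % NUM_NODES
    return active, replica


placement = {shard: place(shard) for shard in range(NUM_SHARDS)}


def impact_of_failure(failed_node):
    """Count the active and replica copies that lived on the failed node."""
    lost_active = sum(1 for active, _ in placement.values() if active == failed_node)
    lost_replica = sum(1 for _, replica in placement.values() if replica == failed_node)
    return lost_active, lost_replica


def failover(failed_node):
    """Case B: promote the replica wherever the active copy was on the failed node."""
    return {
        shard: (replica if active == failed_node else active)
        for shard, (active, replica) in placement.items()
    }


lost_active, lost_replica = impact_of_failure(0)
print(lost_active, lost_replica)  # roughly a third of each
active_after_failover = failover(0)
print(0 in active_after_failover.values())
```

After the failover every shard has an active copy on a surviving node, but a promoted replica only contains the changes that had been replicated before the failure, which is where the data loss in cases A and B comes from.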
We have an architecture for a scalable web application on AWS that uses AWS RDS MySQL. The advice is to create 2 slave read replicas for the MySQL master DB instance. The master DB then synchronizes data to all of the read replicas whenever a change occurs on the master. The application has to split read and write operations so that all read requests go to the read replicas (via a load balancer or DNS) and write requests go to the master DB.
Now my question: a user visits a page that performs a write operation, then clicks through to a page that needs to read the newly entered data. How long will the master DB take to sync with the read replicas so that the user can see the newly created record on the very next page?
RDS MySQL Read Replica lag is influenced by a number of factors including the load on both the primary and secondary instances, the amount of data being replicated, the number of replicas, if they are within the same region or cross-region, etc. Lag can stretch to seconds or minutes, though typically it is under one minute.
For low-lag (10s of milliseconds) read replicas in a MySQL-compatible database you can use Amazon Aurora.
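Since lag cannot be bounded in advance, one common workaround for the "write, then read on the next page" case is to send a user's reads to the primary for a short window after that user writes. The sketch below assumes this approach; the `ReadRouter` class and the 5-second window are hypothetical and not part of RDS or Aurora.

```python
import time


class ReadRouter:
    """Route a user's reads to the primary for a short time after they write."""

    def __init__(self, pin_seconds=5.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds  # assumed window; tune to the observed lag
        self.clock = clock              # injectable so the logic can be tested
        self._last_write = {}           # user id -> time of that user's last write

    def record_write(self, user_id):
        self._last_write[user_id] = self.clock()

    def target_for_read(self, user_id):
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            return "primary"  # recent write: the replica may not have it yet
        return "replica"


router = ReadRouter()
router.record_write("user-42")
print(router.target_for_read("user-42"))  # this user just wrote
print(router.target_for_read("user-7"))   # this user has not written
```

Only users who have just written pay the cost of reading from the primary; everyone else keeps reading from the replicas.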
I've purchased a single VPC on AWS and started 6 MySQL databases there, and for each one I've created a read replica so that I can always run queries against the read replicas quickly.
Most of the day, my writing instances (the original instances) are fully loaded, with CPU usage mostly at 99%. The read replicas show only ~7-10% CPU usage, but sometimes a service connecting to a read replica fails with the error "TOO MANY CONNECTIONS".
I'm not an expert with AWS, but is this happening because the writing replicas are fully loaded and they're on the same VPC?
"Is this happening because the writing replicas are fully loaded and they're on the same VPC?"
No, it isn't. This is unrelated to replication. In replication, the replica counts as exactly 1 connection on the master, but replication does not consume any connections on the replica itself. There is no impact on connections related to the intensity of the total workload from replication.
This issue simply means you have more clients connecting to the replica than are allowed by the parameter group based on your RDS instance type. Use the query SELECT @@MAX_CONNECTIONS; to see what this limit is. Use SHOW STATUS LIKE 'THREADS_CONNECTED'; to see how many connections currently exist, and use SHOW PROCESSLIST; (as the administrative user, or any user holding the PROCESS privilege) to see what all of these connections are doing.
If many of them show Sleep and have long values in Time (seconds spent in the current state) then the problem is that your application is somehow abandoning connections, rather than properly closing them after use or when they are otherwise no longer needed.
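The check described above can be applied to the output of SHOW PROCESSLIST once it has been fetched into rows. The sketch below works on plain dictionaries keyed by the processlist column names; the function name, the 300-second threshold, and the sample rows are made up for illustration.

```python
def find_abandoned(processlist_rows, min_idle_seconds=300):
    """Return sleeping connections idle for at least min_idle_seconds, longest first."""
    idle = [
        row for row in processlist_rows
        if row["Command"] == "Sleep" and row["Time"] >= min_idle_seconds
    ]
    return sorted(idle, key=lambda row: row["Time"], reverse=True)


# Made-up rows shaped like SHOW PROCESSLIST output.
sample_rows = [
    {"Id": 11, "User": "app", "Host": "10.0.0.5:51234", "Command": "Sleep", "Time": 4200},
    {"Id": 12, "User": "app", "Host": "10.0.0.5:51240", "Command": "Query", "Time": 2},
    {"Id": 13, "User": "app", "Host": "10.0.0.6:40112", "Command": "Sleep", "Time": 15},
    {"Id": 14, "User": "report", "Host": "10.0.0.9:38800", "Command": "Sleep", "Time": 900},
]

for row in find_abandoned(sample_rows):
    print(row["Id"], row["User"], row["Host"], row["Time"])
```

Grouping the result by User or Host usually points at the service that is leaking connections.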
For my application I need my database to handle, say, 1000 updates per second at peak time. This isn't too much of a problem; I just need the right server. However, if this server goes down I need a backup with synced data to take over. How do I sync the data to another database?
In a separate part of my application I have a master and a slave; the slave replicates the master and is read-only. Could I use this method for my problem? I have looked into MySQL clusters, but so far reading about them is just making me more confused.
So put simply, how can I replicate my database handling 1000 writes per second, in case of downtime?
There are two solutions: one simple but requiring manual reconfiguration if the main server goes down, the other more complex but more robust.
A) Simple replication - you can configure a slave server that receives updates from the master server. Both servers must be able to handle the number of updates and queries that you foresee. In the event of the master server failing, you need to manually swap the slave into the master role. http://dev.mysql.com/doc/refman/5.0/en/replication.html
B) Clustering - I'm not very familiar with MySQL clustering, but it gives synchronous updates to all servers and automatic failover - http://www.mysql.com/products/cluster/
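For option A, the application side of the swap can be as small as choosing the first reachable server from an ordered list. This is only a sketch: `pick_writer` and the host names are hypothetical, the health check is injected rather than implemented, and the replica still has to be promoted to accept writes before traffic is pointed at it.

```python
def pick_writer(servers, is_alive):
    """Return the first reachable server, in priority order (master first)."""
    for server in servers:
        if is_alive(server):
            return server
    raise RuntimeError("no database server is reachable")


# Hypothetical host names; the master is listed first, the replica second.
servers = ["db-master.example.internal", "db-replica.example.internal"]

# Simulated health check: pretend the master is down.
down = {"db-master.example.internal"}
writer = pick_writer(servers, is_alive=lambda host: host not in down)
print(writer)
```

A flapping health check could send writes to both servers in turn, so in practice the switch is usually made once, deliberately, and not re-evaluated on every request.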