We have an architecture for a scalable web application on AWS that uses AWS RDS MySQL. The recommendation is to create two slave read replicas for your MySQL master DB instance. With this setup, the master DB synchronizes data to all of the read replicas whenever a change occurs on the master instance. Your application has to split read and write operations so that all read requests go to the read replicas (via a load balancer or DNS) and all write requests/operations go to the master DB.
Now my question: if a user visits a page that performs a write operation, completes it, and then clicks through to a page that needs to read the newly entered data, how long will the master DB take to sync with the slave read replicas so that the user can successfully see the result of the read operation (i.e. the newly created record) on the very next page?
RDS MySQL read replica lag is influenced by a number of factors, including the load on both the primary and replica instances, the amount of data being replicated, the number of replicas, whether they are in the same region or cross-region, and so on. Lag can stretch to seconds or minutes, though it is typically under one minute.
For low-lag (tens of milliseconds) read replicas in a MySQL-compatible database, you can use Amazon Aurora.
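In practice, you should not rely on the replica having caught up by the time the next page loads. A common application-side workaround is to route a read that immediately follows a write in the same session to the master ("read your writes"). A minimal sketch in Java, with placeholder endpoints and credentials, and one plain JDBC connection per call (a real application would use a connection pool):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Minimal read/write router: reads normally go to a replica, but a read
// that immediately follows a write in the same session goes to the master,
// so the user sees their own change despite replication lag.
// The endpoints and credentials below are placeholders, not real hostnames.
public class ReadWriteRouter {
    private static final String WRITER = "jdbc:mysql://master.example.rds.amazonaws.com/app";
    private static final String READER = "jdbc:mysql://replica.example.rds.amazonaws.com/app";

    // Tracks whether this session has written recently ("read your writes").
    private boolean recentlyWrote = false;

    public Connection connectionForWrite() throws SQLException {
        recentlyWrote = true;
        return DriverManager.getConnection(WRITER, "user", "password");
    }

    public Connection connectionForRead() throws SQLException {
        // After a write, pin the very next read to the master.
        String url = recentlyWrote ? WRITER : READER;
        recentlyWrote = false;
        return DriverManager.getConnection(url, "user", "password");
    }
}
```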
As per How To Set Up Replication in MySQL,
Once the replica instance has been initialized, it creates two threaded processes. The first, called the IO thread, connects to the source MySQL instance and reads the binary log events line by line, and then copies them over to a local file on the replica's server called the relay log. The second thread, called the SQL thread, reads events from the relay log and then applies them to the replica instance as fast as possible.
Isn't it contradictory to the theory of master-slave database replication in which the master copies data to the slaves?
Reliability. (A mini-history of MySQL's efforts.)
When a write occurs on the Primary, N+1 extra actions occur:
One write to the binlog -- this is to allow for any Replicas that happen to be offline (for any reason); they can come back later and request data from this file. (Also see sync_binlog)
N network writes, one per Replica. These are to get the data to the Replicas ASAP.
Normally, if you want more than a few Replicas, you can "fan out" through several levels, thereby allowing for an unlimited number of Replicas. (10 per level would give you 1000 Replicas in 3 layers.)
The product called Orchestrator carries this to an extra level -- the binlog is replicated to an extra server and the network traffic occurs from there. This offloads the Primary. (Booking.com uses it to handle literally hundreds of replicas.)
On the Replica's side the two threads were added 20 years ago because of the following scenario:
The Replica is busy doing one query at a time.
It gets busy with some long query (say an ALTER)
Lots of activity backs up on the Primary
The Primary dies.
Now the Replica finishes the ALTER but has nothing else to work on, so it is very "behind" and will take extra time to "catch up" once the Primary comes back online. With a single thread, the Replica could not even fetch events while the ALTER ran; with a separate IO thread, the backlog is already sitting in the relay log, ready to be applied.
Hence, the 2-thread Replica "helps" keep things in sync, but it is still not fully synchronous.
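You can watch both threads from the replica itself. A minimal sketch over JDBC, using the pre-8.0.22 statement and column names; newer MySQL versions spell these SHOW REPLICA STATUS, Replica_IO_Running, Replica_SQL_Running, and Seconds_Behind_Source. The endpoint and credentials are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Reads the replica's view of the two replication threads and its lag.
public class ReplicationStatus {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:mysql://replica.example.com/", "monitor", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW SLAVE STATUS")) {
            if (rs.next()) {
                // IO thread: is it still pulling binlog events into the relay log?
                System.out.println("IO thread:  " + rs.getString("Slave_IO_Running"));
                // SQL thread: is it still applying events from the relay log?
                System.out.println("SQL thread: " + rs.getString("Slave_SQL_Running"));
                // Estimated seconds the SQL thread is behind the source.
                System.out.println("Lag (s):    " + rs.getString("Seconds_Behind_Master"));
            }
        }
    }
}
```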
Later there was "semi-synchronous" replication and multiple SQL threads in the Replica (still a single I/O thread).
Finally, InnoDB Cluster and Galera became available to provide [effectively] synchronous replication. But they come with other costs.
"master-slave database replication in which the master copies data to the slaves" - it's just a concept - data from a leader is copied to followers. There are many options how this could be done. Some of those are the write ahead log replication, blocks replication, rows replication.
Another interesting approach is to use a replication system completely separate from the storage. An example for this would be Bucardo - replication system for PostgreSQL. In that case nighter leader or follower actually do work.
I've purchased a single VPC on AWS and launched 6 MySQL databases in it, and for each one I've created a read replica so that I can always run queries against the read replicas quickly.
For most of the day my writer instances (the original instances) are fully loaded, with CPU mostly at 99%. The read replicas, however, show only about 7-10% CPU usage, yet sometimes I get a "TOO MANY CONNECTIONS" error when I run a service that connects to a read replica.
I'm not that experienced with AWS, but is this happening because the writer instances are fully loaded and they're on the same VPC?
is this happening because the writer instances are fully loaded and they're on the same VPC?
No, it isn't. This is unrelated to replication. In replication, the replica counts as exactly 1 connection on the master, but replication does not consume any connections on the replica itself. The intensity of the replication workload has no impact on the replica's connection count.
This error simply means you have more clients connecting to the replica than are allowed by the parameter group for your RDS instance type. Use the query SELECT @@MAX_CONNECTIONS; to see what this limit is. Use SHOW STATUS LIKE 'THREADS_CONNECTED'; to see how many connections currently exist, and use SHOW PROCESSLIST; (as the administrative user, or any user holding the PROCESS privilege) to see what all of these connections are doing.
If many of them show Sleep and have large values in Time (seconds spent in the current state), then the problem is that your application is abandoning connections rather than properly closing them after use or when they are otherwise no longer needed.
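If the diagnosis is abandoned connections, the usual application-side fix is to guarantee that every connection is closed on every code path. A minimal Java sketch, with placeholder endpoint, credentials, and query, where try-with-resources closes everything even when the query throws:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Every resource declared in the try header is closed automatically,
// even if an exception is thrown, so connections are never left
// sitting in Sleep state on the replica.
public class ReplicaQuery {
    public static int countOrders(String userId) throws SQLException {
        String url = "jdbc:mysql://replica.example.com/app"; // placeholder endpoint
        try (Connection conn = DriverManager.getConnection(url, "app", "password");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT COUNT(*) FROM orders WHERE user_id = ?")) {
            ps.setString(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getInt(1);
            }
        } // conn, ps, and rs are all closed here, on success or failure
    }
}
```

A connection pool (HikariCP, for example) with a sensible maximum pool size per app server also helps keep the total connection count across all servers below @@MAX_CONNECTIONS.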
For my application I need my database to handle, say, 1000 updates per second at peak time. This isn't too much of a problem; I just need the right server. However, if this server goes down, I need a backup with the synced data to take over. How do I sync the data to another database?
In a separate part of my application I have a master and a slave; the slave replicates the master and is read-only. Could I use this method for my problem? I have looked into MySQL clusters, but so far reading about clusters is just making me more confused.
So, put simply: how can I replicate my database handling 1000 writes per second, in case of downtime?
There are two solutions: one simple but requiring manual reconfiguration in the event of the main server going down, the other more complex but more robust.
A) Simple replication - you can configure a slave server that receives updates from the master server. Both servers must be able to handle the number of updates and queries that you foresee. In the event of the master server failing, you need to manually swap the slave into the master role (see the sketch after option B). http://dev.mysql.com/doc/refman/5.0/en/replication.html
B) Clustering - I'm not very familiar with MySQL clustering, but it gives synchronous updates to all servers and automatic failover - http://www.mysql.com/products/cluster/
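For option A, the manual swap is much less painful if the application never hardcodes the master's address. A minimal sketch, assuming the writer endpoint comes from an environment variable (a DNS CNAME you repoint works the same way); DB_WRITER_HOST and the credentials are made-up names for illustration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// The writer endpoint is read from configuration at connect time, so a
// manual failover is just: promote the slave, update DB_WRITER_HOST,
// and the application follows -- no code change needed.
public class WriterConnectionFactory {
    public static Connection open() throws SQLException {
        String host = System.getenv().getOrDefault("DB_WRITER_HOST", "master.example.com");
        String url = "jdbc:mysql://" + host + "/app";
        return DriverManager.getConnection(url, "app", "password");
    }
}
```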
I am using Amazon RDS for my database services and want to use the read replica feature to distribute traffic among my read replica volumes. I currently store the connection information for my database in a single config file. My idea is that I could create a function that randomly picks from a list of read-replica endpoints/addresses in my config file any time my application performs a read.
Is there a problem with this idea, as long as I don't do it for writes?
My guess is that if you have a service with enough traffic to warrant multiple RDS read replicas for load balancing, then you also have multiple application servers in front of it operating behind a load balancer.
As such, you are probably better off having certain clusters of app server instances each pointing at a specific read replica. Perhaps you do this by availability zone.
The thought here is that your load balancer will then serve as the mechanism for properly distributing the incoming requests that ultimately lead to database reads. If DB reads were randomized across different replicas, you could see unexpected spikes where too much traffic happens to be directed at one replica, causing latency spikes in your service.
The biggest challenge is that there is no guarantee that the read replicas will be up to date with the master, or with each other, when updates are made. If you pick a different read replica each time you do a read, you could see some strangeness when one of the replicas is behind: one out of N reads would return stale data, giving an inconsistent view of the system.
Choosing a random read replica per transaction or session might be easier to deal with from the consistency perspective.
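A sketch of that per-session approach, assuming the replica endpoints come from the config file mentioned in the question; class and method names here are illustrative, not a real API:

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Pick one replica per session and keep using it, rather than a new
// random replica per query: reads within the session stay consistent
// with each other even when replicas lag by different amounts.
public class ReplicaPicker {
    private final List<String> replicaHosts; // loaded from the config file

    public ReplicaPicker(List<String> replicaHosts) {
        this.replicaHosts = replicaHosts;
    }

    // Called once when a user session starts; store the result in the session.
    public String pickForSession() {
        return replicaHosts.get(ThreadLocalRandom.current().nextInt(replicaHosts.size()));
    }

    // Alternative: pin a session to a replica deterministically by session id,
    // which also spreads sessions evenly across replicas.
    public String pickForSession(String sessionId) {
        int idx = Math.floorMod(sessionId.hashCode(), replicaHosts.size());
        return replicaHosts.get(idx);
    }
}
```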
I've got a Java web service backed by MySQL + EC2 + EBS. For data integrity I've looked into DRBD, MySQL Cluster, etc., but wonder if there isn't a simpler solution. I don't need high availability (I can handle downtime).
There are only a few operations whose data I need to preserve - creating an account, changing a password, recording a purchase receipt. The majority of the data I can afford to recover from a stale backup.
What I am thinking is that I could pipe selected INSERT/UPDATE statements to durable storage (S3 or SimpleDB, for instance) and, when required (when the DB blows up), replay these statements from the point of the last backup. And wouldn't it be neat if this functionality were implemented in the JDBC driver itself.
Is this too silly to work, or am I missing another obvious and robust solution?
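The idea can work as a write-ahead journal for the handful of critical statements. A rough sketch, journaling to a local append-only file and syncing it to disk before the statement executes; in your design the append step would instead ship the entry to S3 or SimpleDB (omitted here so the sketch stays self-contained), and the parameter encoding shown is deliberately naive:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Journal each critical statement durably *before* executing it, so the
// journal can be replayed on top of the last stale backup after a loss.
// Only the few critical operations go through this path.
public class CriticalWriteJournal {
    private final Path journal = Paths.get("/var/log/app/critical-writes.log");

    public void execute(Connection conn, String sql, String... params)
            throws IOException, SQLException {
        // 1. Append the statement and its parameters to the journal and flush
        //    to disk before touching the database (write-ahead ordering).
        String entry = System.currentTimeMillis() + "\t" + sql
                + "\t" + String.join("\t", params) + "\n";
        Files.write(journal, entry.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND,
                StandardOpenOption.SYNC);

        // 2. Execute against the database only after the journal write succeeded.
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < params.length; i++) {
                ps.setString(i + 1, params[i]);
            }
            ps.executeUpdate();
        }
    }
}
```

Replay then means reading the journal forward from the timestamp of the last good backup and re-executing each entry; a real journal would need proper parameter escaping and idempotent replay.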
Have you looked into moving your MySQL into Amazon Web Services as well? You can use Amazon Relational Database Service (RDS). Also see MySQL Enterprise Support.
You always have a window where total loss of a server and associated file storage will result in some amount of lost data.
When I ran a modestly busy SaaS solution in AWS, I had a MySQL Master running on a large instance and a MySQL Slave running on a small instance in a different availability zone. The replication lag was typically no more than 2 seconds, though a surge in traffic could take that up to a minute or two.
If you can't afford to lose 5 minutes of data, I would suggest running a Master/Slave setup rather than rolling your own recovery mechanism. If you do roll your own, ensure the "stale" backups and the logged/journaled critical data are in a different availability zone. AWS has lost entire zones before.