persistently replicating RDS MySQL database to external slave - mysql

AWS now allows you to replicate data from an RDS instance to an external MySQL database.
However, according to the docs:
Replication to an instance of MySQL running external to Amazon RDS is only supported during the time it takes to export a database from a MySQL DB instance. The replication should be terminated when the data has been exported and applications can start accessing the external instance.
Is there a reason for this? Can I choose to ignore this if I want the replication to be persistent and permanent? Or does AWS enforce this somehow? If so, are there any work-arounds?

It doesn't look like Amazon explicitly states why they don't support ongoing replication other than the statement you quoted. In my experience, if AWS doesn't explicitly document a reason for why they do something then you're not likely to find out unless they decide to document it at a later time.
My guess would be that it has to do with the dynamic nature of Amazon instances and how they operate within RDS. RDS instances can have their IP address change suddenly without warning. We've encountered that on more than one occasion with the RDS instances that we run. According to the RDS Best Practices guide :
If your client application is caching the DNS data of your DB instances, set a TTL of less than 30 seconds. Because the underlying IP address of a DB instance can change after a failover, caching the DNS data for an extended time can lead to connection failures if your application tries to connect to an IP address that no longer is in service.
Given that RDS instances can and do change their IP address from time to time my guess is that they simply want to avoid the possibility of having to support people who set up external replication only to have it suddenly break if/when an RDS instance gets assigned a new IP address. Unless you set the replication user and any firewalls protecting your external mysql server to be pretty wide open then replication could suddenly stop if the RDS master reboots for any reason (maintenance, hardware failure, etc). From a security point of view, opening up your replication user and firewall port like that are not a good idea.

Related

AWS RDS connection from external client extremely slow

I am currently connecting to an RDS instance (MariaDB) without an issue from within the configured VPC.
I am also connecting to the RDS instance from local clients (external to the VPC) with no connectivity issues but have serious issues with SQL execution speeds. It can take up to 20 times longer to execute a query remotely vs locally (an EC2 on same VPC).
I have the Security Group for the RDS instance setup to allow the external IPs as incoming rules and the RDS instance is listening on a non-default port (not 3306).
I cannot think of anything I should be doing differently on the network side of things and I have set skip-name-resolve=1, yet the speed is ridiculous.
It has no preference in terms of what the SQL query may be (SELECT, UPDATE, DELETE), they all execute slow.
Server RDS is MariaDb 10.1.19 on a db.t2.medium instance.
Client connection is via MySQL .NET Connector and connection string:
Server=<ip>;Port=<port>;Database=<dbname>;User ID=<dbuser>;Pooling=true;CharSet=utf8;Password=<dbpass>
Client has no connectivity or speed issues when DB in not an RDS (local MySQL).
I have seen various network related issues popping up now and then (connection stream dropped) but nothing serious apart from that, just very slow.
Any pointers on how to at least determine where the problem is?
The scenario I am trying to achieve (with acceptable speeds) is described here (albeit vague in their instructions):
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.Scenarios.html#USER_VPC.Scenario4

Amazon Aurora DB Cluster Not Auto Balancing Correctly

I have created an Amazon Aurora Database cluster runing MySQL with three instances: the main instance that backs the cluster and two read replicas for balancing. However, the cluster does not seem to be balancing the reads at all. I have one replica managing 700+ Selects/sec maximizing the CPU at 99.75% or higher while the other replica is doing virtually nothing with a CPU usage of 4% at 1 select per second, if that. The main cluster instance itself is at 33% CPU usage as it is being written to simultaneously while the replicas should are being read from. The lag time between the replicas is under 20 milliseconds. My application is querying the read only endpoint of the cluster but no balancing appears to be happening. Does anyone have any insight into why this may be happening or why the replica is at such a high CPU usage? The queries being ran against it are not complex by any means.
Aurora Cluster endpoints are DNS records and they only do DNS round robin during resolution. This means that when your client application opens connections to a cluster endpoint, you end up resolving the endpoint to different instances (different IPs basically), there by striping your connections across multiple replicas. Past that point, there is no load balancing. Connections are striped across instances, and queries run on each of those connections go to the corresponding instance backing it.
Now consider the scenario where your connection pool was already created to the cluster endpoint when you have one instance behind it. Now, if you add more instances, there will be no impact to your application, unless you terminate your connection and reestablish them. You would do a DNS round robin again, and this time some of your connections would land on the new instance that you provisioned.
Few callouts:
In Aurora, you have 2 cluster endpoints. One (RW) endpoint always points to the current writer and one (RO) does the DNS round robin between your read replicas.
Also, DNS propagation might take a few seconds when failovers happen, so that occasional errors are quite natural when failovers occur.
Hope this helps.
We've implemented a driver to try to mitigate this problem, with some visible gains: https://github.com/DiceTechnology/dice-fairlink
It regularly discovers the read-replicas to catch up with cluster changes and round-robins connections among them.
Despite not measuring any CPU utilisation, we've observed a better load distribution than with the native DNS based round-robin of the cluster reader endpoint
The Aurora's DNS based load balancing works at the connection level (not the individual query level). You must keep resolving the endpoint without caching DNS to get a different instance IP on each resolution. If you only resolve the endpoint once and then keep the connection in your pool, every query on that connection goes to the same instance. If you cache DNS, you receive the same instance IP each time you resolve the endpoint.
Unless you use a smart database driver, you depend on DNS record updates and DNS propagation for failovers, instance scaling, and load balancing across Aurora Replicas. Currently, Aurora DNS zones use a short Time-To-Live (TTL) of 5 seconds. Ensure that your network and client configurations don’t further increase the DNS cache TTL. Remember that DNS caching can occur anywhere from your network layer, through the operating system, to the application container. For example, Java virtual machines (JVMs) are notorious for caching DNS indefinitely unless configured otherwise. Here are AWS documentation and Aurora whitepaper on configuring DNS cache ttl.
My guess is that you are not connecting to the cluster endpoint.
Load Balancing – Connecting to the cluster endpoint allows Aurora to load-balance connections across the replicas in the DB cluster. This helps to spread the read workload around and can lead to better performance and more equitable use of the resources available to each replica. In the event of a failover, if the replica that you are connected to is promoted to the primary instance, the connection will be dropped. You can then reconnect to the reader endpoint in order to send your read queries to the other replicas in the cluster.
New Reader Endpoint for Amazon Aurora – Load Balancing & Higher Availability
[EDIT]
To load balance within a single application, you will need to reconnect to the endpoint. If you use the same connection for all queries only one replica will be responding. However, opening connections is expensive so this might not provide much benefit unless your queries take some time to run.

Amazon RDS two way replication

I have two rds which have the same structure, each db serves one of two apps. Some of tables are using by first app some by other. I want to set up two ways read/write replication between two rds. I can do it with stand alone mysql, set table replication (https://dev.mysql.com/doc/refman/5.7/en/replication-rules-table-options.html) but can not find any option to do so for rds
This is is probably what you'd call a Really Bad Idea™.
RDS does allow you to configure an RDS instance as a slave of another machine.
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/mysql_rds_set_external_master.html
Of course, on the same page...
Warning
Do not use mysql.rds_set_external_master to manage replication between two Amazon RDS DB instances.
...however, that appears to be because that's not how you configure an RDS instance to be a replica of another instance. When you're configuring a read-only replica, you don't use this -- RDS manages all of the replication configuration for you.
mysql.rds_set_external_master() is a stored procedure that allows you to execute CHANGE MASTER TO... since, in RDS, you lack the SUPER privilege and would otherwise be unable to do this.
The feature is designed for hot migrations from a non-RDS MySQL server to RDS, by replicating the events from the external master into RDS during the transition.
...however, if there is a way to do what you are trying to do with RDS, this would be it. Each instance would be set to use the other as its master.
The two would need to have network connectivity, which means they'd need to be in the same region and same VPC, or you'd have to handle the peering or tunnel configuration yourself to establish that network path.
This is almost certainly an unsupported configuration, but again, if there is a way to accomplish it, this would be the way. "Unsupported" doesn't mean it won't work, but only that AWS support will not likely be able to provide assistance if it doesn't.
Did I mention this might not be a good idea?

Share localhost mysql database with cloud computing instance

How can I copy the content of a specific table (or the table as is) from my local database to a database instance stored on a cloud, lets say Amazon RDS?
note: it has to be done ones every hour.
EDIT:
Other I/O operations on the local database should no be suspended (e.g. no READ LOCKS).
You can set your local database server to be a master to the Amazon RDS instance which means the Amazon RDS instance becomes a slave in this setup. This is possible to do as mentioned in the AWS documentation here http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.Procedural.Importing.External.Repl.html
You can also configure the slave to update only a specific table in the database after a specified interval of time.

Architecture of MySQL on EBS for scaling (Amazon Web Services)

I'm trying to understand how to architect an Amazon Web Services application.
I have an instance running off of EBS. As far as I understand, I need to mount the EBS drive so that I can store my MySQL database on it.
When I later want to scale up, how do I do so? I understand that I can add more server instances, but how will they be accessing the database? Since from what I understand, the EBS volume can only be attached to one server instance.
I can't speak to this particular setup as I do not have experience using EBS with a MySQL instance but how this type of scaling is typically accomplished is by dedicating a particular instance as the master database server. Any time you spin up additional web servers those are still using the master DB IP to connect. At the time in which your database is the bottleneck you then spin up a slave DB instance on one of the boxes (or its own dedicated box). You can then configure replication in either a master to slave direction or a circular replication so that you can write to the slave instance as well.
If you choose the classic master to slave replication then you will have to make sure your writes are only performed on the master DB instance.
You can setup something like Zeus or any other connection load balancer so that you only ever have to connect to a single Database IP which will then round-robin route your read connections to your pool of servers. Otherwise you'd have to manage the connections yourself which is definitely not trivial. Good luck.
Growing Amazon EBS Volume sizes
You can give a try to MySQL clustering on your EBS backed instances. I have similar query, with more requirements, posted here.
EBS Volumes capacity can be scaled up using Snapshot->launch new volume technique, alternatively storage capacity can be scaled out using EBS Striping (RAID 0).
In AWS you cannot mount same EBS Volume to 2 EC2 instances simultaneously, so when you are scaling your application you need to scale out / up your MySQL DB either thru Replication or clustering. AWS RDS is a very good option for MySQL , if your application is read intensive then you can scale out using RDS Read replica's as well. If you need write scaling then functional partition or MySQL Shards can be explored.
AWS has an entire product dedicated to this: RDS.
In all but the rarest and most specialized of circumstances you're going to be better off using RDS than trying to create and tune your own EBS/EC2/MySQL infrastructure.
RDS also directly answers your question - they directly enable the creation of readonly databases to use as query slaves. RDS also performs backups, upgrades, and all sorts of fail-over infrastructure for you.
With EBS there's no way to attach a disk to multiple EC2 instances, so you're not going to be able to build out a failure cluster using that approach. Instead you're going to need replication or backup tools of some type.