I am starting to plan a multi-region (us-east and us-west) web app backed by an AWS RDS MySQL database. Can any AWS guru clarify my concern?
I will have Multi-AZ deployments for redundancy/high availability, and read replicas across regions for faster processing of read requests.
My concern/question:
If the master DB instance is in us-west, and write requests from instances/app servers in us-east are routed to the DB endpoint in us-west, does this cause lag in the app, or is this how many AWS users do it?
The read instances local to the app servers are not used for writes.
You can't defeat the speed of light.
Having a server write to a database that may be 80 ms away may not result in acceptable performance. Only you can determine this.
You run into the same issue if you use MySQL replication across regions.
Now, if you just want to have read replicas across regions, with all writes directed to a single region, you can probably make that work.
If you really need a fast, globally distributed database, consider using something like DynamoDB.
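To illustrate that single-write-region pattern, here is a minimal Python sketch using pymysql; the endpoint hostnames and credentials are placeholders, not your actual configuration:

```python
import pymysql

# Hypothetical endpoints: writes always go to the primary region,
# reads go to the replica in the app server's own region.
PRIMARY_ENDPOINT = "mydb.cluster-xyz.us-west-2.rds.amazonaws.com"
LOCAL_REPLICA_ENDPOINT = "mydb-replica.xyz.us-east-1.rds.amazonaws.com"

def connect(host):
    return pymysql.connect(host=host, user="app", password="secret",
                           database="appdb", autocommit=True)

def write_order(order_id, total):
    # Writes cross the continent to the single primary; expect the
    # extra round-trip latency described above.
    with connect(PRIMARY_ENDPOINT) as conn:
        with conn.cursor() as cur:
            cur.execute("INSERT INTO orders (id, total) VALUES (%s, %s)",
                        (order_id, total))

def read_order(order_id):
    # Reads stay in-region against the local replica (may be slightly stale).
    with connect(LOCAL_REPLICA_ENDPOINT) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT id, total FROM orders WHERE id = %s",
                        (order_id,))
            return cur.fetchone()
```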
Related
I'm trying to create a disaster recovery plan for Aurora MySQL that is cost-efficient, maintainable, and has little downtime.
I want two read/write databases in two different regions; call them primary-us-east-1 and backup-us-east-2. I also want bidirectional replication between primary-us-east-1 and backup-us-east-2. Only one database will be connected to at any time, so collisions are not a concern. In the event that region us-east-1 goes down, all I have to do is trigger a DNS switch to point to us-east-2, since backup-us-east-2 is already up to date.
I've looked into Aurora Global Databases, but this requires promoting a read replica in a secondary region to a master and then updating the DNS to recover from a region outage. I like the zero-work data replication across several regions, but I don't like losing maintainability in the process: the newly created resources (clusters/replicas) won't be manageable in CDK if they are created through a Lambda or by hand.
Is what I'm asking for possible? If yes, does anyone know of a replication solution so data can be copied between primary-us-east-1 and backup-us-east-2?
UPDATE 1:
A potential solution is standing up the Aurora MySQL resources primary-us-east-1 and backup-us-east-2 using CDK, keeping them in sync with AWS Database Migration Service continuous replication, and using a Lambda to detect a region outage and perform the DNS switch to point to backup-us-east-2. The only follow-up task would be bringing primary-us-east-1 back in sync with backup-us-east-2.
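For the DNS switch itself, the Lambda would boil down to a single Route 53 call. A minimal boto3 sketch, where the hosted zone ID, record name, and backup endpoint are all placeholders:

```python
import boto3

route53 = boto3.client("route53")

# Hypothetical values; substitute your hosted zone and record.
HOSTED_ZONE_ID = "Z0000000EXAMPLE"
RECORD_NAME = "db.example.com."
BACKUP_ENDPOINT = "backup-us-east-2.cluster-xyz.us-east-2.rds.amazonaws.com"

def fail_over_to_backup():
    # UPSERT the CNAME so clients resolve the backup cluster's endpoint.
    # Keep the TTL short (e.g. 60s) so the cutover propagates quickly.
    route53.change_resource_record_sets(
        HostedZoneId=HOSTED_ZONE_ID,
        ChangeBatch={
            "Comment": "Fail over to us-east-2",
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": RECORD_NAME,
                    "Type": "CNAME",
                    "TTL": 60,
                    "ResourceRecords": [{"Value": BACKUP_ENDPOINT}],
                },
            }],
        },
    )
```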
Whole-region outages are very rare (see https://awsmaniac.com/aws-outages/). I would be cautious about how much effort you invest in trying to automate detection and failover for such cases. It's a lot of work, if it's possible at all: extremely hard to do right, hard to test, and hard to keep working, with lots of potential for false-positive failover events or out-of-control flip-flopping. Whole companies have started up and failed trying to create fully automated failover solutions. I would bet that even the FAANG companies don't achieve it, but rely on site reliability engineers to respond to outages.
IMO, it's more cost-effective to develop a nicely written runbook for manual cutover to the other region, and then make sure your staff practice region failover periodically. This ensures the docs are kept up to date, the tools work, and the team is familiar with the steps.
DNS updates are slow. What I would recommend instead is some sort of proxy server, so your apps can use a single endpoint, and the proxy can switch which database on the back-end to use dynamically. This is basically what MySQL Router is for, and I've also done a proof of concept with Envoy Proxy (sorry I don't have access to that code anymore), and I suppose you could do the same thing with ProxySQL.
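To make the single-endpoint idea concrete, here is a toy Python stand-in for what MySQL Router/Envoy/ProxySQL do properly: apps connect to one local port, and flipping ACTIVE redirects new connections to the other backend. The hostnames are placeholders, and this skips everything a real proxy handles (health checks, draining, TLS):

```python
import socket
import threading

# Placeholder backends; a real proxy would flip the active backend via
# health checks or an operator action, not a module-level variable.
BACKENDS = [("primary.us-east-1.example.internal", 3306),
            ("backup.us-east-2.example.internal", 3306)]
ACTIVE = 0  # index into BACKENDS; flip to 1 to cut over

def pump(src, dst):
    # Copy bytes one way until either side closes.
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        src.close()
        dst.close()

def serve(port=3306):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("0.0.0.0", port))
    listener.listen(128)
    while True:
        client, _ = listener.accept()
        backend = socket.create_connection(BACKENDS[ACTIVE])
        threading.Thread(target=pump, args=(client, backend), daemon=True).start()
        threading.Thread(target=pump, args=(backend, client), daemon=True).start()

if __name__ == "__main__":
    serve()
```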
My opinion is that AWS still has potential for improvement with respect to failover for RDS and Aurora. It works, but it can cause long downtimes on the order of several minutes. So it's hardly an improvement over manual failover. That is, some oncall engineer gets paged, checks out some dashboards to confirm that it's a legitimate outage, and then executes the runbook to do a manual failover.
I have two MySQL RDS instances (hosted on AWS). One is my "production" RDS, and the other is my "performance" RDS. They have the same schema and tables.
Once a year, we take a snapshot of the production RDS, and load it into the performance RDS, so that our performance environment will have similar data to production. This process takes a while - there's data specific to the performance environment that must be re-added each time we do this mirror.
I'm trying to find a way to automate this process, and to achieve the following:
Do a one time mirror in which all data is copied over from our production database to our performance database.
Continuously (preferably weekly) mirror all new data (but not old data) between our production and performance MySQL RDS's.
During the continuous mirroring, I'd like for the production data not to overwrite anything already in the performance database; I'd only want new data to be inserted into the performance database.
During the continuous mirroring, I'd like to change some of the data as it goes onto the performance RDS (for instance, I'd like to obfuscate user emails).
The following are the tools I've been researching to assist me with this process:
AWS Database Migration Service seems to be capable of handling a task like this, but the documentation recommends using different tools for homogeneous data migration.
Amazon Kinesis Data Streams also seems able to handle my use case: I could write a "fetcher" program that gets all new data from the prod MySQL binlog and sends it to Kinesis Data Streams, then write a Lambda that transforms the data (deciding what to send/add/obfuscate) and sends it to my destination (the performance RDS, or, if I can't write to it directly, a consumer HTTP endpoint I write that updates the performance RDS). A rough sketch of the fetcher follows below.
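The sketch uses the third-party python-mysql-replication library plus boto3; the stream name, connection settings, and the email-obfuscation rule are all hypothetical:

```python
import hashlib
import json

import boto3
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import WriteRowsEvent

kinesis = boto3.client("kinesis")
STREAM_NAME = "prod-to-perf-mirror"  # hypothetical stream name

def obfuscate_email(email):
    # Replace the address with a stable hash so rows stay joinable.
    return hashlib.sha256(email.encode()).hexdigest()[:12] + "@example.com"

stream = BinLogStreamReader(
    connection_settings={"host": "prod-endpoint", "port": 3306,
                         "user": "repl", "passwd": "secret"},
    server_id=4242,                # must be unique among replication clients
    only_events=[WriteRowsEvent],  # new rows only; ignore updates/deletes
    blocking=True,
    resume_stream=True,
)

for event in stream:
    for row in event.rows:
        values = dict(row["values"])
        if "email" in values:
            values["email"] = obfuscate_email(values["email"])
        kinesis.put_record(
            StreamName=STREAM_NAME,
            Data=json.dumps({"table": event.table, "values": values},
                            default=str),
            PartitionKey=str(values.get("id", event.table)),
        )
```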
I'm not sure which of these tools to use. DMS seems to be built for migrating heterogeneous rather than homogeneous data, so I'm not sure I should use it. Similarly, it seems like I could build something on Kinesis Data Streams, but needing a custom program that fetches data from MySQL's binlog and another that consumes from Kinesis makes me feel like Kinesis isn't the best tool for this either.
Which of these tools is best capable of handling my use case? Or is there another tool that I should be using for this instead?
I have a set of Amazon RDS instances spread across multiple regions. Each RDS instance has a similar set of tables prefixed in a predictable way.
For example, I have databases for 50 universities in EU-West and 50 universities in US-East, one DB per university (100 databases across two regions). I want to get the total count of students across all databases. We have the connection configuration for the individual databases listed in a DynamoDB table.
Currently, if we want to run a MySQL query across our entire dataset, we would do it by the following steps:
Get all the connection configurations from DynamoDB
Check against a blacklist to filter out schemas by wildcard
Loop through each connection configuration in an application script (in this case: PHP)
Submit the MySQL query to each database individually, writing a CSV of the results.
This workflow is slow, as it is single-threaded, and it is difficult to make it fail gracefully. It has the advantage of being somewhat flexible in terms of whitelisting, but it requires a code change to allow any sort of non-rigid query.
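For reference, the per-database loop parallelizes naturally with a thread pool. A rough Python sketch of the same workflow (the connection-config fields mirror what we might store in DynamoDB and are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

import pymysql

def run_query(cfg, sql):
    # cfg is one connection-configuration item from DynamoDB.
    conn = pymysql.connect(host=cfg["host"], user=cfg["user"],
                           password=cfg["password"], database=cfg["schema"])
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cfg["schema"], cur.fetchall()
    finally:
        conn.close()

def query_all(configs, sql, max_workers=20):
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_query, cfg, sql): cfg for cfg in configs}
        for fut in as_completed(futures):
            cfg = futures[fut]
            try:
                schema, rows = fut.result()
                results[schema] = rows
            except Exception as exc:
                # Fail gracefully: record the failure and keep going.
                errors[cfg["schema"]] = str(exc)
    return results, errors

# e.g. totals, errs = query_all(configs, "SELECT COUNT(*) FROM students")
```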
I have looked into the following and have run into some problems:
Amazon Athena, Aurora, Glue, and Redshift all do something similar, but seem not to have strong cross-regional capabilities.
Federated tables in MySQL can work cross-region, but require more maintenance and configuration each time we add a new database.
For the size of our dataset, replicating everything into an S3 bucket or a data lake would be cost-prohibitive on the grounds of data-transfer cost.
Is there a workflow or technology that will allow me to overcome the shortcomings of the single-threaded, application-driven method?
I have three different application environments: production, demo, and dev. In each, I have an RDS instance running MySQL. I have five tables that house data that needs to be the same across all environments. I am trying to find a way to handle this.
For security purposes, it's not best to allow demo and dev to access the production database, so putting the data there seems to be a bad idea.
All environments need read/write capabilities. Is there a good solution to this?
Many thanks.
For security purposes, it's not best to allow demo and dev to access the production database, so putting the data there seems to be a bad idea.
Agreed. Do not have your demo/dev environments access data from your production environments.
I don't know your business logic, but I cannot think of a case where dev/demo data needs to be "in sync" with production data, unless the dev/demo environment is also dependent on other "production assets". If that were the case, I would suggest duplicating that data into your other environments.
Usually, the data in your database would be dependent on the environment it's contained within.
For best security and separation of concerns, keep your environments segregated as much as possible. This includes (but is not limited to):
database data,
customer data,
images and other files
If data needs to be synchronized, create a script/program to perform that synchronization completely (db + all necessary assets). But do that as part of your normal development pipeline so it goes through dev+testing+qa etc.
The thing about RDS and database-level access is that you still manage the user credentials as you would on premises. From an AWS perspective, all you need to do to allow access is update the security groups of your MySQL RDS instances to allow the traffic, then give your application the credentials you have provisioned for it. I do agree it is bad practice to give production-level access to your dev or demo environments.
As far as keeping the data the same, you can automate a nightly snapshot of the production database and recreate new instances based on it. If your infrastructure is in CloudFormation or Terraform, you can provide the endpoint of the instance created from the snapshot and spin up a new DEV or DEMO environment.
Amazon RDS creates a storage volume snapshot of your DB instance, backing up the entire DB instance and not just individual databases. You can create a DB instance by restoring from this DB snapshot. When you restore the DB instance, you provide the name of the DB snapshot to restore from, and then provide a name for the new DB instance that is created from the restore. You cannot restore from a DB snapshot to an existing DB instance; a new DB instance is created when you restore.
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_CreateSnapshot.html
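A rough boto3 sketch of that nightly snapshot-and-restore automation; the instance identifiers are placeholders, and scheduling (e.g. via EventBridge) is left out:

```python
import datetime

import boto3

rds = boto3.client("rds")

def refresh_env(source_instance="prod-mysql", target_instance=None):
    stamp = datetime.date.today().isoformat()
    snapshot_id = f"{source_instance}-{stamp}"
    target_instance = target_instance or f"dev-mysql-{stamp}"

    # 1. Snapshot production and wait for the snapshot to become available.
    rds.create_db_snapshot(DBSnapshotIdentifier=snapshot_id,
                           DBInstanceIdentifier=source_instance)
    rds.get_waiter("db_snapshot_available").wait(
        DBSnapshotIdentifier=snapshot_id)

    # 2. Restore into a brand-new instance (you cannot restore onto an
    #    existing one, per the docs quoted above).
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=target_instance,
        DBSnapshotIdentifier=snapshot_id)
    rds.get_waiter("db_instance_available").wait(
        DBInstanceIdentifier=target_instance)

    # 3. Hand the new endpoint to CloudFormation/Terraform.
    desc = rds.describe_db_instances(DBInstanceIdentifier=target_instance)
    return desc["DBInstances"][0]["Endpoint"]["Address"]
```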
I would recommend using a fan-out system at the point of data capture, along with a snapshot.
Take a point-in-time snapshot (i.e., now), spin up test/dev databases from it, and then use an SQS -> SNS -> SQS fan-out architecture to push any new data changes to your other databases.
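As a sketch of the publishing leg of that fan-out, assuming an SNS topic that already has one SQS queue per environment subscribed to it (the topic ARN and event shape are hypothetical):

```python
import json

import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:data-changes"  # hypothetical

def publish_change(table, row):
    # The application publishes each data change once; SNS delivers a copy
    # to every subscribed SQS queue (prod, test, dev), and each environment
    # drains its own queue to apply the change to its own database.
    sns.publish(
        TopicArn=TOPIC_ARN,
        Message=json.dumps({"table": table, "row": row}, default=str),
    )
```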
I am using Amazon RDS for my database services and want to use the read replica feature to distribute traffic among my read replicas. I currently store the connection information for my database in a single config file. My idea is to create a function that randomly picks from a list of my read-replica endpoints/addresses in my config file any time my application performs a read.
Is there a problem with this idea as long as I don't perform it on a write?
My guess is that if you have a service with enough traffic that you have multiple RDS read replicas to balance load across, then you also have multiple application servers in front of it operating behind a load balancer.
As such, you are probably better off having certain clusters of app server instances each pointing at a specific read replica. Perhaps you do this by availability zone.
The thought here is that your load balancer will then serve as the mechanism for properly distributing the incoming requests that ultimately lead to database reads. If the DB reads were randomized across different replicas, you could have unexpected spikes where too much traffic happens to be directed at one replica, causing latency spikes in your service.
The biggest challenge is that there is no guarantee that the read replicas will be up to date with the master, or with each other, when updates are made. If you pick a different read replica each time you do a read, you could see some strangeness if one of the replicas is behind: one out of N reads would get stale data, giving an inconsistent view of the system.
Choosing a random read replica per transaction or session might be easier to deal with from the consistency perspective.
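A sketch of that per-session approach: pick one replica when the session starts and reuse it for every read in that session. The endpoint list and pymysql usage are placeholders for the config file the question mentions:

```python
import random

import pymysql

# Would come from the config file described in the question.
READ_REPLICAS = [
    "replica-1.xyz.us-east-1.rds.amazonaws.com",
    "replica-2.xyz.us-east-1.rds.amazonaws.com",
]

class Session:
    """Pins all reads for one user session to a single replica, so the
    user sees a consistent (if slightly stale) view instead of bouncing
    between replicas with different replication lag."""

    def __init__(self):
        self._replica = random.choice(READ_REPLICAS)
        self._conn = pymysql.connect(host=self._replica, user="app",
                                     password="secret", database="appdb")

    def read(self, sql, params=None):
        with self._conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()
```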