I have three different application environments: production, demo, and dev. In each, I have an RDS instance running MySQL. I have five tables that house data that needs to be the same across all environments. I am trying to find a way to handle this.
For security purposes, demo and dev shouldn't be allowed to access the production database, so keeping the shared data there seems like a bad idea.
All environments need read/write capabilities. Is there a good solution to this?
Many thanks.
Agreed: for security reasons, demo and dev should not be allowed to access the production database. Do not have your demo/dev environments access data from your production environment.
I don't know your business logic, but I cannot think of a case where dev/demo data needs to be "in sync" with production data, unless the dev/demo environment is also dependent on other "production assets". If that were the case, I would suggest duplicating that data into your other environments.
Usually, the data in your database would be dependent on the environment it's contained within.
For the best security and separation of concerns, keep your environments segregated as much as possible. This includes (but is not limited to):
database data,
customer data,
images and other files
If data needs to be synchronized, create a script or program that performs the synchronization completely (database plus all necessary assets). Do that as part of your normal development pipeline, so it goes through dev, testing, QA, etc.
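A minimal sketch of such a synchronization script for the shared tables, assuming each table has a primary key. Python's built-in sqlite3 stands in for the two MySQL connections so the example runs anywhere; the function name and any table names are hypothetical. Against MySQL you would use a driver such as PyMySQL and `INSERT ... ON DUPLICATE KEY UPDATE` in place of SQLite's `ON CONFLICT ... DO UPDATE`.

```python
import sqlite3


def sync_table(src: sqlite3.Connection, dst: sqlite3.Connection,
               table: str, key: str, columns: list[str]) -> int:
    """Copy every row of `table` from src into dst.

    Rows whose primary key `key` already exists in dst are updated; the rest
    are inserted. Table/column names are interpolated into the SQL, so they
    must come from trusted configuration, never from user input.
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    updates = ", ".join(f"{c} = excluded.{c}" for c in columns if c != key)
    conflict = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
    rows = src.execute(f"SELECT {col_list} FROM {table}").fetchall()
    dst.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders}) "
        f"ON CONFLICT({key}) {conflict}",
        rows,
    )
    dst.commit()
    return len(rows)
```

Calling it once per shared table from the deployment pipeline keeps the sync a reviewed, repeatable step rather than an ad hoc copy.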
The thing about RDS and database-level access is that you still manage the user credentials as you would on premises. From an AWS perspective, all you need to do to allow access is update the security groups of your MySQL RDS instances to allow the traffic, then give your application the credentials you have provisioned for it. I do agree that it is bad practice to give your dev or demo environments production-level access.
As for keeping the data the same, you can automate a nightly snapshot of the production database and create new instances from it. If your infrastructure is defined in CloudFormation or Terraform, you can pass in the endpoint of the instance restored from the snapshot and spin up a new DEV or DEMO environment.
Amazon RDS creates a storage volume snapshot of your DB instance, backing up the entire DB instance and not just individual databases. You can create a DB instance by restoring from this DB snapshot. When you restore the DB instance, you provide the name of the DB snapshot to restore from, and then provide a name for the new DB instance that is created from the restore. You cannot restore from a DB snapshot to an existing DB instance; a new DB instance is created when you restore.
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_CreateSnapshot.html
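A minimal sketch of the nightly snapshot-and-restore step with boto3 (the AWS SDK for Python), assuming boto3 is installed and credentials are configured. `create_db_snapshot`, `restore_db_instance_from_db_snapshot` and the `db_snapshot_available`/`db_instance_available` waiters are boto3 RDS client features; the instance names and the `-nightly-` naming scheme are hypothetical. Network settings such as `DBSubnetGroupName` and `VpcSecurityGroupIds` are omitted here and would normally be passed to the restore call.

```python
from datetime import date


def snapshot_id(source_instance: str, day: date) -> str:
    """Build a dated snapshot name, e.g. 'prod-db-nightly-2024-01-31'."""
    return f"{source_instance}-nightly-{day.isoformat()}"


def snapshot_and_restore(source_instance: str, target_instance: str, day: date) -> str:
    """Snapshot the source instance and restore it as a brand-new instance.

    RDS cannot restore into an existing instance, so `target_instance` must
    not exist yet. Returns the new endpoint address, which can be handed to
    CloudFormation/Terraform for the DEV or DEMO stack.
    """
    import boto3  # lazy import so snapshot_id() works without the SDK

    rds = boto3.client("rds")
    snap = snapshot_id(source_instance, day)
    rds.create_db_snapshot(DBInstanceIdentifier=source_instance, DBSnapshotIdentifier=snap)
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=snap)

    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=target_instance, DBSnapshotIdentifier=snap
    )
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=target_instance)
    desc = rds.describe_db_instances(DBInstanceIdentifier=target_instance)
    return desc["DBInstances"][0]["Endpoint"]["Address"]
```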
I would recommend using a fan-out system at the point of data capture, along with a snapshot.
Take a point-in-time snapshot (i.e. now), spin up the test/dev databases from it, and then use an SQS->SNS->SQS fan-out architecture to push any new changes to the data to your other databases.
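To illustrate the fan-out idea, here is an in-process model in which a topic (standing in for SNS) copies each change event onto one queue per environment (standing in for SQS), and each environment's consumer applies the change to its own store. It models the pattern only; the class and field names are hypothetical, and a real deployment would use an SNS topic with SQS queue subscriptions.

```python
import queue
from dataclasses import dataclass


@dataclass
class Change:
    table: str
    key: str
    values: dict


class Topic:
    """Stand-in for an SNS topic: each published change is copied to every
    subscribed queue (the SQS side of the fan-out)."""

    def __init__(self) -> None:
        self.subscribers: dict[str, queue.Queue] = {}

    def subscribe(self, env: str) -> queue.Queue:
        q: queue.Queue = queue.Queue()
        self.subscribers[env] = q
        return q

    def publish(self, change: Change) -> None:
        for q in self.subscribers.values():
            q.put(change)


def drain(q: queue.Queue, store: dict) -> int:
    """Consumer for one environment: apply every queued change to its own store."""
    applied = 0
    while True:
        try:
            change = q.get_nowait()
        except queue.Empty:
            return applied
        store.setdefault(change.table, {})[change.key] = change.values
        applied += 1
```

Because each environment has its own queue, dev and demo consume independently and never need credentials for each other's database.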
I have two MySQL RDS instances (hosted on AWS). One of them is my "production" RDS, and the other is my "performance" RDS. Both have the same schema and tables.
Once a year, we take a snapshot of the production RDS, and load it into the performance RDS, so that our performance environment will have similar data to production. This process takes a while - there's data specific to the performance environment that must be re-added each time we do this mirror.
I'm trying to find a way to automate this process, and to achieve the following:
Do a one time mirror in which all data is copied over from our production database to our performance database.
Continuously (preferably weekly) mirror all new data (but not old data) from our production MySQL RDS to our performance MySQL RDS.
During the continuous mirroring, I'd like the production data not to overwrite anything already in the performance database. I'd only want new data to be inserted into the performance database.
During the continuous mirroring, I'd like to change some of the data as it goes onto the performance RDS (for instance, I'd like to obfuscate user emails).
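The incremental, non-overwriting mirror described in these requirements can be sketched as follows, assuming each table has an increasing integer primary key and that the caller stores the last copied id (the "high-water mark") between runs. SQLite via Python's sqlite3 stands in for both MySQL instances; the `users` table, its columns and the SHA-256-based email masking are hypothetical choices. In MySQL, the equivalent of `INSERT OR IGNORE` is `INSERT IGNORE`.

```python
import hashlib
import sqlite3


def obfuscate_email(email: str) -> str:
    """Replace an address with a stable, non-reversible placeholder."""
    digest = hashlib.sha256(email.lower().encode()).hexdigest()[:12]
    return f"user-{digest}@example.invalid"


def mirror_new_users(prod: sqlite3.Connection, perf: sqlite3.Connection,
                     since_id: int) -> int:
    """Copy users with id > since_id from prod to perf; return the new high-water mark.

    INSERT OR IGNORE skips any id already present in perf, so
    performance-specific rows are never overwritten.
    """
    rows = prod.execute(
        "SELECT id, email FROM users WHERE id > ? ORDER BY id", (since_id,)
    ).fetchall()
    perf.executemany(
        "INSERT OR IGNORE INTO users (id, email) VALUES (?, ?)",
        [(row_id, obfuscate_email(email)) for row_id, email in rows],
    )
    perf.commit()
    return rows[-1][0] if rows else since_id
```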
The following are the tools I've been researching to assist me with this process:
AWS Database Migration Service seems to be capable of handling a task like this, but the documentation recommends using different tools for homogeneous data migration.
Amazon Kinesis Data Streams also seems able to handle my use case - I could write a "fetcher" program that gets all new data from the prod MySQL binlog, sends it to Kinesis Data Streams, then write a Lambda that transforms the data (and decides on what data to send/add/obfuscate) and sends it to my destination (being the performance RDS, or if I can't directly do that, then a consumer HTTP endpoint I write that updates the performance RDS).
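For the Kinesis route, the Lambda side could look roughly like this. Lambda delivers Kinesis records with the payload base64-encoded under `Records[i].kinesis.data`; the JSON shape of each change event (`table` plus a `row` dict) is a hypothetical format the binlog fetcher would have to produce, and the actual write to the performance RDS is left as a stub.

```python
import base64
import hashlib
import json


def mask(email: str) -> str:
    """Stable, non-reversible stand-in for a real address."""
    return "user-" + hashlib.sha256(email.lower().encode()).hexdigest()[:12] + "@example.invalid"


def transform(change: dict) -> dict:
    """Obfuscate personal data in one change event before it reaches the perf DB."""
    row = dict(change["row"])
    if "email" in row:
        row["email"] = mask(row["email"])
    return {**change, "row": row}


def handler(event, context):
    """Lambda entry point for a Kinesis event source mapping."""
    out = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        out.append(transform(json.loads(payload)))
    # Hypothetical: write `out` to the performance database here.
    return {"transformed": len(out), "changes": out}
```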
I'm not sure which of these tools to use - DMS seems to be built for migrating heterogeneous data and not homogeneous data, so I'm not sure if I should use it. Similarly, it seems like I could create something that works with Kinesis Data Streams, but the fact that I'll have to make a custom program that fetches data from MySQL's binlog and another program that consumes from Kinesis makes me feel like Kinesis isn't the best tool for this either.
Which of these tools is best capable of handling my use case? Or is there another tool that I should be using for this instead?
I am using a t2.large RDS instance, and I want to downgrade to a t2.micro to fit my current business needs. I have a few questions:
- How can I downgrade the RDS instance without losing data and without downtime?
Thanks,
You can't really do it without downtime, but you could minimize the downtime.
The easiest option is to Modify the DB instance. This will result in downtime because a new database will be provisioned, the data will be relocated and the DNS name will be changed to point to the new instance.
Seeing that you believe a t2.micro will be sufficient for your database, it would be fair to assume that there would be times when your database is not in use so that you can perform the Modify operation. It should only take a few minutes.
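If you script the Modify operation rather than using the console, a boto3 sketch might look like this, assuming boto3 is installed and configured (the instance name is hypothetical). Leaving `ApplyImmediately` off makes RDS apply the class change during the next maintenance window, which you can schedule for a time when the database is idle.

```python
def downsize_request(instance_id: str, new_class: str = "db.t2.micro",
                     apply_now: bool = False) -> dict:
    """Keyword arguments for the RDS ModifyDBInstance call.

    With ApplyImmediately=False, RDS defers the class change (and its short
    outage) to the instance's next maintenance window.
    """
    return {
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": new_class,
        "ApplyImmediately": apply_now,
    }


def downsize(instance_id: str, new_class: str = "db.t2.micro",
             apply_now: bool = False) -> dict:
    """Submit the modification and return the API response."""
    import boto3  # lazy import so downsize_request() works without the SDK

    rds = boto3.client("rds")
    return rds.modify_db_instance(**downsize_request(instance_id, new_class, apply_now))
```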
Officially, the best way to modify a database without downtime is to use Multi-AZ, which can update one node while traffic is still being served by another node. However, your goal seems to be to reduce cost, rather than spending more to ensure uptime.
By the way, a t2.micro is quite limited in terms of CPU and network bandwidth. You are trying to save 21c per day, at the potential cost of having a poorly-responding database.
You can consider creating a read replica (t2.micro) of the master instance (t2.large). Once the read replica is in sync with the master instance, you can promote the read replica and then point the application towards the new master instance (which is the promoted read replica).
For reference, see:
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_MySQL.Replication.ReadReplicas.html
https://aws.amazon.com/blogs/aws/amazon-rds-for-mysql-promote-read-replica/
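The read-replica route can be sketched in two steps with boto3, assuming it is installed and configured. `create_db_instance_read_replica` and `promote_read_replica` are boto3 RDS client calls; the naming helper and instance names are hypothetical. The pause between the steps is deliberate: stop writes to the source and let the replica catch up before promoting it.

```python
def replica_name(source: str, new_class: str) -> str:
    """Derive a name for the smaller replica, e.g. 'shop-db' -> 'shop-db-t2-micro'."""
    return f"{source}-{new_class.removeprefix('db.').replace('.', '-')}"


def create_smaller_replica(source: str, new_class: str = "db.t2.micro") -> str:
    """Step 1: create a read replica of `source` on the smaller instance class."""
    import boto3  # lazy import so replica_name() works without the SDK

    rds = boto3.client("rds")
    replica = replica_name(source, new_class)
    rds.create_db_instance_read_replica(
        DBInstanceIdentifier=replica,
        SourceDBInstanceIdentifier=source,
        DBInstanceClass=new_class,
    )
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=replica)
    return replica


def promote(replica: str) -> None:
    """Step 2: once writes to the source have stopped and the replica has caught
    up (e.g. its ReplicaLag CloudWatch metric is 0), make it a standalone instance.
    Then point the application at the replica's endpoint."""
    import boto3

    boto3.client("rds").promote_read_replica(DBInstanceIdentifier=replica)
```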
I have a Kubernetes environment running multiple applications (services). Now I'm a little confused about how to set up the MySQL database instance(s).
According to various sources, each microservice should have its own database. Should I create a single MySQL StatefulSet in HA mode running multiple databases, or should I deploy a separate MySQL instance for each application (service), each running one database?
My first thought would be the first option; otherwise, what would HA be useful for? I'd like to hear some different views on this.
Slightly subjective question, but here's what we have set up. Hopefully it will help you build a case. I'm sure someone would have a different opinion, and that might be equally valid too:
We deploy about 70 microservices, each with its own database ("schema") and its own JDBC URL (defined via a service). Each microservice has its own endpoint and credentials, which we do not share between microservices. So, in effect, we have kept the design completely independent across the microservices as far as the schema is concerned.
Deployment-wise, however, we have opted to go with a single database instance hosting all databases (or "schemas"). While we could technically deploy each database on its own database instance, we chose not to for a few main reasons:
Cost overhead: Running separate database instances for each microservice would add a lot of "fixed" costs. This may not be directly relevant to you if you are simply starting the database as a MySQL Docker container (we use a separate database service, such as RDS or Google Cloud SQL). But even in the case of MySQL as a Docker container, you might end up having a non-trivial cost if you run, for example, 70 separate containers one per microservice.
Administration overhead: Given that databases are usually quite involved (disk space, IOPS, backup/archiving, purging, upgrades and other administration activities), having separate database instances -- or Docker container instances -- may take a significant toll on your admin or operations teams, especially if you have a large number of microservices.
Security: Databases are also usually critical when it comes to security, as the "truth" usually lives in the DB. Keeping encryption, TLS configuration and credential strength aside (they should be of utmost importance regardless of your deployment model), security considerations, reviews, audits and logging will bring significant challenges if you have too many database instances.
Ease of development: Relatively less critical in the grand scheme of things, but significant, nonetheless. Unless you are thinking of coming up with a different model for development (and thus breaking the "dev-prod parity"), your developers may have a hard time figuring out the database endpoints for debugging even if they only need that information once-in-a-while.
So, my recommendation would be to go with a single database instance (Docker or otherwise), but keep the databases/schemas completely independent and inaccessible to any microservice but the "owner" microservice.
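The "one instance, one schema and one user per microservice" layout can be sketched as generated MySQL statements, assuming MySQL 5.7 or later (for `CREATE USER IF NOT EXISTS`). The service names are hypothetical, and passwords would come from your secret store rather than being hard-coded.

```python
import re


def provision_sql(service: str, password: str) -> list[str]:
    """MySQL statements giving `service` its own schema plus a user that can
    reach only that schema."""
    if not re.fullmatch(r"[a-z][a-z0-9_]{0,31}", service):
        raise ValueError(f"unsafe service name: {service!r}")
    if "'" in password or "\\" in password:
        # Keep the string literal simple; generate passwords without these characters.
        raise ValueError("password must not contain quotes or backslashes")
    return [
        f"CREATE DATABASE IF NOT EXISTS `{service}`;",
        f"CREATE USER IF NOT EXISTS '{service}'@'%' IDENTIFIED BY '{password}';",
        f"GRANT ALL PRIVILEGES ON `{service}`.* TO '{service}'@'%';",
    ]
```

Because each grant is scoped to `service.*`, one microservice's credentials cannot read another's schema even though they share an instance.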
If you are deploying MySQL as Docker container(s), go with a StatefulSet for persistence. Define an external PVC (PersistentVolumeClaim) so that you can always preserve the data, no matter what happens to your pods or even your cluster. Of course, if you run 'active-active', you will need to ensure clustering between your nodes. We run it in 'active-passive' mode, so we keep the replica count at 1; we only use the MySQL Docker container option in our test environments, to save the cost of an external DBaaS where it isn't required.
I am starting to plan a multi-region (us-east & us-west) web app on AWS that uses an RDS MySQL database. Can any AWS guru clarify my concern?
I will use Multi-AZ for redundancy/high availability, and read replicas across regions for faster processing of read requests.
My concern/question:
If the master DB instance is in us-west, and write requests from the instances/compute/app servers in us-east are routed to the DB endpoint in us-west, does this cause lag in the app, or is this how many AWS users do it?
The read replicas local to the app servers are not used for writes.
You can't defeat the speed of light.
Having a server write to the database that may be 80ms away may not result in acceptable performance. Only you can determine this.
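To see why 80 ms matters, count round trips: every statement the app waits on crosses the continent and back, so sequential writes add up. A back-of-the-envelope helper, assuming the 80 ms figure above and no batching or pipelining:

```python
def network_wait_ms(sequential_round_trips: int, rtt_ms: float = 80.0) -> float:
    """Time spent waiting on the network when each statement must finish
    before the next one is sent (no batching or pipelining)."""
    return sequential_round_trips * rtt_ms


# One transaction: BEGIN, three INSERTs, COMMIT = 5 round trips.
per_transaction = network_wait_ms(5)   # 400.0 ms
per_request = network_wait_ms(3 * 5)   # a request running 3 such transactions: 1200.0 ms
```

Batching statements or moving the write-heavy work into the primary's region are the usual ways to cut the round-trip count.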
You run into the same issue if you use MySQL replication across regions.
Now, if you just want to have read replicas across regions, with all writes directed to a single region, you can probably make that work.
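Splitting traffic that way can be sketched as a small router: plain reads go to the replica in the app server's region, everything else to the single primary. The endpoint names are hypothetical. Note that replica lag means a read issued right after a write may not see it yet; such reads should also go to the primary.

```python
from dataclasses import dataclass


@dataclass
class Endpoints:
    primary: str        # the single writer, e.g. in us-west
    local_replica: str  # read replica in the app server's own region


def endpoint_for(sql: str, eps: Endpoints) -> str:
    """Route plain reads to the local replica and everything else to the primary.

    Locking reads (SELECT ... FOR UPDATE) only make sense on the writer, so
    they go to the primary too.
    """
    words = sql.split()
    verb = words[0].upper() if words else ""
    is_plain_read = verb in ("SELECT", "SHOW") and "FOR UPDATE" not in sql.upper()
    return eps.local_replica if is_plain_read else eps.primary
```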
If you really need a fast, globally distributed database, consider using something like DynamoDB.
If you have a test server and a production server, how do you sync the database?
Do you add the data in the test server first, just like you code in the test server?
How is that generally handled?
That depends on how coupled you can allow the two databases to be.
One configuration might be to replicate the two databases, with the production server as the master and the test server as a slave. I would use caution with this method: unless you absolutely need live data synchronized with your test infrastructure, don't go down this path.
Instead, if you want to keep the two instances separated, just replicate with mysqldumps.
I version control SQL dumps (as made by mysqldump). I find it's usually a nice portable format.
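A minimal sketch of the mysqldump approach, assuming the `mysqldump` and `mysql` command-line clients are installed and credentials live in a MySQL option file such as ~/.my.cnf; host and database names are hypothetical. Redirecting the dump to a file instead of piping it gives you the artifact to commit to version control.

```python
import subprocess


def dump_cmd(host: str, user: str, db: str) -> list[str]:
    # --single-transaction: consistent InnoDB snapshot without locking tables
    # --routines/--triggers: include stored routines and triggers in the dump
    return ["mysqldump", f"--host={host}", f"--user={user}",
            "--single-transaction", "--routines", "--triggers", db]


def load_cmd(host: str, user: str, db: str) -> list[str]:
    return ["mysql", f"--host={host}", f"--user={user}", db]


def copy_database(src_host: str, dst_host: str, user: str, db: str) -> None:
    """Equivalent of `mysqldump ... | mysql ...` from production to test.

    Passwords are expected in an option file, so they never appear on the
    command line. The target database must already exist.
    """
    dump = subprocess.Popen(dump_cmd(src_host, user, db), stdout=subprocess.PIPE)
    load = subprocess.Popen(load_cmd(dst_host, user, db), stdin=dump.stdout)
    dump.stdout.close()  # lets mysqldump see a broken pipe if mysql exits early
    load_rc, dump_rc = load.wait(), dump.wait()
    if dump_rc != 0 or load_rc != 0:
        raise RuntimeError(f"mysqldump exited {dump_rc}, mysql exited {load_rc}")
```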
Some platforms have concepts like Django's fixtures, which dump model data to a JSON or XML file so you can version-control it to/from the server and load it wherever you need it. Whether you have something like this available depends on your platform.
You're talking about getting a live version in your production environment. If I were you, I'd settle for near-live. Cron up a dump script on the server and either write a little script to pull it into your production environment or just rely on version control.
Anything more than near-live will likely be two-way and risk live data.
I use Navicat to sync my development and production servers. Navicat allows structure syncing (i.e. new fields) and data syncing (data). The main limitation is that you can't sync tables that don't have a primary key, and it's not always fast if there are a lot of records to transfer.
We do all database development in scripts, including scripts to add records to lookup tables. These scripts are in source control and are versioned just like any other code. To send changes from dev to prod, we run the scripts for that version. This includes any changes needed to lookup tables as well as table structure changes, stored procedures, user-defined functions, views, etc. Since nothing can be deployed to prod without a script (we have a configuration management team that does the deploying, not the developers), we have no trouble at all with people not using scripts or source control.
To go back from prod to dev, we restore the latest prod backup and then rerun the dev scripts that have not yet been promoted to prod.
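A minimal sketch of that workflow, assuming scripts are numbered and the database records which numbers have already run in a `schema_version` table (a hypothetical convention). sqlite3 stands in for the real database. Restoring dev from a prod backup and then calling the runner again applies exactly the dev scripts that have not yet been promoted.

```python
import sqlite3


def apply_pending(conn: sqlite3.Connection, scripts: dict[int, str]) -> list[int]:
    """Run every numbered script newer than the recorded schema version, in order."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    (current,) = conn.execute("SELECT COALESCE(MAX(version), 0) FROM schema_version").fetchone()
    applied = []
    for version in sorted(v for v in scripts if v > current):
        # executescript() commits any pending transaction before running the script;
        # a production-grade runner would make script + version row atomic.
        conn.executescript(scripts[version])
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        conn.commit()
        applied.append(version)
    return applied
```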