We have a large AWS RDS (MySQL) instance and we need to replicate data from it to another EC2 instance daily, at a certain time, for reporting and analysis purposes.
Currently we are using mysqldump to create a dump file and then copy the whole schema, which takes a lot of time. Is there a faster way of doing this? It would be much better if it copied only the new records.
How can we copy data without copying the whole schema every time?
You should look at the Database Migration Service. Don't be confused by the name: it can do continuous or one-time replication. From the FAQ:
Q. In addition to one-time data migration, can I use AWS Database Migration Service for continuous data replication?
Yes, you can use AWS Database Migration Service for both one-time data migration into RDS and EC2-based databases as well as for continuous data replication. AWS Database Migration Service will capture changes on the source database and apply them in a transactionally-consistent way to the target. Continuous replication can be done from your data center to the databases in AWS or in the reverse, replicating to a database in your datacenter from a database in AWS. Ongoing continuous replication can also be done between homogeneous or heterogeneous databases. For ongoing replication it would be preferable to use Multi-AZ for high-availability.
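For example, once you have created a replication instance and source/target endpoints in DMS, an ongoing (full load + CDC) task can be created from the CLI. A minimal sketch, where the ARNs, task name, and table-mappings file are placeholders you would replace with your own:

# Create an ongoing replication task: full load first, then continuous change capture (CDC)
# from the RDS source to the EC2-hosted target. All ARNs and file names are placeholders.
aws dms create-replication-task \
    --replication-task-identifier rds-to-ec2-reporting \
    --source-endpoint-arn arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE \
    --target-endpoint-arn arn:aws:dms:us-east-1:123456789012:endpoint:TARGET \
    --replication-instance-arn arn:aws:dms:us-east-1:123456789012:rep:INSTANCE \
    --migration-type full-load-and-cdc \
    --table-mappings file://table-mappings.json

# Start the task once it has been created.
aws dms start-replication-task \
    --replication-task-arn arn:aws:dms:us-east-1:123456789012:task:TASK \
    --start-replication-task-type start-replication

With continuous replication in place, the nightly mysqldump/restore cycle is no longer needed; only changes are shipped to the target.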
You can use AWS Glue to run the database migration as a periodic ETL job.
You can also consider using AWS Database Migration Service (DMS).
However, AWS Glue is preferred over DMS for ETL jobs that run within AWS, provided you are familiar with Python or Scala for writing the transformation logic.
Q: When should I use AWS Glue vs AWS Database Migration Service?
AWS Database Migration Service (DMS) helps you migrate databases to AWS easily and securely. For use cases which require a database migration from on-premises to AWS or database replication between on-premises sources and sources on AWS, we recommend you use AWS DMS. Once your data is in AWS, you can use AWS Glue to move and transform data from your data source into another database or data warehouse, such as Amazon Redshift.
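If you go the Glue route, the periodic run can be handled by a scheduled trigger attached to your ETL job. A minimal sketch, assuming a Glue job named daily-rds-export already exists (the job name and schedule are placeholders):

# Create a scheduled trigger that starts the (assumed) Glue job every night at 00:30 UTC.
aws glue create-trigger \
    --name daily-rds-export-trigger \
    --type SCHEDULED \
    --schedule "cron(30 0 * * ? *)" \
    --actions JobName=daily-rds-export \
    --start-on-creation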
I have a question regarding migrating a large amount of data from my local machine to AWS RDS (Aurora DB). Basically, I have a local MySQL database with a couple of tables holding around 4 GB of data. I need to replicate this data in AWS RDS. The approach I was thinking of was to make INSERT calls to RDS, but with this huge amount of data (32 million rows) the process would be costly. I did see some resources on exporting data locally and importing it into RDS, but could not quite understand how it works. Does someone have a good idea about this and can advise me on the best process? PS: the data only exists on my local machine and not on any servers.
Dump a CSV extract into S3, then load it into Aurora with its LOAD DATA FROM S3 feature; see: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.LoadFromS3.html
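A minimal sketch of that approach, assuming the Aurora cluster has already been granted an IAM role that allows it to read from the bucket (as described in the linked page); the endpoint, credentials, bucket, and table names are placeholders:

# Issue Aurora MySQL's LOAD DATA FROM S3 statement through the mysql client.
mysql -h mycluster.cluster-xxxx.us-east-1.rds.amazonaws.com -u admin -p"$DB_PASSWORD" mydb -e "
  LOAD DATA FROM S3 's3://my-bucket/export/users.csv'
  INTO TABLE users
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
  IGNORE 1 LINES;"

Loading from S3 this way avoids pushing 32 million individual INSERT statements over your local connection.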
Is there any best practice to keep the staging database up to date with the production database?
For example, every day at midnight the production database overwrites the staging database.
If your goal is to make the Staging database an exact copy of the Production database, then you could:
Take a Snapshot of the Production database
Delete the Staging database
Restore a new Staging database from the Snapshot of the Production database
These steps can be automated via a script that calls the AWS Command-Line Interface (CLI). For example, it could use:
aws rds create-db-snapshot
aws rds delete-db-instance
aws rds restore-db-instance-from-db-snapshot
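Putting those three calls together, a minimal sketch of a nightly refresh script might look like the following (the instance identifiers are placeholders, and the restore call may also need your subnet group, security groups, and instance class depending on your setup):

#!/bin/bash
# Refresh the staging instance from a fresh snapshot of production.
# "prod-db" and "staging-db" are placeholder instance identifiers.
SNAPSHOT_ID="prod-refresh-$(date +%Y%m%d)"

# 1. Snapshot production and wait for the snapshot to become available.
aws rds create-db-snapshot --db-instance-identifier prod-db --db-snapshot-identifier "$SNAPSHOT_ID"
aws rds wait db-snapshot-available --db-snapshot-identifier "$SNAPSHOT_ID"

# 2. Delete the old staging instance and wait for the deletion to finish.
aws rds delete-db-instance --db-instance-identifier staging-db --skip-final-snapshot
aws rds wait db-instance-deleted --db-instance-identifier staging-db

# 3. Restore a new staging instance from the production snapshot.
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier staging-db \
    --db-snapshot-identifier "$SNAPSHOT_ID"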
You can achieve the task as described by @John, but there are a few things that approach does not give you out of the box, such as:
the status of the clone
a notification once it is complete
The official blog post below walks through everything you need for each refresh.
Blog: Orchestrating database refreshes for Amazon RDS and Amazon Aurora | AWS Database Blog
CloudFormation Git Repo: aws-samples/database-refresh-orchestrator-for-amazon-rds-and-amazon-aurora
[Optional] Migrate from RDS to Aurora
Migrating an RDS for MySQL snapshot to Aurora - Amazon Aurora
I have a situation where data is loaded into staging tables in Aurora MySQL from AWS S3 using a Lambda function on a daily basis. Now I need to schedule a job in the Aurora DB that will trigger a SQL procedure (containing some transformation logic) to load the data from the staging tables into the target tables.
Can this be achieved within the AWS Aurora database? Does it need help from any other AWS services?
AFAIK you can't schedule db jobs from within the DB. You would need to use other AWS services.
Some options for scheduled tasks:
Trigger a cron job via CloudWatch Events. See this page for information on integrating with AWS Batch.
If using ECS, then scheduled tasks are an option
You've also got the option of running cron directly on an EC2 instance, but this seems flaky, and you'd be better off using AWS IMO.
You have a few options for the target of a cron job. It could trigger a Lambda function to modify the DB (probably the easiest approach), but this is likely to involve low-level / raw SQL operations.
Alternatively, you could launch a Docker container on ECS. This means you can use frameworks or other libraries that give you abstractions for the DB update.
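As a concrete example of the CloudWatch Events + Lambda route, here is a minimal sketch of wiring a nightly schedule to an assumed Lambda function (run-staging-etl) that connects to Aurora and calls the stored procedure; the function name, account ID, and region are placeholders:

# Nightly rule at 01:00 UTC (CloudWatch Events / EventBridge).
aws events put-rule \
    --name nightly-staging-to-target \
    --schedule-expression "cron(0 1 * * ? *)"

# Allow the rule to invoke the (assumed) Lambda function that runs the stored procedure.
aws lambda add-permission \
    --function-name run-staging-etl \
    --statement-id nightly-staging-to-target \
    --action lambda:InvokeFunction \
    --principal events.amazonaws.com \
    --source-arn arn:aws:events:us-east-1:123456789012:rule/nightly-staging-to-target

# Point the rule at the function.
aws events put-targets \
    --rule nightly-staging-to-target \
    --targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:123456789012:function:run-staging-etl"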
I would like to sync a local MySQL database to an Amazon RDS MySQL database. I found a solution for EC2 to RDS, but not for a local database to RDS.
I built a database with 12 tables, all of which I want to back up to the cloud periodically or automatically.
I do not want to run an EC2 server, since I only need the MySQL database to be backed up to the cloud.
I need a solution like the Microsoft Database Sync Agent: whenever changes are detected in the local database, they should be synced to the cloud database. How can I make this happen?
You could use the AWS Database Migration Service:
AWS Database Migration Service (AWS DMS) is a cloud service that makes it easy to migrate relational databases, data warehouses, NoSQL databases, and other types of data stores. You can use AWS DMS to migrate your data into the AWS Cloud, between on-premises instances (through an AWS Cloud setup), or between combinations of cloud and on-premises setups.
With AWS DMS, you can perform one-time migrations, and you can replicate ongoing changes to keep sources and targets in sync.
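A minimal sketch of the DMS pieces involved: a source endpoint pointing at your on-premises MySQL server (which must be reachable from AWS, e.g. over a public IP or VPN) and a target endpoint pointing at the RDS instance. Hostnames, credentials, and identifiers below are placeholders:

# Source endpoint: the local / on-premises MySQL server.
aws dms create-endpoint \
    --endpoint-identifier local-mysql-source \
    --endpoint-type source \
    --engine-name mysql \
    --server-name my-office-ip-or-hostname \
    --port 3306 \
    --username repl_user \
    --password "$LOCAL_DB_PASSWORD"

# Target endpoint: the RDS MySQL instance.
aws dms create-endpoint \
    --endpoint-identifier rds-mysql-target \
    --endpoint-type target \
    --engine-name mysql \
    --server-name mydb.xxxx.us-east-1.rds.amazonaws.com \
    --port 3306 \
    --username admin \
    --password "$RDS_DB_PASSWORD"

A replication instance and an ongoing (CDC) replication task then tie the two endpoints together, giving you the continuous "changes are synced as they happen" behaviour you are after.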
You can achieve this by following the steps below.
Make a replica of the local server on RDS.
Enable query logging in the local database.
Create a cron job that processes the logged queries and executes them on the RDS instance in the same order.
To create the replica on RDS, you can follow the steps below.
You can't replicate your local database to RDS directly. You need to dump your data and then import it into RDS.
Instead of generating a dump file, you can import the data into RDS directly using the command below.
mysqldump db_name | mysql -h 'other_hostname' db_name
You can find out more about this here:
https://dev.mysql.com/doc/refman/5.7/en/copying-databases.html
Also, import the tables and their data first, and only then import your triggers, routines, and events. If you import everything together, there is a chance of conflicts and your job may be terminated.
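A minimal sketch of that two-pass approach, piping directly into the RDS instance as above (the host, database name, and credentials are placeholders, and exact mysqldump flags may vary with your version):

# Pass 1: tables and data only (triggers excluded; routines and events are off by default).
mysqldump --skip-triggers db_name \
  | mysql -h mydb.xxxx.us-east-1.rds.amazonaws.com -u admin -p"$RDS_PASSWORD" db_name

# Pass 2: triggers, routines, and events only, without table definitions or data.
mysqldump --no-create-info --no-data --triggers --routines --events db_name \
  | mysql -h mydb.xxxx.us-east-1.rds.amazonaws.com -u admin -p"$RDS_PASSWORD" db_name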
I have defined a schema and populated it with data in MySQL on my laptop. However, due to the large scale of the datasets, the computing time suffers in the subsequent analysis stage in Python. So I have decided to move all my work to the cloud. I'm wondering if there is any way to let the server in AWS connect directly to the MySQL server on my laptop, so that I can use the existing datasets without re-collecting them.
You may be interested in AWS Database Migration Service.
You can use DMS, the Database Migration Service from AWS, which will do everything for you.
https://aws.amazon.com/dms/
But you have to take care with foreign keys and other constraints; sometimes they generate errors.
DMS will transfer all of your data from your laptop to an AWS database, which can be on RDS or another target of your choice.