Using AWS DMS for CDC from MySQL to S3

I would like to use AWS DMS to replicate changes from a transactional on-premises MySQL database to AWS S3. Afterwards, that data would be moved to a Snowflake data mart.
The problem I'm facing is how to start the replication task manually, say every 6 hours, and transfer to S3 only the changes captured during that period. I would then load the data into Snowflake and move (remove) the loaded files so the landing-area path is empty and ready for the next 6-hour batch.
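For context, this is roughly the kind of orchestration I have in mind; a rough sketch using boto3, where the task ARN, bucket, and prefixes are placeholders:

    import boto3

    # Placeholders -- replace with your own resources.
    TASK_ARN = "arn:aws:dms:eu-west-1:123456789012:task:EXAMPLE"
    BUCKET = "my-dms-landing-bucket"
    LANDING_PREFIX = "landing/"
    ARCHIVE_PREFIX = "loaded/"

    dms = boto3.client("dms")
    s3 = boto3.client("s3")

    def run_batch():
        # Resume the CDC task so it writes out the changes accumulated since the
        # last run; it would be stopped again (stop_replication_task) once the
        # batch has been flushed to S3.
        dms.start_replication_task(
            ReplicationTaskArn=TASK_ARN,
            StartReplicationTaskType="resume-processing",
        )
        # ... wait for the files, load them into Snowflake ...

    def empty_landing_area():
        # Move (copy then delete) everything under the landing prefix so the
        # next 6-hour batch starts from an empty path.
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=BUCKET, Prefix=LANDING_PREFIX):
            for obj in page.get("Contents", []):
                key = obj["Key"]
                s3.copy_object(
                    Bucket=BUCKET,
                    CopySource={"Bucket": BUCKET, "Key": key},
                    Key=ARCHIVE_PREFIX + key[len(LANDING_PREFIX):],
                )
                s3.delete_object(Bucket=BUCKET, Key=key)

The part I'm unsure about is whether resuming the task on a schedule like this is the right way to get exactly the changes for that window.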

Related

Migrating Very Large Data Set from Local Machine To AWS RDS (Aurora)

I had a question regarding migrating a large amount of data from my local machine to AWS RDS (Aurora DB). Basically, I have a local MySQL database with a couple of tables holding around 4 GB of data. I need to replicate this data in AWS RDS. The approach I was considering was to make INSERT calls to RDS, but with this huge amount of data (32 million rows) the process would be costly. I did see some resources on exporting data locally and importing it into RDS but could not quite understand how it works. Does someone have a good idea about this and can advise me on the best process? PS: the data exists only on my local machine and not on any server.
Dump a CSV extract into S3, then use an AWS migration tool, e.g. see: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.LoadFromS3.html
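A rough sketch of that flow in Python (boto3 + PyMySQL), assuming the Aurora cluster already has the IAM role for S3 access described in the linked docs; the bucket, table, and connection details are placeholders:

    import boto3
    import pymysql

    BUCKET = "my-import-bucket"      # placeholder
    KEY = "exports/mytable.csv"      # placeholder

    # 1. Upload the CSV extract to S3.
    boto3.client("s3").upload_file("mytable.csv", BUCKET, KEY)

    # 2. Load it into Aurora MySQL using the S3 integration.
    conn = pymysql.connect(host="my-aurora-cluster.cluster-xxxx.eu-west-1.rds.amazonaws.com",
                           user="admin", password="...", database="mydb")
    with conn.cursor() as cur:
        cur.execute(f"""
            LOAD DATA FROM S3 's3://{BUCKET}/{KEY}'
            INTO TABLE mytable
            FIELDS TERMINATED BY ','
            LINES TERMINATED BY '\\n'
        """)
    conn.commit()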

Archiving an AWS RDS MySQL Database

I am looking for options to archive my old data from specific tables of an AWS RDS MySQL database.
I came across AWS S3 and AWS Glacier, and copying the data to either one using pipelines or buckets, but from what I understood those approaches copy the data to a vault or back it up; they don't move it.
Is there a proper option to archive the data by moving it from RDS to S3, Glacier, or Deep Archive, i.e., deleting it from the table in AWS RDS after creating the archive?
What would be the best option for the archival process given my requirements, and would it affect the replicas that already exist?
The biggest consideration when "archiving" the data is ensuring that it is in a useful format should you ever want it back again.
Amazon RDS recently added the ability to export RDS snapshot data to Amazon S3.
Thus, the flow could be:
Create a snapshot of the Amazon RDS database
Export the snapshot to Amazon S3 as a Parquet file (you can choose to export specific sets of databases, schemas, or tables)
Set the Storage Class on the exported file as desired (e.g. Glacier Deep Archive)
Delete the data from the source database (make sure you keep a Snapshot or test the Export before deleting the data!)
When you later wish to access the data:
Restore the data if necessary (based upon Storage Class)
Use Amazon Athena to query the data directly from Amazon S3
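If you'd rather script that flow than click through the console, it can be driven with boto3 along these lines (a minimal sketch: the instance, bucket, IAM role, and KMS key are placeholders, and the export prerequisites from the RDS docs still apply):

    import time
    import boto3

    rds = boto3.client("rds")

    DB_INSTANCE = "my-rds-instance"                # placeholder
    SNAPSHOT_ID = f"archive-{int(time.time())}"
    EXPORT_ID = "export-" + SNAPSHOT_ID

    # 1. Create a snapshot of the Amazon RDS database and wait for it.
    rds.create_db_snapshot(DBInstanceIdentifier=DB_INSTANCE,
                           DBSnapshotIdentifier=SNAPSHOT_ID)
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=SNAPSHOT_ID)

    # 2. Export the snapshot to S3 as Parquet; ExportOnly limits the export to
    #    specific databases, schemas, or tables.
    snapshot_arn = rds.describe_db_snapshots(
        DBSnapshotIdentifier=SNAPSHOT_ID)["DBSnapshots"][0]["DBSnapshotArn"]
    rds.start_export_task(
        ExportTaskIdentifier=EXPORT_ID,
        SourceArn=snapshot_arn,
        S3BucketName="my-archive-bucket",
        IamRoleArn="arn:aws:iam::123456789012:role/rds-s3-export-role",
        KmsKeyId="arn:aws:kms:eu-west-1:123456789012:key/xxxx",
        ExportOnly=["mydb.old_table"],
    )

    # 3. An S3 lifecycle rule on the export prefix can then transition the
    #    objects to Glacier Deep Archive.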
Recently I built a similar pipeline using an AWS Lambda function that runs on a cron schedule (CloudWatch event) every month to take a manual snapshot of the RDS instance, export it to S3, and delete the records that are older than n days.
I put the util class that I used into a gist and am adding it here in case it helps anyone:
JS Util class to create and export Db snapshots to S3
PS: I just wanted to add it as a comment to the approved answer but don't have enough reputation for that.
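For anyone who can't open the gist, the general shape of that Lambda handler looks roughly like this (a simplified Python sketch rather than the actual JS util; the instance name, table, and retention period are placeholders):

    import datetime
    import boto3
    import pymysql

    def handler(event, context):
        """Triggered monthly by a CloudWatch Events / EventBridge cron rule."""
        rds = boto3.client("rds")

        # Take a manual snapshot before touching any data.
        snapshot_id = "archive-" + datetime.datetime.utcnow().strftime("%Y%m%d")
        rds.create_db_snapshot(DBInstanceIdentifier="my-rds-instance",
                               DBSnapshotIdentifier=snapshot_id)
        # ... wait for the snapshot, then call start_export_task to land it in S3 ...

        # Finally, delete the records that are older than n days from the live table.
        conn = pymysql.connect(host="my-rds-endpoint", user="admin",
                               password="...", database="mydb")
        with conn.cursor() as cur:
            cur.execute("DELETE FROM events WHERE created_at < NOW() - INTERVAL %s DAY",
                        (90,))
        conn.commit()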

AWS RDS: Load XML From S3?

Since AWS Aurora is not covered by the RDS free tier (it has no Micro instance support), I am using a MySQL server instead.
I have a script that generates data (currently in XML) that can be imported into MySQL and then writes it to an S3 bucket. I was intending to use the LOAD XML FROM S3 command, as in this answer, to import it from the bucket, but I get a syntax error when I try.
I've looked at AWS Data Pipeline, but it seems hard to maintain since, from what I can tell, it only supports CSV, and I would have to edit the import SQL manually whenever the structure of the database changes. This is an advantage of XML: LOAD XML gets the column names from the file, not from the query used.
Does RDS MySQL (not Aurora) support importing from S3? Or do I have to generate the XML, write it locally and to the bucket, and then use LOAD XML LOCAL INFILE on the local file?
There are multiple limitations when importing data into RDS from S3, as mentioned in the official documentation. Check whether any of the below apply to you.
Limitations and Recommendations for Importing Backup Files from Amazon S3 to Amazon RDS
The following are some limitations and recommendations for importing backup files from Amazon S3:
You can only import your data to a new DB instance, not an existing DB instance.
You must use Percona XtraBackup to create the backup of your on-premises database.
You can't migrate from a source database that has tables defined outside of the default MySQL data directory.
You can't import a MySQL 5.5 or 8.0 database.
You can't import an on-premises MySQL 5.6 database to an Amazon RDS MySQL 5.7 or 8.0 database. You can upgrade your DB instance after you complete the import.
You can't restore databases larger than the maximum database size supported by Amazon RDS for MySQL. For more information about storage limits, see General Purpose SSD Storage and Provisioned IOPS SSD Storage.
You can't restore from an encrypted source database, but you can restore to an encrypted Amazon RDS DB instance.
You can't restore from an encrypted backup in the Amazon S3 bucket.
You can't restore from an Amazon S3 bucket in a different AWS Region than your Amazon RDS DB instance.
Importing from Amazon S3 is not supported on the db.t2.micro DB instance class. However, you can restore to a different DB instance class and then change the instance class later. For more information about instance classes, see Hardware Specifications for All Available DB Instance Classes.
Amazon S3 limits the size of a file uploaded to an Amazon S3 bucket to 5 TB. If a backup file exceeds 5 TB, then you must split the backup file into smaller files.
Amazon RDS limits the number of files uploaded to an Amazon S3 bucket to 1 million. If the backup data for your database, including all full and incremental backups, exceeds 1 million files, use a tarball (.tar.gz) file to store full and incremental backup files in the Amazon S3 bucket.
User accounts are not imported automatically. Save your user accounts from your source database and add them to your new DB instance later.
Functions are not imported automatically. Save your functions from your source database and add them to your new DB instance later.
Stored procedures are not imported automatically. Save your stored procedures from your source database and add them to your new DB instance later.
Time zone information is not imported automatically. Record the time zone information for your source database, and set the time zone of your new DB instance later. For more information, see Local Time Zone for MySQL DB Instances.
Backward migration is not supported for both major versions and minor versions. For example, you can't migrate from version 5.7 to version 5.6, and you can't migrate from version 5.6.39 to version 5.6.37.
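As for the fallback you mention (write the XML locally, then LOAD XML LOCAL INFILE), that can be scripted so the file still lives in S3; a minimal sketch assuming boto3 and PyMySQL, with placeholder names:

    import boto3
    import pymysql

    # Placeholders
    BUCKET, KEY, LOCAL_PATH = "my-data-bucket", "exports/rows.xml", "/tmp/rows.xml"

    # Pull the generated XML down from S3 first.
    boto3.client("s3").download_file(BUCKET, KEY, LOCAL_PATH)

    # LOAD XML takes the column names from the file itself, so this statement
    # does not need to change when the table structure changes.
    conn = pymysql.connect(host="my-rds-endpoint", user="admin", password="...",
                           database="mydb", local_infile=True)
    with conn.cursor() as cur:
        cur.execute(f"LOAD XML LOCAL INFILE '{LOCAL_PATH}' "
                    "INTO TABLE mytable ROWS IDENTIFIED BY '<row>'")
    conn.commit()

Note that local_infile may also need to be allowed in the instance's DB parameter group for the LOCAL variant to work.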

Replicate Data Regularly from AWS RDS (MySQL) to another Server (EC2 instance)

We have a large AWS RDS (MySQL) instance and we need to replicate data from it to another EC2 instance, daily at a certain time, for reporting and analysis purposes.
Currently we are using mysqldump to create a dump file and then copy the whole schema, which takes a lot of time. Is there a faster way of doing this? It would be much better if it copied only the new records.
How can we copy data without copying the whole schema every time?
You should look at the Database Migration Service. Don't be confused by the name; it can do continuous or one-time replication. From the FAQ:
Q. In addition to one-time data migration, can I use AWS Database Migration Service for continuous data replication?
Yes, you can use AWS Database Migration Service for both one-time data migration into RDS and EC2-based databases as well as for continuous data replication. AWS Database Migration Service will capture changes on the source database and apply them in a transactionally-consistent way to the target. Continuous replication can be done from your data center to the databases in AWS or in the reverse, replicating to a database in your datacenter from a database in AWS. Ongoing continuous replication can also be done between homogeneous or heterogeneous databases. For ongoing replication it would be preferable to use Multi-AZ for high-availability.
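If you go the DMS route, the ongoing-replication task can also be created programmatically; a hedged boto3 sketch where the endpoint and replication-instance ARNs are placeholders and the table mappings are reduced to a single include-everything rule:

    import json
    import boto3

    dms = boto3.client("dms")

    # Minimal table-mapping document: replicate every table in every schema.
    table_mappings = {
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-everything",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }

    dms.create_replication_task(
        ReplicationTaskIdentifier="rds-to-ec2-replication",
        SourceEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:SOURCE",
        TargetEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:TARGET",
        ReplicationInstanceArn="arn:aws:dms:eu-west-1:123456789012:rep:INSTANCE",
        MigrationType="full-load-and-cdc",   # initial load plus continuous replication
        TableMappings=json.dumps(table_mappings),
    )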
You can use AWS Glue to do the database migration as a periodic ETL job.
You can also consider using AWS Database Migration Service (DMS).
However, AWS Glue is preferred over DMS for ETL jobs that run within AWS, provided you are familiar with Python or Scala for writing the transformation logic.
Q: When should I use AWS Glue vs AWS Database Migration Service?
AWS Database Migration Service (DMS) helps you migrate databases to AWS easily and securely. For use cases which require a database migration from on-premises to AWS or database replication between on-premises sources and sources on AWS, we recommend you use AWS DMS. Once your data is in AWS, you can use AWS Glue to move and transform data from your data source into another database or data warehouse, such as Amazon Redshift.
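If you go with Glue instead, the periodic job would be a Glue PySpark script along these lines (a skeleton only; the catalog database and table names are placeholders and assume the JDBC connections/crawlers are already set up):

    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the source table through a Data Catalog table backed by a JDBC
    # connection to the RDS instance.
    src = glue_context.create_dynamic_frame.from_catalog(
        database="rds_catalog_db", table_name="mydb_mytable")

    # ... transformation logic in Python/PySpark goes here ...

    # Write to the reporting target (another catalog table pointing at the EC2 database).
    glue_context.write_dynamic_frame.from_catalog(
        frame=src, database="reporting_catalog_db", table_name="reports_mytable")

    job.commit()

The job can then be scheduled (e.g. daily) with a Glue trigger.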

Loading CSV data from S3 into AWS RDS MySQL

So as the question suggests, I am looking for a command similar to Redshift's COPY command that allows me to load CSV data stored in an S3 bucket directly into an AWS RDS MySQL table (it's not Aurora).
How do I do that?
Just migrate your database to Aurora, then use LOAD DATA FROM S3 's3://<your-file-location>/<filename>.csv'.
Be careful though if you are using big CSV files - you'll have to manage timeouts and tune your instance to have fast write capacity.
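One way to manage those timeouts (a sketch with placeholder names, assuming you do move to Aurora and the cluster's S3 IAM role is already configured) is to split the extract into part files and load them one statement at a time:

    import pymysql

    # Placeholder connection details.
    conn = pymysql.connect(host="my-aurora-cluster.cluster-xxxx.eu-west-1.rds.amazonaws.com",
                           user="admin", password="...", database="mydb")

    # Load the extract in smaller chunks (part-000.csv, part-001.csv, ...) so each
    # statement and transaction stays well under the timeouts.
    parts = [f"s3://my-import-bucket/export/part-{i:03d}.csv" for i in range(10)]
    with conn.cursor() as cur:
        for part in parts:
            cur.execute(f"""
                LOAD DATA FROM S3 '{part}'
                INTO TABLE mytable
                FIELDS TERMINATED BY ','
                LINES TERMINATED BY '\\n'
            """)
            conn.commit()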