I am trying to import a large data table, around 1 billion records, into MySQL (Amazon RDS) from a single .sql file by using source <.sql file>. The connection keeps breaking during the operation.
How can I make this work?
Thanks
One way to do it is to import the data into a MySQL database locally, shut down the database server, import it into a MySQL server on an Amazon EC2 instance, and then use replication to synchronize it to RDS. This is from the Amazon documentation on importing a large database to RDS.
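If replication feels like overkill, another workaround (a sketch, not from the AWS docs) is to split the .sql file at statement boundaries and source the chunks one at a time, so a dropped connection only costs you the current chunk. This assumes one statement per line ending in ';', which holds for typical mysqldump output; the printf line just fabricates a tiny stand-in dump, and the final mysql loop is illustrative only:

```shell
# tiny stand-in for the real dump: five one-line INSERT statements
printf 'INSERT INTO t VALUES (%s);\n' 1 2 3 4 5 > dump.sql

# cut the dump into chunks of at most `max` statements, splitting only
# after lines that end with ';' (true of typical mysqldump output)
awk -v max=2 'BEGIN { part = 0 }
  { print > ("dump.sql.part" part) }
  /;[[:space:]]*$/ { if (++n >= max) { n = 0; close("dump.sql.part" part); part++ } }
' dump.sql

# then source each chunk in order, e.g. (endpoint and credentials are placeholders):
#   for f in dump.sql.part*; do mysql -h my-rds-endpoint -u admin -p mydb < "$f"; done
```

With a real dump you would raise `max` to something like 100000; recovering from a dropped connection then means re-sourcing only the chunk that was interrupted.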
I have a question regarding migrating large data from my local machine to AWS RDS (Aurora DB). Basically, I have a local MySQL database with a couple of tables holding around 4 GB of data. I need to replicate this data to AWS RDS. The approach I was considering was to make INSERT calls to RDS, but with this huge amount of data (32 million rows) the process would be costly. I did see some resources on exporting data from local and importing it into RDS, but could not quite understand how it works. Does someone have a good idea about this, and can you advise me on what would be the best process? PS: the data only exists on the local machine and not on any servers.
Dump a CSV extract into S3, then use an AWS migration tool, e.g. see: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.LoadFromS3.html
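Once the cluster has an IAM role that can read the bucket (set up per the linked doc), the load itself looks roughly like this; the bucket, region, file, and table names below are placeholders:

```sql
LOAD DATA FROM S3 's3-us-east-1://my-bucket/export/orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
```

Note that LOAD DATA FROM S3 only works on Aurora MySQL, not plain RDS MySQL.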
Since AWS Aurora is not available in the RDS free tier (it has no Micro instance support), I am using a MySQL server instead.
I have a script that generates data (currently in XML) that can be imported into MySQL, then writes it to an S3 bucket. I was intending to use the LOAD XML FROM S3 command like in this answer to import it from the bucket, but I get a syntax error when I try.
I've looked at AWS Data Pipelines, but it seems hard to maintain since, from what I can tell, it only supports CSV, and I would have to edit the SQL query to import the lines manually whenever the structure of the database changes. This is an advantage of XML; LOAD XML gets the column names from the file, not the query used.
Does the AWS MySQL (not Aurora) support importing from S3? Or do I have to generate the XML, write it locally and to the bucket, then use LOAD XML LOCAL INFILE on the local file?
There are multiple limitations when importing data into RDS from S3, as mentioned in the official documentation. Check whether any of the below applies to you.
Limitations and Recommendations for Importing Backup Files from Amazon S3 to Amazon RDS
The following are some limitations and recommendations for importing backup files from Amazon S3:
- You can only import your data to a new DB instance, not an existing DB instance.
- You must use Percona XtraBackup to create the backup of your on-premises database.
- You can't migrate from a source database that has tables defined outside of the default MySQL data directory.
- You can't import a MySQL 5.5 or 8.0 database.
- You can't import an on-premises MySQL 5.6 database to an Amazon RDS MySQL 5.7 or 8.0 database. You can upgrade your DB instance after you complete the import.
- You can't restore databases larger than the maximum database size supported by Amazon RDS for MySQL. For more information about storage limits, see General Purpose SSD Storage and Provisioned IOPS SSD Storage.
- You can't restore from an encrypted source database, but you can restore to an encrypted Amazon RDS DB instance.
- You can't restore from an encrypted backup in the Amazon S3 bucket.
- You can't restore from an Amazon S3 bucket in a different AWS Region than your Amazon RDS DB instance.
- Importing from Amazon S3 is not supported on the db.t2.micro DB instance class. However, you can restore to a different DB instance class, and then change the instance class later. For more information about instance classes, see Hardware Specifications for All Available DB Instance Classes.
- Amazon S3 limits the size of a file uploaded to an Amazon S3 bucket to 5 TB. If a backup file exceeds 5 TB, then you must split the backup file into smaller files.
- Amazon RDS limits the number of files uploaded to an Amazon S3 bucket to 1 million. If the backup data for your database, including all full and incremental backups, exceeds 1 million files, use a tarball (.tar.gz) file to store full and incremental backup files in the Amazon S3 bucket.
- User accounts are not imported automatically. Save your user accounts from your source database and add them to your new DB instance later.
- Functions are not imported automatically. Save your functions from your source database and add them to your new DB instance later.
- Stored procedures are not imported automatically. Save your stored procedures from your source database and add them to your new DB instance later.
- Time zone information is not imported automatically. Record the time zone information for your source database, and set the time zone of your new DB instance later. For more information, see Local Time Zone for MySQL DB Instances.
- Backward migration is not supported for both major versions and minor versions. For example, you can't migrate from version 5.7 to version 5.6, and you can't migrate from version 5.6.39 to version 5.6.37.
I would like to sync my local MySQL database to an Amazon RDS MySQL database. I found a solution for EC2 to RDS, but not for a local database to RDS.
I built a database with 12 tables, all of which I want to back up to the cloud periodically or automatically.
I do not want to run an EC2 server, since I only need the MySQL database backed up to the cloud.
I need a solution like Microsoft Database Sync Agent: whenever changes are detected in the local database, they should be synced to the cloud database. How can I make this happen?
You could use the AWS Database Migration Service:
AWS Database Migration Service (AWS DMS) is a cloud service that makes it easy to migrate relational databases, data warehouses, NoSQL databases, and other types of data stores. You can use AWS DMS to migrate your data into the AWS Cloud, between on-premises instances (through an AWS Cloud setup), or between combinations of cloud and on-premises setups.
With AWS DMS, you can perform one-time migrations, and you can replicate ongoing changes to keep sources and targets in sync.
You can achieve this by following the steps below.
Create a replica of the local server on RDS.
Enable query logging in the local database.
Create a cron job that processes the logged queries and executes them on the RDS instance in the same order.
To create the replica on RDS, follow the steps below.
You can't replicate your local database to RDS directly. You need to dump your data, after which you can import it into RDS.
Instead of generating a dump file, you can import the data into RDS directly using the command below.
mysqldump db_name | mysql -h 'other_hostname' db_name
You can find out more about this here:
https://dev.mysql.com/doc/refman/5.7/en/copying-databases.html
Also, import the tables and their data first, and only then import your triggers, routines, and events. If you import everything together, there is a chance of conflicts, and your job will be terminated.
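One possible way to do that two-phase import with mysqldump (the endpoint is a placeholder; --skip-triggers, --no-create-info, --no-data, --triggers, --routines, and --events are standard mysqldump options, but verify the behaviour against your server version):

```sh
# phase 1: table definitions and data only (routines/events are off by
# default in mysqldump; skip triggers explicitly)
mysqldump --skip-triggers db_name | mysql -h my-rds-endpoint db_name

# phase 2: triggers, stored routines, and events only
mysqldump --no-create-info --no-data --triggers --routines --events db_name \
  | mysql -h my-rds-endpoint db_name
```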
I have created a relational database (MySQL) hosted on Amazon Web Services. What I would like to do next is import the data in my local CSV files into this database. I would really appreciate it if someone could provide me an outline on how to go about it. Thanks!
This is easiest and most hands-off using the MySQL command line. For large loads, consider spinning up a new EC2 instance, installing the MySQL CL tools, and transferring your file to that machine. Then, after connecting to your database via the CL, you'd do something like:
mysql> LOAD DATA LOCAL INFILE 'C:/upload.csv' INTO TABLE myTable;
There are also options to match your file's details and to ignore the header row (plenty more in the docs):
mysql> LOAD DATA LOCAL INFILE 'C:/upload.csv' INTO TABLE myTable FIELDS TERMINATED BY ','
ENCLOSED BY '"' IGNORE 1 LINES;
If you're hesitant to use the CL, download MySQL Workbench. It connects to AWS RDS with no problem.
Closing thoughts:
MySQL LOAD DATA Docs
AWS's Aurora RDS is MySQL-compatible, so the command works there too.
The "LOCAL" flag actually transfers the file from your client machine (where you're running the command) to the DB server. Without LOCAL, the file must be on the DB server (it's not possible to transfer it there in advance with RDS).
Works great on huge files too! I just sent an 8.2 GB file via this method (260 million rows). It took just over 10 hours from a t2.medium EC2 instance to a db.t2.small Aurora instance.
This is not a solution if you need to watch out for unique keys, or to read the CSV row-by-row and change the data before inserting/updating.
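If the changes you need are simple, one workaround is to preprocess the CSV locally and LOAD the cleaned copy instead. A sketch (the three-column id,email,amount layout is invented, and the printf just fabricates sample rows):

```shell
# sample rows standing in for the real export: id,email,amount
printf '1,Alice@Example.COM,10\n2,BOB@test.org,20\n' > upload.csv

# lowercase the (hypothetical) email column, writing a cleaned copy
# that LOAD DATA LOCAL INFILE would then load instead of the original
awk -F, 'BEGIN { OFS = "," } { $2 = tolower($2); print }' upload.csv > upload_clean.csv
```

Note that this naive field splitting breaks if values contain embedded commas; it still doesn't help with unique-key conflicts, which need real row-by-row logic.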
I did some digging and found this official AWS documentation on how to import data from any source to MySQL hosted on RDS.
It is a very detailed step-by-step guide and includes an explanation of how to import CSV files.
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.Procedural.Importing.AnySource.html
Basically, each table must have its own file. Data for multiple tables cannot be combined in the same file. Give each file the same name as the table it corresponds to. The file extension can be anything you like. For example, if the table name is "sales", the file name could be "sales.csv" or "sales.txt", but not "sales_01.csv".
Whenever possible, order the data by the primary key of the table being loaded. This drastically improves load times and minimizes disk storage requirements.
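For a CSV keyed on a numeric first column, that ordering can be applied before upload with plain sort; the file name and sample data here are illustrative:

```shell
# unsorted sample export (primary_key,name)
printf '3,carol\n1,alice\n2,bob\n' > sales.csv

# sort numerically on the first comma-separated field (the primary key), in place
sort -t, -k1,1n -o sales.csv sales.csv
```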
There is another option to import data into a MySQL database: you can use an external tool, Alooma, which can do the data import for you in real time.
It depends on how large your file is, but if it is under 1 GB I found that DataGrip imports it without any issues: https://www.jetbrains.com/datagrip/
You get a nice mapping tool and a graphical IDE to play around with. DataGrip is available as a free 30-day trial.
I am myself experiencing RDS connection dropouts with bigger files (> 2 GB). I am not sure whether the issue is on the DataGrip or the AWS side.
I think your best bet would be to develop a script in your language of choice to connect to the database and import it.
If your database is internet accessible, then you can run that script locally. If it is in a private subnet, then you can either run the script on an EC2 instance with access to the private subnet, or on a Lambda function connected to your VPC. You should really only use Lambda if you expect the runtime to be less than 5 minutes or so.
Edit: Note that Lambda only supports a handful of languages:
AWS Lambda supports code written in Node.js (JavaScript), Python, Java (Java 8 compatible), and C# (.NET Core).
I have a legacy SQL Server DB, and I need to copy part of a very, very big table from it over to a new Aurora DB cluster on AWS (RDS).
The old table in SQL Server has 1.8 billion records and 43 columns; however, in the new DB I will only have 13 of those columns carried over, along with almost all of the rows.
I was wondering if anyone has any ideas on the best way that I can move this data across?
I wrote a simple Python script to query the SQL Server and then execute insert statements on the new DB, but I estimate this would take about 30 hours to run, based on some tests I did on smaller sets of data.
Any ideas?
P.S. Aurora is based on MySQL, so I would imagine that if it works for MySQL it will work for Aurora.
Assuming you can get the data you want into something like a CSV file, LOAD DATA LOCAL INFILE should be pretty performant.
I did wonder whether it would be allowed on RDS, and discovered an AWS article on importing data into MySQL on RDS. I couldn't find an equivalent one for Aurora, only for migrating from an RDS-based MySQL instance. There's an Amazon RDS for Aurora Export/Import Performance Best Practices document that has one reference to LOAD DATA LOCAL INFILE, however.
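On the export side, if you already have a full 43-column CSV extract, one quick way to trim it down to just the columns you need before LOAD DATA LOCAL INFILE is cut; the column positions below are placeholders, and this assumes no embedded commas in the data:

```shell
# five-column stand-in for the full 43-column extract
printf 'a,b,c,d,e\n1,2,3,4,5\n' > source_export.csv

# keep only the wanted columns (here positions 1, 3, and 5)
cut -d, -f1,3,5 source_export.csv > aurora_load.csv
```

Selecting only the 13 columns in the export query itself (rather than trimming afterwards) avoids moving the unused columns at all, which matters at 1.8 billion rows.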