Load data from S3 to MySQL running on an EC2 instance (not RDS)

I want to be able to use the LOAD DATA INFILE command in MySQL, but instead of loading the data from a local file I want to load it from a CSV file stored in S3.
I.e., if the file were in local storage it would look like:
LOAD DATA INFILE 'C:\\abc.csv' INTO TABLE abc
But if the file is in S3, I'm not sure how I could do something like this.
Is this possible?
NOTE: this is not an RDS instance, so this approach does not seem to apply:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-template-copys3tords.html

The mysql CLI allows you to execute STDIN as a stream of SQL statements.
Using a combination of the aws s3 CLI and mkfifo, you can stream data out of S3.
Then it's a simple matter of connecting the streams with something that reformats the CSV into valid SQL.
mkfifo /tmp/mypipe
# stream the object to stdout and into the fifo; run it in the background so the next command can read from it
aws s3 cp s3://your/s3/object - > /tmp/mypipe &
python transform_csv_to_sql.py < /tmp/mypipe | mysql target_database
You might be able to remove the python step and use MySQL's CSV code if you tell MySQL to load the data directly from your fifo:
mkfifo /tmp/mypipe
aws s3 cp s3://your/s3/object - > /tmp/mypipe &
# without LOCAL, the MySQL server itself must be able to read /tmp/mypipe (and secure_file_priv must allow it)
mysql target_database --execute "LOAD DATA INFILE '/tmp/mypipe' INTO TABLE abc"
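If local_infile is enabled on both the client and the server, a variation that skips the fifo entirely is to stream the object to stdout and let the client read it as /dev/stdin. This is only a sketch, assuming a Linux client; the bucket, table name, and delimiters are placeholders you would adjust:
aws s3 cp s3://your/s3/object - | mysql target_database --local-infile=1 \
  --execute "LOAD DATA LOCAL INFILE '/dev/stdin' INTO TABLE abc FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'"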
Good luck!

Related

Is there any sample shell script to load data from AWS S3 to a MySQL RDS instance

Please share any ideas or a sample shell script to load data from AWS S3 into a MySQL (RDS) instance.
You can achieve this in 2 steps.
Step 1 - Download the CSV file.
wget <s3-csv-url>
Step 2 - Use mysqlimport to import the data from the CSV file into your MySQL database.
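Putting the two steps together, a minimal sketch (assuming the object is publicly readable or pre-signed, and that the target table already exists and shares its name with the CSV file, since mysqlimport derives the table name from the file name; the host, user, and file names are placeholders):
# Step 1 - download the CSV file from S3
wget -O /tmp/members.csv "<s3-csv-url>"
# Step 2 - import it; mysqlimport loads /tmp/members.csv into a table named `members`
mysqlimport --local --fields-terminated-by=',' --lines-terminated-by='\n' \
  --host=<rds-endpoint> --user=<db-user> -p <database-name> /tmp/members.csv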
In case you are using an Aurora instance, you could simply use the LOAD DATA FROM S3 command to import data from S3 into your database table.

Using cron to export data from a MySQL database to CSV, then getting all its data into a BigQuery table

Using cron to export data from a MySQL database to CSV, then reading this CSV file and getting all its data into Google Cloud using BigQuery
I have a MySQL database called db_test, and one table in it called members_test(id, name). I'm working on Ubuntu Linux. I am trying to use a cron job to export the data from this table into a CSV file at midnight. I also want BigQuery to somehow read this CSV file and load its data into a table called cloud_members_tab on Google Cloud Platform.
How can I do this?
make sure you have your CSV generated correctly (don't rely on MySQL's native CSV export)
install the gsutil and bq command-line utilities
upload the CSV to Google Cloud Storage
use a shell command like below:
gsutil -o "GSUtil:parallel_composite_upload_threshold=150M" cp -j csv /tmp/export.csv gs://bucket/export.csv
use bq load:
bq load --source_format=CSV --field_delimiter="," --null_marker="\N" --allow_quoted_newlines --autodetect dataset.tablename gs://bucket/export.csv
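Putting the whole pipeline into one nightly job, a sketch under a few assumptions: the script path, bucket, and dataset names are placeholders, the mysql account has the FILE privilege and secure_file_priv allows writing to /tmp, and credentials for mysql, gsutil and bq are already configured on the machine. INTO OUTFILE writes NULLs as \N, which is why --null_marker="\N" is used above.
#!/bin/bash
# nightly_export.sh - sketch of the export -> upload -> load pipeline
set -euo pipefail

# dated file name, since INTO OUTFILE refuses to overwrite an existing file
EXPORT=/tmp/members_export_$(date +%F).csv

# export the table to CSV on the MySQL host (NULLs come out as \N)
mysql db_test -e "
  SELECT id, name FROM members_test
  INTO OUTFILE '$EXPORT'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
  LINES TERMINATED BY '\n'"

# upload the CSV to Cloud Storage
gsutil cp "$EXPORT" gs://your-bucket/members_export.csv

# load it into the BigQuery table from the question
bq load --source_format=CSV --field_delimiter="," --null_marker="\N" \
  --allow_quoted_newlines --autodetect \
  your_dataset.cloud_members_tab gs://your-bucket/members_export.csv
A crontab entry then runs it at midnight:
0 0 * * * /home/user/nightly_export.sh >> /var/log/nightly_export.log 2>&1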

How does mysqlimport work internally?

As the MySQL documentation says LOAD DATA INFILE can be 20x faster than bulk inserts (INSERT ... VALUES), I am trying to replace INSERT ... VALUES with LOAD DATA INFILE in my application.
When I try LOAD DATA INFILE with ADO.NET, it ends in an error. It seems there is no way to load a file from a local machine on which MySQL is not running.
But that is not a problem for mysqlimport, as the documentation also says:
You can also load data files by using the mysqlimport utility; it operates by sending a LOAD DATA INFILE statement to the server. The --local option causes mysqlimport to read data files from the client host.
I am curious what protocol mysqlimport uses when it imports data into MySQL on a remote machine.
Does it parse the data file and construct SQL like INSERT ... VALUES, or does it have a special protocol to send the data file directly to MySQL?
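Going by the quoted documentation, mysqlimport does not parse the file into INSERT statements; it issues a LOAD DATA [LOCAL] INFILE statement, and with --local the client reads the file and streams its contents to the server over the normal client/server connection. Roughly, the two commands below should be equivalent (a sketch only; the host, database, and path are placeholders, and mysqlimport derives the table name city from the file name):
mysqlimport --local --host=<remote-host> --user=<user> -p <database> /path/to/city.csv
mysql --local-infile=1 --host=<remote-host> --user=<user> -p <database> \
  --execute "LOAD DATA LOCAL INFILE '/path/to/city.csv' INTO TABLE city"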

Loading CSV data from S3 into AWS RDS MySQL

So as the title suggests, I am looking for a command similar to Redshift's COPY command which allows me to load CSV data stored in an S3 bucket directly into an AWS RDS MySQL table (it's not Aurora).
How do I do that?
Just migrate your database to Aurora, then use LOAD DATA FROM S3 's3://<your-file-location>/<filename>.csv' INTO TABLE <tablename>
Be careful though if you are using big CSV files - you'll have to manage timeouts and tune your instance for fast write capacity.
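A fuller sketch of the Aurora statement, run through the mysql client (assumptions: the Aurora cluster has an IAM role with S3 read access attached and the aurora_load_from_s3_role or aws_default_s3_role cluster parameter set, the table already exists, and the endpoint, table, and delimiter settings shown here are placeholders):
mysql -h <aurora-endpoint> -u <user> -p <database> --execute "
  LOAD DATA FROM S3 's3://<your-file-location>/<filename>.csv'
  INTO TABLE <tablename>
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
  IGNORE 1 LINES"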

Export MySQL query to Amazon S3 bucket

Based on the accepted answer here, I am able to export the results of a mysql query to a CSV file on my Amazon EC2 instance using the following:
mysql -u user -ppass -e "SELECT * FROM table" > /data.csv
However, as the exported file is large, I want to export it directly to an Amazon S3 bucket (s3://mybucket) which is accessible from my EC2 instance.
I tried:
mysql -u user -ppass -e "SELECT * FROM table" > s3://mybucket/data.csv
But it doesn't export the file.
If you want to use the mysql command line program, then you have two choices:
Increase the size of your instance's storage so that the file can be created, then copy the file to S3.
Create a separate program or script that reads from standard input and writes to S3.
Another solution would be to create a simple program that processes your SELECT and writes directly to S3. There are lots of examples of this on the Internet in Python, Java, etc.
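For the standard-input option, the AWS CLI itself can act as that program, since aws s3 cp accepts - as the source to read from stdin (a sketch; credentials and bucket are placeholders, and note that the mysql client emits tab-separated output with a header row rather than true CSV):
mysql -u user -ppass -e "SELECT * FROM table" | aws s3 cp - s3://mybucket/data.csv
This streams the query output straight to S3 without writing the large file to the instance's disk, the same pattern used with mysqldump in the answer below.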
In addition to the accepted answer, if you are using Aurora, then you can do
SELECT * FROM table INTO OUTFILE S3 's3://mybucket/table-data';
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.SaveIntoS3.html
An alternate approach:
mysqldump -h [db_hostname] -u [db_user] -p[db_passwd] [databasename] [tablename] | aws s3 cp - s3://[s3_bucketname]/[mysqldump_filename]
It will stream the dump directly to S3 without occupying local disk space.