Move data between AWS RDS instances - mysql

I need to move millions of rows between identical MySQL databases on two different RDS instances. The approach I thought about is this:
- use Data Pipeline to export data from the first instance to Amazon S3
- use Data Pipeline to import data from Amazon S3 to the second instance
My problem is that I need to delete the data on the first instance at the end. Since we're talking about huge amounts of data, I thought about creating a stored procedure to delete the rows in batches. Is there a way to achieve that in AWS? Or are there any other solutions?
One other thing is that I only need to move some rows from a specific table, not the whole table or the whole database.
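Something like the sketch below is what I had in mind for the batched cleanup (the table name and the exported flag are just placeholders for however the moved rows are identified):

DELIMITER $$
CREATE PROCEDURE delete_moved_rows()
BEGIN
  DECLARE batch INT DEFAULT 1;
  WHILE batch > 0 DO
    -- my_table and the exported flag are placeholders
    DELETE FROM my_table WHERE exported = 1 LIMIT 10000;
    SET batch = ROW_COUNT();
    DO SLEEP(1);  -- give replication and other sessions room to breathe
  END WHILE;
END$$
DELIMITER ;

CALL delete_moved_rows();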

You can use the AWS DMS service, which is the easiest way to move a huge amount of data. Please follow the steps below.
First, you need to change some settings in the parameter group on both RDS instances:
'log_bin' = 'ON'
'binlog_format' = 'ROW'
'binlog_checksum' = 'NONE'
'log_bin_use_v1_row_events' = 'ON'
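You can confirm that the instances are actually running with these values, for example:

SHOW VARIABLES
WHERE Variable_name IN ('log_bin', 'binlog_format',
                        'binlog_checksum', 'log_bin_use_v1_row_events');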
Take a dump of the database's schema from the first RDS instance.
Restore it on the second RDS.
Now start configuring DMS:
- Set up the endpoints first.
- Then create a task to import data from the source (first RDS) to the destination (second RDS).
- For the migration type, choose Migrate existing data if you only want to load existing data, or select the option that also replicates ongoing changes if you are trying to sync data in real time.
- Under the task settings, set Target table preparation mode to Do nothing.
- Check the Enable logging checkbox; it will help you debug in case of any errors.
- Once the task is started, you will be able to see its progress in the dashboard.

Use TRUNCATE TABLE instead of a DELETE statement if you want to delete all the data in one table. It will save you a lot of time.
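For example, on a hypothetical table named events:

TRUNCATE TABLE events;  -- fast: drops and recreates the table's storage
-- versus
DELETE FROM events;     -- slow: removes rows one by one and logs each change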

Data Pipeline is more for a recurring process. It seems like a lot of extra hassle if you just want to do a one-time operation. It may be easier to launch an instance with decent network throughput, attach a big enough EBS volume to hold your data, and use command-line tools like mysqldump to move the data.
As far as cleanup goes, it's probably faster to come up with a query that copies the rows you want to keep into a temp table (i.e. everything except the rows you want to remove), then use RENAME TABLE to swap the temp table in for the original, and then drop the original table.
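A sketch of that swap (the table name and the moved flag are just placeholders for however you identify the rows to keep):

-- copy only the rows you want to keep into a new table with the same structure
CREATE TABLE my_table_new LIKE my_table;
INSERT INTO my_table_new SELECT * FROM my_table WHERE moved = 0;

-- swap the tables atomically, then drop the old data
RENAME TABLE my_table TO my_table_old, my_table_new TO my_table;
DROP TABLE my_table_old;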


AWS DMS Propagate Truncate MySQL RDS

I'm hoping someone has come across this before and figured out a way to solve it. I have an MSSQL table hosted on prem that I need to keep in sync with DynamoDB. It's a large database with a lot of tables, but I don't need ALL the data from it; I just need to run a query and output the result to DynamoDB. I need CDC on the data as well, meaning that if the latest query yields different results (update, delete, create), those changes are also reflected in the DynamoDB table.
I was able to accomplish this by doing:
AWS Batch -> Python script to query data and insert into MySQL table -> AWS DMS for replication -> DynamoDB
The issue with this is that since it's a batch job, I need to be able to truncate the MySQL table, load the new data in, and have that be reflected in DynamoDB. The problem is that DMS does NOT support truncate queries. I was wondering whether, if I just looped through all the rows and deleted them, it would work as a pseudo "truncate", since deletes and inserts to the table seem to be replicated perfectly to DynamoDB via DMS.
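For reference, the pseudo-"truncate" would just be a chunked delete run repeatedly from the batch job (the table name is a placeholder):

-- each run is an ordinary row-level delete, so DMS can replicate it;
-- the batch job repeats this until it affects 0 rows
DELETE FROM staging_table LIMIT 5000;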
Is there a better way to do this? Will this work?

mysql automatic replication of partial data

I have to create a dashboard based on a table in MySQL, using only today's data.
This DB is used by a service with a massive quantity of data and continuous reads and writes, so I'd like to replicate part of this table (only today's data) to a "slave" instance.
Is it possible to do this in MySQL, without scripting?
Thanks
MySQL has no built-in feature to replicate a subset of rows. There are replication filters to replicate a subset of schemas or tables, but not rows.
One workaround could be to replicate fully to the replica, then on the replica delete any data that is more than one day old.
But this would work only for a database that is INSERT-only. If you also have UPDATE and DELETE operations replicated, they might find that they are trying to change rows that are missing. If you use ROW-based binary logs, this would result in a replication error when it can't find the row, and replication would stop.
It might work if you only use STATEMENT-based binary logs, but I've never tried it so I can't predict what other problems might occur. Also, you can't fully prevent ROW-based binary logs from occurring, because individual sessions can change their binary log format.
I think you're going to need a bespoke solution no matter what. Probably not using replication, but just an ETL job to query the current day's data and import it into another MySQL instance (not a replica).
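For example, the core of such an ETL job could be a query sketch like this (assuming a hypothetical events table with an indexed created_at timestamp column):

-- run against the source from the ETL job; dump the result and load it
-- into the dashboard instance (e.g. with mysqldump or LOAD DATA)
SELECT * FROM events
WHERE created_at >= CURDATE();

-- on the dashboard instance, replace the previous snapshot before loading
DELETE FROM events;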

MySQL update whole database without downtime

I have a large database that needs to be rebuilt every 24 hours. The database is built using a custom script on a server that pulls data from different files. The problem is that the whole process takes 1 min to complete, and that is 1 min of downtime, because we need to drop the whole database in order to rebuild it (there is no other way than to drop it).
At first, we planned to build a temporary database, drop the original, and then rename the temporary one to the original name, but MySQL doesn't support database renaming.
The second approach was to dump a .sql file from the temp database and import it into the main (original) database, but that also causes downtime.
What is the best way to do this?
Here is something that I do. It doesn't result in zero downtime but could finish in less than a second.
Create a database that only has interface elements to your real database. In my case, it only contains view definitions, and all user queries go through this database.
Create a new database each night. When it is done, then update the view definitions to refer to the new database. I would recommend either turning off user access to the database containing the views while you are updating them or deleting all of the views and recreating them -- this prevents partial access to the old database. Because creating views is fast, this should be a very fast operation.
We do all of this through a job. In fact, before changing the production views, we test the view creation on another database to be sure they are all working.
Obviously, if you use alter view instead of requiring consistency across all the views, then there is no downtime, just a brief period of inconsistency.
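A minimal sketch of that view swap, assuming a hypothetical interface database reports and alternating nightly builds data_a and data_b:

-- nightly job: once data_b has finished building, repoint each
-- interface view from data_a to data_b (names are placeholders)
ALTER VIEW reports.customers AS SELECT * FROM data_b.customers;
ALTER VIEW reports.orders    AS SELECT * FROM data_b.orders;
-- ...one ALTER VIEW per table; only the view definitions change, so this is fast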

Duplicate a whole database on the same server?

We are running a service where we have to set up a new database for each new site. The databases are exactly the same, so we can simply restore from a backup file or clone from a sample database (which is created only for cloning purposes; no transactions are run there, so there is no worry about corrupting data) on the same server. The database itself contains around 100 tables with some data, and it takes around 1-2 minutes to import, which is too slow.
I'm trying to find a way to do it as fast as possible. The first thought that came to mind was to copy the files within the sample database's data_dir, but it seems like I also need to somehow edit the table lists, or MySQL won't be able to read my new database's tables even though it still shows them there.
You're duplicating the database the wrong way, it will be much faster if you do it properly.
Here is how you duplicate a database:
create database new_database;
create table new_database.table_one select * from source_database.table_one;
create table new_database.table_two select * from source_database.table_two;
create table new_database.table_three select * from source_database.table_three;
...
I just did a performance test, this takes 81 seconds to duplicate 750MB of data across 7 million table rows. Presumably your database is smaller than that?
I don't think you are going to find anything faster. One thing you could do is already have a queue of duplicate databases on standby ready to be picked up and used at any time. So you don't need to create a new database at all, you just rename an existing database from a queue of available ones. And have a cron job running to make sure the queue never runs empty.
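Since MySQL has no RENAME DATABASE statement, "renaming" a standby database really means moving its tables with RENAME TABLE, which is a near-instant metadata change. A sketch with placeholder names:

CREATE DATABASE site123;
-- move every table out of a prepared standby schema into the new database
RENAME TABLE standby_01.table_one   TO site123.table_one,
             standby_01.table_two   TO site123.table_two,
             standby_01.table_three TO site123.table_three;
-- ...one entry per table, then drop the now-empty standby schema
DROP DATABASE standby_01;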
Why is MySQL not able to read them? What did you change in the table lists?
I think there may be a problem with the permissions for MySQL to read the copied files; otherwise it should be fine.
Thanks

Copying data from PostgreSQL to MySQL

I currently have a PostgreSQL database, because one of the pieces of software we're using only supports this particular database engine. I then have a query which summarizes and splits the data from the app into a more useful format.
In my MySQL database, I have a table which contains an identical schema to the output of the query described above.
What I would like to develop is an hourly cron job which will run the query against the PostgreSQL database, then insert the results into the MySQL database. During the hour period, I don't expect to ever see more than 10,000 new rows (and that's a stretch) which would need to be transferred.
Both databases are on separate physical servers, continents apart from one another. The MySQL instance runs on Amazon RDS - so we don't have a lot of control over the machine itself. The PostgreSQL instance runs on a VM on one of our servers, giving us complete control.
The duplication is, unfortunately, necessary because the PostgreSQL database only acts as a collector for the information, while the MySQL database has an application running on it which needs the data. For simplicity, we're wanting to do the move/merge and delete from PostgreSQL hourly to keep things clean.
To be clear - I'm a network/sysadmin guy - not a DBA. I don't really understand all of the intricacies necessary in converting one format to the other. What I do know is that the data being transferred consists of 1xVARCHAR, 1xDATETIME and 6xBIGINT columns.
The closest guess I have for an approach is to use some scripting language to make the query, convert results into an internal data structure, then split it back out to MySQL again.
In doing so, are there any particular good or bad practices I should be wary of when writing the script? Or - any documentation that I should look at which might be useful for doing this kind of conversion? I've found plenty of scheduling jobs which look very manageable and well-documented, but the ongoing nature of this script (hourly run) seems less common and/or less documented.
Open to any suggestions.
Use the same database system on both ends and use replication
If your remote end was also PostgreSQL, you could use streaming replication with hot standby to keep the remote end in sync with the local one transparently and automatically.
If the local end and remote end were both MySQL, you could do something similar using MySQL's various replication features like binlog replication.
Sync using an external script
There's nothing wrong with using an external script. In fact, even if you use DBI-Link or similar (see below), you probably have to use an external script (or psql) from a cron job to initiate replication, unless you're going to use PgAgent to do it.
Either accumulate rows in a queue table maintained by a trigger procedure, or make sure you can write a query that always reliably selects only the new rows. Then connect to the target database and INSERT the new rows.
If the rows to be copied are too big to comfortably fit in memory, you can use a cursor and read the rows with FETCH.
I'd do the work in this order:
1. Connect to PostgreSQL
2. Connect to MySQL
3. Begin a PostgreSQL transaction
4. Begin a MySQL transaction. If your MySQL is using MyISAM, go and fix it now.
5. Read the rows from PostgreSQL, possibly via a cursor or with DELETE FROM queue_table RETURNING *
6. Insert them into MySQL
7. DELETE any rows from the queue table in PostgreSQL if you haven't already.
8. COMMIT the MySQL transaction.
9. If the MySQL COMMIT succeeded, COMMIT the PostgreSQL transaction. If it failed, ROLLBACK the PostgreSQL transaction and try the whole thing again.
The PostgreSQL COMMIT is incredibly unlikely to fail because it's a local database, but if you need perfect reliability you can use two-phase commit on the PostgreSQL side, where you:
PREPARE TRANSACTION in PostgreSQL
COMMIT in MySQL
then either COMMIT PREPARED or ROLLBACK PREPARED in PostgreSQL depending on the outcome of the MySQL commit.
This is likely too complicated for your needs, but is the only way to be totally sure the change happens on both databases or neither, never just one.
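On the PostgreSQL side, that two-phase commit looks roughly like this (the transaction identifier is arbitrary, and max_prepared_transactions must be set to a non-zero value):

-- after reading/deleting the rows inside the PostgreSQL transaction:
PREPARE TRANSACTION 'mysql_sync_1';

-- ...now COMMIT the MySQL transaction from the script...

-- if the MySQL commit succeeded:
COMMIT PREPARED 'mysql_sync_1';
-- if it failed:
-- ROLLBACK PREPARED 'mysql_sync_1';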
BTW, seriously, if your MySQL is using MyISAM table storage, you should probably remedy that. It's vulnerable to data loss on crash, and it can't be transactionally updated. Convert to InnoDB.
Use DBI-Link in PostgreSQL
Maybe it's because I'm comfortable with PostgreSQL, but I'd do this using a PostgreSQL function that used DBI-link via PL/Perlu to do the job.
When replication should take place, I'd run a PL/PgSQL or PL/Perl procedure that uses DBI-Link to connect to the MySQL database and insert the data in the queue table.
Many examples exist for DBI-Link, so I won't repeat them here. This is a common use case.
Use a trigger to queue changes and DBI-link to sync
If you only want to copy new rows and your table is append-only, you could write a trigger procedure that appends all newly INSERTed rows into a separate queue table with the same definition as the main table. When you want to sync, your sync procedure can then in a single transaction LOCK TABLE the_queue_table IN EXCLUSIVE MODE;, copy the data, and DELETE FROM the_queue_table;. This guarantees that no rows will be lost, though it only works for INSERT-only tables. Handling UPDATE and DELETE on the target table is possible, but much more complicated.
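A minimal sketch of that trigger-maintained queue in PL/pgSQL (the table names are placeholders):

CREATE TABLE the_queue_table (LIKE the_main_table);

CREATE OR REPLACE FUNCTION queue_new_row() RETURNS trigger AS $$
BEGIN
    -- copy every newly inserted row into the queue table
    INSERT INTO the_queue_table SELECT NEW.*;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER queue_after_insert
    AFTER INSERT ON the_main_table
    FOR EACH ROW EXECUTE PROCEDURE queue_new_row();

-- the sync procedure then runs in a single transaction:
BEGIN;
LOCK TABLE the_queue_table IN EXCLUSIVE MODE;
-- ...copy the queued rows to MySQL here...
DELETE FROM the_queue_table;
COMMIT;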
Add MySQL to PostgreSQL with a foreign data wrapper
Alternately, for PostgreSQL 9.1 and above, I might consider using the MySQL Foreign Data Wrapper, ODBC FDW or JDBC FDW to allow PostgreSQL to see the remote MySQL table as if it were a local table. Then I could just use a writable CTE to copy the data.
WITH moved_rows AS (
    DELETE FROM queue_table RETURNING *
)
INSERT INTO mysql_table
SELECT * FROM moved_rows;
In short, you have two scenarios:
1) Make the destination pull the data from the source into its own structure
2) Make the source push the data from its structure to the destination
I'd rather try the second one: look for a way to create a PostgreSQL trigger or some special "virtual" table, or maybe a PL/pgSQL function. Then, instead of an external script, you'll be able to execute the procedure by running a query from cron, or possibly from inside Postgres; there are some options for scheduling operations there.
I'd choose the second scenario because Postgres is much more flexible, and when manipulating data in special, DIY ways you will simply have more possibilities.
An external script probably isn't a good solution, e.g. because you would need to treat binary data with special care, or convert dates and times from DATE to VARCHAR and then back to DATE again. Inside an external script, various text-stored values will probably just be strings, and you would need to quote them too.