Related
I am looking into migrating my MySQL DB to Azure Database for MySQL https://azure.microsoft.com/en-us/services/mysql/. It currently resides on a server hosted by another company. The DB is about 100 GB. (It worries me that Azure uses the term "relatively large" for 1GB.)
Is there a way to migrate the DB without any or little (a few hours, max) downtime? I obviously can't do a dump and load as the downtime could be days. Their documentation seems to be for syncing with a MySQL server that is already on a MS server.
Is there a way to export the data out of MS Azure if I later want to use something else, again without significant downtime?
Another approach: Use Azure Data Factory to copy the data from your MySQL source to your Azure DB. Set up a sync procedure that updates your Azure Database with new rows. Sync, take MYSQL db offline, sync once more and switch to the Azure DB.
See Microsoft online help
Don't underestimate the complexity of this migration.
With 100GB, it's a good guess that most rows in your tables don't get UPDATEd or DELETEd.
For my suggestion here to work, you will need a way to
SELECT * FROM table WHERE (the rows are new or updated since a certain date)
Some INSERT-only tables will have autoincrementing ID values. In this case you can figure out the ID cutoff value between old and new. Other tables may be UPDATEd. Unless those table have timestamps saying when they were updated, you'll have a challenge figuring it out. You need to understand your data to do that. It's OK if your WHERE (new or updated) operation takes some extra rows that are older. It's NOT OK if it misses INSERTed or UPDATEd rows.
Once you know how to do this for each large table, you can start migrating.
Mass Migration Keeping your old system online and active, you can use mysqldump to migrate your data to the new server. You can take as long as you require to do it. Read this for some suggestions. getting Lost connection to mysql when using mysqldump even with max_allowed_packet parameter
Then, you'll have a stale copy of the data on the new server. Make sure the indexes are correctly built. You may want to use OPTIMIZE TABLE on the newly loaded tables.
Update Migration You can then use your WHERE (the rows are new or updated) queries to migrate the rows that have changed since you migrated the whole table. Again, you can take as long as you want to do this, keeping your old system online. It should take much less time than your first migration, because it will handle far fewer rows.
Final Migration, offline Finally, you can take your system offline and migrate the remaining rows, the ones that changed since your last migration. And migrate your small tables in their entirety, again. Then start your new system.
Yeah but, you say, how will I know I did it right?
For best results, you should script your migration steps, and use the scripts. That way your final migration step will go quickly.
You could rehearse this process on a local server on your premises. While 100GiB is big for a database, it's not an outrageous amount of disk space on a desktop or server-room machine.
Save the very large extracted files from your mass migration step so you can re-use them when you flub your first attempts to load them. That way you'll save the repeated extraction load on your old system.
You should stand up a staging copy of your migrated database (at your new cloud provider) and test it with a staging copy of your application. You may be able to do this with a small subset of your rows. But do test your final migration step with this copy to make sure it works.
Be prepared for a fast rollback to the old system if the new one goes wrong .
AND, maybe this is an opportunity to purge out some old data before you migrate. This kind of migration is difficult enough that you could make a business case for extracting and then deleting old rows from your old server, before you start migrating.
Google says NO triggers, NO stored procedures, No views. This means the only thing I can dump (or import) is just a SHOW TABLES and SELECT * FROM XXX? (!!!).
Which means for a database with 10 tables and 100 triggers, stored procedures and views I have to recreate, by hand, almost everything? (either for import or for export).
(My boss thinks I am tricking him. He cannot understand how previous, to me, employers did that replication to a bunch of computers using two clicks and I personally need hours (or even days) to do this with an internet giant like Google.)
EDIT:
We have applications which are being created in local computers, where we use our local MySQL. These applications use MySQL DB's which consist, say, from n tables and 10*n triggers. For the moment we cannot even check google-cloud-sql since that means almost everything (except the n almost empty tables) must be "uploaded" by hand. And we cannot also check using google-cloud-sql DB since that means almost everything (except the n almost empty tables) must be "downloaded" by hand.
Until now we do these "up-down"-loads by taking a decent mysqldump from the local or the "cloud" MySQL.
It's unclear what you are asking for. Do you want "replication" or "backups" because these are different concepts in MySQL.
If you want to replicate data to another MySQL instance, you can set up replication. This replication can be from a Cloud SQL instance, or to a Cloud SQL instance using the external master feature.
If you want to backup data to or from the server, checkout these pages on importing data and exporting data.
As far as I understood, you want to Create Cloud SQL Replicas. There are a bunch of replica options found in the doc, use the one that fits the best to you.
However, if you said "replica" as Cloning a Cloud SQL instance, you can follow the steps to clone your instance in a new and independent instance.
Some of these tutorials are done by using the GCP Console and can be scheduled.
We ran into serious performance problems with our Oracle database and we would like to try to migrate it to a MySQL-based database (either MySQL directly or, more preferably, Infobright).
The thing is, we need to let the old and the new system overlap for at least some weeks if not months, before we actually know, if all the features of the new database match our needs.
So, here is our situation:
The Oracle database consists of multiple tables with each millions of rows. During the day, there are literally thousands of statements, which we cannot stop for migration.
Every morning, new data is imported into the Oracle database, replacing some thousands of rows. Copying this process is not a problem, so we could, in theory, import in both databases in parallel.
But, and here the challenge lies, for this to work we need to have an export from the Oracle database with a consistent state from one day. (We cannot export some tables on Monday and some others on Tuesday, etc.) This means, that at least the export should be finished in less than one day.
Our first thought was to dump the schema, but I wasn't able to find a tool to import an Oracle dump file into MySQL. Exporting tables in CSV files might work, but I'm afraid it could take too long.
So my question now is:
What should I do? Is there any tool to import Oracle dump files into MySQL? Does anybody have any experience with such a large-scale migration?
PS: Please, don't suggest performance optimization techniques for Oracle, we already tried a lot :-)
Edit: We already tried some ETL tools before, only to find out, that they were not fast enough: Exporting only one table already took more than 4 hours ...
2nd Edit: Come on folks ... did nobody ever try to export a whole database as fast as possible and convert the data so that it can be imported into another database system?
Oracle does not supply an out-of-the-box unload utility.
Keep in mind without comprehensive info about your environment (oracle version? server platform? how much data? what datatypes?) everything here is YMMV and you would want to give it a go on your system for performance and timing.
My points 1-3 are just generic data movement ideas. Point 4 is a method that will reduce downtime or interruption to minutes or seconds.
1) There are 3rd party utilities available. I have used a few of these but best for you to check them out yourself for your intended purpose. A few 3rd party products are listed here: OraFaq . Unfortunately a lot of them run on Windows which would slow down the data unload process unless your DB server was on windows and you could run the load utility directly on the server.
2) If you don't have any complex datatypes like LOBs then you can roll your own with SQLPLUS. If you did a table at a time then you can easily parallelize it. Topic has been visited on this site probably more than once, here is an example: Linky
3) If you are 10g+ then External Tables might be a performant way to accomplish this task. If you create some blank external tables with the same structure as your current tables and copy the data to them, the data will be converted to the external table format (a text file). Once again, OraFAQ to the rescue.
4) If you must keep systems in parallel for days/weeks/months then use a change data capture/apply tool for near-zero downtime. Be prepared to pay $$$. I have used Golden Gate Software's tool that can mine the Oracle redo logs and supply insert/update statements to a MySQL Database. You can migrate the bulk of the data with no downtime the week before go-live. Then during your go-live period, shut down the source database, have Golden Gate catch up the last remaining transactions, then open up access to your new target database. I have used this for upgrades and the catch up period was only a few minutes. We already had a site licenses for Golden Gate so it wasn't anything out of pocket for us.
And I'll play the role of Cranky DBA here and say if you can't get Oracle performing well I would love to see a write up of how MySQL fixed your particular issues. If you have an application where you can't touch the SQL, there are still lots of possible ways to tune Oracle. /soapbox
I have built a C# application that can read an Oracle dump (.dmp) file and pump it's tables of data into a SQL Server database.
This application is used nightly on a production basis to migrate a PeopleSoft database to SQL Server. The PeopleSoft database has 1100+ database tables and the Oracle dump file is greater than 4.5GB in size.
This application creates the SQL Server database and tables and then loads all 4.5GB of data in less than 55 minutes running on a dual-core Intel server.
I don't believe it would be too difficult to modify this application to work with other databases provided they have an ADO.NET provider.
yeah, Oracle is pretty slow. :)
You can use any number of ETL tools to move data from Oracle into MySQL. My favourite is SQL Server Integration Services.
If you have Oracle9i or higher, you can implement Change Data Capture. Read more here http://download-east.oracle.com/docs/cd/B14117_01/server.101/b10736/cdc.htm
Then you can take a delta of changes from Oracle to your MySQL or Infobright using any ETL technologies.
We had the same issue. Needed to get tables and data from oracle dbms to mysql dbms.
We used this tool we found online... It worked well.
http://www.sqlines.com/download
This tool will basically help you:
Connect to your source DBMS(ORACLE)
Connect to destination DBMS(MySQL)
Specify schema and tables in the ORACLE DBMS you want to migrate
Press a "Transfer" button to Run the migration process(running inbuilt migration queries)
Get a transfer log, which will tell how many records were READ from SOURCE and WRITTEN on the destination database, what queries failed.
Hope this will help others that will land on this question.
I've used Pentaho Data Integration to migrate from Oracle to MySql (I also migrated the same data to Postresql, which was about 50% quicker, which I guess was largely due to the different JDBC drivers being used). I followed Roland Bouman's instructions here, almost to the letter, and was very pleasantly suprised at how easy it was:
Copy Table data from one DB to another
I don't know whether it will be appropriate for your data load, but it's worth a shot.
I recently released etlalchemy to accomplish this task. It is an open-sourced solution which allows migration between any 2 SQL databases with 4 lines of Python, and was initially designed to migrate from Oracle to MySQL. Support has been added for MySQL, PostgreSQL, Oracle, SQLite and SQL Server.
This will take care of migrating schema (arguably the most challenging), data, indexes and constraints, with many more options available.
To install:
$ pip install etlalchemy
On El Capitan: pip install --ignore-installed etlalchemy
To run:
from etlalchemy import ETLAlchemySource, ETLAlchemyTarget
orcl_db_source = ETLAlchemySource("oracle+cx_oracle://username:password#hostname/ORACLE_SID")
mysql_db_target = ETLAlchemyTarget("mysql://username:password#hostname/db_name", drop_database=True)
mysql_db_target.addSource(orcl_db_source)
mysql_db_target.migrate()
Concerning performance, this tool utilizes BULK import tools across various RDBMS such as mysqlimport and COPY FROM (postgresql) to carry out migrations efficiently. I was able to migrate a 5GB SQL Server database with 33,105,951 rows into MySQL in 40 minutes, and a 3GB 7,000,000 row Oracle database to MySQL in 13 minutes.
To get more background on the origins of the project, check out this post. If you get any errors running the tool, open an issue on the github repo and I'll patch it up in less than a week!
(To install the "cx_Oracle" Python driver, follow these instructions)
You can use Python, SQL*Plus and mysql.exe (MySQL client) script to copy whole table of just query results.
It will be portable because all those tools exist on Windows and Linux.
When I had to do it I implemented following steps using Python:
Extract data into CSV file using SQL*Plus.
Load dump file into MySQL
using mysql.exe.
You can improve performance by performing parallel load using Tables/Partitions/Sub-partitions.
Disclosure: Oracle-to-MySQL-Data-Migrator is the script I wrote for data integration between Oracle and MySQL on Windows OS.
We ran into serious performance problems with our Oracle database and we would like to try to migrate it to a MySQL-based database (either MySQL directly or, more preferably, Infobright).
The thing is, we need to let the old and the new system overlap for at least some weeks if not months, before we actually know, if all the features of the new database match our needs.
So, here is our situation:
The Oracle database consists of multiple tables with each millions of rows. During the day, there are literally thousands of statements, which we cannot stop for migration.
Every morning, new data is imported into the Oracle database, replacing some thousands of rows. Copying this process is not a problem, so we could, in theory, import in both databases in parallel.
But, and here the challenge lies, for this to work we need to have an export from the Oracle database with a consistent state from one day. (We cannot export some tables on Monday and some others on Tuesday, etc.) This means, that at least the export should be finished in less than one day.
Our first thought was to dump the schema, but I wasn't able to find a tool to import an Oracle dump file into MySQL. Exporting tables in CSV files might work, but I'm afraid it could take too long.
So my question now is:
What should I do? Is there any tool to import Oracle dump files into MySQL? Does anybody have any experience with such a large-scale migration?
PS: Please, don't suggest performance optimization techniques for Oracle, we already tried a lot :-)
Edit: We already tried some ETL tools before, only to find out, that they were not fast enough: Exporting only one table already took more than 4 hours ...
2nd Edit: Come on folks ... did nobody ever try to export a whole database as fast as possible and convert the data so that it can be imported into another database system?
Oracle does not supply an out-of-the-box unload utility.
Keep in mind without comprehensive info about your environment (oracle version? server platform? how much data? what datatypes?) everything here is YMMV and you would want to give it a go on your system for performance and timing.
My points 1-3 are just generic data movement ideas. Point 4 is a method that will reduce downtime or interruption to minutes or seconds.
1) There are 3rd party utilities available. I have used a few of these but best for you to check them out yourself for your intended purpose. A few 3rd party products are listed here: OraFaq . Unfortunately a lot of them run on Windows which would slow down the data unload process unless your DB server was on windows and you could run the load utility directly on the server.
2) If you don't have any complex datatypes like LOBs then you can roll your own with SQLPLUS. If you did a table at a time then you can easily parallelize it. Topic has been visited on this site probably more than once, here is an example: Linky
3) If you are 10g+ then External Tables might be a performant way to accomplish this task. If you create some blank external tables with the same structure as your current tables and copy the data to them, the data will be converted to the external table format (a text file). Once again, OraFAQ to the rescue.
4) If you must keep systems in parallel for days/weeks/months then use a change data capture/apply tool for near-zero downtime. Be prepared to pay $$$. I have used Golden Gate Software's tool that can mine the Oracle redo logs and supply insert/update statements to a MySQL Database. You can migrate the bulk of the data with no downtime the week before go-live. Then during your go-live period, shut down the source database, have Golden Gate catch up the last remaining transactions, then open up access to your new target database. I have used this for upgrades and the catch up period was only a few minutes. We already had a site licenses for Golden Gate so it wasn't anything out of pocket for us.
And I'll play the role of Cranky DBA here and say if you can't get Oracle performing well I would love to see a write up of how MySQL fixed your particular issues. If you have an application where you can't touch the SQL, there are still lots of possible ways to tune Oracle. /soapbox
I have built a C# application that can read an Oracle dump (.dmp) file and pump it's tables of data into a SQL Server database.
This application is used nightly on a production basis to migrate a PeopleSoft database to SQL Server. The PeopleSoft database has 1100+ database tables and the Oracle dump file is greater than 4.5GB in size.
This application creates the SQL Server database and tables and then loads all 4.5GB of data in less than 55 minutes running on a dual-core Intel server.
I don't believe it would be too difficult to modify this application to work with other databases provided they have an ADO.NET provider.
yeah, Oracle is pretty slow. :)
You can use any number of ETL tools to move data from Oracle into MySQL. My favourite is SQL Server Integration Services.
If you have Oracle9i or higher, you can implement Change Data Capture. Read more here http://download-east.oracle.com/docs/cd/B14117_01/server.101/b10736/cdc.htm
Then you can take a delta of changes from Oracle to your MySQL or Infobright using any ETL technologies.
We had the same issue. Needed to get tables and data from oracle dbms to mysql dbms.
We used this tool we found online... It worked well.
http://www.sqlines.com/download
This tool will basically help you:
Connect to your source DBMS(ORACLE)
Connect to destination DBMS(MySQL)
Specify schema and tables in the ORACLE DBMS you want to migrate
Press a "Transfer" button to Run the migration process(running inbuilt migration queries)
Get a transfer log, which will tell how many records were READ from SOURCE and WRITTEN on the destination database, what queries failed.
Hope this will help others that will land on this question.
I've used Pentaho Data Integration to migrate from Oracle to MySql (I also migrated the same data to Postresql, which was about 50% quicker, which I guess was largely due to the different JDBC drivers being used). I followed Roland Bouman's instructions here, almost to the letter, and was very pleasantly suprised at how easy it was:
Copy Table data from one DB to another
I don't know whether it will be appropriate for your data load, but it's worth a shot.
I recently released etlalchemy to accomplish this task. It is an open-sourced solution which allows migration between any 2 SQL databases with 4 lines of Python, and was initially designed to migrate from Oracle to MySQL. Support has been added for MySQL, PostgreSQL, Oracle, SQLite and SQL Server.
This will take care of migrating schema (arguably the most challenging), data, indexes and constraints, with many more options available.
To install:
$ pip install etlalchemy
On El Capitan: pip install --ignore-installed etlalchemy
To run:
from etlalchemy import ETLAlchemySource, ETLAlchemyTarget
orcl_db_source = ETLAlchemySource("oracle+cx_oracle://username:password#hostname/ORACLE_SID")
mysql_db_target = ETLAlchemyTarget("mysql://username:password#hostname/db_name", drop_database=True)
mysql_db_target.addSource(orcl_db_source)
mysql_db_target.migrate()
Concerning performance, this tool utilizes BULK import tools across various RDBMS such as mysqlimport and COPY FROM (postgresql) to carry out migrations efficiently. I was able to migrate a 5GB SQL Server database with 33,105,951 rows into MySQL in 40 minutes, and a 3GB 7,000,000 row Oracle database to MySQL in 13 minutes.
To get more background on the origins of the project, check out this post. If you get any errors running the tool, open an issue on the github repo and I'll patch it up in less than a week!
(To install the "cx_Oracle" Python driver, follow these instructions)
You can use Python, SQL*Plus and mysql.exe (MySQL client) script to copy whole table of just query results.
It will be portable because all those tools exist on Windows and Linux.
When I had to do it I implemented following steps using Python:
Extract data into CSV file using SQL*Plus.
Load dump file into MySQL
using mysql.exe.
You can improve performance by performing parallel load using Tables/Partitions/Sub-partitions.
Disclosure: Oracle-to-MySQL-Data-Migrator is the script I wrote for data integration between Oracle and MySQL on Windows OS.
What's the fastest way to export/import a mysql database using innodb tables?
I have a production database which I periodically need to download to my development machine to debug customer issues. The way we currently do this is to download our regular database backups, which are generated using "mysql -B dbname" and then gzipped. We then import them using "gunzip -c backup.gz | mysql -u root".
From what I can tell from reading "mysqldump --help", mysqldump runs wtih --opt by default, which looks like it turns on a bunch of the things that I can think of that would make imports faster, such as turning off indexes and importing tables as one massive import statement.
Are there better ways to do this, or further optimizations we should be doing?
Note: I mostly want to optimize the time it takes to load the database onto my development machine (a relatively recent macbook pro, with lots of ram). Backup time and network transfer time currently aren't big issues.
Update:
To answer some questions posed in the answers:
The production database schema changes up to a couple times a week. We're running rails, so it's relatively easy to run the migrate scripts on stale production data.
We need to put production data into a development environment potentially on a daily or hourly basis. This entirely depends on what a developer is working on. We often have specific customer issues that are the result of some data spread across a number of tables in the db, which needs to be debugged in a development environment.
I honestly don't know how long mysqldump takes. Less than 2 hours, since we currently run it every 2 hours. However, that's not what we're trying to optimize, we want to optimize the import onto the developer workstation.
We don't need the full production database, but it's not totally trivial to separate what we do and don't need (there are a lot of tables with foreign key relationships). This is probably where we'll have to go eventually, but we'd like to avoid it for a bit longer if we can.
It depends on how you define "fastest".
As Joel says, developer time is expensive. Mysqldump works and handles a lot of cases you'd otherwise have to handle yourself or spend time evaluating other products to see if they handle them.
The pertinent questions are:
How often does your production database schema change?
Note: I'm referring to adding, removing or renaming tables, columns, views and the like ie things that will break actual code.
How often do you need to put production data into a development environment?
In my experience, not very often at all. I've generally found that once a month is more than sufficient.
How long does mysqldump take?
If it's less than 8 hours it can be done overnight as a cron job. Problem solved.
Do you need all the data?
Another way to optimize this is to simply get a relevant subset of data. Of course this requires a custom script to be written to get a subset of entities and all relevant related entities but will yield the quickest end result. The script will also need to be maintained through schema changes so this is a time-consuming approach that should be used as an absolute last resort. Production samples should be large enough to include a sufficiently broad sample of data and identify any potential performance problems.
Conclusion
Basically, just use mysqldump until you absolutely can't. Spending time on another solution is time not spent developing.
Consider using replication. That would allow you to update your copy in real time, and MySQL replication allows for catching up even if you have to shut down the slave. You could also use a parallell MySQL instance on your normal server that replicates the data to a MyISAM table, which supports online backup. MySQL allows for this as long as the tables have the same definition.
Another option that might be worth looking into is XtraBackup from renowned MySQL performance specialists Percona. It's an online backup solution for InnoDB. Haven't looked at it myself, though, so I won't vouch for it's stability or that it's even a workable solution for your problem.