I have a client with close to 120,000,000 records in an Oracle database. Their engineer claims they can only give us an MS Access dump of their database. The data will actually be going into a MySQL relational database instance.
What potential issues and problems can we expect moving from Oracle > Access > MySQL?
We have located tools that can convert an Oracle database to MySQL, but given the size of the database (100 GB+) I am not sure these software-based solutions are stable enough to handle the conversion. This is a time-sensitive project, and I am worried that if we make any mistakes at the onset we may not be able to complete it in a timely manner.
Exporting the Oracle data to a comma-separated, tab-separated, or pipe-separated set of files would not be very challenging. It's done all the time (a rough sketch follows below).
I have no idea why someone would claim to only be able to produce an MS Access dump from an Oracle database -- if it isn't being done directly by selecting from Oracle into Access through ODBC, then it's done via an intermediate flat file anyway. I'm inclined to call "BS" or "incompetence" on this claim.
The maximum size of an Access database is 2 GB, so I don't see how the proposed migration could be achieved without partitioning the data.
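As a rough illustration of that delimited export, here is a minimal sketch in Python using cx_Oracle and the csv module (the connection details, table name and delimiter are placeholders; for 100 GB+ a SQL*Plus spool script or a dedicated unloader run on the database server would likely be faster):

import csv
import cx_Oracle  # assumes the cx_Oracle driver is installed

# Placeholder connection details -- substitute your own.
conn = cx_Oracle.connect("scott", "tiger", "dbhost/ORCL")
cur = conn.cursor()
cur.arraysize = 10000  # fetch in large batches to cut round trips

with open("orders.csv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="|")  # pipe-separated output
    cur.execute("SELECT * FROM orders")
    writer.writerow([col[0] for col in cur.description])  # header row
    while True:
        rows = cur.fetchmany()
        if not rows:
            break
        writer.writerows(rows)

cur.close()
conn.close()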
Related
We ran into serious performance problems with our Oracle database and we would like to try to migrate it to a MySQL-based database (either MySQL directly or, preferably, Infobright).
The thing is, we need to let the old and the new system overlap for at least some weeks, if not months, before we actually know whether all the features of the new database match our needs.
So, here is our situation:
The Oracle database consists of multiple tables, each with millions of rows. During the day there are literally thousands of statements, which we cannot stop for the migration.
Every morning, new data is imported into the Oracle database, replacing a few thousand rows. Duplicating this process is not a problem, so we could, in theory, import into both databases in parallel.
But, and here lies the challenge, for this to work we need an export of the Oracle database in a consistent state from a single day. (We cannot export some tables on Monday and others on Tuesday, etc.) This means that at least the export should finish in less than one day.
Our first thought was to dump the schema, but I wasn't able to find a tool to import an Oracle dump file into MySQL. Exporting tables to CSV files might work, but I'm afraid it could take too long.
So my question now is:
What should I do? Is there any tool to import Oracle dump files into MySQL? Does anybody have any experience with such a large-scale migration?
PS: Please, don't suggest performance optimization techniques for Oracle, we already tried a lot :-)
Edit: We already tried some ETL tools, only to find out that they were not fast enough: exporting just one table already took more than 4 hours ...
2nd Edit: Come on folks ... did nobody ever try to export a whole database as fast as possible and convert the data so that it can be imported into another database system?
Oracle does not supply an out-of-the-box unload utility.
Keep in mind that without comprehensive info about your environment (Oracle version? server platform? how much data? what datatypes?), everything here is YMMV, and you would want to try it on your own system for performance and timing.
My points 1-3 are just generic data movement ideas. Point 4 is a method that will reduce downtime or interruption to minutes or seconds.
1) There are 3rd-party utilities available. I have used a few of these, but it is best for you to check them out yourself for your intended purpose. A few 3rd-party products are listed here: OraFAQ. Unfortunately a lot of them run on Windows, which would slow down the data unload process unless your DB server was on Windows and you could run the load utility directly on the server.
2) If you don't have any complex datatypes like LOBs, then you can roll your own with SQL*Plus. If you do it a table at a time, you can easily parallelize it. The topic has been visited on this site probably more than once; here is an example: Linky. (A rough sketch of this approach follows this list.)
3) If you are on 10g+, then external tables might be a performant way to accomplish this task. If you create some blank external tables with the same structure as your current tables and copy the data into them, the data will be written out in the external-table file format (with the ORACLE_DATAPUMP driver this is a Data Pump file rather than plain text). Once again, OraFAQ to the rescue. (A second sketch follows the list.)
4) If you must keep the systems in parallel for days/weeks/months, then use a change data capture/apply tool for near-zero downtime. Be prepared to pay $$$. I have used Golden Gate Software's tool that can mine the Oracle redo logs and supply insert/update statements to a MySQL database. You can migrate the bulk of the data with no downtime the week before go-live. Then during your go-live period, shut down the source database, have Golden Gate catch up the last remaining transactions, then open up access to your new target database. I have used this for upgrades and the catch-up period was only a few minutes. We already had a site license for Golden Gate, so it wasn't anything out of pocket for us.
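For point 2, here is a minimal sketch of a roll-your-own unload, driven from Python so that several SQL*Plus sessions spool tables in parallel (the credentials, table names and the concatenated column list are placeholders you would generate from the data dictionary):

import subprocess
from concurrent.futures import ThreadPoolExecutor

# Placeholder credentials and table list -- substitute your own.
CONNECT = "scott/tiger@dbhost/ORCL"
TABLES = ["ORDERS", "CUSTOMERS", "LINE_ITEMS"]

# One spool script per table; the SELECT concatenates columns into one delimited string.
SCRIPT = """SET TERMOUT OFF
SET FEEDBACK OFF
SET PAGESIZE 0
SET HEADING OFF
SET LINESIZE 32767
SET TRIMSPOOL ON
SPOOL {table}.csv
SELECT col1 || ',' || col2 || ',' || col3 FROM {table};
SPOOL OFF
EXIT
"""

def unload(table):
    # -S runs sqlplus silently; the spool script is fed on stdin
    subprocess.run(["sqlplus", "-S", CONNECT],
                   input=SCRIPT.format(table=table),
                   text=True, check=True)

# Run a handful of unloads in parallel, one sqlplus session per table
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(unload, TABLES))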
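And for point 3, the external-table unload boils down to a single CREATE TABLE ... ORGANIZATION EXTERNAL ... AS SELECT. A minimal sketch (the directory object, file and table names are placeholders, and remember the ORACLE_DATAPUMP output is a Data Pump file, not a flat text file):

import cx_Oracle  # assumes the cx_Oracle driver is installed

conn = cx_Oracle.connect("scott", "tiger", "dbhost/ORCL")
cur = conn.cursor()

# One-off DBA step (not shown): CREATE DIRECTORY unload_dir AS '/u01/unload';
# Unload ORDERS into an external table backed by a Data Pump file.
cur.execute("""
    CREATE TABLE orders_unload
    ORGANIZATION EXTERNAL (
        TYPE ORACLE_DATAPUMP
        DEFAULT DIRECTORY unload_dir
        LOCATION ('orders_unload.dmp')
    )
    AS SELECT * FROM orders""")

conn.close()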
And I'll play the role of Cranky DBA here and say that if you can't get Oracle performing well, I would love to see a write-up of how MySQL fixed your particular issues. If you have an application where you can't touch the SQL, there are still lots of possible ways to tune Oracle. /soapbox
I have built a C# application that can read an Oracle dump (.dmp) file and pump its tables of data into a SQL Server database.
This application is used nightly on a production basis to migrate a PeopleSoft database to SQL Server. The PeopleSoft database has 1100+ database tables and the Oracle dump file is greater than 4.5GB in size.
This application creates the SQL Server database and tables and then loads all 4.5GB of data in less than 55 minutes running on a dual-core Intel server.
I don't believe it would be too difficult to modify this application to work with other databases provided they have an ADO.NET provider.
yeah, Oracle is pretty slow. :)
You can use any number of ETL tools to move data from Oracle into MySQL. My favourite is SQL Server Integration Services.
If you have Oracle 9i or higher, you can implement Change Data Capture. Read more here: http://download-east.oracle.com/docs/cd/B14117_01/server.101/b10736/cdc.htm
Then you can move the delta of changes from Oracle into MySQL or Infobright using any ETL technology.
We had the same issue: we needed to get tables and data from an Oracle DBMS into a MySQL DBMS.
We used a tool we found online, and it worked well:
http://www.sqlines.com/download
This tool will basically help you:
Connect to your source DBMS (Oracle)
Connect to the destination DBMS (MySQL)
Specify the schema and tables in the Oracle DBMS you want to migrate
Press a "Transfer" button to run the migration process (running built-in migration queries)
Get a transfer log, which tells you how many records were read from the source and written to the destination database, and which queries failed.
Hope this helps others who land on this question.
I've used Pentaho Data Integration to migrate from Oracle to MySQL (I also migrated the same data to PostgreSQL, which was about 50% quicker, which I guess was largely due to the different JDBC drivers being used). I followed Roland Bouman's instructions here, almost to the letter, and was very pleasantly surprised at how easy it was:
Copy Table data from one DB to another
I don't know whether it will be appropriate for your data load, but it's worth a shot.
I recently released etlalchemy to accomplish this task. It is an open-source solution that allows migration between any two SQL databases with 4 lines of Python, and was initially designed to migrate from Oracle to MySQL. Support has been added for MySQL, PostgreSQL, Oracle, SQLite and SQL Server.
This will take care of migrating schema (arguably the most challenging), data, indexes and constraints, with many more options available.
To install:
$ pip install etlalchemy
On El Capitan: pip install --ignore-installed etlalchemy
To run:
from etlalchemy import ETLAlchemySource, ETLAlchemyTarget

# Source: Oracle (via the cx_Oracle SQLAlchemy dialect)
orcl_db_source = ETLAlchemySource("oracle+cx_oracle://username:password@hostname/ORACLE_SID")
# Target: MySQL
mysql_db_target = ETLAlchemyTarget("mysql://username:password@hostname/db_name", drop_database=True)
mysql_db_target.addSource(orcl_db_source)
mysql_db_target.migrate()
Concerning performance, this tool utilizes bulk import tools across various RDBMSs, such as mysqlimport and COPY FROM (PostgreSQL), to carry out migrations efficiently. I was able to migrate a 5GB SQL Server database with 33,105,951 rows into MySQL in 40 minutes, and a 3GB, 7,000,000-row Oracle database to MySQL in 13 minutes.
To get more background on the origins of the project, check out this post. If you get any errors running the tool, open an issue on the GitHub repo and I'll patch it up in less than a week!
(To install the "cx_Oracle" Python driver, follow these instructions)
You can use a script combining Python, SQL*Plus and mysql.exe (the MySQL client) to copy a whole table or just query results.
It will be portable because all those tools exist on Windows and Linux.
When I had to do it, I implemented the following steps using Python:
Extract the data into a CSV file using SQL*Plus.
Load the dump file into MySQL using mysql.exe.
You can improve performance by running the loads in parallel across tables/partitions/sub-partitions.
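A minimal sketch of the load step (the file, table, credentials and delimiters are placeholders; local_infile may need to be enabled on both the MySQL client and the server):

import subprocess

LOAD_SQL = """
LOAD DATA LOCAL INFILE 'orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\\n';
"""

# mysql.exe on Windows, mysql on Linux -- the flags are the same
subprocess.run(
    ["mysql", "--local-infile=1", "-h", "mysqlhost", "-u", "app_user",
     "-papp_password", "-e", LOAD_SQL, "target_db"],
    check=True,
)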
Disclosure: Oracle-to-MySQL-Data-Migrator is the script I wrote for data integration between Oracle and MySQL on Windows OS.
I'm planning to create a VB.NET application that retrieves data from a database (MS Access) and stores it on a web server (MySQL database). I'm really confused about this. I'm planning to use Task Scheduler so that the program runs automatically, every 5 minutes.
How can I avoid the redundancy of data?
For example, I'm planning to get the sales for the last 5 minutes; after 5 minutes I will do it again. I think there will be redundancy in that case. I would like to ask your ideas about this scenario: how would you handle it?
If at all possible you should avoid using two databases in a situation like this.
Look for information on the linked table manager -- the data that Access uses doesn't have to be stored in Access.
http://www.mssqltips.com/sqlservertip/1480/configure-microsoft-access-linked-tables-with-a-sql-server-database/
If you have to do this, then see about using/upgrading to Access 2010 and use data macros (triggers) to put the new/changed data into temp tables that you clear out once you've copied the data over.
In a comment you said "i dont have any idea about how to replace the native tables with ODBC".
Is that the only obstacle which prevents you from consolidating the data into one set in MySQL? If so, try this suggestion for setting up ODBC links to MySQL tables.
Install an ODBC driver for MySQL, if you don't have one already. The latest version is available here: Download Connector/ODBC
Create a DSN (Data Source Name) for your MySQL database from the Windows ODBC Data Source Administrator.
Create a new Access database and use the DSN to create links with guidance from the web page link @jmoreno provided.
If the Access names of the linked tables are different than the names you originally used for the native Access tables, change them to match those original names.
Then you can import your forms, queries, reports, etc. from the old Access application. Ideally everything will just work, since Access will find the table names it needs and won't care that they are external rather than native tables. However, you may need to resolve any data type incompatibilities between Access and MySQL.
You would need the MySQL ODBC driver on each machine where the Access application is used. Personally I would prefer to deal with that rather than the challenges of synchronizing between separate Access and MySQL data stores. (YMMV)
When you're ready to deploy, you can convert the ODBC links to DSN-less connections so the client machines wouldn't need to each have the DSN configured. See Using DSN-Less Connections by Doug Steele, Access MVP, for detailed instructions.
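For reference, a DSN-less connection string to MySQL looks roughly like this (shown here with Python's pyodbc purely to illustrate the string; the exact driver name depends on the Connector/ODBC version installed, so check the ODBC Data Source Administrator on your machine):

import pyodbc

# Placeholder server, database and credentials -- substitute your own.
conn = pyodbc.connect(
    "Driver={MySQL ODBC 8.0 Unicode Driver};"
    "Server=mysqlhost;Port=3306;Database=warehouse;"
    "User=app_user;Password=app_password;")
print(conn.cursor().execute("SELECT 1").fetchone())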
You will need to think very carefully about how you identify the data which has changed since the last synchronization cycle. If every row of data has a 'last updated' timestamp (that is indexed) then you could write a process that selected the recently updated rows from each table in turn. That's apt to be a bit heavy on the originating database (MS Access), plus you still have to identify the corresponding row to replace (where replacement is required) in the MySQL database. Of course, you can put different tables on different change schedules. For example, the table of US states probably doesn't change once a year, but your customer orders tables (or SO questions and answers tables) may change a lot in five minutes.
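To make the timestamp idea concrete, here is a minimal sketch (in Python rather than VB.NET, purely to show the logic; the sales table, its columns and the connection details are placeholders): select the rows changed since the last run and upsert them, keyed on the primary key.

import pyodbc             # reads the Access database via ODBC
import mysql.connector    # writes to MySQL

# Placeholder connection details -- substitute your own.
access = pyodbc.connect(
    r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\data\sales.accdb")
mysql_conn = mysql.connector.connect(
    host="mysqlhost", user="app_user", password="app_password", database="warehouse")

def sync_sales(last_sync):
    """Copy rows changed since last_sync; the upsert keeps re-runs from duplicating data."""
    src = access.cursor()
    src.execute("SELECT id, amount, last_updated FROM sales WHERE last_updated > ?",
                last_sync)
    rows = src.fetchall()
    if not rows:
        return last_sync

    dst = mysql_conn.cursor()
    dst.executemany(
        "INSERT INTO sales (id, amount, last_updated) VALUES (%s, %s, %s) "
        "ON DUPLICATE KEY UPDATE amount = VALUES(amount), last_updated = VALUES(last_updated)",
        [tuple(r) for r in rows])
    mysql_conn.commit()
    return max(r.last_updated for r in rows)  # new high-water mark for the next run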
Some DBMS have alternative mechanisms, especially for working between copies of themselves. Some DBMS also provide a mechanism that is sometimes called 'changed data capture' (CDC) that allows you to get the changed data. Sometimes, in DBMS where you have a 'transaction log' or 'logical log' (but not CDC or something similar), you can 'mine' the log files (or log backups) to find the changes. However, the logs are typically optimized for the DBMS internal recovery processes, not for your use.
Well, obviously you will have to keep track of the data items you have already processed (maybe in a separate metadata space/datastore) to avoid the redundancy. The metadata should be used to filter out records that have already been processed from the source. The logic, and what needs to be in the metadata, depends on the exact use case.
I have a production db running on Oracle 10g. I want to set up a data warehouse using a MySQL 5.5 database and ideally would like to use CDC to identify any changes to the live DB and populate those changes to the warehouse.
Has anyone done this?
Is it possible without the use of a third-party ETL tool? If not, can anyone recommend any software for the job?
You can also use Oracle Database Gateway for ODBC.
It all depends on how many tables you want to replicate and the amount of data changed daily.
You may end up writing a lot of triggers, but that might slow down your database.
If you have creation and last-modification fields you may use them as well.
Plus, you can copy modified data only on a schedule, during off-peak hours.
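Building on the trigger and last-modified ideas above, here is a minimal sketch of a change-log table plus trigger on the Oracle side (the orders table and its key column are placeholders); a scheduled job can then copy only the logged keys over to MySQL during off-peak hours:

import cx_Oracle  # assumes the cx_Oracle driver is installed

conn = cx_Oracle.connect("scott", "tiger", "dbhost/ORCL")
cur = conn.cursor()

# One-off setup: a small log table plus a trigger that records changed keys.
cur.execute("""
    CREATE TABLE orders_changelog (
        order_id   NUMBER,
        changed_at TIMESTAMP DEFAULT SYSTIMESTAMP
    )""")
cur.execute("""
    CREATE OR REPLACE TRIGGER orders_changelog_trg
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH ROW
    BEGIN
        INSERT INTO orders_changelog (order_id)
        VALUES (COALESCE(:NEW.order_id, :OLD.order_id));
    END;""")

conn.close()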
I work for a very small company. I was recently faced with the question of whether there is a good way to convert a proprietary database to a MySQL database without owning the proprietary database engine, e.g. if one is given a large Oracle database file (or choose your favorite proprietary database engine format) but doesn't have a license for the Oracle database engine, is there a good, perfectly reliable way to convert it to a MySQL database format that can be read with the MySQL database engine? My question is vague as to which proprietary format is the source, simply because there would be multiple sources and it looks like they would be "various and sundry". My suspicion is that there is no perfectly reliable way, especially for a wide variety of proprietary databases. If there are a few proprietary formats for which this is possible, I would still be interested in knowing, though "various and sundry" is probably the real issue. Minimizing cost and effort while ensuring a correct conversion is key, so I think this is probably on the "not possible" list.
-John
Most commercial DBs have a trial or limited download version, which should at least be enough to export the data and schema.
However you do it, it's probably safer to read the schema, create the structure in MySQL, then export each table as, say, CSV and re-import it into MySQL, rather than rely on a direct conversion tool.
PS: Of course, if you have a lot of stored procedures or custom Oracle-specific SQL, then it's going to hurt a lot more!
Most databases (and vendors) support some sort of SQL DDL/DML export capability. They may not advertise it loudly, but it's there. MySQL and PostgreSQL both have this sort of capability. Microsoft has the SQL Server Database Publishing Wizard. Oracle has this capability as well. You might be able to convince your data source to export the data, instead of merely dumping it.
All of these tools have limitations, particularly when it comes to BLOBs and similar data types. Exporting the data typically takes longer than merely dumping it, and the resulting files may be significantly larger. The advantage is that the resulting SQL scripts are amenable to being edited/converted/tweaked to match your target database's preferred SQL syntax.
For Oracle, you can look into Personal Edition (only available for Windows). With a short-term license (e.g. one year) it is a low-cost (from about $100) way for an individual to get the functionality to process large Oracle databases with the full feature set.
It is licensed on a single-user basis, and the definition of a user is quite wide. As such, you are not allowed, for example, to distribute reports from a Personal Edition database to a bunch of people (each would be counted as a user). A one-off migration from Oracle to another platform should be okay. A repeated, regular extraction would be more likely to be seen as being part of a process of supplying information to multiple users.