I'm looking for a possible solution for the following problem.
First, the situation I'm in:
I have two databases: one Oracle DB and one MySQL DB. Although they have a lot of similarities, they are not identical. A lot of tables are available in both the Oracle DB and the MySQL DB, but the Oracle tables are often more extensive and contain more columns.
The situation with the databases can't be changed, so I have to deal with that.
Now I'm looking for the following:
I want to synchronise data from Oracle to MySQL and vice versa. This has to be done in real time, or as close to real time as possible, so when changes are made in one DB they have to be synced to the other DB as quickly as possible.
Also, not every table has to be in sync, so the solution must offer a way of selecting which tables should be synced and which should not.
Because the databases are not identical, I don't think replication is an option. But what is?
I hope you guys can help me find a way of doing this, or a tool that does exactly what I need. Maybe you know some good papers/articles I can use?
Thanks!
Thanks for the comments.
I did some further research on ETL and EAI.
I found out that I am searching for an ETL tool.
I read your question and your answer. I have worked with Oracle, SQL, ETL and data warehouses, and here are my suggestions:
It is good to have a ready-made ETL tool. But if your application is big enough that you need a tailor-made ETL tool, I suggest a home-made ETL process.
If your transactional database is on Oracle, you can set up triggers on the key tables that in turn call an external procedure written in C, C++ or Java.
The reason for using an external procedure is to be able to communicate with both databases, Oracle and MySQL, at the same time.
You can read more about Oracle External Procedures here.
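A rough sketch of what that wiring could look like is below; the library path, procedure, table and column names are all invented for illustration, and in practice you would probably queue the change rather than call out synchronously from the trigger.

    -- Hypothetical sketch: register a C library as an Oracle external procedure
    -- and call it from a row-level trigger. All names and paths are placeholders.
    CREATE OR REPLACE LIBRARY sync_lib AS '/opt/app/lib/libsync.so';
    /

    CREATE OR REPLACE PROCEDURE notify_mysql (
      p_table_name IN VARCHAR2,
      p_row_id     IN NUMBER
    )
    AS LANGUAGE C
       LIBRARY sync_lib
       NAME "notify_mysql"
       PARAMETERS (p_table_name STRING, p_row_id INT);
    /

    CREATE OR REPLACE TRIGGER trg_customers_sync
    AFTER INSERT OR UPDATE OR DELETE ON customers
    FOR EACH ROW
    BEGIN
      -- :NEW is empty on DELETE, so fall back to :OLD
      notify_mysql('CUSTOMERS', NVL(:NEW.customer_id, :OLD.customer_id));
    END;
    /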
If not through ExtProc, you can develop a separate application in Java or .Net that would extract data from the first database, transform it according to your business rules and load it into your warehouse.
With either approach, you will have greater control over the ETL process if you implement your own tool rather than going for a ready-made one.
I am trying to practice writing simple SQL queries but I can't connect to my school account on Microsoft SQL Server Studio because they delete your database once you finish the class. I downloaded MySQL but I wasn't sure if I could practice queries on it or not. Any answers would be great, thanks!
The syntax and built-in functions are not identical, but they are similar for many things. For example, the TOP, LIMIT and ROWNUM clauses are one case where they diverge.
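For instance, fetching the first five rows of a (made-up) Customers table looks slightly different in each dialect:

    -- SQL Server
    SELECT TOP 5 * FROM Customers;

    -- MySQL (also PostgreSQL and SQLite)
    SELECT * FROM Customers LIMIT 5;

    -- Oracle (before 12c)
    SELECT * FROM Customers WHERE ROWNUM <= 5;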
Personally, I’d recommend taking a look at what w3schools.com has for SQL (and MySQL) resources. They have a ton of info to get you going: tutorials, references, etc. They also have “Try It Yourself” modules where you can practice queries against a playground database they provide, in whichever language you're reading material for. In cases where the SQL Server and MySQL (and other languages) syntax differs, w3schools shows examples of each (all that they support), like the example I mentioned at the top.
Snippet from the top of their SQL tutorial home page:
“Our SQL tutorial will teach you how to use SQL in: MySQL, SQL Server, MS Access, Oracle, Sybase, Informix, Postgres, and other database systems.”
I suggest SQLite. Why?
It is an embedded database, rather than a server-based system like MySQL that keeps its data spread across a directory of files; SQLite stores everything in a single file. My reasons for beginning with SQLite are as follows:
1) It is the default database for a lot of common applications (Django, Airflow, etc.). Knowing it comes in really handy when learning those tools.
2) Not only is the download much simpler, the tooling is much lighter and faster. Your complete database is also just a single file (very beginner friendly).
3) In-memory databases. That's right: you can spawn and throw away SQLite databases entirely in memory (or delete the whole on-disk database simply by removing the file). Very useful for learning, data science, and on-the-fly OLAP.
4) It can store up to 140 TB of data. It is the perfect tool for loading a CSV and quickly analyzing the data. Also, you can create a small database, compress the file, and send it to anyone! Sharing your whole database is really just sharing a file.
5) You can use SQLite from Python, C, C++ and more, and start automating your queries. You can do this with MySQL too, but there is more library downloading and reading to do. Don't use SQLite in production (it has multi-threading and concurrent-write limits), but it is great for ad hoc analysis (Jupyter notebooks), prototyping, and learning.
Overall, MySQL is not the best tool for beginners (it's not even used in production as much as Postgres or SQL Server). It abstracts too much away from the user to really understand what the database represents or how the query engine works. Also, SQLite is closer to standard ANSI SQL than MySQL in my opinion (given all of MySQL's syntactic sugar). Learn SQLite, move to Postgres, and then explore all the NoSQL, blockchain, etc. If you ever face MySQL, you'll pick it up in minutes. I guarantee you will have a much easier time picking up SQLite!
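As a rough illustration (the table and data here are made up), a first practice session in SQLite can be as small as this, and the same statements run unchanged in MySQL:

    -- Hypothetical practice schema; standard SQL that runs in SQLite and MySQL alike
    CREATE TABLE students (
      id   INTEGER PRIMARY KEY,
      name TEXT,
      gpa  REAL
    );

    INSERT INTO students (id, name, gpa)
    VALUES (1, 'Alice', 3.8), (2, 'Bob', 3.1);

    SELECT name, gpa
    FROM students
    WHERE gpa > 3.5
    ORDER BY gpa DESC;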
This is an odd (probably stupid) question, but is it possible to manipulate a Drizzle database through MySQL stored procedures? The reason I need to do this is that I'm migrating my MySQL database (which contains a lot of stored procedures) to Drizzle (which doesn't support stored procedures). This was essentially one of my 'grand' short-term ideas for dealing with that.
Any ideas for doing this, or some other ones?
There isn't an easy way to do this that I'm aware of (speaking as a core Drizzle developer, former MySQL developer, and currently Director of Server Development at Percona). You may be able to get somewhere with Continuent for replicating from MySQL to Drizzle, though.
While you may get some success with the FEDERATED or FEDERATEDX engines... I wouldn't bet that your access patterns make that remotely efficient.
The best bet is to move away from stored procedures - they're a maintenance and scalability nightmare anyway (schema versioning is hard, having application code in schema just makes it harder).
I'm fishing for knowledge on this one, as I know a way I could get what I'm after but I'm wondering if it's the best approach. Due to project creep, the database of what was once a very simple application became an overcomplicated monster, and while it worked, administration wasn't exactly easy.
I've now got the opportunity to essentially start again, though only piece by piece, gradually migrating functions from the old application to the new one. This requires that I keep the new database synchronised with the old one, which thankfully only needs to be one-way, as no data will be created in the new database that needs to be migrated back.
The options considered so far are SSIS and a Quartz-triggered Windows service using plain old C# ADO.NET. I've decided SSIS is probably a bad idea, as it can be a nightmare with upserts (requiring temporary tables to be created followed by a merge), and the schema differences are extensive enough that the SSIS logic would be a headache. The ADO.NET approach is the direction I'm leaning towards, as data readers, bulk inserts and LINQ should do the job nicely. However, considering how many people this must have affected before me, I'm thinking there must be a better way. What approach do you guys use?
To get a bit more specific the details are:
SQL Server 2008 R2
~2 million rows over roughly 30 tables -> ~20 tables
Databases will each be on completely separate database servers with different credentials
Many thanks
Let's suppose this scenario:
S_old: your 'old' server
S_new: your 'new' server
db_old: your 'old' database
db_new: your 'new' database
A solution may be a nightly backup of db_old, restored from S_old to S_new as db_old_bk. Then, on S_new you have access to both databases (db_old_bk and db_new). At that point you can do the upserts easily with the T-SQL MERGE statement (we are talking about 20 tables). Is this what you are looking for?
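Something along these lines, assuming the restored copy is attached as db_old_bk on the same server; the table and column names are invented for the example.

    -- Upsert one of the ~20 tables from the restored copy into the new schema.
    -- All object names here are placeholders.
    MERGE db_new.dbo.Customers AS target
    USING db_old_bk.dbo.Customers AS source
        ON target.CustomerId = source.CustomerId
    WHEN MATCHED THEN
        UPDATE SET target.Name  = source.Name,
                   target.Email = source.Email
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerId, Name, Email)
        VALUES (source.CustomerId, source.Name, source.Email);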
I'm not sure if this fits Stack Overflow exactly; however, as I'm looking for some code rather than a tool, I think it does.
I'm looking for a way to replicate / synchronize different database systems -- in this case MySQL and MongoDB. We run both for different purposes. We started with a MySQL database and added MongoDB later on for special applications. There is data we would like to have in both databases, where we want constraints in MySQL and, correspondingly, DBRefs in MongoDB. For example: we need a user record in MySQL, but also in MongoDB, for references between tables and between objects respectively. At the moment we have a cron job which dumps the MySQL data and imports it into MongoDB. Although that works quite well, it's not the solution we would like to have.
I think for the moment one-way replication (MySQL -> MongoDB) would be enough; the important part is that the replication works in "real time", much like MySQL master -> slave replication does.
Are there already any solutions for this problem or ideas anyone of how to achieve this?
Thanks!
SymmetricDS is open-source, Java-based, web-enabled, database-independent data synchronization/replication software that might do the trick with a few tweaks. It has an extension point called IDataLoaderFilter which you could use to implement a MongodbDataLoader.
This would help with one-way database replication. It might be a little more difficult to synchronize from MongoDB -> relational database, but the SymmetricDS team would be very helpful in trying to find a solution.
What you're looking for is called EAI (Enterprise Application Integration). There are a lot of commercial tools around, but under the provided link you'll also find a couple of OSS solutions. The basis of EAI is that you have data sources and data sinks, and the EAI framework offers tools to build custom pumps between the two.
I suggest either using a DB trigger to start the synchronization or sending a trigger signal from your applications. Note that there is no turnkey solution, since synchronization can become arbitrarily complex (for example, how do you make sure that all rows are copied?).
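A minimal sketch of the trigger idea on the MySQL side is below (table and column names are invented): the trigger only records that something changed, and the EAI/sync process polls the change table and pushes the affected rows into MongoDB.

    -- Hypothetical change-log ("outbox") table polled by the sync process
    CREATE TABLE user_changes (
      change_id   BIGINT AUTO_INCREMENT PRIMARY KEY,
      user_id     INT NOT NULL,
      change_type ENUM('INSERT','UPDATE','DELETE') NOT NULL,
      changed_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );

    -- Record every update to the users table; similar triggers would be
    -- needed for INSERT and DELETE.
    CREATE TRIGGER trg_users_after_update
    AFTER UPDATE ON users
    FOR EACH ROW
      INSERT INTO user_changes (user_id, change_type)
      VALUES (NEW.id, 'UPDATE');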
As far as I can see, you need to develop some sort of "control program" that has the drivers for each DBMS, and run it as a daemon. The daemon should be fired by a trigger, or use a very small recheck interval, to keep the DBs synchronized.
Technically, you could set up a process which parses the binary log of the MySQL server and replicates the relevant SQL queries. I've never done such a thing with a different kind of database as the slave, but maybe it is worth a shot?
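If you want to get a feel for what the binary log contains before committing to that route, you can inspect it from a MySQL client (binary logging must be enabled, and the log file name below is just a placeholder):

    SHOW BINARY LOGS;
    SHOW BINLOG EVENTS IN 'mysql-bin.000001' LIMIT 10;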
I have been interested in database development for some time now and have decided that MS SQL has a lot to offer in terms of T-SQL and generally much more functionality (not saying that Oracle or Postgres don't have that).
I would like to know:
1) What are the big paradigm changes I should expect to see?
2) How much effort do "regular" companies put into developing their database (for transactions, triggers, events, data cleansing, ETL)?
3) What can I expect from the inner workings of MS SQL developer teams and how they interact with the .NET application developers?
I hope I have phrased my question correctly. I am not very clued-up about the whole .NET scene.
I can't answer #1 as I've never worked with MySQL, but I'll take a shot at #2 and #3.
This tends to depend on the size of the database and/or the size (or professionalism) of the company. Companies with large databases and many users spend a great deal of time indeed making sure that the database both has integrity and is performance-tuned; they would lose customers if they did not. We have six people who do nothing but ETL work, five DBAs who tune and manage the databases and database servers, and many, many developers who write T-SQL code.
As far as #3, in good companies these people work together very well as a team. In bad companies there is often tension between the two groups, and each uses the other as a scapegoat for whatever problems occur. I work with a bunch of great .NET developers. They respect my database expertise as I respect their .NET expertise, and we consult each other on design issues, tuning issues, and in general any issue that needs input from both sides.
http://forums.mysql.com/read.php?60,124480,124480 details using linked servers from SQL Server to MySQL to do the actual data migration.
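As a rough idea of what that looks like once the linked server is configured (the linked-server, table and column names below are all placeholders), you can push data through OPENQUERY from the SQL Server side:

    -- Assumes a linked server named MYSQL_LINK has already been set up
    -- (e.g. via the MySQL ODBC driver); names are invented for illustration.
    INSERT INTO OPENQUERY(MYSQL_LINK, 'SELECT customer_id, name FROM customers')
    SELECT CustomerId, Name
    FROM dbo.Customers;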
Apache DDLUtils should be able to help. You can reverse-engineer the schema into a common DDL and also export the data to a flat file, then import it afterward.