I'm not sure, if it fits exactly stackoverflow, however as i'm seeking for some code rather than a tool, i think it does.
I'm looking for a way of how to replicate / synchronize different database systems -- in this case: mysql and mongodb. We are running both for different purpose. We started with a mysql database and added mongodb later on for special applications. There's data we would like to have in both databases, where we want to have constraints in mysql respectivly dbrefs in mongodb. For example: We need a user-record in mysql, but also in mongodb for references between tables respectivly objects. At the moment we have a cronjob, which dumps the mysql data and imports it in mongodb. However though it works quite well, that's not the solution we would like to have.
I think for the moment a one-way replication would be enough -- mysql->mongodb, the important part is, that the replication works in "realtime", much like a mysql master->slave replication works.
Are there already any solutions for this problem or ideas anyone of how to achieve this?
Thanks!
SymmetricDS is open source, Java-based, web-enabled, database independent, data synchronization/replication software that might do the trick with a few tweaks. It has an extension point called IDataLoaderFilter which you could use to implement a MongodbDataLoader.
This would help with one way database replication. It might be a little more difficult to synchronized from MongoDb -> relational database, but the SymmetricDS team would be very helpful in trying to find the solution.
What you're looking for is called EAI (Enterprise application integration). There are a lot of commercial tools around but under the provided link, you'll also find a couple OSS solutions. The basis of EAI is that you have data sources and data sinks. The EAI framework offers tools to build custom pumps between the two.
I suggest to either use a DB trigger to start the synchronization or send a trigger signal in your applications. Note that there is no key-hole solution since synchronization can become arbitrarily complex (for example, how do you make sure that all rows are copied?).
As far as I see you need to develop some sort of "Control program" that has the drivers for each DBMS and run it as a daemon. The daemon should have a trigger or a very small recheck interval to keep the DBs synchronized
Technically, you could set up a process which parses the binary log of the MySQL server and replicate the relevant sql queries. I've never done such a thing with a a different database as a slave, but maybe it is worth a shot?
Related
I need to move a huge system from MySQL to PostgreSQL. This cannot be done in one go, which is why I need a robust (real time or near real time) data bi-directional synchronisation solution between MySQL and PostgreSQL. SymmetricDS looks like a tool that could solve my problem. However...
Would SymmetricDS be capable of this? The documentation is extensive and it doesn't clearly state that it would work in this particular situation. I'd like to know that this is at least possible, before spending a few weeks and hitting a dead end.
SymmetricDS is capable of this.
I've configured a bi-directional sync between MySQL and PostgreSQL. It shouldn't take a couple of weeks to setup a test. Start off by syncing a single table without dependencies.
For a one time import export it is also possible to use the SymmetricDS DbImport DbExport tools.
I'm looking for a possible solution for the following problem.
First the situation I'm at:
I've 2 databases, 1 Oracle DB and 1 MySQL DB. Although they have a lot of similarities they are not identical. A lot of tables are available on both the Oracle DB and the MySQL DB but the Oracle tables are often more extensive and contain more columns.
The situation with the databases can't be changed, so I've to deal with that.
Now I'm looking for the following:
I want to synchronise data from Oracle to MySQL and vice versa. This has to be done real time or as close to real time as possible. So when changes are made at one DB they have to be synced to the other DB as quickly as possible.
Also not every table has to be in sync, so the solution must offer a way of selecting which tables have to be synced and which not.
Because the databases are not identical replication isn't an option I think. But what is?
I hope you guys can help me with finding a way of doing this or a tool which does exactly what I need. Maybe you know some good papers/articles I can use?
Thanks!
Thanks for the comments.
I did some further research on ETL and EAI.
I found out that I am searching for an ETL tool.
I read your question and your answer. I have worked on both Oracle, SQL, ETL and data warehouses and here are my suggestions:
It is good to have a readymade ETL tool. But, if your application is big enough to make you need a tailor made ETL tool, I suggest you for a home-made ETL process.
If your transactional database is on Oracle, you can have triggers set up on the key tables that would further trigger an external procedure written in C, C++ or Java.
The reason behind using an external procedure is to be able to communicate with both databases at a time - Oracle and MySQL.
You can read more about Oracle External Procedures here.
If not through ExtProc, you can develop a separate application in Java or .Net that would extract data from the first database, transform it according to your business rules and load it into your warehouse.
In either approaches that you choose, you will have greater control on the ETL process if you implement your own tool, rather than going for a readymade tool.
The only good reference that I can find on the internet is this whitepaper, which explains what database tiering is, but not how it works:
The concept behind database tiering is
the seamless co-existence of multiple
(legacy and new) database technologies
to best solve a business problem.
But, how does it implemented? How does it work?
Any links regarding this would also be helpful. Thanks.
I think the idea of that document is you to put "cheap" databases in front of the "expensive" databases to reduce costs.
For example. Let's assume you have an "expensive" db...something like Oracle, or DB2 or even MSSQL (more realistically it's probably more of an issue with a legacy DB system that is not supported much or you need specialized resources to maintain). A database engine that costs a lot to purchase and maintain (arguably these are not expensive when you take all factors into consideration. But let's use them for the example).
Now if you suddenly get famous and your server starts to get overloaded what do you do? Do you buy a bigger server and migrate all your data to that new server? That could be incredibly expensive.
With the tiering solution you put several "cheap" databases in front of you "expensive" database to take the brunt of the work. So your web servers (or app servers) talk to a bunch of MySQL servers, for example, instead of directly to the your expensive server. Then these MySQL servers handle the majority of the calls. For example, they could handle all read-only calls completely on their own and only need to pass write-calls back to the main database server. These MySQL servers are then kept in sync via standard replication practices.
Using methods like this you could in theory scale out your expensive server to dozens, if not hundreds, of "cheap" database servers and handle a much higher load.
Database tiering is just a specific style of tiering. There are also application tiering and service tiering. It's a form of scalability.
What exactly are you asking? This question is rather vague.
This is a PDF from a course at Ohio State. What it discusses is a bit over my head, but hopefully you might understand it better.
I am developing a Rails application that will access a lot of RSS feeds or crawl sites for data (mostly news). It will be something like Google News but with a different approach, so I'll store a lot of news (or news summaries), classify them in different categories and use ranking and recommendation techniques.
Should I go with MySQL?
Is it worthwhile using IBM DB2
purexml to store the doucuments?
Also Ruby search implementations
(Ferret, Ultrasphinx and others) are
not needed If I choose DB2. Is that correct?
What are the advantages of
PostreSQL in this?
Does it makes sense to use Couch DB in
this scenario?
I'd like to choose the best option but without over-complicating the solution. So I discarded the idea to use two different storage solutions (one for the news documents and other for the rest of the data). I'm also considering only "free" options, so I didn't look at Oracle or MS SQL Server.
purexml is heavier than SQL, so you pay more for your roundtrip between webserver and DB. If you plan to have lots of users, I'd avoid it, your better off letting your webserver cache the requests, thus avoiding creating xml(rss) everytime, if that is what you are thinking about.
I'd go with MySQL because its really good at serving and its totally free, well PostgreSQL is too, but haven't used it so I can't say.
CouchDB could make sense, but not if you plan on doing OLAP (Offline Analysis) of your data, a normal RDBMS will be better at it.
Admitting firstly that I generally don't like mysql, I will say that there has been writing on this topic regarding postgres:
http://oldmoe.blogspot.com/2008/08/101-reasons-why-postgresql-is-better.html
This is always my choice when I need a pure relational database. I don't know whether a document database would be more appropriate for your application without knowing more about it. It does sound like it's something you should at least investigate.
MySQL is probably one of the best options out there; light, easy to install and maintain, multiplatform and free. On top of that there are some good free client tools.
Something to think about; because of the nature of your system you will probably have some tables that will grow quite a lot very quickly so you might want to think about performance.
Thus, MySQL supports vertical partitioning but only from V 5.1.
It sounds to me the application you will build can easily become a large-scale web app. I would suggest PostgreSQL, for it has been known for its reliability.
You can check out the following link -- Bob Ippolito from MochiMedia tells us why they ditched MySQL for PostgreSQL. Although the posts are more than 3 years old, the issues MySQL 5.1 has recently tend to prove that they are still relevant.
http://bob.pythonmac.org/archives/category/sql/mysql/
MySQL is good in production. I haven't used PostgreSQL for rails, but it's a good solution as well.
In the dev and test environments I'd start out with SQLite (default), and perhaps migrate to your target DB in the test environment as you move closer to completion.
I need to set up a MySQL environment that will support adding many unique databases over time (thousands, actually).
I assume that at some point I will need to start adding MySQL servers, and would like my environment to be prepared for the case beforehand, to make the transition to a 2nd, 3rd, 100th server easy.
And just to make it interesting, It would be very convenient if the solution was modeled so the application that queries the databases sends all the queries to a single address and receives a result. It should be unaware of the number and location of the servers. The database name is unique and can be used to figure out which server holds the database.
I've done some research, and MySQL Proxy pops out as the main candidate, but I haven't been able to find anything specific about making it perform as described above.
Anyone?
Great question. I know of several companies that have done this (Facebook jumps out as the biggest). None are happy, but alternatives kind of suck, too.
More things for you to consider -- what happens when some of these databases or servers fail? What happens when you need to do a cross-database query (and you will, even if you don't think so right now).
Here's the FriendFeed solution: http://bret.appspot.com/entry/how-friendfeed-uses-mysql
It's a bit "back-asswards" since they are basically using MySQL as a glorified key-value store. I am not sure why they don't just cut out the middleman and use something like BerkeleyDB for storing their objects. Connection management, maybe? Seems like the MySQL overhead would be too high a price to pay for something that could be added pretty easily (famous last words).
What you are really looking for (I think) is a distributed share-nothing database. Several have been built on top of open-source technologies like MySQL and PostgreSQL, but none are available for free. If you are in the buying mood, check out these companies: Greenplum, AsterData, Netezza, Vertica.
There is also a large number of various distributed key-value storage solutions out there. For lack of a better reference, here's a starting point: http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/ .
Your problem sounds similar to one we faced - that you are acting as a white-label, and that each client needs to have their own separate database. Assuming this concept parallels yours, what we did was leverage a "master" database that stored the hostname and database name for the client (which could be cached in the application tier). The server the client was accessing could then dynamically shift its datasource to the required database. This allowed us to scale up to thousands of client databases, scattered across servers.