SymmetricDS: real-time synchronisation of MySQL with PostgreSQL

I need to move a huge system from MySQL to PostgreSQL. This cannot be done in one go, which is why I need a robust, bi-directional (real-time or near-real-time) data synchronisation solution between MySQL and PostgreSQL. SymmetricDS looks like a tool that could solve my problem. However...
Would SymmetricDS be capable of this? The documentation is extensive and it doesn't clearly state that it would work in this particular situation. I'd like to know that this is at least possible, before spending a few weeks and hitting a dead end.

SymmetricDS is capable of this.
I've configured a bi-directional sync between MySQL and PostgreSQL. It shouldn't take a couple of weeks to set up a test. Start off by syncing a single table without dependencies.
For a one-time import/export, you can also use the SymmetricDS DbImport and DbExport tools.
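For a sense of what the setup involves: SymmetricDS is configured by inserting rows into its sym_* configuration tables. Below is a minimal sketch for one table synced in both directions; the node group IDs ('mysql_group', 'pg_group'), the channel, and the customer table are hypothetical, and the node groups and their links are assumed to be defined already (sym_node_group, sym_node_group_link).

    -- A channel to carry the customer table's changes
    INSERT INTO sym_channel (channel_id, processing_order, max_batch_size, enabled, description)
    VALUES ('customer', 1, 10000, 1, 'Customer data');

    -- Capture changes on the customer table
    INSERT INTO sym_trigger (trigger_id, source_table_name, channel_id, last_update_time, create_time)
    VALUES ('customer', 'customer', 'customer', current_timestamp, current_timestamp);

    -- One router per direction between the two node groups
    INSERT INTO sym_router (router_id, source_node_group_id, target_node_group_id, router_type, create_time, last_update_time)
    VALUES ('mysql_2_pg', 'mysql_group', 'pg_group', 'default', current_timestamp, current_timestamp);
    INSERT INTO sym_router (router_id, source_node_group_id, target_node_group_id, router_type, create_time, last_update_time)
    VALUES ('pg_2_mysql', 'pg_group', 'mysql_group', 'default', current_timestamp, current_timestamp);

    -- Tie the trigger to both routers to get bi-directional sync
    INSERT INTO sym_trigger_router (trigger_id, router_id, initial_load_order, create_time, last_update_time)
    VALUES ('customer', 'mysql_2_pg', 1, current_timestamp, current_timestamp);
    INSERT INTO sym_trigger_router (trigger_id, router_id, initial_load_order, create_time, last_update_time)
    VALUES ('customer', 'pg_2_mysql', 1, current_timestamp, current_timestamp);

Once this works for one table, you can extend the configuration table by table, which keeps the multi-week risk down.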

Related

How to replicate two different database systems?

I'm not sure if this fits Stack Overflow exactly; however, as I'm looking for some code rather than a tool, I think it does.
I'm looking for a way to replicate / synchronise different database systems, in this case MySQL and MongoDB. We are running both for different purposes. We started with a MySQL database and added MongoDB later on for special applications. There's data we would like to have in both databases, with constraints in MySQL and DBRefs in MongoDB, respectively. For example: we need a user record in MySQL, but also in MongoDB, for references between tables and objects, respectively. At the moment we have a cronjob which dumps the MySQL data and imports it into MongoDB. Although it works quite well, that's not the solution we would like to have.
I think for the moment a one-way replication would be enough (MySQL -> MongoDB); the important part is that the replication works in "realtime", much like MySQL master->slave replication works.
Are there already any solutions for this problem or ideas anyone of how to achieve this?
Thanks!
SymmetricDS is open source, Java-based, web-enabled, database independent, data synchronization/replication software that might do the trick with a few tweaks. It has an extension point called IDataLoaderFilter which you could use to implement a MongodbDataLoader.
This would help with one-way database replication. It might be a little more difficult to synchronize from MongoDB to a relational database, but the SymmetricDS team would be very helpful in trying to find a solution.
What you're looking for is called EAI (Enterprise Application Integration). There are a lot of commercial tools around, but under the provided link you'll also find a couple of OSS solutions. The basis of EAI is that you have data sources and data sinks, and the EAI framework offers tools to build custom pumps between the two.
I suggest either using a DB trigger to start the synchronisation or sending a trigger signal from your applications. Note that there is no turnkey solution, since synchronisation can become arbitrarily complex (for example, how do you make sure that all rows are copied?).
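As a hedged sketch of the trigger approach (MySQL syntax; the user table, its id column, and the outbox table are made up for illustration): each trigger writes the change into an outbox table, and an external process polls that table and applies the rows to the sink.

    -- Hypothetical outbox table the sync process polls
    CREATE TABLE sync_outbox (
        id         BIGINT AUTO_INCREMENT PRIMARY KEY,
        table_name VARCHAR(64) NOT NULL,
        row_id     BIGINT      NOT NULL,
        op         CHAR(1)     NOT NULL,  -- 'I', 'U' or 'D'
        changed_at TIMESTAMP   NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- One trigger per operation; the INSERT case shown here
    CREATE TRIGGER user_after_insert
    AFTER INSERT ON user FOR EACH ROW
        INSERT INTO sync_outbox (table_name, row_id, op) VALUES ('user', NEW.id, 'I');

The polling process then deletes (or marks) outbox rows as it applies them, which also gives you a natural retry point when the sink is down.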
As far as I can see, you'd need to develop some sort of "control program" that has drivers for each DBMS and runs as a daemon. The daemon should be trigger-driven or use a very short recheck interval to keep the DBs synchronised.
Technically, you could set up a process which parses the binary log of the MySQL server and replays the relevant SQL queries. I've never done such a thing with a different database as a slave, but maybe it is worth a shot?
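If you want a feel for what such a process would have to parse, MySQL lets you inspect the binary log from SQL (statement-based logging assumed; the log file name will differ on your server):

    -- List the available binary logs, then peek at the events in one of them
    SHOW BINARY LOGS;
    SHOW BINLOG EVENTS IN 'mysql-bin.000001' FROM 4 LIMIT 10;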

Postgresql slave for Mysql Master. Possible?

I need to run some tests for a potential migration from MySQL to PostgreSQL.
It would be easier to test if it were possible to use Postgres as a slave for my MySQL master.
Is it possible?
Thanks in advance
No.
You can build something yourself using triggers and an external process to send data over, but it's fairly difficult, since MySQL has rather limited support for triggers.
For your scenario you're likely to be better off periodically dumping the data over. The best way is often to migrate the schema manually, and then send your data over as CSV. "mysqldump --compatible" usually doesn't work well enough.
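A minimal sketch of the CSV route, assuming a users table with the same columns already created on both sides (the path and columns are examples):

    -- MySQL side: export to CSV
    SELECT id, name, email INTO OUTFILE '/tmp/users.csv'
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
        LINES TERMINATED BY '\n'
    FROM users;

    -- PostgreSQL side: load the same file
    -- (use psql's \copy instead if the file lives on the client)
    COPY users (id, name, email) FROM '/tmp/users.csv' WITH (FORMAT csv);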
It is possible. Sort of. Maybe.
One solution that supposedly supports MySQL -> PostgreSQL migration is Continuent's open-source Tungsten Replicator.
You can see some instructions on how to implement this "Heterogeneous Replication" here (although the method they suggest, using tungsten-installer, is deprecated and you might be better off using tpm like so).
Thing is, while there are plenty of resources indicating that Tungsten really did support this at one point, officially it seems they no longer do. This means that if you try to use the most recent Tungsten Replicator version (3.*), you'll quickly find that some files needed for Postgres are missing.
If, on the other hand, you try to download an older version, say 2.2.1, none of those errors appear, and all the files seem to be present, which leaves some room for optimism.
Personally, I must admit I haven't been able to get 2.2.1 to work either, but this probably has more to do with my lack of experience using Tungsten Replicator in general, and not with Postgres support. Also, in my case the real-time element wasn't as important, so we just ended up going with a cron job running pgloader.
So, if real-time replication from MySQL to Postgres is something you must have, I'd recommend at least trying out Tungsten Replicator before you start implementing a solution of your own. However, if real-time isn't an absolute requirement, there are probably simpler ways.
(Also, you might want to have a look at SymmetricDS which claims to do something similar, though I haven't personally looked into it.)
I don't think so; master-slave replication is only possible between instances of the same database system.
You could configure MySQL to use the PostgreSQL SQL mode, and you could also make a dump closer to something PostgreSQL can import by using --compatible with mysqldump.
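For what it's worth, on older MySQL versions (5.x; the mode was removed in 8.0) that SQL mode is just a server setting:

    -- Combination mode that makes MySQL behave more like PostgreSQL
    SET GLOBAL sql_mode = 'POSTGRESQL';

The dump side is then "mysqldump --compatible=postgresql", but as noted elsewhere in this thread, the result usually still needs manual fixing.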
SymmetricDS does support MySQL to Postgres replication. There is an open source version available, as well as a professional version that provides a web-based interface.

What's the fastest way to import a large mysql database backup?

What's the fastest way to export/import a MySQL database using InnoDB tables?
I have a production database which I periodically need to download to my development machine to debug customer issues. The way we currently do this is to download our regular database backups, which are generated using "mysqldump -B dbname" and then gzipped. We then import them using "gunzip -c backup.gz | mysql -u root".
From what I can tell from reading "mysqldump --help", mysqldump runs with --opt by default, which looks like it turns on a bunch of the things I can think of that would make imports faster, such as disabling index updates during the load and importing each table as one massive INSERT statement.
Are there better ways to do this, or further optimizations we should be doing?
Note: I mostly want to optimize the time it takes to load the database onto my development machine (a relatively recent macbook pro, with lots of ram). Backup time and network transfer time currently aren't big issues.
Update:
To answer some questions posed in the answers:
The production database schema changes up to a couple times a week. We're running rails, so it's relatively easy to run the migrate scripts on stale production data.
We need to put production data into a development environment potentially on a daily or hourly basis. This entirely depends on what a developer is working on. We often have specific customer issues that are the result of some data spread across a number of tables in the db, which needs to be debugged in a development environment.
I honestly don't know how long mysqldump takes. Less than 2 hours, since we currently run it every 2 hours. However, that's not what we're trying to optimize; we want to optimize the import onto the developer workstation.
We don't need the full production database, but it's not totally trivial to separate what we do and don't need (there are a lot of tables with foreign key relationships). This is probably where we'll have to go eventually, but we'd like to avoid it for a bit longer if we can.
It depends on how you define "fastest".
As Joel says, developer time is expensive. Mysqldump works and handles a lot of cases you'd otherwise have to handle yourself or spend time evaluating other products to see if they handle them.
The pertinent questions are:
How often does your production database schema change?
Note: I'm referring to adding, removing or renaming tables, columns, views and the like, i.e. things that will break actual code.
How often do you need to put production data into a development environment?
In my experience, not very often at all. I've generally found that once a month is more than sufficient.
How long does mysqldump take?
If it's less than 8 hours it can be done overnight as a cron job. Problem solved.
Do you need all the data?
Another way to optimize this is to simply get a relevant subset of data. Of course this requires a custom script to be written to get a subset of entities and all relevant related entities but will yield the quickest end result. The script will also need to be maintained through schema changes so this is a time-consuming approach that should be used as an absolute last resort. Production samples should be large enough to include a sufficiently broad sample of data and identify any potential performance problems.
Conclusion
Basically, just use mysqldump until you absolutely can't. Spending time on another solution is time not spent developing.
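If you do stick with mysqldump, you can also wrap the import in a few session settings that skip redundant work during the load; a sketch using standard MySQL variables (remember to re-enable the checks afterwards):

    -- Run in the importing session before loading the dump
    SET foreign_key_checks = 0;
    SET unique_checks = 0;
    SET autocommit = 0;

    -- ... load the dump here, e.g. via the mysql client ...

    COMMIT;
    SET unique_checks = 1;
    SET foreign_key_checks = 1;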
Consider using replication. That would allow you to update your copy in real time, and MySQL replication allows for catching up even if you have to shut down the slave. You could also use a parallel MySQL instance on your normal server that replicates the data to a MyISAM table, which supports online backup. MySQL allows this as long as the tables have the same definition.
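Setting up such a slave is only a couple of statements once binary logging is enabled on the master; a sketch with placeholder host, credentials and log coordinates:

    -- On the slave: point it at the master and start replicating
    CHANGE MASTER TO
        MASTER_HOST = 'master.example.com',
        MASTER_USER = 'repl',
        MASTER_PASSWORD = 'secret',
        MASTER_LOG_FILE = 'mysql-bin.000001',
        MASTER_LOG_POS  = 4;
    START SLAVE;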
Another option that might be worth looking into is XtraBackup from renowned MySQL performance specialists Percona. It's an online backup solution for InnoDB. I haven't looked at it myself, though, so I won't vouch for its stability or that it's even a workable solution for your problem.

Is SQLite suitable for use in a production website?

I'm rewriting a PHP+MySQL site that averages 40-50 hits a day using Django.
Is SQLite a suitable database to use here? Are there any advantages/disadvantages between them?
I'm just using the db to store a blog and the users who can edit it. I am using fulltext search for the blog search, but no complex joins anywhere.
40-50 hits per day is very low, and SQLite can be used without any problem.
MySQL might be better once you get more hits, because it handles multiple connections better (locking isn't the same in MySQL and SQLite).
The major problem with SQLite is concurrency. If you expect 40-50 hits a day, that's probably a non-issue. However, if that load increases you should be ready to migrate to a database daemon such as MySQL; it's better to abstract your database-specific code to make such a switch as painless as possible.
The performance section of the SQLite wiki might be of use to you.
Since you're already using an adequate database, I don't see a reason to migrate to a smaller one.
While SQLite might be perfectly adequate too, changing from a more capable platform to a less capable one doesn't seem the best choice :)
SQLite will work just fine for you. It sounds as though you're largely using the database as read-only (with occasional writes to update the content). SQLite excels at this kind of access pattern. The only place where SQLite chokes is when you have a lot of writes to a database, because once a process attempts to write, the file is locked until the write is complete. Also, if you do lots of writes (like updating rows in a loop), you should look into putting all those writes into a transaction: while the file is still locked once the transaction hits a write query, the updates themselves take much less time because they're written to the file at once rather than individually.
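A minimal sketch of that batching point, with a hypothetical posts table:

    -- One write lock for the whole batch instead of one per statement
    BEGIN TRANSACTION;
    UPDATE posts SET view_count = view_count + 1 WHERE id = 1;
    UPDATE posts SET view_count = view_count + 1 WHERE id = 2;
    UPDATE posts SET view_count = view_count + 1 WHERE id = 3;
    COMMIT;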
SQLite would be fine for this level of traffic. It actually performs quite well, the only thing that it is lacking is caching of data and queries because it needs to be spun up every time your page is accessed. That said, it is still very quick and it shouldn't be too hard to migrate to MySQL later if need be.

Apart from initial cost, are there any other benefits of using MySQL over MSSQL Server with .NET?

I've used both, and I've found MySQL to have several frustrating bugs and limited support for IDE integration, profiling, integration services, and reporting; it even lacks a decent manager. The total cost of ownership of MSSQL Server is touted to be less than MySQL's too (in a .NET environment), but, keeping an open mind, could someone point out any killer features of MySQL?
I've used MySQL in the past and have been using MSSQL lately, but I can't think of anything MySQL has that MSSQL can't do.
I think the killer feature of MySQL is its simplicity. For some projects you just don't need all the power you get with a huge system like MSSQL. I have a UNIX heritage and find a simple configuration file like my.ini a killer feature of MySQL.
Also, the security system of MySQL is much less robust, but it does the job for most applications. I believe MySQL is a killer in itself from this point of view, and should stay that way, letting young users be introduced to RDBMSs with a simple view first. If your project gets big enough that you are considering switching to a more robust system, then MSSQL can pop up as a possibility.
That's what happened to me.
The only thing I can think of, offhand, is locking. SQL Server has traditionally had a poor locking strategy that has tripped many people up.
You should use what you prefer, ultimately. It's not as if MySQL is not good enough to compete with MS SQL; e.g. Slashdot uses MySQL, so it hardly has problems with high-scalability performance.
Its killer feature, though, is that it is free: you can deploy as many instances as you like without worrying one fig about licensing issues. That's more important for the spread of software than anyone could imagine.
(TCO is a difficult thing to calculate, and such advice is only ever given by paid consultants and other vested interests. Ignore it. MSSQL is expensive and MySQL is free.)
About 6 years ago I developed a custom e-commerce website using ASP with MySQL for the database. At the time MySQL was clearly a better choice than MSDE, which had built-in throttling that concerned me enough to use MySQL. Also, the difference in coding between MySQL and MSDE/SQL was not that big or much of a concern.
Now, all these years later, I'm trying to get the code converted to .NET, and even after purchasing commercial MySQL drivers from CRLab I found that, as you hinted, the IDE integration is just not up to par.
I will say that MySQL is doing a great job even with our database tables approaching 4 GB. So when I switch to MSSQL I'll have to go ahead and get SQL Server Workgroup or higher ($$$), and not use SQL Express, which has a 4 GB limit.
All of my experience has changed the way I develop new websites. Now, unless a site is expected to have a lot of traffic, I use VistaDB and then upgrade to SQL Server if needed. VistaDB is syntax- and datasource-compatible with SQL Server. And the best part is that the database is only a single file, plus a DLL for your bin folder.
That's my two cents based on my personal experience with using MySQL in ASP and now .NET.
I work with MSSQL, MySQL and Postgres regularly (using .NET, Java and PHP). One of my favorite things about MySQL (esp. compared to MSSQL) is the ease with which you can run and restore full database backups.
MSSQL's model of using .bak files is really ugly and time-consuming (a topic for another post). But if you want to do something like automated testing, or automated build processes (that include building a db from scratch), MySQL can be a bit easier to deal with.
A few other points:
The management tools have gotten a lot better since the early days.
If you are interested in transactions, constraints, etc., be sure you are defining your tables to use the InnoDB storage engine (instead of MyISAM, which is designed for speed); see the sketch after this list.
I do miss MSSQL's schema generating tool, but I think there are equivalent tools out there.
We've used a Linux database server and a Windows web server (for .NET apps) with great success.
If you are using something like NHibernate or some other non-MS data abstraction layer, the case to look beyond MSSQL is stronger too...
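On the storage engine point above, the engine is chosen per table; a small sketch with a hypothetical account table:

    -- InnoDB gives you transactions and foreign keys; MyISAM does not
    CREATE TABLE account (
        id      INT PRIMARY KEY,
        balance DECIMAL(10,2) NOT NULL
    ) ENGINE = InnoDB;

    -- Or convert an existing table in place
    ALTER TABLE account ENGINE = InnoDB;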
Three points to consider; unfortunately the first two are contradictory:
1) .NET and MySQL were not designed to interact with one another, and there is no official support from either side. You're invariably going to encounter issues trying to use them together.
2) If portability off of Windows may ever be an issue (much .NET code runs quite nicely on other platforms via Mono), you'll want to avoid locking yourself too deeply to MSSQL. That doesn't mean not using it, but being careful that you don't rely on its particular quirks too much.
3) TCO is just a buzzword. It's complete nonsense when it's calculated by anyone other than you. Nobody can make such a calculation and honestly claim that it has any relevance outside their particular environment. There are too many factors, most of which have absolutely nothing to do with things like tool availability.
I've been using the community version of MySQL for almost 99% of my projects. What I like about MySQL is that I can deploy it via xcopy, and it's powerful compared to other "xcopy-able" database servers. I also wrote a wrapper to start and stop MySQL and Apache (like LAMP), but with my own implementation and add-on capability.
MySQL probably has a lower TCO, since administration and configuration are simpler and more straightforward than the spaghetti GUI that MS SQL makes you do most of the configuration through, where you have to dig through hundreds of obscure properties dialogs to accomplish even basic administration tasks.
There is one area where MS SQL clearly excels over MySQL in my experience:
Integration with other technologies. MS SQL allows you to replicate back and forth with Oracle and MySQL databases, and provides SSIS for executing scheduled data transformations from other database servers.
There may be others, but I don't have experience with them.