I need to run some tests for a potential migration from MySQL to PostgreSQL.
It would be easier to test if I could use PostgreSQL as a slave for my MySQL master.
Is this possible?
Thanks in advance
No.
You can build something yourself using triggers and an external process to send the data over, but it's fairly difficult, since MySQL has rather limited support for triggers.
For your scenario you're likely to be better off doing periodic dumps of the data. The best approach is often to migrate the schema manually and then send your data over as CSV; the mysqldump --compatible option usually doesn't work well enough.
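By way of illustration, here is a minimal sketch of such a periodic transfer, assuming a hypothetical users(id, name, email) table that already exists on both sides; the connection details and the pymysql/psycopg2 client libraries are my choices, not anything prescribed:

```python
import csv
import io

import pymysql    # MySQL client library
import psycopg2   # PostgreSQL client library

# Hypothetical connection settings; adjust for your environment.
mysql_conn = pymysql.connect(host="mysql-host", user="app",
                             password="secret", database="mydb")
pg_conn = psycopg2.connect(host="pg-host", user="app",
                           password="secret", dbname="mydb")

# Pull the MySQL rows into an in-memory CSV buffer.
buf = io.StringIO()
writer = csv.writer(buf)
with mysql_conn.cursor() as cur:
    cur.execute("SELECT id, name, email FROM users")
    for row in cur:
        writer.writerow(row)
buf.seek(0)

# Load the CSV into PostgreSQL with COPY, replacing the previous snapshot.
with pg_conn.cursor() as cur:
    cur.execute("TRUNCATE users")
    cur.copy_expert("COPY users (id, name, email) FROM STDIN WITH (FORMAT csv)", buf)
pg_conn.commit()
```

Run from cron, this gives you the periodic-dump approach without going through mysqldump's SQL output at all.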
It is possible. Sort of. Maybe.
One solution that supposedly supports MySQL -> PostgreSQL migration is Continuent's open-source Tungsten Replicator.
You can see some instructions on how to implement this "Heterogeneous Replication" here (although the method they suggest, using tungsten-installer, is deprecated and you might be better off using tpm like so).
The thing is, while there are plenty of resources indicating that Tungsten really did support this at one point, officially it seems they no longer do. This means that if you try to use the most recent Tungsten Replicator version (3.*), you'll quickly find that some files needed for Postgres are missing.
If, on the other hand, you try to download an older version, say 2.2.1, none of those errors appear, and all the files seem to be present, which leaves some room for optimism.
Personally, I must admit I haven't been able to get 2.2.1 to work either, but this probably has more to do with my lack of experience using Tungsten Replicator in general, and not with Postgres support. Also, in my case the real-time element wasn't as important, so we just ended up going with a cron job running pgloader.
So, if real-time replication from MySQL to Postgres is something you must have, I'd recommend at least trying out Tungsten Replicator before you start implementing a solution of your own. However, if real-time isn't an absolute requirement, there are probably simpler ways.
(Also, you might want to have a look at SymmetricDS which claims to do something similar, though I haven't personally looked into it.)
I don't think so; built-in master-slave replication is only possible between instances of the same database product.
You could configure MySQL to use its PostgreSQL SQL mode (sql_mode = 'POSTGRESQL'), and you could also produce a dump ready to import into PostgreSQL by using --compatible=postgresql with mysqldump.
SymmetricDS does support MySQL to Postgres replication. There is an open-source version available, as well as a professional version that provides a web-based interface.
I need to move a huge system from MySQL to PostgreSQL. This cannot be done in one go, which is why I need a robust (real-time or near-real-time) bi-directional data synchronisation solution between MySQL and PostgreSQL. SymmetricDS looks like a tool that could solve my problem. However...
Would SymmetricDS be capable of this? The documentation is extensive and it doesn't clearly state that it would work in this particular situation. I'd like to know that this is at least possible, before spending a few weeks and hitting a dead end.
SymmetricDS is capable of this.
I've configured a bi-directional sync between MySQL and PostgreSQL. It shouldn't take a couple of weeks to set up a test. Start off by syncing a single table without dependencies.
For a one-time import/export, it is also possible to use the SymmetricDS DbImport and DbExport tools.
We are currently evaluating failover support in different databases.
We were using HSQLDB earlier, but it seems that it does not have clustering/replication support.
Our requirement is simply to have two database servers, one being only a synchronous backup; if the primary server goes down, the secondary should automatically start acting as the primary server.
Has anyone evaluated MySQL, PostgreSQL or any other DB server for such a use case?
Edit: We had thought of using MySQL Cluster, but it now seems that it is under the GPL license, which we won't be able to work with. Could anyone please suggest a synchronous replication/clustering solution that can be used? We are currently using HSQLDB, so a solution with HSQLDB used in clustered mode would be ideal for us, but we are open to change.
Stack Overflow resources
MySQL supports replication out of the box; see this question: Scaling solutions for MySQL (Replication, Clustering)
PostgreSQL also supports replication; see this question: PostgreSQL replication strategies
If your requirements are simple, MySQL will work.
I've used MySQL in a simple master-master failover scenario, using the setup I read about in High Performance MySQL. I highly recommend the book if you're keen on using MySQL.
It has worked well for me, because I just wanted a simple failover.
If your use case is just as simple, it will work well.
Just for completeness: the H2 database has some clustering support, but compared to the MySQL and PostgreSQL features it is very limited; it's really only failover. I would first look at HA-JDBC.
For a simple failover where the servers are in the same location, you can use DRBD and Heartbeat.
In a nutshell: DRBD stores the data on two servers at the same time, fully transparently to the system. With Heartbeat, the standby checks against the main server; if it's not reachable, it takes over the resource, mounts it, and starts the database daemon. (This works with MySQL, PostgreSQL, and most probably with most other daemons out there.)
There is a third-party product that works with HSQLDB:
http://ha-jdbc.sourceforge.net/
Not sure this is within the desired price range of most FOSS-type people :-) but we use DB2 9.7 for exactly this purpose (actually, we mostly use DB2/z on the mainframe for it, but some customers like the DB2/LUW (Linux/UNIX/Windows) option for smaller systems).
DB2 comes with high availability (HA) features built in and you can use db2haicu, the DB2 High Availability Instance Configuration Utility (gotta love those acronym generators employed by Big Blue) to configure things relatively painlessly.
It's active/passive as you desired, although DB2 is certainly capable of active/active setups for load balancing.
The particular setups we're most familiar with at the low end (everything other than a mainframe) are actually shared-disk ones, with the HA applying only to DBMS resources and not to data, but you can separate the data with DB2 replication features as well.
We've had one client (at least) using Q replication, which is a very low latency replication method, close to synchronous but not quite. DB2 does actually provide real synchronous replication as well.
DeveloperWorks has an interesting article on how this all hangs together, along with the various options.
I'm not sure if this fits Stack Overflow exactly; however, as I'm looking for code rather than a tool, I think it does.
I'm looking for a way to replicate/synchronize different database systems, in this case MySQL and MongoDB. We are running both for different purposes. We started with a MySQL database and added MongoDB later on for special applications. There's data we would like to have in both databases, with constraints in MySQL and, correspondingly, DBRefs in MongoDB. For example: we need a user record in MySQL, but also in MongoDB, for references between tables and objects respectively. At the moment we have a cron job which dumps the MySQL data and imports it into MongoDB. Although it works quite well, that's not the solution we would like to have.
I think for the moment a one-way replication would be enough (MySQL -> MongoDB); the important part is that the replication works in "real time", much like MySQL master -> slave replication works.
Are there already any solutions for this problem or ideas anyone of how to achieve this?
Thanks!
SymmetricDS is open-source, Java-based, web-enabled, database-independent data synchronization/replication software that might do the trick with a few tweaks. It has an extension point called IDataLoaderFilter which you could use to implement a MongodbDataLoader.
This would help with one-way database replication. It might be a little more difficult to synchronize from MongoDB to a relational database, but the SymmetricDS team would be very helpful in trying to find a solution.
What you're looking for is called EAI (Enterprise Application Integration). There are a lot of commercial tools around, but under the provided link you'll also find a couple of OSS solutions. The basis of EAI is that you have data sources and data sinks; the EAI framework offers tools to build custom pumps between the two.
I suggest either using a DB trigger to start the synchronization or sending a trigger signal from your applications. Note that there is no turnkey solution, since synchronization can become arbitrarily complex (for example, how do you make sure that all rows are copied?).
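As a minimal sketch of the trigger-fed variant, under the assumption of a hypothetical users table and a hand-made sync_queue changelog table (all names here are illustrative, and pymysql/pymongo are just one possible pair of client libraries):

```python
import time

import pymysql    # MySQL client library
import pymongo    # MongoDB client library

mysql_conn = pymysql.connect(host="mysql-host", user="app", password="secret",
                             database="mydb", autocommit=True)
mongo = pymongo.MongoClient("mongodb://mongo-host:27017")["mydb"]

# One-time setup (run by hand): an AFTER INSERT trigger that records changed
# ids in a queue table; similar triggers are needed for UPDATE and DELETE.
#
#   CREATE TABLE sync_queue (id INT AUTO_INCREMENT PRIMARY KEY,
#                            tbl VARCHAR(64), row_id INT);
#   CREATE TRIGGER users_sync AFTER INSERT ON users FOR EACH ROW
#       INSERT INTO sync_queue (tbl, row_id) VALUES ('users', NEW.id);

def drain_queue():
    """Copy queued rows from MySQL into MongoDB, then clear them from the queue."""
    with mysql_conn.cursor(pymysql.cursors.DictCursor) as cur:
        cur.execute("SELECT q.id AS qid, u.* "
                    "FROM sync_queue q JOIN users u ON u.id = q.row_id")
        for row in cur.fetchall():
            qid = row.pop("qid")
            # Upsert into MongoDB, keyed on the MySQL primary key.
            mongo.users.replace_one({"_id": row["id"]}, row, upsert=True)
            cur.execute("DELETE FROM sync_queue WHERE id = %s", (qid,))

while True:        # small daemon loop: near real time, not true real time
    drain_queue()
    time.sleep(1)
```

The polling interval is the knob here: the smaller it is, the closer you get to real time, at the cost of load on the MySQL server.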
As far as I can see, you need to develop some sort of "control program" that has the drivers for each DBMS, and run it as a daemon. The daemon should be trigger-driven, or use a very small recheck interval, to keep the DBs synchronized.
Technically, you could set up a process which parses the binary log of the MySQL server and replays the relevant SQL statements. I've never done such a thing with a different kind of database as the slave, but maybe it is worth a shot?
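For what it's worth, the third-party python-mysql-replication library (pymysqlreplication) does exactly this parsing for you by posing as a replication slave. Here is a hedged sketch of wiring it to MongoDB; the collection layout and credentials are illustrative, and the master must be running with binlog_format=ROW:

```python
import pymongo
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    DeleteRowsEvent,
    UpdateRowsEvent,
    WriteRowsEvent,
)

mongo = pymongo.MongoClient("mongodb://mongo-host:27017")["mydb"]

# Pose as a replication slave and tail the binary log.
stream = BinLogStreamReader(
    connection_settings={"host": "mysql-host", "port": 3306,
                         "user": "repl", "passwd": "secret"},
    server_id=4242,                 # must be unique among the master's slaves
    blocking=True,                  # keep waiting for new events
    only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
)

for event in stream:
    coll = mongo[event.table]       # mirror each table into a collection
    for row in event.rows:
        if isinstance(event, WriteRowsEvent):
            coll.insert_one(row["values"])
        elif isinstance(event, UpdateRowsEvent):
            coll.replace_one(row["before_values"], row["after_values"])
        else:                       # DeleteRowsEvent
            coll.delete_one(row["values"])
```

This gets you the "much like master->slave replication" behaviour the question asks for, without triggers or polling.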
I need a DBMS, but do not know which to choose.
Basically, the application does many INSERTs/UPDATEs, but also many SELECTs. The SELECTs are mostly very simple, on one field only.
I am using MySQL + InnoDB at the moment, but as the database is growing, I need the best solution. The table can grow indefinitely, and at the moment it is around 2 GiB.
EDIT:
It will run on Linux, and perhaps occasionally on FreeBSD.
I don't need user management; all processes currently connect as root. Typically there are many simultaneous accesses (83 threads right now, according to mysqladmin).
Access will be from C++, but I need access from PHP as well.
phpMyAdmin statistics:
select: 42.57%
insert: 7.97%
update: 49.45%
EDIT2:
After some thought, and the answers here, I believe that I can't use MySQL, because its client library is GPL.
Any alternative that does not harm (much) performance?
I think you have plenty of options.
You can continue to use MySQL. YouTube has used it fairly successfully.
PostgreSQL (Free, Open Source, pretty good performance, reliable)
Oracle (NOT free, but has good support for very large databases)
If it's very simple queries, could it be done well with a key/value store?
According to this, the maximum database size on Linux 2.4+ (ext3) is 4TB. So I think you are safe to stick with MySQL+InnoDB if performance is adequate.
I would think MySQL is an excellent choice from what you've stated. Oracle isn't free, and carries overhead from all the security and enterprise-level features that MySQL doesn't have. You want support for multiple languages. MySQL can scale well (I believe Flickr is a good example). Most databases are accessible from most languages: e.g. Perl, Java and C all have driver-based APIs (DBI, JDBC and ODBC respectively). IIRC PHP has one very similar to DBI. Also, starting with a relational database does allow you some wiggle room for the future: e.g. joins and aggregation.
One piece of advice I would give: make sure whatever you choose is ACID-compliant. Also, you might take the time to look at Postgres and see if there is something about it that meets your needs as well as or better than MySQL.
We had an application running on MySQL. We found MySQL was not suitable for our app after we discovered that it didn't support some of the GIS capabilities that PostGIS has (note: MySQL only supports minimum-bounding-rectangle GIS searches).
So we changed our DB to PostgreSQL. We then found out that PostgreSQL 8.2 running on Windows is much slower than MySQL 5.1. By slower, I mean roughly 4-5 times slower.
Why is this? Is there something in the configuration that we need to change?
I found some comments on other websites, such as this one.
UPDATE: We found that the cause of the slowness is the BLOBs that we are inserting into the DB. We need to be able to insert BLOBs at a sustained rate of 10-15 MB/s. We are using libpq's lo_read and lo_write for each BLOB we are inserting/reading. Is that the best way? Has anyone used PostgreSQL for inserting large BLOBs at a high rate before?
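Roughly, our write path looks like the following (sketched here in Python via psycopg2, which wraps the same libpq large-object calls; connection details, sizes, and the per-batch commit are illustrative):

```python
import psycopg2

conn = psycopg2.connect(host="pg-host", user="app",
                        password="secret", dbname="mydb")

def store_blobs(chunks):
    """Write each bytes object as a PostgreSQL large object; return their OIDs."""
    oids = []
    with conn:                            # one transaction for the whole batch
        for data in chunks:
            lob = conn.lobject(0, "wb")   # oid=0 -> create a new large object
            lob.write(data)
            oids.append(lob.oid)
            lob.close()
    return oids

# Usage: 16 MiB of data split into 1 MiB chunks.
oids = store_blobs([b"\x00" * (1 << 20) for _ in range(16)])
print(oids)
```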
EDIT: I heard that PostgreSQL was only recently ported to Windows. Could this be one of the reasons?
There are cases where PostgreSQL on Windows pays an additional overhead compared to other solutions, due to tradeoffs made when we ported it.
For example, PostgreSQL uses a process per connection, MySQL uses a thread. On Unix, this is usually not a noticeable performance difference, but on Windows creating new processes is very expensive (due to the lack of the fork() system call). For this reason, using persistent connections or a connection pooler is much more important on Windows when using PostgreSQL.
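To illustrate the pooling point, here is a minimal client-side pool with psycopg2 (a server-side pooler such as pgbouncer achieves the same effect outside the application; the connection parameters are placeholders):

```python
from psycopg2 import pool

# Keep up to 10 server processes alive instead of spawning one per query burst.
pg_pool = pool.SimpleConnectionPool(
    minconn=1, maxconn=10,
    host="pg-host", user="app", password="secret", dbname="mydb",
)

conn = pg_pool.getconn()            # borrow an existing connection
try:
    with conn, conn.cursor() as cur:
        cur.execute("SELECT 1")
        print(cur.fetchone())
finally:
    pg_pool.putconn(conn)           # return it to the pool instead of closing
```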
Another issue I've seen is that early versions of PostgreSQL on Windows would by default make sure that their writes went all the way through the write cache, even if the cache was battery-backed. AFAIK, MySQL does not do this, and it will greatly affect write performance. Now, this is actually required if you have non-safe hardware, such as a cheap drive. But if you have a battery-backed write cache, you want to change this (the wal_sync_method setting) to regular fsync. Modern versions of PostgreSQL (certainly 8.3) will default to open_datasync instead, which should remove this difference.
You also mention nothing about how you have tuned the configuration of the database. By default, the configuration file shipped with PostgreSQL is very conservative. If you haven't changed anything there, you definitely need to take a look at it. There is some tuning advice available on the PostgreSQL wiki.
For more detailed advice, you will need to provide a lot more information about exactly what runs slowly and how you have tuned your database. I'd suggest an email to the pgsql-general mailing list.
While the Windows port of PostgreSQL is relatively recent, my understanding is that it performs about as well as the other versions. But it's definitely a port; almost all developers work primarily or exclusively on Unix/Linux/BSD.
You really shouldn't be running 8.2 on Windows. In my opinion, 8.3 was the first Windows release that was truly production-ready; 8.4 is better yet. 8.2 is rather out of date anyway, and you'll reap several benefits if you can manage to upgrade.
Another thing to consider is tuning. PostgreSQL requires more tuning than MySQL to achieve optimal performance. You may want to consider posting to one of the mailing lists for help with more than basic tweaking.
PostgreSQL is generally slower than MySQL up to a certain point (it is actually faster when you have a ridiculously large database). Just FYI: this isn't what's causing your problem, but keep it in mind.