Failover support for a DB - mysql

We are currently evaluating failover support in different databases.
We were earlier using HSQLDB but it seems that it does not have clustering/replication support.
Our requirement is simply to have two database servers, one being only for synchronous backup but if the primary server is down, then the secondary should automatically start acting as the primary server.
Has anyone evaluated MySQL, PostgreSQL or any other DB server for such a use case?
Edit: We had thought of using MySQL cluster but it now seems that it is under GPL license which we won't be able to work with. Could anyone please suggest a synchronous replication/clustering solution which can be used? We are currently using HSQL, so a solution with HSQL used in clustered mode will be ideal for us but we are open for change.

Stackoverflow resources
MySQL supports replication out of the box: see this question for MySQL: Scaling solutions for MySQL (Replication, Clustering)
PostgreSQL also supports replication, see this question for that: PostgreSQL replication strategies
If your requirements are simple MySQL will work
I've used MySQL in a simple master-master failover scenario using the setup I read in High Performance MySQL. I highly recommend the book if you're keen on using MySQL.
It has worked well for me, because I just wanted a simple fail-over.
If your use case is just as simple, it will work well.

Just for completeness, the H2 database has some clustering support, but compared to the MySQL and PostgreSQL features it is very limited: it's really only failover. I would first look at HA-JDBC.

For a simple failover where both servers are in the same location, you can use DRBD and Heartbeat.
In a nutshell: DRBD stores the data on two servers at the same time, fully transparently to the system. With Heartbeat, the standby checks the main server; if it's not reachable, the standby takes over the resource, mounts it, and starts the database daemon. (This works with MySQL, PostgreSQL, and most probably with most other daemons out there.)
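To make the Heartbeat idea above a bit more concrete, here is a minimal Python sketch of the decision logic only. It is not Heartbeat's actual implementation: the function names, the TCP probe, and the "N consecutive misses" rule are assumptions chosen for illustration, and the real takeover (mounting the DRBD volume, starting the daemon) is left as a comment.

```python
import socket
from typing import Callable


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """One health check: can we open a TCP connection to the primary?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_take_over(probe: Callable[[], bool], max_failures: int, checks: int) -> bool:
    """Return True once the primary has missed max_failures checks in a row.

    probe is any zero-argument callable that reports whether the primary
    answered; it is injected so the logic can be tried without a network.
    """
    consecutive = 0
    for _ in range(checks):
        if probe():
            consecutive = 0  # primary answered, reset the counter
        else:
            consecutive += 1
            if consecutive >= max_failures:
                # A real standby would now mount the replicated volume
                # and start the database daemon.
                return True
    return False


# Simulated primary: answers once, then stops responding.
answers = iter([True, False, False, False])
print(should_take_over(lambda: next(answers), max_failures=3, checks=4))
```

Requiring several consecutive misses rather than a single one is a common way to avoid failing over on a brief network hiccup; in a real deployment the probe would be something like `lambda: tcp_probe("primary.example", 3306)`, where the host name is a placeholder.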

There is a third-party product that works with HSQLDB:
http://ha-jdbc.sourceforge.net/

Not sure this is within the desired price range of most FOSS-type people :-) but we use DB2 9.7 for exactly this purpose (actually, we mostly use DB2/z on the mainframe for it, but some customers like the DB2/LUW (Linux/UNIX/Windows) option for smaller systems).
DB2 comes with high availability (HA) features built in and you can use db2haicu, the DB2 High Availability Instance Configuration Utility (gotta love those acronym generators employed by Big Blue) to configure things relatively painlessly.
It's active/passive as you desired, although DB2 is certainly capable of active/active setups for load balancing.
The particular setups we're most familiar with at the low end (everything other than a mainframe) are actually shared disk ones, with the HA applying to only DBMS resources and not data, but you can separate the data with DB2 replication features as well.
We've had one client (at least) using Q replication, which is a very low latency replication method, close to synchronous but not quite. DB2 does actually provide real synchronous replication as well.
DeveloperWorks has an interesting article on how this all hangs together, along with the various options.

Related

Query data from database for 2 different server

I want to query data from 2 different database servers using MySQL. Is there a way to do that without having to create a Federated database, as Google Cloud Platform does not support the Federated Engine?
Thanks!
In addition to @MontyPython's excellent response, there is a third, albeit somewhat cumbersome, way to do this if you cannot use the Federated Engine and also cannot manage your databases' replication.
Use an ETL tool to do the work
Back in the day, I faced a very similar problem: I had to join data from two separate database servers, neither of which I had any administrative access to. I ended up setting up Pentaho's ETL suite of tools to Extract data from both databases, Transform it (basically having Pentaho do a lot of work with both datasets) and Load it into my very own local database engine, where I ended up with exactly the merged and processed data I needed.
Be advised, this IS a lot of work (you have to "teach" your ETL tool what you need and, depending on what tool you use, it may involve quite some coding), but once you're done, you can schedule the work to happen automatically at regular intervals so you always have your local processed/merged data readily accessible.
FWIW, I used Pentaho's community edition so free as in beer
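For readers who want to see the shape of that extract/load/join flow without installing an ETL suite, here is a rough Python sketch. It is not Pentaho: three in-memory SQLite databases stand in for the two remote servers and the local engine, and the `customers` and `orders` tables are made up for the example. With MySQL you would swap in connections from whatever driver you use.

```python
import sqlite3

# Stand-ins for the two remote servers and the local merge database (assumption:
# in real use these would be connections to separate MySQL servers).
server_a = sqlite3.connect(":memory:")
server_b = sqlite3.connect(":memory:")
local = sqlite3.connect(":memory:")

# Hypothetical sample data living on two different servers.
server_a.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
server_a.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
server_b.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
server_b.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 25.0), (11, 1, 5.0), (12, 2, 7.5)]
)

# Extract: pull the rows out of each source server separately.
customers = server_a.execute("SELECT id, name FROM customers").fetchall()
orders = server_b.execute("SELECT id, customer_id, total FROM orders").fetchall()

# Load: copy both datasets into the local database.
local.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
local.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
local.executemany("INSERT INTO customers VALUES (?, ?)", customers)
local.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
local.commit()

# Now the cross-server join is an ordinary local join.
merged = local.execute(
    "SELECT c.name, SUM(o.total) FROM customers c "
    "JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.id, c.name ORDER BY c.name"
).fetchall()
print(merged)
```

Run on a schedule (cron or similar), a script like this gives you a periodically refreshed local copy; it does not give you live data.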
You can achieve this in two ways, one you have already mentioned:
1. Use Federated Engine
You can see how it is done here - Join tables from two different server. This is a MySQL specific answer.
2. Set up Multi-source Replication on another server and query that server
You can easily set up Multi-source Replication using Replication channels
Check out their official documentation here - https://dev.mysql.com/doc/refman/8.0/en/replication-multi-source-tutorials.html
If you have an older version of MySQL where Replication channels are not available, you may use one of the many third-party replicators like Tungsten Replicator.
P.S. - There is no such thing in MySQL as a FDW in PostgreSQL. Joins across servers are easily possible in other database management systems but not in MySQL.

PgPool like failover for MySQL - Currently using HAProxy

I'm running MySQL servers load balanced via HAProxy. But I need to perform some action whenever a node (MySQL) goes down or comes up, like we do in PgPool-II.
Example:
Whenever MySQL-1 goes down, I also want to shut down my WebServer-1. (STONITH like)
In PgPool-II for PostgreSQL, I can do this. I'm unaware if there are similar tools for MySQL.
Is this possible in HAProxy itself:
Say, when my DB-1 goes down, route the web server traffic, to WebServer-2.
MySQL has quite extensive support for replication, high availability and sharding. Your question is not very clear to me but here are a few things you could try reading:
For failover, MySQL provides a Python script called mysqlfailover which is shipped as part of MySQL Utilities.
For sharding and high availability you could try out MySQL Fabric. MySQL Fabric is a relatively new product so expect it to not provide all kinds of fancy sophisticated sharding schemes in theory but it is stable enough for you to trust it with what it provides.
By the way, both of them are open source so you could tweak them to suit your needs!

Trying to bolster MySQL failover

My MySQL servers are not configured properly for failover. I am thinking of using Red Hat Cluster or Heartbeat. Also, I need to achieve all of this with one floating IP, since the application does not know about multiple databases. Can someone suggest which route would be best?
If you decide on using regular replication I would recommend you using this:
MHA tool for MySQL replication high availability
I didn't have a good experience with the approaches you want to use; they are not as good as a cluster and not much better than normal replication with the tool I posted. I do believe you should take a look at this tool and maybe stick with regular replication, or jump to serious HA with MySQL Cluster. Configuring the cluster might be hard, but with Severalnines' solutions things might be a bit easier.

How to scale MySQL with multiple machines?

I have a web app running LAMP. We recently had an increase in load and are now looking at solutions to scale. Scaling Apache is pretty easy: we are just going to have multiple machines hosting it and round-robin the incoming traffic.
However, each instance of Apache will talk to MySQL, and eventually MySQL will be overloaded. How do we scale MySQL across multiple machines in this setup? I have already looked at this, but specifically we need the updates from the DB available immediately, so I don't think replication is a good strategy here? Also, hopefully this can be done with minimal code change.
PS. We have around a 1:1 read-write ratio.
There are only two strategies: replication and sharding. Replication is often used when you have few writes and many reads, so you can redirect the reads to many slaves, with the pitfalls of growing replication traffic over time and a chance of inconsistency.
With sharding you split your database tables across multiple machines (called functional sharding), which makes joins in particular much harder. If this no longer suffices, you also need to shard your rows across multiple machines, but this is no fun and depends on a sharding layer implemented between your application and the database.
Document oriented databases or column stores do this work for you, but they are currently optimized for OLAP not for OLTP.
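As a rough idea of what the "sharding layer between your application and the database" mentioned above has to do for row-level sharding, here is a small Python sketch. The class, the node names, and the hash-modulo scheme are all assumptions for illustration, not any particular product's design.

```python
import hashlib
from collections import defaultdict


class ShardRouter:
    """Hypothetical sharding layer: maps a row key to one of N database nodes."""

    def __init__(self, shard_names):
        if not shard_names:
            raise ValueError("at least one shard is required")
        self.shard_names = list(shard_names)

    def shard_for(self, key):
        # Use a stable hash rather than built-in hash(), which is randomized
        # per process for strings, so every app server agrees on the routing.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shard_names)
        return self.shard_names[index]

    def group_by_shard(self, keys):
        # A multi-row lookup has to be split into one query per shard,
        # and the results merged afterwards by the application.
        groups = defaultdict(list)
        for key in keys:
            groups[self.shard_for(key)].append(key)
        return dict(groups)


router = ShardRouter(["db-node-0", "db-node-1", "db-node-2"])
print(router.shard_for("user:42"))
print(router.group_by_shard(range(10)))
```

Note the catch with plain modulo hashing: adding or removing a node remaps most keys, which is one reason resharding is "no fun". Joins across keys that land on different nodes have to be done in the application.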
Depending on the application backend (i.e. how the PKs, transactions and insert IDs are handled), you might consider MASTER-MASTER replication with different auto_increment setups. This can be tricky and needs to be thoroughly tested, but it can work.
Also, in the new MySQL 5.6 there is a GTID (Global Transaction Identifier) that generally helps a lot in keeping the replication in sync, especially in this scenario.
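To illustrate the "different auto_increment setups" idea: MySQL has the server variables auto_increment_increment and auto_increment_offset, and giving each master the same increment but a different offset keeps their generated IDs from colliding. The Python below only simulates the resulting number series (offset, offset + increment, ...); it is a simplification and does not model how the server picks the next value after manual inserts.

```python
from itertools import count, islice


def id_sequence(offset: int, increment: int):
    """Simulated series of auto-generated IDs: offset, offset + increment, ..."""
    return count(start=offset, step=increment)


# Two masters, same increment, different offsets (values chosen for the example).
master_a = id_sequence(offset=1, increment=2)  # odd IDs
master_b = id_sequence(offset=2, increment=2)  # even IDs

ids_a = list(islice(master_a, 5))
ids_b = list(islice(master_b, 5))
print(ids_a, ids_b)

# Inserts on either master can replicate to the other without a PK clash.
assert not set(ids_a) & set(ids_b)
```

This only protects auto-generated primary keys. Conflicting updates to the same row on both masters are still possible, which is part of why this setup needs thorough testing.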
You should take a look at MySQL Performance Blog. Maybe you'll find something useful.
Well... good luck scaling all those writes to a real large scale. The database engine becomes the bottleneck, too many locks and buffers mgmt and stuff...
The only way I found that really works is scale-out, i.e. sharding. Unfortunately, sharding is not provided for MySQL "out of the box" (like in some NoSQLs such as Mongo). ScaleBase (disclaimer: I work there) is a maker of a complete scale-out solution, an "automatic sharding machine" if you like. ScaleBase analyzes your data and SQL stream, splits the data across DB nodes, routes commands and aggregates results at runtime, so you won’t have to!

Which database has the best support for replication

I have a fairly good feel for what MySQL replication can do. I'm wondering what other databases support replication, and how they compare to MySQL and others?
Some questions I would have are:
Is replication built in, or an add-on/plugin?
How does the replication work (high-level)? MySQL provides statement-based replication (and row-based replication in 5.1). I'm interested in how other databases compare. What gets shipped over the wire? How do changes get applied to the replicas?
Is it easy to check consistency between master and slaves?
How easy is it to get a failed replica back in sync with the master?
Performance? One thing I hate about MySQL replication is that it's single-threaded, and replicas often have trouble keeping up, since the master can be running many updates in parallel, but the replicas have to run them serially. Are there any gotchas like this in other databases?
Any other interesting features...
MySQL's replication is weak inasmuch as one needs to sacrifice other functionality to get full master/master support (due to the restriction on supported backends).
PostgreSQL's replication is weak inasmuch as only master/standby is supported built-in (using log shipping); more powerful solutions (such as Slony or Londiste) require add-on functionality. Archive log segments are shipped over the wire, which are the same records used to make sure that a standalone database is in a working, consistent state on unclean startup. This is what I'm using presently, and we have resynchronization (and setup, and other functionality) fully automated. None of these approaches are fully synchronous. More complete support will be built in as of PostgreSQL 8.5. Log shipping does not allow databases to come out of synchronization, so there is no need for processes to test the synchronized status; bringing the two databases back into sync involves setting the backup flag on the master, rsyncing to the slave (with the database still running; this is safe), and unsetting the backup flag (and restarting the slave process) with the archive logs generated during the backup process available; my shop has this process (like all other administration tasks) automated. Performance is a nonissue, since the master has to replay the log segments internally anyhow in addition to doing other work; thus, the slaves will always be under less load than the master.
Oracle's RAC (which isn't properly replication, as there's only one storage backend -- but you have multiple frontends sharing the load, and can build redundancy into that shared storage backend itself, so it's worthy of mention here) is a multi-master approach far more comprehensive than other solutions, but is extremely expensive. Database contents aren't "shipped over the wire"; instead, they're stored to the shared backend, which all the systems involved can access. Because there is only one backend, the systems cannot come out of sync.
Continuent offers a third-party solution which does fully synchronous statement-level replication with support for all three of the above databases; however, the commercially supported version of their product isn't particularly cheap (though vastly less expensive than Oracle RAC). Last time I administered it, Continuent's solution required manual intervention for bringing a cluster back into sync.
I have some experience with MS-SQL 2005 (publisher) and SQLEXPRESS (subscribers) with overseas merge replication. Here are my comments:
1 - Is replication built in, or an add-on/plugin?
Built in
2 - How does the replication work (high-level)?
There are different ways to replicate, from snapshot (giving static data at the subscriber level) to transactional replication (each INSERT/DELETE/UPDATE instruction is executed on all servers). Merge replication replicates only final changes (successive UPDATEs on the same record are applied at once during replication).
3 - Is it easy to check consistency between master and slaves?
Something I have never done ...
4 - How easy is it to get a failed replica back in sync with the master?
The basic resync process is just a double-click one ... But if you have 4 GB of data to reinitialize over a 64 Kb connection, it will be a long process unless you customize it.
5 - Performance?
Well ... You will of course have a bottleneck somewhere, be it your connection performance, your volume of data, or finally your server performance. In my configuration, users only write to subscribers, which all replicate with the main database = publisher. This server is then never solicited by end users, and its CPU is strictly dedicated to data replication (to multiple servers) and backup. Subscribers are dedicated to clients and one replication (to publisher), which gives a very interesting result in terms of data availability for end users. Replications between publisher and subscribers can be launched together.
6 - Any other interesting features...
It is possible, with some anticipation, to keep on developing the database without even stopping the replication process ... tables (in an indirect way), fields and rules can be added and replicated to your subscribers.
Configurations with a main publisher and multiple subscribers can be VERY cheap (when compared to some others...), as you can use the free SQLEXPRESS on the subscriber's side, even when running merge or transactional replications.
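To picture the difference described in point 2 between transactional replication (every statement is replayed) and merge replication (only the final change per record is sent), here is a toy Python model. It is only an illustration of the idea, not how SQL Server implements it; the log format and function name are invented for the example.

```python
def collapse_changes(change_log):
    """Reduce a change log to the final state of each row.

    change_log is a list of (operation, primary_key, row) tuples in commit
    order. A DELETE is recorded as None so the subscriber knows to remove
    the row.
    """
    final_state = {}
    for operation, key, row in change_log:
        final_state[key] = None if operation == "DELETE" else row
    return final_state


# Hypothetical changes made on one server between two synchronizations.
log = [
    ("INSERT", 1, {"qty": 1}),
    ("UPDATE", 1, {"qty": 2}),
    ("UPDATE", 1, {"qty": 5}),
    ("INSERT", 2, {"qty": 9}),
    ("DELETE", 2, None),
]

to_send = collapse_changes(log)
print(len(log), "logged changes,", len(to_send), "rows to send")
```

A transactional-style replay would ship all five entries; the merge-style view ships one final state per row, which matters a lot over a slow link.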
Try Sybase SQL Anywhere
Just adding to the options with SQL Server (especially SQL 2008, which has Change Tracking features now). Something to consider is the Sync Framework from Microsoft. There are a few options there, from the basic hub-and-spoke architecture, which is great if you have a single central server and sometimes-connected clients, right through to peer-to-peer sync, which gives you the ability to do much more advanced syncing with multiple 'master' databases.
The reason you might want to consider this instead of traditional replication is that you have a lot more control from code, for example you can get events during the sync progress for Update/Update, Update/Delete, Delete/Update, Insert/Insert conflicts and decide how to resolve them based on business logic, and if needed store the loser of the conflict's data somewhere for manual or automatic processing. Have a look at this guide to help you decide what's possible with the different methods of replication and/or sync.
For the keen programmers the Sync Framework is open enough that you can have the clients connect via WCF to your WCF Service which can abstract any back-end data store (I hear some people are experimenting using Oracle as the back-end).
My team has just released a large project that involves multiple SQL Express databases syncing sub-sets of data from a central SQL Server database via WAN and Internet (slow dial-up connection in some cases) with great success.
MS SQL 2005 Standard Edition and above have excellent replication capabilities and tools. Take a look at:
http://msdn.microsoft.com/en-us/library/ms151198(SQL.90).aspx
It's pretty capable. You can even use SQL Server Express as a readonly subscriber.
There are a lot of different things which databases CALL replication. Not all of them actually involve replication, and those which do work in vastly different ways. Some databases support several different types.
MySQL supports asynchronous replication, which is very good for some things. However, there are weaknesses. Statement-based replication is not the same as what most (any?) other databases do, and doesn't always result in the expected behaviour. Row-based replication is only supported by a non production-ready version (but is more consistent with how other databases do it).
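A small Python model may help show why statement-based replication "doesn't always result in the expected behaviour". The sketch assumes a statement whose result depends on server-local state (think of something like UUID() in an UPDATE); the seeded random generators below merely stand in for that local state, and the dictionaries stand in for tables.

```python
import random


def apply_statement(table, local_state):
    # Models re-running a non-deterministic statement, e.g. something like
    # "UPDATE t SET token = <server-generated value> WHERE id = 1".
    table[1] = local_state.random()


master, statement_replica, row_replica = {}, {}, {}

# Each server has its own local state (assumption: modelled as seeded RNGs).
master_state = random.Random(1)
replica_state = random.Random(2)

# The statement runs on the master.
apply_statement(master, master_state)

# Statement-based: the replica re-executes the same statement text.
apply_statement(statement_replica, replica_state)

# Row-based: the replica receives the resulting row values instead.
row_replica.update(master)

print("statement-based in sync:", statement_replica == master)
print("row-based in sync:", row_replica == master)
```

Shipping the changed rows sidesteps the problem because nothing is re-evaluated on the replica, at the cost of more data on the wire for statements that touch many rows.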
Each database has its own take on replication, some involve other tools plugging in.
A bit off-topic but you might want to check Maatkit for tools to help with MySQL replication.
All the main commercial databases have decent replication - but some are more decent than others. IBM Informix Dynamic Server (version 11 and later) is particularly good. It actually has two systems - one for high availability (HDR - high-availability data replication) and the other for distributing data (ER - enterprise replication). And the Mach 11 features (RSS - remote standalone secondary, and SDS - shared disk secondary) are excellent too, doubly so in 11.50 where you can write to either the primary or secondary of an HDR pair.
(Full disclosure: I work on Informix software.)
I haven't tried it myself, but you might also want to look into OpenBaseSQL, which seems to have some simple to use replication built-in.
Another way to go is to run in a virtualized environment. I thought the data in this blog article was interesting
http://chucksblog.typepad.com/chucks_blog/2008/09/enterprise-apps.html
It's from an EMC executive, so obviously, it's not independent, but the experiment should be reproducible
Here's the data specific for Oracle
http://oraclestorageguy.typepad.com/oraclestorageguy/2008/09/to-rac-or-not-to-rac-reprise.html
Edit: If you run virtualized, then there are ways to make anything replicate
http://chucksblog.typepad.com/chucks_blog/2008/05/vmwares-srm-cha.html