Robust fault tolerant MySQL replication - mysql

Is there any way to get a fault tolerant MySQL replication? I am in an environment that has many networking issues. It appears that replication gets an error and just stops. I need it to continue to work and recover from these faults. There is some wrapper software that checks the state of replication and restarts it in the case of losing its log position. Is there an alternative?
Note:
Replication is done from an embedded computer with MySQL 4.1 to a external computer that has MySQL 5.0.45

What error are you getting? You also haven't described what replication scheme or Mysql version you're using. The errors you're getting are also important.
Replication usually stops when there's a primary/unique key conflict in a Master-Master replication. Other than that on a typical Master-Slave replication setup, networking issues shouldn't cause problems.
Try using Mysql 5.1 or newer, since replication in 5.0 is statement-based and causes problems in Master-Master setups, or when you're using stored-procedures.
(Also, stay away from Mysql Cluster ... noticed the advice on another comment).

Replication errors only happen if the databases get out of sync somehow, having the server simply continue would mean incoherent databases, I really doubt you'd want that.
In my experience, the only time you end up with such errors is if one of the master servers did not complete a query and the slave noticed.
In any case, if you really want to have the slave continue via some sort of chron job, you could always have a query run every few minuts asking the slave "SHOW SLAVE STATUS" then checking the error column, if it's present, send a "STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER; START SLAVE;" command. But it would probably be much more apt to send an email to an admin when mysql encounters an error instead, so he/she can investigate the source of the problem and make sure the databases are actually in sync, otherwise you're likely to see more errors in the near future as the databases become more and more out of sync.

Consider MySQL Cluster using the NDB storage engine, it's meant to be shared-nothing and fault tolerant

MySQL replication will normally detect problems and reconnect anyway, continuing from where it left off.
If you're getting replication errors, it's likely that the source is something else. MySQL replication effectively does a "tail -f" on the query log and replays it on the slave (it's slightly smarter than that, but not much).
If the databases become out of sync, MySQL replication will neither detect nor repair this, but it may eventually cause it to break as a subsequent update cannot proceed due to conflicting data on the slave.
The default timeouts on the replication slave are much too long - it waits hours (or something) - you'll want to reduce this.
Data becoming out of sync is difficult to avoid, mitigation steps are:
Monitor replication using something like mk-table-checksum from Maatkit
Audit all your code for replication-unsafe queries
If using 5.1, switch to row-based replication, which is less likely to suffer from this problem

Related

mysql multi-source replication stability

We use mysql multi-source replication, analysts and developers works with the databases, it's a main task of server(cross-bases queries, etc). Mysql slave replicates data from about 10-15 servers, some of them is realy big (400gb, 500gb, 1.5tb). Slave host - ec2 r4.2xlarge. But have some problems, main of them - stability. Often there is an errors 1236, 1594. We fix one channel - another fails, sometimes it's critical.
Backups of most master databases are performed through ebs snapshot, where datadir located. But here another problem of mysql multi-source - work with backups, unlike multi instance replication, I can not just change datadir, where backup located.
I was trying to find a solution that at least partially solved the problems of stability and work with snapshots, but I found absolutely nothing.
Did I understand correctly that there is no support for multi-source replication in the aws-RDS?
Maybe there is a similar solution, or there is another approach to solving the problem (FEDERATED is bad variant). Thanks for any help and advices.
As Michael said, you shouldn't be encountering these errors. Multi source replication has quite stable now. I personally been using it for more than 2 years now. Yes I have faced some of issues like "slave trying to access binlogs that master has purged", but most of them were easy to sort out. I would recommend you to look at these replication topics:
Errant transactions
GTID hole : this was there because of bug
max allowed packet error
We fix one channel - another fails, sometimes it's critical.
make sure you're not touching GTID set for other channels.
Did I understand correctly that there is no support for multi-source
replication in the aws-RDS?
The feature still not available in any of AWS cloud services. https://forums.aws.amazon.com/thread.jspa?messageID=781416&tstart=0
Multi source replication is not supported in RDS , you can use AWS DMS for same purpose .

MYSQL replication does not work for all tables

I have a problem with MySQL replication - there is one table on master server which doesn't appear on slave server. Both master and slave has the same master_log_file and master_log_position, both slave_io and slave_sql threads are running, I even tried to add an empty table to the master database but it does appear on the slave database. It's not the first time I got such error but before that my symptopms were new data did not appear in slave database. Are there any other solutions for this problem than stopping replication on the slave, dropping the database, dumping it on master server, rsyncing to slave server and restarting replication from new file/position?
I noticed using
SHOW SLAVE STATUS;
that Relay_Log_Pos is smaller than Read_Master_Log_Pos and Relay_Log_File differ from Master_Log_File but Slave_SQL_Running_State says
Slave has read all relay log; waiting for more updates
Seconds_Behind_Master says 0.
MySQL officially only supports replication to the next higher version (although it will work for 5.7.13+), see Replication Compatibility Between MySQL Versions:
MySQL supports replication from one release series to the next higher release series. For example, you can replicate from a master running MySQL 5.5 to a slave running MySQL 5.6, from a master running MySQL 5.6 to a slave running MySQL 5.7, and so on.
However, you may encounter difficulties when replicating from an older master to a newer slave if the master uses statements or relies on behavior no longer supported in the version of MySQL used on the slave.
By default, replication will stop if an error occurs, and you have to restart it (after fixing the error). If you use the --slave-skip-errors=all-option however, it will skip these errors:
Normally, replication stops when an error occurs on the slave, which gives you the opportunity to resolve the inconsistency in the data manually. This option causes the slave SQL thread to continue replication when a statement returns any of the errors listed in the option value.
Do not use this option unless you fully understand why you are getting errors. If there are no bugs in your replication setup and client programs, and no bugs in MySQL itself, an error that stops replication should never occur. Indiscriminate use of this option results in slaves becoming hopelessly out of synchrony with the master, with you having no idea why this has occurred.
MySQL 5.5. and 5.7. will actually behave differently for a lot of statements, so enabling this option in this scenario will require even more care.
Without seeing your actual table create-statement, it is unclear what exactly caused that problem and how to fix it (or if it is possible), but you should especially check your configuration settings. MySQL 5.7. enables strict mode by default, so a usual suspect for incompatibilites is e.g. a zero default value for date/timestamp-columns like default '0000-00-00' (either explicit or implicit), which is not allowed anymore, see no_zero_date.
Even if you seem to not be too keen about 100% replication (which can snowball very fast, but that is up to you to evaluate for your scenario), resetting your slave (after fixing e.g. the configuration settings) at least once is probably the easiest solution, as there might have been other things you may have missed, and, if executed without errors, will also doublecheck if your tables and data up to that point are compatible with your 5.7-slave now.

What when mysqldump becomes utterly slow

Currently my database is almost 20 GB big and still growing.
I'm taking a daily backup with mysqldump and it's getting really slow.
So slow that in the meanwhile new connections stack up and eventually cause this error:
SQLSTATE[HY000] [1040] Too many connections
(I could improve the amount of connections that's accepted but that won't do anything because the connections are still just frozen, waiting for the backup to complete, which will lead to timeout)
I've been reading up on some options to improve the speed and this is what I've found:
option --quick (Will probably help)
option --single-transaction (Will prevent tables from being locked, but may cause database to become incorrect)
Master-Slave replication (Probably the best thing I could do, one problem, I have only one server available)
The master-slave replication really sounds like it's the best option since I can stop the slave from updating, take the backup, and let it resume syncing. The problem is I only have one fysical machine to work with.
I know that I can set up multiple mysql instances on this one server. The question is: Is it wise to do so?
The slave is really only used to generate that backup file (which will be copied to a different disk on the network) so that the master can stay live.
if you use just innodb - try xtrabackup.
if you use both myisam and innodb - flush + lvm snapshot + file-level copy might work for you.
indeed replication slave for backups is good idea as well. just remember to periodically check data consistency between the master and the slave.

How to ensure MySQL replication SLAVE is fully synchronized with the replication MASTER?

Using simple replication settings with one MASTER and one SLAVE, how can one ensure that the SLAVE and MASTER are fully synchronized?
Now yes, they both started from the exact same image and replication is working and reporting that everything is okay BUT:
* It has happened that there were errors stopping the replication and then the replication had to be stopped and later resumed.
* Perhaps a change accidentally occurred on the SLAVE and then it's not the same as the MASTER anymore.
* Other whichever scenarios that might break sync.
While it's possible to do a big mysqldump of both database and compare the files I would be interested in a method that can be implemented more easily and also can be checked automatically to ensure all is in sync.
Thanks
Have you tried Percona Toolkit (formerly known as Maatkit)? You can use one of their tools which is pt-table-checksum for your case. You can check other tools too at their website.
pt-table-checksum performs an online replication consistency check by
executing checksum queries on the master, which produces different
results on replicas that are inconsistent with the master. The
optional DSN specifies the master host. The tool’s exit status is
nonzero if any differences are found, or if any warnings or errors
occur.
The following command will connect to the replication master on
localhost, checksum every table, and report the results on every
detected replica:
If you have MySQL Server versions 5.6.14 or higher, you can use the MySQL Replication Synchronization Checker. It's included in the MySql server package. Is designed to work exclusively for servers that support global transaction identifiers (GTIDs) and have gtid_mode=ON.
This utility permits you to check replication servers for synchronization. It checks data consistency between a master and slaves or between two slaves. The utility reports missing objects as well as missing data.
The utility can operate on an active replication topology, applying a synchronization process to check the data. Those servers where replication is not active can still be checked but the synchronization process will be skipped. In that case, it is up to the user to manually synchronize the servers.
See MySQL Documentation for more information
You are right to be suspicious of a seemingly healthy master/slave replication setup! We were running fine when suddenly we got alerts from check_mk concerning a database that existed on our master that did not exist on our slave... but the master and slave status outputs were good! How unnerving is that? The way to prove integrity of the process is to use checksums to verify the data.
I have seen a lot of chatter on the Internet recommending pt-table-checksum . However, its limitations proved to be too onerous for us to be comfortable with. Most importantly, it requires and even sets statement-based replication (see the pt-table-checksum link). As it says in the mysql 5.6 online documentation, (for row-based replication...) "all changes can be replicated. This is the safest form of replication." There are other disadvantages to statement-based replication that make our developers nervous because some functions cannot be replicated properly; see the doc for a list.
We have already experienced issues with a master and slave using statement-based replication so we're specifically trying to avoid it.
We are going to try mysqlrplsync which specifically mentions that it "works independently of the binary log format (row, statement, or mixed)". It also mentions, however, that gtid-mode must be on and it requires MySQL 5.6.14 and higher... which means, I believe, that the MySQL delivered with RHEL7/CentOS 7 at least is out. You'll need to get the MySQL Community Edition, which is left as an exercise for the reader but you can go here for the packages or here for the repos, including RHEL derivatives and Debian.

Two mysql servers using same database

I have a MySQL database running on our server at this location.
However, the internet connection at this location is slow (Especially when several users are connected remotely).
We also have a remote web server on a very fast internet connection.
Can I run another MySQL server on the remote server and still be able to run queries and updates on it?
I want to have two servers because
- Users at this location can connect via lan (fast)
- Users working remotely can connect to synced remote server (fast)
Is this possible? From what I understand replication does not work this way. What is replication used for then? Backups?
Thanks for your help!
[Edit]
After doing some more reading, I am a little worried about setting up multi-master replication due to the fact that I had not considered multi-master when designing the database and conflicts could be an issue.
The good news though is that most time consuming operations are queries not updates.
And, I found out that there is a driver that handles master-slave connections.
http://dev.mysql.com/doc/refman/5.1/en/connector-j-reference-replication-connection.html
That way writes will be sent to the master and reads can come from the faster connection.
Has anyone tried doing this before? My one concern is that if I update to the master, then run a query expecting to see the update on the slave, will it be there right away? Or will the slow connection make this solution just as slow as using the master for both read and write?
What you're asking, I believe, is called Multi-Master Replication, by which both servers serve as replication masters to each other. Changes on either server become replicated back to the other as soon as possible. MySQL can be configured to do it, however I'm not sure how the differences in speed would affect your performance and data integrity.
http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-replication-multi-master.html