I have a problem with MySQL replication: there is one table on the master server which doesn't appear on the slave server. Both master and slave have the same master_log_file and master_log_position, and both the slave_io and slave_sql threads are running. I even tried to add an empty table to the master database, but it doesn't appear on the slave database either. It's not the first time I've hit such an error, but previously the symptom was that new data did not appear in the slave database. Are there any other solutions for this problem besides stopping replication on the slave, dropping the database, dumping it on the master server, rsyncing it to the slave server, and restarting replication from the new file/position?
I noticed using
SHOW SLAVE STATUS;
that Relay_Log_Pos is smaller than Read_Master_Log_Pos and Relay_Log_File differs from Master_Log_File, but Slave_SQL_Running_State says
Slave has read all relay log; waiting for more updates
and Seconds_Behind_Master is 0.
MySQL officially supports replication only to the next higher release series (although it will work for 5.7.13+); see Replication Compatibility Between MySQL Versions:
MySQL supports replication from one release series to the next higher release series. For example, you can replicate from a master running MySQL 5.5 to a slave running MySQL 5.6, from a master running MySQL 5.6 to a slave running MySQL 5.7, and so on.
However, you may encounter difficulties when replicating from an older master to a newer slave if the master uses statements or relies on behavior no longer supported in the version of MySQL used on the slave.
By default, replication stops if an error occurs, and you have to restart it (after fixing the error). If you use the --slave-skip-errors=all option, however, it will skip these errors:
Normally, replication stops when an error occurs on the slave, which gives you the opportunity to resolve the inconsistency in the data manually. This option causes the slave SQL thread to continue replication when a statement returns any of the errors listed in the option value.
Do not use this option unless you fully understand why you are getting errors. If there are no bugs in your replication setup and client programs, and no bugs in MySQL itself, an error that stops replication should never occur. Indiscriminate use of this option results in slaves becoming hopelessly out of synchrony with the master, with you having no idea why this has occurred.
MySQL 5.5 and 5.7 actually behave differently for a lot of statements, so enabling this option in this scenario requires even more care.
Without seeing your actual CREATE TABLE statement, it is unclear what exactly caused the problem and how to fix it (or whether it can be fixed), but you should especially check your configuration settings. MySQL 5.7 enables strict mode by default, so a usual suspect for incompatibilities is, e.g., a zero default value for DATE/TIMESTAMP columns like DEFAULT '0000-00-00' (either explicit or implicit), which is no longer allowed; see NO_ZERO_DATE.
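To check whether that is what is biting you, inspect the slave's SQL mode (a quick sketch; the mode string in the second statement is only an example, namely the 5.7 default with the NO_ZERO_* flags removed, not a recommendation):
SELECT @@GLOBAL.sql_mode;
SET GLOBAL sql_mode = 'ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';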
Even if you are not too keen on 100% replication consistency (inconsistencies can snowball very fast, but that is up to you to evaluate for your scenario), resetting your slave (after fixing, e.g., the configuration settings) at least once is probably the easiest solution. There might have been other things you missed, and if the reset runs without errors, it will also double-check that your tables and data up to that point are compatible with your 5.7 slave.
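For completeness, a minimal sketch of such a reset (the log file name and position below are placeholders; take the real ones from the commented CHANGE MASTER line in the dump, and note that --single-transaction is only consistent for InnoDB tables):
-- On the master: mysqldump --all-databases --master-data=2 --single-transaction > dump.sql
-- On the slave, after importing dump.sql:
STOP SLAVE;
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000123', MASTER_LOG_POS=4567;
START SLAVE;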
Related
I'm doing migration work from MySQL to MariaDB where replication is involved. Everything is working fine, and compatibility from a MySQL master (5.5.59) to a MariaDB slave (10.1.26) is good.
The problem occurs when I enable replication from a MariaDB master to a MariaDB slave (same version: 10.1.26). In some situations, specifically during massive updates, the slave starts to lag.
If I restore the master to MySQL (5.5.59) and replicate to the same MariaDB slave, the lag never occurs on the same set of updates.
I checked the relay logs on the lagging MariaDB slave, comparing the ones received when MySQL is the master with the ones received when MariaDB is the master; the only difference is that when the master is MariaDB I can see statements related to GTID.
I would like to suppress the GTID statements in the relay log when the master is MariaDB and make replication similar to the "old style" MySQL replication without GTID, but I have not found whether that is possible.
The replication lag was due to the engine of the mysql.gtid_slave_pos table on the slave server: by default this table is InnoDB, while the tables that were receiving the replication updates are not InnoDB.
As explained in the link below, every transaction executed by the slave also causes an update to mysql.gtid_slave_pos; if the engines of the tables differ, that can cause very poor performance (in my case the server was lagging 4000+ seconds; after changing the engine of mysql.gtid_slave_pos, replication is now immediate).
https://mariadb.com/kb/en/library/mysqlgtid_slave_pos-table/
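The fix itself is a single statement (a sketch; MyISAM here is only an example, pick the engine your replicated tables actually use, and stop the slave threads first):
STOP SLAVE;
ALTER TABLE mysql.gtid_slave_pos ENGINE = MyISAM;
START SLAVE;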
From MariaDB 10.3.1, a new parameter has been introduced to help with this problem: gtid_pos_auto_engines. It makes the server create a separate mysql.gtid_slave_pos table for each engine involved in the replication. Unfortunately this does not seem possible with previous versions of MariaDB: there the mysql.gtid_slave_pos table must be unique, and the choice of its engine is up to the DBA and the tables/queries involved in the replication.
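For reference, on 10.3.1+ the variable can be set dynamically (a sketch; the engine list is just an example and should match the engines of your replicated tables):
SET GLOBAL gtid_pos_auto_engines = 'InnoDB,MyISAM';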
I'm using MySQL 5.7 with GTID master-master replication and I'm experiencing a strange error.
Randomly one of my masters will stop replicating with: "Cannot replicate anonymous transaction when @@GLOBAL.GTID_MODE = ON"
When I check, there is indeed an anonymous entry in the binlog, but what isn't clear is how it got there, since the other master also has GTID_MODE=ON and that should not allow any anonymous transactions to execute or make their way into the binlog.
enforce_gtid_consistency is also ON so queries that would result in an Anonymous transaction should be failing.
It's also only ever a single query/transaction; the previous and next queries in the binlog always have GTIDs.
I ran into this same issue. The first time I saw it was when I upgraded to MySQL 5.7.12.
I don't have a workaround/fix yet either. What version of MySQL are you on?
Setting GTID_MODE to ON_PERMISSIVE also worked for me, on MySQL 8.0.12 with the same error.
I left it at ON_PERMISSIVE because I don't see any disadvantage to it.
SET @@GLOBAL.GTID_MODE = ON_PERMISSIVE;
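For context, per the MySQL documentation: with ON_PERMISSIVE, new transactions are still GTID transactions, but replicated transactions may be either anonymous or GTID, which is why the stray anonymous event no longer stops replication. You can verify the relevant settings with:
SELECT @@GLOBAL.GTID_MODE, @@GLOBAL.ENFORCE_GTID_CONSISTENCY;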
Using simple replication settings with one MASTER and one SLAVE, how can one ensure that the SLAVE and MASTER are fully synchronized?
Now yes, they both started from the exact same image and replication is working and reporting that everything is okay BUT:
* It has happened that errors stopped the replication, which then had to be fixed and resumed later.
* Perhaps a change accidentally occurred on the SLAVE, and then it's not the same as the MASTER anymore.
* Whatever other scenarios might break sync.
While it's possible to do a big mysqldump of both databases and compare the files, I would be interested in a method that can be implemented more easily and checked automatically to ensure everything is in sync.
Thanks
Have you tried Percona Toolkit (formerly known as Maatkit)? For your case you can use their pt-table-checksum tool. You can check out their other tools on their website too.
pt-table-checksum performs an online replication consistency check by executing checksum queries on the master, which produces different results on replicas that are inconsistent with the master. The optional DSN specifies the master host. The tool’s exit status is nonzero if any differences are found, or if any warnings or errors occur.
The following command will connect to the replication master on localhost, checksum every table, and report the results on every detected replica:
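In its minimal form that command is just the following (add user/password DSN options as your setup requires):
pt-table-checksum h=localhost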
If you have MySQL Server version 5.6.14 or higher, you can use the MySQL Replication Synchronization Checker (mysqlrplsync), which ships with MySQL Utilities. It is designed to work exclusively with servers that support global transaction identifiers (GTIDs) and have gtid_mode=ON.
This utility permits you to check replication servers for synchronization. It checks data consistency between a master and slaves or between two slaves. The utility reports missing objects as well as missing data.
The utility can operate on an active replication topology, applying a synchronization process to check the data. Those servers where replication is not active can still be checked but the synchronization process will be skipped. In that case, it is up to the user to manually synchronize the servers.
See MySQL Documentation for more information
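A typical invocation looks something like this (a sketch; the user, host names, and ports are placeholders for your own topology):
mysqlrplsync --master=rpl_admin@master_host:3306 --slaves=rpl_admin@slave_host:3306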
You are right to be suspicious of a seemingly healthy master/slave replication setup! We were running fine when suddenly we got alerts from check_mk about a database that existed on our master but did not exist on our slave... yet the master and slave status outputs were good! How unnerving is that? The way to prove the integrity of the process is to use checksums to verify the data.
I have seen a lot of chatter on the Internet recommending pt-table-checksum. However, its limitations proved too onerous for us to be comfortable with. Most importantly, it requires, and even sets, statement-based replication (see the pt-table-checksum link). As the MySQL 5.6 online documentation says of row-based replication, "all changes can be replicated. This is the safest form of replication." There are other disadvantages to statement-based replication that make our developers nervous, because some functions cannot be replicated properly; see the doc for a list.
We have already experienced issues with a master and slave using statement-based replication, so we're specifically trying to avoid it.
We are going to try mysqlrplsync, which specifically mentions that it "works independently of the binary log format (row, statement, or mixed)". It also mentions, however, that gtid-mode must be on and that it requires MySQL 5.6.14 or higher... which means, I believe, that at least the MySQL delivered with RHEL 7/CentOS 7 is out. You'll need to get the MySQL Community Edition, which is left as an exercise for the reader, but you can go here for the packages or here for the repos, including RHEL derivatives and Debian.
Is there any way to get fault-tolerant MySQL replication? I am in an environment with many networking issues. It appears that replication hits an error and just stops; I need it to keep working and recover from these faults. There is some wrapper software that checks the state of replication and restarts it when it loses its log position. Is there an alternative?
Note:
Replication is done from an embedded computer running MySQL 4.1 to an external computer running MySQL 5.0.45.
What error are you getting? You also haven't described what replication scheme or MySQL version you're using; the exact errors matter.
Replication usually stops when there's a primary/unique key conflict in master-master replication. Other than that, on a typical master-slave replication setup, networking issues shouldn't cause problems.
Try using MySQL 5.1 or newer, since replication in 5.0 is statement-based and causes problems in master-master setups or when you're using stored procedures.
(Also, stay away from MySQL Cluster ... I noticed the advice in another answer.)
Replication errors only happen if the databases somehow get out of sync; having the server simply continue would mean incoherent databases, and I really doubt you'd want that.
In my experience, the only time you end up with such errors is if one of the master servers did not complete a query and the slave noticed.
In any case, if you really want the slave to continue via some sort of cron job, you could have a query run every few minutes asking the slave SHOW SLAVE STATUS and checking the error column; if an error is present, send a "STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;" command. But it would probably be much more apt to send an email to an admin when MySQL encounters an error instead, so he/she can investigate the source of the problem and make sure the databases are actually in sync; otherwise you're likely to see more errors in the near future as the databases drift more and more out of sync.
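Spelled out, that skip sequence is (a sketch only; as said above, skipping events blindly trades visible errors for silent drift):
SHOW SLAVE STATUS;                        -- inspect the Last_Errno / Last_Error columns
STOP SLAVE;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;    -- skip exactly one event from the relay log
START SLAVE;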
Consider MySQL Cluster using the NDB storage engine; it's meant to be shared-nothing and fault-tolerant.
MySQL replication will normally detect problems and reconnect anyway, continuing from where it left off.
If you're getting replication errors, it's likely that the source is something else. MySQL replication effectively does a "tail -f" on the query log and replays it on the slave (it's slightly smarter than that, but not much).
If the databases become out of sync, MySQL replication will neither detect nor repair this, but the inconsistency may eventually break replication when a subsequent update cannot proceed due to conflicting data on the slave.
The default timeouts on the replication slave are much too long - it waits an hour or so before noticing a dead connection - so you'll want to reduce this.
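The knobs involved look like this (values are examples; slave_net_timeout defaulted to 3600 seconds, a full hour, on old versions):
SET GLOBAL slave_net_timeout = 60;           -- seconds of silence before the slave reconnects
STOP SLAVE;
CHANGE MASTER TO MASTER_CONNECT_RETRY = 10;  -- seconds between reconnect attempts
START SLAVE;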
Data becoming out of sync is difficult to avoid; mitigation steps are:
Monitor replication using something like mk-table-checksum from Maatkit
Audit all your code for replication-unsafe queries
If using 5.1, switch to row-based replication, which is less likely to suffer from this problem
I have two servers configured as a master-master pair using MMM. I recently had an issue where the passive master hit a replication error (got a packet bigger than max_allowed_packet), but the slave IO and SQL threads continued running. And Seconds_Behind_Master was still showing 0 even though the slave was not executing new statements.
I thought this type of error would cause replication to stop (it has done so in the past). Instead, replication kept running and our monitors didn't notice the problem. Also, the replication errors continually showed up in the MySQL error log instead of in "Last_Error" in "show slave status".
We are running version 5.0.33.
Any ideas what happened here? Thanks!
For the max allowed packet size, it sounds like your two DBs are not configured identically. At least the network protocol settings should be identical.
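A quick way to compare (a sketch; 64 MB is just an example value, and whatever you choose should match on both servers and be persisted in my.cnf):
SHOW VARIABLES LIKE 'max_allowed_packet';
SET GLOBAL max_allowed_packet = 67108864;    -- 64 MB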
Did you try SHOW SLAVE STATUS on both machines?
Quiet failure is a terrible situation. I wonder what records did not make it. Do you have a way of finding out?
Are you getting periodic errors in the error log or a flood of identical errors? Is the sequence number incrementing on the passive master?
Jacob