Strange slow performance in MySQL replication

I have a MySQL (MariaDB 10.1.27) master that holds ~1900 databases, ~100 GB in total.
It uses the InnoDB engine with innodb_file_per_table = 1.
I am setting up two slaves for this server (also MariaDB 10.1.27).
Both slaves have hardware identical to the master.
The master was stopped at midnight, a backup of /var/lib/mysql was made with tar,
and the tarball was transferred to and extracted on two separate servers.
One slave was set up quickly and caught up with the master (one day of delay) in about an hour.
While it was catching up, CPU usage was at 100% (load average 3-4) with a lot of disk activity (~20% wa).
The second slave has a strange replication performance problem.
Since it was set up it has been catching up with the master for more than two days, and the process is only about 20% done.
Current show slave status:
show slave status\Gshow processlist;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: XXXXXXXXXX
Master_User: XXXXXXXX
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysqld-bin.000572
Read_Master_Log_Pos: 524619108
Relay_Log_File: relay-bin.000004
Relay_Log_Pos: 916671538
Relay_Master_Log_File: mysqld-bin.000571
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table: apsc.%,billing.%,information_schema.%,performance_schema.%,horde.%,mysql.%,phpmyadmin_gmRJC4rn_tom.%,psa.%,roundcubemail.%,sitebuilder5.%
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 916671245
Relay_Log_Space: 1598365917
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 24730
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 111
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: conservative
1 row in set (0.00 sec)
There is no serious CPU or disk activity (see the top output below):
======
top - 03:36:36 up 114 days, 12:26, 9 users, load average: 1.82, 1.86, 1.89
Tasks: 364 total, 1 running, 362 sleeping, 0 stopped, 1 zombie
Cpu(s): 1.5%us, 0.7%sy, 0.0%ni, 87.7%id, 4.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32993560k total, 32271192k used, 722368k free, 625464k buffers
Swap: 4194296k total, 229512k used, 3964784k free, 23100280k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13809 mysql 1 -19 6819m 5.8g 7332 S 8.9 18.5 132:12.33 mysqld
======
As far as I can see, all binlogs have already been transferred from the master to the slave,
but the slave is very slow in applying these logs to the databases.
The slow query log (2-second threshold) contains no entries.
Googling the problem did not help. I have already tried:
skip-name-resolve
innodb_flush_log_at_trx_commit=0
innodb-flush-method=O_DIRECT
# move InnoDB logs to SSD
innodb_log_group_home_dir=/var/lib/mysql_logs
An attempt to set up multi-threaded replication (2 and 4 threads) failed: it slowed the process down even further, and the slave's lag (Seconds_Behind_Master in show slave status) kept increasing.
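For reference, the parallel replication attempt was roughly along these lines (just a sketch of the MariaDB commands, with the 2 and 4 thread counts mentioned above):
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_threads = 4;  -- also tried with 2
START SLAVE SQL_THREAD;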
Please advise me on what the source of the problem could be and what can be done to make replication faster.

Related

Issues with MySql replication on MariaDB

I have been trying to get MySQL replication set up on DigitalOcean with Forge servers and MariaDB.
I keep getting this error when running slave status\g :
Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this does not always make sense; please check the manual before using it).
This is the tutorial I followed:
https://www.digitalocean.com/community/tutorials/how-to-set-up-master-slave-replication-in-mysql
I've checked the server-id in both my.conf files and the master is set to 1 and the slave 2.
Here's a dump of the full status\g output
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: *****
Master_User: slave_user
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mariadb-bin.000017
Read_Master_Log_Pos: 642
Relay_Log_File: mysqld-relay-bin.000002
Relay_Log_Pos: 4
Relay_Master_Log_File: mariadb-bin.000017
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 1
Exec_Master_Log_Pos: 642
Relay_Log_Space: 249
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1593
Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this does not always make sense; please check the manual before using it).
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: conservative
Can anyone help?
Check that the config file is being used. It is probably /etc/my.cnf (not my.conf).
Run SHOW VARIABLES LIKE 'server_id'; on both servers.
Check that server_id is in the [mysqld] section of my.cnf.
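For example, a minimal check could look like this (a sketch; the expected values assume master = 1 and slave = 2 as described above):
-- on the master
SHOW VARIABLES LIKE 'server_id';  -- should return 1
-- on the slave
SHOW VARIABLES LIKE 'server_id';  -- should return 2; if both servers report the same value,
                                  -- the server-id line in the [mysqld] section is not being read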

mysql data not replicating, but replication seems fine

I have a 4-machine MySQL cluster running MariaDB 10.0.21. I have two masters set up that slave off each other, and then two read-only slaves that replicate from the 1st and 2nd DBs respectively.
In short, the setup looks like this:
db1 replicates -> db2
db1 replicates -> db3
db2 replicates -> db1
db2 replicates -> db4
It's a pretty basic 4 db setup.
The problem I'm having is that I set up a nagios user on the 1st DB to monitor the cluster.
I checked, and with the nagios user I could log into the first 3 databases but not the 4th. I could do that without creating the nagios user on each DB because I'm replicating my mysql database on the first two database machines.
But for some reason the nagios user was never created on db4 the way it was on db2, even though replication seems fine on all nodes.
As you probably recall, I have db2 replicating to db4.
And if I do a show master status on db2, I can see that I'm replicating the mysql db:
MariaDB [mysql]> show master status;
+--------------------+----------+------------------------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+--------------------+----------+------------------------------+------------------+
| mariadb-bin.000078 | 376 | jfwiki,jokefire,bacula,mysql | |
+--------------------+----------+------------------------------+------------------+
1 row in set (0.00 sec)
If I check that the nagios user is there on the 2nd db, it is.
MariaDB [mysql]> select User,Host,Password from user where user like 'nagios';
+--------+-------------+-------------------------------------------+
| User | Host | Password |
+--------+-------------+-------------------------------------------+
| nagios | 52.4.204.96 | *somepasswordhash |
+--------+-------------+-------------------------------------------+
1 row in set (0.00 sec)
This user was not created on db2, but is there because of the replication.
And if I check the slave status on db4, replication seems completely fine:
[root@db4:~] # mysql -e "show slave status\G" | egrep "Slave_IO_State|Master_Host|Slave_IO_Running|Slave_SQL_Running|Last_Errno|Seconds_Behind_Master"
Slave_IO_State: Waiting for master to send event
Master_Host: db2.example.com
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Errno: 0
Seconds_Behind_Master: 0
And if I check for the presence of the nagios user on db4 the way I did on db2, the user just isn't there:
MariaDB [mysql]> select User,Host from user where user like 'nagios';
Empty set (0.00 sec)
So my question is: why did the nagios user not get replicated to db4 the way it did from db1 -> db2 and from db1 -> db3, even though slave replication on db4 seems to be okay? I could log into all those hosts using the nagios user from the monitoring host.
Here's the full output of the slave replication command on db4 in case I missed anything from my grep earlier:
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: db2.example.com
Master_User: jf_slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mariadb-bin.000078
Read_Master_Log_Pos: 376
Relay_Log_File: db4-relay-bin.000044
Relay_Log_Pos: 537
Relay_Master_Log_File: mariadb-bin.000078
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 376
Relay_Log_Space: 1121
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: Yes
Master_SSL_CA_File: /opt/mysql/ca.crt
Master_SSL_CA_Path:
Master_SSL_Cert: /opt/mysql/db4.example.com.crt
Master_SSL_Cipher:
Master_SSL_Key: /opt/mysql/db4.example.com.key
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_SSL_Crl: /opt/mysql/ca.crt
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
1 row in set (0.00 sec)
I am seeing some errors in the mariadb logs on db4, however they're no different than the errors I'm seeing on the 1st 3 databases where the nagios user data replicated successfully.
151004 15:34:36 [Note] Error reading relay log event: slave SQL thread was killed
151004 15:34:36 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
151004 15:34:36 [Note] Slave I/O thread killed while reading event
151004 15:34:36 [Note] Slave I/O thread exiting, read up to log 'mariadb-bin.000078', position 376
151004 15:36:47 [Note] Slave SQL thread initialized, starting replication in log 'mariadb-bin.000078' at position 376, relay log './db4-relay-bin.000042' position: 537
151004 15:36:47 [Note] Slave I/O thread: connected to master 'jf_slave@db2.example.com:3306',replication started in log 'mariadb-bin.000078' at position 376
151007 4:24:12 [Note] Error reading relay log event: slave SQL thread was killed
151007 4:24:12 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
151007 4:24:12 [Note] Slave I/O thread killed while reading event
151007 4:24:12 [Note] Slave I/O thread exiting, read up to log 'mariadb-bin.000078', position 376
151007 4:28:20 [Note] Slave SQL thread initialized, starting replication in log 'mariadb-bin.000078' at position 376, relay log './db4-relay-bin.000043' position: 537
151007 4:28:20 [Note] Slave I/O thread: connected to master 'jf_slave@db2.example.com:3306',replication started in log 'mariadb-bin.000078' at position 376
So why would there be this database inconsistency when all the indicators of replication seem okay? Why can't the nagios user log into db4 the way it can on the first 3 DBs?
Thanks
Because you need to set Log_slave_updates in the CNF file on each server.
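In other words, something along these lines on each server that acts as a master for another slave (a sketch of the relevant my.cnf lines only, here for db2):
[mysqld]
log-bin           = mariadb-bin
log_slave_updates = 1  # write changes received via replication into this server's own binlog,
                       # so db4 (replicating from db2) also sees changes that originated on db1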

Mysql slave out of sync after crash

We have a "1 master, 1 slave" MySQL setup. We had a sudden power outage that took down the slave. After getting the machine back up, I found that the slave was out of sync with the master:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.1
Master_User: slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-log.001576
Read_Master_Log_Pos: 412565824
Relay_Log_File: mysqld-relay-bin.002671
Relay_Log_Pos: 6930
Relay_Master_Log_File: mysql-log.001573
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table: blah.table2
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1032
Last_Error: Could not execute Update_rows event on table blah.info; Can't find record in 'info', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-log.001573, end_log_pos 689031225
Skip_Counter: 0
Exec_Master_Log_Pos: 689030864
Relay_Log_Space: 2944772417
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Update_rows event on table blah.info; Can't find record in 'info', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-log.001573, end_log_pos 689031225
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)
We're using a binlog format of "ROW", so when I try to use mysqlbinlog to look at the offending row, I don't see anything of use. I don't want to simply set the skip counter, because I think that would throw my table even further out of sync.
Is there anything I can do on the slave that would essentially "roll back" to a given point in time, where I could then reset the master log number, position, etc.? If not, is there anything at all that I can do to get back in sync?
One can usually recover from small discrepancies using pt-table-checksum and pt-table-sync.
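For reference, the usual pattern with those tools is roughly the following (a sketch; the host, user and checksum database are placeholders):
# run against the master; per-chunk checksums are written into percona.checksums and replicate to the slave
pt-table-checksum --replicate=percona.checksums h=master_host,u=checksum_user,p=...
# then, still pointed at the master, print (or --execute) the statements needed to bring the slave back in sync
pt-table-sync --replicate percona.checksums h=master_host,u=checksum_user,p=... --print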
It looks to me like your slave lost its place in the binary log sequence when it crashed. The slave continually writes its last processed binlog event into datadir/relay-log.info, but this file uses buffered writes, so it is susceptible to losing data in a crash.
That's why Percona Server created a crash-resistant replication feature to store the same replica info in an InnoDB table, to recover from this scenario.
MySQL 5.6 has implemented a similar feature: you can set relay_log_info_repository=TABLE so the replica saves its state in a crash-resistant way.
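A sketch of the relevant my.cnf settings on a MySQL 5.6 replica:
[mysqld]
relay_log_info_repository = TABLE  # keep the SQL thread position in an InnoDB table instead of relay-log.info
master_info_repository    = TABLE  # optionally do the same for the I/O thread position
relay_log_recovery        = ON     # after a crash, discard unprocessed relay logs and refetch them from the master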
Re your comment:
Yes, in theory pt-table-sync can fix any amount of replication drift, but it's not necessarily the most efficient way to correct large discrepancies. At some point, it's quicker and more efficient to trash the outdated replica and reinitialize it using a new backup from the master.
Check out How to setup a slave for replication in 6 simple steps with Percona Xtrabackup.
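In outline, that reinitialisation goes something like this (a rough sketch; paths and credentials are placeholders):
# on the master: take and prepare a consistent copy
innobackupex --user=backup --password=... /backups/
innobackupex --apply-log /backups/<TIMESTAMP>/
# copy the prepared backup into the slave's datadir, fix ownership, start mysqld,
# then read the binlog coordinates recorded by the backup:
cat /backups/<TIMESTAMP>/xtrabackup_binlog_info
# and on the slave:
#   CHANGE MASTER TO MASTER_LOG_FILE='<file>', MASTER_LOG_POS=<pos>; START SLAVE;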

Replicate tables from different database of same mysql server

I have one server with two databases, and I want to replicate several tables from one database to the other. The purpose is to share the same users table between projects.
Since the other tables use InnoDB with foreign keys to the users table, I chose the replication route.
For that I made these changes to my.cnf:
master-user=root
server-id = 2
replicate-rewrite-db = dou->jobs
replicate-do-table = jobs.auth_user
replicate-wild-do-table = jobs.geo_%
replicate-do-table = jobs.user_profile
replicate-same-server-id = 1
report-host = master-is-slave
binlog-do-db = dou
log-bin
After syncing the tables covered by binlog-do-db and starting the slave, the following lines appear in error.log:
111112 15:10:22 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='localhost', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='localhost', master_port='3306', master_log_file='mysql-bin.000074', master_log_pos='106'.
111112 15:10:36 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000074' at position 106, relay log '/var/log/mysql/dell-relay-bin.000001' position: 4
111112 15:10:36 [Note] Slave I/O thread: connected to master 'root@localhost:3306',replication started in log 'mysql-bin.000074' at position 106
It seems that at this step everything is OK, and show slave status shows no errors.
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: localhost
Master_User: root
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000074
Read_Master_Log_Pos: 814
Relay_Log_File: dell-relay-bin.000002
Relay_Log_Pos: 959
Relay_Master_Log_File: mysql-bin.000074
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table: jobs.user_profile,jobs.auth_user
Replicate_Ignore_Table:
Replicate_Wild_Do_Table: jobs.geo_%
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 814
Relay_Log_Space: 1113
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
1 row in set (0.00 sec)
ERROR:
No query specified
The thing is that changes on the master do not affect the slave, although the slave status does change.
Thanks for any help in solving this problem.
I won't even go into the point that replicating a table within the same DB server is a bad idea because of the several-fold increase in I/O.
The slave is not updated because server-id is the same for both master and slave; a slave normally ignores updates carrying the same server-id as its own.
Add replicate-same-server-id to my.cnf. See the replicate-same-server-id documentation.
Out of interest, would a view do what you want?
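If it would, the cross-database view itself is trivial (a sketch; note that InnoDB foreign keys cannot reference a view, which may rule this out given the constraint mentioned in the question):
CREATE VIEW jobs.auth_user AS SELECT * FROM dou.auth_user;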
I really don't think running replication from one instance of MySQL back into the same instance is a good idea.
The other option you might want to investigate, if server hardware is a concern, would be to run multiple instances of mysqld on different ports on the same machine, which might help you achieve what you are looking for. This is something I am using in a test environment to simulate master DB failure and slave DB promotion.
There are many situations where you can get real optimisation out of this kind of setup. For example: DB1 replicates to DB2 (on the same server), and DB2 only keeps data for one week; anything older than a week is deleted on DB2. In a setup like that, for instance a high-traffic DB server that you need to keep as lean as possible, a dual-DB setup on the same server uses fewer resources when most of what your server does is read data. I have such a setup, but I use 4 different servers: server 1 keeps 3 days, server 2 keeps 30 days, server 3 keeps 2 months, and server 4 keeps all data from the start (server 4 is mostly used for fetching very old, rarely needed records). Sorry for my English, but I think I made the point about where you CAN and should use DB replication on the same server to reduce memory and CPU usage.

Mysql Slave not updating

I have replication set up, everything looks fine and I have no errors, but the data is not being moved to the slave.
mysql> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: xxxxx
Master_User: xxxxxx
Master_Port: xxxx
Connect_Retry: 30
Master_Log_File: mysql-bin.000006
Read_Master_Log_Pos: 98
Relay_Log_File: xxxxx-relay-bin.002649
Relay_Log_Pos: 235
Relay_Master_Log_File: mysql-bin.000006
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 98
Relay_Log_Space: 235
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
1 row in set (0.00 sec)
Run a show master status or show master status\G on the master DB. It will give you the correct values to update your slave with.
From your slave status, it looks like your slave has successfully connected to the master and is awaiting log events. To me, this means your slave user has been properly set up, and has the correct access. It really seems like you just need to sync the correct log file position.
Careful, because to get a good sync, you should probably stop the master, dump the DB, record the master log file positions, then start the master, import the DB on the slave, and finally start the slave in slave mode using the correct master log file pos. I've done this about 30 times, and if you don't follow those steps almost exactly, you will get a bad sync.
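A sketch of the final step, once the dump has been imported on the slave (the coordinates are placeholders; use whatever SHOW MASTER STATUS reported on the master at dump time, or let mysqldump record them with --master-data):
-- on the slave
STOP SLAVE;
CHANGE MASTER TO
    MASTER_LOG_FILE = 'mysql-bin.00000X',  -- File from SHOW MASTER STATUS
    MASTER_LOG_POS  = 12345;               -- Position from SHOW MASTER STATUS
START SLAVE;
SHOW SLAVE STATUS\G                        -- Exec_Master_Log_Pos should now advance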
There could be a couple of issues:
the master does not know about the slave;
the slave and master are not in sync on the relay log file.
You have to re-sync the slave with the master from the point where it stopped updating, then start the slave. It should work fine.