Additional nodes on Galera MySQL failing to add

OK, so I have a second node that I am trying to add to a working Galera MySQL server as another node. The configs are below.
Node A (working):
[server]
[mysqld]
[embedded]
[mysqld-5.5]
[mariadb]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
bind-address=172.16.1.20
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="controller_cluster"
wsrep_cluster_address="gcomm://"
wsrep_sst_receive_addres="172.16.1.20"
wsrep_slave_threads=1
wsrep_certify_nonPK=1
wsrep_max_ws_rows=131072
wsrep_max_ws_size=1073741824
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
wsrep_drupal_282555_workaround=0
wsrep_causal_reads=0
wsrep_notify_cmd=
wsrep_sst_method=rsync
wsrep_sst_auth=wsrep_sst:password
[mariadb-5.5]
Node B (won't start):
[server]
[mysqld]
skip-name-resolve
log = /var/log/mysqld.log
log-error = /var/log/mysqld.error.log
[embedded]
[mysqld-5.5]
[mariadb]
log = /var/log/mysqld.log
log-error = /var/log/mysqld.error.log
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
bind-address=172.16.1.21
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="controller_cluster"
wsrep_cluster_address="gcomm://172.16.1.20"
wsrep_sst_receive_addres="172.16.1.21"
wsrep_slave_threads=1
wsrep_certify_nonPK=1
wsrep_max_ws_rows=131072
wsrep_max_ws_size=1073741824
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
wsrep_drupal_282555_workaround=0
wsrep_causal_reads=0
wsrep_notify_cmd=
wsrep_sst_method=rsync
wsrep_sst_auth=wsrep_sst:password
[mariadb-5.5]
Node A permissions on /var/lib/mysql:
-rw-rw----. 1 mysql mysql 16384 Mar 4 23:54 aria_log.00000001
-rw-rw----. 1 mysql mysql 52 Mar 4 23:54 aria_log_control
-rw-r-----. 1 mysql root 283162 Mar 5 17:49 db01.deg.pod1.err
-rw-rw----. 1 mysql mysql 5 Mar 4 23:54 db01.deg.pod1.pid
-rw-------. 1 mysql mysql 134219040 Mar 5 17:48 galera.cache
-rw-rw----. 1 mysql mysql 104 Mar 5 17:48 grastate.dat
-rw-rw----. 1 mysql mysql 12582912 Mar 4 23:54 ibdata1
-rw-rw----. 1 mysql mysql 5242880 Mar 4 23:54 ib_logfile0
-rw-rw----. 1 mysql mysql 5242880 Mar 4 22:30 ib_logfile1
drwx------. 2 mysql mysql 4096 Mar 4 22:59 mysql
srwxrwxrwx. 1 mysql mysql 0 Mar 4 23:54 mysql.sock
drwx------. 2 root root 4096 Mar 4 22:59 performance_schema
-rw-r--r--. 1 mysql mysql 124 Mar 4 22:11 RPM_UPGRADE_HISTORY
-rw-r--r--. 1 mysql mysql 124 Mar 4 22:11 RPM_UPGRADE_MARKER-LAST
drwxr-xr-x. 2 mysql mysql 4096 Mar 4 22:11 test
drwx------. 2 mysql mysql 4096 Mar 5 17:35 tt
Node B permissions on /var/lib/mysql:
-rw-rw----. 1 mysql mysql 16384 Mar 5 17:49 aria_log.00000001
-rw-rw----. 1 mysql mysql 52 Mar 5 17:49 aria_log_control
-rw-r-----. 1 mysql root 0 Mar 5 17:49 db02.deg.pod1.err
-rw-------. 1 mysql mysql 134219040 Mar 5 17:49 galera.cache
-rw-rw----. 1 mysql mysql 104 Mar 5 17:49 grastate.dat
-rw-rw----. 1 mysql mysql 12582912 Mar 5 17:49 ibdata1
-rw-rw----. 1 mysql mysql 5242880 Mar 5 17:49 ib_logfile0
-rw-rw----. 1 mysql mysql 5242880 Mar 5 17:49 ib_logfile1
drwx------. 2 mysql mysql 4096 Mar 4 23:10 mysql
srwxrwxrwx 1 mysql mysql 0 Mar 5 17:49 mysql.sock
-rw-------. 1 root root 107 Mar 4 23:10 nohup.out
-rw-r--r-- 1 root root 269455 Mar 5 03:42 out.log
drwx------ 2 root root 4096 Mar 5 03:20 performance_schema
-rw-r--r--. 1 mysql mysql 124 Mar 4 22:14 RPM_UPGRADE_HISTORY
-rw-r--r--. 1 mysql mysql 124 Mar 4 22:14 RPM_UPGRADE_MARKER-LAST
drwxr-xr-x. 2 mysql mysql 4096 Mar 4 22:14 test
drwx------ 2 mysql mysql 4096 Mar 5 17:36 tt
-rw------- 1 root root 0 Mar 5 03:52 wsrep_recovery.hh4i9
The passwords for the mysql user and the root user are also the same on both nodes.
Failure log on Node B:
140305 17:49:39 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
140305 17:49:39 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.A8nOzH' --pid-file='/var/lib/mysql/db02.deg.pod1-recover.pid'
nohup: ignoring input
140305 17:49:39 [Warning] The syntax '--log' is deprecated and will be removed in a future release. Please use '--general-log'/'--general-log-file' instead.
140305 17:49:39 [Warning] The syntax '--log' is deprecated and will be removed in a future release. Please use '--general-log'/'--general-log-file' instead.
140305 17:49:41 mysqld_safe WSREP: Recovered position bce8f04b-a41a-11e3-b010-4ba4a408598c:0
140305 17:49:41 [Warning] The syntax '--log' is deprecated and will be removed in a future release. Please use '--general-log'/'--general-log-file' instead.
140305 17:49:41 [Warning] The syntax '--log' is deprecated and will be removed in a future release. Please use '--general-log'/'--general-log-file' instead.
140305 17:49:41 [Note] WSREP: wsrep_start_position var submitted: 'bce8f04b-a41a-11e3-b010-4ba4a408598c:0'
140305 17:49:41 [Note] WSREP: Read nil XID from storage engines, skipping position init
140305 17:49:41 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
140305 17:49:41 [Note] WSREP: wsrep_load(): Galera 25.3.2(r170) by Codership Oy <info@codership.com> loaded successfully.
140305 17:49:41 [Note] WSREP: CRC-32C: using hardware acceleration.
140305 17:49:41 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
140305 17:49:41 [Note] WSREP: Passing config to GCS: base_host = 172.16.1.21; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 2147483647; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.proto_max = 5
140305 17:49:41 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
140305 17:49:41 [Note] WSREP: wsrep_sst_grab()
140305 17:49:41 [Note] WSREP: Start replication
140305 17:49:41 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
140305 17:49:41 [Note] WSREP: protonet asio version 0
140305 17:49:41 [Note] WSREP: Using CRC-32C (optimized) for message checksums.
140305 17:49:41 [Note] WSREP: backend: asio
140305 17:49:41 [Note] WSREP: GMCast version 0
140305 17:49:41 [Note] WSREP: (7036f7c8-a4b8-11e3-97c3-866382997e69, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
140305 17:49:41 [Note] WSREP: (7036f7c8-a4b8-11e3-97c3-866382997e69, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
140305 17:49:41 [Note] WSREP: EVS version 0
140305 17:49:41 [Note] WSREP: PC version 0
140305 17:49:41 [Note] WSREP: gcomm: connecting to group 'controller_cluster', peer '172.16.1.20:'
140305 17:49:41 [Note] WSREP: declaring 3f183cba-a422-11e3-b1c7-52f230abd39f stable
140305 17:49:41 [Note] WSREP: Node 3f183cba-a422-11e3-b1c7-52f230abd39f state prim
140305 17:49:41 [Note] WSREP: view(view_id(PRIM,3f183cba-a422-11e3-b1c7-52f230abd39f,48) memb {
3f183cba-a422-11e3-b1c7-52f230abd39f,0
7036f7c8-a4b8-11e3-97c3-866382997e69,0
} joined {
} left {
} partitioned {
})
140305 17:49:42 [Note] WSREP: gcomm: connected
140305 17:49:42 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
140305 17:49:42 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
140305 17:49:42 [Note] WSREP: Opened channel 'controller_cluster'
140305 17:49:42 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
140305 17:49:42 [Note] WSREP: Waiting for SST to complete.
140305 17:49:42 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
140305 17:49:42 [Note] WSREP: STATE EXCHANGE: sent state msg: 52478b99-a4b8-11e3-9283-8207651d087e
140305 17:49:42 [Note] WSREP: STATE EXCHANGE: got state msg: 52478b99-a4b8-11e3-9283-8207651d087e from 0 (db01.deg.pod1)
140305 17:49:42 [Note] WSREP: STATE EXCHANGE: got state msg: 52478b99-a4b8-11e3-9283-8207651d087e from 1 (db02.deg.pod1)
140305 17:49:42 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 47,
members = 1/2 (joined/total),
act_id = 1,
last_appl. = -1,
protocols = 0/5/2 (gcs/repl/appl),
group UUID = bce8f04b-a41a-11e3-b010-4ba4a408598c
140305 17:49:42 [Note] WSREP: Flow-control interval: [23, 23]
140305 17:49:42 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 1)
140305 17:49:42 [Note] WSREP: State transfer required:
Group state: bce8f04b-a41a-11e3-b010-4ba4a408598c:1
Local state: 00000000-0000-0000-0000-000000000000:-1
140305 17:49:42 [Note] WSREP: New cluster view: global state: bce8f04b-a41a-11e3-b010-4ba4a408598c:1, view# 48: Primary, number of nodes: 2, my index: 1, protocol version 2
140305 17:49:42 [Warning] WSREP: Gap in state sequence. Need state transfer.
140305 17:49:44 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '172.16.1.21' --auth 'wsrep_sst:password' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '13639''
140305 17:49:44 [Note] WSREP: Prepared SST request: rsync|172.16.1.21:4444/rsync_sst
140305 17:49:44 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140305 17:49:44 [Note] WSREP: REPL Protocols: 5 (3, 1)
140305 17:49:44 [Note] WSREP: Assign initial position for certification: 1, protocol version: 3
140305 17:49:44 [Note] WSREP: Service thread queue flushed.
140305 17:49:44 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (bce8f04b-a41a-11e3-b010-4ba4a408598c): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():445. IST will be unavailable.
140305 17:49:44 [Note] WSREP: Node 1.0 (db02.deg.pod1) requested state transfer from '*any*'. Selected 0.0 (db01.deg.pod1)(SYNCED) as donor.
140305 17:49:44 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 1)
140305 17:49:44 [Note] WSREP: Requesting state transfer: success, donor: 0
140305 17:49:45 [Warning] WSREP: 0.0 (db01.deg.pod1): State transfer to 1.0 (db02.deg.pod1) failed: -1 (Operation not permitted)
140305 17:49:45 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():723: Will never receive state. Need to abort.
140305 17:49:45 [Note] WSREP: gcomm: terminating thread
140305 17:49:45 [Note] WSREP: gcomm: joining thread
140305 17:49:45 [Note] WSREP: gcomm: closing backend
140305 17:49:46 [Note] WSREP: view(view_id(NON_PRIM,3f183cba-a422-11e3-b1c7-52f230abd39f,48) memb {
7036f7c8-a4b8-11e3-97c3-866382997e69,0
} joined {
} left {
} partitioned {
3f183cba-a422-11e3-b1c7-52f230abd39f,0
})
140305 17:49:46 [Note] WSREP: view((empty))
140305 17:49:46 [Note] WSREP: gcomm: closed
140305 17:49:46 [Note] WSREP: /usr/sbin/mysqld: Terminated.
140305 17:49:46 mysqld_safe mysqld from pid file /var/lib/mysql/db02.deg.pod1.pid ended
WSREP_SST: [ERROR] Parent mysqld process (PID:13639) terminated unexpectedly. (20140305 17:49:47.601)
WSREP_SST: [INFO] Joiner cleanup. (20140305 17:49:47.603)
WSREP_SST: [INFO] Joiner cleanup done. (20140305 17:49:48.110)
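When reading a log like the one above, it can help to filter it down to the WSREP warnings and errors; here they show that the donor's state transfer failed ("State transfer to 1.0 (db02.deg.pod1) failed"), so the donor's own log is the next place to look. A minimal Python sketch, assuming the "[Warning] WSREP:" / "[ERROR] WSREP:" line format shown above; the function name is hypothetical:

```python
import re

# Assumes lines shaped like "140305 17:49:45 [ERROR] WSREP: <message>",
# as in the mysqld log above.
WSREP_PROBLEM = re.compile(r"\[(Warning|ERROR)\] WSREP: (.*)$")


def wsrep_problems(lines):
    """Return (severity, message) pairs for every WSREP warning or error line."""
    problems = []
    for line in lines:
        match = WSREP_PROBLEM.search(line)
        if match:
            problems.append((match.group(1), match.group(2)))
    return problems


log = [
    "140305 17:49:44 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 1)",
    "140305 17:49:45 [Warning] WSREP: 0.0 (db01.deg.pod1): State transfer to "
    "1.0 (db02.deg.pod1) failed: -1 (Operation not permitted)",
    "140305 17:49:45 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():723: "
    "Will never receive state. Need to abort.",
]
for severity, message in wsrep_problems(log):
    print(severity, message)
```

Against a real file this would be something like `wsrep_problems(open("/var/log/mysqld.error.log"))`, using the log-error path from Node B's config.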

One possible reason for a new node failing to join is that the existing nodes don't expect it. You may just need to list every cluster member in each node's configuration file.
For example, try using wsrep_cluster_address="gcomm://172.16.1.20,172.16.1.21" in both galera.cnf files.
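The suggestion above amounts to giving every node the same member list. As a tiny illustration (a hypothetical Python helper, not part of MariaDB or Galera), the value can be generated from the node addresses used in this question:

```python
def cluster_address(members):
    """Build a wsrep_cluster_address value that lists every cluster member."""
    if not members:
        # An empty list ("gcomm://") makes a node bootstrap a new cluster
        # instead of joining an existing one, as Node A's config does.
        raise ValueError("list at least one member to join an existing cluster")
    return "gcomm://" + ",".join(members)


members = ["172.16.1.20", "172.16.1.21"]  # Node A and Node B from the question
print(f'wsrep_cluster_address="{cluster_address(members)}"')
# prints: wsrep_cluster_address="gcomm://172.16.1.20,172.16.1.21"
```

Once the cluster is up, Node A would also use this list rather than the bare gcomm://; otherwise restarting it would bootstrap a separate cluster instead of rejoining.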

Related

Galera cluster does not work with wsrep_sst_method=xtrabackup(-v2) unless it is first brought up with rsync

I'm trying to set up a Galera cluster with MariaDB 10.2 and percona-xtrabackup-2.3.10-1.el7.x86_64.
If I bootstrap the donor with wsrep_sst_method=xtrabackup or xtrabackup-v2, the joiner is unable to join the cluster, and the error message complains about "no valid checkpoint".
However, if I first bring up the cluster using wsrep_sst_method=rsync, then change it to xtrabackup, stop all nodes and bootstrap the donor again (with galera_new_cluster), the joiner is able to join OK.
I suspect that some data was synchronized to the second node while using rsync.
Could you please give me some pointers about why xtrabackup doesn't work the first time?
Is it common practice to bootstrap the first time with rsync?
Or is it something else?
Any hints will be highly appreciated, and just let me know if you need more information.
Thank you for your help.
More details:
When I bootstrap the first time using xtrabackup, the joiner is unable to join, and its log says:
Jan 23 04:09:14 setsv-dr.local.example.com mysqld[6924]: WSREP_SST: [ERROR] xtrabackup_checkpoints missing, failed innobackupex/SST on donor (20210123 04:09:14.948)
Jan 23 04:09:14 setsv-dr.local.example.com mysqld[6924]: WSREP_SST: [ERROR] Cleanup after exit with status:2 (20210123 04:09:14.971)
The donor log says:
Jan 23 04:11:31 setsv mysqld: group UUID = a53cd166-5d68-11eb-aa0f-83c1fd168f1f
Jan 23 04:11:31 setsv mysqld: 2021-01-23 4:11:31 140325406353152 [Note] WSREP: Flow-control interval: [16, 16]
Jan 23 04:11:31 setsv mysqld: 2021-01-23 4:11:31 140325646681856 [Note] WSREP: REPL Protocols: 9 (4, 2)
Jan 23 04:11:31 setsv mysqld: 2021-01-23 4:11:31 140325646681856 [Note] WSREP: New cluster view: global state: a53cd166-5d68-11eb-aa0f-83c1fd168f1f:0, view# 35: Primary, number of nodes: 1, my index: 0, protocol version 3
Jan 23 04:11:31 setsv mysqld: 2021-01-23 4:11:31 140325646681856 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Jan 23 04:11:31 setsv mysqld: 2021-01-23 4:11:31 140325646681856 [Note] WSREP: Assign initial position for certification: 0, protocol version: 4
Jan 23 04:11:31 setsv mysqld: 2021-01-23 4:11:31 140325655074560 [Note] WSREP: Service thread queue flushed.
Jan 23 04:11:32 setsv mysqld: WSREP_SST: [INFO] Streaming the backup to joiner at 192.168.56.71 4444 (20210123 04:11:32.454)
Jan 23 04:11:32 setsv mysqld: WSREP_SST: [INFO] Evaluating innobackupex --no-version-check $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:192.168.56.71:4444; RC=( ${PIPESTATUS[@]} ) (20210123 04:11:32.457)
Jan 23 04:11:32 setsv mysqld: 2021/01/23 04:11:32 socat[15384] E connect(6, AF=2 192.168.56.71:4444, 16): Connection refused
Jan 23 04:11:32 setsv mysqld: 2021-01-23 4:11:32 140325134952192 [Warning] Aborted connection 26 to db: 'unconnected' user: 'sst_user' host: 'localhost' (Got an error reading communication packets)
Jan 23 04:11:32 setsv mysqld: WSREP_SST: [ERROR] innobackupex finished with error: 1. Check /var/lib/mysql//innobackup.backup.log (20210123 04:11:32.467)
Jan 23 04:11:32 setsv mysqld: WSREP_SST: [ERROR] Cleanup after exit with status:22 (20210123 04:11:32.469)
Jan 23 04:11:32 setsv mysqld: WSREP_SST: [INFO] Cleaning up temporary directories (20210123 04:11:32.471)
Jan 23 04:11:32 setsv mysqld: 2021-01-23 4:11:32 140324111660800 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.56.71:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --gtid 'a53cd166-5d68-11eb-aa0f-83c1fd168f1f:0' --gtid-domain-id '0' --mysqld-args --basedir=/usr --wsrep-new-cluster --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1
Jan 23 04:11:32 setsv mysqld: 2021-01-23 4:11:32 140324111660800 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.56.71:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --gtid 'a53cd166-5d68-11eb-aa0f-83c1fd168f1f:0' --gtid-domain-id '0' --mysqld-args --basedir=/usr --wsrep-new-cluster --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1: 22 (Invalid argument)
Jan 23 04:11:32 setsv mysqld: 2021-01-23 4:11:32 140324111660800 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.56.71:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --gtid 'a53cd166-5d68-11eb-aa0f-83c1fd168f1f:0' --gtid-domain-id '0' --mysqld-args --basedir=/usr --wsrep-new-cluster --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1
Jan 23 04:11:32 setsv mysqld: 2021-01-23 4:11:32 140325406353152 [Warning] WSREP: Could not find peer: b8dba97a-5d6b-11eb-8d6d-ae8d4ded2a34
Jan 23 04:11:32 setsv mysqld: 2021-01-23 4:11:32 140325406353152 [Warning] WSREP: 0.0 (setsv): State transfer to -1.-1 (left the group) failed: -22 (Invalid argument)
Jan 23 04:11:32 setsv mysqld: 2021-01-23 4:11:32 140325406353152 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 0)
Jan 23 04:11:32 setsv mysqld: 2021-01-23 4:11:32 140325406353152 [Note] WSREP: Member 0.0 (setsv) synced with group.
Jan 23 04:11:32 setsv mysqld: 2021-01-23 4:11:32 140325406353152 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 0)
Jan 23 04:11:32 setsv mysqld: 2021-01-23 4:11:32 140325646681856 [Note] WSREP: Synchronized with group, ready for connections
Jan 23 04:11:32 setsv mysqld: 2021-01-23 4:11:32 140325646681856 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Jan 23 04:11:36 setsv mysqld: 2021-01-23 4:11:36 140325414745856 [Note] WSREP: cleaning up b8dba97a (ssl://192.168.56.71:4567)
The file /var/lib/mysql//innobackup.backup.log says:
210123 04:11:32 innobackupex: Starting the backup operation
IMPORTANT: Please check that the backup run completes successfully.
At the end of a successful backup run innobackupex
prints "completed OK!".
210123 04:11:32 Connecting to MySQL server host: localhost, user: sst_user, password: set, port: not set, socket: /var/lib/mysql/mysql.sock
Using server version 10.2.36-MariaDB
innobackupex version 2.3.10 based on MySQL server 5.6.24 Linux (x86_64) (revision id: bd0d4403f36)
xtrabackup: uses posix_fadvise().
xtrabackup: cd to /var/lib/mysql/
xtrabackup: open files limit requested 0, set to 16384
xtrabackup: using the following InnoDB configuration:
xtrabackup: innodb_data_home_dir = ./
xtrabackup: innodb_data_file_path = ibdata1:12M:autoextend
xtrabackup: innodb_log_group_home_dir = ./
xtrabackup: innodb_log_files_in_group = 2
xtrabackup: innodb_log_file_size = 50331648
InnoDB: No valid checkpoint found.
InnoDB: If this error appears when you are creating an InnoDB database,
InnoDB: the problem may be that during an earlier attempt you managed
InnoDB: to create the InnoDB data files, but log file creation failed.
InnoDB: If that is the case, please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/error-creating-innodb.html
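In the donor log above, socat reports "Connection refused" for 192.168.56.71:4444, which means nothing was accepting connections on the joiner's SST port at the moment the donor tried to stream the backup. A plain TCP connect from the donor is one way to check reachability. A minimal Python sketch; the helper name and the default timeout are assumptions, and keep in mind that the joiner only listens on 4444 while it is actually waiting for an SST:

```python
import socket


def port_accepts_connections(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts and unreachable hosts.
        return False


# Example, using the joiner address and SST port from the donor log:
# port_accepts_connections("192.168.56.71", 4444)
```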

MariaDB Galera node will not rejoin successfully unless it picks a particular node to rejoin through

Debian 10, MariaDB 10.3.26, Galera-3 25.3.31
I have a three node cluster. The nodes are named node3, node4, and node5. Node3 gets disconnected from the cluster on occasion.
If it picks node5 to recover from, I get:
2020-11-18 19:42:08 7 [Note] WSREP: Requesting state transfer: success, donor: 2
2020-11-18 19:42:08 7 [Note] WSREP: GCache history reset: 57b37aa2-d111-11e8-a015-ab6cf5f3b3ea:0 -> 57b37aa2-d111-11e8-a015-ab6cf5f3b3ea:75720363
2020-11-18 19:42:08 17 [Note] WSREP: SST received: 57b37aa2-d111-11e8-a015-ab6cf5f3b3ea:75696989
2020-11-18 19:42:08 17 [Note] WSREP: wsrep_start_position set to '57b37aa2-d111-11e8-a015-ab6cf5f3b3ea:75696989'
2020-11-18 19:42:08 7 [Note] WSREP: Receiving IST: 23374 writesets, seqnos 75696989-75720363
2020-11-18 19:42:08 0 [Note] WSREP: 2.0 (node5): State transfer to 0.0 (node3) complete.
2020-11-18 19:42:08 0 [Note] WSREP: Member 2.0 (node5) synced with group.
2020-11-18 19:42:08 0 [Note] WSREP: (23249d11, 'tcp://0.0.0.0:4567') turning message relay requesting off
2020-11-18 19:42:15 0 [Warning] WSREP: Protocol violation. JOIN message sender 2.0 (node5) is not in state transfer (SYNCED). Message ignored.
after which node3 sits forever and wsrep_ready never changes to yes.
On the other hand, if node3 picks node4, I get all the same sort of messages, except that
[Warning] WSREP: Protocol violation. JOIN message sender 2.0 (node5) is not in state transfer (SYNCED). Message ignored.
does not appear and eventually node3 wsrep_ready becomes yes and the node starts to process queries.
Any idea how I might figure out the issue?
Here is some more data. This is an example of a successful join when it chooses node4 instead of node5:
2020-11-19 21:12:54 7 [Note] WSREP: State transfer required:
Group state: 57b37aa2-d111-11e8-a015-ab6cf5f3b3ea:75815331
Local state: 57b37aa2-d111-11e8-a015-ab6cf5f3b3ea:75696989
2020-11-19 21:12:54 7 [Note] WSREP: REPL Protocols: 9 (4, 2)
2020-11-19 21:12:54 7 [Note] WSREP: New cluster view: global state: 57b37aa2-d111-11e8-a015-ab6cf5f3b3ea:75815331, view# 349: Primary, number of nodes: 3, my index: 2, protocol version 3
2020-11-19 21:12:54 7 [Warning] WSREP: Gap in state sequence. Need state transfer.
2020-11-19 21:12:56 7 [Note] WSREP: Prepared SST request: mysqldump|10.4.44.82:3360
2020-11-19 21:12:56 7 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-11-19 21:12:56 7 [Note] WSREP: Assign initial position for certification: 75815331, protocol version: 4
2020-11-19 21:12:56 0 [Note] WSREP: Service thread queue flushed.
2020-11-19 21:12:56 7 [Note] WSREP: IST receiver addr using tcp://x.y.z.a:4568
2020-11-19 21:12:56 7 [Note] WSREP: Prepared IST receiver, listening at: tcp://x.y.z.a:4568
2020-11-19 21:12:56 0 [Note] WSREP: Member 2.0 (node3) requested state transfer from '*any*'. Selected 0.0 (node4)(SYNCED) as donor.
2020-11-19 21:12:56 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 75815331)
2020-11-19 21:12:56 7 [Note] WSREP: Requesting state transfer: success, donor: 0
2020-11-19 21:12:56 7 [Note] WSREP: GCache history reset: 57b37aa2-d111-11e8-a015-ab6cf5f3b3ea:0 -> 57b37aa2-d111-11e8-a015-ab6cf5f3b3ea:75815331
2020-11-19 21:12:56 0 [Note] WSREP: (fcbfdc45, 'tcp://0.0.0.0:4567') turning message relay requesting off
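The logs in this question print positions as <state UUID>:<seqno>. Galera can only use IST when the joiner's state UUID matches the group's (compare the "Local state UUID ... does not match group state UUID ... IST will be unavailable" warning in the first question), and the number of writesets to send is the difference between the two seqnos. A Python sketch of that arithmetic; the helper name is hypothetical, and it ignores whether the donor's gcache still holds those writesets:

```python
def ist_writesets(local, group):
    """Return how many writesets IST would deliver, or None if IST is impossible.

    Positions use the "<state UUID>:<seqno>" form printed in the Galera log.
    Only the UUIDs and seqnos are compared; gcache contents are not checked.
    """
    local_uuid, local_seqno = local.rsplit(":", 1)
    group_uuid, group_seqno = group.rsplit(":", 1)
    if local_uuid != group_uuid or int(local_seqno) < 0:
        return None  # histories do not match: a full SST is required
    return int(group_seqno) - int(local_seqno)


# Range from "Receiving IST: 23374 writesets, seqnos 75696989-75720363" above.
print(ist_writesets("57b37aa2-d111-11e8-a015-ab6cf5f3b3ea:75696989",
                    "57b37aa2-d111-11e8-a015-ab6cf5f3b3ea:75720363"))  # 23374
```

For the all-zero local state "00000000-0000-0000-0000-000000000000:-1" seen in the first question, the helper returns None, matching the log's fallback to a full SST.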

Percona/Galera cluster: Node consistency compromized, aborting

I have two separate Galera clusters; I will call them C1 and C2.
C1 is the main cluster and has three nodes; C2 is an emergency cluster and has only one node. C1 is connected to C2 by GTID replication to keep them completely identical.
The main problem is that I can't increase the number of nodes in the C2 cluster. I get these errors:
2016-08-02 12:54:00 19067 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-08-02 13:00:58 19067 [Warning] WSREP: RBR event 1 Gtid apply warning: 1, 12305593
2016-08-02 13:00:58 19067 [Warning] WSREP: Failed to apply app buffer: seqno: 12305593, status: 1
at galera/src/trx_handle.cpp:apply():351
Retrying 2th time
2016-08-02 13:00:58 19067 [Warning] WSREP: RBR event 1 Gtid apply warning: 1, 12305593
2016-08-02 13:00:58 19067 [Warning] WSREP: Failed to apply app buffer: seqno: 12305593, status: 1
at galera/src/trx_handle.cpp:apply():351
Retrying 3th time
2016-08-02 13:00:58 19067 [Warning] WSREP: RBR event 1 Gtid apply warning: 1, 12305593
2016-08-02 13:00:58 19067 [Warning] WSREP: Failed to apply app buffer: seqno: 12305593, status: 1
at galera/src/trx_handle.cpp:apply():351
Retrying 4th time
2016-08-02 13:00:58 19067 [Warning] WSREP: RBR event 1 Gtid apply warning: 1, 12305593
2016-08-02 13:00:58 19067 [ERROR] WSREP: Failed to apply trx: source: 377c1c2e-5885-11e6-aed8-1e04b1e53bff version: 3 local: 0 state: APPLYING flags: 1 conn_id: 346 trx_id: 23137618 seqnos (l: 5, g: 12305593, s: 12305592, d: 12305592, ts: 16848639299414398)
2016-08-02 13:00:58 19067 [ERROR] WSREP: Failed to apply trx 12305593 4 times
2016-08-02 13:00:58 19067 [ERROR] WSREP: Node consistency compromized, aborting...
2016-08-02 13:00:58 19067 [Note] WSREP: Closing send monitor...
2016-08-02 13:00:58 19067 [Note] WSREP: Closed send monitor.
2016-08-02 13:00:58 19067 [Note] WSREP: gcomm: terminating thread
2016-08-02 13:00:58 19067 [Note] WSREP: gcomm: joining thread
2016-08-02 13:00:58 19067 [Note] WSREP: gcomm: closing backend
The main settings related to the issue:
C1 and C2:
binlog_format=ROW
wsrep_sst_method=xtrabackup-v2
server_id = 1 (C1); server_id = 101 (C2)
wsrep_OSU_method = 'TOI'
Most interesting is that with simple asynchronous replication (not GTID) everything worked properly. Do you have any ideas?

Percona XtraDB Cluster SST not working using rsync: wsrep_sst_rsync

I'm sure there is a simple fix for this, but forgive me, I'm new to PXC. I'm using rsync to transfer the state of the bootstrapping node to node2, which is the node I am joining to the cluster. I originally tried XtraBackup but ran into problems, which I will explore another time; for now I'm using rsync because I thought it would be simpler. If you scroll down to the [ERROR] lines, you will see the problems that interrupt the state transfer. What could be causing this?
WSREP_SST: [ERROR] find/rsync returned code 123: (20141228 02:24:40.505)
2014-12-28 02:24:40 9446 [ERROR] WSREP: Failed to read from: wsrep_sst_rsync --role 'donor' --address '192.168.70.2:4444/rsync_sst' --auth 'sst:sst' --socket
2014-12-28 02:24:24 9446 [Note] WSREP: (2861d1d7, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2014-12-28 02:24:24 9446 [Note] WSREP: declaring 41f79045 at tcp://192.168.70.2:4567 stable
2014-12-28 02:24:24 9446 [Note] WSREP: Node 2861d1d7 state prim
2014-12-28 02:24:24 9446 [Note] WSREP: save pc into disk
2014-12-28 02:24:24 9446 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2014-12-28 02:24:24 9446 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 42495aba-8e30-11e4-9596-c702faf22ad0
2014-12-28 02:24:24 9446 [Note] WSREP: STATE EXCHANGE: sent state msg: 42495aba-8e30-11e4-9596-c702faf22ad0
2014-12-28 02:24:24 9446 [Note] WSREP: STATE EXCHANGE: got state msg: 42495aba-8e30-11e4-9596-c702faf22ad0 from 0 (node3)
2014-12-28 02:24:24 9446 [Note] WSREP: STATE EXCHANGE: got state msg: 42495aba-8e30-11e4-9596-c702faf22ad0 from 1 (node2)
2014-12-28 02:24:24 9446 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 1,
members = 1/2 (joined/total),
act_id = 0,
last_appl. = 0,
protocols = 0/6/3 (gcs/repl/appl),
group UUID = 48ec9889-8ddc-11e4-9efd-da6610fd24da
2014-12-28 02:24:24 9446 [Note] WSREP: Flow-control interval: [23, 23]
2014-12-28 02:24:24 9446 [Note] WSREP: New cluster view: global state: 48ec9889-8ddc-11e4-9efd-da6610fd24da:0, view# 2: Primary, number of nodes: 2, my index: 0, protocol version 3
2014-12-28 02:24:24 9446 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-12-28 02:24:24 9446 [Note] WSREP: REPL Protocols: 6 (3, 2)
2014-12-28 02:24:24 9446 [Note] WSREP: Service thread queue flushed.
2014-12-28 02:24:24 9446 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2014-12-28 02:24:24 9446 [Note] WSREP: Service thread queue flushed.
2014-12-28 02:24:25 9446 [Note] WSREP: Member 1.0 (node2) requested state transfer from '*any*'. Selected 0.0 (node3)(SYNCED) as donor.
2014-12-28 02:24:25 9446 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 0)
2014-12-28 02:24:25 9446 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-12-28 02:24:25 9446 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address '192.168.70.2:4444/rsync_sst' --auth 'sst:sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --binlog 'mysql-bin' --gtid '48ec9889-8ddc-11e4-9efd-da6610fd24da:0''
2014-12-28 02:24:25 9446 [Note] WSREP: sst_donor_thread signaled with 0
2014-12-28 02:24:25 9446 [Note] WSREP: Flushing tables for SST...
2014-12-28 02:24:25 9446 [Note] WSREP: Provider paused at 48ec9889-8ddc-11e4-9efd-da6610fd24da:0 (5)
2014-12-28 02:24:25 9446 [Note] WSREP: Tables flushed.
WSREP_SST: [INFO] Preparing binlog files for transfer: (20141228 02:24:26.201)
mysql-bin.000015
2014-12-28 02:24:27 9446 [Note] WSREP: (2861d1d7, 'tcp://0.0.0.0:4567') turning message relay requesting off
rsync: send_files failed to open "/var/lib/mysql/sbtest/sbtest1.ibd": Permission denied (13)
rsync: send_files failed to open "/var/lib/mysql/sbtest/sbtest2.ibd": Permission denied (13)
rsync: send_files failed to open "/var/lib/mysql/sbtest/sbtest3.ibd": Permission denied (13)
rsync: send_files failed to open "/var/lib/mysql/sbtest/sbtest4.ibd": Permission denied (13)
rsync: send_files failed to open "/var/lib/mysql/sbtest/sbtest5.ibd": Permission denied (13)
rsync: send_files failed to open "/var/lib/mysql/sbtest/sbtest6.ibd": Permission denied (13)
rsync: send_files failed to open "/var/lib/mysql/sbtest/sbtest7.ibd": Permission denied (13)
rsync: send_files failed to open "/var/lib/mysql/sbtest/sbtest8.ibd": Permission denied (13)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
rsync: send_files failed to open "/var/lib/mysql/mysql/innodb_index_stats.ibd": Permission denied (13)
rsync: send_files failed to open "/var/lib/mysql/mysql/innodb_table_stats.ibd": Permission denied (13)
rsync: send_files failed to open "/var/lib/mysql/mysql/slave_master_info.ibd": Permission denied (13)
rsync: send_files failed to open "/var/lib/mysql/mysql/slave_relay_log_info.ibd": Permission denied (13)
rsync: send_files failed to open "/var/lib/mysql/mysql/slave_worker_info.ibd": Permission denied (13)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
WSREP_SST: [ERROR] find/rsync returned code 123: (20141228 02:24:40.505)
2014-12-28 02:24:40 9446 [ERROR] WSREP: Failed to read from: wsrep_sst_rsync --role 'donor' --address '192.168.70.2:4444/rsync_sst' --auth 'sst:sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --binlog 'mysql-bin' --gtid '48ec9889-8ddc-11e4-9efd-da6610fd24da:0'
2014-12-28 02:24:40 9446 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'donor' --address '192.168.70.2:4444/rsync_sst' --auth 'sst:sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --binlog 'mysql-bin' --gtid '48ec9889-8ddc-11e4-9efd-da6610fd24da:0': 255 (Unknown error 255)
2014-12-28 02:24:40 9446 [Note] WSREP: resuming provider at 5
2014-12-28 02:24:40 9446 [Note] WSREP: Provider resumed.
2014-12-28 02:24:40 9446 [ERROR] WSREP: Command did not run: wsrep_sst_rsync --role 'donor' --address '192.168.70.2:4444/rsync_sst' --auth 'sst:sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --binlog 'mysql-bin' --gtid '48ec9889-8ddc-11e4-9efd-da6610fd24da:0'
2014-12-28 02:24:40 9446 [Warning] WSREP: 0.0 (node3): State transfer to 1.0 (node2) failed: -255 (Unknown error 255)
2014-12-28 02:24:40 9446 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 0)
2014-12-28 02:24:40 9446 [Note] WSREP: Member 0.0 (node3) synced with group.
2014-12-28 02:24:40 9446 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 0)
2014-12-28 02:24:40 9446 [Note] WSREP: Synchronized with group, ready for connections
2014-12-28 02:24:40 9446 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-12-28 02:24:41 9446 [Note] WSREP: forgetting 41f79045 (tcp://192.168.70.2:4567)
2014-12-28 02:24:41 9446 [Note] WSREP: Node 2861d1d7 state prim
2014-12-28 02:24:41 9446 [Note] WSREP: save pc into disk
2014-12-28 02:24:41 9446 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
2014-12-28 02:24:41 9446 [Note] WSREP: forgetting 41f79045 (tcp://192.168.70.2:4567)
2014-12-28 02:24:41 9446 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 4c2e3544-8e30-11e4-a0dc-d280b597b8d2
2014-12-28 02:24:41 9446 [Note] WSREP: STATE EXCHANGE: sent state msg: 4c2e3544-8e30-11e4-a0dc-d280b597b8d2
2014-12-28 02:24:41 9446 [Note] WSREP: STATE EXCHANGE: got state msg: 4c2e3544-8e30-11e4-a0dc-d280b597b8d2 from 0 (node3)
2014-12-28 02:24:41 9446 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 2,
members = 1/1 (joined/total),
act_id = 0,
last_appl. = 0,
protocols = 0/6/3 (gcs/repl/appl),
group UUID = 48ec9889-8ddc-11e4-9efd-da6610fd24da
2014-12-28 02:24:41 9446 [Note] WSREP: Flow-control interval: [16, 16]
2014-12-28 02:24:41 9446 [Note] WSREP: New cluster view: global state: 48ec9889-8ddc-11e4-9efd-da6610fd24da:0, view# 3: Primary, number of nodes: 1, my index: 0, protocol version 3
2014-12-28 02:24:41 9446 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-12-28 02:24:41 9446 [Note] WSREP: REPL Protocols: 6 (3, 2)
2014-12-28 02:24:41 9446 [Note] WSREP: Service thread queue flushed.
2014-12-28 02:24:41 9446 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2014-12-28 02:24:41 9446 [Note] WSREP: Service thread queue flushed.
2014-12-28 02:24:46 9446 [Note] WSREP: cleaning up 41f79045 (tcp://192.168.70.2:4567)
Something is wrong with the permissions on your data directory, as you can see from the following error:
rsync: send_files failed to open "/var/lib/mysql/sbtest/sbtest7.ibd": Permission denied (13)
Things I would check:
Are all data files in /var/lib/mysql owned by the mysql user?
Do you have SELinux or AppArmor running? (It might be misconfigured.)
Please note that a DONOR using the rsync SST method blocks reads and writes for as long as it is acting as DONOR. This can cause downtime.
I suggest looking into Percona XtraBackup as the SST method instead (though you might run into similar issues there).
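To answer the ownership question quickly, you can walk the data directory and list every entry that is not owned by the mysql user. This is a minimal Python sketch, assuming the default datadir /var/lib/mysql and that mysqld runs as the user mysql; the function names are hypothetical. It only checks Unix ownership, so it will not catch SELinux or AppArmor denials.

```python
from __future__ import annotations

import os
import pwd
import sys


def owner_name(uid: int) -> str:
    """Map a numeric uid to a user name, falling back to the number itself."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def find_foreign_owned(datadir: str, expected_owner: str = "mysql") -> list[str]:
    """Return every file or directory below datadir not owned by expected_owner."""
    offenders = []
    for root, dirs, files in os.walk(datadir):
        for name in dirs + files:
            path = os.path.join(root, name)
            st = os.lstat(path)  # lstat: inspect symlinks themselves, do not follow them
            if owner_name(st.st_uid) != expected_owner:
                offenders.append(path)
    return sorted(offenders)


if __name__ == "__main__":
    # Assumption: default datadir unless one is given on the command line.
    for datadir in sys.argv[1:] or ["/var/lib/mysql"]:
        for path in find_foreign_owned(datadir):
            print(path)
```

Anything it prints (for example the sbtest7.ibd file from the rsync error) is a candidate for chown mysql:mysql.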

MariaDB Galera Cluster setup problems

I am trying to get a MariaDB cluster up and running, but it is not working out for me. Right now I am using MariaDB Galera 5.5.36 on a 64-bit Red Hat ES6 machine. I installed MariaDB from this repo:
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/5.5-galera/rhel6-amd64/
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1
On server 1, I have the following in server.conf:
[mariadb]
log_error=/var/log/mariadb.log
query_cache_size=0
query_cache_type=0
binlog_format=ROW
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://192.168.211.133
wsrep_cluster_name='cluster'
wsrep_node_address='192.168.211.132'
wsrep_node_name='cluster1'
wsrep_sst_method=rsync
And on server 2 I have:
[mariadb]
log_error=/var/log/mariadb.log
query_cache_size=0
query_cache_type=0
binlog_format=ROW
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://192.168.211.132
wsrep_cluster_name='cluster'
wsrep_node_address='192.168.211.133'
wsrep_node_name='cluster2'
wsrep_sst_method=rsync
When I start server 1 with the command sudo service mysql start --wsrep-new-cluster, it starts up just fine: if I open mysql and check the wsrep status, it says everything is up and running, which is good. But when I run sudo service mysql start on the second server, I get the following in the error log:
140609 14:47:55 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
140609 14:47:56 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.i5qfm2' --pid-file='/var/lib/mysql/localhost.localdomain-recover.pid'
140609 14:47:57 mysqld_safe WSREP: Recovered position 85448d73-ebe8-11e3-9c20-fbc1995fee11:0
140609 14:47:57 [Note] WSREP: wsrep_start_position var submitted: '85448d73-ebe8-11e3-9c20-fbc1995fee11:0'
140609 14:47:57 [Note] WSREP: Read nil XID from storage engines, skipping position init
140609 14:47:57 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
140609 14:47:57 [Note] WSREP: wsrep_load(): Galera 25.3.2(r170) by Codership Oy <info#codership.com> loaded successfully.
140609 14:47:57 [Note] WSREP: CRC-32C: using hardware acceleration.
140609 14:47:57 [Note] WSREP: Found saved state: 85448d73-ebe8-11e3-9c20-fbc1995fee11:-1
140609 14:47:57 [Note] WSREP: Passing config to GCS: base_host = 192.168.211.133; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.proto_max = 5
140609 14:47:57 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1
140609 14:47:57 [Note] WSREP: wsrep_sst_grab()
140609 14:47:57 [Note] WSREP: Start replication
140609 14:47:57 [Note] WSREP: Setting initial position to 85448d73-ebe8-11e3-9c20-fbc1995fee11:0
140609 14:47:57 [Note] WSREP: protonet asio version 0
140609 14:47:57 [Note] WSREP: Using CRC-32C (optimized) for message checksums.
140609 14:47:57 [Note] WSREP: backend: asio
140609 14:47:57 [Note] WSREP: GMCast version 0
140609 14:47:57 [Note] WSREP: (0c085f34-efe5-11e3-9f6b-8bfd1706e2a4, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
140609 14:47:57 [Note] WSREP: (0c085f34-efe5-11e3-9f6b-8bfd1706e2a4, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
140609 14:47:57 [Note] WSREP: EVS version 0
140609 14:47:57 [Note] WSREP: PC version 0
140609 14:47:57 [Note] WSREP: gcomm: connecting to group 'cluster', peer '192.168.211.132:,192.168.211.134:'
140609 14:48:00 [Warning] WSREP: no nodes coming from prim view, prim not possible
140609 14:48:00 [Note] WSREP: view(view_id(NON_PRIM,0c085f34-efe5-11e3-9f6b-8bfd1706e2a4,1) memb {
0c085f34-efe5-11e3-9f6b-8bfd1706e2a4,0
} joined {
} left {
} partitioned {
})
140609 14:48:01 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50775S), skipping check
140609 14:48:31 [Note] WSREP: view((empty))
140609 14:48:31 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():141
140609 14:48:31 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():196: Failed to open backend connection: -110 (Connection timed out)
140609 14:48:31 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1291: Failed to open channel 'cluster' at 'gcomm://192.168.211.132,192.168.211.134': -110 (Connection timed out)
140609 14:48:31 [ERROR] WSREP: gcs connect failed: Connection timed out
140609 14:48:31 [ERROR] WSREP: wsrep::connect() failed: 7
140609 14:48:31 [ERROR] Aborting
140609 14:48:31 [Note] WSREP: Service disconnected.
140609 14:48:32 [Note] WSREP: Some threads may fail to exit.
140609 14:48:32 [Note] /usr/sbin/mysqld: Shutdown complete
140609 14:48:32 mysqld_safe mysqld from pid file /var/lib/mysql/localhost.localdomain.pid ended
I am at a loss as to why the second server cannot detect that a cluster is up and running. These machines can communicate with each other just fine: I can SSH from one to the other and they can ping each other. I tried deleting the Galera cache, downgrading my version of MariaDB Galera, disabling SELinux, running the mysql service as a different user, verifying that the correct ports are open, running them on 2 VMs on separate computers with different IP addresses, etc. Does anyone have any idea what is going on here? I have been searching for 3 days trying to fix this, but no solution seems to work for me.
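Ping and SSH succeeding does not prove that the Galera ports are reachable, so it can help to probe them directly from the joining node. This is a minimal Python sketch, assuming the default ports (3306 for MySQL clients, 4567 for Galera group communication, 4568 for IST, 4444 for SST); the function names are hypothetical. Only 3306 and 4567 are normally listening on an idle node, since 4568 and 4444 are opened only while a state transfer is running, so "not reachable" on those two is expected.

```python
import socket
import sys

# Default ports of a MariaDB Galera node (adjust if your config overrides them).
GALERA_PORTS = {
    3306: "MySQL client connections",
    4567: "Galera group communication (gcomm)",
    4568: "incremental state transfer (IST)",
    4444: "state snapshot transfer (SST)",
}


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, no route to host, ...
        return False


def check_peer(host: str, ports=GALERA_PORTS, timeout: float = 3.0) -> dict:
    """Map each port to whether it accepted a TCP connection on host."""
    return {port: tcp_reachable(host, port, timeout) for port in ports}


if __name__ == "__main__":
    for host in sys.argv[1:]:  # e.g. 192.168.211.132
        for port, ok in check_peer(host).items():
            status = "open" if ok else "NOT reachable"
            print(f"{host}:{port} ({GALERA_PORTS[port]}): {status}")
```

If 4567 on the running node is not reachable from the joiner, the "failed to reach primary view: 110 (Connection timed out)" error above is the expected result.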
Here is how I fixed my similar issue.
CentOS 7 with MariaDB Galera 10.1.
On node2 I saw this:
2016-12-27 15:40:38 140703512762624 [Warning] WSREP: no nodes coming from prim view, prim not possible
After doing some reading, I tried running this on node1:
service mysql start --wsrep-new-cluster
But this failed, and in the logs I found this:
2016-12-27 15:44:08 140438853814528 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .
So I edited the file /var/lib/mysql/grastate.dat, changing safe_to_bootstrap to 1.
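Before editing grastate.dat by hand, it is worth reading its seqno and safe_to_bootstrap fields on every node. This is a minimal Python sketch, assuming the usual "key: value" layout of the file (version, uuid, seqno, safe_to_bootstrap, plus a leading comment line); the helper names are hypothetical. It never writes the file itself: back up grastate.dat first and only set the flag on the node you have decided to bootstrap from.

```python
from __future__ import annotations

import sys


def parse_grastate(text: str) -> dict[str, str]:
    """Parse the 'key: value' lines of a grastate.dat file, skipping comments."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            state[key.strip()] = value.strip()
    return state


def set_safe_to_bootstrap(text: str, value: int = 1) -> str:
    """Return a copy of the file text with the safe_to_bootstrap line set to value."""
    out = []
    for line in text.splitlines():
        if line.strip().startswith("safe_to_bootstrap:"):
            line = f"safe_to_bootstrap: {value}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    for path in sys.argv[1:]:  # e.g. /var/lib/mysql/grastate.dat (read-only here)
        with open(path) as fh:
            state = parse_grastate(fh.read())
        print(path, "seqno:", state.get("seqno"),
              "safe_to_bootstrap:", state.get("safe_to_bootstrap"))
```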
I was then able to start the Primary node using:
service mysql start --wsrep-new-cluster
Then on the other nodes, I just used:
service mysql start
Note: this was in a demo pre-production environment. I promptly broke it after getting everything to work by rebooting all the servers at the same time :P, but I knew there had been no writes and that the databases were in sync. If you are in production and this happens, you can use the following to figure out which node to run "new-cluster" on, which is akin to saying "make me primary":
mysqld_safe --wsrep-recover
If this is a production issue, I highly recommend reading this article and making a backup with Clonezilla before throwing commands at the broken clients!
https://www.percona.com/blog/2014/09/01/galera-replication-how-to-recover-a-pxc-cluster/
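mysqld_safe --wsrep-recover logs a line of the form "WSREP: Recovered position <uuid>:<seqno>", like the one in the recovery log quoted in the question above. The node to bootstrap from is the one with the highest seqno for the same cluster UUID. This is a minimal Python sketch, assuming you have collected each node's recovery log text; the function names and node labels are hypothetical, and a tie simply returns the first node listed.

```python
from __future__ import annotations

import re

# Matches e.g. "WSREP: Recovered position 85448d73-ebe8-11e3-9c20-fbc1995fee11:0"
RECOVERED = re.compile(r"Recovered position\s+([0-9a-f-]{36}):(-?\d+)")


def recovered_position(log_text: str) -> tuple[str, int] | None:
    """Return (cluster UUID, seqno) from the last 'Recovered position' line, or None."""
    matches = RECOVERED.findall(log_text)
    if not matches:
        return None
    uuid, seqno = matches[-1]
    return uuid, int(seqno)


def pick_bootstrap_node(positions: dict[str, tuple[str, int]]) -> str:
    """Return the node with the highest seqno; all nodes must share one cluster UUID."""
    uuids = {uuid for uuid, _ in positions.values()}
    if len(uuids) != 1:
        raise ValueError(f"nodes report different cluster UUIDs: {sorted(uuids)}")
    return max(positions, key=lambda node: positions[node][1])
```

A seqno of -1 means the position could not be recovered on that node, so treat it with suspicion rather than as a real position.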
The cluster must be started with this command on the primary node:
galera_new_cluster
After the first node has started, you can start the other nodes in the cluster successfully.
I believe you need to list all the IPs in the wsrep_cluster_address parameter.
wsrep_cluster_address=gcomm://192.168.211.132,192.168.211.133
This should be done on both hosts. By the way, you likely want three nodes rather than two, to avoid split-brain scenarios.
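Both points can be expressed in a few lines: every node gets the same address listing all members, and a surviving partition stays primary only with a strict majority of the previous membership. This is a minimal Python sketch, assuming equal node weights and an unexpected failure (nodes that shut down cleanly are removed from the count, which this does not model); the function names are hypothetical.

```python
def gcomm_address(node_ips: list) -> str:
    """Build a wsrep_cluster_address value that lists every cluster member."""
    return "gcomm://" + ",".join(node_ips)


def has_quorum(total_nodes: int, surviving_nodes: int) -> bool:
    """True if the survivors form a strict majority (equal weights assumed)."""
    return 2 * surviving_nodes > total_nodes


if __name__ == "__main__":
    print(gcomm_address(["192.168.211.132", "192.168.211.133"]))
    for total in (2, 3):
        kept = has_quorum(total, total - 1)
        print(f"{total} nodes, 1 lost -> quorum kept: {kept}")
```

With two nodes, losing either one leaves 1 of 2, which is not a majority, so the survivor goes non-primary; with three nodes, 2 of 3 survive and keep serving.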