Python dpkt throws NeedData exception on a valid pcap

This is a duplicate of an unsolved question.
My code is very simple:
import dpkt

lst = []
for pcap_path in pcaps:
    f = open(pcap_path, 'rb')  # pcap files must be opened in binary mode
    pcap = dpkt.pcap.Reader(f)
    i = 1
    for ts, buf in pcap:
        eth = dpkt.ethernet.Ethernet(buf)
        ip = eth.data
        tcp = ip.data
        if tcp.dport == 80 and len(tcp.data) > 0:
            http = dpkt.http.Request(tcp.data)
            lst.append(http.headers['host'])
    f.close()
and here is the pcap
I don't want to use other pcap parsers because dpkt is by far the fastest; it's roughly 50x faster than scapy, for example.
It fails in the following packets:
Failed in packet 1
Failed in packet 6
Failed in packet 7
Failed in packet 8
Failed in packet 10
Failed in packet 12
Failed in packet 14
Failed in packet 19
Failed in packet 21
Failed in packet 22
Failed in packet 24
Failed in packet 26
Failed in packet 28
Failed in packet 30
Failed in packet 32
Failed in packet 34
Failed in packet 36
Failed in packet 38
Failed in packet 41
Failed in packet 42
Failed in packet 45
Failed in packet 46
Failed in packet 48
Failed in packet 50
Failed in packet 52
Failed in packet 54
Failed in packet 57
Failed in packet 58
Failed in packet 60
Failed in packet 62
Failed in packet 64
Failed in packet 68
Failed in packet 70
Failed in packet 72
Failed in packet 78
Failed in packet 80
Failed in packet 90
Failed in packet 92
Failed in packet 94
Failed in packet 98
Failed in packet 100
Failed in packet 102
Failed in packet 106
Failed in packet 108
Failed in packet 110
Failed in packet 114
Failed in packet 116
Failed in packet 118
Failed in packet 120
Failed in packet 124
Failed in packet 126
Failed in packet 128
Failed in packet 130
Failed in packet 132
Failed in packet 134
Failed in packet 137
Failed in packet 143
Failed in packet 145
Failed in packet 155
Failed in packet 157
Failed in packet 159
Failed in packet 161
Failed in packet 163
Failed in packet 165
Failed in packet 169
Failed in packet 171
Failed in packet 173
Failed in packet 175
Failed in packet 178
Failed in packet 180
Failed in packet 184
Failed in packet 186
Failed in packet 188
Failed in packet 190
Failed in packet 193
Failed in packet 194
Failed in packet 196
Failed in packet 200
Failed in packet 202
Failed in packet 204
Failed in packet 208
Failed in packet 210
Failed in packet 212
Failed in packet 216
Failed in packet 218
Failed in packet 220
Failed in packet 226
Failed in packet 228
Failed in packet 238
Failed in packet 240
Failed in packet 242
Failed in packet 244
Failed in packet 248
Failed in packet 250
Failed in packet 252
Failed in packet 256
Failed in packet 258
Failed in packet 260
Failed in packet 264
Failed in packet 266
Failed in packet 268
Failed in packet 272
Failed in packet 274
Failed in packet 276
Failed in packet 280
Failed in packet 282
Failed in packet 284
Failed in packet 288
Failed in packet 290
Failed in packet 292
Failed in packet 296
Failed in packet 298
Failed in packet 300
Failed in packet 304
Failed in packet 306
Failed in packet 308
Failed in packet 312
Failed in packet 314
Failed in packet 316

dpkt checks whether the value of the HTTP Content-Length header and the length of the actual data match, and this check is enforced strictly: when they disagree it raises NeedData. It will be fixed soon.
In the interim, you can make this work by commenting out that line in the dpkt library and adding a dummy pass statement in its stead, or by catching the exception in your own loop, as sketched below.
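A minimal sketch of the catch-and-skip approach, assuming the same pcaps list and loop structure as in the question (packets whose payload is not a complete, parsable HTTP request are simply skipped):

import dpkt

lst = []
for pcap_path in pcaps:
    with open(pcap_path, 'rb') as f:
        pcap = dpkt.pcap.Reader(f)
        for ts, buf in pcap:
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            if not isinstance(ip, dpkt.ip.IP):
                continue  # skip frames that are not IPv4 (e.g. ARP)
            tcp = ip.data
            if not isinstance(tcp, dpkt.tcp.TCP):
                continue  # skip non-TCP packets
            if tcp.dport == 80 and len(tcp.data) > 0:
                try:
                    http = dpkt.http.Request(tcp.data)
                except (dpkt.dpkt.NeedData, dpkt.dpkt.UnpackError):
                    # Not a complete HTTP request in this one segment
                    # (e.g. a body continuation, or a Content-Length larger
                    # than the data carried here) -- skip it.
                    continue
                lst.append(http.headers['host'])

Note that requests split across several TCP segments are still skipped by this sketch; capturing those as well would require reassembling the TCP streams first.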

Related

Error establishing a database connection - AWS Linux 2

I am using AWS Linux 2, and my WordPress site shows the error "Error establishing a database connection". I believe I was hacked via the wp blog page, and the entire site is disabled. Using MySQL:
$ sudo systemctl start mysqld
Job for mysqld.service failed because the control process exited with error code. See "systemctl status mysqld.service" and "journalctl -xe" for details.
At first I didn't realize that I needed to run those two commands in the CLI. Here is the output of systemctl status mysqld.service -l:
$ systemctl status mysqld.service -l
● mysqld.service - MySQL Server
Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Sun 2022-02-13 21:00:30 UTC; 6h ago
Docs: man:mysqld(8)
http://dev.mysql.com/doc/refman/en/using-systemd.html
Process: 19959 ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS (code=exited, status=1/FAILURE)
Process: 19926 ExecStartPre=/usr/bin/mysqld_pre_systemd (code=exited, status=0/SUCCESS)
Main PID: 19959 (code=exited, status=1/FAILURE)
Status: "Server startup in progress"
Error: 13 (Permission denied)
Feb 13 21:00:30 ip-172-31-91-154.ec2.internal mysqld[19959]: 2022-02-13T21:00:30.730728Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.28) starting as process 19959
Feb 13 21:00:30 ip-172-31-91-154.ec2.internal mysqld[19959]: 2022-02-13T21:00:30.734108Z 0 [Warning] [MY-010091] [Server] Can't create test file /var/lib/mysql/mysqld_tmp_file_case_insensitive_test.lower-test
Feb 13 21:00:30 ip-172-31-91-154.ec2.internal mysqld[19959]: 2022-02-13T21:00:30.734121Z 0 [Warning] [MY-010159] [Server] Setting lower_case_table_names=2 because file system for /var/lib/mysql/ is case insensitive
Feb 13 21:00:30 ip-172-31-91-154.ec2.internal mysqld[19959]: 2022-02-13T21:00:30.734720Z 0 [ERROR] [MY-010187] [Server] Could not open file '/var/log/mysqld.log' for error logging: Permission denied
Feb 13 21:00:30 ip-172-31-91-154.ec2.internal mysqld[19959]: 2022-02-13T21:00:30.736749Z 0 [ERROR] [MY-010119] [Server] Aborting
Feb 13 21:00:30 ip-172-31-91-154.ec2.internal mysqld[19959]: 2022-02-13T21:00:30.736912Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.28) MySQL Community Server - GPL.
Feb 13 21:00:30 ip-172-31-91-154.ec2.internal systemd[1]: mysqld.service: main process exited, code=exited, status=1/FAILURE
Feb 13 21:00:30 ip-172-31-91-154.ec2.internal systemd[1]: Failed to start MySQL Server.
Feb 13 21:00:30 ip-172-31-91-154.ec2.internal systemd[1]: Unit mysqld.service entered failed state.
Feb 13 21:00:30 ip-172-31-91-154.ec2.internal systemd[1]: mysqld.service failed.
and here is the output of running journalctl -xe -l:
Feb 14 03:17:27 ip-172-31-91-154.ec2.internal sshd[29726]: Excess permission or bad ownership on file /var/lo
Feb 14 03:17:27 ip-172-31-91-154.ec2.internal sshd[29726]: Received disconnect from 81.70.242.147 port 43604:
Feb 14 03:17:27 ip-172-31-91-154.ec2.internal sshd[29726]: Disconnected from 81.70.242.147 port 43604 [preaut
Feb 14 03:17:41 ip-172-31-91-154.ec2.internal sshd[29728]: Connection closed by 104.243.26.5 port 46890 [prea
Feb 14 03:18:06 ip-172-31-91-154.ec2.internal sshd[29730]: Invalid user yw from 82.196.5.221 port 47008
Feb 14 03:18:06 ip-172-31-91-154.ec2.internal sshd[29730]: Excess permission or bad ownership on file /var/lo
Feb 14 03:18:06 ip-172-31-91-154.ec2.internal sshd[29730]: input_userauth_request: invalid user yw [preauth]
Feb 14 03:18:06 ip-172-31-91-154.ec2.internal sshd[29730]: pam_unix(sshd:auth): check pass; user unknown
Feb 14 03:18:06 ip-172-31-91-154.ec2.internal sshd[29730]: pam_unix(sshd:auth): authentication failure; logna
Feb 14 03:18:08 ip-172-31-91-154.ec2.internal sshd[29730]: Failed password for invalid user yw from 82.196.5.
Feb 14 03:18:08 ip-172-31-91-154.ec2.internal sshd[29730]: Excess permission or bad ownership on file /var/lo
Feb 14 03:18:08 ip-172-31-91-154.ec2.internal sshd[29730]: Received disconnect from 82.196.5.221 port 47008:1
Feb 14 03:18:08 ip-172-31-91-154.ec2.internal sshd[29730]: Disconnected from 82.196.5.221 port 47008 [preauth
Feb 14 03:18:15 ip-172-31-91-154.ec2.internal sshd[29732]: Invalid user server from 120.53.121.152 port 59070
Feb 14 03:18:15 ip-172-31-91-154.ec2.internal sshd[29732]: Excess permission or bad ownership on file /var/lo
Feb 14 03:18:15 ip-172-31-91-154.ec2.internal sshd[29732]: input_userauth_request: invalid user server [preau
Feb 14 03:18:15 ip-172-31-91-154.ec2.internal sshd[29732]: pam_unix(sshd:auth): check pass; user unknown
Feb 14 03:18:15 ip-172-31-91-154.ec2.internal sshd[29732]: pam_unix(sshd:auth): authentication failure; logna
Feb 14 03:18:18 ip-172-31-91-154.ec2.internal sshd[29732]: Failed password for invalid user server from 120.5
Feb 14 03:18:18 ip-172-31-91-154.ec2.internal sshd[29732]: Excess permission or bad ownership on file /var/lo
Feb 14 03:18:19 ip-172-31-91-154.ec2.internal sshd[29732]: Received disconnect from 120.53.121.152 port 59070
Feb 14 03:18:19 ip-172-31-91-154.ec2.internal sshd[29732]: Disconnected from 120.53.121.152 port 59070 [preau
Feb 14 03:18:27 ip-172-31-91-154.ec2.internal dhclient[2827]: XMT: Solicit on eth0, interval 111530ms.
I tried removing the db and reinstalling but this did not make any difference.
The issue is this line:
[ERROR] [MY-010187] [Server] Could not open file '/var/log/mysqld.log' for error logging: Permission denied
which is supported by this error message:
access forbidden by rule
So the issue is permissions on the /var/log/mysqld.log file. What is needed is to give mysql permission to access this file - running this command, followed by a server reboot, solved the issue:
$ sudo chown mysql:mysql /var/log/mysqld.log
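If you go the permissions route rather than rebuilding the instance, a quick way to confirm the fix (standard commands; adjust the service name to your setup):
$ ls -l /var/log/mysqld.log           # should now show mysql:mysql as owner
$ sudo systemctl restart mysqld
$ systemctl status mysqld.service -l  # should report active (running)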
Deleting the EC2 instance and starting over was the direction I took. I still have no idea what caused the error, but I am double-checking security holes and patching up. Thanks everyone for responding.

Failed to start LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon

I am using a MySQL Percona XtraDB Cluster with 3 nodes, and the MySQL service on one node stopped working because the disk was full. After fixing that issue we tried to start MySQL on this node, but the following error occurred (before this issue, all nodes had been working fine):
$ sudo /etc/init.d/mysql start
[....] Starting mysql (via systemctl): mysql.serviceJob for mysql.service failed because the control process exited with error code.
See "systemctl status mysql.service" and "journalctl -xe" for details.
failed!
$ sudo systemctl status mysql.service
● mysql.service - LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon
Loaded: loaded (/etc/init.d/mysql; generated)
Drop-In: /etc/systemd/system/mysql.service.d
└─override.conf
Active: failed (Result: exit-code) since Tue 2020-03-24 11:57:55 +0430; 1h 2min ago
Docs: man:systemd-sysv-generator(8)
Process: 19397 ExecStart=/etc/init.d/mysql start (code=exited, status=1/FAILURE)
Mar 24 11:57:47 server-3 systemd[1]: Starting LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon...
Mar 24 11:57:47 server-3 mysql[19397]: * Starting MySQL (Percona XtraDB Cluster) database server mysqld
Mar 24 11:57:55 server-3 mysql[19397]: * The server quit without updating PID file (/var/run/mysqld/mysqld.pid).
Mar 24 11:57:55 server-3 mysql[19397]: ...fail!
Mar 24 11:57:55 server-3 systemd[1]: mysql.service: Control process exited, code=exited status=1
Mar 24 11:57:55 server-3 systemd[1]: mysql.service: Failed with result 'exit-code'.
Mar 24 11:57:55 server-3 systemd[1]: Failed to start LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon.
Operating System: Ubuntu 18.04.2 LTS
mysql Ver 14.14 Distrib 5.7.25-28, for debian-linux-gnu (x86_64) using 7.0
Also, please find the journalctl log below:
$ sudo journalctl -xe
Journal file /var/log/journal/7e072ef87bdf452f8a9684d905f87daf/user-1863206337.journal is truncated, ignoring file.
Mar 24 19:05:22 server-3 pmm-agent[30366]: INFO[2020-03-24T19:05:22.235+04:30] time="2020-03-24T19:05:22+04:30" level=error msg="Error pinging mysqld: dial tcp 1
Mar 24 19:05:22 server-3 mysql[6820]: * The server quit without updating PID file (/var/run/mysqld/mysqld.pid).
Mar 24 19:05:22 server-3 mysql[6820]: ...fail!
Mar 24 19:05:22 server-3 systemd[1]: mysql.service: Control process exited, code=exited status=1
Mar 24 19:05:22 server-3 systemd[1]: mysql.service: Failed with result 'exit-code'.
Mar 24 19:05:22 server-3 systemd[1]: mysql.service: Failed with result 'exit-code'.
Mar 24 19:05:22 server-3 systemd[1]: mysql.service: Failed with result 'exit-code'.
Mar 24 19:05:22 server-3 systemd[1]: Failed to start LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon.
-- Subject: Unit mysql.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit mysql.service has failed.
--
-- The result is RESULT.
Mar 24 19:05:22 server-3 sudo[6779]: pam_unix(sudo:session): session closed for user root
Mar 24 19:05:22 server-3 kernel: ACPI Error: SMBus/IPMI/GenericSerialBus write requires Buffer of length 66, found length 32 (20170831/exfield-427)
Mar 24 19:05:22 server-3 kernel: No Local Variables are initialized for Method [_PMM]
Mar 24 19:05:22 server-3 kernel: No Arguments are initialized for method [_PMM]
Mar 24 19:05:22 server-3 kernel: ACPI Error: Method parse/execution failed \_SB.PMI0._PMM, AE_AML_BUFFER_LIMIT (20170831/psparse-550)
Mar 24 19:05:22 server-3 kernel: ACPI Exception: AE_AML_BUFFER_LIMIT, Evaluating _PMM (20170831/power_meter-338)
Mar 24 19:05:23 server-3 snmpd[1805]: Connection from UDP: [192.168.2.10]:54620->[192.168.8.3]:161
Mar 24 19:05:23 server-3 snmpd[1805]: Connection from UDP: [192.168.2.10]:54647->[192.168.8.3]:161
Mar 24 19:05:25 server-3 pmm-agent[30366]: INFO[2020-03-24T19:05:25.550+04:30] time="2020-03-24T19:05:25+04:30" level=error msg="Error pinging mysqld: dial tcp 1
Mar 24 19:05:26 server-3 snmpd[1805]: Connection from UDP: [192.168.2.10]:55485->[192.168.8.3]:161
Mar 24 19:05:26 server-3 snmpd[1805]: Connection from UDP: [192.168.2.10]:55500->[192.168.8.3]:161
Mar 24 19:05:27 server-3 pmm-agent[30366]: INFO[2020-03-24T19:05:27.235+04:30] time="2020-03-24T19:05:27+04:30" level=error msg="Error pinging mysqld: dial tcp 1
Mar 24 19:05:27 server-3 kernel: ACPI Error: SMBus/IPMI/GenericSerialBus write requires Buffer of length 66, found length 32 (20170831/exfield-427)
Mar 24 19:05:27 server-3 kernel: No Local Variables are initialized for Method [_PMM]
Mar 24 19:05:27 server-3 kernel: No Arguments are initialized for method [_PMM]
Mar 24 19:05:27 server-3 kernel: ACPI Error: Method parse/execution failed \_SB.PMI0._PMM, AE_AML_BUFFER_LIMIT (20170831/psparse-550)
Mar 24 19:05:27 server-3 kernel: ACPI Exception: AE_AML_BUFFER_LIMIT, Evaluating _PMM (20170831/power_meter-338)
Mar 24 19:05:30 server-3 pmm-agent[30366]: INFO[2020-03-24T19:05:30.550+04:30] time="2020-03-24T19:05:30+04:30" level=error msg="Error pinging mysqld: dial tcp 1
Mar 24 19:05:31 server-3 pmm-agent[30366]: ERRO[2020-03-24T19:05:31.438+04:30] cannot select ##slow_query_log_file: dial tcp 127.0.0.1:3306: connect: connection
Mar 24 19:05:32 server-3 pmm-agent[30366]: INFO[2020-03-24T19:05:32.234+04:30] time="2020-03-24T19:05:32+04:30" level=error msg="Error pinging mysqld: dial tcp 1
Mar 24 19:05:32 server-3 kernel: ACPI Error: SMBus/IPMI/GenericSerialBus write requires Buffer of length 66, found length 32 (20170831/exfield-427)
Mar 24 19:05:32 server-3 kernel: No Local Variables are initialized for Method [_PMM]
Mar 24 19:05:32 server-3 kernel: No Arguments are initialized for method [_PMM]
Mar 24 19:05:32 server-3 kernel: ACPI Error: Method parse/execution failed \_SB.PMI0._PMM, AE_AML_BUFFER_LIMIT (20170831/psparse-550)
Mar 24 19:05:32 server-3 kernel: ACPI Exception: AE_AML_BUFFER_LIMIT, Evaluating _PMM (20170831/power_meter-338)
Mar 24 19:05:35 server-3 pmm-agent[30366]: INFO[2020-03-24T19:05:35.550+04:30] time="2020-03-24T19:05:35+04:30" level=error msg="Error pinging mysqld: dial tcp 1
Mar 24 19:05:36 server-3 sudo[7805]: admin_user : problem with defaults entries ; TTY=pts/0 ; PWD=/etc/systemd/system/mysql.service.d ; USER=root ;
Mar 24 19:05:36 server-3 sudo[7805]: admin_user : TTY=pts/0 ; PWD=/etc/systemd/system/mysql.service.d ; USER=root ; COMMAND=/bin/journalctl -xe
Mar 24 19:05:36 server-3 sudo[7805]: pam_unix(sudo:session): session opened for user root by admin_user(uid=0)
And please find mysqld.log below:
/var/log$ sudo tail -n50 mysqld.log
2020-03-23T20:44:32.844540Z 36 [ERROR] WSREP: Node consistency compromised, aborting...
2020-03-23T20:44:32.844642Z 13 [ERROR] WSREP: Failed to apply trx: source: 89d92e92-6638-11e9-bdcb-7f2c26717188 version: 4 local: 0 state: APPLYING flags: 1 conn_id: 26113114 trx_id: 5580550846 seqnos (l: 151018291, g: 793430465, s: 793430462, d: 793430355, ts: 33700460567224654)
2020-03-23T20:44:32.844671Z 13 [ERROR] WSREP: Failed to apply trx 793430465 4 times
2020-03-23T20:44:32.844683Z 13 [ERROR] WSREP: Node consistency compromised, aborting...
2020-03-23T20:44:33.255602Z 21 [Note] WSREP: (c0cb4c75, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.8.1:4567 tcp://192.168.8.2:4567
2020-03-23T20:44:37.817120Z 21 [Note] WSREP: declaring node with index 0 suspected, timeout PT5S (evs.suspect_timeout)
2020-03-23T20:44:37.817139Z 21 [Note] WSREP: declaring node with index 1 suspected, timeout PT5S (evs.suspect_timeout)
2020-03-23T20:44:37.817147Z 21 [Note] WSREP: evs::proto(c0cb4c75, LEAVING, view_id(REG,82cab7cb,31)) suspecting node: 82cab7cb
2020-03-23T20:44:37.817152Z 21 [Note] WSREP: evs::proto(c0cb4c75, LEAVING, view_id(REG,82cab7cb,31)) suspected node without join message, declaring inactive
2020-03-23T20:44:37.817158Z 21 [Note] WSREP: evs::proto(c0cb4c75, LEAVING, view_id(REG,82cab7cb,31)) suspecting node: 89d92e92
2020-03-23T20:44:37.817161Z 21 [Note] WSREP: evs::proto(c0cb4c75, LEAVING, view_id(REG,82cab7cb,31)) suspected node without join message, declaring inactive
2020-03-23T20:44:37.817182Z 21 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,82cab7cb,31)
memb {
c0cb4c75,0
}
joined {
}
left {
}
partitioned {
82cab7cb,0
89d92e92,0
}
)
2020-03-23T20:44:37.817203Z 21 [Note] WSREP: Current view of cluster as seen by this node
view ((empty))
2020-03-23T20:44:37.817358Z 21 [Note] WSREP: gcomm: closed
2020-03-23T20:44:37.817463Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2020-03-23T20:44:37.817504Z 0 [Note] WSREP: Flow-control interval: [10000, 10000]
2020-03-23T20:44:37.817515Z 0 [Note] WSREP: Trying to continue unpaused monitor
2020-03-23T20:44:37.817522Z 0 [Note] WSREP: Received NON-PRIMARY.
2020-03-23T20:44:37.817524Z 13 [ERROR] WSREP: non-standard exception
2020-03-23T20:44:37.817530Z 0 [Note] WSREP: Shifting DONOR/DESYNCED -> OPEN (TO: 793430465)
2020-03-23T20:44:37.817568Z 13 [Note] WSREP: applier thread exiting (code:8)
2020-03-23T20:44:37.817584Z 13 [Note] WSREP: Starting Shutdown
2020-03-23T20:44:37.817587Z 0 [Note] WSREP: Received self-leave message.
2020-03-23T20:44:37.817529Z 36 [ERROR] WSREP: non-standard exception
2020-03-23T20:44:37.817643Z 36 [Note] WSREP: applier thread exiting (code:8)
2020-03-23T20:44:37.817656Z 36 [Note] WSREP: Starting Shutdown
2020-03-23T20:44:37.817759Z 0 [Note] WSREP: Received shutdown signal. Will sleep for 10 secs before initiating shutdown. pxc_maint_mode switched to SHUTDOWN
2020-03-23T20:44:37.817813Z 0 [Note] WSREP: Flow-control interval: [10000, 10000]
2020-03-23T20:44:37.817841Z 0 [Note] WSREP: Trying to continue unpaused monitor
2020-03-23T20:44:37.817852Z 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2020-03-23T20:44:37.817860Z 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 793430465)
2020-03-23T20:44:37.817870Z 0 [Note] WSREP: RECV thread exiting 0: Success
2020-03-23T20:44:37.818185Z 21 [Note] WSREP: recv_thread() joined.
2020-03-23T20:44:37.818192Z 21 [Note] WSREP: Closing replication queue.
2020-03-23T20:44:37.818195Z 21 [Note] WSREP: Closing slave action queue.
2020-03-23T20:44:37.818247Z 21 [Note] WSREP: /usr/sbin/mysqld: Terminated.
How can I start MySQL on this node?
Thanks in advance.
As your mysqld.log shows, the disk-full condition has left your node in a state where it can find no way to achieve consistency with the cluster.
This implies that you have to remove the node from the cluster, empty it out, and then re-add it like a new node, as sketched below.
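A rough sketch of that procedure on the broken node, assuming the data directory is /var/lib/mysql and the other two nodes are up and form the Primary component (paths, service names and init scripts may differ on your setup):
$ sudo /etc/init.d/mysql stop
$ sudo mv /var/lib/mysql /var/lib/mysql.bak   # keep the old data aside, just in case
$ sudo mkdir /var/lib/mysql
$ sudo chown -R mysql:mysql /var/lib/mysql
$ sudo /etc/init.d/mysql start                # the empty node rejoins and pulls a full SST from a donor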

Kubernetes Galera: WSREP: failed to open gcomm backend connection: 110:

I am trying to set up a 3-replica Galera cluster on Kubernetes. I am using:
https://github.com/kubernetes/kubernetes/tree/master/test/e2e/testing-manifests/statefulset/mysql-galera
The first pod spins up fine, but the second pod gets stuck:
1 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():162
2018-07-21 18:24:40 1 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2018-07-21 18:24:40 1 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1379: Failed to open channel 'mysql' at 'gcomm://mysql-0.galera.mysql.svc.cluster.local,mysql-1.galera.mysql.svc.cluster.local': -110 (Connection timed out)
2018-07-21 18:24:40 1 [ERROR] WSREP: gcs connect failed: Connection timed out
2018-07-21 18:24:40 1 [ERROR] WSREP: wsrep::connect(gcomm://mysql-0.galera.mysql.svc.cluster.local,mysql-1.galera.mysql.svc.cluster.local) failed: 7
2018-07-21 18:24:40 1 [ERROR] Aborting
Do I need to have etcd set up for this cluster to work? Any suggestions would be appreciated.
Thank you!
Kubernetes Info:
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:53:20Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:43:26Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Percona mysql xtradb cluster doesn't start properly and node restarts don't work

tl;dr
When starting a fresh Percona cluster of 3 Kubernetes pods, the grastate.dat seqno is set to -1 and doesn't change. On deleting one pod and watching it restart, expecting it to rejoin the cluster, it sets its initial position to 00000000-0000-0000-0000-000000000000:-1 and tries to connect to itself (its former IP), maybe because it had been the first pod in the cluster? It then times out in its erroneous connection to itself:
2017-03-26T08:38:05.374058Z 0 [Note] WSREP: (b7571ff8, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.52.0.26:4567 timed out, no messages seen in PT3S
The cluster doesn't get started properly and I'm unable to successfully restart pods in the cluster.
Full
When I start the cluster from scratch, with blank data directories and a fresh etcd cluster, everything seems to come up. However, when I look at grastate.dat I find that the seqno for each pod is -1:
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-0/grastate.dat
# GALERA saved state
version: 2.1
uuid: a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno: -1
safe_to_bootstrap: 0
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-1/grastate.dat
# GALERA saved state
version: 2.1
uuid: a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno: -1
safe_to_bootstrap: 0
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-2/grastate.dat
# GALERA saved state
version: 2.1
uuid: a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno: -1
safe_to_bootstrap: 0
At this point I can do mysql -h percona -u wordpress -p and connect and wordpress works too.
Scenario:
I have 3 percona pods
jonathan@ubuntu:~/Projects/k8wp$ kubectl get pods
NAME        READY     STATUS    RESTARTS   AGE
etcd-0      1/1       Running   1          12h
etcd-1      1/1       Running   0          12h
etcd-2      1/1       Running   3          12h
etcd-3      1/1       Running   1          12h
percona-0   1/1       Running   0          8m
percona-1   1/1       Running   0          57m
percona-2   1/1       Running   0          57m
When I try to restart percona-0, it gets kicked out of the cluster on restarting; percona-0's gvwstate.dat file shows:
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-0/gvwstate.dat
my_uuid: b7571ff8-11f8-11e7-bd2d-8b50487e1523
#vwbeg
view_id: 3 b7571ff8-11f8-11e7-bd2d-8b50487e1523 3
bootstrap: 0
member: b7571ff8-11f8-11e7-bd2d-8b50487e1523 0
member: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 0
member: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a 0
#vwend
The other 2 pods in the cluster show:
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-1/gvwstate.dat
my_uuid: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a
#vwbeg
view_id: 3 bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 4
bootstrap: 0
member: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 0
member: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a 0
#vwend
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-2/gvwstate.dat
my_uuid: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a
#vwbeg
view_id: 3 bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 4
bootstrap: 0
member: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 0
member: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a 0
#vwend
Here are what I think are the relevant errors from percona-0's startup:
2017-03-26T08:37:58.370605Z 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2017-03-26T08:37:58.372537Z 0 [Note] WSREP: gcomm: connecting to group 'wordpress-001', peer '10.52.0.26:'
2017-03-26T08:38:01.373345Z 0 [Note] WSREP: (b7571ff8, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.52.0.26:4567 timed out, no messages seen in PT3S
2017-03-26T08:38:01.373682Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-03-26T08:38:01.373750Z 0 [Note] WSREP: view(view_id(NON_PRIM,b7571ff8,5) memb {
b7571ff8,0
} joined {
} left {
} partitioned {
})
2017-03-26T08:38:01.373838Z 0 [Note] WSREP: gcomm: connected
2017-03-26T08:38:01.373872Z 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2017-03-26T08:38:01.373987Z 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2017-03-26T08:38:01.374012Z 0 [Note] WSREP: Opened channel 'wordpress-001'
2017-03-26T08:38:01.374108Z 0 [Note] WSREP: Waiting for SST to complete.
2017-03-26T08:38:01.374417Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2017-03-26T08:38:01.374469Z 0 [Note] WSREP: Flow-control interval: [16, 16]
2017-03-26T08:38:01.374491Z 0 [Note] WSREP: Received NON-PRIMARY.
2017-03-26T08:38:01.374560Z 1 [Note] WSREP: New cluster view: global state: :-1, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version -1
The IP it's trying to connect to, 10.52.0.26, in 2017-03-26T08:37:58.372537Z 0 [Note] WSREP: gcomm: connecting to group 'wordpress-001', peer '10.52.0.26:' is actually that pod's previous IP. Here's the listing of keys in etcd I did before deleting percona-0:
/ # etcdctl ls --recursive
/pxc-cluster
/pxc-cluster/wordpress
/pxc-cluster/queue
/pxc-cluster/queue/wordpress
/pxc-cluster/queue/wordpress-001
/pxc-cluster/wordpress-001
/pxc-cluster/wordpress-001/10.52.1.46
/pxc-cluster/wordpress-001/10.52.1.46/ipaddr
/pxc-cluster/wordpress-001/10.52.1.46/hostname
/pxc-cluster/wordpress-001/10.52.2.33
/pxc-cluster/wordpress-001/10.52.2.33/ipaddr
/pxc-cluster/wordpress-001/10.52.2.33/hostname
/pxc-cluster/wordpress-001/10.52.0.26
/pxc-cluster/wordpress-001/10.52.0.26/hostname
/pxc-cluster/wordpress-001/10.52.0.26/ipaddr
After kubectl delete pods/percona-0:
/ # etcdctl ls --recursive
/pxc-cluster
/pxc-cluster/queue
/pxc-cluster/queue/wordpress
/pxc-cluster/queue/wordpress-001
/pxc-cluster/wordpress-001
/pxc-cluster/wordpress-001/10.52.1.46
/pxc-cluster/wordpress-001/10.52.1.46/ipaddr
/pxc-cluster/wordpress-001/10.52.1.46/hostname
/pxc-cluster/wordpress-001/10.52.2.33
/pxc-cluster/wordpress-001/10.52.2.33/ipaddr
/pxc-cluster/wordpress-001/10.52.2.33/hostname
/pxc-cluster/wordpress
Also, during the restart percona-0 tried to register with etcd using:
{"action":"create","node":{"key":"/pxc-cluster/queue/wordpress-001/00000000000000009886","value":"10.52.0.27","expiration":"2017-03-26T08:38:57.980325718Z","ttl":60,"modifiedIndex":9886,"createdIndex":9886}}
{"action":"set","node":{"key":"/pxc-cluster/wordpress-001/10.52.0.27/ipaddr","value":"10.52.0.27","expiration":"2017-03-26T08:38:28.01814818Z","ttl":30,"modifiedIndex":9887,"createdIndex":9887}}
{"action":"set","node":{"key":"/pxc-cluster/wordpress-001/10.52.0.27/hostname","value":"percona-0","expiration":"2017-03-26T08:38:28.037188157Z","ttl":30,"modifiedIndex":9888,"createdIndex":9888}}
{"action":"update","node":{"key":"/pxc-cluster/wordpress-001/10.52.0.27","dir":true,"expiration":"2017-03-26T08:38:28.054726795Z","ttl":30,"modifiedIndex":9889,"createdIndex":9887},"prevNode":{"key":"/pxc-cluster/wordpress-001/10.52.0.27","dir":true,"modifiedIndex":9887,"createdIndex":9887}}
which doesn't work.
From the second member of the cluster percona-1:
2017-03-26T08:37:44.069583Z 0 [Note] WSREP: (bd05a643, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.52.0.26:4567
2017-03-26T08:37:45.069756Z 0 [Note] WSREP: (bd05a643, 'tcp://0.0.0.0:4567') reconnecting to b7571ff8 (tcp://10.52.0.26:4567), attempt 0
2017-03-26T08:37:48.570332Z 0 [Note] WSREP: (bd05a643, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.52.0.26:4567 timed out, no messages seen in PT3S
2017-03-26T08:37:49.605089Z 0 [Note] WSREP: evs::proto(bd05a643, GATHER, view_id(REG,b7571ff8,3)) suspecting node: b7571ff8
2017-03-26T08:37:49.605276Z 0 [Note] WSREP: evs::proto(bd05a643, GATHER, view_id(REG,b7571ff8,3)) suspected node without join message, declaring inactive
2017-03-26T08:37:50.104676Z 0 [Note] WSREP: declaring c33d6a73 at tcp://10.52.2.33:4567 stable
New Info:
I restarted percona-0 again, and this time it somehow came up! After a few tries I realised the pod needs to be restarted twice to come up, i.e. after deleting it the first time it comes up with the above errors, and after deleting it the second time it comes up okay and syncs with the other members. Could this be because it was the first pod in the cluster?
I've tested deleting the other pods, but they all come back up okay.
The issue only lies with percona-0.
Also: if I take down all the pods at once, as would happen if my node were to crash, the pods don't come back up at all! I suspect it's because no state is saved to grastate.dat, i.e. seqno remains -1 even though the global id may change. The pods exit with mysqld shutdown and the following errors:
jonathan@ubuntu:~/Projects/k8wp$ kubectl logs percona-2 | grep ERROR
2017-03-26T11:20:25.795085Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
2017-03-26T11:20:25.795276Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-03-26T11:20:25.795544Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1437: Failed to open channel 'wordpress-001' at 'gcomm://10.52.2.36': -110 (Connection timed out)
2017-03-26T11:20:25.795618Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-03-26T11:20:25.795645Z 0 [ERROR] WSREP: wsrep::connect(gcomm://10.52.2.36) failed: 7
2017-03-26T11:20:25.795693Z 0 [ERROR] Aborting
jonathan@ubuntu:~/Projects/k8wp$ kubectl logs percona-1 | grep ERROR
2017-03-26T11:20:27.093780Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
2017-03-26T11:20:27.093977Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-03-26T11:20:27.094145Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1437: Failed to open channel 'wordpress-001' at 'gcomm://10.52.1.49': -110 (Connection timed out)
2017-03-26T11:20:27.094200Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-03-26T11:20:27.094227Z 0 [ERROR] WSREP: wsrep::connect(gcomm://10.52.1.49) failed: 7
2017-03-26T11:20:27.094247Z 0 [ERROR] Aborting
jonathan@ubuntu:~/Projects/k8wp$ kubectl logs percona-0 | grep ERROR
2017-03-26T11:20:52.040214Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
2017-03-26T11:20:52.040279Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-03-26T11:20:52.040385Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1437: Failed to open channel 'wordpress-001' at 'gcomm://10.52.2.36': -110 (Connection timed out)
2017-03-26T11:20:52.040437Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-03-26T11:20:52.040471Z 0 [ERROR] WSREP: wsrep::connect(gcomm://10.52.2.36) failed: 7
2017-03-26T11:20:52.040508Z 0 [ERROR] Aborting
grastate.dat on deleting all pods:
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-0/grastate.dat
# GALERA saved state
version: 2.1
uuid: a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno: -1
safe_to_bootstrap: 0
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-1/grastate.dat
# GALERA saved state
version: 2.1
uuid: a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno: -1
safe_to_bootstrap: 0
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-2/grastate.dat
# GALERA saved state
version: 2.1
uuid: a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno: -1
safe_to_bootstrap: 0
There is no gvwstate.dat.
I fixed it by changing the entrypoint in the container to the following script:
#!/bin/bash
sed -i "s|safe_to_bootstrap.*:.*|safe_to_bootstrap:1|1" /var/lib/mysql/grastate.dat
/entrypoint.sh --wsrep-new-cluster
Thanks to https://www.claudiokuenzler.com/blog/494/galera-cluster-mysql-not-starting-failed-to-open-channel-reach-primary#.WNesDiF97Qo
The issue is that, when restarting the 3 pods after a crash, they all hit the following error:
[ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
What that means (summarizing from the link) is that, since all the pods are down, the first pod (the pods are managed by a StatefulSet) comes up and tries to reconnect to the cluster, but doesn't find any other pods it can connect to, so it goes down; the next pod comes up, tries the same thing, hits the same error and goes down too, and so on.
The solution is for the first pod to start a new cluster when it comes up; then all the subsequent pods will come up and find a node to connect to. It will still come up with all the data.
So with percona xtradb the docker container's entrypoint looks like:
exec mysqld --user=mysql --wsrep_cluster_name=$CLUSTER_NAME --wsrep_cluster_address="gcomm://$cluster_join" --wsrep_sst_method=xtrabackup-v2 --wsrep_sst_auth="xtrabackup:$XTRABACKUP_PASSWORD" --log-error=${DATADIR}error.log $CMDARG
So all I have to do to get the setup running is pass the earlier argument --wsrep-new-cluster to the /entrypoint.sh file like so:
/entrypoint.sh --wsrep-new-cluster
PS: I tried the above alone at first, but I ran into an error stating that to force a new cluster and bootstrap from that node I had to set safe_to_bootstrap from 0 to 1 in /var/lib/mysql/grastate.dat.

MySql version 5.7.12 getting crashed

We have migrated from SQL Server 2008 to MySQL 5.7.12 on an AWS EC2 instance running Ubuntu 14.04 LTS. The server is crashing repeatedly. The logs are below:
2016-05-25T06:17:40.045804Z 2946 [Note] Aborted connection 2946 to db: 'dbowithrw' user: 'cwuser' host: '172.16.4.138' (Got an error reading communication packets)
2016-05-25T06:17:40.046804Z 2938 [Note] Aborted connection 2938 to db: 'dbowithrw' user: 'cwuser' host: '172.16.4.138' (Got an error reading communication packets)
2016-05-25T06:17:40.046817Z 2945 [Note] Aborted connection 2945 to db: 'dbowithrw' user: 'cwuser' host: '172.16.4.138' (Got an error reading communication packets)
2016-05-25T06:22:41.447479Z 2985 [Note] Aborted connection 2985 to db: 'dbowithrw' user: 'cwuser' host: '172.16.1.48' (Got an error reading communication packets)
2016-05-25T06:22:41.447483Z 2964 [Note] Aborted connection 2964 to db: 'dbowithrw' user: 'cwuser' host: '172.16.1.48' (Got an error reading communication packets)
2016-05-25T06:24:11.317802Z 2931 [Note] Aborted connection 2931 to db: 'dbowithrw' user: 'cwuser' host: '172.16.2.172' (Got an error reading communication packets)
2016-05-25T06:34:51.345602Z 3008 [Note] Aborted connection 3008 to db: 'dbowithrw' user: 'cwuser' host: '172.16.1.48' (Got an error reading communication packets)
06:35:45 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
key_buffer_size=52428800
read_buffer_size=20971520
max_used_connections=284
max_threads=214
thread_count=283
connection_count=283
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 8843311 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7fe34c6be4d0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fe40a536e40 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0xebef8c]
/usr/sbin/mysqld(handle_fatal_signal+0x451)[0x7acb61]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7fe6f98f4340]
/usr/sbin/mysqld(_ZN14Item_func_case7cleanupEv+0x21)[0x8236b1]
/usr/sbin/mysqld(_Z13cleanup_itemsP4Item+0x21)[0xc94431]
/usr/sbin/mysqld(_ZN5TABLE16cleanup_gc_itemsEv+0x34)[0xd37234]
/usr/sbin/mysqld(_Z19close_thread_tablesP3THD+0x5f)[0xc404ff]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x4b7)[0xc970c7]
/usr/sbin/mysqld(_ZN13sp_instr_stmt9exec_coreEP3THDPj+0x50)[0xc20d00]
/usr/sbin/mysqld(_ZN12sp_lex_instr23reset_lex_and_exec_coreEP3THDPjb+0x384)[0xc228e4]
/usr/sbin/mysqld(_ZN12sp_lex_instr29validate_lex_and_execute_coreEP3THDPjb+0xab)[0xc2327b]
/usr/sbin/mysqld(_ZN13sp_instr_stmt7executeEP3THDPj+0x120)[0xc24410]
/usr/sbin/mysqld(_ZN7sp_head7executeEP3THDb+0x4f4)[0xc1c624]
/usr/sbin/mysqld(_ZN7sp_head15execute_triggerEP3THDRK25st_mysql_const_lex_stringS4_P10GRANT_INFO+0x1fc)[0xc1cedc]
/usr/sbin/mysqld(_ZN7Trigger7executeEP3THD+0x10d)[0xd403ed]
/usr/sbin/mysqld(_ZN13Trigger_chain16execute_triggersEP3THD+0x18)[0xd434a8]
/usr/sbin/mysqld(_ZN24Table_trigger_dispatcher16process_triggersEP3THD23enum_trigger_event_type29enum_trigger_action_time_typeb+0x52)[0xd3cee2]
/usr/sbin/mysqld(_Z12mysql_updateP3THDR4ListI4ItemES4_y15enum_duplicatesPyS6_+0x134d)[0xd1cead]
/usr/sbin/mysqld(_ZN14Sql_cmd_update23try_single_table_updateEP3THDPb+0x1b6)[0xd1f346]
/usr/sbin/mysqld(_ZN14Sql_cmd_update7executeEP3THD+0x27)[0xd1f667]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x5d0)[0xc971e0]
/usr/sbin/mysqld(_ZN13sp_instr_stmt9exec_coreEP3THDPj+0x50)[0xc20d00]
/usr/sbin/mysqld(_ZN12sp_lex_instr23reset_lex_and_exec_coreEP3THDPjb+0x384)[0xc228e4]
/usr/sbin/mysqld(_ZN12sp_lex_instr29validate_lex_and_execute_coreEP3THDPjb+0xab)[0xc2327b]
/usr/sbin/mysqld(_ZN13sp_instr_stmt7executeEP3THDPj+0x120)[0xc24410]
/usr/sbin/mysqld(_ZN7sp_head7executeEP3THDb+0x4f4)[0xc1c624]
/usr/sbin/mysqld(_ZN7sp_head15execute_triggerEP3THDRK25st_mysql_const_lex_stringS4_P10GRANT_INFO+0x1fc)[0xc1cedc]
/usr/sbin/mysqld(_ZN7Trigger7executeEP3THD+0x10d)[0xd403ed]
/usr/sbin/mysqld(_ZN13Trigger_chain16execute_triggersEP3THD+0x18)[0xd434a8]
/usr/sbin/mysqld(_ZN24Table_trigger_dispatcher16process_triggersEP3THD23enum_trigger_event_type29enum_trigger_action_time_typeb+0x52)[0xd3cee2]
/usr/sbin/mysqld(_Z12mysql_updateP3THDR4ListI4ItemES4_y15enum_duplicatesPyS6_+0x134d)[0xd1cead]
/usr/sbin/mysqld(_ZN14Sql_cmd_update23try_single_table_updateEP3THDPb+0x1b6)[0xd1f346]
/usr/sbin/mysqld(_ZN14Sql_cmd_update7executeEP3THD+0x27)[0xd1f667]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x5d0)[0xc971e0]
/usr/sbin/mysqld(_ZN13sp_instr_stmt9exec_coreEP3THDPj+0x50)[0xc20d00]
/usr/sbin/mysqld(_ZN12sp_lex_instr23reset_lex_and_exec_coreEP3THDPjb+0x384)[0xc228e4]
/usr/sbin/mysqld(_ZN12sp_lex_instr29validate_lex_and_execute_coreEP3THDPjb+0xab)[0xc2327b]
/usr/sbin/mysqld(_ZN13sp_instr_stmt7executeEP3THDPj+0x120)[0xc24410]
/usr/sbin/mysqld(_ZN7sp_head7executeEP3THDb+0x4f4)[0xc1c624]
/usr/sbin/mysqld(_ZN7sp_head17execute_procedureEP3THDP4ListI4ItemE+0x757)[0xc200b7]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x1c25)[0xc98835]
/usr/sbin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x385)[0xc9d3e5]
/usr/sbin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0x8d7)[0xc9dd27]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x177)[0xc9f667]
/usr/sbin/mysqld(handle_connection+0x278)[0xd5a138]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0xee4224]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fe6f98ec182]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fe6f8df947d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fe34dd976e0): is an invalid pointer
Connection ID (thread ID): 3055
Status: NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
2016-05-25T06:35:45.130129Z mysqld_safe Number of processes running now: 0
2016-05-25T06:35:45.131247Z mysqld_safe mysqld restarted
2016-05-25T06:35:45.136417Z 0 [Warning] Could not increase number of max_open_files to more than 1024 (request: 30000)
2016-05-25T06:35:45.136469Z 0 [Warning] Changed limits: max_connections: 214 (requested 700)
2016-05-25T06:35:45.136472Z 0 [Warning] Changed limits: table_open_cache: 400 (requested 4000)
2016-05-25T06:35:45.285890Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.12) starting as process 5208 ...
2016-05-25T06:35:45.291032Z 0 [Note] InnoDB: PUNCH HOLE support available
2016-05-25T06:35:45.291101Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2016-05-25T06:35:45.291118Z 0 [Note] InnoDB: Uses event mutexes
2016-05-25T06:35:45.291129Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2016-05-25T06:35:45.291140Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.8
2016-05-25T06:35:45.291152Z 0 [Note] InnoDB: Using Linux native AIO
2016-05-25T06:35:45.291417Z 0 [Note] InnoDB: Number of pools: 1
2016-05-25T06:35:45.291538Z 0 [Note] InnoDB: Using CPU crc32 instructions
2016-05-25T06:35:45.298282Z 0 [Note] InnoDB: Initializing buffer pool, total size = 10G, instances = 8, chunk size = 128M
2016-05-25T06:35:45.777778Z 0 [Note] InnoDB: Completed initialization of buffer pool
2016-05-25T06:35:45.835555Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2016-05-25T06:35:45.853945Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2016-05-25T06:35:45.893013Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 688601815952
2016-05-25T06:35:46.011211Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688607058432
2016-05-25T06:35:46.130727Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688612301312
2016-05-25T06:35:46.250410Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688617544192
2016-05-25T06:35:46.369991Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688622787072
2016-05-25T06:35:46.439395Z 0 [Note] InnoDB: Ignoring data file './dbowithrw/sureshprasad.ibd' with space ID 8686, since the redo log references ./dbowithrw/sureshprasad.ibd with space ID 8685.
2016-05-25T06:35:46.621020Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688628029952
2016-05-25T06:35:46.926647Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688633272832
2016-05-25T06:35:47.220662Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688638515712
2016-05-25T06:35:47.523115Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688643758592
2016-05-25T06:35:47.813625Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688649001472
2016-05-25T06:35:48.013840Z 0 [Note] InnoDB: Ignoring data file './dbowithrw/#sql-ib824219-1537447590.ibd' with space ID 8686. Another data file called ./dbowithrw/sureshprasad.ibd exists with the same space ID.
2016-05-25T06:35:48.013907Z 0 [Note] InnoDB: Ignoring data file './dbowithrw/#sql-ib824219-1537447590.ibd' with space ID 8686. Another data file called ./dbowithrw/sureshprasad.ibd exists with the same space ID.
2016-05-25T06:35:48.109329Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688603126272
2016-05-25T06:35:48.245377Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688608369152
2016-05-25T06:35:48.380703Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688613612032
2016-05-25T06:35:48.516794Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688618854912
2016-05-25T06:35:48.653082Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688624097792
2016-05-25T06:35:48.942457Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688629340672
2016-05-25T06:35:49.258740Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688634583552
2016-05-25T06:35:49.573738Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688639826432
2016-05-25T06:35:49.890260Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688645069312
2016-05-25T06:35:50.194665Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688650312192
2016-05-25T06:35:50.325541Z 0 [Note] InnoDB: Ignoring data file './dbowithrw/#sql-ib824219-1537447590.ibd' with space ID 8686. Another data file called ./dbowithrw/sureshprasad.ibd exists with the same space ID.
2016-05-25T06:35:50.325606Z 0 [Note] InnoDB: Ignoring data file './dbowithrw/#sql-ib824219-1537447590.ibd' with space ID 8686. Another data file called ./dbowithrw/sureshprasad.ibd exists with the same space ID.
2016-05-25T06:35:50.325619Z 0 [Note] InnoDB: Ignoring data file './dbowithrw/#sql-ib824219-1537447590.ibd' with space ID 8686. Another data file called ./dbowithrw/sureshprasad.ibd exists with the same space ID.
2016-05-25T06:35:50.325628Z 0 [Note] InnoDB: Ignoring data file './dbowithrw/#sql-ib824219-1537447590.ibd' with space ID 8686. Another data file called ./dbowithrw/sureshprasad.ibd exists with the same space ID.
2016-05-25T06:35:50.325840Z 0 [Note] InnoDB: Ignoring data file './dbowithrw/#sql-ib824219-1537447590.ibd' with space ID 8686. Another data file called ./dbowithrw/sureshprasad.ibd exists with the same space ID.
2016-05-25T06:35:50.325891Z 0 [Note] InnoDB: Ignoring data file './dbowithrw/#sql-ib824219-1537447590.ibd' with space ID 8686. Another data file called ./dbowithrw/sureshprasad.ibd exists with the same space ID.
2016-05-25T06:35:50.325924Z 0 [Note] InnoDB: Ignoring data file './dbowithrw/#sql-ib824219-1537447590.ibd' with space ID 8686. Another data file called ./dbowithrw/sureshprasad.ibd exists with the same space ID.
2016-05-25T06:35:50.343396Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 688652957751
2016-05-25T06:35:50.344666Z 0 [Note] InnoDB: Database was not shutdown normally!
2016-05-25T06:35:50.344689Z 0 [Note] InnoDB: Starting crash recovery.
2016-05-25T06:35:50.635987Z 0 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
2016-05-25T06:35:51.610469Z 0 [Note] InnoDB: Apply batch completed
2016-05-25T06:35:51.931405Z 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2016-05-25T06:35:51.931566Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2016-05-25T06:35:51.931607Z 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2016-05-25T06:35:51.980053Z 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2016-05-25T06:35:51.980939Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2016-05-25T06:35:51.980966Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2016-05-25T06:35:51.981311Z 0 [Note] InnoDB: Waiting for purge to start
2016-05-25T06:35:52.031470Z 0 [Note] InnoDB: 5.7.12 started; log sequence number 688652957751
2016-05-25T06:35:52.031491Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 6196ms. The settings might not be optimal. (flushed=0 and evicted=0, during the time.)
2016-05-25T06:35:52.031700Z 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2016-05-25T06:35:52.031853Z 0 [Note] Plugin 'FEDERATED' is disabled.
2016-05-25T06:35:52.036324Z 0 [Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
2016-05-25T06:35:52.036535Z 0 [Warning] CA certificate ca.pem is self signed.
2016-05-25T06:35:52.037547Z 0 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
2016-05-25T06:35:52.037582Z 0 [Note] - '0.0.0.0' resolves to '0.0.0.0';
2016-05-25T06:35:52.037634Z 0 [Note] Server socket created on IP: '0.0.0.0'.
2016-05-25T06:35:52.213332Z 0 [Note] Event Scheduler: Loaded 0 events
2016-05-25T06:35:52.213498Z 0 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.7.12' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server (GPL)
2016-05-25T06:35:52.244629Z 2 [Warning] IP address '10.0.0.100' could not be resolved: Name or service not known
2016-05-25T06:35:55.287910Z 7 [Warning] IP address '172.16.1.58' could not be resolved: Name or service not known
2016-05-25T06:36:02.660844Z 0 [Note] InnoDB: Buffer pool(s) load completed at 160525 12:06:02
2016-05-25T06:36:11.007067Z 14 [Warning] IP address '172.16.1.48' could not be resolved: Name or service not known
2016-05-25T06:36:20.359394Z 13 [Note] Aborted connection 13 to db: 'dbowithrw' user: 'cwuser' host: '172.16.1.58' (Got an error reading communication packets)
2016-05-25T06:38:22.358943Z 29 [Warning] IP address '172.16.3.72' could not be resolved: Name or service not known
2016-05-25T06:38:51.591178Z 26 [Note] Aborted connection 26 to db: 'dbowithrw' user: 'cwuser' host: '172.16.1.58' (Got an error reading communication packets)
2016-05-25T06:39:17.753741Z 34 [Warning] IP address '172.16.3.53' could not be resolved: Name or service not known
2016-05-25T06:41:43.343490Z 39 [Warning] IP address '172.16.2.172' could not be resolved: Name or service not known
2016-05-25T06:42:32.528345Z 43 [Warning] IP address '172.16.5.181' could not be resolved: Name or service not known
2016-05-25T06:43:55.808222Z 49 [Warning] IP address '172.16.1.127' could not be resolved: Name or service not known
2016-05-25T06:44:33.582066Z 51 [Warning] IP address '172.16.1.93' could not be resolved: Name or service not known
2016-05-25T06:46:10.717352Z 56 [Warning] IP address '10.0.0.254' could not be resolved: Name or service not known
2016-05-25T06:46:25.281681Z 57 [Warning] IP address '172.16.4.138' could not be resolved: Name or service not known
2016-05-25T06:49:08.356412Z 78 [Warning] IP address '172.16.1.124' could not be resolved: Name or service not known
2016-05-25T06:49:14.488339Z 79 [Warning] IP address '172.16.1.29' could not be resolved: Name or service not known
2016-05-25T06:52:22.121133Z 88 [Warning] IP address '172.16.1.87' could not be resolved: Name or service not known
2016-05-25T06:59:24.495151Z 81 [Note] Aborted connection 81 to db: 'dbowithrw' user: 'cwuser' host: '172.16.1.93' (Got an error reading communication packets)
2016-05-25T07:01:49.481888Z 71 [Note] Aborted connection 71 to db: 'dbowithrw' user: 'cwuser' host: '10.0.0.100' (Got an error reading communication packets)
2016-05-25T07:01:49.481936Z 70 [Note] Aborted connection 70 to db: 'dbowithrw' user: 'cwuser' host: '10.0.0.100' (Got an error reading communication packets)
2016-05-25T07:01:49.486902Z 93 [Note] Aborted connection 93 to db: 'dbowithrw' user: 'cwuser' host: '10.0.0.100' (Got an error reading communication packets)
2016-05-25T07:01:49.488708Z 62 [Note] Aborted connection 62 to db: 'dbowithrw' user: 'cwuser' host: '10.0.0.100' (Got an error reading communication packets)
2016-05-25T07:01:49.488746Z 60 [Note] Aborted connection 60 to db: 'dbowithrw' user: 'cwuser' host: '10.0.0.100' (Got an error reading communication packets)
2016-05-25T07:01:49.491444Z 53 [Note] Aborted connection 53 to db: 'dbowithrw' user: 'cwuser' host: '10.0.0.100' (Got an error reading communication packets)
2016-05-25T07:01:49.491476Z 54 [Note] Aborted connection 54 to db: 'dbowithrw' user: 'cwuser' host: '10.0.0.100' (Got an error reading communication packets)
2016-05-25T07:01:49.491550Z 72 [Note] Aborted connection 72 to db: 'dbowithrw' user: 'cwuser' host: '10.0.0.100' (Got an error reading communication packets)
2016-05-25T07:01:49.492401Z 61 [Note] Aborted connection 61 to db: 'dbowithrw' user: 'cwuser' host: '10.0.0.100' (Got an error reading communication packets)
2016-05-25T07:01:49.493221Z 63 [Note] Aborted connection 63 to db: 'dbowithrw' user: 'cwuser' host: '10.0.0.100' (Got an error reading communication packets)
2016-05-25T07:01:49.496796Z 65 [Note] Aborted connection 65 to db: 'dbowithrw' user: 'cwuser' host: '10.0.0.100' (Got an error reading communication packets)
2016-05-25T07:01:49.498087Z 69 [Note] Aborted connection 69 to db: 'dbowithrw' user: 'cwuser' host: '10.0.0.100' (Got an error reading communication packets)
2016-05-25T07:01:49.498770Z 2 [Note] Aborted connection 2 to db: 'dbowithrw' user: 'cwuser' host: '10.0.0.100' (Got an error reading communication packets)
2016-05-25T07:02:03.795662Z 122 [Warning] IP address '172.16.3.245' could not be resolved: Name or service not known
2016-05-25T07:03:19.123593Z 110 [Note] Aborted connection 110 to db: 'dbowithrw' user: 'cwuser' host: '172.16.1.93' (Got an error reading communication packets)
2016-05-25T07:04:45.370077Z 138 [Note] Aborted connection 138 to db: 'dbowithrw' user: 'cwuser' host: '172.16.1.93' (Got an error reading communication packets)
2016-05-25T07:06:32.471694Z 145 [Warning] IP address '172.16.1.72' could not be resolved: Name or service not known
2016-05-25T07:06:35.076698Z 103 [Note] Aborted connection 103 to db: 'dbowithrw' user: 'cwuser' host: '172.16.2.172' (Got an error reading communication packets)
2016-05-25T07:06:35.076745Z 100 [Note] Aborted connection 100 to db: 'dbowithrw' user: 'cwuser' host: '172.16.2.172' (Got an error reading communication packets)
2016-05-25T07:06:35.076849Z 104 [Note] Aborted connection 104 to db: 'dbowithrw' user: 'cwuser' host: '172.16.2.172' (Got an error reading communication packets)
2016-05-25T07:06:35.076859Z 135 [Note] Aborted connection 135 to db: 'dbowithrw' user: 'cwuser' host: '172.16.2.172' (Got an error reading communication packets)
2016-05-25T07:06:35.076919Z 105 [Note] Aborted connection 105 to db: 'dbowithrw' user: 'cwuser' host: '172.16.2.172' (Got an error reading communication packets)
2016-05-25T07:06:35.077813Z 134 [Note] Aborted connection 134 to db: 'dbowithrw' user: 'cwuser' host: '172.16.2.172' (Got an error reading communication packets)
2016-05-25T07:06:35.078668Z 111 [Note] Aborted connection 111 to db: 'dbowithrw' user: 'cwuser' host: '172.16.2.172' (Got an error reading communication packets)
2016-05-25T07:06:35.079450Z 99 [Note] Aborted connection 99 to db: 'dbowithrw' user: 'cwuser' host: '172.16.2.172' (Got an error reading communication packets)
07:08:48 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
key_buffer_size=52428800
read_buffer_size=20971520
max_used_connections=72
max_threads=214
thread_count=58
connection_count=58
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 8819493 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7fe5f4402670
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fe6b1d60e40 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0xebef8c]
/usr/sbin/mysqld(handle_fatal_signal+0x451)[0x7acb61]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7fe99f0de340]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fe5f590f5c0): is an invalid pointer
Connection ID (thread ID): 102
Status: NOT_KILLED
Any pointers or help as to where we are going wrong would be greatly appreciated. Also, a small update: we had upgraded MySQL from the 5.7.8 RC version. Could the upgrade also be a cause of this type of error?
Your read_buffer_size of 20971520, multiplied by max_threads, means this one memory requirement alone is about 4 GB of the 8,819,493 KB required. Tell us how much memory your machine has, please.
Since read_buffer_size serves sequential table scans, a reasonable value would be read_buffer_size=1048576; a scan of 10 MB of data would then require only 10 reads.
If you could post SHOW GLOBAL VARIABLES and SHOW GLOBAL STATUS, we could better assist.
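For reference, a sketch of how that suggestion might look in my.cnf (the section and file location depend on your installation, and mysqld must be restarted for it to take effect):
[mysqld]
# per-thread buffer for sequential table scans; 1 MB instead of the current 20 MB
read_buffer_size = 1M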
The problem was resolved by setting the variables in the my.cnf file. The variables were calculated with the help of a MySQL memory calculator, which let me work out the optimal amount of memory to allocate for the various purposes. One of the mistakes was that sort_buffer_size had been set to 40 MB, which left very few connections available (< 270) while the DB system required about 1200 connections in total. Fixing that was a major relief. We also gave 15 GB of memory to innodb_buffer_pool_size.
There was another issue with slow-running queries. In MS SQL, most of our SELECT queries were run WITH (NOLOCK) on tables. We made an equivalent global change through the config by setting the transaction isolation level to READ UNCOMMITTED (the default is REPEATABLE READ), which boosted the performance of the queries significantly. (This may not be desirable for many others, but for us it worked well.)
Also, in the stored procedures, explicitly dropping the temporary tables helped considerably. Earlier, we were under the wrong impression that as soon as the connection from the application was closed, the temporary table would be dropped, which turned out to be false. So we manually dropped the temporary tables we created. (We were using temporary tables to emulate the RANK and DENSE_RANK functionality of SQL Server.) With this, we were able to complete the migration reasonably well. A sketch of the resulting my.cnf changes is below.
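A sketch of what those my.cnf changes might look like, assuming the figures mentioned above (the smaller per-thread buffers shown here are illustrative defaults, not values from the original post; size them with a calculator for your own workload):
[mysqld]
# large buffer pool, as mentioned above
innodb_buffer_pool_size = 15G
# keep per-thread buffers small so many connections fit in memory
sort_buffer_size = 2M
read_buffer_size = 1M
max_connections = 1200
# emulate the old WITH (NOLOCK) behaviour globally -- use with care
transaction-isolation = READ-UNCOMMITTED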