Recurring "Can't connect to MySQL server .. Temporary failure in name resolution" in GKE cluster - mysql

I deployed a MySQL server using the mysql:5.7 image on my GKE cluster. It's deployed with one replica and exposed with a ClusterIP service named "mysql-server".
For the last few hours I've been experiencing recurring, flaky errors from other pods that run Python servers:
sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'mysql-server' ([Errno -3] Temporary failure in name resolution)")
I've gone through the Kubernetes DNS debugging guide and found no errors or other issues, except that CoreDNS isn't running at all in any of my clusters.
When executing nslookup mysql-server on another pod, I get a healthy output:
Server: 10.39.240.10
Address: 10.39.240.10#53
Name: mysql-server.default.svc.cluster.local
Address: 10.39.245.88
However, ping mysql-server never returns; I don't know if that's relevant.
PING mysql-server.default.svc.cluster.local (10.39.245.88) 56(84) bytes of data.
^C
--- mysql-server.default.svc.cluster.local ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2039ms
Would this be an issue with MySQL or with GKE? How can I debug it further?
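A couple of checks that can narrow this down (a sketch; busybox:1.28 is suggested only because its nslookup behaves reliably, and GKE's cluster DNS runs as kube-dns in kube-system). Note that ClusterIPs are virtual and normally don't answer ICMP, so the failed ping by itself isn't conclusive.
# resolve the service name in a loop from a throwaway pod to catch the intermittent failures
kubectl run -it --rm --image=busybox:1.28 --restart=Never dns-test -- \
  sh -c 'while true; do nslookup mysql-server.default.svc.cluster.local || echo "FAILED $(date)"; sleep 1; done'
# check the health and restart count of the cluster DNS pods
kubectl -n kube-system get pods -l k8s-app=kube-dns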

Related

Pods stuck in ContainerCreating

Previously my MySQL pod was stuck in Terminating status, so I tried to force delete it with a command like this:
kubectl delete pods <pod> --grace-period=0 --force
Later I tried to helm upgrade again; now my pod is stuck in ContainerCreating status, with this event from the pod:
17s Warning FailedMount pod/db-mysql-primary-0 MountVolume.SetUp failed for volume "pvc-f32a6f84-d897-4e35-9595-680302771c54" : kubernetes.io/csi: mounter.SetUpAt failed to check for STAGE_UNSTAGE_VOLUME capability: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins/dobs.csi.digitalocean.com/csi.sock: connect: no such file or directory"
17s Warning FailedMount pod/db-mysql-secondary-0 MountVolume.SetUp failed for volume "pvc-61fc6eda-97fa-455f-ac2c-df8ebcb90f1c" : kubernetes.io/csi: mounter.SetUpAt failed to check for STAGE_UNSTAGE_VOLUME capability: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins/dobs.csi.digitalocean.com/csi.sock: connect: no such file or directory"
Can anyone please help me resolve this issue? Thanks a lot.
When you run the command
kubectl delete pods <pod> --grace-period=0 --force
you ask Kubernetes to forget the Pod, not to delete it. You have to be careful when using this command. You have to make sure the Pod's containers are not still running on the host, especially when they are mounted to a PVC. Probably the containers are still running and attached to the PVC.
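One way to verify that from the cluster side (a sketch, assuming kubectl access; the PVC name is taken from the events above, and the last grep is a loose match because the exact CSI pod labels depend on the install):
# is the volume still attached to some node?
kubectl get volumeattachments | grep pvc-f32a6f84
# is the CSI driver registered on every node? (drivers: null means it is not)
kubectl get csinodes -o yaml | grep -B2 -A3 drivers
# are the CSI plugin pods themselves healthy?
kubectl -n kube-system get pods | grep csi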
pool-product-8jd40 0
spec:
  drivers: null
On some of my pools the CSI driver is not ready (null); it's supposed to equal 1 (ready).
*Sorry, I can't attach the image yet.

Can't expose mysql tcp service running inside kubernetes cluster publicly using nginx-ingress

I ran into a problem exposing a MySQL database running inside a Kubernetes cluster publicly. The cluster runs with kops on AWS. I'm using the stable/nginx-ingress Helm chart (https://github.com/helm/charts/tree/master/stable/nginx-ingress) with these values:
controller:
  config:
    use-proxy-protocol: "true"
  metrics:
    enabled: true
  replicaCount: 2
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
  stats:
    enabled: true
rbac:
  create: true
tcp:
  5000: default/cbioportal-prod-db-mysql:3306
From within the cluster I can telnet to the db through nginx over port 5000:
# telnet eating-dingo-nginx-ingress-controller 5000
J
5.7.14
ke_|c&tc"ui%]}mysql_native_passwordConnection closed by foreign host
But I can't seem to connect from outside using the hostname of the AWS load balancer.
telnet xxx.us-east-1.elb.amazonaws.com 5000
Trying x.x.x.x...
When I look in the AWS EC2 dashboard, I see the load balancer's security group allows connections from everywhere on port 5000.
UPDATE
I can connect when I use port 3306 instead of 5000:
tcp:
  3306: default/cbioportal-prod-db-mysql:3306
However, now that the port is open:
$ nmap --verbose -Pn x.x.x.x
PORT STATE SERVICE
21/tcp open ftp
80/tcp open http
443/tcp open https
3306/tcp open mysql
I am getting an authorization issue:
$ mysql -h x.x.x.x -uroot -pabcdef
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading authorization packet', system error: 2
I can connect directly to the nginx controller without issues from within the cluster:
kubectl run -it --rm --image=mysql:5.7 --restart=Never mysql-client -- mysql -h eating-dingo-nginx-ingress-controller -uroot -pabcdef
I'm using this mysql helm chart:
https://github.com/helm/charts/tree/master/stable/mysql
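In case it helps to compare what the client actually receives inside vs. outside the cluster, a rough check (nc and xxd are assumed to be available locally; the ELB hostname is the placeholder from above, and the in-cluster service name is the one from the telnet test):
# inside the cluster: dump the start of the MySQL handshake as seen through nginx
kubectl run -it --rm --image=busybox --restart=Never tcp-test -- nc -w 5 eating-dingo-nginx-ingress-controller 5000
# outside the cluster: the same check against the ELB and the exposed port
nc -w 5 xxx.us-east-1.elb.amazonaws.com 3306 | head -c 64 | xxd
If the two handshakes differ (for example the outside one is empty), the use-proxy-protocol setting and the ELB proxy-protocol annotation for the new listener are worth a second look, since "Lost connection ... at 'reading authorization packet'" usually means the server or a proxy dropped the connection mid-handshake.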

CircleCI: MySQL starts on its own even after stopping the process

I am having some trouble with the default MySQL installation on CircleCI. In the 'post' section of 'machine', I stop MySQL using "- sudo service mysql stop". The reason behind doing so is that I want to use a Docker MySQL container on port 3306. My "docker-compose up" takes some time to finish, and sometimes the host mysql process starts again before the Docker MySQL container starts, for no reason obvious to me. I have been tracking this issue using the following command:
while true; do sudo netstat -nlp | grep :3306; sleep 2; done
I have a build that ran fine, with Docker able to bind port 3306, and also a build in which mysqld started again even after being stopped, giving me the following error on docker-compose up:
ERROR: for dbm01 Cannot start service dbm01: failed to create endpoint minimum_dbm01_1 on network minimum_default: Error starting userland proxy: listen tcp 0.0.0.0:3306: bind: address already in use
ERROR: Encountered errors while bringing up the project.
Both builds are from the same commit, so there is no difference in code. What might be the issue?
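One thing worth trying (a sketch, assuming the CircleCI Ubuntu 14.04 machine manages MySQL through Upstart): mark the job as manual so Upstart events can't start it again, and wait for port 3306 to be free before bringing the containers up.
# tell Upstart not to start mysql automatically, then stop it
echo manual | sudo tee /etc/init/mysql.override
sudo service mysql stop
# block until nothing is listening on 3306, then start the containers
while sudo netstat -nlp | grep -q :3306; do sleep 2; done
docker-compose up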

Intermittent MySQL connection on Vagrant VirtualBox when Jenkins runs PHPUnit

We have a Jenkins CI server that runs our suite of tests on every commit, triggered by a GitHub hook.
We recently moved the suite of tests from running locally on the Jenkins server to running inside a VirtualBox/Vagrant VM. This is to ensure that the test configuration matches the dev environment. This is an Ubuntu 14.04 guest running on an Ubuntu 14.04 host.
After moving to the VM model, PHPUnit occasionally fails with no connection to MySQL. The error is Can't connect to MySQL server on '127.0.0.1'.
This error is intermittent, not easily reproducible. That is, if I trigger a new build on Jenkins, it usually succeeds. However, when the new build is triggered by the GitHub hook, it fails more often than manually triggered builds, and sometimes succeeds.
Here's what I tried:
- sudo service mysql restart before running phpunit
- sleep 5 between the mysql restart and phpunit (a readiness-loop alternative is sketched after this list)
- Connecting to both localhost and 127.0.0.1 -- when connecting to localhost, I received intermittent errors Can't connect to MySQL server on '/var/run/mysqld/mysqld.sock'.
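For the restart step, polling until the server actually answers is usually more reliable than a fixed sleep (a sketch; -uroot is a placeholder, use the test suite's credentials):
sudo service mysql restart
# wait until the server accepts connections instead of sleeping a fixed 5 seconds
until mysqladmin --silent -h 127.0.0.1 -uroot ping; do sleep 1; done
phpunit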
Here's the full output of the failed build:
sudo service mysql restart
* Stopping MySQL (Percona Server) mysqld
...done.
* Starting MySQL (Percona Server) database server mysqld
...done.
* Checking for corrupt, not cleanly closed and upgrade needing tables.
sleep 5
sudo service mysql status
* /usr/bin/mysqladmin Ver 8.42 Distrib 5.6.23-72.1, for debian-linux-gnu on x86_64
Server version 5.6.23-72.1-log
Protocol version 10
Connection Localhost via UNIX socket
UNIX socket /var/run/mysqld/mysqld.sock
Uptime: 6 sec
Threads: 1 Questions: 111 Slow queries: 0 Opens: 761 Flush tables: 1 Open tables: 754 Queries per second avg: 18.500
phpunit
PHPUnit 4.6.2 by Sebastian Bergmann and contributors.
Configuration read from /vagrant/phpunit.xml
...........EEE.E.............E............................EEEEE.
Time: 8.51 seconds, Memory: 135.25Mb
1) ProcessDatasetsTest::test_process_on_census_fraction
PDOException: SQLSTATE[HY000] [2003] Can't connect to MySQL server on '127.0.0.1' (111)
I've had intermittent connectivity issues with MySQL on Vagrant, though not precisely related to PHPUnit. Connections were dropping out of the blue, until I found out there were many boxes for the same app running at the same time in VirtualBox. I killed them all, then ran vagrant global-status --prune, and I had perfect connections again.
We saw a similar issue on a different Vagrant VM -- Can't connect to MySQL server -- and it turned out to be a memory issue: the VM was out of RAM. This was fixed by adding (or increasing) a swapfile on the VM:
sudo fallocate -l 1G /swapfile.img
sudo chmod 0600 /swapfile.img
sudo mkswap /swapfile.img
sudo swapon /swapfile.img
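If the VM reboots, a swapfile enabled this way is not re-activated on its own; a standard /etc/fstab entry keeps it active:
# keep the swapfile enabled across reboots
echo '/swapfile.img none swap sw 0 0' | sudo tee -a /etc/fstab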

Zabbix JMX Tomcat monitoring

I have been trying to set up Zabbix to monitor my 2 Tomcat servers on 2 different Amazon EC2 machines, but in vain.
The Z on the host is green, however the JMX is red with these errors:
- ZBX_TCP_READ() failed: [4] Interrupted system call
- Some other error: [111] connection refused
and many such errors, one after another; each time I resolve one, a new error pops up.
Here are the details:
All the machines run Ubuntu 12.10 or later.
Server's IP address: 66.55.12.120 (runs Zabbix server v2.2.4 (revision 46772) (23 June 2014))
Agent's IP address: 87.52.45.198 (runs Zabbix agent v2.2.2 (revision 42525) (12 February 2014))
My local machine's IP address: 76.89.54.111
Here is what I've done so far.
On Server Side:
1) Installed the Zabbix server using sudo apt-get install zabbix-server-mysql.
2) The GUI and the MySQL database have all been installed and configured.
3) The following are the only 3 changes that I've made in the file /etc/zabbix/zabbix_server.conf
...
JavaGateway=localhost
JavaGatewayPort=10052
StartJavaPollers=5
...
4) The Zabbix Java gateway was installed using sudo apt-get install zabbix-java-gateway.
5) The following are the only 3 changes that I've made in the file /etc/zabbix/zabbix_java_gateway.conf (restart commands for both daemons follow this list)
...
LISTEN_IP="127.0.0.1"
LISTEN_PORT=10052
START_POLLERS=5
...
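After editing these files, both daemons need to be restarted to pick up the changes (assuming the service names installed by the Ubuntu packages):
sudo service zabbix-server restart
sudo service zabbix-java-gateway restart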
On Client Side:
1) Installed Zabbix Client using
sudo apt-get install zabbix-agent
2) The following are the only changes that I've made in the file /etc/zabbix/zabbix_agentd.conf
...
Server=66.55.12.120
StartAgents=5
ServerActive=66.55.12.120:10051
Hostname=Security-test-JMX-EC2
...
3) The Hostname is the same as the one that is mentioned while creating the Host on the GUI.
I believe there are some issues with the IPs and ports, so here are the outbound rules for both machines, as obtained from the Amazon EC2 Security Groups:
OUTBOUND RULES for SERVER SECURITY GROUP:
Type             Protocol  Port   Source             Reasoning
Custom TCP Rule  TCP       8080   0.0.0.0/0
All ICMP         All       N/A    0.0.0.0/0
Custom TCP Rule  TCP       10052  27.52.52.128/32    For access from the agent
Custom TCP Rule  TCP       8081   76.84.120.130/32   To access the Zabbix GUI from my local machine's web browser
Custom TCP Rule  TCP       10051  27.52.52.128/32    The agent responds to the server on port 10051; must allow inbound communications from the agent
Custom TCP Rule  TCP       11000  27.52.52.128/32    The agent's JMX reporting happens on port 11000 (not on 12345)
OUTBOUND RULES for CLIENT SECURITY GROUP:
Type             Protocol  Port   Source
HTTPS            TCP       443    0.0.0.0/0
Custom TCP Rule  TCP       10050  66.55.12.120/32
Custom TCP Rule  TCP       10052  66.55.12.120/32
Custom TCP Rule  TCP       11000  66.55.12.120/32
HTTP             TCP       80     76.89.54.111/32
Custom TCP Rule  TCP       8080   76.89.54.111/32
Custom TCP Rule  TCP       8443   76.89.54.111/32
What am I missing? Please guide me.
Any help is appreciated.
Thanks
Goutham
If you can, run VisualVM (probably over a tunneled X session) on the Zabbix host and see whether you can connect to the target JVM with that. If you can't connect from VisualVM, you won't be able to connect from Zabbix.
Try with the following CATALINA_OPTS, replacing <LOCAL_IP> with the IP on the target that you want JMX to listen on:
export CATALINA_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=<LOCAL_IP>"
This will disable all JMX security so be aware!
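Once Tomcat has been restarted with those options, a quick reachability check separates JMX configuration problems from plain network problems (9010 matches the port above; <TOMCAT_HOST> is a placeholder):
# on the Tomcat machine: is anything listening on the JMX port?
sudo netstat -nlp | grep 9010
# from the Zabbix server (security groups permitting): is the port reachable?
nc -vz <TOMCAT_HOST> 9010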
Once you hopefully get it to connect, the "Tomcat JMX" items in Zabbix are also all incorrect! e.g.
Incorrect Zabbix default:
jmx["Catalina:type=GlobalRequestProcessor,name=http-8080",bytesReceived]
Correct entry:
jmx["Catalina:type=ThreadPool,name=\"http-bio-8080\"", bytesReceived]
Note the escaped quotes and the different thread pool name. Add the MBeans plugin to VisualVM, use it to browse the MBeans on the target JVM, and check the Zabbix item names against them.
It does work eventually, but it is a real pain to set up. Zabbix is, however, one of the few open source monitoring tools that supports JMX at all!
By default, JMX does not work very well with firewalls. You might find related bug reports on the Zabbix tracker useful: ZBX-5326 and ZBX-6815. The first one contains a workaround for Tomcat which might work for you.
@gvatreya wrote:
Server: (Runs Zabbix server)
Agent: (Runs Zabbix agent)
It looks like you have to start the Zabbix Java gateway as well on the host where it is installed (it is a daemon/service).
I configured it as follows:
Server: (Runs Zabbix server, Zabbix Java gateway)
Agent: (Runs Zabbix agent)
I think it is also possible to install it on a dedicated host.
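A quick way to confirm the gateway is actually up and reachable (assuming the zabbix-java-gateway service name from the Ubuntu package, and the LISTEN_PORT from the question):
sudo service zabbix-java-gateway status
# the gateway should be listening on 10052 on the IP set in zabbix_java_gateway.conf
sudo netstat -nlp | grep 10052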
Have you tried adding -Djava.net.preferIPv4Stack=true to the VM options?
To make it work, add the following JAVA_OPTS to your Tomcat startup script:
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=2345
-Dcom.sun.management.jmxremote.rmi.port=12345
-Djava.rmi.server.hostname=<tomcat_hostname>
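With both the JMX port and the RMI port pinned like this, the security groups only need to allow 2345 and 12345. A quick manual test from another machine (assuming a JDK with jconsole installed):
jconsole <tomcat_hostname>:2345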