Docker container running supervisor shuts down too early - mysql

I am running a Docker container with multiple processes (to achieve a LAMP environment) by using supervisor as described here. Everything works fine, but when I stop the container the MySQL process doesn't get terminated properly, even though I've set up supervisor's pidproxy in the supervisor config:
[program:mysql]
command=/usr/bin/pidproxy /run/mysqld/mysqld.pid /bin/sh -c "exec /usr/bin/mysqld_safe"
When I access the container through nsenter and restart the MySQL daemon with
supervisorctl restart mysql
the shutdown completes cleanly and no error is reported the next time it comes up. So I think supervisor is configured correctly. To me it seems Docker (I am running 1.2.0) terminates the container just a bit too early, while mysqld is still shutting down.
Edit
I was able to debug some more details by running supervisord with the -e debug switch.
Shutdown with supervisorctl restart mysql via nsenter:
DEBG fd 17 closed, stopped monitoring <POutputDispatcher at 39322256 for <Subprocess at 38373280 with name mysql in state RUNNING> (stderr)>
DEBG fd 14 closed, stopped monitoring <POutputDispatcher at 39324128 for <Subprocess at 38373280 with name mysql in state RUNNING> (stdout)>
DEBG killing mysql (pid 1128) with signal SIGTERM
INFO stopped: mysql (exit status 0)
DEBG received SIGCLD indicating a child quit
CRIT reaped unknown pid 1129)
DEBG received SIGCLD indicating a child quit
And externally via docker restart container_name:
DEBG fd 17 closed, stopped monitoring <POutputDispatcher at 39290136 for <Subprocess at 38373280 with name mysql in state RUNNING> (stderr)>
DEBG fd 14 closed, stopped monitoring <POutputDispatcher at 39290424 for <Subprocess at 38373280 with name mysql in state RUNNING> (stdout)>
DEBG killing mysql (pid 7871) with signal SIGTERM
INFO stopped: mysql (exit status 0)
DEBG received SIGCLD indicating a child quit
This is the process structure before the first attempt:
1128 S 0:00 /usr/bin/python /usr/bin/pidproxy /run/mysqld/mysqld.pid /usr/bin/mysqld_safe
1129 S 0:00 \_ /bin/sh /usr/bin/mysqld_safe
1463 Sl 0:00 \_ /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-erro
So while pid 1463 gets properly reaped as it's in /run/mysqld/mysqld.pid, pid 1129 causes the trouble, as Docker shuts down the container before it's gone. Is this a bug in supervisor's pidproxy, or can it be fixed by a different configuration?

I had a similar problem and fixed it by not using mysqld_safe, but starting mysqld directly instead. I found the right command by looking at what mysqld_safe had spawned in the ps aux output.
Maybe it fails because mysqld_safe doesn't create a proper parent -> child process relationship?
Anyway, my config would look something like this, using your mysql process as a template:
[program:mysql]
command=/usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-erro

I believe that Docker will only wait ten seconds before killing a container it's trying to stop (or restart). This is so that the requested action is ultimately carried out and doesn't just hang. You can configure the timeout using the --time parameter of docker stop or docker restart.
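For example, to give MySQL more time to shut down cleanly (the 30-second value and the container name are just placeholders):
docker stop --time=30 lamp_container
docker restart --time=30 lamp_container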

Related

systemd podman This usually indicates unclean termination of a previous run, or service implementation deficiencies

I am running a container with systemd/podman. When I want to deploy a new image tag, I stop the service, update the service file and start it again, but the container fails to start.
Here is the systemd unit file:
[Unit]
Description=hello_api Podman Container
After=network.target
[Service]
Restart=on-failure
RestartSec=3
ExecStartPre=/usr/bin/rm -f /%t/%n-pid /%t/%n-cid
ExecStartPre=-/usr/bin/podman rm hello_api
ExecStart=/usr/bin/podman run --conmon-pidfile /%t/%n-pid --cidfile /%t/%n-cid -d -h modelenv \
--name hello_api --rm --ulimit=host -p "8001:8001" -p "8443:8443" 7963-hello_api:7.8
ExecStop=/usr/bin/sh -c "/usr/bin/podman rm -f `cat /%t/%n-cid`"
KillMode=none
Type=forking
PIDFile=/%t/%n-pid
[Install]
WantedBy=default.target
Here is the error message:
May 21 10:41:43 webserver systemd[1471]: hello_api.service: Found left-over process 22912 (conmon) in control group while starting unit. Ignoring.
May 21 10:41:43 webserver systemd[1471]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 21 10:41:43 webserver systemd[1471]: hello_api.service: Found left-over process 22922 (node) in control group while starting unit. Ignoring.
May 21 10:41:43 webserver systemd[1471]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 21 10:41:43 webserver systemd[1471]: hello_api.service: Found left-over process 22960 (node) in control group while starting unit. Ignoring.
May 21 10:41:43 webserver systemd[1471]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 21 10:41:44 webserver podman[24565]: 2020-05-21 10:41:44.586396547 -0400 EDT m=+1.090025069 container create 28eaf881f532339766cc96ec27a69d8ad588e07d4bfc70e65e7c54e8a5082933 (image=7963-hello_api:7.8, name=hello_api)
May 21 10:41:45 webserver podman[24565]: Error: error from slirp4netns while setting up port redirection: map[desc:bad request: add_hostfwd: slirp_add_hostfwd failed]
May 21 10:41:45 webserver systemd[1471]: hello_api.service: Control process exited, code=exited status=126
May 21 10:41:45 webserver systemd[1471]: hello_api.service: Failed with result 'exit-code'.
May 21 10:41:45 webserver systemd[1471]: Failed to start call_center_hello_api Podman Container.
Why is it giving this error? Is there an option to cleanly exit the old container?
I think we followed the same tutorial here: https://www.redhat.com/sysadmin/podman-shareable-systemd-services
"It’s important to set the kill mode to none. Otherwise, systemd will start competing with Podman to stop and kill the container processes. which can lead to various undesired side effects and invalid states"
I'm not sure if the behavior changed, but I removed KillMode=none so that the unit uses the default KillMode=control-group, and I have not had any problems managing the service since. I also removed the leading / from some of the commands, because %t already expands to an absolute path and the slash was being duplicated:
ExecStartPre=/usr/bin/rm -f //run/user/1000/registry.service-pid //run/user/1000/registry.service-cid
It's now:
ExecStartPre=/usr/bin/rm -f /run/user/1000/registry.service-pid /run/user/1000/registry.service-cid
The full service file I use for running a docker registry:
[Unit]
Description=Image Registry
[Service]
Restart=on-failure
ExecStartPre=-/usr/bin/podman volume create registry
ExecStartPre=/usr/bin/rm -f %t/%n-pid %t/%n-cid
ExecStart=/usr/bin/podman run --conmon-pidfile %t/%n-pid --cidfile %t/%n-cid -d -p 5000:5000 -v registry:/var/lib/registry --name registry docker.io/library/registry
ExecStop=/usr/bin/sh -c "/usr/bin/podman rm -f `cat %t/%n-cid`"
Type=forking
PIDFile=%t/%n-pid
[Install]
WantedBy=multi-user.target
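For reference, after creating or editing the unit file, systemd needs to re-read it and the service has to be enabled. A minimal sequence, assuming the unit is installed as a system service named registry.service (as WantedBy=multi-user.target suggests):
sudo systemctl daemon-reload
sudo systemctl enable --now registry.service
sudo systemctl status registry.service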

Docker seems to be killed before my shutdown-script is executed when my VM is preempted on GCE

I want to implement a shutdown-script that would be called when my VM is going to be preempted on Google Compute Engine. That VM is used to run Docker containers that execute long-running batches, so I send them a signal to make them exit gracefully.
That shutdown script works well when I execute it manually, yet it breaks on a real preemption use-case, or when I kill the VM myself.
I got this error:
... logs from my containers ...
A 2019-08-13T16:54:07.943153098Z time="2019-08-13T16:54:07Z" level=error msg="error waiting for container: unexpected EOF"
(just after this error, I can see what I put in the 1st line of my shutting-down script, see code below)
A 2019-08-13T16:54:08.093815210Z 2019-08-13 16:54:08: Shutting down! TEST SIGTERM SHUTTING DOWN (this is the 1st line of my shutdown script)
A 2019-08-13T16:54:08.093845375Z docker ps -a
(no result)
A 2019-08-13T16:54:08.155512145Z ps -ef
... a lot of things, but nothing related to docker ...
2019-08-13 16:54:08: task_subscriber not running, shutting down immediately.
I use a preemptible VM on GCE, with the image Container-Optimized OS 73-11647.267.0 stable. I run my containers as services with systemctl, yet I don't think this is related - [edit] Actually I could solve my issue thanks to this.
Right now, I am pretty sure that a lot of things happen when Google sends the ACPI signal to my VM, even before the shutdown-script is fetched from the VM metadata and called.
My guess is that all the services are being stopped at the same time, including eventually docker.service itself.
When my container is running, I can get the same level=error msg="error waiting for container: unexpected EOF" with a simple sudo systemctl stop docker.service
Here is a part of my shutdown script:
#!/bin/bash
# This script must be added in the VM metadata as "shutdown-script" so that
# it is executed when the instance is being preempted.
CONTAINER_NAME="task_subscriber" # For example, "task_subscriber"
logTime() {
    local datetime="$(date +"%Y-%m-%d %T")"
    echo -e "$datetime: $1" # Console
    echo -e "$datetime: $1" >>/var/log/containers/$CONTAINER_NAME.log
}
logTime "Shutting down! TEST SIGTERM SHUTTING DOWN"
echo "docker ps -a" >>/var/log/containers/$CONTAINER_NAME.log
docker ps -a >>/var/log/containers/$CONTAINER_NAME.log
echo "ps -ef" >>/var/log/containers/$CONTAINER_NAME.log
ps -ef >>/var/log/containers/$CONTAINER_NAME.log
if [[ ! "$(docker ps -q -f name=${CONTAINER_NAME})" ]]; then
logTime "${CONTAINER_NAME} not running, shutting down immediately."
sleep 10 # Give time to send logs
exit 0
fi
logTime "Sending SIGTERM to ${CONTAINER_NAME}"
#docker kill --signal=SIGTERM ${CONTAINER_NAME}
systemctl stop taskexecutor.service
# Portable waitpid equivalent
while [[ "$(docker ps -q -f name=${CONTAINER_NAME})" ]]; do
sleep 1
logTime "Waiting for ${CONTAINER_NAME} termination"
done
logTime "${CONTAINER_NAME} is done, shutting down."
logTime "TEST SIGTERM SHUTTING DOWN BYE BYE"
sleep 10 # Give time to send logs
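As the header comment says, the script has to be attached to the instance as the shutdown-script metadata entry; a typical way to do that (instance name and file path are placeholders) is:
gcloud compute instances add-metadata my-vm --metadata-from-file shutdown-script=shutdown.sh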
If I simply call systemctl stop taskexecutor.service manually (not by really shutting down the server), the SIGTERM signal is sent to my container and my app handles it properly and exits.
Any idea?
-- How I solved my issue --
I could solve it by adding this dependency on docker in my service config:
[Unit]
Wants=gcr-online.target docker.service
After=gcr-online.target docker.service
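After adding those lines, systemd has to re-read the unit before the new ordering takes effect; that would be something like:
sudo systemctl daemon-reload
sudo systemctl restart taskexecutor.service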
I don't know how the magic behind the execution of the shutdown-script stored in the metadata works. But I think Google should fix something in their Container-Optimized OS so that this magic happens before Docker is stopped. Otherwise we cannot rely on it to gracefully shut down even a basic script (fortunately I was using systemd here)...
From the documentation [1], using shutdown scripts on preemptible VM instances is feasible. However, there are some limitations: Compute Engine executes shutdown scripts only on a best-effort basis, and in rare cases cannot guarantee that the shutdown script will complete. I would also like to mention that preemptible instances have only 30 seconds after preemption begins [2], which might be killing Docker before the shutdown script has finished.
From the error message provided in your use case, it seems to be expected behaviour when Docker has been running continuously for a long time [3].
[1] https://cloud.google.com/compute/docs/instances/create-start-preemptible-instance#handle_preemption
[2] https://cloud.google.com/compute/docs/shutdownscript#limitations
[3] https://github.com/docker/for-mac/issues/1941

Not to Start MySQL Automatically at CentOS Startup

i have "CentOS 6" VPS and i wanted to start mysql service automatically at Startup of Server when it is restarted. so i used this command in putty
chkconfig --level 345 mysqld on
This command works and MySQL starts automatically on every startup.
BUT how can I now stop this? What if I want to start MySQL manually after a startup, what command should I use then?
Also, what is the file where I can see the list of programs that run automatically on every startup?
Thanks
You can turn off auto-start with this command:
chkconfig --level 345 mysqld off
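If you then want to start MySQL by hand after a reboot, the usual SysV init invocation on CentOS 6 is:
service mysqld start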
To see what is configured for auto-start, you can run:
chkconfig --list
See more info on chkconfig here:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/s2-services-chkconfig.html

Can't connect to '/var/run/mysqld/mysqld.sock'

This is my first post here, so I hope I do everything right and don't forget any important info. I'm glad for any hints, because I'm running out of ideas (if I ever had any ;)).
I am (or was) running owncloud on Raspbian Jessie (so I guess basically Debian). Suddenly owncloud stopped working. The nginx error points towards php5-fpm; further searching gave this error:
exception 'Doctrine\DBAL\DBALException' with message 'Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)' in /var/www/owncloud/lib/private/db/connection.php:54
So it looks like a mysql error, and /var/run/mysqld/ is actually empty.
Following these posts 1 and 2, I tried
sudo find / -type s
resulting in this output:
/run/php5-fpm.sock
/run/fail2ban/fail2ban.sock
/run/thd.socket
/run/dhcpcd.unpriv.sock
/run/dhcpcd.sock
/run/dbus/system_bus_socket
/run/avahi-daemon/socket
/run/udev/control
/run/systemd/journal/syslog
/run/systemd/journal/socket
/run/systemd/journal/stdout
/run/systemd/journal/dev-log
/run/systemd/shutdownd
/run/systemd/private
/run/systemd/notify
find: `/proc/30933/task/30933/fd/5': No such file or directory
find: `/proc/30933/task/30933/fdinfo/5': No such file or directory
find: `/proc/30933/fd/5': No such file or directory
find: `/proc/30933/fdinfo/5': No such file or directory
In the process list from top, on the other hand, mysqld and mysqld_safe do show up.
mysql-client, mysql-server and php5-mysql are installed and updated to the latest versions.
I also had a look at
/etc/mysql/my.cnf
/etc/mysql/debian.cnf
both show /var/run/mysqld/mysqld.sock as socket...
/var/lib/mysql/my.cnf
mentioned here does not exist.
Additionally, it seems that I can't connect to mysql through
mysql -u user -p
at least it also results in Error 2002.
Finally, I tried stopping and starting the mysql service. This resulted in the following output of
systemctl status mysql.service
mysql.service - LSB: Start and stop the mysql database server daemon
Loaded: loaded (/etc/init.d/mysql)
Active: failed (Result: exit-code) since So 2016-04-10 11:54:23 CEST; 23s ago
Process: 9777 ExecStop=/etc/init.d/mysql stop (code=exited, status=0/SUCCESS)
Process: 12878 ExecStart=/etc/init.d/mysql start (code=exited, status=1/FAILURE)
So I'm kind of lost as to what is going on; the problem has been occurring since some updates a few days ago. While writing this post, I went through all the steps again, just to be safe. At one point I had a short glimpse of my owncloud instance in the browser, but then it was gone again. So I appreciate any help/hints!!!
Thank you very much!!!
I faced the issue: Can't connect to '/var/run/mysqld/mysqld.sock'. The problem was that the mysql service was not started after installation. Once I ran the following commands, it worked properly:
systemctl start mysql.service
mysql -u root -p
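A quick way to confirm that the server is up and the socket is back (paths taken from the question) would be:
systemctl status mysql.service
ls -l /var/run/mysqld/mysqld.sock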
1. Activate the log in my.cnf:
   log = /var/log/mysql/mysql.log
   Error logging goes to syslog. This is a Debian improvement :)
   Here you can see queries with especially long duration:
   log_slow_queries = /var/log/mysql/mysql-slow.log
   long_query_time = 2
   log-queries-not-using-indexes
2. Check the socket directory and the running processes:
   $ ls -l /var/run/ | grep mysqld
   $ ps -ef | grep mysql
3. Watch the log:
   tail -f /var/log/mysql/mysql.log
4. Restart mysql.
5. (Option) Delete the socket & restart mysql (see the sketch below).

MySQL Galera autostart at boot with --wsrep-new-cluster

To recover from a blackout I need to start the Galera cluster when the system boots and I can only do this with the following:
service mysql start --wsrep-new-cluster
"service mysql start" will get launched on boot but will fail because it is the only one in the cluster. How do I get the cluster to start from boot and not fail if it is the only one there?
EDIT
Looks like I have to leave gcomm:// blank for it to start, but that is not the best solution, because if another server came online first it would fail.
#galera settings
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="my_wsrep_cluster"
wsrep_cluster_address="gcomm://"
wsrep_sst_method=rsync
wsrep_provider_options="pc.bootstrap=true"
My solution is to edit the init script. This is the solution for Debian; my init script is located at /etc/init.d/mysql.
There I found this line:
/usr/bin/mysqld_safe "${#:2}" > /dev/null 2>&1 &
and I added parameter --wsrep-new-cluster
/usr/bin/mysqld_safe --wsrep-new-cluster "${#:2}" > /dev/null 2>&1 &
and it is working after boot.
I've been through this before. The following is the procedure I documented for my co-workers:
First we will determine the node with the latest change
On each node go to /var/lib/mysql and examine the grastate.dat file
We are looking for the node with the highest seqno and a uuid that is not all zeros
On the node that captured the latest change, start up the cluster in bootstrap mode:
service mysql bootstrap
Start up the other nodes via the usual startup command:
service mysql start
Check on each node that they have the same list of databases:
mysql -u root -p
show databases;
On any node, check the cluster status (see the query sketch after the output below) and ensure you see something like the following:
wsrep_local_state_comment | Synced <-- cluster is synced
wsrep_incoming_addresses | 10.0.0.9:3306,10.0.0.11:3306,10.0.0.13:3306 <-- all nodes are providers
wsrep_cluster_size | 3 <-- cluster consists of 3 nodes
wsrep_ready | ON <-- good :)
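For reference, these values are Galera's wsrep status variables; assuming a standard setup, they can be queried from the mysql client like this:
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_%';"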