How to manage supervisord restart gracefully with gunicorn?

I have a scenario in which I want the gunicorn workers to finish any ongoing requests before supervisord stops and then restarts the process.
Can anyone help me with this?

Set autostart and autorestart to true in your Supervisor program configuration.
Then signal gunicorn with:
kill -HUP `cat /tmp/process.pid`
It will shut down gracefully, finishing the requests in progress, and Supervisor will restart it according to that configuration.
You can also use gunicorn's reload, as it sends the same HUP signal.
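For reference, a Supervisor program section along these lines should do it; this is only a sketch, and the gunicorn path, app module, and pid file location are placeholders, not a definitive setup:
[program:gunicorn]
command=/usr/local/bin/gunicorn --pid /tmp/process.pid myapp.wsgi
autostart=true
autorestart=true
; TERM (the default stopsignal) asks gunicorn for a graceful shutdown
stopsignal=TERM
; give the workers time to finish in-flight requests before supervisord escalates to SIGKILL
stopwaitsecs=30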

More precisely, you can reload your app by sending the HUP signal with pkill -HUP gunicorn.
This is handy here because the /var/run/ directory contains no gunicorn pid file:
find /var/run/ -iname '*gunicorn*' | wc -l
0
See the official Gunicorn docs, http://docs.gunicorn.org/en/stable/faq.html
and man 1 pkill
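If you want to see which gunicorn processes are running before signalling them, pgrep can list them first; a small sketch (pgrep -a needs procps-ng, older versions have pgrep -l instead):
# list pid and command line of every gunicorn process
pgrep -a gunicorn
# then send the graceful-reload signal as above
pkill -HUP gunicorn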

Related

Pm2 startup issue with CENTOS 8 / SELinux

Please, does anyone know how to resolve this issue?
I have searched everywhere without finding a solution.
06:45 SELinux is preventing systemd from open access on the file /root/.pm2/pm2.pid. For complete SELinux messages run: sealert -l d84a5a0b-cfcf-4cb9-918a-c0952bf70600 setroubleshoot
06:45 pm2-root.service: Can't convert PID files /root/.pm2/pm2.pid O_PATH file descriptor to proper file descriptor: Permission denied systemd 2
06:45 Failed to start PM2 process manager.
I have executed this command : sealert -l d84a5a0b-cfcf-4cb9-918a-c0952bf70600 setroubleshoot
Raw audit messages:
type=AVC msg=audit(1591498085.184:7731): avc: denied { open } for pid=1 comm="systemd" path="/root/.pm2/pm2.pid" dev="dm-0" ino=51695937 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:admin_home_t:s0 tclass=file permissive=0
PM2 Version : 4.4.0
NODE version : 12.18.0
CentOS Version : 8
My systemd service:
[Unit]
Description=PM2 process manager
Documentation=https://pm2.keymetrics.io/
After=network.target
[Service]
Type=forking
User=root
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
Environment=PATH=/sbin:/bin:/usr/sbin:/usr/bin:/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
Environment=PM2_HOME=/root/.pm2
PIDFile=/root/.pm2/pm2.pid
Restart=on-failure
ExecStart=/usr/lib/node_modules/pm2/bin/pm2 resurrect
ExecReload=/usr/lib/node_modules/pm2/bin/pm2 reload all
ExecStop=/usr/lib/node_modules/pm2/bin/pm2 kill
[Install]
WantedBy=multi-user.target
Thank you
As said in the comments, I had the exact same issue.
To solve this, just run the following commands as root after trying to start the PM2 service (in your case, that start attempt would be systemctl start pm2-root):
ausearch -c 'systemd' --raw | audit2allow -M my-systemd
semodule -i my-systemd.pp
This looks pretty generic, but it works. These lines were suggested by SELinux itself; to get them, I had to run journalctl -xe after trying to start the service.
Two options:
Edit the systemd file that starts pm2 and specify an alternative location for the pm2 PID file. You'll have to make two changes: one to tell pm2 where to place the PID file, and one to tell systemd where to look for it. Replace the existing PIDFile line with the following two lines:
Environment=PM2_PID_FILE_PATH=/run/pm2.pid
PIDFile=/run/pm2.pid
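Whichever location you pick, systemd has to re-read the unit before the change takes effect; assuming the service is called pm2-root as in the question:
systemctl daemon-reload
systemctl restart pm2-root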
Create an SELinux rule that allows this particular behavior. You can do that exactly as Backslash36 suggests in their answer. If you want to write the policy file yourself rather than going through audit2allow, the following should work, although you then have to compile it into a usable .pp file yourself (a sketch of the compile commands follows the module).
module pm2 1.0;
require {
    type user_home_t;
    type init_t;
    class file read;
}
#============= init_t ==============
allow init_t user_home_t:file read;
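If you go that route, a minimal sketch of compiling the module above with the standard SELinux tools (run as root, assuming the file is saved as pm2.te):
checkmodule -M -m -o pm2.mod pm2.te
semodule_package -o pm2.pp -m pm2.mod
semodule -i pm2.pp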

systemd / podman: This usually indicates unclean termination of a previous run, or service implementation deficiencies

I am running a container with systemd and Podman. When I want to deploy a new image tag, I stop the service, update the service file, and start it again, but the container fails to start.
Here is the systemd unit file:
[Unit]
Description=hello_api Podman Container
After=network.target
[Service]
Restart=on-failure
RestartSec=3
ExecStartPre=/usr/bin/rm -f /%t/%n-pid /%t/%n-cid
ExecStartPre=-/usr/bin/podman rm hello_api
ExecStart=/usr/bin/podman run --conmon-pidfile /%t/%n-pid --cidfile /%t/%n-cid -d -h modelenv \
--name hello_api --rm --ulimit=host -p "8001:8001" -p "8443:8443" 7963-hello_api:7.8
ExecStop=/usr/bin/sh -c "/usr/bin/podman rm -f `cat /%t/%n-cid`"
KillMode=none
Type=forking
PIDFile=/%t/%n-pid
[Install]
WantedBy=default.target
Here is the error message:
May 21 10:41:43 webserver systemd[1471]: hello_api.service: Found left-over process 22912 (conmon) in control group while starting unit. Ignoring.
May 21 10:41:43 webserver systemd[1471]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 21 10:41:43 webserver systemd[1471]: hello_api.service: Found left-over process 22922 (node) in control group while starting unit. Ignoring.
May 21 10:41:43 webserver systemd[1471]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 21 10:41:43 webserver systemd[1471]: hello_api.service: Found left-over process 22960 (node) in control group while starting unit. Ignoring.
May 21 10:41:43 webserver systemd[1471]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 21 10:41:44 webserver podman[24565]: 2020-05-21 10:41:44.586396547 -0400 EDT m=+1.090025069 container create 28eaf881f532339766cc96ec27a69d8ad588e07d4bfc70e65e7c54e8a5082933 (image=7963-hello_api:7.8, name=hello_api)
May 21 10:41:45 webserver podman[24565]: Error: error from slirp4netns while setting up port redirection: map[desc:bad request: add_hostfwd: slirp_add_hostfwd failed]
May 21 10:41:45 webserver systemd[1471]: hello_api.service: Control process exited, code=exited status=126
May 21 10:41:45 webserver systemd[1471]: hello_api.service: Failed with result 'exit-code'.
May 21 10:41:45 webserver systemd[1471]: Failed to start call_center_hello_api Podman Container.
Why is it giving this error? Is there an option to cleanly stop the old container?
I think we followed the same tutorial here: https://www.redhat.com/sysadmin/podman-shareable-systemd-services
"It's important to set the kill mode to none. Otherwise, systemd will start competing with Podman to stop and kill the container processes, which can lead to various undesired side effects and invalid states."
I'm not sure if the behavior has changed, but I removed KillMode=none so that the default, KillMode=control-group, is used. I have not had any problems managing the service since. Also, I removed the leading / from some of the commands, because %t already expands to an absolute path and the slash was being duplicated:
ExecStartPre=/usr/bin/rm -f //run/user/1000/registry.service-pid //run/user/1000/registry.service-cid
It's now:
ExecStartPre=/usr/bin/rm -f /run/user/1000/registry.service-pid /run/user/1000/registry.service-cid
The full service file I use for running a docker registry:
[Unit]
Description=Image Registry
[Service]
Restart=on-failure
ExecStartPre=-/usr/bin/podman volume create registry
ExecStartPre=/usr/bin/rm -f /%t/%n-pid /%t/%n-cid
ExecStart=/usr/bin/podman run --conmon-pidfile %t/%n-pid --cidfile %t/%n-cid -d -p 5000:5000 -v registry:/var/lib/registry --name registry docker.io/library/registry
ExecStop=/usr/bin/sh -c "/usr/bin/podman rm -f `cat %t/%n-cid`"
Type=forking
PIDFile=/%t/%n-pid
[Install]
WantedBy=multi-user.target
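As an alternative sketch (not what the answer above uses): recent Podman versions can generate this kind of unit themselves, which keeps the pid/cid plumbing consistent. Flags and the generated file name vary between releases:
# create the container once, then let podman write a matching unit file
podman create --name registry -p 5000:5000 -v registry:/var/lib/registry docker.io/library/registry
podman generate systemd --name registry --files --new
# install it as a user service (adjust paths and target for a system-wide service)
mv container-registry.service ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user enable --now container-registry.service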

Docker seems to be killed before my shutdown-script is executed when my VM is preempted on GCE

I want to implement a shutdown script that is called when my VM is about to be preempted on Google Compute Engine. That VM runs Docker containers executing long-running batches, so I send them a signal to make them exit gracefully.
The shutdown script works well when I execute it manually, yet it breaks on a real preemption, or when I stop the VM myself.
I get this error:
... logs from my containers ...
A 2019-08-13T16:54:07.943153098Z time="2019-08-13T16:54:07Z" level=error msg="error waiting for container: unexpected EOF"
(just after this error, I can see what I put in the 1st line of my shutting-down script, see code below)
A 2019-08-13T16:54:08.093815210Z 2019-08-13 16:54:08: Shutting down! TEST SIGTERM SHUTTING DOWN (this is the 1st line of my shutdown script)
A 2019-08-13T16:54:08.093845375Z docker ps -a
(no result)
A 2019-08-13T16:54:08.155512145Z ps -ef
... a lot of things, but nothing related to docker ...
2019-08-13 16:54:08: task_subscriber not running, shutting down immediately.
I use a preemptible VM on GCE, with the Container-Optimized OS 73-11647.267.0 stable image. I run my containers as services with systemctl, yet I don't think this is related. [edit] Actually, I could solve my issue thanks to this.
Right now, I am pretty sure that a lot of things happen when Google sends the ACPI signal to my VM, even before the shutdown script is fetched from the VM metadata and called.
My guess is that all the services are being stopped at the same time, eventually including docker.service itself.
When my container is running, I can reproduce the same level=error msg="error waiting for container: unexpected EOF" with a simple sudo systemctl stop docker.service.
Here is part of my shutdown script:
#!/bin/bash
# This script must be added in the VM metadata as "shutdown-script" so that
# it is executed when the instance is being preempted.
CONTAINER_NAME="task_subscriber" # For example, "task_subscriber"
logTime() {
  local datetime="$(date +"%Y-%m-%d %T")"
  echo -e "$datetime: $1" # Console
  echo -e "$datetime: $1" >>/var/log/containers/$CONTAINER_NAME.log
}
logTime "Shutting down! TEST SIGTERM SHUTTING DOWN"
echo "docker ps -a" >>/var/log/containers/$CONTAINER_NAME.log
docker ps -a >>/var/log/containers/$CONTAINER_NAME.log
echo "ps -ef" >>/var/log/containers/$CONTAINER_NAME.log
ps -ef >>/var/log/containers/$CONTAINER_NAME.log
if [[ ! "$(docker ps -q -f name=${CONTAINER_NAME})" ]]; then
logTime "${CONTAINER_NAME} not running, shutting down immediately."
sleep 10 # Give time to send logs
exit 0
fi
logTime "Sending SIGTERM to ${CONTAINER_NAME}"
#docker kill --signal=SIGTERM ${CONTAINER_NAME}
systemctl stop taskexecutor.service
# Portable waitpid equivalent
while [[ "$(docker ps -q -f name=${CONTAINER_NAME})" ]]; do
  sleep 1
  logTime "Waiting for ${CONTAINER_NAME} termination"
done
logTime "${CONTAINER_NAME} is done, shutting down."
logTime "TEST SIGTERM SHUTTING DOWN BYE BYE"
sleep 10 # Give time to send logs
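For reference, the script above is attached to the instance through the shutdown-script metadata key, roughly like this (the instance name and file path are placeholders):
gcloud compute instances add-metadata my-batch-vm \
  --metadata-from-file shutdown-script=shutdown.sh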
If I simply call systemctl stop taskexecutor.service manually (not by actually shutting down the server), the SIGTERM signal is sent to my container and my app properly handles it and exits.
Any ideas?
-- How I solved my issue --
I could solve it by adding this dependency on docker in my service config:
[Unit]
Wants=gcr-online.target docker.service
After=gcr-online.target docker.service
I don't know how the magic works behind the execution of the shutdown-script stored in the metadata by Google. But I think they should fix something in their Container-Optimized OS so that this magic happens before docker is stopped. Otherwise, we cannot rely on a plain shutdown script to stop containers gracefully (fortunately I was using systemd here)...
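Since systemd stops units in the reverse of their start ordering, After=docker.service means this service is stopped before docker.service, so the container can still receive its SIGTERM. For context, a sketch of what such a unit might look like; the service name, image and timeout below are placeholders, not the author's actual file:
[Unit]
Description=Task executor container
Wants=gcr-online.target docker.service
After=gcr-online.target docker.service
[Service]
# placeholder: run the long-running batch container in the foreground
ExecStart=/usr/bin/docker run --rm --name task_subscriber my-batch-image
# give the container time to finish in-flight work before systemd escalates to SIGKILL
TimeoutStopSec=120
[Install]
WantedBy=multi-user.target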
From the documentation [1], using shutdown scripts on preemptible VM instances is feasible. However, there are some limitations: Compute Engine executes shutdown scripts only on a best-effort basis, and in rare cases cannot guarantee that the script will complete. I would also like to mention that preemptible instances have only 30 seconds after preemption begins [2], which might mean Docker is killed before the shutdown script has finished.
From the error message provided in your use case, this seems to be expected behaviour when Docker has been running continuously for a long time [3].
[1]https://cloud.google.com/compute/docs/instances/create-start-preemptible-instance#handle_preemption
[2] https://cloud.google.com/compute/docs/shutdownscript#limitations
[3] https://github.com/docker/for-mac/issues/1941

Correct way to set a crontab to stop and start mysql and httpd

I'm trying to create a shell script to stop and start MySQL and httpd every Saturday at 3 am. This is what I'm doing:
myscript.sh:
#!/bin/sh
echo "Stopping MySQL"
service mysqld stop
sleep 1s
echo "Stopping HTTPD"
service httpd stop
sleep 5s
echo "Starting MySQL"
service mysqld start
sleep 2s
echo "Starting HTTPD"
service httpd start
and setting the crontab to:
0 3 * * 6 ~/myscript.sh
Is this the correct way to do it? I'm stopping and starting MySQL and httpd because of memory usage. Should I do some check before stopping them, or can I do it without problems?
Another question: how can I check the amount of free RAM before stopping them? Something like "if free memory is less than X, stop them"?
Thanks in advance.
Presumably your MySQL workload comes from your httpd web server.
So do this instead: stop httpd first, then bounce mysqld, then restart httpd.
service httpd stop
sleep 10s
service mysqld restart
service httpd start
But you should investigate carefully whether this is truly necessary. Lots of production systems don't need it. Modern Apache servers automatically limit the lifetime of their worker processes to handle the kind of memory leak you are describing.
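If you do want to gate the restart on memory, as the question asks, a minimal sketch using free might look like this; the 500 MB threshold is arbitrary, and the "available" column assumes a reasonably recent procps:
#!/bin/sh
# Restart the services only when available memory drops below a threshold (in MB).
THRESHOLD_MB=500
# $7 is the "available" column of free -m on recent procps; adjust for older versions
AVAIL_MB=$(free -m | awk '/^Mem:/ {print $7}')
if [ "$AVAIL_MB" -lt "$THRESHOLD_MB" ]; then
  echo "Available memory ${AVAIL_MB}MB is below ${THRESHOLD_MB}MB, bouncing services"
  service httpd stop
  sleep 10
  service mysqld restart
  service httpd start
fi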

kubernetes failing to connect on fresh installation of CoreOS

I'm running (from Windows 8.1) a Vagrant VM for CoreOS (yungsang/coreos).
I installed kubernetes according to the guide I found here and created the json for the pod using my images.
When I execute sudo ./kubecfg list /pods I get the following error:
F0909 06:03:04.626251 01933 kubecfg.go:182] Got request error: Get http://localhost:8080/api/v1beta1/pods?labels=: dial tcp 127.0.0.1:8080: connection refused
Same goes for sudo ./kubecfg -h http://127.0.0.1:8080 -c /vagrant/app.json create /pods
EDIT: Update
Instead of running the commands myself, I integrated them into the Vagrantfile (as such).
This makes Kubernetes work fine. HOWEVER, after some time my vagrant ssh connection gets closed. When I reconnect, any Kubernetes command I run results in the same error as above.
EDIT 2: Update
I managed to get it to run again; however, I am unsure whether it will keep running smoothly.
I had to re-execute the following commands:
sudo systemctl start etcd
sudo systemctl start download-kubernetes
sudo systemctl start apiserver
sudo systemctl start controller-manager
sudo systemctl start kubelet
sudo systemctl start proxy
I believe it is in fact the apiserver that needs restarting.
What is the source of this "timeout"? (And where can I find logs on this matter?)
Kubernetes development is moving insanely fast right now, so this could be out of date by tomorrow. With that in mind, the Kubernetes folks recommend following one of their official installation guides. The best advice would be to start over fresh with one of the new installation guides, but there are a few tips I have learned from doing this myself.
The first thing to note is that kubecfg is being deprecated in favor of kubectl. So for future reference, if you want to get info about a pod, you would run something like:
./kubectl get pods.
With kubectl you will also need to set an env variable so kubectl knows how to talk to the apiserver:
KUBERNETES_MASTER=http://IPADDRESS:8080.
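For example (the master address is a placeholder for wherever your apiserver listens):
export KUBERNETES_MASTER=http://172.17.8.101:8080
./kubectl get pods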
The easiest way to debug exactly what is going on, if you are using CoreOS, is to tail the logs of the service you are interested in. So if you have a kube-apiserver unit, you can look at what's going on by running:
journalctl -f -u kube-apiserver
from the node that is running the apiserver. If that service isn't running, which may be the case, you can start it with:
systemctl start kube-apiserver
On CoreOS you should look at the logs using journalctl.
For example, if you wish to see the etcd logs, which Kubernetes relies on for storing the state of its minions, run journalctl _COMM=etcd; similarly, journalctl _COMM=apiserver will show you the logs from the apiserver, one of the key components in Kubernetes.
You can also get the last few log entries by running systemctl status apiserver.
Based on errordeveloper's advice, my recent installation ran into a similar problem.
Using systemctl status apiserver and sudo systemctl start apiserver, I managed to get the environment up and running again.