We have a requirement to monitor and try to restart our gUnicorn/Django app if it goes down. We're using gunicorn 20.0.4.
I have the following nrs.service running fine with systemd. I'm trying to figure out whether systemd's watchdog capabilities can be integrated with gunicorn. Looking through the source, I don't see sd_notify("WATCHDOG=1") being called anywhere, so I think the answer is no: gunicorn calls sd_notify("READY=1...") at startup, but nothing in its run loop tells systemd that it is still running.
Here's the nrs.service file. I have commented out the watchdog vars because it obviously sends my service into a failed state shortly after it starts.
[Unit]
Description=Gunicorn instance to serve NRS project
After=network.target
[Service]
WorkingDirectory=/etc/nrs
Environment="PATH=/etc/nrs/bin"
ExecStart=/etc/nrs/bin/gunicorn --error-logfile /etc/nrs/logs/gunicorn_error.log --certfile=/etc/httpd/https_certificate/nrs.cer --keyfile=/etc/httpd/https_certificate/server.key --access-logfile /etc/nrs/logs/gunicorn_access.log --capture-output --bind=nrshost:8800 anomalyalerts.wsgi
#WatchdogSec=15s
#Restart=on-failure
#StartLimitInterval=1min
#StartLimitBurst=4
[Install]
WantedBy=multi-user.target
So systemd watchdog is doing its thing, just looks like out of the box gunicorn doesn't support it. Not very familiar with 'monkey-patching' but I'm thinking if we want to use this method of monitoring, I'm going to have to implement some custom code? Other thought was just to have a watch command check the service and try to restart it, which might be easier.
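For reference, the custom code in question could be fairly small. The following is only a minimal sketch of the notification protocol, assuming systemd exports NOTIFY_SOCKET (and WATCHDOG_USEC when WatchdogSec is set) to the process. The helper names sd_notify and watchdog_interval are hypothetical, not part of gunicorn, and where to call them from inside gunicorn is left open.

```python
import os
import socket


def sd_notify(state):
    """Send one notification string (e.g. "WATCHDOG=1") to systemd.

    Returns False when NOTIFY_SOCKET is not set, i.e. when the process
    was not started by systemd with notifications enabled.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    # A leading "@" denotes a Linux abstract-namespace socket.
    if addr.startswith("@"):
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), addr)
    return True


def watchdog_interval():
    """Return a keep-alive period in seconds (half of WatchdogSec), or None."""
    usec = os.environ.get("WATCHDOG_USEC")
    return int(usec) / 2_000_000 if usec else None
```

A keep-alive loop would call sd_notify("WATCHDOG=1") every watchdog_interval() seconds. Whether systemd accepts the message also depends on the unit's NotifyAccess setting, so treat this as an illustration rather than a drop-in fix.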
Thanks
Jason
"monitor and try to restart our gUnicorn/Django app if it goes down"
systemd's watchdog will not help in the described case. The reason is that the watchdog is intended to monitor the main service process, which does not run your app directly.
Gunicorn's master process, which is the main service process from systemd's perspective, is a loop that manages the worker processes. Your app runs inside the worker processes, so if anything goes wrong there, it is the worker that should be restarted, not the master.
Gunicorn restarts worker processes automatically (see the timeout setting). As for the main service process, in the rare case that it dies, the Restart=on-failure option can restart it even without a watchdog (see the docs for details on how it behaves).
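As a rough illustration of the worker-side settings, here is a minimal sketch of a Gunicorn config file. The file name gunicorn.conf.py and the specific values are assumptions for illustration; only the bind address comes from the unit file in the question.

```python
# gunicorn.conf.py (hypothetical file name; values are illustrative)
import multiprocessing

# Address taken from the ExecStart line in the question.
bind = "nrshost:8800"

# Number of worker processes managed by the master (illustrative choice).
workers = multiprocessing.cpu_count() * 2 + 1

# Workers silent for more than this many seconds are killed and
# replaced by the master process.
timeout = 30

# Seconds a worker gets to finish in-flight requests after a restart signal.
graceful_timeout = 30
```

It could be loaded with gunicorn -c gunicorn.conf.py anomalyalerts.wsgi. Uncommenting Restart=on-failure in the unit file would then cover the master process itself.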
Related
I am following what's suggested in this article to change the iptables rules in order to allow incoming connections. For some reason, the qemu hook does not run. As a test, I put echo 'some output' > someweirdfilename at the top of the hook, before any VM name checks, so that I could check for the file afterwards. It looks like the hook is not executed at all. I made sure libvirtd.service was restarted, restarted the guest, and eventually tried a complete reboot. All gave the same result. I'm running libvirt 7.6.0 on Fedora 35. Does anyone have any suggestions for troubleshooting?
I am creating a Flask app and using Nginx and Gunicorn inside a virtual environment. When I start Gunicorn with gunicorn app:app, everything works fine. But when I use Supervisor to keep Gunicorn running, I get a 500 error. The log in var/log/ shows that the error happens when I try to open a file that should have been created by subprocess.run(command, capture_output=True, shell=True), so that line is not being executed correctly.
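One way to narrow this down is to log what the command actually returned instead of discarding it. This is only a diagnostic sketch: run_and_report is a hypothetical helper, and the guess that the working directory or environment differs under Supervisor is an assumption, not something confirmed here.

```python
import subprocess
import sys


def run_and_report(command, cwd=None):
    """Run a shell command and print its exit code and stderr on failure.

    cwd can pin the working directory explicitly, in case the process
    manager starts the app from a different directory than the shell does.
    """
    result = subprocess.run(
        command, capture_output=True, shell=True, text=True, cwd=cwd
    )
    if result.returncode != 0:
        print(f"command failed ({result.returncode}): {command}", file=sys.stderr)
        print(result.stderr, file=sys.stderr)
    return result
```

If the output shows a "not found" or path error, Supervisor's directory and environment program settings may be worth checking.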
Is there an alternative to Supervisor to keep my app running when my PuTTY session is closed?
Thanks.
I found the answer here.
https://docs.gunicorn.org/en/stable/deploy.html
It says that one good option is using Runit.
EDIT:
I ended up using the Gunicorn option --daemon. It is similar and makes everything much simpler.
I have run into the following case and haven't found a clear answer.
Preconditions:
I have kubernetes cluster
there are some options related to my application (for example debug_level=Error)
there are pods running and each of them uses configuration (ENV, mount path or cli args)
later I need to change value of some option (the same 'debug_level' Error -> Debug)
The Q is:
how should I notify my Pods that configuration has changed?
Earlier we could just send a HUP signal to the exact process directly or call systemctl reload app.service
What are the best practices for this use-case?
Thanks.
I think this is something you could achieve using sidecar containers. The sidecar container could monitor for changes in the configuration and send the signal to the appropriate process. More info here: http://blog.kubernetes.io/2015/06/the-distributed-system-toolkit-patterns.html
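A minimal sketch of what such a sidecar might run, assuming the configuration is mounted as a file and the sidecar can see the application's process (for example with a shared process namespace in the pod). The function names, the polling approach and the way the PID is obtained are all assumptions for illustration.

```python
import os
import signal
import time


def signal_if_changed(config_path, last_mtime, pid):
    """Send SIGHUP to pid if config_path changed; return the current mtime."""
    mtime = os.stat(config_path).st_mtime_ns
    if mtime != last_mtime:
        os.kill(pid, signal.SIGHUP)
    return mtime


def watch(config_path, pid, interval=2.0):
    """Poll config_path forever, signalling pid on every change."""
    last_mtime = os.stat(config_path).st_mtime_ns
    while True:
        time.sleep(interval)
        last_mtime = signal_if_changed(config_path, last_mtime, pid)
```

The application still has to handle SIGHUP by re-reading its configuration. Options passed as ENV or CLI args cannot be refreshed this way, since they are fixed when the container starts.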
Tools like kubediff or kube-applier can compare your Kubernetes YAML files to what's running on the cluster.
https://github.com/weaveworks/kubediff
https://github.com/box/kube-applier
I am using RabbitMQ, MongoDB and MySQL with my web services.
To keep my web services running I am using the npm forever module (forever start app.js).
warn: --minUptime not set. Defaulting to: 1000ms
warn: --spinSleepTime not set. Your script will exit if it does not stay up for at least 1000ms
info: Forever processing file: app.js
But my web services go down after a certain period, probably 9-12 hours after they are started.
If I restart them using node app or forever start app.js, they work fine again and then go down again after a similar span, and the cycle repeats.
Is there a way to fix this? I would also like to know the reason for it.
Make sure you write the console log and error log to files, and check them for any errors that are killing the node process.
forever start -o logs/out.log -e logs/err.log app.js
This might help troubleshoot exceptions and handle them.
While at the (excellent!) Polymer Summit in London, as part of the codelabs I ran "polymer serve" and got the application template up and running: http://localhost:8080/
Great! But how do I stop the server? It's continually running and survives a reboot. I need to get on with another project :P
I'm on Windows (W10 64). I have tried the usual method I have used before to stop node servers (is Polyserve node based?).
If I run netstat -an, nothing related to 8080 is listed.
If I run
netstat -ano | find "LISTENING" | find "8080"
nothing is returned.
Answering my own question, I guess this is just down to the hard caching by the service worker, as a hard reload refreshes as expected.
Lots of potential for confusion with service worker lifecycle!
Edit: "unregister service worker" in devtools did the trick!