Gunicorn retry limit

I deploy my app under gunicorn, and when the app fails I want it to stop instead of being restarted over and over. Is there any setting for limiting retries in gunicorn?
I'm using gunicorn 19.7.1
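As far as I can tell, gunicorn 19.x has no built-in "retry limit" setting; the master keeps respawning workers. One possible workaround is a server hook in gunicorn.conf.py that counts worker exits and tells the master to shut down past a threshold. This is only an untested sketch; the threshold is made up, and note that child_exit fires for every worker exit, including graceful restarts, not only crashes.

# gunicorn.conf.py -- rough sketch, not a built-in gunicorn setting
import os
import signal

MAX_WORKER_DEATHS = 5   # illustrative threshold, pick your own
_deaths = 0

def child_exit(server, worker):
    # Runs in the master process each time a worker process exits.
    global _deaths
    _deaths += 1
    if _deaths >= MAX_WORKER_DEATHS:
        server.log.error("Too many worker exits (%s); stopping gunicorn", _deaths)
        os.kill(os.getpid(), signal.SIGTERM)  # ask the master to shut down gracefully

Load it with gunicorn -c gunicorn.conf.py yourapp:app.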

Related

Cloud Run sending SIGTERM with no visible scale down on container instances

I've deployed a Python FastAPI application on Cloud Run using Gunicorn + Uvicorn workers.
Cloud Run configuration:
Dockerfile
FROM python:3.8-slim
# Allow statements and log messages to immediately appear in the Knative logs
ENV PYTHONUNBUFFERED True
ENV PORT ${PORT}
ENV APP_HOME /app
ENV APP_MODULE myapp.main:app
ENV TIMEOUT 0
ENV WORKERS 4
WORKDIR $APP_HOME
COPY ./requirements.txt ./
# Install production dependencies.
RUN pip install --no-cache-dir --upgrade -r /app/requirements.txt
# Copy local code to the container image.
COPY . ./
# Run the web service on container startup. Here we use the gunicorn
# webserver with Uvicorn worker processes (the count comes from $WORKERS).
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
# Timeout is set to 0 to disable worker timeouts and let Cloud Run handle instance scaling.
CMD exec gunicorn --bind :$PORT --workers $WORKERS --worker-class uvicorn.workers.UvicornWorker --timeout $TIMEOUT $APP_MODULE --preload
My application receives a request and does the following:
Makes an async call to Cloud Firestore using firestore.AsyncClient
Runs an algorithm using Google OR-Tools. I've used cProfile to check that this task takes < 500 ms on average to complete.
Adds a FastAPI async background task to write to BigQuery. This is achieved as follows (a sketch of how the task is registered is shown after the snippet):
from fastapi.concurrency import run_in_threadpool

async def bg_task():
    # create json payload
    errors = await run_in_threadpool(lambda: client.insert_rows_json(table_id, rows_to_insert))  # Make an API request.
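For context (not shown in the question), a task like this is typically attached to the endpoint through FastAPI's BackgroundTasks; the endpoint path and return value below are made up:

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

@app.post("/solve")  # hypothetical endpoint
async def solve(background_tasks: BackgroundTasks):
    # ... run the OR-Tools algorithm here ...
    background_tasks.add_task(bg_task)  # bg_task runs after the response is sent
    return {"status": "ok"}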
I have been noticing intermittent "Handling signal: term" log entries, which cause Gunicorn to shut down its processes and restart them. I can't get my head around why this might be happening. The surprising bit is that this sometimes happens at off-peak hours, when the API is receiving 0 requests. There doesn't seem to be any apparent scaling down of Cloud Run instances causing this either.
The issue is that this also happens quite frequently under production load during peak hours, and it even causes Cloud Run to autoscale from 2 to 3 or 4 instances, which adds cold-start latency to my API. My API receives on average 1 request per minute.
Cloud Run metrics during random SIGTERM
As clearly shown here, my API has not been receiving any requests in this period and Cloud Run has no business killing and restarting Gunicorn processes.
Another startling issue is that this seems to only happen in my production environment. In my development environment, I have the exact SAME setup but I don't see any of these issues there.
Why is Cloud Run sending SIGTERM and how do I avoid it?
Cloud Run is a serverless platform, which means that server management is done by Google Cloud, and it can choose to stop some instances from time to time (for maintenance reasons, technical issues, and so on).
But it changes nothing for you: there is of course a cold start, but it should be invisible to your process, even under high load, because your min-instances parameter is set to 2, which keeps instances up and ready to serve traffic without a cold start.
Can you have 3 or 4 instances in parallel instead of 2 (the min value)? Yes, but the billable instance count stays flat at 2. Cloud Run, again, is serverless: it can create backup instances to make sure that the future shutdown of some of them won't impact your traffic. It's an internal optimization. No additional cost; it just works.
Can you avoid that? No, because it's serverless, and also because there is no impact on your workloads.
A last point about "environment": for Google Cloud, all projects are production projects. There is no difference; Google can't know what is critical or not, so everything is treated as critical.
If you notice a difference between two projects, it's simply because they are deployed on different Google Cloud internal clusters. The status, performance, maintenance operations, and so on differ between clusters. And again, you can't do anything about that.
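Since these shutdowns can't be prevented, the practical step is to make sure the app exits cleanly when Gunicorn relays the SIGTERM to its workers, for example by flushing anything still in flight in a shutdown handler. A minimal sketch; drain_pending_bigquery_writes is a hypothetical placeholder for whatever work is still queued:

from fastapi import FastAPI

app = FastAPI()

@app.on_event("shutdown")
async def on_shutdown():
    # Runs when the worker is asked to stop (e.g. Cloud Run's SIGTERM
    # relayed by Gunicorn). Flush any queued work here.
    # await drain_pending_bigquery_writes()  # hypothetical helper
    pass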

How to run a containerized console app on Azure?

I have containerized a console app which runs as a scheduler and performs some logic at specific times. I tried to run it as an Azure App Service but I am getting the error below:
2022-03-30T19:10:29.209Z INFO - Waiting for response to warmup request for container one-site-scheduler-app_0_d429ab9d. Elapsed time = 209.8994308 sec
2022-03-30T19:10:45.377Z INFO - Waiting for response to warmup request for container one-site-scheduler-app_0_d429ab9d. Elapsed time = 226.067164 sec
2022-03-30T19:10:49.746Z ERROR - Container scheduler-app_0_d for site one-site-scheduler-app did not start within expected time limit. Elapsed time = 230.4276536 sec
2022-03-30T19:10:49.901Z ERROR - Container scheduler-app_0_d didn't respond to HTTP pings on port: 80, failing site start. See container logs for debugging.
2022-03-30T19:10:49.977Z INFO - Stopping site scheduler-app because it failed during startup.
Although before it stops the container I can see the logs from the scheduler app.
I am not sure what I am missing here.
I tried following this article, but had no luck: https://ameshram57.medium.com/azure-app-service-container-container-didnt-respond-to-http-pings-e2e653d867fe#:~:text=if%20you%20are%20getting%20%E2%80%9C%20ERROR,we%20map%20port%20in%20docker.
I learned that Azure App Service pings port 80 by default to check whether it gets any response; since my container just ran a console application, it never responded to the ping from App Service. To fix that, I converted my console app to a single-endpoint web app using WebHostBuilder, following the steps from the article below:
https://nitishkaushik.com/what-is-webhostbuilder-in-asp-net-core/
This trick worked.
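The fix above is .NET-specific (WebHostBuilder); the sketch below shows the same idea in Python, just to illustrate what App Service expects: keep the scheduler running in the background and answer the HTTP pings on port 80.

import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

def scheduler_loop():
    while True:
        # ... run the scheduled work here ...
        time.sleep(60)

class Ping(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer App Service's warmup/health pings with 200 OK.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

threading.Thread(target=scheduler_loop, daemon=True).start()
HTTPServer(("0.0.0.0", 80), Ping).serve_forever()

If the container listens on a port other than 80, the WEBSITES_PORT app setting tells App Service which port to ping instead.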

OpenShift cluster (v4.6) using CRC has an empty OperatorHub

I have downloaded CodeReady Containers on Windows to install my OpenShift cluster. I need to deploy 3scale on it using the operator from OperatorHub, but the OperatorHub page is empty.
Digging deeper, I found that a few pods on the cluster are not running and show the state "ImagePullBackOff".
I deleted the pods to get them restarted, but the error won't go away. I checked the event logs; the screenshots are attached.
Pods Terminal logs
This is an error that I keep getting when I start my cluster. Sometimes it appears, sometimes the cluster starts normally, but maybe it has something to do with the issue.
Quay.io Error
This is my first time deploying to an OpenShift cluster and setting up my cluster environment. So far I have not been able to resolve the issue, even after deleting and restarting the cluster.

Laravel 5.4 queue:restart on Windows?

I am learning the Laravel 5.4 "queues" chapter and have a question about the queue:restart command. When I test it on my Windows 10 machine, the command seems to just kill the queue worker, not restart it. So does this command not work on Windows, or is it only meant to kill the worker rather than restart it? Thanks.
The queue:restart command never actually restarts a worker; it just tells it to shut down. It is supposed to be combined with a process manager like Supervisor that will restart the process when it quits. This also happens when queue:work hits the configured memory limit.
To keep the queue:work process running permanently in the background, you should use a process monitor such as Supervisor to ensure that the queue worker does not stop running.
Source: https://laravel.com/docs/5.4/queues#running-the-queue-worker
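Since Supervisor isn't available on Windows, here is a bare-bones stand-in that only illustrates what a process manager does: relaunch the worker whenever it exits (after queue:restart or when the memory limit is hit). The path and arguments are examples, not part of the Laravel docs:

import subprocess
import time

while True:
    # Start the queue worker and wait for it to exit.
    proc = subprocess.run(["php", "artisan", "queue:work"], cwd=r"C:\path\to\app")
    print(f"queue worker exited with code {proc.returncode}; restarting in 3s")
    time.sleep(3)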

Start/stop sqsd daemon on Elastic Beanstalk to view SQS queue messages

I would like to view SQS messages before they get picked up by sqsd on my Elastic Beanstalk instance. Is there a way, once SSH'ed into the instance, that sqsd can be stopped/started as a service altogether?
For the purpose of debugging, you may stop sqsd by running sudo service aws-sqsd stop on the instance, and check its status with sudo service aws-sqsd status. Note this is not recommended for a production service, but for debugging you may try it.
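With sqsd stopped, one way to peek at the queued messages from the instance is the SQS API itself, for example with boto3; the queue URL and region below are placeholders, and VisibilityTimeout=0 means the messages stay visible to sqsd once you start it again:

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")  # placeholder region
resp = sqs.receive_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/my-worker-queue",  # placeholder
    MaxNumberOfMessages=10,
    VisibilityTimeout=0,
)
for msg in resp.get("Messages", []):
    print(msg["Body"])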