Cloud Run sending SIGTERM with no visible scale down on container instances - gunicorn

I've deployed a Python FastAPI application on Cloud Run using Gunicorn + Uvicorn workers.
Cloud Run configuration:
Dockerfile
FROM python:3.8-slim
# Allow statements and log messages to immediately appear in the Knative logs
ENV PYTHONUNBUFFERED True
ENV PORT ${PORT}
ENV APP_HOME /app
ENV APP_MODULE myapp.main:app
ENV TIMEOUT 0
ENV WORKERS 4
WORKDIR $APP_HOME
COPY ./requirements.txt ./
# Install production dependencies.
RUN pip install --no-cache-dir --upgrade -r /app/requirements.txt
# Copy local code to the container image.
COPY . ./
# Run the web service on container startup. Here we use the gunicorn
# webserver with Uvicorn worker processes.
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
# Timeout is set to 0 to disable worker timeouts and let Cloud Run handle instance scaling.
CMD exec gunicorn --bind :$PORT --workers $WORKERS --worker-class uvicorn.workers.UvicornWorker --timeout $TIMEOUT $APP_MODULE --preload
My application receives a request and does the following:
Makes an async call to Cloud Firestore using firestore.AsyncClient.
Runs an algorithm using Google OR-Tools. I've used cProfile to check that this task takes < 500 ms to complete on average.
Adds a FastAPI async background task to write to BigQuery. This is achieved as follows:
from fastapi.concurrency import run_in_threadpool

async def bg_task():
    # create json payload
    errors = await run_in_threadpool(
        lambda: client.insert_rows_json(table_id, rows_to_insert)  # Make an API request.
    )
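For context, here is a minimal sketch of how such a background task might be wired into a FastAPI endpoint. The endpoint path, payload shape, and BigQuery table ID are illustrative assumptions, not the original code:

from typing import Dict, List

from fastapi import BackgroundTasks, FastAPI
from fastapi.concurrency import run_in_threadpool
from google.cloud import bigquery

app = FastAPI()
client = bigquery.Client()
TABLE_ID = "my-project.my_dataset.my_table"  # assumed table ID, for illustration only

async def log_to_bigquery(rows_to_insert: List[Dict]) -> None:
    # insert_rows_json is a blocking call, so run it in a thread pool
    # to keep the event loop free.
    errors = await run_in_threadpool(
        lambda: client.insert_rows_json(TABLE_ID, rows_to_insert)
    )
    if errors:
        print(f"BigQuery insert errors: {errors}")

@app.post("/solve")  # assumed endpoint name
async def solve(payload: Dict, background_tasks: BackgroundTasks):
    result = {"status": "ok"}  # placeholder for the OR-Tools result
    # The task runs after the response is sent, so it doesn't add to request latency.
    background_tasks.add_task(log_to_bigquery, [payload])
    return result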
I have been noticing intermittent "Handling signal: term" logs, which cause Gunicorn to shut down its worker processes and restart them. I can't work out why this is happening. The surprising part is that it sometimes happens at off-peak hours when the API is receiving 0 requests, and there doesn't seem to be any scale-down of Cloud Run instances that would explain it either.
The issue is that this also happens quite frequently under production load during peak hours, and it even causes Cloud Run to autoscale from 2 to 3 or 4 instances, which adds cold-start latency to my API. The API receives on average 1 request/minute.
Cloud Run metrics during random SIGTERM
As clearly shown here, my API has not been receiving any requests in this period and Cloud Run has no business killing and restarting Gunicorn processes.
Another startling issue is that this seems to only happen in my production environment. In my development environment, I have the exact SAME setup but I don't see any of these issues there.
Why is Cloud Run sending SIGTERM and how do I avoid it?

Cloud Run is a serverless platform, which means server management is done by Google Cloud, and it can choose to stop an instance from time to time (for maintenance reasons, technical issues, and so on).
But this changes nothing for you. There is, of course, a cold start, but it should be invisible to your workload, even under high load, because you have the min-instances parameter set to 2, which keeps instances up and ready to serve traffic without a cold start.
Can you have 3 or 4 instances in parallel instead of 2 (the minimum)? Yes, but the billable instance count stays flat at 2. Cloud Run, again, is serverless: it can create backup instances to make sure the future shutdown of some instances won't impact your traffic. It's an internal optimization, at no additional cost; it just works.
Can you avoid this? No, because it's serverless, and also because there is no impact on your workloads.
A last point about "environment": for Google Cloud, all projects are production projects. There is no difference; Google can't know which project is critical and which is not, so all of them are treated as critical.
If you notice a difference between the two projects, it's simply because they are deployed on different Google Cloud internal clusters. The status, performance, and maintenance operations differ between clusters, and again, there is nothing you can do about that.
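While the SIGTERM itself can't be avoided, the application can at least shut down cleanly when it arrives. A minimal sketch, assuming the FastAPI app from the question (the cleanup shown is illustrative):

from fastapi import FastAPI

app = FastAPI()

@app.on_event("shutdown")
async def flush_on_shutdown():
    # Gunicorn/Uvicorn turn Cloud Run's SIGTERM into a graceful shutdown,
    # so this hook runs before the instance goes away. Illustrative cleanup:
    # flush pending BigQuery rows, close clients, emit a final log line, etc.
    print("Shutdown signal received, cleaning up")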

Related

Google Cloud Composer - Create Environment - with a few compute engine instances - That is expensive

I am new to Google Cloud Composer and am following the quickstart instructions: create the environment, load DAGs, check Airflow, and delete the environment.
But in a real-life production use case, after we finish loading DAG files and running them in the environment, should we delete the Cloud Composer environment? There might be several Compute Engine instances in that environment doing nothing now, which is expensive.
But if I delete the environment, I would lose access to its Airflow web UI, and I could not check the processing logs from the deleted environment.
So what should I do? In a real-life production case, should I delete the environment after the processing is done, or not?
Apache Airflow (and therefore Cloud Composer) is for orchestrating workflows, not for ETL batch jobs that only require transient compute resources. Just as you wouldn't turn a server off simply because a scheduled cron task isn't running, Composer environments are meant to be long-running compute resources that are always online, so that you can schedule repeating workflows whenever necessary (whether per second, daily, etc.).
In a real production case, a Composer environment should always be left running, or no DAGs will be scheduled when it is down. If you have a development environment and wish to save money, then you can resize the Composer environment's attached GKE cluster to 0 nodes so you won't be billed for them. Similarly, if you don't think you're running enough DAGs to justify the cost, consider smaller worker machine sizes.
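For the cost-saving idea above, the resize is a single operation against the environment's GKE node pool. A hedged sketch that drives the gcloud CLI from Python; the cluster name, zone, and node pool are placeholders you would look up in the Composer environment's details:

import subprocess

# Illustrative placeholders; find the real values in the environment's GKE cluster details.
CLUSTER = "us-central1-my-composer-env-gke"
ZONE = "us-central1-a"
NODE_POOL = "default-pool"

def resize_cluster(num_nodes: int) -> None:
    # Scales the Composer environment's GKE node pool; 0 stops node billing,
    # and scaling back up restores the Airflow workers.
    subprocess.run(
        [
            "gcloud", "container", "clusters", "resize", CLUSTER,
            "--node-pool", NODE_POOL,
            "--num-nodes", str(num_nodes),
            "--zone", ZONE,
            "--quiet",
        ],
        check=True,
    )

if __name__ == "__main__":
    resize_cluster(0)   # pause the dev environment
    # resize_cluster(3) # resume it later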

rsync mechanism in wso2 all in one active-active

I am deploying an active-active all-in-one setup on 2 separate servers with wso2-am 2.6.0 and wso2 analytics 2.6.0. I am configuring my servers following this link. About the rsync mechanism in parts 4 and 5, I have some questions:
1. How can I figure out whether my server is working with rsync or sync?
2. What will happen in the future if I don't use rsync now and also don't apply the configuration in parts 4 and 5?
1. How can I figure out whether my server is working with rsync or sync?
It is not really clear what you are asking for. rsync is just a command to synchronize files between folders.
What rsync is used for: when deploying an API, the gateway creates or updates a few Synapse sequences or APIs in the filesystem (repository/deployment/server), and these file updates need to be synchronized to all gateway nodes.
I personally don't advise using rsync. The whole issue is that you need to invoke the rsync command regularly to synchronize the files created by the master node (see the sketch below). That creates a certain delay in service availability and, most importantly, if something goes wrong and you want to use another node as the master, you need to switch the rsync direction, which is not a fully automated process.
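For illustration, the recurring invocation would look roughly like this, wrapped in Python so it can be driven by cron; the installation path and worker host are assumptions:

import subprocess

# Assumed paths/hosts; adjust to your wso2am-2.6.0 installation and worker node.
SRC = "/opt/wso2am-2.6.0/repository/deployment/server/"
DEST = "wso2@gateway-worker-2:/opt/wso2am-2.6.0/repository/deployment/server/"

def sync_artifacts() -> None:
    # Push newly deployed Synapse sequences/API artifacts from the master
    # gateway to a worker gateway. Run this from cron every minute or so.
    subprocess.run(
        ["rsync", "-avz", "--delete", SRC, DEST],
        check=True,
    )

if __name__ == "__main__":
    sync_artifacts()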
We usually keep it simple by using a shared filesystem (NFS, Gluster, ...), and then we have a fully active-active setup. (OK, setting up HA NFS or GlusterFS is not particularly simple, but that's usually the infra team's job.)
2. What will happen in the future if I don't use rsync now and also don't apply the configuration in parts 4 and 5?
If the filesystems between gateways are not synced or shared, you can deploy an API from the Publisher to a single gateway node, but the other gateway nodes won't create the Synapse sequences and API artifacts. As a result, the other nodes won't pass client requests to the backend.

Instance is overutilized. Consider switching to the machine type: g1-small

I created a new f1-micro instance with Ubuntu 16.04. I haven't logged in yet, as I have not figured out how to create the SSH key pair. But after two days, the dashboard now shows:
Instance "xxx" is overutilized. Consider switching to the machine type: g1-small
Why is this happening? Isn't an f1-micro similar to an EC2 t1.nano? I have a t1.nano running a Node.js web site (with nginx, pm2, etc.), and my CPU credit has been consistently at the maximum of 150 during this period, with only me as a test user.
I started the f1-micro to run the same Node application to see which is more cost-effective. The parameter that was unclear to me was the unexplained "0.2 virtual CPU". Is 0.2 of a CPU virtually unusable? Would 0.5 (g1-small) be significantly better?
To address your connection problems, at least temporarily until you figure out manual key management, you might want to try SSH from the browser, which is available from the Cloud Platform console, or use the gcloud CLI to assist you.
https://cloud.google.com/compute/docs/instances/connecting-to-instance
Once you get access to a terminal, I would run 'top' or 'ps'.
Example of using ps to find the top CPU users:
ps wwaxr -o pid,stat,%cpu,time,command | head -10
Example of running top to find the top memory users:
top -b -n 1 -o %MEM | head -20
Google Cloud also offers a monitoring product called Stackdriver, which would give you this information in the Cloud console, but it requires an agent to be running on your VM. See the getting-started guide if this sounds like a good option for you.
https://cloud.google.com/monitoring/quickstart-lamp
Once you have access to the resource usage data, you should be able to determine whether 1) the VM isn't powerful enough to run your Node.js server, or 2) something else started on the host unexpectedly and that is the source of your usage.

Managed VMs running Perl on Google App Engine

I have a Perl job that runs for 5 minutes at the top of every hour. What is the most cost-effective way of running this job on Google Cloud infrastructure? Running a Compute Engine VM seems too heavyweight for this, since I'd get charged for the other 55 minutes of no use. I don't understand "Managed VMs" well enough, but it seems like this might be an option; I'm just not sure whether pricing is rounded to the hour. Does anyone have any ideas about the best option so that I only get charged for 120 minutes of usage (24 runs × 5 minutes)? The script also uses some image-processing binaries, so converting to Python won't do the trick.
Managed VMs are linked to Google App Engine. If you have an app in GAE, managed VMs are used to configure the hosting environment for your app using VMs that run on Google Compute Engine, and these applications are subject to the Java and Python runtimes. This link can give you an idea of pricing on GAE; however, Perl is not a supported language in GAE.
On GCE, you can start up an instance, do the task, and then delete the instance without deleting the persistent disk. This allows you to recreate the instance using this disk; however, you will still be charged for the provisioned disk space, and you will need to create a script that spins up the instance and deletes it. You can also create a snapshot of your disk and recreate your instance based on the snapshot, which will be a little less expensive than keeping the disk.
Also, you should look at the types of persistent disks (PD) on GCE at this link; take a look at the examples provided, since depending on your workload, a regular PD versus an SSD PD can make a big difference in price.
You can use the pricing calculator to estimate your charges.
When you deploy to App Engine using a managed VM, a Compute Engine instance (managed by Google) is created for you. All requests to App Engine will be forwarded to the created Compute Engine instance.
To run your script in App Engine as a managed VM, you will have to Dockerize your project, as the managed VM runs a Docker container.
I don't see a reason to use an App Engine managed VM just for running a script, as the cost will be the same as using a Compute Engine instance.
Probably the most cost-effective way is to create a script that:
Launches a Compute Engine instance
Installs Perl
Copies your script to the instance
Runs your script on the created instance
To schedule the execution, you can set up a cron job at home or in the office that executes the above script; a sketch follows below.
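A minimal sketch of such a driver script, calling the gcloud CLI from Python; the instance name, zone, machine type, and script file name are illustrative assumptions:

import subprocess

# Illustrative values; adjust to your project, zone, and job.
INSTANCE = "perl-batch"
ZONE = "us-central1-a"
MACHINE_TYPE = "f1-micro"
SCRIPT = "job.pl"

def run(cmd):
    subprocess.run(cmd, check=True)

def main():
    # 1. Launch a throwaway Compute Engine instance.
    run(["gcloud", "compute", "instances", "create", INSTANCE,
         "--zone", ZONE, "--machine-type", MACHINE_TYPE])
    try:
        # 2. Install Perl (and any image-processing binaries you need);
        #    SSH may need a short wait/retry right after instance creation.
        run(["gcloud", "compute", "ssh", INSTANCE, "--zone", ZONE,
             "--command", "sudo apt-get update && sudo apt-get install -y perl"])
        # 3. Copy the script to the instance and 4. run it there.
        run(["gcloud", "compute", "scp", SCRIPT, f"{INSTANCE}:~/", "--zone", ZONE])
        run(["gcloud", "compute", "ssh", INSTANCE, "--zone", ZONE,
             "--command", f"perl ~/{SCRIPT}"])
    finally:
        # 5. Delete the instance so you only pay for the minutes it actually ran.
        run(["gcloud", "compute", "instances", "delete", INSTANCE,
             "--zone", ZONE, "--quiet"])

if __name__ == "__main__":
    main()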

Integrate different Nagios webservers

I have different sites running 4 to 5 servers at each location. Each location has one monitoring server with Nagios. Now I want to create a central location and combine all the Nagios services running at each location. Can anyone please point me to some documentation for this type of job?
There are two approaches that you can take.
Install a new Nagios core, as you did at each location, and perform active checks on each of the remote hosts. You'll likely end up installing NRPE on each of the remote hosts at each location; you can read this document for the details: http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf. If your remote servers are Windows servers, you can use NSClient to do much of what NRPE does for Linux hosts. This effectively centralizes your monitoring server. I also wrote some how-to style entries for using NRPE to run privileged commands http://blog.gnucom.cc/?p=479 or to run event handlers http://blog.gnucom.cc/?p=458. If you get tired of installing NRPE, you can use my script here http://blog.gnucom.cc/?p=185. I also have instructions to install NSClient here http://blog.gnucom.cc/?p=201.
Install a new Nagios core, as you did at each location, and perform passive checks by instructing the remote Nagios cores to feed their results into the new central Nagios core's passive command file. I haven't done this myself, so I'm going to point you to the community's documentation here http://nagios.sourceforge.net/docs/2_0/passivechecks.html. You could probably look at my event handler post to set up event handlers that send checks to the main server.
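To make the second approach concrete, submitting a passive result boils down to writing a line in Nagios's external command format to the central core's command file. A small sketch, assuming the default command file path and made-up host/service names:

import time

# Assumed path to the central Nagios core's external command file (a named pipe).
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"

def submit_passive_result(host: str, service: str, status: int, output: str) -> None:
    # status: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
    line = "[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{status};{output}\n".format(
        ts=int(time.time()), host=host, service=service, status=status, output=output
    )
    with open(COMMAND_FILE, "w") as cmd_file:
        cmd_file.write(line)

if __name__ == "__main__":
    submit_passive_result("web01.site-a", "Disk Usage", 0, "DISK OK - 40% used")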
From my personal experience, the first option I mentioned is easier to implement and far easier to administer. However, as your server fleet grows, you'll start seeing major CPU bottlenecks on the main Nagios core. This is where passive checks become beneficial, as the main Nagios core simply waits for checks to be sent to it rather than having to run them itself.
Hope this helps. :)
A centralized view tool may be what you are looking for. There are a number of different options available.
Nagiosfusion
MK Livestatus
Nagcen
Thruk