Google Compute Engine auto scaling based on queue length - google-compute-engine

We host our infrastructure on Google Compute Engine and are looking into Autoscaling for groups of instances. We do a lot of batch processing of binary data from a queue. In our case, this means:
When a worker is processing data the CPU is always 100%
When the queue is empty we want to terminate all workers
Depending on the length of the queue we want a certain amount of workers
However I'm finding it hard to figure out a way to auto-scale this on Google Compute Engine because they appear to scale on instance-only metrics such as CPU. From the documentation:
Not all custom metrics can be used by the autoscaler. To choose a
valid custom metric, the metric must have all of the following
properties:
The metric must be a per-instance metric.
The metric must be a valid utilization metric, which means that data from the metric can be used to proportionally scale up or down
the number of virtual machines.
If I'm reading the documentation properly this makes it hard to use the auto scaling on a global queue length?
Backup solutions
Write a simple auto-scale handler using the Google Cloud API to create or destroy new workers using Instances API
Write a simple auto-scale handler using instance groups and then manually insert/remove instances using the InstanceGroups: insert
Write a simple auto-scaling handler using InstangeGroupManagers: resize
Create a custom per-instance metric which measures len(queue)/len(workers) on all workers

As of February 2018 (Beta) this is possible via "Per-group metrics" in stackdriver.
Per-group metrics allow autoscaling with a standard or custom metric
that does not export per-instance utilization data. Instead, the group
scales based on a value that applies to the whole group and
corresponds to how much work is available for the group or how busy
the group is. The group scales based on the fluctuation of that group
metric value and the configuration that you define.
More information at https://cloud.google.com/compute/docs/autoscaler/scaling-stackdriver-monitoring-metrics#per_group_metrics
The how-to is too long to post here.

As far as I understand this is not implemented yet (as at January 2016). At the moment autoscaling is only targeted at web serving scenarios, where you want to serve web pages/other web services from your machines and keep some reasonable headroom (e.g. in terms of CPU or other metrics) for spikes in traffic. Then the system will adjust the number of instances/VMs to match your target.
You are looking for autoscaling for batch processing scenarios, and this is not catered for at the moment.

Related

Compute Engine - Automatic scale

I have one VM Compute Engine to host simple apps. My apps is growing and the number of users too.
Now my users work basicaly from 08:00 AM to 07:00 PM, in this period the usage os CPU and Memory is High and the speed of work is very important.
I'm preparing to expand the memory and processor in the next days, but i search a more scalable and cost efective way.
Is there a way for automatic add resources when i need and reduce after no more need?
Thanks
The cost of running your VMs is directly related to a number of different factors i.e. the type of network in use (premium vs standard), the machine type, the boot disk image you use (premium vs open-source images) and the region/zone where your workloads are running, among other things.
Your use case seems to fit managed instance groups (MIGs). With MIGs you essentially configure a template for VMs that share the same attributes. During the configuration of your MIG, you will be able to specify the CPU/memory limit beyond which the MIG autoscaler will kick off. When your CPU/memory reading goes below that threshold, MIG scales your VMs down to the number of instances specified in your template.
You can also use requests per second as a threshold for autoscaling and I would recommend you explore the docs to know more about it.
See docs

How to scale azure functions to high throughput short lived Event Grid events

When publishing a large amount of events to a topic (where the retry and time to live is in the minutes) many fail to get delivered to subscribed functions. Does anyone know of any settings, or approaches to ensure scaling react quickly without dropping them all?
I am creating a Azure Function app that essentially passes events to an event grid topic at high rate, and other functions subscribed to a topic will handle the events. These events are meant to be short lived and not persist longer than a specified set of minutes. Ideally I want to see the app scale to handle the load without dropping events. the overall goal is that each event will trigger an outbound api endpoint call to my own api to test performance/load.
I have reviewed documentation on MSDN, and other locations but not much fits my scenario (most talk in terms of incoming events and not outbound http events).
For scaling I have looked into host.json settings for http (as there is none for grid events and grid events look to be similar to http triggers) and setting those seemed to have made some improvements
The end result I expect is that for every publish to a topic endpoint it gets delivered to a function and executed with a low fail delivery/drop rate.
What I am seeing is that when publishing many events to a topic (and at a consistent rate), the majority of events get dead-lettered/dropped
Consumption plan is limited by the computing power that is assigned to your function. In essence that there are some limits up to which it can scale, and then it becomes the bottle neck.
I suggest to have a look at the limitations.
And here you can some insights about computing power differences.
If you want to enable automatic scaling, or scaling in the number of vm instances I suggest using an app service plan. The cheapest option where scaling is supported is Standard pricing tier.

looking for simple cluster configuration

I am using compute engine for embarrassingly parallel scientific calculations. Some of my calculations require a single core and some require 64-cores machines. I am currently using my own scripts: I have a qsub-like command that creates a new instance with the required number of cores, booting it from a custom image with the pre-installed software, connects to a storage bucket via gcsfuse, runs the required command and then kills the instance after it's done.
Do I really need to do all of that with my own scripts, or is there any tool that I should use instead? I'd much rather use some ready made tool for all of the management.
My usage fluctuates widely (hundreds of cores in parallel for 3 hours, then 2 days with nothing, etc). So I don't want constant sized machines: I like to be billed by the minute for my computations.
You may want to use auto-scaling feature for managed instance group in Google Compute Engine(GCE). This feature adds more instances to your instance group when there is more load (upscaling), and removes instances when there is less load (downscaling). Moreover, you can define autoscaling policy based upon CPU utilization, or Load balancer utilization or request per seconds. Please refer autoscaler decisions document to understand decisions that autoscaler might make when scaling instance groups.

Autoscaling GCE Instance groups based on Cloud pub/sub queue

Can GCE Instance groups be scaled up/down bases on Google Cloud PubSub queue counts or other asynchronous task queues such as PSQ?
Yes!
The feature is now in alpha: https://cloud.google.com/compute/docs/autoscaler/scaling-queue-based
I haven't tried this myself but looking at the documentation, it looks possible to set up autoscaling against Pub/Sub message queue counts.
This page [0] explains how to setup autoscaler to scale based on a standard metric provided by the Cloud Monitoring service.
This page [1] explains what metrics you can use for autoscaler. These two looks useful:
pubsub.googleapis.com/subscription/num_outstanding_messages
pubsub.googleapis.com/subscription/num_undelivered_messages
[0] https://cloud.google.com/compute/docs/autoscaler/scaling-cloud-monitoring-metrics
[1] https://cloud.google.com/monitoring/api/metrics
You can't use pubsub metrics (pubsub.googleapis.com/subscription/num_outstanding_messages or pubsub.googleapis.com/subscription/num_undelivered_messages) for that purpose.
According to the docs:
A valid utilization metric for scaling meets the following criteria:
The standard metric has a label for resource_id, and value of the label for each stream is ID of an instance.
The standard metric describes how busy an instance is, and the metric value increases or decreases proportionally to the number virtual machine instances in the group.
pubsub metrics don't meet that criteria.
However, there are two ways you can use pubsub based autoscaling:
Write your own custom metric - you can use the gcloud monitoring api to get your pubsub timeseries data. Than use it to calculate your own custom monitoring metric - for example - last time series value divided by your average/desired latency.
You can use this method with every async queue solution that you are using.
Still in alpha, there is a gcloud api for subscriber based autoscale: https://cloud.google.com/compute/docs/autoscaler/scaling-queue-based. This solution applies for google cloud pubsub only, and you can't use it with other async queue solutions.

Can Google compute engine addInstance fail due to no resources available?

If I am within my quota am I guaranteed to be able to have my instance created (assumimg all other inputs are valid) - trying to determine if my quota equates to reserved capacity that I can count on to be available if needed.
Google Compute Engine is engineered for scale, and a fundamental design goal is enabling all users to scale their workloads up (and down) on demand.
However, quotas are not private reservations and instances.insert() can fail in rare cases.