Resource Utilisation - google-compute-engine

I have created a Compute Engine instance in the Google Cloud environment. The instance hosts a service. Using a script, I load tested the service by sending HTTP requests at different load levels. Now I want to visualize the resource utilization in R. Is there a simple API I can use to fetch CPU utilization between times X and Y, where X and Y are the start and end of the load test?
Thanks

Use the Google Cloud Monitoring API.
The compute.googleapis.com/instance/cpu/utilization metric returns the fraction of the allocated CPU that is currently in use on the instance (a value between 0 and 1).
The compute.googleapis.com/instance/cpu/usage_time metric is a delta metric, so you can query the CPU time consumed over a specified interval. You can try these APIs in the Google APIs Explorer.
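For example, here is a minimal sketch (in Python rather than R; the google-cloud-monitoring client library is assumed, and the project ID, instance name, and timestamps below are placeholders) that pulls the utilization time series for a test window and writes it to a CSV you can read into R for plotting:

    import csv
    from google.cloud import monitoring_v3

    PROJECT_ID = "my-project"              # placeholder
    INSTANCE_NAME = "my-instance"          # placeholder
    START, END = 1457000000, 1457003600    # load-test start/end as Unix timestamps

    client = monitoring_v3.MetricServiceClient()
    interval = monitoring_v3.TimeInterval(
        {"start_time": {"seconds": START}, "end_time": {"seconds": END}}
    )
    results = client.list_time_series(
        request={
            "name": f"projects/{PROJECT_ID}",
            "filter": (
                'metric.type = "compute.googleapis.com/instance/cpu/utilization" '
                f'AND metric.labels.instance_name = "{INSTANCE_NAME}"'
            ),
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )
    # One row per sample, so the file can be loaded in R with read.csv().
    with open("cpu_utilization.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["end_time", "utilization"])
        for series in results:
            for point in series.points:
                writer.writerow([point.interval.end_time, point.value.double_value])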

Related

looking for simple cluster configuration

I am using Compute Engine for embarrassingly parallel scientific calculations. Some of my calculations require a single core and some require 64-core machines. I am currently using my own scripts: I have a qsub-like command that creates a new instance with the required number of cores, boots it from a custom image with the pre-installed software, connects it to a storage bucket via gcsfuse, runs the required command, and then kills the instance after it's done.
Do I really need to do all of that with my own scripts, or is there any tool that I should use instead? I'd much rather use some ready made tool for all of the management.
My usage fluctuates widely (hundreds of cores in parallel for 3 hours, then 2 days with nothing, etc.), so I don't want a constant-sized set of machines: I like to be billed by the minute for my computations.
You may want to use the autoscaling feature for managed instance groups in Google Compute Engine (GCE). This feature adds more instances to your instance group when there is more load (scaling up) and removes instances when there is less load (scaling down). Moreover, you can define the autoscaling policy based on CPU utilization, load balancer utilization, or requests per second. Please refer to the autoscaler decisions document to understand the decisions the autoscaler might make when scaling instance groups.
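As a rough illustration (a sketch only, assuming the google-api-python-client library and a managed instance group that already exists; the project, zone, and group names are placeholders), an autoscaler with a CPU-utilization policy can be created like this:

    from googleapiclient import discovery

    PROJECT, ZONE = "my-project", "us-central1-b"   # placeholders

    compute = discovery.build("compute", "v1")
    autoscaler_body = {
        "name": "worker-autoscaler",
        # URL of the managed instance group the autoscaler should control.
        "target": (
            f"https://www.googleapis.com/compute/v1/projects/{PROJECT}"
            f"/zones/{ZONE}/instanceGroupManagers/worker-group"
        ),
        "autoscalingPolicy": {
            "minNumReplicas": 1,
            "maxNumReplicas": 100,
            # Add or remove instances to keep average CPU utilization near 80%.
            "cpuUtilization": {"utilizationTarget": 0.8},
            "coolDownPeriodSec": 90,
        },
    }
    compute.autoscalers().insert(project=PROJECT, zone=ZONE, body=autoscaler_body).execute()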

Request Rate Limit on Google Compute Engine

I am running a Tornado web server on Google Compute Engine. The web server returns a very simple JSON response. When I test the throughput capacity of this server, it seems to be throttled at 20 req/s; I cannot achieve a higher throughput than 20 req/s.
I know that there is a Google Compute Engine API rate limit of 20 req/s. Is there some sort of network/instance rate limit that prevents my server from fulfilling more than 20 req/s? How do I increase this limit?
The rate limit of 20 requests per second is not on the server; it is on the GCE API - for example, when you make calls from gcloud to create instances (it calls the GCE API under the covers).
As documented here, the network bandwidth of a GCE VM is limited mainly by the software you run on it, and to some extent by the size of the VM (VMs get up to 2 Gbps per core, up to 8 cores, for a maximum rate of 16 Gbps). Nothing in the VM subsystem knows anything about requests or responses; it's all just IP traffic to us.

Autoscaling GCE Instance groups based on Cloud pub/sub queue

Can GCE instance groups be scaled up/down based on Google Cloud Pub/Sub queue counts or other asynchronous task queues such as PSQ?
Yes!
The feature is now in alpha: https://cloud.google.com/compute/docs/autoscaler/scaling-queue-based
I haven't tried this myself, but looking at the documentation, it looks possible to set up autoscaling against Pub/Sub message queue counts.
This page [0] explains how to set up the autoscaler to scale based on a standard metric provided by the Cloud Monitoring service.
This page [1] explains which metrics you can use for the autoscaler. These two look useful:
pubsub.googleapis.com/subscription/num_outstanding_messages
pubsub.googleapis.com/subscription/num_undelivered_messages
[0] https://cloud.google.com/compute/docs/autoscaler/scaling-cloud-monitoring-metrics
[1] https://cloud.google.com/monitoring/api/metrics
You can't use pubsub metrics (pubsub.googleapis.com/subscription/num_outstanding_messages or pubsub.googleapis.com/subscription/num_undelivered_messages) for that purpose.
According to the docs:
A valid utilization metric for scaling meets the following criteria:
The standard metric has a label for resource_id, and the value of the label for each stream is the ID of an instance.
The standard metric describes how busy an instance is, and the metric value increases or decreases proportionally to the number of virtual machine instances in the group.
Pub/Sub metrics don't meet those criteria.
However, there are two ways you can use Pub/Sub-based autoscaling:
Write your own custom metric - you can use the Cloud Monitoring API to get your Pub/Sub time series data, then use it to calculate your own custom monitoring metric - for example, the last time series value divided by your average/desired latency (a rough sketch follows at the end of this answer).
You can use this method with any async queue solution you are using.
Still in alpha, there is an API for subscriber-based autoscaling: https://cloud.google.com/compute/docs/autoscaler/scaling-queue-based. This solution applies to Google Cloud Pub/Sub only; you can't use it with other async queue solutions.
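To make the custom-metric option concrete, here is a minimal sketch (the google-cloud-monitoring client library is assumed; the project, subscription, instance, and metric names are placeholders, and the backlog-per-worker target is arbitrary). Each worker periodically reads the subscription backlog from the Monitoring API and republishes it as a per-instance gauge that the autoscaler can target:

    import time
    from google.cloud import monitoring_v3

    PROJECT_ID = "my-project"          # placeholders
    SUBSCRIPTION_ID = "work-queue"
    INSTANCE_ID = "1234567890123"      # numeric GCE instance ID (from the metadata server)
    ZONE = "us-central1-b"
    TASKS_PER_INSTANCE = 100           # backlog one worker is expected to handle

    client = monitoring_v3.MetricServiceClient()
    project_name = f"projects/{PROJECT_ID}"

    def read_backlog():
        # Latest num_undelivered_messages value for the subscription (newest point first).
        now = int(time.time())
        interval = monitoring_v3.TimeInterval(
            {"start_time": {"seconds": now - 300}, "end_time": {"seconds": now}}
        )
        flt = (
            'metric.type = "pubsub.googleapis.com/subscription/num_undelivered_messages" '
            f'AND resource.labels.subscription_id = "{SUBSCRIPTION_ID}"'
        )
        series = list(client.list_time_series(request={
            "name": project_name, "filter": flt, "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }))
        return series[0].points[0].value.int64_value if series and series[0].points else 0

    def write_queue_load(backlog):
        # Per-instance gauge the autoscaler can use as a utilization metric.
        ts = monitoring_v3.TimeSeries()
        ts.metric.type = "custom.googleapis.com/worker/queue_load"  # hypothetical metric name
        ts.resource.type = "gce_instance"
        ts.resource.labels["instance_id"] = INSTANCE_ID
        ts.resource.labels["zone"] = ZONE
        ts.resource.labels["project_id"] = PROJECT_ID
        ts.points = [monitoring_v3.Point({
            "interval": {"end_time": {"seconds": int(time.time())}},
            "value": {"double_value": backlog / TASKS_PER_INSTANCE},
        })]
        client.create_time_series(name=project_name, time_series=[ts])

    write_queue_load(read_backlog())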

Google Compute Engine auto scaling based on queue length

We host our infrastructure on Google Compute Engine and are looking into Autoscaling for groups of instances. We do a lot of batch processing of binary data from a queue. In our case, this means:
When a worker is processing data the CPU is always 100%
When the queue is empty we want to terminate all workers
Depending on the length of the queue we want a certain amount of workers
However, I'm finding it hard to figure out a way to autoscale this on Google Compute Engine, because the autoscaler appears to scale only on per-instance metrics such as CPU. From the documentation:
Not all custom metrics can be used by the autoscaler. To choose a valid custom metric, the metric must have all of the following properties:
The metric must be a per-instance metric.
The metric must be a valid utilization metric, which means that data from the metric can be used to proportionally scale up or down the number of virtual machines.
If I'm reading the documentation correctly, this makes it hard to use autoscaling with a global queue length?
Backup solutions
Write a simple auto-scale handler using the Google Cloud API to create or destroy workers using the Instances API
Write a simple auto-scale handler using instance groups, then manually insert/remove instances using InstanceGroups: insert
Write a simple auto-scaling handler using InstanceGroupManagers: resize (a minimal sketch of this approach follows this list)
Create a custom per-instance metric which measures len(queue)/len(workers) on all workers
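For the third backup solution, the handler can be as small as a periodic job that reads the queue length and calls InstanceGroupManagers.resize. A rough sketch (assuming google-api-python-client; the project, zone, group name, and sizing constants are placeholders, and the queue-length lookup is left to whichever queue you use):

    import math
    from googleapiclient import discovery

    PROJECT, ZONE, GROUP = "my-project", "us-central1-b", "worker-group"  # placeholders
    TASKS_PER_WORKER = 10
    MAX_WORKERS = 100

    compute = discovery.build("compute", "v1")

    def rescale(queue_length):
        # One worker per TASKS_PER_WORKER queued items, capped at MAX_WORKERS;
        # an empty queue resizes the group to zero instances.
        target = min(MAX_WORKERS, math.ceil(queue_length / TASKS_PER_WORKER))
        compute.instanceGroupManagers().resize(
            project=PROJECT, zone=ZONE, instanceGroupManager=GROUP, size=target,
        ).execute()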
As of February 2018 (Beta) this is possible via "Per-group metrics" in Stackdriver.
Per-group metrics allow autoscaling with a standard or custom metric that does not export per-instance utilization data. Instead, the group scales based on a value that applies to the whole group and corresponds to how much work is available for the group or how busy the group is. The group scales based on the fluctuation of that group metric value and the configuration that you define.
More information at https://cloud.google.com/compute/docs/autoscaler/scaling-stackdriver-monitoring-metrics#per_group_metrics
The how-to is too long to post here.
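For reference, a minimal sketch of the per-group variant (assuming google-api-python-client; the project, zone, group, and subscription names are placeholders): the policy points at a single group-wide time series via a filter, and singleInstanceAssignment tells the autoscaler how much of that value one VM is expected to handle.

    from googleapiclient import discovery

    PROJECT, ZONE = "my-project", "us-central1-b"   # placeholders

    compute = discovery.build("compute", "v1")
    autoscaler_body = {
        "name": "queue-autoscaler",
        "target": (
            f"https://www.googleapis.com/compute/v1/projects/{PROJECT}"
            f"/zones/{ZONE}/instanceGroupManagers/worker-group"
        ),
        "autoscalingPolicy": {
            "minNumReplicas": 1,
            "maxNumReplicas": 100,
            "customMetricUtilizations": [{
                # Per-group metric: the filter selects one time series for the whole
                # group rather than one series per instance.
                "metric": "pubsub.googleapis.com/subscription/num_undelivered_messages",
                "filter": 'resource.type = "pubsub_subscription" AND '
                          'resource.labels.subscription_id = "work-queue"',
                "singleInstanceAssignment": 10,
            }],
        },
    }
    compute.autoscalers().insert(project=PROJECT, zone=ZONE, body=autoscaler_body).execute()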
As far as I understand, this is not implemented yet (as of January 2016). At the moment, autoscaling is targeted only at web-serving scenarios, where you want to serve web pages or other web services from your machines and keep some reasonable headroom (e.g. in terms of CPU or other metrics) for spikes in traffic. The system will then adjust the number of instances/VMs to match your target.
You are looking for autoscaling for batch processing scenarios, and this is not catered for at the moment.

Best way to track CPU load % on a SGE cluster in Google Compute Engine?

I'd like to analyze the CPU load history to measure the efficiency of our algorithms. What is the best way to track CPU load % on an SGE cluster in GCE? The Google Cloud console tracks it, but does not make it downloadable as far as I can tell. SGE tracks load internally, but it seems rather basic. My guess is that Ganglia will be the route to go.
Consider using the Google Cloud Monitoring API, which was recently released, to track CPU load on your instances.
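A minimal sketch of what that could look like (assuming the google-cloud-monitoring Python client; the project ID and time window are placeholders): the per-instance utilization series can be aggregated into one cluster-wide average, which is convenient for judging overall efficiency, and the result can be dumped for offline analysis.

    from google.cloud import monitoring_v3

    client = monitoring_v3.MetricServiceClient()
    interval = monitoring_v3.TimeInterval(
        {"start_time": {"seconds": 1457000000}, "end_time": {"seconds": 1457086400}}
    )
    # Align each instance's samples to 5-minute means, then average across instances.
    aggregation = monitoring_v3.Aggregation({
        "alignment_period": {"seconds": 300},
        "per_series_aligner": monitoring_v3.Aggregation.Aligner.ALIGN_MEAN,
        "cross_series_reducer": monitoring_v3.Aggregation.Reducer.REDUCE_MEAN,
    })
    results = client.list_time_series(request={
        "name": "projects/my-project",   # placeholder project ID
        "filter": 'metric.type = "compute.googleapis.com/instance/cpu/utilization"',
        "interval": interval,
        "aggregation": aggregation,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    })
    for series in results:
        for point in series.points:
            print(point.interval.end_time, point.value.double_value)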