GCE autoscaling by GKE resource reservation

According to the Kubernetes documentation:
If you are using GCE, you can configure your cluster so that the number of nodes will be automatically scaled based on:
CPU and memory utilization.
Amount of CPU and memory requested by the pods (also called reservation).
Is this actually true?
I am running mainly Jobs on my cluster, and would like to spin up new instances to service them on demand. CPU usage doesn't work well as a scaling metric for this workload.
From Google's GKE documentation, however, this only appears to be possible by using Cloud Monitoring metrics -- relying on a separate service that you then have to customize. This seems like a perplexing gap in basic functionality that Kubernetes itself claims to support.
Is there any simpler way to achieve the very simple goal of having the GCE instance group autoscale based on the CPU requirements that I'm quite explicitly specifying in my GKE Jobs?
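For illustration, the "explicitly specified" CPU requirements live in the Job spec's resource requests. A minimal sketch (the name, image, and values are placeholders, not taken from the question):

    cat <<'EOF' | kubectl create -f -
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: example-job
    spec:
      template:
        spec:
          containers:
          - name: worker
            image: gcr.io/example/worker:latest
            resources:
              requests:
                cpu: "500m"
                memory: "512Mi"
          restartPolicy: Never
    EOF

It is exactly these requests, not observed CPU usage, that the quoted documentation claims can drive node scaling.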

The disclaimer at the bottom of that section explains why it won't work by default in GKE:
Note that autoscaling will work properly only if node metrics are accessible in Google Cloud Monitoring. To make the metrics accessible, you need to create your cluster with KUBE_ENABLE_CLUSTER_MONITORING equal to google or googleinfluxdb (googleinfluxdb is the default value). Please also make sure that you have Google Cloud Monitoring API enabled in Google Developer Console.
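With the open-source kube-up.sh cluster scripts, that meant setting the variable in the environment before bringing the cluster up, roughly like this (a sketch for self-managed GCE clusters; GKE does not expose this knob directly):

    export KUBE_ENABLE_CLUSTER_MONITORING=google
    cluster/kube-up.sh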
You might be able to get it working by standing up a heapster instance in your cluster configured with --sink=gcm (like this), but I think it was more of an older proof of concept than a well-maintained, production-grade configuration.
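In broad strokes, that meant running heapster pointed at the cluster with the Google Cloud Monitoring sink enabled. The flag syntax drifted between heapster releases, so treat this as a sketch rather than a known-good invocation:

    heapster --source=kubernetes --sink=gcm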
The community is working hard on a better, more-fully-supported version of node autoscaling in the upcoming 1.3 release.
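(For readers arriving later: that work shipped as the Cluster Autoscaler, and on GKE it can be enabled at cluster creation with flags along these lines, where the cluster name and bounds are placeholders:)

    gcloud container clusters create my-cluster \
        --enable-autoscaling --min-nodes=1 --max-nodes=10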

Related

Google Compute API Anonymous Requests

Just noticed I have thousands of anonymous requests hitting all of the Compute Engine API list endpoints. I have no instances running and I'm only using Firebase and Cloud Build, Source, and Registry. Please see the attached screenshot of the API metrics report.
Any reason for this?
On the backend, certain API calls are needed to make sure that your project is healthy; these "Anonymous" requests represent an account used by the backend service to make health checks.
Anonymous API calls (these could be just Compute Engine "list" calls) don't imply that you have enabled something on your side. A lot of different sections in the Console make calls to the Compute Engine API, and there's no easy way to figure out which section made the calls, but they are expected.
These kinds of "Anonymous" Compute Engine API calls are part of the internal monitoring tools needed to make sure that your project is healthy, and they are triggered at random. The metrics may disappear and come back throughout the project's life.

GKE network-bound Kubernetes nodes?

We have a crawling engine that we are trialling on Google Kubernetes Engine.
It appears that we are severely network bound when it comes to making requests outside the Google network.
We have been in touch with an architect at Google, who thought that perhaps there was some rate limiting being applied inside the Google data centre. He mentioned that I should raise a support ticket with Google to investigate. Raising a ticket involves subscribing to a support plan (which I am not ready to do until the network issues are addressed) [a bit of a catch-22].
Looking at the network documentation: https://cloud.google.com/network-tiers/?hl=en_US it seems that rates might be severely limited. I'm not sure that I'm reading this right, but are we saying a 6 Mbps network?
I'm reaching out to the community / Google to see if what we are seeing is expected, whether there is any rate limiting, and what options there are to increase raw throughput.
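One way to put a number on "network bound" before involving support is to measure raw throughput from a node or pod to a host you control outside Google's network, for example with iperf3 (the server address below is a placeholder):

    # on the external host, run: iperf3 -s
    # then from a node or pod in the cluster:
    iperf3 -c external.example.com -t 30 -P 4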
You can raise a ticket with Google using the public issue tracker free of charge. In this case, since it's possibly an issue on the Cloud side of things, raising a ticket in this manner will get a Google Engineer looking into this.

Google Cloud load balancing instance status

I've tried a few different setups of HTTP load balancing in Google Compute Engine.
I used this as a reference :
https://cloud.google.com/compute/docs/load-balancing/http/cross-region-example
I'm at the scenario with 3 instances, where I simulate an outage on one of them.
I can see that one instance is not healthy, which is great. So my question is: how can I see which one of them is down? When this is a real scenario, I want to know immediately which one it is.
Any suggestions?
You can use the gcloud tool to get detailed health information. Based on that tutorial, I would run:
gcloud compute backend-services get-health NAME
I am not sure how to view this information in the Developer Console.
See more:
https://cloud.google.com/compute/docs/load-balancing/http/backend-service#health_checking
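As a concrete sketch (the backend service name is a placeholder, and recent gcloud releases also require a --global or --region flag):

    gcloud compute backend-services get-health my-backend-service --global

The output lists every instance in each backend group together with its state, so the failing one can be identified by its instance URL, e.g. a healthStatus entry showing healthState: UNHEALTHY.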

OpenShift scaling on a specific (software) condition

I'm looking for a scaling mechanism on the OpenStack cloud, and in the process I found OpenShift. My scenario is something like this: we have a distributed system with many agents running on many nodes. One node contains a Message Broker that directs the traffic. We want to monitor the Message Broker node: if a queue is full, we scale out the agent nodes that handle that queue. In brief, we monitor one node in order to scale other nodes.
We use the OpenStack cloud now. In OpenStack, I found Heat and Ceilometer, which are able to create alarms and scale out nodes. However, the alarms are based only on general info like CPU, RAM, and network usage (not inside-VM info).
Then I searched for a layer above: PaaS. I found that OpenShift can handle scaling apps. But as far as I know, the scaling mechanism of OpenShift is: duplicate the app based on network traffic, then put HAProxy in front.
Am I right that OpenShift can't monitor software-specific data? Is there any other tool that suits our scenario?
You can try using this script (https://github.com/openshift/origin-server/blob/master/cartridges/openshift-origin-cartridge-haproxy/usr/bin/haproxy_ctld.rb) to control how your gears are scaled, but I believe that it is still experimental. Make sure that you read through all of the comments and understand what you are doing before making any changes. You might also consider spinning up a second scaled application to test this on before messing with your production application.
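If you do end up customizing, the shape of the solution is a watcher that polls the broker and fires whatever scale-up hook you wire in. A purely hypothetical sketch follows; the management URL, JSON field, and scale command are all placeholders (haproxy_ctld.rb scales on HAProxy traffic by default, so a queue-based condition means modifying that script):

    #!/bin/sh
    # Poll broker queue depth and request a scale-up past a threshold.
    QUEUE_URL="http://broker.internal:15672/api/queues/%2f/work"  # e.g. RabbitMQ management API
    THRESHOLD=1000
    while sleep 30; do
      depth=$(curl -s "$QUEUE_URL" | sed -n 's/.*"messages":\([0-9]*\).*/\1/p')
      if [ "${depth:-0}" -gt "$THRESHOLD" ]; then
        echo "queue depth $depth exceeds $THRESHOLD; scaling up"
        ./scale_up.sh  # placeholder for your modified add-gear hook
      fi
    done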

Additional tutorials or worked examples of best practice for configuring multi-VM projects in Google Compute Engine

I was hoping people would know of more samples and best practice guides for configuring systems on google compute engine so I can gain more experience in deploying them and apply the knowledge to my own projects.
I had a look at https://developers.google.com/compute/docs/samples-and-videos#samples, which runs through deploying a Cassandra cluster and Hadoop using scripts, but I was hoping there might be more available, including on the following topics:
Load balancing web servers across zones, including configuring networking, firewalls, and the load balancer
Fronting Tomcat servers with Apache behind a load balancer
Multi-network systems in Compute Engine using subnetting
Multi-project systems and how to structure them for reliability and secure interoperability
They would be easy-to-follow projects that start from a blank project and end up with a sample site running across multiple VMs and zones with recommended security in place; a bit like the videos you see for GAE coding examples that go from Hello World to something more complex, but for infrastructure rather than code.
Does anyone know of any?
You may want to check out https://cloud.google.com/developers/#resources for tutorials and samples, as well as http://googlecloudplatform.github.io
I'm new to the forums so I can only post two links. Taking a quick look, I see several topics that may be of interest to you:
Managing Hadoop Clusters on Compute Engine
Auto Scaling on the Google Cloud Platform
Apache Hadoop, Hive, and Pig on Google Compute Engine
Compute Engine Load Balancing in Action
I hope this helps!