Compute Engine - Automatic scale - google-compute-engine

I have one VM Compute Engine to host simple apps. My apps is growing and the number of users too.
Now my users work basicaly from 08:00 AM to 07:00 PM, in this period the usage os CPU and Memory is High and the speed of work is very important.
I'm preparing to expand the memory and processor in the next days, but i search a more scalable and cost efective way.
Is there a way for automatic add resources when i need and reduce after no more need?
Thanks

The cost of running your VMs is directly related to a number of different factors i.e. the type of network in use (premium vs standard), the machine type, the boot disk image you use (premium vs open-source images) and the region/zone where your workloads are running, among other things.
Your use case seems to fit managed instance groups (MIGs). With MIGs you essentially configure a template for VMs that share the same attributes. During the configuration of your MIG, you will be able to specify the CPU/memory limit beyond which the MIG autoscaler will kick off. When your CPU/memory reading goes below that threshold, MIG scales your VMs down to the number of instances specified in your template.
You can also use requests per second as a threshold for autoscaling and I would recommend you explore the docs to know more about it.
See docs

Related

looking for simple cluster configuration

I am using compute engine for embarrassingly parallel scientific calculations. Some of my calculations require a single core and some require 64-cores machines. I am currently using my own scripts: I have a qsub-like command that creates a new instance with the required number of cores, booting it from a custom image with the pre-installed software, connects to a storage bucket via gcsfuse, runs the required command and then kills the instance after it's done.
Do I really need to do all of that with my own scripts, or is there any tool that I should use instead? I'd much rather use some ready made tool for all of the management.
My usage fluctuates widely (hundreds of cores in parallel for 3 hours, then 2 days with nothing, etc). So I don't want constant sized machines: I like to be billed by the minute for my computations.
You may want to use auto-scaling feature for managed instance group in Google Compute Engine(GCE). This feature adds more instances to your instance group when there is more load (upscaling), and removes instances when there is less load (downscaling). Moreover, you can define autoscaling policy based upon CPU utilization, or Load balancer utilization or request per seconds. Please refer autoscaler decisions document to understand decisions that autoscaler might make when scaling instance groups.

Benefit of running Kubernetes in bare metal and cloud with idle VM or machines?

I want to know the high level benefit of Kubernetes running in bare metal machines.
So let's say we have 100 bare metal machines ready with kubelet being deployed in each. Doesn't it mean that when the application only runs on 10 machines, we are wasting the rest 90 machines, just standing by and not using them for anything?
For cloud, does Kubernetes launch new VMs as needed, so that clients do not pay for idle machines?
How does Kubernetes handle the extra machines that are needed at the moment?
Yes, if you have 100 bare metal machines and use only 10, you are wasting money. You should only deploy the machines you need.
The Node Autoscaler works at certain Cloud Providers like AWS, GKE, or Open Stack based infrastructures.
Now, Node Autoscaler is useful if your load is not very predictable and/or scales up and down widely over the course of a short period of time (think Jobs or cyclic loads like a Netflix type use case).
If you're running services that just need to scale eventually as your customer base grows, that is not so useful as it is as easy to simply add new nodes manually.
Kubernetes will handle some amount of auto-scaling with an assigned number of nodes (i.e. you can run many Pods on one node, and you would usually pick your machines to run in a safe range but still allow handling of spikes in traffic by spinning more Pods on those nodes.
As a side note: with bare metal, you typically gain in performance, since you don't have the overhead of a VM / hypervisor, but you need to supply distributed storage, which a cloud provider would typically provide as a service.

Google Compute Engine auto scaling based on queue length

We host our infrastructure on Google Compute Engine and are looking into Autoscaling for groups of instances. We do a lot of batch processing of binary data from a queue. In our case, this means:
When a worker is processing data the CPU is always 100%
When the queue is empty we want to terminate all workers
Depending on the length of the queue we want a certain amount of workers
However I'm finding it hard to figure out a way to auto-scale this on Google Compute Engine because they appear to scale on instance-only metrics such as CPU. From the documentation:
Not all custom metrics can be used by the autoscaler. To choose a
valid custom metric, the metric must have all of the following
properties:
The metric must be a per-instance metric.
The metric must be a valid utilization metric, which means that data from the metric can be used to proportionally scale up or down
the number of virtual machines.
If I'm reading the documentation properly this makes it hard to use the auto scaling on a global queue length?
Backup solutions
Write a simple auto-scale handler using the Google Cloud API to create or destroy new workers using Instances API
Write a simple auto-scale handler using instance groups and then manually insert/remove instances using the InstanceGroups: insert
Write a simple auto-scaling handler using InstangeGroupManagers: resize
Create a custom per-instance metric which measures len(queue)/len(workers) on all workers
As of February 2018 (Beta) this is possible via "Per-group metrics" in stackdriver.
Per-group metrics allow autoscaling with a standard or custom metric
that does not export per-instance utilization data. Instead, the group
scales based on a value that applies to the whole group and
corresponds to how much work is available for the group or how busy
the group is. The group scales based on the fluctuation of that group
metric value and the configuration that you define.
More information at https://cloud.google.com/compute/docs/autoscaler/scaling-stackdriver-monitoring-metrics#per_group_metrics
The how-to is too long to post here.
As far as I understand this is not implemented yet (as at January 2016). At the moment autoscaling is only targeted at web serving scenarios, where you want to serve web pages/other web services from your machines and keep some reasonable headroom (e.g. in terms of CPU or other metrics) for spikes in traffic. Then the system will adjust the number of instances/VMs to match your target.
You are looking for autoscaling for batch processing scenarios, and this is not catered for at the moment.

gear size of more than one CPU / core in OpenShift Enterprise 2?

I'm setting up OpenShift Enterprise 2 and I'd like to create a district with a larger gear size. Changing
/etc/openshift/resource_limits.conf
on the nodes is straightforward for increasing memory and disk available to the gear, but CPU resource management is less intuitive (from resource_limits.conf):
# cpu cpu_rt_period_us=100000 cpu_rt_runtime_us=950000
cpu_shares=128
cpu_cfs_quota_us=100000
By default, a gear can only consume a maximum of 100% of a single processor core. If I want to allow a bigger gear size that could allow full utilization of 2 processor cores, how would I do that, or is it currently not possible at all in OpenShift?
Since all the gears are the same, and since 'cpu_shares' are compared on a relative basis when restricting a group, I'm not sure it makes sense to change 'cpu_shares'.
However, 'cpu_cfs_quota_us' looks like it might be the right knob to turn. From this page:
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
It appears that I should be able to double the quota to get a full 2 cores. However, it's not clear whether OpenShift will respect this, since the 'cpu_cfs_period_us' parameter is not even found in resource_limits.conf.
I performed an experiment using 'stress'. I first confirmed that I could load 2 cores under a normal ssh login (using 'stress --cpu 2'). Then I logged in to a gear on that host and ran the same thing. With cpu_cfs_quota_us=100000, I can only consume a max of 50% CPU for each stress process. But when I change to cpu_cfs_quota_us=200000, I can consume over 99% for each process, so it appears that it is now successful. Would be nice if this was called out in the OpenShift docs...

Does there exist an open-source distributed logging library?

I'm talking about a library that would allow me to log events from different machines and would align these events on a "global" time axis with sufficiently high precision.
Actually, I'm asking because I've written such a thing myself in the course of a cluster computing project, I found it terrifically useful, and I was surprised that I couldn't find any analogues.
Therefore, the point is whether something like this exists (and I better contribute to it) or nothing exists (and I better write an open-source analogue of my solution).
Here are the features that I'd expect from such a library:
Independence on the clock offset between different machines
Timing precision on the order of at least milliseconds, preferably microseconds
Scalability to thousands of concurrent logging processes, with at least several megabytes of aggregated logs per second
Soft real-time operation (t.i. I don't want to collect 200 big logs from 200 machines and then compute clock offsets and merge them - I want to see what happens "live", perhaps with a small lag like 10s)
Facebook's contribution in the matter is called 'Scribe'.
Excerpt:
Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures. There is a scribe server running on every node in the system, configured to aggregate messages and send them to a central scribe server (or servers) in larger groups.
...
Scribe is implemented as a thrift service using the non-blocking C++ server. The installation at facebook runs on thousands of machines and reliably delivers tens of billions of messages a day.
The API is Thrift-based, so you have a good platform coverage, but in case you're looking for simple integration for Java you may want to have a look at Digg's log4j appender for Scribe.
You could use log4j/log4net targeting a central syslog daemon. log4j has a builtin SyslogAppender, and in log4net you can do it as shown here. log4cpp docs here.
There are Windows implementations of Syslog around if you don't have a Unix system to hand for this.
Use Chukwa, Its Open source and Large scale Log Monitoring System