Kubernetes cluster on GCE from Instances/Group - google-compute-engine

I have a Kubernetes computation cluster running on GCE and I'm reasonably happy so far. I know that if I create a Kubernetes cluster, its nodes show up as VM Instances and the cluster as an Instance group. I would like to go the other way around: create the instances/group first and build a Kubernetes cluster out of it, so it can be managed by Kubernetes. The reason is that I want to make the nodes preemptible, which might better fit my workload.
So the question is: how do I create a Kubernetes cluster with preemptible nodes? I can do either one or the other right now, but not both together.

There is a patch out for review at the moment (#12384) that adds a configuration option to mark the nodes in the instance group as preemptible. If you are willing to build from head, this should be available as a configuration option within the next couple of days. In the meantime, you can see from the patch how easy it is to modify the GCE startup scripts to make your VMs preemptible.
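To illustrate what such a configuration option boils down to: GCE itself only needs a `--preemptible` flag when the node VM is created. A minimal sketch, assuming you assemble the gcloud invocation yourself (instance name, zone, and machine type below are illustrative placeholders):

```python
# Sketch: assemble the gcloud command that creates a preemptible node VM.
# Instance name, zone, and machine type are illustrative placeholders.
def build_create_command(name, zone, machine_type, preemptible=True):
    cmd = [
        "gcloud", "compute", "instances", "create", name,
        "--zone", zone,
        "--machine-type", machine_type,
    ]
    if preemptible:
        # Preemptible VMs are much cheaper, but GCE may reclaim them at any time.
        cmd.append("--preemptible")
    return cmd

print(" ".join(build_create_command("k8s-node-1", "us-central1-f", "n1-standard-2")))
```

The same flag is what the GCE startup scripts would have to pass through for each node in the instance group.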

Related

Is it possible to get GCP's ANY distribution for Kubernetes GKE node pool?

I have a GKE Kubernetes cluster running on GCP. This cluster has multiple node pools set with autoscale ON and placed at us-central1-f.
Today we started getting a lot of errors on these Node pools' Managed Instance Groups saying that us-central1-f had run out of resources. The specific error: ZONE_RESOURCE_POOL_EXHAUSTED_WITH_DETAILS
I've found another topic on Stack Overflow with a similar question, where the answer points to a discussion on Google Groups with more details. I know that one of the recommended ways of avoiding this is to use multiple zones and/or regions.
When I first faced this issue I wondered if there is a way to use multiple zones as a fallback system, rather than a redundancy system. In that sense, my VMs would be placed in whichever zone has available resources, prioritizing the ones closest to, let's say, us-central1-f.
Then, reading the discussion on the Google Group, I found a feature that caught my attention: the ANY distribution method for Managed Instance Groups. It seems that this feature does exactly what I need - the zone fallback.
So, my question: Does the ANY distribution method resolve my issue? Can I use it for GKE Node Pools? If not, is there any other solution other than using multiple zones?
It is possible to get a regional (i.e. multi-zonal) GKE deployment; however, this will use multiple zonal MIGs as the underlying compute layer. So, technically speaking, you will not be using the ANY distribution method, but you should achieve pretty much the same result.
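As a sketch of what requesting such a regional deployment looks like: a cluster created with `--region` (optionally narrowed with `--node-locations`) gets one zonal MIG per listed zone. Cluster name and zone list below are illustrative placeholders:

```python
# Sketch: assemble a gcloud command for a regional GKE cluster.
# Cluster name and zone list are illustrative placeholders.
def build_regional_cluster_command(name, region, node_locations):
    return [
        "gcloud", "container", "clusters", "create", name,
        "--region", region,
        # Node pools are replicated into each listed zone (one MIG per zone),
        # so capacity exhaustion in one zone does not take the whole pool down.
        "--node-locations", ",".join(node_locations),
    ]

cmd = build_regional_cluster_command(
    "my-cluster", "us-central1", ["us-central1-a", "us-central1-f"])
print(" ".join(cmd))
```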

Kubernetes :: web interface to start a pod

Background:
As a backoffice service for our insurance mathematicians, a daily cronjob runs a pod.
Inside the pod, fairly complex future simulations take place.
The pod has two containers, an application server and a db server.
The process has a few variables which are fed into the pod.
This is done via ConfigMaps and container env variables.
When the pod is finished after approx. 10 hours, it copies the resulting database to another database
and then it's done. It runs daily because market data changes daily, and we also check our new codebase daily.
Great value, high degree of standardisation, fully automated.
So far so good.
But it uses the same configuration every time it runs.
Now what?
Our mathematicians would like to be able to start the pod feeding their own configuration data into it.
For example on a webpage with configurable input data.
Question:
Is there an existing Kubernetes framework implementing this?
"Provide a webpage with configurable input fields which are transformed into configmaps and env variables starting the pod"?
Sure, not too difficult to write.
But we do cloud native computing also because we want to reuse solutions of general problems and not write it ourselves if possible.
Thanks for any hints in advance.
They can start a Kubernetes Job for one-time tasks. Apart from the Google Cloud Console UI, I'm not aware of a UI where you can configure fields for a ConfigMap. Maybe you can write a custom Python script that launches these jobs.
https://kubernetes.io/docs/concepts/workloads/controllers/job/
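Along the lines of that suggestion, such a script could turn user-supplied fields into a ConfigMap plus a Job manifest and pipe the JSON to `kubectl apply -f -`. A minimal sketch, only showing manifest construction (names, image, and parameter keys are illustrative placeholders):

```python
import json

# Sketch: build a ConfigMap and a Job manifest from user-supplied fields.
# Names and the container image are illustrative placeholders; the JSON
# output can be fed to `kubectl apply -f -`.
def build_manifests(run_name, image, params):
    configmap = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"{run_name}-config"},
        # ConfigMap values must be strings.
        "data": {k: str(v) for k, v in params.items()},
    }
    job = {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": run_name},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "simulation",
                        "image": image,
                        # Expose every ConfigMap key as an env variable.
                        "envFrom": [{"configMapRef": {"name": f"{run_name}-config"}}],
                    }],
                }
            }
        },
    }
    return configmap, job

cm, job = build_manifests("sim-2024-01-01", "example/simulator:latest",
                          {"MARKET_DATE": "2024-01-01", "SCENARIOS": 1000})
print(json.dumps(cm))
print(json.dumps(job))
```

A web form handler would simply collect the input fields into `params` and call this before shelling out to kubectl.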

Having as many Pods as Nodes

We are currently using 2 Nodes, but we may need more in the future.
The StatefulSet is a mariadb-galera whose replica count is currently 2.
When we add a new Node we want the replicas to become 3; if we don't need it anymore and delete it (or another Node), we want them back at 2.
In fact, if we have 3 Nodes we want 3 replicas, one on each Node.
I could use Pod Topology Spread Constraints, but then we'll have a bunch of "notScheduled" pods.
Is there a way to adapt the number of replicas automatically, every time a node is added or removed?
"When we add a new Node we want the replicas to become 3; if we don't need it anymore and delete it (or another Node), we want them back at 2."
I would recommend doing it the other way around: manage the replicas of your container workload and let the number of nodes be adjusted after that.
See e.g. Cluster Autoscaler for how this can be done; it depends on what cloud provider or environment your cluster is using.
It is also important to specify your CPU and memory requests such that each replica occupies a whole node.
For MariaDB and similar workload, you should use StatefulSet and not DaemonSet.
You could use a DaemonSet (https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/),
which will ensure there is one pod per node.
A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.
Also, it's not advisable to run a database in anything other than a StatefulSet, due to the stable pod identity that StatefulSets provide.
Given all the database administration involved, it is advisable to use a cloud provider's managed database; managing it yourself, especially inside the cluster, will incur multiple issues.
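If you do stay with a StatefulSet, one-replica-per-node can be enforced with a required pod anti-affinity on the hostname topology key, and the replica count then scaled alongside the node count. A minimal sketch of the relevant spec fragment (the app label is an illustrative placeholder):

```python
import json

# Sketch: pod anti-affinity that forbids two galera pods on the same node.
# The label value is an illustrative placeholder.
def anti_affinity(app_label):
    return {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [{
                "labelSelector": {
                    "matchLabels": {"app": app_label},
                },
                # kubernetes.io/hostname is unique per node, so at most
                # one matching pod can be scheduled onto each node.
                "topologyKey": "kubernetes.io/hostname",
            }]
        }
    }

print(json.dumps(anti_affinity("mariadb-galera"), indent=2))
```

This fragment goes under `spec.template.spec.affinity` of the StatefulSet; with replicas equal to the node count, each replica lands on its own node.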

Shrink a Dataproc worker boot disk

Due to some mix-up during planning we ended up with several worker nodes running 23TB drives which are now almost completely unused (we keep data on external storage). As the drives are only wasting money at the moment, we need to shrink them to a reasonable size.
Using weresync I was able to fully clone the drive to a much smaller one but apparently you can't swap the boot drive in GCE (which makes no sense to me). Is there a way to achieve that or do I need to create new workers using the images? If so, is there any other config I need to copy to the new instance in order for it to be automatically joined to the cluster?
Dataproc does not support VM configuration changes in running clusters.
I would advise you to delete the old cluster and create a new one with the worker disk size that you need.
I ended up creating a ticket with GCP support - https://issuetracker.google.com/issues/120865687 - to get an official answer to that question. Got an answer that this is not possible currently, but it should be available shortly (within months) in the beta GCP CLI, and possibly in the Console at a later date as well.
Went on with a complete rebuild of the cluster.
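For anyone doing the same rebuild, the worker boot disk size can be set explicitly at cluster creation time. A sketch of the invocation (cluster name, region, worker count, and size below are illustrative placeholders):

```python
# Sketch: gcloud invocation for recreating a Dataproc cluster with a
# smaller worker boot disk. Name, region, count and size are placeholders.
def build_dataproc_command(name, region, worker_disk_gb, num_workers):
    return [
        "gcloud", "dataproc", "clusters", "create", name,
        "--region", region,
        "--num-workers", str(num_workers),
        # Size the boot disk explicitly instead of inheriting a huge default.
        "--worker-boot-disk-size", f"{worker_disk_gb}GB",
    ]

print(" ".join(build_dataproc_command("analytics", "us-central1", 200, 4)))
```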

couchbase cluster document not replicating but splitting up

I've set up a Couchbase cluster with 2 nodes containing 300k docs across 4 buckets. The replicas option is forced to 1, as there are only 2 machines.
But documents are split half on one node and half on the other. I need a duplicate copy of each document, so that if a node goes down the other one can still supply all data to my app.
Is there a setting I missed in creating the cluster?
can I still set the cluster to replicate all documents?
I hope someone can help.
thanks
PS: I'm using couchbase community 4.5
UPDATE:
I'm adding screenshots of the cluster web interface and cbstats output:
the following is the state with one node only,
next, the one with both nodes up,
then cbstats results on both nodes when both are up and running.
As you can see, with only one node, half the items are displayed. Does that mean the other half resides as replicas but is not shown?
Can I still run my app consistently with only one node?
UPDATE:
I had to click fail-over manually to see the replicas become active on the remaining node, because with just two nodes in the cluster auto fail-over is disabled!
Couchbase Server will partition or shard the documents across the two nodes, as you observed. It will also place replicas on those nodes, based on your one-replica configuration.
To access a replica, you must use one of the Client SDKs.
For example, this Java code will attempt to retrieve a replica (getFromReplica("id", ReplicaMode.ALL)) if the active document retrieval fails (get("id")).
bucket.async()
    .get("id")
    .onErrorResumeNext(bucket.async().getFromReplica("id", ReplicaMode.ALL))
    .subscribe();
The ReplicaMode.ALL tells Couchbase to try all nodes with replicas and the active node.
So what was happening with only two nodes in the cluster was that auto fail-over didn't start automatically as specified here:
https://developer.couchbase.com/documentation/server/current/clustersetup/automatic-failover.html
This means the data replicas were not activated on the remaining node unless fail-over was triggered manually.
The best thing is to have more than two nodes in the cluster before going to production.
To be honest, I should have read the documentation very carefully before asking any question.
Thanks Jeff Kurtz for your help; you pushed me towards the solution (understanding how the Couchbase replica policy works).
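For completeness, the manual fail-over described above can also be scripted against the cluster's REST API (`POST /controller/failOver` on port 8091). A sketch that only constructs the request, without sending it; host and otpNode value are illustrative placeholders, and real use would add authentication:

```python
import urllib.parse
import urllib.request

# Sketch: build (but do not send) the REST request that fails over a node,
# equivalent to clicking "Fail Over" in the web console.
# Host and otpNode value are illustrative placeholders.
def build_failover_request(host, otp_node):
    data = urllib.parse.urlencode({"otpNode": otp_node}).encode()
    return urllib.request.Request(
        f"http://{host}:8091/controller/failOver",
        data=data,        # POST body: which node to fail over
        method="POST",
    )

req = build_failover_request("cb-node-1", "ns_1@cb-node-2")
print(req.method, req.full_url)
```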