Rolling Update with Kubernetes Deployment without increasing the cluster size - google-compute-engine

I have a cluster that can only run one Pod per node due to our configuration (sometimes Kubernetes will randomly run two on one node, but w/e). Any time I update my Deployment, which triggers a Rolling Update, Kubernetes simply never finishes the update.
The reason for this appears to be that there isn't enough room in the nodes to deploy the new pods from the rolling update.
Now, some of you may say that I could simply increase the cluster size every time I want to perform an update. The problem with that approach is that I have enabled autoscaling on the cluster, and the Deployment's replica count is set high so that Kubernetes automatically scales with the cluster. This means I can't change the cluster size to accommodate the Rolling Update.
How can I perform a Rolling Update with this configuration?

Can you set maxSurge to 0 and maxUnavailable to some positive value?
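For reference, a minimal sketch of what that strategy block could look like in the Deployment spec (the name, replica count, and image are placeholders, not from the question):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app               # placeholder name
spec:
  replicas: 4                # placeholder; set high so it scales with the cluster
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0            # never create more pods than there are nodes
      maxUnavailable: 1      # take one old pod down first, freeing a node for the new pod
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:latest # placeholder image

With maxSurge: 0, the rollout removes an old pod before creating its replacement, so it never needs spare node capacity.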

Related

kubernetes clustering architecture for zero down time

As I found, the best way to have zero downtime, even when one datacenter is down, is to use Kubernetes across at least two servers from two datacenters.
I want to use servers in Iran, and I've heard the infrastructure there has low performance.
The question is: if I want master-master replication for MySQL and one server fails, how can I re-sync the repaired server in the Kubernetes cluster?
K8s is the platform; it doesn't change how MySQL HA works. For example, if you have dedicated servers for MySQL, those servers become "pods" in K8s. What you need to do at the MySQL level when any of the servers is gone, for whatever reason, is the same as what you need to do when it runs as a pod. In fact, K8s helps you by automatically starting a new pod, whereas in the former case you would need to provision a new physical server, with the obvious time cost. You would normally run a script to re-establish the HA; the same applies to K8s, where you can run the recovery script as an init container before the actual MySQL server container is started.
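A rough sketch of that init-container approach, assuming a hypothetical recovery script mounted from a ConfigMap (the script name, ConfigMap name, and image tag are assumptions, not from the answer):

apiVersion: v1
kind: Pod
metadata:
  name: mysql-0
spec:
  initContainers:
  - name: reestablish-ha
    image: mysql:8.0                                      # assumed image
    command: ["sh", "-c", "/scripts/reestablish-ha.sh"]   # hypothetical recovery script
    volumeMounts:
    - name: scripts
      mountPath: /scripts
  containers:
  - name: mysql
    image: mysql:8.0
  volumes:
  - name: scripts
    configMap:
      name: mysql-recovery-scripts                        # hypothetical ConfigMap
      defaultMode: 0755

The init container runs to completion (re-syncing replication with the surviving master) before the actual mysqld container starts.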

Trying to create two MySQL pods in kubernetes with same volume for high availability

I am trying to deploy two MySQL pods with the same PVC, but I get a CrashLoopBackOff state when I create the second pod, with this error in the logs: "InnoDB: Check that you do not already have another mysqld process using the same InnoDB log files". How can I resolve this error?
There are different options for achieving high availability. If you are running Kubernetes on infrastructure that can provision the volume to different nodes (e.g. in the cloud) and your pod/node crashes, Kubernetes will restart the database on a different node with the same volume. Aside from a short downtime, you will have the database back up and running in a relatively short time.
The volume will be mounted to a single running MySQL pod to prevent data corruption from concurrent access. (This is what MySQL notices in your scenario as well, since it is not designed for shared storage as an HA solution.)
If you need more, you can use the built-in replication of MySQL to create a MySQL 'cluster', which keeps working even if one node/pod fails. Each instance of the MySQL cluster has an individual volume in that case. Look at the Kubernetes StatefulSet example for this scenario: https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/
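A minimal StatefulSet skeleton showing the per-instance volumes via volumeClaimTemplates (the names, replica count, and storage size are illustrative; the full replication configuration is covered in the linked tutorial):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:          # each replica gets its own PVC, so no shared InnoDB files
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi

Because each pod gets its own PersistentVolumeClaim, the "another mysqld process using the same innodb log files" error cannot occur.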

Reduce Kubernetes Cluster costs at night

I am using Google's Kubernetes Engine to manage a cluster with several node pools. Each pool has different configurations (ex. not all have auto-scaling).
The pools are mostly unused during the night, and so I would like to reduce resource consumption and cost during this period (about 10 hours).
I've considered stopping VM instances at the end of the day and restarting them in the morning. Additionally I could temporarily scale down the number of nodes by running gcloud container clusters resize $CLUSTER_NAME --size=0
What would be the best option to reduce costs during unused periods? Is there a better way?
The cluster autoscaler (which adjusts the number of nodes in your node pools) will not be able to scale all of your node pools to zero. This is because there are some system pods running in your cluster (kubectl get pods -n kube-system).
You can, however, force node pools down to zero, as you pointed out, with a script calling:
gcloud container clusters resize $CLUSTER --num-nodes=0 [--node-pool=$POOL]
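A simple sketch of how that could be scheduled, assuming a crontab on a machine that is already authenticated with gcloud; the times, node count, zone, and pool name are placeholders:

# scale the pool to zero at 20:00 and back to 3 nodes at 06:00
0 20 * * * gcloud container clusters resize $CLUSTER --num-nodes=0 --node-pool=$POOL --zone=$ZONE --quiet
0 6  * * * gcloud container clusters resize $CLUSTER --num-nodes=3 --node-pool=$POOL --zone=$ZONE --quiet

The --quiet flag skips the interactive confirmation prompt so the commands can run unattended.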

Is there a 'max-retries' for Kubernetes Jobs?

I have batch jobs that I want to run on Kubernetes. The way I understand Jobs:
If I choose restartPolicy: Never, it means that if the Job fails, it will destroy the Pod and reschedule it onto (potentially) another node. If restartPolicy: OnFailure, it will restart the container in the existing Pod. I'd consider a certain number of failures unrecoverable. Is there a way I can prevent it from rescheduling or restarting after a certain period of time, and clean up the unrecoverable Jobs?
My current thought for a workaround to this is to have some watchdog process that looks at retryTimes and cleans up Jobs after a specified number of retries.
Summary of slack discussion:
No, there is no retry limit. However, you can set a deadline on the job as of v1.2 with activeDeadlineSeconds. The system should back off restarts and then terminate the job when it hits the deadline.
FYI, this has now been added as .spec.backoffLimit.
https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
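A hedged example of a Job combining both knobs (the name, image, command, and numeric values are illustrative, not from the discussion):

apiVersion: batch/v1
kind: Job
metadata:
  name: batch-job
spec:
  backoffLimit: 4               # give up after 4 retries
  activeDeadlineSeconds: 600    # and in any case terminate the job after 10 minutes
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox          # placeholder image
        command: ["sh", "-c", "run-the-batch-work"]   # placeholder command

Once backoffLimit is exceeded (or the deadline passes), the Job is marked failed and stops creating new pods.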

How do I automatically restart a GCE preemptible instance?

How do I automatically restart a preemptible Google Compute Engine instance? I only have one instance, which doesn't need 100% uptime, but I would like it to restart once the data center becomes less loaded again. The instance/server that I'm trying to automatically restart has its own boot disk that I'd like to use each time it restarts.
You could try using Instance Group Manager to set up a pool of size 1. It will then try to re-create instances after they are preempted.
You should be aware that there is no guarantee that there is going to be capacity for your instance. As the docs say:
Preemptible instances are available from a finite amount of Compute Engine resources, and might not always be available.
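A rough sketch of the commands involved, with placeholder names and a zonal group (a sketch, not a definitive recipe; check the flags against the current gcloud docs):

# template that marks instances as preemptible
gcloud compute instance-templates create preemptible-template \
    --machine-type=n1-standard-1 --preemptible

# managed instance group of size 1 that recreates the instance after preemption
gcloud compute instance-groups managed create my-preemptible-group \
    --template=preemptible-template --size=1 --zone=us-central1-a

Note that a recreated instance gets a fresh boot disk from the template's image, so if you need to keep your existing boot disk you would bake it into a custom image first.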
You could create an f1-micro instance, which is free for one instance per month in several data centers, and create a cron job
*/10 * * * * /snap/bin/gcloud beta compute instances start --zone "yourzone" "yourinstance" --project "yourproject"
after you have run gcloud auth login once.
This will try to start your instance every 10 minutes. Of course, you can also set this to an hour or more. With a bit more scripting, things like exponential back-off can be done as well.
If you'd like to restart it less frequently, you can use instance schedules, which are built into the Google Cloud Dashboard.
https://cloud.google.com/compute/docs/instances/schedule-instance-start-stop
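A sketch of the equivalent gcloud setup, assuming a hypothetical schedule name and times (the resource-policies flags should be verified against the linked docs):

# create a start/stop schedule (times are interpreted in the policy's timezone)
gcloud compute resource-policies create instance-schedule nightly-schedule \
    --region=us-central1 \
    --vm-start-schedule="0 8 * * *" \
    --vm-stop-schedule="0 20 * * *" \
    --timezone="UTC"

# attach it to the instance
gcloud compute instances add-resource-policies yourinstance \
    --zone=us-central1-a --resource-policies=nightly-schedule

Unlike the cron approach above, this runs entirely on the Google Cloud side, so you don't need a separate always-on machine for it.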