How to deploy OpenShift Origin 3.10 (OKD) on one node with GlusterFS - openshift

I am able to install OKD on one node and scale up to multiple nodes accordingly.
But now I want to install OKD with GlusterFS on one node and then extend it to multiple nodes.
Currently I get an error saying that at least three nodes are required. How can I bypass this check in Ansible?
As per the GitHub documentation, I have three options:
1. Configuring a new, natively-hosted GlusterFS cluster. In this scenario, GlusterFS pods are deployed on nodes in the OpenShift cluster which are configured to provide storage.
2. Configuring a new, external GlusterFS cluster. In this scenario, the cluster nodes have the GlusterFS software pre-installed but have not been configured yet. The installer will take care of configuring the cluster(s) for use by OpenShift applications.
3. Using existing GlusterFS clusters. In this scenario, one or more GlusterFS clusters are assumed to be already set up. These clusters can be either natively-hosted or external, but must be managed by a heketi service.
Can option 2 or 3 be used to start with one node and extend accordingly? I have installed a GlusterFS cluster on one node and extended it to a second node, but how do I introduce it to OpenShift?
https://imranrazakh.blogspot.com/2018/08/

I found a way to install GlusterFS on one node. Below is an all-in-one installation with GlusterFS.
I changed the inventory file as follows:
[OSEv3:children]
masters
nodes
etcd
glusterfs
[OSEv3:vars]
ansible_ssh_common_args='-o StrictHostKeyChecking=no'
ansible_ssh_user=root
openshift_deployment_type=origin
openshift_enable_origin_repo=false
openshift_disable_check=disk_availability,memory_availability
os_firewall_use_firewalld=true
openshift_public_hostname=console.1.1.0.1.nip.io
openshift_master_default_subdomain=apps.1.1.0.1.nip.io
openshift_storage_glusterfs_is_native=false
openshift_storage_glusterfs_storageclass=true
openshift_storage_glusterfs_heketi_is_native=true
openshift_storage_glusterfs_heketi_executor=ssh
openshift_storage_glusterfs_heketi_ssh_port=22
openshift_storage_glusterfs_heketi_ssh_user=root
openshift_storage_glusterfs_heketi_ssh_sudo=false
openshift_storage_glusterfs_heketi_ssh_keyfile="/root/.ssh/id_rsa"
[masters]
1.1.0.1 openshift_ip=1.1.0.1 openshift_schedulable=true
[etcd]
1.1.0.1 openshift_ip=1.1.0.1
[nodes]
1.1.0.1 openshift_ip=1.1.0.1 openshift_node_group_name="node-config-all-in-one" openshift_schedulable=true
[glusterfs]
1.1.0.1 glusterfs_devices='[ "/dev/vdb" ]'
Now we have to hack the Ansible role, because it expects three nodes, by adding --durability none to the heketi command in the following file:
openshift-ansible/roles/openshift_storage_glusterfs/tasks/heketi_init_db.yml
The updated snippet:
- name: Create heketi DB volume
  command: "{{ glusterfs_heketi_client }} setup-openshift-heketi-storage --image {{ glusterfs_heketi_image }} --listfile /tmp/heketi-storage.json --durability none"
  register: setup_storage
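If you would rather not edit the role by hand after every openshift-ansible update, the change can be scripted. The following is a minimal Python sketch, not part of openshift-ansible: the function name add_durability_none is made up for illustration, and it assumes the heketi command sits on a single line that ends with a double quote, as in the snippet above.

```python
FLAG = "--durability none"
MARKER = "setup-openshift-heketi-storage"


def add_durability_none(yaml_text: str) -> str:
    """Append FLAG to the heketi setup command unless it is already present.

    Assumption: the command is a single double-quoted line, as in the
    snippet quoted in this answer. Trailing whitespace on lines is dropped.
    """
    patched = []
    for line in yaml_text.splitlines():
        stripped = line.rstrip()
        needs_flag = MARKER in stripped and FLAG not in stripped
        if needs_flag and stripped.endswith('"'):
            # Insert the flag just before the closing double quote.
            stripped = stripped[:-1] + " " + FLAG + '"'
        patched.append(stripped)
    result = "\n".join(patched)
    # Keep the trailing newline if the input had one.
    return result + "\n" if yaml_text.endswith("\n") else result
```

To apply it, read heketi_init_db.yml, pass its text through the function, and write the result back. Running it a second time leaves the text unchanged.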
By default the installer creates a StorageClass that expects a replicated environment, so we have to create a custom StorageClass with "volumetype: none", like below:
oc create -f - <<EOT
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-nr-storage
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
parameters:
  resturl: http://heketi-storage.glusterfs.svc:8080
  restuser: admin
  secretName: heketi-storage-admin-secret
  secretNamespace: glusterfs
  volumetype: none
provisioner: kubernetes.io/glusterfs
volumeBindingMode: Immediate
EOT
Now you can create storage dynamically from the web console :) Any suggestions for improvement are welcome.
Next I will check how to extend it.

Related

Ingress Nginx can't tolerate Master taint

Problem
When trying to install ingress-nginx on a single-node (also master) Kubernetes cluster, the Helm install fails, complaining that the pod can't be scheduled on the master because it can't tolerate the master's taint:
- FailedScheduling
- pod/ingress-nginx-admission-create--1-n7bhg
- 0/1 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
Details
Kubernetes :
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:32:41Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
Helm Version:
version.BuildInfo{Version:"v3.7.0", GitCommit:"eeac83883cb4014fe60267ec6373570374ce770b", GitTreeState:"clean", GoVersion:"go1.16.8"}
Installation steps followed : ( from documentation at https://kubernetes.github.io/ingress-nginx/deploy/#using-helm )
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx
Cluster node:
ip-172-29-1-103 Ready control-plane,master 81m v1.22.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-172-29-1-103,kubernetes.io/os=linux,mitg.cisco.com/node-type=pats,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
Removing the master node taint doesn't look right for other reasons. What would be a solution?
In general, to get workloads scheduled on the Kubernetes control plane (i.e. master nodes), you need to do the following:
kubectl taint nodes --all node-role.kubernetes.io/master-
Or, for 1.7 and above:
kubectl taint node mymasternode node-role.kubernetes.io/master:NoSchedule-
These commands untaint the master node and allow workloads to be scheduled on it. To find out what the master node is currently tainted with, describe the node and look at its labels and taints, then remove the taint that is blocking scheduling. Without a description of your node or of the failing resources, that's the best advice I can give. It sounds like the taint that prevents scheduling on your master node was not properly removed; by default, workloads are kept off the master node.
You can also spin up a worker node and join it to your cluster to work around the issue, and see whether the pod gets scheduled on the joined worker.
My best advice is to find your taint:
kubectl describe node <insert-node-name-here>
Find the taints that are preventing scheduling and remove them.
Read through the following to see if it helps you:
https://kubernetes.io/docs/concepts/architecture/nodes/
https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
NOTE: THIS SHOULD NOT BE DONE ON PRODUCTION CLUSTERS, SO I ASSUME THIS IS A DEVELOPMENT CLUSTER YOU ARE WORKING WITH.
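If removing the taint does not look right, the other half of the mechanism is to give the pods a matching toleration instead. The sketch below is only an illustration in Python: it takes a node object as returned by kubectl get node <name> -o json and derives one toleration per taint. The function name tolerations_for is made up, and where the resulting list has to go in the ingress-nginx chart values is an assumption to check with helm show values ingress-nginx/ingress-nginx.

```python
def tolerations_for(node: dict) -> list:
    """Return one toleration per taint found on a node object.

    `node` is the parsed JSON of `kubectl get node <name> -o json`.
    Each toleration uses operator "Exists", so it matches the taint key
    regardless of the taint's value.
    """
    tolerations = []
    # spec.taints can be missing or null on an untainted node.
    for taint in node.get("spec", {}).get("taints") or []:
        toleration = {"key": taint["key"], "operator": "Exists"}
        if "effect" in taint:
            toleration["effect"] = taint["effect"]
        tolerations.append(toleration)
    return tolerations
```

For the node in the question this yields a single toleration for node-role.kubernetes.io/master, which leaves the taint in place for every other workload.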

Kubernetes -- Helm -- Mysql Chart loses stored data after stopping pod

Using https://github.com/helm/charts/tree/master/stable/mysql (all the code is here), it is cool being able to run mysql as part of my local kubernetes cluster (using docker kubernetes).
The problem though is that once I stop running the pod, and then run the pod again, all the data that was stored is now gone.
My question is how do I keep the data that was added to the mysql pod? I have read about persistent volumes, and the mysql helm example from github is showing that it is using PersistentVolumeClaim. I have also enabled persistence on the values.yaml file, but I cannot seem to have the same data that was saved in the database.
My docker kubernetes version is currently 1.14.6.
Please verify your mysql pod. You should see the volumeMounts and volumes options:
volumeMounts:
- mountPath: /var/lib/mysql
  name: data
.
.
.
volumes:
- name: data
  persistentVolumeClaim:
    claimName: msq-mysql
In addition, please verify your PersistentVolume, PersistentVolumeClaim and StorageClass:
kubectl get pv,pvc,pods,sc
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS   REASON   AGE
persistentvolume/pvc-2c6aa172-effd-11e9-beeb-42010a840083   8Gi        RWO            Delete           Bound    default/msq-mysql   standard                24m
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/msq-mysql   Bound    pvc-2c6aa172-effd-11e9-beeb-42010a840083   8Gi        RWO            standard       24m
NAME                            READY   STATUS    RESTARTS   AGE     IP         NODE                                  NOMINATED NODE   READINESS GATES
pod/msq-mysql-b5c48c888-pz6p2   1/1     Running   0          4m28s   10.0.0.8   gke-te-1-default-pool-36546f4e-5rgw   <none>           <none>
Please run kubectl describe persistentvolumeclaim/msq-mysql (use your own PVC name instead).
In the output you can see that the PVC was provisioned successfully using gce-pd and mounted by the msq-mysql pod:
Normal ProvisioningSucceeded 26m persistentvolume-controller Successfully provisioned volume pvc-2c6aa172-effd-11e9-beeb-42010a840083 using kubernetes.io/gce-pd
Mounted By: msq-mysql-b5c48c888-pz6p2
I created a table with one row, deleted the pod, and checked again afterwards (as expected, everything is fine):
mysql> SELECT * FROM t;
+------+
| c |
+------+
| ala |
+------+
1 row in set (0.00 sec)
Why "all the data that was stored is now gone":
As per helm chart docs:
The MySQL image stores the MySQL data and configurations at the /var/lib/mysql path of the container.
By default a PersistentVolumeClaim is created and mounted into that directory. In order to disable this functionality you can change the values.yaml to disable persistence and use an emptyDir instead.
Most often the problem is with the PV/PVC binding. It can also be caused by a user-defined or non-default StorageClass.
So please verify the PV and PVC as described above.
Take a look at StorageClass
A claim can request a particular class by specifying the name of a StorageClass using the attribute storageClassName. Only PVs of the requested class, ones with the same storageClassName as the PVC, can be bound to the PVC.
PVCs don’t necessarily have to request a class. A PVC with its storageClassName set equal to "" is always interpreted to be requesting a PV with no class, so it can only be bound to PVs with no class (no annotation or one set equal to ""). A PVC with no storageClassName is not quite the same and is treated differently by the cluster, depending on whether the DefaultStorageClass admission plugin is turned on.
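The matching rules quoted above can be written down as a small model. This is a simplified Python sketch of those rules only, under stated assumptions: None stands for "field absent", default_class=None stands for "no default class, or the DefaultStorageClass admission plugin is off", and capacity, access modes and selectors are ignored. The function names are made up for this example.

```python
from typing import Optional


def effective_class(pvc_class: Optional[str], default_class: Optional[str]) -> str:
    """Resolve the class a PVC ends up requesting.

    pvc_class None means the PVC has no storageClassName at all; it then
    gets the default class if one is configured, else it behaves like "".
    """
    if pvc_class is None:
        return default_class or ""
    return pvc_class


def can_bind(pvc_class: Optional[str], pv_class: Optional[str],
             default_class: Optional[str] = None) -> bool:
    """True if the class rules alone would allow the PVC to bind to the PV."""
    # A PV with no class is treated the same as one with class "".
    return effective_class(pvc_class, default_class) == (pv_class or "")
```

For example, can_bind("standard", "fast") is False, which is the kind of mismatch that leaves a claim unbound and a fresh empty volume in use after the pod restarts.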

Scaling up GlusterFS storage only adds a new peer without new bricks in OpenShift

Observed behavior
I started with a one-node OpenShift cluster, and it successfully deployed the master/node and a Gluster volume. I then extended the OpenShift cluster, which also succeeded.
But when I extended the GlusterFS volume with the inventory and command below:
[glusterfs]
10.1.1.1 glusterfs_devices='[ "/dev/vdb" ]'
10.1.1.2 glusterfs_devices='[ "/dev/vdb" ]' openshift_node_labels="type=upgrade"
ansible-playbook -i inventory2.ini /usr/share/ansible/openshift-ansible/playbooks/openshift-glusterfs/config.yml -e openshift_upgrade_nodes_label="type=upgrade"
it only added 10.1.1.2 as a peer, but the volume still has only one brick.
The following customization was done to deploy Gluster starting from one node (--durability none):
openshift-ansible/roles/openshift_storage_glusterfs/tasks/heketi_init_db.yml
- name: Create heketi DB volume
  command: "{{ glusterfs_heketi_client }} setup-openshift-heketi-storage --image {{ glusterfs_heketi_image }} --listfile /tmp/heketi-storage.json --durability none"
  register: setup_storage
>gluster peer status
Number of Peers: 1
Hostname: 10.1.1.2
Uuid: 1b8159e4-99e2-4f4d-ad95-e97bc8655d32
State: Peer in Cluster (Connected)
gluster volume info
Volume Name: heketidbstorage
Type: Distribute
Volume ID: 769419b9-d28f-4cdd-a8f3-708b6b738f65
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.1.1.1:/var/lib/heketi/mounts/vg_4187bfa3eb090ceffea9c53b156ddbd4/brick_80401b43be8c3c8a74417b18ad574524/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
Expected/desired behavior
I expect that adding a new node also creates a new brick on it.
Details on how to reproduce (minimal and precise)
Add nodes in gluster cluster with below commands
ansible-playbook -i inventory2.ini /usr/share/ansible/openshift-ansible/playbooks/openshift-glusterfs/config.yml -e openshift_upgrade_nodes_label="type=upgrade"
Information about the environment:
Heketi version used (e.g. v6.0.0 or master): OpenShift 3.10
Operating system used: CentOS
Heketi compiled from sources, as a package (rpm/deb), or container: Container
If container, which container image: docker.io/heketi/heketi:latest
Using kubernetes, openshift, or direct install: Openshift
If kubernetes/openshift, is gluster running inside kubernetes/openshift or outside: outside
If kubernetes/openshift, how was it deployed (gk-deploy, openshift-ansible, other, custom): openshift-ansible
Just adding a node/server does not mean that a brick will also be added to the existing Gluster volume.
You have to add the brick hosted on the new node to the existing volume.
Command:
gluster volume add-brick <volume-name> <host>:<brick-path> force
Not sure whether you have this command in your automation script or not.
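To check which peers still contribute no brick to a volume, the gluster volume info output can be parsed. This is a rough Python sketch, assuming output in the format shown in the question (BrickN: host:/path lines) and IPv4 addresses or hostnames rather than IPv6; the function names are made up for this example.

```python
import re

# Matches lines such as "Brick1: 10.1.1.1:/var/lib/heketi/.../brick".
BRICK_LINE = re.compile(r"\s*Brick\d+:\s*([^:\s]+):(\S+)")


def parse_bricks(volume_info: str) -> list:
    """Return (host, path) pairs for every brick line in the output."""
    bricks = []
    for line in volume_info.splitlines():
        match = BRICK_LINE.match(line)
        if match:
            bricks.append((match.group(1), match.group(2)))
    return bricks


def hosts_without_bricks(volume_info: str, peers: list) -> list:
    """Return the peers that host no brick of the volume, in input order."""
    used = {host for host, _ in parse_bricks(volume_info)}
    return [peer for peer in peers if peer not in used]
```

With the output from the question and the peers 10.1.1.1 and 10.1.1.2, the second function reports 10.1.1.2, which matches the observed behaviour: the peer joined, but no brick was created on it.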

Cronjob of existing Pod

I have a django app running on Openshift 3. I need to run certain manage.py commands on a regular basis. In Openshift 2 I used the Cron gear and now in Openshift 3 I want to use the CronJob pod type.
I want to create a pod for the cronjob, use the same source as the django app is using, but not expose it.
For example:
W1 - Django app
D1 - Postgres DB
M1 - django app for manage.py jobs, run as a cronjob pod.
Any help is appreciated.
You want to use a scheduled job.
https://docs.openshift.com/container-platform/3.5/dev_guide/cron_jobs.html
https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/
https://blog.openshift.com/openshift-jobs/
Note that at this time (OpenShift 3.5), you have to use batch/v2alpha1 as the API version. Be careful of out of date documentation showing older version labels.
What I am not sure of is how you can easily reference the image associated with an existing imagestream, produced when you used the S2I builder to build your application, if you want to use that same image. The base Kubernetes object expects you to refer to the image in the image registry. You would thus need to work that out by looking at the imagestream and copying the image registry IP and image details over by hand.
UPDATE 1
See:
https://stackoverflow.com/a/45227960/128141
for details of how from OpenShift 3.6 you can have it resolve the imagestream name automatically. That mechanism is still alpha status in 3.6, but does work.
I got it to work by specifying the image name in the YAML. I then tried to make it part of the template, but ran into an error when using the batch/v1 version on this server:
Cannot create cron job "djangomanage". The API version batch/v1 for kind CronJob is not supported by this server.
My template code is
- apiVersion: batch/v1
  kind: CronJob
  metadata:
    name: djangomanage
  spec:
    schedule: "*/5 * * * *"
    jobTemplate:
      spec:
        template:
          spec:
            containers:
            - name: djangomanage
              image: '${NAME}:latest'
              env:
              - name: APP_SCRIPT
                value: "/opt/app-root/src/cron.sh"
            restartPolicy: Never
CRON.SH
python /opt/app-root/src/manage.py
You need to change line 1 to this:
- apiVersion: batch/v1beta1
See the link below:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#cronjob-v1beta1-batch
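Which apiVersion a server accepts for CronJob depends on the Kubernetes release underneath it. The Python sketch below encodes the upstream cut-offs as far as I know them (batch/v2alpha1 from 1.5, batch/v1beta1 from 1.8, batch/v1 from 1.21); treat the thresholds as an assumption and confirm against your own server with oc api-versions or kubectl api-versions. The function name is made up for this example.

```python
def cronjob_api_version(kube_minor: int) -> str:
    """Pick a CronJob apiVersion for a Kubernetes 1.<kube_minor> server.

    Thresholds are assumptions based on upstream Kubernetes releases;
    vendor builds may enable or disable API groups differently.
    """
    if kube_minor >= 21:
        return "batch/v1"
    if kube_minor >= 8:
        return "batch/v1beta1"
    if kube_minor >= 5:
        return "batch/v2alpha1"
    raise ValueError("CronJob is not available before Kubernetes 1.5")
```

This also explains why both answers here can be right: batch/v2alpha1 fits an OpenShift 3.5 era server, while batch/v1beta1 fits later 3.x releases that reject batch/v1.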

How to make oc cluster up persistent?

I'm using "oc cluster up" to start my OpenShift Origin environment. However, once I shut down the cluster, my projects aren't persisted at restart. Is there a way to make them persistent?
Thanks
Persisting resources is not a primary use case of oc cluster up, but there are a couple of ways to do it:
- Capture etcd as described in the oc cluster up README.
- Use a wrapper tool that makes this easy.
There is now an example in the cluster up --help output. It is bound to stay up to date, so check that first:
oc cluster up --help
...
Examples:
# Start OpenShift on a new docker machine named 'openshift'
oc cluster up --create-machine
# Start OpenShift using a specific public host name
oc cluster up --public-hostname=my.address.example.com
# Start OpenShift and preserve data and config between restarts
oc cluster up --host-data-dir=/mydata --use-existing-config
So specifically in v1.3.2 use --host-data-dir and --use-existing-config
Assuming you are using docker-machine with a VM such as VirtualBox, the easiest way I found is to take a VM snapshot WHILE the VM and the OpenShift cluster are up and running. This snapshot backs up memory in addition to disk, so you can restore the entire cluster later by restoring the VM snapshot and then running docker-machine start ...
By the way, as of the latest OpenShift image openshift/origin:v3.6.0-rc.0 and oc CLI, --host-data-dir=/mydata as suggested in the other answer doesn't work for me.
I'm using:
VirtualBox 5.1.26
Kubernetes v1.5.2+43a9be4
openshift v1.5.0+031cbe4
Using --host-data-dir (and the other directory options) didn't work for me:
oc cluster up --logging=true --metrics=true --docker-machine=openshift --use-existing-config=true --host-data-dir=/vm/data --host-config-dir=/vm/config --host-pv-dir=/vm/pv --host-volumes-dir=/vm/volumes
With output:
-- Checking OpenShift client ... OK
-- Checking Docker client ...
Starting Docker machine 'openshift'
Started Docker machine 'openshift'
-- Checking Docker version ...
WARNING: Cannot verify Docker version
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v1.5.0 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ...
Using Docker shared volumes for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
Using docker-machine IP 192.168.99.100 as the host IP
Using 192.168.99.100 as the server IP
-- Starting OpenShift container ...
Starting OpenShift using container 'origin'
FAIL
Error: could not start OpenShift container "origin"
Details:
Last 10 lines of "origin" container log:
github.com/openshift/origin/vendor/github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc4202a1600, 0x42b94c0, 0x1f, 0xc4214d9f08, 0x2, 0x2)
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/pkg/capnslog/pkg_logger.go:75 +0x16a
github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend.newBackend(0xc4209f84c0, 0x33, 0x5f5e100, 0x2710, 0xc4214d9fa8)
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend/backend.go:106 +0x341
github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend.NewDefaultBackend(0xc4209f84c0, 0x33, 0x461e51, 0xc421471200)
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend/backend.go:100 +0x4d
github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver.NewServer.func1(0xc4204bf640, 0xc4209f84c0, 0x33, 0xc421079a40)
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver/server.go:272 +0x39
created by github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver.NewServer
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver/server.go:274 +0x345
OpenShift writes to the directories /vm/... (also defined in VirtualBox) but won't start successfully.
See https://github.com/openshift/origin/issues/12602
Worked for me too, using Virtual Box Snapshots and restoring them.
To keep the cluster state after each shutdown, you need to provide the --base-dir parameter.
$ mkdir ~/openshift-config
$ oc cluster up --base-dir="$HOME/openshift-config"
From help
$ oc cluster up --help
...
Options:
--base-dir='': Directory on Docker host for cluster up configuration
--enable=[*]: A list of components to enable. '*' enables all on-by-default components, 'foo' enables the component named 'foo', '-foo' disables the component named 'foo'.
--forward-ports=false: Use Docker port-forwarding to communicate with origin container. Requires 'socat' locally.
--http-proxy='': HTTP proxy to use for master and builds
--https-proxy='': HTTPS proxy to use for master and builds
--image='openshift/origin-${component}:${version}': Specify the images to use for OpenShift
--no-proxy=[]: List of hosts or subnets for which a proxy should not be used
--public-hostname='': Public hostname for OpenShift cluster
--routing-suffix='': Default suffix for server routes
--server-loglevel=0: Log level for OpenShift server
--skip-registry-check=false: Skip Docker daemon registry check
--write-config=false: Write the configuration files into host config dir
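Because a tilde directly after = in an option such as --base-dir=~/openshift-config may be passed to oc unexpanded by the shell, one option is to build the path explicitly. This is a minimal Python sketch; cluster_up_args is a hypothetical helper, and the only oc flag it uses is the --base-dir shown in the help output above.

```python
import os


def cluster_up_args(base_dir: str) -> list:
    """Build the argument list for 'oc cluster up' with an absolute --base-dir."""
    # expanduser turns a leading "~" into the home directory;
    # abspath makes a relative path absolute.
    path = os.path.abspath(os.path.expanduser(base_dir))
    return ["oc", "cluster", "up", "--base-dir=" + path]
```

The returned list can be handed to subprocess.run(args, check=True), so the same directory is used on every start.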
But you shouldn't use it, because "cluster up" is removed in version 4.0.0. More here: https://github.com/openshift/origin/pull/21399