GKE ReadOnlyMany Persistent Volume with ReadOnlyMany Claims in Multiple Namespaces

I have a disk image with mirrors of some protein databases (HHsearch, BLAST, PDB, etc.) that I build with some CI tooling and write to a GCE disk to run against. I'd like to access this ReadOnlyMany PV in Pods created by ReplicationControllers in multiple namespaces via PersistentVolumeClaims, but I'm not getting the expected result.
The PersistentVolume configuration looks like this:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "databases"
spec:
  capacity:
    storage: 500Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  gcePersistentDisk:
    pdName: "databases-us-central1-b-kube"
    fsType: "ext4"
Here is how it looks when loaded into Kubernetes:
$ kubectl describe pv
Name:           databases
Labels:         <none>
Status:         Bound
Claim:          production/databases
Reclaim Policy: Retain
Access Modes:   ROX
Capacity:       500Gi
Message:
Source:
    Type:      GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
    PDName:    databases-us-central1-b-kube
    FSType:    ext4
    Partition: 0
    ReadOnly:  false
The PVC configurations are all identical and look like this:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: databases
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage:
  volumeName: databases
And here are the PVCs as they look in the system:
$ for ns in {development,staging,production}; do kubectl describe --namespace=$ns pvc; done

Name:         databases
Namespace:    development
Status:       Pending
Volume:       databases
Labels:       <none>
Capacity:     0
Access Modes:

Name:         databases
Namespace:    staging
Status:       Pending
Volume:       databases
Labels:       <none>
Capacity:     0
Access Modes:

Name:         databases
Namespace:    production
Status:       Bound
Volume:       databases
Labels:       <none>
Capacity:     0
Access Modes:
When I run $ kubectl get events --all-namespaces, I'm seeing lots of timeout expired waiting for volumes to attach/mount for pod "mypod-anid"/"[namespace]". list of unattached/unmounted volumes=[databases] events.
When I scale the RC from 1 to 2 in production (where one pod did manage to bind the PV), the second Pod fails to mount the same PVC. When I create a second ReplicationController and PersistentVolumeClaim in my production namespace (recall that this is where the pod that successfully mounted the PV lives), backed by the same PersistentVolume, the second Pod/PVC cannot bind.
Am I missing something? How is one supposed to actually use an ROX PersistentVolume with PersistentVolumeClaims?

A single PV can only be bound to a single PVC at a given time, regardless of whether it is ReadOnlyMany or not (once a PV/PVC pair binds, that PV can't bind to any other PVC).
Once a PV/PVC pair is bound, a ReadOnlyMany PVC may be referenced from multiple pods. In Peter's case, however, he can't use a single PVC object, since he is trying to refer to it from multiple namespaces (PVCs are namespaced while PV objects are not).
To make this scenario work, create multiple PV objects that are identical (referring to the same disk) except for the name. This allows each PVC object (one per namespace) to find a PV object to bind to.
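For example, a minimal sketch of what that could look like for the development and staging namespaces (the extra PV names here are illustrative; the disk and settings are copied from the question):

# Illustrative PV for the development namespace's claim; identical to the
# original "databases" PV except for the name.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "databases-development"
spec:
  capacity:
    storage: 500Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  gcePersistentDisk:
    pdName: "databases-us-central1-b-kube"
    fsType: "ext4"
---
# Illustrative PV for the staging namespace's claim; same disk again,
# different name.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "databases-staging"
spec:
  capacity:
    storage: 500Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  gcePersistentDisk:
    pdName: "databases-us-central1-b-kube"
    fsType: "ext4"

Each namespace's PVC would then set volumeName to its own PV (for example databases-development), while the original databases PV stays bound to the production claim.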

Related

GKE Kubernetes MySQL Input/output error Ext4Error

I have deployed a MySQL database (StatefulSet) on a zonal Kubernetes cluster running as a service (GKE) in Google Cloud Platform.
The zonal cluster consists of 3 instances of type e2-medium.
The MySQL container cannot start due to the following error.
kubectl logs mysql-statefulset-0
2022-02-07 05:55:38+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.35-1debian10 started.
find: '/var/lib/mysql/': Input/output error
Last seen events:
4m57s Warning Ext4Error gke-cluster-default-pool-rnfh kernel-monitor, gke-cluster-default-pool-rnfh EXT4-fs error (device sdb): __ext4_find_entry:1532: inode #2: comm mysqld: reading directory lblock 0 40d 8062 gke-cluster-default-pool-rnfh
3m22s Warning BackOff pod/mysql-statefulset-0 spec.containers{mysql} kubelet, gke-cluster-default-pool-rnfh Back-off restarting failed container
Nodes:
kubectl get node -owide
gke-cluster-default-pool-ayqo Ready <none> 54d v1.21.5-gke.1302 So.Me.I.P So.Me.I.P Container-Optimized OS from Google 5.4.144+ containerd://1.4.8
gke-cluster-default-pool-rnfh Ready <none> 54d v1.21.5-gke.1302 So.Me.I.P So.Me.I.P Container-Optimized OS from Google 5.4.144+ containerd://1.4.8
gke-cluster-default-pool-sc3p Ready <none> 54d v1.21.5-gke.1302 So.Me.I.P So.Me.I.P Container-Optimized OS from Google 5.4.144+ containerd://1.4.8
I also noticed that the rnfh node is out of memory.
kubectl top node
NAME                            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
gke-cluster-default-pool-ayqo   117m         12%    992Mi           35%
gke-cluster-default-pool-rnfh   180m         19%    2953Mi          104%
gke-cluster-default-pool-sc3p   179m         19%    1488Mi          52%
MySQL manifest:
# HEADLESS SERVICE
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless-service
  labels:
    kind: mysql-headless-service
spec:
  clusterIP: None
  selector:
    tier: mysql-db
  ports:
    - name: 'mysql-http'
      protocol: 'TCP'
      port: 3306
---
# STATEFUL SET
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql-statefulset
spec:
  selector:
    matchLabels:
      tier: mysql-db
  serviceName: mysql-statefulset
  replicas: 1
  template:
    metadata:
      labels:
        tier: mysql-db
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: my-mysql
          image: my-mysql:latest
          imagePullPolicy: Always
          args:
            - "--ignore-db-dir=lost+found"
          ports:
            - name: 'http'
              protocol: 'TCP'
              containerPort: 3306
          volumeMounts:
            - name: mysql-pvc
              mountPath: /var/lib/mysql
          env:
            - name: MYSQL_ROOT_USER
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: mysql-root-username
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: mysql-root-password
            - name: MYSQL_USER
              valueFrom:
                configMapKeyRef:
                  name: mysql-config
                  key: mysql-username
            - name: MYSQL_PASSWORD
              valueFrom:
                configMapKeyRef:
                  name: mysql-config
                  key: mysql-password
            - name: MYSQL_DATABASE
              valueFrom:
                configMapKeyRef:
                  name: mysql-config
                  key: mysql-database
  volumeClaimTemplates:
    - metadata:
        name: mysql-pvc
      spec:
        storageClassName: 'mysql-fast'
        resources:
          requests:
            storage: 120Gi
        accessModes:
          - ReadWriteOnce
          - ReadOnlyMany
MySQL storage class manifest:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mysql-fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: Immediate
Why is Kubernetes trying to schedule the pod on a node that is out of memory?
UPDATES
I've added requests and limits to the MySQL manifest to improve the QoS class. Now the QoS class is Guaranteed.
Unfortunately, Kubernetes is still trying to schedule onto the out-of-memory rnfh node.
kubectl describe po mysql-statefulset-0 | grep node -i
Node: gke-cluster-default-pool-rnfh/So.Me.I.P
kubectl describe po mysql-statefulset-0 | grep qos -i
QoS Class: Guaranteed
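For context, the Guaranteed QoS class requires every container in the pod to have CPU and memory requests equal to its limits; a minimal sketch of that addition (the values are illustrative, not the OP's actual numbers) is:

# Illustrative values only: requests must equal limits on every container
# for Kubernetes to assign the Guaranteed QoS class.
containers:
  - name: my-mysql
    image: my-mysql:latest
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
      limits:
        cpu: "500m"
        memory: "1Gi"

Note that the Guaranteed QoS class by itself does not keep the pod off a busy node: the scheduler compares the pod's requests against the node's allocatable resources, not its current usage.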
I ran a few more tests but I couldn't replicate this.
To answer this one correctly, we would need many more logs; I'm not sure if you still have them. If I had to guess at the root cause of this issue, I would say it is connected with the PersistentVolume.
In one of the GitHub issues - Volume was remounted as read only after error #752 - I found behavior very similar to the OP's.
You have created a dedicated StorageClass for your MySQL and set reclaimPolicy: Retain, so the PV was not removed. When the StatefulSet pod (with the same -0 suffix) was recreated (restarted due to a connectivity error, some issue in the DB, hard to say), it tried to re-claim this volume. In the mentioned GitHub issue the user had a very similar situation: they also got the inode #262147: comm mysqld: reading directory lblock issue, but below it there was also the entry [ +0.003695] EXT4-fs (sda): Remounting filesystem read-only. Maybe it changed permissions when re-mounted?
Another thing is that your volumeClaimTemplates contained
accessModes:
  - ReadWriteOnce
  - ReadOnlyMany
so one PersistentVolume could be used as ReadWriteOnce by a single node or as ReadOnlyMany by many nodes. There is a possibility that the pod was recreated on a different node with the read-only access mode.
[ +35.912075] EXT4-fs warning (device sda): htree_dirblock_to_tree:977: inode #2: lblock 0: comm mysqld: error -5 reading directory block
[ +6.294232] EXT4-fs error (device sda): ext4_find_entry:1436: inode #262147: comm mysqld: reading directory lblock ...
[ +0.005226] EXT4-fs error (device sda): ext4_find_entry:1436: inode #2: comm mysqld: reading directory lblock 0
[ +1.666039] EXT4-fs error (device sda): ext4_journal_check_start:61: Detected aborted journal
[ +0.003695] EXT4-fs (sda): Remounting filesystem read-only
That would fit the OP's comment:
Two days ago, for reasons unknown to me, Kubernetes restarted the container and kept trying to run it on the rnfa machine. The container was probably evicted from another node.
Another thing is that the node or cluster might have been updated (depending on whether the auto-update option was turned on), which could force a restart of the pod.
The issue with '/var/lib/mysql/': Input/output error might point to database corruption, as mentioned here.
In general, the issue has been resolved by cordoning the affected node. Additional information about the difference between cordon and drain can be found here.
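For reference, cordoning marks a node unschedulable so the recreated pod lands on a different node, for example:

# Mark the affected node unschedulable; existing pods keep running,
# but the scheduler will not place new pods on it.
kubectl cordon gke-cluster-default-pool-rnfh

# To allow scheduling on the node again later:
kubectl uncordon gke-cluster-default-pool-rnfh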
Just as an addition: to assign pods to a specific node, or to a node with a specified label, you can use affinity.
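A minimal sketch of required node affinity in the pod template, assuming a hypothetical node label disktype=ssd, could look like this:

# Illustrative only: restricts scheduling to nodes carrying the
# (hypothetical) label disktype=ssd.
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd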

MySQL statefulset deployment with persistence volume not working

Environment:
Using Helm v3 charts for deployment.
Using Bitnami MySQL charts from the following source: https://artifacthub.io/packages/helm/bitnami/mysql
The username and password are generated randomly for every deployment.
The PersistentVolume is created with the "standard" storage class.
The entire k8s cluster runs on bare metal.
When deploying for the first time, the installation succeeds and all the MySQL db files are generated appropriately in the data directory. Uninstalling and then installing again does not work, because the MySQL data directory still contains db and config files for the previously generated credentials.
As of now, after an uninstall, we manually clear the files from the node before installing again.
We are looking for a way to:
Clear only the config files and keep the db files intact.
On deleting the persistent volume, clear all the files from the data directory.
Alter the DB with a new username and password so the existing files can be used as-is.
PersistentVolume/PVC K8s YAML:
kind: PersistentVolume
apiVersion: v1
metadata:
  name: mysql-pv
  labels:
    type: local
    name: mysql-pv
spec:
  storageClassName: standard
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/tmp/"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
  labels:
    name: mysql-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 2Gi
  selector:
    matchLabels:
      name: mysql-pv

How to limit access to pv

In OpenShift, a PV is available across all projects, and if you can see it, you can claim it.
Is there a way to make a PV available only to certain apps/projects, so that others are forbidden to use it?
A StorageClass does not seem to fit this requirement.
You can pre-bind the volume and claim [1]:
You may also want your cluster administrator to "reserve" the volume for only your claim so that nobody else’s claim can bind to it before yours does. In this case, the administrator can specify the PVC in the PV using the claimRef field. The PV will only be able to bind to a PVC that has the same name and namespace specified in claimRef. The PVC’s access modes and resource requests must still be satisfied in order for the PV and PVC to be bound, though the label selector is ignored.
If you know exactly what PersistentVolume you want your PersistentVolumeClaim to bind to, you can specify the PV in your PVC using the volumeName field. This method skips the normal matching and binding process. The PVC will only be able to bind to a PV that has the same name specified in volumeName. If such a PV with that name exists and is Available, the PV and PVC will be bound regardless of whether the PV satisfies the PVC’s label selector, access modes, and resource requests.
PersistentVolume Example:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    path: /tmp
    server: 172.17.0.2
  persistentVolumeReclaimPolicy: Recycle
  claimRef:
    name: claim1
    namespace: default
PersistentVolumeClaim example:
apiVersion: "v1"
kind: "PersistentVolumeClaim"
metadata:
name: "claim1"
spec:
accessModes:
- "ReadWriteOnce"
resources:
requests:
storage: "1Gi"
volumeName: "pv0001"
[1] https://docs.openshift.org/latest/dev_guide/persistent_volumes.html#persistent-volumes-volumes-and-claim-prebinding

Can't Share a Persistent Volume Claim for an EBS Volume between Apps

Is it possible to share a single persistent volume claim (PVC) between two apps (each using a pod)?
I read: Share persistent volume claims amongst containers in Kubernetes/OpenShift but didn't quite get the answer.
I tried to add a PHP app and a MySQL app (with persistent storage) within the same project. I deleted the original persistent volume (PV) and created a new one with read-write-many access mode. I set the root password of the MySQL database, and the database works.
Then I added storage to the PHP app using the same persistent volume claim with a different subpath. I found that I can't turn on both apps: after I turn one on and then try to turn on the other, it gets stuck at creating the container.
MySQL .yaml from the deployment step in OpenShift:
...
template:
  metadata:
    creationTimestamp: null
    labels:
      name: mysql
  spec:
    volumes:
      - name: mysql-data
        persistentVolumeClaim:
          claimName: mysql
    containers:
      - name: mysql
        ...
        volumeMounts:
          - name: mysql-data
            mountPath: /var/lib/mysql/data
            subPath: mysql/data
        ...
        terminationMessagePath: /dev/termination-log
        imagePullPolicy: IfNotPresent
    restartPolicy: Always
    terminationGracePeriodSeconds: 30
    dnsPolicy: ClusterFirst
PHP .yaml from the deployment step:
template:
  metadata:
    creationTimestamp: null
    labels:
      app: wiki2
      deploymentconfig: wiki2
  spec:
    volumes:
      - name: volume-959bo    <<----
        persistentVolumeClaim:
          claimName: mysql
    containers:
      - name: wiki2
        ...
        volumeMounts:
          - name: volume-959bo
            mountPath: /opt/app-root/src/w/images
            subPath: wiki/images
        terminationMessagePath: /dev/termination-log
        imagePullPolicy: Always
    restartPolicy: Always
    terminationGracePeriodSeconds: 30
    dnsPolicy: ClusterFirst
    securityContext: {}
The volume mount names are different, but that shouldn't prevent the two pods from sharing the PVC. Or is the problem that they can't both mount the same volume at the same time? I can't get the termination log at /dev because if the pod can't mount the volume it doesn't start, and I can't get the log.
The PVC's .yaml (oc get pvc -o yaml)
apiVersion: v1
items:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      annotations:
        pv.kubernetes.io/bind-completed: "yes"
        pv.kubernetes.io/bound-by-controller: "yes"
        volume.beta.kubernetes.io/storage-class: ebs
        volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
      creationTimestamp: YYYY-MM-DDTHH:MM:SSZ
      name: mysql
      namespace: abcdefghi
      resourceVersion: "123456789"
      selfLink: /api/v1/namespaces/abcdefghi/persistentvolumeclaims/mysql
      uid: ________-____-____-____-____________
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 1Gi
      volumeName: pvc-________-____-____-____-____________
    status:
      accessModes:
        - ReadWriteMany
      capacity:
        storage: 1Gi
      phase: Bound
kind: List
metadata: {}
resourceVersion: ""
selfLink: ""
Suspicious entries from oc get events:
Warning FailedMount {controller-manager }
Failed to attach volume "pvc-________-____-____-____-____________"
on node "ip-172-__-__-___.xx-xxxx-x.compute.internal"
with:
Error attaching EBS volume "vol-000a00a00000000a0" to instance
"i-1111b1b11b1111111": VolumeInUse: vol-000a00a00000000a0 is
already attached to an instance
Warning FailedMount {kubelet ip-172-__-__-___.xx-xxxx-x.compute.internal}
Unable to mount volumes for pod "the pod for php app":
timeout expired waiting for volumes to attach/mount for pod "the pod".
list of unattached/unmounted volumes=
[volume-959bo default-token-xxxxx]
I tried to:
turn on the MySQL app first, and then try to turn on the PHP app
found the PHP app can't start
turn off both apps
turn on the PHP app first, and then try to turn on the MySQL app
found the MySQL app can't start
The strange thing is that the event log never says it can't mount the volume for the MySQL app.
The remaining volume to mount is either default-token-xxxxx or volume-959bo (the volume name in the PHP app), but never mysql-data (the volume name in the MySQL app).
So the error seems to be caused by the underlying storage you are using, in this case EBS. The OpenShift docs actually specifically state that this is the case for block storage; see here.
I know this will work for both NFS and GlusterFS storage, and I have done this in numerous projects using these storage types, but unfortunately, in your case it's not supported.
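As an illustration only (the server address, export path, and names below are hypothetical), an NFS-backed PV plus a ReadWriteMany claim that both the MySQL and PHP pods could mount at the same time might look like this:

# Hypothetical NFS-backed PV: NFS supports ReadWriteMany, so a single claim
# bound to it can be mounted by pods running on different nodes simultaneously.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-data
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  nfs:
    path: /exports/shared
    server: 192.168.1.10
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi

Both deployments would then reference claimName: shared-data in their volumes section, as in the manifests above.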

Openshift nodes storage configuration

I'm looking for information about best-practice requirements for OpenShift storage on nodes that will run containers, but I haven't found a clear answer.
My questions are:
- Is any shared storage mandatory for all nodes?
- Can I control the directory where images will be placed?
- Must the NFS directories that will be accessed by containers already be mounted on the node server?
I've been looking for information about this, and these are my conclusions:
If you need persistent storage (for example for a DB, a Jenkins master, or any kind of storage you want to keep across container restarts), then you have to mount the storage on the nodes that can run the containers requiring that persistent storage.
Mount on those nodes any of these:
NFS, HostPath (single-node testing only, and of course already mounted), GlusterFS, Ceph, OpenStack Cinder, AWS Elastic Block Store (EBS), GCE Persistent Disk, iSCSI, Fibre Channel
Create persistent volumes in Openshift
OpenShift NFS example, creating a file.yaml file:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0003
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /tmp
    server: 172.17.0.2
Create it from the file:
oc create -f file.yaml
Create a claim for the storage; claims will search for available persistent volumes with the required capacity.
A claim is then used by pods.
For example, let's claim 1Gi; later we will associate the claim with a pod.
Create nfs-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-claim1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
Create it from the file:
oc create -f nfs-claim.yaml
Create a pod with the storage volume and the claim:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-nfs-pod
  labels:
    name: nginx-nfs-pod
spec:
  containers:
    - name: nginx-nfs-pod
      image: fedora/nginx
      ports:
        - name: web
          containerPort: 80
      volumeMounts:
        - name: nfsvol
          mountPath: /usr/share/nginx/html
  volumes:
    - name: nfsvol
      persistentVolumeClaim:
        claimName: nfs-claim1
Some extra options like SELinux settings may be required, but they are well explained here: https://docs.openshift.org/latest/install_config/storage_examples/shared_storage.html
Is any shared storage mandatory for all nodes?
No, shared storage is not mandatory, but it is highly recommended, as most applications will require some stateful storage, which can only really be obtained with a shared storage provider. The options for such storage providers are listed at https://docs.openshift.org/latest/install_config/persistent_storage/index.html.