I'm trying to migrate an OpenShift v2 application to v3, and I'm really struggling to understand the documentation. In the section on persistent volumes, it says the following:
EmptyDir has the same lifecycle as the pod:
EmptyDir volumes survive container crashes/restarts.
EmptyDir volumes are deleted when the pod is deleted.
I cannot understand what this means. I have added storage to my app using the Web Console, which allowed me to add 1 GB of persistent storage and attach it to a particular mount point. I don't know whether this is an "EmptyDir" volume or not (I think it isn't, but in that case why the warning in the persistent volumes section?). Now, every time I rebuild the application, a new pod is created (if I understand this correctly), and so far my data has persisted when that happens. Does this warning mean it can suddenly be wiped out? Or is persistent storage actually persistent?
When you claim a persistent volume, you are not usually using an EmptyDir volume type, so that warning isn't relevant. EmptyDir is a special volume type, managed a bit differently, that you would normally only need if you want to share some temporary file system space between different containers in the same pod.
In short, the persistent storage is indeed persistent. You can see the difference with a simple experiment: deploy MySQL once with the "non-persistent" template and once with the "persistent" one.
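To make the distinction concrete, here is a minimal sketch of a pod that uses both kinds of volume; the claim name my-storage and the image are placeholders, not names OpenShift creates for you:

    apiVersion: v1
    kind: Pod
    metadata:
      name: volume-demo
    spec:
      containers:
      - name: app
        image: your-app-image            # placeholder image
        volumeMounts:
        - name: scratch
          mountPath: /tmp/scratch        # emptyDir: wiped when the pod is deleted
        - name: data
          mountPath: /opt/app-root/data  # PVC-backed: survives pod deletion
      volumes:
      - name: scratch
        emptyDir: {}                     # lives and dies with the pod
      - name: data
        persistentVolumeClaim:
          claimName: my-storage          # e.g. the 1 GB claim made in the Web Console

The warning in the documentation applies only to the emptyDir volume here; whatever is written under the PVC-backed mount point is there again when a new pod is created.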
OpenShift Online runs on Amazon infrastructure, so it is worth reading about AWS EBS volumes, which are the backbone of its persistent storage:
https://aws.amazon.com/ebs/getting-started/
EDIT
Think of it like this:
EBS volume -- ATTACHED TO --> pod -- WHICH CONTAINS --> containers
What the document means is that if you destroy the pod (think of it as a machine), the EBS volume is wiped and becomes available again as an unattached resource; its lifecycle is limited by the pod. However, a pod can contain multiple containers (say, JVMs), and each container can share the EBS volume (think of a hard disk), so killing a single container does not affect the volume's lifecycle.
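To illustrate the container-versus-pod distinction, here is a hypothetical pod with two containers mounting the same volume; if either container crashes and restarts, the file is still there, but deleting the pod discards it:

    apiVersion: v1
    kind: Pod
    metadata:
      name: shared-volume-demo
    spec:
      containers:
      - name: writer
        image: busybox                  # placeholder image
        command: ["sh", "-c", "echo hello > /shared/hello.txt && sleep 3600"]
        volumeMounts:
        - name: shared
          mountPath: /shared
      - name: reader
        image: busybox                  # placeholder image
        command: ["sh", "-c", "sleep 5; cat /shared/hello.txt; sleep 3600"]
        volumeMounts:
        - name: shared
          mountPath: /shared
      volumes:
      - name: shared
        emptyDir: {}                    # shared by both containers, deleted with the pod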
Related
Situation: I am using Tekton tasks to build and deploy, following this tutorial. After running the pipeline, it creates a pod that requires a persistent volume, and a persistent volume claim is automatically created to try to bind to a PV.
However, what if I want to run the pipeline again and again? I noticed that after a pipeline run (right now it is failing for unrelated reasons) the PVC is no longer needed, but the PV is left in a Released state. I can manually edit the PVC out of the YAML.
I looked into dynamic provisioning, but the plug-ins there do not seem to cover NFS, only other storage back ends with APIs.
Is there an option so that I do not have to manually reclaim my PV every time?
PersistentVolumes can be configured to automatically clean themselves up when released. You can configure this in your persistent volume definition at persistentvolume.spec.persistentVolumeReclaimPolicy.
Take a look at Reclaim Volumes in the official OpenShift documentation.
Retain reclaim policy allows manual reclamation of the resource for those volume plug-ins that support it.
Recycle reclaim policy recycles the volume back into the pool of unbound persistent volumes once it is released from its claim.
Delete reclaim policy deletes both the PersistentVolume object from OpenShift Container Platform and the associated storage asset in external infrastructure, such as AWS EBS or VMware vSphere.
In your case, you want to use Recycle; a sketch of a PV definition with that policy follows.
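As a sketch of what that looks like for an NFS-backed PV (the server, path, name and size below are placeholders), the policy is a single field on the PersistentVolume:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: tekton-workspace-pv                # placeholder name
    spec:
      capacity:
        storage: 5Gi                           # example size
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Recycle   # scrub the volume and return it to the pool when released
      nfs:
        server: nfs.example.com                # placeholder NFS server
        path: /exports/tekton                  # placeholder export path

Note that Recycle is only supported by a few volume plug-ins (NFS and hostPath among them), which happens to fit the NFS setup you describe.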
Background:
I've deployed a Spring Boot app to the OpenShift platform and would like to know how to handle persistent storage in OpenShift 3.
I've subscribed to the free plan and have access to the console.
I can use the oc command, but access seems limited under my user for commands like 'oc get pv' and others.
Question
How can I get finer control over my PVC (persistent volume claim) on OpenShift 3?
Ideally, I want a shell so that I can list the files on that volume.
Thanks in advance for your help!
Solution
Add storage to your pod (see the fragment after these steps)
Use the command oc rsh <my-pod> to get access to the pod
cd /path-to-your-storage/
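Step 1 usually amounts to a volume plus a volumeMount in the pod template of your deployment configuration. This is only a fragment with placeholder names, not a complete resource:

    # Fragment of a DeploymentConfig/Deployment pod template
    spec:
      template:
        spec:
          containers:
          - name: spring-boot-app
            volumeMounts:
            - name: app-data
              mountPath: /opt/app-root/data   # the path you cd into after oc rsh
          volumes:
          - name: app-data
            persistentVolumeClaim:
              claimName: my-claim             # the name of your PVC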
The oc get pv command can only be run by a cluster admin because it shows all the declared persistent volumes available in the cluster as a whole.
All you need to know is that in OpenShift Online Starter you have access to claim one persistent volume. The access mode of that persistent volume is ReadWriteOnce (RWO).
A persistent volume is not yours until you make a claim and so have a persistent volume claim (PVC) in your project. In order to see what is in the persistent volume, it has to be mounted in a pod, in other words, in use by an application. You can then get inside the pod and use normal UNIX commands to look at what is inside the volume.
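For reference, making that claim is just a small resource definition; the name and size here are only examples:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: my-claim          # example name
    spec:
      accessModes:
      - ReadWriteOnce         # the RWO access mode available on OpenShift Online Starter
      resources:
        requests:
          storage: 1Gi        # example size

Once this claim is bound and mounted into your application's pod, oc rsh into the pod and look at the mount path to see the files.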
For more details on persistent volumes, I suggest reading the chapter about storage in the free eBook at:
https://www.openshift.com/deploying-to-openshift/
How does OpenShift scale when using EBS for persistent storage? How does OpenShift map users to EBS volumes? Since it is infeasible to allocate one EBS volume to each user, how does OpenShift handle this in the backend using Kubernetes?
EBS volumes can only be mounted on a single node in a cluster at a time. This means you cannot scale an application that uses one beyond a single replica. Further, an application using an EBS volume cannot use the 'Rolling' deployment strategy, as that would require two replicas to exist while the new deployment is occurring. The deployment strategy therefore needs to be set to 'Recreate', as sketched below.
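As a sketch, the relevant fields on the deployment configuration would look something like this (only a fragment, showing the replica count and strategy):

    # Fragment of a DeploymentConfig; only the fields discussed above are shown
    spec:
      replicas: 1            # an RWO (EBS-backed) volume limits the app to a single replica
      strategy:
        type: Recreate       # stop the old pod before starting the new one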
Subject to those restrictions on your deployed application, which has claimed a volume of type EBS, there are no problems with using EBS volumes as an underlying storage type. Kubernetes will quite happily map the volume into the pod for your application. If that pod dies and gets started on a different node, Kubernetes will mount the volume in the pod on the new node instead, so that your storage follows the application.
If you give up a volume claim, its contents are wiped and it is returned to the pool of available volumes. A subsequent claim by you or a different user can then get that volume and it would be applied to the pod for the new application.
This is all handled automatically and works without problems. It is a bit hard to understand exactly what you are asking, but hopefully this gives you a better picture.
Is it reasonable to use Kubernetes for a clustered database such as MySQL in production environment?
There are example configurations, such as the mysql galera example. However, most examples do not make use of persistent volumes. As far as I understand, persistent volumes must reside on some shared file system, as listed in Kubernetes types of persistent volumes. A shared file system will not guarantee that the database files of the pod are local to the machine hosting the pod; they will be accessed over the network, which is rather slow. Moreover, there are known issues with MySQL over NFS, for example.
This might be acceptable for a test environment. However, what should I do in a production environment? Is it better to run the database cluster outside Kubernetes and run only application servers with Kubernetes?
The Kubernetes project has introduced PetSets, a new pod management abstraction intended for running stateful applications. It is an alpha feature at present (as of version 1.4) and moving rapidly. The various issues to be resolved on the way to beta are listed here. Quoting from the section on when to use PetSets:
A PetSet ensures that a specified number of "pets" with unique identities are running at any given time. The identity of a Pet is comprised of:
a stable hostname, available in DNS
an ordinal index
stable storage: linked to the ordinal & hostname
In addition to the above, PetSets can be combined with several other features that help to deploy and manage clustered stateful applications. Coupled with dynamic volume provisioning, for example, they can be used to provision storage automatically, as sketched below.
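As a rough sketch, and assuming the alpha apps/v1alpha1 API that PetSets used around Kubernetes 1.3/1.4 (field names changed in later releases, where PetSets became StatefulSets), a database PetSet with per-pet storage looks roughly like this; the image, sizes, password and names are placeholders:

    apiVersion: apps/v1alpha1
    kind: PetSet
    metadata:
      name: mysql
    spec:
      serviceName: mysql               # headless service that provides the stable DNS hostnames
      replicas: 3
      template:
        metadata:
          labels:
            app: mysql
        spec:
          containers:
          - name: mysql
            image: mysql:5.6           # example image
            env:
            - name: MYSQL_ROOT_PASSWORD
              value: example-password  # example only
            volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
      volumeClaimTemplates:            # one PVC per pet, tied to its ordinal and hostname
      - metadata:
          name: data
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi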
There are several YAML configuration files available (such as the ones you referenced) that use ReplicaSets and Deployments for MySQL and other databases; these may be run in production and probably are being run that way. However, PetSets are expected to make it much easier to run these types of workloads, while supporting upgrades, maintenance, scaling and so on.
You can find some examples of distributed databases with PetSets here.
The advantage of provisioning persistent volumes which are networked and non-local (such as GlusterFS) is realized at scale. However, for relatively small clusters, there is a proposal to allow for local storage persistent volumes in the future.
I'm a newbie at Kubernetes, and I'm having trouble understanding how I can run persistent pods (Cassandra or MySQL, for example) on Ubuntu servers.
Correct me if I'm wrong: Kubernetes can scale pods up or down when it sees that we need more CPU, but we are not talking about static code here, rather about data that is present on other nodes. So what will the pod do when it receives a request from the load balancer? Also, Kubernetes has the power to destroy nodes when it sees that traffic has dropped, so how can we avoid losing data and disturbing the environment?
You should use volumes to map a directory in the container to persistent disks on the host or to other storage.
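As a minimal sketch of that idea, assuming a single-node setup where a host directory is good enough (the image, password and paths are placeholders), a pod could map MySQL's data directory to a directory on the Ubuntu host:

    apiVersion: v1
    kind: Pod
    metadata:
      name: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:5.6                # example image
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: example-password       # example only
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql     # where MySQL keeps its data inside the container
      volumes:
      - name: data
        hostPath:
          path: /srv/mysql-data         # directory on the Ubuntu host's disk

In a real cluster a persistentVolumeClaim is usually preferable to hostPath, so that the data is not tied to whichever node the pod happens to land on.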