How to destroy a pod with status "Unknown" in OpenShift?

My pods are showing status as Unknown, and attempts to scale them down to 0 have failed from the command line as well as the UI.
This is also preventing me from starting new pods. See the error below:
Unable to mount volumes for pod
"mysql-1-v9lcr_cloud-apps(1e9beb72-e3cf-11e8-943e-02ec8e61afcf)":
timeout expired waiting for volumes to attach or mount for pod
"cloud-apps"/"mysql-1-v9lcr". list of unmounted volumes=[mysql-data].
list of unattached volumes=[mysql-data default-token-zsbnc]
Location: starter-ca-central-1
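For what it's worth, the usual way to remove a pod stuck in Unknown is a force delete; a minimal sketch, using the pod and project names from the error above (adjust to your own):
oc delete pod mysql-1-v9lcr -n cloud-apps --grace-period=0 --force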

Related

Unable to create a new app using an image from openshift internal registry

I have an nginx image and I am able to push it to the OpenShift internal registry. However, when I try to use that image from the internal registry to create an app, it gives me an ImagePullBackOff error.
Below are the steps which I am following.
[root@artel1 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/nginx latest 231d40e811cd 4 weeks ago 126 MB
[root@artel1 ~]# docker tag 231d40e811cd docker-registry-default.router.default.svc.cluster.local/openshift/nginx
[root@artel1 ~]# docker push docker-registry-default.router.default.svc.cluster.local/openshift/nginx
[root@artel1 ~]# oc new-app --docker-image=docker-registry-default.router.default.svc.cluster.local/openshift/test-image
W1227 10:18:34.761105 33535 dockerimagelookup.go:233] Docker registry lookup failed: Get https://docker-registry-default.router.default.svc.cluster.local/v2/: x509: certificate signed by unknown authority
W1227 10:18:34.784988 33535 newapp.go:479] Could not find an image stream match for "docker-registry-default.router.default.svc.cluster.local/openshift/test-image:latest". Make sure that a Docker image with that tag is available on the node for the deployment to succeed.
--> Found Docker image 7809d84 (8 days old) from docker-registry-default.router.default.svc.cluster.local for "docker-registry-default.router.default.svc.cluster.local/openshift/test-image:latest"
OpenShift Node
--------------
This is a component of OpenShift and contains the software for individual nodes when using SDN.
Tags: openshift, node
* This image will be deployed in deployment config "test-image"
* Ports 53/tcp, 8443/tcp will be load balanced by service "test-image"
* Other containers can access this service through the hostname "test-image"
* WARNING: Image "docker-registry-default.router.default.svc.cluster.local/openshift/test-image:latest" runs as the 'root' user which may not be permitted by your cluster administrator
--> Creating resources ...
deploymentconfig.apps.openshift.io "test-image" created
service "test-image" created
--> Success
Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
'oc expose svc/test-image'
Run 'oc status' to view your app.
Event logs
34s 47s 2 test-image-1-dzhmk.15e44d430e48ec8d Pod spec.containers{test-image} Normal Pulling kubelet, artel2.fyre.ibm.com pulling image "docker-registry-default.router.default.svc.cluster.local/openshift/test-image:latest"
34s 46s 2 test-image-1-dzhmk.15e44d4318ec7f53 Pod spec.containers{test-image} Warning Failed kubelet, artel2.fyre.ibm.com Failed to pull image "docker-registry-default.router.default.svc.cluster.local/openshift/test-image:latest": rpc error: code = Unknown desc = Error: image openshift/test-image:latest not found
34s 46s 2 test-image-1-dzhmk.15e44d4318ed5311 Pod spec.containers{test-image} Warning Failed kubelet, artel2.fyre.ibm.com Error: ErrImagePull
27s 46s 7 test-image-1-dzhmk.15e44d433c24e5c9 Pod Normal SandboxChanged kubelet, artel2.fyre.ibm.com Pod sandbox changed, it will be killed and re-created.
25s 43s 6 test-image-1-dzhmk.15e44d43dd6a7b57 Pod spec.containers{test-image} Warning Failed kubelet, artel2.fyre.ibm.com Error: ImagePullBackOff
25s 43s 6 test-image-1-dzhmk.15e44d43dd6a10d9 Pod spec.containers{test-image} Normal BackOff kubelet, artel2.fyre.ibm.com Back-off pulling image "docker-registry-default.router.default.svc.cluster.local/openshift/test-image:latest"
Pod status
[root@artel1 ~]# oc get po
NAME READY STATUS RESTARTS AGE
test-image-1-deploy 1/1 Running 0 3m
test-image-1-dzhmk 0/1 ImagePullBackOff 0 3m
Where exactly are things going wrong?
It looks like 'docker push' hasn't completed successfully; it should return 'Image successfully pushed'.
Try to log in to the internal registry first (see accessing_registry), and recheck the registry's service hostname or use the service IP.
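A rough sketch of the login-then-push sequence, assuming the registry hostname from the question and that your user can push to the openshift project (the master URL is a placeholder):
oc login https://<master>:8443 -u <user>
docker login -u $(oc whoami) -p $(oc whoami -t) docker-registry-default.router.default.svc.cluster.local
docker push docker-registry-default.router.default.svc.cluster.local/openshift/nginx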

route to application stopped working in OpenShift Online 3.9

I have an application running in OpenShift Online Starter which has worked for the last 5 months: a single pod behind a service, with a route defined that does edge TLS termination.
Since Saturday, when trying to access the application, I get the error message
Application is not available
The application is currently not serving requests at this endpoint. It may not have been started or is still starting.
Possible reasons you are seeing this page:
The host doesn't exist. Make sure the hostname was typed correctly and that a route matching this hostname exists.
The host exists, but doesn't have a matching path. Check if the URL path was typed correctly and that the route was created using the desired path.
Route and path matches, but all pods are down. Make sure that the resources exposed by this route (pods, services, deployment configs, etc) have at least one pod running.
The pod is running; I can exec into it and check this, and I can port-forward to it and access it.
Checking the different components with oc:
$ oc get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE
taboo3-23-jt8l8 1/1 Running 0 1h 10.128.37.90 ip-172-31-30-113.ca-central-1.compute.internal
$ oc get svc
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
taboo3 172.30.238.44 <none> 8080/TCP 151d
$ oc describe svc taboo3
Name: taboo3
Namespace: sothawo
Labels: app=taboo3
Annotations: openshift.io/generated-by=OpenShiftWebConsole
Selector: deploymentconfig=taboo3
Type: ClusterIP
IP: 172.30.238.44
Port: 8080-tcp 8080/TCP
Endpoints: 10.128.37.90:8080
Session Affinity: None
Events: <none>
$ oc get route
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
taboo3 taboo3-sothawo.193b.starter-ca-central-1.openshiftapps.com taboo3 8080-tcp edge/Redirect None
I tried to add a new route as well (with or without TLS), but am getting the same error.
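For reference, recreating the edge route from the CLI looks roughly like this (the route name here is illustrative):
oc create route edge taboo3-edge --service=taboo3 --port=8080-tcp --insecure-policy=Redirect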
Does anybody have an idea what might be causing this and how to fix it?
Addition April 17, 2018: Got an email from OpenShift Online support:
It looks like you may be affected by this bug.
So waiting for it to be resolved.
The problem has been resolved by OpenShift Online; the application is working again.

openshift v3 online pro volume and memory limit issues

I am trying to run sonatype/nexus3 on OpenShift Online v3 Pro. If I just use the web console to create a new app from the image, it assigns it only 512Mi and it dies with OOM. It did get created, though, and logged a lot of Java output before it died of out of memory. When using the web console there doesn't appear to be a way to set the memory on the image, and when I try to edit the YAML of the pod it doesn't let me edit the memory limit.
Reading the docs about memory limits suggests that I can run it with this:
oc run nexus333 --image=sonatype/nexus3 --limits=memory=750Mi
Then it doesn't even start. It dies with:
{kubelet ip-172-31-59-148.ec2.internal} Error: Error response from
daemon: {"message":"create
c30deb38b3c26252bf1218cc898fbf1c68d8fc14e840076710c211d58ed87a59:
mkdir
/var/lib/docker/volumes/c30deb38b3c26252bf1218cc898fbf1c68d8fc14e840076710c211d58ed87a59:
permission denied"}
More information from oc get events:
FIRSTSEEN LASTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
16m 16m 1 nexus333-1-deploy Pod Normal Scheduled {default-scheduler } Successfully assigned nexus333-1-deploy to ip-172-31-50-97.ec2.internal
16m 16m 1 nexus333-1-deploy Pod spec.containers{deployment} Normal Pulling {kubelet ip-172-31-50-97.ec2.internal} pulling image "registry.reg-aws.openshift.com:443/openshift3/ose-deployer:v3.6.173.0.21"
16m 16m 1 nexus333-1-deploy Pod spec.containers{deployment} Normal Pulled {kubelet ip-172-31-50-97.ec2.internal} Successfully pulled image "registry.reg-aws.openshift.com:443/openshift3/ose-deployer:v3.6.173.0.21"
15m 15m 1 nexus333-1-deploy Pod spec.containers{deployment} Normal Created {kubelet ip-172-31-50-97.ec2.internal} Created container
15m 15m 1 nexus333-1-deploy Pod spec.containers{deployment} Normal Started {kubelet ip-172-31-50-97.ec2.internal} Started container
15m 15m 1 nexus333-1-rftvd Pod Normal Scheduled {default-scheduler } Successfully assigned nexus333-1-rftvd to ip-172-31-59-148.ec2.internal
15m 14m 7 nexus333-1-rftvd Pod spec.containers{nexus333} Normal Pulling {kubelet ip-172-31-59-148.ec2.internal} pulling image "sonatype/nexus3"
15m 10m 19 nexus333-1-rftvd Pod spec.containers{nexus333} Normal Pulled {kubelet ip-172-31-59-148.ec2.internal} Successfully pulled image "sonatype/nexus3"
15m 15m 1 nexus333-1-rftvd Pod spec.containers{nexus333} Warning Failed {kubelet ip-172-31-59-148.ec2.internal} Error: Error response from daemon: {"message":"create 3aa35201bdf81d09ef4b09bba1fc843b97d0339acfef0c30cecaa1fbb6207321: mkdir /var/lib/docker/volumes/3aa35201bdf81d09ef4b09bba1fc843b97d0339acfef0c30cecaa1fbb6207321: permission denied"}
I am not sure why I cannot assign more memory when using the web console, nor why running it with oc run dies with the mkdir error. Can anyone tell me how to run sonatype/nexus3 on OpenShift Online Pro?
Looking in the documentation I see that it is a Java VM solution.
When using Java 8, memory usage can be DRAMATICALLY IMPROVED using only the following 2 runtime Java VM options:
... "-XX:+UnlockExperimentalVMOptions", "-XX:+UseCGroupMemoryLimitForHeap" ...
I just deployed my container (Spring Boot JAR) that consumed over 650 MB RAM. With just these two (new) options RAM consumption dropped to just 270 MB!!!
So, with these 2 runtime settings all OOM's are left far behind! Enjoy!
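If you want to try those flags on OpenShift, one hedged way to pass them is via an environment variable that the JVM picks up; this sketch assumes the deployment config is named nexus333 and that the image's JVM honors JAVA_TOOL_OPTIONS:
oc set env dc/nexus333 JAVA_TOOL_OPTIONS="-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap"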
You may want to also follow along with the tutorial that is in the OpenShift docs https://docs.openshift.com/online/dev_guide/app_tutorials/maven_tutorial.html
I have had success deploying this in OpenShift Online Pro
Okay, the mkdir /var/lib/docker/volumes/ permission denied seems to be because the image needs a /nexus-data mount and that is refused. I saw that by deploying from the web console (which dies with OOM) and then using Edit YAML on the created pod to see the generated volume mount.
Creating the pod with the following YAML via cat nexus3_pod.ephemeral.yaml | oc create -f -, which adds the volume mount and explicit memory settings, the container now starts up:
apiVersion: "v1"
kind: "Pod"
metadata:
  name: "nexus3"
  labels:
    name: "nexus3"
spec:
  containers:
    -
      name: "nexus3"
      resources:
        requests:
          memory: "1200Mi"
        limits:
          memory: "1200Mi"
      image: "sonatype/nexus3"
      ports:
        -
          containerPort: 8081
          name: "nexus3"
      volumeMounts:
        - mountPath: /nexus-data
          name: nexus3-1
  volumes:
    - emptyDir: {}
      name: nexus3-1
Notes
The image sets -Xmx1200m as documented at sonatype/docker-nexus3. So if you assign less than 1200Mi of memory it will crash with OOM when the heap grows over the limit. You may as well set both the requested and the limit memory to the max heap size anyway.
When the allocated memory was too low it crashed just as it was setting up the DB, which corrupted the DB log, which meant it then got into a crash loop ("couldn't load 4 byte from 0 byte file") when I recreated it with more memory. It seems that with an emptyDir the files hang around between crash restarts and memory changes (that's documented behaviour, I think). I had to recreate the pod with a different name to get a clean emptyDir and an assigned memory of 1200Mi to get it all to start.
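In other words, something along these lines gives you a fresh emptyDir (the second pod name is just an illustration):
oc delete pod nexus3
sed 's/name: "nexus3"/name: "nexus3b"/' nexus3_pod.ephemeral.yaml | oc create -f -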

OpenShift Next Gen fails to mount persistent volume

I'm trying to set up an app on OpenShift Online Next Gen and I need to store a small file at runtime and read it again during startup. The content of the file changes, so I cannot simply add it to my source code.
My project is already up and running; all I need is persistent storage. So, I open the Web Console, click Browse->Storage and it says there are no volumes available. Same thing if I go to Browse->Deployments and try to attach a volume.
So, I logged in via cli and issued the following command:
oc volume dc/mypingbot --add --type=pvc --claim-name=data1 --claim-size=1Gi
Now my volume appears both in the Storage section and in the Deployments section. I attach it to my deployment config using the Web Console and set its mount point to /data1.
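For completeness, the attach step can also be done from the CLI instead of the Web Console; a hedged sketch reusing the claim created above:
oc volume dc/mypingbot --add --name=data1 --type=pvc --claim-name=data1 --mount-path=/data1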
The deployment process now takes a while and then fails with the following two errors:
Error syncing pod, skipping: Could not attach EBS Disk "aws://us-east-1c/vol-ba78501e": Error attaching EBS volume: VolumeInUse: vol-ba78501e is already attached to an instance status code: 400, request id:
Unable to mount volumes for pod "mypingbot-18-ilklx_mypingbot(0d22f712-58a3-11e6-a1a5-0e3d364e19a5)": Could not attach EBS Disk "aws://us-east-1c/vol-ba78501e": Error attaching EBS volume: VolumeInUse: vol-ba78501e is already attached to an instance status code: 400, request id:
What am I missing?

Kubernetes deployment cannot mount volume despite equivalent gcloud/mnt works fine [closed]

I have a Kubernetes deployment where a pod should mount a PD.
Under spec.template.spec.containers.[*] I have this:
volumeMounts:
  - name: app-volume
    mountPath: /mnt/disk/app-pd
and under spec.template.spec that:
volumes:
  - name: app-volume
    gcePersistentDisk:
      pdName: app-pd
      fsType: ext4
app-pd is a GCE persistent disk with a single ext4 file system (hence no partitions) on it. If I run kubectl create I get these error messages from kubectl describe pod:
Warning FailedMount Unable to mount volumes for pod "<id>":
timeout expired waiting for volumes to attach/mount for pod"<id>"/"default".
list of unattached/unmounted volumes=[app-volume]
Warning FailedSync Error syncing pod, skipping:
timeout expired waiting for volumes to attach/mount for pod "<id>"/"default".
list of unattached/unmounted volumes=[app-volume]
On the VM instance that runs the pod, /var/log/kubelet.log contains repetitions of these error messages, which are presumably related to, or even causing, the above:
reconciler.go:179]
VerifyControllerAttachedVolume operation started for volume "kubernetes.io/gce-pd/<id>"
(spec.Name: "<id>") pod "<id>" (UID: "<id>")
goroutinemap.go:155]
Operation for "kubernetes.io/gce-pd/<id>" failed.
No retries permitted until <date> (durationBeforeRetry 2m0s).
error: Volume "kubernetes.io/gce-pd/<id>" (spec.Name: "<id>") pod "<id>" (UID: "<id>")
is not yet attached according to node status.
However, if I attach the PD to the VM instance which runs the pod with gcloud compute instances attach-disk and then gcloud compute ssh into it, I can see that the following device file has been created:
/dev/disk/by-id/google-persistent-disk-1
If I mount it (the PD) I can see and work with the expected files.
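The manual check amounts to something like the following (the instance name and mount point are illustrative):
gcloud compute instances attach-disk <instance> --disk=app-pd
gcloud compute ssh <instance>
# on the instance:
sudo mkdir -p /mnt/disks/app-pd
sudo mount /dev/disk/by-id/google-persistent-disk-1 /mnt/disks/app-pd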
How can I further diagnose this problem and ultimately resolve it?
Could the problem be that the file is called /dev/disk/by-id/google-persistent-disk-1 instead of /dev/disk/by-id/google-<id>, as would have happened if I had attached it from the Cloud Console UI?
UPDATE I've simplified the setup by formatting the disk with a single ext4 file system (hence no partitions) and edited the description above accordingly. I've also added more specific error indications from kubelet.log.
UPDATE The problem also remains if I manually add the PD (in the Cloud Console UI) before deployment to the instance VM that will host the pod. Both the PD and the instance VM are in the same zone.
UPDATE The observed difference in block device names for the same persistent disk is normal according to GCE #211.
I don't know why (yet) but deleting and then recreating the GKE cluster before deployment apparently solved the issue.
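For reference, recreating the cluster amounts to roughly this (the cluster name, zone, and manifest file name are illustrative):
gcloud container clusters delete my-cluster --zone us-central1-b
gcloud container clusters create my-cluster --zone us-central1-b
kubectl create -f deployment.yaml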