OpenShift Next Gen fails to mount persistent volume - openshift

I'm trying to set up an app on OpenShift Online Next Gen and I need to store a small file at runtime and read it again during startup. The content of the file changes, so I cannot simply add it to my source code.
My project is already up and running; all I need is persistent storage. So I open the Web Console, click Browse->Storage, and it says there are no volumes available. The same thing happens if I go to Browse->Deployments and try to attach a volume.
So I logged in via the CLI and issued the following command:
oc volume dc/mypingbot --add --type=pvc --claim-name=data1 --claim-size=1Gi
Now my volume appears both in the Storage section and in the Deployments section. I attach it to my deployment config using the Web Console and set its mount point to /data1.
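For reference, a minimal sketch of what that command plus the Web Console mount roughly produces in the DeploymentConfig (only the claim name, size, and mount path come from the steps above; the generated volume name, container name, and access mode are assumptions):

spec:
  template:
    spec:
      containers:
      - name: mypingbot
        volumeMounts:
        - name: volume-data1        # generated name, assumed
          mountPath: /data1
      volumes:
      - name: volume-data1
        persistentVolumeClaim:
          claimName: data1          # the 1Gi claim created by the oc command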
The deployment process now takes a while and then fails with the following two errors:
Error syncing pod, skipping: Could not attach EBS Disk "aws://us-east-1c/vol-ba78501e": Error attaching EBS volume: VolumeInUse: vol-ba78501e is already attached to an instance status code: 400, request id:
Unable to mount volumes for pod "mypingbot-18-ilklx_mypingbot(0d22f712-58a3-11e6-a1a5-0e3d364e19a5)": Could not attach EBS Disk "aws://us-east-1c/vol-ba78501e": Error attaching EBS volume: VolumeInUse: vol-ba78501e is already attached to an instance status code: 400, request id:
What am I missing?
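As a hedged aside: EBS-backed claims are ReadWriteOnce, so during a Rolling deployment the new pod (possibly scheduled on another node) cannot attach the volume while the previous pod still holds it, which matches the VolumeInUse error above. A common workaround is to switch the DeploymentConfig to the Recreate strategy, for example:

oc patch dc/mypingbot -p '{"spec": {"strategy": {"type": "Recreate"}}}'

With Recreate, the old pod is torn down (and the volume detached) before the new one starts.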


mkdir /.gitlab-runner: permission denied running GitLab Runner in Kubernetes deployed via Helm

I'm trying to deploy the GitLab Runner (15.7.1) onto an on-premise Kubernetes cluster and getting the following error:
PANIC: loading system ID file: saving system ID state file: creating directory: mkdir /.gitlab-runner: permission denied
This is occurring with both the 15.7.1 image (Ubuntu?) and the alpine3.13-v15.7.1 image. Looking at the deployment, it looks like it should be trying to use /home/gitlab-runner, but for some reason it is trying to use root (/), which is a protected directory.
Anyone else experience this issue or have a suggestion as to what to look at?
I am using the Helm chart (0.48.0) with a copy of the images from Docker Hub (simply moved into a local registry, as internet access is not available from the cluster). Connectivity to GitLab appears to be working, but the error causes the overall startup to fail. The full logs are:
Registration attempt 4 of 30
Runtime platform arch=amd64 os=linux pid=33 revision=6d480948 version=15.7.1
WARNING: Running in user-mode.
WARNING: The user-mode requires you to manually start builds processing:
WARNING: $ gitlab-runner run
WARNING: Use sudo for system-mode:
WARNING: $ sudo gitlab-runner...
Created missing unique system ID system_id=r_Of5q3G0yFEVe
PANIC: loading system ID file: saving system ID state file: creating directory: mkdir /.gitlab-runner: permission denied
I have tried the 15.7.1 image, the alpine3.13-v15.7.1 image, and the gitlab-runner-ocp:amd64-v15.7.1 image and searched the values.yaml for anything relevant to the path. Looking at the deployment template, it appears that it ought to be using /home/gitlab-runner as the directory (instead of /) [though the docs suggested it was /home].
As for "what was I expecting", of course I was expecting that it would "just work" :)
So, I resolved this (and other issues) with the following:
Updated the Helm deployment template to mount an emptyDir volume at /.gitlab-runner
[separate issue] explicitly added builds_dir and environment [per gitlab-org/gitlab-runner#3511 (comment 114281106)].
These two steps appeared to be sufficient to get the Helm chart deployment working.
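For the Helm route, a hedged sketch of the values.yaml overrides corresponding to those two steps (this assumes the chart's top-level volumes/volumeMounts hooks and its runners.config key; the builds_dir path and environment value are illustrative only):

volumeMounts:
- name: gitlab-runner-home
  mountPath: /.gitlab-runner
volumes:
- name: gitlab-runner-home
  emptyDir: {}
runners:
  config: |
    [[runners]]
      builds_dir = "/builds"
      environment = ["HOME=/home/gitlab-runner"]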
You can easily create and mount the emptyDir (in case you are creating the gitlab-runner with a Kubernetes manifest *.yml file):
volumes:
- emptyDir: {}
  name: gitlab-runner
volumeMounts:
- name: gitlab-runner
  mountPath: /.gitlab-runner
-------------------- OR --------------------
volumeMounts:
- name: root-gitlab-runner
  mountPath: /.gitlab-runner
volumes:
- name: root-gitlab-runner
  emptyDir:
    medium: "Memory"

Google Cloud Build windows builder error "Failed to get external IP address: Could not get external NAT IP from list"

I am trying to implement automatic deployments for my Windows Kubernetes container app. I'm following the instructions from Google's windows-builder, but the trigger quickly fails with this error at about 1.5 minutes in:
2021/12/16 19:30:06 Set ingress firewall rule successfully
2021/12/16 19:30:06 Failed to get external IP address: Could not get external NAT IP from list
ERROR
ERROR: build step 0 "gcr.io/[my-project-id]/windows-builder" failed: step exited with non-zero status: 1
The container, gcr.io/[my-project-id]/windows-builder, definitely exists, and it is located in the same GCP project as the Cloud Build trigger, just as the windows-builder documentation instructs.
I structured my code based on Google's docker-windows example. Here is my repository file structure:
repository
  cloudbuild.yaml
  builder.ps1
  worker
    Dockerfile
Here is my cloudbuild.yaml:
steps:
# WORKER
- name: 'gcr.io/[my-project-id]/windows-builder'
  args: [ '--command', 'powershell.exe -file build.ps1' ]
# OPTIONS
options:
  logging: CLOUD_LOGGING_ONLY
Here is my builder.ps1:
docker build -t gcr.io/[my-project-id]/test-worker ./worker;
if ($?) {
    docker push gcr.io/[my-project-id]/test-worker;
}
Here is my Dockerfile:
FROM gcr.io/[my-project-id]/test-windows-node-base:onbuild
Does anybody know what I'm doing wrong here? Any help would be appreciated.
I replicated the steps from GitHub and got the same error. The Failed to get external IP address... error is thrown because the external IP address of the builder VM is disabled by default in the source code. I was able to build successfully by adding '--create-external-ip', 'true' to the builder step's args in cloudbuild.yaml.
Here is my cloudbuild.yaml:
steps:
- name: 'gcr.io/$PROJECT_ID/windows-builder'
  args: [ '--create-external-ip', 'true',
          '--command', 'powershell.exe -file build.ps1' ]

How to destroy pod with status "Unknown" in openshift?

My pods are showing status Unknown, and attempts to scale them down to 0 have failed from both the command line and the UI.
This also prevents me from starting new pods. See the error below:
Unable to mount volumes for pod
"mysql-1-v9lcr_cloud-apps(1e9beb72-e3cf-11e8-943e-02ec8e61afcf)":
timeout expired waiting for volumes to attach or mount for pod
"cloud-apps"/"mysql-1-v9lcr". list of unmounted volumes=[mysql-data].
list of unattached volumes=[mysql-data default-token-zsbnc]
Location: starter-ca-central-1
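One commonly used way to clear pods stuck in Unknown is to force-delete them so the replication controller can recreate them; a sketch using the pod and project names from the error above:

oc delete pod mysql-1-v9lcr -n cloud-apps --grace-period=0 --force

Force deletion only removes the API object; if the volume is still attached to a dead node, it may also need to be detached or the node recovered before a new pod can mount it.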

Kubernetes deployment cannot mount volume despite equivalent gcloud/mnt works fine [closed]

I have a Kubernetes deployment where a pod should mount a PD.
Under spec.template.spec.containers.[*] I have this:
volumeMounts:
- name: app-volume
  mountPath: /mnt/disk/app-pd
and under spec.template.spec that:
volumes:
- name: app-volume
  gcePersistentDisk:
    pdName: app-pd
    fsType: ext4
app-pd is a GCE persistent disk with a single ext4 file system (hence no partitions) on it. If I run kubectl create I get these error messages from kubectl describe pod:
Warning FailedMount Unable to mount volumes for pod "<id>":
timeout expired waiting for volumes to attach/mount for pod"<id>"/"default".
list of unattached/unmounted volumes=[app-volume]
Warning FailedSync Error syncing pod, skipping:
timeout expired waiting for volumes to attach/mount for pod "<id>"/"default".
list of unattached/unmounted volumes=[app-volume]
On the VM instance that runs the pod, /var/log/kubelet.log contains repetitions of these error messages, which are presumably related to, or even causing, the above:
reconciler.go:179]
VerifyControllerAttachedVolume operation started for volume "kubernetes.io/gce-pd/<id>"
(spec.Name: "<id>") pod "<id>" (UID: "<id>")
goroutinemap.go:155]
Operation for "kubernetes.io/gce-pd/<id>" failed.
No retries permitted until <date> (durationBeforeRetry 2m0s).
error: Volume "kubernetes.io/gce-pd/<id>" (spec.Name: "<id>") pod "<id>" (UID: "<id>")
is not yet attached according to node status.
However, if I attach the PD to the VM instance that runs the pod with gcloud compute instances attach-disk and then gcloud compute ssh into it, I can see that the following device file has been created:
/dev/disk/by-id/google-persistent-disk-1
If I mount it (the PD) I can see and work with the expected files.
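For reference, a sketch of that manual check (instance name and zone are placeholders):

gcloud compute instances attach-disk INSTANCE_NAME --disk app-pd --zone ZONE
gcloud compute ssh INSTANCE_NAME --zone ZONE
sudo mkdir -p /mnt/disks/app-pd
sudo mount /dev/disk/by-id/google-persistent-disk-1 /mnt/disks/app-pd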
How can I further diagnose this problem and ultimately resolve it?
Could the problem be that the device is called /dev/disk/by-id/google-persistent-disk-1 instead of /dev/disk/by-id/google-<id>, as would happen if I had attached it from the Cloud Console UI?
UPDATE I've simplified the setup by formatting the disk with a single ext4 file system (hence no partitions) and edited the description above accordingly. I've also added more specific error indications from kubelet.log.
UPDATE The problem also remains if I manually add the PD (in the Cloud Console UI) before deployment to the instance VM that will host the pod. Both the PD and the instance VM are in the same zone.
UPDATE The observed difference in block device names for the same persistent disk is normal according to GCE #211.
I don't know why (yet) but deleting and then recreating the GKE cluster before deployment apparently solved the issue.
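For completeness, a sketch of that workaround with placeholder names (this simply rebuilds the cluster; it does not explain the underlying attach failure):

gcloud container clusters delete CLUSTER_NAME --zone ZONE
gcloud container clusters create CLUSTER_NAME --zone ZONE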

Startup script from Bitbucket (https) fails to download, but works if instance is reset

I am programmatically launching a new instance using the Compute Engine API for Go [1] and a tool I made called vmproxy [2].
The problem I have is that if I launch a preemptible VM using a startup-script-url pointing to https://bitbucket.org/ronoaldo/debian-custom/raw/tip/tools/autobuild, the build script fails to download. I can see in the serial console output that the startup script metadata is there and that a download is attempted with curl, but that part fails.
However, if I reset the instance via the Developers Console, the script is downloaded properly and runs nicely.
The code I am using to set up the instance is:
// Ronolinux is a VM Proxy that runs a live systems build on Compute Engine
var (
    Ronolinux = &vmproxy.VM{
        Path: "/",
        Instance: vmproxy.Instance{
            Name:        "ronolinux-buildd",
            Zone:        "us-central1-f",
            Image:       vmproxy.ResourcePrefix + "/debian-cloud/global/images/debian-8-jessie-v20150915",
            MachineType: "n1-standard-1",
            Metadata: map[string]string{
                "startup-script-url": "https://bitbucket.org/ronoaldo/debian-custom/raw/tip/tools/autobuild",
                "shutdown-script": `#!/bin/bash
gsutil cp /var/log/startupscript.log gs://ronoaldo/ronolinux/build-$(date +%Y%m%d%H%M%S).log
`,
            },
            Scopes: []string{storageReadWrite},
        },
    }
)
[1] https://godoc.org/google.golang.org/api/compute/v1
[2] https://godoc.org/ronoaldo.gopkg.net/aetools/vmproxy
If your startup script is not hosted on Cloud Storage, the download can fail intermittently. If you look at the serial console output, make sure to scroll horizontally, as it will not wrap long lines. In my case, the error line was very long, and this hid the real end of the message:
(... long curl on-line progress output )
curl: (7) Failed to connect to bitbucket.org port 443: Connection timed out
(...)
Your host must respond within a 10-second timeout. In my case, the first boot usually failed to contact Bitbucket and hence failed to download the script; a VM reset also made things work, presumably because the network latency to hosts outside Google Cloud was better by then.
I ended up hosting the script on Cloud Storage to avoid these issues.
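A sketch of that change (the bucket name is a placeholder): copy the script into a bucket and point the metadata at the gs:// URL, which the instance can fetch with its existing storage scope instead of reaching out to Bitbucket.

gsutil cp tools/autobuild gs://my-bucket/autobuild

Metadata: map[string]string{
    "startup-script-url": "gs://my-bucket/autobuild",
    // ...
},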