mkdir /.gitlab-runner: permission denied running GitLab Runner in Kubernetes deployed via Helm - gitlab-ci-runner

I'm trying to deploy the GitLab Runner (15.7.1) onto an on-premise Kubernetes cluster and getting the following error:
PANIC: loading system ID file: saving system ID state file: creating directory: mkdir /.gitlab-runner: permission denied
This is occurring with both the 15.7.1 image (Ubuntu?) and the alpine3.13-v15.7.1 image. Looking at the deployment, it looks likes it should be trying to use /home/gitlab-runner, but for some reason it is trying to use root (/), which is a protected directory.
Anyone else experience this issue or have a suggestion as to what to look at?
I am using the Helm chart (0.48.0) using a copy of the images from dockerhub (simply moved into a local repository as internet access is not available from the cluster). Connectivity to GitLab appears to be working, but the error causes the overall startup to fail. Full logs are:
Registration attempt 4 of 30
Runtime platform arch=amd64 os=linux pid=33 revision=6d480948 version=15.7.1
WARNING: Running in user-mode.
WARNING: The user-mode requires you to manually start builds processing:
WARNING: $ gitlab-runner run
WARNING: Use sudo for system-mode:
WARNING: $ sudo gitlab-runner...
Created missing unique system ID system_id=r_Of5q3G0yFEVe
PANIC: loading system ID file: saving system ID state file: creating directory: mkdir /.gitlab-runner: permission denied
I have tried the 15.7.1 image, the alpine3.13-v15.7.1 image, and the gitlab-runner-ocp:amd64-v15.7.1 image and searched the values.yaml for anything relevant to the path. Looking at the deployment template, it appears that it ought to be using /home/gitlab-runner as the directory (instead of /) [though the docs suggested it was /home].
As for "what was I expecting", of course I was expecting that it would "just work" :)

So, resolved this (and other) issues with:
Updated helm deployment template to mount an empty volume at /.gitlab-runner
[separate issue] explicitly added builds_dir and environment [per gitlab-org/gitlab-runner#3511 (comment 114281106)].
These two steps appeared to be sufficient to get the Helm chart deployment working.

You can easily create and mount the emptyDir (in case you are creating gitlab-runner with kubernetes manifest *.yml file):
volumes:
- emptyDir: {}
name: gitlab-runner
volumeMounts:
- name: gitlab-runner
mountPath: /.gitlab-runner
-------------------- OR --------------------
volumeMounts:
- name: root-gitlab-runner
mountPath: /.gitlab-runner
volumes:
- name: root-gitlab-runner
emptyDir:
medium: "Memory"

Related

Unable to connect worker node to master using K3S

I am trying to setup a K3S cluster for learning purposes but I am having trouble connecting the master node with agents. I have looked several tutorials and discussions on this but I can't find a solution. I know I am probably missing something obvious (due to my lack of knowledge), but still help would be much appreciated.
I am using two AWS t2.micro instances with default configuration.
When ssh into the master and installed K3S using
curl -sfL https://get.k3s.io | sh -s - --no-deploy traefik --write-kubeconfig-mode 644 --node-name k3s-master-01
with kubectl get nodes, I am able to see the master
NAME STATUS ROLES AGE VERSION
k3s-master-01 Ready control-plane,master 13s v1.23.6+k3s1
So far it seems I am doing things right. From what I understand, I am supposed to configure the kubeconfig file. So, I accessed it by using
cat /etc/rancher/k3s/k3s.yaml
I copied the configuration file and the server info to match the private IP I took from AWS console, resulting in something like this
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: <lots_of_info>
server: https://<master_private_IP>:6443
name: default
contexts:
- context:
cluster: default
user: default
name: default
current-context: default
kind: Config
preferences: {}
users:
- name: default
user:
client-certificate-data: <my_certificate_data>
client-key-data: <my_key_data>
Then, I ran vi ~/.kube/config, and there I pasted the kubeconfig file
Finally, I grabbed the token with cat /var/lib/rancher/k3s/server/node-token, ssh into the other machine and then run the following
curl -sfL https://get.k3s.io | K3S_NODE_NAME=k3s-worker-01 K3S_URL=https://<master_private_IP>:6443 K3S_TOKEN=<master_token> sh -
The output is
[INFO] Finding release for channel stable
[INFO] Using v1.23.6+k3s1 as release
[INFO] Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.23.6+k3s1/sha256sum-amd64.txt
[INFO] Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.23.6+k3s1/k3s
[INFO] Verifying binary download
[INFO] Installing k3s to /usr/local/bin/k3s
[INFO] Skipping installation of SELinux RPM
[INFO] Creating /usr/local/bin/kubectl symlink to k3s
[INFO] Creating /usr/local/bin/crictl symlink to k3s
[INFO] Creating /usr/local/bin/ctr symlink to k3s
[INFO] Creating killall script /usr/local/bin/k3s-killall.sh
[INFO] Creating uninstall script /usr/local/bin/k3s-agent-uninstall.sh
[INFO] env: Creating environment file /etc/systemd/system/k3s-agent.service.env
[INFO] systemd: Creating service file /etc/systemd/system/k3s-agent.service
[INFO] systemd: Enabling k3s-agent unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s-agent.service → /etc/systemd/system/k3s-agent.service.
[INFO] systemd: Starting k3s-agent
By this output, it looks like I have created an agent. However, when I run kubectl get nodes in the master, I still get
NAME STATUS ROLES AGE VERSION
k3s-master-01 Ready control-plane,master 12m v1.23.6+k3s1
What is the thing I was supposed to do in order to get the agent connected to the master? I am guess I am probably missing something simple, but I just can't seem to find the solution. I've read all the documentation but it is still not clear to me where I am making the mistake. I've tried saving the private master IP and token into the agent as environmental variables with export K3S_TOKEN=master_token and K3S_URL=master_private_IP and then simply running curl -sfL https://get.k3s.io | sh - but I still can't see the worker nodes when running kubectl get nodes
Any help would be appreciated.
It might be your VM instance firewall that prevents appropriate connection from your master to the worker node (and vice versa). Official rancher documentation advise to disable firewall for (Red Hat/CentOS) Enterprise Linux:
It is recommended to turn off firewalld:
systemctl disable firewalld --now
If enabled, it is required to disable nm-cloud-setup and reboot the node:
systemctl disable nm-cloud-setup.service nm-cloud-setup.timer reboot
If you are using Ubuntu on your VM's, there is a different firewall tool (ufw).
In my case, allowing 6443 and 443(not sure if required) port TCP connections worked fine.
Allow port 6443 and TCP connection in all of your cluster machines:
sudo ufw allow 6443/tcp
Then apply k3s installation script in your worker node(s):
curl -sfL https://get.k3s.io | K3S_NODE_NAME=k3s-worker-1 K3S_URL=https://<k3s-master-1 IP>:6443 K3S_TOKEN=<k3s-master-1 TOKEN> sh -
This should work. If not, you can try adding additional allow rule for 443 tcp port as well.
A few options to check.
Check Journalctl for errors
journalctl -u k3s-agent.service -n 300 -xn
If using RaspberryPi for a worker node, make sure you have
cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1
as the very end of your /boot/cmdline.txt file. DO NOT PUT THIS VALUE ON A NEW LINE! Should just be appended to the end of the line.
If your master node(s) have self-signed certs, make sure you copy the master node's self signed cert to your worker node(s). In linux or raspberry pi copy cert to /usr/local/share/ca-certificates, then issue an
sudo update-ca-certificates
on the worker node
Don't forget to reboot the worker node after you make these changes!
Hope this helps someone!

Annotation Validation Error when trying to install Vault on OpenShift

Following this tutorial on installing Vault with Helm on OpenShift, I encountered the following error after executing the command:
bash
helm install vault hashicorp/vault -n $VAULT_NAMESPACE -f my_values.yaml
For the config:
values.yaml
bash
echo '# Custom values for the Vault chart
global:
# If deploying to OpenShift
openshift: true
server:
dev:
enabled: true
serviceAccount:
create: true
name: vault-sa
injector:
enabled: true
authDelegator:
enabled: true' > my_values.yaml
The error:
$ helm install vault hashicorp/vault -n $VAULT_NAMESPACE -f values.yaml
Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: ClusterRole "vault-agent-injector-clusterrole" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-namespace" must equal "my-vault-1": current value is "my-vault-0"
What exactly is happening, or how can I reset this specific name space to point to the right release namespace?
Have you by chance tried the exact same thing before, because that is what the error is hinting.
If we dissect the error, we get the to the root of the problem:
Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists.
So something on the cluster already exists that you were trying to deploy via the helm chart.
Unable to continue with install:
Helm is aborting due to this failure
ClusterRole "vault-agent-injector-clusterrole" in namespace "" exists
So the cluster role vault-agent-injector-clusterrole that the helm chart is supposed to put onto the cluster already exsits. ClusterRoles aren't namespace specific, hence the "namespace" is blank.
and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-namespace" must equal "my-vault-1": current value is "my-vault-0"
The default behavior is to try to import existing resources that this chart requires, but it is not possible, because the owner of that ClusterRole is different from the deployment.
To fix this, you can remove the existing deployment of your chart and then give it an other try and it should work as expected.
Make sure all resources are gone. For this particular one you can check with kubectl get clusterroles

Google Cloud Build windows builder error "Failed to get external IP address: Could not get external NAT IP from list"

I am trying to implement automatic deployments for my Windows Kubernetes container app. I'm following instructions from the Google's windows-builder, but the trigger quickly fails with this error at about 1.5 minutes in:
2021/12/16 19:30:06 Set ingress firewall rule successfully
2021/12/16 19:30:06 Failed to get external IP address: Could not get external NAT IP from list
ERROR
ERROR: build step 0 "gcr.io/[my-project-id]/windows-builder" failed: step exited with non-zero status: 1
The container, gcr.io/[my-project-id]/windows-builder, definitely exists and it's located in the same GCP project as the Cloud Build trigger just as the windows-builder documentation commanded.
I structured my code based off of Google's docker-windows example. Here is my repository file structure:
repository
cloudbuild.yaml
builder.ps1
worker
Dockerfile
Here is my cloudbuild.yaml:
steps:
# WORKER
- name: 'gcr.io/[my-project-id]/windows-builder'
args: [ '--command', 'powershell.exe -file build.ps1' ]
# OPTIONS
options:
logging: CLOUD_LOGGING_ONLY
Here is my builder.ps1:
docker build -t gcr.io/[my-project-id]/test-worker ./worker;
if ($?) {
docker push gcr.io/[my-project-id]/test-worker;
}
Here is my Dockerfile:
FROM gcr.io/[my-project-id]/test-windows-node-base:onbuild
Does anybody know what I'm doing wrong here? Any help would be appreciated.
Replicated the steps from GitHub and got the same error. It is throwing Failed to get external IP address... error because the External IP address of the VM is disabled by default in the source code. I was able to build it successfully by adding '--create-external-ip', 'true' in cloudbuild.yaml.
Here is my cloudbuild.yaml:
steps:
- name: 'gcr.io/$PROJECT_ID/windows-builder'
args: [ '--create-external-ip', 'true',
'--command', 'powershell.exe -file build.ps1' ]

Openshift: Error pulling image from remote, secure docker registry using certificates

I use the all-in-one VM of Openshift origin.
I am trying to pull images from a private, secure registry using an Image Stream. This is the ImageStream definition:
apiVersion: v1
kind: ImageStream
metadata:
name: my-image-stream
annotations:
description: Keeps track of changes in the application image
name: my-image
spec:
dockerImageRepository: "my.registry.net/myproject/my-image"
The repository is secured with a certificate. On my local machine, i have them in /etc/docker/certs.d/my.registry.net and I can login with docker login my.registry.net.
When I run oc import-image, however, I get the following error:
The import completed with errors.
Name: my-image
Namespace: myproject
Created: About an hour ago
Labels: <none>
Description: Keeps track of changes in the application image
Annotations: openshift.io/image.dockerRepositoryCheck=2017-01-27T08:09:49Z
Docker Pull Spec: 172.30.53.244:5000/myproject/my-image
Unique Images: 0
Tags: 1
latest
tagged from my.registry.net/myproject/my-image
! error: Import failed (InternalError): Internal error occurred: Get https://my.registry.net/v2/: remote error: handshake failure
About an hour ago
I have copied the certificates to the vagrant machine and restarted the docker daemon, but the problem remains. I have not found any documentation on how to properly add the certificates, so I just put them in the usual docker folder.
What is the appropriate way to make this work?
Update in response to rezie's answer:
There is no file etc/origin/master/ca-bundle.crt on my vagrant box. I found the following ca-bundle.crt files :
$ find / -iname ca-bundle.crt
/etc/pki/tls/certs/ca-bundle.crt
##multiple lines like
/var/lib/docker/devicemapper/mnt/something-hash-like/rootfs/etc/pki/tls/certs/ca-bundle.crt
/var/lib/origin/openshift.local.config/master/ca-bundle.crt
I appended the root certificate to /etc/pki/tls/certs/ca-bundle.crt and to var/lib/origin/openshift.local.config/master/ca-bundle.crt, but that did not change anything.
Please note, however, that I do not need to have this root certificate in /etc/docker/certs.d/... in order to login directly using docker login my.registry.net
I have appended
I cannot comment due tow lo karma so I'll write an answer saying almost the same as rezie.
The error:
! error: Import failed (InternalError): Internal error occurred: Get https://my.registry.net/v2/: remote error: handshake failure
About an hour ago
Comes from OpenShift, not from docker, therefore adding it to /etc/docker/certs.d/my.registry.net doesn't prevent the error from happening.
You should add the CA certificate at OS level, my guess is the steps failed for some reason so do it this way:
openssl s_client -connect my.registry.net:443 </dev/null |
sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' \
> /etc/pki/ca-trust/source/anchors/my.registry.net.crt &&
update-ca-trust check && update-ca-trust extract
Finally test if it worked running
curl https://my.registry.net/v2
If it doesn't give you a certificate error and you still can't do the oc import restart the atomic-openshift-master-api service
Try appending your CA (the same one you said you said that was used in the my.registry.net directory) into Openshift's ca bundle (e.g. /etc/origin/master/ca-bundle.crt. Then restart the service and reattempt import-image (making sure that you do not include the --insecure flag).
For reference, check out this issue from the Origin project. As you've mentioned, there's currently no way to supply certificates along with the dockercfg secret, and the suggestion from that issue is to add the CA as a trusted root CA across all the hosts.

Using environment properties with files in elastic beanstalk config files

Working with Elastic Beanstalk .config files is kinda... interesting. I'm trying to use environment properties with the files: configuration option in an Elastc Beanstalk .config file. What I'd like to do is something like:
files:
"/etc/passwd-s3fs" :
mode: "000640"
owner: root
group: root
content: |
${AWS_ACCESS_KEY_ID}:${AWS_SECRET_KEY}
To create an /etc/passwd-s3fs file with content something like:
ABAC73E92DEEWEDS3FG4E:aiDSuhr8eg4fHHGEMes44zdkIJD0wkmd
I.e. use the environment properties defined in the AWS Console (Elastic Beanstalk/Configuration/Software Configuration/Environment Properties) to initialize system configuration files and such.
I've found that it is possible to use environment properties in container-command:s, like so:
container_commands:
000-create-file:
command: echo ${AWS_ACCESS_KEY_ID}:${AWS_SECRET_KEY} > /etc/passwd-s3fs
However, doing so will require me to manually set owner, group, file permissions etc. It's also much more of a hassle when dealing with larger configuration files than the Files: configuration option...
Anyone got any tips on this?
How about something like this. I will use the word "context" for dev vs. qa.
Create one file per context:
dev-envvars
export MYAPP_IP_ADDR=111.222.0.1
export MYAPP_BUCKET=dev
qa-envvars
export MYAPP_IP_ADDR=111.222.1.1
export MYAPP_BUCKET=qa
Upload those files to a private S3 folder, S3://myapp/config.
In IAM, add a policy to the aws-elasticbeanstalk-ec2-role role that allows reading S3://myapp/config.
Add the following file to your .ebextensions directory:
envvars.config
files:
"/opt/myapp_envvars" :
mode: "000644"
owner: root
group: root
# change the source when you need a different context
#source: https://s3-us-west-2.amazonaws.com/myapp/dev-envvars
source: https://s3-us-west-2.amazonaws.com/myapp/qa-envvars
Resources:
AWSEBAutoScalingGroup:
Metadata:
AWS::CloudFormation::Authentication:
S3Access:
type: S3
roleName: aws-elasticbeanstalk-ec2-role
buckets: myapp
commands:
# commands executes after files per
# http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html
10-load-env-vars:
command: . /opt/myapp_envvars
Per the AWS Developer's Guide, commands "run before the application and web server are set up and the application version file is extracted," and before container-commands. I guess the question will be whether that is early enough in the boot process to make the environment variables available when you need them. I actually wound up writing an init.d script to start and stop things in my EC2 instance. I used the technique above to deploy the script.
Credit for the “Resources” section that allows downloading from secured S3 goes to the May 7, 2014 post that Joshua#AWS made to this thread.
I am gravedigging but since I stumbled across this in the course of my travels, there is a "clever" way to do what you describe–at least in 2018, and at least since 2016. You can retrieve an environment variable by key with get-config:
/opt/elasticbeanstalk/bin/get-config environment --key YOUR_ENV_VAR_KEY
And likewise all environment variables with (as JSON or --output YAML)
/opt/elasticbeanstalk/bin/get-config environment
Example usage in a container command:
container_commands:
00_store_env_var_in_file_and_chmod:
command: "/opt/elasticbeanstalk/bin/get-config environment --key YOUR_ENV_KEY | install -D /dev/stdin /etc/somefile && chmod 640 /etc/somefile"
Example usage in a file:
files:
"/opt/elasticbeanstalk/hooks/appdeploy/post/00_do_stuff.sh":
mode: "000755"
owner: root
group: root
content: |
#!/bin/bash
YOUR_ENV_VAR=$(source /opt/elasticbeanstalk/bin/get-config environment --key YOUR_ENV_VAR_KEY)
echo "Hello $YOUR_ENV_VAR"
I was introduced to get-config by Thomas Reggi in https://serverfault.com/a/771067.
I assume that AWS_ACCESS_KEY_ID and AWS_SECRET_KEY are known to you prior to the app deployment.
You can create the file on your workstation and submit it to Elastic Beanstalk instance with the code on $ git aws.push
$ cd .ebextensions
$ echo 'ABAC73E92DEEWEDS3FG4E:aiDSuhr8eg4fHHGEMes44zdkIJD0wkmd' > passwd-s3fs
In .config:
files:
"/etc/passwd-s3fs" :
mode: "000640"
owner: root
group: root
container_commands:
10-copy-passwords-file:
command: "cat .ebextensions/passwd-s3fs > /etc/passwd-s3fs"
You might have to play with the permissions or execute cat as sudo. Also, I put the file into .ebextensions for example, it can be anywhere in your project.
Hope it helps.