How can I assign cgroup resource limits per gitlab runner job to ensure fair sharing of resources? - gitlab-ci-runner

I'd like to assign cgroup limits to my gitlab runner jobs so that they each get a fair amount of CPU and IO.
I've tried using sudo cgclassify to move the job into a slot-specific cgroup within the job scripts, but the subprocesses of the job don't seem to inherit that cgroup or its limits. They reset back to the gitlab runner service's cgroup and its limits. I think systemd is watching for new forks and reassigning them to the service's cgroup.

What seems to work is not trying to move the job out of its own cgroup / scope with cgclassify at all. It turns out that each gitlab runner job is already put into its own systemd scope, so we just need to modify the limits on that scope.
This script determines which scope the job is running in and applies the limits. If you call it from your GitLab CI/CD script with sudo, you can assign the resource limits:
#!/bin/sh
set -x
set -e
echo "CGROUP Before:"
cat /proc/self/cgroup
# Each runner job already runs in its own systemd scope; extract that scope's name.
scope=$(grep 1:name=systemd: /proc/self/cgroup | grep -o -E 'session-\S+\.scope')
# Apply CPU and block-IO weights to that scope only (runtime, not persisted).
systemctl set-property --runtime "$scope" CPUShares=1000 BlockIOWeight=500
echo "CGROUP After:"
cat /proc/self/cgroup
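Note that CPUShares and BlockIOWeight are the legacy cgroup v1 property names; on a host running the unified cgroup v2 hierarchy, systemd expects CPUWeight and IOWeight instead. A hedged equivalent of the set-property line for that case (the weight values here are illustrative, not a direct conversion of the ones above):
systemctl set-property --runtime "$scope" CPUWeight=100 IOWeight=100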
And the sudoers ...
gitlab-runner ALL=(root) NOPASSWD: /usr/local/bin/limit-gitlab-job-resources
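A minimal sketch of how a job would invoke it, assuming the script above is installed as /usr/local/bin/limit-gitlab-job-resources to match the sudoers entry:
# First command of the job's script section:
sudo /usr/local/bin/limit-gitlab-job-resources
# Everything the job runs afterwards stays in the same systemd scope,
# so it inherits the CPUShares / BlockIOWeight set above.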

Related

Monitoring the progress of specified pods with oc cli

Is there a way to monitor the progress of a particular pod instead of seeing all pods?
For example, I scale consoleapp:
oc scale dc consoleapp --replicas=3
After that command I want to watch the progress of only consoleapp and ensure the pods are active.
I thought you'd be able to run this command
oc get pods consoleapp -watch, but it does not work. Is there a way for me to monitor the progress of this? Something very similar to oc rollout status deploymentconfig/consoleapp --watch, but without rolling out a new deployment.
When you run oc get pods consoleapp, you're asking for a pod named consoleapp, which of course doesn't exist. If you want to watch all the pods managed by the DeploymentConfig, you can use the --selector (-l) option to select pods that match the selector in your DeploymentConfig.
If you have:
spec:
  selector:
    name: consoleapp
Then you would run:
oc get pod -l name=consoleapp -w
If your selector has multiple labels, combine them with commas:
oc get pod -l app=consoleapp,component=frontend -w
NB: DeploymentConfigs are considered deprecated and can generally be replaced by standard Kubernetes Deployments:
Both Kubernetes Deployment objects and OpenShift Container Platform-provided DeploymentConfig objects are supported in OpenShift Container Platform; however, it is recommended to use Deployment objects unless you need a specific feature or behavior provided by DeploymentConfig objects.
(from "Understanding Deployment and DeploymentConfig objects")

How to force detach a non-boot disk from all VMs it's attached to

I have a shared read-only persistent disk that gets an update every month.
How can I force all VMs attached to this shared disk to detach without passing the list of instances to the command?
How can I detach a Read-Only disk from all instances?
A solution is to use gcloud compute instances detach-disk INSTANCE_NAME --disk DISK, but I don't want to pass in the list of attached instance names one by one.
'gcloud compute instances detach-disk' doesn't have a 'force' option.
Bear in mind that detaching a disk without first unmounting it may result in errors for the applications that are using the data. To unmount a persistent disk on a Linux-based image, ssh into the instance and run:
sudo umount /dev/disk/by-id/google-DEVICE_NAME
Once the device is unmounted, you can use this sample script to run the 'gcloud compute instances detach-disk' command for every attached instance:
#!/bin/bash
zone="ZONE"
disk="DEVICE_NAME"
# The describe output lists the self-links of attached instances under "users";
# take the last path component of each to get the instance names.
for i in $(gcloud compute disks describe "$disk" --zone "$zone" | grep "^-" | rev | cut -d "/" -f1 | rev)
do
  gcloud compute instances detach-disk "$i" --disk="$disk" --zone="$zone"
done
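A slightly more robust variant (a sketch; it assumes your gcloud version supports the value[] projection in --format) reads only the users field instead of grepping the whole describe output, so unrelated list fields such as licenses can't slip in:
for i in $(gcloud compute disks describe "$disk" --zone "$zone" --format='value[delimiter=" "](users)')
do
  gcloud compute instances detach-disk "$(basename "$i")" --disk="$disk" --zone="$zone"
done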
You can refer to this documentation[1] to get more information about the parameters of the command.
[1] https://cloud.google.com/sdk/gcloud/reference/compute/instances/detach-disk

How to enable mutual SSL verification mode in Redhat-SSO image for OpenShift

I am using the template sso72-x509-postgresql-persistent, which is based on Redhat-SSO and Keycloak, to create an application in OpenShift.
I want to enable its mutual SSL mode, so that a user only has to provide a certificate instead of a user name and password in the request. The documentation (https://access.redhat.com/documentation/en-us/red_hat_single_sign-on/7.2/html-single/server_administration_guide/index#x509) told me to edit the standalone.xml file and add some configuration sections. That worked fine.
But the template image sso72-x509-postgresql-persistent has a problem with this procedure: after it is deployed on OpenShift, any changes to files inside the container are lost when the container restarts.
Is there any way to enable mutual SSL mode through another mechanism, such as the command line or an API, instead of editing a configuration file, short of building my own Docker image?
Ok, I'm including this anyway. I wasn't able to get this working due to permissions issues (the mounted files didn't keep the same permissions as before, so the container continued to fail). But a lot of work went into this answer, so hopefully it points you in the right direction!
You can add a Persistent Volume (PV) to ensure your configuration changes survive a restart. You can attach a PV to your deployment via:
DON'T DO THIS
oc set volume deploymentconfig sso --add -t pvc --name=sso-config --mount-path=/opt/eap/standalone/configuration --claim-mode=ReadWriteOnce --claim-size=1Gi
This will bring up your RH-SSO image with a blank configuration directory, causing the pod to get stuck in Back-off restarting failed container. What you should do instead is:
Back up the existing configuration files
oc rsync <rhsso_pod_name>:/opt/eap/standalone/configuration ~/
Create a temporary busybox deployment that can act as an intermediary for uploading the configuration files. Wait for the deployment to complete
oc run busybox --image=busybox --wait --command -- /bin/sh -c "while true; do sleep 10; done"
Mount a new PV to the busybox deployment. Wait for deployment to complete
oc set volume deploymentconfig busybox --add -t pvc --name=sso-volume --claim-name=sso-config --mount-path=/configuration --claim-mode=ReadWriteOnce --claim-size=1Gi
Edit your configuration files now
Upload the configuration files to your new PV via the busybox pod
oc rsync ~/configuration/ <busybox_pod_name>:/configuration/
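Given the permission problems mentioned above, it may be worth checking how the files landed on the volume before tearing the busybox pod down (a hedged check; the pod name placeholder is the same as in the rsync step):
oc exec <busybox_pod_name> -- ls -ln /configuration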
Destroy the busybox deployment
oc delete all -l run=busybox --force --grace-period=0
Finally, attach the now-populated persistent configuration volume to the RH-SSO deployment
oc set volume deploymentconfig sso --add -t pvc --name=sso-volume --claim-name=sso-config --mount-path=/opt/eap/standalone/configuration
Once your new deployment is...still failing because of permission issues :/

Managing the selinux context of a file created on the host via a Docker container's volume

I ran through the fig python / django tutorial on Fedora 20 (docker 1.0.0) but it failed & tripped an AVC denial in SELinux when django-admin.py attempted to create the project files.
I reviewed the policy, and I can see that setting the docker_var_lib_t context on my code dir would permit docker to write there (although I've just spied docker_share_t in the policy, which looks a better fit permissions-wise: no chr/blk devices in that context).
Code directory locations are not predictable, so setting a system-wide policy (via semanage fcontext) doesn't seem the best way forward; I'd need to introduce some kind of convention.
Is there any way to automatically set this context on volumes mounted from a host?
You can set the following context on the directory:
chcon -Rt svirt_sandbox_file_t $HOME/code/export
then run your docker command as
docker run --rm -it -v $HOME/code/export:/exported:ro image /foo/bar
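On Docker releases newer than the 1.0.0 mentioned in the question, the z / Z volume options ask Docker to apply the SELinux label for you, which avoids the manual chcon step; a hedged equivalent of the command above:
docker run --rm -it -v $HOME/code/export:/exported:ro,Z image /foo/bar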

Detect when instance has completed setup script?

I'm launching instances using the following command:
gcutil addinstance \
  --image=debian-7 \
  --persistent_boot_disk \
  --zone=us-central1-a \
  --machine_type=n1-standard-1 \
  --metadata_from_file=startup-script:install.sh \
  instance-name
How can I detect when this instance has completed its install script? I'd like to be able to place this launch command in a larger provisioning script that then goes on to issue commands to the server that depend on the install script having completed successfully.
There are a number of ways: sending yourself an email, uploading to Cloud Storage, sending a jabber message, ...
One simple, observable way IMHO is to add a logger entry at the end of your install.sh script (I also tweak the beginning for symmetry). Something like:
#!/bin/bash
/usr/bin/logger "== Startup script START =="
#
# Your code goes here
#
/usr/bin/logger "== Startup script END =="
You can check then if the script started or ended in two ways:
From your Developer's Console, select "Projects" > "Compute" > "VM Instances" > your instance > "Serial console" > "View Output".
From CLI, by issuing a gcutil getserialportoutput instance-name.
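gcutil has since been replaced by gcloud; a hedged modern equivalent of that CLI check is:
gcloud compute instances get-serial-port-output instance-name --zone us-central1-a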
I don't know of a way to do all of this within gcutil addinstance.
I'd suggest:
Adding the instance via gcutil addinstance, making sure to use the --wait_until_running flag to ensure that the instance is running before you continue
Copying your script over to the instance via something like gcutil push
Using gcutil ssh <instance-name> </path-to-script/script-to-run> to run your script manually.
This way, you can write your script in such a way that it blocks until it's finished, and the ssh command will not return until your script on the remote machine is done executing.
There really are a lot of ways to accomplish this goal. One that tickles my fancy is to use the metadata server associated with the instance. Have the startup script set a piece of metadata to "FINISHED" when the script is done. You can query the metadata server with a hanging GET that will only return when the metadata updates. Just use gcutil setmetadata from within the script as the last command.
I like this method because the hanging GET just gives you one command to run, rather than a poll to run in a loop, and it doesn't involve any services besides Compute Engine.
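A minimal sketch of the hanging-GET side, run on the instance (for example over ssh). The metadata key name status and the value FINISHED are assumptions, not part of the original answer; the key can be pre-seeded (e.g. to PENDING) when the instance is created so there is something to watch:
# Blocks until the instance's "status" attribute changes, then prints the new value.
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/attributes/status?wait_for_change=true"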
One more hacky way:
startup_script_finished=false
while [[ "$startup_script_finished" = false ]]; do
  # Look for the metadata script runner process that executes the startup script.
  pid=$(gcloud compute ssh $GCLOUD_USER@$GCLOUD_INSTANCE -- pgrep -f "\"/usr/bin/python /usr/bin/google_metadata_script_runner --script-type startup\"")
  if [[ -z $pid ]]; then
    startup_script_finished=true
  else
    sleep 2
  fi
done
One possible solution would be to have your install script create a text file in a cloud storage bucket, as the last thing it does, using the host name as the filename.
Your main script, the one that ran the original gcutil addinstance command, could then periodically poll the contents of the bucket (using gsutil ls) until it sees a file with a matching name, at which point it knows the install has completed on that instance.
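A short sketch of that hand-off; the bucket name is an assumption, everything else follows the description above:
# Last lines of install.sh, run on the instance:
echo done > /tmp/startup-done
gsutil cp /tmp/startup-done "gs://my-provisioning-bucket/$(hostname)"
# In the provisioning script, after gcutil addinstance:
until gsutil ls "gs://my-provisioning-bucket/instance-name" >/dev/null 2>&1; do
  sleep 10
done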