How to change interval for node status sync - openshift

I am using OpenShift and testing HA features. Pods have been running on 2 nodes as follows:
$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
hello-1-7j6zp 1/1 Running 0 18m 10.128.0.153 node1.exampledis.com
hello-1-mztf8 1/1 Running 0 18m 10.128.0.152 node1.exampledis.com
hello-1-pmz2g 1/1 Running 0 26m 10.130.0.46 node2.exampledis.com
I shut down the VM that runs node2.exampledis.com. After about 1 minute, a new pod begins to start up on node1 and the pod on node2 becomes "Unknown". I think there should be some parameter to control this interval; can anyone share some pointers on this?
version:
oc v3.6.1+008f2d5
kubernetes v1.6.1+5115d708d7
features: Basic-Auth
Server https://master.exampledis.com:8443
openshift v3.7.9
kubernetes v1.7.6+a08f5eeb62
Best regards
Lan

The kubelet --sync-frequency parameter controls the sync interval, as shown in the kubelet documentation:
--sync-frequency: Max period between synchronizing running containers and config (default 1m0s)
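A minimal sketch of setting it, assuming an OpenShift 3.x node where kubelet flags are passed via kubeletArguments in /etc/origin/node/node-config.yaml (the 30s value is only an example):

# /etc/origin/node/node-config.yaml (excerpt, assumed 3.x layout)
kubeletArguments:
  sync-frequency:
    - "30s"

After editing, restart the node service (atomic-openshift-node or origin-node, depending on the install) for the change to take effect.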

Related

Where do I find slurm diagnostic information when a job just hangs?

I am running slurm 20.11.8 on a system with 1 login node and 3 compute nodes. The status information I can find is below.
$ slurmd -V
slurm 20.11.8
$ sinfo -N
NODELIST NODES PARTITION STATE
pauli-node-01 1 normal* idle
pauli-node-02 1 normal* idle
pauli-node-03 1 normal* idle
$ systemctl status slurmctld
● slurmctld.service - Slurm controller daemon
Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2021-10-05 22:04:10 CDT; 10h ago
Main PID: 11802 (slurmctld)
Tasks: 7
Memory: 6.7M
CGroup: /system.slice/slurmctld.service
└─11802 /usr/sbin/slurmctld -D
Oct 05 22:04:10 pauli.mer.utexas.edu systemd[1]: Started Slurm controller daemon.
Here is my configuration file:
$ cat /etc/slurm/slurm.conf
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
SlurmctldHost=pauli
#SlurmctldHost=
#
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=999999
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=1
#KillOnBadExit=0
#LaunchType=launch/slurm
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=5000
#MaxStepCount=40000
#MaxTasksPerNode=128
MpiDefault=none
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/cgroup
#Prolog=
#PrologFlags=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#RebootProgram=
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/affinity
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
#DefMemPerCPU=0
#MaxMemPerCPU=0
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU
#
#
# JOB PRIORITY
#PriorityFlags=
#PriorityType=priority/basic
#PriorityDecayHalfLife=
#PriorityCalcPeriod=
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
AccountingStorageHost=pauli
#AccountingStoragePass=abcdef
#AccountingStoragePort=1234
AccountingStorageType=accounting_storage/none
AccountingStorageUser=slurm
AccountingStoreJobComment=YES
ClusterName=cluster
#DebugFlags=
JobCompHost=pauli
#JobCompLoc=slurm_comp_db
#JobCompPass=abcdef
#JobCompPort=1234
JobCompType=jobcomp/none
JobCompUser=slurm
#JobContainerType=job_container/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=info
#SlurmctldLogFile=
SlurmdDebug=info
#SlurmdLogFile=
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=pauli-node-0[1-2] CPUs=64 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 State=UNKNOWN
NodeName=pauli-node-03 CPUs=40 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=UNKNOWN
PartitionName=normal Nodes=pauli-node-0[1-3] Default=YES MaxTime=INFINITE State=UP
When I try to run a simple job on 1 node with srun (along the lines of the Slurm Quick Start User Guide), the job just hangs. Does anyone know where I should look for diagnostic information to figure out why?
$ srun -N1 -n1 -l hostname
One of the first things to check is network connectivity, and make sure no firewall is in the way. You can check that with
scontrol ping
on the compute nodes. Also, srun has a -v option that can tell you where it is blocked (you can repeat the option to increase the verbosity).
And of course, the log files for both the controller and the slurmds may contain information. Again, the log level can be increased with scontrol setdebug.
The usual suspects, besides the firewall, are SELinux, netmasks, and IP routes. Also make sure the clocks are in sync and munge is running OK.
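For example, a quick diagnostic pass could look like this (debug levels are illustrative; remember to lower them again afterwards):

# From a compute node: check that slurmctld is reachable
scontrol ping

# Re-run the hanging job with extra verbosity to see where it blocks
srun -vvv -N1 -n1 -l hostname

# Temporarily raise the controller log level, then revert when done
scontrol setdebug debug2
scontrol setdebug info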
SOLVED. The firewall for all compute nodes must be either "off" or configured to trust the other nodes in the system.
See Compute node firewall must be off
I was able to run (on Red Hat Linux)
firewall-cmd --zone=trusted --add-source=10.xxx.xxx.xxx --add-source=10.xxx.xxx.xxx --add-source=10.xxx.xxx.xxx
on each compute node in order to avoid turning off the firewall altogether. I think the reason the problem came up recently is that the firewall had been deactivated, and after a recent reboot of the system it came back up.
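As a follow-up sketch, assuming firewalld: to make the trusted sources survive a reboot, the same rule can be added permanently and the firewall reloaded (addresses elided as in the command above):

firewall-cmd --permanent --zone=trusted --add-source=10.xxx.xxx.xxx
firewall-cmd --reload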
Thanks @damienfrancois for pointing me to the firewall problems.

Unable to create a new app using an image from openshift internal registry

I have an nginx image and I am able to push it to the OpenShift internal registry. However, when I try to use that image from the internal registry to create an app, it gives me an ImagePullBackOff error.
Below are the steps I am following.
[root@artel1 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/nginx latest 231d40e811cd 4 weeks ago 126 MB
[root@artel1 ~]# docker tag 231d40e811cd docker-registry-default.router.default.svc.cluster.local/openshift/nginx
[root@artel1 ~]# docker push docker-registry-default.router.default.svc.cluster.local/openshift/nginx
[root@artel1 ~]# oc new-app --docker-image=docker-registry-default.router.default.svc.cluster.local/openshift/test-image
W1227 10:18:34.761105 33535 dockerimagelookup.go:233] Docker registry lookup failed: Get https://docker-registry-default.router.default.svc.cluster.local/v2/: x509: certificate signed by unknown authority
W1227 10:18:34.784988 33535 newapp.go:479] Could not find an image stream match for "docker-registry-default.router.default.svc.cluster.local/openshift/test-image:latest". Make sure that a Docker image with that tag is available on the node for the deployment to succeed.
--> Found Docker image 7809d84 (8 days old) from docker-registry-default.router.default.svc.cluster.local for "docker-registry-default.router.default.svc.cluster.local/openshift/test-image:latest"
OpenShift Node
--------------
This is a component of OpenShift and contains the software for individual nodes when using SDN.
Tags: openshift, node
* This image will be deployed in deployment config "test-image"
* Ports 53/tcp, 8443/tcp will be load balanced by service "test-image"
* Other containers can access this service through the hostname "test-image"
* WARNING: Image "docker-registry-default.router.default.svc.cluster.local/openshift/test-image:latest" runs as the 'root' user which may not be permitted by your cluster administrator
--> Creating resources ...
deploymentconfig.apps.openshift.io "test-image" created
service "test-image" created
--> Success
Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
'oc expose svc/test-image'
Run 'oc status' to view your app.
Events logs
34s 47s 2 test-image-1-dzhmk.15e44d430e48ec8d Pod spec.containers{test-image} Normal Pulling kubelet, artel2.fyre.ibm.com pulling image "docker-registry-default.router.default.svc.cluster.local/openshift/test-image:latest"
34s 46s 2 test-image-1-dzhmk.15e44d4318ec7f53 Pod spec.containers{test-image} Warning Failed kubelet, artel2.fyre.ibm.com Failed to pull image "docker-registry-default.router.default.svc.cluster.local/openshift/test-image:latest": rpc error: code = Unknown desc = Error: image openshift/test-image:latest not found
34s 46s 2 test-image-1-dzhmk.15e44d4318ed5311 Pod spec.containers{test-image} Warning Failed kubelet, artel2.fyre.ibm.com Error: ErrImagePull
27s 46s 7 test-image-1-dzhmk.15e44d433c24e5c9 Pod Normal SandboxChanged kubelet, artel2.fyre.ibm.com Pod sandbox changed, it will be killed and re-created.
25s 43s 6 test-image-1-dzhmk.15e44d43dd6a7b57 Pod spec.containers{test-image} Warning Failed kubelet, artel2.fyre.ibm.com Error: ImagePullBackOff
25s 43s 6 test-image-1-dzhmk.15e44d43dd6a10d9 Pod spec.containers{test-image} Normal BackOff kubelet, artel2.fyre.ibm.com Back-off pulling image "docker-registry-default.router.default.svc.cluster.local/openshift/test-image:latest"
Pod status
[root@artel1 ~]# oc get po
NAME READY STATUS RESTARTS AGE
test-image-1-deploy 1/1 Running 0 3m
test-image-1-dzhmk 0/1 ImagePullBackOff 0 3m
Where exactly are things going wrong?
It looks like 'docker push' didn't complete successfully; it should return 'Image successfully pushed'.
Try logging in to the internal registry first (see accessing_registry in the OpenShift docs), then recheck the registry's service hostname, or use the service IP instead.
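A minimal sketch of that check, assuming the default registry route shown in the question and that your user has push access to the openshift namespace (the master URL is hypothetical):

# Log in to OpenShift, then to the internal registry with the session token
oc login https://master.example.com:8443   # hypothetical master URL, use your own
docker login -u $(oc whoami) -p $(oc whoami -t) docker-registry-default.router.default.svc.cluster.local

# Push again and confirm the push finishes with "Image successfully pushed"
docker push docker-registry-default.router.default.svc.cluster.local/openshift/nginx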

Scaling Up of GlusterFS-storage only add new peer without new bricks in Openshift

Observed behavior
I started with a one-node OpenShift cluster, and it successfully deployed the master/node and a gluster volume. Then I extended the OpenShift cluster, and that succeeded as well.
But when extending the glusterfs volume with the inventory and command below,
[glusterfs]
10.1.1.1 glusterfs_devices='[ "/dev/vdb" ]'
10.1.1.2 glusterfs_devices='[ "/dev/vdb" ]' openshift_node_labels="type=upgrade"
ansible-playbook -i inventory2.ini /usr/share/ansible/openshift-ansible/playbooks/openshift-glusterfs/config.yml -e openshift_upgrade_nodes_label="type=upgrade"
it only added 10.1.1.2 as a peer; the volume still has only one brick.
The following customization was done in order to start deploying gluster from 1 node (--durability none):
openshift-ansible/roles/openshift_storage_glusterfs/tasks/heketi_init_db.yml
- name: Create heketi DB volume
  command: "{{ glusterfs_heketi_client }} setup-openshift-heketi-storage --image {{ glusterfs_heketi_image }} --listfile /tmp/heketi-storage.json --durability none"
  register: setup_storage
> gluster peer status
Number of Peers: 1
Hostname: 10.1.1.2
Uuid: 1b8159e4-99e2-4f4d-ad95-e97bc8655d32
State: Peer in Cluster (Connected)
gluster volume info
Volume Name: heketidbstorage
Type: Distribute
Volume ID: 769419b9-d28f-4cdd-a8f3-708b6b738f65
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.1.1.1:/var/lib/heketi/mounts/vg_4187bfa3eb090ceffea9c53b156ddbd4/brick_80401b43be8c3c8a74417b18ad574524/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
Expected/desired behavior
I expect that the addition of every new node should create a new brick too.
Details on how to reproduce (minimal and precise)
Add nodes in gluster cluster with below commands
ansible-playbook -i inventory2.ini /usr/share/ansible/openshift-ansible/playbooks/openshift-glusterfs/config.yml -e openshift_upgrade_nodes_label="type=upgrade"
Information about the environment:
Heketi version used (e.g. v6.0.0 or master): OpenShift 3.10
Operating system used: CentOS
Heketi compiled from sources, as a package (rpm/deb), or container: Container
If container, which container image: docker.io/heketi/heketi:latest
Using kubernetes, openshift, or direct install: Openshift
If kubernetes/openshift, is gluster running inside kubernetes/openshift or outside: outside
If kubernetes/openshift, how was it deployed (gk-deploy, openshift-ansible, other, custom): openshift-ansible
Just adding a node/server does not mean that a brick will also be added to the existing gluster volume. You have to add the brick, hosted on the new node, to the existing volume with a command along these lines:
gluster volume add-brick <volname> <host>:<brick-path> force
Not sure if you have included this command in your automation script or not.
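For the cluster above, a sketch might be (the brick path on the new peer is hypothetical; with heketi-managed volumes the brick would normally be created through heketi rather than by hand):

# Add a brick hosted on the new peer to the existing distribute volume
# (volume name from "gluster volume info" above; brick path is illustrative only)
gluster volume add-brick heketidbstorage 10.1.1.2:/var/lib/heketi/mounts/<vg>/<brick>/brick force

# Spread existing data across the new brick
gluster volume rebalance heketidbstorage start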

openshift v3 online pro volume and memory limit issues

I am trying to run sonatype/nexus3 on OpenShift Online v3 Pro. If I just use the web console to create a new app from the image, it assigns only 512Mi and the pod dies with OOM. It did get created, though, and logged a lot of Java output before it died of out-of-memory. When using the web console there doesn't appear to be a way to set the memory on the image, and when I try to edit the YAML of the pod it doesn't let me edit the memory limit.
Reading the docs about memory limits it suggests that I can run with this:
oc run nexus333 --image=sonatype/nexus3 --limits=memory=750Mi
Then it doesn't even start. It dies with:
{kubelet ip-172-31-59-148.ec2.internal} Error: Error response from
daemon: {"message":"create
c30deb38b3c26252bf1218cc898fbf1c68d8fc14e840076710c211d58ed87a59:
mkdir
/var/lib/docker/volumes/c30deb38b3c26252bf1218cc898fbf1c68d8fc14e840076710c211d58ed87a59:
permission denied"}
More information from oc get events:
FIRSTSEEN LASTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
16m 16m 1 nexus333-1-deploy Pod Normal Scheduled {default-scheduler } Successfully assigned nexus333-1-deploy to ip-172-31-50-97.ec2.internal
16m 16m 1 nexus333-1-deploy Pod spec.containers{deployment} Normal Pulling {kubelet ip-172-31-50-97.ec2.internal} pulling image "registry.reg-aws.openshift.com:443/openshift3/ose-deployer:v3.6.173.0.21"
16m 16m 1 nexus333-1-deploy Pod spec.containers{deployment} Normal Pulled {kubelet ip-172-31-50-97.ec2.internal} Successfully pulled image "registry.reg-aws.openshift.com:443/openshift3/ose-deployer:v3.6.173.0.21"
15m 15m 1 nexus333-1-deploy Pod spec.containers{deployment} Normal Created {kubelet ip-172-31-50-97.ec2.internal} Created container
15m 15m 1 nexus333-1-deploy Pod spec.containers{deployment} Normal Started {kubelet ip-172-31-50-97.ec2.internal} Started container
15m 15m 1 nexus333-1-rftvd Pod Normal Scheduled {default-scheduler } Successfully assigned nexus333-1-rftvd to ip-172-31-59-148.ec2.internal
15m 14m 7 nexus333-1-rftvd Pod spec.containers{nexus333} Normal Pulling {kubelet ip-172-31-59-148.ec2.internal} pulling image "sonatype/nexus3"
15m 10m 19 nexus333-1-rftvd Pod spec.containers{nexus333} Normal Pulled {kubelet ip-172-31-59-148.ec2.internal} Successfully pulled image "sonatype/nexus3"
15m 15m 1 nexus333-1-rftvd Pod spec.containers{nexus333} Warning Failed {kubelet ip-172-31-59-148.ec2.internal} Error: Error response from daemon: {"message":"create 3aa35201bdf81d09ef4b09bba1fc843b97d0339acfef0c30cecaa1fbb6207321: mkdir /var/lib/docker/volumes/3aa35201bdf81d09ef4b09bba1fc843b97d0339acfef0c30cecaa1fbb6207321: permission denied"}
I am not sure why I cannot assign more memory when using the web console. I am not sure why running it with oc run dies with the mkdir error. Can anyone tell me how to run sonatype/nexus3 on OpenShift Online Pro?
Looking in the documentation, I see that it is a Java VM application.
When using Java 8, memory usage can be DRAMATICALLY IMPROVED using only the following 2 runtime Java VM options:
... "-XX:+UnlockExperimentalVMOptions", "-XX:+UseCGroupMemoryLimitForHeap" ...
I just deployed my container (a Spring Boot JAR) that consumed over 650 MB RAM. With just these two (new) options, RAM consumption dropped to 270 MB!
So, with these two runtime settings, all OOMs are left far behind. Enjoy!
You may also want to follow along with the tutorial in the OpenShift docs: https://docs.openshift.com/online/dev_guide/app_tutorials/maven_tutorial.html
I have had success deploying this in OpenShift Online Pro
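A sketch of both knobs on the command line, assuming oc run created a deployment config named nexus333 and that the sonatype/nexus3 image reads extra JVM options from the INSTALL4J_ADD_VM_PARAMS environment variable (check the sonatype/docker-nexus3 documentation for your image version):

# Pass the cgroup-aware JVM flags to the container (env var name per image docs)
oc set env dc/nexus333 INSTALL4J_ADD_VM_PARAMS="-Xms1200m -Xmx1200m -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap"

# Raise the container's memory request and limit on the same deployment config
oc set resources dc/nexus333 --requests=memory=1200Mi --limits=memory=1200Mi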
Okay, the mkdir /var/lib/docker/volumes/ permission denied seems to mean that the image needs a /nexus-data mount and that is being refused. I saw that by deploying from the web console (which dies with OOM) and then editing the YAML of the created pod to see the generated volume mount.
Creating the pod with the following YAML, using cat nexus3_pod.ephemeral.yaml | oc create -f -, with the volume mount and explicit memory settings, the container now starts up:
apiVersion: "v1"
kind: "Pod"
metadata:
name: "nexus3"
labels:
name: "nexus3"
spec:
containers:
-
name: "nexus3"
resources:
requests:
memory: "1200Mi"
limits:
memory: "1200Mi"
image: "sonatype/nexus3"
ports:
-
containerPort: 8081
name: "nexus3"
volumeMounts:
- mountPath: /nexus-data
name: nexus3-1
volumes:
- emptyDir: {}
name: nexus3-1
Notes
The image sets -Xmx1200m, as documented at sonatype/docker-nexus3. So if you assign less than 1200Mi of memory, it will crash with OOM when the heap grows over the limit. You may as well set both the requested and limit memory to at least the max heap size.
When the allocated memory was too low, it crashed just as it was setting up the DB, which corrupted the DB log, which meant it then got into a crash loop ("couldn't load 4 byte from 0 byte file") even when I recreated it with more memory. It seems that with an emptyDir the files hang around between crash restarts and memory changes (that's documented behaviour, I think). I had to recreate the pod with a different name to get a clean emptyDir, with 1200Mi of assigned memory, to get it all to start.

Multiple HAProxy instances on OpenShift

I have an application (Node.js) deployed on OpenShift (bronze plan) with the Web Load Balancer activated; the minimum number of active gears is 3 and the maximum is 16.
Sometimes in the main gear I can see more than one HAProxy instance running, for example now I have:
> ps -ef|grep /usr/sbin/haproxy
3505 37488 1 1 08:46 ? 00:00:01 /usr/sbin/haproxy -f /var/lib/openshift/<APP_ID>/haproxy//conf/haproxy.cfg -sf 37237
3505 149643 1 1 May28 ? 00:09:08 /usr/sbin/haproxy -f /var/lib/openshift/<APP_ID>/haproxy//conf/haproxy.cfg -sf 114873
Looking at the logs, I can't find any error. Any explanation for this?
Thanks!
This could be a consequence of executing the HAProxy reload script (/etc/init.d/haproxy). A reload will usually create a new haproxy process to accept new connections, and it will keep the old process alive as long as there are still open connections to it. Once they are closed, the old haproxy process is terminated.
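The -sf argument visible in the ps output above is the mechanism behind this: the reload starts a new haproxy and tells the old process to finish its current connections and then exit. Roughly (pids illustrative):

# "Soft stop": start a new haproxy and ask the old process (pid 37237 here)
# to stop accepting new connections and exit once its connections close
/usr/sbin/haproxy -f /var/lib/openshift/<APP_ID>/haproxy//conf/haproxy.cfg -sf 37237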