Unable to start nginx-ingress-controller Readiness and Liveness probes failed - kubernetes-ingress

I have installed using instructions at this link for the Install NGINX using NodePort option.
When I do ks logs -f ingress-nginx-controller-7f48b8-s7pg4 -n ingress-nginx I get :
W0304 09:33:40.568799 8 client_config.go:614] Neither --kubeconfig nor --master was
specified. Using the inClusterConfig. This might not work.
I0304 09:33:40.569097 8 main.go:241] "Creating API client" host="https://10.96.0.1:443"
I0304 09:33:40.584904 8 main.go:285] "Running in Kubernetes cluster" major="1" minor="23" git="v1.23.1+k0s" state="clean" commit="b230d3e4b9d6bf4b731d96116a6643786e16ac3f" platform="linux/amd64"
I0304 09:33:40.911443 8 main.go:105] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
I0304 09:33:40.916404 8 main.go:115] "Enabling new Ingress features available since Kubernetes v1.18"
W0304 09:33:40.918137 8 main.go:127] No IngressClass resource with name nginx found. Only annotation will be used.
I0304 09:33:40.942282 8 ssl.go:532] "loading tls certificate" path="/usr/local/certificates/cert" key="/usr/local/certificates/key"
I0304 09:33:40.977766 8 nginx.go:254] "Starting NGINX Ingress controller"
I0304 09:33:41.007616 8 event.go:282] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"ingress-nginx-controller", UID:"1a4482d2-86cb-44f3-8ebb-d6342561892f", APIVersion:"v1", ResourceVersion:"987560", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress-nginx/ingress-nginx-controller
E0304 09:33:42.087113 8 reflector.go:138] k8s.io/client-go#v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1beta1.Ingress: failed to list *v1beta1.Ingress: the server could not find the requested resource
E0304 09:33:43.041954 8 reflector.go:138] k8s.io/client-go#v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1beta1.Ingress: failed to list *v1beta1.Ingress: the server could not find the requested resource
E0304 09:33:44.724681 8 reflector.go:138] k8s.io/client-go#v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1beta1.Ingress: failed to list *v1beta1.Ingress: the server could not find the requested resource
E0304 09:33:48.303789 8 reflector.go:138] k8s.io/client-go#v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1beta1.Ingress: failed to list *v1beta1.Ingress: the server could not find the requested resource
E0304 09:33:59.113203 8 reflector.go:138] k8s.io/client-go#v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1beta1.Ingress: failed to list *v1beta1.Ingress: the server could not find the requested resource
E0304 09:34:16.727052 8 reflector.go:138] k8s.io/client-go#v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1beta1.Ingress: failed to list *v1beta1.Ingress: the server could not find the requested resource
I0304 09:34:39.216165 8 main.go:187] "Received SIGTERM, shutting down"
I0304 09:34:39.216773 8 nginx.go:372] "Shutting down controller queues"
E0304 09:34:39.217779 8 store.go:178] timed out waiting for caches to sync
I0304 09:34:39.217856 8 nginx.go:296] "Starting NGINX process"
I0304 09:34:39.218007 8 leaderelection.go:243] attempting to acquire leader lease ingress-nginx/ingress-controller-leader-nginx...
I0304 09:34:39.219741 8 queue.go:78] "queue has been shutdown, failed to enqueue" key="&ObjectMeta{Name:initial-sync,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[]OwnerReference{},Finalizers:[],ClusterName:,ManagedFields:[]ManagedFieldsEntry{},}"
I0304 09:34:39.219787 8 nginx.go:316] "Starting validation webhook" address=":8443" certPath="/usr/local/certificates/cert" keyPath="/usr/local/certificates/key"
I0304 09:34:39.242501 8 leaderelection.go:253] successfully acquired lease ingress-nginx/ingress-controller-leader-nginx
I0304 09:34:39.242807 8 queue.go:78] "queue has been shutdown, failed to enqueue" key="&ObjectMeta{Name:sync status,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[]OwnerReference{},Finalizers:[],ClusterName:,ManagedFields:[]ManagedFieldsEntry{},}"
I0304 09:34:39.242837 8 status.go:84] "New leader elected" identity="ingress-nginx-controller-7f48b8-s7pg4"
I0304 09:34:39.252025 8 status.go:204] "POD is not ready" pod="ingress-nginx/ingress-nginx-controller-7f48b8-s7pg4" node="fbcdcesdn02"
I0304 09:34:39.255282 8 status.go:132] "removing value from ingress status" address=[]
I0304 09:34:39.255328 8 nginx.go:380] "Stopping admission controller"
I0304 09:34:39.255379 8 nginx.go:388] "Stopping NGINX process"
E0304 09:34:39.255664 8 nginx.go:319] "Error listening for TLS connections" err="http: Server closed"
2022/03/04 09:34:39 [notice] 43#43: signal process started
I0304 09:34:40.263361 8 nginx.go:401] "NGINX process has stopped"
I0304 09:34:40.263396 8 main.go:195] "Handled quit, awaiting Pod deletion"
I0304 09:34:50.263585 8 main.go:198] "Exiting" code=0
When I do ks describe pod ingress-nginx-controller-7f48b8-s7pg4 -n ingress-nginx I get :
Name: ingress-nginx-controller-7f48b8-s7pg4
Namespace: ingress-nginx
Priority: 0
Node: fxxxxxxxx/10.XXX.XXX.XXX
Start Time: Fri, 04 Mar 2022 08:12:57 +0200
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=ingress-nginx
app.kubernetes.io/name=ingress-nginx
pod-template-hash=7f48b8
Annotations: kubernetes.io/psp: 00-k0s-privileged
Status: Running
IP: 10.244.0.119
IPs:
IP: 10.244.0.119
Controlled By: ReplicaSet/ingress-nginx-controller-7f48b8
Containers:
controller:
Container ID: containerd://638ff4d63b7ba566125bd6789d48db6e8149b06cbd9d887ecc57d08448ba1d7e
Image: k8s.gcr.io/ingress-nginx/controller:v0.48.1#sha256:e9fb216ace49dfa4a5983b183067e97496e7a8b307d2093f4278cd550c303899
Image ID: k8s.gcr.io/ingress-nginx/controller#sha256:e9fb216ace49dfa4a5983b183067e97496e7a8b307d2093f4278cd550c303899
Ports: 80/TCP, 443/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
/nginx-ingress-controller
--election-id=ingress-controller-leader
--ingress-class=nginx
--configmap=$(POD_NAMESPACE)/ingress-nginx-controller
--validating-webhook=:8443
--validating-webhook-certificate=/usr/local/certificates/cert
--validating-webhook-key=/usr/local/certificates/key
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 04 Mar 2022 11:33:40 +0200
Finished: Fri, 04 Mar 2022 11:34:50 +0200
Ready: False
Restart Count: 61
Requests:
cpu: 100m
memory: 90Mi
Liveness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
Readiness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: ingress-nginx-controller-7f48b8-s7pg4 (v1:metadata.name)
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
LD_PRELOAD: /usr/local/lib/libmimalloc.so
Mounts:
/usr/local/certificates/ from webhook-cert (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zvcnr (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
webhook-cert:
Type: Secret (a volume populated by a Secret)
SecretName: ingress-nginx-admission
Optional: false
kube-api-access-zvcnr:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 23m (x316 over 178m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
Warning BackOff 8m52s (x555 over 174m) kubelet Back-off restarting failed container
Normal Pulled 3m54s (x51 over 178m) kubelet Container image "k8s.gcr.io/ingress-nginx/controller:v0.48.1#sha256:e9fb216ace49dfa4a5983b183067e97496e7a8b307d2093f4278cd550c303899" already present on machine
When I try to curl the health endpoints I get Connection refused :
The state of the pods shows that they are both not ready :
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create-4hzzk 0/1 Completed 0 3h30m
ingress-nginx-controller-7f48b8-s7pg4 0/1 CrashLoopBackOff 63 (91s ago) 3h30m
I have tried to increase the values for initialDelaySeconds in /etc/nginx/nginx.conf but when I attempt to exec into the container (ks exec -it -n ingress-nginx ingress-nginx-controller-7f48b8-s7pg4 -- bash) I also get an error error: unable to upgrade connection: container not found ("controller")
I am not really sure where I should be looking in the overall setup.

I have installed using instructions at this link for the Install NGINX using NodePort option.
The problem is that you are using outdated k0s documentation:
https://docs.k0sproject.io/v1.22.2+k0s.1/examples/nginx-ingress/
You should use this link instead:
https://docs.k0sproject.io/main/examples/nginx-ingress/
You will install the controller-v1.0.0 version on your Kubernetes cluster by following the actual documentation link.
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.0.0/deploy/static/provider/baremetal/deploy.yaml
The result is:
$ sudo k0s kubectl get pods -n ingress-nginx
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create-dw2f4 0/1 Completed 0 11m
ingress-nginx-admission-patch-4dmpd 0/1 Completed 0 11m
ingress-nginx-controller-75f58fbf6b-xrfxr 1/1 Running 0 11m

Related

"jx boot" fails in "openshift-3.11" provider with "tekton pipeline controller" pod into "crashloopbackoff" state

Summary:
I already have a setup of "static jenkins server" type jenkins-x running in openshift 3.11 provider. The cluster was crashed and I want to reinstall jenkins-x in my cluster but there is no support for "static jenkins server" now.
So I am trying to install "jenkins-x" via "jx boot" but the installation fails with "tekton pipeline controller" pod into "crashloopbackoff" state.
Steps to reproduce the behavior:
jx-requirements.yml:
autoUpdate:
enabled: false
schedule: ""
bootConfigURL: https://github.com/jenkins-x/jenkins-x-boot-config.git
cluster:
clusterName: cic-60
devEnvApprovers:
- automation
environmentGitOwner: cic-60
gitKind: bitbucketserver
gitName: bs
gitServer: http://rtx-swtl-git.fnc.net.local
namespace: jx
provider: openshift
registry: docker-registry.default.svc:5000
environments:
- ingress:
domain: 172.29.35.81.nip.io
externalDNS: false
namespaceSubDomain: -jx.
tls:
email: ""
enabled: false
production: false
key: dev
repository: environment-cic-60-dev
- ingress:
domain: ""
externalDNS: false
namespaceSubDomain: ""
tls:
email: ""
enabled: false
production: false
key: staging
repository: environment-cic-60-staging
- ingress:
domain: ""
externalDNS: false
namespaceSubDomain: ""
tls:
email: ""
enabled: false
production: false
key: production
repository: environment-cic-60-production
gitops: true
ingress:
domain: 172.29.35.81.nip.io
externalDNS: false
namespaceSubDomain: -jx.
tls:
email: ""
enabled: false
production: false
kaniko: true
repository: nexus
secretStorage: local
storage:
backup:
enabled: false
url: ""
logs:
enabled: false
url: ""
reports:
enabled: false
url: ""
repository:
enabled: false
url: ""
vault: {}
velero:
schedule: ""
ttl: ""
versionStream:
ref: v1.0.562
url: https://github.com/jenkins-x/jenkins-x-versions.git
webhook: lighthouse
Expected behavior:
All the pods under jx namespace should be up & running and jenkins-x should be installed properly
Actual behavior:
Tekton pipeline controller pod is into "CrashLoopBackOff" state with error:
Pods with status in "jx" namespace:
NAME READY STATUS RESTARTS AGE
jenkins-x-chartmuseum-5687695d57-pp994 1/1 Running 0 1d
jenkins-x-controllerbuild-78b4b56695-mg2vs 1/1 Running 0 1d
jenkins-x-controllerrole-765cf99bdb-swshp 1/1 Running 0 1d
jenkins-x-docker-registry-5bcd587565-rhd7q 1/1 Running 0 1d
jenkins-x-gcactivities-1598421600-jtgm6 0/1 Completed 0 1h
jenkins-x-gcactivities-1598423400-4rd76 0/1 Completed 0 43m
jenkins-x-gcactivities-1598425200-sd7xm 0/1 Completed 0 13m
jenkins-x-gcpods-1598421600-z7s4w 0/1 Completed 0 1h
jenkins-x-gcpods-1598423400-vzb6p 0/1 Completed 0 43m
jenkins-x-gcpods-1598425200-56zdp 0/1 Completed 0 13m
jenkins-x-gcpreviews-1598421600-5k4vf 0/1 Completed 0 1h
jenkins-x-nexus-c7dcb47c7-fh7kx 1/1 Running 0 1d
lighthouse-foghorn-654c868bc8-d5w57 1/1 Running 0 1d
lighthouse-gc-jobs-1598421600-bmsq8 0/1 Completed 0 1h
lighthouse-gc-jobs-1598423400-zskt5 0/1 Completed 0 43m
lighthouse-gc-jobs-1598425200-m9gtd 0/1 Completed 0 13m
lighthouse-jx-controller-6c9b8994bd-qt6tc 1/1 Running 0 1d
lighthouse-keeper-7c6fd9466f-gdjjt 1/1 Running 0 1d
lighthouse-webhooks-56668dc58b-4c52j 1/1 Running 0 1d
lighthouse-webhooks-56668dc58b-8dh27 1/1 Running 0 1d
tekton-pipelines-controller-76c8c8dd78-llj4c 0/1 CrashLoopBackOff 436 1d
tiller-7ddfd45c57-rwtt9 1/1 Running 0 1d
Error log:
2020/08/24 18:38:00 Registering 4 clients
2020/08/24 18:38:00 Registering 3 informer factories
2020/08/24 18:38:00 Registering 8 informers
2020/08/24 18:38:00 Registering 2 controllers
{"level":"info","caller":"logging/config.go:108","msg":"Successfully created the logger."}
{"level":"info","caller":"logging/config.go:109","msg":"Logging level set to info"}
{"level":"fatal","logger":"tekton","caller":"sharedmain/main.go:149","msg":"Version check failed","commit":"821ac4d","error":"kubernetes version \"v1.11.0\" is not compatible, need at least \"v1.14.0\" (this can be overridden with the env var \"KUBERNETES_MIN_VERSION\")","stacktrace":"github.com/tektoncd/pipeline/vendor/knative.dev/pkg/injection/sharedmain.MainWithConfig\n\tgithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/injection/sharedmain/main.go:149\ngithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/injection/sharedmain.MainWithContext\n\tgithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/injection/sharedmain/main.go:114\nmain.main\n\tgithub.com/tektoncd/pipeline/cmd/controller/main.go:72\nruntime.main\n\truntime/proc.go:203"}
After downgrading the tekton image from "0.11.0" to "0.9.0" the tekton pipeline controller pod is into running state. And a new tekton pipeline webhook pod got created and it is into "Crashloopbackoff"
Jx version:
Version 2.1.127
Commit 4bc05a9
Build date 2020-08-05T20:34:57Z
Go version 1.13.8
Git tree state clean
Diagnostic information:
The output of jx diagnose version is:
Running in namespace: jx
Version 2.1.127
Commit 4bc05a9
Build date 2020-08-05T20:34:57Z
Go version 1.13.8
Git tree state clean
NAME VERSION
Kubernetes cluster v1.11.0+d4cacc0
kubectl (installed in JX_BIN) v1.16.6-beta.0
helm client 2.16.9
git 2.24.1
Operating System "CentOS Linux release 7.8.2003 (Core)"
Please visit https://jenkins-x.io/faq/issues/ for any known issues.
Finished printing diagnostic information
Kubernetes cluster: openshift - 3.11
Kubectl version:
Client Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2018-10-15T09:45:30Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
Operating system / Environment:
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
I need to install "jenkins-x" via "jx boot" in "openshift-3.11" which uses default kubernetes version - 1.11.0 but "jx boot" requires atleast 1.14.0. Please suggest if there is any work around to get jenkins-x on openshift-3.11
As the error message shows (in the crashloop), kubernetes version "v1.11.0" is not compatible, need at least "v1.14.0", which make it not installable on OpenShift 3 (as it ships with Kubernetes 1.11.0). It seems jenkins-X comes with Tetkon Pipelines v0.14.2 which requires at least Kubernetes 1.14.0 (and later releases like Tekton Pipelines v0.15.0 requires Kubernetes 1.16.0).
{"level":"fatal","logger":"tekton","caller":"sharedmain/main.go:149","msg":"Version check failed","commit":"821ac4d","error":"kubernetes version \"v1.11.0\" is not compatible, need at least \"v1.14.0\" (this can be overridden with the env var \"KUBERNETES_MIN_VERSION\")","stacktrace":"github.com/tektoncd/pipeline/vendor/knative.dev/pkg/injection/sharedmain.MainWithConfig\n\tgithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/injection/sharedmain/main.go:149\ngithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/injection/sharedmain.MainWithContext\n\tgithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/injection/sharedmain/main.go:114\nmain.main\n\tgithub.com/tektoncd/pipeline/cmd/controller/main.go:72\nruntime.main\n\truntime/proc.go:203"}
Theorically, setting KUBERNETES_MIN_VERSION in the controller deployment might make it work but it is not being tested and most likely the controller won't behave correctly as it's using feature that are not available in 1.11.0. Other than this, there is no workaround that I know of.

How to manually recreate the bootstrap client certificate for OpenShift 3.11 master?

Our origin-node.service on the master node fails with:
root#master> systemctl start origin-node.service
Job for origin-node.service failed because the control process exited with error code. See "systemctl status origin-node.service" and "journalctl -xe" for details.
root#master> systemctl status origin-node.service -l
[...]
May 05 07:17:47 master origin-node[44066]: bootstrap.go:195] Part of the existing bootstrap client certificate is expired: 2020-02-20 13:14:27 +0000 UTC
May 05 07:17:47 master origin-node[44066]: bootstrap.go:56] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
May 05 07:17:47 master origin-node[44066]: certificate_store.go:131] Loading cert/key pair from "/etc/origin/node/certificates/kubelet-client-current.pem".
May 05 07:17:47 master origin-node[44066]: server.go:262] failed to run Kubelet: cannot create certificate signing request: Post https://lb.openshift-cluster.mydomain.com:8443/apis/certificates.k8s.io/v1beta1/certificatesigningrequests: EOF
So it seems that kubelet-client-current.pem and/or kubelet-server-current.pem contains an expired certificate and the service tries to create a CSR using an endpoint which is probably not yet available (because the master is down). We tried redeploying the certificates according to the OpenShift documentation Redeploying Certificates, but this fails while detecting an expired certificate:
root#master> ansible-playbook -i /etc/ansible/hosts openshift-master/redeploy-openshift-ca.yml
[...]
TASK [openshift_certificate_expiry : Fail when certs are near or already expired] *******************************************************************************************************************************************
fatal: [master.openshift-cluster.mydomain.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 60 days of expiring. You may view the report at /root/cert-expiry-report.20200505T042754.html or /root/cert-expiry-report.20200505T042754.json.\n"}
[...]
root#master> cat /root/cert-expiry-report.20200505T042754.json
[...]
"kubeconfigs": [
{
"cert_cn": "O:system:cluster-admins, CN:system:admin",
"days_remaining": -75,
"expiry": "2020-02-20 13:14:27",
"health": "expired",
"issuer": "CN=openshift-signer#1519045219 ",
"path": "/etc/origin/node/node.kubeconfig",
"serial": 27,
"serial_hex": "0x1b"
},
{
"cert_cn": "O:system:cluster-admins, CN:system:admin",
"days_remaining": -75,
"expiry": "2020-02-20 13:14:27",
"health": "expired",
"issuer": "CN=openshift-signer#1519045219 ",
"path": "/etc/origin/node/node.kubeconfig",
"serial": 27,
"serial_hex": "0x1b"
},
[...]
"summary": {
"expired": 2,
"ok": 22,
"total": 24,
"warning": 0
}
}
There is a guide for OpenShift 4.4 for Recovering from expired control plane certificates, but that does not apply for 3.11 and we did not find such a guide for our version.
Is it possible to recreate the expired certificates without a running master node for 3.11? Thanks for any help.
OpenShift Ansible: https://github.com/openshift/openshift-ansible/releases/tag/openshift-ansible-3.11.153-2
Update 2020-05-06: I also executed redeploy-certificates.yml, but it fails at the same TASK:
root#master> ansible-playbook -i /etc/ansible/hosts playbooks/redeploy-certificates.yml
[...]
TASK [openshift_certificate_expiry : Fail when certs are near or already expired] ******************************************************************************
Wednesday 06 May 2020 04:07:06 -0400 (0:00:00.909) 0:01:07.582 *********
fatal: [master.openshift-cluster.mydomain.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 60 days of expiring. You may view the report at /root/cert-expiry-report.20200506T040603.html or /root/cert-expiry-report.20200506T040603.json.\n"}
Update 2020-05-11: Running with -e openshift_certificate_expiry_fail_on_warn=False results in:
root#master> ansible-playbook -i /etc/ansible/hosts -e openshift_certificate_expiry_fail_on_warn=False playbooks/redeploy-certificates.yml
[...]
TASK [Wait for master API to come back online] *****************************************************************************************************************
Monday 11 May 2020 03:48:56 -0400 (0:00:00.111) 0:02:25.186 ************
skipping: [master.openshift-cluster.mydomain.com]
TASK [openshift_control_plane : restart master] ****************************************************************************************************************
Monday 11 May 2020 03:48:56 -0400 (0:00:00.257) 0:02:25.444 ************
changed: [master.openshift-cluster.mydomain.com] => (item=api)
changed: [master.openshift-cluster.mydomain.com] => (item=controllers)
RUNNING HANDLER [openshift_control_plane : verify API server] **************************************************************************************************
Monday 11 May 2020 03:48:57 -0400 (0:00:00.945) 0:02:26.389 ************
FAILED - RETRYING: verify API server (120 retries left).
FAILED - RETRYING: verify API server (119 retries left).
[...]
FAILED - RETRYING: verify API server (1 retries left).
fatal: [master.openshift-cluster.mydomain.com]: FAILED! => {"attempts": 120, "changed": false, "cmd": ["curl", "--silent", "--tlsv1.2", "--max-time", "2", "--cacert", "/etc/origin/master/ca-bundle.crt", "https://lb.openshift-cluster.mydomain.com:8443/healthz/ready"], "delta": "0:00:00.182367", "end": "2020-05-11 03:51:52.245644", "msg": "non-zero return code", "rc": 35, "start": "2020-05-11 03:51:52.063277", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
root#master> systemctl status origin-node.service -l
[...]
May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: E0511 04:23:28.077964 109972 bootstrap.go:195] Part of the existing bootstrap client certificate is expired: 2020-02-20 13:14:27 +0000 UTC
May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: I0511 04:23:28.078001 109972 bootstrap.go:56] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: I0511 04:23:28.080555 109972 certificate_store.go:131] Loading cert/key pair from "/etc/origin/node/certificates/kubelet-client-current.pem".
May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: F0511 04:23:28.130968 109972 server.go:262] failed to run Kubelet: cannot create certificate signing request: Post https://lb.openshift-cluster.mydomain.com:8443/apis/certificates.k8s.io/v1beta1/certificatesigningrequests: EOF
[...]
I have this same case in customer environment, this error is because the certified was expiry, i "cheated" changing da S.O date before the expiry date. And the origin-node service started in my masters:
systemctl status origin-node
● origin-node.service - OpenShift Node
Loaded: loaded (/etc/systemd/system/origin-node.service; enabled; vendor preset: disabled)
Active: active (running) since Sáb 2021-02-20 20:22:21 -02; 6min ago
Docs: https://github.com/openshift/origin
Main PID: 37230 (hyperkube)
Memory: 79.0M
CGroup: /system.slice/origin-node.service
└─37230 /usr/bin/hyperkube kubelet --v=2 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-webhook=true --authentication-token-webhook-cache-ttl=5m --authorization-mode=Webhook --authorization-webhook-c...
Você tem mensagem de correio em /var/spool/mail/okd
The openshift_certificate_expiry role uses the openshift_certificate_expiry_fail_on_warn variable to determine if the playbook should fail when the days left are less than openshift_certificate_expiry_warning_days.
So try running the redeploy-certificates.yml with this additional variable set to "False":
ansible-playbook -i /etc/ansible/hosts -e openshift_certificate_expiry_fail_on_warn=False playbooks/redeploy-certificates.yml

Upgrading K8S cluster from v1.2.0 to v1.3.0

I have 1 master and 4 minions all running on version 1.2.0. I am planning to upgrade them to 1.3.0. I want this done with minimal downtime.
So I did the following on one minion.
systemctl stop kubelet
yum update kubernetes-1.3.0-0.3.git86dc49a.el7
systemctl start kubelet
Once I bring up the service, i see the following ERROR.
Mar 28 20:36:55 csdp-e2e-kubernetes-minion-6 kubelet[9902]: E0328 20:36:55.215614 9902 kubelet.go:1222] Unable to register node "172.29.240.169" with API server: the body of the request was in an unknown format - accepted media types include: application/json, application/yaml
Mar 28 20:36:55 csdp-e2e-kubernetes-minion-6 kubelet[9902]: E0328 20:36:55.217612 9902 event.go:198] Server rejected event '&api.Event{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"172.29.240.169.14b01ded8fb2d07b", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]api.OwnerReference(nil), Finalizers:[]string(nil)}, InvolvedObject:api.ObjectReference{Kind:"Node", Namespace:"", Name:"172.29.240.169", UID:"172.29.240.169", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientDisk", Message:"Node 172.29.240.169 status is now: NodeHasSufficientDisk", Source:api.EventSource{Component:"kubelet", Host:"172.29.240.169"}, FirstTimestamp:unversioned.Time{Time:time.Time{sec:63626321182, nsec:814949499, loc:(*time.Location)(0x4c8a780)}}, LastTimestamp:unversioned.Time{Time:time.Time{sec:63626330215, nsec:213372890, loc:(*time.Location)(0x4c8a780)}}, Count:1278, Type:"Normal"}': 'the body of the request was in an unknown format - accepted media types include: application/json, application/yaml' (will not retry!)
Mar 28 20:36:55 csdp-e2e-kubernetes-minion-6 kubelet[9902]: E0328 20:36:55.246100 9902 event.go:198] Server rejected event '&api.Event{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"172.29.240.169.14b01ded8fb2fc88", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]api.OwnerReference(nil), Finalizers:[]string(nil)}, InvolvedObject:api.ObjectReference{Kind:"Node", Namespace:"", Name:"172.29.240.169", UID:"172.29.240.169", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientMemory", Message:"Node 172.29.240.169 status is now: NodeHasSufficientMemory", Source:api.EventSource{Component:"kubelet", Host:"172.29.240.169"}, FirstTimestamp:unversioned.Time{Time:time.Time{sec:63626321182, nsec:814960776, loc:(*time.Location)(0x4c8a780)}}, LastTimestamp:unversioned.Time{Time:time.Time{sec:63626330215, nsec:213381138, loc:(*time.Location)(0x4c8a780)}}, Count:1278, Type:"Normal"}': 'the body of the request was in an unknown format - accepted media types include: application/json, application/yaml' (will not retry!)
Is v1.2.0 incompatible with v1.3.0 ?
Seems like the issue is with JSON incompatibility ? application/json, application/yaml
From master standpoint ::
[root#kubernetes-master ~]# kubectl get nodes
NAME STATUS AGE
172.29.219.105 Ready 3h
172.29.240.146 Ready 3h
172.29.240.168 Ready 3h
172.29.240.169 NotReady 3h
The node that I upgraded is in NotReady state.
As per the documentation you must upgrade your master components (kube-scheduler, kube-apiserver and kube-controller-manager) before your node components (kubelet, kube-proxy).
https://kubernetes.io/docs/getting-started-guides/ubuntu/upgrades/

ejabberd mod_rest installation warning & crash report

I've tried to test mod_rest for my ejabberd. Received 'OK' result but with warning : "Warning: undefined callback function depends/2 (behaviour 'gen_mod')" during installation of the module. When I tried to test mod_rest that I installed using REST client, there's CRASH REPORT in log.
ejabberd version : 16.08.23
URL : http://localhost:5280/rest
ejabberd.yml
hosts:
- "localhost"
- "192.168.88.88"
- "127.0.0.1"
listen:
-
port: 5280
module: ejabberd_http
request_handlers:
"/websocket": ejabberd_http_ws
"/rest": mod_rest
web_admin: true
http_bind: true
captcha: true
modules:
mod_rest: {}
INSTALLATION INFO
$/sbin/ejabberdctl module_install mod_rest
src/mod_rest.erl:27: Warning: undefined callback function depends/2 (behaviour 'gen_mod')
ok
ejabberd.log
2016-08-24 11:14:31.516 [error] <0.3683.0> CRASH REPORT Process
<0.3683.0> with 0 neighbours crashed with reason: bad argument in call
to erlang:binary_to_list({127,0,0,1}) in str:tokens/2 line 242
crash.log
2016-08-24 14:42:20 =CRASH REPORT====
crasher:
initial call: ejabberd_http:init/2
pid: <0.3274.0>
registered_name: []
exception error: bad argument: [{erlang,binary_to_list,[{127,0,0,1}],[]},{str,tokens,2,[{file,"src/str.erl"},{line,242}]},{acl,parse_ip_netmask,1,[{file,"src/acl.erl"},{line,536}]},{mod_rest,'-ip_matches/2-fun-0-',2,[{file,"src/mod_rest.erl"},{line,144}]},{lists,any,2,[{file,"lists.erl"},{line,1225}]},{mod_rest,check_member_option,3,[{file,"src/mod_rest.erl"},{line,134}]},{mod_rest,process,2,[{file,"src/mod_rest.erl"},{line,55}]},{ejabberd_http,process,5,[{file,"src/ejabberd_http.erl"},{line,363}]}]
ancestors: [<0.3190.0>,ejabberd_listeners,ejabberd_sup,<0.2578.0>]
messages: []
links: [#Port<0.35648>]
dictionary: []
trap_exit: false
status: running
heap_size: 1598
stack_size: 27
reductions: 1894
neighbours:

Unable to mount volumes for pod

EDITED:
I've an OpenShift cluster with one master and two nodes. I've installed NFS on the master and NFS client on the nodes.
I've followed the wordpress example with NFS: https://github.com/openshift/origin/tree/master/examples/wordpress
I did the following on my master as: oc login -u system:admin:
mkdir /home/data/pv0001
mkdir /home/data/pv0002
chown -R nfsnobody:nfsnobody /home/data
chmod -R 777 /home/data/
# Add to /etc/exports
/home/data/pv0001 *(rw,sync,no_root_squash)
/home/data/pv0002 *(rw,sync,no_root_squash)
# Enable the new exports without bouncing the NFS service
exportfs -a
So exportfs shows:
/home/data/pv0001
<world>
/home/data/pv0002
<world>
$ setsebool -P virt_use_nfs 1
# Create the persistent volumes for NFS.
# I did not change anything in the yaml-files
$ oc create -f examples/wordpress/nfs/pv-1.yaml
$ oc create -f examples/wordpress/nfs/pv-2.yaml
$ oc get pv
NAME LABELS CAPACITY ACCESSMODES STATUS CLAIM REASON
pv0001 <none> 1073741824 RWO,RWX Available
pv0002 <none> 5368709120 RWO Available
This is also what I get.
Than I'm going to my node:
oc login
test-admin
And I create a wordpress project:
oc new-project wordpress
# Create claims for storage in my project (same namespace).
# The claims in this example carefully match the volumes created above.
$ oc create -f examples/wordpress/pvc-wp.yaml
$ oc create -f examples/wordpress/pvc-mysql.yaml
$ oc get pvc
NAME LABELS STATUS VOLUME
claim-mysql map[] Bound pv0002
claim-wp map[] Bound pv0001
This looks exactly the same for me.
Launch the MySQL pod.
oc create -f examples/wordpress/pod-mysql.yaml
oc create -f examples/wordpress/service-mysql.yaml
oc create -f examples/wordpress/pod-wordpress.yaml
oc create -f examples/wordpress/service-wp.yaml
oc get svc
NAME LABELS SELECTOR IP(S) PORT(S)
mysql name=mysql name=mysql 172.30.115.137 3306/TCP
wpfrontend name=wpfrontend name=wordpress 172.30.170.55 5055/TCP
So actually everyting seemed to work! But when I'm asking for my pod status I get the following:
[root#ip-10-0-0-104 pv0002]# oc get pod
NAME READY STATUS RESTARTS AGE
mysql 0/1 Image: openshift/mysql-55-centos7 is ready, container is creating 0 6h
wordpress 0/1 Image: wordpress is not ready on the node 0 6h
The pods are in pending state and in the webconsole they're giving the following error:
12:12:51 PM mysql Pod failedMount Unable to mount volumes for pod "mysql_wordpress": exit status 32 (607 times in the last hour, 41 minutes)
12:12:51 PM mysql Pod failedSync Error syncing pod, skipping: exit status 32 (607 times in the last hour, 41 minutes)
12:12:48 PM wordpress Pod failedMount Unable to mount volumes for pod "wordpress_wordpress": exit status 32 (604 times in the last hour, 40 minutes)
12:12:48 PM wordpress Pod failedSync Error syncing pod, skipping: exit status 32 (604 times in the last hour, 40 minutes)
Unable to mount +timeout. But when I'm going to my node and I'm doing the following (test is a created directory on my node):
mount -t nfs -v masterhostname:/home/data/pv0002 /test
And I place some file in my /test on my node than it appears in my /home/data/pv0002 on my master so that seems to work.
What's the reason that it's unable to mount in OpenShift?
I've been stuck on this for a while.
LOGS:
Oct 21 10:44:52 ip-10-0-0-129 docker: time="2015-10-21T10:44:52.795267904Z" level=info msg="GET /containers/json"
Oct 21 10:44:52 ip-10-0-0-129 origin-node: E1021 10:44:52.832179 1148 mount_linux.go:103] Mount failed: exit status 32
Oct 21 10:44:52 ip-10-0-0-129 origin-node: Mounting arguments: localhost:/home/data/pv0002 /var/lib/origin/openshift.local.volumes/pods/2bf19fe9-77ce-11e5-9122-02463424c049/volumes/kubernetes.io~nfs/pv0002 nfs []
Oct 21 10:44:52 ip-10-0-0-129 origin-node: Output: mount.nfs: access denied by server while mounting localhost:/home/data/pv0002
Oct 21 10:44:52 ip-10-0-0-129 origin-node: E1021 10:44:52.832279 1148 kubelet.go:1206] Unable to mount volumes for pod "mysql_wordpress": exit status 32; skipping pod
Oct 21 10:44:52 ip-10-0-0-129 docker: time="2015-10-21T10:44:52.832794476Z" level=info msg="GET /containers/json?all=1"
Oct 21 10:44:52 ip-10-0-0-129 docker: time="2015-10-21T10:44:52.835916304Z" level=info msg="GET /images/openshift/mysql-55-centos7/json"
Oct 21 10:44:52 ip-10-0-0-129 origin-node: E1021 10:44:52.837085 1148 pod_workers.go:111] Error syncing pod 2bf19fe9-77ce-11e5-9122-02463424c049, skipping: exit status 32
Logs showed Oct 21 10:44:52 ip-10-0-0-129 origin-node: Output: mount.nfs: access denied by server while mounting localhost:/home/data/pv0002
So it failed mounting on localhost.
to create my persistent volume I've executed this yaml:
{
"apiVersion": "v1",
"kind": "PersistentVolume",
"metadata": {
"name": "registry-volume"
},
"spec": {
"capacity": {
"storage": "20Gi"
},
"accessModes": [ "ReadWriteMany" ],
"nfs": {
"path": "/home/data/pv0002",
"server": "localhost"
}
}
}
So I was mounting to /home/data/pv0002 but this path was not on the localhost but on my master server (which is ose3-master.example.com. So I created my PV in a wrong way.
{
"apiVersion": "v1",
"kind": "PersistentVolume",
"metadata": {
"name": "registry-volume"
},
"spec": {
"capacity": {
"storage": "20Gi"
},
"accessModes": [ "ReadWriteMany" ],
"nfs": {
"path": "/home/data/pv0002",
"server": "ose3-master.example.com"
}
}
}
This was also in a training environment. It's recommended to have a NFS server outside of your cluster to mount to.