Openshift : AlertManager going on CrashLoopBackOff , Prometheus Operator

Openshift : AlertManager going on CrashLoopBackOff , Prometheus Operator - openshift

Configured AlertManager for Prometheus with Promethus Operator facing below error ad Alertmanager is going on CrashLoopBackOff
level=info ts=2020-02-19T15:42:37.782116642Z caller=main.go:140 msg="Starting Alertmanager" version="(version=0.17.0, branch=HEAD, revision=c7551cd75c414dc81df027f691e2eb21d4fd85b2)"
level=info ts=2020-02-19T15:42:37.782197318Z caller=main.go:141 build_context="(go=go1.12.4, user=root#932a86a52b76, date=20190503-09:10:07)"
level=warn ts=2020-02-19T15:42:37.798588642Z caller=cluster.go:226 component=cluster msg="failed to join cluster" err="1 error occurred:\n\t* Failed to resolve alertmanager-alert-manager-0.alertmanager-operated.apparc-t.svc:9094: lookup alertmanager-alert-manager-0.alertmanager-operated.apparc-t.svc on 172.22.32.247:53: no such host\n\n"
level=info ts=2020-02-19T15:42:37.798644614Z caller=cluster.go:228 component=cluster msg="will retry joining cluster every 10s"
level=warn ts=2020-02-19T15:42:37.798665045Z caller=main.go:230 msg="unable to join gossip mesh" err="1 error occurred:\n\t* Failed to resolve alertmanager-alert-manager-0.alertmanager-operated.apparc-t.svc:9094: lookup alertmanager-alert-manager-0.alertmanager-operated.apparc-t.svc on 172.22.32.247:53: no such host\n\n"
level=info ts=2020-02-19T15:42:37.800267708Z caller=cluster.go:613 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2020-02-19T15:42:37.825176577Z caller=coordinator.go:119 component=configuration msg="Loading configuration file" file=/etc/alertmanager/config/alertmanager.yaml

Related

Unable to start nginx-ingress-controller Readiness and Liveness probes failed

I have installed using instructions at this link for the Install NGINX using NodePort option.
When I do ks logs -f ingress-nginx-controller-7f48b8-s7pg4 -n ingress-nginx I get :
W0304 09:33:40.568799 8 client_config.go:614] Neither --kubeconfig nor --master was
specified. Using the inClusterConfig. This might not work.
I0304 09:33:40.569097 8 main.go:241] "Creating API client" host="https://10.96.0.1:443"
I0304 09:33:40.584904 8 main.go:285] "Running in Kubernetes cluster" major="1" minor="23" git="v1.23.1+k0s" state="clean" commit="b230d3e4b9d6bf4b731d96116a6643786e16ac3f" platform="linux/amd64"
I0304 09:33:40.911443 8 main.go:105] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
I0304 09:33:40.916404 8 main.go:115] "Enabling new Ingress features available since Kubernetes v1.18"
W0304 09:33:40.918137 8 main.go:127] No IngressClass resource with name nginx found. Only annotation will be used.
I0304 09:33:40.942282 8 ssl.go:532] "loading tls certificate" path="/usr/local/certificates/cert" key="/usr/local/certificates/key"
I0304 09:33:40.977766 8 nginx.go:254] "Starting NGINX Ingress controller"
I0304 09:33:41.007616 8 event.go:282] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"ingress-nginx-controller", UID:"1a4482d2-86cb-44f3-8ebb-d6342561892f", APIVersion:"v1", ResourceVersion:"987560", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress-nginx/ingress-nginx-controller
E0304 09:33:42.087113 8 reflector.go:138] k8s.io/client-go#v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1beta1.Ingress: failed to list *v1beta1.Ingress: the server could not find the requested resource
E0304 09:33:43.041954 8 reflector.go:138] k8s.io/client-go#v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1beta1.Ingress: failed to list *v1beta1.Ingress: the server could not find the requested resource
E0304 09:33:44.724681 8 reflector.go:138] k8s.io/client-go#v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1beta1.Ingress: failed to list *v1beta1.Ingress: the server could not find the requested resource
E0304 09:33:48.303789 8 reflector.go:138] k8s.io/client-go#v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1beta1.Ingress: failed to list *v1beta1.Ingress: the server could not find the requested resource
E0304 09:33:59.113203 8 reflector.go:138] k8s.io/client-go#v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1beta1.Ingress: failed to list *v1beta1.Ingress: the server could not find the requested resource
E0304 09:34:16.727052 8 reflector.go:138] k8s.io/client-go#v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1beta1.Ingress: failed to list *v1beta1.Ingress: the server could not find the requested resource
I0304 09:34:39.216165 8 main.go:187] "Received SIGTERM, shutting down"
I0304 09:34:39.216773 8 nginx.go:372] "Shutting down controller queues"
E0304 09:34:39.217779 8 store.go:178] timed out waiting for caches to sync
I0304 09:34:39.217856 8 nginx.go:296] "Starting NGINX process"
I0304 09:34:39.218007 8 leaderelection.go:243] attempting to acquire leader lease ingress-nginx/ingress-controller-leader-nginx...
I0304 09:34:39.219741 8 queue.go:78] "queue has been shutdown, failed to enqueue" key="&ObjectMeta{Name:initial-sync,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[]OwnerReference{},Finalizers:[],ClusterName:,ManagedFields:[]ManagedFieldsEntry{},}"
I0304 09:34:39.219787 8 nginx.go:316] "Starting validation webhook" address=":8443" certPath="/usr/local/certificates/cert" keyPath="/usr/local/certificates/key"
I0304 09:34:39.242501 8 leaderelection.go:253] successfully acquired lease ingress-nginx/ingress-controller-leader-nginx
I0304 09:34:39.242807 8 queue.go:78] "queue has been shutdown, failed to enqueue" key="&ObjectMeta{Name:sync status,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[]OwnerReference{},Finalizers:[],ClusterName:,ManagedFields:[]ManagedFieldsEntry{},}"
I0304 09:34:39.242837 8 status.go:84] "New leader elected" identity="ingress-nginx-controller-7f48b8-s7pg4"
I0304 09:34:39.252025 8 status.go:204] "POD is not ready" pod="ingress-nginx/ingress-nginx-controller-7f48b8-s7pg4" node="fbcdcesdn02"
I0304 09:34:39.255282 8 status.go:132] "removing value from ingress status" address=[]
I0304 09:34:39.255328 8 nginx.go:380] "Stopping admission controller"
I0304 09:34:39.255379 8 nginx.go:388] "Stopping NGINX process"
E0304 09:34:39.255664 8 nginx.go:319] "Error listening for TLS connections" err="http: Server closed"
2022/03/04 09:34:39 [notice] 43#43: signal process started
I0304 09:34:40.263361 8 nginx.go:401] "NGINX process has stopped"
I0304 09:34:40.263396 8 main.go:195] "Handled quit, awaiting Pod deletion"
I0304 09:34:50.263585 8 main.go:198] "Exiting" code=0
When I do ks describe pod ingress-nginx-controller-7f48b8-s7pg4 -n ingress-nginx I get :
Name: ingress-nginx-controller-7f48b8-s7pg4
Namespace: ingress-nginx
Priority: 0
Node: fxxxxxxxx/10.XXX.XXX.XXX
Start Time: Fri, 04 Mar 2022 08:12:57 +0200
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=ingress-nginx
app.kubernetes.io/name=ingress-nginx
pod-template-hash=7f48b8
Annotations: kubernetes.io/psp: 00-k0s-privileged
Status: Running
IP: 10.244.0.119
IPs:
IP: 10.244.0.119
Controlled By: ReplicaSet/ingress-nginx-controller-7f48b8
Containers:
controller:
Container ID: containerd://638ff4d63b7ba566125bd6789d48db6e8149b06cbd9d887ecc57d08448ba1d7e
Image: k8s.gcr.io/ingress-nginx/controller:v0.48.1#sha256:e9fb216ace49dfa4a5983b183067e97496e7a8b307d2093f4278cd550c303899
Image ID: k8s.gcr.io/ingress-nginx/controller#sha256:e9fb216ace49dfa4a5983b183067e97496e7a8b307d2093f4278cd550c303899
Ports: 80/TCP, 443/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
/nginx-ingress-controller
--election-id=ingress-controller-leader
--ingress-class=nginx
--configmap=$(POD_NAMESPACE)/ingress-nginx-controller
--validating-webhook=:8443
--validating-webhook-certificate=/usr/local/certificates/cert
--validating-webhook-key=/usr/local/certificates/key
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 04 Mar 2022 11:33:40 +0200
Finished: Fri, 04 Mar 2022 11:34:50 +0200
Ready: False
Restart Count: 61
Requests:
cpu: 100m
memory: 90Mi
Liveness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
Readiness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: ingress-nginx-controller-7f48b8-s7pg4 (v1:metadata.name)
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
LD_PRELOAD: /usr/local/lib/libmimalloc.so
Mounts:
/usr/local/certificates/ from webhook-cert (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zvcnr (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
webhook-cert:
Type: Secret (a volume populated by a Secret)
SecretName: ingress-nginx-admission
Optional: false
kube-api-access-zvcnr:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 23m (x316 over 178m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
Warning BackOff 8m52s (x555 over 174m) kubelet Back-off restarting failed container
Normal Pulled 3m54s (x51 over 178m) kubelet Container image "k8s.gcr.io/ingress-nginx/controller:v0.48.1#sha256:e9fb216ace49dfa4a5983b183067e97496e7a8b307d2093f4278cd550c303899" already present on machine
When I try to curl the health endpoints I get Connection refused :
The state of the pods shows that they are both not ready :
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create-4hzzk 0/1 Completed 0 3h30m
ingress-nginx-controller-7f48b8-s7pg4 0/1 CrashLoopBackOff 63 (91s ago) 3h30m
I have tried to increase the values for initialDelaySeconds in /etc/nginx/nginx.conf but when I attempt to exec into the container (ks exec -it -n ingress-nginx ingress-nginx-controller-7f48b8-s7pg4 -- bash) I also get an error error: unable to upgrade connection: container not found ("controller")
I am not really sure where I should be looking in the overall setup.

I have installed using instructions at this link for the Install NGINX using NodePort option.
The problem is that you are using outdated k0s documentation:
https://docs.k0sproject.io/v1.22.2+k0s.1/examples/nginx-ingress/
You should use this link instead:
https://docs.k0sproject.io/main/examples/nginx-ingress/
You will install the controller-v1.0.0 version on your Kubernetes cluster by following the actual documentation link.
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.0.0/deploy/static/provider/baremetal/deploy.yaml
The result is:
$ sudo k0s kubectl get pods -n ingress-nginx
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create-dw2f4 0/1 Completed 0 11m
ingress-nginx-admission-patch-4dmpd 0/1 Completed 0 11m
ingress-nginx-controller-75f58fbf6b-xrfxr 1/1 Running 0 11m

How to manually recreate the bootstrap client certificate for OpenShift 3.11 master?

Our origin-node.service on the master node fails with:
root#master> systemctl start origin-node.service
Job for origin-node.service failed because the control process exited with error code. See "systemctl status origin-node.service" and "journalctl -xe" for details.
root#master> systemctl status origin-node.service -l
[...]
May 05 07:17:47 master origin-node[44066]: bootstrap.go:195] Part of the existing bootstrap client certificate is expired: 2020-02-20 13:14:27 +0000 UTC
May 05 07:17:47 master origin-node[44066]: bootstrap.go:56] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
May 05 07:17:47 master origin-node[44066]: certificate_store.go:131] Loading cert/key pair from "/etc/origin/node/certificates/kubelet-client-current.pem".
May 05 07:17:47 master origin-node[44066]: server.go:262] failed to run Kubelet: cannot create certificate signing request: Post https://lb.openshift-cluster.mydomain.com:8443/apis/certificates.k8s.io/v1beta1/certificatesigningrequests: EOF
So it seems that kubelet-client-current.pem and/or kubelet-server-current.pem contains an expired certificate and the service tries to create a CSR using an endpoint which is probably not yet available (because the master is down). We tried redeploying the certificates according to the OpenShift documentation Redeploying Certificates, but this fails while detecting an expired certificate:
root#master> ansible-playbook -i /etc/ansible/hosts openshift-master/redeploy-openshift-ca.yml
[...]
TASK [openshift_certificate_expiry : Fail when certs are near or already expired] *******************************************************************************************************************************************
fatal: [master.openshift-cluster.mydomain.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 60 days of expiring. You may view the report at /root/cert-expiry-report.20200505T042754.html or /root/cert-expiry-report.20200505T042754.json.\n"}
[...]
root#master> cat /root/cert-expiry-report.20200505T042754.json
[...]
"kubeconfigs": [
{
"cert_cn": "O:system:cluster-admins, CN:system:admin",
"days_remaining": -75,
"expiry": "2020-02-20 13:14:27",
"health": "expired",
"issuer": "CN=openshift-signer#1519045219 ",
"path": "/etc/origin/node/node.kubeconfig",
"serial": 27,
"serial_hex": "0x1b"
},
{
"cert_cn": "O:system:cluster-admins, CN:system:admin",
"days_remaining": -75,
"expiry": "2020-02-20 13:14:27",
"health": "expired",
"issuer": "CN=openshift-signer#1519045219 ",
"path": "/etc/origin/node/node.kubeconfig",
"serial": 27,
"serial_hex": "0x1b"
},
[...]
"summary": {
"expired": 2,
"ok": 22,
"total": 24,
"warning": 0
}
}
There is a guide for OpenShift 4.4 for Recovering from expired control plane certificates, but that does not apply for 3.11 and we did not find such a guide for our version.
Is it possible to recreate the expired certificates without a running master node for 3.11? Thanks for any help.
OpenShift Ansible: https://github.com/openshift/openshift-ansible/releases/tag/openshift-ansible-3.11.153-2
Update 2020-05-06: I also executed redeploy-certificates.yml, but it fails at the same TASK:
root#master> ansible-playbook -i /etc/ansible/hosts playbooks/redeploy-certificates.yml
[...]
TASK [openshift_certificate_expiry : Fail when certs are near or already expired] ******************************************************************************
Wednesday 06 May 2020 04:07:06 -0400 (0:00:00.909) 0:01:07.582 *********
fatal: [master.openshift-cluster.mydomain.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 60 days of expiring. You may view the report at /root/cert-expiry-report.20200506T040603.html or /root/cert-expiry-report.20200506T040603.json.\n"}
Update 2020-05-11: Running with -e openshift_certificate_expiry_fail_on_warn=False results in:
root#master> ansible-playbook -i /etc/ansible/hosts -e openshift_certificate_expiry_fail_on_warn=False playbooks/redeploy-certificates.yml
[...]
TASK [Wait for master API to come back online] *****************************************************************************************************************
Monday 11 May 2020 03:48:56 -0400 (0:00:00.111) 0:02:25.186 ************
skipping: [master.openshift-cluster.mydomain.com]
TASK [openshift_control_plane : restart master] ****************************************************************************************************************
Monday 11 May 2020 03:48:56 -0400 (0:00:00.257) 0:02:25.444 ************
changed: [master.openshift-cluster.mydomain.com] => (item=api)
changed: [master.openshift-cluster.mydomain.com] => (item=controllers)
RUNNING HANDLER [openshift_control_plane : verify API server] **************************************************************************************************
Monday 11 May 2020 03:48:57 -0400 (0:00:00.945) 0:02:26.389 ************
FAILED - RETRYING: verify API server (120 retries left).
FAILED - RETRYING: verify API server (119 retries left).
[...]
FAILED - RETRYING: verify API server (1 retries left).
fatal: [master.openshift-cluster.mydomain.com]: FAILED! => {"attempts": 120, "changed": false, "cmd": ["curl", "--silent", "--tlsv1.2", "--max-time", "2", "--cacert", "/etc/origin/master/ca-bundle.crt", "https://lb.openshift-cluster.mydomain.com:8443/healthz/ready"], "delta": "0:00:00.182367", "end": "2020-05-11 03:51:52.245644", "msg": "non-zero return code", "rc": 35, "start": "2020-05-11 03:51:52.063277", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
root#master> systemctl status origin-node.service -l
[...]
May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: E0511 04:23:28.077964 109972 bootstrap.go:195] Part of the existing bootstrap client certificate is expired: 2020-02-20 13:14:27 +0000 UTC
May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: I0511 04:23:28.078001 109972 bootstrap.go:56] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: I0511 04:23:28.080555 109972 certificate_store.go:131] Loading cert/key pair from "/etc/origin/node/certificates/kubelet-client-current.pem".
May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: F0511 04:23:28.130968 109972 server.go:262] failed to run Kubelet: cannot create certificate signing request: Post https://lb.openshift-cluster.mydomain.com:8443/apis/certificates.k8s.io/v1beta1/certificatesigningrequests: EOF
[...]

I have this same case in customer environment, this error is because the certified was expiry, i "cheated" changing da S.O date before the expiry date. And the origin-node service started in my masters:
systemctl status origin-node
● origin-node.service - OpenShift Node
Loaded: loaded (/etc/systemd/system/origin-node.service; enabled; vendor preset: disabled)
Active: active (running) since Sáb 2021-02-20 20:22:21 -02; 6min ago
Docs: https://github.com/openshift/origin
Main PID: 37230 (hyperkube)
Memory: 79.0M
CGroup: /system.slice/origin-node.service
└─37230 /usr/bin/hyperkube kubelet --v=2 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-webhook=true --authentication-token-webhook-cache-ttl=5m --authorization-mode=Webhook --authorization-webhook-c...
Você tem mensagem de correio em /var/spool/mail/okd

The openshift_certificate_expiry role uses the openshift_certificate_expiry_fail_on_warn variable to determine if the playbook should fail when the days left are less than openshift_certificate_expiry_warning_days.
So try running the redeploy-certificates.yml with this additional variable set to "False":
ansible-playbook -i /etc/ansible/hosts -e openshift_certificate_expiry_fail_on_warn=False playbooks/redeploy-certificates.yml

How to solve bazel build error in Docker Build?

This error keeps on coming during docker build.
Tried various code techniques.
ERROR: Process exited with status 128: Process exited with status 128
++ git describe --long --tags
+ tf_git_rev=v1.14.0-14-g1aad02a78e
+ echo 'STABLE_TF_GIT_VERSION v1.14.0-14-g1aad02a78e'
+ pushd native_client
++ git describe --long --tags
fatal: No names found, cannot describe anything.
+ ds_git_rev=
STABLE_TF_GIT_VERSION v1.14.0-14-g1aad02a78e
/tensorflow/native_client /tensorflow
INFO: Elapsed time: 150.094s, Critical Path: 6.47s
INFO: 1 process: 1 local.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
The command '/bin/sh -c bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" --config=monolithic --config=cuda -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-mtune=generic --copt=-march=x86-64 --copt=-msse --copt=-msse2 --copt=-msse3 --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-fvisibility=hidden //native_client:libdeepspeech.so //native_client:generate_trie --verbose_failures --action_env=LD_LIBRARY_PATH=${LD_LIBRARY_PATH}' returned a non-zero code: 1
Build should be successful.

Upgrading K8S cluster from v1.2.0 to v1.3.0

I have 1 master and 4 minions all running on version 1.2.0. I am planning to upgrade them to 1.3.0. I want this done with minimal downtime.
So I did the following on one minion.
systemctl stop kubelet
yum update kubernetes-1.3.0-0.3.git86dc49a.el7
systemctl start kubelet
Once I bring up the service, i see the following ERROR.
Mar 28 20:36:55 csdp-e2e-kubernetes-minion-6 kubelet[9902]: E0328 20:36:55.215614 9902 kubelet.go:1222] Unable to register node "172.29.240.169" with API server: the body of the request was in an unknown format - accepted media types include: application/json, application/yaml
Mar 28 20:36:55 csdp-e2e-kubernetes-minion-6 kubelet[9902]: E0328 20:36:55.217612 9902 event.go:198] Server rejected event '&api.Event{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"172.29.240.169.14b01ded8fb2d07b", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]api.OwnerReference(nil), Finalizers:[]string(nil)}, InvolvedObject:api.ObjectReference{Kind:"Node", Namespace:"", Name:"172.29.240.169", UID:"172.29.240.169", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientDisk", Message:"Node 172.29.240.169 status is now: NodeHasSufficientDisk", Source:api.EventSource{Component:"kubelet", Host:"172.29.240.169"}, FirstTimestamp:unversioned.Time{Time:time.Time{sec:63626321182, nsec:814949499, loc:(*time.Location)(0x4c8a780)}}, LastTimestamp:unversioned.Time{Time:time.Time{sec:63626330215, nsec:213372890, loc:(*time.Location)(0x4c8a780)}}, Count:1278, Type:"Normal"}': 'the body of the request was in an unknown format - accepted media types include: application/json, application/yaml' (will not retry!)
Mar 28 20:36:55 csdp-e2e-kubernetes-minion-6 kubelet[9902]: E0328 20:36:55.246100 9902 event.go:198] Server rejected event '&api.Event{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"172.29.240.169.14b01ded8fb2fc88", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]api.OwnerReference(nil), Finalizers:[]string(nil)}, InvolvedObject:api.ObjectReference{Kind:"Node", Namespace:"", Name:"172.29.240.169", UID:"172.29.240.169", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientMemory", Message:"Node 172.29.240.169 status is now: NodeHasSufficientMemory", Source:api.EventSource{Component:"kubelet", Host:"172.29.240.169"}, FirstTimestamp:unversioned.Time{Time:time.Time{sec:63626321182, nsec:814960776, loc:(*time.Location)(0x4c8a780)}}, LastTimestamp:unversioned.Time{Time:time.Time{sec:63626330215, nsec:213381138, loc:(*time.Location)(0x4c8a780)}}, Count:1278, Type:"Normal"}': 'the body of the request was in an unknown format - accepted media types include: application/json, application/yaml' (will not retry!)
Is v1.2.0 incompatible with v1.3.0 ?
Seems like the issue is with JSON incompatibility ? application/json, application/yaml
From master standpoint ::
[root#kubernetes-master ~]# kubectl get nodes
NAME STATUS AGE
172.29.219.105 Ready 3h
172.29.240.146 Ready 3h
172.29.240.168 Ready 3h
172.29.240.169 NotReady 3h
The node that I upgraded is in NotReady state.

As per the documentation you must upgrade your master components (kube-scheduler, kube-apiserver and kube-controller-manager) before your node components (kubelet, kube-proxy).
https://kubernetes.io/docs/getting-started-guides/ubuntu/upgrades/

Unable to mount volumes for pod

EDITED:
I've an OpenShift cluster with one master and two nodes. I've installed NFS on the master and NFS client on the nodes.
I've followed the wordpress example with NFS: https://github.com/openshift/origin/tree/master/examples/wordpress
I did the following on my master as: oc login -u system:admin:
mkdir /home/data/pv0001
mkdir /home/data/pv0002
chown -R nfsnobody:nfsnobody /home/data
chmod -R 777 /home/data/
# Add to /etc/exports
/home/data/pv0001 *(rw,sync,no_root_squash)
/home/data/pv0002 *(rw,sync,no_root_squash)
# Enable the new exports without bouncing the NFS service
exportfs -a
So exportfs shows:
/home/data/pv0001
<world>
/home/data/pv0002
<world>
$ setsebool -P virt_use_nfs 1
# Create the persistent volumes for NFS.
# I did not change anything in the yaml-files
$ oc create -f examples/wordpress/nfs/pv-1.yaml
$ oc create -f examples/wordpress/nfs/pv-2.yaml
$ oc get pv
NAME LABELS CAPACITY ACCESSMODES STATUS CLAIM REASON
pv0001 <none> 1073741824 RWO,RWX Available
pv0002 <none> 5368709120 RWO Available
This is also what I get.
Than I'm going to my node:
oc login
test-admin
And I create a wordpress project:
oc new-project wordpress
# Create claims for storage in my project (same namespace).
# The claims in this example carefully match the volumes created above.
$ oc create -f examples/wordpress/pvc-wp.yaml
$ oc create -f examples/wordpress/pvc-mysql.yaml
$ oc get pvc
NAME LABELS STATUS VOLUME
claim-mysql map[] Bound pv0002
claim-wp map[] Bound pv0001
This looks exactly the same for me.
Launch the MySQL pod.
oc create -f examples/wordpress/pod-mysql.yaml
oc create -f examples/wordpress/service-mysql.yaml
oc create -f examples/wordpress/pod-wordpress.yaml
oc create -f examples/wordpress/service-wp.yaml
oc get svc
NAME LABELS SELECTOR IP(S) PORT(S)
mysql name=mysql name=mysql 172.30.115.137 3306/TCP
wpfrontend name=wpfrontend name=wordpress 172.30.170.55 5055/TCP
So actually everyting seemed to work! But when I'm asking for my pod status I get the following:
[root#ip-10-0-0-104 pv0002]# oc get pod
NAME READY STATUS RESTARTS AGE
mysql 0/1 Image: openshift/mysql-55-centos7 is ready, container is creating 0 6h
wordpress 0/1 Image: wordpress is not ready on the node 0 6h
The pods are in pending state and in the webconsole they're giving the following error:
12:12:51 PM mysql Pod failedMount Unable to mount volumes for pod "mysql_wordpress": exit status 32 (607 times in the last hour, 41 minutes)
12:12:51 PM mysql Pod failedSync Error syncing pod, skipping: exit status 32 (607 times in the last hour, 41 minutes)
12:12:48 PM wordpress Pod failedMount Unable to mount volumes for pod "wordpress_wordpress": exit status 32 (604 times in the last hour, 40 minutes)
12:12:48 PM wordpress Pod failedSync Error syncing pod, skipping: exit status 32 (604 times in the last hour, 40 minutes)
Unable to mount +timeout. But when I'm going to my node and I'm doing the following (test is a created directory on my node):
mount -t nfs -v masterhostname:/home/data/pv0002 /test
And I place some file in my /test on my node than it appears in my /home/data/pv0002 on my master so that seems to work.
What's the reason that it's unable to mount in OpenShift?
I've been stuck on this for a while.
LOGS:
Oct 21 10:44:52 ip-10-0-0-129 docker: time="2015-10-21T10:44:52.795267904Z" level=info msg="GET /containers/json"
Oct 21 10:44:52 ip-10-0-0-129 origin-node: E1021 10:44:52.832179 1148 mount_linux.go:103] Mount failed: exit status 32
Oct 21 10:44:52 ip-10-0-0-129 origin-node: Mounting arguments: localhost:/home/data/pv0002 /var/lib/origin/openshift.local.volumes/pods/2bf19fe9-77ce-11e5-9122-02463424c049/volumes/kubernetes.io~nfs/pv0002 nfs []
Oct 21 10:44:52 ip-10-0-0-129 origin-node: Output: mount.nfs: access denied by server while mounting localhost:/home/data/pv0002
Oct 21 10:44:52 ip-10-0-0-129 origin-node: E1021 10:44:52.832279 1148 kubelet.go:1206] Unable to mount volumes for pod "mysql_wordpress": exit status 32; skipping pod
Oct 21 10:44:52 ip-10-0-0-129 docker: time="2015-10-21T10:44:52.832794476Z" level=info msg="GET /containers/json?all=1"
Oct 21 10:44:52 ip-10-0-0-129 docker: time="2015-10-21T10:44:52.835916304Z" level=info msg="GET /images/openshift/mysql-55-centos7/json"
Oct 21 10:44:52 ip-10-0-0-129 origin-node: E1021 10:44:52.837085 1148 pod_workers.go:111] Error syncing pod 2bf19fe9-77ce-11e5-9122-02463424c049, skipping: exit status 32

Logs showed Oct 21 10:44:52 ip-10-0-0-129 origin-node: Output: mount.nfs: access denied by server while mounting localhost:/home/data/pv0002
So it failed mounting on localhost.
to create my persistent volume I've executed this yaml:
{
"apiVersion": "v1",
"kind": "PersistentVolume",
"metadata": {
"name": "registry-volume"
},
"spec": {
"capacity": {
"storage": "20Gi"
},
"accessModes": [ "ReadWriteMany" ],
"nfs": {
"path": "/home/data/pv0002",
"server": "localhost"
}
}
}
So I was mounting to /home/data/pv0002 but this path was not on the localhost but on my master server (which is ose3-master.example.com. So I created my PV in a wrong way.
{
"apiVersion": "v1",
"kind": "PersistentVolume",
"metadata": {
"name": "registry-volume"
},
"spec": {
"capacity": {
"storage": "20Gi"
},
"accessModes": [ "ReadWriteMany" ],
"nfs": {
"path": "/home/data/pv0002",
"server": "ose3-master.example.com"
}
}
}
This was also in a training environment. It's recommended to have a NFS server outside of your cluster to mount to.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Openshift : AlertManager going on CrashLoopBackOff , Prometheus Operator - openshift

Related

Unable to start nginx-ingress-controller Readiness and Liveness probes failed

How to manually recreate the bootstrap client certificate for OpenShift 3.11 master?

How to solve bazel build error in Docker Build?

Upgrading K8S cluster from v1.2.0 to v1.3.0

Unable to mount volumes for pod

Categories

Resources