OpenShift rebuild fails to push image: connection refused - openshift

I am working with OpenShift Origin 3.9 and had an application (consisting of a service, pods, etc.) building and running alright.
However, now rebuilds fail with this error message:
Successfully built 1234567890ab
Pushing image docker-registry.default.svc:5000/my_project/my_app:latest ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount@example.org
Registry server Password: <<non-empty>>
error: build error: Failed to push image:
After retrying 6 times, Push image still failed due to error:
Get https://docker-registry.default.svc:5000/v1/_ping: dial tcp 1.2.3.4:5000:
getsockopt: connection refused
I don't have admin privileges on that cluster, so it is unlikely that this is due to the nodes' DNS setup, as similar answers would suggest (e.g. here).
One possibly contributing cause could be that, since the last successful build, I had created a service account and temporarily logged in with its API token. However, I am now logged in again with (an API token for) my full account (e.g. according to oc whoami).
This is how I am starting the rebuild:
oc login --token=$api_token
oc start-build --follow my_app
What could explain this error and how can I further diagnose and overcome it, esp. given that I don't have cluster admin rights?

The problem "somehow" went away after some days. Whether by operator intervention or otherwise I cannot tell.

You missed one step:
oc policy add-role-to-user system:image-builder <user>
Please follow this doc:
https://blog.openshift.com/remotely-push-pull-container-images-openshift/

Related

Deployment "tiller" exceeded its progress deadline

I'm trying to install tiller server to an Openshift project
Helm/tiller version: 2.9.0
My project name: paytiller
At step 3, I execute this command (as described in this document: https://www.openshift.com/blog/getting-started-helm-openshift)
oc rollout status deployment tiller
I get this error:
error: deployment "tiller" exceeded its progress deadline
I'm not clear on what the error message means, and I could not find any logs.
Any idea why this error occurs?
If this doesn't work, what other options are there for templating in OpenShift?
EDIT
oc get events
Events:
Type Reason Age From Message
---- ------ ---- ---- ---
Warning Failed 14m (x5493 over 21h) kubelet, example.com Error: ImagePullBackOff
Normal Pulling 9m (x255 over 21h) kubelet, example.com pulling image "gcr.io/kubernetes-helm/tiller:v2.9.0"
Normal BackOff 4m (x5537 over 21h) kubelet, example.com Back-off pulling image "gcr.io/kubernetes-helm/tiller:v2.9.0"
Thanks.
The issue was with the permissions on our OpenShift platform. We didn't have access to pull images directly from public (open-source) registries.
We added the kubernetes-helm image to our organization's repository, and then we were able to pull the image into the OpenShift project. It is working now. Still, the logs gave us no clue about the cause.
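The mirroring workaround described above can be sketched roughly as follows. This is only an illustration: the helper name mirror_ref and the registry host registry.example.com are hypothetical, and it assumes the pull, tag and push run on a machine that can reach both gcr.io and the organization's registry.

```shell
# Rewrite an upstream image reference so that it points at a mirror registry.
# Only the last path component (image name plus tag) of the source is kept.
# mirror_ref and the mirror host used below are hypothetical names.
mirror_ref() {
    src="$1"      # upstream reference, e.g. gcr.io/kubernetes-helm/tiller:v2.9.0
    mirror="$2"   # mirror prefix, e.g. registry.example.com/paytiller
    printf '%s/%s\n' "$mirror" "${src##*/}"
}

# Assumed usage:
#   SRC=gcr.io/kubernetes-helm/tiller:v2.9.0
#   DST=$(mirror_ref "$SRC" registry.example.com/paytiller)
#   docker pull "$SRC" && docker tag "$SRC" "$DST" && docker push "$DST"
```

The tiller deployment then has to reference the mirrored name instead of the gcr.io one.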
The status ImagePullBackOff tells you that this image gcr.io/kubernetes-helm/tiller:v2.9.0 could not be pulled from the container registry. So your OpenShift node cannot pull that image for some reason. This is often due to network proxies, a non-existing image (not the issue here) or other restrictions in the (corporate) network.
You can use oc describe pod <pod that shows ImagePullBackOff> to get a more detailed error message that may help you further.
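To find the pods worth describing, a small filter over the pod listing may help. The sketch below assumes the usual oc get pods column layout (NAME READY STATUS RESTARTS AGE); the function name failing_pull_pods is made up for this example.

```shell
# Print the names of pods whose STATUS column reports an image pull problem.
# Reads `oc get pods` output on stdin and skips the header line.
# failing_pull_pods is a hypothetical helper name.
failing_pull_pods() {
    awk 'NR > 1 && ($3 == "ImagePullBackOff" || $3 == "ErrImagePull") { print $1 }'
}

# Assumed usage:
#   oc get pods | failing_pull_pods | while read -r pod; do
#       oc describe pod "$pod"
#   done
```

The Events section at the end of the describe output is usually where the registry's actual error text shows up.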
Also, note that the blog post you linked is from 2017, which is very old. Here is a more current version: Build Kubernetes Operators from Helm Charts in 5 steps.

Unable to create a new app using an image from openshift internal registry

I have an nginx image and I am able to push it to the OpenShift internal registry. However, when I try to use that image from the internal registry to create an app, it gives me an ImagePullBackOff error.
Below are the steps which I am following.
[root@artel1 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/nginx latest 231d40e811cd 4 weeks ago 126 MB
[root@artel1 ~]# docker tag 231d40e811cd docker-registry-default.router.default.svc.cluster.local/openshift/nginx
[root@artel1 ~]# docker push docker-registry-default.router.default.svc.cluster.local/openshift/nginx
[root@artel1 ~]# oc new-app --docker-image=docker-registry-default.router.default.svc.cluster.local/openshift/test-image
W1227 10:18:34.761105 33535 dockerimagelookup.go:233] Docker registry lookup failed: Get https://docker-registry-default.router.default.svc.cluster.local/v2/: x509: certificate signed by unknown authority
W1227 10:18:34.784988 33535 newapp.go:479] Could not find an image stream match for "docker-registry-default.router.default.svc.cluster.local/openshift/test-image:latest". Make sure that a Docker image with that tag is available on the node for the deployment to succeed.
--> Found Docker image 7809d84 (8 days old) from docker-registry-default.router.default.svc.cluster.local for "docker-registry-default.router.default.svc.cluster.local/openshift/test-image:latest"
OpenShift Node
--------------
This is a component of OpenShift and contains the software for individual nodes when using SDN.
Tags: openshift, node
* This image will be deployed in deployment config "test-image"
* Ports 53/tcp, 8443/tcp will be load balanced by service "test-image"
* Other containers can access this service through the hostname "test-image"
* WARNING: Image "docker-registry-default.router.default.svc.cluster.local/openshift/test-image:latest" runs as the 'root' user which may not be permitted by your cluster administrator
--> Creating resources ...
deploymentconfig.apps.openshift.io "test-image" created
service "test-image" created
--> Success
Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
'oc expose svc/test-image'
Run 'oc status' to view your app.
Events logs
34s 47s 2 test-image-1-dzhmk.15e44d430e48ec8d Pod spec.containers{test-image} Normal Pulling kubelet, artel2.fyre.ibm.com pulling image "docker-registry-default.router.default.svc.cluster.local/openshift/test-image:latest"
34s 46s 2 test-image-1-dzhmk.15e44d4318ec7f53 Pod spec.containers{test-image} Warning Failed kubelet, artel2.fyre.ibm.com Failed to pull image "docker-registry-default.router.default.svc.cluster.local/openshift/test-image:latest": rpc error: code = Unknown desc = Error: image openshift/test-image:latest not found
34s 46s 2 test-image-1-dzhmk.15e44d4318ed5311 Pod spec.containers{test-image} Warning Failed kubelet, artel2.fyre.ibm.com Error: ErrImagePull
27s 46s 7 test-image-1-dzhmk.15e44d433c24e5c9 Pod Normal SandboxChanged kubelet, artel2.fyre.ibm.com Pod sandbox changed, it will be killed and re-created.
25s 43s 6 test-image-1-dzhmk.15e44d43dd6a7b57 Pod spec.containers{test-image} Warning Failed kubelet, artel2.fyre.ibm.com Error: ImagePullBackOff
25s 43s 6 test-image-1-dzhmk.15e44d43dd6a10d9 Pod spec.containers{test-image} Normal BackOff kubelet, artel2.fyre.ibm.com Back-off pulling image "docker-registry-default.router.default.svc.cluster.local/openshift/test-image:latest"
Pod status
[root@artel1 ~]# oc get po
NAME READY STATUS RESTARTS AGE
test-image-1-deploy 1/1 Running 0 3m
test-image-1-dzhmk 0/1 ImagePullBackOff 0 3m
Where exactly are things going wrong?
It looks like 'docker push' did not complete successfully. It should return 'Image successfully pushed'.
Try logging in to the internal registry first (see accessing_registry), and recheck the registry's service hostname or use the service IP.
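Note also that the transcript in the question pushes .../openshift/nginx but then passes .../openshift/test-image to oc new-app. One way to rule out such a mismatch is to build the reference once and reuse it. The following is a rough sketch; the helper name image_ref is hypothetical, and the login line assumes token-based registry access as described in the registry documentation.

```shell
# Build a single image reference so the name that is pushed is exactly the
# name that is deployed later. image_ref is a hypothetical helper name.
image_ref() {
    # $1 registry host, $2 project (namespace), $3 image name, $4 tag (default: latest)
    printf '%s/%s/%s:%s\n' "$1" "$2" "$3" "${4:-latest}"
}

# Assumed usage, with the registry host taken from the question:
#   REG=docker-registry-default.router.default.svc.cluster.local
#   REF=$(image_ref "$REG" openshift nginx)
#   docker login -u "$(oc whoami)" -p "$(oc whoami -t)" "$REG"
#   docker tag 231d40e811cd "$REF"
#   docker push "$REF"
#   oc new-app --docker-image="$REF"
```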

OpenShift 3, 503 Error (No server is available to handle this request)

I have created a web application using JSP/Tiles/Struts/MySQL/Tomcat. I created a new project in the OpenShift 3 console (OpenShift Online) https://console.preview.openshift.com/console/ and then added Tomcat and MySQL. I sometimes got a 503 error, while at other times the same page worked as expected. The 503 error occurred randomly for any page in my project. When I get the 503 error, I refresh a number of times, it goes away, and my page is displayed correctly.
Error that I see is:
"503 Service Unavailable
No server is available to handle this request. "
I did some research:
What I understand from this openshift 2 link:
https://blog.openshift.com/how-to-host-your-java-ee-application-with-auto-scaling/
is that to correct 503 error:
SSH into your application gear using rhc ssh --app <app_name>
Change directory to haproxy/conf
change the following in haproxy.cfg option httpchk GET / to option httpchk GET /api/v1/ping
Restart the HAProxy cartridge from your local machine using RHC rhc cartridge-restart --cartridge haproxy
I don't know whether this also applies to OpenShift 3. Where are haproxy.log, haproxy.cfg and haproxy/conf in OpenShift 3, or is the layout slightly different? (But thanks to Warren's comments: yes, he saw a 503 error in OpenShift related to HAProxy.)
Now after 1 week after posting this question:
I am getting a Quota Reached error. I am able to build my project, but all deployments are failing. I wonder whether the 503 error that I was getting earlier was (completely or partially) related to the quota being reached. How should I proceed now?
curl -i localhost:8080/GEA
HTTP/1.1 302 Found
Server: Apache-Coyote/1.1
Location: http://localhost:8080/GEA/
Transfer-Encoding: chunked
Date: Tue, 11 Apr 2017 18:03:25 GMT
Tomcat logs do not show any application error.
Will a readiness probe and a liveness probe help me? I have not set them yet,
nor do I know how to set them.
Will scaling help me? (I don't know how to set that up either.)
Do I have to set memory/... all to the maximum allowed to ensure the project runs smoothly?
I had a similar situation: sometimes I got 503s and sometimes my actual page. The reason is that HAProxy sits on the front end handling the requests. Depending on your setup you may even have several HAProxy pods, and your request could be routed to any one of them. In my case, one pod was working and another was not.
So basically
oc get pods -n default
NAME READY STATUS RESTARTS AGE
docker-registry-7-i02rh 1/1 Running 0 75d
registry-console-12-wciib 1/1 Running 0 67d
router-1-533cg 1/1 Running 3 76d
router-1-9utld 1/1 Running 1 76d
router-1-uwf64 1/1 Running 1 76d
As you can see in my output, the default namespace is where my router (HAProxy) pods live. If I change to that namespace
oc project default
Then run
oc logs -f router-1-533cg
on each of the pods, you will most likely find a specific pod that is misbehaving. You can simply delete it, and the replication controller will create a new one.
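Checking each router pod in turn can be scripted. The sketch below assumes the router pods are named router-... as in the listing above and that you are allowed to read the default namespace; the function name router_pods is made up for this example.

```shell
# Print the names of router pods from `oc get pods -n default` output on stdin.
# Assumes the router pods are named router-<...>, as in the listing above.
# router_pods is a hypothetical helper name.
router_pods() {
    awk 'NR > 1 && $1 ~ /^router-/ { print $1 }'
}

# Assumed usage: show the most recent log lines of every router pod.
#   oc get pods -n default | router_pods | while read -r pod; do
#       echo "== $pod"
#       oc logs --tail=50 -n default "$pod"
#   done
```

Using -n default on each command avoids having to switch projects with oc project default first.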

Openshift: Error pulling image from remote, secure docker registry using certificates

I use the all-in-one VM of Openshift origin.
I am trying to pull images from a private, secure registry using an Image Stream. This is the ImageStream definition:
apiVersion: v1
kind: ImageStream
metadata:
name: my-image-stream
annotations:
description: Keeps track of changes in the application image
name: my-image
spec:
dockerImageRepository: "my.registry.net/myproject/my-image"
The repository is secured with a certificate. On my local machine, I have the certificates in /etc/docker/certs.d/my.registry.net and I can log in with docker login my.registry.net.
When I run oc import-image, however, I get the following error:
The import completed with errors.
Name: my-image
Namespace: myproject
Created: About an hour ago
Labels: <none>
Description: Keeps track of changes in the application image
Annotations: openshift.io/image.dockerRepositoryCheck=2017-01-27T08:09:49Z
Docker Pull Spec: 172.30.53.244:5000/myproject/my-image
Unique Images: 0
Tags: 1
latest
tagged from my.registry.net/myproject/my-image
! error: Import failed (InternalError): Internal error occurred: Get https://my.registry.net/v2/: remote error: handshake failure
About an hour ago
I have copied the certificates to the vagrant machine and restarted the docker daemon, but the problem remains. I have not found any documentation on how to properly add the certificates, so I just put them in the usual docker folder.
What is the appropriate way to make this work?
Update in response to rezie's answer:
There is no file /etc/origin/master/ca-bundle.crt on my vagrant box. I found the following ca-bundle.crt files:
$ find / -iname ca-bundle.crt
/etc/pki/tls/certs/ca-bundle.crt
##multiple lines like
/var/lib/docker/devicemapper/mnt/something-hash-like/rootfs/etc/pki/tls/certs/ca-bundle.crt
/var/lib/origin/openshift.local.config/master/ca-bundle.crt
I appended the root certificate to /etc/pki/tls/certs/ca-bundle.crt and to /var/lib/origin/openshift.local.config/master/ca-bundle.crt, but that did not change anything.
Please note, however, that I do not need to have this root certificate in /etc/docker/certs.d/... in order to login directly using docker login my.registry.net
I have appended
I cannot comment due to low karma, so I'll write an answer saying almost the same as rezie.
The error:
! error: Import failed (InternalError): Internal error occurred: Get https://my.registry.net/v2/: remote error: handshake failure
About an hour ago
Comes from OpenShift, not from docker, therefore adding it to /etc/docker/certs.d/my.registry.net doesn't prevent the error from happening.
You should add the CA certificate at the OS level. My guess is that those steps failed for some reason, so do it this way:
openssl s_client -connect my.registry.net:443 </dev/null |
sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' \
> /etc/pki/ca-trust/source/anchors/my.registry.net.crt &&
update-ca-trust check && update-ca-trust extract
Finally, test whether it worked by running
curl https://my.registry.net/v2
If it doesn't give you a certificate error and you still can't run oc import-image, restart the atomic-openshift-master-api service.
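One detail about the extraction command above: without -showcerts, openssl s_client prints only the server's own certificate, not the issuing CA. If the trust store needs the CA, the whole chain can be dumped and inspected first. This is a sketch under that assumption; the function name pem_blocks is made up for the example, and which certificate of the chain to trust depends on your setup.

```shell
# Keep only the PEM certificate blocks from whatever arrives on stdin.
# This is the same sed range expression as in the command above, wrapped in a
# function. pem_blocks is a hypothetical helper name.
pem_blocks() {
    sed -n '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p'
}

# Assumed usage: print every certificate the server presents, not just the leaf.
#   openssl s_client -showcerts -connect my.registry.net:443 </dev/null 2>/dev/null | pem_blocks
```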
Try appending your CA (the same one you said was used in the my.registry.net directory) to OpenShift's CA bundle (e.g. /etc/origin/master/ca-bundle.crt). Then restart the service and reattempt import-image (making sure that you do not include the --insecure flag).
For reference, check out this issue from the Origin project. As you've mentioned, there's currently no way to supply certificates along with the dockercfg secret, and the suggestion from that issue is to add the CA as a trusted root CA across all the hosts.

Why does my openshift app timeout when I try to access the URL?

I am trying to set up a BrowserQuest server that runs on OpenShift.
I've been following this readme. Everything seems to go fine, I get to the end and run rhc app show bq and get the following output:
bq @ http://bq-plantagenet.rhcloud.com/ (uuid: 55e4311189f5cf028d0000fc)
------------------------------------------------------------------------
Domain: plantagenet
Created: 8:18 AM
Gears: 1 (defaults to small)
Git URL: ssh://55e4311189f5cf028d0000fc@bq-plantagenet.rhcloud.com/~/git/bq.git/
SSH: 55e4311189f5cf028d0000fc@bq-plantagenet.rhcloud.com
Deployment: auto (on git push)
nodejs-0.10 (Node.js 0.10)
--------------------------
Gears: Located with smarterclayton-redis-2.6
smarterclayton-redis-2.6 (Redis)
--------------------------------
From: http://cartreflect-claytondev.rhcloud.com/reflect?github=smarterclayton/openshift-redis-cart
Website: https://github.com/smarterclayton/openshift-redis-cart
Gears: Located with nodejs-0.10
But when I try to access http://bq-plantagenet.rhcloud.com:8080/ in a browser, I get:
The connection has timed out
The server at bq-plantagenet.rhcloud.com is taking too long to respond
My questions are: what is going wrong, and how can I fix it? Many thanks for reading through this and for any suggestions you might have for resolving it.
You need to access http://bq-plantagenet.rhcloud.com and leave off the port 8080; that is the port you listen on internally. You should also check your log files (https://developers.openshift.com/en/managing-log-files.html) to see what errors your application is producing.
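As a small illustration of the point about the port, the public URL is simply the application URL with the internal port removed. The helper name strip_port below is hypothetical.

```shell
# Remove an explicit port from an http(s) URL, leaving the rest untouched.
# strip_port is a hypothetical helper name.
strip_port() {
    printf '%s\n' "$1" | sed -E 's#^(https?://[^/:]+):[0-9]+#\1#'
}

# Assumed usage: request the app through the public front end.
#   curl -i "$(strip_port http://bq-plantagenet.rhcloud.com:8080/)"
```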