Deployment "tiller" exceeded its progress deadline - openshift

I'm trying to install tiller server to an Openshift project
Helm/tiller version: 2.9.0
My project name: paytiller
At step 3, executing this command (mentioned as per this document - https://www.openshift.com/blog/getting-started-helm-openshift)
oc rollout status deployment tiller
I get this error:
error: deployment "tiller" exceeded its progress deadline
I'm not clear on what's the error message or could find any logs.
Any idea why this error?
If this doesn't work, what are the other suggestions for templating in Openshift?
EDIT
oc get events
Events:
Type Reason Age From Message
---- ------ ---- ---- ---
Warning Failed 14m (x5493 over 21h) kubelet, example.com Error: ImagePullBackOff
Normal Pulling 9m (x255 over 21h) kubelet, example.com pulling image "gcr.io/kubernetes-helm/tiller:v2.9.0"
Normal BackOff 4m (x5537 over 21h) kubelet, example.com Back-off pulling image "gcr.io/kubernetes-helm/tiller:v2.9.0"
Thanks.

The issue was with the permissions on our OpenShift platform. We didn't have access to download from open-source directly.
We tried to add kubernetes-helm as a docker image to our organization repository and then we were able to pull the image to OpenShift project. It is working now. But still, we didn't get any clue of the issue from the logs.

The status ImagePullBackOff tells you that this image gcr.io/kubernetes-helm/tiller:v2.9.0 could not be pulled from the container registry. So your OpenShift node cannot pull that image for some reason. This is often due to network proxies, a non-existing image (not the issue here) or other restrictions in the (corporate) network.
You can use oc describe pod <pod that shows ImagePullBackOff> to find out the more detailed error message that may help you further.
Also, note that the blog post you linked is from 2017, which is very old. Here is a more current version: Build Kubernetes Operators from Helm Charts in 5 steps
.

Related

Openshift pods stuck in Init:0/1 status

I am deploying microservices in my openshift cluster but I can see out of 90 microservices nearly 10 got stuck in Init:0/1 status. Is there a way to troubleshoot the issue??
If you are using web page UI.
Go to your developer tab, and go to project.
There you should see the recent events in your project where the errors related to the 0/1 pods stuck state should appear. For me it was something like
Error creating: pods "xxxx" is forbidden: exceeded quota: project-quota, requested: requests.memory=500Mi, used: requests.memory=750Mi, limited: requests.memory=1Gi
So that meant that my project was attempting to have 1.25Gi of memory when 1Gi was the limit
In this case I went down to project quotas in my project screen.
and saw something like this in yaml format:
spec:
hard:
file-share-dr-off.storageclass.storage.k8s.io/requests.storage: xGi
file-share-dr-on.storageclass.storage.k8s.io/requests.storage: xGi
limits.cpu: 'x'
limits.memory: 1Gi
pods: 'x'
requests.cpu: 'x'
requests.memory: 1Gi
vsan.storageclass.storage.k8s.io/requests.storage: xGi
So I increased limits.memory and requests.memory to 2Gi for my project quota and hit save.
After that the pod errors got fixed.
And deployment went from 0/1 to 1/1 pods.

GCP deployment fails on "Updating service"

I have asp.net core application hosted on GCP App Engine. When I try to deploy the application it fails on last step:
Updating service [name] (this may take several minutes)... ...failed
ERROR: (gcloud.app.deploy) Error Response: [9] An internal error occurred while processing task /app-engine-flex/flex_await_healthy/flex_await_healthy>blablabla.wm.1
The exception stack trace show that service running in background couldn't find MySQL table (that table obviously exists).
my app.yaml file:
service: XXX
runtime: custom
env: flex
automatic_scaling:
max_concurrent_requests: 80
min_num_instances: 1
max_num_instances: 1
resources:
cpu: XXX
memory_gb: XXX
beta_settings:
cloud_sql_instances: "XXX:XXXX:XXXX=tcp:3306"
It looks like the application is deployed properly despite the error. This is the only error and backgroud service desn't throw any exceptions at later point. In fact it works properly and can connect to the database.
My guess was that maybe GCP is checking health while the application is not connected do database. So I tried to add liveness_check and readiness_check to app.yaml and configured dedicated /healthcheck endpoint in my application but it didn't make any change.
Any ideas how to fix it and what might be a cause?
Deploying app with new version fixed the issue

Hyperledger Composer CLI Ping to a Business Network returns AccessException

Im trying to learn Hyperledger Composer but seems to be a relatively new technology, i mean there are few tutorials and few solutions to a lot of questions, tutorial does not mention possible error case when following the commands and which means there are is also no solution for those errors.
I have joined the composer channel in their community chat, looks like its running in Discord or something, and asked the same question without a response, i have a better experience here in SO.
This is the problem: I have deployed my business network, installed it, started it, created my network admin card and imported it, then to test if everything is ok i have to command composer network ping --card NAME-OF-MY-ADMIN-CARD
And this error comes:
juan#JuanDeDios:~/proyectos/inovacion/a3-poliza-microservice$ composer network ping --card admin#a3-policy-microservice
Error: transaction returned with failure: AccessException: Participant 'org.hyperledger.composer.system.NetworkAdmin#admin' does not have 'READ' access to resource 'org.hyperledger.composer.system.Network#a3-policy-microservice#0.0.1'
Command failed
I think that it has to do something with the permission.acl file, and gave permission to everyone to everything so there would not be any restrictions to anyone, and tryied again, but failed.
So i thought i had to uninstall my business network and create it again, i deleted my .bna and my network.card files also so everything would be created again, but the same error result.
My other attempt was to update the business network, but didn't work, the same error happened and I'm sure i didn't miss any step from the tutorial. I do also followed the playground tutorial. What i have not done its to create another app with the Yeoman but i will do if i don't find a solution to this problem which would not require me to create another app.
This were my steps:
1-. Created my app with Yeoman
yo hyperledger-composer:businessnetwork
2-. Selected Apache-2.0 for my license
3-. Created a3-policy-microservice as the name of the business network
4-. Created org.microservice.policy (Yeah i switched names but Im totally aware)
5-. Generated my app with a template selecting the NO option
6-. Created my assets, participants and transactions
7-. Changed my permission rules to mine
8-. I generated the .bna file
composer archive create -t dir -n .
9-. Then installed my bna file
composer network install --card PeerAdmin#hlfv1 --archiveFile a3-policy-microservice#0.0.1.bna
10-. Then started my network and created my networkadmin card
composer network start --networkName a3-policy-network --networkVersion 0.0.1 --networkAdmin admin --networkAdminEnrollSecret adminpw --card PeerAdmin#hlfv1 --file networkadmin.card
11-. Imported my card
composer card import --file networkadmin.card
12-. Tried to ping my network
composer network ping --card admin#a3-poliza-microservice
And the error happens
Later i tried to create everything again shutting down my fabric and started it again and creating the network from the first step.
My other attempt was to change the permissions and upgrade my bna network, but it failed too. Im running out of options
Hope this description its not too long to ignore it. Thanks in advance
thanks for the question!
First possibility is that your network name is a3-policy-network but you're pinging a network called a3-poliza-microservice - once you do get the correct ACLs in place (currently, that's the error you're trying to resolve).
The procedure for upgrade would normally be the procedure below:
After your step 12 (where you can't ping the business network due to restrictive ACL conditions, assuming you are using the right network name) you would have:
Make the changes to to include your System ACLs this time eg.
/**
* Sample access control list.
*/
rule SystemACL {
description: "System ACL to permit all access"
participant: "org.hyperledger.composer.system.Participant"
operation: ALL
resource: "org.hyperledger.composer.system.**"
action: ALLOW
}
rule NetworkAdminUser {
description: "Grant business network administrators full access to user resources"
participant: "org.hyperledger.composer.system.NetworkAdmin"
operation: ALL
resource: "**"
action: ALLOW
}
rule NetworkAdminSystem {
description: "Grant business network administrators full access to system resources"
participant: "org.hyperledger.composer.system.NetworkAdmin"
operation: ALL
resource: "org.hyperledger.composer.system.**"
action: ALLOW
}
Update the "version" field in your existing package.json in your Business Network project directory (ie need to change it next increment - eg. update the version property from 0.0.1 to 0.0.2.)
From the same directory, run the following command:
composer archive create --sourceType dir --sourceName . -a a3-policy-network#0.0.2.bna
Now install the new business network code firstly:
composer network install --card PeerAdmin#hlfv1 --archiveFile a3-policy-network#0.0.2.bna
Then perform the requisite upgrade step (single '-' for short form of the parameter):
composer network upgrade -c PeerAdmin#hlfv1 -n a3-policy-network -V 0.0.2
After a few seconds, ping the network again to see ACL changes are now in effect:
composer network ping -c a3-policy-network

Openshift 3 , 503 Error (No server is available to handle this request)

I have created a web application using jsp/tiles/struts/mysql/tomcat. I created new project on Openshift 3 console (Openshift online) https://console.preview.openshift.com/console/ then added tomcat/mySql. I was getting 503 error sometimes and other times, same page was working as expected. 503 error came randomly for any page from my project. When I get 503 error, I refresh some no of times and it goes away, and my page is correctly displayed.
Error that I see is:
"503 Service Unavailable
No server is available to handle this request. "
I did some research:
What I understand from this openshift 2 link:
https://blog.openshift.com/how-to-host-your-java-ee-application-with-auto-scaling/
is that to correct 503 error:
SSH into your application gear using rhc ssh --app <app_name>
Change directory to haproxy/conf
change the following in haproxy.cfg option httpchk GET / to option httpchk GET /api/v1/ping
Restart the HAProxy cartridge from your local machine using RHC rhc cartridge-restart --cartridge haproxy
I dont know if it is also applicable to openshift 3. In openshift 3 where is haproxy.log, haproxy.cfg, haproxy/conf or its slightly different in openshift 3. (Nut thanks to Warrens comments, yes he saw 503 error in openshift related to HAProxy)
Now after 1 week after posting this question:
I am getting Quota Reached Error. I am able to build my project but all deployments are failing. I wonder if 503 error that I was getting earlier(either completely or partially) was related to Quota reached. How should I proceed now.
curl -i localhost:8080/GEA
HTTP/1.1 302 Found Server:
Apache-Coyote/1.1
Location: http://localhost:8080/GEA/
Transfer-Encoding: chunked Date: Tue, 11 Apr 2017 18:03:25 GMT
Tomcat logs do not show any application error.
Will Readiness Probe and Liveness Probe help me? I have not set them yet.
Nor do I know how to set them.
Will scaling help me (I dont know how to set it either)
Do I have to set memory/... all at maximum allowed to ensure project runs smooth?
For me I had a similar situation of getting 503's sometimes and sometimes getting my actual page. the reason was because you have haproxy on the frontend handling the requests. Depending on your setup you may even have a few haproxy pods and your request could be funneled between one of the pods. So as in my case one pod was working and the other not.
So basically
oc get pods -n default
NAME READY STATUS RESTARTS AGE
docker-registry-7-i02rh 1/1 Running 0 75d
registry-console-12-wciib 1/1 Running 0 67d
router-1-533cg 1/1 Running 3 76d
router-1-9utld 1/1 Running 1 76d
router-1-uwf64 1/1 Running 1 76d
As you can see in my output default namespace is where my router(haproxy) pods live. If I change to that namespace
oc project default
Then run
oc logs -f router-1-533cg
on each of the pods you will most likely find a sepcific pod that is behaving bad. You can simply delete, and the replication controller will create a new one

Why does my openshift app timeout when I try to access the URL?

I am trying to set up a BrowserQuest server that runs in openshift
I've been following this readme. Everything seems to go fine, I get to the end and run rhc app show bq and get the following output:
bq # http://bq-plantagenet.rhcloud.com/ (uuid: 55e4311189f5cf028d0000fc)
------------------------------------------------------------------------
Domain: plantagenet
Created: 8:18 AM
Gears: 1 (defaults to small)
Git URL: ssh://55e4311189f5cf028d0000fc#bq-plantagenet.rhcloud.com/~/git/bq.git/
SSH: 55e4311189f5cf028d0000fc#bq-plantagenet.rhcloud.com
Deployment: auto (on git push)
nodejs-0.10 (Node.js 0.10)
--------------------------
Gears: Located with smarterclayton-redis-2.6
smarterclayton-redis-2.6 (Redis)
--------------------------------
From: http://cartreflect-claytondev.rhcloud.com/reflect?github=smarterclayton/openshift-redis-cart
Website: https://github.com/smarterclayton/openshift-redis-cart
Gears: Located with nodejs-0.10
But when I try to access http://bq-plantagenet.rhcloud.com:8080/ in a browser, I get:
The connection has timed out
The server at bq-plantagenet.rhcloud.com is taking too long to respond
My questions are what is going wrong and how can I fix it? Many thanks for your consideration in reading through this and any suggestions you might have for resolving it
You need to access http://bq-plantagenet.rhcloud.com, leave off the port 8080, that is the port you listen on internally. You should also try checking your log files (https://developers.openshift.com/en/managing-log-files.html) to see what errors your application is producing.