How to programmatically restart an OpenShift pod when an exception occurs - openshift

Is there a way to programmatically restart pods when an exception occurs? Please let me know.
Thanks.

This is the exact use case for Liveness probes: https://docs.openshift.com/container-platform/4.6/applications/application-health.html
Applying a liveness probe to your pod/deployment performs the defined check (HTTP/Exec/TCP) against the pod. If the check fails, the pod is automatically restarted.
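For illustration, a liveness probe is defined on the container spec of the deployment. The snippet below is a minimal sketch and assumes the application exposes an HTTP health endpoint at /healthz on port 8080 (both hypothetical); adjust the check type, path, port and timings to your application:

livenessProbe:
  httpGet:
    path: /healthz         # hypothetical health endpoint
    port: 8080             # hypothetical container port
  initialDelaySeconds: 15  # give the app time to start before probing
  periodSeconds: 10        # probe every 10 seconds
  failureThreshold: 3      # restart the container after 3 consecutive failures

Note that the probe only catches what it can observe: an exception that does not surface on the health endpoint (or crash the process) will not trigger a restart, so the application has to report its failure state there.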

Related

Connection refused when I execute oc new-project demo

OpenShift Local has been installed successfully and I can see it running on my Mac, but when I run oc new-project demo I get:
The connection to the server 127.0.0.1:63267 was refused - did you specify the right host or port?
I don't have a definitive answer, but here are some troubleshooting steps based on your comments.
First, you should not be executing oc with sudo. My initial guess is that your problems are related to that mistake: you may have changed some of your configuration files to be owned by root and/or corrupted them somehow.
Try oc config get-contexts. This will list all of the currently defined contexts and the clusters they point to. Check to make sure that localhost or 63267 isn't in there anywhere. As selllami points out, 63267 isn't an expected port (nor is localhost an expected host).
Also check the permissions on your kubeconfig files: ls -l ~/.kube, especially if you see something unusual in your context in the previous step.
You may find the answer obvious from these steps and be able to resolve the problem by fixing your permissions or your context definition. If not, you may just want to delete your .kube directory and log in again without using sudo so that the proper config is created. (Of course, back up the directory first, especially if you have other K8s clusters you connect to.)
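For reference, the checks above look roughly like this (the login URL is only an example; OpenShift Local typically uses api.crc.testing:6443, but use whatever crc console --credentials prints for your install):

# List the currently defined contexts and clusters (run as your own user, not with sudo)
oc config get-contexts
# Check ownership and permissions of the kubeconfig files
ls -l ~/.kube
# If the config looks wrong or root-owned, back it up and start fresh
mv ~/.kube ~/.kube.backup
oc login -u developer https://api.crc.testing:6443   # example URL for OpenShift Local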

Running Fiware-Cygnus listener in CentOS

I have a VM with CentOS installed, where Orion Context Broker, Cygnus, Mosquitto and MongoDB are present. When I check connections with the following command:
netstat -ntlpd
I receive the output shown in the attached screenshot (Connections).
It is seen that something is already listening on ports 8081 and 5050 (which belong to Cygnus). But Cygnus itself is not active when I use the following:
service cygnus status
There aren't any instance of Cygnus running
While trying to run the Cygnus test, it gives me a fatal error which states that the ports are taken and that the configuration is wrong.
Trying to run Cygnus with
sudo service cygnus start
also fails. Here is the systemctl status:
(see the attached screenshot, FailedCygnus)
After checking which process is running under the PID bound to the Cygnus ports, I have this (see the attached screenshot, CygnusPorts):
Does anyone have a clue what that can be? It feels like Cygnus is there but something is configured wrong. Also, is there another way of running Cygnus, because I need to receive notifications from subscriptions somehow?
Thank you in advance, Stack Overflow!
EDIT 1
I tried killing the processes under those PIDs that are listening on ports 5050 and 8081, but it did not help; Cygnus still cannot be started.
I am currently thinking of simply reinstalling everything.
EDIT 2
So, I have managed to run the simple "dummy" listener using the agent_test file. But I guess it is good only in the beginning and for learning purposes; later, using your own configuration is preferred?
For further investigation, using the agent-test.conf file is enough for me: the listener works and data is stored in a database. Perhaps in the future I will encounter this problem again, but for now it works.
What I had to do beforehand was kill the existing processes.
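For anyone hitting the same port conflict, the checks involved look roughly like this (port numbers taken from the question; whether netstat or ss is available depends on the CentOS install):

# Find out what is listening on the Cygnus ports from the question
sudo netstat -ntlp | grep -E ':(5050|8081)'
# or, on newer systems:
sudo ss -tlnp | grep -E ':(5050|8081)'
# Identify the process behind a reported PID
ps -fp <PID>
# If it is a stale Cygnus/Flume instance, stop it and start the service again
sudo kill <PID>          # clean shutdown first
sudo kill -9 <PID>       # only if it refuses to exit
sudo service cygnus start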

Getting an error while trying to deploy blockchain inside an IBM Container Service

For deploying blockchain in the IBM Cloud Container Service, I am following the steps outlined at https://github.ibm.com/IBM-Blockchain/ibm-container-service/blob/v1.0.0/cs-offerings/free/README.md
While running the script "create_all.sh", I am getting the following error repeatedly:
Unable to connect to the server: dial tcp 127.0.0.1:8080: connectex: No connection could be made because the target machine actively refused it.
Waiting for createchannel container to be Completed
I have already tried starting the procedure all over again from the first step, but no luck so far. I am not sure why I keep getting this error.
Any help or hint in this regard will be of great value to me. Thanks!
The environment variable KUBECONFIG should point to the correct kubeconfig YAML file for your cluster.
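A sketch of what that looks like (the command and path below are typical for the IBM Cloud CLI but may differ by CLI version; the path is illustrative, the cluster-config command prints the actual export line to use):

# Download/refresh the cluster configuration
bx cs cluster-config <cluster-name>        # newer CLIs: ibmcloud ks cluster config --cluster <cluster-name>
# Point kubectl at the generated kubeconfig (illustrative path)
export KUBECONFIG=~/.bluemix/plugins/container-service/clusters/<cluster-name>/kube-config-<zone>-<cluster-name>.yml
# Verify that kubectl no longer falls back to 127.0.0.1:8080
kubectl get nodes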

OpenShift Pod gets stuck in pending state

A MySQL pod in OpenShift gets stuck after a new deployment and shows the message "The pod has been stuck in the pending state for more than five minutes." What can I do to solve this? I tried scaling the current deployment's pod to 0 and scaling the previous deployment's pod (which was working earlier) back to 1, but it also got stuck.
If a pod is stuck in the pending state, we can remove it by executing
oc delete pod/<name of pod> --grace-period=0
This command removes the pod immediately, but use it with caution because it may leave some process PID files behind on persistent volumes.
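A concrete usage sketch (the pod name is hypothetical; take the real one from oc get pods, and note that newer clients also require --force alongside --grace-period=0):

# Find the name of the stuck pod
oc get pods
# Force-delete it immediately (example name)
oc delete pod mysql-1-abcde --grace-period=0
# on newer oc/kubectl versions:
oc delete pod mysql-1-abcde --grace-period=0 --force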

Cluster not responding, weird error message

My container engine cluster has a red exclamation mark next to its name in the Google Cloud Console overview of Container Engine. A tooltip says "The cluster has a problem. Click the cluster name for details." Once I click the name I don't get any more information; it's just the usual summary.
Stackdriver doesn't report anything unusual. No incidents are logged and all pods are marked as healthy, but I can't reach my services.
Trying to get information or logs via kubectl doesn't work:
kubectl cluster-info
Unable to connect to the server: dial tcp xxx.xxx.xxx.xxx:443: i/o timeout
How can I debug this problem? And what does this cryptic message mean anyway?
Are you able to use other kubectl commands such as kubectl get pods?
This sounds like the cluster isn't set up correctly or there's some network issue. Would you also try kubectl config view to see how your cluster is configured? More specifically, look at the current-context and clusters fields to see if your cluster is configured as expected.
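Roughly, the checks suggested above are:

# Basic connectivity test against the API server
kubectl get pods
# Inspect the client configuration; the current-context and clusters entries
# show which endpoint kubectl is actually talking to
kubectl config view
kubectl config current-context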
In our case it was a billing issue. Someone mistakenly disabled the billing profile for our project. We re-enabled it and waited a while; after 20-30 minutes the cluster came back up with no errors.