OpenShift: How to alert/publish a message if a deployment/build fails

In our deployment process it is crucial that we are informed when a deployment fails. The deployment is rolling, but a notification through Slack would be nice anyway. Would this be possible through lifecycle hooks, or what other possibilities exist?

OpenShift usually logs the deployment status as event logs.
Do you use the OpenShift logging component (EFK stack)? Then additionally consider installing EventRouter, which collects OpenShift event logs as the eventrouter pod's logs.
You can pick up the deployment event messages from those logs and trigger the alert with a custom script, your monitoring system's log-tailing feature, and so on (a log-tailing sketch follows the variable list below).
Refer to Specifying Logging Ansible Variables for the Ansible variable details:
openshift_logging_install_eventrouter
openshift_logging_eventrouter_nodeselector
openshift_logging_eventrouter_namespace
...
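A minimal log-tailing sketch is shown below. The "default" project, the component=eventrouter label, the grep patterns and the webhook URL are all assumptions to adapt to your openshift_logging_eventrouter_* settings:
WEBHOOK_URL="https://hooks.slack.com/services/XXX/YYY/ZZZ"   # placeholder Slack incoming webhook

# Assumption: EventRouter was installed into the "default" project with the label component=eventrouter
POD=$(oc get pods -n default -l component=eventrouter -o name | head -n 1)

# Follow the event stream and raise an alert on anything that looks like a failed deployment
oc logs -f -n default "$POD" \
  | grep --line-buffered -i 'deploy' \
  | grep --line-buffered -i 'fail' \
  | while read -r _; do
      curl -s -X POST -H 'Content-Type: application/json' \
        -d '{"text": "OpenShift reported a failed deployment event"}' "$WEBHOOK_URL"
    done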

You can pass customParams to the deployment strategy and run a curl if openshift-deploy fails:
"strategy": {
"type": "Rolling",
"timeoutSeconds": 180,
"customParams": {
"command": [
"/bin/sh",
"-c",
"set -e && if ! openshift-deploy; then curl -i -X POST -d '{\"text\": \"Deployment of ${application} failed!\"}' ${webhook} && exit 1; else echo \"Deployment complete\"; fi"
]
}
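Before wiring this into the deployment strategy, it can help to verify the webhook by hand. A minimal sketch, assuming a Slack-style incoming webhook that accepts a plain "text" payload (the URL is a placeholder):
WEBHOOK_URL="https://hooks.slack.com/services/XXX/YYY/ZZZ"   # placeholder webhook URL
curl -i -X POST -H 'Content-Type: application/json' \
  -d '{"text": "Deployment of my-app failed!"}' "$WEBHOOK_URL"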

Related

Oracle cloud api health check

I have the below command for creating an API health check in Oracle Cloud.
oci health-checks http-monitor create --compartment-id ocid1.compartment.oc1..aaaaaaaabbb5aavs3npxp6ttq525qoollwxtrjmp1vh6skthcsitfzpw4sq2rfa --display-name "keepalive-check" --interval-in-seconds 300 --method HEAD --protocol "HTTPS" --timeout-in-seconds 60 --targets "[api.abcglobal.com]" --path "/dev/user-service/warm" --vantage-point-names '["aws-sin"]'
While running this command from Cloud Shell I am getting the below error. Any help would be appreciated.
Parameter 'targets' must be in JSON format.
For help with formatting JSON input see our documentation here: https://docs.cloud.oracle.com/iaas/Content/API/SDKDocs/cliusing.htm#ManagingCLIInputandOutput
--targets is a complex parameter. You can create its JSON skeleton as described at https://docs.oracle.com/en-us/iaas/tools/oci-cli/3.6.1/oci_cli_docs/cmdref/health-checks/http-monitor/create.html#cmdoption-targets
Please follow this:
oci health-checks http-monitor create --generate-param-json-input targets > target.json
edit target.json
oci health-checks http-monitor create --compartment-id $C --protocol "HTTPS" --display-name "test" --interval-in-seconds "300" --targets file://target.json
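Alternatively, the JSON can be passed inline; the sketch below assumes --targets is simply a JSON array of hostname strings, matching the generated skeleton:
oci health-checks http-monitor create \
  --compartment-id "$C" \
  --display-name "keepalive-check" \
  --protocol "HTTPS" \
  --method HEAD \
  --interval-in-seconds 300 \
  --timeout-in-seconds 60 \
  --path "/dev/user-service/warm" \
  --targets '["api.abcglobal.com"]' \
  --vantage-point-names '["aws-sin"]'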

Configuring Apache drill for Cassandra

I am trying to configure Cassandra with Drill. I used the approach described at https://drill.apache.org/docs/starting-the-web-ui/.
I used the following configuration for the New Storage Plugin:
{
  "type": "cassandra",
  "hosts": [
    "127.0.0.1"
  ],
  "port": 9042,
  "username": "<username>",
  "password": "<password>",
  "enabled": false
}
I have attached the Screenshot here.
But I'm getting the following error:
Please retry: Error (invalid JSON mapping)
How can I resolve this?
All the code:
Git: https://github.com/yssharma/drill/tree/cassandra-storage
Patch: https://gist.github.com/yssharma/2581ae8a97c559b2677f
1. Get Drill: Let's get the Drill source
$ git clone https://github.com/apache/drill.git
2. Get the Cassandra Storage patch: download the patch file from:
https://reviews.apache.org/r/29816/diff/raw/
3. Apply the patch on top of Drill
$ cd drill
$ git apply --check ~/Downloads/DRILL-92-CassandraStorage.patch
$ git apply ~/Downloads/DRILL-92-CassandraStorage.patch
4. Build Drill with Cassandra Storage & export distribution to /opt/drill
$ mvn clean install -DskipTests
$ mkdir /opt/drill
$ tar xvzf distribution/target/*.tar.gz --strip=1 -C /opt/drill
5. Start Sqlline.
That's it, we have finished the Drill build and installation, and it's time we can start using Drill.
$ cd /opt/drill
$ bin/sqlline -u jdbc:drill:zk=local -n admin -p admin
Hit 'show schemas' to view existing schemas.
6. Drill Web interface
You should be able to see the Drill web interface on localhost:8047, or whatever your host/port is.
Use this as your config:
{
  "type": "cassandra",
  "config": {
    "cassandra.hosts": [
      "127.0.0.1",
      "127.0.0.2"
    ],
    "cassandra.port": 9042
  },
  "enabled": true
}
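If the web UI keeps rejecting the JSON, you could also try registering the plugin through Drill's REST API. A sketch, assuming the default web port 8047 and the storage endpoint described in the Drill REST API docs:
curl -s -X POST http://localhost:8047/storage/cassandra.json \
  -H 'Content-Type: application/json' \
  -d '{
        "name": "cassandra",
        "config": {
          "type": "cassandra",
          "config": {
            "cassandra.hosts": ["127.0.0.1", "127.0.0.2"],
            "cassandra.port": 9042
          },
          "enabled": true
        }
      }'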
Also, if this doesn't work, know that they are working on a plugin for it now: https://github.com/apache/drill/pull/1960
I'll give an update here as well. We're doing some serious refactoring of how Drill works with storage plugins. Specifically, we're working to incorporate the Calcite adapter for Cassandra. The reason for this is that the hard part of storage plugins isn't the connection, it's the optimizations. Calcite already does query planning for Drill and already implemented a bunch of these adapters, which means that the work of figuring out all the optimizations (AKA pushdowns) is largely done.
In the case of Cassandra/Scylla, this is particularly important because some filters should be pushed down to Cassandra, and some should absolutely not be pushed down. The adapters also include aggregate pushdowns, something which no Drill plugin currently does. Again, the point of this is that once we commit this, the connector should work VERY well with Cassandra/Scylla. We have one for ElasticSearch that is very near completion, and once that's done the Cassandra plugin is next. If you have any suggestions/comments or other feedback, please post on the pull request linked above.
UPDATE 11 April 2021: the Cassandra/Scylla plugin is now merged in Drill 1.19.0-SNAPSHOT.

Kubernetes google cloud composer with gitlab ci yaml file

I am working on the deployment of a GitLab CI pipeline to trigger a Google Cloud Composer DAG.
Below is the .yaml file I wrote:
stages:
  - deploy

deploy:
  stage: deploy
  image: google/cloud-sdk
  script:
    - apt-get update && apt-get --only-upgrade install kubectl google-cloud-sdk
    - gcloud config set project $GCP_PROJECT_ID
    - gsutil cp plugins/*.py ${PLUGINS_BUCKET}
    - gsutil cp dags/*.py ${DAGS_BUCKET}
    - kubectl get pods
    - gcloud composer environments run ${COMPOSER_ENVIRONMENT} --location ${ENVIRONMENT_LOCATION} trigger_dag -- ${DAG_NAME}
Unfortunately, the execution of the pipeline fails with the error below:
$ gcloud config set project $GCP_PROJECT_ID
Updated property [core/project].
$ gsutil cp plugins/*.py ${PLUGINS_BUCKET}
Copying file://plugins/dataproc_custom_operators.py [Content-Type=text/x-python]...
/ [0 files][ 0.0 B/ 2.3 KiB]
/ [1 files][ 2.3 KiB/ 2.3 KiB]
Operation completed over 1 objects/2.3 KiB.
$ gsutil cp dags/*.py ${DAGS_BUCKET}
copying file://dags/frrm_infdeos_workflow.py [Content-Type=text/x-python]...
/ [0 files][ 0.0 B/ 3.3 KiB]
/ [1 files][ 3.3 KiB/ 3.3 KiB]
Operation completed over 1 objects/3.3 KiB.
$ gcloud composer environments run ${COMPOSER_ENVIRONMENT} --location ${ENVIRONMENT_LOCATION} trigger_dag -- ${DAG_NAME}
kubeconfig entry generated for europe-west1-nameenvironment-a5456e0c-gke.
ERROR: (gcloud.composer.environments.run) No running GKE pods found. If the environment was recently started, please wait and retry.
ERROR: Job failed: command terminated with exit code 1
Do you have any idea how to fix this, please?
Best regards
I had the same problem as #scalacode. For me, the cause was that the gitlab-runner was running in a different GCP project than the Composer environment, so it failed without a more specific error. Running a gitlab-runner in the same project as the Composer environment fixed the issue.
It seems Composer is unable to retrieve information about the pods/GKE cluster. This could be for a number of reasons, ranging from the GKE cluster not creating the nodes to the pods being in a crash loop.
I notice that in the script you did not run "get-credentials" to authenticate to the cluster. When running commands on a GKE cluster through the CLI, you traditionally have to authenticate to the cluster first. To do this with Composer:
gcloud composer environments describe ${COMPOSER_ENVIRONMENT} --location ${ENVIRONMENT_LOCATION} --format="get(config.gkeCluster)"
This will return something of the form projects/PROJECT/zones/ZONE/clusters/CLUSTER. Then run:
gcloud container clusters get-credentials ${CLUSTER} --zone ${ZONE}
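Putting the two commands together inside the CI script could look like the sketch below (the cut field positions assume the projects/.../zones/.../clusters/... format shown above):
# Resolve the GKE cluster backing the Composer environment
GKE_CLUSTER=$(gcloud composer environments describe "${COMPOSER_ENVIRONMENT}" \
  --location "${ENVIRONMENT_LOCATION}" --format="get(config.gkeCluster)")

# GKE_CLUSTER has the form projects/PROJECT/zones/ZONE/clusters/CLUSTER
ZONE=$(echo "${GKE_CLUSTER}" | cut -d/ -f4)
CLUSTER=$(echo "${GKE_CLUSTER}" | cut -d/ -f6)

# Authenticate kubectl (and the composer run command) against that cluster
gcloud container clusters get-credentials "${CLUSTER}" --zone "${ZONE}"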
Once you have authenticated to the cluster in the script, see if it is now able to complete. If not, try running kubectl get pods to see what is happening with the pods/if they exist.
If you see many pods restarting or generally not in the “running/completed” state, the issue could be with the pod configuration.
If you don't see pods at all, the deployment may have failed. Check the deployments with the command kubectl get deployments.
The deployments airflow-scheduler, airflow-sqlproxy, and airflow-worker should be present. If those three deployments are not present, the environment was likely tampered with, and it would be easiest to make a new environment.
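For a quick check once authenticated, something like this sketch works (the namespace is chosen by Composer, so --all-namespaces avoids guessing it):
kubectl get pods --all-namespaces
kubectl get deployments --all-namespaces | grep -E 'airflow-(scheduler|sqlproxy|worker)'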

How to display the content of the volume?

How do I display the content of an OpenShift volume (the files in it, the total space used, etc.)?
The only information I've managed to find in the docs is to oc rsh into the running pod and use ls, which is of course not a viable solution if no pod using the volume is running, or one cannot be started because of some issue with the volume...
For the moment there's no "volume file explorer" or similar interface in OpenShift.
Currently you always need to attach the volume to a running pod and list the files from within it.
If you're using GlusterFS (and are a cluster/storage admin), all volumes are also mounted inside the storage pods, so you can get a complete overview from within the storage pods.
I don't know whether these approaches fit your needs, but I'll list the options as follows.
As far as I remember, if the pod can be created from a docker image, then you can run it without starting the application, like this:
oc run tmp-pod --image=your-docker-registry.default.svc/yourapplication -- tail -f /dev/null
If you are using a PersistentVolume (PV/PVC pair) for your volume, then you can display the volume after temporarily mounting the PV to a temporary pod as follows.
oc run tmp-pod --image=registry.access.redhat.com/rhel7 -- tail -f /dev/null
oc set volume dc/tmp-pod --add -t pvc --name=new-registry --claim-name=new-registry --mount-path=/mountpath
You can see the volume contents mounted by the above configuration via tmp-pod, and you can simply remove the temporary pod after checking.
I hope it helps.
The solution proposed by #Daein Park to display the PersistentVolume (PV/PVC pair) content was not working for me. The command oc run tmp-pod does not create a dc (DeploymentConfig), so it seems impossible to set a volume on the pod.
My solution was to use the following command:
oc run tmp-pod --image=dummy --restart=Never --overrides='{"spec":{"containers":[{"command":["tail","-f","/dev/null"],"image":"registry.access.redhat.com/rhel7","name":"tmp-pod","volumeMounts":[{"mountPath":"/mountpath","name":"volume"}]}],"volumes":[{"name":"volume","persistentVolumeClaim":{"claimName":"pv-clain"}}]}}'
NOTE: The --image=dummy is only provided to make the oc run command happy; the image field is overridden by the JSON anyway.
Finally, to list the content of the mounted volume:
oc rsh tmp-pod ls /mountpath
As the JSON content is not easy to read on the command line, here is what is provided to the --overrides parameter:
{
  "spec": {
    "containers": [{
      "command": ["tail", "-f", "/dev/null"],
      "image": "registry.access.redhat.com/rhel7",
      "name": "tmp-pod",
      "volumeMounts": [{
        "mountPath": "/mountpath",
        "name": "volume"
      }]
    }],
    "volumes": [{
      "name": "volume",
      "persistentVolumeClaim": {
        "claimName": "pv-clain"
      }
    }]
  }
}
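Since the question also asks about the total space used, here is a small follow-up sketch; it reuses the tmp-pod and /mountpath names from the command above:
# Once tmp-pod is running, check disk usage as well as the file listing
oc exec tmp-pod -- df -h /mountpath
oc exec tmp-pod -- du -sh /mountpath
# Clean up the temporary pod when done (it was created with --restart=Never, so it is a bare pod)
oc delete pod tmp-pod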

Restarting a MySQL server managed by Ambari

I have a scenario where I need to change several parameters of a Hadoop cluster managed by Ambari to document the performance of a particular application. The change in the configs entails a restart of the affected components.
I am using the Ambari REST API to achieve this. I figured out how to do this for all service components of Hadoop. I'm not sure whether the API provides a way to restart the MySQL server that Hive uses.
I have the following questions:
Is it the case that a mere stop and start of mysqld on the appropriate machine is enough to ensure that the required configuration changes are recognized by Ambari and the application?
I chose the 'New MySQL database' option while installing Hive via Ambari. Does this mean that restarts are reflected in Ambari only when it is carried out from the Ambari UI?
Your inputs would be highly appreciated.
Thanks!
Found a solution to the problem. I used the following commands via the Ambari REST API for changing configurations and restarting services from the backend.
Log in to the host on which the Ambari server is running and use the already provided configs.sh script as described below.
Modifying configuration files
#!/bin/bash
CLUSTER_NAME=$1
CONFIG_FILE=$2
PROPERTY_NAME=$3
PROPERTY_VALUE=$4
/var/lib/ambari-server/resources/scripts/configs.sh -port <ambari-server-port> set localhost "$CLUSTER_NAME" "$CONFIG_FILE" "$PROPERTY_NAME" "$PROPERTY_VALUE"
where CONFIG_FILE can take values like tez-site, mapred-site, hadoop-site, hive-site, etc. PROPERTY_NAME and PROPERTY_VALUE should be set to values relevant to the specified CONFIG_FILE.
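For example, assuming the script above is saved as change_config.sh (a hypothetical name) on the Ambari server host, setting a Hive property for a cluster named c1 might look like:
# Hypothetical invocation: set hive.exec.parallel=true in hive-site for cluster "c1"
./change_config.sh c1 hive-site hive.exec.parallel true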
Restarting host components
curl -uadmin:admin -H 'X-Requested-By: ambari' -X POST -d '
{
  "RequestInfo": {
    "command": "RESTART",
    "context": "Restart MySQL server used by Hive Metastore on node3.cluster.com and HDFS client on node1.cluster.com",
    "operation_level": {
      "level": "HOST",
      "cluster_name": "c1"
    }
  },
  "Requests/resource_filters": [
    {
      "service_name": "HIVE",
      "component_name": "MYSQL_SERVER",
      "hosts": "node3.cluster.com"
    },
    {
      "service_name": "HDFS",
      "component_name": "HDFS_CLIENT",
      "hosts": "node1.cluster.com"
    }
  ]
}' http://localhost:<ambari-server-port>/api/v1/clusters/c1/requests
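If you want to confirm that the restart finished, the response to the POST above includes a request id that can be polled. A sketch, where <request-id> is a placeholder for the id returned:
curl -uadmin:admin -H 'X-Requested-By: ambari' \
  http://localhost:<ambari-server-port>/api/v1/clusters/c1/requests/<request-id>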
Reference Links:
Restarting components
Modifying configurations
Hope this helps!