When using Bamboo there is a build notifier (essentially a timer) that sends an e-mail notification when your build has been running unusually long. I've hunted around and couldn't find a similar feature for Hudson. Does Hudson have a build-hung notifier or a suitable plugin?
The Build Timeout Plugin lets you set a time-out value, and Hudson terminates the build if the time-out is exceeded. Though not exactly what you are looking for, I hope this solves your problem to some extent.
Hudson Build Timeout Plugin
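On older Hudson installations the plugin is configured per job in the UI, which the plugin records in the job's config.xml. A hedged sketch of what that looks like (element names vary by plugin version, so treat this as illustrative only):

    <!-- Illustrative only: roughly what the Build Timeout Plugin writes
         into a job's config.xml; names differ between plugin versions. -->
    <buildWrappers>
      <hudson.plugins.build__timeout.BuildTimeoutWrapper>
        <timeoutMinutes>30</timeoutMinutes>  <!-- abort the build after 30 minutes -->
        <failBuild>true</failBuild>          <!-- mark the aborted build as failed -->
      </hudson.plugins.build__timeout.BuildTimeoutWrapper>
    </buildWrappers>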
I installed a single-host OpenShift OKD v3.11 cluster on a VM running CentOS 7.8.2003.
It seems to have installed OK, except that it continually streams verbose logs to /var/log/messages, around five per second, all apparently about throttled requests. A typical log message:
Jun 13 15:49:13 centos7 journal: I0613 14:49:13.011402 1 request.go:485] Throttling request took 196.341689ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-service-cert-signer/serviceaccounts/service-serving-cert-signer-sa
The only reference I have managed to find is this discussion, but access to it seems to be available only to those with deep pockets.
https://access.redhat.com/solutions/3348921
I assume these logs are nothing to worry about, so my main question is: what is the "best"/cleanest/simplest/easiest way to stop the OpenShift cluster from filling up /var/log/messages while still logging any important messages there?
I would recommend looking at the root cause of this behaviour. These messages indicate that a lot of requests are hitting your API, typically because some application is performing calls in a tight loop. In your case, check the openshift-service-cert-signer components for warnings or an abnormal number of log messages.
If you want to get rid of the throttling messages, you can increase the queries-per-second (QPS) limit that the platform's own clients use when talking to the API server: see Recommended Practices for OKD Master Hosts (lower part).
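On OKD 3.x that setting lives in /etc/origin/master/master-config.yaml on the master host. A minimal sketch with illustrative values (restart the master API afterwards for it to take effect):

    # /etc/origin/master/master-config.yaml (excerpt, values illustrative)
    masterClients:
      openshiftLoopbackClientConnectionOverrides:
        qps: 300     # sustained client queries per second before throttling
        burst: 600   # short bursts allowed above the sustained rate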
"The only reference I have managed to find is this discussion, but access to it seems to be available only to those with deep pockets. https://access.redhat.com/solutions/3348921"
I do not understand why you say that: I can access that document with my free Red Hat account, without any subscription. Have you tried with a free account, as suggested on the site?
Simon's answer was helpful, but I've finally got to the bottom of this.
The problem was simply that the version of Docker I had installed was old. At the time of writing, the latest CentOS release is 7.8.2003, and if you install that and then simply run "yum install docker", hoping to get something reasonably recent and compatible with the rest of the Linux installation, you'll probably be making a mistake.
The right thing to do is to follow the simple steps here:
https://docs.docker.com/engine/install/centos/
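For CentOS 7 the steps on that page boil down to removing the distribution package and installing Docker CE from Docker's own repository:

    # Remove the old distribution-packaged Docker, then install Docker CE
    # from Docker's repository (per docs.docker.com/engine/install/centos/).
    sudo yum remove docker docker-client docker-common docker-engine
    sudo yum install -y yum-utils
    sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
    sudo yum install docker-ce docker-ce-cli containerd.io
    sudo systemctl enable --now docker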
The reason I found the problem was that the excessive logging from my OpenShift cluster wasn't the only issue; I had started seeing strange behaviour in other containers. A process of trial and error narrowed the issue down to the default CentOS version of Docker. Once I followed the page above, all my problems vanished, including the original one of /var/log/messages getting hammered by OpenShift containers.
The main reason I decided to answer my own question is that surely someone else is going to be as impatient/thick as me and simply install CentOS 7, then try "yum install docker", without knowing they're about to enter a world of pain.
I've built my container multiple times successfully. Each time, the build remains at 99% for more than 20 minutes after the log says 'Finished: SUCCESS', and it never gets past this. I can't kick off the deploy phase until the build registers as complete. Is there a way to get past this hang?
There are no notable errors in the console. The build is based on the registry.ng.bluemix.net/ibmnode:latest image and runs an apache2 server, with some Node.js processes that run during the build phase. Lastly, it kicks off a bash script that runs apache2 in the foreground.
I just checked my toolchain and wasn't able to reproduce this problem. Please try again; it might have been a transient issue with the toolchains.
If the problem persists, it might have to do with how your build script is set up. If you are spawning processes and leaving them running, that could be stopping the build from finishing.
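As a hypothetical illustration (all commands and file names are placeholders, not your actual script), the rule of thumb is: wait for or stop anything the build script starts in the background, and leave long-running servers to the image's start command:

    #!/bin/bash
    # Hypothetical build script - commands and file names are made up.
    set -e
    npm install && npm run build     # ordinary build-time work is fine

    node generate-assets.js &        # background work started during the build...
    wait $!                          # ...must finish (or be killed) before the
                                     # script exits, or the build can appear to
                                     # hang even after "Finished: SUCCESS"

    # Running apache2 in the foreground belongs in the container's start
    # command, not here in the build script.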
How does one keep OpenShift gears up-to-date? For example, updates to:
The Linux kernel
Important components/libraries like libc
Apache
Apache modules like mod_wsgi
Python
Python packages
Does OpenShift automatically update these and then restart the gear (or reboot the node)? Or does OpenShift send email notifications so the end user can restart the gear during a maintenance window? What is the model?
What got me thinking about this was the remote-code-execution bug in Ruby on Rails back in January, which everyone had to patch immediately.
This FAQ seems to suggest that some level of upgrading happens automatically, but it isn't clear whether this applies only to the OpenShift-specific code or also to other components like the kernel, Apache, etc.
I can tell you from my experience that changes to the OpenShift system are not always transparent. They made a change about ten days ago, and I'm still tracking down what they did so I can make my app run correctly. As far as I know, no email was sent. I did find a blog post covering some of the major changes, though not all of them. Of course, they introduced at least one bug that I know of. YMMV.
My experiences over the last few weeks have been the following:
Last week there seemed to be an unannounced reboot of the server. I detected this through logging from a custom action hook. I didn't receive any email about it, and I saw no notice at https://twitter.com/openshift_ops or https://openshift.redhat.com/app/status.
This week, with the Heartbleed OpenSSL vulnerability, it seems some gears were restarted. I didn't receive any email about it and Twitter showed nothing, but there was information on the status page.
I am calling a Perl script from my Hudson job. The script is responsible for getting data from CVS and SVN and then running an operation on that data. Over the weekend the script just didn't stop running: a particular CVS location was locked, and the script sat waiting for the locks to be released. How can I make Hudson notify me in such scenarios?
Check out the build timeout plugin.
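As a complement to the plugin, the script invocation itself can be given a hard ceiling with GNU coreutils' timeout, and the expiry turned into a notification. A sketch, where the script name, limit, and address are placeholders:

    # Kill the script after 60 minutes; `timeout` exits with status 124 on expiry.
    timeout 60m perl sync_cvs_svn.pl
    if [ $? -eq 124 ]; then
        echo "Sync script exceeded 60 minutes - possibly waiting on a CVS lock" \
            | mail -s "Hudson job timed out" you@example.com
    fi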
How can I check whether a build has been running for more than X minutes? Does Hudson have an XML API for this? I want the check to be independent of any particular job, so that I can call a URL from a remote machine and find out whether Hudson has been executing any build or job for more than X minutes.
It may be that the job Hudson is executing is stuck or hung. How can I identify that?
My idea is that if a build is taking too long, I will just restart the system or kill all of its processes.
I would call a URL, say http://myhudsonserver/something, and get back some XML from which I can work this out.
You could use the build timeout plugin.
You may find the Hudson Remote Access API of use.
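Every build exposes building and timestamp fields under .../api/xml, so a remote poller can compute the elapsed time itself. A minimal sketch, assuming anonymous read access to the Hudson instance; the job name and threshold are placeholders:

    #!/bin/sh
    JOB=myjob
    MAX_MIN=30
    URL="http://myhudsonserver/job/$JOB/lastBuild/api/xml"

    building=$(curl -s "$URL?xpath=//building/text()")
    start_ms=$(curl -s "$URL?xpath=//timestamp/text()")  # build start, epoch millis

    if [ "$building" = "true" ]; then
        elapsed_min=$(( ($(date +%s) * 1000 - start_ms) / 60000 ))
        if [ "$elapsed_min" -gt "$MAX_MIN" ]; then
            echo "$JOB has been building for $elapsed_min minutes - possibly hung"
        fi
    fi

To cover all jobs rather than one, the job list at http://myhudsonserver/api/xml can be iterated the same way.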