I have a problem with bootstrapping a runit service. On startup, the service curls an external endpoint to fetch its data. It restarts over and over, sending requests repeatedly until the data is ready on the external service.
I thought one way to reduce the number of requests would be to delay the first run of the service's script, but I could not find any way to delay a runit service. Does runit support delaying a service's first start? Or is there a better approach?
BTW, the service is set up to start at system boot.
You might try changing the runit runlevel so that your service doesn't start too soon, but that only works if the process it depends on happens to be running first. A better solution, described in the documentation, is to take advantage of the fact that runit will try to start a service again if it dies, so you can do the following in your run script:
sv start dependent_service || exit 1
# my service code
This will ensure that dependent_service is started first.
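For illustration, a complete run script built around that line might look like the following sketch; the dependency name, binary path, and sleep interval are placeholders, not part of the documented recipe:
#!/bin/sh
# Hypothetical runit run script.
# If the dependency is not up yet, pause briefly and exit non-zero;
# runit re-runs this script after it exits, so the sleep simply
# throttles how often the retry (and the curl it triggers) happens.
sv start dependent_service || { sleep 5; exit 1; }
exec /usr/local/bin/myservice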
A service is started by Upstart or systemd and I want to hot-update it. After the update, the new service process should be running, and the old process that Upstart or systemd started should be killed. Finally, the new process should be monitored by Upstart or systemd just as the old process was.
You didn't say what your service does. I am answering for the easy case, a network service with short-lived connections, such as an HTTP server.
1. Have systemd own the socket. Search for "systemd socket activation" (Stack Overflow search). I describe how to do it in Go here: https://www.darkcoding.net/software/systemd-socket-activation-in-go/ (a minimal unit sketch follows this list).
2. While the service is running, replace the binary on disk with the new one.
3. Run systemctl restart <myservice>
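To make step 1 concrete, here is a minimal sketch of the two units involved; the unit names, port, and binary path are placeholders, not a definitive setup:
# /etc/systemd/system/myservice.socket  (hypothetical)
[Socket]
ListenStream=8080

[Install]
WantedBy=sockets.target

# /etc/systemd/system/myservice.service (hypothetical)
[Service]
# systemd opens the socket and hands it to the process as fd 3,
# so restarting the service never drops the listening socket.
ExecStart=/usr/local/bin/myservice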
In practice there will often also be some state in your service you will need to persist on shutdown, and load on startup.
The service that is shutting down might also need to wait a brief amount of time for all of its in-flight requests to complete.
For the more difficult case with many long-lived TCP connections (such as an XMPP server), it's no longer about systemd: you have to have your old and new processes co-ordinate to pass the sockets from one to the other. I describe that in "Online upgrades in Go", but it's a lot more work.
This Monday the 24th, I had a problem with a container and the Secure Gateway Client in Bluemix. The container was stopped and the Secure Gateway Client was not working (it answered with error 500, but its status showed Started).
Is it possible to set up an alert for a Bluemix container, for example one that sends an email or calls an API if the container stops?
In the case of the Secure Gateway Client, I am thinking of monitoring the services exposed through the Secure Gateway by testing them every 5 minutes, but I am open to other ideas...
I can't really speak to potential container issues, but I can provide some details on how the Secure Gateway Client works. The Secure Gateway Client runs as a clustered process where the actual connective pieces are worker processes beneath a single management process. Because of this, if the worker process goes down, the container is essentially none the wiser as long as the management process is still running, as the management process is the entry point for the container.
The Secure Gateway Client supports a --service option that will cause the management process to monitor the worker count. Should the worker count reach 0, the manager will create new workers with the credentials passed on startup.
For example, starting with:
ibmcom/secure-gateway-client myGatewayID -t myGatewaySecurityToken --service
would spawn a worker that will attempt to connect to myGatewayID. Should that worker process terminate for some reason, the management process would create a new worker within 60s as a replacement.
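If you are running the client as a Docker container (as the image name above suggests), the full invocation would be roughly the following; the gateway ID and security token are placeholders:
docker run -d ibmcom/secure-gateway-client myGatewayID -t myGatewaySecurityToken --service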
I am attempting to profile an application standalone (i.e. on my own machine, as a developer). I'm not sure I'm configuring it right, but I run:
NSOLID_PROXY=0.0.0.0:0 npm run myserverlauncher
The application fires up and uses a random port for N|Solid.
Now, I want to fire up the nsolid console, and it starts but cannot find my machine. I tried:
npm start
NSOLID_PROXY=0.0.0.0:0 npm start
NSOLID_PROXY=0.0.0.0:47020 npm start (using the port given during launch)
None of these can discover my server.
Any clues on how to troubleshoot the standalone configuration?
To avoid overloading your application while profiling, you don't connect directly to the N|Solid process. We designed a Hub to gather the profiling information without adding that overhead.
You'll need an etcd server running and the N|Solid Hub. Then you point your application at the Hub using the NSOLID_HUB env var (note that NSOLID_PROXY is not the right variable).
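Purely as an illustration, once etcd and the Hub are up, launching the app might look roughly like this; localhost:4001 is a placeholder endpoint, so check the guide mentioned below for the exact value your setup uses:
NSOLID_HUB=localhost:4001 npm run myserverlauncher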
We have a very complete guide for running N|Solid in a standalone, development environment; take a look, and also check out the scripts used there to make it all work out of the box.
Feel free to reach us anytime!
I would like to know if it's possible to post-process traps and events that the Zabbix server has received from Zabbix agents. I am hoping there is some configuration option I don't know of.
Since you don't give more details, my assumption is that you want to do something when a certain event occurs, most probably a trigger firing, such as a service going down or too many open connections. These can be handled by using Zabbix's Actions to intercept the event.
The operation that follows depends on what you have to do: it can be a remote command (executed on the remote agent) or a script executed by the server.
The remote command is a straightforward concept that works out of the box; just follow the manual and the howtos.
Running something on the server isn't available directly, but you can get Zabbix to do just that with custom alert scripts, which are simply scripts launched by the server process. Then create a "send message" operation that uses your custom alert script and off you go.
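For illustration, a custom alert script is just an executable placed in the server's AlertScriptsPath directory; a minimal sketch might look like this, where the final command is a placeholder for whatever post-processing you need:
#!/bin/bash
# Zabbix calls the alert script with three arguments taken from the action:
#   $1 = send-to, $2 = subject, $3 = message
TO="$1"
SUBJECT="$2"
MESSAGE="$3"

# Post-process the event here, e.g. hand it off to another system.
logger -t zabbix-postprocess "to=$TO subject=$SUBJECT message=$MESSAGE"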
I have several sites, with 4 to 5 servers at each location. Each location has one monitoring server running Nagios. Now I want to create a central location and combine all the Nagios services running at each location. Can anyone please point me to some documentation for this type of job?
There are two approaches that you can take.
1. Install a new Nagios core at the central location, as you did at each site, and perform active checks on each of the remote hosts. You'll likely end up installing NRPE on each of the remote hosts and can read this document for the details: http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf. If your remote servers are Windows servers, you can use NSClient to do much of the same things that NRPE does for Linux hosts. This effectively centralizes your monitoring (an example check_nrpe invocation is shown after these two options). I also wrote some how-to style entries on using NRPE to run privileged commands (http://blog.gnucom.cc/?p=479) and to run event handlers (http://blog.gnucom.cc/?p=458). If you get tired of installing NRPE by hand, you can use my script here: http://blog.gnucom.cc/?p=185. I also have instructions for installing NSClient here: http://blog.gnucom.cc/?p=201.
2. Install a new Nagios core at the central location and perform passive checks by instructing the remote Nagios cores to feed their results into the new central core's external command file. I haven't done this myself, so I'm going to point you to the community's documentation here: http://nagios.sourceforge.net/docs/2_0/passivechecks.html. You could probably look at my event handler post to set up event handlers that send checks to the main server.
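For reference, here is roughly what each approach looks like from the command line; the host name, service description, and paths are placeholders for your own setup:
# Option 1: an active check, run by the central core against a remote NRPE agent.
/usr/local/nagios/libexec/check_nrpe -H remote-host-01 -c check_disk

# Option 2: a passive check, submitted by a remote core (or script) as an
# external command written into the central core's command file.
printf '[%s] PROCESS_SERVICE_CHECK_RESULT;remote-host-01;Disk Usage;0;OK - disk healthy\n' "$(date +%s)" \
  > /usr/local/nagios/var/rw/nagios.cmd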
From my personal experience, the first option is easier to implement and far easier to administer. However, as your server fleet grows you'll start seeing major CPU bottlenecks on the central Nagios core. This is where passive checks become beneficial, as the central core simply waits for check results to be sent to it rather than having to run the checks itself.
Hope this helps. :)
A centralized view tool may be what you are looking for. There are a number of different options available.
Nagiosfusion
MK Livestatus
Nagcen
Thruk