Hot update a service started by Upstart or systemd - auto-update

I have a service that is started by Upstart or systemd, and I want to hot update it. After the update, the new service process should be running and the old process that was started by Upstart or systemd should be killed. Finally, the new process should be monitored by Upstart or systemd just as the old process was.

You didn't say what your service does. I am answering for the easy case, a network service with short-lived connections, such as an HTTP server.
Have systemd own the socket. Search for "systemd socket activation" (stackoverflow search). I describe how to do it in Go here: https://www.darkcoding.net/software/systemd-socket-activation-in-go/
While the service is running, replace the binary on disk with the new one.
systemctl restart <myservice>
In practice there will often also be some state in your service you will need to persist on shutdown, and load on startup.
The service that is shutting down might need to wait a brief amount of time until all its requests complete.
For the more difficult case with many long-lived TCP connections (such as an XMPP server), it's no longer about systemd: you have to have your old and new processes coordinate to pass the sockets from one to the other. I describe it in Online upgrades in Go, but it's a lot more work.
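For the easy case, the steps above can be sketched in Go with only the standard library. This is a minimal sketch, not the code from the linked article: it assumes systemd passes exactly one socket, and the names `inheritedFDCount` and `listener`, the fallback address, and the 10-second grace period are all made up for illustration. It reads the `LISTEN_PID` and `LISTEN_FDS` variables that systemd sets for socket-activated services, and it drains in-flight requests on SIGTERM, which is the signal `systemctl restart` sends by default.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"os"
	"os/signal"
	"strconv"
	"syscall"
	"time"
)

// firstInheritedFD is where systemd starts numbering the sockets it passes.
const firstInheritedFD = 3

// inheritedFDCount reports how many sockets systemd passed to this process,
// given the values of LISTEN_PID and LISTEN_FDS. It returns 0 when the
// variables are absent, malformed, or addressed to another process.
func inheritedFDCount(listenPID, listenFDs string, pid int) int {
	p, err := strconv.Atoi(listenPID)
	if err != nil || p != pid {
		return 0
	}
	n, err := strconv.Atoi(listenFDs)
	if err != nil || n < 0 {
		return 0
	}
	return n
}

// listener returns the systemd-owned socket when there is one. Otherwise it
// opens fallbackAddr itself, so the binary still runs outside systemd.
func listener(fallbackAddr string) (net.Listener, error) {
	if inheritedFDCount(os.Getenv("LISTEN_PID"), os.Getenv("LISTEN_FDS"), os.Getpid()) > 0 {
		f := os.NewFile(firstInheritedFD, "systemd-socket")
		defer f.Close() // net.FileListener returns an independent copy
		return net.FileListener(f)
	}
	return net.Listen("tcp", fallbackAddr)
}

func main() {
	ln, err := listener("127.0.0.1:8080") // hypothetical fallback address
	if err != nil {
		fmt.Fprintln(os.Stderr, "listen:", err)
		os.Exit(1)
	}
	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello")
	})}

	// Cancel ctx when the service is asked to stop.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, os.Interrupt)
	defer stop()

	drained := make(chan struct{})
	go func() {
		defer close(drained)
		<-ctx.Done()
		// Assumed grace period for in-flight requests to finish.
		grace, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel()
		if err := srv.Shutdown(grace); err != nil {
			fmt.Fprintln(os.Stderr, "shutdown:", err)
		}
	}()

	// Serve returns http.ErrServerClosed as soon as Shutdown is called.
	if err := srv.Serve(ln); !errors.Is(err, http.ErrServerClosed) {
		fmt.Fprintln(os.Stderr, "serve:", err)
		os.Exit(1)
	}
	<-drained // wait until in-flight requests have completed
}
```

Because systemd keeps the listening socket open across the restart, connections that arrive while the old process drains should wait in the socket's backlog until the new process starts accepting.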

Related

Live changing worker queue in Apache Airflow

In Apache Airflow, I can specify the worker queue upon starting the worker.
I have a use case where I would like to change the queue the worker is using live so that the existing worker will pull new jobs from that queue.
Is this possible?
As far as I can tell, no. You can launch a worker with airflow worker -q <queue>, but there doesn't seem to be a way to kill that worker, or interact with it at all once it's spawned.

How does Google Compute Engine decide what instances to shut down when autoscaling?

I'm creating a managed instance group with autoscaling in GCE. When a lot of work is queued up new instances will be created which start doing work.
Let's say each chunk of work takes 10 minutes, could it happen that GCE decides to shut down an instance that still has work in progress?
The autoscaler will terminate an instance immediately when its health check condition is met.
However, you can use a shutdown script to control the termination. A shutdown script will run, on a best-effort basis, in the brief period between when the termination request is made and when the instance is actually terminated. During this period, Compute Engine will attempt to run your shutdown script to perform any tasks you provide in the script. You can read more about the autoscaler decision in this document. You can read about using a shutdown script and its limitations at this link.
Also, if these instances serve a backend service, it is a good idea to enable connection draining. You can enable connection draining on backend services to ensure minimal interruption to your users when an instance is deleted automatically by an autoscaler or manually removed from an instance group. You can find more about enabling connection draining at this link.

Does runit support delaying the first start of a service?

I ran into a problem with bootstrapping a runit service. On startup the service curls an external service endpoint to get its data. Until the external service has that data ready, my service restarts over and over and keeps sending requests.
So I thought one remedy to reduce the number of requests would be to delay the first run of the service's script. But I could not find any way to delay a runit service. Does runit support delaying the first start of a service? Or is there a better solution?
BTW, the service is set up to start at system boot.
You might try changing the runlevel of runit so that it doesn't start too soon, but that depends on the dependent process running first. A better solution, described in the documentation, is to use the fact that runit will try to start a service again if it dies, so you can do the following in your run script:
sv start dependent_service || exit 1
# my service code
This will ensure that dependent_service is started first.
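If the dependency is an external endpoint rather than a local runit service, another way to reduce the number of requests is for the service itself to wait with increasing delays instead of dying and being restarted at once. This is a different technique from the `sv start` check above, shown here as a minimal Go sketch. The function names, the URL, and the timing values are all hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"os"
	"time"
)

// backoffDelay returns the pause before retry number attempt (0-based):
// base doubled once per attempt, capped at max.
func backoffDelay(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < max; i++ {
		d *= 2
	}
	if d > max {
		d = max
	}
	return d
}

// waitUntilReady polls url until it answers 200 OK or ctx is done.
func waitUntilReady(ctx context.Context, client *http.Client, url string, base, max time.Duration) error {
	for attempt := 0; ; attempt++ {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return err
		}
		resp, err := client.Do(req)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
		}
		// Not ready yet: sleep before the next request, unless ctx ends first.
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoffDelay(attempt, base, max)):
		}
	}
}

func main() {
	// Give up after an assumed overall limit of 10 minutes.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
	defer cancel()
	client := &http.Client{Timeout: 5 * time.Second}

	// Hypothetical endpoint; replace with the real external service.
	err := waitUntilReady(ctx, client, "http://localhost:9000/ready", time.Second, 30*time.Second)
	if err != nil {
		fmt.Fprintln(os.Stderr, "dependency never became ready:", err)
		os.Exit(1) // a non-zero exit lets runit restart the service
	}
	fmt.Println("dependency ready, starting real work")
}
```

With a base of one second and a cap of 30 seconds, the pauses grow as 1s, 2s, 4s, 8s, 16s and then stay at 30s, so a slow dependency sees a couple of requests per minute rather than one per restart.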

Alerts for containers in Bluemix

This Monday the 24th, I had a problem with a container and the Secure Gateway Client in Bluemix. The container was stopped and the SecureGatewayClient was inhibited (it answered with error 500 but its status showed Started).
Is it possible to set up an alert for a Bluemix container, for example one that sends an email or calls an API if the container stops?
For the SecureGatewayClient, I am thinking of monitoring the services through the Secure Gateway by testing them every 5 minutes, but I am open to other ideas.
I can't really speak to potential container issues, but I can provide some details on how the Secure Gateway Client works. The Secure Gateway Client runs as a clustered process where the actual connective pieces are worker processes beneath a single management process. Because of this, if the worker process goes down, the container is essentially none the wiser as long as the management process is still running, as the management process is the entry point for the container.
The Secure Gateway Client supports a --service option that will cause the management process to monitor the worker count. Should the worker count reach 0, the manager will create new workers with the credentials passed on startup.
For example, starting with:
ibmcom/secure-gateway-client myGatewayID -t myGatewaySecurityToken --service
would spawn a worker that will attempt to connect to myGatewayID. Should that worker process terminate for some reason, the management process would create a new worker within 60s as a replacement.
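The periodic test mentioned in the question could look roughly like the following Go sketch. It is not part of Bluemix or the Secure Gateway Client: the target URL, the `healthy` and `probe` helpers, and the alert (a line on stderr standing in for an email or API call) are assumptions for illustration. Only 2xx responses count as healthy, so the error 500 described in the question would raise an alert.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// healthy reports whether an HTTP status code counts as a working service.
// Accepting only 2xx is an assumption; adjust it to the service's contract.
func healthy(status int) bool {
	return status >= 200 && status < 300
}

// probe performs one GET and returns an error describing any failure.
func probe(client *http.Client, url string) error {
	resp, err := client.Get(url)
	if err != nil {
		return fmt.Errorf("request failed: %w", err)
	}
	defer resp.Body.Close()
	if !healthy(resp.StatusCode) {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}

func main() {
	// Hypothetical URL of a service reached through the gateway.
	const target = "http://localhost:8080/health"
	client := &http.Client{Timeout: 10 * time.Second}

	// Probe once at startup, then every 5 minutes as in the question.
	ticker := time.NewTicker(5 * time.Minute)
	defer ticker.Stop()
	for {
		if err := probe(client, target); err != nil {
			// Placeholder alert: swap in an email or API call here.
			fmt.Fprintln(os.Stderr, "ALERT:", err)
		}
		<-ticker.C
	}
}
```

Running the probe from outside the container matters here, because a check inside it would stop together with the container.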

Google Compute Engine - Where is the STOPPED instance status?

Yesterday I tried to delete an instance by invoking the "halt" command through SSH. Unlike AWS, GCE does not let us choose the VM's shutdown behavior, and it stops the instance by default (the instance status is TERMINATED).
Today I was browsing the Google Compute Engine REST API documentation and I found the following description:
status : [Output Only] The status of the instance. One of the following values: PROVISIONING, STAGING, RUNNING, STOPPING, STOPPED, TERMINATED.
What is this "STOPPED" status? Instances stopped through the web console and instances stopped with the "halt" command both have the "TERMINATED" status.
Any ideas?
This STOPPED state is a new feature added a few weeks ago, which you can reach via the Compute Engine API.
This method stops a running instance, shutting it down cleanly, and allows you to restart the instance at a later time. Stopped instances do not incur per-minute, virtual machine usage charges while they are stopped, but any resources that the virtual machine is using, such as persistent disks and static IP addresses, will continue to be charged until they are deleted. For more information, see Stopping an instance.
I think this is similar to the AWS option you mention.
For anyone stumbling on this question years later, a detailed lifecycle diagram of instances can be found here.
There is no STOPPED status anymore. Instances go from STOPPING to TERMINATED, whatever the stopping method is.
However, a new state that may be closer to what halt does has been introduced since: SUSPENDED. It is still in beta though, and I am not sure whether invoking halt would induce this state or simply terminate the instance.
See here for more details.