Is there a time limit for the startup script to finish before it stops? - google-compute-engine

I need to create a VM instance in Google Compute Engine with a startup script that takes 30 minutes, but it never finishes; it stops around 10 minutes after the instance boots. Is there a timeout? Is there another alternative to accomplish what I need to do? Thanks!

Given the additional clarification in the comments:
My script downloads another script and then executes it, and what that script does is download some big files, and then compute some values based on latitude/longitude. Then, when the process is finished, the VM is destroyed.
My recommendation would be to run the large download and processing asynchronously rather than synchronously. The reason is that when it runs synchronously, it is part of the VM startup (in the critical path), and the VM monitoring infrastructure notices that the VM is not completing its startup phase within a reasonable amount of time and terminates it.
Instead, take the heavy-duty processing out of the critical path and do it in the background, i.e., asynchronously.
In other words, the startup script currently probably looks like:
# Download the external script
curl [...] -o /tmp/script.sh
# Run the file download, computation, etc. and shut down the VM.
/tmp/script.sh
I would suggest converting this to:
# Download the external script
curl [...] -o /tmp/script.sh
chmod +x /tmp/script.sh
# Run the file download, computation, etc. and shut down the VM,
# this time in the background (output goes to a log instead of nohup.out).
nohup /tmp/script.sh > /tmp/script.log 2>&1 &
What this does is start the heavy processing in the background, and also disconnect it from the parent process so that it is not automatically terminated when the parent process (the actual startup script) exits. We want the main startup script to terminate quickly so that the entire VM startup phase is marked as complete.
For more info, see the Wikipedia page on nohup.
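To make this concrete, here is a minimal sketch of what the downloaded /tmp/script.sh itself could look like. Everything in it is illustrative: the bucket paths and the compute step are placeholders, and the gsutil/gcloud calls assume those tools are installed and that the instance's service account is allowed to read the bucket and delete instances.

#!/bin/bash
# Hypothetical worker script; paths and the compute step are placeholders.

# Download the big input files. This can now take as long as it needs,
# because the VM startup phase has already been marked complete.
gsutil cp gs://my-bucket/bigdata.tar.gz /tmp/
tar -xzf /tmp/bigdata.tar.gz -C /tmp

# Compute values based on latitude/longitude (placeholder command).
/tmp/compute-latlong --input /tmp/bigdata --output /tmp/results.csv

# Persist the results somewhere durable before destroying the VM.
gsutil cp /tmp/results.csv gs://my-bucket/results/

# Delete this instance; name and zone are read from the metadata server.
NAME=$(curl -s -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/name)
ZONE=$(curl -s -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/zone | awk -F/ '{print $NF}')
gcloud compute instances delete "$NAME" --zone "$ZONE" --quiet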

Related

gUnicorn with systemd Watchdog

We have a requirement to monitor and try to restart our gUnicorn/Django app if it goes down. We're using gunicorn 20.0.4.
I have the following nrs.service running fine with systemd. I'm trying to figure out if it's possible to integrate systemd's watchdog capabilities with gUnicorn. Looking through the source, I don't see sd_notify("WATCHDOG=1") being called anywhere, so I'm thinking that no, gunicorn doesn't know how to keep systemd aware that it's up. (It calls sd_notify("READY=1...") at startup, but in its run loop there's no signal being sent saying it's still running.)
Here's the nrs.service file. I have commented out the watchdog vars because enabling them sends my service into a failed state shortly after it starts.
[Unit]
Description=Gunicorn instance to serve NRS project
After=network.target
[Service]
WorkingDirectory=/etc/nrs
Environment="PATH=/etc/nrs/bin"
ExecStart=/etc/nrs/bin/gunicorn --error-logfile /etc/nrs/logs/gunicorn_error.log --certfile=/etc/httpd/https_certificate/nrs.cer --keyfile=/etc/httpd/https_certificate/server.key --access-logfile /etc/nrs/logs/gunicorn_access.log --capture-output --bind=nrshost:8800 anomalyalerts.wsgi
#WatchdogSec=15s
#Restart=on-failure
#StartLimitInterval=1min
#StartLimitBurst=4
[Install]
WantedBy=multi-user.target
So the systemd watchdog is doing its thing; it just looks like gunicorn doesn't support it out of the box. I'm not very familiar with 'monkey-patching', but I'm thinking that if we want to use this method of monitoring, I'm going to have to implement some custom code? My other thought was just to have a watch command check the service and try to restart it, which might be easier.
Thanks
Jason
monitor and try to restart our gUnicorn/Django app if it goes down
systemd's watchdog will not help in the described case. The reason is that the watchdog is intended to monitor the main service process, which does not run your app directly.
Gunicorn's master process, which is the main service process from systemd's perspective, is a loop that manages the worker processes. Your app runs inside the worker processes, so if anything goes wrong there, the worker process is the one that should be restarted, not the master process.
Restarting worker processes is handled by Gunicorn automatically (see the timeout setting). As for the main service process, in the rare case that it dies, the Restart=on-failure option can restart it even without a watchdog (see the docs for details on how it behaves).
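If you do want the main process restarted automatically, a minimal sketch of the relevant unit-file changes is below (on newer systemd versions the start rate-limit settings belong in [Unit] and are named StartLimitIntervalSec/StartLimitBurst; the values are just the ones from your commented-out lines):

[Unit]
Description=Gunicorn instance to serve NRS project
After=network.target
StartLimitIntervalSec=1min
StartLimitBurst=4
[Service]
# ExecStart, WorkingDirectory, etc. as in the unit above
Restart=on-failure
RestartSec=5s
[Install]
WantedBy=multi-user.target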

What does the command line arguments for PM2 startup mean precisely

I am a little confused about startup scripts and the command line options. I am building a small Raspberry Pi based server for my node applications. In order to provide maximum protection against power failures and flash write corruption, the root file system is read only, and that includes the home directory of my main user, where the production versions of my apps (two of them) are stored. Because the .pm2 directory there is no good for logs etc., I currently set the PM2_HOME environment variable to a place in /var (which has 512kb of unused space around it to protect writes to it). The ecosystem.json file reads this environment variable too, to determine where to place its logs.
In case I need it, I also have a secondary user with a read/write home directory in another partition (also protected by buffer space around it). This contains development versions of my application code, which, because of the convenience of setting up environments etc., I also want to monitor with PM2. If I need to investigate a problem I can log in as that user and run and test the application there.
Since this is a headless box, with watchdog and kernel panic restarts built in, I want pm2 to start during boot and at minimum restart the two production apps. Ideally it should also start the two development versions of the app, but I can live without that if it's impossible.
I can switch the read only root partition to read/write - indeed it does so automatically when I ssh into my production user account. It switches back to read only automatically when I log out.
So I went to this account to try and create a startup script. It then said (unsurprisingly) that I had to run a sudo command like so:
sudo su -c "env PATH=$PATH:/usr/local/bin pm2 startup ubuntu -u pi --hp /home/pi"
The key issue for me here is the --hp switch. I went searching for some clue as to what it means. It's clearly a home directory, but it doesn't match PM2_HOME - which is set to /var/pas in my case to take it out of the read only area. I don't want to spray my home directory with files that shouldn't be there, so I am asking for some guidance here.
I found out by experiment what it does with an "ubuntu" startup script: it uses the --hp value to set PM2_HOME in the generated script by appending "/.pm2" to it.
However, there is nothing stopping you from editing the script once it has been created and setting PM2_HOME to whatever you want.
So effectively it's a helper for the script, but only that and nothing more special.
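In other words (a sketch only; the generated file's name and exact contents vary by platform and PM2 version), the init script that pm2 startup writes contains a line along the lines of:

export PM2_HOME="/home/pi/.pm2"   # i.e. the --hp value with "/.pm2" appended

and after it has been generated you can simply change that line to point at your writable area:

export PM2_HOME="/var/pas"   # or wherever you keep your PM2 state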

How to kill a zombie process which is always initiated whenever Geany starts

I am using Geany to edit a large text file (600MB or so) in Ubuntu. But after a while, a zombie process appears whenever I start Geany, and Geany cannot load the file for me to edit its content. It takes 100% of my CPU while Geany runs. I try to kill the zombie process with the following:
kill -HUP `ps -A -ostat,ppid,pid,cmd | grep -e '^[Zz]' | awk '{print $2}'`
But once I start the application again, the zombie process starts automatically. I also tried logging out.
What can I do to kill the zombie process once and for all? Thanks!
You can't kill a zombie process since it's already dead.
On Unix and Unix-like computer operating systems, a zombie process or
defunct process is a process that has completed execution (via the
exit system call) but still has an entry in the process table: it is a
process in the "Terminated state".
(from Wikipedia)
It's simply an entry in the process table with no associated process. It exists because the spawning (parent) process has yet to collect the return status (via wait()). Other than that it will consume no resources.
So I suspect the parent process is either busy or not working properly. I would first of all try to identify that process (via the PPID column in ps, for example).
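For example (a generic sketch, not specific to Geany):

# List zombies together with their parent's PID and command name:
ps -eo stat,pid,ppid,comm | awk '$1 ~ /^[Zz]/'
# Then see what the parent is doing (replace 1234 with the PPID printed above):
ps -p 1234 -o pid,stat,comm,args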
EDIT: I note there's a Geany issue raised/resolved around this.

SSIS Execute Process task ability to kill children?

We are using SQL 2005, and the bundled SSIS.
An Execute Process task is running a standard Windows .BAT batch file.
Inside that batch file, a Java process may be started with something like:
%javapath%\java.exe -cp %classpath% com.mycompany.ToDo
We put a TimeOut value in the task, expecting it to kill the entire task if the job ran too long.
It does appear to terminate the batch file, but not the child Java program.
Options, or ways to kill the entire process tree?
If you are willing to write some code, maybe you could make use of these:
process tree
or
Kill process tree
If you do find a solution via the first link, please vote up both the question and the answer you used.
Note that the code can be utilized from a Script Task, or you can build an executable program and start it from the Execute Process Task.
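For example, if you can get hold of the PID of the batch file or the Java process, the stock taskkill utility can take down the whole tree (shown in batch to match the script above; %PID% is a placeholder):

rem /T terminates the process and any children; /F forces it.
taskkill /PID %PID% /T /F
rem Or, more bluntly, by image name:
taskkill /IM java.exe /T /F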

On a Hudson master node, what are the .tmp files created in the workspace-files folder?

Question:
In the path HUDSON_HOME/jobs/<jobname>/builds/<timestamp>/workspace-files, there are a series of .tmp files. What are these files, and what feature of Hudson do they support?
Background
Using Hudson version 1.341, we have a continuous build task that runs on a slave instance. After the build is otherwise complete, including archiving the artifacts, task scanner, etc., the job appears to hang for a long period of time. In monitoring the master node, I noted that many .tmp files were being created and modified under builds/<timestamp>/workspace-files, and that some of them were very large. This appears to be causing the delay, as the job completed at the same time that files in this path stopped changing.
Some key configuration points of the job:
It is tied to a specific slave node
It builds in a 'custom workspace'
It runs the Task Scanner plugin on a portion of the workspace to find "todo" items
It triggers a downstream job that builds in the same custom workspace on the same slave node
In this particular instance, the .tmp files were being created by the Task Scanner plugin. When tasks are found, the files in which they are found are copied back to the master node. This allows the master node to serve those files in the browser interface for Tasks.
Per this answer, it is likely that this same thing occurs with other plug-ins, too.
Plug-ins known to exhibit this behavior (feel free to add to this list)
Task Scanner
Warnings
FindBugs
There's an explanation on the hudson users mailing list:
...it looks like the warnings plugin copies any files that have compiler warnings from the workspace (possibly on a slave) into a "workspace-files" directory within HUDSON_HOME/jobs/<jobname>/builds/<timestamp>/
The files then, I surmise, get processed, resulting in a "compiler-warnings.xml" file within HUDSON_HOME/jobs/<jobname>/builds/<timestamp>/
I am using the "warnings" plugin, and I suspect it's related to that.
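If you want to confirm which job and which copied files are responsible for the slow-down, a quick check on the master node (assuming the directory layout described above) is:

# Show the largest workspace-files directories, then the largest files inside them:
du -sh "$HUDSON_HOME"/jobs/*/builds/*/workspace-files | sort -rh | head
ls -lhS "$HUDSON_HOME"/jobs/*/builds/*/workspace-files | head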