Detect when instance has completed setup script? - google-compute-engine

I'm launching instances using the following command:
gcutil addinstance \
--image=debian-7 \
--persistent_boot_disk \
--zone=us-central1-a \
--machine_type=n1-standard-1 \
--metadata_from_file=startup-script:install.sh \
instance-name
How can I detect when this instance has completed it's install script? I'd like to be able to place this launch command in a larger provisioning script that then goes on to issue commands to the server that depend on the install script having been successfully completed.

There is a number of ways: sending yourself an email, uploading to Cloud Storage, sending a jabber message, ...
One simple, observable way IMHO is to add a logger entry at the end of your install.sh script (I also tweak the beginning for symmetry). Something like:
#!/bin/bash
/usr/bin/logger "== Startup script START =="
#
# Your code goes here
#
/usr/bin/logger "== Startup script END =="
You can check then if the script started or ended in two ways:
From your Developer's Console, select "Projects" > "Compute" > "VM Instances" > your instance > "Serial console" > "View Output".
From CLI, by issuing a gcutil getserialportoutput instance-name.

I don't know of a way to do all of this within gcutil addinstance.
I'd suggest:
Adding the instance via gcutil addinstance, making sure to use the --wait_until_running flag to ensure that the instance is running before you continue
Copying your script over to the instance via something like gcutil push
Using gcutil ssh <instance-name> </path-to-script/script-to-run> to run your script manually.
This way, you can write your script in such a way that it blocks until it's finished, and the ssh command will not return until your script on the remote machine is done executing.

There really are a lot of ways to accomplish this goal. One that tickles my fancy is to use the metadata server associated with the instance. Have the startup script set a piece of metadata to "FINISHED" when the script is done. You can query the metadata server with a hanging GET that will only return when the metadata updates. Just use gcutil setmetadata
from within the script as the last command.
I like this method because the hanging GET just gives you one command to run, rather than a poll to run in a loop, and it doesn't involve any services besides Compute Engine.

One more hacky way:
startup_script_finished=false
while [[ "$startup_script_finished" = false ]]; do
pid=$(gcloud compute ssh $GCLOUD_USER#$GCLOUD_INSTANCE -- pgrep -f "\"/usr/bin/python /usr/bin/google_metadata_script_runner --script-type startup\"")
if [[ -z $pid ]]; then
startup_script_finished=true
else
sleep 2
fi
done

One possible solution would be to have your install script create a text file in a cloud storage bucket, as the last thing it does, using the host name as the filename.
Your main script that did the original gcutil addinstance command could then be periodically polling the contents of the bucket (using gsutil ls) until it sees a file with a matching name and then it would know the install had completed on that instance.

Related

Deploying an application with database inside mysql container inside docker [duplicate]

I'm trying to wrap my head around Docker from the point of deploying an application which is intended to run on the users on desktop. My application is simply a flask web application and mongo database. Normally I would install both in a VM and, forward a host port to the guest web app. I'd like to give Docker a try but I'm not sure how I'm meant to use more than one program. The documentations says there can only be only ENTRYPOINT so how can I have Mongo and my flask application. Or do they need to be in separate containers, in which case how do they talk to each other and how does this make distributing the app easy?
There can be only one ENTRYPOINT, but that target is usually a script that launches as many programs that are needed. You can additionally use for example Supervisord or similar to take care of launching multiple services inside single container. This is an example of a docker container running mysql, apache and wordpress within a single container.
Say, You have one database that is used by a single web application. Then it is probably easier to run both in a single container.
If You have a shared database that is used by more than one application, then it would be better to run the database in its own container and the applications each in their own containers.
There are at least two possibilities how the applications can communicate with each other when they are running in different containers:
Use exposed IP ports and connect via them.
Recent docker versions support linking.
I strongly disagree with some previous solutions that recommended to run both services in the same container. It's clearly stated in the documentation that it's not a recommended:
It is generally recommended that you separate areas of concern by using one service per container. That service may fork into multiple processes (for example, Apache web server starts multiple worker processes). It’s ok to have multiple processes, but to get the most benefit out of Docker, avoid one container being responsible for multiple aspects of your overall application. You can connect multiple containers using user-defined networks and shared volumes.
There are good use cases for supervisord or similar programs but running a web application + database is not part of them.
You should definitely use docker-compose to do that and orchestrate multiple containers with different responsibilities.
I had similar requirement of running a LAMP stack, Mongo DB and my own services
Docker is OS based virtualisation, which is why it isolates its container around a running process, hence it requires least one process running in FOREGROUND.
So you provide your own startup script as the entry point, thus your startup script becomes an extended Docker image script, in which you can stack any number of the services as far as AT LEAST ONE FOREGROUND SERVICE IS STARTED, WHICH TOO TOWARDS THE END
So my Docker image file has two line below in the very end:
COPY myStartupScript.sh /usr/local/myscripts/myStartupScript.sh
CMD ["/bin/bash", "/usr/local/myscripts/myStartupScript.sh"]
In my script I run all MySQL, MongoDB, Tomcat etc. In the end I run my Apache as a foreground thread.
source /etc/apache2/envvars
/usr/sbin/apache2 -DFOREGROUND
This enables me to start all my services and keep the container alive with the last service started being in the foreground
Hope it helps
UPDATE: Since I last answered this question, new things have come up like Docker compose, which can help you run each service on its own container, yet bind all of them together as dependencies among those services, try knowing more about docker-compose and use it, it is more elegant way unless your need does not match with it.
Although it's not recommended you can run 2 processes in foreground by using wait. Just make a bash script with the following content. Eg start.sh:
# runs 2 commands simultaneously:
mongod & # your first application
P1=$!
python script.py & # your second application
P2=$!
wait $P1 $P2
In your Dockerfile, start it with
CMD bash start.sh
I would recommend to set up a local Kubernetes cluster if you want to run multiple processes simultaneously. You can 'distribute' the app by providing them a simple Kubernetes manifest.
They can be in separate containers, and indeed, if the application was also intended to run in a larger environment, they probably would be.
A multi-container system would require some more orchestration to be able to bring up all the required dependencies, though in Docker v0.6.5+, there is a new facility to help with that built into Docker itself - Linking. With a multi-machine solution, its still something that has to be arranged from outside the Docker environment however.
With two different containers, the two parts still communicate over TCP/IP, but unless the ports have been locked down specifically (not recommended, as you'd be unable to run more than one copy), you would have to pass the new port that the database has been exposed as to the application, so that it could communicate with Mongo. This is again, something that Linking can help with.
For a simpler, small installation, where all the dependencies are going in the same container, having both the database and Python runtime started by the program that is initially called as the ENTRYPOINT is also possible. This can be as simple as a shell script, or some other process controller - Supervisord is quite popular, and a number of examples exist in the public Dockerfiles.
Docker provides a couple of examples on how to do it. The lightweight option is to:
Put all of your commands in a wrapper script, complete with testing
and debugging information. Run the wrapper script as your CMD. This is
a very naive example. First, the wrapper script:
#!/bin/bash
# Start the first process
./my_first_process -D
status=$?
if [ $status -ne 0 ]; then
echo "Failed to start my_first_process: $status"
exit $status
fi
# Start the second process
./my_second_process -D
status=$?
if [ $status -ne 0 ]; then
echo "Failed to start my_second_process: $status"
exit $status
fi
# Naive check runs checks once a minute to see if either of the processes exited.
# This illustrates part of the heavy lifting you need to do if you want to run
# more than one service in a container. The container will exit with an error
# if it detects that either of the processes has exited.
# Otherwise it will loop forever, waking up every 60 seconds
while /bin/true; do
ps aux |grep my_first_process |grep -q -v grep
PROCESS_1_STATUS=$?
ps aux |grep my_second_process |grep -q -v grep
PROCESS_2_STATUS=$?
# If the greps above find anything, they will exit with 0 status
# If they are not both 0, then something is wrong
if [ $PROCESS_1_STATUS -ne 0 -o $PROCESS_2_STATUS -ne 0 ]; then
echo "One of the processes has already exited."
exit -1
fi
sleep 60
done
Next, the Dockerfile:
FROM ubuntu:latest
COPY my_first_process my_first_process
COPY my_second_process my_second_process
COPY my_wrapper_script.sh my_wrapper_script.sh
CMD ./my_wrapper_script.sh
I agree with the other answers that using two containers is preferable, but if you have your heart set on bunding multiple services in a single container you can use something like supervisord.
in Hipache for instance, the included Dockerfile runs supervisord, and the file supervisord.conf specifies for both hipache and redis-server to be run.
If a dedicated script seems like too much overhead, you can spawn separate processes explicitly with sh -c. For example:
CMD sh -c 'mini_httpd -C /my/config -D &' \
&& ./content_computing_loop
In docker, there are two ways you can run a program
CMD
ENTRYPOINT
If you want to know the difference between them, please refer here
In CMD/ENTRYPOINT, there are two formats to run a command
SHELL format
EXEC format
SHELL format:
CMD executable_first arg1; executable_second arg1 arg2
ENTRYPOINT executable_first arg1; executable_second arg1 arg2
This version will create a shell and executes above command. Here you can use any shell syntax such as ";", "&", "|", etc. So you can run any number of commands here. If you have complex set of commands to run, you can create separate shell script and use it.
CMD my_script.sh arg1
ENTRYPOINT my_script.sh arg1
EXEC format:
CMD ["executable", "parameter 1", "parameter 2", …]
ENTRYPOINT ["executable", "parameter 1", "parameter 2", …]
Here you can notice that only first parameter is an executable. From the second parameter, everything become an arguments/parameters for that executable.
To run multiple commands in EXEC format
CMD ["/bin/sh", "-c", "executable_first arg1; executable_second"]
CMD ["/bin/sh", "-c", "executable_first arg1; executable_second"]
In above command, we have used shell command as executable to run the command. This is the only way to run multiple commands in EXEC format.
Following are WRONG
CMD ["executable_first parameter", "executable_second parameter"]
ENTRYPOINT ["executable_first parameter", "executable_second parameter"]
CMD ["executable_first", "parameter", ";", "executable_second", "parameter"]
ENTRYPOINT ["executable_first", "parameter", ";", "executable_second", "parameter"]
Can I run multiple programs in a Docker container?
Yes. But with significant risks.
Below is the same answer as above. But with details and a recommended resolution. If you're interested in those.
Not Recommended
Warning. Using the same container for multiple services is not recommended by the Docker community, though. The Docker documentation reads: "It is generally recommended that you separate areas of concern by using one service per container." Source at:
• https://archive.ph/3Roa6#selection-307.2-307.100
• https://docs.docker.com/config/containers/multi-service_container/
If you choose to ignore the recommendation above, you container risk to be with weaker security, increasingly unstable, and in the future a painful growth.
If you are ok with those risks above, the documentation to use one container for multiple services is at:
• https://archive.ph/3Roa6#selection-335.0-691.1
• https://docs.docker.com/config/containers/multi-service_container/
Recommended
If you need a container(s) with stronger security, and more stability, and in the future, scale bigger, as well as better performance, then the Docker community recommends those two steps:
Use one service per Docker container. The end result is that you will have multiple containers.
Use this Docker "Networking" feature to connect any of those containers to your liking.

Create a new google cloud instance using shut-down script

I am trying to use a shutdown-script to create a new instance from within the the instance that is shutting down now.
The script has three tasks,
1. creates an empty file
2. get the name of the new instance to be created
3. generates a name for the next new instance to be spawned
4. creates a new instance from within this instance with the name generated.
Here is the script:
#!/bin/bash
touch /home/ubuntu/newfile.txt
new_instance_name=$(curl http://metadata.google.internal/computeMetadata/v1/instance/attributes/next_instance_name -H "Metadata-Flavor: Google")
next_instance_name="instance-"$(printf "%04d" $((${new_instance_name: -4}+1)))
gcloud beta compute --project=xxxxxxxxx instances create $new_instance_name --zone=us-central1-c --machine-type=f1-micro --subnet=default --network-tier=PREMIUM --metadata=next_instance_name=$next_instance_name --maintenance-policy=MIGRATE --service-account=XXXXXXXX-compute#developer.gserviceaccount.com --scopes=https://www.googleapis.com/auth/cloud-platform --image=image-1 --image-project=xxxxxxxx --boot-disk-size=10GB --boot-disk-type=pd-standard --boot-disk-device-name=$new_instance_name
This script is made executable using chmod +xand the file-name of the script is /home/ubuntu/shtudown_script.sh.he metadata shutdown-script for this instance is also /home/ubuntu/shtudown_script.sh.
All parts of the script runs fine when I run it manually from within the instance, so a new file is created and also a new instance is created when the current instance shuts-down.
But when it is invoked from API when I stop the instance, it only creates the file I create using touch command, but no new instance is created as before.
Am I doing something wrong here?
So I was able to reproduce the behavior you described. I ran a bash script similar to the one you have provided as a shutdown script, and it would only create the empty file called "newfile.txt".
I then decided to append the output of the gcloud command to see what was happening. I had to tweak the bash script to fit my project. Here is the bash script I ran to copy the output to a file:
#!/bin/bash
touch /home/ubuntu/newfile.txt
gcloud beta compute --project=xxx instances create instance-6 --zone=us-central1-c --machine-type=f1-micro --subnet=default --maintenance-policy=MIGRATE --service-account=xxxx-compute#developer.gserviceaccount.com --scopes=https://www.googleapis.com/auth/cloud-platform --boot-disk-size=10GB --boot-disk-type=pd-standard --boot-disk-device-name=instance-6 > /var/output.txt 2>&1
The output I received was the following:
ERROR: (gcloud.beta.compute.instances.create) Could not fetch resource: - Insufficient Permission
This means that my default service account did not have the appropriate scopes to create the VM instance.
I then stopped my VM instance and edited the scopes to give the service account full access as described here. Once I changed the scopes, I started the VM instance back up and then stopped it again. At this point, it successfully created the VM instance called "instance-6". I would not suggest giving the default service full access. I would suggest specifying which scopes it should have, but make sure that it has full access to Compute Engine if you want the shutdown script to work.
If the shutdown script works when you stop the VM instance using the command:
$sudo shutdown -h now
And does not work when stopping the VM instance from the Cloud Console by pressing the “Stop” button, then I suspect this behavior is to be expected.
A shutdown script has a limited period of time to run when you stop a VM instance; however, this limit does not apply if you request the shutdown using the “sudo shutdown” command. You can read more about this behavior here.
If you would like to know more about the shutdown period, you can read about it here.
I already had given my instance proper scope by giving the service account full access (which is a bad practice).
But the actual problem was solved when I reinstalled google-cloud-sdk using
sudo apt-get install google-cloud-sdk
When I was running those scripts before reinstalling gcloud by sshing into the instance it was using the gcloud command from preinstalled directory /snap/bin/gcloud. But when it runs from the startup or shutdown script, for some reason it can not get an access to the preinstalled /snap/bin/ directory, and when I reinstall google cloud sdk using apt-get the gcloud command was being accessed from /usr/bin/gcloud which I think is accessible by the startup or shutdown script.

Running task in the background?

If we are submitting a task to the compute engine through ssh from host machine and if we shut down the host machine is there a way that we can get hold of the output of the submitted task later on when we switch on the host machine?
From the Linux point of view ‘ssh’ and ‘gcloud compute ssh’ are commands like all the others, therefore it is possible to redirect their output to a file while the command is performed using for example >> to redirect and append stdout to a file or 2>> to store stderr.
For example if you run from the first instance 'name1':
$ gcloud compute ssh name2 --command='watch hostname' --zone=XXXX >> output.out
where 'name2' is the second instance, and at some point you shutdown 'name1' you will find stored into output.out the output provided by the command till the shutdown occurred.
Note that there is also the possibility to create shut down scripts, that in this scenario could be useful in order to upload output.out to a bucket or to perform any kind of clean-up operation.
In order to do so you can run the following command
$ gcloud compute instances add-metadata example-instance --metadata-from-file shutdown-script=path/to/script_file
Where the content of the script could be something like
#! /bin/bash
gsutil cp path/output.out gs://yourbucketname
Always keep in mind that Compute Engine only executes shutdown scripts on a best-effort basis and does not guarantee that the shutdown script will be run in all cases.
More Documentation about shutdown scrips if needed.

Google Compute Engine - Clone Instance

I have a GCE instance that I have customised and uploaded various applications to (such as PHP apps running under Apache). I now want to duplicate this instance - i.e. everything on it.
I originally thought clone might do this but I had a play around with it and it only seems to clone the instance config and not anything customised on it.
I've been googling it and it looks like what I need to do is create an image and use this image on a new instance or clone?
Is that correct?
If so, are there any could steps by steps out there to do this?
I had a look at the Google page on images and it talks about having to terminate the instance to do this. I'm a bit wary of this. Maybe it's just the language used in the docs, but I don't want to lose my existing instance.
Also, will everything be stored on the image?
So, for example, will the following all make it onto the image?
MySQL - config & databases schemas & data?
Apache - All installed apps under /var/www/html
PHP - php.ini, etc...
All other server configs/modifications?
You can create a snapshot of the source instance, then create a new instance selecting the source snapshot as disk. It will replicate the server very fast. For other attached disks, you have to create a new disk and copy file by net (scp, rsync etc)
In the Web Console, create a snapshot, then click on the snapshot and over CREATE INSTANCE button, you can customize the settings and then click where it says:
Equivalent REST or command line
and copy the command line, this will be your template.
From this, you can create a a BASH script (clone_instance.sh), I did something like this:
#!/bin/bash -e
snapshot="my-snapshot-name"
gcloud_account="ACCOUNTNUMBER-compute#developer.gserviceaccount.com"
#clone 10 machines
for machine in 01 02 03 04 05 06 07 08 09 10
do
gcloud compute --project "myProject" disks create "instance-${machine}" \
--size "220" --zone "us-east1-d" --source-snapshot "${snapshot}" \
--type "pd-standard"
gcloud compute --project "bizqualify" instances create "webscrape-${machine}" \
--zone "us-east1-d" --machine-type "n1-highmem-4" --network "default" \
--maintenance-policy "MIGRATE" \
--service-account "ACCOUNTNUMBER-compute#developer.gserviceaccount.com" \
--scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring.write","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
--tags "http-server","https-server" \
--disk "name=webscrape-${machine},device-name=webscrape-${machine},mode=rw,boot=yes,auto-delete=yes"
done
Now, in your terminal, you can execute your script
sh clone_instance.sh
In case you have other disks attached, the best way without actually unmounting them is changing the path of how they're mounted in /etc/fstab.
If you use the UUID in fstab and use the same disks from snapshots (which will have the same UUIDs) then you can do the cloning without unmounting anything.
Just change each disk in fstab to use UUID like this
UUID=[UUID_VALUE] [MNT_DIR] ext4 discard,defaults,[NOFAIL] 0 2
you can get the UUID from
sudo blkid /dev/[DEVICE_ID]
if you're unsure about your DEVICE_ID you can use
sudo lsblk
to get the list of device ids used by your system.
It's 2021 and this is now very simple:
Click the VM Instance you want to clone
Click "Create Machine Image" at the top
From Machine Images on the left, open your new image and click "Create VM Instance"
This will clone the machine specs and data.
As was mentioned, if the source instance has a secondary disk attached, it is not possible to ssh into the new instance.
I had to take a snapshot of a production instance, so I couldn't unmount the secondary disk without causing disruption.
I was able to fix the problem by creating a disk from the snapshot, mounting the disk on another instance, removing any reference to the secondary disk, i.e., removing the entry from /etc/fstab.
Once I had done that, I was able to use the disk as boot disk in a new instance, and ssh to it.
You can use the GCP Import VM option, to import this machine back to the project.

exporting files from GCE to my local machine

Is there a reverse command for gcutil push basically what I want to do is have a copy of my python files on my local machine so I'm looking for a way to import the files into my local machine exporting them from my google compute engine instance without using GIT or any other source control tool
Yep, there is gcutil pull. Here is the help file:
Local:~ mark$ gcutil help pull
Command line tool for interacting with Google Compute Engine.
Please refer to http://developers.google.com/compute/docs/gcutil/tips for more
information about gcutil usage.
USAGE: gcutil [--global_flags] <command> [--command_flags] [args]
pull Pull one or more files from a VM instance.
Usage: gcutil [--global_flags] pull
[--command_flags] <instance-name> <file-1> ...
<file-n> <destination>
Flags for pull:
gcutil_lib.instance_cmds:
--ssh_arg: Additional arguments to pass to ssh;
repeat this option to specify a list of values
(default: '[]')
--ssh_key_push_wait_time: Number of seconds to wait for updates to
project-wide ssh keys to cascade to the instances within the project
(default: '120')
(an integer)
--ssh_port: TCP port to connect to
(default: '22')
(an integer)
--zone: [Required] The zone for this request.
gflags:
--flagfile: Insert flag definitions from the given file into the command line.
(default: '')
--undefok: comma-separated list of flag names that it is okay to specify on
the command line even if the program does not define a flag with that name.
IMPORTANT: flags in this list that have arguments MUST use the --flag=value
format.
(default: '')
Run 'gcutil --help' to get help for global flags.
Run 'gcutil help' to see the list of available commands.
The Syntax of file uploading to GCE from your Local Machine as following
gcutil push --zone=us-central1-a \
my_instance \
~/local/path
/remote/file1 \
/remote/file2 \
for example in mac
example
gcutil push --zone=us-central1-a \your-instance\ ~/Desktop/Gcloud /home/munish/
The Syntax of file downloading from GCE to your Local Machine as following
gcutil pull --zone=us-central1-a \
my_instance \
/remote/file1 \
/remote/file2 \
~/local/path
for example in mac
example
gcutil pull --zone=us-central1-a \your-instance \ /home/munish/source-folder ~/Desktop/destination-folder