Always sleep on packer provisioning? - packer

While exploring Packer, I ran into the following question.
The docs state (as part of the getting started steps where a Ubuntu image is provisioned to AWS):
Note: The sleep 30 in the example above is very important. Because
Packer is able to detect and SSH into the instance as soon as SSH is
available, Ubuntu actually doesn't get proper amounts of time to
initialize. The sleep makes sure that the OS properly initializes.
It shows an example where a shell provisioner (inline) is the first provisioner to kick in.
Do you always need to sleep 30 before any provisioner is to start, in particular:
When I start the provisioning block with a file provisioner, does it automatically wait until the OS properly initializes?
When I run a script/scripts shell provisioner instead of an inline block of commands, do I need to start the first script with sleep 30?
If so, would a general suggestion be that you always put this on top of your provisioning block:
"provisioners": [
{
"type": "shell",
"inline": [
"sleep 30"
]
},
{...}]

You can run without the sleep, but particularly on AWS it's going to be a crapshoot whether it works or not. Packer builds can be long and complex, and some sleep here and there can greatly improve your success rate. You don't have to run a sleep before every provisioner though, just the first one. After that the OS is up and everything should be running along nicely.
I wasn't using a sleep before apt, and my package installs were failing all over the place. I'm using the Packer AWS EBS builder. A snippet in the docs solved my problems with a very similar strategy: it polls until cloud-init has finished, cloud-init being the initialization system built into the Ubuntu EC2 images produced by Canonical.
{
"type": "shell",
"inline": [
"while [ ! -f /var/lib/cloud/instance/boot-finished ]; do echo 'Waiting for cloud-init...'; sleep 1; done"
]
}
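One caveat with the loop above is that it spins forever if the marker file never appears. A minimal sketch of the same polling idea with a timeout follows; the function name wait_for_file and the 300 second default are my own assumptions, not something from the Packer docs.

```shell
#!/bin/sh
# wait_for_file PATH [TIMEOUT_SECONDS]
# Polls once per second until PATH exists; gives up after TIMEOUT_SECONDS.
# (Hypothetical helper; the name and the default timeout are assumptions.)
wait_for_file() {
    path=$1
    timeout=${2:-300}
    waited=0
    while [ ! -f "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Timed out after ${timeout}s waiting for $path" >&2
            return 1
        fi
        echo "Waiting for $path..."
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage in a provisioning script (path taken from the snippet above):
# wait_for_file /var/lib/cloud/instance/boot-finished 300 || exit 1
```

Failing fast with a non-zero exit makes the build stop at the first provisioner instead of hanging until someone notices.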
So it's not strictly necessary, but you'll find that once you have a working build with Packer, you still will want to improve the reliability of your scripts and other provisioners with timing and retries. A failed build on Packer is a big time waster.
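As an illustration of the "timing and retries" point, here is a small retry wrapper that could be placed at the top of a provisioning script. It is only a sketch: the function name retry and its argument order are assumptions of mine.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
# (Hypothetical helper; not part of Packer.)
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while true; do
        "$@" && return 0
        if [ "$n" -ge "$attempts" ]; then
            echo "Command failed after $n attempts: $*" >&2
            return 1
        fi
        echo "Attempt $n failed; retrying in ${delay}s..." >&2
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example: retry 5 10 sudo apt-get update
```

Note that the helper uses plain global variables, so the wrapped command should not be a shell function that reuses the names attempts, delay or n.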

No, a sleep before every provisioner is not necessary - only the first one or after a provisioner restarts the box.
Once Packer has connected to the running instance successfully, further sleeps will just unnecessarily slow down your Packer runs.

Some images run jobs at startup, which can cause your builds to fail. You can use various checks to make sure the VM is ready to be provisioned, as Ele Munjeli said.
You can also use a single sleep as the first provisioner, as you said.
Note that there is a better alternative to sleep: the pause_before_connecting communicator option, which you can set to the required wait time. See the docs.

Deploying an application with database inside mysql container inside docker [duplicate]

I'm trying to wrap my head around Docker from the point of view of deploying an application which is intended to run on the user's own desktop. My application is simply a Flask web application and a Mongo database. Normally I would install both in a VM and forward a host port to the guest web app. I'd like to give Docker a try, but I'm not sure how I'm meant to use more than one program. The documentation says there can be only one ENTRYPOINT, so how can I have both Mongo and my Flask application? Or do they need to be in separate containers, in which case how do they talk to each other, and how does this make distributing the app easy?
There can be only one ENTRYPOINT, but that target is usually a script that launches as many programs as are needed. You can additionally use, for example, Supervisord or similar to take care of launching multiple services inside a single container. This is an example of a Docker container running MySQL, Apache and WordPress within a single container.
Say you have one database that is used by a single web application. Then it is probably easier to run both in a single container.
If you have a shared database that is used by more than one application, then it would be better to run the database in its own container and the applications each in their own containers.
There are at least two possibilities how the applications can communicate with each other when they are running in different containers:
Use exposed IP ports and connect via them.
Recent docker versions support linking.
I strongly disagree with some previous solutions that recommended running both services in the same container. The documentation clearly states that this is not recommended:
It is generally recommended that you separate areas of concern by using one service per container. That service may fork into multiple processes (for example, Apache web server starts multiple worker processes). It’s ok to have multiple processes, but to get the most benefit out of Docker, avoid one container being responsible for multiple aspects of your overall application. You can connect multiple containers using user-defined networks and shared volumes.
There are good use cases for supervisord or similar programs but running a web application + database is not part of them.
You should definitely use docker-compose to do that and orchestrate multiple containers with different responsibilities.
I had a similar requirement: running a LAMP stack, MongoDB and my own services.
Docker is OS-level virtualisation, which is why it isolates its container around a running process; hence it requires at least one process running in the foreground.
So you provide your own startup script as the entry point. The startup script becomes an extension of the Docker image, in which you can start any number of services, as long as at least one service is started in the foreground, and that one is started last.
So my Dockerfile has the following two lines at the very end:
COPY myStartupScript.sh /usr/local/myscripts/myStartupScript.sh
CMD ["/bin/bash", "/usr/local/myscripts/myStartupScript.sh"]
In my script I start MySQL, MongoDB, Tomcat, etc. At the end I run Apache as a foreground process.
source /etc/apache2/envvars
/usr/sbin/apache2 -DFOREGROUND
This enables me to start all my services and keep the container alive, with the last service started being in the foreground.
Hope it helps
UPDATE: Since I last answered this question, new tools such as Docker Compose have come along. Compose lets you run each service in its own container while still tying them together by declaring the dependencies between those services. Learn more about docker-compose and use it; it is the more elegant approach unless it does not fit your needs.
Although it's not recommended you can run 2 processes in foreground by using wait. Just make a bash script with the following content. Eg start.sh:
# runs 2 commands simultaneously:
mongod & # your first application
P1=$!
python script.py & # your second application
P2=$!
wait $P1 $P2
In your Dockerfile, start it with
CMD bash start.sh
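One limitation of wait $P1 $P2 is that it only returns once both processes have exited, so a container whose database has crashed keeps running with just the web app. A possible variation, sketched below, returns as soon as either process exits and stops the survivor. The function names first_service, second_service and run_pair are placeholders of mine; the two commands are the ones from the script above.

```shell
#!/bin/sh
# Placeholder wrappers around the two commands from start.sh above.
# exec replaces the background subshell, so $! is the PID of the real program.
first_service()  { exec mongod; }
second_service() { exec python script.py; }

# run_pair: start both services, return when either one exits,
# stop the other one, and return the exit status of the one that exited first.
run_pair() {
    first_service &
    p1=$!
    second_service &
    p2=$!
    # kill -0 sends no signal; it only checks that the PID still exists.
    while kill -0 "$p1" 2>/dev/null && kill -0 "$p2" 2>/dev/null; do
        sleep 1
    done
    if kill -0 "$p1" 2>/dev/null; then
        # The second service exited first.
        kill "$p1" 2>/dev/null
        wait "$p2"; status=$?
        wait "$p1" 2>/dev/null
    else
        # The first service exited first (or both exited together).
        kill "$p2" 2>/dev/null
        wait "$p1"; status=$?
        wait "$p2" 2>/dev/null
    fi
    return "$status"
}

# At the end of start.sh:
# run_pair; exit $?
```

Exiting with the failed service's status lets Docker's restart policy (if you use one) react to the crash.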
I would recommend to set up a local Kubernetes cluster if you want to run multiple processes simultaneously. You can 'distribute' the app by providing them a simple Kubernetes manifest.
They can be in separate containers, and indeed, if the application was also intended to run in a larger environment, they probably would be.
A multi-container system would require some more orchestration to be able to bring up all the required dependencies, though in Docker v0.6.5+ there is a new facility built into Docker itself to help with that: Linking. With a multi-machine solution, it's still something that has to be arranged from outside the Docker environment, however.
With two different containers, the two parts still communicate over TCP/IP, but unless the ports have been locked down specifically (not recommended, as you'd be unable to run more than one copy), you would have to pass the new port that the database has been exposed as to the application, so that it could communicate with Mongo. This is again, something that Linking can help with.
For a simpler, small installation, where all the dependencies are going in the same container, having both the database and Python runtime started by the program that is initially called as the ENTRYPOINT is also possible. This can be as simple as a shell script, or some other process controller - Supervisord is quite popular, and a number of examples exist in the public Dockerfiles.
Docker provides a couple of examples on how to do it. The lightweight option is to:
Put all of your commands in a wrapper script, complete with testing
and debugging information. Run the wrapper script as your CMD. This is
a very naive example. First, the wrapper script:
#!/bin/bash
# Start the first process
./my_first_process -D
status=$?
if [ $status -ne 0 ]; then
echo "Failed to start my_first_process: $status"
exit $status
fi
# Start the second process
./my_second_process -D
status=$?
if [ $status -ne 0 ]; then
echo "Failed to start my_second_process: $status"
exit $status
fi
# Naive check runs checks once a minute to see if either of the processes exited.
# This illustrates part of the heavy lifting you need to do if you want to run
# more than one service in a container. The container will exit with an error
# if it detects that either of the processes has exited.
# Otherwise it will loop forever, waking up every 60 seconds
while /bin/true; do
ps aux |grep my_first_process |grep -q -v grep
PROCESS_1_STATUS=$?
ps aux |grep my_second_process |grep -q -v grep
PROCESS_2_STATUS=$?
# If the greps above find anything, they will exit with 0 status
# If they are not both 0, then something is wrong
if [ $PROCESS_1_STATUS -ne 0 -o $PROCESS_2_STATUS -ne 0 ]; then
echo "One of the processes has already exited."
exit -1
fi
sleep 60
done
Next, the Dockerfile:
FROM ubuntu:latest
COPY my_first_process my_first_process
COPY my_second_process my_second_process
COPY my_wrapper_script.sh my_wrapper_script.sh
CMD ./my_wrapper_script.sh
I agree with the other answers that using two containers is preferable, but if you have your heart set on bundling multiple services in a single container, you can use something like supervisord.
In Hipache, for instance, the included Dockerfile runs supervisord, and the file supervisord.conf specifies that both hipache and redis-server are to be run.
If a dedicated script seems like too much overhead, you can spawn separate processes explicitly with sh -c. For example:
CMD sh -c 'mini_httpd -C /my/config -D &' \
&& ./content_computing_loop
In docker, there are two ways you can run a program
CMD
ENTRYPOINT
If you want to know the difference between them, please refer here
In CMD/ENTRYPOINT, there are two formats to run a command
SHELL format
EXEC format
SHELL format:
CMD executable_first arg1; executable_second arg1 arg2
ENTRYPOINT executable_first arg1; executable_second arg1 arg2
This form creates a shell and has it execute the command above. Here you can use any shell syntax such as ";", "&", "|", etc., so you can run any number of commands. If you have a complex set of commands to run, you can create a separate shell script and use it.
CMD my_script.sh arg1
ENTRYPOINT my_script.sh arg1
EXEC format:
CMD ["executable", "parameter 1", "parameter 2", …]
ENTRYPOINT ["executable", "parameter 1", "parameter 2", …]
Note that only the first element is an executable. From the second element onward, everything becomes an argument for that executable.
To run multiple commands in EXEC format
CMD ["/bin/sh", "-c", "executable_first arg1; executable_second"]
ENTRYPOINT ["/bin/sh", "-c", "executable_first arg1; executable_second"]
In the commands above, we use the shell itself as the executable and let it run the command string. In EXEC format, going through a shell like this (directly or via a script) is the only way to run multiple commands.
The following are WRONG:
CMD ["executable_first parameter", "executable_second parameter"]
ENTRYPOINT ["executable_first parameter", "executable_second parameter"]
CMD ["executable_first", "parameter", ";", "executable_second", "parameter"]
ENTRYPOINT ["executable_first", "parameter", ";", "executable_second", "parameter"]
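The reason the WRONG forms fail can be reproduced in a plain shell, without Docker. The sketch below is only an analogy: shell_form and exec_form are names I made up, and /bin/echo stands in for the executables.

```shell
#!/bin/sh
# Shell form: the whole string is parsed by a shell, so ";" separates commands.
shell_form() {
    /bin/sh -c 'echo first; echo second'
}

# Exec form: every element is handed to the first executable as a literal
# argument, so ";" and the second command name are just strings to print.
exec_form() {
    /bin/echo first ';' /bin/echo second
}

shell_form
# prints:
# first
# second

exec_form
# prints:
# first ; /bin/echo second
```

In the second case nothing ever interprets the ";", which is exactly what happens when it appears as an element of a JSON array in CMD or ENTRYPOINT.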
Can I run multiple programs in a Docker container?
Yes, but with significant risks.
Below is the same answer as above, with details and a recommended resolution, in case you are interested in those.
Not Recommended
Warning. Using the same container for multiple services is not recommended by the Docker community, though. The Docker documentation reads: "It is generally recommended that you separate areas of concern by using one service per container." Source at:
• https://archive.ph/3Roa6#selection-307.2-307.100
• https://docs.docker.com/config/containers/multi-service_container/
If you choose to ignore the recommendation above, your container risks having weaker security, becoming increasingly unstable, and being painful to grow in the future.
If you are OK with those risks, the documentation on using one container for multiple services is at:
• https://archive.ph/3Roa6#selection-335.0-691.1
• https://docs.docker.com/config/containers/multi-service_container/
Recommended
If you need containers with stronger security, more stability, better performance, and room to scale in the future, then the Docker community recommends these two steps:
Use one service per Docker container. The end result is that you will have multiple containers.
Use the Docker "Networking" feature to connect those containers to each other as needed.

Is there a time limit for the startup script to finish before it stops?

I need to create a VM instance in google compute engine with a startup script that takes 30 minutes, but it never finishes, it stops around 10 minutes after the instance boots. Is there a timeout? Is there another alternative to accomplish what I need to do? Thanks!
Given the additional clarification in the comments:
My script downloads another script and then executes it, and what that script does is download some big files, and then compute some values based on latitude/longitude. Then, when the process is finished, the VM is destroyed.
My recommendation would be to run the large download and processing asynchronously rather than synchronously. If it is synchronous, it is part of the VM startup (in the critical path), and the VM monitoring infrastructure notices that the VM is not completing its startup phase within a reasonable amount of time and terminates it.
Instead, take the heavy-duty processing out of the critical path and do it in the background, i.e., asynchronously.
In other words, the startup script currently probably looks like:
# Download the external script
curl [...] -o /tmp/script.sh
# Run the file download, computation, etc. and shut down the VM.
/tmp/script.sh
I would suggest converting this to:
# Download the external script
curl [...] -o /tmp/script.sh
# Run the file download, computation, etc. and shut down the VM.
nohup /tmp/script.sh &
What this does is start the heavy processing in the background, but also disconnect it from the parent process such that it is not automatically terminated when the parent process (the actual startup script) is terminated. We want the main startup script to terminate so that the entire VM startup phase is marked completed.
For more info, see the Wikipedia page on nohup.
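A slightly fuller sketch of the same idea is below. It also redirects the job's input and output, so the background job does not hold on to the startup script's file descriptors, and it keeps a log you can inspect later. The function name launch_detached and the log path are assumptions of mine.

```shell
#!/bin/sh
# launch_detached SCRIPT LOGFILE
# Starts SCRIPT in the background, immune to SIGHUP, and returns immediately.
# (Hypothetical helper; the name is an assumption.)
launch_detached() {
    script=$1
    log=$2
    # stdin comes from /dev/null and stdout/stderr go to the log file,
    # so nothing ties the job to the parent script's terminal or pipes.
    nohup sh "$script" >"$log" 2>&1 </dev/null &
    echo "Started $script in background as PID $!"
}

# In the startup script:
# launch_detached /tmp/script.sh /tmp/script.log
```

The startup script itself then finishes right away, while the log file shows how far the long-running job got.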

Hide/obfuscate environmental parameters in docker

I'm using the mysql image as an example, but the question is generic.
The password used to launch mysqld in docker is not visible in docker ps however it's visible in docker inspect:
sudo docker run --name mysql-5.7.7 -e MYSQL_ROOT_PASSWORD=12345 -d mysql:5.7.7
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b98afde2fab7 mysql:5.7.7 "/entrypoint.sh mysq 6 seconds ago Up 5 seconds 3306/tcp mysql-5.7.7
sudo docker inspect b98afde2fab75ca433c46ba504759c4826fa7ffcbe09c44307c0538007499e2a
"Env": [
"MYSQL_ROOT_PASSWORD=12345",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"MYSQL_MAJOR=5.7",
"MYSQL_VERSION=5.7.7-rc"
]
Is there a way to hide/obfuscate environment parameters passed when launching containers? Alternatively, is it possible to pass sensitive parameters by reference to a file?
Weirdly, I'm just writing an article on this.
I would advise against using environment variables to store secrets, mainly for the reasons Diogo Monica outlines here; they are visible in too many places (linked containers, docker inspect, child processes) and are likely to end up in debug info and issue reports. I don't think using an environment variable file will help mitigate any of these issues, although it would stop values getting saved to your shell history.
Instead, you can pass in your secret in a volume e.g:
$ docker run -v $(pwd)/my-secret-file:/secret-file ....
If you really want to use an environment variable, you could pass it in as a script to be sourced, which would at least hide it from inspect and linked containers (e.g. CMD source /secret-file && /run-my-app).
The main drawback with using a volume is that you run the risk of accidentally checking the file into version control.
A better, but more complicated solution is to get it from a key-value store such as etcd (with crypt), keywhiz or vault.
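To make the volume approach concrete, here is a rough sketch of what an entrypoint script could do with the mounted file. The helper name read_secret and the variable DB_PASSWORD are hypothetical; /secret-file and /run-my-app are the paths used in the examples above.

```shell
#!/bin/sh
# read_secret FILE
# Prints the contents of FILE; fails if the file is missing or unreadable.
# (Hypothetical helper; the name is an assumption.)
read_secret() {
    secret_file=$1
    if [ ! -r "$secret_file" ]; then
        echo "Secret file $secret_file is missing or unreadable" >&2
        return 1
    fi
    cat "$secret_file"
}

# In an entrypoint script (command substitution strips the trailing newline):
# DB_PASSWORD=$(read_secret /secret-file) || exit 1
# export DB_PASSWORD
# exec /run-my-app
```

Because the value is set inside the container at start time rather than with -e, it does not show up in the Env section of docker inspect. It is still visible to the application's child processes, so the other caveats about environment variables apply.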
You ask: "Alternatively, is it possible to pass sensitive parameters by reference to a file?" An extract from the docs at http://docs.docker.com/reference/commandline/run/ answers this: --env-file=[] Read in a file of environment variables.

Detect when instance has completed setup script?

I'm launching instances using the following command:
gcutil addinstance \
--image=debian-7 \
--persistent_boot_disk \
--zone=us-central1-a \
--machine_type=n1-standard-1 \
--metadata_from_file=startup-script:install.sh \
instance-name
How can I detect when this instance has completed its install script? I'd like to be able to place this launch command in a larger provisioning script that then goes on to issue commands to the server that depend on the install script having been successfully completed.
There are a number of ways: sending yourself an email, uploading to Cloud Storage, sending a jabber message, ...
One simple, observable way IMHO is to add a logger entry at the end of your install.sh script (I also tweak the beginning for symmetry). Something like:
#!/bin/bash
/usr/bin/logger "== Startup script START =="
#
# Your code goes here
#
/usr/bin/logger "== Startup script END =="
You can check then if the script started or ended in two ways:
From your Developer's Console, select "Projects" > "Compute" > "VM Instances" > your instance > "Serial console" > "View Output".
From CLI, by issuing a gcutil getserialportoutput instance-name.
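If you want the larger provisioning script to block until the END marker shows up, the CLI check can be wrapped in a polling loop. This is a sketch under assumptions: wait_for_marker and POLL_INTERVAL are names I made up, and the command to run is passed in so that any log source works.

```shell
#!/bin/sh
# wait_for_marker MARKER TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND every POLL_INTERVAL seconds (default 5) until its output
# contains MARKER, or fails after roughly TIMEOUT_SECONDS.
# (Hypothetical helper; names and defaults are assumptions.)
wait_for_marker() {
    marker=$1
    timeout=$2
    shift 2
    interval=${POLL_INTERVAL:-5}
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # grep -F treats the marker as a fixed string, not a regex.
        if "$@" 2>/dev/null | grep -qF -- "$marker"; then
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "Marker not seen after ${timeout}s: $marker" >&2
    return 1
}

# Example, using the command and marker from this answer:
# wait_for_marker "== Startup script END ==" 1800 \
#     gcutil getserialportoutput instance-name
```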
I don't know of a way to do all of this within gcutil addinstance.
I'd suggest:
Adding the instance via gcutil addinstance, making sure to use the --wait_until_running flag to ensure that the instance is running before you continue
Copying your script over to the instance via something like gcutil push
Using gcutil ssh <instance-name> </path-to-script/script-to-run> to run your script manually.
This way, you can write your script in such a way that it blocks until it's finished, and the ssh command will not return until your script on the remote machine is done executing.
There really are a lot of ways to accomplish this goal. One that tickles my fancy is to use the metadata server associated with the instance. Have the startup script set a piece of metadata to "FINISHED" when the script is done. You can query the metadata server with a hanging GET that will only return when the metadata updates. Just use gcutil setmetadata from within the script as the last command.
I like this method because the hanging GET just gives you one command to run, rather than a poll to run in a loop, and it doesn't involve any services besides Compute Engine.
One more hacky way:
startup_script_finished=false
while [[ "$startup_script_finished" = false ]]; do
pid=$(gcloud compute ssh $GCLOUD_USER@$GCLOUD_INSTANCE -- pgrep -f "\"/usr/bin/python /usr/bin/google_metadata_script_runner --script-type startup\"")
if [[ -z $pid ]]; then
startup_script_finished=true
else
sleep 2
fi
done
One possible solution would be to have your install script create a text file in a cloud storage bucket, as the last thing it does, using the host name as the filename.
Your main script that did the original gcutil addinstance command could then be periodically polling the contents of the bucket (using gsutil ls) until it sees a file with a matching name and then it would know the install had completed on that instance.

How to solve jenkins 'Disk space is too low' issue?

I deployed Jenkins on my CentOS machine. It worked well for 3 days, but yesterday a "Disk space is too low. Only 1.019GB left." problem appeared.
How can I solve this problem? It took my master offline for hours.
You can easily change the threshold from the Jenkins UI (my version is 1.651.3).
Update: How to ensure high disk space
This feature is meant to prevent working on slaves with low free disk space. Lowering the threshold would not solve the fact that some jobs do not properly cleanup after they finish.
Depending on what you're building:
Make sure you understand what your build writes to disk. If possible, restrict the output to the job workspace, and use the Workspace Cleanup plugin to clean up the workspace as a post-build step.
If the process must write some data to external folders, clean them up explicitly in post-build steps.
Alternative 1: provision a new slave per job (use spot slaves; there are many plugins that integrate with different cloud providers to provision machines on demand).
Alternative 2: run the build inside a container. Everything will be discarded once the build is finished.
Besides the solutions above, there is a more "COMMON" way: directly delete the largest space consumers on the Linux machine. You can follow the steps below:
Log in to the Jenkins machine (e.g. with PuTTY)
cd to the Jenkins installation path
Use ls -lart to list hidden folders as well; normally the Jenkins installation is placed in the .jenkins/ folder
[xxxxx ~]$ ls -lart
drwxrwxr-x 12 xxxx 4096 Feb 8 02:08 .jenkins/
List the space used by the folders
Use df -h to show disk space at a high level
Use du -sh ./*/ to list the total disk usage of each subfolder in the current path.
du -a /etc/ | sort -n -r | head -n 10 will list the 10 largest files and directories under /etc/
Delete old builds or other large folders
Normally the ./jobs/ folder or the ./workspace/ folder is the largest. Go inside and delete what you no longer need (DO NOT delete the entire folder).
rm -rf theFolderToDelete
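The inspection steps above can be combined into one small helper that ranks the subfolders by size. Note that du -sh output cannot be sorted numerically, which is why this sketch uses -k instead. The function name largest_dirs is my own.

```shell
#!/bin/sh
# largest_dirs [DIR] [COUNT]
# Lists the COUNT largest immediate subdirectories of DIR, biggest first.
# Sizes are in KiB. (Hypothetical helper; the name is an assumption.)
largest_dirs() {
    dir=${1:-.}
    count=${2:-10}
    # -s summarises each subdirectory; -k prints plain numbers in 1024-byte
    # units so that sort -n can compare them.
    du -sk "$dir"/*/ 2>/dev/null | sort -rn | head -n "$count"
}

# Example, run from the Jenkins home directory:
# largest_dirs . 5
```

This only reports sizes; what is safe to delete is still something you have to decide yourself.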
You can limit disk usage by discarding old builds. There's a checkbox for this in the project configuration.
This is actually a legitimate question so I don't understand the downvotes; perhaps it belongs on Superuser or Serverfault. This is a soft warning threshold, not a hard limit where the disk is actually out of space.
For hudson see where to configure hudson node disk temp space thresholds - this is talking about the host, not nodes
Jenkins is the same. The conclusion is that, for many small projects, the system property called hudson.diagnosis.HudsonHomeDiskUsageChecker.freeSpaceThreshold could be decreased.
That said, I haven't tested it, and there is a disclaimer:
No compatibility guarantee
In general, these switches are often experimental in nature, and subject to change without notice. If you find some of those useful, please file a ticket to promote it to the official feature.
I got the same issue. My Jenkins version is 2.3 and its UI is slightly different. Putting it here in case it helps someone. Increasing both disk space thresholds to 5GB fixed the issue.
I have a cleanup job with the following build steps. You can schedule it @daily or @weekly.
Execute system groovy script build step to clean up old builds:
import jenkins.model.Jenkins
import hudson.model.Job
BUILDS_TO_KEEP = 5
for (job in Jenkins.instance.items) {
println job.name
def recent = job.builds.limit(BUILDS_TO_KEEP)
for (build in job.builds) {
if (!recent.contains(build)) {
println "Preparing to delete: " + build
build.delete()
}
}
}
You'd need to have Groovy plugin installed.
Execute shell build step to clean cache directories
rm -r ~/.gradle/
rm -r ~/.m2/
echo "Disk space"
du -h -s /
To check the free space as Jenkins Job:
Parameters
FREE_SPACE: Needed free space in GB.
Job
#!/usr/bin/env bash
free_space="$(df -Ph . | awk 'NR==2 {print $4}')"
if [[ "${free_space}" = *G* ]]; then
free_space_gb=${free_space/[^0-9]*/}
if [[ ${free_space_gb} -lt ${FREE_SPACE} ]]; then
echo "Warning! Low space: ${free_space}"
exit 2
fi
else
echo "Warning! Unknown: ${free_space}"
exit 1
fi
echo "Free space: ${free_space}"
Plugins
Set build description
Post-Build Actions
Regular expression: Free space: (.*)
Description: Free space: \1
Regular expression for failed builds: Warning! (.*)
Description for failed builds: \1
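The job above only understands df -h values that end in G, so a disk with free space reported in M or T lands in the "Unknown" branch. A possible variant, sketched below, asks df for KiB and converts to whole GiB itself. The function names are my own, and it assumes the filesystem name in the df output contains no spaces. The messages are kept in the same shape so the regular expressions above still match.

```shell
#!/bin/sh
# free_gb_from_df: reads `df -Pk` output on stdin and prints the available
# space of the first filesystem in whole GiB (1 GiB = 1048576 KiB).
# (Hypothetical helper; assumes no spaces in the filesystem name.)
free_gb_from_df() {
    awk 'NR==2 { print int($4 / 1048576) }'
}

# check_free_space REQUIRED_GB [PATH]
# Returns 2 if PATH has fewer than REQUIRED_GB whole GiB available.
check_free_space() {
    required=$1
    path=${2:-.}
    free_gb=$(df -Pk "$path" | free_gb_from_df)
    if [ "$free_gb" -lt "$required" ]; then
        echo "Warning! Low space: ${free_gb}G"
        return 2
    fi
    echo "Free space: ${free_gb}G"
}

# In the Jenkins job, with FREE_SPACE as the job parameter:
# check_free_space "$FREE_SPACE" . || exit $?
```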
For people who do not know where the configs are, download the tmpcleaner plugin from
https://updates.jenkins-ci.org/download/plugins/tmpcleaner/
You will get an .hpi file. Go to Manage Jenkins -> Manage Plugins -> Advanced, upload the .hpi file there, and restart Jenkins.
You can immediately see a difference if you go to Manage Nodes.
Since my Jenkins was installed on a Debian server, I did not understand most of the answers here, because I cannot find an /etc/default folder or a jenkins file.
If someone knows where the /tmp folder is or how to configure it for Debian, do let me know in the comments.