Should config be built into a Docker image (best practice) [closed]


Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I am currently looking to deploy a Docker application that uses an Nginx proxy and a MySQL instance (amongst other things).
What is the best practice when it comes to configuration files (in the case of Nginx) or initial table set up and server configuration (in the case of SQL)?
Is it generally better to build this config into a custom image via a Dockerfile (take the standard Nginx image as a parent, copy the config in, and build it), or to reference the standard Nginx image in the docker-compose file and use volumes and bind mounts to bring the config into the container at deployment time? (And the analogous question for a SQL container.)

There's a bit of "it depends" here. The big question is, how much do you expect the person running the container to need to modify the configuration?
If the configuration is totally fixed then just build it into the container. A good example of this is an nginx configuration file for proxying a set of other containers in a Docker Compose setup: if you think the host names for the other containers you will use will never change, it's easier to build it into the image.
If the configuration has a limited number of things that can change, but the configuration language allows variable substitutions, then compile a config file referencing environment variables into the image, and use environment variables to adjust the configuration. The prototypical example of this in my mind is a Rails database.yml.erb file where you can substitute
host: <%= ENV['MYSQL_HOST'] %>
docker run --net some_network -e MYSQL_HOST=mysql myimage
If the configuration has a limited number of things that can change, you can also apply variable substitutions at startup time. sed can do this fine; if you otherwise have the GNU tools available (perhaps your image is Debian or Ubuntu-based) envsubst can do this straightforwardly. An entrypoint script can do this before you start the main program.
#!/bin/sh
# Fill in runtime values for configuration
sed -e "s/MYSQL_HOST/$MYSQL_HOST/" < database.conf.tpl > database.conf
# Run the CMD from the Dockerfile
exec "$@"
...
COPY entrypoint.sh /
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["myapp"]
docker run --net some_network -e MYSQL_HOST=mysql myimage
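One caveat with the plain sed line in the entrypoint above: it breaks if the value contains a character that sed treats specially, such as "/" or "&". A minimal sketch of a more defensive substitution follows. The helper names escape_sed_replacement and render_template are hypothetical, and the sketch assumes the placeholder is a plain word and the value has no newlines.

```shell
#!/bin/sh
# escape_sed_replacement VALUE: backslash-escape the characters that are
# special on the right-hand side of s|...|...| (backslash, ampersand, pipe).
escape_sed_replacement() {
    printf '%s' "$1" | sed -e 's/[\\&|]/\\&/g'
}

# render_template NAME VALUE: copy stdin to stdout, replacing every
# occurrence of the placeholder NAME with VALUE.
# Assumption: NAME is a plain word with no regex metacharacters.
render_template() {
    sed -e "s|$1|$(escape_sed_replacement "$2")|g"
}
```

In an entrypoint script this could be used as render_template MYSQL_HOST "$MYSQL_HOST" < database.conf.tpl > database.conf, followed by the same exec line as before.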
If there are many complex configuration choices or you expect the user to just wholesale replace the configuration, bind-mount it in.
docker run -v $PWD/application.conf:/app/application.conf myimage
If you're actually deploying into Kubernetes you can put the configuration file in a ConfigMap object, which essentially behaves the same way as the bind-mount option. If you're also using a tool like Helm to manage the deployment then you can use its templating layer to set the ConfigMap content. The Helm templating language is fairly involved and has loops and conditionals, so you can build up a complex configuration file based on deploy-time settings.
You also mention databases. The standard SQL database containers support placing content in a /docker-entrypoint-initdb.d directory, and this will get run the very first time the database starts up. I'd minimize the use of this, and prefer creating tables via a database-migration system. Mostly this is because those scripts only get run when the database is created the very first time, and you'll need a migration system anyways; you don't want to need to delete all of your data every time you change your schema.
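To illustrate the difference from run-once init scripts, here is a rough sketch of what a migration runner does: it applies only the files it has not applied before, so the schema can change without recreating the database. Everything here is hypothetical (the apply_migrations name, the state file, the directory layout). Real migration tools usually record applied migrations in a table inside the database rather than in a file.

```shell
#!/bin/sh
# apply_migrations DIR STATE_FILE CMD...: feed each not-yet-applied *.sql
# file in DIR to CMD on stdin, in lexical order, and record its name in
# STATE_FILE so a later run skips it.
apply_migrations() {
    dir=$1
    state=$2
    shift 2
    touch "$state"
    for f in "$dir"/*.sql; do
        [ -e "$f" ] || continue              # no .sql files at all
        name=$(basename "$f")
        if grep -qxF "$name" "$state"; then
            continue                         # applied on an earlier run
        fi
        "$@" < "$f" || return 1              # stop at the first failure
        echo "$name" >> "$state"
    done
}
```

A call might look something like apply_migrations ./migrations ./applied.txt mysql -h "$MYSQL_HOST" mydb, where the paths and database name are placeholders.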

Related

Deploying an application with database inside mysql container inside docker [duplicate]

I'm trying to wrap my head around Docker from the point of view of deploying an application which is intended to run on the user's own desktop. My application is simply a Flask web application and a Mongo database. Normally I would install both in a VM and forward a host port to the guest web app. I'd like to give Docker a try but I'm not sure how I'm meant to use more than one program. The documentation says there can be only one ENTRYPOINT, so how can I have both Mongo and my Flask application? Or do they need to be in separate containers, in which case how do they talk to each other, and how does this make distributing the app easy?

There can be only one ENTRYPOINT, but that target is usually a script that launches as many programs as are needed. You can additionally use, for example, Supervisord or similar to take care of launching multiple services inside a single container. This is an example of a docker container running mysql, apache and wordpress within a single container.
Say you have one database that is used by a single web application. Then it is probably easier to run both in a single container.
If you have a shared database that is used by more than one application, then it would be better to run the database in its own container and the applications each in their own containers.
There are at least two possibilities how the applications can communicate with each other when they are running in different containers:
Use exposed IP ports and connect via them.
Recent docker versions support linking.

I strongly disagree with some previous solutions that recommended running both services in the same container. The documentation clearly states that this is not recommended:
It is generally recommended that you separate areas of concern by using one service per container. That service may fork into multiple processes (for example, Apache web server starts multiple worker processes). It’s ok to have multiple processes, but to get the most benefit out of Docker, avoid one container being responsible for multiple aspects of your overall application. You can connect multiple containers using user-defined networks and shared volumes.
There are good use cases for supervisord or similar programs but running a web application + database is not part of them.
You should definitely use docker-compose to do that and orchestrate multiple containers with different responsibilities.

I had a similar requirement: running a LAMP stack, MongoDB and my own services.
Docker is OS-level virtualisation that isolates a container around a running process, so it requires at least one process running in the FOREGROUND.
So you provide your own startup script as the entry point. The script can start any number of services, as long as AT LEAST ONE SERVICE RUNS IN THE FOREGROUND, AND IT IS STARTED LAST.
So my Dockerfile has the two lines below at the very end:
COPY myStartupScript.sh /usr/local/myscripts/myStartupScript.sh
CMD ["/bin/bash", "/usr/local/myscripts/myStartupScript.sh"]
In my script I start MySQL, MongoDB, Tomcat, etc. At the end I run Apache in the foreground:
source /etc/apache2/envvars
/usr/sbin/apache2 -DFOREGROUND
This enables me to start all my services and keep the container alive, because the last service started stays in the foreground.
Hope it helps.
UPDATE: Since I last answered this question, new tools such as Docker Compose have come up. Compose lets you run each service in its own container while still tying them together through declared dependencies. Learn about docker-compose and use it; it is the more elegant approach unless it does not fit your needs.

Although it's not recommended, you can run 2 processes in the foreground by using wait. Just make a bash script with the following content, e.g. start.sh:
# runs 2 commands simultaneously:
mongod & # your first application
P1=$!
python script.py & # your second application
P2=$!
wait $P1 $P2
In your Dockerfile, start it with
CMD bash start.sh
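Note that wait $P1 $P2 returns only once both processes have exited, so if one of them crashes the container keeps running with half of its services. A minimal sketch of an alternative, assuming you would rather stop as soon as either process exits; wait_for_first_exit is a hypothetical helper that simply polls once a second.

```shell
#!/bin/sh
# wait_for_first_exit PID...: poll once a second and return as soon as
# any of the listed processes no longer exists.
wait_for_first_exit() {
    while :; do
        for pid in "$@"; do
            kill -0 "$pid" 2>/dev/null || return 0
        done
        sleep 1
    done
}

# Possible use in start.sh (commented out so this file only defines the helper):
#   mongod &            P1=$!
#   python script.py &  P2=$!
#   wait_for_first_exit "$P1" "$P2"
#   kill "$P1" "$P2" 2>/dev/null   # stop whichever one is still running
#   exit 1
```

If the script is known to run under bash 4.3 or later, the built-in wait -n does the same job without polling.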
I would recommend to set up a local Kubernetes cluster if you want to run multiple processes simultaneously. You can 'distribute' the app by providing them a simple Kubernetes manifest.

They can be in separate containers, and indeed, if the application was also intended to run in a larger environment, they probably would be.
A multi-container system would require some more orchestration to be able to bring up all the required dependencies, though in Docker v0.6.5+ there is a new facility to help with that built into Docker itself: Linking. With a multi-machine solution, it's still something that has to be arranged from outside the Docker environment, however.
With two different containers, the two parts still communicate over TCP/IP, but unless the ports have been locked down specifically (not recommended, as you'd be unable to run more than one copy), you would have to pass the new port that the database has been exposed as to the application, so that it could communicate with Mongo. This is again, something that Linking can help with.
For a simpler, small installation, where all the dependencies are going in the same container, having both the database and Python runtime started by the program that is initially called as the ENTRYPOINT is also possible. This can be as simple as a shell script, or some other process controller - Supervisord is quite popular, and a number of examples exist in the public Dockerfiles.

Docker provides a couple of examples on how to do it. The lightweight option is to:
Put all of your commands in a wrapper script, complete with testing
and debugging information. Run the wrapper script as your CMD. This is
a very naive example. First, the wrapper script:
#!/bin/bash
# Start the first process
./my_first_process -D
status=$?
if [ $status -ne 0 ]; then
echo "Failed to start my_first_process: $status"
exit $status
fi
# Start the second process
./my_second_process -D
status=$?
if [ $status -ne 0 ]; then
echo "Failed to start my_second_process: $status"
exit $status
fi
# Naive check runs checks once a minute to see if either of the processes exited.
# This illustrates part of the heavy lifting you need to do if you want to run
# more than one service in a container. The container will exit with an error
# if it detects that either of the processes has exited.
# Otherwise it will loop forever, waking up every 60 seconds
while /bin/true; do
ps aux |grep my_first_process |grep -q -v grep
PROCESS_1_STATUS=$?
ps aux |grep my_second_process |grep -q -v grep
PROCESS_2_STATUS=$?
# If the greps above find anything, they will exit with 0 status
# If they are not both 0, then something is wrong
if [ $PROCESS_1_STATUS -ne 0 -o $PROCESS_2_STATUS -ne 0 ]; then
echo "One of the processes has already exited."
exit -1
fi
sleep 60
done
Next, the Dockerfile:
FROM ubuntu:latest
COPY my_first_process my_first_process
COPY my_second_process my_second_process
COPY my_wrapper_script.sh my_wrapper_script.sh
CMD ./my_wrapper_script.sh

I agree with the other answers that using two containers is preferable, but if you have your heart set on bundling multiple services in a single container you can use something like supervisord.
In Hipache, for instance, the included Dockerfile runs supervisord, and the file supervisord.conf specifies that both hipache and redis-server are to be run.

If a dedicated script seems like too much overhead, you can spawn separate processes explicitly with sh -c. For example:
CMD sh -c 'mini_httpd -C /my/config -D &' \
&& ./content_computing_loop

In docker, there are two ways you can run a program
CMD
ENTRYPOINT
If you want to know the difference between them, please refer here
In CMD/ENTRYPOINT, there are two formats to run a command
SHELL format
EXEC format
SHELL format:
CMD executable_first arg1; executable_second arg1 arg2
ENTRYPOINT executable_first arg1; executable_second arg1 arg2
This form starts a shell, which executes the command above. Here you can use any shell syntax such as ";", "&", "|", etc., so you can run any number of commands. If you have a complex set of commands to run, you can create a separate shell script and use it.
CMD my_script.sh arg1
ENTRYPOINT my_script.sh arg1
EXEC format:
CMD ["executable", "parameter 1", "parameter 2", …]
ENTRYPOINT ["executable", "parameter 1", "parameter 2", …]
Note that only the first element is an executable. Everything from the second element onward is an argument to that executable.
To run multiple commands in EXEC format
CMD ["/bin/sh", "-c", "executable_first arg1; executable_second"]
ENTRYPOINT ["/bin/sh", "-c", "executable_first arg1; executable_second"]
In the commands above, we use the shell as the executable that runs the commands. This is the only way to run multiple commands in EXEC format.
The following are WRONG
CMD ["executable_first parameter", "executable_second parameter"]
ENTRYPOINT ["executable_first parameter", "executable_second parameter"]
CMD ["executable_first", "parameter", ";", "executable_second", "parameter"]
ENTRYPOINT ["executable_first", "parameter", ";", "executable_second", "parameter"]
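The reason the last two forms fail can be reproduced in an ordinary shell. This is only an analogy, not how Docker is implemented: exec_form and shell_form are hypothetical helpers that mimic passing an argument vector directly versus handing one string to /bin/sh -c.

```shell
#!/bin/sh
# exec_form ARGV...: run the argument vector directly, as the JSON-array
# form does. No shell parses the words, so ";" is just another argument.
exec_form() {
    "$@"
}

# shell_form STRING: hand one string to /bin/sh -c, as the shell form
# does. The shell interprets ";", "&", "|" and so on.
shell_form() {
    /bin/sh -c "$1"
}
```

Here exec_form echo one ';' echo two passes the semicolon to echo as a literal argument, while shell_form 'echo one; echo two' runs two separate commands. Likewise exec_form 'echo one' fails, because there is no executable whose name is the whole string "echo one".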

Can I run multiple programs in a Docker container?
Yes. But with significant risks.
Below is the same answer in more detail, with a recommended resolution, if you're interested.
Not Recommended
Warning: using the same container for multiple services is not recommended by the Docker community. The Docker documentation reads: "It is generally recommended that you separate areas of concern by using one service per container." Source:
• https://archive.ph/3Roa6#selection-307.2-307.100
• https://docs.docker.com/config/containers/multi-service_container/
If you choose to ignore the recommendation above, your container risks weaker security, increasing instability, and painful growth in the future.
If you are OK with those risks, the documentation on using one container for multiple services is at:
• https://archive.ph/3Roa6#selection-335.0-691.1
• https://docs.docker.com/config/containers/multi-service_container/
Recommended
If you need containers with stronger security, more stability, better performance, and room to scale in the future, the Docker community recommends these two steps:
Use one service per Docker container. The end result is that you will have multiple containers.
Use the Docker "Networking" feature to connect those containers as you like.

Cannot map agent.conf using Cygnus docker installation

I have a problem installing Cygnus using Docker as the source: I cannot work out where I should map my specific agent.conf.
The image I am using is from here.
When I try to map an agent.conf which has my specific setup into the container, the container starts and runs but the copy fails. Not only that, any change I make to the file inside the container won't stay; it returns to the previous default state.
Meanwhile I have no issues with grouping_rules.conf using the same approach.
I used both docker and docker compose, with the same results.
The path to which I try to copy: /opt/apache-flume/conf/agent.conf
docker run -v /home/igor/Documents/cygnus/agent.conf:/opt/apache-flume/conf/agent.conf fiware/cygnus-ngsi
Can someone who managed to run it with their own config tell me whether I have misunderstood the location of agent.conf or something? This is weird; I have used many Docker images and never had an issue where I could not copy from my machine to a Docker container.
Thanks in advance.
** EDIT **
Link of agent.conf

Did you copy the agent.conf file to your directory before starting the container?
As you can see here, when you define a volume with the "-v" option, Docker mounts the host file or directory at the given path inside the container, hiding whatever the image had there. Therefore, you must first provide the agent.conf file on your host.
The reason is that when using a "bind mounted" directory from the
host, you're telling docker that you want to take a file or directory
from your host and use it in your container. Docker should not modify
those files/directories, unless you explicitly do so. For example, you
don't want -v /home/user/:/var/lib/mysql to result in your
home-directory being replaced with a MySQL database.
If you do not have access to the agent.conf file, you can download the template in the source code from the official cygnus github repo here. You can also copy it once the docker container is running, using the docker cp option:
docker cp <containerId>:/file/path/within/container /host/path/target
Keep in mind that you will have to edit the agent.conf file to configure it for the database you are using. The official docs explain how to configure Cygnus to use different sinks such as MongoDB, MySQL, etc.
I hope I have been helpful.
Best regards!

docker commit mysql doesn't save

I am trying to create a docker image from a mysql container.
The problem is that the db of the new image is clean, but files/folders which I create manually in the origin container before the commit are copied.
The base mysql image is the official 5.6; docker is 1.11.
I checked that the folder /var/lib/mysql/d1 appears when a db is created, but the new image doesn't persist this folder, though folders in the / root are persisted.

Several things happening here:
First, docker commit is a code smell. It tends to be used by those creating images with a manual process, rather than automating their builds with a Dockerfile that would allow for easy recreation. If at all possible, I recommend you transition to a Dockerfile for your image creation.
Next, a docker commit will not capture changes made to a volume. And this same issue occurs if you try to update a volume with a RUN step in a Dockerfile. Both of these capture changes to the container filesystem and store those changes as a layer in the docker image, and the volumes are not part of the container filesystem. This is also visible if you run docker diff against a container. In this case, the upstream image has defined the volume in their Dockerfile:
VOLUME /var/lib/mysql
And docker does not have a command to undo a created volume from the Dockerfile. You would need to either directly modify the image definition from outside of docker (not recommended) or build your own upstream image with that step removed (recommended).
What the mysql image does provide is the ability to inject your own database creation scripts in /docker-entrypoint-initdb.d, which you can add with your own image that extends mysql, or mount as a volume. This is where you would inject your schema, or initialize from a known backup for development.
Lastly, if the goal is to have persistence, you should store your data in a volume, not by committing containers:
docker run -v mysql-data:/var/lib/mysql \
-e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql
The volume allows you to recreate the container, upgrade to a newer version of mysql when patches are released (e.g. security fixes) without losing your data.
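To back up the volume, this exports its contents to backup.tgz (tar is run without -z here, so the archive is a plain, uncompressed tar despite the name):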
docker run --rm -v mysql-data:/source busybox tar -cC /source . >backup.tgz
And to restore a volume, this creates one from a tgz:
docker run --rm -i -v mysql-data:/target busybox tar -xC /target <backup.tgz
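The same tar round trip can be tried outside Docker. The sketch below wraps it in two hypothetical helpers, backup_dir and restore_dir, and adds -z so that the result really is gzip-compressed. The directory arguments stand in for the /source and /target mount points used above.

```shell
#!/bin/sh
# backup_dir DIR: write a gzip-compressed tar of DIR's contents to stdout.
backup_dir() {
    tar -czf - -C "$1" .
}

# restore_dir DIR: unpack a gzip-compressed tar from stdin into DIR,
# creating DIR first if it does not exist.
restore_dir() {
    mkdir -p "$1" && tar -xzf - -C "$1"
}
```

For example, backup_dir /source > backup.tgz followed later by restore_dir /target < backup.tgz. Archiving "." relative to the directory, rather than the absolute path, is what lets the data be restored under a different mount point.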

You can make data persist by using docker commit command like below.
docker commit CONTAINER_ID REPOSITORY:TAG
docker commit | Docker Documentation
But just as BMitch's answer said, a docker commit will not capture changes made to a volume.
And usually you should use a volume to store data permanently and let a container be ephemeral without data being stored in itself.
So many people consider persisting data without a volume to be bad practice.
But there are some cases where you might consider committing and freezing data into an image.
For example, an image with all the tables and records in it is handy for automated tests in CI.
In the case of GitHub Actions, the only thing you need to do is pull the image, create the database container, and run tests against the database.
No need to think about migration of data.

'undo' or 'cancel' dockerfile VOLUME to share mysql DB in registry

I'm inheriting from the mysql Dockerfile and want to move a VOLUME (/var/lib/mysql) back inside the container so I can distribute it from a registry.
Is there a way in my downstream Dockerfile to (a) undo the VOLUME declaration or (b) replace /var/lib/mysql with a symlink?

I'm giving up on this -- seems simpler to distribute a zipped copy of the DB data directory. If you have a better option, please post.

I had the exact same problem, just with another database (arangodb).
However, I did not find a direct solution for this problem, but in my case (this should also work with mysql), I simply changed the data directory of my database to a non-volume directory in the Dockerfile.
For now, this seems like the best solution, as you can build a full image that contains your data.

As L0j1k has argued vividly, in general it is a very bad idea to have your data dir inside the container. However, there are situations where it makes sense. For automated tests, for example: run a container with test data, check that everything works as expected, and throw it away. Also, on OSX and Windows, volumes aren't native mounts (because docker runs in a VM) and they can be painfully slow. So you might be better off copying your data to and from the container, depending on your situation.
While you can't undo the VOLUME directive you can simply create a new data dir and tell Mysql to use that:
FROM mariadb:latest
# Create data dir in /var/lib/data
RUN mkdir /var/lib/data
RUN chown mysql:mysql /var/lib/data
# Change data dir from /var/lib/mysql to /var/lib/data
RUN sed -i 's/\/var\/lib\/mysql/\/var\/lib\/data/g' /etc/mysql/my.cnf
Use with caution.

DO NOT ship your database data in the same image as your database! This is an antipattern and will create bigger problems almost immediately. Ship the data separately as an archive which you then mount into your database container via bind-mount (-v /home/foo/db:/var/lib/mysql). Bind-mount volumes in your docker run statement will override any VOLUME Dockerfile directive. Alternatively, create some automation to dump the database and ship that to your containers, then restore using the dump. Whatever you do will be better than creating an image with your data in the database image. Just as one example of why this is a bad idea: What happens when you need to move the data/database mutant which now has changes? You'll probably use docker export to dump the entire container's filesystem into a new image, and now you're passing around a big blob of crap which is hard to audit. Docker containers (and microservices in general) are designed to be ephemeral and stateless, which means you can hose any one container and recreate it and it'll continue working. You can't do this if you ship your blob of data inside the database image.
With respect to the VOLUME directive in that Dockerfile: Remember that Dockerfiles are used during docker build and therefore do not (and cannot) contain host-dependent information or actions. So the VOLUME /var/lib/mysql isn't making your image impossible to distribute. What that directive does is create a generic (i.e. non-bind-mount) data volume that persists the data of that directory beyond the lifetime of the container. It is not the same thing as a bind-mount volume for example in docker run -v "/var/docker/app/data:/var/lib/mysql" .... This Dockerfile directive does not prevent you from distributing the image because it does not specify host-dependent information.

Hide/obfuscate environmental parameters in docker

I'm using the mysql image as an example, but the question is generic.
The password used to launch mysqld in docker is not visible in docker ps however it's visible in docker inspect:
sudo docker run --name mysql-5.7.7 -e MYSQL_ROOT_PASSWORD=12345 -d mysql:5.7.7
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b98afde2fab7 mysql:5.7.7 "/entrypoint.sh mysq 6 seconds ago Up 5 seconds 3306/tcp mysql-5.7.7
sudo docker inspect b98afde2fab75ca433c46ba504759c4826fa7ffcbe09c44307c0538007499e2a
"Env": [
"MYSQL_ROOT_PASSWORD=12345",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"MYSQL_MAJOR=5.7",
"MYSQL_VERSION=5.7.7-rc"
]
Is there a way to hide/obfuscate environment parameters passed when launching containers. Alternatively, is it possible to pass sensitive parameters by reference to a file?

Weirdly, I'm just writing an article on this.
I would advise against using environment variables to store secrets, mainly for the reasons Diogo Monica outlines here; they are visible in too many places (linked containers, docker inspect, child processes) and are likely to end up in debug info and issue reports. I don't think using an environment variable file will help mitigate any of these issues, although it would stop values getting saved to your shell history.
Instead, you can pass in your secret in a volume e.g:
$ docker run -v $(pwd)/my-secret-file:/secret-file ....
If you really want to use an environment variable, you could pass it in as a script to be sourced, which would at least hide it from inspect and linked containers (e.g. CMD source /secret-file && /run-my-app).
The main drawback with using a volume is that you run the risk of accidentally checking the file into version control.
A better, but more complicated solution is to get it from a key-value store such as etcd (with crypt), keywhiz or vault.
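As a sketch of the "source it at startup" idea: an entrypoint can read the secret from the mounted file into a variable just before starting the application, so the value never appears in the container's configured environment that docker inspect shows. The load_secret helper and the DB_PASSWORD / DB_PASSWORD_FILE names are hypothetical. The application and its child processes will still see the variable.

```shell
#!/bin/sh
# load_secret VAR: if VAR_FILE names a readable file, export VAR with the
# file's contents (trailing newlines stripped) and unset VAR_FILE.
# Otherwise leave VAR untouched.
load_secret() {
    var=$1
    eval "file=\${${var}_FILE:-}"
    if [ -n "$file" ] && [ -r "$file" ]; then
        value=$(cat "$file")
        export "$var=$value"
        unset "${var}_FILE"
    fi
}
```

An entrypoint script could call load_secret DB_PASSWORD and then exec "$@", with the container started with -e DB_PASSWORD_FILE=/secret-file and the file mounted as shown above.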

You ask, "Alternatively, is it possible to pass sensitive parameters by reference to a file?" From the docs at http://docs.docker.com/reference/commandline/run/: --env-file=[] Read in a file of environment variables.
