Huge static (mysql) database in docker - mysql

I am developing an application and am trying to implement a microservice architecture. For information about locations (cities, zip codes, etc.) I downloaded a MySQL database dump from opengeodb.org.
Now I want to provide the database as a docker container.
I set up a MySQL image with the following Dockerfile, as described in the docs for the mysql image:
FROM mysql
ENV MYSQL_ROOT_PASSWORD=mypassword
ENV MYSQL_DATABASE geodb
WORKDIR /docker-entrypoint-initdb.d
COPY ./sql .
EXPOSE 3306
The "sql" folder contains SQL scripts with the raw data as INSERT statements, so it creates the whole database. The problem is that the database is really huge, and setting it up takes a very long time.
So I thought maybe there is a way to save the created database inside an image, because it is a static database used for read-only operations only.
I am fairly new to docker and not quite sure how to achieve this.
I'm using docker on a Windows 10 machine.
EDIT:
I achieved my goal by doing the following:
I added the sql dump file as described above.
I ran the container and built the whole database with a local directory (the 'data' folder) mounted to /var/lib/mysql.
Then I stopped the container and edited the Dockerfile:
FROM mysql
ENV MYSQL_ROOT_PASSWORD=mypassword
ENV MYSQL_DATABASE geodb
WORKDIR /var/lib/mysql
COPY ./data .
EXPOSE 3306
So the generated database is now being copied from the local system into the container.
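The whole workflow can be sketched as shell commands (image and folder names are illustrative; on Windows, adjust the path syntax to your shell):

```shell
# 1. Build the image whose init scripts load the SQL dump on first start
docker build -t geodb-init .

# 2. Run it once with the data folder bind-mounted as the datadir,
#    so the generated database files land on the host
docker run --name geodb-tmp -v "$PWD/data:/var/lib/mysql" geodb-init

# 3. Once the import has finished (watch the logs), stop and remove it
docker stop geodb-tmp && docker rm geodb-tmp

# 4. Rebuild with the edited Dockerfile that COPYs ./data into the image
docker build -t geodb .
```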

You could create a volume with your container to persist the database on your local machine. When you first create the container, the SQL in /docker-entrypoint-initdb.d will be executed and the changes will be stored in the volume. The next time you start the container, MySQL will see that the schema already exists and won't run the scripts again.
https://docs.docker.com/storage/volumes/
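A minimal sketch of this approach (volume and container names are illustrative):

```shell
# First run: the scripts in /docker-entrypoint-initdb.d are executed
# and the resulting database files are written to the named volume
docker run -d --name geodb \
  -v geodb-data:/var/lib/mysql \
  -e MYSQL_ROOT_PASSWORD=mypassword \
  -e MYSQL_DATABASE=geodb \
  mysql

# Subsequent starts reuse the volume; MySQL finds an existing datadir
# and skips the init scripts
docker start geodb
```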

In principle you could achieve it like this:
start the container
load the database
perform a docker commit to build an image of the current state of the container.
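Sketched with the docker CLI (container and image names are illustrative):

```shell
# start a container and load the dump once the server accepts connections
docker run -d --name geodb-seed \
  -e MYSQL_ROOT_PASSWORD=mypassword -e MYSQL_DATABASE=geodb mysql
docker exec -i geodb-seed mysql -uroot -pmypassword geodb < dump.sql
docker stop geodb-seed

# snapshot the container's filesystem as a new image
docker commit geodb-seed my-geodb-image
```

One caveat: docker commit does not capture data stored in volumes, and the official mysql image declares /var/lib/mysql as a volume, so for this to actually bake the data into the image the server must be configured with a datadir outside that volume.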
The other option would be to load the database at image build time, but for that you would have to start MySQL yourself, similarly to how it's done in the entrypoint script:
start mysqld in the background
wait for it to initialize
load the data using mysql dbname < dump.sql
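Those steps might look roughly like the following build script (paths and the database name are illustrative; this assumes the build stage can reach the server as root without a password, as during the image's init phase):

```shell
#!/bin/sh
set -e

# start mysqld in the background
mysqld --user=mysql &

# wait for the server to accept connections
until mysqladmin ping --silent; do
  sleep 1
done

# load the data
mysql -uroot geodb < /tmp/dump.sql

# shut down cleanly so the data files are left in a consistent state
mysqladmin shutdown
```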

Related

How to make a stateless mysql docker container

I want to make a mysql docker image that imports some initial data in the build process.
Afterwards, when used in a container, the container stays stateless, meaning the data added while the container is running does not survive destroying and recreating the container, but the initial data is still there.
Is this possible? How would I set up such an image and container?
I suggest creating the MySQL tables as needed in a SQL script, or directly in a local MySQL instance and exporting them to a file.
With this file in hand, create a Dockerfile which builds on the MySQL container. Add to this another entrypoint script which injects the SQL script into the database.
You don't write anything about mounting volumes. You may want a data volume for the database, or you may configure MySQL to keep everything in memory.
For added "statelessness" you may want to DROP all tables in your SQL script too.
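One hedged way to combine the "keep everything in memory" option with statelessness is a tmpfs mount over the datadir: every fresh start then begins with an empty datadir, so the init scripts re-import the seed data, and any runtime changes vanish with the container (image and database names are illustrative):

```shell
docker run --rm \
  --tmpfs /var/lib/mysql \
  -e MYSQL_ROOT_PASSWORD=mypassword \
  -e MYSQL_DATABASE=mydb \
  my-seeded-mysql
```

Note that the seed import then runs on every start, which may be slow for large datasets.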
I think what you need is a multi-stage build:
FROM mysql:5.7 as builder
# needed for initialization
ENV MYSQL_ROOT_PASSWORD=somepassword
ADD initialize.sql /docker-entrypoint-initdb.d/
# The entrypoint script does the DB initialization but then also runs the mysql daemon; removing the exec line makes it initialize only
RUN ["sed", "-i", "s/exec \"$@\"/echo \"not running $@\"/", "/usr/local/bin/docker-entrypoint.sh"]
RUN ["/usr/local/bin/docker-entrypoint.sh", "mysqld", "--datadir", "/initialized-db"]
FROM mysql:5.7
COPY --from=builder /initialized-db /var/lib/mysql
You can put your initialization scripts in initialize.sql (or choose a different way to initialize your database).
The resulting image is a database that is already initialised. You can use it and throw it away as you like.
You can also use this process to create different images (tag them differently) for different use cases.
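Building and using the resulting image might look like this (the tag is illustrative):

```shell
docker build -t my-initialized-db .

# each container starts with the data already in place; the root
# password is the one set in the builder stage
docker run --rm -d -p 3306:3306 my-initialized-db
```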
Hope this answers your question.

Creating, populating, and using Docker Volumes

I've been plugging around with Docker for the last few days and am hoping to move my Python-MySQL webapp over to Docker here soon.
The corollary is that I need to use Docker volumes and have been stumped lately. I can create a volume directly by
$ docker volume create my-vol
Or indirectly by referencing a nonexistent volume in a docker run call. But I cannot figure out how to populate these volumes with my .sql database file, other than copying the file over via a COPY instruction in the Dockerfile.
I've tried creating the volume directly in the directory containing the .sql file (the first method above), and mounting the directory containing the .sql file in my docker run call. The latter does move the .sql file into the container (I've seen it by navigating the bash shell inside the container), but when I run a mariadb container connecting to the database-containing mariadb container (as suggested in the mariadb docker README), it only has the standard databases (information_schema, mysql, performance_schema).
How can I create a volume containing my pre-existing .sql database?
When working with mariadb in a Docker container, the image supports running .sql files as part of the container's first startup. This allows you to push data into the database before it is made accessible.
From the mariadb documentation:
Initializing a fresh instance
When a container is started for the first time, a new database with the specified name will be created and initialized with the provided configuration variables. Furthermore, it will execute files with extensions .sh, .sql and .sql.gz that are found in /docker-entrypoint-initdb.d. Files will be executed in alphabetical order. You can easily populate your mariadb services by mounting a SQL dump into that directory and provide custom images with contributed data. SQL files will be imported by default to the database specified by the MYSQL_DATABASE variable.
This means that to inject data into the container on its first startup, you COPY the .sql file in your Dockerfile into the image at the path /docker-entrypoint-initdb.d/myscript.sql; it will then be run against the database specified in the MYSQL_DATABASE environment variable.
Like this:
FROM mariadb
COPY ./myscript.sql /docker-entrypoint-initdb.d/myscript.sql
Then:
docker run -e MYSQL_DATABASE=mydb mariadb
There is then the question of how you want to manage the database storage. You basically have two options here:
Create a volume binding to the host, where mariadb stores the database. This will enable you to access the database storage files easily from the host machine.
An example with docker run:
docker run -v /my/own/datadir:/var/lib/mysql mariadb
Create a docker volume and bind it to the storage location in the container. This will be a volume that is managed by docker. This volume will persist the data between restarts of the container.
docker volume create my_mariadb_volume
docker run -v my_mariadb_volume:/var/lib/mysql mariadb
This is also covered in the docs for the mariadb docker image. I can recommend reading them from top to bottom if you are going to use this image.

How to properly move a mariadb database from a container to the host

I have a smallish webapp running in a Docker container. It uses a mariadb database running in another container on the same box, based on the official "mariadb" image.
When I first set up these containers, I started the mariadb container using an "internal" database. I gave the "/var/lib/mysql" a volume name, but I didn't map it to a directory on the host ("-v vol-name:/var/lib/mysql"). Actually, I'm not even sure why I gave it a volume name. I set this up several months ago, and I'm not sure why I would have done that specifically.
In any case, I've concluded that having a database internal to the container wasn't a good idea. I've decided I really need to have the actual database stored on the host and use a volume mapping to refer to it. I know how to do this if I was setting this up from scratch, but now that the app is running, I need to move the database to the host and restart the container to point to that. I'm not certain of all the proper steps to make this happen.
In addition, I'm also going to need to set up a second instance of this application, using containers based on the same images. The second database will also be stored on the host, in a directory next to the other one. I can initialize the second db with the backup file from the first one, but I'll likely manually empty most of the tables in the second instance.
I did use mysqldump inside the container to dump the database, then I copied that backup file to the host.
I know how to set a volume mapping in "docker run" to map /var/lib/mysql in the container to a location on the host.
At this point, I'm not certain exactly what to do with this backup file so I can restart the container with the modified volume mapping. I know I can run "mysql dbname < backup.sql", but I'm not sure of the consequences of that.
While the container is running, run docker cp -a CONTAINER:/var/lib/mysql /local/path/to/folder to copy the MariaDB databases from the container to your local machine. Replace "CONTAINER" with the name or ID of your MariaDB container.
Once you've done that, you can stop the container and restart it binding /local/path/to/folder to the container's /var/lib/mysql path.
If you're using an older version of docker that does not support the -a or --archive flag, you can copy the files without it, but you'll need to make sure that the folder on the host machine has the proper ownership: its UID and GID must match those of the folder in the Docker container.
Note: if you're using SELinux, you might need to set the proper permissions as well, as the documentation for the MariaDB image states:
Note that users on host systems with SELinux enabled may see issues with this. The current workaround is to assign the relevant SELinux policy type to the new data directory so that the container will be allowed to access it:
$ chcon -Rt svirt_sandbox_file_t /my/own/datadir
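Put together, the move might look like this (container name and host path are illustrative; the destination folder should not already exist, so docker cp creates it with the copied contents):

```shell
# copy the datadir out of the running container, preserving ownership
docker cp -a db:/var/lib/mysql /srv/mariadb-data

# stop and remove the old container
docker stop db && docker rm db

# start a new one with the host directory bind-mounted as the datadir
docker run -d --name db \
  -v /srv/mariadb-data:/var/lib/mysql \
  mariadb
```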

How to make a docker image with a populated database for automated tests?

I want to create containers w/ a MySQL db and a dump loaded for integration tests. Each test should connect to a fresh container, with the DB in the same state. It should be able to read and write, but all changes should be lost when the test ends and the container is destroyed. I'm using the "mysql" image from the official docker repo.
1) The image's docs suggest taking advantage of the "entrypoint" script that imports any .sql files you provide in a specific folder. As I understand it, this will import the dump again every time a new container is created, so it's not a good option. Is that correct?
2) This SO answer suggests extending that image with a RUN statement to start the mysql service and import all dumps. This seems to be the way to go, but I keep getting
mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
followed by
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
when I run build, even though I can connect to mysql fine on containers of the original image. I tried sleep 5 to wait for the mysqld service to startup, and adding -h with 'localhost' or the docker-machine ip.
How can I fix "2)"? Or, is there a better approach?
If re-seeding the data is an expensive operation, another option would be starting/stopping a Docker container (previously built with the DB and seed data). I blogged about this a few months ago, Integration Testing using Spring Boot, Postgres and Docker; although the post focuses on Postgres, the idea is the same and can be translated to MySQL.
The standard MySQL image is pretty slow to start up, so it might be useful to use something that has been prepared more for this situation, like this:
https://github.com/awin/docker-mysql
You can include data or use with a Flyway situation too, but it should speed things up a bit.
How I've solved this before is using a Database Migration tool, specifically flyway: http://flywaydb.org/documentation/database/mysql.html
Flyway is more for migrating the database schema as opposed to putting data into it, but you could use it either way. Whenever you start your container, just run the migrations against it and your database will be set up however you want. It's easy to use, and you can also just use the default MySQL docker container without messing around with any settings. Flyway is also nice for many other reasons, like having version control for a database schema and the ability to perform migrations on production databases easily.
To run integration tests with a clean DB, I would just have an initial dataset that you insert before the test, then afterwards truncate all the tables. I'm not sure how large your dataset is, but I think this is generally faster than restarting a mysql container every time.
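The truncate-between-tests idea can be sketched as a shell helper (host, credentials, and database name are illustrative; tables referenced by foreign keys may require disabling FOREIGN_KEY_CHECKS first):

```shell
# generate a TRUNCATE statement for every table in the test database
# and pipe the statements back into mysql
mysql -h 127.0.0.1 -uroot -psecret -N -e \
  "SELECT CONCAT('TRUNCATE TABLE ', table_name, ';')
   FROM information_schema.tables WHERE table_schema = 'testdb';" \
  | mysql -h 127.0.0.1 -uroot -psecret testdb
```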
Yes, the data will be imported every time you start a container. This could take a long time.
You can view an example image that I created
https://github.com/kliewkliew/mysql-adventureworks
https://hub.docker.com/r/kliew/mysql-adventureworks/
My Dockerfile builds an image by installing MySQL, importing a sample database (from a .sql file), and setting the entrypoint to auto-start the MySQL server. When you start a container from this image, it will have the data pre-loaded in the database.

seeding mysql data in a docker build

I'm attempting to build a docker image that will include MySQL and some seed data, and I'm trying to figure out how to insert the data into the database during the docker build phase.
It seems I need to start the MySQL engine, invoke a command to run some SQL statements, and then shut down the MySQL engine. Any good ideas on how to best do that?
When a container is first started, the folder /docker-entrypoint-initdb.d is checked for files to seed the DB with; this happens at first container start, not at image build time. Below is the corresponding paragraph from the section Initializing a fresh instance on the official MySQL Docker Hub page.
When a container is started for the first time, a new database with the specified name will be created and initialized with the provided configuration variables. Furthermore, it will execute files with extensions .sh, .sql and .sql.gz that are found in /docker-entrypoint-initdb.d. Files will be executed in alphabetical order. You can easily populate your mysql services by mounting a SQL dump into that directory and provide custom images with contributed data. SQL files will be imported by default to the database specified by the MYSQL_DATABASE variable.
This blog post might help you.
Essentially, the steps to be followed are:
1. Create a file (say seed_data.sh) and put it in the same directory as your Dockerfile.
2. In the Dockerfile, add the following lines:
ADD resources/seed_data.sh /tmp/
RUN chmod +x /tmp/seed_data.sh
RUN /tmp/seed_data.sh
RUN rm /tmp/seed_data.sh
The file seed_data.sh contains the code for starting the mysql server, logging into it, and inserting the data.
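A sketch of what seed_data.sh might contain (database name and dump path are illustrative; this assumes the script runs as root inside the image before a root password is enforced):

```shell
#!/bin/sh
set -e

# start the server in the background
mysqld_safe &

# wait until it accepts connections
until mysqladmin ping --silent; do
  sleep 1
done

# create the schema and load the seed data
mysql -e "CREATE DATABASE IF NOT EXISTS mydb"
mysql mydb < /tmp/seed.sql

# shut down cleanly so the data files end up consistent in the image layer
mysqladmin shutdown
```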