Singularity image size and internal disk usage (du) mismatch

I wanted to ask for clarification regarding disk usage (du) within a Singularity image and the apparent mismatch I'm seeing.
For example, if I list the container image information:
ls -s --block-size=M container.sif
428M container.sif
The image size is as displayed.
But if I check a directory inside the container that is neither mounted nor bound, and that was created during the build stage, I get the following result:
singularity exec container.sif du -sh /src
1.4G /src
The total disk usage of the container when checked from within is over 4.4G. I assumed this was due to files within my home directory that are bound, but now I'm not so sure.
Why is there a mismatch?
Thanks in advance :)

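The Singularity image format (SIF) stores the container's filesystem as SquashFS, a compressed read-only file system. On the host, ls reports the compressed size of the .sif file, while du inside the container reports the uncompressed size of the files. This lets the image be bigger on the inside and smaller on the host disk. A total taken from / inside the container can also include bind mounts such as your home directory; du -x keeps the count to a single file system. Compression is part of why Singularity images are smaller* than the Docker images they're built from.
For example: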
docker pull ubuntu:22.04
# 22.04: Pulling from library/ubuntu
# 2b55860d4c66: Pull complete
# ...
docker image ls ubuntu:22.04
# REPOSITORY TAG IMAGE ID CREATED SIZE
# ubuntu 22.04 2dc39ba059dc 38 hours ago 77.8MB
singularity pull docker://ubuntu:22.04
# INFO: Converting OCI blobs to SIF format
# INFO: Starting build...
# ...
du -h ubuntu_22.04.sif
# 29M ubuntu_22.04.sif
*: Docker also stores layer information that Singularity does not. A Docker image can reach a similar size if it is exported (via docker save) and gzipped, but it must then be decompressed and loaded again before it can be used.
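The same effect can be reproduced without any container tooling. The sketch below is only an illustration: gzip stands in for whatever compressor the SquashFS partition was built with, and the file names (src/data.c, image.tgz) are made up for the example. It compares the size of some repetitive files with the size of a compressed archive of them, which is roughly the relationship between du inside the container and ls on the .sif file.

```shell
# Scratch directory with a hypothetical "src" tree, like /src in the container
workdir=$(mktemp -d)
mkdir "$workdir/src"

# Repetitive text stands in for source code, which usually compresses well
for i in $(seq 1 20000); do
  echo "int value_$i = 0; /* placeholder line */"
done > "$workdir/src/data.c"

# Pack the tree into a gzip-compressed archive (stand-in for the image file)
tar -czf "$workdir/image.tgz" -C "$workdir" src

# Size seen "from inside" (the file itself) vs. size of the compressed archive
raw_bytes=$(wc -c < "$workdir/src/data.c" | tr -d ' ')
packed_bytes=$(wc -c < "$workdir/image.tgz" | tr -d ' ')
echo "uncompressed: $raw_bytes bytes"
echo "compressed:   $packed_bytes bytes"

rm -rf "$workdir"
```

How large the gap is depends entirely on the data: text and source code shrink a lot, while already-compressed files barely shrink at all.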

Cannot map agent.conf using Cygnus docker installation

I have a problem installing Cygnus using Docker: I cannot work out where I should map my specific agent.conf.
The image I am using is from here.
When I try to map an agent.conf that has my specific setup into the container, the container starts and runs, but the file is not copied. On top of that, any change I make to the file inside the container does not stay; it reverts to the previous default state.
I have no such issue with grouping_rules.conf using the same approach.
I tried both docker and docker compose, with the same results.
The path I am trying to map to is /opt/apache-flume/conf/agent.conf:
docker run -v /home/igor/Documents/cygnus/agent.conf:/opt/apache-flume/conf/agent.conf fiware/cygnus-ngsi
Can someone who managed to run it with their own config tell me whether I misunderstood the location of agent.conf? This is weird: I have used many Docker images and never had an issue where I was not able to copy a file from my machine into a container.
Thanks in advance.
** EDIT **
Link of agent.conf
Did you copy the agent.conf file to your directory before starting the container?
As you can see here, when you define a volume with the -v option and a host path, Docker does not copy anything: it bind-mounts the host file or directory over the path inside the container, hiding whatever the image had at that path. If the host path does not exist, Docker creates an empty directory there. Therefore, you must first provide the agent.conf file on your host.
The reason is that when using a "bind mounted" directory from the
host, you're telling docker that you want to take a file or directory
from your host and use it in your container. Docker should not modify
those files/directories, unless you explicitly do so. For example, you
don't want -v /home/user/:/var/lib/mysql to result in your
home-directory being replaced with a MySQL database.
If you do not have the agent.conf file, you can download the template from the source code in the official Cygnus GitHub repo here. You can also copy it out of a running container with the docker cp command:
docker cp <containerId>:/file/path/within/container /host/path/target
Keep in mind that you will have to edit the agent.conf file to configure it for the database you are using. The official docs describe how to configure Cygnus to use different sinks such as MongoDB, MySQL, etc.
I hope I have been helpful.
Best regards!

docker commit mysql doesn't save

I am trying to create a Docker image from a MySQL container.
The problem is that the database in the new image is clean, although files and folders that I create manually in the original container before the commit are copied.
The base MySQL image is the official 5.6 one, and Docker is 1.11.
I checked that the folder /var/lib/mysql/d1 appears when a database is created, but the new image doesn't persist this folder, though folders created under / are persisted.
Several things happening here:
First, docker commit is a code smell. It tends to be used by those creating images with a manual process, rather than automating their builds with a Dockerfile that would allow for easy recreation. If at all possible, I recommend you transition to a Dockerfile for your image creation.
Next, a docker commit will not capture changes made to a volume. And this same issue occurs if you try to update a volume with a RUN step in a Dockerfile. Both of these capture changes to the container filesystem and store those changes as a layer in the docker image, and the volumes are not part of the container filesystem. This is also visible if you run docker diff against a container. In this case, the upstream image has defined the volume in their Dockerfile:
VOLUME /var/lib/mysql
And docker does not have a command to undo a created volume from the Dockerfile. You would need to either directly modify the image definition from outside of docker (not recommended) or build your own upstream image with that step removed (recommended).
What the mysql image does provide is the ability to inject your own database creation scripts in /docker-entrypoint-initdb.d, which you can add with your own image that extends mysql, or mount as a volume. This is where you would inject your schema, or initialize from a known backup for development.
Lastly, if the goal is to have persistence, you should store your data in a volume, not by committing containers:
docker run -v mysql-data:/var/lib/mysql \
-e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql
The volume allows you to recreate the container, upgrade to a newer version of mysql when patches are released (e.g. security fixes) without losing your data.
To back up the volume, this exports it to a tgz (note the z flag, without which tar writes an uncompressed archive):
docker run --rm -v mysql-data:/source busybox tar -czC /source . >backup.tgz
And to restore a volume, this creates one from a tgz:
docker run --rm -i -v mysql-data:/target busybox tar -xzC /target <backup.tgz
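If the tar plumbing looks opaque, here is a minimal local sketch of the same round trip with no Docker involved. The temporary directories stand in for the volume mounted at /source and /target, and the file names inside them are made up for illustration. The archive is written to stdout and read back from stdin, which is what lets the shell redirections above work from outside the container. The explicit -f - is there because not every tar defaults to stdin/stdout the way busybox tar does.

```shell
# Hypothetical stand-ins for the volume contents and the restore target
source_dir=$(mktemp -d)
target_dir=$(mktemp -d)
backup_file=$(mktemp)

echo "row data" > "$source_dir/ibdata1"
mkdir "$source_dir/d1"
echo "table definition" > "$source_dir/d1/t1.frm"

# Backup: change into the source dir, archive "." to stdout, redirect to a file
tar -czf - -C "$source_dir" . > "$backup_file"

# Restore: read the archive from stdin and unpack it into the target dir
tar -xzf - -C "$target_dir" < "$backup_file"

# Read back one restored file to confirm the directory layout survived
restored=$(cat "$target_dir/d1/t1.frm")
echo "$restored"

rm -rf "$source_dir" "$target_dir" "$backup_file"
```

Because the archive is created relative to the source directory, it unpacks cleanly into any target directory, which is why the restore command can point at a brand-new volume.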
You can make data persist by using the docker commit command, like this:
docker commit CONTAINER_ID REPOSITORY:TAG
(See the docker commit page in the Docker documentation.)
But as BMitch's answer says, docker commit will not capture changes made to a volume.
Usually you should store permanent data in a volume and keep the container ephemeral, with no data stored in the container itself.
So I suspect many people consider persisting data without a volume to be bad practice.
There are some cases, though, where you might consider committing and freezing data into an image.
For example, an image that already contains all the tables and records is handy for automated tests in CI.
With GitHub Actions, all you need to do is pull the image, create the database container, and run the tests against the database.
There is no need to think about data migration.

'undo' or 'cancel' dockerfile VOLUME to share mysql DB in registry

I'm inheriting from the mysql Dockerfile and want to move a VOLUME (/var/lib/mysql) back inside the container so I can distribute it from a registry.
Is there a way in my downstream Dockerfile to (a) undo the VOLUME declaration or (b) replace /var/lib/mysql with a symlink?
I'm giving up on this -- seems simpler to distribute a zipped copy of the DB data directory. If you have a better option, please post.
I had the exact same problem, just with another database (arangodb).
However, I did not find a direct solution for this problem, but in my case (this should also work with mysql), I simply changed the data directory of my database to a non-volume directory in the Dockerfile.
For now, this seems like the best solution, as you can build a full image that contains your data.
As L0j1k has argued vividly, in general it is a very bad idea to have your data dir inside the container. However, there are situations where it makes sense, such as automated tests: run a container with test data, check that everything works as expected, and throw it away. Also, on OS X and Windows, volumes aren't native mounts (because Docker runs in a VM) and they can be painfully slow. So you might be better off copying your data to and from the container, depending on your situation.
While you can't undo the VOLUME directive, you can simply create a new data dir and tell MySQL to use that:
FROM mariadb:latest
# Create data dir in /var/lib/data and hand it to the mysql user
RUN mkdir /var/lib/data
RUN chown mysql:mysql /var/lib/data
# Change data dir from /var/lib/mysql to /var/lib/data
# (assumes the data dir is configured in /etc/mysql/my.cnf in this image)
RUN sed -i 's|/var/lib/mysql|/var/lib/data|g' /etc/mysql/my.cnf
Use with caution.
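One reason for the caution: the sed substitution rewrites every occurrence of /var/lib/mysql in the file, not just the datadir setting. The sketch below runs the same substitution on a made-up config file (the contents are an assumption for illustration, not the real my.cnf from the image) so you can see the effect before baking it into a build.

```shell
# Hypothetical config file standing in for /etc/mysql/my.cnf
conf=$(mktemp)
cat > "$conf" <<'EOF'
[mysqld]
datadir = /var/lib/mysql
socket = /var/lib/mysql/mysql.sock
EOF

# Same substitution as in the Dockerfile, but written to a new file
# instead of editing in place, so the original stays available to compare
new_conf=$(mktemp)
sed 's|/var/lib/mysql|/var/lib/data|g' "$conf" > "$new_conf"

datadir_line=$(grep '^datadir' "$new_conf")
socket_line=$(grep '^socket' "$new_conf")
echo "$datadir_line"
echo "$socket_line"

rm -f "$conf" "$new_conf"
```

In this sample the socket path moves along with the data dir. That may or may not be what you want, so it is worth checking the actual config in your base image before relying on a blanket replace.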
DO NOT ship your database data in the same image as your database! This is an antipattern and will create bigger problems almost immediately. Ship the data separately as an archive which you then mount into your database container via bind-mount (-v /home/foo/db:/var/lib/mysql). Bind-mount volumes in your docker run statement will override any VOLUME Dockerfile directive. Alternatively, create some automation to dump the database and ship that to your containers, then restore using the dump. Whatever you do will be better than creating an image with your data in the database image. Just as one example of why this is a bad idea: What happens when you need to move the data/database mutant which now has changes? You'll probably use docker export to dump the entire container's filesystem into a new image, and now you're passing around a big blob of crap which is hard to audit. Docker containers (and microservices in general) are designed to be ephemeral and stateless, which means you can hose any one container and recreate it and it'll continue working. You can't do this if you ship your blob of data inside the database image.
With respect to the VOLUME directive in that Dockerfile: Remember that Dockerfiles are used during docker build and therefore do not (and cannot) contain host-dependent information or actions. So the VOLUME /var/lib/mysql isn't making your image impossible to distribute. What that directive does is create a generic (i.e. non-bind-mount) data volume that persists the data of that directory beyond the lifetime of the container. It is not the same thing as a bind-mount volume for example in docker run -v "/var/docker/app/data:/var/lib/mysql" .... This Dockerfile directive does not prevent you from distributing the image because it does not specify host-dependent information.

How to add external NTFS drive to Google Cloud Engine app?

Since using docker takes up a lot of space for images, I would like to attach an external hard drive to my 10GB instance Ubuntu VM. However, I've added a blank disk and attached it, but I end up with this message when I type "fdisk -l":
Disk /dev/sdb doesn't contain a valid partition table.
How do I create an external NTFS drive and mount it to my filesystem?
Just as on any other Ubuntu instance, once you've attached an unformatted or unsuitably-formatted drive... one good set of instructions is for example at https://help.ubuntu.com/community/InstallingANewHardDrive .
To do it manually, run, as root (sudo bash for example):
$ apt-get install ntfsprogs
$ df -k # just to check nothing is mounted on /dev/sdb...
$ # umount /dev/sdb if df -k shows something mounted there
$ fdisk /dev/sdb # to create the partition table and a partition, see http://linux.die.net/man/8/fdisk
$ # if you need a tutorial, http://www.howtogeek.com/106873/how-to-use-fdisk-to-manage-partitions-on-linux/
$ mkfs.ntfs -f /dev/sdb1 # if you're in a hurry, or
$ # mkfs.ntfs /dev/sdb1 # if you have all the time in the world
Incidentally, this is a system administration question, not a software development one, so you might be happier asking it over at serverfault -- we do monitor the google-cloud-platform there, too.
Two side issues -- (1) why NTFS? You're unlikely to be using this PD with Windows, so a native Linux file system might be preferable... (2) what does this have to do with google-app-engine? Did you mistype that tag meaning actually google-compute-engine instead...?

Managing the selinux context of a file created on the host via a Docker container's volume

I ran through the fig Python/Django tutorial on Fedora 20 (docker 1.0.0), but it failed and tripped an AVC denial in SELinux when django-admin.py attempted to create the project files.
I reviewed the policy, and I can see that setting the docker_var_lib_t context on my code dir would permit docker to write there (although I've just spied docker_share_t in the policy, which looks like a better fit permissions-wise: no chr / blk devices in that context).
Code directory locations are not predictable, so setting a system-wide policy (via semanage fcontext) doesn't seem the best way forward; I'd need to introduce some kind of convention.
Is there any way to automatically set this context on volumes mounted from a host?
You can set the following context on the directory
chcon -Rt svirt_sandbox_file_t $HOME/code/export
then run your docker command as
docker run --rm -it -v $HOME/code/export:/exported:ro image /foo/bar