Running Couchbase under Docker on GCE and getting an error about the max number of open files

When I run Couchbase server in a Docker container on GCE, using the ncolomer/couchbase image, I'm getting this error:
The maximum number of open files for the couchbase user is set too low.
It must be at least 10240. Normally this can be increased by adding
the following lines to /etc/security/limits.conf:
couchbase soft nofile <value>
couchbase hard nofile <value>
Where <value> is greater than 10240.
The docs for ncolomer/couchbase recommend updating /etc/init/docker.conf to add limit nofile 262144, but I'm not sure that file is even available when using Docker under GCE.
I see a few options:
In the Dockerfile, run a script to modify /etc/security/limits.conf as suggested by the couchbase error.
Call ulimit -n 64000 in the Dockerfile
Any suggestions?

The problem with ulimit is that the limits are bounded by the limits of the Docker host process.
This is related to the known Docker issue #4717 (and, to a lesser extent, #1916).
As I understand it, the two options you mentioned should not work, since they only set the ulimit on the child process (i.e. the container). From there, I see no choice but to set the correct ulimit on the host before trying to increase it in your container.
The documented procedure should work fine, provided you are able to apply it.
I don't know the GCE platform very well, but if you have root access to your instance, you should just apply the changes to the /etc/init/docker.conf file, restart the Docker service, and fire up the Couchbase container.
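For reference, a minimal sketch of that procedure, assuming an Upstart-based host where the Docker daemon reads /etc/init/docker.conf (as the ncolomer/couchbase docs assume); the 262144 value is the one from those docs:
# On the Docker host, raise the daemon's file-descriptor limit (soft and hard)
echo "limit nofile 262144 262144" | sudo tee -a /etc/init/docker.conf
sudo service docker restart
# Then start the Couchbase container as usual
docker run -d --name couchbase ncolomer/couchbase
Newer Docker releases also accept docker run --ulimit nofile=<soft>:<hard> to set the limit per container.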

Related

"Too many open files" error during backward ingest of data in influxdb v2 on rootless containerd

I am trying to figure out how resource limits are handled in rootless container environments. I am running influxdb:latest (I tried :alpine as well) as a rootless container with nerdctl and containerd on an up-to-date Ubuntu host. While all other containers on the host run just fine and InfluxDB itself starts up fine, I get "too many open files" errors when ingesting about 3 years of data into the database. Data is sampled on a 24h basis, all in all 926 days with 5 points each. Logs indicate that the errors arise during shard creation.
From what I have read during the last hours, when running rootless containers the container 'root' user gets mapped to a UID from my personal user's namespace on the host system. From my understanding, the container's 'root' should therefore inherit my host user's limits on open files and processes. ulimit -u gives 14959 as the open files limit. lsof -u myuser | wc -l indicates about 833 open files during normal operation. Accordingly, there should be plenty of room for opening more files in the influxdb container. The errors start to happen while creating fewer than 100 shards on a newly set up database.
After hours of testing and googling I was able to mitigate the problem by setting --ulimit nofile=5000:5000 when running the influxdb container.
Can someone explain why this is necessary even though my user has a limit set way above what should be needed and above what is indicated by --ulimit?
I already tried mounting the influx data folder as a volume, as a bind mount, and without mounting at all. The issue stays the same, independent of how storage is handled. My impression is that the container gets started with a fairly low number of allowed open files. But why? I could not find anything related in either documentation.
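For the record, the run command with the mitigation looks roughly like this (the container name and port mapping are placeholders; the --ulimit value is the one mentioned above):
nerdctl run -d --name influxdb \
  --ulimit nofile=5000:5000 \
  -p 8086:8086 \
  influxdb:latest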

mysql-Docker with separate file-based log files and log rotation

So I have a mysql Docker up and running with 3 log files (general, error, slow-query log) enabled, that are written to /var/log/mysql/ (path inside the mysql container), which actually is a directory on the docker host (named 'log') and mounted into the container as a volume specified in the docker-compose.yml.
We chose this approach because we didn't want the general and slow-query logs combined on stdout, and we prefer a daily rotation of the 3 separate log files, since it is more convenient for us to find a certain query that was issued, let's say, 4 days ago.
Since the mysql Docker image (afaik) doesn't come with logrotate and/or cron, we decided to have another service in the docker-compose.yml named logrotator, which starts cron in its entrypoint, which in turn regularly runs logrotate with a given logrotate.conf. The 'log' directory is also mounted into the logrotator container, so it can do its rotation job on the mysql log files.
Now it seems like mysql needs a "mysqladmin flush-logs" after each rotation to start writing into a new file descriptor, but the logrotator container cannot issue this command inside the mysql container.
To make it short(er): I'm sure there are better ways to accomplish separate log files with log rotation. Just how? Any ideas are much appreciated. Thanks.
Update:
Since we're using mysql 5.7 as of now, and hence probably cannot use the solution proposed by #buaacss (which might absolutely work), we decided to stay with a "cron" container. Additionally, we installed docker.io inside the cron container and mounted the docker host's /var/run/docker.sock into the cron container. This allows us to use "docker exec" to issue commands (in this case 'mysqladmin flush-logs') from the cron container to be executed in the mysql container. Problem solved.
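For illustration, the logrotate.conf driving this setup could look roughly like the following (paths and the container name are placeholders; it assumes both the 'log' directory and /var/run/docker.sock are mounted into the cron container, and authentication for mysqladmin is omitted):
/var/log/mysql/*.log {
    daily
    rotate 7
    missingok
    notifempty
    compress
    delaycompress
    sharedscripts
    postrotate
        # tell the mysql container to reopen its log files after rotation
        docker exec mysql-container mysqladmin flush-logs
    endscript
}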
You can indeed use SIGHUP instead of the flush-logs statement, based on the doc:
https://dev.mysql.com/doc/refman/5.6/en/log-file-maintenance.html
but it may have some undesired effects, i.e. writing a huge status report to the error log.
So, as I mentioned in a comment, they developed a lighter version of SIGHUP, i.e. SIGUSR1, to accomplish the functions below:
FR1: When SIGUSR1 is sent to the server, it must flush the error log.
FR2: When SIGUSR1 is sent to the server, it must flush the general log.
FR3: When SIGUSR1 is sent to the server, it must flush the slow query log.
FR4: SIGUSR1 must not send MySQL status report.
Currently, when SIGHUP is sent to the server, a large report of information (the status report) is printed to stdout.
FR5: The server must not fail when SIGUSR1 is sent, even though the slow log is not enabled.
FR6: The server must not fail when SIGUSR1 is sent, even though slow log output is set to a table (log_output).
FR7: The server must not fail when SIGUSR1 is sent, even though the general log is set to OFF.
NFR1: SIGALRM must be indistinguishable from how SIGUSR1 behaved before.
Unfortunately, this signal is only available in MySQL 8.0 or above.
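As a sketch, sending either signal from the host (or from the cron container with the Docker socket mounted) could look like this, assuming mysqld is PID 1 in the container, as in the official image, and that the container is named mysql-container:
# MySQL 8.0+: flush the error, general and slow logs without the status report
docker kill --signal=SIGUSR1 mysql-container
# Older versions: SIGHUP also flushes the logs, but additionally dumps a status report
docker kill --signal=SIGHUP mysql-container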

How to track disk usage on Container-Optimized OS

I have an application running on a Compute Engine instance based on Container-Optimized OS.
My application runs every 20min, fetches and writes data to a local file, then deletes the file after some processing. Note that each file is less than 100KB.
My boot disk size is the default 10GB.
I run into "no space left on device" error every month or so while attempting to write the file locally.
How can I track disk usage?
I manually checked the size of the folders and it seems that the bulk of the space is taken by /mnt/stateful_partition/var/lib/docker/overlay2.
my-vm / # sudo du -sh /mnt/stateful_partition/var/lib/docker/*
20K /mnt/stateful_partition/var/lib/docker/builder
72K /mnt/stateful_partition/var/lib/docker/buildkit
208K /mnt/stateful_partition/var/lib/docker/containers
4.4M /mnt/stateful_partition/var/lib/docker/image
52K /mnt/stateful_partition/var/lib/docker/network
1.6G /mnt/stateful_partition/var/lib/docker/overlay2
20K /mnt/stateful_partition/var/lib/docker/plugins
4.0K /mnt/stateful_partition/var/lib/docker/runtimes
4.0K /mnt/stateful_partition/var/lib/docker/swarm
4.0K /mnt/stateful_partition/var/lib/docker/tmp
4.0K /mnt/stateful_partition/var/lib/docker/trust
28K /mnt/stateful_partition/var/lib/docker/volumes
TL;DR: Use Stackdriver Monitoring and create an alert for DISK usage.
Since you are using COS images, you can enable the Stackdriver Monitoring agent simply by setting the "google-monitoring-enabled" metadata key to "true" on the GCE instance. To do so, run the command:
gcloud compute instances add-metadata instance-name --metadata=google-monitoring-enabled=true
Replace instance-name with the name of your instance. Remember to restart your instance for the change to take effect. You don't need to install the Stackdriver Monitoring agent since it is already installed by default in COS images.
Then, you can use disk usage metric to get the usage of your disk.
You can create an alert to get a notification each time the usage of the partition reaches a certain threshold.
Since you are in the cloud, it is usually best to use cloud resources to solve cloud issues.
Docker uses /var/lib/docker to store your images, containers, and local named volumes. Deleting this can result in data loss and possibly stop the engine from running. The overlay2 subdirectory specifically contains the various filesystem layers for images and containers.
To clean up unused containers and images, run:
docker system prune
You can keep an eye on the directory with the watch command:
sudo watch "du -sh /mnt/stateful_partition/var/lib/docker/*"
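If overlay2 keeps growing because of old images and stopped containers, a periodic prune along these lines might help; note that -a removes all unused images and --volumes would also delete unused local volumes, so use them with care (the until filter value is just an example):
# show what is using the space from Docker's point of view
docker system df
# remove stopped containers, unused images and networks older than 24h
docker system prune -af --filter "until=24h"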

Google Compute Engine instance dead? How to reach it?

I have a small instance running in GCE. I had some trouble with MongoDB, so after some tries I decided to reset the instance. But... it didn't seem to come back online. So I stopped the instance and restarted it.
It is a Bitnami MEAN stack which starts Apache and other services at startup.
But... I can't reach the instance! No SCP, no SSH, no web service running. When I try to connect via SSH (in GCE) it times out; it can't make a connection on port 22. In the information it says 'The instance is booting up and sshd is not running yet', which is possible of course... But I can't reach the instance in any manner, not even after an hour's wait :) Not sure what's happening if I can't connect to it somehow :(
There is some activity in the console... some CPU usage, mostly 0%, some incoming traffic but no outgoing...
I hope someone can give me a hint here!
Update 1
After the helpful tip from Serhii... I found this in the logs...
Booting from Hard Disk 0...
[ 0.872447] piix4_smbus 0000:00:01.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
/dev/sda1 contains a file system with errors, check forced.
/dev/sda1: Inodes that were part of a corrupted orphan linked list found.
/dev/sda1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)
fsck exited with status code 4
The root filesystem on /dev/sda1 requires a manual fsck
Update 2...
So, I need to fsck the drive...
I created a snapshot, made a new disk from that snapshot, and added the new disk as an extra disk to another instance. Now that instance won't boot, with the same problem... removing the extra disk fixed it again. So adding the disk makes it crash even though it isn't the boot disk?
First, have a look at Compute Engine -> VM instances -> NAME_OF_YOUR_VM -> Logs -> Serial port 1 (console) and try to find errors and warnings that could be connected to a lack of free space or to SSH. It would be helpful if you updated your post with this information. In case your instance ran out of free space, follow these instructions.
You can try to connect to your VM via Serial console by following this guide, but keep in mind that:
The interactive serial console does not support IP-based access restrictions such as IP whitelists. If you enable the interactive serial console on an instance, clients can attempt to connect to that instance from any IP address.
You can find more details in the documentation.
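As a concrete example (instance name and zone are placeholders), enabling and using the interactive serial console from gcloud could look like this:
# enable the interactive serial console on the instance
gcloud compute instances add-metadata my-instance --zone=us-central1-a --metadata=serial-port-enable=TRUE
# connect to serial port 1
gcloud compute connect-to-serial-port my-instance --zone=us-central1-a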
Have a look at the Troubleshooting SSH guide and Known issues for SSH in browser. In addition, Google provides a troubleshooting script for Compute Engine to identify issues with SSH login/accessibility of your Linux based instance.
If you still have a problem try to use your disk on a new instance.
EDIT It looks like your test VM is trying to boot from the disk that you created from the snapshot. Try to follow this guide.
If you still have a problem, you can try to recreate the boot disk from a snapshot to resize it.
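For reference, a rough sketch of the disk-repair flow with gcloud (instance, disk, zone, and device names are placeholders; check the actual device with lsblk on the rescue VM):
# with the broken VM stopped, detach its boot disk (or use the disk created from the snapshot)
gcloud compute instances detach-disk broken-vm --disk=broken-disk --zone=us-central1-a
# attach it to a healthy rescue VM as a secondary data disk, not as a boot disk
gcloud compute instances attach-disk rescue-vm --disk=broken-disk --zone=us-central1-a
# on the rescue VM, repair the filesystem without mounting it first
sudo fsck -y /dev/sdb1
# then detach it again and reattach it as the boot disk of the original VM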

Is MariaDB data lost after Docker setting change?

I've set up a basic MariaDB instance running in Docker, basically by starting the container using the Kitematic UI, changing the settings, and letting it run.
Today, I wanted to make a backup, so I used Kitematic to change the port so I could access it from a machine to make automated backups. After changing the port in Kitematic, it seems to have started a fresh MariaDB container (i.e. all my data seems to be removed).
Is that the expected behavior? And, more importantly, is there any way to recover the seemingly missing data, or has it been completely removed?
Also, if the data is actually removed, what is the preferred way to change settings—such as the exposed ports—without losing all changes? docker commit?
Notes:
running Docker 1.12.0 beta for OS X
docker ps -a shows the database status as "Up for X minutes" when the original had been up for several days
Thanks in advance!
UPDATE:
It looks like the recommended procedure to retain data (without creating a volume or similar) is to:
commit changes (e.g. docker commit <containerid> <name/tag>)
take the container offline
update settings such as exposed port or whatever else
run the image with committed changes
...taken from this answer.
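Concretely, that sequence might look something like this (container name, tag, and port mapping are placeholders):
# save the container's current filesystem state as a new image
docker commit my-mariadb my-mariadb:backup
# take the container offline
docker stop my-mariadb && docker rm my-mariadb
# run the committed image with the new settings, e.g. a different exposed port
docker run -d --name my-mariadb -p 3307:3306 my-mariadb:backup
Keep in mind that docker commit only captures the container's filesystem; data living in a volume (the official mariadb image declares /var/lib/mysql as a volume) is not included, which is why the answers below recommend mounting a host directory there.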
Yes, this is expected behavior. If you want your data to be persistent, you should mount a volume from the host (via the --volume option for docker run) or from another container and store your database files in this volume.
docker run --volume /path/on/your/host/machine:/var/lib/mysql mariadb
Losing changes is actually a core feature of containers, so it cannot be avoided. This way you can be sure that with every docker run you get a fresh environment without any changes. If you want your changes to be permanent, you should make them in your image's Dockerfile, not in the container itself.
For more information please visit official documentation: https://docs.docker.com/engine/tutorials/dockervolumes/.
It looks like you don't mount the container volume to a certain path. You can read about volumes and storing data in containers here.
You need to run the container with the volume option:
$ docker run --name some-mariadb -v /my/own/datadir:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mariadb:tag
where /my/own/datadir is a directory on the host machine.
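If you prefer Compose over a long docker run command, a hypothetical equivalent service definition would be along these lines (image tag, password, and port are placeholders):
services:
  mariadb:
    image: mariadb:tag   # replace with the tag you use
    environment:
      MYSQL_ROOT_PASSWORD: my-secret-pw
    ports:
      - "3306:3306"
    volumes:
      - /my/own/datadir:/var/lib/mysql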