"Too many open files" error during backward ingest of data in influxdb v2 on rootless containerd - containers

I am trying to figure out how resource limits are handled in rootless container environments. I am running influxdb:latest (I tried :alpine as well) as a rootless container with nerdctl and containerd on an up-to-date Ubuntu host. All other containers on the host run fine, and InfluxDB itself starts up without problems, but I get "too many open files" errors when ingesting about 3 years of data into the database. The data is sampled once every 24 hours: 926 days in total, with 5 points each. The logs indicate that the errors arise during shard creation.
From what I have read over the last few hours, when running rootless containers the container's 'root' user is mapped to a UID from my personal user's namespace on the host system. As I understand it, the container's 'root' should therefore inherit my host user's limits on open files and processes. ulimit -u gives 14959 as open files limit. lsof -u myuser | wc -l indicates about 833 open files during normal operation. Accordingly, there should be plenty of room for the influxdb container to open more files. The errors start during the creation of shards numbered below 100 on a newly set up database.
After hours of testing and googling I was able to mitigate the problem by setting --ulimit nofile=5000:5000 when running the influxdb container.
Can someone explain why this is necessary even though my user's limit is well above both what should be needed and the value I pass with --ulimit?
I have already tried mounting the influx data folder as a volume and as a bind mount, and I have also tried without mounting at all. The issue stays the same regardless of how storage is handled. My impression is that the container gets started with a fairly low limit on open files. But why? I could not find anything related in the documentation of either tool.
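One way to compare the limits the host shell sees with the limits a process inside the container actually gets is to read them from within the process. The following is only a minimal sketch, assuming a Python interpreter is available wherever it is run (the influxdb image may not ship one; reading /proc/<pid>/limits on the host for the container's process is an alternative). Note that in bash, ulimit -n reports the open files limit (RLIMIT_NOFILE), while ulimit -u reports the maximum number of user processes (RLIMIT_NPROC). The function names describe_limits and fmt are made up for this example.

```python
import resource


def describe_limits():
    """Return the (soft, hard) limits of the current process."""
    return {
        # What `ulimit -n` reports: maximum number of open file descriptors.
        "nofile": resource.getrlimit(resource.RLIMIT_NOFILE),
        # What `ulimit -u` reports: maximum number of processes for the user.
        "nproc": resource.getrlimit(resource.RLIMIT_NPROC),
    }


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    for name, (soft, hard) in describe_limits().items():
        print(f"{name}: soft={fmt(soft)} hard={fmt(hard)}")
```

Running this once on the host and once inside a container started without --ulimit should show whether the two really share the same nofile value.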

Related

What if log file size exceeds my claimed Persistent volume?

I am working on having my application log to a Persistent Volume.
I am using OpenShift. I created storage (a Persistent Volume under the nas-thin class) and allocated 64 GiB to it. I added a mount path for this PV to one of my pods, where my application is running and writing logs into a folder named "logs".
My mount path is "/logs", so anything inside this folder ends up at the root of my PVC.
I am appending my logs to a single file inside the logs folder.
I tried to read about expanding PVs but couldn't understand much.
What would happen if my log file size exceeds the allocated PV size (which is 64 GiB)?
That will depend on the persistent storage actually being used by the cluster. Some persistent volume providers will let you write more data than you actually defined. So you'll have to test how your storage and your application actually behave on your particular cluster.
That being said, it is generally a bad idea to have a container workload log to a persistent volume. I would strongly recommend logging to STDOUT and then using an external logging system to manage your logs instead of writing to a file.
How will you deal with multiple replicas of your application running? Do you really want to go into each container to get the log files? How will you correlate logs between different replicas?
Applications running on OpenShift / Kubernetes should not manage their logs in files but write to STDOUT.
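As an illustration of the "write to STDOUT" advice, here is a minimal sketch in Python. It assumes the application can be changed to use a stream handler instead of a file; the logger name "app", the log format and the function name build_logger are arbitrary choices for this example, not anything OpenShift prescribes.

```python
import logging
import sys


def build_logger(name="app", stream=None):
    """Create a logger that writes one line per record to `stream`.

    With the default stream (sys.stdout) the container runtime captures
    the output, so no log file on a persistent volume is needed.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    build_logger().info("application started")
```

Each replica then emits its own stream, and the cluster's log collection can tag and correlate the lines per pod.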

mysql-Docker with separate file-based log files and log rotation

So I have a mysql Docker up and running with 3 log files (general, error, slow-query log) enabled, that are written to /var/log/mysql/ (path inside the mysql container), which actually is a directory on the docker host (named 'log') and mounted into the container as a volume specified in the docker-compose.yml.
We chose this way, because we didn't want general and slow-query logs combined on stdout and we prefer a daily rotation of the 3 separate log files, since it seems more comfortable to us to find a certain query that was issued - let's say - 4 days ago.
Since the mysql Docker image (afaik) doesn't come with logrotate and/or cron, we decided to have another service in the docker-compose.yml named logrotator, which starts cron in its entrypoint, which in turn regularly runs logrotate with a given logrotate.conf. The 'log' directory is also mounted into the logrotator container, so it can do its rotation job on the mysql log files.
Now it seems like mysql needs a "mysqladmin flush-logs" after each rotation to start writing into a new file descriptor, but the logrotator container cannot issue this command inside the mysql container.
To make it short(er): I'm sure there are better ways to accomplish separate log files with log rotation. Just how? Any ideas are much appreciated. Thanks.
Update:
Since we're using mysql 5.7 as of now, and hence probably cannot solve our issue by the solution as proposed by #buaacss (which might absolutely work), we decided to stay with a "cron" container. Additionally we installed docker.io inside the cron container and mounted the docker host's /var/run/docker.sock into the cron container. This allows us to use "docker exec" to issue commands (in this case 'mysqladmin flush-logs') from the cron container to be executed in the mysql container. Problem solved.
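For anyone wanting to script the "docker exec" step described in the update, a rough sketch follows. It assumes it runs inside the cron container with the docker CLI installed and the host's docker.sock mounted, as described above. The container name "mysql" and the credentials file path /etc/mysql/flush.cnf are hypothetical placeholders, and the function names are made up for this example.

```python
import subprocess


def flush_logs_command(container, defaults_file="/etc/mysql/flush.cnf"):
    """Build the argv that runs `mysqladmin flush-logs` inside `container`.

    `defaults_file` is a hypothetical option file inside the mysql
    container holding the credentials mysqladmin should use.
    """
    return [
        "docker", "exec", container,
        "mysqladmin", f"--defaults-extra-file={defaults_file}", "flush-logs",
    ]


def flush_logs(container):
    """Run the command and raise CalledProcessError if it fails."""
    subprocess.run(flush_logs_command(container), check=True)


if __name__ == "__main__":
    # Print the command only; call flush_logs("mysql") to actually run it.
    print(" ".join(flush_logs_command("mysql")))
```

A logrotate postrotate script could call something like this after each rotation.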
You can indeed use SIGHUP instead of a FLUSH LOGS statement, according to the documentation:
https://dev.mysql.com/doc/refman/5.6/en/log-file-maintenance.html
However, it may have some undesired effects, e.g. writing a large status report to the error log.
So, as I mentioned in my comment, a lighter version of SIGHUP was developed, namely SIGUSR1, to meet the requirements below:
FR1: When SIGUSR1 is sent to the server, it must flush the error log.
FR2: When SIGUSR1 is sent to the server, it must flush the general log.
FR3: When SIGUSR1 is sent to the server, it must flush the slow query log.
FR4: SIGUSR1 must not send MySQL status report.
Currently when SIGHUP is sent to the server a large report of information is printed to stdout, the status report.
FR5: The server must not fail when SIGUSR1 is sent, even though slow log is not enabled.
FR6: The server must not fail when SIGUSR1 is sent, even though slow log output is set to a table (log_output).
FR7: The server must not fail when SIGUSR1 is sent, even though general log is set to OFF.
NFR1: SIGALRM must be indistinguishable from how SIGUSR1 behaved before.
Unfortunately, this signal is only available in MySQL 8 or above.
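If the server is new enough, the signal can be delivered from outside the container without running anything inside it. The sketch below is only an illustration under two assumptions: mysqld is the main process of the container (so that docker kill reaches it), and the server version supports SIGUSR1 as described above. The container name "mysql" and the function name signal_flush_command are hypothetical.

```python
def signal_flush_command(container, major_version):
    """Build the argv that asks mysqld to flush its logs via SIGUSR1.

    Raises ValueError for servers older than MySQL 8, where this signal
    does not have the flush behaviour described above.
    """
    if major_version < 8:
        raise ValueError("SIGUSR1 log flushing needs MySQL 8 or above")
    # `docker kill --signal` sends the signal to the container's main process.
    return ["docker", "kill", "--signal=SIGUSR1", container]


if __name__ == "__main__":
    print(" ".join(signal_flush_command("mysql", 8)))
```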

How to track disk usage on Container-Optimized OS

I have an application running on a Compute Engine instance based on Container-Optimized OS.
My application runs every 20 minutes, fetches and writes data to a local file, then deletes the file after some processing. Note that each file is less than 100 KB.
My boot disk size is the default 10 GB.
I run into a "no space left on device" error every month or so while attempting to write the file locally.
How can I track disk usage?
I manually checked the size of the folders and it seems that the bulk of the space is taken by /mnt/stateful_partition/var/lib/docker/overlay2.
my-vm / # sudo du -sh /mnt/stateful_partition/var/lib/docker/*
20K /mnt/stateful_partition/var/lib/docker/builder
72K /mnt/stateful_partition/var/lib/docker/buildkit
208K /mnt/stateful_partition/var/lib/docker/containers
4.4M /mnt/stateful_partition/var/lib/docker/image
52K /mnt/stateful_partition/var/lib/docker/network
1.6G /mnt/stateful_partition/var/lib/docker/overlay2
20K /mnt/stateful_partition/var/lib/docker/plugins
4.0K /mnt/stateful_partition/var/lib/docker/runtimes
4.0K /mnt/stateful_partition/var/lib/docker/swarm
4.0K /mnt/stateful_partition/var/lib/docker/tmp
4.0K /mnt/stateful_partition/var/lib/docker/trust
28K /mnt/stateful_partition/var/lib/docker/volumes
TL;DR: Use Stackdriver Monitoring and create an alert for DISK usage.
Since you are using COS images, you can enable the Stackdriver Monitoring agent by simply adding the "google-monitoring-enabled" key set to "true" to the GCE instance metadata. To do so, run the command:
gcloud compute instances add-metadata instance-name --metadata=google-monitoring-enabled=true
Replace instance-name with the name of your instance. Remember to restart your instance for the change to take effect. You don't need to install the Stackdriver Monitoring agent since it is already installed by default in COS images.
Then, you can use disk usage metric to get the usage of your disk.
You can create an alert to get a notification each time the usage of the partition reaches a certain threshold.
Since you are in a cloud, it is always the best idea to use the Cloud resources to solve Cloud issues.
Docker uses /var/lib/docker to store your images, containers, and local named volumes. Deleting this can result in data loss and possibly stop the engine from running. The overlay2 subdirectory specifically contains the various filesystem layers for images and containers.
To clean up unused containers and images, use the command:
docker system prune
To monitor the directory sizes, use the "watch" command:
sudo watch "du -sh /mnt/stateful_partition/var/lib/docker/*"
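If you prefer a check that the application itself can run before writing its file, something along these lines may help. It is only a sketch: the 80 percent threshold is an arbitrary choice, the function names are made up for this example, and the percentage is computed from used/total, so it can differ slightly from what df prints.

```python
import shutil


def percent_used(total, used):
    """Return used space as a percentage of total (0.0 for an empty total)."""
    return 100.0 * used / total if total else 0.0


def check_disk(path="/", threshold=80.0):
    """Return (percent_used, over_threshold) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    pct = percent_used(usage.total, usage.used)
    return pct, pct >= threshold


if __name__ == "__main__":
    # On Container-Optimized OS the path from the question would be
    # /mnt/stateful_partition; "/" is used here so the sketch runs anywhere.
    pct, over = check_disk("/")
    print(f"{pct:.1f}% used, over threshold: {over}")
```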

Google compute engine, instance dead? How to reach?

I have a small instance running in GCE. I had some trouble with MongoDB, so after some tries I decided to reset the instance. But... it didn't seem to come back online. So I stopped the instance and restarted it.
It is a Bitnami MEAN stack which starts Apache and other services at startup.
But... I can't reach the instance! No SCP, no SSH, no web service running. When I try to connect via SSH (in GCE) it times out; it can't make a connection on port 22. The information says 'The instance is booting up and sshd is not running yet', which is possible of course... But I can't reach the instance in any way, not even after waiting an hour :) I am not sure what's happening since I can't connect to it at all :(
There is some activity in the console... some CPU usage, mostly 0%, and some incoming traffic but no outgoing...
I hope someone can give me a hint here!
Update 1
After the helpful tip from Serhii... I found this in the logs...
Booting from Hard Disk 0...
[ 0.872447] piix4_smbus 0000:00:01.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
/dev/sda1 contains a file system with errors, check forced.
/dev/sda1: Inodes that were part of a corrupted orphan linked list found.
/dev/sda1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)
fsck exited with status code 4
The root filesystem on /dev/sda1 requires a manual fsck
Update 2...
So, I need to fsck the drive...
I created a snapshot, made a new disk from that snapshot, and added the new disk as an extra disk to another instance. Now that instance won't boot, with the same problem... Removing the extra disk fixed it again. So adding the disk makes it crash even though it isn't the boot disk?
First, have a look at Compute Engine -> VM instances -> NAME_OF_YOUR_VM -> Logs -> Serial port 1 (console) and try to find errors and warnings that could be connected to a lack of free space or to SSH. It would be helpful if you updated your post with this information. If your instance has run out of free space, follow these instructions.
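If the serial console output is long, a small filter can help spot filesystem errors such as the ones quoted in the question. This is only a sketch: the two marker strings are taken from the log lines shown in the question, not from an exhaustive list of fsck messages, and the function name find_fsck_errors is made up. The text could be saved from the console page or fetched with gcloud compute instances get-serial-port-output.

```python
# Marker strings taken from the log excerpt in the question; other fsck
# failures may use different wording.
FSCK_MARKERS = ("UNEXPECTED INCONSISTENCY", "requires a manual fsck")


def find_fsck_errors(console_output):
    """Return the console lines that indicate a manual fsck is needed."""
    return [
        line.strip()
        for line in console_output.splitlines()
        if any(marker in line for marker in FSCK_MARKERS)
    ]


if __name__ == "__main__":
    import sys

    # Usage: pipe the saved serial console output into this script.
    for hit in find_fsck_errors(sys.stdin.read()):
        print(hit)
```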
You can try to connect to your VM via Serial console by following this guide, but keep in mind that:
The interactive serial console does not support IP-based access
restrictions such as IP whitelists. If you enable the interactive
serial console on an instance, clients can attempt to connect to that
instance from any IP address.
You can find more details in the documentation.
Have a look at the Troubleshooting SSH guide and Known issues for SSH in browser. In addition, Google provides a troubleshooting script for Compute Engine to identify issues with SSH login/accessibility of your Linux based instance.
If you still have a problem, try to use your disk on a new instance.
EDIT: It looks like your test VM is trying to boot from the disk that you created from the snapshot. Try to follow this guide.
If you still have a problem, you can try to recreate the boot disk from a snapshot to resize it.

Pre-existing MySQL data with Vagrant / VirtualBox

Background: I used to develop using MAMP and over the months/years I've accumulated a large mysql database (a few gigs) that I use for development for my different projects. I finally got around to setting up a VM using Vagrant and I've gotten everything set up and working nicely except my database. I'm running a CentOS 6.5 guest box on an OSX host.
My problem: I need my database to be completely persistent so I can vagrant up/destroy as many boxes as I need to, but the mysql persists.
My solution #1: I initially mounted a synced folder using vboxsf. This works pretty well and seems to be my best option so far, but performance is pretty bad. Query-intensive pages on my dev sites take 1-3 seconds to load whereas they might normally take under a second to load.
My solution #2: I then tried mounting a synced folder using nfs because the performance should be much better. The issue here is that mysql complains b/c, given the nature of nfs, it can't chown the data directory to the mysql:mysql user. I get the following errors when trying to start up the mysqld service:
chown: changing ownership of '/www/mysql': Operation not permitted
chmod: changing permissions of '/www/mysql': Permission denied
Sooo, my question is: are there any better ways to accomplish what I need? I feel like NFS would be the best solution, but I don't know how to get around the whole ownership/permission issues automatically with Vagrant. Any help would be appreciated.
I had the same requirement for my local dev setup on Mac, and I found a solution: a MySQL-only Vagrant box with the external data linked as a synced folder. It should run on Windows too, I guess.
Here is the Vagrant box config: https://github.com/ronnyhartenstein/vagrant-mysql-shared-folder
And if you understand German, here is my blog article with some background information and tests (and fails of course): http://blog.rh-flow.de/2014/11/11/es-hat-sich-ausgemampft-vagrant-ist/
First of all, let me start by saying this is not best practice. You may know yourself that this can lead to problems if e.g. your PC dies or you want to hand one project over to another person for development. Of course, especially as a one-person endeavour, there are more important things than having test data importers and stuff :) So let's look for solutions.
NFS Permissions
To get NFS permissions right, your users need to have the same UID and GID on host and guest. It's pretty tricky to set up and you should not change it from the guest. Maybe you can change it on the host to make the directory writeable for mysql and make the UID and GID the same. Of course, the moment the host changes this won't work anymore.
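To check whether the IDs actually line up, you can compare the numeric owner of the shared directory with the IDs the mysql user has in the guest. A minimal sketch, with a made-up function name; the example path is the data directory from the question, and looking up the mysql user via pwd assumes that user exists on the machine where this runs:

```python
import os


def ownership_matches(path, uid, gid):
    """Return True if `path` is owned by exactly this numeric UID and GID."""
    st = os.stat(path)
    return st.st_uid == uid and st.st_gid == gid


if __name__ == "__main__":
    import pwd

    # Assumes a "mysql" user exists; run this inside the guest.
    mysql = pwd.getpwnam("mysql")
    print(ownership_matches("/www/mysql", mysql.pw_uid, mysql.pw_gid))
```

Over NFS the numeric IDs are what counts, so the names on host and guest may differ as long as the numbers agree.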
rsync shared folder
Rsync might not be the fastest in terms of syncing, but if you create one rsync shared folder where only MySQL is writing and which syncs back to some folder on your host, this might be a solution. The "real" projects could still live inside a VirtualBox share or NFS and you don't need to bother with correct permissions.
There might be some other solutions as well:
Create a backup/restore strategy
One way to go would be to backup MySQL inside your vagrant box at various points, e.g. every day. You could also run the backup when the box is shut down, thus creating a backup right before you destroy the box. Placing this backup at a shared folder, you'd have up-to-date data in case you destroy a box. Performance should be pretty good as the data MySQL is using wouldn't be on a shared folder.
Run MySQL on host or other vagrant box
It's of course possible to connect from within your vagrant box to your host or another vagrant box which runs MySQL. Your host or this box could be long-lived and could serve as a central "MySQL Server" for all your projects.
Have a MySQL slave running on the same machine which writes to shared folder
I believe with MySQL a master/slave combination is possible. Running both on one machine with the master (which you use in your projects) living inside your vm and not writing anything to a shared folder and a slave which writes to your shared folder and is a mirror of your master. This would mean that you have high performance and a few secs of delay between writing something and having it written to your shared folder. Of course, keeping this setup running and making sure it works all the time can be tricky.
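For the backup/restore strategy mentioned above, a scheduled dump into a shared folder could look roughly like this. It is a sketch only: the target directory /vagrant/backups is a hypothetical location inside the default Vagrant share, the function name is made up, credentials are assumed to come from an option file such as ~/.my.cnf, and --single-transaction gives a consistent snapshot only for transactional tables such as InnoDB.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def backup_command(shared_dir, now=None):
    """Build a mysqldump argv writing a timestamped dump into `shared_dir`."""
    now = now or datetime.now()
    target = Path(shared_dir) / f"mysql-{now:%Y%m%d-%H%M%S}.sql"
    return [
        "mysqldump",
        "--all-databases",
        "--single-transaction",
        f"--result-file={target}",
    ]


if __name__ == "__main__":
    # Run from cron or a shutdown hook inside the guest.
    subprocess.run(backup_command("/vagrant/backups"), check=True)
```

Because MySQL's own data directory stays on the guest's local disk, the shared folder is only touched while the dump is written.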
You can use bindfs for changing the user/group of a share. I'm actually using a plugin called vagrant-bindfs which lets you remount a share with different ownership. It works, but I haven't tried it with mysql to see how it performs.
Relevant lines on my Vagrantfile:
unless Vagrant.has_plugin?("vagrant-bindfs")
  raise 'vagrant-bindfs is not installed! Please install with vagrant plugin install vagrant-bindfs'
end
config.vm.synced_folder "../", "/temp-nfs-mounts/sites-unbinded", type: :nfs
config.bindfs.bind_folder "/temp-nfs-mounts/sites-unbinded", "/sites", :force_user => "vagrant", :force_group => "vagrant", :create_as_user => true