What happens if my log file size exceeds my claimed persistent volume? (OpenShift)

I am working on logging my application's output to a persistent volume.
I am using OpenShift. I created storage (a persistent volume under the nas-thin storage class) and allocated 64GiB to it. I added a mount path for this PV to one of my pods, where my application is running and generating logs in a folder named "logs".
My mount path is "/logs", so everything inside this folder sits at the root of my PVC.
I am appending my logs to a single file inside the logs folder.
I tried to read about expanding PVs but couldn't understand much.
What would happen if my log file size exceeds the allocated PV size (64GiB)?

That will depend on the persistent storage actually backing your cluster. Some persistent volume providers will let you write more data than you defined, so you'll have to test how your storage and your application actually behave on your particular cluster.
That being said, it is generally a bad idea to have a container workload log to a persistent volume. I would strongly recommend logging to STDOUT and then using an external logging system to manage your logs instead of writing to a file.
How will you deal with multiple replicas of your application running? Do you really want to go into each container to get the log files? How will you correlate logs between different replicas?
Applications running on OpenShift / Kubernetes should not manage their logs in files but write to STDOUT.
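If you do stay with the persistent volume, you can at least check how full it is from inside the pod and, if your storage class allows expansion, grow the claim. A minimal sketch, assuming a pod named myapp-pod, a claim named logs-pvc, and a storage class with allowVolumeExpansion enabled (all hypothetical names):

# Check current usage of the mounted volume from inside the pod
oc rsh myapp-pod df -h /logs

# Grow the claim (only works if the storage class has allowVolumeExpansion: true)
oc patch pvc logs-pvc -p '{"spec":{"resources":{"requests":{"storage":"128Gi"}}}}'

# Watch the claim until the new size is reflected
oc get pvc logs-pvc -w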

Related

"Too many open files" error during backward ingest of data in influxdb v2 on rootless containerd

I am trying to figure out how resource limits are handled in rootless container environments. I am running influxdb:latest (tried :alpine as well) as a rootless container with nerdctl and containerd on an up-to-date Ubuntu host. While all the other containers on the host, and the InfluxDB startup itself, run just fine, I get "too many open files" errors when ingesting about 3 years of data into the database. The data is sampled on a 24h basis, 926 days in all with 5 points each. The logs indicate that the errors arise during shard creation.
From what I have read during the last hours, when running rootless containers the container 'root' user gets mapped to a UID from my personal user's namespace on the host system. From my understanding, the container's 'root' should therefore inherit my host user's limits on open files and processes. ulimit -u gives 14959 as open files limit. lsof -u myuser | wc -l indicates about 833 open files during normal operation. Accordingly, there should be plenty of room for the influxdb container to open more files. The errors start to happen during creation of shards numbering below 100 on a newly set up database.
After hours of testing and googling I was able to mitigate my problem by setting --ulimit nofile=5000:5000 when running the influxdb container.
Can someone explain why this is necessary even though my user has a limit set way above what should be needed, as indicated by ulimit?
I already tried mounting the influx data folder as a named volume, as a bind mount, and without mounting at all. The issue stays the same, independent of how storage is handled. My impression is that the container gets started with a fairly low limit on open files. But why? I could not find anything related in either documentation.
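For reference, the workaround mentioned above can be written out roughly like this (a sketch only; the container name, port mapping, and data volume are assumptions, not taken from the original setup):

# Raise the soft and hard open-file limits for the container explicitly
nerdctl run -d --name influxdb \
  --ulimit nofile=5000:5000 \
  -p 8086:8086 \
  -v influxdb-data:/var/lib/influxdb2 \
  influxdb:latest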

How to track disk usage on Container-Optimized OS

I have an application running on a Container-Optimized OS based Compute Engine instance.
My application runs every 20 minutes, fetches and writes data to a local file, then deletes the file after some processing. Note that each file is less than 100 KB.
My boot disk size is the default 10GB.
I run into a "no space left on device" error every month or so while attempting to write the file locally.
How can I track disk usage?
I manually checked the size of the folders and it seems that the bulk of the space is taken by /mnt/stateful_partition/var/lib/docker/overlay2.
my-vm / # sudo du -sh /mnt/stateful_partition/var/lib/docker/*
20K /mnt/stateful_partition/var/lib/docker/builder
72K /mnt/stateful_partition/var/lib/docker/buildkit
208K /mnt/stateful_partition/var/lib/docker/containers
4.4M /mnt/stateful_partition/var/lib/docker/image
52K /mnt/stateful_partition/var/lib/docker/network
1.6G /mnt/stateful_partition/var/lib/docker/overlay2
20K /mnt/stateful_partition/var/lib/docker/plugins
4.0K /mnt/stateful_partition/var/lib/docker/runtimes
4.0K /mnt/stateful_partition/var/lib/docker/swarm
4.0K /mnt/stateful_partition/var/lib/docker/tmp
4.0K /mnt/stateful_partition/var/lib/docker/trust
28K /mnt/stateful_partition/var/lib/docker/volumes
TL;DR: Use Stackdriver Monitoring and create an alert for disk usage.
Since you are using COS images, you can enable the Stackdriver Monitoring agent simply by setting the "google-monitoring-enabled" metadata key to "true" on the GCE instance. To do so, run the command:
gcloud compute instances add-metadata instance-name --metadata=google-monitoring-enabled=true
Replace instance-name with the name of your instance. Remember to restart your instance for the change to take effect. You don't need to install the Stackdriver Monitoring agent, since it is already installed by default in COS images.
Then you can use the disk usage metric to get the usage of your disk.
You can create an alert to get a notification each time the usage of the partition reaches a certain threshold.
Since you are running in the cloud, it is generally best to use the cloud's own tooling to solve cloud issues.
Docker uses /var/lib/docker to store your images, containers, and local named volumes. Deleting this can result in data loss and possibly stop the engine from running. The overlay2 subdirectory specifically contains the various filesystem layers for images and containers.
To clean up unused containers and images, run:
docker system prune
Monitor it with the watch command:
sudo watch "du -sh /mnt/stateful_partition/var/lib/docker/*"
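If a plain prune does not reclaim enough, a slightly more aggressive sketch follows (note that -a also removes unused tagged images, so only use it if every image you need can be pulled again):

# Show what Docker itself thinks is using the space
docker system df

# Remove stopped containers, unused networks, and all unused images
docker system prune -a -f

# Confirm the stateful partition has free space again
df -h /mnt/stateful_partition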

OpenShift MySQL persistent storage won't mount on PHP pod

I have a pod that runs PHP, and I have persistent MySQL storage created on OpenShift Online. Whenever I click the option "add storage to php" and set mysql as the storage with mount point /var/lib/mysql, the server attempts to redeploy, but the new container gets stuck creating and then fails. I get multiple error messages like this one:
Failed to attach volume "pvc-d4962378-aae0-11e7-8a41-0a2a2b777307" on node "ip-172-31-50-169.us-west-2.compute.internal" with: Error attaching EBS volume "vol-0087ade77401256f5" to instance "i-0b8b81e68bc629f01": VolumeInUse: vol-0087ade77401256f5 is already attached to an instance status code: 400, request id: dfbdac9b-bad0-4211-8158-080a4e120b1a. The volume is currently attached to instance "i-02a6b44c53ab0d7f2"
Isn't this the proper way to connect mysql storage to a pod?
An EBS volume can only be mounted on one node at a time in an OpenShift cluster. Your PHP and MySQL pods are separate applications that can land on different nodes, and as a result you can't mount the same persistent volume in both. The error is warning you of this.
The only way you can use a single EBS volume for PHP and MySQL at the same time is for them to run as separate containers of the same pod. You also need to ensure that the deployment strategy is set to Recreate rather than Rolling, as a rolling deployment creates the new instance while the old one still exists, and the same issue arises when the new and old pods land on different nodes.
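A minimal sketch of switching the strategy, assuming a DeploymentConfig named php (the name is a placeholder for whatever the deployment is actually called):

# Switch from the default Rolling strategy to Recreate so the old pod
# releases the EBS volume before the new pod tries to attach it
oc patch dc/php -p '{"spec":{"strategy":{"type":"Recreate"}}}'

# Trigger a new deployment with the updated strategy
oc rollout latest dc/php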

Access hard drive for VM that won't boot

The hard drive on my VM filled up (it was under an SSH login attack, I believe) and now the VM won't boot.
How can I access the boot hard drive so I can clear out space and get it booting again?
You'll need to delete the instance without deleting the boot disk, attach the disk to a temporary instance, and then mount it there. The instructions for doing this can be found here: https://cloud.google.com/compute/docs/troubleshooting#ssherrors ("Mount your disk on a temporary instance").
You'll then be able to access the disk through the /mnt/myinstance directory and delete files. Once the disk is no longer full, you can detach it from the temporary instance and use it to recreate your original instance.
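Roughly, the recovery looks like this (a sketch only; the names my-vm and rescue-vm, the device /dev/sdb1, and the paths are placeholders, and zone flags are omitted; adjust everything to your setup):

# Delete the broken instance but keep its boot disk
gcloud compute instances delete my-vm --keep-disks=boot

# Create a temporary rescue instance and attach the old boot disk to it
gcloud compute instances create rescue-vm
gcloud compute instances attach-disk rescue-vm --disk=my-vm

# On the rescue instance: mount the disk, free up space, then unmount
sudo mkdir -p /mnt/myinstance
sudo mount /dev/sdb1 /mnt/myinstance
sudo du -sh /mnt/myinstance/var/log/*   # find what is eating the space
sudo umount /mnt/myinstance

# Detach the disk and recreate the original instance from it
gcloud compute instances detach-disk rescue-vm --disk=my-vm
gcloud compute instances create my-vm --disk=name=my-vm,boot=yes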

configuring log4j in a clustered environment

We are using log4j for logging. The application runs in a clustered environment. How can I configure log4j so that all the instances log to the same log file?
One solution is to have a directory dedicated to logging. That directory can be on a network share (NFS, etc.) that is mounted at the same location for both processes. This could be as simple as mounting at the identical spot in the file structure, or it could be done using an environment variable ($LOGDIR) so each host can point to a different location in its local file structure.
The important thing is that the folder is shared, so that multiple processes write to the same file. The normal shared-resource restrictions apply, though; make sure the file isn't locked by one host in a way that prevents the other from writing, etc. Also, use an output pattern that includes the hostname/process name/thread id.
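A rough sketch of that setup on each cluster node (the NFS share, mount point, and jar name are placeholders, not part of the original question):

# Mount the shared log directory at the same path on every node
sudo mount -t nfs logserver:/exports/app-logs /var/log/myapp

# Start each instance with LOGDIR pointing at the shared mount, so a
# log4j configuration that references ${LOGDIR} resolves it from the
# JVM system property
java -DLOGDIR=/var/log/myapp -jar myapp.jar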
Another approach I've used is a database appender that writes to a log table. No share needed but you still need to design the table considering the issues with multi-process logging.