I am trying to build a large image with Podman. The build fails with a "Reached heap limit Allocation failed" error.
In Docker I can avoid this by allocating more memory to the Docker engine in Docker's settings. For Podman, however, it doesn't seem to be so easy.
I tried modifying ~/.config/containers/podman/machine/qemu/podman-machine-default.json to increase the memory, then running
podman machine stop
podman machine start
but that did not solve the problem.
Any ideas?
So after playing around I found out that I was doing the right thing, but the memory amount needs to be changed in two places in the mentioned file (I had changed only one), as sketched below:
in the "CmdLine" section, after the -m flag
in the "Memory" field itself
The following commands are also very handy for a quick resolution.
podman machine stop
podman machine rm
podman machine init --cpus 2 --memory 2048
podman machine start
Source: https://github.com/containers/podman/issues/12713#issuecomment-1002567777
As a first... yes... yes, I know there are 1000 questions and solutions to this. But unfortunately none of them helps me.
Let's get to the problem:
I have a running Docker container in which MySQL is configured. Now I would like to change the bind address from 127.0.0.1 to 0.0.0.0. Unfortunately I can't open my.cnf because I don't have nano or vim installed. With apk, yum, vim, apt-get and so on I get:
apt-get: command not found
apk: command not found
...
Could someone maybe help me out with my little problem?
Best thanks and greetings
The default base for the MySQL Docker image has been changed to an Oracle-based Linux distribution. In this distribution, the default package manager is yum. If for whatever reason you still want to use apt, pull the Debian image explicitly, something like mysql:8-debian.
See this issue for more detail.
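If you stay on the Oracle-based image, installing an editor inside the container could look like this (a sketch; run as root, and note that mysql-container is a placeholder name):

docker exec -u 0 -it mysql-container bash
yum install -y vim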
You could do a docker cp to copy the file out of the container, edit it, and then docker cp it back in again. This may be fine if you need to do it for troubleshooting, but you probably want to fix this in your deployment process instead: you should be able to destroy and re-create the Docker container without having to patch configuration by hand. That belongs in your Dockerfile, or perhaps in copying the correct configuration file in via your Docker Compose file.
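A minimal sketch of that round trip (the container name mysql1 and the config path are assumptions; Debian-based MySQL images typically keep the config under /etc/mysql/):

docker cp mysql1:/etc/mysql/my.cnf ./my.cnf
# edit ./my.cnf locally: set bind-address = 0.0.0.0
docker cp ./my.cnf mysql1:/etc/mysql/my.cnf
docker restart mysql1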
I just installed CentOS 8 and added Node.js (tried v12 & v14), and then I installed pm2 using npm install pm2@latest (so at the time of posting it uses v4.4.0). I did try an older version (v3.5.0), but it does the exact same thing.
After pm2 was installed, I ran the command "pm2 startup".
After a restart, pm2 does start, but it gets killed after 90 seconds and then restarts, giving this message:
"pm2 has been killed by signal, dumping process list before exit..."
At first I thought it was because of my app (the one that pm2 is supposed to manage), but I removed it from pm2, so it's practically empty, and it still does the same thing.
Running the following command as root worked for me:
pm2 update
I had the same issue and tried several solutions online, but none worked for me. However, I completely removed pm2, restarted the server, and reinstalled pm2, and that did it for me.
1- Stop and remove pm2
pm2 kill
sudo npm remove pm2 -g
2- Restart the server
sudo reboot
3- Log in again, then reinstall pm2
sudo npm install -g pm2
I did not disable SELinux (I think it's not safe to disable it), but the following method helped me:
Edit the file /etc/systemd/system/pm2-root.service:
Add a new line: Environment=PM2_PID_FILE_PATH=/run/pm2.pid
Replace PIDFile=/root/.pm2/pm2.pid with PIDFile=/run/pm2.pid
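After the edit, the relevant part of the unit file would look roughly like this (a sketch; the rest of the unit stays as pm2 generated it):

[Service]
Environment=PM2_PID_FILE_PATH=/run/pm2.pid
PIDFile=/run/pm2.pid

Then reload systemd so the change takes effect:

sudo systemctl daemon-reload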
Versions:
CentOS 8.3.2011
Node.js 14.16.0
NPM 7.7.5
PM2 4.5.5
Original answer. Thanks Alec!
Later update, for those who are facing the same issue: it's a problem related to SELinux. Known workarounds (the ones I discovered):
Disabling SELinux (obviously, not recommended)
Go to /etc/systemd/system/pm2-root.service and comment out the PIDFile=... line (add a # in front of it)
Audit and trace, using the following commands:
# dnf install policycoreutils-python-utils setroubleshoot-server -y
# journalctl -f
At this point, you should see the solution in the output (the log); it should be something like:
# ausearch -c 'systemd' --raw | audit2allow -M my-systemd
# semodule -i my-systemd.pp
You need to do the last step (ausearch... and semodule...) twice. I did it once, restarted the machine, and noticed the same issue after 90 seconds. But if you read the log carefully, you will notice that the issue seems to be output twice (it looks the same); probably two things are trying to write to that file (pm2-root.service).
Still waiting for the perfect solution (from someone who really knows how to fix this properly), but for those who have this issue, any of these options seems to work just fine.
I've had this problem (on Debian), when for some reason two "PM2 God Daemon" processes (not threads) were launched, so they conflicted with each other.
Killing one of them solved the issue.
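A quick way to check for duplicates might be something like this (the process name matches what I saw; adjust the grep if yours differs):

ps aux | grep -i 'PM2'   # two "God Daemon" lines indicate the conflict
kill 12345               # 12345 is a placeholder for one daemon's PID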
I am running Podman version 1.6.2 on Ubuntu 18.04. I am unable to start a container after stopping it.
I run the container with:
podman run -d -p 8081:8081 --name nexus -v /opt/nexus-data:/nexus-data sonatype/nexus3
And it starts up ok. If I run:
podman container stop nexus
podman container start nexus
I get an error:
Error: unable to start container "nexus": container create failed (no logs from conmon): EOF
When run with debug logging I see this in the output:
DEBU[0000] Initializing event backend journald
DEBU[0000] using runtime "/usr/lib/cri-o-runc/sbin/runc"
WARN[0000] Error initializing configured OCI runtime crun: no valid executable found for OCI runtime crun: invalid argument
DEBU[0000] unmounted container "419f6576ff23328c6445526058c9988aa27a4b69605348230fa26246a522c726"
ERRO[0000] unable to start container "nexus": container create failed (no logs from conmon): EOF
The source image is:
docker.io/sonatype/nexus3
I'm not sure what the "invalid argument" in the logs means. Do I need to pass another argument?
There seems to be a problem with the latest version of the conmon package from the Project Atomic PPA (v2.0.3).
I had the same problem, so I installed a lower version of the conmon package (v2.0.0) from
https://launchpad.net/ubuntu/+archive/primary/+files/conmon_2.0.0-1_amd64.deb
This is a package built for Eoan; however, it worked in my Bionic environment and I am able to start my containers again.
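Assuming the .deb from the link above, the downgrade could look like this (the apt-mark hold keeps the PPA from upgrading conmon right back):

wget https://launchpad.net/ubuntu/+archive/primary/+files/conmon_2.0.0-1_amd64.deb
sudo dpkg -i conmon_2.0.0-1_amd64.deb
sudo apt-mark hold conmon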
As @Loki Arya noted, a bug in the conmon package was causing the issue. Since Podman for Ubuntu is no longer being hosted at the Project Atomic PPA, the updates after version 1.6.2 that fixed the bug were not available there.
After removing the Project Atomic PPA and all associated packages, I reinstalled Podman for Ubuntu from its new repository location here.
I've tested Podman (1.7) and it is working great, including the start command.
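For reference, the cleanup might look roughly like this (the PPA name is an assumption; follow the new repository's own install instructions afterwards):

sudo add-apt-repository --remove ppa:projectatomic/ppa
sudo apt-get remove podman conmon
sudo apt-get update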
I am trying to run u-boot on QEMU, but when I start QEMU it outputs nothing. Why doesn't this work, and how can I debug it to find out the reason?
This is what I tried:
Install Ubuntu 18.04 WSL2 on Windows.
Compile u-boot for the Raspi2
sudo apt install make gcc bison flex
sudo apt-get install gcc-arm-none-eabi binutils-arm-none-eabi
export CROSS_COMPILE=arm-none-eabi-
export ARCH=arm
make rpi_2_defconfig all
Start QEMU
qemu-system-arm -M raspi2 -nographic -kernel ./u-boot/u-boot.bin
I also tried QEMU on the Windows side, and the result is the same.
PS C:\WINDOWS\system32> qemu-system-arm.exe -M raspi2 --nographic -kernel E:\u-boot\u-boot.bin
QEMU gave no output, and even Ctrl+C could not stop the process.
Unfortunately this is an incompatibility between the way that u-boot expects to be started on the raspberry pi and the ways of starting binaries that QEMU supports for this board.
QEMU supports two ways of starting guest code on Arm in general:
Linux kernels: these boot with whatever the expected boot protocol for the kernel on this board is. For raspi that will be "start the primary CPU, but put the secondaries in the pen waiting on the mbox". Effectively, QEMU emulates a very minimal bit of the firmware, just enough to boot Linux.

Not Linux kernels: these are booted as if they were the first thing to execute on the raw hardware, which is to say that all CPUs start executing at once, and it is the job of the guest code to provide whatever penning of secondary CPUs it wants to do. That is, your guest code has to do the work of the firmware here, because it effectively is the firmware.

We assume that you're a Linux kernel if you're a raw image or a suitable uImage. If you're an ELF image we assume you're not a Linux kernel. (This is not exactly ideal, but we're to some extent lumbered with it for backwards-compatibility reasons.)
On the raspberry pi boards, the way the u-boot binary expects to be started is likely to be "as if the firmware launched it", which is not exactly the same as either of the two options QEMU supports. This mismatch tends to result in u-boot crashing (usually because it is not expecting the "all CPUs run at once" behaviour).
A fix would require either changes to u-boot so it can handle being launched the way QEMU launches it, or changes to QEMU to support more emulation of the firmware of this board (which QEMU upstream would be reluctant to accept).
An alternative approach if it's not necessary to use the raspi board in particular would be to use some other board like the 'virt' board which u-boot does handle in a way that allows it to boot on QEMU. (The 'virt' board also has better device support; for instance it can do networking and USB devices, which 'raspi' and 'raspi2' cannot at the moment.)
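For instance, a sketch of the virt-board route (qemu_arm_defconfig is u-boot's configuration for QEMU's 32-bit Arm virt board, and loading via -bios is the approach u-boot's documentation suggests; adjust paths to your tree):

make qemu_arm_defconfig
make CROSS_COMPILE=arm-none-eabi-
qemu-system-arm -M virt -nographic -bios u-boot.bin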
My CUDA program crashed during execution, before memory was flushed. As a result, device memory remained occupied.
I'm running on a GTX 580, for which nvidia-smi --gpu-reset is not supported.
Placing cudaDeviceReset() in the beginning of the program is only affecting the current context created by the process and doesn't flush the memory allocated before it.
I'm accessing a Fedora server with that GPU remotely, so physical reset is quite complicated.
So, the question is - Is there any way to flush the device memory in this situation?
Check what is using your GPU memory with:
sudo fuser -v /dev/nvidia*
Your output will look something like this:
                USER       PID    ACCESS  COMMAND
/dev/nvidia0:   root       1256   F...m   Xorg
                username   2057   F...m   compiz
                username   2759   F...m   chrome
                username   2777   F...m   chrome
                username   20450  F...m   python
                username   20699  F...m   python
Then kill the PIDs that you no longer need, either in htop or with:
sudo kill -9 PID
In the example above, PyCharm was eating a lot of memory, so I killed 20450 and 20699.
First type
nvidia-smi
then select the PID that you want to kill
sudo kill -9 PID
Although it should be unnecessary to do this in anything other than exceptional circumstances, the recommended way to do this on Linux hosts is to unload the nvidia driver by doing
$ rmmod nvidia
with suitable root privileges and then reloading it with
$ modprobe nvidia
If the machine is running X11, you will need to stop it manually beforehand and restart it afterwards. The driver initialisation process should eliminate any prior state on the device.
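Put together, the sequence might look like this (gdm as the display manager is an assumption; substitute whatever your host runs):

sudo service gdm stop    # stop X11 so it releases the device
sudo rmmod nvidia        # unload the driver, clearing device state
sudo modprobe nvidia     # reload it
sudo service gdm start   # restart X11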
This answer has been assembled from comments and posted as a community wiki to get this question off the unanswered list for the CUDA tag.
For the ones using Python (PyTorch):
import torch, gc
gc.collect()              # drop unreachable Python references first
torch.cuda.empty_cache()  # release cached blocks held by PyTorch's allocator
I also had the same problem, and I saw a good solution on Quora, using
sudo kill -9 PID
see https://www.quora.com/How-do-I-kill-all-the-computer-processes-shown-in-nvidia-smi
One can also use nvtop, which gives an interface very similar to htop, but showing your GPU(s) usage instead, with a nice graph.
You can also kill processes directly from there.
Here is a link to its GitHub: https://github.com/Syllo/nvtop
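On distributions that package it, installing and launching nvtop can be as simple as this (package availability varies by release; otherwise build it from the GitHub link above):

sudo apt install nvtop
nvtop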
On macOS (/ OS X), if someone else is having trouble with the OS apparently leaking memory:
https://github.com/phvu/cuda-smi is useful for quickly checking free memory.
Quitting applications seems to free the memory they use. Quit everything you don't need, or quit applications one by one to see how much memory they used.
If that doesn't cut it (quitting about 10 applications freed about 500 MB / 15% for me), the biggest consumer by far is WindowServer. You can force quit it, which will also kill all the applications you have running and log you out. But it's a bit faster than a restart and got me back to 90% free memory on the CUDA device.
For Ubuntu 20.04:
In the terminal, type
nvtop
If killing the consuming process directly from there doesn't work, find and note the exact PID of the process with the most GPU usage, then:
sudo kill -9 PID
If you have the problem that after killing one process the next one starts (see the comments), for example when a bash script calls multiple Python scripts and you want to kill them but can't find their PIDs, you can use ps -ef, where you'll find the PID of your "problematic" process and also its PPID (parent PID). Use kill PPID, kill -9 PPID, or sudo kill PPID to stop the processes.
To kill all processes on the GPU:
sudo fuser -v /dev/nvidia* -k
If all of this does not work, I found another answer here:
How to kill process on GPUs with PID in nvidia-smi using keyword?
nvidia-smi | grep 'python' | awk '{ print $X }' | xargs -n1 kill -9
Note that X (in the awk expression) corresponds to the Xth column of your nvidia-smi output.
If the PID appears in the fifth column of your nvidia-smi output, replace X with 5.
I just started a new terminal and closed the old one, and it worked out pretty well for me.