Cannot start an existing container with Podman

I am running Podman version 1.6.2 on Ubuntu 18.04. I am unable to start a container after stopping it.
I run the container with:
podman run -d -p 8081:8081 --name nexus -v /opt/nexus-data:/nexus-data sonatype/nexus3
And it starts up ok. If I run:
podman container stop nexus
podman container start nexus
I get an error:
Error: unable to start container "nexus": container create failed (no logs from conmon): EOF
When run with debug logging I see this in the output:
DEBU[0000] Initializing event backend journald
DEBU[0000] using runtime "/usr/lib/cri-o-runc/sbin/runc"
WARN[0000] Error initializing configured OCI runtime crun: no valid executable found for OCI runtime crun: invalid argument
DEBU[0000] unmounted container "419f6576ff23328c6445526058c9988aa27a4b69605348230fa26246a522c726"
ERRO[0000] unable to start container "nexus": container create failed (no logs from conmon): EOF
The source image is:
docker.io/sonatype/nexus3
I'm not sure what the "invalid argument" in the logs means. Do I need to pass another argument?

There seems to be a problem with the latest version of the conmon package from the Project Atomic PPA (v2.0.3).
I had the same problem, so I installed an older version of the conmon package (v2.0.0) from:
https://launchpad.net/ubuntu/+archive/primary/+files/conmon_2.0.0-1_amd64.deb
This is a package built for Eoan. However, it worked on my Bionic environment and I am able to start my containers again.
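For completeness, applying such a downgrade is just a manual package install (a sketch, assuming wget and dpkg are available; the URL is the one above):
wget https://launchpad.net/ubuntu/+archive/primary/+files/conmon_2.0.0-1_amd64.deb
sudo dpkg -i conmon_2.0.0-1_amd64.deb
sudo apt-mark hold conmon    # optional: keep apt from upgrading conmon back to the broken 2.0.3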

As @Loki Arya noted, a bug in the conmon package was causing the issue. Since Podman for Ubuntu is no longer being hosted at the Project Atomic PPA, the updates after version 1.6.2 that fixed the bug were not available.
After removing the Project Atomic PPA and all associated packages, I reinstalled Podman for Ubuntu from its new repository location here.
I've tested Podman (1.7) and it is working great, including the start command.
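In outline, the switch looked something like this (a sketch; the Kubic OBS project was the new home at the time, but check the Podman install docs for the current URL before copying):
sudo add-apt-repository --remove ppa:projectatomic/ppa
sudo apt-get remove podman conmon
. /etc/os-release    # sets VERSION_ID, e.g. 18.04
echo "deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_${VERSION_ID}/ /" | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
curl -L "https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_${VERSION_ID}/Release.key" | sudo apt-key add -
sudo apt-get update
sudo apt-get install podman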

Related

Redhat UBI8: buildah fails with Error: Unable to find a match: e2fsprogs

I have the same problem as "Docker build fails with Error: Unable to find a match: e2fsprogs xfsprogs mdadm parted". I can't install e2fsprogs into a UBI8 container, and I'm already using podman and buildah (Docker has never been installed). I just need the e2fsprogs package in an intermediate image that I will use to create my final image, which doesn't need to include it.

Wireguard refuses to run on Fedora 31: protocol not supported

I recently installed wireguard-tools and restarted my machine several times. However, it simply will not run. I get this error every time:
Warning: `/etc/wireguard/myprovider.conf' is world accessible
[#] ip link add myprovider type wireguard
Error: Unknown device type.
Unable to access interface: Protocol not supported
[#] ip link delete dev myprovider
Cannot find device "provider"
I then ran dkms status and it came up entirely blank, even after a restart. Looking online, it doesn't seem like anyone else's dkms went blank. My kernel version is 5.5.13-200.fc31.x86_64, which is the latest I can go to. I've tried the general advice of cleaning packages, updating and then reinstalling wireguard but it has not worked. What should I do from here? Does this require a reinstall of the whole OS?
What does modinfo wireguard show?
WireGuard is included in Linux 5.6 and higher. On your 5.5 kernel you'll need to install the kernel module yourself.
I'm a Debian user, but on Fedora this should work:
$ sudo dnf copr enable jdoss/wireguard
$ sudo dnf install wireguard-dkms wireguard-tools
wireguard-tools only installs the userspace tools; wireguard-dkms installs the kernel module, which does the actual work.
reference
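Once the module is installed, a quick sanity check looks like this (a sketch; exact output varies by version):
modinfo wireguard          # should print the module filename and version instead of an error
sudo modprobe wireguard    # load it; no output means it loaded successfully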
I solved it by upgrading wireguard:
sudo apt-get upgrade wireguard
Here is the terminal output:
(...)
Preparing to unpack .../92-wireguard-dkms_1.0.20200520-0ppa1~18.04_all.deb ...
-------- Uninstall Beginning --------
Module: wireguard
Version: 1.0.20200401
Kernel: 5.3.0-46-generic (x86_64)
-------------------------------------
Status: Before uninstall, this module version was ACTIVE on this kernel.
wireguard.ko:
- Uninstallation
- Deleting from: /lib/modules/5.3.0-46-generic/updates/dkms/
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.
depmod...
DKMS: uninstall completed.
-------- Uninstall Beginning --------
Module: wireguard
Version: 1.0.20200401
Kernel: 5.3.0-51-generic (x86_64)
-------------------------------------
Status: Before uninstall, this module version was ACTIVE on this kernel.
wireguard.ko:
- Uninstallation
- Deleting from: /lib/modules/5.3.0-51-generic/updates/dkms/
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.
depmod...
DKMS: uninstall completed.
------------------------------
Deleting module version: 1.0.20200401
completely from the DKMS tree.
------------------------------
Done.
(...)
Then sudo wg-quick up {conf_name} worked again.

ERROR: Preparation failed: Getwd: getwd: no such file or directory

Why does GitLab Runner throw ERROR: Preparation failed: Getwd: getwd: no such file or directory?
gitlab version is: GitLab Community Edition 8.6.4
gitlab-runner version: 1.11.5
My CI throws ERROR: Preparation failed: Getwd: getwd frequently, but sometimes a commit works fine, so we don't know the root cause of this problem. The only thing we know is that the error started appearing after we moved the build directory.
In my case it was because of residual gitlab-runner processes still executing. I resolved it by identifying the guilty PIDs and killing them:
$ ps -ax | grep gitlab-runner
27034 ? Ssl 0:06 /usr/bin/gitlab-runner run --working-directory /home/gitlab-runner --config /etc/gitlab-runner/config.toml --service gitlab-runner --syslog --user gitlab-runner
$ sudo kill -9 27034
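If several stray runner processes have piled up, a one-liner can take care of all of them (a sketch, assuming procps pkill is available):
sudo pkill -9 -f gitlab-runner    # kills every process whose command line matches gitlab-runner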
I got the same error and solved it by restarting gitlab-runner:
gitlab-runner restart
The GitLab Runner checks out a copy of your repository into CI_PROJECT_DIR. You can check its value by adding the following to a job in your .gitlab-ci.yml:
script:
  - echo $CI_PROJECT_DIR
I received the "getwd: no such file or directory" error because:
I had changed my working directory to /var/www/mysite (I am using a docker container with gitlab-runner installed inside it, but I think that's beside the point)
one of my deploy script lines moves /var/www/mysite to /var/www/old-mysite.
I'm used to the GitLab Runner checking out its build inside /home/gitlab-runner/build. When I changed the docker working directory, this caused the runner to check out the build at /var/www/mysite/build.
After my script moved /var/www/mysite to /var/www/old-mysite, on the second and subsequent runs gitlab-runner still expected to find /var/www/mysite, but it no longer existed, hence the error.
Given the above, I can't explain why the runner works the first time ever, when that directory also doesn't exist, but hopefully my answer might at least prompt something useful for someone! :)
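The underlying mechanism is easy to reproduce in a shell, which may make the error clearer (a sketch with hypothetical paths; the exact error text varies by tool):
mkdir /tmp/demo && cd /tmp/demo
rm -rf /tmp/demo    # delete the directory we are currently standing in
/bin/pwd            # the getcwd()/getwd() call now fails: No such file or directory
Any process whose working directory has been deleted out from under it, gitlab-runner included, fails the same way when it asks the kernel where it is.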

Unable to get cuda to work in tensorflow

I'm trying to use cuda to accelerate tensorflow. I'm running tensorflow using the docker image.
Firstly, when I launch the GPU image, there is a mismatch in the LD_LIBRARY_PATH environment variable:
~# echo $LD_LIBRARY_PATH
/usr/local/nvidia/lib:/usr/local/nvidia/lib64:
root@d578acbbc2cd:~# ls /usr/local/
bin cuda cuda-7.0 etc games include lib man sbin share src
There's no nvidia directory there. When I try to run the convolutional.py demo, it can't initialise the cuda support:
# python models/image/mnist/convolutional.py
Succesfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Succesfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Succesfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Succesfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 8
modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/4.2.0-23-generic/modules.dep.bin'
E tensorflow/stream_executor/cuda/cuda_driver.cc:466] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:98] retrieving CUDA diagnostic information for host: d578acbbc2cd
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:106] hostname: d578acbbc2cd
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:131] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:242] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 352.68 Tue Dec 1 17:24:11 PST 2015
GCC version: gcc version 5.2.1 20151010 (Ubuntu 5.2.1-22ubuntu2)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:135] kernel reported version is: 352.68
I tensorflow/core/common_runtime/gpu/gpu_init.cc:112] DMA:
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 8
It then goes on to train using cpu only.
# find /usr -name libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so
So in the docker image, there's only the gnu cpu cuda implementation. No NVIDIA stuff. In the host Ubuntu 15.10 session, I have libcuda.so installed:
$ find /usr -name libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so
/usr/lib/i386-linux-gnu/libcuda.so
/usr/local/cuda-7.5/targets/x86_64-linux/lib/stubs/libcuda.so
So these seem to be stubs ... not sure why.
Is there some trick to getting this to work?
Try rebuilding the Docker image directly from the Tensorflow repository (i.e. don't rely on the image on the container registry) and use https://github.com/NVIDIA/nvidia-docker to run the container (the Docker command described in the Tensorflow documentation is not portable).
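For example, something along these lines (a sketch; the Dockerfile path and image tag reflect the repository layout at the time and may differ in your checkout):
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
docker build -t tensorflow-gpu -f tensorflow/tools/docker/Dockerfile.gpu .
nvidia-docker run -it tensorflow-gpu python models/image/mnist/convolutional.py
The nvidia-docker wrapper mounts the host driver libraries (the missing /usr/local/nvidia) into the container, which a plain docker run does not.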
I had a similar problem, though not in docker. The libcuda.so in /usr/local/cuda/lib64/stubs was a broken symlink. When I searched for libcuda.so, it only turned up a file in a lib32 folder.
It seems that the problem was how I originally installed the NVIDIA device driver. At some point in the driver install process you're given the option to install the lib32 drivers. I had thought this meant in addition to the lib64 drivers, so I selected it. It turns out it installs only the lib32 drivers, not the lib64 ones.
I reinstalled the NVIDIA device driver, this time not selecting the lib32 option. Now tensorflow finds libcuda.so.
I had the same problem running tensorflow on an Ubuntu machine after I upgraded my driver to 352.63 and 352.93. (I remember it worked with 346.*, but when I tried to install 346.*, it installed 352.* automatically for some reason.)
I finally figured out that it was caused by a permission issue (I could run it as root). So I changed the permissions of the libcuda.so.352-63 file to be executable by anyone, and it works well now.
Hope this will be helpful to those still struggling with this issue.
I didn't try the docker one, but I guess it's also caused by the permission settings.
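The permission fix described above amounts to a single chmod (a sketch; the library path and version suffix here are illustrative, so adjust them to match your system):
sudo chmod a+rx /usr/lib/x86_64-linux-gnu/libcuda.so.352.63    # make the driver library readable and executable by everyone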
Try this command
sudo apt-get install nvidia-modprobe
As mentioned here:
https://github.com/tensorflow/tensorflow/issues/394
and
http://kkjkok.blogspot.in/2016_08_01_archive.html
After I updated the NVIDIA driver to 378.09 on Ubuntu 14.10 I had the same error, although all the rights for the lib files were set correctly.
Thanks to @PhoenixQ, I tried to run with sudo and it worked.
After that I tried to run without sudo one more time and the error disappeared. I'm not sure what exactly happened, but maybe something was configured during the call with sudo that was not possible without sudo.
So the solution:
Try to run the same thing with sudo.
After this, try running without sudo. That worked for me.

Bluemix `docker exec` returns 404

I pushed an image (mysql:5.5, to be exact) to my registry and am currently running the container under the name db; it does appear when I run cf ic ps.
As docker exec seems to be supported now, I tried to run cf ic exec -it db bash but I get a response of Error response from daemon: 404 error encountered while processing request!. Any exec command I try results in the same error... Does anyone know why this returns a 404 when my container does exist?
For reference I need to load a dump onto the container which is why I'm trying docker exec in the first place.
Edit: I can confirm this occurs for any container I create and try to exec -it into. Fetching logs for any container gives the same error as well.
For some reason the daemon could not reach your container. I've just tried the following command on different kinds of containers and it worked:
cf ic exec -it [containerId] [command]
I think you should retry. If the problem persists I suggest you to restart the container with:
cf ic restart [containerId]
If you still get 404 you could try with a new container instance using docker run again.
Moreover, be sure that you have installed the latest version of the IBM Containers CLI.
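Recreating the container would look roughly like this (a sketch; the registry host and namespace are placeholders for your own):
cf ic rm db
cf ic run -d --name db registry.ng.bluemix.net/<your_namespace>/mysql:5.5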
Due to a platform issue this command, even though recently added to docker's supported commands on Bluemix, was not working. This was a bug that was resolved a few days ago, so you should try again.