CentOS-Libvirt: For a running VM, virsh vol-delete fails while deleting a SCSI disk image: cannot unlink file 'XXX': Success - libvirt

Description of problem:
On a CentOS machine, if we try to delete the SCSI disk image of a running VM (added via virt-manager) with the virsh vol-delete command, we get a "cannot unlink file 'XXX': Success" error.
This error occurs intermittently, and only when we try to delete the disk image while the VM is running.
Version-Release number of selected component (if applicable):
OS: CentOS Linux release 7.2.1511
Libvirt:
[root@CV-HJ-CentOS7-02 images]# virsh version
Compiled against library: libvirt 1.2.17
Using library: libvirt 1.2.17
Using API: QEMU 1.2.17
Running hypervisor: QEMU 1.5.3
Steps to Reproduce:
1. Add a SCSI disk to the VM from virt-manager.
2. Start the VM from virt-manager and confirm the disk is attached as SCSI.
3. Try to delete the newly added SCSI disk using the virsh vol-delete command:
virsh # vol-delete /var/lib/libvirt/images/.img
Actual results:
It gives the following error:
error: Failed to delete vol /var/lib/libvirt/images/.img
error: cannot unlink file '/var/lib/libvirt/images/.img': Success

It looks like you're trying to delete the disk image before detaching it from the running VM, which isn't allowed. You'll need to detach the disk first, then do a pool refresh, and then you'll be able to delete it.
Here's an example using "f23-tst_default" as the name of my VM (domain) and a disk named "f23-tst_default.qcow2" which I want to remove:
# virsh domblklist f23-tst_default
Target Source
------------------------------------------------
vda /var/lib/libvirt/images/f23-tst_default.img
sda /var/lib/libvirt/images/f23-tst_default.qcow2
# virsh detach-disk f23-tst_default --target sda
Disk detached successfully
# virsh domblklist f23-tst_default
Target Source
------------------------------------------------
vda /var/lib/libvirt/images/f23-tst_default.img
# virsh pool-refresh default
Pool default refreshed
# virsh vol-delete --pool default f23-tst_default.qcow2
Vol f23-tst_default.qcow2 deleted
If you don't do a 'pool-refresh' then virsh doesn't realize that the domain is no longer using the volume and, therefore, won't allow you to remove it.

I faced the same issue after upgrading from CentOS 7.1 to 7.2, and it turned out to be a file-permissions problem.
To resolve it, make sure that the owner of the directory where the image is stored (the default pool is /var/lib/libvirt/images) is the one defined by the "user" option in /etc/libvirt/qemu.conf (the default user is qemu).
If you haven't touched the defaults, then:
# chown qemu:qemu /var/lib/libvirt/images
Then create a new image and try to delete it. It should succeed.
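If it still fails, here's a quick way to double-check both the configured user and the directory ownership (a minimal sketch; paths assume the defaults mentioned above):
# grep -i '^[#[:space:]]*user' /etc/libvirt/qemu.conf
# ls -ld /var/lib/libvirt/images
The first command shows the "user" setting (commented out or not); the second shows who owns the pool directory. The two should line up.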

Related

oc cluster up on Fedora not starting correctly

I am trying to run OpenShift on Fedora 36 using the Origin Client (oc).
I have updated Fedora to the latest version.
I have installed oc.
Whenever I try to run oc cluster up,
it shows the error below:
[root@fedora ridhoswasta]# oc cluster up
Getting a Docker client ...
Checking if image openshift/origin-control-plane:v3.11 is available ...
Checking type of volume mount ...
Determining server IP ...
Checking if OpenShift is already running ...
Checking for supported Docker version (=>1.22) ...
Checking if insecured registry is configured properly in Docker ...
Checking if required ports are available ...
Checking if OpenShift client is configured properly ...
Checking if image openshift/origin-control-plane:v3.11 is available ...
Starting OpenShift using openshift/origin-control-plane:v3.11 ...
I0825 12:11:14.411027 50887 flags.go:30] Running "create-kubelet-flags"
I0825 12:11:16.391985 50887 run_kubelet.go:49] Running "start-kubelet"
I0825 12:11:17.200056 50887 run_self_hosted.go:181] Waiting for the kube-apiserver to be ready ...
E0825 12:16:17.201364 50887 run_self_hosted.go:571] API server error: Get "https://127.0.0.1:8443/healthz?timeout=32s": dial tcp 127.0.0.1:8443: connect: connection refused ()
Error: timed out waiting for the condition
Then I checked the logs for kubelet container it shows :
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-min-version has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-private-key-file has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --pod-manifest-path has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --file-check-frequency has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --cluster-dns has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
I0825 05:13:19.249680 51788 server.go:417] Version: v1.11.0+d4cacc0
I0825 05:13:19.249928 51788 plugins.go:97] No cloud provider specified.
F0825 05:13:19.253892 51788 server.go:261] failed to run Kubelet: mountpoint for cpu not found
I have tried reinstalling Docker with the latest version, but I still face this issue.
Could someone suggest something else to try?
Thanks!
oc cluster up uses a deprecated version of OpenShift; it has been superseded by OpenShift Local: https://developers.redhat.com/products/openshift-local/overview. Note that OpenShift Local uses a good deal more resources than oc cluster up ever did. There's also a spiritual successor that might be worth checking out, and that's MicroShift: https://microshift.io/
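If you go the OpenShift Local route, the basic flow looks roughly like this (a sketch, assuming you've downloaded the crc binary from the page above and have a Red Hat pull secret ready):
$ crc setup            # one-time host preparation (virtualization, networking)
$ crc start            # prompts for the pull secret and boots a single-node cluster
$ eval $(crc oc-env)   # put the bundled oc client on PATH
$ oc get nodes
crc start prints the login details for the cluster when it finishes.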

Trying to monitor resource usage of a kvm/qemu virtual machine with mesos

I'm currently deploying a KVM/QEMU virtual machine with Mesos/Marathon. In Marathon, I'm using the built-in Mesos command executor and running this script:
virsh start centos7.0; while true; do echo 'centos 7.0 guest is running'; sleep 5; done
Note the while loop is there only to keep the task running. My issue is that I cannot get Mesos to monitor the resource usage of the virtual machine.
When Marathon deploys this task on a Mesos agent, it creates a container that uses these memory and cpu cgroups:
/sys/fs/cgroup/cpu/mesos/31b48dc3-6f09-4b5a-8964-b82d711bb895
/sys/fs/cgroup/memory/mesos/31b48dc3-6f09-4b5a-8964-b82d711bb895
When the virtual machine is kicked off, the virsh start command sends a request to libvirtd. libvirtd then reads the guest.xml file located in /etc/libvirt/qemu/ and sends a request to the qemu/kvm driver to deploy it.
In my guest.xml file I'm using a custom partition (cgroup slice) to monitor my virtual machine's usage.
https://libvirt.org/cgroups.html
(for each cgroup)
/sys/fs/cgroup/???/vmHolder.slice/machine-qemu\x2d1\x2dcentos7.0\x2dclone.scope
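For reference, that custom partition is declared per domain in guest.xml with libvirt's <resource> element (documented on the cgroups page linked above); roughly like this, using the asker's vmHolder name:
<resource>
  <partition>/vmHolder</partition>
</resource>
With systemd as the cgroup backend, a top-level partition /vmHolder shows up as the vmHolder.slice seen in the cgroup paths here.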
What I have tried:
I tried deleting the memory/cpu cgroups from this slice by running
cgdelete -r cpu,memory:vmHolder.slice
and then adding my QEMU guest process to the Mesos cgroups:
cgclassify -g cpu,memory:mesos/31b48dc3-6f09-4b5a-8964-b82d711bb895 GUEST-PID
When I run cat /proc/5531/cgroup, I get:
11:perf_event:/vmHolder.slice/machine-qemu\x2d1\x2dcentos7.0\x2dclone.scope
10:pids:/
9:devices:/vmHolder.slice/machine-qemu\x2d1\x2dcentos7.0\x2dclone.scope
8:cpuset:/vmHolder.slice/machine-qemu\x2d1\x2dcentos7.0\x2dclone.scope/emulator
7:net_prio,net_cls:/vmHolder.slice/machine-qemu\x2d1\x2dcentos7.0\x2dclone.scope
6:freezer:/vmHolder.slice/machine-qemu\x2d1\x2dcentos7.0\x2dclone.scope
5:blkio:/vmHolder.slice/machine-qemu\x2d1\x2dcentos7.0\x2dclone.scope
4:hugetlb:/
3:cpuacct,cpu:/mesos/31b48dc3-6f09-4b5a-8964-b82d711bb895
2:memory:/mesos/31b48dc3-6f09-4b5a-8964-b82d711bb895
1:name=systemd:/vmHolder.slice/machine-qemu\x2d1\x2dcentos7.0\x2dclone.scope
This shows that the process is in those cgroups, but when I run systemd-cgtop it does not include the memory usage of the VM. I'm not sure what to do next. Any suggestions?

How to make oc cluster up persistent?

I'm using "oc cluster up" to start my Openshift Origin environment. I can see, however, that once I shutdown the cluster my projects aren't persisted at restart. Is there a way to make them persistent ?
Thanks
Persisting resources isn't the primary use case of oc cluster up, but there are a couple of ways to do it:
Leverage capturing etcd as described in the oc cluster up README.
There is a wrapper tool that makes it easy to do this.
There is now an example in the cluster up --help output; it is bound to stay up to date, so check that first:
oc cluster up --help
...
Examples:
# Start OpenShift on a new docker machine named 'openshift'
oc cluster up --create-machine
# Start OpenShift using a specific public host name
oc cluster up --public-hostname=my.address.example.com
# Start OpenShift and preserve data and config between restarts
oc cluster up --host-data-dir=/mydata --use-existing-config
So, specifically in v1.3.2, use --host-data-dir and --use-existing-config.
Assuming you are using docker-machine with a VM provider such as VirtualBox, the easiest way I found is taking a VM snapshot WHILE the VM and the OpenShift cluster are up and running. This snapshot backs up memory in addition to disk, so you can later restore the entire cluster by restoring the VM snapshot and then running docker-machine start ... (see the sketch below).
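A rough sketch of that workflow, assuming the docker-machine VM is named "openshift" as in the examples above (names and snapshot labels are just placeholders):
$ VBoxManage snapshot openshift take cluster-up --live    # snapshot while the VM keeps running
(later, with the VM powered off)
$ VBoxManage snapshot openshift restore cluster-up
$ docker-machine start openshift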
By the way, as of the latest origin image openshift/origin:v3.6.0-rc.0 and oc CLI, --host-data-dir=/mydata as suggested in the other answer doesn't work for me.
I'm using:
VirtualBox 5.1.26
Kubernetes v1.5.2+43a9be4
openshift v1.5.0+031cbe4
It didn't work for me using --host-data-dir (and the other host directory flags):
oc cluster up --logging=true --metrics=true --docker-machine=openshift --use-existing-config=true --host-data-dir=/vm/data --host-config-dir=/vm/config --host-pv-dir=/vm/pv --host-volumes-dir=/vm/volumes
With output:
-- Checking OpenShift client ... OK
-- Checking Docker client ...
Starting Docker machine 'openshift'
Started Docker machine 'openshift'
-- Checking Docker version ...
WARNING: Cannot verify Docker version
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v1.5.0 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ...
Using Docker shared volumes for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
Using docker-machine IP 192.168.99.100 as the host IP
Using 192.168.99.100 as the server IP
-- Starting OpenShift container ...
Starting OpenShift using container 'origin'
FAIL
Error: could not start OpenShift container "origin"
Details:
Last 10 lines of "origin" container log:
github.com/openshift/origin/vendor/github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc4202a1600, 0x42b94c0, 0x1f, 0xc4214d9f08, 0x2, 0x2)
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/pkg/capnslog/pkg_logger.go:75 +0x16a
github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend.newBackend(0xc4209f84c0, 0x33, 0x5f5e100, 0x2710, 0xc4214d9fa8)
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend/backend.go:106 +0x341
github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend.NewDefaultBackend(0xc4209f84c0, 0x33, 0x461e51, 0xc421471200)
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend/backend.go:100 +0x4d
github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver.NewServer.func1(0xc4204bf640, 0xc4209f84c0, 0x33, 0xc421079a40)
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver/server.go:272 +0x39
created by github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver.NewServer
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver/server.go:274 +0x345
OpenShift writes to the /vm/... directories (which are also defined in VirtualBox) but still won't start.
See https://github.com/openshift/origin/issues/12602
VirtualBox snapshots (and restoring them) worked for me too.
To make it persistent after each shutdown, you need to provide the --base-dir parameter.
$ mkdir ~/openshift-config
$ oc cluster up --base-dir=~/openshift-config
From help
$ oc cluster up --help
...
Options:
--base-dir='': Directory on Docker host for cluster up configuration
--enable=[*]: A list of components to enable. '*' enables all on-by-default components, 'foo' enables the component named 'foo', '-foo' disables the component named 'foo'.
--forward-ports=false: Use Docker port-forwarding to communicate with origin container. Requires 'socat' locally.
--http-proxy='': HTTP proxy to use for master and builds
--https-proxy='': HTTPS proxy to use for master and builds
--image='openshift/origin-${component}:${version}': Specify the images to use for OpenShift
--no-proxy=[]: List of hosts or subnets for which a proxy should not be used
--public-hostname='': Public hostname for OpenShift cluster
--routing-suffix='': Default suffix for server routes
--server-loglevel=0: Log level for OpenShift server
--skip-registry-check=false: Skip Docker daemon registry check
--write-config=false: Write the configuration files into host config dir
But you shouldn't use it, because "cluster up" was removed in version 4.0.0. More here: https://github.com/openshift/origin/pull/21399

Unable to create tap device vnet%d: Operation not permitted

I am trying to add a bridged network to my guest VM on a CentOS 6 host.
I have created a bridge br0 by adding a file:
/etc/sysconfig/network-scripts/ifcfg-br0:
DEVICE=br0
TYPE=Bridge
BOOTPROTO=dhcp
STP=on
ONBOOT=yes
Also, I have added a line in my /etc/sysconfig/network-scripts/ifcfg-eth0:
BRIDGE=br0
Now, I tried to create a VM using:
virt-install -n ubuntu_vm --disk path=kvm-images/ubuntu-12.04.qcow2,size=30,format=qcow2 --ram=2048 --cdrom= --os-type=linux --network bridge=br0 --os-variant=ubuntuprecise --graphics vnc,listen=0.0.0.0
Now, I am getting the following error:
Starting install...
ERROR    Unable to create tap device vnet%d: Operation not permitted
Domain installation does not appear to have been successful.
If it was, you can restart your domain by running:
virsh --connect qemu:///session start ubuntu_new_vm
otherwise, please restart your installation.
I see that this problem was fixed before libvirt 0.10.2, which is what I am currently using, but I am still getting the same error.
http://www.redhat.com/archives/libvir-list/2012-May/msg00678.html
From the error message I see that you are running virt-install as an unprivileged user connecting to qemu:///session. With the unprivileged libvirtd instance you are very limited in which networking modes you can use, and in particular the 'network' mode is not available, since your user won't have privileges to manage the TAP devices.
The alternatives are to use the privileged libvirtd instance (qemu:///system) to run the VM, which gives it full network access, or to enable the QEMU setuid network helper. The latter lets you use --network bridge=NAME with virt-install when running unprivileged, delegating TAP device setup to the setuid helper program.
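As a concrete sketch of the first option, re-run the install against the system instance (as root, or via sudo), keeping the rest of the options from the question unchanged:
# virt-install --connect qemu:///system -n ubuntu_vm --network bridge=br0 ...
For the second option, the setuid helper (qemu-bridge-helper) also needs the bridge whitelisted in its ACL file, usually /etc/qemu/bridge.conf (the exact path varies by distribution):
allow br0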

Monit service name error

So I have the following in my monitrc file:
check process apache with pidfile /usr/local/apache/logs/httpd.pid
group apache
start program = "/etc/init.d/httpd start"
stop program = "/etc/init.d/httpd stop"
if failed host XXX port 80 protocol http
and request "/monit/token" then restart
if cpu is greater than 60% for 2 cycles then alert
if cpu 80% for 5 cycles then restart
if totalmem 500 MB for 5 cycles then restart
if children 250 then restart
if loadavg(5min) greater than 10 for 8 cycles then stop
if 3 restarts within 5 cycles then timeout
but I keep getting this error:
Error: service name conflict, apache already defined '/usr/local/apache/logs/httpd.pid'
If the hostname of the server is 'apache', then the conflict is with the default rule for monitoring the system load.
Monit seems to have an implicit rule of 'check system hostname', where the hostname is the output of the hostname command.
You can override that by adding a line like:
check system newhostname
For example:
check system localhost
I saw this error when I forgot to comment out the line:
include /etc/monit/conf.d/*
in a custom /etc/monit/conf.d/myprogram.conf file, so it was recursively including that file.
Do you, by any chance, have an entry with the host name apache beneath this entry or in a separate monit config file?
You have the same service defined more than once. Check all your monit config files for that service. This includes your monitrc and all files listed under the "Includes" section (like include /etc/monit/conf.d/*).
If you redefine "Includes" within a file in one of your "Includes" directories, you will run into recursive reference problems.
Very important: you need monit 5.5.
For example, in Ubuntu 12.04 only 5.3 is available in the repo,
so you need to download and install it from another source.
The solution that worked for me, for example:
wget http://mirrors.kernel.org/ubuntu/pool/universe/m/monit/monit_5.5.1-1_amd64.deb && sudo dpkg -i monit_5.5.1-1_amd64.deb
In my case, I simply had to restart monit to get rid of the service name error:
sudo service monit restart
Check whether you have any conflicting Apache definitions in the monit conf files under the /etc/monit.d/ directory. I accidentally added nginx in my puma.conf and ran into the same error before.