Create multiple VMs based on single image - qemu

I would like to use libvirt to run multiple domains (VMs) based on the same image at once. The image itself should not be modified; it should be considered a starting point or template.
An obvious possibility would be to create a (temporary) copy for every domain. Since the image might be multiple GB, I don't want to create a full copy of it every time; I would like to store only the differences. As I understand the documentation, external snapshots use such a technique, but snapshots seem to be bound to a domain, so I cannot use them as a template.
According to the qemu documentation, I could use qemu directly and pass the -snapshot option. As long as I don't commit changes manually, it should work.
qemu-system-x86_64 -snapshot -hda <image>
Is there a way to achieve something similar in libvirt?

All you need is to use qcow2 backing files. In the next steps I'll assume that you already have your base image as a qcow2.
Create a disk image backed by your base image:
qemu-img create -f qcow2 \
-o backing_file=/path/to/base/image.qcow2 \
/path/to/guest/image.qcow2
Then, in your guest, use /path/to/guest/image.qcow2 as the disk. This file will only store the differences from the base image.
Check qemu-img's man page for more details. qemu-img also has commands to commit the overlay file changes into the base image, rebase on another base, etc.
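For illustration, a minimal sketch (paths, guest names, and the virt-install parameters are placeholders; newer qemu-img releases also want the backing format spelled out with -F):
# Create one thin overlay per guest on top of the shared, read-only base image.
for guest in guest1 guest2 guest3; do
  qemu-img create -f qcow2 -b /path/to/base/image.qcow2 -F qcow2 \
    /path/to/overlays/$guest.qcow2
done
# Inspect the backing chain of one overlay:
qemu-img info --backing-chain /path/to/overlays/guest1.qcow2
# Boot a guest from its overlay, e.g. with libvirt's virt-install:
virt-install --name guest1 --memory 2048 --import \
  --disk path=/path/to/overlays/guest1.qcow2,format=qcow2 \
  --os-variant generic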

Is QEMU's `savevm` transactional/atomic?

I've looked into QEMU documentation but found no answers.
I'm using QEMU's savevm command in the monitor to save the VM snapshot. For simplicity I use the same tag each time (it's highly unlikely that I would save a new snapshot and then remove the existing one, given the lack of QEMU admin utilities on the device). Is savevm transactional/atomic? Can it happen that no new snapshot is saved, but the existing one (to be overwritten) is corrupted or broken?
How can I tell that a savevm call was not successful (if that's the case)?
Simply put: if you don't give savevm a name or id, it will create a new snapshot.
You can check it with this command:
(qemu) info snapshots
If you want to overwrite an existing snapshot, just pass its old name or id:
(qemu) savevm old_snapshot_name
The linked documentation discusses this in more detail.
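If you need to drive this from a script rather than an interactive monitor, one rough sketch (the socket path and snapshot tag are hypothetical, assuming the guest was started with -monitor unix:/tmp/qemu-mon.sock,server,nowait):
# Send savevm over the HMP socket, then list snapshots to confirm the tag
# is present and carries a fresh timestamp.
echo "savevm mysnap" | socat - UNIX-CONNECT:/tmp/qemu-mon.sock
echo "info snapshots" | socat - UNIX-CONNECT:/tmp/qemu-mon.sock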

Using buildah to create OCI images, how does one create an OCI "Bundle" image that can be passed directly to crun

Using Buildah, I can create images for multiple architectures, populate them trivially, and run the containers at will using buildah. An embedded project requires that I create an OCI "bundle" (with config.json and a mounted rootfs) which can be passed directly to crun (yes, CRUN, not RUNC), but it is unclear how to move from the OCI image (stored locally) to such a bundle using the buildah workflow.
Has anyone any experience with this? What am I missing? I have pored over the documentation, but it appears that my use case is (as always) a bit eccentric. A pointer to documentation or a tutorial would be ideal, but my search for one has so far been fruitless.
Usually we would hand off the driving of the container image to Podman.
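One possible route (a rough sketch, not a tested recipe: the image name, bundle path, and container id are placeholders, and buildah mount may need root or buildah unshare) is to export the image's root filesystem into a bundle directory and generate a config.json beside it:
ctr=$(buildah from localhost/myimage)        # working container from the image
mnt=$(buildah mount "$ctr")                  # mount its root filesystem
mkdir -p /tmp/bundle/rootfs
cp -a "$mnt"/. /tmp/bundle/rootfs/           # copy the rootfs into the bundle
buildah umount "$ctr"
cd /tmp/bundle && crun spec                  # generate a default config.json
crun run mybundle                            # run the bundle in place with crun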

How to know the content of a container?

I have container specified in my pipeline as:
container 'insilicodb/docker-impute2'
It allows me to just run the pipeline without downloading the necessary programs. How do I see the list of what it contains?
That image is not on Docker Hub, so you will first need to know which registry it is being pulled from. Insilicodb is, however, a known publisher on Docker Hub. An example of theirs which lists its Dockerfile is https://hub.docker.com/r/insilicodb/ubuntu/dockerfile.
There is no built-in way to view the Dockerfile of an image you have pulled; it is up to the publisher to provide it. Images don't have to be built from a Dockerfile and may not have one at all. If there is one, it will tell you the steps taken to create that image.
By the way, running "without downloading necessary programs" is the point of containers: they are meant to ship with everything they need to run, so you don't have to install anything yourself.
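For a quick look at an image you can pull, the standard docker commands help (the image name is taken from the question; whether it is pullable depends on the registry, as noted above):
docker history insilicodb/docker-impute2           # layer-by-layer build steps
docker inspect insilicodb/docker-impute2           # metadata: env, entrypoint, labels
docker run --rm -it insilicodb/docker-impute2 sh   # open a shell and explore the filesystem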

In what scenario is the --srcdir config option valid?

I am working on a configuration program - it isn't autoconf, but I'm trying (as much as possible) to make the ./configure scripts that use it behave like those generated with autoconf -- and that means (as much as possible) supporting the same variable options.
There's just one option that makes no sense to me. I mean, yes, I am fully clear on what the option means; I just can't conceive of a single scenario in which someone would be well-advised to use it - except for one scenario in which I'm equally curious why ./configure scripts can't auto-detect the information it would provide.
The option I am referring to is the "--srcdir" option. The reason it so befuddles me is that the only scenario I can imagine in which the source-code files won't be in your present working directory (or relative to it, as the configure script expects) is if the "configure" script itself isn't in your present working directory - and in that one scenario, I really am unable to imagine why the "configure" script can't deduce the source directory from the name it is invoked by, instead of needing the --srcdir option to give it that information.
For example, let's say your program's source code is located in the "awesome/software" directory. That means that, from where you are, the "configure" script would be "awesome/software/configure". Why can't the "configure" script deduce that the source directory is "awesome/software" just from the fact that it is invoked by the name "awesome/software/configure", instead of requiring me to add a separate command-line option: --srcdir=awesome/software?
And if this is not the kind of scenario where one would need to specify the --srcdir option (or if it is not the only such scenario), can someone describe any other scenario in which the person installing a program would be well-advised to change the srcdir variable from its default?
The option I am referring to is the "--srcdir" option. ...
the only scenario I can imagine in which the source-code files won't be in your present working directory (or relative to it, as the configure script expects) is if the "configure" script itself isn't in your present working directory
Right.
and in that one scenario, I really am unable to imagine why the "configure" script can't deduce the source directory from the name it is invoked by, instead of needing the --srcdir option to give it that information.
I'm not sure it's required. The configure script will attempt to guess the location of srcdir:
# Find the source files, if location was not specified.
if test -z "$srcdir"; then
ac_srcdir_defaulted=yes
# Try the directory containing this script, then the parent directory.
...
So if it's in neither of those places, the guess will fail, hence the need for --srcdir. Maybe this is (or was) needed in setups with some kind of performance differential, where the sources are stored on a "slow" drive while the build happens on a "fast" drive, and configure itself runs faster from the "fast" drive, so it needs to be there as well...
At any rate, --srcdir is just a variable assignment, so it's not hard to do.
Why can't the "configure" script deduce that the source-directory is "awesome/software" just from the fact that it is invoked by the name "awesome/software/configure"
The configure source seems to do that without specifying --srcdir, but I have not tried it.
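For what it's worth, a minimal sketch of the out-of-tree case using the question's example path (the paths are placeholders, and the exact behaviour depends on how the configure script was generated):
mkdir build && cd build
# Usually srcdir is inferred from the path configure was invoked by:
../awesome/software/configure
# Equivalent, with the source directory given explicitly:
../awesome/software/configure --srcdir=../awesome/software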

Scrape multi-frame website

I'm auditing our existing web application, which makes heavy use of HTML frames. I would like to download all of the HTML in each frame. Is there a way of doing this with wget or a little bit of scripting?
As an addition to Steve's answer:
Span to any host—‘-H’
The ‘-H’ option turns on host spanning, thus allowing Wget's recursive run to visit any host referenced by a link. Unless sufficient recursion-limiting criteria are applied depth, these foreign hosts will typically link to yet more hosts, and so on until Wget ends up sucking up much more data than you have intended.
Limit spanning to certain domains—‘-D’
The ‘-D’ option allows you to specify the domains that will be followed, thus limiting the recursion only to the hosts that belong to these domains. Obviously, this makes sense only in conjunction with ‘-H’.
A typical example would be downloading the contents of ‘www.server.com’, but allowing downloads from ‘images.server.com’, etc.:
wget -rH -Dserver.com http://www.server.com/
You can specify more than one address by separating them with a comma,
e.g. ‘-Ddomain1.com,domain2.com’.
taken from: wget manual
wget --recursive --domains=www.mysite.com http://www.mysite.com
A recursive crawl should also traverse into frames and iframes. Be careful to limit the scope of recursion to your own web site, since you probably don't want to crawl the whole web.
wget has a -r option to make it recursive; try wget -r -l1 (in case the font makes it hard to read: that last part is a lowercase L followed by the number one).
The -l1 part tells it to recurse to a maximum depth of 1. Try playing with this number to scrape more.
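Putting the pieces above together, one possible invocation (the URL and domain are placeholders) that grabs a frameset page, the frame documents it references, and their page requisites:
wget -r -l1 -p -H -Dmysite.com http://www.mysite.com/app/frameset.html
Here -p (--page-requisites) also fetches images and stylesheets referenced by each frame, while -l1 keeps the recursion from wandering further than the frame sources themselves.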