How to find which interrupt controller a guest OS uses - qemu

My host system is Linux and I am using QEMU as an emulator. I want to know which interrupt controller is used by the guest OS, as well as other related information, such as what the interrupts are called.
Please guide me in detail.

Linux can use the 8259 PIC, the simplest interrupt controller, which consists of two cascaded chips with 8 input pins each. Then there is the APIC, with 256 interrupt vectors that can be assigned to devices. For the PIC you read a device's PCI configuration space to find out which interrupt line it has been assigned; for the APIC you can tell a device to use a given interrupt (ISR) vector. Then there are MSI and MSI-X, which deliver interrupts as memory-write messages rather than through dedicated interrupt pins, for potentially higher performance. hth
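Inside a Linux guest you can also see which controller is actually in use by looking at /proc/interrupts: the column after the per-CPU counts names the interrupt chip handling each IRQ (e.g. XT-PIC, IO-APIC, PCI-MSI). On x86 the QEMU monitor command info pic dumps the emulated interrupt-controller state from the host side. A minimal C sketch of the guest-side check, assuming a Linux guest:

    /* Minimal sketch: print /proc/interrupts inside the Linux guest.
     * The column after the per-CPU counts names the interrupt chip in use,
     * e.g. "XT-PIC", "IO-APIC", or "PCI-MSI". Assumes a Linux guest. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/interrupts", "r");
        char line[512];

        if (!f) {
            perror("fopen /proc/interrupts");
            return 1;
        }
        while (fgets(line, sizeof line, f)) {
            fputs(line, stdout);   /* each row: IRQ number, counts, chip, device */
        }
        fclose(f);
        return 0;
    }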

Related

How does qemu do device emulation for memory mapped devices when using a HW CPU (KVM)?

How does qemu intercept only those addresses in the address space that belong to memory mapped devices?
Can someone please explain the full path of, let's say, a read? How does the read from an address X get intercepted (and directed to a device back-end)? And then how is the read completed?

How to access pci device from another device

I'm creating a new PCI device in QEMU that is part DMA and part NVMe controller.
I need to get the physical address of the NVMe device, from within my new device, in order to use
dma_memory_read(...)
Is there a function to get the new device's address?
Is there another function I can use that does not need a physical address?
Is there another way to do it, through pointers?
Generally the best question to ask when trying to figure out how to model devices in QEMU is "what does the real hardware do?".
For real PCI devices, the only way they can access other devices elsewhere in the system is if they do DMA accesses, which they do using PCI addresses (which are usually about the same thing as physical addresses on x86, but not necessarily so on other architectures). In QEMU we model this by having APIs for PCI devices to do DMA accesses (pci_dma_*()).
On the other hand, if you have a PCI card that is itself implementing an NVMe controller (or another kind of controller, like a SCSI disk controller), the answer is that the disks are plugged directly into the controller, which then can talk to them with no physical addresses involved at all. In QEMU we model this by having a concept of controller devices possibly having a "bus" which the disks are plugged into.
How does the real hardware communicate between the PCI device and the NVMe controller? Generally the answer is not "weird backdoor mechanism", so you shouldn't be looking for an API in QEMU that corresponds to "weird backdoor mechanism".
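To make the pci_dma_*() suggestion concrete, here is a rough sketch of a device model doing a DMA read of guest memory. MyDeviceState, my_device_fetch_block and the 512-byte block size are hypothetical, and the exact headers and return types depend on the QEMU version you build against:

    /* Sketch of a QEMU device model doing DMA via the pci_dma_*() helpers.
     * MyDeviceState, my_device_fetch_block and the block size are hypothetical;
     * header paths/signatures depend on the QEMU version. */
    #include "qemu/osdep.h"
    #include "hw/pci/pci.h"

    typedef struct MyDeviceState {
        PCIDevice parent_obj;
    } MyDeviceState;

    /* Read one block from guest memory at a bus (PCI) address the guest
     * programmed into the device, e.g. via a register write in a BAR. */
    static void my_device_fetch_block(MyDeviceState *s, dma_addr_t guest_addr)
    {
        uint8_t buf[512];

        /* pci_dma_read() goes through the device's bus-master address space,
         * so any IOMMU translation is applied, just like real DMA. */
        if (pci_dma_read(&s->parent_obj, guest_addr, buf, sizeof(buf)) != MEMTX_OK) {
            /* report a DMA error to the guest in a device-specific way */
            return;
        }
        /* ... process buf ... */
    }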

NVLink or PCIe: how to specify the interconnect?

My cluster is equipped with both NVLink and PCIe. All the GPUs (V100) can communicate directly over either PCIe or NVLink. To my knowledge, both the PCIe switch and NVLink can support a direct link through CUDA.
Now I want to compare the peer-to-peer communication performance of PCIe and NVLink. However, I don't know how to specify one; it seems CUDA always chooses automatically. Could anyone help me?
If two GPUs in CUDA have a direct NVLink connection between them, and you enable Peer-to-Peer transfers, those transfers will flow over NVLink. There is no method of any kind in CUDA to alter this behavior.
If you do not enable Peer-to-Peer transfers, then data transfers (e.g. cudaMemcpy, cudaMemcpyAsync, cudaMemcpyPeerAsync) between those two devices will flow from the source GPU over PCIe to the CPU socket (perhaps traversing intermediate PCIe switches, perhaps also flowing over a socket-level link such as QPI), and then over PCIe from the CPU socket to the other GPU. At least one CPU socket will always be involved, even if a shorter direct path exists across the PCIe fabric. This behavior is also not modifiable in any fashion available to the programmer.
Both methodologies are demonstrated using the p2pBandwidthLatencyTest CUDA sample code.
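For reference, a minimal sketch of the peer-to-peer setup described above, using the CUDA runtime API with error handling trimmed; whether the resulting copy travels over NVLink or PCIe is decided by the driver as explained in this answer:

    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int can_access = 0;
        size_t bytes = 64 << 20;           /* 64 MiB test buffer */
        void *buf0, *buf1;

        /* Check whether device 0 can directly access device 1's memory. */
        cudaDeviceCanAccessPeer(&can_access, 0, 1);
        printf("P2P 0 -> 1 possible: %d\n", can_access);

        cudaSetDevice(0);
        cudaMalloc(&buf0, bytes);
        if (can_access) {
            /* With peer access enabled, peer copies use the direct path
             * (NVLink if present, otherwise the PCIe fabric). */
            cudaDeviceEnablePeerAccess(1, 0);
        }

        cudaSetDevice(1);
        cudaMalloc(&buf1, bytes);

        /* Copy from device 0's buffer to device 1's buffer. Without peer
         * access this is staged through the CPU socket over PCIe. */
        cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);
        cudaDeviceSynchronize();

        return 0;
    }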
The accepted answer -- from an NVIDIA employee -- was correct in 2018. But at some point, NVIDIA added an (undocumented?) option to the driver.
On Linux, you can now put this in /etc/modprobe.d/disable-nvlink.conf:
options nvidia NVreg_NvLinkDisable=1
This will disable NVLink when the driver is next loaded, forcing GPU peer-to-peer communication to use the PCIe interconnect. This gadget exists in driver 515.65.01 (CUDA 11.7.1). I am not sure when it was added.
As for "there is no reason to allow the end-user to choose the slower path", the very existence of this SO question suggests otherwise. In my case, we buy not one server, but dozens... And in the process of choosing our configuration, it is nice to use a single prototype system to benchmark our application using either NVLink or PCIe.

How to decide whether a GPU card is being used?

In CUDA, is there any runtime API that will tell whether a GPU device is being used or not? And whether the usage comes from video display or from a GPGPU application? And what is the GPU occupancy?
On Linux at least, you can use the program nvidia-smi to see the current memory use and whether any compute processes are running. Note, though, that reporting of compute processes is only supported on a limited set of graphics cards, e.g. Tesla.
While it doesn't show exactly what is using it, MSI Afterburner on Windows will show you the core usage, memory usage, fan speed, and temperature of GPUs in a system (NVIDIA or otherwise).
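If you want the same information programmatically rather than by parsing nvidia-smi output, NVML (the library nvidia-smi is built on) exposes it. A minimal sketch, assuming the NVML header and library shipped with the driver/CUDA toolkit are installed (link with -lnvidia-ml):

    /* Minimal sketch using NVML to check whether GPU 0 looks busy. */
    #include <nvml.h>
    #include <stdio.h>

    int main(void)
    {
        nvmlDevice_t dev;
        nvmlUtilization_t util;
        nvmlMemory_t mem;

        if (nvmlInit() != NVML_SUCCESS) {
            fprintf(stderr, "NVML init failed\n");
            return 1;
        }
        nvmlDeviceGetHandleByIndex(0, &dev);
        nvmlDeviceGetUtilizationRates(dev, &util);   /* % busy over last sample */
        nvmlDeviceGetMemoryInfo(dev, &mem);

        printf("GPU utilization: %u%%, memory utilization: %u%%\n",
               util.gpu, util.memory);
        printf("Memory used: %llu / %llu bytes\n",
               (unsigned long long)mem.used, (unsigned long long)mem.total);

        nvmlShutdown();
        return 0;
    }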

How to choose device when running a CUDA executable?

I'm connecting to a GPU cluster from the outside and I have no idea how to select the device on which to run my CUDA programs.
I know there are two Tesla GPUs in the cluster, and I'd like to choose one of them.
Any ideas how? How do you choose the device you want to use when there are many connected to your computer?
The canonical way to select a device in the runtime API is using cudaSetDevice. That will configure the runtime to perform lazy context establishment on the nominated device. Prior to CUDA 4.0, this call didn't actually establish a context; it just told the runtime which GPU to try to use. Since CUDA 4.0, this call will establish a context on the nominated GPU at the time of calling. There is also cudaChooseDevice, which will select amongst available devices to find one which matches criteria supplied by the caller.
You can enumerate the available GPUs on a system with cudaGetDeviceCount, and retrieve their particulars using cudaGetDeviceProperties. The SDK deviceQuery example shows full details of how to do this.
You may need to be careful, however, about how you select GPUs in a multi-GPU system, depending on the host and driver configuration. In both the Linux and the Windows TCC drivers, there exists the option for GPUs to be marked "compute exclusive", meaning that the driver will limit each GPU to one active context at a time, or "compute prohibited", meaning that no CUDA program can establish a context on that device. If your code attempts to establish a context on a compute prohibited device, or on a compute exclusive device which is in use, the result will be an invalid device error. In a multiple GPU system where the policy is to use compute exclusivity, the correct approach is not to try and select a particular GPU, but simply to allow lazy context establishment to happen implicitly. The driver will automagically select a free GPU for your code to run on. The compute mode status of any device can be checked by reading the cudaDeviceProp.computeMode field using the cudaGetDeviceProperties call. Note that you are free to check unavailable or prohibited GPUs and query their properties, but any operation which would require context establishment will fail.
See the runtime API documentation for all of these calls.
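A minimal sketch of the enumeration-and-selection flow described above, with error handling trimmed; skipping prohibited devices and breaking on the first usable one is just one reasonable policy:

    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int count = 0;
        cudaGetDeviceCount(&count);

        for (int i = 0; i < count; ++i) {
            struct cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            printf("Device %d: %s, computeMode=%d\n", i, prop.name, prop.computeMode);

            /* Skip devices on which no context may be created. */
            if (prop.computeMode == cudaComputeModeProhibited) {
                continue;
            }

            /* On a compute-exclusive system it is usually better to let the
             * driver pick a free GPU implicitly; on a default-mode system an
             * explicit choice like this is fine. */
            if (cudaSetDevice(i) == cudaSuccess) {
                printf("Using device %d\n", i);
                break;
            }
        }
        return 0;
    }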
You can set the environment variable CUDA_VISIBLE_DEVICES to a comma-separated list of device IDs to make only those devices visible to an application. Use this either to mask out devices or to change the visibility order of devices so that the CUDA runtime enumerates them in a specific order.
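As a quick illustration (a sketch, not the usual workflow: normally you set the variable in the shell that launches the executable), the variable is read when the CUDA runtime initializes, so setting it from the process itself before the first CUDA call also works on POSIX systems:

    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Expose only physical device 1; it will then appear as device 0. */
        setenv("CUDA_VISIBLE_DEVICES", "1", 1);

        int count = 0;
        cudaGetDeviceCount(&count);
        printf("Visible devices: %d\n", count);  /* 1 on a two-GPU machine */
        return 0;
    }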