QEMU PCIe TLP Emulation

Does QEMU emulate the PCIe Transaction Layer somehow?
When I have a virtual switch like the one in https://blogs.oracle.com/linux/post/a-study-of-the-linux-kernel-pci-subsystem-with-qemu, am I able to intercept at least the TLPs going over it? Or is the abstraction so high that the hypervisor does not go down to that layer at all?

As expected, QEMU does not implement the TLP layer at the moment, so there are no TLPs you could intercept. Instead, whenever a guest memory access hits an emulated device, execution exits KVM and QEMU handles the access as plain MMIO.
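For a concrete picture of where that interception point actually is: an emulated PCI device registers read/write callbacks for its BARs through QEMU's MemoryRegionOps, and those callbacks are what run after the KVM exit; nothing resembling a TLP is ever constructed. A minimal sketch of such a device (the DemoState device and its single register are hypothetical, and this only builds inside the QEMU source tree):

    #include "qemu/osdep.h"
    #include "hw/pci/pci.h"

    /* Hypothetical device state -- not an existing QEMU device. */
    typedef struct DemoState {
        PCIDevice parent_obj;
        MemoryRegion mmio;
        uint32_t reg;
    } DemoState;

    /* Called by QEMU after the vCPU exits KVM on a read of our BAR.
     * 'addr' is just an offset within the region; there is no TLP
     * header to inspect. */
    static uint64_t demo_mmio_read(void *opaque, hwaddr addr, unsigned size)
    {
        DemoState *s = opaque;
        return s->reg;
    }

    static void demo_mmio_write(void *opaque, hwaddr addr,
                                uint64_t val, unsigned size)
    {
        DemoState *s = opaque;
        s->reg = val;
    }

    static const MemoryRegionOps demo_mmio_ops = {
        .read = demo_mmio_read,
        .write = demo_mmio_write,
        .endianness = DEVICE_LITTLE_ENDIAN,
    };

    static void demo_realize(PCIDevice *pdev, Error **errp)
    {
        DemoState *s = (DemoState *)pdev;

        /* Register a 4 KiB MMIO region and expose it as BAR 0. */
        memory_region_init_io(&s->mmio, OBJECT(pdev), &demo_mmio_ops,
                              s, "demo-mmio", 0x1000);
        pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->mmio);
    }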

Related

qemu performance same with and without multi-threading and inconsistent behaviour

I am new to the qemu simulator. I want to emulate our existing pure-C H.264 (video decoder) code on an ARM platform (Cortex-A9) using qemu on Ubuntu 12.04, and I have done that successfully using links available on the internet.
Our application also uses multithreading (pthreads) to speed up the processing. If we enable multithreading we get the same performance as with a single thread (i.e. without multithreading).
E.g. single thread: 9.75 s
Multithread: 9.76 s
Even though qemu is supposed to support parallel processing, we are not getting the expected performance gain.
Steps done are as follows:
1. Compile the code using the arm-linux-gnueabi toolchain.
2. Execute the code:
qemu-arm -L executable
3. qemu version 1.6.1
Is there any option or setting to be done in qemu if we want to measure multithreaded performance? We want to see the difference between a single-threaded and a multithreaded run under qemu, since we do not have any ARM board with us.
Moreover, the multithreaded application hangs if we run it a third or fourth time, i.e. inconsistent behaviour in qemu.
Can we rely on this qemu simulator at all, given that it is not cycle-accurate?
You will not be able to use QEMU to estimate real hardware speed.
Also, QEMU currently runs SMP guests in a single thread... this means your guest OS will see multiple CPUs but will not receive additional cycles, since all the CPU emulation occurs in a single thread.
Note that I/O is delegated to separate threads... so usually, if your VM is doing CPU and I/O work, you will see at least 1.5+ cores on the host being used.
There has been a lot of research into parallelizing the CPU emulation in QEMU, but without much success. I suggest you buy some real hardware and run it there, especially considering that Cortex-A9 hardware is cheap these days.
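If you want to see this effect in isolation, a tiny reproducer like the one below (hypothetical busy-loop code, not taken from your decoder) can be cross-compiled with arm-linux-gnueabi-gcc and timed under qemu-arm: with the total work split across more threads, wall-clock time stays roughly flat under QEMU's single-threaded CPU emulation, whereas on real multi-core hardware it should drop.

    /* busywork.c -- a hypothetical scaling test (not the asker's decoder).
     * Build: arm-linux-gnueabi-gcc -O2 -pthread busywork.c -o busywork
     * Run:   time qemu-arm -L <sysroot> ./busywork <nthreads>
     */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define TOTAL_ITERATIONS 200000000UL

    /* Pure CPU busy work, no I/O, so any speedup must come from the
     * CPU emulation actually running threads in parallel. */
    static void *worker(void *arg)
    {
        unsigned long n = (unsigned long)arg;
        volatile unsigned long sum = 0;
        unsigned long i;
        for (i = 0; i < n; i++)
            sum += i;
        return NULL;
    }

    int main(int argc, char **argv)
    {
        pthread_t tid[64];
        int nthreads, i;

        nthreads = (argc > 1) ? atoi(argv[1]) : 1;
        if (nthreads < 1 || nthreads > 64)
            return 1;

        /* Same total amount of work, split evenly across the threads. */
        for (i = 0; i < nthreads; i++)
            pthread_create(&tid[i], NULL, worker,
                           (void *)(TOTAL_ITERATIONS / nthreads));
        for (i = 0; i < nthreads; i++)
            pthread_join(tid[i], NULL);

        printf("done with %d thread(s)\n", nthreads);
        return 0;
    }

Comparing, e.g., ./busywork 1 against ./busywork 4 under time should make the single-threaded behaviour obvious.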

What interrupt controller does a guest OS use, and how to find it

My host system is Linux. I am using qemu as an emulator. I want to know which interrupt controller is used by the guest OS, and other information as well, such as which interrupts are raised, etc.
Please guide me in detail.
Linux can use the PIC (8259), the simplest interrupt controller, which is two cascaded banks of 8 pins. Then there is the APIC, with 256 interrupt vectors that can be assigned to devices. You read a device's PCI configuration space to find out which interrupt it has been assigned for the PIC, and you can tell a device to use a given ISR vector with the APIC. Then there are MSI and MSI-X, which use messages rather than wired interrupt pins, for potentially higher performance. hth
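To check what a specific device was given, you can read its PCI configuration space: offsets 0x3C and 0x3D hold the Interrupt Line and Interrupt Pin registers. A small sketch that reads them through sysfs from inside the guest (the device address 0000:00:03.0 is only an example; pick a real one from /sys/bus/pci/devices). Inside the guest, /proc/interrupts also shows which controller (XT-PIC, IO-APIC, PCI-MSI, ...) delivers each IRQ.

    /* read_irq.c -- read the Interrupt Line/Pin registers of one PCI
     * device from its config space as exposed via sysfs. The device
     * address below is only an example. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const char *path = "/sys/bus/pci/devices/0000:00:03.0/config";
        unsigned char cfg[64];
        FILE *f = fopen(path, "rb");

        if (!f || fread(cfg, 1, sizeof(cfg), f) != sizeof(cfg)) {
            perror(path);
            return EXIT_FAILURE;
        }
        fclose(f);

        /* Offset 0x3C: Interrupt Line (the legacy IRQ number routed via
         * the PIC/IO-APIC); offset 0x3D: Interrupt Pin (INTA#..INTD#,
         * 0 means the device uses no legacy interrupt pin). */
        printf("interrupt line: %u\n", cfg[0x3c]);
        printf("interrupt pin:  %u\n", cfg[0x3d]);
        return 0;
    }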

How to decide whether a GPU card is being used?

In CUDA, is there any runtime API that will tell whether a GPU device is being used or not? And whether the user is the video display or a GPGPU application? And what is the GPU occupancy?
On Linux at least, you can use the program nvidia-smi to see the current memory use and whether any compute processes are running. Note, though, that the status of compute processes is only reported on a limited set of graphics cards, e.g. Tesla.
While it doesn't show exactly what is using it, MSI Afterburner on Windows will show you the core usage, memory usage, fan speed, and temperature of the GPUs in a system (NVIDIA or otherwise).
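From the CUDA runtime itself the signals are only indirect: cudaMemGetInfo reports free vs. total device memory, which is a rough hint that something (display or compute) is holding resources on the device, but it will not tell you who; for that, nvidia-smi is the practical route. A minimal C sketch using the CUDA runtime API:

    /* gpu_mem.c -- report free vs. total memory for each CUDA device.
     * Compile: nvcc gpu_mem.c -o gpu_mem
     * (or a host compiler with the CUDA include/lib paths and -lcudart) */
    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int count = 0, dev;

        if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
            fprintf(stderr, "no CUDA devices found\n");
            return 1;
        }

        for (dev = 0; dev < count; dev++) {
            size_t free_bytes = 0, total_bytes = 0;
            cudaSetDevice(dev);
            /* Free vs. total memory is only a coarse hint that some other
             * context (display or compute) is holding resources. */
            if (cudaMemGetInfo(&free_bytes, &total_bytes) == cudaSuccess)
                printf("device %d: %zu MiB free of %zu MiB\n", dev,
                       free_bytes >> 20, total_bytes >> 20);
        }
        return 0;
    }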

Limitations of work-item load in GPU? CUDA/OpenCL

I have a compute-intensive image algorithm that, for each pixel, needs to read many distant pixels. The distance depends on a constant defined at compile time. My OpenCL algorithm performs well, but above a certain maximum distance (which results in heavier for loops) the driver seems to bail out. The screen goes black for a couple of seconds, and then the command queue never finishes. A balloon message reveals that the driver is unhappy:
"Display driver AMD driver stopped responding and has successfully recovered."
(Running this on OpenCL 1.1 with an AMD FirePro V4900 (FireGL V) Graphics Adapter.)
Why does this occur?
Is it possible to, beforehand, tell the driver that everything is ok?
This is a known "feature" under Windows (not sure about Linux) - if the video driver stops responding, the OS will reset it. Except that, since OpenCL (and CUDA) is implemented by the driver, a kernel that takes too long will look like a frozen driver. There is a watchdog timer that keeps track of this (5 seconds, I believe).
Your options are:
1. Make sure that your kernels are not too time-consuming (best); see the sketch after this list.
2. Turn off the watchdog timer: Timeout Detection and Recovery of GPUs.
3. Run the kernel on a GPU that is not hooked up to a display.
I suggest you go with 1.
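For option 1, a common trick is to split the image into horizontal bands and enqueue the same kernel once per band with a global work offset, so no single submission runs long enough to trip the watchdog. A rough host-side sketch (queue, kernel, width and height are assumed to be set up elsewhere; a non-NULL global work offset requires OpenCL 1.1, which you have):

    /* Sketch: enqueue the kernel in horizontal bands so that no single
     * submission runs long enough to trip the watchdog. */
    #include <CL/cl.h>

    static cl_int run_in_bands(cl_command_queue queue, cl_kernel kernel,
                               size_t width, size_t height, size_t band_rows)
    {
        cl_int err = CL_SUCCESS;
        size_t row;

        for (row = 0; row < height && err == CL_SUCCESS; row += band_rows) {
            size_t rows = (row + band_rows <= height) ? band_rows
                                                      : height - row;
            size_t offset[2] = { 0, row };        /* first row of this band */
            size_t global[2] = { width, rows };   /* work-items in the band */

            err = clEnqueueNDRangeKernel(queue, kernel, 2, offset, global,
                                         NULL, 0, NULL, NULL);
            if (err == CL_SUCCESS)
                err = clFinish(queue);  /* give the display a chance to run */
        }
        return err;
    }

Choose band_rows so each band finishes comfortably under the watchdog limit.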

CPU or GPU bound? profiling OpenGL application

I've got an OpenGL application which I'm afraid is GPU bound.
How can I be sure that's the case?
And if it is, how can I profile the code run by the GPU?
I would also check it with AMD GPU PerfStudio.
It will analyse your GPU and CPU usage and show relative load values.
If you are using Windows, Linux, or Mac (well, a computer!), give gDEBugger a try.
If your OpenGL thread uses less than one core, you are not CPU bound. If you're running at 60 Hz, you're probably limited by vsync.
gDEBugger no longer supports OS X.
For OS X users (and perhaps other OSes) the Intel Graphics Performance Analyzer might be worth a look: https://software.intel.com/en-us/gpa
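Besides the external tools mentioned above, you can measure GPU time directly in the application with an OpenGL timer query (core in GL 3.3, otherwise ARB_timer_query): if the measured GPU time per frame is close to your whole frame time, you are GPU bound. A minimal sketch, assuming a current GL context and your own drawScene() function:

    /* Sketch: time the GPU work for one frame with an OpenGL timer query.
     * Requires GL 3.3 or ARB_timer_query; drawScene() stands in for your
     * own rendering code. */
    #include <GL/glew.h>
    #include <stdio.h>

    extern void drawScene(void);   /* your rendering code */

    void timed_frame(void)
    {
        GLuint query;
        GLuint64 gpu_ns = 0;

        glGenQueries(1, &query);
        glBeginQuery(GL_TIME_ELAPSED, query);

        drawScene();                       /* the GL calls being timed */

        glEndQuery(GL_TIME_ELAPSED);
        /* Blocks until the result is available; fine for occasional
         * profiling, use a ring of queries if you need it every frame. */
        glGetQueryObjectui64v(query, GL_QUERY_RESULT, &gpu_ns);
        glDeleteQueries(1, &query);

        printf("GPU time this frame: %.2f ms\n", gpu_ns / 1.0e6);
    }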