Display via Tesla graphics card - CUDA

I want to display a processed video on a monitor. For video processing in CUDA, I am thinking of getting an NVIDIA Tesla-grade card, but it does not have any video-out port. Is there a way to create the frame buffer on the Tesla GPU, then transfer it to system memory and display it via the motherboard graphics?
PS: I don't want to compute anything on the CPU, so as to get near-real-time performance.

For video processing (and display), and given what I understand of your problem, Tesla is probably not your best choice.
Tesla cards are expensive, (partly) because of double-precision support, which you don't need for video processing.
Tesla cards don't have any video port, meaning you have to send your frames back to system memory (obviously possible; see the sketch below). That means a performance penalty, and more code to write and maintain.
Did you have a look at the Quadro product line? They have display outputs and are usually meant for this kind of application (but are still expensive).
If you want to display, that probably means you are working on a desktop application, so your graphics card won't run at full compute load 24/7. In that case, why not a GeForce?
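To give a rough idea of what the copy-back path looks like, here is a minimal sketch (the kernel and frame layout are made up for illustration; a real pipeline would also overlap the transfer with the next frame's processing using streams):

```
// Minimal sketch: process a frame on the GPU, then copy it back to the
// host so the motherboard graphics can display it. The kernel and frame
// dimensions are hypothetical placeholders.
#include <cuda_runtime.h>

__global__ void invertFrame(unsigned char *frame, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        frame[i] = 255 - frame[i];   // toy "processing" step
}

int main()
{
    const int width = 1920, height = 1080, n = width * height;
    unsigned char *d_frame, *h_frame;

    cudaMalloc(&d_frame, n);
    cudaMallocHost(&h_frame, n);     // pinned memory speeds up the copy back

    // ... fill d_frame with the decoded input frame ...

    invertFrame<<<(n + 255) / 256, 256>>>(d_frame, n);

    // This device-to-host copy is the extra cost of a card with no video out.
    cudaMemcpy(h_frame, d_frame, n, cudaMemcpyDeviceToHost);

    // h_frame can now be handed to whatever draws via the motherboard GPU.

    cudaFreeHost(h_frame);
    cudaFree(d_frame);
    return 0;
}
```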

Related

Does an NVIDIA GPU work less efficiently when it is the only GPU in the PC?

I want to assemble a new computer mainly for CUDA applications. For the CPU I have to choose between AMD and Intel.
Most of AMD's processors don't have an integrated GPU, while Intel's do.
My question is:
If the NVIDIA GPU were the only graphics processing unit in the whole PC (with no integrated one), would its efficiency for CUDA programs be worse, as it also has to render the desktop (while using, for example, MATLAB)?
The answer is yes: efficiency would be slightly lower due to the GPU doing display tasks, like moving the cursor around or scrolling a page in a PDF viewer.
However, if you are aiming for a reasonably mid-to-high-end GPU, the loss of efficiency is marginal, perhaps 1% or less. If you have enough money, buy a dedicated compute GPU; if not, don't bother.
A bigger problem is that the display takes up GPU RAM, which (a) becomes unavailable to CUDA applications, and (b) the CUDA manual states that the display driver is allowed to take memory away from a CUDA application at any time without warning (!).
If you ask me whether that really happens (the display driver taking over the CUDA app's memory), then yes, I have experienced it, the prime example being when you change the resolution of your display.
So definitely don't do any banking with GPUs, or you might see your accounts being randomly infused with millions :-)
That's why 'professional' CUDA cards (the Tesla variety) have no display outputs - just in case.
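If you want to see how much of the card's memory the display is actually eating, a quick before/after check with cudaMemGetInfo (a generic sketch, not tied to any particular setup) shows the free/total split:

```
// Minimal sketch: report how much device memory is free, e.g. to see
// what the desktop/display is consuming on a shared GPU.
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    size_t freeBytes = 0, totalBytes = 0;

    // cudaMemGetInfo reports memory as seen by this CUDA context.
    if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed\n");
        return 1;
    }

    printf("GPU memory: %.1f MiB free of %.1f MiB total\n",
           freeBytes  / (1024.0 * 1024.0),
           totalBytes / (1024.0 * 1024.0));
    return 0;
}
```

Run it with and without a desktop session on the card, and the difference is the display's share.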

How to decide whether a GPU card is being used?

In CUDA, is there any runtime API that will tell whether a GPU device is being used or not? And whether the usage comes from a video display or a GPGPU application? And what is the GPU occupancy?
On Linux at least, you can use the program nvidia-smi to see the current memory use, and whether any compute processes are running. Note, though, that the compute-process status is only supported on a selected number of cards, e.g. Tesla.
While it doesn't show exactly what is using it, MSI Afterburner on Windows will show you the core usage, memory usage, fan speed, and temperature of the GPUs in a system (NVIDIA or otherwise).
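If you want the same information programmatically, the NVML library (the interface nvidia-smi itself is built on) can report utilization. A minimal sketch, assuming the NVML header and the nvidia-ml library are installed:

```
// Minimal sketch: query GPU and memory-controller utilization via NVML.
// Compile with: gcc ... -lnvidia-ml
#include <nvml.h>
#include <stdio.h>

int main(void)
{
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "NVML init failed\n");
        return 1;
    }

    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);          // first GPU in the system

    nvmlUtilization_t util;
    if (nvmlDeviceGetUtilizationRates(dev, &util) == NVML_SUCCESS)
        printf("GPU core: %u%%  memory controller: %u%%\n",
               util.gpu, util.memory);
    else
        printf("Utilization query not supported on this card\n");

    nvmlShutdown();
    return 0;
}
```

As with nvidia-smi, some queries are only supported on certain cards.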

Kernel time increases for same number of particles

I am trying to run my code on NVIDIA's K10 GPU. I am using the 5.0 CUDA driver and the 4.2 CUDA runtime. The problem is that the time taken by the kernel increases with iterations, even though each iteration uses the same number of sources and targets (or particles). Because of this, the kernel eventually takes a very long time, and the code crashes with a runtime error that says something like "GPU fallen off the bus".
The plot showing the behavior of increasing kernel run time with number of iterations can be seen here:
https://docs.google.com/open?id=0B5QLL4ig3LVqODdmVjNBTlp5UFU
I tried to run NVIDIA's "nbody" example to see if the same thing happens there too, and yes it does. For number of particles/bodies (Np) = 1e5 and 10 iterations, the code runs fine. For Np = 1e5 and iterations = 100, or Np = 1e6 and iterations = 10, the code goes into a mode where it hangs the entire system.
When I run my own kernel as well as NVIDIA's nbody example on a different machine with Tesla C2050 NVIDIA card (CUDA Driver version: 3.2, and runtime version: 3.2), there is no problem, and kernel takes the same amount of time for every iteration.
I am trying to understand what's going on in the machine with the K10 GPU. I have tried different combinations of CUDA driver and runtime versions on this machine, and here is what I get:
For 5.0 CUDA Driver, 4.2 Runtime, it just hangs and sometimes says "GPU fallen off the bus".
For 4.2 CUDA Driver, 4.2 Runtime, the codes (nbody as well as my code) crash with error: "CUDA Runtime API error 39: uncorrectable ECC error encountered."
For 5.0 CUDA Driver, 5.0 Runtime, it just hangs and sometimes says "GPU fallen off the bus".
This is a 64-bit linux machine, which we have recently assembled with NVIDIA K10 GPU card. I am using gfortran44 and gcc44.
Please let me know if any other information is required to track down the problem.
Thanks in advance for the help!
M
I'm mostly just creating an answer so we can call this question closed, but I'll try to add a few details.
Tesla GPUs come in two distinct categories: those with a fan, and those without. Those with a fan carry (at this time) the "C" designation, although the K20 product family naming will be slightly different:
These are not exhaustive lists:
Tesla GPUs with a Fan: C870, C1060, C2050, C2070, C2075, K20c ("C Class")
Tesla GPUs without a Fan: M1060, M2050, M2070, M2075, M2090, K10, K20, K20X ("M class")
(note that there is currently no K10 type product with a fan or "C" designation)
Tesla GPUs with a fan are designed to be plugged into a wide variety of PC boxes and chassis, including various workstation and server variants. Since they have their own fan, they require a supply of inlet air that is below a certain temperature level, but given that, they will keep themselves cool. As the workload increases, and the generated heat increases, they will spin up their own fan to keep themselves cool. The main ways you can screw up this process are by either restricting the inlet air flow or by putting it in an ambient air environment that is hotter than its max inlet spec.
Tesla GPUs without a fan have something called a passive heatsink; they cannot keep themselves cool independently and take a passive role in the cooling process. They still have a temperature sensor, but it becomes the responsibility of the server BMC (baseboard management controller) to monitor this temperature sensor (this is done directly at the hardware/firmware level, independent of any OS or any activity being directed at the GPU), and to direct a level of airflow over the card that is sufficient to keep the card cool based on its indicated temperature. The BMC does this by ramping up whatever fans are designed into the server chassis that control airflow over the GPU. Normally there will be shrouding/ducting within the chassis to aid in this process. Server manufacturers integrating these cards have a variety of responsibilities and must follow various technical specifications from NVIDIA in order to make this work.
If you happen to get your hands on a Tesla GPU without a fan and just slap it in some random chassis, you're pretty much guaranteed to have the behavior as described in this question. For this reason, Tesla "M" series and "K" series GPUs are normally only sold to OEMs who have undergone the qualification process.
Since the average sysadmin/system assembler is not likely to devise a suitable closed-loop fan control system, and normally does not have easy access to the necessary specifications defining the temperature sensor and access method, the only kludgey workaround, if you have one of these that you simply must play with, is to direct a high level of continuous airflow over the card, in whatever setting you put it. Be advised that this will most likely be noisy. If you don't have a noisy level of airflow, you probably do not have enough airflow to keep a card cool that is in a high-workload situation. In addition, you should probably keep an eye on GPU temps. Note that the nvidia-smi method for monitoring GPU temps does not work for all M-class GPUs (i.e. GPUs without a fan). Unfortunately, the method of temperature sensor access in Fermi and prior for the M-class GPUs (different than the C-class GPUs) was such that it could not be readily monitored in-system via the nvidia-smi command, so in these cases you will get no temperature reading from nvidia-smi, making this approach even harder to manage. Things changed with the Kepler generation, so now the temperature can be monitored both by the nvidia-smi method and by the server BMC at the hardware/firmware level.
C-class products with a fan have a temperature that can be monitored with nvidia-smi, regardless of generation. But this is normally not necessary, since the card has its own control system to keep itself cool.
As mentioned in the comments, all GPUs also have a variety of protection mechanisms, none of which are guaranteed to prevent damage. (If you throw the card in a fire, there's nothing to be done about that.) But the first typical mechanism is thermal throttling. At some predefined high temperature near the maximum safe operating range of the GPU, the GPU firmware will independently reduce its clocks to attempt to prevent further temperature rise. (If the card is clocked slower, then generally its ability to generate heat is also somewhat reduced.) This is a crude mechanism, and when this thermal throttling occurs, something in the cooling arena is already wrong. The card is designed never to enter thermal throttling under normal operating conditions. If temperatures continue to rise (and there is not much headroom at this point), the card will enter its final protection mode, which is to halt itself. At this point the GPU has become unresponsive to the system, and at the OS level, messages like "GPU has fallen off the bus" are typical. This means cooling has failed and the protection mechanisms have failed.
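If you do end up babysitting one of these cards, a small NVML poller (a sketch, assuming a Kepler-or-newer part where the sensor is readable in-system, per the above) is one way to keep that eye on GPU temps:

```
// Minimal sketch: poll the GPU core temperature once per second via NVML.
// On Fermi-and-earlier M-class cards this query may simply be unsupported.
// Compile with: gcc ... -lnvidia-ml
#include <nvml.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    if (nvmlInit() != NVML_SUCCESS)
        return 1;

    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);

    for (;;) {
        unsigned int tempC = 0;
        if (nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU,
                                     &tempC) != NVML_SUCCESS) {
            fprintf(stderr, "Temperature sensor not readable in-system\n");
            break;
        }
        printf("GPU temperature: %u C\n", tempC);
        sleep(1);                       // poll once per second
    }

    nvmlShutdown();
    return 0;
}
```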

Developing for CUDA on "cheap" GPUs

I develop algorithms in CUDA on my desktop which should later run on a server.
Is it okay to use a recent low-end card (like compute capability 2.1) to get all the nice debugging and profiling features, and then put the code on the server with the high-end card (with the same compute capability)? Would I just need to adjust the thread/grid sizes, or does it change everything™?
Example: I would develop on a Quadro 600 and the server would use a Tesla C2075.
As long as your kernel launch and the kernel itself are scalable, you have no problem.
Check out this question:
CUDA development on different cards?
There are some issues, like the memory bandwidth being different (25.6 GiB/s on the Quadro and 148 GiB/s on the Tesla, according to your links), or a different number of SMs (the driver could distribute blocks across SMs differently). However, in most cases such small differences are of minor importance.
If the server has more than one GPU installed, then you need to change your code to run on multiple GPUs to fully leverage the power of the server; the same code will still run fine, but only on a single card.
If there is only one card in the server, the general rule of thumb is that you do not need to change a single line of code to harness the power of the stronger GPU, as the driver distributes blocks across the SMs automatically; the grid-stride pattern sketched below is the usual way to keep a kernel scalable in this sense.
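To make the "scalable" point concrete, here is a grid-stride loop: the kernel sizes its work off the launch configuration, so the same code saturates a small Quadro or a large Tesla (a generic sketch, not code from the question):

```
// Minimal sketch: a grid-stride loop makes the kernel correct for any
// grid size, so it scales across GPUs with different SM counts.
#include <cuda_runtime.h>

__global__ void scaleArray(float *data, float factor, int n)
{
    // Each thread strides over the array; bigger GPUs launch more
    // threads and simply finish sooner.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += blockDim.x * gridDim.x)
        data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    // Size the grid from the device's SM count instead of hard-coding it.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    int blocks = prop.multiProcessorCount * 8;   // rough occupancy heuristic

    scaleArray<<<blocks, 256>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```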

Use NVIDIA card for CUDA, motherboard for video

I want to use the motherboard as the primary display adapter and my NVIDIA graphics card as a dedicated CUDA processor. My first thought was to simply plug the monitor's VGA cable into the motherboard's VGA port and hope the BIOS was smart enough to use the on-board video as the display adapter when it booted. That didn't work. The BIOS must have detected the NVIDIA card and continued to use it as the display adapter. The next thing I looked for was a setting in the BIOS to tell it "don't use the NVIDIA 560 as the display adapter, use the on-board video as the display adapter". I searched through the BIOS and the Web, but either this cannot be done or I cannot figure out how to do it. The mobo is a BIOSTAR TH67+ LGA 1155, running Windows 7.
RESULTS SUMMARY (from answers provided below)
Enabling the Integrated Graphics Device (IGD) in the BIOS will allow the system to be driven from the on-board graphics even with the graphics card connected to the system bus. However, the graphics card still cannot be used for CUDA processing: Windows will not enable graphics devices unless a monitor is attached to them, so the normal driver stack cannot see them. Solution: use Linux, or attach a display to the graphics card but do not use it. The Tesla cards (GPGPU-only) are not recognized by Windows as graphics devices, so they don't suffer from this.
Also, a newer BIOSTAR motherboard, the TZ68A+, supports the Virtu drivers, which permit sophisticated simultaneous use of the graphics card and on-board video.
Looking at the BIOS manual (.zip), the setting you probably want is Chipset -> North Bridge -> Initiate Graphics Adapter. Try setting it to IGD (Integrated Graphics Device).
I believe this will happen automatically, as the native video won't support CUDA. After installing the SDK, if you run deviceQuery, do you see more than one result?
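In case it helps, a stripped-down version of what deviceQuery reports (a generic sketch) looks like this; the on-board non-NVIDIA video will not show up in the list:

```
// Minimal sketch: enumerate CUDA-capable devices, roughly what the
// SDK's deviceQuery sample prints.
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA-capable device found\n");
        return 1;
    }

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s (compute capability %d.%d)\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```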
I believe the H67 allows coexistence of both the integrated and the dedicated GPU. Check out Lucid Virtu here: http://www.lucidlogix.com/driverdownloads-virtu.html - it allows switching GPUs on the fly. But I don't know whether it affects the CUDA device query.
I never tried it on my rig, because it's X58; I just heard about it from Tom's Hardware. Try it out and let us know. Lucid Virtu is definitely worth a try: it's free, and it can cut your electric bill.