WebGL: max vertex uniform vectors in ANGLE-based implementations - google-chrome

I have some rendering code that queries GL_MAX_VERTEX_UNIFORM_VECTORS, reports the value to the console, and computes uniform array sizes based on it. The goal is to use (almost) all available GPRs for some batched rendering.
The code runs natively on desktop (Linux and Windows) and in browsers when built with emscripten. I have tried it on Nvidia, AMD, and Intel HD GPUs, in all combinations of GPU and OS.
On Linux, both the native and web versions work fine and report GL_MAX_VERTEX_UNIFORM_VECTORS = 1024 on my GPUs.
Then I reboot into Windows, where the native version also works fine, while the web version in both Chrome and Firefox reports GL_MAX_VERTEX_UNIFORM_VECTORS = 4096 and works fine on Nvidia and AMD. It looks like it uses phantom uniforms that were unavailable to the native build. So my first question: how? Does it swap the extra values in from memory?
Then I run the same code on an Intel HD 4000 GPU on Windows. The native version works as expected (with the correct value of GL_MAX_VERTEX_UNIFORM_VECTORS), but the web version reports GL_MAX_VERTEX_UNIFORM_VECTORS = 4096 and corrupts some of the uniforms: the geometry gets glitched when a uniform array is used in the vertex shader.
Why does ANGLE report the wrong GL_MAX_VERTEX_UNIFORM_VECTORS? Is it a bug? How can I get the correct value, or otherwise use all available uniforms? No UBOs, please; I'm bound to WebGL 1.
Quick WebGL reproduction: https://sergeyext.github.io/sergeyext/max_vertex_uniform_vectors/start_webgl.html
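For reference, here is a minimal sketch of how the limit is typically queried and turned into a uniform array size (assuming an OpenGL ES 2 / WebGL 1 context such as the one emscripten creates; the 16-vector reserve is only an illustrative margin, not a value from any spec):

#include <GLES2/gl2.h>
#include <algorithm>
#include <cstdio>

// Query the number of vec4 uniform slots available to the vertex shader and
// derive the batched-uniform array size from it instead of hardcoding 1024/4096.
int vertexUniformArraySize() {
    GLint maxVectors = 0;
    glGetIntegerv(GL_MAX_VERTEX_UNIFORM_VECTORS, &maxVectors);
    std::printf("GL_MAX_VERTEX_UNIFORM_VECTORS = %d\n", maxVectors);
    // Leave headroom for the shader's other uniforms (matrices etc.);
    // the 16-vector reserve here is an arbitrary illustration.
    return std::max(0, static_cast<int>(maxVectors) - 16);
}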

Related

Can I debug CUDA on the device that drives the display output?

I develop in VS2012. I have 3 monitors connected to my PC with one GTX 960 graphics card.
I understood that it's impossible to debug CUDA on the same device that drives the display output. Maybe I'm reading it wrong, but when I go to Nsight -> Windows -> System Info -> Display Devices, I can see that the monitor uses my graphics card. Since I have only one graphics card and I can debug (as the image shows in CUDA WarpWatch1), I deduce that either I actually can debug on the same device that drives the display output, or it uses my built-in Intel HD Graphics but doesn't show it under Display Devices.
Despite what you have apparently read somewhere, CUDA (and Nsight) has supported debugging on active display GPUs with the WDDM driver for a number of years. You can see the exact matrix of supported hardware, drivers, and debugging modes in the documentation here.
When CUDA was first introduced, debugging was limited to non-display cards. However, that limitation was removed some time ago on Windows and Linux when using more recent hardware.

WebGL Unavailable, GPU Process unable to boot

I'm running Chrome 54.0.2840.87 on Windows 10. I have two GPUs: an Intel(R) HD Graphics 520 and an AMD Radeon R5 M335.
Up until a couple of weeks ago, WebGL was running just fine in Chrome. Now, without my having changed any settings anywhere, WebGL is no longer available.
When trying to run a Chrome experiment, for example, I get a message saying that my graphics card does not seem to support WebGL. I know my graphics cards work fine (they have been updated with the latest drivers), plus WebGL runs perfectly in Firefox. I know my GPUs have not been blacklisted (in either browser).
On chrome://gpu, I am told that WebGL is unavailable and that the GPU process was unable to boot. When checking chrome://flags, enabling or disabling WebGL no longer seems to be an option.
Enabling/disabling anything else that involves WebGL has not made any difference. Is there something else that can be done to get it working again? At what level is the issue? (The issue persists in Chrome Canary.) I am not the most technologically savvy person, but I've had no luck finding answers anywhere else.
The following is what I see on my chrome://gpu page:
Graphics Feature Status
Canvas: Software only, hardware acceleration unavailable
Flash: Software only, hardware acceleration unavailable
Flash Stage3D: Software only, hardware acceleration unavailable
Flash Stage3D Baseline profile: Software only, hardware acceleration unavailable
Compositing: Software only, hardware acceleration unavailable
Multiple Raster Threads: Unavailable
Native GpuMemoryBuffers: Software only. Hardware acceleration disabled
Rasterization: Software only, hardware acceleration unavailable
Video Decode: Software only, hardware acceleration unavailable
Video Encode: Software only, hardware acceleration unavailable
VPx Video Decode: Software only, hardware acceleration unavailable
WebGL: Unavailable
Driver Bug Workarounds
clear_uniforms_before_first_program_use
disable_d3d11
disable_discard_framebuffer
disable_dxgi_zero_copy_video
disable_nv12_dxgi_video
disable_framebuffer_cmaa
exit_on_context_lost
scalarize_vec_and_mat_constructor_args
Problems Detected
GPU process was unable to boot: GPU process launch failed.
Disabled Features: all
Some drivers are unable to reset the D3D device in the GPU process sandbox
Applied Workarounds: exit_on_context_lost
Clear uniforms before first program use on all platforms: 124764, 349137
Applied Workarounds: clear_uniforms_before_first_program_use
Always rewrite vec/mat constructors to be consistent: 398694
Applied Workarounds: scalarize_vec_and_mat_constructor_args
Disable Direct3D11 on systems with AMD switchable graphics: 451420
Applied Workarounds: disable_d3d11
Framebuffer discarding can hurt performance on non-tilers: 570897
Applied Workarounds: disable_discard_framebuffer
NV12 DXGI video hangs or displays incorrect colors on AMD drivers: 623029, 644293
Applied Workarounds: disable_dxgi_zero_copy_video, disable_nv12_dxgi_video
Limited enabling of Chromium GL_INTEL_framebuffer_CMAA: 535198
Applied Workarounds: disable_framebuffer_cmaa
Native GpuMemoryBuffers have been disabled, either via about:flags or command line.
Disabled Features: native_gpu_memory_buffers
Version Information
Data exported 11/7/2016, 2:09:57 PM
Chrome version Chrome/54.0.2840.87
Operating system Windows NT 10.0.14393
Software rendering list version 11.12
Driver bug list version 9.00
ANGLE commit id 905fbdea9ef0
2D graphics backend Skia/54 a21f10dd8b19c6cb47d07d94d0a0525c16461969
Command Line Args Files (x86)\Google\Chrome\Application\chrome.exe" --flag-switches-begin --enable-gpu-rasterization --enable-unsafe-es3-apis --enable-webgl-draft-extensions --flag-switches-end
Driver Information
Initialization time 0
In-process GPU true
Sandboxed false
GPU0 VENDOR = 0x1002, DEVICE= 0x6660
GPU1 VENDOR = 0x8086, DEVICE= 0x1916
Optimus false
AMD switchable true
Desktop compositing Aero Glass
Diagonal Monitor Size of \\.\DISPLAY1 15.5"
Driver vendor Advanced Micro Devices, Inc.
Driver version 16.200.2001.0
Driver date 6-16-2016
Pixel shader version
Vertex shader version
Max. MSAA samples
Machine model name
Machine model version
GL_VENDOR
GL_RENDERER
GL_VERSION
GL_EXTENSIONS
Disabled Extensions
Window system binding vendor
Window system binding version
Window system binding extensions
Direct rendering Yes
Reset notification strategy 0x0000
GPU process crash count 0
Compositor Information
Tile Update Mode One-copy
Partial Raster Enabled
GpuMemoryBuffers Status
ATC Software only
ATCIA Software only
DXT1 Software only
DXT5 Software only
ETC1 Software only
R_8 Software only
BGR_565 Software only
RGBA_4444 Software only
RGBX_8888 Software only
RGBA_8888 Software only
BGRX_8888 Software only
BGRA_8888 Software only
YVU_420 Software only
YUV_420_BIPLANAR Software only
UYVY_422 Software only
Diagnostics
... loading ...
Log Messages
[1268:3756:1107/133435:ERROR:gl_surface_egl.cc(252)] : No suitable EGL configs found.
[1268:3756:1107/133435:ERROR:gl_surface_egl.cc(1012)] : eglCreatePbufferSurface failed with error EGL_BAD_CONFIG
[1268:3756:1107/133435:ERROR:gpu_info_collector.cc(35)] : gl::GLContext::CreateOffscreenGLSurface failed
[1268:3756:1107/133435:ERROR:gpu_info_collector.cc(108)] : Could not create surface for info collection.
[1268:3756:1107/133435:ERROR:gpu_main.cc(506)] : gpu::CollectGraphicsInfo failed (fatal).
GpuProcessHostUIShim: The GPU process exited normally. Everything is okay.
I got the same issue and found a post about ignoring the hardware compatibility list. So, go to chrome://flags and activate the first option:
Override software rendering list (shown as "Ignorer la liste de rendu logiciel" in a French locale)
https://superuser.com/questions/836832/how-can-i-enable-webgl-in-my-browser
Tell me if it helps!
For anyone still seeing WebGL as unavailable under chrome://gpu/ after enabling Override software rendering list at chrome://flags/:
Check further down at chrome://gpu/ under the topic Problems Detected. If there is a mention of GPU access being disabled:
GPU process was unable to boot: GPU access is disabled in chrome://settings. Disabled Features: all
Navigate to:
chrome://settings/ > Advanced > System
and enable Use hardware acceleration.
Had the same problem when using Sketchfab! In "Override software rendering list" I selected "Disabled", and everything looks OK now!

Cuda Compute Mode and 'CUBLAS_STATUS_ALLOC_FAILED'

I have a host in our cluster with 8 Nvidia K80s, and I would like to set it up so that each device can run at most one process. Previously, if I ran multiple jobs on the host and each used a large amount of memory, they would all attempt to hit the same device and fail.
I set all the devices to compute mode 3 (E. Process) via nvidia-smi -c 3, which I believe makes each device accept a job from only one CPU process. I then run two jobs (each of which takes only about 150 MB out of the 12 GB of memory on the device) without calling cudaSetDevice, but the second job fails with ERROR: CUBLAS_STATUS_ALLOC_FAILED rather than going to the second available device.
I am modeling my assumptions on this site's explanation and was expecting each job to cascade onto the next device, but it is not working. Is there something I am missing?
UPDATE: I ran Matlab using gpuArray in several different instances, and it correctly cascades the Matlab jobs onto different devices. Because of this, I believe I am correctly setting up the compute modes at the OS level. Aside from cudaSetDevice, what could be forcing my CUDA code to lock onto device 0?
This relies on an officially undocumented behavior (or else prove me wrong and point out the official documentation, please) of the CUDA runtime that would, when a device is set to an exclusive compute mode, automatically select another available device when one is in use.
The CUDA runtime apparently enforced this behavior, but it was "broken" in CUDA 7.0.
My understanding is that it should have been "fixed" again in CUDA 7.5.
My guess is that you are running CUDA 7.0 on those nodes. If so, I would try updating to CUDA 7.5, or else reverting to CUDA 6.5 if you really need this behavior.
Rather than relying on this, it's suggested that you instead use an external means, such as a job scheduler (e.g. Torque), to manage resources in a situation like this.
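If updating the toolkit is not an option, one hedged workaround is to probe for a free device explicitly instead of relying on the automatic fallback. This is only a sketch; it assumes that forcing context creation with cudaFree(0) fails on a device already owned by another process in EXCLUSIVE_PROCESS mode, and it omits fuller error reporting:

#include <cuda_runtime.h>
#include <cstdio>

// Try each device in turn; cudaFree(0) forces context creation, which is
// expected to fail on a device already claimed by another exclusive-mode process.
int pickFreeDevice() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;
    for (int dev = 0; dev < count; ++dev) {
        if (cudaSetDevice(dev) != cudaSuccess) continue;
        if (cudaFree(0) == cudaSuccess) return dev;   // context established on a free device
        cudaGetLastError();                           // clear the error before trying the next device
    }
    return -1;
}

int main() {
    int dev = pickFreeDevice();
    if (dev < 0) { std::fprintf(stderr, "no free GPU found\n"); return 1; }
    std::printf("using device %d\n", dev);
    // ... create the cuBLAS handle / run the job on this device ...
    return 0;
}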

Running CUDA GUI samples from a passive (inactive) GPU

I managed to successfully run CUDA programs on a GeForce GTX 750 Ti while using an AMD Radeon HD 7900 as the rendering device (the one actually connected to the display) using this guide; for instance, the Vector Addition sample runs nicely. However, I can only run applications that do not produce visual output. For example, the Mandelbrot CUDA sample does not run and fails with an error:
Error: failed to get minimal extensions for demo:
Missing support for: GL_ARB_pixel_buffer_object
This sample requires:
OpenGL version 1.5
GL_ARB_vertex_buffer_object
GL_ARB_pixel_buffer_object
The error originates from asking glewIsSupported() for these extensions. Is there any way to run an application like these CUDA samples so that the CUDA operations run on the GTX as usual, but the window is drawn on the Radeon card? I tried to convince Nsight Eclipse to run a remote debugging session with my own PC as the remote host, but something else failed right away. Is this supposed to actually work? Could it be possible to use VirtualGL?
Some of the NVIDIA CUDA samples that involve graphics, such as the Mandelbrot sample, implement an efficient rendering strategy: they bind OpenGL data structures (pixel buffer objects in the case of Mandelbrot) to the CUDA arrays containing the simulation data and render them directly from the GPU. This avoids copying the data from the device to the host at the end of each iteration of the simulation and results in a lightning-fast rendering phase.
To answer your question: the NVIDIA samples, as they are, need to run the rendering phase on the same GPU on which the simulation phase is executed; otherwise, the GPU that handles the graphics would not have the data to be rendered in its memory.
This does not mean the samples cannot be modified to work with multiple GPUs. It should be possible to copy the simulation data back to the host at the end of each iteration and then render it using a custom method, or even send it over the network. This would require you to (1) modify the code, separating the simulation and rendering phases and making them independent, and (2) accept the big loss in frames per second that would result.
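For context, the single-GPU interop path those samples use looks roughly like the sketch below (error checks are omitted, the buffer would normally be registered once rather than every frame, and launchSimulationKernel is a hypothetical placeholder for the CUDA kernel that writes the pixels):

#include <GL/glew.h>            // any GL loader works; GLEW is only an assumption here
#include <cuda_gl_interop.h>
#include <cuda_runtime.h>

// Map a GL pixel buffer object into CUDA, let a kernel fill it, then hand it
// back to GL for display. Both steps necessarily run on the same GPU.
void renderFrame(GLuint pbo) {
    cudaGraphicsResource* res = nullptr;
    cudaGraphicsGLRegisterBuffer(&res, pbo, cudaGraphicsRegisterFlagsWriteDiscard);

    cudaGraphicsMapResources(1, &res, 0);
    void* devPtr = nullptr;
    size_t bytes = 0;
    cudaGraphicsResourceGetMappedPointer(&devPtr, &bytes, res);

    // launchSimulationKernel(devPtr, bytes);   // hypothetical kernel writing into the PBO

    cudaGraphicsUnmapResources(1, &res, 0);
    cudaGraphicsUnregisterResource(res);
    // The GL side then draws from the same PBO without any device-to-host copy.
}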

restrict OpenCL access to Intel CPU?

It is currently possible to restrict OpenCL access to an NVIDIA GPU on Linux using the CUDA_VISIBLE_DEVICES env variable. Is anyone aware of a similar way to restrict OpenCL access to Intel CPU devices? (Motivation: I'm trying to force users of a compute server to run their OpenCL programs through SLURM exclusively.)
One possibility is to link directly to the Intel OpenCL library (libintelocl.so on my system) instead of going through the OpenCL ICD loader.
In pure OpenCL, the way to avoid assigning tasks to the CPU is to not select it (as platform or device). clGetDeviceIDs can do that using the device_type argument (don't set the CL_DEVICE_TYPE_CPU bit).
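A hedged illustration of that clGetDeviceIDs approach (a sketch only: it enumerates every platform but requests only GPU devices, so a CPU device is never returned):

#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main() {
    cl_uint numPlatforms = 0;
    clGetPlatformIDs(0, nullptr, &numPlatforms);
    std::vector<cl_platform_id> platforms(numPlatforms);
    clGetPlatformIDs(numPlatforms, platforms.data(), nullptr);

    for (cl_platform_id p : platforms) {
        cl_uint numDevices = 0;
        // Asking for CL_DEVICE_TYPE_GPU instead of CL_DEVICE_TYPE_ALL keeps
        // CPU devices out of the list entirely.
        if (clGetDeviceIDs(p, CL_DEVICE_TYPE_GPU, 0, nullptr, &numDevices) != CL_SUCCESS)
            continue;   // e.g. a CPU-only platform reports no GPU devices
        std::vector<cl_device_id> devices(numDevices);
        clGetDeviceIDs(p, CL_DEVICE_TYPE_GPU, numDevices, devices.data(), nullptr);
        std::printf("platform with %u GPU device(s)\n", numDevices);
        // create contexts and command queues only from `devices` here
    }
    return 0;
}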
At the ICD level, I guess you could exclude the CPU driver if it's Intel's implementation; for AMD it gets a little trickier, since they have one driver for both CPU and GPU devices (it seems the CPU_MAX_COMPUTE_UNITS environment variable can restrict it to one core, but not disable it).
If the goal is to restrict OpenCL programs to running through a specific launcher, such as SLURM, one way might be to add a group for that launcher and make the OpenCL ICD vendor files in /etc/OpenCL (and possibly the driver device nodes) usable only by that group.
None of this would prevent a user from bringing their own OpenCL implementation to run on the CPU, but it could be enough to guide them not to run there by mistake.