Faceted search and heat map creation on GPU - cuda

I am trying to find ways to filter and render 100 million+ data points as a heat map in real time.
Each point in addition to the (x,y) coordinates has a fixed set of attributes (int, date, bit flags) which can be dynamically chosen by the user in order to filter down the data set.
Would it be feasible to accelerate all or parts of this task on GPUs?

It would help if you were more specific, but I'm assuming that you want to apply a user-specified filter to the same 2D spatial data. If this is the case, you could consider organizing your data into a spatial data structure, such as a quadtree or k-d tree.
Once you have done this, you could run a GPU kernel for each region in your data structure based on the filter you want to apply. Each thread would figure out which points in its region satisfy the specified filter.
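To make the per-point filtering and binning concrete, here is a CPU sketch with NumPy standing in for a CUDA kernel. The attribute names and the filter are made up for illustration; on the GPU the predicate would run as one thread per point and the binning as atomic increments into a grid of counters.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000  # stand-in for the 100M+ points

# (x, y) coordinates plus two illustrative attributes
x = rng.uniform(0, 1, n)
y = rng.uniform(0, 1, n)
category = rng.integers(0, 8, n)   # e.g. an int attribute
flags = rng.integers(0, 256, n)    # e.g. bit flags

# User-chosen filter: category == 3 and flag bit 0 set.
# On the GPU this predicate is evaluated by one thread per point.
mask = (category == 3) & (flags & 1 != 0)

# Bin the surviving points into a 512x512 heat map
# (on the GPU: atomicAdd into a grid of counters).
heat, _, _ = np.histogram2d(x[mask], y[mask], bins=512,
                            range=[[0, 1], [0, 1]])
```

The spatial data structure then lets you skip whole regions whose bounding box falls outside the viewport before the per-point filter ever runs.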

Definitely, this is the kind of problem that fits into the GPGPU spectrum.
You could decide to write your own kernel to filter your data, or simply use functions from vendor libraries to that end. You will probably need to normalize, interpolate, and so on, which are common utilities in those libraries. These kinds of operations are typically embarrassingly parallel, so it shouldn't be difficult to write your own kernel.
I'd rather use a visualization framework that allows you to filter and visualize your data in real time. Vispy is a great option but, of course, there are others.

Related

Goal Seek in Octave to replicate Excel's 'Solver' Macro

This is essentially a question on fundamentals, and whether or not there is a more efficient way to achieve what I am looking for. I have built a working fluid-dynamics calculator in Excel to find the flow rates required for a target pressure loss; the optimisation is handled using Solver, but it's very clunky and not user friendly.
I'm trying to replicate the function in Octave since it's widely used here, but I am a complete beginner, so I'm probably missing something obvious. I can easily enter all of the math for a single iteration via a series of functions, but my Excel file required using the 'Solver' macro, and I'm unsure how to efficiently replicate this in Octave.
I am aware that linprog (in matlab) and glpk (octave) can be used to solve systems of linear equations.
I have a series of nested equations which are all dependent on a single matrix, Q (flow rates at various locations). Many other inputs are required, but they either remain constant throughout the calculation (e.g. system geometry) or are dictated by Q (e.g. Reynolds number and loss coefficients). In trying to simplify my problem, I have settled on two steps:
Write code to solve my problem, input: Q matrix, output: pressure loss matrix
Create a loop that iterates different Q matrices until some conditions for the pressure loss matrix are met.
I don't think it will be practical to get my expressions into the form A*x = b (in order to use glpk) given the complexity. In Excel, I can point Solver at a Q value that drives a multitude of equations that impact pressure loss, and it will find the value I need to achieve a target. How can I most efficiently replicate this functionality in Octave?
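For concreteness, the kind of loop I have in mind is an ordinary root-finding iteration on "pressure loss minus target". Here is a sketch in plain Python rather than Octave, with a made-up quadratic loss model standing in for the real fluid-dynamics calculation; a standard root-finder plays the role Solver played in Excel.

```python
def pressure_loss(q):
    # Hypothetical stand-in model: loss grows with the square
    # of the flow rate. The real calculator would go here.
    return 0.5 * q * q + 2.0 * q

def goal_seek(f, target, x0=1.0, x1=2.0, tol=1e-10, max_iter=100):
    """Secant iteration on g(x) = f(x) - target."""
    g0, g1 = f(x0) - target, f(x1) - target
    for _ in range(max_iter):
        if abs(g1) < tol:
            return x1
        x0, x1 = x1, x1 - g1 * (x1 - x0) / (g1 - g0)
        g0, g1 = g1, f(x1) - target
    return x1

# Find the flow rate that produces a pressure loss of 25.0
q = goal_seek(pressure_loss, target=25.0)
```

The same structure carries over directly: step 1 is `pressure_loss`, step 2 is the iteration that adjusts Q until the target is met.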
First of all, Solver is not a macro. Pretty far from it.
So, you're going to replicate a comprehensive "What-If" Analysis add-in -- one so complex, in fact, that Microsoft chose to contract a third-party company of experts to develop the tool and provide support for it (successfully, judging by the 1.2 billion copies they've distributed).
And you're going to do this in an inferior coding language that you're a complete beginner with? Cool. I'd like to see this!
Here's a checklist of Solver's features, so you don't miss anything:
Good Luck!
More Information:
Wikipedia : Solver
Office.com : Define and Solve a Problem by using Solver
Frontline: Official Solver Page: http://solver.com
AppSource.Microsoft.com : Solver (with Video)
Frontline: Solver International Magazine

Should I use MySQL Geo-Spatial data types for vector graphics

I am working on a project where I need to store and do computations on SVG paths and points (preferably in MySQL). I need to be able to quickly query whether a point lies within a path. MySQL's geo-spatial features seem to support this kind of query with the ST_Within function.
However, I have found two opposing claims regarding whether MySQL's geo-spatial functionality takes into account the 'curvature of the earth': "I understand spatial will factor in the curvature of the earth" versus "all calculations are performed assuming Euclidean (planar) geometry as opposed to the geocentric system (coordinates on the Earth's surface)". So, my question is: which of the claims is true, and whether/how does this affect me?
Also, any general advice on whether I should be taking this approach of storing SVG objects as MySQL Geo-spatial data types is welcome.
Upon further research, it seems that the second claim is true. That is, all computations in MySQL are done without regard to the curvature of the earth, assuming a flat plane. References:
https://www.percona.com/blog/2013/10/21/using-the-new-mysql-spatial-functions-5-6-for-geo-enabled-applications/
http://www.programering.com/a/MTNwQjMwATI.html
http://blog.karmona.com/index.php/2010/11/01/the-geospatial-cloud/
General advice on whether I should be taking this approach of storing SVG objects as MySQL Geo-spatial data types is still very much welcome.
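Note that for SVG data this is arguably good news: SVG coordinates already live on a flat plane, so planar geometry is exactly what you want. The kind of planar containment test that a function like ST_Within performs can be sketched in pure Python with the classic ray-casting algorithm (illustrative only, not MySQL's actual implementation):

```python
def point_in_polygon(px, py, polygon):
    """Ray casting: count how many polygon edges a horizontal
    ray from (px, py) crosses; odd means inside. Pure planar
    geometry, no curvature of the earth involved."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            # x-coordinate where this edge crosses the ray's height
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
point_in_polygon(5, 5, square)    # inside the square
point_in_polygon(15, 5, square)   # outside the square
```

Curved SVG path segments would need to be flattened to line segments before storing them as polygons, since the spatial types work on straight-edged geometries.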

Multiple GPUs in OptiX (asynchronous launches possible?)

I have some challenges with my Master's thesis I hope you can help me with or maybe point me in the correct direction.
I'm implementing Progressive Photon Mapping using the new approach by Knaus and Zwicker (http://www.cs.jhu.edu/~misha/ReadingSeminar/Papers/Knaus11.pdf) using OptiX. This approach makes each iteration/frame of PPM independent and more suitable for multi-GPU.
What I do (with a single GPU) is trace a number of photons using OptiX and store them in a buffer. The photons are then sorted into a spatial hash map using CUDA and Thrust, never leaving the GPU. I want to do the spatial hash map creation on the GPU since it is the bottleneck of my renderer. Finally, this buffer is used during indirect radiance estimation. So this is a several-pass algorithm, consisting of ray tracing, photon tracing, photon map generation and, finally, image creation.
I understand that OptiX can support multiple GPUs. Each context launch is divided up across the GPUs. Any writes to buffers seem to be serialized and broadcast to each device so that their buffer contents are the same.
What I would like to do is let one GPU do one frame while the second GPU does the next frame. I can then combine the results, for instance on the CPU or on one of the GPUs in a combine pass. It is also acceptable if I can do each pass in parallel on each device (synchronizing between passes). Is this somehow possible?
For instance, could I create two OptiX contexts, each mapped to one device, on two different host threads? This would allow me to do the CUDA/Thrust spatial hash map generation as before, assuming the photons are on one device, and merge the two generated images at the end of the pipeline. However, the programming guide states that it does not support multi-threaded context handling. I could use multiple processes, but then there is a lot of mess with inter-process communication. This approach also requires duplicating work such as creating the scene geometry and compiling the PTX files.
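The spatial-hash-map step itself is conceptually simple; here is a CPU sketch in plain Python standing in for the Thrust sort-by-key, with made-up photon data and an illustrative cell size:

```python
from collections import defaultdict

# Photons as (x, y, z, power) tuples; the values are made up.
photons = [(0.12, 0.40, 0.90, 1.0),
           (0.15, 0.41, 0.88, 0.5),
           (0.90, 0.10, 0.05, 2.0)]

CELL = 0.25  # hash-grid cell size, roughly the gather radius

def cell_of(p):
    """Map a photon to its integer grid cell."""
    x, y, z, _ = p
    return (int(x // CELL), int(y // CELL), int(z // CELL))

# On the GPU this is a thrust::sort_by_key on the cell index,
# followed by marking where each cell's run of photons begins;
# a dict of lists is the CPU equivalent.
grid = defaultdict(list)
for p in photons:
    grid[cell_of(p)].append(p)

# The radiance estimate then gathers photons from the query
# point's cell (and neighbours) instead of scanning all photons.
```

Keeping this entirely on the device, as described above, avoids a round trip of the photon buffer through host memory.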
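Conceptually, what I'm after is cheap because the Knaus-Zwicker formulation makes each iteration independent: the combine pass is just a per-pixel average of independently rendered frames. A device-free sketch (Python threads standing in for the two devices, with a toy deterministic "render"):

```python
import threading

def render_frame(seed, out, slot):
    """Toy stand-in for one independent PPM iteration on one
    device: a deterministic pseudo-image derived from the seed."""
    out[slot] = [((seed * 31 + i * 7) % 100) / 100.0 for i in range(8)]

frames = [None, None]
threads = [threading.Thread(target=render_frame, args=(s, frames, s))
           for s in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Combine pass on the host: average the two frames per pixel.
combined = [(a + b) / 2 for a, b in zip(frames[0], frames[1])]
```

The question is whether OptiX lets me run the two `render_frame` equivalents on separate devices at once.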
Thanks!
OptiX already splits the workload according to your GPUs' power, so your approach will likely not be faster than letting OptiX manage all the GPUs itself.
If you want to force your data to remain on the device (notice that in such a situation writes from different devices will not be coherent), you can use the RT_BUFFER_GPU_LOCAL flag as described in the programming guide:
https://developer.nvidia.com/optix-documentation

Is there efficient way to map graph onto blocks in CUDA programming?

In parallel computing, the first step is usually to divide the original problem into sub-tasks and map them onto blocks and threads.
For problems with regular data structures this is very easy and efficient, for example matrix multiplication, FFT and so on.
But graph-theory problems like shortest path, graph traversal and tree search have irregular data structures. It seems far from easy, at least to me, to partition the problem onto blocks and threads when using a GPU.
I am wondering if there are efficient solutions for this kind of partitioning.
For simplicity, take the single-source shortest-path problem as an example. I am stuck at how to divide the graph so that both locality and coalescing are achieved.
The tree data structure is designed to best optimize the sequential way of progressing. In tree search, since each state is highly dependent on the previous state, I think it would not be optimal to parallelize traversal on a tree.
As far as the graph is concerned, each connected node can be analyzed in parallel, but I guess there might be redundant operations for overlapping paths.
You can use MapGraph, which uses the GAS (Gather-Apply-Scatter) method for all the things you mentioned. They also have some examples implemented for the same, and the library includes Gather, Apply and Scatter in CUDA for the GPU, plus a CPU-only implementation.
You can find latest version here: http://sourceforge.net/projects/mpgraph/files/0.3.3/

Best data structure for an immutable persistent 3D grid

I'm experimenting with writing a game in a functional programming style, which implies representing the game state with a purely functional, immutable data structures.
One of the most important data structures would be a 3D grid representing the world, where objects can be stored at any [x,y,z] grid location. The properties I want for this data structure are:
Immutable
Fast persistent updates - i.e. creation of a new version of the entire grid with small changes is cheap and achieved through structural sharing. The grid may be large so copy-on-write is not a feasible option.
Efficient handling of sparse areas / identical values - empty / unpopulated areas should consume no resources (to allow for large open spaces). Bonus points if it is also efficient at storing large "blocks" of identical values
Unbounded - can grow in any direction as required
Fast reads / lookups - i.e. can quickly retrieve the object(s) at [x,y,z]
Fast volume queries, i.e. quick searches through a region [x1,y1,z1] -> [x2,y2,z2], ideally exploiting sparsity so that empty spaces are quickly skipped over
Any suggestions on the best data structure to use for this?
P.S. I know this may not be the most practical way to write a game; I'm just doing it as a learning experience and to stretch my abilities with FP.
I'd try an octree. The boundary coordinates of each node are implicit in the structure's placement, and each non-terminal node keeps 8 subtrees but no data. You can thus union identical subtrees to save space.
I think that Immutable and Unbounded are (generally) conflicting requirements.
Anyway... to grow an octree you must replace the root.
The other requirements you pose should be met.
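A minimal persistent-octree sketch in Python (fixed depth for brevity; path copying on update gives the structural sharing, `None` subtrees make sparse regions free, and all names here are illustrative):

```python
# Fixed-depth persistent octree over a 32x32x32 coordinate cube.
# Growing beyond the cube would mean re-rooting, as noted above.
DEPTH = 5  # 2^5 = 32 cells per axis

def child_index(x, y, z, level):
    """Which of the 8 octants of the current node holds (x, y, z)."""
    bit = 1 << level
    return (1 if x & bit else 0) | (2 if y & bit else 0) | (4 if z & bit else 0)

def get(node, x, y, z, level=DEPTH - 1):
    if node is None:
        return None          # empty region costs nothing
    if level < 0:
        return node          # leaf payload
    return get(node[child_index(x, y, z, level)], x, y, z, level - 1)

def set_at(node, x, y, z, value, level=DEPTH - 1):
    """Returns a NEW tree; only the nodes on one root-to-leaf
    path are copied, everything else is shared with the old tree."""
    if level < 0:
        return value
    children = list(node) if node is not None else [None] * 8
    i = child_index(x, y, z, level)
    children[i] = set_at(children[i], x, y, z, value, level - 1)
    return tuple(children)   # tuples keep the structure immutable

g0 = None                          # the empty world
g1 = set_at(g0, 1, 2, 3, "rock")   # new version, g0 untouched
g2 = set_at(g1, 30, 30, 30, "tree")
```

Volume queries can walk the tree and skip any `None` subtree whose implicit bounding cube misses the query region, which exploits sparsity exactly as the question asks.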