Should I use MySQL Geo-Spatial data types for vector graphics - mysql

I am working on a project where I need to store and do computations on SVG paths and points (preferably in MySQL). I need to be able to quickly query whether a point lies within a path. MySQL's Geo-spatial features seem to support this kind of query with the ST_Within function.
However, I have found two opposing claims about whether MySQL's Geo-spatial functionality takes the 'curvature of the earth' into account: "I understand spatial will factor in the curvature of the earth" and "all calculations are performed assuming Euclidean (planar) geometry as opposed to the geocentric system (coordinates on the Earth's surface)". So, my question is: which of the claims is true, and whether/how does this affect me?
Also, any general advice on whether I should be taking this approach of storing SVG objects as MySQL Geo-spatial data types is welcome.

Upon further research, it seems that the second claim is true. That is, all computations in MySQL are done without regard to the curvature of the earth and simply assume a flat plane. References:
https://www.percona.com/blog/2013/10/21/using-the-new-mysql-spatial-functions-5-6-for-geo-enabled-applications/
http://www.programering.com/a/MTNwQjMwATI.html
http://blog.karmona.com/index.php/2010/11/01/the-geospatial-cloud/
General advice on whether I should be taking this approach of storing SVG objects as MySQL Geo-spatial data types is still very much welcome.
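Since SVG coordinates are already planar, the flat-plane assumption is the one a point-in-path query needs. The following is a minimal sketch of a planar point-in-polygon test (ray casting), not MySQL's actual ST_Within implementation. It assumes the SVG path has already been flattened into straight segments, because as far as I know MySQL's geometry types are built from points and straight line segments only. The function name and the sample square are made up for illustration.

```python
def point_in_polygon(point, polygon):
    """Planar ray-casting test for a point against a simple polygon.

    polygon is a list of (x, y) vertices; the closing edge is implied.
    Points exactly on the boundary are not handled specially.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings strictly to the right of the point
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical 10x10 square standing in for a flattened SVG path
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(point_in_polygon((5, 5), square))
print(point_in_polygon((15, 5), square))
```

No latitude, longitude, or earth radius appears anywhere in the test, which is why planar calculations are appropriate for vector graphics.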

Related

What does the EpisodeParameterMemory of keras-rl do?

I have found the keras-rl/examples/cem_cartpole.py example and I would like to understand it, but I can't find any documentation.
What does the line
memory = EpisodeParameterMemory(limit=1000, window_length=1)
do? What is the limit and what is the window_length? What effect does increasing either or both parameters have?
EpisodeParameterMemory is a special class that is used for CEM. In essence it stores the parameters of a policy network that were used for an entire episode (hence the name).
Regarding your questions: The limit parameter simply specifies how many entries the memory can hold. After exceeding this limit, older entries will be replaced by newer ones.
The second parameter is not used in this specific type of memory (CEM is somewhat of an edge case in Keras-RL and mostly there as a simple baseline). Typically, however, the window_length parameter controls how many observations are concatenated to form a "state". This may be necessary if the environment is not fully observable (think of it as transforming a POMDP into an MDP, or at least approximately). DQN on Atari uses this since a single frame is clearly not enough to infer the velocity of a ball with a FF network, for example.
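To make the two parameters concrete, here is a minimal sketch of the ideas, not Keras-RL's actual code. The helper names make_memory and stack_window are hypothetical, and the zero-padding at the start of an episode is an assumption of this sketch.

```python
from collections import deque

import numpy as np


def make_memory(limit):
    """Bounded memory: once `limit` entries are stored, the oldest is dropped."""
    return deque(maxlen=limit)


def stack_window(observations, window_length):
    """Stack the last `window_length` observations into one "state".

    Requires at least one observation. Early in an episode, missing
    frames are padded with zeros (an assumption of this sketch).
    """
    recent = list(observations)[-window_length:]
    while len(recent) < window_length:
        recent.insert(0, np.zeros_like(recent[0]))
    return np.stack(recent)


memory = make_memory(limit=3)
for episode in range(5):
    memory.append(episode)      # stand-in for an episode's parameters
print(list(memory))             # only the 3 most recent entries remain

state = stack_window([np.array([1.0, 2.0])], window_length=4)
print(state.shape)
```

A larger limit keeps more history at the cost of memory; a larger window_length gives the network more past frames per state at the cost of a bigger input.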
Generally, I recommend reading the relevant paper (again, CEM is somewhat of an exception). It should then become relatively clear what each parameter means. I agree that Keras-RL desperately needs documentation but I don't have time to work on it right now, unfortunately. Contributions to improve the situation are of course always welcome ;).
A little late to the party, but I feel like the answer doesn't really answer the question.
I found this description online (https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html#replay-memory):
"We’ll be using experience replay memory for training our DQN. It stores the transitions that the agent observes, allowing us to reuse this data later. By sampling from it randomly, the transitions that build up a batch are decorrelated. It has been shown that this greatly stabilizes and improves the DQN training procedure."
Basically you observe and save all of your state transitions so that you can train your network on them later on (instead of having to make observations from the environment all the time).
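As a rough illustration of that idea, a replay memory can be sketched as a bounded buffer with random sampling. This is a simplified sketch written for this answer, not the code from the linked tutorial or from Keras-RL; the field names of Transition are assumptions.

```python
import random
from collections import deque, namedtuple

# Field names chosen for illustration
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class ReplayMemory:
    """Stores observed transitions and returns random batches of them."""

    def __init__(self, capacity):
        # Oldest transitions are discarded once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size):
        # Random sampling decorrelates consecutive transitions
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


replay = ReplayMemory(capacity=5)
for step in range(10):
    replay.push(step, 0, 1.0, step + 1)
batch = replay.sample(3)
print(len(replay), len(batch))
```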

Faceted search and heat map creation on GPU

I am trying to find ways to filter and render 100 million+ data points as a heat map in real time.
Each point in addition to the (x,y) coordinates has a fixed set of attributes (int, date, bit flags) which can be dynamically chosen by the user in order to filter down the data set.
Would it be feasible to accelerate all or parts of this task on GPUs?
It would help if you were more specific, but I'm assuming that you want to apply a user-specified filter to the same 2D spatial data. If this is the case, you could consider organizing your data into a spatial data structure, such as a Quadtree or K-d tree.
Once you have done this, you could run a GPU kernel for each region in your data structure based on the filter you want to apply. Each thread will figure out which points in its region satisfy the specified filter.
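The filter-then-bin structure can be sketched on the CPU with NumPy. This is only an illustration of the data flow, under the assumption that attributes are stored as one integer bit-flag array; on a GPU, each thread would test one point and atomically increment the counter of the cell it falls into. The function name and parameters are hypothetical.

```python
import numpy as np


def filtered_heatmap(x, y, flags, required_flags, bins, extent):
    """Count points per grid cell, keeping only points that have
    every bit in `required_flags` set."""
    # One independent test per point: this is the data-parallel part
    mask = (flags & required_flags) == required_flags
    (xmin, xmax), (ymin, ymax) = extent
    heat, _, _ = np.histogram2d(
        x[mask], y[mask], bins=bins, range=[[xmin, xmax], [ymin, ymax]]
    )
    return heat


x = np.array([0.1, 0.9, 0.1])
y = np.array([0.1, 0.9, 0.1])
flags = np.array([0b01, 0b11, 0b10])
heat = filtered_heatmap(x, y, flags, required_flags=0b01, bins=2,
                        extent=((0.0, 1.0), (0.0, 1.0)))
print(heat)
```

Because each point is tested independently, a flat pass over all points may already be fast enough; a spatial structure mainly helps when the visible region is a small part of the data.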
Definitely, this is the kind of problem that fits into the GPGPU spectrum.
You could write your own kernel to filter your data or simply use functions from vendor libraries to that end. You would probably normalize, interpolate, and so on, which are common utilities in those libraries. These kinds of functions are typically embarrassingly parallel, so it shouldn't be difficult to create your own kernel.
I'd rather use a visualization framework that allows you to filter and visualize your data in real time. Vispy is a great option but, of course, there are some others.

Is there an efficient way to map a graph onto blocks in CUDA programming?

In parallel computing, the first step is usually to divide the original problem into sub-tasks and map them onto blocks and threads.
For problems with regular data structures, such as matrix multiplication or FFT, this is easy and efficient.
But graph theory problems such as shortest path, graph traversal, and tree search have irregular data structures. It does not seem easy, at least to me, to partition such a problem onto blocks and threads when using a GPU.
I am wondering if there are efficient solutions for this kind of partitioning.
For simplicity, take the single-source shortest-path problem as an example. I am stuck on how to divide the graph so that both locality and coalescing are achieved.
A tree data structure is designed for sequential traversal. In tree search, since each state is highly dependent on the previous state, I think it would not be optimal to parallelize traversal on a tree.
As far as the graph is concerned, each connected node can be analyzed in parallel, but I guess there might be redundant operations for overlapping paths.
You can use MapGraph, which uses the GAS (Gather, Apply, Scatter) method for all the things you mentioned. It also includes example implementations and a library for Gather, Apply, and Scatter in CUDA for the GPU, as well as a CPU-only implementation.
You can find latest version here: http://sourceforge.net/projects/mpgraph/files/0.3.3/
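One common way to sidestep the partitioning problem is to parallelize over edges instead of over subgraphs. The sketch below is a Bellman-Ford style relaxation written with NumPy to show the gather and scatter steps; it is not MapGraph's API, and the function name is hypothetical. The edge arrays src, dst, and weight are flat and read contiguously, which is the access pattern that coalesces well, while the lookups into dist remain scattered.

```python
import numpy as np


def sssp_edge_parallel(num_nodes, src, dst, weight, source):
    """Single-source shortest paths by repeated edge relaxation.

    Each pass processes every edge independently, which is the step
    that would be mapped to one GPU thread per edge.
    """
    dist = np.full(num_nodes, np.inf)
    dist[source] = 0.0
    for _ in range(num_nodes - 1):
        # Gather: candidate distance through each edge
        candidate = dist[src] + weight
        # Scatter with a min-reduction (an atomic min on a GPU)
        new_dist = dist.copy()
        np.minimum.at(new_dist, dst, candidate)
        if np.array_equal(new_dist, dist):
            break  # converged early
        dist = new_dist
    return dist


# Small directed graph: 0->1 (4), 0->2 (1), 2->1 (2), 1->3 (1)
src = np.array([0, 0, 2, 1])
dst = np.array([1, 2, 1, 3])
weight = np.array([4.0, 1.0, 2.0, 1.0])
print(sssp_edge_parallel(4, src, dst, weight, source=0))
```

This trades redundant work (every edge is touched in every pass) for regular memory access; frontier-based variants reduce the redundancy at the cost of more irregular access.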

Cosine in floating point

I am trying to implement the cosine and sine functions in floating point (but I have no floating point hardware).
Since my processor has no floating-point hardware, nor instructions, I have already implemented algorithms for floating point multiplication, division, addition, subtraction, and square root. So those are the tools I have available to me to implement cosine and sine.
I was considering using the CORDIC method, at this site
However, I implemented division and square root using Newton's method, so I was hoping to use the most efficient method.
Please don't tell me to just go look in a book or that "papers exist"; no kidding, they exist. I am looking for names of well-known algorithms that are known to be fast and efficient.
First off, depending on your accuracy requirements, this can be considerably fussier than your earlier questions.
Now that you've been warned: you'll first want to reduce the argument modulo pi/2 (or 2pi, or pi, or pi/4) to get the input into a manageable range. This is the subtle part. For a nice discussion of the issues involved, download a copy of K.C. Ng's ARGUMENT REDUCTION FOR HUGE ARGUMENTS: Good to the Last Bit (a simple Google search on the title will get you a PDF). It's very readable, and does a great job of describing why this is tricky.
After doing that, you only need to approximate the functions on a small range around zero, which is easily done via a polynomial approximation. A Taylor series will work, though it is inefficient. A truncated Chebyshev series is easy to compute and reasonably efficient; computing the minimax approximation is better still. This is the easy part.
I have implemented sine and cosine exactly as described, entirely in integer, in the past (sorry, no public sources). Using hand-tuned assembly, results in the neighborhood of 100 cycles are entirely reasonable on "typical" processors. I don't know what hardware you're dealing with (the performance will mostly be gated on how quickly your hardware can produce the high part of an integer multiply).
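The two-stage structure described above (reduce, then evaluate a polynomial) can be sketched as follows. This is an illustration only, not the integer implementation mentioned in the answer: it uses ordinary floats, a naive reduction that loses accuracy for large arguments (the problem Ng's paper addresses), and Taylor coefficients for simplicity where minimax or Chebyshev coefficients would be a drop-in improvement. It needs only multiplication, addition, and subtraction in the polynomial step.

```python
import math

HALF_PI = math.pi / 2


def _sin_poly(r):
    # Odd polynomial up to r^9 in Horner form, valid for |r| <= pi/4
    r2 = r * r
    return r * (1 + r2 * (-1 / 6 + r2 * (1 / 120 + r2 * (-1 / 5040 + r2 / 362880))))


def _cos_poly(r):
    # Even polynomial up to r^8 in Horner form, valid for |r| <= pi/4
    r2 = r * r
    return 1 + r2 * (-1 / 2 + r2 * (1 / 24 + r2 * (-1 / 720 + r2 / 40320)))


def sin_approx(x):
    # Naive argument reduction: x = k * (pi/2) + r with |r| <= pi/4
    k = round(x / HALF_PI)
    r = x - k * HALF_PI
    quadrant = k % 4
    if quadrant == 0:
        return _sin_poly(r)
    if quadrant == 1:
        return _cos_poly(r)
    if quadrant == 2:
        return -_sin_poly(r)
    return -_cos_poly(r)


def cos_approx(x):
    # cos(x) = sin(x + pi/2)
    return sin_approx(x + HALF_PI)


print(sin_approx(1.0), math.sin(1.0))
```

The polynomial always performs the same number of operations, so the runtime does not depend on the input value.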
For various levels of precision, you can find some good approximations here:
http://www.ganssle.com/approx.htm
With the added advantage that they are deterministic in runtime unlike the various "converging series" options which can vary wildly depending on the input value. This matters if you are doing anything real-time (games, motion control etc.)
Since you have the basic arithmetic operations implemented, you may as well implement sine and cosine using their Taylor series expansions.

Bing Maps API - SQL - geometry vs geography type

I'm developing a mapping service with the Bing Maps AJAX API and SQL Server 2008. The question that arises is whether I should use the geography or the geometry data type. I researched a lot but didn't find a satisfactory answer. Here are some links about the topic:
SQL 2008 geography & geometry - which to use?
http://www.mssqltips.com/tip.asp?tip=1847
https://alastaira.wordpress.com/2011/01/23/the-google-maps-bing-maps-spherical-mercator-projection/
If I compare the two types, I see the following points.
pro geography:
- consistent distance calculation around the world (time line!)
- the coordinate system of the database is the same as the one which is used to add data to a map with the Bing Maps API (WGS84)
- precise
contra geography:
- high computational costs
- data size constrained to one hemisphere
- missing functions (STConvexHull(), STRelate(),...)
pro geometry:
- faster computation
- unconstrained data size
contra geometry:
- distance units in degrees (if we use WGS84 coordinates)
The problem for me is that I don't need a fast framework, great coverage (the whole world), or high functionality. So I would prefer the geometry type.
The problem with the geometry type is that I have to transform my data into a flat projection (Bing Maps uses SRID=3857) so that I get meters for the calculation. But when I use the Bing Maps projection (3857) in the database, I have to transform my data back to WGS84 if I want to display it on the map.
You've provided quite a good summary of the differences between the two types, and you've correctly identified the two sensible alternatives to be either geography(4326) or geometry(3857), so I'm not quite sure what more information anyone can provide - you just need to make the decision yourself based on the information available to you.
I would say that, although the geometry datatype is likely to be slightly quicker than the geography datatype (since it relies on simpler planar calculations, and can benefit from a tight bounding box over the area in question), this increase in performance will be more than offset by the fact that you'll then have to unproject back to WGS84 lat/long in order to pass back to Bing Maps - reprojection is an expensive process.
You could of course store WGS84 angular coordinates using the geometry datatype, but this is really a hack and not recommended - you are almost certain to run into difficulties further down the line.
So, I'd recommend using the geography datatype and WGS84. With careful index tuning, you should still be able to get sub-second response time for most queries of even large datasets. Incidentally, the "within a hemisphere" rule is lifted for the geography datatype in SQL Denali, so that limitation goes away if you were to upgrade.
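For reference, the projection step discussed above looks roughly like this. It is a sketch of the spherical Mercator formulas as they are commonly given for SRID 3857, written as standalone functions (the names are made up here) rather than anything provided by SQL Server or Bing Maps.

```python
import math

# Sphere radius used by spherical Mercator (equal to the WGS84 semi-major axis)
EARTH_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project WGS84 longitude/latitude in degrees to meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def web_mercator_to_wgs84(x, y):
    """Unproject meters back to WGS84 longitude/latitude in degrees."""
    lon_deg = math.degrees(x / EARTH_RADIUS_M)
    lat_deg = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon_deg, lat_deg


x, y = wgs84_to_web_mercator(13.4, 52.5)
print(x, y)
print(web_mercator_to_wgs84(x, y))
```

Every vertex needs a logarithm and a tangent on the way in and an exponential and an arctangent on the way out. Note also that distances measured in this projection are stretched increasingly toward the poles, so "meters" in the projected plane are only approximate away from the equator.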