Indexing based on Peano-Hilbert curve? - MySQL

I have x,y,z 3D points stored in MySQL,
and I would like to query regions, slices or point neighbours.
Is there a way to index the points using Peano-Hilbert curves to accelerate the queries?
Or is there a more efficient way to store the 3D data in MySQL?
Thanks, Arman.

I've personally never gone this far, but I have used a Z-curve to store 2D points. This worked quite well, and I didn't feel the need to implement the Hilbert curve for better results.
This should allow you to quickly filter out points that are certainly not close by. In an absolute worst-case scenario you may still need to scan as much as 25% of your table to find points within an area.
The way to go about it is to split the x, y, z values into binary and stitch them together into a single value using the curve. I wish I had a SQL script ready, but I just have one for the 2D Z-curve, which is much, much easier to do.
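For illustration, here is the 3D version of that bit-interleaving as a Python sketch (not SQL; the function name and bit width are just my own choices):

def morton3d(x, y, z, bits=21):
    """Interleave the bits of x, y, z into a single Z-order (Morton) key.

    Assumes x, y and z are non-negative integers that each fit in `bits` bits.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key

# Store morton3d(x, y, z) in an indexed BIGINT column; range scans on it
# prune points that cannot be nearby, after which you check exactly.
print(morton3d(3, 5, 7))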
Edit:
Sorry, you might know all this already and really just be looking for SQL samples, but I have some additions:
I'm not sure the 25% worst-case scan holds in 3D as well. It might be higher; I don't have the brainpower right now to tell you ;).
This type of curve will help you find the ranges where you need to search. If you have 2 coordinates, you can convert them to the Hilbert-curve number to find out which section of your table you need to look in for items that exactly match your query.
You might be able to extend this concept to find neighbours, but in order to use the curve you are still 'stuck' with looking in ranges.

You can probably take the algorithm used to create a geohash and extend it to 3 coordinates. Basically, you would define a world cube of possible 3D points, and then, as you add more bits, you narrow down the cube. You then consistently define it so that the lower-left corner has the smallest value, and you can perform range checks like:
XXXXa < the_hash < XXXXz
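For illustration, a rough Python sketch of that idea (the alphabet, world extent and function name are just assumptions for the example): repeatedly halve the world cube along x, y, z in turn and pack the resulting bits into characters, so that points sharing a hash prefix lie in the same sub-cube.

ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash-style base-32

def hash3d(x, y, z, precision=12, world=(-100.0, 100.0)):
    lo = [world[0]] * 3
    hi = [world[1]] * 3
    point = (x, y, z)
    bits, chars = [], []
    axis = 0
    while len(chars) < precision:
        mid = (lo[axis] + hi[axis]) / 2
        if point[axis] >= mid:
            bits.append(1)
            lo[axis] = mid
        else:
            bits.append(0)
            hi[axis] = mid
        axis = (axis + 1) % 3
        if len(bits) == 5:
            chars.append(ALPHABET[int("".join(map(str, bits)), 2)])
            bits = []
    return "".join(chars)

# Hashes that share a prefix fall inside the same sub-cube, so a string
# range check on the prefix narrows the candidate set before exact tests.
print(hash3d(1.5, -2.0, 40.0))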

Related

Trilateration: different approaches and issues

Although there exist several posts about (multi)lateration, I would like to summarize some approaches and present some issues/questions to better clarify the approach.
It seems that there are two ways to determine the target location: a geometric/analytic approach (solving the equations directly with some trick) and a fitting approach that converts the non-linear system into a linear one.
With respect to the first one, I would like to ask a few questions.
Suppose we have perfect range measurements; considering the 2D case, the exact solution is the unique point where the three circles intersect. Can anyone point to a geometric solution for this case? I found this approach: https://math.stackexchange.com/questions/884807/find-x-location-using-3-known-x-y-location-using-trilateration
but it seems to fail when two anchors have the same y coordinate, as we then get a division by 0. Moreover, can this be extended to 3D?
The same solution can be extracted using the second approach: set up Ax = b and later recover x = A^-1 b, or use least squares, x = (A^T A)^-1 A^T b.
Please see http://www3.nd.edu/~cpoellab/teaching/cse40815/Chapter10.pdf
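For concreteness, here is a minimal numpy sketch of that linearized least-squares step (my own sketch, not taken from the linked notes; it uses the usual trick of subtracting one range equation from the others):

import numpy as np

def trilaterate(anchors, ranges):
    """Solve A x = b in the least-squares sense, x = (A^T A)^-1 A^T b.

    anchors: list of anchor coordinates (n x d, with n >= d + 1)
    ranges:  measured distances from the unknown point to each anchor
    """
    anchors = np.asarray(anchors, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    ref, r_ref = anchors[-1], ranges[-1]
    A = 2.0 * (anchors[:-1] - ref)
    b = (r_ref ** 2 - ranges[:-1] ** 2
         + np.sum(anchors[:-1] ** 2, axis=1) - np.sum(ref ** 2))
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# 2D example with perfect ranges measured from the true point (1, 2):
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
true = np.array([1.0, 2.0])
ranges = [np.linalg.norm(true - np.array(a)) for a in anchors]
print(trilaterate(anchors, ranges))   # ~ [1. 2.]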
What about the case when the three circles have no common intersection? It seems that the second approach still finds a solution. Is this normal? How can it be explained?
What about the first approach when the range measurements are noisy? Does it find an approximate solution, or does it fail?
Considering 3D, it seems that at least 4 anchors are needed to provide a unique solution. However, with 3 anchors it can provide 2 solutions. Can anyone provide the equations to find the two solutions? This could be useful because, even with two solutions, we may discard one by checking whether the values agree with our scenario, e.g. the GPS case, where we pick the solution located on the Earth. The second, least-squares approach would instead always provide a single solution, possibly the wrong one.
Do you know of any existing C/C++ library that implements some of these techniques, and maybe some more complex fitting functions such as non-linear ones?
Thank you
Regards

Do I really need to use MySQL Spatial Functions?

Well I want your opinions about this case:
I need a database that will have... two or three tables at most; one of them will have points (latitude, longitude) and some other info.
It's really simple what I need: Get the points within a given radius.
I'm not asking how to do it (but any advice is more than welcome, especially about good practices); I want to know whether making use of MySQL's spatial support would help. Since what I need is fairly easy to get with just one query, what I expect from using spatial support is better performance.
So, are the spatial indexes going to help noticeably? I don't think the table will store that many points. I'd say no more than 200.
If it's really only 200 points, I recommend you do without: this makes it much easier to write portable SQL (which I consider an important thing).
Write your SQL so that longitude and latitude are first checked against precalculated mins and maxes (giving you a rectangle), and then check against the radius. This way, you only have to calculate the distance for the points inside the rectangle, and only roughly 1 - pi/4 (about 21%) of those will end up being rejected by the radius check.
I personally consider this an acceptable trade-off for having SQL that could, if need be, be executed against SQLite or whatever.
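A rough sketch of that rectangle-then-radius idea in Python (the table and column names are made up for the example):

import math

def bounding_box(lat, lon, radius_km):
    """Precalculated mins and maxes for a radius around (lat, lon)."""
    km_per_deg_lat = 111.32
    km_per_deg_lon = 111.32 * math.cos(math.radians(lat))
    dlat = radius_km / km_per_deg_lat
    dlon = radius_km / km_per_deg_lon
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon

lat_min, lat_max, lon_min, lon_max = bounding_box(48.2082, 16.3738, 5.0)

# Let the database do the cheap rectangle filter (hypothetical table "points"),
# then apply the exact distance test only to the rows that come back.
query = (
    "SELECT id, lat, lon FROM points "
    "WHERE lat BETWEEN %s AND %s AND lon BETWEEN %s AND %s"
)
params = (lat_min, lat_max, lon_min, lon_max)
print(params)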

Efficient representation for growing circles in 2D space?

Imagine there's a 2D space, and in this space there are circles that grow at different constant rates. What's an efficient data structure for storing these circles, such that I can query "Which circles intersect point p at time t?".
EDIT: I do realize that I could store the initial state of the circles in a spatial data structure and do a query where I intersect a circle at point p with a radius of fastest_growth * t, but this isn't efficient when there are a few circles that grow extremely quickly whereas most grow slowly.
Additional Edit: I could further augment the above approach by splitting up the circles and grouping them by their growth rate, then applying the above approach to each group, but this requires a bounded time to be efficient.
Represent the circles as cones in 3D, where the third dimension is time. Then use a BSP tree to partition them as best you can.
In general, I think the worst case for testing for intersection is always O(n), where n is the number of circles. Most spatial data structures work by partitioning the space cleverly so that a fraction of the objects (hopefully close to half) is in each half. However, if the objects overlap then the partitioning cannot be perfect; there will always be cases where more than one object is in a partition. If you just think about the case of two circles overlapping, there is no way to draw a line such that one circle is entirely on one side and the other circle is entirely on the other side. Taken to the logical extreme, assuming arbitrary positions and radii of the circles, there is no way to partition them such that testing for intersection takes O(log(n)).
This doesn't mean that, in practice, you won't get a big advantage from using a tree, but the advantage you get will depend on the configuration of the circles and the distribution of the queries.
This is a simplified version of another problem I posted about a week ago:
How to find first intersection of a ray with moving circles
I still haven't had the time to describe the solution that was expected there, but I will try to outline it here (for this simpler case).
The approach to solving this problem is to use a kinetic KD-tree. If you are not familiar with KD-trees, it is better to first read about them. You also need to add time as an additional coordinate (making the space 3D instead of 2D). I have not implemented this idea yet, but I believe this is the correct approach.
I'm sorry this is not completely thought through, but it seems like you might look into multiplicatively-weighted Voronoi Diagrams (MWVDs). It seems like an adversary could force you into computing one with a series of well-placed queries, so I have a feeling they provide a lower-bound to your problem.
Suppose you compute the MWVD on your input data. Then for a query, you would be returned the circle that is "closest" to your query point. You can then determine whether this circle actually contains the query point at the query time. If it doesn't, then you are done: no circle contains your point. If it does, then you should compute the MWVD without that generator and run the same query. You might be able to compute the new MWVD from the old one: the cell containing the generator that was removed must be filled in, and it seems (though I have not proved it) that the only generators that can fill it in are its neighbors.
Some sort of spatial index, such as a quadtree or BSP, will give you O(log(n)) access time.
For example, each node in the quadtree could contain a linked list of pointers to all those circles which intersect it.
How many circles, by the way? For small n, you may as well just iterate over them. If you constantly have to update your spatial index and jump all over cache lines, it may end up being faster to brute-force it.
How are the centres of your circles distributed? If they cover the plane fairly evenly you can discretise space and time, then do the following as a preprocessing step:
for (t = 0; t < max_t; t++)
  foreach circle c, with centre and radius (x, y, r) at time t
    for (int X = x - r; X < x + r; X++)
      for (int Y = y - r; Y < y + r; Y++)
        circles_at[X][Y][t].push_back(&c)
(assuming you discretise space and time along integer boundaries, scale and offset however you like of course, and you can add circles later on or amortise the cost by deferring calculation for distant values of t)
Then your query for point (x,y) at time (t) could do a brute-force linear check over circles_at[x][y][ceil(t)]
The trade-off is obvious, increasing the resolution of any of the three dimensions will increase preprocessing time but give you a smaller bucket in circles_at[x][y][t] to test.
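A runnable Python version of the same preprocessing and lookup might look like this (the circle format (cx, cy, r0, rate), with radius r0 + rate * t, is my own assumption):

import math
from collections import defaultdict

def build_index(circles, max_t):
    """circles_at[(X, Y, T)] lists every circle overlapping cell (X, Y) at time T."""
    circles_at = defaultdict(list)
    for t in range(max_t + 1):
        for cid, (cx, cy, r0, rate) in enumerate(circles):
            r = r0 + rate * t
            for X in range(math.floor(cx - r), math.ceil(cx + r) + 1):
                for Y in range(math.floor(cy - r), math.ceil(cy + r) + 1):
                    circles_at[(X, Y, t)].append(cid)
    return circles_at

def query(circles, circles_at, px, py, t):
    # Brute-force check only the bucket for ceil(t); circles only grow,
    # so the bucket is a superset of the true answer (no false negatives).
    bucket = circles_at.get((math.floor(px), math.floor(py), math.ceil(t)), [])
    hits = []
    for cid in bucket:
        cx, cy, r0, rate = circles[cid]
        if (px - cx) ** 2 + (py - cy) ** 2 <= (r0 + rate * t) ** 2:
            hits.append(cid)
    return hits

circles = [(0.0, 0.0, 1.0, 0.5), (10.0, 10.0, 2.0, 0.1)]
index = build_index(circles, max_t=20)
print(query(circles, index, 2.0, 0.0, 3.0))   # circle 0 has radius 2.5 at t = 3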
People are going to make a lot of recommendations about types of spatial indices to use, but I would like to offer a bit of orthogonal advice.
I think you are best off building a few indices based on time, i.e. t_0 < t_1 < t_2 ...
If a point intersects a circle at t_i, it will also intersect it at t_{i+1}. If you know the point in advance, you can eliminate all circles that intersect the point at t_i for all computation at t_{i+1} and later.
If you don't know the point in advance, then you can keep these time-point trees (built based on how big each circle would be at a given time). At query time (e.g. t_query), find i such that t_{i-1} < t_query <= t_i. If you check all the possible circles at t_i, you will not have any false negatives.
This is sort of a hack for a data structure that is "time-dynamics aware", but I don't know of any. If you have a threaded environment, then you only need to maintain one spatial index and can be working on the next one in the background. It will cost you a lot of computation for the benefit of being able to respond to queries with low latency. This solution should be compared at the very least to the O(n) solution (go through each circle and check whether dist(point, circle.center) < circle.radius).
Instead of considering the circles, you can test on their bounding boxes to filter out the ones which do not contain the point. If your bounding box sides are all sorted, this is essentially four binary searches.
The tricky part is reconstructing the sorted sides for any given time, t. To do that, you can start off with the original points: two lists for the left and right sides with the x coordinates, and two lists for top and bottom with the y coordinates. For any time greater than 0, all the left-side points will move to the left, etc. You only need to compare each location with the one next to it to obtain the points in time where an element and the one next to it are swapped. This should give you a list of time points at which to modify your ordered lists. If you now sort these modification records by time, then for any given starting time and ending time you can extract all the modification records between the two and apply them to your four lists in order. I haven't completely figured out the algorithm, but I think there will be edge cases where three or more successive elements cross over at exactly the same time point, so you may need to modify the algorithm to handle those as well. Perhaps a list-modification record that contains the position in the list and the number of records to reorder would suffice.
I think it's possible to create a binary tree that solves this problem.
Each branch should contain a growing circle, a static circle for partitioning, and the latest time at which the partitioning circle cleanly partitions. Furthermore, the growing circle contained within a node should always have a faster growth rate than either of its child nodes' growing circles.
To do a query, take the root node. First check its growing circle; if it contains the query point at the query time, add it to the answer set. Then, if the time you're querying is greater than the time at which the partitioning breaks down, query both children; otherwise, if the point falls within the partitioning circle, query the left node, else query the right node.
I haven't quite completed the details of performing insertions, (the difficult part is updating the partition circle so that the number of nodes on the inside and outside is approximately equal and the time when the partition is broken is maximized).
To combat the case with a few quickly growing circles, you could sort the circles in descending order by growth rate and check each of the k fastest growers. To find the proper k for a given t, I think you can perform a binary search for the index k such that k*m = (t * growth rate of k)^2, where m is a constant factor you'll need to find by experimentation. This will balance the part that grows linearly with k against the part that falls quadratically with the growth rate.
If you, as already suggested, represent the growing circles by vertical cones in 3D, then you can partition the space into a regular (maybe hexagonal) grid of packed vertical cylinders. For each cylinder, calculate the minimal and maximal heights (times) of its intersections with all cones. If a circle centre (the vertex of a cone) is placed inside the cylinder, then the minimal time is zero. Then sort the cones by minimal intersection time. As a result of such indexing, for each cylinder you'll have an ordered sequence of records with 3 values: minimal time, maximal time and circle number.
When you check some point in 3D space, you take the cylinder it belongs to and iterate over its sequence until the stored minimal time exceeds the time of the given point. All obtained cones whose maximal time is also less than the given time are guaranteed to contain the given point. Only the cones where the given time lies between the minimal and maximal intersection times need to be recalculated exactly.
There is a classical trade-off between indexing and runtime costs: the smaller the cylinder diameter, the smaller the range of intersection times, so fewer cones need recalculation at each point, but more cylinders have to be indexed. If circle centres are distributed unevenly, then it may be worth searching for a better cylinder placement configuration than a regular grid.
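Here is a rough Python sketch of that index, using unit square cells in place of cylinders and assuming circles of the form (cx, cy, r0, rate) with rate > 0; it reads the minimal time as "first touches the cell" and the maximal time as "fully covers the cell" (all names are mine, purely for illustration):

import math
from collections import defaultdict

def cell_times(cell, circle):
    """Earliest time the circle touches the unit cell, and earliest time it covers it."""
    X, Y = cell
    cx, cy, r0, rate = circle
    dx_near = max(X - cx, 0.0, cx - (X + 1))
    dy_near = max(Y - cy, 0.0, cy - (Y + 1))
    d_near = math.hypot(dx_near, dy_near)
    d_far = math.hypot(max(abs(cx - X), abs(cx - (X + 1))),
                       max(abs(cy - Y), abs(cy - (Y + 1))))
    return max(0.0, (d_near - r0) / rate), max(0.0, (d_far - r0) / rate)

def build_index(circles, lo, hi):
    index = defaultdict(list)
    for X in range(lo, hi):
        for Y in range(lo, hi):
            for cid, c in enumerate(circles):
                index[(X, Y)].append(cell_times((X, Y), c) + (cid,))
            index[(X, Y)].sort()                   # order by minimal time
    return index

def query(index, circles, px, py, t):
    hits = []
    for t_min, t_max, cid in index.get((math.floor(px), math.floor(py)), []):
        if t_min > t:
            break                                  # later records start even later
        if t_max <= t:
            hits.append(cid)                       # cell fully covered: guaranteed hit
        else:                                      # borderline: recalculate exactly
            cx, cy, r0, rate = circles[cid]
            if (px - cx) ** 2 + (py - cy) ** 2 <= (r0 + rate * t) ** 2:
                hits.append(cid)
    return hits

circles = [(0.5, 0.5, 0.2, 1.0), (5.5, 5.5, 0.2, 0.1)]
index = build_index(circles, -10, 10)
print(query(index, circles, 3.0, 0.5, 3.0))        # circle 0 reaches (3, 0.5) by t = 3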
P.S. My first answer here - just registered to post it. Hope it isn’t late.

SQL Bounding Box Optimization

Can anybody link to any documents regarding optimized bounding box style queries in SQL?
At the most basic level, imagine a table consisting of x, y float columns; we query the table for rows within a certain (x1,x2), (y1,y2) range. The query to do this is trivial, but what is the best way to define the indexes to ensure it behaves efficiently?
We could simply create separate indexes on the x and y columns, or a composite index covering both, but I don't know enough about SQL indexing to reason my way through this.
I am using MySQL.
A space-filling curve works well for reducing a 2D space to a 1D problem. It's constructed like a fractal and is basically a Gray-code traversal of the surface. Instead of calculating an index, you can put together a quadtree-path prefix-free key, similar to a Huffman code. Then you can use a simple string query to retrieve a box. MySQL has a spatial index extension, but I don't know which curve it uses; it's probably the simple Z-curve or the Peano curve. You can take a look at Nick's blog on spatial indexes, quadtrees and Hilbert curves. Monotonic n-ary Gray codes can also be very interesting.
MySQL's spatial extensions:
they can use R-tree indexes,
and then you have handy functions like MBRWithin.
Seems right up your alley.

Get polygons close to a lat,long in MySQL

Does anyone know of a way to fetch all polygons in a MySQL db within a given distance from a point? The actual distance is not that important since it's calculated for each found polygon later, but it would be a huge optimization to just do that calculation for the polygons that are "close".
I've looked at the MBR and contains functions but the problem is that some of the polygons are not contained within a bounding box drawn around the point since they are very big, but some of their vertices are still close.
Any suggestions?
A slow version (without spatial indexes):
SELECT *
FROM mytable
WHERE MBRIntersects(mypolygon, LineString(Point(#X - #distance, #Y - #distance), Point(#X + #distance, #Y + #distance)))
To make use of the spatial indexes, you need to denormalize your table so that each polygon vertex is stored in its own record.
Then create the SPATIAL INDEX on the field which contains the coordinates of the vertices and just issue this query:
SELECT DISTINCT polygon_id
FROM vertices
WHERE MBRContains(LineString(Point(#X - #distance, #Y - #distance), Point(#X + #distance, #Y + #distance)), vertex)
Things will be much easier if you store UTM coordinates in your database rather than latitude and longitude.
I don't think there's a single answer to this. It's generally a question of how to organize your data so that it makes use of the spatial locality inherent to your problem.
The first idea that pops into my head would be to use a grid, assign each point to a square, and select the square the point is in plus those around it. If we're talking infinite grids, then use a hash value of the square; this would give you more points than needed (where you have collisions), but will still reduce the amount by a bunch. Of course this isn't immediately applicable to polygons; it's just a brainstorm. A possible approach that might yield too many collisions would be to OR all hashed values together and select all entries where the hash ANDed with that value is non-zero (not sure if this is possible in MySQL); you might want to use a large number of bits though.
The problem with this approach, assuming we're talking spherical coordinates (which lat, long generally are), is the singularities: the grid 'squares' grow narrower as you approach the poles. The easy way around this is... don't put any points close to the poles... :)
Create a bounding box for each of the polygons (optionally storing these results in the database, which will make this a lot faster for complex polygons). You can then compare the bounding box of each polygon with the one around the point at the desired size, and select all the polygons whose bounding boxes intersect.
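A tiny Python sketch of that approach (made-up polygon data, axis-aligned boxes; the exact distance test would follow on the survivors):

def polygon_bbox(vertices):
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_intersect(a, b):
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

polygons = {
    1: [(0, 0), (4, 0), (4, 4), (0, 4)],
    2: [(100, 100), (110, 100), (105, 110)],
}
px, py, d = 6.0, 2.0, 5.0
query_box = (px - d, py - d, px + d, py + d)
candidates = [pid for pid, verts in polygons.items()
              if boxes_intersect(polygon_bbox(verts), query_box)]
print(candidates)   # only polygon 1 survives the cheap box test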