MySQL Find Polygon Nearest to Point

I have a MySQL database that contains geo-tagged objects. The objects are tagged by using a bounding polygon that the user draws and my program exports into the database. The bounding polygon is stored in the database as a Polygon (the MySQL spatial extensions kind).
I can think of a couple of ways to do this, but I'm not pleased with any of them, as this needs to be an efficient process that will execute fairly often, though probably against fewer than 50,000 records in the pertinent table.
I need a way to, given any point on the earth, find the record that corresponds to the closest geo-tagged/bounded object. It doesn't need to be correct in all cases; let's say (just to invent a number) 95% of the time. Manual correction is acceptable if it doesn't need to be done very frequently.

It appears as though this question is very similar:
Get polygons close to a lat,long in MySQL.
I am going to write some application-level code to do an iteratively widening search on the distance, as described in the linked question.
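For reference, a rough sketch of what one iteration of that widening search could look like in MySQL. The objects table, its spatially indexed bounds polygon column, and the assumption that coordinates are stored as lon/lat are all invented for illustration; the application code would widen @delta and re-run until a row comes back.

SET @lng = -122.41, @lat = 37.77, @delta = 0.01;  -- half-width of the search window, in degrees

-- Build a square window around the point for this iteration.
SET @win = ST_GeomFromText(CONCAT('POLYGON((',
    @lng - @delta, ' ', @lat - @delta, ',',
    @lng + @delta, ' ', @lat - @delta, ',',
    @lng + @delta, ' ', @lat + @delta, ',',
    @lng - @delta, ' ', @lat + @delta, ',',
    @lng - @delta, ' ', @lat - @delta, '))'));

SELECT id,
       ST_Distance(bounds, ST_GeomFromText(CONCAT('POINT(', @lng, ' ', @lat, ')'))) AS dist
FROM objects
WHERE MBRIntersects(bounds, @win)  -- coarse candidate filter that can use the SPATIAL index
ORDER BY dist
LIMIT 1;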

Related

Geo-location search with MYSQL InnoDB

I am working on a geo-enabled application with an obvious use case: searching for users within some distance of a given user's location. Currently I am using a MySQL database. As the user table is expected to grow very large over time, getting results will take longer and longer (far too long if the entire table has to be traversed).
I am using InnoDB because my tables need several things MyISAM can't do. I have tried Mongo, and did a test drive by adding 5 million users and running some queries over them. Now I am curious what MySQL can offer in the same situation, as I would prefer MySQL if it gives results even slightly close to Mongo's.
My user table has other fields plus a lat field and a lng field (both indexed), yet queries still take a long time. Can anyone suggest a better design approach for faster results?
Mongo has a bunch of very useful built-in geospatial commands and aggregations that will be ideal for your case of finding users near a given user's point. Others include the within query, which finds points inside a bounding box or polygon. In your case the geoNear aggregation is perfect and can return the calculated distance from the given point.
You will have to code a lot of that functionality yourself with MySQL. There is also PostGIS, an add-on for Postgres. Postgres is the classic open-source MySQL competitor, PostGIS has been around longer than Mongo, and it is presumably the database behind OpenStreetMap, government GIS, and similar systems.
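To give an idea of what coding it by hand in MySQL looks like, here is a minimal sketch of a bounding-box pre-filter plus a haversine distance calculation. The users table name, its indexed lat/lng columns, and the 111 km-per-degree figure are all assumptions for illustration.

SET @lat = 51.50, @lng = -0.12, @radius_km = 10;

SELECT id, distance_km
FROM (
    SELECT id,
           -- haversine great-circle distance in kilometres
           6371 * 2 * ASIN(SQRT(
               POW(SIN(RADIANS(lat - @lat) / 2), 2) +
               COS(RADIANS(@lat)) * COS(RADIANS(lat)) *
               POW(SIN(RADIANS(lng - @lng) / 2), 2))) AS distance_km
    FROM users
    -- rough bounding box first, so the lat/lng indexes can cut the scan down
    WHERE lat BETWEEN @lat - @radius_km / 111.0 AND @lat + @radius_km / 111.0
      AND lng BETWEEN @lng - @radius_km / (111.0 * COS(RADIANS(@lat)))
                  AND @lng + @radius_km / (111.0 * COS(RADIANS(@lat)))
) AS candidates
WHERE distance_km <= @radius_km
ORDER BY distance_km
LIMIT 50;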
But back to the problem: you need to use GeoJSON format and a 2dsphere index, which you might not be using. Post a single record of your data.

Handling multiple geofences in Google Maps

I have around 100 geofences (polygons) defined and stored in the DB. My tracking devices update their location once a minute. What would be the best way to check whether a given LatLng is in any of these geofences? I want to trigger an alert when a device is inside any of them.
What I can think of is: each minute, after receiving the location from a tracking device, query the geofence information from the DB or an array and compare one geofence at a time. But this seems computationally expensive.
Any ideas and help, please.
Assuming that the stored geo-fences are relatively static (i.e. not modified/added/deleted frequently) you could trade storage space for point look-up time by choosing to represent your geo-fences with a suitable spatial data structure.
R-Trees (https://en.wikipedia.org/wiki/R-tree) for example could be used to store which geo-fences might be applicable to a given point location so that only a subset of those fences need to be checked to determine if the point lies within them.
Pragmatically, you are likely best off using an existing spatially enabled database like PostgreSQL+PostGIS (http://postgis.net/), which lets you efficiently run queries based on spatial relations (in your application, likely ST_Within or ST_Contains).
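For instance, with PostGIS the per-minute check collapses to a single indexed query. A minimal sketch, assuming a geofences(id, name, geom) table in SRID 4326 (the table and column names are invented):

CREATE INDEX geofences_geom_idx ON geofences USING GIST (geom);

-- Replace the literal longitude/latitude with the device's last reported position.
SELECT id, name
FROM geofences
WHERE ST_Contains(geom, ST_SetSRID(ST_MakePoint(-0.1276, 51.5072), 4326));

The GiST index prunes the candidate geofences by bounding box, so only a handful of exact containment tests are run per location update.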

MySQL Postgresql / PostGIS

I have lat/lon coordinates in a 400 million rows partitioned mysql table.
The table grows at about 2,000 records a minute, and old data is flushed every few weeks.
I am exploring ways to do spatial analysis of this data as it comes in.
Most of the analysis requires finding whether a point is in a particular lat/lon polygon or which polygons contain that point.
I see the following ways of tackling the point in polygon (PIP) problem:
Create a mysql function that takes a point and a Geometry and returns a boolean.
Simple, but I'm not sure how Geometry can be used to perform operations on lat/lon coordinates, since Geometry assumes flat surfaces rather than spheres.
Create a mysql function that takes a point and an identifier for a custom data structure and returns a boolean.
The polygon vertices can be stored in a table, and a function can compute PIP using spherical math. A large number of polygon points may lead to a huge table and slow queries.
Leave the point data in mysql, store the polygon data in PostGIS, and use the app server to run the PIP query in PostGIS, providing the point as a parameter.
Port the application from MySQL to Postgresql/PostGIS.
This will require a lot of effort in rewriting queries and procedures.
I can still do it, but how good is Postgresql at handling 400 million rows?
A quick Google search for "mysql 1 billion rows" returns many results; the same query for Postgres returns no relevant results.
Would like to hear some thoughts & suggestions.
A few thoughts.
First, PostgreSQL and MySQL are completely different beasts when it comes to performance tuning. So if you go the porting route, be prepared to rethink your indexing strategies. Not only does PostgreSQL have far more flexible indexing than MySQL, but the table approaches are very different too, meaning the appropriate indexing strategies differ just as much as the tactics do. Unfortunately this means you can expect to struggle a bit. If I could give one piece of advice, I would suggest dropping all non-key indexes at first and then adding them back sparingly as needed.
The second point is that nobody here can likely give you a huge amount of practical advice at this point because we don't know the internals of your program. In PostgreSQL, you are best off indexing only what you need, but you can index functions' outputs (which is really helpful in cases like this) and you can index only part of a table.
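To make those two ideas concrete, a hedged sketch of an expression index and a partial index in PostgreSQL/PostGIS; the points table and its lat, lng, and created_at columns are invented for illustration:

-- Index the output of a function (here, a geometry built from plain lat/lng columns).
CREATE INDEX points_geom_idx
    ON points USING GIST (ST_SetSRID(ST_MakePoint(lng, lat), 4326));

-- Index only part of the table, e.g. the rows still subject to analysis.
CREATE INDEX points_recent_geom_idx
    ON points USING GIST (ST_SetSRID(ST_MakePoint(lng, lat), 4326))
    WHERE created_at > DATE '2015-01-01';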
I am more a PostgreSQL guy than a MySQL guy so of course I think you should go with PostgreSQL. However rather than tell you why etc. and have you struggle at this scale, I will tell you a few things that I would look at using if I were trying to do this.
Functional indexes
Write my own functions for indexes for related analysis
PostGIS is pretty amazing and very flexible
In the end, switching db's at this volume is going to be a learning curve, and you need to be prepared for that. However, PostgreSQL can handle the volume just fine.
The number of rows is quite irrelevant here.
The question is how much of the point-in-polygon work can be done by the index.
The answer to that depends on how big the polygons are.
PostGIS is very fast to find all points in the bounding box of a polygon. Then it takes more effort to find out if the point actually is inside the polygon.
If your polygons are small (small bounding boxes), the query will be efficient. If your polygons are big, or have a shape that makes the bounding box big, then it will be less efficient.
If your polygons are more or less static, there are workarounds. You can divide your polygons into smaller polygons and recreate the index. Then the index will be more efficient.
If your polygons are actually multipolygons, the first step is to split the multipolygons into polygons with ST_Dump and build an index on the result.
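A minimal sketch of that ST_Dump step, with invented table and column names (an areas table whose geom column holds the multipolygons):

-- Explode each multipolygon into its component polygons, keeping the original id.
CREATE TABLE area_parts AS
SELECT id AS area_id, (ST_Dump(geom)).geom AS geom
FROM areas;

-- Rebuild the spatial index on the smaller pieces.
CREATE INDEX area_parts_geom_idx ON area_parts USING GIST (geom);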
HTH
Nicklas

Reverse Image Search Storage in Relational Database

So, this question is similar to what I need, but the answers there don't quite match. I'm looking for a way to take a set of SURF descriptors and store them in a MySQL database so that I can take an image from a user, and run a reverse image search quickly.
What I'm doing now
At the moment, I am taking the list of descriptors given to me by jOpenSurf, running through them, and converting them to two 64-character strings. With this, I can query and find exact matches very easily, but I don't want just exact matches; I would like to compare features.
What (I think) I need to do
After doing a bit of research online and looking at the comparison code provided by jOpenSurf, I think what I need to do is store the vector value of each interest point in the database so that I can compare that. But that is where I'm stuck.
What I need help with
How in the world can I store a vector value into a MySQL database so I can do a comparison for similarity matching on images?
I don't know of anything in MySQL that would allow vector comparisons natively. Some people have asked whether the geospatial module would allow this, but the consensus was no (e.g. a cosine function on vectors).
You would need to evaluate the similarity scores outside of the database and store them in the database after evaluation (e.g. store the top 5 hits per image in the database). Assuming you are using a symmetric scoring algorithm, you would only need to update the database for at most N images for managing top N similar images for each new image evaluated.
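One possible shape for that, purely as an illustrative sketch (table and column names are made up): store each image's pre-computed top-N neighbours in a plain MySQL table and look them up by image id.

CREATE TABLE image_similarity (
    image_id         BIGINT NOT NULL,
    similar_image_id BIGINT NOT NULL,
    score            DOUBLE NOT NULL,   -- similarity computed outside the database
    PRIMARY KEY (image_id, similar_image_id),
    KEY idx_image_score (image_id, score)
);

-- When a new image is scored, insert its top-N matches; with a symmetric score,
-- also insert the mirrored rows so look-ups from either side stay cheap.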
There is some interesting work using a reverse index database (e.g. search engine) in providing image relevancy searches based on both image features and additional metadata or text if you so choose: http://www.mendeley.com/research/lire-lucene-image-retrieval-an-extensible-java-cbir-library/

How good is the geography datatype in sql server 2008?

I have a large database full of customers, implemented in sql server 2005. Customers each have a latitude and longitude, represented as Decimal(18,15). The most important search query in the database tries to find all customers close to a certain location like this:
((Addresses.Latitude - @SearchInLat) BETWEEN -1 * @LatitudeBound AND @LatitudeBound)
AND ((Addresses.Longitude - @SearchInLng) BETWEEN -1 * @LongitudeBound AND @LongitudeBound)
So, this is a very simple method. @LatitudeBound and @LongitudeBound are just numbers, used to pull back all the customers within a rough bounding rectangle of the point @SearchInLat, @SearchInLng. Once the results get to a client PC, some results are filtered out so that there is a bounding circle rather than a rectangle. (This is done on the client PC to avoid calculating square roots on the server.)
This method has worked well enough in the past. However, we now want to make the search do more interesting things - for instance, having the number of results pulled back be more predictable, or for the user to dynamically increase the size of the search radius. To do this, I have been looking at the possibility of upgrading to sql server 2008, with its Geography datatype, spatial indexes, and distance functions. My question is this: how fast are these?
The advantage of the simple query we have at the moment is that it is very fast and not performance intensive, which is important as it is called very often. How fast would a query based around something like this:
@SearchInPoint.STDistance(Addresses.GeographicPoint) < @DistanceBound
be by comparison? Do the spatial indexes work well, and is STDistance fast?
If you're handling just a standard lat/lng pair as you describe, and all you're doing is a simple lookup, then arguably you're not going to gain much in the way of a speed increase by using the Geometry type.
However, if you do want to get more adventurous as you state, then swapping to using the Geometry types will open up a whole world of new possibilities for you, and not just for searches.
For example (based on a project I'm working on), you could (if it's UK data) download the polygon definitions for all the towns / villages / cities in a given area, then do cross-references to search within a particular town; or, if you had a road map, you could find which customers live next to major delivery routes, motorways, primary roads, all sorts of things.
You could also do some very fancy reporting: imagine a map of towns where each outline is plotted and then shaded in with a colour to show the density of customers in that area. Some simple geometry SQL will easily return a count straight from the database for graphing that kind of information.
Then there's tracking. I don't know what data you handle or why you have customers, but if you're delivering anything, feeding in the co-ordinates of a delivery van tells you how close it is to a given customer.
As for the question "is STDistance fast?", that's difficult to say really. I think a better question is "is it fast in comparison to.....?"; it's difficult to say yes or no unless you have something to compare it to.
Spatial indexes are one of the primary reasons for moving your data to a geographically aware database: they are optimised to produce the best results for a given task, but, like any database, if you create bad indexes then you will get bad performance.
In general you should definitely see a speed increase of some sort, because the maths in the sorting and indexing is more aware of the data's purpose, as opposed to being fairly linear in operation like a normal index.
Bear in mind as well that the beefier the SQL Server machine is, the better the results you'll get.
One last point to mention is management of the data: if you're using a GIS-aware database, that opens the avenue for you to use a GIS package such as ArcMap or MapInfo to manage, correct and visualise your data, meaning corrections are very easy to do by pointing, clicking and dragging.
My advice would be to create a side-by-side table to your existing one that is formatted for spatial operations, then write a few stored procs and do some timing tests to see which comes out best. If you see a significant increase just on the basic operations you're doing, then that's justification alone; if it's about equal, your decision really hinges on what new functionality you actually want to achieve.
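If it helps, a rough sketch of what that side-by-side test could look like in SQL Server 2008. The Addresses table and its Latitude, Longitude and GeographicPoint names come from the question; the AddressesGeo table, AddressId key and index names are assumptions, and STDistance on the geography type returns metres.

-- Copy the existing rows into a spatially enabled test table.
SELECT AddressId,
       geography::Point(Latitude, Longitude, 4326) AS GeographicPoint
INTO AddressesGeo
FROM Addresses;

-- A spatial index requires a clustered primary key on the table.
ALTER TABLE AddressesGeo ADD CONSTRAINT PK_AddressesGeo PRIMARY KEY CLUSTERED (AddressId);
CREATE SPATIAL INDEX SIdx_AddressesGeo_Point ON AddressesGeo (GeographicPoint);

-- The radius query to time against the existing bounding-box version.
DECLARE @SearchInPoint geography = geography::Point(51.5, -0.12, 4326);
DECLARE @DistanceBound float = 5000;  -- metres

SELECT AddressId
FROM AddressesGeo
WHERE GeographicPoint.STDistance(@SearchInPoint) < @DistanceBound;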