I'm aware this kind of question is asked frequently, but given the age of most of those posts and the database improvements I've read about since, I think it's worth asking a fresh one.
I'm trying to efficiently store coordinates (around 100K) in a database and run some operations on them. In other words, I want to store coordinates and get fast access to them. A typical operation should return all records within a circle of 20 km radius around a given center point.
I read that MongoDB has a 2D spatial index that can store latitude and longitude, as in this example: https://stackoverflow.com/a/6026634/6271092
I also read that it's possible to store coordinates and retrieve those within a circle using MySQL, a k-d tree and the Haversine formula.
So my question is: as of January 2018, which database should I use, and how, to store, access and operate on coordinates efficiently? Thanks.
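For concreteness, here is a hedged sketch of what the 20 km radius query could look like in MySQL 5.7+ with ST_Distance_Sphere (which returns metres); the table and column names are assumptions, not a recommendation of which database is best:

    -- Table is an assumption; ST_Distance_Sphere expects POINT(longitude, latitude).
    CREATE TABLE coords (
        id INT AUTO_INCREMENT PRIMARY KEY,
        pt POINT NOT NULL,
        SPATIAL INDEX (pt)
    ) ENGINE=InnoDB;

    -- All records within 20 km (20000 m) of the given centre.
    -- Note: this WHERE alone will not use the SPATIAL index; in practice you
    -- would pre-filter with MBRContains on a bounding box.
    SELECT id
    FROM coords
    WHERE ST_Distance_Sphere(pt, POINT(2.3522, 48.8566)) <= 20000;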
Related
I have data of locations of thousands of sensors in MySQL. I want to identify the sensor closest to the user's location and show that specific sensor's data. All the location data is available as lat lng.
I understand that one approach would be to compute the distance from the user's location to every sensor using the Haversine formula and select the one with the shortest distance. The problem is that there are tens of thousands of sensors.
Any suggestions/leads?
A spatial index allows efficient queries for points within a given distance. The problem, of course, is that you may not know the search radius a specific case needs: a large radius makes the query inefficient, while a small radius might return no match at all.
A possible solution is to search with an increasing radius until the search returns some results, and then pick the closest result among those.
This article describes this solution for BigQuery; it would require some adaptation to MySQL's scripting dialect:
https://mentin.medium.com/nearest-neighbor-using-bq-scripting-373241f5b2f5
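As a hedged illustration of how the expanding-radius idea might translate to MySQL (5.7.6+ for ST_Distance_Sphere), assuming a sensors(id, lat, lng) table; in practice you would also pre-filter with a bounding box or spatial index so each pass stays cheap:

    DELIMITER //
    CREATE PROCEDURE nearest_sensor(IN user_lat DOUBLE, IN user_lng DOUBLE)
    BEGIN
        DECLARE radius_m DOUBLE DEFAULT 1000;   -- start at 1 km
        DECLARE hits INT DEFAULT 0;
        grow: WHILE radius_m <= 512000 DO       -- give up past ~512 km
            SELECT COUNT(*) INTO hits
            FROM sensors
            WHERE ST_Distance_Sphere(POINT(lng, lat), POINT(user_lng, user_lat)) <= radius_m;
            IF hits > 0 THEN
                LEAVE grow;
            END IF;
            SET radius_m = radius_m * 2;        -- double the radius and retry
        END WHILE;

        -- Return the single closest sensor among the matches in the final radius.
        SELECT id,
               ST_Distance_Sphere(POINT(lng, lat), POINT(user_lng, user_lat)) AS dist_m
        FROM sensors
        WHERE ST_Distance_Sphere(POINT(lng, lat), POINT(user_lng, user_lat)) <= radius_m
        ORDER BY dist_m
        LIMIT 1;
    END //
    DELIMITER ;

    CALL nearest_sensor(37.77, -122.41);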
Not the MySQL answer you are looking for, but PostgreSQL's popular PostGIS extension has a built-in K-nearest-neighbour operator class (see its documentation). It works great!
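For illustration, a minimal sketch of the KNN ordering; the sensors table, its geog geography(Point, 4326) column and the sample coordinates are assumptions. With PostGIS 2.2+ the "<->" operator gives true distance ordering on geography and is driven by the GiST index:

    CREATE INDEX sensors_geog_gist ON sensors USING GIST (geog);

    SELECT id
    FROM sensors
    ORDER BY geog <-> ST_SetSRID(ST_MakePoint(-122.41, 37.77), 4326)::geography
    LIMIT 1;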
Also, I am aware of this Go library that allows you to do KNN in memory after building a Quadtree with your sensor locations.
For only a few thousand rows, a simple bounding box with two 2-column indexes may be fast enough.
For better speed, see SPATIAL indexing.
For details on those two solutions, plus two faster ones, see Find Nearest
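A rough sketch of that bounding-box approach; the table and column names, the user's coordinates and the 0.09-degree half-width (roughly 10 km of latitude) are assumptions. Longitude degrees shrink with latitude, so a stricter version widens the lng bound by 1/COS(RADIANS(@lat)):

    ALTER TABLE sensors
        ADD INDEX idx_lat_lng (lat, lng),
        ADD INDEX idx_lng_lat (lng, lat);

    SET @lat := 37.77, @lng := -122.41, @deg := 0.09;

    SELECT id,
           ST_Distance_Sphere(POINT(lng, lat), POINT(@lng, @lat)) AS dist_m
    FROM sensors
    WHERE lat BETWEEN @lat - @deg AND @lat + @deg
      AND lng BETWEEN @lng - @deg AND @lng + @deg
    ORDER BY dist_m
    LIMIT 1;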
I want to implement a feature where a list of nearby venues can be presented, sorted by distance from the user's location. The approach I have right now is to store the lat and lon values as floats and to query for +/- offsets around the user's location (searching a square that extends north, south, east and west of the user). Then I do a quick calculation across the result set to determine the distance and sort in my business logic. I'm approaching this as someone who has primarily used relational databases (the app runs MySQL with Hibernate), but is there a better approach, either in a different database like Neo4j or with a better column type?
Also, the approach I have needs a semi-complex workaround for queries at or near 0 latitude or 0 longitude.
As for my working definition of optimal: I'm looking for approaches that scale to potentially hundreds of venues within a 10-mile radius and hundreds of thousands of venues in total. To put it another way, roughly 1% of SimpleGeo. So if the scale of this problem doesn't require an optimal solution, then "you're alright" would also be an interesting answer, though I'd be interested in knowing why.
You could have a look at Lucene/Solr. Lucene has supported location-aware search since at least v2.9.
If you're worried about Lucene's complexity, there's Hibernate Search, which is meant to replicate all database changes to Lucene transparently.
MongoDB has native support for geospatial indexes and extensions to its query language that support many different ways of querying your geospatial documents.
But if you are looking for a relational database, try PostgreSQL with PostGIS.
Have you looked at Hibernate Spatial?
Hibernate Spatial is a generic extension to Hibernate for handling geographic data, and it has a MySQL provider.
http://www.hibernatespatial.org/
I am working with PHP and use MySQL as the database. I need a fast way to get the 5 closest coordinates to a given coordinate from the database, with at least 80-90% accuracy. I have researched a lot and found the Haversine formula, the spherical law of cosines, the bounding-square method (comparing min and max latitude/longitude values against the coordinates in the database) and other methods that use trigonometric functions. But all of these take too long to return results from a database with thousands of entries. Does MySQL provide any function to do this quickly?
See this similar question on the GIS Stack Exchange site. The performance of your eventual solution will depend on how many targets are in the reference table you are searching and on whether you can limit the distance you are interested in (such as the closest 5 within 30 miles). I don't think you can reliably optimize the process; you need to calculate the distance for all coordinates in your reference table.
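A hedged sketch of the usual pattern: the spherical law of cosines in plain SQL, pre-filtered with a bounding box so the trigonometry only runs on nearby rows. The places(id, lat, lng) table, the sample coordinates and the 30-mile cutoff are assumptions; 3959 is the Earth's radius in miles and 69 is roughly the miles per degree of latitude:

    SET @lat := 40.7128, @lng := -74.0060, @miles := 30;

    SELECT id,
           3959 * ACOS(
               COS(RADIANS(@lat)) * COS(RADIANS(lat)) * COS(RADIANS(lng - @lng))
             + SIN(RADIANS(@lat)) * SIN(RADIANS(lat))
           ) AS dist_miles
    FROM places
    WHERE lat BETWEEN @lat - @miles / 69.0 AND @lat + @miles / 69.0
      AND lng BETWEEN @lng - @miles / (69.0 * COS(RADIANS(@lat)))
                  AND @lng + @miles / (69.0 * COS(RADIANS(@lat)))
    HAVING dist_miles <= @miles
    ORDER BY dist_miles
    LIMIT 5;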
I have a large database full of customers, implemented in SQL Server 2005. Each customer has a latitude and longitude, represented as Decimal(18,15). The most important search query in the database tries to find all customers close to a certain location, like this:
( (Addresses.Latitude - @SearchInLat) BETWEEN -1 * @LatitudeBound AND @LatitudeBound )
AND ( (Addresses.Longitude - @SearchInLng) BETWEEN -1 * @LongitudeBound AND @LongitudeBound )
So, this is a very simple method. @LatitudeBound and @LongitudeBound are just numbers, used to pull back all the customers within a rough bounding rectangle around the point (@SearchInLat, @SearchInLng). Once the results get to a client PC, some results are filtered out so that there is a bounding circle rather than a rectangle. (This is done on the client PC to avoid calculating square roots on the server.)
This method has worked well enough in the past. However, we now want to make the search do more interesting things - for instance, having the number of results pulled back be more predictable, or letting the user dynamically increase the size of the search radius. To do this, I have been looking at the possibility of upgrading to SQL Server 2008, with its Geography datatype, spatial indexes, and distance functions. My question is this: how fast are these?
The advantage of the simple query we have at the moment is that it is very fast and not performance intensive, which is important as it is called very often. How fast would a query based around something like this:
@SearchInPoint.STDistance(Addresses.GeographicPoint) < @DistanceBound
be by comparison? Do the spatial indexes work well, and is STDistance fast?
If you're handling just a standard lat/lng pair as you describe, and all you're doing is a simple lookup, then arguably you're not going to gain much in the way of a speed increase by using the Geometry type.
However, if you do want to get more adventurous, as you state, then swapping to the Geometry types will open up a whole world of new possibilities for you, and not just for searches.
For example (based on a project I'm working on), if it's UK data you could download the polygon definitions for all the towns, villages and cities in a given area, then cross-reference them to search within a particular town; or, if you had a road map, you could find which customers live next to major delivery routes, motorways or primary roads - all sorts of things.
You could also do some very fancy reporting. Imagine a map of towns where each outline is plotted and then shaded with a colour to show the density of customers in that area; some simple geometry SQL will easily return a count straight from the database, to graph that kind of information.
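A hedged illustration of that density report in SQL Server 2008 terms; the Towns/Customers tables and their geography columns (Boundary, GeographicPoint) are assumptions, not your actual schema:

    SELECT t.TownName,
           COUNT(c.CustomerId) AS CustomerCount
    FROM Towns AS t
    LEFT JOIN Customers AS c
           ON t.Boundary.STIntersects(c.GeographicPoint) = 1   -- customer point falls inside the town polygon
    GROUP BY t.TownName
    ORDER BY CustomerCount DESC;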
Then there's tracking. I don't know what data you handle or why you have customers, but if you're delivering anything, feeding in the coordinates of a delivery van tells you how close it is to a given customer.
As for the question "Is STDistance fast?" - that's difficult to say. I think a better question is "Is it fast in comparison to ...?"; it's hard to answer yes or no unless you have something to compare it to.
Spatial indexes are one of the primary reasons for moving your data to a geographically aware database: they are optimised to produce the best results for a given task. But, like any database, if you create bad indexes you will get bad performance.
In general you should definitely see a speed increase of some sort, because the maths behind the sorting and indexing is more aware of the data's purpose, as opposed to being fairly linear in operation like a normal index.
Bear in mind as well that the beefier the SQL Server machine is, the better results you'll get.
One last point to mention is management of the data: if you're using a GIS-aware database, that opens the avenue of using a GIS package such as ArcMap or MapInfo to manage, correct and visualise your data, meaning corrections are very easy to make by pointing, clicking and dragging.
My advice would be to create a table side by side with your existing one, formatted for spatial operations, then write a few stored procs and do some timing tests to see which comes out best. If you see a significant increase just on the basic operations you're doing, then that's justification alone; if it's about equal, then your decision really hinges on what new functionality you actually want to achieve.
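A rough sketch of that side-by-side experiment; the AddressesGeo table, its column names and the sample values are assumptions. The geography type needs SQL Server 2008+, the spatial index needs a clustered primary key on the table, and STDistance on geography returns metres:

    ALTER TABLE AddressesGeo ADD GeographicPoint geography;

    UPDATE AddressesGeo
    SET GeographicPoint = geography::Point(Latitude, Longitude, 4326);

    CREATE SPATIAL INDEX SIdx_AddressesGeo_GeographicPoint
        ON AddressesGeo (GeographicPoint);

    -- Candidate query to time against the existing bounding-box version.
    DECLARE @SearchInLat float = 51.5074,
            @SearchInLng float = -0.1278,
            @DistanceBound float = 5000;   -- metres
    DECLARE @SearchInPoint geography = geography::Point(@SearchInLat, @SearchInLng, 4326);

    SELECT AddressId
    FROM AddressesGeo
    WHERE GeographicPoint.STDistance(@SearchInPoint) < @DistanceBound;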
I would like to store thousands of latitude/longitude points in a MySQL database. I was successful in setting up the tables and adding the data using the geospatial extensions, where the column 'coord' is a Point(lat, lng).
Problem:
I want to quickly find the N closest entries to latitude X degrees and longitude Y degrees. Since the Distance() function has not yet been implemented, I used the GLength() function to calculate the distance between (X, Y) and each of the entries, sorting by ascending distance and limiting to N results. The problem is that this does not calculate the shortest distance using spherical geometry. That means that if Y = 179.9 degrees, the list of closest entries will only include longitudes starting at 179.9 and decreasing, even though closer entries exist with longitudes increasing from -179.9.
How does one typically handle the discontinuity in longitude when working with spherical geometries in databases? There has to be an easy solution to this, but I must just be searching for the wrong thing because I have not found anything helpful.
Should I just forget the GLength() function and create my own function for calculating angular separation? If I do this, will it still be fast and take advantage of the geospatial extensions?
Thanks!
josh
UPDATE:
This is exactly what I am describing above. However, it is only for SQL Server. Apparently SQL Server has Geometry and Geography datatypes, and the Geography type does exactly what I need. Is there something similar in MySQL?
How does one typically handle the discontinuity in longitude when working with spherical geometries in databases?
Not many people use MySQL for this, because its geospatial extensions aren't really up to snuff.
From the docs:
"All calculations are done assuming Euclidean (planar) geometry."
The solution is usually to roll your own.
Alternatively, you can fake it - if your distances are less than 500 miles or so, you can treat your latitude and longitude as rectangular coordinates and just use the Euclidean distance formula (sqrt(a^2 + b^2)).
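A minimal sketch of that flat-earth shortcut, assuming the latitude and longitude are also available as plain numeric columns lat and lng (an assumption, since the question stores them in a Point). Scaling the longitude difference by COS(RADIANS(@lat)) is a common refinement, since longitude degrees shrink away from the equator; 69 is roughly the miles per degree of latitude:

    SET @lat := 40.7128, @lng := -74.0060;

    SELECT id,
           69.0 * SQRT(POW(lat - @lat, 2)
                     + POW((lng - @lng) * COS(RADIANS(@lat)), 2)) AS approx_miles
    FROM coords
    ORDER BY approx_miles
    LIMIT 10;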