I am planning a website (Drupal/MySQL), which must search a fairly large database based on distance from a location (we're starting with ~20,000 locations). So far, the best solution I've found to searching in a reasonable manner is to use a user-defined function in SQL to calculate the distance between two coordinates, e.g.:
SELECT *, CoordinateDistanceMiles(lat, lon, ${inputLat}, ${inputLon}) as distance
FROM items HAVING distance < ${radius}
(Using John Dyer's distance function or similar)
However, I've also read that UDFs are very inefficient. My second idea (and tentative plan) is to nest another query inside this one to narrow its scope and therefore run the UDF on a much smaller subset of items, e.g.:
SELECT *, CoordinateDistanceMiles(lat, lon, ${inputLat}, ${inputLon}) as distance
FROM (
SELECT * FROM items WHERE
lat BETWEEN ${inputLat - const} AND ${inputLat + const} AND
lon BETWEEN ${inputLon - const} AND ${inputLon + const}
) AS nearby HAVING distance < ${radius}
Would this model make the search faster, or just more convoluted? Are there any better solutions?
The overhead of using a UDF here is negligible, because the query already has to scan rows to evaluate distance < ${radius} and the two range-based comparisons (they cannot be optimized with indexes).
So don't worry about UDF "inefficiency"; use it, since it is much more readable.
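For reference, a distance function of this kind usually boils down to the haversine formula. Below is a minimal Python sketch of what such a function might compute. It is not John Dyer's code; the name coordinate_distance_miles and the Earth radius constant are assumptions made for illustration.

```python
import math

# Assumed mean Earth radius in miles (not taken from the original post).
EARTH_RADIUS_MILES = 3958.8


def coordinate_distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in miles; inputs are in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine term: squared half-chord length between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

The work per row is a handful of trigonometric calls, which is small next to the cost of reading the row itself.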
Related
I see many solutions for getting the nearest rows from a POINT to convert to X() and Y() and do trig calculations of distances... As I understand, this does not seem to take advantage of the spatial index?
How do you take advantage of the spatial index, in the most common sense of, returning rows whose spatial POINT is within a radius from a center POINT?
In other words, how do you get something like this - where LatLng is the lat lng location stored as POINT for each row, and CenterPoint the epicenter
Pseudocode query: SELECT * FROM geotable WHERE d=Distance(LatLng,CenterPoint) < 10 ORDER by d
You can use st_distance_sphere
SELECT *
FROM geotable
WHERE st_distance_sphere(POINT(-82.337036, 29.645095 ), POINT(`longitude`, `latitude` ))/1000 < 10
Here you can see a working example
I have some data with latitude and longitude information, but most of the data points are geographically dense and not representative. I hope to pick a representative subset with uniform distribution from these data sets.
Below is my data example
no lon lat
1 121.62 31.18
2 121.91 30.90
3 121.76 31.11
4 121.49 31.12
... ...
I looked into this and learned that I could group the points by latitude and longitude and then use a Pearson chi-square test, but I am not familiar with SQL.
I am hoping for SQL code that selects such a subset, or a better SQL method for getting an even distribution.
You usually bucket the points by some grid, and select one (random) point from each cell. If the area is relatively small, you can use GeoHash as the bucket ID. To select an arbitrary point, use the ANY_VALUE aggregate (it is a strange aggregate function that returns an arbitrary element from a group: not truly random, but probably good enough here).
The query would be something like
SELECT ANY_VALUE(geo_point)
FROM (
SELECT
ST_MakePoint(lon, lat) as geo_point,
ST_GeoHash(lon, lat, <level>) as geo_hash
FROM <table>
)
GROUP BY geo_hash
For larger areas GeoHash is not a good choice, as it is much denser near the poles than near the equator, and the solution would depend on how complicated you want it to be :). Either ignore the problem and keep using GeoHash, or maybe switch to S2 cell IDs, which offer a more uniform distribution, or create some custom grid and find the grid ID for each point using an ST_Intersects condition.
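The bucket-and-pick idea can also be sketched outside the database. The Python below uses a plain square grid in degrees rather than GeoHash, and picks one random point per cell; the function name sample_one_per_cell, the cell_deg parameter and the fixed seed are all assumptions for illustration.

```python
import math
import random


def sample_one_per_cell(points, cell_deg, seed=0):
    """Return one randomly chosen point per grid cell.

    points   -- iterable of (no, lon, lat) tuples, as in the sample data
    cell_deg -- cell size in degrees (hypothetical tuning parameter)
    """
    rng = random.Random(seed)
    cells = {}
    for point in points:
        _, lon, lat = point
        # The cell ID is the integer grid coordinate of the point.
        key = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        cells.setdefault(key, []).append(point)
    return [rng.choice(group) for group in cells.values()]


sample = [
    (1, 121.62, 31.18),
    (2, 121.91, 30.90),
    (3, 121.76, 31.11),
    (4, 121.49, 31.12),
]
subset = sample_one_per_cell(sample, cell_deg=0.5)
```

With a 0.5 degree grid, points 1 and 3 share a cell, so the subset has three points. A smaller cell keeps more points; a larger one thins the dense areas harder.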
I have a SQL database of places, each with assigned coordinates (lat, long). I would like to query for the points that lie within a radius of 5 km of my own position. How do I construct the query so that it does not fetch unnecessary records?
Since you are talking about small distances of about 5 km and we are probably not in the direct vicinity of the north or south pole, we can work with an approximated grid system of longitude and latitude values. Each degree of latitude is equivalent to a distance of km_per_lat=6371km*2*pi/360degrees = 111.195km. The distance between two longitudinal lines that are 1 degree apart depends on the actual latitude:
km_per_long=km_per_lat * cos(lat)
For areas here in North Germany (51 degrees north) this value would be around 69.98km.
So, assuming we are interested in small distances around lat0 and long0 we can safely assume that the translation factors for longitudinal and latitudinal angles will stay the same and we can simply apply the formula
SELECT 111.195*sqrt(power(lat-@lat0,2)
+power(cos(pi()/180*@lat0)*(long-@long0),2)) dist_in_km FROM tbl
Since you want to use the formula in the WHERE clause of your select you could use the following:
SELECT * FROM tbl
WHERE 111.195*sqrt(power(lat-@lat0,2)
+power(cos(pi()/180*@lat0)*(long-@long0),2)) < 5
The select statement will work for latitude and longitude values given in degree (in a decimal notation). Because of that we have to convert the value inside the cos() function to radians by multiplying it with pi()/180.
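The same flat-grid approximation, written as a Python sketch so the numbers can be checked quickly. The function names are assumptions; the constants come straight from the formulas above.

```python
import math

# Kilometres per degree of latitude: 6371 km * 2 * pi / 360.
KM_PER_DEG_LAT = 6371 * 2 * math.pi / 360  # about 111.195


def km_per_deg_long(lat_deg):
    """Kilometres per degree of longitude at the given latitude."""
    return KM_PER_DEG_LAT * math.cos(math.radians(lat_deg))


def approx_dist_km(lat0, long0, lat, long):
    """Flat-grid distance in km; only meant for short distances."""
    dy = lat - lat0
    # Longitude differences shrink by cos(latitude of the reference point).
    dx = math.cos(math.radians(lat0)) * (long - long0)
    return KM_PER_DEG_LAT * math.sqrt(dy * dy + dx * dx)
```

For example, km_per_deg_long(51) comes out at roughly 69.98, matching the value quoted for North Germany.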
If you have to work with larger distances (>500km) then it is probably better to apply the appropriate distance formula used in navigation like
cos(delta)=cos(lat0)*cos(lat)*cos(long-long0) + sin(lat0)*sin(lat)
After calculating the actual angle delta by applying acos() you simply multiply that value by the earth's radius R = 6371km = 180/pi()*111.195km and you have your desired distance (see here: Wiki: great circle distance)
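As a sketch, the great-circle formula above translates to Python as follows. The function name great_circle_km is an assumption, and the clamp is an addition that guards acos() against rounding errors pushing the cosine slightly outside [-1, 1].

```python
import math

EARTH_RADIUS_KM = 6371


def great_circle_km(lat0, long0, lat, long):
    """Distance in km via the spherical law of cosines; inputs in degrees."""
    p0 = math.radians(lat0)
    p1 = math.radians(lat)
    dl = math.radians(long - long0)
    cos_delta = (math.cos(p0) * math.cos(p1) * math.cos(dl)
                 + math.sin(p0) * math.sin(p1))
    # Clamp to the valid acos() domain to absorb floating-point rounding.
    cos_delta = max(-1.0, min(1.0, cos_delta))
    return EARTH_RADIUS_KM * math.acos(cos_delta)
```

Note that acos() loses precision for very small angles, which is one reason the simpler flat-grid formula is a reasonable choice for distances of a few kilometres.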
Update (reply to comment):
Not sure what you intend to do. If there is only one reference position you want to compare against then you can of course precompile your distance calculation a bit like
SELECT @lat0:=51,@long0:=9; -- assuming a base position of: 51°N 9°E
SELECT @rad:=PI()/180,@fx:=@rad*6371,@fy:=@fx*cos(@rad*@lat0);
Your distance calculation will then simplify to just
SELECT @dist:=sqrt(power(@fx*(lat-@lat0),2)+power(@fy*(long-@long0),2))
with current positions in lat and long (no more cosine functions necessary). It is up to you whether you want to store all incoming positions in the database first or whether you want to do the calculations somewhere outside in Spring, Java or whatever language you are using. The equations are there and easy to use.
I would go with Euclid: dist=sqrt(power(x1-x2,2)+power(y1-y2,2)). It works everywhere. You may have to add a conversion to x/y coordinates if degrees can't be translated into km that easily.
Then you can select everything WHERE x BETWEEN (x0-5) AND (x0+5) AND y BETWEEN (y0-5) AND (y0+5), where (x0, y0) is your own position. Now you can check the results with Euclid.
By ordering the results, you can get the best matches first. Maybe there's a way to do the Euclidean calculation in SQL, too.
Right now I store long and lat as two decimal, indexed fields in the DB.
I am wondering (without installing any bizarre engine) if there is an efficient way to do this, so the index will also help me to calculate distance. A sample query would be
get me all the locations in a 10M radius from long X lat Y
Use the float datatype for latitude and longitude. Anything of higher precision is most likely over-engineering.
Unless your results need to be accurate to less than a meter or so, the float datatype has PLENTY of precision for what you're trying to do. If you are working at resolutions of less than a meter, you're going to need to find out about projections (sphere-to-plane) like Universal Transverse Mercator and Lambert.
When you start doing the computations, keep in mind that one minute (one-sixtieth of a degree) of change in latitude (north-to-south) is one nautical mile.
Here's a nice presentation from a MySQL person on doing this search.
http://www.scribd.com/doc/2569355/Geo-Distance-Search-with-MySQL
The performance optimization is to make an index on the latitudes, and maybe also longitudes, then do a search like this (positive radius)
where loctable.lat >= (mylat-radius)
and loctable.lat <= (mylat+radius)
and loctable.long >= (mylong-radius)
and loctable.long <= (mylong+radius)
and haversine_distance(mylat, mylong, loctable.lat, loctable.long) <= radius
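This searches for a bounding box, with the radius expressed in degrees. A degree of longitude covers less ground than a degree of latitude everywhere except at the equator, so the longitude range should be widened by dividing the radius by cos(latitude); otherwise the box is too narrow east to west. It's OK if the box is bigger than the circle, because the last line gets rid of any extra matches.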
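A rough Python sketch of the same two-stage idea: a cheap bounding-box test first, then the exact haversine check on whatever survives. The function names and the (name, lat, lon) row layout are assumptions. The longitude half-width is divided by cos(latitude) here, since a degree of longitude is shorter than a degree of latitude away from the equator. The box is only approximate and the sketch ignores the poles and the 180° meridian.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bounding_box(lat, lon, radius_km):
    """Approximate (min_lat, max_lat, min_lon, max_lon) around a circle."""
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    # Widen the longitude range; this blows up near the poles.
    dlon = dlat / math.cos(math.radians(lat))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def within_radius(rows, lat, lon, radius_km):
    """rows: iterable of (name, lat, lon). Returns names inside the circle."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_km)
    hits = []
    for name, rlat, rlon in rows:
        # Stage 1: cheap range test (the part an index can serve in SQL).
        if not (min_lat <= rlat <= max_lat and min_lon <= rlon <= max_lon):
            continue
        # Stage 2: exact distance for the few remaining candidates.
        if haversine_km(lat, lon, rlat, rlon) <= radius_km:
            hits.append(name)
    return hits
```

In SQL the first stage corresponds to the four range comparisons, and the second to the final haversine_distance line.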
You want to look for a spatial index or a space-filling curve. A spatial index reduces the 2D complexity to a 1D complexity. It looks like a quadtree and a bit like a fractal. If you don't mind the shape and can do without an exact search, you can drop the haversine formula, because you can just search for a quadtree tile. Of course you need the Mercator projection. This is by far the fastest method. I use it a lot with a Hilbert curve. You want to look for Nick's Hilbert curve spatial index quadtree blog.
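To make the quadtree-tile idea concrete, here is a small Python sketch that maps a lat/long to a Web Mercator tile and then to a quadkey string. Note that this is a Z-order style key, swapped in because it is short to write; it is not the Hilbert curve the answer mentions, and the function names are assumptions.

```python
import math


def tile_xy(lat, lon, zoom):
    """Web Mercator tile column and row containing the point."""
    n = 1 << zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the map edge still land in a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def quadkey_from_tile(x, y, zoom):
    """Interleave the tile bits into one base-4 digit per zoom level."""
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = (1 if x & mask else 0) + (2 if y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)


def quadkey(lat, lon, zoom):
    x, y = tile_xy(lat, lon, zoom)
    return quadkey_from_tile(x, y, zoom)
```

Because a tile's key is a prefix of the keys of all tiles inside it, "everything in this tile" becomes a prefix search on an ordinary indexed string column.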
Tech used: MySQL 5.1 and PHP 5.3
I am just designing a new database for a site I am writing. I am looking at the best way of now storing Lat and Lng values.
In the past I have been using DECIMAL and using a PHP/MySQL select in the form:
SQRT(POW(69.1 * (fld_lat - ( $lat )), 2) + POW(69.1 * (($lon) - fld_lon) * COS(fld_lat / 57.3 ), 2 )) AS distance
to find nearest matching places.
Starting to read up more on new technologies I am wondering if I should use Spatial Extensions. http://dev.mysql.com/doc/refman/5.1/en/geometry-property-functions.html
Information is quite thin on the ground though and had a question on how to store the data. Instead of using DECIMAL, would I now use POINT as a Datatype?
Also, once stored as a POINT is it easy just to get the Lat Lng values from it in case I want to plot it on a map or should I additionally store the lat lngs as DECIMALS again as well?
I know I should probably use PostGIS, as most posts on here say; I just don't want to learn a new DB though!
Follow up
I have been playing with the new POINT type. I have been able to add Lat Lng values using the following:
INSERT INTO spatialTable (placeName, geoPoint) VALUES( "London School of Economics", GeomFromText( 'POINT(51.514 -0.1167)' ));
I can then get the Lat and Lng values back from the Db using:
SELECT X(geoPoint), Y(geoPoint) FROM spatialTable;
This all looks good; however, the calculation for distance is the bit I need to solve. Apparently MySQL has a placeholder for a distance function, but it won't be released for a while. In a few posts I have found I need to do something like the below; however, I think my code is slightly wrong:
SELECT
placeName,
ROUND(GLength(
LineStringFromWKB(
LineString(
geoPoint,
GeomFromText('POINT(52.5177, -0.0968)')
)
)
))
AS distance
FROM spatialTable
ORDER BY distance ASC;
In this example geoPoint is a POINT entered into the DB using the INSERT above.
GeomFromText('POINT(52.5177, -0.0968)') is a Lat Lng value I want to calculate a distance from.
More Follow-up
Rather stupidly I had just put in the ROUND part of the SQL without really thinking. Taking this out gives me:
SELECT
placeName,
(GLength(
LineStringFromWKB(
LineString(
geoPoint,
GeomFromText('POINT(51.5177 -0.0968)')
)
)
))
AS distance
FROM spatialTable
ORDER BY distance ASC
Which seems to give me the correct distances I need.
I suppose the only thing currently that needs answering is any thoughts on whether I am just making life difficult for myself by using Spatial now or future-proofing myself...
I think you should always use the highest level abstraction easily available. If your data is geospatial, then use geospatial objects.
But be careful. MySQL is the worst geospatial database there is. It's OK for points, but all its polygon functions are completely broken: they change the polygon to its bounding rectangle and then compute the answer on that.
The worst example that hit me is that if you have a polygon representing Japan and you ask what places are in Japan, Vladivostok gets into the list!
Oracle and PostGIS don't have this problem. I expect MSSQL doesn't and any Java database using JTS as its engine doesn't. Geospatial Good. MySQL Geospatial Bad.
Just read here (How do you use MySQL spatial queries to find all records in X radius?) that it's fixed in 5.6.1.
Hoorah!
MySQL GIS YAGNI:
If you have no experience with GIS, learning spatial extensions is practically like learning a new database, plus a little math and a lot of acronyms: maps, projections, SRIDs, formats... Do you have to learn all that to calculate distances between points given a certain lat/long? Probably not. Will you be integrating third-party GIS data or working with anything more complex than points? What coordinate system will you be using?
Going back to YAGNI: do things as simply as possible; in this case, implement your code in PHP or with simple SQL. Once you reach a barrier and decide you need spatial, read up on GIS systems, coordinate systems, projections, and conventions.
By then, you will probably want PostGIS.
It's a good thing, because then you get to use spatial indexes on your queries. Limit to a bounding box, for example, to limit how many rows to compare against.
If you can afford to place some extra code in your backend, use Geohash.
It encodes a coordinate into a string in a way that prefixes denote a broader area. The longer your string is, the more precision you have.
And it has bindings for many languages.
http://en.wikipedia.org/wiki/Geohash
https://www.elastic.co/guide/en/elasticsearch/guide/current/geohashes.html
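To show how the prefix property comes about, here is a minimal Geohash encoder sketched in Python. It follows the standard scheme (alternate longitude and latitude bisections, five bits per base-32 character); the function name geohash_encode is an assumption, and in practice you would likely use one of the existing bindings instead.

```python
# Geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=9):
    """Encode a lat/long in degrees as a Geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            # Every five bits become one base-32 character.
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

Each extra character only halves the remaining ranges further, so a shorter hash is always a prefix of a longer hash of the same point, and a search for nearby rows can start as a prefix match on an indexed string column.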