Store geolocation efficiently in MySQL

Right now I store long and lat as two DECIMAL, indexed fields in the DB.
I am wondering (without installing any bizarre engine) if there is an efficient way to do this, so that the index will also help me calculate distances. A sample query would be:
get me all the locations within a 10M radius of long X, lat Y

Use the float datatype for latitude and longitude. Anything of higher precision is most likely over-engineering.
Unless your results need to be accurate to less than a meter or so, the float datatype has PLENTY of precision for what you're trying to do. If you are working at resolutions of less than a meter, you're going to need to find out about projections (sphere-to-plane) like Universal Transverse Mercator and Lambert.
When you start doing the computations, keep in mind that one minute (one-sixtieth of a degree) of change in latitude (north-to-south) is one nautical mile.
Here's a nice presentation from a MySQL person on doing this search:
http://www.scribd.com/doc/2569355/Geo-Distance-Search-with-MySQL
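The performance optimization is to index the latitude column, and maybe also the longitude column, then do a search like the one below. Here radius is the (positive) search distance. dlat is that distance converted to degrees of latitude (radius / 111.195 when radius is in km). dlong = dlat / cos(mylat) is the matching offset in degrees of longitude. haversine_distance is a stored function you define yourself, returning the distance in the same unit as radius. `long` is backquoted because LONG is a reserved word in MySQL.
where loctable.lat >= (mylat - dlat)
and loctable.lat <= (mylat + dlat)
and loctable.`long` >= (mylong - dlong)
and loctable.`long` <= (mylong + dlong)
and haversine_distance(mylat, mylong, loctable.lat, loctable.`long`) <= radius
This searches a bounding box first. The box is about the right size in latitude. Because dlong is scaled by 1/cos(latitude), it is also about the right width in longitude. If you reused dlat for longitude, the box would be too narrow away from the equator and you would miss matches. It's OK if the box is a bit too big, because the last line gets rid of any extra matches in its corners.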
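A minimal Python sketch of the same two-stage filter. It assumes a spherical Earth of radius 6371 km, and an in-memory list of (lat, lon) tuples stands in for loctable. The longitude half-width uses asin(sin(r/R)/cos(lat)), a detail borrowed from the common "bounding coordinates" technique rather than from the answer above. It does not handle boxes that cross the ±180° meridian.

```python
import math

EARTH_RADIUS_KM = 6371.0
# About 111.195 km per degree of latitude, close to 60 nautical miles.
KM_PER_DEG_LAT = EARTH_RADIUS_KM * math.pi / 180


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def bounding_box(lat, lon, radius_km):
    """(min_lat, max_lat, min_lon, max_lon) of a box enclosing the search circle."""
    ang = radius_km / EARTH_RADIUS_KM              # angular radius in radians
    dlat = math.degrees(ang)
    ratio = math.sin(ang) / math.cos(math.radians(lat))
    # Near a pole the circle can span every longitude.
    dlon = 180.0 if ratio >= 1 else math.degrees(math.asin(ratio))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def within_radius(points, lat, lon, radius_km):
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_km)
    # Stage 1: cheap range test; in SQL this is the part an index can serve.
    candidates = [(p_lat, p_lon) for p_lat, p_lon in points
                  if min_lat <= p_lat <= max_lat and min_lon <= p_lon <= max_lon]
    # Stage 2: the exact distance test removes the corners of the box.
    return [(p_lat, p_lon) for p_lat, p_lon in candidates
            if haversine_km(lat, lon, p_lat, p_lon) <= radius_km]


points = [(51.0, 9.0), (51.05, 9.0), (51.2, 9.0), (51.0, 9.2), (51.08, 9.12)]
print(within_radius(points, 51.0, 9.0, 10.0))  # → [(51.0, 9.0), (51.05, 9.0)]
```

The point (51.08, 9.12) passes the box test but is about 12 km away, so the exact test drops it.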

You want to look for a spatial index or a space-filling curve. A space-filling curve reduces the 2D problem to a 1D one. It looks like a quadtree and a bit like a fractal. If you don't mind the shape of the search area and don't need an exact search, you can drop the haversine formula, because you can just search for a quadtree tile. Of course, you need the Mercator projection. This is by far the fastest method; I use it a lot with a Hilbert curve. Look for Nick's Hilbert curve spatial index quadtree blog.
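To make the idea concrete, here is a Python sketch of the classic iterative Hilbert-curve index ("xy2d", as published e.g. on Wikipedia). It is not the code from the blog mentioned above. For simplicity, the helper maps latitude/longitude onto the grid linearly (an equirectangular mapping) instead of through Mercator. The grid size n must be a power of two, and latlon_to_hilbert is a hypothetical helper.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve filling an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def latlon_to_hilbert(lat, lon, order=16):
    """Hypothetical helper: snap a coordinate to a 2**order grid, return its index."""
    n = 1 << order
    x = min(n - 1, int((lon + 180.0) / 360.0 * n))
    y = min(n - 1, int((lat + 90.0) / 180.0 * n))
    return hilbert_index(n, x, y)
```

Cells with consecutive indexes are always neighbours on the grid. So an index on this single integer column keeps nearby points close together in the B-tree.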

Related

SQL Finding the coordinates that belong to a circle

I have a SQL database of places, each of which has been assigned coordinates (lat, long). I would like to query the points that lie within a radius of 5 km of my own point. How can I construct the query so that it does not collect unnecessary records?
Since you are talking about small distances of about 5 km, and we are probably not in the direct vicinity of the north or south pole, we can work with an approximated grid system of longitude and latitude values. Each degree of latitude is equivalent to a distance of km_per_lat=6371km*2*pi/360degrees = 111.195km. The distance between two longitudinal lines that are 1 degree apart depends on the actual latitude:
km_per_long=km_per_lat * cos(lat)
For areas here in North Germany (51 degrees north) this value would be around 69.98km.
So, assuming we are interested in small distances around lat0 and long0 we can safely assume that the translation factors for longitudinal and latitudinal angles will stay the same and we can simply apply the formula
SELECT 111.195*sqrt(power(lat-@lat0,2)
+power(cos(pi()/180*@lat0)*(`long`-@long0),2)) dist_in_km FROM tbl
Since you want to use the formula in the WHERE clause of your select you could use the following:
SELECT * FROM tbl
WHERE 111.195*sqrt(power(lat-@lat0,2)
+power(cos(pi()/180*@lat0)*(`long`-@long0),2)) < 5
The select statement will work for latitude and longitude values given in degree (in a decimal notation). Because of that we have to convert the value inside the cos() function to radians by multiplying it with pi()/180.
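The same flat-grid approximation in Python, handy for checking the SQL by hand. This is a sketch; the function name and sample rows are illustrative.

```python
import math

KM_PER_DEG_LAT = 6371.0 * 2 * math.pi / 360   # 111.195 km


def approx_dist_km(lat, lon, lat0, lon0):
    """Flat-grid distance, the same expression as the SQL above (degrees in, km out)."""
    # Only the reference latitude lat0 is used for the longitude scale factor.
    dlon_scaled = math.cos(math.pi / 180 * lat0) * (lon - lon0)
    return KM_PER_DEG_LAT * math.sqrt((lat - lat0) ** 2 + dlon_scaled ** 2)


rows = [(51.0, 9.0), (51.03, 9.04), (51.0, 10.0)]
near = [r for r in rows if approx_dist_km(r[0], r[1], 51.0, 9.0) < 5]
print(near)  # → [(51.0, 9.0), (51.03, 9.04)]
```

At 51°N one degree of longitude comes out at about 69.98 km, matching the figure above.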
If you have to work with larger distances (>500km) then it is probably better to apply the appropriate distance formula used in navigation like
cos(delta)=cos(lat0)*cos(lat)*cos(long-long0) + sin(lat0)*sin(lat)
After calculating the actual angle delta by applying acos() you simply multiply that value by the earth's radius R = 6371km = 180/pi()*111.195km and you have your desired distance (see here: Wiki: great circle distance)
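A Python sketch of that navigation formula (the spherical law of cosines) with R = 6371 km. The clamp before acos() is an addition not in the answer; it guards against rounding pushing the cosine slightly above 1. For very short distances this form loses precision, which is one reason the haversine form is often preferred.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat0, lon0, lat, lon):
    """cos(delta) = cos(lat0)cos(lat)cos(lon - lon0) + sin(lat0)sin(lat); returns R * delta."""
    p0, p = math.radians(lat0), math.radians(lat)
    cos_delta = (math.cos(p0) * math.cos(p) * math.cos(math.radians(lon - lon0))
                 + math.sin(p0) * math.sin(p))
    # Rounding can push the value just outside [-1, 1]; clamp before acos().
    delta = math.acos(max(-1.0, min(1.0, cos_delta)))
    return EARTH_RADIUS_KM * delta   # central angle in radians times R
```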
Update (reply to comment):
Not sure what you intend to do. If there is only one reference position you want to compare against, then you can of course precompute parts of your distance calculation, like
SELECT @lat0:=51, @long0:=9; -- assuming a base position of 51°N 9°E
SELECT @rad:=PI()/180, @fx:=@rad*6371, @fy:=@fx*cos(@rad*@lat0);
Your distance calculation will then simplify to just
SELECT @dist:=sqrt(power(@fx*(lat-@lat0),2)+power(@fy*(`long`-@long0),2)) FROM tbl
with current positions in lat and long (no more cosine functions necessary). It is up to you whether you want to store all incoming positions in the database first or whether you want to do the calculations somewhere outside in Spring, Java or whatever language you are using. The equations are there and easy to use.
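The precompute-once idea, sketched in Python: the fx and fy factors above are computed once for the base position, and every later distance is just two multiplications and a square root. The function names are illustrative.

```python
import math


def make_local_distance(lat0, lon0, earth_radius_km=6371.0):
    """Return a distance function (km) with the two scale factors precomputed."""
    rad = math.pi / 180
    fx = rad * earth_radius_km           # km per degree of latitude (~111.195)
    fy = fx * math.cos(rad * lat0)       # km per degree of longitude at lat0

    def dist(lat, lon):
        # No trigonometry per point: two multiplications and a sqrt.
        return math.sqrt((fx * (lat - lat0)) ** 2 + (fy * (lon - lon0)) ** 2)

    return dist


dist_from_base = make_local_distance(51.0, 9.0)   # 51°N 9°E
print(round(dist_from_base(51.0, 10.0), 2))  # → 69.98
```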
I would go with Euclid: dist = sqrt(power(x1-x2,2) + power(y1-y2,2)). It works everywhere. You may have to convert to x/y coordinates first if degrees can't be translated into km that easily.
Then you can select everything you like WHERE x BETWEEN (x0-5) AND (x0+5) AND y BETWEEN (y0-5) AND (y0+5), where (x0, y0) is your own position. Now you can check the results with Euclid.
By ordering the results, you can get the best (closest) matches first. Maybe there's a way to do the Euclidean check in SQL, too.
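The box-then-Euclid recipe in Python. It assumes the coordinates have already been converted to x/y in km (that conversion is not shown) and that the prefilter half-width equals the search radius.

```python
import math


def nearest_within(points_xy, x0, y0, r):
    """Points within distance r of (x0, y0), closest first."""
    # 1) Square prefilter: the BETWEEN ... AND ... part an index can serve.
    box = [(x, y) for x, y in points_xy
           if x0 - r <= x <= x0 + r and y0 - r <= y <= y0 + r]
    # 2) Exact Euclidean distance for the few remaining candidates, sorted.
    scored = sorted((math.hypot(x - x0, y - y0), (x, y)) for x, y in box)
    # 3) Drop the box corners and keep the result ordered by distance.
    return [p for d, p in scored if d <= r]


print(nearest_within([(0, 0), (3, 4), (4, 4), (10, 0), (1, 0)], 0, 0, 5))  # → [(0, 0), (1, 0), (3, 4)]
```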

Quadtree for collisions with latitude/longitude (earth size)

I have a Google Map and a server sends a list of objects that have a position with a small radius (100m max). I need to quickly be able to know if a position is colliding with something in the list and draw on the map everything.
I'm thinking I should use a Quadtree (very useful for 2D collisions in games), but my issue is that I'm not limited to a screen but to the whole earth!
Sure, if I have 100 objects it's not a problem, but at any time the server can send me new objects that I need to add to the list, so my Quadtree could change drastically or become unbalanced.
What should I do? Should I still use a Quadtree and modify the entire tree if a new element is added outside the current boundaries? Should I set the boundaries to the maximum latitude/longitude (but could I have issues with double precision)? Or does someone know a better data structure for this type of problem?
To avoid issues with double precision, especially at the splitting border of a quad cell, it is advisable to use integer coordinates in the quad tree.
Convert double lat/lon to int by multiplying by 1E6; this results in a precision of about 10 cm.
You can also use a space-filling curve, for example a Z-order curve.
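A Python sketch combining both suggestions. Coordinates are scaled by 1E6 to integers, shifted to non-negative ranges, and their bits are interleaved into a Z-order (Morton) key. The helper names are hypothetical, and the bit-by-bit loop favours clarity over speed.

```python
BITS = 29   # 360_000_000 micro-degrees < 2**29, so 29 bits per axis suffice


def to_micro_degrees(deg):
    """Scale degrees by 1E6 and round (one unit is about 0.11 m of latitude)."""
    return round(deg * 1_000_000)


def interleave(x, y, bits=BITS):
    """Morton (Z-order) code: bit i of x goes to 2*i, bit i of y to 2*i + 1."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def z_index(lat, lon):
    # Shift to non-negative ranges first: lon 0..360e6, lat 0..180e6.
    x = to_micro_degrees(lon) + 180_000_000
    y = to_micro_degrees(lat) + 90_000_000
    return interleave(x, y)
```

Integer keys avoid the double-precision trouble at quad-cell borders. Each pair of key bits corresponds to one quadtree level, so a cell is a contiguous key range.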

Computing which points (latitude, longitude) are within a certain distance in mysql?

There are two points, A and B, and distances x (miles from A) and y (miles from B). Let the distance from A to B be N, so A is N miles away from B. How do I solve this problem: which points are (N + x + y) miles away from A? I'm not sure how to explain this any better. I really have no clue how to attack this problem. I read Fastest Way to Find Distance Between Two Lat/Long Points, and I believe the solution given there calculates the distance between two points; I have no idea whether it could be applied to my problem, or if so, how.
If you are looking for an approximation algorithm, I suggest looking at a k-means algorithm or hierarchical clustering, especially with a monster curve, i.e. a space-filling curve. First you can compute a minimum spanning tree of the graph and then remove the longest, most expensive edges. The tree then splits into many little trees, and you can use k-means to compute groups of points, i.e. clusters.
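The MST step above is single-link clustering. Running Kruskal's algorithm and stopping once k components remain gives the same clusters as building the full MST and deleting its k-1 longest edges. A Python sketch follows. It assumes the points are already projected to planar x/y coordinates (e.g. km), so Euclidean distance is meaningful. It builds all O(n²) pairs, so it only suits small inputs.

```python
import math
from itertools import combinations


def single_link_clusters(points, k):
    """Kruskal's algorithm with union-find, stopped when k components remain."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    components = len(points)
    for _, i, j in edges:                   # cheapest edges first
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:                        # edge joins two trees: keep it
            parent[ri] = rj
            components -= 1

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())
```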
"The single-link k-clustering algorithm ... is precisely Kruskal's algorithm ... equivalent to finding an MST and deleting the k-1 most expensive edges." See for example here: https://stats.stackexchange.com/questions/1475/visualization-software-for-clustering.
A good example of a monster curve is the Hilbert curve. The basic form of this curve is a U shape. By copying it many times and rotating the copies, the curve fills the Euclidean plane. Surprisingly, a Gray code can help to find the orientation of this U shape. You can look up Nick's spatial index quadtree Hilbert curve blog article for more details. Instead of calculating the curve's index, you can build a quadkey like in Bing Maps. The quadkey is unique for each tile and can be used with normal string operations. Each digit of the key selects one quadrant at the next level, so you can select a region of points by matching a prefix of the quadkey, reading it from left to right.
[Image: the green polygon found using a Hilbert curve]
You can find my php classes here: http://www.phpclasses.org/package/6202-PHP-Generate-points-of-an-Hilbert-curve.html
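A Python sketch of building a Bing-Maps-style quadkey. It follows the tile numbering described in Microsoft's Bing Maps Tile System article: Web Mercator, latitude clamped to about ±85.05°, and one digit per level (1 for the x bit, 2 for the y bit). The function names are illustrative, not from the PHP classes above.

```python
import math


def lat_lon_to_tile(lat, lon, level):
    """Web Mercator tile (x, y) containing the point at the given level."""
    lat = max(-85.05112878, min(85.05112878, lat))
    n = 1 << level
    x = (lon + 180.0) / 360.0
    s = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + s) / (1 - s)) / (4 * math.pi)
    return min(n - 1, max(0, int(x * n))), min(n - 1, max(0, int(y * n)))


def tile_to_quadkey(tx, ty, level):
    """One digit per level, most significant first."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tx & mask:
            digit += 1
        if ty & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)
```

A tile's key always starts with its parent's key. So a prefix match such as quadkey LIKE '120%' on an indexed string column selects everything inside one tile.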

Mysql geometry AREA() function returns what exactly when coords are long/lat?

My question is somewhat related to this similar one, which links to a pretty complex solution - but what I want to understand is the result of this:
Using a Mysql Geometry field to store a small polygon I duly ran
select AREA(myPolygon) where id =1
over it, and got a value like 2.345. So can anyone tell me just what that number represents, seeing as the stored values were long/lat pairs describing the polygon?
FYI, the areas I am working on are relatively small (car parks and the like) and the area does not have to be exact - I will not be concerned about the curvature of the earth.
2.345 of what? Thanks, this is bugging me.
The short answer is that the units for your area calculation are basically meaningless ([deg lat diff] * [deg lon diff]). Even though the curvature of the earth wouldn't come into play for the area calculation (since your areas are "small"), it does come into play for the calculation of distance between the lat/lon polygon coordinates.
Since a degree of longitude is different based on the distance from the equator (http://en.wikipedia.org/wiki/Longitude#Degree_length), there really is no direct conversion of your area into m^2 or km^2. It is dependent on the distance north/south of the equator.
If you always have rectangular polygons, you could just store the opposite corner coordinates and calculate area using something like this: PHP Library: Calculate a bounding box for a given lat/lng location
The most "correct" thing to do would be to store your polygons using X-Y (meters) coordinates (perhaps UTM using the WGS-84 ellipsoid), which can be calculated from lat/lon using various libraries like the following for Java: Java, convert lat/lon to UTM. You could then continue to use the MySQL AREA() function.
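A Python sketch contrasting the two results. The shoelace formula on raw lon/lat gives square degrees, which is what a planar area calculation on those values produces. Scaling the coordinates to metres first gives square metres. The scaling uses a local equirectangular approximation, a simpler stand-in for the UTM projection suggested above that is adequate for car-park-sized polygons. Vertices are (lon, lat), and the sample polygon is made up.

```python
import math

M_PER_DEG_LAT = 6371000.0 * math.pi / 180   # ~111,195 m per degree of latitude


def shoelace(pts):
    """Planar polygon area for a list of (x, y) vertices (any consistent unit)."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2


def approx_area_m2(lonlat):
    """Small-polygon area in m^2 via a local equirectangular projection."""
    lon0, lat0 = lonlat[0]
    mean_lat = sum(lat for _, lat in lonlat) / len(lonlat)
    kx = M_PER_DEG_LAT * math.cos(math.radians(mean_lat))   # metres per degree of lon
    # Work relative to the first vertex to keep the numbers small.
    pts = [((lon - lon0) * kx, (lat - lat0) * M_PER_DEG_LAT) for lon, lat in lonlat]
    return shoelace(pts)


lot = [(-0.1280, 51.5000), (-0.1270, 51.5000), (-0.1270, 51.5005), (-0.1280, 51.5005)]
print(shoelace(lot))         # square degrees
print(approx_area_m2(lot))   # square metres
```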

What is the ideal data type to use when storing latitude / longitude in a MySQL database?

Bearing in mind that I'll be performing calculations on lat / long pairs, what datatype is best suited for use with a MySQL database?
Basically it depends on the precision you need for your locations. Using DOUBLE you'll have a 3.5nm precision. DECIMAL(8,6)/(9,6) goes down to 16cm. FLOAT is 1.7m...
This very interesting table has a more complete list: http://mysql.rjweb.org/doc.php/latlng :
Datatype                Bytes*  Resolution         Suitable for
----------------------  ------  -----------------  -----------------
Deg*100 (SMALLINT)      4       1570 m   1.0 mi    Cities
DECIMAL(4,2)/(5,2)      5       1570 m   1.0 mi    Cities
SMALLINT scaled         4       682 m    0.4 mi    Cities
Deg*10000 (MEDIUMINT)   6       16 m     52 ft     Houses/Businesses
DECIMAL(6,4)/(7,4)      7       16 m     52 ft     Houses/Businesses
MEDIUMINT scaled        6       2.7 m    8.8 ft
FLOAT                   8       1.7 m    5.6 ft
DECIMAL(8,6)/(9,6)      9       16 cm    1/2 ft    Friends in a mall
Deg*10000000 (INT)      8       16 mm    5/8 in    Marbles
DOUBLE                  16      3.5 nm   ...       Fleas on a dog
(* Bytes for the lat + lng pair.)
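The FLOAT figure in the table above can be reproduced as the gap between adjacent single-precision values near ±180°. A Python sketch follows, assuming a spherical Earth (about 111,195 m per degree) and measuring along one axis only. The table's DECIMAL rows are about √2 larger than the per-axis steps printed here, which suggests (this is a guess) that the source measures the diagonal of a grid cell.

```python
import math
import struct

M_PER_DEG = 6371000.0 * math.pi / 180   # ~111,195 m per degree along a meridian


def float32_step(x):
    """Gap from x (x > 0) to the next larger IEEE-754 single-precision value."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    nxt = struct.unpack(">f", struct.pack(">I", bits + 1))[0]
    return nxt - x


steps = {
    "DECIMAL(.,4)": 1e-4,                      # fixed point: 4 fractional digits
    "DECIMAL(.,6)": 1e-6,                      # fixed point: 6 fractional digits
    "FLOAT near 180": float32_step(180.0),     # 24-bit significand
    "DOUBLE near 180": math.ulp(180.0),        # 53-bit significand
}
for name, step_deg in steps.items():
    print(f"{name:16s} {step_deg:.3g} deg ~ {step_deg * M_PER_DEG:.3g} m")
```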
Use MySQL's spatial extensions with GIS.
Google provides a start to finish PHP/MySQL solution for an example "Store Locator" application with Google Maps. In this example, they store the lat/lng values as "Float" with a length of "10,6"
http://code.google.com/apis/maps/articles/phpsqlsearch.html
MySQL's Spatial Extensions are the best option because you have the full list of spatial operators and indices at your disposal. A spatial index will allow you to perform distance-based calculations very quickly. Please keep in mind that as of 6.0, the Spatial Extension is still incomplete. I am not putting down MySQL Spatial, only letting you know of the pitfalls before you get too far along on this.
If you are dealing strictly with points and only the DISTANCE function, this is fine. If you need to do any calculations with Polygons, Lines, or Buffered-Points, the spatial operators do not provide exact results unless you use the "relate" operator. See the warning at the top of 21.5.6. Relationships such as contains, within, or intersects are using the MBR, not the exact geometry shape (i.e. an Ellipse is treated like a Rectangle).
Also, distances in MySQL Spatial are in the same units as your first geometry. This means that if you're using decimal degrees, your distance measurements are in decimal degrees. This makes it very difficult to get exact results as you get further from the equator.
When I did this for a navigation database built from ARINC424, I did a fair amount of testing. Looking back at the code, I used a DECIMAL(18,12) (actually a NUMERIC(18,12), because it was Firebird).
Floats and doubles aren't as precise and may result in rounding errors, which may be a very bad thing. I can't remember if I found any real data that had problems, but I'm fairly certain that the inability to store values exactly in a float or a double could cause problems.
The point is that when using degrees or radians we know the range of the values - and the fractional part needs the most digits.
The MySQL Spatial Extensions are a good alternative because they follow The OpenGIS Geometry Model. I didn't use them because I needed to keep my database portable.
Depends on the precision that you require.
Datatype                Bytes*  Resolution         Suitable for
----------------------  ------  -----------------  -----------------
Deg*100 (SMALLINT)      4       1570 m   1.0 mi    Cities
DECIMAL(4,2)/(5,2)      5       1570 m   1.0 mi    Cities
SMALLINT scaled         4       682 m    0.4 mi    Cities
Deg*10000 (MEDIUMINT)   6       16 m     52 ft     Houses/Businesses
DECIMAL(6,4)/(7,4)      7       16 m     52 ft     Houses/Businesses
MEDIUMINT scaled        6       2.7 m    8.8 ft
FLOAT                   8       1.7 m    5.6 ft
DECIMAL(8,6)/(9,6)      9       16 cm    1/2 ft    Friends in a mall
Deg*10000000 (INT)      8       16 mm    5/8 in    Marbles
DOUBLE                  16      3.5 nm   ...       Fleas on a dog
(* Bytes for the lat + lng pair.)
From: http://mysql.rjweb.org/doc.php/latlng
To summarise:
The most precise available option is DOUBLE.
The most common seen type used is DECIMAL(8,6)/(9,6).
As of MySQL 5.7, consider using Spatial Data Types (SDT), specifically POINT for storing a single coordinate. Prior to 5.7, spatial indexes are only supported on MyISAM tables; InnoDB added support for them in 5.7.
Note:
When using POINT class, the order of the arguments for storing coordinates must be POINT(latitude, longitude).
There is a special syntax for creating a spatial index.
The biggest benefit of using SDT is that you have access to spatial analysis functions, e.g. calculating the distance between two points (ST_Distance) and determining whether a point is contained within an area (ST_Contains).
Based on this wiki article, http://en.wikipedia.org/wiki/Decimal_degrees#Accuracy, the appropriate data type in MySQL is DECIMAL(9,6) for storing the longitude and latitude in separate fields.
Use DECIMAL(8,6) for latitude (90 to -90 degrees) and DECIMAL(9,6) for longitude (180 to -180 degrees). 6 decimal places is fine for most applications. Both should be "signed" to allow for negative values.
No need to go far, according to Google Maps, the best is FLOAT(10,6) for lat and lng.
We store latitude/longitude X 1,000,000 in our oracle database as NUMBERS to avoid round off errors with doubles.
Given that latitude/longitude to the 6th decimal place was 10 cm accuracy that was all we needed. Many other databases also store lat/long to the 6th decimal place.
TL;DR
Use FLOAT(8,5) if you're not working at NASA / in the military and not building aircraft navigation systems.
To answer your question fully, you'd need to consider several things:
Format
degrees minutes seconds: 40° 26′ 46″ N 79° 58′ 56″ W
degrees decimal minutes: 40° 26.767′ N 79° 58.933′ W
decimal degrees 1: 40.446° N 79.982° W
decimal degrees 2: -32.60875, 21.27812
Some other home-made format? No one forbids you from making your own home-centric coordinate system and storing positions as heading and distance from your home. This could make sense for some specific problems you're working on.
So the first part of the answer would be - you can store the coordinates in the format your application uses to avoid constant conversions back and forth and make simpler SQL queries.
Most probably you use Google Maps or OSM to display your data, and GMaps are using "decimal degrees 2" format. So it will be easier to store coordinates in the same format.
Precision
Then you'd like to define the precision you need. Of course you can store coordinates like "-32.608697550570334,21.278081997935146", but have you ever cared about millimeters while navigating to a point? If you're not working at NASA and not computing satellite, rocket or aircraft trajectories, you should be fine with several meters of accuracy.
A commonly used format is 5 digits after the dot, which gives you about 50 cm accuracy.
Example: there is about 1 cm (at the equator) between X,21.2780818 and X,21.2780819. So 7 digits after the dot give you about 1/2 cm precision, and 5 digits after the dot give you about 1/2 meter precision (because the minimal distance between distinct points is about 1 m, the rounding error cannot be more than half of it). For most civil purposes that should be enough.
The degrees decimal minutes format (40° 26.767′ N 79° 58.933′ W) gives you roughly comparable precision: 0.001′ is about 1.85 m, versus about 1.1 m for 0.00001°.
Space-efficient storage
If you've selected the decimal format, then your coordinate is a pair like (-32.60875, 21.27812). Obviously, 2 x (1 bit for the sign, 2 digits for degrees (3 for longitude) and 5 digits for the fractional part) will be enough.
So here I'd like to support Alix Axel from the comments, who says that Google's suggestion to store it in FLOAT(10,6) is overkill. You don't need 4 digits for the integer part, since the sign is separate, latitude is limited to 90 and longitude to 180. You can use FLOAT(8,5) for about 1/2 m precision or FLOAT(9,6) for about 5 cm precision. Or you can even declare lat and long with different types, because FLOAT(7,5) is enough for lat. See the MySQL float types reference. Any of them is stored like a normal FLOAT in 4 bytes anyway. Keep in mind that a 4-byte FLOAT holds only about 7 significant digits, so values near ±180 actually resolve to about 0.000015° (about 1.7 m), however many decimals you declare.
Usually space is not an issue nowadays, but you may really want to optimize storage for some reason (disclaimer: don't optimize prematurely). In that case you can pack the pair into one integer. At 5 decimal places latitude has 18,000,001 possible values (25 bits) and longitude 36,000,001 (26 bits). That is 51 bits in total, less than 2 x FLOAT (8 bytes == 64 bits).
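A Python sketch of packing a lat/long pair at 5 decimal places into one non-negative integer. The code derives the bit widths instead of hard-coding them, and pack/unpack are hypothetical helper names.

```python
SCALE = 100_000                      # 5 decimal places
LAT_STEPS = 180 * SCALE              # -90.00000 .. +90.00000 -> 0 .. LAT_STEPS
LON_STEPS = 360 * SCALE              # -180.00000 .. +180.00000 -> 0 .. LON_STEPS
LAT_BITS = LAT_STEPS.bit_length()    # bits needed for 0..LAT_STEPS
LON_BITS = LON_STEPS.bit_length()


def pack(lat, lon):
    """Fold a lat/long pair into one integer: latitude in the high bits."""
    y = round((lat + 90) * SCALE)
    x = round((lon + 180) * SCALE)
    return (y << LON_BITS) | x


def unpack(code):
    x = code & ((1 << LON_BITS) - 1)
    y = code >> LON_BITS
    return y / SCALE - 90, x / SCALE - 180


print(LAT_BITS, LON_BITS, LAT_BITS + LON_BITS)   # → 25 26 51
print(unpack(pack(-32.60875, 21.27812)))
```

51 bits fit in a BIGINT (8 bytes), so the saving only materializes with custom byte-level storage. This is one more reason the disclaimer above applies.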
In a completely different and simpler perspective:
if you are relying on Google for showing your maps, markers, polygons, whatever, then let the calculations be done by Google!
you save resources on your server and you simply store the latitude and longitude together as a single string (VARCHAR), E.g.: "-0000.0000001,-0000.000000000000001" (35 length and if a number has more than 7 decimal digits then it gets rounded);
if Google returns more than 7 decimal digits per number, you can get that data stored in your string anyway, just in case you want to detect some fleas or microbes in the future;
you can use their distance matrix or their geometry library for calculating distances or detecting points in certain areas with calls as simple as this: google.maps.geometry.poly.containsLocation(latLng, bermudaTrianglePolygon)
there are plenty of "server-side" APIs you can use (in Python, Ruby on Rails, PHP, CodeIgniter, Laravel, Yii, Zend Framework, etc.) that use Google Maps API.
This way you don't need to worry about indexing numbers and all the other problems associated with data types that may screw up your coordinates.
While it isn't optimal for all operations, if you are making map tiles or working with large numbers of markers (dots) with only one projection (e.g. Mercator, like Google Maps and many other slippy maps frameworks expect), I have found what I call "Vast Coordinate System" to be really, really handy. Basically, you store x and y pixel coordinates at some way-zoomed-in -- I use zoom level 23. This has several benefits:
You do the expensive lat/lng to mercator pixel transformation once instead of every time you handle the point
Getting the tile coordinate from a record given a zoom level takes one right shift.
Getting the pixel coordinate from a record takes one right shift and one bitwise AND.
The shifts are so lightweight that it is practical to do them in SQL, which means you can do a DISTINCT to return only one record per pixel location, which will cut down on the number records returned by the backend, which means less processing on the front end.
I talked about all this in a recent blog post:
http://blog.webfoot.com/2013/03/12/optimizing-map-tile-generation/
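A Python sketch of the "Vast Coordinate System" idea. It assumes standard Web Mercator with 256-pixel tiles and zoom level 23, as in the answer. The function names are hypothetical, not taken from the blog post.

```python
import math

MAX_ZOOM = 23     # zoom level at which pixel coordinates are stored
TILE_SIZE = 256   # assumption: 256-px tiles, as in Google/OSM slippy maps
TILE_SHIFT = 8    # log2(TILE_SIZE)


def to_vast_xy(lat, lon):
    """One-time lat/lng -> Mercator pixel conversion at MAX_ZOOM."""
    size = TILE_SIZE << MAX_ZOOM              # map width in pixels (2**31)
    lat = max(-85.05112878, min(85.05112878, lat))
    s = math.sin(math.radians(lat))
    x = (lon + 180.0) / 360.0 * size
    y = (0.5 - math.log((1 + s) / (1 - s)) / (4 * math.pi)) * size
    return min(size - 1, int(x)), min(size - 1, int(y))


def tile_at(vast, zoom):
    """Tile coordinate at a zoom level: one right shift."""
    return vast >> (MAX_ZOOM - zoom + TILE_SHIFT)


def pixel_in_tile(vast, zoom):
    """Pixel inside that tile: one right shift and one bitwise AND."""
    return (vast >> (MAX_ZOOM - zoom)) & (TILE_SIZE - 1)
```

At zoom 23 every stored value is below 2**31, so it fits in a MySQL INT column, and the same shifts can be written in SQL with >> and &.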
Latitudes range from -90 to +90 (degrees), so DECIMAL(10, 8) is ok for that
longitudes range from -180 to +180 (degrees) so you need DECIMAL(11, 8).
Note: The first number is the total number of digits stored, and the second is the number after the decimal point.
In short: lat DECIMAL(10, 8) NOT NULL, lng DECIMAL(11, 8) NOT NULL
The spatial functions in PostGIS are much more capable (i.e. not constrained to BBOX operations) than the MySQL spatial functions; PostGIS is worth checking out.
Depending on your application, I suggest using FLOAT(9,6).
Spatial keys will give you more features, but in my production benchmarks the floats are much faster than the spatial keys (0.01 vs. 0.001 on average).
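MySQL performs floating-point arithmetic in double precision, even for FLOAT columns, which are stored as 4-byte single-precision values.
So use type DOUBLE. Using FLOAT will lead to unpredictably rounded values in most situations.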
I suggest you use Float datatype for SQL Server.
The ideal datatype for storing Lat Long values is decimal(9,6)
This is at approximately 10cm precision, whilst only using 5 bytes of storage.
e.g. CAST(123.456789 as decimal(9,6))
GeolocationCoordinates (from the browser Geolocation API) returns the position's latitude and longitude as double-precision values in decimal degrees. You can try using DOUBLE.
Lat/long calculations require precision, so use some type of decimal type and make the precision at least 2 higher than the number of digits you will store, in order to perform math calculations. I don't know about the MySQL data types, but in SQL Server people often use float or real instead of decimal and get into trouble, because these are approximate numeric types, not exact ones. So just make sure the data type you use is a true (fixed-point) decimal type and not a floating-point type, and you should be fine.
A FLOAT should give you all of the precision you need, and be better for comparison functions than storing each co-ordinate as a string or the like.
If your MySQL version is earlier than 5.0.3, you may need to take heed of certain floating point comparison errors however.
Prior to MySQL 5.0.3, DECIMAL columns store values with exact precision because they are represented as strings, but calculations on DECIMAL values are done using floating-point operations. As of 5.0.3, MySQL performs DECIMAL operations with a precision of 64 decimal digits, which should solve most common inaccuracy problems when it comes to DECIMAL columns