Calculating short distances between lat/long points - mysql

I have a MySQL table with spatial points and need to calculate distances. I found lots of material on doing this using the Haversine formula; however, all of it assumes a large distance between points. In my case, I only care about short distances (< 1 mile), so I don't need to correct for the earth's curvature. My intuition is that the Haversine formula will be inaccurate at such small distances. Any suggestions?

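Your intuition is incorrect. Consider the haversine formula and the definition of haversine, according to Wikipedia (φ is latitude, ψ is longitude, r is the radius of the earth and d is the distance between the two points):
haversin(d/r) = haversin(φ₂ - φ₁) + cos(φ₁) cos(φ₂) haversin(ψ₂ - ψ₁)
haversin(θ) = sin²(θ/2)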
There is a further fact that is relevant: for small values of θ, sin θ is approximately equal to θ; more relevantly, it is approximately linear in θ. Therefore, haversin θ will be approximately (θ/2)². This approximation gets better as θ approaches zero.
If the two points are close together, then φ₂ - φ₁ and ψ₂ - ψ₁, which are what the haversine function is applied to here, will be close to zero, and so will d/r. Since haversin(d/r) ≈ (d/2r)², the formula is approximately
(d/2r)² = ((φ₂ - φ₁) / 2)² + cos(φ₁) cos(φ₂) ((ψ₂ - ψ₁) / 2)²
Now note that this formula has the same form as Euclidean distance in two dimensions with some arbitrary scaling factors (remembering that (kx)² = k² x² so we can move constants in and out of the squares):
k₁ d² = k₂ ∆φ² + k₃ ∆ψ²
Lastly, I assert without proof that those arbitrary scaling factors turn out to be the same ones which convert changes in latitude/longitude to linear distance.
Therefore, the haversine formula does not become inaccurate for small distances; it is precisely the same as an ordinary Euclidean distance calculation, in the limit of small distances.
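To check the limit argument numerically, here is a small Python sketch (Python rather than SQL only so that it runs standalone). It computes the haversine distance and the flat "Euclidean" approximation, with the longitude difference scaled by the cosine of the mean latitude, for two points roughly 1 km apart. The coordinates and the 6371 km mean radius are assumptions chosen for illustration. The two results agree to well within a part per million.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; any consistent r works


def haversine_km(lat1, lon1, lat2, lon2, r=EARTH_RADIUS_KM):
    """haversin(d/r) = haversin(dphi) + cos(phi1) cos(phi2) haversin(dpsi)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dpsi = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dpsi / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def euclidean_km(lat1, lon1, lat2, lon2, r=EARTH_RADIUS_KM):
    """Flat-earth approximation: longitude difference scaled by cos(mean latitude)."""
    phi_mean = math.radians((lat1 + lat2) / 2)
    dy = math.radians(lat2 - lat1)
    dx = math.radians(lon2 - lon1) * math.cos(phi_mean)
    return r * math.hypot(dx, dy)


# Two made-up points a little under 1 km apart.
a = (51.5000, -0.1200)
b = (51.5060, -0.1100)
h = haversine_km(*a, *b)
e = euclidean_km(*a, *b)
print(f"haversine={h:.6f} km  euclidean={e:.6f} km  relative diff={abs(h - e) / h:.1e}")
```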

Create your points using Point values of Geometry datatypes in a MyISAM table
Create a SPATIAL index on these points
Use MBRContains() to find the values:
SELECT *
FROM `table`
WHERE MBRContains(LineFromText(CONCAT(
    'LINESTRING('
    , @lon + 10 / ( 111.1 * COS(RADIANS(@lat)))
    , ' '
    , @lat + 10 / 111.1
    , ','
    , @lon - 10 / ( 111.1 * COS(RADIANS(@lat)))
    , ' '
    , @lat - 10 / 111.1
    , ')' ))
    , mypoint)
Or, in MySQL 5.1 and above:
SELECT *
FROM `table`
WHERE MBRContains
(
LineString
(
Point
(
@lon + 10 / ( 111.1 * COS(RADIANS(@lat))),
@lat + 10 / 111.1
),
Point
(
@lon - 10 / ( 111.1 * COS(RADIANS(@lat))),
@lat - 10 / 111.1
)
),
mypoint
)
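This will select all points approximately within the box extending 10 km either side of the given latitude and longitude (one degree of latitude is about 111.1 km; one degree of longitude is about 111.1 × cos(latitude) km).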
This is actually not a box but a spherical rectangle: a segment of the sphere bounded by latitude and longitude. It may differ noticeably from a plain rectangle on Franz Josef Land, but it is quite close to one in most inhabited places.
Apply additional filtering to select everything inside the circle (not the square)
Possibly apply additional fine filtering to account for the great-circle distance (for large distances)
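A minimal Python sketch of the box-then-circle filtering, assuming the same 111.1 km-per-degree constant used in the queries. The box test stands in for what MBRContains narrows down with the SPATIAL index, and the exact test keeps only points inside the circle. The longitude half-width is radius / (111.1 × cos(lat)); dividing 111.1 by the cosine instead would make the box narrower, not wider, at high latitudes. The sample coordinates near 60°N are made up.

```python
import math

KM_PER_DEG_LAT = 111.1  # same constant as in the queries above


def bounding_box(lat, lon, radius_km):
    """(min_lat, max_lat, min_lon, max_lon) of the box around (lat, lon).

    One degree of longitude spans about 111.1 * cos(lat) km, so the longitude
    half-width is radius_km / (111.1 * cos(lat)).
    """
    dlat = radius_km / KM_PER_DEG_LAT
    dlon = radius_km / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def within_radius(lat, lon, points, radius_km):
    """Box test first (the cheap, indexable step), then an exact circle test."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_km)
    kept = []
    for p_lat, p_lon in points:
        if not (min_lat <= p_lat <= max_lat and min_lon <= p_lon <= max_lon):
            continue  # outside the box: rejected without computing a distance
        dy = (p_lat - lat) * KM_PER_DEG_LAT
        dx = (p_lon - lon) * KM_PER_DEG_LAT * math.cos(math.radians(lat))
        if math.hypot(dx, dy) <= radius_km:  # inside the circle, not just the square
            kept.append((p_lat, p_lon))
    return kept


points = [(60.0, 25.15), (60.0, 25.19), (60.08, 25.15), (60.05, 25.0)]
print(within_radius(60.0, 25.0, points, 10.0))
```

Here (60.0, 25.19) is rejected by the box alone, while (60.08, 25.15) passes the box but lies about 12 km away, so the circle test drops it.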

Related

How to get more precision for a computed distance in MySQL, without using geometric types?

I must compute the distance between an object (a city) and each of several entries in a MySQL table I have (some restaurants). The city and the restaurants are located in the same country.
The computed distance is used to show all the restaurants that are close to this city; the threshold distance is arbitrary. Moreover, this is a ranked list: the closest restaurants are shown first, and the farthest are shown at the end of the list. My problem is about this ranking.
What I've done for now
So I did some research and managed to compute this distance.
$special_select_distance = "DEGREES(ACOS(COS(RADIANS(" . $oneVilles->__get('latitude')[app::getLang()] . ")) * COS(RADIANS(lat)) * COS(RADIANS(lon) - RADIANS(" . $oneVilles->__get('longitude')[app::getLang()] . ")) + SIN(RADIANS(" . $oneVilles->__get('latitude')[app::getLang()] . ")) * SIN(RADIANS(lat))))";
$restaurants = $restaurantsDAO->getAll(null, ['distance ASC'] , null, 'HAVING distance < 1.9' , null , '*, ' . $special_select_distance . " AS distance");
... where:
['distance ASC'] stands for the ranking by distance (closest first)
'HAVING distance < 1.9' stands for the arbitrary threshold
'*, ' . $special_select_distance . " AS distance" is the selector
$oneVilles->__get('latitude')[app::getLang()] and $oneVilles->__get('longitude')[app::getLang()] are the city's coordinates lat and lon
lat and lon are the restaurant's coordinates (automatically taken into the table we are iterating on, i.e.: restaurants table, since we use the restaurants DAO)
Question
Actual and unexpected result
For each of the restaurants that are quite close between themselves, the computed distance with the city remains the same.
Example: assume that restaurants A and B are quite close. Then the distance between A and the city is the same as the distance between B and the city; that is my actual, unexpected result.
This is not what I want: in reality, one of these restaurants is closer to the city than the other. I think there isn't enough precision in MySQL.
Expected result
Expected result: ranking the restaurants by their distance to the city should work. In other words, I need a more precise computed distance.
Example: assume that restaurants A and B are quite close. Then the distance between A and the city should be shorter than the distance between B and the city; that is my expected result.
Examples of computed distances:
1. Between a restaurant and the city (the restaurant being far from the city): 1.933156948976873
2. Between a restaurant A and the city (A being close to the city): 1.6054631070094885
3. Between a restaurant B and the city (B being close to A): 1.6054631070094885
Distances 2 and 3 are the same, which is not normal. I would like more digits, so that I can rank my restaurants more reliably.
Constraints
I don't want to change the configuration of the MySQL server.
In particular, I absolutely can't use MySQL geometric types (it's a company constraint).
Ideally, the solution should simply change the SQL query I wrote above to make it more precise, if that is possible.
Other methods of calculating the distance are allowed, if necessary.
For long distances, use the Haversine formula for accuracy. For short distances, Pythagoras is twice as fast.
16 significant digits (data type DOUBLE) is ludicrous. You don't need to distinguish two different fleas on your dog.
With Pythagoras, be sure to multiply the longitude difference by the cosine of the latitude: one degree of longitude near Helsinki is half as far as one degree at the equator.
Some more details here: http://mysql.rjweb.org/doc.php/latlng
If 1.6054631070094885 is a latitude diff, then think about it this way: if you and I are at the same longitude, but our latitudes are 1.605463 and 1.605464 (about 11 cm apart), then, well, I don't know you well enough to be that close.
It is not practical to compare two floating point values without having a fudge factor:
If abs(a-b) < 0.00001, then treat them as equal.
More
I recommend FLOAT for lat, lng, and distance since you are talking about restaurants. If you are not talking about more than, say, 100 miles or km, then this expression is sufficiently precise:
SQRT( ($lat - lat) *
($lat - lat) +
(($lng - lng) * COS(RADIANS(lat))) *
(($lng - lng) * COS(RADIANS(lat))) ) * $factor
Where...
lat and lng are names of FLOAT columns in the table, in units of degrees.
$lat and $lng are values of the location you are starting from, also in degrees. (PHP uses $; other languages use other conventions.)
$factor is 69.172 for miles or 111.325 for kilometers.
I would not display the result with more than perhaps 1 decimal place. (Don't display "12.345678 miles"; "12.3 miles" is good enough.)
A comparison of Pythagoras and GCD (great-circle distance):
              Pythagoras   GCD
To Rennes:    93.9407      93.6542
To Vannes:    95.6244      95.6241
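Below is a Python transcription of the Pythagoras expression above, with $factor = 111.325 for kilometres, checked against a haversine great-circle distance. The city coordinates are roughly those of Rennes and the restaurant coordinates are invented; this is a sketch, not a reproduction of the table's figures. It suggests that within a few tens of kilometres the two agree to a fraction of a percent and produce the same ranking.

```python
import math

FACTOR_KM = 111.325  # $factor for kilometres, as given above


def pythagoras_km(start_lat, start_lng, lat, lng):
    """SQRT(dlat squared + (dlng * COS(RADIANS(lat))) squared) * $factor, as in the SQL above."""
    dlat = start_lat - lat
    dlng = (start_lng - lng) * math.cos(math.radians(lat))
    return math.sqrt(dlat * dlat + dlng * dlng) * FACTOR_KM


def haversine_km(lat1, lon1, lat2, lon2, r=6371.0):
    """Great-circle distance for comparison (mean Earth radius assumed)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(h))


city = (48.1173, -1.6778)  # approximately Rennes
restaurants = {  # invented coordinates
    "R1": (48.1300, -1.6500),
    "R2": (48.1900, -1.7600),
    "R3": (48.4000, -1.2000),
}
pyt = {name: pythagoras_km(*city, *pos) for name, pos in restaurants.items()}
gcd = {name: haversine_km(*city, *pos) for name, pos in restaurants.items()}
for name in sorted(pyt, key=pyt.get):  # closest first, like ORDER BY distance ASC
    print(f"{name}: Pythagoras {pyt[name]:.1f} km, GCD {gcd[name]:.1f} km")
```

Displaying one decimal place, as suggested above, is plenty for this kind of list.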

SQL - Agg Func Manhattan Distance

The linked SO question doesn't answer this. I can't figure out how to solve this query on HackerRank. None of the solutions online seem to work. Is this a bug or am I doing something wrong?
Consider P1(a,b) and P2(c,d) to be two points on a 2D plane.
a happens to equal the minimum value in Northern Latitude (LAT_N in STATION).
b happens to equal the minimum value in Western Longitude (LONG_W in STATION).
c happens to equal the maximum value in Northern Latitude (LAT_N in STATION).
d happens to equal the maximum value in Western Longitude (LONG_W in STATION).
Query the Manhattan Distance between points P1 and P2 and round it to a scale of 4 decimal places.
Input Format
The STATION table is described as follows:
STATION Table
Field  | Type
ID     | Number
City   | VarChar2(21)
State  | VarChar2(2)
LAT_N  | Number
LONG_W | Number
Database: MySQL
Source: https://www.hackerrank.com/challenges/weather-observation-station-18/problem
Link: distance between two longitude and latitude (Tried, but none of the answers provided work.)
SELECT ROUND(ABS(MIN(Station.LAT_N) - MIN(Station.LONG_W)) + ABS(MAX(Station.LAT_N) - MAX(Station.Long_W)), 4)
FROM Station;
The formula for Manhattan distance is |a - c| + |b - d|, where a and b are the min lat and long, and c and d are the max lat and long, respectively.
select
round(
abs(
min(lat_n)- max(lat_n)
) + abs(
min(long_w)- max(long_w)
), 4
)
from
station;
I got 25 Hacker points! So can I get 25 points from you?
Without just writing the answer: you need to calculate the horizontal difference between the min and max longitude, and add the vertical difference between the min and max latitude.
Your code does something a bit different. If you update your code accordingly, the rest is OK and will be marked as correct by HackerRank.
You are comparing latitude and longitude when instead you need to compare latitude with latitude and longitude with longitude. The Manhattan distance between (1,3) and (2,4) is |1-2|+|3-4|, not |1-4|+|2-3|.
It should also be pointed out that since you're taking the min and max of the same range, you don't actually need the absolute value function. round(max(x)-min(x)+max(y)-min(y), 4) works perfectly well, and is slightly faster.
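One runnable way to see the difference is to load a few made-up rows into an in-memory SQLite database from Python. SQLite stands in for MySQL here (ROUND, ABS, MIN and MAX behave the same way for these queries), and the sample rows are invented, not HackerRank's data. The correct query pairs latitude with latitude and longitude with longitude; the question's attempt pairs latitude with longitude.

```python
import sqlite3

CORRECT = """
SELECT ROUND(ABS(MIN(lat_n) - MAX(lat_n)) + ABS(MIN(long_w) - MAX(long_w)), 4)
FROM station
"""

# The question's attempt: latitude is compared with longitude, which is the bug.
MIXED_UP = """
SELECT ROUND(ABS(MIN(lat_n) - MIN(long_w)) + ABS(MAX(lat_n) - MAX(long_w)), 4)
FROM station
"""


def run(query, rows):
    """Create a throwaway STATION table, insert rows, return the single result value."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE station (id INTEGER, city TEXT, state TEXT, lat_n REAL, long_w REAL)"
        )
        conn.executemany("INSERT INTO station VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(query).fetchone()[0]
    finally:
        conn.close()


sample = [  # made-up rows
    (1, "A", "AA", 10.0, 20.0),
    (2, "B", "BB", 13.5, 22.25),
    (3, "C", "CC", 11.0, 25.0),
]
print(run(CORRECT, sample))   # |10 - 13.5| + |20 - 25| = 8.5
print(run(MIXED_UP, sample))  # |10 - 20| + |13.5 - 25| = 21.5
```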
My answer for MS SQL
SELECT CAST(
ABS(MAX(LAT_N) - MIN(LAT_N)) + ABS(MAX(LONG_W) - MIN(LONG_W))
AS DECIMAL(20, 4))
FROM STATION
select round((max(lat_n)-min(lat_n)),4)+round((max(long_w)-min(long_w)),4)
from station;
Since each term is the difference between a maximum and a minimum, it can never be negative, so we don't need ABS.
The following query works for this SQL problem:
SELECT ROUND(ABS(MAX(Station.LAT_N) - MIN(Station.LAT_N)) + ABS(MAX(Station.LONG_W) - MIN(Station.LONG_W)), 4)
FROM Station;

Different result for Haversine formulas

I am using MySQL to compute proximity. For that I have created a stored function named distance (below), but it is not working properly, while the plain SQL statement is. As far as I can tell, both are Haversine formulas, yet they don't give me the same result. I really don't know what I am missing in formula one.
The structure of my tables is as follows:
for formula one
id varchar(100)
userid varchar(100)
username varchar(100)
currLoc point
radius int(10)
for formula two
id varchar(30)
userid varchar(30)
username varchar(40)
lat float(10,6)
lan float(10,6)
radius varchar(100)
Formula One: reference
sql statement to execute distance function
SELECT userid, username, distance(userstatus.currLoc,
GeomFromText('POINT(23.039574 72.56602)')) AS cdist
FROM userstatus HAVING cdist <= 0.6 ORDER BY cdist LIMIT 10
RETURN 6371 * 2 *
ASIN( SQRT(POWER(SIN(RADIANS(ABS(X(a)) - ABS(X(b)))), 2) +
COS(RADIANS(ABS(X(a)))) * COS(RADIANS(ABS(X(b)))) *
POWER(SIN(RADIANS(Y(a) - Y(b))), 2)));
Formula two: reference
SELECT *,(((acos(sin((23.039574*pi()/180)) *
sin((lat *pi()/180))+cos((23.039574*pi()/180)) *
cos((lat *pi()/180)) * cos(((72.56602- lon)*pi()/180))))*
180/pi())*60*1.1515*1.609344) as distance
FROM status HAVING distance <= 0.6
here 0.6 is a radius in kilometers
One version of the expression uses ABS(X(a)) etc. and the other does not. The one using ABS is suspect: you can't afford to ignore the sign of the angles. You'll get different results in some areas of the world (near the equator or the prime meridian, for example, or near the poles).
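Your constants are also written differently, though they describe the same scale (60*1.1515*1.609344 ≈ 111.19 km per degree of arc, and 6371 km × π/180 ≈ 111.19 km per degree; the 2 goes with the 2·ASIN form):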
60*1.1515*1.609344
vs
6371 * 2
One expression involves SQRT, the other does not.
One expression involves ASIN and the other uses ACOS.
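Structurally there is little in common between the two expressions (ASIN of a haversine versus ACOS of the spherical law of cosines), although both compute the great-circle distance and should agree when written correctly. Note also that the haversine form needs the sine of half of each angle difference: formula one squares SIN(RADIANS(X(a) - X(b))) where it should square SIN(RADIANS(X(a) - X(b)) / 2) (and likewise for the Y term), which on its own roughly doubles short distances.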
See the discussion at Wikipedia 'Haversine Formula', and in particular the references to numerical stability when the distance between the points is small.
You could also improve the chances of people helping you by making the formulae you're using semi-readable, by splitting them over lines.
For example:
RETURN 6371 * 2 *
ASIN( SQRT(POWER(SIN(RADIANS(ABS(X(a)) - ABS(X(b)))), 2) +
COS(RADIANS(ABS(X(a)))) * COS(RADIANS(ABS(X(b)))) *
POWER(SIN(RADIANS(Y(a) - Y(b))), 2)));
And:
(((acos(sin((23.039574*pi()/180)) * sin((lat *pi()/180)) +
cos((23.039574*pi()/180)) * cos((lat *pi()/180)) *
cos(((72.56602-lan)*pi()/180))
)
) * 180/pi()) * 60 * 1.1515 * 1.609344)
The latter references 'lan'; is that meant to be 'lon'? In the second example, you appear to have encoded one of the two positions as 23.039574°N and 72.56602°E, and lat and lan come from the table in the SQL query.
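The following Python sketch transcribes both expressions from the question, plus a textbook haversine, and evaluates them from the question's origin (23.039574, 72.56602) to an invented point about half a kilometre away. It illustrates that the spherical-law-of-cosines query and a correct haversine agree, while formula one as written (no /2 inside SIN) returns roughly twice the distance, enough to push the point outside the 0.6 km radius.

```python
import math

ORIGIN = (23.039574, 72.56602)  # the point used in both of the question's queries
POINT = (23.0420, 72.5700)      # invented nearby location


def formula_one_km(lat_a, lon_a, lat_b, lon_b):
    """The question's stored function, transcribed as written (ABS, no /2 inside SIN)."""
    return 6371 * 2 * math.asin(math.sqrt(
        math.sin(math.radians(abs(lat_a) - abs(lat_b))) ** 2
        + math.cos(math.radians(abs(lat_a))) * math.cos(math.radians(abs(lat_b)))
        * math.sin(math.radians(lon_a - lon_b)) ** 2))


def haversine_km(lat_a, lon_a, lat_b, lon_b):
    """Same structure, but with half-angle differences and signed latitudes."""
    h = (math.sin(math.radians(lat_a - lat_b) / 2) ** 2
         + math.cos(math.radians(lat_a)) * math.cos(math.radians(lat_b))
         * math.sin(math.radians(lon_a - lon_b) / 2) ** 2)
    return 6371 * 2 * math.asin(math.sqrt(h))


def formula_two_km(lat_a, lon_a, lat_b, lon_b):
    """The second query: law of cosines, arc in degrees times 60 * 1.1515 * 1.609344."""
    c = (math.sin(math.radians(lat_a)) * math.sin(math.radians(lat_b))
         + math.cos(math.radians(lat_a)) * math.cos(math.radians(lat_b))
         * math.cos(math.radians(lon_a - lon_b)))
    c = max(-1.0, min(1.0, c))  # rounding can push c slightly outside [-1, 1]
    return math.degrees(math.acos(c)) * 60 * 1.1515 * 1.609344


one = formula_one_km(*ORIGIN, *POINT)
hav = haversine_km(*ORIGIN, *POINT)
two = formula_two_km(*ORIGIN, *POINT)
print(f"formula one: {one:.4f} km, haversine: {hav:.4f} km, formula two: {two:.4f} km")
```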

Finding all users within X miles calculation via mysql is slow

I'm having trouble with this query. It takes about 5 seconds to run against ~300 users. I assume it's because it's calculating the distance for every possible user.
Is there a way for me to optimize this to make it run fast? Thanks in advance.
select
t2.*,
t2.city,
t2.state,
t2.county,
ifnull(round((6371 * acos( cos( radians('32.7211') ) * cos( radians( t2.latitude ) ) * cos( radians( t2.longitude ) - radians('-117.16431') ) + sin( radians('32.7211') ) * sin( radians( t2.latitude ) ) ) ),0),1) AS distance
from
users t1
inner join
zipcodes_coordinates t2
on t1.zip_code=t2.zipcode
having
distance <= 150
I would eliminate as much of the data as you can before the expensive part of the query (the distance calculation) runs. As written, your query almost certainly computes the distance for every single row.
For example, you know that if a user at (X,Y) is within an R-mile circle of a certain point (X',Y'), then they are certainly within the square of side 2R centred on that point (with R converted to the same units as X and Y, e.g. degrees), which means the following things hold:
X <= X' + R
X >= X' - R
Y <= Y' + R
Y >= Y' - R
So to make a query on the database, you could first have the database eliminate all users whose X value doesn't satisfy those constraints, which can be done using an index on that column (the same goes for the Y coordinate).
Another (rather more domain-specific) trick would be to split the world up into small squares that are indexable with a single identifier (this could be a long, or even a string with the coordinates of the centre, so long as you can re-create it reliably from any coordinate within the square). Then store which square each coordinate is in as well as the coordinate itself. If you are looking for e.g. a 5-mile radius, then make the squares at least 5 miles on a side. That way you can very quickly search a small number of adjacent squares by identity (no more than 9 in this case: the square containing the point and its 8 neighbours), then loop over the results in those squares to find the closest matches in your application.
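Here is a minimal Python sketch of the square-identifier idea, assuming points are first projected onto a flat local grid in miles. The reference latitude of 40°, the 69.172 miles-per-degree constant, the 5-mile cell size and the random sample points are illustrative choices, not part of the answer. Because each cell is at least as wide as the search radius, the 3 × 3 block of cells around the query point is guaranteed to contain every match; the exact distance check then plays the role of "loop over the results in those squares".

```python
import math
import random
from collections import defaultdict

MILES_PER_DEG = 69.172  # assumed miles per degree of latitude
REF_LAT = 40.0          # hypothetical reference latitude for the indexed region
CELL_MILES = 5.0        # cell side; must be >= the search radius


def to_xy(lat, lon):
    """Project (lat, lon) onto a flat local grid in miles around REF_LAT."""
    return (lon * MILES_PER_DEG * math.cos(math.radians(REF_LAT)),
            lat * MILES_PER_DEG)


def cell_of(lat, lon):
    """Integer square identifier for the cell containing a point."""
    x, y = to_xy(lat, lon)
    return (math.floor(x / CELL_MILES), math.floor(y / CELL_MILES))


def build_index(points):
    """Store each point under its cell identifier."""
    index = defaultdict(list)
    for p in points:
        index[cell_of(*p)].append(p)
    return index


def near(index, lat, lon, radius_miles):
    """Look up the 3x3 block of cells by identity, then refine with an exact distance."""
    if radius_miles > CELL_MILES:
        raise ValueError("a 3x3 block only covers radii up to the cell size")
    cx, cy = cell_of(lat, lon)
    x0, y0 = to_xy(lat, lon)
    hits = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for p in index.get((cx + dx, cy + dy), ()):
                x, y = to_xy(*p)
                if math.hypot(x - x0, y - y0) <= radius_miles:
                    hits.append(p)
    return hits


random.seed(1)
points = [(40 + random.uniform(-0.3, 0.3), -100 + random.uniform(-0.3, 0.3))
          for _ in range(2000)]
index = build_index(points)
hits = near(index, 40.0, -100.0, 5.0)

# Brute force over every point, for comparison.
x0, y0 = to_xy(40.0, -100.0)
brute = [p for p in points if math.hypot(to_xy(*p)[0] - x0, to_xy(*p)[1] - y0) <= 5.0]
print(len(hits), "points within 5 miles")
```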
Most performance optimisations in this kind of thing are about eliminating data that certainly doesn't fit and then refining, rather than immediately going after data that certainly does.
PS - if you are using MySQL there is a GIS extension, which I haven't tried: http://dev.mysql.com/tech-resources/articles/4.1/gis-with-mysql.html. This probably does something like what I describe, and may or may not take into account the curvature of the earth, etc. However in most cases the successive refinement method is fairly safe, and means your database doesn't have to 'know' about GIS co-ordinate systems.

SQL Query For Total Points Within Radius of a Location

I have a database table of all zipcodes in the US that includes city,state,latitude & longitude for each zipcode. I also have a database table of points that each have a latitude & longitude associated with them. I'd like to be able to use 1 MySQL query to provide me with a list of all unique city/state combinations from the zipcodes table with the total number of points within a given radius of that city/state. I can get the unique city/state list using the following query:
select city,state,latitude,longitude
from zipcodes
group by city,state order by state,city;
I can get the number of points within a 100 mile radius of a specific city with latitude '$lat' and longitude '$lon' using the following query:
select count(*)
from points
where (3959 * acos(cos(radians($lat)) * cos(radians(latitude)) * cos(radians(longitude) - radians($lon)) + sin(radians($lat)) * sin(radians(latitude)))) < 100;
What I haven't been able to do is figure out how to combine these queries in a way that doesn't kill my database. Here is one of my sad attempts:
select city,state,latitude,longitude,
(select count(*) from points
where status="A" AND
(3959 * acos(cos(radians(zipcodes.latitude)) * cos(radians(latitude)) * cos(radians(longitude) - radians(zipcodes.longitude)) + sin(radians(zipcodes.latitude)) * sin(radians(latitude)))) < 100) as 'points'
from zipcodes
group by city,state order by state,city;
The tables currently have the following indexes:
Zipcodes - `zip` (zip)
Zipcodes - `location` (state,city)
Points - `status_length_location` (status,length,longitude,latitude)
When I run explain before the previous MySQL query here is the output:
+----+--------------------+----------+------+------------------------+------------------------+---------+-------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------+------+------------------------+------------------------+---------+-------+-------+---------------------------------+
| 1 | PRIMARY | zipcodes | ALL | NULL | NULL | NULL | NULL | 43187 | Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | points | ref | status_length_location | status_length_location | 2 | const | 16473 | Using where; Using index |
+----+--------------------+----------+------+------------------------+------------------------+---------+-------+-------+---------------------------------+
I know I could loop through all the zipcodes and calculate the number of matching points within a given radius but the points table will be growing all the time and I'd rather not have stale point totals in the zipcodes database. I'm hoping a MySQL guru out there can show me the error of my ways. Thanks in advance for your help!
MySQL Guru or not, the problem is that unless you find a way of filtering out various rows, the distance needs to be calculated between each point and each city...
There are two general approaches that may help the situation
make the distance formula simpler
filter out unlikely candidates for the 100-mile radius from a given city
Before going into these two avenues of improvement, you should decide on the level of precision desired for this 100-mile distance, and indicate which geographic area the database covers (is it just the continental USA, etc.?).
The reason for this is that, while more precise numerically, the great-circle formula is very computationally expensive. Another avenue of performance improvement would be to store "grid coordinates" of sorts in addition to (or instead of) the lat/long coordinates.
Edit:
A few ideas about a simpler (but less precise) formula:
Since we're dealing with relatively small distances (and, I'm guessing, latitudes between 30 and 48 degrees North), we can use the Euclidean distance (or better yet the square of the Euclidean distance, which avoids a square root) rather than the more complicated spherical trigonometry formulas.
Depending on the level of precision expected, it may even be acceptable to have one single parameter for the linear distance of a full degree of longitude, taking an average value over the area considered (say circa 46 statute miles). The formula would then become
LatDegInMi = 69.0
LongDegInMi = 46.0
DistSquared = ((Lat1 - Lat2) * LatDegInMi) ^2 + ((Long1 - Long2) * LongDegInMi) ^2
On the idea of columns with grid info, used to limit the number of rows considered for the distance calculation:
Each "point" in the system, be it a city or another point (delivery locations, store locations... whatever), is assigned two integer coordinates which define the square of, say, 25 miles × 25 miles in which the point lies. The coordinates of any point within 100 miles of the reference point (a given city) will be at most +/- 4 in the x direction and +/- 4 in the y direction. We can then write a query similar to the following:
SET @LatDegInMi = 69.0, @LongDegInMi = 46.0; -- values from the formula above
SELECT Z.city, Z.state, Z.latitude, Z.longitude, COUNT(*)
FROM zipcodes Z
JOIN points P
ON P.GridX IN (Z.GridX - 4, Z.GridX - 3, Z.GridX - 2, Z.GridX - 1, Z.GridX,
Z.GridX + 1, Z.GridX + 2, Z.GridX + 3, Z.GridX + 4)
AND
P.GridY IN (Z.GridY - 4, Z.GridY - 3, Z.GridY - 2, Z.GridY - 1, Z.GridY,
Z.GridY + 1, Z.GridY + 2, Z.GridY + 3, Z.GridY + 4)
WHERE P.Status = 'A'
AND POW((Z.latitude - P.latitude) * @LatDegInMi, 2)
+ POW((Z.longitude - P.longitude) * @LongDegInMi, 2) < POW(100, 2)
GROUP BY Z.city, Z.state, Z.latitude, Z.longitude;
Note that the LongDegInMi could either be hardcoded (same for all locations within continental USA), or come from corresponding record in the zipcodes table. Similarly, LatDegInMi could be hardcoded (little need to make it vary, as unlike the other it is relatively constant).
The reason why this is faster is that for most records in the Cartesian product of the zipcodes table and the points table, we do not calculate the distance at all. We eliminate them on the basis of an index value (GridX and GridY).
This brings us to the question of which SQL indexes to produce. For sure, we may want:
- GridX + GridY + Status (on the points table)
- GridY + GridX + status (possibly)
- City + State + latitude + longitude + GridX + GridY on the zipcodes table
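Here is a Python sketch of the grid-plus-squared-distance approach, using the answer's constants (69 and 46 miles per degree, 25-mile squares, 100-mile radius). The grid function stands in for the hypothetical GridX/GridY columns, and the random points around 39°N, 95°W are made up. The final check suggests that rejecting rows by grid first never discards a point that the squared-distance test would have accepted.

```python
import math
import random

LAT_DEG_IN_MI = 69.0   # LatDegInMi from the answer
LONG_DEG_IN_MI = 46.0  # LongDegInMi from the answer
CELL_MI = 25.0         # 25 x 25 mile squares
RADIUS_MI = 100.0
REACH = math.ceil(RADIUS_MI / CELL_MI)  # 4 squares in each direction


def grid(lat, lon):
    """Hypothetical (GridX, GridY) of the 25-mile square holding a point."""
    return (math.floor(lon * LONG_DEG_IN_MI / CELL_MI),
            math.floor(lat * LAT_DEG_IN_MI / CELL_MI))


def dist_squared(lat1, lon1, lat2, lon2):
    """DistSquared from the answer; comparing with RADIUS_MI ** 2 avoids a square root."""
    return (((lat1 - lat2) * LAT_DEG_IN_MI) ** 2
            + ((lon1 - lon2) * LONG_DEG_IN_MI) ** 2)


def count_within(city, points):
    """Cheap grid rejection first (the indexed join), squared distance only for survivors."""
    gx, gy = grid(*city)
    count = 0
    for p in points:
        px, py = grid(*p)
        if abs(px - gx) > REACH or abs(py - gy) > REACH:
            continue  # distance never computed for this row
        if dist_squared(*city, *p) < RADIUS_MI ** 2:
            count += 1
    return count


random.seed(7)
city = (39.0, -95.0)  # made-up reference location
points = [(city[0] + random.uniform(-3, 3), city[1] + random.uniform(-3, 3))
          for _ in range(3000)]
fast = count_within(city, points)
slow = sum(dist_squared(*city, *p) < RADIUS_MI ** 2 for p in points)  # no grid
print(fast, slow)
```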
An alternative to the grids is to "bound" the limits of latitude and longitude which we'll consider, based on the latitude and longitude of a given city, i.e. the JOIN condition becomes a range rather than an IN:
JOIN points P
ON P.latitude > (Z.latitude - (100 / @LatDegInMi))
AND P.latitude < (Z.latitude + (100 / @LatDegInMi))
AND P.longitude > (Z.longitude - (100 / @LongDegInMi))
AND P.longitude < (Z.longitude + (100 / @LongDegInMi))
When I do these types of searches, my needs allow some approximation. So I use the formula you have in your second query to first calculate the "bounds" (the four lat/long values at the extremes of the allowed radius), then take those bounds and do a simple query to find the matches within them (less than the max lat/long, more than the min lat/long). So what I end up with is everything within a square that encloses the circle defined by the radius (its corners reach somewhat beyond the radius).
SELECT * FROM tblLocation
WHERE 2 > POWER(POWER(Latitude - 40, 2) + POWER(Longitude - -90, 2), .5)
where 2 is the radius in degrees (the number of parallels away), and 40 and -90 are the lat/lon of the test point. Note that this treats a degree of longitude as if it were as long as a degree of latitude.
Sorry I didn't use your table names or structures; I just copied this out of one of the stored procedures I have in one of my databases.
If I wanted to see the number of points in a zip code I suppose I would do something like this:
SELECT
ParcelZip, COUNT(LocationID) AS LocCount
FROM
tblLocation
WHERE
2 > POWER(POWER(Latitude - 40, 2) + POWER(Longitude - -90, 2), .5)
GROUP BY
ParcelZip
Getting the total count of all locations in the range would look like this:
SELECT
COUNT(LocationID) AS LocCount
FROM
tblLocation
WHERE
2 > POWER(POWER(Latitude - 40, 2) + POWER(Longitude - -90, 2), .5)
A cross join may be inefficient here since we are talking about a large quantity of records but this should do the job in a single query:
SELECT
ZipCodes.ZipCode, COUNT(PointID) AS LocCount
FROM
Points
CROSS JOIN
ZipCodes
WHERE
2 > POWER(POWER(Points.Latitude - ZipCodes.Latitude, 2) + POWER(Points.Longitude - ZipCodes.Longitude, 2), .5)
GROUP BY
ZipCodes.ZipCode