How can I optimize this stored procedure? - mysql

I need some help optimizing this procedure:
DELIMITER $$
CREATE DEFINER=`ryan`@`%` PROCEDURE `GetCitiesInRadius`(
cityID numeric (15),
`range` numeric (15)
)
BEGIN
DECLARE lat1 decimal (5,2);
DECLARE long1 decimal (5,2);
DECLARE rangeFactor decimal (7,6);
SET rangeFactor = 0.014457;
SELECT `latitude`,`longitude` into lat1,long1
FROM world_cities as wc WHERE city_id = cityID;
SELECT
wc.city_id,
wc.accent_city as city,
s.state_name as state,
c.short_name as country,
GetDistance(lat1, long1, wc.`latitude`, wc.`longitude`) as dist
FROM world_cities as wc
left join states s on wc.state_id = s.state_id
left join countries c on wc.country_id = c.country_id
WHERE
wc.`latitude` BETWEEN lat1 -(`range` * rangeFactor) AND lat1 + (`range` * rangeFactor)
AND wc.`longitude` BETWEEN long1 - (`range` * rangeFactor) AND long1 + (`range` * rangeFactor)
AND GetDistance(lat1, long1, wc.`latitude`, wc.`longitude`) <= `range`
ORDER BY dist limit 6;
END
Here is my explain on the main portion of the query:
+----+-------------+-------+--------+---------------+--------------+---------+--------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+--------------+---------+--------------------------+------+----------------------------------------------+
| 1 | SIMPLE | B | range | idx_lat_long | idx_lat_long | 12 | NULL | 7619 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | s | eq_ref | PRIMARY | PRIMARY | 4 | civilipedia.B.state_id | 1 | |
| 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 1 | civilipedia.B.country_id | 1 | Using where |
+----+-------------+-------+--------+---------------+--------------+---------+--------------------------+------+----------------------------------------------+
3 rows in set (0.00 sec)
Here are the indexes:
mysql> show indexes from world_cities;
+--------------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| world_cities | 0 | PRIMARY | 1 | city_id | A | 3173958 | NULL | NULL | | BTREE | |
| world_cities | 1 | country_id | 1 | country_id | A | 23510 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | city | 1 | city | A | 3173958 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | accent_city | 1 | accent_city | A | 3173958 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | idx_pop | 1 | population | A | 28854 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | idx_lat_long | 1 | latitude | A | 1057986 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | idx_lat_long | 2 | longitude | A | 3173958 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | accent_city_2 | 1 | accent_city | NULL | 1586979 | NULL | NULL | YES | FULLTEXT | |
+--------------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
8 rows in set (0.01 sec)
The function you see in the query I wouldn't think would cause the slow down, but here is the function:
CREATE DEFINER=`ryan`@`%` FUNCTION `GetDistance`(lat1 numeric (9,6),
lon1 numeric (9,6),
lat2 numeric (9,6),
lon2 numeric (9,6) ) RETURNS decimal(10,5)
BEGIN
DECLARE x decimal (20,10);
DECLARE pi decimal (21,20);
SET pi = 3.14159265358979323846;
SET x = sin( lat1 * pi/180 ) * sin( lat2 * pi/180 ) + cos(
lat1 *pi/180 ) * cos( lat2 * pi/180 ) * cos( (lon2 * pi/180) -
(lon1 *pi/180)
);
SET x = atan( ( sqrt( 1- power( x, 2 ) ) ) / x );
RETURN ( 1.852 * 60.0 * ((x/pi)*180) ) / 1.609344;
END

As far as I can tell there is nothing directly wrong with your logic that would make this slow, so the problem ends up being that this query can't make good use of indexes.
MySQL has to apply the functions in your WHERE clause to every candidate row to determine whether it passes the conditions. Currently one index is used: idx_lat_long.
It's a bit of a bad index: the longitude part can't narrow the scan, because the latitude part is matched as a range over a continuous value rather than by equality. At the very least it filters out all rows outside the latitude range, but that is likely still a lot of rows (the EXPLAIN estimates 7619).
You'd actually get slightly better results filtering on longitude first, because humans only really live in the middle 30% of the earth. We're very much spread out horizontally, but not really vertically.
Regardless, the best way to shrink the candidate set further is to filter out as many records outside the general area as possible. Right now the index selects a full horizontal band around the earth; try to make it a bounding box.
You could naively dice up the earth into, say, 10x10 segments. In the best case this limits the query to 1% of the earth ;).
But as soon as your bounding box spans two separate segments, only the first coordinate (lat or lng) can be used in the index and you end up with the same problem.
So when I thought about this problem I approached it differently. Instead, I divided the earth into 4 segments (let's say north-east, north-west, south-east and south-west on a map). This gives me coordinates like:
0,0
0,1
1,0
1,1
Instead of putting the x and y values in 2 separate fields, I treat them as a bit field and store both at once.
Then I divided each of the 4 boxes up again, which gives us 2 sets of coordinates: the outer and the inner coordinates. I'm still encoding this in the same field, which means we now use 4 bits for our 4x4 coordinate system.
How far can we go? If we assume a 64-bit integer field, 32 bits can be used for each of the 2 coordinates. This gives us a grid system of 4294967295 x 4294967295, all encoded into one database field.
The beauty of this field is that you can index it. This is sometimes called (I believe) a quadtree. If you need to select a big area in your database, you calculate the 64-bit number for the corner with the lowest x and y (in the 4294967295 x 4294967295 grid system) and for the opposite corner with the highest x and y, and it's guaranteed that anything that lies in that box will also lie between those two numbers.
How do you get to those numbers? Let's be lazy and assume that both our x and y coordinates range from -180 to 180 degrees. (The y coordinate of course has half that range, but we're lazy.)
First we make it positive:
// assuming x and y are our long and lat.
x += 180;
y += 180;
So the max for those is 360 now (and 4294967295 / 360 is around 11930464).
So to convert to our new grid system, we just do:
x *= 11930464;
y *= 11930464;
Now we have two distinct numbers, and we need to turn them into 1 number. First bit 1 of x, then bit 1 of y, bit 2 of x, bit 2 of y, etc.
// The 'morton number'
morton = 0
// The current bit we're interleaving
bit = 1
// The position of the bit we're interleaving
position = 0
while(bit <= latitude or bit <= longitude) {
if (bit & latitude) morton = morton | 1 << (2*position+1)
if (bit & longitude) morton = morton | 1 << (2*position)
position += 1
bit = 1 << position
}
I'm calling the final variable 'morton', after the guy who came up with it in 1966.
So this leaves us finally with the following:
For each row in your database, calculate the morton number and store it.
Whenever you do a query, first determine the maximum bounding box (as the morton number) and filter on that.
This will greatly reduce the number of records you need to check.
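The steps above can be tried outside the database. The following is a minimal Python sketch, not the author's implementation: it uses the same constant (11930464) and the same bit order as the pseudocode (longitude on even bits, latitude on odd bits), shifts latitude by 90 and longitude by 180, and the city rows and function names are hypothetical.

```python
SCALE = 11930464  # grid cells per degree, the constant used in the text


def to_grid(lat, lng):
    """Shift degrees to non-negative values and scale them to integer grid cells."""
    return int((lat + 90) * SCALE), int((lng + 180) * SCALE)


def morton(lat_cell, lng_cell):
    """Interleave bits: longitude bits on even positions, latitude bits on odd ones."""
    code = 0
    position = 0
    while (lat_cell >> position) or (lng_cell >> position):
        if (lat_cell >> position) & 1:
            code |= 1 << (2 * position + 1)
        if (lng_cell >> position) & 1:
            code |= 1 << (2 * position)
        position += 1
    return code


def geo_morton(lat, lng):
    return morton(*to_grid(lat, lng))


def prefilter(rows, lat_min, lat_max, lng_min, lng_max):
    """Keep rows whose stored code lies between the codes of the two box corners."""
    low = geo_morton(lat_min, lng_min)
    high = geo_morton(lat_max, lng_max)
    return [row for row in rows if low <= row["morton"] <= high]


# Hypothetical sample rows; the code is computed once and stored with each row.
cities = [
    {"name": "Amsterdam", "lat": 52.37, "lng": 4.90},
    {"name": "Utrecht", "lat": 52.09, "lng": 5.12},
    {"name": "Sydney", "lat": -33.87, "lng": 151.21},
]
for city in cities:
    city["morton"] = geo_morton(city["lat"], city["lng"])

nearby = prefilter(cities, 51.5, 53.0, 4.0, 6.0)
print([city["name"] for city in nearby])
```

Every point inside the box is guaranteed to survive the prefilter, but points outside the box can survive too, so an exact distance check is still needed on the remaining rows.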
Here's a stored function I wrote that will do the calculation for you:
CREATE FUNCTION getGeoMorton(lat DOUBLE, lng DOUBLE) RETURNS BIGINT UNSIGNED DETERMINISTIC
BEGIN
-- 11930464 is round(maximum value of a 32bit integer / 360 degrees)
DECLARE bit, morton, pos BIGINT UNSIGNED DEFAULT 0;
SET @lat = CAST((lat + 90) * 11930464 AS UNSIGNED);
SET @lng = CAST((lng + 180) * 11930464 AS UNSIGNED);
SET bit = 1;
WHILE bit <= @lat || bit <= @lng DO
IF(bit & @lat) THEN SET morton = morton | ( 1 << (2 * pos + 1)); END IF;
IF(bit & @lng) THEN SET morton = morton | ( 1 << (2 * pos)); END IF;
SET pos = pos + 1;
SET bit = 1 << pos;
END WHILE;
RETURN morton;
END;
A few caveats:
The absolute worst-case scenario will still scan 50% of your entire table. The chance of this is extremely low though, and I've seen significant performance increases for most real-world queries.
The bounding box in this case assumes a Euclidean space, meaning a flat surface. In reality your bounding boxes are not exact squares, and they warp heavily when getting closer to the poles. By just making the boxes a bit larger (depending on how exact you want to be) you can get quite far. Most real-world data is also not close to the poles ;). Remember that this filter is just a 'rough filter' to get most of the likely unwanted rows out.
This is based on a so-called Z-order curve. To get even better performance, if you're feeling adventurous, you could try the Hilbert curve instead. This curve oddly rotates, which ensures that in a worst-case scenario you will only scan about 25% of the table. Magic! In general this one will also filter out many more unwanted rows.
Source for all this: I wrote 3 blog posts about this topic when I ran into the same problems and tried to creatively get to a solution. I got much better performance with this compared to MySQL's GEO indexes.
http://www.rooftopsolutions.nl/blog/229
http://www.rooftopsolutions.nl/blog/230
http://www.rooftopsolutions.nl/blog/231

Related

How can I make "euclidean distance calculation" faster in MySQL?

I am creating a face recognition system, but the search is very slow. Can you share how to speed up the search?
It takes about 6 seconds for 100,000 data items.
MySQL
mysql> SHOW VARIABLES LIKE '%version%';
+--------------------------+------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------+
| version | 8.0.29 |
| version_comment | MySQL Community Server - GPL |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
+--------------------------+------------------------------+
Table
CREATE TABLE `face_feature` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`f1` decimal(9,8) NOT NULL,
`f2` decimal(9,8) NOT NULL,
...
...
`f127` decimal(9,8) NOT NULL,
`f128` decimal(9,8) NOT NULL,
PRIMARY KEY (id)
);
Data
mysql> SELECT count(*) FROM face_feature;
+----------+
| count(*) |
+----------+
| 110004 |
+----------+
mysql> SELECT * FROM face_feature LIMIT 1\G;
id: 1
f1: -0.07603023
f2: 0.13605964
...
f127: 0.09608927
f128: 0.00082345
SQL
SELECT
id,
sqrt(
power(f1 - (-0.09077361), 2) +
power(f2 - (0.10373443), 2) +
...
...
power(f127 - (0.0778369), 2) +
power(f128 - (0.00951046), 2)
) as distance
FROM
face_feature
ORDER BY
distance
LIMIT
1
;
Result
+----+--------------------+
| id | distance |
+----+--------------------+
| 1 | 0.3376853491771237 |
+----+--------------------+
1 row in set (6.18 sec)
Update 1:
Changed from decimal(9,8) to float(9,8)
Then, improved from approximately 4sec to 3.26 sec
mysql> desc face_feature;
+-------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+----------------+
| id | int | NO | PRI | NULL | auto_increment |
| f1 | float(9,8) | NO | | NULL | |
| f2 | float(9,8) | NO | | NULL | |
..
| f127 | float(9,8) | NO | | NULL | |
| f128 | float(9,8) | NO | | NULL | |
+-------+------------+------+-----+---------+----------------+
Update 2:
Changed from POWER(z, 2) to z*z
Then, the result was changed from 3.26 sec to 4.65 sec
SELECT
id,
sqrt(
((f1 - (-0.09077361)) * (f1 - (-0.09077361))) +
((f2 - (0.10373443)) * (f2 - (0.10373443))) +
((f3 - (0.00798536)) * (f3 - (0.00798536))) +
...
...
((f126 - (0.07803915)) * (f126 - (0.07803915))) +
((f127 - (0.0778369)) * (f127 - (0.0778369))) +
((f128 - (0.00951046)) * (f128 - (0.00951046)))
) as distance
FROM
face_feature
ORDER BY
distance
LIMIT
1
;
Update 3
I am looking into the usage of MySQL GIS.
How can I migrate from "float" to "points" in MySQL?
Update 4
I'm also looking at PostgreSQL because I can't find a way to handle 128 dimensions in MySQL.
DECIMAL(9,8) -- that's a lot of significant digits. Do you need that much precision?
FLOAT -- about 7 significant digits; faster arithmetic.
POWER(z, 2) -- probably a lot slower than z*z. (This may be the slowest part.)
SQRT -- In many situations, you can simply work with the squares. In this case:
SELECT SQRT(closest)
FROM ( SELECT -- leave out SQRT
... ORDER BY .. LIMIT 1 )
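The point about SQRT can be checked outside SQL: the square root does not change the ordering, so it only has to be applied to the winning row. A minimal Python sketch, with made-up 3-dimensional vectors standing in for the 128 feature columns and hypothetical function names:

```python
import math


def squared_distance(a, b):
    """Sum of squared differences; no square root is taken here."""
    return sum((x - y) * (x - y) for x, y in zip(a, b))


def nearest(query, rows):
    """rows maps id -> feature vector. Returns (id, Euclidean distance)."""
    best_id = min(rows, key=lambda row_id: squared_distance(query, rows[row_id]))
    # The square root is taken once, for the single closest row.
    return best_id, math.sqrt(squared_distance(query, rows[best_id]))


rows = {1: [0.0, 0.0, 0.0], 2: [1.0, 1.0, 1.0], 3: [3.0, 4.0, 0.0]}
print(nearest([0.9, 1.0, 1.2], rows))
```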
Here are some other thoughts. They are not necessarily relevant to the query being discussed:
Precise testing -- Beware of comparing for 'equal'. Roundoff error is likely to make things unequal unexpectedly. Imprecise measurements add to the issue. If I measure something twice, I might get 1.23456789 one time and 1.23456788 the next time. (Especially at that level of "precision".)
Trade complexity vs speed -- Use the sum of ABS(a - b) as the distance formula; find the 10 items closest in that way, then use the Euclidean distance to get the 'right' distance.
Break the face into regions. Find which region the item is in, then check only the subset of the 128 points that are in that region. (Being near a boundary -- put some points in two regions.)
Think out of the box -- I'm not familiar with your facial recognition, so I have run out of mathematical tricks.
Switch to POINTs and a SPATIAL index. It may make your task orders of magnitude faster. (This is probably not practical for 128-dimensional space.)
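The "trade complexity vs speed" idea from the list above can be sketched as a two-stage search: shortlist by the cheap sum of absolute differences, then rank only the shortlist by Euclidean distance. This is a Python sketch under assumptions: the sample vectors and function names are invented, and the shortlist size of 10 is the figure from the suggestion. The true Euclidean nearest neighbour is not guaranteed to be in the shortlist, so this is an approximation.

```python
import heapq
import math


def manhattan(a, b):
    """Cheap distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) * (x - y) for x, y in zip(a, b)))


def two_stage_nearest(query, rows, shortlist=10):
    """rows maps id -> feature vector. Returns (id, Euclidean distance)."""
    # Stage 1: keep the `shortlist` rows that are closest by the cheap metric.
    candidates = heapq.nsmallest(
        shortlist, rows.items(), key=lambda item: manhattan(query, item[1])
    )
    # Stage 2: compute the exact distance only for those candidates.
    best_id, best_vector = min(candidates, key=lambda item: euclidean(query, item[1]))
    return best_id, euclidean(query, best_vector)


rows = {1: [0.0, 0.0, 0.0], 2: [1.0, 1.0, 1.0], 3: [3.0, 4.0, 0.0]}
print(two_stage_nearest([0.9, 1.0, 1.2], rows, shortlist=2))
```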

MySQL - query optimisation: subquery, user-defined variables and join

I am currently building a single (but extremely important in its context) query, which seems like it is working (qualitatively ok), but which I think/hope/wish could run faster.
I am running tests on MySQL 5.7.29, until a box running OmnisciDB in GPU mode can become available (which should be relatively soon). While I am hoping the switch to that different DB backend will improve performance, I am also aware it might require some tweaking in the table structures, querying techniques used, etc. But that is for later.
A little context:
Data
Is summed up as an extremely simple table:
CREATE TABLE `entities_for_perception` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`pos` POINT NOT NULL,
`perception` INT(11) NOT NULL DEFAULT '0',
`stealth` INT(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
SPATIAL INDEX `pos` (`pos`),
INDEX `perception` (`perception`),
INDEX `stealth` (`stealth`)
)
COLLATE='utf8mb4_bin'
ENGINE=InnoDB
AUTO_INCREMENT=10001
;
Which then contains values like (obvious but helps visualise :-) ):
| id | pos | perception | stealth |
| 1 | ... | 10 | 3 |
| 2 | ... | 6 | 5 |
| 3 | ... | 5 | 5 |
| 4 | ... | 7 | 7 |
etc..
Now I have this query (see below) whose intent is the following: in one pass, fetch all the ids of the "entities" that see other entities and return the list of "who sees who".
[The "in one pass" is obvious and is to limit roundtrips.]
Let's assume POINT() is in a cartesian system.
The query is the following:
SHOW WARNINGS;
SET @automatic_perception_distance := 10;
SELECT
*
FROM (
SELECT
e1.id AS oid,
e1.perception AS operception,
@max_perception_distance := e1.perception * 5 AS 'max_perception_distance',
@dist := ST_DISTANCE(e1.pos, e2.pos) AS 'dist',
# minimum 0
@dist_from_auto := GREATEST(@dist - @automatic_perception_distance, 0) AS 'dist_from_auto',
@effective_perception := (
@origin_perception - (
@dist_from_auto
/ (@max_perception_distance - @automatic_perception_distance)
* @origin_perception
)
) AS 'effective_perception',
e2.id AS tid,
e2.stealth AS tstealth
FROM
entities_for_perception e1
INNER JOIN entities_for_perception e2 ON
e1.id != e2.id
ORDER BY
oid,
dist
) AS subquery
WHERE
effective_perception >= tstealth
;
What it does is list "who sees whom" by applying the following criteria/filters:
determining a maximum distance beyond which perception is not possible
determining a minimal distance below which perception is automatic (not implemented yet)
determining an effective perception value varying (and regressing) with distance
...and comparing the effective perception of the "spotter" versus the stealth of the "target".
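The falloff described by these criteria can be checked outside the database. This is a minimal Python sketch, assuming that the origin perception in the query is the spotter's own perception (e1.perception), which is what the sample output of the query suggests; the function names are made up for illustration, and it assumes perception * 5 is larger than the automatic perception distance (otherwise the division fails).

```python
AUTOMATIC_PERCEPTION_DISTANCE = 10  # same value as in the query


def effective_perception(perception, dist, auto_dist=AUTOMATIC_PERCEPTION_DISTANCE):
    """Perception that decreases linearly between auto_dist and perception * 5."""
    max_dist = perception * 5
    dist_from_auto = max(dist - auto_dist, 0)  # minimum 0
    return perception - dist_from_auto / (max_dist - auto_dist) * perception


def sees(perception, dist, target_stealth):
    """True when the spotter's effective perception reaches the target's stealth."""
    return effective_perception(perception, dist) >= target_stealth


# Spotter with perception 9, target with stealth 2 at distance 11.045361017187261.
print(effective_perception(9, 11.045361017187261))
print(sees(9, 11.045361017187261, 2))
```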
This works, but runs somewhat slowly (laptop + virtualbox + centos7) on a table with very few rows (~1,000). The query time seems to fluctuate between 0.2 and 0.29 seconds. This is however orders of magnitude faster than it would be with one query per "spotter", which would not scale with 1,000+ spotters. Heh. :-)
Example of output:
| oid | operception | max_perception_distance | dist | dist_from_auto | effective_perception | tid | tstealth |
| 1 | 9 | 45 | 1.4142135623730951 | 0 | 9 | 156 | 5 |
| 1 | 9 | 45 | 11.045361017187261 | 1.0453610171872612 | 8.731192881294705 | 164 | 2 |
| 1 | 9 | 45 | 13.341664064126334 | 3.341664064126334 | 8.140714954938943 | 163 | 8 |
| 1 | 9 | 45 | 16.97056274847714 | 6.970562748477139 | 7.207569578963021 | 125 | 7 |
| 1 | 9 | 45 | 25.019992006393608 | 15.019992006393608 | 5.137716341213072 | 152 | 3 |
| 1 | 9 | 45 | 25.079872407968907 | 15.079872407968907 | 5.122318523665138 | 191 | 5 |
etc.
Could the reason for what I believe is a slow response:
be the subquery?
be the variables or the arithmetics applied to them?
the join?
something else I am not aware of?
Thank you for any insight!
An index would probably help: CREATE INDEX idx_ID ON entities_for_perception (id);
If you were to upgrade to MySQL version 8, you could take advantage of a Common Table Expression as follows:
with perceived as (
SELECT
e1.id AS oid,
e1.perception AS operception,
@max_perception_distance := e1.perception * 5 AS 'max_perception_distance',
@dist := ST_DISTANCE(e1.pos, e2.pos) AS 'dist',
# minimum 0
@dist_from_auto := GREATEST(@dist - @automatic_perception_distance, 0) AS 'dist_from_auto',
@effective_perception := (
@origin_perception - (
@dist_from_auto
/ (@max_perception_distance - @automatic_perception_distance)
* @origin_perception
)
) AS 'effective_perception',
e2.id AS tid,
e2.stealth AS tstealth
FROM
entities_for_perception e1
INNER JOIN entities_for_perception e2 ON
e1.id != e2.id
)
SELECT *
FROM perceived
WHERE
effective_perception >= tstealth
ORDER BY
oid,
dist
;

How to get nearest coordinates from database in mysql?

I have got a table with id,latitude (lat),longitude (lng),altitude (alt).
I have some coordinates and I would like to find the closest entry in the DB.
I used this, but it is not working correctly yet:
SELECT lat,ABS(lat - TestCordLat), lng, ABS(lng - TestCordLng), alt AS distance
FROM dhm200
ORDER BY distance
LIMIT 6
I would like a table of the 6 nearest points, showing their latitude, longitude and altitude.
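Query to get the distance from MySQL, nearest first. The factor 69.1 is roughly the number of miles per degree of latitude, so the result is in miles; multiply it by 1.609344 to get kilometers (km):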
SELECT id, latitude, longitude, SQRT( POW(69.1 * (latitude - 4.66455174) , 2) + POW(69.1 * (-74.07867091 - longitude) * COS(latitude / 57.3) , 2)) AS distance FROM ranks ORDER BY distance ASC;
You may wish to limit radius by HAVING syntax.
... AS distance FROM ranks HAVING distance < '150' ORDER BY distance ASC;
Example:
mysql> describe ranks;
+------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+----------------+
| id | int | NO | PRI | NULL | auto_increment |
| latitude | decimal(10,8) | YES | MUL | NULL | |
| longitude | decimal(11,8) | YES | | NULL | |
+------------+---------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
mysql> SELECT id, latitude, longitude, SQRT( POW(69.1 * (latitude - 4.66455174) , 2) + POW(69.1 * (-74.07867091 - longitude) * COS(latitude / 57.3) , 2)) AS distance FROM ranks ORDER BY distance ASC;
+----+-------------+--------------+--------------------+
| id | latitude | longitude | distance |
+----+-------------+--------------+--------------------+
| 4 | 4.66455174 | -74.07867091 | 0 |
| 10 | 4.13510880 | -73.63690401 | 47.59647003096195 |
| 11 | 6.55526689 | -73.13373892 | 145.86590936973073 |
| 5 | 6.24478548 | -75.57050110 | 149.74731096011348 |
| 7 | 7.06125013 | -73.84928550 | 166.35723903407165 |
| 9 | 3.48835279 | -76.51532198 | 186.68173882319724 |
| 8 | 7.88475514 | -72.49432589 | 247.53456848808233 |
| 1 | 60.00001000 | 101.00001000 | 7156.836171031409 |
| 3 | 60.00001000 | 101.00001000 | 7156.836171031409 |
+----+-------------+--------------+--------------------+
9 rows in set (0.00 sec)
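The distance expression in that query is a flat-surface approximation: the north-south and east-west offsets are converted to linear units and combined with Pythagoras. A minimal Python sketch of the same arithmetic follows. The function and constant names are invented here; note that 69.1 is roughly the number of miles in a degree of latitude, so the value comes out in miles, and 57.3 is roughly the number of degrees in a radian.

```python
import math

MILES_PER_DEGREE = 69.1    # approximate miles per degree of latitude
DEGREES_PER_RADIAN = 57.3  # approximate, as used in the query


def approx_distance_miles(lat, lng, ref_lat, ref_lng):
    """Flat-surface distance between a row (lat, lng) and a reference point."""
    north_south = MILES_PER_DEGREE * (lat - ref_lat)
    # East-west degrees shrink with the cosine of the latitude.
    east_west = MILES_PER_DEGREE * (ref_lng - lng) * math.cos(lat / DEGREES_PER_RADIAN)
    return math.sqrt(north_south ** 2 + east_west ** 2)


# Row with id 10 from the example table, measured from the reference point.
print(approx_distance_miles(4.13510880, -73.63690401, 4.66455174, -74.07867091))
```

The approximation is reasonable for nearby points; for distant ones, such as the rows at latitude 60 in the example, a great-circle formula is the safer choice.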
You will need to use the Haversine formula to calculate distances taking into account the latitude and longitude:
dlon = lon2 - lon1
dlat = lat2 - lat1
a = (sin(dlat/2))^2 + cos(lat1) * cos(lat2) * (sin(dlon/2))^2
c = 2 * atan2( sqrt(a), sqrt(1-a) )
distance = R * c (where R is the radius of the Earth)
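As a sketch, the formula above translates to Python as follows. The mean Earth radius of 6371 km is an assumed value for R (the answer does not give one), and the function name is made up; inputs are in degrees and are converted to radians first.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of the Earth for R


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return EARTH_RADIUS_KM * c


# A quarter of the way around the equator.
print(haversine_km(0.0, 0.0, 0.0, 90.0))
```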
However, the altitude makes the problem harder. If the route between point A and point B, which lie at different altitudes, contains large altitude changes, then assuming that the altitude changes at a constant rate between the two points might be misleading, and not taking altitude into account at all might be very misleading. Compare the distance between a point in China and a point in India, with the Himalayas in between, with the distance between two points on the surface of the Pacific Ocean. One possibility would be to increase R by the average of the two altitudes for each comparison, but for large distances this could be misleading, as discussed above.

use st_within to get the locations in a radius/circle

I have a table that looks like this
| id | name | latitude | langitude | costLat | sinLat | cosLng | sinLng |
| 1 | place 1 | 2.942743912327621 | 101.79377630352974 | 0.99868133582304 | 0.051337992546461 | -0.20438972214917 | 0.97888959616485 |
Referring to this article, it seems like a good idea to use st_within to search for locations within a 5 km radius of a given latitude and langitude in my table above. But I have no idea how to do that.
The table is MyISAM, MySQL version 5.6
Sorry for not being clear on what I tried. From the documentation it mentions that
ST_Within(g1,g2)
Returns 1 or 0 to indicate whether g1 is spatially within g2.
So my understanding is, we need to pass 2 params to ST_Within. Sound simple enough, but when I looked at the sample query in the linked articles, it does (*note: I changed shape to CIRCLE in the query, as my assumption is my shape is CIRCLE because I'm searching for radius)
set @lat= 37.615223;
set @lon = -122.389979;
set @dist = 10;
set @rlon1 = @lon-@dist/abs(cos(radians(@lat))*69);
set @rlon2 = @lon+@dist/abs(cos(radians(@lat))*69);
set @rlat1 = @lat-(@dist/69);
set @rlat2 = @lat+(@dist/69);
SELECT ASTEXT("CIRCLE"), NAME FROM location_final
WHERE st_within("CIRCLE", ENVELOPE(LINESTRING(POINT(#rlon1, #rlat1), POINT(#rlon2, #rlat2))))
ORDER BY st_distance(POINT(#lon, #lat), "CIRCLE") LIMIT 10;
So looking at the query above, my confusion is: where does the comparison between the latitude and langitude happen? Where in the query should I mention my latitude and langitude columns?
Looking at the output at the given link, it displays something like
+--------------------------------+-------------------------------+
| astext(shape) | name |
+--------------------------------+-------------------------------+
| POINT(-122.3890954 37.6145378) | Tram stop:Terminal A |
| POINT(-122.3899 37.6165902) | Tram stop:Terminal G |
Where does the POINT come from?
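For reference, the SET statements in the quoted query only compute the corners of a rectangle around the search point; the comparison itself then happens when st_within tests a geometry value against that rectangle. A minimal Python sketch of the same corner arithmetic follows, assuming the distance is in miles (69 is roughly the number of miles per degree of latitude); the function name is hypothetical.

```python
import math

MILES_PER_DEGREE = 69  # constant used in the quoted query


def bounding_box(lat, lon, dist_miles):
    """Return ((lon1, lat1), (lon2, lat2)), the two corners passed to LINESTRING."""
    lat_delta = dist_miles / MILES_PER_DEGREE
    # A degree of longitude gets shorter as the latitude moves away from the equator.
    lon_delta = dist_miles / abs(math.cos(math.radians(lat)) * MILES_PER_DEGREE)
    return (lon - lon_delta, lat - lat_delta), (lon + lon_delta, lat + lat_delta)


(rlon1, rlat1), (rlon2, rlat2) = bounding_box(37.615223, -122.389979, 10)
print(rlon1, rlat1, rlon2, rlat2)
```

Both points shown in the sample output fall inside this rectangle.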

Geolocation distance SQL from a cities table [duplicate]

So I have this function to calculate nearest cities based on latitude, longitude and radius parameters.
DELIMITER $$
DROP PROCEDURE IF EXISTS `world_db`.`geolocate_close_cities`$$
CREATE PROCEDURE `geolocate_close_cities`(IN p_latitude DECIMAL(8,2), p_longitude DECIMAL(8,2), IN p_radius INTEGER(5))
BEGIN
SELECT id, country_id, longitude, latitude, city,
truncate((degrees(acos( sin(radians(latitude))
* sin(radians(p_latitude))
+ cos(radians(latitude))
* cos(radians(p_latitude))
* cos(radians(p_longitude - longitude) ) ) )
* 69.09*1.6),1) as distance
FROM cities
HAVING distance < p_radius
ORDER BY distance desc;
END$$
DELIMITER ;
Here's the structure of my cities table:
+------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| country_id | smallint(6) | NO | | NULL | |
| region_id | smallint(6) | NO | | NULL | |
| city | varchar(45) | NO | | NULL | |
| latitude | float | NO | | NULL | |
| longitude | float | NO | | NULL | |
| timezone | varchar(10) | NO | | NULL | |
| dma_id | smallint(6) | YES | | NULL | |
| code | varchar(4) | YES | | NULL | |
+------------+-------------+------+-----+---------+----------------+
It works very well.
What I'd like to do (pseudocode) is something like:
SELECT * FROM cities WHERE DISTANCE((SELECT id FROM cities WHERE id={cityId}), {km})
and it'll return me the closest cities.
Any ideas of how I can do this?
At the moment, I just call the procedure, iterate through the ids into an array, and then perform a WHERE IN on the cities table, which obviously isn't very efficient.
Any help is MUCH appreciated. Thanks.
If you can limit the maximum distance between your cities and your local position, take advantage of the fact that one minute of latitude (north - south) is one nautical mile.
Put an index on the latitude column of your cities table.
Make yourself a haversine(lat1, lat2, long1, long2, unit) stored function from the distance formula shown in your question. See below.
Then do this, given mylatitude, mylongitude, and mykm.
SELECT *
from cities a
where a.latitude >= :mylatitude - :mykm/111.12
and a.latitude <= :mylatitude + :mykm/111.12
and haversine(:mylatitude,a.latitude,:mylongitude,a.longitude, 'KM') <= :mykm
order by haversine(:mylatitude,a.latitude,:mylongitude,a.longitude, 'KM')
This will use a latitude bounding box to crudely rule out cities that are too far away from your point. Your DBMS will use an index range scan on your latitude index to quickly pick out the rows in your cities table that are worth considering. Then it will run your haversine function, the one with all the sine and cosine maths, only on those rows.
I suggest latitude because the on-the-ground distance of longitude varies with latitude.
Note this is crude. It's fine for a store-finder, but don't use it if you're a civil engineer -- the earth has an ellipsoidal shape and this assumes it's spherical.
(Sorry about the 111.12 magic number. That's the number of km in a degree of latitude, that is in sixty nautical miles.)
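The two-step idea, a cheap latitude band first and the exact distance only for the survivors, can be sketched in Python. This is an illustration under assumptions, not the answer's stored function: a sorted list searched with bisect stands in for the index range scan, the Earth radius of 6371 km and the sample coordinates are assumed values, and the function names are made up.

```python
import bisect
import math

KM_PER_DEGREE_LAT = 111.12  # sixty nautical miles, as in the answer
EARTH_RADIUS_KM = 6371.0    # assumed mean radius; not given in the answer


def haversine_km(lat1, lat2, long1, long2):
    """Great-circle distance; argument order mirrors haversine(lat1, lat2, long1, long2)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlong = math.radians(long2 - long1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlong / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cities_within(cities, my_lat, my_long, my_km):
    """Return city names within my_km of the point, nearest first."""
    by_lat = sorted(cities, key=lambda c: c["latitude"])  # stands in for the index
    lats = [c["latitude"] for c in by_lat]
    delta = my_km / KM_PER_DEGREE_LAT
    first = bisect.bisect_left(lats, my_lat - delta)   # start of the latitude band
    last = bisect.bisect_right(lats, my_lat + delta)   # end of the latitude band
    hits = []
    for city in by_lat[first:last]:
        km = haversine_km(my_lat, city["latitude"], my_long, city["longitude"])
        if km <= my_km:
            hits.append((km, city["city"]))
    return [name for km, name in sorted(hits)]


# Hypothetical sample rows with approximate coordinates.
cities = [
    {"city": "Amsterdam", "latitude": 52.3676, "longitude": 4.9041},
    {"city": "Utrecht", "latitude": 52.0907, "longitude": 5.1214},
    {"city": "Rotterdam", "latitude": 51.9244, "longitude": 4.4777},
    {"city": "Berlin", "latitude": 52.5200, "longitude": 13.4050},
    {"city": "Paris", "latitude": 48.8566, "longitude": 2.3522},
]
print(cities_within(cities, 52.3676, 4.9041, 80))
```

In this sample, Paris is discarded by the latitude band alone, while Berlin passes the band and is only rejected by the exact distance, which is why the second step is still needed.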
See this related question for a workable distance function:
Why does this MySQL stored function give different results than doing the calculation in the query?