MySQL query performance tuning

I'm trying to run a query over a table with around 1 million records.
The structure of the table with its indexes is:
CREATE TABLE `table` (
`Id` int(11) NOT NULL,
`Name` varchar(510) DEFAULT NULL,
`Latitude` float NOT NULL DEFAULT '0',
`Longitude` float NOT NULL DEFAULT '0',
PRIMARY KEY (`Latitude`,`Longitude`,`Id`),
KEY `IX_Latitude_Longitude` (`Latitude`,`Longitude`),
KEY `IX_Latitude` (`Latitude`),
KEY `IX_Longitude` (`Longitude`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I'm running the following query:
SELECT m.Id, m.Name, sqrt(69.1 * (m.Latitude - :latitude) * 69.1 * (m.Latitude - :latitude) +
53.0 * (m.longitude - :longitude) * 53.0 * (m.longitude - :longitude)) as Distance,
m.Latitude as Latitude, m.Longitude as Longitude
FROM `table` m
WHERE sqrt(69.1 * (m.Latitude - :latitude) * 69.1 * (m.Latitude - :latitude)
+ 53.0 * (m.longitude - :longitude) * 53.0 * (m.longitude - :longitude)) < :radius
ORDER BY sqrt(69.1 * (m.Latitude - :latitude) * 69.1 * (m.Latitude - :latitude) +
53.0 * (m.longitude - :longitude) * 53.0 * (m.longitude - :longitude)) desc
LIMIT 0, 100
That is supposed to return all the records within a specific radius (distance calculation information: http://www.meridianworlddata.com/Distance-Calculation.asp).
But the query takes a long time.
Here is the explain plan that I get:
id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra
1  | SIMPLE      | m     | ALL  | NULL          | NULL | NULL    | NULL | 1264001 | Using where; Using filesort
What am I doing wrong?
Which index do I need to add in order to cause the query to use it instead of table scan?
Do I need to change the table structure?

You're using functions inside your WHERE clause, so it will always result in a table scan. The database has no way to index against the result of a function. I think your best option is to come up with some way to limit the results before trying to evaluate the distance algorithm.
For instance, for a given location, you can know the minimum and maximum possible latitude that can fall within your set distance, so filter by that first. A degree of latitude is ~69 miles, so if your search radius is 50 miles, it would never be possible for anything more than 0.725 degrees of latitude away to fall within 50 miles of your location. Since this is just a numeric comparison WHERE m.latitude > (:latitude - 0.725) AND m.latitude < (:latitude + 0.725), not a call to a function, the database will be able to use your indexes to evaluate it.
Longitude is more complicated, since the distance for each degree varies depending on how far north/south the location happens to be, but depending on how much work you want to put into it, you could do the same with longitude as well.
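A minimal Python sketch of that pre-filter for both coordinates, assuming roughly 69 statute miles per degree of latitude and that a degree of longitude shrinks by cos(latitude). The function name bounding_box is hypothetical. The four values it returns would be bound into something like WHERE m.Latitude BETWEEN ? AND ? AND m.Longitude BETWEEN ? AND ?, placed ahead of the exact distance test so MySQL can range-scan an index whose leading column is Latitude.

```python
import math

MILES_PER_DEG_LAT = 69.0  # approximate statute miles per degree of latitude


def bounding_box(lat, lon, radius_miles):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing a circle of radius_miles.

    A degree of longitude spans about 69 * cos(latitude) miles, so the longitude
    half-width grows away from the equator. This sketch does not handle boxes
    that reach a pole or cross the 180th meridian.
    """
    dlat = radius_miles / MILES_PER_DEG_LAT
    dlon = radius_miles / (MILES_PER_DEG_LAT * math.cos(math.radians(lat)))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


if __name__ == "__main__":
    # 50-mile search around a hypothetical point at 40N, 105W
    print(bounding_box(40.0, -105.0, 50.0))
```

The box is slightly larger than the circle, so rows that pass it still need the exact distance check. The box only reduces how many rows that check has to run on.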

Related

Rounding mysql 0.5 doesn't always go up

https://i.stack.imgur.com/pxEQW.png
CREATE TABLE `zz` (
`jum_r` double DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `zz` VALUES (71045),(31875),(12045),(172125),(27325),(5465);
SELECT
jum_r,
ROUND(ROUND((jum_r * 1.1), 2), 2) as q_gross,
ROUND(jum_r * 1.1) as gross,
ROUND((jum_r * 10 / 100), 2) as q_ppn,
ROUND(jum_r * 10 / 100) as ppn
FROM zz;
I have data as shown in the picture. Why doesn't rounding 0.5 always go up? What's wrong with my query? Thanks
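For exact-value numbers (e.g. DECIMAL), MySQL rounds 0.5 away from zero, i.e. up to the next integer for positive values. For approximate-value numbers (e.g. FLOAT or DOUBLE, like your jum_r column), MySQL relies on the underlying C library's rounding, which is often "round half to even". See the MySQL reference manual entry for ROUND(). On top of that, 1.1 has no exact binary representation, so a DOUBLE product such as jum_r * 1.1 may land slightly above or below the .5 you expect.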
After the clarification in the comments, this should be your answer:
CASE would help. Basically:
CASE
WHEN jum_r * 1.1 - FLOOR(jum_r * 1.1) < 0.5 THEN FLOOR(jum_r * 1.1)
ELSE CEILING(jum_r * 1.1)
END AS gross
Not pretty, but it should work for positive values.
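To see the two rounding rules side by side, here is a small Python sketch using the sample values from the question. It assumes MySQL's ROUND() on a DOUBLE behaves like round half to even, as the answer describes (the exact behaviour depends on the platform). Python's built-in round() uses the same rule for floats, and the decimal module with ROUND_HALF_UP stands in for a DECIMAL column. The function names are hypothetical. The ppn expression (jum_r * 10 / 100) is used because every sample value produces an exact .5 there.

```python
from decimal import Decimal, ROUND_HALF_UP

SAMPLE = [71045, 31875, 12045, 172125, 27325, 5465]  # rows from the zz table


def ppn_as_double(jum_r):
    # binary floating point, like a DOUBLE column; round() sends exact .5 ties to the even neighbour
    return round(jum_r * 10 / 100)


def ppn_as_decimal(jum_r):
    # exact decimal arithmetic, like a DECIMAL column; ties are rounded away from zero
    value = Decimal(jum_r) * 10 / 100
    return int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    for v in SAMPLE:
        print(v, ppn_as_double(v), ppn_as_decimal(v))
```

In MySQL terms, the usual way to get the second behaviour is to store the column as DECIMAL, or convert jum_r to DECIMAL (e.g. with CAST) before multiplying and calling ROUND().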

distance calculation between two tables of lat/lon

I have the following two tables
cities
id,lat,lon
mountains
id,latitude,longitude
SELECT cities.id,
(SELECT id FROM mountains
WHERE SQRT(POW(69.1 * ( latitude - cities.lat ) , 2 ) +
POW( 69.1 * (cities.lon - longitude ) *
COS( latitude / 57.3 ) , 2 ) )<20 LIMIT 1) as mountain_id
FROM cities
(Query took 0.5060 seconds.)
I've removed some parts of the query (e.g. ORDER BY, WHERE) for simplicity's sake. However, they don't really affect the execution time.
The EXPLAIN output is below.
id | select_type        | table     | type | possible_keys | key  | key_len | ref  | rows   | Extra
1  | PRIMARY            | cities    | ALL  | NULL          | NULL | NULL    | NULL | 478379 |
2  | DEPENDENT SUBQUERY | mountains | ALL  | NULL          | NULL | NULL    | NULL | 15645  | Using where
Running the SELECT by itself is not my problem, but when I try to use the result, e.g.
id mountain_id
588437 NULL
588993 4269
589014 4201
589021 4213
589036 4952
589052 7625
589113 9235
589125 NULL
589176 1184
589210 4317
...to UPDATE a table everything gets awfully slow. I tried pretty much everything that I know of. I do know that a dependent sub-query isn't optimal but I don't know how to get rid of it.
Is there any way to improve my query? Maybe by changing it into a JOIN?
The two tables have nothing in common except latitude and longitude, which are named differently in each table and are only related through the distance calculation.
Spatial distance search (km, miles) does not seem to be available in MariaDB yet.
The trick to making this sort of operation fast is to avoid doing all that computation on every possible pair of lat/lon points. To do that you should incorporate a bounding-box operation.
Let's start by using a JOIN. In pseudocode, you want something like the following. It doesn't matter if the join catches a few extra pairs, as long as the exact distance calculation ranks them behind the closer ones.
SELECT c.city_id, m.mountain_id
FROM cities c
JOIN mountains m ON distance_in_miles(c, m) < 20
So we need to figure out how to make that ON clause fast -- make it use indexes rather than rambling around all the cities and mountains (with apologies to Woody Guthrie).
Let's try this for the ON clause. It searches within square bounding boxes of +/- 20 miles for nearby pairs.
SELECT c.id AS city_id, m.id AS mountain_id
FROM cities c
JOIN mountains m
ON m.latitude BETWEEN c.lat - (20.0 / 69.0)
AND c.lat + (20.0 / 69.0)
AND m.longitude BETWEEN c.lon - (20.0 / (69.0 * COS(RADIANS(c.lat))))
AND c.lon + (20.0 / (69.0 * COS(RADIANS(c.lat))))
In this query, 20.0 is the comparison limit radius, and 69.0 is the constant defining statute miles per degree of latitude.
Then put compound indexes on (lat, lon, id) in cities and on (latitude, longitude, id) in mountains, and your JOIN will be able to use index range scans to make the query more efficient.
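Finally, you can augment the query with clauses like these (pseudocode). Apply them per city, for example inside the correlated subquery, because a bare LIMIT 1 on the whole join returns just one pair: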
ORDER BY dist_in_miles (c,m) ASC
LIMIT 1
Here you actually need a distance formula. The Cartesian-distance formula in your question is an approximation that works tolerably well unless you're near the poles. You may want to use a great-circle formula instead: the spherical law of cosines, the haversine formula, or Vincenty's formulae.

Need expert's help to solve minor change in spatial data query

I have the following query to find the nearest locations around a given lat/long.
I followed Mr. Ollie's blog post "Nearest-location finder for MySQL" to find the nearest locations around a given lat/long using the haversine formula.
But because I don't know much about spatial data queries, I couldn't get it to work, so I'm looking for an expert's advice.
Here is my query:
SELECT z.id,
p.distance_unit
* DEGREES(ACOS(COS(RADIANS(p.latpoint))
* COS(RADIANS(z.(x(property))))
* COS(RADIANS(p.longpoint) - RADIANS(z.(y(property))))
+ SIN(RADIANS(p.latpoint))
* SIN(RADIANS(z.(x(property)))))) AS distance_in_km
FROM mytable AS z
JOIN ( /* these are the query parameters */
SELECT 12.00 AS latpoint, 77.00 AS longpoint,
20.0 AS radius, 111.045 AS distance_unit
) AS p
WHERE z.(x(property))
BETWEEN p.latpoint - (p.radius / p.distance_unit)
AND p.latpoint + (p.radius / p.distance_unit)
AND z.(y(property)
BETWEEN p.longpoint - (p.radius / (p.distance_unit * COS(RADIANS(p.latpoint))))
AND p.longpoint + (p.radius / (p.distance_unit * COS(RADIANS(p.latpoint))))
ORDER BY distance_in_km
LIMIT 15;
When I run this query, I get this error:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(x(property)))) * COS(RADIANS(p.longpoint) - RADIANS(z.(y(geo' at line 1
I also tried z.(GeomFromText(x.property)))
This is my table description:
+-----------+----------------+
| Field | Type |
+-----------+----------------+
| id | Int(10) |
| property | geometry |
+-----------+----------------+
select x(property) from mytable; -- gives me the latitude
select y(property) from mytable; -- gives me the longitude
Where am I going wrong?
Is this the right way to achieve this?
Please advise.
It seems to me that you are assuming that once you have selected z.id in the query, it gives you direct access to x(property) and y(property).
(Aside: do those names really have parentheses in them?)
So it looks to me like you should replace things like
* COS(RADIANS(z.(x(property))))
with something like
* COS(RADIANS((SELECT X(property) FROM mytable WHERE id = z.id)))
or, since the table alias belongs on the column inside the function rather than in front of the function, simply
* COS(RADIANS(X(z.property)))
However, on further thought, I think your mytable doesn't have the required structure. From looking at the link, I believe your mytable should have a structure more like:
+-----------+----------------+
| Field | Type |
+-----------+----------------+
| id | Int(10) |
| latitude | Float |
| longitude | Float |
+-----------+----------------+
So that you can do something like
* COS(RADIANS(z.latitude))
NOTE
The above was based on my not realizing that MySQL supports spatial data types (which I have no experience using).
Update
I just did some googling to understand the spatial types and found this:
How do you use MySQL spatial queries to find all records in X radius? [closed]
which suggests that you can't do what you want with spatial data types in MySQL. That brings you back to a non-optimal way of storing the data in mytable.
However in re-reading that link, the comments to the answer suggest that you may now be able to use spatial data types. (I told you I didn't have a clue here) This would mean replacing the query code with things like ST_Distance(g1,g2), which effectively means totally rewriting the example.
To put it another way
The example you gave presumes that spatial data types and testing of
geometries do not exist in MySQL. But now that they do exist, they
make this set of example code irrelevant, and you are in for a
world of hurt if you try to combine the two forms of analysis.
Update 2
There are three paths you can follow:
1. Deny that spatial data types exist in MySQL: use a table that has explicit columns for lat and long, and use the sample code as originally written on that blog.
2. Embrace the MySQL spatial data types (warts and all) and look at answers like https://stackoverflow.com/a/21231960/31326 that seem to do what you want directly with spatial data types, although, as noted in that answer, there are some caveats.
3. Use a spatial type to hold your data, and use a pre-query to extract lat and long before passing them into the original sample code.
Finally, I followed this link and ended up with this.
The query is not optimized yet, but it works well.
Here is my query:
select id,
( 3959 * acos( cos( radians(12.91841) ) * cos( radians( y(property) ) )
* cos( radians( x(property)) - radians(77.58631) )
+ sin( radians(12.91841) ) * sin( radians(y(property) ) ) ) ) AS distance
from mytable
having distance < 10
order by distance
limit 10;
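The final query is the spherical law of cosines with an Earth radius of 3959, so its distances are in miles (6371 would give kilometres). Below is a minimal Python sketch of the same computation, with one safeguard worth copying. For points that are identical or very close, rounding can push the ACOS argument just past 1.0. MySQL's ACOS() then returns NULL, and Python's math.acos raises ValueError, so the sketch clamps the argument first. The function names and the sample points are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # the constant used in the final query (miles); 6371.0 would give km


def law_of_cosines_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance via the spherical law of cosines, as in the final query."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    c = (math.cos(phi1) * math.cos(phi2) * math.cos(math.radians(lon2) - math.radians(lon1))
         + math.sin(phi1) * math.sin(phi2))
    # Rounding can push c just outside [-1, 1] for (nearly) identical points; clamp it.
    return EARTH_RADIUS_MILES * math.acos(max(-1.0, min(1.0, c)))


def nearest(points, lat, lon, radius_miles, limit=10):
    """Python analogue of HAVING distance < radius ORDER BY distance LIMIT n."""
    scored = [(pid, law_of_cosines_miles(lat, lon, plat, plon)) for pid, plat, plon in points]
    hits = [p for p in scored if p[1] < radius_miles]
    return sorted(hits, key=lambda p: p[1])[:limit]


if __name__ == "__main__":
    pts = [(1, 12.92, 77.59), (2, 13.5, 77.59)]
    print(nearest(pts, 12.91841, 77.58631, 10))
```

In SQL, the equivalent guard would wrap the ACOS argument in LEAST(1.0, ...).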

How can I speed up this MySQL query that finds the closest locations to a given latitude/longitude?

I have a zip code table in my database that is used together with a business table to find the businesses matching certain criteria that are closest to a specified zip code. The first thing I do is grab just the latitude and longitude, since they're used in a couple of places on the page. I use:
$zipResult = mysql_fetch_array(mysql_query("SELECT latitude,longitude FROM zipCodes WHERE zipCode='".mysql_real_escape_string($_SESSION['zip'])."' Limit 1"));
$latitude = $zipResult['latitude'];
$longitude = $zipResult['longitude'];
$radius = 100;
$lon1 = $longitude - $radius / abs(cos(deg2rad($latitude))*69);
$lon2 = $longitude + $radius / abs(cos(deg2rad($latitude))*69);
$lat1 = $latitude - ($radius/69);
$lat2 = $latitude + ($radius/69);
From there, I generate the query:
$query2 = "Select * From (SELECT business.*,zipCodes.longitude,zipCodes.latitude,
(3956 * 2 * ASIN ( SQRT (POWER(SIN((zipCodes.latitude - $latitude)*pi()/180 / 2),2) + COS(zipCodes.latitude* pi()/180) * COS($latitude *pi()/180) * POWER(SIN((zipCodes.longitude - $longitude) *pi()/180 / 2), 2) ) )) as distance FROM business INNER JOIN zipCodes ON (business.listZip = zipCodes.zipCode)
Where business.active = 1
And (3958*3.1415926*sqrt((zipCodes.latitude-$latitude)*(zipCodes.latitude-$latitude) + cos(zipCodes.latitude/57.29578)*cos($latitude/57.29578)*(zipCodes.longitude-$longitude)*(zipCodes.longitude-$longitude))/180) <= '$radius'
And zipCodes.longitude between $lon1 and $lon2 and zipCodes.latitude between $lat1 and $lat2
GROUP BY business.id ORDER BY distance) As temp Group By category_id ORDER BY distance LIMIT 18";
This produces something like:
Select *
From (SELECT business.*,zipCodes.longitude,zipCodes.latitude, (3956 * 2 * ASIN ( SQRT (POWER(SIN((zipCodes.latitude - 39.056784)*pi()/180 / 2),2) + COS(zipCodes.latitude* pi()/180) * COS(39.056784 *pi()/180) * POWER(SIN((zipCodes.longitude - -84.343573) *pi()/180 / 2), 2) ) )) as distance
FROM business
INNER JOIN zipCodes ON (business.listZip = zipCodes.zipCode)
Where business.active = 1
And (3958*3.1415926*sqrt((zipCodes.latitude-39.056784)*(zipCodes.latitude-39.056784) + cos(zipCodes.latitude/57.29578)*cos(39.056784/57.29578)*(zipCodes.longitude--84.343573)*(zipCodes.longitude--84.343573))/180) <= '100'
And zipCodes.longitude between -86.2099407074 and -82.4772052926
and zipCodes.latitude between 37.6075086377 and 40.5060593623
GROUP BY business.id
ORDER BY distance) As temp
Group By category_id
ORDER BY distance
LIMIT 18
The code runs and executes just fine, but it takes just over a second to complete (usually around 1.1 seconds). I've been told that the page loads slowly in some browsers. I have tested this in multiple browsers, and multiple versions of those browsers, without ever seeing an issue. Still, I figure that getting the execution time down will help either way. The problem is that I don't know what else I can do to cut it down. The zip code table came with preset indexes, which I assume are good (and which cover the columns I'm using in my queries). I've added indexes to the business table as well, though I'm not very knowledgeable about them. But I've made sure to include at least the fields used in the WHERE clause, and maybe a couple more.
If I need to add my indexes to this question just let me know. If you see something in the query itself I can improve also please let me know.
Thanks,
James
EDIT
Table structure for the business table:
CREATE TABLE IF NOT EXISTS `business` (
`id` smallint(6) unsigned NOT NULL AUTO_INCREMENT,
`active` tinyint(3) unsigned NOT NULL,
`featured` enum('yes','no') NOT NULL DEFAULT 'yes',
`topFeatured` tinyint(1) unsigned NOT NULL DEFAULT '0',
`category_id` smallint(5) NOT NULL DEFAULT '0',
`listZip` varchar(12) NOT NULL,
`name` tinytext NOT NULL,
`address` tinytext NOT NULL,
`city` varchar(128) NOT NULL,
`state` varchar(32) NOT NULL DEFAULT '',
`zip` varchar(12) NOT NULL,
`phone` tinytext NOT NULL,
`alt_phone` tinytext NOT NULL,
`website` tinytext NOT NULL,
`logo` tinytext NOT NULL,
`index_logo` tinytext NOT NULL,
`large_image` tinytext NOT NULL,
`description` text NOT NULL,
`views` int(5) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `featured` (`featured`,`topFeatured`,`category_id`,`listZip`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=3085 ;
SQL Fiddle
http://sqlfiddle.com/#!2/2e26ff/1
EDIT 2014-03-26 09:09
I've updated my query, but the shorter query actually takes about .2 seconds longer to execute every time.
Select * From (
SELECT Distinct business.id, business.name, business.large_image, business.logo, business.address, business.city, business.state, business.zip, business.phone, business.alt_phone, business.website, business.description, zipCodes.longitude, zipCodes.latitude, (3956 * 2 * ASIN ( SQRT (POWER(SIN((zipCodes.latitude - 39.056784)*pi()/180 / 2),2) + COS(zipCodes.latitude* pi()/180) * COS(39.056784 *pi()/180) * POWER(SIN((zipCodes.longitude - -84.343573) *pi()/180 / 2), 2) ) )) as distance
FROM business
INNER JOIN zipCodes ON (business.listZip = zipCodes.zipCode)
Where business.active = 1
And zipCodes.longitude between -86.2099407074 and -82.4772052926
And zipCodes.latitude between 37.6075086377 and 40.5060593623
GROUP BY business.category_id
HAVING distance <= '50'
ORDER BY distance
) As temp LIMIT 18
There is already an index on the zip code, latitude, and longitude fields in the zip codes table: one composite index over all three, plus a separate index on each. That's just how the table came when purchased.
Yesterday I changed the listZip data type to match the data type of the zip code table's zip column.
I did take out the GROUP BY business.id and replace it with DISTINCT, but left the GROUP BY business.category_id because I only want one business per category.
Also, the 0.2 second slowdown started as soon as I changed the query to use the HAVING clause instead of the math formula in the WHERE clause. I did try using WHERE distance <= 50 in the outer query, but that didn't speed anything up either. Using 50 miles instead of 100 miles doesn't seem to affect this particular query either.
Thanks for all of the suggestions so far though.
Put indexes on zipCodes.longitude and zipCodes.latitude. That should help a lot.
See here for more information. http://www.plumislandmedia.net/mysql/haversine-mysql-nearest-loc/
Edit: you need an index in the zipCodes table on longitude alone, or one starting with longitude. It looks to me like you should try a composite index on
(longitude, latitude, zipCode)
for best results.
Make the data types of zipCodes.zipCode and business.listZip the same, so the join will be more efficient. If those data types differ, MySQL has to convert one to the other as it does the join, which makes the join inefficient. Make sure business.listZip has an index.
You are misusing GROUP BY. (Did you maybe mean SELECT DISTINCT?) It makes no sense unless you also use an aggregate function like MAX(). In a similar vein, see if you can get rid of the * in SELECT business.* and instead list only the columns you need.
100 miles is a very wide search radius. Narrow it a bit to speed things up.
You're computing the great-circle distance twice. You can surely recast the query to compute it once.
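A sketch of the last two suggestions (a bounding box plus computing the distance only once), again in Python with sqlite3 as a stand-in for MySQL. The distance is computed once in a derived table and filtered in the outer query. The search point and box are passed as bound parameters instead of being interpolated into the SQL string. Table and column names follow the question, and the composite index follows the answer. The haversine_miles helper, the made-up zip codes and rows, and the omission of the one-business-per-category grouping are simplifications for the sketch.

```python
import math
import sqlite3

QUERY = """
SELECT id, name, distance
  FROM (SELECT b.id, b.name,
               haversine_miles(:lat, :lon, z.latitude, z.longitude) AS distance
          FROM business AS b
          JOIN zipCodes AS z ON z.zipCode = b.listZip
         WHERE b.active = 1
           AND z.longitude BETWEEN :lon1 AND :lon2
           AND z.latitude BETWEEN :lat1 AND :lat2) AS candidates
 WHERE distance <= :radius
 ORDER BY distance
 LIMIT 18
"""


def haversine_miles(lat1, lon1, lat2, lon2):
    # great-circle distance with the question's 3956-mile Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 3956 * math.asin(math.sqrt(min(1.0, a)))


def find_businesses(conn, lat, lon, radius):
    conn.create_function("haversine_miles", 4, haversine_miles)
    dlat = radius / 69.0  # same box as the PHP code in the question
    dlon = radius / abs(math.cos(math.radians(lat)) * 69.0)
    params = {"lat": lat, "lon": lon, "radius": radius,
              "lat1": lat - dlat, "lat2": lat + dlat,
              "lon1": lon - dlon, "lon2": lon + dlon}
    return conn.execute(QUERY, params).fetchall()


def make_demo_db():
    """In-memory tables with the question's column names and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE zipCodes (zipCode TEXT PRIMARY KEY, latitude REAL, longitude REAL);
        CREATE INDEX ix_zip_lon_lat ON zipCodes (longitude, latitude, zipCode);
        CREATE TABLE business (id INTEGER PRIMARY KEY, name TEXT, active INTEGER, listZip TEXT);
        CREATE INDEX ix_business_listzip ON business (listZip);
        INSERT INTO zipCodes VALUES ('11111', 39.056784, -84.343573),
                                    ('22222', 39.107, -84.504),
                                    ('33333', 40.75, -73.99);
        INSERT INTO business VALUES (1, 'A', 1, '11111'), (2, 'B', 1, '22222'),
                                    (3, 'C', 0, '11111'), (4, 'D', 1, '33333');
    """)
    return conn


if __name__ == "__main__":
    print(find_businesses(make_demo_db(), 39.056784, -84.343573, 50))
```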

Different result for Haversine formulas

I am using MySQL to compute proximity, and for that I have created a procedure named distance (shown below). That procedure is not working properly, but the plain SQL statement is. What is the difference, given that I believe both are haversine formulas? I really don't know what I'm missing in formula one.
The structure of my table is as follows.
For formula one:
id varchar(100)
userid varchar(100)
username varchar(100)
currLoc point
radius int(10)
For formula two:
id varchar(30)
userid varchar(30)
username varchar(40)
lat float(10,6)
lan float(10,6)
radius varchar(100)
Formula One: reference
SQL statement that calls the distance function:
SELECT userid, username, distance(userstatus.currLoc,
GeomFromText('POINT(23.039574 72.56602)')) AS cdist
FROM userstatus HAVING cdist <= 0.6 ORDER BY cdist LIMIT 10
RETURN 6371 * 2 *
ASIN( SQRT(POWER(SIN(RADIANS(ABS(X(a)) - ABS(X(b)))), 2) +
COS(RADIANS(ABS(X(a)))) * COS(RADIANS(ABS(X(b)))) *
POWER(SIN(RADIANS(Y(a) - Y(b))), 2)));
Formula two: reference
SELECT *,(((acos(sin((23.039574*pi()/180)) *
sin((lat *pi()/180))+cos((23.039574*pi()/180)) *
cos((lat *pi()/180)) * cos(((72.56602- lon)*pi()/180))))*
180/pi())*60*1.1515*1.609344) as distance
FROM status HAVING distance <= 0.6
Here, 0.6 is the radius in kilometers.
One version of the expression is using ABS(X(a)) etc and the other is not. The one using ABS is suspect. You can't afford to ignore the sign on the angles. You'll get different results in some areas of the world (near the equator or the prime meridian, for example, or near the poles).
Your constants are also different.
60*1.1515*1.609344
vs
6371 * 2
One expression involves SQRT, the other does not.
One expression involves ASIN and the other uses ACOS.
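There is essentially nothing in common between the two. Formula one is also missing the halving that the haversine formula needs: both squared terms should be SIN(RADIANS(...) / 2), not SIN(RADIANS(...)).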
See the discussion at Wikipedia 'Haversine Formula', and in particular the references to numerical stability when the distance between the points is small.
You could also improve the chances of people helping you by making the formulae you're using semi-readable, by splitting them over lines.
For example:
RETURN 6371 * 2 *
ASIN( SQRT(POWER(SIN(RADIANS(ABS(X(a)) - ABS(X(b)))), 2) +
COS(RADIANS(ABS(X(a)))) * COS(RADIANS(ABS(X(b)))) *
POWER(SIN(RADIANS(Y(a) - Y(b))), 2)));
And:
(((acos(sin((23.039574*pi()/180)) * sin((lat *pi()/180)) +
cos((23.039574*pi()/180)) * cos((lat *pi()/180)) *
cos(((72.56602-lan)*pi()/180))
)
) * 180/pi()) * 60 * 1.1515 * 1.609344)
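The latter references 'lan'; is that meant to be 'lon'? In the second example, one of the two positions is hard-coded as 23.039574°N, 72.56602°E (assuming positive longitudes are east; because cosine is an even function, writing 72.56602 - lan rather than lan - 72.56602 makes no difference), while lat and lan come from the table in the SQL query.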
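To make the differences concrete, here is a Python sketch that reproduces formula one as written (ABS() on the latitudes and no halving inside SIN()), formula two, and a textbook haversine for comparison. It assumes, as in the question, that X() holds the latitude and Y() the longitude, and all three functions return kilometres. The function names are hypothetical. The only change from the original expressions is clamping before ASIN()/ACOS(), so the sketch doesn't raise an error where MySQL would return NULL.

```python
import math

R_KM = 6371.0  # mean Earth radius, as in formula one


def formula_one_as_written(lat_a, lon_a, lat_b, lon_b):
    # mirrors the stored function: ABS() on latitudes and no "/ 2" inside SIN()
    s = (math.sin(math.radians(abs(lat_a) - abs(lat_b))) ** 2
         + math.cos(math.radians(abs(lat_a))) * math.cos(math.radians(abs(lat_b)))
         * math.sin(math.radians(lon_a - lon_b)) ** 2)
    return R_KM * 2 * math.asin(math.sqrt(min(1.0, s)))


def formula_two(lat1, lon1, lat2, lon2):
    # spherical law of cosines; degrees -> nautical miles -> statute miles -> km
    c = (math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.cos(math.radians(lon1 - lon2)))
    return math.degrees(math.acos(max(-1.0, min(1.0, c)))) * 60 * 1.1515 * 1.609344


def haversine_km(lat1, lon1, lat2, lon2):
    # textbook haversine: note the halved angles inside sin()
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * R_KM * math.asin(math.sqrt(min(1.0, a)))


if __name__ == "__main__":
    # two points straddling the equator, 2 degrees of latitude apart
    for f in (formula_one_as_written, formula_two, haversine_km):
        print(f.__name__, f(1.0, 0.0, -1.0, 0.0))
```

Across the equator, formula one's ABS() discards the latitude difference entirely. For small separations, the missing halving makes its distances come out about twice as large as the haversine result, which would explain mismatches at a 0.6 km radius.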