I have the following MySQL query which is running on a table with around 50,000 records. The query returns records within a 20-mile radius, and I'm using a bounding box in the WHERE clause to narrow down the records. The query is sorted by distance and limited to 10 records, as it will be used on a paginated page.
The query is currently taking 0.0210 seconds to complete on average, but because the website is so busy I am looking for ways to improve this.
The adverts table has around 20 columns in it and has an index on the longitude and latitude columns.
Can anyone see any way to improve the performance of this query? I was thinking about creating a separate table which just has the advert_id, longitude and latitude fields, but I was wondering if anyone had any other suggestions or ways to improve the query below.
SELECT adverts.advert_id,
       ROUND(SQRT((((adverts.latitude - '52.536320') *
                    (adverts.latitude - '52.536320')) * 69.1 * 69.1) +
                  ((adverts.longitude - '-2.063380') *
                   (adverts.longitude - '-2.063380') * 53 * 53)),
             1) AS distance
FROM adverts
WHERE (adverts.latitude BETWEEN 52.2471737281 AND 52.8254662719)
  AND (adverts.longitude BETWEEN -2.53875093307 AND -1.58800906693)
HAVING (distance <= 20)
ORDER BY distance ASC
LIMIT 10
You have to use spatial data types and spatial indexes.
In particular, use the POINT data type to store both latitude and longitude in a single column, then add a spatial index to that column.
The spatial index is usually implemented as an R-tree (or a variant), so the cost of searching all points in a given area is logarithmic.
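As a minimal sketch (assuming MySQL 5.7+ with InnoDB, since earlier versions only support spatial indexes on MyISAM; the location column and index name here are illustrative, not from the question):
-- Add a POINT column, fill it from the existing coordinate columns,
-- then make it NOT NULL (required for a SPATIAL index) and index it.
ALTER TABLE adverts ADD COLUMN location POINT;
UPDATE adverts SET location = POINT(longitude, latitude);
ALTER TABLE adverts MODIFY location POINT NOT NULL;
ALTER TABLE adverts ADD SPATIAL INDEX idx_location (location);
A radius search can then pre-filter with MBRContains() against a bounding rectangle before computing exact distances, and the R-tree index will serve that containment test.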
I'm not particularly knowledgeable about MySQL queries and optimising them, so I need a bit of help on this one. I'm checking a table of international cities to find the 10 nearest cities based on the longitude and latitude values in the table.
The query I'm using for this is as follows:
SELECT City as city,
SQRT(POW(69.1 * (Latitude - 51.5073509), 2) +
POW(69.1 * (-0.1277583 - Longitude) * COS(Latitude / 57.3), 2)) AS distance
from `cities`
group by `City`
having distance < 50
order by `distance` asc
limit 10
(The longitude & latitude values are obviously placed dynamically in my code)
Sometimes this can take around 3-4 minutes on my development environment to complete.
Have I made any classic mistakes here, or is there a much better query I should be using to retrieve this data?
Any help would be greatly appreciated.
Assuming City is unique and you are abusing GROUP BY and HAVING in order to get cleaner code:
SELECT City as city,
SQRT(POW(69.1 * (Latitude - 51.5073509), 2) +
POW(69.1 * (-0.1277583 - Longitude) * COS(Latitude / 57.3), 2)) AS distance
from `cities`
where SQRT(POW(69.1 * (Latitude - 51.5073509), 2) +
POW(69.1 * (-0.1277583 - Longitude) * COS(Latitude / 57.3), 2)) < 50
order by `distance` asc
limit 10
If City is unique, then the aggregation is done on single rows.
MySQL uses a sort operation to implement GROUP BY.
Sort complexity is O(n*log(n)), so without indexes that is also the complexity of the GROUP BY.
If City is not unique, then the filtering in the HAVING clause is done on one arbitrary row per group, which is surely not what the OP intended.
The case where HAVING and WHERE are both viable for filtering, and HAVING has a performance advantage, is where the filter is on the aggregated column, the calculations are heavy, and the GROUP BY operation significantly reduces the number of rows:
select x,... from ... group by x having ... some heavy calculations on x ...
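For illustration, this is the pattern where HAVING genuinely earns its keep (table and column names below are hypothetical): the filter references an aggregate, so it cannot be moved into WHERE, and it is evaluated once per group rather than once per row:
-- HAVING filters on the aggregated value, which WHERE cannot do.
SELECT region, AVG(price) AS avg_price
FROM listings
GROUP BY region
HAVING AVG(price) > 100000;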
This is a follow-up to my previous post How to improve wind data SQL query performance.
I have expanded the SQL statement to also perform the first part in the calculation of the average wind direction using circular statistics. This means that I want to calculate the average of the cosines and sines of the wind direction. In my PHP script, I will then perform the second part and calculate the inverse tangent and add 180 or 360 degrees if necessary.
The wind direction is stored in my table as voltages read from the sensor in the field 'dirvolt' so I first need to convert it to radians.
The user can look at historical wind data by stepping backwards using a pagination function, hence the use of LIMIT, whose values are set dynamically in my PHP script.
My SQL statement currently looks like this:
SELECT ROUND(AVG(speed),1) AS speed_mean, MAX(speed) as speed_max,
MIN(speed) AS speed_min, MAX(dt) AS last_dt,
AVG(SIN(2.04*dirvolt-0.12)) as dir_sin_mean,
AVG(COS(2.04*dirvolt-0.12)) as dir_cos_mean
FROM table
GROUP BY FLOOR(UNIX_TIMESTAMP(dt) / 300)
ORDER BY FLOOR(UNIX_TIMESTAMP(dt) / 300) DESC
LIMIT 0, 72
The query takes about 3-8 seconds to run depending on what value I use to group the data (300 in the code above).
In order for me to learn, is there anything else I can do to optimize or improve the SQL statement?
Please provide:
SHOW CREATE TABLE table;
From that I can see whether you already have INDEX(dt) (or equivalent). With that, we can modify the SELECT to be significantly faster.
But first, change the focus from "72*300 seconds' worth of readings" to datetime ranges, which is 6(?) hours.
Let's look at this query:
SELECT * FROM table
WHERE dt >= '...' - INTERVAL 6 HOUR
AND dt < '...';
The '...' would be the same datetime in both places. Does that run fast enough with the index?
If yes, then let's build the final query using that as a subquery:
SELECT FORMAT(AVG(speed), 1) AS speed_mean,
MAX(speed) as speed_max,
MIN(speed) AS speed_min,
MAX(dt) AS last_dt,
AVG(SIN(2.04*dirvolt-0.12)) as dir_sin_mean,
AVG(COS(2.04*dirvolt-0.12)) as dir_cos_mean
FROM
( SELECT * FROM table
WHERE dt >= '...' - INTERVAL 6 HOUR
AND dt < '...'
) AS x
GROUP BY FLOOR(UNIX_TIMESTAMP(dt) / 300)
ORDER BY FLOOR(UNIX_TIMESTAMP(dt) / 300) DESC;
Explanation: What you had could not use an index, hence had to scan the entire table (which is getting bigger and bigger). My subquery could use an index, hence was much faster. The effort for my outer query was not "too bad" since it worked with only N rows.
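If the index turns out to be missing, adding it is straightforward (the dt column is from the post; the index name is illustrative):
-- A secondary index on dt lets the subquery do a range scan that
-- touches only the rows inside the requested 6-hour window.
ALTER TABLE `table` ADD INDEX idx_dt (dt);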
I'm using CodeIgniter 2, and in my database model I have a query that joins two tables and filters rows based upon distance from a given geolocation.
SELECT users.id,
(3959 * acos(cos(radians(42.327612)) *
cos(radians(last_seen.lat)) * cos(radians(last_seen.lon) -
radians(-77.661591)) + sin(radians(42.327612)) *
sin(radians(last_seen.lat)))) AS distance
FROM users
JOIN last_seen ON users.id = last_seen.seen_id
WHERE users.age >= 18 AND users.age <= 30
HAVING distance < 50
I'm not sure if it's the distance that is making this query take especially long. I do have over 300,000 rows in my users table. The same amount in my last_seen table. I'm sure that plays a role.
But, the age column in the users table is indexed along with the id column.
The lat and lon columns in the last_seen table are also indexed.
Does anyone have ideas as to why this query takes so long and how I can improve it?
UPDATE
It turns out that this query actually runs pretty quickly. When I execute this query in phpMyAdmin, it takes 0.56 seconds. Not too bad. But when I try to execute this query with a third-party SQL client like Sequel Pro, it takes at least 20 seconds, and all of the other apps on my Mac slow down. When the query is executed by loading the script via jQuery's load() method, it takes around the same amount of time.
Upon viewing the network tab in Google Chrome's developer tools, it seems that what's taking so long is the TTFB, or Time To First Byte. It's taking forever.
To make this query faster, you need to limit the number of rows using an index before actually calculating the distance on each and every one of them. To do so, you can limit the rows from last_seen based on their lat/lon and a rough formula for the desired distance.
The idea is that points at the same latitude as the reference point are within 50 miles only if their longitude falls within a certain distance of the reference longitude, and vice versa.
For a 50-mile distance, RefLat ± 1 and RefLon ± 1 would be a good start to limit the rows before calculating the precise distance:
last_seen.lat BETWEEN 42.327612 - 1 AND 42.327612 + 1
AND last_seen.lon BETWEEN -77.661591 - 1 AND -77.661591 + 1
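Putting it together, the query might look like this; it is simply the original query with the rough bounding-box predicates added, so the indexes on lat/lon can prune rows before the trigonometry runs:
SELECT users.id,
       (3959 * ACOS(COS(RADIANS(42.327612)) * COS(RADIANS(last_seen.lat)) *
                    COS(RADIANS(last_seen.lon) - RADIANS(-77.661591)) +
                    SIN(RADIANS(42.327612)) * SIN(RADIANS(last_seen.lat)))) AS distance
FROM users
JOIN last_seen ON users.id = last_seen.seen_id
WHERE users.age >= 18 AND users.age <= 30
  -- rough pre-filter: cheap, index-friendly range conditions
  AND last_seen.lat BETWEEN 42.327612 - 1 AND 42.327612 + 1
  AND last_seen.lon BETWEEN -77.661591 - 1 AND -77.661591 + 1
HAVING distance < 50;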
For this query:
SELECT users.id,
       (3959 * acos(cos(radians(42.327612)) * cos(radians(last_seen.lat)) *
                    cos(radians(last_seen.lon) - radians(-77.661591)) +
                    sin(radians(42.327612)) * sin(radians(last_seen.lat)))) AS distance
FROM users
JOIN last_seen ON users.id = last_seen.seen_id
WHERE users.age >= 18 AND users.age <= 30
HAVING distance < 50;
The best index is users(age, id) and last_seen(seen_id). Unfortunately, the distance calculations are going to take a while, because they have to be calculated for every row. You might want to consider a GIS extension to MySQL to help with this type of query.
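As a sketch, those indexes could be created like so (the index names are illustrative):
CREATE INDEX idx_users_age_id ON users (age, id);
CREATE INDEX idx_last_seen_seen_id ON last_seen (seen_id);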
I am trying to get the location nearest to a user's input from within a database (the nearest store based on latitude and longitude). Based on the user's postcode, I convert it to latitude and longitude, and from these results I need to search my database to find the store nearest to those values. I have the latitude and longitude of all stores saved, and so far (from looking at previous questions) I have tried something like:
SELECT *
FROM mystore_table
WHERE `latitude` >=(51.5263472 * .9) AND `longitude` <=(-0.3830181 * 1.1)
ORDER BY abs(latitude - 51.5263472 AND longitude - -0.3830181) limit 1;
When I run this query, it does display a result, but it is not the nearest store. Could it be something to do with the negative numbers? Both my latitude and longitude columns are saved as DECIMAL data types.
You have a logic operation in the order by rather than an arithmetic one. Try this:
SELECT *
FROM mystore_table
WHERE `latitude` >=(51.5263472 * .9) AND `longitude` <=(-0.3830181 * 1.1)
ORDER BY abs(latitude - 51.5263472) + abs(longitude - -0.3830181)
limit 1;
The AND in your original version produces a boolean value, either 0 or 1; it would be 1 for almost every row, dropping to 0 only when a coordinate matches the reference value exactly to the last decimal place. Not very useful for ordering.
There are many reasons why this does not give the true nearest distance, but it might be close enough for your purposes. Here are some of them:
Euclidean distance would take the squares of the differences, not the absolute values.
The distance between two longitudes depends on the latitude (one degree of longitude spans about 69 miles at the equator and shrinks to 0 at the poles).
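A sketch addressing both points, assuming the value is only used for ordering (so the final square root can be skipped, since it does not change the order):
SELECT *
FROM mystore_table
WHERE `latitude` >= (51.5263472 * .9) AND `longitude` <= (-0.3830181 * 1.1)
-- squared Euclidean distance, with the longitude axis scaled by the
-- cosine of the reference latitude to correct for meridian convergence
ORDER BY POW(latitude - 51.5263472, 2) +
         POW((longitude - -0.3830181) * COS(RADIANS(51.5263472)), 2)
LIMIT 1;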
I'm trying to get 100 points from my table with a lowest distance to a given point.
I'm using
SELECT *, GLENGTH(
LINESTRINGFROMWKB(
LINESTRING(
ASBINARY(
POINTFROMTEXT("POINT(40.4495 -79.988)")
),
ASBINARY(pt)
)
)
)
AS `distance` FROM `ip_group_city` ORDER BY distance LIMIT 100
(Yeah, that's painful. I've just googled it. I have no idea how to measure distance in MySQL correctly)
It takes a very long time to execute. EXPLAIN says that there are no possible_keys.
I created a SPATIAL index on the pt column:
CREATE SPATIAL INDEX sp_index ON ip_group_city (pt);
Though I don't really know how to use it correctly. Can you please help me?
Because you don't have a WHERE clause, no index can be used. I think you should improve this query by using the MBR_ functions (MySQL 5.0 or later) or the ST_ functions (MySQL 5.6 or later).
Something like:
SELECT *, GLENGTH(
           LINESTRINGFROMWKB(
             LINESTRING(
               ASBINARY(
                 POINTFROMTEXT("POINT(40.4495 -79.988)")
               ),
               ASBINARY(pt)
             )
           )
         ) AS `distance`
FROM `ip_group_city`
WHERE
    MBRWithin(
        pt,  -- your point
        -- bounding rectangle (as WKT) around the target point; note the
        -- comments must stay outside the string literal, or they would
        -- corrupt the WKT. Corners run NE, NE-lat/SW-lng, SW,
        -- SW-lat/NE-lng, and back to NE to close the ring.
        GeomFromText('Polygon((
            #{bound.ne.lat} #{bound.ne.lng},
            #{bound.ne.lat} #{bound.sw.lng},
            #{bound.sw.lat} #{bound.sw.lng},
            #{bound.sw.lat} #{bound.ne.lng},
            #{bound.ne.lat} #{bound.ne.lng}
        ))')
    )
ORDER BY distance LIMIT 100
I've used the great circle equation to do these types of calculations in the past. I'm not sure how the performance compares, but it might be worth trying it and comparing.
Here is a good SO post that goes over how to do it in MySQL.
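For reference, a sketch of the great-circle (spherical law of cosines) formula applied to the table above, assuming X(pt) holds the latitude and Y(pt) the longitude, matching the POINT literal in the question; 3959 is roughly the Earth's radius in miles:
SELECT *,
       3959 * ACOS(
           COS(RADIANS(40.4495)) * COS(RADIANS(X(pt))) *
           COS(RADIANS(Y(pt)) - RADIANS(-79.988)) +
           SIN(RADIANS(40.4495)) * SIN(RADIANS(X(pt)))
       ) AS distance
FROM ip_group_city
ORDER BY distance
LIMIT 100;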
Have a look at these questions:
Finding Cities within 'X' Kilometers (or Miles)