Optimize calculations in this query? - mysql

What's the best way to optimize this query?
$tripsNearLocation = mysqli_query($con,
"SELECT * FROM (
SELECT *
, ( 3959 * acos( cos(" . $latRad . ")
* cos( radians( startingLatitude ) )
* cos( radians( startingLongitude )
- (" . $longRad . ") )
+ sin(" . $latRad . ")
* sin( radians( startingLatitude ) ) ) )
AS distance FROM trips
) as query
WHERE distance < 10
ORDER BY distance LIMIT 0 , 10;");
With 50,000 rows it takes a second or two to finish. Should I add a preliminary filter that eliminates all rows that aren't even in the "close range" of the input coordinates, then run the distance calculation on the remaining rows? Say, if the input latitude is 67, eliminate all rows whose latitude isn't between 65 and 69.
Or add a "state" column and exclude rows from the calculation if they aren't in the same state?
Or just live with the two seconds of calculation? I'm worried the database may grow past 100,000 rows and the query will take too long to execute.

Plan A: For 100K rows, you might get away with just narrowing down by latitude. That is,
calculate the degrees latitude that corresponds to "10" units of distance
Have INDEX(startingLatitude)
Add a condition to the WHERE clause limiting startingLatitude to plus/minus that many degrees, e.g. AND startingLatitude BETWEEN 65 AND 69 for your example.
If you are thinking about using INDEX(lat, lng), it is not as simple. See if Lat is good enough.
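To make Plan A's step 1 concrete, here is a minimal sketch (mine, not from the answer) of converting a search radius in miles into the latitude bounds for the BETWEEN clause; the 69 miles-per-degree constant is a rule-of-thumb approximation:

```python
# One degree of latitude spans roughly 69 miles anywhere on Earth, so a
# radius in miles converts directly to a latitude half-width in degrees.
MILES_PER_DEG_LAT = 69.0

def lat_band(center_lat, radius_miles):
    """Return (low, high) latitude bounds for Plan A's BETWEEN clause."""
    delta = radius_miles / MILES_PER_DEG_LAT
    return center_lat - delta, center_lat + delta

low, high = lat_band(67.0, 10.0)
# Use as: WHERE startingLatitude BETWEEN low AND high, with INDEX(startingLatitude)
```

For a 10-mile radius this gives a band only about 0.29 degrees wide, far tighter than the 65-69 guess in the question, so the index prunes many more rows.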
Plan B: Next choice will involve lat and lng, plus a subquery. And version 5.6 would be beneficial. It's something like this (after including INDEX(lat, lng, id)):
SELECT ... FROM (
SELECT id FROM tbl
WHERE lat BETWEEN...
AND lng BETWEEN... ) x
JOIN tbl USING (id)
WHERE ...;
For various reasons, Plan B is only slightly better than Plan A.
Plan C: If you are going to need millions of rows, you will need my pizza parlor algorithm. This involves a Stored Procedure to repeatedly probe, looking for enough rows. It also involves PARTITIONing to get a crude 2D index.
Plans A and B are O(sqrt(N)); Plan C is O(1). That is, for Plans A and B, if you quadruple the number of rows, you double the time taken. Plan C does not get slower. (It sounded like your code is O(N) -- double the rows = double the time.)

This is how I ended up solving it, in case people need to reference it in the future.
$tripsNearLocation = mysqli_query($con, "SELECT * FROM (
SELECT *, (3959 * acos(cos(" . $latRad . ") * cos(radians(startingLatitude))
* cos(radians(startingLongitude) - (" . $longRad . ")) + sin(" . $latRad . ")
* sin(radians(startingLatitude)))) AS distance FROM (
SELECT * FROM trips_test WHERE startingLatitude BETWEEN " .
($locationLatitude - 1) . " AND " . ($locationLatitude + 1) . ") as query1)
as query2 WHERE distance < 10 ORDER BY distance LIMIT 0 , 10;");
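The `3959 * acos(...)` expression in this query is the spherical law of cosines. As a quick sanity check of the SQL math outside MySQL, the same formula can be mirrored with both points in degrees (a hypothetical helper, not part of the original solution; note also that interpolating `$latRad` and friends directly into the SQL string invites injection, so prepared statements are worth considering):

```python
import math

def sloc_miles(lat1_deg, lon1_deg, lat2_deg, lon2_deg):
    """Spherical law of cosines, mirroring the SQL (3959 = Earth radius in miles)."""
    lat1, lon1 = math.radians(lat1_deg), math.radians(lon1_deg)
    lat2, lon2 = math.radians(lat2_deg), math.radians(lon2_deg)
    return 3959 * math.acos(math.cos(lat1) * math.cos(lat2) * math.cos(lon2 - lon1)
                            + math.sin(lat1) * math.sin(lat2))
```

Two points 0.1 degrees of latitude apart come out at roughly 6.9 miles, which agrees with the ~69 miles-per-degree rule of thumb behind the BETWEEN prefilter.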
Although I will accept Rick James' answer as he helped me get to this solution.

Related

Slow performance when running repeated complex spatial query in MySQL

I'm searching for a circle select by distance. I have one point with latitude & longitude, and I want to check whether the database has some points around me. And yes, it must be a circle!
I'm using this clause in the query (I just googled it; I can't do the math):
((6373 * acos (cos ( radians( 48.568962 ) ) * cos( radians( X(coords) ) ) * cos( radians( Y(coords) ) - radians( 6.821352 ) ) + sin ( radians( 48.568962 ) ) * sin( radians( X(coords) ) ))) <='0.2')
0.2 = 200 meters
I'm using POINT data type
Yes, I have SPATIAL index on it
Yes, I'm trying to use the "spatial" functions, but they don't return a circle; they return an OVAL, and I need a PRECISE circle
This "circle" clause takes a very, very, VERY long time on every table. When I use the OVAL method with the SPATIAL functions, it takes maybe 0.1 s and that's great! But I need a circle, and this takes 17 seconds.
Can anyone help me? Thanks a lot, guys!
EDIT: spatial functions means some like this:
WHERE ST_Contains(ST_Buffer(
ST_GeomFromText('POINT(12.3456 34.5678)'), (0.00001*1000)) , coords) <= 1 /* 1 km */
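For what it's worth, the oval is likely a consequence of buffering in raw degree coordinates: a shape that is circular in degrees is elliptical in metres, because a degree of longitude shrinks with the cosine of the latitude while a degree of latitude stays roughly constant. A quick illustration (approximate constants, my own sketch):

```python
import math

M_PER_DEG_LAT = 111_320.0  # metres per degree of latitude (roughly constant)

def m_per_deg_lon(lat_deg):
    """Metres per degree of longitude shrink with the cosine of the latitude."""
    return M_PER_DEG_LAT * math.cos(math.radians(lat_deg))

# At the latitude in the question (~48.57 N), a degree of longitude is only
# about two thirds as long as a degree of latitude -- hence the oval.
ratio = m_per_deg_lon(48.568962) / M_PER_DEG_LAT
```

So a degree-based buffer that looks round on a map grid is stretched roughly 1.5:1 on the ground at that latitude.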
EDIT 2 (table struct.):
I'm expecting 10 rows from these tables, and of course I have indexes on wz_uuid:
select a....., b.... from table_1 a left join table_2 b on a.wz_uuid=b.wz_uuid
And this is not just 2 tables; I have 11 tables x 2 like this (weekly database backups). The first tables (_1) have 0-4000 rows; tables 2-11 have 300k+ rows.
All indexes are relevant and also data types & encoding.
wz_uuid & id - unique, btree index
others - btree indexes
coords - spatial index
Great solution: from XX sec to 100 ms, and that's all I wanted :-)
Use MySQL spatial extensions to select points inside circle

Mysql - distance parameter search finding all targets within distance of central location using individual preference

This is a tough one to explain. I'm able to find all zipcodes within a radius of x miles. However, what I want is to find all user IDs from tblUsers whose distance from a given zipcode is within their own MaxDistance.
So in plain English: I want to know all the people who are within a zipcode radius, based on each person's MaxDistance.
For example I have a table:
tblUsers(ID int, Maxdistance int,Zipcode varchar(5))
1|50|94129
2|25|94111
3|100|19019
In my second table:
tblTmpPlaces(ID int,Zipcode varchar(5))
1|94129
What I want to do, using the tblTmpPlaces zipcode, is to be able to say: users 1 and 2 are within their max distance, so select them. User 3's max distance is 100, which is not close enough to the tblTmpPlaces zipcode of 94129. 94129 is San Francisco and 19019 is Philadelphia; the user is more than 100 miles from San Francisco.
This is what I've been using to get the distance, but it uses a central location to find everything within an area and doesn't take MaxDistance into consideration. Any help is appreciated.
So basically, select ID from tblUsers where... and this is the part I'm stumbling on:
SELECT Zipcode
FROM tblZipcodes
WHERE ( 3959 * acos( cos( radians( #XLocationParam ) )
                   * cos( radians( x(location) ) )
                   * cos( radians( y(location) ) - radians( #YLocationParam ) )
                   + sin( radians( #XLocationParam ) )
                   * sin( radians( x(location) ) ) ) ) <= 30
It really looks like you need the latitude and longitude for the "center" of each zipcode. Without that, MySQL can't calculate the distance between the zip codes.
tblZipcodeLatLong
( Zipcode varchar(5)
, latitude decimal(7,4)
, longitude decimal(7,4)
)
Then you could calculate the distance between all of the Zipcodes, using your Great Circle Distance (GCD) formula.
For performance, though, you'll likely not want to do that in each individual query, but rather, you'd want to pre-calculate the distance between all the Zipcodes, and have those calculated distances stored in a table.
SELECT p1.Zipcode AS p1_Zipcode
, p2.Zipcode AS p2_Zipcode
, <gcd_formula> AS distance
FROM tblZipcodeLatLong p1
CROSS
JOIN tblZipcodeLatLong p2
Where <gcd_formula> represents your great circle distance formula that calculates the distance between each pair of zipcodes.
A query of this form would return the result set you are looking for:
SELECT u.*, p.*
FROM tblTmpPlaces p
JOIN (
SELECT p1.Zipcode AS p_Zipcode
, p2.Zipcode AS u_Zipcode
, <gcd_formula> AS distance
FROM tblZipcodeLatLong p1
CROSS
JOIN tblZipcodeLatLong p2
) d
ON d.p_Zipcode = p.Zipcode
JOIN tblUsers u
ON u.Zipcode = d.u_Zipcode
AND u.Maxdistance >= d.distance
WHERE p.Zipcode = '94129'
As I noted before, doing that cross join operation and calculating all those distances in that subquery (aliased as d) on each query could be quite a bit of overhead. For performance, you'd likely want those results pre-calculated, stored in an appropriately indexed table, and then replace that subquery with a reference to the pre-populated table.
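As a miniature sketch of that precalculation idea (in-memory, with made-up centroid coordinates; in practice you would INSERT the results into an indexed table rather than hold them in a dict):

```python
import math
from itertools import product

def gcd_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine; 3959 = Earth radius in miles)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 3959 * math.asin(math.sqrt(a))

# Hypothetical zipcode -> (lat, lon) centroids; real values would come from a
# tblZipcodeLatLong-style table.
centroids = {"94129": (37.7983, -122.4660),   # San Francisco
             "94111": (37.7989, -122.3984),   # San Francisco
             "19019": (40.0018, -75.1180)}    # Philadelphia

# The CROSS JOIN, done once: every pairwise distance, keyed for cheap lookup.
pairwise = {(a, b): gcd_miles(*centroids[a], *centroids[b])
            for a, b in product(centroids, repeat=2)}
```

With these illustrative numbers, the 94129-94111 distance is a few miles while 94129-19019 is well over 2,000, matching the question's expectation that users 1 and 2 are in range and user 3 is not.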
NOTE:
I have a GCD formula in one of the other answers I posted here on stackoverflow a while back. I'll see if I can find it.
Similar question answered here:
MYSQL sorting by HAVING distance but not able to group?
I did the exact same thing. If I understand correctly, you are already getting the zipcodes (possibly in an array); the simple thing to do is match those zipcodes to the corresponding users with SQL's IN operator:
SELECT *
FROM users
WHERE zipcode IN (/* comma-separated list built from your zipcodes array */);
Note that IN needs a parenthesized, comma-separated list, so expand the array you already have into one.

Different result for Haversine formulas

I am using MySQL to compute proximity, and for that I created a stored procedure named distance, shown below. The procedure is not working properly, but the plain SQL statement is. What is the difference between them? Both are, I believe, Haversine formulas, yet they don't give the same result. I really don't know what I am missing in formula one.
Data structure of my table is as follows
for formula one
id varchar(100)
userid varchar(100)
username varchar(100)
currLoc point
radius int(10)
for formula two
id varchar(30)
userid varchar(30)
username varchar(40)
lat float(10,6)
lan float(10,6)
radius varchar(100)
Formula One: reference
sql statement to execute distance function
SELECT userid, username, distance(userstatus.currLoc,
GeomFromText('POINT(23.039574 72.56602)')) AS cdist
FROM userstatus HAVING cdist <= 0.6 ORDER BY cdist LIMIT 10
RETURN 6371 * 2 *
ASIN( SQRT(POWER(SIN(RADIANS(ABS(X(a)) - ABS(X(b)))), 2) +
COS(RADIANS(ABS(X(a)))) * COS(RADIANS(ABS(X(b)))) *
POWER(SIN(RADIANS(Y(a) - Y(b))), 2)));
Formula two: reference
SELECT *,(((acos(sin((23.039574*pi()/180)) *
sin((lat *pi()/180))+cos((23.039574*pi()/180)) *
cos((lat *pi()/180)) * cos(((72.56602- lon)*pi()/180))))*
180/pi())*60*1.1515*1.609344) as distance
FROM status HAVING distance <= 0.6
here 0.6 is a radius in kilometers
One version of the expression is using ABS(X(a)) etc and the other is not. The one using ABS is suspect. You can't afford to ignore the sign on the angles. You'll get different results in some areas of the world (near the equator or the prime meridian, for example, or near the poles).
Your constants are also different.
60*1.1515*1.609344
vs
6371 * 2
One expression involves SQRT, the other does not.
One expression involves ASIN and the other uses ACOS.
There is essentially nothing in common between the two...
See the discussion at Wikipedia 'Haversine Formula', and in particular the references to numerical stability when the distance between the points is small.
You could also improve the chances of people helping you by making the formulae you're using semi-readable, by splitting them over lines.
For example:
RETURN 6371 * 2 *
ASIN( SQRT(POWER(SIN(RADIANS(ABS(X(a)) - ABS(X(b)))), 2) +
COS(RADIANS(ABS(X(a)))) * COS(RADIANS(ABS(X(b)))) *
POWER(SIN(RADIANS(Y(a) - Y(b))), 2)));
And:
(((acos(sin((23.039574*pi()/180)) * sin((lat *pi()/180)) +
cos((23.039574*pi()/180)) * cos((lat *pi()/180)) *
cos(((72.56602-lan)*pi()/180))
)
) * 180/pi()) * 60 * 1.1515 * 1.609344)
The latter references 'lan'; is that meant to be 'lon'? In the second example, you appear to have encoded one of the two positions as 23.039574°N and 72.56602°W, and lat and lan come from the table in the SQL query.
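For reference, the two textbook formulas the question is mixing can be written out side by side (my sketch; note the standard haversine halves both angle differences before squaring, whereas the stored procedure in the question squares the full differences):

```python
import math

R_KM = 6371.0  # mean Earth radius in km

def haversine_km(lat1, lon1, lat2, lon2):
    """Standard haversine: both angle differences are HALVED before squaring."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * R_KM * math.asin(math.sqrt(a))

def law_of_cosines_km(lat1, lon1, lat2, lon2):
    """Spherical law of cosines: simpler, but ill-conditioned for tiny distances."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    return R_KM * math.acos(math.sin(p1) * math.sin(p2)
                            + math.cos(p1) * math.cos(p2)
                            * math.cos(math.radians(lon2 - lon1)))
```

At a few hundred metres of separation the two agree to well under a metre; at sub-millimetre separations the acos argument rounds to exactly 1.0 and the law of cosines collapses to zero while haversine still resolves, which is the numerical-stability point in the Wikipedia article.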

Slow SQL Query by Limit/Order dynamic field (coordinates from X point)

I'm trying to run a SQL query against a database of 7 million records. The "geonames" database has "latitude" and "longitude" columns in decimal(10,7), both indexed. The problem is that the query is too slow:
SELECT SQL_NO_CACHE DISTINCT
geonameid,
name,
(6367.41 * SQRT(2 * (1-Cos(RADIANS(latitude)) * Cos(0.704231626533) * (Sin(RADIANS(longitude))*Sin(-0.0669560660943) + Cos(RADIANS(longitude)) * Cos(-0.0669560660943)) - Sin(RADIANS(latitude)) * Sin(0.704231626533)))) AS Distance
FROM geoNames
WHERE (6367.41 * SQRT(2 * (1 - Cos(RADIANS(latitude)) * Cos(0.704231626533) * (Sin(RADIANS(longitude)) * Sin(-0.0669560660943) + cos(RADIANS(longitude)) * Cos(-0.0669560660943)) - Sin(RADIANS(latitude)) * Sin(0.704231626533))) <= '10')
ORDER BY Distance
The problem is sorting by the "Distance" field: because it is computed dynamically, evaluating it in the "WHERE" condition takes a long time. If I remove the "WHERE ... <= 10" condition, it takes only 0.34 seconds, but the result is 7 million records, and transferring the data from MySQL to PHP takes almost 120 seconds.
Can you think of any way to write the query so it doesn't lose performance when limiting by the Distance field, given that the query's values will change very often?
This kind of query cannot use an index; it must compute whether the lat/lon of each row falls within the specified distance. Therefore, some form of preprocessing is typically used to limit the scan to a subset of rows.
You could create tables corresponding to distance "bands" (2, 5, 8, 10, 20 miles/km -- whatever makes sense for your application requirements), then populate these bands and keep them up to date. If you want only those medical providers, say, or hotels, or whatever, within 10 miles of a given location, there's no need to worry about the ones that are hundreds or thousands of miles away. With ad hoc queries you could inner join on the "within 10 miles" band, say, and thereby exclude from the comparison scan all rows where the computed distance > 10.
When the location varies, the "elegant" way to handle this is to implement an RTREE, but you can define your encompassing region in any arbitrary way you like if you have access to additional data -- e.g. by using zipcodes, counties, or states.
There are two things you can do:
Make sure the datatypes are the same on both sides of a comparison: i.e. compare with 10 (a number), not '10' (a character literal) -- it means less work for the DB
In cases like this, I create a view, which means the calculation is written just once, even if you refer to it more than once in the query
If these two points are incorporated into your code, you get:
CREATE VIEW geoNamesDistance AS
SELECT SQL_NO_CACHE DISTINCT
geonameid,
name,
(6367.41 * SQRT(2 * (1-Cos(RADIANS(latitude)) * Cos(0.704231626533) * (Sin(RADIANS(longitude))*Sin(-0.0669560660943) + Cos(RADIANS(longitude)) * Cos(-0.0669560660943)) - Sin(RADIANS(latitude)) * Sin(0.704231626533)))) AS Distance
FROM geoNames;
SELECT * FROM geoNamesDistance
WHERE Distance <= 10
ORDER BY Distance;
I came up with:
select * from retailer
where latitude is not null and longitude is not null
and pow(2*(latitude - ?), 2) + pow(longitude - ?, 2) < your_magic_distance_value
With this fast & easy flat-Earth code, Los Angeles is closer to Honolulu than to San Francisco, but I doubt customers will consider that when going that far to shop.
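A flat-Earth approximation that stays reasonable at any latitude is the equirectangular one: scale the longitude difference by the cosine of the latitude instead of applying a fixed factor to the latitude. A sketch (mine, not the answerer's code), still cheap enough for a prefilter:

```python
import math

MILES_PER_DEG = 69.0  # roughly one degree of latitude

def flat_earth_miles(lat1, lon1, lat2, lon2):
    """Equirectangular approximation: fine for short 'nearby shops' distances."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = (lon2 - lon1) * math.cos(mean_lat)  # shrink longitude by cos(latitude)
    dy = lat2 - lat1
    return MILES_PER_DEG * math.hypot(dx, dy)
```

Los Angeles to San Diego comes out around 110 miles with this, close to the true great-circle figure, whereas a fixed scale factor only matches at one particular latitude.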

Finding all users within X miles calculation via mysql is slow

I'm having trouble with this query. It takes about 5 seconds to run against ~300 users. I assume it's because it's calculating the distance for every possible user.
Is there a way for me to optimize this to make it run fast? Thanks in advance.
select
t2.*,
t2.city,
t2.state,
t2.county,
ifnull(round((6371 * acos( cos( radians('32.7211') ) * cos( radians( t2.latitude ) ) * cos( radians( t2.longitude ) - radians('-117.16431') ) + sin( radians('32.7211') ) * sin( radians( t2.latitude ) ) ) ),0),1) AS distance
from
users t1
inner join
zipcodes_coordinates t2
on t1.zip_code=t2.zipcode
having
distance <= 150
I would eliminate as much of the data as you can before the main part of the query (the distance calculation) runs. Your query is almost certainly looping over every single row in the table.
For example, you know that if a user at (X,Y) is within an R-mile circle of a certain point X',Y', then they are certainly within a square of side 2R centred on that point, which means the following things hold:
X <= X' + R
X >= X' - R
Y <= Y' + R
Y >= Y' - R
So in your query, you could first have the database eliminate all users whose X value doesn't satisfy those constraints, which can be done using the index on that field (and the same goes for the Y co-ordinate).
Another (rather more domain-specific) trick would be to split the world up into small squares that are indexable with a single identifier (it could be a long, or even a string holding the co-ordinates of the centre, so long as you can re-create it reliably from any co-ordinate within the square). Then store which square each co-ordinate is in, as well as the co-ordinate itself. If you are looking for e.g. a 5-mile radius, make the squares at least 5 miles on a side; that way every candidate is guaranteed to be in the query point's square or one of its 8 neighbours, so you can very quickly search by identity over no more than 9 squares, then loop over the results in those squares to find the closest matches in your application.
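The grid-square idea can be sketched as follows (a hypothetical cell scheme in plain degree coordinates; the cell size is a free parameter that should be at least the search radius so the 3x3 neighbourhood is guaranteed to cover it):

```python
import math

CELL_DEG = 0.1  # cell size in degrees; choose it >= the search radius (assumption)

def cell_id(lat, lon):
    """Quantize a coordinate to a grid-cell identifier, stored alongside each row."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))

def candidate_cells(lat, lon):
    """The query cell plus its 8 neighbours: any point within CELL_DEG of
    (lat, lon) must fall inside one of these 9 cells."""
    ci, cj = cell_id(lat, lon)
    return [(ci + di, cj + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]
```

An indexed `cell_id` column then turns the radius search into an equality lookup over at most 9 identifiers, with the exact distance check done only on those survivors.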
Most performance optimisations in this kind of thing are about eliminating data that certainly doesn't fit and then refining, rather than immediately going after data that certainly does.
PS - if you are using MySQL there is a GIS extension, which I haven't tried: http://dev.mysql.com/tech-resources/articles/4.1/gis-with-mysql.html. This probably does something like what I describe, and may or may not take into account the curvature of the earth, etc. However in most cases the successive refinement method is fairly safe, and means your database doesn't have to 'know' about GIS co-ordinate systems.