MySQL spatial query to find all rows that deliver to set point

I have a table, stores, with thousands of stores that deliver. If I have the lat, lng, and delivery_radius for each store (I can add a POINT column), what is the most efficient way to query the table to see which stores can deliver to where I stand currently?
I suspect that checking whether the distance between me and each row is less than that row's delivery_radius would be a very slow process. Would it be better to add a column storing a polygon computed from each row's info and test whether my current point is inside it (point-in-polygon)? Any other suggestions?

You can get the distance in miles between two geo points by using the following expression in a SQL query (3959 is the Earth's radius in miles):
ROUND(3959 * ACOS(COS(RADIANS(IFNULL(P1.LAT, 0))) * COS(RADIANS(IFNULL(P2.LAT, 0)))
      * COS(RADIANS(IFNULL(P2.LNG, 0)) - RADIANS(IFNULL(P1.LNG, 0)))
      + SIN(RADIANS(IFNULL(P1.LAT, 0))) * SIN(RADIANS(IFNULL(P2.LAT, 0)))), 3) AS DISTANCE
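Applied to the original question, a minimal sketch (the stores table with lat, lng, and a delivery_radius in miles are assumptions; @my_lat/@my_lng stand for your current position):

SET @my_lat = 35.85802, @my_lng = -84.11938;

SELECT d.id, d.name, d.distance_miles
FROM (
  SELECT s.id, s.name, s.delivery_radius,
         3959 * ACOS(COS(RADIANS(@my_lat)) * COS(RADIANS(s.lat))
               * COS(RADIANS(s.lng) - RADIANS(@my_lng))
               + SIN(RADIANS(@my_lat)) * SIN(RADIANS(s.lat))) AS distance_miles
  FROM stores s
) AS d
WHERE d.distance_miles <= d.delivery_radius  -- deliverable: the store's radius covers you
ORDER BY d.distance_miles;

Note that the inner SELECT still computes the distance for every row, which is exactly the cost discussed next.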
However, this is a very costly operation and you will definitely have performance issues as the data grows. Maintaining a polygon also might be difficult, as you would have to compute and store a polygon for each new store, and the update process will slow down considerably as the data grows.
If you do not strictly need this in an RDBMS, please consider using another technology such as Elasticsearch, which natively supports this kind of operation. Please refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-queries.html

Related

Improve performance using geolocation to sort by distance

I have to build the structure of a posts table to handle a large number of rows (let's say 1 million), notably with these two fields:
latitude
longitude
What I'd like to do is optimise the time consumed by read queries, when sorting by distance.
I have chosen the type DECIMAL (precision: 10, scale: 6), thinking it is more precise than FLOAT and therefore more relevant.
Would it be appropriate to add an index on latitude and an index on longitude?
I'm always wary when I see all the operations, such as SIN(), that ORMs perform to build such queries. I'd like to follow best practices so I can be sure it will scale, even with a lot of rows.
Note: If a general solution is not possible, let's say the database is MySQL.
Thanks.
INDEX(latitude) will help some. But to make it significantly faster you need a more complicated data structure and code. See my blog.
In there, I point out that 6 decimal places is probably overkill in resolution (a millionth of a degree of latitude is roughly 11 cm), unless you are trying to distinguish two persons standing next to each other.
There is also reference code that includes the trigonometry to handle great circle distances.
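To illustrate why a single-column index only "helps some": with INDEX(latitude), MySQL can narrow the candidates to a latitude stripe, but it still has to check longitude row by row within that stripe. A minimal sketch (the posts table, the box half-widths, and the LIMIT are assumptions; the planar ORDER BY expression is only a reasonable proxy for distance over short ranges):

SET @lat = 48.856600, @lng = 2.352200;   -- search center
SET @dlat = 0.5, @dlng = 0.7;            -- bounding-box half-widths in degrees

SELECT id
FROM posts
WHERE latitude  BETWEEN @lat - @dlat AND @lat + @dlat   -- range the index can satisfy
  AND longitude BETWEEN @lng - @dlng AND @lng + @dlng   -- checked per row afterwards
ORDER BY POW(latitude - @lat, 2)
       + POW((longitude - @lng) * COS(RADIANS(@lat)), 2)
LIMIT 20;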

How would I use DynamoDB to move this usage from my mysql db to nosql?

I'm currently experiencing issues with a service I've developed that relies heavily on large-payload reads from the DB (up to 500 rows per request). I'm seeing huge throughput, in the range of 35,000+ requests per minute going through the DB, and it is not handling the scaling at all.
The data in question is retrieved primarily by a latitude/longitude WHERE clause that checks whether the latitude and longitude of the row fall between a minimum lat/lng coordinate and a maximum lat/lng coordinate. This effectively checks whether the row in question is within the bounding box created by the min/max passed into the WHERE.
For reference, this is the WHERE portion of the query we rely on:
s.latitude > {minimumLatitude} AND
s.longitude > {minimumLongitude} AND
s.latitude < {maximumLatitude} AND
s.longitude < {maximumLongitude}
So, with that said: MySQL is handling this, but I'm presently on RDS and having to rely heavily on an r3.8XL master and three r3.8XL read replicas just to get the throughput capacity I need to prevent the application from slowing down and driving the CPU to 100% usage.
Obviously, with how heavy the payload is and how frequently it's queried, this data needs to be moved into a more fitting service, something like one of the ElastiCache services, or DynamoDB.
I've been leaning towards DynamoDB, but my only option there seems to be using SCAN, as there is no useful primary key I can put on my data to reduce the result set: the query depends on calculating whether the latitude/longitude of a point falls within a bounding box. DynamoDB filters on attributes would work great, as they support the basic conditions needed; however, on a table of 250,000+ rows, growing by nearly 200,000 a day or more, that would be unusably expensive.
Another option to reduce the result set was to use a map-binning technique: associate a map region with each row, use that region as the primary key in Dynamo, and then filter further on the latitude/longitude attributes. This wouldn't be ideal, though; we'd prefer to get data within specific bounds and not have excess redundant data passed back, since the min/max lat/lng can overlap multiple bins and would then pull back data from bins that is mostly not needed.
At this point I'm continuously having to deploy read replicas to keep the service up and it's definitely not ideal. Any help would be greatly appreciated.
You seem to be overlooking the obvious first thing to try... indexing the data using an index structure suited to the nature of the data... in MySQL.
B-trees are of limited help since you still have to examine all possible matches in one dimension after eliminating impossible matches in the other.
Aside: Assuming you already have an index on (lat, long), you can probably gain some short-term performance improvement by adding a second index with the columns reversed, (long, lat); see the example below. Try this on one of your replicas¹ and see if it helps. If you have no indexes at all, then of course that is your first problem.
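For example (locations is an assumed name for the table aliased as s above):

ALTER TABLE locations ADD INDEX idx_lng_lat (longitude, latitude);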
Now, the actual solution. This requires MySQL 5.7, because before 5.7 the feature works with MyISAM but not with InnoDB, and RDS doesn't like it at all if you try to use MyISAM.
This is effectively checking if the row in question is within the bounding box created by the min/max passed into the WHERE.
What you need is an R-Tree index. These indexes actually store the points (or lines, polygons, etc.) in an order that understands and preserves their proximity in more than one dimension... proximate points are closer in the index and minimum bounding rectangles ("bounding box") are easily and quickly identified.
The MySQL spatial extensions support this type of index.
There's even an MBRContains() function that compares the points in the index to the points in the query, using the R-Tree to find all the points contained in the MBR you're searching. Unlike the usual optimization rule that you should not use column names as function arguments in the WHERE clause (to avoid triggering a table scan), this function is an exception: the optimizer does not actually evaluate the function against every row, but uses the meaning of the expression to evaluate it against the index.
There's a bit of a learning curve needed in order to understand the design of the spatial extensions but once you understand the principles, it falls into place nicely and the performance will exceed your expectations. You'll want a single column of type GEOMETRY and you'll want to store lat and long together in that one indexed column as a POINT.
To safely test this without disruption, make a replica, then detach it from your master, promoting it to become its own independent master, and upgrade it to 5.7 if necessary. Create a new table with the same structure plus a GEOMETRY column and a SPATIAL KEY, then populate it with INSERT ... SELECT.
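A minimal sketch of those steps (the locations table and column names are assumptions; MySQL 5.7+ is required for a SPATIAL index on InnoDB, and spatially indexed columns must be NOT NULL):

CREATE TABLE locations_spatial (
  id BIGINT UNSIGNED NOT NULL PRIMARY KEY,
  latitude DECIMAL(10,6) NOT NULL,
  longitude DECIMAL(10,6) NOT NULL,
  pt POINT NOT NULL,
  SPATIAL KEY (pt)
) ENGINE=InnoDB;

INSERT INTO locations_spatial (id, latitude, longitude, pt)
SELECT id, latitude, longitude, POINT(longitude, latitude)
FROM locations;

-- @minLng/@minLat/@maxLng/@maxLat are the bounding box from the original WHERE;
-- MBRContains uses the rectangle spanned by the diagonal LINESTRING
SELECT id
FROM locations_spatial
WHERE MBRContains(
        ST_GeomFromText(CONCAT('LINESTRING(',
            @minLng, ' ', @minLat, ',', @maxLng, ' ', @maxLat, ')')),
        pt);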
Note that a DynamoDB scan is a very "expensive" operation. On a table I was testing against just yesterday, a single scan consistently cost 112 read units each time it was run, regardless of the number of records, presumably because a scan always reads 1 MB of data, which is 256 blocks of 4 KB (the definition of a read unit), but not with strong consistency (so, half the cost). 1 MB ÷ 4 KB ÷ 2 = 128, which I assume is close enough to 112 to explain that number.
¹ It's a valid, supported operation to add an index to a MySQL replica but not the master, even in RDS. You need to temporarily make the replica writable by creating a new parameter group identical to the existing one and flipping read_only to 0 in that group. Associate the replica with the new parameter group, wait for the state to change from applying to in-sync, log in to the replica, and add the index. Then put the original parameter group back when done.

Storing millions of 3D coordinates in MySQL - bad idea?

All-
So I need to store 3D positions (x, y, z) associated with objects in a video game.
I'm curious, is this a terrible idea? The positions are generated quite frequently, and may vary some.
I basically would ONLY like to store the position in my database if it's not within a yard of a position already stored.
I was basically selecting the existing positions for an object in the game (by object_id, object_type, continent and game_version), looping through them, and calculating the distance using PHP. If it was > 1 yard, I would insert the new position.
Now that I'm at about 7 million rows (obviously not all for the same object), this isn't efficient, and the server I'm using is slowing to a crawl.
Does anyone have any ideas on how I could better store this information? I'd prefer it be in MySQL somehow.
Here is the structure of the table:
object_id
object_type (like unit or game object)
x
y
z
continent (an object can be on more than one continent)
game_version (positions can vary based on the game version)
Later when I need to access the data, I basically only query it by object_id, object_type, continent, and game_version (so I have an index on these 4)
Thanks!
Josh
Presumably objects on different continents are considered infinitely far apart. Also you haven't disclosed the units you're using in your table. I'll assume inches (of which there are 36 in a yard).
So, before you insert a point you need to determine whether you're within a yard. To do this you're going to need either the MySQL geo extension (which you can go read about) or separate indexes on at least your x and y columns, and maybe the z column.
Are there any points within a yard? The query below (where positions stands in for your table's name) tells you whether there are any points within the bounding box of +/- one yard around your new point. A nearby result of one or more means you shouldn't insert the new point.
SELECT COUNT(*) AS nearby
FROM positions t
WHERE t.x BETWEEN (?xpos - 36) AND (?xpos + 36)
  AND t.y BETWEEN (?ypos - 36) AND (?ypos + 36)
  AND t.z BETWEEN (?zpos - 36) AND (?zpos + 36)
  AND t.continent = ?cpos
If you need the query to work with Cartesian distances rather than bounding boxes you can add a sum-of-squares distance computation. But I suspect bounding boxes will work just fine for your app, and be much more efficient than repeatedly fetching 75-row result sets to do proximity testing in your application.
Conceptually it wouldn't be much harder to create a stored procedure for MySQL that conditionally inserts the new row only if it meets the proximity criteria. That way you'd have a single round trip rather than server back-and-forth.
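A hedged sketch of such a procedure (the positions table, its column types, and the inch units are assumptions carried over from above):

DELIMITER //
CREATE PROCEDURE insert_if_far(
  IN p_object_id INT, IN p_object_type VARCHAR(32),
  IN p_continent VARCHAR(32), IN p_game_version VARCHAR(16),
  IN p_x DOUBLE, IN p_y DOUBLE, IN p_z DOUBLE)
BEGIN
  -- insert only if no stored position is within a yard (36 inches) on every axis
  IF NOT EXISTS (
    SELECT 1 FROM positions
    WHERE object_id = p_object_id AND object_type = p_object_type
      AND continent = p_continent AND game_version = p_game_version
      AND x BETWEEN p_x - 36 AND p_x + 36
      AND y BETWEEN p_y - 36 AND p_y + 36
      AND z BETWEEN p_z - 36 AND p_z + 36
  ) THEN
    INSERT INTO positions
      (object_id, object_type, continent, game_version, x, y, z)
    VALUES
      (p_object_id, p_object_type, p_continent, p_game_version, p_x, p_y, p_z);
  END IF;
END //
DELIMITER ;

The application then makes one call per new sample: CALL insert_if_far(...).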
It may be killing your server because of the continuous disk activity; that could be mitigated by having MySQL keep the table in memory: add ENGINE = MEMORY to your table definition.

MySQL Lat/Lon radius search

I have a table with zipcode (INT) and Location (POINT). I'm looking for a MySQL query or function. Here is an example of the data; I'd like to return all rows within 100 miles.
37922|POINT(35.85802 -84.11938)
Is there an easy query to achieve this?
Okay, so I have this:
SELECT X(Location), Y(Location) FROM zipcodes
This gives me my two coordinates, but how do I figure out what's within a given distance of that x/y?
The query to do this is not too hard, but is slow. You would want to use the Haversine formula.
http://en.wikipedia.org/wiki/Haversine_formula
Converting that to SQL should not be too difficult, but calculating the distance for every record in a table gets costly as the data set increases.
The work can be significantly reduced by using a geohash function to limit the locus of candidate records. If accuracy is important, the Haversine formula can be applied to the records inside a geohash region.
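A hedged sketch of that two-step approach, using a plain bounding box as the cheap prefilter in the same spirit as geohashing. The sample row above suggests X(Location) is latitude and Y(Location) is longitude; 69 is the approximate number of miles per degree of latitude, and 3959 is the Earth's radius in miles:

SET @lat = 35.85802, @lng = -84.11938, @radius = 100;

SELECT zipcode,
       3959 * ACOS(COS(RADIANS(@lat)) * COS(RADIANS(X(Location)))
             * COS(RADIANS(Y(Location)) - RADIANS(@lng))
             + SIN(RADIANS(@lat)) * SIN(RADIANS(X(Location)))) AS distance_miles
FROM zipcodes
WHERE X(Location) BETWEEN @lat - @radius / 69.0 AND @lat + @radius / 69.0
  AND Y(Location) BETWEEN @lng - @radius / (69.0 * COS(RADIANS(@lat)))
                      AND @lng + @radius / (69.0 * COS(RADIANS(@lat)))
HAVING distance_miles <= @radius
ORDER BY distance_miles;

The WHERE box cheaply eliminates most rows; the Haversine expression then gives the accurate distance for the survivors.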
If it turns out that MySQL's GIS and Spatial extensions don't cover what you need, consider using ElasticSearch or MongoDB.
There is a pretty complete discussion here:
Formulas to Calculate Geo Proximity

Most efficient way to get points within radius of a point with sql server spatial

I am trying to work out the most efficient query to get points within a radius of a given point. The results do not have to be very accurate so I would favor speed over accuracy.
We have tried using a WHERE clause comparing the distance of points using STDistance, like this (where @point and v.GeoPoint are geography types):
WHERE v.GeoPoint.STDistance(@point) <= @radius
Also one using STIntersects, similar to this:
WHERE @point.STBuffer(@radius).STIntersects(v.GeoPoint) = 1
Are either of these queries preferred or is there another function that I have missed?
If accuracy is not paramount, then using the Filter() method might be a good idea:
http://msdn.microsoft.com/en-us/library/cc627367.aspx
This can in many cases be orders of magnitude faster, because it does not do the final check to see whether your match was exact.
In the index, the data is stored in a grid pattern, so how viable this approach is probably depends on your spatial index options.
Also, if you don't have too many matches, then doing a Filter() pass first and then a full intersect on the survivors might be viable.
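A hedged sketch of both variants (the venues table, its GeoPoint column, and the 10-mile radius are assumptions; Filter() consults only the spatial index grid, so it can return near-misses that an exact test would reject):

DECLARE @point geography = geography::Point(35.85802, -84.11938, 4326);
DECLARE @radius float = 10 * 1609.34;  -- 10 miles in meters
DECLARE @area geography = @point.STBuffer(@radius);

-- fast, approximate: index-only test
SELECT v.Id
FROM venues v
WHERE v.GeoPoint.Filter(@area) = 1;

-- Filter() first, then an exact intersect on the (small) candidate set
SELECT f.Id
FROM (
  SELECT v.Id, v.GeoPoint
  FROM venues v
  WHERE v.GeoPoint.Filter(@area) = 1
) AS f
WHERE f.GeoPoint.STIntersects(@area) = 1;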