MySQL Levenshtein distance as a query instead of UDF - mysql

I have a case where I need to calculate the Levenshtein distance between two columns for a row in MySQL. There are UDFs available for this, but I need to do it without one. The reason is that I am using MemSQL, an extremely fast in-memory database that does not support UDFs but does support nearly any query you can run in MySQL. Is anyone aware of a non-UDF implementation of the Levenshtein distance algorithm as a query? Something like the following UDF:
http://www.artfulsoftware.com/infotree/qrytip.php?id=552
I'm working on converting this myself too, and I'm open to other solutions (i.e., other ways to make this happen in MemSQL).
Note: I cannot use Hamming distance. That would be simpler, but the use case calls for Levenshtein distance.
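When translating the algorithm into a query, it can help to have a small reference implementation outside the database to compare results against. The sketch below is plain Python, not SQL and not the linked UDF; it uses the standard two-row dynamic-programming recurrence, and the function name is only illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]
```

Any query-based version has to reproduce this recurrence row by row, which is why a pure-SQL form tends to need either recursion or a fixed maximum string length.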

Related

Geospatial Clustering using Presto

I have lat/lons as POINT coordinates and want to cluster them by location with Presto. For now, I am rounding the lat/lons to 2 decimals, converting them to strings, concatenating them, and finally grouping by the result. But this way I lose the information about the individual points. Is there a good, clean way to do this in Presto (something like the ST_Cluster* functions in PostGIS)?
Trino (formerly known as Presto SQL) seems to have geospatial functions (https://trino.io/docs/current/functions/geospatial.html), but nothing equivalent to the ST_Cluster* functions you asked for. You could probably group on a function like ST_Distance instead of rounding to decimals and concatenating strings.
It is not as clean as using ST_Cluster* directly, but it is a workaround that gives clustering-like behavior with the existing geospatial functions.
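To illustrate how grid-based grouping can keep the individual points instead of discarding them, here is a minimal Python sketch of the idea. It is not Presto code and the function name is made up; in SQL terms it roughly corresponds to collecting the points of each group with an aggregate such as array_agg rather than only grouping on the rounded key.

```python
from collections import defaultdict

def cluster_by_grid(points, decimals=2):
    """Group (lat, lon) pairs by their rounded grid cell, keeping every member."""
    clusters = defaultdict(list)
    for lat, lon in points:
        cell = (round(lat, decimals), round(lon, decimals))
        clusters[cell].append((lat, lon))  # the original point is preserved
    return dict(clusters)

points = [(52.5201, 13.4012), (52.5203, 13.4021), (48.8566, 2.3522)]
clusters = cluster_by_grid(points)
```

Points that fall on opposite sides of a cell border still end up in different groups, so this is only an approximation of true distance-based clustering.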

Do I really need to use MySQL Spatial Functions?

Well I want your opinions about this case:
I need a database that will have... two or three tables at most, one of them will have points (latitude, longitude) and some other info.
It's really simple what I need: Get the points within a given radius.
I'm not asking how to do it (but any advice is more than welcome, especially if it's about good practices); I want to know whether using MySQL's spatial support would help. Since what I need is fairly easy to get with just one query, what I expect from spatial support is better performance.
So, are the spatial indexes going to help noticeably? I don't think the table will store that many points. I'd say no more than 200.
If it's really only 200 points, I recommend you do without: this makes it much easier to write portable SQL (which I consider important).
Write your SQL so that longitude and latitude are first checked against precalculated minimums and maximums (giving you a rectangle), and only then check the radius. This way, you only compute the distance for points inside the rectangle, and only a minority of those computations (about 1 - π/4 ≈ 21% if the points are spread evenly, since the circle covers π/4 of its bounding square) are for points that do not end up in the result set.
I personally consider this an acceptable trade-off for SQL that could, if need be, run against SQLite or whatever.
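The precalculated minimums and maximums for the rectangle check can be derived from the centre point and the radius. The following Python sketch shows one way to do that; the function names and the mean Earth radius of 6371 km are assumptions of the example, and it deliberately ignores the poles and the 180° meridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def bounding_box(lat, lon, radius_km):
    """Return (min_lat, max_lat, min_lon, max_lon) approximately enclosing the circle."""
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    # A degree of longitude shrinks with cos(latitude), so the box gets wider in degrees.
    dlon = math.degrees(radius_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon

def in_box(point, box):
    """Cheap rectangle test to run before the exact radius check."""
    lat, lon = point
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

box = bounding_box(60.0, 0.0, 100.0)
```

The four values map directly onto a portable WHERE clause with two BETWEEN conditions; the exact radius check then only runs on the rows that pass.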

MySQL equivalent to MSSQL Geography functions

I couldn't find anything on this, hence the new thread.
We have an application where the data is stored in SQL Server. Some tables have columns of the type "Geography". We use the SQL Server function STDistance to filter out data within a specified distance. Now we are researching converting the application to PHP for various reasons, one of the biggest being the cost of ASP.NET and SQL Server. I can't seem to find anything on how MySQL handles a Geography data type. Am I right that it doesn't exist?
Isn't it possible to create your own functions in MySQL? I thought I could create a simple function that calculates whether a location is within the desired radius. What would be the most efficient way of doing this? Of course I could calculate for each row whether the coordinates are within the radius, but that feels inefficient and not very scalable. I was thinking that I would first select all the rows where x1>lat>x2 and y1>lon>y2 and then do the "heavy calculation".
What would be the best way of doing this?

MySQL Lat/Lon radius search

I have a table with zipcode (int) and Location (point). I'm looking for a MySQL query or function. Here is an example of the data. I'd like to return everything within 100 miles.
37922|POINT(35.85802 -84.11938)
Is there an easy query to achieve this?
Okay, so I have this:
SELECT X(Location), Y(Location) FROM zipcodes
This gives me the two coordinates of each point, but how do I figure out what's within a given distance of an x/y?
The query to do this is not too hard, but is slow. You would want to use the Haversine formula.
http://en.wikipedia.org/wiki/Haversine_formula
Converting that to SQL should not be too difficult, but calculating the distance for every record in a table gets costly as the data set increases.
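As a reference for that conversion, here is a minimal Python sketch of the Haversine formula. The function name and the Earth radius values (3958.8 miles, 6371.0 km) are assumptions of the example rather than anything from the question; the same arithmetic carries over to SQL using the database's SIN, COS, ASIN, SQRT and RADIANS functions.

```python
import math

def haversine(lat1, lon1, lat2, lon2, radius=3958.8):
    """Great-circle distance; radius=3958.8 gives miles, radius=6371.0 gives km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin out of its domain.
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))
```

A row would then qualify for the 100-mile search when haversine(35.85802, -84.11938, row_lat, row_lon) <= 100, where row_lat and row_lon stand for that row's coordinates.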
The work can be significantly reduced by using a geohash function to limit the locus of candidate records. If accuracy is important, the Haversine formula can be applied to the records inside a geohash region.
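For readers unfamiliar with geohashes, the sketch below shows the standard encoding in Python: latitude and longitude bits are interleaved and every five bits become one base-32 character, so nearby points usually share a prefix that can be stored in an indexed column. The function name and the default precision are assumptions of this example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lat, lon, precision=5):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)
```

Two points close to each other can still fall on either side of a cell border, so a candidate search normally has to include the neighbouring cells as well as the cell of the query point.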
If the MySQL people never completed their GIS and spatial extensions, consider using Elasticsearch or MongoDB.
There is a pretty complete discussion here:
Formulas to Calculate Geo Proximity

Most efficient way to get points within radius of a point with sql server spatial

I am trying to work out the most efficient query to get points within a radius of a given point. The results do not have to be very accurate so I would favor speed over accuracy.
We have tried a WHERE clause that compares the distance between points using STDistance, like this (where @point and v.GeoPoint are geography types):
WHERE v.GeoPoint.STDistance(@point) <= @radius
And also one using STIntersects, similar to this:
WHERE @point.STBuffer(@radius).STIntersects(v.GeoPoint) = 1
Is either of these queries preferred, or is there another function that I have missed?
If accuracy is not paramount then using the Filter function might be a good idea:
http://msdn.microsoft.com/en-us/library/cc627367.aspx
This can in many cases be orders of magnitude faster because it does not check whether each match is exact.
In the index the data is stored in a grid pattern, so how viable this approach is probably depends on your spatial index options.
Also, if you don't have too many matches, then doing a filter first and a full intersect afterwards might be viable.