use st_within to get the locations in a radius/circle - mysql

I have a table that looks like this
| id | name | latitude | langitude | costLat | sinLat | cosLng | sinLng |
| 1 | place 1 | 2.942743912327621 | 101.79377630352974 | 0.99868133582304 | 0.051337992546461 | -0.20438972214917 | 0.97888959616485 |
Referring to this article, it seems like a good idea to use st_within to search my table above for locations within a 5 km radius of a given latitude and longitude. But I have no idea how to do that.
The table is MyISAM, MySQL version 5.6
Sorry for not being clear on what I tried. From the documentation it mentions that
ST_Within(g1,g2)
Returns 1 or 0 to indicate whether g1 is spatially within g2.
So my understanding is, we need to pass 2 params to ST_Within. Sounds simple enough, but when I looked at the sample query in the linked article, it does this (note: I changed shape to CIRCLE in the query, on the assumption that my shape is a circle because I'm searching by radius):
set @lat = 37.615223;
set @lon = -122.389979;
set @dist = 10;
set @rlon1 = @lon-@dist/abs(cos(radians(@lat))*69);
set @rlon2 = @lon+@dist/abs(cos(radians(@lat))*69);
set @rlat1 = @lat-(@dist/69);
set @rlat2 = @lat+(@dist/69);
SELECT ASTEXT("CIRCLE"), NAME FROM location_final
WHERE st_within("CIRCLE", ENVELOPE(LINESTRING(POINT(@rlon1, @rlat1), POINT(@rlon2, @rlat2))))
ORDER BY st_distance(POINT(@lon, @lat), "CIRCLE") LIMIT 10;
So looking at the query above, my confusion is: where does the comparison between the latitude and longitude happen? Where in the query should I mention my columns latitude and langitude?
Looking at the output at the given link, it displays something like
+--------------------------------+-------------------------------+
| astext(shape) | name |
+--------------------------------+-------------------------------+
| POINT(-122.3890954 37.6145378) | Tram stop:Terminal A |
| POINT(-122.3899 37.6165902) | Tram stop:Terminal G |
Where does the POINT come from?
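For reference, the arithmetic in the quoted query can be sketched in Python. This is only an illustration under stated assumptions: it mirrors the query's own constants (distance in miles, about 69 miles per degree), and the helper names `bounding_box` and `in_box` are made up for this sketch. The rectangle it computes is what ENVELOPE(LINESTRING(...)) builds in the query; the first argument of st_within would presumably be a geometry column holding one POINT per row (`shape` in the linked article) rather than a string literal, which would also explain the POINT values in the output.

```python
import math

MILES_PER_DEGREE = 69.0  # rough figure used by the quoted query

def bounding_box(lat, lon, dist_miles):
    """Return (lon_min, lat_min, lon_max, lat_max) around a centre point."""
    dlat = dist_miles / MILES_PER_DEGREE
    # A degree of longitude shrinks with cos(latitude), hence the division.
    dlon = dist_miles / abs(math.cos(math.radians(lat)) * MILES_PER_DEGREE)
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)

def in_box(point_lat, point_lon, box):
    """Rectangle test: the role st_within plays against the envelope."""
    lon_min, lat_min, lon_max, lat_max = box
    return lon_min <= point_lon <= lon_max and lat_min <= point_lat <= lat_max

box = bounding_box(37.615223, -122.389979, 10)
print(in_box(37.6145378, -122.3890954, box))  # tram stop from the sample output
```

Note that the rectangle is only a rough filter: its corners lie farther from the centre than the search distance, so an exact radius search still needs a distance check on the rows that pass.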

Related

How to merge two tables in a mysql query

First off, sorry: I know mysql is not recommended any more, but in this case I have no control and have to use it.
Now onto the question.
I have two tables
Games and videos
Inside games I have
| id | gameID | GameTitle |
| 1 | 1 | Halo ODST |
| 2 | 2 | Disgaea 4 |
Inside videos I have
| id | game | videoTitle | image |
| 1 | 1 | Title 1 | PATH |
| 2 | 1 | Title 2 | PATH |
| 3 | 2 | Title 3 | PATH |
| 4 | 1 | Title 4 | PATH |
I need to basically do the following
Select x,y,z from video where videos.game = games.gameID
which will basically read
select id, videoTitle, image from videos where video.game = 1
(or some other numeric value)
I’m aware I have to use a join, but nothing I have tried appears to be working and I’m getting nowhere with this.
The closest I have got is the query below, which runs without error but returns an empty result set, so clearly it's wrong somewhere.
SELECT * FROM `games` INNER JOIN `videos` on `game` WHERE `game` = 1
If it's any help, I'm using phpMyAdmin's SQL query tool rather than actual code at this stage, as I just want to get it working before coding it.
Any help is greatly appreciated.
Thanks.
SELECT *
FROM `games` g
INNER JOIN
`videos` v
ON v.game = g.gameId
WHERE g.gameId = 1
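To see the join above return rows, here is a small self-contained sketch that loads the sample data from the question into an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for MySQL purely so the example can be run anywhere; the SQL itself is the same shape as the query above, narrowed to the three columns the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE games  (id INTEGER, gameID INTEGER, GameTitle TEXT);
CREATE TABLE videos (id INTEGER, game INTEGER, videoTitle TEXT, image TEXT);
INSERT INTO games  VALUES (1, 1, 'Halo ODST'), (2, 2, 'Disgaea 4');
INSERT INTO videos VALUES (1, 1, 'Title 1', 'PATH'), (2, 1, 'Title 2', 'PATH'),
                          (3, 2, 'Title 3', 'PATH'), (4, 1, 'Title 4', 'PATH');
""")

# The ON clause states how the two tables relate; WHERE picks the game.
rows = conn.execute("""
    SELECT v.id, v.videoTitle, v.image
    FROM games g
    INNER JOIN videos v ON v.game = g.gameID
    WHERE g.gameID = ?
    ORDER BY v.id
""", (1,)).fetchall()
print(rows)
```

The query in the question returned nothing because its ON clause named a single column instead of comparing a column from each table.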
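You can also use a view to merge the tables and then run queries against it:
create view view_name as
select g.id as gameRowId, g.gameID, g.GameTitle, v.id as videoId, v.game, v.videoTitle, v.image
from games g
inner join videos v on v.game = g.gameID
The two id columns are aliased because a view cannot contain duplicate column names. Add a where clause, in the view or when selecting from it, with the ids or anything else you want to match.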

SQL Server 2008: Spatial Query - Return 5 closest sites

I'm trying to create an origin/destination matrix selection with SQL Server 2008. I want to find the closest 5 sites to any given site.
The matrix should include the origin ID, Destination ID and the distance between the two. So far I have managed to get something working for one site, but I want to loop through every row in my table. I've hit a wall in working out how to do this, could anybody help? I only want to return a destination if they are within 2.5km of the origin site.
The working code for my one origin site is below (I want the same output, but including all rows as an origin):
SP_Geometry is my geography column (MapInfo names this column by default when using EasyLoader)
DECLARE @Point1 AS Geography
DECLARE @Point1ID AS Nvarchar (255)
SELECT @Point1=SP_Geometry FROM SitesTable WHERE Label = 'ID1'
SELECT @Point1ID = Label FROM SitesTable WHERE Label = 'ID1'
SELECT TOP 5
@Point1ID AS Origin
,@Point1 AS Origin_SP_Geometry
,@Point1.STDistance(SP_Geometry) AS Distance
,Label AS Destination
,SP_Geometry AS Destination_SP_Geometry
FROM SiteTable
WHERE @Point1.STDistance(SP_Geometry) <2500
ORDER BY @Point1.STDistance(SP_Geometry)
Running the above results in the following selection:
+--------+---------------------+-------------+-------------+----------------------------+
| Origin | Origin_SP_GEOMETRY | Distance | Destination | Destination_SP_GEOMETRY |
+--------+---------------------+-------------+-------------+----------------------------+
| ID1 | 0xE6100000010CDD(…) | 0 | ID1 | 0xE6100000010CDD772D9D(…) |
| ID1 | 0xE6100000010CDD(…) | 395.7739586 | ID867 | 0xE6100000010C2466CDFA5(…) |
| ID1 | 0xE6100000010CDD(…) | 407.6394398 | ID2500 | 0xE6100000010C6FBC54(…) |
| ID1 | 0xE6100000010CDD(…) | 1033.827269 | ID91 | 0xE6100000010C3981C0353(…) |
| ID1 | 0xE6100000010CDD(…) | 1082.667065 | ID1540 | 0xE6100000010CD03BFCD2(…) |
+--------+---------------------+-------------+-------------+----------------------------+
Ideally this is exactly what I want, but am having trouble establishing any kind of loop (that would union origin ID2, ID3 etc.)
Any help would be much appreciated!
Try using the RANK function. I can't test since I don't have sample data so this might need a tweak but should be close...
;WITH cteDistances AS (
SELECT
origin.Label AS OriginId
,dest.Label AS DestinationId
,origin.SP_Geometry.STDistance(dest.SP_Geometry) AS Distance
,RANK() OVER (PARTITION BY origin.Label ORDER BY origin.SP_Geometry.STDistance(dest.SP_Geometry)) AS DistanceRank
FROM
SiteTable origin
INNER JOIN SiteTable dest ON (dest.Label <> origin.Label)
WHERE
origin.SP_Geometry.STDistance(dest.SP_Geometry) < 2500
)
SELECT
OriginId, DestinationId, Distance
FROM
cteDistances
WHERE
DistanceRank <= 5
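The shape of the result (for every origin, the nearest few destinations within a cutoff) can be illustrated outside the database with a short Python sketch. Everything here is an assumption made for illustration: the site labels, the planar coordinates in metres, and the helper name `nearest_per_origin`; real geography distances would come from STDistance. One difference worth knowing: this sketch keeps at most k rows per origin, whereas RANK() can return more than 5 when distances tie.

```python
import heapq
import math

def nearest_per_origin(sites, k=5, max_dist=2500.0):
    """sites: dict of label -> (x, y) in metres (hypothetical planar coords).

    Returns dict of origin label -> list of (distance, destination label),
    nearest first, at most k entries, all closer than max_dist.
    """
    result = {}
    for origin, (ox, oy) in sites.items():
        candidates = []
        for dest, (dx, dy) in sites.items():
            if dest == origin:  # same as dest.Label <> origin.Label
                continue
            d = math.hypot(dx - ox, dy - oy)
            if d < max_dist:    # same as the WHERE ... < 2500 filter
                candidates.append((d, dest))
        result[origin] = heapq.nsmallest(k, candidates)
    return result

sites = {"ID1": (0, 0), "ID2": (300, 400), "ID3": (3000, 0), "ID4": (0, 1000)}
matrix = nearest_per_origin(sites)
print(matrix["ID1"])
```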

MySQL select to find similar lat/lng with matching name column

I am trying to find rows in a single table of locations that have the same latitude/longitude when rounded to 2 decimal places as well as the same name. Here is my table (for example):
+---------------------------------------+
| ID | lat | lng | name |
+---------------------------------------+
| 11 | -11.119 | 13.891 | Smith's Place |
| 81 | -11.121 | 13.893 | Smith's Place |
+---------------------------------------+
What SELECT statement would find instances (like the one above) where the lat/lng match when rounded to 2 decimal places...and the names are the same?
I am looking for something similar to this query that obviously doesn't work (but is asking for what I am after):
SELECT * FROM pb_locations GROUP BY ROUND(lat,2),ROUND(lng,2) WHERE name = name HAVING count(ID) > 1
WHERE name = name is always true, since it's just comparing within the same row, not across different rows.
You need to put all 3 columns in the GROUP BY clause.
SELECT *
FROM pb_locations
GROUP BY ROUND(lat, 2), ROUND(lng, 2), name
HAVING COUNT(*) > 1
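A quick way to check the grouping is to run it on the two sample rows. The sketch below uses an in-memory SQLite database through Python's sqlite3 module instead of MySQL, only so it is self-contained; the third row is a made-up non-duplicate added for contrast. It selects the grouped expressions and a count rather than `*`, since with `SELECT *` the non-grouped columns come from an arbitrary row of each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pb_locations (ID INTEGER, lat REAL, lng REAL, name TEXT);
INSERT INTO pb_locations VALUES
    (11, -11.119, 13.891, 'Smith''s Place'),
    (81, -11.121, 13.893, 'Smith''s Place'),
    (90, -20.500, 30.250, 'Other Place');  -- hypothetical extra row
""")

# One output row per (rounded lat, rounded lng, name) that occurs more than once.
dupes = conn.execute("""
    SELECT ROUND(lat, 2) AS lat2, ROUND(lng, 2) AS lng2, name, COUNT(*) AS n
    FROM pb_locations
    GROUP BY ROUND(lat, 2), ROUND(lng, 2), name
    HAVING COUNT(*) > 1
""").fetchall()
print(dupes)
```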

How can I optimize this stored procedure?

I need some help optimizing this procedure:
DELIMITER $$
CREATE DEFINER=`ryan`@`%` PROCEDURE `GetCitiesInRadius`(
cityID numeric (15),
`range` numeric (15)
)
BEGIN
DECLARE lat1 decimal (5,2);
DECLARE long1 decimal (5,2);
DECLARE rangeFactor decimal (7,6);
SET rangeFactor = 0.014457;
SELECT `latitude`,`longitude` into lat1,long1
FROM world_cities as wc WHERE city_id = cityID;
SELECT
wc.city_id,
wc.accent_city as city,
s.state_name as state,
c.short_name as country,
GetDistance(lat1, long1, wc.`latitude`, wc.`longitude`) as dist
FROM world_cities as wc
left join states s on wc.state_id = s.state_id
left join countries c on wc.country_id = c.country_id
WHERE
wc.`latitude` BETWEEN lat1 -(`range` * rangeFactor) AND lat1 + (`range` * rangeFactor)
AND wc.`longitude` BETWEEN long1 - (`range` * rangeFactor) AND long1 + (`range` * rangeFactor)
AND GetDistance(lat1, long1, wc.`latitude`, wc.`longitude`) <= `range`
ORDER BY dist limit 6;
END
Here is my explain on the main portion of the query:
+----+-------------+-------+--------+---------------+--------------+---------+--------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+--------------+---------+--------------------------+------+----------------------------------------------+
| 1 | SIMPLE | B | range | idx_lat_long | idx_lat_long | 12 | NULL | 7619 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | s | eq_ref | PRIMARY | PRIMARY | 4 | civilipedia.B.state_id | 1 | |
| 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 1 | civilipedia.B.country_id | 1 | Using where |
+----+-------------+-------+--------+---------------+--------------+---------+--------------------------+------+----------------------------------------------+
3 rows in set (0.00 sec)
Here are the indexes:
mysql> show indexes from world_cities;
+--------------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| world_cities | 0 | PRIMARY | 1 | city_id | A | 3173958 | NULL | NULL | | BTREE | |
| world_cities | 1 | country_id | 1 | country_id | A | 23510 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | city | 1 | city | A | 3173958 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | accent_city | 1 | accent_city | A | 3173958 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | idx_pop | 1 | population | A | 28854 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | idx_lat_long | 1 | latitude | A | 1057986 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | idx_lat_long | 2 | longitude | A | 3173958 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | accent_city_2 | 1 | accent_city | NULL | 1586979 | NULL | NULL | YES | FULLTEXT | |
+--------------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
8 rows in set (0.01 sec)
I wouldn't expect the function in the query to cause the slowdown, but here it is:
CREATE DEFINER=`ryan`@`%` FUNCTION `GetDistance`(lat1 numeric (9,6),
lon1 numeric (9,6),
lat2 numeric (9,6),
lon2 numeric (9,6) ) RETURNS decimal(10,5)
BEGIN
DECLARE x decimal (20,10);
DECLARE pi decimal (21,20);
SET pi = 3.14159265358979323846;
SET x = sin( lat1 * pi/180 ) * sin( lat2 * pi/180 ) + cos(
lat1 *pi/180 ) * cos( lat2 * pi/180 ) * cos( (lon2 * pi/180) -
(lon1 *pi/180)
);
SET x = atan( ( sqrt( 1- power( x, 2 ) ) ) / x );
RETURN ( 1.852 * 60.0 * ((x/pi)*180) ) / 1.609344;
END
As far as I can tell there is nothing directly wrong with your logic that would make this slow, so the problem ends up being that the query can hardly use any indexes.
MySQL has to apply the functions in your WHERE clause to every row that survives the index lookup to determine whether it passes the conditions. Currently there's 1 index used: idx_lat_long.
It's a bit of a bad index: the long portion will never be used, because the lat portion is a float. But at the very least you managed to filter out all rows that are outside the latitude range. It's likely that a lot of rows are still left, though.
You'd actually get slightly better results on the longitude, because humans only really live in the middle 30% of the earth. We're very much spread out horizontally, but not really vertically.
Regardless, the best way to further narrow the field is to filter out as many records as possible outside the general area. Right now the filter is a full strip around the earth; try to make it a bounding box.
You could naively dice up the earth in say, 10x10 segments. This would in a best case make sure the query is limited to 10% of the earth ;).
But as soon as your bounding box spans two separate segments, only the first coordinate (lat or lng) can be used in the index and you end up with the same problem.
So when I thought about this problem I started approaching it differently. Instead, I divided up the earth in 4 segments (let's say north east, north west, south east and south west on a map). So this gives me coordinates like:
0,0
0,1
1,0
1,1
Instead of putting the x and y values in 2 separate fields, I used them as a bit field and stored both at once.
Then I divided each of the 4 boxes up again, which gives us 2 sets of coordinates: the outer and the inner coordinates. I'm still encoding this in the same field, which means we now use 4 bits for our 4x4 coordinate system.
How far can we go? If we assume a 64 bit integer field, it means that 32 bits can be used for each of the 2 coordinates. This gives us a grid system of 4294967295 x 4294967295, all encoded into one database field.
The beauty of this field is that you can index it. This is sometimes called (I believe) a Quad-tree. If you need to select a big area in your database, you just calculate the 64 bit value for the top-left coordinate (in the 4294967295 x 4294967295 grid system) and for the bottom-right, and it's guaranteed that anything that lies in that box will also lie between the two numbers.
How do you get to those numbers? Let's be lazy and assume that both our x and y coordinates range from -180 to 180 degrees. (The y coordinate of course is half that, but we're lazy.)
First we make it positive:
// assuming x and y are our long and lat.
x += 180;
y += 180;
So the max for those is 360 now, and 4294967295 / 360 is around 11930464.
So to convert to our new grid system, we just do:
x *= 11930464;
y *= 11930464;
Now we have two distinct numbers, and we need to turn them into 1 number. First bit 1 of x, then bit 1 of y, bit 2 of x, bit 2 of y, etc.
// The 'morton number'
morton = 0
// The current bit we're interleaving
bit = 1
// The position of the bit we're interleaving
position = 0
while(bit <= latitude or bit <= longitude) {
if (bit & latitude) morton = morton | 1 << (2*position+1)
if (bit & longitude) morton = morton | 1 << (2*position)
position += 1
bit = 1 << position
}
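As a runnable counterpart to the pseudocode, here is a minimal Python sketch of the same interleaving. It assumes the inputs are already non-negative integers; the `to_grid` helper and its name are made up for this example and simply apply the "+180, times 11930464" scaling described in the text. Python integers do not overflow, so no 64 bit handling is needed here.

```python
SCALE = 11930464  # roughly 4294967295 / 360, as in the text

def to_grid(lat, lng):
    """Map degrees to non-negative grid integers (lazy +180 for both axes)."""
    return int((lat + 180) * SCALE), int((lng + 180) * SCALE)

def morton(lat_i, lng_i):
    """Interleave the bits of two non-negative integers.

    Longitude bits land on even positions, latitude bits on odd positions,
    matching the pseudocode above.
    """
    code = 0
    position = 0
    while (lat_i >> position) or (lng_i >> position):
        if (lat_i >> position) & 1:
            code |= 1 << (2 * position + 1)
        if (lng_i >> position) & 1:
            code |= 1 << (2 * position)
        position += 1
    return code

# lat = 0b10, lng = 0b01 interleave to 0b1001
print(bin(morton(0b10, 0b01)))
```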
I'm calling the final variable 'morton', after the guy who came up with it in 1966.
So this leaves us finally with the following:
For each row in your database, calculate the morton number and store it.
Whenever you do a query, first determine the maximum bounding box (as the morton number) and filter on that.
This will greatly reduce the number of records you need to check.
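The two steps above can be sketched in Python to show why the rough filter is safe. This is an illustration on made-up data (a small 256 x 256 grid of random integer points), not a benchmark; the function names are hypothetical. The rough filter keeps every row whose stored number lies between the numbers of the box's minimum and maximum corners, and the exact check then removes the extra rows the rough filter lets through.

```python
import random

def morton(y, x):
    """Interleave 32-bit x (even positions) and y (odd positions)."""
    code = 0
    for pos in range(32):
        code |= ((x >> pos) & 1) << (2 * pos)
        code |= ((y >> pos) & 1) << (2 * pos + 1)
    return code

def candidates(rows, y_min, x_min, y_max, x_max):
    """rows: list of (morton_code, y, x) tuples.

    Returns (rough, exact): rows passing the Morton range filter, and the
    subset of those that really lie inside the box.
    """
    lo, hi = morton(y_min, x_min), morton(y_max, x_max)
    rough = [r for r in rows if lo <= r[0] <= hi]
    exact = [r for r in rough
             if y_min <= r[1] <= y_max and x_min <= r[2] <= x_max]
    return rough, exact

random.seed(0)
points = [(random.randrange(256), random.randrange(256)) for _ in range(2000)]
rows = [(morton(y, x), y, x) for y, x in points]  # "calculate and store"
rough, exact = candidates(rows, 40, 40, 60, 60)
print(len(rows), len(rough), len(exact))
```

In a database the rough filter would be a BETWEEN on the indexed column, which is where the saving comes from; the exact check only runs on the rows that survive it.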
Here's a stored function I wrote that will do the calculation for you:
CREATE FUNCTION getGeoMorton(lat DOUBLE, lng DOUBLE) RETURNS BIGINT UNSIGNED DETERMINISTIC
BEGIN
-- 11930464 is round(maximum value of a 32bit integer / 360 degrees)
DECLARE bit, morton, pos BIGINT UNSIGNED DEFAULT 0;
SET @lat = CAST((lat + 90) * 11930464 AS UNSIGNED);
SET @lng = CAST((lng + 180) * 11930464 AS UNSIGNED);
SET bit = 1;
WHILE bit <= @lat || bit <= @lng DO
IF(bit & @lat) THEN SET morton = morton | ( 1 << (2 * pos + 1)); END IF;
IF(bit & @lng) THEN SET morton = morton | ( 1 << (2 * pos)); END IF;
SET pos = pos + 1;
SET bit = 1 << pos;
END WHILE;
RETURN morton;
END;
A few caveats:
The absolute worst case scenario will still scan 50% of your entire table. This chance is extremely low though, and I've seen absolutely significant performance increases for most real-world queries.
The bounding box in this case assumes a Euclidean space, meaning a flat surface. In reality your bounding boxes are not exact squares, and they warp heavily when getting closer to the poles. By just making the boxes a bit larger (depending on how exact you want to be) you can get quite far. Most real-world data is also often not close to the poles ;). Remember that this filter is just a 'rough filter' to get most of the likely unwanted rows out.
This is based on a so-called Z-Order curve. To get even better performance, if you're feeling adventurous, you could try to go for the Hilbert Curve instead. This curve oddly rotates, which ensures that in a worst case scenario you will only scan about 25% of the table. Magic! In general this one will also filter out many more unwanted rows.
Source for all this: I wrote 3 blogposts about this topic when I came to the same problems and tried to creatively get to a solution. I got much better performance with this compared to MySQL's GEO indexes.
http://www.rooftopsolutions.nl/blog/229
http://www.rooftopsolutions.nl/blog/230
http://www.rooftopsolutions.nl/blog/231

selecting specific row number from select

I'm a beginner at MySQL syntax, so there are a few questions I want to ask.
I have a clue DB where users can add clues, and a webmethod that selects a range of numbers and picks from it at random. (Random for the sake of the game; there's no point doing the same clue over and over, right?)
But my main problem right now is: what if the author decides to add more clues?
Then my clue DB will look like this.
+--------+-------------+-----------+--------+
| cID | clueDetails | location | author |
+--------+-------------+-----------+--------+
| 1 | abcde | loc 1 | auth 1 |
| 2 | efghi | loc 1 | auth 1 |
| 3 | jklmno | loc 2 | auth 1 |
| 4 | pqrstu | loc 2 | auth 1 |
| 5 | vwxyz | loc 1 | auth 1 |
+--------+-------------+-----------+--------+
If the player selects loc 1, auth 1, it will show cID 1, 2 and 5. So I can't use my random function effectively, as it picks between the first and last cID for that loc and auth, and 3 and 4 don't fit in. I know this is vague, as information is scarce; to understand the whole process you would have to go right down to the game and the methods/functions I have (which would be very long).
Cutting to the chase, my result will be something like the table below. The way to identify a clue is by cID, but if clues were added in a different order (as shown above), my function gets rather screwed up.
EDIT: assume this random function gives me back 2 clues, because I want to play 2 clues, and it gives me back 1 and 3. From the result table below, 1 and 3 should give me cID 1 and cID 5, as they are row numbers 1 and 3. (Sorry for the confusion caused.)
+--------+-------------+-----------+--------+
| cID | clueDetails | location | author |
+--------+-------------+-----------+--------+
| 1 | abcde | loc 1 | auth 1 |
| 2 | efghi | loc 1 | auth 1 |
| 5 | vwxyz | loc 1 | auth 1 |
+--------+-------------+-----------+--------+
So with that, I want to ask: can we select a row by its number? e.g. row[3] = cID 5, vwxyz, loc 1, auth 1.
As far as I can tell after a lot of research, there doesn't seem to be any function in MySQL that allows us to select by row number. (Though all the articles were pretty old, 2010 and before. Not sure if MySQL has added any new function.)
I saw an SO thread, MySQL - Get row number on select, and from what I can see it generates a field called ranking.
What I want to know is: is this ranking field temporary or permanent? Because if it's just a temporary field, then I could shift the identifier from cID to this numbering.
Or do any of you have a suggestion for getting around this issue? I thought of clearing the DB and re-creating it, but that would take too much time, and as the DB gets large it will get slower as well. Another method is to make a datatable, fill it with all the current clues where loc=?loc and auth=?auth, and add them in again along with the new (latest) clue, but I figure that will cause the cID to grow at a very fast rate. And I'm afraid this will cause memory management issues / a memory leak.
EDIT2: As the created field is just a temporary field, and seems to be the only alternative, I tried this MySQL command.
set @rank=0;
select @rank:=@rank+1 AS rank, cId, clueDetails, location, author from tbl_clue where location = "loc" and author = "auth" order by rank ASC
It seems to display what I want, but my command looks different from what others usually give (more brackets and other stuff). Is my command OK? Will it have any indirect implications?
You can try this one. Please add a comment if this helps :)
SELECT cID, clueDetails, location, author
FROM
(
SELECT @rownum := @rownum + 1 as `RowNo`,
p.cID,
p.clueDetails,
p.location,
p.author
FROM (
SELECT cID, clueDetails, location, author
FROM myTableName
WHERE location = 'loc 1' AND author = 'auth 1'
) p , (SELECT @rownum:=0) r
) y
WHERE y.RowNo = 3
ORDER BY RowNo
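A different technique from the user-variable queries above, offered only as a sketch: if all that is needed is "the n-th row of the filtered result", ORDER BY combined with LIMIT 1 OFFSET n-1 picks it directly, and MySQL supports that syntax too. The example runs against an in-memory SQLite database through Python's sqlite3 module so it is self-contained; the table name tbl_clue and the data come from the question, while the helper name `nth_clue` is made up. The explicit ORDER BY cID matters: without an ordering, "row 3" is not well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_clue (cID INTEGER PRIMARY KEY, clueDetails TEXT,
                       location TEXT, author TEXT);
INSERT INTO tbl_clue VALUES
    (1, 'abcde',  'loc 1', 'auth 1'),
    (2, 'efghi',  'loc 1', 'auth 1'),
    (3, 'jklmno', 'loc 2', 'auth 1'),
    (4, 'pqrstu', 'loc 2', 'auth 1'),
    (5, 'vwxyz',  'loc 1', 'auth 1');
""")

def nth_clue(n, location, author):
    """Return the n-th (1-based) clue for a location/author, or None."""
    return conn.execute(
        "SELECT cID, clueDetails, location, author FROM tbl_clue "
        "WHERE location = ? AND author = ? "
        "ORDER BY cID LIMIT 1 OFFSET ?",
        (location, author, n - 1),
    ).fetchone()

print(nth_clue(3, "loc 1", "auth 1"))
```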
I'm not sure if I understand you correctly, but assuming you end up with:
+--------+-------------+-----------+--------+
| cID | clueDetails | location | author |
+--------+-------------+-----------+--------+
| 1 | abcde | loc 1 | auth 1 |
| 2 | efghi | loc 1 | auth 1 |
| 5 | vwxyz | loc 1 | auth 1 |
+--------+-------------+-----------+--------+
and you only want one record at random instead of 3 records you could do the following:
$query = "THE QUERY";
if ($result = $dbc->query($query))
{
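$num_rows = $result->num_rows; // mysqli result property; mysql_num_rows() belongs to the old mysql extension
$random_number = rand(1, $num_rows);
$count = 1;
while($nt = $result->fetch_assoc())
{
if ($count == $random_number) // compare with ==; a single = would assign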
{
//SAVE THE CLUE DETAILS
}
$count = $count + 1;
}
}