Selecting Unique records - mysql

I'm working on a Geo/Spatial search where I'm looking for nearby points. I have this Haversine query being run against a table:
SELECT
uid, adrLat, adrLng,
round(3956 * 2 * ASIN(SQRT(POWER(SIN((39.97780609 - abs(adrLat)) * pi() / 180 / 2), 2) + COS(39.97780609 * pi() / 180) * COS(abs(adrLat) * pi() / 180) * POWER(SIN((-105.25861359 - adrLng) * pi() / 180 / 2), 2))), 2) AS distance
FROM dataPoints
WHERE adrLng BETWEEN -105.2680699902 AND -105.2491571898
AND adrLat BETWEEN 39.970559713188 AND 39.985052466812
HAVING distance <= 0.30 and distance > 0.00
ORDER BY distance;
This would give me a result much like this:
+-----+-------------+---------------+----------+
| uid | adrLat | adrLng | distance |
+-----+-------------+---------------+----------+
| 191 | 39.97764587 | -105.25627136 | 0.12 |
| 520 | 39.97746658 | -105.25627136 | 0.13 |
| 265 | 39.97560120 | -105.25814056 | 0.15 |
| 266 | 39.97560120 | -105.25814056 | 0.15 |
| 274 | 39.97710037 | -105.25589752 | 0.15 |
| 98 | 39.97764969 | -105.26172638 | 0.17 |
| 576 | 39.97967911 | -105.25613403 | 0.18 |
| 575 | 39.97967911 | -105.25613403 | 0.18 |
| 469 | 39.97895813 | -105.25386810 | 0.26 |
| 470 | 39.97895813 | -105.25386810 | 0.26 |
| 1 | 39.98003006 | -105.25471497 | 0.26 |
| 383 | 39.97621155 | -105.26350403 | 0.28 |
| 431 | 39.97459793 | -105.25507355 | 0.29 |
| 430 | 39.97459793 | -105.25507355 | 0.29 |
| 429 | 39.97459793 | -105.25507355 | 0.29 |
| 428 | 39.97459793 | -105.25507355 | 0.29 |
+-----+-------------+---------------+----------+
However, as you can probably tell, some records are duplicated in the table (that's the way the data is provided to me, and I have to retain it that way). Records 265/266, 575/576, 469/470 and 428 to 431 are all duplicates.
Is there a way to modify the query to return unique records only? It looks like I have to match adrLat and adrLng to filter out duplicates, but I'm not sure whether I can do it all within the same query or whether I have to post-process the result.

SELECT adrLat, adrLng,
round(3956 * 2 * ASIN(SQRT(POWER(SIN((39.97780609 - abs(adrLat)) * pi() / 180 / 2), 2) + COS(39.97780609 * pi() / 180) * COS(abs(adrLat) * pi() / 180) * POWER(SIN((-105.25861359 - adrLng) * pi() / 180 / 2), 2))), 2) AS distance
FROM dataPoints
WHERE adrLng BETWEEN -105.2680699902 AND -105.2491571898
AND adrLat BETWEEN 39.970559713188 AND 39.985052466812
GROUP BY
adrLat, adrLng
HAVING distance <= 0.30
AND distance > 0.00
ORDER BY
distance
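A minimal Python sketch of the same GROUP BY idea, applied to the sample rows from the question: it keeps one row per (adrLat, adrLng) pair and, to cover the need for all four columns, keeps the smallest uid as the representative. Picking the smallest uid is an assumption made for illustration; in MySQL the equivalent would be adding MIN(uid) AS uid to the select list of the query above. Coordinates are kept as strings so duplicates are matched exactly rather than through float rounding, and the function name is hypothetical.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    uid: int
    lat: str  # kept as text so duplicate coordinates match exactly
    lng: str
    distance: float


# Sample result rows copied from the question.
ROWS = [
    Row(191, "39.97764587", "-105.25627136", 0.12),
    Row(520, "39.97746658", "-105.25627136", 0.13),
    Row(265, "39.97560120", "-105.25814056", 0.15),
    Row(266, "39.97560120", "-105.25814056", 0.15),
    Row(274, "39.97710037", "-105.25589752", 0.15),
    Row(98, "39.97764969", "-105.26172638", 0.17),
    Row(576, "39.97967911", "-105.25613403", 0.18),
    Row(575, "39.97967911", "-105.25613403", 0.18),
    Row(469, "39.97895813", "-105.25386810", 0.26),
    Row(470, "39.97895813", "-105.25386810", 0.26),
    Row(1, "39.98003006", "-105.25471497", 0.26),
    Row(383, "39.97621155", "-105.26350403", 0.28),
    Row(431, "39.97459793", "-105.25507355", 0.29),
    Row(430, "39.97459793", "-105.25507355", 0.29),
    Row(429, "39.97459793", "-105.25507355", 0.29),
    Row(428, "39.97459793", "-105.25507355", 0.29),
]


def unique_by_coordinates(rows: List[Row]) -> List[Row]:
    """Keep one row per (lat, lng) pair, like GROUP BY adrLat, adrLng with MIN(uid)."""
    best = {}
    for row in rows:
        key = (row.lat, row.lng)
        if key not in best or row.uid < best[key].uid:
            best[key] = row
    # Re-sort like ORDER BY distance, uid.
    return sorted(best.values(), key=lambda r: (r.distance, r.uid))


for row in unique_by_coordinates(ROWS):
    print(row.uid, row.lat, row.lng, row.distance)
```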

The SELECT keyword allows us to grab all information from a column (or columns) in a table. This, of course, means that the result can contain redundant values. What if we only want to select each DISTINCT element? This is easy to accomplish in SQL: all we need to do is add DISTINCT after SELECT. The syntax is as follows:
SELECT DISTINCT column_name FROM table

I still need all four columns returned
You've already got unique data there (e.g. uid 576 and 575 have the same coordinates, but the uid is obviously different).
Your definition of 'unique' is obviously different from ours; can you provide an example of what you expect to see?
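The point about uid can be checked with a small Python sketch of what SELECT DISTINCT does: it compares whole rows, so as long as uid is in the select list, the duplicated coordinates survive. The rows are a subset of the question's results; the distinct helper is hypothetical, written only for illustration.

```python
# A few rows (uid, adrLat, adrLng) taken from the question's result.
rows = [
    (576, "39.97967911", "-105.25613403"),
    (575, "39.97967911", "-105.25613403"),
    (469, "39.97895813", "-105.25386810"),
    (470, "39.97895813", "-105.25386810"),
]


def distinct(rows):
    """Mimic SELECT DISTINCT: drop rows that are identical in every selected column.

    dict.fromkeys keeps the first occurrence of each row; SQL itself promises no order.
    """
    return list(dict.fromkeys(rows))


with_uid = distinct(rows)  # uid differs, so no row counts as a duplicate
without_uid = distinct([(lat, lng) for _uid, lat, lng in rows])

print(len(with_uid), len(without_uid))
```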

Related

How do you sort a multiple column query (BY WEIGHT NOT ORDER) in an SQL Query?

To keep the question simple I'm going to give much simpler and abstract code than the actual SQL query.
At the moment I'm running two queries. The first gets the MAX value of several fields. For example, if table.likes is a column that should influence the order of the final results, I get its maximum with MAX(table.likes) AS max_likes and then compute the ratio of each row's value to that maximum:
(table.likes / max_table.max_likes) AS like_ratio
(table.comments / max_table.max_comments) AS comment_ratio
This gives me a nice range of [0, 1]. I can then increase or decrease the importance of each column by applying a scale: .3 for like_ratio and .2 for comment_ratio. So like_ratio maps onto [.7, 1] and comment_ratio onto [.8, 1]:
((like_ratio * .3) + .7) * ((comment_ratio * .2) + .8) AS final_weight
This seems to work moderately well, but I'm wondering whether there's a better way in MySQL to weigh multiple columns in a single sort field. Sorting by a comma-separated list of columns (ORDER BY likes, comments) doesn't work well: rows rarely have the same number of table.likes, so later columns such as table.comments are effectively ignored. I also don't like having to run one query to find all the max values and then run the same query again to sort based on max_table.
I've played around with the idea of using ATAN(table.likes), so that as table.likes increases, the weight approaches a fixed upper bound (π/2 for ATAN, or 1 after dividing by π/2). This doesn't seem ideal, because values past a certain threshold become nearly indistinguishable.
Is there a "meta" to how you should sort if multiple columns are important to the final sort order?
EDIT: EXAMPLE DATA
+---+------------+-------------+-----------------+
| | likes | comments | relevance |
+---+------------+-------------+-----------------+
| 1 | 6 | 1 | 40 |
| 2 | 2 | 12 | 37 |
| 3 | 12 | 24 | 12 |
+---+------------+-------------+-----------------+
First I select MAX(table.likes): 12, MAX(table.comments): 24, MAX(table.query_relevance): 40.
+---+------------+-------------+-----------------+
| | max_likes |max_comments | max_relevance |
+---+------------+-------------+-----------------+
| 1 | 12 | 24 | 40 |
+---+------------+-------------+-----------------+
Next I get the ratio of each row to its column's maximum: likes / max_likes gives 6/12, 2/12 and 12/12, and likewise for each of the other columns.
+---+------------+--------------+-----------------+
| |like_weight |comment_weight| relevance_weight|
+---+------------+--------------+-----------------+
| 1 | .5 | .04 | 1 |
+---+------------+--------------+-----------------+
| 2 | .16 | .5 | .92 |
+---+------------+--------------+-----------------+
| 3 | 1 | 1 | .3 |
+---+------------+--------------+-----------------+
Next, I apply some sort of scale to each field so that different fields have different weights.
+---+-------------+--------------+-----------------+
| |like_weight |comment_weight| relevance_weight|
+---+-------------+--------------+-----------------+
| 1 |.5 * .3 + .7 | .04 * .2 + .8| 1 * .4 + .6 |
+---+-------------+--------------+-----------------+
| 2 |.16 * .3 + .7| .5 * .2 + .8 | .92 * .4 + .6 |
+---+-------------+--------------+-----------------+
| 3 | 1 * .3 + .7 | 1 * .2 + .8 | .3 * .4 + .6 |
+---+-------------+--------------+-----------------+
+---+-------------+--------------+-----------------+
| |like_weight |comment_weight| relevance_weight|
+---+-------------+--------------+-----------------+
| 1 |.85 | .808 | 1 |
+---+-------------+--------------+-----------------+
| 2 |.748 | .9 | .968 |
+---+-------------+--------------+-----------------+
| 3 | 1 | 1 | .72 |
+---+-------------+--------------+-----------------+
Finally I multiply all these values together to get a final sort column...
+---+------------+
| |final_weight|
+---+------------+
| 1 | .6868 |
+---+------------+
| 2 | .6516 |
+---+------------+
| 3 | .72 |
+---+------------+
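The normalize-then-scale steps above can be sketched in Python using the example data: compute each column's maximum, map each ratio through ratio * scale + (1 - scale), and multiply the factors. Because the sketch keeps unrounded ratios (1/24 rather than .04, 37/40 rather than .92), rows 1 and 2 come out at about .6871 and .6548 instead of .6868 and .6516; the resulting order (3, 1, 2) is the same. It assumes non-negative column values, and the function name is illustrative. On the two-query concern: MySQL 8.0 and later supports window functions, so the maxima could be computed in the same query with expressions such as MAX(likes) OVER ().

```python
from typing import Dict

# Example data from the question: id -> column values.
ROWS: Dict[int, Dict[str, float]] = {
    1: {"likes": 6, "comments": 1, "relevance": 40},
    2: {"likes": 2, "comments": 12, "relevance": 37},
    3: {"likes": 12, "comments": 24, "relevance": 12},
}

# Share of each factor that depends on the column: .3 maps likes onto [.7, 1].
SCALES = {"likes": 0.3, "comments": 0.2, "relevance": 0.4}


def final_weights(rows, scales):
    maxima = {col: max(r[col] for r in rows.values()) for col in scales}
    weights = {}
    for row_id, row in rows.items():
        weight = 1.0
        for col, scale in scales.items():
            # Ratio to the column maximum; guard against an all-zero column.
            ratio = row[col] / maxima[col] if maxima[col] else 0.0
            weight *= ratio * scale + (1 - scale)
        weights[row_id] = weight
    return weights


weights = final_weights(ROWS, SCALES)
ranking = sorted(weights, key=weights.get, reverse=True)
print(ranking, {k: round(v, 4) for k, v in weights.items()})
```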

How to get nearest coordinates from database in mysql?

I have a table with id, latitude (lat), longitude (lng) and altitude (alt).
I have some coordinates and I would like to find the closest entry in the DB.
I tried this, but it doesn't work correctly yet:
SELECT lat,ABS(lat - TestCordLat), lng, ABS(lng - TestCordLng), alt AS distance
FROM dhm200
ORDER BY distance
LIMIT 6
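I'd like to get a table of the 6 nearest points, showing their latitude, longitude and altitude.
Query to get the nearest points from MySQL, with the distance in miles (69.1 is roughly the number of miles per degree of latitude; replace both occurrences with about 111.045 for kilometers):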
SELECT id, latitude, longitude, SQRT( POW(69.1 * (latitude - 4.66455174) , 2) + POW(69.1 * (-74.07867091 - longitude) * COS(latitude / 57.3) , 2)) AS distance FROM ranks ORDER BY distance ASC;
You may wish to limit the radius with a HAVING clause:
... AS distance FROM ranks HAVING distance < 150 ORDER BY distance ASC;
Example:
mysql> describe ranks;
+------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+----------------+
| id | int | NO | PRI | NULL | auto_increment |
| latitude | decimal(10,8) | YES | MUL | NULL | |
| longitude | decimal(11,8) | YES | | NULL | |
+------------+---------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
mysql> SELECT id, latitude, longitude, SQRT( POW(69.1 * (latitude - 4.66455174) , 2) + POW(69.1 * (-74.07867091 - longitude) * COS(latitude / 57.3) , 2)) AS distance FROM ranks ORDER BY distance ASC;
+----+-------------+--------------+--------------------+
| id | latitude | longitude | distance |
+----+-------------+--------------+--------------------+
| 4 | 4.66455174 | -74.07867091 | 0 |
| 10 | 4.13510880 | -73.63690401 | 47.59647003096195 |
| 11 | 6.55526689 | -73.13373892 | 145.86590936973073 |
| 5 | 6.24478548 | -75.57050110 | 149.74731096011348 |
| 7 | 7.06125013 | -73.84928550 | 166.35723903407165 |
| 9 | 3.48835279 | -76.51532198 | 186.68173882319724 |
| 8 | 7.88475514 | -72.49432589 | 247.53456848808233 |
| 1 | 60.00001000 | 101.00001000 | 7156.836171031409 |
| 3 | 60.00001000 | 101.00001000 | 7156.836171031409 |
+----+-------------+--------------+--------------------+
9 rows in set (0.00 sec)
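To check what the query above computes, here is a small Python transcription of the same flat-earth approximation, run on some of the example rows. The constant 69.1 is roughly the number of miles per degree of latitude, so the result is in miles, and 57.3 approximates degrees per radian. The function names are illustrative; the approximation degrades over long distances, where the Haversine formula is the safer choice.

```python
import math

ORIGIN = (4.66455174, -74.07867091)  # the reference point used in the query

# (id, latitude, longitude) rows copied from the example table.
RANKS = [
    (4, 4.66455174, -74.07867091),
    (10, 4.13510880, -73.63690401),
    (11, 6.55526689, -73.13373892),
    (5, 6.24478548, -75.57050110),
    (7, 7.06125013, -73.84928550),
]


def approx_distance_miles(lat, lng, origin=ORIGIN):
    """Same expression as the SQL: SQRT(POW(69.1*dlat,2) + POW(69.1*dlng*COS(lat/57.3),2))."""
    origin_lat, origin_lng = origin
    dy = 69.1 * (lat - origin_lat)
    dx = 69.1 * (origin_lng - lng) * math.cos(lat / 57.3)
    return math.sqrt(dy ** 2 + dx ** 2)


def within_radius(rows, max_miles):
    """Like ... HAVING distance < max_miles ORDER BY distance ASC."""
    hits = sorted((approx_distance_miles(lat, lng), row_id) for row_id, lat, lng in rows)
    return [row_id for dist, row_id in hits if dist < max_miles]


print(within_radius(RANKS, 150))
```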
You will need to use the Haversine formula to calculate distances from latitude and longitude (with all angles in radians):
dlon = lon2 - lon1
dlat = lat2 - lat1
a = (sin(dlat/2))^2 + cos(lat1) * cos(lat2) * (sin(dlon/2))^2
c = 2 * atan2( sqrt(a), sqrt(1-a) )
distance = R * c (where R is the radius of the Earth)
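The pseudocode above translates almost line for line into Python. This sketch assumes inputs in degrees (converted to radians first) and takes the radius as a parameter, so the result is in the radius's unit: about 6371 for kilometers, or 3956 for miles as in the first question's query. The function name is illustrative.

```python
import math


def haversine(lat1, lon1, lat2, lon2, radius):
    """Great-circle distance between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return radius * c


# One degree of longitude along the equator, in km (R = 6371 km).
print(haversine(0, 0, 0, 1, 6371))
# uid 191 from the first question, in miles (R = 3956 miles).
print(round(haversine(39.97780609, -105.25861359, 39.97764587, -105.25627136, 3956), 2))
```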
However, altitude makes the problem harder. If points A and B are at different altitudes and the terrain between them rises and falls a lot, assuming a constant slope between the two points can be misleading, and ignoring altitude altogether can be very misleading. Compare the distance between a point in China and a point in India with the Himalayas in between to the distance between two points on the surface of the Pacific Ocean. One possibility would be to adjust R by the average of the two altitudes for each comparison, but over large distances this could also be misleading, for the reasons above.

How do I compare average runtime of two functions in MySQL?

I want to compare the average runtime of two expressions in MySQL:
Square distance: pow(x1 - x2, 2) + pow(y1 - y2, 2) + pow(z1 - z2, 2)
vs
Dot product: x1 * x2 + y1 * y2 + z1 * z2
Whichever expression I choose will run around 50,000,000,000 times in a single query, so even the tiniest difference in their runtime matters.
So I tried profiling. Here's what I got:
mysql> show profiles;
+----------+------------+-----------------------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+------------+-----------------------------------------------------------------------+
| 4 | 0.00014400 | select pow(rand()-rand(),2)+pow(rand()-rand(),2)+pow(rand()-rand(),2) |
| 5 | 0.00012800 | select pow(rand()-rand(),2)+pow(rand()-rand(),2)+pow(rand()-rand(),2) |
| 6 | 0.00017000 | select pow(rand()-rand(),2)+pow(rand()-rand(),2)+pow(rand()-rand(),2) |
| 7 | 0.00024800 | select pow(rand()-rand(),2)+pow(rand()-rand(),2)+pow(rand()-rand(),2) |
| 8 | 0.00014400 | select pow(rand()-rand(),2)+pow(rand()-rand(),2)+pow(rand()-rand(),2) |
| 9 | 0.00014000 | select pow(rand()-rand(),2)+pow(rand()-rand(),2)+pow(rand()-rand(),2) |
| 10 | 0.00014900 | select pow(rand()-rand(),2)+pow(rand()-rand(),2)+pow(rand()-rand(),2) |
| 11 | 0.00015000 | select rand()*rand()+rand()*rand()+rand()*rand() |
| 12 | 0.00012000 | select rand()*rand()+rand()*rand()+rand()*rand() |
| 13 | 0.00015200 | select rand()*rand()+rand()*rand()+rand()*rand() |
| 14 | 0.00022500 | select rand()*rand()+rand()*rand()+rand()*rand() |
| 15 | 0.00012700 | select rand()*rand()+rand()*rand()+rand()*rand() |
| 16 | 0.00013200 | select rand()*rand()+rand()*rand()+rand()*rand() |
| 17 | 0.00013400 | select rand()*rand()+rand()*rand()+rand()*rand() |
| 18 | 0.00013800 | select rand()*rand()+rand()*rand()+rand()*rand() |
+----------+------------+-----------------------------------------------------------------------+
15 rows in set, 1 warning (0.00 sec)
This is not very helpful: the runtimes fluctuate so much that I have no idea which one is faster, or by how much.
I need to run each of these expressions about 10,000 times to get a consistent average runtime. How do I accomplish this in MySQL?
(Note that rand() is called 6 times in both expressions, so its runtime shouldn't affect the comparison.)
Edit:
Sure, I could create a temporary table (slightly inconvenient), fill it with random values (again not straightforward; see How do I populate a mysql table with many random numbers), and then compare my functions.
I wanted to know if a better way exists in MySQL.
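The repeat-and-average approach can be sketched in Python with the standard timeit module. This only illustrates the methodology: Python timings say nothing about how fast MySQL evaluates these expressions. Inside MySQL, the built-in BENCHMARK(count, expr) function evaluates an expression count times, and the elapsed time reported for that statement can be compared between the two expressions. The function names below are illustrative; taking the minimum of several repeats follows the timeit documentation's advice, since slower runs are usually caused by other processes interfering.

```python
import random
import timeit


def square_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2


def dot_product(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def per_call_seconds(number=10_000, repeat=5, seed=0):
    """Time each function `number` times, `repeat` times over; keep the best run."""
    rng = random.Random(seed)
    a = tuple(rng.random() for _ in range(3))
    b = tuple(rng.random() for _ in range(3))
    results = {}
    for fn in (square_distance, dot_product):
        # timeit runs the lambda before the loop moves on, so it sees the current fn.
        runs = timeit.repeat(lambda: fn(a, b), number=number, repeat=repeat)
        results[fn.__name__] = min(runs) / number
    return results


print(per_call_seconds())
```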
At best, pow detects that the exponent is the integer 2 and performs the exponentiation with a single multiplication, so there is no reason it could beat a plain multiplication.

mysql SELECT MIN from WHERE result

I have a table of several routes, each with several points defined by latitude and longitude.
table name: route_path
|id_route |id_point| lat | lng |
|hhVFlBFA0M| 328| 48.90008 | 18.0233 |
|hhVFlBFA0M| 329| 48.90003 | 18.0268 |
|hhVFlBFA0M| 330| 48.89997 | 18.02856 |
|hhVFlBFA0M| 331| 48.89991 | 18.02857 |
|hhVFlBFA0M| 332| 48.89986 | 18.02862 |
|hhVFlBFA0M| 333| 48.89982 | 18.02869 |
|hhVFlBFA0M| 334| 48.89981 | 18.02878 |
|hhVFlBFA0M| 335| 48.89981 | 18.02886 |
|hhVFlBFA0M| 336| 48.89956 | 18.02925 |
|hhVFlBFA0M| 337| 48.89914 | 18.02972 |
|hhVFlBFA0M| 338| 48.8986177 | 18.0302365|
|3toCyDGVV2| 1| 48.134166 | 17.1051961|
|3toCyDGVV2| 2| 48.13417 | 17.1052 |
|3toCyDGVV2| 3| 48.13344 | 17.10559 |
|3toCyDGVV2| 4| 48.13298 | 17.10609 |
|3toCyDGVV2| 5| 48.13221 | 17.10699 |
|3toCyDGVV2| 6| 48.132 | 17.10806 |
|3toCyDGVV2| 7| 48.13193 | 17.10997 |
|3toCyDGVV2| 8| 48.13203 | 17.1109 |
|3toCyDGVV2| 9| 48.132 | 17.1112 |
|3toCyDGVV2| 10| 48.13181512| 17.1112 |
|3toCyDGVV2| 11| 48.13181 | 17.10806 |
|3toCyDGVV2| 12| 48.13181 | 17.10806 |
|3toCyDGVV2| 13| 48.13197 | 17.10399 |
|3toCyDGVV2| 14| 48.13199 | 17.10352 |
|3toCyDGVV2| 15| 48.1323 | 17.10328 |
So far, I can select all rows from one route that are within a tolerated distance and then loop over them to find the point with the minimal distance.
SELECT * FROM route_path
WHERE
(((lat < $start_lat + $tolerance) AND
(lat > $start_lat - $tolerance)) AND
((lng < $start_lng + $tolerance) AND
(lng > $start_lng - $tolerance)))
This returns several rows (id_points) for each route, and then I need a while loop to find the minimum.
How can I select one row (one id_point) from each route, namely the point with the minimal distance from the start lat and lng, provided that distance is not more than some limit?
Any suggestions for an SQL query that avoids looping?
Basically I need something like the following, but of course it is not possible to use MIN in a WHERE clause:
SELECT * FROM route_path WHERE MIN((((lat < $start_lat + $tolerance) AND (lat > $start_lat - $tolerance)) AND ((lng < $start_lng + $tolerance) AND (lng > $start_lng - $tolerance))))
There are a few ways to calculate the distance between 2 points. The most efficient is probably to use spatial data types, which are designed for this and can be indexed. I am not yet very experienced with these, so if you want to alter your database to use them, I will just point you to this previous question for the basics (the accepted answer covers it):
Fastest Way to Find Distance Between Two Lat/Long Points
If you want to keep your table as it currently stands, you can get the distance in km between 2 points with the following calculation:
111.045 * DEGREES(ACOS(COS(RADIANS(lat_point_1))
* COS(RADIANS(lat_point_2))
* COS(RADIANS(long_point_1) - RADIANS(long_point_2))
+ SIN(RADIANS(lat_point_1))
* SIN(RADIANS(lat_point_2))))
(taken from here).
Using this, if you want the closest point on a particular route to your starting point, you could use the following (there is no need to multiply by 111.045 unless you care about the actual distance rather than just which point is closest):
SELECT id_route,
id_point,
lat,
lng,
DEGREES(ACOS(COS(RADIANS($start_lat))
* COS(RADIANS(lat))
* COS(RADIANS($start_lng) - RADIANS(lng))
+ SIN(RADIANS($start_lat))
* SIN(RADIANS(lat)))) AS distance_in_degrees
FROM route_path
WHERE id_route = 'hhVFlBFA0M'
ORDER BY distance_in_degrees
LIMIT 1;
If you want the closest point on EACH route to your starting point, you can calculate the minimum distance for each route and then join that back to the original table on the rows whose distance matches that minimum (this returns more than one row for a route if 2 of its points are exactly the same distance from your start point):
SELECT route_path.id_route,
route_path.id_point,
route_path.lat,
route_path.lng
FROM route_path
INNER JOIN
(
SELECT id_route,
MIN(DEGREES(ACOS(COS(RADIANS($start_lat))
* COS(RADIANS(lat))
* COS(RADIANS($start_lng) - RADIANS(lng))
+ SIN(RADIANS($start_lat))
* SIN(RADIANS(lat))))) AS distance_in_degrees
FROM route_path
GROUP BY id_route
) sub0
ON route_path.id_route = sub0.id_route
AND DEGREES(ACOS(COS(RADIANS($start_lat))
* COS(RADIANS(lat))
* COS(RADIANS($start_lng) - RADIANS(lng))
+ SIN(RADIANS($start_lat))
* SIN(RADIANS(lat)))) = sub0.distance_in_degrees;
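For comparison, here is a Python sketch of the same "closest point on each route" logic on a subset of the question's rows, including the maximum-distance limit the question asks for (the SQL above does not apply one; in SQL it could be a HAVING condition on the minimum distance in the subquery). It uses the 111.045 * DEGREES(ACOS(...)) formula from this answer and clamps the ACOS argument to [-1, 1], because rounding can push it slightly past 1 when two points coincide. The function names and the start point are hypothetical, and ties keep the first point seen rather than returning both rows as the JOIN would.

```python
import math

# (id_route, id_point, lat, lng): a subset of the question's route_path rows.
ROUTE_PATH = [
    ("hhVFlBFA0M", 328, 48.90008, 18.0233),
    ("hhVFlBFA0M", 329, 48.90003, 18.0268),
    ("hhVFlBFA0M", 330, 48.89997, 18.02856),
    ("hhVFlBFA0M", 338, 48.8986177, 18.0302365),
    ("3toCyDGVV2", 1, 48.134166, 17.1051961),
    ("3toCyDGVV2", 5, 48.13221, 17.10699),
    ("3toCyDGVV2", 6, 48.132, 17.10806),
    ("3toCyDGVV2", 11, 48.13181, 17.10806),
    ("3toCyDGVV2", 15, 48.1323, 17.10328),
]


def distance_km(lat1, lng1, lat2, lng2):
    """111.045 * DEGREES(ACOS(...)), the spherical law of cosines from the answer."""
    cos_angle = (math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
                 * math.cos(math.radians(lng1) - math.radians(lng2))
                 + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2)))
    # Clamp: rounding can push the value slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return 111.045 * math.degrees(math.acos(cos_angle))


def nearest_per_route(rows, start_lat, start_lng, max_km):
    """Return {id_route: id_point} for the closest point on each route within max_km."""
    best = {}
    for route, point, lat, lng in rows:
        d = distance_km(start_lat, start_lng, lat, lng)
        if d > max_km:
            continue  # outside the allowed distance
        if route not in best or d < best[route][0]:  # strict <: first point wins ties
            best[route] = (d, point)
    return {route: point for route, (d, point) in best.items()}


print(nearest_per_route(ROUTE_PATH, 48.132, 17.10806, max_km=5))
```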

MySql query ordering in JOIN tables

Good day, everyone.
I have three MySQL tables:
-Doc
-DocTypes
-Org
I wrote a query like this:
Select Doc.Code,Doc.DataAccept,DocTypes.Name,Org.Name
From
Doc,DocTypes,Org
Where Doc.Type=DocTypes.Code AND Doc.Org=Org.Code AND Doc.Code;
In the result I have:
|Code|DataAccept|Name|Name|
17 | - | - | - |
18 | - | - | - |
24 | - | - | - |
26 | - | - | - |
32 | - | - | - |
The Code values are not sequential.
If I make a query like this:
Select Doc.Code,Doc.DataAccept,DocTypes.Name,Org.Name
From
Doc,DocTypes,Org
Where Doc.Type=DocTypes.Code AND Doc.Org=Org.Code AND Doc.Code AND Doc.Code < 100;
then it's OK:
|Code|DataAccept|Name|Name|
1 | - | - | - |
2 | - | - | - |
3 | - | - | - |
4 | - | - | - |
5 | - | - | - |
If I use Doc.Code < 1000, then again it's not sequential.
I tried using ORDER BY Code:
Select Doc.Code,Doc.DataAccept,DocTypes.Name,Org.Name
From
Doc,DocTypes,Org
Where Doc.Type=DocTypes.Code AND Doc.Org=Org.Code AND Doc.Code AND Doc.Code
ORDER BY Code DESC;
and
Select Doc.Code,Doc.DataAccept,DocTypes.Name,Org.Name
From
Doc,DocTypes,Org
Where Doc.Type=DocTypes.Code AND Doc.Org=Org.Code AND Doc.Code AND Doc.Code
ORDER BY Code ASC;
but the result is still not sequential.
What am I missing here?
Thank you for your time, and forgive my English.
"-" stands for ordinary data; I used it just for illustration.
SELECT Doc.Code, Doc.DataAccept, DocTypes.Name, Org.Name
FROM Doc
LEFT OUTER JOIN DocTypes ON Doc.Type = DocTypes.Code
JOIN Org ON Doc.Org = Org.Code
WHERE Doc.Code < 1000
ORDER BY 1;
This explanation might be useful as well.
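One plausible reading of the "not sequential" symptom, sketched in Python with hypothetical rows: an inner join (the comma-separated FROM with WHERE conditions) drops every Doc whose Type has no match in DocTypes, which leaves gaps in Code even when sorted, while a LEFT OUTER JOIN keeps those rows with a NULL (None here) type name. The table contents and helper names are invented for illustration.

```python
# Hypothetical data: Doc rows are (Code, Type, Org); Doc 2 has a Type missing from DocTypes.
DOC = [(3, "T2", "O1"), (1, "T1", "O1"), (2, "T9", "O1")]
DOC_TYPES = {"T1": "Invoice", "T2": "Order"}  # DocTypes.Code -> DocTypes.Name
ORG = {"O1": "Acme"}  # Org.Code -> Org.Name


def inner_join(docs):
    """Like FROM Doc, DocTypes, Org WHERE ... ORDER BY Code: unmatched rows vanish."""
    return [
        (code, DOC_TYPES[doc_type], ORG[org])
        for code, doc_type, org in sorted(docs)
        if doc_type in DOC_TYPES and org in ORG
    ]


def left_join_doctypes(docs):
    """Like Doc LEFT OUTER JOIN DocTypes ... JOIN Org ... ORDER BY 1."""
    return [
        (code, DOC_TYPES.get(doc_type), ORG[org])
        for code, doc_type, org in sorted(docs)
        if org in ORG
    ]


print([row[0] for row in inner_join(DOC)])          # codes with a gap
print([row[0] for row in left_join_doctypes(DOC)])  # every code
```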