MySQL MIN query not working for calculated distance

I have a table of locations in my database. I need a query that finds the nearest location for any given coordinates. I wrote the following query to get all rows along with their distance (in meters) from a given coordinate:
SELECT *, 111111 * DEGREES(ACOS(LEAST(COS(RADIANS(dest.latitude)) * COS(RADIANS(8.584710)) * COS(RADIANS(dest.longitude - 76.868735)) + SIN(RADIANS(dest.latitude)) * SIN(RADIANS(8.584710)), 1.0))) as distance FROM offer dest;
It gives the following output:
+----+------------------------+----------+-----------+------------+---------------------+
| id | description | latitude | longitude | name | distance |
+----+------------------------+----------+-----------+------------+---------------------+
| 2 | Location 1 Description | 8.574858 | 76.874748 | Location 1 | 1278.565430298969 |
| 12 | Location 2 Description | 8.584711 | 76.868738 | Location 2 | 0.35494725284463646 |
+----+------------------------+----------+-----------+------------+---------------------+
It is all working fine. Now to get the Minimum distance, I added HAVING MIN(distance) to this query. Now the query looks like below:
SELECT *, 111111 * DEGREES(ACOS(LEAST(COS(RADIANS(dest.latitude)) * COS(RADIANS(8.584710)) * COS(RADIANS(dest.longitude - 76.868735)) + SIN(RADIANS(dest.latitude)) * SIN(RADIANS(8.584710)), 1.0))) as distance FROM offer dest having MIN(distance);
Now, this query is supposed to return 1 row, and that should be Location 2, as it has the minimum distance, but it returns Location 1 instead, as seen below:
+----+------------------------+----------+-----------+------------+---------------------+
| id | description | latitude | longitude | name | distance |
+----+------------------------+----------+-----------+------------+---------------------+
| 2 | Location 1 Description | 8.574858 | 76.874748 | Location 1 | 1278.565430298969 |
+----+------------------------+----------+-----------+------------+---------------------+
Why does it behave this way? Is there something wrong with my query? If yes, what is it, and how do I get the location with the minimum distance?

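A HAVING clause filters groups. Groups are defined by a GROUP BY or by an aggregate function in the SELECT part. As you do not have either of those, you should not use HAVING. What happens here is that MySQL treats the whole table as a single group and evaluates MIN(distance) as a true/false condition (any non-zero value counts as true). It never compares each row's distance against the minimum, so with ONLY_FULL_GROUP_BY disabled you get one row whose column values are picked arbitrarily.
If you want the row with the minimum distance, order the rows by distance and limit the result set to one row.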
SELECT *,
111111 * DEGREES(ACOS(LEAST(COS(RADIANS(dest.latitude)) *
COS(RADIANS(8.584710)) * COS(RADIANS(dest.longitude - 76.868735)) +
SIN(RADIANS(dest.latitude)) * SIN(RADIANS(8.584710)), 1.0))) as distance
FROM offer dest
ORDER BY distance
LIMIT 1;
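To see that "order by distance, keep the first row" picks the expected location, here is a minimal Python sketch of the same calculation. It assumes the two rows from the question's output are already in memory; `distance_m`, `rows`, and `origin` are names made up for illustration and are not part of the original query.

```python
import math

def distance_m(lat1, lon1, lat2, lon2):
    """Spherical law of cosines, same formula as the SQL above (result in metres)."""
    cos_angle = (
        math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
        * math.cos(math.radians(lon1 - lon2))
        + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
    )
    # min(..., 1.0) mirrors LEAST(..., 1.0): it keeps acos safe from rounding above 1.
    return 111111 * math.degrees(math.acos(min(cos_angle, 1.0)))

# Rows copied from the question's output: (id, name, latitude, longitude).
rows = [
    (2, "Location 1", 8.574858, 76.874748),
    (12, "Location 2", 8.584711, 76.868738),
]
origin = (8.584710, 76.868735)

# Equivalent of ORDER BY distance LIMIT 1: take the row with the smallest distance.
nearest = min(rows, key=lambda r: distance_m(r[2], r[3], *origin))
print(nearest[1])
```

The `min(..., key=...)` call plays the role of ORDER BY distance LIMIT 1: every row's distance is computed and compared, which is exactly what HAVING MIN(distance) does not do.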

Related

PHP MySQL How to calculate distance for between row?

I have a table like this:
stepID | UserID| Date | Lat | Lng
1 |1 | 2019-10-11 | -7.2905838 | 112.5655568
2 |1 | 2019-10-11 | -7.2349607 | 112.6106177
3 |1 | 2019-10-11 | -7.2345435 | 112.6112432
4 |1 | 2019-10-12 | -7.2529265 | 112.6542999
I need to calculate the distance that a user has travelled on the same day (for example 2019-10-11). What should be shown on the PHP page is this (the KM amounts below are just examples):
From step 1 to 2: 2 KM
From step 2 to 3: 3 KM
From step 3 to 4: 3 KM
TOTAL FOR TODAY: 8 KM
I've been googling and also searched Stack Overflow's history but didn't find anything like what I'm facing today. I need your suggestions on how to query this.
Thank you before, GBU always.
This is not the best task for MySQL. It would be much better to perform it in any programming language by reading rows from DB 1 by 1.
However, if you want to do it purely in MySQL, then something like this should work (assuming StepIDs are sequential numbers without gaps):
SELECT UserID, SUM(km) total
FROM (
SELECT t1.UserID,
-- 111.111 km per degree converts the angle into kilometres
111.111 * DEGREES(ACOS(LEAST(1.0, COS(RADIANS(t1.Lat))
* COS(RADIANS(t2.Lat))
* COS(RADIANS(t1.Lng - t2.Lng))
+ SIN(RADIANS(t1.Lat))
* SIN(RADIANS(t2.Lat))))) km
-- `table` is a placeholder for your table name (backticks needed, TABLE is a reserved word)
FROM `table` t1
JOIN `table` t2 ON t1.UserID = t2.UserID AND t1.StepID = t2.StepID - 1
) t
GROUP BY UserID
I've got the formula from: https://stackoverflow.com/a/24372831/2244262
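The row-by-row approach mentioned at the start of this answer could look roughly like the Python sketch below. It is only an illustration: the rows are hard-coded from the question's table instead of being read from the database, and `distance_km` and `daily_legs` are hypothetical helper names.

```python
import math
from collections import defaultdict

def distance_km(lat1, lng1, lat2, lng2):
    """Spherical law of cosines; 111.111 km per degree of arc."""
    cos_angle = (
        math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
        * math.cos(math.radians(lng1 - lng2))
        + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
    )
    # Clamp to [-1, 1] so rounding errors never break acos.
    return 111.111 * math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

# (StepID, UserID, Date, Lat, Lng), copied from the question.
steps = [
    (1, 1, "2019-10-11", -7.2905838, 112.5655568),
    (2, 1, "2019-10-11", -7.2349607, 112.6106177),
    (3, 1, "2019-10-11", -7.2345435, 112.6112432),
    (4, 1, "2019-10-12", -7.2529265, 112.6542999),
]

def daily_legs(rows):
    """Return {(user_id, date): [(from_step, to_step, km), ...]}."""
    legs = defaultdict(list)
    previous = {}  # last position seen for each (user, date)
    for step_id, user_id, date, lat, lng in sorted(rows):
        key = (user_id, date)
        if key in previous:
            p_step, p_lat, p_lng = previous[key]
            legs[key].append((p_step, step_id, distance_km(p_lat, p_lng, lat, lng)))
        previous[key] = (step_id, lat, lng)
    return legs

legs = daily_legs(steps)
day = legs[(1, "2019-10-11")]
for from_step, to_step, km in day:
    print(f"From step {from_step} to {to_step}: {km:.2f} KM")
total = sum(km for _, _, km in day)
print(f"TOTAL FOR TODAY: {total:.2f} KM")
```

Because each row is compared with the previous row of the same user and date, this version does not depend on StepIDs being gap-free, and steps from different days are never joined into one leg.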

Get total SUM of 3 columns and multiple records from second table

Morning,
I have 2 tables with a relationship being box_id.
The first table is structured as:
box_id | box_name | length | width | depth
---------------------------------------------
1 | box_1 | 30 | 30 | 20
It has the details of a storage box, i.e. name, width, length, depth etc.
The second table has an inventory of what's in that box...
id | bid | product_name | prod_length | prod_width | prod_height
-------------------------------------------------------------------
1 | 1 | phone case | 12 | 6 | 2
2 | 1 | watch | 8 | 8 | 7
3 | 1 | perfume | 16 | 10 | 14
Using SQL, I'm looking to get the details of the box, including (length * width * depth / 10) of that box to get its total volume capacity, plus the total volume consumed by the products in the box.
Results
box_id | box_name | volume | volume remaining
----------------------------------------------
1 | box_1 | 1800 cm3 | xyz cm3
Here is the SQL I have so far:
select box_name, width, length, depth, (width * length * depth / 10) AS totalVolume from storage...
I'm not sure how to get the inventory details and see what's remaining or consumed.
Regards
The volume of the contents would come from a query like this:
select bid, sum(prod_length * prod_width * prod_height) as volume
from inventory
group by bid;
You can use a join to compare the volumes:
select b.*,
(length * width * depth) as box_volume,
bi.volume as inventory_volume
from boxes b left join
(select bid, sum(prod_length * prod_width * prod_height) as volume
from inventory
group by bid
) bi
on bi.bid = b.box_id;
You can subtract the two or take a ratio. But don't over-interpret the results. It is a hard problem to figure out how much "inventory" can fit in a "box" based on the dimensions. The sum of the volumes is not sufficient for answering this question.
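As a quick arithmetic check of what the join returns for the sample data, here is a small Python sketch. The numbers are copied from the question's two tables; the variable names are made up for illustration, and the volumes are plain cubic centimetres without the question's "/ 10" step.

```python
# Box dimensions from the first table in the question.
box = {"box_id": 1, "box_name": "box_1", "length": 30, "width": 30, "depth": 20}

# (product_name, prod_length, prod_width, prod_height) from the second table.
products = [
    ("phone case", 12, 6, 2),
    ("watch", 8, 8, 7),
    ("perfume", 16, 10, 14),
]

box_volume = box["length"] * box["width"] * box["depth"]
inventory_volume = sum(l * w * h for _, l, w, h in products)
remaining = box_volume - inventory_volume

print(box_volume, inventory_volume, remaining)  # 18000 2832 15168
```

The "remaining" figure is only a difference of volumes. As noted above, it does not tell you whether another item would physically fit.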

SQL Server 2008: Spatial Query - Return 5 closest sites

I'm trying to create an origin/destination matrix selection with SQL Server 2008. I want to find the closest 5 sites to any given site.
The matrix should include the origin ID, Destination ID and the distance between the two. So far I have managed to get something working for one site, but I want to loop through every row in my table. I've hit a wall in working out how to do this, could anybody help? I only want to return a destination if they are within 2.5km of the origin site.
The working code for my one origin site is below (I want the same output, but including all rows as an origin):
SP_Geometry is my geography column (MapInfo names this column by default when using EasyLoader)
DECLARE @Point1 AS Geography
DECLARE @Point1ID AS Nvarchar (255)
SELECT @Point1=SP_Geometry FROM SitesTable WHERE Label = 'ID1'
SELECT @Point1ID = Label FROM SitesTable WHERE Label = 'ID1'
SELECT TOP 5
@Point1ID AS Origin
,@Point1 AS Origin_SP_Geometry
,@Point1.STDistance(SP_Geometry) AS Distance
,Label AS Destination
,SP_Geometry AS Destination_SP_Geometry
FROM SiteTable
WHERE @Point1.STDistance(SP_Geometry) <2500
ORDER BY @Point1.STDistance(SP_Geometry)
Running the above results in the following selection:
+--------+---------------------+-------------+-------------+----------------------------+
| Origin | Origin_SP_GEOMETRY | Distance | Destination | Destination_SP_GEOMETRY |
+--------+---------------------+-------------+-------------+----------------------------+
| ID1 | 0xE6100000010CDD(…) | 0 | ID1 | 0xE6100000010CDD772D9D(…) |
| ID1 | 0xE6100000010CDD(…) | 395.7739586 | ID867 | 0xE6100000010C2466CDFA5(…) |
| ID1 | 0xE6100000010CDD(…) | 407.6394398 | ID2500 | 0xE6100000010C6FBC54(…) |
| ID1 | 0xE6100000010CDD(…) | 1033.827269 | ID91 | 0xE6100000010C3981C0353(…) |
| ID1 | 0xE6100000010CDD(…) | 1082.667065 | ID1540 | 0xE6100000010CD03BFCD2(…) |
+--------+---------------------+-------------+-------------+----------------------------+
This is exactly what I want, but I am having trouble establishing any kind of loop (one that would union the results for origin ID2, ID3 etc.)
Any help would be much appreciated!
Try using the RANK function. I can't test since I don't have sample data so this might need a tweak but should be close...
;WITH cteDistances AS (
SELECT
origin.Label AS OriginId
,dest.Label AS DestinationId
,origin.SP_Geometry.STDistance(dest.SP_Geometry) AS Distance
,RANK() OVER (PARTITION BY origin.Label ORDER BY origin.SP_Geometry.STDistance(dest.SP_Geometry)) AS DistanceRank
FROM
SiteTable origin
INNER JOIN SiteTable dest ON (dest.Label <> origin.Label)
WHERE
origin.SP_Geometry.STDistance(dest.SP_Geometry) < 2500
)
SELECT
OriginId, DestinationId, Distance
FROM
cteDistances
WHERE
DistanceRank <= 5
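To make the "rank destinations per origin, keep the top 5" logic easier to follow, here is a rough Python sketch of the same idea. Everything in it is an assumption for illustration: the site labels and coordinates are invented, and distances are plain Euclidean distances in metres on a flat plane rather than geography distances from STDistance.

```python
import heapq
import math

# Hypothetical sites: label -> (x, y) in metres on a flat plane.
sites = {
    "ID1": (0, 0), "ID2": (300, 400), "ID3": (1000, 0), "ID4": (0, 1500),
    "ID5": (2000, 0), "ID6": (1500, 1500), "ID7": (3000, 0), "ID8": (100, 0),
}

def nearest_sites(sites, limit=5, max_distance=2500):
    """Return (origin, destination, distance) rows: closest `limit` sites per origin."""
    matrix = []
    for origin, o_xy in sites.items():
        # Pair every other site with this origin (the self-join with Label <> Label),
        # keeping only destinations inside the distance cut-off.
        candidates = [
            (math.dist(o_xy, d_xy), dest)
            for dest, d_xy in sites.items()
            if dest != origin and math.dist(o_xy, d_xy) < max_distance
        ]
        # Per-origin ranking: the `limit` smallest distances, in ascending order.
        for distance, dest in heapq.nsmallest(limit, candidates):
            matrix.append((origin, dest, distance))
    return matrix

matrix = nearest_sites(sites)
for origin, dest, distance in matrix:
    if origin == "ID1":
        print(origin, dest, round(distance, 1))
```

One difference from the query: RANK() keeps every row that ties for fifth place, so an origin can end up with more than 5 destinations, while this sketch always cuts off at exactly `limit` rows.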

MySQL select to find similar lat/lng with matching name column

I am trying to find rows in a single table of locations that have the same latitude/longitude when rounded to 2 decimal places as well as the same name. Here is my table (for example):
+---------------------------------------+
| ID | lat | lng | name |
+---------------------------------------+
| 11 | -11.119 | 13.891 | Smith's Place |
| 81 | -11.121 | 13.893 | Smith's Place |
+---------------------------------------+
What SELECT statement would find instances (like the one above) where the lat/lng match when rounded to 2 decimal places...and the names are the same?
I am looking for something similar to this query that obviously doesn't work (but is asking for what I am after):
SELECT * FROM pb_locations GROUP BY ROUND(lat,2),ROUND(lng,2) WHERE name = name HAVING count(ID) > 1
WHERE name = name is always true, since it's just comparing within the same row, not across different rows.
You need to put all 3 columns in the GROUP BY clause.
SELECT *
FROM pb_locations
GROUP BY ROUND(lat, 2), ROUND(lng, 2), name
HAVING COUNT(*) > 1
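The grouping can also be pictured in Python. The sketch below is only an illustration: the first two rows come from the question, the third row is a made-up extra to show a name mismatch, and `groups` and `duplicates` are hypothetical names.

```python
from collections import defaultdict

# (ID, lat, lng, name)
rows = [
    (11, -11.119, 13.891, "Smith's Place"),
    (81, -11.121, 13.893, "Smith's Place"),
    (90, -11.119, 13.891, "Other Place"),  # hypothetical row: same spot, different name
]

# Group by the same three keys as the GROUP BY clause.
groups = defaultdict(list)
for row_id, lat, lng, name in rows:
    groups[(round(lat, 2), round(lng, 2), name)].append(row_id)

# Equivalent of HAVING COUNT(*) > 1: keep only keys shared by several rows.
duplicates = {key: ids for key, ids in groups.items() if len(ids) > 1}
print(duplicates)
```

Note that the SQL query returns a single row per group, while this sketch lists every ID in the group. In MySQL, selecting GROUP_CONCAT(ID) alongside the grouped columns would give a comparable listing.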

Implementing a k-d tree for 'nearest neighbor' search in MYSQL?

I am designing an automated trading software for the foreign exchange market.
In a MYSQL database I have years of market data at five-minute intervals. I have 4 different metrics for this data alongside the price and time.
[Time|Price|M1|M2|M3|M4]
x ~400,000
Time is the primary key, and M1 through M4 are different metrics (such as standard deviation or slope of a moving average).
Here is a real example (excerpt):
+------------+--------+-----------+--------+-----------+-----------+
| Time | Price | M1 | M2 | M3 | M4 |
+------------+--------+-----------+--------+-----------+-----------+
| 1105410300 | 1.3101 | 12.9132 | 0.4647 | 29.6703 | 50 |
| 1105410600 | 1.3103 | 14.056 | 0.5305 | 29.230801 | 50 |
| 1105410900 | 1.3105 | 15.3613 | 0.5722 | 26.8132 | 25 |
| 1105411200 | 1.3106 | 16.627501 | 0.4433 | 24.395599 | 26.47059 |
| 1105411500 | 1.3112 | 18.7843 | 1.0019 | 24.505501 | 34.375 |
| 1105411800 | 1.3111 | 19.8375 | 0.5626 | 20 | 32.8125 |
| 1105412100 | 1.3105 | 20.0168 | 0.6718 | 9.7802 | 23.4375 |
| 1105412400 | 1.3105 | 20.4538 | 0.8943 | 7.033 | 23.4375 |
| 1105412700 | 1.3109 | 21.6078 | 0.4902 | 11.7582 | 29.6875 |
| 1105413000 | 1.3104 | 21.2045 | 1.565 | 8.6813 | 21.875 |
+------------+--------+-----------+--------+-----------+-----------+...400k more
Given an input of M1, M2, M3, and M4, I want to (quickly and accurately) find the 5,000 closest matches.
Sample input:
+------------+--------+-----------+--------+-----------+-----------+
| Time | Price | M1 | M2 | M3 | M4 |
+------------+--------+-----------+--------+-----------+-----------+
| 1205413000 | 1.4212 | 20.1045 | 1.0012 | 9.1013 | 11.575 |
+------------+--------+-----------+--------+-----------+-----------+
I figured that each of these metrics could be considered a 'dimension,' and that I can do a nearest neighbor search to locate the closest datapoints in this multidimensional space.
It seems the simplest way to do this is to iterate through every single data point and measure the multidimensional distance to my input point; but speed is of the essence!
I read about something called K-D Trees used for this purpose. Can anyone please explain or provide me with some material that explains how to implement this in MYSQL?
It may be relevant to mention that I can pre-process the table, but the input is received in real-time.
Currently I just make a rough cluster around the data on each dimension independently:
INSERT INTO Dim1 SELECT * FROM myTable AS myTable USE INDEX(M1) WHERE myTable.M1 < currentM1 ORDER BY M1 DESC LIMIT 2500;
INSERT INTO Dim1 SELECT * FROM myTable AS myTable USE INDEX(M1) WHERE myTable.M1 > currentM1 ORDER BY M1 ASC LIMIT 2500;
INSERT INTO Dim2 SELECT * FROM myTable AS myTable USE INDEX(M2) WHERE myTable.M2 < currentM2 ORDER BY M2 DESC LIMIT 2500;
INSERT INTO Dim2 SELECT * FROM myTable AS myTable USE INDEX(M2) WHERE myTable.M2 > currentM2 ORDER BY M2 ASC LIMIT 2500;
INSERT INTO Dim3 SELECT * FROM myTable AS myTable USE INDEX(M3) WHERE myTable.M3 < currentM3 ORDER BY M3 DESC LIMIT 2500;
INSERT INTO Dim3 SELECT * FROM myTable AS myTable USE INDEX(M3) WHERE myTable.M3 > currentM3 ORDER BY M3 ASC LIMIT 2500;
INSERT INTO Dim4 SELECT * FROM myTable AS myTable USE INDEX(M4) WHERE myTable.M4 < currentM4 ORDER BY M4 DESC LIMIT 2500;
INSERT INTO Dim4 SELECT * FROM myTable AS myTable USE INDEX(M4) WHERE myTable.M4 > currentM4 ORDER BY M4 ASC LIMIT 2500;
It is important to understand that I am interested in distance by rank, not by value.
Edit: I am a little closer to understanding how to do it (I think):
I need to pre-process each row of each metric and assign it a percentile which would represent its location (percent-wise) in its range.
For example, for any given value of M1:
percentile = (# rows with values less than input)/(# total rows)
If I calculate the input's percentile and use that for a nearest neighbor search instead of the actual value I will have effectively scaled the various metrics such that they could be used as dimensions.
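A minimal Python sketch of that percentile formula, assuming the values of one metric have been pre-sorted once ahead of time. The M1 values are the ten rows from the excerpt above, and `percentile_rank` is a made-up helper name.

```python
from bisect import bisect_left

def percentile_rank(sorted_values, value):
    """(# rows with values less than input) / (# total rows)."""
    # bisect_left returns how many entries are strictly smaller than `value`.
    return bisect_left(sorted_values, value) / len(sorted_values)

# M1 column of the ten-row excerpt, sorted once as a pre-processing step.
m1_sorted = sorted([
    12.9132, 14.056, 15.3613, 16.627501, 18.7843,
    19.8375, 20.0168, 20.4538, 21.6078, 21.2045,
])

# Percentile of the sample input's M1 value.
p = percentile_rank(m1_sorted, 20.1045)
print(p)  # 0.7
```

Seven of the ten excerpt rows have an M1 below 20.1045, hence 0.7. The lookup is a binary search, so the real-time input only costs O(log n) per metric once the sorted lists exist.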
I am still lost on how to do the actual search though. Is this even possible to accomplish efficiently in MySQL?
You should be able to do a query like the following:
SELECT * FROM myTable
WHERE M1 BETWEEN searchM1 - radiusM1 AND searchM1 + radiusM1
AND M2 BETWEEN searchM2 - radiusM2 AND searchM2 + radiusM2
AND M3 BETWEEN searchM3 - radiusM3 AND searchM3 + radiusM3
AND M4 BETWEEN searchM4 - radiusM4 AND searchM4 + radiusM4
In the case of a sphere, all the radius values will be the same, of course. You then adjust the radius until the query returns roughly the number of records you want. I'd suggest a binary search.
I'm not sure if you want to mess with the distribution or not, but assuming you do, you would just need to give each search value a rank between the two values it would fall between in your table (e.g. if rank 5 is 5.5, rank 6 is 5.9, and the search value is 5.6, then the search rank could be 5.5)
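The binary search on the radius could be sketched like this in Python. It is an illustration under several assumptions: the points are random stand-ins for the four metrics already scaled to the 0..1 range, a single radius is shared by all four dimensions, and `count_in_box` stands in for running the BETWEEN query above with COUNT(*).

```python
import random

def count_in_box(points, center, radius):
    """Count points where every coordinate is within `radius` of the center."""
    return sum(
        all(abs(p - c) <= radius for p, c in zip(point, center))
        for point in points
    )

def find_radius(points, center, target, lo=0.0, hi=1.0, iterations=40):
    """Binary-search the smallest radius whose box holds at least `target` points."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if count_in_box(points, center, mid) >= target:
            hi = mid  # enough matches: try a smaller box
        else:
            lo = mid  # too few matches: grow the box
    return hi

# Hypothetical data: 2000 points with 4 metrics each, all in [0, 1).
random.seed(0)
points = [tuple(random.random() for _ in range(4)) for _ in range(2000)]
center = (0.5, 0.5, 0.5, 0.5)

radius = find_radius(points, center, target=100)
print(radius, count_in_box(points, center, radius))
```

Each probe of the search corresponds to one run of the range query, so with indexes on the metric columns the database only has to do a few dozen counts instead of measuring the distance to every row. The region tested is a box rather than a true sphere, because each BETWEEN constrains one dimension independently.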