MySQL change return values - mysql

Here's my problem: I have two tables - zipcodes table and vendors table.
What I want to do is, when I enter a zip code, to get all vendors (based on their zip code) within a certain radius. I got it working so far.
But here's the thing. I need to divide the results based on the distance. I need to have several groups: within 10 miles, within 50 miles, and within 100 miles. What I want to do (if possible) is to change all values under 10 miles to 10, those between 11 and 50 to 50 and those between 51 and 100 to 100.
Here is my query so far, that returns the correct results. I need help how to substitute the actual distance values with those I need.
SELECT SQL_CALC_FOUND_ROWS
3959 * 2 * ASIN(SQRT(POWER(SIN(( :lat - zipcodes.zip_lat) * pi()/180 / 2), 2) + COS( :lat * pi()/180) * COS(zipcodes.zip_lat * pi()/180) * POWER(SIN(( :lon - zipcodes.zip_lon) * pi()/180 / 2), 2))) AS distance,
vendors.*
FROM
g_vendors AS vendors
INNER JOIN g_zipcodes AS zipcodes ON zipcodes.zip_code = vendors.vendor_zipcode
WHERE
vendors.vendor_status != 4
GROUP BY
vendors.vendor_id
HAVING distance < 100

Use a CASE expression:
SELECT t.*,
CASE WHEN t.distance <= 10 THEN 10
WHEN t.distance <= 50 THEN 50
ELSE 100
END AS new_distance
FROM ( Your Query Here ) t
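If you would rather group the rows in application code after fetching them, the same idea can be sketched in Python. This is only an illustration: the function name `bucket_distance` is made up here, and treating each limit as an inclusive upper bound is an assumption.

```python
from bisect import bisect_left

# Upper limits of the distance groups, in miles (taken from the question).
LIMITS = [10, 50, 100]

def bucket_distance(distance):
    """Return the smallest limit that is >= distance, or None beyond 100 miles."""
    index = bisect_left(LIMITS, distance)
    return LIMITS[index] if index < len(LIMITS) else None

if __name__ == "__main__":
    for d in (3.2, 10, 10.5, 50, 75.9):
        print(d, "->", bucket_distance(d))
```

Because the limits are kept in one sorted list, adding another group (say 25 miles) only means adding one number to `LIMITS`.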

Add a new column to your SELECT-Part containing a number to represent the distances:
3 -> within 10 miles
2 -> within 50 miles
1 -> within 100 miles
code:
CAST((distance < 10) AS SIGNED INTEGER) + CAST((distance < 50) AS SIGNED INTEGER) + CAST((distance < 100) AS SIGNED INTEGER) AS goodName
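To see why adding up the comparisons gives 3, 2 and 1, here is a small Python sketch of the same arithmetic. The name `closeness_score` is hypothetical; the logic simply mirrors the expression above.

```python
def closeness_score(distance):
    # Each comparison yields a boolean, which counts as 1 (true) or 0 (false)
    # when converted to an integer, just as the CAST does in the SQL expression.
    return int(distance < 10) + int(distance < 50) + int(distance < 100)

if __name__ == "__main__":
    for d in (5, 30, 80, 150):
        print(d, "->", closeness_score(d))
```

A distance of 100 miles or more scores 0, which the HAVING clause of the original query already filters out.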

Related

Creating buckets to categorize the values using MySQL

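Hi, I am trying to create buckets for a very large number of rows. I have a maximum value of 9759721 and a minimum value of 1006909. I would like to show the results as follows:
+----------+--------+-------------------+
| distance | bucket | range             |
+----------+--------+-------------------+
| 1006909  | 0      | 1000000 - 1009999 |
| 1013525  | 1      | 1010000 - 1019999 |
| 1021948  | 2      | 1020000 - 1029999 |
+----------+--------+-------------------+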
The table might not be so clear, but in general I would like to break the values down in steps of 10000, creating a new bucket every 10000 starting from 1000000.
I tried the following code but it doesn't show the correct output.
select distance,floor(distance/10000) as _floor from data;
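I got something like:
+----------+--------+
| distance | bucket |
+----------+--------+
| 1006909  | 100    |
| 1013525  | 101    |
| 1021948  | 102    |
| 1035472  | 103    |
| 1042069  | 104    |
| 9759721  | 975    |
+----------+--------+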
This seems to be correct but I need the bucket to start from 0 and then change based on 10000. And then have a range column as well. The minimum value that I have for distance is 1006909 and so the data doesn't start with 0 but is it possible to still have a bucket column starting from 0 [i.e assigned to minimum distance].
SELECT
d.distance,
DENSE_RANK() OVER (ORDER BY d._floor) - 1 AS bucket,
d._floor * 10000 AS bucket_lower_limit,
d._floor * 10000 + 10000 AS bucket_upper_limit
FROM
(
SELECT
distance,
FLOOR(distance / 10000) AS _floor
FROM
data
)
AS d
NOTE: this will give buckets numbered from 0 upwards, but it will also remove all gaps (so in your sample data the last row gets bucket 5, not bucket 975).
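The gap-removing behaviour of DENSE_RANK is easy to check outside the database. The following Python sketch reproduces it on the sample values from the question; the function name `dense_rank_buckets` is made up for illustration.

```python
def dense_rank_buckets(distances, width=10000):
    """Pair each distance with a zero-based dense rank of its floor(distance / width)."""
    floors = [d // width for d in distances]
    # Distinct floors, sorted, get consecutive ranks 0, 1, 2, ... with no gaps.
    rank_of = {f: rank for rank, f in enumerate(sorted(set(floors)))}
    return [(d, rank_of[f]) for d, f in zip(distances, floors)]

if __name__ == "__main__":
    sample = [1006909, 1013525, 1021948, 1035472, 1042069, 9759721]
    for row in dense_rank_buckets(sample):
        print(row)
```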
Alternatively, if you need to preserve the gaps...
SELECT
d.distance,
d._floor - MIN(d._floor) OVER () AS bucket,
d._floor * 10000 AS bucket_lower_limit,
d._floor * 10000 + 10000 AS bucket_upper_limit
FROM
(
SELECT
distance,
FLOOR(distance / 10000) AS _floor
FROM
data
)
AS d
Just calculate 1006909 div 10000 * 10000 = 1000000 and subtract it from distance. That'll make the buckets start from 0:
SELECT distance
, (distance - a) div 10000 AS bucket
, distance div 10000 * 10000 AS range_from
, distance div 10000 * 10000 + (10000 - 1) AS range_to
FROM t
CROSS JOIN (
SELECT MIN(distance) div 10000 * 10000 AS a
FROM t
) AS x
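For comparison, here is the same offset arithmetic as a Python sketch, using integer division in place of MySQL's div. The function name `offset_buckets` is hypothetical.

```python
def offset_buckets(distances, width=10000):
    """Return (distance, bucket, range_from, range_to) rows, with gaps preserved."""
    # Round the minimum down to a multiple of width: 1006909 -> 1000000.
    base = min(distances) // width * width
    rows = []
    for d in distances:
        range_from = d // width * width
        rows.append((d, (d - base) // width, range_from, range_from + width - 1))
    return rows

if __name__ == "__main__":
    for row in offset_buckets([1006909, 1013525, 1021948, 9759721]):
        print(row)
```

Unlike the dense-rank variant, the last sample value lands in bucket 875 here, because empty buckets in between still count.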

Query that finds all locations within a certain radius of a given coordinate

My database has an organisation table with two decimal columns, lat and lon, that indicate the location of the organisation. I'm trying to find all organisations within 800km of the coordinate 53.6771, -1.62958 (this roughly corresponds to Leeds in the UK).
The query I'm using is
select *
from organisation
where (3959 * acos(cos(radians(53.6771)) *
cos(radians(lat)) *
cos(radians(lon) - radians(-1.62958)) + sin(radians(53.6771)) *
sin(radians(lat)))) < 800
However this returns locations in Lyon, France which is about 970km from Leeds, UK. I realise that formulae such as the above make some simplifying assumptions (e.g. treating the shape of the Earth as a sphere), so I don't expect the results to be absolutely accurate, but I should be able to do better than this?
I found a formula here for calculating the distance in km between two points, and I have tried to convert it to mysql:
WHERE (6371 * 2 *
ATAN2(
SQRT(
SIN(RADIANS((lat-53.6771)/2)) * SIN(RADIANS((lat-53.6771)/2)) + SIN(RADIANS((lon+1.62958)/2)) * SIN(RADIANS((lon+1.62958)/2)) * COS(RADIANS(lat)) * COS(RADIANS(53.6771))
),
SQRT(
1-(SIN(RADIANS((lat-53.6771)/2)) * SIN(RADIANS((lat-53.6771)/2)) + SIN(RADIANS((lon+1.62958)/2)) * SIN(RADIANS((lon+1.62958)/2)) * COS(RADIANS(lat)) * COS(RADIANS(53.6771)))
)
)) < 800
The problem was caused by using the Earth's radius in miles (3959) as the multiplier instead of the radius in kilometers (6371). The correct query is shown below:
select *
from organisation
where (6371 * acos(cos(radians(53.6771)) *
cos(radians(lat)) *
cos(radians(lon) - radians(-1.62958)) + sin(radians(53.6771)) *
sin(radians(lat)))) < 800
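The effect of the multiplier can be checked with a few lines of Python. This is a rough sketch using the same spherical law of cosines as the query; the coordinates for Lyon (45.76, 4.84) are approximate values assumed here for illustration, and the function name `great_circle` is made up.

```python
import math

def great_circle(lat1, lon1, lat2, lon2, radius):
    """Spherical law of cosines; the result is in the same unit as radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.cos(phi1) * math.cos(phi2) * math.cos(dlon)
                 + math.sin(phi1) * math.sin(phi2))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    return radius * math.acos(max(-1.0, min(1.0, cos_angle)))

LEEDS = (53.6771, -1.62958)  # coordinate from the question
LYON = (45.76, 4.84)         # approximate, assumed for this example

if __name__ == "__main__":
    print("miles:", round(great_circle(*LEEDS, *LYON, radius=3959)))
    print("km:   ", round(great_circle(*LEEDS, *LYON, radius=6371)))
```

With the radius in miles the result stays below 800, which is why Lyon slipped through a filter that was meant to be in kilometers.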

MySQL Query To Select Closest City

I am trying to repeat the following query for all rows. Basically, I am trying to map each place to its closest city, based on latitude and longitude. I have a table places which contains the places that need to be mapped, and a table CityTable with the cities to be matched to. I have the following query, which works for a single row:
SELECT p.placeID, p.State, p.City, p.County, p.name,
SQRT(POW((69.1 * (p.lat - z.Latitude)), 2 )
+ POW((53 * (p.lng - z.Loungitude)), 2)) AS distance,
p.lat,p.lng,z.Latitude,z.Loungitude,z.City
FROM places p,CityTable z
WHERE p.placeID = 1
ORDER BY distance ASC
LIMIT 1;
This works for a single location. Obviously I would need to remove the WHERE constraint to apply it to the entire table. The problem that I am encountering is that the query seems to pair every row of one table with every row of the other. For example, if there are 100 rows in p and 100 rows in z, then the resulting table seems to be 10,000 rows. I need the result to have count(*) of p rows. Any ideas? Also, are there any more efficient ways to do this if my table p contains over a million rows? Thanks.
You can find the nearest city to a place using:
SELECT p.placeID, p.State, p.City, p.County, p.name,
(select z.City
from CityTable z
order by SQRT(POW((69.1 * (p.lat - z.Latitude)), 2 ) + POW((53 * (p.lng - z.Loungitude)), 2))
limit 1
) as City,
p.lat, p.lng
FROM places p;
(If you want additional city information, join the city table back in on City.)
This doesn't solve the problem of having to do the Cartesian product. It does, however, frame it in a different way. If you know that a city is within five degrees longitude/latitude of any place, then you can make the subquery more efficient:
(select z.City
from CityTable z
where z.Latitude >= p.lat - 5 and z.Latitude <= p.lat + 5 and
z.Loungitude >= p.lng - 5 and z.Loungitude <= p.lng + 5
order by SQRT(POW((69.1 * (p.lat - z.Latitude)), 2 ) + POW((53 * (p.lng - z.Loungitude)), 2))
limit 1
) as City,
p.lat, p.lng;
This query can use an index on the city table's latitude column. It might even use a composite index on latitude and longitude.
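The "filter by a box first, then sort by distance" idea looks like this in a Python sketch. It reuses the flat approximation from the question (69.1 miles per degree of latitude, 53 per degree of longitude); the function name `nearest_city` and the sample cities are made up for illustration.

```python
import math

def nearest_city(place, cities, box_degrees=5.0):
    """place is (lat, lng); cities is a list of (name, lat, lng) tuples.

    Returns the name of the nearest city inside the box, or None if the box is empty.
    """
    lat, lng = place
    # Cheap pre-filter: keep only cities within box_degrees on both axes.
    candidates = [c for c in cities
                  if abs(c[1] - lat) <= box_degrees and abs(c[2] - lng) <= box_degrees]
    if not candidates:
        return None
    # The distance is computed only for the survivors of the pre-filter.
    nearest = min(candidates,
                  key=lambda c: math.hypot(69.1 * (lat - c[1]), 53 * (lng - c[2])))
    return nearest[0]

if __name__ == "__main__":
    cities = [("Springfield", 39.80, -89.64),
              ("Chicago", 41.88, -87.63),
              ("Denver", 39.74, -104.99)]
    print(nearest_city((41.5, -88.0), cities))
```

Note that a place with no city inside the box gets no match at all, so the box size has to be chosen with the data in mind.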
If this isn't sufficient, then you might consider another way of reducing the search space, by looking only at neighboring states (in the US) or countries.
Finally, you may want to consider the geospatial extensions to MySQL if you are often dealing with this type of data.

Slow SQL Query by Limit/Order dynamic field (coordinates from X point)

I'm trying to run a SQL query against a database of 7 million records. The "geonames" database stores "latitude" and "longitude" as decimal(10,7), both indexed. The problem is that the query is too slow:
SELECT SQL_NO_CACHE DISTINCT
geonameid,
name,
(6367.41 * SQRT(2 * (1-Cos(RADIANS(latitude)) * Cos(0.704231626533) * (Sin(RADIANS(longitude))*Sin(-0.0669560660943) + Cos(RADIANS(longitude)) * Cos(-0.0669560660943)) - Sin(RADIANS(latitude)) * Sin(0.704231626533)))) AS Distance
FROM geoNames
WHERE (6367.41 * SQRT(2 * (1 - Cos(RADIANS(latitude)) * Cos(0.704231626533) * (Sin(RADIANS(longitude)) * Sin(-0.0669560660943) + cos(RADIANS(longitude)) * Cos(-0.0669560660943)) - Sin(RADIANS(latitude)) * Sin(0.704231626533))) <= '10')
ORDER BY Distance
The problem is the "Distance" field: because it is computed dynamically, using it in the "WHERE" condition and sorting by it takes a long time. If I remove the "WHERE ... <= 10" condition, the query takes only 0.34 seconds, but the result is 7 million records, and transferring that data from MySQL to PHP takes almost 120 seconds.
Can you think of any way to keep the query fast while limiting by the Distance field, given that the values in the query will change very often?
This kind of query cannot use an index but must compute whether the lat/lon of each row falls within the specified distance. Therefore, it is typical that some form of preprocessing is used to limit the scan to a subset of rows. You could create tables corresponding to distance "bands" (2, 5, 8, 10, 20 miles/km -- whatever makes sense for your application requirements) and then populate these bands and keep them up to date. If you want only those medical providers, say, or hotels, or whatever, within 10 miles of a given location, there's no need to worry about the ones that are hundreds or thousands of miles away. With ad hoc queries you could inner join on the "within 10 miles" band, say, and thereby exclude from the comparison scan all rows where the computed distance > 10. When the location varies, the "elegant" way to handle this is to implement an RTREE, but you can define your encompassing region in any arbitrary way you like if you have access to additional data -- e.g. by using zipcodes or counties or states.
There are two things you can do:
Make sure the data types are the same on both sides of a comparison, i.e. compare with 10 (a number), not '10' (a char type); it makes less work for the DB.
In cases like this, I create a view, which means the calculation is made just once, even if you refer to it more than once in the query.
If these two points are incorporated into your code, you get:
CREATE VIEW geoNamesDistance AS
SELECT SQL_NO_CACHE DISTINCT
geonameid,
name,
(6367.41 * SQRT(2 * (1-Cos(RADIANS(latitude)) * Cos(0.704231626533) * (Sin(RADIANS(longitude))*Sin(-0.0669560660943) + Cos(RADIANS(longitude)) * Cos(-0.0669560660943)) - Sin(RADIANS(latitude)) * Sin(0.704231626533)))) AS Distance
FROM geoNames;
SELECT * FROM geoNamesDistance
WHERE Distance <= 10
ORDER BY Distance;
I came up with:
select * from retailer
where latitude is not null and longitude is not null
and pow(2*(latitude - ?), 2) + pow(longitude - ?, 2) < your_magic_distance_value
With this fast & easy flat-Earth code, Los Angeles is closer to Honolulu than San Francisco, but I doubt customers will consider that when going that far to shop.

SQL Query For Total Points Within Radius of a Location

I have a database table of all zipcodes in the US that includes city,state,latitude & longitude for each zipcode. I also have a database table of points that each have a latitude & longitude associated with them. I'd like to be able to use 1 MySQL query to provide me with a list of all unique city/state combinations from the zipcodes table with the total number of points within a given radius of that city/state. I can get the unique city/state list using the following query:
select city,state,latitude,longitude
from zipcodes
group by city,state order by state,city;
I can get the number of points within a 100 mile radius of a specific city with latitude '$lat' and longitude '$lon' using the following query:
select count(*)
from points
where (3959 * acos(cos(radians($lat)) * cos(radians(latitude)) * cos(radians(longitude) - radians($lon)) + sin(radians($lat)) * sin(radians(latitude)))) < 100;
What I haven't been able to do is figure out how to combine these queries in a way that doesn't kill my database. Here is one of my sad attempts:
select city,state,latitude,longitude,
(select count(*) from points
where status="A" AND
(3959 * acos(cos(radians(zipcodes.latitude)) * cos(radians(latitude)) * cos(radians(longitude) - radians(zipcodes.longitude)) + sin(radians(zipcodes.latitude)) * sin(radians(latitude)))) < 100) as 'points'
from zipcodes
group by city,state order by state,city;
The tables currently have the following indexes:
Zipcodes - `zip` (zip)
Zipcodes - `location` (state,city)
Points - `status_length_location` (status,length,longitude,latitude)
When I run explain before the previous MySQL query here is the output:
+----+--------------------+----------+------+------------------------+------------------------+---------+-------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------+------+------------------------+------------------------+---------+-------+-------+---------------------------------+
| 1 | PRIMARY | zipcodes | ALL | NULL | NULL | NULL | NULL | 43187 | Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | points | ref | status_length_location | status_length_location | 2 | const | 16473 | Using where; Using index |
+----+--------------------+----------+------+------------------------+------------------------+---------+-------+-------+---------------------------------+
I know I could loop through all the zipcodes and calculate the number of matching points within a given radius but the points table will be growing all the time and I'd rather not have stale point totals in the zipcodes database. I'm hoping a MySQL guru out there can show me the error of my ways. Thanks in advance for your help!
MySQL Guru or not, the problem is that unless you find a way of filtering out various rows, the distance needs to be calculated between each point and each city...
There are two general approaches that may help the situation:
make the distance formula simpler
filter out unlikely candidates for the 100-mile radius from a given city
Before going into these two avenues of improvement, you should decide on the level of precision desired for this 100-mile distance, and you should indicate which geographic area is covered by the database (is this just the continental USA, etc.).
The reason for this is that, while more precise numerically, the Great Circle formula is very computationally expensive. Another avenue of performance improvement would be to store "grid coordinates" of sorts in addition to (or instead of) the lat/long coordinates.
Edit:
A few ideas about a simpler (but less precise) formula:
Since we're dealing with relatively small distances (and, I'm guessing, latitudes between 30 and 48 degrees north), we can use the Euclidean distance (or better yet the square of the Euclidean distance) rather than the more complicated spherical trigonometry formulas.
Depending on the level of precision expected, it may even be acceptable to use one single parameter for the linear distance of a full degree of longitude, taking an average over the area considered (say circa 46 statute miles). The formula would then become:
LatDegInMi = 69.0
LongDegInMi = 46.0
DistSquared = ((Lat1 - Lat2) * LatDegInMi) ^2 + ((Long1 - Long2) * LongDegInMi) ^2
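Written out as runnable Python, the simplified formula might look like the sketch below. The constants are the ones given above; the function names `dist_squared` and `within_radius` are made up for illustration. Comparing squared values avoids taking a square root for every pair of points.

```python
LAT_DEG_IN_MI = 69.0   # miles per degree of latitude
LONG_DEG_IN_MI = 46.0  # rough average miles per degree of longitude (assumed area)

def dist_squared(lat1, long1, lat2, long2):
    """Squared flat-Earth distance in square miles."""
    dy = (lat1 - lat2) * LAT_DEG_IN_MI
    dx = (long1 - long2) * LONG_DEG_IN_MI
    return dx * dx + dy * dy

def within_radius(lat1, long1, lat2, long2, radius_mi=100.0):
    # Compare against the squared radius so no square root is needed.
    return dist_squared(lat1, long1, lat2, long2) < radius_mi ** 2

if __name__ == "__main__":
    print(within_radius(40.0, -100.0, 41.0, -100.0))  # one degree of latitude apart
    print(within_radius(40.0, -100.0, 42.0, -100.0))  # two degrees of latitude apart
```

If you port this to MySQL, use POW(x, 2) for the squares, since ^ is the bitwise XOR operator there.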
On the idea of columns with grid info, used as a filter to limit the number of rows considered for the distance calculation:
Each "point" in the system, be it a city or another point (delivery locations, store locations... whatever), is assigned two integer coordinates which define the square of, say, 25 miles * 25 miles where the point lies. The coordinates of any point within 100 miles of the reference point (a given city) will be at most +/- 4 in the x direction and +/- 4 in the y direction. We can then write a query similar to the following:
SELECT Z.city, Z.state, Z.latitude, Z.longitude, COUNT(*)
FROM zipcodes Z
JOIN points P
ON P.GridX IN (Z.GridX - 4, Z.GridX - 3, Z.GridX - 2, Z.GridX - 1, Z.GridX,
Z.GridX + 1, Z.GridX + 2, Z.GridX + 3, Z.GridX + 4)
AND P.GridY IN (Z.GridY - 4, Z.GridY - 3, Z.GridY - 2, Z.GridY - 1, Z.GridY,
Z.GridY + 1, Z.GridY + 2, Z.GridY + 3, Z.GridY + 4)
WHERE P.Status = 'A'
-- LatDegInMi and LongDegInMi stand for the constants defined above
AND POW((Z.latitude - P.latitude) * LatDegInMi, 2)
+ POW((Z.longitude - P.longitude) * LongDegInMi, 2) < POW(100, 2)
GROUP BY Z.city, Z.state, Z.latitude, Z.longitude;
Note that the LongDegInMi could either be hardcoded (same for all locations within continental USA), or come from corresponding record in the zipcodes table. Similarly, LatDegInMi could be hardcoded (little need to make it vary, as unlike the other it is relatively constant).
The reason why this is faster is that for most records in the Cartesian product between the zipcodes table and the points table, we do not calculate the distance at all. We eliminate them on the basis of an index value (the GridX and GridY).
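How GridX and GridY get filled in is not spelled out above, so the following Python sketch shows one possible scheme, purely as an assumption: convert degrees to approximate miles with the two constants, then divide by the 25-mile cell size. The names `grid_cell` and `may_be_within_100_miles` are hypothetical.

```python
import math

LAT_DEG_IN_MI = 69.0
LONG_DEG_IN_MI = 46.0
CELL_MI = 25.0  # side length of one grid square, in miles

def grid_cell(lat, lon):
    """Return (GridX, GridY) for a point, using the flat approximation."""
    grid_x = math.floor(lon * LONG_DEG_IN_MI / CELL_MI)
    grid_y = math.floor(lat * LAT_DEG_IN_MI / CELL_MI)
    return grid_x, grid_y

def may_be_within_100_miles(city, point):
    """Cheap integer test; only pairs passing it need the real distance check."""
    cx, cy = grid_cell(*city)
    px, py = grid_cell(*point)
    return abs(cx - px) <= 4 and abs(cy - py) <= 4

if __name__ == "__main__":
    city = (40.0, -100.0)
    print(grid_cell(*city))
    print(may_be_within_100_miles(city, (40.5, -100.5)))
    print(may_be_within_100_miles(city, (45.0, -100.0)))
```

The grid test only rules pairs out; a pair that passes can still be farther apart than 100 miles, so the distance condition in the WHERE clause stays.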
This brings us to the question of which SQL indexes to produce. For sure, we may want:
- GridX + GridY + Status (on the points table)
- GridY + GridX + status (possibly)
- City + State + latitude + longitude + GridX + GridY on the zipcodes table
An alternative to the grids is to "bound" the limits of latitude and longitude which we'll consider, based on the latitude and longitude of a given city, i.e. the JOIN condition becomes a range rather than an IN:
JOIN points P
ON P.latitude > (Z.Latitude - (100 / LatDegInMi))
AND P.latitude < (Z.Latitude + (100 / LatDegInMi))
AND P.longitude > (Z.longitude - (100 / LongDegInMi))
AND P.longitude < (Z.longitude + (100 / LongDegInMi))
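The bounds in that JOIN condition are just the radius converted to degrees on each axis. A small Python sketch of the arithmetic, using the same two constants as above (the function names `bounding_box` and `in_box` are made up):

```python
LAT_DEG_IN_MI = 69.0
LONG_DEG_IN_MI = 46.0

def bounding_box(lat, lon, radius_mi=100.0):
    """Return (lat_min, lat_max, lon_min, lon_max) around a city."""
    dlat = radius_mi / LAT_DEG_IN_MI    # about 1.45 degrees for 100 miles
    dlon = radius_mi / LONG_DEG_IN_MI   # about 2.17 degrees for 100 miles
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon

def in_box(box, lat, lon):
    lat_min, lat_max, lon_min, lon_max = box
    # Strict inequalities, matching the > and < in the JOIN condition.
    return lat_min < lat < lat_max and lon_min < lon < lon_max

if __name__ == "__main__":
    box = bounding_box(40.0, -100.0)
    print(box)
    print(in_box(box, 41.0, -101.0), in_box(box, 42.0, -100.0))
```

The box is a little larger than the circle, so points in its corners still need the distance check if you want a true 100-mile radius.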
When I do this type of search, my needs allow some approximation. So I use the formula you have in your second query to first calculate the "bounds" (the four lat/long values at the extremes of the allowed radius), then take those bounds and do a simple query to find the matches within them (less than the max lat/long, more than the min lat/long). So what I end up with is everything within the square that encloses the circle defined by the radius.
SELECT * FROM tblLocation
WHERE 2 > POWER(POWER(Latitude - 40, 2) + POWER(Longitude - -90, 2), .5)
where the 2 is the maximum distance in degrees (the number of parallels away), and 40 and -90 are the lat/lon of the test point.
Sorry I didn't use your tablenames or structures, I just copied this out of one of my stored procedures I have in one of my databases.
If I wanted to see the number of points in a zip code I suppose I would do something like this:
SELECT
ParcelZip, COUNT(LocationID) AS LocCount
FROM
tblLocation
WHERE
2 > POWER(POWER(Latitude - 40, 2) + POWER(Longitude - -90, 2), .5)
GROUP BY
ParcelZip
Getting the total count of all locations in the range would look like this:
SELECT
COUNT(LocationID) AS LocCount
FROM
tblLocation
WHERE
2 > POWER(POWER(Latitude - 40, 2) + POWER(Longitude - -90, 2), .5)
A cross join may be inefficient here since we are talking about a large quantity of records but this should do the job in a single query:
SELECT
ZipCodes.ZipCode, COUNT(PointID) AS LocCount
FROM
Points
CROSS JOIN
ZipCodes
WHERE
2 > POWER(POWER(Points.Latitude - ZipCodes.Latitude, 2) + POWER(Points.Longitude - ZipCodes.Longitude, 2), .5)
GROUP BY
ZipCodes.ZipCode