GTFS SQL Find Closest Routes - mysql

I have a MySQL database set up with GTFS data, and I am trying to query it to return a list of routes (no duplicates) ordered by the distance to the closest stop on each respective route. (Note: the coordinates will change depending on where the user is.)
The database is rather large (millions of rows in the stop_times table, tens of thousands of rows in the stops table), so I want to make this as efficient as possible.
One thing I tried was to create a temporary table called stop_routes_link (created when the GTFS data is imported into my database) that links stops with routes such that there is an entry for each stop-route pair like so:
route_id | stop_id
-----------------------
25 | 366072709
21 | 366072709
21 | 194326291
F | 60745282
Q | 198000482
And then I ran this query which works perfectly:
SELECT routes.route_short_name AS route_short_name
FROM routes
LEFT JOIN (
SELECT ( 3959 * acos( cos( radians(39.94868155755109) ) * cos( radians( stops.stop_lat ) ) * cos( radians( stops.stop_lon ) - radians(-75.15972534860013) ) + sin( radians(39.94868155755109) ) * sin( radians( stops.stop_lat ) ) ) ) AS distance, stop_routes_link.route_id AS route_id
FROM stops
LEFT JOIN stop_routes_link ON stops.stop_id = stop_routes_link.stop_id ORDER BY distance)
AS stops ON routes.route_id = stops.route_id
ORDER BY stops.distance
And it returns:
route_short_name
----------------
23
23
12
12
9
21
38
Which is what I want (I know those are the closest routes to that location), except that it returns duplicate routes, since I think it is returning a row for each stop on a route.
How do I return only unique routes? I thought that the "LEFT JOIN" would cause only one row per entry in the routes table but it didn't.
I've also tried:
SELECT DISTINCT routes.route_short_name AS route_short_name
FROM routes
LEFT JOIN (
SELECT ( 3959 * acos( cos( radians(39.94868155755109) ) * cos( radians( stops.stop_lat ) ) * cos( radians( stops.stop_lon ) - radians(-75.15972534860013) ) + sin( radians(39.94868155755109) ) * sin( radians( stops.stop_lat ) ) ) ) AS distance, stop_routes_link.route_id AS route_id
FROM stops
LEFT JOIN stop_routes_link ON stops.stop_id = stop_routes_link.stop_id ORDER BY distance)
AS stops ON routes.route_id = stops.route_id
ORDER BY stops.distance
And:
SELECT routes.route_short_name AS route_short_name
FROM routes
LEFT JOIN (
SELECT ( 3959 * acos( cos( radians(39.94868155755109) ) * cos( radians( stops.stop_lat ) ) * cos( radians( stops.stop_lon ) - radians(-75.15972534860013) ) + sin( radians(39.94868155755109) ) * sin( radians( stops.stop_lat ) ) ) ) AS distance, stop_routes_link.route_id AS route_id
FROM stops
LEFT JOIN stop_routes_link ON stops.stop_id = stop_routes_link.stop_id ORDER BY distance)
AS stops ON routes.route_id = stops.route_id
GROUP BY route_short_name
ORDER BY stops.distance
But neither of them returns the closest routes. Both return a seemingly randomly ordered list of routes (I'm not sure how the order is determined), which I assume is because the grouping messes it up.
Any help would be greatly appreciated!
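One way to get a single row per route, offered as a sketch and not as an accepted answer: join routes to stops through stop_routes_link, group by route, and order by MIN(distance), so each route is ranked by its nearest stop. The example below runs that idea against an in-memory SQLite database from Python only so that it is self-contained. The sample rows and the registered `miles` helper are made up for illustration. In MySQL the same shape should apply with the inline 3959 * acos(...) expression in place of `miles(...)`.

```python
import math
import sqlite3


def miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (spherical law of cosines, radius 3959)."""
    if None in (lat1, lon1, lat2, lon2):
        return None
    cos_angle = (
        math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
        * math.cos(math.radians(lon2) - math.radians(lon1))
        + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
    )
    # Clamp against floating-point drift before calling acos.
    return 3959 * math.acos(max(-1.0, min(1.0, cos_angle)))


conn = sqlite3.connect(":memory:")
conn.create_function("miles", 4, miles)  # hypothetical helper, not a MySQL built-in
conn.executescript("""
    CREATE TABLE routes (route_id TEXT PRIMARY KEY, route_short_name TEXT);
    CREATE TABLE stops (stop_id INTEGER PRIMARY KEY, stop_lat REAL, stop_lon REAL);
    CREATE TABLE stop_routes_link (route_id TEXT, stop_id INTEGER);

    -- Invented sample data: four routes sharing four stops.
    INSERT INTO routes VALUES ('23', '23'), ('12', '12'), ('9', '9'), ('21', '21');
    INSERT INTO stops VALUES
        (1, 39.9490, -75.1600),
        (2, 39.9520, -75.1620),
        (3, 39.9700, -75.1800),
        (4, 40.0300, -75.2000);
    INSERT INTO stop_routes_link VALUES
        ('23', 1), ('23', 2), ('12', 2), ('12', 3), ('9', 3), ('21', 4);
""")

# One row per route, ranked by the distance to that route's nearest stop.
rows = conn.execute(
    """
    SELECT r.route_short_name,
           MIN(miles(?, ?, s.stop_lat, s.stop_lon)) AS distance
    FROM routes r
    JOIN stop_routes_link l ON l.route_id = r.route_id
    JOIN stops s ON s.stop_id = l.stop_id
    GROUP BY r.route_id, r.route_short_name
    ORDER BY distance
    """,
    (39.94868155755109, -75.15972534860013),
).fetchall()

closest_routes = [name for name, _ in rows]
print(closest_routes)
```

Grouping by route and taking MIN(distance) makes the ordering deterministic, whereas DISTINCT or a bare GROUP BY leaves it open which stop's distance represents each route.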

Related

Join two tables subquery

I'm trying to get * from the users2 table where the user's location is within the given radius.
The location query works fine on the user_location2 table.
SELECT uid, ( 3959 * acos( cos( radians(28.247800068217) ) * cos( radians( `lat` ) ) * cos( radians( `lon` ) - radians(-80.726205977101) )
+ sin( radians(28.247800068217) ) * sin( radians( `lat` ) ) ) )
AS distance FROM user_location2
HAVING distance <= 25 ORDER BY time_stamp
and the inner join works fine without the location subquery
SELECT *
FROM users2
LEFT JOIN user_location2
ON user_location2.uid = users2.id
I'm just having trouble combining the two. Here's my current query, which just returns all rows, so I'm obviously doing something wrong.
SELECT *
FROM users2
LEFT JOIN user_location2
ON user_location2.uid = users2.id
WHERE EXISTS (SELECT NULL, ( 3959 * acos( cos( radians(26.247800068217) ) * cos( radians( `lat` ) ) * cos( radians( `lon` ) - radians(-89.726205977101) ) + sin( radians(26.247800068217) ) * sin( radians( `lat` ) ) ) )
AS distance FROM user_location2
HAVING distance <= 5 ORDER BY time_stamp)
Edit included
I'm hoping to add in a 3rd table (user_like) to eliminate a lot of possible rows that shouldn't be included in the result.
Let's say the script is running for user_id = 88
So basically users 89, 90 and 91 would fall under the location radius, but wouldn't be included in the result because user 88 already liked them.
Try this...
SELECT users2.*
FROM users2
LEFT JOIN user_location2
ON user_location2.uid = users2.id
WHERE ( 3959 * acos( cos( radians(28.247800068217) ) * cos( radians( `lat` ) ) * cos( radians( `lon` ) - radians(-80.726205977101) )
+ sin( radians(28.247800068217) ) * sin( radians( `lat` ) ) ) ) <= 5
ORDER BY time_stamp
The join is fine between the two tables. The calculated column has been added in the select clause and the where clause because the filter requires it there. It would be easier to put that through a view so that if you need to change it, it can be done in one place.
EDIT: Removed the calculation from SELECT because I believe you don't need to see that. Just left it in the WHERE clause since it needs to be filtered on.
You can try this :
SELECT *
FROM users2 A
LEFT JOIN user_location2 B
ON B.uid = A.id
WHERE ( 3959 * acos( cos( radians(26.247800068217) ) * cos( radians( B.`lat` ) ) * cos( radians( B.`lon` ) - radians(-89.726205977101) ) + sin( radians(26.247800068217) ) * sin( radians( B.`lat` ) ) ) ) <= 25;

using first sql statement result into another sql statement

What I basically want to do is pick the coordinates from roadData
one by one, find all the points in tweetMelbourne within 20
miles of each, and insert those points into another table.
So for every (x, y) in the roadData table, find the neighbouring data points in
tweetMelbourne and insert them into a new table.
So I have to do this:
SELECT geo_coordinates_latitude, geo_coordinates_longitude
FROM tweetmelbourne
HAVING ( 3959 * acos( cos( radians(latitude) ) * cos( radians( geo_coordinates_latitude ) ) *
cos( radians( geo_coordinates_longitude ) - radians(longitude) ) + sin( radians(latitude) ) *
sin( radians( geo_coordinates_latitude ) ) ) ) < .1 ORDER BY distance LIMIT 0 , 20;
in which the values of latitude and longitude have to come from another table:
select longitude,latitude from roadData;
describe tweetmelbourne;
describe roadData;
SELECT geo_coordinates_latitude, geo_coordinates_longitude
FROM tweetmelbourne;
select longitude,latitude from roadData;
The correct syntax for IN() with multiple columns is: (val1, val2) IN (SELECT val1, val2 ...)
SELECT t.address,(t.x+t.y) as z
FROM student t
WHERE (t.x,t.y) IN(SELECT x,y FROM tweet)
Also can be done with a join :
SELECT t.address,(t.x+t.y) as z
FROM student t
JOIN tweet s
ON(t.x = s.x and t.y = s.y)
EDIT: I think what you want is:
SELECT s.address,t.x+t.y as z
FROM student s
CROSS JOIN tweet t
Try this:
SELECT s.address, (t.x + t.y) as z
from (SELECT id,x,y FROM `tweet`) as t, student s
WHERE t.id = s.id;
You need to join the two tables, calculating the distance in the ON clause to select the nearby rows.
SELECT *
FROM tweetmelbourne
JOIN roadData
ON ( 3959 * acos( cos( radians(latitude) ) * cos( radians( geo_coordinates_latitude ) ) *
cos( radians( geo_coordinates_longitude ) - radians(longitude) ) + sin( radians(latitude) ) *
sin( radians( geo_coordinates_latitude ) ) ) ) < .1
This will be very slow if the tables are large. It's not possible to use indexes to implement the join, so it will have to perform that complex formula on every pair of rows. You might want to look at MySQL's Spatial Data extensions.
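A common way to reduce how many rows need the full formula, offered here as a hedged sketch and not as part of the answer above, is to compute a latitude/longitude bounding box around the search point first. Ordinary indexes on the coordinate columns can then narrow the candidates with BETWEEN before the exact distance is checked. The function name `bounding_box` and the Melbourne coordinates are assumptions made for illustration. The sketch also assumes the search circle does not touch a pole or cross the 180° meridian.

```python
import math

EARTH_RADIUS_MILES = 3959  # same constant as the queries above


def bounding_box(lat, lon, radius_miles):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing the search circle.

    Assumes the circle stays away from the poles and the antimeridian.
    """
    angular_radius = radius_miles / EARTH_RADIUS_MILES  # in radians
    dlat = math.degrees(angular_radius)
    # A degree of longitude shrinks with cos(latitude), so the box widens.
    dlon = math.degrees(
        math.asin(math.sin(angular_radius) / math.cos(math.radians(lat)))
    )
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


# Illustrative point in Melbourne with the 0.1 mile radius from the query.
min_lat, max_lat, min_lon, max_lon = bounding_box(-37.8136, 144.9631, 0.1)

# The box can then be used as a cheap prefilter, for example:
#   WHERE geo_coordinates_latitude  BETWEEN min_lat AND max_lat
#     AND geo_coordinates_longitude BETWEEN min_lon AND max_lon
# followed by the exact 3959 * acos(...) test on the surviving rows.
print(min_lat, max_lat, min_lon, max_lon)
```

For the roadData join, each road point would need its own box, so the BETWEEN bounds would have to be expressed in terms of that row's latitude and longitude.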

Search by alias without showing the alias

I have a table of categories and a table of items.
Each item has latitude and longitude to allow me to search by distance.
What I want to do is show each category and how many items are in that category, within a distance chosen.
E.g. Show all TVs in Electronics category within 1 mile of my own latitude and longitude.
Here's what I'm trying but I cannot have two columns within an alias, obviously, and am wondering if there is a better way to do this?
Here is a SQL fiddle
Here's the query:
SELECT *, ( SELECT count(*),( 3959 * acos( cos( radians(52.993252) )
* cos( radians( latitude ) )
* cos( radians( longitude ) - radians(-0.412470) )
+ sin( radians(52.993252) )
* sin( radians( latitude ) ) ) ) AS distance
FROM items
WHERE category = category_id group by item_id
HAVING distance < 1 ) AS howmanyCat,
( SELECT name FROM categories WHERE category_id = c.parent ) AS parname
FROM categories c ORDER BY category_id, parent
First, start with the distance calculation for each item, then join in the category information, and finally aggregate and filter:
select c.*, count(i.item_id) as numitems
from categories c left outer join
(SELECT i.*, ( 3959 * acos( cos( radians(52.993252) ) * cos( radians( latitude ) )
* cos( radians( longitude ) - radians(-0.412470) ) + sin( radians(52.993252) )
* sin( radians( latitude ) ) )
) AS distance
FROM items i
) i
on c.category_id = i.category and distance < 1
group by c.category_id;
Is this what you're looking for:
SELECT categories.name, count(items.item_id) as cnt
FROM items
JOIN categories
ON categories.category_id=items.category
WHERE ( 3959 * acos( cos( radians(52.993252) )
* cos( radians( latitude ) )
* cos( radians( longitude ) - radians(-0.412470) )
+ sin( radians(52.993252) )
* sin( radians( latitude ) ) ) ) < 1
GROUP BY categories.category_id;
this gives:
Tvs | 1
You can put the expression for computing the distance inside a nested SELECT, and then join the results to the categories table, like this:
SELECT COUNT(*), cc.name FROM (
SELECT
i.item_id
, c.category_id
, ( 3959 * acos( cos( radians(52.993252) )
* cos( radians( latitude ) )
* cos( radians( longitude ) - radians(-0.412470) )
+ sin( radians(52.993252) )
* sin( radians( latitude ) ) ) ) AS distance
FROM items i
JOIN categories c ON c.category_id = i.category
) raw
JOIN categories cc ON raw.category_id = cc.category_id AND raw.distance < 1
GROUP BY cc.name
The nested query pairs up items and categories, and adds the calculated distance column. The outer query then filters the rows by distance, and groups them by category to produce the desired output:
COUNT(*) NAME
-------- ----
1 TVs
Demo on sqlfiddle.
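The inner-join versions above drop any category that has no items within range. If those categories should still appear with a count of 0, the distance test can move into the ON clause of a LEFT JOIN. The following is a minimal sketch of that variant, run against an in-memory SQLite database from Python so it is self-contained. The sample rows and the `miles` helper are invented for illustration, and in MySQL the inline 3959 * acos(...) expression would take the helper's place.

```python
import math
import sqlite3


def miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (spherical law of cosines, radius 3959)."""
    if None in (lat1, lon1, lat2, lon2):
        return None
    cos_angle = (
        math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
        * math.cos(math.radians(lon2) - math.radians(lon1))
        + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
    )
    return 3959 * math.acos(max(-1.0, min(1.0, cos_angle)))


conn = sqlite3.connect(":memory:")
conn.create_function("miles", 4, miles)  # hypothetical helper
conn.executescript("""
    CREATE TABLE categories (category_id INTEGER PRIMARY KEY, name TEXT, parent INTEGER);
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, category INTEGER,
                        latitude REAL, longitude REAL);

    -- Invented sample data: one nearby TV, one distant TV, no Electronics items.
    INSERT INTO categories VALUES (1, 'Electronics', 0), (2, 'TVs', 1);
    INSERT INTO items VALUES
        (1, 2, 52.9935, -0.4125),
        (2, 2, 53.5000, -1.0000);
""")

# The distance filter lives in the ON clause, so categories without a
# matching item survive the LEFT JOIN and COUNT(i.item_id) reports 0.
counts = conn.execute(
    """
    SELECT c.name, COUNT(i.item_id) AS numitems
    FROM categories c
    LEFT JOIN items i
      ON i.category = c.category_id
     AND miles(?, ?, i.latitude, i.longitude) < 1
    GROUP BY c.category_id, c.name
    ORDER BY c.category_id
    """,
    (52.993252, -0.412470),
).fetchall()

print(counts)
```

Counting i.item_id instead of * matters here: COUNT(*) would report 1 for a category whose only row is the NULL-padded one produced by the LEFT JOIN.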

MySQL distinct or group by in combination with having not giving a result when result is a single row

It seems that my query is not doing exactly what I want. The query returns a result as long as the result is 2 or more rows. When the result should be a single row, the query returns nothing.
In the SELECT I can do DISTINCT (ct.name) but this gives the same problem as the group by.
SELECT
ct.name,
( 3959 * acos(cos(radians(52.779716)) * cos(radians( com.gps_lat )) * cos(radians( com.gps_lon ) -
radians(21.84803)) + sin( radians(52.779716) ) * sin( radians( com.gps_lat )))) as distance
FROM cuisine_types as ct
Left joining company to check if a company is attached to the cuisine_type
LEFT JOIN company AS com ON (com.cuisine_type_id = ct.id)
Here I'm grouping the results so no Cuisine Type appears twice.
this only seems to work when the result is 2 or more rows...
GROUP BY ct.name
Here I'm checking if the distance of the company is within the user's preferred search radius
HAVING distance < 20;
For example, if I had 'Fastfood', 'Vegan', and 'Healthy' as Cuisine Types, I only want one of each Cuisine Type, no matter how many companies within the search distance are related to that Cuisine Type. So I filter out the duplicate Cuisine Types using the GROUP BY. I hope this helps with understanding my approach in this query.
NOTE: There is only one Cuisine Type attached to a company.
Full sql query without comments down here
SELECT ct.name, ( 3959 * acos( cos( radians(52.779716) ) * cos(
radians( com.gps_lat ) ) * cos( radians( com.gps_lon ) -
radians(21.84803) ) + sin( radians(52.779716) ) * sin( radians(
com.gps_lat ) ) ) ) as distance FROM cuisine_types as ct LEFT JOIN
company AS com ON (com.cuisine_type_id = ct.id) GROUP BY ct.name
HAVING distance < 20;
Try this:
SELECT
ct.name,
min( ( 3959 * acos( cos( radians(52.779716) ) * cos( radians( com.gps_lat ) ) * cos( radians( com.gps_lon ) - radians(21.84803) ) + sin( radians(52.779716) ) * sin( radians( com.gps_lat ) ) ) ) ) as distance
FROM
cuisine_types as ct
LEFT JOIN company AS com ON (com.cuisine_type_id = ct.id)
GROUP BY
ct.name
HAVING
distance < 20;
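Why MIN helps: when the query groups by ct.name but selects a non-aggregated distance, MySQL (with ONLY_FULL_GROUP_BY disabled) is free to take that value from any row in the group, so HAVING may end up testing a far-away company even though a nearby one exists. Wrapping the expression in MIN makes the test use the nearest company for each cuisine type. Below is a minimal sketch of the aggregated form, run against in-memory SQLite from Python so it is self-contained. The sample rows and the `miles` helper are assumptions made for illustration.

```python
import math
import sqlite3


def miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles; returns None when a coordinate is NULL."""
    if None in (lat1, lon1, lat2, lon2):
        return None
    cos_angle = (
        math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
        * math.cos(math.radians(lon2) - math.radians(lon1))
        + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
    )
    return 3959 * math.acos(max(-1.0, min(1.0, cos_angle)))


conn = sqlite3.connect(":memory:")
conn.create_function("miles", 4, miles)  # hypothetical helper
conn.executescript("""
    CREATE TABLE cuisine_types (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE company (id INTEGER PRIMARY KEY, cuisine_type_id INTEGER,
                          gps_lat REAL, gps_lon REAL);

    -- Invented sample data.
    INSERT INTO cuisine_types VALUES (1, 'Fastfood'), (2, 'Vegan'), (3, 'Healthy');
    INSERT INTO company VALUES
        (1, 1, 53.50, 23.00),   -- Fastfood, far away
        (2, 1, 52.78, 21.85),   -- Fastfood, nearby
        (3, 2, 54.00, 18.60);   -- Vegan, far away
""")

# MIN() picks the nearest company per cuisine type before HAVING filters.
# 'Healthy' has no company, so its distance is NULL and it is filtered out.
nearby = conn.execute(
    """
    SELECT ct.name,
           MIN(miles(?, ?, com.gps_lat, com.gps_lon)) AS distance
    FROM cuisine_types AS ct
    LEFT JOIN company AS com ON com.cuisine_type_id = ct.id
    GROUP BY ct.name
    HAVING distance < 20
    """,
    (52.779716, 21.84803),
).fetchall()

nearby_names = [name for name, _ in nearby]
print(nearby_names)
```

In this sample only 'Fastfood' survives, and it does so as a single row even though two companies are attached to it.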

SQL: Two different queries to merge

I have these two different queries.
This query pulls the records from "posts" table as per their replies counter. Only posts with replies are returned with this query:
SELECT posts.title, posts.num, posts.status, COUNT( posts_replies.post_num) AS count
FROM posts_replies
INNER JOIN posts ON ( posts_replies.post_num = posts.num )
WHERE posts.status = 1
AND posts.category='uncategorized'
GROUP BY posts.num
And this is a new query that I want to merge with the one above to pull and sort records by GPS distance.
SELECT num, title, ( 3959 * acos( cos( radians( 37 ) ) * cos( radians( lat ) ) * cos( radians( lon ) - radians( -122 ) ) + sin( radians( 37 ) ) * sin( radians( lat ) ) ) ) AS distance
FROM posts
HAVING distance <75
ORDER BY distance
This query uses the columns lat and lon to return records that are within a 75-mile radius of the user.
I am not a sql expert and don't know how to merge both of the queries to gather results having the following criteria:
Only return posts with replies
Sort by their distance
Sort by their number of replies
Any help would be highly appreciated.
Thanks!
The having clause in the second query does not look correct. In most dialects of SQL it would not be allowed without a group by. I forget whether MySQL implicitly treats the whole query as an aggregation (returning one row) or whether the having gets converted to a where. In either case, you should be explicit and use where when there are no aggregations.
You can just combine them by putting in the where clause. I would do it with a subquery, to make the variable definitions clearer:
SELECT p.title, p.num, p.status, p.distance,
COUNT( pr.post_num) AS count
FROM posts_replies pr INNER JOIN
(select p.*,
( 3959 * acos( cos( radians( 37 ) ) * cos( radians( lat ) ) * cos( radians( lon ) - radians( -122 ) ) + sin( radians( 37 ) ) * sin( radians( lat ) ) ) ) AS distance
from posts p
) p
ON pr.post_num = p.num
WHERE p.status = 1 AND
p.category='uncategorized' and
distance < 75
GROUP BY p.num
order by distance