How to extract stops AND their stop_sequence knowing a route_id from GTFS data using MySQL.
I want this because I'm trying to draw the routes using Leaflet, which requires the stop coordinates to be given in the right order.
I've only found the stop_sequence information in the stop_times.txt file, but it's only correct for one trip on this route.
This answer only tells me which stops are associated with a certain route, but not in the right order.
I think you've arrived at your own answer here: Stops are ordered in sequence only along a specific trip, of which a route normally has many. This is meant to accommodate routes that have multiple branches or that change their path at certain times, such as a route that makes a diversion through an industrial park during rush hour.
What you'll need to do is first identify a trip that is typical of the route you intend to plot, and note its trip ID. To get a list of all the trips along a specific route, run a query like
SELECT id, headsign, short_name, direction_id
FROM trips
WHERE route_id = <route_id>;
Once you've selected a trip, getting the list of the stops it visits, in order, is straightforward:
SELECT code, name, lat, lon, arrival_time, departure_time
FROM stops
INNER JOIN stop_times ON stop_times.stop_id = stops.id
WHERE trip_id = <trip_id>
ORDER BY stop_sequence ASC;
(I've added a few extra fields here for clarity; it sounds like all you really need are the lat and lon fields included in the results.)
So how do you identify a "typical" trip for the route you want to plot? Often the headsign information for a trip indicates its branch or destination. If you need to be more specific—identifying trips that run between certain hours on certain days, for instance—the information in the calendars and calendar_dates tables can help you narrow these down.
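As a rough sketch of the two steps above, the following uses Python's built-in sqlite3 module with a tiny made-up feed so that it runs anywhere; the same SQL should carry over to MySQL. The table and column names follow the raw GTFS files (trip_id, stop_lat and so on) rather than the shortened names used in the queries above, and picking the trip with the most stops as the "typical" one is only an assumption that will not suit every feed.

```python
import sqlite3

# Tiny in-memory feed (made-up data) with one route and two trip variants.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT, trip_headsign TEXT);
CREATE TABLE stops (stop_id TEXT PRIMARY KEY, stop_name TEXT, stop_lat REAL, stop_lon REAL);
CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);
INSERT INTO trips VALUES ('t1', 'r1', 'Downtown'), ('t2', 'r1', 'Downtown via Park');
INSERT INTO stops VALUES ('a', 'A', 45.50, -73.60), ('b', 'B', 45.51, -73.59),
                         ('c', 'C', 45.52, -73.58), ('p', 'Park', 45.515, -73.57);
INSERT INTO stop_times VALUES ('t1', 'a', 1), ('t1', 'b', 2), ('t1', 'c', 3),
                              ('t2', 'a', 1), ('t2', 'b', 2), ('t2', 'p', 3), ('t2', 'c', 4);
""")

def longest_trip(conn, route_id):
    """Assumed heuristic: treat the trip with the most stops as 'typical'."""
    row = conn.execute("""
        SELECT t.trip_id
        FROM trips t JOIN stop_times st ON st.trip_id = t.trip_id
        WHERE t.route_id = ?
        GROUP BY t.trip_id
        ORDER BY COUNT(*) DESC, t.trip_id
        LIMIT 1""", (route_id,)).fetchone()
    return row[0] if row else None

def ordered_coordinates(conn, trip_id):
    """Return (lat, lon) pairs for one trip, in stop_sequence order."""
    return conn.execute("""
        SELECT s.stop_lat, s.stop_lon
        FROM stops s JOIN stop_times st ON st.stop_id = s.stop_id
        WHERE st.trip_id = ?
        ORDER BY st.stop_sequence""", (trip_id,)).fetchall()

trip_id = longest_trip(conn, "r1")
latlngs = ordered_coordinates(conn, trip_id)
print(trip_id, latlngs)
```

The resulting list of (lat, lon) pairs is already in drawing order, so it can be serialised and handed to Leaflet's L.polyline as is.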
This is a query about the requirements for the routes file in a GTFS feed. If I understand correctly, a route is a set of trips spread out across a time horizon. For example, if there is a bus travelling between stations A and B five times a day, these trips would be allotted one route ID.
Now suppose there are two other stations, let's say C and D, distinct from A and B and not lying between A and B. Suppose these stations also have 5 trips running between them every day.
If a GTFS feed assigns these two sets of trips the same route ID, would this be a violation of the GTFS requirements?
One example of such a feed can be found here: https://gtfs.de/de/feeds/de_rv/
One example is the route with route ID 22. This ID is used for trips between stations that lie in two non-adjacent states (Nordrhein-Westfalen and Baden-Württemberg). The stations have no overlap.
Would this be a violation of the GTFS specification?
Just a few days ago I had the same problem. It turns out that the creators of the feed grouped together routes that share the same real-world name (route_long_name).
In your case (route_id 22) it would be the S6, which is probably also used in different cities.
I don't really understand the logic behind this, but according to them it is still a valid feed.
I'm working on a map and keep the locations in a database.
The table contains many different map locations.
Different users click on different positions on the map, so I have to run a query against the database to find each location. Is this correct? That would mean running a query for every click.
My idea is to define an area (bounding dimensions) and, whenever that area is loaded, select all the locations inside it in one go, to avoid additional queries to the database.
What is the best way to optimize this?
What is a location on a map? Is it a single point? Or is it a region, such as a country, county, province, etc?
In the former case the problem is to "find the nearest" item on the map to the mouse click point. I cover that here
For the latter case, you need to turn each region into a polygon and enter it into a geographic object that is indexed with a SPATIAL index.
It is beyond the scope of this forum to provide all the details for either of your cases. If you have trouble, come back with a more specific question, including the steps you have taken so far.
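For the single-point case, the idea from the question of loading everything inside the visible area once, then resolving clicks without further queries, might look roughly like the sketch below. It uses Python's built-in sqlite3 module so that it runs anywhere; the locations table, its columns and the sample rows are all hypothetical, and the plain bounding-box test ignores viewports that cross the 180° meridian.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT, lat REAL, lon REAL);
CREATE INDEX idx_locations_lat_lon ON locations (lat, lon);
INSERT INTO locations (name, lat, lon) VALUES
  ('cafe', 35.70, 51.40), ('museum', 35.71, 51.42), ('airport', 35.42, 51.15);
""")

def locations_in_view(conn, south, west, north, east):
    """One query per viewport: every location inside the bounding box."""
    return conn.execute("""
        SELECT name, lat, lon FROM locations
        WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?
        ORDER BY id""", (south, north, west, east)).fetchall()

def nearest(points, click_lat, click_lon):
    """Resolve a click in memory; squared-degree distance is a rough
    approximation that is only reasonable for small viewports."""
    return min(points,
               key=lambda p: (p[1] - click_lat) ** 2 + (p[2] - click_lon) ** 2,
               default=None)

visible = locations_in_view(conn, 35.65, 51.35, 35.75, 51.45)
clicked = nearest(visible, 35.705, 51.415)
print(visible, clicked)
```

With this shape the database is only queried when the viewport changes, and each click is answered from the rows already loaded.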
I have the lat and lon coordinates of a spot and the radius within which I want to search for stops. I then call a function from google-maps to query my GTFS database with those variables, but I don't know what the query should look like. Can I select the wanted routes using only an SQL query? If so, how can I do that?
If it can't be done using only SQL, what are my options?
*Sorry for the broad question and lack of code samples, but I'm new to this and sometimes need some basic conceptual guidance.
Anyway, thanks for the help.
(Caveat: I'm not that familiar with MySQL and these queries are untested.)
First define a function in MySQL to calculate the distance between pairs of lat-long points. See e.g. this answer. Then, to select stops near a given point:
SELECT stop_id
FROM stops
WHERE getDistanceBetweenPointsNew(stop_lat, stop_lon, my_lat, my_lon) < my_dist;
There is no extremely natural way to find routes associated with stops in the GTFS spec. To do so, you'll need to join trips against stop_times, which will be slow if your stop_times table is large and/or unindexed. I suggest pre-calculating a table associating stops and routes:
CREATE TABLE route_stop AS
SELECT DISTINCT route_id, stop_id
FROM trips
JOIN stop_times
ON trips.trip_id = stop_times.trip_id;
Assuming this table has been created, you can find the list of routes that stop near a given point like so:
SELECT route_id
FROM stops
JOIN route_stop
ON stops.stop_id = route_stop.stop_id
WHERE getDistanceBetweenPointsNew(stop_lat, stop_lon, my_lat, my_lon) < my_dist;
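Since the queries above are untested, here is a small runnable sketch of the same idea using Python's built-in sqlite3 module and made-up data. The distance function is a standard haversine formula registered under the hypothetical name haversine_km (standing in for getDistanceBetweenPointsNew); in MySQL you would define a stored function instead, and the units of the radius depend on how that function is written.

```python
import math
import sqlite3

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL.
conn.create_function("haversine_km", 4, haversine_km)
conn.executescript("""
CREATE TABLE stops (stop_id TEXT PRIMARY KEY, stop_lat REAL, stop_lon REAL);
CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT);
CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);
INSERT INTO stops VALUES ('near', 52.5200, 13.4050), ('far', 48.1372, 11.5756);
INSERT INTO trips VALUES ('t1', 'r1'), ('t2', 'r2');
INSERT INTO stop_times VALUES ('t1', 'near', 1), ('t2', 'far', 1);
-- Pre-computed association between routes and the stops they serve.
CREATE TABLE route_stop AS
  SELECT DISTINCT route_id, stop_id
  FROM trips JOIN stop_times ON trips.trip_id = stop_times.trip_id;
""")

def routes_near(conn, lat, lon, radius_km):
    """Routes with at least one stop closer than radius_km to (lat, lon)."""
    rows = conn.execute("""
        SELECT DISTINCT route_id
        FROM stops JOIN route_stop ON stops.stop_id = route_stop.stop_id
        WHERE haversine_km(stop_lat, stop_lon, ?, ?) < ?
        ORDER BY route_id""", (lat, lon, radius_km)).fetchall()
    return [r[0] for r in rows]

nearby = routes_near(conn, 52.5163, 13.3777, 5.0)
print(nearby)
```

The DISTINCT in the final query matters once a route has several stops inside the radius; without it the same route_id would be returned once per matching stop.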
I have a queue of photos to be featured on the homepage of a photography site. Photographers tend to upload several dozen shots at once, meaning that editors selecting the best of the uploads are very likely to put several shots from the same photographer onto the queue one after the other. But we don't want one photographer to own the homepage for hours on end.
At the moment, we sort the queue manually so that it's in queued time order (FIFO) as far as possible, but with no two shots by the same photographer closer than five slots apart. We'd like to automate this.
I know we can do the sorting in PHP, but can we retrieve the queue in the right order with a single MySQL query?
The table structure looks something like this. We sort the queue by swapping the queued_time of two adjacent shots - hardly ideal, but it works:
homepage_queue
--------------
id INT NOT NULL
photo INT NOT NULL
queued_time INT NOT NULL
photos
------
id INT NOT NULL
photographer INT NOT NULL
A browse through related SO questions threw up this page, which seems to suggest that I need to emulate Oracle's LAG function: http://onlamp.com/pub/a/mysql/2007/04/12/emulating-analytic-aka-ranking-functions-with-mysql.html?page=2
Especially when I consider that I need to look at the last five rows, that looks messy enough that I'm tempted to run away screaming and do it in PHP, but is there an easier way I'm missing?
We generally keep the queue stuffed out to a week, at an hour per photo, so we're talking about maybe 200 records at the outside.
There are bound to be some photos at the end of the queue that can't be sorted in a way that fulfils the "five apart" rule. That doesn't matter, because we'll likely run the job every 24hrs, and with a steady stream of uploads it's OK for the tail end of the queue to be a mess.
I would add another table to the mix, one that records the last 5 photographers to have appeared on your website.
Query to pick your next photo:
SELECT
homepage_queue.photo
FROM
homepage_queue
INNER JOIN
photos
ON photos.id = homepage_queue.photo
LEFT JOIN
(SELECT photographer, COUNT(*) AS occurances FROM last_five GROUP BY photographer) AS last_five
ON last_five.photographer = photos.photographer
ORDER BY
last_five.occurances ASC,
homepage_queue.queued_time
LIMIT
1
Once you've picked your photo:
- Store that value somewhere
- Delete the oldest entry from last_five
- Add a new entry to last_five relating to the new photo's photographer
- Delete the chosen photo from the queue
A little extra maintenance, but a solution that's relatively simple and maintains itself.
If the queue is full of just two photographers, they'll alternate
If a new photographer then uploads a few photos, they'll get priority
The photographers with the fewest occurrences in the last 5 always get priority
EDIT:
This simplifies the problem by focussing only on "what's next?"
You can adapt this to generate a whole new queue by repeating the process 24 times in a loop. On each iteration you push the next photo onto your new queue.
You could even generate that list of 24 photos once, then use a single iteration each hour:
- Remove one photo
- Use this method to add one photo
Then you have a constant list of 24 photos, a method to always add "the right one" to the end of the list, and the ability to re-order that list of 24 at any time you like.
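To make the loop concrete, here is a minimal in-memory sketch of the selection rule in Python. It mirrors the ORDER BY of the query above (fewest appearances among the last five photographers first, then oldest queued_time); the function name, the tuple layout and the sample data are made up for illustration. Like the query, it is a priority heuristic and does not strictly guarantee five-slot spacing.

```python
from collections import Counter, deque

def build_schedule(queue, recent_photographers=(), slots=24, window=5):
    """queue: list of (photo_id, photographer, queued_time) tuples.

    Repeatedly picks the photo whose photographer appears least often among
    the last `window` picks, breaking ties by the oldest queued_time.
    """
    remaining = list(queue)
    recent = deque(recent_photographers, maxlen=window)  # plays the role of last_five
    schedule = []
    while remaining and len(schedule) < slots:
        counts = Counter(recent)  # photographers not seen recently count as 0
        pick = min(remaining, key=lambda row: (counts[row[1]], row[2]))
        remaining.remove(pick)
        recent.append(pick[1])  # the oldest entry drops out automatically
        schedule.append(pick[0])
    return schedule

queue = [(1, "ann", 100), (2, "ann", 101), (3, "ann", 102),
         (4, "bob", 103), (5, "cat", 104)]
order = build_schedule(queue)
print(order)
```

In this sample, ann's three consecutive uploads get interleaved with bob's and cat's photos, and the leftover ann photos end up together at the tail of the schedule.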
Really, when you ask a question, you should provide some information about the data structure. Let me assume that you have the following columns in the underlying table:
Queue position
Photographer
Photo id
If so, the following will return one row per photographer, based on the first photo in the queue:
select q.*
from Queue q join
(select PhotographerId, min(QueuePosition) as minQP
from queue q
group by PhotographerId
) qp
on q.QueuePosition = minQP
order by q.QueuePosition
The following is the variation for your actual data:
select q.*
from homepage_queue q join
     (select p.photographer, min(hpq.queued_time) as minQT
      from homepage_queue hpq join
           photos p
           on hpq.photo = p.id
      group by p.photographer
     ) qp
     on q.queued_time = qp.minQT
order by q.queued_time
This will work assuming that the queued_time values are unique. If they are not, a bit more work would need to be done.
I want to design a database about bus stations. There are about 60 buses in the city, and each of them has the following information:
BusID
BusName
Lists of stations in the way (forward and back)
This database must be efficient to search. For example, when a user wants to list the buses that go through stations A and B, the query must run quickly.
My first thought is to put stations in a separate table containing StationId and Station, so that each list of stations holds those StationIds. I guess it may work, but I'm not sure it's efficient.
How can I design this database?
Thank you very much.
Have you looked at Database Answers to see if there is a schema that fits your requirements?
I had to solve this problem and I used this:
Line
    number
    name
Station
    name
    latitude
    longitude
    is_terminal
Holiday
    date
    description
Route
    line_id
    from_terminal : station_id
    to_terminal : station_id
Route schedule
    route_id
    is_holiday_schedule
    starting_at
Route stop
    route_id
    station_id
    elapsed_time_from_start : in minutes
Does it look good to you?
Some random thoughts based on travel on London buses In My Youth, because this could be quite complex I think.
You might need entities for the following:
1. Bus -- the physical entity, with a particular model (i.e. particular seating capacity and disabled access, and dimensions etc.) and VIN.
2. Bus stop -- the location at which a bus stops. Usually bus stops come in pairs, one for each side of the road, but sometimes they are on a one-way road.
3. Route -- a sequence of bus stops and the route between them (multiple possible roads exist). Sometimes buses do not run the entire route, or skip stops (fast service). Is a route just one direction, or is it both? Maybe a route is actually a loop, not a there-and-back.
4. Service -- a bus following a certain route
5. Scheduled Run -- an event when a bus on a particular service follows a particular route. It starts at some part of the route, ends at another part, and maybe skips certain stops (see 3).
6. Actual Run -- a particular bus followed a particular scheduled run. What time did it start, what time did it get to particular stops, how many people got on and off, what kind of ticket did they have?
(This sounds like homework, so I won't give a full answer.)
It seems like you just need a many-to-many relationship between buses and stops using 3 tables. A query with two inner joins will give you the buses that stop at two specific stops.
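A minimal sketch of that three-table layout and the two-join query, using Python's built-in sqlite3 module so that it can be run as is. All table and column names are made up, and the direction and stop_order columns are an added assumption so that the query can also check that the first station comes before the second on the same run; drop that condition if you only need "passes through both".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bus (bus_id INTEGER PRIMARY KEY, bus_name TEXT);
CREATE TABLE station (station_id INTEGER PRIMARY KEY, station_name TEXT);
-- Junction table: one row per stop a bus makes, per direction.
CREATE TABLE bus_station (
  bus_id INTEGER REFERENCES bus(bus_id),
  station_id INTEGER REFERENCES station(station_id),
  direction TEXT,
  stop_order INTEGER,
  PRIMARY KEY (bus_id, direction, stop_order)
);
-- Index that lets the lookup by station avoid a full scan.
CREATE INDEX idx_bus_station_station ON bus_station (station_id, bus_id);
INSERT INTO bus VALUES (1, 'Line 1'), (2, 'Line 2');
INSERT INTO station VALUES (10, 'A'), (20, 'B'), (30, 'C');
INSERT INTO bus_station VALUES (1, 10, 'forward', 1), (1, 20, 'forward', 2),
                               (1, 30, 'forward', 3), (2, 30, 'forward', 1),
                               (2, 20, 'forward', 2);
""")

def buses_from_to(conn, from_station, to_station):
    """Buses that visit from_station and later to_station in one direction."""
    rows = conn.execute("""
        SELECT DISTINCT b.bus_name
        FROM bus b
        JOIN bus_station s1 ON s1.bus_id = b.bus_id
        JOIN bus_station s2 ON s2.bus_id = b.bus_id
                           AND s2.direction = s1.direction
                           AND s2.stop_order > s1.stop_order
        WHERE s1.station_id = ? AND s2.station_id = ?
        ORDER BY b.bus_name""", (from_station, to_station)).fetchall()
    return [r[0] for r in rows]

direct = buses_from_to(conn, 10, 20)
print(direct)
```

With about 60 buses the junction table stays small, so the index is more of a habit than a necessity here.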
I'd hack it.
bus_id int
path varchar(max)
If a bus goes through the following stations (in this order):
01
03
09
17
28
Then I'd put in a record where path was set to
'-01-03-09-17-28-'
When someone wants to find a bus to get from station 03 to 28, then my select statement is
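select * from buses where path like '%-03-%28-%'
Not scalable, not elegant, but dead simple and won't churn through tables like mad when trying to find a route. (The pattern has no hyphen in front of 28 so that it still matches when 28 comes directly after 03 and the two share a single hyphen; this relies on fixed-width, zero-padded station IDs.) Of course, it only works if there's a single bus that goes through the two stations in question.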
What you have in mind is good, though it may or may not be efficient in every case. I think you should create two tables: table1(BusID, BusName) and table2(StationList, BusID). I think this would help. Use joins between these two tables to get the result. One more thing: if possible, normalize the tables; that would help you.
I'd go for 3 tables :
bus
stations
bus_stations
"bus" is what the name says, "stations" holds the station IDs and names, and "bus_stations" connects the other two tables, with the columns bus_id, station_id_from and station_id_to.
This is probably more complex than you really need, but it will be useful if, in the future, you need to know the full trajectory of a bus, or which station a bus comes from when it arrives at "B station".
60 buses will not make that much impact in performance though.