GTFS data format: non-unique route ID?

This is a query about the requirements for the routes file in a GTFS feed. If I understand correctly, a route is a set of trips spread out across a time horizon. For example, if a bus travels between stations A and B five times a day, these trips would be allotted one route ID.
Now suppose there are two other stations, let's say C and D, distinct from A and B and not lying between A and B. Suppose these stations also have five trips running between them every day.
If a GTFS feed assigns these two sets of trips the same route ID, would this be a violation of the GTFS requirements?
One example of such a feed can be found here: https://gtfs.de/de/feeds/de_rv/
One example is the route with route ID 22. This ID is used for trips between stations that lie in two non-adjacent states (Nordrhein-Westfalen and Baden-Württemberg). The stations have no overlap.
Would this be a violation of the GTFS specification?

Just a few days ago I had the same problem. It turns out that the creators of the feed merged routes that share the same real-world name (route_long_name).
In your case (route_id 22) it would be the S6, which is probably also used in different cities.
I don't really understand the logic behind this, but according to them it is still a valid feed.
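If you want to find out which route IDs in a feed are affected, one option is to count, per route, how many groups of stops never share a trip. The following is only a rough sketch: the function name disjoint_stop_groups and the two input dictionaries are made up for illustration, and you would have to fill them from trips.txt and stop_times.txt yourself.

```python
from collections import defaultdict


def disjoint_stop_groups(trip_routes, trip_stops):
    """Count, per route_id, the groups of stops that are never linked by a trip.

    trip_routes: {trip_id: route_id}            (hypothetical, from trips.txt)
    trip_stops:  {trip_id: [stop_id, ...]}      (hypothetical, from stop_times.txt)
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    nodes_by_route = defaultdict(set)
    for trip_id, route_id in trip_routes.items():
        # Key nodes by (route_id, stop_id) so routes are analysed independently.
        nodes = [(route_id, stop_id) for stop_id in trip_stops.get(trip_id, [])]
        for node in nodes:
            find(node)
        for other in nodes[1:]:
            union(nodes[0], other)  # all stops of one trip belong together
        nodes_by_route[route_id].update(nodes)

    return {
        route_id: len({find(node) for node in nodes})
        for route_id, nodes in nodes_by_route.items()
    }


# Made-up data: route "22" serves A-B and, separately, C-D.
trip_routes = {"t1": "22", "t2": "22", "t3": "22", "t4": "7"}
trip_stops = {"t1": ["A", "B"], "t2": ["B", "A"], "t3": ["C", "D"], "t4": ["A", "C"]}
groups = disjoint_stop_groups(trip_routes, trip_stops)
```

A count above 1 only tells you that a route ID covers stop sets with no trip in common; whether that is intended is still up to the feed publisher.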

Related

Draw routes with leaflet from GTFS data

How can I extract the stops AND their stop_sequence for a given route_id from GTFS data using MySQL?
I want this because I'm trying to draw the routes using Leaflet, which requires the stop coordinates in the right order.
I've only found the stop_sequence information in the stop_times.txt file, but it's only correct for one trip on this route.
This answer only tells me which stops are associated with a certain route, but not in the right order.
I think you've arrived at your own answer here: Stops are ordered in sequence only along a specific trip, of which a route normally has many. This is meant to accommodate routes that have multiple branches or that change their path at certain times, such as a route that makes a diversion through an industrial park during rush hour.
What you'll need to do is first identify a trip that is typical of the route you intend to plot, and note its trip ID. To get a list of all the trips along a specific route, run a query like
SELECT id, headsign, short_name, direction_id
FROM trips
WHERE route_id = <route_id>;
Once you've selected a trip, getting the list of the stops it visits, in order, is straightforward:
SELECT code, name, lat, lon, arrival_time, departure_time
FROM stops
INNER JOIN stop_times ON stop_times.stop_id = stops.id
WHERE trip_id = <trip_id>
ORDER BY stop_sequence ASC;
(I've added a few extra fields here for clarity; it sounds like all you really need are the lat and lon fields included in the results.)
So how do you identify a "typical" trip for the route you want to plot? Often the headsign information for a trip indicates its branch or destination. If you need to be more specific—identifying trips that run between certain hours on certain days, for instance—the information in the calendars and calendar_dates tables can help you narrow these down.
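If you would rather pick the trip automatically, one possible definition of "typical" (an assumption on my part, not something GTFS prescribes) is the stop pattern that occurs most often among the route's trips. Here is a minimal sketch in Python, with an in-memory SQLite database standing in for MySQL. The table and column names follow the queries above; the function name typical_trip_stops and the sample rows are invented for illustration.

```python
import sqlite3
from collections import Counter


def typical_trip_stops(conn, route_id):
    """Return (trip_id, [(lat, lon), ...]) for the most common stop pattern."""
    rows = conn.execute(
        """
        SELECT t.id, st.stop_id
        FROM trips t
        JOIN stop_times st ON st.trip_id = t.id
        WHERE t.route_id = ?
        ORDER BY t.id, st.stop_sequence
        """,
        (route_id,),
    ).fetchall()

    # Collect the ordered list of stops for every trip on the route.
    patterns = {}
    for trip_id, stop_id in rows:
        patterns.setdefault(trip_id, []).append(stop_id)
    if not patterns:
        return None, []

    # The pattern shared by the most trips is treated as "typical".
    counts = Counter(tuple(p) for p in patterns.values())
    best_pattern, _ = counts.most_common(1)[0]
    trip_id = min(t for t, p in patterns.items() if tuple(p) == best_pattern)

    coords = conn.execute(
        """
        SELECT s.lat, s.lon
        FROM stop_times st
        JOIN stops s ON s.id = st.stop_id
        WHERE st.trip_id = ?
        ORDER BY st.stop_sequence
        """,
        (trip_id,),
    ).fetchall()
    return trip_id, coords


# Made-up sample data: two trips run A-B-C, one diverts via X.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stops (id TEXT PRIMARY KEY, name TEXT, lat REAL, lon REAL);
    CREATE TABLE trips (id TEXT PRIMARY KEY, route_id TEXT, headsign TEXT);
    CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);
    INSERT INTO stops VALUES
        ('A', 'A', 50.0, 19.0), ('B', 'B', 50.1, 19.1),
        ('C', 'C', 50.2, 19.2), ('X', 'X', 50.5, 19.5);
    INSERT INTO trips VALUES
        ('t1', 'r1', 'C'), ('t2', 'r1', 'C via X'), ('t3', 'r1', 'C');
    INSERT INTO stop_times VALUES
        ('t1', 'A', 1), ('t1', 'B', 2), ('t1', 'C', 3),
        ('t2', 'A', 1), ('t2', 'X', 2), ('t2', 'C', 3),
        ('t3', 'A', 1), ('t3', 'B', 5), ('t3', 'C', 9);
    """
)
trip_id, coords = typical_trip_stops(conn, "r1")
```

The coords list is already in stop_sequence order, so it can be passed to Leaflet as the latlngs of a polyline. Note that the comparison uses the order of stops rather than the raw stop_sequence values, since those only have to increase along a trip.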

database design for this road map

I am thinking about the database schema for a road map and trying to find the best possible model. I have the following queries in mind that need to be supported:
Do streets s1 and s2 intersect?
Get all streets adjacent to point of interest p.
OR
Get the distance between entrance e1 and exit e2 on highway h.
Get the shortest route from intersection i1 to intersection i2.
I thought the tables should be:
roads and streets, including highways
governmental regions: states, counties, and local municipalities of cities, towns, villages
I have strong expertise in database modeling, but this is the first time I am creating a schema like this. Any help in this regard would be appreciated.
As per SO rules, the OP has to show some effort. I have seen some similar questions, which is why I am asking for help with the schema.
You need nodes and edges for solving your tasks internally (shortest path and so on).
You need the roads, streets and regions you mentioned to translate from your inner model to human language. Also, don't forget the point objects: bus stops, crossroads, house entrances, phone booths, shops, etc.
So you need two models, plus a structure and a set of methods for translating between them.
Inner model:
Table NODE: Id, place (REGION.Id)
Table EDGE: Id, Start (NODE.Id), End (NODE.Id), Length, road (ROAD.Id, the road the edge belongs to)
Outer model:
Table ROAD: Id, Level, Name
Table Crossroad: point (NODE.Id), Id
Table REGION: Id, Name, Level, Parent (REGION.Id)
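To illustrate why the inner model is worth having: with rows from EDGE loaded as (Start, End, Length) tuples, the "shortest route from intersection i1 to intersection i2" query becomes a standard Dijkstra search. This is only a sketch under my own assumptions: every edge can be travelled in both directions, lengths are non-negative, and the function name shortest_route and the sample edges are invented.

```python
import heapq


def shortest_route(edges, start, goal):
    """Dijkstra over (start_node, end_node, length) tuples.

    Returns (total_length, [node, ...]) or (None, []) if goal is unreachable.
    Assumes two-way edges with non-negative lengths.
    """
    graph = {}
    for a, b, length in edges:
        graph.setdefault(a, []).append((b, length))
        graph.setdefault(b, []).append((a, length))

    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, length in graph.get(node, []):
            candidate = dist + length
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))

    if goal not in best:
        return None, []
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return best[goal], path[::-1]


# Made-up EDGE rows: (Start, End, Length)
edges = [(1, 2, 4.0), (2, 3, 1.5), (1, 3, 7.0), (3, 4, 2.0)]
length, path = shortest_route(edges, 1, 4)
```

One-way streets would need directed edges, that is, dropping the second setdefault line and storing each direction as its own EDGE row.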
You will need to know if roads intersect. The answer to that question should be in your database model.
It is the most important question you can have.
I would say that you need to organize your information around intersections. What road of which type is present at intersection at GIS location (x,y)?
There is only one table for roads, but each road can be of a different type. Sometimes a road can be of different types yet have the same traffic rules. To capture that within the model, I would distinguish between the type of road and the effective type of road.
A good starting point for a data model is: http://www.osgeo.org/.
EDIT: take a look, for instance, at this link on that site:
http://live.osgeo.org/en/index.html
There you can download a distribution that lets you test different open source geospatial software.
This might be exactly what you are looking for:
http://mapguide.osgeo.org/

How to store a lot of different timetables in MYSQL?

I need help with modeling a database. I need to store the timetable for each public transport line. Let's see what we have...
I have different lines (bus number 100, 101, 102 and so on).
Each line has different stops and I need to store the coordinates of each one of them.
Each stop has a specific timetable, for example:
http://rozklady.mpk.krakow.pl/aktualne/0106/0106t001.htm
http://rozklady.mpk.krakow.pl/aktualne/0106/0106t003.htm
The aim of the program that I'm developing is to check for errors in the official timetables. Each bus has a GPS tracking device that sends its position to a database every 10 seconds. So I must check the time of the reports whose coordinates are close to the coordinates of one of the stops and compare that time with the official time, and if there is a big difference, create a row in another table, STATISTICS, reporting the issue.
Anyway, this was just for the context. The truth is that I don't have any clue about how to store it in an efficient way.
I thought about creating a table with the Stops: STOP_ID (PK) - NAME - LAT - LON - LINE - TIMETABLE
Where timetable would be an array containing all the times serialized for that stop [5:03,5:25,5:50,6:12,...].
Although I think this is not a good solution, I can't think about a better approach.
Maybe I could create one table for the stops and another for timetables, but what would the columns for timetables be? I have so many variables... whether it's a weekday, Saturday or holiday, a lot of hours, minutes... and all different for each stop.
Could you share any thoughts about how to face this problem? Thank you very much!!
As Simon mentioned, you are starting a big project.
Suggestion: Read up on the various normal forms for relational DBMSs; this will give you some helpful background if you don't have it.
What are your entities (tables)?
Bus lines (consider the outbound trip and return trip to be two different lines).
Stations on those lines, ordered.
Trips (e.g. 106 bus leaves central station at 05:22, another trip at 05:42, etc).
Scheduled-stops
GPS observations.
Here are possible tables and columns:
Busline table: one row for each busline.
Busline e.g. 106-outbound or 108-inbound (pk)
Description
Station table: one row for each bus stop, including ends of trips
Busline part of pk, fk to Busline e.g. 106
Stationid part of pk (together with Busline)
Description e.g. Second Avenue Eastbound at Houston Street
lat
long
Trip table: One row for each bus trip.
Tripid pk
Busline fk to Busline
Description e.g. 05:22 trip Central Station to University Park
Schedule table: one row for each scheduled time for each trip at each stop
Scheduleid pk ... ascending serial number.
Busline fk to Station
Stationid fk to Station
Tripid fk to Trip
Time
Observation table a row for each of your GPS readings
Observationid pk ... ascending serial number
Busline if you know it fk to Busline
Tripid if you have it fk to Trip
Time
Lat
Long
My advice with RDBMS design is to avoid serializing multiple items of data into single DBMS columns. That's why I have suggested the Schedule table.
Once you figure out how to load your Busline, Station, Trip, and Schedule tables, and you've loaded your observations into the Observation table, it will be an interesting exercise to correlate your observations with your schedules.
Be careful! You may embarrass your municipal transport department! :-)
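As a starting point for that correlation step, here is a rough Python sketch. It takes one GPS observation, looks for a scheduled stop of the same trip within a small radius, and reports how many minutes the bus was off schedule. The 30 m radius, the function names and the sample coordinates are all assumptions of mine; in practice you would read the rows from the Station, Schedule and Observation tables.

```python
import math
from datetime import datetime


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/long points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def delay_at_stop(obs, schedule, stations, radius_m=30.0):
    """Return (stationid, delay_in_minutes) for the nearest scheduled stop
    of the observed trip within radius_m, or None if no stop is that close.

    obs:      {"tripid", "time", "lat", "long"}   (one Observation row)
    schedule: [(stationid, tripid, scheduled_time), ...]
    stations: {stationid: (lat, long)}
    """
    nearest = None
    for stationid, tripid, scheduled in schedule:
        if tripid != obs["tripid"]:
            continue
        lat, lon = stations[stationid]
        distance = haversine_m(obs["lat"], obs["long"], lat, lon)
        if distance <= radius_m and (nearest is None or distance < nearest[0]):
            nearest = (distance, stationid, scheduled)
    if nearest is None:
        return None
    _, stationid, scheduled = nearest
    # Positive means late, negative means early.
    return stationid, (obs["time"] - scheduled).total_seconds() / 60.0


# Made-up sample rows.
stations = {"S1": (50.0614, 19.9366), "S2": (50.0700, 19.9450)}
schedule = [
    ("S1", "T1", datetime(2024, 1, 8, 5, 22)),
    ("S2", "T1", datetime(2024, 1, 8, 5, 30)),
]
obs = {"tripid": "T1", "time": datetime(2024, 1, 8, 5, 26, 30), "lat": 50.06141, "long": 19.93662}
result = delay_at_stop(obs, schedule, stations)
```

With a report every 10 seconds, a bus will usually produce several observations near the same stop, so you would probably keep only the closest one per trip and stop before writing anything to a statistics table.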

database relation tables

Well, I'm working on an online game project where every player will get to have a house. The project was going great until I had to organize the houses, given that I will have a limit of 10,000 players on the server.
So here are some facts:
Each player may have up to 10 houses (min 1).
That means I need 100,000 possible houses so that no matter how full the server gets, there will always be 10 possible houses for each player.
There will be 20 cities, each city will have 200 neighborhoods, and each neighborhood will have 25 houses.
So that's where the problem shows up: how would the database work?
Whenever I look up any of the 100,000 houses, it will belong to a neighborhood in a city, and I need to be able to work with that. For example, if I want to see house #5 of neighborhood #123 in city #8, I should be able to get the data.
I was thinking about having one table with all 100,000 houses, with columns on that table to tell which neighborhood and city each house is in, but it feels like there must be a better way (maybe using multiple tables? I don't know).
So any help would be appreciated.
Thank you for your time.
It's hard to answer, because your complete requirements aren't known.
But I'd recommend that you normalize the database - 3rd normal form at minimum. If you don't know what that is, I'd recommend learning about it ASAP.
My best guess is as follows:
PLAYER is one-to-many with HOUSE. If more than one PLAYER owns the same HOUSE, then it's many-to-many.
CITY is one to many with NEIGHBORHOOD
NEIGHBORHOOD is one-to-many with HOUSE
You'll do JOINS to associate a PLAYER with cities, neighborhoods, and houses. You should have surrogate primary keys on all tables and appropriate foreign keys. Add UNIQUE indexes on all other candidate key combinations.
Run EXPLAIN PLAN (or your database's equivalent, such as EXPLAIN in MySQL) on all your queries to make sure that they perform adequately and don't require a full table scan.
10,000 players isn't a big number for a database. You won't have any problem.
Here's what your query to select all the houses in a neighborhood might look like:
-- Each child table carries the foreign key to its parent.
SELECT *
FROM HOUSE H
INNER JOIN NEIGHBORHOOD N
ON H.NEIGHBORHOOD_ID = N.NEIGHBORHOOD_ID
INNER JOIN CITY C
ON N.CITY_ID = C.CITY_ID
WHERE C.CITY_ID = 8
AND N.NEIGHBORHOOD_ID = 123
ORDER BY H.HOUSE_ID
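To make the relationships concrete, here is a small sketch using Python's sqlite3 module with an in-memory database. The column names, the per-parent number columns (so that "house #5 of neighborhood #123 in city #8" can be looked up directly) and the helper names build_world and find_house are assumptions for illustration, not a definitive design.

```python
import sqlite3

SCHEMA = """
CREATE TABLE city (
    city_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE neighborhood (
    neighborhood_id INTEGER PRIMARY KEY,
    city_id         INTEGER NOT NULL REFERENCES city (city_id),
    number          INTEGER NOT NULL,          -- 1..200 within its city
    UNIQUE (city_id, number)
);
CREATE TABLE player (
    player_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE house (
    house_id        INTEGER PRIMARY KEY,
    neighborhood_id INTEGER NOT NULL REFERENCES neighborhood (neighborhood_id),
    number          INTEGER NOT NULL,          -- 1..25 within its neighborhood
    owner_id        INTEGER REFERENCES player (player_id),  -- NULL = unowned
    UNIQUE (neighborhood_id, number)
);
"""


def build_world(cities=20, neighborhoods_per_city=200, houses_per_neighborhood=25):
    """Create the schema and pre-populate every possible house."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for c in range(1, cities + 1):
        conn.execute("INSERT INTO city (city_id, name) VALUES (?, ?)", (c, f"City {c}"))
        for n in range(1, neighborhoods_per_city + 1):
            cur = conn.execute(
                "INSERT INTO neighborhood (city_id, number) VALUES (?, ?)", (c, n)
            )
            conn.executemany(
                "INSERT INTO house (neighborhood_id, number) VALUES (?, ?)",
                [(cur.lastrowid, h) for h in range(1, houses_per_neighborhood + 1)],
            )
    return conn


def find_house(conn, city_id, neighborhood_no, house_no):
    """Look up one house by its position: city, neighborhood number, house number."""
    return conn.execute(
        """
        SELECT h.house_id, h.owner_id
        FROM house h
        JOIN neighborhood n ON n.neighborhood_id = h.neighborhood_id
        WHERE n.city_id = ? AND n.number = ? AND h.number = ?
        """,
        (city_id, neighborhood_no, house_no),
    ).fetchone()


conn = build_world()
house_count = conn.execute("SELECT COUNT(*) FROM house").fetchone()[0]
row = find_house(conn, 8, 123, 5)
```

The two UNIQUE constraints double as indexes for this lookup. If a house can have several owners, owner_id would move into a separate player_house table.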

How to design this "bus stations" database?

I want to design a database about bus stations. There are about 60 buses in the city, and each of them has the following information:
BusID
BusName
Lists of stations on the route (forward and back)
This database must be efficient to search. For example, when a user wants to list the buses that go through stations A and B, the query must run quickly.
My first thought is to put stations in a separate table with StationId and Station columns, so that each bus's list of stations contains those StationIds. I guess it may work, but I'm not sure that it's efficient.
How can I design this database?
Thank you very much.
Have you looked at Database Answers to see if there is a schema that fits your requirements?
I had to solve this problem and I used this:
Line: number, name
Station: name, latitude, longitude, is_terminal
Holiday: date, description
Route: line_id, from_terminal (station_id), to_terminal (station_id)
Route schedule: route_id, is_holiday_schedule, starting_at
Route stop: route_id, station_id, elapsed_time_from_start (in minutes)
Does it look good to you?
Some random thoughts based on travel on London buses In My Youth, because this could be quite complex I think.
You might need entities for the following:
1. Bus -- the physical entity, with a particular model (i.e. particular seating capacity, disabled access, dimensions etc.) and VIN.
2. Bus stop -- the location at which a bus stops. Usually bus stops come in pairs, one for each side of the road, but sometimes they are on a one-way road.
3. Route -- a sequence of bus stops and the route between them (multiple possible roads exist). Sometimes buses do not run the entire route, or skip stops (fast service). Is a route just one direction, or is it both? Maybe a route is actually a loop, not a there-and-back.
4. Service -- a bus following a certain route
5. Scheduled Run -- an event when a bus on a particular service follows a particular route. It starts at some part of the route, ends at another part, and maybe skips certain stops (see 3).
6. Actual Run -- a particular bus followed a particular scheduled run. What time did it start, what time did it get to particular stops, how many people got on and off, what kind of ticket did they have?
(This sounds like homework, so I won't give a full answer.)
It seems like you just need a many-to-many relationship between buses and stops using 3 tables. A query with two inner joins will give you the buses that stop at two specific stops.
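For reference, a minimal sketch of that three-table layout and the two-join query, written in Python against an in-memory SQLite database. The table names, the index and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bus (
        bus_id   INTEGER PRIMARY KEY,
        bus_name TEXT NOT NULL
    );
    CREATE TABLE station (
        station_id INTEGER PRIMARY KEY,
        station    TEXT NOT NULL
    );
    -- Link table: one row per (bus, station) pair.
    CREATE TABLE bus_station (
        bus_id     INTEGER NOT NULL REFERENCES bus (bus_id),
        station_id INTEGER NOT NULL REFERENCES station (station_id),
        PRIMARY KEY (bus_id, station_id)
    );
    -- Lets the search start from a station instead of scanning every bus.
    CREATE INDEX bus_station_by_station ON bus_station (station_id, bus_id);
    """
)
conn.executemany(
    "INSERT INTO bus VALUES (?, ?)", [(1, "Line 1"), (2, "Line 2"), (3, "Line 3")]
)
conn.executemany(
    "INSERT INTO station VALUES (?, ?)",
    [(1, "A"), (3, "B"), (9, "C"), (17, "D"), (28, "E")],
)
conn.executemany(
    "INSERT INTO bus_station VALUES (?, ?)",
    [(1, 1), (1, 3), (1, 9),
     (2, 3), (2, 17), (2, 28),
     (3, 1), (3, 3), (3, 9), (3, 17), (3, 28)],
)


def buses_through(conn, station_a, station_b):
    """Buses that stop at both stations, found with two inner joins."""
    return conn.execute(
        """
        SELECT b.bus_id, b.bus_name
        FROM bus b
        JOIN bus_station a ON a.bus_id = b.bus_id
        JOIN bus_station z ON z.bus_id = b.bus_id
        WHERE a.station_id = ? AND z.station_id = ?
        ORDER BY b.bus_id
        """,
        (station_a, station_b),
    ).fetchall()


both = buses_through(conn, 3, 28)
only_one = buses_through(conn, 1, 28)
```

This version ignores the order of the stations and the forward/back direction. To support those you could add a direction and a sequence column to bus_station and compare the two sequence numbers in the WHERE clause.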
I'd hack it.
bus_id int
path varchar(max)
If a bus goes through the following stations (in this order):
01
03
09
17
28
Then I'd put in a record where path was set to
'-01-03-09-17-28-'
When someone wants to find a bus to get from station 03 to 28, then my select statement is
select * from buses where path like '%-03-%-28-%'
Not scalable, not elegant, but dead simple and won't churn through tables like mad when trying to find a route. Of course, it only works if there's a single bus that goes through the two stations in question.
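One caveat with this pattern: when the two stations are direct neighbours in the path, they share a single '-' between them, so '%-03-%-09-%' does not match '-01-03-09-17-28-'. A possible workaround, which is my own assumption and not part of the hack as posted, is to give every station its own pair of delimiters. The sketch below shows both variants using Python's sqlite3 module; the wrapped_path column and the function names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE buses (bus_id INTEGER PRIMARY KEY, path TEXT, wrapped_path TEXT)"
)
conn.execute(
    "INSERT INTO buses VALUES (?, ?, ?)",
    (1, "-01-03-09-17-28-", "|01||03||09||17||28|"),
)


def shared_delimiter_match(conn, a, b):
    """Pattern as posted: neighbouring stations share one '-'."""
    pattern = f"%-{a}-%-{b}-%"
    rows = conn.execute(
        "SELECT bus_id FROM buses WHERE path LIKE ? ORDER BY bus_id", (pattern,)
    )
    return [r[0] for r in rows]


def wrapped_match(conn, a, b):
    """Variant where each station carries its own delimiters: |03||09|."""
    pattern = f"%|{a}|%|{b}|%"
    rows = conn.execute(
        "SELECT bus_id FROM buses WHERE wrapped_path LIKE ? ORDER BY bus_id", (pattern,)
    )
    return [r[0] for r in rows]


far_apart = shared_delimiter_match(conn, "03", "28")   # stations with stops in between
adjacent = shared_delimiter_match(conn, "03", "09")    # direct neighbours
adjacent_wrapped = wrapped_match(conn, "03", "09")
reversed_wrapped = wrapped_match(conn, "09", "03")
```

Here far_apart and adjacent_wrapped find bus 1, while adjacent comes back empty even though the bus does travel from 03 to 09. The reversed query also comes back empty, so the pattern still respects the direction of travel.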
What you have thought of is good, though it may or may not be efficient in some cases. I think that you should create tables as table1(BusID, BusName) and table2(Station List, Bus Id). I think this would help. Try to use joins between these two tables to get the result. One more thing: if possible, try to normalize the tables; that would help you.
I'd go for 3 tables:
bus
stations
bus_stations
"bus" for what the name stands for, "stations" for the station IDs and names, and "bus_stations" to connect those other two tables, which would have bus_id, station_id_from and station_id_to.
This is probably more complex than you really need, but if, in the future, you need to know the full trajectory of a bus, and also which station a bus comes from when it goes to "B station", it will be useful.
60 buses will not make that much impact on performance though.