How to store a lot of different timetables in MySQL? - mysql

I need help modeling a database. I need to store the timetable for each public transport line. Let's see what we have...
I have different lines (bus number 100, 101, 102 and so on).
Each line has different stops and I need to store the coordinates of each one of them.
Each stop has a specific timetable, for example:
http://rozklady.mpk.krakow.pl/aktualne/0106/0106t001.htm
http://rozklady.mpk.krakow.pl/aktualne/0106/0106t003.htm
The aim of the program that I'm developing is to check for errors in the official timetables. Each bus has a GPS tracking device that sends its position to a database every 10 seconds. So I must take the reports whose coordinates are close to the coordinates of one of the stops, compare the time of each report with the official time, and, if there is a big difference, create a row in another table, STATISTICS, reporting the issue.
Anyway, this was just for the context. The truth is that I don't have any clue about how to store it in an efficient way.
I thought about creating a table with the Stops: STOP_ID (PK) - NAME - LAT - LON - LINE - TIMETABLE
Where timetable would be an array containing all the times serialized for that stop [5:03,5:25,5:50,6:12,...].
Although I think this is not a good solution, I can't think about a better approach.
Maybe I could create one table for the stops and another for the timetables, but what would the columns for the timetables be? I have so many variables... whether it's a weekday, Saturday or holiday, a lot of hours, minutes... and all different for each stop.
Could you share any thoughts about how to face this problem? Thank you very much!!

As Simon mentioned, you are starting a big project.
Suggestion: Read up on the various normal forms for relational DBMSs; this will give you some helpful background if you don't have it.
What are your entities (tables)?
Bus lines (consider the outbound trip and return trip to be two different lines).
Stations on those lines, ordered.
Trips (e.g. 106 bus leaves central station at 05:22, another trip at 05:42, etc).
Scheduled-stops
GPS observations.
Here are possible tables and columns:
Busline table: one row for each busline.
Busline e.g. 106-outbound or 108-inbound (pk)
Description
Station table: one row for each bus stop, including ends of trips
Busline part of pk, fk to Busline e.g. 106
Stationid part of pk
Description e.g. Second Avenue Eastbound at Houston Street
lat
long
Trip table: One row for each bus trip.
Tripid pk
Busline fk to Busline
Description e.g. 05:22 trip Central Station to University Park
Schedule table: one row for each scheduled time for each trip at each stop
Scheduleid pk ... ascending serial number.
Busline fk to Station
Stationid fk to Station
Tripid fk to Trip
Time
Observation table a row for each of your GPS readings
Observationid pk ... ascending serial number
Busline if you know it fk to Busline
Tripid if you have it fk to Trip
Time
Lat
Long
My advice with RDBMS design is to avoid serializing multiple items of data into single DBMS columns. That's why I have suggested the Schedule table.
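To illustrate why one row per scheduled time is easier to query than a serialized array, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for MySQL here, and the column names, the sample rows, and the helper `next_departure` are assumptions made for illustration. The times come from the example list in the question, zero-padded so that text order matches time order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE schedule (
    schedule_id INTEGER PRIMARY KEY,  -- ascending serial number
    busline     TEXT    NOT NULL,     -- e.g. '106-outbound'
    station_id  INTEGER NOT NULL,
    trip_id     INTEGER NOT NULL,
    sched_time  TEXT    NOT NULL      -- 'HH:MM', zero-padded (assumption)
);
""")

# One row per scheduled time, instead of one serialized array per stop.
sample_rows = [
    ("106-outbound", 1, 1, "05:03"),
    ("106-outbound", 1, 2, "05:25"),
    ("106-outbound", 1, 3, "05:50"),
    ("106-outbound", 1, 4, "06:12"),
]
conn.executemany(
    "INSERT INTO schedule (busline, station_id, trip_id, sched_time) "
    "VALUES (?, ?, ?, ?)",
    sample_rows,
)


def next_departure(conn, busline, station_id, after):
    """Return the first scheduled time at or after `after`, or None."""
    row = conn.execute(
        "SELECT MIN(sched_time) FROM schedule "
        "WHERE busline = ? AND station_id = ? AND sched_time >= ?",
        (busline, station_id, after),
    ).fetchone()
    return row[0]


print(next_departure(conn, "106-outbound", 1, "05:30"))
```

With a serialized TIMETABLE column, the same question would require loading and parsing the whole array in application code.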
Once you figure out how to load your Busline, Station, Trip, and Schedule tables, and you've loaded your observations into the Observation table, it will be an interesting exercise to correlate your observations with your schedules.
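As a rough idea of what that correlation could look like, here is a sketch in plain Python. It is not a definitive method: the haversine distance check, the 30 m radius, the 3 minute tolerance, the sample coordinates, and all function names are assumptions chosen for illustration.

```python
import math
from datetime import datetime


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/long points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def delay_minutes(observed, scheduled):
    """Signed difference observed - scheduled, in minutes ('HH:MM:SS' strings)."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(observed, fmt) - datetime.strptime(scheduled, fmt)
    return delta.total_seconds() / 60


def check_observation(obs, stop, scheduled_times, radius_m=30.0, tolerance_min=3.0):
    """Return (scheduled_time, delay) if the bus is at the stop and off schedule."""
    if haversine_m(obs["lat"], obs["lon"], stop["lat"], stop["lon"]) > radius_m:
        return None  # observation is not close to this stop
    nearest = min(scheduled_times, key=lambda t: abs(delay_minutes(obs["time"], t)))
    delay = delay_minutes(obs["time"], nearest)
    return (nearest, delay) if abs(delay) > tolerance_min else None


# Made-up sample data.
stop = {"lat": 50.0614, "lon": 19.9366}
times = ["05:03:00", "05:25:00", "05:50:00"]
late = {"lat": 50.0614, "lon": 19.9366, "time": "05:31:00"}
print(check_observation(late, stop, times))
```

A non-None result is what would become a row in the STATISTICS table mentioned in the question.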
Be careful! You may embarrass your municipal transport department! :-)

Related

Database Design Inquiry in regards to Buildings and Floor plans

Good afternoon. I am tasked with starting the framework (for a final group project) for building, maintaining, and allocating a MySQL database for a proposed company. This company has many buildings, and many of them have multiple floors and many rooms. The database should store all the buildings, their GPS locations, their number of floors, the corresponding floor plan images, all their rooms, and the rooms' locations within the building.
At the moment, I have a database with one table, "Buildings". In "Buildings" I have rows for the following: ID, Name, Latitude, Longitude, # of Floors, Floor plan Image(x4). I am assuming that there are a max of four floors. [probably a bad idea]
I am now stuck with how I want to store all the rooms, the floor plans, and the rooms location within the floor plan.
My initial thought was to create a new table for each building, and have rows for Room Number, Corresponding Floor Plan, and Location (Lat/Long). Yet if the company has, say, over 100 buildings, and each building has maybe only one floor and a couple rooms, then I think this would be overkill. Plus I think dealing with 100's of tables is bad practice.
Is there an easier way of dealing with lots of buildings that have varying numbers of rooms and floors?
Any suggestions would be much appreciated!
Each entity should have its own table. The tables will be linked using primary keys and foreign keys. The number of floors should not be hard-coded; you should retrieve it using the COUNT function.
Normalize your database up to 3NF.
Here is what your DB should look like:
Buildings:
BuildingID(PK)
LongName
ShortName
Latitude
Longitude
etc
Floors:
FloorID(PK)
BuildingID(FK)
FloorName
etc
FloorPlans:
FloorPlanID(PK)
FloorID(FK)
FloorPlanName
FloorPlanImage
Rooms:
RoomID(PK)
FloorID(FK)
RoomNumber
Latitude
Longitude
RoomSize
etc
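As a sketch of how the floor count can be derived rather than stored, the following uses Python's built-in sqlite3 module in place of MySQL. The snake_case column names and the sample buildings are assumptions for illustration, and only a few of the columns listed above are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE buildings (
    building_id INTEGER PRIMARY KEY,
    short_name  TEXT NOT NULL,
    latitude    REAL,
    longitude   REAL
);
CREATE TABLE floors (
    floor_id    INTEGER PRIMARY KEY,
    building_id INTEGER NOT NULL REFERENCES buildings(building_id),
    floor_name  TEXT NOT NULL
);
CREATE TABLE rooms (
    room_id     INTEGER PRIMARY KEY,
    floor_id    INTEGER NOT NULL REFERENCES floors(floor_id),
    room_number TEXT NOT NULL
);
-- Made-up sample data: one three-floor building, one single-floor building.
INSERT INTO buildings VALUES (1, 'HQ', NULL, NULL), (2, 'Annex', NULL, NULL);
INSERT INTO floors VALUES (1, 1, 'Ground'), (2, 1, 'First'), (3, 1, 'Second'),
                          (4, 2, 'Ground');
INSERT INTO rooms VALUES (1, 1, '001'), (2, 2, '101'), (3, 2, '102'), (4, 4, 'A1');
""")

# The number of floors is computed with COUNT, never stored on the building.
floor_counts = conn.execute("""
    SELECT b.short_name, COUNT(f.floor_id)
    FROM buildings b
    LEFT JOIN floors f ON f.building_id = b.building_id
    GROUP BY b.building_id
    ORDER BY b.building_id
""").fetchall()
print(floor_counts)
```

Adding a fifth floor to a building is then just one more row in the floors table, with no schema change.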

How can I optimize this data table?

So for a project, I have to create a database and attempt to optimize as much as possible. We are given text files with data listed and I'm having trouble figuring out how I can best relate these pieces.
For right now, I have a Persons data file with names, addresses, etc. and an Airport Data File with airport coordinates, address, etc. In the Airport data file, one of the values we are given is described as this:
PassengerFacilityFee - Fees charged by the airport per passenger per arrival.
Should I create a separate table for the fees and then use foreign keys from Persons and Airport, or how else could I organize this?
No. The facility fee is purely a property of an airport. It is not related to specific people - there is not a different facility fee depending on whether Alice or Bob flies out of LAX, for instance. The fact that the fee is assessed "per passenger" is irrelevant here.
This fee should most likely be a column in your table of airport data.

Database design structure

I'm new to database design and never took a class on it. I have a problem with structuring my database and assigning primary keys.
I have a list of cities, and each city has 5 types of public transport. In every city, each type of public transport has a different ticket price, main station, and CSV file with route coordinates etc. I then need to calculate, daily, the average cost of transportation in every city for each type of public transport, based on the route coordinates (distances), price, the time it takes, etc.
Table cities:
city (Primary key)
Table public transport:
city, type of transport, ticket price, main station, file1, file2
Table results:
city, type of transport, date, cost
How should I connect these tables (assuming their structure is right)? In the public transport table, I think city should be a foreign key, but the type of transport will repeat for every city, so I don't think it can be the primary key of this table. The same goes for the results table.
The main idea is that you don't want to repeat yourself. Repetition is not only overhead; it is also error-prone when you need to change multiple entries that represent the same thing.
There are guidelines on database normalization which help you to ensure that your data is on a form that's easy to maintain and work with.
You don't need to become an expert in understanding which form does what, but being able to identify what should be kept separated is a must when it comes to database designing.
You should list what you know:
Different cities.
Different type of transport.
Different ticket prices.
Different stations.
If you create a separate table for each of those, it will be easy to link them together as rows in a table that represents something on a larger scale. Every entry should have its own id as the primary key: you need to allow, for example, multiple cities with the same name, so the name cannot be guaranteed unique and cannot serve as the primary key.
For example, it is now easy to identify the routes for a city, and there can be multiple routes in a city:
route_id | city_id | route_name
1        | 2       | test1
2        | 2       | test2
You could then add another table that represents which kind of transport is tied to each route.
route_id | transport_id
1        | 3
2        | 4
You're then able to create a new table that holds points of stations that are a part of your route and you can even identify whether it's a main route or not.
route_connected_id | route_id | station_id | main_route
1                  | 1        | 2          | 1           // a main route
2                  | 1        | 3          | 0           // not a main route
And it goes on and on, separating the most simple entries allows you to create complex relationships where all you have to do is link ids.
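To show how linking ids plays out in a query, here is a small sketch using Python's built-in sqlite3 module in place of MySQL. It loads the three example tables above. The table names `route`, `route_transport` and `route_station` are assumptions, since the tables above are unnamed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE route (
    route_id   INTEGER PRIMARY KEY,
    city_id    INTEGER NOT NULL,
    route_name TEXT    NOT NULL
);
CREATE TABLE route_transport (           -- hypothetical table name
    route_id     INTEGER NOT NULL REFERENCES route(route_id),
    transport_id INTEGER NOT NULL,
    PRIMARY KEY (route_id, transport_id)
);
CREATE TABLE route_station (             -- hypothetical table name
    route_connected_id INTEGER PRIMARY KEY,
    route_id   INTEGER NOT NULL REFERENCES route(route_id),
    station_id INTEGER NOT NULL,
    main_route INTEGER NOT NULL          -- 1 = main route, 0 = not
);
INSERT INTO route VALUES (1, 2, 'test1'), (2, 2, 'test2');
INSERT INTO route_transport VALUES (1, 3), (2, 4);
INSERT INTO route_station VALUES (1, 1, 2, 1), (2, 1, 3, 0);
""")

# For one city: each route, its transport type, and how many stations it has.
city_routes = conn.execute("""
    SELECT r.route_name, rt.transport_id, COUNT(rs.station_id)
    FROM route r
    JOIN route_transport rt ON rt.route_id = r.route_id
    LEFT JOIN route_station rs ON rs.route_id = r.route_id
    WHERE r.city_id = ?
    GROUP BY r.route_id, rt.transport_id
    ORDER BY r.route_id
""", (2,)).fetchall()
print(city_routes)
```

The LEFT JOIN keeps routes that have no stations recorded yet, which show up with a count of 0.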
This is the basic idea, which should hopefully get you started. Whether you find it helpful or not, I recommend that you take a look at the reading material I suggested, i.e. database normalization.

Database Normalization with user input

I am developing a MySQL database that will contain the country, city and occupation of each user.
While I can use a "country" table and then insert the id of the country into the user table, I still have to look for the perfect method for the other two tables.
The problem is that the city and occupation of each user are taken from an input field, meaning that users can type "NYC" or "New York" or "New York City" and millions of other combinations for each town, for example.
Is it a good idea to disregard this issue, create a separate "town" table containing all the towns entered by users, and then put the id of the town entry into the user table? Or would it be more appropriate to use a VARCHAR column "town" in the user table and not normalize the database for this relation?
I want to display the data from the three tables on user profile pages.
I am concerned about normalization because I don't want to have too much redundant data in my database: it consumes a lot of space, and the queries will be slower if I use a varchar index instead of an integer index, for example (as far as I know).
Thanks
We had this problem. Our solution was to collect the various synonyms and typo-containing versions that people use and explicitly map them to a known canonical city name. This allowed us to correctly guess the name from user input in 99% of cases.
For the remaining 1%, we created a new city entry and marked it as a non-canonical. Periodically we looked through non-canonical entries. For recognizable known cities, we remapped the non-canonical entry to the canonical (updating FKs of linked records and adding a synonym). For a genuinely new city name we didn't know about we kept the created entry as canonical.
So we had something like this:
create table city (
  id integer primary key,
  name varchar(255) not null, -- the canonical name
  ...
);
create table city_synonym (
  name varchar(255) primary key, -- we want a unique index
  city_id integer,
  foreign key (city_id) references city (id)
);
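The lookup flow described above might be sketched as follows, using Python's built-in sqlite3 module in place of MySQL. The `is_canonical` column, the lower-casing and whitespace normalization, and the function `resolve_city` are assumptions for illustration and are not part of the schema shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,               -- the canonical name
    is_canonical INTEGER NOT NULL DEFAULT 1   -- hypothetical flag column
);
CREATE TABLE city_synonym (
    name    TEXT PRIMARY KEY,                 -- unique, stored normalized
    city_id INTEGER NOT NULL REFERENCES city(id)
);
INSERT INTO city (id, name) VALUES (1, 'New York');
INSERT INTO city_synonym VALUES ('nyc', 1), ('new york', 1), ('new york city', 1);
""")


def resolve_city(conn, user_input):
    """Map free-text input to a city id, creating a non-canonical city if unknown."""
    key = " ".join(user_input.lower().split())  # normalize case and spacing
    row = conn.execute(
        "SELECT city_id FROM city_synonym WHERE name = ?", (key,)
    ).fetchone()
    if row:
        return row[0]
    # Unknown name: keep it, but flag it for later manual review.
    cur = conn.execute(
        "INSERT INTO city (name, is_canonical) VALUES (?, 0)", (user_input.strip(),)
    )
    conn.execute("INSERT INTO city_synonym VALUES (?, ?)", (key, cur.lastrowid))
    return cur.lastrowid


print(resolve_city(conn, "NYC"), resolve_city(conn, "Now Yerk"))
```

During the periodic review, a non-canonical row such as 'Now Yerk' would either be remapped to an existing city or promoted to canonical.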
Data normalization usually helps you work with the data and keeps it simple. If a normalized schema does not fit your needs, you can use denormalized data as well, so it depends on the queries you want to run.
There is no good way to group cities without creating a separate table that keeps all the names for each city under a single id. So it would be good to have three tables: user(user_id, city_id), city(city_id, correct name), city_alias(alias_id, city_id, name).
It would be better to store the data in a normalized design, containing the actual, government recognized city names.
@Varela's suggestion of an 'alias' for the city would probably work well in this situation. But you have to return a message along the lines of "You typed in 'Now Yerk'. Did you perhaps mean 'New York'?". Actually, you want to get these kinds of corrections regardless...
Of course, what you should probably actually store isn't the city, but the postal/zip code. Table design is along these lines:
State:
Id  State
============
AL  Alabama
NY  New York
City:
Id  State_Id  City
========================
1   NY        New York
2   NY        Buffalo
Zip_Code:
Id  Code        City_Id
=========================
1   00001-0001  1
And then store a reference to Zip_Code.Id whenever you have an address. You want to know exactly which zip code a user has (claimed) to be a part of. Reasons include:
Taxes for retail (regardless of how Amazon plays out).
Addresses for delivery (There is a Bellevue in both Washington and New York, for example. Zip codes are different).
Social mapping. If you store it as 'user input' cities, you will not be able to (easily) analyze the data to find out things like which users live near each other, much less in the same city.
There are a number of other things that can be done about address verification, including geo-location, but this is a basic design that should help you in most of your needs (and prevent most of the possible 'invalid' anomalies).

How to design this "bus stations" database?

I want to design a database about bus stations. There are about 60 buses in the city, and each of them has this information:
BusID
BusName
Lists of stations in the way (forward and back)
This database must be efficient to search. For example, when a user wants to list the buses which go through stations A and B, the query must run quickly.
My first thought is to put the stations in a separate table, with StationId and Station, so that each list of stations contains those StationIds. I guess it may work, but I'm not sure that it's efficient.
How can I design this database?
Thank you very much.
Have you looked at Database Answers to see if there is a schema that fits your requirements?
I had to solve this problem and I used this :
Line
number
name
Station
name
latitude
longitude
is_terminal
Holiday
date
description
Route
line_id
from_terminal : station_id
to_terminal : station_id
Route schedule
route_id
is_holiday_schedule
starting_at
Route stop
route_id
station_id
elapsed_time_from_start : in minutes
Does it look good to you?
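One consequence of storing elapsed_time_from_start instead of absolute times is that the timetable for any departure can be computed on the fly. A minimal sketch in Python, where the function name `stop_times`, the station names, and the minute values are made up for illustration:

```python
from datetime import datetime, timedelta


def stop_times(starting_at, route_stops):
    """Expand one 'Route schedule' row into a clock time per 'Route stop'.

    starting_at -- departure time as 'HH:MM'
    route_stops -- list of (station, elapsed_minutes_from_start) pairs
    """
    start = datetime.strptime(starting_at, "%H:%M")
    return {
        station: (start + timedelta(minutes=elapsed)).strftime("%H:%M")
        for station, elapsed in route_stops
    }


# Made-up route: three stops, with minutes elapsed since the first one.
stops = [("Central", 0), ("Market", 7), ("University", 19)]
print(stop_times("05:22", stops))
```

The trade-off is that this assumes every run of a route takes the same time between stops, which may not hold at rush hour.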
Some random thoughts based on travel on London buses In My Youth, because this could be quite complex I think.
You might need entities for the following:
1. Bus -- the physical entity, with a particular model (i.e. particular seating capacity and disabled access, and dimensions etc) and VIN.
2. Bus stop -- the location at which a bus stops. Usually bus stops come in pairs, one for each side of the road, but sometimes they are on a one-way road.
3. Route -- a sequence of bus stops and the route between them (multiple possible roads exist). Sometimes buses do not run the entire route, or skip stops (fast service). Is a route just one direction, or is it both? Maybe a route is actually a loop, not a there-and-back.
4. Service -- a bus following a certain route
5. Scheduled Run -- an event when a bus on a particular service follows a particular route. It starts at some part of the route, ends at another part, and maybe skips certain stops (see 3).
6. Actual Run -- a particular bus followed a particular scheduled run. What time did it start, what time did it get to particular stops, how many people got on and off, what kind of ticket did they have?
(This sounds like homework, so I won't give a full answer.)
It seems like you just need a many-to-many relationship between buses and stops using 3 tables. A query with two inner joins will give you the buses that stop at two specific stops.
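A minimal sketch of that three-table layout and the two-join query, using Python's built-in sqlite3 module in place of MySQL. The table names, column names, and sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bus     (bus_id INTEGER PRIMARY KEY, bus_name TEXT NOT NULL);
CREATE TABLE station (station_id INTEGER PRIMARY KEY, station TEXT NOT NULL);
CREATE TABLE bus_station (               -- the many-to-many link table
    bus_id     INTEGER NOT NULL REFERENCES bus(bus_id),
    station_id INTEGER NOT NULL REFERENCES station(station_id),
    PRIMARY KEY (bus_id, station_id)
);
INSERT INTO bus VALUES (1, 'Bus 1'), (2, 'Bus 2'), (3, 'Bus 3');
INSERT INTO station VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO bus_station VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 2), (3, 3);
""")


def buses_through(conn, station_a, station_b):
    """Names of buses that stop at both stations (order of stops is ignored)."""
    rows = conn.execute("""
        SELECT bus.bus_name
        FROM bus
        JOIN bus_station AS sa ON sa.bus_id = bus.bus_id
        JOIN bus_station AS sb ON sb.bus_id = bus.bus_id
        WHERE sa.station_id = ? AND sb.station_id = ?
        ORDER BY bus.bus_id
    """, (station_a, station_b)).fetchall()
    return [name for (name,) in rows]


print(buses_through(conn, 1, 2))
```

The composite primary key on (bus_id, station_id) doubles as an index for the joins. A separate index on station_id would help if lookups usually start from the station.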
I'd hack it.
bus_id int
path varchar(max)
If a bus goes through the following stations (in this order):
01
03
09
17
28
Then I'd put in a record where path was set to
'-01-03-09-17-28-'
When someone wants to find a bus to get from station 03 to 28, then my select statement is
select * from buses where path like '%-03-%-28-%'
Not scalable, not elegant, but dead simple and won't churn through tables like mad when trying to find a route. Of course, it only works if there's a single bus that goes through the two stations in question.
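Here is a small sketch of this approach using Python's built-in sqlite3 module in place of MySQL. The sample paths for buses 2 and 3 and the helper `direct_buses` are made up for illustration. It also exposes one caveat: with single dashes, two consecutive stations share a delimiter, so the pattern does not match them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE buses (bus_id INTEGER PRIMARY KEY, path TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO buses VALUES (?, ?)",
    [
        (1, "-01-03-09-17-28-"),  # the path from the example above
        (2, "-28-17-03-"),        # made-up bus running the other way
        (3, "-03-05-"),           # made-up bus that never reaches 28
    ],
)


def direct_buses(conn, origin, dest):
    """Ids of buses whose path visits `origin` and later `dest`."""
    pattern = f"%-{origin}-%-{dest}-%"
    rows = conn.execute(
        "SELECT bus_id FROM buses WHERE path LIKE ? ORDER BY bus_id", (pattern,)
    ).fetchall()
    return [bus_id for (bus_id,) in rows]


print(direct_buses(conn, "03", "28"))  # stations with others in between
print(direct_buses(conn, "28", "03"))  # direction matters
print(direct_buses(conn, "03", "09"))  # adjacent stations: no match (caveat)
```

The last query returns nothing even though bus 1 goes from 03 straight to 09, because the dash after 03 cannot also serve as the dash before 09. Doubling the delimiters in the stored path is one possible workaround.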
What you have thought of is good, although in some cases it may not be efficient. I think that you should create tables such as table1(BusID, BusName) and table2(Station List, Bus Id). I think this would help. Try to use joins between these two tables to get the result. One more thing: if possible, try to normalize the tables; that would help you.
I'd go for 3 tables :
bus
stations
bus_stations
"bus" for what the name stands for, "stations" for the station ids and names, and "bus_stations" to connect the other two tables, which would have bus_id, station_id_from, station_id_to.
This is probably more complex than you really need, but if, in the future, you need to know the full trajectory of a bus, or which station a bus comes from when it goes to "B station", it will be useful.
60 buses will not make that much impact in performance though.