What kind of relationship do these 2 tables require? - mysql

I have 2 tables and was wondering what the best relationship between them was. I know there is a relationship between them but I get so confused with one to many, many to one, many to many, unidirectional, bidirectional, multidirectional etc.
So this is the basic, displayed, structure:
Traveler Table:
+----------+-------------+-----------------+
| Name     | Family Name | National ID No. |
+----------+-------------+-----------------+
| Dianne   | Herbert     | 579643          |
| Francine | Jackson     | 183432          |
| Oprah    | Dingle      | 269537          |
+----------+-------------+-----------------+
Journeys Table:
+---------------+-------------+------------+----------+------------------------------------------------+
| Start Station | End Station | Start Time | End Time | Travelers                                      |
+---------------+-------------+------------+----------+------------------------------------------------+
| Hull          | Leeds       | 13:50      | 14:50    | Francine Jackson, Oprah Dingle                 |
| Newcastle     | Manchester  | 16:30      | 19:00    | Dianne Herbert, Francine Jackson               |
| Hull          | Manchester  | 10:00      | 13:00    | Dianne Herbert, Francine Jackson, Oprah Dingle |
+---------------+-------------+------------+----------+------------------------------------------------+
The travelers table is okay, it makes sense:
CREATE TABLE Travelers (
Name VARCHAR(50) NOT NULL,
Family_Name VARCHAR(50) NOT NULL,
National_ID_Number INT(6) NOT NULL PRIMARY KEY
)
But I am unsure about how to do the journeys table. Especially with Travelers:
CREATE TABLE Journeys (
Start_Station VARCHAR(50) NOT NULL,
End_Station VARCHAR(50) NOT NULL,
Start_Time VARCHAR(50) NOT NULL,
End_Time VARCHAR(50) NOT NULL,
Travelers ???????
)
Obviously I have "Travelers" as a column inside my 2nd table. So there is a relationship there with the first table. But what is it? I think I need to make a Foreign Key somehow?

You are looking for a junction/association table. The tables should look like this:
create table Journeys (
Journey_Id int auto_increment primary key,
Start_Station VARCHAR(50) NOT NULL,
End_Station VARCHAR(50) NOT NULL,
Start_Time VARCHAR(50) NOT NULL,
End_Time VARCHAR(50) NOT NULL
);
create table TravelerJourneys (
traveler_journey_id int auto_increment primary key,
traveler_id int(6) not null,
journey_id int not null,
foreign key (traveler_id) references Travelers (National_ID_Number),
foreign key (journey_id) references Journeys (Journey_Id)
);
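To see how the junction table answers "who was on this journey?", here is a minimal sketch. It is written in Python with an in-memory SQLite database standing in for MySQL (an assumption made only so the example runs anywhere), so the column types are SQLite's rather than the MySQL ones above. The sample rows come from the question; the variable name hull_leeds_travelers is just illustrative.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption, for runnability only).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE Travelers (
    Name TEXT NOT NULL,
    Family_Name TEXT NOT NULL,
    National_ID_Number INTEGER NOT NULL PRIMARY KEY
);
CREATE TABLE Journeys (
    Journey_Id INTEGER PRIMARY KEY AUTOINCREMENT,
    Start_Station TEXT NOT NULL,
    End_Station TEXT NOT NULL,
    Start_Time TEXT NOT NULL,
    End_Time TEXT NOT NULL
);
CREATE TABLE TravelerJourneys (
    traveler_journey_id INTEGER PRIMARY KEY AUTOINCREMENT,
    traveler_id INTEGER NOT NULL REFERENCES Travelers (National_ID_Number),
    journey_id INTEGER NOT NULL REFERENCES Journeys (Journey_Id)
);
""")

conn.executemany("INSERT INTO Travelers VALUES (?, ?, ?)", [
    ("Dianne", "Herbert", 579643),
    ("Francine", "Jackson", 183432),
    ("Oprah", "Dingle", 269537),
])
# First journey inserted gets Journey_Id 1.
conn.execute(
    "INSERT INTO Journeys (Start_Station, End_Station, Start_Time, End_Time) "
    "VALUES ('Hull', 'Leeds', '13:50', '14:50')"
)
# One junction row per (traveler, journey) pair replaces the comma-separated list.
conn.executemany(
    "INSERT INTO TravelerJourneys (traveler_id, journey_id) VALUES (?, ?)",
    [(183432, 1), (269537, 1)],
)

hull_leeds_travelers = [row[0] for row in conn.execute("""
    SELECT t.Name || ' ' || t.Family_Name
    FROM TravelerJourneys tj
    JOIN Travelers t ON t.National_ID_Number = tj.traveler_id
    WHERE tj.journey_id = 1
    ORDER BY t.Family_Name
""")]
print(hull_leeds_travelers)
```

The "Travelers" column from the question disappears entirely: the list of names is produced by a join whenever it is needed.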

I Relational • Pre-Requisite Explanation
There is an awful lot of misinformation; disinformation in the "literature" produced by the "theoreticians" and all the authors that follow them. Of course that is very confusing and leads to primitive, pre-relational Record Filing Systems with none of the Integrity; Power; and Speed of Relational Systems. Second, while newbies try hard to answer questions here, due to the above, they are also badly confused.
I can't provide a tutorial; this is a not-so-short explanation of the issues that you need to understand before diving into the Question.
1 Relationship
I get so confused with one to many, many to one, many to many, unidirectional, bidirectional, multidirectional etc.
unidirectional, bidirectional, multidirectional
Please delete those from your mind, they are not Relational terms (the "theoreticians" and newbies love to invent new things, they have no value other than to add confusion).
There is no direction in a relationship. It always consists of:
a Primary Key: thing that is referenced, the parent end, and
a Foreign Key: the child end, thing that is referencing the parent PK
At the SQL code level, DML, you could perceive a "direction", parent-to-child, or child-to-parent. It is a matter of perception (not storage) and relevant only to the requirement of the code, the "way to get from this data to that data".
At the physical level, SQL DDL, there is only one type of relationship Parent::Child, and that is all we have ever needed. No Cardinality yet, because that is controlled by other means. As with the natural world, the parent is the thing that is referenced, the child is the thing that references the parent.
At the bare bones level, that is not a Relational database, but a 1960's Record Filing System, the relationship is Referenced:: Referencing, and God only knows what each thing is.
The child can have only one parent, and the parent can have many children, therefore the one-and-only relationship at the physical level is:
one [parent] to 0-to-many [children]
A Relational database is made up of things (rows, the main symbol, with either square or round corners) and relationships between things (the lines, either Identifying or Non-Identifying). A thing is a Fact, each row is a Fact, the relationships are relationships between Facts.
In the Relational Model, each thing must be uniquely Identified, each logical row (not record!) must be unique. That is the Primary Key, which must be made up from the data (INT; GUID; UUID; etc are not data, they are additions, in the system, the user does not see them).
Of course, IDENTITY or AUTOINCREMENT are fine for prototypes and trials, they are not permitted in Production.
There are many differences between Relational databases and the pre-relational, 1960's Record Filing Systems that the "theoreticians" use. Such primitive systems use physical pointers, such as Record ID (INT; GUID; UUID; etc). If I had to declare just one, the fundamental difference is:
whereas the RFS is physical, the Relational Model is Logical
therefore, whereas in the RFS physical records are referenced by their physical pointer, in the RDb logical rows (not records!) are referenced by their logical Key
The relationship is established as follows:
ALTER TABLE child_table
ADD CONSTRAINT constraint_name
FOREIGN KEY ( foreign_key_column_list )
REFERENCES parent_table ( primary_key_column_list )
Beware, some "theoreticians", and some newbies, do not understand SQL. If I tell you that Sally is Fred's daughter, from the single Fact you will know that Fred is Sally's father. There is no need for the second statement, it is obviously the first statement in reverse. Likewise in SQL, it is not stupid. There is only one relationship definition. But those darlings add a second "relationship", the above in reverse. That is
(a) totally redundant, and
(b) interferes with administration of the tables. Probably, those types are the ones that use weird and wonderful directional terms.
2 Cardinality
That is controlled firstly by implementing an Index, and secondly by additional means. The additional means are not relevant here.
one [parent]
Each row is unique, by virtue of the Primary Key, expressed as:
ALTER TABLE table
ADD CONSTRAINT constraint_name
PRIMARY KEY ( column_list )
one [parent] to many [children]
Because each parent row is unique, we know that the reference [to the parent] in the child will reference just one row
ALTER TABLE child_table
ADD CONSTRAINT constraint_name
FOREIGN KEY ( foreign_key_column_list ) -- local child
REFERENCES parent_table ( primary_key_column_list ) -- referenced parent
Example
All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993. Refer to IDEF1X Introduction.
ALTER TABLE Customer
ADD CONSTRAINT Customer_pk
PRIMARY KEY ( CustomerCode )
ALTER TABLE OrderSale
ADD CONSTRAINT OrderSale_pk
PRIMARY KEY ( CustomerCode, OrderSaleNo )
ALTER TABLE OrderSale
ADD CONSTRAINT Customer_Issues_Orders_fk
FOREIGN KEY ( CustomerCode ) -- local child
REFERENCES Customer ( CustomerCode ) -- referenced parent
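A rough sketch of what those three constraints buy you, again using Python with an in-memory SQLite database as a stand-in for the real platform (my assumption, for runnability). The customer codes 'ACME' and 'NOBODY' are made up for illustration. The point is that the child row cannot exist without its one parent, while the parent may have zero-to-many children.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE Customer (
    CustomerCode TEXT NOT NULL,
    CONSTRAINT Customer_pk PRIMARY KEY (CustomerCode)
);
CREATE TABLE OrderSale (
    CustomerCode TEXT NOT NULL,
    OrderSaleNo INTEGER NOT NULL,
    CONSTRAINT OrderSale_pk PRIMARY KEY (CustomerCode, OrderSaleNo),
    CONSTRAINT Customer_Issues_Orders_fk
        FOREIGN KEY (CustomerCode)            -- local child
        REFERENCES Customer (CustomerCode)    -- referenced parent
);
INSERT INTO Customer VALUES ('ACME');
INSERT INTO OrderSale VALUES ('ACME', 1), ('ACME', 2);  -- one parent, many children
""")

# A child that references a non-existent parent is refused.
try:
    conn.execute("INSERT INTO OrderSale VALUES ('NOBODY', 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

acme_orders = conn.execute(
    "SELECT COUNT(*) FROM OrderSale WHERE CustomerCode = 'ACME'"
).fetchone()[0]
print(orphan_rejected, acme_orders)
```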
many to one
There is no such thing. It is simply reading a one-to-many relationship in reverse, and doing so without understanding. In the example, reading the data model explicitly, or translating it to text:
Each Customer issues 0-to-n OrderSales
the reverse is (refer again to the one-to-many):
Each OrderSale is issued by 1 Customer
Again, beware, newbies may implement a duplicate relationship, that will (a) confuse you, and (b) stuff things up royally.
many to many
We have been using diagrammatic modelling tools since the early 1980's. Even IDEF1X was available for modelling long before it was elevated to a NIST Standard. Modelling is an iterative process: whereas redrawing is very cheap, re-implementing SQL is expensive. We start at the Logical level with no concern for the physical (tables, platform specifics), with only entities, progress to logical Keys, Normalising as we go. Finally, still at the logical level, we would finalise each table, and check that the datatypes are correctly set.
If and when the logical model is (a) stable, and (b) signed off, then we progress to the Physical: creating the datatypes; tables; foreign keys; etc. It is a simple matter of translating the data model to SQL DDL. If you use a modelling tool, that is one click, and the tool does it for you.
The point is, there is progression, and a distinction between the Logical and Physical levels.
At the physical level, as can be understood from the fact that there is one and only one type of relationship in SQL, there is no such thing as a many-to-many relationship. Notice that it can't be expressed even in text form, in a single statement, we need two statements.
Such a relationship exists only at the logical modelling level: when we determine that there is such a relationship between two Facts (rows in a table), we draw it.
At the point when the data model is stable, and we move from the Logical to the Physical, the n-to-n relationship is translated into an Associative Table and a relationship to each parent.
Refer to this unrelated document for an Example
Notice the many-to-many relationship Favours in the Logical Requirement
Notice the translation to an Associative table and two relationships in Implementation (Right side only)
Each User favours 0-to-n ProductPreferences
Each Product is favoured in 0-to-n ProductPreferences
Now notice this sagely: that Implementation model can be read Logically:
Each User favours 0-to-n Products (via ProductPreference)
Each Product is favoured by 0-to-n Users (via ProductPreference)
Additionally, you might find this document helpful (section 1 Implementation: Relationship only).
II Your Question
Now we can deal with your question.
1 The Obstacle
Your quandary is due to:
not progressing through the formal stages, due to lack of education in the subject matter (hopefully mitigated by the above explanations)
having an idea at the Logical level ... but not formally
of the views required in the app, as opposed to perceiving the data independent of the app
diving into the Physical tables ... with nothing in-between
not asking specific questions, due either to shyness or inability to identify the particular point that you do not understand
and thus you are stuck, as per your original post.
2 The Quandary
Your quandary is:
you have this at the logical level (Data model, Entity-Relationship level):
and of course, your CREATE TABLE commands at the physical level.
I hope my explanations above are enough to understand the great gap in what you have:
the logical vs the physical
that the physical is far too premature
that we need at least some data modelling (not formal, not possible in this medium) to work things out.
The Logical data model is simply not progressed enough, let alone resolved, in order to create stable tables, let alone correct ones.
3 Journey Progressed
Let's take your Journey thingamajig first. What is a Journey ?
It is definitely not an Independent thing. We do not go walking in the heath and heather after the dew; nor the quietened beach at sunset, and suddenly, out of nowhere ... find a Journey, sitting there, all by itself. No. It can't stand up.
A Journey is Dependent (at least) on a starting and finishing point.
And what are those points exactly ? Railway stations.
Railway stations are Independent, they do stand alone.
And then a Journey is Dependent on a Railway station. In two separate relationships: start; end.
Predicate
I have given some of the Predicates, those relevant right now, so that they are explicit, so that you can check them carefully.
All the Predicates can be read directly from the model.
In the normal case, you have to read them from the diagram (it is rich with specific detail), and check that it is correct
that provides a valuable feedback loop:
modelling --> Predicate --> check --> more modelling.
4 Traveller Progressed
Now for your Traveller thingee. What is a Traveller ?
A Traveller is a person who has travelled on at least 1 journey
Therefore Journey is Dependent on Person
Person is Independent, it can stand alone
5 Journey Resolved
Now we can finalise Journey.
5 Requirement
Now we have a decent chance of answering your Question.
I have chosen Relational Keys that throw themselves at us, no thinking necessary.
What makes a Journey unique is ( NationalID, StationStart, DateTimeStart )
not ( NationalID, StationStart ). Anything more would be superfluous.
Person needs an additional Key, called an Alternate Key, on ( NameFamily, Name ). This prevents dupes on those values.
RoleName
In the first instance, the column name for a PK is used unchanged wherever it is an FK
Except:
to make it even more meaningful, eg. TravellerID, or
to differentiate, when there is more than one FK to the same parent, eg. StationStart, StationEnd.
6 Traveller ???
So what exactly is Traveller??? (the concept in your mind, it is not in the Requirement) ?
One possibility is:
a Person who travels on a Journey is a Traveller.
That is already available above, in the single Person sense.
But there is more. I get the idea that it is a group of people who took a journey together. But that too, is available from the above:
SELECT StationStart, DateTimeStart, StationEnd,
GROUP_CONCAT(NationalID) AS Passengers
FROM Journey
WHERE (condition...)
GROUP BY StationStart, DateTimeStart, StationEnd
But that will give you the whole train, not a group of people who have an intended common purpose.
What I can figure out is, that you mean a group of people who have some common purpose, such as taking a trip together. That marks an intent, before the fact of the Journey. It could be a loose Group, or an Excursion, etc. Something smaller than a train-load.
I will give you two options. It is for you to contemplate them, and to specify (if it is long, edit your Question; if it is short, post a Comment).
7 Group Option
This is a simple structure, for groups that travel together. This assumes that (because it is group travel) tickets for the Journey are purchased in a block, for all the Members of the Group, and we don't track individual Person purchases.
8 Excursion Option
An excursion is one outing for the group, with different members each outing. This assumes that the Journey for each Person is tracked (booked personally, at different times).
The Fact that each Member has reserved their Journey (or not) is simply a matter of joining Excursion::Member::Journey.
Which is eminently possible due to the Relational Keys (impossible in an RFS). Refer to this Example. Please ask if you need code.
The Identifier for a Group (above) and an Excursion (below) is quite different:
I have set up Group to be a somewhat permanent affair, with a home, and an assumption that they go on several outings together. The groups you have given (in your Journeys.Travellers) would be three different groups, due to the membership.
Excursion is a single event, the group is the list of Passengers.
MemberID and PassengerID are RoleNames for NationalID, that is, the Role the Person plays in the subject table.
It also allows Journeys that a Person takes alone (without an Excursion) to be tracked.
Please feel free to ask specific questions. Or else update your original post.

First, understand what each relationship is. I am explaining only a few basics, which are widely used.
One to One
A One-to-One relationship means that you have two tables that have a relationship, but that relationship only exists in such a way that any given row from Table A can have at most one matching row in Table B.
Ex: Each student has a unique roll number, and each roll number belongs to exactly one student.
Many to Many
A good design for a Many-to-Many relationship makes use of something called a join table. The term join table is just a fancy way of describing a third SQL table that only holds primary keys.
Ex- Many Students can have many subjects.
One to Many
a one-to-many relationship is a type of cardinality that refers to the relationship between two entities A and B in which an element of A may be linked to many elements of B, but a member of B is linked to only one element of A.
For instance, think of A as books, and B as pages. A book can have many pages, but a page can only be in one book.
In your case, make the Travelers column a foreign key that references the primary key of the Travelers table.
Reason: one traveler can have many journeys, so the relationship here is one-to-many.

As you have an n-to-n relationship, you need to create an intermediate table.
In this case you will have to add a unique id to the Journeys table to identify each row easily.
CREATE TABLE TRAVELERS_IN_JOURNEY (
National_ID_Number INT(6) NOT NULL, -- references Travelers
Journey_id INT NOT NULL, -- references Journeys
PRIMARY KEY (National_ID_Number, Journey_id)
)
As a column cannot contain multiple keys, you can also remove the Travelers column from your Journeys table.
CREATE TABLE Journeys (
Journey_id INT AUTO_INCREMENT PRIMARY KEY,
Start_Station VARCHAR(50) NOT NULL,
End_Station VARCHAR(50) NOT NULL,
Start_Time VARCHAR(50) NOT NULL,
End_Time VARCHAR(50) NOT NULL
)

Related

correct structure (Database) for a mobile application

We are making a mobile application with some friends, but we are having problems regarding the structure of the database due to Unknown. I think it is a good question that can help many people, and it would be nice if people with knowledge could explain it well. The app consists of providing various services (more can be added in the future) to customers. They are logged in and have access to our services. At first we thought of a table that contains columns with all the customer data plus the services. Then we saw that it was more effective to make another separate table called "services" that identifies the user by an id. The problem now comes with this table. We do not know whether to make a single column with all services (such as an array) or to make one column per service. I took a photo so that what I am proposing can be observed more easily.
The question is which of these options (obviously there may be a third that we do not contemplate) is the best, in terms of performance.
I see several defects in the second option, but I'm not sure. In terms of latency and speed, traversing an array (and more so if services are added, or perhaps they are out of order because the user first hired service2 and then service1) is much more costly than in option 1. In addition, when a user drops a service, that implies going through the entire array, looking for it and removing it. I don't know; you are the experts, what do you recommend? All this will be uploaded to the cloud (Azure), so all requests will go to the cloud.
Option 2 is better than option 1. But, with respect, it's still not good.
Never never store comma-separated lists of things in columns of data. If you do you'll be sorry. (They're very costly to search.)
You want something like this. Three tables, one for users, another for services, and a so-called JOIN table to establish a many-to-many relationship between the two.
+-----------+    +-------------+     +-----------+
|user       |    |user_service |     |service    |
+-----------+    +-------------+     +-----------+
|user_id    +--->|user_id      |<----+service_id |
|givenname  |    |service_id   |     |name       |
|surname    |    +-------------+     +-----------+
|is_active  |
+-----------+
Each row in user_service means a user is authorized to use that service. To authorize a user, INSERT a row. To revoke authorization, DELETE the row.
To find out whether a user can use a service, use this query.
SELECT user.user_id
FROM user
JOIN user_service USING (user_id)
JOIN service USING (service_id)
WHERE user.givenname = 'Bill' AND user.surname='Gates'
AND service.name = 'CharityNavigator'
AND user.is_active > 0;
If your query returns the user_id then the chosen user may use the chosen service.
To get a list of the services for each user, use this query.
SELECT user.user_id, user.givenname, user.surname,
GROUP_CONCAT(service.name) service_names
FROM user
JOIN user_service USING (user_id)
JOIN service USING (service_id)
WHERE user.is_active > 0
GROUP BY user.user_id
Some explanation:
It's almost always best to build tables with rows for things like your services in them, rather than columns or comma-separated lists in columns. Why?
You can add new services -- as many as you want -- years from now without reworking your database code.
DBMSs, including MySQL, work well with JOIN operations.
Doing WHERE commalist_column SOMEHOW_CONTAINS (some_id) is disgustingly inefficient in most relational database management systems. Doing WHERE column = some_id is far more efficient because it can use an index.
Rows with fewer columns, in general, work better than rows with more columns.
It's far cheaper in production to add rows to databases than it is to add columns. Adding columns means altering table definitions. That operation can require downtime.
When you use columns for things like your services, you're creating a closed system. When you use rows, your system is open-ended.
May I suggest you read about database normalization? Don't be intimidated by all the CS jargon. Just look at some examples of how to normalize various databases.
And maybe read about entity-relationship database modeling?
Edit On the advice of a commenter, I suggest you make the primary key of your user_service table to contain both columns (user_id, service_id). I also suggest you make a reverse index with both columns (service_id, user_id) so your queries can look things up quickly starting with service as well as user. Your table definitions might look something like this:
CREATE TABLE user (
user_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
givenname VARCHAR(50) NULL DEFAULT NULL,
surname VARCHAR(50) NULL DEFAULT NULL,
is_active TINYINT NOT NULL DEFAULT '1',
PRIMARY KEY (user_id)
)
COLLATE='utf8mb4_general_ci';
CREATE TABLE service (
service_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(50) NULL DEFAULT NULL,
PRIMARY KEY (service_id)
)
COLLATE='utf8mb4_general_ci';
CREATE TABLE user_service (
user_id INT UNSIGNED NOT NULL,
service_id INT UNSIGNED NOT NULL,
PRIMARY KEY (user_id, service_id),
INDEX reverse_index (service_id, user_id),
CONSTRAINT FK_service
FOREIGN KEY (service_id)
REFERENCES service (service_id)
ON UPDATE RESTRICT ON DELETE RESTRICT,
CONSTRAINT FK_user
FOREIGN KEY (user_id)
REFERENCES user (user_id)
ON UPDATE RESTRICT ON DELETE RESTRICT
);
With this primary key if you attempt to INSERT a duplicate authorization for a user for a service, the dbms rejects it.
Be sure to use the same INT UNSIGNED NOT NULL data type for user_id and service_id in those tables.
This is a very common database design pattern: it's the canonical way of creating a many-to-many relationship between rows of two different tables.
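If you want to convince yourself that the composite primary key really does reject a duplicate authorization, a small sketch follows. It uses Python with an in-memory SQLite database in place of MySQL (an assumption, so the example is self-contained); the ids 1 and 10 are arbitrary sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
conn.executescript("""
CREATE TABLE user_service (
    user_id INTEGER NOT NULL,
    service_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, service_id)
);
""")

# Authorize user 1 for service 10: INSERT a row.
conn.execute("INSERT INTO user_service VALUES (1, 10)")

# A second, identical authorization violates the primary key.
try:
    conn.execute("INSERT INTO user_service VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Revoke the authorization: DELETE the row.
conn.execute("DELETE FROM user_service WHERE user_id = 1 AND service_id = 10")
remaining = conn.execute("SELECT COUNT(*) FROM user_service").fetchone()[0]
print(duplicate_rejected, remaining)
```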
A 3rd way (most frugal on space)
See the SET datatype. It allows for saying which combination of those 6 servs apply.
INT UNSIGNED (of a suitable size) is another way to have a "set".
SET or TINYINT takes only 1 byte to represent up to 8 items.
Your 6 column choice takes 6 bytes.
The "{serv1,... }" might be a VARCHAR, averaging 10-20 bytes.
So, my suggestions are clearly aimed at saving space. But maybe that is not important? Do you have millions of rows? Do you have more than 64 "servs"? (There is a limit of 64 on SET and BIGINT UNSIGNED.)
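For the integer-as-a-set idea, the application side might look roughly like the sketch below. It is plain Python with no database involved; the service names and the helper functions (grant, revoke, has, names) are hypothetical, and the resulting integer is what would be stored in the TINYINT/INT UNSIGNED column.

```python
# Hypothetical service names; each one is assigned a fixed bit position.
SERVICES = ["serv1", "serv2", "serv3", "serv4", "serv5", "serv6"]
BIT = {name: 1 << i for i, name in enumerate(SERVICES)}


def grant(mask, name):
    """Return the mask with the service's bit switched on."""
    return mask | BIT[name]


def revoke(mask, name):
    """Return the mask with the service's bit switched off."""
    return mask & ~BIT[name]


def has(mask, name):
    """True if the service's bit is set in the mask."""
    return bool(mask & BIT[name])


def names(mask):
    """List the services whose bits are set, in declaration order."""
    return [n for n in SERVICES if mask & BIT[n]]


mask = 0
mask = grant(mask, "serv2")
mask = grant(mask, "serv1")
mask = grant(mask, "serv5")
mask = revoke(mask, "serv2")
print(mask, names(mask))
```

Note that the order in which services were hired does not matter here, and six services fit in a single byte.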
But Which?
Is the question about coding? Well, any method is going to take some effort to split the bits/columns/string apart to build the buttons on the screen. Probably a similar amount of effort and probably less than the effort to build the screen. Ditto for performance.
I highly recommend you pick two solutions and implement both. You will discover
How similar they are in performance, amount of code, etc.
How insignificant the question is.
How much extra stuff you have learned about databases.
How easy it is to "try" and "throw away" another way to do something.
How the latency, performance, etc, differences are insignificant. (This is what we are really answering for you.)
The bigger picture
You have pointed out one use for this data structure. I worry that there are, or will be, other uses for this data structure. And that something else is the real determinant of which approach is best. (At that point, you can happily resurrect the thrown away version!)
A 4th way
JSON. But it would be more verbose (take more space) than your VARCHAR way. It may or may not be easier to work with -- this depends on the rest of the requirements.

How to model several tournaments / brackets types into a SQL database?

I want to model a database to store data of several types of tournaments (with different types of modes: single rounds, double rounds, league, league + playoffs, losers, ...).
Maybe this project would be a kind of Challonge: www.challonge.com
My question is: how do I create a model in a relational SQL database to store all these types of tournaments?
I can't imagine how to do this work. There are a lot of different tables, but all tables are related to one attribute: tournamentType...
Can I store a tournamentType field and use this field to select the appropriate table on query?
Thank you
I can understand why you're struggling with modeling this. One of the key reasons why this is difficult is the object-relational impedance mismatch. While I am a huge fan of SQL and it is an incredibly powerful way of being able to organize data, one of its downfalls - and why NoSQL exists - is that SQL is different from Object Oriented Programming. When you describe leagues, with different matches, it's pretty easy to picture this in object form: A Match object is extended by League_Match, Round_Match, Knockout_Match, etc. Each of these Match objects contains two Team objects. Team can be extended to Winner and Loser...
But this is not how SQL databases work.
So let's translate this into relationships:
I want to model a database to store data of several types of tournaments (with different types of modes: single rounds, double rounds, league, league + playoffs, losers, ...).
Tournaments and "modes" are a one to many (1:n) relationship.
Each tournament has many teams, and each team can be part of many tournaments (n:n).
Each team has many matches, and each match has two teams (n:n).
Each tournament has multiple matches but each match only belongs to one tournament (1:n).
The missing piece here that is hard to define as a universal relationship?
- In rounds, each future match has two teams.
- In knockout matches, each future match has an exponential but shrinking number of choices depending on the number of initial teams.
You could define this in the database layer or you could define this in your application layer. If your goal is to keep referential integrity in mind (which is one of the key reasons I use SQL databases) then you'll want to keep it in the database.
Another way of looking at this: I find that it is easiest for me to design a database when I think about the end result by thinking of it as JSON (or an array, if you prefer) that I can interact with.
Let's look at some sample objects:
Tournament:
[
  {
    name: "team A",
    schedule: [
      {
        date: "11/1/15",
        vs: "team B",
        score1: 2,
        score2: 4
      },
      {
        date: "11/15/15",
        vs: "team C"
      }
    ]
  }
  // more teams
]
As I see it, this works well for everything except for knockout, where you don't actually know which team is going to play which other team until an elimination takes place. This confirms my feeling that we're going to create descendants of a Tournament class to handle specific types of tournaments.
Therefore I'd recommend the following tables and columns:
Tournament
- id (int, PK)
- tournament_name
- tournament_type
Team
- id (int, PK)
- team_name (varchar, not null)
# Any other team columns you want.
Match
- id (int, PK, autoincrement)
- date (int)
- team_a_score (int, null)
- team_b_score (int, null)
- status (either future, past, or live)
- tournament_id (int, Foreign Key)
Match_Round
- match_id (int, not null, foreign key to match.id)
- team_a_id (int, not null, foreign key to team.id)
- team_b_id (int, not null, foreign key to team.id)
Match_Knockout
- match_id (int, not null, foreign key to match.id)
- winner_a_of (int, not null, foreign key to match.id)
- winner_b_of (int, not null, foreign key to match.id)
This model utilizes sub-tables. The benefit to this is that knockout matches and round/league matches are very different and you are treating them differently. The downside is that you're adding additional complexity which you're going to have to handle. It may be a bit annoying, but in my experience trying to avoid it only adds more headaches and makes it far less scalable.
Now I'll go back to referential integrity. The challenge with this setup is that theoretically you could have values in both Match_Round and Match_Knockout when they only belong in one. To prevent this, I'd utilize TRIGGERs. Basically, stick a trigger on both the Match_Round and Match_Knockout tables, which prevents an INSERT if the tournament_type is not acceptable.
Although this is a bit of a hassle to set up, it does have the happy benefit of being easy to translate into objects while still maintaining referential integrity.
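Here is a minimal sketch of the trigger idea for the Match_Knockout side. It is written in Python against an in-memory SQLite database rather than MySQL (an assumption for runnability; MySQL trigger syntax differs, for example it uses SIGNAL instead of RAISE). The trigger name, the type values 'knockout' and 'round', and the sample rows are all made up, and the tables are cut down to the columns the check needs, with the winner columns left nullable for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE Tournament (
    id INTEGER PRIMARY KEY,
    tournament_name TEXT NOT NULL,
    tournament_type TEXT NOT NULL
);
CREATE TABLE "Match" (
    id INTEGER PRIMARY KEY,
    tournament_id INTEGER NOT NULL REFERENCES Tournament (id)
);
CREATE TABLE Match_Knockout (
    match_id INTEGER NOT NULL REFERENCES "Match" (id),
    winner_a_of INTEGER REFERENCES "Match" (id),
    winner_b_of INTEGER REFERENCES "Match" (id)
);
-- Refuse a knockout sub-row unless the parent match belongs to a
-- tournament whose type is 'knockout' (hypothetical type value).
CREATE TRIGGER match_knockout_type_check
BEFORE INSERT ON Match_Knockout
FOR EACH ROW
WHEN (SELECT t.tournament_type
      FROM "Match" m JOIN Tournament t ON t.id = m.tournament_id
      WHERE m.id = NEW.match_id) IS NOT 'knockout'
BEGIN
    SELECT RAISE(ABORT, 'match is not part of a knockout tournament');
END;
INSERT INTO Tournament VALUES (1, 'Cup', 'knockout'), (2, 'League', 'round');
INSERT INTO "Match" VALUES (100, 1), (200, 2);
''')

# Match 100 is in the knockout tournament, so this row is accepted.
conn.execute("INSERT INTO Match_Knockout (match_id) VALUES (100)")

# Match 200 is a league match, so the trigger aborts the insert.
try:
    conn.execute("INSERT INTO Match_Knockout (match_id) VALUES (200)")
    league_insert_blocked = False
except sqlite3.DatabaseError:
    league_insert_blocked = True

knockout_rows = conn.execute("SELECT COUNT(*) FROM Match_Knockout").fetchone()[0]
print(league_insert_blocked, knockout_rows)
```

A mirror-image trigger on Match_Round would complete the pair.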
You could create tables to hold tournament types, league types, playoff types, and have a schedule table, showing an event name along with its tournament type, and then use that relationship to retrieve information about that tournament. Note, this is not MySQL, this is more generic SQL language:
CREATE TABLE tournTypes (
ID int autoincrement primary key,
leagueId int constraint foreign key references leagueTypes.ID,
playoffId int constraint foreign key references playoffTypes.ID
--...other attributes would necessitate more tables
)
CREATE TABLE leagueTypes(
ID int autoincrement primary key,
noOfTeams int,
noOfDivisions int,
interDivPlay bit -- e.g. a flag indicating if teams in different divisions would play
)
CREATE TABLE playoffTypes(
ID int autoincrement primary key,
noOfTeams int,
isDoubleElim bit -- e.g. flag if it is double elimination
)
CREATE TABLE Schedule(
ID int autoincrement primary key,
Name text,
startDate datetime,
endDate datetime,
tournId int constraint foreign key references tournTypes.ID
)
Populating the tables...
INSERT INTO tournTypes VALUES
(1,2),
(1,3),
(2,3),
(3,1)
INSERT INTO leagueTypes VALUES
(16,2,0), -- 16 teams, 2 divisions, teams only play within own division
(8,1,0),
(28,4,1)
INSERT INTO playoffTypes VALUES
(8,0), -- 8 teams, single elimination
(4,0),
(8,1)
INSERT INTO Schedule VALUES
('Champions league','2015-12-10','2016-02-10',1),
('Rec league','2015-11-30','2016-03-04',2)
Getting info on a tournament...
SELECT Name
,startDate
,endDate
,l.noOfTeams as LeagueSize
,p.noOfTeams as PlayoffTeams
,case p.isDoubleElim when 0 then 'Single' when 1 then 'Double' end as Elimination
FROM Schedule s
INNER JOIN tournTypes t
ON s.tournId = t.ID
INNER JOIN leagueTypes l
ON t.leagueId = l.ID
INNER JOIN playoffTypes p
ON t.playoffId = p.ID
It's easy to make data models far more complex than they need to be. A lot of what you describe is business logic that can't actually be answered by a perfect data model. Most of the tournament logic should be captured outside the data model in a programming language, such as mysql functions, Java, Python, C# etc. Really your data model should be all "static" data you need, and none of the moving parts. I would suggest the data model to be:
METADATA TABLES
League_Type:
Id
Description
Playoff_Rounds
Resolve_Losing_Teams
Max_Number_of_Teams
Min_Number_of_Teams
Number_Of_Games_In_Season
any other "settings" you want...
Game_Type:
Id
League_Type_Id (fk to League_Type)
Game_Type_Name (e.g. regular season, playoff, championship)
DATA TABLES
League:
Id
League_Type_Id (fk to League_Type)
League_Name
Team:
Id
League_Id (fk to League)
Team_Name
Game:
Id
League_Id (fk to League)
Game_Type_Id (fk to Game_Type)
Home_Team_Id (fk to Team)
Visiting_Team_Id (fk to Team)
Week_of_season
Home_Team_Score
Visiting_Team_Score
Winning_Team (Home or Visitor)
From a data model perspective that should really be all you need. The procedural code should handle things like:
Creating games based on a randomized schedule
Updating scores and winning team in the Game table
Creating playoff games once the number of games in the season, per the League_Type settings, has been played.
Setting matchups in the playoffs based on how many games each team has won.
Forcing the number of teams in a league to be between Min_Number_of_Teams and Max_Number_of_Teams prior to the season beginning.
Etc.
You'll also likely want to create some views based on these tables to create some other meaningful information for end users:
Wins/Losses for a team (based on the Team table joined to the Game table)
Current team standings based on the previous wins/losses view for all teams
Home wins for a team
Road wins for a team
Anything else your heart desires!
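As a rough illustration of the first two views, here is a minimal sketch using Python's built-in sqlite3 module so it can be run as-is (MySQL syntax would differ slightly). The view name Team_Record, the trimmed-down Team and Game tables, and the sample teams are assumptions made for the example, not part of the model above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed-down versions of the Team and Game tables from the model above.
CREATE TABLE Team (
    Id INTEGER PRIMARY KEY,
    Team_Name TEXT NOT NULL
);
CREATE TABLE Game (
    Id INTEGER PRIMARY KEY,
    Home_Team_Id INTEGER NOT NULL REFERENCES Team (Id),
    Visiting_Team_Id INTEGER NOT NULL REFERENCES Team (Id),
    Winning_Team TEXT NOT NULL CHECK (Winning_Team IN ('Home', 'Visitor'))
);

-- Hypothetical view name: one row per team with its wins and losses.
CREATE VIEW Team_Record AS
SELECT t.Id AS Team_Id,
       t.Team_Name AS Team_Name,
       SUM(CASE WHEN (g.Home_Team_Id = t.Id AND g.Winning_Team = 'Home')
                  OR (g.Visiting_Team_Id = t.Id AND g.Winning_Team = 'Visitor')
                THEN 1 ELSE 0 END) AS Wins,
       SUM(CASE WHEN (g.Home_Team_Id = t.Id AND g.Winning_Team = 'Visitor')
                  OR (g.Visiting_Team_Id = t.Id AND g.Winning_Team = 'Home')
                THEN 1 ELSE 0 END) AS Losses
FROM Team t
LEFT JOIN Game g ON t.Id IN (g.Home_Team_Id, g.Visiting_Team_Id)
GROUP BY t.Id, t.Team_Name;
""")

# Made-up sample data: three teams, three games.
conn.executemany("INSERT INTO Team (Id, Team_Name) VALUES (?, ?)",
                 [(1, "Lions"), (2, "Tigers"), (3, "Bears")])
conn.executemany(
    "INSERT INTO Game (Home_Team_Id, Visiting_Team_Id, Winning_Team) VALUES (?, ?, ?)",
    [(1, 2, "Home"), (2, 3, "Visitor"), (3, 1, "Home")])

# Standings are just the record view with an ORDER BY on top.
standings = conn.execute(
    "SELECT Team_Name, Wins, Losses FROM Team_Record ORDER BY Wins DESC, Team_Name"
).fetchall()
print(standings)
```

Home wins and road wins would follow the same pattern with only one of the two conditions in each CASE.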
Final thoughts
You do not want to do anything that would repeat data stored in the database. A great example of this would be creating a separate table for playoff games vs. regular season games. Most of the columns would be duplicated because almost all of the functionality and data stored between the two tables is the same. To create both tables would break the rules of normalization. The more compact and simple your data structure can be, the less procedural code you will have to write, and the easier it will be to maintain your database.
This looks like a generalization/specialization problem to me. I will answer how to do this in a general way, as you didn't give much detail about the entities you actually need.
Suppose you have an entity Vehicle (replacing your tournament) and the specialization Train and Car. All Vehicles have an attribute maxSpeed and Train has numberOfWagons and Car has trunkCapacity.
To model this, you have several options:
(1) Merge them all into one table
You can create one table Vehicle with columns maxSpeed, numberOfWagons and trunkCapacity. You add another field vehicleType to distinguish between Trains and Cars, and you'll probably want an Id.
For any concrete Vehicle some of the columns will always be null.
(2) use separate super/sub tables
Alternatively you can create a table Vehicle with just Id and maxSpeed and create tables for Train and Car which just hold the extra attributes, namely numberOfWagons and trunkCapacity (and also an Id).
In this case, creating a new Car will require two inserts, one in the Vehicle Table and one in the Car Table. To select a car you would have to join Vehicle and Car, unless you are only interested in its vehicle attributes.
While this approach is more complicated than (1), it has some benefits:
You will not have that many null columns. A constraint like "the trunkCapacity of a car must not be null" can be easily expressed.
You can add new Vehicle types by just adding new tables, without changing any of the existing ones.
Converting between the two
From (2), you can still get a "merged" view (as in (1)) of all your vehicles by creating a view. This view will be a union of several selects, where each select joins one specialization (Train or Car) with Vehicle and adds constant null columns for the attributes it cannot retrieve from the specialization, so all selects in the union return the same number of columns.
From (1) you can create individual views for Trains and Cars by selecting a particular vehicle type from the Vehicle table and only the columns which are relevant for that vehicle type.
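To make option (2) and the "merged" view more concrete, here is a minimal sketch using Python's sqlite3 module (chosen only so the example runs anywhere; the same DDL idea applies in MySQL). The view name AllVehicles, the helper add_car and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Super table: attributes shared by every vehicle.
CREATE TABLE Vehicle (Id INTEGER PRIMARY KEY, maxSpeed INTEGER NOT NULL);
-- Sub tables: the primary key doubles as a foreign key to Vehicle.
CREATE TABLE Train (
    Id INTEGER PRIMARY KEY REFERENCES Vehicle (Id),
    numberOfWagons INTEGER NOT NULL
);
CREATE TABLE Car (
    Id INTEGER PRIMARY KEY REFERENCES Vehicle (Id),
    trunkCapacity INTEGER NOT NULL
);

-- Merged view as in option (1): each branch pads the attributes it lacks with NULL.
CREATE VIEW AllVehicles AS
SELECT v.Id AS Id, 'Train' AS vehicleType, v.maxSpeed AS maxSpeed,
       t.numberOfWagons AS numberOfWagons, NULL AS trunkCapacity
FROM Vehicle v JOIN Train t ON t.Id = v.Id
UNION ALL
SELECT v.Id, 'Car', v.maxSpeed, NULL, c.trunkCapacity
FROM Vehicle v JOIN Car c ON c.Id = v.Id;
""")


def add_car(vehicle_id, max_speed, trunk_capacity):
    """Creating a car takes two inserts; the transaction keeps them together."""
    with conn:
        conn.execute("INSERT INTO Vehicle (Id, maxSpeed) VALUES (?, ?)",
                     (vehicle_id, max_speed))
        conn.execute("INSERT INTO Car (Id, trunkCapacity) VALUES (?, ?)",
                     (vehicle_id, trunk_capacity))


with conn:
    conn.execute("INSERT INTO Vehicle (Id, maxSpeed) VALUES (1, 300)")
    conn.execute("INSERT INTO Train (Id, numberOfWagons) VALUES (1, 12)")
add_car(2, 180, 450)

all_vehicles = conn.execute(
    "SELECT Id, vehicleType, maxSpeed, numberOfWagons, trunkCapacity "
    "FROM AllVehicles ORDER BY Id"
).fetchall()
print(all_vehicles)
```

Note how the NOT NULL constraints on numberOfWagons and trunkCapacity are possible only because each attribute lives in its own sub table.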
(3) A mixture of the two
You can merge the most prominent attributes into one table and exile the more exotic attributes into extra tables.
A word of caution
One must be careful not to overdo generalizations. It is often better to just model a Cat as a Cat. In object-oriented programming, generalizations ("superclasses") are treasured. There they save code duplication, but column duplication is not nearly as bad as code duplication. Remember that you're just modelling data and not behaviour. And even in OO-land generalizations are often overdone.
I don't really see the complexity of this model. Let's see:
You need:
Tournaments table (to uniquely define each tournament - league, cup, etc.)
Tournament types, periods and phases (to define each tournament characteristics)
Matchups (every single match of every tournament, including home and visitor teams and the final score)
Matchup types (linked to the tournament phases)
Teams, roster and players (last 2 optional if you intend to add that level of information)
So as an example:
tournament: Premier League
tournament type: league
tournament_period: 2015-2016
tournament_phase: 20th round
matchup: Chelsea VS Liverpool
matchup_type: second leg
score_visitor: 2
score_local: 0
or
tournament: Champions league
tournament type: tournament
tournament_period: 2015-2016
tournament_phase: 2nd Round
matchup_type: first leg
matchup: Chelsea VS Barcelona
score_visitor: 5
score_local: 0
I did this pretty fast so the relationships and columns might not be right, but I guess you have a start point.
Hope it helps!
Regards
Not an elegant solution, but you could do the following:
Create a table that holds key value pairs of attributes for a given tournament. Each tournament would be stored in multiple rows.
CREATE TABLE TOURNAMENT (
TOURNAMENT_TYPE VARCHAR2(100) NOT NULL,
TOURNAMENT_NAME VARCHAR2(100) NOT NULL,
ATTRIBUTE_NAME VARCHAR2(100) NOT NULL,
ATTRIBUTE_VALUE VARCHAR2(100) NOT NULL
);
e.g.
TOURNAMENT_TYPE,TOURNAMENT_NAME,ATTRIBUTE_NAME,ATTRIBUTE_VALUE
Volleyball,My volleyball tournament,team_member_1,Josie
Volleyball,My volleyball tournament,team_member_2,Ralph
Volleyball,My volleyball tournament,Rounds Per Game,12
Soccer,My soccer tournament,team_member_1,Jim
Soccer,My soccer tournament,team_member_2,Emma
Soccer,My soccer tournament,Tournament Duration,20
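Reading such a key-value table back means collecting the rows of one tournament into a single record. A minimal sketch, using Python's sqlite3 module for runnability; the helper tournament_attributes and the composite primary key are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE TOURNAMENT (
    TOURNAMENT_TYPE TEXT NOT NULL,
    TOURNAMENT_NAME TEXT NOT NULL,
    ATTRIBUTE_NAME TEXT NOT NULL,
    ATTRIBUTE_VALUE TEXT NOT NULL,
    -- Assumption: each tournament has at most one value per attribute.
    PRIMARY KEY (TOURNAMENT_NAME, ATTRIBUTE_NAME)
)""")
conn.executemany("INSERT INTO TOURNAMENT VALUES (?, ?, ?, ?)", [
    ("Volleyball", "My volleyball tournament", "team_member_1", "Josie"),
    ("Volleyball", "My volleyball tournament", "team_member_2", "Ralph"),
    ("Volleyball", "My volleyball tournament", "Rounds Per Game", "12"),
    ("Soccer", "My soccer tournament", "team_member_1", "Jim"),
    ("Soccer", "My soccer tournament", "team_member_2", "Emma"),
    ("Soccer", "My soccer tournament", "Tournament Duration", "20"),
])


def tournament_attributes(name):
    """Collect all attribute rows of one tournament into a dict."""
    rows = conn.execute(
        "SELECT ATTRIBUTE_NAME, ATTRIBUTE_VALUE FROM TOURNAMENT "
        "WHERE TOURNAMENT_NAME = ?", (name,)).fetchall()
    return dict(rows)


volleyball = tournament_attributes("My volleyball tournament")
# Every value comes back as text, so numeric settings need converting by hand.
rounds = int(volleyball["Rounds Per Game"])
print(volleyball, rounds)
```

The manual int() conversion is one reason this layout is less elegant: the database can no longer check types or required attributes for you.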

Database design confusion

I'm developing a classifieds site. And I'm totally stuck at database design level.
An advertisement can only be in one category.
In my database I have table called "ads", which has columns, common for all advertisements.
CREATE TABLE Ads (
AdID int not null,
AdDate datetime not null,
AdCategory int not null,
AdHeading varchar(255) not null,
AdText varchar(255) not null,
etc...
);
I also have a lot of categories.
Ads that are posted in "cars" category, for example, have additional columns like make, model, color, etc. Ads, posted in "housing" have columns like housing type, sqft. etc...
I did something like:
CREATE TABLE Cars (
AdID int not null,
CarMake varchar (255) not null,
CarModel varchar(255) not null,
...
);
CREATE TABLE Housing (
AdID int not null,
HousingType varchar (255) not null,
...
);
AdId in those is a foreign key to Ads.
But when I need to retrieve information from Ads, I have to look up all those additional tables and check whether AdId in Ads equals AdId in those tables.
For every category I need a new table. I'm gonna end up with like 15 tables or so.
I had an idea to have boolean columns in the Ads table like is_Cars, is_Housing, etc., but having 15 columns, where 14 would be NULL, seems horrible.
Is there any better way to design this database? I need my database to be in a 3rd normal form, this is the most important requirement.
Don't worry too much - it's a well known dilemma, there are no 'silver bullets' and all solutions have some trade-offs. Your solution sounds good to me, and is commonly used in the industry. On the down side it has JOINS as you mentioned (which is a well-known trade-off of normalization anyway), and also each new product type requires a new TABLE. On the up side the table structure precisely reflects your business logic, it's readable and efficient in storage.
Your other suggestion, as far as I understand, was a single table where each row has a "type" indication - car, house etc (btw no need for multiple columns such as 'is_car', 'is_house' - it's simpler to have a single column 'type', e.g. type=1 indicates car, type=2 indicates house etc). Then multiple columns where some of them are unused for some product types.
Well, here the advantage is the ability to add new types dynamically (even user-defined types) without changing the database schema. Also no JOINs. On the down side you'll be storing and retrieving lots of NULL cells, and the schema would be less descriptive: e.g. it's harder to put a constraint "carModel column is not nullable", because it is nullable for houses (you can use triggers, but it's less readable).
Personally I prefer the 1st solution (of course depending on the use case, but the 1st solution is my first instinct). And I can use it with some peace of mind after considering the trade-offs, e.g. understanding that I'm tolerating those JOINs as payment for a readable and compact schema.
One, you are confusing categories and product specifications.
Two, you need to read up on Table Inheritance.
If you don't mind nulls, use Single Table Inheritance. All "categories" (cars, houses, ...) go in one table and have a "type" column.
If you don't like nulls, use Class Table Inheritance. Make a master table with the primary keys that you point your category foreign key at. Make child tables for each type (cars, houses, ...) whose primary key is also a foreign key to the master table. This is easier with an ORM like Hibernate.
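A minimal sketch of Class Table Inheritance for the ads example, written with Python's sqlite3 module so it runs without a server (the MySQL DDL would look much the same). The category codes, the CHILD_TABLES mapping and the fetch_ad helper are assumptions for illustration. Because AdCategory already says which child table applies, retrieving an ad needs one extra lookup, not a probe of every category table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Master table: columns common to all advertisements.
CREATE TABLE Ads (
    AdID INTEGER PRIMARY KEY,
    AdCategory INTEGER NOT NULL,
    AdHeading TEXT NOT NULL
);
-- Child tables: the primary key is also a foreign key to Ads.
CREATE TABLE Cars (
    AdID INTEGER PRIMARY KEY REFERENCES Ads (AdID) ON DELETE CASCADE,
    CarMake TEXT NOT NULL,
    CarModel TEXT NOT NULL
);
CREATE TABLE Housing (
    AdID INTEGER PRIMARY KEY REFERENCES Ads (AdID) ON DELETE CASCADE,
    HousingType TEXT NOT NULL
);
INSERT INTO Ads VALUES (1, 1, 'Low-mileage hatchback'), (2, 2, 'Sunny two-bedroom flat');
INSERT INTO Cars VALUES (1, 'Toyota', 'Yaris');
INSERT INTO Housing VALUES (2, 'Apartment');
""")

# Hypothetical category codes mapped to child tables. Table names cannot be
# bound as query parameters, so they come only from this fixed mapping.
CHILD_TABLES = {1: "Cars", 2: "Housing"}


def fetch_ad(ad_id):
    """Return the common columns plus the category-specific ones, or None."""
    ad = conn.execute("SELECT * FROM Ads WHERE AdID = ?", (ad_id,)).fetchone()
    if ad is None:
        return None
    result = dict(ad)
    child = CHILD_TABLES.get(ad["AdCategory"])
    if child is not None:
        extra = conn.execute(
            f"SELECT * FROM {child} WHERE AdID = ?", (ad_id,)).fetchone()
        if extra is not None:
            result.update(dict(extra))
    return result


car_ad = fetch_ad(1)
housing_ad = fetch_ad(2)
print(car_ad, housing_ad)
```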

Database Smell - Improve current design with multiple tables

I am in the process of creating a second version of my technical wiki site and one of the things I want to improve is the database design. The problem (or so I think) is that to display each document, I need to join upwards of 15 tables. I have a bunch of lookup tables that contain descriptive data associated with each wiki entry such as programmer used, cpu, tags, peripherals, PCB layout software, difficulty level, etc.
Here is an example of the layout:
doc
--------------
id | author_id | doc_type_id .....
1 | 8 | 1
2 | 11 | 3
3 | 13 | 3
_
lookup_programmer
--------------
doc_id | programmer_id
1 | 1
1 | 3
2 | 2
_
programmer
--------------
programmer_id | programmer
1 | USBtinyISP
2 | PICkit
3 | .....
Since some doc IDs may have multiple entries for a single attribute (such as programmer), I have designed the DB to allow for this. The other 10 attributes have a similar layout to the 2 programmer tables above. To display a single document article, approx 20 tables are joined.
I use the Sphinx search engine for finding articles with certain characteristics. Essentially Sphinx indexes all of the data (it does not store it) and returns the wiki doc IDs of interest based on the filters presented. If I want to find articles that use a certain programmer and then sort by date, MySQL has to first join ALL documents with the 2 programmer tables, then filter, and finally sort the remaining rows by insert time. No index can help with ordering the filtered results (it takes a LONG time with 150k doc IDs) since the sort is done in a temporary table. As you can imagine, it gets worse really quickly as more parameters need to be filtered.
It is the fact that I have to rely on Sphinx to return, say, all wiki entries that use a certain CPU AND programmer that leads me to believe there is a DB smell in my current setup.
edit: Looks like I have implemented an entity-attribute-value model.
I don't see anything here that suggests you've implemented EAV. Instead, it looks like you've assigned every row in every table an ID number. That's a guaranteed way to increase the number of joins, and it has nothing to do with normalization. (There is no "I've now added an id number" normal form.)
Pick one lookup table. (I'll use "programmer" in my example.) Don't build it like this.
create table programmer (
programmer_id integer not null,
programmer varchar(20) not null,
primary key (programmer_id),
unique key (programmer)
);
Instead, build it like this.
create table programmer (
programmer varchar(20) not null,
primary key (programmer)
);
And in the tables that reference it, consider cascading updates and deletes.
create table lookup_programmer (
doc_id integer not null,
programmer varchar(20) not null,
primary key (doc_id, programmer),
foreign key (doc_id) references doc (id)
on delete cascade,
foreign key (programmer) references programmer (programmer)
on update cascade on delete cascade
);
What have you gained? You keep all the data integrity that foreign key references give you, your rows are more readable, and you've eliminated a join. Build all your "lookup" tables that way, and you eliminate one join per lookup table. (And unless you have many millions of rows, you're not likely to see any degradation in performance.)
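To see the saved join in practice, here is a minimal sketch with Python's sqlite3 module (used only so the example is runnable; the tables mirror the MySQL definitions above). The inserted_at column, the docs_using helper and the sample rows are assumptions. Filtering by programmer name touches only doc and lookup_programmer, and a rename in programmer cascades to the referencing rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE doc (
    id INTEGER PRIMARY KEY,
    inserted_at TEXT NOT NULL  -- hypothetical column holding the insert time
);
CREATE TABLE programmer (
    programmer TEXT NOT NULL PRIMARY KEY
);
CREATE TABLE lookup_programmer (
    doc_id INTEGER NOT NULL REFERENCES doc (id) ON DELETE CASCADE,
    programmer TEXT NOT NULL
        REFERENCES programmer (programmer) ON UPDATE CASCADE ON DELETE CASCADE,
    PRIMARY KEY (doc_id, programmer)
);
INSERT INTO doc VALUES (1, '2015-01-01'), (2, '2015-03-01'), (3, '2015-02-01');
INSERT INTO programmer VALUES ('USBtinyISP'), ('PICkit');
INSERT INTO lookup_programmer VALUES
    (1, 'USBtinyISP'), (1, 'PICkit'), (2, 'PICkit'), (3, 'USBtinyISP');
""")


def docs_using(programmer):
    """Doc IDs that use the given programmer, newest first. One join only."""
    rows = conn.execute("""
        SELECT d.id
        FROM doc d
        JOIN lookup_programmer lp ON lp.doc_id = d.id
        WHERE lp.programmer = ?
        ORDER BY d.inserted_at DESC
    """, (programmer,)).fetchall()
    return [row[0] for row in rows]


before = docs_using("PICkit")

# Renaming the programmer propagates through ON UPDATE CASCADE.
with conn:
    conn.execute(
        "UPDATE programmer SET programmer = 'PICkit 3' WHERE programmer = 'PICkit'")
after = docs_using("PICkit 3")
print(before, after)
```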

Database many-to-many intermediate tables: extra fields

I have created a 'shops' and a 'customers' table and an intermediate table customers_shops. Every shop has a site_url web address, except that some customers use an alternative url to access the shop's site (this url is unique to a particular customer).
In the intermediate table below, I have added an additional field, shop_site_url. My understanding is that this is in 2nd normal form, as the shop_site_url field is unique to a particular customer and shop (therefore won't be duplicated for different customers/shops). Also, since it depends on customer and shop, I think this is in 3rd normal form. I'm just not used to using the 'mapping' table (customers_shops) to contain additional fields - does the design below make sense, or should I reserve the intermediate tables purely as a way to convert many-to-many relationships into one-to-many relationships?
######
customers
######
id INT(11) NOT NULL PRIMARY KEY
name VARCHAR(80) NOT NULL
######
shops
######
id INT(11) NOT NULL PRIMARY KEY
site_url TEXT
######
customers_shops
######
id INT(11) NOT NULL PRIMARY KEY
customer_id INT(11) NOT NULL
shop_id INT(11) NOT NULL
shop_site_url TEXT //added for a specific url for customer
Thanks
What you are calling an "intermediate" table is not a special type of table. There is only one kind of table and the same design principles ought to be applicable to all.
Well, let's create the table, insert some sample data, and look at the results.
id cust_id shop_id shop_site_url
--
1 1000 2000 NULL
2 1000 2000 http://here-an-url.com
3 1000 2000 http://there-an-url.com
4 1000 2000 http://everywhere-an-url-url.com
5 1001 2000 NULL
6 1001 2000 http://here-an-url.com
7 1001 2000 http://there-an-url.com
8 1001 2000 http://everywhere-an-url-url.com
Hmm. That doesn't look good. Let's ignore the alternative URL for a minute. To create a table that resolves a m:n relationship, you need a constraint on the columns that make up the m:n relationship.
create table customers_shops (
customer_id integer not null references customers (id),
shop_id integer not null references shops (id),
primary key (customer_id, shop_id)
);
(I dropped the "id" column, because it tends to obscure what's going on. You can add it later, if you like.)
Insert some sample data . . . then
select customer_id as cust_id, shop_id
from customers_shops;
cust_id shop_id
--
1000 2000
1001 2000
1000 2001
1001 2001
That's closer. You should have only one row for each combination of customer and shop in this kind of table. (This is useful data even without the url.) Now what do we do about the alternative URLs? That depends on a couple of things.
Do customers access the sites through only one URL, or might they use more than one?
If the answer is "only one", then you can add a column to this table for the URL, and make that column unique. It's a candidate key for this table.
If the answer is "more than one--at the very least the site url and the alternative url", then you need to make more decisions about constraints, because altering this table to allow multiple urls for each combination of customer and shop cuts across the grain of this requirement:
the shop_site_url field is unique to a particular customer and shop (therefore won't be duplicated for different customers/shops)
Essentially, I'm asking you to decide what this table means--to define the table's predicate. For example, these two different predicates lead to different table structures.
customer 'n' has visited the web site for shop 'm' using url 's'
customer 'n' is allowed to visit the web site for shop 'm' using alternate url 's'
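The effect of the composite primary key can be checked with a minimal sketch in Python's sqlite3 module (used here only for runnability; the customer names and shop URLs are made up). A second row for the same customer and shop is rejected, which is exactly what the table with the surrogate "id" key failed to do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE shops (id INTEGER PRIMARY KEY, site_url TEXT);
CREATE TABLE customers_shops (
    customer_id INTEGER NOT NULL REFERENCES customers (id),
    shop_id INTEGER NOT NULL REFERENCES shops (id),
    PRIMARY KEY (customer_id, shop_id)
);
INSERT INTO customers VALUES (1000, 'Ann'), (1001, 'Bob');
INSERT INTO shops VALUES (2000, 'http://shop-a.example'), (2001, 'http://shop-b.example');
INSERT INTO customers_shops VALUES (1000, 2000), (1001, 2000), (1000, 2001), (1001, 2001);
""")

# Trying to insert an existing (customer, shop) pair violates the primary key.
try:
    conn.execute("INSERT INTO customers_shops VALUES (1000, 2000)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

pair_count = conn.execute("SELECT COUNT(*) FROM customers_shops").fetchone()[0]
print(duplicate_rejected, pair_count)
```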
Your schema does indeed make sense, as shop_site_url is an attribute of the relationship itself. You might want to give it a more meaningful name in order to distinguish it from shops.site_url.
Where else would you put this information? It's not an attribute of a shop, and it's not an attribute of a customer. You could put this in a separate table, if you wanted to avoid having a NULLable column, but you'd end up having to have a reference to your intermediate table from this new table, which probably would look even weirder to you.
Relationships can have attributes, just like entities can have attributes.
Entity attributes go into columns in entity tables. Relationship attributes, at least for many-to-many relationships, go in relationship tables.
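As a sketch of a relationship attribute in use, assuming each customer has at most one alternative URL per shop: the column name alt_site_url, the url_for helper and the sample URLs are hypothetical, and Python's sqlite3 module is used only so the example runs as-is. A NULL in the relationship table falls back to the shop's own site_url.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE shops (id INTEGER PRIMARY KEY, site_url TEXT);
CREATE TABLE customers_shops (
    customer_id INTEGER NOT NULL REFERENCES customers (id),
    shop_id INTEGER NOT NULL REFERENCES shops (id),
    alt_site_url TEXT UNIQUE,  -- hypothetical name; NULL means "use shops.site_url"
    PRIMARY KEY (customer_id, shop_id)
);
INSERT INTO customers VALUES (1000, 'Ann'), (1001, 'Bob');
INSERT INTO shops VALUES (2000, 'http://shop.example');
INSERT INTO customers_shops VALUES
    (1000, 2000, NULL),
    (1001, 2000, 'http://alt.shop.example');
""")


def url_for(customer_id, shop_id):
    """URL this customer should use for this shop, or None if they are unrelated."""
    row = conn.execute("""
        SELECT COALESCE(cs.alt_site_url, s.site_url)
        FROM customers_shops cs
        JOIN shops s ON s.id = cs.shop_id
        WHERE cs.customer_id = ? AND cs.shop_id = ?
    """, (customer_id, shop_id)).fetchone()
    return row[0] if row else None


default_url = url_for(1000, 2000)
custom_url = url_for(1001, 2000)
print(default_url, custom_url)
```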
It sounds as though, in general, URL is determined by the combination of shop and customer. So I would put it in the shop-customer table. The fact that many shops have only one URL suggests that there is a fifth normal form that is more subtle than this. But I'm too lazy to work it out.