I have 2 tables and was wondering what the best relationship between them was. I know there is a relationship between them but I get so confused with one to many, many to one, many to many, unidirectional, bidirectional, multidirectional etc.
This is the basic structure, as displayed:
Traveler Table:
+------------------------------------------+
| Name | Family Name | National ID No. |
+------------------------------------------+
| Dianne | Herbert | 579643 |
| Francine | Jackson | 183432 |
| Oprah | Dingle | 269537 |
+------------------------------------------+
Journeys Table:
+------------------------------------------------------------------------------------------------------+
| Start Station | End Station | Start Time | End Time | Travelers |
+------------------------------------------------------------------------------------------------------+
| Hull | Leeds | 13:50 | 14:50 | Francine Jackson, Oprah Dingle |
| Newcastle | Manchester | 16:30 | 19:00 | Dianne Herbert, Francine Jackson |
| Hull | Manchester | 10:00 | 13:00 | Dianne Herbert, Francine Jackson, Oprah Dingle |
+------------------------------------------------------------------------------------------------------+
The Travelers table is fine; it makes sense:
CREATE TABLE Travelers (
Name VARCHAR(50) NOT NULL,
Family_Name VARCHAR(50) NOT NULL,
National_ID_Number INT(6) NOT NULL PRIMARY KEY
);
But I am unsure how to define the Journeys table, especially the Travelers column:
CREATE TABLE Journeys (
Start_Station VARCHAR(50) NOT NULL,
End_Station VARCHAR(50) NOT NULL,
Start_Time VARCHAR(50) NOT NULL,
End_Time VARCHAR(50) NOT NULL,
Travelers ???????
)
Obviously I have "Travelers" as a column in my second table, so there is a relationship with the first table. But what kind of relationship is it? I think I need to make a Foreign Key somehow?
You are looking for a junction/association table. The tables should look like this:
create table Journeys (
Journey_Id int auto_increment primary key,
Start_Station VARCHAR(50) NOT NULL,
End_Station VARCHAR(50) NOT NULL,
Start_Time VARCHAR(50) NOT NULL,
End_Time VARCHAR(50) NOT NULL
);
create table TravelerJourneys (
traveler_journey_id int auto_increment primary key,
traveler_id int(6) NOT NULL,
journey_id int NOT NULL,
unique (traveler_id, journey_id), -- a traveler can be linked to a journey only once
foreign key (traveler_id) references Travelers (National_ID_Number),
foreign key (journey_id) references Journeys (Journey_Id)
);
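The junction table can be exercised end to end with a small sketch. It assumes SQLite through Python's standard `sqlite3` module rather than the MySQL the answer targets, so `INTEGER PRIMARY KEY` stands in for `auto_increment`, and the Journey ids 1 to 3 are assigned by hand. It loads the question's sample rows and rebuilds each comma-separated Travelers list with two joins; the `accepted` helper is hypothetical, added only to show which inserts the constraints reject.

```python
import sqlite3

# In-memory SQLite database; foreign keys are only enforced with this pragma on.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Travelers (
    Name               TEXT    NOT NULL,
    Family_Name        TEXT    NOT NULL,
    National_ID_Number INTEGER NOT NULL PRIMARY KEY
);
CREATE TABLE Journeys (
    Journey_Id    INTEGER PRIMARY KEY,  -- auto-assigned when omitted, like AUTO_INCREMENT
    Start_Station TEXT NOT NULL,
    End_Station   TEXT NOT NULL,
    Start_Time    TEXT NOT NULL,
    End_Time      TEXT NOT NULL
);
CREATE TABLE TravelerJourneys (
    traveler_journey_id INTEGER PRIMARY KEY,
    traveler_id INTEGER NOT NULL REFERENCES Travelers (National_ID_Number),
    journey_id  INTEGER NOT NULL REFERENCES Journeys (Journey_Id),
    UNIQUE (traveler_id, journey_id)    -- one row per traveler per journey
);
""")

conn.executemany("INSERT INTO Travelers VALUES (?, ?, ?)", [
    ("Dianne", "Herbert", 579643),
    ("Francine", "Jackson", 183432),
    ("Oprah", "Dingle", 269537),
])
conn.executemany("INSERT INTO Journeys VALUES (?, ?, ?, ?, ?)", [
    (1, "Hull", "Leeds", "13:50", "14:50"),
    (2, "Newcastle", "Manchester", "16:30", "19:00"),
    (3, "Hull", "Manchester", "10:00", "13:00"),
])
# The comma-separated Travelers column becomes one junction row per person.
conn.executemany(
    "INSERT INTO TravelerJourneys (traveler_id, journey_id) VALUES (?, ?)",
    [(183432, 1), (269537, 1), (579643, 2), (183432, 2),
     (579643, 3), (183432, 3), (269537, 3)],
)


def accepted(sql, params=()):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Rebuild the "Travelers" list for each journey with two joins.
travelers_by_journey = {}
for journey_id, full_name in conn.execute("""
    SELECT j.Journey_Id, t.Name || ' ' || t.Family_Name
    FROM Journeys j
    JOIN TravelerJourneys tj ON tj.journey_id = j.Journey_Id
    JOIN Travelers t ON t.National_ID_Number = tj.traveler_id
    ORDER BY j.Journey_Id, t.Family_Name
"""):
    travelers_by_journey.setdefault(journey_id, []).append(full_name)
```

The surrogate `traveler_journey_id` is kept only to mirror the answer; it is the `UNIQUE (traveler_id, journey_id)` constraint that stops the same person being attached to the same journey twice.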
I Relational • Pre-Requisite Explanation
There is an awful lot of misinformation and disinformation in the "literature" produced by the "theoreticians" and all the authors who follow them. Of course, that is very confusing, and it leads to primitive, pre-relational Record Filing Systems with none of the Integrity, Power, and Speed of Relational Systems. Further, while newbies try hard to answer questions here, due to the above they are also badly confused.
I can't provide a tutorial; this is a not-so-short explanation of the issues that you need to understand before diving into the Question.
1 Relationship
I get so confused with one to many, many to one, many to many, unidirectional, bidirectional, multidirectional etc.
unidirectional, bidirectional, multidirectional
Please delete those from your mind; they are not Relational terms (the "theoreticians" and newbies love to invent new things, which have no value other than to add confusion).
There is no direction in a relationship. It always consists of:
a Primary Key: thing that is referenced, the parent end, and
a Foreign Key: the child end, thing that is referencing the parent PK
At the SQL code level, DML, you could perceive a "direction", parent-to-child, or child-to-parent. It is a matter of perception (not storage) and relevant only to the requirement of the code, the "way to get from this data to that data".
At the physical level, SQL DDL, there is only one type of relationship, Parent::Child, and that is all we have ever needed. There is no Cardinality yet, because that is controlled by other means. As with the natural world, the parent is the thing that is referenced, and the child is the thing that references the parent.
At the bare-bones level, which is not a Relational database but a 1960s Record Filing System, the relationship is Referenced::Referencing, and God only knows what each thing is.
The child can have only one parent, and the parent can have many children, therefore the one-and-only relationship at the physical level is:
one [parent] to 0-to-many [children]
A Relational database is made up of things (rows, the main symbol, with either square or round corners) and relationships between things (the lines, either Identifying or Non-Identifying). A thing is a Fact, each row is a Fact, and the relationships are relationships between Facts.
In the Relational Model, each thing must be uniquely Identified, each logical row (not record!) must be unique. That is the Primary Key, which must be made up from the data (INT; GUID; UUID; etc are not data, they are additions, in the system, the user does not see them).
Of course, IDENTITY or AUTOINCREMENT columns are fine for prototypes and trials, but they are not permitted in Production.
There are many differences between Relational databases and the pre-relational, 1960s Record Filing Systems that the "theoreticians" use. Such primitive systems use physical pointers, such as Record ID (INT; GUID; UUID; etc). If I had to declare just one, the fundamental difference is:
whereas the RFS is physical, the Relational Model is Logical
therefore, whereas in the RFS physical records are referenced by their physical pointer, in the RDb logical rows (not records!) are referenced by their logical Key
The relationship is established as follows:
ALTER TABLE child_table
ADD CONSTRAINT constraint_name
FOREIGN KEY ( foreign_key_column_list )
REFERENCES parent_table ( primary_key_column_list )
Beware: some "theoreticians", and some newbies, do not understand SQL. If I tell you that Sally is Fred's daughter, from that single Fact you will know that Fred is Sally's father. There is no need for a second statement; it is obviously the first statement in reverse. Likewise, SQL is not stupid: there is only one relationship definition. But those darlings add a second "relationship", the above in reverse. That
(a) is totally redundant, and
(b) interferes with administration of the tables. Probably, those types are the ones who use weird and wonderful directional terms.
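The Sally and Fred point can be checked directly: one foreign key, declared once on the child, answers the question in both directions. This is a minimal sketch, assuming SQLite via Python's `sqlite3` module; the `Parent` and `Child` tables and their columns are hypothetical names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: the relationship is declared exactly once, on the child.
conn.executescript("""
CREATE TABLE Parent (
    ParentName TEXT NOT NULL PRIMARY KEY
);
CREATE TABLE Child (
    ChildName  TEXT NOT NULL PRIMARY KEY,
    ParentName TEXT NOT NULL REFERENCES Parent (ParentName)
);
INSERT INTO Parent VALUES ('Fred');
INSERT INTO Child  VALUES ('Sally', 'Fred');
""")

# Child-to-parent: who is Sally's father?
father_of_sally = conn.execute(
    "SELECT ParentName FROM Child WHERE ChildName = ?", ("Sally",)
).fetchone()[0]

# Parent-to-child: who are Fred's children? Same column, same single FK.
children_of_fred = [row[0] for row in conn.execute(
    "SELECT ChildName FROM Child WHERE ParentName = ? ORDER BY ChildName", ("Fred",)
)]

# Foreign keys declared on each table: one on Child, none on Parent.
fk_count_child = len(conn.execute("PRAGMA foreign_key_list(Child)").fetchall())
fk_count_parent = len(conn.execute("PRAGMA foreign_key_list(Parent)").fetchall())
```

Both queries read the same `Child.ParentName` column; the "direction" exists only in the query, not in the schema.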
2 Cardinality
That is controlled firstly by implementing an Index, and secondly by other means, which are not relevant here.
one [parent]
Each row is unique, by virtue of the Primary Key, expressed as:
ALTER TABLE table
ADD CONSTRAINT constraint_name
PRIMARY KEY ( column_list )
one [parent] to many [children]
Because each parent row is unique, we know that the reference [to the parent] in the child will reference just one row.
ALTER TABLE child_table
ADD CONSTRAINT constraint_name
FOREIGN KEY ( foreign_key_column_list ) -- local child
REFERENCES parent_table ( primary_key_column_list ) -- referenced parent
Example
All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993. Refer to IDEF1X Introduction.
ALTER TABLE Customer
ADD CONSTRAINT Customer_pk
PRIMARY KEY ( CustomerCode )
ALTER TABLE OrderSale
ADD CONSTRAINT OrderSale_pk
PRIMARY KEY ( CustomerCode, OrderSaleNo )
ALTER TABLE OrderSale
ADD CONSTRAINT Customer_Issues_Orders_fk
FOREIGN KEY ( CustomerCode ) -- local child
REFERENCES Customer ( CustomerCode ) -- referenced parent
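The Customer and OrderSale constraints can be run as a sketch to see "one parent to 0-to-many children" enforced. It assumes SQLite via Python's `sqlite3`; because SQLite has no `ALTER TABLE ... ADD CONSTRAINT`, the same named constraints are declared inline. The customer codes `ACME` and `BOLT` are made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# SQLite has no ALTER TABLE ... ADD CONSTRAINT, so the same named
# constraints are declared inline in CREATE TABLE instead.
conn.executescript("""
CREATE TABLE Customer (
    CustomerCode TEXT NOT NULL,
    CONSTRAINT Customer_pk PRIMARY KEY (CustomerCode)
);
CREATE TABLE OrderSale (
    CustomerCode TEXT    NOT NULL,
    OrderSaleNo  INTEGER NOT NULL,
    CONSTRAINT OrderSale_pk PRIMARY KEY (CustomerCode, OrderSaleNo),
    CONSTRAINT Customer_Issues_Orders_fk
        FOREIGN KEY (CustomerCode) REFERENCES Customer (CustomerCode)
);
""")
conn.executemany("INSERT INTO Customer VALUES (?)", [("ACME",), ("BOLT",)])
conn.executemany("INSERT INTO OrderSale VALUES (?, ?)", [("ACME", 1), ("ACME", 2)])


def accepted(sql, params=()):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Each Customer issues 0-to-n OrderSales: BOLT has none, ACME has two.
orders_per_customer = dict(conn.execute("""
    SELECT c.CustomerCode, COUNT(o.OrderSaleNo)
    FROM Customer c
    LEFT JOIN OrderSale o ON o.CustomerCode = c.CustomerCode
    GROUP BY c.CustomerCode
""").fetchall())
```

The PK makes each parent "one"; the FK lets a child reference only an existing parent; nothing forces a parent to have any children, hence the "0" in 0-to-n.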
many to one
There is no such thing. It is simply reading a one-to-many relationship in reverse, and doing so without understanding. In the example, reading the data model explicitly, or translating it to text:
Each Customer issues 0-to-n OrderSales
the reverse is (refer again to the one-to-many):
Each OrderSale is issued by 1 Customer
Again, beware, newbies may implement a duplicate relationship, that will (a) confuse you, and (b) stuff things up royally.
many to many
We have been using diagrammatic modelling tools since the early 1980s. Even IDEF1X was available for modelling long before it was elevated to a NIST Standard. Modelling is an iterative process: whereas redrawing is very cheap, re-implementing SQL is expensive. We start at the Logical level, with no concern for the physical (tables, platform specifics) and with only entities, then progress to logical Keys, Normalising as we go. Finally, still at the logical level, we finalise each table and check that the datatypes are correctly set.
If and when the logical model is (a) stable, and (b) signed off, then we progress to the Physical: creating the datatypes; tables; foreign keys; etc. It is a simple matter of translating the data model to SQL DDL. If you use a modelling tool, that is one click, and the tool does it for you.
The point is, there is progression, and a distinction between the Logical and Physical levels.
At the physical level, as can be understood from the fact that there is one and only one type of relationship in SQL, there is no such thing as a many-to-many relationship. Notice that it can't be expressed even in text form in a single statement; we need two statements.
Such a relationship exists only at the logical modelling level: when we determine that there is such a relationship between two Facts (rows in a table), we draw it.
At the point when the data model is stable and we move from the Logical to the Physical, the n-to-n relationship is translated into an Associative Table and a relationship to each parent.
Refer to this unrelated document for an Example.
Notice the many-to-many relationship Favours in the Logical Requirement.
Notice the translation to an Associative table and two relationships in Implementation (Right side only).
Each User favours 0-to-n ProductPreferences
Each Product is favoured in 0-to-n ProductPreferences
Now notice this carefully: that Implementation model can be read Logically:
Each User favours 0-to-n Products (via ProductPreference)
Each Product is favoured by 0-to-n Users (via ProductPreference)
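The Implementation reading above can be sketched as tables, assuming SQLite via Python's `sqlite3`. The keys `UserCode` and `ProductCode` are hypothetical, since the referenced document's model is not reproduced here; the point is that the Associative table's PK is the pair of FKs, and that the same table answers both logical readings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical keys (UserCode, ProductCode); the original model is not reproduced here.
conn.executescript("""
CREATE TABLE User (
    UserCode TEXT NOT NULL PRIMARY KEY
);
CREATE TABLE Product (
    ProductCode TEXT NOT NULL PRIMARY KEY
);
-- Associative table: the logical n-to-n "favours" becomes two 1-to-n relationships.
CREATE TABLE ProductPreference (
    UserCode    TEXT NOT NULL REFERENCES User (UserCode),
    ProductCode TEXT NOT NULL REFERENCES Product (ProductCode),
    PRIMARY KEY (UserCode, ProductCode)
);
""")
conn.executemany("INSERT INTO User VALUES (?)", [("ann",), ("bob",)])
conn.executemany("INSERT INTO Product VALUES (?)", [("P1",), ("P2",), ("P3",)])
conn.executemany("INSERT INTO ProductPreference VALUES (?, ?)",
                 [("ann", "P1"), ("ann", "P2"), ("bob", "P2")])


def products_favoured_by(user_code):
    """Each User favours 0-to-n Products (via ProductPreference)."""
    return [r[0] for r in conn.execute(
        "SELECT ProductCode FROM ProductPreference WHERE UserCode = ? ORDER BY ProductCode",
        (user_code,))]


def users_favouring(product_code):
    """Each Product is favoured by 0-to-n Users (via ProductPreference)."""
    return [r[0] for r in conn.execute(
        "SELECT UserCode FROM ProductPreference WHERE ProductCode = ? ORDER BY UserCode",
        (product_code,))]
```

Because the Associative table carries both Keys, neither reading needs a join back to `User` or `Product` unless other columns are wanted.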
Additionally, you might find this document helpful (section 1 Implementation: Relationship only).
II Your Question
Now we can deal with your question.
1 The Obstacle
Your quandary is due to:
not progressing through the formal stages, due to lack of education in the subject matter (hopefully mitigated by the above explanations)
having an idea at the Logical level ... but not formally
of the views required in the app, as opposed to perceiving the data independently of the app
diving into the Physical tables ... with nothing in-between
not asking specific questions, due either to shyness or inability to identify the particular point that you do not understand
and thus you are stuck, as per your original post.
2 The Quandary
Your quandary is:
you have this at the logical level (Data model, Entity-Relationship level):
and of course, your CREATE TABLE commands at the physical level.
I hope my explanations above are enough to understand the great gap in what you have:
the logical vs the physical
that the physical is far too premature
that we need at least some data modelling (not formal, not possible in this medium) to work things out.
The Logical data model is simply not progressed enough, let alone resolved, in order to create stable tables, let alone correct ones.
3 Journey Progressed
Let's take your Journey thingamajig first. What is a Journey?
It is definitely not an Independent thing. We do not go walking on the heath and heather after the dew, nor on the quietened beach at sunset, and suddenly, out of nowhere ... find a Journey, sitting there, all by itself. No. It can't stand up.
A Journey is Dependent (at least) on a starting and finishing point.
And what are those points exactly? Railway stations.
Railway stations are Independent; they do stand alone.
And then a Journey is Dependent on a Railway station, in two separate relationships: start and end.
Predicate
I have given some of the Predicates, those relevant right now, so that they are explicit, so that you can check them carefully.
All the Predicates can be read directly from the model.
In the normal case, you have to read them from the diagram (it is rich with specific detail) and check that it is correct.
That provides a valuable feedback loop:
modelling --> Predicate --> check --> more modelling.
4 Traveller Progressed
Now for your Traveller thingee. What is a Traveller?
A Traveller is a person who has travelled on at least 1 journey
Therefore Journey is Dependent on Person
Person is Independent, it can stand alone
5 Journey Resolved
Now we can finalise Journey.
5 Requirement
Now we have a decent chance of answering your Question.
I have chosen Relational Keys that throw themselves at us, no thinking necessary.
What makes a Journey unique is ( NationalID, StationStart, DateTimeStart ), not ( NationalID, StationStart ), which would allow each Person only one Journey from each Station. Anything more would be superfluous.
Person needs an additional Key, called an Alternate Key, on ( NameFamily, Name ). This prevents duplicates on those values.
RoleName
In the first instance, the column name for a PK is used unchanged wherever it is an FK.
Except:
to make it even more meaningful, e.g. TravellerID, or
to differentiate, when there is more than one FK to the same parent, e.g. StationStart, StationEnd.
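Since the diagrams are not reproduced here, the following is only a sketch of the Person, Station and Journey tables as described in the text, assuming SQLite via Python's `sqlite3`. Several details are assumptions rather than the answer's model: `Station` is keyed by its name, `DateTimeEnd` is an extra column, timestamps are ISO-8601 text, and the dates are made up. It shows the Alternate Key on Person and the two RoleName FKs to Station.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Assumptions: Station is keyed by its name, DateTimeEnd is an extra column,
# and timestamps are stored as ISO-8601 text.
conn.executescript("""
CREATE TABLE Person (
    NationalID INTEGER NOT NULL PRIMARY KEY,
    NameFamily TEXT    NOT NULL,
    Name       TEXT    NOT NULL,
    CONSTRAINT Person_ak UNIQUE (NameFamily, Name)   -- Alternate Key
);
CREATE TABLE Station (
    Station TEXT NOT NULL PRIMARY KEY
);
CREATE TABLE Journey (
    NationalID    INTEGER NOT NULL REFERENCES Person (NationalID),
    StationStart  TEXT    NOT NULL REFERENCES Station (Station),  -- RoleName
    DateTimeStart TEXT    NOT NULL,
    StationEnd    TEXT    NOT NULL REFERENCES Station (Station),  -- RoleName
    DateTimeEnd   TEXT    NOT NULL,
    PRIMARY KEY (NationalID, StationStart, DateTimeStart)
);
""")
conn.executemany("INSERT INTO Person VALUES (?, ?, ?)", [
    (579643, "Herbert", "Dianne"),
    (183432, "Jackson", "Francine"),
    (269537, "Dingle", "Oprah"),
])
conn.executemany("INSERT INTO Station VALUES (?)",
                 [("Hull",), ("Leeds",), ("Newcastle",), ("Manchester",)])
conn.execute("INSERT INTO Journey VALUES (?, ?, ?, ?, ?)",
             (183432, "Hull", "2024-05-01 13:50", "Leeds", "2024-05-01 14:50"))

JOURNEY_INSERT = "INSERT INTO Journey VALUES (?, ?, ?, ?, ?)"


def accepted(sql, params=()):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With these Keys, a second Francine Jackson is rejected by the Alternate Key, the same Person cannot start twice from the same Station at the same time, and a Journey cannot name a Station that does not exist.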
6 Traveller ???
So what exactly is a Traveller (the concept in your mind; it is not in the Requirement)?
One possibility is:
a Person who travels on a Journey is a Traveller.
That is already available above, in the single Person sense.
But there is more. I get the idea that it is a group of people who took a journey together. But that too, is available from the above:
SELECT StationStart, DateTimeStart, StationEnd, COUNT(*) AS Passengers
FROM Journey
WHERE (condition...)
GROUP BY StationStart, DateTimeStart, StationEnd
But that will give you the whole train, not a group of people who have an intended common purpose.
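To see why grouping returns the whole train, here is a small sketch, assuming SQLite via Python's `sqlite3`, a simplified `Journey` table without its FKs, made-up dates, and a hypothetical passenger `111111` who has no connection to the others. The `WHERE` condition is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified Journey table (no FKs) with made-up data; 111111 is a
# hypothetical stranger who happens to be on the same train.
conn.execute("""
CREATE TABLE Journey (
    NationalID    INTEGER NOT NULL,
    StationStart  TEXT    NOT NULL,
    DateTimeStart TEXT    NOT NULL,
    StationEnd    TEXT    NOT NULL,
    PRIMARY KEY (NationalID, StationStart, DateTimeStart)
)
""")
conn.executemany("INSERT INTO Journey VALUES (?, ?, ?, ?)", [
    (183432, "Hull", "2024-05-01 13:50", "Leeds"),
    (269537, "Hull", "2024-05-01 13:50", "Leeds"),
    (111111, "Hull", "2024-05-01 13:50", "Leeds"),
    (579643, "Newcastle", "2024-05-01 16:30", "Manchester"),
    (183432, "Newcastle", "2024-05-01 16:30", "Manchester"),
])

# One row per departure: every passenger on that train is counted,
# whether or not they intended to travel together.
trains = conn.execute("""
    SELECT StationStart, DateTimeStart, StationEnd, COUNT(*) AS Passengers
    FROM Journey
    GROUP BY StationStart, DateTimeStart, StationEnd
    ORDER BY DateTimeStart
""").fetchall()
```

The Hull departure counts three passengers, including the stranger, which is why intent (a Group or an Excursion) has to be recorded separately.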
What I can figure out is that you mean a group of people who have some common purpose, such as taking a trip together. That marks an intent, before the fact of the Journey. It could be a loose Group, or an Excursion, etc. Something smaller than a train-load.
I will give you two options. It is for you to contemplate them, and to specify (if it is long, edit your Question; if it is short, post a Comment).
7 Group Option
This is a simple structure, for groups that travel together. This assumes that (because it is group travel) tickets for the Journey are purchased in a block, for all the Members of the Group, and we don't track individual Person purchases.
8 Excursion Option
An excursion is one outing for the group, with different members each outing. This assumes that the Journey for each Person is tracked (booked personally, at different times).
The Fact that each Member has reserved their Journey (or not) is simply a matter of joining Excursion::Member::Journey.
That is eminently possible due to the Relational Keys (impossible in an RFS). Refer to this Example. Please ask if you need code.
The Identifier for a Group (above) and an Excursion (below) is quite different:
I have set up Group to be a somewhat permanent affair, with a home, and an assumption that they go on several outings together. The groups you have given (in your Journeys.Travellers) would be three different groups, due to the membership.
Excursion is a single event, the group is the list of Passengers.
MemberID and PassengerID are RoleNames for NationalID, that is, the Role the Person plays in the subject table.
It also allows Journeys that a Person takes alone (without an Excursion) to be tracked.
Please feel free to ask specific questions. Or else update your original post.
First, understand what each relationship is. I will explain a few basics that are widely used.
One to One
A One-to-One relationship means that you have two tables that have a relationship, but that relationship only exists in such a way that any given row from Table A can have at most one matching row in Table B.
Example: each Student has a unique roll number, which means one student can have only one roll number, and each roll number belongs to only one student.
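A one-to-one relationship can be enforced by putting a UNIQUE constraint on the foreign key column, so that each parent row can be referenced at most once. This is a minimal sketch, assuming SQLite via Python's `sqlite3`; the `Student` and `StudentRoll` tables and the sample names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables. The UNIQUE on StudentID is what makes this 1-to-1:
# without it, the same FK would allow 1-to-many.
conn.executescript("""
CREATE TABLE Student (
    StudentID INTEGER NOT NULL PRIMARY KEY,
    Name      TEXT    NOT NULL
);
CREATE TABLE StudentRoll (
    RollNumber TEXT    NOT NULL PRIMARY KEY,
    StudentID  INTEGER NOT NULL UNIQUE REFERENCES Student (StudentID)
);
""")
conn.executemany("INSERT INTO Student VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])
conn.execute("INSERT INTO StudentRoll VALUES (?, ?)", ("R-001", 1))

ROLL_INSERT = "INSERT INTO StudentRoll VALUES (?, ?)"


def accepted(sql, params=()):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

A second roll number for student 1 is rejected by the UNIQUE constraint, and reusing `R-001` for another student is rejected by the primary key.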
Many to Many
A good design for a Many-to-Many relationship makes use of something called a join table. The term join table is just a fancy way of describing a third SQL table that holds only the primary keys of the two related tables (as foreign keys).
Example: many Students can take many Subjects.
One to Many
A one-to-many relationship is a type of cardinality that refers to the relationship between two entities A and B in which an element of A may be linked to many elements of B, but a member of B is linked to only one element of A.
For instance, think of A as books, and B as pages. A book can have many pages, but a page can only be in one book.
In your case, make the Travelers column a foreign key that references the primary key of the Traveler table.
Reason: one Traveller can have many journeys, so here the relationship is One to Many.
As you have an n-to-n relationship, you need to create an intermediate table.
In this case you will have to add a unique id to the Journeys table to identify each row easily.
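CREATE TABLE TRAVELERS_IN_JOURNEY ( -- run after the Journeys table below is created
National_ID_Number INT(6) NOT NULL,
Journey_id INT NOT NULL,
PRIMARY KEY (National_ID_Number, Journey_id),
FOREIGN KEY (National_ID_Number) REFERENCES Travelers (National_ID_Number),
FOREIGN KEY (Journey_id) REFERENCES Journeys (Journey_id)
);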
As a column cannot contain multiple keys, you can also remove the Travelers column from your Journeys table.
CREATE TABLE Journeys (
Journey_id INT AUTO_INCREMENT PRIMARY KEY,
Start_Station VARCHAR(50) NOT NULL,
End_Station VARCHAR(50) NOT NULL,
Start_Time VARCHAR(50) NOT NULL,
End_Time VARCHAR(50) NOT NULL
);
Let's say there are two entities - Product and Image with a many-to-many relationship between them. The order of images associated with each product does matter.
Product
------------------------------------
ProductID (primary key)
ProductName
...
Image
------------------------------------
ImageID (primary key)
Url
Size
...
What are the cons and pros of the following three many-to-many "bridge" table approaches for solving this problem?
ProductImage
------------------------------------
ProductImageID (primary key, identity)
ProductID (foreign key)
FullImageID (foreign key)
ThumbImageID (foreign key)
OrderNumber
or
ProductImage
------------------------------------
ProductID (primary key, foreign key)
IndexNumber (primary key)
FullImageID (foreign key)
ThumbImageID (foreign key)
or
ProductImage
------------------------------------
ProductID (primary key, foreign key)
FullImageID (primary key, foreign key)
ThumbImageID (foreign key)
OrderNumber (index)
There is no purpose (that I have ever found) in adding a surrogate key (i.e. the IDENTITY field) to a many-to-many "bridge" table (or whatever you want to call it). However, neither of your proposed schemas is correct.
In order to get the ideal setup, you first need to determine the scope / context of the following requirement:
The order of images associated with each product does matter.
Should the ordering of the images be the same, in relation to each other, regardless of what Products they are associated with? Meaning, images A, B, C, and D are always in alphabetical order, regardless of what combination of them any particular Product has.
Or, can the ordering change based on the Product that the Image is associated with?
If the ordering of the Images needs to remain consistent across Products, then the OrderNumber field needs to go into the Image table. Else, if the ordering can change per Product, then the OrderNumber field goes into this bridge / relationship table.
In either case:
the PK is the combination of FKs:
A Primary Key uniquely, and hopefully reliably (meaning that it doesn't change), identifies each row. And if at all possible, it should be meaningful. Using the combination of the two FK fields gives exactly that while enforcing that uniqueness (so that one Product cannot be given the same Image multiple times, and vice-versa). Even if these two fields weren't chosen as the PK, they would still need to be grouped into a UNIQUE INDEX or UNIQUE CONSTRAINT to enforce that data integrity (effectively making it an "alternate key"). But since these IDs won't be changing (only inserted and deleted), they are well suited to be the PK. And if you are using SQL Server (and maybe others) and decide to use this PK as the Clustered index, then you will have the benefit of having both ProductID and ImageID in any Non-Clustered Indexes. So when you need to sort by [OrderNumber], the Non-Clustered Index on that field will automatically be a covering index because the only two data fields you need from it are already there.
On the other hand, placing the [OrderNumber] field into the PK has a few downsides:
It can change, which is not ideal for PKs.
It removes the ability to enforce that a ProductID and ImageID can only relate to each other one time. Hence it would need that additional UNIQUE INDEX or UNIQUE CONSTRAINT in order to maintain the data integrity. Else, even if you include all 3 fields in the PK, it still allows the ProductID + ImageID combination to be there multiple times for various values of [OrderNumber].
there is no need for an IDENTITY field:
With the above information in mind, all of the requirements of a PK have already been met. Adding a surrogate key / auto-increment field adds no value, but does take up additional space.
And to address the typical reply to the above statement regarding the surrogate key not adding any value, some will say that it makes JOINs easier if this combination of ProductID+ImageID needs to be Foreign Keyed to a child table. Maybe each combination can have attributes that are not singular like [OrderNumber] is. An example might be "tags" (although those would most likely be associated with just ImageID, it works as a basic example). Some people prefer to only place a single ID field in the child table because it is "easier". Well, it's not easier. By placing both ImageID and ProductID fields in the child table and doing the FK on both back to this PK, you now have meaningful values in the child table and will not need to JOIN to this [ProductImage] table all of the time just to get that information (which will probably be needed in most queries that are not simply listing or updating those attributes for a particular ProductID+ImageID combination). And if it is not clear, adding a surrogate key still requires a UNIQUE INDEX or UNIQUE CONSTRAINT to enforce the data integrity of unique ProductID+ImageID combinations (as stated above in the first bullet point).
And placing both ID fields into the child table is another reason to stay away from fields that can change when choosing a PK: if you have FKs defined, you need to set the FK to ON UPDATE CASCADE so that the new value for the PK propagates to all child tables, else the UPDATE will fail.
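The cascade point can be sketched with a composite FK, assuming SQLite via Python's `sqlite3` (which supports `ON UPDATE CASCADE` once foreign keys are enabled). The `Product` and `Image` parent tables are omitted for brevity, and `ProductImageTag` / `ProductImageNote` are hypothetical child tables: the first cascades, the second uses the default (no action).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Simplified: Product and Image parent tables are omitted. Both child tables
# carry the full (ProductID, ImageID) key; only the first one cascades.
conn.executescript("""
CREATE TABLE ProductImage (
    ProductID INTEGER NOT NULL,
    ImageID   INTEGER NOT NULL,
    PRIMARY KEY (ProductID, ImageID)
);
CREATE TABLE ProductImageTag (
    ProductID INTEGER NOT NULL,
    ImageID   INTEGER NOT NULL,
    Tag       TEXT    NOT NULL,
    PRIMARY KEY (ProductID, ImageID, Tag),
    FOREIGN KEY (ProductID, ImageID)
        REFERENCES ProductImage (ProductID, ImageID) ON UPDATE CASCADE
);
CREATE TABLE ProductImageNote (
    ProductID INTEGER NOT NULL,
    ImageID   INTEGER NOT NULL,
    Note      TEXT    NOT NULL,
    FOREIGN KEY (ProductID, ImageID)
        REFERENCES ProductImage (ProductID, ImageID)  -- no ON UPDATE action
);
INSERT INTO ProductImage     VALUES (1, 10), (2, 30);
INSERT INTO ProductImageTag  VALUES (1, 10, 'red');
INSERT INTO ProductImageNote VALUES (2, 30, 'hero shot');
""")


def accepted(sql, params=()):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Changing a key that has a cascading child: the child row follows.
cascaded = accepted("UPDATE ProductImage SET ImageID = 20 WHERE ProductID = 1 AND ImageID = 10")
tag_rows = conn.execute("SELECT ProductID, ImageID, Tag FROM ProductImageTag").fetchall()

# Changing a key whose child has no ON UPDATE CASCADE: the UPDATE fails.
blocked = not accepted("UPDATE ProductImage SET ImageID = 40 WHERE ProductID = 2 AND ImageID = 30")
```

The failed UPDATE leaves `(2, 30)` in place, which is the behaviour the paragraph above warns about when a PK value can change.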
ProductImage
------------------------------------
ProductID (primary key, foreign key to Product table)
FullImageID (primary key, foreign key to Image table)
ThumbImageID (foreign key; shouldn't this field be in the Image table?)
OrderNumber TINYINT (only here if ordering is per Product, else is in Image table)
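A runnable sketch of the per-Product ordering variant above, assuming SQLite via Python's `sqlite3`. `ThumbImageID` is left out on the assumption that it belongs in the Image table, as the comment in the schema suggests; the index name, URLs and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Per-Product ordering variant; ThumbImageID is left out on the assumption
# that it belongs in the Image table. Names and data are illustrative.
conn.executescript("""
CREATE TABLE Product (
    ProductID   INTEGER NOT NULL PRIMARY KEY,
    ProductName TEXT    NOT NULL
);
CREATE TABLE Image (
    ImageID INTEGER NOT NULL PRIMARY KEY,
    Url     TEXT    NOT NULL
);
CREATE TABLE ProductImage (
    ProductID   INTEGER NOT NULL REFERENCES Product (ProductID),
    FullImageID INTEGER NOT NULL REFERENCES Image (ImageID),
    OrderNumber INTEGER NOT NULL,
    PRIMARY KEY (ProductID, FullImageID)      -- the PK is the combination of FKs
);
CREATE INDEX ProductImage_OrderNumber_ix ON ProductImage (ProductID, OrderNumber);
INSERT INTO Product VALUES (1, 'Widget');
INSERT INTO Image VALUES (10, '/img/a.png'), (11, '/img/b.png'), (12, '/img/c.png');
INSERT INTO ProductImage VALUES (1, 12, 1), (1, 10, 2), (1, 11, 3);
""")


def accepted(sql, params=()):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


def ordered_images(product_id):
    """Image IDs for one Product, in that Product's display order."""
    return [r[0] for r in conn.execute(
        "SELECT FullImageID FROM ProductImage WHERE ProductID = ? ORDER BY OrderNumber",
        (product_id,))]
```

Re-adding image 10 to product 1 under a different `OrderNumber` is rejected, because `OrderNumber` is not part of the PK.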
The only reason I can see for adding a surrogate key in this situation is if there is a requirement from some other software. Things such as SQL Server Replication (or was it Service Broker?) and/or Entity Framework and/or Full-Text Search. Not sure if those examples do require it, but I have definitely seen 1 or 2 "features" that require a single-field PK.
The best way to achieve this is by having three tables: one for products, one for images, and one for their relationship:
products
--------
+ product_id (pk)
- product_name
- product_description
- ...
images
------
+ image_id (pk)
- image_title
- ...
product_images
--------------
+ product_id (fk)
+ image_id (fk)
Why do you have separate tables for fullImage and thumbImage?
Table1 is better since it allows you to identify individual rows inside the table.
In Table2, I'm sure you can't have two primary keys.
It might be better to have an Image table as follows.
ImageId (primary)
FullImage [actual value/FK]
ThumbNail [actual value/FK]
and then,
ProductImageID (primary)
ProductID [FK]
ImageID [FK]
Hope that helps,
Regards,
Rainy
Take the following database tables
|========|
|user |
|========|
|id |
|username|
|password|
|========|
|=========|
|blog |
|=========|
|id |
|date |
|content |
|author_id|
|=========|
blog.author_id is supposed to be connected to a particular user.id, namely the user who wrote the blog entry.
My question is with regard to 1:1 and 1:n, identifying and non-identifying relationships... I don't really understand them very much. Should this relationship be one of these types of relationships or not? And if so, which one? And what is the advantage of this?
In this example, there's a 1:1 relationship between a blog record and an author. The reason they exist as separate entities/tables is the grouping of information -- user related stuff doesn't belong with a blog record, and it could be duplicated if someone writes more than one blog.
The reason you want that implemented as a foreign key constraint is because the constraint ensures that the author for the blog record exists in the user table. Otherwise, it could be nonsense/bad data. The foreign key doesn't stop duplicates -- you'd need a primary or unique key for that -- the foreign key only validates data.
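The validation described above can be seen in a minimal sketch, assuming SQLite via Python's `sqlite3` and the question's `user` and `blog` columns; the sample rows are made up. The FK rejects a blog whose author does not exist, but it does not stop one user from writing several blogs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE user (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL,
    password TEXT NOT NULL
);
CREATE TABLE blog (
    id        INTEGER PRIMARY KEY,
    date      TEXT    NOT NULL,
    content   TEXT    NOT NULL,
    author_id INTEGER NOT NULL REFERENCES user (id)  -- must match an existing user.id
);
INSERT INTO user (id, username, password) VALUES (1, 'alice', 'x');
""")

BLOG_INSERT = "INSERT INTO blog (date, content, author_id) VALUES (?, ?, ?)"


def accepted(sql, params=()):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

Two posts by author 1 are both accepted; a post by author 42 is rejected because no such user exists.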
Now that Nanne clarified the identifying/non-identifying terminology for me, the blog.author_id would be the identifying relationship. Because it's identifying who (what user record) the author is.
The id column in both tables can be assumed to be the primary key, because an artificial/surrogate key is the most common primary key. That makes these columns the non-identifying relationship...
As a blog and a user are separate things, and not defined by each other, these are non-identifying relationships. One can exist without the other, even though the author_id might be mandatory.
Also see this link for more explanation about the two terms: What's the difference between identifying and non-identifying relationships?
Say I have the following table:
TABLE: product
============================================================
| product_id | name | invoice_price | msrp |
------------------------------------------------------------
| 1 | Widget 1 | 10.00 | 15.00 |
------------------------------------------------------------
| 2 | Widget 2 | 8.00 | 12.00 |
------------------------------------------------------------
In this model, product_id is the PK and is referenced by a number of other tables.
I have a requirement that each row be unique. In the example above, a row is defined to be the name, invoice_price, and msrp columns. (Different tables may have varying definitions of which columns define a "row".)
QUESTIONS:
In the example above, should I make name, invoice_price, and msrp a composite key to guarantee uniqueness of each row?
If the answer to #1 is "yes", this would mean that the current PK, product_id, would not be defined as a key; rather, it would be just an auto-incrementing column. Would that be enough for other tables to use to create relationships to specific rows in the product table?
Note that in some cases, the table may have 10 or more columns that need to be unique. That'll be a lot of columns defining a composite key! Is that a bad thing?
I'm trying to decide if I should try to enforce such uniqueness in the database tier or the application tier. I feel I should do this in the database level, but I am concerned that there may be unintended side effects of using a non-key as a FK or having so many columns define a composite key.
When you have a lot of columns that you need to create a unique key across, create your own "key" using the data from the columns as the source. This would mean creating the key in the application layer, but the database would "enforce" the uniqueness. A simple method would be to use the md5 hash of all the sets of data for the record as your unique key. Then you just have a single piece of data you need to use in relations.
md5 is not guaranteed to be unique, but it may be good enough for your needs.
First off, your intuition to do it in the DB layer is correct if you can do it easily. This means even if your application logic changes, your DB constraints are still valid, lowering the chance of bugs.
But, are you sure you want uniqueness on that? I could easily see the same widget having different prices, say for sale items or what not.
I would recommend against enforcing uniqueness unless there's a real reason to.
You might have something like this (obviously, don't use * in production code):
-- get the lowest price for an item that's currently active
SELECT *
FROM product p
WHERE p.name = 'widget 1' -- a non-primary index on product.name would be advised
AND p.active
ORDER BY p.sale_price ASC
LIMIT 1;
You can define composite primary keys and also unique indexes. As long as your requirement is met, defining composite unique keys is not a bad design. Clearly, the more columns you add, the slower it is to update and search the keys, but if the business requirement needs this, I don't think it is a negative, as database engines have very optimized routines for this.
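One way to combine both concerns from the question is sketched below, assuming SQLite via Python's `sqlite3`: `product_id` stays the PK that other tables reference, and the business columns get a composite UNIQUE constraint. The `order_line` child table is hypothetical, and prices are stored as REAL only for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# product_id stays the PK that other tables reference; the business columns
# get a composite UNIQUE constraint. order_line is a hypothetical child table.
conn.executescript("""
CREATE TABLE product (
    product_id    INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    invoice_price REAL NOT NULL,   -- REAL for brevity only
    msrp          REAL NOT NULL,
    CONSTRAINT product_uq UNIQUE (name, invoice_price, msrp)
);
CREATE TABLE order_line (
    order_line_id INTEGER PRIMARY KEY,
    product_id    INTEGER NOT NULL REFERENCES product (product_id)
);
INSERT INTO product VALUES (1, 'Widget 1', 10.00, 15.00), (2, 'Widget 2', 8.00, 12.00);
""")

PRODUCT_INSERT = "INSERT INTO product (name, invoice_price, msrp) VALUES (?, ?, ?)"
LINE_INSERT = "INSERT INTO order_line (product_id) VALUES (?)"


def accepted(sql, params=()):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

The database rejects an exact duplicate of `Widget 1`, while child tables keep referencing the single-column `product_id`.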
I have a table which contains two type of data, either for Company or Employee.
The data is identified by either 'C' or 'E', and another column stores its primary key.
So how can I add a foreign key that depends on the data contained, and maintain referential integrity dynamically?
id | referenceid | documenttype
-------------------------------
1 | 12 | E
2 | 7 | C
Now row with id 1 should reference Employee table with pk 12 & row with id 2 should reference Company table with pk 7.
Otherwise I have to make two different tables for both.
Is there any other way to accomplish it?
If you really want to do this, you can have two nullable columns one for CompanyId and one for EmployeeId that act as foreign keys.
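A sketch of the two-nullable-column approach, assuming SQLite via Python's `sqlite3`. The table and column names are illustrative, and the CHECK constraint is an addition to the answer that requires exactly one of the two references to be set, so the `documenttype` column is no longer needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Table and column names are illustrative. The CHECK (an addition to the
# answer) requires exactly one of the two nullable references to be set.
conn.executescript("""
CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY);
CREATE TABLE Company  (CompanyId  INTEGER PRIMARY KEY);
CREATE TABLE Document (
    id         INTEGER PRIMARY KEY,
    EmployeeId INTEGER REFERENCES Employee (EmployeeId),  -- nullable
    CompanyId  INTEGER REFERENCES Company (CompanyId),    -- nullable
    CHECK ((EmployeeId IS NOT NULL AND CompanyId IS NULL)
        OR (EmployeeId IS NULL AND CompanyId IS NOT NULL))
);
INSERT INTO Employee VALUES (12);
INSERT INTO Company  VALUES (7);
""")

DOC_INSERT = "INSERT INTO Document VALUES (?, ?, ?)"


def accepted(sql, params=()):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

Each column gets an ordinary foreign key, so the database checks the reference itself; a row with both or neither reference set fails the CHECK.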
But I would rather you review the database schema design.
It would be better to normalize the table, creating separate tables for Company and Employee. You would also get better performance after normalization. Since Company and Employee are separate entities, it's better not to overlap them.
Personally, I would go with the two-table option.
Employee and Company seem distinct enough that I would not want to store their data together.
That will also make the foreign key references straightforward.
However, if you do want to still store it in one table, one way of maintaining the referential integrity would be through a trigger.
Have an Insert / Update trigger that checks the appropriate value in Company Master / Employee master depending on the value of column containing 'C' / 'E'
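The trigger approach can be sketched in SQLite via Python's `sqlite3`; this is an illustration, not the answerer's code, and the `Employee` and `Company` tables are stand-ins for the "master" tables. Each trigger aborts the statement when `referenceid` has no match in the table selected by `documenttype`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Illustrative names. The triggers stand in for a foreign key whose target
# table depends on another column, which SQL cannot declare directly.
conn.executescript("""
CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY);
CREATE TABLE Company  (CompanyId  INTEGER PRIMARY KEY);
CREATE TABLE Document (
    id           INTEGER PRIMARY KEY,
    referenceid  INTEGER NOT NULL,
    documenttype TEXT    NOT NULL CHECK (documenttype IN ('C', 'E'))
);
CREATE TRIGGER Document_check_insert
BEFORE INSERT ON Document
FOR EACH ROW
WHEN (NEW.documenttype = 'E'
      AND NOT EXISTS (SELECT 1 FROM Employee WHERE EmployeeId = NEW.referenceid))
  OR (NEW.documenttype = 'C'
      AND NOT EXISTS (SELECT 1 FROM Company WHERE CompanyId = NEW.referenceid))
BEGIN
    SELECT RAISE(ABORT, 'referenceid not found for documenttype');
END;
CREATE TRIGGER Document_check_update
BEFORE UPDATE OF referenceid, documenttype ON Document
FOR EACH ROW
WHEN (NEW.documenttype = 'E'
      AND NOT EXISTS (SELECT 1 FROM Employee WHERE EmployeeId = NEW.referenceid))
  OR (NEW.documenttype = 'C'
      AND NOT EXISTS (SELECT 1 FROM Company WHERE CompanyId = NEW.referenceid))
BEGIN
    SELECT RAISE(ABORT, 'referenceid not found for documenttype');
END;
INSERT INTO Employee VALUES (12);
INSERT INTO Company  VALUES (7);
""")

DOC_INSERT = "INSERT INTO Document VALUES (?, ?, ?)"


def accepted(sql, params=()):
    """Return True if the statement succeeds, False if a constraint or trigger rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.DatabaseError:
        return False
```

These triggers guard only the Document side: deleting Employee 12 would still leave an orphaned Document row unless matching triggers are added on the parent tables, which is part of why this logic is harder to maintain than a plain foreign key.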
Personally, I would prefer to avoid such logic, as triggers are notoriously hard to debug.