Hello people I have this foreign key dilemma. Let's say we have Table A and Table B and Table C.
Table A is child of super table B and the records are connected through foreign key on id from A to B (one way). Now table C contains information that could be applied to A and B. I know that having this information on table B will come in handy but I am not sure about table A, technically the information could belong also to table A.
Now my question is, would it be better, to have table A access information in table C through its parent row in table B or make a "shortcut" from table A to table C and reference table C directly?
To simplify those two options:
Option 1: table A references table B + table B references table C
Option 2: table A references table B + table B references table C + table A references table C
Is there any benefit doing option 2 since same information is one table away in option 1?
A FOREIGN KEY is
an implicitly generated index (for performance) and
a constraint (for data integrity);
A FK is only indirectly "how you reference another table". So, I would prefer you simply talk about columns. (Any column could be used to 'reference' any other table.)
One of many textbook principles is DRY -- Don't Repeat Yourself. It is a wise principle because eventually something will go wrong and the repeated data will become inconsistent. The extra link from A to C is redundant.
On the other hand, in huge datasets, all sorts of textbook principles are violated to provide the required performance. (By "huge", I mean billions of rows, perhaps millions, but not thousands.)
Since you seem to be just starting out, I recommend not having the short cut and worrying about performance when you hit a problem. Yes, it will take an extra JOIN.
For novices, performance problems usually happens pretty soon, but not because of the lack of a shortcut; there are many other lessons to learn first. Hint: Learn about "composite indexes"; I think it is the number one performance technique that beginners fail to learn about. (And focusing on FKs distracts from focusing on INDEXes.)
Related
I have two tables: students and courses, assuming that each student can be in more than one course and that each course can have more than one student.
[Table Students] [Table Courses]
id(PK) id(PK)
name name
age duration
etc... etc...
and what I want to do it is to relate both tables into another table, for example, studying, in which I will store the course or courses that is doing each student. Like this:
[Table studying]
idStudent
idCourse
What I have deduced
I think that idStudent and idCourse should be foreign keys because the information it is stored in students and courses respectively with an unique primary key and to respect the consistency of the database. It cannot exist a relation without information neither of the student nor the course or just without the information of one of them.
I also know that some tables has two primary keys to allow that in the table could exist more than one repeated value of a primary key, but not of both primary keys at the same time.
My questions
These ids (idStudent, idCourse). Have to be primary keys or foreign keys?
Should the table studying has another column with an ID?
Is my deduction in the good way?
P.S: I do not need sql statements, I just need help to clarify my confusion.
Thanks in advance!
These ids (idStudent, idCourse). Have to be primary keys or foreign keys?
You want them to be foreign keys, because the existence of each record on your third table depends on the availability of the first, that is, there cannot be a "Student Course" or a "Course with Students" without either the course or the student. It could (if you don't make those keys) but you would break referential integrity
On the other hand, having FK's is usually a good thing because you make sure that you don't remove dependable records by mistake (which is what the constraint is for on the first place) unless you did something like cascade deleting
Should the table studying has another column with an ID?
No, it does not have to but again, sometimes it is a good practice because some software like Object Relational Mappers, Diagram Software, etc. may rely on the fact that they always needs a by-convention primary key. Some others don't even support composite keys so while it is not mandatory it can help in the future and it does not hurt. Of course this all depends on what you are using the database for and how (pure SQL, which engine you use, if you use it with a framework etc.)
Is my deduction in the good way?
All is relative. But I think your logic is good. My advice is that you always design your data schemas as flexible as you can because if a project grows its harder (and more costly) to do those changes down the road. Invest time on thinking how you may expand your application functionality and think if the schema will adapt to it.
Your deduction is correct.
In fact, you should have a composite primary key consisting of both (idStudent, idCourse) columns, because this tuple is the identifier of row in the table, you do not need additional ID column (of course, you can also take that approach to add additional ID column that would be your primary key, but you do not need it if one student can have one course assigned only once)
To respect the integrity, both columns (separately) should be foreign keys - idStudent should be referencing id column of Students table and idCourse should reference id column of Courses table.
If you like you can make them primary keys on studying table. But this is unnecesary, because relation (role of studying table) is many to many and this kind of table dont need primary keys. You need to know that also when you make them pk (pair of student id and course id) , thats mean that theee could be only one pair of each, thats equivalent to constrain unique - student can take a course only ones. In the future you maybe would like to add to this table start_date and this kind of pk could be a problem, you will need to modify them.
I have two tables currently with the same primary key, can I have these two tables with the same primary key?
Also are all the tables in 3rd normal form
Ticket:
-------------------
Ticket_id* PK
Flight_name* FK
Names*
Price
Tax
Number_bags
Travel class:
-------------------
Ticket id * PK
Customer_5star
Customer_normal
Customer_2star
Airmiles
Lounge_discount
ticket_economy
ticket_business
ticket_first
food allowance
drink allowance
the rest of the tables in the database are below
Passengers:
Names* PK
Credit_card_number
Credit_card_issue
Ticket_id *
Address
Flight:
Flight_name* PK
Flight_date
Source_airport_id* FK
Dest_airport_id* FK
Source
Destination
Plane_id*
Airport:
Source_airport_id* PK
Dest_airport_id* PK
Source_airport_country
Dest_airport_country
Pilot:
Pilot_name* PK
Plane id* FK
Pilot_grade
Month
Hours flown
Rate
Plane:
Plane_id* PK
Pilot_name* FK
This is not meant as an answer but it became too long for a comment...
Not to sound harsh, but your model has some serious flaws and you should probably take it back to the drawing board.
Consider what would happen if a Passenger buys a second Ticket for instance. The Passenger table should not hold any reference to tickets. Maybe a passenger can have more than one credit card though? Shouldn't Credit Cards be in their own table? The same applies to Addresses.
Why does the Airport table hold information that really is about destinations (or paths/trips)? You already record trip information in the Flights table. It seems to me that the Airport table should hold information pertaining to a particular airport (like name, location?, IATA code et cetera).
Can a Pilot just be associated with one single Plane? Doesn't sound very likely. The pilot table should not hold information about planes.
And the Planes table should not hold information on pilots as a plane surely can be connected to more than one pilot.
And so on... there are most likely other issues too, but these pointers should give you something to think about.
The only tables that sort of looks ok to me are Ticket and Flight.
Re same primary key:
Yes there can be multiple tables with the same primary key. Both in principle and in good practice. We declare a primary or other unique column set to say that those columns (and supersets of them) are unique in a table. When that is the case, declare such column sets. This happens all the time.
Eg: A typical reasonable case is "subtyping"/"subtables", where entities of a kind identified by a candidate key of one table are always or sometimes also of the kind identifed by the same values in another table. (If always then the one table's candidate key values are also in the other table's. And so we would declare a foreign key from the one to the other. We would say the one table's kind of entity is a subtype of the other's.) On the other hand sometimes one table is used with attributes of both kinds and attributes inapplicable to one kind are not used. (Ie via NULL or a tag indicating kind.)
Whether you should have cases of the same primary key depends on other criteria for good design as applied to your particular situation. You need to learn design including normalization.
Eg: All keys simple and 3NF implies 5NF, so if your two tables have the same set of values as only & simple primary key in every state and they are both in 3NF then their join contains exactly the same information as they do separately. Still, maybe you would keep them separate for clarity of design, for likelihood of change or for performance based on usage. You didn't give that information.
Re normal forms:
Normal forms apply to tables. The highest normal form of a table is a property independent of any other table. (Athough you might choose that form based on what forms & tables are alternatives.)
In order to normalize or determine a table's highest normal form one needs to know (in general) all the functional dependencies in it. (For normal forms above BCNF, also join dependencies.) You didn't give them. They are determined by what the meaning of the table is (ie how to determine what rows go in it in any given situation) and the possible situtations that can arise. You didn't give them. Your expectation that we could tell you about the normal forms your tables are in without giving such information suggests that you do not understand normalization and need to educate yourself about it.
Proper design also needs this information and in general all valid states that can arise from situations that arise. Ie constraints among given tables. You didn't give them.
Having two tables with the same key goes against the idea of removing redundancy in normalization.
Excluding that, are these tables in 1NF and 2NF?
Judging by the Names field, I'd suggest that table1 is not. If multiple names can belong to one ticket, then you need a new table, most likely with a composite key of ticket_id,name.
So imagine you have multiple tables in your database each with it's own structure and each with a PRIMARY KEY of it's own.
Now you want to have a Favorites table so that users can add items as favorites. Since there are multiple tables the first thing that comes in mind is to create one Favorites table per table:
Say you have a table called Posts with PRIMARY KEY (post_id) and you create a Post_Favorites with PRIMARY KEY (user_id, post_id)
This would probably be the simplest solution, but could it be possible to have one Favorites table joining across multiple tables?
I've though of the following as a possible solution:
Create a new table called Master with primary key (master_id). Add triggers on all tables in your database on insert, to generate a new master_id and write it along the row in your table. Also let's consider that we also write in the Master table, where the master_id has been used (on which table)
Now you can have one Favorites table with PRIMARY KEY (user_id, master_id)
You can select the Favorites table and join with each individual table on the master_id and get the the favorites per table. But would it be possible to get all the favorites with one query (maybe not a query, but a stored procedure?)
Do you think that this is a stupid approach? Since you will perform one query per table what are you gaining by having a single table?
What are your thoughts on the matter?
One way wold be to sub-type all possible tables to a generic super-type (Entity) and than link user preferences to that super-type. For example:
I think you're on the right track, but a table-based inheritance approach would be great here:
Create a table master_ids, with just one column: an int-identity primary key field called master_id.
On your other tables, (users as an example), change the user_id column from being an int-identity primary key to being just an int primary key. Next, make user_id a foreign key to master_ids.master_id.
This largely preserves data integrity. The only place you can trip up is if you have a master_id = 1, and with a user_id = 1 and a post_id = 1. For a given master_id, you should have only one entry across all tables. In this scenario you have no way of knowing whether master_id 1 refers to the user or to the post. A way to make sure this doesn't happen is to add a second column to the master_ids table, a type_id column. Type_id 1 can refer to users, type_id 2 can refer to posts, etc.. Then you are pretty much good.
Code "gymnastics" may be a bit necessary for inserts. If you're using a good ORM, it shouldn't be a problem. If not, stored procs for inserts are the way to go. But you're having your cake and eating it too.
I'm not sure I really understand the alternative you propose.
But in general, when given the choice of 1) "more tables" or 2) "a mega-table supported by a bunch of fancy code work" ..your interests are best served by more tables without the code gymnastics.
A Red Flag was "Add triggers on all tables in your database" each trigger fire is a performance hit of it's own.
The database designers have built in all kinds of technology to optimize tables/indexes, much of it behind the scenes without you knowing it. Just sit back and enjoy the ride.
Try these for inspiration Database Answers ..no affiliation to me.
An alternative to your approach might be to have the favorites table as user_id, object_id, object_type. When inserting in the favorites table just insert the type of the favorite. However i dont see a simple query being able to work with your approach or mine. One way to go about it might be to use UNION and get one combined resultset and then identify what type of record it is based on the type. Another thing you can do is, turn the UNION query into a MySQL VIEW and simply query that VIEW.
The benefit of using a single table for favorites is a simplicity, which some might consider as against the database normalization rules. But on the upside, you dont have to create so many favorites table and you can add anything to favorites easily by just coming up with a new object_type identifier.
It sounds like you have an is-a type relationship that needs to be modeled. All of the items that can be favourited are a type of "item". It sounds like you are on the right track, but I wouldn't use triggers. What could be the right answer if I have understood correctly, is to pull all the common fields into a single table called items (master is a poor name, master of what?), this should include all the common data that would be needed when you need a users favourite items, I'd expect this to include fields like item_id (primary key), item_type and human_readable_name and maybe some metadata about when the item was created, modified etc. Each of your specific item types would have its own table containing data specific to that item type with an item_id field that has a foreign key relationship to the item table. Then you'd wrap each item type in its own insertion, update and selection SPs (i.e. InsertItemCheese, UpdateItemMonkey, SelectItemCarKeys). The favourites table would then work as you describe, but you only need to select from the item table. If your app needs the specific data for each item type, it would have to be queried for each item (caching is your friend here).
If MySQL supports SPs with multiple result sets you could write one that outputs all the items as a result set, then a result set for each item type if you need all the specific item data in one go. For most cases I would not expect you to need all the data all the time.
Keep in mind that not EVERY use of a PK column needs a constraint. For example a logging table. Even though a logging table has a copy of the PK column from the table being logged, you can't build a constraint.
What would be the worst possible case. You insert a record for Oprah's TV show into the favorites table and then next year you delete the Oprah Show from the list of TV shows but don't delete that ID from the Favorites table? Will that break anything? Probably not. When you join favorites to TV shows that record will fall out of the result set.
There are a couple of ways to share values for PK's. Oracle has the advantage of sequences. If you don't have those you can add a "Step" to your Autonumber fields. There's always a risk though.
Say you think you'll never have more than 10 tables of "things which could be favored" Then start your PK's at 0 for the first table increment by 10, 1 for the second table increment by 10, 2 for the third... and so on. That will guarantee that all the values will be unique across those 10 tables. The risk is that a future requirement will add table 11. You can always 'pad' your guestimate
I'm designing a small database for a personal project, and one of the tables, call it table C, needs to have a foreign key to one of two tables, call them A and B, differing by entry. What's the best way to implement this?
Ideas so far:
Create the table with two nullable foreign key fields connecting to the two tables.
Possibly with a trigger to reject inserts and updates that would result 0 or 2 of them being null.
Two separate tables with identical data
This breaks the rule about duplicating data.
What's a more elegant way of solving this problem?
You're describing a design called Polymorphic Associations. This often gets people into trouble.
What I usually recommend:
A --> D <-- B
^
|
C
In this design, you create a common parent table D that both A and B reference. This is analogous to a common supertype in OO design. Now your child table C can reference the super-table and from there you can get to the respective sub-table.
Through constraints and compound keys you can make sure a given row in D can be referenced only by A or B but not both.
If you're sure that C will only ever be referring to one of two tables (and not one of N), then your first choice is a sensible approach (and is one I've used before). But if you think the number of foreign key columns is going to keep increasing, this suggests there's some similarity or overlap that could be incorporated, and you might want to reconsider.
I am attempting to build a database for inventory control using a large number of tables and enforced relationships, and I just ran into the 32-relationship (index) limit for an Access table (using Access 2007).
Just to clarify: the problem isn't that the Employees table has 32 explicit indexes. Rather, the problem is the limitation on the number of times the Employee table can be referenced in FOREIGN KEY constraints. For example:
CREATE TABLE Employees (employee_number INTEGER NOT NULL UNIQUE)
;
CREATE TABLE Table01 (employee_number INTEGER NOT NULL REFERENCES Employees (employee_number))
;
CREATE TABLE Table02 (employee_number INTEGER NOT NULL REFERENCES Employees (employee_number))
;
CREATE TABLE Table03 (employee_number INTEGER NOT NULL REFERENCES Employees (employee_number))
;
...
CREATE TABLE Table30 (employee_number INTEGER NOT NULL REFERENCES Employees (employee_number))
;
CREATE TABLE Table31 (employee_number INTEGER NOT NULL REFERENCES Employees (employee_number))
;
CREATE TABLE Table32 (employee_number INTEGER NOT NULL REFERENCES Employees (employee_number))
;
An exception is thrown on the last line above, "Could not create index; too many indexes defined.
What options do I have to work around this limitation?
I've heard that creating a duplicate table with a 1:1 relationship is one method. I'm new to database design, so please correct me if I'm wrong; but given a table Employees with 31 indexes, I would create a table Employees2(with one field?) with a 1:1 relationship to Employees and relationships to this new table from any remaining relations in which EmployeeID is a foreign key. What's the best way to ensure the second table is populated alongside the first?
Is there another approach?
Based on the lack of information available, it seems this may be a rare problem with a properly-designed database, or the solution is common knowledge. Forgive the noob!
Update: Immediate consensus is that my design is borked or far too ambitious. This could very well be the case. However, I'd rather have a general design discussion within a separate question, so for the sake of argument, can someone answer this one? If the answer is simply "Don't ever do that" I'll have to accept it.
I've run into this limitation a number of times with my apps. And I can assure the other posters that my apps are very well designed.
One problem is that Access creates indexes due to relationships and lookup fields that aren't viewable on the main index property box but they are accessible via DAO collections. These indexes are frequently duplicate indexes to indexes you have created as well.
I have a tool consisting of several forms you import into your BE MDB that allows you to remove the duplicate indexes. As I haven't yet made this available on my website please email me for it.
I'd suggest just not defining all the relationships/indexes to implementing a 1:1 relationship to get around it. Neither solution is optimal, but the later is going to create a much higher maintenance burden and data anomaly potential.
I am not going to decry the design as quicky as some of the others, but it does have me intrigued. Could you list the fields of the employee table that are foreign keys? There is a good liklihood that some normalization is in order and maybe some of the smart people on SO could make some design suggestions to work around the issue.
It is hard for me to believe that an Employee table would need 32 indexes; if it actually does you should consider migrating to at least SQL Express.
... I would create a table
Employees2(with one field?) with a 1:1
relationship to Employees and
relationships to this new table from
any remaining relations in which
EmployeeID is a foreign key.
That is workable. Presumably your main table might have an Autonumber field as the primary key, or you generate an index number. Your Employees2 table obviously must echo that.
What's the best way to ensure the
second table is populated alongside
the first?
That depends somewhat on how you are adding records. But in general, of course you must comply with the rules for integrity. This usually comes down to appending to tables in the correct order and ensuring each record is saved before trying to add a related record elsewhere.