I came across a rather stumping question out of the blue. In SQL, would a primary key be considered an attribute to a particular entity? I've done some research and although it is a specific key to uniquely identify an attribute, would it also be included as an attribute to a particular entity? Part of me says yes and part of me says no since it is just a key that specifies an attribute. Any advice would be very useful.
If the primary key is formed from attributes of the entity, then it is part of the entity. A surrogate key is not part of the entity itself but is associated with it.
Hope I got the intent of your question.
Consider the following scenario:
chapter(chapter_id*,book_id, chapter_no)
book(book_id*)
user(user_id*)
position(user_id*,book_id,chapter_id*)
* = (component of) PRIMARY KEY
When I want to know in which chapter of a book the user currently is, I simply query the Position table with certain User and Book. But the problem is that the foreign key to Book is garbage here because the Chapter specifies the Book already!
What should I do now?
It always depends on the queries you will run against the database. You have to decide whether duplicating a value is worth it to avoid a JOIN.
With what you have given, I would drop the chapter table and simply use
position(user_id*,book_id,chapter_no)
Would that be enough for your application?
I think your design is valid, and position.book_id isn't "garbage" at all. You wrote:
When I want to know in which chapter of a book the user currently is [..]
So it sounds like a user can have only one "position" per book, and your unique/primary key should be user_id,book_id instead of user_id,chapter_id.
In order to prevent a foreign key mismatch (book_id and chapter_id do not match), you can define a compound foreign key (book_id, chapter_id) referencing chapter(book_id, chapter_id).
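A minimal sketch of that compound foreign key, using Python's built-in sqlite3 module as a stand-in for whatever engine you use. The sample ids and the helper try_insert are made up for illustration, and the extra UNIQUE (book_id, chapter_id) on chapter is an assumption needed so the pair can be referenced:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE book (book_id INTEGER PRIMARY KEY);
CREATE TABLE user (user_id INTEGER PRIMARY KEY);
CREATE TABLE chapter (
    chapter_id INTEGER PRIMARY KEY,
    book_id    INTEGER NOT NULL REFERENCES book(book_id),
    chapter_no INTEGER NOT NULL,
    UNIQUE (book_id, chapter_id)      -- target of the compound foreign key
);
CREATE TABLE position (
    user_id    INTEGER NOT NULL REFERENCES user(user_id),
    book_id    INTEGER NOT NULL,
    chapter_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, book_id),   -- one position per user per book
    FOREIGN KEY (book_id, chapter_id) REFERENCES chapter(book_id, chapter_id)
);
INSERT INTO book VALUES (1), (2);
INSERT INTO user VALUES (10);
INSERT INTO chapter VALUES (100, 1, 1), (200, 2, 1);
""")

def try_insert(user_id, book_id, chapter_id):
    """Hypothetical helper: True if the position row was accepted."""
    try:
        with conn:
            conn.execute("INSERT INTO position VALUES (?, ?, ?)",
                         (user_id, book_id, chapter_id))
        return True
    except sqlite3.IntegrityError:
        return False

matching = try_insert(10, 1, 100)    # chapter 100 belongs to book 1
mismatched = try_insert(10, 2, 100)  # chapter 100 does not belong to book 2
```

With this layout the book_id in position is no longer redundant: it is part of the key and the engine checks that it agrees with the chapter.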
I know this is an odd question because I've always been taught to use a foreign key constraint. However, I've come across a case where a foreign key reference value must be kept for historic purpose when the reference is deleted.
It is a task management system whereby a task occurrence references a parent task containing the recurrence rule. This parent task can be deleted, but the occurrence itself must remain intact with the non-existing parent id. If the parent task cannot be found, the system simply returns an error, e.g. "parent task no longer exists." The reason why the parent id cannot be set to null on delete is that it is being used elsewhere in the occurrence as an identifying key.
Another example: What about a YouTube video that was removed from a playlist. Similar situation right? It is being referenced in the playlist, but the video doesn't exist, so it returns an error in the playlist instead.
Do I simply not define a foreign key at all and just create the parent_id reference column as a normal column? I just want to be sure how this is normally handled when one encounters a case where one table references another, but the former is not constrained by the existence of the latter.
Having a constraint is just a technical helper to enforce the semantics defined for the database, i.e. "this column contains a number that is not only an INTEGER(32) but also an identifier for a record in some other table". As such, a constraint is not strictly necessary, but it:
makes the intention of the field clear (self documentation)
keeps your data "clean" by preventing incorrect data from being inserted
gives the database engine a hint concerning the content of the table which may allow the db to perform more efficiently.
That said, the "proper" way to accomplish what you've described would be not to physically delete the parent record in the first place. Instead, mark the parent as deleted. Since you're keeping the record for historical purposes, surely you'll want to be able to know what the parent used to be, even if it's no longer active or valid.
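A rough sketch of the "mark as deleted" approach, again with sqlite3 as a stand-in. The table and column names (task, occurrence, is_deleted) and the sample rule are assumptions for illustration, not taken from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE task (
    task_id    INTEGER PRIMARY KEY,
    rule       TEXT NOT NULL,
    is_deleted INTEGER NOT NULL DEFAULT 0   -- hypothetical soft-delete flag
);
CREATE TABLE occurrence (
    occurrence_id INTEGER PRIMARY KEY,
    parent_id     INTEGER NOT NULL REFERENCES task(task_id)
);
INSERT INTO task (task_id, rule) VALUES (1, 'every Monday');
INSERT INTO occurrence VALUES (500, 1);
""")

# "Delete" the parent by flagging it; the row and the foreign key stay valid.
with conn:
    conn.execute("UPDATE task SET is_deleted = 1 WHERE task_id = 1")

row = conn.execute("""
    SELECT o.occurrence_id, t.rule, t.is_deleted
    FROM occurrence AS o
    JOIN task AS t ON t.task_id = o.parent_id
    WHERE o.occurrence_id = 500
""").fetchone()
```

The occurrence can still see what its parent used to be, and the application decides how to present a flagged parent.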
Second option would be to create a dummy "parent record deleted" reference. Whenever you delete a parent, you update remaining references to point to the dummy record instead. At least you wouldn't rely on errors to implement expected and valid behaviour.
Finally, I see no reason you shouldn't be able to set the foreign key to NULL. It sounds like you're using the foreign key as part of the primary key of the record in question ("is being used .. as an identifying key"). This you almost certainly should not be doing. If that's the root cause of the problem, start by changing that.
Do I simply not define a foreign key at all and just simply create the
parent_id reference column as a normal column?
Yes. At least this is the way I got to know and how we handle this stuff at work.
You might then want to set an index on the reference column.
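For what it's worth, a sketch of the plain-column variant with an index, using sqlite3. The names task, occurrence, idx_occurrence_parent and parent_rule are hypothetical; the error text mirrors the one from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE task (task_id INTEGER PRIMARY KEY, rule TEXT NOT NULL);
CREATE TABLE occurrence (
    occurrence_id INTEGER PRIMARY KEY,
    parent_id     INTEGER NOT NULL       -- plain column, no FOREIGN KEY
);
CREATE INDEX idx_occurrence_parent ON occurrence(parent_id);
INSERT INTO task VALUES (1, 'every Monday');
INSERT INTO occurrence VALUES (500, 1);
""")

def parent_rule(occurrence_id):
    """Return the parent's rule, or a message if the parent row is gone."""
    row = conn.execute("""
        SELECT t.rule
        FROM occurrence AS o
        LEFT JOIN task AS t ON t.task_id = o.parent_id
        WHERE o.occurrence_id = ?
    """, (occurrence_id,)).fetchone()
    if row is None:
        raise LookupError("occurrence not found")
    return row[0] if row[0] is not None else "parent task no longer exists"

before = parent_rule(500)
with conn:
    conn.execute("DELETE FROM task WHERE task_id = 1")  # nothing blocks this
after = parent_rule(500)
```

Nothing stops a wrong parent_id from being inserted here, so the application has to cope with dangling ids everywhere it reads them.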
So the question is self-explanatory, and I promise I checked for similar questions,
like this: question
or this: question
I have to say I have already read the MySQL manual, and I know that a PK is not null and unique.
I was looking at Moodle's users table, for example, and they have only PK and NN checked on the field. Is this a safe definition for a user id value? In other databases both PK and UQ are checked. Please let me know if you have found any reason to have both checked, or whether you think it is redundant.
If you have found anything additional, such as using these fields as foreign keys or anything else I missed, please let me know your experience. I would accept an answer that contains information beyond the MySQL manual rather than one that only repeats it.
Thanks in advance.
A primary key is also a unique key, but not vice versa. See this question: difference between key types
So it is redundant to add a unique constraint on a primary key.
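A quick way to see this for yourself. The sketch below uses Python's sqlite3 rather than MySQL, so treat it as an illustration of the general rule and not of MySQL specifics; the table name app_user and the sample rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only PK and NOT NULL are declared; there is no separate UNIQUE constraint.
conn.execute("CREATE TABLE app_user (id INTEGER NOT NULL PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO app_user VALUES (1, 'alice')")

try:
    conn.execute("INSERT INTO app_user VALUES (1, 'bob')")  # same id again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM app_user").fetchone()[0]
```

The second insert is refused by the primary key alone, which is why an extra unique index on the same column only adds storage and write overhead.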
So, I've read a whole lot of answers here on stackoverflow, but I'm still confused about the whole concept thereof. Specifically, I've gone over this article (including all the ones it references), but can't seem to find a solid grasp on the concept (or perhaps it is my confusion between cardinality (n:m, etc.) and identities):
Still Confused About Identifying vs. Non-Identifying Relationships
My issue is this: I know that identifying relationships imply that the primary key of a child entity must include its foreign key, and that the opposite is true for non-identifying relationships (Is this correct?). Now, this seems a bit too "forward thinking" to me? The same was also said in one of the comments in one of the links. How can I "take a step back" and actually see which relations are of which identity?
For example, I have two dilemmas:
job_title (parent, 1) to employee (child, 1..*). Am I right in thinking that, because job_title is a lookup table, it must be a non-identifying relation? Or would it be more accurate in saying that "an employee can't exist without a job_title, thus it must be identifying"? Or would it be the relationship defining that scenario?
employee to employee_equipment (bridging entity between the m:n cardinality) to equipment. Now, I read that this has to be an identifying relationship on both sides of employee_equipment. But, what if an employee doesn't NEED equipment? Can one have an optional identifying relationship?
I guess that I'm really looking for a way to identify which identity tables should belong to, without thinking of primary/foreign keys, or anything really technical for that matter.
Any help would be much appreciated!
You are over-thinking the linkage between optionality and identity. Until the whole thing comes more naturally to you, it's best to think of them as being completely unrelated.
About optionality, it is important to remember that the optionality is directional. To use your example of employee_equipment: Sure, employees don't need equipment. The one-to-many relationship from employee to employee_equipment is optional. At the same time, looking at it from the opposite perspective, the relationship is mandatory. You can't have a record in employee_equipment unless there is an employee to associate it with.
Identity has nothing to do with optionality, except coincidentally an identifying relationship is mandatory from the child to the parent. Whether it is also mandatory from the parent to the child is neither here nor there as far as identity is concerned.
What makes a relationship identifying is that you have to know what parent you are talking about (as well as some other things) in order to know what child you are talking about. That is, the primary key of the child must include a foreign key to the parent.
Pure intersection tables (e.g. employee_equipment) are good examples of this. The primary key of a pure intersection is the combination of the foreign keys to both parent tables. Note that some people may also add a surrogate key to these kinds of tables. It doesn't matter so much from an identity perspective if there are multiple candidate keys. What is important in determining identity is whether the foreign key is part of a candidate key, whether or not that candidate key happens to be the primary key.
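To make the intersection-table case concrete, here is a small sketch with sqlite3. The column names and sample rows are assumptions; the point is only that the child's primary key is built from the two foreign keys, while the relationship stays optional from the employee's side:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee  (employee_id  INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE equipment (equipment_id INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE employee_equipment (
    employee_id  INTEGER NOT NULL REFERENCES employee(employee_id),
    equipment_id INTEGER NOT NULL REFERENCES equipment(equipment_id),
    PRIMARY KEY (employee_id, equipment_id)  -- identifying on both sides
);
INSERT INTO employee  VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO equipment VALUES (7, 'laptop');
INSERT INTO employee_equipment VALUES (1, 7);
""")

# Optional from the parent's side: Bob exists without any equipment row.
without_equipment = [r[0] for r in conn.execute("""
    SELECT e.name
    FROM employee AS e
    LEFT JOIN employee_equipment AS ee ON ee.employee_id = e.employee_id
    WHERE ee.employee_id IS NULL
    ORDER BY e.name
""")]

# Mandatory from the child's side: no row without an existing employee.
try:
    with conn:
        conn.execute("INSERT INTO employee_equipment VALUES (99, 7)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```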
Another good example would be something like a database's metadata catalog, where a column is identified by the table to which it belongs, just as the table is identified by the schema it is in, and so on. Knowing that a column is called NAME doesn't tell you which column it is. Knowing that it is the NAME column in the CUSTOMER table helps. (You'll also have to know which schema CUSTOMER is in, and so forth).
Joel has provided a good answer (+1 to him), let me just offer a small mental shortcut that you can use when thinking about identifying relationships... just ask yourself:
Can I achieve uniqueness only with the attributes of the child entity?
If no, and you need to include the attributes migrated from the parent into the child key to make it unique, then you have an identifying relationship [1]. It's about identification-dependence, not existence-dependence [2]!
You might be interested in this post for some more musings on the topic.
[1] And the child entity is "weak" or "dependent".
[2] Although identification-dependence usually implies existence-dependence.
I'm having trouble understanding the difference between partial keys/weak entities and foreign keys. I feel like an idiot for not being able to understand this stuff.
As I understand it:
Weak Entity: An entity that is dependent on another entity.
Partial Key: Specifies a key that is only partially unique. Used for weak entities.
vs
Foreign Key: A key that is used to establish and enforce a relation between data in different tables.
These don't seem like they're the same thing, but I'm having trouble distinguishing their uses.
Take the [very] simple example:
We have employees specified by an empid. We also have children specified by name. A child is uniquely specified by name when the parent (employee) is known.
Would the child entity be a weak entity where the partial key is the name (partially unique)? Or should I be using a foreign key because I'm trying to establish and enforce a relation between employee and child? I feel like I can justify both, but I also feel like I'm missing something here. Any insight is appreciated, and I apologize for the stupid questions.
The problem is not you, it is that the ancient textbook or whatever you are using is pure excreta, the "definitions" are not clear, and there have been standard definitions for Relational Databases in use for over 30 years, which are much more clear. The "definitions" you have posted are in fact quite the opposite, non-intuitive, and it is no surprise that people would get confused.
A Foreign Key in a child row, is the value that references its parent Primary Key (in the parent table).
Using IDEF1X terminology: an Identifying Relation is one in which the FK (the parent PK in the child) is also used to form the child PK. It is unique in the parent, but not unique in the child, so you need to add some column to make it unique. Hence the stupid term "Partial Key". Either it is a Key (unique) or it is not a Key; the concept of a "partial Key" is too stupid to contemplate.
In a properly Normalised and standard-compliant database, there will be very few Independent entities. All the rest will be Dependent on some Independent entity. Such entities are not "weak", except in the sense that they cannot exist without the entity that they are Dependent upon.
The use of Identifying Relations (as opposed to Non-identifying) is actually strong; it gives the Dependent ("weak") entities their Identifier. So silly terms like "weak" and "strong" should not be used in a science that demands precision.
Use standard terms.
But to answer your explicit question:
assuming that Employee is "strong" and has a Primary Key (EmployeeId)
then the "weak" EmployeeChild table would need a FK (EmployeeId) to identify the Employee
which would be the perfect first component of the EmployeeChild table, the adorable "partial key"
to which you might add ChildNo, in order to make an ordinary Relational Primary Key
but it is not really "partial" because it is the full Primary Key of the Parent.
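As a rough sketch of those points, using sqlite3: EmployeeId and ChildNo come from the list above, while the Name columns, the sample rows and the helper rejected are assumptions added for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (
    EmployeeId INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
CREATE TABLE EmployeeChild (
    EmployeeId INTEGER NOT NULL REFERENCES Employee(EmployeeId),  -- parent PK as FK
    ChildNo    INTEGER NOT NULL,                                  -- added column
    Name       TEXT NOT NULL,
    PRIMARY KEY (EmployeeId, ChildNo)
);
INSERT INTO Employee VALUES (1, 'Ann'), (2, 'Bob');
-- ChildNo 1 appears under both employees; the full key is still unique.
INSERT INTO EmployeeChild VALUES (1, 1, 'Sam'), (2, 1, 'Kim');
""")

def rejected(sql):
    """Hypothetical helper: True if the statement violates a constraint."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_key_rejected = rejected("INSERT INTO EmployeeChild VALUES (1, 1, 'Lee')")
orphan_rejected = rejected("INSERT INTO EmployeeChild VALUES (3, 1, 'Lee')")
child_count = conn.execute("SELECT COUNT(*) FROM EmployeeChild").fetchone()[0]
```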
Readers who are unfamiliar with the Standard for Modelling Relational Databases may find IDEF1X Notation useful.
A weak entity type is one whose primary key includes some attribute(s) that reference another entity. In other words a foreign key is a subset of the primary key. Therefore the entity cannot exist without its parent.
A partial key means just part of a key - some proper subset of the key attributes.
In your example if the primary key of a Child was (Empid, ChildName) with Empid as a foreign key referencing the Employee then Child is a weak entity. If Empid was not part of the primary key then Child would be a strong entity.
It's worth bearing in mind that the weak/strong distinction is purely an ER modelling concept. In relational database terms it doesn't make much difference. In particular the relational model doesn't make any distinction between primary keys and other candidate keys so for all practical purposes it doesn't make any difference to single out primary key attributes as being a "special" case when they reference other tables.
Suppose there is a relationship between two entities, Employees and Dependents. Employees is a strong entity and Dependents is a weak entity. Dependents has the attributes Name, Age and Relation, and Employees has the attributes E_Id (primary key) and E_Name.
To implement the relationship, we add a foreign key E_Id to the Dependents table, which refers to E_Id in the Employees table.
But the foreign key alone cannot identify the tuples in the Dependents table uniquely; we also need the Name attribute (the partial key) to identify them uniquely.
Example: suppose the Name column of the Dependents table holds the values Rahul, Akshat, Rahul. Name alone is not unique, but combined with E_Id it identifies each tuple uniquely.
E_Id together with Name acts as the primary key of the Dependents table.
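The Rahul example can be checked with a couple of queries. This is only a sketch in sqlite3: the Age and Relation values, the employee names, and the choice of which employee each dependent belongs to are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees (E_Id INTEGER PRIMARY KEY, E_Name TEXT NOT NULL);
CREATE TABLE Dependents (
    E_Id     INTEGER NOT NULL REFERENCES Employees(E_Id),
    Name     TEXT NOT NULL,      -- the partial key
    Age      INTEGER,
    Relation TEXT,
    PRIMARY KEY (E_Id, Name)
);
INSERT INTO Employees VALUES (1, 'Meera'), (2, 'Vikram');
INSERT INTO Dependents VALUES
    (1, 'Rahul', 8, 'son'),
    (1, 'Akshat', 5, 'son'),
    (2, 'Rahul', 10, 'son');
""")

# Names that occur more than once when Name is considered on its own.
name_only_duplicates = conn.execute(
    "SELECT Name FROM Dependents GROUP BY Name HAVING COUNT(*) > 1"
).fetchall()

# (E_Id, Name) pairs that occur more than once; expected to be none.
pair_duplicates = conn.execute(
    "SELECT E_Id, Name FROM Dependents GROUP BY E_Id, Name HAVING COUNT(*) > 1"
).fetchall()
```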