Determing correct key values for multiple MySQL tables - mysql

I am trying to determine how the tables need to be linked in which ways.
The employees tables is directly linked to a number of tables which provide more information. A few of those tables have even more details.
Employees have a unique employeeid but I understand best practice is to still have a id?
Customers have a unique customerid
Employees have a manager
Managers are employees
Customers have a manager associated with them. manager associated with them
Employees may have a academic, certification and/or professional information.
With all of this said what would be the best recommendation for creating primary and foreign keys? Is there a better way to handle the design?
EDIT
Updated diagram to reflect feedback thus far. See comments to understand changes taking place.

Though your question is sensible, before you go any further in design, I would suggest for you to spend some time understanding relationships, foreign keys and how they propagate through relationships.
The diagram is utterly wrong. It will help it you start naming primary keys with full name, TableNameID, like EmployeeID; then it will become obvious how keys propagate through relationships. If you had full names you would have noticed that all your arrows are pointing in the wrong direction; parent and child are reversed. Anyway, takes some practice. So I would suggest that you rework the diagram and post the new version, so that we can comment on that one. It should start looking something like this (just a small segment)
EDIT
This is supposed to point you to towards the next step. See if you can read description (specification) and follow the diagram at the same time.
Each employee has one manager, a manager manages many employees.
A manager is an employee.
Each customer is managed by an employee who acts as an account manager for that customer.
Account manages for a customer may change over time.
Each employee is a member of one team, each team has many employees.
Employee performance for each employee is tracked over time.
Employee may have many credentials, each credential belongs to one employee only.
Credential is academic or professional.

Employees have unique employeeID but I understand best practice is to
still have a id?
No. (But keep reading.) You need an id number in two cases: 1) when you don't have any other way to identify an entity, and 2) when you're trying to improve join performance. And that doesn't always work. ID numbers always require more joins, but when the critical information is carried in human-readable codes or in natural keys, you don't need a join to get to it. Also, routine use of ID numbers often introduces data integrity problems, because unique constraints on natural keys are often omitted. For example, USPS postal codes for state names are unique, but tables that use ID numbers instead of USPS postal codes often omit the unique constraint on that two-character column. In your case, you need a unique constraint on employee number regardless. (But keep reading.)
Employees have a manager.
Does the table "team" implement this requirement? If the "manager" table identifies managers, then your other manager columns should reference "manager". (In this diagram, that's customers, team, and customer_orders.)
Managers are employees.
And that information is stored in the table "manager"?
Customers have a manager associated with them.
And so does each order. Did you mean to do that?
Employees may have a academic, certification and/or professional
information.
But apparently you can't record any of that information unless you store a skill first. That doesn't sound like a good approach. Give it some more thought.
Whenever you design tables with overlapping predicates (overlapping meanings), you need to stop and sit on your hands for a few minutes. In this case, the predicates of the tables "employees" and "customers" overlap.
If employees can be customers, which is the case for almost every business, then you have set the stage for update anomalies. An employee's last name changes. Clearly, you must update "employees". But do you have to update "customers" too? How do you know? You can't really tell, because those two tables have independent id numbers.
An informal rule of thumb: if any real-world entity has more than one primary key identifying it in your database, you have a problem. In this case, an employee who is also a customer would have two independent primary keys identifying that person--an employee id and a customer id.
Consider adding a table of persons, some of whom will be employees, and some of whom will be customers. If your database is well-designed and useful, I'll bet that later some of the persons will be prospects, some will be job applicants, and so on. You will need an id number for persons, because in the most general case all you can count on knowing is their name.
At some point, you'll have to take your database design knowledge to the next level. For an early start, read this article on people's names.

Related

Relating data from one table to another in mySQL

Here I have a EER diagram I am creating in the MySQL editor.
I want Person to relate to employee. I want some Persons entries to be employees/Customers/etc.
How would I implement this? Would it be a shared key? Join the tables? Related data?
I tried making EmployeeID a foreign key to PersonID. The tables still don't "relate". (There's no way to tell what employee X's name/dob/ect is.)
I encourage you to try again in the direction of making employeeID refer to personID, as a foreign key. You may not have implemented the whole idea.
The technique has a name: Shared Primary Key. If you search on this phrase, you will find articles describing what you have to do when adding a new employee. Without taking the correct steps when a new employee is added, you break the relationship between the two tables.
There is a tag with this name here in StackOverflow. shared-primary-key
What you are basically doing is setting up a situation where employee is a subclass of person. In an object-oriented system, this would be easy. SQL is not object oriented at the base level.

Is it proper to make a grand-parent key, a primary key, in its grand-child, in a multi-level identifying relationship?

Asked this here a couple of days ago, but haven't gotten many views, let alone a response, so I'm reposting to stackoverflow.
I'm modeling a DB for a conference ticketing system. In this system attendees are members of an attendee group, which belong to a conference. These relationships are identifying, and therefore FKs must be PKs in the respective children.
My current model:
Q: Is it proper to have attendeeGroupConferenceId FK, as a PK, in the attendee table, as MySQL Workbench has automatically set up for me?
On one side one would get a performance boost by keeping it in there for quick association at "check in". However, it does not strictly necessary since the combination of id, attendeeGroupId, and a corresponding lookup of conferenceId in the respective attendeeGroup table, is enough. (Therefore becomes redundant data.)
To me, it feels like it might violate some form of normalization, but I plan on keeping it in for the speed boost as described. I'm just curious about what proper design says about giving it PK status or not.
You definitely don't need the attendeeGroupConferenceId in your attendee table. It's redundant and notice that candidate key is the combination of (attendeeGroupId, personId), not the attendeeGroupConferenceId alone.
The table attendee also seems to violate the Second normal form (2NF) as it is.
My suggestion is to remove the attribute attendeeGroupConferenceId. In any case you can just join the tables in your queries to get extra info rather than keeping an extra attribute.

How to spot the relationship in RDBMS?

I was studying about relationships in RDBMS.I have understood the basic concept behind mapping relation ship,but I am not able to spot them.
The three possibilities :
one to many(Most common) requires a PK - FK relationsip.Two tables involved
many to many(less common) requires a junction table.Three tables Involved
one to one(very rare). One table involved.
When I begin a project,I am not able to separate the first two conditions and I am not clear in my head.
Examples when I study help for a brief moment,but not when I need to put these principles in to practice.
This is the place where most begineers falter.
How can I spot these relationships.Is there a simpler way?
Don't look at relationships from a technical perspective. Use analogies and real-life examples when trying to envision relationships in your head.
For example, let's say we have a library database.
A library must have books.
M:M
Each Book may have been written by multiple Authors and each Author may have written multiple Books. Thus it is a many-to-many relationship which will reflect into 3 tables in the database.
1:M
Each Book must also have a Publisher, but a Book may only have one Publisher and a Publisher can publish many Books. Thus it is a one-to-many relationship and it reflects with the PublisherId being referenced in the Books table.
A simple analogy like this one explains relationships to their core. When you try to look at them through a technical lens you're only making it harder on yourself. What's actually difficult is applying real world data scenarios when constructing your database.
I think the reason you are not getting the answers that you need is because of the way you are framing the question. Instead of asking “How do I spot the correct type of relationship between entities”, think about “How do my functional needs dictate what relationship to implement”. Database design doesn’t drive the function; it’s the functional needs that drive the relationships you need to implement.
When designing a database structure, you need to identify all the entities. Entities are all the facts that you want to store: lists of things like book titles, invoices, countries, dog species, etc. Then to identify your relationships, you have to consider the types of questions you will want to ask your database. It takes a bit of forward thinking sometimes… just because nobody is asking the question now doesn’t mean that it might not ever be asked. So you can’t ask the universe “what is the relationship between these lists of facts?” because there is no definitive answer. You define the universe… I only want to know answers to these types of questions; therefore I need to use this type of relationship.
Let’s examine an example relation between two common entities: a table of customers and a table of store locations. There is no “correct” way to relate these entities without first defining what you need to know about them. Let’s say you work for a retailer and you want to give a customer a default store designation so they can see products on the website that their local store has in stock. This only requires a one-to-many relationship between a store and the customer. Designing the relationship this way ensures that one store can have many customers as their default and each customer can only have one default store. To implement this relationship is as easy as adding a DefaultStore field to your Customer table as a foreign key that links to the primary key of the Store table.
The same two entities above might have alternate requirements for the relationship definition in a different context. Let’s say that I need to be able to give the customer the opportunity to select a list of favorite stores so that they can query about in stock information about all of them at once. This requires a many-to-many relationship because you want one customer to be able to relate to many stores and each store can also relate to many customers. To implement a many-to-many relationship requires a little more overhead because you will have to create a separate table to define the relationship links, but you get this additional functionality. You might call your relationship table something like CustomerStoreFavorites and would have as its primary key as the combined primary keys from each of the entities: (CustomerID, StoreID). You could also add attributes to the relationship, like possibly a LastOrderDate field to specify the last date that the customer ordered something from a particular store.
You could technically define both types of relationships for the same two entities. As an example: maybe you need to give the customer the option to select a default store, but you also need to be able to record the last date that a customer ordered something from a particular store. You could implement the DefaultStore field on the Customer table with the foreign key to the Store table and also create a relationship table to track all the stores that a customer has ordered from.
If you had some weird situation where every customer had their own store, then you wouldn’t even need to create two tables for your entities because you can fit all the attributes for both the customer and the store into one table.
In short, the way you determine which type of relationship to implement is to ask yourself what questions you will need to ask the database. The way you design it will restrict the relational data you can collect as well as the queries you can ask. If I design a one-to-many relationship from the store to the customer, I won’t be able to ask questions about all the stores that each customer has ordered from unless I can get to that information though other relationships. For example, I could create an entity called "purchases" which has a one-to-many relationship to the customer and store. If each purchase is defined to relate to one customer and one store, now I can query “what stores has this customer ordered from?” In fact with this structure I am able to capture and report on a much richer source of information about all of the customer's purchases at any store. So you also need to consider the context of all the other relationships in your database to decide which relationship to implement between two particular entities.
There is no magic formula, so it just takes practice, experience, and a little creativity. ER Diagrams are a great way to get your design out of your head and onto paper so that you can analyze your design and ensure that you can get the right types of questions answered. There are also a lot of books and resources to learn about database architecture. One good book I learned a lot from was “Database System Concepts” by Abraham Silberschatz and Henry Korth.
Say you have two tables A and B. Consider an entry from A and think of how many entries from B it could possibly be related with at most: only one, or more? Then consider an entry from B and think of how many entries in A it could be related with.
Some examples:
Table A: Mothers, Table B: Children. Each child has only one mother but a mother may have one or more children. Mothers and Children have a one-to-many relationship.
Table A: Doctors, Table B: Patients. Each patient may be visiting one or more doctors and each doctor treats one or more patients. So they have a many-to-many relationship.
An example of one to one:
LicencePlate to Vehicle. One licence plate belongs to one vehicle and one vehicle has one licence plate.

MySQL foreign key connecting to two tables with a constant?

We have a MySQL database with employees, companies and addresses tables.
The setup works like this: employees and companies each have an id. Since they are in two different tables, an employee can have id = 1 and the company can have id = 1. Both can have multiple addresses.
Now the address table has two columns that links it either to a company or to a employee:
element_id
element_type_id
element_type_id is either 1 = person or 2 = company
The whole thing is a bit more complex and there are many more tables, but that kinda explains the concept.
The problem is now that we would like to start using the Entity Framework and for that we need to define relationships with foreign keys.
But that sounds pretty much impossible with the setup that we currently have, does it?
Since the address table would need to be combined with the person and the company somewhow....
Any ideas?
This concept is known as polymorphic associations. I once asked a question about it, because I had a similar data structure that I wanted to make referentially sound. In terms of this question your Address corresponds with the Comment table and Employee and Company with Person etc.
The answer was excellent. If you can change the schema I would certainly go for it and use that approach.
If you can't change the schema you can always subtype your Address class in your EF model and use element_type_id as the discriminator column. You would create subtypes EmployeeAddress and CompanyAddress, where Employee refers to the former and Company to the latter. But as "the whole thing is a bit more complex" I'm not sure if this will be feasible in your situation.
You already meantioned yourself that this concept is not fitting into a relational database. It is not wrong, MySQL and all the other just don't support it. And it is not a normalized layout, so it might be regarded bad practice by database engineers.
You should think about separating the address from the relation by creating two additional tables: employee_adresses and company_addresses, each holding a relation to the company and the address or the employee and the address. This way you might get one address that is used for a company and many employees, which might be a good thing (normalized structure).

Always create unique keys whenever possible?

Should you always create unique keys whenever possible?
For example let's say I have a table with three fields, student ID, first name, last name and the student ID is the primary key.
If no two students have the first & last name, should I create a unique key for those two fields?
Yes, you should use unique indexes even when you already have a primary key when the column or combination of columns are unique. It's good to have constraints in your database to prevent bad data. However, this is not what you have in your case. Even if you currently have no students with duplicate names that can easily happen in the future. Names are not unique in the world.
U.S. Social Security numbers are almost always unique (they can be reused after a number of years, but it's unlikely to ever happen in your case), so they might make for a good candidate for a unique index. If you have non-U.S. students though then you would need to make the column nullable.
Yes, usually having unique IDs (surrogate keys) is best. In this case, last name and first name are not enough for a primary key. Even if you no duplicate names now, you can't be sure you won't have two John Smith's in the future.
Don't make the assumption that no two students will have the same name.
When the underlying model suggests it, it is a good idea to create unique keys. Constraints like these will ensure cohesive data and prevent errors. But in your case the underlying model does not suggest this to be the case.
Unique keys should follow business definitions; if the studentID is a "semi-natural" key (it has unique meaning that exists beyond your specific database), then that should suffice as your unique key.
If the studentID is simply an identity value that is assigned by the database as a row-number, then you probably need some other unique key to avoid entering the same student twice.
Primitive primary key with no relation to data domain is one of widely accepted best practices
( just imagine - one of your students decides to marry )
Another good practice (though from NoSql) world is to use GUID - this way keys are unique, and different datasets can be mixed in same table without collisions.
PS: you could save some storage space, but today it is cheap and there is no need to sacrifice good practices for it
Yes!
If you ever need to update or delete rows from the table, it is very advantageous to have something to uniquely identify each row in the table.
With your example, I don't think it's possible to guarantee no two students will share the same name. Even adding a date of birth still can't guarantee they'll always be unique. I'd recommend adding an auto incrementing INT or BIGINT as the primary key.
You can always add the Unique constraint as well and remove it if it becomes an issue.
A simple way to do it is use an auto-generated Guid (Globally Unique Identifier) to identify a student. It is "guarenteed" to be unique every time it is generated. Names can change (like when somebody gets married), but some auto generated value has no meaning so should never need to be changed.
http://en.wikipedia.org/wiki/Globally_unique_identifier
Your database constraints should be DBMS understood business rules. Is there a business rule that states that no two students may have the same first and last name combination? I presume not, therefore do not create a unique key for those two fields. Perhaps best not to presume, though, and ask a business domain expert e.g. the enrolment officer.
Note that a row in this table is a proposition I.e. that there exists a student enrolled with first name 'x' and last name 'y' and student ID 'z'. Clearly the DBMS has not concept of whether this proposition is true in the real world. What normally happens is that there will be a trusted source to verify data. The enterprise will authorize an officer (director etc) in this role. Let's say it is the enrolment officer who is responsible for verifying that 'x y' is a real person, that they are eligible to be enrolled, and the person is who they say they are. Typically, they will require sight of documents (certificates, passport, etc), take up references, interview the person, check public records, etc. Of course, the enrolment officer may delegate their responsibility to other members of staff or engage an agent.
At some point they will be satisfied and for convenience will issue they own identifier, the student ID. Mistakes do happen and it may turn out that this value is not unique, in which case it would be the enrolment officer's responsibility to resolve the problem and issue a new student to. Perhaps they will use software to generate the value to mitigate against such problems. The student ID will be issued to the student and will be used within the enterprise to identify the person for the convenience of all concerned. They may even be issued with a document (e.g. photo ID card) to assist in identification, based on the level of trust in a given context (e.g. may need to produce photo ID to sit an exam). If the student forgets their ID, loses their issued documents, etc then the enrolment office will be able to retrieve it from records e.g. with reference to copy documents taken during the verification process; they are unlikely to use first name and last name alone.
The point is, the trusted source for the identifier is the enrolment officer on behalf of the enterprise, rather than the database, the DBMS or any other kind of software involved in the process. Therefore, it probably is acceptable to make student ID the sole identifier for stents within the database. Consider, however, that an auto-increment column generated on one hardware build of a single DBMS within the enterprise is probably not suitable for the allocation of such significant identifier values.