multiple foreign key ERD - mysql

I have a question about using the same FK more than once in a schema. Here is the question:
|=======================================|
| Book |
|=======================================|
| Book_ID (PK)| Cover_Paper | Page_Paper|
|-------------|-------------|-----------|
|====================================|
| Paper |
|====================================|
| Paper_ID (PK)| Paper_Type | weight |
|--------------|------------|--------|
Let's say I have different types of paper, with different weights, used to print the cover and the pages.
So I need to put Paper_ID into the Book table as an FK, twice. The problem is, I believe it is wrong to give an FK column a different name from the column it references. But if I change the table to use the same column name, it looks weird:
|==========================================|
| Book |
|==========================================|
| Book_ID (PK)| Paper_ID(FK) | Paper_ID(FK)|
|-------------|--------------|-------------|
Any help with this problem?

It's not wrong to have column names that differ from the domain name of the column. In fact, it is often necessary.
The alternative - having two columns with the same name - is bad. How would you know which column indicated cover paper and which page paper? By position? This ties the meaning of the content to the physical representation of the data. What happens if I select Book_ID and just one of the Paper_ID columns? One wouldn't know, without additional external information, what the data means. Rather, that additional information should be part of the representation, so that it's as self-descriptive as possible.
In relations where each role is filled by a unique domain, it's easy enough to just use the name of the domain as the name of the role without confusion. If a book consisted of a single type of paper, talking about the book's paper makes sense. Same for a bicycle's seat and a person's nose.
However, when a relation has more than one of the same kind of thing, we need to indicate each thing's role. Distinguishing Cover_Paper and Page_Paper like you did is the right way to do it. (It's too bad SQL DBMSs don't have separate role and domain names for each column, but I digress.)
You could call them Cover_Paper_ID and Page_Paper_ID. It's something of an industry convention to attach ID to surrogate identifier columns, though I think it reads better without. In other relations, it's often sufficient to write just the role without the domain - e.g. in a Marriage we might have columns for Husband and Wife, instead of writing Husband_Person and Wife_Person.
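A minimal sketch of the role-named columns, using Python's built-in sqlite3 module so it can be run as-is. SQLite stands in for MySQL here, and the sample papers and weights are invented for illustration. The point is that Paper is joined twice, once per role, and each join is aliased by the role it plays.

```python
import sqlite3

# In-memory database; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE Paper (
    Paper_ID   INTEGER PRIMARY KEY,
    Paper_Type TEXT NOT NULL,
    Weight     INTEGER NOT NULL
);
CREATE TABLE Book (
    Book_ID     INTEGER PRIMARY KEY,
    Cover_Paper INTEGER NOT NULL REFERENCES Paper (Paper_ID),
    Page_Paper  INTEGER NOT NULL REFERENCES Paper (Paper_ID)
);
-- Sample rows are assumptions, not from the question.
INSERT INTO Paper VALUES (1, 'Glossy', 250), (2, 'Offset', 80);
INSERT INTO Book VALUES (1, 1, 2);
""")

# Join Paper twice, once per role, and alias each join by its role.
row = conn.execute("""
    SELECT b.Book_ID, cover.Paper_Type, page.Paper_Type
    FROM Book AS b
    JOIN Paper AS cover ON cover.Paper_ID = b.Cover_Paper
    JOIN Paper AS page  ON page.Paper_ID  = b.Page_Paper
""").fetchone()
print(row)
```

Both columns reference the same Paper_ID, yet a query that selects only one of them still says which role the paper plays.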
Both Edgar Codd (author of A Relational Model of Data for Large Shared Data Banks) and Peter Chen (author of The Entity-Relationship Model - Toward a Unified View of Data) discuss roles in their papers. I highly recommend studying both, especially since very few online resources ever mention the topic.


I am creating a database for a community to store details of all the members. What would be the best way to create such database?

I am creating a database for a community to store details of all their members along with those members' relations with each other.
For Instance: There is a family of 4. Mother, Father, Son and Daughter. The Son gets married to a girl from another family in the same community (Their data is also in the same database). The newly married couple has a new member soon. Also they need to add their grand parents to the database at a later stage (Parents of both the Mother and Father in this case).
What would be the best way to create a schema for such a database.
I have a schema called member_details that'll store all community members' data in a single table something like this.
member_details: ID | Name | Birthdate | Gender | Father | Mother | Spouse | Child
All members would have relations mapped to Father,Mother,Spouse,Child referenced in the same table.
Is this schema workable from a technical pov?
I just need to know whether my solution is correct or whether there is a better way to do this. I know there are a lot of websites storing this kind of data, so it would help if someone could point me in the right direction.
I'd advise you to use two tables: one for the members of the community and one for the relations between them. Something like this:
Members:
ID | Name | Birth | Gender
Relations:
First Member ID | Second Member ID | Relation
Here you use the IDs from the first table as foreign keys in the second. That way you'll be able to add more relation types when you need them. By the way, I'd add a third table to store the relation types, so it can work as a dictionary. Same thing for genders.
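As a rough sketch of that layout (run through Python's sqlite3 so it is self-contained; the lower-case table names, the relation names such as parent_of, and the sample members are my assumptions), it could look like the following. The gender dictionary is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE relation_types (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE members (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    birth  TEXT,
    gender TEXT
);
CREATE TABLE relations (
    first_member_id  INTEGER NOT NULL REFERENCES members (id),
    second_member_id INTEGER NOT NULL REFERENCES members (id),
    relation_type_id INTEGER NOT NULL REFERENCES relation_types (id),
    PRIMARY KEY (first_member_id, second_member_id, relation_type_id)
);
INSERT INTO relation_types VALUES (1, 'parent_of'), (2, 'spouse_of');
INSERT INTO members (id, name) VALUES (1, 'Father'), (2, 'Mother'), (3, 'Son');
INSERT INTO relations VALUES (1, 3, 1), (2, 3, 1), (1, 2, 2);
""")

# A new kind of relation is a new row, not a new column.
conn.execute("INSERT INTO relation_types VALUES (3, 'neighbour_of')")

parents_of_son = [name for (name,) in conn.execute("""
    SELECT m.name
    FROM relations AS r
    JOIN members AS m ON m.id = r.first_member_id
    JOIN relation_types AS t ON t.id = r.relation_type_id
    WHERE r.second_member_id = 3 AND t.name = 'parent_of'
    ORDER BY m.id
""")]
print(parents_of_son)
```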
As usual, "it depends".
The first question is "how will you use this data?". What sort of questions do you expect the database to answer? If you want to show a person's profile with their relationships, that's pretty easy. If you want to find out how many children a person has, or who is the grandfather of a person, or the age of someone's youngest child, that could be a little harder.
The second question is "how sure are you these are the only relationships you want to store?" Perhaps you also want to store "neighbour", "team member", "engaged_to" - or maybe you need to store that information later on. Maybe you need to take account of people getting divorced, or remarrying.
The schema you suggest works fine for most scenarios, but adding a new type of relationship means you have to add a new column. There are no hard and fast rules, but in general it's better to add rows than columns when faced with events in the problem domain. Asking "who is this person's grandfather" requires a couple of self joins, and that's okay.
#ba3a suggests splitting the information about people from the information about relationships. This is much "cleaner" - and less likely to require new columns as you store more types of relationship. Showing a person's profile requires a query with lots of outer joins. Finding a grand parent requires self joins on the "relations" table.
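To give an idea of what "self joins on the relations table" means in practice, here is a small sketch using Python's sqlite3. The relation name parent_of, the direction convention (first member is the parent of the second), and the sample family are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE relations (
    first_member_id  INTEGER NOT NULL REFERENCES members (id),
    second_member_id INTEGER NOT NULL REFERENCES members (id),
    relation         TEXT NOT NULL   -- read as: first is <relation> second
);
INSERT INTO members VALUES
    (1, 'Grandfather'), (2, 'Father'), (3, 'Son'), (4, 'Mother');
INSERT INTO relations VALUES
    (1, 2, 'parent_of'), (2, 3, 'parent_of'), (4, 3, 'parent_of');
""")

# Two hops along parent_of: child -> parent -> grandparent.
grandparents = [name for (name,) in conn.execute("""
    SELECT g.name
    FROM relations AS p                       -- rows naming the child's parents
    JOIN relations AS gp                      -- rows naming those parents' parents
      ON gp.second_member_id = p.first_member_id
     AND gp.relation = 'parent_of'
    JOIN members AS g ON g.id = gp.first_member_id
    WHERE p.second_member_id = ? AND p.relation = 'parent_of'
""", (3,))]
print(grandparents)
```

Each extra generation adds one more join on relations, which is the cost of keeping relationships as rows.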

MySQL Schema Advice: Unpredictable Field Additions

A little overview of the problem.
Let's say I have a table named TableA with fixed properties, PropertyA, PropertyB, PropertyC. This has been enough for your own website needs but then you suddenly have clients that want custom fields on your site.
ClientA wants to add PropertyD and PropertyE.
ClientB wants to add PropertyF and PropertyG.
The catch is that these clients don't want each other's fields. Now imagine you get more clients: the solution of just adding nullable fields to TableA becomes cumbersome and you end up with a mess of a table. Or at least I assume that's the case; feel free to correct me. Is it better if I just do that?
Now I thought of two solutions. I'm asking if there's a better way to do it since I'm not that confident with the trade offs and their future performance.
Proposed Solution #1
data_id is not exactly a foreign key, but it stores whichever client property row is attached to a TableA row. client_id is the only foreign key present on both the property table and TableA.
It feels like an anti-pattern of some sort. I can imagine queries being easy this way, but it requires the developer to know which property table to pick from. I'm not sure whether having many tables is a bad thing.
Proposed Solution #2
I believe this is a bit more elegant, and more fields can easily be added as necessary. Not to mention these are the only tables I would need for everything else. Just to visualize, I will add the requested properties to the properties table like so:
Properties
-------------
1 | PropertyD
2 | PropertyE
3 | PropertyF
4 | PropertyG
And whenever I save any data I would tag all properties whenever they are available like so. For this example I want to save a ClientA stored in the Clients table on id 1.
Property_Mapping
--------------------------------------------------------
property_id | table_a_id | property_value | client_id
--------------------------------------------------------
1 | 1 | PROPERTY_D_VALUE | 1
2 | 1 | PROPERTY_E_VALUE | 1
Queries on this one will obviously be more complex, I'd imagine, but it's more of a tradeoff. I put client_id on property_mapping just in case clients want the same fields. Any advice?
You've discovered the Entity-Attribute-Value antipattern. It's a terrible idea for a relational database. It makes your queries far more complex, and it takes 4-10x the storage space.
I covered some pros and cons of several alternatives in an old answer on Stack Overflow:
How to design a product table for many kinds of product where each product has many parameters
And in a presentation:
Extensible Data Modeling with MySQL
As an example of the trouble EAV causes, consider how you would respond if one of your clients says that PropertyD must be mandatory (i.e. the equivalent of NOT NULL) and PropertyE must be UNIQUE. Meanwhile, the other client says that PropertyG should be restricted to a finite set of values, so you should either use an ENUM data type, or use a foreign key to a table of allowed values.
But you can't implement any of these constraints using your Properties table, because all the values of all the properties are stored in the same column.
You lose features of relational databases when you use this antipattern, such as data types and constraints.
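To make the constraint problem concrete, here is a small sketch in Python's sqlite3. The property_mapping table follows the question; the client_a_fields table is a hypothetical conventional layout that I made up purely for contrast, not a recommendation of one specific alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- EAV layout from the question: every value shares one column.
CREATE TABLE property_mapping (
    property_id    INTEGER NOT NULL,
    table_a_id     INTEGER NOT NULL,
    property_value TEXT,
    client_id      INTEGER NOT NULL
);
-- Hypothetical conventional layout for ClientA's fields.
CREATE TABLE client_a_fields (
    table_a_id INTEGER PRIMARY KEY,
    property_d TEXT NOT NULL,
    property_e TEXT UNIQUE
);
""")

# EAV: PropertyE (property_id 2) is meant to be unique, yet duplicates go in.
conn.execute("INSERT INTO property_mapping VALUES (2, 1, 'same', 1)")
conn.execute("INSERT INTO property_mapping VALUES (2, 2, 'same', 1)")
eav_duplicates = conn.execute(
    "SELECT COUNT(*) FROM property_mapping WHERE property_value = 'same'"
).fetchone()[0]

# Conventional columns: the database itself rejects the second row.
conn.execute("INSERT INTO client_a_fields VALUES (1, 'd1', 'same')")
try:
    conn.execute("INSERT INTO client_a_fields VALUES (2, 'd2', 'same')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(eav_duplicates, rejected)
```

A column-level UNIQUE or NOT NULL on property_value would apply to every property at once, which is why the per-property rules end up in application code instead.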

Bill of Materials: One table for everything, or a table for each sub-level?

I am working with a client in manufacturing whose products are configurations of the same bunch of parts. I am creating a database that holds all valid products and their Bill of Materials. I need help on deciding a Bill Of Material schedule to implement.
The obvious solution is a many-to-many relationship with a junction table:
Table 1: Products
Table 2: Parts
Junction Table: products, parts, part quantities
However, there are multiple levels in my client's product;
-Assembly
-Sub-Assembly
-Component
-Part
and items from lower levels are allowed to be associated with any upper level item;
Assembly |Sub-assembly
Assembly |Component
Assembly |Part
Sub-Assembly |Component
Sub-Assembly |Part
Component |Part
and I suspect the client will want to add more levels in the future when new product lines are added.
Correct me if I am wrong, but I believe the above relation schedule would demand a growing integer sequence of junction tables and queries (0+1+1+2+3...) to display and export the full Bill of Materials which may eventually affect performance.
Someone suggested to put everything in one table:
Table 1: Assemblies, sub-assemblies, components, parts, etc...
Junction table: Children and Parents
This only requires one junction table to create infinite levels of many-to-many relationships. I don't know if I trust this solution, but I can't think of any issues other than accidentally making an item its own parent and creating an infinite loop and that it sounds disorganized.
I lack the experience to determine whether either or neither of these models will work for my client. I am sketching these models in MS Access, but I am open to moving this project to a more powerful platform if necessary. Any input is appreciated. Thank you.
-M
What you are describing is a hierarchy. As such it should take the form:
part_hierarchy:
part_id | parent_part_id | other | attributes | of | this | relationship
So part_id 1 may have a parent part_id 10 "component", which may in turn have a parent_part_id (when looked up in this same table) of 12 "Assembly". It would look like:
part_id | parent_part_id
1 | 10
10 | 12
and parts table:
part_id | description
1 | widget
10 | widget component
12 | aircraft carrier
That's a little simplified since it doesn't take into account your product/part relationship, but it will all fit together using this methodology.
Nice and simple. Now it doesn't matter how deep the hierarchy goes. It's still just two columns (plus any extra columns needed for attributes of this relationship, like create_date, last_changed_by_user, etc.).
I would suggest something more powerful than Access, though, since it lacks the ability to pick apart a hierarchy using a recursive CTE, something that comes with SQL Server, Postgres, Oracle, and the like.
I would 100% avoid any schema that requires you to add more fields or tables as the hierarchy becomes deeper and more complex. That is a path that leads towards pain and regret.
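For a feel of what the recursive CTE looks like, here is a sketch that runs in SQLite through Python's sqlite3 module (the syntax in SQL Server, Postgres and Oracle differs only slightly). It uses the sample rows above; the depth column and the composite primary key are my additions, and the sketch assumes the data contains no cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts (part_id INTEGER PRIMARY KEY, description TEXT NOT NULL);
CREATE TABLE part_hierarchy (
    part_id        INTEGER NOT NULL REFERENCES parts (part_id),
    parent_part_id INTEGER NOT NULL REFERENCES parts (part_id),
    PRIMARY KEY (part_id, parent_part_id)
);
INSERT INTO parts VALUES
    (1, 'widget'), (10, 'widget component'), (12, 'aircraft carrier');
INSERT INTO part_hierarchy VALUES (1, 10), (10, 12);
""")

# Walk downwards from a top-level item, tracking depth as we go.
bom = conn.execute("""
    WITH RECURSIVE tree (part_id, depth) AS (
        SELECT ?, 0
        UNION ALL
        SELECT h.part_id, tree.depth + 1
        FROM part_hierarchy AS h
        JOIN tree ON h.parent_part_id = tree.part_id
    )
    SELECT tree.depth, p.description
    FROM tree
    JOIN parts AS p ON p.part_id = tree.part_id
    ORDER BY tree.depth
""", (12,)).fetchall()
print(bom)
```

The same query works unchanged whether the hierarchy is three levels deep or thirty.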
Since the level of nesting is arbitrary, use one table with a self-referencing parent_id foreign key.
While this is technically correct, navigating it requires a recursive query, which not every database supports (MySQL, for instance, only gained recursive CTEs in version 8.0). A simple and effective way to make nested parts easy to reach is to store a "path" with each part, which looks like a path in a file system.
For example, say part id 1 is a top level part that has a child whose id is 2, and part id 2 has a child part with id 3, the paths would be:
id | parent_id | path
1 | null | /1
2 | 1 | /1/2
3 | 2 | /1/2/3
Doing this means finding the tree of subparts for any part is simply:
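-- Return the part itself plus every part below it in the tree.
select b.id, b.path
from parts a
join parts b
  on b.path = a.path
  or b.path like concat(a.path, '/%')  -- the '/' keeps /1 from matching /10
where a.id = ?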
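Here is a runnable sketch of the path lookup using Python's sqlite3. SQLite concatenates with || where MySQL uses CONCAT, and the extra top-level part with id 10 is an assumption I added to show one pitfall: matching on the bare prefix '/1%' would also pick up '/10', so the sketch matches on the path followed by a slash.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES parts (id),
    path      TEXT NOT NULL
);
INSERT INTO parts VALUES
    (1, NULL, '/1'), (2, 1, '/1/2'), (3, 2, '/1/2/3'), (10, NULL, '/10');
""")

# The part itself, plus anything whose path continues with '/...'.
subtree = [i for (i,) in conn.execute("""
    SELECT b.id
    FROM parts AS a
    JOIN parts AS b
      ON b.path = a.path OR b.path LIKE a.path || '/%'
    WHERE a.id = ?
    ORDER BY b.id
""", (1,))]
print(subtree)
```

The trade-off is that the path has to be rewritten for a whole subtree whenever a part is moved to a different parent.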

Need help designing ERD for food bank

This is my first project outside of school so I'm rusty and lacking practice.
I want to create a database but I'm not sure if I'm doing ok so far. I need a little help with the reports table.
The situation:
The food bank has multiple agencies who hand out food
Each agency has to submit a report with how many families/people served from a long list of zip codes.
I thought of putting a fk on Report to Zips table. But will that make selecting multiple zips impossible?
Maybe I'm way off base. Does someone have a suggestion for me?
A better name for this table might be FoodService or something. I imagine the kind of reports you really want to end up with are not just a single row in this table, so naming it Report is a bit confusing.
In any case, each Report is the unique combination of {Agency ID, ZIP code, Date} and of course Agency ID is a foreign key. The other columns in this table would be number of families and people served, as you've indicated. This means you'll have rows for each agency-ZIP-date combination like this:
Agency | ZIP | Date | FamiliesServed | PeopleServed
Agency A | 12345 | Jan-12 | 100 | 245
Agency A | 12340 | Jan-12 | 20 | 31
Agency B | 12345 | Jan-12 | 80 | 178
Agency B | 12340 | Jan-12 | 0 | 0
Are these totals also broken down by "program"? If so, program needs to be part of the primary key for this table. Otherwise, program doesn't belong here.
Finally, unless you're going to start storing data about the ZIP codes themselves, you don't need a table for ZIP codes.
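A sketch of that table, run through Python's sqlite3 so it can be tried directly. The snake_case names and the agency table's columns are assumptions, the date is kept as plain text exactly as written in the sample rows, and the ZIP is an ordinary column with no Zips table behind it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE agency (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE food_service (
    agency_id       INTEGER NOT NULL REFERENCES agency (id),
    zip             TEXT NOT NULL,      -- plain column, no Zips table
    service_date    TEXT NOT NULL,
    families_served INTEGER NOT NULL,
    people_served   INTEGER NOT NULL,
    PRIMARY KEY (agency_id, zip, service_date)
);
INSERT INTO agency VALUES (1, 'Agency A'), (2, 'Agency B');
INSERT INTO food_service VALUES
    (1, '12345', 'Jan-12', 100, 245),
    (1, '12340', 'Jan-12', 20, 31),
    (2, '12345', 'Jan-12', 80, 178),
    (2, '12340', 'Jan-12', 0, 0);
""")

# The actual report is an aggregate over many rows, e.g. totals per ZIP.
totals = conn.execute("""
    SELECT zip, SUM(families_served), SUM(people_served)
    FROM food_service
    GROUP BY zip
    ORDER BY zip
""").fetchall()
print(totals)
```

Because the ZIP is part of the key rather than a single foreign key on a report row, one agency can submit figures for as many ZIP codes as it needs.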
Usually having orphan tables like "Food" is a sign something's missing. If there's that much data involved, you'd think it would link in to the order model in some capacity, or at the very least you'd have some kind of indication as to which agency stocks which kind of food.
What's curiously absent is how data like "families-served" is computed from this schema. There doesn't seem to be a source for this information, not even a "family served" record, or a spot for daily or weekly summaries to be put in and totalled.
A "Zips" table is only relevant if there is additional data that might be linked in by zip code. If you have a lat/long database or demographic data this would make sense. Having an actual foreign key is somewhat heavy handed, though. What if you don't know the zip? What if, for whatever reason, the zip is outside of the USA? How will you handle five and nine digit zip codes?
Since zips are not created by the user, the zips table is mostly auxiliary information that may or may not be referenced. This is a good candidate for an isolated "reference" table.
Remember that the structure of a diagram like this is largely influenced by the front-end of the application. If users are adding orders for food items, that translates into relationships between all three things. If agencies are producing reports based on daily activity logs, then once again you need relationships between those three entities.
The front end is usually based on use-cases, so be sure you're accommodating all of those that are relevant.

Method To Create Database for Tv Shows

This is my first question on Stack Overflow, so if I do something wrong please let me know and I will fix it as soon as possible.
So I am trying to make a database for TV shows, and I would like to know the best way to do it and how to make my current database simpler (normalization).
I would like to be able to have the following structure or something similar.
Fringe
Season 1
Episodes 1 - 10(whatever there are)
Season 2
Episodes 1 - 10(whatever there are)
... (so on)
Burn Notice
Season 1
Episodes 1 - 10(whatever there are)
Season 2
Episodes 1 - 10(whatever there are)
... (so on)
... (More Tv Shows)
Sorry if this seems unclear. (Please ask for clarification)
But the structure I have right now is 3 tables (tvshow_list, tvshow_episodes, tvshow_link)
//tvshow_list//
TvShow Name | Director | Company_Created | Language | TVDescription | tv_ID
//tvshow_episodes//
tv_ID | EpisodeNum | SeasonNum | EpTitle | EpDescription | Showdate | epid
//tvshow_link//
epid | ep_link
The Director and the company are linked by an id to another table with a list of companies and directors.
I am pretty sure that there is a simpler way of doing this.
Thanks for the help in advance,
Krishanthan Lingeswaran
The basic concept of Normalization is the idea that you should only store one copy of any item of data that you have. It looks like you've got a good start already.
There are two basic ways to model what you're trying to do here, with episodes and shows. In the database world, you might have heard the terms "one to many" and "many to many". Both are useful; which one is correct depends on your specific situation. In your case, the big question to ask yourself is whether a single episode can belong to only one show, or whether an episode can belong to multiple shows at once. I'll explain the two forms, and why you need to know the answer to that question.
The first form is simply a foreign key relationship. If you have two tables, 'episodes' and 'shows', in the episodes table, you would have a column named 'show_id' that contains the ID of one (and only one!) show. Can you see how you could never have an episode belong to more than one show this way? This is called a "one to many" relationship, i.e. a show can have many episodes.
The second form is to use an association table, and this is the form you used in your example. This form would allow you to associate an episode with multiple shows and is therefore called a "many to many" relationship.
There is some benefit to using the first form, but it's not really that big of a deal in most cases. Your queries will be a little bit shorter because you only have to join 2 tables to get episodes->shows but the other table is just one more join. It really comes down to figuring out if you need a "one to many" or "many to many" type relationship.
An example of a situation where you would need a many-to-many relationship would be if you were modeling a library and had to keep track of who checked out which book. You'd have a table of books, a table of users, and then a table of "books to users" that would have an id, a book_id, and a user_id and would be a many-to-many relationship.
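To put the two forms side by side, here is a rough sketch using Python's sqlite3. The table names follow the ones mentioned above (episodes, shows, books, users, "books to users"); the sample rows are placeholders I made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- One to many: each episode row points at exactly one show.
CREATE TABLE shows (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE episodes (
    id      INTEGER PRIMARY KEY,
    show_id INTEGER NOT NULL REFERENCES shows (id),
    title   TEXT NOT NULL
);
-- Many to many: the association table pairs any book with any user.
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE books_to_users (
    id      INTEGER PRIMARY KEY,
    book_id INTEGER NOT NULL REFERENCES books (id),
    user_id INTEGER NOT NULL REFERENCES users (id)
);
INSERT INTO shows VALUES (1, 'Fringe');
INSERT INTO episodes VALUES (1, 1, 'Episode 1'), (2, 1, 'Episode 2');
INSERT INTO books VALUES (1, 'Book A'), (2, 'Book B');
INSERT INTO users VALUES (1, 'User A'), (2, 'User B');
INSERT INTO books_to_users (book_id, user_id) VALUES (1, 1), (1, 2), (2, 1);
""")

episode_count = conn.execute(
    "SELECT COUNT(*) FROM episodes WHERE show_id = 1").fetchone()[0]
borrowers_of_book_a = conn.execute(
    "SELECT COUNT(*) FROM books_to_users WHERE book_id = 1").fetchone()[0]
books_of_user_a = conn.execute(
    "SELECT COUNT(*) FROM books_to_users WHERE user_id = 1").fetchone()[0]
print(episode_count, borrowers_of_book_a, books_of_user_a)
```

In the first form the "many" side can only ever name one parent; in the second, both sides can appear in as many pairings as needed.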
Hope that helps!
I am pretty sure that there is a simpler way of doing this.
Not as far as I know. Your schema is close to the simplest you can make for what I presume is the functionality you're asking for. "Improvements" on it really only make it more complicated, and should be added as you judge the need emerges on your side. The following examples come to mind (none of which really simplify your schema).
I would standardize your foreign key and primary key names. An example would be to have the columns shows.id, episodes.id, episodes.show_id, link.id, link.episode_id.
Putting SeasonNum as what I presume will be an int in the Episodes table, in my opinion, violates the normalization constraint. This is not a major violation, but if you really want to stick to it, I would create a separate Seasons table and associate it many-to-one to the Shows table, and then have the Episodes associate only with the Seasons. This gives you the opportunity to, for instance, attach information to each season. Also, it prevents repetition of information (while the type of the season ID foreign key column in the Episodes table would ostensibly still be an INT, a foreign key philosophically stores an association, what you want, versus dumb data, what you have).
You may consider putting language, director, and company in their own tables rather than your TV show list. This is the same concern as above and in your case a minor violation of normalization.
Language, director, and company all have interesting issues attached to them regarding the level of the association. Most TV shows have different directors for different episodes. Many are produced in multiple languages and by several different companies and sometimes networks. So at what level do you plan on storing this information? I'm not a software architect, so someone else can better answer this question than me, but I'd set up a polymorphic many-to-many association for languages, directors, and companies and an inheritance model that allows for these values to be specified on an episode-by-episode, season-by-season, or show-by-show basis, inheriting the value from its parent if none are provided.
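For the separate Seasons table suggested above, a minimal sketch in Python's sqlite3 might look like this. The column names follow the standardized id / show_id style from the first suggestion, the UNIQUE constraints are my addition, and the rows are placeholders. It leaves out the language, director, and company associations entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE shows (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE seasons (
    id         INTEGER PRIMARY KEY,
    show_id    INTEGER NOT NULL REFERENCES shows (id),
    season_num INTEGER NOT NULL,
    UNIQUE (show_id, season_num)
);
CREATE TABLE episodes (
    id          INTEGER PRIMARY KEY,
    season_id   INTEGER NOT NULL REFERENCES seasons (id),
    episode_num INTEGER NOT NULL,
    title       TEXT,
    UNIQUE (season_id, episode_num)
);
INSERT INTO shows VALUES (1, 'Fringe');
INSERT INTO seasons VALUES (1, 1, 1), (2, 1, 2);
INSERT INTO episodes (season_id, episode_num) VALUES (1, 1), (1, 2), (2, 1);
""")

# Rebuild the Show > Season > Episode outline from the question.
outline = conn.execute("""
    SELECT sh.name, se.season_num, ep.episode_num
    FROM shows AS sh
    JOIN seasons AS se ON se.show_id = sh.id
    JOIN episodes AS ep ON ep.season_id = se.id
    ORDER BY sh.name, se.season_num, ep.episode_num
""").fetchall()
print(outline)
```

Anything that belongs to a season as a whole now has an obvious home in the seasons table.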
Bottom line concerning all these suggestions: Pick what's appropriate for your project. If you don't need the functionality afforded by this level of associations, and you don't mind manually entering in repetitive data (you might end up implementing an auto-complete system to help you), you can gloss over some of the normalization constraints.
Normalization is merely a suggestion. Pick what's right for you and learn from your mistakes.