Database Design - structure - mysql

I'm designing a website with courses and jobs.
I have a jobs table and courses table, and each job or course is offered by a 'body', which is either an institution(offering courses) or a company(offering jobs). I am deciding between these two options:
option1: use a 'Bodies' table, with a body_type column for both institutions and companies.
option2: use separate 'institution' and 'company' tables.
My main problem is that there is also a post table where all adverts for courses and jobs are displayed from. Therefore if I go with the first option, I would just need to put a body_id as a record for each post, whereas if I choose the second option, I would need to have an extra join somewhere when displaying posts.
Which option is best? or is there an alternative design?

Don't think so much in terms of SQL syntax and "extra joins"; think more in terms of models, entities, attributes, and relations.
At the highest level, your model's central entity is a Post. What are the attributes of a post?
Who posted it
When it was posted
Its contents
Some additional metadata for search purposes
(Others?)
Each of these attributes is either unique to that post and therefore should be in the post table directly, or is not and should be in a table which is related; one obvious example is "who posted it" - this should simply be a PostedBy field with an ID which relates another table for poster/body entities. (NB: Your poster entity does not necessarily have to be your body entity ...)
Your poster/body entity has its own attributes that are either unique to each poster/body, or again, should be in some normalized entity of their own.
Are job posts and course posts substantially different? Perhaps you should consider CoursePosts and JobPosts subset tables with job- and course-specific data, and then join these to your Posts table.
The key thing is to get your model in such a state that all of the entity attributes and relationships make sense where they are. Correctly modeling your actual entities will prevent both performance and logic issues down the line.
For your specific question, if your bodies are generally identical in terms of attributes (name, contact info, etc) then you want to put them in the same table. If they are substantially different, then they should probably be in different tables. And if they are substantially different, and your jobs and courses are substantially different, then definitely consider creating two entirely different data models for JobPosts versus CoursePosts and then simply linking them in some superset table of Posts. But as you can tell, from an object-oriented perspective, if your Posts have nothing in common but perhaps a unique key identifier and some administrative metadata, you might even ask why you're mixing these two entities in your application.
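A minimal sketch of the model described above, assuming the bodies share their attributes and so live in one table. It uses Python's built-in sqlite3 only so that it runs as-is (MySQL DDL differs slightly); all table and column names, such as posted_by, salary and duration_weeks, are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE bodies (
    id        INTEGER PRIMARY KEY,
    body_type TEXT NOT NULL CHECK (body_type IN ('institution', 'company')),
    name      TEXT NOT NULL
);
CREATE TABLE posts (
    id        INTEGER PRIMARY KEY,
    posted_by INTEGER NOT NULL REFERENCES bodies(id),
    posted_at TEXT NOT NULL,
    contents  TEXT NOT NULL
);
-- Subset tables hold only the job- or course-specific attributes
CREATE TABLE job_posts (
    post_id INTEGER PRIMARY KEY REFERENCES posts(id),
    salary  INTEGER
);
CREATE TABLE course_posts (
    post_id        INTEGER PRIMARY KEY REFERENCES posts(id),
    duration_weeks INTEGER
);
INSERT INTO bodies VALUES (1, 'company', 'Acme Ltd');
INSERT INTO bodies VALUES (2, 'institution', 'Example College');
INSERT INTO posts VALUES (1, 1, '2024-01-10', 'Backend developer wanted');
INSERT INTO posts VALUES (2, 2, '2024-01-11', 'Intro to databases');
INSERT INTO job_posts VALUES (1, 50000);
INSERT INTO course_posts VALUES (2, 12);
""")

# Listing all adverts needs only one join, whatever the body type is
rows = conn.execute("""
    SELECT p.contents, b.name, b.body_type
    FROM posts p
    JOIN bodies b ON b.id = p.posted_by
    ORDER BY p.id
""").fetchall()
for row in rows:
    print(row)
```

The subset tables are joined in only when a page needs the job- or course-specific columns.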

When resolving hierarchies there are usually 3 options:
Kill children: Your option 1
Kill parent: Your option 2
Keep both
I see the issue you're describing when you kill the parent: you don't know which table the foreign key should point to. Unless you also create a post hierarchy, with one post table related to institution and a separate post table related to company (a horrible solution!), that is a no-go. You could also solve this outside the design itself by adding metadata to each post stating which table it should join against. That is not a good option either: your schema will not be self-documenting, and the data will determine how to join tables, which is error-prone.
So I would discard killing the parent. Killing the children works well if the different tables don't have too many differing fields. Bear in mind that this approach does not handle a child that can be both an institution and a company, but that doesn't seem to be the case here. Killing the children is also the most efficient option.
The third option, which you haven't evaluated, is keeping both. You keep a dummy parent table containing the values shared between the bodies, and each child table has a FK to this "abstract" table (if you know what I mean). This is usually the least efficient approach but most likely the most flexible. It lets you easily handle bodies that are of both types, and also bodies that are only of type "body" and neither a company nor an institution (if that is possible now or might be in the future). Note that to join a post to an institution, you always reference the parent table and then join the parent with the child.
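As a rough illustration of the "keep both" approach, here is a sketch using Python's sqlite3 (chosen only so it runs as-is; MySQL syntax differs slightly). The names bodies, institutions, companies, accreditation and industry are assumptions, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Parent table: attributes shared by every body
CREATE TABLE bodies (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Child tables: one row per body that plays that role
CREATE TABLE institutions (
    body_id       INTEGER PRIMARY KEY REFERENCES bodies(id),
    accreditation TEXT
);
CREATE TABLE companies (
    body_id  INTEGER PRIMARY KEY REFERENCES bodies(id),
    industry TEXT
);
-- Posts always reference the parent, never a child table
CREATE TABLE posts (
    id      INTEGER PRIMARY KEY,
    body_id INTEGER NOT NULL REFERENCES bodies(id),
    title   TEXT NOT NULL
);
INSERT INTO bodies VALUES (1, 'Acme Ltd'), (2, 'Example College'), (3, 'Training Corp');
INSERT INTO companies VALUES (1, 'software'), (3, 'education');
INSERT INTO institutions VALUES (2, 'state'), (3, 'private');  -- body 3 is both
INSERT INTO posts VALUES (1, 1, 'Developer'), (2, 2, 'SQL course'), (3, 3, 'Bootcamp');
""")

# Join post -> parent -> children; LEFT JOIN because a body may lack a role
rows = conn.execute("""
    SELECT p.id, b.name,
           i.body_id IS NOT NULL AS is_institution,
           c.body_id IS NOT NULL AS is_company
    FROM posts p
    JOIN bodies b            ON b.id = p.body_id
    LEFT JOIN institutions i ON i.body_id = b.id
    LEFT JOIN companies c    ON c.body_id = b.id
    ORDER BY p.id
""").fetchall()
for row in rows:
    print(row)
```

The extra joins are the efficiency cost mentioned above; in exchange, a body such as the hypothetical 'Training Corp' can be both a company and an institution.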
This question might also be useful for you:
What is the best database schema to support values that are only appropriate to specific rows?

Related

Optimising a database with two separate category tables

I have a database that provides all the data storage for a website. It stores articles in a knowledge base, and services for internal and end-user access.
Both articles and services are stored in categories which can have an indefinite amount of parent categories by self-referencing. It is possible to add multiple categories to either via the connecting table.
It needs to be possible to find the categories of a service or an article, including all the way up the category-parent tree. It also needs to be possible to find the services or articles of a category. Of course, a category can't have both.
Is this an optimal way of doing this? It doesn't feel right, and I'd welcome alternate ideas.
EDIT: Does this way usually work? The categories all have roughly the same content, just a name and description and perhaps an image.
The primary keys of category_service and category_article should include both fields in the respective tables (if a category can have more than one service or article). Also, do you really need a VARCHAR(45) type indicator? I recommend a short ENUM instead.
Otherwise, the basic design in the second diagram looks good. I suggest you add a closure table for efficiently querying recursive hierarchies.
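For reference, a closure table stores one row per (ancestor, descendant) pair, so the whole parent chain of a category can be read with a single join. The sketch below uses Python's sqlite3 so it is runnable; the table name category_closure, the depth column and the add_category helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE category (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES category(id)
);
-- One row for every ancestor/descendant pair, including each node with itself
CREATE TABLE category_closure (
    ancestor   INTEGER NOT NULL REFERENCES category(id),
    descendant INTEGER NOT NULL REFERENCES category(id),
    depth      INTEGER NOT NULL,
    PRIMARY KEY (ancestor, descendant)
);
""")

def add_category(cat_id, name, parent_id=None):
    """Insert a category and the closure rows linking it to all its ancestors."""
    conn.execute(
        "INSERT INTO category (id, name, parent_id) VALUES (?, ?, ?)",
        (cat_id, name, parent_id),
    )
    # Every node is its own ancestor at depth 0
    conn.execute("INSERT INTO category_closure VALUES (?, ?, 0)", (cat_id, cat_id))
    if parent_id is not None:
        # Copy the parent's ancestor rows, one level deeper, for the new node
        conn.execute(
            """
            INSERT INTO category_closure (ancestor, descendant, depth)
            SELECT ancestor, ?, depth + 1
            FROM category_closure
            WHERE descendant = ?
            """,
            (cat_id, parent_id),
        )

add_category(1, "Root")
add_category(2, "Hardware", parent_id=1)
add_category(3, "Printers", parent_id=2)

# All categories up the tree from 'Printers', topmost first, without recursion
ancestors = [
    name
    for (name,) in conn.execute(
        """
        SELECT c.name
        FROM category_closure cc
        JOIN category c ON c.id = cc.ancestor
        WHERE cc.descendant = ?
        ORDER BY cc.depth DESC
        """,
        (3,),
    )
]
print(ancestors)
```

The trade-off is that the closure rows have to be maintained whenever a category is added, moved or deleted.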
If you want to enforce consistency between the category type and records in category_article/category_service, you can duplicate the type indicator in those tables and include it in the foreign key constraint. Yes, doing so feels redundant, but it's effective. Resist the temptation to combine these two tables; mixing values from different domains in a single column usually leads to more difficulties.
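A sketch of the duplicated type indicator, using Python's sqlite3 so it can be run directly (in MySQL the type column could be an ENUM instead of a CHECK). The column names type and category_type are assumptions, and article_id is left as a plain integer to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE category (
    id   INTEGER PRIMARY KEY,
    type TEXT NOT NULL CHECK (type IN ('article', 'service')),
    name TEXT NOT NULL,
    UNIQUE (id, type)              -- target for the composite foreign key
);
CREATE TABLE category_article (
    category_id   INTEGER NOT NULL,
    category_type TEXT NOT NULL DEFAULT 'article'
                  CHECK (category_type = 'article'),
    article_id    INTEGER NOT NULL,
    PRIMARY KEY (category_id, article_id),
    -- The type travels with the key, so only 'article' categories can match
    FOREIGN KEY (category_id, category_type) REFERENCES category (id, type)
);
INSERT INTO category VALUES (1, 'article', 'How-tos');
INSERT INTO category VALUES (2, 'service', 'Hosting');
INSERT INTO category_article (category_id, article_id) VALUES (1, 100);
""")

# Attaching an article to a 'service' category violates the foreign key
try:
    conn.execute(
        "INSERT INTO category_article (category_id, article_id) VALUES (2, 101)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("mismatched insert rejected:", rejected)
```

A category_service table would mirror this with category_type fixed to 'service'.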

Is it proper to make a grand-parent key, a primary key, in its grand-child, in a multi-level identifying relationship?

Asked this here a couple of days ago, but haven't gotten many views, let alone a response, so I'm reposting to stackoverflow.
I'm modeling a DB for a conference ticketing system. In this system attendees are members of an attendee group, which belong to a conference. These relationships are identifying, and therefore FKs must be PKs in the respective children.
My current model:
Q: Is it proper to have attendeeGroupConferenceId FK, as a PK, in the attendee table, as MySQL Workbench has automatically set up for me?
On one hand, one would get a performance boost by keeping it in there for quick association at "check in". However, it is not strictly necessary, since the combination of id and attendeeGroupId, plus a lookup of conferenceId in the attendeeGroup table, is enough. (It therefore becomes redundant data.)
To me, it feels like it might violate some form of normalization, but I plan on keeping it in for the speed boost as described. I'm just curious about what proper design says about giving it PK status or not.
You definitely don't need the attendeeGroupConferenceId in your attendee table. It's redundant; note that the candidate key is the combination of (attendeeGroupId, personId), not the attendeeGroupConferenceId alone.
The table attendee also seems to violate the Second normal form (2NF) as it is.
My suggestion is to remove the attribute attendeeGroupConferenceId. In any case you can just join the tables in your queries to get extra info rather than keeping an extra attribute.
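To illustrate the suggestion, here is a sketch without the redundant column, using Python's sqlite3 so it runs as-is. The exact columns of the original model are not shown in the question, so the layout below (conference, attendeeGroup, attendee) is an assumption based on the names mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE conference (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE attendeeGroup (
    id           INTEGER PRIMARY KEY,
    conferenceId INTEGER NOT NULL REFERENCES conference(id)
);
-- No conference column here: it is reachable through attendeeGroup
CREATE TABLE attendee (
    attendeeGroupId INTEGER NOT NULL REFERENCES attendeeGroup(id),
    personId        INTEGER NOT NULL,
    PRIMARY KEY (attendeeGroupId, personId)
);
INSERT INTO conference VALUES (1, 'DevConf');
INSERT INTO attendeeGroup VALUES (10, 1);
INSERT INTO attendee VALUES (10, 500);
""")

# "Check in": find the conference for a given attendee with one join chain
rows = conn.execute("""
    SELECT c.name
    FROM attendee a
    JOIN attendeeGroup g ON g.id = a.attendeeGroupId
    JOIN conference c    ON c.id = g.conferenceId
    WHERE a.attendeeGroupId = ? AND a.personId = ?
""", (10, 500)).fetchall()
print(rows)
```

Both joins go through primary keys, so the lookup is typically cheap; whether the denormalized column is worth it is something to measure rather than assume.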

How to spot the relationship in RDBMS?

I was studying relationships in RDBMS. I have understood the basic concept behind mapping relationships, but I am not able to spot them.
The three possibilities :
one to many (most common): requires a PK-FK relationship. Two tables involved.
many to many (less common): requires a junction table. Three tables involved.
one to one (very rare): one table involved.
When I begin a project, I am not able to tell the first two cases apart, and I am not clear in my head.
Examples help for a brief moment when I study, but not when I need to put these principles into practice.
This is the place where most beginners falter.
How can I spot these relationships? Is there a simpler way?
Don't look at relationships from a technical perspective. Use analogies and real-life examples when trying to envision relationships in your head.
For example, let's say we have a library database.
A library must have books.
M:M
Each Book may have been written by multiple Authors and each Author may have written multiple Books. Thus it is a many-to-many relationship which will reflect into 3 tables in the database.
1:M
Each Book must also have a Publisher, but a Book may only have one Publisher and a Publisher can publish many Books. Thus it is a one-to-many relationship and it reflects with the PublisherId being referenced in the Books table.
A simple analogy like this one explains relationships to their core. When you try to look at them through a technical lens you're only making it harder on yourself. What's actually difficult is applying real world data scenarios when constructing your database.
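The library example could be sketched like this, using Python's sqlite3 so it is runnable. The table names publishers, books, authors and book_authors and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE publishers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- 1:M, each book carries the id of its single publisher
CREATE TABLE books (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    publisher_id INTEGER NOT NULL REFERENCES publishers(id)
);
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- M:M, the third (junction) table pairs books with authors
CREATE TABLE book_authors (
    book_id   INTEGER NOT NULL REFERENCES books(id),
    author_id INTEGER NOT NULL REFERENCES authors(id),
    PRIMARY KEY (book_id, author_id)
);
INSERT INTO publishers VALUES (1, 'Example Press');
INSERT INTO books VALUES (1, 'SQL Basics', 1), (2, 'Data Modeling', 1);
INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO book_authors VALUES (1, 1), (1, 2), (2, 1);
""")

# Many authors for one book
authors_of_book_1 = [name for (name,) in conn.execute("""
    SELECT a.name FROM authors a
    JOIN book_authors ba ON ba.author_id = a.id
    WHERE ba.book_id = 1
    ORDER BY a.name
""")]

# Many books for one author
books_by_ann = [title for (title,) in conn.execute("""
    SELECT b.title FROM books b
    JOIN book_authors ba ON ba.book_id = b.id
    WHERE ba.author_id = 1
    ORDER BY b.title
""")]

print(authors_of_book_1, books_by_ann)
```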
I think the reason you are not getting the answers that you need is because of the way you are framing the question. Instead of asking “How do I spot the correct type of relationship between entities”, think about “How do my functional needs dictate what relationship to implement”. Database design doesn’t drive the function; it’s the functional needs that drive the relationships you need to implement.
When designing a database structure, you need to identify all the entities. Entities are all the facts that you want to store: lists of things like book titles, invoices, countries, dog species, etc. Then to identify your relationships, you have to consider the types of questions you will want to ask your database. It takes a bit of forward thinking sometimes… just because nobody is asking the question now doesn’t mean that it might not ever be asked. So you can’t ask the universe “what is the relationship between these lists of facts?” because there is no definitive answer. You define the universe… I only want to know answers to these types of questions; therefore I need to use this type of relationship.
Let’s examine an example relation between two common entities: a table of customers and a table of store locations. There is no “correct” way to relate these entities without first defining what you need to know about them. Let’s say you work for a retailer and you want to give a customer a default store designation so they can see products on the website that their local store has in stock. This only requires a one-to-many relationship between a store and the customer. Designing the relationship this way ensures that one store can have many customers as their default and each customer can only have one default store. To implement this relationship is as easy as adding a DefaultStore field to your Customer table as a foreign key that links to the primary key of the Store table.
The same two entities above might have different requirements for the relationship in a different context. Let’s say that I need to give the customer the opportunity to select a list of favorite stores so that they can query in-stock information for all of them at once. This requires a many-to-many relationship because you want one customer to be able to relate to many stores and each store can also relate to many customers. Implementing a many-to-many relationship requires a little more overhead because you have to create a separate table to define the relationship links, but you get this additional functionality. You might call your relationship table something like CustomerStoreFavorites, and its primary key would be the combined primary keys of the two entities: (CustomerID, StoreID). You could also add attributes to the relationship, such as a LastOrderDate field to record the last date that the customer ordered something from a particular store.
You could technically define both types of relationships for the same two entities. As an example: maybe you need to give the customer the option to select a default store, but you also need to be able to record the last date that a customer ordered something from a particular store. You could implement the DefaultStore field on the Customer table with the foreign key to the Store table and also create a relationship table to track all the stores that a customer has ordered from.
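Both relationships side by side might look like the sketch below, written with Python's sqlite3 so it runs as-is. DefaultStore, CustomerStoreFavorites, CustomerID, StoreID and LastOrderDate come from the description above; the remaining columns and the sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Store (StoreID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- One-to-many: each customer has at most one default store
CREATE TABLE Customer (
    CustomerID   INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    DefaultStore INTEGER REFERENCES Store(StoreID)
);
-- Many-to-many: the relationship table, with an attribute of its own
CREATE TABLE CustomerStoreFavorites (
    CustomerID    INTEGER NOT NULL REFERENCES Customer(CustomerID),
    StoreID       INTEGER NOT NULL REFERENCES Store(StoreID),
    LastOrderDate TEXT,
    PRIMARY KEY (CustomerID, StoreID)
);
INSERT INTO Store VALUES (1, 'Downtown'), (2, 'Airport');
INSERT INTO Customer VALUES (1, 'Dana', 1);
INSERT INTO CustomerStoreFavorites VALUES (1, 1, '2024-03-01'), (1, 2, '2024-02-15');
""")

# "Which store is this customer's default?"
default_store = conn.execute("""
    SELECT s.Name FROM Customer c
    JOIN Store s ON s.StoreID = c.DefaultStore
    WHERE c.CustomerID = 1
""").fetchone()[0]

# "Which stores has this customer marked, and when did they last order there?"
favorites = conn.execute("""
    SELECT s.Name, f.LastOrderDate
    FROM CustomerStoreFavorites f
    JOIN Store s ON s.StoreID = f.StoreID
    WHERE f.CustomerID = 1
    ORDER BY s.Name
""").fetchall()

print(default_store, favorites)
```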
If you had some weird situation where every customer had their own store, then you wouldn’t even need to create two tables for your entities because you can fit all the attributes for both the customer and the store into one table.
In short, the way you determine which type of relationship to implement is to ask yourself what questions you will need to ask the database. The way you design it will restrict the relational data you can collect as well as the queries you can ask. If I design a one-to-many relationship from the store to the customer, I won’t be able to ask questions about all the stores that each customer has ordered from unless I can get to that information through other relationships. For example, I could create an entity called "purchases" which has a one-to-many relationship to the customer and store. If each purchase is defined to relate to one customer and one store, now I can query “what stores has this customer ordered from?” In fact with this structure I am able to capture and report on a much richer source of information about all of the customer's purchases at any store. So you also need to consider the context of all the other relationships in your database to decide which relationship to implement between two particular entities.
There is no magic formula, so it just takes practice, experience, and a little creativity. ER Diagrams are a great way to get your design out of your head and onto paper so that you can analyze your design and ensure that you can get the right types of questions answered. There are also a lot of books and resources to learn about database architecture. One good book I learned a lot from was “Database System Concepts” by Abraham Silberschatz and Henry Korth.
Say you have two tables A and B. Consider an entry from A and think of how many entries from B it could possibly be related with at most: only one, or more? Then consider an entry from B and think of how many entries in A it could be related with.
Some examples:
Table A: Mothers, Table B: Children. Each child has only one mother but a mother may have one or more children. Mothers and Children have a one-to-many relationship.
Table A: Doctors, Table B: Patients. Each patient may be visiting one or more doctors and each doctor treats one or more patients. So they have a many-to-many relationship.
An example of one to one:
LicencePlate to Vehicle. One licence plate belongs to one vehicle and one vehicle has one licence plate.

Database Normalization - I think?

We have a J2EE content management and e-commerce system, and in this system – for sake of a simple example – let’s say that we have 100 objects. All of these objects extend the same base class, and all share many of the same fields.
Let’s take two objects as an example: a news item that would be posted on a website, and a product that would be sold on a website. Both of these share common properties:
IDs: id, client ID, parent ID (long)
Flags: deleted, archived, inactive (boolean)
Dates: created, modified, deleted (datetime)
Content: name, description
And of course they have some properties that are different:
News item: author, posting date
Product: price, tax
So (finally) here is my question. Let’s say we have 100 objects in our system, and they all follow this pattern. They have many fields that overlap, and some unique fields. In terms of a relational database, would we be better off with:
Option One: Less Tables, Common Tables
table_id: id, client ID, parent ID (long) (id is the primary key, a GUID for all objects)
table_flag: id, deleted, archived, inactive (boolean)
table_date: id, created, modified, deleted (datetime)
table_content: id, name, description
table_news: id, author, posting date
table_product: id, price, tax
Option Two: More Tables, Common Fields Repeated
table_news: id, client ID, parent ID, deleted, archived, inactive, name, description, author, posting date
table_product: id, client ID, parent ID, deleted, archived, inactive, name, description, price, tax
For full disclosure – I am a developer and not a DBA, and because of that I prefer option one. But there is another team member that prefers option two, and I think he makes valid points.
Option One: Pros and Cons
Pro: Encapsulates common fields into common tables.
Pro: Need to change a common field? Change it in one place.
Pro: Only creates new fields/tables when they are needed.
Pro: Easier to create the queries dynamically, less repetitive code
Con: More joining to create objects (not sure of DB impact on that)
Con: More complex queries to store objects (not sure of DB impact on that)
Con: Common tables will become huge over time
Option Two: Pros and Cons
Pro: Perhaps it is better to distribute the load of all objects across tables?
Pro: Could index the news table on the client ID, and index the product table on the parent ID.
Pro: More readable to human eye: easy to see all the fields for an object in one table.
My Two Cents
For me, I much prefer the elegance of the first option – but maybe that is me trying to force object oriented patterns on a relational database. If all things were equal, I would go with option one UNLESS a DB expert told me that when we have millions of objects in the system, option one is going to create a performance problem.
Apologies for the long winded question. I am not great with DB lingo, so I probably could have summarized this more succinctly if I better understood terms like normalization. I tried to search for answers on this topic, and while I found many that were close (I suspect this is a common DB issue) I could not find any that answered all my questions. I read through this article on normalization:
But I did not totally understand it. On the one hand it was saying that you should remove any redundancies. But on the other hand, it was saying that each attribute should define only one object.
Thanks,
John
You should read Patterns of Enterprise Application Architecture by Martin Fowler. He writes about several options for the scenario you describe:
Single Table Inheritance: One table for all object subtypes. Stores all attributes, setting them NULL where they are inapplicable to the row's object subtype.
Class Table Inheritance: One table for column common to all subtypes, then one table for each subtype to store subtype-specific columns.
Concrete Table Inheritance: One table for each subtype, storing both subtype-specific columns and columns common to all subtypes.
Serialized LOB: One table for all object subtypes. Store common attributes as conventional columns, but combine optional or subtype-specific columns as fields in a BLOB that stores XML or JSON or whatever format you want.
Each one of these designs has pros and cons, so choose a solution depending on the most common way you access your data.
However, notice I use the word subtype above. I would use these designs only if the different object types are subtypes of a common base class. I'm assuming that News item and Product don't actually share a logical base class (besides Object); they are not subtypes of a common superclass.
So for the sake of OO design, I would choose Concrete Table Inheritance. This avoids any inappropriate coupling between these subtypes. There are columns the two tables have in common, but they basically amount to bookkeeping, not anything to do with the function of the class and hence the table.
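A small sketch of Concrete Table Inheritance for the two example objects, using Python's sqlite3 so it can be run directly. Only a subset of the common columns is shown, and the table names news and product are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each concrete table repeats the bookkeeping columns it needs
CREATE TABLE news (
    id           INTEGER PRIMARY KEY,
    client_id    INTEGER NOT NULL,
    deleted      INTEGER NOT NULL DEFAULT 0,
    name         TEXT NOT NULL,
    description  TEXT,
    author       TEXT,
    posting_date TEXT
);
CREATE TABLE product (
    id          INTEGER PRIMARY KEY,
    client_id   INTEGER NOT NULL,
    deleted     INTEGER NOT NULL DEFAULT 0,
    name        TEXT NOT NULL,
    description TEXT,
    price       REAL,
    tax         REAL
);
INSERT INTO news (id, client_id, deleted, name, author, posting_date)
VALUES (1, 10, 0, 'Launch announced', 'John', '2024-01-05'),
       (2, 10, 1, 'Old story', 'John', '2020-01-01');
INSERT INTO product (id, client_id, name, price, tax)
VALUES (1, 10, 'Widget', 9.99, 0.2);
""")

# Loading one object type touches exactly one table, with no joins
news_row = conn.execute(
    "SELECT name, author FROM news WHERE id = 1"
).fetchone()

# The cost shows up in the rarer cross-type query, which needs a UNION
live_items = conn.execute("""
    SELECT 'news' AS kind, name FROM news WHERE deleted = 0
    UNION ALL
    SELECT 'product', name FROM product WHERE deleted = 0
    ORDER BY kind, name
""").fetchall()

print(news_row, live_items)
```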

DB design to store different products for each customer order

I'm building a simple way to insert customer orders into the db.
We have several products, each one needs different properties.
I've started designing the following tables:
CUSTOMER -> Order (FK to CUSTOMER) -> OrderItem (FK to Order)
Now I'm wondering how I could link product-specific tables to OrderItem.
Suppose I've two products: product1 (room_name, width, height, color) and product2 (number, width, height, type, optionals). I'd create two different tables and link them with the OrderItem, to get specific options, am I wrong? (of course there will be more than just two products)
How can I do this?
I'd have one Product table with a one-to-many relationship between OrderItem and Product. Put a FOREIGN KEY in the OrderItem table that points to its associated Product.
A design like yours would mean you'd have to add a table every time there was a new product. That would not do. You want to add products by inserting new rows.
No approach can resolve all of the issues you may be dealing with; the choice you make depends on which factor is most important to you.
Most people shy away from having multiple tables. One reason is that you don't know how many tables you may end up with in the future. Another is that your queries may also bloat by having to join to multiple tables. And it may become a maintenance headache with multiple queries to update every time you add a table. Finally, adding a table is not even remotely as friendly as adding a record (do you really want your app to be able to create tables?).
One option is just to add more and more fields to the Product table. By making the property fields NULLable, different products can use different fields.
But... You may then need to add logic to ensure that ProductX -always- has a value in FieldA, but that ProductY always has a value in FieldB, etc. And probably some meta-data about each product type so that your application knows which fields to use for which products. You still may need to add new fields, which is possibly tidier than adding new tables, but is still probably not something you want the application doing.
An option that totally avoids using DDL to add a product is to further normalise your data, and have the product-specific-properties in an Entity-Attribute-Value table. This is initially very attractive to many people as it is so generic and flexible.
Product(id, name, another-global-property, etc)
Product_Properties(product_id, property_id, property_value)
You'll probably have some meta-data and extra logic to ensure all the correct properties are used. But now you just add records to a generic structure whenever you create a new product.
But what type should "property value" be? It may need to hold strings, dates, numbers, anything. You could make it a string and use the meta-data to know how to CAST the value. Or you may have several value fields, one of each type, and a "field_type_id" or something to indicate which value-field should be read from.
It's also less friendly for certain searches. If you know a product_id, finding the properties is easy. If you want all products where the expiry date is in the past, you need to be careful about how you structure the data and indexes to make the query efficient. But if you want (expiry < today AND cost > 50) then you get a much different query from what you are used to - Each value is in a different ROW instead of a different FIELD.
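To make the "different ROW instead of a different FIELD" point concrete, here is a sketch of the (expiry < today AND cost > 50) search against the EAV layout above. It uses Python's sqlite3 so it runs as-is; the property ids (1 for expiry, 2 for cost), the string-typed value column and the fixed "today" date are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Product_Properties (
    product_id     INTEGER NOT NULL REFERENCES Product(id),
    property_id    INTEGER NOT NULL,
    property_value TEXT NOT NULL,    -- everything stored as a string
    PRIMARY KEY (product_id, property_id)
);
INSERT INTO Product VALUES (1, 'Milk'), (2, 'Cheese'), (3, 'Bread');
-- property 1 = expiry date (ISO text), property 2 = cost
INSERT INTO Product_Properties VALUES
    (1, 1, '2024-01-01'), (1, 2, '60'),
    (2, 1, '2030-01-01'), (2, 2, '80'),
    (3, 1, '2024-02-01'), (3, 2, '5');
""")

EXPIRY, COST = 1, 2
today = "2024-06-01"  # fixed so the example is reproducible

# One self-join of the property table per attribute in the WHERE clause
expired_and_costly = [name for (name,) in conn.execute(
    """
    SELECT p.name
    FROM Product p
    JOIN Product_Properties e
      ON e.product_id = p.id AND e.property_id = ?
    JOIN Product_Properties c
      ON c.product_id = p.id AND c.property_id = ?
    WHERE e.property_value < ?                   -- ISO dates compare as text
      AND CAST(c.property_value AS REAL) > 50    -- cost must be cast back
    ORDER BY p.name
    """,
    (EXPIRY, COST, today),
)]
print(expired_and_costly)
```

Each additional condition adds another join on Product_Properties, which is where the query complexity comes from.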
Search performance really does begin to degrade as query complexity increases, and design considerations become more technical.
Which way you go depends on application functional requirement, architecture and design decisions, and a good helpful dash of 'taste'.
You have tagged the question as django, so you should read this recent post:
Coding an inventory system, with polymorphic items and manageable item types
In this post #ThibaultJ explains how to accomplish this with Django model utils.
The idea is that you have a 'product' model and you inherit product1 and product2 from this model, adding specific information for both. #ThibaultJ has posted interesting samples.
I will notify #ThibaultJ about this question. If #ThibaultJ writes an answer I will remove my post.
Here are some options
IMHO I would choose an Inheritance pattern, i.e. a new table called "ProductBase" with a unique Surrogate. ProductBase would have a classification, e.g. "ProductType", which would then allow you to join into the appropriate 'subclass' Product table. OrderItem would reference just the Surrogate. Referential Integrity is enforceable, and it gives the opportunity for extending to additional forms of products. It does however require the use of a common unique surrogate amongst all Product table types. If there are other tables (other than OrderItem) referencing Product, it would also avoid having to FK to composite keys.
Nullable Foreign Keys in OrderItem, i.e. OrderItem would have nullable FK to both (all) types of Product Tables, although only one of them would be present on each row.
Inner joining OrderItem to the appropriate Product table would eliminate the 'wrong' product joins based on the NULLs. RI can still be enforced.
If you have the SAME type of Primary Key on all your Product subclass tables, then you could also add a single Product "Foreign" Key and a "ProductType" "Switch" on OrderItem. The problem here is that you can't enforce RI.
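The nullable foreign key option above could be sketched as follows, using Python's sqlite3 so it is runnable. The product1 and product2 columns come from the question; the column names product1_id and product2_id and the CHECK constraint requiring exactly one of them are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE product1 (
    id INTEGER PRIMARY KEY, room_name TEXT, width INTEGER, height INTEGER, color TEXT
);
CREATE TABLE product2 (
    id INTEGER PRIMARY KEY, number INTEGER, width INTEGER, height INTEGER, type TEXT
);
CREATE TABLE OrderItem (
    id          INTEGER PRIMARY KEY,
    product1_id INTEGER REFERENCES product1(id),   -- nullable
    product2_id INTEGER REFERENCES product2(id),   -- nullable
    -- Exactly one of the two foreign keys must be filled in
    CHECK ((product1_id IS NOT NULL AND product2_id IS NULL)
        OR (product1_id IS NULL AND product2_id IS NOT NULL))
);
INSERT INTO product1 VALUES (1, 'Kitchen', 120, 240, 'white');
INSERT INTO product2 VALUES (1, 7, 80, 200, 'sliding');
INSERT INTO OrderItem VALUES (1, 1, NULL), (2, NULL, 1);
""")

# The inner join drops order items whose product1_id is NULL
product1_items = conn.execute("""
    SELECT oi.id, p.room_name
    FROM OrderItem oi
    JOIN product1 p ON p.id = oi.product1_id
    ORDER BY oi.id
""").fetchall()

# An order item pointing at both product tables is refused by the CHECK
try:
    conn.execute("INSERT INTO OrderItem VALUES (3, 1, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(product1_items, rejected)
```

Every new product table adds another nullable column and another term in the CHECK, which is one reason to group products into broad categories first.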
That said, I really wouldn't be creating a new table for each and every product - surely there are some broad 'categories' of Product which can be modelled in a uniform manner.
No doubt if you sell Aircraft and Groceries that you would probably need a AircraftProduct and a GroceryProduct, but surely A300, Boeing 747 and Cessna Skyhawk would fit as rows inside AircraftProduct, even if there are a few 'optional' nullable fields in each table not applicable to all products in this 'category'?
Edit : First see Dems and Duffmo's posts to see if you can avoid the requirement for having multiple Product tables at all, by using EAV / Multivalue / Metadata patterns to model Product.