Class hierarchy in MySQL

I'm creating a clothing shop, and for it I'm building gallery functionality that comes in two types: product galleries and lookbook galleries. A product gallery is simply pictures of single products, and a lookbook gallery is pictures that each contain multiple products.
So far I have a simplified UML diagram somewhat like this
I'm not sure how to translate this into MySQL tables. I've tried and came up with something like this
But it seems like overkill and smells funny to me. What would be best practice in my situation? Am I on the right track, or am I way off?

I don't know what "best practice" is here. You could use a NoSQL database, or, if you use a relational database, you could just use 3 tables: Galleries, Pictures and Products.
A Gallery can contain many Pictures.
A Picture can contain many Products.
The distinction between the types of galleries is just contained in an attribute.
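As a rough sketch of the three-table idea, the snippet below uses Python's built-in sqlite3 module as a stand-in for MySQL. All table and column names are made up for illustration, and the picture_products link table is an assumption about how "a picture can contain many products" would be stored when the same product may also appear in several pictures.

```python
import sqlite3

# SQLite stands in for MySQL here; every table and column name is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE galleries (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    -- the gallery type is just an attribute, not a separate table
    type TEXT NOT NULL CHECK (type IN ('product', 'lookbook'))
);
CREATE TABLE products (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE pictures (
    id         INTEGER PRIMARY KEY,
    gallery_id INTEGER NOT NULL REFERENCES galleries(id),
    file_name  TEXT NOT NULL
);
-- assumed link table: one picture can show many products
CREATE TABLE picture_products (
    picture_id INTEGER NOT NULL REFERENCES pictures(id),
    product_id INTEGER NOT NULL REFERENCES products(id),
    PRIMARY KEY (picture_id, product_id)
);
""")
conn.execute("INSERT INTO galleries VALUES (1, 'Summer lookbook', 'lookbook')")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, "Shirt"), (2, "Jeans")])
conn.execute("INSERT INTO pictures VALUES (1, 1, 'look1.jpg')")
conn.executemany("INSERT INTO picture_products VALUES (?, ?)", [(1, 1), (1, 2)])

# All products that appear in any lookbook gallery picture.
lookbook_products = [row[0] for row in conn.execute("""
    SELECT p.name
    FROM galleries g
    JOIN pictures pic ON pic.gallery_id = g.id
    JOIN picture_products pp ON pp.picture_id = pic.id
    JOIN products p ON p.id = pp.product_id
    WHERE g.type = 'lookbook'
    ORDER BY p.name
""")]
print(lookbook_products)
```

A product gallery would use the same tables, with each of its pictures linked to exactly one product.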

Firstly consider whether inheritance is the best way to implement the behavior you need. Generally it is best to prefer composition over inheritance. From your diagram I'd say that you really don't need inheritance at all to solve your problem.
If you do need to implement inheritance then there are a number of strategies you can use. It is a really good idea to look for an object-relational mapper, as a good one can make implementing the strategies below much easier. If you're using .NET then NHibernate or Entity Framework are good options. For Java, Hibernate is pretty good.
Table per class hierarchy
Here you'd create a single table for the entire class hierarchy. This makes most sense when the classes in the hierarchy all share many columns. You'd need to add a "discriminator" column so you could identify which subclass each row belongs to. In your example you'd have
gallery
picture
I'd say this strategy makes most sense for you as there don't appear to be many different columns between your sub-classes.
Table per subclass
In this example you'd create a table for each subclass. This makes most sense when the classes in the inheritance hierarchy don't share many common columns.
So you'd have tables like this:
product_gallery
lookbook_gallery
product_picture
lookbook_picture
Table per class
This is the strategy from your diagram. Like table per subclass, this is useful when each of the subclasses has different columns. The advantage over table per subclass is that it is easier to query the entire class hierarchy in one big join; the disadvantage is that you end up with lots of tables.

That database schema looks like it's doomed to fail. The most important thing to do when defining a schema is to start with the nouns and verbs of your problem. Describe in words what your environment is like. What entities are there? How do the entities interact with one another? For example, "a gallery has pictures." That statement alone tells you there is an association between a gallery and pictures.
Once you come up with your noun-verb associations, you can begin illustrating entities such as "gallery" and "picture." Is there a one-to-one relationship among galleries and pictures, or is there one gallery to many pictures?
Think on these things, and check out some basic design tips here: http://msdn.microsoft.com/en-us/library/b42dwsa3(v=vs.71).aspx

Related

Building Organizations with Subgroups

I'm struggling with a database design issue, and it's kind of a long-winded one:
My website will have an unlimited number of organizations that users can join, subgroups under those organizations, and finally specific profiles for those subgroups. Subgroups within the same organization will be able to borrow and make changes to profiles from each other. Users will generate the organizations, the subgroups, and the profiles.
I can draw it out and make the flow sensible on paper. When it comes to actually putting it into SQL, I'm lost. The majority of the help guides out there assume static groups, so a simple primary and foreign key setup can refer back to the right information. Mine has too much dynamic information for most of these to work outright, as I understand it.
Most writers say to stay away from dynamically generated tables, but that's where my instinct takes me. Another idea I had was 3 massive tables, one each for all Organizations, Groups, and Profiles.
So is there a better way to go about this? Or are there any good documents I should read up on to help me translate from drawing to actual code?
I have some experience with both SQL and MongoDB if that helps explain things.
I don't know about MongoDB (NoSQL), but from the SQL standpoint, here is my opinion.
As far as your schema goes: most of the time, when your "instinct" says that dynamic tables are your only option for a problem, there is a high chance that the very same problem can be solved with multiple static tables with different relationships. (By static I mean tables that you create yourself as a developer.)
I'd also like to mention that in my early days I always approached problems the same way, but then I started to understand the principles and how exactly databases work.
Back to your problem:
If your organisation hierarchy consists of three major types of objects/levels, viz. Organizations, Groups, and Profiles, then I'd suggest that you go with 3 tables with the correct relationships, which any SQL engine handles quite efficiently compared to creating tables at runtime.
Now, if the hierarchy is dynamic, say an organisation can contain many groups, which in turn contain profiles, which again can contain other organisations and so on, then you may want to look at a recursive structure in SQL (recursion). (Just do a Google search; there are a lot of articles about that.)
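One common way to model such a recursive structure is a single self-referencing table walked with a recursive common table expression. The sketch below runs on SQLite through Python's sqlite3 module; MySQL supports WITH RECURSIVE from version 8.0 onward. The nodes table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; the nodes table and its data are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES nodes(id),  -- NULL for a top-level node
    kind      TEXT NOT NULL,                 -- 'organization', 'group' or 'profile'
    name      TEXT NOT NULL
);
INSERT INTO nodes VALUES
    (1, NULL, 'organization', 'Acme'),
    (2, 1,    'group',        'Engineering'),
    (3, 2,    'profile',      'Backend'),
    (4, 1,    'group',        'Sales'),
    (5, NULL, 'organization', 'Other Org');
""")

# Start from one node and repeatedly pull in the rows whose parent is already in the result.
descendants = [row[0] for row in conn.execute("""
    WITH RECURSIVE subtree(id, name) AS (
        SELECT id, name FROM nodes WHERE id = ?
        UNION ALL
        SELECT n.id, n.name
        FROM nodes n JOIN subtree s ON n.parent_id = s.id
    )
    SELECT name FROM subtree ORDER BY id
""", (1,))]
print(descendants)
```

Because every level lives in the same static table, users can create arbitrarily deep hierarchies without any tables being generated at runtime.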

Database table relationship design

I am trying to write out a database design that includes the following relationships. I have tried to work them out from the top down, hierarchically, but the relationships seem to be better connected another way that I just cannot see or express.
(This comes from a FOUO system from work, so the names have been changed to reflect that classification, that's why the names may look odd.)
Each Branch 1:n Functional Areas,
Each Building 1:n Groups,
Each Group 1:n Units,
Each FunctionalArea 1:n Checklists,
Each Checklist 1:n Items, and
Each Unit 1:n Checklists and
This was solved by re-evaluating the relationships without concern for the size or data type they would hold. 1:n relationships were used in lieu of n:n.
When you are designing a database you need to be specific about the relationships. For example, you need to state things like "a functional area can belong to one branch only". These statements help to determine whether we are going to have 1:1 relations, 1:n relations, or something else.
However, I have come up with an answer.
One simple way I've used that would handle this is to have tables for each pairing: branch-function, building-function, building-group, group-unit, unit-checklist, checklist-item, keeping the objects and relationships separate.
It's basically tuple soup, but keeping that sorted is what a relational db is good at. Accesses will be doing primary-key joins on multiple tables. How large do you expect your dataset to grow?
The limits (100 checklists, etc) are policy. Design the schema for simplicity and performance, implement policy in the application layer.

(Somewhat) complicated database structure vs. simple — with null fields

I'm currently choosing between two different database designs. One is complicated and separates the data better than the simpler one. The more complicated design will require more complex queries, while the simpler one will have a couple of null fields.
Consider the examples below:
Complicated:
Simpler:
The above examples are for separating regular users and Facebook users (they will access the same data, eventually, but log in differently). In the first example, the data is clearly separated. The second example is way simpler, but will have at least one null field per row: facebookUserId will be null for a normal user, while username and password will be null for a Facebook user.
My question is: what's preferred? Pros/cons? Which one is easiest to maintain over time?
First, what Kirk said. It's a good summary of the likely consequences of each alternative design. Second, it's worth knowing what others have done with the same problem.
The case you outline is known in ER modeling circles as "ER specialization". ER specialization is just different wording for the concept of subclasses. The diagrams you present are two different ways of implementing subclasses in SQL tables. The first goes under the name "Class Table Inheritance". The second goes under the name "Single Table Inheritance".
If you do go with Class table inheritance, you will want to apply yet another technique, that goes under the name "shared primary key". In this technique, the id fields of facebookusers and normalusers will be copies of the id field from users. This has several advantages. It enforces the one-to-one nature of the relationship. It saves an extra foreign key in the subclass tables. It automatically provides the index needed to make the joins run faster. And it allows a simple easy join to put specialized data and generalized data together.
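Here is a minimal sketch of class table inheritance with a shared primary key, using Python's sqlite3 module as a stand-in for MySQL. The table names users, normalusers and facebookusers and the columns username, password and facebookUserId come from the question; display_name and the sample values are assumptions, and the password value is only a placeholder for a real hash.

```python
import sqlite3

# SQLite stands in for MySQL; display_name and all sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked to
conn.executescript("""
CREATE TABLE users (
    id           INTEGER PRIMARY KEY,
    display_name TEXT NOT NULL
);
-- In each subclass table the primary key is also the foreign key to users,
-- so a user can have at most one row per subclass table.
CREATE TABLE normalusers (
    id       INTEGER PRIMARY KEY REFERENCES users(id),
    username TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL
);
CREATE TABLE facebookusers (
    id             INTEGER PRIMARY KEY REFERENCES users(id),
    facebookUserId TEXT NOT NULL UNIQUE
);
""")
conn.execute("INSERT INTO users VALUES (1, 'Alice')")
conn.execute("INSERT INTO users VALUES (2, 'Bob')")
conn.execute("INSERT INTO normalusers VALUES (1, 'alice', 'placeholder-hash')")
conn.execute("INSERT INTO facebookusers VALUES (2, 'fb-1001')")

# Generalized and specialized data join directly on the shared key.
rows = conn.execute("""
    SELECT u.display_name, f.facebookUserId
    FROM users u JOIN facebookusers f ON f.id = u.id
""").fetchall()

# A subclass row without a matching users row is rejected.
try:
    conn.execute("INSERT INTO facebookusers VALUES (99, 'fb-9999')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(rows, orphan_rejected)
```

Note that nothing in this sketch stops the same id from appearing in both subclass tables; if the subclasses must be mutually exclusive, that needs an extra constraint or an application-level check.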
You can look up "ER specialization", "single-table-inheritance", "class-table-inheritance", and "shared-primary-key" as tags here in SO. Or you can search for the same topics out on the web. The first thing you will learn is what Kirk has summarized so well. Beyond that, you'll learn how to use each of the techniques.
Great question.
This applies to any abstraction you might choose to implement, whether in code or database. Would you write a separate class for the Facebook user and the 'normal' user, or would you handle the two cases in a single class?
The first option is the more complicated. Why is it complicated? Because it's more extensible. You could easily include additional authentication methods (a table for Twitter IDs, for example), or extend the Facebook table to include... some other facebook specific information. You have extracted the information specific to each authentication method into its own table, allowing each to stand alone. This is great!
The trade off is that it will take more effort to query, it will take more effort to select and insert, and it's likely to be messier. You don't want a dozen tables for a dozen different authentication methods. And you don't really want two tables for two authentication methods unless you're getting some benefit from it. Are you going to need this flexibility? Authentication methods are all similar - they'll have a username and password. This abstraction lets you store more method-specific information, but does that information exist?
The second option is just the reverse of the first. It's easier, but how will you handle future authentication methods, and what if you need to add some information specific to one authentication method?
Personally I'd try to evaluate how important this authentication component is to the system. Remember YAGNI - you aren't gonna need it - and don't overdesign. Unless you need that extensibility that the first option provides, go with the second. You can always extract it at a later date if necessary.
This depends on the database you are using. For example Postgres has table inheritance that would be great for your example, have a look here:
http://www.postgresql.org/docs/9.1/static/tutorial-inheritance.html
Now if you do not have table inheritance you could still create views to simplify your queries, so the "complicated" example is a viable choice here.
Now if you have infinite time then I would go for the first one (for this one simple example, and preferably with table inheritance).
However, this makes things more complicated and so will cost you more time to implement and maintain. If you have many table hierarchies like this it can also have a performance impact (as you have to join many tables). I once developed a database schema that made excessive use of such hierarchies (conceptually). We finally decided to keep the hierarchies conceptually but flatten them in the implementation, as it had gotten so complex that it was not maintainable anymore.
When you flatten the hierarchy you might consider not using null values, as this can also prove to make things a lot harder (alternatively you can use a -1 or something).
Hope these thoughts help you!
Warning bells are ringing loudly at the presence of two very similar tables, facebookusers and normalusers. What if you get a 3rd type? Or a 10th? This is insane.
There should be one user table with an attribute column to show the type of user. A user is a user.
Keep the data model as simple as you possibly can. Don't build too much kung fu into it via data structure. Leave that for the application, which is far easier to alter than a database!
Let me dare suggest a third. You could introduce 1 (or 2) tables that will cater for extensibility. I personally try to avoid designs that will introduce (read: pollute) an entity model with non-uniformly applicable columns. Have the third table (after the fashion of the EAV model) contain a many-to-one relationship with your users table to cater for multiple/variable user-related fields.
I'm not sure what your current/short-term needs are, but re-engineering your app to cater for, say, Twitter or LinkedIn users might be painful. Consider abstracting the content of the facebookUserId column into an attribute table like so:
user_attr{
id PK
user_id FK
login_id
}
Now, the above definition is ambiguous enough to handle your current needs. If done right, the EAV should look more like this:
user_attr{
id PK
user_id FK
login_id
login_id_type FK
login_id_status //simple boolean flag to set the validity of a given login
}
Here login_id_type is a foreign key to an attribute table listing the various login types you currently support. This gives you and your users flexibility, in that your users can have multiple logins using different external services without you having to change much of your existing system.
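The user_attr outline above might translate into something like the following sketch, run here on SQLite through Python's sqlite3 module as a stand-in for MySQL. The column names of user_attr follow the outline; the login_types table name, the UNIQUE constraint, the find_user helper and the sample data are assumptions added for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; login_types, find_user and the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE login_types (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE user_attr (
    id              INTEGER PRIMARY KEY,
    user_id         INTEGER NOT NULL REFERENCES users(id),
    login_id        TEXT NOT NULL,
    login_id_type   INTEGER NOT NULL REFERENCES login_types(id),
    login_id_status INTEGER NOT NULL DEFAULT 1,  -- 1 = valid, 0 = disabled
    UNIQUE (login_id, login_id_type)
);
INSERT INTO login_types VALUES (1, 'local'), (2, 'facebook'), (3, 'twitter');
INSERT INTO users VALUES (1, 'Alice');
-- one user, two logins from different services
INSERT INTO user_attr (user_id, login_id, login_id_type) VALUES
    (1, 'alice', 1),
    (1, 'fb-1001', 2);
""")

def find_user(login_id, login_type):
    """Return the user name for a valid login, or None if there is no match."""
    row = conn.execute("""
        SELECT u.name
        FROM user_attr a
        JOIN login_types t ON t.id = a.login_id_type
        JOIN users u ON u.id = a.user_id
        WHERE a.login_id = ? AND t.name = ? AND a.login_id_status = 1
    """, (login_id, login_type)).fetchone()
    return row[0] if row else None

print(find_user("fb-1001", "facebook"))
print(find_user("fb-1001", "twitter"))
```

Supporting a new service then means adding one row to login_types rather than a new column or table.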

Hierarchical Data - Nested Set Model: MySql

I am just learning how to implement the Nested Set Model but still have confusion with a certain aspect of it involving items that may be part of multiple categories. Given the example below that was pulled from HERE and mirrors many other examples I have come across...
How do you avoid duplication in the DB when you add Apples since they are multi-colored (i.e. Red, Yellow, Green)?
You do not avoid duplication: the apple (or a reference to the apple) will be placed twice in your tree, otherwise it won't be a tree but rather a graph. Your question is equally applicable if you build a... Swing JTree or an HTML tree ;).
The nested set model is just an efficient way to push and traverse a tree structure in a relational DB. It is not a data structure itself. It's more popular among MySQL users since MySQL lacks functionality for processing tree structures (e.g. like the one that Oracle provides).
Cheers!
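To make the traversal point concrete, here is a small nested set sketch run on SQLite through Python's sqlite3 module as a stand-in for MySQL. The categories table, the lft/rgt column names, and the tiny food tree are illustrative assumptions, not the example from the question.

```python
import sqlite3

# SQLite stands in for MySQL; the table, columns and rows are made up.
# Each node stores a left and right number; a node's descendants are exactly
# the rows whose lft falls between the node's own lft and rgt.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    lft  INTEGER NOT NULL,
    rgt  INTEGER NOT NULL
);
INSERT INTO categories VALUES
    (1, 'Food',  1, 8),
    (2, 'Fruit', 2, 5),
    (3, 'Apple', 3, 4),
    (4, 'Meat',  6, 7);
""")

def subtree(name):
    """Return the named node and all of its descendants, in tree order."""
    return [row[0] for row in conn.execute("""
        SELECT child.name
        FROM categories parent
        JOIN categories child ON child.lft BETWEEN parent.lft AND parent.rgt
        WHERE parent.name = ?
        ORDER BY child.lft
    """, (name,))]

print(subtree("Fruit"))
```

A whole subtree comes back from one non-recursive query, which is the appeal of the model; the cost is that each row has exactly one position in the tree, so an item that belongs under several parents has to be stored once per parent.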
The nested set model is a structure for 1:N (one-to-many) relationships; what you want is an M:N (many-to-many) relationship (many items can have apple as a parent, but an item can also have more than one parent).
See this article
Wikipedia
But you should be aware that hierarchical M:N relationships can get quite complex really fast!
Thinking out loud here, but perhaps it would be helpful to view some attributes (like Red, Yellow and Green) as 'tags' instead of 'categories' and handle them with separate logic. That would let you keep the Nested Set model and avoid unnecessary duplication. Plus, it would allow you to keep your categories simpler.
It's all in how you think about the information. Categories are just another way of representing attributes. I understand your example was just for illustrative purposes, but if you're going to categorize fruit by color, why would you not also categorize meat the same way, i.e., white meat and red meat? Most likely you would not. So my point is it's probably not necessary to categorize fruit by color, either.
Instead, some attributes are better represented in other ways. In fact, in its simplest form, it could be recorded as a column in the 'food' table labeled 'color'. Or, if it's a very common attribute and you find yourself duplicating the value significantly, it could be split off to a separate table named 'color' and mapped to each food item from a third table. Of course, the more abstract approach would be to generalize the table as 'tags' and include each color as an individual tag that can then be mapped to any food item. Then you can map any number of tags (colors) to any number of food items, giving you a true many-to-many relationship and freeing up your category designations to be more generalized as well.
I know there's ongoing debate about whether tags are categories or categories are tags, etc., but this appears to be one instance in which they could be complementary and create a more abstract and robust system that's easier to manage.
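The tag approach described above could look roughly like this, sketched with Python's sqlite3 module as a stand-in for MySQL. The food, tags and food_tags tables, the foods_tagged helper, and the sample rows are all hypothetical names chosen for the example.

```python
import sqlite3

# SQLite stands in for MySQL; all names and rows here are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE food (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- mapping table: any number of tags per food item, and vice versa
CREATE TABLE food_tags (
    food_id INTEGER NOT NULL REFERENCES food(id),
    tag_id  INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (food_id, tag_id)
);
INSERT INTO food VALUES (1, 'Apple'), (2, 'Banana'), (3, 'Cherry');
INSERT INTO tags VALUES (1, 'red'), (2, 'yellow'), (3, 'green');
INSERT INTO food_tags VALUES (1, 1), (1, 2), (1, 3), (2, 2), (3, 1);
""")

def foods_tagged(tag):
    """Return the names of all food items carrying the given tag."""
    return [row[0] for row in conn.execute("""
        SELECT f.name
        FROM food f
        JOIN food_tags ft ON ft.food_id = f.id
        JOIN tags t ON t.id = ft.tag_id
        WHERE t.name = ?
        ORDER BY f.name
    """, (tag,))]

# Apple is stored once even though it carries three color tags.
apple_rows = conn.execute("SELECT COUNT(*) FROM food WHERE name = 'Apple'").fetchone()[0]
print(foods_tagged("red"), foods_tagged("yellow"), apple_rows)
```

The category tree can then stay a plain nested set keyed on food.id, with colors handled entirely by the tag tables.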
Old thread, but I found a better answer to this problem.
Since an apple can have different colors, your structure is a graph, not a tree. The nested set model is not the right structure for that.
Since you mention in a comment that you're using MySQL, a better solution is to use the Open Query Graph engine (http://openquery.com/graph/doc), a MySQL plugin that lets you create a special table where you put the relationships, basically parentId and childId.
The magic is that you query this table with a special column, latch; the value passed for it in the query tells the OQGRAPH engine which command to execute. See the docs for details.

How do you know when you need separate tables?

How do you know when to create a new table for very similar object types?
Example:
To learn mysql I'm building a model solar system. For the purposes of my project, planets have many similar attributes to dwarf planets, centaurs, and comets. Dwarf planets are almost completely identical to planets. Centaurs and comets are only different from planets because their orbital path has more variation. Should I have a separate table for each type of object, or should they share tables?
The example is probably too simple, but I'm also interested in best practices. For instance, should I use separate tables just in case I want to make planets and dwarf planets different in the future, or are there any efficiency reasons for keeping them in the same table?
Normal forms are what you should be looking into. They are pretty much the convention for building tables.
Any design that doesn't break the first, second or third normal form is fine by me. That's a pretty long list of requirements though, so I suggest you go read it off the Wikipedia links above.
It depends on what type of information you want to store about the objects. If the information for all of them is the same, say orbit radius, mass and name, then you can use the same table. However, if there are different properties for each (say atmosphere composition for planets, etc.) then you can either use separate tables for each (not very normalized) or have one table for basic properties like orbit, mass and name and a second table for just the properties that are unique to planets (and a similar table for comets, etc. if needed). All objects would be in the first table but only planets would be in the second table and linked through a foreign key to the first table.
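The "one table for basic properties plus a second table for planet-only properties" layout might be sketched as follows, using Python's sqlite3 module as a stand-in for MySQL. The bodies and planet_details tables and their columns are hypothetical, and the orbit values are rough figures for illustration only.

```python
import sqlite3

# SQLite stands in for MySQL; table names, columns and values are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- shared properties: every object gets a row here
CREATE TABLE bodies (
    id              INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    body_type       TEXT NOT NULL,
    orbit_radius_au REAL NOT NULL
);
-- planet-only properties, linked back to bodies through a foreign key
CREATE TABLE planet_details (
    body_id    INTEGER PRIMARY KEY REFERENCES bodies(id),
    atmosphere TEXT NOT NULL
);
INSERT INTO bodies VALUES
    (1, 'Earth',  'planet', 1.0),
    (2, 'Mars',   'planet', 1.5),
    (3, 'Halley', 'comet',  17.8);
INSERT INTO planet_details VALUES
    (1, 'nitrogen, oxygen'),
    (2, 'carbon dioxide');
""")

# One query lists every object; planet-only columns are NULL for non-planets.
all_bodies = conn.execute("""
    SELECT b.name, b.body_type, p.atmosphere
    FROM bodies b
    LEFT JOIN planet_details p ON p.body_id = b.id
    ORDER BY b.orbit_radius_au
""").fetchall()
print(all_bodies)
```

If comets later need their own properties, a comet_details table can be added the same way without touching bodies.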
It's called Database Normalization.
There are many normal forms. By applying normalization you will go through the metadata (tables) and study the relationships between the data more clearly. By using the normalization techniques you will optimize the tables to prevent redundancy. This process will help you understand which entities to create based on the relationships between the different fields.
You should most likely split the data about a planet etc so that the shared (common) information is in another table.
E.g.
Common (Table)
    Diameter (Column)
    Mass (Column)
Planet (Table)
    Population (Column)
Comet (Table)
    Speed (Column)
Poor columns, I know. Have the Planet and Comet tables link to the Common data with a key.
This is definitely a subjective question. It sounds like you are already on the right lines of thinking. I would ask:
Do these objects share many attributes? If so, it's probably worth considering at the very least a base table to list them all in.
Does one object "extend" another - it has all the attributes of the other, plus some extras? If so, it might be worth adding another table with the extra attributes and a one-to-one mapping back to the base object.
Do both objects have many shared attributes and unshared attributes? If this is the case, maybe you need a single table plus a "data extension" system where each object can have a type or category that specifies any amount of extra attributes that may be associated with it.
Do the objects only share one or two attributes? In this case, they are probably dissimilar enough to separate into multiple tables.
You may also ask yourself how you are going to query the data. Will you ever want to get them all in the same list? It's always a good idea to combine data into tables with other data they will commonly be queried with. For example, an "attachments" table where the file can be an image or a video, instead of images and video tables, if you commonly want to query for all attachments. Don't split into multiple tables unless there is a really good reason.
If you will ever want to get planets and comets in one single query, they will pretty much have to be in the same table if you want the database to work efficiently. Inheritance should be handled inside your app itself :)
Here's my answer to a similar question, which I think applies here as well:
How do you store business activities in a SQL database?
There are many different ways to express inheritance in your relational model. For example, you can try to squish everything into one table and have a field that allows you to distinguish between the different types, or have one table for the shared attributes with relationships to child tables holding the specific attributes, etc. With either choice you're still storing the same information. This gap between a domain model and a relational model is what is called an impedance mismatch. Both choices have different trade-offs: for example, one table will be easier to query, but multiple tables will have higher data density.
In my experience it's best not to try to answer these questions from a database perspective, but let your domain model, and sometimes your application framework of choice, drive the table structure. Of course this isn't always a viable choice, especially when performance is concerned.
I recommend you start by drawing on paper the relationships you want to express and then go from there. Does the table structure you've chosen represent the domain accurately? Is it possible to query to extract the information you want to report on? Are the queries you've written complicated or slow? Answering these questions and others like them will hopefully guide you towards creating a good relational model.
I'd also suggest reading up on database normalization if you're serious about learning good relational modeling principles.
I'd probably have a table called [HeavenlyBodies] or some such thing. Then have a lookup table with the type of body, i.e. planet, comet, asteroid, star, etc. All will share similar things such as name, size, weight. Most of the answers I've read so far have good advice. Normalization is good, but I feel you can take it too far sometimes. 3rd normal form is a good goal.