What is the best practice for naming database tables to provide natural organization? - mysql

We put common prefixes on related tables to assure they display next to each other in our DB management software (Toad, Enterprise Manager, etc).
So for example, all user tables start with the word User:
User
UserEvent
UserPurchase
Ideally, in honor of the three great virtues of a programmer these tables should be named User, Event, Purchase respectively to save some typing, agreed?
Is this naming convention the best (only?) practice for grouping related tables together naturally?

I tend to go against the grain in naming conventions on two counts here...
I don't like using prefixes so that things group together in a given UI. To me the tables should be named such that in code they are easily readable and make sense. There have been numerous studies (mostly ignored by programmers) that show that things like using underscores, logical names, eliminating abbreviations, and eliminating things like prefixes have a very big impact on both comprehension and the speed of working with code. Also, if you use prefixes to break down tables what do you do when a table is used by multiple areas - or even worse it starts out in only one area but later starts to be used in another area. Do you now have to rename the table?
I almost completely ignore name lengths. I'm much more concerned about readability and understandability than I am about taking 1/10th of a second longer to type a column or table name. When you also keep in mind all of the time wasted trying to remember if you abbreviated "number" as "no", "num" or "nbr" and keep in mind the use of things like Intellisense, using longer, more meaningful names is a no brainer to me.
With those things in mind, if your UserEvents table really has to do with events related to a user then that name makes perfect sense. Using just Events would end up giving you a poorly named table because the name isn't clear enough in my opinion.
Hope this helps!

I wouldn't use a naming convention for purposes of alphabetizing table names. It's nice when it works out that way, but this shouldn't be by design.
Read Joe Celko's book "SQL Programming Style." His first chapter in that book is about naming conventions, guided by the ISO 11179 standard for metadata naming. One of his recommendations is to avoid unnecessary prefixes in your naming convention.

I would use this notation i.e "UserEvent" in case this table is a representation of many-to-many relation between "User" and "Event" tables.

I think it depends on how if/how you plan to expand upon your data model. For instance, what if you decided that you not only want to have User Events, but also, Account Events, or Login Events, and User, Accounts, and Logins are all different types of entities, but they share some Events. In that case, you would might want a normalized database model:
Users (Id int, Name varchar)
Accounts (Id int, Name varchar)
Logins (Id int, Name varchar)
Events (Id int, Name varchar)
UserEvents (UserId int, EventId int)
AccountEvents (AccountId int, EventId int)
LoginEvents (LoginId int, EventId int)

I guess that the naming convention in SQL is or at least should follow the same guidelines as used in other programming languages. I have always preferred variable names / class name that clearly identified the purpose and that usually means more typing. I think the importent thing to remember is that code is only written once but read a lot of times, so clarity is vital.
The last 4 or 5 years I have noticed that the variable names have grown from "eDS" to "employeeDataSet" and I think that IntelliSense is the main reason for that.
I think we will see the same happen in SQL since Red Gate SQL Prompt and now MS SQL2008 have introduced Intellisense into the mainstrem database world.

I also would avoid using prefixes in order to get meaning out of an alphabetically ordered list. It's generally bad data management to use names that way. If tables are closely related, then the software that manages the data model should be able to model relationships between tables, regardless of how the tables are named.
A name like UserEvents should be used when the content of each row describes a relationship between a User and an Event. If I saw such a table in a database that was new to me, I'd expect to see a primary key in this table made up of two foreign keys, one to the Users table and one to the Events table.
If I saw something different, I would have to slow down until I understood the designer's convention.
This last point raises some questions: home many new people will be coming up to speed on the database? What documentation will those people have available? How important are those new people to the success of the project that ijncludes the database, or the success of the database designer?

Personally I rename my tables like this:
Main tables : tbl_user
Reference tables : ref_profil
Related Tables : lnk_user_profil

I personally favor naming conditions similar to what you have listed there, as it is a very logical pattern, they are organized in a way that you can easily find them, and overall the extra bits of typing typically doesn't cause many issues or reductions in time.

Related

Is there a more efficient way to handle multi-valued attributes other than creating a relationship table?

I have three tables, tbl_school, tbl_courses and tbl_branches.
Each course can be taught in one or more branches of a school.
tbl_school has got:
id
school_name
total_branches
...
tbl_courses:
id
school_id
course_title
....
tbl_branches:
id
school_id
city
area
address
When I want to list all the branches of a school, it is a pretty straight forward JOIN.
However, each course will be taught in one or more branches or all the branches of the school and I need to store this information. Since there is a one-to-many relationship between tbl_courses and tbl_branches, I will have to create a new relationship table that maps each course record to it's respective branches.
When my users want to filter a course by city or area, this relationship table will be used.
I would like to know if this is the right approach or is there something better for my problem?
I was planning to store a JSON of branches of courses which would eliminate the relationship table and query would be much easier to find the city or area pattern in JSON string.
I am new to design patterns so kindly bear with me.
Issues
The table description you have given has a few errors, which need to be corrected first, after which my proposal will make more sense.
The use of a table prefix, especially tbl_, is incorrect. All the tables are tbl_s. If you do use a prefix, it is to group tables by Subject Area. Further, SQL allows a table qualifier when referring to any table in the code:
`... WHERE table_name.column_name = "something" ...
If you would like some advice re Naming Convention, please review this Answer.
Use singular, because the table name is supposed to refer to a row (relation), not to the content (we know it contains many rows). Then all the English used re the table_name makes sense. (Eg. refer my Predicates.)
You have some incorrect or extraneous columns. It is easier to give you a Data Model, than to explain each item. A couple of items do need explanation:
school.total_branches is a duplicate, because that value can easily be derived (by COUNT() of the Branches). It breaks Normalisation rules, and introduces an Update Anomaly, which can get "out of synch".
course.school_id is incorrect, given that each Branch may or may not teach a Course. That relation is 1 Course to many Branches, it should be in the new table you are contemplating.
By JSON, if you mean construct an array on the client instead of keeping the relations in the database, then no, definitely not. Data and relationships to data, should be implemented in the database. For many reasons, the most important of which is Integrity. Following that, you may easily drag it into the client, and keep it there for stream-performance purposes.
The table you are thinking about is an Associative Table, an ordinary Relational construct to relate ("map", "link") two parent tables, here Courses to Branches.
Data duplication is not prevented. Refer to the Keys is the Data Model.
ID columns you have do not provide row uniqueness, which the Relational Model demands. If that is not clear to you please read this Answer.
Solution
Here is the model.
Proposed School Data Model
Please review and comment.
I need to ensure that you understand the notation in IDEF1X models, that unlike non-standard diagrams: every little notch, tick and line means something very specific. If not, please got to the IDEF1X Notation link at the bottom right of the model.
Please check the Predicates carefully, they (a) explain the model, and (b) are used to verify it. It is a feedback loop. They have two separate benefits.
If you would like more information on Predicates, why they are relevant, please go to this Answer and read the Predicate section.
If you wish to thoroughly understand Predicates, with a view to understanding Data Modelling, consider that Data Model (latest version is linked at the top of the Answer) against those Predicates. Ie. see if you understand a database that you have never seen before, via the model plus Predicates.
The Relational Keys I have given provide the row uniqueness that is required for Relational databases, duplicate data must be prevented. Note that ID columns are simply not needed. The Relational Keys provide:
Data Integrity
Relational access to data (notice the ease of, and unlimited, joins)
Relational speed
None of which a Record Filing System (characterised by ID columns) has.
Column description:
I have implemented two address_lines. Obviously, that should not include city because that is a separate column.
I presume area means something like borough or county or the area that the school branch operates in. If it is a fixed geographic administrative region (my first two descriptors) then it requires a formal structure. If not (my third descriptor), ie. it is loose, or (eg) it spans counties, then a simple Lookup table is enough.
If you use formal administrative regions, then city must move into that structure.
Your approach with an additional table seems the simplest and most straightforward to me. I would not mix JSON in this.

Is using a Master Table for shared columns good practice for an entire database?

Below, I explain a basic design for a database I am working on. As I am not a DB, I am concerned if I am on a good track or a bad one so I wanted to float this on stack for some advice. I was not able to find a similar discussion that fit's my design.
In my database, every table is considered an entity. An Entity could be a customer account, a person, a user, a set of employee information, contractor information, a truck, a plane, a product, a support ticket, etc etc. Here are my current entities (Tables)...
People
Users
Accounts
AccountUsers
Addresses
Employee Information
Contractor Information
And to store information about these Entities I have two tables:
Entity Tables
-EntityType
-> EntityTypeID (INT)
-Entities
-> EntityID (BIGINT)
-> EnitityType (INT) : foreign key
Every table I have made has an Auto Generated primary key, and a foreign key on an entityID column to the entities table.
In the entities table I have some shared fields like,
DateCreated
DateModified
User_Created
User_Modified
IsDeleted
CanUIDelete
I use triggers on all of the table's to automatically create their entity entry with the correct entity type on inserts. And update triggers update the LastModified date.
From an application layer point of view, all the code has to do is worry about the individual entities (except for the USER_Modified/User_Created fields "it does updates on that" by joining on the entityID)
Now the reason for the entities table, is down the line I plan on having an EAV model, so every entity type can be extended with custom fields. It also serves as a decent place to store metadata about the entities (like the created/modified fields).
I'm just new to DB design, and want a 2nd opinion.
I plan on having an EAV model, so every entity type can be extended with custom fields.
Why? Do all your entities require to be extensible in this way? Probably not -- in most applications there are one or two entities at most that would benefit from this level of flexibility. The other entities actually benefit from the stability and clarity of not changing all the time.
EAV is an example of the Inner-Platform Effect:
The Inner-Platform Effect is a result of designing a system to be so customizable that it ends becoming a poor replica of the platform it was designed with.
In other words, now it's your responsibility to write application code to do all the things that a proper RDBMS already provides, like constraints and data types. Even something as simple as making a column mandatory like NOT NULL doesn't work in EAV.
It's true sometimes a project requires a lot of tables. But you're fooling yourself if you think you have simplified the project by making just two tables. You will still have just as many distinct Entities as you would have had tables, but now it's up to you to keep them from turning into a pile of rubbish.
Before you invest too much time into EAV, read this story about a company that nearly ceased to function because someone tried to make their data repository arbitrarily flexible: Bad CaRMa.
I also wrote more about EAV in a blog post, EAV FAIL, and in a chapter of my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
You haven't really given a design. If you had given a description of tables, the application-oriented criterion for when a row goes in of each them and consequent constraints including keys, fks etc for the part of your application involving your entities then you would have given part of a design. In other words, if you had given that part's straightforward relational design. (Just because you're not implementing it that way doesn't mean you don't need to design properly.) Notice that this must include application-level state and functionality for "extending with custom fields". But then you have to give a description of tables, the criterion for when a row goes in each of them and consequent constraints including keys, fks etc for the part of your implementation that encodes the previous part via EAV, plus operators for manipulating them. In other words, if you had given that part's straightforward relational design. The part of your design that is implementing a DBMS. Then you would really have given a design.
The notion that one needs to use EAV "so every entity type can be extended with custom fields" is mistaken. Just implement via calls that update metadata tables sometimes instead of just updating regular tables: DDL instead of DML.

Restructure Inventory Management Database (2 to 3 Tables; Development Stage)

I’m developing a database. I’d appreciate some help restructuring 2 to 3 tables so the database is both compliant with the first 3 normal forms; and practical to use and to expand on / add to in the future. I want to invest time now to reduce effort / and confusion later.
PREAMBLE
Please be aware that I'm both a nube, and an amateur, though I have a certain amount of experience and skill and an abundance of enthusiasm!
BACKGROUND TO PROJECT
I am writing a small (though ambitious!) web application (using PHP and AJAX to a MySQL database). It is essentially an inventory management system, for recording and viewing the current location of each individual piece of equipment, and its maintenance history. If relevant, transactions will be very low (probably less than 100 a day, but with a possibility of simultaneous connections / operations). Row count will also be very low (maybe a few thousand).
It will deal with many completely different categories of equipment, eg bikes and lamps (to take random examples). Each unit of equipment will have its details or specifications recorded in the database. For a bike, an important specification might be frame colour, whereas a lamp it might require information regarding lampshade material.
Since the categories of equipment have so little in common, I think the most logical way to store the information is 1 table per category. That way, each category can have columns specific to that category.
I intend to store a list of categories in a separate table. Each category will have an id which is unique to that category. (Depending on the final design, this may function as a lookup table and / or as a table to run queries against.) There are likely to be very few categories (perhaps 10 to 20), unless the system is particulary succesful and it expands.
A list of bikes will be held in the bikes table.
Each bike will have an id which is unique to that bike (eg bike 0001).
But the same id will exist in the lamp table (ie lamp 0001).
With my application, I want the user to select (from a dropdown list) the category type (eg bike).
They will then enter the object's numeric id (eg 0001).
The combination of these two ids is sufficient information to uniquely identify an object.
Images:
Current Table Design
Proposed Additional Table
PROBLEM
My gut feeling is that there should be an “overarching table” that encompasses every single article of equipment no matter what category it comes from. This would be far simpler to query against than god knows how many mini tables. But when I try to construct it, it seems like it will break various normal forms. Eg introducing redundancy, possibility of inconsistency, referential integrity problems etc. It also begins to look like a domain table.
Perhaps the overarching table should be a query or view rather than an entity?
Could you please have a look at the screenshots and let me know your opinion. Thanks.
For various reasons, I’d prefer to use surrogate keys rather than natural keys if possible. Ideally, I’d prefer to have that surrogate key in a single column.
Currently, the bike (or lamp) table uses just the first column as its primary key. Should I expand this to a composite key including the Equipment_Category_ID column too? Then make the Equipment_Article table into a view joining on these two columns (iteratively for each equipment category). Optionally Bike_ID and Lamp_ID columns could be renamed to something generic like Equipment_Article_ID. This might make the query simpler, but is there a risk of losing specificity? It would / could still be qualified by the table name.
Speaking of redundancy, the Equipment_Category_ID in the current lamp or bike tables seems a bit redundant (if every item / row in that table has the same value in that column).
It all still sounds messy! But surely this must be very common problem for eg online electronics stores, rental shops, etc. Hopefully someone will say oh that old chestnut! Fingers crossed! Sorry for not being concise, but I couldn't work out what bits to leave out. Most of it seems relevant, if a bit chatty. Thanks in advance.
UPDATE 27/03/2014 (Reply to #ElliotSchmelliot)
Hi Elliot.
Thanks for you reply and for pointing me in the right direction. I studied OOP (in Java) but wasn't aware that something similar was possible in SQL. I read the link you sent with interest, and the rest of the site/book looks like a great resource.
Does MySQL InnoDB Support Specialization & Generalization?
Unfortunately, after 3 hours searching and reading, I still can't find the answer to this question. Keywords I'm searching with include: MySQL + (inheritance | EER | specialization | generalization | parent | child | class | subclass). The only positive result I found is here: http://en.wikipedia.org/wiki/Enhanced_entity%E2%80%93relationship_model. It mentions MySQL Workbench.
Possible Redundancy of Equipment_Category (Table 3)
Yes and No. Because this is a lookup table, it currently has a function. However because every item in the Lamp or the Bike table is of the same category, the column itself may be redundant; and if it is then the Equipment_Category table may be redundant... unless it is required elsewhere. I had intended to use it as the RowSource / OptionList for a webform dropdown. Would it not also be handy to have Equipment_Category as a column in the proposed Equipment parent table. Without it, how would one return a list of all Equipment_Names for the Lamp category (ignoring distinct for the moment).
Implementation
I have no way of knowing what new categories of equipment may need to be added in future, so I’ll have to limit attributes included in the superclass / parent to those I am 100% sure would be common to all (or allow nulls I suppose); sacrificing duplication in many child tables for increased flexibility and hopefully simpler maintenance in the long run. This is particulary important as we will not have professional IT support for this project.
Changes really do have to be automated. So I like the idea of the stored procedure. And the CreateBike example sounds familiar (in principle if not in syntax) to creating an instance of a class in Java.
Lots to think about and to teach myself! If you have any other comments, suggestions etc, they'd be most welcome. And, could you let me know what software you used to create your UML diagram. Its styling is much better than those that I've used.
Cheers!
You sound very interested in this project, which is always awesome to see!
I have a few suggestions for your database schema:
You have individual tables for each Equipment entity i.e. Bike or Lamp. Yet you also have an Equipment_Category table, purely for identifying a row in the Bike table as a Bike or a row in the Lamp table as a Lamp. This seems a bit redundant. I would assume that each row of data in the Bike table represents a Bike, so why even bother with the category table?
You mentioned that your "gut" feeling is telling you to go for an overarching table for all Equipment. Are you familiar with the practice of generalization and specialization in database design? What you are looking for here is specialization (also called "top-down".) I think it would be a great idea to have an overarching or "parent" table that represents Equipment. Then, each sub-entity such as Bike or Lamp would be a child table of Equipment. A parent table only has the fields that all child tables share.
With these suggestions in mind, here is how I might alter your schema:
In the above schema, everything starts as Equipment. However, each Equipment can be specialized into Lamp, Bike, etc. The Equipment entity has all of the common fields. Lamp and Bike each have fields specific to their own type. When creating an entity, you first create the Equipment, then you create the specialized entity. For example, say we are adding the "BMX 200 Ultra" bike. We first create a record in the Equipment table with the generic information (equipmentName, dateOfPurchase, etc.) Then we create the specialized record, in this case a Bike record with any additional bike-specific fields (wheelType, frameColor, etc.) When creating the specialized entities, we need to make sure to link them back to the parent. This is why both the Lamp and Bike entities have a foreign key for equipmentID.
An easy and effective way to add specialized entities is to create a stored procedure. For example, lets say we have a stored procedure called CreateBike that takes in parameters bikeName, dateOfPurchase, wheelType, and frameColor. The stored procedure knows we are creating a Bike, and therefore can easily create the Equipment record, insert the generic equipment data, create the bike record, insert the specialized bike data, and maintain the foreign key relationship.
Using specialization will make your transactional life very simple. For example, if you want all Equipment purchased before 1/1/14, no joins are needed. If you want all Bikes with a frameColor of blue, no joins are needed. If you want all Lamps made of felt, no joins are needed. The only time you will need to join a specialized table back to the Equipment table is if you want data both from the parent entity and the specialized entity. For example, show all Lamps that use 100 Watt bulbs and are named "Super Lamp."
Hope this helps and best of luck!
Edit
Specialization and Generalization, as mentioned in your provided source, is part of an Enhanced Entity Relationship (EER) which helps define a conceptual data model for your schema. As such, it does not need to be "supported" per say, it is more of a design technique. Therefore any database schema naturally supports specialization and generalization as long as the designer implements it.
As far as your Equipment_Category table goes, I see where you are coming from. It would indeed make it easy to have a dropdown of all categories. However, you could simply have a static table (only contains Strings that represent each category) to help with this population, and still keep your Equipment tables separate. You mentioned there will only be around 10-20 categories, so I see no reason to have a bridge between Equipment and Equipment_Category. The fewer joins the better. Another option would be to include an "equipmentCategory" field in the Equipment table instead of having a whole table for it. Then you could simply query for all unique equipmentCategory values.
I agree that you will want to keep your Equipment table to guaranteed common values between all children. Definitely. If things get too complicated and you need more defined entities, you could always break entities up again down the road. For example maybe half of your Bike entities are RoadBikes and the other half are MountainBikes. You could always continue the specialization break down to better get at those unique fields.
Stored Procedures are great for automating common queries. On top of that, parametrization provides an extra level of defense against security threats such as SQL injections.
I use SQL Server. The diagram I created is straight out of SQL Server Management Studio (SSMS). You can simply expand a database, right click on the Database Diagrams folder, and create a new diagram with your selected tables. SSMS does the rest for you. If you don't have access to SSMS I might suggest trying out Microsoft Visio or if you have access to it, Visual Paradigm.

Implementing inheritance in MySQL: alternatives and a table with only surrogate keys

This is a question that has probably been asked before, but I'm having some difficulty to find exactly my case, so I'll explain my situation in search for some feedback:
I have an application that will be registering locations, I have several types of locations, each location type has a different set of attributes, but I need to associate notes to locations regardless of their type and also other types of content (mostly multimedia entries and comments) to said notes. With this in mind, I came up with a couple of solutions:
Create a table for each location type, and a "notes" table for every location table with a foreign key, this is pretty troublesome because I would have to create a multimedia and comments table for every comments table, e.g.:
LocationTypeA
ID
Attr1
Attr2
LocationTypeA_Notes
ID
Attr1
...
LocationTypeA_fk
LocationTypeA_Notes_Multimedia
ID
Attr1
...
LocationTypeA_Notes_fk
And so on, this would be quite annoying to do, but after it's done, developing on this structure should not be so troublesome.
Create a table with a unique identifier for the location and point content there, like so:
Location
ID
LocationTypeA
ID
Attr1
Attr2
Location_fk
Notes
ID
Attr1
...
Location_fk
Multimedia
ID
Attr1
...
Notes_fk
As you see, this is far more simple and also easier to develop, but I just don't like the looks of that table with only IDs (yeah, that's truly the only objection I have to this, it's the option I like the most, to be honest).
Similar to option 2, but I would have an enormous table of attributes shaped like this:
Location
ID
Type
Attribute
Name
Value
And so on, or a table for each attribute; a la Drupal. This would be a pain to develop because then it would take several insert/update operations to do something on a location and the Attribute table would be several times bigger than the location table (or end up with an enormous amount of attribute tables); it also has the same issue of the surrogate-keys-only table (just it has a "type" now, which I would use to define the behavior of the location programmatically), but it's a pretty solution.
So, to the question: which would be a better solution performance and scalability-wise?, which would you go with or which alternatives would you propose? I don't have a problem implementing any of these, options 2 and 3 would be an interesting development, I've never done something like that, but I don't want to go with an option that will collapse on itself when the content grows a bit; you're probably thinking "why not just use Drupal if you know it works like you expect it to?", and I'm thinking "you obviously don't know how difficult it is to use Drupal, either that or you're an expert, which I'm most definitely not".
Also, now that I've written all of this, do you think option 2 is a good idea overall?, do you know of a better way to group entities / simulate inheritance? (please, don't say "just use inheritance!", I'm restricted to using MySQL).
Thanks for your feedback, I'm sorry if I wrote too much and meant too little.
ORM systems usually use the following, mostly the same solutions as you listed there:
One table per hierarchy
Pros:
Simple approach.
Easy to add new classes, you just need to add new columns for the additional data.
Supports polymorphism by simply changing the type of the row.
Data access is fast because the data is in one table.
Ad-hoc reporting is very easy because all of the data is found in one table.
Cons:
Coupling within the class hierarchy is increased because all classes are directly coupled to the same table.
A change in one class can affect the table which can then affect the other classes in the hierarchy.
Space potentially wasted in the database.
Indicating the type becomes complex when significant overlap between types exists.
Table can grow quickly for large hierarchies.
When to use:
This is a good strategy for simple and/or shallow class hierarchies where there is little or no overlap between the types within the hierarchy.
One table per concrete class
Pros:
Easy to do ad-hoc reporting as all the data you need about a single class is stored in only one table.
Good performance to access a single object’s data.
Cons:
When you modify a class you need to modify its table and the table of any of its subclasses. For example if you were to add height and weight to the Person class you would need to add columns to the Customer, Employee, and Executive tables.
Whenever an object changes its role, perhaps you hire one of your customers, you need to copy the data into the appropriate table and assign it a new POID value (or perhaps you could reuse the existing POID value).
It is difficult to support multiple roles and still maintain data integrity. For example, where would you store the name of someone who is both a customer and an employee?
When to use:
When changing types and/or overlap between types is rare.
One table per class
Pros:
Easy to understand because of the one-to-one mapping.
Supports polymorphism very well as you merely have records in the appropriate tables for each type.
Very easy to modify superclasses and add new subclasses as you merely need to modify/add one table.
Data size grows in direct proportion to growth in the number of objects.
Cons:
There are many tables in the database, one for every class (plus tables to maintain relationships).
Potentially takes longer to read and write data using this technique because you need to access multiple tables. This problem can be alleviated if you organize your database intelligently by putting each table within a class hierarchy on different physical disk-drive platters (this assumes that the disk-drive heads all operate independently).
Ad-hoc reporting on your database is difficult, unless you add views to simulate the desired tables.
When to use:
When there is significant overlap between types or when changing types is common.
Generic Schema
Pros:
Works very well when database access is encapsulated by a robust persistence framework.
It can be extended to provide meta data to support a wide range of mappings, including relationship mappings. In short, it is the start at a mapping meta data engine.
It is incredibly flexible, enabling you to quickly change the way that you store objects because you merely need to update the meta data stored in the Class, Inheritance, Attribute, and AttributeType tables accordingly.
Cons:
Very advanced technique that can be difficult to implement at first.
It only works for small amounts of data because you need to access many database rows to build a single object.
You will likely want to build a small administration application to maintain the meta data.
Reporting against this data can be very difficult due to the need to access several rows to obtain the data for a single object.
When to use:
For complex applications that work with small amounts of data, or for applications where you data access isn’t very common or you can pre-load data into caches.

(Somewhat) complicated database structure vs. simple — with null fields

I'm currently choosing between two different database designs. One complicated which separates data better then the more simple one. The more complicated design will require more complex queries, while the simpler one will have a couple of null fields.
Consider the examples below:
Complicated:
Simpler:
The above examples are for separating regular users and Facebook users (they will access the same data, eventually, but login differently). On the first example, the data is clearly separated. The second example is way simplier, but will have at least one null field per row. facebookUserId will be null if it's a normal user, while username and password will be null if it's a Facebook-user.
My question is: what's prefered? Pros/cons? Which one is easiest to maintain over time?
First, what Kirk said. It's a good summary of the likely consequences of each alternative design. Second, it's worth knowing what others have done with the same problem.
The case you outline is known in ER modeling circles as "ER specialization". ER specialization is just different wording for the concept of subclasses. The diagrams you present are two different ways of implementing subclasses in SQL tables. The first goes under the name "Class Table Inheritance". The second goes under the name "Single Table Inheritance".
If you do go with Class table inheritance, you will want to apply yet another technique, that goes under the name "shared primary key". In this technique, the id fields of facebookusers and normalusers will be copies of the id field from users. This has several advantages. It enforces the one-to-one nature of the relationship. It saves an extra foreign key in the subclass tables. It automatically provides the index needed to make the joins run faster. And it allows a simple easy join to put specialized data and generalized data together.
You can look up "ER specialization", "single-table-inheritance", "class-table-inheritance", and "shared-primary-key" as tags here in SO. Or you can search for the same topics out on the web. The first thing you will learn is what Kirk has summarized so well. Beyond that, you'll learn how to use each of the techniques.
Great question.
This applies to any abstraction you might choose to implement, whether in code or database. Would you write a separate class for the Facebook user and the 'normal' user, or would you handle the two cases in a single class?
The first option is the more complicated. Why is it complicated? Because it's more extensible. You could easily include additional authentication methods (a table for Twitter IDs, for example), or extend the Facebook table to include... some other facebook specific information. You have extracted the information specific to each authentication method into its own table, allowing each to stand alone. This is great!
The trade off is that it will take more effort to query, it will take more effort to select and insert, and it's likely to be messier. You don't want a dozen tables for a dozen different authentication methods. And you don't really want two tables for two authentication methods unless you're getting some benefit from it. Are you going to need this flexibility? Authentication methods are all similar - they'll have a username and password. This abstraction lets you store more method-specific information, but does that information exist?
Second option is just the reverse the first. Easier, but how will you handle future authentication methods and what if you need to add some authentication method specific information?
Personally I'd try to evaluate how important this authentication component is to the system. Remember YAGNI - you aren't gonna need it - and don't overdesign. Unless you need that extensibility that the first option provides, go with the second. You can always extract it at a later date if necessary.
This depends on the database you are using. For example Postgres has table inheritance that would be great for your example, have a look here:
http://www.postgresql.org/docs/9.1/static/tutorial-inheritance.html
Now if you do not have table inheritance you could still create views to simplify your queries, so the "complicated" example is a viable choice here.
Now if you have infinite time than I would go for the first one (for this one simple example and prefered with table inheritance).
However, this is making things more complicated and so will cost you more time to implement and maintain. If you have many table hierarchies like this it can also have a performance impact (as you have to join many tables). I once developed a database schema that made excessive use of such hierarchies (conceptually). We finally decided to keep the hierarchies conceptually but flatten the hierarchies in the implementation as it had gotten so complex that is was not maintainable anymore.
When you flatten the hierarchy you might consider not using null values, as this can also prove to make things a lot harder (alternatively you can use a -1 or something).
Hope these thoughts help you!
Warning bells are ringing loudly with the presence of two the very similar tables facebookusers and normalusers. What if you get a 3rd type? Or a 10th? This is insane,
There should be one user table with an attribute column to show the type of user. A user is a user.
Keep the data model as simple as you possibly can. Don't build it too much kung fu via data structure. Leave that for the application, which is far easier to alter than altering a database!
Let me dare suggest a third. You could introduce 1 (or 2) tables that will cater for extensibility. I personally try to avoid designs that will introduce (read: pollute) an entity model with non-uniformly applicable columns. Have the third table (after the fashion of the EAV model) contain a many-to-one relationship with your users table to cater for multiple/variable user related field.
I'm not sure what your current/short term needs are, but re-engineering your app to cater for maybe, twitter or linkedIn users might be painful. If you can abstract the content of the facebookUserId column into an attribute table like so
user_attr{
id PK
user_id FK
login_id
}
Now, the above definition is ambiguous enough to handle your current needs. If done right, the EAV should look more like this :
user_attr{
id PK
user_id FK
login_id
login_id_type FK
login_id_status //simple boolean flag to set the validity of a given login
}
Where login_id_type will be a foreign key to an attribute table listing the various login types you currently support. This gives you and your users flexibility in that your users can have multiple logins using different external services without you having to change much of your existing system