Table design and class hierarchies - language-agnostic

Hopefully someone can shed some light on this issue, either through an example or perhaps some suggested reading. I'm wondering what the best design approach is for modeling tables after their class-hierarchy equivalents. This is best described through an example:
abstract class Card {
    private $_name = '';
    private $_text = '';
}

class MtgCard extends Card {
    private $_manaCost = '';
    private $_power = 0;
    private $_toughness = 0;
    private $_loyalty = 0;
}

class PokemonCard extends Card {
    private $_energyType = '';
    private $_hp = 0;
    private $_retreatCost = 0;
}
Now, when modeling tables to synchronize with this class hierarchy, I've gone with something very similar:
TABLE Card
    id   INT, AUTO_INCREMENT, PK
    name VARCHAR(255)
    text TEXT

TABLE MtgCard
    id        INT, AUTO_INCREMENT, PK
    card_id   INT, FK(Card.id)
    manacost  VARCHAR(32)
    power     INT
    toughness INT
    loyalty   INT

TABLE PokemonCard
    id          INT, AUTO_INCREMENT, PK
    card_id     INT, FK(Card.id)
    hp          INT
    energytype  ENUM(...)
    retreatcost INT
The problem I'm having is figuring out how to associate each Card record with the record containing its details in the corresponding subtype table. Specifically, how do I determine which table I should be looking in?
Should I add a VARCHAR column to Card to hold the name of the associated table? That's the only resolution that my peers and I have come to, but it seems too "dirty". Keeping the design extensible is the key here, allowing for the easy addition of new subclasses.
If someone could provide an example or resources showing a clean way of mirroring class/table hierarchies, it would be most appreciated.

Google "generalization specialization relational modeling". You'll find several excellent articles on the subject of how to model the gen-spec pattern using relational tables. This same question has been asked many times on SO, with slightly different details.
The best of these articles will confirm your decision to have one table for generalized data and separate tables for specialized data. The biggest difference will be the way they recommend using primary and foreign keys. Basically, they recommend that specialized tables have a single column that does double duty. It serves as the primary key to the specialized table, but it's also a foreign key that duplicates the PK of the generalized table.
This is a little complicated to maintain, but it's very sweet at join time.
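As a rough sketch of that shared-key layout against the question's tables (illustrative only; PokemonCard would follow the same shape):

-- Sketch: the subtype's primary key is also its foreign key.
CREATE TABLE Card (
    card_id INT AUTO_INCREMENT PRIMARY KEY,
    name    VARCHAR(255) NOT NULL,
    text    TEXT
);

CREATE TABLE MtgCard (
    card_id   INT PRIMARY KEY,   -- same value as Card.card_id, no AUTO_INCREMENT here
    manacost  VARCHAR(32),
    power     INT,
    toughness INT,
    loyalty   INT,
    FOREIGN KEY (card_id) REFERENCES Card (card_id)
);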
Also keep in mind that DDL is required when a new class is added to the hierarchy.

Basically don't.
Forget about class hierarchies, storage models, and anything that is specific to your app and your particular app language. Unless you want to use the RDb as a mere storage location for your files, a dependent slave.
If you want the power and flexibility (specifically extensibility) of the relational Database, then you need to model it independent of any app, and using RDb principles, not app language requirements. Leave your app context behind for a while and design the database as a database. Learn about them. Normalise (eliminate all duplication). Learn about the structures and rules, and implement them. When you do that, your queries and your "mapping", will be effortless. There will be no "impedance". Use the correct datatypes and there will be no mismatch.
The structure you require is an ordinary subtype-supertype. Those are Relational Database terms that have been in existence for over 30 years in the RM, and over 23 years in Relational Database products. No need to call them funny new names. Wikipedia is not an academic reference.
Given your tables, which are quite correct as a starting point (you've Normalised automatically), you need:
Rename Card.Id as Card.CardId
Remove the ids for the subtypes, they are 100% redundant; the CardId is both the PK and the FK.
Add a discriminator Card.CardType CHAR(1) or TINYINT. This identifies which subtype table to join to when the subtype is not otherwise known.
It appears you do not fully understand the concept of Foreign Keys, so that would be good to gear up on first. It is implemented here in its simple, ordinary form:
ALTER TABLE MtgCard
    ADD CONSTRAINT Card_MtgCard_fk
    FOREIGN KEY (CardId)
    REFERENCES Card (CardId);
The relation between Card and MtgCard or PokemonCard is always 1::1. The supertype is complete only when there is a Card plus { MtgCard | PokemonCard } with the same CardId. In your case there can be only one subtype, easy to enforce with a simple CHECK constraint.
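One declarative way to get that exclusivity (a sketch of my own, not necessarily the exact constraint this answer has in mind) is to carry the discriminator into each subtype and widen the foreign key:

-- The composite FK forces the subtype row to agree with Card's discriminator;
-- the CHECK pins each subtype table to its own CardType value.
ALTER TABLE Card
    ADD CONSTRAINT Card_CardId_CardType_uq UNIQUE (CardId, CardType);

ALTER TABLE MtgCard
    ADD COLUMN CardType CHAR(1) NOT NULL,
    ADD CONSTRAINT MtgCard_CardType_ck CHECK (CardType = 'M'),  -- enforced by MySQL only from 8.0.16
    ADD CONSTRAINT Card_MtgCard_sub_fk
        FOREIGN KEY (CardId, CardType) REFERENCES Card (CardId, CardType);

This composite FK would replace the single-column one shown above; PokemonCard gets the same treatment with its own CardType value.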
In other cases, more than one subtype is quite legal.
(In the linked example, the subtypes are Person Is a Teacher or Person Is a Student.)
In Relational Databases there is no concept of joining "from" or "to" (or up/down or left/right), those notions are only there to assist us humans; you can start with any table/key you have, and go to any table you need. The tables in-between are demanded only in the absence of Relational Identifiers (ie. where additional Surrogates, ID columns, are used as PKs instead of meaningful natural keys).
In the example, using your terms, you can go straight from Enrollment to Person (eg, to grab the LastName) or to Course (to grab the Name) without having to visit the intermediate tables; the relation lines are solid.
Now, class hierarchies ("Is" or "Is a") and anything else, are simple and effortless.
Quick Reference to Standard Relational Database Diagrams.

Related

SQL for one-to-one within a single table

I'd like to know the best way of reflecting a relation between precisely two rows of a single (My)SQL table.
Exemplified, we have:
table Person { id, name }
If I want to reflect that persons can be married monogamously (in liberal countries at least), is it better to use foreign keys within the Person:
table Person { id, name, spouse_id(FK(Person.id)) }
and then create stored procedures to marry and divorce Persons (ensuring mutual registration of the marriage or its annulment), plus triggers to handle ON DELETE events,
or use a mapping table:
table Marriage {
    spouse_a (FK(Person.id)),
    spouse_b (FK(Person.id) + constraint(NOT IN spouse_a))
}
This way divorces (deletes) would simply be delete queries without triggers to cascade, and marriage wouldn't require a stored procedure.
The constraint is to prevent polygamy / multi-marriage
I guess the second option is preferred? What is the best way to do this?
I need to be able to update this relation on and off, so it has to be manageable..
EDIT:
Thanks for the replies - in practice the application is physical point-to-point interfaces in networking, where it really is a 1:1 relationship (monogamous marriage), and change in government, trends etc will not change this :)
I'm going to use a separate table with A & B, having A < B checked..
To ensure monogamy, you simply want to ensure that the spouses are unique. So, this almost does what you want:
create table marriage (
spouse_a int not null unique,
spouse_b int not null unique
);
The only problem is that a given spouse can appear in either column. One normally handles this with a check constraint:
check (spouse_a < spouse_b)
Voila! Uniqueness for the relationship.
Unfortunately, MySQL (before 8.0.16) does not enforce check constraints. So you can implement this using a trigger or at the application layer.
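Putting the pieces together as one sketch (assuming Person.id is an INT primary key; the CHECK is only enforced on MySQL 8.0.16+, so on older versions move it into a trigger or the application):

CREATE TABLE marriage (
    spouse_a INT NOT NULL UNIQUE,
    spouse_b INT NOT NULL UNIQUE,
    PRIMARY KEY (spouse_a, spouse_b),
    CONSTRAINT marriage_order_ck CHECK (spouse_a < spouse_b),  -- canonical ordering
    CONSTRAINT marriage_a_fk FOREIGN KEY (spouse_a) REFERENCES Person (id),
    CONSTRAINT marriage_b_fk FOREIGN KEY (spouse_b) REFERENCES Person (id)
);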
Option #1 - Add relationships structurally
You can add one additional table for every conceivable relationship between two people. But then, when someone asks for a new relationship you forgot to add structurally, you'll need to add a new table.
And then, there will be relationships for three people at a time. And then four. And then, variable-size relationships. You name it.
Option #2 - Model relationships as tables
To make it foolproof (well... never fully possible) you could model the relationships in a new table. This table can have several properties, such as size, and you can also model restrictions on it. For example, you can decide to have a single person be the "leader of the cult" if you wish to.
This option requires more effort to design, but will withstand many more options and ideas from your client that you never thought of before.
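For illustration only (every name here is mine, not from the question), option #2 might start out like this:

-- Relationships as rows, not tables: one row per relationship,
-- one row per participant.
CREATE TABLE relationship_type (
    id       INT PRIMARY KEY AUTO_INCREMENT,
    name     VARCHAR(50) NOT NULL UNIQUE,  -- e.g. 'marriage'
    max_size INT NOT NULL                  -- e.g. 2 for monogamy
);

CREATE TABLE relationship (
    id      INT PRIMARY KEY AUTO_INCREMENT,
    type_id INT NOT NULL,
    FOREIGN KEY (type_id) REFERENCES relationship_type (id)
);

CREATE TABLE relationship_member (
    relationship_id INT NOT NULL,
    person_id       INT NOT NULL,
    is_leader       BOOLEAN NOT NULL DEFAULT FALSE,  -- e.g. the "leader of the cult"
    PRIMARY KEY (relationship_id, person_id),
    FOREIGN KEY (relationship_id) REFERENCES relationship (id),
    FOREIGN KEY (person_id) REFERENCES Person (id)
);

Rules such as max_size or "only one leader per relationship" still need triggers or application code; the tables just give you one consistent place to hang them.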

How can I implement a dual relationship (Null, Not Null) in MySQL without creating two classes?

If I have a TABLE that consists of a LIST of attributes of one TYPE, and another TABLE that is composed of the same TYPE of LIST, how can I implement them in MySQL without having to create two TABLES of the same TYPE?
For example:
Like the EMPLOYEE table, the COMPANY table has a list of ADDRESSES.
And I want to implement this without having to make one ADDRESS table for COMPANY and another for EMPLOYEE.
To me the solution seems to be a dual relationship where one of the foreign keys must be null while the other may not be, but I don't even know how to do it.
I believe your underlying hypothesis is flawed: a one-person Company, founded by an Employee, could be registered at the founder's personal address. Therefore an Employee may have the same address as a Company.
Likewise, two Employees may share the same address (a husband and a wife could be coworkers). Therefore I would define the relationship between address and each of the other two entities as a regular many-to-many, without any further condition.
You might be worried that changing the address of (eg.) a Company would wrongly alter the Employee's address. But instead of updating the Address entity, treat this case as creating a new Address and linking the (eg.) Company to this new Address.
Now, if you really need to implement the constraint, regardless of the (oh so dull ;) reality, I see no other option but implementing a form of inheritance as described here.
Notice that:
your initial design actually implements the Single Table Inheritance pattern
your alternative proposal (based on two separate address tables) actually implements the Concrete Table Inheritance pattern [1]
In your initial design, the constraint can be implemented with a simple CHECK constraint [2] on the address table, the condition being company_id IS NULL XOR employee_id IS NULL
[1] Migrating to the Class Table Inheritance pattern is "left as an exercise". However this pattern does not help in enforcing this constraint (in fact, this elegant pattern has serious additional limitations with enforcing integrity constraints in general). Nevertheless the constraint could still be enforced with a CHECK constraint.
[2] MySQL (before 8.0.16) does not enforce CHECK constraints, but a similar effect can be achieved through the use of triggers.
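As a sketch of that first variant, with the XOR condition from above (company_id and employee_id come from the condition; the remaining columns and the company/employee table names are assumed):

-- Single address table; each row belongs to exactly one of the two owners.
-- The CHECK is enforced by MySQL only from 8.0.16; before that, use triggers.
CREATE TABLE address (
    id          INT PRIMARY KEY AUTO_INCREMENT,
    street      VARCHAR(100) NOT NULL,
    city        VARCHAR(50) NOT NULL,
    company_id  INT NULL,
    employee_id INT NULL,
    FOREIGN KEY (company_id) REFERENCES company (id),
    FOREIGN KEY (employee_id) REFERENCES employee (id),
    CONSTRAINT address_owner_ck
        CHECK ((company_id IS NULL) XOR (employee_id IS NULL))
);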

Database Structure for Inconsistent Data

I am creating a database for my company that will store many different types of information. The categories are Brightness, Contrast, Chromaticity, etc. Each category has a number of data points which my company would like to start storing.
Normally, I would create a table for each category which would store the corresponding data. (This is how I learned to do it.) However, sometimes these categories have "sub-data" which would change the number of fields required in each table.
My question is then how do people handle the inconsistency of data when structuring their databases? Do they just keep adding more tables for extra data or is it something else altogether?
There are a few (and thank goodness only a few) unbendable rules about relational database models. One of those is that if you don't know what to store, you'll have a hard time storing it. Chances are, you'll have an even harder time retrieving it.
That said, the reality of business rules is often less clear cut than the ivory tower of database design. Most importantly, you might want or even need a way to introduce a new property without changing the schema.
Here are two feasible ways to go about this:
Use a datastore that specializes in loose or nonexistent schemas (NoSQL and friends). Explaining this in detail is the subject of a CS thesis, not a Stack Overflow answer.
My recommendation: use a separate properties table. Here is how this goes.
Assuming, for the sake of argument, your products always have a (unique string) name, an (integer) id, brightness, contrast and chromaticity, plus sometimes an (integer) foo and a (string) bar, consider these tables:
CREATE TABLE products (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    brightness INT,
    contrast INT,
    chromaticity INT,
    UNIQUE INDEX(name)
);
CREATE TABLE properties (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    proptype ENUM('null','int','string') NOT NULL DEFAULT 'null',
    UNIQUE INDEX(name)
);
INSERT INTO properties VALUES
    (0,'foo','int'),
    (0,'bar','string');
CREATE TABLE product_properties (
    id INT PRIMARY KEY AUTO_INCREMENT,
    products_id INT NOT NULL,
    properties_id INT NOT NULL,
    intvalue INT NOT NULL,
    stringvalue VARCHAR(250) NOT NULL,
    UNIQUE INDEX(products_id,properties_id)
);
Now your "standard" properties live in the products table as usual, while the "optional" properties are stored in rows of product_properties that reference the product id and property id, with the value sitting in intvalue or stringvalue.
Selecting products, including their foo (if any), would look like:
SELECT
    products.*,
    product_properties.intvalue AS foo
FROM products
LEFT JOIN product_properties
    ON products.id = product_properties.products_id
    AND product_properties.properties_id = 1
or even
SELECT
    products.*,
    product_properties.intvalue AS foo
FROM products
LEFT JOIN product_properties
    ON products.id = product_properties.products_id
LEFT JOIN properties
    ON product_properties.properties_id = properties.id
WHERE properties.name = 'foo' OR properties.name IS NULL
Please understand that this incurs a performance penalty - in fact you trade performance for flexibility: adding another property is nothing more than INSERTing a row into properties; the schema stays the same.
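For example, adding an optional property and giving a product a value for it is just two INSERTs (the product id 42 and the property 'baz' are made up for illustration):

-- New optional int property 'baz' -- no ALTER TABLE required.
INSERT INTO properties VALUES (0, 'baz', 'int');

-- Attach a value to product 42 (stringvalue stays empty for an int property).
INSERT INTO product_properties (products_id, properties_id, intvalue, stringvalue)
SELECT 42, p.id, 7, ''
FROM properties p
WHERE p.name = 'baz';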
If you're not MySQL-bound, other databases have table inheritance or arrays to solve some of these niche cases. PostgreSQL is a very nice database that you can use as easily and freely as MySQL.
With MySQL you could:
Change your tables, add the extra columns and allow NULL in the subcategory data that you don't need. This way integrity can still be checked, since you can still put constraints on the columns. Unless you really have a lot of subcategory columns, I'd recommend this; otherwise, option 3.
Store subcategory data dynamically in a separate table that has a category_id, a category_row_id, a subcategory identifier (= type of subcategory) and a value column. That way you can retrieve your data by linking via the category_id (determines the table) and the category_row_id (links to the PK of the original category-table row). The bad thing: you can't use foreign keys or constraints properly to enforce integrity; you'd need to write hairy insert/update triggers to keep some control, which pushes the burden of integrity and referential checking onto the client (in which case you'd probably be better off going the NoSQL route). In short, I wouldn't recommend this.
You can make a separate subcategory table per category table; columns can be fixed, or variable via value column(s) plus an optional subcategory identifier. Foreign keys can still be used, and integrity is easiest to maintain with fixed columns, since you'll have the full range of constraints at your disposal. If you have a lot of subcategory columns that would otherwise hopelessly clutter your regular category table, I'd recommend using this with fixed columns (see the sketch below). Like the previous option, I'd never recommend going dynamic for anything but throwaway data.
Alternatively, if your subcategory data is very variable and volatile, use NoSQL with a document database such as MongoDB. Mind you, you can keep all your regular data in a proper RDBMS and just store the side-data in the document database, though that's probably not recommended.
If your subcategory data is in a known fixed state and not prone to change, I'd just add the extra columns to the specific category table. Keep in mind that the major feature of a proper DBMS is safeguarding the integrity of your data via checks and constraints; doing away with that is never really a good idea.
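A rough sketch of option 3 (the separate, fixed-column subcategory table mentioned above) for one category; all names are invented for illustration:

-- One category table plus a fixed-column subcategory table, joined 1:1.
CREATE TABLE brightness (
    id          INT PRIMARY KEY AUTO_INCREMENT,
    measured_at DATETIME NOT NULL,
    reading     DECIMAL(10,3) NOT NULL
);

CREATE TABLE brightness_calibration (
    brightness_id     INT PRIMARY KEY,  -- PK doubles as FK to brightness
    reference_reading DECIMAL(10,3) NOT NULL,
    FOREIGN KEY (brightness_id) REFERENCES brightness (id)
);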
If you are not limited to MySQL, you can consider Microsoft SQL Server and its sparse columns. This will allow you to expand your schema to include however many columns you want, without incurring the storage penalty for columns that are not pertinent to a given row.

Many to many relationships with large amount of different tables

I am having trouble developing a piece of my database schema. Currently, my app has a table of users, and a another table of events. I can easily set up a many to many relationship (using a third table) to hold information about which users are attending which events.
My problem is that events is just one feature of my app. The goal is to have a large number of different programs a user can take part in, and each will need its own table. Yet I still need to be able to call up a list of everything the user is signed up for.
Right now, I am thinking about just making one-way relationships from each event table back to the user. I would then need to create a custom function (in my website's ORM) that queries each table independently and assembles a full list. I feel like this would be slow, so I am also entertaining the idea of creating a separate table that just lists all the programs that users sign up for, storing in there the info needed for my app to function. This would repeat info in my database, and in general doesn't sound as "clean", but would probably be faster.
Any suggestions as to the best way to handle relationships like this?
P.S. If it matters, I'm using Doctrine2 & Symfony2 to power my site.
In one of my web applications, I have used this kind of construct for storing comments for any table that has an integer primary key:
CREATE TABLE Comments (
    Table    VARCHAR(24) NOT NULL,
    RowID    BIGINT NOT NULL,
    Comments VARCHAR(2000) NOT NULL,
    PRIMARY KEY (Table, RowID, Comments)
);
In my case (DB2, less than 10 million rows in Comments table) it performs well.
So, applying it to your case:
CREATE TABLE Registration (
    Table  VARCHAR(24) NOT NULL,
    RowID  BIGINT NOT NULL,
    User   <datatype> NOT NULL,
    Signup TIMESTAMP NOT NULL,
    PRIMARY KEY (Table, RowID, User)
);
So, the 'Table' column identifies the table containing the program (say, the 'Events' table), and 'RowID' is the primary key in that table (e.g. the PK of an entry in 'Events'). To perform well, this requires the primary key to be of the same datatype in all target tables.
NoSQL solutions are cool, but the pattern above works in a plain old relational database.
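Pulling a user's full list is then a single query against Registration, with an optional join per program table (a sketch; MySQL syntax, a user id of 123 and an Events table with an id primary key are assumed):

-- Everything user 123 is signed up for, regardless of program type
-- (backticks because Table is a reserved word in MySQL):
SELECT `Table`, RowID, Signup
FROM Registration
WHERE User = 123;

-- Details for one program type, e.g. events:
SELECT e.*, r.Signup
FROM Registration r
JOIN Events e ON e.id = r.RowID
WHERE r.User = 123
  AND r.`Table` = 'Events';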
What is unique about these event types that requires them to have their own table?
If the objects are so inherently different, make the object as simple as possible, with only those things common to all Events:
public class Event
{
    public Guid Id;
    public string Title;
    public DateTime Date;
    public string Type;
    public string TypeSpecificData; // serialized JSON/XML
}

// Not derived from Event, but built from it.
public class SpecialEventType
{
    public Guid Id;
    // ... and the other common props from Event

    // some kind of special prop parsed from the Event's serialized data
    public string SpecialField;
}
The "type specific data" could then be used to store details about events that are not in common (details that would normally require extra columns or new tables), stored as something like serialized XML or JSON.
Map the Event table many-to-many (MTM) to your Users table, and query by the basic event properties and type.
Your code is then responsible for parsing the data using its Type property and some predefined XML schema you associate with it.
Very simple; it keeps your database nice, clean and fast, and it minimizes round trips. The tradeoff is that you lose the ability to query the DB for the specifics of a certain Event type... but for large-scale applications with mature ORM layers, the performance tradeoff alone is worth it...
For example, now you query your data once for Events of a particular Type, build your pseudo-derived types from it, and then "query" them using LINQ.
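On the storage side, that design might boil down to something like this (a sketch; the table and column names are mine, and a users table with an INT id is assumed):

-- One generic event table plus a user<->event join table; everything
-- type-specific lives in the serialized type_data column and is parsed in code.
CREATE TABLE event (
    id        CHAR(36) PRIMARY KEY,  -- GUID
    title     VARCHAR(255) NOT NULL,
    date      DATETIME NOT NULL,
    type      VARCHAR(50) NOT NULL,
    type_data TEXT                   -- serialized JSON/XML
);

CREATE TABLE user_event (
    user_id  INT NOT NULL,
    event_id CHAR(36) NOT NULL,
    PRIMARY KEY (user_id, event_id),
    FOREIGN KEY (user_id) REFERENCES users (id),
    FOREIGN KEY (event_id) REFERENCES event (id)
);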
Unless you have a ridiculous amount of types of events, querying events a user is signed up for from a few tables should not be much slower than querying the same thing from one long table of all the events.
I would take this approach: each table or collection has a user_id field which maps back to the Users table. You don't really need to create a separate function in the ORM. If each of the event types inherits from an event class, then you can just find all events by user_id.

Database design: objects with different attributes

I'm designing a product database where products can have very different attributes depending on their type, but attributes are fixed for each type and types are not manageable at all. E.g.:
magazine: title, issue_number, pages, copies, close_date, release_date
web_site: name, bandwidth, hits, date_from, date_to
I want to use InnoDB and enforce database integrity as much as the engine allows. What's the recommended way to handle this?
I hate those designs where tables have 100 columns and most of the values are NULL so I thought about something like this:
product_type
============
product_type_id INT
product_type_name VARCHAR
product
=======
product_id INT
product_name VARCHAR
product_type_id INT -> Foreign key to product_type.product_type_id
valid_since DATETIME
valid_to DATETIME
magazine
========
magazine_id INT
title VARCHAR
product_id INT -> Foreign key to product.product_id
issue_number INT
pages INT
copies INT
close_date DATETIME
release_date DATETIME
web_site
========
web_site_id INT
name VARCHAR
product_id INT -> Foreign key to product.product_id
bandwidth INT
hits INT
date_from DATETIME
date_to DATETIME
This can handle cascaded product deletion but... Well, I'm not fully convinced...
This is a classic OO design to relational tables impedance mismatch. The table design you've described is known as 'table per subclass'. The three most common designs are all compromises compared to what your objects actually look like in your app:
Table per concrete class
Table per hierarchy
Table per subclass
The design you don't like - "where tables have 100 columns and most of the values are NULL" - is #2, table per hierarchy: one table stores the whole specialization hierarchy. This is the least flexible for all kinds of reasons, including that if your app requires a new sub-class, you need to add columns. The design you describe accommodates change much better, because you can extend it by adding a new sub-class table described by a value in product_type.
The remaining option - #1, table per concrete class - is usually undesirable because of the duplication involved in implementing all the common fields in each specialization table. The advantages are that you won't need to perform any joins, and the sub-class tables can even be on different DB instances in a very large system.
The design you described is perfectly viable. The variation below is how it might look if you were using an ORM tool to do your CRUD operations. Notice how the ID in each sub-class table IS the FK value to the parent table in the hierarchy. A good ORM will automatically manage the correct sub-class table CRUD based on the discriminator values in product.id and product.product_type_id alone. Whether you are planning on using an ORM or not, look at Hibernate's joined-subclass documentation, if only to see the design decisions they made.
product
=======
id INT
product_name VARCHAR
product_type_id INT -> Foreign key to product_type.product_type_id
valid_since DATETIME
valid_to DATETIME
magazine
========
id INT -> Foreign key to product.id
title VARCHAR
..
web_site
========
id INT -> Foreign key to product.id
name VARCHAR
..
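In concrete InnoDB terms (the engine the asker wants), the joined-subclass variant might look like the sketch below, assuming the product_type table from the question; cascaded deletion of a product then removes its magazine or web_site row as well.

CREATE TABLE product (
    id              INT PRIMARY KEY AUTO_INCREMENT,
    product_name    VARCHAR(255) NOT NULL,
    product_type_id INT NOT NULL,
    valid_since     DATETIME NOT NULL,
    valid_to        DATETIME NULL,
    FOREIGN KEY (product_type_id) REFERENCES product_type (product_type_id)
) ENGINE=InnoDB;

CREATE TABLE magazine (
    id           INT PRIMARY KEY,  -- same value as product.id
    title        VARCHAR(255) NOT NULL,
    issue_number INT,
    pages        INT,
    copies       INT,
    close_date   DATETIME,
    release_date DATETIME,
    FOREIGN KEY (id) REFERENCES product (id) ON DELETE CASCADE
) ENGINE=InnoDB;

-- web_site follows the same shape with its own columns.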
You seem to be roughly on the right track, except that you may need to consider the difference between "a product" and what's often called "a stock-keeping unit" (SKU). Is a 25-units box of paper clips (of a certain specific kind) the same "product" as a 50-units box thereof? In terms of a store, or any kind of inventory system, the distinction matters; in some cases, indeed, a simple distinction in packaging of what's otherwise the same amount of the same underlying "product" may give you distinct SKUs to keep track of.
You need to decide where you want to keep track of this issue, if it matters to your application (it may be OK to have the products laid out as you do, and deal with packaging for SKU purposes in other tables, for example, even though for some apps that might be a slight overhead).
This is actually a standard way to "enforce" a sort of OO design in a classical RDBMS.
All the "common" attributes go on the master table (e.g. Price, if it is maintained at the product level, could easily be part of the main table) while the specifics go on a subtable.
In theory, if you have sub-sub-types (e.g. magazines could be subtyped into daily newspapers and four-colour periodicals, with periodicals having a date interval for shelf life), you could add one or more sublevels too...
This is a pretty common (and proven) design. The only concern is that the master table will always be joined with at least one subtable for most operations. If you have zillions of items this could have performance implications.
On the other hand, a common operation like deleting an item (I'd suggest a logical deletion, setting a flag to "true" on the master table) is done just once on the master table, and works for every kind of subtype.
Anyway, go for it. And maybe google around for "object oriented to RDBMS mappings" or some such for a complete discussion.