Modeling the storage of multiple data types that also have parent child relationships - mysql

I'm trying to design a MySQL database for a project I've started but I cannot figure out the best way to do it.
It's an OOP system that contains different types of objects, all of which need to be stored in the database. But those objects also need to maintain parent child relationships with one another. I also want the flexibility to easily add new data types once the system is in production.
As far as I can see I have three options, one that is pure relational, one which I think is entity attribute value (I don't properly understand EAV) and the last is a hybrid design that I've thought of myself, but I assume has already been thought of before and has a proper name.
The relational design would consist of two tables, one large table with columns that allowed it to store any type of object and a second table to maintain the parent child relationships of the rows in the first table.
The EAV design would have two tables, one being an EAV table with the three columns (Entity id, Attribute and Value), the second table would then relate the parent child relationships of these entities.
The hybrid design would have a table for each type of object, then a parent child relation table that would have to store the id of the parent, child and some sort of identifier of the tables that these id's come from.
I'm sure this problem has been tackled and solved hundreds of times before and I would appreciate any references so I can read about the solutions.

This is the only truly relational design:
CREATE TABLE Objects (
object_id INT AUTO_INCREMENT PRIMARY KEY,
parent_object_id INT,
-- also attribute columns common to all object types
FOREIGN KEY (parent_object_id) REFERENCES Objects (object_id)
);
CREATE TABLE RedObjects (
object_id INT PRIMARY KEY,
-- attribute columns for red objects
FOREIGN KEY (object_id) REFERENCES Objects (object_id)
);
CREATE TABLE BlueObjects (
object_id INT PRIMARY KEY,
-- attribute columns for blue objects
FOREIGN KEY (object_id) REFERENCES Objects (object_id)
);
CREATE TABLE YellowObjects (
object_id INT PRIMARY KEY,
-- attribute columns for yellow objects
FOREIGN KEY (object_id) REFERENCES Objects (object_id)
);
But MySQL versions before 8.0 do not support recursive queries, so if you need complex queries, for instance to fetch the whole tree, you'll need another method to store the relationships. I suggest a Closure Table design:
CREATE TABLE Paths (
ancestor_id INT,
descendant_id INT,
length INT DEFAULT 0,
PRIMARY KEY (ancestor_id, descendant_id),
FOREIGN KEY (ancestor_id) REFERENCES Objects (object_id),
FOREIGN KEY (descendant_id) REFERENCES Objects (object_id)
-- this may need additional indexes to support different queries
);
I describe more about the Closure Table here:
My answer to What is the most efficient/elegant way to parse a flat table into a tree?
My presentation Models for Hierarchical Data with SQL and PHP
My book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
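To make the Closure Table idea concrete, here is a minimal sketch of how the Paths rows could be maintained and queried. It is only an illustration under some assumptions: it uses Python's built-in sqlite3 module in place of MySQL so it runs anywhere, the name column and the add_object and subtree helpers are hypothetical, and the type-specific tables are left out.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the Paths columns follow the schema above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Objects (
    object_id INTEGER PRIMARY KEY,
    name TEXT  -- hypothetical attribute column, for illustration only
);
CREATE TABLE Paths (
    ancestor_id INTEGER NOT NULL REFERENCES Objects (object_id),
    descendant_id INTEGER NOT NULL REFERENCES Objects (object_id),
    length INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (ancestor_id, descendant_id)
);
""")


def add_object(conn, name, parent_id=None):
    """Insert an object plus the path rows linking it to itself and every ancestor."""
    new_id = conn.execute("INSERT INTO Objects (name) VALUES (?)", (name,)).lastrowid
    # Every node is its own ancestor at length 0.
    conn.execute(
        "INSERT INTO Paths (ancestor_id, descendant_id, length) VALUES (?, ?, 0)",
        (new_id, new_id),
    )
    if parent_id is not None:
        # Copy each path that ends at the parent, extended by one step.
        conn.execute(
            """INSERT INTO Paths (ancestor_id, descendant_id, length)
               SELECT ancestor_id, ?, length + 1 FROM Paths WHERE descendant_id = ?""",
            (new_id, parent_id),
        )
    return new_id


def subtree(conn, root_id):
    """Return (name, depth) for the root and all its descendants in one plain query."""
    return conn.execute(
        """SELECT o.name, p.length FROM Paths p
           JOIN Objects o ON o.object_id = p.descendant_id
           WHERE p.ancestor_id = ?
           ORDER BY p.length, o.object_id""",
        (root_id,),
    ).fetchall()


root = add_object(conn, "root")
child = add_object(conn, "child", root)
grandchild = add_object(conn, "grandchild", child)
sibling = add_object(conn, "sibling", root)
tree = subtree(conn, root)
```

The point of the design is visible in subtree: fetching a whole branch is a single join on Paths, with no recursion needed. The cost is the extra rows written on every insert.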

Yes you can very well use the EAV design. It works for the application we created, although after about 3 years of refinement.
You can also use a generic table structure and use any particular table for a group of objects. Or just create one generic table for each object.
Which Table for which Object is part of a metadata repository.
If you use a val_int, val_string type of structure, you will have NULL columns except where the value is stored. MS SQL Server has a sparse columns feature which you might consider using. Disk space is fairly cheap these days, so the only drawback you have vis-a-vis a traditional structure is NxR rows (say R attributes per object) instead of N rows.
Other than that, a few things to look out for are object instance GUIDs, dynamic SQL generation...
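As a rough illustration of the val_int / val_string structure and the NxR row count, here is a small sketch. It assumes Python's built-in sqlite3 module rather than any particular server, and the eav table, its column names and the sample objects are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE eav (
    entity_id INTEGER NOT NULL,
    attribute TEXT NOT NULL,
    val_int INTEGER,   -- set only for integer attributes, NULL otherwise
    val_string TEXT,   -- set only for string attributes, NULL otherwise
    PRIMARY KEY (entity_id, attribute)
);
""")

# Hypothetical sample data: N = 2 objects with R = 2 attributes each.
objects = {
    1: {"name": "lamp", "watts": 40},
    2: {"name": "desk", "watts": 0},
}
for entity_id, attrs in objects.items():
    for attribute, value in attrs.items():
        if isinstance(value, int):
            row = (entity_id, attribute, value, None)
        else:
            row = (entity_id, attribute, None, value)
        conn.execute("INSERT INTO eav VALUES (?, ?, ?, ?)", row)

# N x R rows instead of N rows.
row_count = conn.execute("SELECT COUNT(*) FROM eav").fetchone()[0]

# Pivot back to one row per entity with conditional aggregation.
pivoted = conn.execute("""
    SELECT entity_id,
           MAX(CASE WHEN attribute = 'name' THEN val_string END) AS name,
           MAX(CASE WHEN attribute = 'watts' THEN val_int END) AS watts
    FROM eav
    GROUP BY entity_id
    ORDER BY entity_id
""").fetchall()
```

The pivot query is the kind of statement that usually ends up being generated dynamically from the metadata repository, since the attribute list is not known when the code is written.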

Related

What is adding a new relationship to an existing table called?

In database terms, when i add a new foreign key, insert a record for that foreign key and update the existing record, what is the process called? My goal is to be able to find answers more effectively.
-- create temporary linking key
alter table example add column example_foreign_key int unsigned null;
-- contains more fields
insert into example_referenced_table (example_id, ...)
select id, ...
from example
join ...;
-- link with the table
update example join example_referenced_table on example_id = example.id
set example.example_foreign_key = example_referenced_table.id;
-- drop linking key
alter table example_referenced_table drop column example_id;
It looks like you're substituting one surrogate identifier for another. Introducing a surrogate key is sometimes (incorrectly) called normalization, so you may get some hits on that term.
In rare cases, normalization requires the introduction of a surrogate key, but in most cases, it simply decomposes a relation (table) into two or more, in such a way that no information is lost.
Surrogate keys are generally used when a natural or candidate key doesn't exist, isn't convenient, or isn't supported (e.g. composite keys are often a problem for object-relational mappers). For criteria on picking a good primary key, see: What are the design criteria for primary keys
There's little value in substituting one surrogate identifier for another, so the procedure you demonstrate has no proper name as far as I know, at least in the relational model.
If you mean to introduce a surrogate key as an identifier of a new entity set to which the original attribute is transferred, that's close to what Peter Chen called shifting a value set from the lower conceptual domain to the upper conceptual domain. You can find more information in his paper "The entity-relationship model - A basis for the enterprise view of data".
As for your question's title, it's not wrong to say that you're adding a relationship to a table (though that wording mixes conceptual and physical terms), but note that in the entity-relationship model, a relationship is represented by two or more entity keys in a table (e.g. (id, example_foreign_key) in the example table) and not by a foreign key constraint between tables. The association of relationships with foreign key constraints came from the network data model, which is older than both the relational and entity-relationship models.

Create a compound Key including a lookup value

I have not used Access for many years, but I need what I thought was a simple DB and I am struggling at an early stage.
I have a table that represents objects that can belong to multiple systems and may have some different attributes between the systems.
So my initial idea was to have a table for Systems, a table for Objects, and a child table for Object Attributes.
My Objects table currently is 3 fields:
Key : Autogenerated
Object Name : Text
System : Lookup to system table
Object Name will not be unique as it can appear in multiple systems, so I want to create a unique compound key of the Object Name & System field. This key will be what joins this table to the child Object Attributes table.
My problem is that the System lookup field does not appear in the available fields to use when trying to create a compound key.
Could someone tell me where I am going wrong?

Decorating an existing relational SQL database with NoSql features

We have a relational database (MySql) with a table that stores "Whatever". This table has many fields that store properties of different (logical and data-) types. The request is that another 150 new, unrelated properties are to be added.
We certainly do not want to add 150 new columns. I see two other options:
Add a simple key-value table (ID, FK_Whatever, Key, Value and maybe Type) where *FK_Whatever* references the Whatever ID and Key would be the name of the property. Querying with JOIN would work.
Add a large text field to the Whatever table and serialize the 150 new properties into it (as Xml, maybe). That would, in a way, be the NoSql way of storing data. Querying those fields would mean implementing some smart full text statements.
Type safety is lost in both cases, but we don't really need that anyway.
I have a feeling that there is a smarter solution to this common problem (we cannot move to a NoSql database for various reasons). Does anyone have a hint?
In an earlier project where we needed to store arbitrary extended attributes for a business object, we created an extended schema as follows:
CREATE TABLE ext_fields
(
systemId INT,
fieldId INT,
dataType INT -- represented using an enum at the application layer.
-- Other attributes.
);
CREATE TABLE request_ext
(
systemId INT, -- Composite Primary Key in the business object table.
requestId INT, -- Composite Primary Key in the business object table.
fieldId INT,
boolean_value BIT,
integer_value INT,
double_value REAL,
string_value NVARCHAR(256),
text_value NVARCHAR(MAX)
);
A given record will have only one of the _value columns set, based on the data type of the field as defined in the ext_fields table. This allowed us to keep both the type of the field and its value, and it worked pretty well in utilizing all the filtering methods provided by the DBMS for those data types.
My two cents!
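To show how the one-typed-column-per-record rule might look from the application side, here is a minimal sketch. It is written against Python's built-in sqlite3 module rather than the original DBMS, the DataType enum values and the set_value helper are hypothetical, and the text_value column is omitted for brevity.

```python
import sqlite3
from enum import IntEnum


class DataType(IntEnum):
    """Hypothetical application-layer enum stored in ext_fields.dataType."""
    BOOLEAN = 1
    INTEGER = 2
    DOUBLE = 3
    STRING = 4


# Which *_value column holds each data type.
VALUE_COLUMN = {
    DataType.BOOLEAN: "boolean_value",
    DataType.INTEGER: "integer_value",
    DataType.DOUBLE: "double_value",
    DataType.STRING: "string_value",
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ext_fields (systemId INT, fieldId INT, dataType INT,
                         PRIMARY KEY (systemId, fieldId));
CREATE TABLE request_ext (systemId INT, requestId INT, fieldId INT,
                          boolean_value INT, integer_value INT,
                          double_value REAL, string_value TEXT,
                          PRIMARY KEY (systemId, requestId, fieldId));
""")


def set_value(conn, system_id, request_id, field_id, value):
    """Store value in the single column that matches the field's declared type."""
    (data_type,) = conn.execute(
        "SELECT dataType FROM ext_fields WHERE systemId = ? AND fieldId = ?",
        (system_id, field_id),
    ).fetchone()
    # The column name comes from the fixed map above, never from user input.
    column = VALUE_COLUMN[DataType(data_type)]
    conn.execute(
        f"INSERT INTO request_ext (systemId, requestId, fieldId, {column}) "
        "VALUES (?, ?, ?, ?)",
        (system_id, request_id, field_id, value),
    )


conn.execute("INSERT INTO ext_fields VALUES (1, 10, ?)", (int(DataType.INTEGER),))
conn.execute("INSERT INTO ext_fields VALUES (1, 11, ?)", (int(DataType.STRING),))
set_value(conn, 1, 100, 10, 7)
set_value(conn, 1, 101, 10, 42)
set_value(conn, 1, 100, 11, "urgent")

# A numeric comparison works natively because the value sits in a typed column.
big = conn.execute(
    "SELECT requestId FROM request_ext WHERE fieldId = 10 AND integer_value > 10"
).fetchall()
```

Compared with a single text Value column, the filter on integer_value needs no casting, which is the benefit described above.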

Database Structure for Inconsistent Data

I am creating a database for my company that will store many different types of information. The categories are Brightness, Contrast, Chromaticity, etc. Each category has a number of data points which my company would like to start storing.
Normally, I would create a table for each category which would store the corresponding data. (This is how I learned to do it.) However, sometimes these categories have "sub-data" which would change the number of fields required in each table.
My question is then how do people handle the inconsistency of data when structuring their databases? Do they just keep adding more tables for extra data or is it something else altogether?
There are a few (and thank goodness only a few) unbendable rules about relational database models. One of those is, that if you don't know what to store, you have a hard time storing it. Chances are, you'll have an even harder time retrieving it.
That said, the reality of business rules is often less clear cut than the ivory tower of database design. Most importantly, you might want or even need a way to introduce a new property without changing the schema.
Here are two feasible ways to go about this:
Use a datastore that specializes in loose or nonexistent schemas (NoSQL and friends). Explaining this in detail is the subject of a CS thesis, not a Stack Overflow answer.
My recommendation: use a separate properties table. Here is how this goes:
Assuming for the sake of argument that your products always have a (unique string) name, (integer) id, brightness, contrast and chromaticity, plus sometimes (integer) foo and (string) bar, consider these tables:
CREATE TABLE products (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(50) NOT NULL,
brightness INT,
contrast INT,
chromaticity INT,
UNIQUE INDEX(name)
);
CREATE TABLE properties (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(50) NOT NULL,
proptype ENUM('null','int','string') NOT NULL default 'null',
UNIQUE INDEX(name)
);
INSERT INTO properties VALUES
(0,'foo','int'),
(0,'bar','string');
CREATE TABLE product_properties (
id INT PRIMARY KEY AUTO_INCREMENT,
products_id INT NOT NULL,
properties_id INT NOT NULL,
intvalue INT NULL,
stringvalue VARCHAR(250) NULL,
UNIQUE INDEX(products_id,properties_id)
);
Now your "standard" properties would be in the products table as usual, while the "optional" properties would each be stored in a row of product_properties that references the product id and property id, with the value being in intvalue or stringvalue.
Selecting products including their foo if any would look like
SELECT
products.*,
product_properties.intvalue AS foo
FROM products
LEFT JOIN product_properties
ON products.id=product_properties.products_id
AND product_properties.properties_id=1
or even
SELECT
products.*,
product_properties.intvalue AS foo
FROM products
LEFT JOIN properties
ON properties.name='foo'
LEFT JOIN product_properties
ON products.id=product_properties.products_id
AND product_properties.properties_id=properties.id
Please understand that this incurs a performance penalty. In effect you trade performance for flexibility: adding another property is nothing more than INSERTing a row in properties, and the schema stays the same.
If you're not bound to MySQL, other databases have table inheritance or arrays to solve some of those niche cases. PostgreSQL is a very nice database that you can use as easily and freely as MySQL.
With MySQL you could:
1. Change your tables, add the extra columns and allow NULL in the subcategory columns that you don't need. This way integrity can be checked, since you can still put constraints on the columns. Unless you really have a lot of subcategory columns, I'd recommend this; otherwise option 3.
2. Store subcategory data dynamically in a separate table that has a category_id, a category_row_id, a subcategory identifier (= type of subcategory) and a value column. That way you can retrieve your data by linking it via the category_id (determines the table) and the category_row_id (links to the PK of the original category table row). The bad thing: you can't use foreign keys or constraints properly to enforce integrity. You'd need to write hairy insert/update triggers to keep some control there; otherwise the burden of integrity and referential checking falls solely on the client (in which case you'd probably be better off going the NoSQL route). In short, I wouldn't recommend this.
3. Make a separate subcategory table per category table. Columns can be fixed, or variable via value column(s) plus an optional subcategory identifier. Foreign keys can still be used. Fixed columns are best for maintaining integrity, since you'll have the full range of constraints at your disposal. If you have a lot of subcategory columns that would otherwise hopelessly clutter your regular category table, then I'd recommend using this with fixed columns. As with the previous option, I'd never recommend going dynamic for anything but throwaway data.
Alternatively, if your subcategory data is very variable and volatile, use NoSQL with a document database such as MongoDB. Mind you, you can keep all your regular data in a proper RDBMS and just store side-data in the document database, though that's probably not recommended.
If your subcategory data is in a known fixed state and not prone to change I'd just add the extra columns to the specific category table. Keep in mind that the major feature of a proper DBMS is safeguarding the integrity of your data via checks and constraints, doing away with that never really is a good idea.
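To illustrate why nullable extra columns still let the database guard integrity, here is a small sketch. It uses Python's built-in sqlite3 module so it is runnable as-is; the brightness table and its luminance and viewing_angle columns are invented for the example. Note that MySQL itself only enforces CHECK constraints from version 8.0.16 onwards, so on older versions a trigger or application check would be needed instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE brightness (
    id INTEGER PRIMARY KEY,
    luminance REAL NOT NULL,
    -- hypothetical optional sub-data column: NULL when it does not apply
    viewing_angle INTEGER CHECK (viewing_angle BETWEEN 0 AND 180)
)
""")

# A row without sub-data: the CHECK passes because the column is NULL.
conn.execute("INSERT INTO brightness (luminance) VALUES (250.0)")
# A row with valid sub-data.
conn.execute("INSERT INTO brightness (luminance, viewing_angle) VALUES (300.0, 45)")

# A row with out-of-range sub-data is refused by the database itself.
try:
    conn.execute("INSERT INTO brightness (luminance, viewing_angle) VALUES (310.0, 400)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT COUNT(*) FROM brightness").fetchone()[0]
```

With a generic value column shared by many kinds of sub-data, a per-attribute rule like this one cannot be declared, which is the trade-off the dynamic options above accept.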
If you are not limited to MySQL, you can consider Microsoft SQL Server and its Sparse Columns feature. This will allow you to expand your schema to include however many columns you want, without incurring the storage penalty for columns that are not pertinent for a given row.

How to relate two tables without a foreign key?

Can someone give a demo?
I'm using MySQL, but the idea should be the same!
EDIT
In fact I'm asking what's the difference between Doctrine_Relation and Doctrine_Relation_ForeignKey in doctrine?
I suspect what you are looking for is a way to map columns from one db table to another db table. You can do this using some string comparison algorithm. An algorithm like Levenshtein or Jaro-Winkler distance would let you infer the "matching" columns.
For example, if db1.tableA has a column L_Name and db2.tableB has a column LastName, a string distance match would give you one measure. You can extend that by comparing the values in the rows to check whether there is some consistency. For example, if the values in both tables contain "Smith"s, "Johnson"s etc., you have a double win.
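As a rough sketch of the column-name matching idea, the following computes a plain Levenshtein distance and picks the closest candidate. The normalize, similarity and best_match helpers and the candidate column list are hypothetical, and lowercasing and stripping punctuation before comparing is an assumption of this sketch, not part of the algorithm itself.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, kept to two rows of the table."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalize(name: str) -> str:
    """Lowercase and drop underscores and other punctuation before comparing."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def similarity(a: str, b: str) -> float:
    """Scale the distance to a 0..1 score, where 1.0 means identical after normalizing."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(column: str, candidates: list[str]) -> str:
    """Pick the candidate column name with the highest similarity score."""
    return max(candidates, key=lambda candidate: similarity(column, candidate))


match = best_match("L_Name", ["FirstName", "LastName", "Address"])
```

A name score like this is only the first measure; comparing sampled row values, as described above, is what would confirm or reject a candidate pair.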
I recently did something similar, integrating multiple large databases (one of them in a different language - French!) and it turned out to be quite a great experience.
HTH
You should use foreign keys to relate tables in MySQL, because it does not offer other ways to create relationships (such as references or nested tables in an object-oriented database).
See:
http://lists.mysql.com/mysql/206589
EDIT:
If you are willing to use Oracle, references and nested-tables are alternate ways to create relationships between tables. References are more versatile, so here is an example.
Since references are used in object-oriented fashion, you should first create a type and a table to hold objects of that type.
Let's create an object type of employee which has a reference to his manager:
CREATE TYPE employee_type AS OBJECT (
name VARCHAR2(30),
manager REF manager_type
);
We should also create an object type for managers. Since employee_type refers to it, manager_type has to be created first:
CREATE TYPE manager_type AS OBJECT (
name VARCHAR2(30)
);
Now let's create two tables, one for employees and the other for managers:
CREATE TABLE employees OF employee_type;
CREATE TABLE managers OF manager_type;
We can relate these tables using references. To insert an employee into the employees table, just do this:
INSERT INTO employees
SELECT employee_type('Bob Jones', REF(m))
FROM managers m
WHERE m.name = 'Larry Ellison';
More info: Introduction to Oracle Objects
Well you could get around that by taking care of relationships in a server side language. Some database abstraction layers can handle this for you (such as Zend_Db_Table for PHP) but it is recommended to use foreign keys.
MySQL has InnoDB storage engine that supports foreign keys and also transactions.
Using a foreign key is the standard way of creating a relationship. Alternatives are pretty much nonexistent, as you'd have to identify the related rows SOMEHOW.
A column (or set of columns) which links the two tables IS a foreign key - even if it doesn't have a constraint defined on it (or even an index) and isn't either of the tables' primary key (although in that case you can end up with a weird situation where you can get unintended cartesian products when joining, as you will end up with a set vs set relationship which is probably not what you want)
Not having a foreign key constraint is no barrier to using a foreign key.
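A small sketch of that last point: two tables linked by a column that carries no declared constraint can still be joined. It uses Python's built-in sqlite3 module instead of MySQL so it runs as-is, and the authors and books tables are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
-- author_id refers to authors.id by convention only: no FOREIGN KEY clause
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO books VALUES (1, 'First', 1), (2, 'Second', 1), (3, 'Orphan', 99);
""")

# The join works exactly as it would with a declared constraint.
joined = conn.execute("""
    SELECT b.title, a.name
    FROM books b
    JOIN authors a ON a.id = b.author_id
    ORDER BY b.id
""").fetchall()

# What the missing constraint costs: nothing stopped the dangling author_id 99.
orphans = conn.execute("""
    SELECT b.title
    FROM books b
    LEFT JOIN authors a ON a.id = b.author_id
    WHERE a.id IS NULL
""").fetchall()
```

The relationship exists either way; the constraint only decides whether the database or the application is responsible for rows like the orphan.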