Polymorphic database design : does this approach have a name?

Polymorphic database design : does this approach have a name? - sql-server-2008

I have a base enitiy (items) that will host a vast range of item types (>200) with totaly different properties. I want a clean portable and fast solution and have come up with an idea that maby has a name I'm unaware of.
Here it goes:
items-entity holds base class fields + additional fields for subclass fields but with dummie-names, ItemID,ItemNo,ItemTypeID,int1,int2,dec1,dec2,dec3,str1,str2
referenced itemtype-record holds name of type and child enity (1:n):
itemtypefields [itemtypeid,name,type,realfield]
example in [53,MaxPressure,dec,dec3]
It's limitations:
hard to estimate field requirements in baseclass
harder to add domains/checkconstraints based on child type
need application layer to translate tagged sql to real query
Only possible to query one type at a time since shared attributes may be defined to different "real-fields".
3rd bullet explained:
select ItemNo,_MaxPressure_ from items where ItemTypeID=10 and _MaxPressure_>42
should translate to:
select ItemNo,dec3 as MaxPressure from items where ItemType=10 and dec3>42
(can't do that with sp's or udf's right - or whould it be possible?)
But benefits of:
Performance
Ease of CRUD-operations
Easier to sort/filter at application level.
Now - does it have a name?

This antipattern is called One True Lookup Table.
In a relational database, each column needs to be defined as one logical type. I don't mean one SQL data type like INT or VARCHAR, I mean everything in that column from start to finish must be from the same set of values, and you should be able to tell one value apart from another value.
You can't put shoe size and average temperature and threads per inch into the same column of a given table, and still call it a relation.
Basically, your database would not be a database at all -- it would be a spreadsheet.
Read almost any book by C. J. Date, such as SQL and Relational Theory for a proper explanation of relations and types.
Re your comment:
Read the Q again before lecuturing about elementary books and mocking about semi structured data.
Okay, I have re-read your post.
The classic use of One True Lookup Table isn't exactly what you're doing, but what you're doing shares the same problems with OTLT.
Suppose you have "MaxPressure" stored in column dec3 for ItemType 10. Suppose there are a fixed set of valid choices for the value of MaxPressure, and you want to put those in another lookup table, so that no one can enter an invalid MaxPressure value.
Now: declare a foreign key constraint on dec3 referencing your MaxPressures lookup table. You can't -- the problem is that the foreign key constraint applies to the dec3 column in all rows, not just those rows where ItemType is 10.
The reason is that you're storing more than one set of values in a single column. The same problem arises for any other kind of constraint -- unique constraints, check constraints, even NOT NULL. And you can't declare a DEFAULT value for the column either, because you probably have a different correct default for each ItemType (and some ItemTypes have no default for that attribute).
The reason that I referred to the C. J. Date book is that he gives a crisp definition for a type: it's a named finite set, over which the equality operation is defined. That is, you can tell if the value "42" on one row is the same as the value "42" on another row. In a relational column, that must be true because they must come from the same original set of values. In your table, dec3 could have the value "42" when it's MaxPressure, but "42" for another ItemType when it's threads per inch. Therefore they aren't the same value "42". If you had a unique constraint, these two 42's would not be considered duplicates. If you had a foreign key, each of the different 42's would reference a different lookup table, etc.
What you're doing is not a valid relational database design.
Don't bristle at my referring you to a resource on relational database design unless you understand that.

Related

Functional dependency in another table

Lets say there are warehouses each storing items of a specific type.
So there are tables with fields
Warehouse - ID,Name,Type
Item - ID,Name,Type
WarehouseItem - Warehouse, Item
Type - ID, Name
The question is - given that a Warehouse only holds Items with of specific Type, what database normalization rule is this breaking?
Is this database normalized?
(The problem's example is made up, but I basically have this problem in real life.)

I'm making some assumptions from just looking at your metadata without any data examples, but on first glance it appears that your schema for the most part is normalized. Technically speaking your table is 3NF (which should be your target) if it meets all of the following standards:
It is also 1NF - Each entry only contains atomic data (or a single piece of info)
It is also 2NF - No candidate key dependency meaning that when you have have a composite primary key (a key made up of more than one column) that all data is dependent on the entire key
It is 3NF - No transitive dependency meaning all data is only dependent on the primary key and not some other column in the table
Note that there are also higher normalized forms but they are mostly academic as you begin experiencing performance degradation the more you normalize
Given this definition:
Warehouse appears 3NF assuming that each warehouse can only have one Type. If not then you would be failing the transitive dependency and would need to move Type information to a new table.
Item too appears 3NF assuming only one Type can be assigned
Type appears to contain redundant data and should be removed unless of course you have a many-to-many relationship between Type and Warehouse and/or Item. In that case, you would want to introduce a bridge-entity (aka composite entry) between Type and Warehouse or Item to create two 1-to-many relationships.
Lastly, if I'm reading this correctly, WarehouseItem appears to be a bridge-entity between Warehouse and Item to break up the many-to-many relationship between them. If this is correct, you should be able to argue that this table is 3NF assuming the combination of Warehouse and Item represent a composite key.
So assuming I interpreted your schema correctly, once you eliminate the redundant Type table, then yes I would say this setup technically meets 3NF. Note that your requirement that
given that a Warehouse only holds Items with of specific Type
may require you introduce a new type field which will mean you need to reevaluate your normalization of that table. If you have two distinct types (a WarehouseType and an ItemType) then you may need to keep that Type table after all and turn it into a mapping table between those two new fields. But I'd need to see data examples to better evaluate.

Linking Database Tables Standard Practice

I am looking for the standard way to handle the following Database Situation.
Two Database Tables - One called Part, one called Return. In Part we have information about Part Number, Cost, Received Date, etc.
Return is for if that part is being returned to the vendor. It will have Return Tracking Number, Shipped Date, and If Credited.
A Part can only have one Return but may have none if Part is not returned to vendor.
The 3 options I see are:
Put both Part and Return in the same Table but I do not like this idea, table will get too large.
Create a field in the 'Part' Table to reference the Id of the Return record that it is related to. My Concern here is there could possibly be free floating Return records not attached to a Part
Create a field in the Return Table to reference the Id of the Part record it is related to, making the PartId field unique so I cannot duplicate Part Id.
Is there any advantage or disadvantage to using #2 or #3 (or I guess #1 if that is a viable option)?
UPDATE:
I should have mentioned in reality these tables will be much bigger, and in the application I will be viewing Returns and Parts information in seperate views.

Basically you have 2 entities, Part and Number with 1-1 relationship where one entity is optional.
In this case you should create a table for each entity (i.e. 2 tables) and use the PK of Part as a reference in the Return table. That is the standard way to represent relationships of this kind.

solution 3:
with the exception that you do not need a unique constraint on part_id, just make it the PK (which is almost the same)

Integer values for status fields

Often I find myself creating 'status' fields for database tables. I set these up as TINYINT(1) as more than often I only need a handful of status values. I cross-reference these values to array-lookups in my code, an example is as follows:
0 - Pending
1 - Active
2 - Denied
3 - On Hold
This all works very well, except I'm now trying to create better database structures and realise that from a database point of view, these integer values don't actually mean anything.
Now a solution to this may be to create separate tables for statuses - but there could be several status columns across the database and to have separate tables for each status column seems a bit of overkill? (I'd like each status to start from zero - so having one status table for all statuses wouldn't be ideal for me).
Another option is to use the ENUM data type - but there are mixed opinions on this. I see many people not recommending to use ENUM fields.
So what would be the way to go? Do I absolutely need to be putting this data in to its own table?

I think the best approach is to have a single status table for each kind of status. For example, order_status ("placed", "paid", "processing", "completed") is qualitatively different from contact_status ("received", "replied", "resolved"), but the latter might work just as well for customer contacts as for supplier contacts.
This is probably already what you're doing — it's just that your "tables" are in-memory arrays rather than database tables.

As I really agree with "ruakh" on creating another table structured as id statusName which is great. However, I would like to add that for such a table you can still use tinyint(1) for the id field. as tinyint accepts values from 0 to 127 which would cover all status cases you might need.

Can you add (or remove) a status value without changing code?
If yes, then consider a separate lookup table for each status "type". You are already treating this data in a generic way in your code, so you should have a generic data structure for it.
I no, then keep the ENUM (or well-documented integer). You are treating each value in a special way, so there isn't much purpose in trying to generalize the data model.
(I'd like each status to start from zero - so having one status table for all statuses wouldn't be ideal for me
You should never mix several distinct sets of values within the same lookup table (regardless of your "zero issue"). Reasons:
A simple FOREIGN KEY alone won't be able to prevent referencing a value from the wrong set.
All values are forced into the same type, which may not always be desirable.
That's such a common anti-pattern that it even has a name: "one true lookup table".
Instead, keep each lookup "type" within a separate table. That way, FKs work predictably and you can tweak datatypes as necessary.

database storing multiple types of data, but need unique ids globally

a while ago, i asked about how to implement a REST api. i have since made headway with that, but am trying to fit my brain around another idea.
in my api, i will have multiple types of data, such as people, events, news, etc.
now, with REST, everything should have a unique id. this id, i take it, should be unique to the whole system, and not just each type of data.
for instance, there should not be a person with id #1 and a news item with id of #1. ultimately, these two things would be given different ids altogether: person #1 with unique id of #1 and news item #1 with unique id #2, since #1 was taken by a person.
in a database, i know that you can create primary keys that automatically increment. the problem is, usually you have a table for each data "type", and if you set the auto increment for each table individually, you will get "duplicate" ids (yes, the ids are still unique in their own table, but not the whole DB).
is there an easy way to do this? for instance, can all of these tables be set to work off of one incrementer (the only way i could think of how to put it), or would it require creating a table that holds these global ids, and ties them to a table and the unique id in that table?

You could use a GUID, they will be unique everywhere (for all intents and purposes anyway).
http://en.wikipedia.org/wiki/Globally_unique_identifier

+1 for UUIDs (note that GUID is a particular Microsoft implementation of a UUID standard)
There is a built-in function uuid() for generating UUID as text. You may probably prefix it with table name so that you may easily recognize it later.
Each call to uuid() will generate you a fresh new value (as text). So with the above method of prefixing, the INSERT query may look like this:
INSERT INTO my_table VALUES (CONCAT('my_table-', UUID()), ...)
And don't forget to make this column varchar of large enough size and of course create an index for it.

now, with REST, everything should have a unique id. this id, i take
it, should be unique to the whole system, and not just each type of
data.
That's simply not true. Every resource needs to have a unique identifier, yes, but in an HTTP system, for example, that means a unique URI. /people/1 and /news/1 are unique URI's. There is no benefit (and in fact quite a lot of pain, as you are discovering) from constraining the system such that /news/1 has to instead be /news/0983240-2309843-234802/ in order to avoid conflict.

Relational tables design problem

How do you retain historical relational data if rows are changed? In this example, users are allowed to edit the rows in the Property table at any time. Tests can have any number of properties. If they edit the field 'Name' in the Property table, or drop a row in the Property table, Test rows might not hold conditions at the time of the test. Would you change the design of the Test table by adding a property names column, and dropping the TestProperty mapping table? The property names column would have to be something like a delimited list of strings. How is problem usually handled?
3 tables:
Test:
TestId AUTONUMBER,
Name CHAR,
TestDate DATE
Property:
PropertyId AUTONUMBER,
Name CHAR
TestProperty: (maps properties to tests)
TestId
PropertyId

I do not think the question has been answered fully.
If they edit the field 'Name' in the Property table ... Would you change the design of the Test table by adding a property names column, and dropping the TestProperty mapping table?
Definitely not. That would add massive duplication for no purpose.
If your requirement is to maintain the integrity of the data values (in Property) at the time of the Test, the correct (database) method is to implement a History table. That should be an exact copy of the source table, plus one item: a TIMESTAMP or DATETIME column is added to the PK.PropertyHistory
PropertyId AUTONUMBER,
Name CHAR
CONSTRAINT PRIMARY KEY CLUSTERED UC_PK (PropertyId)
PropertyHistory
PropertyId INT,
AuditedDtm DATETIME,
Name CHAR
CONSTRAINT PRIMARY KEY CLUSTERED UC_PK (PropertyId, AuditedDtm)
For this to be meaningful and useable, the Test table needs a timestamp as well, to identify which version of ProperyHistory to reference:TestProperty
TestId
PropertyId
TestDtm DATETIME
The property names column would have to be something like a delimited list of strings.
That would break basic design rules as well as Database Normalisation rules, and prevent you from performing ordinary Relational operations on it. Never store more than one data value in a single column.
... or drop a row in the Property table
Deletion is something different again. If it is a "database" then it has Integrity. Therfore you cannot delete a parent row if it has child rows in some other table (and you can delete it if it does not have children). This is usually implemented as a "soft delete", an Indicator such as IsObsolete is added. This is referenced in the various SELECTS to exclude the row from being used (to add new children) but remains available as the parent for existing children.

If you want to retain property relations, even if the property doesn't exist. Make it so that Properties aren't necessarily deleted, but add a flag that denotes if the property is currently active. If a property's name is changed, create a new property with the new name and set the old property to inactive.
If you do this, you'll have to create some way of garbage collecting the inactive properties.
I'd never make a single column into a field that imitates a one-to-multi relationship with a comma-denoted list. Otherwise, you defeat the purpose of relational database.

Seems like you're using Test as both a template for a particular instance of a test, as well as the test itself. Maybe every time a user performs a test according to the specification in Test, create a row in, say, TestRun? This would preserve the particular Propertys, and if the entries in Property change later, then subsequent TestRuns would reflect the new changes.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008