Relational database design - relational-database

I'm trying to understand entities, tables and foreign keys. I have the following:
AnObject - I have identified this as an entity type.
ID (Primary Key)
Description
State
DependsOn
Creator
State currently has only two values, [Alive, Dead], though another could be added in the future. An object is in exactly one state at a time, but it will likely change between them.
Question:
Should State be its own entity type? Would it be an entity type or just a table? Should State have a foreign key to AnObject or vice versa? E.g.
State
ID (PK)
Description
AnObject_ID (Foreign Key references AnObject)
Question: The DependsOn attribute of AnObject can refer to multiple other AnObject entities. Obviously a field cannot have multiple values, but I'm not sure how to model this.
The Creator attribute of AnObject also takes one of a fixed set of values [Fred, Jim, Dean]. Should I have an entity type (table) for a Creator with a foreign key to AnObject ID? So a Creator can create 0, 1 or many AnObjects, but an AnObject can only have one Creator?
Thanks,

State could just be an enum field, unless you need users to be able to add other State values via a user interface, in which case you could use a lookup table (one-to-many relationship) as you suggested. I don't know what database you're using, but here's some info on the enum type in MySQL: http://dev.mysql.com/doc/refman/5.6/en/enum.html.
If you use a lookup table, then AnObject should have a field called StateID that points to the desired row in the State table.
It sounds like DependsOn is a many-to-many relationship. For that you will need a join table, e.g.:
Table: Dependencies
Primary key (called a "composite key" because it's made up of more than one field):
AnObjectParentID
AnObjectChildID
I've assumed that the dependencies are needed for a parent-child relationship but if that's not the case you might want to name the table or fields differently.
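A minimal sketch of both suggestions (a State lookup table and a Dependencies join table with a composite key), written against Python's built-in sqlite3 module purely so it can be run anywhere. The question does not name a database, and the table and column names used here (state, an_object, dependencies, state_id, parent_id, child_id) and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
CREATE TABLE state (
    id          INTEGER PRIMARY KEY,
    description TEXT NOT NULL UNIQUE
);
CREATE TABLE an_object (
    id          INTEGER PRIMARY KEY,
    description TEXT NOT NULL,
    state_id    INTEGER NOT NULL REFERENCES state(id)   -- points at the lookup row
);
CREATE TABLE dependencies (
    parent_id INTEGER NOT NULL REFERENCES an_object(id),
    child_id  INTEGER NOT NULL REFERENCES an_object(id),
    PRIMARY KEY (parent_id, child_id)                   -- composite key: no duplicate pairs
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO state (id, description) VALUES (?, ?)",
                 [(1, "Alive"), (2, "Dead")])
conn.executemany("INSERT INTO an_object (id, description, state_id) VALUES (?, ?, ?)",
                 [(1, "first", 1), (2, "second", 1), (3, "third", 2)])
conn.executemany("INSERT INTO dependencies (parent_id, child_id) VALUES (?, ?)",
                 [(1, 2), (1, 3)])


def children_of(conn, object_id):
    """Return the ids linked to object_id through the join table."""
    rows = conn.execute(
        "SELECT child_id FROM dependencies WHERE parent_id = ? ORDER BY child_id",
        (object_id,),
    )
    return [row[0] for row in rows]
```

With this layout, adding a third State value is a single INSERT into state, and the composite primary key rejects a repeated (parent_id, child_id) pair.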

You can add extra tables for enumeration values with a foreign key from AnObject to them. State will probably be best represented as a single field of type varchar not null. You can have the primary key for a table be a varchar field; it doesn't have to be an int type.
This will constrain the values but allow you to use reasonable syntax to query the thing (i.e. WHERE state = 'Alive'), although in this case I think you're prematurely abstracting things. I'd keep it simple and just have a simple bool column IsDead.
DependsOn is a one-way attribute (you presumably can't have A depend on B and also B depend on A). The real issue here is how you're intending to query these items and how many of them there will be. If you want to pull out the whole chain of dependencies at once and the chain is long, you want to avoid doing hundreds of individual queries to do that. What is your use case?
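If the use case does turn out to be "fetch the whole chain at once", one way to avoid a query per link is a recursive common table expression. The sketch below uses Python's sqlite3 module for illustration; the dependencies table, its column names (object_id, depends_on_id) and the sample rows are assumptions, not part of the question. MySQL 8.0 and later also accept WITH RECURSIVE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dependencies (
        object_id     INTEGER NOT NULL,
        depends_on_id INTEGER NOT NULL,
        PRIMARY KEY (object_id, depends_on_id)
    )
""")
# Hypothetical data: 1 -> 2 -> 3 -> 4, and 5 -> 4.
conn.executemany("INSERT INTO dependencies VALUES (?, ?)",
                 [(1, 2), (2, 3), (3, 4), (5, 4)])


def dependency_chain(conn, object_id):
    """Return every id that object_id depends on, directly or indirectly, in one query."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id) AS (
            SELECT depends_on_id FROM dependencies WHERE object_id = ?
            UNION   -- UNION (not UNION ALL) drops repeats, so a cycle cannot loop forever
            SELECT d.depends_on_id
            FROM dependencies AS d
            JOIN chain AS c ON d.object_id = c.id
        )
        SELECT id FROM chain ORDER BY id
        """,
        (object_id,),
    )
    return [row[0] for row in rows]
```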

Related

What would normalized ERD (Entity-Relationship Diagram) look like for this table

I got stuck on one project of mine. My table looks sort of like this but I think I'm just going in circles and going crazy.
The issue is on how to "separate" attributes under the categories. Do I need to make entities for each category? Then, how would we declare keys and which table will be adopting a foreign key?
Or, alternatively, there is no point in normalizing this?
I was also thinking about somehow enumerating the attributes to make categories into attributes?? Is this even a thing...
Appreciating any suggestions!
There's only one way that I see to normalize this table. Each line is an entity.
Entity
------
Entity ID
Entity Letter
Entity Name
Entity Name Type
Where Entity ID is the primary clustering key and you have a unique index on (Entity Letter, Entity Name, Entity Name Type).
Then you have an Attribute table in which each row holds one attribute. There's a one-to-many relationship between an entity and its attributes.
Attribute
---------
Attribute ID
Entity ID
Category (1 or 2)
Level (x1 - x4, x1 - x8)
Attribute Value
Where the Attribute ID is the primary clustering key, and Entity ID is the foreign key pointing back to the entity. You have a unique index on (Entity ID, Category, Level) to order the attributes.
You can break this down further by creating a Category table and / or a Level table, but I think this is a sufficient breakdown.
I'm not sure whether the x1 in category 2 is a typo or deliberate. Either way, it's modeled.
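The two tables described above could be sketched roughly as follows, using Python's sqlite3 module for illustration. The snake_case column names, the column types and the sample rows are assumptions; the original table is not shown in the question, so the data here is placeholder data only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE entity (
    entity_id        INTEGER PRIMARY KEY,
    entity_letter    TEXT NOT NULL,
    entity_name      TEXT NOT NULL,
    entity_name_type TEXT NOT NULL,
    UNIQUE (entity_letter, entity_name, entity_name_type)
);
CREATE TABLE attribute (
    attribute_id    INTEGER PRIMARY KEY,
    entity_id       INTEGER NOT NULL REFERENCES entity(entity_id),
    category        INTEGER NOT NULL CHECK (category IN (1, 2)),
    level           TEXT NOT NULL,          -- e.g. 'x1' .. 'x8'
    attribute_value TEXT,
    UNIQUE (entity_id, category, level)     -- one value per entity, category and level
);
""")

# Placeholder rows, not taken from the question.
conn.execute("INSERT INTO entity VALUES (1, 'A', 'sample', 'type1')")
conn.executemany(
    "INSERT INTO attribute (entity_id, category, level, attribute_value) VALUES (?, ?, ?, ?)",
    [(1, 2, "x1", "c"), (1, 1, "x2", "b"), (1, 1, "x1", "a")],
)


def attributes_for(conn, entity_id):
    """Return (category, level, value) tuples for one entity, in category/level order."""
    rows = conn.execute(
        "SELECT category, level, attribute_value FROM attribute "
        "WHERE entity_id = ? ORDER BY category, level",
        (entity_id,),
    )
    return rows.fetchall()
```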

What is adding a new relationship to an existing table called?

In database terms, when I add a new foreign key, insert a record for that foreign key and update the existing record, what is the process called? My goal is to be able to find answers more effectively.
-- create temporary linking key
alter table example add column example_foreign_key int unsigned null;
-- contains more fields
insert into example_referenced_table (example_id, ...)
select id, ...
from example
join ...;
-- link with the table
update example join example_referenced_table on example_referenced_table.example_id = example.id
set example.example_foreign_key = example_referenced_table.id;
-- drop linking key
alter table example_referenced_table drop column example_id;
It looks like you're substituting one surrogate identifier for another. Introducing a surrogate key is sometimes (incorrectly) called normalization, so you may get some hits on that term.
In rare cases, normalization requires the introduction of a surrogate key, but in most cases, it simply decomposes a relation (table) into two or more, in such a way that no information is lost.
Surrogate keys are generally used when a natural or candidate key doesn't exist, isn't convenient, or isn't supported (e.g. composite keys are often a problem for object-relational mappers). For criteria on picking a good primary key, see: What are the design criteria for primary keys
There's little value in substituting one surrogate identifier for another, so the procedure you demonstrate has no proper name as far as I know, at least in the relational model.
If you mean to introduce a surrogate key as an identifier of a new entity set to which the original attribute is transferred, that's close to what Peter Chen called shifting a value set from the lower conceptual domain to the upper conceptual domain. You can find more information in his paper "The entity-relationship model - A basis for the enterprise view of data".
As for your question's title, it's not wrong to say that you're adding a relationship to a table (though that wording mixes conceptual and physical terms), but note that in the entity-relationship model, a relationship is represented by two or more entity keys in a table (e.g. (id, example_foreign_key) in the example table) and not by a foreign key constraint between tables. The association of relationships with foreign key constraints came from the network data model, which is older than both the relational and entity-relationship models.

MySQL DB: Nullable bool in database column, enum, or two boolean columns, which one is more efficient?

I am doing EF 6 code first with MVC 5. One of my classes has a property that can mean three things:
Confirmed by user
Has not answered yet
Declined by user
My question is, what should I use?
A nullable bool, obviously mapped to the choices above
An enum (the column would store an integer as a foreign key to another table listing the states)
Or two bool columns (HasAnswered, IsConfirmed) where IsConfirmed only gets accessed if the user has answered
I am very thankful for every opinion you might have.
Disclaimer.. I thought you were suggesting an enum as a column datatype
None of the above.
An enum.. what if you want to add more data to each status or rename one?
Nullable bool .. as above, plus what if you need to add another status?
Two bool columns.. same as above, plus you could introduce normalisation issues.
I'd go with a TINYINT column called status (or something similar).
Mainly for flexibility.. if you need the status titles in the DB or need to add any other data to each option you can place it in another table and foreign key it in. If you don't you just need to translate the numbers somewhere.
Another option is to separate out the actions (the answer) from the state.
Consider an answer table with a column indicating confirmation or denial and a reference to the original table. By joining the tables it is possible to separate out the original rows that have no answers and those that have been confirmed/denied.
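A rough sketch of that separation, using Python's sqlite3 module for illustration. The table names (invitation, answer), the column names and the sample rows are hypothetical; the question does not say what the original table is called. A LEFT JOIN distinguishes rows with no answer from rows that were confirmed or declined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE invitation (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE answer (
    invitation_id INTEGER PRIMARY KEY REFERENCES invitation(id),  -- at most one answer each
    confirmed     INTEGER NOT NULL CHECK (confirmed IN (0, 1))
);
""")

# Hypothetical data: invitation 2 has not been answered yet.
conn.executemany("INSERT INTO invitation (id, title) VALUES (?, ?)",
                 [(1, "first"), (2, "second"), (3, "third")])
conn.executemany("INSERT INTO answer (invitation_id, confirmed) VALUES (?, ?)",
                 [(1, 1), (3, 0)])


def answer_states(conn):
    """Map each invitation id to 'unanswered', 'confirmed' or 'declined'."""
    rows = conn.execute(
        """
        SELECT i.id,
               CASE
                   WHEN a.invitation_id IS NULL THEN 'unanswered'
                   WHEN a.confirmed = 1 THEN 'confirmed'
                   ELSE 'declined'
               END
        FROM invitation AS i
        LEFT JOIN answer AS a ON a.invitation_id = i.id
        ORDER BY i.id
        """
    )
    return dict(rows.fetchall())
```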
UPDATE
In response to Null's comment, an int is so much more powerful..
A status database table:
id, title, description, priority, cost... (attach extra data to the status)
Or in PHP pseudo code (without a table):
$query = new Query('SELECT * FROM table WHERE status = :status');
$query->bind(['status' => TableClass::STATUS_ANSWERED]);
I wasn't suggesting you translate the numbers everywhere.. just somewhere.
The final great thing about an int is it separates meaning from the workings.. what if I want to rename one of the enums? I change one row with a foreign key, or potentially millions with an enum.
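To make the renaming point concrete, here is a small sketch using Python's sqlite3 module. The status and response tables, the STATUS_* constants and the row counts are assumptions made up for the illustration. Renaming a status touches one row in the lookup table, however many rows reference it.

```python
import sqlite3

# Application-side names for the integer codes (hypothetical values).
STATUS_PENDING, STATUS_CONFIRMED, STATUS_DECLINED = 1, 2, 3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE status (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL UNIQUE
);
CREATE TABLE response (
    id        INTEGER PRIMARY KEY,
    status_id INTEGER NOT NULL REFERENCES status(id)
);
""")

conn.executemany(
    "INSERT INTO status (id, title) VALUES (?, ?)",
    [(STATUS_PENDING, "Pending"), (STATUS_CONFIRMED, "Confirmed"), (STATUS_DECLINED, "Declined")],
)
# A thousand rows that all reference the same status.
conn.executemany("INSERT INTO response (status_id) VALUES (?)",
                 [(STATUS_DECLINED,)] * 1000)


def rename_status(conn, status_id, new_title):
    """Rename a status and return how many rows were modified."""
    cursor = conn.execute("UPDATE status SET title = ? WHERE id = ?", (new_title, status_id))
    return cursor.rowcount


def count_by_title(conn, title):
    """Count responses whose status currently carries the given title."""
    row = conn.execute(
        "SELECT COUNT(*) FROM response AS r JOIN status AS s ON s.id = r.status_id "
        "WHERE s.title = ?",
        (title,),
    ).fetchone()
    return row[0]
```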

Database Design - Custom attributes table - Table that "relate" entities

I'm designing a database (for use in mysql) that permits new user-defined attributes to an entity called nodes.
To accomplish this I have created 2 other tables: a customvars table that holds all custom attributes and a nodes_customvars table that defines the relationship between nodes and customvars, creating a 1..n and n..1 relationship.
Here is the link to the drawn model: Sketched database model
So far so good... But I'm not able to properly handle INSERTs and UPDATEs using separate IDs for each table.
For example, if I have a custom attribute called color in the nodes_customvars table inserted for a specific node, if I try to "INSERT ... ON DUPLICATE KEY UPDATE" either it will always insert or always update.
I've thought about removing the "ID" field from the nodes_customvars table and making it a composite key using nodes id and customvars id, but I'm not sure if this is the best solution...
I've read this article, and the comments, as well: http://weblogs.sqlteam.com/jeffs/archive/2007/08/23/composite_primary_keys.aspx
What is the best solution to this?
EDIT:
To add to this: I don't know the nodes_customvars id, only the nodes id and customvars id. Analysing the nodes_customvars table:
1- If I make nodes id and/or customvars id UNIQUE in this table, using "INSERT ... ON DUPLICATE KEY UPDATE" will always UPDATE. Since multiple nodes can share the same customvar, this is wrong;
2- If I don't make any UNIQUE key, "INSERT ... ON DUPLICATE KEY UPDATE" will always INSERT, since no UNIQUE key is matched by the statement...
You have two options for solving your specific problem of the "INSERT...ON DUPLICATE KEY" either always inserting or updating as you describe.
Change the primary key to be a composite key using nodeId and customvarId (as suggested by SyntaxGoonoo and in your question as a possible option).
Add a composite unique index using nodeId and customvarId.
CREATE UNIQUE INDEX IX_NODES_CUSTOMVARS ON NODES_CUSTOMVARS(nodeId, customvarId);
Both of the options would allow for the "INSERT...ON DUPLICATE KEY" functionality to work as you require (INSERT if a unique combination of nodeId and customvarId doesn't exist; update if it does).
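The second option can be tried out with the sketch below. It uses Python's sqlite3 module, where the counterpart of MySQL's "INSERT ... ON DUPLICATE KEY UPDATE" is "INSERT ... ON CONFLICT ... DO UPDATE" (this needs SQLite 3.24 or newer). The var_value column and the sample values are assumptions; the question does not show the table's columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes_customvars (
    id          INTEGER PRIMARY KEY,      -- separate surrogate key is kept
    nodeId      INTEGER NOT NULL,
    customvarId INTEGER NOT NULL,
    var_value   TEXT                      -- hypothetical column holding the attribute value
);
CREATE UNIQUE INDEX IX_NODES_CUSTOMVARS ON nodes_customvars (nodeId, customvarId);
""")


def set_customvar(conn, node_id, customvar_id, value):
    """Insert the pair if it is new, otherwise update its value."""
    conn.execute(
        """
        INSERT INTO nodes_customvars (nodeId, customvarId, var_value)
        VALUES (?, ?, ?)
        ON CONFLICT (nodeId, customvarId) DO UPDATE SET var_value = excluded.var_value
        """,
        (node_id, customvar_id, value),
    )


set_customvar(conn, 1, 10, "red")    # new pair: inserted
set_customvar(conn, 2, 10, "blue")   # same customvar on another node: inserted
set_customvar(conn, 1, 10, "green")  # existing pair: updated in place
```

Because uniqueness is on the pair rather than on either column alone, several nodes can share a customvar while each (node, customvar) combination still has exactly one row.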
As for the question about whether to have a composite primary key or a separate primary key column with an additional unique index, there are many things to consider in the design. There are the 1NF considerations, the physical characteristics of the database platform you're on, and the preference of the ORM you happen to be using (if any). Given how InnoDB secondary indexes work (see last paragraph at: http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html), I would suggest that you keep the design as you currently have it and add in the additional unique index.
HTH,
-Dipin
Your current entity design breaks 1NF. This means that your schema can erroneously store duplicate data.
nodes_customvars describes the many-to-many relationship between nodes and customvars. This type of table is sometimes referred to as an auxiliary table, because its contents are purely derived from base tables (in this case nodes and customvars).
The PK for an auxiliary table describing a many-to-many relationship should be a composite key in order to prevent duplication. Basically 1NF.
Any PK on a table is inherently UNIQUE, regardless of whether it is a single or composite key. So in some ways your question doesn't make sense, because you are talking about turning the UNIQUE constraint on/off on id for nodes and customvars, which you can't do if your id is actually a PK.
So what are you actually trying to achieve here???

Enum datatype versus table of data in MySQL?

I have one MySQL table, users, with the following columns:
user_id (PK)
email
name
password
To manage a roles system, would there be a downside to either of the following options?
Option 1:
Create a second table called roles with three columns: role_id (Primary key), name, and description, then associate users.user_id with roles.role_id as foreign keys in a third table called users_roles?
Or...
Option 2:
Create a second table called roles with two columns: user_id (Foreign key from users.user_id) and role (ENUM)? The ENUM datatype column would allow for a short list of allowable roles to be inserted as values.
I've never used the ENUM datatype in MySQL before, so I'm just curious, as option 2 would mean one less table. I hope that makes sense, this is the first time I've attempted to describe MySQL tables in a forum.
In general, ENUM types are not meant to be used in these situations. This is especially the case if you intend to cater for the flexibility of adding or removing roles in the future. The only way to change the values of an ENUM is with an ALTER TABLE, while defining the roles in their own table will simply require a new row in the roles table.
In addition, using the roles table allows you to add additional columns to better define the role, like the description field you suggested in Option 1. This is not possible if you were to use an ENUM type as in Option 2.
Personally I would not opt for an ENUM in these scenarios. Maybe I can see them being used for columns with an absolutely finite set of values, such as {Spades, Hearts, Diamonds, Clubs} to define the suit of a card, but not in cases such as the one in question, for the disadvantages mentioned earlier.
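Option 1 might look roughly like the following sketch, written with Python's sqlite3 module for illustration. The column types and the sample user and roles are assumptions. Adding a role is an ordinary INSERT rather than an ALTER TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,
    email    TEXT NOT NULL UNIQUE,
    name     TEXT NOT NULL,
    password TEXT NOT NULL
);
CREATE TABLE roles (
    role_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    description TEXT
);
CREATE TABLE users_roles (
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    role_id INTEGER NOT NULL REFERENCES roles(role_id),
    PRIMARY KEY (user_id, role_id)
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO users VALUES (1, 'a@example.com', 'Alice', 'not-a-real-hash')")
conn.executemany("INSERT INTO roles (role_id, name, description) VALUES (?, ?, ?)",
                 [(1, "admin", "Full access"), (2, "editor", "Can edit content")])
conn.execute("INSERT INTO users_roles (user_id, role_id) VALUES (1, 2)")


def roles_for(conn, user_id):
    """Return the role names assigned to a user, alphabetically."""
    rows = conn.execute(
        "SELECT r.name FROM roles AS r "
        "JOIN users_roles AS ur ON ur.role_id = r.role_id "
        "WHERE ur.user_id = ? ORDER BY r.name",
        (user_id,),
    )
    return [row[0] for row in rows]


def add_role(conn, name, description):
    """Define a new role with a plain INSERT and return its id."""
    cursor = conn.execute("INSERT INTO roles (name, description) VALUES (?, ?)",
                          (name, description))
    return cursor.lastrowid
```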
Using ENUM for the case you suggested only makes sense when you have a strictly defined ORM on the receiving end that, for instance, maps db rows into a list of flat objects automatically.
Example:
CREATE TABLE animal (Category ENUM('reptiles','mammals'), Name VARCHAR(50));
is automatically mapped to
object animal
animal->Category
animal->Name
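As a rough illustration of that kind of mapping, here is a hand-written Python sketch rather than a real ORM. The Category enum, the Animal class and the row_to_animal helper are hypothetical names, and the row is assumed to arrive as a (category, name) tuple.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """Mirrors the allowed values of the database ENUM column."""
    REPTILES = "reptiles"
    MAMMALS = "mammals"


@dataclass
class Animal:
    category: Category
    name: str


def row_to_animal(row):
    """Map a (category, name) database row onto a flat Animal object."""
    category, name = row
    # Category(...) raises ValueError for a value outside the enum.
    return Animal(category=Category(category), name=name)
```

The application-side enum has to be kept in step with the column definition, which is part of why this only pays off when the set of values is genuinely fixed.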