database nomenclature for abstracting an instance of a class of data - mysql

TABLE           FIELDS
users           user_id
items           item_id
item_instance   item_inst_id
                item_inst_qty
                user_id
                item_id
The first table has all the users, and the second table has all the items.
The third table records the quantity of a specific item which a user has. It is not recording data about the item itself, but rather the data about the relationship between the item and user.
My question is: what is the third type of table generally called? The most accurate abstraction I could think of is "instance," as taken from OOP, because each data record represents an occurrence of a particular class of data, which in this case is the "item" class. "Case" was another possibility.
In database programming, is there a generally accepted term for a table whose records tie two tables together in this manner? Or is the naming convention usually up to the programmer? If the latter, what is your take?
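For concreteness, here is a minimal sketch of the three tables described above. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. The name columns, the sample rows, the UNIQUE constraint, and the inventory helper are assumptions added for illustration; they are not part of the schema in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE items (item_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per (user, item) pair; the quantity describes the relationship,
-- not the item itself.
CREATE TABLE item_instance (
    item_inst_id  INTEGER PRIMARY KEY,
    item_inst_qty INTEGER NOT NULL,
    user_id       INTEGER NOT NULL REFERENCES users(user_id),
    item_id       INTEGER NOT NULL REFERENCES items(item_id),
    UNIQUE (user_id, item_id)  -- assumption: one row per pair
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO items VALUES (10, 'sword'), (11, 'potion');
INSERT INTO item_instance (item_inst_qty, user_id, item_id)
VALUES (1, 1, 10), (5, 1, 11), (2, 2, 11);
""")


def inventory(conn, user_id):
    """Return (item name, quantity) pairs held by one user."""
    rows = conn.execute(
        """SELECT i.name, ii.item_inst_qty
           FROM item_instance AS ii
           JOIN items AS i ON i.item_id = ii.item_id
           WHERE ii.user_id = ?
           ORDER BY i.name""",
        (user_id,),
    )
    return rows.fetchall()


alice_inventory = inventory(conn, 1)
```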

Related

Database Hierarchy Structure - Different Node Representation

I am looking for some feedback/guidance on modeling a hierarchy structure within a relational database. My requirement states that I need to have a tree structure, where every node within the tree can represent a different type of data. For example:
Organization
Department 1
Employee 1
Employee 2
Office Equipment 1
Office Equipment 2
Department 1
Team 1
Office Equipment 3
In the example above, Organization, Department, Employee, Office Equipment, and Team could all be different tables within the database and have different properties associated with them. Additionally, things like Office Equipment may not necessarily be required to be associated to a department - it could be associated to a Team or the Organization.
I have two ideas surrounding modeling this:
The first idea is to have a hierarchy table like below:
hierarchys
    hierarchy_id (INT, NOT NULL)
    parent_hierarchy_id (INT, NOT NULL)
    organization_id (INT, NULL)
    department_id (INT, NULL)
    team_id (INT, NULL)
    office_equipment (INT, NULL)
In the table above, each of the type columns would be a nullable field with a foreign key reference to its respective table. The idea is that exactly one of those columns would be populated in every row.
My second idea is to have a single table like below:
hierarchys
    hierarchy_id (INT, NOT NULL)
    parent_hierarchy_id (INT, NOT NULL)
    type (INT, NOT NULL)
In this case, the table above would manage the hierarchy structure, and each "node table" would have a hierarchy_id column with a foreign key reference back to the hierarchy table (e.g., organizations would have a hierarchy_id column). The type column would be a lookup that identifies which type of node is being represented (e.g., Organization, Employee, etc.).
I see pros and cons in both approaches.
Some additional information:
I would like to keep in mind maintainability of this table - there will be additions, deletions, changes, etc.
I will have to display this data in a user interface, which will likely just display an icon to represent the node type, and the name.
I will have to perform some aggregations across the tree for different data requests.
This structure will be backed by a MySQL database.
Does anyone have any experience with a similar scenario? I have searched quite a bit for information and guidance on this approach, but have not been able to find anything. I have a feeling there is a specific term for what I am looking for that I am failing to use.
Thank you in advance for the community's help.
You may want to look into "nested sets". This is a model for representing subsets of an ordered set by two limits, which we can call "left" and "right". In this model, (6,7) is a subset of (5,10) because it is "nested" inside of it. If you use nested sets together with your design of having a separate table for the hierarchy, you'll end up with four columns in your hierarchy table: leftID, rightID, ObjectID (an FK), and level.
There is a good description of the nested set model on Wikipedia.
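As a rough sketch of how the nested set columns might be queried, here is a small example using Python's built-in sqlite3 module in place of MySQL. The left/right numbering, the sample nodes, the name column, and both helper functions are assumptions for illustration. In the design described above, object_id would point into the per-type tables instead of the name being stored inline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hierarchy (
    left_id   INTEGER NOT NULL,
    right_id  INTEGER NOT NULL,
    object_id INTEGER NOT NULL,
    level     INTEGER NOT NULL,
    name      TEXT NOT NULL      -- assumption: kept inline for brevity
);
-- A node is nested inside another when its (left, right) interval
-- lies strictly within the other node's interval.
INSERT INTO hierarchy VALUES
    (1, 10, 1, 0, 'Organization'),
    (2,  7, 2, 1, 'Department 1'),
    (3,  4, 3, 2, 'Employee 1'),
    (5,  6, 4, 2, 'Employee 2'),
    (8,  9, 5, 1, 'Team 1');
""")


def descendants(conn, name):
    """All nodes whose interval lies inside the named node's interval."""
    rows = conn.execute(
        """SELECT child.name
           FROM hierarchy AS parent
           JOIN hierarchy AS child
             ON child.left_id > parent.left_id
            AND child.right_id < parent.right_id
           WHERE parent.name = ?
           ORDER BY child.left_id""",
        (name,),
    ).fetchall()
    return [r[0] for r in rows]


def ancestors(conn, name):
    """All nodes whose interval encloses the named node's interval."""
    rows = conn.execute(
        """SELECT parent.name
           FROM hierarchy AS child
           JOIN hierarchy AS parent
             ON parent.left_id < child.left_id
            AND parent.right_id > child.right_id
           WHERE child.name = ?
           ORDER BY parent.left_id""",
        (name,),
    ).fetchall()
    return [r[0] for r in rows]


department_members = descendants(conn, "Department 1")
employee_path = ancestors(conn, "Employee 2")
```

Reads of a whole subtree need no recursion, which suits aggregation. The trade-off is that inserting or moving a node means renumbering left and right values.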
I have encountered similar situations throughout different projects, and the approach I've taken in those cases was very similar to your second solution.
I am also a bit biased towards how some Ruby on Rails gems do things, but you can easily figure out how you would implement these techniques with plain SQL and some application logic. So I'm giving you one alternative to your solution:
Using "Multi Table Inheritance" (Implemented in Heritage: https://github.com/dipth/Heritage). In this scenario you would have a Node table which forms the basis of your hierarchy with:
Node (id, parent_node_id, heir_type, heir_id)
Where the heir_type is the name of the table holding the details for the node (e.g., Organization, Employee, team, etc.), and the heir_id is the id of the object in that table.
Then each type of node would have its own table and its own unique id, e.g.:
Organization(id, name, address)
Having the rest of the tables independently from the hierarchy (i.e., strong entities) makes your model more flexible to new additions. Also having a separate table with its own unique id to handle the hierarchy makes it easier to render the hierarchy without having to deal with parent types etc. This model is also more flexible in the sense that one entity can be part of many different branches of the hierarchy (e.g., Employee 1 could be a member of Team 1 and Team 2 at the same time.)
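Here is a minimal sketch of that Node layout, written with Python's built-in sqlite3 module instead of MySQL or Rails. The employee table, the sample rows, the whitelist of type names, and the two helper functions are assumptions for illustration. The database cannot enforce a foreign key on the (heir_type, heir_id) pair, so the lookup is done in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organization (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE node (
    id             INTEGER PRIMARY KEY,
    parent_node_id INTEGER REFERENCES node(id),  -- NULL for the root
    heir_type      TEXT NOT NULL,                -- name of the detail table
    heir_id        INTEGER NOT NULL              -- id within that table
);
INSERT INTO organization VALUES (1, 'Acme', '1 Main St');
INSERT INTO employee VALUES (1, 'Employee 1');
INSERT INTO node VALUES (1, NULL, 'organization', 1), (2, 1, 'employee', 1);
""")

# Table names cannot be bound as query parameters, so they are checked
# against a fixed whitelist before being interpolated into SQL.
HEIR_TABLES = {"organization", "employee"}


def node_label(conn, node_id):
    """Return (type, name) for one node by following heir_type/heir_id."""
    heir_type, heir_id = conn.execute(
        "SELECT heir_type, heir_id FROM node WHERE id = ?", (node_id,)
    ).fetchone()
    if heir_type not in HEIR_TABLES:
        raise ValueError(f"unknown heir_type: {heir_type!r}")
    (name,) = conn.execute(
        f"SELECT name FROM {heir_type} WHERE id = ?", (heir_id,)
    ).fetchone()
    return heir_type, name


def children(conn, node_id):
    """Return (type, name) for each direct child of a node."""
    ids = conn.execute(
        "SELECT id FROM node WHERE parent_node_id = ? ORDER BY id", (node_id,)
    ).fetchall()
    return [node_label(conn, child_id) for (child_id,) in ids]


root_label = node_label(conn, 1)
root_children = children(conn, 1)
```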
Your solution has one mistake: "hierarchys" is misspelled :P JK. The real problem is that the hierarchys table has no unique id of its own. It looks like the unique id is a composite key (hierarchy_id, type). The parent_hierarchy_id does not capture the type of the parent, so it may match multiple nodes and lead to inconsistencies.
If you'd like me to elaborate more, let me know.

Connecting Two Items in a Database - Best method?

In a MySQL Database, I have two tables: Users and Items
The idea is that Users can create as many Items as they want, each with unique IDs, and they will be connected so that I can display all of the Items from a particular user.
Which is the better method in terms of performance and clarity? Is there even a real difference?
1. Each User will contain a column with a list of Item IDs, and the query will retrieve all matching Item rows.
2. Each Item will contain a column with the ID of the User that created it, and the query will call for all Items with a specific User ID.
Let me just clarify why approach 2 is superior...
Approach 1 means you'd be packing several distinct pieces of information into the same database field. That violates the principle of atomicity and therefore 1NF. As a consequence:
Indexing won't work (bad for performance).
FOREIGN KEYs and type safety won't work (bad for data integrity).
Indeed, approach 2 is the standard way to represent such a "one to many" relationship.
The 2nd approach is better, because it defines a one-to-many relationship from the USER table to the ITEM table.
You can create a foreign key on the ITEM table's USERID column that refers to the USERID column in the USER table.
You can then easily join both tables, and an index can be used for that query.
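The foreign key, join, and index mentioned above can be sketched roughly as follows. This uses Python's built-in sqlite3 module as a stand-in for MySQL. The plural table names, the title column, the index name, and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE items (
    itemid INTEGER PRIMARY KEY,
    title  TEXT NOT NULL,
    userid INTEGER NOT NULL REFERENCES users(userid)  -- the owning user
);
-- Index the foreign key column so "items for user X" is a lookup,
-- not a full table scan.
CREATE INDEX idx_items_userid ON items(userid);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO items (title, userid) VALUES ('lamp', 1), ('desk', 1), ('chair', 2);
""")


def items_for_user(conn, userid):
    """Titles of all items created by one user, via a join."""
    rows = conn.execute(
        """SELECT i.title
           FROM users AS u
           JOIN items AS i ON i.userid = u.userid
           WHERE u.userid = ?
           ORDER BY i.itemid""",
        (userid,),
    ).fetchall()
    return [r[0] for r in rows]


alice_items = items_for_user(conn, 1)

# The foreign key rejects an item that points at a user who does not exist.
try:
    conn.execute("INSERT INTO items (title, userid) VALUES ('ghost', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

With a comma-separated list of IDs in the Users table, neither the index nor the integrity check would be available.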
As long as an item doesn't have multiple owners, it's a one-to-many relationship. This typically reduces to the second approach you mention, e.g., a user or created_by column in the Items table.
If a User can have one or more Items but each Item is owned by only a single User, then you have a classic One-To-Many relationship.
The first option, cramming a list of related IDs into a single field, is exactly the wrong way to do it.
Assign a unique identifier field to each table (called the primary key). And add an extra field to the Item table, a foreign key, the id of the User that owns that item.
Like this ERD (entity-relationship diagram)…
You have some learning to do about relational database design and normalization.

Database Normalization - I think?

We have a J2EE content management and e-commerce system, and in this system – for sake of a simple example – let’s say that we have 100 objects. All of these objects extend the same base class, and all share many of the same fields.
Let’s take two objects as an example: a news item that would be posted on a website, and a product that would be sold on a website. Both of these share common properties:
IDs: id, client ID, parent ID (long)
Flags: deleted, archived, inactive (boolean)
Dates: created, modified, deleted (datetime)
Content: name, description
And of course they have some properties that are different:
News item: author, posting date
Product: price, tax
So (finally) here is my question. Let’s say we have 100 objects in our system, and they all follow this pattern. They have many fields that overlap, and some unique fields. In terms of a relational database, would we be better off with:
Option One: Less Tables, Common Tables
table_id: id, client ID, parent ID (long) (id is the primary key, a GUID for all objects)
table_flag: id, deleted, archived, inactive (boolean)
table_date: id, created, modified, deleted (datetime)
table_content: id, name, description
table_news: id, author, posting date
table_product: id, price, tax
Option Two: More Tables, Common Fields Repeated
table_news: id, client ID, parent ID, deleted, archived, inactive, name, description, author, posting date
table_product: id, client ID, parent ID, deleted, archived, inactive, name, description, price, tax
For full disclosure – I am a developer and not a DBA, and because of that I prefer option one. But there is another team member that prefers option two, and I think he makes valid points.
Option One: Pros and Cons
Pro: Encapsulates common fields into common tables.
Pro: Need to change a common field? Change it in one place.
Pro: Only creates new fields/tables when they are needed.
Pro: Easier to create the queries dynamically, less repetitive code
Con: More joining to create objects (not sure of DB impact on that)
Con: More complex queries to store objects (not sure of DB impact on that)
Con: Common tables will become huge over time
Option Two: Pros and Cons
Pro: Perhaps it is better to distribute the load of all objects across tables?
Pro: Could index the news table on the client ID, and index the product table on the parent ID.
Pro: More readable to human eye: easy to see all the fields for an object in one table.
My Two Cents
For me, I much prefer the elegance of the first option – but maybe that is me trying to force object oriented patterns on a relational database. If all things were equal, I would go with option one UNLESS a DB expert told me that when we have millions of objects in the system, option one is going to create a performance problem.
Apologies for the long winded question. I am not great with DB lingo, so I probably could have summarized this more succinctly if I better understood terms like normalization. I tried to search for answers on this topic, and while I found many that were close (I suspect this is a common DB issue) I could not find any that answered all my questions. I read through this article on normalization:
But I did not totally understand it. On the one hand it was saying that you should remove any redundancies. But on the other hand, it was saying that each attribute should define only one object.
Thanks,
John
You should read Patterns of Enterprise Application Architecture by Martin Fowler. He writes about several options for the scenario you describe:
Single Table Inheritance: One table for all object subtypes. Stores all attributes, setting them NULL where they are inapplicable to the row's object subtype.
Class Table Inheritance: One table for columns common to all subtypes, then one table for each subtype to store subtype-specific columns.
Concrete Table Inheritance: One table for each subtype, storing both subtype-specific columns and columns common to all subtypes.
Serialized LOB: One table for all object subtypes. Store common attributes as conventional columns, but combine optional or subtype-specific columns as fields in a BLOB that stores XML or JSON or whatever format you want.
Each one of these designs has pros and cons, so choose a solution depending on the most common way you access your data.
However, notice I use the word subtype above. I would use these designs only if the different object types are subtypes of a common base class. I'm assuming that News item and Product don't actually share a logical base class (besides Object); they are not subtypes of a common superclass.
So for the sake of OO design, I would choose Concrete Table Inheritance. This avoids any inappropriate coupling between these subtypes. There are columns the two tables have in common, but they basically amount to bookkeeping, not anything to do with the function of the class and hence the table.
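To make the difference concrete, here is a small sketch of two of the layouts listed above, Class Table Inheritance and Concrete Table Inheritance, for the Product example. It uses Python's built-in sqlite3 module in place of the real database. The table names, the reduced column set, and the sample row are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Class Table Inheritance: shared columns live once in a base table,
-- and each subtype table holds only its own columns.
CREATE TABLE cti_object (
    id        INTEGER PRIMARY KEY,
    client_id INTEGER NOT NULL,
    name      TEXT NOT NULL,
    deleted   INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE cti_product (
    id    INTEGER PRIMARY KEY REFERENCES cti_object(id),
    price REAL NOT NULL,
    tax   REAL NOT NULL
);

-- Concrete Table Inheritance: each subtype table repeats the shared
-- columns, so one row holds the whole object.
CREATE TABLE concrete_product (
    id        INTEGER PRIMARY KEY,
    client_id INTEGER NOT NULL,
    name      TEXT NOT NULL,
    deleted   INTEGER NOT NULL DEFAULT 0,
    price     REAL NOT NULL,
    tax       REAL NOT NULL
);

INSERT INTO cti_object (id, client_id, name) VALUES (1, 7, 'Widget');
INSERT INTO cti_product VALUES (1, 10.0, 0.5);
INSERT INTO concrete_product (id, client_id, name, price, tax)
VALUES (1, 7, 'Widget', 10.0, 0.5);
""")

# Class Table Inheritance needs a join to rebuild one object.
cti_row = conn.execute(
    """SELECT o.name, p.price
       FROM cti_object AS o
       JOIN cti_product AS p ON p.id = o.id
       WHERE o.id = ?""",
    (1,),
).fetchone()

# Concrete Table Inheritance reads the object from a single table.
concrete_row = conn.execute(
    "SELECT name, price FROM concrete_product WHERE id = ?", (1,)
).fetchone()
```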

Database design for user entries (using mysql)

The main pieces of data I'm having my users enter is an object called an "activity" which consists of a few text fields, a few strings, etc. One of the text fields, called a "Description", could possibly be quite long (such as a long blog post). For each user I would like to store all of their activity objects in a mysql database.
Here are some solutions I've thought of:
Have a separate mysql table for each user's activities, i.e. activities_userX, X ranging over
Use json to encode these objects into strings and store them as a column in the main table
Have one separate table for all these activities objects, and just index them; then for each user in the main table have a list of indices corresponding to which activities are theirs.
What are the pros/cons of these methods? More importantly, what else could I be doing?
Thanks.
> Have a separate mysql table for each user's activities, i.e. activities_userX, X ranging over
A table for every user? That just means an insane number of tables.
> Use json to encode these objects into strings and store them as a column in the main table
JSON is a good transport language. You have a database for storing your data, use its features.
> Have one separate table for all these activities objects, and just index them; then for each user in the main table have a list of indices corresponding to which activities are theirs.
Getting closer.
This sort of relationship is usually known as 'has many'. In this case "A user has many activities".
You should have a table of users and a table of activities.
One of the columns of the activities table should be a foreign key that points to the primary key of the user table.
Then you will be able to do:
SELECT fields, i, want FROM activities WHERE userid = ?
Or
SELECT users.foo, users.bar, activities.description
FROM users
JOIN activities ON users.userid = activities.userid

Super general database structure

Say I have a store that sells products that fall under various categories... and each category has associated properties... like a drill bit might have coating, diameter, helix angle, or whatever. The issue is that I'd like the user to be able to edit these properties. If I wasn't interested in having the user change the properties, and I was building the store for a certain set of categories, I'd have one table for drill bits, etc. Alternatively, I could just modify the schema online but that doesn't seem to be done very often (unless we're talking phpmyadmin or something), and plus that doesn't fit in well at all with the way models are coupled to tables.
In general, I'm interested in implementing a multi-table database structure with various datatypes (because diameter might be a decimal, coating would be a string/index into a table, etc), within mysql. Any idea how this might be done?
If I understand correctly what you're asking, an admittedly hacky solution would be to have a products table with two related tables, product_properties and product_properties_lookup (or some better name). product_properties_lookup has an entry for every possible property a product can have, and product_properties stores the value of a property as a string, along with the ID of the property and the ID of the product. You could then coerce the property value into whatever type you wanted. Not ideal, but I'm not sure what else to do short of adding individual columns to the DB for property types.
Just use the database. It does all of this already. For free. And fast. How is having a table of products point to a table of properties with data types any different from a table with columns? It's not. Except that if you use the DB's tables, you get to use SQL to query it in all sorts of neat and efficient ways compared to your own (crosstabs suck in SQL DBs).
Get a new product, make a new table. No big deal. Get a new property, alter the table. If you have 1M products in that table, yea, it may be a slow update (depends on the DB). Do you have 1M products? I don't think WalMart has 1M products.
Building Databases on top of Databases is a silly thing. Just use the one that's there. It is putty in your hands. Mold it to your whim.
Create a Property table first. This will contain all properties. It should have (at minimum) a Name column and a Type column ('string', 'boolean', 'decimal', etc.). Note: Primary keys are implied for all these tables.
Next, create a CategoryProperty table. Here you will be able to assign properties to a category. It should have these columns: CategoryID, PropertyID. Both foreign keys.
Then, create a Category table. This describes the categories. It should have a Name column and possibly some other columns like Description.
Then, create a ProductCategory table. Here, you will assign the categories for each product. It should have these columns: CategoryID, ProductID. Both foreign keys.
Next, create a PropertyValue table. Here, you will "instantiate" the properties and give them values. Columns include ProductID, PropertyID, and PropertyValue. The primary key can consist of ProductID and PropertyID.
Finally, create a Product table that just describes each product with columns like Name, Price, etc.
Note how for each relationship there is a separate table. If you only want one category for each product, you can do away with the ProductCategory table and just put a CategoryID field in the Product table. Similarly, if you want each property to belong to only one category, you can put a CategoryID column in the Property table and get rid of the CategoryProperty table.
Lastly, you will not be able to verify the data type for each property since each property has a different type (and they are rows, not columns). So just make the PropertyValue column a string and then perform your validation either as a trigger, or in your application, by checking the Type column of the Property table for that property.
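The tables and the application-side type check described above could look roughly like this. The sketch uses Python's built-in sqlite3 module as a stand-in for MySQL. The ID column names, the sample drill-bit data, the set of supported type names, and the set_property and get_properties helpers are all assumptions for illustration.

```python
import sqlite3
from decimal import Decimal, InvalidOperation

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Property (PropertyID INTEGER PRIMARY KEY,
                       Name TEXT NOT NULL, Type TEXT NOT NULL);
CREATE TABLE Category (CategoryID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Product  (ProductID INTEGER PRIMARY KEY,
                       Name TEXT NOT NULL, Price REAL);
CREATE TABLE CategoryProperty (
    CategoryID INTEGER NOT NULL REFERENCES Category(CategoryID),
    PropertyID INTEGER NOT NULL REFERENCES Property(PropertyID),
    PRIMARY KEY (CategoryID, PropertyID));
CREATE TABLE ProductCategory (
    CategoryID INTEGER NOT NULL REFERENCES Category(CategoryID),
    ProductID  INTEGER NOT NULL REFERENCES Product(ProductID),
    PRIMARY KEY (CategoryID, ProductID));
CREATE TABLE PropertyValue (
    ProductID     INTEGER NOT NULL REFERENCES Product(ProductID),
    PropertyID    INTEGER NOT NULL REFERENCES Property(PropertyID),
    PropertyValue TEXT NOT NULL,   -- always stored as a string
    PRIMARY KEY (ProductID, PropertyID));
INSERT INTO Property VALUES (1, 'diameter', 'decimal'), (2, 'coating', 'string');
INSERT INTO Category VALUES (1, 'Drill bits');
INSERT INTO CategoryProperty VALUES (1, 1), (1, 2);
INSERT INTO Product VALUES (1, 'Twist bit', 4.5);
INSERT INTO ProductCategory VALUES (1, 1);
""")


def _parse_bool(text):
    if text not in ("true", "false"):
        raise ValueError(text)
    return text == "true"


# Maps the Type column of Property to a parser used for validation.
PARSERS = {"string": str, "decimal": Decimal, "boolean": _parse_bool}


def set_property(conn, product_id, property_name, raw_value):
    """Validate raw_value against the property's Type, then store it."""
    row = conn.execute(
        "SELECT PropertyID, Type FROM Property WHERE Name = ?", (property_name,)
    ).fetchone()
    if row is None:
        raise KeyError(property_name)
    property_id, type_name = row
    try:
        PARSERS[type_name](raw_value)
    except (InvalidOperation, ValueError) as exc:
        raise ValueError(f"{raw_value!r} is not a valid {type_name}") from exc
    # INSERT OR REPLACE is SQLite syntax; MySQL would use REPLACE INTO.
    conn.execute(
        "INSERT OR REPLACE INTO PropertyValue VALUES (?, ?, ?)",
        (product_id, property_id, raw_value),
    )


def get_properties(conn, product_id):
    """Return {property name: typed value} for one product."""
    rows = conn.execute(
        """SELECT p.Name, p.Type, v.PropertyValue
           FROM PropertyValue AS v
           JOIN Property AS p ON p.PropertyID = v.PropertyID
           WHERE v.ProductID = ?""",
        (product_id,),
    ).fetchall()
    return {name: PARSERS[type_name](value) for name, type_name, value in rows}


set_property(conn, 1, "diameter", "6.35")
set_property(conn, 1, "coating", "titanium nitride")
drill_bit = get_properties(conn, 1)
```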
If you're using a recent-ish version of MySQL (5.1.5 or greater), you can store your data as XML in the database. You can then query that data using things like this.
Suppose I have a table that contains some items, and I have a widgetpack that contains numerous widgets. I can get my total number of widgets:
SELECT SUM( EXTRACTVALUE( infoxml, '/info/widget_count/text()' ) ) AS widget_count
FROM items -- hypothetical table name
WHERE product_type = 'widgetpack'
assuming the table has an infoxml column and each widgetpack's infoxml column contains XML that looks like this:
<info>
    <widget_count>10</widget_count>
    <!-- Any other unstructured info can go in here too -->
</info>
DB purists will cringe at this, and it is kinda hacky. But often it's easier to keep all your unstructured data in one place.
Have a look at this database schema on DatabaseAnswers.org:
http://www.databaseanswers.org/data_models/products_and_generic_characteristics/index.htm
Maybe consider an Entity-Attribute-Value (EAV) approach (not for the whole model of course!).
Related questions
Entity Attribute Value Database vs. strict Relational Model Ecommerce question
Approach to generic database design
How do you build extensible data model