Super general database structure - mysql

Say I have a store that sells products that fall under various categories... and each category has associated properties... like a drill bit might have coating, diameter, helix angle, or whatever. The issue is that I'd like the user to be able to edit these properties. If I wasn't interested in having the user change the properties, and I was building the store for a certain set of categories, I'd have one table for drill bits, etc. Alternatively, I could just modify the schema online but that doesn't seem to be done very often (unless we're talking phpmyadmin or something), and plus that doesn't fit in well at all with the way models are coupled to tables.
In general, I'm interested in implementing a multi-table database structure with various datatypes (because diameter might be a decimal, coating would be a string/index into a table, etc), within mysql. Any idea how this might be done?

If I understand correctly what you're asking, an admittedly hacky solution would be to have a products table with two related tables, product_properties and product_properties_lookup (or some better name). product_properties_lookup has an entry for every possible property a product can have, and product_properties contains the value of a property as a string, together with the ID of the property and the ID of the product. You could then coerce the property value into whatever type you wanted. Not ideal, but I'm not sure what else to do short of adding individual columns to the DB for property types.

Just use the database. It does all of this already. For free. And fast. How is having a table of products point to a table of properties with data types any different from a table with columns? It's not. Except that if you use the DB's tables, you get to use SQL to query them in all sorts of neat and efficient ways compared to rolling your own (crosstabs suck in SQL DBs).
Get a new product, make a new table. No big deal. Get a new property, alter the table. If you have 1M products in that table, yea, it may be a slow update (depends on the DB). Do you have 1M products? I don't think WalMart has 1M products.
Building Databases on top of Databases is a silly thing. Just use the one that's there. It is putty in your hands. Mold it to your whim.

Create a Property table first. This will contain all properties. It should have (at minimum) a Name column and a Type column ('string', 'boolean', 'decimal', etc.). Note: Primary keys are implied for all these tables.
Next, create a CategoryProperty table. Here you will be able to assign properties to a category. It should have these columns: CategoryID, PropertyID. Both foreign keys.
Then, create a Category table. This describes the categories. It should have a Name column and possibly some other columns like Description.
Then, create a ProductCategory table. Here, you will assign the categories for each product. It should have these columns: CategoryID, ProductID. Both foreign keys.
Next, create a PropertyValue table. Here, you will "instantiate" the properties and give them values. Columns include ProductID, PropertyID, and PropertyValue. The primary key can consist of ProductID and PropertyID.
Finally, create a Product table that just describes each product with columns like Name, Price, etc.
Note how for each relationship there is a separate table. If you only want one category for each product, you can do away with the ProductCategory table and just put a CategoryID field in the Product table. Similarly, if you want each property to belong to only one category, you can put a PropertyID column in the Category table and get rid of the CategoryProperty table.
Lastly, you will not be able to verify the data type for each property since each property has a different type (and they are rows, not columns). So just make the PropertyValue column a string and then perform your validation either as a trigger, or in your application, by checking the Type column of the Property table for that property.
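The six tables described above could be sketched roughly as follows. This is only an illustration under some assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, the ID column names and sample rows are made up, and `product_properties` and `CASTS` are hypothetical names for the application-side step that checks the Type column.

```python
import sqlite3
from decimal import Decimal

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Property (
    PropertyID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    Type       TEXT NOT NULL          -- 'string', 'boolean', 'decimal', ...
);
CREATE TABLE Category (
    CategoryID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
CREATE TABLE CategoryProperty (
    CategoryID INTEGER NOT NULL REFERENCES Category(CategoryID),
    PropertyID INTEGER NOT NULL REFERENCES Property(PropertyID),
    PRIMARY KEY (CategoryID, PropertyID)
);
CREATE TABLE Product (
    ProductID INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL,
    Price     NUMERIC
);
CREATE TABLE ProductCategory (
    CategoryID INTEGER NOT NULL REFERENCES Category(CategoryID),
    ProductID  INTEGER NOT NULL REFERENCES Product(ProductID),
    PRIMARY KEY (CategoryID, ProductID)
);
CREATE TABLE PropertyValue (
    ProductID     INTEGER NOT NULL REFERENCES Product(ProductID),
    PropertyID    INTEGER NOT NULL REFERENCES Property(PropertyID),
    PropertyValue TEXT NOT NULL,      -- always stored as a string
    PRIMARY KEY (ProductID, PropertyID)
);

-- Made-up sample data: one drill bit with two properties.
INSERT INTO Property VALUES (1, 'coating', 'string'), (2, 'diameter', 'decimal');
INSERT INTO Category VALUES (1, 'Drill bits');
INSERT INTO CategoryProperty VALUES (1, 1), (1, 2);
INSERT INTO Product VALUES (1, 'Example drill bit', 4.99);
INSERT INTO ProductCategory VALUES (1, 1);
INSERT INTO PropertyValue VALUES (1, 1, 'TiN'), (1, 2, '6.5');
""")

# Application-side coercion keyed on Property.Type (hypothetical helper).
CASTS = {"string": str, "decimal": Decimal, "boolean": lambda v: v == "true"}

def product_properties(conn, product_id):
    rows = conn.execute(
        """SELECT p.Name, p.Type, v.PropertyValue
           FROM PropertyValue v
           JOIN Property p ON p.PropertyID = v.PropertyID
           WHERE v.ProductID = ?""",
        (product_id,),
    )
    return {name: CASTS[type_](value) for name, type_, value in rows}

props = product_properties(conn, 1)
```

Here `props` holds the coating as a string and the diameter as a `Decimal`, even though both were stored as text.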

If you're using a recentish version of MySQL (5.1.5 or greater) you can store your data as XML in the database. You can then query that data using things like this.
Suppose I have a table that contains some items and I have a widgetpack that contains numerous widgets. I can get my total number of widgets:
SELECT SUM( EXTRACTVALUE( infoxml, '/info/widget_count/text()' ) ) AS widget_count
FROM items -- "items" is a placeholder table name
WHERE product_type = 'widgetpack'
assuming the table has an infoxml column and each widgetpack's infoxml column contains XML that looks like this
<info>
<widget_count>10</widget_count>
<!-- Any other unstructured info can go in here too -->
</info>
DB purists will cringe at this, and it is kinda hacky. But often it's easier to keep all your unstructured data in one place.

Have a look at this database schema on DatabaseAnswers.org:
http://www.databaseanswers.org/data_models/products_and_generic_characteristics/index.htm

Maybe consider an Entity-Attribute-Value (EAV) approach (not for the whole model of course!).
Related questions
Entity Attribute Value Database vs. strict Relational Model Ecommerce question
Approach to generic database design
How do you build extensible data model

Related

Best database table design for a table with dependent column values

I would like to know the best way of designing a table structure for dependent column values.
I have a scenario like this:
if the status of the field is alive, there is nothing to do;
if the status is died, some other column values need to be stored somehow.
What is the best way to handle this situation:
create a table containing all the columns, i.e. 'Died in the hospital', 'Cause of death', 'Date of Death' and 'Please narrate the event', and let them be null when the status is alive,
or
use a separate table for storing all the other attributes using Entity-attribute-value (EAV) concepts?
In the above scenario, signs and symptoms may be single, multiple, or "other" with a specification. How should this be stored?
What is the best way for performance and querying:
either provide 15 columns in a single table and store null if there is no value, or store foreign keys of symptoms in another table (in this strategy, how should the "other symptom" description column be stored)?
In general, if you know what the columns are, you should include those in the table. So, a table with columns such as: died_in_hospital, cause_of_death, and so on seems like a reasonable solution.
Entity-attribute-value models are useful under two circumstances:
The attributes are not known and new ones are added over time.
The number of attributes is so large and sparsely populated that most columns would be NULL.
In your case, you know the attributes, so you should put them into a table as columns.
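As a rough sketch of the "known attributes as columns" approach, the death-related fields can simply be nullable columns. The CHECK constraint that keeps them NULL while the status is alive is an extra assumption, not something stated above, and the table name, column names, and sample rows are invented. Python's built-in sqlite3 is used here as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (
    patient_id       INTEGER PRIMARY KEY,
    status           TEXT NOT NULL CHECK (status IN ('alive', 'died')),
    died_in_hospital INTEGER,          -- NULL while alive
    cause_of_death   TEXT,
    date_of_death    TEXT,
    event_narrative  TEXT,
    -- Assumed rule: death details may only be filled in when status = 'died'.
    CHECK (status = 'died' OR (
        died_in_hospital IS NULL AND cause_of_death IS NULL
        AND date_of_death IS NULL AND event_narrative IS NULL))
);
""")

conn.execute("INSERT INTO patient (patient_id, status) VALUES (1, 'alive')")
conn.execute(
    "INSERT INTO patient VALUES (2, 'died', 1, 'example cause', '2015-03-01', NULL)"
)

# An 'alive' row carrying a cause of death violates the CHECK constraint.
try:
    conn.execute(
        "INSERT INTO patient (patient_id, status, cause_of_death) "
        "VALUES (3, 'alive', 'example cause')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

died_count = conn.execute(
    "SELECT COUNT(*) FROM patient WHERE status = 'died'"
).fetchone()[0]
```

Queries stay ordinary single-table SELECTs, which is the main practical benefit over an EAV layout when the attribute list is fixed.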
An entity-attribute-value model is the best method; it will be helpful in data filtering/searching. Keeping the columns in the base table itself is against normalization rules.

A more efficient way to store data in MySQL using more than one table

I had one single table that had lots of problems. I was saving data separated by commas in some fields, and afterwards I wasn't able to search them. Then, after searching the web and finding a lot of solutions, I decided to split it into several tables.
That one table became 5 tables.
The first table is called agendamentos_diarios; this is the table where I'm going to store the schedules.
The second table is called tecnicos, and it stores the technicians' names. Two fields: id (primary key) and the name (varchar).
The third table is called agendamento_tecnico. This is the link table where I'm going to store the ids of the first and the second table. That's because some schedules are going to be attended by one or more technicians.
The fourth table is called veiculos (vehicles). It holds the id and the name of the vehicle (two fields).
The fifth table is the link between the first table and the vehicles table. Same thing: I'm going to store the schedule id and the vehicle id.
I had an image that can explain this better than I can in words.
Am I doing it correctly? Is there a better way of storing data in MySQL?
I agree with @Strawberry about the ids, but normally it is the Hibernate mapping type that does this. If you are not using Hibernate to design your tables, you should take the ID out of agendamento_tecnico and agendamento_veiculos. That way you guarantee uniqueness. If you don't want to do that, create a unique key on the FK fields of those tables.
I notice that you separate the vehicles table from your technicians. In your model the same vehicle can be in two different schedules at the same time (which doesn't make sense). It would be better if the vehicle were linked in the agendamento_tecnico table, which would then become agendamento_tecnico_veiculo.
Looking at your table I note (I'm Brazilian) that you have a column called "servico", which means service. Your schedule table is designed for only one service. What if the same schedule has more than one service? To solve this you can create a services table and an m-n relationship with the schedule. It will be easier to create reports and to keep the services well separated in your database.
There is also a nome_cliente field, which holds the client for that schedule. It would be better to have a cliente (client) table and link the schedule to it with an FK.
As said before, there is no right answer. You have to think about your problem and about how it might grow. Modelling a database properly will avoid a lot of headaches later.
Better is subjective, there's no right answer.
My natural instinct would be to break that schedule table up even more.
Looks like data about the technician and the client is duplicated.
Then again, you might have made a decision to denormalise for perfectly valid reasons.
Doubt you'll find anyone on here who disagrees with you not having comma-separated fields though.
Where you call a halt to the changes depends on your circumstances now. Comma-separated fields caused you an issue, and you got rid of them. So which part of where you are now is causing you an issue?
Looks OK, especially for a first try.
One comment: I would name PKs/FKs (ids) the same in all tables and not use 'id' as the name. (Additionally, we use '#' or '_' as the end character of primary/foreign keys; for example, technicos.technico_, and agendamento_tecnico has the fields agend_tech_ and technico_. But this is not a common convention.) It makes queries a bit more complex (because you must fully qualify the fields), but it makes the database schema more readable (you know at once which PK belongs to which FK).
Other comment: the two associative (I never wrote that word before!) tables, such as agendamento_tecnico, have their own ID field, but they do not need it, because the two (primary/unique) keys of the tables they join are unique themselves, so together they can be used as the PK for these tables, like:
CREATE TABLE agendamento_tecnico (
    technico_   int not null,
    agend_tech_ int not null,
    primary key (technico_, agend_tech_)
);
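To illustrate why the composite primary key is enough, here is a small runnable sketch using Python's built-in sqlite3 as a stand-in for MySQL. The column names (`nome`, `agendamento_id`, `tecnico_id`) and the sample rows are assumptions, since the original schema was only shown as an image.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tecnicos (id INTEGER PRIMARY KEY, nome TEXT NOT NULL);
CREATE TABLE agendamentos_diarios (id INTEGER PRIMARY KEY, servico TEXT);

-- Link table: no surrogate id, the pair of foreign keys is the primary key.
CREATE TABLE agendamento_tecnico (
    agendamento_id INTEGER NOT NULL REFERENCES agendamentos_diarios(id),
    tecnico_id     INTEGER NOT NULL REFERENCES tecnicos(id),
    PRIMARY KEY (agendamento_id, tecnico_id)
);

INSERT INTO tecnicos VALUES (1, 'Ana'), (2, 'Bruno');
INSERT INTO agendamentos_diarios VALUES (10, 'installation');
INSERT INTO agendamento_tecnico VALUES (10, 1), (10, 2);
""")

# Assigning the same technician to the same schedule twice is rejected.
try:
    conn.execute("INSERT INTO agendamento_tecnico VALUES (10, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# All technicians attending schedule 10.
names = [
    row[0]
    for row in conn.execute(
        """SELECT t.nome
           FROM agendamento_tecnico a
           JOIN tecnicos t ON t.id = a.tecnico_id
           WHERE a.agendamento_id = ?
           ORDER BY t.nome""",
        (10,),
    )
]
```

The same layout would apply to the schedule-to-vehicle link table.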

Database Normalization - I think?

We have a J2EE content management and e-commerce system, and in this system – for sake of a simple example – let’s say that we have 100 objects. All of these objects extend the same base class, and all share many of the same fields.
Let’s take two objects as an example: a news item that would be posted on a website, and a product that would be sold on a website. Both of these share common properties:
IDs: id, client ID, parent ID (long)
Flags: deleted, archived, inactive (boolean)
Dates: created, modified, deleted (datetime)
Content: name, description
And of course they have some properties that are different:
News item: author, posting date
Product: price, tax
So (finally) here is my question. Let’s say we have 100 objects in our system, and they all follow this pattern. They have many fields that overlap, and some unique fields. In terms of a relational database, would we be better off with:
Option One: Less Tables, Common Tables
table_id: id, client ID, parent ID (long) (id is the primary key, a GUID for all objects)
table_flag: id, deleted, archived, inactive (boolean)
table_date: id, created, modified, deleted (datetime)
table_content: id, name, description
table_news: id, author, posting date
table_product: id, price, tax
Option Two: More Tables, Common Fields Repeated
table_news: id, client ID, parent ID, deleted, archived, inactive, name, description, author, posting date
table_product: id, client ID, parent ID, deleted, archived, inactive, name, description, price, tax
For full disclosure – I am a developer and not a DBA, and because of that I prefer option one. But there is another team member that prefers option two, and I think he makes valid points.
Option One: Pros and Cons
Pro: Encapsulates common fields into common tables.
Pro: Need to change a common field? Change it in one place.
Pro: Only creates new fields/tables when they are needed.
Pro: Easier to create the queries dynamically, less repetitive code
Con: More joining to create objects (not sure of DB impact on that)
Con: More complex queries to store objects (not sure of DB impact on that)
Con: Common tables will become huge over time
Option Two: Pros and Cons
Pro: Perhaps it is better to distribute the load of all objects across tables?
Pro: Could index the news table on the client ID, and index the product table on the parent ID.
Pro: More readable to human eye: easy to see all the fields for an object in one table.
My Two Cents
For me, I much prefer the elegance of the first option – but maybe that is me trying to force object oriented patterns on a relational database. If all things were equal, I would go with option one UNLESS a DB expert told me that when we have millions of objects in the system, option one is going to create a performance problem.
Apologies for the long winded question. I am not great with DB lingo, so I probably could have summarized this more succinctly if I better understood terms like normalization. I tried to search for answers on this topic, and while I found many that were close (I suspect this is a common DB issue) I could not find any that answered all my questions. I read through this article on normalization:
But I did not totally understand it. On the one hand it was saying that you should remove any redundancies. But on the other hand, it was saying that each attribute should define only one object.
Thanks,
John
You should read Patterns of Enterprise Application Architecture by Martin Fowler. He writes about several options for the scenario you describe:
Single Table Inheritance: One table for all object subtypes. Stores all attributes, setting them NULL where they are inapplicable to the row's object subtype.
Class Table Inheritance: One table for the columns common to all subtypes, then one table for each subtype to store subtype-specific columns.
Concrete Table Inheritance: One table for each subtype, storing both subtype-specific columns and columns common to all subtypes.
Serialized LOB: One table for all object subtypes. Store common attributes as conventional columns, but combine optional or subtype-specific columns as fields in a BLOB that stores XML or JSON or whatever format you want.
Each one of these designs has pros and cons, so choose a solution depending on the most common way you access your data.
However, notice I use the word subtype above. I would use these designs only if the different object types are subtypes of a common base class. I'm assuming that News item and Product don't actually share a logical base class (besides Object); they are not subtypes of a common superclass.
So for the sake of OO design, I would choose Concrete Table Inheritance. This avoids any inappropriate coupling between these subtypes. There are columns the two tables have in common, but they basically amount to bookkeeping, not anything to do with the function of the class and hence the table.
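To make the difference between two of these patterns concrete, here is a minimal side-by-side sketch of Class Table Inheritance (close to Option One) and Concrete Table Inheritance (Option Two). Table names, the reduced column set, and the sample rows are assumptions for illustration, and Python's built-in sqlite3 stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Class Table Inheritance: shared columns live once, in a base table.
CREATE TABLE content_base (
    id        INTEGER PRIMARY KEY,
    client_id INTEGER,
    name      TEXT,
    deleted   INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE news_cti (
    id           INTEGER PRIMARY KEY REFERENCES content_base(id),
    author       TEXT,
    posting_date TEXT
);

-- Concrete Table Inheritance: each subtype table repeats the shared columns.
CREATE TABLE product_concrete (
    id        INTEGER PRIMARY KEY,
    client_id INTEGER,
    name      TEXT,
    deleted   INTEGER NOT NULL DEFAULT 0,
    price     NUMERIC,
    tax       NUMERIC
);

INSERT INTO content_base (id, client_id, name) VALUES (1, 7, 'Launch day');
INSERT INTO news_cti VALUES (1, 'John', '2010-01-01');
INSERT INTO product_concrete (id, client_id, name, price, tax)
VALUES (1, 7, 'Widget', 10, 2);
""")

# Class table: loading one object needs a join across base and subtype.
news = conn.execute(
    """SELECT b.name, n.author
       FROM news_cti n
       JOIN content_base b ON b.id = n.id"""
).fetchone()

# Concrete table: loading one object is a single-table read.
product = conn.execute(
    "SELECT name, price FROM product_concrete WHERE id = 1"
).fetchone()
```

The join in the first query is the cost listed under Option One's cons; the repeated column definitions in `product_concrete` are the cost of Option Two.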

DB design to store different products for each customer order

I'm building a simple way to insert customer orders into the db.
We have several products, each one needs different properties.
I've started designing the following tables:
CUSTOMER -> Order (FK to CUSTOMER) -> OrderItem (FK to Order)
Now I'm wondering how I could link product-specific tables to OrderItem.
Suppose I've two products: product1 (room_name, width, height, color) and product2 (number, width, height, type, optionals). I'd create two different tables and link them with the OrderItem, to get specific options, am I wrong? (of course there will be more than just two products)
How can I do this?
I'd have one Product table with a one-to-many relationship between OrderItem and Product. Put a FOREIGN KEY in the OrderItem table that points to its associated Product.
A design like yours would mean you'd have to add a table every time there was a new product. That would not do. You want to add products by inserting new rows.
No approach can resolve all of the issues you may be dealing with, the choice you make depends on which factor is most important to you.
Most people shy away from having multiple tables. One reason is that you don't know how many tables you may end up with in the future. Another is that your queries may also bloat by having to join to multiple tables. And it may become a maintenance headache with multiple queries to update every time you add a table. Finally, adding a table is not even remotely as friendly as adding a record (do you really want your app to be able to create tables?).
One option is just to add more and more fields to the Product table. By making the property fields NULLable, different products can use different fields.
But... You may then need to add logic to ensure that ProductX -always- has a value in FieldA, but that ProductY always has a value in FieldB, etc. And probably some meta-data about each product type so that your application knows which fields to use for which products. You still may need to add new fields, which is possibly tidier than adding new tables, but it is still something you probably don't want the application doing.
An option that totally avoids using DDL to add a product is to further normalise your data, and have the product-specific-properties in an Entity-Attribute-Value table. This is initially very attractive to many people as it is so generic and flexible.
Product(id, name, another-global-property, etc)
Product_Properties(product_id, property_id, property_value)
You'll probably have some meta-data and extra logic to ensure all the correct properties are used. But now you just add records to a generic structure whenever you create a new product.
But what type should "property value" be? It may need to hold strings, dates, numbers, anything. You could make it a string and use the meta-data to know how to CAST the value. Or you may have several value fields, one of each type, and a "field_type_id" or something to indicate which value-field should be read from.
It's also less friendly for certain searches. If you know a product_id, finding the properties is easy. If you want all products where the expiry date is in the past, you need to be careful about how you structure the data and indexes to make the query efficient. But if you want (expiry < today AND cost > 50) then you get a much different query from what you are used to - Each value is in a different ROW instead of a different FIELD.
Search performance really does begin to shrink as query complexity increases and design considerations become more technical.
Which way you go depends on application functional requirement, architecture and design decisions, and a good helpful dash of 'taste'.
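The two-condition search mentioned above (expiry < today AND cost > 50) might look roughly like this against the Product / Product_Properties layout. This is a sketch under assumptions: property_id is a readable text key rather than a numeric ID, values are stored as strings, dates are ISO-formatted so they compare correctly as text, "today" is pinned to a fixed date, and the sample rows are invented. Python's built-in sqlite3 stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Product_Properties (
    product_id     INTEGER NOT NULL REFERENCES Product(id),
    property_id    TEXT NOT NULL,
    property_value TEXT NOT NULL,
    PRIMARY KEY (product_id, property_id)
);
INSERT INTO Product VALUES (1, 'milk'), (2, 'caviar'), (3, 'truffle oil');
INSERT INTO Product_Properties VALUES
    (1, 'expiry', '2020-01-01'), (1, 'cost', '2'),
    (2, 'expiry', '2020-01-01'), (2, 'cost', '80'),
    (3, 'expiry', '2999-01-01'), (3, 'cost', '90');
""")

TODAY = "2024-01-01"  # fixed so the example is reproducible

# Each condition lives in a different ROW, so the properties table
# is joined once per attribute being filtered on.
rows = conn.execute(
    """SELECT p.name
       FROM Product p
       JOIN Product_Properties e
         ON e.product_id = p.id AND e.property_id = 'expiry'
       JOIN Product_Properties c
         ON c.product_id = p.id AND c.property_id = 'cost'
       WHERE e.property_value < ?
         AND CAST(c.property_value AS REAL) > 50
       ORDER BY p.name""",
    (TODAY,),
).fetchall()
expired_and_costly = [r[0] for r in rows]
```

With conventional columns the same search would be a single-table WHERE clause; here every extra condition adds another self-join and a cast.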
You have tagged the question as django, so you should read this recent post:
Coding an inventory system, with polymorphic items and manageable item types
In that post @ThibaultJ explains how to accomplish this with Django model utils.
The idea is that you have a 'product' model and you inherit product1 and product2 from this model, adding specific information for each. @ThibaultJ has posted interesting samples.
I will notify @ThibaultJ about this question. If @ThibaultJ writes an answer I will remove my post.
Here are some options
IMHO I would choose an Inheritance pattern, i.e. a new table called "ProductBase" with a unique Surrogate. ProductBase would have a classification, e.g. "ProductType", which would then allow you to join into the appropriate 'subclass' Product table. OrderItem would reference just the Surrogate. Referential Integrity is enforceable, and it gives the opportunity for extending to additional forms of products. It does however require the use of a common unique surrogate amongst all Product table types. If there are other tables (other than OrderItem) referencing Product, it would also avoid having to FK to composite keys.
Nullable Foreign Keys in OrderItem, i.e. OrderItem would have nullable FK to both (all) types of Product Tables, although only one of them would be present on each row.
Inner joining OrderItem to the appropriate Product tables would eliminate the 'wrong' product joins based on the NULLs. RI can still be enforced.
If you have the SAME type of Primary Key on all your Product subclass tables, then you could also add a single Product "Foreign" Key and a "ProductType" "Switch" on OrderItem. The problem here is that you can't enforce RI.
That said, I really wouldn't be creating a new table for each and every product - surely there are some broad 'categories' of Product which can be modelled in a uniform manner.
No doubt if you sell Aircraft and Groceries that you would probably need a AircraftProduct and a GroceryProduct, but surely A300, Boeing 747 and Cessna Skyhawk would fit as rows inside AircraftProduct, even if there are a few 'optional' nullable fields in each table not applicable to all products in this 'category'?
Edit : First see Dems and Duffmo's posts to see if you can avoid the requirement for having multiple Product tables at all, by using EAV / Multivalue / Metadata patterns to model Product.
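The nullable-foreign-key option from the list above can be sketched as follows, using the product1 and product2 columns from the question. The CHECK constraint that forces exactly one of the two foreign keys to be set is an added assumption, the sample rows are invented, and Python's built-in sqlite3 stands in for MySQL (where CHECK constraints are only enforced from 8.0.16 onwards).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.executescript("""
CREATE TABLE Product1 (
    id INTEGER PRIMARY KEY, room_name TEXT, width INTEGER, height INTEGER, color TEXT
);
CREATE TABLE Product2 (
    id INTEGER PRIMARY KEY, number INTEGER, width INTEGER, height INTEGER,
    type TEXT, optionals TEXT
);
CREATE TABLE OrderItem (
    id          INTEGER PRIMARY KEY,
    product1_id INTEGER REFERENCES Product1(id),   -- nullable
    product2_id INTEGER REFERENCES Product2(id),   -- nullable
    -- Assumed rule: exactly one of the two references is present.
    CHECK ((product1_id IS NULL) <> (product2_id IS NULL))
);
INSERT INTO Product1 VALUES (1, 'kitchen', 100, 200, 'white');
INSERT INTO Product2 VALUES (1, 4, 50, 60, 'sliding', NULL);
INSERT INTO OrderItem VALUES (1, 1, NULL), (2, NULL, 1);
""")

def try_insert(sql):
    """Return True if the statement succeeds, False on a constraint error."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

no_product_ok = try_insert("INSERT INTO OrderItem VALUES (3, NULL, NULL)")
missing_product_ok = try_insert("INSERT INTO OrderItem VALUES (4, 99, NULL)")

# The inner join drops order items that belong to the other product table.
product1_items = conn.execute(
    """SELECT oi.id, p.room_name
       FROM OrderItem oi
       JOIN Product1 p ON p.id = oi.product1_id"""
).fetchall()
```

Referential integrity is kept, at the price of one extra nullable column on OrderItem per product table.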

MySQL Database - Related Results from same table / Many to Many database design problem

I am designing a relational database of products where there are products that are copies/bootlegs of each other, and I'd like to be able to show that through the system.
So initially, during my first draft, I had a field in products called "isacopyof", thinking to just store a comma-delimited list of productIDs that are copies of the current product.
Obviously once I started implementing, that wasn't going to work out.
So far, most many-to-many relationship solutions revolve around an associative table listing related id from table A and related id from table B. That works, but my situation involves related items from the SAME table of products...
How can I build a solution around that? Or maybe I am thinking in the wrong direction?
You're overthinking.
If you have a products table with a productid key, you can have a clones table with productid1 and productid2 fields mapping from products to products and a multi-key on both fields. No issue, and it's still 3NF.
Because something is a copy, that means you have a parent and child relationship... Hierarchical data.
You're on the right track for the data you want to model. Rather than have a separate table to hold the relationship, you can add a column to the existing table to hold the parent_id value--the primary key value indicating the parent to the current record. This is an excellent read about handling hierarchical data in MySQL...
Sadly, MySQL had no hierarchical query syntax when this was written (recursive common table expressions only arrived in MySQL 8.0), so for things like these I highly recommend looking at databases that do:
PostgreSQL (free)
SQL Server (Express is free)
Oracle (Express is also free)
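A minimal sketch of the parent_id idea, walked with a recursive query. It assumes a database that supports WITH RECURSIVE; Python's built-in sqlite3 is used so the example runs anywhere, and the table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    productid INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES products(productid)  -- NULL for an original
);
INSERT INTO products VALUES
    (1, 'Original', NULL),
    (2, 'Copy', 1),
    (3, 'Copy of copy', 2),
    (4, 'Other original', NULL);
""")

# Collect every direct and indirect copy of product 1.
descendants = [
    row[0]
    for row in conn.execute(
        """WITH RECURSIVE tree(productid, name) AS (
               SELECT productid, name FROM products WHERE parent_id = ?
               UNION ALL
               SELECT p.productid, p.name
               FROM products p
               JOIN tree t ON p.parent_id = t.productid
           )
           SELECT name FROM tree ORDER BY productid""",
        (1,),
    )
]
```

Note that a single parent_id column only fits the case where each copy has exactly one original.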
There's no reason you can't have links to the same product table in your 'links' table.
There are a few ways to do this, but a basic design might simply be 2 columns:
ProductID1, ProductID2
Where both these columns link back to ProductID in your product table. If you know which is the 'real' product and which is the copy, you might have logic/constraints which place the 'real' productID in ProductID1 and the 'copy' productID in ProductID2.
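A sketch of such a self-referencing link table, with the 'real' product in the first column and the copy in the second. The composite primary key follows the "multi-key on both fields" suggestion; the CHECK that stops a product being listed as a copy of itself, the `copies_of` helper, and the sample rows are assumptions. Python's built-in sqlite3 stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (productid INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Both columns point back at the same products table.
CREATE TABLE clones (
    productid1 INTEGER NOT NULL REFERENCES products(productid),  -- the 'real' product
    productid2 INTEGER NOT NULL REFERENCES products(productid),  -- the copy
    PRIMARY KEY (productid1, productid2),
    CHECK (productid1 <> productid2)
);

INSERT INTO products VALUES
    (1, 'Original'), (2, 'Bootleg A'), (3, 'Bootleg B'), (4, 'Unrelated');
INSERT INTO clones VALUES (1, 2), (1, 3);
""")

def copies_of(product_id):
    """Names of all products recorded as copies of the given product."""
    rows = conn.execute(
        """SELECT p.name
           FROM clones c
           JOIN products p ON p.productid = c.productid2
           WHERE c.productid1 = ?
           ORDER BY p.name""",
        (product_id,),
    )
    return [r[0] for r in rows]

copies = copies_of(1)
none_for_unrelated = copies_of(4)
```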