Database issue: 2 tables with identical structure because of the quality of the data

I have a database with one table where I store two different types of data.
I store a Quote and a Booking in a single table named Booking.
At first, I thought that a quote and a booking were the same thing, since they have the same fields.
But a quote is not related to a user, whereas a booking is.
We have a lot of quotes in our database, which pollute the Booking table with less important data.
I guess it makes sense to have two different tables so they can also evolve independently.
Quote
Booking
The objective is to split the data into junk data (quote) and the actual data (booking).
Does this make sense in relational-database theory?

I'd start by looking for the domain model to tie this to - is a "quote" the same logical thing as a "booking"? Quotes typically have a different lifecycle to bookings, and bookings typically represent financial commitments. The fact they share some attributes is a hint that they are similar domain concepts, but it's not conclusive. Cars and goldfish share some attributes - age, location, colour - but it's hard to think of them as "similar concepts" at any fundamental level.
In database design, it's best to try to represent the business domain as far as is possible. It makes your code easy to understand, which makes it less likely you'll introduce bugs. It often makes the code simpler, too, which may make it faster.
If you decide they are related in the domain model, it may be a case of trying to model an inheritance hierarchy in the relational database. This question discusses this extensively.
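If you decide they are separate concepts, the split might look like the sketch below. It is only an illustration: the `app_user`, `quote` and `booking` tables and all of their columns are made-up names, and SQLite (through Python's standard `sqlite3` module) stands in for whatever database you actually use. The point it shows is that only `booking` carries a mandatory link to a user, so the two tables can evolve independently.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
conn.executescript("""
CREATE TABLE app_user (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- A quote has no owner, so it carries no user_id column at all.
CREATE TABLE quote (
    id         INTEGER PRIMARY KEY,
    price      REAL NOT NULL,
    created_at TEXT NOT NULL
);
-- A booking must always belong to a user.
CREATE TABLE booking (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES app_user(id),
    price      REAL NOT NULL,
    created_at TEXT NOT NULL
);
""")

conn.execute("INSERT INTO app_user (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO quote (price, created_at) VALUES (120.0, '2024-01-01')")
conn.execute(
    "INSERT INTO booking (user_id, price, created_at) VALUES (1, 120.0, '2024-01-02')"
)


def booking_without_user_is_rejected(connection):
    """Return True if the database refuses a booking that has no user."""
    try:
        connection.execute(
            "INSERT INTO booking (user_id, price, created_at) "
            "VALUES (NULL, 99.0, '2024-01-03')"
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

With a single shared table, `user_id` would have to be nullable to make room for quotes, and the database could no longer guarantee that every booking has a user.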

Related

Giving user the ability to create variables and store them in db [duplicate]

I do not have much experience in table design. My goal is to create one or more product tables that meet the requirements below:
Support many kinds of products (TV, Phone, PC, ...). Each kind of product has a different set of parameters, like:
Phone will have Color, Size, Weight, OS...
PC will have CPU, HDD, RAM...
The set of parameters must be dynamic. You can add or edit any parameter you like.
How can I meet these requirements without a separate table for each kind of product?

You have at least these five options for modeling the type hierarchy you describe:
Single Table Inheritance: one table for all Product types, with enough columns to store all attributes of all types. This means a lot of columns, most of which are NULL on any given row.
Class Table Inheritance: one table for Products, storing attributes common to all product types. Then one table per product type, storing attributes specific to that product type.
Concrete Table Inheritance: no table for common Products attributes. Instead, one table per product type, storing both common product attributes, and product-specific attributes.
Serialized LOB: One table for Products, storing attributes common to all product types. One extra column stores a BLOB of semi-structured data, in XML, YAML, JSON, or some other format. This BLOB allows you to store the attributes specific to each product type. You can use fancy Design Patterns to describe this, such as Facade and Memento. But regardless you have a blob of attributes that can't be easily queried within SQL; you have to fetch the whole blob back to the application and sort it out there.
Entity-Attribute-Value: One table for Products, and one table that pivots attributes to rows, instead of columns. EAV is not a valid design with respect to the relational paradigm, but many people use it anyway. This is the "Properties Pattern" mentioned by another answer. See other questions with the eav tag on StackOverflow for some of the pitfalls.
I have written more about this in a presentation, Extensible Data Modeling.
Additional thoughts about EAV: Although many people seem to favor EAV, I don't. It seems like the most flexible solution, and therefore the best. However, keep in mind the adage TANSTAAFL. Here are some of the disadvantages of EAV:
No way to make a column mandatory (equivalent of NOT NULL).
No way to use SQL data types to validate entries.
No way to ensure that attribute names are spelled consistently.
No way to put a foreign key on the values of any given attribute, e.g. for a lookup table.
Fetching results in a conventional tabular layout is complex and expensive, because to get attributes from multiple rows you need to do JOIN for each attribute.
The degree of flexibility EAV gives you requires sacrifices in other areas, probably making your code as complex (or worse) than it would have been to solve the original problem in a more conventional way.
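To make a couple of those disadvantages concrete, here is a small sketch using Python's standard `sqlite3` module. The `product` and `product_attribute` tables and their columns are hypothetical names chosen for the illustration. It shows the one-JOIN-per-attribute query needed to get a tabular result, and how a differently spelled attribute name ('OS' instead of 'os') is silently accepted and then missed by the query.

```python
import sqlite3

# Hypothetical EAV layout: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_attribute (
    product_id INTEGER NOT NULL REFERENCES product(id),
    attr_name  TEXT NOT NULL,
    attr_value TEXT,              -- every value is text; no per-attribute type
    PRIMARY KEY (product_id, attr_name)
);
INSERT INTO product VALUES (1, 'Phone A'), (2, 'Phone B');
INSERT INTO product_attribute VALUES
    (1, 'color', 'black'), (1, 'os', 'Android'),
    (2, 'color', 'white'), (2, 'OS', 'iOS');  -- inconsistent spelling is accepted
""")

# One extra JOIN for each attribute that should appear as a column.
rows = conn.execute("""
SELECT p.name, c.attr_value AS color, o.attr_value AS os
FROM product p
LEFT JOIN product_attribute c ON c.product_id = p.id AND c.attr_name = 'color'
LEFT JOIN product_attribute o ON o.product_id = p.id AND o.attr_name = 'os'
ORDER BY p.id
""").fetchall()
```

Here 'Phone B' comes back with no `os` value, because its attribute was stored under a different spelling and nothing in the schema could prevent that.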
And in most cases, it's unnecessary to have that degree of flexibility. In the OP's question about product types, it's much simpler to create a table per product type for product-specific attributes, so you have some consistent structure enforced at least for entries of the same product type.
I'd use EAV only if every row must be permitted to potentially have a distinct set of attributes. When you have a finite set of product types, EAV is overkill. Class Table Inheritance would be my first choice.
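As a rough sketch of what Class Table Inheritance could look like for the phone/PC example, again using Python's standard `sqlite3` module: the table names (`product`, `phone`, `pc`) and columns are assumptions made up for the illustration, not a prescribed schema. Each subtype table reuses the product's primary key, so a subtype row cannot exist without its parent row, and subtype columns can be typed and mandatory.

```python
import sqlite3

# Hypothetical Class Table Inheritance schema; all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Attributes common to every product type.
CREATE TABLE product (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    price REAL NOT NULL
);
-- One table per product type, keyed by the parent product's id.
CREATE TABLE phone (
    product_id INTEGER PRIMARY KEY REFERENCES product(id),
    color TEXT NOT NULL,
    os    TEXT NOT NULL
);
CREATE TABLE pc (
    product_id INTEGER PRIMARY KEY REFERENCES product(id),
    cpu    TEXT NOT NULL,
    ram_gb INTEGER NOT NULL
);
INSERT INTO product VALUES (1, 'Phone A', 299.0), (2, 'PC B', 899.0);
INSERT INTO phone VALUES (1, 'black', 'Android');
INSERT INTO pc VALUES (2, 'x86-64', 16);
""")

# A single join reassembles the full record for one product type.
phones = conn.execute("""
SELECT p.name, p.price, ph.color, ph.os
FROM product p
JOIN phone ph ON ph.product_id = p.id
""").fetchall()
```

Adding a new product type means adding one new table, which is a schema change, so this fits best when the set of types is finite and changes rarely.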
Update 2019: The more I see people using JSON as a solution for the "many custom attributes" problem, the less I like that solution. It makes queries too complex, even when using special JSON functions to support them. It takes a lot more storage space to store JSON documents, versus storing in normal rows and columns.
Basically, none of these solutions are easy or efficient in a relational database. The whole idea of having "variable attributes" is fundamentally at odds with relational theory.
What it comes down to is that you have to choose one of the solutions based on which is the least bad for your app. Therefore you need to know how you're going to query the data before you choose a database design. There's no way to choose one solution that is "best" because any of the solutions might be best for a given application.

@StoneHeart
I would go here with EAV and MVC all the way.
@Bill Karwin
Here are some of the disadvantages of EAV:
No way to make a column mandatory (equivalent of NOT NULL).
No way to use SQL data types to validate entries.
No way to ensure that attribute names are spelled consistently.
No way to put a foreign key on the values of any given attribute, e.g. for a lookup table.
All those things that you have mentioned here:
data validation
attribute names spelling validation
mandatory columns/fields
handling the destruction of dependent attributes
in my opinion, don't belong in a database at all, because no database can handle those interactions and requirements as well as the application's programming language can.
In my opinion, using a database in this way is like using a rock to hammer a nail. You can do it with a rock, but aren't you supposed to use a hammer, which is more precise and specifically designed for this sort of activity?
Fetching results in a conventional tabular layout is complex and expensive, because to get attributes from multiple rows you need to do JOIN for each attribute.
This problem can be solved by making a few queries on partial data and processing the results into a tabular layout in your application. Even if you have 600GB of product data, you can process it in batches if you require data from every single row in this table.
Going further, if you would like to improve query performance, you can pick certain operations, e.g. reporting or global text search, and prepare index tables for them that store the required data and are regenerated periodically, let's say every 30 minutes.
You don't even need to be concerned with the cost of extra data storage because it gets cheaper and cheaper every day.
If you are still concerned about the performance of operations done by the application, you can always use Erlang, C++ or Go to pre-process the data, and then process the optimised data further in your main app.

If I use Class Table Inheritance meaning:
one table for Products, storing attributes common to all product types. Then one table per product type, storing attributes specific to that product type.
-Bill Karwin
I like this best of Bill Karwin's suggestions. I can foresee one drawback, though, and I will try to explain how to keep it from becoming a problem.
What contingency plan should I have in place when an attribute that is specific to one type becomes common to two, then three, and so on?
For example: (this is just an example, not my real issue)
If we sell furniture, we might sell chairs, lamps, sofas, TVs, etc. The TV type might be the only type we carry that has a power consumption. So I would put the power_consumption attribute on the tv_type_table. But then we start to carry home theater systems, which also have a power_consumption property. OK, it's just one other product, so I'll add this field to the stereo_type_table as well, since that is probably easiest at this point. But over time, as we start to carry more and more electronics, we realize that power_consumption is broad enough that it should be in the main_product_table. What should I do now?
Add the field to the main_product_table. Write a script to loop through the electronics and put the correct value from each type_table to the main_product_table. Then drop that column from each type_table.
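A rough sketch of that migration, using Python's standard `sqlite3` module: the table names below (`main_product`, `tv_type`, `stereo_type`), the column layout and the `promote_power_consumption` helper are all hypothetical stand-ins for the tables named above. The final step of dropping the column from each type table is left as a comment, because the syntax and its availability depend on the database and version you run.

```python
import sqlite3

# Hypothetical tables standing in for main_product_table and the type tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tv_type (
    product_id INTEGER PRIMARY KEY REFERENCES main_product(id),
    power_consumption INTEGER
);
CREATE TABLE stereo_type (
    product_id INTEGER PRIMARY KEY REFERENCES main_product(id),
    power_consumption INTEGER
);
INSERT INTO main_product VALUES (1, 'TV'), (2, 'Home theater'), (3, 'Chair');
INSERT INTO tv_type VALUES (1, 120);
INSERT INTO stereo_type VALUES (2, 300);
""")


def promote_power_consumption(connection, type_tables):
    """Add the column to the main table, then copy values up from each type table."""
    connection.execute("ALTER TABLE main_product ADD COLUMN power_consumption INTEGER")
    for table in type_tables:
        # Table names come from a fixed list in code, never from user input.
        connection.execute(f"""
            UPDATE main_product
            SET power_consumption = (
                SELECT t.power_consumption FROM {table} t
                WHERE t.product_id = main_product.id
            )
            WHERE id IN (SELECT product_id FROM {table})
        """)
        # Last step, once the copy is verified (not run here):
        #   ALTER TABLE <type table> DROP COLUMN power_consumption
    connection.commit()


promote_power_consumption(conn, ["tv_type", "stereo_type"])
```

Products that never had the attribute, such as the chair, simply end up with NULL in the new column.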
If I always used the same GetProductData class to pull product info from the database, then any code that needs refactoring after this change should be confined to that class.

You can have a Product table and a separate ProductAdditionalInfo table with 3 columns: product ID, additional info name, additional info value. If color is used by many but not all kinds of Products, you could make it a nullable column in the Product table, or just put it in ProductAdditionalInfo.
This approach is not a traditional technique for a relational database, but I have seen it used a lot in practice. It can be flexible and have good performance.
Steve Yegge calls this the Properties pattern and wrote a long post about using it.

Database Design For Tournament Management Software

I'm currently designing a web application using php, javascript, and MySQL. I'm considering two options for the databases.
Having a master table for all the tournaments, with basic information stored there along with a tournament id. Then I would create divisions, brackets, matches, etc. tables with the tournament id appended to each table name. Then when accessing that tournament, I would simply do something like "SELECT * FROM BRACKETS_[insert tournamentID here]".
My other option is to just have generic brackets, divisions, matches, etc. tables with each record being linked to the appropriate tournament, (or matches to brackets, brackets to divisions etc.) by a foreign key in the appropriate column.
My concern with the first approach is that it's a bit too on the fly for me, and seems like the database could get messy very quickly. My concern with the second approach is performance. This program will hopefully have a national if not international reach, and I'm concerned with so many records in a single table, and with so many people possibly hitting it at the same time, it could cause problems.
I'm not a complete newb when it comes to database management; however, this is the first one I've done completely solo, so any and all help is appreciated. Thanks!

Do not create tables for each tournament. A table is a type of an entity, not an instance of an entity. Maintainability and scalability would be horrible if you mix up those concepts. You even say so yourself:
This program will hopefully have a national if not international reach, and I'm concerned with so many records in a single table, and with so many people possibly hitting it at the same time, it could cause problems.
How on Earth would you scale to that level if you need to create a whole table for each record?
Regarding the performance of your second approach, why are you concerned? Do you have specific metrics to back up those concerns? Relational databases tend to be very good at querying relational data. So keep your data relational. Don't try to be creative and undermine the design of the database technology you're using.
You've named a few types of entities:
Tournament
Division
Bracket
Match
Competitor
etc.
These sound like tables to me. Manage your indexes based on how you query the data (that is, don't over-index or you'll pay for it with inserts/updates/deletes). Normalize the data appropriately, de-normalize where audits and reporting are more prevalent, etc. If you're worried about performance then keep an eye on the query execution paths for the ways in which you access the data. Slight tweaks can make a big difference.
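As a minimal sketch of that second approach, here is one possible layout using Python's standard `sqlite3` module as a stand-in for MySQL. The table names, columns, index names and the `brackets_for_tournament` helper are all assumptions made up for the illustration. Every tournament shares the same tables, rows are tied together by foreign keys, and the indexes follow the way the data is queried.

```python
import sqlite3

# Hypothetical shared tables; names and columns are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE tournament (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE division (
    id            INTEGER PRIMARY KEY,
    tournament_id INTEGER NOT NULL REFERENCES tournament(id),
    name          TEXT NOT NULL
);
CREATE TABLE bracket (
    id          INTEGER PRIMARY KEY,
    division_id INTEGER NOT NULL REFERENCES division(id),
    name        TEXT NOT NULL
);
-- Index the foreign keys that the lookups below filter and join on.
CREATE INDEX idx_division_tournament ON division(tournament_id);
CREATE INDEX idx_bracket_division ON bracket(division_id);

INSERT INTO tournament VALUES (1, 'Spring Open'), (2, 'Fall Classic');
INSERT INTO division VALUES (10, 1, 'Juniors'), (20, 2, 'Seniors');
INSERT INTO bracket VALUES (100, 10, 'A'), (101, 10, 'B'), (200, 20, 'A');
""")


def brackets_for_tournament(connection, tournament_id):
    """Return bracket names for one tournament using a single parameterized query."""
    rows = connection.execute(
        """
        SELECT b.name
        FROM bracket b
        JOIN division d ON d.id = b.division_id
        WHERE d.tournament_id = ?
        ORDER BY b.id
        """,
        (tournament_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

The tournament id is a bound parameter rather than part of a table name, so the same statement text serves every tournament.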
Don't prematurely optimize. It adds complexity without any actual reason.

First, find the entities that you will need to store; things like tournament, event, team, competitor, prize etc. Each of these entities will probably be tables.
It is standard practice to have a primary key for each of them. Sometimes there are columns (or groups of columns) that uniquely identify a row, so you can use those as the primary key. However, it's usually best just to have a numeric column named ID or something similar. It will be faster and easier for the RDBMS to create and use indexes on such columns.
Store the data where it belongs: I expect to see the date and time of an event in the events table, not in the prizes table.
Another crucial point is conforming to the First normal form, since that assures data atomicity. This is important because it will save you a lot of headache later on. By doing this correctly, you will also have the correct number of tables.
Last but not least: add relevant indexes to the columns that appear most often in queries. This will help a lot with performance. Don't worry about tables having too many rows; RDBMSes these days handle tables with hundreds of millions of rows, and they're designed to do that efficiently.

Besides compromising the quality and maintainability of your code (as others have pointed out), it's questionable whether you'd actually gain any performance either.
When you execute...
SELECT * FROM BRACKETS_XXX
...the DBMS needs to find the table whose name matches "BRACKETS_XXX", and that search is done in the DBMS's data dictionary, which is itself a bunch of tables. So, you are replacing a search within your tables with a search within data dictionary tables. You pay the price of the search either way.
(The dictionary tables may or may not be "real" tables, and may or may not have similar performance characteristics as real tables, but I bet these performance characteristics are unlikely to be better than "normal" tables for large numbers of rows. Also, performance of data dictionary is unlikely to be documented and you really shouldn't rely on undocumented features.)
Also, the DBMS would suddenly need to prepare many more SQL statements (since they are now different statements, referring to separate tables), which would put additional pressure on performance.

The idea of creating new tables whenever a new instance of an item appears is really bad, sorry.
A (surely incomplete) list of why this is a bad idea:
Your code will need to automatically add tables whenever a new Division or whatever is created. This is definitely a bad practice and should be limited to extremely niche cases - which yours definitely isn't.
In case you decide to add to or revise a table structure later (e.g. adding a new field), you will have to apply the change to hundreds of tables, which will be cumbersome, error-prone and a big maintenance headache.
An RDBMS is built to scale in terms of rows, not in terms of tables and their associated elements (indexes, triggers, constraints) - so you are working against your tool and not with it.
THIS ONE SHOULD BE THE REAL CLINCHER - how do you plan to handle requests like "list all matches which were played on a Sunday" or "find the most recent three brackets where Frank Perry was active"?
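To illustrate that last point, here is a small sketch with Python's standard `sqlite3` module. The `tournament_match` table, its columns and the sample dates are invented for the example, and `strftime('%w', ...)` is SQLite's way of getting the weekday (MySQL has its own date functions). With one shared table, "all matches played on a Sunday" is a single query across every tournament.

```python
import sqlite3

# Hypothetical shared match table; names and sample data are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tournament_match (
    id            INTEGER PRIMARY KEY,
    tournament_id INTEGER NOT NULL,
    played_on     TEXT NOT NULL      -- ISO date, YYYY-MM-DD
);
INSERT INTO tournament_match VALUES
    (1, 1, '2024-01-06'),  -- Saturday
    (2, 1, '2024-01-07'),  -- Sunday
    (3, 2, '2024-01-14'),  -- Sunday
    (4, 2, '2024-01-15');  -- Monday
""")

# In SQLite, strftime('%w', date) yields '0' for Sunday.
sunday_matches = conn.execute("""
    SELECT id, tournament_id
    FROM tournament_match
    WHERE strftime('%w', played_on) = '0'
    ORDER BY id
""").fetchall()
```

With one table per tournament, the same question would need a generated query that unions every MATCHES_xxx table, and it would have to be regenerated each time a tournament is added.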
You say:
I'm not a complete newb when it comes to database management; however, this is the first one I've done completely solo...
Can you remember another project where tables were cloned whenever a new set was required? If yes, didn't you notice some problems with that approach? If not, have you considered that this is precisely what a DBA would never ever do for any reason whatsoever?
