Normalising a database to reduce dependency - mysql

I am trying to get my database design right. It is a large set of alcohol drinks consisting of beer, liquor, wine and so on. I could keep it all in a single table as follows:
id category brand type price quantity description
1 Beer Heineken bottle $2.00 100 some description...
2 Beer Carlsberg bottle $3.00 200 some description
3 Beer Heineken can $1.00 300 some description....
4 Liquor JWalker bottle $30.00 100 some descri...
Seems this is bad design considering repetitions for category and brand will occur. Thus I split it into 3 tables as follows:
Category Table
id name(pk)
1 Beer
2 Liquor
Brand Table
id name(pk) category_name(FK)
1 Heineken Beer
2 Carlsberg Beer
3 Lindemans Wine
4 JWalker Liquor
Product Table
id(PK) type price quantity description category_name(FK) brand_name(FK)
1 Bottle $2.00 100 some description Beer Heineken
I thought this would be better normalised, but the way I see it there is hardly a difference from the first table. And I end up with type repeating too, since I can get repetitions of bottle, can and so on. So should I get a 4th table for that?
Trying to normalise and keep it as sensible as possible. Is there a better way to go about doing this?

Brand Table
brandID(PK) BrandName
Category table
BrandID(FK) CategoryID(PK) Categoryname
Product table
ProductID(PK) CategoryID(FK) description price quantity

Normalization requires knowing functional dependencies (FDs) and join dependencies (JDs) that hold. You haven't given them. So we can't normalize. But guessing at your application and your table, it is in 5NF.
Presumably id is a unique column. So it functionally determines every column set. Since no smaller subset of {id} is unique, it is a candidate key (CK). Presumably no other FDs hold other than the ones that hold because of that CK. So the table is in 5NF.
But suppose also one more FD holds: that a given brand only ever appears with the same category. Then to normalize to 5NF, column category should be dropped and a new table should be added with brand & category columns and CK {brand}.
Or suppose that a brand has one or more categories, and instead of a row stating that category is its product's category, it states that category is a category of its product's brand. (Weird, since then for brands with more than one category the table wouldn't give a product's category.) Then normalization also gives those two tables, with new CK {category, brand}. But in this case it's because of a multi-valued dependency (MVD), ie because of a binary JD.
PS Introducing ids has nothing to do with normalization.
PPS You seem to think that repeated subrow values imply a need for normalization. They don't. Normalization is for sometimes replacing a table by tables that always join to it.
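To make "tables that always join to it" concrete, here is a minimal sketch using Python's sqlite3 module and a few made-up rows. It assumes the extra FD brand -> category holds, splits the single table into two projections, and checks that joining them returns the original rows. The table names drinks, product and brand_category are illustrative, not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drinks (id INTEGER PRIMARY KEY, category TEXT, brand TEXT,
                         type TEXT, price REAL);
    INSERT INTO drinks VALUES
        (1, 'Beer',   'Heineken', 'bottle',  2.00),
        (2, 'Beer',   'Heineken', 'can',     1.00),
        (3, 'Liquor', 'JWalker',  'bottle', 30.00);

    -- Projection 1: everything except category; CK is still {id}.
    CREATE TABLE product AS SELECT id, brand, type, price FROM drinks;
    -- Projection 2: one row per brand; CK is {brand} (assumes brand -> category).
    CREATE TABLE brand_category AS SELECT DISTINCT brand, category FROM drinks;
""")

original = conn.execute(
    "SELECT id, category, brand, type, price FROM drinks ORDER BY id").fetchall()
rejoined = conn.execute("""
    SELECT p.id, bc.category, p.brand, p.type, p.price
    FROM product p JOIN brand_category bc ON bc.brand = p.brand
    ORDER BY p.id""").fetchall()

# The decomposition is only useful if the join gives back exactly the original.
lossless = (original == rejoined)
```

If brand -> category did not hold (one brand with two categories), the join would produce extra rows and the check would fail.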

Normalization through BCNF is based on functional dependencies. It's not based on whether a column contains text or numbers. You seem to think that, because the category column contains the word Beer more than once, it needs to be "normalized". That's not the case.
So what are the functional dependencies here?
id -> category, brand, type, price, quantity, description
category, brand, type -> id, price, quantity, description
That second FD might be wrong. It might be that {brand, type} is the determinant. But I think it's likely that there's a company somewhere that makes both beer and liquor under the same brand name. So I think that the determinant is probably {category, brand, type}.
That's in 5NF already. "Splitting" isn't going to improve this table.
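As a rough way to test guesses like these against sample rows, a small Python helper can check whether a candidate determinant is contradicted by the data. This is only a sketch: fd_holds is a hypothetical helper, the rows are made up to resemble the question's table, and sample data can refute an FD but never prove that it holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in these rows, equal lhs values always go with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # setdefault stores val the first time a key is seen, then returns the stored one.
        if seen.setdefault(key, val) != val:
            return False
    return True

rows = [
    {"id": 1, "category": "Beer", "brand": "Heineken", "type": "bottle", "price": 2.00},
    {"id": 2, "category": "Beer", "brand": "Carlsberg", "type": "bottle", "price": 3.00},
    {"id": 3, "category": "Beer", "brand": "Heineken", "type": "can", "price": 1.00},
    {"id": 4, "category": "Liquor", "brand": "JWalker", "type": "bottle", "price": 30.00},
]

id_is_key = fd_holds(rows, ["id"], ["category", "brand", "type", "price"])
natural_key = fd_holds(rows, ["category", "brand", "type"], ["id", "price"])
brand_alone = fd_holds(rows, ["brand"], ["price"])  # Heineken has two prices
```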

Table creation would look something like this:
-- brand and category come first so the foreign keys in product can resolve
create table brand (
brand_id int not null auto_increment,
name varchar(80),
primary key(brand_id)
);
create table category (
category_id int not null auto_increment,
name varchar(80),
primary key(category_id)
);
create table product (
product_id int not null auto_increment,
brand_id int not null,
category_id int not null,
primary key(product_id),
foreign key (brand_id) references brand(brand_id),
foreign key (category_id) references category(category_id)
);
You do a JOIN to get the record back:
select p.product_id, c.name as category_name, b.name as brand_name
from product as p
join category as c on p.category_id = c.category_id
join brand as b on p.brand_id = b.brand_id

Related

How to query against many tables in mySQL

I am designing a database application for an award. It has a 75 year history and numerous categories that have changed over time. Right now, the design I am thinking of has two kinds of tables:
entities
people
publishers
categories
novel
movie
author
artist
and such like. Each category has data particular to that category, for example:
NOVEL
title varchar(1024)
author int #FK into people table ID
publisher int #FK into publisher table ID
year year(4)
winner bool
or
ARTIST
name int
year year(4)
winner bool
So far so good. However, there are 38 (!) of these categories that have existed over time (some do not exist anymore) and I really can't imagine doing a query for say, all of the winners from 1963 by doing:
SELECT * from table1,table2,...,table38 WHERE year=1963 and winner=TRUE;
These tables will never be that large (each category usually has at most five nominees, so even after 100 years, there would be at most 500 rows per table and a lot fewer for the early ones that aren't continued). So this isn't a performance question. It is just that the query feels very, very wrong to me, if only because every query will have to be changed every time a new category is created or an old one removed. That happens every few years or so.
The questions then are:
is this query evidence that I've designed this wrong?
if not, is there a better way to do that query?
I keep thinking there must be some way to create a lookup table which pulls from other tables, but I could be misremembering. Is there some way of doing such a thing?
Many thanks,
Glenn
You could do that with 3 tables.
First one is entities. It contains data about all publishers/artist/etc.
entities
name varchar(1024)
publisher bool
Second is data where all data from all categories is stored.
data
title varchar(1024)
author/name int #FK into people table ID
publisher int #FK into publisher table ID
year year(4)
winner bool
category int #FK into category table ID
Third is category in which you can find all categories names with their IDs.
category
ID int
name varchar(1024)
Now you have to join only three tables.
select * from entities e, data d, category c where d.name=e.name and d.category=c.id and winner=TRUE and year=1963
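A minimal runnable sketch of the same idea, using Python's sqlite3 module rather than MySQL. The titles are made up, the entities join is left out to keep it short, and winner is stored as 0/1. The point it illustrates is that a new category becomes a new row, so the "winners of 1963" query never changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE data (
        id INTEGER PRIMARY KEY,
        title TEXT,
        year INTEGER NOT NULL,
        winner INTEGER NOT NULL,          -- 0 or 1
        category INTEGER NOT NULL REFERENCES category(id)
    );
    INSERT INTO category VALUES (1, 'novel'), (2, 'movie');
    INSERT INTO data (title, year, winner, category) VALUES
        ('Novel A', 1963, 1, 1),
        ('Novel B', 1963, 0, 1),
        ('Movie A', 1963, 1, 2),
        ('Movie B', 1964, 1, 2);
    -- A category added later is one more row, not one more table.
    INSERT INTO category VALUES (3, 'artist');
    INSERT INTO data (title, year, winner, category) VALUES ('Artist A', 1963, 1, 3);
""")

winners_1963 = conn.execute("""
    SELECT c.name, d.title
    FROM data d JOIN category c ON c.id = d.category
    WHERE d.year = ? AND d.winner = 1
    ORDER BY c.id
""", (1963,)).fetchall()
```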
You would do better to have a category table where each category is stored once, and then store only that row's id in the other tables:
for example,
Table: Category
columns: id, name, slug, status, active_since, inactive_since etc...
In slug, you can keep slugified form of cat to make it easy for queries and url: for example, Industry Innovations category will be saved as industry-innovations.
In status, keep 0 or 1 to show if it is active now. You can also keep dates when it was active and when became inactive in active_since and inactive_since fields.
When you search, you can search for those that have status 1, for example. I don't think your problem is complex, and it is very simple for MySQL to search when you join tables.
There are projects where dozens of tables are joined and it is ok.

maintaining a price table changes every day

state (st_id,st_name)
district (d_id,d_name,st_id[FK])
product (pid,pnme)
price (max_price,min_price,pid[FK],d_id[FK])
1.)This is my table structure, i want to show the price of products in 5 states and its districts,but in price tbl i'm repeating the product(more than 10) for each district.
Whats wrong with my price tbl, Could you plz give an idea to normalize it..
2.) Now I'm just planning to add a date stamp (start date) field to the price tbl so that I can maintain a historical price list, but how can I do it without repeating the product (like shown below) on each date? Is there any better solution to reduce the tbl rows?
_______________________________________
product| price |district|date(mm/dd/yy)|
_______|_______|________|______________|
fan 200 delhi 3/15/2013
speaker 400 delhi 3/15/2013
fan 210 chenni 3/15/2013
speaker 403 chenni 3/15/2013
fan 200 delhi 3/16/2013
fan 210 chenni 3/16/2013
1) There's nothing much wrong with your table design - however, the sample data doesn't make sense, as there's a repeat for product 1 and district 111. You might want to create a composite primary key on pid and d_id.
2) Again, nothing much wrong with the table design; you might consider only entering data if there's a change, so that retrieving the price for a given date searches for the last record on or before the desired date. That reduces the size of the table.
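A minimal sketch of that "only store changes" lookup, using Python's sqlite3 module. It assumes a single price column (as in the sample data) instead of min/max, ISO-formatted dates stored as text, and made-up ids; price_on is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE price (
        pid INTEGER NOT NULL,
        d_id INTEGER NOT NULL,
        start_date TEXT NOT NULL,        -- ISO dates sort correctly as text
        price INTEGER NOT NULL,
        PRIMARY KEY (pid, d_id, start_date)
    );
    -- pid 1 = fan, d_id 1 = delhi, d_id 2 = chenni (ids are made up)
    INSERT INTO price VALUES
        (1, 1, '2013-03-15', 200),
        (1, 2, '2013-03-15', 210),
        (1, 2, '2013-03-18', 215);       -- inserted only because the price changed
""")

def price_on(pid, d_id, day):
    """Price in effect on `day`: the latest row starting on or before it."""
    row = conn.execute("""
        SELECT price FROM price
        WHERE pid = ? AND d_id = ? AND start_date <= ?
        ORDER BY start_date DESC LIMIT 1
    """, (pid, d_id, day)).fetchone()
    return row[0] if row else None
```

With this layout an unchanged price costs no new rows, at the expense of a slightly more involved read query.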
General points: please pick a naming convention and stick to it - you use pid and d_id (one with an underscore, one without); in general, I prefer more descriptive column names, but consistency is key.
Also, there's nothing wrong with large tables, as long as the data isn't redundant. Your design seems to have no redundancies.
1.)This is my table structure, i want to show the price of products in 5
states and its districts,but in price tbl i'm repeating the product(more than 10)
for each district.
If you're offering all your products in all those districts, and the price varies depending on which district the product is sold in, then it only makes sense that you'd repeat the product for each district.
Whats wrong with my price tbl, Could you plz give an idea to normalize it..
It looks like your price table doesn't have a sensible primary key.
If you'd built the table along these lines . . .
create table prices (
district_id integer not null references districts (district_id),
product_id integer not null references products (product_id),
primary key (district_id, product_id),
min_price numeric(14,2) not null,
max_price numeric(14,2) not null
);
you'd have a table in 5NF, assuming that minimum and maximum product prices vary among the districts. But your sample data couldn't possibly fit in it.
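To illustrate why repeated (district, product) rows cannot fit, here is a small sketch with Python's sqlite3 module. The foreign keys are left out so the snippet stands alone, and the ids and prices are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE prices (
        district_id INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        min_price NUMERIC NOT NULL,
        max_price NUMERIC NOT NULL,
        PRIMARY KEY (district_id, product_id)
    )""")
conn.execute("INSERT INTO prices VALUES (111, 1, 200, 220)")

try:
    # Second price range for the same district and product.
    conn.execute("INSERT INTO prices VALUES (111, 1, 205, 225)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
```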
1) In your Price (Price Range) table, I don't understand why (d_id, pid) repeats? There should be only one price range, unless you put an effective date column in the table.
2) You could have a future price table, a current price table, and a history price table. This allows you to enter price changes in advance, keeps the current price table short and allows you to get the historical prices infrequently when you need them. Your application code maintains the relationship between these price tables.
I'm not sure where city came from in your other Price table, since you defined state and district.

database design / mysql

My client is wanting to add new functionality to their site, they deal with actors/models and want to be able to create a credit history for each of the clients (much like a CV or Resume).
They have some criteria that I must adhere to, and because of this I cannot get my head around it.
A credit can be one of two things, it can be a 4 column credit, or a single column credit. The credit however must have a category, and these categories can be one of the following, TV, Film, Advert, Radio or something of their own making.
The second criteria is that the categories are orderable, so for example if they are entering an actor's credits, he may have television and film credits, and they may from time to time want to move film above television.
The third criteria is that the credits with each category are orderable, so film1 credit does not have to be at the top.
Here is what I have devised so far.
CANDIDATES | CREDITS
---------- -------
candidate_id^ credit_id*
credit_category
credit_heading
credit_title
credit_role
credit_director
credit_position
candidates_candidate_id^^
^ - Primary Key
^^ - Foreign Key
My confusion comes from the fact that with this table structure there is no clean way to alter the order the categories are in, even if I added a credit_category_position column.
For example if the user has a credit in the category film, and I want to add another, when I insert the data through my form, how do I keep the credit_category_position consistent for all that clients film entries?
I hope this makes sense to someone.
I've just glanced through your question and I'm not sure I exactly understand it, but one thing just pops to my mind:
why don't you have a many-to-many relationship between candidates and credits?
CANDIDATES | CREDITS
---------- -------
candidate_id^ credit_id*
credit_category
credit_heading
credit_title
credit_role
credit_director
CANDIDATE_CREDIT_REL
--------
rel_id*
credit_id^^
candidate_id^^
credit_position
A credit can be one of two things, it can be a 4 column credit, or a
single column credit.
I'm going to skip this one, because I don't understand it and because there isn't anything in your description that helps me with it. Feel free to edit your question.
The second criteria is that the categories are orderable
You need an additional table for that, because each candidate can have multiple credits in, say, film.
create table credit_category_order (
candidate_id integer not null,
credit_category <whatever> not null,
category_order float not null, -- lets you reorder by splitting the difference,
-- but there are other ways.
primary key (candidate_id, credit_category),
foreign key (candidate_id, credit_category)
references credits (candidate_id, credit_category)
);
The third criteria is that the credits with each category are
orderable
Add a column to credits. When you query, join credit_category_order, and ORDER BY credit_category_order.category_order, credits.credit_order.
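A minimal sketch of both orderings together, using Python's sqlite3 module. The foreign key from the table definition above is omitted so the snippet stands alone, credit_order is the assumed name of the added column, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE credits (
        credit_id INTEGER PRIMARY KEY,
        candidate_id INTEGER NOT NULL,
        credit_category TEXT NOT NULL,
        credit_title TEXT NOT NULL,
        credit_order REAL NOT NULL       -- position within the category
    );
    CREATE TABLE credit_category_order (
        candidate_id INTEGER NOT NULL,
        credit_category TEXT NOT NULL,
        category_order REAL NOT NULL,    -- position of the category itself
        PRIMARY KEY (candidate_id, credit_category)
    );
    INSERT INTO credits (candidate_id, credit_category, credit_title, credit_order) VALUES
        (1, 'TV',   'tv1',   1),
        (1, 'Film', 'film1', 2),
        (1, 'Film', 'film2', 1);
    INSERT INTO credit_category_order VALUES (1, 'TV', 1), (1, 'Film', 2);
""")

def credit_list(candidate_id):
    """Credits sorted by category position, then by position inside the category."""
    return conn.execute("""
        SELECT c.credit_category, c.credit_title
        FROM credits c
        JOIN credit_category_order o
          ON o.candidate_id = c.candidate_id
         AND o.credit_category = c.credit_category
        WHERE c.candidate_id = ?
        ORDER BY o.category_order, c.credit_order
    """, (candidate_id,)).fetchall()

before = credit_list(1)
# Move Film above TV by touching one row, however many film credits exist.
conn.execute("""UPDATE credit_category_order SET category_order = 0.5
                WHERE candidate_id = 1 AND credit_category = 'Film'""")
after = credit_list(1)
```

Because the category position lives in one row per candidate and category, inserting another film credit never has to copy or synchronise a position value.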
Credit_Category
----------
id, category_name, details
Actor_cc_order //For each actor, have an entry for every category
----------
id, id_actor, id_cc, ord_number
Actor_credit
------------
id, id_actor, id_cc, credit_details
view of credits
SELECT a.*, b.category_name, c.ord_number FROM
Actor_credit a
JOIN Credit_category b ON b.id = a.id_cc
JOIN Actor_cc_order c ON c.id_actor = a.id_actor AND c.id_cc = b.id
ORDER BY a.id_actor, c.ord_number

Database Design: product and product combo

Say I am selling a number of products. Sometimes, a product is actually a combination of other products. For example, say I am selling:
hot dog
soda
hot dog + soda combo
How should I model something like this? Should I have a product table to list the individual products, then a product_combo table that describes the combo, and another table that is associated with product and product_combo to itemize the products in the combo? This seems straightforward to me.
However, what if I wanted to record all the sales in one table? Meaning, I don't want product_sales table and a product_combo_sales table. I want all sales to be in just one table. I'm a bit unsure how to model product and product combos in such a way I can later record all sales in one table.
Suggestions?
NOTE: I'm wondering if I could put product and product combo in one table using a parent-child relationship. With one table, then recording sales won't be hard. I'd just have to implement a business rule that editing a product combo when a sale is already recorded against that combo that the edit actually results in a new entry. Could get messy, though.
This depends on what you actually need to do with your system. A system that needs to track inventory is going to need to understand that a "combo meal" needs to debit the inventory by one hot dog and 32 ounces of soda (or whatever). A system that only keeps track of orders and dollars, however, doesn't really care about what "goes into" the combo meal -- only that you sold one and got paid for it.
That said, let's assume you need the inventory system. You can reduce your complexity by changing your definition a little bit. Think in terms of (1) inventory items and (2) menu items. Your inventory_items table contains items that you purchase and track as inventory (hot dogs, soda, etc). Your menu_items table contains items that you sell (Big Dog Combo Meal, Hot Dog (sandwich only), etc).
You can have some menu items that, coincidentally, have the same name as an inventory item but for these menu items treat them the same way you do a combo item and stick a single record into the linking table:
inventory_items menu_items recipes (menu_item, inventory, qty)
--------------- ------------ ----------
hot dog Hot Dog Hot Dog, hot dog, 1
hot dog bun Hamburger Hot Dog, hot dog bun, 1
hamburger patty (4oz) Big Dog Combo Hamburger, hamburger patty (4oz), 1
hamburger bun Soda (32oz) Hamburger, hamburger bun, 1
cola Big Dog Combo, hot dog, 1
ginger ale Big Dog Combo, hot dog bun, 1
Big Dog Combo, *soda, 32
Soda (32oz), *soda, 32
Just constructing this example, it turns out that even the lowly hot dog has two components (you have to count the bun), not just one. To come up with the simplest case (a menu item with a single component), I added Soda to the menu. Consider, however, that if you are going to inventory non-food items (cups) then even a simple Soda is going to have two components (three if you're inventorying the straws).
Note that with this design there will be no special codepaths for handling combo items and non-combo items. All menu-related functionality will use only the menu_items table, all inventory and food-prep related functionality will JOIN menu_items to recipes and (if additional fields are needed) to inventory_items.
You'll need special handling for optional components (sauerkraut, relish, chili, etc) and for components that can be selected from different inventory items (represented as *soda in this model), but this should get you started.
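As a rough illustration of the single code path, the sketch below uses Python's sqlite3 module to record combo and non-combo sales in one table and derive inventory usage through recipes. It simplifies the model above: names are used as keys, the *soda choice is fixed to cola, and the sales figures are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recipes (menu_item TEXT, inventory_item TEXT, qty INTEGER);
    CREATE TABLE sales (menu_item TEXT, units INTEGER);
    INSERT INTO recipes VALUES
        ('Hot Dog',       'hot dog',     1),
        ('Hot Dog',       'hot dog bun', 1),
        ('Big Dog Combo', 'hot dog',     1),
        ('Big Dog Combo', 'hot dog bun', 1),
        ('Big Dog Combo', 'cola',        32),
        ('Soda (32oz)',   'cola',        32);
    -- Combos and single items share one sales table.
    INSERT INTO sales VALUES ('Hot Dog', 3), ('Big Dog Combo', 2), ('Soda (32oz)', 1);
""")

# Inventory to debit: the same join handles every menu item, combo or not.
usage = dict(conn.execute("""
    SELECT r.inventory_item, SUM(r.qty * s.units)
    FROM sales s JOIN recipes r ON r.menu_item = s.menu_item
    GROUP BY r.inventory_item
""").fetchall())
```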
Both of your approaches are OK. But there's at least one other way to solve the problem, which is to apply discounts to product combinations (which means you can also apportion the discount selectively), e.g.
CREATE TABLE products
(
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(128),
description TEXT,
price INT
);
CREATE TABLE combo_discounts
(
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(128),
description TEXT
);
CREATE TABLE cd_products
(
cd_id INT /* REFERENCES combo_discounts.id */,
p_id INT /* REFERENCES products.id */,
price_reduction INT
);
CREATE TABLE sales
(
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
location ...whatever...
);
CREATE TABLE sales_items
(
sale_id INT /* REFERENCES sales.id */,
p_id INT /* REFERENCES products.id */,
cd_discount INT /* REFERENCES cd_products.cd_id */
);
But bear in mind that you'll need to use procedural code to assign the discounts to the sale (and flag each sold item as you go) to address the problem of someone buying 2 hot dogs and one soda (and hence only getting one discount).
...and hence the total price for a sale is
SELECT SUM(p.price) - COALESCE(SUM(cdp.price_reduction), 0)
FROM sales s INNER JOIN sales_items si ON (si.sale_id=s.id)
INNER JOIN products p ON (p.id = si.p_id)
LEFT JOIN cd_products cdp ON (si.cd_discount = cdp.cd_id
AND si.p_id=cdp.p_id)
WHERE s.id=?
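The procedural step mentioned above (2 hot dogs and one soda should earn only one discount) could look roughly like this in Python. It is a simplified sketch: assign_combo_discounts is a hypothetical helper, it handles a single combo definition, it returns a count rather than flagging individual sale items, and the prices are made up.

```python
from collections import Counter

def assign_combo_discounts(basket, combo, reduction):
    """Count how many complete combos fit in the basket.

    basket: list of product names, one entry per unit sold.
    combo: list of product names that make up one combo.
    reduction: amount taken off per complete combo.
    Returns (number_of_combos, total_reduction).
    """
    have = Counter(basket)
    need = Counter(combo)
    # The scarcest component limits how many combos can be formed.
    times = min(have[p] // n for p, n in need.items())
    return times, times * reduction

prices = {"hot dog": 300, "soda": 150}     # made-up prices in cents
basket = ["hot dog", "hot dog", "soda"]
combos, discount = assign_combo_discounts(basket, ["hot dog", "soda"], 50)
total = sum(prices[p] for p in basket) - discount
```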
I suggest you think in terms of "orders" and "items". An Order consists of many items. Items can be different "products". So, examples for Orders can be:
1) hot dog
2) soda
3) hot dog + soda
Examples for Items can be:
A) hot dog
B) soda
Also this way you can keep sales, in the orders table.
I don't think you need to have the prices for the "combo" in your database.
This is business logic that should be applied in the code, not in database.
You can apply all your discounts later in the code.

storing an item that is tagged with many categories - bitmasking?

Maybe the solution is obvious, but I cant seem to find a good one.
In my upcoming project, there will be one main table, its data will be read frequently. Update / Insert / Delete speed is not an issue.
The items in that main table are associated to 4 or more categories. An item can have 50 - 100 or more relations within one category.
The most common operations that will be performed on the database:
select all items that have been assigned to category A, B, C, ... with LIMIT X, Y
count all items that have been assigned to category A, B, C, ...
My first thought on how to create a database for the above was something like this (classic approach I guess):
First, for each of the four categories I create a category table:
id - PK, int(11), index
name - varchar(100)
then I will have one item table:
id - PK, int(11), index
... some more data fields, about 30 or so ...
and to relate the category tables, there will be 4 or more lookup / MM tables like so:
id_item - int(11)
id_category - int(11)
The queries looked something like this:
select
item.*
from
item
inner join mm_1 on mm_1.id_item = item.id
inner join cat_1 on cat_1.id = mm_1.id_category and cat_1.id in (1, 2, ... , 100)
inner join mm_2 on mm_2.id_item = item.id
inner join cat_2 on cat_2.id = mm_2.id_category and cat_2.id in (50, 51, ... , 90)
Of course the above approach with MM tables would work, but as the app should provide very good SELECT performance, I tested it with real world amounts of data (100.000 records in the item table, 50 - 80 relations in each category), but it was not as fast as I expected, even with indexes in place. I also tried using WHERE EXISTS instead of INNER JOIN when selecting.
My second idea was to just use the item table from above and denormalize the data.
After reading this blog post about using bitmasks I gave it a try and assigned each category a bit value:
category 1.1 - 1
category 1.2 - 2
category 1.3 - 4
category 1.4 - 8
... etc ...
So, if an item was tagged with category 1.1 and category 1.3, it had a bitmask of 5, which I then stored in a field item.bitmask and I can query it like so:
select count(*) from item where item.bitmask & 5 = 5
But performance was not so great either.
The problems with this bitmasking approach: mysql does NOT use any indexes when bit operators are involved and even when item.bitmask would be of type BIGINT I can only handle up to 64 relations, but I need to support up to 100 per category.
That was about it. I can't think of anything more except maybe polluting the item table with many, many fields like category_1_1 up to category_4_100, each of which contains either 1 or 0. But that could lead to many ANDs in the WHERE clause of the select, and that does not seem like a good idea either.
So, what are my options? Any better ideas out there?
EDIT: as a response to Cory Petosky's comment "What does "An item can have 50 - 100 or more relations within one category." mean?":
To make it more concrete, the item table represents an image. Images are, among other criteria, categorized by mood (mood would be one of 4 categories). So it would look like this:
Image:
- Category "mood":
- bright
- happy
- funny
- ... 50 or so more ...
- Category "XYZ":
- ... 70 or so more ...
If my image table would be a class in C#, it would look like this:
public class Image {
public List<Mood> Moods; // can contain 0 - 100 items
public List<Some> SomeCategory; // can contain 0 - 100 items
// ...
}
What about this (pseudocode):
Item (image)
Id PK, int(11)
Name varchar(100)
Category (mood, xyz)
Id PK, int(11)
Name varchar(100)
Relations (happy, funny)
Id PK, int(11)
Name varchar(100)
ItemCategories
Id PK, int(11)
ItemId FK, int(11)
CategoryId FK, int(11)
ItemCategoryRelations
ItemCategoriesId FK, int(11)
RelationId FK, int(11)
SELECT *
FROM Item
JOIN ItemCategories ON Item.Id = ItemCategories.ItemId
WHERE ItemCategories.CategoryId IN (1, 2, ..., 10)
The version below uses one less table but doesn't support categories without relations, and relations can't be reused. So it's only valid if it matches your data structure requirements:
Item (image)
Id PK, int(11)
Name varchar(100)
Category (mood, xyz)
Id PK, int(11)
Name varchar(100)
Relations (happy, funny)
Id PK, int(11)
CategoryId FK, int(11)
Name varchar(100)
ItemRelations
ItemId FK, int(11)
RelationId FK, int(11)
SELECT *
FROM Item
JOIN ItemRelations ON Item.Id = ItemRelations.ItemId
JOIN Relations ON Relations.Id = ItemRelations.RelationId
WHERE Relations.CategoryId IN (1, 2, ..., 10)
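If "assigned to A, B, C" is meant as "tagged with all of them", one common way to express it on a single link table is GROUP BY with a HAVING count. The sketch below uses Python's sqlite3 module with made-up ids; items_with_all is a hypothetical helper, and whether this is fast enough at 100,000 items would need measuring on the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ItemRelations (
        ItemId INTEGER NOT NULL,
        RelationId INTEGER NOT NULL,
        PRIMARY KEY (RelationId, ItemId)   -- leading column matches the filter
    );
    -- Relation ids are made up: 1 = bright, 2 = happy, 3 = funny
    INSERT INTO ItemRelations VALUES (10, 1), (10, 2), (10, 3),
                                     (11, 1), (11, 3),
                                     (12, 2);
""")

def items_with_all(relation_ids):
    """Ids of items tagged with every relation in relation_ids."""
    marks = ",".join("?" * len(relation_ids))
    rows = conn.execute(f"""
        SELECT ItemId FROM ItemRelations
        WHERE RelationId IN ({marks})
        GROUP BY ItemId
        HAVING COUNT(DISTINCT RelationId) = ?
        ORDER BY ItemId
    """, (*relation_ids, len(relation_ids))).fetchall()
    return [r[0] for r in rows]
```

For "any of them" semantics, dropping the HAVING clause and selecting DISTINCT ItemId gives the same result as the IN-based joins above.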
How about this one; each category can have parent category. In your example, if bright is a child of mood then linking an item to bright would automatically make it mood\bright.
So if I understand right, an image falls into one of four of your main categories...mood for example. Then within mood it can be linked to 'bright' and 'happy.' and so on.
While I absolutely love bitmasking (microprocessor programmer here by day), and while I always seem to love applying it to db design as well, there always seems to be a better way.
How about something like this.
tblItems
------------------
item_id
item_name
tblCategories
------------------
category_id
category_name
tblRelations
------------------
relation_id
relation_name
tblCategoryRelationLink (link relations to specific categories)
------------------
cat_rel_id
category_id
relation_id
tblItemRelationLink (set relations to items)
------------------
item_rel_id
item_id
rel_id
If your relations are specific to categories....then you can simply lookup which category a specific relation is linked to. If somehow you can have a relation linked to two categories, then you would need an extra table as well (to link an item to a category).