Storing combinations of item properties in database - mysql

I have the following problem:
Let's say I have an item, a CUP for example. I want to sell it, but I want to let the user pick the CUP's properties, such as Size, Color and Material. When the user selects a Size (say, Large), a Color (say, Black) and a Material (say, Glass), I need to show them that we have 20 such cups in the warehouse and that the cost is $25 each. The problem is that I don't know how to store those combinations in the database.
Here is my ultra stupid solution:
Have a column for each combination. But adding any new combination would be painful, as would removing one, and I would have to map them somehow...
Id | Product Name | LargeBlackGlassPrice | LargeBlackGlassCount | SmallBlackGlassPrice | SmallBlackGlassCount | Medium...
A stupid idea, but so far I haven't come up with anything better :/
Hope it's clear what I want to achieve.
Thank you

Consider the following ERD:
The system administrator maintains a list of product categories, these may include, for example, cups. The administrator also maintains a list of features. These could include size, colour, material, and anything else that they decide is potentially important for any type of product. The administrator can then create an intersection of categories and features to indicate which features matter for a particular product category.
This establishes the "rules" for a catalogue of products: which types of products do you have, and what is important to know about each of these types of products?
Now, to store the products themselves, you have the SKU table. Each individual product (for example, Large Black Glass Cups) is stored in this table. You can store the current price of this product here. You can also store the stock on hand here, although I've recommended elsewhere never to store stock quantity directly. Inventory management is not the focus of your question, however.
For any particular product (SKU) you then have a list of product features where the specific values of each specific product are stored. The features that matter are the ones defined by the product's category as listed in the CATEGORY_FEATURE table.
On your website, when a customer is searching for items in a PRODUCT_CATEGORY, (e.g. Cups) you show them the list of CATEGORY_FEATUREs that apply. For each feature, you can create a drop down list of possible values to choose from by using:
select distinct PF.value
from CATEGORY_FEATURE CF
inner join PRODUCT_FEATURE PF
on CF.product_category_id = PF.product_category_id
and CF.feature_id = PF.feature_id
where CF.product_category_id = CategoryOfInterest
and CF.feature_id = FeatureOfInterest
order by
PF.value
This design gives your administrator the ability to define new product categories and product features without having to make database schema or code changes.
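A minimal sketch of this catalogue design, using Python's built-in sqlite3 module as a stand-in for MySQL so it runs anywhere. The table names come from the ERD described above; the column names (`value`, `price`, `stock_on_hand`), the sample cups and the `find_sku` helper (which picks the SKU matching every value the customer chose) are assumptions for illustration. Stock is kept on the SKU row only to keep the example short.

```python
import sqlite3

def build_catalogue() -> sqlite3.Connection:
    """Create the category/feature/SKU tables in memory and load sample cups (hypothetical data)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE product_category (product_category_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT);
        -- which features matter for which category
        CREATE TABLE category_feature (
            product_category_id INTEGER REFERENCES product_category,
            feature_id INTEGER REFERENCES feature,
            PRIMARY KEY (product_category_id, feature_id));
        CREATE TABLE sku (
            sku_id INTEGER PRIMARY KEY,
            product_category_id INTEGER REFERENCES product_category,
            name TEXT, price REAL, stock_on_hand INTEGER);
        -- the EAV part: one row per (SKU, feature) with its value
        CREATE TABLE product_feature (
            sku_id INTEGER REFERENCES sku,
            product_category_id INTEGER,
            feature_id INTEGER,
            value TEXT,
            PRIMARY KEY (sku_id, feature_id),
            FOREIGN KEY (product_category_id, feature_id)
                REFERENCES category_feature (product_category_id, feature_id));
        INSERT INTO product_category VALUES (1, 'Cups');
        INSERT INTO feature VALUES (1, 'Size'), (2, 'Colour'), (3, 'Material');
        INSERT INTO category_feature VALUES (1, 1), (1, 2), (1, 3);
        INSERT INTO sku VALUES (100, 1, 'Large Black Glass Cup', 25.00, 20),
                               (101, 1, 'Small Black Glass Cup', 18.00, 7);
        INSERT INTO product_feature VALUES
            (100, 1, 1, 'Large'), (100, 1, 2, 'Black'), (100, 1, 3, 'Glass'),
            (101, 1, 1, 'Small'), (101, 1, 2, 'Black'), (101, 1, 3, 'Glass');
    """)
    return db

def dropdown_values(db, category_id, feature_id):
    """The drop-down query from the answer: distinct values of one feature."""
    rows = db.execute("""
        SELECT DISTINCT pf.value
        FROM category_feature cf
        INNER JOIN product_feature pf
            ON cf.product_category_id = pf.product_category_id
           AND cf.feature_id = pf.feature_id
        WHERE cf.product_category_id = ? AND cf.feature_id = ?
        ORDER BY pf.value""", (category_id, feature_id))
    return [r[0] for r in rows]

def find_sku(db, category_id, choices):
    """Hypothetical helper: return (name, price, stock) of the SKU matching every choice.

    `choices` maps feature_id -> chosen value. A SKU qualifies only if all of
    its matching feature rows are found (GROUP BY ... HAVING COUNT(*) = n).
    """
    clauses = " OR ".join("(pf.feature_id = ? AND pf.value = ?)" for _ in choices)
    params = [x for pair in choices.items() for x in pair]
    return db.execute(f"""
        SELECT s.name, s.price, s.stock_on_hand
        FROM sku s JOIN product_feature pf ON pf.sku_id = s.sku_id
        WHERE s.product_category_id = ? AND ({clauses})
        GROUP BY s.sku_id
        HAVING COUNT(*) = ?""", [category_id, *params, len(choices)]).fetchone()

db = build_catalogue()
print(dropdown_values(db, 1, 1))                              # ['Large', 'Small']
print(find_sku(db, 1, {1: "Large", 2: "Black", 3: "Glass"}))  # ('Large Black Glass Cup', 25.0, 20)
```

The HAVING check is what makes the lookup require all of the chosen values rather than any one of them; grouping by the SKU's primary key also keeps the selected columns valid under MySQL's ONLY_FULL_GROUP_BY mode.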
Many people are likely to point out that this design uses the Entity-Attribute-Value (EAV) pattern, and they are equally likely to point out that EAV is EVIL. I agree in principle that EAV is to be avoided in almost all cases, but I have also asserted that in some cases, and in particular in the case of product catalogues, EAV is actually the preferred design.

Table1 => Cup Master
Fields => Cup Id | Product Name
Example =>
1001 | CUP A
1002 | CUP B
Table2 => Property Master
Fields => Property_Id | Properties
Example =>
1 | LargeBlackGlass
2 | SmallBlackGlass
3 | MediumBlackGlass
Table3 => Inventory Master
Fields => Cup Id | Property_Id | count | price_per_piece
Example =>
CUP A | 1 | 3 | 45/=
CUP A | 2 | 2 | 40/=
CUP A | 3 | 2 | 35/=
NOTE: A cup may be available with one property combination but not with another.
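A runnable sketch of the three tables above, using sqlite3 in place of MySQL. It assumes the Inventory Master rows refer to cups by Cup Id (1001 for CUP A) and stores prices as plain numbers rather than "45/="; the composite primary key on (cup_id, property_id) is an added assumption that allows one stock row per combination, and a missing row is how the "not available" case in the NOTE shows up.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE cup_master (cup_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE property_master (property_id INTEGER PRIMARY KEY, properties TEXT);
    -- one row per (cup, property combination) that is actually stocked
    CREATE TABLE inventory_master (
        cup_id INTEGER REFERENCES cup_master,
        property_id INTEGER REFERENCES property_master,
        count INTEGER,
        price_per_piece REAL,
        PRIMARY KEY (cup_id, property_id));
    INSERT INTO cup_master VALUES (1001, 'CUP A'), (1002, 'CUP B');
    INSERT INTO property_master VALUES
        (1, 'LargeBlackGlass'), (2, 'SmallBlackGlass'), (3, 'MediumBlackGlass');
    INSERT INTO inventory_master VALUES
        (1001, 1, 3, 45), (1001, 2, 2, 40), (1001, 3, 2, 35);
""")

def stock_for(product_name, properties):
    """Return (count, price_per_piece), or None if that combination is not stocked."""
    return db.execute("""
        SELECT im.count, im.price_per_piece
        FROM inventory_master im
        JOIN cup_master c ON c.cup_id = im.cup_id
        JOIN property_master p ON p.property_id = im.property_id
        WHERE c.product_name = ? AND p.properties = ?""",
        (product_name, properties)).fetchone()

print(stock_for("CUP A", "LargeBlackGlass"))  # (3, 45.0)
print(stock_for("CUP B", "LargeBlackGlass"))  # None: no such row in inventory_master
```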

Let's try to reason about how to solve your task. I will describe the general concept and split it into steps:
Define the types of products that you are going to sell: cup, plate, pan and so on. Create a table products with the fields: id, name, price.
Define the colours of products: black, red, brown. Create a table products_colours with the fields: id, name, price.
Define the sizes of products: small, medium, large. Create a table products_sizes with the fields: id, name, price.
In the simple case, each type of product has a single base price, stored in the products table.
In the simple case, the additional price for each colour and size is the same for all types of products and is stored in the products_colours and products_sizes tables.
Create a table customers_products with the fields: id, products_id, products_colours_id, products_sizes_id, quantity.
Write a query that joins all the tables together to fetch all products with their colours, sizes and prices from the database.
In the script, iterate through all rows and calculate the price of every product as the sum of the product price, size price and colour price.
To sum up: this is a very basic implementation that doesn't include things like brands, discounts and so on. However, it shows how to scale your system when you add further attributes that affect the final price of products.
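A sketch of the join-then-sum steps, using sqlite3 as a stand-in for MySQL. The tables and columns follow the steps above; the sample products, prices and quantities are made up.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE products_colours (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE products_sizes (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE customers_products (
        id INTEGER PRIMARY KEY,
        products_id INTEGER REFERENCES products,
        products_colours_id INTEGER REFERENCES products_colours,
        products_sizes_id INTEGER REFERENCES products_sizes,
        quantity INTEGER);
    -- hypothetical sample data
    INSERT INTO products VALUES (1, 'cup', 10), (2, 'plate', 15);
    INSERT INTO products_colours VALUES (1, 'black', 2), (2, 'red', 1);
    INSERT INTO products_sizes VALUES (1, 'small', 0), (2, 'large', 5);
    INSERT INTO customers_products VALUES (1, 1, 1, 2, 20), (2, 2, 2, 1, 4);
""")

# The join query: every row with its colour, size and all three prices.
rows = db.execute("""
    SELECT p.name, c.name, s.name, cp.quantity, p.price, c.price, s.price
    FROM customers_products cp
    JOIN products p ON p.id = cp.products_id
    JOIN products_colours c ON c.id = cp.products_colours_id
    JOIN products_sizes s ON s.id = cp.products_sizes_id
    ORDER BY cp.id""").fetchall()

# In the script: unit price = product price + size price + colour price.
priced = []
for product, colour, size, qty, p_price, c_price, s_price in rows:
    priced.append((product, colour, size, qty, p_price + s_price + c_price))

for item in priced:
    print(item)
# ('cup', 'black', 'large', 20, 17.0)
# ('plate', 'red', 'small', 4, 16.0)
```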

Related

Better to have one master table or split into multiple tables?

I am creating a database and I am unsure of the best way to design my tables. I have a table of real estate properties and I want to store information about those properties - e.g. bedrooms, bathrooms, size... I may have additional information I want to store in the future if it seems useful - e.g. last purchase price or date built, so I need to be flexible to make additions.
Is it better to create a separate table for each "characteristic" or to have one table of all the characteristics? It seems cleaner to separate the characteristics, but easier programming-wise to have one table.
CHARACTERISTIC TABLE
id property_id characteristic value
1 1 bedrooms 3
2 1 bathrooms 2
3 1 square feet 1000
4 2 bedrooms 2
...
OR
BEDROOM TABLE
id property_id bedrooms
1 1 3
2 2 2
...
BATHROOM TABLE
id property_id bathrooms
1 1 2
...
Forgive me if this is a stupid question, my knowledge of database design is pretty basic.
I would suggest a middle ground between your two suggestions. Off the cuff, I would do:
property table (UID, address, zip, other unique identifying properties)
Rooms table (UID, propertyID, room type, room size, floor, shape, color, finish, other room-specific details, etc.)
Property details (uid, propertyID, lot size, school district, how cost, tax rate, other entire property details)
Finally, a table or two for histories, e.g.
Property sales history (UID, PropertyID, salesdate, saleprice, sale reason, etc.)
Often, just grouping your data by "does it match" logic gives good results; you only need to take care to account for the one-to-one and one-to-many relationships between tables.
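A sketch of this middle-ground layout in sqlite3 (standing in for MySQL), with the column lists trimmed and renamed to valid identifiers (for example `room_type`, `sale_price`); the sample data is invented. It shows how counts such as bedrooms and bathrooms can be derived from the one-to-many Rooms table, and how a value like the question's "last purchase price" can come from the history table instead of a new column.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE property (uid INTEGER PRIMARY KEY, address TEXT, zip TEXT);
    -- one-to-many: a property has many rooms
    CREATE TABLE rooms (uid INTEGER PRIMARY KEY,
                        property_id INTEGER REFERENCES property,
                        room_type TEXT, room_size REAL, floor INTEGER);
    -- one-to-one: whole-property details (UNIQUE enforces at most one row)
    CREATE TABLE property_details (uid INTEGER PRIMARY KEY,
                                   property_id INTEGER UNIQUE REFERENCES property,
                                   lot_size REAL, school_district TEXT, tax_rate REAL);
    -- one-to-many: sales history
    CREATE TABLE property_sales_history (uid INTEGER PRIMARY KEY,
                                         property_id INTEGER REFERENCES property,
                                         sales_date TEXT, sale_price REAL, sale_reason TEXT);
    -- hypothetical sample data
    INSERT INTO property VALUES (1, '1 Main St', '12345'), (2, '2 Oak Ave', '12345');
    INSERT INTO rooms (property_id, room_type, room_size, floor) VALUES
        (1, 'bedroom', 12, 1), (1, 'bedroom', 10, 1), (1, 'bedroom', 9, 2),
        (1, 'bathroom', 5, 1), (1, 'bathroom', 4, 2),
        (2, 'bedroom', 11, 1), (2, 'bedroom', 10, 1);
    INSERT INTO property_sales_history (property_id, sales_date, sale_price, sale_reason)
    VALUES (1, '2015-06-01', 250000, 'relocation'), (1, '2021-03-15', 320000, 'upsizing');
""")

# Bedroom/bathroom counts are derived from the rooms table rather than stored.
counts = db.execute("""
    SELECT p.uid,
           SUM(r.room_type = 'bedroom')  AS bedrooms,
           SUM(r.room_type = 'bathroom') AS bathrooms
    FROM property p LEFT JOIN rooms r ON r.property_id = p.uid
    GROUP BY p.uid ORDER BY p.uid""").fetchall()
print(counts)  # [(1, 3, 2), (2, 2, 0)]

# The most recent sale price comes from the history table.
last_sale = db.execute("""
    SELECT sale_price FROM property_sales_history
    WHERE property_id = 1 ORDER BY sales_date DESC LIMIT 1""").fetchone()[0]
print(last_sale)  # 320000.0
```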
I am focusing on this:
"I have a table of real estate properties"
As far as I know, there have to be different types of:
Houses
Bedrooms
Comfort rooms and so on.
For further explanation:
You need tables for:
1. House type
2. House (name, description, housetypeid, priceid, bedroomid, roofid, comfortroomid and anything else related to your house)
3. Bedroom type
4. Comfort room type
5. Dining type
6. Roof type, if applicable
7. House prices
8. Bathroom type
Something like that.
One table with a few columns:
Columns for price, #br, #bath, FR, DR, sqft and a small number of other commonly checked attributes. Then one JSON column with all the other info (2 dishwashers, spa, ocean view, etc).
Use a WHERE clause on the separate columns, then finish the filtering in your client code, which can more easily look into the JSON.
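A sketch of this hybrid approach, using sqlite3 with a TEXT column holding JSON as a stand-in for MySQL (MySQL 5.7 and later also provide a native JSON column type). The column names `br`, `bath`, `sqft`, `extras` and the listings are assumptions.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
# A few commonly filtered columns plus one JSON text column for everything else.
db.execute("""CREATE TABLE listing (
    id INTEGER PRIMARY KEY, price REAL, br INTEGER, bath REAL, sqft INTEGER,
    extras TEXT)""")
listings = [  # hypothetical sample rows
    (1, 450000, 3, 2, 1800, {"dishwashers": 2, "spa": True}),
    (2, 380000, 3, 1.5, 1500, {"ocean_view": True}),
    (3, 520000, 4, 3, 2400, {"spa": True, "ocean_view": True}),
]
db.executemany("INSERT INTO listing VALUES (?, ?, ?, ?, ?, ?)",
               [(*row[:5], json.dumps(row[5])) for row in listings])

# 1. Let the database narrow the set using the separate columns.
rows = db.execute("""SELECT id, extras FROM listing
                     WHERE br >= ? AND price <= ? ORDER BY id""",
                  (3, 500000)).fetchall()

# 2. Finish the filtering in client code by looking inside the JSON.
with_spa = [lid for lid, extras in rows if json.loads(extras).get("spa")]
print(with_spa)  # [1]
```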

How to make a single MySQL query that uses the results of another query

I have a Perl program that queries a MySQL database to bring back results based upon which "report" option a user has selected from a web page.
One of the reports is all occupants of a student housing building who have applied for a parking permit, but who have not yet been given one.
When the students apply for a permit, it records the specifics about their car (make, model, year, color, etc.) in a single table row. Each apartment can have up to three students, and each student may apply for a permit. So an apartment might have 0 permits, or 1, 2, or 3 permits, depending upon how many of them have cars.
What I'd like to be able to do, is execute a MySQL query that will find out how many occupants in each apartment have applied for a parking permit, and then based on the results of that query, find out how many permits have been issued. If the number of permits issued is less than the number of applications, that apartment number should be returned in the result set. It doesn't have to name the specific occupant, just the fact that the apartment has at least one occupant who has applied for a permit, but not yet received one.
So I have two tables, one is called occupant_info and it contains all kinds of info about the occupant, but the relevant fields are:
counter (a unique row id)
parking_permit_1_number
parking_permit_2_number
parking_permit_3_number
When a parking permit has been assigned, it is recorded in the appropriate parking_permit_#_number field (if it's occupant number one's permit, it would be recorded in parking_permit_1_number, etc.).
The second table is called parking_permits and contains all of the car/owner specifics (make, model, year, owner, owner address, etc.). It also contains a field that references the counter from the occupant_info table.
So an example would be:
occupant_info table
counter | parking_permit_1_number | parking_permit_2_number | parking_permit_3_number
--------|-------------------------|-------------------------|------------------------
1 | 12345 | | 98765
2 | 43920 | |
3 | 30239 | | 34233
parking_permits table
counter | counter_from_occupant_info | permit_1_name | permit_2_name | permit_3_name
--------|----------------------------|---------------|-----------------|-------------------
1 |2 | David Jones | James Cameron | Michael Smerconish
2 |3 | Bill Epps | Hillary Clinton | Donald Trump
3 |1 | Joanne Miller | | Sridevi Gupta
I want a query that will first look at how many occupants in an apartment have applied for a permit. This is determined by counting the names in the parking_permits table. In that table, row 1 has three names, row 2 has three names, and row 3 has two names. The query should then look at the occupant_info table, and for each counter_from_occupant_info from the parking_permits table, see if the same number of parking permits have been issued. This can be determined by comparing the number of non-blank parking_permit_#_number fields.
Using the data above, the query would see the following:
parking_permit table row 1
Has counter_from_occupant_info equal to "2"
Has three names
The row in occupant_info with counter = "2" has only one permit number issued,
so counter_from_occupant_info 2 from parking_permits should be in the result set.
parking_permit table row 2
Has counter_from_occupant_info equal to "3"
Has three names
The row in occupant_info with counter = "3" has only two permit numbers issued,
so counter_from_occupant_info 3 from parking_permits should be in the result set.
parking_permit table row 3
Has counter_from_occupant_info equal to "1"
Has two names
The row in occupant_info with counter = "1" has two permit numbers issued,
so this row should *not* be in the result set.
I've thought about using if, then, case, when, type logic to do this in one query, but frankly can't wrap my head around how to do so.
I was thinking something like:
SELECT
CASE WHEN ( SELECT counter_from_occupant_info
FROM parking_permits
WHERE parking_permit_1_name != ""
AND parking_permit_2_name != ""
AND parking_permit_3_name != "" ) THEN
IF ( SELECT parking_permit_1_number,
parking_permit_2_number,
parking_permit_3_number
FROM occupant_info
WHERE counter = ***somehow reference counter from above case statement--I don't know how to do this***
But then my head explodes and I realize I don't know what the heck I'm doing.
Any help would be appreciated. :-)
Doug
You have a few problems:
Your occupants table schema is bad. There's worse out there, but it looks like someone that doesn't understand how a database works built this table.
Your permits table is also bad. Same reason.
You have no idea what you are doing (kidding... kidding...)
Problem 1:
Your occupants table should probably be two tables. Because an occupant could have 0-3 permits (possibly more; I can't tell from the sample data), you need one table for your occupant's attributes (name, height, gender, age, primary smell, favorite color, first rent date, I dunno).
Occupants
OccupantID | favorite TV Show | number of limbs | first name | last name | aptBuilding
And... another table for Relationship between the occupant and the permit:
Occupant_permits
OccupantID | Permit ID | status
Now... an occupant can have as many permits as you can stuff into that table and the relationship between them has a status "Applied for", or "Granted" or "Revoked" or what have you.
Problem 2
Your permit info table is doing double duty as well. It holds the information about a permit (its name) as well as the relationship to the occupant. Since we already have a relationship to the occupant with the "Occupant_Permits" table above, we just need a permits table to hold the attributes of a permit:
Permits
Permit ID | Permit Name | Description | etc..
Problem 3
Now that you have a correct schema where objects are in their own table (Occupant, Permit, Occupant and Permit Relationship) your query to get a list of apartments that have at least one occupant that has applied, but not yet received a permit would be:
SELECT DISTINCT
o.AptBuilding
FROM
Occupants as o
INNER JOIN Occupant_Permits as op
ON o.OccupantID = op.OccupantID
INNER JOIN Permits as p
ON op.PermitID = p.PermitID
WHERE
op.Status = 'Applied'
That's nice and simple and you aren't relying on CASE or UNION or count comparison or any fancy stuff. Just nice straight joins and a simple WHERE clause. This will be fast to query and there's no funny business.
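A runnable sketch of the normalized schema with sqlite3 standing in for MySQL. Column names follow the answer with spaces removed (`OccupantID`, `PermitID`); the status values, apartment codes and rows are hypothetical. The Permits join is left out here because no permit attribute is needed to answer this particular question.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Occupants (OccupantID INTEGER PRIMARY KEY, FirstName TEXT, AptBuilding TEXT);
    CREATE TABLE Permits (PermitID INTEGER PRIMARY KEY, PermitName TEXT);
    -- relationship table: one row per occupant/permit pair, with its status
    CREATE TABLE Occupant_Permits (
        OccupantID INTEGER REFERENCES Occupants,
        PermitID INTEGER REFERENCES Permits,
        Status TEXT CHECK (Status IN ('Applied', 'Granted', 'Revoked')),
        PRIMARY KEY (OccupantID, PermitID));
    -- hypothetical sample data
    INSERT INTO Occupants VALUES (1, 'David', '2A'), (2, 'James', '2A'), (3, 'Joanne', '1B');
    INSERT INTO Permits VALUES (10, 'Student parking'), (11, 'Visitor parking');
    INSERT INTO Occupant_Permits VALUES (1, 10, 'Granted'), (2, 10, 'Applied'), (3, 10, 'Granted');
""")

# Apartments with at least one occupant whose application is still pending.
apartments = [row[0] for row in db.execute("""
    SELECT DISTINCT o.AptBuilding
    FROM Occupants AS o
    INNER JOIN Occupant_Permits AS op ON o.OccupantID = op.OccupantID
    WHERE op.Status = 'Applied'
    ORDER BY o.AptBuilding""")]
print(apartments)  # ['2A']
```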
Because your schema isn't great, to get something similar you'll need to use either UNION queries to stack your many permit_N_ fields into a single field and then run something similar to the above query, or a fair amount of CASE/IF statements:
SELECT DISTINCT p.pCounter
FROM
(
SELECT
counter AS oCounter,
-- permits actually issued: NULL and '' both mean "not issued"
CASE WHEN COALESCE(parking_permit_1_number, '') <> '' THEN 1 ELSE 0 END
+
CASE WHEN COALESCE(parking_permit_2_number, '') <> '' THEN 1 ELSE 0 END
+
CASE WHEN COALESCE(parking_permit_3_number, '') <> '' THEN 1 ELSE 0 END AS permitCount
FROM occupant_info
) AS o
INNER JOIN
(
SELECT
counter_from_occupant_info AS pCounter,
-- applications: non-blank names
CASE WHEN COALESCE(permit_1_name, '') <> '' THEN 1 ELSE 0 END
+
CASE WHEN COALESCE(permit_2_name, '') <> '' THEN 1 ELSE 0 END
+
CASE WHEN COALESCE(permit_3_name, '') <> '' THEN 1 ELSE 0 END AS applicationCount
FROM parking_permits
) AS p ON o.oCounter = p.pCounter
WHERE p.applicationCount > o.permitCount
I'm not 100% convinced this is exactly what you are looking for, since your schema is confusing (multiple objects in a single table, and everything is pivoted), but it should get you in the ballpark.
This will be much slower too. There are intermediate result sets, CASE statements, and math, so don't expect MySQL to spit this out in milliseconds.
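To check the pivoted approach against the question's own sample data, here is the same query run through sqlite3 (standing in for MySQL). Loading the blank fields as NULL or empty strings is an assumption about how the real tables store them; that is why the counts test `COALESCE(col, '') <> ''` rather than only `IS NOT NULL`.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE occupant_info (counter INTEGER PRIMARY KEY,
        parking_permit_1_number INTEGER, parking_permit_2_number INTEGER,
        parking_permit_3_number INTEGER);
    CREATE TABLE parking_permits (counter INTEGER PRIMARY KEY,
        counter_from_occupant_info INTEGER,
        permit_1_name TEXT, permit_2_name TEXT, permit_3_name TEXT);
    -- sample data from the question; blanks stored as NULL or '' (assumption)
    INSERT INTO occupant_info VALUES
        (1, 12345, NULL, 98765), (2, 43920, NULL, NULL), (3, 30239, NULL, 34233);
    INSERT INTO parking_permits VALUES
        (1, 2, 'David Jones', 'James Cameron', 'Michael Smerconish'),
        (2, 3, 'Bill Epps', 'Hillary Clinton', 'Donald Trump'),
        (3, 1, 'Joanne Miller', '', 'Sridevi Gupta');
""")

# Count issued permits and applications per apartment, then keep the
# apartments with more applications than issued permits.
QUERY = """
SELECT DISTINCT p.pCounter
FROM (SELECT counter AS oCounter,
             CASE WHEN COALESCE(parking_permit_1_number, '') <> '' THEN 1 ELSE 0 END
           + CASE WHEN COALESCE(parking_permit_2_number, '') <> '' THEN 1 ELSE 0 END
           + CASE WHEN COALESCE(parking_permit_3_number, '') <> '' THEN 1 ELSE 0 END
             AS permitCount
      FROM occupant_info) AS o
INNER JOIN
     (SELECT counter_from_occupant_info AS pCounter,
             CASE WHEN COALESCE(permit_1_name, '') <> '' THEN 1 ELSE 0 END
           + CASE WHEN COALESCE(permit_2_name, '') <> '' THEN 1 ELSE 0 END
           + CASE WHEN COALESCE(permit_3_name, '') <> '' THEN 1 ELSE 0 END
             AS applicationCount
      FROM parking_permits) AS p
  ON o.oCounter = p.pCounter
WHERE p.applicationCount > o.permitCount
ORDER BY p.pCounter
"""
print([row[0] for row in db.execute(QUERY)])  # [2, 3]
```

This matches the walkthrough in the question: apartments 2 and 3 have more applicants than issued permits, while apartment 1 (two names, two permits) is excluded.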

Storing users data efficiently in mysql database

I am developing a recommendation engine, which requires storing lots of data and keeping track of every move made by the user. Basically, my website is a product search engine and will store sets of queries as user data. Following are some examples of the data set:
Example
User1 :
1. Apple Ipod tOuch
2. Samsung Galaxy Ace Plus
3. HArry Porter
User2 :
1. Product1
2. Product2
and so on.
One (naive) way would be to associate an ID with each of my users and then store a string for that ID in this form (strings separated with ~):
Unique ID - Apple IPod TOuch~Samsung Galaxy Ace Plus~HArry Porter
But this method won't be efficient, considering how I will be working with this data later on.
Can anyone come up with a very efficient model that is fairly easy to implement in MySQL?
Comment if my question is unclear.
The classic design is a table for users :
Users(user_id,user_name,reg_date....)
table for products :
Products(prod_id,prod_name,prod_cost....)
table with mapping user-->products :
User_products(user_id,prod_id ....)
Example :
Users :
user_id|user_name
1200 | User1
7856 | User2
Products :
prod_id | prod_name
12900 | Apple Ipod tOuch
45673 | Samsung Galaxy Ace Plus
99876 | HArry Porter
34590 | Product1
56283 | Product2
User_products :
user_id | prod_id
1200 |12900
1200 |45673
1200 |99876
7856 |34590
7856 |56283
Avoid delimiter-separated strings: you'll have to work with the submitted data, and your search engine will become very slow once you have a really large amount of data.
I think Grisha is absolutely right - user or product searches (numeric id searches), joined with mapping tables will output the result much faster than searches through text/varchar fields, separating the results, etc.
Using the canonical approach as proposed by Grisha, the query 'who has product 1' would be represented thus
select users.user_name
from users inner join user_products on users.user_id = user_products.user_id
inner join products on products.prod_id = user_products.prod_id
where products.prod_name = 'Product1'
This may look complicated but it's actually very simple and very powerful. If there were another field in the user_products table such as purchase date, you could also find out when those users bought product 1, or find all the users who bought the product during a given period, by means of a simple extension to the query.
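A runnable version of the canonical design and query, using sqlite3 in place of MySQL and the sample rows from the answer above. The `purchase_date` column and its values are hypothetical, added only to show the date-range extension mentioned in the previous paragraph; the index on `prod_id` is also an assumption.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE products (prod_id INTEGER PRIMARY KEY, prod_name TEXT);
    CREATE TABLE user_products (
        user_id INTEGER REFERENCES users,
        prod_id INTEGER REFERENCES products,
        purchase_date TEXT);  -- hypothetical extra column
    CREATE INDEX user_products_prod ON user_products (prod_id);
    INSERT INTO users VALUES (1200, 'User1'), (7856, 'User2');
    INSERT INTO products VALUES (12900, 'Apple Ipod tOuch'), (45673, 'Samsung Galaxy Ace Plus'),
        (99876, 'HArry Porter'), (34590, 'Product1'), (56283, 'Product2');
    INSERT INTO user_products VALUES
        (1200, 12900, '2012-01-05'), (1200, 45673, '2012-02-10'), (1200, 99876, '2012-03-01'),
        (7856, 34590, '2012-02-20'), (7856, 56283, '2012-04-02');
""")

# "Who has product X?"
WHO_HAS = """
    SELECT users.user_name
    FROM users
    INNER JOIN user_products ON users.user_id = user_products.user_id
    INNER JOIN products ON products.prod_id = user_products.prod_id
    WHERE products.prod_name = ?"""
print([r[0] for r in db.execute(WHO_HAS, ("Product1",))])  # ['User2']

# The simple extension: only purchases within a given period.
in_period = db.execute(WHO_HAS + " AND user_products.purchase_date BETWEEN ? AND ?",
                       ("Product1", "2012-02-01", "2012-02-29")).fetchall()
print(in_period)  # [('User2',)]
```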

Database Design: how to model generic price factors of a product/service?

I'm trying to create a generic data model that will allow for a particular product (indicated by the FK product_id in the sample table below) to specify 0 or more price "factors" (I define "factor" as a unit of price added or subtracted in order to get the total).
So say there is this table:
===============================
price
===============================
price_id (PK)
product_id (FK)
label
operation (ENUM: add, subtract)
type (ENUM: amount, percentage)
value
A book's price might be represented this way:
====================================================================
price_id | product_id | label | operation | type | value
====================================================================
1 | 10 | Price | add | amount | 20
2 | 10 | Discount | subtract | percentage | .25
3 | 10 | Sales Tax | add | percentage | .1
This basically means:
Price: $20.00
Discount: - $5.00 (25%)
--------------------
Sub Total: $15.00
Sales Tax: $1.50 (10%)
------------------------
Total: $16.50
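A small sketch of how the rows above produce the $16.50 total, assuming (as question 3 below describes) that each percentage applies to the running total of the preceding rows. The function name is illustrative.

```python
def apply_factors(factors):
    """Apply price rows in order; a percentage applies to the running total.

    Each factor is (label, operation, type, value), mirroring the price table.
    """
    total = 0.0
    for label, operation, kind, value in factors:
        amount = value if kind == "amount" else total * value
        total = total + amount if operation == "add" else total - amount
        print(f"{label:<10} {operation:<8} -> {total:.2f}")
    return total

book = [
    ("Price", "add", "amount", 20),
    ("Discount", "subtract", "percentage", .25),
    ("Sales Tax", "add", "percentage", .1),
]
total = apply_factors(book)
print(f"Total: {total:.2f}")  # Total: 16.50
```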
A few questions:
Is there anything obviously wrong with the initial design?
What if I wanted to create "templates" (e.g. "general merchandise" template that has "price", "discount" and "sales tax" fields; a "luxury merchandise" that has "price", "discount", "luxury tax" fields) - how would I model that?
The above model works if each record applies to the total of the preceding record. So, in the example, "sales tax" applies to the difference of "price" and "discount". What if the total was not computed that simply? For example: A + B + (A + 10%) - (B - 5%). How would I model that?
Also, what if the "percentage" type doesn't apply to the immediately preceding row (as implied by question #3) and applies to more than 1 row? Do I need another table to itemize which price->price_id the percentage applies to?
First of all you need a model of price labels, which is simple:
price_labels
id | label
1 | Price
2 | Discount
3 | Tax
Then a slightly modified version of the sample table that you've given:
products_prices
price_id|product_id|label_id|divider|value
1 10 1 1 20
2 10 2 100 -25
3 10 3 100 10
Here I just substituted the label with the corresponding id from the price_labels table as a foreign key. Additionally, I omitted the operation and type fields: the operation is implied by the sign of value, which can be a positive or negative float number, and instead of type I added the divider column to handle percentages. I think it is more easily read this way as well, since you say (and think) "minus twenty-five percent", not 0.25.
Now the expression "abstraction" part is a bit more complicated and there could be a lot of solutions.
price_expressions
product_id | date_from | date_until | expression
10 |2011-11-02 04:00:00 |2011-11-12 04:00:00 | (SELECT divider*value from
products_prices
WHERE product_id=%PRODUCT_ID%
AND label_id=1)*
(SELECT 1+value/divider from products_prices
where product_id=%PRODUCT_ID% AND
label_id=2)*
(SELECT 1+value/divider from products_prices
where product_id=%PRODUCT_ID% AND
label_id=3)
In the expression field you can store a complex SQL statement in which you can just replace the %PRODUCT_ID% placeholder with the product_id value from the same row:
SELECT REPLACE(expression,'%PRODUCT_ID%',CAST(product_id AS char))
AS price_expression FROM price_expressions
WHERE product_id = 10 AND date_from<=DATE_OF_PURCHASE
AND date_until>=DATE_OF_PURCHASE
There are two possible variations of this the way I see it:
You can replace the product_id=%PRODUCT_ID% and label_id=N condition with just price_id=N, since you already have it stored in the products_prices table
You can use another expression format, e.g. %PRICE_ID_1%*%PRICE_ID_2%, and perform the substitutions and calculations at the application level rather than directly in SQL
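A sketch of the stored-expression approach in sqlite3 (standing in for MySQL), using the sample rows and date range above. Two details differ from MySQL: SQLite's `CAST(... AS TEXT)` replaces `AS char`, and `value` is declared REAL because SQLite, unlike MySQL, does integer division when both operands are integers. Executing stored SQL text like this is only safe if the expressions come from trusted administrators.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE products_prices (price_id INTEGER PRIMARY KEY, product_id INTEGER,
                                  label_id INTEGER, divider INTEGER, value REAL);
    INSERT INTO products_prices VALUES (1, 10, 1, 1, 20), (2, 10, 2, 100, -25), (3, 10, 3, 100, 10);
    CREATE TABLE price_expressions (product_id INTEGER, date_from TEXT, date_until TEXT,
                                    expression TEXT);
""")
# Stored expression: base price * (1 - 25/100) * (1 + 10/100).
expr = " * ".join([
    "(SELECT divider*value FROM products_prices WHERE product_id=%PRODUCT_ID% AND label_id=1)",
    "(SELECT 1+value/divider FROM products_prices WHERE product_id=%PRODUCT_ID% AND label_id=2)",
    "(SELECT 1+value/divider FROM products_prices WHERE product_id=%PRODUCT_ID% AND label_id=3)",
])
db.execute("INSERT INTO price_expressions VALUES "
           "(10, '2011-11-02 04:00:00', '2011-11-12 04:00:00', ?)", (expr,))

purchase = "2011-11-05 12:00:00"  # hypothetical purchase time inside the window
(sql,) = db.execute("""
    SELECT REPLACE(expression, '%PRODUCT_ID%', CAST(product_id AS TEXT))
    FROM price_expressions
    WHERE product_id = 10 AND date_from <= ? AND date_until >= ?""",
    (purchase, purchase)).fetchone()

# Evaluate the substituted expression (trusted input only).
price = db.execute("SELECT " + sql).fetchone()[0]
print(round(price, 2))  # 16.5
```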
Hope this helps.
This seems a little over-engineered.
1) Wouldn't the sales tax percentage be a factor of where the item was purchased, not which item was purchased? I could see a field for "IsTaxable", but specifying the rate for each item seems incorrect.
2) Are you sure you need to incur the cost of making this generic? Are you already fairly certain there will be more factors in the future? If not, don't overcomplicate it.
Suggested Design:
- Add columns to the products table for IsTaxable, DiscountPct, and Unit Price.
- Store the Sales tax percentage in another table. Probably the invoice table.
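A sketch of this suggested design in sqlite3 (standing in for MySQL). The invoice and invoice-line tables, the `SalesTaxPct` column and the sample rows are assumptions; percentages are stored as whole numbers (25 for 25%).

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT,
                           UnitPrice REAL, DiscountPct REAL, IsTaxable INTEGER);
    -- the tax rate belongs to the sale (here, the invoice), not to the product
    CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, SalesTaxPct REAL);
    CREATE TABLE invoice_lines (invoice_id INTEGER REFERENCES invoices,
                                product_id INTEGER REFERENCES products, qty INTEGER);
    -- hypothetical sample data
    INSERT INTO products VALUES (10, 'Book', 20, 25, 1), (11, 'Gift card', 50, 0, 0);
    INSERT INTO invoices VALUES (1, 10);
    INSERT INTO invoice_lines VALUES (1, 10, 1), (1, 11, 1);
""")

rows = db.execute("""
    SELECT p.name,
           l.qty * p.UnitPrice * (1 - p.DiscountPct / 100.0)
               * (1 + CASE WHEN p.IsTaxable THEN i.SalesTaxPct / 100.0 ELSE 0 END)
    FROM invoice_lines l
    JOIN products p ON p.product_id = l.product_id
    JOIN invoices i ON i.invoice_id = l.invoice_id
    WHERE l.invoice_id = 1
    ORDER BY p.product_id""").fetchall()
for name, amount in rows:
    print(name, round(amount, 2))
# Book 16.5
# Gift card 50.0
```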
Regarding your question 1:
There is a potential functional dependency between label, operation and type. For example, a discount might always imply subtraction and percentage. If so, the data model can be normalized by moving these fields to a separate table with label as a PK.
BTW, a de-normalized data model may be a legitimate tool for improving performance and/or simplicity.
Regarding your question 2:
Here is a model that allows easy "templating":
The final price of a product is calculated by applying the series of steps on PRICE, in order defined by STEP_NO. Multiple products can easily share the same "template" (i.e. the same PRICE_ADJUSTMENT_ID).
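The diagram for this model is not included here, so the following sqlite3 sketch (standing in for MySQL) is only a guess at its shape: apart from PRICE, STEP_NO and PRICE_ADJUSTMENT_ID, the table and column names are assumptions. Two products share the "general merchandise" template, and a third uses a "luxury merchandise" template.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- one row per template (assumed names)
    CREATE TABLE price_adjustment (price_adjustment_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE price_adjustment_step (
        price_adjustment_id INTEGER REFERENCES price_adjustment,
        step_no INTEGER, label TEXT, operation TEXT, type TEXT, value REAL,
        PRIMARY KEY (price_adjustment_id, step_no));
    -- each product has a base price and points at a shared template
    CREATE TABLE price (product_id INTEGER PRIMARY KEY, price REAL,
                        price_adjustment_id INTEGER REFERENCES price_adjustment);
    -- hypothetical sample data
    INSERT INTO price_adjustment VALUES (1, 'general merchandise'), (2, 'luxury merchandise');
    INSERT INTO price_adjustment_step VALUES
        (1, 1, 'Discount', 'subtract', 'percentage', 0.25),
        (1, 2, 'Sales Tax', 'add', 'percentage', 0.10),
        (2, 1, 'Discount', 'subtract', 'percentage', 0.10),
        (2, 2, 'Luxury Tax', 'add', 'percentage', 0.20);
    INSERT INTO price VALUES (10, 20, 1), (11, 30, 1), (12, 1000, 2);
""")

def final_price(product_id):
    """Apply the product's template steps to its base price in STEP_NO order."""
    total, template = db.execute(
        "SELECT price, price_adjustment_id FROM price WHERE product_id = ?",
        (product_id,)).fetchone()
    steps = db.execute("""SELECT operation, type, value FROM price_adjustment_step
                          WHERE price_adjustment_id = ? ORDER BY step_no""", (template,))
    for operation, kind, value in steps:
        amount = total * value if kind == "percentage" else value
        total = total + amount if operation == "add" else total - amount
    return round(total, 2)

print(final_price(10), final_price(11), final_price(12))  # 16.5 24.75 1080.0
```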
Regarding your questions 3 and 4:
You'd need to model a full expression tree, not just a "linear" series of steps. There are several ways to do that, most of them fairly complicated in relational paradigm. Perhaps the simplest one is to keep the data model similar to above, but treat it as Reverse Polish Notation.
For example...
A + B + (A + 10%) - (B - 5%)
...could be represented as:
OPERATION   TYPE         VALUE
---------   ----------   -----
            value        A
            value        B
add
            value        A
            percentage   10
add
add
            value        B
            percentage   5
subtract
subtract
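A minimal RPN evaluator for rows like these, assuming that `X add P%` means X increased by P percent and `X subtract P%` means X reduced by P percent (the reading of "A + 10%" and "B - 5%" above). The row encoding and the `Percent` class are illustrative, not part of the answer.

```python
from dataclasses import dataclass

@dataclass
class Percent:
    """A percentage operand; only meaningful as the right side of add/subtract."""
    pct: float

def eval_rpn(rows, variables):
    """Evaluate (operation, type, value) rows as Reverse Polish Notation.

    'value' and 'percentage' rows push operands; 'add'/'subtract' rows pop two.
    X add P% means X * (1 + P/100); X subtract P% means X * (1 - P/100).
    """
    stack = []
    for operation, kind, value in rows:
        if kind == "value":
            stack.append(variables.get(value, value))  # variable name or literal number
        elif kind == "percentage":
            stack.append(Percent(value))
        else:
            right, left = stack.pop(), stack.pop()
            sign = 1 if operation == "add" else -1
            if isinstance(right, Percent):
                stack.append(left * (1 + sign * right.pct / 100))
            else:
                stack.append(left + sign * right)
    (result,) = stack  # a well-formed expression leaves exactly one value
    return result

rows = [
    (None, "value", "A"), (None, "value", "B"), ("add", None, None),
    (None, "value", "A"), (None, "percentage", 10), ("add", None, None),
    ("add", None, None),
    (None, "value", "B"), (None, "percentage", 5), ("subtract", None, None),
    ("subtract", None, None),
]
print(round(eval_rpn(rows, {"A": 100, "B": 20}), 2))  # 211.0
```

With A = 100 and B = 20 this gives 100 + 20 + 110 - 19 = 211, up to floating-point rounding.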
Are you sure you actually need this kind of functionality?
If some price factors are dependent on the type of the item, then you'd have a set of price factors linked to entities in an ItemType table, and ItemType would be a property of the item entity (foreign key referencing ItemType). If other price factors are linked to the locale in which the item is being sold or shipped (e.g. sales tax), then those factors would be linked to Locale and would be invoked based on the customer's address. You would typically apply item-type factors at the line-item level, and locale-driven factors to the invoice total. Sin-tax would be linked to an ItemTypeLocale dyad, and applied at the line-item level.
1/ I think you also need to consider sequence,
e.g. Price - discount + sales tax is obviously acceptable, but Price + sales tax - discount is not, nor is Price - (discount + sales tax).
2/ I would consider having price in another table. Is this not a detail of the item being sold? E.g. Widget, blue, $20.00. Whereas your factors are a detail of the sale type. Presumably you could have one set of factors for a walk-in retail sale, another for an online sale and a third for a wholesale sale. You could calculate the actual price for these three sale types from the base price * factors.
3/ I think you need more tables; e.g. maybe Item, Sale type, and factor_details and factor_rules. It may be that your sale type is covered by your example of Luxury item in which case (if an item is only ever one sale type) this could be in the item table. Factor_rules would detail the calculation formula and factor_details the values.
I find this quite interesting. I would appreciate you updating this question with your experiences once you have worked this through.

MySQL: How to pull information from multiple tables based on information in other tables?

Ok, I have 5 tables which I need to pull information from based on one variable.
gameinfo
id | name | platforminfoid
gamerinfo
id | name | contact | tag
platforminfo
id | name | abbreviation
rosterinfo
id | name | gameinfoid
rosters
id | gamerinfoid | rosterinfoid
The 1 variable would be gamerinfo.id, which would then pull all relevant data from gamerinfo, which would pull all relevant data from rosters, which would pull all relevant data from rosterinfo, which would pull all relevant data from gameinfo, which would then pull all relevant data from platforminfo.
Basically it breaks down like this:
gamerinfo contains the gamer's basic information.
rosterinfo contains basic information about the rosters (i.e. name and the game the roster is aimed towards).
rosters contains the actual link from the gamer to the different rosters (gamers can be on multiple rosters).
gameinfo contains basic information about the games (i.e. name and platform).
platforminfo contains information about the different platforms the games are played on (it is possible for a game to be played on multiple platforms).
I am pretty new to SQL queries involving JOINs and UNIONs and such, usually I would just break it up into multiple queries but I thought there has to be a better way, so after looking around the net, I couldn't find (or maybe I just couldn't understand what I was looking at) what I was looking for. If anyone can point me in the right direction I would be most grateful.
There is nothing wrong with querying the required data step by step. If you use JOINs over 5 tables in your SQL, be sure to have useful indexes on all important columns. Also, this could create a lot of duplicate data:
Imagine this: you need 1 record from gamerinfo, maybe 3 from gameinfo, 4 from rosters and 3 from each of the remaining two tables. This could give you a result of 1*3*4*3*3 = 108 records, which will look like this:
ID Col2 Col3
1 1 1
1 1 2
1 1 3
1 2 1
... ... ...
You can see that you would fetch the ID 108 times, even if you only need it once. So my advice would be to stick with mostly single, simple queries to get the data you need.
There is no need for UNION; multiple JOINs should do the work:
SELECT gameinfo.id AS g_id, gameinfo.name AS g_name, platforminfo.name AS p_name, platforminfo.abbreviation AS p_abb, rosterinfo.name AS r_name
FROM gameinfo
LEFT JOIN platforminfo ON gameinfo.platforminfoid = platforminfo.id
LEFT JOIN rosterinfo ON rosterinfo.gameinfoid = gameinfo.id
WHERE gameinfo.id = XXXX
This should pull all the info about a game based on its game id.
Indexing all the id columns (gameinfoid, platforminfoid, rosterinfoid) will help performance.
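The question asked for everything reachable from a single gamerinfo.id, while the answer's query starts from a game id. A sketch of the gamer-first chain, using sqlite3 as a stand-in for MySQL with invented sample rows; the indexes cover the foreign keys the joins use.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE platforminfo (id INTEGER PRIMARY KEY, name TEXT, abbreviation TEXT);
    CREATE TABLE gameinfo (id INTEGER PRIMARY KEY, name TEXT,
                           platforminfoid INTEGER REFERENCES platforminfo);
    CREATE TABLE gamerinfo (id INTEGER PRIMARY KEY, name TEXT, contact TEXT, tag TEXT);
    CREATE TABLE rosterinfo (id INTEGER PRIMARY KEY, name TEXT,
                             gameinfoid INTEGER REFERENCES gameinfo);
    CREATE TABLE rosters (id INTEGER PRIMARY KEY,
                          gamerinfoid INTEGER REFERENCES gamerinfo,
                          rosterinfoid INTEGER REFERENCES rosterinfo);
    -- index the foreign keys used by the joins
    CREATE INDEX rosters_gamer ON rosters (gamerinfoid);
    CREATE INDEX rosterinfo_game ON rosterinfo (gameinfoid);
    -- hypothetical sample data
    INSERT INTO platforminfo VALUES (1, 'Platform One', 'P1'), (2, 'Platform Two', 'P2');
    INSERT INTO gameinfo VALUES (1, 'Game One', 1), (2, 'Game Two', 2);
    INSERT INTO gamerinfo VALUES (7, 'Alice', 'alice@example.com', 'ace');
    INSERT INTO rosterinfo VALUES (1, 'Team Red', 1), (2, 'Team Blue', 2);
    INSERT INTO rosters VALUES (1, 7, 1), (2, 7, 2);
""")

# gamerinfo -> rosters -> rosterinfo -> gameinfo -> platforminfo
rows = db.execute("""
    SELECT gamerinfo.tag, rosterinfo.name, gameinfo.name, platforminfo.abbreviation
    FROM gamerinfo
    INNER JOIN rosters      ON rosters.gamerinfoid = gamerinfo.id
    INNER JOIN rosterinfo   ON rosterinfo.id = rosters.rosterinfoid
    INNER JOIN gameinfo     ON gameinfo.id = rosterinfo.gameinfoid
    LEFT JOIN  platforminfo ON platforminfo.id = gameinfo.platforminfoid
    WHERE gamerinfo.id = ?
    ORDER BY rosterinfo.id""", (7,)).fetchall()
for row in rows:
    print(row)
# ('ace', 'Team Red', 'Game One', 'P1')
# ('ace', 'Team Blue', 'Game Two', 'P2')
```

Using INNER JOINs along the roster chain means a gamer who is on no roster returns no rows; switching them to LEFT JOINs would keep the gamer's own row with NULLs instead.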