MySQL Database - Performance Design - mysql

I'm currently redesigning a heavily loaded website, and I would appreciate any opinions on a specific database design issue.
The concept is to keep in the db a number of products (500K of them).
Every product can have a number of dynamic properties (around 1K), and every property has a number of predefined but dynamic values (let's say 10 on average for every property, so around 10K).
At this point of time this is the simplified db structure:
Products (Products Table)
+--------+--------------+
| ProdID | Product Name |
+--------+--------------+
| 1 | T-Shirt XYZ |
+--------+--------------+
| 2 | Dress ABC |
+--------+--------------+
| ... | ... |
+--------+--------------+
| 500000 | Something |
+--------+--------------+
Properties Definition (Props Table) (it holds the Property Types)
+--------+--------------+
| PropID | Property Name|
+--------+--------------+
| 1 | color |
+--------+--------------+
| 2 | size |
+--------+--------------+
| ... | ... |
+--------+--------------+
| 100 | Some Prop |
+--------+--------------+
Properties Values Definition (Values Table)
+-----------+--------+-------+
| PropValID | PropID | Value |
+-----------+--------+-------+
| 1 | 1 | red |
+-----------+--------+-------+
| 2 | 1 | blue |
+-----------+--------+-------+
| 3 | 2 | m |
+-----------+--------+-------+
| 4 | 2 | xl |
+-----------+--------+-------+
| 5 | 2 | xxl |
+-----------+--------+-------+
| ... | ... | ... |
+-----------+--------+-------+
| 1000 | 100 | xyz |
+-----------+--------+-------+
This way we can add any number of properties and values in any product.
The table below holds this info.
Product Properties & Values (ProdPropVal Table)
+--------+--------+--------+-----------+
| InfoID | ProdID | PropID | PropValID |
+--------+--------+--------+-----------+
| 1 | 1 | 1 | 1 |
+--------+--------+--------+-----------+
| 2 | 1 | 2 | 3 |
+--------+--------+--------+-----------+
| 3 | 2 | 1 | 2 |
+--------+--------+--------+-----------+
| 4 | 2 | 2 | 5 |
+--------+--------+--------+-----------+
| ... | ... | ... | |
+--------+--------+--------+-----------+
In the example above we know that "T-Shirt XYZ" is red and its size is medium.
And now the tricky part...
If we want to find all products that share a common set of property values (e.g. all products of blue color and medium size), which is the best approach?
My ideas:
Search the ProdPropVal table once for each PropValID and compare the results in code. This can be fine-tuned by starting from the rarest PropValIDs and limiting ProdIDs using a WHERE ProdID IN (previous IDs) in the next queries.
Use an INNER JOIN on the ProdPropVal table for each PropValID wanted. Something like:
SELECT ppv1.ProdID
FROM ProdPropVal ppv1
INNER JOIN ProdPropVal ppv2 ON ppv1.ProdID = ppv2.ProdID
INNER JOIN ProdPropVal ppv3 ON ppv1.ProdID = ppv3.ProdID
INNER JOIN ProdPropVal ppv4 ON ppv1.ProdID = ppv4.ProdID
WHERE ppv1.PropValID = 10
AND ppv2.PropValID = 20
AND ppv3.PropValID = 30
AND ppv4.PropValID = 150
These are my ideas so far. The ProdPropVal table has several million rows, which doesn't leave any room for error.
Any suggestion is most welcome!

To find all products with blue colour and medium size I would do this:
SELECT ProdID
FROM ProdPropVal
WHERE (PropID = 1 AND PropValID = 2)
OR (PropID = 2 AND PropValID = 3)
GROUP BY ProdID
HAVING COUNT(*) = 2
Better still, if PropValID is unique in the Values table, you could remove the PropID column from the ProdPropVal table and simplify the query to this:
SELECT ProdID
FROM ProdPropVal
WHERE PropValID IN (2, 3)
GROUP BY ProdID
HAVING COUNT(*) = 2
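The simplified query above can be tried end to end with a small harness. This is a minimal sketch, assuming an in-memory SQLite database as a stand-in for MySQL, a (ProdID, PropValID) primary key so that COUNT(*) cannot be inflated by duplicate rows, and a hypothetical third product and index name that are not part of the original post.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProdPropVal (
        ProdID    INTEGER NOT NULL,
        PropValID INTEGER NOT NULL,
        PRIMARY KEY (ProdID, PropValID)   -- one row per product/value pair
    );
    -- Hypothetical index so a lookup by value does not scan the whole table.
    CREATE INDEX idx_val_prod ON ProdPropVal (PropValID, ProdID);
    -- Products 1 and 2 follow the question; product 3 is made up (blue, m).
    INSERT INTO ProdPropVal VALUES (1, 1), (1, 3), (2, 2), (2, 5), (3, 2), (3, 3);
""")


def products_with_all(conn, prop_val_ids):
    """Return the ProdIDs that carry every PropValID in prop_val_ids."""
    wanted = sorted(set(prop_val_ids))
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT ProdID
        FROM ProdPropVal
        WHERE PropValID IN ({placeholders})
        GROUP BY ProdID
        HAVING COUNT(*) = ?
        ORDER BY ProdID
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


# PropValID 2 = blue, PropValID 3 = m
blue_medium = products_with_all(conn, [2, 3])
```

The HAVING count is derived from the number of distinct requested values, so adding a filter only lengthens the IN list instead of adding another join.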

Related

MySQL table relationship and how to query a Key/value table

I have the following table structure:
Product (id, name, ...)
+-----+------------+
| id | name |
+-----+------------+
| 1 | Product #1 |
| 2 | Product #2 |
| 3 | Product #3 |
| 4 | Product #4 |
+-----+------------+
Attribute (id, title, ...)
+-----+------------+
| id | title |
+-----+------------+
| 1 | shape |
| 2 | colour |
| 3 | height |
| 4 | weight |
+-----+------------+
Option (id, title ... )
+-----+------------+
| id | title |
+-----+------------+
| 1 | round |
| 2 | square |
| 3 | oval |
| 4 | red |
| 5 | blue |
| 6 | green |
| 7 | tall |
| 8 | short |
| 9 | heavy |
| 10 | light |
+-----+------------+
and a fourth one (ProductAttribute - id, product_id, attribute_id, option_id), hoping to get "all the red round products which are also tall and heavy":
+----+------------+-----------+--------+
| id | product    | attribute | option |
+----+------------+-----------+--------+
| 1  | Product #1 | shape     | round  |
| 2  | Product #2 | shape     | oval   |
| 3  | Product #3 | shape     | round  |
| 4  | Product #4 | shape     | square |
| 5  | Product #1 | color     | green  |
| 6  | Product #2 | color     | red    |
| 7  | Product #3 | height    | tall   |
| 8  | Product #4 | height    | short  |
| 9  | Product #2 | weight    | heavy  |
| 10 | Product #1 | weight    | light  |
+----+------------+-----------+--------+
I'm far from being an SQL master, and maybe my idea can't work.
Edit:
Q1. The question is how do I achieve that? Getting all the red, tall, heavy products for instance.
The following queries don't achieve my purpose:
1:
SELECT ProductAttributes.product_id, ProductAttributes.id FROM ProductAttributes
WHERE (ProductAttributes.attribute_id = 1 AND ProductAttributes.option_id = 1)
AND (ProductAttributes.attribute_id = 3 AND ProductAttributes.option_id = 4);
2:
SELECT DISTINCT ProductAttributes.product_id, ProductAttributes.id FROM ProductAttributes
WHERE (ProductAttributes.attribute_id = 1 AND ProductAttributes.option_id = 1)
OR (ProductAttributes.attribute_id = 3 AND ProductAttributes.option_id = 4);
Note: I'm purposely using only 2 conditions in my query; the real one has many more.
Key/value tables are a nuisance. So avoid them, if you can. You'd have these tables then:
table shapes
+--------+
| shape |
+--------+
| round |
| oval |
| square |
+--------+
table colors
+--------+
| color |
+--------+
| green |
| red |
+--------+
table heights
+--------+
| height |
+--------+
| tall |
| short |
+--------+
table weights
+--------+
| weight |
+--------+
| heavy |
| light |
+--------+
table products
+-------------+--------------+--------+--------+--------+--------+
| product_no | product name | shape | color | height | weight |
+-------------+--------------+--------+--------+--------+--------+
| 14214 | Product #1 | round | red | tall | heavy |
| 22312 | Product #2 | oval | | short | heavy |
| 35757 | Product #3 | square | green | tall | heavy |
| 42468 | Product #4 | | red | short | light |
+-------------+--------------+--------+--------+--------+--------+
The query
select *
from products
where shape = 'round'
and color = 'red'
and height = 'tall'
and weight = 'heavy';
You can do the same with IDs by the way. So all lookup tables would get an ID (round = 1, oval = 2, ... green = 1, red = 2, ...) and the product table would no longer contain the words, but the IDs. The query would then be:
select *
from products
where shape_id = (select id from shapes where shape = 'round')
and color_id = (select id from colors where color = 'red')
and height_id = (select id from heights where height = 'tall')
and weight_id = (select id from weights where weight = 'heavy');
So you want to select based on the option in the ProductAttribute table.
A better way to store the data is to use a unique ID (primary key) value in the fourth column; then you can do this:
SELECT * FROM ProductAttribute as attr
INNER JOIN Product as product ON product.id=attr.product_id
INNER JOIN Attribute as attr2 ON attr2.id=attr.attribute_id
WHERE attr.option = 'round' OR attr.option = 'red'
I hope this helps you!
For the key/value approach I'd use composite keys to improve consistency:
attribute (attribute_no, title), PK = attribute_no
+--------------+------------+
| attribute_no | title |
+--------------+------------+
| 1 | shape |
| 2 | colour |
| ... | ... |
+--------------+------------+
attribute_option (attribute_no, option_no, value), PK = attribute_no, option_no
+--------------+-----------+------------+
| attribute_no | option_no | value |
+--------------+-----------+------------+
| 1 | 1 | round |
| 1 | 2 | square |
| 2 | 1 | green |
| 2 | 2 | red |
| ... | ... | ... |
+--------------+-----------+------------+
product (product_no, product_name, ...), PK = product_no
+------------+--------------+
| product_no | product_name |
+------------+--------------+
| 7352871 | Product #1 |
| 8956443 | Product #2 |
| ... | ... |
+------------+--------------+
product_attributes (product_no, attribute_no, option_no), PK = product_no, attribute_no
+------------+--------------+-----------+
| product_no | attribute_no | option_no |
+------------+--------------+-----------+
| 7352871 | 1 | 1 |
| 7352871 | 2 | 1 |
| 8956443 | 1 | 2 |
| 8956443 | 2 | 1 |
+------------+--------------+-----------+
(And you'd want an index on attribute_no + option_no for this table.)
The product_attributes primary key guarantees that each product gets only one value per attribute. This is good for height, weight, etc. If you want multiple colors etc. for a product, however, you need a product_attributes table that includes option_no in the primary key. You may end up with separate tables for single-valued and multi-valued attributes. Maybe later you'll even want to introduce product groups with optional and obligatory attributes (a freezer has an energy class, a t-shirt doesn't). So this whole concept may grow, but the tables above should give you an idea of how best to approach this.
A query for all the red round products which are also tall and heavy:
select *
from product
where product_no in
(
select product_no
from product_attributes
where (attribute_no, option_no) =
(
select ao.attribute_no, ao.option_no
from attribute_option ao
join attribute a on a.attribute_no = ao.attribute_no
where a.title = 'colour'
and ao.value = 'red'
)
)
and product_no in
(
select product_no
from product_attributes
where (attribute_no, option_no) =
(
select ao.attribute_no, ao.option_no
from attribute_option ao
join attribute a on a.attribute_no = ao.attribute_no
where a.title = 'shape'
and ao.value = 'round'
)
)
and product_no in (...)
and product_no in (...);
Or shorter with aggregation:
select *
from product
where product_no in
(
select pa.product_no
from product_attributes pa
join attribute a on a.attribute_no = pa.attribute_no
join attribute_option ao on ao.attribute_no = pa.attribute_no
and ao.option_no = pa.option_no
group by pa.product_no
having sum(a.title = 'colour' and ao.value = 'red') > 0
and sum(a.title = 'shape' and ao.value = 'round') > 0
and sum(a.title = 'height' and ao.value = 'tall') > 0
and sum(a.title = 'weight' and ao.value = 'heavy') > 0
)
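To make the composite-key design and the aggregation query concrete, here is a minimal sketch, assuming an in-memory SQLite database in place of MySQL. The index name and the sample rows in product_attributes are hypothetical (chosen so that exactly one product is both red and round), and only two of the four filters are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
    CREATE TABLE attribute (
        attribute_no INTEGER PRIMARY KEY,
        title        TEXT NOT NULL UNIQUE
    );
    CREATE TABLE attribute_option (
        attribute_no INTEGER NOT NULL REFERENCES attribute (attribute_no),
        option_no    INTEGER NOT NULL,
        value        TEXT NOT NULL,
        PRIMARY KEY (attribute_no, option_no)
    );
    CREATE TABLE product (
        product_no   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL
    );
    CREATE TABLE product_attributes (
        product_no   INTEGER NOT NULL REFERENCES product (product_no),
        attribute_no INTEGER NOT NULL,
        option_no    INTEGER NOT NULL,
        PRIMARY KEY (product_no, attribute_no),   -- one value per attribute
        FOREIGN KEY (attribute_no, option_no)
            REFERENCES attribute_option (attribute_no, option_no)
    );
    -- Hypothetical name for the suggested index on attribute_no + option_no.
    CREATE INDEX idx_pa_option ON product_attributes (attribute_no, option_no);

    INSERT INTO attribute VALUES (1, 'shape'), (2, 'colour');
    INSERT INTO attribute_option VALUES
        (1, 1, 'round'), (1, 2, 'square'), (2, 1, 'green'), (2, 2, 'red');
    INSERT INTO product VALUES (7352871, 'Product #1'), (8956443, 'Product #2');
    -- Made-up assignments for the demo.
    INSERT INTO product_attributes VALUES
        (7352871, 1, 1), (7352871, 2, 2),   -- round, red
        (8956443, 1, 2), (8956443, 2, 2);   -- square, red
""")

red_round = conn.execute("""
    SELECT p.product_no, p.product_name
    FROM product p
    WHERE p.product_no IN (
        SELECT pa.product_no
        FROM product_attributes pa
        JOIN attribute a ON a.attribute_no = pa.attribute_no
        JOIN attribute_option ao ON ao.attribute_no = pa.attribute_no
                                AND ao.option_no = pa.option_no
        GROUP BY pa.product_no
        -- A comparison evaluates to 0 or 1, so SUM counts matching rows.
        HAVING SUM(a.title = 'colour' AND ao.value = 'red') > 0
           AND SUM(a.title = 'shape' AND ao.value = 'round') > 0
    )
""").fetchall()
```

Each additional filter becomes one more HAVING term, so the number of joins stays fixed no matter how many attributes are tested.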
After searching the web for "mysql key value table" (thank you @Thorsten Kettner for the keywords, as I lacked the terminology), I've ended up with something like:
SELECT Product.id FROM Product
INNER JOIN ProductAttributes PA_1 ON
Product.id = PA_1.product_id
INNER JOIN ProductAttributes PA_2 ON
Product.id = PA_2.product_id
WHERE
(PA_1.attribute_id = 1 and PA_1.option_id = 1)
AND
(PA_2.attribute_id = 3 and PA_2.option_id = 4);
Basically, whenever a new attribute is used in the query, another INNER JOIN is needed.
In terms of performance, this will cause a rather noticeable hit.
According to this and this, a key/value table should not be used for filtering, but at this point I have no choice, so it will be up to the caching server to save the day.
I've based my answer on this question (no need for GROUP BY in my case, as I don't use aggregate functions): Filtering and Grouping data from table with key/value pairs
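Since one join is needed per filter, the statement is usually generated rather than written by hand. The following is a rough sketch of such a generator, assuming an in-memory SQLite database instead of MySQL. The helper name build_filter_query and the sample rows are hypothetical, and the filter conditions sit in the ON clause, which is equivalent to the WHERE form for inner joins.

```python
import sqlite3


def build_filter_query(filters):
    """Build one INNER JOIN on ProductAttributes per (attribute_id, option_id)."""
    joins, params = [], []
    for i, (attribute_id, option_id) in enumerate(filters, start=1):
        alias = f"PA_{i}"  # generated alias, never user input
        joins.append(
            f"INNER JOIN ProductAttributes {alias} "
            f"ON {alias}.product_id = Product.id "
            f"AND {alias}.attribute_id = ? AND {alias}.option_id = ?"
        )
        params += [attribute_id, option_id]
    sql = " ".join(["SELECT Product.id FROM Product", *joins, "ORDER BY Product.id"])
    return sql, params


conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
    CREATE TABLE Product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ProductAttributes (
        product_id INTEGER, attribute_id INTEGER, option_id INTEGER,
        PRIMARY KEY (product_id, attribute_id, option_id)
    );
    INSERT INTO Product VALUES
        (1, 'Product #1'), (2, 'Product #2'), (3, 'Product #3');
    -- Made-up rows: attribute 1 = shape, 2 = colour; option 1 = round,
    -- 3 = oval, 4 = red, 5 = blue.
    INSERT INTO ProductAttributes VALUES
        (1, 1, 1), (1, 2, 4),   -- round, red
        (2, 1, 3), (2, 2, 4),   -- oval, red
        (3, 1, 1), (3, 2, 5);   -- round, blue
""")

# shape = round AND colour = red
sql, params = build_filter_query([(1, 1), (2, 4)])
round_red_ids = [row[0] for row in conn.execute(sql, params)]
```

The values are passed as bound parameters, so only the aliases are interpolated into the SQL text.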

Find unique/duplicated rows from has and belongs to many association

I have following DB structure:
Table cars:
+----+-----------------------+
| id | few other columns.... |
+----+-----------------------+
| 1 | ... |
| 2 | ... |
| 3 | ... |
+----+-----------------------+
Table properties:
+----+-------+
| id | name |
+----+-------+
| 1 | title |
| 2 | type |
| 3 | brand |
| 4 | color |
+----+-------+
Table cars_properties:
+----+--------+-------------+------------+
| id | car_id | property_id | txt |
+----+--------+-------------+------------+
| 1 | 1 | 1 | Volvo V70 |
| 2 | 1 | 2 | personal |
| 3 | 1 | 3 | Volvo |
| 4 | 1 | 4 | white |
| 5 | 2 | 1 | Volvo VV |
| 6 | 2 | 2 | personal |
| 7 | 2 | 3 | Volvo |
| 8 | 2 | 4 | blue |
| 9 | 3 | 1 | Volvo XXL |
| 10 | 3 | 2 | truck |
| 11 | 3 | 3 | Volvo |
| 12 | 3 | 4 | white |
+----+--------+-------------+------------+
I would like to get all cars that have unique/duplicated values in one or many properties. Currently I'm using this SQL pattern to get duplicates for car type and brand:
SELECT cars.id FROM cars
LEFT JOIN cars_properties AS cp_0 ON cp_0.car_id = cars.id AND cp_0.property_id = 2 # => type
LEFT JOIN cars_properties AS cp_1 ON cp_1.car_id = cars.id AND cp_1.property_id = 3 # => brand
INNER JOIN (
SELECT cp_0.txt AS type_txt, cp_1.txt AS brand_txt FROM cars
LEFT JOIN cars_properties AS cp_0 ON cp_0.car_id = cars.id AND cp_0.property_id = 2
LEFT JOIN cars_properties AS cp_1 ON cp_1.car_id = cars.id AND cp_1.property_id = 3
GROUP BY cp_0.txt, cp_1.txt
HAVING COUNT(cars.id) > 1
) dupes ON cp_0.txt=dupes.type_txt AND cp_1.txt=dupes.brand_txt;
And expected result is:
+----+
| id |
+----+
| 1 |
| 2 |
+----+
Explanation: The cars with id = 1 and id = 2 each have a type and brand combination that is present in more than one car.
For unique cars, I just change the condition to HAVING COUNT(cars.id) = 1, which finds all rows where the combination of properties is present in only one car.
It works fine, but it's extremely slow with more than 2 properties I want to check.
I cannot change the DB structure, and I'm not sure how to optimize the query, or if there are better ways of achieving this.
It feels like I would need to implement a counter table, where each property id and value (txt) would also store the corresponding number of occurrences in cars, and update this counter on every insert/update/delete... But I still hope there is some better SQL that could help. Do you know of any? Any advice is greatly appreciated, thanks!
PS: I tried to create a fiddle for it, but after building the schema I cannot run any SQL on it. To quickly set up the DB with data, you can check SQL Fiddle
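One direction that might be worth measuring, offered as an untested-at-scale sketch rather than a known fix: pivot the checked properties into one row per car with conditional aggregation, so each extra property adds a CASE expression instead of another LEFT JOIN. The example assumes an in-memory SQLite database in place of MySQL, and the helper name cars_by_combo_count is hypothetical. Cars that lack one of the checked properties drop out of the result, as they do in the original query's equality join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
    CREATE TABLE cars_properties (
        id INTEGER PRIMARY KEY, car_id INTEGER, property_id INTEGER, txt TEXT
    );
    INSERT INTO cars_properties (car_id, property_id, txt) VALUES
        (1, 1, 'Volvo V70'), (1, 2, 'personal'), (1, 3, 'Volvo'), (1, 4, 'white'),
        (2, 1, 'Volvo VV'),  (2, 2, 'personal'), (2, 3, 'Volvo'), (2, 4, 'blue'),
        (3, 1, 'Volvo XXL'), (3, 2, 'truck'),    (3, 3, 'Volvo'), (3, 4, 'white');
""")

# One row per car, with type (property 2) and brand (property 3) as columns.
PIVOT = """
    SELECT car_id,
           MAX(CASE WHEN property_id = 2 THEN txt END) AS type_txt,
           MAX(CASE WHEN property_id = 3 THEN txt END) AS brand_txt
    FROM cars_properties
    WHERE property_id IN (2, 3)
    GROUP BY car_id
"""


def cars_by_combo_count(conn, comparison):
    """Use '> 1' for duplicated combinations or '= 1' for unique ones."""
    if comparison not in ("> 1", "= 1"):
        raise ValueError("unsupported comparison")
    sql = f"""
        SELECT p.car_id
        FROM ({PIVOT}) AS p
        JOIN (
            SELECT type_txt, brand_txt
            FROM ({PIVOT}) AS q
            GROUP BY type_txt, brand_txt
            HAVING COUNT(*) {comparison}
        ) AS combos
          ON combos.type_txt = p.type_txt AND combos.brand_txt = p.brand_txt
        ORDER BY p.car_id
    """
    return [row[0] for row in conn.execute(sql)]


duplicated = cars_by_combo_count(conn, "> 1")
unique = cars_by_combo_count(conn, "= 1")
```

Whether this is faster on the real data would depend on the available indexes, for example one covering (property_id, car_id, txt), which is an assumption here.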

MYSQL pivot table create

I've tried various tutorials and examples on how to make a dynamic pivot table, and I can't make it work. Every time I get some SQL error.
Can someone help me make a dynamic pivot table? I am new to this, and I guess filtering by attributes will be easy once I have a pivot table. Here are my tables:
Table 1 name: table_items
------------------------------
| item_id | title | ... |
------------------------------
| 1 | item 1 | ... |
------------------------------
| 2 | item 2 | ... |
------------------------------
Table 2 name: table_item_options
-----------------------------
| option_id | option_name |
-----------------------------
| 1 | Color |
-----------------------------
| 2 | Size |
-----------------------------
Table 3 name: table_attributes
--------------------------------------------------
| attribute_id | option_id | attribute_name |
--------------------------------------------------
| 1 | 1 | Blue |
--------------------------------------------------
| 2 | 1 | Red |
--------------------------------------------------
| 3 | 2 | XL |
--------------------------------------------------
| 4 | 1 | Green |
--------------------------------------------------
| 5 | 2 | L |
--------------------------------------------------
Table 4 name: table_item_attributes
------------------------------------------------------
| assigned_id | item_id | option_id | attribute_id |
------------------------------------------------------
| 1 | 1 | 1 | 1 |
------------------------------------------------------
| 2 | 1 | 1 | 2 |
------------------------------------------------------
Any help is much appreciated
What I want is to make a product filter for items by their attributes. As I understand it, the best option is to make a pivot table, and to avoid the duplicate results caused by joining tables.
All first columns are auto-increment primary keys.
EDIT:
At this point, using an inner join, I create one big table, and in the SELECT statement I put " WHERE attribute_id = '2' " (which means select all Red), but like this I can use only one filter.
So my problem is that I can't use more than one filter on the attribute_id column, and I want to filter by more attributes (another color, another size, city, etc.).
So how can I do this using a pivot table? My intention is to dynamically create columns named after option_name (from table_item_options) and populate them with attribute_id or attribute_name, so I can use more filters,
for example:
-----------------------------------------------------
| item_id | ... | color | size | City | etc.. |
------------------------------------------------------
| 1 | ... | 1(or Red) | L | A | ... |
------------------------------------------------------
| 1 | ... | 2(or Blue)| XL | B | ... |
------------------------------------------------------
In a table like this, if I select * ... where color = red, I will be able to filter the table by another column as well, e.g. where city = a.
I hope it is clearer now.
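The pivot described above can be generated from the rows of table_item_options. The following is a rough sketch, assuming an in-memory SQLite database in place of MySQL, a hypothetical helper build_pivot_sql, made-up rows in table_item_attributes, and at most one attribute per option for each item (with several, MAX would keep only one of them). Identifiers are quoted with double quotes as SQLite expects; MySQL would normally use backticks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
    CREATE TABLE table_items (item_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE table_item_options (
        option_id INTEGER PRIMARY KEY, option_name TEXT
    );
    CREATE TABLE table_attributes (
        attribute_id INTEGER PRIMARY KEY, option_id INTEGER, attribute_name TEXT
    );
    CREATE TABLE table_item_attributes (
        assigned_id INTEGER PRIMARY KEY, item_id INTEGER,
        option_id INTEGER, attribute_id INTEGER
    );
    INSERT INTO table_items VALUES (1, 'item 1'), (2, 'item 2');
    INSERT INTO table_item_options VALUES (1, 'Color'), (2, 'Size');
    INSERT INTO table_attributes VALUES
        (1, 1, 'Blue'), (2, 1, 'Red'), (3, 2, 'XL'), (4, 1, 'Green'), (5, 2, 'L');
    -- Made-up assignments: item 1 is Red and L, item 2 is Red and XL.
    INSERT INTO table_item_attributes VALUES
        (1, 1, 1, 2), (2, 1, 2, 5), (3, 2, 1, 2), (4, 2, 2, 3);
""")


def build_pivot_sql(conn):
    """Generate one MAX(CASE ...) column per row of table_item_options."""
    options = conn.execute(
        "SELECT option_id, option_name FROM table_item_options ORDER BY option_id"
    ).fetchall()
    columns = []
    for option_id, name in options:
        # Quote the option name as an identifier, doubling embedded quotes.
        quoted = '"' + name.replace('"', '""') + '"'
        columns.append(
            f"MAX(CASE WHEN ia.option_id = {int(option_id)} "
            f"THEN a.attribute_name END) AS {quoted}"
        )
    return f"""
        SELECT ia.item_id AS item_id, {", ".join(columns)}
        FROM table_item_attributes ia
        JOIN table_attributes a ON a.attribute_id = ia.attribute_id
        GROUP BY ia.item_id
    """


pivot_sql = build_pivot_sql(conn)
# Several filters can now be combined on the generated columns.
red_and_large = conn.execute(
    f'SELECT item_id FROM ({pivot_sql}) AS pv WHERE "Color" = ? AND "Size" = ?',
    ("Red", "L"),
).fetchall()
```

New rows in table_item_options show up as new columns the next time the statement is generated, which is the dynamic part of the pivot.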

Convert columns into rows

I have a table:
+--------------+-------+--------+----------+
| attribute_id | color | brand | category |
+--------------+-------+--------+----------+
| 1 | red | honda | cars |
| 2 | blue | bmw | cars |
| 3 | pink | skonda | vans |
+--------------+-------+--------+----------+
I would like to convert it to the following:
+--------------+---------+
| attribute_id | keyword |
+--------------+---------+
| 1 | red |
| 2 | blue |
| 3 | pink |
| 1 | honda |
| 2 | bmw |
| 3 | skonda |
| 1 | cars |
| 2 | cars |
| 3 | vans |
+--------------+---------+
The only way I can think of is to use UNIONs like this:
SELECT attribute_id, color from attributes
UNION ALL
SELECT attribute_id, brand from attributes
UNION ALL
SELECT attribute_id, category from attributes
The above way is a bit cumbersome, especially since my real use case will need to join multiple tables for each select.
Is there a simpler or less copy/paste way to write this?
A more efficient query (at least for large tables) is:
SELECT attribute_id,
(case when n = 1 then color
when n = 2 then brand
when n = 3 then category
end) as keyword
from attributes a cross join
(select 1 as n union all select 2 union all select 3) n;
The reason this is better than the union all query is performance. The union all will scan the original table three times. This will scan the original table once (and then loop through n). For a large table this could be a significant difference in performance.
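The cross join can be checked against the sample data with a short script. This is a minimal sketch, assuming an in-memory SQLite database in place of MySQL; the derived-table alias nums and the ORDER BY are additions made only so that the output order matches the desired table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
    CREATE TABLE attributes (
        attribute_id INTEGER PRIMARY KEY, color TEXT, brand TEXT, category TEXT
    );
    INSERT INTO attributes VALUES
        (1, 'red',  'honda',  'cars'),
        (2, 'blue', 'bmw',    'cars'),
        (3, 'pink', 'skonda', 'vans');
""")

# Each source row is paired with n = 1, 2, 3; CASE picks the column for n.
keywords = conn.execute("""
    SELECT a.attribute_id,
           CASE nums.n WHEN 1 THEN a.color
                       WHEN 2 THEN a.brand
                       WHEN 3 THEN a.category
           END AS keyword
    FROM attributes a
    CROSS JOIN (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3) AS nums
    ORDER BY nums.n, a.attribute_id
""").fetchall()
```

Three source rows times three values of n gives the nine rows of the target layout.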

MySQL: Select multiple rows containing values from one column

I'd like to find the car_id's of the cars that have 'FORD' AND 'SILVER' AND the user input value of '200' in the value column:
table_cars
+----+--------+----------+-----------+
| id | car_id | name | value |
+----+--------+----------+-----------+
| 1 | 1 | MAKE | FORD |
| 2 | 1 | CARLINE | FIESTA |
| 3 | 1 | COLOR | SILVER |
| 4 | 1 | TOPSPEED | 210KM/H |
| 5 | 2 | MAKE | FORD |
| 6 | 2 | CARLINE | FOCUS |
| 7 | 2 | COLOR | SILVER |
| 8 | 2 | TOPSPEED | 200KM/H |
| 9 | 3 | MAKE | HOLDEN |
| 10 | 3 | CARLINE | ASTRA |
| 11 | 3 | COLOR | WHITE |
| 12 | 3 | TOPSPEED | 212KM/H |
+----+--------+----------+-----------+
Which in this case should return only one car_id: car_id = 2.
What would be the way to go to create the SQL query for this?
What you have is a properties table. When you want to test multiple properties at once you need to join the table to itself:
SELECT c0.car_id
FROM table_cars AS c0
JOIN table_cars AS c1 ON c1.car_id=c0.car_id
JOIN table_cars AS c2 ON c2.car_id=c1.car_id
WHERE c0.name='MAKE' AND c0.value='FORD'
AND c1.name='COLOR' AND c1.value='SILVER'
AND c2.name='TOPSPEED' AND c2.value='200KM/H'
Having the surrogate id present in a properties table is questionable. It doesn't seem to be doing anything; each property isn't an entity of its own. Unless the id is required by some other element, I'd get rid of it and make car_id, name the primary key (a composite primary key).
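The self-join can be run against the question's data as follows. This is a minimal sketch, assuming an in-memory SQLite database in place of MySQL, the composite (car_id, name) primary key suggested above instead of the surrogate id, and that the user's input '200' is turned into the stored form by appending 'KM/H'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
    CREATE TABLE table_cars (
        car_id INTEGER NOT NULL,
        name   TEXT NOT NULL,
        value  TEXT NOT NULL,
        PRIMARY KEY (car_id, name)   -- replaces the surrogate id
    );
    INSERT INTO table_cars VALUES
        (1, 'MAKE', 'FORD'),   (1, 'CARLINE', 'FIESTA'),
        (1, 'COLOR', 'SILVER'), (1, 'TOPSPEED', '210KM/H'),
        (2, 'MAKE', 'FORD'),   (2, 'CARLINE', 'FOCUS'),
        (2, 'COLOR', 'SILVER'), (2, 'TOPSPEED', '200KM/H'),
        (3, 'MAKE', 'HOLDEN'), (3, 'CARLINE', 'ASTRA'),
        (3, 'COLOR', 'WHITE'),  (3, 'TOPSPEED', '212KM/H');
""")

user_speed = "200"  # value typed in by the user

# One alias of table_cars per property being tested.
matches = conn.execute("""
    SELECT c0.car_id
    FROM table_cars AS c0
    JOIN table_cars AS c1 ON c1.car_id = c0.car_id
    JOIN table_cars AS c2 ON c2.car_id = c1.car_id
    WHERE c0.name = 'MAKE'     AND c0.value = ?
      AND c1.name = 'COLOR'    AND c1.value = ?
      AND c2.name = 'TOPSPEED' AND c2.value = ?
""", ("FORD", "SILVER", user_speed + "KM/H")).fetchall()
```

Binding the user's value as a parameter keeps it out of the SQL text.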
I assume that every car needs to have variable parameters, otherwise you wouldn't have gone with a setup like this. It would be much easier if MAKE, CARLINE, COLOR, and TOPSPEED each had their own column.
Using the table you've provided, however, you need to use subqueries. http://dev.mysql.com/doc/refman/5.0/en/subqueries.html
The query should look something like this (untested):
SELECT DISTINCT car_id
FROM table_cars
WHERE car_id IN (SELECT car_id FROM table_cars WHERE name = 'MAKE' AND value = 'FORD')
AND car_id IN (SELECT car_id FROM table_cars WHERE name = 'COLOR' AND value = 'SILVER')
AND car_id IN (SELECT car_id FROM table_cars WHERE name = 'TOPSPEED' AND value = '200KM/H')