I'm building a table that tracks physical product variants (like a T-shirt printed in various colorways), with product_id as both a foreign key and part of the primary key, and variant_id as the rest of the primary key. I would like variant_id to auto-increment per product and be formatted after the product_id: if there is a product with id 123 and I add variants to it, the first should be 123-1, the second 123-2, and so on for each new variant.
What I have now is an auto-incrementing variant_id and product_id, but without that format the data is harder to process.
How can I configure this in the table setup? And what is the conventional way of doing this: auto-incrementing both columns without a format and writing a query to format the result?
Current variants table:
I've created this DBfiddle that first creates a products table and then a variants table. This works; however, the variant_id isn't in the desired format, because it auto-increments on its own instead of taking product_id into account.
In this DBfiddle I also used ENGINE = MyISAM, and I'm not sure it's efficient for this database design.
Here is the current output of a test variants table, in which variant_id 1 should be 1-1 and variant_id 2 should be 3-1 (product 3, first variant). I inserted these rows by specifying the product_id and variant_name.
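One common way to get "123-1", "123-2" is to keep two plain integer columns, product_id and a per-product counter, make the pair the primary key, and build the string only when reading. The sketch below runs the SQL in SQLite through Python's sqlite3 module so it is self-contained; the column name variant_no and the helper add_variant are hypothetical. In MySQL you would build the label with CONCAT(product_id, '-', variant_no) and, under concurrent inserts, protect the MAX()+1 lookup, for example with SELECT ... FOR UPDATE inside a transaction. (MySQL's MyISAM engine can also number rows per group by itself when the AUTO_INCREMENT column is the second column of a multi-column primary key; InnoDB cannot.)

```python
import sqlite3

# In-memory SQLite database; MySQL syntax differs slightly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE variants (
    product_id   INTEGER NOT NULL REFERENCES products(product_id),
    variant_no   INTEGER NOT NULL,   -- 1, 2, 3 ... restarting for each product
    variant_name TEXT NOT NULL,
    PRIMARY KEY (product_id, variant_no)
);
""")

def add_variant(conn, product_id, variant_name):
    """Insert a variant numbered MAX(variant_no) + 1 within its product."""
    with conn:  # commits on success, rolls back if an exception is raised
        (next_no,) = conn.execute(
            "SELECT COALESCE(MAX(variant_no), 0) + 1 FROM variants WHERE product_id = ?",
            (product_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO variants (product_id, variant_no, variant_name) VALUES (?, ?, ?)",
            (product_id, next_no, variant_name),
        )
    return next_no

conn.executemany("INSERT INTO products (product_id, name) VALUES (?, ?)",
                 [(1, "T-shirt A"), (3, "T-shirt B")])
add_variant(conn, 1, "Black")
add_variant(conn, 3, "White")
add_variant(conn, 3, "Red")

# The "product-variant" label is computed at read time, not stored.
rows = conn.execute(
    "SELECT product_id || '-' || variant_no AS variant_code, variant_name "
    "FROM variants ORDER BY product_id, variant_no"
).fetchall()
for code, name in rows:
    print(code, name)
```

Keeping the parts as integers leaves joins and foreign keys cheap, while the formatted code is just a presentation concern.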
Related
My client has given me about 14k URLs of various products, and he wants me to store all the price changes of each product per day. I think this will require an immense amount of DB storage and a lot of optimization, and I've never done this before. I'm using MySQL. Should I store all these price changes per product in a JSON column or as separate rows? Looking for tips on this. Thanks!
JSON columns are not as efficient as normal SQL columns and should be reserved for when you're not sure what data you're going to have. You're pretty sure what data you're going to have.
This is a pretty straightforward two table schema. One table for the product, and one for its price changes.
create table product (
id integer primary key auto_increment,
name varchar(255),
-- ...any other information about the product you might want to store...
url varchar(255) unique -- the unique constraint also indexes url, so a separate index(url) isn't needed
);
Giving the product a primary key shields you from the URL changing, and it reduces the amount of data stored in tables that refer to it: they only have to store the integer primary key, not the whole URL. The URL is indexed for faster searches.
Now that you have a product table, other tables can refer to it, like a table of price changes.
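create table product_price_changes (
product_id integer not null,
price numeric(9,2) not null,
change_time datetime not null,
-- declared as a table constraint: older MySQL versions ignore an inline "references" on a column
foreign key (product_id) references product(id),
index(change_time)
);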
This table stores when the price for a product changes, and what that price is. This is how you attach lists of data to things in SQL. The change_time is indexed for faster searches.
A simple join lets you efficiently see all the changes to a particular product in order.
select price, change_time
from product_price_changes ppc
join product prod on ppc.product_id = prod.id
where prod.url = ?
order by change_time
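Here is a runnable sketch of the two-table design, with the SQL executed by SQLite through Python's sqlite3 module instead of MySQL. A few choices are assumptions for the example: prices are stored as integer cents (to avoid floating-point comparisons in SQLite), the index is on (product_id, change_time) to suit the per-product history query, and record_price is a hypothetical helper meant to be called once per product per day that only inserts a row when the price differs from the last stored one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    id   INTEGER PRIMARY KEY,
    name TEXT,
    url  TEXT UNIQUE                 -- UNIQUE also creates an index on url
);
CREATE TABLE product_price_changes (
    product_id  INTEGER NOT NULL REFERENCES product(id),
    price_cents INTEGER NOT NULL,
    change_time TEXT NOT NULL        -- ISO-8601 date/time string
);
CREATE INDEX ppc_product_time ON product_price_changes (product_id, change_time);
""")

def record_price(conn, product_id, price_cents, seen_at):
    """Insert a row only if the price differs from the latest stored one."""
    last = conn.execute(
        "SELECT price_cents FROM product_price_changes "
        "WHERE product_id = ? ORDER BY change_time DESC LIMIT 1",
        (product_id,),
    ).fetchone()
    if last is not None and last[0] == price_cents:
        return False  # unchanged: nothing to store
    with conn:
        conn.execute(
            "INSERT INTO product_price_changes VALUES (?, ?, ?)",
            (product_id, price_cents, seen_at),
        )
    return True

conn.execute("INSERT INTO product (id, name, url) "
             "VALUES (1, 'Widget', 'https://example.com/widget')")
for day, cents in [("2024-01-01", 1999), ("2024-01-02", 1999), ("2024-01-03", 1799)]:
    record_price(conn, 1, cents, day)

# The join from the answer: all changes for one URL, in order.
history = conn.execute("""
    SELECT price_cents, change_time
    FROM product_price_changes ppc
    JOIN product prod ON ppc.product_id = prod.id
    WHERE prod.url = ?
    ORDER BY change_time
""", ("https://example.com/widget",)).fetchall()
print(history)
```

Three daily checks produce only two rows here, because the second day's price was unchanged.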
We are having technical trouble designing the primary keys for our new data-intensive project.
Please explain which PK design is better for our data-intensive database.
The database is data-intensive and persistent.
At least 3,000 users access it per second.
Please tell us, technically, which type of PK is better for our database; the tables are unlikely to change in the future.
1. INT/BIGINT auto-increment column as PK
2. Composite keys
3. Unique varchar PK
I would go for option 1, using a BIGINT auto-increment column as the PK. The reason is simple: each insert writes to the end of the current page, so inserting new rows is very fast. With a composite key the rows are kept in key order, and unless you insert in the order of the composite key, pages have to be split to make room. For example, imagine this table:
A | B | C
---+---+---
1 | 1 | 4
1 | 4 | 5
5 | 1 | 2
If the primary key is a composite key on (A, B, C) and I want to insert (2, 2, 2), the row has to go here:
A | B | C
---+---+---
1 | 1 | 4
1 | 4 | 5
2 | 2 | 2 <----
5 | 1 | 2
That way the clustered index stays in key order. If the page you are inserting into is already full, MySQL has to split the page, moving some of the data to a new page to make room for the new row. These page splits are quite costly, so unless you know you are inserting sequential data, use an auto-increment column as the clustering key: as long as you don't mess around with the increments, you should never have to split a page.
You can still add a unique index on the columns that would have formed the primary key, to maintain integrity. That index suffers from the same page splits, but because it is narrower than the clustered index, more entries fit on a page and the splits are less frequent.
More or less the same argument applies against a unique varchar column, unless some process ensures that the varchar values are sequential. Generating a sequential varchar is more costly than an auto-increment column, though, and I can see no immediate advantage.
This is not easy to answer.
To start with, using composite keys as primary keys is the straightforward way. IDs come in handy when the database structure changes.
Say you have products in different sizes sold in different countries. The primary key of each table is given after it.
product (product_no, name, supplier_no, ...), PK (product_no)
product_size (product_no, size, ean, measures, ...), PK (product_no, size)
product_country (product_no, country_isocode, translated_name, ...), PK (product_no, country_isocode)
product_size_country (product_no, size, country_isocode, vat, ...), PK (product_no, size, country_isocode)
It is very easy to write data, because you are dealing with natural keys, which are what users work with. The DBMS guarantees data consistency.
Now the same with technical IDs:
product (product_id, product_no, name, supplier_no, ...), PK (product_id)
product_size (product_size_id, size, product_id, ean, measures, ...), PK (product_size_id)
product_country (product_country_id, product_id, country_id, translated_name, ...), PK (product_country_id)
product_size_country (product_size_country_id, product_size_id, country_id, vat, ...), PK (product_size_country_id)
Inserting data now needs an additional step: looking up the IDs. And you must still ensure that product_no is unique, so the unique constraint on product_id doesn't replace the one on product_no, it adds to it. The same goes for product_size, product_country and product_size_country. Moreover, product_size_country may now refer to a product_size and a product_country of different products. The DBMS can no longer guarantee data consistency.
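To make the consistency argument concrete, here is a sketch of the natural-key schema in SQLite (driven from Python's sqlite3 module), with product_size_country carrying composite foreign keys to both product_size and product_country. Column lists are trimmed to the keys plus vat, and the product numbers and countries are made-up sample data. Because both foreign keys share product_no, the DBMS rejects a row that pairs a size of one product with a country that only exists for another product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE product (
    product_no TEXT PRIMARY KEY,
    name       TEXT
);
CREATE TABLE product_size (
    product_no TEXT REFERENCES product(product_no),
    size       TEXT,
    PRIMARY KEY (product_no, size)
);
CREATE TABLE product_country (
    product_no      TEXT REFERENCES product(product_no),
    country_isocode TEXT,
    PRIMARY KEY (product_no, country_isocode)
);
CREATE TABLE product_size_country (
    product_no      TEXT,
    size            TEXT,
    country_isocode TEXT,
    vat             NUMERIC,
    PRIMARY KEY (product_no, size, country_isocode),
    -- both references share product_no, so they must point at the same product
    FOREIGN KEY (product_no, size) REFERENCES product_size (product_no, size),
    FOREIGN KEY (product_no, country_isocode)
        REFERENCES product_country (product_no, country_isocode)
);
INSERT INTO product VALUES ('P1', 'Shirt'), ('P2', 'Shoe');
INSERT INTO product_size VALUES ('P1', 'M'), ('P2', '42');
INSERT INTO product_country VALUES ('P1', 'DE'), ('P2', 'FR');
""")

def try_insert(row):
    try:
        with conn:
            conn.execute("INSERT INTO product_size_country VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

consistent = try_insert(("P1", "M", "DE", 19))    # size and country both belong to P1
inconsistent = try_insert(("P1", "M", "FR", 20))  # FR is only set up for P2
print(consistent)
print(inconsistent)
```

With the ID-based tables there is no shared column that could tie the two references together, which is the loss of guarantee described above.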
However, natural keys have their weakness when the database structure must change. Let's say a new company is introduced into the database, and product numbers are only unique per company. With the ID-based database, you would simply add a company ID to the products table and be done. In the natural-key database, you would have to add the company to all primary keys, which is much more work. (But how often must such changes be made to a database? In many databases, never.)
What else is there to consider? When the database gets big, you might want to partition tables. With natural keys, you could partition your tables by said company, assuming that you will usually want to select data from one company or the other. With IDs, what would you partition the tables by to improve access?
Well, both concepts certainly have pros and cons. As to your third option to create a unique varchar, I see no benefit in this over using integer IDs.
I am creating a site that is sort of e-commerce-ish. I want to give my users the best possible search using specific attributes that differ from product to product. I plan to create one products table storing the basic information shared among products, i.e. Name, Description, Price and a few others. Then I plan to create several "details" tables, say categories_computers with columns Processor, HDD, RAM, etc., and another table, say table_shoes, with columns MATERIAL, SIZE, GENDER, etc.
I am new to MySQL but not to the concept of databases. I don't think I will have a problem storing this data in each table. My issue is with reads. It won't be hard to query a product id, but I think it would be extremely wasteful to query all the details tables to get a product's details, since each product has only one set of details.
So my question is: how can I store a reference to a table in a column, so that a product has, say, ID, Name, Description, Price, Details_Table_ID or something similar, to save on queries? Do tables have unique IDs in MySQL? Or how does the Stack Overflow community suggest I go about this? Thanks.
EDIT
Silly me, I just remembered that every table name is unique, so I can use that. My question therefore becomes: how can I write a query that uses the value of a cell in table A as a reference to a table name?
Don't use separate details tables for each category, use a generic details table that can store any attribute. Its columns would be:
Product_ID INT (FK to Products)
Attribute VARCHAR
Value VARCHAR
The unique key of this table would be (Product_ID, Attribute).
So if Product_ID = 1 is a computer, you would have rows like:
1 Processor Xeon
1 RAM 4GB
1 HDD 1TB
And if Product_ID = 2 is shoes:
2 Material Leather
2 Size 6
2 Gender F
If you're worried about the space used for all those attribute strings, you can add a level of indirection to reduce it. Create another table, Attributes, that contains all the attribute names, and use AttributeID in the Details table. This will slow down some queries because you'll need an additional join, but it could save lots of space.
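A sketch of the generic details table, run in SQLite via Python's sqlite3 module (in MySQL the VARCHAR columns would need lengths). It loads the example rows above and shows the two typical reads: all details of one product, and a search by attribute values, which needs one self-join per additional attribute. The table name details, the extra (attribute, value) index and the search criteria are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE details (
    product_id INTEGER NOT NULL,
    attribute  TEXT NOT NULL,
    value      TEXT NOT NULL,
    PRIMARY KEY (product_id, attribute)  -- one value per attribute per product
);
CREATE INDEX details_attr_value ON details (attribute, value);  -- for searches
""")
conn.executemany("INSERT INTO details VALUES (?, ?, ?)", [
    (1, "Processor", "Xeon"), (1, "RAM", "4GB"), (1, "HDD", "1TB"),
    (2, "Material", "Leather"), (2, "Size", "6"), (2, "Gender", "F"),
])

# All details of one product: a single lookup, no per-category tables needed.
computer = dict(conn.execute(
    "SELECT attribute, value FROM details WHERE product_id = ?", (1,)
).fetchall())

# Search: leather products in size 6 (one self-join per extra attribute).
matches = [row[0] for row in conn.execute("""
    SELECT m.product_id
    FROM details m
    JOIN details s ON s.product_id = m.product_id
    WHERE m.attribute = 'Material' AND m.value = 'Leather'
      AND s.attribute = 'Size' AND s.value = '6'
""")]
print(computer, matches)
```

The self-joins are the price of this flexibility: filtering on many attributes at once gets more expensive than with dedicated columns.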
Think about just having a single ProductDetails table like this:
ProductDetailID (PK)
ProductID (foreign key to your Products table)
DetailType
DetailValue
This way you do not have to create new columns every time you add a new product detail type, and you'll have many ProductDetails rows for each ProductID, which is fine and will query OK. Just be sure to put an index on ProductDetails.ProductID!
Since this is an application, you must be generating the queries anyway, so let's generate them in two steps. I assume you can add a product_type_id column to your Product table that tells you which child table to use. Next, create another table, Product_type, with columns product_type_id and query. This query can be used as the base query for building the final query, e.g.:
Product_type_id | Query
1 | SELECT COMPUTERS.* FROM COMPUTERS JOIN PRODUCT ON COMPUTERS.PRODUCT_ID = PRODUCT.PRODUCT_ID
2 | SELECT SHOES.* FROM SHOES JOIN PRODUCT ON SHOES.PRODUCT_ID = PRODUCT.PRODUCT_ID
Based on the product_id entered by the user, look up its product_type_id and fetch the matching base query from this table, then append your WHERE clause to it.
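Here is a sketch of the two-step lookup in Python with SQLite. One deliberate change from the answer: Product_type stores only the child table's name rather than a full SQL string, and that name is checked against a fixed whitelist before it is spliced into the query, because placeholders can only bind values, not table names, and interpolating unchecked text invites SQL injection. All table names, the child_table column and the ALLOWED_TABLES set are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_type (product_type_id INTEGER PRIMARY KEY, child_table TEXT NOT NULL);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT,
                      product_type_id INTEGER REFERENCES product_type(product_type_id));
CREATE TABLE computers (product_id INTEGER PRIMARY KEY, processor TEXT, ram TEXT);
CREATE TABLE shoes (product_id INTEGER PRIMARY KEY, material TEXT, size TEXT);
INSERT INTO product_type VALUES (1, 'computers'), (2, 'shoes');
INSERT INTO product VALUES (10, 'Workstation', 1), (20, 'Boot', 2);
INSERT INTO computers VALUES (10, 'Xeon', '4GB');
INSERT INTO shoes VALUES (20, 'Leather', '6');
""")

ALLOWED_TABLES = {"computers", "shoes"}  # only these names may be spliced into SQL

def product_details(conn, product_id):
    # Step 1: find which child table holds this product's details.
    row = conn.execute(
        "SELECT t.child_table FROM product p "
        "JOIN product_type t ON t.product_type_id = p.product_type_id "
        "WHERE p.product_id = ?", (product_id,)).fetchone()
    if row is None or row[0] not in ALLOWED_TABLES:
        raise LookupError(f"no details table for product {product_id}")
    table = row[0]
    # Step 2: build the base query for that table and append the WHERE clause.
    sql = (f"SELECT p.name, c.* FROM {table} c "
           "JOIN product p ON c.product_id = p.product_id "
           "WHERE p.product_id = ?")
    cur = conn.execute(sql, (product_id,))
    names = [d[0] for d in cur.description]
    return dict(zip(names, cur.fetchone()))

print(product_details(conn, 20))
```

Only one details table is queried per product, which is what the question was after.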
We have a shopping cart, pictured below. The setup works well, except for one fatal flaw: when you place an order, the order is linked to a product, so if I update the product after you have purchased it, there is no way for me to show you what the product looked like when you bought it (including its price). This means we need versioning.
My current plan is to create a duplicate of the product or variant in the database whenever a new product or variant is created, or an existing one is edited. When a purchase is made, the order is linked to the version, not the product.
This seems rather simple. From what I can see, the only things we don't need to version are the categories (no one cares what categories a product was in), so we need to version:
Products
Variants
The key -> value pairs of attributes for each version
The images
My current thinking is as follows.
note: When a product is created, a default variant is created as well; this cannot be removed.
When a product is created
  Insert the product into the products table.
  Create the default variant.
  Duplicate the product into the products_versions table:
    Replace the current id column with a product_id column
    Add an id column
  Duplicate the variant into the variants_versions table:
    Replace the current id column with a variant_id column
    Add an id column
    Replace the product_id column with a product_version_id column
When a product is edited
  Update the product in the products table.
  Duplicate the product into the products_versions table:
    Replace the current id column with a product_id column
    Add an id column
  Duplicate all product variants into the variants_versions table:
    Replace the current id column with a variant_id column
    Add an id column
    Replace the product_id column with a product_version_id column
  Duplicate all variant_image_links into the variant_Image_link_version table:
    Replace the current variant_id column with a variant_version_id column
When a variant is added
  Add the variant to the variants table.
  Duplicate the product into the products_versions table:
    Replace the current id column with a product_id column
    Add an id column
  Duplicate all product variants into the variants_versions table:
    Replace the current id column with a variant_id column
    Add an id column
    Replace the product_id column with a product_version_id column
When a variant is edited
  Update the variant in the variants table.
  Duplicate the product into the products_versions table:
    Replace the current id column with a product_id column
    Add an id column
  Duplicate all product variants into the variants_versions table:
    Replace the current id column with a variant_id column
    Add an id column
    Replace the product_id column with a product_version_id column
  Duplicate all variant_image_links into the variant_Image_link_version table:
    Replace the current variant_id column with a variant_version_id column
So the final structure looks like this:
Now this all seems great, except that it means a heck of a lot of duplicated data: e.g. if we update a product, we duplicate the variants even though they haven't changed since they were inserted. It also seems like a lot of work.
Is there a better way of doing this?
You can do what ERP (and possibly also payroll) systems do: add start and end date/time columns. Then:
variants and prices are matched to their product through their common dates.
all queries default to running on the current date, and the joins between tables also need to take the overlapping/intersecting date ranges into account, e.g. parent_start_date <= child_start_date AND parent_end_date >= child_end_date.
You would end up with duplicated rows for each price change or variant, but you then don't need to update as many records (like variant ids) when the product price changes.
You need to ensure that valid dates are used. PS: use your system's max date as the end date/time of the most current/recent record.
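A minimal sketch of the start/end-date approach, using SQLite from Python. Each product and variant row is treated as valid from start_date up to, but not including, end_date; the current row gets a far-future end date as suggested above; and an "as of" query filters every joined table by the same date. The table layout, the half-open interval convention, the as_of helper and the dates are assumptions for illustration.

```python
import sqlite3

MAX_DATE = "9999-12-31"  # stands in for "no end yet" on the current row

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
CREATE TABLE product (
    product_id INTEGER, name TEXT, price_cents INTEGER,
    start_date TEXT, end_date TEXT,
    PRIMARY KEY (product_id, start_date)
);
CREATE TABLE variant (
    variant_id INTEGER, product_id INTEGER, colour TEXT,
    start_date TEXT, end_date TEXT,
    PRIMARY KEY (variant_id, start_date)
);
-- The price changed on 2024-03-01; the variant row was never touched.
INSERT INTO product VALUES
    (1, 'T-shirt', 1500, '2024-01-01', '2024-03-01'),
    (1, 'T-shirt', 1800, '2024-03-01', '{MAX_DATE}');
INSERT INTO variant VALUES (7, 1, 'Blue', '2024-01-01', '{MAX_DATE}');
""")

def as_of(conn, variant_id, day):
    """Return (name, colour, price_cents) as they were on the given ISO date."""
    return conn.execute("""
        SELECT p.name, v.colour, p.price_cents
        FROM variant v
        JOIN product p ON p.product_id = v.product_id
        WHERE v.variant_id = ?
          AND v.start_date <= ? AND ? < v.end_date
          AND p.start_date <= ? AND ? < p.end_date
    """, (variant_id, day, day, day, day)).fetchone()

print(as_of(conn, 7, "2024-02-15"))  # e.g. the date an order was placed
print(as_of(conn, 7, "2024-04-01"))
```

ISO-formatted date strings compare correctly as text, which is why plain string comparisons work here. An order only needs to store its date to reproduce what the customer saw.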
Btw, some related questions along the same line:
Ways to implement data versioning in MongoDB
Ways to implement data versioning in PostgreSQL
Ways to implement data versioning in Cassandra
Row versioning in MySQL
Another approach to this would be to never edit or remove your data, only create new data. In SQL terms, the only operations you ever run on your tables are INSERTs and SELECTs.
To accomplish what you want, each table would need the following columns:
version_id - this would be your primary key
id - this would be the thing that holds versions of your object together (e.g. to find all versions of a product, SELECT * FROM products WHERE id = ?)
creation_date
is_active - since you're not deleting anything, you need a flag to (logically) get rid of data
With this, here's what your products table would look like:
CREATE TABLE products (
version_id CHAR(8) NOT NULL PRIMARY KEY,
id INTEGER NOT NULL,
creation_date TIMESTAMP NOT NULL DEFAULT NOW(),
is_active BOOLEAN DEFAULT true,
name VARCHAR(1024) NOT NULL,
price INTEGER NOT NULL
);
CREATE TABLE variants (
version_id CHAR(8) NOT NULL PRIMARY KEY,
id INTEGER NOT NULL,
creation_date TIMESTAMP NOT NULL DEFAULT NOW(),
is_active BOOLEAN DEFAULT true,
product_version_id CHAR(8) NOT NULL,
price INTEGER NOT NULL,
override_price INTEGER NOT NULL,
FOREIGN KEY (product_version_id) REFERENCES products(version_id)
);
Now, to insert into either table
Generate a unique version_id. There are several strategies for this: one is to use a database sequence, or in MySQL an AUTO_INCREMENT column (which would require an integer version_id rather than CHAR(8)).
Generate an id. This id is consistent for all versions of a product.
To update a row in a table, one must insert the entire graph e.g. to update a product, one must insert a new product, and new variants. (There is a lot of room for optimization here, but it's easiest to start with the un-optimized solution.)
For example, to update a product
Generate a unique version_id
Use the same id
Insert new product variants. The variants will be the same as the ones linked to the previous version of the product that you're "updating", except the product_version_id will be different.
This principle extends to all your tables.
To find the most recent version of a product, you need to use the creation_date column to get the product that was most recently created.
This model will use more space, but I think that may be a fair trade-off given its simplicity: there are only INSERTs and SELECTs, and data is never mutated.
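A sketch of the insert-only scheme in SQLite via Python. To keep it short, version_id is an INTEGER PRIMARY KEY that SQLite assigns on insert (the role AUTO_INCREMENT would play in MySQL) instead of CHAR(8), the variants table keeps only price, and update_product and latest_product are hypothetical helpers. An order would store products.version_id, so it keeps pointing at the version that was bought. The latest version is picked by creation_date as described above, with version_id as a tie-breaker, since rows inserted within the same second share a timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    version_id    INTEGER PRIMARY KEY,   -- assigned by SQLite on insert
    id            INTEGER NOT NULL,      -- shared by every version of one product
    creation_date TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    is_active     INTEGER NOT NULL DEFAULT 1,
    name          TEXT NOT NULL,
    price         INTEGER NOT NULL
);
CREATE TABLE variants (
    version_id         INTEGER PRIMARY KEY,
    id                 INTEGER NOT NULL,
    creation_date      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    is_active          INTEGER NOT NULL DEFAULT 1,
    product_version_id INTEGER NOT NULL REFERENCES products(version_id),
    price              INTEGER NOT NULL
);
""")

def latest_product(conn, product_id):
    """Most recent version; version_id breaks ties within the same second."""
    return conn.execute(
        "SELECT version_id, name, price FROM products WHERE id = ? "
        "ORDER BY creation_date DESC, version_id DESC LIMIT 1",
        (product_id,),
    ).fetchone()

def update_product(conn, product_id, name, price):
    """'Update' by inserting a new product version plus copies of its variants."""
    old_version = latest_product(conn, product_id)[0]
    with conn:
        cur = conn.execute(
            "INSERT INTO products (id, name, price) VALUES (?, ?, ?)",
            (product_id, name, price),
        )
        conn.execute(
            "INSERT INTO variants (id, is_active, product_version_id, price) "
            "SELECT id, is_active, ?, price FROM variants WHERE product_version_id = ?",
            (cur.lastrowid, old_version),
        )
    return cur.lastrowid

with conn:
    first = conn.execute(
        "INSERT INTO products (id, name, price) VALUES (1, 'T-shirt', 1500)").lastrowid
    conn.execute(
        "INSERT INTO variants (id, product_version_id, price) VALUES (100, ?, 1500)",
        (first,))

ordered_version = first  # an order placed now would store this version_id
update_product(conn, 1, "T-shirt", 1800)

bought = conn.execute("SELECT price FROM products WHERE version_id = ?",
                      (ordered_version,)).fetchone()
current = latest_product(conn, 1)
variant_versions = conn.execute(
    "SELECT COUNT(*) FROM variants WHERE id = 100").fetchone()[0]
print(bought, current, variant_versions)
```

The old order still sees the original price, while the product now has two versions, each with its own copy of variant 100.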
Maybe a newbie question about foreign keys, but I want to know the answer.
Let's say I have 2 tables:
products
--------
product_id (int)
name (unique) (varchar)
description (text)
vendor (varchar) (foreign key: vendors.name)
AND
vendors
--------
name (varchar)
I know that I should use a vendor_id (int), but this is just an example to help me ask my question.
So if I create the vendor Apple and the product (1, iPhone 4, Description.., Apple), will the varchar "Apple" be stored in both products and vendors, or just in vendors (because of the foreign key)?
Is this a bad DB design?
This concerns what is called "normalization" in databases. In your example, there are a couple of things to consider:
In order for products to have a foreign key to vendors, vendors needs a key. Is name the primary key for vendors? If so, then the foreign key would also be a varchar. In that case, yes, the value "Apple" would be stored in both. (Note that this isn't a very good idea.)
If you add a vendor_id integer column to the vendors table, and it is the primary key for that table, then you can add a vendor_id (or any other name) column to the products table and make it a foreign key to the vendors table. In this case, only that integer would be stored in both tables. This is where the data becomes normalized. A small, simpler data type (integer) links the tables, which contain the actual data which describes the records.
Only that key value is stored in both tables. It's used as a reference to join the tables when selecting data. For example, in order to select a given product and its vendor, you'd do something like this:
SELECT products.name, products.description, vendors.name AS vendor
FROM products INNER JOIN vendors ON products.vendor_id = vendors.vendor_id
WHERE products.product_id = ?id
This would "join" the two tables into a single table (not really, just for the query) and select the record from it.
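A runnable version of the normalized design, in SQLite via Python's sqlite3 module: vendors gets an integer vendor_id, products stores only that integer, and the vendor name is pulled in by the join at query time. The sample rows are made up, and the named placeholder is written :id because that is the syntax sqlite3 accepts (the ?id above is driver-specific). Renaming the vendor afterwards shows that the name lives in one place only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE vendors (
    vendor_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE products (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    description TEXT,
    vendor_id   INTEGER NOT NULL REFERENCES vendors(vendor_id)  -- only the integer is stored here
);
INSERT INTO vendors (vendor_id, name) VALUES (1, 'Apple');
INSERT INTO products VALUES (1, 'iPhone 4', 'Description..', 1);
""")

query = """
    SELECT products.name, products.description, vendors.name AS vendor
    FROM products INNER JOIN vendors ON products.vendor_id = vendors.vendor_id
    WHERE products.product_id = :id
"""
row = conn.execute(query, {"id": 1}).fetchone()

# The vendor name is stored once, so one UPDATE changes it for every product.
conn.execute("UPDATE vendors SET name = 'Apple Inc.' WHERE vendor_id = 1")
renamed = conn.execute(query, {"id": 1}).fetchone()
print(row, renamed)
```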
It will be stored in both. The foreign-key constraint requires that every value in products.vendor appear somewhere in vendors.name.
(By the way, note that MySQL only enforces foreign-key constraints if the storage engine is InnoDB.)
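To see this behaviour, the sketch below keeps the question's original design, with products.vendor referencing vendors.name directly, and runs it in SQLite via Python. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, which plays the part InnoDB plays in MySQL here. The string 'Apple' ends up stored in the product row itself, and a product whose vendor name is missing from vendors is rejected. The second product is made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE vendors (name TEXT PRIMARY KEY);
CREATE TABLE products (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT UNIQUE,
    description TEXT,
    vendor      TEXT REFERENCES vendors(name)  -- the text value itself is stored here
);
INSERT INTO vendors VALUES ('Apple');
INSERT INTO products VALUES (1, 'iPhone 4', 'Description..', 'Apple');
""")

stored = conn.execute("SELECT vendor FROM products WHERE product_id = 1").fetchone()[0]
print(stored)  # the product row holds its own copy of the string

try:
    with conn:
        conn.execute("INSERT INTO products VALUES (2, 'Galaxy', '...', 'Samsung')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # 'Samsung' does not appear in vendors.name
print(rejected)
```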