Basic MySQL versioning?

We have a shopping cart as pictured below. The setup works well, except for one fatal flaw: if you place an order, the order is linked to a product, so if I update the product after you have purchased it, there is no way for me to show you what the product looked like when you bought it (including its price). This means we need versioning.
My plan at present is, whenever a new product or variant is created, or an existing one is edited, to create a duplicate of the product or variant in the database. When a purchase is made, link the order to the version, not the product.
This seems rather simple, except that from what I can see the only things we don't need to version are the categories (as no one cares what categories a product was in). So we need to version:
Products
Variants
The key -> value pairs of attributes for each version
The images
My current thinking is as follows (a rough SQL sketch of the duplication step appears after the list).
Note: when a product is created, a default variant is created as well; this cannot be removed.
When a product is created
    Insert the product into the products table.
    Create the default variant.
    Duplicate the product into the products_versions table:
        Replace the current id column with a product_id column.
        Add an id column.
    Duplicate the variant into the variants_versions table:
        Replace the current id column with a variant_id column.
        Add an id column.
        Replace the product_id column with a product_version_id column.
When a product is edited
    Update the product in the products table.
    Duplicate the product into the products_versions table:
        Replace the current id column with a product_id column.
        Add an id column.
    Duplicate all product variants into the variants_versions table:
        Replace the current id column with a variant_id column.
        Add an id column.
        Replace the product_id column with a product_version_id column.
    Duplicate all variant_image_links into the variant_image_link_versions table:
        Replace the current variant_id column with a variant_version_id column.
When a variant is added
    Add the variant to the variants table.
    Duplicate the product into the products_versions table:
        Replace the current id column with a product_id column.
        Add an id column.
    Duplicate all product variants into the variants_versions table:
        Replace the current id column with a variant_id column.
        Add an id column.
        Replace the product_id column with a product_version_id column.
When a variant is edited
    Update the variant in the variants table.
    Duplicate the product into the products_versions table:
        Replace the current id column with a product_id column.
        Add an id column.
    Duplicate all product variants into the variants_versions table:
        Replace the current id column with a variant_id column.
        Add an id column.
        Replace the product_id column with a product_version_id column.
    Duplicate all variant_image_links into the variant_image_link_versions table:
        Replace the current variant_id column with a variant_version_id column.
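In SQL terms, each "duplicate into the versions table" step could be an INSERT ... SELECT. A minimal sketch, assuming the *_versions tables have their own AUTO_INCREMENT id plus the columns named above (the name, description and price columns are placeholders):

-- Snapshot a product and its variants after an edit (column lists are illustrative only)
INSERT INTO products_versions (product_id, name, description)
SELECT id, name, description FROM products WHERE id = 42;

SET @product_version_id = LAST_INSERT_ID();

INSERT INTO variants_versions (variant_id, product_version_id, price)
SELECT id, @product_version_id, price FROM variants WHERE product_id = 42;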
So the final structure looks like this (see the full-size diagram).
Now this all seems great, except it seems like a heck of a lot of duplicated data; e.g. if we update a product we duplicate the variants even though they have not changed since they were inserted. It also seems like a lot of work.
Is there a better way of doing this?

You can do what ERP (and also possibly Payroll) systems do: Add a Start and End Date/Time. So...
the variants and prices match their product based on the common dates.
all queries default to running on the current date, and the joins between each table also need to take the overlapping/intersecting date ranges into account: parent_start_date <= child_start_date AND parent_end_date >= child_end_date
You would end up with duplicated rows for each price change or variant, but you then don't need to update as many records (like variant ids) when the product price changes.
Need to ensure valid dates are used. PS: Use your system's max date for the End datetime of the most current/recent record.
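A minimal sketch of the point-in-time join this implies (table and column names are assumptions; each versioned table carries its own start_date/end_date pair):

-- Fetch a product and its variants as they looked at a given moment
SELECT p.id, p.name, v.id AS variant_id, v.price
FROM products p
JOIN variants v
  ON v.product_id = p.id
 AND p.start_date <= v.start_date
 AND p.end_date >= v.end_date
WHERE '2012-06-01 00:00:00' BETWEEN p.start_date AND p.end_date;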
Btw, some related questions along the same line:
Ways to implement data versioning in MongoDB
Ways to implement data versioning in PostgreSQL
Ways to implement data versioning in Cassandra
Row versioning in MySQL

Another approach to this would be to never edit or remove your data, only create new data. In SQL terms, the only operations you ever run on your tables are INSERTs and SELECTs.
To accomplish what you want, each table would need the following columns:
version_id - this would be your primary key
id - this would be the thing that holds versions of your object together (e.g. to find all versions of a product, SELECT * FROM products WHERE id = ?)
creation_date
is_active - you're not deleting anything, so you need a flag to (logically) get rid of data
With this, here's what your products table would look like:
CREATE TABLE products (
    version_id CHAR(8) NOT NULL PRIMARY KEY,
    id INTEGER NOT NULL,
    creation_date TIMESTAMP NOT NULL DEFAULT NOW(),
    is_active BOOLEAN DEFAULT true,
    name VARCHAR(1024) NOT NULL,
    price INTEGER NOT NULL
);

CREATE TABLE variants (
    version_id CHAR(8) NOT NULL PRIMARY KEY,
    id INTEGER NOT NULL,
    creation_date TIMESTAMP NOT NULL DEFAULT NOW(),
    is_active BOOLEAN DEFAULT true,
    product_version_id CHAR(8) NOT NULL,
    price INTEGER NOT NULL,
    override_price INTEGER NOT NULL,
    FOREIGN KEY (product_version_id) REFERENCES products(version_id)
);
Now, to insert into either table:
Generate a unique version_id (there are several strategies for this; one is to use a database sequence, or for MySQL use an AUTO_INCREMENT).
Generate an id. This id is consistent for all versions of a product.
To update a row in a table, one must insert the entire graph, e.g. to update a product, one must insert a new product and new variants. (There is a lot of room for optimization here, but it's easiest to start with the un-optimized solution.)
For example, to update a product (a SQL sketch follows these steps):
Generate a unique version_id
Use the same id
Insert new product variants. The variants will be the same as the ones linked to the previous version of the product that you're "updating", except the product_version_id will be different.
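A rough sketch of that flow against the tables above (the specific version_id and other values are made up for illustration):

-- New version of product 42: same id, new version_id
INSERT INTO products (version_id, id, name, price)
VALUES ('p0000002', 42, 'Blue T-Shirt', 2000);

-- Re-insert its variants, pointed at the new product version
INSERT INTO variants (version_id, id, product_version_id, price, override_price)
VALUES ('v0000005', 7, 'p0000002', 2000, 1800);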
This principle can extend to all your tables.
To find the most recent version of a product, you need to use the creation_date column to get the product that was most recently created.
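For example (one of several ways to write it):

SELECT *
FROM products
WHERE id = 42
  AND is_active = true
ORDER BY creation_date DESC
LIMIT 1;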
This model will use more space, but I think this may be a fair trade-off given its simplicity: there are only INSERTs and SELECTs, and data is never mutated.

Related

SQL add additional ID for product variant

I'm building a table that tracks physical product variants (like a T-shirt printed in various colorways), with a foreign key of product_id and a primary key of variant_id. I would like the variant_id to auto-increment within each product_id. So if there is a product with id 123 and I add variants to that product, the first one starts as 123-1, then 123-2 for the second variant, and it just auto-increments for each variant.
What I have now is an auto-incrementing variant_id and product_id, but without that format it's harder to process.
How can I configure this in the table setup? And what is the conventional way of doing this: auto-incrementing both without a format and writing a query to format it?
Current variants table:
I've created this DBfiddle that first creates a products table and then a variants table. This works; however, the variant_id isn't in the desired format, because it auto-increments its own value instead of taking product_id into the mix.
In that DBfiddle I also used ENGINE = MyISAM, which I'm not sure is efficient for the database design.
Here is the current output of a test variants table, in which the variant_id of 1 should be 1-1 and the variant_id of 2 should be 3-1 (product number 3, first variant). I inserted these by specifying the product_id and variant_name.
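One way this is often handled (a sketch, not from the original answers; column names are assumptions): with MyISAM, an AUTO_INCREMENT column that is the second column of a multi-column index is numbered per group, so variant_id restarts at 1 for each product_id, and the "123-1" display format can be produced with CONCAT at query time.

CREATE TABLE variants (
    product_id INT UNSIGNED NOT NULL,
    variant_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    variant_name VARCHAR(255) NOT NULL,
    PRIMARY KEY (product_id, variant_id)
) ENGINE = MyISAM;

-- variant_id numbering restarts for each product_id
INSERT INTO variants (product_id, variant_name) VALUES (123, 'red'), (123, 'blue'), (3, 'small');

SELECT CONCAT(product_id, '-', variant_id) AS display_id, variant_name
FROM variants;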

Reuse existing product IDs as primary key

I have some products which have their own IDs, and I'm designing a MySQL DB into which I will import this data. There is much more than the product table, but that doesn't matter now.
Is it a good idea to reuse the existing product IDs as the primary key? The existing product IDs would be imported into the auto-increment ID column; I have never done it the way I'm describing.
It is also worth mentioning that the IDs are normal unsigned integer values, and that the products currently exist only as rows in an XLS sheet.
I think it would be great to keep the IDs as they are if you have any relationships built upon those IDs, and for the new rows that will be added just let them increment with the identity property.
To insert defined IDs on an identity column (auto-increment) use the following:
Set Identity_Insert [TableName] On
-- --------------------------------------------
your insert query goes here
-- --------------------------------------------
Set Identity_Insert [TableName] Off
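Note that IDENTITY_INSERT is SQL Server syntax. In MySQL you can simply supply explicit values for an AUTO_INCREMENT column; a minimal sketch (table and column names assumed):

-- Explicit ids are accepted on an AUTO_INCREMENT column
INSERT INTO products (id, name) VALUES (1017, 'Imported product');

-- Later inserts that omit the id continue from the highest value seen so far
INSERT INTO products (name) VALUES ('New product');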

Having Three AUTO-INC fields in mySQL or emulating this with TRIGGERS

I have a table called contents which contains the ingredients of a specific chemical formula. As you might suspect, if the ingredients are added to the formula in the wrong order, the formula is not successful.
So, consider that I have six fields:
id | formula_id | ingredient_id | quantity | item_id | add_id
Where:
id = the PK and primary index
formula_id = a repeating integer depending on the id of the formula
ingredient_id = the PK from the "ingredients" table
quantity = self-explanatory
item_id = the UNIQUE one-based item id of that ingredient as it pertains to the formula
add_id = the UNIQUE zero-based index of the order in which this ingredient is added to the formula
So, as I am modifying formulas and adding ingredients, I want to make sure that both the item_id and add_id are incremental integers that are handled by MySQL rather than by the PHP code, and in a manner that lets them be modified later on (should the order of the added ingredients need to be adjusted).
Since I cannot find a decent TRIGGER-writing tutorial, nor anything about having three AUTO-INC fields where two only increment based on the formula_id, I come here and ask for your help.
After some trial and error, I've discovered that it's more my terminology that's incorrect than methodology. What I should have been looking for was a way to create a UNIQUE INDEX based on other fields.
Hence, the solution to my problems is as follows:
ALTER TABLE `chem`.`formulas`
DROP INDEX `item_id`,
DROP INDEX `add_id`,
ADD UNIQUE INDEX `item_id` (`id`, `formula_id`, `ingredient_id`),
ADD UNIQUE INDEX `add_id` (`id`, `formula_id`, `ingredient_id`);
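If you did still want MySQL itself to assign the per-formula counters, a BEFORE INSERT trigger is one way to sketch it (table and column names follow the question; this ignores concurrency, so simultaneous inserts into the same formula could race):

DELIMITER //
CREATE TRIGGER contents_before_insert
BEFORE INSERT ON contents
FOR EACH ROW
BEGIN
    -- next one-based item_id within this formula
    SET NEW.item_id = (SELECT COALESCE(MAX(item_id), 0) + 1
                       FROM contents WHERE formula_id = NEW.formula_id);
    -- next zero-based add_id within this formula
    SET NEW.add_id = (SELECT COALESCE(MAX(add_id), -1) + 1
                      FROM contents WHERE formula_id = NEW.formula_id);
END//
DELIMITER ;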

Sphinx Search, compound key

After my previous question (http://stackoverflow.com/questions/8217522/best-way-to-search-for-partial-words-in-large-mysql-dataset), I've chosen Sphinx as the search engine above my MySQL database.
I've done some small tests with it, and it looks great. However, I'm at a point right now where I need some help / opinions.
I have a table articles (structure isn't important), a table properties (structure isn't important either), and a table with values of each property per article (this is what it's all about).
The table where these values are stored, has the following structure:
articleID UNSIGNED INT
propertyID UNSIGNED INT
value VARCHAR(255)
The primary key is a compound key of articleID and propertyID.
I want Sphinx to search through the value column. However, to create an index in Sphinx, I need a unique id, which I don't have here.
Also, when searching, I want to be able to filter on the propertyID column (only search values for propertyID 2, for example, which I can do by defining it as an attribute).
On the Sphinx forum, I found I could create a multi-value attribute, and set this as query for my Sphinx index:
SELECT articleID, value, GROUP_CONCAT(propertyID) FROM t1 GROUP BY articleID
articleID will be unique now, however, now I'm missing values. So I'm pretty sure this isn't the solution, right?
There are a few other options, like:
Add an extra column to the table, which is unique
Create a calculated unique value in the query (like articleID*100000+propertyID)
Are there any other options I could use, and what would you do?
Regarding your suggestions:
Add an extra column to the table, which is unique
This cannot easily be done for an existing table with a large number of records, as adding a new field to a large table takes some time, and during that time the database will not be responsive.
Create a calculated unique value in the query (like articleID*100000+propertyID)
If you do this you have to find a way to get the articleID and propertyID from the calculated unique id.
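For what it's worth, decoding such a composite value back out is straightforward (assuming propertyID is always below 100000):

-- Build the synthetic key for Sphinx
SELECT articleID * 100000 + propertyID AS sphinx_id, value
FROM t1;

-- Recover the original pair from an id returned by a search
SELECT 123400007 DIV 100000 AS articleID,  -- 1234
       123400007 MOD 100000 AS propertyID; -- 7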
An alternative is to create a new table with a key field for Sphinx and another two fields to hold articleID and propertyID (sketched below).
new_sphinx_table with following fields
id - UNSIGNED INT/ BIGINT
articleID - UNSIGNED INT
propertyID - UNSIGNED INT
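A sketch of that mapping table (names as listed above; the surrogate id is what Sphinx indexes):

CREATE TABLE new_sphinx_table (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    articleID INT UNSIGNED NOT NULL,
    propertyID INT UNSIGNED NOT NULL,
    UNIQUE KEY uk_article_property (articleID, propertyID)
);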
Then you can write an indexing query like below
SELECT nt.id, t1.articleID, t1.propertyID, t1.value FROM t1 INNER JOIN new_sphinx_table nt ON t1.articleID = nt.articleID AND t1.propertyID = nt.propertyID;
This is a sample so you can modify it to fit to your requirements.
What Sphinx returns are the matched new_sphinx_table.id values along with the other attribute columns. You can then get your results by taking those new_sphinx_table.id values and joining your t1 table to new_sphinx_table.

opinions and advice on database structure

I'm building this tool for classifying data. Basically I will be regularly receiving rows of data in a flat-file that look like this:
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
And I have a list of categories to break these rows up into, for example:
Original      Cat1  Cat2  Cat3  Cat4  Cat5
------------------------------------------
a:b:c:d:e     a     b     c     d     e
As of right this second, the category names are known, as well as the number of categories to break the data down by. But this might change over time (for instance, categories added/removed, or the total number of categories changed).
Okay, so I'm not really looking for help on how to parse the rows or get data into a db or anything... I know how to do all that, and have the core script mostly written already, to handle parsing rows of values and separating them into a variable number of categories.
Mostly I'm looking for advice on how to structure my database to store this stuff. So I've been thinking about it, and this is what I came up with:
Table: Generated
generated_id int - unique id for each row generated
generated_timestamp datetime - timestamp of when row was generated
last_updated datetime - timestamp of when row last updated
generated_method varchar(6) - method in which row was generated (manual or auto)
original_string varchar (255) - the original string
Table: Categories
category_id int - unique id for category
category_name varchar(20) - name of category
Table: Category_Values
category_map_id int - unique id for each value (not sure if I actually need this)
category_id int - id value to link to table Categories
generated_id int - id value to link to table Generated
category_value varchar (255) - value for the category
Basically the idea is when I parse a row, I will insert a new entry into table Generated, as well as X entries in table Category_Values, where X is however many categories there currently are. And the category names are stored in another table Categories.
What my script will immediately do is process rows of raw values and output the generated category values to a new file to be sent somewhere. But then I have this db I'm making to store the data generated so that I can make another script, where I can search for and list previously generated values, or update previously generated entries with new values or whatever.
Does this look like an okay database structure? Anything obvious I'm missing or potentially gimping myself on? For example, with this structure... well... I'm not an SQL expert, but I think I should be able to do something like
select * from Generated where original_string = '$string'
// id is put into $id
and then
select * from Category_Values where generated_id = '$id'
...and then I'll have my data to work with for search results or a form to alter data. I'm fairly certain I can even combine this into one query with a join or something, but I'm not that great with SQL so I don't know how to actually do that. But the point is, I know I can do what I need with this db structure... am I making this harder than it needs to be? Making some obvious noob mistake?
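That combined lookup is a single JOIN; a sketch against the tables as described in the question (names as above):

SELECT g.generated_id, g.original_string, c.category_name, cv.category_value
FROM `Generated` g  -- backticks because GENERATED is a reserved word in newer MySQL
JOIN Category_Values cv ON cv.generated_id = g.generated_id
JOIN Categories c ON c.category_id = cv.category_id
WHERE g.original_string = 'a:b:c:d:e';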
My suggestion (full CREATE TABLE statements are sketched after the column lists):
Table: Generated
id unsigned int autoincrement primary key
generated_timestamp timestamp
last_updated timestamp default '0000-00-00' ON UPDATE CURRENT_TIMESTAMP
generated_method ENUM('manual','auto')
original_string varchar (255)
Table: Categories
id unsigned int autoincrement primary key
category_name varchar(20)
Table: Category_Values
id unsigned int autoincrement primary key
category_id int
generated_id int
category_value varchar (255) - value for the category
FOREIGN KEY `fk_cat`(category_id) REFERENCES category.id
FOREIGN KEY `fk_gen`(generated_id) REFERENCES generated.id
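Spelled out as DDL, that might look roughly like this (a sketch; InnoDB is assumed so the foreign keys are enforced, last_updated uses NULL instead of the zero date because strict sql_mode rejects '0000-00-00', and two automatic TIMESTAMP columns need MySQL 5.6.5 or later):

CREATE TABLE `generated` (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    generated_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    last_updated TIMESTAMP NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
    generated_method ENUM('manual', 'auto') NOT NULL,
    original_string VARCHAR(255) NOT NULL
) ENGINE = InnoDB;

CREATE TABLE categories (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    category_name VARCHAR(20) NOT NULL
) ENGINE = InnoDB;

CREATE TABLE category_values (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    category_id INT UNSIGNED NOT NULL,
    generated_id INT UNSIGNED NOT NULL,
    category_value VARCHAR(255) NOT NULL,
    FOREIGN KEY fk_cat (category_id) REFERENCES categories (id),
    FOREIGN KEY fk_gen (generated_id) REFERENCES `generated` (id)
) ENGINE = InnoDB;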
Links
Timestamps: http://dev.mysql.com/doc/refman/5.1/en/timestamp.html
Create table syntax: http://dev.mysql.com/doc/refman/5.1/en/create-table.html
Enums: http://dev.mysql.com/doc/refman/5.1/en/enum.html
I think this solution is perfect for what you want to do. The Categories list is now flexible, so you can add new categories or retire old ones (I would recommend thinking long and hard before agreeing to delete a category - would you orphan the records or remove them too, etc.).
Basically, I'm saying you are right on target. The structure is simple but it will work well for you. Great job (and great job giving exactly the right amount of information in the question).