MySQL table design - mysql

I have a table with products. Each product has a title and a price.
The products come in huge XML files, on a daily basis.
I store all of them in MySQL, but sometimes they have a wrong title. I can't simply edit the title, because my changes would be lost the next day (a cronjob removes all products and inserts them again).
What would be the best way to edit them? Should I save the edits in a different table and SELECT from both tables at once, with the table that contains the edited rows taking precedence over the cronjob table?
What would be the best way to handle this, given that there are 300.000+ products? Products might be edited manually via a CMS.
Thanks!

Is there some sort of ID that remains constant? (productID) for example?
Can you edit the cronjob?
If both of the above are true, I'd edit the job to only add new records to the table, so that it doesn't overwrite your updated values.

If there is a unique identifier for each product that remains constant over updates, you could make a table containing the product ID and the corrected title. Correcting a title would involve inserting a row into this table as well as updating the main table.
As the last step of the cron job, you can then update your main table of products from this one.
UPDATE tblProduct p, tblProductCorrections pc
SET p.strTitle = pc.strCorrectedTitle
WHERE p.intId = pc.intProductId
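To make the flow concrete, here is a minimal sketch of the whole cycle: the cron job wipes and reloads the products, then re-applies the stored corrections as its last step. It uses Python's built-in sqlite3 as a stand-in for MySQL so it runs anywhere, and the UPDATE is written as a correlated subquery, which MySQL should accept as well. The decPrice column, the daily_import function and the sample rows are assumptions for illustration only.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblProduct (
    intId    INTEGER PRIMARY KEY,
    strTitle TEXT,
    decPrice REAL               -- hypothetical price column
);
CREATE TABLE tblProductCorrections (
    intProductId      INTEGER PRIMARY KEY,
    strCorrectedTitle TEXT
);
-- A correction entered earlier through the CMS.
INSERT INTO tblProductCorrections VALUES (2, 'Blue Kettle');
""")


def daily_import(conn, feed_rows):
    """Simulate the cron job: wipe, reload, then re-apply corrections."""
    conn.execute("DELETE FROM tblProduct")
    conn.executemany("INSERT INTO tblProduct VALUES (?, ?, ?)", feed_rows)
    # Last step of the job: overwrite titles that have a stored correction.
    conn.execute("""
        UPDATE tblProduct
        SET strTitle = (SELECT pc.strCorrectedTitle
                        FROM tblProductCorrections pc
                        WHERE pc.intProductId = tblProduct.intId)
        WHERE intId IN (SELECT intProductId FROM tblProductCorrections)
    """)
    conn.commit()


# The feed still carries the misspelled title for product 2.
daily_import(conn, [(1, "Red Mug", 4.5), (2, "Bleu Ketle", 19.0)])
titles = dict(conn.execute("SELECT intId, strTitle FROM tblProduct"))
print(titles)
```

Because the corrections live in their own table, the job can keep deleting and re-inserting everything; only products that have a correction row are touched by the final UPDATE.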

Related

Storing products in order - product is deleted after order placed

I have an orders table, which currently stores the product ID and the quantity.
However, should the product be removed from the site, then when getting the data for the order, there won't be a relationship between the order and products (product no longer exists).
What is the best way of rectifying this? Do I need to store the product name etc with the order?
I wouldn't recommend hard deleting your products in the first place.
Instead, you can soft delete them by adding an extra boolean column named deleted to your products table and setting it to true when a product is deleted. This way you keep your references to old, deleted products.
Note that you will have to change your SELECT queries to include WHERE deleted = false, so that you only get the products that aren't deleted.
This is also useful if you accidentally delete a product that you didn't want to delete, because you can easily change it back.
Some systems (*) deal with this problem exactly as you indicated in your question - by copying the product name (and other relevant data) to the order, so that the order will still have all the information in it even if the product is deleted.
(*) For example, both OpenCart and Zen Cart do this.
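Both suggestions can be combined. The sketch below is only an illustration under assumed names: the table and column names (products, orders, product_name, unit_price) and the sample data are made up, and Python's built-in sqlite3 stands in for MySQL. It copies the product name and price into the order when the order is placed, and soft deletes the product instead of removing the row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    id      INTEGER PRIMARY KEY,
    name    TEXT,
    price   REAL,
    deleted INTEGER NOT NULL DEFAULT 0   -- soft-delete flag
);
CREATE TABLE orders (
    id           INTEGER PRIMARY KEY,
    product_id   INTEGER,
    quantity     INTEGER,
    product_name TEXT,   -- snapshot taken when the order is placed
    unit_price   REAL    -- snapshot taken when the order is placed
);
INSERT INTO products (id, name, price)
VALUES (1, 'Desk Lamp', 25.0), (2, 'Notebook', 3.0);
""")

# Place an order for 2 units of product 1, copying its name and price.
conn.execute("""
    INSERT INTO orders (product_id, quantity, product_name, unit_price)
    SELECT id, ?, name, price FROM products WHERE id = ?
""", (2, 1))

# "Delete" the product by setting the flag; the row stays in the table.
conn.execute("UPDATE products SET deleted = 1 WHERE id = 1")

# The shop front only lists products that are not deleted.
visible = [row[0] for row in
           conn.execute("SELECT name FROM products WHERE deleted = 0")]

# The order keeps its own data, and the join to products still works.
order_row = conn.execute("""
    SELECT o.product_name, o.quantity, p.deleted
    FROM orders o JOIN products p ON p.id = o.product_id
""").fetchone()
print(visible, order_row)
```

The snapshot columns also protect old orders against later renames or price changes, which the soft-delete flag alone does not.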

MySQL count selected rows in one table to update value in another table

I have created a table ("texts" table) for storing ocr text from scanned documents. The table now has 100,000 + records. It stores a separate record for each page in the document. I set up the table originally so it stored the documents' title and its location against each record, which was obviously bad design as the info was duplicated for many records. I have subsequently created a separate table which now only stores one record for each document ("documents" table). The original table still contains a record for each page in the document, but the only columns now are the ocr text and the id of the document record in the documents table.
The documents table has a column "total_pages". I am trying to update this value using the following query:
UPDATE documents SET total_pages=(SELECT Count(*) from texts where texts.docs_id=documents.id)
This just seems to take forever to execute and I have had to crash out of it on a couple of occasions. There are over 8000 records in the documents table.
I have tested the query by limiting it to just one document
UPDATE documents SET total_pages=(SELECT Count(*) from texts where texts.docs_id=documents.id and documents.id=1)
This works eventually with just one record, but it takes a very long time to execute. I am guessing that my full query needs a bit of optimization! Any help greatly appreciated.
This is your query:
UPDATE documents
SET total_pages = (SELECT Count(*)
from texts
where texts.docs_id = documents.id)
For performance, you want an index on texts(docs_id). That will probably fix your performance problem. In fact, it might make it unnecessary to store this value in the master table.
If you do decide to store the count, be sure that you keep the value up-to-date. That would typically require a trigger to handle inserts and deletes (and perhaps updates, if docs_id changes).
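As a rough sketch of both options, the snippet below creates the index, runs the same UPDATE, and then computes the counts on the fly with a join instead of storing them. It uses Python's built-in sqlite3 as a stand-in for MySQL; the CREATE INDEX statement has the same form in MySQL. The index name, the title and ocr_text columns and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (
    id          INTEGER PRIMARY KEY,
    title       TEXT,
    total_pages INTEGER DEFAULT 0
);
CREATE TABLE texts (
    id       INTEGER PRIMARY KEY,
    docs_id  INTEGER,
    ocr_text TEXT
);
-- The index that lets the per-document lookup avoid a full scan of texts.
CREATE INDEX idx_texts_docs_id ON texts (docs_id);

INSERT INTO documents (id, title)
VALUES (1, 'Deed'), (2, 'Letter'), (3, 'Empty');
INSERT INTO texts (docs_id, ocr_text)
VALUES (1, 'p1'), (1, 'p2'), (1, 'p3'), (2, 'p1');
""")

# Option 1: store the count, using the same correlated subquery as above.
conn.execute("""
    UPDATE documents
    SET total_pages = (SELECT COUNT(*)
                       FROM texts
                       WHERE texts.docs_id = documents.id)
""")
stored = dict(conn.execute("SELECT id, total_pages FROM documents"))

# Option 2: do not store anything and count when needed.
live = dict(conn.execute("""
    SELECT d.id, COUNT(t.id)
    FROM documents d
    LEFT JOIN texts t ON t.docs_id = d.id
    GROUP BY d.id
"""))
print(stored, live)
```

The LEFT JOIN with COUNT(t.id) keeps documents that have no pages and reports 0 for them.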

MySQL: - Bulk insert category to all posts in wordpress

I am searching for a MySQL query that I can run via phpMyAdmin to add a category to all existing posts without modifying their existing categories. Bulk edit in the WordPress dashboard doesn't work for me because of high server load. I have more than 2000 posts, and I need to add another category, "Italy", to all of them without removing the existing categories (Italian cities). Can anyone help?
I've found some useful information for your problem:
wp_categories
If you have any categories in your WordPress installation, wp_categories is the table that keeps those records. Category names and descriptions are stored here, as well as the ID of each category’s parent.
To work faster, WordPress often keeps aggregated values in the database, instead of recalculating them each time. For example, frequently requested counts of posts and links in each category are simply stored in the wp_categories table (WordPress uses the same set of categories for both links and posts). Every time you add a post to a category, the post counter (column category_counter) increases. Every time you remove a post from the category, the counter decreases. The same goes for links (column link_count). That’s why you see those extra columns in the table.
wp_post2cat
Linking posts to categories is done via the wp_post2cat table. This is a standard approach for many-to-many relationships in relational databases. The wp_post2cat table has only three fields: the unique record ID (automatically generated), the ID of the post, and the ID of the category to add the post to.
So if you could just give us 1 table row of the "wp_categories" we could easily create a query for bulk actions.
source: http://wpbits.wordpress.com/2007/08/08/a-look-inside-the-wordpress-database/
cheers.
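Going by the description quoted above, a bulk insert could look roughly like the sketch below. Treat every name in it as an assumption: the table names come from the quoted article (which dates from 2007, so check them against your own installation first), and the column names (cat_ID, rel_id, post_id, category_id, category_counter) as well as the sample posts are guesses for illustration. It runs on Python's built-in sqlite3 as a stand-in; the two SQL statements are plain INSERT ... SELECT and UPDATE, which MySQL should accept too. Back up the database before trying anything like this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT);
CREATE TABLE wp_categories (
    cat_ID           INTEGER PRIMARY KEY,
    cat_name         TEXT,
    category_counter INTEGER DEFAULT 0
);
CREATE TABLE wp_post2cat (
    rel_id      INTEGER PRIMARY KEY,   -- generated automatically
    post_id     INTEGER,
    category_id INTEGER
);
INSERT INTO wp_posts VALUES (1, 'Rome'), (2, 'Milan');
INSERT INTO wp_categories VALUES (5, 'Rome', 1), (9, 'Italy', 1);
-- Post 1 already has both its city category and "Italy".
INSERT INTO wp_post2cat (post_id, category_id) VALUES (1, 5), (1, 9);
""")

italy_id = 9  # hypothetical ID of the "Italy" category

# Link every post to "Italy" unless that link already exists.
conn.execute("""
    INSERT INTO wp_post2cat (post_id, category_id)
    SELECT p.ID, ?
    FROM wp_posts p
    WHERE NOT EXISTS (SELECT 1
                      FROM wp_post2cat pc
                      WHERE pc.post_id = p.ID AND pc.category_id = ?)
""", (italy_id, italy_id))

# Refresh the stored post counter for the category.
conn.execute("""
    UPDATE wp_categories
    SET category_counter = (SELECT COUNT(*)
                            FROM wp_post2cat
                            WHERE category_id = wp_categories.cat_ID)
    WHERE cat_ID = ?
""", (italy_id,))

links = sorted(conn.execute("SELECT post_id, category_id FROM wp_post2cat"))
counter = conn.execute(
    "SELECT category_counter FROM wp_categories WHERE cat_ID = ?",
    (italy_id,)).fetchone()[0]
print(links, counter)
```

The NOT EXISTS check makes the statement safe to run more than once, and the existing city categories are never touched because nothing is updated or deleted in wp_post2cat.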

Possible to always hide a subset of a table?

I have a shop set up with lots of tables that are joined together in various ways, as per usual.
In my products table, I have a field called 'status'. If the status = 4, then the product is archived.
I want to ensure that no queries ever return anything with a status of 4. Right now I'm about to add an AND status <> 4 to every SQL query I can find.
Is there a better way to do this, or is that the only way?
You can create a view that doesn't show rows with status = 4 and use that view in your queries instead.
Presumably changing a product's status between archived and unarchived is a pretty rare operation, while selecting from your products table is extremely common. Therefore you should create a separate table, archived, and move all archived products there.
This would be the best-performing solution.
If you also occasionally wanted to view all products, whether archived or not, then you could also make a view that combines the archived and products tables.
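A minimal sketch of the view approach, using Python's built-in sqlite3 as a stand-in for MySQL (the CREATE VIEW statement has the same form in MySQL). The view name active_products, the name column and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, status INTEGER);
INSERT INTO products
VALUES (1, 'Chair', 1), (2, 'Old Sofa', 4), (3, 'Table', 2);

-- Everything except archived products (status = 4).
CREATE VIEW active_products AS
SELECT id, name, status FROM products WHERE status <> 4;
""")

# Queries read from the view instead of the base table.
active = [row[0] for row in
          conn.execute("SELECT name FROM active_products ORDER BY id")]
print(active)
```

Existing queries then only need the table name swapped for the view name, rather than an extra condition in every WHERE clause.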

How to delete from a database?

I know of two ways to delete data from a database table
DELETE it forever
Use a flag like isActive/isDeleted
Now the problem with isActive is that I have to check everywhere in my SQL queries whether the record is active or not. Using DELETE, however, gets rid of the data forever.
What would be the best way to backup this data?
Assuming I have multiple tables in a database, should I have a common function which just backs everything up and stores it in another table (in XML probably?) or is there any other way.
I am using MySQL but am curious about techniques used in other DBs as well.
Replace the table with a view that hides the inactive items.
Or write a trigger on DELETE that backs up the row to an archive table.
You could use a trigger that fires on deleting records to back them up into some kind of graveyard table.
You could use an isDeleted column and define a view which selects all columns except isDeleted with the condition isDeleted=false. Then have all your stored procedures work only with the view.
You could maintain a history table, where you back the record up and timestamp it.
One of the biggest reasons for not deleting data is that it may be required for a relation. For example, the user may decide to delete an old customer from the database, but you still need the customer record because it is referenced by old invoices (which may have a much longer lifespan).
Based on this the best solution is often the "IsDeleted" type of column, combined with a view (Quassnoi has mentioned partitioning, which can help with performance issues that might pop up due to a lot of invisible data).
You can partition your tables on the DELETED column and define the views which would include the condition:
… AND deleted = 0
This will make the queries over the active data just as simple and efficient.
Well, if you were using SQL Server you could use triggers, which would allow you to move the record to a deleted table.
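The trigger idea mentioned in several answers above can be sketched like this. It runs on Python's built-in sqlite3 as a stand-in; MySQL's CREATE TRIGGER uses the same BEFORE DELETE ... FOR EACH ROW form with OLD.column references, although in the mysql command-line client a BEGIN ... END body needs a changed DELIMITER. The customers and customers_archive tables, the trigger name and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customers_archive (
    id         INTEGER,
    name       TEXT,
    deleted_at TEXT    -- when the row was removed from customers
);

-- Copy each row into the archive table just before it is deleted.
CREATE TRIGGER customers_before_delete
BEFORE DELETE ON customers
FOR EACH ROW
BEGIN
    INSERT INTO customers_archive (id, name, deleted_at)
    VALUES (OLD.id, OLD.name, CURRENT_TIMESTAMP);
END;

INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
""")

# A plain DELETE; the trigger does the backup without any application code.
conn.execute("DELETE FROM customers WHERE id = 2")

remaining = conn.execute("SELECT id, name FROM customers").fetchall()
archived = conn.execute("SELECT id, name FROM customers_archive").fetchall()
print(remaining, archived)
```

With this in place the live table stays small and queries need no extra isDeleted condition, at the cost of the archive table having to follow any later schema change to customers.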