I have a shop set up with lots of tables that are joined together in various ways, as per usual.
In my products table, I have a field called 'status'. If the status = 4, then the product is archived.
I want to ensure that no queries ever return anything with a status of 4. Right now I'm about to add an AND status <> 4 to every SQL query I can find.
Is there a better way to do this, or is that the only way?
You can create a view that excludes rows with status = 4 and use that view in your queries instead.
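A minimal sketch of the view approach. It uses Python's built-in sqlite3 module in place of MySQL so it can run standalone; the CREATE VIEW statement reads the same in both. The view name active_products and the sample rows are made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); table contents are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, status INTEGER);
    INSERT INTO products (id, name, status) VALUES
        (1, 'Kettle', 1), (2, 'Toaster', 4), (3, 'Blender', 2);

    -- The archive filter lives in one place instead of in every query.
    CREATE VIEW active_products AS
        SELECT id, name, status FROM products WHERE status <> 4;
""")

# Application queries select from the view, never from products directly.
active_names = [row[0] for row in conn.execute(
    "SELECT name FROM active_products ORDER BY id")]
print(active_names)
```

Existing queries then only need their table name swapped for the view name, rather than an extra condition each.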
Presumably changing a product's status between archived and unarchived is a pretty rare operation, while selecting from your product table is extremely common. Therefore you should make a separate table, archived, and move all archived products there.
This would be the best-performing solution.
If you also occasionally wanted to view all products, whether archived or not, then you could also make a view that combines the archived and products tables.
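Roughly what the separate-table idea could look like, again with SQLite standing in for MySQL. The view name all_products, the is_archived flag and the archive_product helper are hypothetical names, not anything from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE archived (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO products (id, name) VALUES
        (1, 'Kettle'), (2, 'Toaster'), (3, 'Blender');

    -- Combined view for the occasional "show me everything" query.
    CREATE VIEW all_products AS
        SELECT id, name, 0 AS is_archived FROM products
        UNION ALL
        SELECT id, name, 1 AS is_archived FROM archived;
""")

def archive_product(conn, product_id):
    """Move one product into the archived table in a single transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO archived (id, name) "
            "SELECT id, name FROM products WHERE id = ?", (product_id,))
        conn.execute("DELETE FROM products WHERE id = ?", (product_id,))

archive_product(conn, 2)
live_ids = [r[0] for r in conn.execute("SELECT id FROM products ORDER BY id")]
all_rows = conn.execute(
    "SELECT id, is_archived FROM all_products ORDER BY id").fetchall()
```

The trade-off is that any table with a foreign key to products has to cope with rows moving out of it, so this is worth checking against your joins first.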
Using MySQL, I have a table of users, a table of matches (updated with the actual result) and a table called users_picks. At first there are always going to be 10 football matches per gameweek per league, because there's only one league as of now, but more leagues will come along eventually, and some of them only have 8 matches per gameweek.
In the users_picks table, should I store each 'pick' (by pick I mean both 'hometeam score' and 'awayteam score') in a different row, or have all 10 picks in one single row? Both with a FK for user and gameweek. All picks in one row would mean I had columns with appended numbers like this:
Option 1: [pick_id, user_id, league_id, gameweek_id, match1_hometeam_score, match1_awayteam_score, match2_hometeam_score, match2_awayteam_score ... etc]
and that option doesn't quite fill me with joy, and looks a bit stupid. Especially since there's going to be lots of potential NULLs in the db. The second option would mean eventually millions of rows. But would look like this:
Option 2: [pick_id, user_id, league_id, gameweek_id, match_id, hometeam_score, awayteam_score]
What's the best practice? And would it be a PITA to do all sorts of statistics using the second option? eg. Calculating how many matches a user has hit correctly in a specific round, how many alltime correct hits etc.
If I'm not making much sense, I'll try to elaborate. I just want my table design to be good from the start, so I won't have a huge headache in a couple of months.
Thanks in advance.
The second choice is much better than the first. This is called database normalisation and makes querying easier, not harder. I would suggest reading the linked article, and the related descriptions of the various "normal forms", and aiming for a 3rd Normal Form data structure as a minimum.
To see the flaw in your first option, imagine a new league being added later with 11 matches per gameweek. Or 400.
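To give a feel for the statistics side, here is a sketch of counting correct picks with the one-row-per-pick layout (Option 2). It runs on SQLite via Python's sqlite3 rather than MySQL, and the layout of the matches table is an assumption, since the question doesn't spell it out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed shape of the matches table: one row per match with the result.
    CREATE TABLE matches (
        match_id INTEGER PRIMARY KEY, gameweek_id INTEGER,
        hometeam_score INTEGER, awayteam_score INTEGER);
    -- Option 2 from the question: one row per pick.
    CREATE TABLE users_picks (
        pick_id INTEGER PRIMARY KEY, user_id INTEGER, league_id INTEGER,
        gameweek_id INTEGER, match_id INTEGER,
        hometeam_score INTEGER, awayteam_score INTEGER);

    INSERT INTO matches VALUES (1, 1, 2, 1), (2, 1, 0, 0), (3, 2, 3, 1);
    INSERT INTO users_picks VALUES
        (1, 7, 1, 1, 1, 2, 1),   -- matches the result of match 1
        (2, 7, 1, 1, 2, 1, 0),   -- does not match the result of match 2
        (3, 7, 1, 2, 3, 3, 1);   -- matches the result of match 3
""")

# A pick is a hit when both predicted scores equal the actual scores.
CORRECT_HITS = """
    SELECT COUNT(*)
    FROM users_picks p
    JOIN matches m ON m.match_id = p.match_id
    WHERE p.user_id = ?
      AND p.hometeam_score = m.hometeam_score
      AND p.awayteam_score = m.awayteam_score
"""

hits_all_time = conn.execute(CORRECT_HITS, (7,)).fetchone()[0]
hits_gameweek_1 = conn.execute(
    CORRECT_HITS + " AND p.gameweek_id = ?", (7, 1)).fetchone()[0]
```

The same query works unchanged whether a gameweek has 8, 10 or 400 matches, which is the point of the normalised layout.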
You should read up about database normalization.
When you have a 1:n relation, like in your case one team having many matches, you would create two tables. One table "teams" and a second table "matches" where each row includes the ID of the team which played the match.
In the same manner you should also have separate tables for users, picks and leagues.
Option two is better, provided you INDEX your table properly, since (as you indicate) it will grow quite large. The pick_id is the primary key, but also create an INDEX on the user_id field, as likely the most common query will be
SELECT * FROM `users_picks` WHERE `user_id`=?;
to get all the picks for a given user.
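A small sketch of adding that index and checking that the lookup uses it. This runs on SQLite through Python's sqlite3, so the plan output differs from MySQL's EXPLAIN; the index name idx_users_picks_user_id is just an illustrative choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_picks (
        pick_id INTEGER PRIMARY KEY, user_id INTEGER, league_id INTEGER,
        gameweek_id INTEGER, match_id INTEGER,
        hometeam_score INTEGER, awayteam_score INTEGER);

    -- Secondary index for the "all picks for one user" lookup.
    CREATE INDEX idx_users_picks_user_id ON users_picks (user_id);
""")

# Ask the planner how it would run the query; the last column of each
# row is a human-readable description of the step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users_picks WHERE user_id = ?", (7,)
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

In MySQL the equivalent check would be running EXPLAIN on the SELECT and looking at the key column.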
I am creating a new DB in MySQL for an application and wondered if anyone could provide some advice on the following set up. I'll try and simplify things as best as I can.
This DB is designed to store alerts which are related to specific items created by a user. In turn there is the need to store notes related to the items and/or alerts. At first I considered the following structure...
USERS table - to store basic app user info (e.g. user_id, name, email) - this is the only bit I'm fairly certain does not need to be changed
ITEMS table: contains info on particular item (4 fields or so). Contains user_id to indicate which user created/owns this item
ALERTS table: contains info on the alert, item_id to indicate which item the alert is related to, contains user_id to indicate which user created alert
NOTES table: contains note info, user_id of note owner, item_id if associated with an item, alert_id if associated with alert
Relationships:
An item does not always have an alert associated with it
An item or alert does not always have a note associated with it
An alert is always associated with an item. More than one alert can be associated with the same item.
A note is always associated with an item or alert. More than one note can be associated with the same item or alert.
Once first created item info is unlikely to be updated by a user.
For argument's sake let's say that each user will create an average of 10 items, each item will have an average of 2 alerts associated with it. There will be an average of 2 notes per item/alert.
Very common queries that will be run:
1) Return all items created by a particular user with any associated alerts and notes. Given a user_id this query would span 3 tables
2) Checking each day for alerts that need to be sent to a user's email address. WHERE alert date==today, return user's email address, item name and any associated notes. This would require a query spanning 4 tables which is why I'm wondering if I need to take a different approach...
Option 1) one table to cover items, alerts and notes. user_id owner for each row. Every time you add a note to an item or alert you are repeating the alert and/or item info. Seems a bit wasteful but item and alert info won't be large.
Option 2) I don't foresee the need to query notes (famous last words?) so how about serializing note data so multiple notes are stored in one row in either the item or alert table (or just a combined alert/item table)
Option 3) Anything else you can think of? I'm asking this question as each option I've considered doesn't feel quite right.
I appreciate this is currently a small project and so performance shouldn't be of great concern and I should just go with the 4 tables. It's more that my common queries will end up being relatively complex that makes me think I need to re-evaluate the structure.
I would say that the common wisdom is to normalize to start and denormalize only when performance data suggest that it's necessary.
Make sure that your tables are indexed properly, with foreign key relationships for JOINs.
If you think you'll end up with a lot of data, this might be a good time to think about a partitioning strategy. Partitioning your fast-growing tables by time would be a good first step.
Four tables is not complex. I commonly write report queries that hit 15 or more tables in a database structure that has hundreds of tables (most with millions of records) and I wouldn't even say our dbs are anything more than medium sized (a typical db in our system might have around 200 gigs of data, so not large at all as databases go). Because they are properly indexed, they still run fast unless I am doing very complex calculations. Normalize, don't even consider denormalizing until you are an experienced database designer who knows better than to worry about the number of tables.
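For a sense of scale, the daily alert query from the question (common query 2) could be written against the four-table layout roughly like this. It is only a sketch on SQLite via Python's sqlite3; the column names alert_date, name and body, the index name, and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (user_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE items  (item_id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(user_id), name TEXT);
    CREATE TABLE alerts (alert_id INTEGER PRIMARY KEY,
                         item_id INTEGER REFERENCES items(item_id),
                         user_id INTEGER REFERENCES users(user_id),
                         alert_date TEXT);
    CREATE TABLE notes  (note_id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(user_id),
                         item_id INTEGER REFERENCES items(item_id),
                         alert_id INTEGER REFERENCES alerts(alert_id),
                         body TEXT);
    -- The daily job filters on the date, so index it.
    CREATE INDEX idx_alerts_date ON alerts (alert_date);

    INSERT INTO users  VALUES (1, 'Ann', 'ann@example.com');
    INSERT INTO items  VALUES (10, 1, 'Passport');
    INSERT INTO alerts VALUES (100, 10, 1, '2024-05-01'),
                              (101, 10, 1, '2024-06-01');
    INSERT INTO notes  VALUES (1000, 1, NULL, 100, 'Renew online'),
                              (1001, 1, 10, NULL, 'Kept in drawer');
""")

# One row per (alert due today, note); alerts without notes still appear
# thanks to the LEFT JOIN, with body = NULL.
DUE_ALERTS = """
    SELECT u.email, i.name, n.body
    FROM alerts a
    JOIN items i ON i.item_id = a.item_id
    JOIN users u ON u.user_id = a.user_id
    LEFT JOIN notes n
           ON n.alert_id = a.alert_id
           OR (n.alert_id IS NULL AND n.item_id = a.item_id)
    WHERE a.alert_date = ?
    ORDER BY n.note_id
"""
due = conn.execute(DUE_ALERTS, ("2024-05-01",)).fetchall()
```

One SELECT with three joins covers it, which is about as complex as the common queries get here.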
We have a products table. Users can create new products as copies of existing products.
Instead of simply duplicating this data, we're thinking that, in order to minimize database size, we would store only the differences from the "parent" product. (We're talking thousands of products.)
My thinking is that, for each new "child" product, we create a new record in that same table which has a "parent" field which has the ID of the parent product.
So, when querying for the "child" product, is there a way to merge the results so that any empty fields in the child record will be taken from the parent?
(I hope this makes sense)
Yes, you can do this.
Say, for example, your table name is Products and you want to retrieve the name of a child product. Then you can query as follows, where p is the parent row and c is the child row:
select IF(c.productName = '', p.productName, c.productName) as childProductName
from Products p, Products c
where c.ParentID = p.ID
Similarly you can do this for other fields.
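If the unset child fields are stored as NULL rather than as empty strings (an assumption, the question only says "empty"), COALESCE does the same job a little more compactly. A sketch on SQLite via Python's sqlite3, with a made-up price column and sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (
        ID INTEGER PRIMARY KEY,
        ParentID INTEGER REFERENCES Products(ID),
        productName TEXT,
        price REAL);
    INSERT INTO Products VALUES
        (1, NULL, 'Desk Lamp',    20.0),  -- parent, fully populated
        (2, 1,    NULL,           25.0),  -- child overriding only the price
        (3, 1,    'Desk Lamp XL', NULL);  -- child overriding only the name
""")

# c is the product being read, p its parent (absent for top-level products,
# hence the LEFT JOIN). COALESCE returns the first non-NULL argument.
MERGED = """
    SELECT c.ID,
           COALESCE(c.productName, p.productName) AS productName,
           COALESCE(c.price, p.price)             AS price
    FROM Products c
    LEFT JOIN Products p ON p.ID = c.ParentID
    ORDER BY c.ID
"""
merged = conn.execute(MERGED).fetchall()
```

If empty strings are used instead, wrapping the child column as NULLIF(c.productName, '') inside the COALESCE gives the same fallback.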
I would anticipate that you'd want to have child products of child products (e.g. product C is based on product B, which is in turn based on product A.) And there would be children of those and so on (especially with user generated content.) This could get out of hand very quickly and require you to make either long cumbersome queries or collect the data with code rather than SQL queries.
I'm just offering this as a consideration because savings in size often come at a cost in processing time. Just be sure you consider this before you jump into something that can't easily be undone.
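To illustrate how the query grows once children have children, here is one possible way to resolve a single field through a chain of any depth, using a recursive common table expression. It is a sketch on SQLite via Python's sqlite3 (MySQL only supports WITH RECURSIVE from version 8.0), it assumes unset fields are NULL, and the resolve_name helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (
        ID INTEGER PRIMARY KEY, ParentID INTEGER, productName TEXT);
    INSERT INTO Products VALUES
        (1, NULL, 'Base Lamp'),  -- product A
        (2, 1,    NULL),         -- product B, copy of A
        (3, 2,    NULL),         -- product C, copy of B
        (4, 2,    'Lamp Mini');  -- product D, copy of B with its own name
""")

# Walk from the requested product up through its ancestors, then take the
# name from the nearest row (smallest depth) that actually has one.
RESOLVE_NAME = """
    WITH RECURSIVE chain (ID, ParentID, productName, depth) AS (
        SELECT ID, ParentID, productName, 0 FROM Products WHERE ID = ?
        UNION ALL
        SELECT p.ID, p.ParentID, p.productName, chain.depth + 1
        FROM Products p
        JOIN chain ON p.ID = chain.ParentID
    )
    SELECT productName FROM chain
    WHERE productName IS NOT NULL
    ORDER BY depth
    LIMIT 1
"""

def resolve_name(conn, product_id):
    row = conn.execute(RESOLVE_NAME, (product_id,)).fetchone()
    return row[0] if row else None

name_c = resolve_name(conn, 3)
name_d = resolve_name(conn, 4)
```

This handles one field for one product; doing it for every field of every product in a listing is where the processing cost shows up.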
I've got a question to which I've had opposing pieces of advice, would appreciate additional views.
My site has users, each with a user_id. These users can view products, and I need to keep track of the unique instances of users viewing specific products. To record a view in a separate views table, I've currently got two options:
OPTION 1:
view_id (INT,PK) | user_id (INT,FK) | product_id (INT,FK) | view_date
... and create a unique constraint over the two middle columns for easy updating with ON DUPLICATE KEY. If the same view already exists, I just update view_date. If not, I write a new row.
OPTION 2:
user_product (VARCHAR20,PK) | view_date
... merge the two ids into a VARCHAR with a separator in the middle, and use the primary key column for easy updating with ON DUPLICATE KEY in the same way as above.
The structure should accommodate up to approx. a million unique views. Any thoughts on which option might be better or worse, and why? Big thanks in advance.
EDIT:
Thanks for the answers, seems like there's a consensus. Was leaning to the same side but just needed the reassurance.
I like the first option better - in general, it's good to maintain as much atomicity as possible. If you ever want to query for all of a user's views, or something like that, it would be more difficult to do after merging two columns into one (you would need to use LIKE with a wildcard match, which will never be as fast as an indexed single-valued column). You also lose the ability to index on different fields.
Also, there is no reason why you couldn't have a primary or unique key that involved multiple columns, so I see no advantage to option 2. To perform your update, just use REPLACE instead of INSERT - this will allow you to easily maintain your invariant of having only one row per user/product combination.
I think that the first option is your better choice. Later down the line I think it will make querying for different things a bit easier. Queries will likely be faster as well since there won't be string manipulation involved. Further, you can have a primary key over multiple columns if you need.
Definitely go for the first option. The second option will mean many queries from hell if you need to make reports to look for particular groups of users (get me all users that often view product X and product Y so we can offer them a discount), same for looking for specific groups of products (which products are often viewed by the same users, so we can launch a discount promotion)
I understand that it is not a requirement to remember all individual views. But I would certainly capture the number of times they visited the product - this is almost free, as you can keep a running total (insert 1, on duplicate key update view_count = view_count + 1)
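A sketch of option 1 with the running total added. It runs on SQLite via Python's sqlite3, where INSERT ... ON CONFLICT ... DO UPDATE (SQLite 3.24 or newer) plays the role of MySQL's INSERT ... ON DUPLICATE KEY UPDATE. The table name views and the record_view helper are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE views (
        view_id    INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        view_date  TEXT    NOT NULL,
        view_count INTEGER NOT NULL DEFAULT 1,
        UNIQUE (user_id, product_id)  -- one row per user/product pair
    )
""")

def record_view(conn, user_id, product_id, view_date):
    """Insert a first view, or bump the date and counter of an existing one."""
    with conn:
        conn.execute("""
            INSERT INTO views (user_id, product_id, view_date)
            VALUES (?, ?, ?)
            ON CONFLICT (user_id, product_id) DO UPDATE SET
                view_date  = excluded.view_date,
                view_count = view_count + 1
        """, (user_id, product_id, view_date))

record_view(conn, 1, 42, "2024-05-01")
record_view(conn, 1, 42, "2024-05-03")  # same pair again: updated, not duplicated
record_view(conn, 2, 42, "2024-05-03")

rows = conn.execute(
    "SELECT user_id, product_id, view_date, view_count "
    "FROM views ORDER BY user_id").fetchall()
```

In MySQL the statement would end with ON DUPLICATE KEY UPDATE view_date = VALUES(view_date), view_count = view_count + 1 instead of the ON CONFLICT clause.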
I have a table with products. Each product has a title and a price.
The products come in huge XML files, on a daily basis.
I store all of them in MySQL. But sometimes they have a wrong title, and I can't just edit it, because my edits will be lost the next day (a cronjob removes all products and inserts them again).
What would be the best way to edit them? Save them in a different table and SELECT both tables at once? Whereas the table that contains the edited rows has precedence over the cronjob table.
What would be the best way to handle it, since there are 300.000+ products. Products might be (manually) edited via a CMS system.
Thanks!
Is there some sort of ID that remains constant? (productID) for example?
Can you edit the cronjob?
If both of the above are true, I'd edit the job to only add new records into the table, so it doesn't overwrite your updated values.
If there is a unique identifier for each product that remains constant over updates, you could make a table containing the product ID and the corrected title. Correcting a title would involve inserting a row into this table as well as updating the main table.
As the last step of the cron job, you can then update your main table of products from this one.
UPDATE tblProduct p, tblProductCorrections pc
SET p.strTitle = pc.strCorrectedTitle
WHERE p.intId = pc.intProductId
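A sketch of the whole cycle: the nightly reload wipes the product table, then the corrections are re-applied as the last step. It runs on SQLite via Python's sqlite3, so the UPDATE is written with a correlated subquery rather than MySQL's multi-table form. The nightly_reload helper, the numPrice column and the sample feed are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblProduct (
        intId INTEGER PRIMARY KEY, strTitle TEXT, numPrice REAL);
    -- Survives the nightly reload; one row per manually corrected product.
    CREATE TABLE tblProductCorrections (
        intProductId INTEGER PRIMARY KEY, strCorrectedTitle TEXT);
    INSERT INTO tblProductCorrections VALUES (2, 'Toaster (4-slice)');
""")

def nightly_reload(conn, feed_rows):
    """Stand-in for the cron job: wipe, re-insert, then re-apply corrections."""
    with conn:
        conn.execute("DELETE FROM tblProduct")
        conn.executemany(
            "INSERT INTO tblProduct (intId, strTitle, numPrice) VALUES (?, ?, ?)",
            feed_rows)
        # Last step: overwrite titles that have a stored correction.
        conn.execute("""
            UPDATE tblProduct
            SET strTitle = (SELECT pc.strCorrectedTitle
                            FROM tblProductCorrections pc
                            WHERE pc.intProductId = tblProduct.intId)
            WHERE intId IN (SELECT intProductId FROM tblProductCorrections)
        """)

feed = [(1, 'Kettle', 19.99), (2, 'Toastr 4slice', 29.99)]  # feed has a bad title
nightly_reload(conn, feed)
titles = conn.execute(
    "SELECT intId, strTitle FROM tblProduct ORDER BY intId").fetchall()
```

Only the corrected rows are touched by the UPDATE, so with 300,000 products and a handful of corrections the extra step should stay cheap.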