I can't work out what I should be doing here...
I have a database with around 20,000 records. Each of these records has about 20 columns to it.
I want to add around 20 or so additional columns to this database which would be on the lines of a load of different URLs for each record. Mostly, these will be blank.
What's the "right" way of doing this:
Add 20 additional columns (youtubeurl, facebookurl, etc)
(Benefits: only one URL call // Drawbacks: makes my database much larger)
Add an additional table with three columns - 'ID','URLType','URL' which I can additionally call?
(Benefits: keeps main table much smaller // Drawbacks: additional SQL query required)
What should I be doing?
Everything else being equal, I would go with option (2). This allows you to keep your data normalized and offers flexibility if you need to add more sites in the future.
FWIW, this does not require an extra query to SELECT data, as you can just JOIN to the other table. But of course, it would require extra INSERT / UPDATE queries.
Option 2 is a almost certainly the better option. It makes it easier for you to add new Url types in the future (just invent a new URLType instead of having to create a new column). Pages that use these urls then don't have to be modified to accomodate the new type of URL; they'll just pick it out of the table. In other words, you only have to make a change in one place instead of several.
If people mostly have only a few of these urls, splitting it into a separate table is almost certainly the way to go.
Everything you're adding is a URL. Each URL is related to one (or maybe more) of your current records. So either:
for URLs that have only one record-
urls table with url and FK to records table
or for URLS that can relate to more than one record-
urls table with url_id and url
linking table with record_id and url_id
Related
I have a PHP application which use MySQL database. It has table called profile which store user's details. Now there is a need of keeping snapshot of that profile when he perform a task. Which means whole table row related to a user must be cloned.
I found two ways of doing that.
1) Add another column to table to mention whether it was cloned. Then his original profile can be separated. (original/cloned). Profile data will be maintained in one table.
Other method is ..
2) Add another table similar to profile (with same fields) and store cloned profiles in that. Profile data will be maintained in two tables.
What is most efficient in terms of performance and usability ?
If you have it in one table with just a column/field to distinguish whether it's a clone or original then you will have always double the number of records to handle. Whereas if it is on another table you have only one table to worry about each time unless you need both at one time. Another thing is if you have a separate table you have a virtual back-up for your main table. So, in any case that one is in trouble you have something to fall back on and vice versa. Additionally you don't put yourself in danger of mixing up the records if it is a separate table. In other words I would prefer your second approach rather than the first one.
Apologies if this is redundant, and it probably is, I gave it a look but couldn't find a question here that fell in with what I wanted to know.
Basically we have a table with about ~50000 rows, and it's expected to grow much bigger than that. We need to be able to allow admin users to add in custom data to an item based on its category, and users can just pick which fields defined by the administrators they want to add info to.
Initially I had gone with an item_categories_fields table which pairs up entries from item_fields to item_categories, so admins can add custom fields and reuse them across categories for consistency. item_fields has a relationship to item_field_values which links values with fields, which is how we handled things in .NET. The project is using CAKEPHP though, and we're just learning as we go, so it can get a bit annoying at times.
I'm however thinking of maybe just adding an item_custom_fields table that is essentially the item_id and a text field that stores XMLish formatted data. This is just for the values of the custom fields.
No problems if I want to fetch the item by its id as the required data is stored in the items table, but what if I wanted to do a search based on a custom field? Would a
SELECT * FROM item_custom_fields
WHERE custom_data LIKE '%<material>Plastic</material>%'
(user input related issues aside) be practical if I wanted to fetch items made of plastic in this case? Like how slow would that be?
Thanks.
Edit: I was afraid of that as realistically this thing will be around 400k rows for that one table at launch, thanks guys.
Any LIKE query that starts with % will not use any indexes you have on the column, so the query will scan the whole table to find the result.
The response time for that depends highly on your machine and the size of the table, but it definitely won't be efficient in any shape or form.
Your previous/existing solution (if well indexed) should be quite a bit faster.
What is the "proper" (most normalized?) way to store requests in the database? For example, a user submits an article. This article must be reviewed and approved before it is posted to the site.
Which is the more proper way:
A) store it in in the Articles table with an "Approved" field which is either a 0, 1, 2 (denied, approved, pending)
OR
B) Have an ArticleRequests table which has the same fields as Articles, and upon approval, move the row data from ArticleRequests to Articles.
Thanks!
Since every article is going to have an approval status, and each time an article is requested you're very likely going to need to know that status - keep it inline with the table.
Do consider calling the field ApprovalStatus, though. You may want to add a related table to contain each of the statuses unless they aren't going to change very often (or ever).
EDIT: Reasons to keep fields in related tables are:
If the related field is not always applicable, or may frequently be null.
If the related field is only needed in rare scenarios and is better described by using a foreign key into a related table of associated attributes.
In your case those above reasons don't apply.
Definitely do 'A'.
If you do B, you'll be creating a new table with the same fields as the other one and that means you're doing something wrong. You're repeating yourself.
I think it's better to store data in main table with specific status. Because it's not necessary to move data between tables if this one is approved and the article will appear on site at the same time. If you don't want to store disapproved articles you should create cron script with will remove unnecessary data or move them to archive table. In this case you will have less loading of your db because you can adjust proper time for removing old articles for example at night.
Regarding problem using approval status in each query: If you are planning to have very popular site with high-load for searching or making list of article you will use standalone server like sphinx or solr(mysql is not good solution for this purposes) and you will put data to these ones with status='Approved'. Using delta indexing helps you to keep your data up-to-date.
I'm pretty sure I already know the answer, but would like some confirmation...
We received 220 text files of providers. Each file is a different category of provider. In total there are 3.2 million records.
My inclination is to create a category table and a provider table that links to category by an ID, then index any other columns that may be searched on like state, or even last name.
The other option is to have one table per category, but I think other than the smaller row size there are a lot of disadvantages to this approach.
It's a PHP/MySQL implementation.
Anyone think the separate table option is better for any reason?
Thanks,
D.
Go with two table approach -- categories and providers.
This will enable you to
easily adding new categories
easily reverse search Categories based on a column such as state of provider.
It make sense from data-structure point of view as well. One type of data in one table.
I agree with your original thought, and with Nishant's answer. In addition to his points, it also normalizes the data, and allows easy updates if a category changes names for some reason.
I'm planning a database who has a couple of tables who contain plenty of address information, city, zip code, email address, phone #, fax #, and so on (about 11 columns worth of it), a table is an organizations table containing (up to) 2 addresses (legal contacts and contacts they should actually be used), plus every user has the same information tied to him.
We are going to have to run some geolocation stuff on those addresses too (like every address that's within X Kilometers from another address).
I have a bunch of options, each with its own problem:
I could put all the information inside every table but that would make for tables with a very large amount of columns which I'd have problems indexing, and if I change my address format it'll take a while to fix it.
I could put all the information inside an array and serialize it, then store the serialized information in one field, same problem with the previous method with a little less columns and much less availability through mysql queries
I could create a separate table with address information and link it to the other tables either by
putting an address_id column in the users and organizations table
putting a related_id and related_table columns in the addresses table
That should keep stuff tidier, but it might create some unforeseen problems with excessive joining or whatever.
Personally I think that solution 3.2 is the best, but I'm not too confident about it, so I'm asking for opinions.
Option 2 is definitely out as it would put the filtering logic into your codes instead of letting the DBMS handle them.
Option 1 or 3 will depend on your need.
if you need fast access to all the data, and you usually access both addresses along with the organization information, then you might consider option 1. But this will make it difficult to query out (i.e. slow) if the table get too big in mysql.
option 3 is good provided you index the tables correctly.