Let's assume that you need a table to store settings. For example, I want to store vehicle settings in a table, but there are over 100 settings. Is it better to have a table with 100 columns, or a table with maybe 2 columns (one for the name of the setting and one for its value)?
Each approach has its advantages and disadvantages.
For flexibility, I would go for the vertical approach (one setting per row).
If you are using one setting per row:
It will be easier to add new settings or remove unwanted ones in the future without changing the table schema
You can have a user interface to do this without touching the database
Your clients can add/remove settings without requesting your attention
BUT(s)
You need to remember the setting keywords; there is no IntelliSense
You may need loops or cursors to read the settings
The 100 columns approach
IntelliSense
It's just one record, so it should be faster
No looping, no cursor
BUT(s)
You may have to fill all columns if they are not NULLable
If you change the schema, you may have to change all dependent code
I'm all for normalization, so I would create three tables: Vehicle, Setting and VehicleSetting, the last of which has three columns: vehicle id, setting id and setting value. I actually have this implementation in production. My Setting table also has a default value that applies when the user doesn't specify a value explicitly.
This approach is very convenient if you decide to add a setting in the future. Instead of modifying the table and potentially facing refactoring, you just add another record to the Setting table and you're good to go.
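A minimal sketch of this three-table design, using Python's built-in sqlite3 module as a stand-in for the production database. The column names (vehicle_id, setting_id, default_value, value) and the sample settings are assumptions for illustration. A LEFT JOIN from Setting plus COALESCE returns the stored value when there is one and the setting's default otherwise, and adding a setting is just an INSERT.

```python
import sqlite3

# SQLite stands in for the real database; table and column names are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Vehicle (
    vehicle_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE Setting (
    setting_id    INTEGER PRIMARY KEY,
    name          TEXT NOT NULL UNIQUE,
    default_value TEXT            -- applies when a vehicle has no explicit value
);
CREATE TABLE VehicleSetting (
    vehicle_id INTEGER NOT NULL REFERENCES Vehicle (vehicle_id),
    setting_id INTEGER NOT NULL REFERENCES Setting (setting_id),
    value      TEXT,
    PRIMARY KEY (vehicle_id, setting_id)
);
INSERT INTO Vehicle VALUES (1, 'Truck 1');
INSERT INTO Setting VALUES (1, 'max_speed', '90'), (2, 'color', 'white');
-- Only max_speed is set explicitly for this vehicle.
INSERT INTO VehicleSetting VALUES (1, 1, '80');
""")

# A new setting later is just another row: no ALTER TABLE needed.
conn.execute("INSERT INTO Setting VALUES (3, 'fuel_type', 'diesel')")


def settings_for(vehicle_id):
    """Return {setting name: effective value} for one vehicle."""
    rows = conn.execute(
        """
        SELECT s.name, COALESCE(vs.value, s.default_value)
        FROM Setting AS s
        LEFT JOIN VehicleSetting AS vs
               ON vs.setting_id = s.setting_id AND vs.vehicle_id = ?
        ORDER BY s.setting_id
        """,
        (vehicle_id,),
    ).fetchall()
    return dict(rows)


print(settings_for(1))
# {'max_speed': '80', 'color': 'white', 'fuel_type': 'diesel'}
```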
I don't disagree with Dimitri's answer, but I'll present the other side.
Whether there are 12 settings or 100, look at how often you expect the settings to change.
If each setting is a column, then a new property requires a program change, but the query syntax is simpler. If they are single-value properties, I would argue you are still in third normal form and you get more efficient queries.
If you go with the three tables Dimitri suggested, you have a slightly more complex design, but you gain the ability to add and revise properties at run time. The query will be more complex, with several joins. You could query the Setting table to build your real query. I would certainly use joins rather than the cursor suggested by tcoder.
If you have a .NET or other front end, you could also build up the query by reading from the Setting table. If you are binding to something like a GridView, you will not be able to auto-generate the columns, but again, that is not much work.
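The idea of reading the Setting table to build the real query can be sketched in Python with sqlite3 standing in for the database. It assumes hypothetical Vehicle, Setting and VehicleSetting tables like the ones Dimitri describes. Instead of one join per setting, this sketch uses conditional aggregation (MAX(CASE ...) with GROUP BY), which is one common way to pivot rows into columns; setting names become quoted column aliases and setting ids are bound as parameters.

```python
import sqlite3

# Hypothetical Vehicle / Setting / VehicleSetting tables in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Vehicle (vehicle_id INTEGER PRIMARY KEY);
CREATE TABLE Setting (setting_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE VehicleSetting (
    vehicle_id INTEGER NOT NULL,
    setting_id INTEGER NOT NULL,
    value      TEXT,
    PRIMARY KEY (vehicle_id, setting_id)
);
INSERT INTO Vehicle VALUES (1), (2);
INSERT INTO Setting VALUES (1, 'max_speed'), (2, 'color');
INSERT INTO VehicleSetting VALUES (1, 1, '80'), (1, 2, 'red'), (2, 1, '60');
""")


def build_pivot_query(conn):
    """Read the Setting table and return (sql, params) with one column per setting."""
    select_list = ["v.vehicle_id AS vehicle_id"]
    params = []
    for setting_id, name in conn.execute(
            "SELECT setting_id, name FROM Setting ORDER BY setting_id"):
        alias = '"' + name.replace('"', '""') + '"'  # quote the identifier safely
        select_list.append(
            f"MAX(CASE WHEN vs.setting_id = ? THEN vs.value END) AS {alias}")
        params.append(setting_id)
    sql = (
        "SELECT " + ", ".join(select_list)
        + " FROM Vehicle AS v"
        + " LEFT JOIN VehicleSetting AS vs ON vs.vehicle_id = v.vehicle_id"
        + " GROUP BY v.vehicle_id ORDER BY v.vehicle_id"
    )
    return sql, params


sql, params = build_pivot_query(conn)
cur = conn.execute(sql, params)
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)  # ['vehicle_id', 'max_speed', 'color']
print(rows)     # [(1, '80', 'red'), (2, '60', None)]
```

A front end could use the returned column names to define grid columns at run time instead of relying on auto-generation.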
We are looking to extend MySQL by adding extra columns to each "column". Right now you have the following.
Field, Type, Null, Key, Default, Extra
We want to be able to add an extra column, like Attributes, to the "column" definition. Our system has certain design specifications that require us to describe more data per "column". How can we accomplish this in MySQL?
The query that returns all of the columns is as follows:
SHOW COLUMNS FROM MyDB.MyTable;
EDIT 1
I should have added this to begin with, and I apologize for not doing so. We are currently describing attributes in the Comments section for each column, and we understand that this is a very dirty solution, but it was the only one we could think of at the time. We have built a code generator that revolves around the DB structure, and that is really what this initiative stems from. We want to describe code attributes for a column so the code generator can pick up the changes and refresh the code base on each change or run.
First, terminology: "field" and "column" are basically synonyms in this context. There is no distinction between fields and columns. Some MySQL commands even allow you to use these two words interchangeably (e.g. SHOW FIELDS FROM MyDB.MyTable).
We want to assign attributes to each column in a table. Adding "field_foo" for "field" would repeat the same data over and over again for each row.
Simple answer:
If you want more attributes that pertain to a given column foo, then you should create another table, where foo is its primary key, so each distinct value gets exactly one row. This is part of the process of database normalization. This allows you to have attributes to describe a given value of foo without repeating data, even when you use that value many times in your original table.
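A minimal sketch of that normalization step, using Python's sqlite3 module as a stand-in for MySQL. The table names (my_table, foo_attributes) and attribute columns (label, max_length) are hypothetical. Each distinct foo value gets exactly one row in foo_attributes, and the original table only references it, so the attributes are never repeated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per distinct foo value holds that value's attributes.
CREATE TABLE foo_attributes (
    foo        TEXT PRIMARY KEY,
    label      TEXT,
    max_length INTEGER
);
-- The original table stores only the foo value, as a foreign key.
CREATE TABLE my_table (
    id  INTEGER PRIMARY KEY,
    foo TEXT NOT NULL REFERENCES foo_attributes (foo)
);
INSERT INTO foo_attributes VALUES ('A', 'Type A', 10), ('B', 'Type B', 20);
-- Many rows reuse the same foo value without repeating its attributes.
INSERT INTO my_table (foo) VALUES ('A'), ('A'), ('B'), ('A');
""")

# A join brings the attributes back whenever they are needed.
rows = conn.execute("""
    SELECT t.id, t.foo, a.label, a.max_length
    FROM my_table AS t
    JOIN foo_attributes AS a ON a.foo = t.foo
    ORDER BY t.id
""").fetchall()
for row in rows:
    print(row)
```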
It sounds like you might also need to allow for extensibility and you want to allow new columns at some future time, but you don't know which columns or how many right now. This is a pretty common project requirement.
You might be interested in my presentation Extensible Data Modeling, in which I give an overview of different solutions in SQL for this type of problem.
Extra Columns
Entity-Attribute-Value
Class Table Inheritance
Serialized LOB
Inverted Indexes
Online Schema Changes
Non-Relational Databases
None of these solutions is foolproof; each has its strengths and weaknesses. So it is worth learning about all of them and then deciding which ones have strengths that matter to your specific project, while their weaknesses don't inconvenience you too much (that's the decision process for many software design choices).
We are currently describing attributes in the Comments section for each column type
So you're using something like the "Serialized LOB" solution.
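To illustrate what a code generator has to do with the comment-based (Serialized LOB) approach, here is a small Python sketch that decodes per-column attributes serialized as JSON in column comments. The JSON format and attribute names are hypothetical; in MySQL the comment text itself can be read from the COLUMN_COMMENT column of INFORMATION_SCHEMA.COLUMNS, which this sketch does not query. The database cannot validate or index anything inside these strings, so all interpretation happens in application code.

```python
import json


def parse_column_attributes(column_comments):
    """Decode per-column attributes serialized as JSON in column comments.

    column_comments maps column name -> comment text (hypothetical input).
    Empty comments, free-text comments and non-object JSON all yield {}.
    """
    result = {}
    for column, comment in column_comments.items():
        try:
            attrs = json.loads(comment) if comment else {}
        except json.JSONDecodeError:
            attrs = {}  # a plain human-readable comment, not serialized data
        result[column] = attrs if isinstance(attrs, dict) else {}
    return result


comments = {
    "id": "",
    "email": '{"display_name": "E-mail", "validate": "email"}',
    "notes": "plain human-readable comment",
}
print(parse_column_attributes(comments))
# {'id': {}, 'email': {'display_name': 'E-mail', 'validate': 'email'}, 'notes': {}}
```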
I'm confused as to which of the two db schema approaches I should adopt for the following situation.
I need to store multiple attributes for a website, e.g. page size, word count, category, etc., and the number of attributes may increase in the future. The purpose is to display this table to the user, who should be able to quickly filter and sort the data (so the table structure should support fast querying and sorting). I also want to keep a log of previous data to maintain a timeline of changes. The two table structure options I've thought of are:
Option A
website_attributes
id, website_id, page_size, word_count, category_id, title_id, ... (going up to 18 columns; I have to keep in mind that there might be a few NULL values, and I may also need to add more columns in the future)
website_attributes_change_log
Same table structure as above, with an added column for "change_update_time"
I feel the advantage of this schema is that the queries will be easy to write, even when some attributes are linked to other tables, and sorting will also be simple. The disadvantage, I guess, is that adding columns later can be problematic, with ALTER TABLE taking very long to run on large tables, and there could be many rows with many NULL columns.
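A minimal sketch of Option A with its change-log table, using Python's sqlite3 as a stand-in for MySQL and only a few of the 18 columns. Filling the log from a trigger that copies the previous version of a row before each UPDATE is one possible way to keep the timeline, not the only one; in MySQL the trigger syntax differs slightly (FOR EACH ROW is required, and the mysql client needs a custom DELIMITER).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE website_attributes (
    id          INTEGER PRIMARY KEY,
    website_id  INTEGER NOT NULL,
    page_size   INTEGER,
    word_count  INTEGER,
    category_id INTEGER
);
-- Same structure plus the time of the change.
CREATE TABLE website_attributes_change_log (
    id                 INTEGER,
    website_id         INTEGER,
    page_size          INTEGER,
    word_count         INTEGER,
    category_id        INTEGER,
    change_update_time TEXT NOT NULL
);
-- Copy the old version of a row into the log before it is overwritten.
CREATE TRIGGER log_website_attributes
BEFORE UPDATE ON website_attributes
FOR EACH ROW
BEGIN
    INSERT INTO website_attributes_change_log
    VALUES (OLD.id, OLD.website_id, OLD.page_size, OLD.word_count,
            OLD.category_id, CURRENT_TIMESTAMP);
END;
""")

conn.execute("INSERT INTO website_attributes VALUES (1, 10, 232032, 232, 4)")
conn.execute("UPDATE website_attributes SET word_count = 250 WHERE id = 1")

log = conn.execute(
    "SELECT website_id, word_count, change_update_time"
    " FROM website_attributes_change_log").fetchall()
print(log)  # the previous word_count (232) plus a timestamp

# Sorting stays a plain ORDER BY on a real, typed column.
top = conn.execute(
    "SELECT website_id, page_size FROM website_attributes"
    " ORDER BY page_size DESC").fetchall()
print(top)  # [(10, 232032)]
```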
Option B
website_attribute_fields
attribute_id, attribute_name (e.g. page_size), attribute_value_type (e.g. int)
website_attributes
id, website_id, attribute_id, attribute_value, last_update_time
The advantage here seems to be the flexibility of this approach: I can add columns whenever I want, and I also save on storage space. However, as much as I'd like to adopt this approach, I feel that writing queries will be especially complex when displaying the tables [since I will need to display records for multiple sites at a time, and some attributes will be cross-referenced with values in other tables], and sorting the data might be difficult [given that this is not a column-based approach].
A sample output of what I'd be looking at would be:
Site-A.com, 232032 bytes, 232 words, PR 4, Real Estate [linked to category table], ..
Site-B.com, ..., ..., ... ,...
And the user needs to be able to sort by all the number based columns, in which case approach B might be difficult.
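To make the sorting concern concrete, here is a sketch of Option B in SQLite (via Python's sqlite3, standing in for MySQL) with hypothetical sample values. Producing one row per site requires pivoting each attribute with conditional aggregation, and because attribute_value is a single generic text column, numeric sorting needs an explicit CAST. Sorting the raw text compares strings, which puts '90000' above '232032' in descending order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE website_attribute_fields (
    attribute_id         INTEGER PRIMARY KEY,
    attribute_name       TEXT NOT NULL,
    attribute_value_type TEXT NOT NULL
);
CREATE TABLE website_attributes (
    id               INTEGER PRIMARY KEY,
    website_id       INTEGER NOT NULL,
    attribute_id     INTEGER NOT NULL,
    attribute_value  TEXT,          -- one generic column for every type
    last_update_time TEXT
);
INSERT INTO website_attribute_fields VALUES (1, 'page_size', 'int'),
                                            (2, 'word_count', 'int');
INSERT INTO website_attributes (website_id, attribute_id, attribute_value)
VALUES (1, 1, '232032'), (1, 2, '232'), (2, 1, '90000'), (2, 2, '1500');
""")

# One row per site: every displayed attribute needs its own pivot expression.
rows = conn.execute("""
    SELECT a.website_id,
           MAX(CASE WHEN f.attribute_name = 'page_size'
                    THEN CAST(a.attribute_value AS INTEGER) END) AS page_size,
           MAX(CASE WHEN f.attribute_name = 'word_count'
                    THEN CAST(a.attribute_value AS INTEGER) END) AS word_count
    FROM website_attributes AS a
    JOIN website_attribute_fields AS f ON f.attribute_id = a.attribute_id
    GROUP BY a.website_id
    ORDER BY page_size DESC
""").fetchall()
print(rows)  # [(1, 232032, 232), (2, 90000, 1500)]

# Without the CAST, the text values are compared as strings.
text_order = [r[0] for r in conn.execute("""
    SELECT a.website_id
    FROM website_attributes AS a
    JOIN website_attribute_fields AS f ON f.attribute_id = a.attribute_id
    WHERE f.attribute_name = 'page_size'
    ORDER BY a.attribute_value DESC
""")]
print(text_order)  # [2, 1]
```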
So I want to know if I'd be doing the right thing by going with Option A or whether there are other better options that I might have not even considered in the first place.
I would recommend using Option A.
You can mitigate the pain of long-running ALTER TABLE by using pt-online-schema-change.
MySQL 5.6 and later support many ALTER TABLE operations online, without blocking concurrent writes.
Option B is called Entity-Attribute-Value, or EAV. This breaks rules of relational database design, so it's bound to be awkward to write SQL queries against data in this format. You'll probably regret using it.
I have posted several times on Stack Overflow describing pitfalls of EAV.
Also in my blog: EAV FAIL.
Option A is the better way. Although ALTER TABLE may take a long time when adding an extra column, querying and sorting are quicker. I have used a design like Option A before, and ALTER TABLE didn't take too long even with millions of records in the table.
You should go with Option B because it is more flexible and uses less RAM. With Option A you have to fetch a lot of content into RAM, which increases the chance of page faults. If you want to improve query times, you should definitely index your database to get fast results.
I think Option A is not a good design. When you design a good data model, you should not have to change the tables in the future. If you master the SQL language, writing queries against Option B will not be difficult. It is also the solution to your real problem: "you need to store some attributes (an open-ended number, not a fixed set) of some web pages; therefore, there should be an entity that represents those attributes"
Use Option A, as the attributes are fixed. It will be difficult to query and process data in the second model, since queries will be based on multiple attributes.
Let's take a table Companies with columns id, name and UCId. I'm trying to find companies whose numeric portion of the UCId matches some string of digits.
The UCIds usually look like XY123456, but they're user inputs: the users really seem to love leaving random spaces in them and sometimes don't even enter the XY at all, and they want to keep it that way. In other words, I can't enforce a standard pattern. They want to enter it their way and read it their way as well. So I'm stuck having to use functions in my WHERE clause.
Is there a way to make these queries not take unusably long in MySQL? I know what functions to use and all that; I just need a way to make the search at least relatively fast. Can I somehow create a custom index with the functions already applied to the UCId?
Just for reference, here is an example of the query I'd like to use:
SELECT *
FROM Companies
WHERE digits_only(UCId) = 'some_digits'
I'll just add that the Companies table usually has tens of thousands of rows, and in some instances the query needs to be run repeatedly; that's why I need a fast solution.
Unfortunately, MySQL doesn't have function-based (or, more generally, expression-based) indexes like Oracle or PostgreSQL do. (MySQL 5.7 later added indexable generated columns and MySQL 8.0.13 added functional key parts, but both accept only built-in functions, not stored functions like digits_only.) One possible workaround is to add another column to the Companies table, filled with the normalized values (i.e., digits_only(UCId)). This column can be managed in your code or via DB triggers on INSERT/UPDATE.
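A minimal sketch of the extra-column workaround, with the normalized value maintained in application code, using Python's sqlite3 as a stand-in for MySQL. The digits_only function here is a hypothetical Python re-implementation of the asker's function, and the ucid_digits column and index names are assumptions. Any code path that updates UCId must also refresh ucid_digits; in MySQL, BEFORE INSERT/UPDATE triggers could do that instead.

```python
import re
import sqlite3


def digits_only(ucid):
    """Hypothetical normalizer: keep only the digits, e.g. 'XY 123 456' -> '123456'."""
    return re.sub(r"\D", "", ucid or "")


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Companies (
    id          INTEGER PRIMARY KEY,
    name        TEXT,
    UCId        TEXT,   -- kept exactly as the user typed it
    ucid_digits TEXT    -- normalized copy, maintained by the application
);
CREATE INDEX idx_companies_ucid_digits ON Companies (ucid_digits);
""")


def add_company(name, ucid):
    # Store the raw input and its normalized form together.
    conn.execute(
        "INSERT INTO Companies (name, UCId, ucid_digits) VALUES (?, ?, ?)",
        (name, ucid, digits_only(ucid)))


add_company("Acme", "XY123456")
add_company("Globex", " 12 3456 ")
add_company("Initech", "XY 654321")


def find_by_digits(digits):
    # The WHERE clause compares the indexed column, not a function call.
    return conn.execute(
        "SELECT name, UCId FROM Companies WHERE ucid_digits = ? ORDER BY id",
        (digits,)).fetchall()


print(find_by_digits("123456"))
# [('Acme', 'XY123456'), ('Globex', ' 12 3456 ')]

# The query plan shows that the lookup goes through the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM Companies WHERE ucid_digits = ?",
    ("123456",)).fetchall()
print(plan)
```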
I'm just wondering which is the best practice:
I have a table and I want to hide a record. Should I use a column like visible = 1, or should I create another table and move the data there?
thanks!
I would recommend adding an isHidden field for this purpose. I usually use a TINYINT for this.
Tables don't have performance—queries have performance. When you're trying to decide how to optimize, concentrate on the queries you will run against the table.
It could be worth moving the data to another table, for example if 90% of the data is "hidden" and rows seldom change their hidden state, you could greatly improve performance for queries against the non-hidden data by keeping that table small.
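If you do move hidden rows to another table, the move should be atomic so that a row never ends up in both tables or in neither. A minimal sketch using Python's sqlite3, with hypothetical records and hidden_records tables that share the same columns; with the flag approach, the same operation would instead be a single UPDATE of the isHidden column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records        (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE hidden_records (id INTEGER PRIMARY KEY, title TEXT);  -- same columns
INSERT INTO records VALUES (1, 'a'), (2, 'b'), (3, 'c');
""")


def hide_record(record_id):
    """Move one row from records to hidden_records in a single transaction."""
    with conn:  # commits if both statements succeed, rolls back otherwise
        # SELECT * relies on both tables having identical column order.
        conn.execute(
            "INSERT INTO hidden_records SELECT * FROM records WHERE id = ?",
            (record_id,))
        conn.execute("DELETE FROM records WHERE id = ?", (record_id,))


hide_record(2)
print(conn.execute("SELECT id FROM records ORDER BY id").fetchall())  # [(1,), (3,)]
print(conn.execute("SELECT * FROM hidden_records").fetchall())         # [(2, 'b')]
```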
On the other hand, if you have a mix of queries where you sometimes include and sometimes exclude the "hidden" rows, or rows frequently change their hidden state, it would be more convenient to keep them in the same table.
Both strategies are valid in different circumstances. You need to take all your uses of the data into account.
My target DB is MySQL, but let's try to look at this problem more generically. I am about to create a framework that will modify table structures as needed. Tables may contain hundreds of thousands of records some day. I might add a column, rename a column, or even change the type of a column (let's assume nothing impossible, i.e. I might only change a numeric column into a VARCHAR column), or I might drop a whole column. But in most cases I would just add columns.
Somebody told me that's a very bad idea. He said so loud and clear, but gave no further argument. So what do you think? Bad idea? Why?
In most databases, adding and renaming columns are simple operations that just change the table metadata. You should actually verify that that's the case for the MySQL storage engine you're using, though. Dropping a column should be lightweight too.
By contrast, changing the type of a column is an intensive operation, since it involves actually rewriting the data for each row in the table. The same goes for adding a column and populating it (rather than leaving the new column's values as NULL).
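The difference can be seen in SQLite (via Python's sqlite3), used here only as an illustration: its ALTER TABLE ... ADD COLUMN updates the stored table definition without rewriting existing rows, but it has no way to change a column's type in place, so the usual approach is to create a new table and copy every row. The items table and its columns are hypothetical, and MySQL's storage engines have their own rules for which ALTER TABLE operations rebuild the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, code INTEGER);
INSERT INTO items (code) VALUES (42), (7), (1001);
""")

# Adding a nullable column only changes the table definition;
# existing rows are not rewritten and read the new column as NULL.
conn.execute("ALTER TABLE items ADD COLUMN note TEXT")

# Changing code from INTEGER to TEXT: build a new table and copy every row.
conn.executescript("""
BEGIN;
CREATE TABLE items_new (id INTEGER PRIMARY KEY, code TEXT, note TEXT);
INSERT INTO items_new (id, code, note)
    SELECT id, CAST(code AS TEXT), note FROM items;
DROP TABLE items;
ALTER TABLE items_new RENAME TO items;
COMMIT;
""")

print(conn.execute("SELECT code, typeof(code) FROM items ORDER BY id").fetchall())
# [('42', 'text'), ('7', 'text'), ('1001', 'text')]
```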
It is a bad idea.
You will have client applications using these columns, so whenever you rename or remove any columns, you will break their code.
Changing the type of a column might be necessary sometimes, but that's not something you will have to do often. If the column is numeric and you do calculations with it, then you can't change it to be of type varchar. And if you don't need it for any calculations and you expect non-numeric characters later, why not define it as varchar in first place?
Databases are built to do some things very well: to handle data integrity, referential integrity and transactional integrity, to name just three. Messing with table structures (when they have existing data and relationships) is probably violating all three of those things simultaneously.
If you want to adjust the way your data is presented to the user (limiting visibility of columns, renaming columns etc.), you are much better off doing this by exposing your data via views and altering their definition as needed. But leave the tables alone!
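A minimal sketch of the view-based approach, using Python's sqlite3 and a hypothetical customers table: the view renames one column and hides another, and later only the view definition changes while the table stays as it is. SQLite has no CREATE OR REPLACE VIEW, so the sketch drops and recreates the view; MySQL supports CREATE OR REPLACE VIEW and ALTER VIEW.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id            INTEGER PRIMARY KEY,
    full_name     TEXT,
    email         TEXT,
    password_hash TEXT
);
INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com', 'x1');

-- Presentation layer: rename full_name and hide password_hash.
CREATE VIEW customer_display AS
    SELECT id, full_name AS name, email FROM customers;
""")


def view_columns():
    """Column names currently exposed by the view."""
    cur = conn.execute("SELECT * FROM customer_display")
    cur.fetchall()  # finish the statement before any later DDL
    return [d[0] for d in cur.description]


before = view_columns()  # ['id', 'name', 'email']

# When the presentation changes, only the view is redefined.
conn.executescript("""
DROP VIEW customer_display;
CREATE VIEW customer_display AS
    SELECT id, full_name AS display_name FROM customers;
""")
after = view_columns()   # ['id', 'display_name']
print(before, after)
```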
It is very harmful and should be avoided, as data integrity is disturbed, and your queries will also fail in case you have hard-coded queries or an improper ORM mapping.
If you do it anyway, be cautious, and if it is a live system, take a backup of your DB first.