I'm confused about whether I should combine two tables or keep them separate. The following is just one example of that problem of mine:
tblPhone(Phone_ID, Phone_Number, Phone_Type_ID, Person_ID)
tblPhone(Phone_Type_ID, Phone_Type_Name)
Or should I simply have it as:
tblPhone(Phone_ID, Phone_Number, Phone_Type_Name, Person_ID)
Does one have an advantage over the other? Is there a standard guideline or practice for table creation? For example, I remember someone telling me that if a table doesn't have 3 or more columns, I should just combine it with another table, or something like that... how true is that? I remember a little about the normalization rules, but they can be very confusing at times. I think this falls under third normal form, which is why they should be separated; am I mistaken? Thanks!
You are correct that this illustrates the third normal form.
You are normalizing the Phone_Type_Name out of tblPhone (we assume you mean to call the second tblPhoneType). This is correct and a common practice. Even if you have just two columns in tblPhoneType now, eventually you may need to expand it to include other attributes related to phone types, and that is the easiest way to illustrate why you should normalize it.
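Likely future scenario (need more columns):
tblPhoneType(Phone_Type_ID, Phone_Type_Name, Phone_Type_Min_Price, Phone_Type_Max_Price)
By normalizing it now, you have protected yourself against having to add these columns to tblPhone, where the same type name, minimum price and maximum price would be repeated on every phone row of that type.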
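A minimal sketch of the normalized design, assuming SQLite through Python's built-in sqlite3 module. The second table is called tblPhoneType, as the answer assumes; the phone numbers, Person_IDs and type names are made up for illustration. Renaming a type touches one row, and a join gives back the same view the single-table design would store directly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblPhoneType (
    Phone_Type_ID   INTEGER PRIMARY KEY,
    Phone_Type_Name TEXT NOT NULL UNIQUE
);
CREATE TABLE tblPhone (
    Phone_ID      INTEGER PRIMARY KEY,
    Phone_Number  TEXT NOT NULL,
    Phone_Type_ID INTEGER NOT NULL REFERENCES tblPhoneType(Phone_Type_ID),
    Person_ID     INTEGER NOT NULL
);
-- Hypothetical sample data.
INSERT INTO tblPhoneType VALUES (1, 'Mobile'), (2, 'Home');
INSERT INTO tblPhone VALUES
    (1, '555-0100', 1, 10),
    (2, '555-0101', 1, 11),
    (3, '555-0102', 2, 10);
""")

# The type name is stored once, so renaming it updates exactly one row,
# however many phones use that type.
conn.execute("UPDATE tblPhoneType SET Phone_Type_Name = 'Cell' WHERE Phone_Type_ID = 1")

# A join recovers each phone together with its type name.
rows = conn.execute("""
    SELECT p.Phone_Number, t.Phone_Type_Name
    FROM tblPhone AS p
    JOIN tblPhoneType AS t ON t.Phone_Type_ID = p.Phone_Type_ID
    ORDER BY p.Phone_ID
""").fetchall()
print(rows)  # [('555-0100', 'Cell'), ('555-0101', 'Cell'), ('555-0102', 'Home')]
```

With the single-table alternative, the same rename would be an UPDATE over every phone row carrying the old name, and any row that was missed would disagree with the others.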
I'm making a database for a language dictionary. I have a table of definitions with words in different languages.
DEFINITIONS
-----------
Id
Definition
Language
For example, some records might be:
1->casa->spanish
2->house->english
3->maison->french
...
Now I have to create another table for the relationships, but I don't know how to do it correctly. My application will have roughly 10 languages. I can think of two ways of doing this:
RELATIONSHIPS
-------------
Id
Id_Spanish
Id_English
Id_French
...
That way a single record holds the word in every language. Or this other way:
RELATIONSHIPS
-------------
Id
Id_Language_1
Id_Language_2
and linking the words in pairs, for example:
1(Id) -> 1(Id_Language_1) -> 2(Id_Language_2)
1(Id) -> 1(Id_Language_1) -> 3(Id_Language_2)
...
I have read a lot about many-to-many relationships, but in my case I think the first option (one record with all the languages) is better, although I'm not very sure. Can someone say which they think is best? Thanks.
I would add another column to your primary table
DEFINITIONS
-----------
Id
Definition
Language
WordId
and assign a WordId to each group to indicate that they are all the same word:
1->casa->Spanish->10
2->house->English->10
3->maison->French->10
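A minimal sketch of the WordId design, assuming SQLite through Python's sqlite3 module (the rows are the ones above; the translate helper is hypothetical). A single self-join on WordId handles every language pair, with the words and languages passed in as query parameters:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEFINITIONS (
    Id         INTEGER PRIMARY KEY,
    Definition TEXT NOT NULL,
    Language   TEXT NOT NULL,
    WordId     INTEGER NOT NULL   -- same WordId = same word in another language
);
INSERT INTO DEFINITIONS VALUES
    (1, 'casa',   'Spanish', 10),
    (2, 'house',  'English', 10),
    (3, 'maison', 'French',  10);
""")

def translate(conn, word, source_lang, target_lang):
    """Return the target-language entries that share a WordId with `word`."""
    rows = conn.execute("""
        SELECT dst.Definition
        FROM DEFINITIONS AS src
        JOIN DEFINITIONS AS dst ON dst.WordId = src.WordId
        WHERE src.Definition = ? AND src.Language = ? AND dst.Language = ?
        ORDER BY dst.Id
    """, (word, source_lang, target_lang)).fetchall()
    return [r[0] for r in rows]

print(translate(conn, "casa", "Spanish", "French"))    # ['maison']
print(translate(conn, "house", "English", "Spanish"))  # ['casa']
```

Adding a language here means inserting rows only. The table and the query stay the same.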
Bridge tables only really make sense with dynamic content; your content is static and can easily be defined in rows. Bridge tables require joins, which slow down queries, so I guess one table makes the most sense if what you care about is query speed.
But what if some words are not defined in every language? Then you waste a lot of space in the database, which means a bridge table may make sense.
I am suggesting one really wide table whose rows hold english_word, english_definition, french_word, french_definition, etc.
I've already upvoted Brian's answer but I have enough to add that I think it's worth an additional answer:
Really, stay away from the first idea. It adds a lot of overhead whenever you add a new language, and it means you need to either hardcode many different queries or use dynamically generated SQL to make use of that table.
Your second idea, storing pairwise relationships, is better. At first it might seem harder to write queries against it, but once you get used to it, they will be more general and more straightforward. However, this design requires you to choose between two approaches, each of which has flaws:
Store every possible pair (English/French, English/Spanish, French/Spanish). This makes finding any arbitrary translation relatively simple, but it requires more storage and allows inconsistencies if you look at more than two languages at a time. You also need to decide whether to store each pair in both directions; if not, the queries become somewhat more complex.
Store just enough pairs to establish the equivalencies (e.g. store English/French and French/Spanish), then traverse them when necessary to find any given translation. The simplest way to do this is probably to pick one language that always comes first in each pair (e.g. store Spanish/English and Spanish/French pairs). But even then, you need application logic that knows which language is the central one.
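Here is a minimal sketch of the second approach. It assumes SQLite through Python's sqlite3, makes Spanish the hub, and requires every pair to store the Spanish entry in Id_Language_1. The column names come from the question; the sample rows and the hub_ids/translate helpers are hypothetical. The code branches on whether the source or target language is the hub, and that is the application-side knowledge the answer warns about:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEFINITIONS (
    Id         INTEGER PRIMARY KEY,
    Definition TEXT NOT NULL,
    Language   TEXT NOT NULL
);
-- Every pair stores the hub-language (Spanish) entry in Id_Language_1.
CREATE TABLE RELATIONSHIPS (
    Id            INTEGER PRIMARY KEY,
    Id_Language_1 INTEGER NOT NULL REFERENCES DEFINITIONS(Id),
    Id_Language_2 INTEGER NOT NULL REFERENCES DEFINITIONS(Id)
);
INSERT INTO DEFINITIONS VALUES
    (1, 'casa', 'Spanish'), (2, 'house', 'English'), (3, 'maison', 'French');
INSERT INTO RELATIONSHIPS VALUES (1, 1, 2), (2, 1, 3);
""")

HUB = "Spanish"  # the application must know which language is central

def hub_ids(conn, word, lang):
    """Ids of the hub-language entries equivalent to `word` in `lang`."""
    if lang == HUB:
        sql = "SELECT Id FROM DEFINITIONS WHERE Definition = ? AND Language = ?"
    else:
        sql = """SELECT r.Id_Language_1
                 FROM RELATIONSHIPS AS r
                 JOIN DEFINITIONS AS d ON d.Id = r.Id_Language_2
                 WHERE d.Definition = ? AND d.Language = ?"""
    return [row[0] for row in conn.execute(sql, (word, lang))]

def translate(conn, word, source_lang, target_lang):
    """Translate by hopping source -> hub -> target."""
    if target_lang == HUB:
        sql = "SELECT Definition FROM DEFINITIONS WHERE Id = ? AND Language = ?"
    else:
        sql = """SELECT d.Definition
                 FROM RELATIONSHIPS AS r
                 JOIN DEFINITIONS AS d ON d.Id = r.Id_Language_2
                 WHERE r.Id_Language_1 = ? AND d.Language = ?"""
    results = []
    for hub_id in hub_ids(conn, word, source_lang):
        results.extend(row[0] for row in conn.execute(sql, (hub_id, target_lang)))
    return results

print(translate(conn, "house", "English", "French"))   # ['maison']
print(translate(conn, "maison", "French", "Spanish"))  # ['casa']
```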
If you use the design that Brian suggests, any arbitrary translation from one language to another can be done with the same generic query, just plugging in the desired languages and word.
The second one is definitely better.
If you add a language, you don't need to change the structure
Queries for all language pairs are the same. Language ID is a query parameter. This means better query optimization and less code
Simpler update/delete/insert operations
Somewhat better support for synonyms
Brian's idea is also good, as long as your words have one meaning and one translation. If they have multiple, it needs to be extended.
I've taken over development on a project that has a user table with over 30 columns, and the bad part is that columns keep being changed and added.
This isn't right.
Should I push to have the extra fields moved into a second table as values and create a third table that stores those column names?
user(id, email)
user_field(id, name)
user_value(id, user_field_id, user_id, value)
Do not go the key / value route. SQL isn't designed to handle it and it'll make getting actual data out of your database an exercise in self torture. (Examples: Indexes don't work well. Joins are lots of fun when you have to join just to get the data you're joining on. It goes on.)
As long as the data is normalized to a decent level you don't have too many columns.
EDIT: To be clear, there are some problems that can only be solved with the key / value route. "Too many columns" isn't one of them.
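To make the "self torture" concrete, here is a minimal sketch against the user, user_field and user_value tables from the question. It assumes SQLite through Python's sqlite3, and the city and birth_year attributes and all sample values are hypothetical. Rebuilding an ordinary row takes one join per attribute, and every value comes back as text, so a numeric filter has to cast first. With real columns the same question would just be `SELECT email, city, birth_year FROM user WHERE birth_year > 1985`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE user_field (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE user_value (
    id            INTEGER PRIMARY KEY,
    user_field_id INTEGER NOT NULL REFERENCES user_field(id),
    user_id       INTEGER NOT NULL REFERENCES user(id),
    value         TEXT            -- every attribute is squeezed into one TEXT column
);
INSERT INTO user VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO user_field VALUES (1, 'city'), (2, 'birth_year');
INSERT INTO user_value VALUES
    (1, 1, 1, 'Oslo'), (2, 2, 1, '1980'),
    (3, 1, 2, 'Lima'), (4, 2, 2, '1990');
""")

# One LEFT JOIN per attribute, each keyed by a lookup of the field name,
# plus a CAST because birth_year is stored as text.
rows = conn.execute("""
    SELECT u.email,
           v_city.value                    AS city,
           CAST(v_born.value AS INTEGER)   AS birth_year
    FROM user AS u
    LEFT JOIN user_value AS v_city
           ON v_city.user_id = u.id
          AND v_city.user_field_id = (SELECT id FROM user_field WHERE name = 'city')
    LEFT JOIN user_value AS v_born
           ON v_born.user_id = u.id
          AND v_born.user_field_id = (SELECT id FROM user_field WHERE name = 'birth_year')
    WHERE CAST(v_born.value AS INTEGER) > 1985
""").fetchall()
print(rows)  # [('b@example.com', 'Lima', 1990)]
```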
It's hard to say how many is too many. It's really very subjective. I think the question you should be asking is not, "Are there too many columns?", but, rather, "Do these columns belong here?" What I mean by that is if there are columns in your User table that aren't necessarily properties of the user, then they may not belong. For example, if you've got a bunch of columns that make up the user's address, then maybe you pull those out into an Address table with an FK into User.
I would avoid using key/value tables if possible. They may seem like an easy way to make things extensible, but they're really just a pain in the long run. If you find that your schema is changing very frequently, you may want to consider putting some kind of change control in place so that only the necessary changes get through, or moving to another technology that better supports schema-less storage, such as a NoSQL database like MongoDB or CouchDB.
This is often known as EAV, and whether this is right for your database depends on a lot of factors:
http://en.wikipedia.org/wiki/Entity-attribute-value_model
http://karwin.blogspot.com/2009/05/eav-fail.html
http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back
Too many columns is not really one of them.
Changes and additions to a table are not a bad thing if it means they accurately reflect changes in your business requirements.
If the changes and additions are continual, then perhaps you need to sit down and do a better job of defining the requirements. Now, I can't say whether 30 columns is too many, because it depends on how wide they are and whether they are something that should be moved to a related table. For instance, if you have fields like phone1, phone2, phone3, you have a mess that needs to be split out into a related user_phone table. Or if all your columns are wide (and your overall row width is wider than the pages the database stores data in) and some are not frequently needed by your queries, they might be better in a related table with a one-to-one relationship. I would probably not do this unless you have an actual performance problem, though.
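A minimal sketch of splitting repeating phone columns out into user_phone. It assumes SQLite through Python's sqlite3, and the "before" table user_wide and its sample rows are hypothetical. The INSERT ... SELECT turns each non-NULL phone column into its own row, after which lookups and counts no longer need to check three columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical "before" table with repeating phone columns.
CREATE TABLE user_wide (
    id INTEGER PRIMARY KEY, email TEXT NOT NULL,
    phone1 TEXT, phone2 TEXT, phone3 TEXT
);
INSERT INTO user_wide VALUES
    (1, 'a@example.com', '555-0100', '555-0101', NULL),
    (2, 'b@example.com', '555-0200', NULL, NULL);

-- Related table: one row per phone number, any number of phones per user.
CREATE TABLE user_phone (
    user_id INTEGER NOT NULL REFERENCES user_wide(id),
    phone   TEXT NOT NULL,
    PRIMARY KEY (user_id, phone)
);

-- Copy each non-NULL phone column into its own row.
INSERT INTO user_phone (user_id, phone)
    SELECT id, phone1 FROM user_wide WHERE phone1 IS NOT NULL
    UNION ALL
    SELECT id, phone2 FROM user_wide WHERE phone2 IS NOT NULL
    UNION ALL
    SELECT id, phone3 FROM user_wide WHERE phone3 IS NOT NULL;
""")

# "Who owns this number?" is now a single-column lookup.
owner = conn.execute(
    "SELECT user_id FROM user_phone WHERE phone = ?", ("555-0101",)
).fetchone()[0]

# Phones per user, with no phone1/phone2/phone3 bookkeeping.
counts = conn.execute(
    "SELECT user_id, COUNT(*) FROM user_phone GROUP BY user_id ORDER BY user_id"
).fetchall()
print(owner, counts)  # 1 [(1, 2), (2, 1)]
```

After the split, a fourth phone number is just another row and needs no new column.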
However, of all the possible choices, the EAV model you described is the worst one from both a maintainability and a performance viewpoint. It is very hard to write decent queries against this model.
This really depends on what you're trying to do.
I'm working on a forum-like webapp where I'd like to allow users to favourite an item so that they can keep track of it, and also so that others can see how many times an item's been favourited.
The problem is, I'm unsure of the best practices for databases, including for this situation.
I have two ideas in my head on how to do this:
Add an extra column to the user table and store things like so: "|2|5|73|"
Add an extra table with at least two columns, one for referencing an item, the other for referencing a user.
I feel uncomfortable about going for the second method as it involves an extra table, and potentially more queries would be required. Perhaps these beliefs aren't an issue, as I have little understanding of databases beyond simply working with table layouts and basic queries.
The second method, commonly called a junction or join table, is fairly standard practice and is going to be far more efficient than adding a column like the one you describe to the user table. Through the magic of JOINs you won't be making any extra queries.
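A minimal sketch of the junction-table approach, assuming SQLite through Python's sqlite3. The users/items/favourites names and the sample data are hypothetical, and the item ids mirror the "|2|5|73|" example. Both "how many times was each item favourited" and "what has this user favourited" are single queries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Junction table: one row per (user, item) favourite.
CREATE TABLE favourites (
    user_id INTEGER NOT NULL REFERENCES users(id),
    item_id INTEGER NOT NULL REFERENCES items(id),
    PRIMARY KEY (user_id, item_id)   -- a user can favourite an item only once
);
INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
INSERT INTO items VALUES (2, 'Intro thread'), (5, 'FAQ'), (73, 'Off-topic');
INSERT INTO favourites VALUES (1, 2), (1, 5), (1, 73), (2, 5);
""")

# Favourite count per item; LEFT JOIN keeps items nobody favourited (count 0).
counts = conn.execute("""
    SELECT i.id, i.title, COUNT(f.user_id) AS favourites
    FROM items AS i
    LEFT JOIN favourites AS f ON f.item_id = i.id
    GROUP BY i.id, i.title
    ORDER BY i.id
""").fetchall()

# Everything one user has favourited, also a single query.
anns = [row[0] for row in conn.execute("""
    SELECT i.title
    FROM favourites AS f
    JOIN items AS i ON i.id = f.item_id
    WHERE f.user_id = ?
    ORDER BY i.id
""", (1,))]

print(counts)  # [(2, 'Intro thread', 1), (5, 'FAQ', 2), (73, 'Off-topic', 1)]
print(anns)    # ['Intro thread', 'FAQ', 'Off-topic']
```

With a "|2|5|73|" string column, the count per item would need string matching across every user row. In the junction table it is an indexable join, and the primary key also stops duplicate favourites.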
Since it sounds like your app is starting to get a little complex, I highly recommend picking up a MySQL database book at your local library or book store (check for reviews on Amazon to find a good one) and expanding your knowledge.
Well, I'd have +1-ed the other response, but apparently I'm too much of a newb. But yes, I recommend a join table for this type of thing.
Overview (sorry it's vague; I think if I went into more detail it would just overcomplicate things)
I have three tables, table one contains an id, table two contains its own id and table one's id and table three contains its own id and table two's id.
I have spent a lot of time pondering this, and I think it would be more efficient for table three to also contain the related table one's id:
- It will mean I will not have to join three tables; I can just query table three (for a query that will be used very often).
- It will allow me to implement a reservation system more easily by only locking rows within table three that contain a specific id from table one.
For anyone who wants to know more about the database layout there is more info here
Question
What are the disadvantages of denormalisation? I have seen some people who are completely against it and others who believe that, in the right situation, it is a useful tool. The ids will never change, so I do not really see any disadvantage other than having to insert the same data twice and the additional space it will consume (which, as it is just ids, will surely be negligible).
My advice is to follow this general rule: Normalise by default, then denormalise if and when you identify a performance problem which it will solve.
I find normalised data, and code dealing with it, easier and more logical to maintain. I don't think there is any problem using denormalisation to improve performance, but I would not speculatively apply any performance optimisation that reduces maintainability until you are sure it is necessary.
The only time you really want to denormalize is if it's required to get the performance you want.
This was already asked several times. See here
As it's a one (Table 1) to many (Table 2), with another one (Table 2) to many (Table 3), I would keep the same structure, as there seem to be 3 layers there.
e.g.
Table 1 -> Table 2 -> Table 3 (each arrow one-to-many)
Also, a lot will depend on what additional fields you are storing within those tables.
Every rule might be broken if there is a good reason for it.
In your case I wonder what the three tables contain. Does Table three really describe Table two or does it describe table one directly?
The disadvantage of having its own id, table-two-id and table-one-id in table three is that it can lead to inconsistency: what if you have table-one-id 1 in table two and table-one-id 15 in table three by mistake?
It depends on the data and the entity relationships in your data. For me, it would be more important to have no inconsistencies and to spend a little more time on selection...
EDIT: After reading about your tables, I would suggest adding a table-one-id to table three (areas), because table-one-id doesn't change after all, and for that reason it's relatively safe from inconsistency.
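A minimal sketch of the inconsistency described above and a query that detects it. It assumes SQLite through Python's sqlite3, and the table and column names (table_one, table_two, table_three, table_one_id, table_two_id) and sample ids are hypothetical stand-ins for the question's vague layout. The copied table_one_id in table three is compared with the one reachable through table two:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_one (id INTEGER PRIMARY KEY);
CREATE TABLE table_two (
    id           INTEGER PRIMARY KEY,
    table_one_id INTEGER NOT NULL REFERENCES table_one(id)
);
-- Denormalised: table_one_id is copied into table_three as well.
CREATE TABLE table_three (
    id           INTEGER PRIMARY KEY,
    table_two_id INTEGER NOT NULL REFERENCES table_two(id),
    table_one_id INTEGER NOT NULL REFERENCES table_one(id)
);
INSERT INTO table_one VALUES (1), (15);
INSERT INTO table_two VALUES (100, 1);
INSERT INTO table_three VALUES
    (1000, 100, 1),    -- agrees with table_two
    (1001, 100, 15);   -- contradicts table_two: the mistake from the answer
""")

# Rows whose copied table_one_id disagrees with the one reachable via table_two.
bad = conn.execute("""
    SELECT t3.id, t3.table_one_id AS copied, t2.table_one_id AS actual
    FROM table_three AS t3
    JOIN table_two AS t2 ON t2.id = t3.table_two_id
    WHERE t3.table_one_id <> t2.table_one_id
""").fetchall()
print(bad)  # [(1001, 15, 1)]
```

Both bad rows pass their individual foreign keys, so only a cross-table check like this (or a constraint spanning both columns) will catch the contradiction.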
Normalization vs. efficiency is usually a trade-off. While normalization is generally a good thing, it is not a silver bullet. If you have a clear reason (as it seems you do), denormalization is perfectly acceptable.
Schemas containing less than fully normalized tables suffer from what is called "harmful redundancy". Harmful redundancy can result in storing the same fact in more than one place, or in not having any place to store a fact that needs to be stored. These problems are known as "insert anomalies", "update anomalies", or "delete anomalies".
To make a long story short, if you store a fact in more than one place, then sooner or later you are going to store mutually contradictory facts in the two places, and your database will begin to give contradictory answers, depending on which version of the facts the query found.
If you are forced to "invent a dummy record" in order to have a place to store a needed fact, then sooner or later you are going to write a query that mistakenly treats the dummy record like a real one.
If you are a super programmer, and you never make mistakes, then you don't have to worry about the above. I never met such a programmer, although I've met lots of people who think they never make mistakes.
I would refrain from "denormalizing" as a practice. That's like "driving away from Chicago". You still don't know where you are going. However, there are times when normalization rules should be disregarded, as others have noted. If you are designing a star schema (or a snowflake schema) you are going to have to disregard some of the normalization rules in order to get the best star (or snowflake).
In my database I currently have two tables that are almost identical except for one field.
To explain quickly: in my project, each year businesses submit to me a list of the suppliers that they sell to and also purchase things from. Since this is done on an annual basis, I have a table called sales and one called purchases.
So in the sales table, I would have the fields like: BusinessID, year, PurchaserID, etc. And the complete opposite would be in the purchases table, except that there would be a SellerID.
So basically both tables have exactly the same fields except for PurchaserID/SellerID. I inherited this system, so I did not design the DB this way. I'm debating combining the two tables into one table called suppliers and just adding a type field to distinguish whether they are selling to or purchasing from.
Does this sound like a good idea? Is there something I'm missing in regards to why this wouldn't be a good idea?
Do what works for you.
The textbook answer is to normalize. If you normalized, you would probably have 2 tables: one with both your buyers and sellers as companies, and a transactions table recording who bought what from whom.
If it ain't broke, don't fix it. Leave them separate.
Since the system is already built, I would only consider this if you find yourself doing a lot of queries across the two tables, like big nasty UNION queries. Combining the two tables into one makes queries like "show me all sellers or purchasers who sold/bought between these dates..." much easier.
But it sounds like these two groups are treated very differently from a business-rule perspective, so it's probably not worth the trouble to make application changes at this point. (Every query would need a "WHERE Type = 1" or something like that.)
If you'd have asked this during the db design phase, my answer might be different.
Normalization would say "yes".
How many applications are affected by this change? That would affect the decision.
Definitely one table. And I wouldn't call it supplier, since that does not reflect the meaning of the table. Something like business_partner, or something better than that, might be more appropriate. Instead of purchaser_id and seller_id, be more generic, like business_partner_id, and yes, add a field to distinguish the two.
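A minimal sketch of the merged table with a type field, assuming SQLite through Python's sqlite3. The business_partner table, its partner_type values and the sample rows are hypothetical. A date-range question across both directions no longer needs a UNION, but every direction-specific query now needs the type filter mentioned in an earlier answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical merged table; partner_type records the direction of trade.
CREATE TABLE business_partner (
    business_id         INTEGER NOT NULL,
    year                INTEGER NOT NULL,
    business_partner_id INTEGER NOT NULL,
    partner_type        TEXT NOT NULL CHECK (partner_type IN ('sold_to', 'bought_from'))
);
INSERT INTO business_partner VALUES
    (1, 2010, 20, 'sold_to'),
    (1, 2011, 30, 'bought_from'),
    (2, 2011, 20, 'bought_from');
""")

# All partners, in either direction, within a year range: no UNION needed.
partners = [r[0] for r in conn.execute(
    "SELECT DISTINCT business_partner_id FROM business_partner "
    "WHERE year BETWEEN ? AND ? ORDER BY business_partner_id",
    (2011, 2012),
)]

# Direction-specific queries must always remember the type filter.
sold_to = conn.execute(
    "SELECT business_id, business_partner_id FROM business_partner "
    "WHERE partner_type = 'sold_to'"
).fetchall()

print(partners, sold_to)  # [20, 30] [(1, 20)]
```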
Not one table. They are different entities that have a similar structure. There's nothing to be gained by consolidating them. (Nothing lost, either, except lucidity; but that's critical IMHO).
"Normalization" doesn't include looking for tables with similar schemas, and merging them.
A database is always a limited model of your business objective. If it doesn't make sense for your business, ignore those who say you should add complexity to your data model by creating a new companies table (though you probably already have something similar). If you really want to get into the "perfect model" game, just start abstracting everything away into an "entities" table and pretty soon you will have a completely unmanageable database.
Normalization would dictate that you NOT combine the two fields, unless the foreign keys actually point to the same table. A key rule to keep in mind is that each column in a table should only mean one thing. Adding a second field that explains what the first field means breaks this rule.
If your queries are getting to be a mess because you are always joining the two tables, you could create a view.
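A minimal sketch of that view, assuming SQLite through Python's sqlite3. The two tables keep only the columns named in the question, and the view name all_partners, its PartnerID/kind columns and the sample rows are hypothetical. UNION ALL keeps every row, whereas plain UNION would also drop duplicates, and a literal column records which table each row came from:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The two existing tables, as described in the question (other columns omitted).
CREATE TABLE sales     (BusinessID INTEGER, year INTEGER, PurchaserID INTEGER);
CREATE TABLE purchases (BusinessID INTEGER, year INTEGER, SellerID INTEGER);
INSERT INTO sales     VALUES (1, 2011, 20);
INSERT INTO purchases VALUES (1, 2011, 30), (2, 2010, 20);

-- One view over both tables; the underlying tables stay unchanged.
CREATE VIEW all_partners AS
    SELECT BusinessID, year, PurchaserID AS PartnerID, 'sale'     AS kind FROM sales
    UNION ALL
    SELECT BusinessID, year, SellerID    AS PartnerID, 'purchase' AS kind FROM purchases;
""")

rows = conn.execute(
    "SELECT BusinessID, PartnerID, kind FROM all_partners WHERE year = ? ORDER BY PartnerID",
    (2011,),
).fetchall()
print(rows)  # [(1, 20, 'sale'), (1, 30, 'purchase')]
```

Queries that span both tables can target the view, and existing code that reads sales or purchases directly keeps working.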
Also, the number of records in the table is almost completely irrelevant. Always optimize for performance after you have the system in place. If having all the records in one table is killing your application, set a clustered index on a column that partitions your table in a meaningful way.
You must take into consideration the number of records in both tables. If they are too big, it could have a big impact on queries that have multiple joins to customers and suppliers.
Example: Who sold computers to us, and to whom did we sell them?
From a completely different point of view: I tend to consider logic over technology. To me, the decision is not whether the data is similar in shape or fields, but whether it makes sense to mix them. That is to say, while the technical answer might be to normalize, my answer would be: does it make sense to you (business logic) to have both together?
Another answer talks about merging both and changing naming conventions. To me that is a logic decision: you are saying that you don't work with buyers and sellers, but with business partners. If that is your case, then do it.
You might also consider how you will use the tables. If they are of one logical type (business partner), you will surely have queries that need to access both buyers and sellers. Otherwise, if all your queries are separate, that might be an indication that they are not the same and should not be held together. Pushing them together will mean a lot of extra checks and CPU time spent telling apart what were separate entities.
There is a long-used metaphor about interfaces that might apply here. Just because a gun and a camera both shoot, that does not mean they share an interface, unless you like playing Russian roulette.
From a logical view, there seems to be no difference between the reported transactions, it is just a difference in who reports it to you. It should be a single table with SellerID, BuyerID, and (if you need it) ReporterID(s) (and perhaps additional transaction information).
This is how it should be. Now, how to make the transition? Making a script that uses the two old tables to fill a new table should be an easy exercise, but then you also need to change all the queries that use the information. This is likely a lot of work, and might not be worth the effort.
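A minimal sketch of such a migration script, assuming SQLite through Python's sqlite3. The sales/purchases columns follow the question, while the transactions table and the sample rows are hypothetical. In sales the reporting business is the seller; in purchases it is the buyer, so each SELECT maps the columns accordingly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales     (BusinessID INTEGER, year INTEGER, PurchaserID INTEGER);
CREATE TABLE purchases (BusinessID INTEGER, year INTEGER, SellerID INTEGER);
INSERT INTO sales     VALUES (1, 2011, 20);               -- business 1 sold to 20
INSERT INTO purchases VALUES (20, 2011, 1), (1, 2011, 30); -- 20 bought from 1; 1 bought from 30

-- Hypothetical target table: who sold, who bought, and who reported it.
CREATE TABLE transactions (
    year       INTEGER NOT NULL,
    SellerID   INTEGER NOT NULL,
    BuyerID    INTEGER NOT NULL,
    ReporterID INTEGER NOT NULL
);

-- Fill it from the two old tables in one statement.
INSERT INTO transactions (year, SellerID, BuyerID, ReporterID)
    SELECT year, BusinessID, PurchaserID, BusinessID FROM sales
    UNION ALL
    SELECT year, SellerID, BusinessID, BusinessID FROM purchases;
""")

# A sale reported by both sides becomes two rows with the same
# (SellerID, BuyerID) pair; ReporterID records which side each report came from.
pairs = conn.execute("""
    SELECT SellerID, BuyerID, COUNT(*) AS reports
    FROM transactions
    GROUP BY year, SellerID, BuyerID
    ORDER BY SellerID, BuyerID
""").fetchall()
print(pairs)  # [(1, 20, 2), (30, 1, 1)]
```

The data move is the easy part. As the answer says, the bigger cost is updating every query that still reads the old tables.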
Since none of the experts reporting in are willing to answer your question, the simple answer is: query1 UNION query2
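For example:
SELECT * FROM table1 UNION SELECT * FROM table2
assuming table1 and table2 have the same structure (the same number of columns, in the same order, with compatible types). Note that UNION removes duplicate rows; UNION ALL keeps them.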