Suppose you query a database for an object using three fields.
One of these fields must always be an association's ID.
Concerning the other two fields, "only one needs to be true".
What interpretation do you take or make of "only one needs to be true"?
I'd read that as OR, only one needs to be true, but I won't object if the other is too.
If I meant EXCLUSIVE OR I would say "exactly one must be true".
However, the only way to be sure what was intended is to ask the author, who may well surprise you by telling you about yet a further condition ;-(
I read this as meaning that the original search specified the condition that one field had to be assoc, plus 2 more conditions for the remaining two fields, but a selection can be made if only one of these two (plus the assoc one) is true.
Well, I would read it as "one of the other two fields must match the search criterion". Beyond that, I would need more information.
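The two readings only differ for rows where both optional conditions hold. Here is a minimal sketch of the difference, assuming a hypothetical table `objects` with columns `assoc_id`, `field_a` and `field_b` (none of these names come from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects ("
    "id INTEGER PRIMARY KEY, assoc_id INTEGER, field_a TEXT, field_b TEXT)"
)
conn.executemany(
    "INSERT INTO objects (assoc_id, field_a, field_b) VALUES (?, ?, ?)",
    [
        (1, "x", "y"),  # id 1: both optional fields match
        (1, "x", "n"),  # id 2: only field_a matches
        (1, "n", "y"),  # id 3: only field_b matches
        (1, "n", "n"),  # id 4: neither matches
        (2, "x", "y"),  # id 5: wrong association
    ],
)
params = {"assoc": 1, "a": "x", "b": "y"}

# Reading 1: inclusive OR ("only one needs to be true, both is fine too").
inclusive = [row[0] for row in conn.execute(
    "SELECT id FROM objects "
    "WHERE assoc_id = :assoc AND (field_a = :a OR field_b = :b) ORDER BY id",
    params,
)]

# Reading 2: exclusive OR ("exactly one must be true").
exclusive = [row[0] for row in conn.execute(
    "SELECT id FROM objects "
    "WHERE assoc_id = :assoc "
    "AND ((field_a = :a AND field_b <> :b) OR (field_a <> :a AND field_b = :b)) "
    "ORDER BY id",
    params,
)]
```

With this sample data, row 1 is returned by the inclusive query only, which is exactly the case you would need the author to decide.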
I'm confused about whether I should combine two tables or leave them separated. The following is just one instance of my problem:
tblPhone(Phone_ID, Phone_Number, Phone_Type_ID, Person_ID)
tblPhone(Phone_Type_ID, Phone_Type_Name)
or
I should simply have it as:
tblPhone(Phone_ID, Phone_Number, Phone_Type_Name, Person_ID)
Does one have advantage over the other? Is there like a standard guideline or practice for table creation? For example, I remember someone telling me if a table isn't 3 or more, just combine it to another table, something like that...how true is that? I remember a little bit of normalization rules...but it can be very confusing at times. I think this falls on the third normalization rule that's why they should be separated, am I mistaken? Thanks!
You are correct that this illustrates the third normal form.
You are normalizing the Phone_Type_Name out of tblPhone (we assume you mean to call the second tblPhoneType). This is correct and a common practice. Even if you have just two columns in tblPhoneType now, eventually you may need to expand it to include other attributes related to phone types, and that is the easiest way to illustrate why you should normalize it.
Likely future scenario (need more columns):
By normalizing it now, you have protected yourself against this:
tblPhone(Phone_Type_ID, Phone_Type_Name, Phone_Type_Min_Price, Phone_Type_Max_Price)
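As a rough sketch of the normalized layout, assuming the second table is named `tblPhoneType` as the answer suggests (the column types and sample rows are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
CREATE TABLE tblPhoneType (
    Phone_Type_ID   INTEGER PRIMARY KEY,
    Phone_Type_Name TEXT NOT NULL UNIQUE
);
CREATE TABLE tblPhone (
    Phone_ID      INTEGER PRIMARY KEY,
    Phone_Number  TEXT NOT NULL,
    Phone_Type_ID INTEGER NOT NULL REFERENCES tblPhoneType (Phone_Type_ID),
    Person_ID     INTEGER NOT NULL
);
INSERT INTO tblPhoneType VALUES (1, 'Mobile'), (2, 'Home');
INSERT INTO tblPhone VALUES
    (1, '555-0100', 1, 10),
    (2, '555-0101', 1, 11),
    (3, '555-0102', 2, 10);
""")

# Renaming a phone type touches a single row in tblPhoneType ...
conn.execute("UPDATE tblPhoneType SET Phone_Type_Name = 'Cell' WHERE Phone_Type_ID = 1")

# ... and every phone of that type picks up the new name through the join.
rows = conn.execute("""
    SELECT p.Phone_Number, t.Phone_Type_Name
    FROM tblPhone AS p
    JOIN tblPhoneType AS t ON t.Phone_Type_ID = p.Phone_Type_ID
    ORDER BY p.Phone_ID
""").fetchall()
```

In the single-table variant the same rename would have to update every phone row that carries the old name, and nothing would stop two rows from spelling the type differently.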
I have an uncommon database design problem I'm not sure how to handle properly. There is a table called profile storing a website user's public profile information. However, every profile can belong to either a single person or a couple, so I need an additional child table called person to store person-specific data. Every profile entity must have at least one but no more than two person child entities.
What is the best (in terms of being "kosher" and/or performance) way to model such a relationship? Should I go with a regular one-to-many and enforce the number of children programmatically or with stored procedures? Or should I just create two foreign key fields in the parent table and allow null for one of them? Maybe there's another way I can't think of?
Edit: Additional info in response to Gordon's questions
A person can be related to only one profile and there can't be a person without a profile. Perhaps the name person is confusing, as it may suggest that a person has profile, while in fact it's the profile that has person information.
In case of couple profiles both persons are equal. Due to the site's specifics, the limit of 2 will never change. However, it should be possible to add or remove a person (to make a single-person profile a couple profile and vice versa), but there can never be fewer than 1 or more than 2 persons.
The person data would never be fetched without the profile data but the profile data could sometimes be fetched without the person data.
1)
The solution with two fields:
PRO: Allows you to precisely restrict both minimal and maximal number of people per profile.
CON: Would allow a profile-less person.
CON: Would require 2 indexes (1 on each field) to efficiently get the profile of a given person, taking additional space and potentially slowing down INSERT/UPDATE/DELETE.
2)
But if you are willing to enforce the minimal number at the application level, you might be better off with something like this:
CHECK(PERSON_NO = 1 OR PERSON_NO = 2)
Characteristics:
CON: Allows a person-less profile.
PRO: Restricts maximal number of people per profile, yet easy to change by just modifying the CHECK.
PRO: If you keep the identifying relationship as above, it doesn't require additional indexes and is clustering-friendly (persons of the same profile can be stored physically close together, minimizing I/O during JOIN).
On the other hand, if you have a key PERSON_ID (or similar), then an additional index on {PROFILE_ID, PERSON_NO} would be necessary for the efficient enforcement of key constraint on these fields too.
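The diagram for option 2) is not reproduced here, so the following is only a sketch of what such a design might look like. It assumes a PERSON table identified by {PROFILE_ID, PERSON_NO}, and the TITLE and NAME columns are hypothetical. It uses SQLite for the demonstration. On MySQL, note that CHECK constraints are only enforced from version 8.0.16 onward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE PROFILE (
    PROFILE_ID INTEGER PRIMARY KEY,
    TITLE      TEXT NOT NULL               -- hypothetical column
);
CREATE TABLE PERSON (
    PROFILE_ID INTEGER NOT NULL REFERENCES PROFILE (PROFILE_ID),
    PERSON_NO  INTEGER NOT NULL CHECK (PERSON_NO = 1 OR PERSON_NO = 2),
    NAME       TEXT NOT NULL,              -- hypothetical column
    PRIMARY KEY (PROFILE_ID, PERSON_NO)    -- identifying relationship
);
INSERT INTO PROFILE VALUES (1, 'Alice and Bob');
""")

def try_insert_person(profile_id, person_no, name):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO PERSON VALUES (?, ?, ?)", (profile_id, person_no, name))
        return True
    except sqlite3.IntegrityError:
        return False

first = try_insert_person(1, 1, "Alice")       # accepted
second = try_insert_person(1, 2, "Bob")        # accepted
third = try_insert_person(1, 3, "Carol")       # rejected by the CHECK
same_slot = try_insert_person(1, 2, "Dave")    # rejected by the primary key
orphan = try_insert_person(99, 1, "Eve")       # rejected by the foreign key
```

The CHECK together with the composite key caps a profile at two persons, and the foreign key rules out a profile-less person. Nothing here prevents a profile with zero persons, which is the CON listed above.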
3)
Theoretically, you could even combine the two approaches and avoid both profile-less persons and person-less profiles:
(PERSON1_ID is not NULL-able, PERSON2_ID is NULL-able)
However, this leads to circular references, requiring deferred constraints to resolve, which are unfortunately not supported by MySQL.
4)
And finally, you could just take a brute-force approach and simply place fields of both persons in the profile table (and make one of these sets not NULL-able and the other NULL-able).
Out of all these possibilities, I'd probably go with 2).
You essentially have two options, as you mention in your question. You can store two fields in the table. Or, you can have a second table that has the mapping information.
Here are some additional questions to help you answer the question:
Can a person have their own profile and a profile as part of a couple?
Are both people on a profile "equal" or is one the "master" and the other an "alternate"?
When you fetch profile information, will you always be including information about all people on the profile?
Can you have persons without profiles?
In this case, I just have the sneaky suspicion that the limit of "2" may change in the future. This suggests storing the mapping in a separate table, since increasing "2" by adding a field is a problem in terms of modifying existing code. In other words, create a separate table, person-profile, that maps persons to profiles. In MySQL, you can always gather the person-level information using GROUP_CONCAT().
One case where it is better to put such similar fields in the same table is when one is clearly preferred and the other is the alternate. In that case, you are doing a lot of "coalesce(, )" type of logic.
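To illustrate the mapping-table idea, here is a small sketch. The table and column names are assumptions, and it runs on SQLite, whose `GROUP_CONCAT(expr, separator)` differs slightly from MySQL's `GROUP_CONCAT(expr SEPARATOR ', ')`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE profile (profile_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE person  (person_id  INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE person_profile (
    profile_id INTEGER NOT NULL REFERENCES profile (profile_id),
    -- UNIQUE because, per the question, a person belongs to only one profile
    person_id  INTEGER NOT NULL UNIQUE REFERENCES person (person_id),
    PRIMARY KEY (profile_id, person_id)
);
INSERT INTO profile VALUES (1, 'Couple'), (2, 'Single');
INSERT INTO person  VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
INSERT INTO person_profile VALUES (1, 1), (1, 2), (2, 3);
""")

# One row per profile, with the names of all its persons collapsed into a string.
rows = conn.execute("""
    SELECT pr.profile_id, COUNT(pp.person_id), GROUP_CONCAT(pe.name, ', ')
    FROM profile AS pr
    LEFT JOIN person_profile AS pp ON pp.profile_id = pr.profile_id
    LEFT JOIN person AS pe ON pe.person_id = pp.person_id
    GROUP BY pr.profile_id
    ORDER BY pr.profile_id
""").fetchall()

# The order inside the concatenated string is not guaranteed, so sort after splitting.
people_by_profile = {pid: sorted(names.split(", ")) for pid, _count, names in rows}
person_counts = {pid: count for pid, count, _names in rows}
```

The limit of two persons per profile is not enforced by this layout and would have to be checked elsewhere.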
I'll try to make it easy by explaining an example.
So, I consolidate data from two sources, namely 1 and 2. Each source has a column "number" whose values are unique within that source. But once 1 and 2 are consolidated (they have to be), the numbers can no longer be guaranteed to be unique. So when consolidating 1 and 2, I created a column named "source" and tagged each row with its source name (1 or 2). Therefore, if I want to look for a specific "number", I submit a query that looks for the desired number AND source.
Is there a better way to do this? It is working just fine because my database is small, but will this work well (i.e. fast, efficiently, etc.) as the DB grows? I mean, it won't have one million entries in the next few years, but I'd still like to do it in an optimal manner.
The only other way I can think about is to keep separate "number" columns for different sources and query the appropriate columns.. but this will require additional columns to be added as I get additional sources. Hm.. what to do?
Your method should work just fine without causing any perceivable slowdown.
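One way to keep the lookup fast as the table grows is a composite unique constraint on the pair, which also documents that a number is only unique within its source. This is a sketch with made-up table and column names (`records`, `payload`), not something the question specifies:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE records (
        id      INTEGER PRIMARY KEY,
        source  INTEGER NOT NULL,
        number  INTEGER NOT NULL,
        payload TEXT,
        UNIQUE (source, number)   -- also creates an index on (source, number)
    )
""")
conn.executemany(
    "INSERT INTO records (source, number, payload) VALUES (?, ?, ?)",
    [(1, 100, "from source 1"), (2, 100, "from source 2"), (1, 101, "another row")],
)

lookup = "SELECT payload FROM records WHERE source = ? AND number = ?"
payload = conn.execute(lookup, (2, 100)).fetchone()[0]

# Ask SQLite how it would run the lookup; the last column holds the plan text.
plan = conn.execute("EXPLAIN QUERY PLAN " + lookup, (2, 100)).fetchall()
uses_index = any("USING INDEX" in row[-1] for row in plan)

# The same number may repeat across sources, but not within one source.
try:
    conn.execute("INSERT INTO records (source, number) VALUES (1, 100)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Adding a third source then needs no schema change, only rows with a new `source` value.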
I am wondering about a 'many to two' relationship. The child can be linked to either of two parents, but not both. Is there any way to enforce this? Also I would like to prevent duplicate entries in the child.
A real world example would be phone numbers, users and companies. A company can have many phone numbers, a user can have many phone numbers, but ideally the user shouldn't provide the same phone number as the company as there would be duplicate content in the DB.
This question shows that you don't fully understand entity relationships (no rudeness intended). There are four types (technically only three), listed below:
One to One
One to Many
Many to One
Many to Many
One to One (1:1):
In this case a table has been broken up into two parts, either to comply with normalisation or, more usually, with the open-closed principle.
Normalisation compliance: You might have a business rule that each customer has only one account. Technically, you could in this case say customer and account could all be in the same table, but this breaks the rules of normalisation, so you split them and make a 1:1.
Open-closed principle compliance: A customer table might have id, first and last names, and address. Later someone decides to add a date of birth, and with it the ability to calculate age, along with a bunch of other much-needed fields. This is an oversimplified example of one to one, but the main use for it is to extend your database without breaking existing code. Much code written (sadly) is tightly coupled to the database, so changes in the structure of a table will break the code. Adding a 1:1 like this will extend the table to meet new requirements without modifying the original, thereby allowing old code to continue functioning normally and new code to make use of the new db features.
The downside of normalisation and of extending tables using 1:1 relationships in this way is performance. Often on heavily used systems, the first target for increasing database performance is de-normalising and combining such tables into a single table, then optimising the indexes, thus removing the need to use joins and read from multiple tables. Normalisation / de-normalisation is neither a good nor a bad thing, as it depends on the needs of the system. Most systems start off normalised and change back when needed, but this change needs to be done very carefully: as mentioned, if code is tightly coupled to the DB structure, it will almost definitely cause the system to fail, unless the system is properly designed with separation of concerns in mind.
When you combine two tables, one ceases to exist, and all the code that refers to that now nonexistent table fails until it is modified. In db terms, imagine relationships connected to any of the tables in the 1:1: when you remove those tables, the relationships break, and the structure has to be greatly modified to compensate. Unfortunately, such bad designs are in most cases much easier to spot in the DB world than in the software world, and you don't usually notice something went wrong in code until it all falls apart.
It is the closest thing you can get to inheritance in object oriented programming, but it's not quite the same.
One to Many (1:M) / Many to One (M:1):
These two relationships (hence why 4 become 3) are the most popular relationship types. They are both the same type of relationship; the only thing that changes is your point of view. For example, a customer has many phone numbers, or alternately, many phone numbers can belong to a customer.
In object oriented programming this would be considered composition. It's not inheritance, but you are saying one item is composed of many parts. This is usually represented with arrays / lists / collections etc. inside of classes as opposed to an inheritance structure.
Many to Many (M:M):
This type of relationship cannot be implemented directly in a relational database. For this reason we need to break it down into two one to many relationships with an "association" table joining them. The many side of the two one to many relationships is always on the association / link table.
For your example, the person who said you need a many to many is correct, because a two to many is effectively a many (meaning more than one) to many relationship. This is the only way you would get your system to work, unless you are intending to research the field of relational calculus to find some new type of relationship that would allow this.
Also for such relationships (m2m) you have two choices, either create a compound key in the linker table so the combination of fields become a unique entry (if you are interested in db optimisation this is the slower choice, but takes less space). Alternately, you create a third field with an auto generated id column and make that the primary key (for db optimisation, this is the faster choice, but takes more space).
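A sketch of the two linker-table variants, using hypothetical `author` and `book` tables that are not part of the question. It only shows the structure and makes no claim about which variant is faster. Note that the auto-generated id alone does not stop the same pair from being linked twice, so variant B still carries a UNIQUE constraint on the pair:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (author_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE book   (book_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Variant A: the pair of foreign keys is itself the (compound) primary key.
CREATE TABLE author_book_a (
    author_id INTEGER NOT NULL REFERENCES author (author_id),
    book_id   INTEGER NOT NULL REFERENCES book (book_id),
    PRIMARY KEY (author_id, book_id)
);

-- Variant B: auto-generated id as primary key, pair kept unique separately.
CREATE TABLE author_book_b (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    author_id INTEGER NOT NULL REFERENCES author (author_id),
    book_id   INTEGER NOT NULL REFERENCES book (book_id),
    UNIQUE (author_id, book_id)
);

INSERT INTO author VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO book   VALUES (1, 'First'), (2, 'Second');
""")

def link(table, author_id, book_id):
    """Insert one link row; return False if that pair is already linked."""
    try:
        conn.execute(
            f"INSERT INTO {table} (author_id, book_id) VALUES (?, ?)",
            (author_id, book_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

# Three distinct pairs, then a repeat of the first pair.
results = {
    table: [link(table, 1, 1), link(table, 1, 2), link(table, 2, 1), link(table, 1, 1)]
    for table in ("author_book_a", "author_book_b")
}
```

Both variants accept the three distinct pairs and reject the repeated one.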
In your example specifically above...
A real world example would be phone numbers, users and companies. A company can have many phone numbers, a user can have many phone numbers, but ideally the user shouldn't provide the same phone number as the company as there would be duplicate content in the DB.
This would be a many to many relationship with the phone number table as the linker table between companies and users. As explained, to ensure no phone number is repeated, you simply set it as the primary key or use another primary key and set the phone number field to unique.
For this kind of question, it really comes down to how you phrase it. The phrasing is what causes the confusion, and rephrasing is how you overcome it and see that the solution is simple. Rephrase the problem as follows. Start by asking: is it a one to one? If the answer is no, move on. Next ask: is it a one to many? If the answer is no, move on. The only other option remaining is many to many. Be careful though: ensure you have considered the first two questions carefully before moving on. Inexperienced database people often overcomplicate issues by defining a one to many as a many to many. Once again, the most popular type of relationship by far is one to many (I would say 90%), with many to many and one to one splitting the remaining 10% 7/3 respectively. But those figures are just my personal perspective, so don't go quoting them as industry standard statistics. My point is to make extra sure it is definitely not a one to many before choosing many to many. It is worth the extra effort.
So now, to find the linker table between the two, decide which two are your main tables and what fields need to be shared between them. In this case, the company and user tables both need to share the phone. Hence you need to make a new phone table as the linker.
The warning alarm of misunderstanding should sound as soon as you decide none of the 3 are working for you. This should be enough to tell you that you simply are not phrasing the relationship question correctly. You will get better at it as time passes, but it is an essential skill and really should be mastered as soon as possible for your own sanity.
Of course you could also go to an object oriented database, which will allow a range of other relationships called "hierarchical" relationships. That's great if you are thinking of becoming a programmer too. But I wouldn't recommend this, as it is going to make your head hurt when you start finding ways to combine the various types of relationships, especially given there is not much need, since nearly all databases in the world consist of just those 3 types of relationships unless they are something super duper special.
Hope this was a reasonable answer. Thanks for taking the time to read it.
Just make phone number a key in your contact numbers table.
For your phone number example, you would put the phone number in a table by itself, with an ID.
Then you link to that phone_id from each of users and companies.
For your parents example, you don't link the child to parent - instead you link the parent to the child. OR, you put both parents in the same table, and the child just links to one of them.
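For comparison, here is a sketch of a different technique from the ones suggested above: keep two nullable owner columns on the phone table and let a CHECK constraint demand that exactly one of them is filled in, with UNIQUE on the number to block duplicates. All table and column names are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users     (user_id    INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE companies (company_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE phones (
    phone_id   INTEGER PRIMARY KEY,
    number     TEXT NOT NULL UNIQUE,                -- no duplicate numbers
    user_id    INTEGER REFERENCES users (user_id),
    company_id INTEGER REFERENCES companies (company_id),
    -- exactly one owner: a user or a company, never both, never neither
    CHECK (
        (user_id IS NOT NULL AND company_id IS NULL)
        OR (user_id IS NULL AND company_id IS NOT NULL)
    )
);
INSERT INTO users VALUES (1, 'Ann');
INSERT INTO companies VALUES (1, 'Acme');
""")

def add_phone(number, user_id=None, company_id=None):
    """Return True if the phone row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO phones (number, user_id, company_id) VALUES (?, ?, ?)",
            (number, user_id, company_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

company_phone = add_phone("555-0100", company_id=1)           # accepted
same_number = add_phone("555-0100", user_id=1)                # rejected: duplicate number
user_phone = add_phone("555-0101", user_id=1)                 # accepted
both_owners = add_phone("555-0102", user_id=1, company_id=1)  # rejected by CHECK
no_owner = add_phone("555-0103")                              # rejected by CHECK
```

This keeps the "one parent, not both" rule inside the database, at the cost of one nullable column per kind of parent.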
I am currently debating an issue with my dev team. They believe that empty fields are bad news. For instance, suppose we have a customer details table that stores data for customers from different countries, and each country has a slightly different address configuration, plus 1-2 extra fields. E.g. French customer details may also store an entry code and floor/level, plus title fields (madame, etc.). South Africa would have a security number. And so on.
Given that we're talking about minor variances my idea is to put all of the fields into the table and use what is needed on each form.
My colleague believes we should have a separate table with extra data, e.g. customer_info_fr. But this seems to totally defeat the purpose of a combined table in the first place.
The argument is that empty fields / columns are bad, but I'm struggling to find justification in terms of database design principles for or against this argument and preferred solutions.
Another option is a separate mini EAV table that stores extra data with parent_id, key, val fields. Or to serialise extra data into an extra_data column in the main customer_data table.
I think I am confused because what I'm discussing is not covered by 3NF which is what I would typically use as a reference for how to structure data.
So my question specifically: -
If you have slight variances in data for each record (1-2 different fields for instance) what is the best way to proceed?
There is definitely a school of thought which holds that NULL fields are bad, in and of themselves. Relational theory demands that databases consist of facts, and NULLs are the absence of fact. So, a rigorously designed database would have no nullable columns.
Your colleague is proposing something which is on the road to 6th Normal Form, where all the tables consist of a primary key and at most one other column. Only in such a schema we wouldn't have tables called customer_info_fr. That's not normalised. Many countries might include ENTRY_CODE in their addresses. So we would need address_entry_codes and address_floor_numbers. Not to mention address_building_number and address_building_name, as some places are identified by number and other by name.
It's completely accurate and truthful as a logical design. Alas from a physical perspective it is Teh Suck! The simplest query - select * from addresses - becomes a multi-table join, and outer joins at that. Nullable columns are a way of reconciling ugly design with the hard truth, "you cannae break the laws of physics". Nullable columns allow us to combine disjoint data sets into a single table, albeit at the cost of handling nulls (they can affect data retrieval, index usage, maths, etc).
Some designs attempt to get around the use of nulls by applying magic values. That is, if we don't know the correct value for some column we inject a default value which is a value but also means "unknown". A common instance of this is date '9999-12-31' as an open-ended TO_DATE in a FROM-TO date range. As long as everybody understands and adheres to the convention it's not a problem. It becomes a problem when some tables have date '9999-12-01' or date '9999-01-31' instead.
This is why magic values are not a robust solution. Consumers of our data need to know that -1 is the value we use for DofQ in our stock control system when we don't know the real value. But at least it's obviously not a valid value. Choosing say 20 as a magic value is deadly because it could be a real DofQ: we can no longer tell the actual values from the "don't knows".
So, given a choice between nulls and magic values, choose nulls.
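A small illustration of why the magic value is the riskier choice, using a hypothetical stock table with a `qty` column: aggregates skip NULLs on their own, whereas a magic -1 is silently counted unless every query remembers to filter it out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock_null  (item TEXT NOT NULL, qty INTEGER);           -- unknown = NULL
CREATE TABLE stock_magic (item TEXT NOT NULL, qty INTEGER NOT NULL);  -- unknown = -1
INSERT INTO stock_null  VALUES ('a', 10), ('b', 20), ('c', NULL);
INSERT INTO stock_magic VALUES ('a', 10), ('b', 20), ('c', -1);
""")

# AVG ignores NULL, so only the two known quantities are averaged.
avg_null = conn.execute("SELECT AVG(qty) FROM stock_null").fetchone()[0]

# The magic value is just another number to the database and skews the result.
avg_magic = conn.execute("SELECT AVG(qty) FROM stock_magic").fetchone()[0]

# Every consumer has to know the convention and filter it out by hand.
avg_magic_filtered = conn.execute(
    "SELECT AVG(qty) FROM stock_magic WHERE qty <> -1"
).fetchone()[0]
```

Here `avg_null` and `avg_magic_filtered` agree, while the unfiltered `avg_magic` comes out lower because -1 was averaged in as if it were a real quantity.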
I'd be interested in your colleague's justification as to why empty fields are bad. As far as I'm aware, empty or null fields aren't bad in and of themselves. If you have a lot of empty data values for a column that you are planning on putting an important index on, you may want to consider other options. This goes for any column where you have a lot of duplicate records actually and need an index, as duplicated records lower the cardinality of the column, making indexes less useful. In your case, I don't see it being an issue.
For this kind of data, you're likely using a VARCHAR or some kind of TEXT column anyway, which are variable length fields in the database. It doesn't matter if your field is full of data or empty, you're still going to incur the overhead of a variable-length column (which isn't worth worrying about in normal circumstances). So again, there's no difference to the RDBMS.
From the sounds of what you're designing, I think if you came up with a generic method of handling address variances in a single table, it would be the way to go. Your code and structure would be much simpler at the negligible (in my opinion) cost of some empty data fields.
That's what nullable fields are for: "Data not available/applicable".
SQL has a different notion of null than most programming languages, so SQL's null is often a misunderstood concept.
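Two of the usual surprises, sketched with a hypothetical `customers` table: comparing NULL with anything (even NULL) yields NULL rather than true or false, and a row whose column is NULL satisfies neither `= 'A1'` nor `<> 'A1'`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (name TEXT NOT NULL, entry_code TEXT);
INSERT INTO customers VALUES ('Anne', 'A1'), ('Bob', NULL), ('Cy', 'C3');
""")

# NULL = NULL is not true; it is NULL, which Python receives as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Bob has no entry code, yet he is NOT returned by "everything except A1".
not_a1 = [row[0] for row in conn.execute(
    "SELECT name FROM customers WHERE entry_code <> 'A1' ORDER BY name"
)]

# NULLs have to be asked for explicitly with IS NULL.
not_a1_or_unknown = [row[0] for row in conn.execute(
    "SELECT name FROM customers "
    "WHERE entry_code <> 'A1' OR entry_code IS NULL ORDER BY name"
)]
```

In most programming languages a null reference compares equal to another null, which is why this behaviour tends to catch developers out.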
Whatever you do, do not go down the EAV route. This is a prescription for a poorly performing database, far, far worse than a few empty fields.
If you must have separate related tables for the different situations, a lot will depend on how different the entities are and how they will be queried. If you will be querying across categories, you will find that joining to a bunch of tables to get all the data you may or may not need is a nightmare (I don't know if Germany will be in my result set, so I join to the Germany details table; oops, didn't need to). It can be far simpler to handle nulls than to try to figure out which of many tables you need to join to (and to always remember to left join to those tables).
However, if you will never be querying across the entities and the fields make sense separately, then put them in a separate table.
Nulls invariably add complexity to a data model because the behaviour of null in SQL rarely matches the maths, logic or reality that you intended to model with it. In other words, some queries return incorrect results, which you then need to compensate for with additional logic.
All information can be represented accurately without nulls. Since nulls add complexity it is sound design practice to begin your data model without them and then only add a null where you find some special reason to do so or where some database feature or limitation forces a null upon you.
I wouldn't overthink it. NULL can be used, but developers need to be careful using them.
I would prefer to have the Address be a long Text field in the database for any website that deals with multiple countries.
Most websites have Address Line1, Address Line 2, Postal/ZIP Code, City, State/Region, Country ... anything more than that (like EAV) would be overkill.
I wouldn't mind having the user interface show different labels near the text boxes for each country.
Entry code, floor/level, title fields, security number, and so on should fit in the address lines, the label near it, or a tip in the UI can indicate it.