Is there a reason for or against setting some fields as NULL or NOT NULL in a MySQL table, apart from primary/foreign key fields?
That completely depends on your domain to be honest. Functionally it makes little difference to the database engine, but if you're looking to have a well defined domain it is often best to have both the database and application layer mirror the requirements you are placing on the user.
If it's moot to you whether or not the user enters their "Display Name", then by all means mark the column as nullable. On the other hand, if you are going to require a "Display Name", you should mark it NOT NULL in the database as well as enforcing the constraint in the application. By doubling the constraint, you ensure that the domain rules are still enforced should your front-end change.
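MySQL has a NOT NULL constraint on a field, but this will not stop you from inserting "empty" data such as an empty string (''). Before MySQL 8.0.16, which started enforcing CHECK constraints, there was no declarative way to flag a field as "required".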
As Pekka mentioned, you should be doing some sort of validation to prevent this at a higher level in your application.
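Here is a minimal sketch of the difference. It uses Python's built-in sqlite3 module as a stand-in for MySQL, only so the example runs anywhere, and the table and column names are hypothetical. NOT NULL rejects NULL but accepts an empty string, and a CHECK constraint is what rejects blank values. MySQL enforces CHECK only from 8.0.16 on. On older versions, that check would have to live in the application or in a trigger.

```python
import sqlite3

def make_users_table() -> sqlite3.Connection:
    """In-memory SQLite table standing in for a MySQL one (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            -- NOT NULL only rejects NULL; it says nothing about ''
            nickname TEXT NOT NULL,
            -- CHECK additionally rejects empty or all-blank strings
            display_name TEXT NOT NULL CHECK (length(trim(display_name)) > 0)
        )
        """
    )
    return conn

def try_insert(conn: sqlite3.Connection, nickname, display_name) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (nickname, display_name) VALUES (?, ?)",
                (nickname, display_name),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = make_users_table()
print(try_insert(conn, "", "Alice"))    # True: '' satisfies NOT NULL
print(try_insert(conn, None, "Bob"))    # False: NULL violates NOT NULL
print(try_insert(conn, "carol", "  "))  # False: blank value fails the CHECK
```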
It's not a MySQL specific thing - every database that I'm aware of allows for defining columns with a constraint that either allows a NULL value in the column, or does not allow this to happen.
Defining a column as NOT NULL means there always has to be a value present that matches the data type. NULL is a sentinel value, and its data type transcends whatever is defined for the column.
If the column is a foreign key, the value also has to already exist in the related table before you insert the value into the current table. DEFAULT constraints are common, but not necessary, on columns defined as NOT NULL, so that the columns will be populated with an appropriate value when an INSERT does not supply one. Getting back to foreign keys, a foreign key column can be nullable, which means the relationship is optional: the business rules allow for there to be no relationship.
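The following sketch makes the optional-relationship point concrete. It uses hypothetical department/employee tables and runs on SQLite through Python's sqlite3 rather than on MySQL. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and in MySQL foreign keys need InnoDB tables. The sketch shows that a NULL foreign key is accepted, while a value with no matching parent row is rejected.

```python
import sqlite3

def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
    conn.executescript(
        """
        CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE employee (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            -- no NOT NULL here: the relationship is optional
            department_id INTEGER REFERENCES department (id)
        );
        INSERT INTO department (id, name) VALUES (1, 'Sales');
        """
    )
    return conn

def add_employee(conn: sqlite3.Connection, name: str, department_id) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO employee (name, department_id) VALUES (?, ?)",
                (name, department_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = build()
print(add_employee(conn, "Ann", 1))     # True: department 1 exists
print(add_employee(conn, "Ben", None))  # True: NULL means "no department"
print(add_employee(conn, "Cy", 99))     # False: department 99 does not exist
```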
When Should NULL & NOT NULL be Used?
Ideally, every column should be NOT NULL but it really depends on what the business rules require.
I don't know how you would define a required field in MySQL; care to enlighten me? I really don't know.
Anyway, even if this can be done, I can hardly think of a scenario where it would make sense. IMO, you have to catch faulty (i.e. incorrectly empty) data much earlier. Validation, sanitization and truncation should be done long before anything enters the database. The only time a database error should occur is when something exceptional happens, e.g. when the database is physically unreachable.
Related
I have a question related to this already answered question about MySQL DB design. I was wondering what the possible problems/sacrifices are of deciding not to put a NOT NULL constraint on foreign keys in a table. (As mentioned in the linked question, I can have multiple foreign keys in one table and I do not always know all of them when uploading data.)
Here is an example (simplified):
There are three tables in my DB:
Company
Investor
Investment
The Investment table has, among others, the following columns:
Company FK
Investor FK
Problem:
I wanted to know what the consequences would be for the end user, e.g. a data analyst, if I allow NULL values for the Investor FK.
I think my question was best answered by Vojta F, who showed me both the pros and cons of my solution from the perspective of a DB user.
As a DB user (i.e. not a DB admin) I think it is perfectly fine to omit a not null constraint from a foreign key if you don't know its value upon upload. The effect of such an omission is two-fold:
positive: it will be easier for you to upload new data, since you won't be forced to insert an FK value, which I think is fine as long as you are aware of this when joining on this column;
negative: weaker data integrity: it will be harder to resolve records among multiple tables and you'll have to think about the nulls when joining.
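A small SQLite sketch of the three tables from the question shows the join caveat. Column names such as investor_id and amount are assumptions. An INNER JOIN silently drops investments whose investor is unknown, while a LEFT JOIN keeps them:

```python
import sqlite3

def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE company  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE investor (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE investment (
            id          INTEGER PRIMARY KEY,
            company_id  INTEGER NOT NULL REFERENCES company (id),
            investor_id INTEGER REFERENCES investor (id),  -- nullable
            amount      INTEGER NOT NULL
        );
        INSERT INTO company  VALUES (1, 'Acme');
        INSERT INTO investor VALUES (1, 'Fund A');
        INSERT INTO investment VALUES (1, 1, 1, 100);
        INSERT INTO investment VALUES (2, 1, NULL, 250);  -- investor unknown at upload
        """
    )
    return conn

# INNER JOIN: rows with a NULL investor_id disappear from the result
INNER = """SELECT i.id, v.name FROM investment AS i
           JOIN investor AS v ON v.id = i.investor_id ORDER BY i.id"""

# LEFT JOIN keeps them; COALESCE labels the missing investor explicitly
LEFT = """SELECT i.id, COALESCE(v.name, '(unknown)') FROM investment AS i
          LEFT JOIN investor AS v ON v.id = i.investor_id ORDER BY i.id"""

conn = build()
print(conn.execute(INNER).fetchall())  # [(1, 'Fund A')]
print(conn.execute(LEFT).fetchall())   # [(1, 'Fund A'), (2, '(unknown)')]
```

Any total computed over the INNER JOIN would likewise miss the 250 from the second investment. An analyst needs to be warned about exactly this kind of surprise.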
In general, the benefit of using NULL where you need it outweighs any performance (or other) loss, or even gain.
The space consumed is so small that it is not worth computing.
The speed considerations are usually non-existent. The optimizer does a few things differently depending on the nullability of an indexed column. But, again, the benefit of having (or not having) NULL is likely to exceed any downside.
There are a small number of restrictions. A PRIMARY KEY must include NOT NULL column(s).
I’m in the process of cleaning up an old database. I have a table, User, with Identity 1,1 set for the User.SequenceNumber column, which is used as the foreign key in many tables as the user's ID number. Another table, UserNotes, had many records added erroneously over the past years with UserNotes.UserID set to zero. My initial thought was to add a User.SequenceNumber of zero with a User.UserName of ‘Unknown’ to restore the foreign key constraint between User.SequenceNumber and UserNotes.UserID. I have been able to do this successfully in the test area with a few records.
My concern is, when I add the ‘zero’ row, will it start causing problems in my 5 million plus database that has approximately 6 reads plus 4 records saved per minute? I found where people have problems with the database doing this when they don’t want it to happen but not where someone wanted to do this intentionally.
Inserting a 0 into a column defined as IDENTITY(1,1) should be fine as far as the database is concerned, because you would be inserting a value outside the range allocated for automatically generated values (which, assuming int, is [1..2,147,483,647]), meaning that your arbitrary value would not interfere with automatic value generation.
This could affect your application, although not very likely. For instance, if your application sees a 0 reference as a special value not because it is 0 but because of the absence of rows with SequenceNumber = 0 in User, that logic would be broken with the addition of the 0 row.
Alternatively, you could just replace all the zeros in UserNotes.UserID with NULLs. That would seem to me a more natural way to designate an "unknown" reference. On the other hand, it would also seem to me more likely to affect the application, and possibly to a greater extent too – particularly if the application is already set to work with zeros as special reference values and designed in a way so as to immediately distinguish between a 0 and a NULL when reading the results of a query.
Either way, there is a bigger issue you seem to be facing here. The fact that UserNotes.UserID has been allowed to contain non-existing references of 0, a value not found in the User.SequenceNumber column, implies that UserNotes.UserID is not formally defined as a foreign key. When there is no formal foreign key/primary key relationship, there is no referential integrity either (from the database's stand point anyway), and your UserNotes table is prone to getting non-existent references even after the clean-up you are carrying out.
You should really consider establishing the formal relationship between the tables at the database level. That may require additional changes to the application (depending on how it is designed), but it would be a justified time/effort investment and might spare you some headache in the future.
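Before adding the formal foreign key, it helps to find every reference that points at no user, not just the zeros. The sketch below runs on SQLite with made-up sample rows. The thread itself uses IDENTITY(1,1), which is SQL Server syntax. The SELECT and UPDATE below are plain SQL and should need little change there, but verify them on your own system:

```python
import sqlite3

def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE "User" (SequenceNumber INTEGER PRIMARY KEY, UserName TEXT NOT NULL);
        CREATE TABLE UserNotes (NoteID INTEGER PRIMARY KEY, UserID INTEGER, Note TEXT);
        INSERT INTO "User" VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO UserNotes VALUES
            (1, 1, 'fine'), (2, 0, 'zero ref'), (3, 0, 'zero ref'), (4, 7, 'dangling ref');
        """
    )
    return conn

# References that point at no existing user; NULLs are excluded on purpose
ORPHANS = """
    SELECT n.NoteID, n.UserID
    FROM UserNotes AS n
    LEFT JOIN "User" AS u ON u.SequenceNumber = n.UserID
    WHERE n.UserID IS NOT NULL AND u.SequenceNumber IS NULL
    ORDER BY n.NoteID
"""

def zeros_to_null(conn: sqlite3.Connection) -> int:
    """Turn the 0 'unknown user' references into NULL; return rows changed."""
    with conn:
        cur = conn.execute("UPDATE UserNotes SET UserID = NULL WHERE UserID = 0")
    return cur.rowcount

conn = build()
print(conn.execute(ORPHANS).fetchall())  # [(2, 0), (3, 0), (4, 7)]
print(zeros_to_null(conn))               # 2
print(conn.execute(ORPHANS).fetchall())  # [(4, 7)]
```

The leftover row (UserID 7 here) shows why the clean-up has to cover more than the zeros. Adding a checked foreign key will typically fail while any dangling reference remains.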
It's not going to break the Identity property.
I think you're right to be concerned that an artificial change could have side effects. But I think the application-side ones are the concern: what else does your application do with the record that you haven't considered?
If the zero row is unique and you don't have any other constraints on the column, you can insert the row.
Most forums do not specify any default value for the ID column.
I wonder why they don't do that.
For example SMF. They specify for every column a default value except for the (PRIMARY KEY) ID column.
EDIT: I now see that ^this^ part of the question was a bit 'stupid'.
Is there a performance advantage to specifying a default value?
Why shouldn't you specify by default an empty value?
Will 'NOT NULL' complain if you use an empty value (similar to $var = ''; in PHP)?
EDIT: But I assume that you want a database to complain if it doesn't get, for example, a username. Of course, I won't allow registration fields to be left empty (and I do validate them server-side), but if I'm not mistaken, then, if by magic the input to the database is still empty, the database will accept it, assuming the database uses NOT NULL and a default value. Question: so what, then, is the use of NOT NULL in combination with a default value?
Thanks in advance.
Please keep in mind that I am Dutch and therefore I may have made some mistakes.
If we assume that a default value for an id, primary key, column in a database made sense (and was even valid), think what the consequences would be:
First off, it would allow the user, or the DB scripts, to supply no id information (which is fine, since most scripts rely on the database to deal with that aspect anyway), but
it would also allow the database not to 'deal with' it, and to simply insert the new record with the default value, whatever that might be. This means that
in many situations multiple records are likely to be inserted which have the same id value/primary key. If multiple entries, into for example a forum Database, have the same value, how would you tell your users apart? How would you identify them?
The reason that no 'default' is used is because, a: it wouldn't make sense to allow it in the first place, and b: it'd be invalid (albeit b is a consequence of a).
Is there a performance benefit to supplying a default value?
Not so much a 'performance' benefit, but it does mean that if no value is supplied the software has, at least, some idea what to expect (whether that default is a string, a number, an enum...).
Why shouldn't you specify by default an empty value?
I think that all entries to the database should be controlled, whether that's the values you allow to be inserted, the values inserted in the event of an error, or a failure of the user to supply a value. Allowing NULL entries into your database is likely a controllable situation, but it seems needlessly complex to compensate for NULL entries when a default can be specified.
Will 'NOT NULL' complain if you use an empty value (similar to $var = ''; in PHP)?
It's been a while since I last used MySQL, but NULL is not similar to an empty string (in the way that JavaScript evaluates '' as 'falsy'); it means, absolutely, nothing. Empty. A whole vacuum, and non-existence, of information. NOT NULL, then, requires some (albeit it doesn't specify exactly what) information. This is why a default should be supplied, since it specifies precisely what will be found in the database in the event of the user specifying no data.
Edited in response to comments from OP (below):
But I assume that you want a database to complain if it doesn't get, for example, a username. Of course, I won't allow registration fields to be left empty (and I do validate them server-side), but if I'm not mistaken, then if by magic the input to the database is still empty, the database will accept it, if it uses NOT NULL and a default value. So what, then, is the use of NOT NULL in combination with a default value?
I can't speak to all situations, or to the choices made by all database-/web-admins, but in my personal use-cases I'd suggest a consideration of:
First, would an empty-value make sense for this field, and:
Second, if an empty field doesn't make sense, should a default value be used?
In some situations I don't need the details from a user (their gender, for example, has almost no impact on me or on any services I provide), so I don't care whether they supply them or not. In this case a default isn't used.
In other cases, such as their age, it might make a difference (I can't remember the name of the law, but I recall some mention of Facebook and other online societies being required to only allow those aged 13 or older to participate). In this case I ask my users to agree that they are, in fact, over the age of 13 (via a checkbox) and later offer them an input to enter their actual age/date of birth. In the absence of a supplied data point (and, honestly, other than the (presumed) legal obligation, I really don't care how old my users are), my database defaults to thirteen, or, in the case of a birthday, to thirteen years ago today (albeit with help from the PHP scripts).
Now, the aspect of NOT NULL and a default value? There'd be no sense, so far as I can see.
The whole concept of NOT NULL is to ensure a user-supplied value (otherwise the record isn't created, and the user can't join (or whatever) the service for which they're registering). The default value is to supply a fall-back value in this absence. And, logically, if the two are allowed together on the same field? The default would be inserted before the NOT NULL condition is triggered. So...why bother? Do you have evidence that somebody, or some software, used both conditions on the same field? I can't, honestly, think why.
The ID column is an auto_increment in most cases. So a default value is invalid.
Edit:
Will 'NOT NULL' complain if you use an empty value (similar to $var = ''; in PHP)?
With NULL, the literal NULL is meant. Not something like zero or an empty string.
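The following sketch shows how NOT NULL and DEFAULT interact. It runs on SQLite through Python's sqlite3 with a hypothetical member table. The DEFAULT is used only when the column is left out of the INSERT. An explicit NULL is still rejected, and an empty string is accepted because it is a value. MySQL's behavior also depends on its SQL mode. With strict mode enabled it should behave much the same for single-row inserts, but treat this as an approximation.

```python
import sqlite3

def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE member (
            id       INTEGER PRIMARY KEY,
            username TEXT NOT NULL,                   -- required, no fallback
            country  TEXT NOT NULL DEFAULT 'unknown'  -- required, with a fallback
        )
        """
    )
    return conn

def accepted(conn: sqlite3.Connection, sql: str, params=()) -> bool:
    """True if the INSERT succeeded, False if a NOT NULL constraint rejected it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

conn = build()
# Column omitted: the DEFAULT fills it in
print(accepted(conn, "INSERT INTO member (username) VALUES (?)", ("alice",)))  # True
# Explicit NULL: rejected; the DEFAULT is not used as a substitute
print(accepted(conn, "INSERT INTO member (username, country) VALUES (?, ?)", ("bob", None)))  # False
# Empty string is a value, not NULL, so NOT NULL lets it through
print(accepted(conn, "INSERT INTO member (username) VALUES (?)", ("",)))  # True
# Required column omitted and no DEFAULT to fall back on: rejected
print(accepted(conn, "INSERT INTO member (country) VALUES (?)", ("NL",)))  # False
print(conn.execute("SELECT username, country FROM member ORDER BY id").fetchall())
# [('alice', 'unknown'), ('', 'unknown')]
```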
I have 2 tables: one in which I have groups, the other where I set user restrictions on which groups are seen.
When I do a LEFT JOIN and specify no condition, it shows me all records. When I add WHERE group_hide.hide != 'true', it only shows those records that have the enum value false set. With the LEFT JOIN, the other groups get NULL in the hide field.
How can I make it exclude only those that are set to true, and show everything else that has either NULL or false?
In MySQL you must use IS NULL or IS NOT NULL when dealing with nullable values.
Here you should use (group_hide.hide IS NULL OR group_hide.hide != 'true').
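A few lines are enough to reproduce the effect of three-valued logic on the filter. This sketch uses SQLite via Python's sqlite3, with a hypothetical site_group table in place of the groups table. The last condition uses SQLite's NULL-safe IS NOT. In MySQL, the NULL-safe form would be NOT (group_hide.hide <=> 'true').

```python
import sqlite3

def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE site_group (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE group_hide (group_id INTEGER PRIMARY KEY, hide TEXT);
        INSERT INTO site_group VALUES (1, 'hidden'), (2, 'shown'), (3, 'no setting');
        INSERT INTO group_hide VALUES (1, 'true'), (2, 'false');  -- nothing for group 3
        """
    )
    return conn

BASE = """SELECT g.name FROM site_group AS g
          LEFT JOIN group_hide ON group_hide.group_id = g.id
          WHERE {} ORDER BY g.id"""

def names(conn: sqlite3.Connection, condition: str) -> list:
    # condition is a fixed string in this demo, never user input
    return [row[0] for row in conn.execute(BASE.format(condition))]

conn = build()
# NULL != 'true' evaluates to NULL (unknown), so group 3 is filtered out
print(names(conn, "group_hide.hide != 'true'"))
# ['shown']
print(names(conn, "group_hide.hide IS NULL OR group_hide.hide != 'true'"))
# ['shown', 'no setting']
print(names(conn, "group_hide.hide IS NOT 'true'"))  # SQLite's NULL-safe comparison
# ['shown', 'no setting']
```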
Don already provided a good answer to the question that you asked, which will solve your immediate problem.
However, let me address the point of the wrong data type domain. Normally you would make hide a BOOLEAN, but MySQL does not really implement it completely: it converts it to TINYINT(1), which allows values from -128 to 127 (see the overview of data types for MySQL). Since MySQL (before 8.0.16) does not enforce CHECK constraints, you are left with the options of either using a trigger or a foreign key reference to properly enforce the domain.
Here are the problems with wrong data domain (your case), in order of importance:
The disadvantage of allowing NULL for a field that can only be 1 or 0 is that you have to employ three-valued logic (true, false, NULL), which, by the way, is not perfectly implemented in SQL. This makes certain queries more complex and slower than they need to be. If you can make a column NOT NULL, do.
The disadvantages of using VARCHAR for a field that can only be 1 or 0 are slower queries, due to the extra I/O, and bigger storage needs: it slows down reads and writes, makes indexes bigger if the field is part of an index, and increases the size of backups. Keep in mind that none of these effects might be noticeable for a single field with the wrong domain in a small table, but if data types are consistently set too big, or if the table has a serious number of records, the effects will bite. Also, you will always need to convert the VARCHAR to 1 or 0 to use MySQL's natural boolean operators, which increases the complexity of queries.
The disadvantage of MySQL using TINYINT(1) for BOOL is that the RDBMS allows certain values that should not be allowed, theoretically letting meaningless values be stored in the system. In this case your application layer must guarantee data integrity. It is always better if the RDBMS guarantees integrity, as that protects you from certain bugs in the application layer and also from mistakes a database administrator might make.
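The foreign-reference option mentioned above can be sketched as follows. A tiny lookup table holds the only legal flag values, and NOT NULL DEFAULT 0 keeps the column two-valued. The sketch runs on SQLite via Python's sqlite3, and the bool_domain table is hypothetical. In MySQL, the same idea would need InnoDB tables for the foreign key to be enforced.

```python
import sqlite3

def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
    conn.executescript(
        """
        -- hypothetical lookup table holding the only legal flag values
        CREATE TABLE bool_domain (v INTEGER PRIMARY KEY);
        INSERT INTO bool_domain (v) VALUES (0), (1);
        CREATE TABLE group_hide (
            group_id INTEGER PRIMARY KEY,
            hide     INTEGER NOT NULL DEFAULT 0 REFERENCES bool_domain (v)
        );
        """
    )
    return conn

def set_hide(conn: sqlite3.Connection, group_id: int, value) -> bool:
    """Return True if the flag was stored, False if the domain rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO group_hide (group_id, hide) VALUES (?, ?)",
                (group_id, value),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = build()
print(set_hide(conn, 1, 1))     # True: 1 is in the domain
print(set_hide(conn, 2, 0))     # True: 0 is in the domain
print(set_hide(conn, 3, 5))     # False: 5 has no row in bool_domain
print(set_hide(conn, 4, None))  # False: NOT NULL rules out the third state
```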
an obvious answer would be:
WHERE (group_hide.hide is null or group_hide.hide ='false')
I'm not sure off the top of my head what the null behaviour rules are.
Sometimes an absent value can be represented (with no loss of function) without resort to a NULL-able column, e.g.:
Zero integer in a column that references the AUTO_INCREMENT row ID of another table
Invalid date value (0000-00-00)
Zero timestamp value
Empty string
On the other hand, according to Ted Codd's relational model, NULL is the marker of an absent datum. I always feel better doing something "the correct way" and MySQL supports it and the associated 3-value logic, so why not?
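Sentinel values and NULL do behave differently in aggregation: SQL aggregates skip NULLs but treat sentinels like any other value. The following small SQLite sketch uses hypothetical tables that hold the same two rows both ways:

```python
import sqlite3

def summarize() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- same data twice: once with NULL for "absent", once with sentinels
        CREATE TABLE with_null     (parent_id INTEGER,          shipped_on TEXT);
        CREATE TABLE with_sentinel (parent_id INTEGER NOT NULL, shipped_on TEXT NOT NULL);
        INSERT INTO with_null     VALUES (7, '2020-01-05'), (NULL, NULL);
        INSERT INTO with_sentinel VALUES (7, '2020-01-05'), (0, '0000-00-00');
        """
    )
    # COUNT(col) and MIN(col) ignore NULLs, but not 0 or '0000-00-00'
    query = "SELECT COUNT(parent_id), MIN(shipped_on) FROM {}"
    return (
        conn.execute(query.format("with_null")).fetchone(),
        conn.execute(query.format("with_sentinel")).fetchone(),
    )

print(summarize())
# ((1, '2020-01-05'), (2, '0000-00-00'))
```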
A few years ago I was working on a performance problem and found that I could resolve it simply by adding NOT NULL to a column definition. The column was indexed, but I don't remember other details. Since then, I have avoided NULL-able columns when there is an alternative.
But it has always bothered me that I don't properly understand the performance effects of allowing NULL in a MySQL table. Can anyone help out?
It saves 1 bit per nullable column in each row (rounded up to the nearest byte). http://dev.mysql.com/doc/refman/5.0/en/data-size.html
Doesn't seem like much, but over millions of rows it starts to make a difference
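Here is a back-of-the-envelope calculation that puts the saving into numbers. It assumes 1 bit per nullable column, rounded up to a whole byte in every row. The linked page describes this for MyISAM, so treat the figure as an approximation for other storage engines:

```python
def null_flag_bytes(rows: int, nullable_columns: int) -> int:
    """Bytes spent on NULL flags, assuming 1 bit per nullable column,
    rounded up to a whole byte in every row (an approximation)."""
    if rows < 0 or nullable_columns < 0:
        raise ValueError("counts must be non-negative")
    bytes_per_row = (nullable_columns + 7) // 8  # integer ceiling of n / 8
    return rows * bytes_per_row

for cols in (1, 8, 9):
    total = null_flag_bytes(5_000_000, cols)
    print(f"{cols} nullable column(s): {total:,} bytes (~{total / 2**20:.1f} MiB)")
# 1 nullable column(s): 5,000,000 bytes (~4.8 MiB)
# 8 nullable column(s): 5,000,000 bytes (~4.8 MiB)
# 9 nullable column(s): 10,000,000 bytes (~9.5 MiB)
```

So for five million rows, one nullable column costs roughly 5 MB of flags. The cost stays the same up to eight nullable columns and only grows once a table has more than eight.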