MySQL NULL or NOT NULL That is The Question? - mysql

What is the difference between NULL and NOT NULL? And when should they be used?

NULL means you do not have to provide a value for the field...
NOT NULL means you must provide a value for the fields.
For example, if you are building a table of registered users for a system, you might want to make sure the user-id is always populated with a value (i.e. NOT NULL), but the optional spouse's name field can be left empty (NULL).
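A minimal sketch of that users example, run through Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the snippet runs without a server). The table and column names are invented for illustration; the NOT NULL behaviour shown is standard SQL and should carry over.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " user_id INTEGER NOT NULL,"   # must always be supplied
    " spouse_name TEXT"            # nullable: may be left empty
    ")"
)

# A row with no spouse name is accepted; the column is stored as NULL.
conn.execute("INSERT INTO users (user_id, spouse_name) VALUES (1, NULL)")

# A row with a NULL user_id is rejected by the NOT NULL constraint.
try:
    conn.execute("INSERT INTO users (user_id, spouse_name) VALUES (NULL, 'Pat')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT user_id, spouse_name FROM users").fetchall()
print(rejected, rows)
```

Python shows the stored NULL as None, and only the first row makes it into the table.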

I would suggest
Use NOT NULL on every field if you can
Use NULL if there is a sensible reason it can be null
Making fields nullable when NULL has no sensible meaning for them is likely to introduce bugs when NULLs enter them by accident. Using NOT NULL prevents this.
The commonest reason for NULL fields is that you have a foreign key field which is optional, i.e. not always linked, for a "zero or one" relationship.
If you find you have a table with lots of columns, many of which can be NULL, that starts sounding like an antipattern. Consider whether vertical partitioning makes more sense in your application context :)
There is another use for NULL that applies in some databases (Oracle, for example): when every column in an index is NULL for a row, no index entry is created for that row. That lets you index only a very small subset of rows (e.g. an "active" flag set on only 1% or so), which saves space and keeps the index small. Note that MySQL's InnoDB and MyISAM engines do store NULL values in their indexes, so this saving does not apply there.

What is the difference between NULL and NOT NULL?
When creating a table or adding a column to a table, you need to specify the column value optionality using either NULL or NOT NULL. NOT NULL means that the column cannot have a NULL value for any record; NULL means NULL is an allowable value (even when the column has a foreign key constraint). Because NULL isn't a value, you can see why some call it optionality: a database table requires that, in order to have a column, there must be an instance of the column for every record within the table.
And when should they be used?
That is determined by your business rules.
Generally you want as many columns as possible to be NOT NULL because you want to be sure data is always there.

NOT NULL means that a column cannot have the NULL value in it. If nothing is specified for that column when inserting a row, the declared default is used. If no default is declared, MySQL falls back to its implicit default for that type in non-strict SQL mode, and raises an error in strict SQL mode.
Fields that aren't NOT NULL can potentially have their value as NULL (which essentially means a missing/unknown/unspecified value). NULL behaves differently than normal values, see here for more info.

As others have answered, NOT NULL simply means that NULL is not a permitted value. However, you will always have the option of empty string '' (for varchar) or 0 (for int), etc.
One nice feature when using NOT NULL is that you can get an error or warning should you forget to set the column's value during INSERT (assuming the NOT NULL column has no DEFAULT).
The main hiccup with allowing NULL columns is that they will never be found with the <> (not equal) operator. For example, with the following categories:
Desktops
Mobiles
NULL -- probably embedded devices
The = operator works as expected
select * from myTable where category="Desktops";
However, the <> operator will exclude any NULL entries.
select * from myTable where category<>"Mobiles";
-- returns only desktops, embedded devices were not returned
This can be the cause of subtle bugs, especially if the column has no NULL data during initial testing, but later some NULL values are added due to subsequent development. For this reason I set all the columns to NOT NULL.
However, it can be helpful to allow NULL values when using a UNIQUE KEY/INDEX. Normally a unique key requires the column (or combination of columns) to be unique for the whole table. Unique keys are a great safeguard that the database will enforce for you.
In some cases, you may want the safeguard for most of the rows, but there are exceptions.
If any column referenced by that particular UNIQUE KEY is NULL, then the uniqueness will no longer be enforced for that row. Obviously this would only work if you permit NULLs on that column, understanding the hiccup I explained above.
If you decide to allow NULL values, consider writing your <> statements with an additional condition to detect NULLs.
select * from myTable where category<>"Desktops" or category is null;
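The <> pitfall can be reproduced in a few lines. This is only a sketch: it uses Python's built-in sqlite3 module in place of MySQL (an assumption, so that it runs anywhere), an invented id column, and single-quoted string literals. The comparison semantics involved are standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO myTable (id, category) VALUES (?, ?)",
    [(1, "Desktops"), (2, "Mobiles"), (3, None)],  # None is stored as NULL
)

# Plain <> silently drops the NULL row: NULL <> 'Mobiles' is unknown, not true.
plain = conn.execute(
    "SELECT id FROM myTable WHERE category <> 'Mobiles' ORDER BY id"
).fetchall()

# Adding an explicit IS NULL test brings the NULL row back.
with_null = conn.execute(
    "SELECT id FROM myTable"
    " WHERE category <> 'Mobiles' OR category IS NULL ORDER BY id"
).fetchall()

print(plain)      # [(1,)]
print(with_null)  # [(1,), (3,)]
```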

NOT NULL is a column constraint and should be used when you have a column that's not in a primary key (primary keys are intrinsically not-null so it's silly to say NOT NULL explicitly about them) of which you know the values will always be known (not unknown or missing) so there's no need for nulls in that column.
NULL is a keyword occurring in many contexts -- including as a column constraint, where it means the same as the default (i.e., nulls are allowed) -- but also in many other contexts, e.g. to insert a null in a column as part of an INSERT...VALUES statement.

Also note that NULL is not equal to anything else, not even to NULL itself.
For example:
mysql> select if(NULL=NULL, "null=null", "null!=null");
+------------------------------------------+
| if(NULL=NULL, "null=null", "null!=null") |
+------------------------------------------+
| null!=null |
+------------------------------------------+
1 row in set (0.00 sec)
This definition of NULL is very useful when you need a unique key on a column that is partially filled. In such a case you can just leave all the empty values as NULL, and it will not cause any violation of the unique key, since NULL != NULL.
Here is an example of how you can see if something is NULL:
mysql> select if(null is null, "null is null", "null is not null");
+------------------------------------------------------+
| if(null is null, "null is null", "null is not null") |
+------------------------------------------------------+
| null is null |
+------------------------------------------------------+
1 row in set (0.01 sec)
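The same two checks can be scripted. The sketch below uses Python's built-in sqlite3 module rather than a MySQL server (an assumption for portability) and a throwaway table t; it also shows why a WHERE clause written with = NULL matches nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL evaluates to NULL (unknown), which Python receives as None.
equals = conn.execute("SELECT NULL = NULL").fetchone()[0]

# IS NULL is the reliable test and yields a true value (1).
is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]

# WHERE keeps only rows whose condition is true, so "= NULL" matches nothing.
conn.execute("CREATE TABLE t (v INTEGER)")
conn.executemany("INSERT INTO t (v) VALUES (?)", [(1,), (None,)])
eq_count = conn.execute("SELECT COUNT(*) FROM t WHERE v = NULL").fetchone()[0]
is_count = conn.execute("SELECT COUNT(*) FROM t WHERE v IS NULL").fetchone()[0]

print(equals, is_null, eq_count, is_count)
```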

If you're not sure, use NOT NULL.
Despite the common belief, NOT NULL doesn't require you to fill all fields; it just means whatever you omit will have the default value (declare an explicit DEFAULT, because strict SQL mode rejects an omitted NOT NULL column that has none). So no, it doesn't mean pain. Also, NULL is less efficient in terms of indexing, and causes many edge case situations when processing what you receive from a query.
So, while of course NULL values have a theoretical meaning (and in rare cases you can benefit from this), most of the time NOT NULL is the way to go. NOT NULL makes your fields work like any variable: they always have a value, and you decide if that value means something or not. And yes, if you need all the possible values and one extra value that tells you there's simply nothing there, you can still use NULL.
So why do they love NULL so much?
Because it's descriptive. It has a semantic meaning, like "Nah, wait, this is not just an empty string, this is a lot more exotic - it's the lack of information!" They will explain how it's different to say "time is 00:00" and "I don't know what time it is". And this is valid; it just takes some extra effort to handle this. Because the system will allocate extra space for the information "is there a value at all" and it will constantly struggle checking it out. So for the tiny piece of semantic beauty, you sacrifice time and storage. Not much, but still. (Instead, you could have said "99:99" which is clearly an invalid time and you can assign a constant to it. No, don't even start, it's just an example.)
The whole phenomenon reminds me of the good old isset debate where people are somehow obsessed with the beauty of looking at a nonexistent array index and getting an error message. This is completely pointless. Practical defaults are a blessing, they simplify the work in a "you know what I mean" style and you can write more readable, more concise, more expressive code that will make sense even after you spend 4 years with other projects.
Inevitable NULLs
If you have JOINs, you will encounter NULLs sooner or later, when you want to join another row but it's not there. This is the only valid case where you can't really live without NULLs; you must know it's not just an empty record you got, it's no record at all.
Otherwise? NOT NULL is convenient, efficient, and gives you fewer surprises. In return, it will be called ignorant and outrageous by some people with semantical-compulsive disorder. (Which is not a thing but should be.)
TL;DR
Prefer NOT NULL when possible.
It's a weird thing - for the machine, too.

Related

what is the problem of "null" values in mysql or other dbs?

As far as I know, null can be indexed in InnoDB. But many colleagues say that null values are bad DB design. So I don't know what the problem with "null" is: is it the third-value problem (equal, not equal, unknown) or something else that stops people from using nullable columns?
NULL is an essential piece of SQL. It indicates a value that does not exist.
Avoiding NULL would lead to situations where you make up arbitrary special values for items that do not exist (0, -1, empty strings etc). That would be bad design.
I would suggest that 80% of columns should be declared NOT NULL.
But there are many cases where NULL is the 'natural' thing to use --
A start_time is known but the end_time is not yet known.
An optional attribute
etc.
As for indexing, read the rules, and be ready to abide by them. PRIMARY KEY disallows NULLs, but UNIQUE allows them. But are they treated as "equal" or not?
As already mentioned, IS NULL and IS NOT NULL work as expected, but = NULL does not. Note also <=>.
LEFT JOIN creates artificial NULLs (when the 'right' table's row is missing). An example of that usage:
FROM a
LEFT JOIN b ON ...
WHERE b.id IS NULL
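To make the LEFT JOIN ... IS NULL pattern concrete, here is a small sketch that finds rows of a with no matching row in b. It runs on Python's built-in sqlite3 module instead of MySQL (an assumption), and the b_id column and the sample data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, b_id INTEGER)")
conn.execute("CREATE TABLE b (id INTEGER NOT NULL)")
conn.executemany("INSERT INTO a VALUES (?, ?)", [(1, 10), (2, 20), (3, None)])
conn.executemany("INSERT INTO b VALUES (?)", [(10,)])

# For rows of a with no partner in b, LEFT JOIN fills b's columns with NULL,
# so "b.id IS NULL" selects exactly the unmatched rows.
missing = conn.execute(
    "SELECT a.id FROM a"
    " LEFT JOIN b ON b.id = a.b_id"
    " WHERE b.id IS NULL ORDER BY a.id"
).fetchall()

print(missing)  # [(2,), (3,)]
```

Row 2 points at a b.id that does not exist and row 3 has no b_id at all; both come back, while row 1 (which has a match) does not.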
See COALESCE() for a way to turn NULL into something else. Example:
SELECT ...,
( SELECT name FROM foo WHERE ... ) AS foo_name,
...,
FROM ...
may deliver NULL. This would be friendlier:
SELECT ...,
COALESCE(( SELECT name FROM foo WHERE ... ), 'N/A') AS foo_name,
...,
FROM ...
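A runnable version of the COALESCE pattern, sketched with Python's built-in sqlite3 module standing in for MySQL (an assumption). The items table and its foo_id column are hypothetical; they fill in the parts elided above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE items (id INTEGER, foo_id INTEGER)")
conn.execute("INSERT INTO foo VALUES (1, 'widget')")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, 1), (2, 99)])

# The scalar subquery yields NULL when no foo row matches;
# COALESCE replaces that NULL with the fallback 'N/A'.
rows = conn.execute(
    "SELECT items.id,"
    " COALESCE((SELECT name FROM foo WHERE foo.id = items.foo_id), 'N/A') AS foo_name"
    " FROM items ORDER BY items.id"
).fetchall()

print(rows)  # [(1, 'widget'), (2, 'N/A')]
```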
Personally, I often shun NULL in these two places:
string VARCHAR(99) NOT NULL DEFAULT ('') -- empty string is usually good enough
choice ENUM('unknown', 'this', 'that') NOT NULL -- easier to display and test
There is virtually no performance difference (space or speed) between NULL and NOT NULL.

INNODB -> Should the default column be zero or null?

My english is a little weak sorry
When using INNODB, must the column be 0 or should it be null?
Does the problem occur if the joined columns are defined as 0?
Finally, which is better in terms of performance?
Thanks.
Ignore the problem. You do not have to provide a DEFAULT value. You do not have to declare a column NULL (or NOT NULL).
Instead...
Think about the application.
Case 1: Your application will always have a value for that column, and it will always specify the value. Then declaring the column to be NOT NULL and not providing a DEFAULT makes sense.
Case 2: You don't have a value yet. Example: The table has start_date and end_date. You create a row when the thing "starts", so you fill in start_date (see Case 1) but leave end_date empty. "Empty" could be encoded as NULL and DEFAULT NULL. Later you UPDATE the table to fill in the end_date with a real value.
Case 3: The table has a "counter". When you initially add a row, the counter needs to start with "1". Later you will UPDATE ... SET counter=counter+1. You could either explicitly put the "1" in when you create the row, or you could leave out the value when inserting, but have the column declared NOT NULL DEFAULT '1'
NULL could represent "not-yet-filled-in" (Case 2), "don't know the value", "use the default", and several other things. This is an application choice.
There are other uses for NULL. Unless you have a use for NULL, declare each column NOT NULL.
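The three cases can be put into one table definition. This is a sketch only, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption); the things table, its name column, and the dates are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE things ("
    " name TEXT NOT NULL,"                  # Case 1: always supplied, no DEFAULT
    " start_date TEXT NOT NULL,"            # Case 1
    " end_date TEXT DEFAULT NULL,"          # Case 2: not known yet
    " counter INTEGER NOT NULL DEFAULT 1"   # Case 3: starts at 1 if omitted
    ")"
)

# Only the Case 1 columns are given; the others take their defaults.
conn.execute("INSERT INTO things (name, start_date) VALUES ('job', '2024-01-01')")
before = conn.execute("SELECT end_date, counter FROM things").fetchone()

# Later: fill in the end date and bump the counter.
conn.execute(
    "UPDATE things SET end_date = '2024-01-02', counter = counter + 1"
    " WHERE name = 'job'"
)
after = conn.execute("SELECT end_date, counter FROM things").fetchone()

print(before)  # (None, 1)
print(after)   # ('2024-01-02', 2)
```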
If, when INSERTing, you specify all the columns, then there is no need for DEFAULTs. DEFAULTs are a convenience, not a necessity. Without a DEFAULT, in non-strict SQL mode you get "0" for numeric NOT NULL columns, or '' for strings; in strict SQL mode the INSERT fails instead.
Performance -- Probably not an issue. Do you have a particular example we should discuss?
JOINing -- I would avoid joining on a column that might have NULL in it. This rarely happens in "real life", since one joins the unique identifier for rows in one table with a column in the other table:
FROM A JOIN B ON B.id = A.b_id
That is, B.id is probably the PRIMARY KEY of B, hence cannot be NULL. On the other hand, A.b_id could be NULL to indicate there is no row in B corresponding to the row in A. No problem.

MySQL: Why would an `ID` column be defined with `NOT NULL` if using `AUTO_INCREMENT`?

I was reading somewhere that adding AUTO_INCREMENT will allow the id column to automatically generate sequential numbers starting from 1. However, in that case, it seems that there is no need to define NOT NULL and UNIQUE. Why is it that I still see many examples online using NOT NULL with AUTO_INCREMENT when creating a table?
Per the documentation, if you assign null values to an auto_id which is defined as not-null, it will replace null with the next sequential value. As ID is likely your primary key (or part of your key), it shouldn't ever take null values.
Also, per this example, if you don't specify that it should be non-null, MySQL will supply this for you.
Okay, after re-reading documentation it seems that if NOT NULL is defined... When inserting values, one may use the value NULL to automatically assign the next sequential number. If NOT NULL is not defined, then one may only use the value 0 to assign the next sequential number. Also, it should be noted that UNIQUE doesn't need to be defined because ID's are usually used in conjunction with PRIMARY KEY which achieves the same.

Default scope of a null datetime. How to index?

In my app, there is a very large table (>40 million rows) that will have a default scope set on the model.
The default scope will look at a specific DATETIME column and check that it IS NULL. The DATETIME column will probably never be used to search for a specific date. Should I be using an index here, and if so, how?
The WHERE <column_name> IS NULL will be added to almost every single query made on this table from the app. On the one hand, since the column is essentially being treated as a boolean, I am tempted to think that it should not be indexed. However, it seems that with such a huge table, an index should provide value, especially for a query like
SELECT COUNT(*) FROM <table_name> WHERE <column_name> IS NULL
I am also a bit confused about how I should index, since the WHERE clause will be appended to every query. I do not think that it would make sense to create an index on all columns of this table. This is being done in MySQL. Thanks

MySQL unique index doesn't apply when values are NULL

I have a table for tracking views of articles - it should have one unique row for each article_id&&NULL&&IP when no one is logged in and a unique row for each article_id&&loggedInUser&&IP. So I thought that when no one is logged in I would just store NULL instead of a user_id. But MySQL surprised me - when I added a UNIQUE KEY like article_id&&user_id&&IP it worked fine for logged-in users, but if no user was logged in it started to add rows like (e.g.):
article_id | user_id | IP
5 NULL 192.168.3.50
5 NULL 192.168.3.50
5 NULL 192.168.3.50
5 NULL 192.168.3.50
This doesn't seem very unique - I know it is caused by NULL, but how do I solve this? Should I just rely on the fact that no user will have user_id "0"?
Thanks.
This is intentional and is documented:
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
A UNIQUE index creates a constraint such that all values in the index must be distinct. An error occurs if you try to add a new row with a key value that matches an existing row. This constraint does not apply to NULL values except for the BDB storage engine. For other engines, a UNIQUE index permits multiple NULL values for columns that can contain NULL. If you specify a prefix value for a column in a UNIQUE index, the column values must be unique within the prefix.
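The behaviour from the question can be reproduced with a short script. The sketch below runs on Python's built-in sqlite3 module rather than MySQL (an assumption; SQLite likewise allows repeated NULLs under a UNIQUE constraint), and the table name article_views is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article_views ("
    " article_id INTEGER NOT NULL,"
    " user_id INTEGER,"               # nullable: NULL means nobody logged in
    " ip TEXT NOT NULL,"
    " UNIQUE (article_id, user_id, ip)"
    ")"
)
insert = "INSERT INTO article_views (article_id, user_id, ip) VALUES (?, ?, ?)"

# Two identical rows with a NULL user_id are both accepted.
conn.execute(insert, (5, None, "192.168.3.50"))
conn.execute(insert, (5, None, "192.168.3.50"))

# With a real user_id, the second identical row violates the unique key.
conn.execute(insert, (5, 7, "192.168.3.50"))
try:
    conn.execute(insert, (5, 7, "192.168.3.50"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

null_rows = conn.execute(
    "SELECT COUNT(*) FROM article_views WHERE user_id IS NULL"
).fetchone()[0]

print(null_rows, duplicate_rejected)  # 2 True
```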
While you could use a user id of 0 I would be concerned that you might have 0 used elsewhere when you do not want a record found. For example I often just convert any input id field to an integer and if someone has tried to hack around and enter a string this might well be converted to 0. In such a case I wouldn't really want the zero to be meaningful.
I would possibly be tempted to set up a 'none' userid to use in this situation.
Your current solution is going to grow huge very quickly and provide very little benefit. If it were me I would just rely on analytics to handle this sort of data. If you really want this it can be done very easily by adding one more field to your table for a count. When you are about to add a row to this look for one which already exists. If one does then instead of adding a new record just update the current record and increment the count instead. This will provide the exact same information in much less space.
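One way the count-column idea might look, combined with a reserved id for anonymous visitors so that the unique key is enforced for them too. This is a sketch under several assumptions: Python's built-in sqlite3 module stands in for MySQL, and ANONYMOUS_USER_ID, view_count, and record_view are hypothetical names. The value 0 is used purely for illustration; pick whatever reserved id suits your schema.

```python
import sqlite3

ANONYMOUS_USER_ID = 0  # hypothetical reserved id meaning "no user logged in"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article_views ("
    " article_id INTEGER NOT NULL,"
    " user_id INTEGER NOT NULL,"
    " ip TEXT NOT NULL,"
    " view_count INTEGER NOT NULL DEFAULT 1,"
    " UNIQUE (article_id, user_id, ip)"
    ")"
)

def record_view(article_id, user_id, ip):
    """Increment the existing row's count, or insert a new row if none exists."""
    uid = ANONYMOUS_USER_ID if user_id is None else user_id
    cur = conn.execute(
        "UPDATE article_views SET view_count = view_count + 1"
        " WHERE article_id = ? AND user_id = ? AND ip = ?",
        (article_id, uid, ip),
    )
    if cur.rowcount == 0:  # nothing updated, so this is the first view
        conn.execute(
            "INSERT INTO article_views (article_id, user_id, ip) VALUES (?, ?, ?)",
            (article_id, uid, ip),
        )

# Four anonymous views from the same address collapse into one row.
for _ in range(4):
    record_view(5, None, "192.168.3.50")

rows = conn.execute(
    "SELECT article_id, user_id, ip, view_count FROM article_views"
).fetchall()
print(rows)  # [(5, 0, '192.168.3.50', 4)]
```

In MySQL itself, INSERT ... ON DUPLICATE KEY UPDATE can express the same update-or-insert step in a single statement.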