NULL in table fields better than empty fields? - mysql

I wonder whether fields with no value should hold NULL or be left empty.
What is best?
Thanks

NULL means that no data is set, while empty string could be some valid data.
Thus, using NULL helps you to differentiate these two cases.
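To make the distinction concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs anywhere), and the users table and nickname column are hypothetical. The IS NULL and = '' behaviour shown is standard SQL and should carry over to MySQL.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT NOT NULL, nickname TEXT)")
conn.executemany(
    "INSERT INTO users (name, nickname) VALUES (?, ?)",
    [
        ("alice", None),   # nickname never supplied -> stored as NULL
        ("bob", ""),       # nickname explicitly left empty
        ("carol", "caz"),  # nickname supplied
    ],
)

# IS NULL matches only the row where no data was set.
unknown = [r[0] for r in conn.execute(
    "SELECT name FROM users WHERE nickname IS NULL ORDER BY name")]

# = '' matches only the row that holds an empty string.
empty = [r[0] for r in conn.execute(
    "SELECT name FROM users WHERE nickname = '' ORDER BY name")]

print(unknown, empty)
```

The two queries return different rows, which is exactly the information you lose if both cases are stored as ''.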

From a programming standpoint, I try not to allow NULL values, for a few reasons. One is that code often reacts badly to unexpected NULL values. If a query filter ran faster when checking NULL values I might consider using them, but I have seen no evidence of this. I have, however, seen many a function fall over on some comparison because it did not test for NULL beforehand.

There is a certain argument that you should never allow NULL in your data: if you are using it to indicate that you don't know what the value should be, or that you just don't have that data yet, then use an explicit value in the field to indicate those states. Similarly for 'empty' fields. That said, I think everyone does it or has done it and may do it again. NULL has odd comparison properties, which is why it's best, if you can, to avoid it and have explicit values for missing-data states.

Avoid NULLs in base tables whenever three-valued logic is likely to come back to bite you. That's easy to say, but lengthy to explain. Three-valued logic can sometimes be successfully managed, but your intuition is likely to be based on two-valued logic, and can be misleading.
If you avoid NULLs in base tables, but create views or queries with outer joins, be prepared to deal with NULLs. NULLs in fields that are never used in WHERE clauses and never used "incorrectly" with aggregates (as in sum(FIELD)) are OK.
NULL fields are always empty, but empty doesn't always imply NULL. In particular, an empty or non existent field in a form can translate into a non NULL value in a table. Autonumber fields are an example.
Oracle made a mistake way back in the 1980s by using the same representation for the VARCHAR string of length zero (the empty string), and NULL. They've been about to fix it "real soon now" for a quarter of a century. Don't hold your breath.
Don't use NULLs to convey a meaningful message. This almost always confuses your colleagues, even when they deny it.
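As a hedged illustration of the outer join and aggregate points above: in this sketch neither base table allows NULL, yet the LEFT JOIN still produces NULLs that the aggregates must cope with. SQLite (via Python's sqlite3) stands in for MySQL, and the customers and orders tables are made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER NOT NULL,
                         amount INTEGER NOT NULL);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'ben');
    INSERT INTO orders VALUES (10, 1, 50), (11, 1, 70);
""")

# ben has no orders, so the outer join pads his row with NULLs.
rows = conn.execute("""
    SELECT c.name, COUNT(*), COUNT(o.id), SUM(o.amount)
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
""").fetchall()

for row in rows:
    print(row)
```

For ben, COUNT(*) counts the padded row, COUNT(o.id) skips the NULL, and SUM(o.amount) comes back as NULL rather than 0, so the three numbers disagree unless the query is written with that in mind.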

Nulls are necessary and important tools in database design. If you don't know the value at the time the record is inserted, null is entirely appropriate and the best practice. Making an unknown into a known value such as the empty string is silly. This is especially true when you get away from string data into dates or numeric data. 0 is not the same as null, and some arbitrary date far in the past or future is not the same as null. For that matter, an empty string means there is no value, while null means we don't know what the value is. This is an important distinction.
It's not hard to handle nulls; any competent programmer should be able to do so.

Related

mysql: 'WHERE something!=true' excludes fields with NULL

I have 2 tables: one holds groups, the other holds user restrictions on which groups are seen.
When I do a LEFT JOIN and specify no condition, it shows me all records. When I add WHERE group_hide.hide!='true', it only shows the records whose enum is set to false. With the JOIN, the other groups get the hide field set to "NULL".
How can I make it exclude only those that are set to true, and show everything else that has either NULL or false?
In MySQL you must use IS NULL or IS NOT NULL when dealing with nullable values.
Here you should use (group_hide.hide IS NULL OR group_hide.hide != 'true')
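A small runnable sketch of why the original filter drops rows and the IS NULL version does not. Assumptions: SQLite (through Python's sqlite3) stands in for MySQL, the groups table is called grp here, and hide is stored as the text 'true' or 'false' instead of a MySQL ENUM.

```python
import sqlite3

# SQLite stands in for MySQL; grp and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grp (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE group_hide (group_id INTEGER NOT NULL, hide TEXT NOT NULL);
    INSERT INTO grp VALUES (1, 'news'), (2, 'sports'), (3, 'misc');
    INSERT INTO group_hide VALUES (1, 'true'), (2, 'false');
""")

base = """
    SELECT grp.name
    FROM grp
    LEFT JOIN group_hide ON group_hide.group_id = grp.id
    WHERE {condition}
    ORDER BY grp.name
"""

# NULL != 'true' evaluates to NULL, not TRUE, so 'misc' is filtered out.
naive = [r[0] for r in conn.execute(
    base.format(condition="group_hide.hide != 'true'"))]

# Testing for NULL explicitly keeps groups that have no restriction row.
fixed = [r[0] for r in conn.execute(
    base.format(condition="(group_hide.hide IS NULL OR group_hide.hide != 'true')"))]

print(naive, fixed)
```

Only the second query returns the group that has no matching row in group_hide.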
Don already provided a good answer to the question you asked, and it will solve your immediate problem.
However, let me address the point of the wrong data type domain. Normally you would make hide a BOOLEAN, but MySQL does not really implement it completely. It converts it to TINYINT(1), which allows values from -128 to 127 (see the overview of data types for MySQL). Since MySQL does not enforce CHECK constraints (versions before 8.0.16 parse and ignore them), you are left with the options of using a trigger or a foreign key reference to properly enforce the domain.
Here are the problems with wrong data domain (your case), in order of importance:
The disadvantage of allowing NULL for a field that can only be 1 or 0 is that you have to employ 3-value logic (true, false, null), which, by the way, is not perfectly implemented in SQL. This makes certain queries more complex and slower than they need to be. If you can make a column NOT NULL, do.
The disadvantage of using VARCHAR for a field that can only be 1 or 0 is query speed, due to the extra I/O and bigger storage needs (it slows down reads and writes, makes indexes bigger if the field is part of an index, and increases the size of backups; keep in mind that none of these effects may be noticeable with the wrong domain on a single field in a small table, but if data types are consistently set too big, or if the table has a serious number of records, the effects will bite). Also, you will always need to convert the VARCHAR to 1 or 0 to use the natural MySQL boolean operators, which increases the complexity of queries.
The disadvantage of MySQL using TINYINT(1) for BOOL is that the RDBMS allows certain values that should not be allowed, so meaningless values can in theory be stored in the system. In this case your application layer must guarantee data integrity, and it is always better if the RDBMS guarantees integrity, as that protects you from certain bugs in the application layer and from mistakes made by a database administrator.
An obvious answer would be:
WHERE (group_hide.hide is null or group_hide.hide ='false')
I'm not sure off the top of my head what the null behaviour rules are.

NULL or set a default value

I have a table for user questions, and each question has a "question_score" field. If I allow this field to be NULL, will that save some space or maybe even some CPU time?
question_id (int) | quesion_name (varchar) | question_score (int) ...
You won't save any space or cpu time.
Even if the value is NULL, MySQL will still have to store this fact.
Sometimes permitting NULL values is the right thing to do, but surely not when the reason is some non-existent space optimization. A default value is generally the way to go.
Use nulls if you cannot know the value at the time the data is entered, unless there is a reasonable alternative. For instance, if you don't know the name of something, you could perhaps have a default value of 'Unknown'. However, if you don't know the end date, don't put in some fake data that you always have to remember to code around. And never put in fake data that might conceivably be a real value (0 for price, for instance), or you won't be able to tell the items you gave away from the ones you haven't set a price for yet. Nulls are good; use them when appropriate. In my 30 years of database experience, it is much harder to work around fake data used in place of nulls (and far more likely that you return incorrect results without knowing it) than it is to handle nulls properly.
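The price example can be sketched as follows. This is only an illustration under stated assumptions: SQLite via Python's sqlite3 stands in for MySQL, and the items table with prices in cents is invented for the example.

```python
import sqlite3

# SQLite stands in for MySQL; the items table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT NOT NULL, price_cents INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        ("pen", 200),
        ("sticker", 0),   # a real giveaway
        ("mug", None),    # price not decided yet
    ],
)

# With NULL for "not priced yet", the two cases stay separate.
free = [r[0] for r in conn.execute(
    "SELECT name FROM items WHERE price_cents = 0 ORDER BY name")]
unpriced = [r[0] for r in conn.execute(
    "SELECT name FROM items WHERE price_cents IS NULL ORDER BY name")]

# AVG skips NULLs, so the unpriced mug does not drag the average down.
avg_price = conn.execute("SELECT AVG(price_cents) FROM items").fetchone()[0]

# Replacing NULL with a fake 0 makes the mug look like a giveaway.
conn.execute("UPDATE items SET price_cents = 0 WHERE price_cents IS NULL")
free_after = [r[0] for r in conn.execute(
    "SELECT name FROM items WHERE price_cents = 0 ORDER BY name")]

print(free, unpriced, avg_price, free_after)
```

Once the fake 0 is stored, no query can tell the mug from the sticker any more.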
In short, no. It's a question of correctness, not performance. Nulls almost invariably lead to inconsistencies and incorrect results. The best strategy is to design the database in Normal Form without nulls, unless and until you find a compelling reason to do otherwise.
In my opinion, NULL values are always a no-go. Please normalize the database to at least third normal form. Your db structure is probably not optimal.

Performance effects of using NULL-able fields in MySQL

Sometimes an absent value can be represented (with no loss of function) without resort to a NULL-able column, e.g.:
Zero integer in a column that references the AUTO_INCREMENT row ID of another table
Invalid date value (0000-00-00)
Zero timestamp value
Empty string
On the other hand, according to Ted Codd's relational model, NULL is the marker of an absent datum. I always feel better doing something "the correct way" and MySQL supports it and the associated 3-value logic, so why not?
A few years ago I was working on a performance problem and found that I could resolve it simply by adding NOT NULL to a column definition. The column was indexed, but I don't remember other details. I have avoided NULL-able columns when there is an alternative since then.
But it has always bothered me that I don't properly understand the performance effects of allowing NULL in a MySQL table. Can anyone help out?
Declaring a column NOT NULL saves 1 bit per column. http://dev.mysql.com/doc/refman/5.0/en/data-size.html
That doesn't seem like much, but over millions of rows it starts to make a difference.

Dealing with null values versus empty strings in a database when only empty strings are coming from the client via HTTP POST

My MySQL database has carefully defined fields. Some fields, if there is a chance that they can be unknown, allow a NULL value.
I'm writing a web-based CMS to handle the data in said database. Obviously, the post arrays from the HTML forms never contain null values, only empty strings.
I don't want to confuse the users of my CMS by adding a "NULL checkbox" or anything like that but I can't tell from the post arrays whether a field should be null or empty.
Should I convert all empty strings to NULL values on save for fields that allow null values?
What are some good practices for this type of conundrum?
That is a sticky question. Of course there is no way for a user to know whether they want to insert an empty string or nothing at all. So the answer is really up to you. Do you want to allow users to enter empty strings? Does it have any meaning for it to do so? If an empty string means something that a NULL string doesn't and that is well defined, go ahead and allow both.
If there is no distinction between them, pick one and stick with it. I personally don't see why you would need to keep empty strings like that, it just makes things far more confusing later. Just stick with NULL to represent an absence of data.
Make your decision now, document it and stick by it.
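If the decision is "blank input in a nullable field means NULL", the conversion can live in one small function applied just before saving. The sketch below is in Python and everything in it is hypothetical: the empty_to_none helper, the field names, and the choice to treat whitespace-only input as blank are illustrative assumptions, not part of any framework.

```python
def empty_to_none(form, nullable_fields):
    """Return a copy of form where blank strings in nullable fields become None.

    Whitespace-only input is treated as blank; that is a policy choice
    made for this sketch, not a rule.
    """
    cleaned = {}
    for key, value in form.items():
        is_blank = isinstance(value, str) and value.strip() == ""
        if key in nullable_fields and is_blank:
            cleaned[key] = None   # a DB driver will send this as NULL
        else:
            cleaned[key] = value  # NOT NULL fields keep their empty string
    return cleaned


# Hypothetical POST data: subtitle and summary are nullable, body is not.
post = {"title": "Hello", "subtitle": "", "summary": "   ", "body": ""}
row = empty_to_none(post, nullable_fields={"subtitle", "summary"})
print(row)
```

Keeping the list of nullable fields in one place also serves as the documentation of the decision.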
Do you really want to deal with actual proper NULL, with its tricky three-value logic?
In my experience you rarely actually want the explicit uncertainty that comes with NULL. Storing an empty string for values the user doesn't care to fill in is usually more practical. Then you can go ahead and do searches like WHERE t.field<>'x' or WHERE t0.field=t1.field without having to worry about what that does to your boolean logic when one or both are null.
If you've already got a working database that relies on the uncertain nature of null, and that's an intrinsic part of your requirements, then fine, stick with null (and in that case you'll probably have to convert empty user input in a nullable field to null just because no end-user is ever going to be able to understand the conceptual difference between nothing and null).
But personally I only still use nulls for optional foreign key references (when not using a separate join table).

MySQL: NULL vs ""

Is it better to use default null or default "" for text fields in MySQL?
Why?
Update: I know what each of them means. I am interested in which is better to use considering disk space and performance.
Update 2: Hey ppl! The question was "what is better to use" not "what each means" or "how to check them"...
For MyISAM tables, NULL creates an extra bit for each NULLABLE column (the null bit) for each row. If the column is not NULLABLE, the extra bit of information is never needed. However, the null bits are padded out to whole 8-bit bytes, so the per-row overhead is the count of NULLABLE columns divided by 8, rounded up to the next byte. 1
Text columns are a little different from other datatypes. First, for "" the table entry holds the two-byte length of the string followed by the bytes of the string, and is a variable-length structure. In the case of NULL, there's no need for the length information, but it's included anyway as part of the column structure.
In InnoDB, NULLS take no space: They simply don't exist in the data set. The same is true for the empty string as the data offsets don't exist either. The only difference is that the NULLs will have the NULL bit set while the empty strings won't. 2
When the data is actually laid out on disk, NULL and '' take up EXACTLY THE SAME SPACE in both storage engines. However, when the value is searched, checking for NULL is slightly faster than checking for '', as you don't have to consider the data length in your calculations: you only check the null bit.
As a result, choosing between NULL and '' has NO SIZE IMPACT in itself; what matters is whether the column is declared NULLable or not. If the column is NOT NULL, only in MyISAM tables will you see any performance difference (and then, obviously, default NULL can't be used, so it's a moot question).
The real question then boils down to the application interpretation of "no value set here" columns. If the "" is a valid value meaning "the user entered nothing here" or somesuch, then default NULL is preferable as you want to distinguish between NULL and "" when a record is entered that has no data in it.
Generally though, default is really only useful for refactoring a database, when new values need to come into effect on old data. In that case, again, the choice depends upon how the application data is interpreted. For some old data, NULL is perfectly appropriate and the best fit (the column didn't exist before so it has NULL value now!). For others, "" is more appropriate (often when the queries use SELECT * and NULL causes crash problems).
In ULTRA-GENERAL TERMS (and from a philosophical standpoint) default NULL for NULLABLE columns is preferred as it gives the best semantic interpretation of "No Value Specified".
1 [http://forge.mysql.com/wiki/MySQL_Internals_MyISAM]
2 [http://forge.mysql.com/wiki/MySQL_Internals_InnoDB]
Use default null. In SQL, null is very different from the empty string (""). The empty string specifically means that the value was set to be empty; null means that the value was not set, or was set to null. Different meanings, you see.
The different meanings and their different usages are why it's important to use each of them as appropriate; the amount of space potentially saved by using default null as opposed to default "" is so small that it approaches negligibility; however, the potential value of using the proper defaults as convention dictates is quite high.
From High Performance MySQL, 3rd Edition
Avoid NULL if possible.
A lot of tables include nullable columns even when the application does not need
to store NULL (the absence of a value), merely because it’s the default. It’s usually
best to specify columns as NOT NULL unless you intend to store NULL in them.
It’s harder for MySQL to optimize queries that refer to nullable columns, because
they make indexes, index statistics, and value comparisons more complicated. A
nullable column uses more storage space and requires special processing inside
MySQL. When a nullable column is indexed, it requires an extra byte per entry
and can even cause a fixed-size index (such as an index on a single integer column)
to be converted to a variable-sized one in MyISAM.
The performance improvement from changing NULL columns to NOT NULL is usually
small, so don’t make it a priority to find and change them on an existing schema
unless you know they are causing problems. However, if you’re planning to index
columns, avoid making them nullable if possible.
There are exceptions, of course. For example, it’s worth mentioning that InnoDB
stores NULL with a single bit, so it can be pretty space-efficient for sparsely populated
data. This doesn’t apply to MyISAM, though.
I found out that NULL vs "" is insignificant in terms of disk-space and performance.
The only true reason I can personally see for using NULL over '' is when you have a field marked as UNIQUE but need the ability to allow multiple "empty" values.
For example, the email column in my user table is only filled in if someone actually has an email address. Anyone without an email address gets NULL. I can still make this field unique because NULL isn't counted as a value, whereas the empty string '' is.
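Here is a quick sketch of that UNIQUE behaviour. It runs against SQLite through Python's sqlite3 as a stand-in for MySQL (an assumption for portability; both treat NULLs in a UNIQUE column as distinct), and the users table is hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; the users table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

# Two users without an email address: both NULLs are accepted.
conn.execute("INSERT INTO users (email) VALUES (NULL)")
conn.execute("INSERT INTO users (email) VALUES (NULL)")

# The empty string is an ordinary value, so it can appear only once.
conn.execute("INSERT INTO users (email) VALUES ('')")
try:
    conn.execute("INSERT INTO users (email) VALUES ('')")
    second_empty_ok = True
except sqlite3.IntegrityError:
    second_empty_ok = False

null_count = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email IS NULL").fetchone()[0]
print(null_count, second_empty_ok)
```

The second '' insert is rejected by the UNIQUE constraint, while any number of NULLs can coexist.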
A lot of folks are answering the what is the difference between null and '', but the OP has requested what takes up less space/is faster, so here's my stab at it:
The answer is that it depends. If your field is a char(10), it will always take 10 bytes if not set to null, and therefore null will take up less space. That is minute on a row-by-row basis, but over millions and millions of rows it could add up. I believe even a varchar(10) will store one byte (the length prefix) for an empty string, so again this could add up over huge tables.
In terms of query performance, null is in theory quicker to test, but I haven't been able to measure any appreciable difference on a well-indexed table. Keep in mind, though, that you may have to convert null to '' on the application side if that is the desired return value. Again, row by row the difference is minute, but it could potentially add up.
All in all it's a micro-optimization, so it boils down to preference. My preference is to use null because I like to know that there's no value there, and not guess if it's a blank string ('') or a bunch of spaces (' '). null is explicit in its nature. '' is not. Therefore, I go with null because I'm an explicit kind of guy.
Use whatever makes sense. NULL means "no value available/specified", "" means "empty string."
If you don't allow empty strings, but the user does not have to enter a value, then NULL makes sense. If you require a value, but it can be empty, NOT NULL and a value of "" makes sense.
And, of course, if you don't require a value, but an empty value can be specified, then NULL makes sense.
Looking at an efficiency point of view, an extra bit is used to determine whether the field is NULL or not, but don't bother about such micro-optimization until you have millions of rows.
"" is like an empty box... null is like no box at all.
It's a difficult concept to grasp initially, but as the answers here plainly state - there is a big difference.
'' = '' yields TRUE which satisfies WHERE condition
NULL = NULL yields NULL which doesn't satisfy WHERE condition
Which is better to use depends on what result you want to get.
If your values default to NULL, no query like this:
SELECT *
FROM mytable
WHERE col1 = ?
will ever return these values, even if you pass the NULL for the bound parameter, while this query:
SELECT *
FROM mytable
WHERE col1 = ''
will return you the rows that you set to an empty string.
This is true for MySQL, but not for Oracle, which does not distinguish between empty string and a NULL.
In Oracle, the latter query will never return anything.
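The bound-parameter case is easy to try out. The sketch below uses Python's sqlite3 as a stand-in for MySQL (an assumption; the comparison rules shown are standard SQL), with mytable and col1 taken from the queries above and an id column added for the example. In MySQL, the NULL-safe equality operator <=> is one way to make a single parameterized query match NULL as well.

```python
import sqlite3

# SQLite stands in for MySQL; the id column is added for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, col1 TEXT)")
conn.executemany(
    "INSERT INTO mytable (col1) VALUES (?)",
    [(None,), ("",), ("x",)],   # ids 1, 2, 3
)

# Binding None sends NULL, but col1 = NULL is never TRUE: no rows.
by_null_param = conn.execute(
    "SELECT id FROM mytable WHERE col1 = ?", (None,)).fetchall()

# The empty string compares like any other value.
by_empty = conn.execute(
    "SELECT id FROM mytable WHERE col1 = ''").fetchall()

# NULL rows have to be asked for explicitly.
by_is_null = conn.execute(
    "SELECT id FROM mytable WHERE col1 IS NULL").fetchall()

print(by_null_param, by_empty, by_is_null)
```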
Use "". It requires less programming effort if you can assert that columns are non-null. Space difference between these is trivial.
I prefer null when it is semantically correct. If there is an address field available and the user did not fill it in, I give it a "". However, if there is an address attribute in the users table but I did not offer the user a chance to fill it in, I give it a NULL.
I doubt (but I can't verify) that NULL and "" makes much of a difference.
In general, NULL should indicate data that is not present or has not been supplied, and therefore is a better default value than the empty string.
Sometimes the empty string is what you need as a data value, but it should almost never be a default value.
NULL means 'there is no value' and is treated specially by RDBMSs in WHERE clauses and joins.
"" means 'empty string' and is not treated specially.
It depends on what the text represents and how it will actually be used in queries.
For example, you can have a questionnaire with some obligatory questions and some optional questions.
Declined optional questions should have a NULL in their corresponding column.
Obligatory questions should have an empty string as default, because they HAVE to be answered. (Of course in a real application you'd tell the user to enter something, but I hope you get the idea)