I have added an ENUM column to my table, and all the rows that already existed now have a NULL value in it. The only problem is that I don't understand how to select those rows and update them to what I need.
I've tried something like UPGRADE users Status='User' where Status=NULL and other variants like "... where Status='' ", but none of them worked.
I want to change every NULL value to 'User'.
NULL is a special value: in pretty much any normal comparison, if NULL is involved the result is NULL, which is not TRUE and is therefore effectively FALSE. When testing for NULL you need to use x IS NULL or x IS NOT NULL.
That said, I generally recommend avoiding MySQL's ENUM data type. Adding new values involves ALTER TABLE, which involves rebuilding the table. ENUMs also do not "play nice" with a lot of libraries used to connect to databases, and they have weird semantics in general. You're usually much better off with a lookup table using standard data types.
Also, the statement type you're looking for is UPDATE, there is no "UPGRADE".
Try:
UPDATE users
SET Status = 'User'
WHERE Status IS NULL;
This is because NULL is a special value. Conceptually, NULL means “a missing unknown value” and it is treated somewhat differently from other values. To test for NULL, use the IS NULL and IS NOT NULL operators.
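To see the difference between the two WHERE clauses in action, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (the comparison rules for NULL shown here are the same in both), and the sample rows are made up for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; "= NULL" is never true in either.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, Status TEXT)")
conn.executemany(
    "INSERT INTO users (Status) VALUES (?)",
    [(None,), (None,), ("Admin",)],  # hypothetical sample data
)

# "Status = NULL" evaluates to NULL for every row, so nothing is updated.
rows_with_equals = conn.execute(
    "UPDATE users SET Status = 'User' WHERE Status = NULL"
).rowcount

# "Status IS NULL" is the correct test, so both NULL rows are updated.
rows_with_is_null = conn.execute(
    "UPDATE users SET Status = 'User' WHERE Status IS NULL"
).rowcount

statuses = [row[0] for row in conn.execute("SELECT Status FROM users ORDER BY id")]
print(rows_with_equals, rows_with_is_null, statuses)
```

The first UPDATE runs without any error, which is why this mistake is easy to miss: it simply matches no rows.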
I have a "roles" table that contains records describing the roles of users.
In it I have one column, "is_archived" (INT), which I use to tell whether a role still exists or has been deleted.
So I am considering two values for that column:
"NULL"=> if that role still exists (like TRUE),
"1" => if deleted /inactive (like FALSE)
In my table, most records will contain "NULL" in this column, and the default value is also "NULL".
Now I am wondering whether there is any performance issue in using "NULL" instead of "0".
I need to know the pros and cons of this approach (search performance, storage, indexing, etc.).
And if there are cons, what are the best alternatives?
My opinion is that NULL is for "out of band", not for kludging an in-band value. If there is any performance or space difference, it is insignificant.
For true/false, use TINYINT NOT NULL. It is only 1 byte. You could use ENUM('false', 'true'); it is also 1 byte.
INT, regardless of the number after it, takes 4 bytes. Don't use INT for something of such low cardinality.
Leave NULL to mean "not yet known" or any other situation where you can't yet say "true" or "false". (Since you probably always know if it is 'archived', NULL has no place here.)
You could even use ENUM('male', 'female', 'decline_to_state', 'transgender', 'gay', 'lesbian', 'identifies_as_male', 'North_Carolina_resident', 'other'). (Caveat: That is only a partial list; it may be better to set up a table and JOIN to it.)
I agree with @RickJames about NULL. Don't use NULL where you mean to use a real value like true. Likewise, don't use a real value like 0 or '' to signify absence of a value.
As for performance impact, you should know that to search for the presence/absence of NULL you would use the predicate is_archived IS [NOT] NULL.
If you use EXPLAIN on the query, you'll see that that predicate counts as a "range" access type, whereas searching for a single specific value, e.g. is_archived = 1 or is_archived = 0, is a "ref" access type.
That will have performance implications for some queries. For example if you have an index on (is_archived, created_on) and you try to do a query like:
SELECT ... FROM roles
WHERE is_archived IS NULL AND created_on = '2017-01-31'
Then the index will only be half-useful. The WHERE clause cannot search the second column in the index.
But if you use real values, then the query like:
SELECT ... FROM roles
WHERE is_archived = 0 AND created_on = '2017-01-31'
Will use both columns in the index.
Re your comment about NULL storage:
Yes, in the InnoDB storage engine, internally each row stores a bitfield with 1 bit per column, where the bits indicate whether each column is NULL or not. These bits are stored compactly, i.e. one byte contains up to 8 bits. Following the bitfield is the series of column values. A column that is NULL stores no value. So yes, technically it is true that using a NULL reduces storage.
However, I urge you to simplify your data management and use false when you mean false. Do not use NULL for one of your values. I suppose there's an exception if you manage data at a scale where saving one byte per row matters. For example, if you are managing tens of billions of rows.
But at a smaller scale than that, the potential space savings aren't worth the extra complexity you add to your project.
To put it in perspective, InnoDB only fills each data page 15/16 full anyway. So the overhead of the InnoDB page format is likely to be greater than the savings you could get from micro-optimizing boolean storage.
MySQL provides a nice operator, <=>, that handles comparisons that could involve NULL, such as null <=> null or null <=> 5, giving back intuitive results as in many programming languages. The normal equals operator, by contrast, always just returns NULL, which trips up many new MySQL users such as myself.
Is there a reason MySQL has both and not JUST the functionality in <=> ? Who really needs an operator that is effectively undefined with built in language types?
Who really needs an operator that is effectively undefined with built in language types?
You asked for some real-world examples. Here's a spurious one.
Let's say that you have a residential youth programme or similar, and one of the requirements is that the kids only share a room with someone of the same sex. You have a nullable M/F field in your database - nullable because your data feed is incomplete (you're still chasing down some of the data).
Your room-matching code should definitely not match students where t1.Gender<=>t2.Gender, because it could end up matching two kids of unknown gender, who might be of opposite genders. Instead, you match where they're equal and not both null.
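A small sketch of that room-matching scenario shows the difference. It assumes Python's sqlite3 module as a stand-in for MySQL, where SQLite's IS operator plays the role of MySQL's <=>; the table, column and student names are hypothetical.

```python
import sqlite3

# Assumption: SQLite's "IS" is used here as the equivalent of MySQL's "<=>".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, gender TEXT)")  # gender is nullable
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Ann", "F"), ("Bea", "F"), ("Cal", None), ("Dee", None)],  # made-up data
)

pairs_sql = """
    SELECT t1.name, t2.name
    FROM students t1 JOIN students t2 ON t1.name < t2.name
    WHERE t1.gender {op} t2.gender
    ORDER BY t1.name, t2.name
"""

# Ordinary equality: unknown genders never match anything.
strict_pairs = conn.execute(pairs_sql.format(op="=")).fetchall()

# Null-safe equality: two unknown genders are treated as "the same".
null_safe_pairs = conn.execute(pairs_sql.format(op="IS")).fetchall()

print(strict_pairs)
print(null_safe_pairs)
```

With plain = only Ann and Bea are paired; the null-safe comparison also pairs Cal and Dee, which is exactly the match the programme must avoid.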
That's just one example. I admit that the behaviour of NULL and the = operator have caused a lot of confusion over the years, but ultimately the fault probably lies with the plethora of online MySQL tutorials that make no mention of how NULL interacts with operators, nor of the existence of the <=> operator.
The big difference between null in MySQL and in programming languages is that in MySQL, null means an unknown value, while in programming it means an undefined value.
In MySQL, null does not equal null (unknown does not equal unknown), while in programming languages, null does equal null (undefined equals undefined).
Who really needs an operator that is effectively undefined with built in language types?
You also need it for relations inside your database, especially if you are using foreign key constraints.
For example, say you have a table for tasks (in your company) and you assign these tasks to employees. So you have a relation from your tasks table to your employees table.
There will always be some unassigned tasks. In this case the field in your tasks table that you use for the relation to the employees table will contain NULL.
This makes sure that the task is unassigned, which means there is no possibility of a relation to the employees table.
If NULL = NULL were true, then in my example there would always be the possibility that the key in the employees table is also NULL. The task would then be assigned to one or more employees, and you would never be able to know for sure whether a task is assigned to some employee or not.
Yes.
This must be because relational databases use the theory of three-valued logic (TRUE, NULL, FALSE).
And the three-valued logic must work so because it must be internally consistent.
It follows from the rules of mathematics.
Comparisons with NULL and the three-valued logic
Is there a reason MySql has both and not JUST the functionality in <=> ?
The operators are completely different from each other.
<=> performs an equality comparison like the = operator, but returns 1 rather than NULL if both operands are NULL, and 0 rather than NULL if one operand is NULL.
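The truth table described above can be checked with a few one-line queries. This is a rough sketch that assumes Python's sqlite3 module in place of MySQL; SQLite has no <=> operator, so its IS operator is used as the null-safe equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expression):
    """Return the single value produced by SELECT <expression>."""
    return conn.execute(f"SELECT {expression}").fetchone()[0]


# Assumption: SQLite "a IS b" mirrors MySQL "a <=> b".
results = {
    "5 = 5": evaluate("5 = 5"),              # ordinary equality, both known
    "NULL = 5": evaluate("NULL = 5"),        # ordinary equality, one NULL
    "NULL = NULL": evaluate("NULL = NULL"),  # ordinary equality, both NULL
    "5 IS 5": evaluate("5 IS 5"),            # null-safe, both known
    "NULL IS 5": evaluate("NULL IS 5"),      # null-safe, one NULL
    "NULL IS NULL": evaluate("NULL IS NULL"),  # null-safe, both NULL
}

for expression, value in results.items():
    print(f"{expression:14} -> {value!r}")
```

SQL NULL comes back as Python's None, so the = rows with a NULL operand print None while the null-safe rows always print 0 or 1.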
Who really needs an operator that is effectively undefined with built in language types?
This depends on the case; just because you haven't encountered such cases does not mean nobody needs it.
When the null-safe equals operator <=> is combined with the NOT operator, it fits real-world use cases. For example, to return records that are not dormant:
WHERE NOT dormant_status <=> 'Y'
it is equivalent to
WHERE dormant_status IS DISTINCT FROM 'Y' (only valid in certain databases)
or
WHERE dormant_status <> 'Y' OR dormant_status IS NULL
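The equivalence between these forms can be sketched as follows. The example assumes Python's sqlite3 module rather than MySQL, so NOT dormant_status <=> 'Y' is written in SQLite's spelling, dormant_status IS NOT 'Y'; the accounts table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, dormant_status TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("a", "Y"), ("b", "N"), ("c", None)],  # made-up data, one NULL status
)


def names(where_clause):
    """Return account names matching the given WHERE clause, in name order."""
    sql = f"SELECT name FROM accounts WHERE {where_clause} ORDER BY name"
    return [row[0] for row in conn.execute(sql)]


# Plain inequality silently drops the row whose status is NULL.
plain_not_equal = names("dormant_status <> 'Y'")

# Assumption: SQLite "IS NOT" stands in for MySQL "NOT ... <=> ...".
null_safe = names("dormant_status IS NOT 'Y'")

# The long-hand form that works in any SQL dialect.
long_hand = names("dormant_status <> 'Y' OR dormant_status IS NULL")

print(plain_not_equal, null_safe, long_hand)
```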
I have a 2 tables, one in which I have groups, the other where I set user restrictions of which groups are seen.
When I do a LEFT JOIN and specify no condition, it shows me all records. When I add WHERE group_hide.hide!='true' it only shows those records that have the enum value false set for them. With the JOIN, the other groups get the hide field set to "NULL".
How can I make it exclude only those that are set to true, and show everything else that has either NULL or false?
In MySQL you must use IS NULL or IS NOT NULL when dealing with nullable values.
Here you should use (group_hide.hide IS NULL OR group_hide.hide != 'true')
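Here is a minimal sketch of that fix, assuming Python's sqlite3 module in place of MySQL. The table layout (user_groups, group_hide) and the sample rows are guesses for illustration, since the original schema was not posted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one table of groups, one of per-group hide flags.
conn.execute("CREATE TABLE user_groups (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE group_hide (group_id INTEGER, hide TEXT)")
conn.executemany(
    "INSERT INTO user_groups VALUES (?, ?)",
    [(1, "public"), (2, "staff"), (3, "secret")],
)
# Group 1 has no row here, so the LEFT JOIN yields hide = NULL for it.
conn.executemany(
    "INSERT INTO group_hide VALUES (?, ?)",
    [(2, "false"), (3, "true")],
)

join_sql = """
    SELECT g.name
    FROM user_groups g
    LEFT JOIN group_hide h ON h.group_id = g.id
    WHERE {condition}
    ORDER BY g.id
"""

# NULL != 'true' evaluates to NULL, so the unmatched group is filtered out.
naive = [r[0] for r in conn.execute(join_sql.format(condition="h.hide != 'true'"))]

# Adding the IS NULL test keeps groups that have no restriction row at all.
fixed = [
    r[0]
    for r in conn.execute(
        join_sql.format(condition="(h.hide IS NULL OR h.hide != 'true')")
    )
]

print(naive, fixed)
```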
Don already provided a good answer to the question that you asked, and it will solve your immediate problem.
However, let me address the point of the wrong data type domain. Normally you would make hide a BOOLEAN, but MySQL does not really implement it completely. It converts it to TINYINT(1), which allows values from -128 to 127 (see the overview of data types for MySQL). Since MySQL did not enforce CHECK constraints before version 8.0.16, on older versions you are left with the options of using either a trigger or a foreign key reference to properly enforce the domain.
Here are the problems with wrong data domain (your case), in order of importance:
The disadvantage of allowing NULL for a field that can be only 1 or 0 is that you have to employ three-valued logic (true, false, null), which, by the way, is not perfectly implemented in SQL. This makes certain queries more complex and slower than they need to be. If you can make a column NOT NULL, do.
The disadvantages of using VARCHAR for a field that can be only 1 or 0 are the speed of the query, due to the extra I/O and bigger storage needs (slows down reads, writes, makes indexes bigger if a field is part of the index and influences the size of backups; keep in mind that none of these effects might be noticeable with wrong domain of a single field for a smaller size tables, but if data types are consistently set too big or if the table has serious number of records the effects will bite). Also, you will always need to convert the VARCHAR to a 1 or 0 to use natural mysql boolean operators increasing complexity of queries.
The disadvantage of mysql using TINYINT(1) for BOOL is that certain values are allowed by RDBMS that should not be allowed, theoretically allowing for meaningless values to be stored in the system. In this case your application layer must guarantee the data integrity and it is always better if RDBMS guarantees integrity as it would protect you from certain bugs in application layer and also mistakes that might be done by database administrator.
an obvious answer would be:
WHERE (group_hide.hide is null or group_hide.hide ='false')
I'm not sure off the top of my head what the null behaviour rules are.
When using integer columns is it better to have 0 or NULL to indicate no value.
For example, if a table had a parent_id field and a particular entry had no parent, would you use 0 or NULL?
I have in the past always used 0, because I come from a Java world where (prior to 1.5) integers always had to have a value.
I am asking mainly in relation to performance, I am not too worried about which is the "more correct" option.
Using NULL is preferable, for two reasons:
NULL is used to mean that the field has no value, which is exactly what you're trying to model.
If you decide to add some referential integrity constraints in the future, you will have to use NULL.
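The second point can be illustrated with a self-referencing foreign key. This is only a sketch: it assumes Python's sqlite3 module instead of MySQL (SQLite needs foreign keys switched on explicitly), and the category table is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    """
    CREATE TABLE category (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES category(id)  -- nullable on purpose
    )
    """
)

# NULL means "no parent", and the constraint does not object to it.
conn.execute("INSERT INTO category (id, parent_id) VALUES (1, NULL)")
conn.execute("INSERT INTO category (id, parent_id) VALUES (2, 1)")

# 0 is a real value, so the constraint looks for a row with id = 0 and fails.
try:
    conn.execute("INSERT INTO category (id, parent_id) VALUES (3, 0)")
    zero_parent_accepted = True
except sqlite3.IntegrityError:
    zero_parent_accepted = False

row_count = conn.execute("SELECT COUNT(*) FROM category").fetchone()[0]
print(zero_parent_accepted, row_count)
```

To keep using 0 with such a constraint you would need a dummy row with id 0 in the parent table, which is the kind of workaround NULL avoids.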
Declare columns to be NOT NULL if possible. It makes SQL operations faster, by enabling better use of indexes and eliminating overhead for testing whether each value is NULL. You also save some storage space, one bit per column. If you really need NULL values in your tables, use them. Just avoid the default setting that allows NULL values in every column.
MySQL - optimizing data size
using NULL for "no value" is literally correct. 0 is a value for an integer, therefore it has meaning.
NULL otoh literally means there is nothing, so there is no value.
Performance would likely be irrelevant, but using NULL may well be faster somewhat if you learn to code using NULL correctly.
In your parent_id example 0 is perfectly valid, because it stands for 'root'. In most cases, NULL is a better choice for 'no value' logically.
It has no performance impact that I know of, though.
You shouldn't expect to see any real life performance difference from this
UNIQUE( id1, id2 ) won't work with null values because it would allow, for example 1, null twice
on the other hand if you use 0, JOIN atable ON this.extID = atable.ID the join will be executed (resulting in no rows joined) whereas NULL will just be ignored
Anyway I suggest to always use "empty values" (like 0 or empty string) instead of NULL unless the empty value has a different meaning from NULL
and I also modify queries like so: JOIN atable ON this.extID = atable.id AND extID > 0, which prevents the useless join from being executed
I think 0 can be used instead of NULL if you don't actually expect 0 to be used as a value.
That is for example your column is a foreign key. Since foreign keys don't normally start with 0 but instead start with 1, it means you wouldn't expect the 0 to be used as a value.
You can then use the 0 to denote the 'No' value state. Using it in joins would not match any columns on the other table. Thus, having the same effect as NULL.
But suppose you have a column where 0 actually has a meaning, for example a quantity field, and apart from that you also need to express an empty value, for example to denote that the quantity hasn't been entered yet. Then you need NULL for that.
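One place where the quantity example shows up in practice is in aggregates, which skip NULL but count 0 as a real value. The sketch below assumes Python's sqlite3 module as a stand-in for MySQL, and the items table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        ("bolts", 10),
        ("nuts", 0),       # genuinely zero in stock
        ("washers", None), # quantity not entered yet
    ],
)

total_rows, known_quantities, average_quantity = conn.execute(
    "SELECT COUNT(*), COUNT(quantity), AVG(quantity) FROM items"
).fetchone()

# COUNT(*) sees every row; COUNT(quantity) and AVG(quantity) ignore the NULL.
print(total_rows, known_quantities, average_quantity)
```

Had the missing quantity been stored as 0, the average would have been taken over three rows instead of two.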
Hope that makes sense.
0 is still a valid value for an integer column. Hence you have to use NULL and allow null on that column.
Also if you are using integer column for only positive numbers, then you can use -1 for no value.
In your parent_id example, using 0 is fine as long as you make sure that no referenced ids start at 0.
Is it better to use default null or default "" for text fields in MySQL?
Why?
Update: I know what means each of them. I am interested what is better to use considering disk space and performance.
Update 2: Hey ppl! The question was "what is better to use" not "what each means" or "how to check them"...
For MyISAM tables, NULL creates an extra bit for each NULLABLE column (the null bit) for each row. If the column is not NULLABLE, the extra bit of information is never needed. However, that is padded out to 8 bit bytes so you always gain 1 + mod 8 bytes for the count of NULLABLE columns. 1
Text columns are a little different from other datatypes. First, for "" the table entry holds the two byte length of the string followed by the bytes of the string and is a variant length structure. In the case of NULL, there's no need for the length information but it's included anyways as part of the column structure.
In InnoDB, NULLS take no space: They simply don't exist in the data set. The same is true for the empty string as the data offsets don't exist either. The only difference is that the NULLs will have the NULL bit set while the empty strings won't. 2
When the data is actually laid out on disk, NULL and '' take up EXACTLY THE SAME SPACE in both data types. However, when the value is searched, checking for NULL is slightly faster than checking for '' as you don't have to consider the data length in your calculations: you only check the null bit.
As a result of the NULL and '' space differences, NULL and '' have NO SIZE IMPACT unless the column is specified to be NULLable or not. If the column is NOT NULL, only in MyISAM tables will you see any performance difference (and then, obviously, default NULL can't be used so it's a moot question).
The real question then boils down to the application interpretation of "no value set here" columns. If the "" is a valid value meaning "the user entered nothing here" or somesuch, then default NULL is preferable as you want to distinguish between NULL and "" when a record is entered that has no data in it.
Generally though, default is really only useful for refactoring a database, when new values need to come into effect on old data. In that case, again, the choice depends upon how the application data is interpreted. For some old data, NULL is perfectly appropriate and the best fit (the column didn't exist before so it has NULL value now!). For others, "" is more appropriate (often when the queries use SELECT * and NULL causes crash problems).
In ULTRA-GENERAL TERMS (and from a philosophical standpoint) default NULL for NULLABLE columns is preferred as it gives the best semantic interpretation of "No Value Specified".
1 [http://forge.mysql.com/wiki/MySQL_Internals_MyISAM]
2 [http://forge.mysql.com/wiki/MySQL_Internals_InnoDB]
Use default null. In SQL, null is very different from the empty string (""). The empty string specifically means that the value was set to be empty; null means that the value was not set, or was set to null. Different meanings, you see.
The different meanings and their different usages are why it's important to use each of them as appropriate; the amount of space potentially saved by using default null as opposed to default "" is so small that it approaches negligibility; however, the potential value of using the proper defaults as convention dictates is quite high.
From High Performance MySQL, 3rd Edition
Avoid NULL if possible.
A lot of tables include nullable columns even when the application does not need
to store NULL (the absence of a value), merely because it’s the default. It’s usually
best to specify columns as NOT NULL unless you intend to store NULL in them.
It’s harder for MySQL to optimize queries that refer to nullable columns, because
they make indexes, index statistics, and value comparisons more complicated. A
nullable column uses more storage space and requires special processing inside
MySQL. When a nullable column is indexed, it requires an extra byte per entry
and can even cause a fixed-size index (such as an index on a single integer column)
to be converted to a variable-sized one in MyISAM.
The performance improvement from changing NULL columns to NOT NULL is usually
small, so don’t make it a priority to find and change them on an existing schema
unless you know they are causing problems. However, if you’re planning to index
columns, avoid making them nullable if possible.
There are exceptions, of course. For example, it’s worth mentioning that InnoDB
stores NULL with a single bit, so it can be pretty space-efficient for sparsely populated
data. This doesn’t apply to MyISAM, though.
I found out that NULL vs "" is insignificant in terms of disk-space and performance.
The only true reason I can personally see in using NULL over '' is when you have a field marked as UNIQUE but need the ability to allow multiple "empty" columns.
For example, the email column in my user table is only filled in if someone actually has an email address. Anyone without an email address gets NULL. I can still make this field unique because NULL isn't counted as a value, whereas the empty string '' is.
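A quick sketch of that behaviour, assuming Python's sqlite3 module in place of MySQL (both allow repeated NULLs in a UNIQUE column); the users table here is a hypothetical minimal version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

# Two users without an email address: both NULLs are accepted.
conn.execute("INSERT INTO users (email) VALUES (NULL)")
conn.execute("INSERT INTO users (email) VALUES (NULL)")

# The first empty string is fine, but it is a real value...
conn.execute("INSERT INTO users (email) VALUES ('')")

# ...so a second empty string violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO users (email) VALUES ('')")
    second_empty_accepted = True
except sqlite3.IntegrityError:
    second_empty_accepted = False

null_count = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email IS NULL"
).fetchone()[0]
print(null_count, second_empty_accepted)
```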
A lot of folks are answering the what is the difference between null and '', but the OP has requested what takes up less space/is faster, so here's my stab at it:
The answer is that it depends. If your field is a char(10), it will always take 10 bytes if not set to null, and therefore, null will take up less space. Minute on a row-by-row basis, but over millions and millions of rows, this could add up. Even a varchar(10) stores one byte for an empty string (the length prefix), so again this could add up over huge tables.
In terms of performance in queries, null is in theory quicker to test, but I haven't been able to come up with any appreciable difference on a well indexed table. Keep in mind though, that you may have to convert null to '' on the application side if this is the desired return. Again, row-by-row, the difference is minute, but it could potentially add up.
All in all it's a micro-optimization, so it boils down to preference. My preference is to use null because I like to know that there's no value there, and not guess if it's a blank string ('') or a bunch of spaces (' '). null is explicit in its nature. '' is not. Therefore, I go with null because I'm an explicit kind of guy.
Use whatever makes sense. NULL means "no value available/specified", "" means "empty string."
If you don't allow empty strings, but the user does not have to enter a value, then NULL makes sense. If you require a value, but it can be empty, NOT NULL and a value of "" makes sense.
And, of course, if you don't require a value, but an empty value can be specified, then NULL makes sense.
From an efficiency point of view, an extra bit is used to determine whether the field is NULL or not, but don't bother about such micro-optimization until you have millions of rows.
"" is like an empty box... null is like no box at all.
It's a difficult concept to grasp initially, but as the answers here plainly state - there is a big difference.
'' = '' yields TRUE which satisfies WHERE condition
NULL = NULL yields NULL which doesn't satisfy WHERE condition
Which is better to use depends on what result you want to get.
If your values default to NULL, no query like this:
SELECT *
FROM mytable
WHERE col1 = ?
will ever return these values, even if you pass the NULL for the bound parameter, while this query:
SELECT *
FROM mytable
WHERE col1 = ''
will return you the rows that you set to an empty string.
This is true for MySQL, but not for Oracle, which does not distinguish between empty string and a NULL.
In Oracle, the latter query will never return anything.
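The bound-parameter behaviour described above for MySQL can be reproduced with a short sketch. It assumes Python's sqlite3 module as a stand-in (it distinguishes '' from NULL the same way MySQL does, unlike Oracle), and the rows in mytable are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [(None,), ("",), ("x",)],  # one NULL, one empty string, one ordinary value
)


def count_rows(sql, params=()):
    """Run a SELECT and return how many rows it produced."""
    return len(conn.execute(sql, params).fetchall())


# Binding NULL (None) to "col1 = ?" gives "col1 = NULL", which is never true.
matched_by_null_param = count_rows("SELECT * FROM mytable WHERE col1 = ?", (None,))

# The empty string is an ordinary value and is found by "=".
matched_by_empty_string = count_rows("SELECT * FROM mytable WHERE col1 = ''")

# The NULL row can only be reached with IS NULL.
matched_by_is_null = count_rows("SELECT * FROM mytable WHERE col1 IS NULL")

print(matched_by_null_param, matched_by_empty_string, matched_by_is_null)
```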
Use "". It requires less programming effort if you can assert that columns are non-null. Space difference between these is trivial.
I prefer null when it is semantically correct. If there is an address field available and the user did not fill it in, I give it a "". However, if there is an address attribute in the users table but I did not offer the user a chance to fill it in, I give it a NULL.
I doubt (but I can't verify) that NULL and "" makes much of a difference.
In general, NULL should indicate data that is not present or has not been supplied, and therefore is a better default value than the empty string.
Sometimes the empty string is what you need as a data value, but it should almost never be a default value.
NULL means 'there is no value' and is treated specially by RDBMSs in where clauses and joins.
"" means 'empty string' and is not treated specially.
It depends on what the text represents and how it will actually be used in queries.
For example, you can have a questionnaire with some obligatory questions and some optional questions.
Declined optional questions should have a NULL in their corresponding column.
Obligatory questions should have an empty string as default, because they HAVE to be answered. (Of course in a real application you'd tell the user to enter something, but I hope you get the idea)