I have a table for user questions, and each question has a "question_score" field. If I allow this field to be NULL, will that save some space, or maybe even some CPU time?
question_id (int) | quesion_name (varchar) | question_score (int) ...
You won't save any meaningful space or CPU time.
Even if the value is NULL, MySQL will still have to store this fact.
Sometimes permitting NULL values is the right thing to do, but surely not when the reason is some non-existent space optimization. A default value is generally the way to go.
Use nulls if you cannot know the value at the time the data is entered, unless there is a reasonable alternative. For instance, if you don't know the name of something, you could perhaps have a default value of 'Unknown'. However, if you don't know the end date, don't put in some fake data that you always have to remember to code around. And never put in fake data that might conceivably be a real value (0 for price, for instance), or you won't be able to tell the items you gave away from the ones you haven't set a price for yet. Nulls are good; use them when appropriate. In my 30 years of database experience, it is much harder to work around fake data used in place of nulls (and far more likely that you return incorrect results without knowing it) than it is to handle nulls properly.
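As a minimal sketch of the price example above: the snippet below uses Python's built-in sqlite3 module as a stand-in for MySQL, and the items table and its rows are hypothetical. It only illustrates that a real price of 0 and an unknown price stay distinguishable when the column is nullable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: price is nullable because it may not be known yet.
conn.execute("CREATE TABLE items (name TEXT NOT NULL, price INTEGER)")
conn.executemany(
    "INSERT INTO items (name, price) VALUES (?, ?)",
    [("sticker", 0), ("poster", None), ("mug", 800)],
)

# Items deliberately given away: the price is a known value of 0.
free_items = [r[0] for r in conn.execute(
    "SELECT name FROM items WHERE price = 0")]

# Items that have no price set yet: the price is unknown (NULL).
unpriced_items = [r[0] for r in conn.execute(
    "SELECT name FROM items WHERE price IS NULL")]

print(free_items)
print(unpriced_items)
```

Had 0 been stored as a placeholder for "no price yet", the two queries could not be told apart.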
In short, no. It's a question of correctness, not performance. Nulls almost invariably lead to inconsistencies and incorrect results. The best strategy is to design the database in Normal Form without nulls, unless and until you find a compelling reason to do otherwise.
In my opinion, null values are always a no-go. Please normalize the database to at least third normal form. Your database structure is probably not optimal.
Most forums do not specify any default value for the ID column.
I wonder why they don't do that.
For example, SMF: it specifies a default value for every column except the (PRIMARY KEY) ID column.
EDIT: I now see that ^this^ part of the question was a bit 'stupid'.
Is there a performance advantage to specifying a default value?
Why shouldn't you specify an empty value as the default?
Will 'NOT NULL' complain if you use the value '' (similar to $var = ''; in PHP)?
EDIT: But I assume that you want a database to complain if it doesn't get, for example, a username. Of course, I'll not allow registration fields to be left empty (and I do validate them server-side), but if I'm not mistaken, then, if by magic the input to the database is still empty, the database will accept this, assuming the database uses NOT NULL and a default value. Question: So what, then, is the use of NOT NULL in combination with a default value?
Thanks in advance.
Please keep in mind that I am Dutch and therefore I may have made some mistakes.
If we assume that a default value for an id, primary key, column in a database made sense (and was even valid), think what the consequences would be:
First off, it would allow the user, or the DB scripts, to supply no id information (which is fine, since most scripts rely on the database to deal with that aspect anyway), but
it would also allow the database not to 'deal with' it, and to simply insert the new record with the default value, whatever that might be. This means that
in many situations multiple records are likely to be inserted which have the same id value/primary key. If multiple entries, into for example a forum Database, have the same value, how would you tell your users apart? How would you identify them?
The reason that no 'default' is used is because, a: it wouldn't make sense to allow it in the first place, and b: it'd be invalid (albeit b is a consequence of a).
Is there a performance benefit to supplying a default value?
Not so much a 'performance' benefit, but it does mean that if no value is supplied the software has, at least, some idea what to expect (whether that default is a string, a number, an enum...).
Why shouldn't you specify an empty value as the default?
I think that all entries to the database should be controlled, whether that's the values you allow to be inserted, or the values inserted in the event of an error or a failure of the user to supply a value. Allowing null entries in your database is likely a controllable situation, but it seems needlessly complex to compensate for null entries when a default can be specified.
Will 'NOT NULL' complain if you use the value '' (similar to $var = ''; in PHP)?
It's been a while since I last used MySQL, but null is not similar to an empty string (in the way that JavaScript assesses '' to be 'falsy'); it means, absolutely, nothing. Empty. A whole vacuum, and non-existence, of information. NOT NULL, then, requires some information (albeit it doesn't specify exactly what). This is why a default should be supplied, since it specifies precisely what will be found in the database in the event of the user specifying no data.
Edited in response to comments from OP (below):
But I assume that you want a database to complain if it doesn't get, for example, a username. Of course, I'll not allow registration fields to be left empty (and I do validate them server-side), but if I'm not mistaken, then if by magic the input to the database is still empty, the database will accept this, if it uses NOT NULL and a default value. So what, then, is the use of NOT NULL in combination with a default value?
I can't speak to all situations, or to the choices made by all database-/web-admins, but in my personal use-cases I'd suggest a consideration of:
First, would an empty-value make sense for this field, and:
Second, if an empty field doesn't make sense, should a default value be used?
In some situations a detail from a user (their gender, for example) has almost no impact on me, or on any services I provide, so I don't care whether they supply it or not. In this case a default isn't used.
In other cases, such as their age, it might make a difference (I can't remember the name of the law, but I recall some mention of Facebook, and other online societies, being required to allow only those aged 13 or older to participate). In this case I ask my users to confirm that they are, in fact, over the age of 13 (via a checkbox) and then later offer them an input for their actual age or date of birth. In the absence of a supplied data point (and, honestly, other than the (presumed) legal obligation I really don't care how old my users are), my database defaults to thirteen (or, in the case of birthday, to thirteen years ago today, with help from the PHP scripts).
Now, as for combining NOT NULL with a default value: there'd be no sense in it, so far as I can see.
The whole concept of NOT NULL is to ensure a user-supplied value (otherwise the record isn't created, and the user can't join (or whatever) the service for which they're registering). The default value is to supply a fall-back value in this absence. And, logically, if the two are allowed together on the same field? The default would be inserted before the NOT NULL condition is triggered. So...why bother? Do you have evidence that somebody, or some software, used both conditions on the same field? I can't, honestly, think why.
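For what it's worth, the two can be tried together. The sketch below uses Python's built-in sqlite3 module rather than MySQL, and the users table is hypothetical. In SQLite the default is applied only when the column is omitted from the INSERT, while an explicit NULL is still rejected by NOT NULL. MySQL's handling of an explicit NULL also depends on the configured SQL mode, so treat this as an illustration rather than a statement about every setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: country is NOT NULL and also has a default.
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " username TEXT NOT NULL,"
    " country TEXT NOT NULL DEFAULT 'unknown')"
)

# Column omitted from the INSERT: the default value is stored.
conn.execute("INSERT INTO users (username) VALUES ('alice')")
default_country = conn.execute(
    "SELECT country FROM users WHERE username = 'alice'").fetchone()[0]

# Explicit NULL: the default is not substituted, so NOT NULL rejects the row.
try:
    conn.execute("INSERT INTO users (username, country) VALUES ('bob', NULL)")
    explicit_null_rejected = False
except sqlite3.IntegrityError:
    explicit_null_rejected = True

print(default_country, explicit_null_rejected)
```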
The ID column is an auto_increment in most cases. So a default value is invalid.
Edit:
Will 'NOT NULL' complain if you use the value '' (similar to $var = ''; in PHP)?
With NULL, the literal NULL is meant, not something like zero or an empty string.
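A quick way to see the difference, sketched with Python's built-in sqlite3 module instead of MySQL (the members table is hypothetical): a NOT NULL column accepts an empty string without complaint, and that row does not match IS NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (username TEXT NOT NULL)")

# An empty string is a value, so NOT NULL does not object to it.
conn.execute("INSERT INTO members (username) VALUES ('')")

empty_count = conn.execute(
    "SELECT COUNT(*) FROM members WHERE username = ''").fetchone()[0]
null_count = conn.execute(
    "SELECT COUNT(*) FROM members WHERE username IS NULL").fetchone()[0]

print(empty_count, null_count)
```

If empty usernames must be rejected, that has to be enforced separately, for example in the application's validation.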
I have two tables: one in which I have groups, and another where I set user restrictions on which groups are seen.
When I do a LEFT JOIN and specify no condition, it shows me all records. When I add WHERE group_hide.hide!='true', it shows only those records that have the enum set to false. With the JOIN, the other groups get the hide field set to "NULL".
How can I make it exclude only those that are set to true, and show everything else that has either NULL or false?
In MySQL you must use IS NULL or IS NOT NULL when dealing with nullable values.
Here you should use (group_hide.hide IS NULL OR group_hide.hide != 'true')
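The effect of that condition can be sketched as follows, using Python's built-in sqlite3 module as a stand-in for MySQL. The table layout and rows are assumptions (the question does not show the schema), and hide is stored as plain text here instead of an enum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_groups (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE group_hide (group_id INTEGER NOT NULL, hide TEXT NOT NULL);
INSERT INTO user_groups VALUES (1, 'admins'), (2, 'editors'), (3, 'guests');
-- 'guests' has no row in group_hide, so the LEFT JOIN yields NULL for hide.
INSERT INTO group_hide VALUES (1, 'true'), (2, 'false');
""")

base = ("SELECT g.name FROM user_groups g "
        "LEFT JOIN group_hide h ON h.group_id = g.id ")

# NULL != 'true' evaluates to NULL, not true, so 'guests' is filtered out.
naive = [r[0] for r in conn.execute(
    base + "WHERE h.hide != 'true' ORDER BY g.id")]

# Testing for NULL explicitly keeps the groups without a restriction row.
fixed = [r[0] for r in conn.execute(
    base + "WHERE (h.hide IS NULL OR h.hide != 'true') ORDER BY g.id")]

print(naive)
print(fixed)
```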
Don already provided a good answer to the question that you asked, and it will solve your immediate problem.
However, let me address the point of the wrong data type domain. Normally you would make hide a BOOLEAN, but MySQL does not really implement it completely. It converts it to TINYINT(1), which allows values from -128 to 127 (see overview of data types for mysql). Since MySQL did not enforce CHECK constraints before version 8.0.16, on older versions you are left with using either a trigger or a foreign key reference to properly enforce the domain.
Here are the problems with wrong data domain (your case), in order of importance:
The disadvantage of allowing NULL for a field that can be only 1 or 0 is that you have to employ three-valued logic (true, false, null), which, by the way, is not perfectly implemented in SQL. This makes certain queries more complex and slower than they need to be. If you can make a column NOT NULL, do.
The disadvantages of using VARCHAR for a field that can be only 1 or 0 are slower queries, due to the extra I/O, and bigger storage needs (this slows down reads and writes, makes indexes bigger if the field is part of an index, and increases the size of backups; keep in mind that none of these effects might be noticeable with the wrong domain on a single field in smaller tables, but if data types are consistently set too big, or if the table has a serious number of records, the effects will bite). Also, you will always need to convert the VARCHAR to a 1 or 0 to use the natural MySQL boolean operators, which increases the complexity of queries.
The disadvantage of MySQL using TINYINT(1) for BOOL is that the RDBMS allows certain values that should not be allowed, theoretically allowing meaningless values to be stored in the system. In this case your application layer must guarantee the data integrity, and it is always better if the RDBMS guarantees integrity, as that protects you from certain bugs in the application layer and also from mistakes that might be made by a database administrator.
An obvious answer would be:
WHERE (group_hide.hide is null or group_hide.hide ='false')
I'm not sure off the top of my head what the null behaviour rules are.
My MySQL database has carefully defined fields. Some fields, if there is a chance that they can be unknown, allow a NULL value.
I'm writing a web-based CMS to handle the data in said database. Obviously, the post arrays from the HTML forms never contain null values, only empty strings.
I don't want to confuse the users of my CMS by adding a "NULL checkbox" or anything like that but I can't tell from the post arrays whether a field should be null or empty.
Should I convert all empty strings to NULL values on save for fields that allow null values?
What are some good practices for this type of conundrum?
That is a sticky question. Of course there is no way for a user to know whether they want to insert an empty string or nothing at all. So the answer is really up to you. Do you want to allow users to enter empty strings? Does it have any meaning for it to do so? If an empty string means something that a NULL string doesn't and that is well defined, go ahead and allow both.
If there is no distinction between them, pick one and stick with it. I personally don't see why you would need to keep empty strings like that, it just makes things far more confusing later. Just stick with NULL to represent an absence of data.
Make your decision now, document it and stick by it.
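If the decision is "an empty input in a nullable field means NULL", the conversion can live in one place. The following is only a sketch in Python (the CMS in the question reads PHP-style post arrays); normalize_form, the field names, and the choice to treat whitespace-only input as empty are all hypothetical.

```python
from typing import Dict, Optional, Set


def normalize_form(post: Dict[str, str],
                   nullable: Set[str]) -> Dict[str, Optional[str]]:
    """Map empty inputs to None (stored as NULL) for nullable fields only."""
    cleaned: Dict[str, Optional[str]] = {}
    for field, value in post.items():
        # Assumption: whitespace-only input counts as "left empty".
        if field in nullable and value.strip() == "":
            cleaned[field] = None
        else:
            cleaned[field] = value
    return cleaned


# 'subtitle' allows NULL in the schema; 'body' does not, so it stays ''.
row = normalize_form(
    {"title": "Hello", "subtitle": "", "body": ""},
    nullable={"subtitle"},
)
print(row)
```

Database drivers generally bind None as NULL when the values are passed as query parameters, so no special-casing is needed in the SQL itself.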
Do you really want to deal with actual proper NULL, with its tricky three-value logic?
In my experience you rarely actually want the explicit uncertainty that comes with NULL. Storing an empty string for values the user doesn't care to fill in is usually more practical. Then you can go ahead and do searches like WHERE t.field<>'x' or WHERE t0.field=t1.field without having to worry about what that does to your boolean logic when one or both are null.
If you've already got a working database that relies on the uncertain nature of null, and that's an intrinsic part of your requirements, then fine, stick with null (and in that case you'll probably have to convert empty user input in a nullable field to null just because no end-user is ever going to be able to understand the conceptual difference between nothing and null).
But personally I only still use nulls for optional foreign key references (when not using a separate join table).
I wonder if one should have NULL in the fields with no value, or should they be empty?
What is best?
Thanks
NULL means that no data is set, while empty string could be some valid data.
Thus, using NULL helps you to differentiate these two cases.
From a programming standpoint, I try not to allow null values, for a few reasons. One is that code often reacts badly to unexpected NULL values. If a query filter ran faster when checking null values I might consider using them, but I have seen no evidence of this. I have, however, seen many a function that pooped out on some kind of comparison because it did not test for NULL beforehand.
There is a certain argument that you should never allow NULL in your data. If you are using it to indicate that you don't know what the value should be, or that you just don't have that data yet, then use an explicit value in the field to indicate those states. Similarly for 'empty' fields. That said, I think everyone does it, or has done it and may do it again. NULL has odd comparative properties, which is why it's always best, if you can, to avoid it and have explicit values for missing data states.
Avoid NULLs in base tables whenever three valued logic is likely to come back to bite you. That's easy to say, but lengthy to explain. Three valued logic can sometimes be successfully managed, but your intuition is likely to be based on two valued logic, and can be misleading.
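Two small cases where two-valued intuition goes wrong, sketched with Python's built-in sqlite3 module as a stand-in for MySQL (a NULL result comes back to Python as None):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparing NULL with NULL is not true; the result is NULL (unknown).
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# IS NULL is the test that actually yields true (1).
null_is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]

# A single NULL in a NOT IN list makes the predicate unknown,
# so the row is filtered out even though 3 is not 1 or 2.
rows_with_null = conn.execute(
    "SELECT 1 WHERE 3 NOT IN (1, 2, NULL)").fetchall()
rows_without_null = conn.execute(
    "SELECT 1 WHERE 3 NOT IN (1, 2)").fetchall()

print(null_equals_null, null_is_null)
print(rows_with_null, rows_without_null)
```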
If you avoid NULLS in base tables, but create views or queries with outer joins, be prepared to deal with NULLS. NULLS in fields that are never used in where clauses and never used "incorrectly" with aggregates (as in sum(FIELD)) are OK.
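The aggregate behaviour mentioned above can be sketched like this, again with Python's built-in sqlite3 module rather than MySQL and a hypothetical scores table. Aggregates skip NULLs, so COUNT(*) and COUNT(column) can disagree, and AVG divides by the number of non-NULL rows only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (score INTEGER)")  # nullable on purpose
conn.executemany("INSERT INTO scores (score) VALUES (?)",
                 [(10,), (None,), (20,)])

count_all, count_score, total, average = conn.execute(
    "SELECT COUNT(*), COUNT(score), SUM(score), AVG(score) FROM scores"
).fetchone()

# SUM over only NULL rows is NULL (None in Python), not 0.
sum_of_nulls = conn.execute(
    "SELECT SUM(score) FROM scores WHERE score IS NULL").fetchone()[0]

print(count_all, count_score, total, average, sum_of_nulls)
```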
NULL fields are always empty, but empty doesn't always imply NULL. In particular, an empty or non existent field in a form can translate into a non NULL value in a table. Autonumber fields are an example.
Oracle made a mistake way back in the 1980s by using the same representation for the VARCHAR string of length zero (the empty string), and NULL. They've been about to fix it "real soon now" for a quarter of a century. Don't hold your breath.
Don't use NULLs to convey a meaningful message. This almost always confuses your colleagues, even when they deny it.
Nulls are necessary and important tools in database design. If you don't know the value at the time the record is inserted, null is entirely appropriate and the best practice. Making an unknown into a known value such as an empty string is silly. This is especially true when you get away from string data into dates or numeric data. 0 is not the same as null; some arbitrary date far in the past or future is not the same as null. For that matter, an empty string means there is no value, while null means we don't know what the value is. This is an important distinction.
It's not hard to handle nulls; any competent programmer should be able to do so.
When I'm setting up a MySQL table, it asks me to define the name of the column, type of input, and length. My assumption, without having read anything about it, is that it's for minimization. Specify the smallest possible int/smallint/tinyint for your needs, and it will reduce overhead of some sort. If it's all positives, make it unsigned to double your positive range, etc.
What happens if I just make every field a varchar-200 characters? When/why is this bad, what will I miss out on, and when will any inefficiencies manifest themselves? 100k records?
I think about this every time I set up a DB, but I haven't built anything to scale enough where I've ever had my scheme setup inappropriately, either too "strict/small" or "loose/big". Can someone confirm that I'm making good assumptions about speed and efficiency?
Thanks!
Data types not only optimize storage, but how data is indexed. As your databases get bigger, it will become apparent that it's quicker to search for all the records that have a 1 in an integer field than those that have a "1" in a varchar field. This becomes especially important when you're joining data from more than one table and your database engine is having to do this sort of thing repeatedly. (Daren also rightly points out below that it's important that the types of the fields you're matching on are identical as well.)
The level at which these inefficiencies become an issue depends greatly on your hardware and your application design. We have big enough iron these days that if you're building moderate-scale apps, you may not see an appreciable difference. (Aside from feeling a little bit guilty about your database design!) But establishing good habits on small projects makes the bigger ones easier when they come along.
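If you have two columns as VARCHAR, put in the values 10 and 20, and add them, some database systems (SQL Server, for example, where + concatenates strings) will give you 1020 instead of the 30 you'd likely expect. MySQL converts the strings to numbers for + and returns 30, but sorting and comparison still follow string rules, so '9' sorts after '10'.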
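How numbers stored as strings misbehave can be sketched with sorting. This uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table and column names are hypothetical: the same three values come back in a different order depending on the column type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (as_text TEXT, as_int INTEGER)")
conn.executemany(
    "INSERT INTO readings (as_text, as_int) VALUES (?, ?)",
    [("9", 9), ("10", 10), ("100", 100)],
)

# Text is compared character by character, so '10' and '100' sort before '9'.
text_order = [r[0] for r in conn.execute(
    "SELECT as_text FROM readings ORDER BY as_text")]

# Integers are compared numerically.
int_order = [r[0] for r in conn.execute(
    "SELECT as_int FROM readings ORDER BY as_int")]

print(text_order)
print(int_order)
```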
Sure, you could save everything as VARCHAR strings. But you'd be giving up a lot of functionality provided by the database engine.
You should choose the database type that most closely matches the intended use of the column. For example, using DATE or DATETIME to store dates provides you with all sorts of date/time functions that you don't get with basic VARCHAR types.
Likewise, fields used to count things or provide simple unique IDs should be INT or one of its related types. Also bear in mind that an INT occupies only 4 bytes, whereas a 9-digit string uses at least 9 bytes.
For character data, it's wise to use NVARCHAR for internationalized values that users in any locale are going to enter (esp. names and locations). If you know the text is limited to US or internal use only, VARCHAR is safe.