A bit off topic but what is the origin behind NULL?
I sent an email to a French speaking customer containing "thank you for bringing in your NULL 'item'".
Apparently in French, NULL (NUL) translates to:
nul, nulle /nyl/
I. adjective
(familiar) [personne] (of a person) hopeless, useless;
[travail, étude] (of a piece of work, a study) worthless;
[film, roman] (of a film, a novel) trashy (colloquial);
Oops, I insulted them. Perhaps of French origin?
Etymology
From Middle French nul, from Latin nullus.
Noun
null (plural nulls)
A non-existent or empty value or set of values.
Zero quantity of expressions; nothing.
(computing) the ASCII or Unicode character (␀), represented by a zero value, that indicates no character and is sometimes used as a string terminator.
(computing) the attribute of an entity that has no valid value.
Since no date of birth was entered for the patient, his age is null.
Adjective
null (comparative more null, superlative most null)
Having no legal validity, "null and void"
insignificant
absent or non-existent
(mathematics) of the null set
(mathematics) of or comprising a value of precisely zero
(genetics, of a mutation) causing a complete loss of gene function, amorphic.
Source: Wiktionary.
Apparently Tony Hoare (of quicksort fame) introduced the concept of the null reference in computing (in ALGOL W). He later called it his "billion-dollar mistake", referring to the damage that bugs involving null pointers have caused over the decades.
From the Latin nullus == "nothing".
In French, nul mainly means equal to zero or non-existent, and by colloquial extension, of little importance or worthless.
"Null" is an English word. According to my dictionary it means invalid, non-existant, or without character or expression. Also according to my dictionary (Oxford dictionary of Current English) it derives from Latin, which could mean it got to English by way of French.
I'm French, and "nul" is widely used to mean useless or worthless.
If your customer is used to reading your messages in English, it should be OK (especially if he works in IT).
Related
In two's-complement notation, there's always an odd-man-out value to compensate for the 0/origin value that is conceptually neither positive nor negative. We treat 0 as positive for the sake of pragmatism, and we treat its counterpart, which is a 1 in the top bit and 0 in the rest, as negative, but conceptually, they are both special values that have no sign, because in both cases, -v==v.
For instance, in a signed 32-bit value, this number might be written in any of the following forms:
0b10000000000000000000000000000000
0x80000000
-2147483648
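To see the self-negating property concretely, here is a minimal C sketch of my own (assuming a 32-bit two's-complement representation); the negation is done on an unsigned type because negating the signed minimum directly is undefined behaviour in C:

#include <inttypes.h>
#include <stdio.h>

int main(void) {
    /* The odd-man-out pattern: a 1 in the top bit, 0 in the rest. */
    uint32_t v = 0x80000000u;

    /* Two's-complement negation: invert the bits, then add one.
       Unsigned arithmetic keeps the wraparound well defined. */
    uint32_t neg = ~v + 1u;

    printf(" v = 0x%08" PRIX32 "\n", v);    /* 0x80000000 */
    printf("-v = 0x%08" PRIX32 "\n", neg);  /* 0x80000000 again: -v == v */
    return 0;
}

Every other nonzero 32-bit pattern maps to a distinct counterpart under this negation; only 0 and this value map to themselves.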
I've personally been using my own term for this odd value for a while, which I will share below as my own answer, and let you all decide whether it's worthy, but I wouldn't be surprised if there's already an accepted name for it.
I leave the rest to you...
Edit: On further research, I did find some sites claiming that "it is sometimes called the weird number", but those blurbs are consistently copied verbatim from the Wikipedia entry on two's-complement notation. That entry, in turn, references only a 2006 college research paper that is unavailable at the given location, but which I found here, and which refers to the value that way only in passing. Wikipedia also references a single book, but that book's usage appears to be based on the text of the Wikipedia entry, which existed before the book was written. I'm not convinced that anyone other than one University of Tokyo student ever called it "the weird number" in practice.
Depending on context, I might refer to it neutrally as the dead value or, if I'm feeling like anthropomorphizing it, I call it Death. I think of that lone top bit as a scythe of sorts.
I call it this for two reasons:
On the ring that is two's-complement notation, its counterpart is 0, which we commonly refer to as the origin. One antonym for origin is death.
This particular value, being ambiguous as it is, tends to catch out a lot of programmers. It is literally the death of a lot of unsuspecting algorithms.
When writing terse assembly, I tend to abbreviate it as just "D"; for instance, if I had a condition that was satisfied by all values greater than zero, plus Death, I might call the flag "GZD".
I simply call that the minimum integer or minimum value, since that is indeed what it is in two's complement.
It's also described that way by the C standard header limits.h (and its C++ equivalent <climits>), with constants such as SCHAR_MIN, INT_MIN, LONG_MIN and so on.
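A minimal sketch of those constants in use (my own example; the exact values printed depend on the platform's type widths, and the hex line assumes a 32-bit two's-complement int):

#include <limits.h>
#include <stdio.h>

int main(void) {
    /* The standard limits.h constants name the minimum values directly. */
    printf("SCHAR_MIN = %d\n", SCHAR_MIN);
    printf("INT_MIN   = %d\n", INT_MIN);
    printf("LONG_MIN  = %ld\n", LONG_MIN);

    /* With a 32-bit two's-complement int, INT_MIN is the lone-top-bit pattern. */
    printf("INT_MIN in hex: 0x%X\n", (unsigned int)INT_MIN);  /* 0x80000000 */
    return 0;
}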
Can someone please explain the differences between Null, Zero and Blank in SQL to me?
Zero is a number value. It is definite, with precise mathematical properties. (You can do arithmetic on it ...)
NULL means the absence of any value. You can't do anything with it except test for it.
Blank is ill-defined. It means different things in different contexts to different people. For example:
AFAIK, there is no SQL or MySQL specific technical meaning for "blank". (Try searching the online MySQL manual ...)
For some people "blank" could mean a zero length string value: i.e. one with no characters in it ('').
For some people "blank" could mean a non-zero length string value consisting of only non-printing characters (SPACE, TAB, etc). Or maybe consisting of just a single SPACE character.
In some contexts (where character and string are different types), some people could use "blank" to mean a non-printing character value.
Some people could even use "blank" to mean "anything that doesn't show up when you print or display it".
And then there are meanings that are specific to (for example) ORM mappings.
The point is that "blank" does not have a single well-defined meaning. At least not in (native) English IT terminology. It is probably best to avoid it ... if you want other IT professionals to understand what you mean. (And if someone else uses the term and it is not obvious from the context, ask them to say precisely what they mean!)
We cannot say anything generally meaningful about how ZERO / NULL / BLANK are represented, how much memory they occupy, or anything like that. All we can say is that they are represented differently from each other ... and that the actual representation is implementation- and context-dependent.
You can relate the NULL / BLANK / ZERO cases to a child-birth scenario (a real-life example):
NULL case: the child is not born yet.
BLANK case: the child is born, but we haven't given him a name yet.
ZERO case: the child is born, and his age is zero.
In a database table, the three cases would show up as NULL, an empty string, and 0, respectively.
Also, NULL is the absence of a value: a field holding NULL is typically not allocated any memory for a value, whereas an empty field holds an empty value that does take up space.
Could you be more precise about "blank"?
From what I understand of your question:
"Blank" is the lack of a value. That is a human concept. In SQL, the field has to hold something anyway, so there is a special value that means "no value has been set for this field": that value is NULL.
If Blank is "", then it is a string, an empty one.
Zero: well, Zero is 0 ... It is a number.
To sum up:
NULL --> no value set
Blank ("") --> empty string
Zero --> Number equal to 0
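If it helps to see the three cases side by side, here is a loose analogy in C of my own (not SQL, just an illustration): a missing value, an empty string, and the number zero are all different things.

#include <stdio.h>
#include <string.h>

int main(void) {
    const char *name_null  = NULL;  /* no value has been set at all  */
    const char *name_blank = "";    /* a value: the empty string     */
    int         age_zero   = 0;     /* a value: the number zero      */

    if (name_null == NULL)
        printf("name_null:  no value, can only be tested for\n");
    if (strlen(name_blank) == 0)
        printf("name_blank: empty string, length 0\n");
    printf("age_zero:   %d (arithmetic works: %d)\n", age_zero, age_zero + 1);
    return 0;
}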
Please, try to be more accurate next time you post an answer on Stack!
If I were you, I would check some resources about it, for example:
https://www.tutorialspoint.com/sql/sql-null-values.htm
NULL means it does not have any value at all, not even a garbage value.
ZERO is an integer value.
BLANK is simply an empty string value.
For an application I'm currently building I need a database to store books. The schema of the books table should contain the following attributes:
id, isbn10, isbn13, title, summary
What data types should I use for ISBN10 and ISBN13? My first thought was a big integer, but I've read some unsubstantiated comments that say I should use a varchar.
You'll want a CHAR/VARCHAR (CHAR is probably the best choice, as you know the lengths: 10 and 13 characters). Numeric types like INTEGER would strip the leading zeroes in ISBNs like 0-684-84328-5.
ISBN numbers should be stored as strings, varchar(17) for instance.
You need 17 characters for ISBN13 (13 digits plus the hyphens) and 13 characters for ISBN10 (10 digits plus the hyphens).
ISBN10
ISBN10 numbers, though called "numbers", may contain the letter X. The last character of an ISBN is a check digit that ranges from 0 to 10, and 10 is represented as X. They may also begin with a double zero, such as 0062472100, and in a numeric format the leading zeroes would be stripped once stored.
84-7844-453-X is a valid ISBN10, in which 84 means Spain, 7844 is the publisher's number, 453 is the book number, and X (i.e. 10) is the check digit. If we remove the hyphens, we mix the publisher with the book id. Is that really important? It depends on the use you will give that number. Bibliographic researchers (I've found myself in that situation) might need it for many reasons that I won't go into here, since they have nothing to do with storing data. I would advise against removing the hyphens, but the truth is everyone does it.
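To make that X concrete, here is a small C sketch of the ISBN10 check-digit calculation (the function name isbn10_check is my own): the weights run 10 down to 2, the check character makes the weighted sum divisible by 11, and a result of 10 is written as X. No purely numeric column can hold that trailing X.

#include <stdio.h>

/* Compute the ISBN10 check character from the first nine digits. */
static char isbn10_check(const char *nine_digits) {
    int sum = 0;
    for (int i = 0; i < 9; i++)
        sum += (10 - i) * (nine_digits[i] - '0');
    int check = (11 - sum % 11) % 11;
    return check == 10 ? 'X' : (char)('0' + check);
}

int main(void) {
    printf("847844453 -> %c\n", isbn10_check("847844453"));  /* X, as in 84-7844-453-X */
    printf("068484328 -> %c\n", isbn10_check("068484328"));  /* 5, as in 0-684-84328-5 */
    return 0;
}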
ISBN13
ISBN13 faces the same issues regarding meaning: with the hyphens you get four blocks of meaningful data; without them, the language, publisher and book id would be lost.
Nevertheless, the check digit will only ever be 0-9; there will never be a letter. But should you feel tempted to store only ISBN13 numbers (since an ISBN10 can be upgraded to an ISBN13 automatically and without fail) and use an int for them, you could run into issues in the future. All ISBN13 numbers currently begin with 978 or 979, but in the future a prefix such as 078 could be added.
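As a sketch of that upgrade path (the helper names are my own; it assumes the usual 978 prefix and the standard alternating 1/3 weighting for the ISBN13 check digit):

#include <stdio.h>
#include <string.h>

/* ISBN13 check digit: weight the first twelve digits 1,3,1,3,...;
   the check digit brings the total to a multiple of 10 (always 0-9). */
static char isbn13_check(const char *twelve_digits) {
    int sum = 0;
    for (int i = 0; i < 12; i++)
        sum += (i % 2 == 0 ? 1 : 3) * (twelve_digits[i] - '0');
    return (char)('0' + (10 - sum % 10) % 10);
}

/* Upgrade a hyphen-free ISBN10 to an ISBN13: prefix "978", keep the
   first nine digits, drop the old check character, append a new one. */
static void isbn10_to_isbn13(const char *isbn10, char out[14]) {
    char twelve[13];
    memcpy(twelve, "978", 3);
    memcpy(twelve + 3, isbn10, 9);
    twelve[12] = '\0';
    memcpy(out, twelve, 12);
    out[12] = isbn13_check(twelve);
    out[13] = '\0';
}

int main(void) {
    char isbn13[14];
    isbn10_to_isbn13("847844453X", isbn13);
    printf("847844453X -> %s\n", isbn13);  /* 9788478444533 */
    return 0;
}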
A light explanation about ISBN13
A deeper explanation of ISBN numbers
I will be storing a year in a MySQL table. Is it better to store this as a smallint or a varchar? I figure that since it's not a full date, a date type shouldn't be the answer, but I'll include it as an option as well.
Smallint? varchar(4)? date? something else?
Examples:
2008
1992
2053
I would use the YEAR(4) column type... but only if the years you expect fall within the range 1901 to 2155... otherwise, see Gambrinus's answer.
I'd go for smallint. As far as I know, varchar would take more space, and so would date; my second option would be date.
My own experience is with Oracle, which does not have a YEAR data type, but I have always tried to avoid using numeric data types for elements just because they consist only of digits. (This includes phone numbers, social security numbers and zip codes, as additional examples.)
My own rule of thumb is to consider what the data is used for. If you will perform mathematical operations on it then store it as a number. If you will perform string functions (eg. "Take the last four characters of the SSN" or "Display the phone number as (XXX) XXX-XXXX") then it's a string.
An additional clue is the requirement to store leading zeroes as part of the number.
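A tiny illustration of that clue (the ZIP code here is just a made-up example value): once a value with a leading zero has been stored as a number, the zero is gone, and it can only be re-created if you separately know the intended width.

#include <stdio.h>

int main(void) {
    const char *as_text   = "02134";  /* keeps its leading zero           */
    int         as_number = 2134;     /* the leading zero is simply gone  */

    printf("stored as text:   %s\n", as_text);
    printf("stored as number: %d\n", as_number);
    printf("re-padded:        %05d\n", as_number);  /* only works if you know the width is 5 */
    return 0;
}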
Furthermore, despite being commonly referred to as phone "numbers", they frequently contain letters to indicate the presence of an extension number as a suffix. Similarly, a Standard Book Number could end in an "X" as its check digit, and an International Standard Serial Number can end with an "X" (despite the ISSN International Centre repeatedly referring to it as an 8-digit code: http://www.issn.org/understanding-the-issn/what-is-an-issn/).
Formatting of phone numbers in an international context is tricky, of course, and conforming to E.164 requires that country calling codes be prefixed with a "+".
This question is language-agnostic but is inspired by these C/C++ questions:
How to convert a single char into an int
Char to int conversion in C
Is it safe to assume that the characters for the digits (0123456789) appear contiguously in all text encodings?
i.e. is it safe to assume that
'9'-'8' = 1
'9'-'7' = 2
...
'9'-'0' = 9
in all encodings?
I'm looking forward to a definitive answer to this one :)
Thanks,
Update: OK, let me limit "all encodings" to mean anything as old as ASCII and/or EBCDIC and anything newer. Sanskrit I'm not so worried about . . .
I don't know about all encodings, but at least in ASCII and <shudder> EBCDIC, the digits 0-9 all come consecutively and in increasing numeric order. Which means that all ASCII- and EBCDIC-based encodings should also have their digits in order. So for pretty much anything you'll encounter, barring Morse code or worse, I'm going to say yes.
You're going to find it hard to prove a negative. Nobody can possibly know every text encoding ever invented.
All encodings in common use today (except EBCDIC, is it still in common use?) are supersets of ASCII. I'd say you're more likely to win the lottery than you are to find a practical environment where the strict ordering of '0' to '9' doesn't hold.
Both the C standard and the C++ standard require this to be so for C and C++ program text: in both the source and execution character sets, the decimal digit characters must have consecutive, increasing values.
According to K&R ANSI C it is.
Excerpt:
..."This particular program relies on the properties of the character representation of the digits. For example, the test
if (c >= '0' && c <= '9') ...
determines whether the character in c is a digit. If it is, the numeric value of that digit is
c - '0'
This works only if '0', '1', ..., '9' have consecutive increasing values. Fortunately, this is true for all character sets...."
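A self-contained version of that idiom (the helper name to_int is mine), relying only on the guarantee that '0' through '9' have consecutive, increasing values:

#include <stdio.h>

/* Convert a run of decimal digit characters to an int, K&R style. */
static int to_int(const char *s) {
    int n = 0;
    for (; *s >= '0' && *s <= '9'; s++)
        n = 10 * n + (*s - '0');
    return n;
}

int main(void) {
    printf("%d\n", to_int("1992"));  /* 1992 */
    printf("%d\n", '9' - '0');       /* 9, the property the question asks about */
    return 0;
}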
All the text encodings I know of order each set of digit characters sequentially. However, your question becomes a lot broader once you include all of the other representations of digits in other scripts, such as the full-width digits used in Japanese text: １２３４５６７８９０. Notice how the characters for the numbers are different? They are actually different code points. So I really think the answer to your question is a firm "maybe", since there are so many encodings out there, and many of them have multiple representations of digits.
A better question to ask yourself is: why do I need to count on digits being at sequential code points in the first place?
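If the answer is that you don't need to, a lookup table sidesteps the contiguity assumption entirely; a minimal sketch (the function name digit_value is my own):

#include <stdio.h>
#include <string.h>

/* Map a digit character to its value without assuming the digits
   occupy consecutive code points: look it up in an explicit table. */
static int digit_value(char c) {
    const char *digits = "0123456789";
    const char *p = (c != '\0') ? strchr(digits, c) : NULL;
    return p ? (int)(p - digits) : -1;  /* -1 means "not a digit" */
}

int main(void) {
    printf("%d\n", digit_value('7'));  /* 7  */
    printf("%d\n", digit_value('x'));  /* -1 */
    return 0;
}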