I'm curious what everyone thinks. In SQL (at least in Oracle), NULL conceptually means "I don't know the value", so NULL = NULL is false. (Maybe it actually results in a NULL which then gets cast to false, or something like that...)
This makes sense to me, but in most OO languages null means "no reference" so null==null should probably be true. This is the usual way of doing things in C# for example when overriding Equals.
On the other hand, null is still frequently used to mean "I don't know" in object-oriented languages, and making null == null evaluate to false might result in code that is slightly more meaningful in certain domains.
Tell me what you think.
For general purpose programming, null == null should probably return true.
I can't count the number of times I've run into the
if (obj != null)
{
    // call methods on obj
}
pattern, and it often seems unavoidable. If null == null evaluated to false, this pattern would fall apart, and there wouldn't be a good way to handle this case without exceptions.
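(As an aside, newer C# versions let you write that same guard more compactly with the null-conditional operator; a tiny sketch, where DoSomething() is just a made-up method on obj:)
// Calls DoSomething() only when obj is not null; otherwise the whole expression is a no-op.
obj?.DoSomething();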
I think that null in Java, just like NULL in C++ or None in Python, means specifically "there's nothing here" -- not "I don't know", which is a concept peculiar to SQL, not common in OOP languages.
I think you've got your basic facts about SQL completely wrong.
NULL is a data value and UNKNOWN is a logical value.
NULL = NULL is UNKNOWN.
NULL = NULL is certainly not FALSE!
Google for "three-valued logic".
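(If you want to play with three-valued logic without leaving C#, the lifted & and | operators on bool? follow the same Kleene semantics; a minimal sketch, noting that C#'s == on nullables deliberately does not behave like SQL's =:)
bool? unknown = null;
bool? a = unknown & true;    // null  -- still unknown
bool? b = unknown & false;   // false -- false wins regardless of the unknown operand
bool? c = unknown | true;    // true  -- true wins regardless of the unknown operand
bool? d = unknown | false;   // null  -- still unknown
bool  e = unknown == null;   // true  -- unlike SQL's =, C#'s == says null equals null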
The NULL value is the placeholder for a missing data value. Ideally, a NULL-able column should only be used for values that are only temporarily missing, i.e. there's a reasonable expectation that a non-NULL value will be available in the future. For example, using NULL for the end date in a pair of DATETIME values that model a period signifies infinity, i.e. this period is current (though a far-future DATETIME value works well too).
I already get annoyed enough at the IEEE NaN not being equal to itself. I tend to view == as an equivalence relation, one of whose properties is reflexivity. If you have special semantics of "unknown" value equality, I think you should use something more specific rather than overloading an operator whose semantics are well understood. For example, just because you don't know if two values are equal, doesn't mean they definitely are not equal, which is what I'd guess from a False return value from ==. It seems like you really want some sort of ternary logic. Depending on your language, it may or may not be easy for you to come up with a concise ==-alike that can return True or False or NoClue, but I definitely think it should be separate.
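(To make that concrete, here's a minimal sketch of what a separate three-outcome comparison could look like in C#; the Ternary enum, TernaryEquals and the int? operands are all invented for illustration:)
enum Ternary { True, False, NoClue }

static Ternary TernaryEquals(int? x, int? y)
{
    // If either value is unknown, we can't say whether they are equal.
    if (x == null || y == null)
        return Ternary.NoClue;

    return x.Value == y.Value ? Ternary.True : Ternary.False;
}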
Just my opinion on the matter though :)
If you said null === null, I would agree with you.
First, null == null being true makes patterns like
if (!(instance == null)) {
    // do something requiring instance not be null
}
work. (Yes, the usual test is instance != null, but I want to make explicit that this relies on !(null == null) evaluating to false when instance is null.)
Second, if you need instance1 == instance2 to be false when instance1 and instance2 are null reference instances of your class, then this should be encapsulated into a logic class somewhere. In C#, we would say
class MyObjectComparer : IEqualityComparer<MyObject> {
    public bool Equals(MyObject instance1, MyObject instance2) {
        if (instance1 == null && instance2 == null) {
            return false;
        }
        // logic here
    }

    public int GetHashCode(MyObject instance) {
        // logic here
    }
}
All null pointers (or references) are equal to each other.
They have to be; otherwise, how would you compare a null pointer to null?
C++: comparing two null pointers with == always returns true. If you somehow have a null reference (don't do that), the behavior is undefined, which in practice usually means a crash.
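(The same holds for references in C#; a quick illustration:)
object a = null;
object b = null;
bool byOperator  = (a == b);                      // true
bool byReference = object.ReferenceEquals(a, b);  // true: both refer to "nothing"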
In my opinion the current behavior is correct, especially if you consider that null is interpreted as "Unknown value".
Think about it this way:
If someone asked you whether the number of apples inside two boxes whose contents you didn't know was equal, the answer wouldn't be yes or no; it would be "I don't know."
Related
Is there any reason to use = instead of <=> in MariaDB/MySQL? It seems that the = operator is only desirable if you actually want NULL as the result of the expression when either operand is NULL.
Are there any consequences to replacing every = with <=>, even when both operands can never be null (in which case the behaviour should remain exactly the same)?
First of all, the meaning of NULL is quite unclear. There are many articles on the Internet about how people interpret it.
Therefore, using <=> could just add to the confusion, and you risk that other developers may not understand it well. It strongly deviates from the standard behavior most people expect from queries. I would suggest you avoid it unless there are strong reasons to use it.
a = b -- Always NULL (which is usually treated as False) if either is NULL
a <=> b -- True if _both_ are NULL
If both columns are declared NOT NULL, there is no point in using <=>.
In almost all other cases, your logic is not going to care.
I almost never see anyone using <=>.
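(For the C# readers following along from the earlier question: the null-safe behaviour of <=> is roughly what the static object.Equals(a, b) method gives you, i.e. two nulls compare equal and anything else falls back to a normal comparison. A small sketch:)
string a = null;
string b = null;

bool bothNull = object.Equals(a, b);    // true: two nulls are considered equal, like NULL <=> NULL
bool oneNull  = object.Equals(a, "x");  // false: one side is null, the other isn't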
I've seen this in a few places, people using
if(null == myInstance)
instead of
if(myInstance == null)
Is the former just a matter of style, or does it have any performance impact compared to the (more intuitive, in my opinion) latter?
It depends actually on the language you are using.
In the languages where only bool type is recognized as a condition inside if, if (myinstance == null) is preferred as it reads more naturally.
In the languages where other types can be treated (implicitly converted, etc.) as a condition, the form if (null == myinstance) is usually preferred. Consider, for example, C. In C, ints and pointers are valid inside if. Additionally, assignment is treated as an expression. So if one mistakenly writes if (myinstance = NULL), this won't be detected by the compiler as an error. This is known to be a popular source of bugs, so the less readable but more reliable variant is recommended. Actually, some modern compilers issue a warning in case of assignment inside if (while, etc.), but not an error: after all, it's a valid expression in C.
You do that to avoid accidentally setting the value of myInstance to null when you meant to compare it.
Example for a typo:
if (null = myConstant) // no problem here, just won't compile
if (myConstant = null) // not good. Will compile anyway.
In general, a lot of developers will put the constant part of the comparison first for that reason.
Writing conditional checks this way is also referred to as Yoda Conditions.
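(Worth noting: in C# the reference-type version of this typo won't even compile, because if expects a bool and an assignment like myInstance = null isn't one; the place it can still bite is with booleans. A small sketch, where flag and GetFlag() are made-up names:)
bool flag = false;

if (flag = GetFlag())   // compiles: assigns GetFlag()'s result to flag, then tests that result
{
    // runs whenever GetFlag() returns true, regardless of what flag held before
}

bool GetFlag() => true;  // hypothetical helper, only here to make the example complete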
I often have to call the Value property when accessing my Linq to SQL objects to check for null values or I get an exception. Can someone please explain these data types (i.e. decimal?, bool?, etc...) that appear to wrap the primitive types?
They are Generics of type Nullable<T>, and they do wrap primitive types.
Why they invented the short form (int? is Nullable<int>) seems to be down to the standard confusion between succinct and terse that C-based language developers struggle with.
decimal? total = null;
total.HasValue will return false; it won't blow up with a null reference,
but total.Value.ToString() will throw an InvalidOperationException, because total has no underlying value.
The Value and HasValue properties are read only.
total = 10;
means total.Value will return 10.0 and total.HasValue will return true.
It's a really nice generic, especially for database types. Still don't get the short form, though...
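(A couple of safer ways to read a nullable without risking that exception, reusing the decimal? total from above:)
decimal? total = null;

decimal a = total ?? 0m;                  // null-coalescing: 0 when total has no value
decimal b = total.GetValueOrDefault();    // same idea: default(decimal) when total has no value
string  s = total?.ToString() ?? "n/a";   // null-conditional: only calls ToString() when there is a value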
decimal? is shorthand for (and equivalent to) Nullable<decimal>, which means it can be assigned null. The same thing applies to bool?, int?, etc.
These types are chosen by Linq 2 SQL when your database fields allow null values. Otherwise, you would not have a way to indicate that those values should be null.
I have an odd question that I have always thought about, but could never see a practical use for. I'm looking to see if there would be enough justification for this.
When is handling a null pointer/reference exception preferred over doing a null check? If at all.
This applies to any language that has to deal with null pointers/references which has exception handling features.
My usual response to this would be to perform a null check before doing anything with the pointer/reference. If non-null, continue like normal and use it. If null, handle the error or raise it.
i.e., (in C#)
string str = null;
if (str == null)
{
    // error!
}
else
{
    // do stuff
    int length = str.Length;
    // ...
}
However if we were not to do the check and just blindly use it, an exception would be raised.
string str = null;
int length = str.Length; // oops, NullReferenceException
// ...
Being an exception, we could certainly catch it so nothing is stopping us from doing this (or is there?):
string str = null;
try
{
    int length = str.Length; // oops, NullReferenceException
    // ...
}
catch (NullReferenceException ex)
{
    // but that's ok, we can handle it now
}
Now I admit, it's not the cleanest code, but it's no less working code and I wouldn't do this normally. But is there a design pattern or something where doing this is useful? Perhaps more useful than doing the straight up, null check beforehand.
The only cases where I can imagine this might be useful is in a multi-threaded environment where an unprotected shared variable gets set to null too soon. But how often does that happen? Good code that protects the variables wouldn't have that problem. Or possibly if one was writing a debugger and wanted the exception to be thrown explicitly only to wrap it or whatnot. Maybe an unseen performance benefit or removes the need for other things in the code?
I might have answered some of my questions there but is there any other use to doing this? I'm not looking for, "do this just because we can" kinds of examples or just poorly written code, but practical uses for it. Though I'll be ok with, "there's no practical use for it, do the check."
The problem is that all null pointer exceptions look alike. Any accounting that could be added to indicate which name tripped the exception can't be any more efficient than just checking for null in the first place.
If you expect the value to not be null, then there is no point in doing any checks. But when accepting arguments, for example, it makes sense to test the arguments that you require to be non-null and throw ArgumentNullExceptions as appropriate.
This is preferred for two reasons:
Typically you would use the one-string form of the ArgumentNullException constructor, passing in the argument name. This gives very useful information during debugging, as now the person coding knows which argument was null. If your source is not available to them (or if the exception trace was submitted by an end user with no debug symbols installed) it may otherwise be impossible to tell what actually happened. With other exception types you could also specify which variable was null, and this can be very helpful information.
Catching a null dereference is one of the most expensive operations that the CLR can perform, and this could have a severe performance impact if your code is throwing a lot of NullReferenceExceptions. Testing for nullity and doing something else about it (even throwing an exception!) is a cheaper operation. (A similar principle is that if you declare a catch block with the explicit intent of catching NREs, you are doing something wrong somewhere else and you should fix it there instead of using try/catch. Catching NREs should never be an integral part of any algorithm.)
That being said, don't litter your code with null tests in places you don't ever expect null values. But if there is a chance a reference variable might be null (for example if the reference is supplied by external code) then you should always test.
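(A minimal sketch of that guard-clause style; CountUpperCase and its parameter are made up for the example:)
public static int CountUpperCase(string s)
{
    // Fail fast with a descriptive exception instead of letting a NullReferenceException
    // surface somewhere deeper in the call stack.
    if (s == null)
        throw new ArgumentNullException(nameof(s));

    int count = 0;
    foreach (char c in s)
        if (char.IsUpper(c))
            count++;
    return count;
}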
I've been programming Java for more than 10 years, and I can't remember a single time I've explicitly tried to catch a null pointer exception.
Either the variable is expected to be null sometimes (for example an optional attribute of an object), in which case I need to test for null (obj.attr != null), or I expect the variable to be not null at all times (in which case a null indicates a bug elsewhere in the code). If I'm writing a method (say isUpperCase) whose argument (String s) is null, I explicitly test for null and then throw an IllegalArgumentException.
What I never ever do is silently recover from a null. If a method should return the number of upper case characters in a string and the passed argument was null, I would never "hide" it by silently returning 0 and thus masking a potential bug.
Personally (doing C++), unless there is a STRONG contract that assures me that my pointers are not nullptr, I always test them and return an error, or silently return if that is allowed.
In Objective-C you can safely do anything with nil you can do with a non-nil and the nil ends up being a noop. This turns out to be surprisingly convenient. A lot less boilerplate null checking, and exceptions are reserved for truly exceptional circumstances. Or would that be considered sloppy coding?
I've noticed something about my coding that is slightly undefined. Say we have a two-dimensional array, a matrix or a table, and we are looking through it to check if a property is true for every row or nested dimension.
Say I have a boolean flag that is to be used to check if a property is true or false. My options are to:
1. Initialize it to true and check each cell until proven false. This gives it a wrong name until the code is completely executed.
2. Start on false and check each row until proven true. Only if all rows are true will the data be correct. What is the cleanest way to do this, without a counter?
I've always done 1 without thinking but today it got me wondering. What about 2?
Depends on which one dumps you out of the loop first, IMHO.
For example, with an OR situation, I'd default to false, and as soon as you get a TRUE, return the result, otherwise return the default as the loop falls through.
For an AND situation, I'd do the opposite.
They actually both amount to the same thing and since you say "check if a property is true for every row or nested dimension", I believe the first method is easier to read and perhaps slightly faster.
You shouldn't try to read the value of the flag until the code is completely executed anyway, because the check isn't finished. If you're running asynchronous code, you should guard against accessing the value while it is unstable.
Both methods "give a wrong name" until the code is executed. 1 gives false positives and 2 gives false negatives. I'm not sure what you're trying to avoid by saying this - if you can get the "correct" value before fully running your code, you didn't have to run your code in the first place.
How to implement each without a counter (if you don't have a foreach syntax in your language, use the appropriate enumerator->next loop syntax):
1:
bool flag = true;
foreach (item in array)
{
    if (!check(item))
    {
        flag = false;
        break;
    }
}
2:
bool flag = false;
foreach (item in array)
{
    if (!check(item))
    {
        break;
    }
    else if (item.IsLast())
    {
        flag = true;
    }
}
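(For what it's worth, in C# both variants collapse into a single LINQ call; a minimal sketch, where items and check are stand-ins for the collection and the per-item test, and which needs using System.Linq:)
bool allTrue  = items.All(check);           // stops at the first failing item, like the loops above
bool anyFalse = items.Any(x => !check(x));  // the inverted phrasing: same work, opposite result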
Go with the first option. An algorithm always has preconditions, postconditions and invariants. If your invariant is "bool x is true iff all rows from 0-currentN have a positive property", then everything is fine.
Don't make your algorithm more complex just to make the full program state valid per row-iteration. Refactor the method, extract it, and make it "atomic" with your language's mechanics (Java: synchronized).
Personally I just throw the whole loop into a somewhat reusable method/function called isPropertyAlwaysTrue(property, array[][]) and have it return false directly if it finds a case where it's not true.
The inversion of logic, by the way, does not get you out of there any quicker. For instance, if you want the first non-true case, saying areAnyFalse or areAllTrue will have an inverted output, but will have to test the exact same cases.
This is also the case with areAnyTrue and areAllFalse--different words for the exact same algorithm (return as soon as you find a true).
You cannot compare areAnyFalse with areAnyTrue because they are testing for a completely different situation.
Make the property name something like isThisTrue. Then it's answered "yes" or "no" but it's always meaningful.
In Ruby and Scheme you can use a question mark in the name: isThisTrue?. In a lot of other languages, there is a convention of putting "p" for "predicate" on the name -- null-p for a test returning true or false, in LISP.
I agree with Will Hartung.
If you are worried about (1) then just choose a better name for your boolean. IsNotSomething instead of IsSomething.