Avoid even Option fields. Always empty string for String and 0 for Int optional fields - json

I have a Scala REST service based on JSON and the Play Framework. Some of the JSON fields are optional (e.g. middleName). I can mark such a field as Option, e.g.
middleName: Option[String]
and not even expect it in the JSON. But I would like to avoid possible application errors in the future and simplify life. I would like to make the field mandatory but allow it to be empty if the user doesn't want to provide this information, so that there are no Option fields anywhere in the application (the JSON/DB overhead is minor).
Is it a good idea to avoid Option fields throughout the application? If a String field is empty, it contains an empty string but is still mandatorily present in the JSON/DB. If an Int field is empty, it contains 0, and so on.
Thanks in advance

I think you would regret avoiding Option because of the loss of type safety. If you go passing around potentially null object references, everyone who touches them has to remember to check for null because there is nothing that forces them to do so. Failure to remember is a NullPointerException waiting to happen. The use of Option forces code to deal with the possibility that there is no value to work with; forgetting to do so will cause a compilation error:
case class Foo(name: Option[String])
...
if (foo1.name startsWith "/") // ERROR: no startsWith on Option
I very occasionally do use nulls in a very localized bit of code where I think either performance is critical or I have many, many objects and don't want to have all of those Some and None objects taking up memory, but I would never leak the null out across a public API. Using nulls is a complicating optimization that should only be used where the extra vigilance required to avoid catastrophe is justified by the benefit. Such cases are rare.
I am not entirely sure I understand what your needs are with regard to JSON, but it sounds like you might like to have Option fields not disappear from JSON documents. In Spray-json there is a NullOptions trait specifically for this. You simply mix it into your protocol type and it affects all of the JsonFormats defined within (you can have other protocol types that do "not" mix it in if you like), e.g.
trait FooJsonProtocol extends DefaultJsonProtocol with NullOptions {
// your jsonFormats
}
Without NullOptions, Option members with value None are omitted altogether; with it, they appear with null values. I think that it is clearer for users if you show the optional fields with null values rather than having them disappear, but for transmission efficiency you might want them omitted. With Spray-json, at least, you can pick.
I don't know whether other JSON packages have a similar option, but perhaps that will help you look for it if for some reason you don't want to use Spray-json (which, by the way, is very fast now).

I think that would depend on your business logic and how you want to use these values.
In the case of the middleName I am assuming you are using it primarily to address the user in a personal manner and you just concatenate title, firstName, middleName and lastName. So you treat the value exactly the same whether the user has specified it or not. So I think using an empty String instead of None might be preferable.
In the case of values where 0 or "" is a valid value in terms of your business logic, I would go with an Option, and likewise in cases where you have different behaviours depending on whether the value is specified or not.
x match {
  case 0 => foo
  case i => bar(i)
}
is less descriptive than
x match {
case Some(i) => bar(i)
case None => foo
}

It's a bad idea, because normally you want to handle the absence of something differently. If you pass a value of "" or 0 around, this can very easily be confused with a real value; you might end up sending an email that starts "Dear Mr ," or wishing them Happy 35th Birthday because the timestamp 0 comes out as 1st January 1970. If you keep a distinction between a value and None in code and in the type system, this forces you to think about whether a value is actually set and what you want to do if it isn't.
Don't blindly just push Options everywhere though, either. If it's an error for a value to not be supplied, you should check that immediately and throw an error as soon as possible, not wait until much later in your application when it will be harder to debug where that None came from.
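To make the "Dear Mr ," failure concrete, here is a minimal sketch in Java, using java.util.Optional as a stand-in for Scala's Option. The class and method names (Greeting, salutation, naiveSalutation) and the "Dear customer," fallback are made up for illustration.

```java
import java.util.Optional;

final class Greeting {
    // Hypothetical helper: the Optional parameter forces the caller to decide
    // what happens when no last name is available.
    static String salutation(String title, Optional<String> lastName) {
        return lastName
                .map(name -> "Dear " + title + " " + name + ",")
                .orElse("Dear customer,");
    }

    // The same logic with "" as a sentinel: nothing reminds anyone to check,
    // so an empty name silently flows into the output.
    static String naiveSalutation(String title, String lastName) {
        return "Dear " + title + " " + lastName + ",";
    }

    public static void main(String[] args) {
        System.out.println(salutation("Mr", Optional.of("Smith"))); // Dear Mr Smith,
        System.out.println(salutation("Mr", Optional.empty()));     // Dear customer,
        System.out.println(naiveSalutation("Mr", ""));              // Dear Mr ,
    }
}
```

The sentinel version compiles and runs happily, which is exactly the problem: the mistake only shows up in the output.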

It won't make your "life easier". If anything, it will make it harder, and instead of avoiding app errors will make them more likely. Your app code will have to be infested with checks like if(middleName != "") { doSomething(middleName); } or if(age == 0) "Unknown age" else age.toString, and you will have to rely on the programmer remembering to handle those "kinda-optional" fields in a special way.
All of this you could get "for free" by using the monadic properties of Option: middleName.foreach(doSomething) or age.map(_.toString).getOrElse("Unknown age").

Related

How should substring() work?

I do not understand why Java's [String.substring() method](http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#substring(int,%20int%29) is specified the way it is. I can't tell it to start at a numbered-position and return a specified number of characters; I have to compute the end position myself. And if I specify an end position beyond the end of the String, instead of just returning the rest of the String for me, Java throws an Exception.
I'm used to languages where substring() (or substr()) takes two parameters: a start position, and a length. Is this objectively better than the way Java does it, and if so, can you prove it? What's the best language specification for substring() that you have seen, and when if ever would it be a good idea for a language to do things differently? Is that IndexOutOfBoundsException that Java throws a good design idea, or not? Does all this just come down to personal preference?
There are times when the second parameter being a length is more convenient, and there are times when the second parameter being the "offset to stop before" is more convenient. Likewise there are times when "if I give you something that's too big, just go to the end of the string" is convenient, and there are times when it indicates a bug and should really throw an exception.
The second parameter being a length is useful if you've got a fixed length of field. For instance:
// C#
String guid = fullString.Substring(offset, 36);
The second parameter being an offset is useful if you're going up to another delimiter:
// Java
int nextColon = fullString.indexOf(':', start);
if (nextColon == -1)
{
    // Handle error: no colon found at or after start
}
else
{
    String value = fullString.substring(start, nextColon);
}
Typically, the one you want to use is the opposite to the one that's provided on your current platform, in my experience :)
"I'm used to languages where substring() (or substr()) takes two parameters: a start position, and a length. Is this objectively better than the way Java does it, and if so, can you prove it?"
No, it's not objectively better. It all depends on the context in which you want to use it. If you want to extract a substring of a specific length, it's bad, but if you want to extract a substring that ends at, say, the first occurrence of "." in the string, it's better than if you first had to compute a length. The question is: which requirement is more common? I'd say the latter. Of course, the best solution would be to have both versions in the API, but if you need the length-based one all the time, using a static utility method isn't that horrible.
As for the exception, yeah, that's definitely good design. You asked for something specific, and when you can't get that specific thing, the API should not try to guess what you might have wanted instead - that way, bugs become apparent more quickly.
Also, Java DOES have an alternative substring() method that returns the substring from a start index until the end of the string.
The second parameter should be optional, and the first parameter should accept negative values.
If you leave off the 2nd parameter it will go to the end of the string for you without you having to compute it.
Having gotten some feedback, I see when the second-parameter-as-index scenario is useful, but so far all of those scenarios seem to be working around other language/API limitations. For example, the API doesn't provide a convenient routine to give me the Strings before and after the first colon in the input String, so instead I get that String's index and call substring(). (And this explains why the second position parameter in substr() overshoots the desired index by 1, IMO.)
It seems to me that with a more comprehensive set of string-processing functions in the language's toolkit, the second-parameter-as-index scenario loses out to second-parameter-as-length. But somebody please post me a counterexample. :)
If you store this away, the problem should stop plaguing your dreams and you'll finally achieve a good night's rest:
public String skipsSubstring(String s, int index, int length) {
    // Java's method is substring (lower-case s), taking a start and an end index
    return s.substring(index, index + length);
}
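For what it's worth, both behaviours discussed in this thread (throw when the request runs past the end, or quietly stop at the end) are a few lines each on top of String.substring(int, int). A rough sketch, where the class name Substrings and the method names strict and lenient are my own invention:

```java
final class Substrings {
    // Strict: start + length must fit inside s; otherwise substring throws
    // an IndexOutOfBoundsException, which surfaces the bug early.
    static String strict(String s, int start, int length) {
        return s.substring(start, start + length);
    }

    // Lenient: clamp the end index to the end of the string.
    // A start beyond s.length() (or a negative start) still throws.
    static String lenient(String s, int start, int length) {
        int end = Math.min(s.length(), start + length);
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(strict("hello world", 6, 5)); // world
        System.out.println(lenient("hello", 3, 10));     // lo
    }
}
```

Which one you want depends on whether an over-long request is a convenience or a bug in your particular caller.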

Should I always/ever/never initialize object fields to default values?

Code styling question here.
I looked at this question which asks if the .NET CLR will really always initialize field values. (The answer is yes.) But it strikes me that I'm not sure that it's always a good idea to have it do this. My thinking is that if I see a declaration like this:
int myBlorgleCount = 0;
I have a pretty good idea that the programmer expects the count to start at zero, and is okay with that, at least for the immediate future. On the other hand, if I just see:
int myBlorgleCount;
I have no real immediate idea if 0 is a legal or reasonable value. And if the programmer just starts reading and modifying it, I don't know whether the programmer meant to start using it before they set a value to it, or if they were expecting it to be zero, etc.
On the other hand, some fairly smart people, and the Visual Studio code cleanup utility, tell me to remove these redundant declarations. What is the general consensus on this? (Is there a consensus?)
I marked this as language agnostic, but if there is an odd case out there where it's specifically a good idea to go against the grain for a particular language, that's probably worth pointing out.
EDIT: While I did put that this question was language agnostic, it obviously doesn't apply to languages like C, where no value initialization is done.
EDIT: I appreciate John's answer, but it is exactly what I'm not looking for. I understand that .NET (or Java or whatever) will do the job and initialize the values consistently and correctly. What I'm saying is that if I see code that is modifying a value that hasn't been previously explicitly set in code, I, as a code maintainer, don't know if the original coder meant it to be the default value, or just forgot to set the value, or was expecting it to be set somewhere else, etc.
Think long term maintenance.
Keep the code as explicit as possible.
Don't rely on language specific ways to initialize if you don't have to. Maybe a newer version of the language will work differently?
Future programmers will thank you.
Management will thank you.
Why obfuscate things even the slightest?
Update: Future maintainers may come from a different background. It really isn't about what is "right" it is more what will be easiest in the long run.
You are always safe in assuming the platform works the way the platform works. The .NET platform initializes all fields to default values. If you see a field that is not initialized by the code, it means the field is initialized by the CLR, not that it is uninitialized.
This concern is valid for platforms which do not guarantee initialization, but not here. In .NET, it more often indicates ignorance on the developer's part, thinking initialization is necessary.
Another unnecessary hangover from the past is the following:
string foo = null;
foo = MethodCall();
I've seen that from people who should know better.
I think that it makes sense to initialize the values if it clarifies the developer's intent.
In C#, there's no overhead as the values are all initialized anyway. In C/C++, uninitialized values will contain garbage/unknown values (whatever was in the memory location), so initialization was more important.
I think it should be done if it really helps to make the code more understandable.
But I think this is a general problem with all language features. My opinion on that is: If it is an official feature of the language, you can use it. (Of course there are some anti-features which should be used with caution or avoided at all, like a missing option explicit in Visual Basic or diamond inheritance in C++)
There was a time when I was very paranoid and added all kinds of unnecessary initializations, explicit casts, über-paranoid try-finally blocks, ... I once even thought about ignoring auto-boxing and replacing all occurrences with explicit type conversions, just "to be on the safe side".
The problem is: There is no end. You can avoid almost all language features, because you do not want to trust them.
Remember: It's only magic until you understand it :)
I agree with you; it may be verbose, but I like to see:
int myBlorgleCount = 0;
Now, I always initialize strings though:
string myString = string.Empty;
(I just hate null strings.)
In the case where I cannot immediately set it to something useful
int myValue = SomeMethod();
I will set it to 0. That is more to avoid having to think about what the value would be otherwise. For me, the fact that integers are always set to 0 is not on the tip of my fingers, so when I see
int myValue;
it will take me a second to pull up that fact and remember what it will be set to, disrupting my thought process.
For someone who has that knowledge readily available, they will encounter
int myValue = 0;
and wonder why the hell is that person setting it to zero, when the compiler would just do it for them. This thought would interrupt their thought process.
So do which ever makes the most sense for both you and the team you are working in. If the common practice is to set it, then set it, otherwise don't.
In my experience I've found that explicitly initializing local variables (in .NET) adds more clutter than clarity.
Class-wide variables, on the other hand should always be initialized. In the past we defined system-wide custom "null" values for common variable types. This way we could always know what was uninitialized by error and what was initialized on purpose.
I always initialize fields explicitly in the constructor. For me, it's THE place to do it.
I think a lot of that comes down to past experiences.
In older and unmanaged languages, the expectation is that the value is unknown. This expectation is retained by programmers coming from these languages.
Almost all modern or managed languages have defined values for recently created variables, whether that's from class constructors or language features.
For now, I think it's perfectly fine to initialize a value; what was once implicit becomes explicit. In the long run, say, in the next 10 to 20 years, people may start learning that a default value is possible, expected, and known - especially if they stay consistent across languages (eg, empty string for strings, 0 for numerics).
You should do it. There is no need to, but it is better if you do, because you never know whether the language you are using initializes the values. By doing it yourself, you ensure your values are both initialized and set to standard predefined values.
There is nothing wrong with doing it except perhaps a bit of 'time wasted'. I would recommend it strongly. While the comment by John is quite informative, in general use it is better to take the safe path.
I usually do it for strings and in some cases collections where I don't want nulls floating around.
The general consensus where I work is "Not to do it explicitly for value types."
I wouldn't do it. C# initializes an int to zero anyways, so the two lines are functionally equivalent. One is just longer and redundant, although more descriptive to a programmer who doesn't know C#.
This is tagged as language-agnostic but most of the answers are regarding C#.
In C and C++, the best practice is to always initialize your values. There are some cases where this will be done for you such as static globals, but there shouldn't be a performance hit of any kind for redundantly initializing these values with most compilers.
I wouldn't initialise them. If you keep the declaration as close as possible to the first use, then there shouldn't be any confusion.
Another thing to remember is, if you are gonna use automatic properties, you have to rely on implicit values, like:
public int Count { get; set; }
http://www.geekherocomic.com/2009/07/27/common-pitfalls-initialize-your-variables/
If a field will often have new values stored into it without regard for what was there previously, and if it should behave as though a zero was stored there initially but there's nothing "special" about zero, then the value should be stored explicitly.
If the field represents a count or total which will never have a non-zero value written to it directly, but will instead always have other amounts added or subtracted, then zero should be considered an "empty" value, and thus need not be explicitly stated.
To use a crude analogy, consider the following two conditions:
if (xposition != 0) ...
if ((flags & WoozleModes.deluxe) != 0) ...
In the former scenario, comparison to the literal zero makes sense because it is checking for a position which is semantically no different from any other. In the second scenario, however, I would suggest that the comparison to the literal zero adds nothing to readability because code isn't really interested in whether the value of the expression (flags & WoozleModes.deluxe) happens to be a number other than zero, but rather whether it's "non-empty".
I don't know of any programming languages that provide separate ways of distinguishing numeric values for "zero" and "empty", other than by not requiring the use of literal zeros when indicating emptiness.

Are hard-coded STRINGS ever acceptable?

Similar to Is hard-coding literals ever acceptable?, but I'm specifically thinking of "magic strings" here.
On a large project, we have a table of configuration options like these:
Name Value
---- -----
FOO_ENABLED Y
BAR_ENABLED N
...
(Hundreds of them).
The common practice is to call a generic function to test an option like this:
if (config_options.value('FOO_ENABLED') == 'Y') ...
(Of course, this same option may need to be checked in many places in the system code.)
When adding a new option, I was considering adding a function to hide the "magic string" like this:
if (config_options.foo_enabled()) ...
However, colleagues thought I'd gone overboard and objected to doing this, preferring the hard-coding because:
That's what we normally do
It makes it easier to see what's going on when debugging the code
The trouble is, I can see their point! Realistically, we are never going to rename the options for any reason, so about the only advantage I can think of for my function is that the compiler would catch any typo like fo_enabled(), but not 'FO_ENABLED'.
What do you think? Have I missed any other advantages/disadvantages?
If I use a string once in the code, I don't generally worry about making it a constant somewhere.
If I use a string twice in the code, I'll consider making it a constant.
If I use a string three times in the code, I'll almost certainly make it a constant.
if (config_options.isTrue('FOO_ENABLED')) {...
}
Restrict your hard coded Y check to one place, even if it means writing a wrapper class for your Map.
if (config_options.isFooEnabled()) {...
}
Might seem okay until you have 100 configuration options and 100 methods (so here you can make a judgement about future application growth and needs before deciding on your implementation). Otherwise it is better to have a class of static strings for parameter names.
if (config_options.isTrue(ConfigKeys.FOO_ENABLED)) {...
}
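A minimal sketch of that last variant, assuming the options have already been loaded into a Map (the class names ConfigKeys and ConfigOptions and the constructor are illustrative, not from the question):

```java
import java.util.HashMap;
import java.util.Map;

// One place for the option names, so a typo becomes a compile error.
final class ConfigKeys {
    static final String FOO_ENABLED = "FOO_ENABLED";
    static final String BAR_ENABLED = "BAR_ENABLED";

    private ConfigKeys() {}
}

// Wrapper around the raw map; the "Y" comparison lives here and nowhere else.
final class ConfigOptions {
    private final Map<String, String> values;

    ConfigOptions(Map<String, String> values) {
        this.values = new HashMap<>(values); // defensive copy
    }

    // A missing key counts as "not enabled" in this sketch.
    boolean isTrue(String key) {
        return "Y".equals(values.get(key));
    }
}
```

Call sites then read if (configOptions.isTrue(ConfigKeys.FOO_ENABLED)) ..., which keeps the option name visible when debugging while still letting the compiler check it.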
I realise the question is old, but it came up on my margin.
AFAIC, the issue here has not been identified accurately, either in the question, or the answers. Forget about "hardcoding strings" or not, for a moment.
The database has a Reference table, containing config_options. The PK is a string.
There are two types of PKs:
Meaningful Identifiers, that the users (and developers) see and use. These PKs are supposed to be stable, they can be relied upon.
Meaningless Id columns which the users should never see, that the developers have to be aware of, and code around. These cannot be relied upon.
It is ordinary, normal, to write code using the absolute value of a meaningful PK: IF CustomerCode = "IBM" ... or IF CountryCode = "AUS" etc.
Referencing the absolute value of a meaningless PK is not acceptable (due to auto-increment; gaps being changed; values being replaced wholesale).
Your reference table uses meaningful PKs. Referencing those literal strings in code is unavoidable. Hiding the value will make maintenance more difficult; the code is no longer literal; your colleagues are right. Plus there is the additional redundant function that chews cycles. If there is a typo in the literal, you will soon find that out during Dev testing, long before UAT.
Hundreds of functions for hundreds of literals is absurd. If you do implement a function, then Normalise your code, and provide a single function that can be used for any of the hundreds of literals. In which case, we are back to a naked literal, and the function can be dispensed with.
The point is, the attempt to hide the literal has no value.
It cannot be construed as "hardcoding"; that is something quite different. I think that is where your issue is, identifying these constructs as "hardcoded". It is just referencing a Meaningful PK literally.
Now from the perspective of any code segment only, if you use the same value a few times, you can improve the code by capturing the literal string in a variable, and then using the variable in the rest of the code block. Certainly not a function. But that is an efficiency and good practice issue. Even that does not change the effect: IF CountryCode = @cc_aus
You really should use constants and not hard-coded literals.
You can say they won't be changed, but you can never know. And it is best to make it a habit to use symbolic constants.
In my experience, this kind of issue is masking a deeper problem: failure to do actual OOP and to follow the DRY principle.
In a nutshell, capture the decision at startup time by an appropriate definition for each action inside the if statements, and then throw away both the config_options and the run-time tests.
Details below.
The sample usage was:
if (config_options.value('FOO_ENABLED') == 'Y') ...
which raises the obvious question, "What's going on in the ellipsis?", especially given the following statement:
(Of course, this same option may need to be checked in many places in the system code.)
Let's assume that each of these config_option values really does correspond to a single problem domain (or implementation strategy) concept.
Instead of doing this (repeatedly, in various places throughout the code):
Take a string (tag),
Find its corresponding other string (value),
Test that value as a boolean-equivalent,
Based on that test, decide whether to perform some action.
I suggest encapsulating the concept of a "configurable action".
Let's take as an example (obviously just as hypothetical as FOO_ENABLED ... ;-) that your code has to work in either English units or metric units. If METRIC_ENABLED is "true", convert user-entered data from metric to English for internal computation, and convert back prior to displaying results.
Define an interface:
public interface MetricConverter {
double toInches(double length);
double toCentimeters(double length);
double toPounds(double weight);
double toKilograms(double weight);
}
which identifies in one place all the behavior associated with the concept of METRIC_ENABLED.
Then write concrete implementations of all the ways those behaviors are to be carried out:
public class NullConv implements MetricConverter {
    // Interface methods are implicitly public, so the implementations must be too
    public double toInches(double length) {return length;}
    public double toCentimeters(double length) {return length;}
    public double toPounds(double weight) {return weight;}
    public double toKilograms(double weight) {return weight;}
}
and
// lame implementation, just for illustration!!!!
public class MetricConv implements MetricConverter {
    public static final double LBS_PER_KG = 2.2D;
    public static final double CM_PER_IN = 2.54D;
    // centimeters -> inches
    public double toInches(double length) {return length / CM_PER_IN;}
    // inches -> centimeters
    public double toCentimeters(double length) {return length * CM_PER_IN;}
    // kilograms -> pounds
    public double toPounds(double weight) {return weight * LBS_PER_KG;}
    // pounds -> kilograms
    public double toKilograms(double weight) {return weight / LBS_PER_KG;}
}
At startup time, instead of loading a bunch of config_options values, initialize a set of configurable actions, as in:
MetricConverter converter = (metricOption()) ? new MetricConv() : new NullConv();
(where the expression metricOption() above is a stand-in for whatever one-time-only check you need to make, including looking at the value of METRIC_ENABLED ;-)
Then, wherever the code would have said:
double length = getLengthFromGui();
if (config_options.value('METRIC_ENABLED') == 'Y') {
length = length / 2.54D;
}
// do some computation to produce result
// ...
if (config_options.value('METRIC_ENABLED') == 'Y') {
result = result * 2.54D;
}
displayResultingLengthOnGui(result);
rewrite it as:
double length = converter.toInches(getLengthFromGui());
// do some computation to produce result
// ...
displayResultingLengthOnGui(converter.toCentimeters(result));
Because all of the implementation details related to that one concept are now packaged cleanly, all future maintenance related to METRIC_ENABLED can be done in one place. In addition, the run-time trade-off is a win; the "overhead" of invoking a method is trivial compared with the overhead of fetching a String value from a Map and performing String#equals.
I believe that the two reasons you have mentioned, a possible misspelling in a string that cannot be detected until run time and the possibility (although slim) of a name change, would justify your idea.
On top of that, you can get typed functions. It seems you only store booleans now, but what if you need to store an int, a string, etc.? I would rather use get_foo() with a type than get_string("FOO") or get_int("FOO").
I think there are two different issues here:
In the current project, the convention of using hard-coded strings is already well established, so all the developers working on the project are familiar with it. It might be a sub-optimal convention for all the reasons that have been listed, but everybody familiar with the code can look at it and instinctively knows what the code is supposed to do. Changing the code so that in certain parts, it uses the "new" functionality will make the code slightly harder to read (because people will have to think and remember what the new convention does) and thus a little harder to maintain. But I would guess that changing over the whole project to the new convention would potentially be prohibitively expensive unless you can quickly script the conversion.
On a new project, symbolic constants are the way IMO, for all the reasons listed. Especially because anything that makes the compiler catch errors at compile time that would otherwise be caught by a human at run time is a very useful convention to establish.
Another thing to consider is intent. If you are on a project that requires localization hard coded strings can be ambiguous. Consider the following:
const string HELLO_WORLD = "Hello world!";
print(HELLO_WORLD);
The programmer's intent is clear. Using a constant implies that this string does not need to be localized. Now look at this example:
print("Hello world!");
Here we aren't so sure. Did the programmer really not want this string to be localized or did the programmer forget about localization while he was writing this code?
I too prefer a strongly-typed configuration class if it is used through-out the code. With properly named methods you don't lose any readability. If you need to do conversions from strings to another data type (decimal/float/int), you don't need to repeat the code that does the conversion in multiple places and can cache the result so the conversion only takes place once. You've already got the basis of this in place already so I don't think it would take much to get used to the new way of doing things.
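Roughly what I have in mind, as a sketch only: the class name TypedConfig and the MAX_RETRIES option are invented for the example, and the raw values are assumed to arrive as a Map of strings.

```java
import java.util.HashMap;
import java.util.Map;

final class TypedConfig {
    private final Map<String, String> raw;
    private Integer maxRetries; // parsed lazily, then cached

    TypedConfig(Map<String, String> raw) {
        this.raw = new HashMap<>(raw);
    }

    // Boolean option: the "Y" convention is handled in one place.
    boolean fooEnabled() {
        return "Y".equals(raw.get("FOO_ENABLED"));
    }

    // Numeric option: the string-to-int conversion happens once.
    // Integer.parseInt throws NumberFormatException if the value is
    // missing or not a number, so a bad setting fails loudly.
    int maxRetries() {
        if (maxRetries == null) {
            maxRetries = Integer.parseInt(raw.get("MAX_RETRIES"));
        }
        return maxRetries;
    }
}
```

Callers just write config.maxRetries() and never see the string or the parsing.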

applying separation of concerns

I wonder whether you think this class needs to be refactored (with regard to separation of concerns).
public class CSVLIstMapping<T>
{
    void ReadMappingFromAttributes();
    void GetDataFromList();
}
ReadMappingFromAttributes - Reads the mapping from the type T and stores it in the class. The mapping has the name of the list to use and a number of csvMappingColumns, each of which contains the name of the property to set and the name of the CSV column.
GetObjectsFromList - uses a CVSListreader (which is passed in via the constructor) to get the data from all rows as KeyValuePairs (Key = csvcolumnName, Value = actual value), and after that it uses the mapping information (listname and csvMappingColumns) to set the data in the object.
I can't decide if this class has two concerns or one. At first I felt that it had two and started to refactor the conversion from rows to objects out into another object. But after this it felt awkward to use the functionality, as I first had to create a mapping retriever, and after that I had to retrieve the rows and pass them in together with the mapping to the "mapper" to convert the rows to objects.
/w
Sounds like two concerns to me: parsing and mapping/binding. I'd separate them. CSV parsing should be a well-defined problem. And you should care about more than mere mapping. What about validation? If you parse a date string, don't you want to make sure that it's valid before you bind it to an object attribute? I think you should.
Rule of thumb: if it's awkward, it's wrong.
I have to say I'm finding it hard to understand what you've written there, but I think it's likely that you need to refactor the class: the names seem unclear, any method called GetFoo() should really not be returning void, and it may be possible that the whole ReadMappingFromAttribute should just be constructor logic.

api documentation and "value limits": do they match?

Do you often see in API documentation (as in 'javadoc of public functions' for example) the description of "value limits" as well as the classic documentation ?
Note: I am not talking about comments within the code
By "value limits", I mean:
can a parameter be null (or an empty String, or...)?
can a return value be null, or is it guaranteed never to be null (or can it be "empty", or...)?
Sample:
What I often see (without having access to source code) is:
/**
* Get all readers name for this current Report. <br />
* <b>Warning</b>The Report must have been published first.
* @param aReaderNameRegexp filter in order to return only reader matching the regexp
* @return array of reader names
*/
String[] getReaderNames(final String aReaderNameRegexp);
What I like to see would be:
/**
* Get all readers name for this current Report. <br />
* <b>Warning</b>The Report must have been published first.
* @param aReaderNameRegexp filter in order to return only reader matching the regexp
* (can be null or empty)
* @return array of reader names
* (null if Report has not yet been published,
* empty array if no reader match criteria,
* reader names array matching regexp, or all readers if regexp is null or empty)
*/
String[] getReaderNames(final String aReaderNameRegexp);
My point is:
When I use a library with a getReaderNames() function in it, I often do not even need to read the API documentation to guess what it does. But I need to be sure how to use it.
My only concern when I want to use this function is: what should I expect in terms of parameters and return values? That is all I need to know to safely set up my parameters and safely test the return value, yet I almost never see that kind of information in API documentation...
Edit:
This can also influence whether checked or unchecked exceptions are used.
What do you think? Value limits and API: do they belong together or not?
I think they can belong together but don't necessarily have to belong together. In your scenario, it seems like it makes sense that the limits are documented in such a way that they appear in the generated API documentation and intellisense (if the language/IDE support it).
I think it does depend on the language as well. For example, Ada has a native data type that is a "restricted integer", where you define an integer variable and explicitly indicate that it will only (and always) be within a certain numeric range. In that case, the datatype itself indicates the restriction. It should still be visible and discoverable through the API documentation and intellisense, but wouldn't be something that a developer has to specify in the comments.
However, languages like Java and C# don't have this type of restricted integer, so the developer would have to specify it in the comments if it were information that should become part of the public documentation.
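Where a comment feels too weak, a small value class can carry the limit in the type itself, which is roughly how one might approximate a restricted integer in Java. A minimal sketch, where the class name `Percentage` and the range 0 to 100 are purely illustrative assumptions:

```java
/** Hypothetical value type: an int guaranteed to lie within [MIN, MAX]. */
final class Percentage {
    static final int MIN = 0;
    static final int MAX = 100;

    private final int value;

    private Percentage(int value) {
        this.value = value;
    }

    /**
     * @param value must be between MIN and MAX, inclusive
     * @return a Percentage wrapping the value, never null
     * @throws IllegalArgumentException if value is out of range
     */
    static Percentage of(int value) {
        if (value < MIN || value > MAX) {
            throw new IllegalArgumentException(
                "value must be in [" + MIN + ", " + MAX + "] but was " + value);
        }
        return new Percentage(value);
    }

    int intValue() {
        return value;
    }
}
```

Any method that takes a `Percentage` parameter then advertises the limit through its signature, so the documentation only has to state it once, on the factory method.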
I think those kinds of boundary conditions most definitely belong in the API. However, I would (and often do) go a step further and indicate WHAT those null values mean. Either I indicate it will throw an exception, or I explain what the expected results are when the boundary value is passed in.
It's hard to remember to always do this, but it's a good thing for users of your class. It's also difficult to maintain if the contract the method presents changes (for example, null values are no longer allowed)... you also have to be diligent about updating the docs when you change the semantics of the method.
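As an illustration of spelling out what each boundary value means, here is a rough sketch based on the `getReaderNames` sample from the question. The `Report` internals (`addReader`, `publish`) are invented for the example, and throwing on an unpublished Report instead of returning null is my own assumption, not the original contract:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical Report, just enough to show documented boundary behaviour. */
class Report {
    private final List<String> readers = new ArrayList<>();
    private boolean published = false;

    void addReader(String name) {
        readers.add(name);
    }

    void publish() {
        published = true;
    }

    /**
     * Get the reader names for this Report.
     *
     * @param aReaderNameRegexp filter on reader names; null or empty means "no filter"
     * @return the matching reader names; never null, empty if nothing matches
     * @throws IllegalStateException if the Report has not been published yet
     * @throws java.util.regex.PatternSyntaxException if the regexp is invalid
     */
    String[] getReaderNames(final String aReaderNameRegexp) {
        if (!published) {
            throw new IllegalStateException("Report must be published first");
        }
        if (aReaderNameRegexp == null || aReaderNameRegexp.isEmpty()) {
            return readers.toArray(new String[0]);
        }
        Pattern filter = Pattern.compile(aReaderNameRegexp);
        List<String> matches = new ArrayList<>();
        for (String name : readers) {
            // matches() requires the whole name to match the regexp
            if (filter.matcher(name).matches()) {
                matches.add(name);
            }
        }
        return matches.toArray(new String[0]);
    }
}
```

With the contract written this way, a caller never has to test the result for null, and the only surprises left are the two documented exceptions.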
Question 1
Do you often see in API documentation (as in 'javadoc of public functions' for example) the description of "value limits" as well as the classic documentation?
Almost never.
Question 2
My only concern when I want to use this function is: what should I expect in terms of parameters and return values? That is all I need to know to safely set up my parameters and safely test the return value, yet I almost never see that kind of information in API documentation...
If I used a function improperly, I would expect a RuntimeException, thrown either by the method itself or in another (sometimes very distant) part of the program.
Comments like @param aReaderNameRegexp filter in order to ... (can be null or empty) seem to me a way to implement Design by Contract in human language inside Javadoc.
Using Javadoc to enforce Design by Contract was the approach taken by iContract, now resurrected as JcontractS, which lets you specify invariants, preconditions, and postconditions in a more formalized way than human language.
Question 3
This can influence the usage or not for checked or unchecked exceptions.
What do you think ? value limits and API, do they belong together or not ?
The Java language doesn't have a Design by Contract feature, so you might be tempted to use exceptions, but I agree with you that you have to be aware of when to choose checked and when to choose unchecked exceptions. You could probably use the unchecked IllegalArgumentException and IllegalStateException, or you might use unit testing, but the major problem is how to communicate to other programmers that such code is part of a Design by Contract and should be treated as a contract rather than changed lightly.
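One possible way to make such checks read as a contract is to route them through a helper whose name says so. This is only a sketch: `Contract`, `requireArgument`, `requireState`, and `Account` are hypothetical names made up for the example, not part of any library:

```java
/** Hypothetical helper that names contract checks explicitly. */
final class Contract {
    private Contract() {}

    /** Precondition on an argument; a failure is the caller's bug. */
    static void requireArgument(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException("Precondition violated: " + message);
        }
    }

    /** Precondition on the state of the object being called. */
    static void requireState(boolean condition, String message) {
        if (!condition) {
            throw new IllegalStateException("Precondition violated: " + message);
        }
    }
}

/** Hypothetical class whose preconditions are visible as contract checks. */
class Account {
    private long balance;

    Account(long openingBalance) {
        Contract.requireArgument(openingBalance >= 0, "openingBalance must not be negative");
        this.balance = openingBalance;
    }

    /**
     * @param amount must be positive
     * @throws IllegalArgumentException if amount is zero or negative
     * @throws IllegalStateException if amount exceeds the current balance
     */
    void withdraw(long amount) {
        Contract.requireArgument(amount > 0, "amount must be positive");
        Contract.requireState(amount <= balance, "amount must not exceed balance");
        balance -= amount;
    }

    long balance() {
        return balance;
    }
}
```

A maintainer who sees `Contract.requireArgument(...)` at the top of a method has at least a hint that the line mirrors the Javadoc and should change together with it.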
I think they do, and I have always placed comments in the header files (C++) accordingly.
In addition to valid input/output/return comments, I also note which exceptions are likely to be thrown by the function (since I often want to use the return value for... well, returning a value, I prefer exceptions over error codes).
//File:
// Should be a path to the texture file to load. If it is not a full path (e.g. "c:\example.png"), it will attempt to find the file using the paths provided by the DataSearchPath list.
//Return: The pointer to a Texture instance is returned; in the event of an error, an exception is thrown. When you are finished with the texture you should call the Free() method.
//Exceptions:
//except::FileNotFound
//except::InvalidFile
//except::InvalidParams
//except::CreationFailed
Texture *GetTexture(const std::string &File);
@Fire Lancer: Right! I forgot about exceptions, but I would like to see them mentioned, especially the unchecked 'runtime' exceptions that this public method could throw.
@Mike Stone:
you have to be diligent also to update the docs when you change the semantics of the method.
Mmmm, I sure hope that the public API documentation is at the very least updated whenever a change that affects the contract of the function takes place. If not, that API documentation could be dropped altogether.
To add food for thought (and to go along with @Scott Dorman), I just stumbled upon the future of Java 7 annotations.
What does that mean? That certain 'boundary conditions', rather than being in the documentation, would be better off in the API itself, and automatically used, at compilation time, with appropriately generated 'assert' code.
That way, if a '@CheckForNull' is in the API, the writer of the function might get away with not even documenting it! And if the semantics change, the API will reflect that change (like 'no more @CheckForNull', for instance).
That kind of approach suggests that documentation, for 'boundary conditions', is an extra bonus rather than a mandatory practice.
However, that does not cover the special values of a function's return object. For that, complete documentation is still needed.
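To make the idea concrete, here is a small sketch of a boundary condition living in the API rather than only in prose. The annotation below is declared locally as a stand-in, since the JDK does not ship one with this name (real projects would normally take it from an annotation library); `ReaderDirectory` and `ApiInspector` are likewise hypothetical. Nothing here generates asserts at compile time; it only shows that a tool can read the condition from the method itself:

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;

/** Local stand-in annotation (an assumption for this sketch, not a JDK type). */
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface CheckForNull {}

/** Hypothetical class whose nullable return value is declared on the method. */
class ReaderDirectory {
    private final Map<String, String> emailByReader = new HashMap<>();

    void register(String reader, String email) {
        emailByReader.put(reader, email);
    }

    /**
     * @param reader name of the reader to look up
     * @return the reader's e-mail, or null if the reader is unknown
     */
    @CheckForNull
    String findEmail(String reader) {
        return emailByReader.get(reader);
    }
}

/** Hypothetical helper showing that a tool can read the annotation from the API. */
final class ApiInspector {
    private ApiInspector() {}

    static boolean mayReturnNull(Class<?> type, String method, Class<?>... params) {
        try {
            return type.getDeclaredMethod(method, params)
                       .isAnnotationPresent(CheckForNull.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such method: " + method, e);
        }
    }
}
```

Note that the Javadoc on `findEmail` still has to say what null means (here, "reader unknown"); the annotation only says that null is possible.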