Caffeine - How to set for each entity its own "expiration time"

We used to use the guava cache and we want to change it to caffeine.
We want to set for each entity its own "expiration time", something like - put(K key, V value, long expiration_time).
I saw the 3 functions of Expiry (expireAfterCreate, expireAfterUpdate, and expireAfterRead) and I wonder what exactly they are doing. If you can explain the meaning and the operation of each one of them, that would be great.
For example, should the return value of expireAfterCreate be the duration we want for this entity from its creation until its expiration, or something else?
I'm also wondering why we have the parameter "currentTime" in both expireAfterRead and expireAfterUpdate if we don't use it in the function?
When we used the guava cache we used the expireAfterAccess, what is the substitution for it in caffeine?
My last question is how can I set a default value for entities without a unique expiration time.
Thank you,
May

When we used the guava cache we used the expireAfterAccess, what is the substitution for it in caffeine?
We mirror the Guava API, so this is also available on the cache builder.
My last question is how can I set a default value for entities without a unique expiration time.
Use expireAfterAccess, expireAfterWrite, or return a constant duration with expireAfter(Expiry).
I saw the 3 functions above and I wonder what exactly they are doing. If you can explain the meaning and the operation of each one of them, that would be great.
Expiry is a callback interface where a single timestamp value is updated. The invoked method corresponds to the operation performed on the cache entry (created, updated, read). An update or read that should have no effect can return currentDuration to no-op.
For example, should the return value of expireAfterCreate be the duration we want for this entity from its creation until its expiration, or something else?
Yes. However, if expireAfterUpdate returns a custom value (something other than currentDuration), then that overrides the prior expiration duration.
I'm also wondering why we have the parameter "currentTime" in both expireAfterRead and expireAfterUpdate if we don't use it in the function?
This can most often be ignored, but is provided if somehow useful. It is the current nano timestamp from the Ticker (not wall clock time).
We want to set for each entity its own "expiration time", something like - put(K key, V value, long expiration_time).
The Expiry callback is required and generally recommended, because ideally entries are loaded through the cache (e.g. with a LoadingCache) to avoid stampedes. A stampede is when multiple threads look up the same entry, miss, load it, and overwrite each other when putting it in. That wastes work, compared with having only one thread perform the load while the others wait for the result.
That said, this method is available under Cache.policy().expireVariably(). Those configuration-specific methods are stashed in that area to offer more power when deemed necessary.
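To make the callback semantics concrete, here is a toy model written in Python rather than against Caffeine's real Java API. Everything in it (ToyExpiry, ToyExpiringCache, the "ttl_nanos" field, the fake clock, DEFAULT_TTL_NANOS) is made up for illustration; only the three callback roles and the idea of returning the current duration as a no-op come from the answer above.

```python
DEFAULT_TTL_NANOS = 1_000  # hypothetical default for entries without their own TTL


class ToyExpiry:
    """Hypothetical stand-in for the Expiry callback: each method returns the
    lifetime (in nanoseconds, counted from now) that the entry should have."""

    def expire_after_create(self, key, value, current_time):
        # Per-entry lifetime chosen at creation, falling back to a constant default.
        return value.get("ttl_nanos", DEFAULT_TTL_NANOS)

    def expire_after_update(self, key, value, current_time, current_duration):
        # A custom return value here overrides the prior expiration duration.
        return value.get("ttl_nanos", DEFAULT_TTL_NANOS)

    def expire_after_read(self, key, value, current_time, current_duration):
        # Returning current_duration leaves the expiration untouched (a no-op).
        return current_duration


class ToyExpiringCache:
    def __init__(self, expiry, ticker):
        self._expiry = expiry
        self._ticker = ticker   # callable returning a nanosecond timestamp
        self._entries = {}      # key -> (value, expires_at)

    def put(self, key, value):
        now = self._ticker()
        old = self._entries.get(key)
        if old is None or old[1] <= now:
            duration = self._expiry.expire_after_create(key, value, now)
        else:
            duration = self._expiry.expire_after_update(key, value, now, old[1] - now)
        self._entries[key] = (value, now + duration)

    def get(self, key):
        now = self._ticker()
        entry = self._entries.get(key)
        if entry is None or entry[1] <= now:
            self._entries.pop(key, None)  # absent or expired
            return None
        value, expires_at = entry
        duration = self._expiry.expire_after_read(key, value, now, expires_at - now)
        self._entries[key] = (value, now + duration)
        return value


# Usage with a fake clock so the behaviour is deterministic.
clock = [0]
cache = ToyExpiringCache(ToyExpiry(), lambda: clock[0])
cache.put("short", {"ttl_nanos": 100})
cache.put("long", {"ttl_nanos": 500})
cache.put("plain", {})          # no TTL of its own, so the default applies
clock[0] = 150
short_entry = cache.get("short")   # expired by now
long_entry = cache.get("long")     # still cached
```

In this model a read never extends an entry's life; returning a fresh duration from expire_after_read instead would behave like expire-after-access.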
Thank you,
You're very welcome.

drand48() always returns same value

drand48() always returns the same value after I close the app and open it again
When I call drand48() and print out the result it's always the same after closing and opening the app.
Does anybody know how to prevent that from happening and get a random number each time without a predictable pattern?
Thank you very much
This doesn't seem unexpected. Quoting from the drand48() documentation (emphasis mine):
The srand48(), seed48() and lcong48() are initialisation entry points, one of which should be invoked before either drand48(), lrand48() or mrand48() is called. (Although it is not recommended practice, constant default initialiser values will be supplied automatically if drand48(), lrand48() or mrand48() is called without a prior call to an initialisation entry point.)
So if you don't seed the PRNG prior to using it, you'll get a constant seed, which means you'll get exactly the same sequence every time.
Note that you should usually seed exactly once, in your case probably at startup, and never in a loop where you also use the numbers.
drand48 creates a Pseudo-random number sequence. You need to set a different seed every run.
Something like this:
let time = UInt32(NSDate().timeIntervalSinceReferenceDate)
srand48(Int(time))
let number = drand48()
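The effect of the seed can be reproduced with any seeded pseudo-random generator. The sketch below uses Python's random module instead of drand48(), purely as an illustration; first_values is a hypothetical helper name.

```python
import random
import time


def first_values(seed, count=3):
    """Return the first few numbers produced by a generator seeded exactly once."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]


# A constant seed reproduces the same sequence on every run,
# which is what an unseeded drand48() effectively does.
same_a = first_values(12345)
same_b = first_values(12345)

# Seeding once at startup from a changing source (here the clock)
# gives a different sequence on each launch.
startup_seed = time.time_ns()
values = first_values(startup_seed)
```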

Do you use tense when naming methods of boolean return type?

So, when you are writing a boolean method, do you use tense, like "has" or "was", in your return method naming, or do you solely use "is"?
The following is a Java method I recently wrote, very simply ..
boolean recovered = false;
public boolean wasRecovered()
{
    return recovered;
}
In this case, recovered is a state that may or may not have already occurred at this point in the code, so grammatically "was" makes sense. But does it make the same sense in code, where the "is" naming convention is usually standard?
I prefer to use IsFoo(), regardless of tense, simply because it's a well-understood convention that non-native speakers will still generally understand. Non-native speakers of English are a regular consideration in today's global development industry.
I use the tense which is appropriate to the meaning of the value. To do otherwise essentially creates code which reads one way and behaves another. Let's look at a real-world example in the .NET Framework: Thread.IsAlive
This property is named in the present tense. This implies that the value refers to the present, and makes code like the following read very well:
if (thread.IsAlive ) {
// Code that depends on the thread being alive
...
The problem here is that the property does not represent the present state of the object; it represents a past state. Once the value is calculated to be true, the thread in question can immediately exit and invalidate the value. Hence the value can only safely be used to identify the past state of the thread, and a past-tense property is more appropriate. Let's now revisit the sample, which reads a bit differently:
if ( thread.WasAlive ) {
// Code that depends on the thread being alive
...
They behave the same but one reads very poorly because it in fact represents poor code.
Here's a list of some other offenders
File.Exists
Directory.Exists
DriveInfo.IsReady
WeakReference.IsAlive
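The staleness described in this answer is not specific to .NET. As a rough illustration, here is the same effect with Python's threading module (the variable names are arbitrary): the value read from is_alive() is a snapshot, and the thread's real state can change right after it is taken.

```python
import threading

worker = threading.Thread(target=lambda: None)  # finishes almost immediately
worker.start()

# True or False depending on timing; either way it describes the past
# by the time the next line runs.
snapshot = worker.is_alive()

worker.join()  # wait for the thread to exit

# The stored snapshot cannot change, but the thread's real state has moved on.
state_after_join = worker.is_alive()
```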
The isXxx prefix is a widespread naming convention, so it's generally the best choice.
For order-sensitive operations, wasXxx is appropriate. For example, in JDBC, retrieving the value of a database column might return zero when the field is actually NULL (unset); in this case, a follow-up call to wasNull determines which it is after the actual retrieval was performed.
For retrieving attribute settings, hasXxx may be more appropriate. It's a grammar preference, as in "the object's flag is set" versus "the object has an attribute".
Then there are capability tests canXxx. For example, calling canWrite to see if a file is writable. But names like these can probably be renamed to the isXxx form, such as isWritable.
I tend to, yes. For example in error checking:
$errors = false;
public function hasErrors()
{
return $this->errors;
}
I am not sure that you are thinking about this correctly. The reason one would use the Recovered property is because that is the state the object is in now, not because that was the state the object used to be in. There may have been some process in the past (The Recovery) that has now completed, but the fact that we are accessing this property now means that there is something about that completed process that altered current state, and that current state is important. To me "Recovered" captures the nature of that state. For this example (and most similar situations) I would use IsRecovered to name the predicate that indicates this condition. (This also matches normal English: "This is a recovered document.")
It is extremely rare that I would use anything other than present tense to name a predicate (IsDirty, HasCoupon) or boolean function (IsPrime(x)) in a program.
An exception would be to indicate state that has since been changed that might need to be reinstated (DocumentWindow.WasMaximizedAtLastExit).
I would usually use an infinitive for future tense (ToBeCopied rather than WillBeCopied), since the best laid plans of software are sometimes altered (or cancelled).
It depends on whether or not you care about the past or future state of the property in question.
To try to simplify the semantics, realize that there are a few scenarios that make the IsXXX form debatable and some very common scenarios where the IsXXX form is the only useful one.
Below is the 'truth table' for Thread.IsAlive() based on possible states of the thread over time. Forget about why a thread might flip flop states, we need to focus on the language used.
Scenarios of possible thread states over time:
     Past    Present  Future
     =====   =======  =======
1.   alive   alive    alive
2.   alive   alive    dead
3.   alive   dead     dead
4.   dead    dead     dead
5.   dead    dead     alive
6.   dead    alive    alive
7.   dead    alive    dead
8.   alive   dead     alive
(Note: I talk about the Future state below for consistency. Knowing whether a thread will die is very likely unknowable, as a subset of the Halting Problem.)
When we interrogate an object by calling a method, there is a common assumption: "Is this thread alive, at the time I asked?" For these cases, the answer in the "Present" column is all we care about and using the IsXXX form works fine.
Scenarios #1(always alive) and #4(always dead) are the simplest and most common. The answer to IsAlive() will not change between calls. The battle over language that comes up is due to the other 6 cases where the result of calling IsAlive() depends on when it is called.
Scenarios #2 (will die) and #3 (has died) transition from alive to dead.
Scenarios #5 (will start) and #6 (has started) transition from dead to alive.
For these four (2, 3, 5, 6) the answer to IsAlive() is not constant. The question becomes, do I care about the Present state, IsAlive(), or am I interested in the Past/Future state, WasAlive() and WillBeAlive()? Unless you can predict the future, the WillBeAlive() call becomes meaningless for all but the most specific designs.
When dealing with a thread pool, we might need to restart threads that are in the 'dead' state to service connect requests and it doesn't matter whether they were ever alive, just that they are currently dead. In this case we might actually want to use WasDead(). Of course we should try to guarantee we don't restart a thread that was just restarted but that is a design problem, not a semantic one. Assuming that no one else can restart the thread, it doesn't matter much whether we use IsAlive() == false or WasDead() == true.
Now for the last two scenarios. Scenario #7 (was dead, is alive, will be dead) is practically the same as #6. Do you know when in the future it will die? In 10 seconds, 10 minutes, 10 hours? Are you going to wait before deciding what to do? No, you only care about the current (Present) state. We're talking about naming here, not multi-threaded design.
Scenario #8(was alive, is dead, will be alive), is practically the same as #3. If you are reusing threads, then they can cycle through the alive/dead states several times. Worrying about the difference between #3 and #8 goes back to the Halting Problem and so can be disregarded.
IsAlive() should work for all cases. IsAlive() == false works (for #5 and #6) instead of adding WasAlive().
I don't mind wasRecovered that much. Recovery is a past event that may or may not have happened - this tells you whether it did or not. But if you're using it because of some consequence of recovery, I'd prefer isCached, isValid, or some other description of what those consequences actually are. Just because you've recovered something doesn't inherently mean you haven't lost it again since.
Always beware that in English, the use of a past participle as an adjective is ambiguous between transitive and intransitive verbs (and perhaps between active and passive voice). isRecovered might mean that the object has been recovered by something else, or it might mean that the object has recovered. If your object represents a patient at a hospital, does "isRecovered" mean that the patient is fit and well, or that someone has fetched the patient back from the X-ray department? wasRecovered might therefore be better for the latter.
The conceit for method naming is that you are retrieving information about the object in question. For it to be named in the past tense, it would have to be information about a previous state of the object, rather than its current state.
The only reason I could ever think of for using past tense is if I was checking a cached result of something that previously occurred but is no longer the case. For a contrived example, perhaps retrieving the previous value after something like a swap() call. It could be useful in operations that are atomic by design. Not very likely in the wild though.
Since your question is specific to Java, the method name should start with "is" if your class is a JavaBean and the method is an accessor method for a property.
http://download.oracle.com/javase/tutorial/javabeans/properties/properties.html

What is an idempotent operation?

In computing, an idempotent operation is one that has no additional effect if it is called more than once with the same input parameters. For example, removing an item from a set can be considered an idempotent operation on the set.
In mathematics, an idempotent operation is one where f(f(x)) = f(x). For example, the abs() function is idempotent because abs(abs(x)) = abs(x) for all x.
These slightly different definitions can be reconciled by considering that x in the mathematical definition represents the state of an object, and f is an operation that may mutate that object. For example, consider the Python set and its discard method. The discard method removes an element from a set, and does nothing if the element does not exist. So:
my_set.discard(x)
has exactly the same effect as doing the same operation twice:
my_set.discard(x)
my_set.discard(x)
Idempotent operations are often used in the design of network protocols, where a request to perform an operation is guaranteed to happen at least once, but might also happen more than once. If the operation is idempotent, then there is no harm in performing the operation two or more times.
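A small sketch of the reconciliation described above, treating the operation as a function from one state to the next. The helper name discard is my own; it wraps the set method so that the state is passed in and returned explicitly.

```python
def discard(state, x):
    """Return the state after removing x; does nothing if x is absent."""
    new_state = set(state)
    new_state.discard(x)
    return new_state


start = {1, 2, 3}
once = discard(start, 2)
twice = discard(discard(start, 2), 2)

# Computing sense: repeating the call has no additional effect.
same_effect = once == twice

# Mathematical sense: f(f(x)) == f(x), shown here for abs().
abs_is_idempotent = all(abs(abs(x)) == abs(x) for x in (-3, 0, 7))
```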
See the Wikipedia article on idempotence for more information.
The above answer previously had some incorrect and misleading examples. Comments below written before April 2014 refer to an older revision.
An idempotent operation can be repeated an arbitrary number of times and the result will be the same as if it had been done only once. In arithmetic, adding zero to a number is idempotent.
Idempotence is talked about a lot in the context of "RESTful" web services. REST seeks to maximally leverage HTTP to give programs access to web content, and is usually set in contrast to SOAP-based web services, which just tunnel remote procedure call style services inside HTTP requests and responses.
REST organizes a web application into "resources" (like a Twitter user, or a Flickr image) and then uses the HTTP verbs of POST, PUT, GET, and DELETE to create, update, read, and delete those resources.
Idempotence plays an important role in REST. If you GET a representation of a REST resource (e.g., GET a jpeg image from Flickr), and the operation fails, you can just repeat the GET again and again until the operation succeeds. To the web service, it doesn't matter how many times the image is gotten. Likewise, if you use a RESTful web service to update your Twitter account information, you can PUT the new information as many times as it takes in order to get confirmation from the web service. PUT-ing it a thousand times is the same as PUT-ing it once. Similarly DELETE-ing a REST resource a thousand times is the same as deleting it once. Idempotence thus makes it a lot easier to construct a web service that's resilient to communication errors.
Further reading: RESTful Web Services, by Richardson and Ruby (idempotence is discussed on pages 103-104), and Roy Fielding's PhD dissertation on REST. Fielding was one of the authors of HTTP 1.1, RFC-2616, which talks about idempotence in section 9.1.2.
No matter how many times you call the operation, the result will be the same.
Idempotence means that applying an operation once or applying it multiple times has the same effect.
Examples:
Multiplication by zero. No matter how many times you do it, the result is still zero.
Setting a boolean flag. No matter how many times you do it, the flag stays set.
Deleting a row from a database with a given ID. If you try it again, the row is still gone.
For pure functions (functions with no side effects) then idempotency implies that f(x) = f(f(x)) = f(f(f(x))) = f(f(f(f(x)))) = ...... for all values of x
For functions with side effects, idempotency furthermore implies that no additional side effects will be caused after the first application. You can consider the state of the world to be an additional "hidden" parameter to the function if you like.
Note that in a world where you have concurrent actions going on, you may find that operations you thought were idempotent cease to be so (for example, another thread could unset the value of the boolean flag in the example above). Basically whenever you have concurrency and mutable state, you need to think much more carefully about idempotency.
Idempotency is often a useful property in building robust systems. For example, if there is a risk that you may receive a duplicate message from a third party, it is helpful to have the message handler act as an idempotent operation so that the message effect only happens once.
A good example for understanding an idempotent operation might be locking a car with a remote key.
log(Car.state) // unlocked
Remote.lock();
log(Car.state) // locked
Remote.lock();
Remote.lock();
Remote.lock();
log(Car.state) // locked
lock is an idempotent operation. Even if there is some side effect each time you run lock, like blinking, the car is still in the same locked state, no matter how many times you run the lock operation.
An idempotent operation produces the result in the same state even if you call it more than once, provided you pass in the same parameters.
An idempotent operation is an operation, action, or request that can be applied multiple times without changing the result, i.e. the state of the system, beyond the initial application.
EXAMPLES (WEB APP CONTEXT):
IDEMPOTENT:
Making multiple identical requests has the same effect as making a single request. A message in an email messaging system is opened and marked as "opened" in the database. One can open the message many times but this repeated action will only ever result in that message being in the "opened" state. This is an idempotent operation. The first time one PUTs an update to a resource using information that does not match the resource (the state of the system), the state of the system will change as the resource is updated. If one PUTs the same update to a resource repeatedly then the information in the update will match the information already in the system upon every PUT, and no change to the state of the system will occur. Repeated PUTs with the same information are idempotent: the first PUT may change the state of the system, subsequent PUTs should not.
NON-IDEMPOTENT:
If an operation always causes a change in state, like POSTing the same message to a user over and over, resulting in a new message sent and stored in the database every time, we say that the operation is NON-IDEMPOTENT.
NULLIPOTENT:
If an operation has no side effects, like purely displaying information on a web page without any change in a database (in other words you are only reading the database), we say the operation is NULLIPOTENT. All GETs should be nullipotent.
When talking about the state of the system we are obviously ignoring hopefully harmless and inevitable effects like logging and diagnostics.
Just wanted to throw out a real use case that demonstrates idempotence. In JavaScript, say you are defining a bunch of model classes (as in MVC model). The way this is often implemented is functionally equivalent to something like this (basic example):
function model(name) {
  function Model() {
    this.name = name;
  }
  return Model;
}
You could then define new classes like this:
var User = model('user');
var Article = model('article');
But if you were to try to get the User class via model('user'), from somewhere else in the code, it would fail:
var User = model('user');
// ... then somewhere else in the code (in a different scope)
var User = model('user');
Those two User constructors would be different. That is,
model('user') !== model('user');
To make it idempotent, you would just add some sort of caching mechanism, like this:
var collection = {};
function model(name) {
  if (collection[name])
    return collection[name];
  function Model() {
    this.name = name;
  }
  collection[name] = Model;
  return Model;
}
By adding caching, every time you call model('user') it returns the same object, and so it's idempotent. So:
model('user') === model('user');
Quite detailed and technical answers. Just adding a simple definition.
Idempotent = Re-runnable
For example,
A Create operation in itself is not guaranteed to run without error if executed more than once.
But a CreateOrUpdate operation is re-runnable (idempotent).
Idempotent Operations: Operations that have no side-effects if executed multiple times.
Example: An operation that retrieves values from a data resource and say, prints it
Non-Idempotent Operations: Operations that would cause some harm if executed multiple times. (As they change some values or states)
Example: An operation that withdraws from a bank account
It is any operation that every nth result will result in an output matching the value of the 1st result. For instance the absolute value of -1 is 1. The absolute value of the absolute value of -1 is 1. The absolute value of the absolute value of absolute value of -1 is 1. And so on. See also: When would be a really silly time to use recursion?
An idempotent operation over a set leaves its members unchanged when applied one or more times.
It can be a unary operation like absolute(x) where x belongs to a set of positive integers. Here absolute(absolute(x)) = x.
It can be a binary operation like union of a set with itself would always return the same set.
cheers
In short, an idempotent operation is one that does not produce different results no matter how many times you perform it.
For example, according to the definition of the spec of HTTP, GET, HEAD, PUT, and DELETE are idempotent operations; however POST and PATCH are not. That's why sometimes POST is replaced by PUT.
An operation is said to be idempotent if executing it multiple times is equivalent to executing it once.
For example: setting the volume to 20.
No matter how many times the volume of the TV is set to 20, the end result will be that the volume is 20. Even if a process executes the operation 50, 100, or more times, at the end of the process the volume will be 20.
Counter example: increasing the volume by 1. If a process executes this operation 50 times, at the end volume will be initial Volume + 50 and if a process executes the operation 100 times, at the end volume will be initial Volume + 100. As you can clearly see that the end result varies based upon how many times the operation was executed. Hence, we can conclude that this operation is NOT idempotent.
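The two volume operations can be sketched as follows. Television and its method names are hypothetical, and the starting volume of 10 is an arbitrary choice.

```python
class Television:
    def __init__(self, volume=10):
        self.volume = volume

    def set_volume(self, level):
        # Idempotent: the end state depends only on the argument.
        self.volume = level

    def increase_volume(self):
        # Not idempotent: the end state depends on how often this is called.
        self.volume += 1


tv_set = Television()
for _ in range(50):
    tv_set.set_volume(20)

tv_inc_50 = Television()
for _ in range(50):
    tv_inc_50.increase_volume()

tv_inc_100 = Television()
for _ in range(100):
    tv_inc_100.increase_volume()
```

After these loops tv_set.volume is 20 regardless of the repeat count, while the other two end at 60 and 110.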
If you think in terms of programming, let's say that I have an operation in which a function f takes foo as the input and the output of f is set to foo back. If at the end of the process (that executes this operation 50/100 times or more), my foo variable holds the value that it did when the operation was executed only ONCE, then the operation is idempotent, otherwise NOT.
foo = <some random value here, let's say -2>
{ foo = f( foo ) }   curly brackets outline the operation
If f returns the square of the input, then the operation is NOT idempotent, because foo at the end will be (-2) raised to the power 2^n, where n is the number of times the operation is executed (4, then 16, then 256, and so on).
If f returns the absolute value of the input, then the operation is idempotent, because no matter how many times the operation is executed, foo will be abs(-2).
Here, end result is defined as the final value of variable foo.
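The foo = f(foo) operation can be checked directly. In this sketch apply_repeatedly and square are hypothetical helper names; starting from -2, repeated squaring gives 4, 16, 256, while abs settles on 2 after the first application.

```python
def apply_repeatedly(f, value, times):
    """Run the operation `value = f(value)` the given number of times."""
    for _ in range(times):
        value = f(value)
    return value


def square(v):
    return v * v


# abs: one execution and fifty executions leave foo with the same value.
abs_once = apply_repeatedly(abs, -2, 1)
abs_fifty = apply_repeatedly(abs, -2, 50)

# square: the final value depends on how many times the operation ran.
squares = [apply_repeatedly(square, -2, n) for n in (1, 2, 3)]
```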
In mathematical sense, idempotence has a slightly different meaning of:
f(f(....f(x))) = f(x)
here output of f(x) is passed as input to f again which doesn't need to be the case always with programming.
my 5c:
In integration and networking the idempotency is very important.
Several examples from real-life:
Imagine we deliver data to the target system, and the data is delivered as a sequence of messages.
1. What would happen if the sequence gets mixed up in the channel? (As network packets always do :) ). If the target system is idempotent, the result will not be different. If the target system depends on the right order in the sequence, we have to implement a resequencer on the target side, which would restore the right order.
2. What would happen if there are duplicate messages? If the channel or target system does not acknowledge in time, the source system (or the channel itself) usually sends another copy of the message. As a result we can have duplicate messages on the target system side.
If the target system is idempotent, it takes care of this and the result will not be different.
If the target system is not idempotent, we have to implement a deduplicator on the target system side of the channel.
For a workflow manager (such as Apache Airflow), if an idempotent operation fails in your pipeline, the system can retry the task automatically without affecting the system. Even if the logs change, that is good because you can see the incident.
The most important thing in this case is that your system can retry the failed task without messing up the pipeline (e.g. by appending the same data to a table on each retry).
Let's say the client makes a request to the "InstanceA" service, which processes the request, passes it to the DB, and shuts down before sending the response. Since the client does not see that the request was processed, it will retry the same request. The load balancer will forward the request to another service instance, "InstanceB", which will make the same change to the same DB item.
We should use idempotency tokens. When a client sends a request to a service, the request should carry some kind of request-id that can be saved in the DB to show that we have already executed the request. If the client retries the request, "InstanceB" will check the request-id. Since that particular request has already been executed, it will not make any change to the DB item. Those kinds of requests are called idempotent requests: we send the same request multiple times, but only the first one changes anything.
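A minimal sketch of the request-id check, assuming an in-memory dictionary stands in for the shared database. AccountService, deposit, and the "req-1" token are made-up names; in a real system the lookup and the update would need to happen in one transaction.

```python
class AccountService:
    """Hypothetical service instance; `processed` models a table shared by all instances."""

    def __init__(self, balances, processed):
        self.balances = balances
        self.processed = processed   # request_id -> result of the first execution

    def deposit(self, request_id, account, amount):
        if request_id in self.processed:
            # Already executed: return the stored result without touching the balance.
            return self.processed[request_id]
        self.balances[account] += amount
        result = self.balances[account]
        self.processed[request_id] = result
        return result


balances = {"alice": 100}
processed = {}
instance_a = AccountService(balances, processed)
instance_b = AccountService(balances, processed)

first = instance_a.deposit("req-1", "alice", 50)   # executed, response lost
retry = instance_b.deposit("req-1", "alice", 50)   # retry lands on another instance
```

Both calls return 150 and the balance is changed only once, even though the retry was handled by a different instance.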

Which contract (Design by contract) is better?

Suppose I have a method
public Patient(int id)
{
----
}
that returns a Patient object given an id. I could define the contract in 2 ways:
Method would return null if the patient does not exist
Method would throw an exception if the patient does not exist. In this case I would also define a query method that returns true if the Patient exists in the database or false otherwise...
Which contract should I use? Any other suggestions?
Update: Please comment on this case too...
If it is not a database-assigned Id but something a user enters in the UI, like an SSN, then which one is better?
Comment about Null pattern from Steve that I think is valid:
probably not a good idea here, as it would be really useful to know immediately when an ID did not exist.
And I also think Null pattern here would be somewhat heavy weight
Comment from Rob Wells on throwing exception because its bad Id:
"I don't think a typo in a patient's name is an exceptional circumstance" IMHO
Keep in mind that going "over the wire" to another tier (whether a database or an application server) is one of the most expensive activities you can do - typically a network call will take several orders of magnitude longer than in-memory calls.
It's therefore worthwhile structuring your API to avoid redundant calls.
Consider, if your API is like this:
// Check to see if a given patient exists
public bool PatientExists(int id);
// Load the specified patient; throws exception if not found
public Patient GetPatient(int id);
Then you are likely to hit the database twice - or to be reliant on good caching to avoid this.
Another consideration is this: In some places your code may have a "known-good" id, in other places not. Each location requires a different policy on whether an exception should be thrown.
Here's a pattern that I've used to good effect in the past - have two methods:
// Load the specified patient; throws exception if not found
public Patient GetExistingPatient(int id);
// Search for the specified patient; returns null if not found
public Patient FindPatient(int id);
Clearly, GetExistingPatient() can be built by calling FindPatient().
This allows your calling code to get the appropriate behaviour, throwing an exception if something has gone wrong, and avoiding exception handling in cases where it is not needed.
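For illustration, here is the two-method pattern sketched in Python rather than C#. PatientRepository, PatientNotFoundError, and the dictionary standing in for the database are assumptions of the sketch, not part of the original API.

```python
class PatientNotFoundError(LookupError):
    """Raised when a caller required a patient that does not exist."""


class PatientRepository:
    def __init__(self, rows):
        self._rows = rows   # dict of id -> patient name, standing in for the database

    def find_patient(self, patient_id):
        # Search: a single lookup that returns None when nothing matches.
        return self._rows.get(patient_id)

    def get_existing_patient(self, patient_id):
        # Retrieval for "known-good" ids: built on find_patient, so still one lookup.
        patient = self.find_patient(patient_id)
        if patient is None:
            raise PatientNotFoundError(f"no patient with id {patient_id}")
        return patient


repo = PatientRepository({100: "Ada"})
found = repo.find_patient(100)
missing = repo.find_patient(999)   # None, no exception handling needed

try:
    repo.get_existing_patient(999)
    raised = False
except PatientNotFoundError:
    raised = True
```

Either method costs one lookup, so the caller chooses the contract without paying for a separate existence check.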
Another option would be the Null Object pattern.
You should probably throw an exception. If you have an id that doesn't point to a valid patient, where did it come from? Something very bad has likely happened. It is an exceptional circumstance.
EDIT: If you're doing something other than an integer-based retrieval, like a search based on text, then returning null is fine. Especially since in that case you are returning a set of results, which could be more than one (more than one patient with the same name, same birth date, or whatever your criteria is).
A search function should have a different contract from a retrieval function.
It depends:
If you consider that normal operation will lead to a patient number not matching a file in the DB, then an empty (NULL) record should be returned.
But if you expect that a given ID should always hit a record, then when one is not found (which should be rare) use an exception.
Other things like a DB connection error should generate an exception,
as under normal situations you expect the query to the DB to always work (though it may or may not return 0 records).
P.S. I would not return a pointer. (Who owns the pointer??)
I would return an object that may or may not have the record, but that can be interrogated for the existence of the record within. Potentially a smart pointer, or something slightly smarter than a smart pointer that understands the context.
For this circumstance, I would have the method return null for a non-existent patient.
I tend to prefer using exceptions to assist graceful degradation when there is a problem with the system itself.
In this instance, it is most probably:
a typo in the patient's ID if it was entered into a search form,
a data entry error, or
a workflow issue in that the patient's record hasn't been entered yet.
Hence, returning a null rather than an exception.
If there was a problem contacting the database, then I would have the method raise an exception.
Edit: Just saw that the patient ID in the signature was an integer, thanks Steven Lowe, so I've corrected my list of reasons.
My basic point about delineating when to use exceptions (for system errors) versus other methods of returning an error (for simple data entry typos) still stands though. IMHO.
HTH
cheers,
Rob
In a simple situation like this, option 1 seems to be more than sufficient. You may want to implement something like a callback method that the client calls to know why it returned null. Just a suggestion.
Taking your description at face value, you probably need both:
bad IDs are errors/exceptions, as Adam pointed out, but
if you are given IDs elsewhere that might have disappeared, you will need the query method to check for them
Assuming I read that correctly...
When you call Patient(100) it will return an object reference for a Patient with an id of 100.
If no patient with an id of 100 exists, I think it should return null. Exceptions are overused IMO and this case doesn't call for it. The function simply returned a null. It didn't create some errored case that can crash your application (unless of course, you ended up not handling that null and passed it around to some other part of your application).
I would definitely have that function return 'null', especially if it was part of some search, where a user would search for a patient with a particular ID and if the object reference ended up being null, it would simply state that no patient with that id exists.
Throw an exception.
If you return null, code like this:
Console.WriteLine(Patient(id).Name);
would fail with a NullReferenceException if the id doesn't exist, which is not as helpful as, say, a PatientNotFoundException(id). In this example, it's still relatively easy to track down, but consider:
var somePatient = Patient(id);
// much later, in a different function:
Console.WriteLine(somePatient);
About adding a function that checks whether a patient exists: Note this won't prevent PatientNotFoundExceptions completely. For example:
if (PatientExists(id))
    Console.WriteLine(Patient(id).Name);
-- another thread or another process could delete the patient between the calls to PatientExists and Patient. Also, this would mean two database queries instead of one. Usually, it's better to just try the call, and handle the exception.
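As a rough Java sketch of "just try the call, and handle the exception": the names PatientNotFoundException, PatientRepository, nameOf and describe are assumptions made for illustration, and the map stands in for the database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical exception that carries the missing id in its message.
class PatientNotFoundException extends RuntimeException {
    PatientNotFoundException(int id) {
        super("No patient with id " + id);
    }
}

class PatientRepository {
    private final Map<Integer, String> namesById = new ConcurrentHashMap<>();

    void add(int id, String name) {
        namesById.put(id, name);
    }

    // Single lookup: either returns the name or throws, naming the id.
    String nameOf(int id) {
        String name = namesById.get(id);
        if (name == null) {
            throw new PatientNotFoundException(id);
        }
        return name;
    }

    // Caller side: one call, failure handled where it happens,
    // with no separate exists() check that could race with a delete.
    String describe(int id) {
        try {
            return "Patient: " + nameOf(id);
        } catch (PatientNotFoundException e) {
            return e.getMessage();
        }
    }
}
```

Because there is only one lookup, there is no window between "check" and "use" in which another thread could remove the patient.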
Note that the situation is different for queries that return multiple values, e.g. as a list; here, it is appropriate to return an empty list if there are no matches.
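For the multi-value case, a minimal Java sketch might look like this. PatientSearch, findByPrefix and the sample names are hypothetical; the only point is that "no matches" is an empty list rather than null or an exception.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PatientSearch {
    // Hypothetical sample data standing in for a query result set.
    private final List<String> names = Arrays.asList("Ann Lee", "Bob Ray", "Anna Poe");

    // Returns every name starting with the prefix; never null, possibly empty.
    List<String> findByPrefix(String prefix) {
        List<String> matches = new ArrayList<>();
        for (String name : names) {
            if (name.startsWith(prefix)) {
                matches.add(name);
            }
        }
        return matches;
    }
}
```

Callers can iterate over the result directly without a null check.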

api documentation and "value limits": do they match?

Do you often see in API documentation (as in 'javadoc of public functions' for example) the description of "value limits" as well as the classic documentation ?
Note: I am not talking about comments within the code
By "value limits", I mean:
can a parameter accept a null value (or an empty String, or...) ?
can a 'return value' be null, or is it guaranteed to never be null (or can it be "empty", or...) ?
Sample:
What I often see (without having access to source code) is:
/**
* Get all readers name for this current Report. <br />
* <b>Warning</b>The Report must have been published first.
* @param aReaderNameRegexp filter in order to return only reader matching the regexp
* @return array of reader names
*/
String[] getReaderNames(final String aReaderNameRegexp);
What I like to see would be:
/**
* Get all readers name for this current Report. <br />
* <b>Warning</b>The Report must have been published first.
* @param aReaderNameRegexp filter in order to return only reader matching the regexp
* (can be null or empty)
* @return array of reader names
* (null if Report has not yet been published,
* empty array if no reader match criteria,
* reader names array matching regexp, or all readers if regexp is null or empty)
*/
String[] getReaderNames(final String aReaderNameRegexp);
My point is:
When I use a library with a getReaderNames() function in it, I often do not even need to read the API documentation to guess what it does. But I need to be sure how to use it.
My only concern when I want to use this function is: what should I expect in terms of parameters and return values ? That is all I need to know to safely set up my parameters and safely test the return value, yet I almost never see that kind of information in API documentation...
Edit:
This can influence whether to use checked or unchecked exceptions.
What do you think ? value limits and API, do they belong together or not ?
I think they can belong together but don't necessarily have to belong together. In your scenario, it seems like it makes sense that the limits are documented in such a way that they appear in the generated API documentation and intellisense (if the language/IDE support it).
I think it does depend on the language as well. For example, Ada has a native data type that is a "restricted integer", where you define an integer variable and explicitly indicate that it will only (and always) be within a certain numeric range. In that case, the datatype itself indicates the restriction. It should still be visible and discoverable through the API documentation and intellisense, but wouldn't be something that a developer has to specify in the comments.
However, languages like Java and C# don't have this type of restricted integer, so the developer would have to specify it in the comments if it were information that should become part of the public documentation.
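One way to approximate a restricted integer in Java is a small value class that validates its range on construction. The sketch below is an assumption-laden illustration (Percent and its 0..100 range are made up for the example), not a language feature.

```java
// Hypothetical value type: an int guaranteed to lie in 0..100.
final class Percent {
    private final int value;

    Percent(int value) {
        // The range check runs once, here; holders of a Percent can rely on it.
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("Percent must be in 0..100, got " + value);
        }
        this.value = value;
    }

    int value() {
        return value;
    }
}
```

A method that takes a Percent parameter then documents its limit through the type itself, though the check still happens at run time rather than at compile time.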
I think those kinds of boundary conditions most definitely belong in the API. However, I would (and often do) go a step further and indicate WHAT those null values mean. Either I indicate it will throw an exception, or I explain what the expected results are when the boundary value is passed in.
It's hard to remember to always do this, but it's a good thing for users of your class. It's also difficult to maintain it if the contract the method presents changes (like null values no longer being allowed)... you have to be diligent also to update the docs when you change the semantics of the method.
Question 1
Do you often see in API documentation (as in 'javadoc of public functions' for example) the description of "value limits" as well as the classic documentation?
Almost never.
Question 2
My only concern when I want to use this function is: what should I expect in terms of parameters and return values ? That is all I need to know to safely set up my parameters and safely test the return value, yet I almost never see that kind of information in API documentation...
If I used a function improperly, I would expect a RuntimeException thrown by the method, or a RuntimeException in another (sometimes very distant) part of the program.
Comments like @param aReaderNameRegexp filter in order to ... (can be null or empty) seem to me a way to implement Design by Contract in human language inside Javadoc.
Using Javadoc to enforce Design by Contract was the approach of iContract, now resurrected as JcontractS, which lets you specify invariants, preconditions, and postconditions in a more formalized way than human language.
Question 3
This can influence whether to use checked or unchecked exceptions.
What do you think ? value limits and API, do they belong together or not ?
The Java language doesn't have a Design by Contract feature, so you might be tempted to use exceptions, and I agree that you have to be aware of when to choose checked and unchecked exceptions. You might use the unchecked IllegalArgumentException or IllegalStateException, or you might use unit testing, but the major problem is how to communicate to other programmers that such code is about Design by Contract and should be treated as a contract rather than changed lightly.
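To make that concrete, here is a hedged sketch of the getReaderNames example in which the documented limits are also enforced with an unchecked exception. The Report class, its addReader and publish methods, and the choice to throw IllegalStateException (instead of returning null for an unpublished report, as the question's Javadoc proposed) are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

class Report {
    private final List<String> readers = new ArrayList<>();
    private boolean published;

    void addReader(String name) {
        readers.add(name);
    }

    void publish() {
        published = true;
    }

    /**
     * Returns the reader names of this report.
     *
     * @param regexp filter applied to the whole name; null or empty means "all readers"
     * @return matching names; never null, empty if nothing matches
     * @throws IllegalStateException if the report has not been published
     */
    String[] getReaderNames(String regexp) {
        if (!published) {
            // Precondition from the Javadoc, enforced at run time.
            throw new IllegalStateException("Report must be published first");
        }
        List<String> result = new ArrayList<>();
        for (String reader : readers) {
            if (regexp == null || regexp.isEmpty() || reader.matches(regexp)) {
                result.add(reader);
            }
        }
        return result.toArray(new String[0]);
    }
}
```

The Javadoc states the contract and the guard clause checks it, so a caller who violates it fails immediately instead of somewhere far away in the program.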
I think they do, and have always placed comments in the header files (C++) accordingly.
In addition to valid input/output/return comments, I also note which exceptions are likely to be thrown by the function (since I often want to use the return value for... well, returning a value, I prefer exceptions over error codes).
//File:
// Should be a path to the texture file to load. If it is not a full path (e.g. "c:\example.png"), it will attempt to find the file using the paths provided by the DataSearchPath list.
//Return: The pointer to a Texture instance is returned. In the event of an error, an exception is thrown. When you are finished with the texture you should call the Free() method.
//Exceptions:
//except::FileNotFound
//except::InvalidFile
//except::InvalidParams
//except::CreationFailed
Texture *GetTexture(const std::string &File);
@Fire Lancer: Right! I forgot about exceptions, but I would like to see them mentioned, especially the unchecked 'runtime' exceptions that this public method could throw.
@Mike Stone:
you have to be diligent also to update the docs when you change the semantics of the method.
Mmmm, I sure hope that the public API documentation is at the very least updated whenever a change that affects the contract of the function takes place. If not, that API documentation could be dropped altogether.
To add food for thought (and to go along with @Scott Dorman), I just stumbled upon the future of Java 7 annotations.
What does that mean ? That certain 'boundary conditions', rather than being in the documentation, would be better off in the API itself, and automatically used, at compilation time, with appropriately generated 'assert' code.
That way, if a '@CheckForNull' is in the API, the writer of the function might get away with not even documenting it! And if the semantics change, the API will reflect that change (like 'no more @CheckForNull', for instance).
That kind of approach suggests that documentation, for 'boundary conditions', is an extra bonus rather than a mandatory practice.
However, that does not cover the special values of the return object of a function. For that, a complete documentation is still needed.
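The annotation idea above could look roughly like the following sketch. The CheckForNull annotation is declared locally here purely as a hypothetical stand-in so the example is self-contained; ReaderDirectory and findReader are also made-up names, and nothing in this code enforces the annotation by itself (a static analysis tool or annotation processor would have to do that).

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical stand-in for a nullness annotation, declared locally for this sketch.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.PARAMETER})
@interface CheckForNull {}

class ReaderDirectory {
    // The annotation, rather than a comment, tells callers the result may be null.
    @CheckForNull
    String findReader(String name) {
        return "alice".equals(name) ? "alice" : null;
    }
}
```

Because the annotation is marked @Documented, it also shows up in the generated Javadoc next to the method signature, but what a null return actually means still has to be written down in prose.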