What is an idempotent operation? - language-agnostic

What is an idempotent operation?

In computing, an idempotent operation is one that has no additional effect if it is called more than once with the same input parameters. For example, removing an item from a set can be considered an idempotent operation on the set.
In mathematics, an idempotent operation is one where f(f(x)) = f(x). For example, the abs() function is idempotent because abs(abs(x)) = abs(x) for all x.
These slightly different definitions can be reconciled by considering that x in the mathematical definition represents the state of an object, and f is an operation that may mutate that object. For example, consider the Python set and its discard method. The discard method removes an element from a set, and does nothing if the element does not exist. So:
my_set.discard(x)
has exactly the same effect as doing the same operation twice:
my_set.discard(x)
my_set.discard(x)
Idempotent operations are often used in the design of network protocols, where a request to perform an operation is guaranteed to happen at least once, but might also happen more than once. If the operation is idempotent, then there is no harm in performing the operation two or more times.
See the Wikipedia article on idempotence for more information.
The above answer previously had some incorrect and misleading examples. Comments below written before April 2014 refer to an older revision.

An idempotent operation can be repeated an arbitrary number of times and the result will be the same as if it had been done only once. In arithmetic, adding zero to a number is idempotent.
Idempotence is talked about a lot in the context of "RESTful" web services. REST seeks to maximally leverage HTTP to give programs access to web content, and is usually set in contrast to SOAP-based web services, which just tunnel remote procedure call style services inside HTTP requests and responses.
REST organizes a web application into "resources" (like a Twitter user, or a Flickr image) and then uses the HTTP verbs of POST, PUT, GET, and DELETE to create, update, read, and delete those resources.
Idempotence plays an important role in REST. If you GET a representation of a REST resource (eg, GET a jpeg image from Flickr), and the operation fails, you can just repeat the GET again and again until the operation succeeds. To the web service, it doesn't matter how many times the image is gotten. Likewise, if you use a RESTful web service to update your Twitter account information, you can PUT the new information as many times as it takes in order to get confirmation from the web service. PUT-ing it a thousand times is the same as PUT-ing it once. Similarly DELETE-ing a REST resource a thousand times is the same as deleting it once. Idempotence thus makes it a lot easier to construct a web service that's resilient to communication errors.
Further reading: RESTful Web Services, by Richardson and Ruby (idempotence is discussed on page 103-104), and Roy Fielding's PhD dissertation on REST. Fielding was one of the authors of HTTP 1.1, RFC-2616, which talks about idempotence in section 9.1.2.

No matter how many times you call the operation, the result will be the same.

Idempotence means that applying an operation once or applying it multiple times has the same effect.
Examples:
Multiplication by zero. No matter how many times you do it, the result is still zero.
Setting a boolean flag. No matter how many times you do it, the flag stays set.
Deleting a row from a database with a given ID. If you try it again, the row is still gone.
For pure functions (functions with no side effects) then idempotency implies that f(x) = f(f(x)) = f(f(f(x))) = f(f(f(f(x)))) = ...... for all values of x
For functions with side effects, idempotency furthermore implies that no additional side effects will be caused after the first application. You can consider the state of the world to be an additional "hidden" parameter to the function if you like.
Note that in a world where you have concurrent actions going on, you may find that operations you thought were idempotent cease to be so (for example, another thread could unset the value of the boolean flag in the example above). Basically whenever you have concurrency and mutable state, you need to think much more carefully about idempotency.
Idempotency is often a useful property in building robust systems. For example, if there is a risk that you may receive a duplicate message from a third party, it is helpful to have the message handler act as an idempotent operation so that the message effect only happens once.

A good example of understanding an idempotent operation might be locking a car with remote key.
log(Car.state) // unlocked
Remote.lock();
log(Car.state) // locked
Remote.lock();
Remote.lock();
Remote.lock();
log(Car.state) // locked
lock is an idempotent operation. Even if there are some side effect each time you run lock, like blinking, the car is still in the same locked state, no matter how many times you run lock operation.

An idempotent operation produces the result in the same state even if you call it more than once, provided you pass in the same parameters.

An idempotent operation is an operation, action, or request that can be applied multiple times without changing the result, i.e. the state of the system, beyond the initial application.
EXAMPLES (WEB APP CONTEXT):
IDEMPOTENT:
Making multiple identical requests has the same effect as making a single request. A message in an email messaging system is opened and marked as "opened" in the database. One can open the message many times but this repeated action will only ever result in that message being in the "opened" state. This is an idempotent operation. The first time one PUTs an update to a resource using information that does not match the resource (the state of the system), the state of the system will change as the resource is updated. If one PUTs the same update to a resource repeatedly then the information in the update will match the information already in the system upon every PUT, and no change to the state of the system will occur. Repeated PUTs with the same information are idempotent: the first PUT may change the state of the system, subsequent PUTs should not.
NON-IDEMPOTENT:
If an operation always causes a change in state, like POSTing the same message to a user over and over, resulting in a new message sent and stored in the database every time, we say that the operation is NON-IDEMPOTENT.
NULLIPOTENT:
If an operation has no side effects, like purely displaying information on a web page without any change in a database (in other words you are only reading the database), we say the operation is NULLIPOTENT. All GETs should be nullipotent.
When talking about the state of the system we are obviously ignoring hopefully harmless and inevitable effects like logging and diagnostics.

Just wanted to throw out a real use case that demonstrates idempotence. In JavaScript, say you are defining a bunch of model classes (as in MVC model). The way this is often implemented is functionally equivalent to something like this (basic example):
function model(name) {
function Model() {
this.name = name;
}
return Model;
}
You could then define new classes like this:
var User = model('user');
var Article = model('article');
But if you were to try to get the User class via model('user'), from somewhere else in the code, it would fail:
var User = model('user');
// ... then somewhere else in the code (in a different scope)
var User = model('user');
Those two User constructors would be different. That is,
model('user') !== model('user');
To make it idempotent, you would just add some sort of caching mechanism, like this:
var collection = {};
function model(name) {
if (collection[name])
return collection[name];
function Model() {
this.name = name;
}
collection[name] = Model;
return Model;
}
By adding caching, every time you did model('user') it will be the same object, and so it's idempotent. So:
model('user') === model('user');

Quite a detailed and technical answers. Just adding a simple definition.
Idempotent = Re-runnable
For example,
Create operation in itself is not guaranteed to run without error if executed more than once.
But if there is an operation CreateOrUpdate then it states re-runnability (Idempotency).

Idempotent Operations: Operations that have no side-effects if executed multiple times.
Example: An operation that retrieves values from a data resource and say, prints it
Non-Idempotent Operations: Operations that would cause some harm if executed multiple times. (As they change some values or states)
Example: An operation that withdraws from a bank account

It is any operation that every nth result will result in an output matching the value of the 1st result. For instance the absolute value of -1 is 1. The absolute value of the absolute value of -1 is 1. The absolute value of the absolute value of absolute value of -1 is 1. And so on. See also: When would be a really silly time to use recursion?

An idempotent operation over a set leaves its members unchanged when applied one or more times.
It can be a unary operation like absolute(x) where x belongs to a set of positive integers. Here absolute(absolute(x)) = x.
It can be a binary operation like union of a set with itself would always return the same set.
cheers

In short, Idempotent operations means that the operation will not result in different results no matter how many times you operate the idempotent operations.
For example, according to the definition of the spec of HTTP, GET, HEAD, PUT, and DELETE are idempotent operations; however POST and PATCH are not. That's why sometimes POST is replaced by PUT.

An operation is said to be idempotent if executing it multiple times is equivalent to executing it once.
For eg: setting volume to 20.
No matter how many times the volume of TV is set to 20, end result will be that volume is 20. Even if a process executes the operation 50/100 times or more, at the end of the process the volume will be 20.
Counter example: increasing the volume by 1. If a process executes this operation 50 times, at the end volume will be initial Volume + 50 and if a process executes the operation 100 times, at the end volume will be initial Volume + 100. As you can clearly see that the end result varies based upon how many times the operation was executed. Hence, we can conclude that this operation is NOT idempotent.
I have highlighted the end result in bold.
If you think in terms of programming, let's say that I have an operation in which a function f takes foo as the input and the output of f is set to foo back. If at the end of the process (that executes this operation 50/100 times or more), my foo variable holds the value that it did when the operation was executed only ONCE, then the operation is idempotent, otherwise NOT.
foo = <some random value here, let's say -2>
{ foo = f( foo ) }   curly brackets outline the operation
if f returns the square of the input then the operation is NOT idempotent. Because foo at the end will be (-2) raised to the power (number of times operation is executed)
if f returns the absolute of the input then the operation is idempotent because no matter how many multiple times the operation is executed foo will be abs(-2).
Here, end result is defined as the final value of variable foo.
In mathematical sense, idempotence has a slightly different meaning of:
f(f(....f(x))) = f(x)
here output of f(x) is passed as input to f again which doesn't need to be the case always with programming.

my 5c:
In integration and networking the idempotency is very important.
Several examples from real-life:
Imagine, we deliver data to the target system. Data delivered by a sequence of messages.
1. What would happen if the sequence is mixed in channel? (As network packages always do :) ). If the target system is idempotent, the result will not be different. If the target system depends of the right order in the sequence, we have to implement resequencer on the target site, which would restore the right order.
2. What would happen if there are the message duplicates? If the channel of target system does not acknowledge timely, the source system (or channel itself) usually sends another copy of the message. As a result we can have duplicate message on the target system side.
If the target system is idempotent, it takes care of it and result will not be different.
If the target system is not idempotent, we have to implement deduplicator on the target system side of the channel.

For a workflow manager (as Apache Airflow) if an idempotency operation fails in your pipeline the system can retry the task automatically without affecting the system. Even if the logs change, that is good because you can see the incident.
The most important in this case is that your system can retry the task that failed and doesn't mess up the pipeline (e.g. appending the same data in a table each retry)

Let's say the client makes a request to "IstanceA" service which process the request, passes it to DB, and shuts down before sending the response. since the client does not see that it was processed and it will retry the same request. Load balancer will forward the request to another service instance, "InstanceB", which will make the same change on the same DB item.
We should use idempotent tokens. When a client sends a request to a service, it should have some kind of request-id that can be saved in DB to show that we have already executed the request. if the client retries the request, "InstanceB" will check the requestId. Since that particular request already has been executed, it will not make any change to the DB item. Those kinds of requests are called idempotent requests. So we send the same request multiple times, but we won't make any change

Related

Questions about the Boundary Value Check

I'm doing my JUnit homework and need some explanations here.
Here's the quotation from my homework description:
One of the issues with boundary conditions is that the system needs to behave well even if the boundary is approached multiple times. This should be obvious, but it doesn't always happen in practice.
Remember that we can characterize an object as state and behavior. Typically, the state is not directly accessible, but instead, is accessed indirectly by means of the behavior. That is, the behavior reflects the state of the object.
Now, if we think about boundaries in math, it might not be too surprising to imagine the the value at some boundary will be different if we approach that boundary in different ways. So, if the value can be likened to the state, the state at the boundary may vary depending on how we got there. This would mean that the behavior could be different.
To make objects that behave consistently, we would have to insure that the internal state at those boundaries is consistent. So, test cases should check this assumption. To receive challenge points for this homework assignment enhance your test cases so that potential problems around the boundaries may be discovered.
Clearly mark the Challenge test cases with the string "### challenge ###" in the comments. Include in those comments what boundary is being tested, and how you're guessing that the state of the object may be different depending on how the boundary is being approached.
I don't understand this especially the highlighted part. What does he mean by "object behave consistently" and the "potential probelms"?
Also, how is this different than general boundary check that will just throws the exception and i expected in the JUnit?
Thank you!
Without knowing the details of the homework, an answer could only be somewhat generic, but I'll try.
Boundary checking is not just exception checking, its about seeing which paths in your code are execution on what condition. If you have control statements, loops, if-else, switch, etc you have to verify, on what conditions (of your internal state) those statements are processed in what way.
To me, boundary testing is that you change certain values of an instance field in a way that would cause the behavior to run through different branches of your code.
for example, you have this behavior:
if(someInstanceValue > 5) {
return "great";
} else {
return "poor";
}
Now you could test with data for someInstanceValue that define the boundary
4 : "poor"
5 : "great"
If you have multiple fields in your class, all of them define the state but only some of them may affect a certain path in your code. As the test is a specification of your class under test, written in code, you should specify which fields are relevant to a function, and which are not (by leaving them out).
So you should set up your instance-under-test accordingly (calling all setters) or if you require more complex objects, you could use frameworks like Mockito to specify the state (in a when().thenReturn() syntax).
If you want to verify if you covered all your boundaries, you could run a mutation test against your suite using a mutation testing tool like PIT. It will flip the switches in your code (i.e. replacing a < with a >=) to check whether your test will fail. Often, it's a good source of inspiration for improving the way you test.
Neverthelss, some parts of the homework assignment sound a bit confusing to me. You may approach a boundary from two sides, ok, but there is no such thing as a state that represents THE boundary, you're either on one or the other side of the boundary. If the way, how you approached one side of a boundary matters, and the object behaves differently depending on that "history" of how you reached that state, the history becomes part of the state. In other words: different history = different state.
Keep in mind: every instance field is part of the state. Every possible combination of values of your instance fields defines a single state. Every transition from one combination to another is a state transition triggered by calling a behavior. No think of your test describing this statemachine, be listing the triple of {currentState,input} -> nextState (with input being method invocation). Wich is basically the Given-When-Then structure good tests should have.

How to best validate JSON on the server-side

When handling POST, PUT, and PATCH requests on the server-side, we often need to process some JSON to perform the requests.
It is obvious that we need to validate these JSONs (e.g. structure, permitted/expected keys, and value types) in some way, and I can see at least two ways:
Upon receiving the JSON, validate the JSON upfront as it is, before doing anything with it to complete the request.
Take the JSON as it is, start processing it (e.g. access its various key-values) and try to validate it on-the-go while performing business logic, and possibly use some exception handling to handle vogue data.
The 1st approach seems more robust compared to the 2nd, but probably more expensive (in time cost) because every request will be validated (and hopefully most of them are valid so the validation is sort of redundant).
The 2nd approach may save the compulsory validation on valid requests, but mixing the checks within business logic might be buggy or even risky.
Which of the two above is better? Or, is there yet a better way?
What you are describing with POST, PUT, and PATCH sounds like you are implementing a REST API. Depending on your back-end platform, you can use libraries that will map JSON to objects which is very powerful and performs that validation for you. In JAVA, you can use Jersey, Spring, or Jackson. If you are using .NET, you can use Json.NET.
If efficiency is your goal and you want to validate every single request, it would be ideal if you could evaluate on the front-end if you are using JavaScript you can use json2.js.
In regards to comparing your methods, here is a Pro / Cons list.
Method #1: Upon Request
Pros
The business logic integrity is maintained. As you mentioned trying to validate while processing business logic could result in invalid tests that may actually be valid and vice versa or also the validation could inadvertently impact the business logic negatively.
As Norbert mentioned, catching the errors before hand will improve efficiency. The logical question this poses is why spend the time processing, if there are errors in the first place?
The code will be cleaner and easier to read. Having validation and business logic separated will result in cleaner, easier to read and maintain code.
Cons
It could result in redundant processing meaning longer computing time.
Method #2: Validation on the Go
Pros
It's efficient theoretically by saving process and compute time doing them at the same time.
Cons
In reality, the process time that is saved is likely negligible (as mentioned by Norbert). You are still doing the validation check either way. In addition, processing time is wasted if an error was found.
The data integrity can be comprised. It could be possible that the JSON becomes corrupt when processing it this way.
The code is not as clear. When reading the business logic, it may not be as apparent what is happening because validation logic is mixed in.
What it really boils down to is Accuracy vs Speed. They generally have an inverse relationship. As you become more accurate and validate your JSON, you may have to compromise some on speed. This is really only noticeable in large data sets as computers are really fast these days. It is up to you to decide what is more important given how accurate you think you data may be when receiving it or whether that extra second or so is crucial. In some cases, it does matter (i.e. with the stock market and healthcare applications, milliseconds matter) and both are highly important. It is in those cases, that as you increase one, for example accuracy, you may have to increase speed by getting a higher performant machine.
Hope this helps.
The first approach is more robust, but does not have to be noticeably more expensive. It becomes way less expensive even when you are able to abort the parsing process due to errors: Your business logic usually takes >90% of the resources in a process, so if you have an error % of 10%, you are already resource neutral. If you optimize the validation process so that the validations from the business process are performed upfront, your error rate might be much lower (like 1 in 20 to 1 in 100) to stay resource neutral.
For an example on an implementation assuming upfront data validation, look at GSON (https://code.google.com/p/google-gson/):
GSON works as follows: Every part of the JSON can be cast into an object. This object is typed or contains typed data:
Sample object (JAVA used as example language):
public class someInnerDataFromJSON {
String name;
String address;
int housenumber;
String buildingType;
// Getters and setters
public String getName() { return name; }
public void setName(String name) { this.name=name; }
//etc.
}
The data parsed by GSON is by using the model provided, already type checked.
This is the first point where your code can abort.
After this exit point assuming the data confirmed to the model, you can validate if the data is within certain limits. You can also write that into the model.
Assume for this buildingType is a list:
Single family house
Multi family house
Apartment
You can check data during parsing by creating a setter which checks the data, or you can check it after parsing in a first set of your business rule application. The benefit of first checking the data is that your later code will have less exception handling, so less and easier to understand code.
I would definitively go for validation before processing.
Let's say you receive some json data with 10 variables of which you expect:
the first 5 variables to be of type string
6 and 7 are supposed to be integers
8, 9 and 10 are supposed to be arrays
You can do a quick variable type validation before you start processing any of this data and return a validation error response if one of the ten fails.
foreach($data as $varName => $varValue){
$varType = gettype($varValue);
if(!$this->isTypeValid($varName, $varType)){
// return validation error
}
}
// continue processing
Think of the scenario where you are directly processing the data and then the 10th value turns out to be of invalid type. The processing of the previous 9 variables was a waste of resources since you end up returning some validation error response anyway. On top of that you have to rollback any changes already persisted to your storage.
I only use variable type in my example but I would suggest full validation (length, max/min values, etc) of all variables before processing any of them.
In general, the first option would be the way to go. The only reason why you might need to think of the second option is if you were dealing with JSON data which was tens of MBs large or more.
In other words, only if you are trying to stream JSON and process it on the fly, you will need to think about second option.
Assuming that you are dealing with few hundred KB at most per JSON, you can just go for option one.
Here are some steps you could follow:
Go for a JSON parser like GSON that would just convert your entire
JSON input into the corresponding Java domain model object. (If GSON
doesn't throw an exception, you can be sure that the JSON is
perfectly valid.)
Of course, the objects which were constructed using GSON in step 1
may not be in a functionally valid state. For example, functional
checks like mandatory fields and limit checks would have to be done.
For this, you could define a validateState method which repeatedly
validates the states of the object itself and its child objects.
Here is an example of a validateState method:
public void validateState(){
//Assume this validateState is part of Customer class.
if(age<12 || age>150)
throw new IllegalArgumentException("Age should be in the range 12 to 120");
if(age<18 && (guardianId==null || guardianId.trim().equals(""))
throw new IllegalArgumentException("Guardian id is mandatory for minors");
for(Account a:customer.getAccounts()){
a.validateState(); //Throws appropriate exceptions if any inconsistency in state
}
}
The answer depends entirely on your use case.
If you expect all calls to originate in trusted clients then the upfront schema validation should be implement so that it is activated only when you set a debug flag.
However, if your server delivers public api services then you should validate the calls upfront. This isn't just a performance issue - your server will likely be scrutinized for security vulnerabilities by your customers, hackers, rivals, etc.
If your server delivers private api services to non-trusted clients (e.g., in a closed network setup where it has to integrate with systems from 3rd party developers), then you should at least run upfront those checks that will save you from getting blamed for someone else's goofs.
It really depends on your requirements. But in general I'd always go for #1.
Few considerations:
For consistency I'd use method #1, for performance #2. However when using #2 you have to take into account that rolling back in case of non valid input may become complicated in the future, as the logic changes.
Json validation should not take that long. In python you can use ujson for parsing json strings which is a ultrafast C implementation of the json python module.
For validation, I use the jsonschema python module which makes json validation easy.
Another approach:
if you use jsonschema, you can validate the json request in steps. I'd perform an initial validation of the most common/important parts of the json structure, and validate the remaining parts along the business logic path. This would allow to write simpler json schemas and therefore more lightweight.
The final decision:
If (and only if) this decision is critical I'd implement both solutions, time-profile them in right and wrong input condition, and weight the results depending on the wrong input frequency. Therefore:
1c = average time spent with method 1 on correct input
1w = average time spent with method 1 on wrong input
2c = average time spent with method 2 on correct input
2w = average time spent with method 2 on wrong input
CR = correct input rate (or frequency)
WR = wrong input rate (or frequency)
if ( 1c * CR ) + ( 1w * WR) <= ( 2c * CR ) + ( 2w * WR):
chose method 1
else:
chose method 2

Is it OK to have multiple assertions in a unit test when testing complex behavior?

Here is my specific scenario.
I have a class QueryQueue that wraps the QueryTask class within the ArcGIS API for Flex. This enables me to easily queue up multiple query tasks for execution. Calling QueryQueue.execute() iterate through all the tasks in my queue and call their execute method.
When all the results have been received and processed QueryQueue will dispatch the completed event. The interface to my class is very simple.
public interface IQueryQueue
{
function get inProgress():Boolean;
function get count():int;
function get completed():ISignal;
function get canceled():ISignal;
function add(query:Query, url:String, token:Object = null):void;
function cancel():void;
function execute():void;
}
For the QueryQueue.execute method to be considered successful several things must occur.
task.execute must be called on each query task once and only once
inProgress = true while the results are pending
inProgress = false when the results have been processed
completed is dispatched when the results have been processed
canceled is never called
The processing done within the queue correctly processes and packages the query results
What I am struggling with is breaking these tests into readable, logical, and maintainable tests.
Logically I am testing one state, that is the successful execution state. This would suggest one unit test that would assert #1 through #6 above are true.
[Test] public mustReturnQueryQueueEventArgsWithResultsAndNoErrorsWhenAllQueriesAreSuccessful:void
However, the name of the test is not informative as it does not describe all the things that must be true in order to be considered a passing test.
Reading up online (including here and at programmers.stackexchange.com) there is a sizable camp that asserts that unit tests should only have one assertion (as a guideline). As a result when a test fails you know exactly what failed (i.e. inProgress not set to true, completed displayed multiple times, etc.) You wind up with potentially a lot more (but in theory simpler and clearer) tests like so:
[Test] public mustInvokeExecuteForEachQueryTaskWhenQueueIsNotEmpty():void
[Test] public mustBeInProgressWhenResultsArePending():void
[Test] public mustNotInProgressWhenResultsAreProcessedAndSent:void
[Test] public mustDispatchTheCompletedEventWhenAllResultsProcessed():void
[Test] public mustNeverDispatchTheCanceledEventWhenNotCanceled():void
[Test] public mustReturnQueryQueueEventArgsWithResultsAndNoErrorsWhenAllQueriesAreSuccessful:void
// ... and so on
This could wind up with a lot of repeated code in the tests, but that could be minimized with appropriate setup and teardown methods.
While this question is similar to other questions I am looking for an answer for this specific scenario as I think it is a good representation of a complex unit testing scenario exhibiting multiple states and behaviors that need to be verified. Many of the other questions have, unfortunately, no examples or the examples do not demonstrate complex state and behavior.
In my opinion, and there will probably be many, there are a couple of things here:
If you must test so many things for one method, then it could mean your code might be doing too much in one single method (Single Responsibility Principle)
If you disagree with the above, then the next thing I would say is that what you are describing is more of an integration/acceptance test. Which allows for multiple asserts, and you have no problems there. But, keep in mind that this might need to be relegated to a separate section of tests if you are doing automated tests (safe versus unsafe tests)
And/Or, yes, the preferred method is to test each piece separately as that is what a unit test is. The closest thing I can suggest, and this is about your tolerance for writing code just to have perfect tests...Is to check an object against an object (so you would do one assert that essentially tests this all in one). However, the argument against this is that, yes it passes the one assert per test test, but you still lose expressiveness.
Ultimately, your goal should be to strive towards the ideal (one assert per unit test) by focusing on the SOLID principles, but ultimately you do need to get things done or else there is no real point in writing software (my opinion at least :)).
Let's focus on the tests you have identified first. All except the last one (mustReturnQueryQueueEventArgs...) are good ones and I could immediatelly tell what's being tested there (and that's very good sign, indicating they're descriptive and most likely simple).
The only problem is your last test. Note that extensive use of words "and", "with", "or" in test name usually rings problems bell. It's not very clear what it's supposed to do. Return correct results comes to mind first, but one might argue it's vague term? This holds true, it is vague. However you'll often find out that this is indeed pretty common requirement, described in details by method/operation contract.
In your particular case, I'd simplify last test to verify whether correct results are returned and that would be all. You tested states, events and stuff that lead to results building already, so there is no need to that again.
Now, advices in links you provided are quite good ones actually, and generally, I suggest sticking to them (single assertion for one test). The question is, what single assertion really stands for? 1 line of code at the end of test? Let's consider this simple example then:
// a method which updates two fields of our custom entity, MyEntity
public void Update(MyEntity entity)
{
entity.Name = "some name";
entity.Value = "some value";
}
This method contract is to perform those 2 operations. By success, we understand entity to be correctly updated. If one of them for some reasons fails, method as a unit is considered to fail. You can see where this is going; you'll either have two assertions or write custom comparer purely for testing purposes.
Don't be tricked by single assertion; it's not about lines of code or number of asserts (however, in majority of tests you'll write this will indeed map 1:1), but about asserting single unit (in the example above, update is considered to be an unit). And unit might be in reality multiple things that don't make any sense at all without eachother.
And this is exactly what one of questions you linked quotes (by Roy Osherove):
My guideline is usually that you test one logical CONCEPT per test. you can have multiple asserts on the same object. they will usually be the same concept being tested.
It's all about concept/responsibility; not the number of asserts.
I am not familiar with flex, but I think I have good experience in unit testing, so you have to know that unit test is a philosophy, so for the first answer, yes you can make a multiple assert but if you test the same behavior, the main point always in unit testing is to be very maintainable and simple code, otherwise the unit test will need unit test to test it! So my advice to you is, if you are new in unit testing, don't use multiple assert, but if you have good experience with unit testing, you will know when you will need to use them

assert() vs enforce(): Which to choose?

I'm having a hard time choosing whether I should "enforce" a condition or "assert" a condition in D. (This is language-neutral, though.)
Theoretically, I know that you use assertions to find bugs, and you enforce other conditions in order to check for atypical conditions. E.g. you might say assert(count >= 0) for an argument to your method, because that indicates that there's a bug with the caller, and that you would say enforce(isNetworkConnected), because that's not a bug, it's just something that you're assuming that could very well not be true in a legitimate situation beyond your control.
Furthermore, assertions can be removed from code as an optimization, with no side effects, but enforcements cannot be removed because they must always execute their condition code. Hence if I'm implementing a lazy-filled container that fills itself on the first access to any of its methods, I say enforce(!empty()) instead of assert(!empty()), because the check for empty() must always occur, since it lazily executes code inside.
So I think I know that they're supposed to mean. But theory is easier than practice, and I'm having a hard time actually applying the concepts.
Consider the following:
I'm making a range (similar to an iterator) that iterates over two other ranges, and adds the results. (For functional programmers: I'm aware that I can use map!("a + b") instead, but I'm ignoring that for now, since it doesn't illustrate the question.) So I have code that looks like this in pseudocode:
void add(Range range1, Range range2)
{
Range result;
while (!range1.empty)
{
assert(!range2.empty); //Should this be an assertion or enforcement?
result += range1.front + range2.front;
range1.popFront();
range2.popFront();
}
}
Should that be an assertion or an enforcement? (Is it the caller's fault that the ranges don't empty at the same time? It might not have control of where the range came from -- it could've come from a user -- but then again, it still looks like a bug, doesn't it?)
Or here's another pseudocode example:
uint getFileSize(string path)
{
HANDLE hFile = CreateFile(path, ...);
assert(hFile != INVALID_HANDLE_VALUE); //Assertion or enforcement?
return GetFileSize(hFile); //and close the handle, obviously
}
...
Should this be an assertion or an enforcement? The path might come from a user -- so it might not be a bug -- but it's still a precondition of this method that the path should be valid. Do I assert or enforce?
Thanks!
I'm not sure it is entirely language-neutral. No language that I use has enforce(), and if I encountered one that did then I would want to use assert and enforce in the ways they were intended, which might be idiomatic to that language.
For instance assert in C or C++ stops the program when it fails, it doesn't throw an exception, so its usage may not be the same as what you're talking about. You don't use assert in C++ unless you think that either the caller has already made an error so grave that they can't be relied on to clean up (e.g. passing in a negative count), or else some other code elsewhere has made an error so grave that the program should be considered to be in an undefined state (e.g. your data structure appears corrupt). C++ does distinguish between runtime errors and logic errors, though, which may roughly correspond but I think are mostly about avoidable vs. unavoidable errors.
In the case of add you'd use a logic error if the author's intent is that a program which provides mismatched lists has bugs and needs fixing, or a runtime exception if it's just one of those things that might happen. For instance if your function were to handle arbitrary generators, that don't necessarily have a means of reporting their length short of destructively evaluating the whole sequence, you'd be more likely consider it an unavoidable error condition.
Calling it a logic error implies that it's the caller's responsibility to check the length before calling add, if they can't ensure it by the exercise of pure reason. So they would not be passing in a list from a user without explicitly checking the length first, and in all honesty should count themselves lucky they even got an exception rather than undefined behavior.
Calling it a runtime error expresses that it's "reasonable" (if abnormal) to pass in lists of different lengths, with the exception indicating that it happened on this occasion. Hence I think an enforcement rather than an assertion.
In the case of filesize: for the existence of a file, you should if possible treat that as a potentially recoverable failure (enforcement), not a bug (assertion). The reason is simply that there is no way for the caller to be certain that a file exists - there's always someone with more privileges who can come along and remove it, or unmount the entire fielsystem, in between a check for existence and a call to filesize. It's therefore not necessarily a logical flaw in the calling code when it doesn't exist (although the end-user might have shot themselves in the foot). Because of that fact it's likely there will be callers who can treat it as just one of those things that happens, an unavoidable error condition. Creating a file handle could also fail for out-of-memory, which is another unavoidable error on most systems, although not necessarily a recoverable one if for example over-committing is enabled.
Another example to consider is operator[] vs. at() for C++'s vector. at() throws out_of_range, a logic error, not because it's inconceivable that a caller might want to recover, or because you have to be some kind of numbskull to make the mistake of accessing an array out of range using at(), but because the error is entirely avoidable if the caller wants it to be - you can always check the size() before access if you have no other way of knowing whether your index is good or not. And so operator[] doesn't guarantee any checks at all, and in the name of efficiency an out of range access has undefined behavior.
assert should be considered a "run-time checked comment" indicating an assumption that the programmer makes at that moment. The assert is part of the function implementation. A failed assert should always be considered a bug at the point where the wrong assumption is made, so at the code location of the assert. To fix the bug, use a proper means to avoid the situation.
The proper means to avoid bad function inputs are contracts, so the example function should have a input contract that checks that range2 is at least as long as range1. The assertion inside the implementation could then still remain in place. Especially in longer more complex implementations, such an assert may inprove understandability.
An enforce is a lazy approach to throwing runtime exceptions. It is nice for quick-and-dirty code because it is better to have a check in there rather then silently ignoring the possibility of a bad condition. For production code, it should be replaced by a proper mechanism that throws a more meaningful exception.
I believe you have partly answered your question yourself. Assertions are bound to break the flow. If your assertion is wrong, you will not agree to continue with anything. If you enforce something you are making a decision to allow something to happen based on the situation. If you find that the conditions are not met, you can enforce that the entry to a particular section is denied.

Do you use tense when naming methods of boolean return type?

So, when you are writing a boolean method, do you use tense, like "has" or "was", in your return method naming, or do you solely use "is"?
The following is a Java method I recently wrote, very simply ..
boolean recovered = false;
public boolean wasRecovered()
{
return recovered;
}
In this case, recovered is a state that may or may not have already occurred at this point in the code, so grammatically "was" makes sense. But does it make the same sense in code, where the "is" naming convention is usually standard?
I prefer to use IsFoo(), regardless of tense, simply because it's a well-understood convention that non-native speakers will still generally understand. Non-native speakers of English are a regular consideration in today's global dev't industry.
I use the tense which is appropriate the meaning of the value. To do otherwise essentially creates code which reads one way and behaves another. Lets look at a real world example in the .Net Framework: Thread.IsAlive
This property is presented with the present tense. This has the implication the value refers to the present and makes code like the following read very well
if (thread.IsAlive ) {
// Code that depends on the thread being alive
...
The problem here is that the property does not represent the present state of the object it represents a past state. Once the value is calculated to be true, the thread in question can immediately exit and invalidate the value. Hence the value can only safely be used to identify the past state of the thread and a past tense property is more appropriate. Lets now revisit the sample which reads a bit differently
if ( thread.WasAlive ) {
// Code that depends on the thread being alive
...
They behave the same but one reads very poorly because it in fact represents poor code.
Here's a list of some other offenders
File.Exists
Directory.Exists
DriveInfo.IsReady
WeakReference.IsAlive
The isXxx prefix is a widespread naming convention, so it's generally the best choice.
For order-sensitive operations, wasXxx is appropriate. For example, in JDBC, retrieving the value of a database column might return zero when the field is actually NULL (unset); in this case, a follow-up call to wasNull determines which it is after the actual retrieval was performed.
For retrieving attribute settings, hasXxx may be more appropriate. It's a grammar preference, as in "the object's flag is set" versus "the object has an attribute".
Then there are capability tests canXxx. For example, calling canWrite to see if a file is writable. But names like these can probably be renamed to the isXxx form, such as isWritable.
I tend to, yes. For example in error checking:
$errors = false;
public function hasErrors()
{
return $this->errors;
}
I am not sure that you are thinking about this correctly. The reason one would use the Recovered property is because that is the state the object is in now, not because that was the state the object used to be in. There may have been some process in the past (The Recovery) that has now completed, but the fact that we are accessing this property now means that there is something about that completed process that altered current state, and that current state is important. To me "Recovered" captures the nature of that state. For this example (and most similar situations) I would use IsRecovered to name the predicate that indicates this condition. (This also matches normal English: "This is a recovered document.")
It is extremely rare that I would use anything other than present tense to name a predicate (IsDirty, HasCoupon) or boolean function (IsPrime(x)) in a program.
An exception would be to indicate state that has since been changed that might need to be reinstated (DocumentWindow.WasMaximizedAtLastExit).
I would usually use an infinitive for future tense (ToBeCopied rather than WillBeCopied), since the best laid plans of software are sometimes altered (or cancelled).
It depends on whether or not you care about the past or future state of the property in question.
To try to simplify the semantics, realize that there are a few scenarios that make the IsXXX form debatable and some very common scenarios where the IsXXX form is the only useful one.
Below is the 'truth table' for Thread.IsAlive() based on possible states of the thread over time. Forget about why a thread might flip flop states, we need to focus on the language used.
Scenarios of possible thread states over time:
Past Present Future
===== ======= =======
1. alive alive alive
2. alive alive dead
3. alive dead dead
4. dead dead dead
5. dead dead alive
6. dead alive alive
7. dead alive dead
8. alive dead alive
Note: I talk about the Future state below for consistency. Knowing whether a thread will die is very likely unknowable as a subset of The Halting Problem)
When we interrogate an object by calling a method, there is a common assumption "Is this thread alive, at the time I asked? For these cases, the answer in the "Present" column is all we care about and using the IsXXX form works fine.
Scenarios #1(always alive) and #4(always dead) are the simplest and most common. The answer to IsAlive() will not change between calls. The battle over language that comes up is due to the other 6 cases where the result of calling IsAlive() depends on when it is called.
Scenarios #2(will die) and #3(has died) transitions from alive to dead.
Scenarios #5(will start) and #6(has started) transitions from dead to alive.
For these four (2, 3, 5, 6) the answer to IsAlive() is not constant. The question becomes, do I care about the Present state, IsAlive(), or am I interested in the Past/Future state, WasAlive() and WillBeAlive()? Unless you can predict the future, the WillBeAlive() call becomes meaningless for all but the most specific designs.
When dealing with a thread pool, we might need to restart threads that are in the 'dead' state to service connect requests and it doesn't matter whether they were ever alive, just that they are currently dead. In this case we might actually want to use WasDead(). Of course we should try to guarantee we don't restart a thread that was just restarted but that is a design problem, not a semantic one. Assuming that no one else can restart the thread, it doesn't matter much whether we use IsAlive() == false or WasDead() == true.
Now for the last two scenarios. Scenario #7(was dead, is alive, will be dead) is practically the same as #6. Do you know when in the future it will die? In 10 seconds, 10 minutes, 10 hours? Are you going to wait before deciding what to do. No, you only care about the current (Present) state. We're talking about naming here, not multi-threaded design.
Scenario #8(was alive, is dead, will be alive), is practically the same as #3. If you are reusing threads, then they can cycle through the alive/dead states several times. Worrying about the difference between #3 and #8 goes back to the Halting Problem and so can be disregarded.
IsAlive() should work for all cases. IsAlive() == false works (for #5 and #6) instead of adding WasAlive().
I don't mind wasRecovered that much. Recovery is a past event that may or may not have happened - this tells you whether it did or not. But if you're using it because of some consequence of recovery, I'd prefer isCached, isValid, or some other description of what those consequences actually are. Just because you've recovered something doesn't inherently mean you haven't lost it again since.
Always beware that in English, the use of a past participle as an adjective is ambiguous between transitive and intransitive verbs (and perhaps between active and passive voice). isRecovered might mean that the object has been recovered by something else, or it might mean that the object has recovered. If your object represents a patient at a hospital, does "isRecovered" mean that the patient is fit and well, or that someone has fetched the patient back from the X-ray department? wasRecovered might therefore be better for the latter.
The conceit for method naming is that you are retrieving information about the object in question. For it to be named in the past tense, it would have to be information about a previous state of the object, rather than its current state.
The only reason I could ever think of for using past tense is if I was checking a cached result of something that previously occurred but is no longer the case. For a contrived example, perhaps retriveing the previous value after something like a swap() call. It could be useful in operations that are atomic by design. Not real likely in the wild though.
Since your question is specific to Java, the method name should start with "is" if your class is a JavaBean and the method is an accessor method for a property.
http://download.oracle.com/javase/tutorial/javabeans/properties/properties.html