Strategy for handling parameter validation in class library - language-agnostic

I have a rather big class library that contains a lot of code.
I am looking at how to optimize the performance of some of the code, and for some rather simple utility methods I've found that the parameter validation occupies a rather large portion of the runtime for some core methods.
Let me give a typical example:
A.MethodA1 runs a loop, iterating over a collection, calling B.MethodB1 for each element
B.MethodB1 processes the element and returns the result; it's a rather basic calculation, but since it is used in many places, it has been put into its own method instead of being copied and pasted where needed
A.MethodA1 calls C.MethodC1 with the results of B.MethodB1, and puts the result into a list that is returned at the end of the loop
In the case I've found now, B.MethodB1 does rudimentary parameter validation. Since the method calls other internal methods, I'd like to avoid having NullReferenceExceptions several layers deep into the code, and rather fail early, hence B.MethodB1 validates the parameters, like checking for null and some basic range checks on another parameter.
However, in this particular call scenario, it is impossible (due to other program logic) for these parameters to ever have the wrong values. If they had, from the program standpoint, B.MethodB1 would never be called at all for those values, A.MethodA1 would fail before the call to B.MethodB1.
So I was considering removing the parameter validation in B.MethodB1, since it occupies roughly 65% of the method runtime (and this is part of some heavily used code).
However, B.MethodB1 is a public method, and can thus be called from the program, in which case I want the parameter validation.
So how would you solve this dilemma?
1. Keep the parameter validation, and take the performance hit.
2. Remove the parameter validation, and have potential fail-late problems in the method.
3. Split the method into two: one internal that doesn't have parameter validation, called by the "safe" path, and one public that has the parameter validation plus a call to the internal version.
The latter one would give me the benefits of having no parameter validation, while still exposing a public entrypoint which does have parameter validation, but for some reason it doesn't sit right with me.
Opinions?

I would go with option 3. I tend to use assertions for private and internal methods and do all the validation in public methods.
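To make that concrete, here's a minimal C# sketch of the split. The names (MethodB1Unchecked, the doubling calculation) are invented for illustration: the public method validates fully, while the internal fast path assumes valid input and only asserts it in debug builds.

using System;
using System.Diagnostics;

public static class B
{
    // Public entry point: full validation for callers outside the assembly.
    public static int MethodB1(int[] data, int index)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (index < 0 || index >= data.Length)
            throw new ArgumentOutOfRangeException(nameof(index));
        return MethodB1Unchecked(data, index);
    }

    // Internal fast path for the already-validated "safe" route (A.MethodA1).
    // Debug.Assert compiles away in release builds, so the hot loop pays nothing.
    internal static int MethodB1Unchecked(int[] data, int index)
    {
        Debug.Assert(data != null && index >= 0 && index < data.Length);
        return data[index] * 2; // stand-in for the "rather basic calculation"
    }
}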
By the way, is the performance hit really that big?

That's an interesting question.
Hmmm, makes me think... "code contracts". It would seem technically possible to have certain code contracts proven fulfilled statically (at compile time). If that were the case, and you had such a compile-time validation option, you could state these contracts without ever having to validate the conditions at runtime.
It would require that the client code itself be validated against the code contracts.
And, of course it would inevitably be highly dependent on the type of conditions you'd want to write, and it would probably only be feasible to prove these contracts to a certain point (how far up the possible call graph would you go?). Beyond this point the validator might have to beg off, and insist that you place a runtime check (or maybe a validation warning suppression?).
All just idle speculation. Does make me wonder a bit more about the .NET 4.0 Code Contracts, though. I wonder if these have support for static analysis. Have you checked them out? I've been meaning to, but learning F# is having to take priority at the moment!
Update:
Having read up a little on it, it appears that .NET 4.0 does indeed have a 'static checker' as well as a binary rewriter (which takes care of altering the output binary so that pre- and post-condition checks end up in the appropriate locations).
What's not clear from my extremely quick read is whether you can opt out of the binary rewriting. What I'm thinking here is that what you'd really be looking for is to use the code contracts, have the metadata (or code) for the contracts maintained within the various assemblies, but use only the static checker for at least a selected subset of contracts, so that you in theory get proven safety without any runtime hit.
Here's a link to an article on the code contracts
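For reference, a precondition written with the .NET 4.0 Code Contracts library (System.Diagnostics.Contracts) looks roughly like this sketch; whether the checks cost anything at runtime depends on how the binary rewriter is configured for the build, and the method body here is invented for illustration:

using System.Diagnostics.Contracts;

public static class B
{
    public static int MethodB1(int[] data, int index)
    {
        // Preconditions, stated declaratively. The static checker tries to
        // prove them at call sites; the binary rewriter decides whether any
        // runtime check is actually emitted into the compiled assembly.
        Contract.Requires(data != null);
        Contract.Requires(index >= 0 && index < data.Length);

        // A postcondition on the result, provable from the body.
        Contract.Ensures(Contract.Result<int>() % 2 == 0);

        return data[index] * 2;
    }
}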

Related

Whose responsibility is it to check data validity?

I am confused as to whether it is the caller's or the callee's responsibility to check data validity.
Should the callee check that passed-in arguments are not null and meet whatever other requirements the method needs to execute normally and successfully, and catch any potential exceptions? Or is it the caller's responsibility to do this?
Both consumer-side (client) and provider-side (API) validation.
Clients should do it because it means a better experience. For example, why do a network round trip just to be told that you've got one bad text field?
Providers should do it because they should never trust clients (e.g. XSS and man-in-the-middle attacks). How do you know the request wasn't intercepted? Validate everything.
There are several levels of valid:
1. All required fields present, correct formats. This is what the client validates.
2. #1 plus valid relationships between fields (e.g. if X is present then Y is required).
3. #1 plus #2 plus business valid: meets all business rules for proper processing.
Only the provider side can do #2 and #3.
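As a rough provider-side sketch of those three levels (OrderRequest, OrderValidator, and the rules themselves are invented for illustration):

using System;
using System.Collections.Generic;

// Hypothetical request type; the fields are made up for this example.
public class OrderRequest
{
    public string CustomerId { get; set; }
    public DateTime? DeliveryDate { get; set; }
    public string DeliveryAddress { get; set; }
    public decimal Amount { get; set; }
}

public static class OrderValidator
{
    public static IList<string> Validate(OrderRequest request)
    {
        var errors = new List<string>();

        // Level 1: required fields present, correct formats (the client can do this too).
        if (string.IsNullOrWhiteSpace(request.CustomerId))
            errors.Add("CustomerId is required.");
        if (request.Amount <= 0)
            errors.Add("Amount must be positive.");

        // Level 2: relationships between fields (if X is present then Y is required).
        if (request.DeliveryDate != null && string.IsNullOrWhiteSpace(request.DeliveryAddress))
            errors.Add("DeliveryAddress is required when DeliveryDate is set.");

        // Level 3: business rules; only the provider can check these, because
        // they need data the client doesn't have (does the customer exist,
        // credit limits, and so on).

        return errors;
    }
}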
For an API, the callee should always do proper validation and throw a descriptive exception for invalid data.
For any client with I/O overhead, the client should do basic validation as well...
Validation: Caller vs. Called
The TLDR version is both.
The long version involves who, why, when, how, and what.
Both
Both should be ready to answer the question "can this data be operated on reliably?" Do we know enough about this data to do something meaningful with it? Many will suggest that the reliability of the data should never be trusted, but that only leads to a chicken-and-egg problem. Chasing it endlessly from both ends will not provide meaningful value, but to some degree it is essential.
Both must validate the shape of the data to ensure base usability. If either one does not recognize or understand the shape of the data, there is no way to know how to handle it further with any reliability. Depending on the environment, the data may need to be a particular 'type', which is often an easy way to validate shape. We often consider types that present evidence of common lineage back to a particular ancestor and retain the crucial traits to possess the right shape. Other characteristics might be important if the data is anything other than an in-memory structure, for instance if it is a stream or some other resource external to the running context.
Many languages include data shape checking as a built-in language feature through type or interface checking. However, when favoring composition over inheritance, providing a good mechanism to verify trait existence is incumbent on the implementer. One strategy to achieve this is through dynamic programming, or particularly via type introspection, inference, or reflection.
Called
The called must validate the domain (the set of inputs) of the given context in which it will operate. The design of the called always suggests it can handle only so many cases of input. Usually these values are broken up into certain subclasses or categories of input. We verify the domain in the called because the called is intimate with the localized constraints. It knows better than anyone else what is good input, and what is not.
Normal values: These values of the domain map to a range. For every foo there is one and only one bar.
Out of range/out of scope values: These values are part of the general domain, but will not map to a range in the context of the called. No defined behavior exists for these values, and thus no valid output is possible. Frequently, out-of-range checking entails range, limit, or allowed-character checks (or digits, or composite values). A cardinality check (multiplicity) and, subsequently, a presence check (null or empty) are special forms of range checking.
Values that lead to illogical or undefined behavior: These values are special values, or edge cases, that are otherwise normal, but because of the algorithm design and known environment constraints would produce unexpected results. For instance, a function that operates on numbers should guard against division by zero, accumulators that would overflow, or unintended loss of precision. Sometimes the operating environment or compiler can warn that these situations may happen, but relying on the runtime or compiler is not good practice, as it may not always be capable of deducing what is possible and what is not. This stage should largely be verification, through secondary validation, that the caller provided good, usable, meaningful input.
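A small sketch of such called-side domain checks, using an invented ScaledAverage method: a presence (cardinality) check, a range check, and a guard for the otherwise-undefined division case.

using System;
using System.Collections.Generic;

public static class Stats
{
    public static double ScaledAverage(IList<double> values, double scale)
    {
        // Presence/cardinality check: null or empty input has no defined output.
        if (values == null) throw new ArgumentNullException(nameof(values));
        if (values.Count == 0)
            throw new ArgumentException("At least one value is required.", nameof(values));

        // Range check: behavior is only defined for finite, positive scales.
        if (double.IsNaN(scale) || double.IsInfinity(scale) || scale <= 0)
            throw new ArgumentOutOfRangeException(nameof(scale), "Scale must be finite and positive.");

        // The division below is now safe: Count is known to be non-zero.
        double sum = 0;
        foreach (var v in values) sum += v;
        return (sum / values.Count) * scale;
    }
}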
Caller
The caller is special. The caller has two situations in which it should validate data.
The first situation is on assignment or explicit state changes, where a change happens to at least one element of the data by some explicit mechanism, internally, or externally by something in its container. This is somewhat out of scope of the question, but something to keep in mind. The important thing is to consider the context when a state change occurs, and one or more elements that describe the state are affected.
Self/Referential Integrity: Consider using an internal mechanism to validate state if other actors can reference the data. When the data has no consistency checks, it is only safe to assume it is in an indeterminate state. That is not intermediate, but indeterminate. Know thyself. When you do not use a mechanism to validate internal consistency on state change, then the data is not reliable and that leads to problems in the second situation. Make sure the data for the caller is in a known, good state; alternatively, in a known transition/recovery state. Do not make the call until you are ready.
The second situation is when the data calls a function. A caller can expect only so much from the called. The caller must know and respect that the called recognizes only a certain domain. The caller also must be self-interested, as it may continue and persist long after the called completes. This means the caller must help the called be not only successful, but also appropriate for the task: bad data in produces bad data out. By the same token, even data that is good in and out with respect to the called may not be appropriate for the next thing in terms of the caller. The good data out may actually be bad data in for the caller; the output of the called may invalidate the caller's current state.
Ok, so enough commentary, what should a caller validate specifically?
Logical and normal: given the data, is the called a good strategy that fits the purpose and intent? If we know it will fail with certain values, there is usually no point in performing the call without the appropriate guards. If we know the called cannot handle zero, do not ask it to; it will never succeed. Which is more expensive and harder to manage: a [possibly redundant (do we know?)] guard clause, or an exception [that occurs late in a possibly long-running process that depends on externally available resources]? Implementations can change, and change suddenly. Providing the protection in the caller reduces the impact and risk of changing that implementation.
Return values: check for unsuccessful completion. This is something that a caller may or may not need to do. Before using or relying upon the returned data, check for alternative outcomes, if the system design incorporates success and failure values that may accompany the actual return value.
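A minimal caller-side sketch of both points; here the bool returned by double.TryParse serves as both the guard against out-of-domain input and the "unsuccessful completion" signal (ScaleAll and its policy of skipping bad entries are invented for illustration):

using System;
using System.Collections.Generic;

public static class CallerExample
{
    public static List<double> ScaleAll(IEnumerable<string> rawValues, double scale)
    {
        var results = new List<double>();
        foreach (var raw in rawValues)
        {
            // Guard before relying on the result: malformed text is outside
            // the domain of numeric work, so screen it here rather than let a
            // deeper call fail late. TryParse's return value is the outcome check.
            if (!double.TryParse(raw, out var value))
                continue; // or collect/log the bad entry, depending on requirements

            results.Add(value * scale);
        }
        return results;
    }
}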
Footnote: In case it wasn't clear. Null is a domain issue. It may or may not be logical and normal, so it depends. If null is a natural input to a function, and the function could be reasonably expected to produce meaningful output, then leave it to the caller to use it. If the domain of the caller is such that null is not logical, then guard against it in both places.
An important question: if you are passing null to the called, and the called is producing something, isn't that a hidden creational pattern, creating something from nothing?
It's all about "contract". That's a callee that decides which parameters are fine or not.
You may put in documentation that a "null" parameter is invalid and then throwing NullPointerException or InvalidArgumentException is fine.
If returning a result for null parameter make sense - state it in the documentation. Ususally such situation is a bad design - create an overriden method with fewer parameters instead of accepting null.
Just remember to throw descriptive exceptions. As a rule of thumb:
If the caller passed wrong arguments, different from what the documentation describes (e.g. null, id < 0, etc.), throw an unchecked exception (NullPointerException or IllegalArgumentException).
If the caller passed correct arguments but there may be an expected business case that makes it impossible to process the call, you may want to throw a checked, descriptive exception. For example, for getPermissionsForUser(Integer userId) the caller passes a userId not knowing whether such a user exists, but it's a non-null Integer. Your method may return a list of permissions or throw a UserNotFoundException, and that may be a checked exception.
If the parameters are correct according to the documentation but they cause an internal processing error, you may throw an unchecked exception. This usually means that your method is not well tested ;-)
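A C# transliteration of this rule of thumb might look like the sketch below. C# has no checked exceptions, so the "expected business case" becomes a documented custom exception type; PermissionService, its backing dictionary, and UserNotFoundException are all invented for illustration.

using System;
using System.Collections.Generic;

public class UserNotFoundException : Exception
{
    public UserNotFoundException(int userId)
        : base("No user with id " + userId + " exists.") { }
}

public class User
{
    public int Id { get; set; }
    public IList<string> Permissions { get; set; }
}

public class PermissionService
{
    private readonly Dictionary<int, User> users = new Dictionary<int, User>();

    public IList<string> GetPermissionsForUser(int userId)
    {
        // Rule 1: argument outside the documented domain -> argument exception.
        if (userId < 0)
            throw new ArgumentOutOfRangeException(nameof(userId), "Id must be non-negative.");

        // Rule 2: correct argument, but an expected business case blocks the call.
        User user;
        if (!users.TryGetValue(userId, out user))
            throw new UserNotFoundException(userId);

        // Rule 3: anything that fails beyond this point is an internal error (a bug).
        return user.Permissions;
    }
}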
Depends on whether you program nominally, defensively, or totally.
If you program defensively (my personal favourite for most Java methods), you validate input in the method. You throw an exception (or fail in another way) when validation fails.
If you program nominally, you don't validate input (but expect the client to make sure the input is valid). This approach is useful when validation would adversely impact performance because the validation itself takes a lot of time (like a time-consuming search).
If you program totally (my personal favourite for most Objective-C methods), you validate input in the method, but you change invalid input into valid input (like by snapping values to the nearest valid value).
In most cases you would program defensively (fail-fast) or totally (fail-safe). Nominal programming is risky IMO and should be avoided when expecting input from an external source.
Of course, don't forget to document everything (especially when programming nominally).
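For illustration, here is one operation written in all three styles (an invented percentage conversion; the choice of clamping as the "total" repair is an assumption):

using System;

public static class Percent
{
    // Nominal: no validation; the caller guarantees 0 <= value <= 100.
    public static double NominalToFraction(double value) => value / 100.0;

    // Defensive (fail-fast): invalid input is rejected outright.
    public static double DefensiveToFraction(double value)
    {
        if (value < 0 || value > 100)
            throw new ArgumentOutOfRangeException(nameof(value), "Expected 0-100.");
        return value / 100.0;
    }

    // Total (fail-safe): invalid input is snapped to the nearest valid value.
    public static double TotalToFraction(double value) =>
        Math.Min(100.0, Math.Max(0.0, value)) / 100.0;
}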
Well... it depends.
If you can be sure how to handle invalid data inside your callee then do it there.
If you are not sure (e.g. because your method is quite general and used in a few different places and ways) then let the caller decide.
For example, imagine a DAO method that has to retrieve a certain entity, and you don't find it. Can you decide whether to throw an exception, maybe roll back a transaction, or just consider it okay?
In cases like this it is definitely up to the caller to decide how to handle it.
Both. This is a matter of good software development on both sides and independent of environment (C/S, web, internal API) and language.
The callee should be validating all parameters against the well documented parameter list (you did document it, right?). Depending on the environment and architecture, good error messages or exceptions should be implemented to give clear indication of what is wrong with the parameters.
The caller should be ensuring that only appropriate parameter values are passed in the api call. Any invalid values should be caught as soon as possible and somehow reflected to the user.
As often occurs in life, neither side should just assume that the other guy will do the right thing and ignore the potential problem.
I'm going to take a different perspective on the question. Working inside a contained application, both caller and callee are in the same code. Then any validation that is required by the contract of the callee should be done by the callee.
So if you've written a function and your contract says "does not accept null values", you should check that null values have not been passed and raise an error. This ensures that your code is correct, and if someone else's code is doing something it shouldn't, they'll know about it sooner.
Furthermore, if you assume that other code will call your method correctly, and they don't, it will make tracking the source of potential bugs more difficult.
This is essential for "Fail Early, Fail Often" where the idea is to raise an error condition as soon as a problem is detected.
It is the callee's responsibility to validate data, because only the callee knows what is valid. It is also good security practice.
It needs to happen on both ends: the client side and the server side (caller and callee).
Client:
This is the most effective place to catch problems early.
Client validation saves a request to the server.
It reduces bandwidth traffic.
It avoids the time cost of waiting on a delayed response from the server.
Server:
Never trust UI data (because of attackers).
Backend code is mostly reused, so we don't know whether the data has already been checked for null, etc., so we need to validate in both callee and caller methods.
Overall:
1. If data comes from the UI, it's always better to validate in the UI layer and double-check in the server layer.
2. If data is transferred within the server layer itself, validate in the callee and, as a double check, on the caller side as well.
Thanks
In my humble opinion, and in a few more words explaining why, it is the callee's responsibility most of the time, but that doesn't mean the caller is always scot-free.
The reason why is that the callee is in the best position to know what it needs to do its work, because it's the one doing the work. It's thus good encapsulation for the object or method to be self-validating. If the callee can do no work on a null pointer, that's an invalid argument and should be thrown back out as such. If there are arguments out of range, that's easy to guard against as well.
However, "ignorance of the law is no defense". It's not a good pattern for a caller to simply shove everything it's given into its helper function and let the callee sort it out. The caller adds no value when it does this, for one thing, especially if what the caller shoves into the callee is data it was itself given by its own caller, meaning this layer of the call stack is likely redundant. It also makes both the caller's and callee's code very complex, as both sides "defend" against unwanted behavior by the other (the callee trying to salvage something workable and testing everything, and the caller wrapping the call in try-catch statements that attempt to correct the call).
The caller should therefore validate what it can know about the requirements for passed data. This is especially true when there is a time overhead inherent in making the call, such as when invoking a service proxy. If you have to wait a significant portion of a second to find out your parameters are wrong, when it would take a few ticks to do the same client-side, the advantage is obvious. The callee's guard clauses are exactly that; the last line of defense and graceful failure before something ugly gets thrown out of the actual work routine.
There should be something between caller and callee that is called a contract. The callee ensures that it does the right thing if the input data is within the specified values. It should still check that the incoming data actually meets those specifications; in Java you could throw an IllegalArgumentException.
The caller should also work within the contract's specifications. Whether it should check the data it hands over depends on the case. Ideally you should program the caller in a way that makes checking unnecessary, because you are sure of the validity of your data. If it is e.g. user input, you cannot be sure that it is valid. In that case you should check it. If you don't check it, you at least have to handle the exceptions and react accordingly.
The callee has the responsibility of checking that the data it receives is valid. Failure to perform this task will almost certainly result in unreliable software and exposes you to potential security issues.
Having said that, if you have control of the client (caller) code then you should also perform at least some validation there as well, since it will result in a better overall experience.
As a general rule try to catch problems with data as early as possible, it results in far less trouble further down the line.

Is an assert in a private function redundant if the check has already been made by the calling public function?

Effective java states a good practice of assertions in private methods.
"For an unexported method, you as the package author control the circumstances under which the method is called, so you can and should ensure that only valid parameter values are ever passed in. Therefore, nonpublic methods should generally check their parameters using assertions, as shown below:
For example:
// Private helper function for a recursive sort
private static void sort(long a[]) {
    assert a != null;
    // Do the computation
}
My question is: would asserts be required even if the public function calling the sort has a null pointer check?
Example:
public void computeTwoNumbersThatSumToInputValue(long a[], int x) {
    if (a == null) {
        throw new NullPointerException();
    }
    sort(a);
    // code to do the required.
}
In other words, are asserts in the private function redundant, or mandatory, in this case?
Thanks,
It's redundant if you're sure that you've got the assertion in all the calling code. In some cases, that's very obvious - in other cases it can be less so. If you're calling sort from 20 places in the class, are you sure you've checked it in every case?
It's a matter of taste and balance, with no "one size fits all" answer. The balance is in terms of code clarity (both ways!), performance (in extreme cases) and of course safety. It depends on the exact context, and I wouldn't personally like to even guarantee that I'm entirely consistent. (In other words, "level of caffeine at the time of coding" may turn out to be an influence too.)
Note that your assert is only going to execute when assertions are turned on anyway - I personally prefer to validate parameters consistently however you're running the code. I generally use the Preconditions class from Guava to make preconditions unobtrusive.
Assertions will make the helper function sort more robust to use.
Checking parameters before passing them to any method is a good way to have more control over exceptions occurring unintentionally at runtime.
My suggestion would be to use both approaches in your code, as there is no guarantee that all callers of sort will do such checks. If the assertions in helper methods are algorithmically expensive or seem redundant, they can be disabled (especially for production use) via -disableassertions or -da on the command line.
You could do that. I will quote from the Oracle docs.
An assertion is a statement in the Java programming language that enables you to test your assumptions about your program. For example, if you write a method that calculates the speed of a particle, you might assert that the calculated speed is less than the speed of light.
I do not personally use assertions, but from what I gathered reading the Oracle docs, they enable you to test your assumptions about what you expect something to do. Try/catch blocks are more for failing gracefully in the face of failures that are bound to happen (like networking or hardware problems). Basically, in a perfect world your code would always run successfully because there's nothing wrong with it code-wise. But this isn't a perfect world. Also note:
Experience has shown that writing assertions while programming is one of the quickest and most effective ways to detect and correct bugs. As an added benefit, assertions serve to document the inner workings of your program, enhancing maintainability.
I would say it's a matter of preference. To answer your question, I would mainly use them as the docs say: to test the assumptions you have about your code. As the second quote mentions, they have the added benefit of telling other developers (or future you) what you assume to receive as parameters. As a personal preference, I leave control flow to try/catch blocks, as that is what they were designed for.
*But keep in mind that assertions could be turned off.

Understanding complex post-conditions in DbC

I have been reading over design-by-contract posts and examples, and there is something that I cannot seem to wrap my head around. In all of the examples I have seen, DbC is used on a trivial class testing its own state in the post-conditions (e.g. lots of Bank Accounts).
It seems to me that most of the time when you call a method of a class, it does much more work delegating method calls to its external dependencies. I understand how to check for this in a Unit-Test with specific scenarios using dependency inversion and mock objects that focus on the external behavior of the method, but how does this work with DbC and post-conditions?
My second question has to deal with understanding complex post-conditions. It seems to me that, to write out a post-condition for many functions, you basically have to re-write the body of the function for your post-condition to know what the new state is going to be. What is the point of that?
I really do like the notion of DbC and I think that it has great promise, particularly if I can figure out how to reproduce some failure state once I find a violated contract. Over the past couple of hours I have been reading some neat stuff w.r.t. automatic test generation in Eiffel. I am currently trying to improve my processes in C++ development, but I am open to learning something new if I can figure out how to not lose all of the ground I have made in my current projects. Thanks.
but how does this work with DbC and post-conditions?
Every function is basically one of these:
A sequence of statements
A conditional statement
A loop
The idea is that you should check any postconditions about the results of the function that go beyond the union of the postconditions of all the functions called.
you basically have to re-write the body of the function for your post-condition to know what the new state is going to be
Think about it the other way round. What made you write the function in the first place? What were you pursuing? Can that be expressed in a postcondition which is more simple than the function body itself? A postcondition will typically use queries (what in C++ are const functions), while the body usually combines commands and queries (methods that modify the object and methods which only get information from it).
In some cases, yes, you will find out that you can really add little value with postconditions. In these cases, writing a bunch of tests will typically be enough.
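As a sketch of a postcondition that is simpler than the body, consider sorting: the body can be any algorithm, while the (partial) postcondition "output is ascending" is a one-line query. This sketch uses a plain Debug.Assert rather than a contracts framework, and deliberately checks only the ordering half of the full contract (the permutation property is ignored):

using System;
using System.Diagnostics;

public static class Sorting
{
    public static void Sort(int[] a)
    {
        // The body: any correct sorting algorithm (here, the library's).
        Array.Sort(a);

        // The postcondition: much simpler to state than the body itself.
        Debug.Assert(IsSorted(a), "Postcondition violated: output not ascending.");
    }

    // A query (side-effect free) used only to express the contract.
    private static bool IsSorted(int[] a)
    {
        for (int i = 1; i < a.Length; i++)
            if (a[i - 1] > a[i]) return false;
        return true;
    }
}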
See also:
Bertrand Meyer, Contract Driven Development
Related questions 1, 2
Delegation at the contract level
most of the time when you call a method of a class, it does much more work delegating method calls to its external dependencies
As for this first question: the implementation of a function/method may call many other function/methods, but if the designer of the code had a clear mind, this does not imply that the specification of the caller is the concatenation of the specifications of the callees. For a method that calls many others, the size of the specification can remain contained if the method accomplishes a precise and well-defined task. Which it should if the whole system was well designed.
You are clearly asking your question from the point of view of run-time assertion checking. In this context, the above would perhaps be expressed as "you don't need to re-check in the post-condition of the caller that all the callees have respected their respective contracts. These checks will already be made on each call. In the post-condition of the caller, only check the functionally visible result of the caller."
Understanding complex post-conditions
You may find this "ACSL by example" document interesting (although probably different from what you're used to). It contains many examples of formal contracts for C functions. The language of the contracts is intended for static verification instead of run-time checking, with all the advantages and the drawbacks that it entails. They are a little more sophisticated than the "Bank Accounts" that you mention — these functions implement real algorithms, although simple ones. The document keeps the contracts short and readable by introducing well-thought-out auxiliary predicates (which would be called queries in Eiffel, as Daniel points out in his answer).

Doesn't Passing in Parameters that Should Be Known Implicitly Violate Encapsulation?

I often hear around here from test-driven development people that having a function get large amounts of information implicitly is a bad thing. I can see where this would be bad from a testing perspective, but isn't it sometimes necessary from an encapsulation perspective? The following question comes to mind:
Is using Random and OrderBy a good shuffle algorithm?
Basically, someone wanted to create a function in C# to randomly shuffle an array. Several people told him that the random number generator should be passed in as a parameter. This seems like an egregious violation of encapsulation to me, even if it does make testing easier. Isn't the fact that an array shuffling algorithm requires any state at all other than the array it's shuffling an implementation detail that the caller should not have to care about? Wouldn't the correct place to get this information be implicitly, possibly from a thread-local singleton?
I don't think it breaks encapsulation. The only state in the array is the data itself - and "a source of randomness" is essentially a service. Why should an array naturally have an associated source of randomness? Why should that have to be a singleton? What about different situations which have different requirements - e.g. speed vs cryptographically secure randomness? There's a reason why java.util.Random has a SecureRandom subclass :) Perhaps it doesn't matter whether the shuffle's results are predictable with a lot of effort and observation - or perhaps it does. That will depend on the context, and that's information that the shuffle algorithm shouldn't care about.
Once you start thinking of it as a service, it makes sense that it's passed in as a dependency.
Yes, you could get it from a thread-local singleton (and indeed I'm going to blog about exactly that in the next few days) but I would generally code it so that the caller gets to make that decision.
One benefit of the "randomness as a service" concept is that it makes for repeatability - if you've got a test which fails, you can pass in a Random with a specific seed and know you'll always get the same results, which makes debugging easier.
Of course, there's always the option of making the Random optional - use a thread-local singleton as a default if the caller doesn't provide their own.
Yes, that does break encapsulation. As with most software design decisions, this is a trade-off between two opposing forces. If you encapsulate the RNG then you make it difficult to change for a unit test. If you make it a parameter then you make it easy for a user to change the RNG (and potentially get it wrong).
My personal preference is to make it easy to test, then provide a default implementation (a default constructor that creates its own RNG, in this particular case) and good documentation for the end user. Adding a method with the signature
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
that creates a Random using the current system time as its seed would take care of most normal use cases of this method. The original method
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
could be used for testing (pass in a Random object with a known seed) and also in those rare cases where a user decides they need a cryptographically secure RNG. The one-parameter implementation should call this method.
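A sketch of how the two overloads might fit together, using a standard Fisher-Yates shuffle (the eager buffering and the time-seeded default are assumptions, not the poster's exact code):

using System;
using System.Collections.Generic;
using System.Linq;

public static class ShuffleExtensions
{
    // Convenience overload: time-seeded RNG for callers who don't care.
    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
    {
        return source.Shuffle(new Random());
    }

    // Testable overload: the caller supplies (and can seed) the RNG.
    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (rng == null) throw new ArgumentNullException(nameof(rng));

        // Buffer the sequence, then run Fisher-Yates over the buffer.
        var buffer = source.ToList();
        for (int i = buffer.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            T tmp = buffer[i];
            buffer[i] = buffer[j];
            buffer[j] = tmp;
        }
        return buffer;
    }
}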
I don't think this violates encapsulation.
Your Example
I would say that being able to provide an RNG is a feature of the class. I would obviously provide a method that doesn't require it, but I can see times where it may be useful to be able to duplicate the randomization.
What if the array shuffler were part of a game that used the RNG for level generation? If a user wanted to save the level and play it again later, it might be more efficient to store the RNG seed.
General Case
Simple classes that have a single task like this typically don't need to worry about divulging their inner workings. What they encapsulate is the logic of the task, not the elements required by that logic.

How to design a class that has only one heavy-duty work method, plus other methods that return data?

I want to design a class that will parse a string into tokens that are meaningful to my application.
How do I design it?
1. Provide a ctor that accepts a string, provide a Parse method, and provide methods (let's call them "minor") that return individual tokens, the count of tokens, etc. OR
2. Provide a ctor that accepts nothing, provide a Parse method that accepts a string, and minor methods as above. OR
3. Provide a ctor that accepts a string and provide only minor methods but no Parse method. The parsing is done by the ctor.
1 and 2 have the disadvantage that the user may call minor methods without calling the Parse method. I'll have to check in every minor method that the Parse method was called.
The problem I see in 3 is that the parse method may potentially do a lot of things. It just doesn't seem right to put it in the ctor.
2 is convenient in that the user may parse any number of strings without instantiating the class again and again.
What's a good approach? What are some of the considerations?
(The language is C#, if someone cares.)
Thanks
I would have a separate class with a Parse method that takes a string and converts it into a separate new object with a property for each value from the string.
ValueObject values = parsingClass.Parse(theString);
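A sketch of that shape (the names are placeholders, not a prescribed API); the point is that parsing is stateless and produces a complete result object, so there is no partially initialized parser to guard against:

using System;
using System.Collections.Generic;

public class ParseResult
{
    private readonly IList<string> tokens;

    public ParseResult(IList<string> tokens) { this.tokens = tokens; }

    public int TokenCount { get { return tokens.Count; } }
    public string TokenAt(int index) { return tokens[index]; }
}

public class TokenParser
{
    // Stateless: there is no "has Parse been called yet?" flag to check.
    public ParseResult Parse(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        return new ParseResult(
            input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
    }
}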
I think this is a really good question...
In general, I'd go with something that resembles option 3 above. Basically, think about your class and what it does: does it have any effective data other than the data to parse and the parsed tokens? If not, then without those things you don't really have an instance of your class; you have an incomplete instance, which is something you'd like to avoid.
One of the considerations that you point out is that parsing the tokens may be a relatively computationally complicated process; it may take a while. I agree that you may not want to take that hit in the constructor; in that case, it may make sense to use a Parse() method. The question then becomes whether there are any sensible operations that can be done on your class before Parse() completes. If not, then you're back to the original point: before Parse() is complete, you're effectively in an "incomplete instance" state of your class; that is, it's effectively useless.

Of course, this all changes if you're willing and able to use some multithreading in your application. If you're willing to offload the computationally complicated operations onto another thread, and maintain some sort of synchronization on your class methods and accessors until you're done, then the whole Parse() approach makes more sense, as you can spawn that work in a new thread entirely. You still run into issues of attempting to use your class before it has completely parsed everything, though.
An even broader question in this design, though, is the larger scope in which this code will be used. What is this code going to be used for? And by that I mean not just now, with the intended use, but is there a possibility that this code may need to grow or change as your application does? In terms of stability of implementation, can you expect this to be completely stable, or is it likely that something about the set of data you'll want to parse, the size of the data to parse, or the tokens into which you will parse will change in the future? If the implementation has a possibility of changing, consider all the ways in which it may change; in my experience, those considerations can strongly lead to one or another implementation. And considering those things is not trivial; not by a long shot.
Lest you think this is just nitpicking, I would say that, at a conservative estimate, about 10 to 15 percent of the classes I've written have needed some level of refactoring even before the project was complete; rarely has a design that I've worked on survived implementation to come out the other side looking the way it did before. So considering the possible permutations of the implementation becomes very useful for determining what your implementation should be. If, say, your implementation will never need to vary the size of the string to tokenize, you can make an assumption about the computational complexity that may lead you one way or another on the overall design.
If the sole purpose of the class is to parse the input string into a group of properties, then I don't see any real downside in option 3. The parse operation may be expensive, but you have to do it at some point if you're going to use it.
You mention that option 2 is convenient because you can parse new values without reinstantiating the object, but if the parse operation is that expensive, I don't think that makes much difference. Compare the following code:
// Using option 3
ParsingClass myClass = new ParsingClass(inputString);
// Parse a new string.
myClass = new ParsingClass(anotherInputString);
// Using option 2
ParsingClass myClass = new ParsingClass();
myClass.Parse(inputString);
// Parse a new string.
myClass.Parse(anotherInputString);
There's not much difference in use, but with Option 2, you have to have all your minor methods and properties check whether parsing has occurred before they can proceed. (Option 1 requires you to do everything that option 2 does internally, but also allows you to write Option 3-style code when using it.)
Alternatively, you could make the constructor private and the Parse method static, having the Parse method return an instance of the object.
// Option 4
ParsingClass myClass = ParsingClass.Parse(inputString);
// Parse a new string.
myClass = ParsingClass.Parse(anotherInputString);
Options 1 and 2 provide more flexibility, but require more code to implement. Options 3 and 4 are less flexible, but there's also less code to write. Basically, there is no one right answer to the question. It's really a matter of what fits with your existing code best.
Two important considerations:
1) Can the parsing fail?
If so, and if you put it in the constructor, then it has to throw an exception. A Parse method, on the other hand, could return a value indicating success. So check how your colleagues feel about throwing exceptions in situations which aren't show-stopping: the default is to assume they won't like it.
2) The constructor must get your object into a valid state.
If you don't mind "hasn't parsed anything yet" being a valid state of your objects, then the parse method is probably the way to go, and call the class SomethingParser.
If you don't want that, then parse in the constructor (or factory, as Garry suggests), and call the class ParsedSomething.
The difference is probably whether you are planning to pass these things as parameters into other methods. If so, then having a "not ready yet" state is a pain, because you either have to check for it in every callee and handle it gracefully, or else you have to write documentation like "the parameter must already have parsed a string". And then most likely check in every callee with an assert anyway.
You might be able to work it so that the initial state is the same as the state after parsing an empty string (or some other base value), thus avoiding the "not ready yet" problem.
Anyway, if these things are likely to be parameters, personally I'd say that they have to be "ready to go" as soon as they're constructed. If they're just going to be used locally, then you might give users a bit more flexibility if they can create them without doing the heavy lifting. The cost is requiring two lines of code instead of one, which makes your class slightly harder to use.
You could consider giving the thing two constructors and a Parse method: the string constructor is equivalent to calling the no-arg constructor, then calling Parse.
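A sketch of that last suggestion (class and member names are illustrative); note it also uses the trick mentioned above of making the initial state identical to having parsed an empty string:

using System;
using System.Collections.Generic;

public class StringParser
{
    private List<string> tokens = new List<string>();

    // No-arg ctor: starts in the same state as having parsed an empty string,
    // so there is no separate "not ready yet" state to defend against.
    public StringParser() { }

    // String ctor: equivalent to the no-arg ctor followed by Parse.
    public StringParser(string input) : this()
    {
        Parse(input);
    }

    public void Parse(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        tokens = new List<string>(
            input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
    }

    public int TokenCount { get { return tokens.Count; } }
    public string TokenAt(int index) { return tokens[index]; }
}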