Is there a way to differentiate validation and evaluation in AllenNLP? - allennlp

Sometimes we want slightly different behavior during validation (i.e., validation on the dev set during training) and final evaluation. I am not sure whether there is a simple way to tell the model whether it is being called by the train command or the evaluate command. All I can do now is use self.training to tell training apart from the rest, but both validation and evaluation have self.training == False, so I cannot distinguish between them. If it were my own PyTorch framework, this would be extremely easy, since I could write my own train and evaluate methods, but in AllenNLP both are part of the framework code, which I don't really want to modify. Is there an easy way to do it?

You could try overriding your Model's from_archive() method to set a flag, something like self._training = False.
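A minimal sketch of that idea follows. It deliberately does not import AllenNLP: the Model base class here is a hypothetical stand-in, and the flag name final_evaluation and the mode() helper are assumptions made for illustration. Whether the evaluate command in your AllenNLP version actually loads the model through from_archive() is something to verify before relying on this.

```python
class Model:
    """Stand-in for a framework base class (hypothetical, not AllenNLP's real Model)."""

    def __init__(self):
        self.training = True

    def eval(self):
        self.training = False
        return self

    @classmethod
    def from_archive(cls, archive_path):
        # A real framework would restore weights from archive_path here.
        return cls().eval()


class MyModel(Model):
    def __init__(self):
        super().__init__()
        # True only when the model was restored from an archive for final evaluation.
        self.final_evaluation = False

    @classmethod
    def from_archive(cls, archive_path):
        model = super().from_archive(archive_path)
        model.final_evaluation = True
        return model

    def mode(self):
        # Three-way distinction: training, validation during training, final evaluation.
        if self.training:
            return "train"
        return "evaluate" if self.final_evaluation else "validate"
```

During training the flag stays False, so validation passes report "validate"; only a model that was loaded from an archive reports "evaluate".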

Related

Replicate http.HandleFunc()'s "coding style" to create our own methods/functions

There is an answered question which will help you understand what exactly I want to say.
How does the function passed to http.HandleFunc get access to http.ResponseWriter and http.Request?
There are many built-in Go functions where the function parameters get assigned this way. I want to use that coding style in my daily coding life.
I want to write a similar function/method which will get its parameter values from somewhere, just like http.HandleFunc's w and r.
func (s SchoolStruct) GetSchoolDetails(name string){
// here the parameter "name" should get assigned exactly like http.HandleFunc()'s "w" and "r".
}
What http does is register a callback and use it when the time comes. You don't have to pass the arguments it takes, because the server's implementation provides these arguments with the correct state. If you want to copy this approach, first you have to ask:
Is there some kind of generic abstraction that computes these parameters? Is the function I write just reacting to something? Does this function have any side effects? Does it return value back to the system?
This approach works very well when you are modifying an existing system and extending its behavior with independent units; in other words, when you are integrating into a robust API.
You may be correct that this is a style of doing things, but you cannot use this style for everything. It's too specific, and it is good at only a certain group of tasks.
As @mkopriva pointed out, declaring the rules and requirements that your logic should satisfy is a known way to apply this style in Go. You have to realize that your logic, encapsulated behind a function pointer or an interface, has to be passed to and controlled by some other code that you call indirectly.
I cannot imagine going to such lengths when all components of the system are under your control and the system has only one piece of logic to run.
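The register-then-call-back pattern described above can be sketched in a few lines. This is written in Python rather than Go, and the Router class, its handle_func and dispatch methods, and the dict-based request are all hypothetical names invented for illustration; it is not how net/http is implemented.

```python
class Router:
    """Hypothetical mini-framework: stores callbacks and calls them later."""

    def __init__(self):
        self._handlers = {}

    def handle_func(self, pattern, handler):
        # Only register the function; it is not called here.
        self._handlers[pattern] = handler

    def dispatch(self, pattern, request):
        # The framework, not the person who wrote the handler, builds the arguments.
        response = []
        self._handlers[pattern](response, request)
        return "".join(response)


def get_school_details(w, r):
    # "w" and "r" are supplied by Router.dispatch when the time comes.
    w.append("school: " + r["name"])


router = Router()
router.handle_func("/school", get_school_details)
result = router.dispatch("/school", {"name": "Springfield"})
```

The handler never decides what w and r are; it only declares the shape it expects, and the code that owns the event loop fills them in.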

Strategy for handling parameter validation in class library

I have a rather large class library that contains a lot of code.
I am looking at how to optimize the performance of some of the code, and for some rather simple utility methods I've found that the parameter validation occupies a rather large portion of the runtime for some core methods.
Let me give a typical example:
A.MethodA1 runs a loop, iterating over a collection, calling B.MethodB1 for each element
B.MethodB1 processes the element and returns the result. It's a rather basic calculation, but since it is used in many places, it has been put into its own method instead of being copied and pasted where needed
A.MethodA1 calls C.MethodC1 with the results of B.MethodB1, and puts the result into a list that is returned at the end of the loop
In the case I've found now, B.MethodB1 does rudimentary parameter validation. Since the method calls other internal methods, I'd like to avoid having NullReferenceExceptions several layers deep into the code, and rather fail early, hence B.MethodB1 validates the parameters, like checking for null and some basic range checks on another parameter.
However, in this particular call scenario, it is impossible (due to other program logic) for these parameters to ever have the wrong values. If they had, from the program standpoint, B.MethodB1 would never be called at all for those values, A.MethodA1 would fail before the call to B.MethodB1.
So I was considering removing the parameter validation in B.MethodB1, since it occupies roughly 65% of the method runtime (and this is part of some heavily used code.)
However, B.MethodB1 is a public method, and can thus be called from the program, in which case I want the parameter validation.
So how would you solve this dilemma?
1) Keep the parameter validation, and take the performance hit
2) Remove the parameter validation, and have potentially fail-late problems in the method
3) Split the method into two, one internal that doesn't have parameter validation, called by the "safe" path, and one public that has the parameter validation + a call to the internal version.
The last one would give me the benefits of having no parameter validation on the hot path, while still exposing a public entry point that does have parameter validation, but for some reason it doesn't sit right with me.
Opinions?
I would go with option 3. I tend to use assertions for private and internal methods and do all the validation in public methods.
By the way, is the performance hit really that big?
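A minimal sketch of option 3, written in Python rather than C#. The function names scale, _scale_unchecked, and scale_all, and the 0 to 10 range, are assumptions chosen for illustration; they stand in for B.MethodB1, its internal twin, and A.MethodA1.

```python
def scale(value, factor):
    """Public entry point: validates, then delegates."""
    if value is None:
        raise ValueError("value must not be None")
    if not 0 <= factor <= 10:
        raise ValueError("factor must be between 0 and 10")
    return _scale_unchecked(value, factor)


def _scale_unchecked(value, factor):
    """Internal version for callers that have already validated their inputs."""
    # Assertions document the contract; Python skips them when run with -O.
    assert value is not None
    assert 0 <= factor <= 10
    return value * factor


def scale_all(values, factor):
    """Trusted hot path: validate once up front, then skip per-element checks."""
    if not 0 <= factor <= 10:
        raise ValueError("factor must be between 0 and 10")
    if any(v is None for v in values):
        raise ValueError("values must not contain None")
    return [_scale_unchecked(v, factor) for v in values]
```

External callers still get fail-early errors from scale, while the loop in scale_all pays for validation once instead of once per element.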
That's an interesting question.
Hmmm, makes me think ... "code contracts" .. It would seem like it might be technically possible to statically (at compile time) have certain code contracts be proven to be fulfilled. If this were the case and you had such a compilation validation option you could state these contracts without ever having to validate the conditions at runtime.
It would require that the client code itself be validated against the code contracts.
And, of course it would inevitably be highly dependent on the type of conditions you'd want to write, and it would probably only be feasible to prove these contracts to a certain point (how far up the possible call graph would you go?). Beyond this point the validator might have to beg off, and insist that you place a runtime check (or maybe a validation warning suppression?).
All just idle speculation. Does make me wonder a bit more about C# 4.0 code contracts. I wonder if these have support for static analysis. Have you checked them out? I've been meaning to, but learning F# is having to take priority at the moment!
Update:
Having read up a little on it, it appears that C# 4.0 does indeed have a 'static checker' as well as a binary rewriter (which takes care of altering the output binary so that pre and post condition checks are in the appropriate location)
What's not clear from my extremely quick read, is whether you can opt out of the binary rewriting - what I'm thinking here is that what you'd really be looking for is to use the code contracts, have the metadata (or code) for the contracts maintained within the various assemblies but use only the static checker for at least a selected subset of contracts, so that you in theory get proven safety without any runtime hit.
Here's a link to an article on the code contracts

Doesn't Passing in Parameters that Should Be Known Implicitly Violate Encapsulation?

I often hear around here from test driven development people that having a function get large amounts of information implicitly is a bad thing. I can see where this would be bad from a testing perspective, but isn't it sometimes necessary from an encapsulation perspective? The following question comes to mind:
Is using Random and OrderBy a good shuffle algorithm?
Basically, someone wanted to create a function in C# to randomly shuffle an array. Several people told him that the random number generator should be passed in as a parameter. This seems like an egregious violation of encapsulation to me, even if it does make testing easier. Isn't the fact that an array shuffling algorithm requires any state at all other than the array it's shuffling an implementation detail that the caller should not have to care about? Wouldn't the correct place to get this information be implicitly, possibly from a thread-local singleton?
I don't think it breaks encapsulation. The only state in the array is the data itself - and "a source of randomness" is essentially a service. Why should an array naturally have an associated source of randomness? Why should that have to be a singleton? What about different situations which have different requirements - e.g. speed vs cryptographically secure randomness? There's a reason why java.util.Random has a SecureRandom subclass :) Perhaps it doesn't matter whether the shuffle's results are predictable with a lot of effort and observation - or perhaps it does. That will depend on the context, and that's information that the shuffle algorithm shouldn't care about.
Once you start thinking of it as a service, it makes sense that it's passed in as a dependency.
Yes, you could get it from a thread-local singleton (and indeed I'm going to blog about exactly that in the next few days) but I would generally code it so that the caller gets to make that decision.
One benefit of the "randomness as a service" concept is that it makes for repeatability - if you've got a test which fails, you can pass in a Random with a specific seed and know you'll always get the same results, which makes debugging easier.
Of course, there's always the option of making the Random optional - use a thread-local singleton as a default if the caller doesn't provide their own.
Yes, that does break encapsulation. As with most software design decisions, this is a trade-off between two opposing forces. If you encapsulate the RNG then you make it difficult to change for a unit test. If you make it a parameter then you make it easy for a user to change the RNG (and potentially get it wrong).
My personal preference is to make it easy to test, then provide a default implementation (a default constructor that creates its own RNG, in this particular case) and good documentation for the end user. Adding a method with the signature
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
that creates a Random using the current system time as its seed would take care of most normal use cases of this method. The original method
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
could be used for testing (pass in a Random object with a known seed) and also in those rare cases where a user decides they need a cryptographically secure RNG. The one-parameter implementation should call this method.
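The same idea can be sketched in Python, where an optional parameter plays the role of the two C# overloads. The function name shuffle and the use of a Fisher-Yates loop are assumptions for illustration; the answers above do not prescribe a particular shuffling algorithm.

```python
import random


def shuffle(source, rng=None):
    """Return a new shuffled list; the caller may supply the source of randomness."""
    if rng is None:
        # Default for everyday use: an unseeded generator picks its own seed.
        rng = random.Random()
    items = list(source)
    # Fisher-Yates shuffle driven by the injected generator.
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)
        items[i], items[j] = items[j], items[i]
    return items
```

A test can call shuffle(data, random.Random(1234)) twice and expect identical results, while ordinary callers just write shuffle(data) and never see the generator.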
I don't think this violates encapsulation.
Your Example
I would say that being able to provide an RNG is a feature of the class. I would obviously provide a method that doesn't require it, but I can see times where it may be useful to be able to duplicate the randomization.
What if the array shuffler was part of a game that used the RNG for level generation. If a user wanted to save the level and play it again later, it may be more efficient to store the RNG seed.
General Case
Simple classes that have a single task like this typically don't need to worry about divulging their inner workings. What they encapsulate is the logic of the task, not the elements required by that logic.

How to design a class that has only one heavy duty work method and data returning other methods?

I want to design a class that will parse a string into tokens that are meaningful to my application.
How do I design it?
1) Provide a ctor that accepts a string, provide a Parse method and provide methods (let's call them "minor") that return individual tokens, count of tokens etc. OR
2) Provide a ctor that accepts nothing, provide a Parse method that accepts a string and minor methods as above. OR
3) Provide a ctor that accepts a string and provide only minor methods but no parse method. The parsing is done by the ctor.
1 and 2 have the disadvantage that the user may call minor methods without calling the Parse method. I'll have to check in every minor method that the Parse method was called.
The problem I see in 3 is that the parse method may potentially do a lot of things. It just doesn't seem right to put it in the ctor.
2 is convenient in that the user may parse any number of strings without instantiating the class again and again.
What's a good approach? What are some of the considerations?
(the language is c#, if someone cares).
Thanks
I would have a separate class with a Parse method that takes a string and converts it into a separate new object with a property for each value from the string.
ValueObject values = parsingClass.Parse(theString);
I think this is a really good question...
In general, I'd go with something that resembles option 3 above. Basically, think about your class and what it does; does it have any effective data other than the data to parse and the parsed tokens? If not, then I would generally say that if you don't have those things, then you don't really have an instance of your class; you have an incomplete instance of your class; something which you'd like to avoid.
One of the considerations that you point out is that the parsing of the tokens may be a relatively computationally complicated process; it may take a while. I agree with you that you may not want to take the hit for doing that in the constructor; in that case, it may make sense to use a Parse() method. The question that comes in, though, is whether or not there's any sensible operations that can be done on your class before the parse() method completes. If not, then you're back to the original point; before the parse() method is complete, you're effectively in an "incomplete instance" state of your class; that is, it's effectively useless. Of course, this all changes if you're willing and able to use some multithreading in your application; if you're willing to offload the computationally complicated operations onto another thread, and maintain some sort of synchronization on your class methods / accessors until you're done, then the whole parse() thing makes more sense, as you can choose to spawn that in a new thread entirely. You still run into issues of attempting to use your class before it's completely parsed everything, though.
I think an even more broad question that comes into this design, though, is what is the larger scope in which this code will be used? What is this code going to be used for, and by that, I mean, not just now, with the intended use, but is there a possibility that this code may need to grow or change as your application does? In terms of the stability of implementation, can you expect for this to be completely stable, or is it likely that something about the set of data you'll want to parse or the size of the data to parse or the tokens into which you will parse will change in the future? If the implementation has a possibility of changing, consider all the ways in which it may change; in my experience, those considerations can strongly lead to one or another implementation. And considering those things is not trivial; not by a long shot.
Lest you think this is just nitpicking, I would say, at a conservative estimate, about 10 - 15 percent of the classes that I've written have needed some level of refactoring even before the project was complete; rarely has a design that I've worked on survived implementation to come out the other side looking the same way that it did before. So considering the possible permutations of the implementation becomes very useful for determining what your implementation should be. If, say, your implementation will never possibly want to vary the size of the string to tokenize, you can make an assumption about the computational complexity that may lead you one way or another on the overall design.
If the sole purpose of the class is to parse the input string into a group of properties, then I don't see any real downside in option 3. The parse operation may be expensive, but you have to do it at some point if you're going to use it.
You mention that option 2 is convenient because you can parse new values without reinstantiating the object, but if the parse operation is that expensive, I don't think that makes much difference. Compare the following code:
// Using option 3
ParsingClass myClass = new ParsingClass(inputString);
// Parse a new string.
myClass = new ParsingClass(anotherInputString);
// Using option 2
ParsingClass myClass = new ParsingClass();
myClass.Parse(inputString);
// Parse a new string.
myClass.Parse(anotherInputString);
There's not much difference in use, but with Option 2, you have to have all your minor methods and properties check to see if parsing has occurred before they can proceed. (Option 1 requires you to do everything that option 2 does internally, but also allows you to write Option 3-style code when using it.)
Alternatively, you could make the constructor private and the Parse method static, having the Parse method return an instance of the object.
// Option 4
ParsingClass myClass = ParsingClass.Parse(inputString);
// Parse a new string.
myClass = ParsingClass.Parse(anotherInputString);
Options 1 and 2 provide more flexibility, but require more code to implement. Options 3 and 4 are less flexible, but there's also less code to write. Basically, there is no one right answer to the question. It's really a matter of what fits with your existing code best.
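Option 4 might look like this in Python, where a classmethod takes the place of the static Parse method. The class name ParsedTokens and the whitespace split are assumptions for illustration; Python cannot make the constructor truly private, so that part is by convention only.

```python
class ParsedTokens:
    """Holds the result of parsing; instances are always fully parsed."""

    def __init__(self, tokens):
        # Treated as private by convention: callers should use ParsedTokens.parse().
        self._tokens = tuple(tokens)

    @classmethod
    def parse(cls, text):
        # Assumption: whitespace splitting stands in for the real, heavier parsing.
        return cls(text.split())

    @property
    def count(self):
        return len(self._tokens)

    def token(self, index):
        return self._tokens[index]
```

Because every instance comes out of parse(), the minor methods never need to check whether parsing has happened.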
Two important considerations:
1) Can the parsing fail?
If so, and if you put it in the constructor, then it has to throw an exception. The Parse method could return a value indicating success. So check how your colleagues feel about throwing exceptions in situations which aren't show-stopping: default is to assume they won't like it.
2) The constructor must get your object into a valid state.
If you don't mind "hasn't parsed anything yet" being a valid state of your objects, then the parse method is probably the way to go, and call the class SomethingParser.
If you don't want that, then parse in the constructor (or factory, as Garry suggests), and call the class ParsedSomething.
The difference is probably whether you are planning to pass these things as parameters into other methods. If so, then having a "not ready yet" state is a pain, because you either have to check for it in every callee and handle it gracefully, or else you have to write documentation like "the parameter must already have parsed a string". And then most likely check in every callee with an assert anyway.
You might be able to work it so that the initial state is the same as the state after parsing an empty string (or some other base value), thus avoiding the "not ready yet" problem.
Anyway, if these things are likely to be parameters, personally I'd say that they have to be "ready to go" as soon as they're constructed. If they're just going to be used locally, then you might give users a bit more flexibility if they can create them without doing the heavy lifting. The cost is requiring two lines of code instead of one, which makes your class slightly harder to use.
You could consider giving the thing two constructors and a Parse method: the string constructor is equivalent to calling the no-arg constructor, then calling Parse.
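A rough Python sketch of that last arrangement, combined with the idea of making the initial state equal to "parsed an empty string". The class name TokenParser, the whitespace split, and the choice to report success as "found at least one token" are all assumptions for illustration.

```python
class TokenParser:
    """Hypothetical parser whose 'nothing parsed yet' state equals parsing ''."""

    def __init__(self, text=""):
        # The string constructor is the no-arg constructor followed by parse().
        self.tokens = []
        self.parse(text)

    def parse(self, text):
        # Assumption: whitespace splitting stands in for the real parsing work.
        self.tokens = text.split()
        # Return a success value instead of raising, as suggested in point 1.
        return bool(self.tokens)
```

An instance created with no arguments is already in a valid state, so callees that receive one do not need a separate "not ready yet" check.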

DoSomethingToThing(Thing n) vs Thing.DoSomething()

What factors determine which approach is more appropriate?
I think both have their places.
You shouldn't simply use DoSomethingToThing(Thing n) just because you think "Functional programming is good". Likewise you shouldn't simply use Thing.DoSomething() because "Object Oriented programming is good".
I think it comes down to what you are trying to convey. Stop thinking about your code as a series of instructions, and start thinking about it like a paragraph or sentence of a story. Think about which parts are the most important from the point of view of the task at hand.
For example, if the part of the 'sentence' you would like to stress is the object, you should use the OO style.
Example:
fileHandle.close();
Most of the time when you're passing around file handles, the main thing you are thinking about is keeping track of the file it represents.
CounterExample:
string x = "Hello World";
submitHttpRequest( x );
In this case submitting the HTTP request is far more important than the string which is the body, so submitHttpRequest(x) is preferable to x.submitViaHttp()
Needless to say, these are not mutually exclusive. You'll probably actually have
networkConnection.submitHttpRequest(x)
in which you mix them both. The important thing is that you think about what parts are emphasized, and what you will be conveying to the future reader of the code.
To be object-oriented, tell, don't ask: http://www.pragmaticprogrammer.com/articles/tell-dont-ask.
So, Thing.DoSomething() rather than DoSomethingToThing(Thing n).
If you're dealing with internal state of a thing, Thing.DoSomething() makes more sense, because even if you change the internal representation of Thing, or how it works, the code talking to it doesn't have to change. If you're dealing with a collection of Things, or writing some utility methods, procedural-style DoSomethingToThing() might make more sense or be more straight-forward; but still, can usually be represented as a method on the object representing that collection: for instance
GetTotalPriceofThings();
vs
Cart.getTotal();
It really depends on how object oriented your code is.
Thing.DoSomething is appropriate if Thing is the subject of your sentence.
DoSomethingToThing(Thing n) is appropriate if Thing is the object of your sentence.
ThingA.DoSomethingToThingB(ThingB m) is an unavoidable combination, since in all the languages I can think of, functions belong to one class and are not mutually owned. But this makes sense because you can have a subject and an object.
Active voice is more straightforward than passive voice, so make sure your sentence has a subject that isn't just "the computer". This means, use form 1 and form 3 frequently, and use form 2 rarely.
For clarity:
// Form 1: "File handle, close."
fileHandle.close();
// Form 2: "(Computer,) close the file handle."
close(fileHandle);
// Form 3: "File handle, write the contents of another file handle."
fileHandle.writeContentsOf(anotherFileHandle);
I agree with Orion, but I'm going to rephrase the decision process.
You have a noun and a verb / an object and an action.
If many objects of this type will use this action, try to make the action part of the object.
Otherwise, try to group the action separately, but with related actions.
I like the File / string examples. There are many string operations, such as "SendAsHTTPReply", which won't happen for your average string, but do happen often in a certain setting. However, you basically will always close a File (hopefully), so it makes perfect sense to put the Close action in the class interface.
Another way to think of this is as buying part of an entertainment system. It makes sense to bundle a TV remote with a TV, because you always use them together. But it would be strange to bundle a power cable for a specific VCR with a TV, since many customers will never use this. The key idea is how often will this action be used on this object?
Not nearly enough information here. It depends on whether your language even supports the construct "Thing.something" or an equivalent (i.e. it's an OO language). If so, it's far more appropriate because that's the OO paradigm (members should be associated with the object they act on). In a procedural style, of course, DoSomethingtoThing() is your only choice... or ThingDoSomething()
DoSomethingToThing(Thing n) would be more of a functional approach whereas Thing.DoSomething() would be more of an object oriented approach.
That is the Object Oriented versus Procedural Programming choice :)
I think the well documented OO advantages apply to the Thing.DoSomething()
This has been asked before; see "Design question: does the Phone dial the PhoneNumber, or does the PhoneNumber dial itself on the Phone?"
Here are a couple of factors to consider:
Can you modify or extend the Thing class? If not, use the former.
Can Thing be instantiated? If not, use the latter as a static method.
If Thing actually gets modified (i.e. has properties that change), prefer the latter. If Thing is not modified, the latter is just as acceptable.
Otherwise, as objects are meant to map onto real-world objects, choose the method that seems more grounded in reality.
Even if you aren't working in an OO language, where you would have Thing.DoSomething(), for the overall readability of your code, having a set of functions like:
ThingDoSomething()
ThingDoAnotherTask()
ThingWeDoSomethingElse()
then
AnotherThingDoSomething()
and so on is far better.
All the code that works on "Thing" is in one location. Of course, the "DoSomething" and other tasks should be named consistently, so you have a ThingOneRead(), a ThingTwoRead()... by now you should get the point. When you go back to work on the code in twelve months' time, you will appreciate having taken the time to make things logical.
In general, if "something" is an action that "thing" naturally knows how to do, then you should use thing.doSomething(). That's good OO encapsulation, because otherwise DoSomethingToThing(thing) would have to access potential internal information of "thing".
For example invoice.getTotal()
If "something" is not naturally part of "thing's" domain model, then one option is to use a helper method.
For example: Logger.log(invoice)
If doing something to an object is likely to produce a different result in another scenario, then I'd suggest oneThing.DoSomethingToThing(anotherThing).
For example, you may have two ways of saving a thing in your program, so adopting DatabaseObject.Save(thing) and SessionObject.Save(thing) would be more advantageous than thing.Save(), thing.SaveToDatabase() or thing.SaveToSession().
I rarely pass no parameters to a class, unless I'm retrieving public properties.
To add to Aeon's answer, it depends on the thing and what you want to do to it. So if you are writing Thing, and DoSomething alters the internal state of Thing, then the best approach is Thing.DoSomething. However, if the action does more than change the internal state, then DoSomething(Thing) makes more sense. For example:
Collection.Add(Thing)
is better than
Thing.AddSelfToCollection(Collection)
And if you didn't write Thing, and cannot create a derived class, then you have no choice but to do DoSomething(Thing)
Even in object oriented programming it might be useful to use a function call instead of a method (or for that matter calling a method of an object other than the one we call it on). Imagine a simple database persistence framework where you'd like to just call save() on an object. Instead of including an SQL statement in every class you'd like to have saved, thus complicating code, spreading SQL all across the code and making changing the storage engine a PITA, you could create an Interface defining save(Class1), save(Class2) etc. and its implementation. Then you'd actually be calling databaseSaver.save(class1) and have everything in one place.
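The persistence example above can be sketched as follows, in Python. The Invoice class and the InMemorySaver are hypothetical names, and a plain list stands in for the database table so the example stays self-contained.

```python
class Invoice:
    def __init__(self, number, total):
        self.number = number
        self.total = total


class InMemorySaver:
    """Hypothetical saver: all persistence logic lives here, not in Invoice."""

    def __init__(self):
        self.rows = []

    def save(self, invoice):
        # A real implementation would issue SQL here; a list stands in for the table.
        self.rows.append((invoice.number, invoice.total))


saver = InMemorySaver()
saver.save(Invoice("INV-1", 99.5))
```

Swapping the storage engine then means writing another class with the same save() method; Invoice itself does not change.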
I have to agree with Kevin Conner
Also keep in mind the caller of either of the 2 forms. The caller is probably a method of some other object that definitely does something to your Thing :)