What will the inner tuning learner return in nested cv in MLR? - mlr

In MLR there is a method to implement the nested cross validation. In nested cv, the inner loop is used to select the best tuning parameters and the outer loop is used to evaluate the model performance. When I combine nested cv with the feature selection process, I'm a bit confounded about what will MLR return about the inner bested tuned model. For example, I want to first apply a filter based on the correlation p value with outcome<0.05. In nested cv (I say it in training, validation and test set), it should be:
In the inner loop, for each training set, apply the filter, then tune the parameter we're interested and test in the validation set. In the inner loop, we can get the best tuning parameter and the feature set associated with it.
What I'm wondering is what the inner best tuned parameter will return for outer loop training, I assume there are two possible models:
The inner best tuned model just return the best tuned parameter, not the selected feature subset. So in the outer loop, we'll first apply the same filter, then train the training+validation set with the best tuned parameter.
The inner best tuned model return the best tuned parameter and the selected feature subset. So in the outer loop, we'll just train the training+validation set with the best tuned parameter and selected feature subset (from the inner loop).
In my opinion, I think the first one is more logic. Part of my code is as below:
svm_learner<-makeLearner("classif.svm",predict.type="prob",fix.factors.prediction = TRUE)
svm_filter<-makeFilterWrapper(learner = svm_learner,
fw.method = "t.test.filter", fw.threshold = -0.05)
svm_filter_nested<-makeTuneWrapper(svm_filter,par.set=ps,
control=ctrl,resampling=inner)
r=resample(svm_filter_nested,task,resampling=outer,models=TRUE)

Option 2) is correct.
Hyperpars are optimized for the chosen feature subset. I would not make sense to do so if you rerun the filter process again in the outer loop.
There is not more than train/predict happening in each outer fold with parameters coming from the inner optimization loop. No optimization is going on in the outer loop.
PS: You might want to ask such general questions on https://stats.stackexchange.com/ rather than on Stackoverflow since they are related to general (statistical) concepts rather than programming.
People will vote to close such questions since they lack a relation to programming. (Note that no one of the mlr-team is watching stats.stackexchange questions though)

Related

StableBaselines3 - Why does calling "model.learn(50,000)" twice not give same result as calling "model.learn(100,000)" once?

I am working on a Reinforcement Learning problem in StableBaselines3.
I am trying to understand why this code:
model = MaskablePPO(MaskableActorCriticPolicy, env, verbose=1, learning_rate=0.0003, gamma=0.975, seed=10, batch_size=256, clip_range=0.2)
model.learn(100000)
Does not give the exact same result as this code:
model = MaskablePPO(MaskableActorCriticPolicy, env, verbose=1, learning_rate=0.0003, gamma=0.975, seed=10, batch_size=256, clip_range=0.2)
model.learn(50000)
model.learn(50000)
I say they don't give the same results because in both cases, I tested out the model on a test-set through a for-loop, and the performance was different. Given that I set deterministic=True in the for-loop and I didn't change the seed, the different performance must mean the networks are different, which means the training process was different.
I was under the impression that if I run model.learn() on an existing model, it would just pick up the training where it was previously left off, but I guess that's incorrect.
Can someone help me understand why those two situations deliver different results?

mlr: what exactly does the function getBMRTuneResults?

I'm doing nested resampling with a 4x3 setup (4-fold cross-validation in the outer loop, and 3-fold cross-validation in the inner loop). For now, I only use Support Vector Machines (ksvm from kernlab). In the inner loop, I'm looking for the optimal tuning parameters C and sigma.
Calling getBMRPerformances() then outputs me the performances on the 4 individual outer test data sets. The function getBMRTuneResults() outputs 4 values for the measure I am using (in my case cohen's kappa) as well but they differ from the output of getBMRPerformances() and I don't understand what the second output actually is.
As the function name indicates, it outputs the results from tuning. So the performance values correspond to the performances calculated during the tuning (inner loop).
The four values in particular here are the performances reached by the best performing hyperparameter setting for a particular fold of your outer loop.

Pattern matching with associative and commutative operators

Pattern matching (as found in e.g. Prolog, the ML family languages and various expert system shells) normally operates by matching a query against data element by element in strict order.
In domains like automated theorem proving, however, there is a requirement to take into account that some operators are associative and commutative. Suppose we have data
A or B or C
and query
C or $X
Going by surface syntax this doesn't match, but logically it should match with $X bound to A or B because or is associative and commutative.
Is there any existing system, in any language, that does this sort of thing?
Associative-Commutative pattern matching has been around since 1981 and earlier, and is still a hot topic today.
There are lots of systems that implement this idea and make it useful; it means you can avoid write complicated pattern matches when associtivity or commutativity could be used to make the pattern match. Yes, it can be expensive; better the pattern matcher do this automatically, than you do it badly by hand.
You can see an example in a rewrite system for algebra and simple calculus implemented using our program transformation system. In this example, the symbolic language to be processed is defined by grammar rules, and those rules that have A-C properties are marked. Rewrites on trees produced by parsing the symbolic language are automatically extended to match.
The maude term rewriter implements associative and commutative pattern matching.
http://maude.cs.uiuc.edu/
I've never encountered such a thing, and I just had a more detailed look.
There is a sound computational reason for not implementing this by default - one has to essentially generate all combinations of the input before pattern matching, or you have to generate the full cross-product worth of match clauses.
I suspect that the usual way to implement this would be to simply write both patterns (in the binary case), i.e., have patterns for both C or $X and $X or C.
Depending on the underlying organisation of data (it's usually tuples), this pattern matching would involve rearranging the order of tuple elements, which would be weird (particularly in a strongly typed environment!). If it's lists instead, then you're on even shakier ground.
Incidentally, I suspect that the operation you fundamentally want is disjoint union patterns on sets, e.g.:
foo (Or ({C} disjointUnion {X})) = ...
The only programming environment I've seen that deals with sets in any detail would be Isabelle/HOL, and I'm still not sure that you can construct pattern matches over them.
EDIT: It looks like Isabelle's function functionality (rather than fun) will let you define complex non-constructor patterns, except then you have to prove that they are used consistently, and you can't use the code generator anymore.
EDIT 2: The way I implemented similar functionality over n commutative, associative and transitive operators was this:
My terms were of the form A | B | C | D, while queries were of the form B | C | $X, where $X was permitted to match zero or more things. I pre-sorted these using lexographic ordering, so that variables always occurred in the last position.
First, you construct all pairwise matches, ignoring variables for now, and recording those that match according to your rules.
{ (B,B), (C,C) }
If you treat this as a bipartite graph, then you are essentially doing a perfect marriage problem. There exist fast algorithms for finding these.
Assuming you find one, then you gather up everything that does not appear on the left-hand side of your relation (in this example, A and D), and you stuff them into the variable $X, and your match is complete. Obviously you can fail at any stage here, but this will mostly happen if there is no variable free on the RHS, or if there exists a constructor on the LHS that is not matched by anything (preventing you from finding a perfect match).
Sorry if this is a bit muddled. It's been a while since I wrote this code, but I hope this helps you, even a little bit!
For the record, this might not be a good approach in all cases. I had very complex notions of 'match' on subterms (i.e., not simple equality), and so building sets or anything would not have worked. Maybe that'll work in your case though and you can compute disjoint unions directly.

On-the-fly parser/pre-generation space/time tradeoff considerations

Do the space-related benefits of using an on-the-fly parser outweigh the time-related benefits of a pre-generated lookup table?
Long version:
I am authoring a chemistry reference tool, and am including a feature that will automatically name formulae conforming to a specific pattern; e.g. C[n]H[2n+2] => [n]ane; where [n] is an integer for the LHS; and an index into an array of names on the RHS. (meth, eth, …)
As far as I can see, this can be implemented in one of two ways:
I pre-generate a key/value dual lookup dictionary of formula <=> name pairs; either when the application starts (slower startup), or a static list which is published with the application (slower download).
Formulae are evaluated on the fly by a custom-built parser.
In approach 1. name => formula lookup becomes simpler by an order of magnitude; but the generator will, unless I want to ship dozens of megabytes of data with the application, have to have a preset, and fairly low, value for n.
Compounding this is the fact that formulae can have several terms; such as C[n]H[2n+1]OC[n']H[2n'+1]; and for each of these, the number of possible matches increases geometrically with n. Additionally, using this approach would eat RAM like nobody's business.
Approach 2. lets me support fairly large values of n using a fairly small lookup table, but makes name => formula lookup somewhat more complex. Compared to the pre-generation to file for shipping with the application, it also lets me correct errors in the generation logic without having to ship new data files.
This also requires that each formula be matched against a cursory test for several rules, determining if it could fit; which, if there are a lot of rules, takes time that might lead to noticeable slowdowns in the interface.
The question then, is:
Are there any considerations in the tradeoff I have failed to account for, or approaches that I haven't considered?
Do the benefits of using an on-the-fly parser justify the increased implementation complexity?
You should go with the second approach.
One possible solution is a greedy algorithm. Define your set of transformations as a regular expression (used to test the pattern) and a function which is given the regexp match object and returns the transformed string.
Regular expressions aren't quite powerful enough to handle what you want directly. Instead you'll have to do something like:
m = re.match(r"C\[(\d+)\]H\[(\d+)]\]", formula)
if m:
C_count, H_count = int(m.group(1)), int(m.group(2))
match_size = len(m.group(0))
if C_count*2+2 == H_count:
replacement = alkane_lookup[C_count]
elif C_count*2 == H_count:
replacement = alkene_lookup[C_count]
...
else:
replacement = m.group(0) # no replacement available
(plus a lot more for the other possibilities)
then embed that in a loop which looks like:
formula = "...."
new_formula = ""
while formula:
match_size, replacement = find_replacement(formula)
new_formula += replacement
formula = formula[match_size:]
(You'll need to handle the case where nothing matches. One possible way is to include a list of all possible elements at the end of find_replacement(), which only returns the next element and counts.)
This is a greedy algorithm, which doesn't guarantee the smallest solution. That's more complicated, but since chemists themselves have different ideas of the right form, I wouldn't worry so much about it.

What is an invariant?

The word seems to get used in a number of contexts. The best I can figure is that they mean a variable that can't change. Isn't that what constants/finals (darn you Java!) are for?
An invariant is more "conceptual" than a variable. In general, it's a property of the program state that is always true. A function or method that ensures that the invariant holds is said to maintain the invariant.
For instance, a binary search tree might have the invariant that for every node, the key of the node's left child is less than the node's own key. A correctly written insertion function for this tree will maintain that invariant.
As you can tell, that's not the sort of thing you can store in a variable: it's more a statement about the program. By figuring out what sort of invariants your program should maintain, then reviewing your code to make sure that it actually maintains those invariants, you can avoid logical errors in your code.
It is a condition you know to always be true at a particular place in your logic and can check for when debugging to work out what has gone wrong.
The magic of wikipedia: Invariant (computer science)
In computer science, a predicate that,
if true, will remain true throughout a
specific sequence of operations, is
called (an) invariant to that
sequence.
This answer is for my 5 year old kid. Do not think of an invariant as a constant or fixed numerical value. But it can be. However, it is more than that.
Rather, an invariant is something like of a fixed relationship between varying entities. For example, your age will always be less than that compared to your biological parents. Both your age, and your parent's age changes in the passage of time, but the relationship that i mentioned above is an invariant.
An invariant can also be a numerical constant. For example, the value of pi is an invariant ratio between the circle's circumference over its diameter. No matter how big or small the circle is, that ratio will always be pi.
I usually view them more in terms of algorithms or structures.
For example, you could have a loop invariant that could be asserted--always true at the beginning or end of each iteration. That is, if your loop was supposed to process a collection of objects from one stack to another, you could say that |stack1|+|stack2|=c, at the top or bottom of the loop.
If the invariant check failed, it would indicate something went wrong. In this example, it could mean that you forgot to push the processed element onto the final stack, etc.
As this line states:
In computer science, a predicate that, if true, will remain true throughout a specific sequence of operations, is called (an) invariant to that sequence.
To better understand this hope this example in C++ helps.
Consider a scenario where you have to get some values and get the total count of them in a variable called as count and add them in a variable called as sum
The invariant (again it's more like a concept):
// invariant:
// we have read count grades so far, and
// sum is the sum of the first count grades
The code for the above would be something like this,
int count=0;
double sum=0,x=0;
while (cin >> x) {
++count;
sum+=x;
}
What the above code does?
1) Reads the input from cin and puts them in x
2) After one successful read, increment count and sum = sum + x
3) Repeat 1-2 until read stops ( i.e ctrl+D)
Loop invariant:
The invariant must be True ALWAYS. So initially you start out your code with just this
while(cin>>x){
}
This loop reads data from standard input and stores in x. Well and good. But the invariant becomes false because the first part of our invariant wasn't followed (or kept true).
// we have read count grades so far, and
How to keep the invariant true?
Simple! increment count.
So ++count; would do good!. Now our code becomes something like this,
while(cin>>x){
++count;
}
But
Even now our invariant (a concept which must be TRUE) is False because now we didn't satisfy the second part of our invariant.
// sum is the sum of the first count grades
So what to do now?
Add x to sum and store it in sum ( sum+=x) and the next time
cin>>x will read a new value into x.
Now our code becomes something like this,
while(cin>>x){
++count;
sum+=x;
}
Let's check
Whether code matches our invariant
// invariant:
// we have read count grades so far, and
// sum is the sum of the first count grades
code:
while(cin>>x){
++count;
sum+=x;
}
Ah!. Now the loop invariant is True always and code works fine.
The above example was taken and modified from the book Accelerated C++ by Andrew-koening and Barbara-E
Something that doesn't change within a block of code
All the answers here are great, but i felt that i can shed more light on the matter:
Invariant from a language point of view means something that never changes. The concept though comes actually from math, it's one of the popular proof techniques when combined with induction.
Here is how a proof goes, If you can find an invariant that is in the initial state, And that this invariant persists regardless of any [legal] transformation applied to the state, then you can prove that If a certain state does not have this invariant then it can never occur, no matter what sequence of transformations are applied to the initial state.
Now the previous way of thinking (again combined with induction) makes it possible to predicate the logic of computer software. Especially important when the execution goes in loops, in which an invariant can be used to prove that a certain loop will yield a certain result or that it will never change the state of a program in a certain way.
When invariant is used to predicate a loop logic its called loop invariant. It can be used outside loops, but for loops it is really important, because you often have a lot of possibilities, or an infinite number of possibilities.
Notice that i use the word "predicate" the logic of a computer software, and not prove. And that's because while in math invariant can be used as a proof, it can never prove that the computer software when executed will yield what is expected, due to the fact that the software is executed on top of many abstractions, that can never be proved that they will yield what is expected (think of the hardware abstraction for example).
Finally while theoretically and rigorously predicting software logic is only important for high critical applications like Medical, and Military ones. Invariant can still be used to aid the typical programmer when debugging. It can be used to know where at a certain location The program failed because it has failed to maintain a certain invariant - many of us use it anyway without giving a thought about it.
Class Invariant
Class Invariant is a condition which should be always true before and after calling relevant function
For example balanced tree has an Invariant which is called isBalanced. When you modify your tree through some methods (e.g. addNode, removeNode...) - isBalanced should be always true before and after modifying the tree
Following on from what it is, invariants are quite useful in writing clean code, since knowing conceptually what invariants should be present in your code allows you to easily decide how to organize your code to reach those aims. As mentioned ealier, they're also useful in debugging, as checking to see if the invariant's being maintained is often a good way of seeing if whatever manipulation you're attempting to perform is actually doing what you want it to.
It's typically a quantity that does not change under certain mathematical operations.
An example is a scalar, which does not change under rotations. In magnetic resonance imaging, for example, it is useful to characterize a tissue property by a rotational invariant, because then its estimation ideally does not depend on the orientation of the body in the scanner.
The ADT invariant specifes relationships
among the data fields (instance variables)
that must always be true before and after
the execution of any instance method.
There is an excellent example of an invariant and why it matters in the book Java Concurrency in Practice.
Although Java-centric, the example describes some code that is responsible for calculating the factors of a provided integer. The example code attempts to cache the last number provided, and the factors that were calculated to improve performance. In this scenario there is an invariant that was not accounted for in the example code which has left the code susceptible to race conditions in a concurrent scenario.