Problem with spline method = 'monoH.FC'

I am interested in using the monotone spline, but I get an error when R tries to use it. I am using R 2.12.0, and the documentation says the method 'monoH.FC' has been supported since 2.8.0.
Reproducible example (the same happens for more complicated (x, y) relationships):
x <- 1:2
y <- 1:2
spline(x, y, method = "monoH.FC")
Error in spline(x, y, method = "monoH.FC") : invalid interpolation method
What I have tried
?spline returns:
...
Usage:
...
spline(x, y = NULL, n = 3*length(x), method = "fmm",
xmin = min(x), xmax = max(x), xout, ties = mean)
...
Arguments:
method: specifies the type of spline to be used. Possible values are
‘"fmm"’, ‘"natural"’, ‘"periodic"’ and ‘"monoH.FC"’.
...
But the spline function itself indicates that the 'monoH.FC' method is not supported:
...
method <- pmatch(method, c("periodic", "natural", "fmm"))
if (is.na(method))
stop("invalid interpolation method")
...
Question
How can I use method = 'monoH.FC' with spline?

Use splinefun; it supports method = "monoH.FC".
The last example in ?spline shows you how to do it.
## An example of monotone interpolation
n <- 20
set.seed(11)
x. <- sort(runif(n)) ; y. <- cumsum(abs(rnorm(n)))
plot(x.,y.)
curve(splinefun(x.,y.)(x), add=TRUE, col=2, n=1001)
curve(splinefun(x.,y., method="mono")(x), add=TRUE, col=3, n=1001)
legend("topleft", paste("splinefun( \"", c("fmm", "monoH.CS"), "\" )", sep=''),
col=2:3, lty=1)
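If you specifically want spline()-style output (a list of interpolated points) rather than an interpolating function, you can build it from splinefun yourself. A minimal sketch (my own helper, not from the help page):
## emulate spline(x, y, method = "monoH.FC") via splinefun
mono.spline <- function(x, y, n = 3 * length(x)) {
  f <- splinefun(x, y, method = "monoH.FC")
  xout <- seq(min(x), max(x), length.out = n)
  list(x = xout, y = f(xout))
}
mono.spline(1:2, 1:2)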

Related

Missing arguments in a nested function

I am following a Python course on finance about portfolio theory. I have to create a function with a nested function in it.
My problem is that I get the error message "neg_sharpe_ratio() missing 2 required positional arguments: 'er' and 'cov'", whereas to my mind 'er' and 'cov' are already defined in my function msr below. So I don't understand how they can be missing.
from scipy.optimize import minimize

def msr(riskfree_rate, er, cov):
    n = er.shape[0]
    init_guess = np.repeat(1/n, n)
    bounds = ((0.00, 1.0),) * n
    weights_sum_to_1 = {
        'type': 'eq',
        'fun': lambda weights: np.sum(weights) - 1
    }
    def neg_sharpe_ratio(weights, riskfree_rate, er, cov):
        r = erk.portfolio_return(weights, er)
        vol = erk.portfolio_vol(weights, cov)
        return -(r - riskfree_rate) / vol
    results = minimize(neg_sharpe_ratio, init_guess,
                       args=(cov,), method="SLSQP",
                       options={'disp': False},
                       constraints=(weights_sum_to_1),
                       bounds=bounds)
    return results.x
TypeError: neg_sharpe_ratio() missing 2 required positional arguments: 'er' and 'cov'
The function neg_sharpe_ratio is able to reference any of the variables passed into and created by the function msr without needing those variables passed to it itself. Therefore you should be able to remove the parameters riskfree_rate, er, and cov from the neg_sharpe_ratio function definition and have it work, as those variables are passed into its parent function, leaving you with:
def neg_sharpe_ratio(weights):
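A minimal sketch of that closure version (assuming the question's imports, numpy as np, and the course's erk helper module):
def msr(riskfree_rate, er, cov):
    n = er.shape[0]
    init_guess = np.repeat(1/n, n)
    bounds = ((0.00, 1.0),) * n
    weights_sum_to_1 = {'type': 'eq',
                        'fun': lambda weights: np.sum(weights) - 1}
    # the closure reads riskfree_rate, er and cov from msr's scope,
    # so minimize only has to supply `weights`
    def neg_sharpe_ratio(weights):
        r = erk.portfolio_return(weights, er)
        vol = erk.portfolio_vol(weights, cov)
        return -(r - riskfree_rate) / vol
    results = minimize(neg_sharpe_ratio, init_guess, method="SLSQP",
                       options={'disp': False},
                       constraints=(weights_sum_to_1,),
                       bounds=bounds)
    return results.x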
For those who might be interested, I found my mistake.
Indeed, I had forgotten to correctly define the arguments of my function neg_sharpe_ratio in my call to minimize.
Here is the code amended:
from scipy.optimize import minimize

def msr(riskfree_rate, er, cov):
    n = er.shape[0]
    init_guess = np.repeat(1/n, n)
    bounds = ((0.00, 1.0),) * n
    weights_sum_to_1 = {
        'type': 'eq',
        'fun': lambda weights: np.sum(weights) - 1
    }
    def neg_sharpe_ratio(weights, riskfree_rate, er, cov):
        r = erk.portfolio_return(weights, er)
        vol = erk.portfolio_vol(weights, cov)
        return -(r - riskfree_rate) / vol
    # minimize supplies `weights` itself; only the remaining
    # positional arguments belong in `args`
    results = minimize(neg_sharpe_ratio, init_guess,
                       args=(riskfree_rate, er, cov), method="SLSQP",
                       options={'disp': False},
                       constraints=(weights_sum_to_1),
                       bounds=bounds)
    return results.x
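As a quick smoke test of the amended function, here is a sketch with made-up numbers and minimal stand-ins for the course's erk helpers (just the standard portfolio formulas):
import numpy as np

class erk:  # hypothetical stand-in for the course's module
    @staticmethod
    def portfolio_return(weights, er):
        return weights @ er
    @staticmethod
    def portfolio_vol(weights, cov):
        return np.sqrt(weights @ cov @ weights)

er = np.array([0.05, 0.10])
cov = np.array([[0.010, 0.002],
                [0.002, 0.040]])
print(msr(0.02, er, cov))  # weights should sum to 1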

More arguments required when I run my python/pyspark function

I have a function which I have defined as follows; you can see that it clearly requires 7 arguments:
import math as mt
import numpy as np

# eq6, eq7, eq9, eq10, lambdaw and lambdatheta are defined elsewhere
def calc_z(w, S, var, a1, a2, yt1, yt2):
    mu = w * S
    sigma = mt.sqrt(var)
    z = np.random.normal(mu, sigma)
    u = [a1, a2, z]
    yt = [yt1, yt2, 1]
    thetaset = np.random.rand(len(u))
    m = [i for i in range(len(u))]
    max_iter = 30
    # Calculate E-step
    for i in range(max_iter):
        print 'Iteration:', i
        print 'z:', z
        print 'thetaset', thetaset
        devLz = eq6(var, w, S, z, yt, u, thetaset, m)
        dev2Lz2 = eq9(var, thetaset, u)
        # Calculate M-step
        z = z - (devLz / dev2Lz2)
        w = lambdaw * z
        for i in range(len(thetaset)):
            devLTheta = eq7(yt, u, thetaset, lambdatheta)
            dev2LTheta2 = eq10(thetaset, u, lambdatheta)
            thetaset = thetaset - (devLTheta / dev2LTheta2)
    return z
I am using pyspark so I convert this to a udf
calc_z_udf = udf(calc_z, FloatType())
and then run it as follows (where I am clearly passing in 7 arguments, or am I going mad!?):
data = data.withColumn('z', calc_z_udf(data['w'],data['Org_Depth_Diff_S'],data['var'],data['proximity_rank_a1'],data['cotravel_count_a2'],data['cotravel_yt1'],data['proximity_yt2']))
When I run this however I am getting an error which states:
TypeError: calc_z() takes exactly 7 arguments (6 given)
Could anyone help me understand why this might be, as it is clear that when I run the function I am in fact passing in 7 arguments and not 6 as the error states?
I am not sure this is the reason, but there is no need to send column objects; you can just pass strings:
data = data.withColumn('z', calc_z_udf('w', 'Org_Depth_Diff_S','var', 'proximity_rank_a1', 'cotravel_count_a2', 'cotravel_yt1', 'proximity_yt2'))
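If you do want to be explicit about the columns, pyspark.sql.functions.col builds the same Column objects; a sketch of the equivalent call:
from pyspark.sql.functions import col

data = data.withColumn('z', calc_z_udf(col('w'), col('Org_Depth_Diff_S'), col('var'),
                                       col('proximity_rank_a1'), col('cotravel_count_a2'),
                                       col('cotravel_yt1'), col('proximity_yt2')))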

Nonlinear fits in Octave

Currently I'm using the nonlin_curvefit function from GNU Octave's 'optim' package to fit data. But this time I also need the uncertainty of the returned parameters to determine the quality of the fit. After reading through the documentation I tried using the function curvefit_stat.
However, I always get errors using this function and I can't make any sense of the error message. I'm using Octave 4.2.2 from Ubuntu 18.04's default repository.
Standalone minimal example and error messages below. The start parameters init_cvg usually produce a good fit, while init_dvg usually results in a poor one:
1;
x = linspace(-2*pi, 2*pi, 600);
ydata = 3.4*sin(1.6*x) + rand(size(x))*1.3;
f = @(p, x) p(1)*sin(p(2).*x);
function y = testfun(p ,x)
y = p(1).*sin(p(2).*x);
endfunction
init_cvg = [1; 1.1];
init_dvg = [1; 1.0];
[pc, mod_valc, cvgc, outpc] = nonlin_curvefit(f, init_cvg, x, ydata);
[pd, mod_vald, cvgd, outpd] = nonlin_curvefit(f, init_dvg, x, ydata);
hold off
plot(x, yd, "b.");
hold on;
plot(x, mod_valc, "r-");
plot(x, mod_vald, "color", [0.9 0.4 0.3]);
settings = optimset("ret_covp", true);
covpc = curvefit_stat(f, pc, x, ydata, settings);
covpd = curvefit_stat(f, pd, x, ydata, settings);
puts("sqrt(diag(covpc))")
sqrt(diag(covpc))
puts("sqrt(diag(covpd))")
sqrt(diag(covpd))
The first error message occurs when I use f as a model function, the second occurs when I use testfun instead:
>> curvefit_stat_TEST
error: label of objective function must be specified
error: called from
__residmin_stat__ at line 566 column 7
curvefit_stat at line 56 column 7
curvefit_stat_TEST at line 25 column 7
>> curvefit_stat_TEST
error: 'p' undefined near line 8 column 7
error: called from
testfun at line 8 column 5
curvefit_stat_TEST at line 25 column 7
>>
Could somebody confirm this error?
I would appreciate any help.
I found the problem.
I needed to add "objf_type", "wls" as arguments to optimset.
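With that fix, the statistics part of the example would look like this (a sketch, everything else unchanged):
settings = optimset("ret_covp", true, "objf_type", "wls");
covpc = curvefit_stat(f, pc, x, ydata, settings);
sqrt(diag(covpc))   % standard errors of the fitted parameters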

How to profile methods in Scala?

What is a standard way of profiling Scala method calls?
What I need are hooks around a method that I can use to start and stop timers.
In Java I use aspect-oriented programming (AspectJ) to define the methods to be profiled and inject bytecode to achieve this.
Is there a more natural way in Scala, where I can define a bunch of functions to be called before and after a function without losing any static typing in the process?
Do you want to do this without changing the code that you want to measure timings for? If you don't mind changing the code, then you could do something like this:
def time[R](block: => R): R = {
val t0 = System.nanoTime()
val result = block // call-by-name
val t1 = System.nanoTime()
println("Elapsed time: " + (t1 - t0) + "ns")
result
}
// Now wrap your method calls, for example change this...
val result = 1 to 1000 sum
// ... into this
val result = time { 1 to 1000 sum }
In addition to Jesper's answer, you can automatically wrap method invocations in the REPL:
scala> def time[R](block: => R): R = {
| val t0 = System.nanoTime()
| val result = block
| println("Elapsed time: " + (System.nanoTime - t0) + "ns")
| result
| }
time: [R](block: => R)R
Now - let's wrap anything in this
scala> :wrap time
wrap: no such command. Type :help for help.
OK - we need to be in power mode
scala> :power
** Power User mode enabled - BEEP BOOP SPIZ **
** :phase has been set to 'typer'. **
** scala.tools.nsc._ has been imported **
** global._ and definitions._ also imported **
** Try :help, vals.<tab>, power.<tab> **
Wrap away
scala> :wrap time
Set wrapper to 'time'
scala> BigDecimal("1.456")
Elapsed time: 950874ns
Elapsed time: 870589ns
Elapsed time: 902654ns
Elapsed time: 898372ns
Elapsed time: 1690250ns
res0: scala.math.BigDecimal = 1.456
I have no idea why that printed stuff out 5 times
Update as of 2.12.2:
scala> :pa
// Entering paste mode (ctrl-D to finish)
package wrappers { object wrap { def apply[A](a: => A): A = { println("running...") ; a } }}
// Exiting paste mode, now interpreting.
scala> $intp.setExecutionWrapper("wrappers.wrap")
scala> 42
running...
res2: Int = 42
This is what I use:
import System.nanoTime
def profile[R](code: => R, t: Long = nanoTime) = (code, nanoTime - t)
// usage:
val (result, time) = profile {
/* block of code to be profiled*/
}
val (result2, time2) = profile { methodToBeProfiled(foo) }
There are three benchmarking libraries for Scala that you can make use of.
Since the URLs on the linked site are likely to change, I am pasting the relevant content below.
SPerformance - Performance Testing framework aimed at automagically comparing performance tests and working inside Simple Build Tool.
scala-benchmarking-template - SBT template project for creating Scala (micro-)benchmarks based on Caliper.
Metrics - Capturing JVM- and application-level metrics. So you know what's going on
testing.Benchmark might be useful.
scala> def testMethod {Thread.sleep(100)}
testMethod: Unit
scala> object Test extends testing.Benchmark {
| def run = testMethod
| }
defined module Test
scala> Test.main(Array("5"))
$line16.$read$$iw$$iw$Test$ 100 100 100 100 100
I use a technique that's easy to move around in code blocks. The crux is that the same exact line starts and ends the timer - so it is really a simple copy and paste. The other nice thing is that you get to define what the timing means to you as a string, all in that same line.
Example usage:
Timelog("timer name/description")
//code to time
Timelog("timer name/description")
The code:
object Timelog {
  val timers = scala.collection.mutable.Map.empty[String, Long]
  //
  // Usage: call once to start the timer, and once to stop it, using the same timer name parameter
  // (defined as apply so that Timelog("...") works as shown above)
  //
  def apply(timerName: String): Unit = {
    if (timers contains timerName) {
      val output = s"$timerName took ${(System.nanoTime() - timers(timerName)) / 1000 / 1000} milliseconds"
      println(output) // or log, or send off to some performance db for analytics
    }
    else timers(timerName) = System.nanoTime()
  }
}
Pros:
no need to wrap code as a block or manipulate within lines
can easily move the start and end of the timer among code lines when being exploratory
Cons:
less shiny for utterly functional code
obviously this object leaks map entries if you do not "close" timers,
e.g. if your code doesn't get to the second invocation for a given timer start.
ScalaMeter is a nice library to perform benchmarking in Scala
Below is a simple example
import org.scalameter._
def sumSegment(i: Long, j: Long): Long = (i to j) sum
val (a, b) = (1, 1000000000)
val execution_time = measure { sumSegment(a, b) }
If you execute the above code snippet in a Scala Worksheet you get the running time in milliseconds
execution_time: org.scalameter.Quantity[Double] = 0.260325 ms
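For steadier numbers, ScalaMeter's inline benchmarking can also warm up the JVM and average over several runs; a sketch based on its config/withWarmer API:
import org.scalameter._

val stableTime = config(
  Key.exec.benchRuns -> 20   // average over 20 measured runs
) withWarmer {
  new Warmer.Default         // JIT warm-up before measuring
} measure {
  sumSegment(a, b)
}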
The recommended approach to benchmarking Scala code is via sbt-jmh
"Trust no one, bench everything." - sbt plugin for JMH (Java
Microbenchmark Harness)
This approach is taken by many of the major Scala projects, for example,
Scala programming language itself
Dotty (Scala 3)
cats library for functional programming
Metals language server for IDEs
A simple wrapper timer based on System.nanoTime is not a reliable method of benchmarking:
System.nanoTime is as bad as String.intern now: you can use it,
but use it wisely. The latency, granularity, and scalability effects
introduced by timers may and will affect your measurements if done
without proper rigor. This is one of the many reasons why
System.nanoTime should be abstracted from the users by benchmarking
frameworks
Furthermore, considerations such as JIT warmup, garbage collection, system-wide events, etc. might introduce unpredictability into measurements:
Tons of effects need to be mitigated, including warmup, dead code
elimination, forking, etc. Luckily, JMH already takes care of many
things, and has bindings for both Java and Scala.
Based on Travis Brown's answer here is an example of how to setup JMH benchmark for Scala
Add jmh to project/plugins.sbt
addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.3.7")
Enable jmh plugin in build.sbt
enablePlugins(JmhPlugin)
Add to src/main/scala/bench/VectorAppendVsListPreppendAndReverse.scala
package bench
import org.openjdk.jmh.annotations._
@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.AverageTime))
class VectorAppendVsListPreppendAndReverse {
val size = 1_000_000
val input = 1 to size
@Benchmark def vectorAppend: Vector[Int] =
input.foldLeft(Vector.empty[Int])({ case (acc, next) => acc.appended(next)})
@Benchmark def listPrependAndReverse: List[Int] =
input.foldLeft(List.empty[Int])({ case (acc, next) => acc.prepended(next)}).reverse
}
Execute benchmark with
sbt "jmh:run -i 10 -wi 10 -f 2 -t 1 bench.VectorAppendVsListPreppendAndReverse"
The results are
Benchmark Mode Cnt Score Error Units
VectorAppendVsListPreppendAndReverse.listPrependAndReverse avgt 20 0.024 ± 0.001 s/op
VectorAppendVsListPreppendAndReverse.vectorAppend avgt 20 0.130 ± 0.003 s/op
which seems to indicate that prepending to a List and then reversing it at the end is an order of magnitude faster than appending to a Vector.
I took the solution from Jesper and added some aggregation to it for multiple runs of the same code
def time[R](block: => R) = {
def print_result(s: String, ns: Long) = {
val formatter = java.text.NumberFormat.getIntegerInstance
println("%-16s".format(s) + formatter.format(ns) + " ns")
}
var t0 = System.nanoTime()
var result = block // call-by-name
var t1 = System.nanoTime()
print_result("First Run", (t1 - t0))
var lst = for (i <- 1 to 10) yield {
t0 = System.nanoTime()
result = block // call-by-name
t1 = System.nanoTime()
print_result("Run #" + i, (t1 - t0))
(t1 - t0).toLong
}
print_result("Max", lst.max)
print_result("Min", lst.min)
print_result("Avg", (lst.sum / lst.length))
}
Suppose you want to time two functions counter_new and counter_old; the usage is as follows:
scala> time {counter_new(lst)}
First Run 2,963,261,456 ns
Run #1 1,486,928,576 ns
Run #2 1,321,499,030 ns
Run #3 1,461,277,950 ns
Run #4 1,299,298,316 ns
Run #5 1,459,163,587 ns
Run #6 1,318,305,378 ns
Run #7 1,473,063,405 ns
Run #8 1,482,330,042 ns
Run #9 1,318,320,459 ns
Run #10 1,453,722,468 ns
Max 1,486,928,576 ns
Min 1,299,298,316 ns
Avg 1,407,390,921 ns
scala> time {counter_old(lst)}
First Run 444,795,051 ns
Run #1 1,455,528,106 ns
Run #2 586,305,699 ns
Run #3 2,085,802,554 ns
Run #4 579,028,408 ns
Run #5 582,701,806 ns
Run #6 403,933,518 ns
Run #7 562,429,973 ns
Run #8 572,927,876 ns
Run #9 570,280,691 ns
Run #10 580,869,246 ns
Max 2,085,802,554 ns
Min 403,933,518 ns
Avg 797,980,787 ns
Hopefully this is helpful
I like the simplicity of @wrick's answer, but also wanted:
the profiler handles looping (for consistency and convenience)
more accurate timing (using nanoTime)
time per iteration (not total time of all iterations)
just return ns/iteration - not a tuple
This is achieved here:
def profile[R](repeat: Int)(code: => R, t: Long = System.nanoTime) = {
(1 to repeat).foreach(i => code)
(System.nanoTime - t)/repeat
}
For even more accuracy, a simple modification allows a JVM Hotspot warmup loop (not timed) for timing small snippets:
def profile[R](repeat: Int)(code: => R) = {
(1 to 10000).foreach(i => code) // warmup
val start = System.nanoTime
(1 to repeat).foreach(i => code)
(System.nanoTime - start)/repeat
}
You can use System.currentTimeMillis:
def time[R](block: => R): R = {
val t0 = System.currentTimeMillis()
val result = block // call-by-name
val t1 = System.currentTimeMillis()
println("Elapsed time: " + (t1 - t0) + "ms")
result
}
Usage:
time{
//execute somethings here, like methods, or some codes.
}
nanoTime shows you nanoseconds, which are hard to read, so I suggest using currentTimeMillis instead.
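If you want readable output without losing nanoTime's resolution, a small variant of Jesper's wrapper (my own sketch) formats the nanoseconds as milliseconds:
def time[R](block: => R): R = {
  val t0 = System.nanoTime()
  val result = block // call-by-name
  println(f"Elapsed time: ${(System.nanoTime() - t0) / 1e6}%.3f ms")
  result
}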
While standing on the shoulders of giants...
A solid 3rd-party library would be more ideal, but if you need something quick and std-library based, the following variant provides:
Repetitions
Last result wins for multiple repetitions
Total time and average time for multiple repetitions
Removes the need for a time/instant provider as a param
import scala.concurrent.duration._
import scala.language.{postfixOps, implicitConversions}
package object profile {
def profile[R](code: => R): R = profileR(1)(code)
def profileR[R](repeat: Int)(code: => R): R = {
require(repeat > 0, "Profile: at least 1 repetition required")
val start = Deadline.now
val result = (1 until repeat).foldLeft(code) { (_: R, _: Int) => code }
val end = Deadline.now
val elapsed = ((end - start) / repeat)
if (repeat > 1) {
println(s"Elapsed time: $elapsed averaged over $repeat repetitions; Total elapsed time")
val totalElapsed = (end - start)
println(s"Total elapsed time: $totalElapsed")
}
else println(s"Elapsed time: $elapsed")
result
}
}
It is also worth noting that you can use the Duration.toCoarsest method to convert to the biggest time unit possible, although I am not sure how friendly this is with minor time differences between runs, e.g.
Welcome to Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60).
Type in expressions to have them evaluated.
Type :help for more information.
scala> import scala.concurrent.duration._
import scala.concurrent.duration._
scala> import scala.language.{postfixOps, implicitConversions}
import scala.language.{postfixOps, implicitConversions}
scala> 1000.millis
res0: scala.concurrent.duration.FiniteDuration = 1000 milliseconds
scala> 1000.millis.toCoarsest
res1: scala.concurrent.duration.Duration = 1 second
scala> 1001.millis.toCoarsest
res2: scala.concurrent.duration.Duration = 1001 milliseconds
scala>
Adding on: a variant of the above that takes a method name and reports milliseconds:
import java.util.concurrent.TimeUnit

def profile[R](block: => R, methodName: String): R = {
  val n = System.nanoTime()
  val result = block
  val n1 = System.nanoTime()
  println(s"Elapsed time: ${TimeUnit.MILLISECONDS.convert(n1 - n, TimeUnit.NANOSECONDS)}ms - MethodName: $methodName")
  result
}
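Usage might look like this (my own example; the printed timing will vary):
val s = profile((1 to 1000000).sum, "bigSum")
// e.g. Elapsed time: 3ms - MethodName: bigSum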

Wrapper to FOR loops with progress bar

I like to use a progress bar while running slow for loops. This could be done easily with several helpers, but I do like tkProgressBar from the tcltk package.
A small example:
library(tcltk)
total <- 300
pb <- tkProgressBar(title = "Working hard:", min = 0, max = total, width = 300)
for (i in 1:total) {
  # DO SOMETHING
  Sys.sleep(0.5)
  setTkProgressBar(pb, i, label = paste(round(i/total*100, 0), "% ready!"))
}
close(pb)
And I would like to set up a small function named forp (as in: for loop with progress bar) to store in my .Rprofile, to call just like for but with an auto-added progress bar. Unfortunately I have no idea how to implement it and grab the expr part of the loop. I had some experiments with do.call but without success :(
Imaginary working example (which acts like a for loop but creates a TkProgressBar and auto updates it in each iteration):
forp (i in 1:10) {
#do something
}
UPDATE: I think the core of the question is how to write a function which not only has parameters in the parentheses after the function (like: foo(bar)), but also can handle expr specified after the closing parentheses, like: foo(bar) expr.
BOUNTY OFFER: the bounty would go to any answer that could modify my suggested function to work with the syntax of basic for loops. E.g. instead of
> forp(1:1000, {
+ a<-i
+ })
> a
[1] 1000
it could be called like:
> forp(1:1000) {
+ a<-i
+ }
> a
[1] 1000
Just to clarify the task again: how could we grab the { expression } part of a function call? I am afraid that this is not possible, but I will leave the bounty open for a few days for the pros :)
Given the other answers supplied, I suspect that it is impossible to do in exactly the way you specify.
However, I believe there is a way of getting very close, if you use the plyr package creatively. The trick is to use l_ply which takes a list as input and creates no output.
The only real difference between this solution and your specification is that in a for loop you can directly modify variables in the same environment. With l_ply you need to send a function, so you will have to be more careful if you want to modify stuff in the parent environment.
Try the following:
library(plyr)
forp <- function(i, .fun){
l_ply(i, .fun, .progress="tk")
}
a <- 0
forp(1:100, function(i){
Sys.sleep(0.01)
a<<-a+i
})
print(a)
[1] 5050
This creates a progress bar and modifies the value of a in the global environment.
EDIT.
For the avoidance of doubt: The argument .fun will always be a function with a single argument, e.g. .fun=function(i){...}.
For example:
for(i in 1:10){expr} is equivalent to forp(1:10, function(i){expr})
In other words:
i is the looping parameter of the loop
.fun is a function with a single argument i
My solution is very similar to Andrie's except it uses base R, and I second his comments on the need to wrap what you want to do in a function and the subsequent need to use <<- to modify stuff in a higher environment.
Here's a function that does nothing, and does it slowly:
myfun <- function(x, text) {
Sys.sleep(0.2)
cat("running ",x, " with text of '", text, "'\n", sep="")
x
}
Here's my forp function. Note that regardless of what we're actually looping over, it loops over the sequence 1:n and gets the right element of what we actually want within the loop. plyr does this automatically.
library(tcltk)
forp <- function(x, FUN, ...) {
n <- length(x)
pb <- tkProgressBar(title = "Working hard:", min = 0, max = n, width = 300)
out <- vector("list", n)
for (i in seq_len(n)) {
out[[i]] <- FUN(x[i], ...)
setTkProgressBar(pb, i, label=paste( round(i/n*100, 0), "% ready!"))
}
close(pb)
invisible(out)
}
And here's how both for and forp might be used, if all we want to do is call myfun:
x <- LETTERS[1:5]
for(xi in x) myfun(xi, "hi")
forp(x, myfun, text="hi")
And here's how they might be used if we want to modify something along the way.
out <- "result:"
for(xi in x) {
out <- paste(out, myfun(xi, "hi"))
}
out <- "result:"
forp(x, function(xi) {
out <<- paste(out, myfun(xi, "hi"))
})
For both versions the result is
> out
[1] "result: A B C D E"
EDIT: After seeing your (daroczig's) solution, I have another idea that might not be quite so unwieldy: evaluate the expression in the parent frame. This makes it easier to allow for values other than i (now specified with the index argument). As of right now I don't think it handles a function as the expression, but for just dropping it in instead of a for loop that shouldn't matter.
forp2 <- function(index, x, expr) {
expr <- substitute(expr)
n <- length(x)
pb <- tkProgressBar(title = "Working hard:", min = 0, max = n, width = 300)
for (i in seq_len(n)) {
assign(index, x[i], envir=parent.frame())
eval(expr, envir=parent.frame())
setTkProgressBar(pb, i, label=paste( round(i/n*100, 0), "% ready!"))
}
close(pb)
}
The code to run my example from above would be
out <- "result:"
forp2("xi", LETTERS[1:5], {
out <- paste(out, myfun(xi, "hi"))
})
and the result is the same.
ANOTHER EDIT, based on the additional information in your bounty offer:
The syntax forX(1:1000) %doX% { expression } is possible; that's what the foreach package does. I'm too lazy right now to build it off of your solution, but building off mine, it could look like this:
`%doX%` <- function(index, expr) {
x <- index[[1]]
index <- names(index)
expr <- substitute(expr)
n <- length(x)
pb <- tkProgressBar(title = "Working hard:", min = 0, max = n, width = 300)
for (i in seq_len(n)) {
assign(index, x[i], envir=parent.frame())
eval(expr, envir=parent.frame())
setTkProgressBar(pb, i, label=paste( round(i/n*100, 0), "% ready!"))
}
close(pb)
invisible(out)
}
forX <- function(...) {
a <- list(...)
if(length(a)!=1) {
stop("index must have only one element")
}
a
}
Then the use syntax is this, and the result is the same as above.
out <- "result:"
forX(xi=LETTERS[1:5]) %doX% {
out <- paste(out, myfun(xi, "hi"))
}
out
If you use the plyr family of commands instead of a for loop (generally a good idea if possible), you get as an added bonus a whole system of progress bars.
R.utils also has some progress bars built into it, and there exist instructions for using them in for loops.
R's syntax doesn't let you do exactly what you want, ie:
forp (i in 1:10) {
#do something
}
But what you can do is create some kind of iterator object and loop using while():
while (nextStep(m)) Sys.sleep(0.02)
Now you have the problem of what m is and how you make nextStep(m) have side effects on m in order to make it return FALSE at the end of your loop. I've written simple iterators that do this, as well as MCMC iterators that let you define and test for a burnin and thinning period within your loop.
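A minimal sketch of such an iterator, built as a closure so the state lives inside nextStep rather than in a separate m object (my own illustration, not the poster's actual code), wired up to tkProgressBar:
library(tcltk)
make.iterator <- function(n) {
  i <- 0
  pb <- tkProgressBar(title = "Working hard:", min = 0, max = n, width = 300)
  function() {
    i <<- i + 1
    if (i > n) { close(pb); return(FALSE) }
    setTkProgressBar(pb, i, label = paste(round(i/n*100, 0), "% ready!"))
    TRUE
  }
}
nextStep <- make.iterator(100)
while (nextStep()) Sys.sleep(0.02)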
Recently at the R user conference I saw someone define a 'do' function that then worked as an operator, something like:
do(100) %*% foo()
but I'm not sure that was the exact syntax, and I'm not sure how to implement it or who it was that put that up... Perhaps someone else can remember!
What you're hoping for, I think, would be something that looks like
body(`for`) <- as.call(c(as.name('{'), expression([your_updatebar]), body(`for`)))
And yep, the problem is that "for" is not a function, or at least not one whose "body" is accessible. You could, I suppose, create a "forp" function that takes as arguments 1) a string to be turned into the loop counter, e.g., "(i in seq(1, 101, 5))", and 2) the body of your intended loop, e.g., y[i] <- foo[i]^2; points(foo[i], y[i]), and then jump through some get/call/parse magic to execute the actual for loop.
Then, in pseudocode (not close to actual R code, but I think you see what should happen):
forp<-function(indexer,loopbody) {
pseudoparse( c("for (", indexer, ") {" ,loopbody,"}")
}
The problem is that the for loop in R is treated specially; a normal function is not allowed to look like that. Some small tweaks can get pretty close though. And as @Aaron mentioned, the foreach package's %dopar% paradigm seems like the best fit. Here's my version of how it could work:
`%doprogress%` <- function(forExpr, bodyExpr) {
forExpr <- substitute(forExpr)
bodyExpr <- substitute(bodyExpr)
idxName <- names(forExpr)[[2]]
vals <- eval(forExpr[[2]])
e <- new.env(parent=parent.frame())
pb <- tkProgressBar(title = "Working hard:", min = 0, max = length(vals), width = 300)
for (i in seq_along(vals)) {
e[[idxName]] <- vals[[i]]
eval(bodyExpr, e)
setTkProgressBar(pb, i, label=paste( round(i/length(vals)*100, 0), "% ready!"))
}
}
# Example usage:
foreach(x = runif(10)) %doprogress% {
# do something
if (x < 0.5) cat("small\n") else cat("big")
}
As you can see, you have to type x = 1:10 instead of x in 1:10, and the infix operator %<whatever>% is needed to get hold of the looping construct and the loop body. I currently don't do any error checking (to avoid muddling the code). You should check the name of the function ("foreach"), the number of arguments to it (1) and that you actually get a valid loop variable ("x") and not an empty string.
I hereby propose two solutions that use the standard for syntax, both using the great progress package from Gábor Csárdi and Rich FitzJohn:
1) we can override, temporarily or locally, the for function to wrap around base::`for` and support progress bars.
2) we can define the unused for<- and wrap around base::`for` using the syntax pb -> for(it in seq) {expr}, where pb is a progress bar built with progress::progress_bar$new().
Both solutions behave like standard for calls:
The values changed at the previous iteration are available
on error the modified variables will have the value they had just before the error
I packaged my solutions and will demo them below, then walk through the code.
Usage
#devtools::install_github("moodymudskipper/pbfor")
library(pbfor)
Using pb_for()
By default pb_for() will override the for function for one run only.
pb_for()
for (i in 1:10) {
# DO SOMETHING
Sys.sleep(0.5)
}
Using parameters from progress::progress_bar$new():
pb_for(format = "Working hard: [:bar] :percent :elapsed",
       callback = function(x) message("We're done!"))
for (i in 1:10) {
# DO SOMETHING
Sys.sleep(0.5)
}
Using for<-
The only restriction compared to a standard for call is that the first argument must exist and can't be NULL.
i <- NA
progress_bar$new() -> for (i in 1:10) {
# DO SOMETHING
Sys.sleep(0.5)
}
We can define a custom progress bar, and maybe define it conveniently in an initialisation script or in one's R profile.
pb <- progress_bar$new(format = "Working hard: [:bar] :percent :elapsed",
                       callback = function(x) message("We're done!"))
pb -> for (i in 1:10) {
# DO SOMETHING
Sys.sleep(0.5)
}
For nested progress bars we can use the following trick:
pbi <- progress_bar$new(format = "i: [:bar] :percent\n\n")
pbj <- progress_bar$new(format = "j: [:bar] :percent ")
i <- NA
j <- NA
pbi -> for (i in 1:10) {
pbj -> for (j in 1:10) {
# DO SOMETHING
Sys.sleep(0.1)
}
}
Note that due to operator precedence, the only way to call for<- and benefit from the syntax of for calls is to use the left-to-right arrow `->`.
how they work
pb_for()
pb_for() creates a for function object in its parent environment, then the new for:
sets up a progress bar
modifies the loop content
adds a `*pb*`$tick() at the end of the loop content expression
feeds it back to base::`for` in a clean environment
assigns on exit all modified or created variables to the parent environment.
removes itself if once is TRUE (the default)
Overriding an operator is generally a sensitive thing to do, but this one cleans up after itself and won't affect the global environment if used inside a function, so I think it's safe enough to use.
for<-
This approach:
doesn't override for
allows the use of progress bar templates
has an arguably more intuitive api
It has a few drawbacks however:
its first argument must exist, which is the case for all assignment functions (fun<-).
it does some memory magic to find the name of its first argument, as that's not easily done with assignment functions; this might have a performance cost, and I'm not 100% sure about its robustness
we need the package pryr
What it does:
find the name of the first argument, using a helper function
clone the progress bar input
edit it to account for the number of iterations of the loop (the length of the second argument of for<-)
After this it's similar to what is described for pb_for() in the section above.
The code
pb_for()
pb_for <-
function(
# all args of progress::progress_bar$new() except `total` which needs to be
# inferred from the 2nd argument of the `for` call, and `stream` which is
# deprecated
format = "[:bar] :percent",
width = options("width")[[1]] - 2,
complete = "=",
incomplete = "-",
current =">",
callback = invisible, # doc doesn't give default but this seems to work ok
clear = TRUE,
show_after = .2,
force = FALSE,
# The only arg not forwarded to progress::progress_bar$new()
# By default `for` will self destruct after being called
once = TRUE) {
# create the function that will replace `for`
f <- function(it, seq, expr){
# to avoid notes at CMD check
`*pb*` <- IT <- SEQ <- EXPR <- NULL
# forward all arguments to progress::progress_bar$new() and add
# a `total` argument computed from `seq` argument
pb <- progress::progress_bar$new(
format = format, width = width, complete = complete,
incomplete = incomplete, current = current,
callback = callback,
clear = clear, show_after = show_after, force = force,
total = length(seq))
# using on.exit allows us to self destruct `for` if relevant even if
# the call fails.
# It also allows us to send to the local environment the changed/created
# variables in their last state, even if the call fails (like standard for)
on.exit({
vars <- setdiff(ls(env), c("*pb*"))
list2env(mget(vars,envir = env), envir = parent.frame())
if(once) rm(`for`,envir = parent.frame())
})
# we build a regular `for` loop call with an updated loop code including
# progress bar.
# it is executed in a dedicated environment and the progress bar is given
# a name unlikely to conflict
env <- new.env(parent = parent.frame())
env$`*pb*` <- pb
eval(substitute(
env = list(IT = substitute(it), SEQ = substitute(seq), EXPR = substitute(expr)),
base::`for`(IT, SEQ,{
EXPR
`*pb*`$tick()
})), envir = env)
}
# override `for` in the parent frame
assign("for", value = f,envir = parent.frame())
}
for<- (and fetch_name())
`for<-` <-
function(it, seq, expr, value){
# to avoid notes at CMD check
`*pb*` <- IT <- SEQ <- EXPR <- NULL
# the symbol fed to `it` is unknown, R uses `*tmp*` for assignment functions
# so we go get it by inspecting the memory addresses
it_chr <- fetch_name(it)
it_sym <-as.symbol(it_chr)
# complete the progress bar with the `total` parameter
# we need to clone it because progress bars are environments and updated
# by reference
pb <- value$clone()
pb$.__enclos_env__$private$total <- length(seq)
# when the script ends, even with a bug, the values that have been changed
# are written to the parent frame
on.exit({
vars <- setdiff(ls(env), c("*pb*"))
list2env(mget(vars, env),envir = parent.frame())
})
# computations are operated in a separate environment so we don't pollute it
# with it, seq, expr, value, we need the progress bar so we name it `*pb*`
# unlikely to conflict by accident
env <- new.env(parent = parent.frame())
env$`*pb*` <- pb
eval(substitute(
env = list(IT = it_sym, SEQ = substitute(seq), EXPR = substitute(expr)),
base::`for`(IT, SEQ,{
EXPR
`*pb*`$tick()
})), envir = env)
# because of the `fun<-` syntax we need to return the modified first argument
invisible(get(it_chr,envir = env))
}
helpers:
fetch_name <- function(x,env = parent.frame(2)) {
all_addresses <- sapply(ls(env), address2, env)
all_addresses <- all_addresses[names(all_addresses) != "*tmp*"]
all_addresses_short <- gsub("(^|<)[0x]*(.*?)(>|$)","\\2",all_addresses)
x_address <- tracemem(x)
untracemem(x)
x_address_short <- tolower(gsub("(^|<)[0x]*(.*?)(>|$)","\\2",x_address))
ind <- match(x_address_short, all_addresses_short)
x_name <- names(all_addresses)[ind]
x_name
}
address2 <- getFromNamespace("address2", "pryr")
Thanks to everyone for your kind answers! As none of them fit my wacky needs, I started to steal some pieces from the given answers and made up a quite customized version:
forp <- function(iis, .fun) {
  .fun <- paste(deparse(substitute(.fun)), collapse = '\n')
  .fun <- gsub(' <- ', ' <<- ', .fun, fixed = TRUE)
  .fun <- paste(.fun, 'index.current <- 1 + index.current; setTkProgressBar(pb, index.current, label = paste(round(index.current/index.max*100, 0), "% ready!"))', sep = '\n')
  index.max <- if (is.numeric(iis)) max(iis) else length(iis)
  index.current <- 1
  pb <- tkProgressBar(title = "Working hard:", min = 0, max = index.max, width = 300)
  for (i in iis) eval(parse(text = .fun))
  close(pb)
}
This is quite lengthy for a simple function like this, but it depends only on base R (and of course: tcltk) and has some nice features:
can be used on expressions, not just functions,
you do not have to use <<- in your expressions to update the global environment, as <- is replaced by <<- in the given expr (well, this might be annoying for some),
can be used with non-numeric indexes (see below). That is why the code became so long :)
Usage is similar to for, except that you do not have to specify the i in part and you have to use i as the index in the loop. The other drawback is that I did not find a way to grab the {...} part specified after a function call, so this must be included in the parameters.
Example #1: Basic use
> forp(1:1000, {
+ a<-i
+ })
> a
[1] 1000
Try it to see the neat progress bar on your computer! :)
Example #2: Looping through some characters
> m <- 0
> forp (names(mtcars), {
+ m <- m + mean(mtcars[,i])
+ })
> m
[1] 435.69