Using optim to estimate a 3-parameter log-normal by MLE

I need your help. I am having problems estimating the parameters of the following data in R using optim. I always get an error message of the following type:
Warning: NaNs produced
Error in optim(par = c(0, 1, -2), nll, method = "L-BFGS-B", upper = c(Inf, :
  L-BFGS-B needs finite values of 'fn'
Could you please take a look and tell me what I am doing wrong? Thanks.
The code I am using is the following:
X <- rlnorm3(n=1000, meanlog = 0, sdlog = 1, threshold = -2)
nll <- function(theta) {
  N <- length(X)
  m <- theta[1]
  s <- theta[2]
  a <- theta[3]
  e_1 <- log(X - a) - m
  e_2 <- log(X - a)
  nll <- -0.5 * N * log(2 * pi) - N * log(s) - (t(e_1) %*% e_1) / 2 * s ^ 2 - (t(e_2) %*% e_2)
  return(-nll)
}
optim(par=c(0, 1, -2), nll, method="L-BFGS-B", upper = c(Inf, Inf, min(X)))
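For reference, a minimal sketch (not a verified drop-in answer) of how the negative log-likelihood of the three-parameter lognormal is usually written, assuming rlnorm3 here comes from the EnvStats package, whose argument names match the call above. Note the 1/(2*s^2) grouping, the sum(log(X - a)) Jacobian term, and the guard that keeps the objective finite, which is what the "L-BFGS-B needs finite values of 'fn'" message is about:
# Sketch only; library(EnvStats) is assumed to provide rlnorm3().
nll3 <- function(theta, X) {
  m <- theta[1]; s <- theta[2]; a <- theta[3]
  if (s <= 0 || any(X <= a)) return(1e10)   # large finite penalty: L-BFGS-B needs a finite 'fn'
  z <- log(X - a)
  N <- length(X)
  # -loglik = N/2 * log(2*pi) + N*log(s) + sum(log(X - a)) + sum((z - m)^2) / (2*s^2)
  0.5 * N * log(2 * pi) + N * log(s) + sum(z) + sum((z - m)^2) / (2 * s^2)
}
# optim(c(0, 1, min(X) - 1), nll3, X = X, method = "L-BFGS-B",
#       lower = c(-Inf, 1e-8, -Inf), upper = c(Inf, Inf, min(X) - 1e-8))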

Related

R - specifying interaction contrasts for aov

How to specify the contrasts (point estimates, 95% CIs and p-values) for the between-group differences of the within-group delta changes?
In the example below, I would be interested in the between-group difference (group = 1 minus group = 2) of the delta changes (time = 3 minus time = 1).
df and model:
demo3 <- read.csv("https://stats.idre.ucla.edu/stat/data/demo3.csv")
## Convert variables to factor
demo3 <- within(demo3, {
  group <- factor(group)
  time <- factor(time)
  id <- factor(id)
})
par(cex = .6)
demo3$time <- as.factor(demo3$time)
demo3.aov <- aov(pulse ~ group * time + Error(id), data = demo3)
summary(demo3.aov)
Neither of these chunks of code achieves my goal, correct?
m2 <- emmeans(demo3.aov, "group", by = "time")
pairs(m2)
m22 <- emmeans(demo3.aov, c("group", "time") )
pairs(m22)
Look at the documentation for emmeans::contrast and in particular the argument interaction. If I understand your question correctly, you might want
summary(contrast(m22, interaction = c("pairwise", "dunnett")),
infer = c(TRUE, TRUE))
which would compute Dunnett-style contrasts for time (each time vs. time1) and compare those between the two groups (group1 - group2). The summary(..., infer = c(TRUE, TRUE)) part overrides the default, under which tests but not confidence intervals are shown.
You could also do this in stages:
time.con <- contrast(m22, "dunnett", by = "group", name = "timediff")
summary(pairs(time.con, by = NULL), infer = c(TRUE, TRUE))
If you truly want just time 3 - time 1, then replace time.con with
time.con1 <- contrast(m22, list(`time3-time1` = c(-1, 0, 1, 0, 0)))
(I don't know how many time points you have; I assumed 5 in the above.)

How can the Gauss-Newton method be implemented with an Armijo line search in Python?

We define the sigmoidal function
σ(t) = 1 / (1 + e^(−t))
It has the derivative σ′(t) = σ(t)(1 − σ(t)). The module gauss_newton contains a function generate_data(gamma=0) which generates a data set (t_i, α_i) where t_i ∈ ℝ and α_i ∈ ℝ with
α_i = σ(6 t_i + 1) + γ ε_i
for i = 1, …, 10. The values ε_i ∼ N(0, 1) are independently normally distributed and the real value γ ∈ ℝ controls the influence of ε_i.
(i) Solve the problem min_x (1/2) ∥F(x)∥²,
with F_i(x) = σ(x_1 t_i + x_2) − α_i for i = 1, …, 10 and γ = 0, using the Gauss-Newton algorithm. Iterate until the size of the search direction is sufficiently small, i.e. until ∥Δx_k∥ < δ for some tolerance δ > 0.
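No answer is recorded for this one. Below is a minimal sketch of what a Gauss-Newton iteration with Armijo backtracking could look like for this problem; the stand-in data at the bottom, the constants delta, c and beta, and the function names are illustrative assumptions, since the gauss_newton module and its generate_data() are not available here.
import numpy as np

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))

def residual(x, t, alpha):
    # F_i(x) = sigma(x1 * t_i + x2) - alpha_i
    return sigma(x[0] * t + x[1]) - alpha

def jacobian(x, t, alpha):
    s = sigma(x[0] * t + x[1])
    ds = s * (1.0 - s)                                # sigma'(u) = sigma(u) * (1 - sigma(u))
    return np.column_stack((ds * t, ds))

def gauss_newton_armijo(x0, t, alpha, delta=1e-8, c=1e-4, beta=0.5, max_iter=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        F = residual(x, t, alpha)
        J = jacobian(x, t, alpha)
        dx, *_ = np.linalg.lstsq(J, -F, rcond=None)   # Gauss-Newton step: min ||J dx + F||
        if np.linalg.norm(dx) < delta:                # stop when the search direction is small
            break
        f0 = 0.5 * F @ F
        slope = (J.T @ F) @ dx                        # directional derivative of 0.5*||F||^2
        step = 1.0
        while 0.5 * np.sum(residual(x + step * dx, t, alpha) ** 2) > f0 + c * step * slope:
            step *= beta                              # Armijo backtracking
        x = x + step * dx
    return x

# Illustrative stand-in for generate_data(gamma=0): exact data, no noise.
t = np.linspace(-1.0, 1.0, 10)
alpha = sigma(6 * t + 1)
print(gauss_newton_armijo([0.0, 0.0], t, alpha))      # should approach (6, 1)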

Multiple functions inside of a function in R

I'd like to create a function called g which in turn contains three other functions (f1, h and f2). Each of the functions f1 and f2 returns a data frame. I would like g to return the two data frames obtained from f1 and f2. Here is the code that I run:
g <- function(n,a,b,c,d,e) {
  f1 <- function(n,a,b,c,d,e) {
    X <- a*matrix(sample(0:1,n,replace = T),nrow=n,ncol=1)
    Y <- (b*c-d)*matrix(sample(1:10,n,replace = T),nrow=n,ncol=1)
    Z <- (a*e)*matrix(sample(0:12,n,replace = T),nrow=n,ncol=1)
    data1 <- as.data.frame(cbind(X,Y,Z))
    colnames(data1) <- c("X","Y","Z")
    return(data1)
  }
  f1(n,a,b,c,d,e)
  varpredict <- lm(Y ~ 0 + X + Z, data=f1(n,a,b,c,d,e))$fitted.values
  h <- function(){
    olsreg <- lm(Y ~ 0 + X + Z, data=f1(n,a,b,c,d,e))
    P <- olsreg$residuals^2
    return(P)
  }
  h()
  G <- rep(0,n)
  f2 <- function(n,a,b){
    for (i in 1:n) {
      G[i] <- varpredict[i]-a
    }
    X <- matrix(sample(0:1,n,replace = T),nrow=n,ncol=1)+h()
    Y <- b*matrix(sample(1:10,n,replace = T),nrow=n,ncol=1)
    Z <- (a*b)*matrix(sample(0:12,n,replace = T),nrow=n,ncol=1)-G
    data2 <- as.data.frame(cbind(X,Y,Z))
    colnames(data2) <- c("X","Y","Z")
    return(data2)
  }
  f2(n,a,b)
  return(list(data1,data2))
}
To run the function g I did this:
n=100
a=0.3
b=0.5
c=0.3
d=-1.32
e=c*d
my_function <- g(n,a,b,c,d,e)
But I received the following error message:
Error in g(n, a, b, c, d, e) : object 'data1' not found
Why am I getting this error?
First off, when you call your functions f1 and f2 you need to store their return value somewhere.
Secondly, it’s unclear why you want f1 inside g: it doesn’t seem to share any state with g, so it can be defined independently alongside g instead of inside it. That said, if it’s only ever used inside g then this is to some extent a matter of style.
Here’s how I’d write your code:
g <- function (n, a, b, c, d, e) {
  f1 <- function (n, a, b, c, d, e) {
    X <- a * matrix(sample(0 : 1, n, replace = TRUE), nrow = n)
    Y <- (b * c - d) * matrix(sample(1 : 10, n, replace = TRUE), nrow = n)
    Z <- (a * e) * matrix(sample(0 : 12, n, replace = TRUE), nrow = n)
    data.frame(X, Y, Z)
  }

  f2 <- function (n, a, b) {
    G <- varpredict - a
    X <- matrix(sample(0 : 1, n, replace = TRUE), nrow = n) + square_res
    Y <- b * matrix(sample(1 : 10, n, replace = TRUE), nrow = n)
    Z <- (a * b) * matrix(sample(0 : 12, n, replace = TRUE), nrow = n) - G
    data.frame(X, Y, Z)
  }

  model <- lm(Y ~ 0 + X + Z, data = f1(n, a, b, c, d, e))
  varpredict <- model$fitted.values
  square_res <- model$residuals ^ 2

  list(f1(n, a, b, c, d, e), f2(n, a, b))
}
I’ve cleaned up the code a bit and I hope you’ll agree that it is more readable this way; it is also more efficient, since it avoids recomputing the initial model over and over again. Apart from that I’ve followed a few simple rules: use consistent spacing, don’t use gratuitous abbreviations (T instead of TRUE), use the appropriate functions to avoid redundant code (e.g. in the creation of the data frames), and no unnecessary use of return().
But fundamentally I still have no good idea what it does since the variable names don’t provide any useful information. The most impactful improvement for readability is therefore the choice of better variable names.

Loop for regression over multiple factors

I am struggling to get a loop to run several regressions and store the coefficients and intercepts. I have data similar to this:
data <- data.frame(y = rnorm(10), x1 = rnorm(10)*2, ID = c(rep(1,10), rep(2,10)), group = c(rep(3,5), rep(4,5)))
Where ID and group are factors, therefore:
data$ID <- as.factor(data$ID)
data$group <- as.factor(data$group)
So far I tried 2 approaches.
First I did the following:
for (i in unique(data$ID)){
  for (j in unique(data$group)){
    fit <- glm(y ~ x1, data=data[data$ID == i & data$group == j, ])
  }
}
Afterwards I did the following:
myfun <- function(data) {
  step(glm(y ~ x1, data = data), trace=0)
}
fcomb <- unique(data[,c("ID","group")])
mod <- list()
for(i in 1:nrow(fcomb)) {
  mod <- c(mod, list(myfun(subset(data, ID==fcomb$ID[i] & group==fcomb$group[i]))))
}
In the end I would like to have a dataset in which for each ID and group I would have the intercept and the beta for the effect of x1 in y.
When I performed the second strategy I got something, but the betas and the intercepts are the same (which is totally impossible) and I still don't know how to store the values.
set.seed(1839)
data <- data.frame(
  y = rnorm(10),
  x1 = rnorm(10) * 2,
  ID = c(rep(1, 10), rep(2, 10)),
  group = c(rep(3, 5), rep(4, 5))
)
grid <- expand.grid(ID = unique(data$ID), group = unique(data$group))
results <- lapply(1:nrow(grid), function(x) {
  lm(y ~ x1, data[data$ID == grid[x, 1] & data$group == grid[x, 2], ])$coef
})
results <- t(do.call(cbind, results))
results <- cbind(grid, results)
results
Returns:
   ID group  (Intercept)         x1
1   1     3 -0.454072247  1.0295731
2   2     3 -0.454072247  1.0295731
3   1     4  0.007800405 -0.1832663
4   2     4  0.007800405 -0.1832663
Note that the coefficients repeat across ID only because of how the example data are constructed: y and x1 have length 10 while ID has length 20, so data.frame() recycles them and rows 11-20 duplicate rows 1-10. With real data, each ID/group cell would get its own intercept and slope.

Tweaking a Function in Python

I am trying to get the following code to do a few more tricks:
class App(Frame):
    def __init__(self, master):
        Frame.__init__(self, master)
        self.grid()
        self.create_widgets()

    def create_widgets(self):
        self.answerLabel = Label(self, text="Output List:")
        self.answerLabel.grid(row=2, column=1, sticky=W)

    def psiFunction(self):
        j = int(self.indexEntry.get())
        valueList = list(self.listEntry.get())
        x = map(int, valueList)
        if x[0] != 0:
            x.insert(0, 0)
        rtn = []
        for n2 in range(0, len(x) * j - 2):
            n = n2 / j
            r = n2 - n * j
            rtn.append(j * x[n] + r * (x[n + 1] - x[n]))
        self.answer = Label(self, text=rtn)
        self.answer.grid(row=2, column=2, sticky=W)

if __name__ == "__main__":
    root = Tk()
In particular, I am trying to get it to calculate len(x) * j - 1 terms, and to work for a variety of parameter values. If you try running it you should find that you get errors for larger parameter values. For example, with the list 0, 1, 2, 3, 4 and the parameter j = 3 we should run through the program and get the sequence 0, 1, 2, ..., 12. However, I get an error that the last value is 'out of range' if I try to compute it.
I believe it's an issue with my function as defined. It seems the issue with parameters has something to do with the way it ties the parameter to the n value. Consider 0123. It works great if I use 2 as my parameter (called index in the function) but fails if I use 3.
EDIT:
def psi_j(x, j):
    rtn = []
    for n2 in range(0, len(x) * j - 2):
        n = n2 / j
        r = n2 - n * j
        if r == 0:
            rtn.append(j * x[n])
        else:
            rtn.append(j * x[n] + r * (x[n + 1] - x[n]))
        print 'n2 =', n2, ': n =', n, ' r =', r, ' rtn =', rtn
    return rtn
For example if we have psi_j(x,2) with x = [0,1,2,3,4] we will be able to get [0,1,2,3,4,5,6,7,8,9,10,11] with an error on 12.
The idea though is that we should be able to calculate that last term. It is the 12th term of our output sequence, and 12 = 3*4 + 0, i.e. 3*x[4] + 0*(x[5] - x[4]). Now, there is no x[5] to use, so that's definitely an issue, but we do not need that term since the second part of the equation is zero. Is there a way to write this into the equation?
If we think about the example data [0, 1, 2, 3] and a j of 3, the problem is that we're trying to get x[4] in the last iteration.
len(x) * j - 2 for this data is 10
range(0, 10) is 0 through 9.
Manually processing our last iteration allows us to resolve the code to this:
n = 3 # or 9 / 3
r = 0 # or 9 - 3 * 3
rtn.append(3 * x[3] + 0 * (x[3 + 1] - x[3]))
We have code trying to reach x[3 + 1], which doesn't exist when we only have indices 0 through 3.
To fix this, we could rewrite the code like this.
n = n2 / j
r = n2 - n * j
if r == 0:
    rtn.append(j * x[n])
else:
    rtn.append(j * x[n] + r * (x[n + 1] - x[n]))
If r is 0, then (x[n + 1] - x[n]) is irrelevant.
Please correct me if my math is wrong on that. I can't see a case where n + 1 >= len(x) and r != 0, but if that's possible, then my solution is invalid.
Without understanding what the purpose of the function is (is it a kind of filter, or a smoothing function?), I pried it out of the GUI stuff and tested it alone:
def psiFunction(j, valueList):
    x = map(int, valueList)
    if x[0] != 0:
        x.insert(0, 0)
    rtn = []
    for n2 in range(0, len(x) * j - 2):
        n = n2 / j
        r = n2 - n * j
        print "n =", n, "max_n2 =", len(x) * j - 2, "n2 =", n2, "lx =", len(x), "r =", r
        val = j * x[n] + r * (x[n + 1] - x[n])
        rtn.append(val)
        print j * x[n], r * (x[n + 1] - x[n]), val
    return rtn

if __name__ == '__main__':
    print psiFunction(3, [0, 1, 2, 3, 4])
Calling this module leads to some debugging output and, at the end, the mentioned error message.
Obviously, your x[n + 1] access fails: n is 4 there, so n + 1 is 5, one too many for indexing the list x, which has length 5 and thus indices from 0 to 4.
EDIT: Your psi_j() gives me the same behaviour.
Let me continue guessing: Whatever we want to do, we have to ensure that n + 1 stays below len(x). So maybe a
for n2 in range(0, (len(x) - 1) * j):
would be helpful. It only produces the numbers 0..11, but I think this is the only thing that can be expected of it: the last items can only be
3*3 + 0*(4-3)
3*3 + 1*(4-3)
3*3 + 2*(4-3)
and stop. And this is achieved with the limit I mention here.
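For completeness, here is a small sketch that combines the two suggestions above: keep the r == 0 branch, and tighten the loop bound as in the last answer but extended by one step so that the final term j * x[-1], which the question asks about, is still produced. It is written for Python 3, where the integer division that / performed in the original Python 2 code has to be spelled // (divmod below does both divisions at once):
def psi_j(x, j):
    rtn = []
    for n2 in range((len(x) - 1) * j + 1):  # last n2 is (len(x) - 1) * j, where r == 0
        n, r = divmod(n2, j)                # n = n2 // j, r = n2 % j
        if r == 0:
            rtn.append(j * x[n])            # no x[n + 1] needed here
        else:
            rtn.append(j * x[n] + r * (x[n + 1] - x[n]))
    return rtn

print(psi_j([0, 1, 2, 3, 4], 3))            # [0, 1, 2, ..., 12]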