Difference between . and 1 in glm - regression

I want to perform a regression, and first of all I want to test whether my null model is significant.
If it is not, I won't be able to perform the stepwise introduction of variables according to AIC.
So I did the following:
m0 <- glm(Y~ 1, data = Data, family = binomial)
summary(m0)
I have seen on the internet that some people use this code:
model <- glm(Y ~ .,data = Data, family=binomial)
summary(model)
What I want to know is what's the difference between the dot (.) and the 1.
Thank you :)

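Y, on the left of the ~, is the dependent variable; the terms on the right are the independent variables. The . is shorthand for all the other variables in the data.
Y ~ 1 means model Y with an intercept only (the null model)
Y ~ val means model Y with val
Y ~ . means model Y against all the other variables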
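To make the distinction concrete outside R, here is a minimal Python sketch using only the standard library. The toy data and the helper design_columns are hypothetical; the helper only mimics which columns each right-hand side would select, and the last step relies on the standard result that an intercept-only binomial model fits the logit of the sample mean of Y.

```python
import math

# Hypothetical toy data: Y is the 0/1 response, the rest are predictors.
data = {
    "Y":   [1, 0, 1, 1, 0, 1, 0, 1],
    "age": [23, 35, 41, 29, 52, 37, 48, 31],
    "bmi": [21.0, 27.5, 24.1, 22.3, 30.2, 25.0, 28.4, 23.7],
}

def design_columns(rhs, data, response="Y"):
    """Return the names of the columns a formula right-hand side implies."""
    if rhs == "1":   # intercept only: the null model
        return ["(Intercept)"]
    if rhs == ".":   # intercept plus every column except the response
        return ["(Intercept)"] + [name for name in data if name != response]
    raise ValueError("this sketch only handles '1' and '.'")

null_terms = design_columns("1", data)
full_terms = design_columns(".", data)

# For the null binomial model the fitted intercept is logit(mean(Y)).
p_hat = sum(data["Y"]) / len(data["Y"])
null_intercept = math.log(p_hat / (1 - p_hat))

print(null_terms)
print(full_terms)
print(round(null_intercept, 4))
```

So the null model has a single coefficient, while the dot model has one coefficient per remaining column in addition to the intercept.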
Meaning of dot in lm(y~.) in R

Problem with combining ForwardDiff and PyPlot in Julia 1.0.5

In Julia 1.0.5, I have a function f(x::Vector{<:Real}), defined as
f(x::Vector{<:Real}) = (x[1] - 2)^2 + ( x[2] - 1 )^2
The signature is like this, because I would like to use it with the ForwardDiff package, and it works with it just fine. I give the function to ForwardDiff.gradient, and everything works like a charm.
However, I would also like to do some visualizations with PyPlot, using this same function f. Namely, I would like to draw its contour with contourf. For this purpose, I have constructed two vectors X::Vector{<:Real} and Y::Vector{<:Real}, and would like to call the same function f with them to produce the contour.
However, making the call f.([X, Y]) is not broadcasting the vectors as I would like, as I get the error
LoadError: MethodError: no method matching (::getfield(Main, Symbol("#f#1044")))(::Int64)
Closest candidates are:
f(!Matched::Array{#s25,1} where #s25<:Real)
This of course prevents me from using the contourf function, as it needs the values of f on a 2D-grid.
Do I need to define an entirely different f(x::Vector{<:Real}, y::Vector{<:Real}) to be able to plot the contour as I would like, or is there an alternative where I can avoid this?
This problem can be resolved by the power of multiple dispatch:
f(x::Vector{<:Real}) = (x[1] - 2)^2 + ( x[2] - 1 )^2
f(x::Real,y::Real) = f([x,y])
nx = 10
ny = 20
X = rand(nx) #mesh of x points
Y = rand(ny) #mesh of y points
Z = f.(transpose(X),Y) #ny x nx matrix: one row per element of Y, one column per element of X
For the two-argument gradient:
two_point_gradient(f,x,y) = ForwardDiff.gradient(f,[x,y])
G = two_point_gradient.(f,transpose(X),Y) #returns a matrix of gradient vectors, where G[j,i] = gradient(f,[X[i],Y[j]])
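The same idea, a two-argument wrapper evaluated over every (x, y) pair, can be sketched in plain Python. This is only an illustration of the grid layout, not Julia code: f_xy and grad_f are names introduced here, and grad_f is a hand-derived gradient standing in for ForwardDiff.gradient.

```python
def f(v):
    """Same objective as above, taking one point as a 2-element sequence."""
    return (v[0] - 2) ** 2 + (v[1] - 1) ** 2

def f_xy(x, y):
    """Two-argument wrapper, playing the role of f(x::Real, y::Real)."""
    return f([x, y])

def grad_f(v):
    """Hand-derived gradient of f (stands in for ForwardDiff.gradient)."""
    return [2 * (v[0] - 2), 2 * (v[1] - 1)]

X = [0.0, 1.0, 2.0]        # nx = 3 x-coordinates
Y = [0.0, 0.5, 1.0, 1.5]   # ny = 4 y-coordinates

# One row per y value and one column per x value: ny rows, nx columns.
Z = [[f_xy(x, y) for x in X] for y in Y]
G = [[grad_f([x, y]) for x in X] for y in Y]

print(len(Z), len(Z[0]))
print(Z[2][2], G[2][2])
```

In this sketch the rows follow Y and the columns follow X, which is the layout a contour routine typically expects for a grid of heights.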

Error plotting a function of 2 variables

I am trying to plot the function
f(x, y) = (x - 3).^2 - (y - 2).^2.
x is a vector from 2 to 4, and y is a vector from 1 to 3, both with increments of 0.2. However, I am getting the error:
"Subscript indices must either be real positive integers or logicals".
What do I do to fix this error?
I think I see what you are trying to achieve. You are writing your syntax like a mathematical function definition. Matlab interprets f as a 2-dimensional array and tries to assign the value of the expression to the elements indexed by x and y. The values of x and y are not integers, so Matlab complains.
If you want to plot the output of the function (we'll call it z) as a function of x and y, you need to define the function quite differently . . .
f = @(x,y)(x-3).^2 - (y-2).^2;
x=2:.2:4;
y=1:.2:3;
z = f( repmat(x(:)',numel(y),1) , repmat(y(:),1,numel(x) ) );
surf(x,y,z);
xlabel('X'); ylabel('Y'); zlabel('Z');
This will give you an output like this . . .
The f = @(x,y) part of the first line states you want to define a function called f taking variables x and y. The rest of the line is the definition of that function.
If you want to plot z as a function of both x and y, then you need to supply all possible combinations in your range. This is what the line containing the repmat commands is for.
EDIT
There is a neat Matlab function meshgrid that can replace the repmat version of the script as suggested by @bas (welcome bas, please scroll to bas' answer and +1 it!) ...
f = @(x,y)(x-3).^2 - (y-2).^2;
x=2:.2:4;
y=1:.2:3;
[X,Y] = meshgrid(x,y);
surf(x,y,f(X,Y));
xlabel('x'); ylabel('y'); zlabel('z');
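For readers who want to see what the grid construction does without Matlab, here is a rough Python sketch using only the standard library. The meshgrid helper below is a hypothetical stand-in written for this example; it just repeats x along rows and y along columns, which is the job repmat or meshgrid does above.

```python
def f(x, y):
    return (x - 3) ** 2 - (y - 2) ** 2

# Build the ranges from integer steps: 2, 2.2, ..., 4 and 1, 1.2, ..., 3
x = [2 + 0.2 * i for i in range(11)]
y = [1 + 0.2 * i for i in range(11)]

def meshgrid(x, y):
    """Mimic meshgrid: X repeats x along rows, Y repeats y along columns."""
    X = [list(x) for _ in y]
    Y = [[yj] * len(x) for yj in y]
    return X, Y

X, Y = meshgrid(x, y)

# Evaluate f elementwise on the two grids, like f(X,Y) with .^ in Matlab.
Z = [[f(xv, yv) for xv, yv in zip(xrow, yrow)] for xrow, yrow in zip(X, Y)]

print(len(Z), len(Z[0]))
print(Z[5][5])
```

Every entry Z[j][i] is f evaluated at x[i] and y[j], so the grid covers all combinations of the two ranges.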
I typically use the MESHGRID function. Like so:
x = 2:0.2:4;
y = 1:0.2:3;
[X,Y] = meshgrid(x,y);
F = (X-3).^2-(Y-2).^2;
surf(x,y,F);
xlabel('x');ylabel('y');zlabel('f')
This is identical to the answer by @learnvst. It just does the repmat-ing for you.
Your problem is that indexing requires integers, and you are trying to index with doubles that have fractional parts. Integers cannot have decimal places. To avoid the error, you can make x and y increase in increments of 1 instead of 0.2.

SciLab Plotting

How would you plot these in SciLab or MatLab? I am new to these and have no idea how the software works. Please help.
Plot the following functions with different colors in Scilab or MatLab
– f2(x) = logn
– f3(x) = n
– f4(x) = nlogn
– f5(x) = n^2
– f6(x) = nj (j > 2)
– f7(x) = cn (c > 1)
– f8(x) = n!
where x = linspace(1, 50, 50).
Well, a lot of these are built-in functions. For example
>> x = linspace(1,50,50);
>> plot(x,log(x))
>> plot(x,x)
>> plot(x,x.*log(x))
>> plot(x,x.^2)
I don't know what nj (j > 2) and cn (c > 1) are supposed to mean.
For the last one, you should look at the function factorial.
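As a language-neutral illustration of how quickly these functions diverge, here is a small Python sketch that only tabulates the values and does not plot them. It assumes that nj and cn in the exercise stand for n^j and c^n, which the garbled notation does not confirm, and it picks j = 3 and c = 2 arbitrarily.

```python
import math

J = 3  # assumed exponent for n^j (the exercise only says j > 2)
C = 2  # assumed base for c^n (the exercise only says c > 1)

growth = {
    "log n":   lambda n: math.log(n),
    "n":       lambda n: n,
    "n log n": lambda n: n * math.log(n),
    "n^2":     lambda n: n ** 2,
    "n^j":     lambda n: n ** J,
    "c^n":     lambda n: C ** n,
    "n!":      lambda n: math.factorial(n),
}

xs = range(1, 51)  # integer counterpart of linspace(1, 50, 50)
table = {name: [fn(n) for n in xs] for name, fn in growth.items()}

for name, values in table.items():
    print(f"{name:8s} f(10) = {values[9]}")
```

Because the exponential and the factorial dwarf the other curves long before n reaches 50, a logarithmic y axis is usually needed to see all of them on one plot.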
It's not clear from the context whether you're supposed to plot them on different graphs or all on the same graph. If all on the same graph, then you can use
>> hold on;
to freeze the current axes - that means that any new lines will get drawn on top of the old ones, instead of being drawn on a fresh set of axes.
In Matlab (and probably in Scilab) you can supply a "line spec" argument to the plot function, which tells it what color and style to draw the line in. For example,
>> figure
>> hold on
>> plot(x,log(x),'b')
>> plot(x,x/10,'r')
>> plot(x,x.^2/1000,'g')
This tells Matlab to plot the function f(x)=log(x) in blue, f(x)=x/10 in red and f(x)=x^2/1000 in green, which results in this plot:
I can't comment or upvote yet but I'd add to Chris Taylor's answer that in Scilab the hold on and hold off convention isn't used. All plot commands output to the current axes, which are 'held on' all the time. If you want to generate a new figure or change the current axes you can use figure(n), where n can be any (nonconsecutive) positive integer - just a label really.
See also clf(n), gcf() and gca() - Scilab's figure handling differs quite a bit from Matlab's, though the matplotlib ATOMS module goes some way towards making Scilab look and behave more like Matlab.
In Scilab, it will be
x = 1:50;
clf
plot("ll", x,log, x,x, x,x.*log(x), x,x.^2)
gca().sub_ticks(2) = 8;
xgrid(color("grey"))
legend("$"+["ln(x)", "x", "x.ln(x)", "x^2"]+"$", "in_upper_left")

Summing Tensors

I'm implementing the system detailed in this paper.
On page 3, section 4 it shows the form that tensors take within the system:
R [ cos(2t), sin(2t); sin(2t), -cos(2t) ]
In my system, I only store R and t, since everything can be calculated from them.
However, I've got to the point where I need to sum two of these tensors (page 4, section 5.2). How can I find values for R and t after summing two tensors of this form?
I guess that's what you are looking for:
x = R_1*cos(2*t_1) + R_2*cos(2*t_2)
y = R_1*sin(2*t_1) + R_2*sin(2*t_2)
R_result = sqrt(x*x+y*y)
t_result = atan2(y,x)/2
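A quick numerical check of these formulas, written as a Python sketch with the standard library only. The helper names tensor and add_tensors and the test values are made up for this example; the check rebuilds the tensor from the resulting R and t and compares it entry by entry with the sum of the two inputs.

```python
import math

def tensor(R, t):
    """2x2 tensor R * [[cos 2t, sin 2t], [sin 2t, -cos 2t]] as nested lists."""
    c, s = R * math.cos(2 * t), R * math.sin(2 * t)
    return [[c, s], [s, -c]]

def add_tensors(R1, t1, R2, t2):
    """Return (R, t) of the sum, following the formulas above."""
    x = R1 * math.cos(2 * t1) + R2 * math.cos(2 * t2)
    y = R1 * math.sin(2 * t1) + R2 * math.sin(2 * t2)
    return math.hypot(x, y), math.atan2(y, x) / 2

R1, t1, R2, t2 = 1.5, 0.3, 0.7, 1.1   # arbitrary test values
R, t = add_tensors(R1, t1, R2, t2)

# Compare the entrywise sum with the tensor rebuilt from (R, t).
A, B, S = tensor(R1, t1), tensor(R2, t2), tensor(R, t)
max_err = max(abs(A[i][j] + B[i][j] - S[i][j])
              for i in range(2) for j in range(2))
print(R, t, max_err)
```

This works because the tensor is fully determined by the 2D vector (R cos 2t, R sin 2t), so summing tensors amounts to summing those vectors and converting back to polar form.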
Each term reduces to
R_1 trg(2 t_1) + R_2 trg(2 t_2) = R_1 trg_1 + R_2 trg_2
where trg represents either sin or cos and the indexed version takes the obvious meaning. So this is just an ordinary problem in trigonometric identities repeated a couple of times.
Let
Q = (R_1 + R_2)/2
S = (R_1 - R_2)/2
then
R_1 trg(2 t_1) + R_2 trg(2 t_2) = (Q+S) trg_1 + (Q-S) trg_2 = Q (trg_1 + trg_2) + S (trg_1 - trg_2)
which involves identities you can look up.
Sorry, adding two tensors is nothing more than algebra. The two matrices have to be the same size, and you add them term by term.
You can't just add the radii and angles and plug them back into the tensor. Do the addition properly and it'll work. Here's the first term:
R1*cos(2t1) + R2*cos(2t2) = ?
Here's the answer from Wolfram Alpha. As you can see, it doesn't simplify into a nice, neat expression with an R and a T for you.
In case you haven't thought of it, put the tensor sum into Wolfram Alpha and see what it gives you. They're better at algebra than anyone at this site. Why not get an independent check of your work?

Returning a function in a list, from a function

I searched for this question, but found answers that weren't specific enough.
I'm cleaning up old code and I'm trying to make sure that the following is relatively clean, and hoping that it won't bite me on the rear later on.
My question is about passing a function through a function. Look at the "y" part of the following plot statement. The goo(df)[[1]](x) thing works, but am I asking for trouble in any way? If so, is there a cleaner way?
Also, if the goo() function is called many many times, for instance in a Monte Carlo analysis, will this load up R's internals or possibly cause some type of environment issues?
Edit (02/21/2011) --- The following code is just an example. The real function "goo" has a lot of code before it gets to the approxfun() statement.
#Build a dataframe
df <- data.frame(a=c(1, 2, 3, 4, 5), b=c(4, 3, 1, 2, 6))
#Build a function that passes a function
goo <- function(inp.df) {
  out.fun <- approxfun(x=inp.df$a, y=inp.df$b, yright=max(inp.df$b), method="linear", f=1)
  list(out.fun, inp.df$a[5], inp.df$b[5])
}
#Set up the plot range
x <- seq(1, 4.3, 0.01)
#Plot the function
plot(x, goo(df)[[1]](x), type="l", xlim=c(0, goo(df)[[2]]), ylim=c(0, goo(df)[[3]]), lwd=2, col="red")
grid()
goo(df)
[[1]]
function (v)
.C("R_approxfun", as.double(x), as.double(y), as.integer(n),
xout = as.double(v), as.integer(length(v)), as.integer(method),
as.double(yleft), as.double(yright), as.double(f), NAOK = TRUE,
PACKAGE = "stats")$xout
<environment: 0219d56c>
[[2]]
[1] 5
[[3]]
[1] 6
It's hard to give you specific recommendations without knowing exactly what your code is, but here are a few things to consider:
Is it really necessary to include pieces of goo's input data in its return value? In other words, can you make goo a straightforward factory that just returns a function? In your example, at least, the plot function already has all the data it needs to determine the limits.
If this is not possible, then stay with this pattern, but give the elements of goo's return value descriptive names so that at least it's easy to see what's going on when you reference them. (E.g., goo(df)$approx(x).) If this structure is used widely in your code, consider making it an S3 class.
Finally, don't invoke goo(df) multiple times in the plot function, just to get different elements out. When you do that, you literally call goo every time, which as you said will execute a lot of code. Also, each invocation will have its own environment with a copy of the input data (although R will be smart enough to reduce the copying to a certain extent and use the same physical instance of df.) Instead, call goo once, assign its value to a variable, and reference that variable subsequently.
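The three suggestions above (a plain factory, named return elements, and calling the factory once) can be sketched in Python as follows. This is an illustration of the pattern rather than a translation of approxfun: the Approx tuple and the hand-written interpolator are assumptions made for this example, and the data is passed as two lists instead of a data frame.

```python
from bisect import bisect_right
from collections import namedtuple

# Named fields instead of positional [[1]], [[2]], [[3]] access.
Approx = namedtuple("Approx", ["approx", "xmax", "ymax"])

def goo(a, b):
    """Build a linear interpolator over (a, b); a must be sorted ascending."""
    def approx(x):
        if x > a[-1]:
            return max(b)        # mirrors yright = max(b) in the R code
        if x < a[0]:
            return float("nan")  # approxfun gives NA left of the data by default
        i = min(bisect_right(a, x) - 1, len(a) - 2)
        w = (x - a[i]) / (a[i + 1] - a[i])
        return b[i] + w * (b[i + 1] - b[i])
    return Approx(approx=approx, xmax=a[-1], ymax=b[-1])

# Call the factory once, then reuse the result everywhere it is needed.
res = goo([1, 2, 3, 4, 5], [4, 3, 1, 2, 6])
ys = [res.approx(1 + 0.01 * k) for k in range(331)]
print(res.xmax, res.ymax, res.approx(2.5))
```

Holding the result in one variable means the expensive setup runs a single time, and the field names make each use self-explanatory.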
I would remove a level of function handling and keep the input data out of the function generation. Then you can keep your function out of the goo and call approxfun only once.
It also generalizes to an input dataframe of any size, not just one with 5 rows.
#Build a dataframe
df <- data.frame(a=c(1, 2, 3, 4, 5), b=c(4, 3, 1, 2, 6))
#Build a function
fun <- approxfun(x = df$a, y = df$b, yright=max(df$b), method="linear", f = 1)
#Set up the plot range
x <- seq(1, 4.3, 0.01)
#Plot the function
plot(x, fun(x), type="l", xlim=c(0, max(df$a)), ylim=c(0, max(df$b)), lwd=2, col="red")
That might not be quite what you need ultimately, but it does remove a level of complexity and gives a cleaner starting point.
This might not be better in a big Monte Carlo simulation, but for simpler situations, it might be clearer to include the x and y ranges as attributes of the output from the created function instead of in a list with the created function. This way goo is a more straightforward factory, like Davor mentions. You could also make the result from your function an object (here using S3) so that it can be plotted more simply.
goo <- function(inp.df) {
  out.fun <- approxfun(x=inp.df$a, y=inp.df$b, yright=max(inp.df$b),
                       method="linear", f=1)
  xmax <- inp.df$a[5]
  ymax <- inp.df$b[5]
  # Take x as an explicit argument so the result does not depend on a global x
  function(x) {
    structure(data.frame(x=x, y=out.fun(x)),
              limits=list(x=xmax, y=ymax),
              class=c("goo","data.frame"))
  }
}
plot.goo <- function(x, xlab="x", ylab="approx",
                     xlim=c(0, attr(x, "limits")$x),
                     ylim=c(0, attr(x, "limits")$y),
                     lwd=2, col="red", ...) {
  plot(x$x, x$y, type="l", xlab=xlab, ylab=ylab,
       xlim=xlim, ylim=ylim, lwd=lwd, col=col, ...)
}
Then to make the function for a data frame, you'd do:
df <- data.frame(a=c(1, 2, 3, 4, 5), b=c(4, 3, 1, 2, 6))
goodf <- goo(df)
And to use it on a vector, you'd do:
x <- seq(1, 4.3, 0.01)
goodfx <- goodf(x)
plot(goodfx)