How to figure out which column names are illegal in ranger? - tidymodels

Here is a ranger call:
rf_fit <-
rf_mod %>%
fit(my_outcome_factor ~ ., data = data_train)
and the output:
Error in parse.formula(formula, data, env = parent.frame()) :
Error: Illegal column names in formula interface. Fix column names or use alternative interface in ranger.
How can I tell which columns are illegal?
I tried setting up a function foo() to debug:
foo <- function() {
browser()
ranger:::parse.formula(my_outcome_factor ~ ., data = data_train)
}
This function didn't help me much, because I don't know how to get a breakpoint in the right spot.
It's ranger version 0.12.1.
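One way to find the offending columns, sketched under an assumption: ranger's formula check appears to reject column names that are not syntactically valid R names, meaning names that make.names() would change. The helper find_illegal_names() and the toy data below are hypothetical stand-ins for your data_train. Treat this as a diagnostic sketch, not as ranger's actual check.

```r
# Hypothetical helper: return column names that are not syntactically valid
# (or are duplicated) R names. ASSUMPTION: these are the names ranger's
# formula interface rejects.
find_illegal_names <- function(df) {
  nms <- names(df)
  nms[nms != make.names(nms, unique = TRUE)]
}

# Toy stand-in for data_train; check.names = FALSE keeps the bad names as-is
toy_train <- data.frame(
  my_outcome_factor = factor(c("a", "b", "a")),
  `2nd col` = c(1, 2, 3),
  `x-y` = c(4, 5, 6),
  check.names = FALSE
)

bad <- find_illegal_names(toy_train)
print(bad)

# One possible fix: replace every name with its make.names() version
names(toy_train) <- make.names(names(toy_train), unique = TRUE)
print(names(toy_train))
```

Renaming with make.names() changes the names you later refer to (here "2nd col" becomes "X2nd.col"), so it is worth doing once, before building the model.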

Related

How to loop over the compareGroups function in R

I have a large dataset and I want to apply the compareGroups function in R. The dataset has a number of grouping variables that I want to compare the other variables on, so I wanted to loop over these grouping variables. However, the function does not accept the loop:
vars <- c("fibrosis_stage", "Steatosis_stage", "patient_classification")
for (var in vars){
model <- compareGroups(var~.,data = data)
result.model <- createTable(model)
export2xls(result.model,paste0(var,"comparisons.xlsx"))
}
but I got the following error:
Error in model.frame.default(formula = var ~ ., data = list(Gender = c(2L, :
variable lengths differ
I even tried wrapping it in a function with the grouping variable as input, but I got an error there too.
Can anyone help?
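A likely cause: in var ~ ., the name var is taken literally as a symbol, so model.frame() finds the loop variable (a single character string) instead of a data column, and a length-1 value next to full-length columns produces "variable lengths differ". A minimal sketch of a fix, assuming compareGroups() accepts any formula object: build the formula from the string with as.formula(). The helper make_group_formula() and the toy data are hypothetical, and the check below uses base R's model.frame() rather than compareGroups itself.

```r
# Hypothetical helper: turn a column name stored as a string into a formula,
# so the left-hand side is e.g. fibrosis_stage rather than the symbol `var`
make_group_formula <- function(v) as.formula(paste(v, "~ ."))

vars <- c("fibrosis_stage", "Steatosis_stage", "patient_classification")
formulas <- lapply(vars, make_group_formula)
print(vapply(formulas, deparse, character(1)))

# Toy check: model.frame() now resolves the response column by name
toy <- data.frame(fibrosis_stage = c(1, 2, 1), age = c(40, 50, 60))
mf <- model.frame(make_group_formula("fibrosis_stage"), data = toy)
print(names(mf))
```

Inside the loop the call would then become model <- compareGroups(make_group_formula(var), data = data). Note that . still includes the other grouping variables; if you don't want them among the compared variables, you could drop them from data first (untested here).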

error: element number 2 undefined in return list (I'm new to this, please help)

x = fopen('pm10_data.txt');
fseek(x, 8,0);
dat = fscanf (x,'%f',[2,1000]);
dat = transpose(dat);
a = dat(:,1);
b = dat(:,2);
[r,p] = cor_test (a,b)
fclose(x);
r
p
This is what I got:
r =
scalar structure containing the fields:
method = Pearson's product moment correlation
params = 76
stat = 6.2156
dist = t
pval = 2.5292e-08
alternative = !=
Run error
error: element number 2 undefined in return list
error: called from
tester.octave at line 7 column 6
Presumably you're referring to the cor_test function from the statistics package, even though you don't show loading this in your workspace.
According to the documentation of cor_test:
The output is a structure with the following elements:
PVAL The p-value of the test.
STAT The value of the test statistic.
DIST The distribution of the test statistic.
PARAMS The parameters of the null distribution of the test statistic.
ALTERNATIVE The alternative hypothesis.
METHOD The method used for testing.
If no output argument is given, the p-value is displayed.
This seems to be what you're getting too.
If you want the p-value explicitly from that structure, you can access it as r.pval.
The syntax [a, b, ...] = functionname( args, ... ) expects the function to return more than one output, and captures all the returned outputs into the named variables (i.e. a, b, etc.).
In this case, cor_test only returns a single argument, even though that argument is a struct (which means it has fields you can access).
The error you're getting effectively means you requested a second output argument p, but the function you're using does not return a second output argument. It only returns that struct you already captured in r.

How to read value of property depending on an argument

How can I get the value of a property given its name as a string argument?
I have a CsvProvider.Row object which has properties a, b and c.
I want to get a property's value based on the property name, given as a string argument.
I tried something like this:
let getValue (tuple, name: string) =
snd tuple |> Seq.averageBy (fun (y: CsvProvider<"s.csv">.Row) -> y.```name```)
but it gives me the following error:
Unexpected reserved keyword in lambda expression. Expected incomplete
structured construct at or before this point or other token.
A simple invocation of the function should look like this:
getValue(tuple, "a")
and it should be equivalent to the following function:
let getValue (tuple) =
snd tuple |> Seq.averageBy (fun (y: CsvProvider<"s.csv">.Row) -> y.a)
Is something like this even possible?
Thanks for any help!
The CSV type provider is great if you are accessing data by column names statically, because you get nice auto-completion with type inference and checking.
However, for a dynamic access, it might be easier to use the underlying CsvFile (also a part of F# Data) directly, rather than using the type provider:
open FSharp.Data

// Read the given file
let file = CsvFile.Load("c:/test.csv")
// Look at the parsed headers and find the index of column "A"
let aIdx = file.Headers.Value |> Seq.findIndex (fun k -> k = "A")
// Iterate over rows and print A values
for r in file.Rows do
printfn "%A" (r.Item(aIdx))
The only unfortunate thing is that the items are accessed by index, so you need to build some lookup table if you want to easily access them by their name.

Ignore NA's in sapply function

I am using R and have searched around for an answer but while I have seen similar questions, it has not worked for my specific problem.
In my data set I am using the NAs as placeholders because I will return to them once part of my analysis is done. Until then, I would like to do all my calculations as if the NAs weren't there.
Here's my issue, with an example data table:
ROCA = c(1,3,6,2,1,NA,2,NA,1,NA,4,NA)
ROCA <- data.frame (ROCA=ROCA) # converting it just because that is the format of my original data
#Now my function
exceedes <- function (L=NULL, R=NULL, na.rm = T)
{
if (is.null(L) | is.null(R)) {
print ("mycols: invalid L,R.")
return (NULL)
}
test <-(mean(L, na.rm=TRUE)-R*sd(L,na.rm=TRUE))
test1 <- sapply(L,function(x) if((x)> test){1} else {0})
return (test1)
}
L=ROCA[,1]
R=.5
ROCA$newcolumn <- exceedes(L,R)
names(ROCA)[names(ROCA)=="newcolumn"]="Exceedes1"
I am getting the error:
Error in if ((x) > test) { : missing value where TRUE/FALSE needed
I assume something is wrong with the sapply call. Any ideas on how to ignore those NAs? I would try na.omit if I could then put all the NAs back exactly where they were, but I am not sure how to do that.
There's no need for sapply and your anonymous function because > is already vectorized.
It also seems really odd to specify default argument values that are invalid. My guess is that you're using that as a kludge instead of using the missing function. It's also good practice to throw an error rather than return NULL, because otherwise the caller still has to check whether the function returned NULL.
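exceedes <- function (L, R, na.rm=TRUE)
{
if(missing(L) || missing(R)) {
stop("L and R must be provided")
}
# pass na.rm through instead of ignoring the argument
test <- mean(L,na.rm=na.rm)-R*sd(L,na.rm=na.rm)
# > is vectorized; NA in L stays NA in the result
as.numeric(L > test)
}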
ROCA <- data.frame(ROCA=c(1,3,6,2,1,NA,2,NA,1,NA,4,NA))
ROCA$Exceeds1 <- exceedes(ROCA[,1],0.5)
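A self-contained run of the vectorized approach on the question's data, to show that the NAs stay in their original rows (so no na.omit round trip is needed) and that leaving out an argument now fails loudly. The function mirrors the answer's exceedes(), here passing na.rm through to mean() and sd().

```r
exceedes <- function(L, R, na.rm = TRUE) {
  if (missing(L) || missing(R)) {
    stop("L and R must be provided")
  }
  test <- mean(L, na.rm = na.rm) - R * sd(L, na.rm = na.rm)
  as.numeric(L > test)  # NA in L gives NA in the result
}

ROCA <- data.frame(ROCA = c(1, 3, 6, 2, 1, NA, 2, NA, 1, NA, 4, NA))
ROCA$Exceeds1 <- exceedes(ROCA[, 1], 0.5)
print(ROCA)

# NA rows in the new column match NA rows in the input
print(identical(which(is.na(ROCA$Exceeds1)), which(is.na(ROCA$ROCA))))

# Omitting R raises an error instead of silently returning NULL
err <- tryCatch(exceedes(ROCA[, 1]), error = function(e) conditionMessage(e))
print(err)
```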
This statement is strange:
test1 <- sapply(L,function(x) if((x)> test){1} else {0})
Try:
test1 <- ifelse(is.na(L), NA, ifelse(L > test, 1, 0))
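A small check on the question's data suggests the outer is.na() test is optional: ifelse() already returns NA wherever its condition is NA, so a single ifelse() gives the same vector, with the NAs in the original positions.

```r
L <- c(1, 3, 6, 2, 1, NA, 2, NA, 1, NA, 4, NA)
test <- mean(L, na.rm = TRUE) - 0.5 * sd(L, na.rm = TRUE)

nested <- ifelse(is.na(L), NA, ifelse(L > test, 1, 0))
# ifelse() propagates NA from its condition on its own
single <- ifelse(L > test, 1, 0)

print(identical(nested, single))
```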
Do you want NAs in the result? That is, do you want the rows to line up?
If so, just returning L > test seems like it would work. Adding the column can be simplified too (I suspect "Exceedes1" is stored in a variable somewhere).
exceedes <- function (L=NULL, R=NULL, na.rm = T)
{
if (is.null(L) | is.null(R)) {
print ("mycols: invalid L,R.")
return (NULL)
}
test <-(mean(L, na.rm=TRUE)-R*sd(L,na.rm=TRUE))
L > test
}
L=ROCA[,1]
R=.5
ROCA[["Exceedes1"]] <- exceedes(L,R)

Passing a filepath to a R function?

I tried to pass a file path to a function in R, but it failed =/ I hope someone here can help me.
>heat <- function(filepath)
{ chicks <- read.table(file=filepath, dec=",", header=TRUE, sep="\t")
...
}
When I call the function, nothing happens...
>heat("/home/.../file.txt")
... and "chicks" is not found
>chicks
Error: Object 'chicks' not found
What is the correct way to pass a path to a function?
You should be able to pass file paths the way you have (if the file exists). You can also list file paths in R with list.files() (use the argument full.names=TRUE). In this case, however, you cannot see chicks because it is local to the function, so it is not visible outside the function. Furthermore, when the last expression is an assignment, its value is returned invisibly, so nothing is printed. Try
> heat <- function(filepath) {
+ read.table(file=filepath, dec=",", header=TRUE, sep="\t")
+ }
> heat("/home/.../file.txt")
or
> chicks <- heat("/home/.../file.txt")
> chicks
and you should see chicks. Or if you want to see it printed while assigning, add parentheses around the statement:
> (chicks <- heat("/home/.../file.txt"))
If you want to assign to chicks within the function but still see it after the function has completed:
> heat <- function(filepath) {
+ chicks <- read.table(file=filepath, dec=",", header=TRUE, sep="\t")
+ assign("chicks",chicks,globalenv())
+ }
The function can't know what you want as its output. If you don't specify it, the output will be the value of the last evaluated expression, which may not always be what you want. Use return() to specify what should come out as an object.
heat <- function(filepath) {
chicks <- read.table(file=filepath, dec=",", header=TRUE, sep="\t")
...
return(chicks)
}
inpt <- heat("/.../file.txt")
Does this help with your problem?
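A small self-contained check of the points above. It writes a hypothetical two-column, tab-separated file with decimal commas to a temporary path, reads it through a version of heat() that returns the data frame, and confirms that chicks never appears in the global environment. The file contents are made up for illustration.

```r
heat <- function(filepath) {
  chicks <- read.table(file = filepath, dec = ",", header = TRUE, sep = "\t")
  return(chicks)  # the returned value is what the caller receives
}

# Hypothetical sample file in the same format as the question's data
tmp <- tempfile(fileext = ".txt")
writeLines(c("weight\ttemp", "1,5\t37,2", "2,0\t38,1"), tmp)

result <- heat(tmp)
print(result)

# `chicks` existed only inside heat(), never in the global environment
print(exists("chicks", envir = globalenv(), inherits = FALSE))
```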
Also, when working with paths, it is often helpful to test whether the file/folder exists:
heat <- function(filepath){
if(!file.exists(filepath)){
stop(sprintf("Filepath %s does not exist",filepath))
}
...
}
In the example above, however, read.table will give an error message if the file does not exist.
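A sketch of the explicit check in action, using a path under tempdir() that is assumed not to exist. The benefit over relying on read.table alone is a single error whose message names the missing path; read.table itself fails with a more generic "cannot open the connection" error plus a warning.

```r
heat <- function(filepath) {
  if (!file.exists(filepath)) {
    stop(sprintf("Filepath %s does not exist", filepath))
  }
  read.table(file = filepath, dec = ",", header = TRUE, sep = "\t")
}

# Assumed-missing file, used only to trigger the check
missing_path <- file.path(tempdir(), "no_such_file.txt")
msg <- tryCatch(heat(missing_path), error = function(e) conditionMessage(e))
print(msg)
```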