How to add a beta parameter to the F1 score for fit_resamples() in tidymodels

I am using the fit_resamples() function in tidymodels to get the F1 metric, as shown below.
I would like to know how to pass the beta parameter, which currently uses its default of 1.
glm_workflow %>%
  fit_resamples(resamples = trainDatFolds,
                metrics = metric_set(roc_auc, pr_auc,
                                     accuracy, f_meas),
                control = control_resamples(save_pred = TRUE)) %>%
  collect_metrics()
Thanks a lot!
Zarni

You'll need a simple wrapper for the metric that fixes beta at the value you want. See ?metric_set: its examples include one where the ccc() function is wrapped so that it can be used with a non-default argument.


Repeated-measures ANCOVA in a 2x2 mixed design

I am running a repeated-measures ANOVA in R for a 2x2 mixed design. For this I use one of the following two calls:
(1)
res.aov <- anova_test(data = datac, dv = Stress, wid = REF, between = Gruppe, within = time)
get_anova_table(res.aov)
(2)
aov <- datac %>%
  anova_test(dv = Stress, wid = REF, between = Gruppe, within = time, type = 3)
aov
Both give the same results. Now I want to add a covariate measured at the first time point, but so far I have not found a suitable R call for an ANCOVA in this repeated-measures design.
Does anyone have an idea?
Many greetings
ANOVA <- aov(Stress ~ time + covariate, data = data)
summary(ANOVA)
For a simple ANCOVA, the covariate is just added to the ANOVA formula, as in the call above. Unfortunately, I have no idea how this works in a repeated-measures design.

for loop using ggplot for longitudinal data

I am trying to visualize my longitudinal data with ggplot graphs generated in a for loop.
for (i in colnames(dat_longC)[c(3, 5:10, 14, 17:19, 30:39)]) {
  print(ggplot(dat_longC, aes(x = exam, y = i, group = zz_nr)) +
          geom_point() +
          geom_line() + xlab("Examination") + ylab(i))
}
When I use the ggplot command in a for loop, I only get a line extending between examination times. If I use the same command on a single variable, it works and gives me trajectory graphs. What do you think could be the problem?
Your problem is that i holds the column name as a string. Inside aes(), y = i is therefore treated as a constant string rather than as the column with that name, so ggplot2 does not know what you are actually trying to plot and draws every point at the same y value. Instead, you need to turn the string into a symbol, for example with !!sym(i), or look the column up with .data[[i]]. I can't really test without your data, but here is some example code to help guide you.
library(tidyverse)
map(colnames(mtcars)[2:4],
    \(x) ggplot(mtcars, aes(!!sym(x), mpg)) +
      geom_point() +
      ggtitle(x))
#> (a list of three scatter plots, one per column; plot images not shown)
Thank you very much for your reply!
I used aes_string() and quoted the variable names, and it worked:
for (i in colnames(dat_longC)[c(3, 5:10, 14, 17:19, 30:39)]) {
  print(ggplot(dat_longC, aes_string(x = "exam", y = i, group = "zz_nr")) +
          geom_point() +
          geom_line() + xlab("Examination") + ylab(i))
}

Callback for multivalue dropdown

I am pretty new to Plotly Dash and have been struggling with the callback for a multi-value dropdown, so I would really appreciate any help. I followed a tutorial and created a pie chart that is shown when a single pillar value (from my data) is selected. I would like to achieve two things:
1. The default or initial chart should show all pillars and the number of projects
2. Multi-selection of pillar values
My main issue is creating the callback for these. Thank you in advance for any help!
Here is my code:
# Imports assumed; the original post omitted them
import dash
from dash import dcc, html, Input, Output
import plotly.express as px

# df is assumed to be a pandas DataFrame with 'Pillar' and 'Project No' columns
app = dash.Dash(__name__)
all = df.Pillar.unique()
app.layout = html.Div([
    html.H1("PM dashboard"),
    dcc.Dropdown(id='pillar-choice',
                 options=[{'label': x, 'value': x}
                          for x in all],
                 value='Service Provider',
                 multi=False),
    dcc.Graph(id='my-graph',
              figure={}),
])

@app.callback(
    Output(component_id='my-graph', component_property='figure'),
    Input(component_id='pillar-choice', component_property='value')
)
def interactive_graphs(value_pillar):
    print(value_pillar)
    # Keep only the rows for the selected pillar
    dff = df[df.Pillar == value_pillar]
    fig = px.pie(data_frame=dff, names='Pillar', values='Project No')
    return fig

if __name__ == '__main__':
    app.run_server()
I think the problem here is that, once the dropdown is set to multi=True, value_pillar will be a list, so you need to do something like:
dff = df[df.Pillar.isin(value_pillar)]
And if you want to show everything by default, you'll need to check that argument against your default value and, if it matches, skip the filtering.
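To make the "skip the filtering by default" idea concrete, here is a minimal sketch of a helper that turns the dropdown value into the list of pillars to keep. The name selected_pillars and the sample pillar names other than 'Service Provider' are made up for illustration, and treating an empty selection as "show everything" is an assumption about the behaviour you want, not something Dash does for you.

```python
def selected_pillars(value, all_pillars):
    """Return the list of pillars to keep for a given dropdown value.

    Assumptions: a multi-select dropdown passes a list, a single-select
    dropdown passes one string, and an empty selection (None, [] or "")
    should mean "show every pillar".
    """
    if not value:
        # Nothing selected: do not filter at all
        return list(all_pillars)
    if isinstance(value, str):
        # Single selection: wrap it so the caller can always use isin()
        return [value]
    return list(value)


# Hypothetical pillar names, standing in for df.Pillar.unique()
all_pillars = ["Service Provider", "Enterprise", "Public Sector"]

print(selected_pillars(None, all_pillars))
print(selected_pillars("Enterprise", all_pillars))
print(selected_pillars(["Enterprise", "Public Sector"], all_pillars))
```

Inside the callback this could be used as dff = df[df.Pillar.isin(selected_pillars(value_pillar, df.Pillar.unique()))], with the dropdown's initial value left empty so that the first chart shows all pillars.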

How to get descriptive table for both continuous and categorical variables?

I want to get a descriptive table in HTML format for all variables in a data frame. For continuous variables I need the mean and standard deviation. For categorical variables I need the frequency (absolute count) and percentage of each category. The count of missing values should also be included.
Let's use this data:
data("ToothGrowth")
df <- ToothGrowth
df$len[2] <- NA
df$supp[5] <- NA
I want to get a table in HTML format that looks like this:
----------------------------------------------------------------------
Variables      N (missing)      Mean (SD) / %
----------------------------------------------------------------------
len            59 (1)           18.9 (7.65)
supp
   OJ          30               50%
   VC          29               48.33%
   NA          1                1.67%
dose           60               1.17 (0.629)
I also need to set the number of digits shown after the decimal point.
If you know a better way to display this information in HTML, please provide your solution.
Here's a programmatic way to create separate summary tables for the numeric and factor columns. Note that this doesn't report missing values in the table the way you requested, but it does ignore NAs when calculating the summary statistics, as you did. It's a starting point, anyway. From here you could combine the tables and format the headers however you want.
If you knit this code within an RMarkdown document with HTML output, kable will automatically generate the HTML table, and the document's CSS will format it nicely with horizontal rules. Note that there's also a booktabs option to kable that makes prettier tables, like the LaTeX booktabs package. Otherwise, see the documentation for knitr::kable for options.
library(dplyr)
library(tidyr)
library(knitr)
data("ToothGrowth")
df <- ToothGrowth
df$len[2] <- NA
df$supp[5] <- NA
# Numeric columns: count, mean and SD per variable (NAs ignored in mean/sd)
numeric_cols <- dplyr::select_if(df, is.numeric) %>%
  gather(key = "variable", value = "value") %>%
  group_by(variable) %>%
  summarize(count = n(),
            mean = mean(value, na.rm = TRUE),
            sd = sd(value, na.rm = TRUE))
# Factor columns: count and proportion per level
factor_cols <- dplyr::select_if(df, is.factor) %>%
  gather(key = "variable", value = "value") %>%
  group_by(variable, value) %>%
  summarize(count = n()) %>%
  mutate(p = count / sum(count, na.rm = TRUE))
knitr::kable(numeric_cols)
knitr::kable(factor_cols)
I found the R package table1, which does what I want. Here is the code:
library(table1)
data("ToothGrowth")
df<-ToothGrowth
df$len[2]<-NA
df$supp[5]<-NA
table1(reformulate(colnames(df)), data=df)

R function scope

I am trying to create a function for a series of rounding tasks in R. I have percentages, decimals, etc., and each of them needs to be rounded differently.
I started writing the function by first picking up the objects I want, but I already fail at that step. Here's the code:
roundings <- function(obj.head) {
  obj.list <- ls(pattern = obj.head)
  obj.list
}
Suppose I have two objects, A1 and B1. I expected that running roundings("A") would return A1 as the function output, but it didn't.
What have I done wrong? Thanks.
The call to ls is searching the current environment within the function and does not find any objects to match. You can specify the envir parameter in ls with .GlobalEnv. Thus your code becomes:
roundings <- function(obj.head) {
  obj.list <- ls(pattern = obj.head, envir = .GlobalEnv)
  obj.list
}
I found the reason: I should have added envir = .GlobalEnv to the ls() arguments.