How do workflows in TidyModels apply recipe steps to CV folds?

To control for a minority positive class for the project I'm working on, I'm implementing step_downsample() in my recipe. I'm also using 10-fold cross-validation to mitigate bias. When I use a workflow to wrap up the learner, recipe, a grid search, and the CV folds, does the workflow apply the recipe steps to each individual fold prior to model training? The order of operations is hazy to me and I wasn't able to find any satisfactory answers in the documentation. Thanks!

I think you might find this chapter helpful, especially the section "Where does the model begin and end?".
Yes: in tidymodels, the preprocessing recipe (i.e., the feature engineering procedure) is considered part of the modeling process, so it is estimated on the analysis set of each resample, just like the model itself.
You can see this happening in the logging if you set verbose = TRUE during tuning:
library(tidymodels)
library(themis)
#> 
#> Attaching package: 'themis'
#> The following objects are masked from 'package:recipes':
#> 
#>     step_downsample, step_upsample

data(Ionosphere, package = "mlbench")

svm_mod <-
  svm_rbf(cost = tune(), rbf_sigma = tune()) %>%
  set_mode("classification") %>%
  set_engine("kernlab")

iono_rec <-
  recipe(Class ~ ., data = Ionosphere) %>%
  # remove any zero variance predictors
  step_zv(all_predictors()) %>%
  # remove any linear combinations
  step_lincomb(all_numeric()) %>%
  step_downsample(Class)

set.seed(123)
iono_rs <- bootstraps(Ionosphere, times = 5)

set.seed(325)
svm_mod %>%
  tune_grid(
    iono_rec,
    resamples = iono_rs,
    control = control_grid(verbose = TRUE)
  )
#> i Bootstrap1: preprocessor 1/1
#> ✓ Bootstrap1: preprocessor 1/1
#> i Bootstrap1: preprocessor 1/1, model 1/10
#> ✓ Bootstrap1: preprocessor 1/1, model 1/10
#> i Bootstrap1: preprocessor 1/1, model 1/10 (predictions)
#> i Bootstrap1: preprocessor 1/1, model 2/10
#> ✓ Bootstrap1: preprocessor 1/1, model 2/10
#> i Bootstrap1: preprocessor 1/1, model 2/10 (predictions)
#> i Bootstrap1: preprocessor 1/1, model 3/10
#> ✓ Bootstrap1: preprocessor 1/1, model 3/10
#> i Bootstrap1: preprocessor 1/1, model 3/10 (predictions)
#> i Bootstrap1: preprocessor 1/1, model 4/10
#> ✓ Bootstrap1: preprocessor 1/1, model 4/10
#> i Bootstrap1: preprocessor 1/1, model 4/10 (predictions)
#> i Bootstrap1: preprocessor 1/1, model 5/10
#> ✓ Bootstrap1: preprocessor 1/1, model 5/10
#> i Bootstrap1: preprocessor 1/1, model 5/10 (predictions)
#> i Bootstrap1: preprocessor 1/1, model 6/10
#> ✓ Bootstrap1: preprocessor 1/1, model 6/10
#> i Bootstrap1: preprocessor 1/1, model 6/10 (predictions)
#> i Bootstrap1: preprocessor 1/1, model 7/10
#> ✓ Bootstrap1: preprocessor 1/1, model 7/10
#> i Bootstrap1: preprocessor 1/1, model 7/10 (predictions)
#> i Bootstrap1: preprocessor 1/1, model 8/10
#> ✓ Bootstrap1: preprocessor 1/1, model 8/10
#> i Bootstrap1: preprocessor 1/1, model 8/10 (predictions)
#> i Bootstrap1: preprocessor 1/1, model 9/10
#> ✓ Bootstrap1: preprocessor 1/1, model 9/10
#> i Bootstrap1: preprocessor 1/1, model 9/10 (predictions)
#> i Bootstrap1: preprocessor 1/1, model 10/10
#> ✓ Bootstrap1: preprocessor 1/1, model 10/10
#> i Bootstrap1: preprocessor 1/1, model 10/10 (predictions)
#> (the same preprocessor and model logging repeats for Bootstrap2 through Bootstrap5)
#> # Tuning results
#> # Bootstrap sampling
#> # A tibble: 5 x 4
#> splits id .metrics .notes
#> <list> <chr> <list> <list>
#> 1 <split [351/136]> Bootstrap1 <tibble [20 × 6]> <tibble [0 × 1]>
#> 2 <split [351/124]> Bootstrap2 <tibble [20 × 6]> <tibble [0 × 1]>
#> 3 <split [351/134]> Bootstrap3 <tibble [20 × 6]> <tibble [0 × 1]>
#> 4 <split [351/130]> Bootstrap4 <tibble [20 × 6]> <tibble [0 × 1]>
#> 5 <split [351/138]> Bootstrap5 <tibble [20 × 6]> <tibble [0 × 1]>
Created on 2021-03-10 by the reprex package (v1.0.0)
In this particular case, tuning walks through each resample: it first trains the recipe on that resample's analysis set, then fits the model for the first tuning parameter combination, then evaluates predictions on the held-out assessment set for that resample and parameter combination, and then moves on to the next parameter combination.
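The same behaviour holds when you bundle everything in a workflow(). Here is a minimal sketch (not run in the reprex above), reusing svm_mod, iono_rec, and iono_rs:

svm_wf <-
  workflow() %>%
  add_recipe(iono_rec) %>%
  add_model(svm_mod)

set.seed(325)
svm_res <- tune_grid(
  svm_wf,
  resamples = iono_rs,
  control = control_grid(verbose = TRUE)
)

For each resample, the recipe (including step_downsample()) is re-estimated on the analysis set before the model is fit; by default step_downsample() is skipped when the assessment set is processed, so held-out rows are never dropped.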

Related

How to apply sqrt to vector in Cython?

Hello, I'm really a beginner with Cython and C-based languages.
I'm having trouble taking the square root of each element of a vector.
I have a vector (each value is of double type):
x = [1, 4, 9]
and I want to get:
y = [1, 2, 3]
How can I get this vector?
A solution I thought of is:
cdef floating[::1] y = x
for i in range(length):
    y[i] = x[i] ** 0.5
But done this way it's too slow. I want to accelerate it.
Can I use the sqrt or square function from libc.math in this case?
Edit:
If I want to take the 1/3 root of a vector (like [1, 8, 27] -> [1, 2, 3]), what function should I use instead of sqrt?
Quick win
First you should check if your function is already implemented in Numpy. If so, it will probably be a very fast (C/C++) implementation.
This is the case for your function:
import numpy as np
x = np.array([1, 4, 9])
y = np.sqrt(x)
#> array([1., 2., 3.])
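For the 1/3 root asked about in the edit, NumPy also has a vectorized function, so no explicit loop is needed:
import numpy as np
x = np.array([1.0, 8.0, 27.0])
y = np.cbrt(x)   # element-wise cube root; x ** (1/3) also works for non-negative values
#> array([1., 2., 3.])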
With Numpy arrays
Alternatively (following #joni's comment), you can use np arrays just for the input/output and compute the function element-wise using C/C++:
cimport numpy as cnp
import numpy as np
from libc.math cimport sqrt
cpdef cnp.ndarray[double, ndim=1] cy_sqrt_np(cnp.ndarray[double, ndim=1] x):
    cdef Py_ssize_t i, l = x.shape[0]
    cdef cnp.ndarray[double, ndim=1] y = np.empty(l)
    for i in range(l):
        y[i] = sqrt(x[i])
    return y
With C++ vectors
Lastly, here is a possible implementation with C++ vectors and automatic conversion from/to python lists:
from libc.math cimport sqrt
from libcpp.vector cimport vector
cpdef vector[double] cy_sqrt_vec(vector[double] x):
    cdef Py_ssize_t i, l = x.size()
    cdef vector[double] y
    y.reserve(l)
    for i in range(l):
        y.push_back(sqrt(x[i]))
    return y
Some things to keep in mind in this case and the previous:
We initialize the y vector to be empty, and then allocate space for it with reserve(). According to SO this seems to be a good option.
We use a typed i in the for loop, and use push_back to assign new values.
We use sqrt from libc.math to avoid using Python code inside the loop.
We type the input of the function to be vector[double]. This automatically adds convenient type conversions from other python types (e.g., list of ints).
Time comparison
We define a random input x to avoid cached results polluting our measures:
%%timeit -n 10000 -r 7 x = gen_x()
y = np.sqrt(x)
#> executed in 177ms, finished 16:16:57 2022-04-19
#> 2.3 µs ± 241 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit -n 10000 -r 7 x = gen_x()
y = x**.5
#> executed in 194ms, finished 16:16:51 2022-04-19
#> 2.46 µs ± 256 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit -n 10000 -r 7 x = gen_x()
y = cy_sqrt(x)
#> executed in 359ms, finished 16:17:02 2022-04-19
#> 4.9 µs ± 274 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit -n 10000 -r 7 x = list(gen_x())
y = cy_sqrt_vec(x)
#> executed in 2.85s, finished 16:17:11 2022-04-19
#> 40.4 µs ± 1.31 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
As expected, the np.sqrt version wins. Also, the C++ vector version, with its extra allocation and list conversions, looks comparatively slower.

Rmarkdown with shiny elements - issue to render html locally

I am trying to produce a standalone html file using rmarkdown::render from an Rmd with some shiny reactive elements (without a shiny server). It is possible for me to knit the file and open the dashboard in the R preview window as well as in the browser. However, when using rmarkdown::render to create an html file, I am getting an error message:
output file: mycode.knit.md
Error: path for html_dependency not provided
Structure of the Rmd script:
---
title: "Monitoring"
author: "me"
date: Monitoring date `r format(Sys.Date(), format="%d %B %Y")`
runtime: shiny
output: html_document
---
(...)
dateRangeInput('dateRange', 'Select reference period: ',
               format = "dd-MM-yyyy", startview = "month", weekstart = 0,
               language = "en", separator = " to ", width = NULL)
(...)
renderDT({
  datatable(df(),
            filter = list(position = 'top', clear = TRUE),
            escape = FALSE,
            rownames = FALSE,
            extensions = 'Buttons',
            options = list(
              dom = 'Bfrtip',
              buttons = c('copy', 'excel', 'pdf'),
              paging = FALSE,
              columnDefs = list(list(width = '500px', targets = c("Bank")),
                                list(className = "dt-head-center dt-center", targets = "_all"))
            ))
})
(...)
df_r <- reactive(df, ...)
renderPlotly({
  plot_ly(df_r(),
          y = ~reorder(id, -share),
          x = ~share,
          type = 'bar',
          marker = list(color = '#0a00b6')) %>%
    layout(title = 'Title')
})
R script I am running:
rmarkdown::render('my_script.Rmd',
                  output_file = "output.html",
                  output_dir = 'C:/my_folder')
Session info:
- Session info -------------------------------------------------------------------------------------------------
setting value
version R version 3.5.0 (2018-04-23)
os Windows 7 x64 SP 1
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United Kingdom.1252
ctype English_United Kingdom.1252
tz Europe/Berlin
date 2020-04-14
- Packages -----------------------------------------------------------------------------------------------------
package * version date lib source
assertthat 0.2.0 2017-04-11 [2] CRAN (R 3.5.0)
backports 1.1.4 2019-04-10 [1] CRAN (R 3.5.3)
broom 0.4.4 2018-03-29 [2] CRAN (R 3.5.0)
Cairo 1.5-10 2019-03-28 [1] CRAN (R 3.5.3)
callr 3.2.0 2019-03-15 [1] CRAN (R 3.5.3)
cellranger 1.1.0 2016-07-27 [2] CRAN (R 3.5.0)
cli 1.1.0 2019-03-19 [1] CRAN (R 3.5.3)
colorspace 1.3-2 2016-12-14 [2] CRAN (R 3.5.0)
crayon 1.3.4 2017-09-16 [2] CRAN (R 3.5.0)
crosstalk 1.0.0 2016-12-21 [2] CRAN (R 3.5.0)
curl 3.2 2018-03-28 [2] CRAN (R 3.5.0)
data.table * 1.11.0 2018-05-01 [2] CRAN (R 3.5.0)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.5.3)
devtools 2.0.2 2019-04-08 [1] CRAN (R 3.5.3)
digest 0.6.15 2018-01-28 [2] CRAN (R 3.5.0)
dplyr * 0.8.3 2019-07-04 [1] CRAN (R 3.5.3)
DT * 0.13 2020-03-23 [1] CRAN (R 3.5.3)
ellipsis 0.2.0.1 2019-07-02 [1] CRAN (R 3.5.3)
evaluate 0.14 2019-05-28 [1] CRAN (R 3.5.3)
fansi 0.4.0 2018-10-05 [1] CRAN (R 3.5.3)
farver 2.0.3 2020-01-16 [1] CRAN (R 3.5.3)
fastmap 1.0.1 2019-10-08 [1] CRAN (R 3.5.3)
forcats * 0.3.0 2018-02-19 [2] CRAN (R 3.5.0)
foreign 0.8-70 2017-11-28 [2] CRAN (R 3.5.0)
fs 1.3.1 2019-05-06 [1] CRAN (R 3.5.3)
ggplot2 * 3.2.1 2019-08-10 [2] CRAN (R 3.5.3)
ggrepel * 0.8.2 2020-03-08 [1] CRAN (R 3.5.3)
glue 1.3.1 2019-03-12 [1] CRAN (R 3.5.3)
gtable 0.2.0 2016-02-26 [2] CRAN (R 3.5.0)
haven 2.1.1 2019-07-04 [1] CRAN (R 3.5.3)
hms 0.4.2 2018-03-10 [2] CRAN (R 3.5.0)
htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.5.3)
htmlwidgets 1.5.1 2019-10-08 [2] CRAN (R 3.5.3)
httpuv 1.5.2 2019-09-11 [1] CRAN (R 3.5.3)
httr 1.3.1 2017-08-20 [2] CRAN (R 3.5.0)
installr * 0.22.0 2019-08-02 [1] CRAN (R 3.5.3)
jsonlite 1.6 2018-12-07 [1] CRAN (R 3.5.3)
kableExtra * 1.1.0 2019-03-16 [1] CRAN (R 3.5.3)
knitr * 1.23 2019-05-18 [1] CRAN (R 3.5.3)
later 1.0.0 2019-10-04 [1] CRAN (R 3.5.3)
lattice 0.20-35 2017-03-25 [2] CRAN (R 3.5.0)
lazyeval 0.2.1 2017-10-29 [2] CRAN (R 3.5.0)
leaflet * 2.0.3 2019-11-16 [1] CRAN (R 3.5.3)
lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.5.3)
lubridate * 1.7.4 2018-04-11 [2] CRAN (R 3.5.0)
magrittr 1.5 2014-11-22 [2] CRAN (R 3.5.0)
markdown * 0.8 2017-04-20 [2] CRAN (R 3.5.0)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.5.3)
mime 0.5 2016-07-07 [2] CRAN (R 3.5.0)
mnormt 1.5-5 2016-10-15 [2] CRAN (R 3.5.0)
modelr 0.1.5 2019-08-08 [1] CRAN (R 3.5.3)
munsell 0.5.0 2018-06-12 [1] CRAN (R 3.5.3)
nlme 3.1-137 2018-04-07 [2] CRAN (R 3.5.0)
openxlsx 4.0.17 2017-03-23 [2] CRAN (R 3.5.0)
pillar 1.4.2 2019-06-29 [1] CRAN (R 3.5.3)
pkgbuild 1.0.3 2019-03-20 [1] CRAN (R 3.5.3)
pkgconfig 2.0.1 2017-03-21 [2] CRAN (R 3.5.0)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.5.3)
plotly * 4.9.1 2019-11-07 [1] CRAN (R 3.5.3)
plyr 1.8.4 2016-06-08 [2] CRAN (R 3.5.0)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.3)
processx 3.3.1 2019-05-08 [1] CRAN (R 3.5.3)
promises 1.1.0 2019-10-04 [1] CRAN (R 3.5.3)
ps 1.3.0 2018-12-21 [1] CRAN (R 3.5.3)
psych 1.8.12 2019-01-12 [1] CRAN (R 3.5.3)
purrr * 0.3.2 2019-03-15 [1] CRAN (R 3.5.3)
R6 2.2.2 2017-06-17 [2] CRAN (R 3.5.0)
Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.5.3)
readr * 1.1.1 2017-05-16 [2] CRAN (R 3.5.0)
readxl 1.1.0 2018-04-20 [2] CRAN (R 3.5.0)
remotes 2.1.0 2019-06-24 [1] CRAN (R 3.5.0)
reshape2 1.4.3 2017-12-11 [2] CRAN (R 3.5.0)
rio * 0.5.16 2018-11-26 [1] CRAN (R 3.5.3)
rlang 0.4.0 2019-06-25 [1] CRAN (R 3.5.3)
rmarkdown * 1.15 2019-08-21 [1] CRAN (R 3.5.0)
rprojroot 1.3-2 2018-01-03 [2] CRAN (R 3.5.0)
rsconnect 0.8.8 2018-03-09 [2] CRAN (R 3.5.0)
rstudioapi 0.7 2017-09-07 [2] CRAN (R 3.5.0)
rvest 0.3.4 2019-05-15 [1] CRAN (R 3.5.3)
scales 1.1.0 2019-11-18 [1] CRAN (R 3.5.3)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.3)
shiny * 1.4.0 2019-10-10 [1] CRAN (R 3.5.3)
shinyWidgets * 0.5.1 2020-03-04 [1] CRAN (R 3.5.3)
sparkline * 2.0 2016-11-12 [1] CRAN (R 3.5.3)
stringi 1.1.7 2018-03-12 [2] CRAN (R 3.5.0)
stringr * 1.4.0 2019-02-10 [1] CRAN (R 3.5.3)
tibble * 2.1.3 2019-06-06 [1] CRAN (R 3.5.3)
tidyr * 1.0.2 2020-01-24 [1] CRAN (R 3.5.3)
tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.5.3)
tidyverse * 1.2.1 2017-11-14 [1] CRAN (R 3.5.3)
usethis 1.5.0 2019-04-07 [1] CRAN (R 3.5.3)
utf8 1.1.3 2018-01-03 [2] CRAN (R 3.5.0)
vctrs 0.2.0 2019-07-05 [1] CRAN (R 3.5.3)
viridisLite 0.3.0 2018-02-01 [2] CRAN (R 3.5.0)
webshot 0.5.1 2018-09-28 [1] CRAN (R 3.5.3)
withr 2.1.2 2018-03-15 [2] CRAN (R 3.5.0)
xfun 0.7 2019-05-14 [1] CRAN (R 3.5.3)
xml2 1.2.2 2019-08-09 [1] CRAN (R 3.5.3)
xtable 1.8-2 2016-02-05 [2] CRAN (R 3.5.0)
yaml 2.1.19 2018-05-01 [2] CRAN (R 3.5.0)
zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.5.3)
pandoc version: 2.2.3.2
I tried adding self_contained: no and a few other solutions provided in other posts with similar issues, but unfortunately it did not fix the issue. Please let me know of any ideas that may help!
Many thanks.
Because the document uses runtime: shiny, it needs a live R session; you have to use rmarkdown::run() instead of rmarkdown::render().
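For example (a sketch, assuming the same my_script.Rmd as above):
# serve the interactive document through a live Shiny session
rmarkdown::run("my_script.Rmd")
rmarkdown::run() is what runtime: shiny requires. A truly standalone HTML file is only possible if the reactive inputs are removed or replaced with purely client-side widgets before rendering with rmarkdown::render().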

how to prevent my DNN / MLP converging to average

I want to use several available features to predict a variable. The problem is not related to vision or NLP, but I believe there are good reasons to think the variable to be predicted is a non-linear function of these features. So I just use a normal MLP like the following:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(53, 200)
        self.fc2 = nn.Linear(200, 100)
        self.fc3 = nn.Linear(100, 36)
        self.fc4 = nn.Linear(36, 1)

    def forward(self, x):
        x = F.leaky_relu(self.fc1(x))
        x = F.leaky_relu(self.fc2(x))
        x = F.leaky_relu(self.fc3(x))
        x = self.fc4(x)
        return x
net = Net().to(device)
loss_function = nn.MSELoss()
optimizer = optim.Adam(net.parameters(), lr=0.001, weight_decay= 1e-6)
def train_normal(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.to(device)
        target = target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = loss_function(output, target)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 100)
        optimizer.step()
        if batch_idx % 100 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
At first it seems to work and did learn something:
Train Epoch: 9 [268800/276316 (97%)] Loss: 0.217219
Train Epoch: 9 [275200/276316 (100%)] Loss: 0.234965
predicted actual diff
-1.18 -1.11 -0.08
0.15 -0.15 0.31
0.19 0.27 -0.08
-0.49 -0.48 -0.01
-0.05 0.08 -0.14
0.44 0.50 -0.06
-0.17 -0.05 -0.12
1.81 1.92 -0.12
1.55 0.76 0.79
-0.05 -0.30 0.26
But as it kept learning, the predictions all drifted towards roughly the same value (close to the average of the targets), regardless of the input:
predicted actual diff
-0.16 -0.06 -0.10
-0.16 -0.55 0.39
-0.13 -0.26 0.14
-0.15 0.50 -0.66
-0.16 0.02 -0.18
-0.16 -0.12 -0.04
-0.16 -0.40 0.24
-0.01 1.20 -1.21
-0.07 0.33 -0.40
-0.09 0.02 -0.10
What technique / trick can prevent this? Also, how can I increase the accuracy: should I add more hidden layers or more neurons per layer?
One possible problem is that there is nothing to learn.
Check that your data is standardized and try different learning rates (maybe even a cyclic learning rate). One thing that can happen is that the optimizer never settles into a minimum and keeps jumping around it.
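For example, a minimal sketch of standardization (X_train and X_valid here are hypothetical arrays, not from your post):
from sklearn.preprocessing import StandardScaler
import torch

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # estimate mean/std on the training split only
X_valid_std = scaler.transform(X_valid)      # reuse the training statistics on held-out data

train_x = torch.tensor(X_train_std, dtype=torch.float32)
valid_x = torch.tensor(X_valid_std, dtype=torch.float32)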
I am not sure if you are doing this already, but try starting from a standard implementation that works on another dataset and then adapt it to your problem, just to avoid small development mistakes. You can check this tutorial, How to apply Deep Learning on tabular data with FastAi, and if you are really new I would totally recommend doing this MOOC: https://course.fast.ai/. It should help you build up some understanding.
Since your data is already tabular, you can also try a classical machine learning algorithm like linear regression or gradient boosting, just to check whether your data carries any signal at all:
>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
>>> # y = 1 * x_0 + 2 * x_1 + 3
>>> y = np.dot(X, np.array([1, 2])) + 3
>>> reg = LinearRegression().fit(X, y)
>>> reg.score(X, y)
1.0
>>> reg.coef_
array([1., 2.])
>>> reg.intercept_
3.0000...
>>> reg.predict(np.array([[3, 5]]))
array([16.])
Let me know if you find the solution to your problem!

I was training the lstm network using pytorch and encountered this error

I was training the lstm network using pytorch and encountered this error in jupyter notebook.
RuntimeError Traceback (most recent call last)
<ipython-input-16-b6b1e0b8cad1> in <module>()
4
5 # train the model
----> 6 train(net, encoded, epochs=n_epochs, batch_size=batch_size, seq_length=seq_length, lr=0.001, print_every=10)
<ipython-input-14-43dc0cc515e7> in train(net, data, epochs, batch_size, seq_length, lr, clip, val_frac, print_every)
55
56 # calculate the loss and perform backprop
---> 57 loss = criterion(output, targets.view(batch_size*seq_length))
58 loss.backward()
59 # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
487 result = self._slow_forward(*input, **kwargs)
488 else:
--> 489 result = self.forward(*input, **kwargs)
490 for hook in self._forward_hooks.values():
491 hook_result = hook(self, input, result)
~\Anaconda3\lib\site-packages\torch\nn\modules\loss.py in forward(self, input, target)
902 def forward(self, input, target):
903 return F.cross_entropy(input, target, weight=self.weight,
--> 904 ignore_index=self.ignore_index, reduction=self.reduction)
905
906
~\Anaconda3\lib\site-packages\torch\nn\functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
1968 if size_average is not None or reduce is not None:
1969 reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 1970 return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
1971
1972
~\Anaconda3\lib\site-packages\torch\nn\functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
1788 .format(input.size(0), target.size(0)))
1789 if dim == 2:
-> 1790 ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
1791 elif dim == 4:
1792 ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #2 'target'
Cast the target tensor to Long (you are passing Int), as the error says: the second argument to the loss, target, must have dtype torch.long.
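For example, a one-line sketch applied to the call from your traceback:
# .long() converts the Int (int32) targets to int64 (Long), which nll_loss expects
loss = criterion(output, targets.view(batch_size * seq_length).long())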
Oh, and please provide a Minimal, Complete and Verifiable example next time you ask a question.

Clojure read CSV and split the columns into several vectors

Currently I have functions like this:
(def csv-file (.getFile (clojure.java.io/resource "datasources.csv")))

(defn process-csv [file]
  (with-open [in-file (io/reader file)]
    (doall (csv/read-csv in-file))))
What I need to do now is to produce vectors grouped by the columns of the csv, i.e. my process-csv output looks like this:
(["atom" "neutron" "photon"]
[10 22 3]
[23 23 67])
My goal is to generate 3 vectors from the columns atom, neutron & photon:
atom: [10 23]
neutron: [22 23]
photon: [3 67]
FYI, I define 3 empty vectors before reading the csv file:
(def atom [])
(def neutron [])
(def photon [])
First of all, you can't modify these vectors you've defined; that's the nature of immutable data structures. If you really need mutable vectors, use an atom.
You can solve your task this way:
user> (def items (rest '(["atom" "neutron" "photon"]
                         [10 22 3]
                         [23 23 67]
                         [1 2 3]
                         [5 6 7])))
user> (let [[atom neutron photon] (apply map vector items)]
        {:atom atom :neutron neutron :photon photon})
{:atom [10 23 1 5], :neutron [22 23 2 6], :photon [3 67 3 7]}
That is how it works:
(apply map vector items) equals the following:
(map vector [10 22 3] [23 23 67] [1 2 3] [5 6 7])
it takes the first item of each coll and makes a vector of them, then the second items, and so on.
Also, you can make it more robust by taking the column names directly from your csv data header:
user> (def items '(["atom" "neutron" "photon"]
                   [10 22 3]
                   [23 23 67]
                   [1 2 3]
                   [5 6 7]))
#'user/items
user> (zipmap (map keyword (first items))
              (apply map vector (rest items)))
{:atom [10 23 1 5], :neutron [22 23 2 6], :photon [3 67 3 7]}
I'll illustrate some other methods you could use, which can be combined with methods that leetwinski illustrates. Like leetwinski, I'll suggest using a hash map as your final structure, rather than three symbols containing vectors. That's up to you.
If you want, you can use core.matrix's transpose to do what leetwinski does with (apply map vector ...):
(require '[clojure.core.matrix :as mx])
(mx/transpose '(["atom" "neutron" "photon"] [10 22 3] [23 23 67]))
which produces:
[["atom" 10 23] ["neutron" 22 23] ["photon" 3 67]]
transpose is designed to work on any kind of matrix that implements the core.matrix protocols, and normal Clojure sequences of sequences are treated as matrices by core.matrix.
To generate a map, here's one approach:
(into {} (map #(vector (keyword (first %)) (rest %))
(mx/transpose '(["atom" "neutron" "photon"] [10 22 3] [23 23 67]))))
which produces:
{:atom (10 23), :neutron (22 23), :photon (3 67)}
keyword makes strings into keywords. #(vector ...) makes a pair, and (into {} ...) takes the sequence of pairs and makes a hash map from them.
Or if you want the vectors in vars, as you specified, then you can use a variant of leetwinski's let method. I suggest not defing the symbol atom, because that's the name of a standard function in Clojure.
(let [[adam neutron proton] (mx/transpose
                             (rest '(["atom" "neutron" "photon"]
                                     [10 22 3]
                                     [23 23 67])))]
  (def adam adam)
  (def neutron neutron)
  (def proton proton))
It's not exactly good form to use def inside a let, but you can do it. Also, I don't recommend naming the local variables defined by let with the same names as the top-level variables; as you can see, it makes the defs confusing. I did this on purpose here just to show how the scoping rule works: in (def adam adam), the first instance of "adam" represents the top-level variable that gets defined, whereas the second instance of "adam" represents the local var defined by let, containing [10 23]. The result is:
adam ;=> [10 23]
neutron ;=> [22 23]
proton ;=> [3 67]
(I think there are probably some subtleties that I'm expressing incorrectly. If so, someone will no doubt comment about it.)