Get column name in apply function - function

I am trying to make a function that makes a small report for every column in a data frame by using apply. In the report I want to use the name of the column so I have to 'extract' it somehow and that is what my question is about. How do I get the name of the column in my apply function?
Here is a simple example where I want to use the name of the column in the graph title: (for now I just hardcoded the name as 'x')
x <- c(1,1,2,2,2,3)
y <- c(2,3,4,5,4,4)
Tb <- data.frame(x,y)
Dq_Hist <- function(Tab){
Name <- 'x'
Ttl <- paste('Variable: ',Name,'')
hist(Tab,main=Ttl,col=c('grey'),xlab=Name)
}
D <- apply(Tb,MARGIN=2,FUN=Dq_Hist)

Well, if nobody answers, you have to find out yourself... I found that you can call sapply with a vector of column indices and use each index inside the function. So the solution is:
x <- c(1,1,2,2,2,3)
y <- c(2,3,4,5,4,4)
Tb <- data.frame(x,y)
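Dq_Hist <- function(i){
  # Look up the column name from its position
  Name <- colnames(Tb)[i]
  # paste0 avoids the extra spaces that paste() inserts between its arguments
  Ttl <- paste0('Variable: ', Name)
  hist(Tb[,i], main=Ttl, col=c('grey'), xlab=Name)
}
# seq_len is safe even when the data frame has zero columns
D <- sapply(seq_len(ncol(Tb)), Dq_Hist)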
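As an alternative to indexing, the values and the names can be passed side by side. The following is a minimal sketch using base R's Map; describe_column is a hypothetical helper that returns a text summary instead of drawing a histogram, so the result is easy to inspect.

```r
Tb <- data.frame(x = c(1, 1, 2, 2, 2, 3), y = c(2, 3, 4, 5, 4, 4))

# describe_column is a hypothetical helper: it receives the column values
# and the column name as two separate arguments.
describe_column <- function(values, name) {
  paste0("Variable: ", name, " (mean = ", round(mean(values), 2), ")")
}

# Map walks over the columns and their names in parallel and returns a list
# named after the columns.
reports <- Map(describe_column, Tb, names(Tb))
print(reports$x)
```

The same pattern should work with a plotting function in place of describe_column, since the name arrives as an ordinary argument.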

Related

Function for bootstrap - get sd of statistics of simulations

I am trying to write a bootstrap function (for an assignment). The question is as follows:
Compute the bootstrap standard error for:
- mean() and
- median() and
- the top quartile and
- the standard deviation of the price (yes, I want the standard deviation of the standard deviation... If this confuses you, go through the mean example and replace the computation of the mean by the_thing_we_want and realize that the_thing_we_want can be the standard deviation)
- max()
One way to approach this is to define a new function for each. Another is to write a bootstrap_func function that takes an additional argument called fun, and then you call it bootstrap_func(B, v, median) to have the same effect as bootstrap_median. Implement this function bootstrap_func.
Example call to this function: bootstrap_func(1000, vienna_data$price, mean).
This is what I attempted:
hotels_price <- read.csv("data_repo/hotels-europe/clean/hotels-europe_price.csv")
hotels_features_vienna <- read.csv("data_repo/hotels-europe/clean/hotels-europe_features.csv") %>%
filter(city == "Vienna")
hotels_vienna <- left_join(hotels_price, hotels_features_vienna) %>%
filter(across(.cols = everything(), function(x){!is.na(x)}))
set.seed(7777)
B <- 100 # B is asked to be set to 100 - the number of simulations
v <- hotels_vienna$price
get_sim <- function(v) {
sample(v, replace = TRUE)
}
fun <- function(get_sim) {
mean <- function(get_sim) {mean(get_sim)}
median <- function(get_sim) {mean(get_sim)}
top_quartile <- function(get_sim) {quantile(get_sim, 0.75)}
sd <- function(get_sim) {sd(get_sim)}
max <- function(get_sim) {max(get_sim)}
}
sim_stats <- function(B) {
replicate(B, fun(get_sim()))
}
bootstrap_func <- function(B, v, fun) {
sd(sim_stats())
}
bootstrap_func(100, hotels_vienna$price, sd)
When I try to run the last line, R throws the error "Error: C stack usage 15926432 is too close to the limit". I do not know how to fix this and, more importantly, whether my code will produce the desired results. I appreciate any input. TIA.
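A minimal sketch of one way bootstrap_func could be written, not a verified answer to the assignment. It assumes fun maps a numeric vector to a single number; prices is a hypothetical stand-in for hotels_vienna$price and top_quartile is an illustrative helper. Passing the built-in mean, median, sd and max directly avoids definitions such as sd <- function(x) {sd(x)}, which call themselves without end if they end up in the global environment. That would be consistent with the C stack error, though it is only a guess about the session.

```r
# bootstrap_func: bootstrap standard error of a statistic.
#   B   - number of bootstrap resamples
#   v   - numeric vector of observations
#   fun - function that maps a numeric vector to a single number
bootstrap_func <- function(B, v, fun) {
  # Resample v with replacement B times and apply fun to each resample
  sim_stats <- replicate(B, fun(sample(v, replace = TRUE)))
  # The standard deviation of the simulated statistics is the standard error
  sd(sim_stats)
}

# Hypothetical stand-in for hotels_vienna$price
prices <- c(81, 85, 83, 82, 103, 150, 95, 120, 99, 110)

# Illustrative helper for the top quartile (75th percentile)
top_quartile <- function(x) quantile(x, 0.75, names = FALSE)

set.seed(7777)
se_mean   <- bootstrap_func(100, prices, mean)
se_median <- bootstrap_func(100, prices, median)
se_q3     <- bootstrap_func(100, prices, top_quartile)
se_sd     <- bootstrap_func(100, prices, sd)
se_max    <- bootstrap_func(100, prices, max)
print(c(mean = se_mean, median = se_median, q3 = se_q3, sd = se_sd, max = se_max))
```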

How to get descriptive table for both continuous and categorical variables?

I want to get a descriptive table in HTML format for all variables in a data frame. For continuous variables I need the mean and standard deviation. For categorical variables I need the frequency (absolute count) and percentage of each category. The count of missing values should also be included.
Let's use this data:
data("ToothGrowth")
df<-ToothGrowth
df$len[2]<-NA
df$supp[5]<-NA
I want to get a table in HTML format that looks like this:
----------------------------------------------------------------------
Variables        N (missing)        Mean (SD) / %
----------------------------------------------------------------------
len              59 (1)             18.9 (7.65)
supp
   OJ            30                 50%
   VC            29                 48.33%
   NA            1                  1.67%
dose             60                 1.17 (0.629)
I also need to be able to set the number of digits shown after the decimal point.
If you know a better way to display this information in HTML, please provide your solution.
Here's a programmatic way to create separate summary tables for the numeric and factor columns. Note that this doesn't report the NAs in the table as you requested, but it does ignore NAs when calculating the summary stats, as you did. It's a starting point, anyway. From here you could combine the tables and format the headers however you want.
If you knit this code within an RMarkdown document with HTML output, kable will automatically generate the HTML table, and the CSS will format the table nicely with horizontal rules, as pictured below. Note that there's also a booktabs option to kable that makes prettier tables, like the LaTeX booktabs package. Otherwise, see the documentation for knitr::kable for options.
library(dplyr)
library(tidyr)
library(knitr)
data("ToothGrowth")
df<-ToothGrowth
df$len[2]<-NA
df$supp[5]<-NA
numeric_cols <- dplyr::select_if(df, is.numeric) %>%
  gather(key = "variable", value = "value") %>%
  group_by(variable) %>%
  summarize(count = n(),
            mean = mean(value, na.rm = TRUE),
            sd = sd(value, na.rm = TRUE))
factor_cols <- dplyr::select_if(df, is.factor) %>%
  gather(key = "variable", value = "value") %>%
  group_by(variable, value) %>%
  summarize(count = n()) %>%
  mutate(p = count / sum(count, na.rm = TRUE))
knitr::kable(numeric_cols)
knitr::kable(factor_cols)
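To also report missing values and control the number of decimals, one possible extension is sketched below in base R. summarise_numeric, summarise_factor and digits are hypothetical names introduced for illustration; they are not part of dplyr or knitr.

```r
data("ToothGrowth")
df <- ToothGrowth
df$len[2] <- NA
df$supp[5] <- NA

digits <- 2  # number of decimal places to show (illustrative setting)

# Non-missing count, missing count, and "mean (sd)" for one numeric column
summarise_numeric <- function(x, name) {
  data.frame(
    variable = name,
    n = sum(!is.na(x)),
    missing = sum(is.na(x)),
    mean_sd = paste0(
      formatC(mean(x, na.rm = TRUE), format = "f", digits = digits), " (",
      formatC(sd(x, na.rm = TRUE), format = "f", digits = digits), ")"
    )
  )
}

# Count and percentage per level, with NA kept as its own row when present
summarise_factor <- function(x, name) {
  counts <- table(x, useNA = "ifany")
  data.frame(
    variable = name,
    level = names(counts),
    n = as.vector(counts),
    percent = formatC(100 * as.vector(counts) / length(x),
                      format = "f", digits = digits)
  )
}

is_num <- vapply(df, is.numeric, logical(1))
is_fac <- vapply(df, is.factor, logical(1))

numeric_summary <- do.call(rbind, Map(summarise_numeric, df[is_num], names(df)[is_num]))
factor_summary  <- do.call(rbind, Map(summarise_factor, df[is_fac], names(df)[is_fac]))
rownames(numeric_summary) <- NULL
rownames(factor_summary) <- NULL

print(numeric_summary)
print(factor_summary)
```

Both results are ordinary data frames, so they can be passed to knitr::kable in the same way as the tables above.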
I found the R package table1, which does what I want. Here is the code:
library(table1)
data("ToothGrowth")
df<-ToothGrowth
df$len[2]<-NA
df$supp[5]<-NA
table1(reformulate(colnames(df)), data=df)

Displaying a vector of values wrapping at screen width [Shiny]

I have a vector of values, each associated with a name; the length of the vector changes with the user input. Although I used table-related commands, I would like to know other ways to display this kind of data, which is essentially a vector (a single row) of values with names. The problem shows up when the selected sample size produces output that is wider than the screen. Scrolling horizontally lets you glance over the data, but I am looking for a solution that wraps the data at the screen width and prints it in multiple rows without the need to scroll. Here is the code to play with:
ui <- fluidPage(
tabPanel("Test",
numericInput("samsize","specify sample size",4,1,52),
tableOutput('table')
))
server <- function(input, output) {
data <- reactive({
# create a vector of lower- and upper- case alphabets
# create a vector assigning numbers to alphabets
alphabets <- c(letters,LETTERS)
Index <- seq(1,length(alphabets),1)
names(Index) <- alphabets
# sample values
SampleIndex <- sample(Index,input$samsize)
# convert it into a matrix
data <- matrix(SampleIndex,nrow=1)
colnames(data)=names(SampleIndex)
data
})
output$table <- renderTable(data(),digits = 0)
}
shinyApp(ui, server)
As you can see in the picture below, for a sample size of 36 one needs to scroll the page horizontally to see all the values. The width argument of renderTable did not offer a solution. Converting the data into an HTML object or text might be one option, but I am not sure how to retain the names.
You can use renderUI together with uiOutput to build the HTML object you want to display yourself, for example using div:
library(shiny)
library(purrr) # map is a nice alternative to lapply
ui <- fluidPage(
  tabPanel("Test",
    numericInput("samsize", "specify sample size", 4, 1, 52),
    uiOutput('myTable')
  ))
server <- function(input, output) {
  data <- reactive({
    alphabets <- c(letters, LETTERS)
    Index <- seq(1, length(alphabets), 1)
    names(Index) <- alphabets
    # Notice I don't put the vector in a one-row matrix as in your example
    sample(Index, input$samsize)
  })
  # One floated div per element: the name in bold with its value underneath.
  # Floated divs move to a new row when they reach the page width.
  output$myTable <- renderUI(map(names(data()), ~div(strong(.),
                                                   div(data()[.]),
                                                   style = "float:left;padding:10px 20px;")))
}
shinyApp(ui, server)
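If a table layout is still preferred, another option is to split the named vector into fixed-size chunks and show one chunk per row. This is only a sketch under that assumption: it wraps at a fixed number of elements rather than at the actual screen width, and wrap_named_vector is a hypothetical helper, not a Shiny function.

```r
# wrap_named_vector is a hypothetical helper: it splits a named vector into
# consecutive chunks of at most per_row elements, keeping the names.
wrap_named_vector <- function(x, per_row = 10) {
  chunk_id <- ceiling(seq_along(x) / per_row)
  unname(split(x, chunk_id))
}

alphabets <- c(letters, LETTERS)
Index <- seq_along(alphabets)
names(Index) <- alphabets

# 25 values wrapped at 10 per row give chunks of 10, 10 and 5 elements
chunks <- wrap_named_vector(Index[1:25], per_row = 10)
print(lengths(chunks))
print(chunks[[3]])
```

Each chunk could then be turned into a one-row matrix and rendered as its own table, much as in the question's code.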

Remove lines empty element (NULL) with readHTMLTable()

I am trying to remove the rows that have at least one empty element, represented by (NULL), from an HTML table read with the readHTMLTable() function in the XML package, without success. My code:
require(httr)
require(XML)
Function to read the HTML table:
readFE<- function (x, URL = ""){
FILE <- GET(url=URL)
tables <- getNodeSet(htmlParse(FILE), "//table")
FE_tab <- readHTMLTable(tables[[1]],
header = c("empresa","desc_projeto","desc_regiao",
"cadastrador_por","cod_talhao","descricao",
"formiga_area","qtd_destruido","latitude",
"longitude","data_cadastro"),
colClasses = c("character","character","character",
"character","character","character",
"character","character","character",
"character","character"),
trim = TRUE, stringsAsFactors = FALSE
)
x<-NULL
results <- x
x<-FE_tab[-(1),]
results <- x
results
}
Example:
tableFE<-readFE(URL="https://www.dropbox.com/s/mb316ghr4irxipr/TALHOES_AGENTES.htm?dl=1")
tableFE
Could someone help me?
Thanks,
Alexandre
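A minimal sketch of one way to drop such rows after the table has been read. It assumes the empty cells arrive as the literal string "(NULL)"; the small data frame below is a hypothetical stand-in for the result of readFE(), using three of the column names from the question.

```r
# Hypothetical stand-in for the table returned by readFE()
tableFE <- data.frame(
  empresa   = c("A", "B", "C"),
  descricao = c("ok", "(NULL)", "ok"),
  latitude  = c("-20.1", "-20.2", "(NULL)"),
  stringsAsFactors = FALSE
)

# Comparing the data frame with the placeholder gives a logical matrix;
# rowSums counts the matches per row, ignoring real NA cells.
has_null <- rowSums(tableFE == "(NULL)", na.rm = TRUE) > 0

# Keep only the rows without any "(NULL)" cell
cleanFE <- tableFE[!has_null, , drop = FALSE]
print(cleanFE)
```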

How to replace one part of a URL using R

Currently I have the website
http://www.amazon.com/Apple-generation-Tablet-processor-White/product-reviews/B0047DVWLW/ref=cm_cr_pr_btm_link_2?ie=UTF8&pageNumber=1&showViewpoints=0&sortBy=bySubmissionDateDescending
I want to replace this part
pageNumber=1
with a sequence of numbers such as 1,2,3,.....n
I know I need to use the paste function. But how do I locate this number and replace it?
You can use the parseQueryString function from the shiny package, or parse_url and build_url from the httr package.
# Option 1: shiny
require(shiny)
testURL <- "http://www.amazon.com/Apple-generation-Tablet-processor-White/product-reviews/B0047DVWLW/ref=cm_cr_pr_btm_link_2?ie=UTF8&pageNumber=1&showViewpoints=0&sortBy=bySubmissionDateDescending"
parseURL <- parseQueryString(testURL)
parseURL$pageNumber <- 4
newURL <- paste(names(parseURL), parseURL, sep = "=", collapse="&")
# Option 2: httr
require(httr)
testURL <- "http://www.amazon.com/Apple-generation-Tablet-processor-White/product-reviews/B0047DVWLW/ref=cm_cr_pr_btm_link_2?ie=UTF8&pageNumber=1&showViewpoints=0&sortBy=bySubmissionDateDescending"
parseURL <- parse_url(testURL)
parseURL$query$pageNumber <- 4
newURL <- build_url(parseURL)
Try this:
# inputs
URL1 <- "...whatever...&pageNumber=1"
i <- 2
URL2 <- sub("pageNumber=1", paste0("pageNumber=", i), URL1)
or using a perl zero width regex:
URL2 <- sub("(?<=pageNumber=)1", i, URL1, perl = TRUE)
If we know that there is no 1 prior to pageNumber, as is the case here, then it simplifies to just:
URL2 <- sub(1, i, URL1)
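To produce the whole sequence of pages, the replacement can be applied once per page number. The sketch below uses a hypothetical shortened URL in place of the full Amazon link, and set_page is an illustrative helper; matching "pageNumber=" followed by digits means it does not matter which page the starting URL points at.

```r
# Hypothetical shortened URL standing in for the full Amazon link
URL1 <- "http://example.com/product-reviews?ie=UTF8&pageNumber=1&showViewpoints=0"

# set_page is an illustrative helper: it swaps whatever number follows
# "pageNumber=" for the requested page.
set_page <- function(url, page) {
  sub("pageNumber=[0-9]+", paste0("pageNumber=", page), url)
}

# sub() uses only the first element of its replacement argument,
# so apply it once per page number.
pages <- 1:3
urls <- vapply(pages, function(p) set_page(URL1, p), character(1))
print(urls)
```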
Another very simple approach is to use sprintf:
sprintf('http://www.amazon.com/Apple-generation-Tablet-processor-White/product-reviews/B0047DVWLW/ref=cm_cr_pr_btm_link_2?ie=UTF8&pageNumber=%s&showViewpoints=0&sortBy=bySubmissionDateDescending',
1:10)
In the above code, the %s in the string provided as the first argument is replaced by each element of the vector provided in the second argument, in turn.
See ?sprintf for more details about this very handy string manipulation function.
The simplest approach would be to split the string into parts:
var part1 = "http://www.amazon.com/Apple-generation-Tablet-processor-White/product-reviews/B0047DVWLW/ref=cm_cr_pr_btm_link_2?ie=UTF8&pageNumber=";
var number = 1;
var part2 = "&showViewpoints=0&sortBy=bySubmissionDateDescending";
var link = part1 + number + part2;
Another approach would be to use string.replace("pageNumber=1","pageNumber=2");
Another option would be to use a regex, but I'm not good with those, so you'll have to do some googling.
I figured it out; here is the code:
# paste0 joins the pieces without the space that paste() inserts by default
listurl <- paste0("http://rads.stackoverflow.com/amzn/click/B0047DVWLW", 1:218)
ipadlisturl <- paste0(listurl, "&showViewpoints=0&sortBy=bySubmissionDateDescending")