Is there a way to get a vector with the names of all functions that one could use in R? - function

I would like to have a call that returns a vector with the names of all functions that I could call in the current R session. Does anybody know how to achieve this?
(I would like to check user-entered variable names against this vector. We had some unforeseen problems with users entering, e.g., c as a variable name.)
UPDATE: I would like to get the function names from all packages currently loaded.
SOLUTION (halfway): Based on Joris Meys' tip with lsf.str(), I came up with the following function, which returns a sorted vector with all currently available function names:
getFunctionNames <- function() {
    loaded <- (.packages())
    loaded <- paste("package:", loaded, sep = "")
    return(sort(unlist(lapply(loaded, lsf.str))))
}
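For example, a minimal sketch of the intended check (userInput is a hypothetical user-entered name, not part of the original post):
userInput <- "c"
if (userInput %in% getFunctionNames()) {
    warning("'", userInput, "' is already the name of a function")
}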
But see also the comments on Joris Meys' post for even better answers.

I'd use lsf.str() as a start.
E.g., x <- as.character(lsf.str("package:base")) gives you a character vector of all functions in the base package. You can add all the packages you want to check against; stats and utils come to mind first.
EDIT: Regarding your question about currently loaded packages:
My first attempt was x <- unlist(sapply(search()[-1], function(x) as.character(lsf.str(x)))), but (see the comments) it is safer to restrict the search path to package environments first:
pkgs <- search()
pkgs <- pkgs[grep("package:",pkgs)]
y <- unlist(sapply(pkgs,lsf.str))
does the trick.
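As an aside (my addition, not part of the original answer): if you only need to test one name at a time, exists() avoids building the full vector:
exists("c", mode = "function")        # TRUE: "c" is a base function
exists("myNewVar", mode = "function") # FALSE for an unused name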

I asked a similar Q on R-Help many moons ago (2007) and Prof. Brian Ripley provided this as a solution:
findfuns <- function(x) {
    if(require(x, character.only = TRUE)) {
        env <- paste("package", x, sep = ":")
        nm <- ls(env, all = TRUE)
        nm[unlist(lapply(nm, function(n) exists(n, where = env,
                                                mode = "function",
                                                inherits = FALSE)))]
    } else character(0)
}
pkgs <- dir(.Library)
z <- lapply(pkgs, findfuns)
names(z) <- pkgs
Z <- sort(unique(unlist(z)))
Which gives output like:
> head(Z)
[1] "^" "-" "-.Date" "-.POSIXt" ":" "::"
This was for finding all the functions in packages specified by object pkgs so you can control which packages are loaded/checked against.
A modified version that works on the currently loaded set of packages would be:
findfuns2 <- function(pkgs) {
    nm <- ls(pkgs, all = TRUE)
    nm <- nm[unlist(lapply(nm, function(n) exists(n, where = pkgs,
                                                  mode = "function",
                                                  inherits = FALSE)))]
    if(isTRUE(all.equal(length(nm), 0)))
        character(0)
    else
        nm
}
pkgs <- search()
pkgs <- pkgs[grep("package:", pkgs)]
z <- lapply(pkgs, findfuns2)
z <- sort(unique(unlist(z)))
head(z)

Related

R - Issue with the DOM of the Danish parliament (webscraping)

I've been working on a webscraping project for the political science department at my university.
The Danish parliament is very transparent about its democratic process, and it uploads all the legislative documents to its website. I've been crawling over all pages starting in 2008. Right now I'm parsing the information into a dataframe, and I'm having an issue that I have not been able to resolve so far.
If we look at the DOM, we can see that they named most of the objects div.tingdok-normal. The number of objects varies between 16 and 19. To parse the information correctly for my dataframe, I tried to grep out the necessary parts according to patterns. However, the issue is that sometimes my pattern matches more than once, and I don't know how to tell R that I only want the first match.
For the sake of an example, I include some code:
library(RCurl)  # getURL()
library(rvest)  # read_html(), html_nodes(), html_text()

final.url <- "https://www.ft.dk/samling/20161/lovforslag/l154/index.htm"
to.save <- getURL(final.url)
p <- read_html(to.save)
normal <- p %>% html_nodes("div.tingdok-normal > span") %>% html_text(trim = TRUE)
tomatch <- c("Forkastet regeringsforslag", "Forkastet privat forslag", "Vedtaget regeringsforslag", "Vedtaget privat forslag")
type <- unique(grep(paste(tomatch, collapse = "|"), normal, value = TRUE))
Maybe you can help me with that.
My understanding is that you want to extract the text of the webpage, because the "tingdok-normal" nodes hold the text. I was able to get the text of the webpage with the following code. The code also identifies the position of the first "regex hit" of each of the patterns to match.
library(pagedown)
library(pdftools)
library(stringr)

pagedown::chrome_print("https://www.ft.dk/samling/20161/lovforslag/l154/index.htm",
                       "C:/.../danish.pdf")
text <- pdftools::pdf_text("C:/.../danish.pdf")

tomatch <- c("(A|a)ftalen", "(O|o)pholdskravet")
nb_Tomatch <- length(tomatch)

list_Position <- list()
list_Text <- list()

for(i in 1 : nb_Tomatch)
{
  # Locates the first hit of the regex
  # To locate all regex hits, use stringr::str_locate_all
  list_Position[[i]] <- stringr::str_locate(text, pattern = tomatch[i])
  list_Text[[i]] <- stringr::str_sub(string = text,
                                     start = list_Position[[i]][1, 1],
                                     end = list_Position[[i]][1, 2])
}
Here is another approach:
library(RDCOMClient)
library(stringr)
library(rvest)

url <- "https://www.ft.dk/samling/20161/lovforslag/l154/index.htm"

IEApp <- COMCreate("InternetExplorer.Application")
IEApp[['Visible']] <- TRUE
IEApp$Navigate(url)
Sys.sleep(5)
doc <- IEApp$Document()
html_Content <- doc$documentElement()$innerText()

tomatch <- c("(A|a)ftalen", "(O|o)pholdskravet")
nb_Tomatch <- length(tomatch)

list_Position <- list()
list_Text <- list()

for(i in 1 : nb_Tomatch)
{
  # Locates the first hit of the regex
  # To locate all regex hits, use stringr::str_locate_all
  list_Position[[i]] <- stringr::str_locate(html_Content, pattern = tomatch[i])
  list_Text[[i]] <- stringr::str_sub(string = html_Content,
                                     start = list_Position[[i]][1, 1],
                                     end = list_Position[[i]][1, 2])
}
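For what it's worth, a much lighter sketch that stays with the rvest objects from the question: grep() returns every match, so just keep the first element of its result (assuming normal and tomatch as defined in the question):
type <- grep(paste(tomatch, collapse = "|"), normal, value = TRUE)[1]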

rbind fromJSON page: duplicate rowname error

I was trying to rbind some JSON data scraped from an API:
library(jsonlite)

pop_dat <- data.frame()
for (i in 1:3) {
  # Generate the url for each page
  url <- paste0('http://api.worldbank.org/v2/countries/all/indicators/SP.POP.TOTL?format=json&page=', i)
  # Get the json data from each page and transform it into a dataframe
  dat <- as.data.frame(fromJSON(url)[2], flatten = TRUE, row.names = NULL)
  pop_dat <- rbind(pop_dat, dat)
}
However, it returns the following error:
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘1’, ‘10’, ‘11’, ‘12’, ‘13’, ‘14’, ‘15’, ‘16’, ‘17’, ‘18’, ‘19’, ‘2’, ‘20’, ‘21’, ‘22’, ‘23’, ‘24’, ‘25’, ‘26’, ‘27’, ‘28’, ‘29’, ‘3’, ‘30’, ‘31’, ‘32’, ‘33’, ‘34’, ‘35’, ‘36’, ‘37’, ‘38’, ‘39’, ‘4’, ‘40’, ‘41’, ‘42’, ‘43’, ‘44’, ‘45’, ‘46’, ‘47’, ‘48’, ‘49’, ‘5’, ‘50’, ‘6’, ‘7’, ‘8’, ‘9’
Setting row.names to NULL doesn't work. I heard from someone that it is because some of the data are stored as lists here, which I don't quite understand.
I understand that there is an alternative package, WDI, to access this data, and it works well, but I want to know how to resolve the duplicate row-name problem here in general, so that I can deal with similar situations where no alternative package is available.
I heard from someone it is due to the fact that some data are stored as lists...
This is correct. The solution is fairly simple, but I find it really easy to get tripped up by this. Right now you're using:
dat <- as.data.frame(fromJSON(url)[2], flatten = TRUE, row.names = NULL)
The problem comes from fromJSON(url)[2]. This should be fromJSON(url)[[2]] instead. According to the documentation, the key difference between [ and [[ is that a single bracket can select multiple elements (and always returns a list), whereas [[ selects a single element and returns the element itself.
You can see how this works with some fake data.
foo <- list(
  a = rnorm(100),
  b = rnorm(100),
  c = rnorm(100)
)
With [, you can select multiple values inside this list.
foo[c("a", "b")]
length(foo["a"]) # Result is 1 not 100 like you might expect.
With [[ the results are different.
foo[[c("a", "b")]] # Raises a subscript error.
foo[["a"]] #This works.
length(foo[["a"]]) # Result is 100.
So, your answer will depend on which subset operator you're using. For your problem, you'll want to use [[ to select a single data.frame inside of the list. Then, you should be able to use rbind correctly.
final <- data.frame()
for (i in 1:10) {
  url <- paste0(
    'http://api.worldbank.org/v2/countries/all/indicators/SP.POP.TOTL?format=json&page=',
    i
  )
  res <- jsonlite::fromJSON(url, flatten = TRUE)[[2]]
  final <- rbind(final, res)
}
Alternative solution with lapply:
urls <- sprintf(
  'http://api.worldbank.org/v2/countries/all/indicators/SP.POP.TOTL?format=json&page=%s',
  1:10
)
resl <- lapply(urls, jsonlite::fromJSON, flatten = TRUE)
resl <- lapply(resl, "[[", 2)  # Use lapply to select the 2nd element of each list element.
resl <- do.call(rbind, resl)  # Takes all the elements of the list and uses them as the arguments for rbind.
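A side note (my addition, not part of the original answer): jsonlite also ships rbind_pages(), which is built for exactly this paging pattern and tolerates pages whose data frames differ slightly in their columns:
resl2 <- lapply(urls, jsonlite::fromJSON, flatten = TRUE)
pages <- lapply(resl2, "[[", 2)
final2 <- jsonlite::rbind_pages(pages)  # Binds the list of data frames by matching column names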

R - Iterate through variable names

I'm trying to iterate (with a for loop) over variables. This is the code:
json_YR_FKA <- getURL('https://www.kimonolabs.com/api/b7i1ej7i?apikey=-')
json_9A_BTE <- getURL('https://www.kimonolabs.com/api/a83t52cg?apikey=-')
I will have two variables: json_YR_FKA and json_9A_BTE.
matriculas <- ls()
matriculas <- str(matriculas)
matriculas
[1] "json_9A_BTE" "json_YR_FKA"
And now I need to do some things with both variables, so I use a for loop:
for (i in 1:total){
  avion <- fromJSON(matriculas[i])
  # boring code
}
My idea is to do this:
First iteration: fromJSON(json_9A_BTE)
Second iteration: fromJSON(json_YR_FKA)
But at the beginning of the first iteration I get this:
fromJSON(matriculas[i])
Error in fromJSON(matriculas[1]) : unexpected character 'j'
And I don't know why.
Anyone?
Thanks in advance.
Luis
The approach that you're taking isn't very R-like, so I'm guessing you're coming from Python or some other language? The error itself happens because matriculas[i] is just the character string "json_9A_BTE" (the name of the variable, not its contents), so fromJSON() tries to parse that string as JSON and fails on the first character, 'j'.
First off, I'm not sure what packages you're using. Where does getURL() come from? Which JSON package are you using? I suggest jsonlite, since it can pull data in straight from the url.
library(jsonlite)

myUrls <- c(json_YR_FKA = 'https://www.kimonolabs.com/api/b7i1ej7i?apikey=-',
            json_9A_BTE = 'https://www.kimonolabs.com/api/a83t52cg?apikey=-')

matriculas <- lapply(myUrls, fromJSON)  # Provides a list of your data.

f_doSomething <- function(dat) {
  # boring code that puts data into the 'out' variable
  return(out)
}

avion <- lapply(matriculas, f_doSomething)
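That said, if you really want to keep your original loop over names from ls(), the missing piece is get(), which turns a variable's name (a string) back into the object it names. A minimal sketch, assuming matriculas still holds the character vector of names as in your question:
for (i in seq_along(matriculas)) {
  avion <- fromJSON(get(matriculas[i]))  # get() fetches the object named by the string
  # boring code
}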

Issues with readHTMLTable in R

I was trying to use readHTMLTable to store some data in a dataframe in RStudio, but it just keeps telling me it could not find the function "readHTMLTable". I don't understand where I went wrong. Can someone take a look at this and tell me how I can fix it, or check whether it works in your RStudio?
url <- 'http://www.cdc.gov/vhf/ebola/outbreaks/2014-west-africa/case-counts.html'
ebola <- getURL(url)
ebola <- readHTMLTable(ebola, stringAsFactors = F)
Error: could not find function "readHTMLTable"
You are reading the table in with the R default, which converts characters to factors. You can use stringsAsFactors = FALSE in readHTMLTable, and this will be passed to data.frame. Also, the table uses commas as thousands separators, which you will need to remove:
library(XML)
url1 <- 'http://en.wikipedia.org/wiki/List_of_Ebola_outbreaks'
df1 <- readHTMLTable(url1, which = 2, stringsAsFactors = FALSE)
df1$"Human death"
mySum <- sum(as.integer(gsub(",", "", df1$"Human death")))
> mySum
[1] 6910
The problem is that you don't load the XML library:
library(XML)
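With the library loaded, your original snippet runs. A sketch, assuming RCurl provides getURL() (and note the argument is spelled stringsAsFactors):
library(RCurl)  # getURL()
library(XML)    # readHTMLTable()
url <- 'http://www.cdc.gov/vhf/ebola/outbreaks/2014-west-africa/case-counts.html'
ebola <- getURL(url)
tables <- readHTMLTable(ebola, stringsAsFactors = FALSE)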

R: calling rq() within a function and defining the linear predictor

I am trying to call rq() of the package quantreg within a function. Below is a simplified explanation of my problem.
If I follow the recommendations found at
http://developer.r-project.org/model-fitting-functions.txt, I have a design matrix after the line
x <- model.matrix(mt, mf, contrasts)
with the first column full of 1's to create an intercept.
Now, when I call rq(), I am obliged to use something like
fit <- rq(y ~ x[, 2], tau = 0.5, ...)
My problem happens if there is more than one explanatory variable: I don't know how to find an automatic way to write
x[, 2] + x[, 3] + x[, 4] + ...
Here is the complete simplified code:
ao_qr <- function(formula, data, method = "br", ...) {
    cl <- match.call()
    ## keep only the arguments which should go into the model
    ## frame
    mf <- match.call(expand.dots = FALSE)
    m <- match(c("formula", "data"), names(mf), 0)
    mf <- mf[c(1, m)]
    mf$drop.unused.levels <- TRUE
    mf[[1]] <- as.name("model.frame")
    mf <- eval.parent(mf)
    if (method == "model.frame") return(mf)
    ## allow model.frame to update the terms object before
    ## saving it
    mt <- attr(mf, "terms")
    y <- model.response(mf, "numeric")
    x <- model.matrix(mt, mf, contrasts)
    ## proceed with the quantile regression
    fit <- rq(y ~ x[, 2], tau = 0.5, ...)
    print(summary(fit, se = "boot", R = 100))
}
I call the function with:
ao_qr(pain ~ treatment + extra, data = data.subset)
And here is how to get the data:
require(lqmm)
data(labor)
data <- labor
data.subset <- subset(data, time == 90)
data.subset$extra <- rnorm(65)
In this case, with this code, my linear predictor only includes "treatment". If I want "extra", I have to manually add x[,3] to the linear predictor of rq() in the code. This is not automatic and will not work on other datasets with an unknown number of variables.
Does anyone know how to tackle this?
Any help would be greatly appreciated!
I found a simple solution:
x[, 2:ncol(x)]
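With that, the rq() call inside ao_qr() handles any number of explanatory variables (x[, -1], which drops the intercept column, works just as well):
fit <- rq(y ~ x[, 2:ncol(x)], tau = 0.5, ...)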