Using partykit's nodeapply() and info_node() to extract node information inside a function

I am trying to extract the information returned by nodeapply() with info_node().
I want to automate the process so that I can extract the information for a list of node ids and operate on it later.
The example is as follows:
library(partykit)
data("cars", package = "datasets")
ct <- ctree(dist ~ speed, data = cars)
node5 <- nodeapply(as.simpleparty(ct), ids = 5, info_node)
node5$`5`$n
I use the code above to extract the number of records in node 5.
I want to create a function to extract the info from a series of nodes:
infonode <- function(x, y) {
  for (j in x) {
    info = nodeapply(y, j, info_node)
    print(info$`j`$n)
  }
}
But the result always comes back as NULL.
I wonder whether the type of j is wrong within the function, which would lead to the NULL in the print.
If someone could help me it would be greatly appreciated!
Thanks

You can pass nodeapply() a vector of ids, and it will then return a list with one element per selected node rather than a list with a single element. This is the only partykit-specific part of your question.
From that point forward you are simply operating on standard named lists in R, with nothing partykit-specific about it. To address your problem, use [[ indexing rather than $ indexing, either with an integer or a character index:
node5[[1]]$n
## n
## 19
node5[["5"]]$n
## n
## 19
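Thus, in your infonode() function you could replace info$`j`$n by either info[[1]]$n or info[[as.character(j)]]$n. The $ operator does not evaluate j; it looks for an element literally named "j", which does not exist, hence the NULL.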
However, I would simply do this with an sapply():
ni <- nodeapply(as.simpleparty(ct), ids = 3:5, info_node)
sapply(ni, "[[", "n")
## 3.n 4.n 5.n
## 15 16 19
Or some variation of this...
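To make the difference between $ and [[ concrete, here is a minimal base R sketch. The list ni below is a hand-built stand-in for what nodeapply() returns (its structure is an assumption for illustration, no partykit call is made), and node_n() is a hypothetical helper name, not part of partykit.

```r
# Hand-built stand-in for a nodeapply() result: a list named by node id.
# (Assumed structure for illustration only; no partykit call is made here.)
ni <- list(`3` = list(n = 15), `4` = list(n = 16), `5` = list(n = 19))

j <- 5
by_dollar <- ni$`j`                    # looks for an element literally named "j"
by_name   <- ni[[as.character(j)]]$n   # uses the value of j as the element name

# Hypothetical helper: collect n for several node ids at once.
node_n <- function(node_info, ids) {
  vapply(ids, function(id) node_info[[as.character(id)]]$n, numeric(1))
}
counts <- node_n(ni, 3:5)
```

Returning a vector instead of printing inside the loop keeps the counts available for later operations.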

Create a loop within a function so that URLs return a dataframe

I was provided with a list of identifiers (in this case the identifier is called an NPI). These identifiers can be copied and pasted into this website (https://npiregistry.cms.hhs.gov/registry/?). For each one I want to return the NPI number, the name of the physician, the address, the phone number, and the specialty.
I have over 3,000 identifiers, so copying and pasting is not efficient and not easily repeatable for future use.
If possible, I would like to create a list of URLs, pass them into a function, and receive a data frame that provides me with the variables mentioned above (NPI, NAME, ADDRESS, PHONE, SPECIALTY).
I was able to write a function that produces the URLs needed.
Here are some NPI numbers for reference: 1417024746, 1386790517, 1518101096, 1255500625.
This is my code for reading in the file that contains my NPIs:
npiList <- c("1417024746", "1386790517", "1518101096", "1255500625")
npiList <- as.list(npiList)
npiList <- unlist(npiList, use.names = FALSE)
This is the function to return the list of URLs:
npiaddress <- function(x) {
  # build the search URL for one or more NPI numbers
  url <- paste("https://npiregistry.cms.hhs.gov/registry/search-results-table?number=",
               x, "&addressType=ANY", sep = "")
  return(url)
}
I saved the list to a variable and perhaps this is my downfall:
npi_urls <- npiaddress(npiList)
From here I wrote a function that can accept a single URL, retrieves the data I want and turns it into a dataframe. My issue is that I cannot pass multiple URLs:
library(rvest)  # provides read_html(), html_nodes(), html_text()
npiLookup <- function(x) {
  url <- x
  webpage <- read_html(url)
  npi_html <- html_nodes(webpage, "td")
  npi <- html_text(npi_html)
  # replace line breaks and tabs in the address cell with spaces
  npi[4] <- gsub("\r?\n|\r", " ", npi[4])
  npi[4] <- gsub("\r?\t|\r", " ", npi[4])
  # keep NPI, name, address, phone, specialty and turn them into one row
  npiFinal <- npi[c(1:2, 4:6)]
  npiFinal <- as.data.frame(npiFinal)
  npiFinal <- t(npiFinal)
  npiFinal <- as.data.frame(npiFinal)
  names(npiFinal) <- c("NPI", "NAME", "ADDRESS", "PHONE", "SPECIALTY")
  return(npiFinal)
}
For example:
If I wanted to get a dataframe for the following identifier (1417024746), I can run this and it works:
x <- npiLookup("https://npiregistry.cms.hhs.gov/registry/search-results-table?number=1417024746&addressType=ANY")
View(x)
My output for the example returns the NPI, NAME, ADDRESS, PHONE, SPECIALTY as desired, but again, I need to do this for several thousand NPI identifiers. I feel like I need a loop within npiLookup. I've also tried to put npi_urls into the npiLookup function but it does not work.
Thank you for any help and for taking the time to read.
You're most of the way there. The final step uses this useful R idiom:
do.call(rbind, lapply(npiList, function(npi) {
  url <- npiaddress(npi)
  npiLookup(url)
}))
do.call is a base R function that calls a function (in this case rbind) with the elements of a list as its arguments; here the list is the one produced by lapply. That list is the result of running your npiLookup function on the URL produced by your npiaddress function for each element of npiList, so the one-row data frames end up stacked into a single data frame.
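The idiom can be tried without any web request. In this minimal sketch, fake_lookup() is a hypothetical stand-in for npiLookup() that just builds a one-row data frame, and the NAME values are made up for illustration.

```r
# Hypothetical stand-in for npiLookup(): returns a one-row data frame
# without touching the network. The NAME values are invented.
fake_lookup <- function(npi) {
  data.frame(NPI = npi, NAME = paste("Provider", npi), stringsAsFactors = FALSE)
}

npis <- c("1417024746", "1386790517")

# lapply() gives a list of one-row data frames ...
rows <- lapply(npis, fake_lookup)

# ... and do.call() passes them all to rbind() as separate arguments,
# i.e. the same as rbind(rows[[1]], rows[[2]]).
result <- do.call(rbind, rows)
```

With several thousand real lookups it may be worth wrapping the lookup in tryCatch() so that one failed page does not abort the whole run.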
A few further comments for future reference should anyone else come upon this question: (1) I don't know why you're doing the as.list, unlist sequence at the beginning; it's redundant and probably unnecessary. (2) The NPI registry provides a programming interface (API) that avoids the need to scrape data from the HTML pages; this might be more robust in the long run. (3) The NPI registry provides the entire dataset as a downloadable file; this might have been an easier way to go.

How to use/convert my R vector object in a REST (rest.ensembl.org) API query?

Good morning.
I want to use the following REST endpoint: https://rest.ensembl.org/documentation/info/sequence_id_post
I have the vector object (ids) in R:
> ids
[1] "NM_007294.3:c.932_933insT" "NM_007294.3:c.1883C>T" "NM_007294.3:c.2183A>C"
[4] "NM_007294.3:c.2321C>T" "NM_007294.3:c.4585G>A" "NM_007294.3:c.4681C>A"
I have to put this vector (ids), which has more than 200 elements, into the body = argument in place of the hard-coded string in the example code below, so that the request works:
Code:
library(httr)
library(jsonlite)
library(xml2)
server <- "https://rest.ensembl.org"
ext <- "/vep/human/hgvs"
r <- POST(paste(server, ext, sep = ""),
          content_type("application/json"),
          accept("application/json"),
          body = '{ "hgvs_notations" : ["NM_007294.3:c.932_933insT", "NM_007294.3:c.1883C>T"] }')
stop_for_status(r)
head(fromJSON(toJSON(content(r))))
I know it's a json format, but when I convert my variable ids to json it's not in the correct format.
Do you have any suggestions?
Thanks for any help.
Leandro
I think that NM_007294.3:c.2321C>T is not a valid query to the /sequence/id REST endpoint. It contains a sequence id (NM_007294.3) and a variant (c.2321C>T); taken literally, you would be asking the server for the letter T, since this call returns sequences.
A valid query would contain only sequence ids, and you can build it like this (provided you have your ids in a vector):
r <- POST(paste(server, ext, sep = ""),
          content_type("application/json"),
          accept("application/json"),
          body = paste0('{ "ids" :', jsonlite::toJSON(ids), ' }'))
Depending on the downstream scenario, making your ids unique might help/speed things up.
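For the body format used in the question's own example (the hgvs_notations key), the JSON string can also be assembled from the vector by hand. This is a minimal base R sketch that assumes the ids contain no characters needing JSON escaping (such as double quotes or backslashes); a JSON library such as jsonlite takes care of escaping for you.

```r
# A few ids from the question, with one duplicate added for illustration.
ids <- c("NM_007294.3:c.932_933insT", "NM_007294.3:c.1883C>T", "NM_007294.3:c.1883C>T")

# Drop duplicates before sending.
ids <- unique(ids)

# Wrap each id in double quotes and join them with commas ...
quoted <- paste0('"', ids, '"', collapse = ", ")

# ... then place the result inside the JSON array.
body <- paste0('{ "hgvs_notations" : [', quoted, '] }')
```

The resulting string has the same shape as the hard-coded body in the question and can be passed to the body argument of POST().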

JSON parsing in a for loop in R

I'm trying to write a for loop that will take zip codes, make an API call to a database of Congressional information, and then parse out only the parties of the congressmen representing each zip code.
The issue is that some of the zip codes have more than one congressman and others have none at all (an error on the part of the database, I think). That means I need to loop through the count returned by the original pull until there are no more representatives.
Because the number of congressmen differs from one zip code to the next, I'd like to be able to write new variable names into my dataframe for each new congressman. That is, if there are 2 congressmen, I'd like to write new columns named "party.1" and "party.2", etc.
I have this code so far and I feel that I'm close, but I'm really stuck on what to do next. Thank you all for your help!
EDIT: I found this way to be easier, but I'm still not getting the results I'm looking for
library(rjson)
library(RCurl)
zips <- c("10001", "92037", "90801", "94011")
test <- matrix(nrow = 4, ncol = 7)
temp <- NULL
tst <- NULL
for (i in 1:length(zips)) {
  for (n in length(temp$count)) {
    temp <- fromJSON(getURL(paste('https://congress.api.sunlightfoundation.com/legislators/locate?zip=',
                                  zips[i], '&apikey=', 'INSERT YOUR API KEY', sep = ""),
                            .opts = list(ssl.verifypeer = FALSE)))
    tst <- try(temp$results[[n]]$party, silent = TRUE)
    if (is(tst, "try-error"))
      test[i, n] <- NA
    else
      test[i, n] <- temp$results[[n]]$party
  }
}
install.packages("rsunlight")
library("rsunlight")
zips <- c("10001","92037","90801", "94011")
out <- lapply(zips, function(z) cg_legislators(zip = z))
# results for some only
sapply(out, "[[", "count")
# peek at results for one zip code
head(out[[1]]$results[,1:4])
##   bioguide_id   birthday chamber contact_form
## 1     S000148 1950-11-23  senate http://www.schumer.senate.gov/Contact/contact_chuck.cfm
## 2     N000002 1947-06-13   house https://jerroldnadler.house.gov/forms/writeyourrep/default.aspx
## 3     M000087 1946-02-19   house https://maloney.house.gov/contact-me/email-me
## 4     G000555 1966-12-09  senate http://www.gillibrand.senate.gov/contact/
You can change as needed within a lapply or for loop to add columns, etc.
To pull out party could be as simple as lapply(zips, function(z) cg_legislators(zip = z)$results$party).
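One way to get from per-zip party vectors of different lengths to the "party.1", "party.2", ... columns asked for is to pad every vector with NA up to the longest one. This is a minimal base R sketch; the parties list is made-up illustration data standing in for what the lapply() call would return, not real API output.

```r
# Made-up party vectors per zip code; lengths differ and one zip has no results.
parties <- list(
  "10001" = c("D", "D", "D", "D"),
  "92037" = c("D", "R"),
  "90801" = character(0)
)

# Length of the longest vector decides how many party.* columns are needed.
max_n <- max(lengths(parties))

# Pad each vector with NA to length max_n; sapply() returns one column per zip,
# so transpose to get one row per zip.
padded <- t(sapply(parties, function(p) c(p, rep(NA, max_n - length(p)))))
colnames(padded) <- paste0("party.", seq_len(max_n))

party_df <- as.data.frame(padded, stringsAsFactors = FALSE)
```

Zip codes with no representatives end up as a row of NA values, which keeps them visible in the result instead of silently dropping them.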

Parallel programming in R

I have a file that consists of multiple JSON objects. I need to read through this file and extract certain fields from the JSON objects. To complicate things, some of the objects do not contain all the fields. I am dealing with a large file of over 200,000 JSON objects. I would like to split the job across multiple cores. I have experimented with doSNOW, foreach, and parallel and really do not understand how to do this. The following is the code that I would like to make more efficient.
foreach (i in 2:length(linn)) %dopar% {
  json_data <- fromJSON(linn[i])
  if(names(json_data)[1]=="info")
    next
  mLocation <- ifelse('location' %!in% names(json_data$actor),'NULL',json_data$actor$location$displayName)
  mRetweetCount <- ifelse('retweetCount' %!in% names(json_data),0,json_data$retweetCount)
  mGeo <- ifelse('geo' %!in% names(json_data),c(-0,-0),json_data$geo$coordinates)
  tweet <- rbind(tweet,
                 data.frame(
                   record.no = i,
                   id = json_data$id,
                   objecttype = json_data$actor$objectType,
                   postedtime = json_data$actor$postedTime,
                   location = mLocation,
                   displayname = json_data$generator$displayName,
                   link = json_data$generator$link,
                   body = json_data$body,
                   retweetcount = mRetweetCount,
                   geo = mGeo)
  )
}
Rather than trying to parallelize an iteration, I think you're better off trying to vectorize (hmm, actually most of the below is still iterating...). For instance here we get all our records (no speed gain yet, though see below...)
json_data <- lapply(linn, fromJSON)
For location we pre-allocate a vector of NAs to represent records for which there is no location, then find records that do have a location (maybe there's a better way of doing this...) and update them
mLocation <- rep(NA, length(json_data))
idx <- sapply(json_data, function(x) "location" %in% names(x$actor))
mLocation[idx] <- sapply(json_data[idx], function(x) x$actor$location$displayName)
Finally, create a 200,000 row data frame in a single call (rather than your 'copy and append' pattern, which makes a copy of the first row, then the first and second row, then the first, second, third row, then ... so N-squared rows, in addition to recreating factors and other data.frame specific expenses; this is likely where you spend most of your time)
data.frame(i=seq_along(json_data), location=mLocation)
The idea would be to accumulate all the columns, and then do just one call to data.frame(). I think you could avoid parsing one line at a time by pasting everything into a single string representing a JSON array, and parsing it in one call
json_data <- fromJSON(sprintf("[%s]", paste(linn, collapse=",")))
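The pre-allocate, fill, then build-once pattern can be tried on a few hand-made records. This is a minimal base R sketch; the records list is invented illustration data that mimics already-parsed JSON objects, so no JSON package is needed to run it.

```r
# Invented records mimicking parsed JSON objects; the second has no actor$location.
records <- list(
  list(id = "a", actor = list(location = list(displayName = "Paris"))),
  list(id = "b", actor = list()),
  list(id = "c", actor = list(location = list(displayName = "Lima")))
)

# Pre-allocate with NA, then fill only the records that have a location.
location <- rep(NA_character_, length(records))
has_loc <- vapply(records, function(x) "location" %in% names(x$actor), logical(1))
location[has_loc] <- vapply(records[has_loc],
                            function(x) x$actor$location$displayName,
                            character(1))

# Build the data frame once from complete columns instead of rbind() in a loop.
tweets <- data.frame(
  record.no = seq_along(records),
  id = vapply(records, function(x) x$id, character(1)),
  location = location,
  stringsAsFactors = FALSE
)
```

Each optional field gets its own pre-allocated vector in the same way, and the single data.frame() call at the end replaces the repeated copy-and-append step.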

How to rearrange this function to return the extended list in Haskell

I am doing problem 68 at Project Euler and came up with the following code in Haskell to return the list of numbers which fit the (given) solution:
import Data.List (nub, permutations)
lists = [n | n <- permutations [1..6], ring n]
ring [a,b,c,d,e,f] = (length $ nub $ map sum [[d,c,b],[f,b,a],[e,a,c]]) == 1
This only returns a list of lists of 6 numbers each which fit the solution. What I don't know how to do, is make it return the actual solution, the lists that fit the form:
[d,c,b],[f,b,a],[e,a,c]
How can I make lists return a list of this format?
(PS: I will add in the appropriate functions to return what the site actually wants later)
It's simply
lists = [ [[d,c,b],[f,b,a],[e,a,c]] | n@[a,b,c,d,e,f] <- permutations [1..6], ring n ]
Or in order to generate the strings:
[ foldl (++) "" $ map show [d,c,b,f,b,a,e,a,c] | n@[a,b,c,d,e,f] <- permutations [1..6], ring n ]