Exporting R tables to HTML
Is there a way to easily export R tables to a simple HTML page?
The xtable function in the xtable package can export R tables to HTML tables. This blog entry describes how you can create HTML pages from Sweave documents.
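A minimal sketch of that route (the file name here is just an example):

library(xtable)
# print.xtable with type = "html" emits an HTML <table>
print(xtable(mtcars), type = "html", file = "mtcars_xtable.html")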
It might be worth mentioning that there is a new package specifically designed to convert (and style with CSS) data.frames (or tables) into HTML tables in an easy and intuitive way. It is called tableHTML. You can see a simple example below:
library(tableHTML)
#create an html table
tableHTML(mtcars)
#and to export to a file
write_tableHTML(tableHTML(mtcars), file = 'myfile.html')
You can see a detailed tutorial here as well.
Apart from xtable, mentioned by @nullglob, there are three more packages that might come in handy here:
R2HTML
HTMLUtils
hwriter
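For instance, a minimal sketch of the R2HTML route (the file name is illustrative; note that HTML() appends to an existing file by default):

library(R2HTML)
# write the data frame as an HTML table
HTML(mtcars, file = "mtcars_r2html.html")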
The grammar of tables package gt is also an option.
Here's the example from the docs for generating an HTML table:
library(gt)

tab_html <-
  gtcars %>%
  dplyr::select(mfr, model, msrp) %>%
  dplyr::slice(1:5) %>%
  gt() %>%
  tab_header(
    title = md("Data listing from **gtcars**"),
    subtitle = md("`gtcars` is an R dataset")
  ) %>%
  as_raw_html()
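Since as_raw_html() returns the table as a single HTML string, writing it out as a standalone page is one more line (the file name is just an example):

writeLines(tab_html, "gtcars_table.html")

gt also provides gtsave(), which can write a gt table to an .html file directly.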
In an issue for the DT package, someone posted how to use DT to render HTML inside table cells. I've pasted the relevant example code below, modifying it to reference all columns with targets = "_all".
library(DT)

# custom renderer: when DataTables sorts a cell, parse its HTML and sort by the
# link text; a null guard is added so plain-text cells pass through unchanged
render <- c(
  "function(data, type, row){",
  "  if(type === 'sort'){",
  "    var parser = new DOMParser();",
  "    var doc = parser.parseFromString(data, 'text/html');",
  "    var link = doc.querySelector('a');",
  "    if(link !== null){ data = link.innerText; }",
  "  }",
  "  return data;",
  "}"
)

# column b holds HTML links; the URLs are illustrative placeholders
dat <- data.frame(
  a = c("AAA", "BBB", "CCC"),
  b = c(
    '<a href="https://example.com/a">aaaaa</a>',
    '<a href="https://example.com/b">bbbbb</a>',
    '<a href="https://example.com/j">jjjjj</a>'
  )
)

datatable(
  dat,
  escape = FALSE,
  options = list(
    columnDefs = list(
      list(targets = "_all", render = JS(render))
    )
  )
)
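Since the thread is about getting an HTML page, note that datatable() returns an htmlwidget, which can be saved as a standalone page (the file name is illustrative):

library(htmlwidgets)
# assign the widget from the datatable() call above, then save it
widget <- datatable(dat, escape = FALSE,
                    options = list(columnDefs = list(
                      list(targets = "_all", render = JS(render)))))
# writes a self-contained HTML page including the JS dependencies
saveWidget(widget, "my_table.html", selfcontained = TRUE)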
I hope this helps.
Related
How to extract a table, convert it to a data frame, write it as a CSV file, and deal with child tables?
I can't write the file as a CSV; there is an error. I want to extract bike-sharing system data from a Wiki page and convert the data to a data frame, but when I use the head() function to see the table, or the str() function, I can't make sense of the table; it comes out with so many unorganized details. Also note that this HTML page contains at least three child <table> nodes under the root HTML node, so you will need to use the html_nodes(root_node, "table") function to get all its child nodes:

<html>
  <table>(table1)</table>
  <table>(table2)</table>
  <table>(table3)</table>
  ...
</html>

url <- "https://en.wikipedia.org/wiki/List_of_bicycle-sharing_systems"
root_node <- read_html(url)
table_nodes <- html_nodes(root_node, "table")
Bicycle_sharing <- html_table(table_nodes, fill = TRUE)

head(Bicycle_sharing)
summary(Bicycle_sharing)
str(Bicycle_sharing)

## Exporting the data frame as a CSV file.
write.csv(mtcars, "raw_bike_sharing_systems.csv", row.names = FALSE)
library(tidyverse)
library(rvest)

data <- "https://en.wikipedia.org/wiki/List_of_bicycle-sharing_systems" %>%
  read_html() %>%
  html_table() %>%
  getElement(2) %>%
  janitor::clean_names()

data %>% write_csv(file = "bike_sharing.csv")
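If you are unsure which of the page's tables you need, here is a quick sketch for inspecting them before hard-coding the index passed to getElement() (2 above):

library(rvest)
tables <- read_html("https://en.wikipedia.org/wiki/List_of_bicycle-sharing_systems") %>%
  html_table(fill = TRUE)
# compare dimensions to spot the main listing
sapply(tables, dim)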
Extracting affiliation information from PubMed search string in R
I need some help extracting affiliation information from PubMed search strings in R. I have already successfully extracted affiliation information from a single PubMed ID XML, but now I have a search string of multiple terms that I need to extract the affiliation information from, with the hope of then creating a data frame with columns such as PMID, author, country, state, etc. This is my code so far:

my_query <- (PubMed Search String)
my_entrez_id <- get_pubmed_ids(my_query)
my_abstracts_txt <- fetch_pubmed_data(my_entrez_id, format = "abstract")

The PubMed search string is very long, hence why I haven't included it here. The main aim is therefore to produce a data frame from this search string: a table clearly showing affiliation and other general information from the PubMed articles. Any help would be greatly appreciated!
Have you tried the pubmedR package? https://cran.rstudio.com/web/packages/pubmedR/index.html

library(pubmedR)
library(purrr)
library(tidyr)
library(dplyr)

my_query <- '(((("diabetes mellitus"[MeSH Major Topic]) AND ("english"[Language])) AND (("2020/01/01"[Date - Create] : "3000"[Date - Create]))) AND ("coronavirus"[MeSH Major Topic])'
my_request <- pmApiRequest(query = my_query, limit = 5)

You can use the built-in function

my_pm_df <- pmApi2df(my_request)

but this will not provide affiliations for all authors. You can use a combination of pluck() and map() from purrr to extract what you need into a tibble:

auth <- pluck(my_request, "data") %>%
  {
    tibble(
      pmid = map_chr(., pluck, "MedlineCitation", "PMID", "text"),
      author_list = map(., pluck, "MedlineCitation", "Article", "AuthorList")
    )
  }

All author data is contained in that nested list, in the Author$AffiliationInfo list (note it is a list because one author can have multiple affiliations).

=================================================

EDIT based on comments:

First construct your request URLs. Make sure you replace &email= with your email address:

library(httr)
library(xml2)

mypmids <- c("32946812", "32921748", "32921727", "32921708", "32911500",
             "32894970", "32883566", "32880294", "32873658", "32856805",
             "32856803", "32820143", "32810084", "32809963", "32798472")

my_query <- paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=",
                   mypmids, "&retmode=xml&email=MYEMAIL@MYDOMAIN.COM")

I like to wrap my API requests in safely() to catch any errors. Then use map() to loop through the my_query vector. Note we Sys.sleep() for 5 seconds after each request to comply with PubMed's rate limit. You can probably cut this down to a few seconds or even less; check the API documentation.

get_safely <- safely(GET)

my_req <- map(my_query, function(z) {
  print(z)
  req <- get_safely(url = z)
  Sys.sleep(5)
  return(req)
})

Next we parse each response by passing content() into read_xml(). Note that we are parsing the result element:

my_resp <- map(my_req, function(z) {
  read_xml(content(z$result, as = "text", encoding = "UTF-8"))
})

This can probably be cleaned up some, but it works. Coerce the AuthorList to a list and use a combination of map(), pluck() and unnest(). Note that a given author might have more than one affiliation, but we are only plucking the first one.

my_pm_list <- map(my_resp, function(z) {
  my_xml <- xml_child(xml_child(z, 1), 1)
  pmid <- xml_text(xml_find_first(my_xml, "//PMID"))
  authinfo <- as_list(xml_find_all(my_xml, ".//AuthorList"))
  return(list(pmid, authinfo))
})

myauthinfo <- map(my_pm_list, function(z) {
  auth <- z[[2]][[1]]
})

mytibble <- myauthinfo %>%
  {
    tibble(
      lastname = map_depth(., 2, pluck, "LastName", 1, .default = NA_character_),
      firstname = map_depth(., 2, pluck, "ForeName", 1, .default = NA_character_),
      affil = map_depth(., 2, pluck, "AffiliationInfo", "Affiliation", 1, .default = NA_character_)
    )
  }

my_unnested_tibble <- mytibble %>%
  bind_cols(pmid = map_chr(my_pm_list, pluck, 1)) %>%
  unnest(c(lastname, firstname, affil))
SDMX to dataframe with RSDMX in R
I'm trying to get data from the Lithuanian Statistics Department (LSD). They offer an SDMX API with either XML or JSON. The example XML shown is https://osp-rs.stat.gov.lt/rest_xml/data/S3R629_M3010217, which downloads the XML file. I tried the following:

devtools::install_github("opensdmx/rsdmx")
library(rsdmx)
string <- "https://osp-rs.stat.gov.lt/rest_xml/data/S3R629_M3010217"
medianage <- readSDMX(string)

which results in the error:

<simpleError in doTryCatch(return(expr), name, parentenv, handler): Invalid SDMX-ML file>

I also tried simply reading in the manually downloaded file:

devtools::install_github("opensdmx/rsdmx")
library(rsdmx)
medianage <- readSDMX(file = "rest_data_M3010217_20180116163251.xml", isURL = FALSE)
medianage <- as.data.frame(medianage)

which results in medianage being NULL (empty). Maybe someone has an idea how I could solve downloading/transforming the data from LSD by using either:

https://osp-rs.stat.gov.lt/rest_xml/data/S3R629_M3010217
https://osp-rs.stat.gov.lt/rest_json/data/S3R629_M3010217

Thanks a lot!
In order to use rsdmx for this data source, some enhancements have been added (see details at https://github.com/opensdmx/rsdmx/issues/141). You will need to re-install rsdmx from GitHub (version 0.5-11).

You can use the URL of the SDMX-ML file:

library(rsdmx)
url <- "https://osp-rs.stat.gov.lt/rest_xml/data/S3R629_M3010217"
medianage <- readSDMX(url)
df <- as.data.frame(medianage)

A connector has also been added to rsdmx to facilitate data queries on the LSD (Lithuanian Statistics Department) SDMX endpoint. Below is an example of how to use it:

sdmx <- readSDMX(providerId = "LSD", resource = "data",
                 flowRef = "S3R629_M3010217", dsd = TRUE)
df <- as.data.frame(sdmx, labels = TRUE)

The above example shows how to enrich the data.frame with code labels extracted from the SDMX Data Structure Definition (DSD). For this, specify dsd = TRUE in readSDMX; this then allows labels = TRUE when converting to a data.frame. For filtering data with readSDMX (e.g. startPeriod, endPeriod, code filters), check this page: https://github.com/opensdmx/rsdmx/wiki#readsdmx-as-helper-function
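For instance, a minimal sketch of a period filter, assuming the start and end arguments of the readSDMX helper apply to this provider (untested against this endpoint):

sdmx <- readSDMX(providerId = "LSD", resource = "data",
                 flowRef = "S3R629_M3010217",
                 start = 2010, end = 2015)
df <- as.data.frame(sdmx)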
Crawl data from an "angular.callbacks" web page
I want to use R to crawl the news from this URL (http://www.foxnews.com/search-results/search?q="AlphaGo"&ss=fn&start=0). Here is my code:

url <- "http://api.foxnews.com/v1/content/search?q=%22AlphaGo%22&fields=date,description,title,url,image,type,taxonomy&section.path=fnc&start=0&callback=angular.callbacks._0&cb=2017719162"
html <- str_c(readLines(url, encoding = "UTF-8"), collapse = "")
content_fox <- RJSONIO::fromJSON(html)

However, the JSON could not be parsed, and this error showed up:

Error in file(con, "r") : cannot open the connection

I notice that the JSON starts with angular.callbacks._0, which I think might be the problem. Any idea how to fix this?
According to the answer in Parse JSONP with R, I adjusted my code with two new lines and it worked:

library(stringr)
url <- "http://api.foxnews.com/v1/content/search?q=%22AlphaGo%22&fields=date,description,title,url,image,type,taxonomy&section.path=fnc&start=0&callback=angular.callbacks._0&cb=2017719162"
html <- str_c(readLines(url, encoding = "UTF-8"), collapse = "")
html <- sub('[^\\{]*', '', html)  # remove function name and opening parenthesis
html <- sub('\\)$', '', html)     # remove closing parenthesis
content_fox <- RJSONIO::fromJSON(html)
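The same unwrapping can also be done in a single substitution; a minimal sketch, assuming the payload is wrapped as angular.callbacks._0(...) with an optional trailing semicolon:

library(stringr)
raw <- str_c(readLines(url, encoding = "UTF-8"), collapse = "")
# capture everything between the outermost parentheses of the JSONP wrapper
json_txt <- sub('^[^(]*\\((.*)\\);?\\s*$', '\\1', raw)
content_fox <- RJSONIO::fromJSON(json_txt)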
Separate multiple JSON data in R
I am a newbie in R and working on the JSON file below (a snippet of the head and the relevant parts).

{"mdsDat":{"x":[0.098453,-0.19334,-0.23836,-0.28512,0.010195,0.14132,-0.026636,-0.17141,
0.082936,-0.030503,0.22893,0.097832,0.19978,0.048286,0.050141,0.026101,-0.10637,0.040702,
0.013531,0.013531],
"y":[-0.21144,-0.25048,0.14525,-0.06405,0.16668,-0.066238,-0.23403,0.17033,-0.037128,
-0.019674,0.0089501,0.0069049,0.10143,-0.14445,0.052727,0.15911,0.049328,0.074852,
0.045969,0.045969],
"topics":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
"Freq":[16.358,13.397,12.979,10.383,7.5134,7.16,6.1765,4.9584,4.6035,3.4624,3.4249,3.0709,
1.8512,1.8512,1.4977,0.90723,0.23895,0.16034,0.0031352,0.0031352],
"cluster":[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]},
"tinfo":{"Term":["equation","equations","mathematics","beauty","mathematical","people",
"beautiful","explain","world","time","understand","science","things","meaning","language",
"symbols","simple","life","nature","interesting","art","agree","movie","find","numbers",
"explore","mass","relationship","video","scientists","agree","scientists","amazing","learn",
"apply","artistic","common","fear","beautiful","mathematics","study","mathematical","science",
"meaning","physics","gravity","exchange","math","world","future","explained","sense",
"process","words","equations","experience","move","faster","eyes","fall","nature","power",
"human","exam","things","answer","people","world","ways","truth","equations","video",
"balance","painting","space" ...

"token.table":{"Term":["0","1","2","2","abstract","abstract","addition","admire","agree",
"amazing","answer","answer","apple","application","applied","applied","apply","art","artist",
"artist","artistic","arts","balance","balance","balance","beautiful","beautiful","beautiful",
"beautiful","beauty","beauty","bring","bring","bunch","bunch","calculate","calculation",
"collings","collings","collings","common","complex","complex","complex","contact","curiosity",
"curiosity","daily","difficult","discover","documentary","documentary","documentary","earth",
"earth","einstein","energy","energy","english","english","enjoy","enjoy","enjoy","equation",
"equation","equation","equation","equation","equations","equations","equations","equations",
"exam","exam","examination","examination","exchange","exchange","exchange","experience",
"experience","explain","explained","explained","explore","eyes","eyes","fact","fall","famous",
"famous","faster","faster","fear","feel","film","film","find","find","force","formula","formula","found",
...
"work","world","world","world","worlds","years","years"],
"Topic":[8,5,11,13,8,10,5,4,1,1,2,15,9,10,9,12,1,3,4,9,1,5,2,4,10,1,4,7,14,2,6,3,15,10,15,
... ,16,3,7,2,14,2,5,1,8,4,9,10,15,1,2,14,9,11,13],
"Freq":[0.97036,0.9702,0.75081,0.25027,0.22141,0.77494,0.97584,0.96609,0.99493,0.98083,
0.73954,0.24651,0.99013,0. ...

In the situation of the project, I have created a variable getJSONfield as below:

getJSONfield <- json %>%
  spread_values(jsonList = jstring("token.table")) %>%
  select(jsonList)

It returns a JSON list something like this:

jsonNodes
1 list(Term = list("0", "1", "1", "2", "Data of JSON"), Topic = list(9, 1, 10, "Data of JSON"), Freq = list(0.99834, "Data of JSON"))

And I have to separate the multiple variables (i.e. Term, Topic and Freq) to serve as the heads and edges of a network diagram.
This is roughly how I would like to use the JSON data:

jsonNode <- lapply(json$topic, header = T, as.is = T)
jsonTermsLinkS <- lapply(json$term, header = T, as.is = T)
jsonTermsLinkE <- lapply(json$freq, header = T, as.is = T)

But first, I need to separate or call them successfully. Does anyone have any idea or advice on this? Great thanks if someone can help me!