kableExtra::text_spec - Rotate Text - Unwanted Commas - html

Using RStudio > Blogdown > Hugo to create a blog.
I am inserting this R code in a post. When the HTML is rendered, there are commas between the rotated letters. Why is that?
library("knitr")
library("kableExtra")
library("dplyr")
library("formattable")
library("stringr")
library("tidyverse")
p1 <- c("R Markdown is pretty neat. You can do things like this. I wonder why more people don't")
p1 <- c("Hello World!")
p2 <- c("do this. It's so much easier to read. NOTE: Those people live here.")
p_text <- unlist(strsplit(p1, "")) # strsplit returns a list. Make it a vector.
num_char <- length(p_text)
p_angle <- seq(30, 360, 30)
num_ang <- length(p_angle)
p_angle_long <- rep(p_angle, ceiling(num_char / num_ang)) # Repeat the angles for the length of the string
p_angle_long <- p_angle_long[1:num_char]
rtext <- text_spec(p_text, "html", bold = TRUE, angle = p_angle_long)

The output of text_spec() is a vector with each letter (plus its accompanying HTML tags) as a separate element. You can combine them into a single string with paste0():
# Example RMarkdown chunk that produces rotated text:
```{r txt, results='asis'}
library("knitr")
library("kableExtra")
library("tidyverse")
p1 <- c("Hello World!")
p2 <- c("do this. It's so much easier to read. NOTE: Those people live here.")
p_text <- unlist(strsplit(p1, "")) # strsplit returns a list. Make it a vector.
num_char <- length(p_text)
p_angle <- seq(30, 360, 30)
num_ang <- length(p_angle)
p_angle_long <- rep(p_angle, ceiling(num_char / num_ang))
# Repeat the angles for the length of the string
p_angle_long <- p_angle_long[1:num_char]
rtext <- text_spec(p_text, "html", bold = TRUE, angle = p_angle_long)
cat(paste0(rtext, collapse=""))
```
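The commas most likely come from knitr's inline output hook: emitting rtext with inline code like `r rtext` makes knitr collapse a vector of length > 1 with ", ". A small illustration of the difference (hypothetical two-element vector):
```
rtext <- c("<b>H</b>", "<b>i</b>")
paste(rtext, collapse = ", ")  # "<b>H</b>, <b>i</b>" -- what inline output produces
paste0(rtext, collapse = "")   # "<b>H</b><b>i</b>"   -- what cat(paste0(...)) emits
```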

column_spec crashing PDF in Rmd output to both PDF and html in shiny app

EDIT: Because I had set the format options in a global function, I have to set either latex_options or bootstrap_options in the kable_styling() call. I was using bootstrap_options, which wasn't being read by LaTeX. My work-around is to make the tables twice: once in a chunk for HTML, and once in a chunk for LaTeX. Not great, but it works if I click the Knit button and choose Knit to PDF. However, it still throws the original error when I run it in the shiny app.
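A minimal sketch of that duplicated-chunk work-around (the chunk names and the t01_html / t01_latex objects are assumptions, standing for the same table styled with bootstrap_options and latex_options respectively):
```{r t01-html, eval=knitr::is_html_output()}
t01_html   # built with kable_styling(bootstrap_options = c("striped", "condensed"))
```
```{r t01-latex, eval=knitr::is_latex_output()}
t01_latex  # same table built with kable_styling(latex_options = c("striped"))
```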
I have created a test version (MiniTest) of my project. What I need is a shiny app with a tab that produces an HTML file for a user-chosen (reactive) Country, provides an Excel download (I have that working, so I kept it out of this example), and a PDF download. I knit an .Rmd that chooses the format and allows for parameterization. (The shiny part was set up by someone else, from whom I took over this project when they left before finishing it.)
I use kable and kableExtra to create and format the tables, as I heard they work for both HTML and LaTeX output. The HTML is more or less as I want it. I can knit either HTML or PDF and it runs, BUT in the shiny app only the HTML portion works. I think I have narrowed the PDF issue down to column_spec crashing the download: if I comment out the column_spec lines in t01 and t02, the Download PDF runs. But I need that formatting. I'm sorry, but I've lost track of all the sites I have searched.
In global.R, I set:
countries <- c("ABC", "DEF", "GHI", "JKL")
In the .Rmd, I have the YAML set up as follows:
params:
  Country: ABC
output:
  pdf_document: default
  html_document: default
Relevant .Rmd chunks and inline code include:
knitr::opts_chunk$set(echo = FALSE)
options(knitr.table.format = function() {
  if (knitr::is_latex_output()) "latex" else "html"
})

library(shiny)
library(htmlwidgets)
library(shinythemes)
library(shinydashboard)
library(shinyjs)
library(shinycssloaders)
library(markdown)
library(tidyr)
library(tidyverse)
library(janitor)
library(kableExtra)
options(scipen = 999)

mini <- mtcars %>%
  tibble::rownames_to_column(var = "car") %>%
  mutate(Country = c(rep("ABC", 8), rep("DEF", 8), rep("GHI", 8), rep("JKL", 8)))

## https://bookdown.org/yihui/rmarkdown-cookbook/font-color.html
colorize <- function(x, color) {
  if (knitr::is_latex_output()) {
    ## hack: hard-code color = 'blue' instead of a hex code, since the "#" breaks the LaTeX
    sprintf("\\textcolor{%s}{%s}", color = 'blue', x) ## works, but isn't the right blue
  } else if (knitr::is_html_output()) {
    sprintf("<span style='color: %s;'>%s</span>", color, x)
  } else x
}

## make two tables with `kable` and `kableExtra`
new_title <- paste0("Dynamically Changing Country Name in column ", params$Country)
t01 <- mini %>%
  filter(Country == params$Country) %>%
  select(car, mpg:hp) %>%
  rename({{new_title}} := car) %>%
  kable(align = c("l", "c", "c", "c", "c")) %>%
  kable_styling(full_width = FALSE, position = "left", bootstrap_options = c("striped", "condensed")) %>%
  column_spec(1, bold = TRUE) %>%
  column_spec(2:3, width = "5em") %>%
  row_spec(0, color = "#2A64AB") %>%
  row_spec(6, bold = TRUE)

t02_title <- paste0(params$Country, " Table with Dollar Signs in Var Names")
t02 <- mini %>%
  filter(Country == params$Country) %>%
  select(car, drat, wt) %>%
  mutate(car = case_when(car == "Mazda RX4" ~ "Mazda RX4 (US\\$)*", TRUE ~ as.character(car))) %>%
  ## want to blank out the column names - removing them entirely would be best, but that fails
  kable(align = c("l", "r", "c"), escape = TRUE, col.names = c("", "", "")) %>%
  kable_styling(full_width = FALSE, position = "left", bootstrap_options = c("striped", "condensed")) %>%
  column_spec(1, bold = TRUE) %>%
  column_spec(2, width = "10em") %>%
  footnote(general = "*Never smart to start with an asterisk, but here we are", general_title = "")

## make two charts with `ggplot2`
chart1 <- mini %>%
  filter(Country == params$Country) %>%
  select(car, mpg:hp) %>%
  ggplot2::ggplot(mapping = aes(x = mpg)) +
  geom_col(aes(y = `cyl`, fill = "cyl"), color = "black")
c1_title <- paste0("Some fab title here for ", params$Country)

chart2 <- mini %>%
  filter(Country == params$Country) %>%
  select(car, vs:carb) %>%
  ggplot2::ggplot(mapping = aes(x = carb)) +
  geom_col(aes(y = `gear`, fill = "gear"), color = "black")
c2_title <- paste0("Another chart, ", params$Country)

## make a "tiny" LaTeX environment that is only generated for LaTeX output, with chunk setting `include = knitr::is_latex_output()`
knitr::asis_output('\n\n\\begin{tiny}')

## Table 1
t01
I expect a PDF to pop up, but instead a Save File box pops up asking to save "DownloadPDF" with no file extension. The ui.R is supposed to name it "FactCountryName.pdf", where "CountryName" is the Country the user chose in the drop-down list. Regardless of whether I choose Save (nothing happens) or Cancel, R throws the following error:
```
! LaTeX Error: Illegal character in array arg.
```
If I comment out the line column_spec(1, bold = TRUE) %>%, the error changes to:
```
! Use of \@array doesn't match its definition.
\new@ifnextchar ...reserved@d = #1\def \reserved@a {
#2}\def \reserved@b {#3}\f...
l.74 ...m}|>{\centering\arraybackslash}p{5em}|c|c}
```
Please help!
It turns out that using the Knit button in RStudio automatically loads the required LaTeX packages, such as booktabs. Running the file in the Shiny app was not loading all the packages needed. All I had to do was explicitly list the extra packages in the YAML (I found them by looking at the .tex file generated when knitting the PDF via the Knit button).
---
params:
  Country: ABC
header-includes:
  - \usepackage{booktabs}
  - \usepackage{longtable}
  - \usepackage{array}
  - \usepackage{multirow}
  - \usepackage{wrapfig}
  - \usepackage{float}
  - \usepackage{colortbl}
  - \usepackage{pdflscape}
  - \usepackage{tabu}
  - \usepackage{threeparttable}
  - \usepackage{threeparttablex}
output:
  pdf_document:
    keep_tex: true
  html_document: default
---

R - Issue with the DOM of the Danish parliament (webscraping)

I've been working on a webscraping project for the political science department at my university.
The Danish parliament is very transparent about its democratic process, and it uploads all the legislative documents to its website. I've been crawling all pages starting in 2008. Right now I'm parsing the information into a dataframe, and I'm having an issue that I have not been able to resolve so far.
If we look at the DOM, we can see that they named most of the objects div.tingdok-normal. The number of objects varies between 16 and 19. To parse the information correctly for my dataframe, I tried to grep out the necessary parts according to patterns. However, sometimes my pattern matches more than once, and I don't know how to tell R that I only want the first match.
For the sake of an example, I include some code:
library(RCurl)
library(rvest)

final.url <- "https://www.ft.dk/samling/20161/lovforslag/l154/index.htm"
to.save <- getURL(final.url)
p <- read_html(to.save)
normal <- p %>% html_nodes("div.tingdok-normal > span") %>% html_text(trim = TRUE)
tomatch <- c("Forkastet regeringsforslag", "Forkastet privat forslag", "Vedtaget regeringsforslag", "Vedtaget privat forslag")
type <- unique(grep(paste(tomatch, collapse = "|"), normal, value = TRUE))
Maybe you can help me with that.
My understanding is that you want to extract the text of the webpage, because the "tingdok-normal" nodes hold the text. I was able to get the text of the webpage with the following code. It also identifies the position of the first regex hit for each of the patterns to match.
library(pagedown)
library(pdftools)
library(stringr)

pagedown::chrome_print("https://www.ft.dk/samling/20161/lovforslag/l154/index.htm",
                       "C:/.../danish.pdf")
text <- pdftools::pdf_text("C:/.../danish.pdf")

tomatch <- c("(A|a)ftalen", "(O|o)pholdskravet")
nb_Tomatch <- length(tomatch)

list_Position <- list()
list_Text <- list()

for (i in 1:nb_Tomatch) {
  # Locates the first hit of the regex
  # To locate all regex hits, use stringr::str_locate_all
  list_Position[[i]] <- stringr::str_locate(text, pattern = tomatch[i])
  list_Text[[i]] <- stringr::str_sub(string = text,
                                     start = list_Position[[i]][1, 1],
                                     end = list_Position[[i]][1, 2])
}
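If the goal is just "only the first match", note that stringr::str_extract() already returns only the first regex hit per string, so the loop above can be condensed (a sketch reusing text and tomatch from the code above):
library(stringr)
# str_extract() returns the first match per element; str_extract_all() would return every match
first_hits <- sapply(tomatch, function(p) stringr::str_extract(text, pattern = p))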
Here is another approach (note that RDCOMClient drives Internet Explorer via COM, so it is Windows-only):
library(RDCOMClient)
library(stringr)

url <- "https://www.ft.dk/samling/20161/lovforslag/l154/index.htm"
IEApp <- COMCreate("InternetExplorer.Application")
IEApp[['Visible']] <- TRUE
IEApp$Navigate(url)
Sys.sleep(5)
doc <- IEApp$Document()
html_Content <- doc$documentElement()$innerText()

tomatch <- c("(A|a)ftalen", "(O|o)pholdskravet")
nb_Tomatch <- length(tomatch)

list_Position <- list()
list_Text <- list()

for (i in 1:nb_Tomatch) {
  # Locates the first hit of the regex
  # To locate all regex hits, use stringr::str_locate_all
  list_Position[[i]] <- stringr::str_locate(html_Content, pattern = tomatch[i])
  list_Text[[i]] <- stringr::str_sub(string = html_Content,
                                     start = list_Position[[i]][1, 1],
                                     end = list_Position[[i]][1, 2])
}

Read HTML into R

I would like R to take each word in a column of a dataset and return a value from a website. The code I have so far is below. For each word in the data frame column, it should go to the website and return the pronunciation (for example, the pronunciation on http://www.speech.cs.cmu.edu/cgi-bin/cmudict?in=word&stress=-s is "W ER1 D"). I have looked at the HTML of the website, and it's unclear what I would need to enter to return this value - it's between <tt> and </tt>, but there are many of these. I'm also not sure how to then get that value into R. Thank you.
library(xml2)

for (word in df$word) {
  result <- read_html(paste0("http://www.speech.cs.cmu.edu/cgi-bin/cmudict?in=", word, "&stress=-s"))
}
Parsing HTML is a tricky task in R. There are a couple of ways, though. If the HTML converts well to XML and the website/API always returns the same structure, then you can use XML-parsing tools. Otherwise you could use regex and call stringr::str_extract() on the raw HTML.
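As a quick illustration of that regex route (a hypothetical snippet, not the approach used below):
library(stringr)
# grab the contents of the second <tt> tag with lookaround assertions
html <- "<html><body><tt>entry</tt><tt>W ER1 D .</tt></body></html>"
str_extract_all(html, "(?<=<tt>)[^<]+(?=</tt>)")[[1]][2]
# [1] "W ER1 D ."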
For your case, it is fairly easy to get the value you're looking for using XML tools. It's true that there are a lot of <tt> tags, but the one you want is always the second instance, so you can just pull out that one.
# load packages. dplyr is just for the pipe %>% operator
library(httr)
library(XML)
library(dplyr)

# test words
wordlist <- c('happy', 'sad')

for (word in wordlist) {
  # build the url and GET the result
  url <- paste0("http://www.speech.cs.cmu.edu/cgi-bin/cmudict?in=", word, "&stress=-s")
  h <- handle(url)
  res <- GET(handle = h)
  # parse the HTML
  resXML <- htmlParse(content(res, as = "text"))
  # retrieve the second <tt>
  print(getNodeSet(resXML, '//tt[2]') %>% sapply(., xmlValue))
  # don't abuse your API
  Sys.sleep(0.1)
}
[1] "HH AE1 P IY0 ."
[1] "S AE1 D ."
Good luck!
EDIT: This code will return a dataframe:
# load packages. dplyr is just for the pipe %>% operator
library(httr)
library(XML)
library(dplyr)

# test words
wordlist <- c('happy', 'sad')

# initialize the dataframe with a pronunciation field
pronunciation_list <- data.frame(pronunciation = character(), stringsAsFactors = FALSE)

# loop over the words
for (word in wordlist) {
  # build the url and GET the result
  url <- paste0("http://www.speech.cs.cmu.edu/cgi-bin/cmudict?in=", word, "&stress=-s")
  h <- handle(url)
  res <- GET(handle = h)
  # parse the HTML
  resXML <- htmlParse(content(res, as = "text"))
  # retrieve the second <tt>
  to_add <- data.frame(pronunciation = (getNodeSet(resXML, '//tt[2]') %>% sapply(., xmlValue)))
  # bind the data
  pronunciation_list <- rbind(pronunciation_list, to_add)
  # don't abuse your API
  Sys.sleep(0.1)
}
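For reference, the same extraction can be written with rvest (a sketch added for comparison, not part of the original answer; it assumes the same URL scheme and the second-<tt> convention):
library(rvest)
word <- "happy"
url <- paste0("http://www.speech.cs.cmu.edu/cgi-bin/cmudict?in=", word, "&stress=-s")
page <- read_html(url)
# the XPath picks out the second <tt> node directly
html_text(html_element(page, xpath = "//tt[2]"))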

R highcharter get data from plots saved as html

I plot data with the highcharter package in R and save the plots as HTML to keep the interactive features. In most cases I plot more than one graph, so I bring them together as a canvas.
require(highcharter)
hc_list <- lapply(list(sin, cos, tan, tanh), mapply, seq(1, 5, by = 0.1)) %>%
  lapply(function(x) highchart() %>% hc_add_series(x))
hc_grid <- hw_grid(hc_list, ncol = 2)
htmltools::browsable(hc_grid) # print
htmltools::save_html(hc_grid, "test_grid.html") # save
I want to extract the data from plots that I have saved as HTML in the past, just like these. Normally I would do hc_list[[1]]$x$hc_opts$series, but when I import the HTML into R and try the same, I get an error. It won't do the job.
> hc_imported <- htmltools::includeHTML("test_grid.html")
> hc_imported[[1]]$x$hc_opts$series
Error in hc_imported$x : $ operator is invalid for atomic vectors
If I were able to write a function like
get_my_data(my_imported_highcharter, 3) # get data from the 3rd plot
that would be ideal. Regards.
You can use the code below. (htmltools::includeHTML() just returns the file's text as a character vector, which is why the $ operator fails; instead, you have to parse the saved HTML and recover the series data from the JSON that htmlwidgets embeds in the page.)
require(highcharter)
hc_list <- lapply(list(sin, cos, tan, tanh), mapply, seq(1, 5, by = 0.1)) %>%
  lapply(function(x) highchart() %>% hc_add_series(x))
hc_grid <- hw_grid(hc_list, ncol = 2)
htmltools::browsable(hc_grid) # print
htmltools::save_html(hc_grid, "test_grid.html") # save

# hc_imported <- htmltools::includeHTML("test_grid.html")
# hc_imported[[1]]$x$hc_opts$series

library(jsonlite)
library(XML)

get_my_data <- function(my_imported_highcharter, n) {
  # read and parse the saved HTML into a tree
  webpage <- readLines(my_imported_highcharter)
  pagetree <- htmlTreeParse(webpage, error = function(...) {})
  body <- pagetree$children$html$children$body
  # the n-th chart of the grid sits in the n-th child div
  divbodyContent <- body$children$div$children[[n]]
  # its htmlwidgets JSON payload is in the second child (a <script> tag)
  script <- divbodyContent$children[[2]]
  data <- as.character(script$children[[1]])[6]
  data <- fromJSON(data, simplifyVector = FALSE)
  data <- data$x$hc_opts$series[[1]]$data
  return(data)
}

get_my_data("test_grid.html", 3)
get_my_data("test_grid.html", 1)

How to read a <li> table in a webpage

I have debugged the program many times to get the result, which looks as follows:
url 研究所知识库列表
/handle/1471x/1 力学研究所
/handle/1471x/8865 半导体研究所
However, no matter what parameters I use, the result is not correct. The content of this table is part of the basis of my further analysis, and I am quite anxious about it. I'm looking forward to your help, with great sincerity.
## download community-list --- the 1st level of IR Grid
# loading webpage and analyzing
library(XML)
community_url <- "http://www.irgrid.ac.cn/community-list"
com_source <- readLines(community_url, encoding = "UTF-8")
com_parsed <- htmlTreeParse(com_source, encoding = "UTF-8", useInternalNodes = TRUE)

# get table specs
tableNodes <- getNodeSet(com_parsed, "//table")
com_tb <- readHTMLTable(tableNodes[[8]], header = TRUE)

# get external links
xpath <- "//a/@href"
getHTMLExternalFiles(tableNodes[[8]], xpQuery = xpath)
It is unclear exactly what you want your end result to look like, but if you modify your XPath statements a bit to take advantage of the DOM structure, you can get something like this:
library(XML)

community_url <- "http://www.irgrid.ac.cn/community-list"
com_source <- readLines(community_url, encoding = "UTF-8")
com_parsed <- htmlTreeParse(com_source, encoding = "UTF-8", useInternalNodes = TRUE)

list_header <- xpathSApply(com_parsed, '//table[.//li]//h1', xmlValue)
hrefs <- xpathSApply(com_parsed, '//li[@class="communityLink"]//@href', function(x) unname(x))
display_text <- xpathSApply(com_parsed, '//li[@class="communityLink"]//a', xmlValue)

table_data <- cbind(display_text, hrefs)
colnames(table_data) <- c(list_header, "url")
table_data
(The console output was attached as a screenshot in the original answer, because posting it as text triggered Stack Overflow's spam filter.)