Putting hyperlinks into an HTML table in R - html

I am a biologist trying to do computer science for research, so I may be a bit naïve. I would like to make a table containing information from a data frame, with a hyperlink in one of the columns. I imagine this needs to be an HTML document(?). I found this post describing how to put a hyperlink into a data frame and write it as an HTML file using googleVis. I would like to use this approach (it is the only one I know and it seems to work well), except I would like to replace the raw URL with a description. The motivation is that I want to include many of these hyperlinks, and the links have long addresses that are difficult to read.
To be verbose, I essentially want to do what I did just above: the reader sees 'this post', but the link points to
http://stackoverflow.com/questions/8030208/exporting-table-in-r-to-html-with-hyperlinks

From your previous question, you can keep a second vector that contains the titles for the URLs:
url <- c('http://nytimes.com', 'http://cnn.com', 'http://www.weather.gov')
urlTitles <- c('NY Times', 'CNN', 'Weather')
foo <- transform(foo, url = paste('<a href = ', shQuote(url), '>', urlTitles, '</a>'))
x = gvisTable(foo, options = list(allowHtml = TRUE))
plot(x)

Building on Jack's answer but consolidating from different threads:
library(googleVis)
library(R2HTML)
url <- c('http://nytimes.com', 'http://cnn.com', 'http://www.weather.gov')
urlTitles <- c('NY Times', 'CNN', 'Weather')
foo <- data.frame(a=c(1,2,3), b=c(4,5,6), url=url)
foo <- transform(foo, url = paste('<a href = ', shQuote(url), '>', urlTitles, '</a>'))
x <- gvisTable(foo, options = list(allowHtml = TRUE))
plot(x)
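
If you want to skip googleVis entirely, the same trick works with any HTML table, since the visible link text is just whatever sits inside the anchor tag. Here is a minimal sketch using only base R (the output file name is made up for illustration): build the anchors with sprintf(), then wrap the rows in a bare-bones table and write it to disk.
url <- c('http://nytimes.com', 'http://cnn.com', 'http://www.weather.gov')
urlTitles <- c('NY Times', 'CNN', 'Weather')
# the reader sees the title; the href holds the long address
links <- sprintf('<a href="%s">%s</a>', url, urlTitles)
foo <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), url = links)
# wrap each row in <tr>/<td> tags and write a standalone HTML file
rows <- apply(foo, 1, function(r) paste0('<tr><td>', paste(r, collapse = '</td><td>'), '</td></tr>'))
writeLines(c('<table border="1">', rows, '</table>'), 'table_with_links.html')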


Deleting commas in R Markdown html output

I am using R Markdown to create an html file for regression results tables, which are produced by stargazer and lfe in a code chunk.
library(lfe); library(stargazer)
data <- data.frame(x = 1:10, y = rnorm(10), z = rnorm(10))
result <- stargazer(felm(y ~ x + z, data = data), type = 'html')
I create an html file with an inline code `r result` after the chunk above. However, a bunch of commas appear at the top of the table.
When I check the html code, I see almost every </tr> is followed by a comma.
How can I delete these commas?
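The commas come from how the table is being printed: stargazer with type = 'html' returns a character vector of HTML lines, and knitr joins vectors with commas when printing them inline via `r result`. A minimal sketch of the usual fix, assuming the same setup as your chunk above, is to call stargazer directly inside a chunk with the option results='asis', so the HTML is written verbatim instead of inline:
library(lfe); library(stargazer)
data <- data.frame(x = 1:10, y = rnorm(10), z = rnorm(10))
# with the chunk option results='asis', no inline call is needed
stargazer(felm(y ~ x + z, data = data), type = 'html')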
Maybe not what you are looking for exactly, but I am a huge fan of modelsummary. I knit to HTML to see how it looks and then usually knit to pdf. The modelsummary equivalent would look something like this:
library(lfe)
library(modelsummary)
data = data.frame(x = 1:10, y = rnorm(10), z = rnorm(10))
results = felm(y ~ x + z, data = data)
modelsummary(results)
There are a lot of ways to customize it through kableExtra and other packages. The documentation is really good. Here is kind of a silly example
library(kableExtra)
modelsummary(results,
             coef_map = c("x" = "Cool Treatment",
                          "z" = "Confounder",
                          "(Intercept)" = "(Intercept)")) %>%
  row_spec(1, background = "#F5ABEA")
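If I remember the argument correctly, modelsummary can also write straight to disk when you pass an output path, which is handy if you want the raw HTML file rather than a knitted document:
modelsummary(results, output = 'table.html')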

Loop through multiple links from an Excel file, open and download the corresponding webpages

I downloaded from MediaCloud an Excel file with 1719 links to different newspaper articles. I am trying to use R to loop through each link, open it and download all the corresponding online articles in a single searchable file (HTML, CSV, TXT, PDF - doesn't matter) that I can read and analyze later.
I went through all similar questions on Stack Overflow and a number of tutorials for downloading files and managed to assemble this code (I am very new to R):
express <- read.csv("C://Users//julir//Documents//Data//express.csv")
library(curl)
for (express$url in 2:1720)
destfile <- paste0("C://Users//julir//Documents//Data//results.csv")
download.file(express$url, destfile, method = "auto", quiet = TRUE, cacheOK=TRUE)
Whenever I try to run it though I get the following error:
Error in download.file(express$url, destfile = express$url, method = "auto", : 'url' must be a length-one character vector
I also tried this alternative method suggested online:
library(httr)
url <- express$url
response <- GET(express$url)
html_document <- content(response, type = "text", encoding = "UTF-8")
But I get the same kind of error:
Error in parse_url(url) : length(url) == 1 is not TRUE
So I guess there is a problem with how the URLs are stored - but I can't understand how to fix it.
I am also not certain about the downloading process - I would ideally want all the text on the HTML page - it seems impractical to use selectors and rvest in this case - but I might be very wrong.
You need to loop through the urls and read/parse each one individually. You are essentially passing a vector of urls into one request, which is why you see that error.
I don't know your content/urls, but here's an example of how you would approach this:
library(xml2)
library(stringi)
library(jsonlite)
library(dplyr)

df <- data.frame(page_n = 1:5, urls = sprintf('https://www.politifact.com/factchecks/list/?page=%s', 1:5))

result_info <- lapply(df$urls, function(i){
  raw <- read_html(i)
  a_tags <- raw %>% xml_find_all(".//a[contains(@href,'factchecks/2021')]")
  urls <- xml2::url_absolute(xml_attr(a_tags, "href"), xml_url(raw))
  titles <- xml_text(a_tags) %>% stri_trim_both()
  data.frame(title = titles, links = urls)
}) %>% rbind_pages()
result_info %>% head()
title: Says of UW-Madison, "It cost the university $50k (your tax dollars) to remove" a rock considered by some a symbol of racism.
links: https://www.politifact.com/factchecks/2021/aug/14/rachel-campos-duffy/no-taxpayer-funds-were-not-used-remove-rock-deemed/
title: “Rand Paul’s medical license was just revoked!”
links: https://www.politifact.com/factchecks/2021/aug/13/facebook-posts/no-rand-pauls-medical-license-wasnt-revoked/
title: Every time outgoing New York Gov. Andrew Cuomo “says the firearm industry ‘is immune from lawsuits,’ it's false.”
links: https://www.politifact.com/factchecks/2021/aug/13/elise-stefanik/refereeing-andrew-cuomo-elise-stefanik-firearm-ind/
title: The United States' southern border is "basically open" and is "a super spreader event."
links: https://www.politifact.com/factchecks/2021/aug/13/gary-sides/north-carolina-school-leader-repeats-false-claims-/
title: There is a “0.05% chance of dying from COVID.”
links: https://www.politifact.com/factchecks/2021/aug/13/tiktok-posts/experts-break-down-numbers-catching-or-dying-covid/
title: The Biden administration is “not even testing these people” being released by Border Patrol into the U.S.
links: https://www.politifact.com/factchecks/2021/aug/13/ken-paxton/biden-administration-not-even-testing-migrants-rel/
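
Applied to your file, the same pattern would look roughly like the sketch below. This is only a sketch under some assumptions: express.csv really has a url column, all paragraph text from each page is what you want, and everything goes into one searchable text file (tryCatch because some of the 1719 links will inevitably fail):
library(xml2)
express <- read.csv("C://Users//julir//Documents//Data//express.csv", stringsAsFactors = FALSE)
articles <- lapply(express$url, function(u){
  tryCatch({
    page <- read_html(u)
    # crude but searchable: every <p> on the page as plain text
    xml_text(xml_find_all(page, "//p"))
  }, error = function(e) paste("failed:", u))
})
# one block per article, preceded by its url, in a single file
out <- unlist(mapply(function(u, txt) c(u, txt, ""), express$url, articles, SIMPLIFY = FALSE))
writeLines(out, "C://Users//julir//Documents//Data//articles.txt")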

.text is scrambled with numbers and special keys in BeautifulSoup

Hello, I am currently using Python 3, BeautifulSoup 4 and requests to scrape some information from the UK version of supremenewyork.com. I have implemented a proxy script (that I know works) into the script. The only problem is that this website does not like programs scraping this information automatically, so they scramble the text, which I think makes it unusable as plain text.
My question: is there a way to get the text without using .text, and/or a way to make the script read the text so that when it sees a special character like # it skips over it, and when it sees & it skips until it sees a ;?
Because that is basically how this website scrambles the text. Here is an example; the text shown when you inspect the element is:
supremetshirt
which is supposed to say "supreme t-shirt", and so on (you get the idea; they don't scramble with letters, only numbers and special characters).
This is highlighted in a box automatically when you inspect the element using a VPN on the UK supreme website, and it is different from the text (which isn't highlighted at all). Whenever I run my script without the proxy code against my local supremenewyork.com, it works fine, but only because the code isn't scrambled on my local site, and I want to pull this info from the UK site. Any ideas? Here is my code:
import requests
from bs4 import BeautifulSoup

categorys = ['jackets', 'shirts', 'tops_sweaters', 'sweatshirts', 'pants', 'shorts', 't-shirts', 'hats', 'bags', 'accessories', 'shoes', 'skate']
catNumb = 0

# use new proxy every so often for testing (will add something that pulls proxys and uses them for you)
UK_Proxy1 = '51.143.153.167:80'
proxies = {
    'http': 'http://' + UK_Proxy1,
    'https': 'https://' + UK_Proxy1,
}

for cat in categorys:
    catStr = str(categorys[catNumb])
    cUrl = 'http://www.supremenewyork.com/shop/all/' + catStr
    proxy_script = requests.get(cUrl, proxies=proxies).text
    bSoup = BeautifulSoup(proxy_script, 'lxml')
    print('\n*******************"' + catStr.upper() + '"*******************\n')
    catNumb += 1
    for item in bSoup.find_all('div', class_='inner-article'):
        url = item.a['href']
        alt = item.find('img')['alt']
        req = requests.get('http://www.supremenewyork.com' + url)
        item_soup = BeautifulSoup(req.text, 'lxml')
        name = item_soup.find('h1', itemprop='name').text
        #name = item_soup.find('h1', itemprop='name')
        style = item_soup.find('p', itemprop='model').text
        #style = item_soup.find('p', itemprop='model')
        print(alt + ' --- ' + name + ' --- ' + style)
        #print(alt)
        #print(str(name))
        #print(str(style))
When I run this script I get this error:
name = item_soup.find('h1', itemprop='name').text
AttributeError: 'NoneType' object has no attribute 'text'
So what I did was uncomment the lines that are commented out above and comment out the similar lines next to them, and I got some kind of str error, so I tried print(str(name)). I am able to print the alt fine (with every script; the alt is not scrambled), but when it comes to printing the name and style, all that prints is None under every alt.
I have been working on fixing this for days and have come up with no solutions. Can anyone help me solve this?
I solved this myself using the following approach:
thetable = soup5.find('div', class_='turbolink_scroller')
items = thetable.find_all('div', class_='inner-article')
for item in items:
    alt = item.find('img')['alt']
    name = item.h1.a.text
    color = item.p.a.text
    print(alt, ' --- ', name, ' --- ', color)

Generate an HTML report based on user interactions in shiny

I have a shiny application that allows my user to explore a dataset. The idea is that the user explores the dataset and shares anything interesting he finds with his client via email. I don't know in advance how many things the user will find interesting. So, next to each table or chart I have an "add this item to the report" button, which isolates the current view and adds it to a reactiveValues list.
Now, what I want to do is the following:
Loop through all the items in the reactiveValues list,
Generate some explanatory text describing the item (This text should preferably be formatted HTML/markdown, rather than code comments)
Display the item
Capture the output of this loop as HTML
Display this HTML in Shiny as a preview
write this HTML to a file
knitr seems to do exactly the reverse of what I want - where knitr allows me to add interactive shiny components in an otherwise static document, I want to generate HTML in shiny (maybe using knitr, I don't know) based on static values the user has created.
I've constructed a minimum not-working example below to try to indicate what I would like to do. It doesn't work, it's just for demonstration purposes.
ui = shinyUI(fluidPage(
  title = "Report generator",
  sidebarLayout(
    sidebarPanel(textInput("numberinput", "Add a number", value = 5),
                 actionButton("addthischart", "Add the current chart to the report")),
    mainPanel(plotOutput("numberplot"),
              htmlOutput("report"))
  )
))

server = shinyServer(function(input, output, session){
  # ensure I can plot
  library(ggplot2)

  # make a holder for my stored data
  values = reactiveValues()
  values$Report = list()

  # generate the plot
  myplot = reactive({
    df = data.frame(x = 1:input$numberinput, y = (1:input$numberinput)^2)
    p = ggplot(df, aes(x = x, y = y)) + geom_line()
    return(p)
  })

  # display the plot
  output$numberplot = renderPlot(myplot())

  # when the user clicks a button, add the current plot to the report
  observeEvent(input$addthischart, {
    chart = isolate(myplot)
    isolate(values$Report <- c(values$Report, list(chart)))
  })

  # make the report
  myreport = eventReactive(input$addthischart, {
    reporthtml = character()
    if(length(values$Report) > 0){
      for(i in 1:length(values$Report)){
        explanatorytext = tags$h3(paste(" Now please direct your attention to plot number", i, "\n"))
        chart = values$Report[[i]]()
        theplot = HTML(chart) # this does not work - this is the crux of my question - what should I do here?
        reporthtml = c(reporthtml, explanatorytext, theplot)
        # ideally, at this point, the output would be an HTML file that includes some header text, as well as a plot
        # I made this example to show what I hoped would work. Clearly, it does not work. I'm asking for advice on an alternative approach.
      }
    }
    return(reporthtml)
  })

  # display the report
  output$report = renderUI({
    myreport()
  })
})

runApp(list(ui = ui, server = server))
You could capture the HTML of your page using html2canvas and then save the captured portion of the DOM as an image using this answer; this way your client can embed it in any HTML document without worrying about the origin of the page contents.
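On the R side, one way to make the HTML(chart) line work is to render each stored ggplot to a PNG and inline it as a base64 data URI, so the report becomes one self-contained HTML string. A minimal sketch, assuming the base64enc package (plot_to_html is a made-up helper name):
library(ggplot2)
library(base64enc)
# render a ggplot to a temporary PNG and return an <img> tag with the
# image inlined as a base64 data URI
plot_to_html <- function(p, width = 600, height = 400) {
  tmp <- tempfile(fileext = ".png")
  png(tmp, width = width, height = height)
  print(p)
  dev.off()
  sprintf('<img src="%s" width="%d"/>', dataURI(file = tmp, mime = "image/png"), width)
}
Inside the report loop, the crux line would then become theplot <- HTML(plot_to_html(values$Report[[i]]())), and the same vector of tags could be written to a file with writeLines().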

R Shiny: htmlOutput removes white spaces I need

I have a string like this for example:
SomeName                      SomeValue
A much longer name            AnotherValue
Even longer name than before  NewValue
Let's say I have this text correctly in an R variable like
variable <- "SomeName                      SomeValue<br/>A much longer name            AnotherValue<br/>Even longer name than before  NewValue<br/>"
In my ui.R:
...htmlOutput("TextTable")
And in my server.R
output$TextTable<- renderUI({
HTML(variable)
})
But the output isn't the way I want it. All whitespace runs are collapsed to a single space, so the "columns" aren't aligned in my output. How can I avoid this?
If I remember correctly, browsers collapse consecutive whitespace in HTML to a single space. Don't put the whitespace in your R variable; store the data as a matrix and use renderTable.
## server.R
library(shiny)
shinyServer(function(input, output) {
  variable = c("SomeName", "SomeValue", "A much longer name", "AnotherValue", "Even longer name than before", "NewValue")
  # byrow = TRUE so each name/value pair ends up in one row
  variable = matrix(data = variable, ncol = 2, byrow = TRUE)
  output$TextTable <- renderTable({variable}, include.rownames = FALSE, include.colnames = FALSE)
})
## ui.R
library(shiny)
shinyUI(fluidPage(titlePanel(""),
                  sidebarLayout(sidebarPanel(),
                                mainPanel(uiOutput("TextTable"))
                  )
))
Update
On the other hand, if your text is already in that format, you can prevent a browser from collapsing whitespace using the <pre> tag. In your case, you could use the following code:
output$TextTable<- renderUI(HTML(paste0("<pre>",variable,"</pre>")))