How to replace one part of a URL using R

I currently have this URL:
http://www.amazon.com/Apple-generation-Tablet-processor-White/product-reviews/B0047DVWLW/ref=cm_cr_pr_btm_link_2?ie=UTF8&pageNumber=1&showViewpoints=0&sortBy=bySubmissionDateDescending
I want to replace this part:
pageNumber=1
with each number in a sequence such as 1, 2, 3, ..., n.
I know I need to use the paste function, but how do I locate this number and replace it?

You can use the parseQueryString function from the shiny package, or parse_url and build_url from the httr package.
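library(shiny)
testURL <- "http://www.amazon.com/Apple-generation-Tablet-processor-White/product-reviews/B0047DVWLW/ref=cm_cr_pr_btm_link_2?ie=UTF8&pageNumber=1&showViewpoints=0&sortBy=bySubmissionDateDescending"
# parseQueryString() expects only the query string, so split the URL at "?"
urlParts <- strsplit(testURL, "?", fixed = TRUE)[[1]]
parseURL <- parseQueryString(urlParts[2])
parseURL$pageNumber <- 4
# Rebuild the query string and reattach the base URL
newURL <- paste0(urlParts[1], "?", paste(names(parseURL), parseURL, sep = "=", collapse = "&"))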
library(httr)
testURL <- "http://www.amazon.com/Apple-generation-Tablet-processor-White/product-reviews/B0047DVWLW/ref=cm_cr_pr_btm_link_2?ie=UTF8&pageNumber=1&showViewpoints=0&sortBy=bySubmissionDateDescending"
parseURL <- parse_url(testURL)
parseURL$query$pageNumber <- 4
newURL <- build_url(parseURL)
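To build every page URL from 1 to n rather than a single one, the parse_url/build_url approach can be wrapped in vapply(). This is a sketch: n = 5 is a hypothetical page count, and pageNumber is set as a string because build_url() escapes each query value.

```r
library(httr)

testURL <- "http://www.amazon.com/Apple-generation-Tablet-processor-White/product-reviews/B0047DVWLW/ref=cm_cr_pr_btm_link_2?ie=UTF8&pageNumber=1&showViewpoints=0&sortBy=bySubmissionDateDescending"
parsedURL <- parse_url(testURL)
n <- 5  # hypothetical number of review pages

# Rebuild the URL once per page, changing only the pageNumber query field
pageURLs <- vapply(seq_len(n), function(i) {
  parsedURL$query$pageNumber <- as.character(i)
  build_url(parsedURL)
}, character(1))
```

Each element of pageURLs keeps the other query fields (ie, showViewpoints, sortBy) unchanged.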

Try this:
# inputs
URL1 <- "...whatever...&pageNumber=1"
i <- 2
URL2 <- sub("pageNumber=1", paste0("pageNumber=", i), URL1)
Or, using a Perl zero-width lookbehind:
URL2 <- sub("(?<=pageNumber=)1", i, URL1, perl = TRUE)
If we know that no 1 appears before pageNumber, as is the case here, it simplifies to just:
URL2 <- sub(1, i, URL1)
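sub() uses only the first element of its replacement argument (with a warning if more are given), so it will not expand a whole sequence in one call. A minimal sketch for generating pages 1 to n, assuming the base URL is the one from the question and n = 5 is a hypothetical page count; the pattern [0-9]+ also copes with an existing page number of more than one digit:

```r
URL1 <- "http://www.amazon.com/Apple-generation-Tablet-processor-White/product-reviews/B0047DVWLW/ref=cm_cr_pr_btm_link_2?ie=UTF8&pageNumber=1&showViewpoints=0&sortBy=bySubmissionDateDescending"
n <- 5  # hypothetical number of pages

# Replace whatever digits follow "pageNumber=" with each value of i in turn
URLs <- vapply(seq_len(n), function(i) {
  sub("pageNumber=[0-9]+", paste0("pageNumber=", i), URL1)
}, character(1))
```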

Another very simple approach is to use sprintf:
sprintf('http://www.amazon.com/Apple-generation-Tablet-processor-White/product-reviews/B0047DVWLW/ref=cm_cr_pr_btm_link_2?ie=UTF8&pageNumber=%s&showViewpoints=0&sortBy=bySubmissionDateDescending',
1:10)
In the above code, the %s in the string provided as the first argument is replaced by each element of the vector provided in the second argument, in turn.
See ?sprintf for more details about this very handy string manipulation function.

The simplest approach would be to split the string into the parts before and after the number:
part1 <- "http://www.amazon.com/Apple-generation-Tablet-processor-White/product-reviews/B0047DVWLW/ref=cm_cr_pr_btm_link_2?ie=UTF8&pageNumber="
number <- 1
part2 <- "&showViewpoints=0&sortBy=bySubmissionDateDescending"
link <- paste0(part1, number, part2)
Another approach would be a plain string replacement: sub("pageNumber=1", "pageNumber=2", link, fixed = TRUE)
Another option would be a regular expression, but I'm not good with those, so you'll have to do some googling.

I figured it out; here is the code:
listurl <- paste0("http://www.amazon.com/Apple-generation-Tablet-processor-White/product-reviews/B0047DVWLW/ref=cm_cr_pr_btm_link_2?ie=UTF8&pageNumber=", 1:218)
ipadlisturl <- paste0(listurl, "&showViewpoints=0&sortBy=bySubmissionDateDescending")
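Note that paste() separates its arguments with a space by default, so it would put a space inside each URL; paste0() (equivalent to paste(..., sep = "")) concatenates them directly. The prefix below is a shortened, hypothetical stand-in for the real URL:

```r
# Hypothetical, shortened URL prefix for illustration
prefix <- "reviews?pageNumber="

with_space <- paste(prefix, 1:2)   # "reviews?pageNumber= 1" "reviews?pageNumber= 2"
no_space <- paste0(prefix, 1:2)    # "reviews?pageNumber=1"  "reviews?pageNumber=2"
```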

Related

Deleting commas in R Markdown html output

I am using R Markdown to create an HTML file of regression results tables, which are produced by stargazer and lfe in a code chunk.
library(lfe); library(stargazer)
data <- data.frame(x = 1:10, y = rnorm(10), z = rnorm(10))
result <- stargazer(felm(y ~ x + z, data = data), type = 'html')
I create the HTML file with the inline code `r result` after the chunk above. However, a bunch of commas appear at the top of the table.
When I check the HTML code, I see that almost every </tr> is followed by a comma.
How can I delete these commas?
Maybe not exactly what you are looking for, but I am a huge fan of modelsummary. I knit to HTML to see how it looks and then usually knit to PDF. The modelsummary equivalent would look something like this:
library(lfe)
library(modelsummary)
data = data.frame(x = 1:10, y = rnorm(10), z = rnorm(10))
results = felm(y ~ x + z, data = data)
modelsummary(results)
There are a lot of ways to customize it through kableExtra and other packages, and the documentation is really good. Here is a somewhat silly example:
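library(kableExtra)
# output = "kableExtra" returns a kableExtra table, so row_spec() can style it
modelsummary(results,
output = "kableExtra",
coef_map = c("x" = "Cool Treatment",
"z" = "Confounder",
"(Intercept)" = "(Intercept)")) %>%
row_spec(1, background = "#F5ABEA")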
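As for the commas themselves: stargazer() returns the table as a character vector with one line of HTML per element, and knitr joins a multi-element vector from inline code with ", ", which puts a comma after each </tr>. A sketch of a fix that keeps stargazer, assuming lfe and stargazer are installed: collapse the vector into a single string in the chunk, then reference that string inline as `r result_html` (result_html is a hypothetical name).

```r
library(lfe)
library(stargazer)

data <- data.frame(x = 1:10, y = rnorm(10), z = rnorm(10))
# stargazer() returns a character vector, one line of HTML per element
result <- stargazer(felm(y ~ x + z, data = data), type = "html")

# A single string, so knitr's inline output has nothing to join with ", "
result_html <- paste(result, collapse = "\n")
```

Alternatively, calling stargazer() directly in a chunk with the option results = 'asis' writes the HTML straight into the document without any inline code.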

Web Scrape an Image with rvest R

I'm having a problem trying to scrape an image from this page. My code is as follows:
library(rvest)
url <- read_html("https://covid-19vis.cmm.uchile.cl/chart")
m <- '/html/body/div/div/div[4]/main/div/div/div/div/div/div[2]/div[1]'
grafico_cmm <- html_node(url, xpath = m) %>% html_attr('src')
When I run the above code, the result is NA. Does anyone know how I can scrape the plot, or maybe the data, from the page?
Thanks a lot
It is not an image; it is an interactive chart. To get an image, you would need to scrape the data points, re-create the chart, and then save it as an image. The XPath is also invalid.
The data comes from an API call. I checked the values against the chart and this is the correct endpoint.
library(jsonlite)
data <- jsonlite::read_json('https://covid-19vis.cmm.uchile.cl/api/data?scope=0&indicatorId=57', simplifyVector = TRUE)
The chart needs some tidying, but here is a basic plot of the effective R values:
# %>% is not loaded here (jsonlite does not export it), so call as.Date() directly
data$date <- as.Date(data$date)
library("ggplot2")
ggplot(data=data,
aes(x=date, y=value, colour ='red')) +
geom_line() +
scale_color_discrete(name = "R Efectivo", labels = c("Chile"))
print(tail(data))
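For the final "save it as an image" step, ggplot2's ggsave() writes a plot to a file. This sketch uses a small hypothetical data frame with the same date and value columns as the API response (placeholder numbers, not real data), and the file name r_efectivo.png is an assumption:

```r
library(ggplot2)

# Hypothetical stand-in for the API data: same column names, placeholder values
data <- data.frame(
  date = as.Date("2020-06-01") + 0:4,
  value = c(1.2, 1.1, 1.05, 0.98, 0.95)
)

p <- ggplot(data, aes(x = date, y = value)) +
  geom_line(colour = "red") +
  labs(x = "Date", y = "R Efectivo", title = "Chile")

# Write the chart to a PNG file in the session's temporary directory
out_file <- file.path(tempdir(), "r_efectivo.png")
ggsave(out_file, plot = p, width = 7, height = 4, dpi = 100)
```

With the real data, pass the plot built from read_json() to ggsave() in the same way.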

How to trigger a file download using R

I am trying to use R to trigger a file download on this site: http://www.regulomedb.org. Basically, you enter an ID (e.g., rs33914668) in the form and click Submit. Then, on the new page, you click Download in the bottom-left corner to trigger the file download.
I have tried rvest with help from other posts:
library(httr)
library(rvest)
library(tidyverse)
pre_pg <- read_html("http://www.regulomedb.org")
POST(
url = "http://www.regulomedb.org",
body = list(
data = "rs33914668"
),
encode = "form"
) -> res
pg <- content(res, as="parsed")
From inspecting pg, I think I am still on the first page rather than http://www.regulomedb.org/results (I don't know how to inspect pg other than reading it line by line), so I cannot reach the Download button. I cannot figure out why it does not go to the next page.
Drawing on some other posts, I managed to download the file a different way, using rvest's session and form functions instead of POST:
library(httr)
library(rvest)
library(RCurl)
session <- html_session("http://www.regulomedb.org")
form <- html_form(session)[[1]]
filledform <- set_values(form, `data` = "rs33914668")
session2 <- submit_form(session, filledform)
form2 <- html_form(session2)[[1]]
filledform2 <- set_values(form2)
thesid <- filledform2[["fields"]][["sid"]]$value
theurl <- paste0('http://www.regulomedb.org/download/',thesid)
download.file(theurl,destfile="test.bed",method="libcurl")
In filledform2, I found the sid. Using www.regulomedb.org/download/:sid, I can download the file.
I am new to HTML and even to R, and I don't even know what sid is. Although I got it working, I am not satisfied with the code, so I hope some experienced users can suggest better alternatives or improve my current solution. Also, what is wrong with the POST/rvest approach?
url<-"http://www.regulomedb.org/"
library(rvest)
page<-html_session(url)
# Post the query to /results, the page the search form leads to, rather than to the home page
download_page<-rvest:::request_POST(page,url="http://www.regulomedb.org/results",
body=list("data"="rs33914668"),
encode = 'form')
# sid is a unique ID generated for your query, stored in a hidden input of the download form
sid<-html_nodes(download_page,css='#download > input[type="hidden"]:nth-child(8)') %>% html_attr('value')
# Current time as a UNIX timestamp, sent as the download token
download_token<-as.numeric(as.POSIXct(Sys.time()))
download_page1<-rvest:::request_POST(download_page,url="http://www.regulomedb.org/download",
body=list("format"="bed",
"sid"=sid,
"download_token_value_id"=download_token ),
encode = 'form')
writeBin(download_page1$response$content, "regulomedb_result.bed")
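The html_session(), set_values() and submit_form() calls in the question were deprecated in rvest 1.0.0 in favour of session(), html_form_set() and session_submit(). A sketch of the same sid-based download with the newer names, assuming the site still serves the forms the question describes (a search field named data, and a sid hidden field in the first form of the results page); it has not been checked against the current site:

```r
library(rvest)

# Open the search page and fill in the query field
s <- session("http://www.regulomedb.org")
search_form <- html_form(s)[[1]]
search_form <- html_form_set(search_form, data = "rs33914668")

# Submit the form; the session follows it to the results page
results <- session_submit(s, search_form)

# Assumed: the results page's first form holds the query's sid as a hidden field
download_form <- html_form(results)[[1]]
sid <- download_form$fields$sid$value

download.file(paste0("http://www.regulomedb.org/download/", sid),
              destfile = "test.bed", method = "libcurl")
```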

Shiny renderText: half italicized, half not?

In my Shiny app, I have a textOutput named acronym in which I would like to use renderText to show text that is half non-italicized, half italicized.
I tried doing it like this:
output$acronym_1 <- renderText(paste("SID SIDE:", tags$em("Siderastrea siderea")))
But this did not get the second half in italics. How do I do this?
Thanks in advance.
The following code produces italicized text:
library(shiny)
ui <- fluidPage(uiOutput("htmlText"))
server <- function(input, output) {
# HTML() marks the pasted string as raw HTML, so the <em> tag is rendered
output$htmlText <- renderUI(HTML(paste(
"Non-italic text.", em("Italic text")
)))
}
shinyApp(ui, server)
I don't think textOutput is capable of text markup, since according to the documentation the output string is created by cat:
renderText(expr, env = parent.frame(), quoted = FALSE,
outputArgs = list())
expr: An expression that returns an R object that can be used as an argument to cat.
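Applied to the question's own output, a sketch of the same idea: swap textOutput()/renderText() for uiOutput()/renderUI() and build the mixed text with tagList() instead of paste(), so no HTML() string is needed. The output ID acronym_1 comes from the question; acronym_tag is a hypothetical name:

```r
library(shiny)

# A plain-text prefix followed by an <em> element, combined into one UI object
acronym_tag <- tagList("SID SIDE:", tags$em("Siderastrea siderea"))

ui <- fluidPage(uiOutput("acronym_1"))

server <- function(input, output, session) {
  # renderUI() sends HTML tags to the page, unlike renderText(), which goes through cat()
  output$acronym_1 <- renderUI(acronym_tag)
}

# Launch only in an interactive session
if (interactive()) shinyApp(ui, server)
```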

Get column name in apply function

I am trying to write a function that makes a small report for every column in a data frame using apply. In the report I want to use the name of the column, so I have to 'extract' it somehow, and that is what my question is about. How do I get the name of the column inside my apply function?
Here is a simple example where I want to use the column name in the graph title (for now I just hardcoded the name as 'x'):
x <- c(1,1,2,2,2,3)
y <- c(2,3,4,5,4,4)
Tb <- data.frame(x,y)
Dq_Hist <- function(Tab){
Name <- 'x'
Ttl <- paste('Variable: ',Name,'')
hist(Tab,main=Ttl,col=c('grey'),xlab=Name)
}
D <- apply(Tb,MARGIN=2,FUN=Dq_Hist)
Well, if nobody answers, you have to find out yourself. I found that you can call sapply with a vector of column indices and use the index inside the function. So the solution is:
x <- c(1,1,2,2,2,3)
y <- c(2,3,4,5,4,4)
Tb <- data.frame(x,y)
Dq_Hist <- function(i){
Name <- colnames(Tb)[i]
Ttl <- paste('Variable: ',Name,'')
hist(Tb[,i],main=Ttl,col=c('grey'),xlab=Name)
}
D <- sapply(1:ncol(Tb),Dq_Hist)
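A variant that avoids looking up the global Tb inside the function: Map() walks the columns and names(Tb) in parallel and passes each column together with its name. This is a sketch; Dq_Hist2 is a hypothetical name:

```r
x <- c(1, 1, 2, 2, 2, 3)
y <- c(2, 3, 4, 5, 4, 4)
Tb <- data.frame(x, y)

# Receives the column values and the column name as separate arguments
Dq_Hist2 <- function(values, Name) {
  Ttl <- paste("Variable:", Name)
  hist(values, main = Ttl, col = "grey", xlab = Name)
}

# Returns a list of "histogram" objects, named after the columns
D <- Map(Dq_Hist2, Tb, names(Tb))
```

Because the name travels with the data, the same function works for any data frame, not just Tb.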