How to display a user-defined title in R Markdown?

I am new to R and R Markdown. In my R code I used a textInput so the user can enter a title in a blank field. The title is assigned to a variable called 'title'. How can I display that in an R Markdown script that generates pdf, html, and doc files?
Thanks
SOLUTION:
In my Rmd file I wrote this: `r dfdrctitle$title`, and in my server.R file I used this code to get the value of the text input:
drctitle <- as.character(input$drc.title)
dfdrctitle <- data.frame( title = drctitle)

You could accomplish this by parameterizing your R Markdown report. You pass parameters into the report as a list through the params argument of rmarkdown::render().
First, declare the parameter in the YAML header of your R Markdown document; render() rejects parameters that are not declared there (the default value "Untitled" below is just a placeholder). You can then access a passed parameter in the report via `r params$item`, which instructs knitr to evaluate it as inline R code. In the title field you need to quote it, because a string is expected as the title in the YAML.
---
title: "`r params$rep_title`"
author: "generic_user"
params:
  rep_title: "Untitled"
---
Include any other output options that you need as well (document output type, etc.). Then render your report, passing the parameter in a list whose names match the parameter names:
library(rmarkdown)
render("path_to_my_report.Rmd",
       output_dir = "path_to_mydir",
       output_file = "myreport",
       params = list(rep_title = title))
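As a rough end-to-end sketch of the parameterized approach: the snippet below writes a tiny report to a temporary file, checks which parameters it declares, and renders it with a custom title. The file name `title_demo.Rmd`, the default title, and the helper `render_with_title()` are made up for illustration and are not part of the original answer.

```r
library(rmarkdown)

# Write a tiny parameterized report to a temporary file.
# File name and default title are placeholders for illustration.
rmd_path <- file.path(tempdir(), "title_demo.Rmd")
writeLines(c(
  "---",
  "title: \"`r params$rep_title`\"",
  "output: html_document",
  "params:",
  "  rep_title: \"Untitled\"",
  "---",
  "",
  "Report body."
), rmd_path)

# The declared parameters can be inspected without rendering anything.
declared <- yaml_front_matter(rmd_path)$params

# Hypothetical helper: render the report with a user-supplied title.
render_with_title <- function(title) {
  render(rmd_path,
         output_dir = tempdir(),
         params = list(rep_title = title),
         quiet = TRUE)
}

# Rendering needs pandoc, so only attempt it when pandoc is available.
if (pandoc_available()) {
  out_file <- render_with_title("My user-defined title")
}
```

In a Shiny app, the same helper could presumably be called with the text input's value, e.g. `render_with_title(input$drc.title)`, from inside a download handler.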

Try this in the title field of your header. I believe that you can do this to output any R variable from a code chunk as text, even in a header. Quote the inline code so that the YAML stays valid.
---
title: "`r title`"
author: "your_name"
date: "11/18/2016"
output: pdf_document
---
See also: R markdown: Accessing variable from code chunk (variable scope)

Related

Scraping escaped JSON data within a <script type="text/javascript"> in R

I am currently trying to scrape the data from the two graphs on the following HTML page (the two graphs listed there are Forsmark and Ringhals): https://group.vattenfall.com/se/var-verksamhet/vara-energislag/karnkraft/aktuell-karnkraftsproduktion
The data originate from script tags like this (fragment)
<script type="text/javascript">
/*<![CDATA[*/ productionData = JSON.parse("{\"timestamp\":1582642616000,\"powerPlant\":\"Ringhals\", // etc
</script>
I would like to get two data frames that look like these:
F1 F2 F3
number number number
and
R1 R2 R3
number number number
I tried to use XML and xpath to parse an html page but did not get anywhere with that.
Do you have any ideas?
Thanks!
Those charts are <iframe>s that load from
https://gvp.vattenfall.com/sweden/produced-power/iframe/forsmark and
https://gvp.vattenfall.com/sweden/produced-power/iframe/ringhals
so you should scrape those two pages directly.
This was an interesting challenge.
It's not too hard with rvest and jsonlite, which you will have to install if you don't already have them. (On Windows, Rtools is only needed if you build the packages from source.)
Try this:
library('rvest')
library('jsonlite')
# Load the URL (do the same for the other iframe)
url <- 'https://gvp.vattenfall.com/sweden/produced-power/iframe/forsmark'
# Parse it
webpage <- read_html(url)
# Extract the script element. That's a CSS selector for the specific one that holds the json data
# You can find it in your browser's DevTools by finding the script element
# and right-clicking, choosing Copy > CSS Path/Selector
script_element <- html_nodes(webpage, 'body > section:nth-child(2) > script:nth-child(2)')
# Extract its string content
json = html_text(script_element)
# Clean it up
json = gsub("\n /*<![CDATA[*/\n productionData = JSON.parse(", "", json, fixed=TRUE)
json = gsub(");\n /*]]>*/\n ", "", json, fixed=TRUE)
json = gsub("\"{", "{\"", json, fixed=TRUE)
json = gsub("}\"", "}", json, fixed=TRUE)
json = gsub("{\"\\\"", "{\\\"", json, fixed=TRUE)
# Extract data
data = jsonlite::fromJSON(gsub("\\\"", "\"", json, fixed=TRUE))
Caveat: I'm not really an R expert, there is likely a more elegant way of doing this (particularly the data cleaning portion). But it works.
For historical preservation, that takes this DOM node (the text content of the <script> tag):
"\n /*<![CDATA[*/\n productionData = JSON.parse(\"{\\\"timestamp\\\":1582643336000,\\\"powerPlant\\\":\\\"Forsmark\\\",\\\"blockProductionDataList\\\":[{\\\"name\\\":\\\"F1\\\",\\\"production\\\":998.86194,\\\"percent\\\":99.88619},{\\\"name\\\":\\\"F2\\\",\\\"production\\\":1120.434,\\\"percent\\\":97.8545},{\\\"name\\\":\\\"F3\\\",\\\"production\\\":1189.7126,\\\"percent\\\":99.55754}]}\");\n /*]]>*/\n "
and will result in data of this format
> data
$timestamp
[1] 1.582647e+12

$powerPlant
[1] "Forsmark"

$blockProductionDataList
  name production  percent
1   F1   997.7902 99.77902
2   F2  1131.6150 98.83100
3   F3  1190.0520 99.58594
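One possible way to shorten the cleaning step and arrive at the one-row-per-plant layout asked for in the question is sketched below. It works on a shortened copy of the script text quoted above, so it runs without network access; the helper name `extract_json()` is hypothetical, and the regular expression assumes the page keeps the `JSON.parse("...");` wrapper exactly as shown.

```r
library(jsonlite)

# Script text as quoted above, shortened to the fields used here.
script_text <- paste0(
  "\n /*<![CDATA[*/\n productionData = JSON.parse(\"",
  "{\\\"powerPlant\\\":\\\"Forsmark\\\",\\\"blockProductionDataList\\\":[",
  "{\\\"name\\\":\\\"F1\\\",\\\"production\\\":998.86194},",
  "{\\\"name\\\":\\\"F2\\\",\\\"production\\\":1120.434},",
  "{\\\"name\\\":\\\"F3\\\",\\\"production\\\":1189.7126}]}",
  "\");\n /*]]>*/\n "
)

# Hypothetical helper: pull out the argument of JSON.parse("...") and unescape it.
extract_json <- function(txt) {
  # (?s) lets "." match newlines; the capture group is the escaped JSON string
  escaped <- sub('(?s).*JSON\\.parse\\("(.*)"\\);.*', "\\1", txt, perl = TRUE)
  # Turn every \" back into "
  gsub('\\"', '"', escaped, fixed = TRUE)
}

data <- fromJSON(extract_json(script_text))
blocks <- data$blockProductionDataList

# One-row data frame with one column per reactor block (F1, F2, F3)
wide <- setNames(as.data.frame(t(blocks$production)), blocks$name)
```

With the live page, `script_text` would be the `html_text()` result from the answer's code, and the same steps could be repeated for the Ringhals iframe.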

SDMX to dataframe with RSDMX in R

I'm trying to get data from the Lithuanian Statistics Department. They offer SDMX API with either XML or JSON (LSD).
The example XML shown is : https://osp-rs.stat.gov.lt/rest_xml/data/S3R629_M3010217 which downloads the XML file.
I tried the following:
devtools::install_github("opensdmx/rsdmx")
library(rsdmx)
string <- "https://osp-rs.stat.gov.lt/rest_xml/data/S3R629_M3010217"
medianage <- readSDMX(string)
which results in error:
<simpleError in doTryCatch(return(expr), name, parentenv, handler): Invalid SDMX-ML file>
I also tried simply reading in the manually downloaded file
devtools::install_github("opensdmx/rsdmx")
library(rsdmx)
medianage <- readSDMX(file="rest_data_M3010217_20180116163251.xml" , isURL = FALSE)
medianage <- as.data.frame(medianage)
which results in medianage being NULL (empty).
Maybe someone has an idea how I could solve downloading/transforming the data from LSD by using either:
https://osp-rs.stat.gov.lt/rest_xml/data/S3R629_M3010217
https://osp-rs.stat.gov.lt/rest_json/data/S3R629_M3010217
Thanks a lot!
To use rsdmx with this data source, some enhancements have been added (see details at https://github.com/opensdmx/rsdmx/issues/141). You will need to re-install rsdmx from GitHub (version 0.5-11).
You can use the url of the SDMX-ML file
library(rsdmx)
url <- "https://osp-rs.stat.gov.lt/rest_xml/data/S3R629_M3010217"
medianage <- readSDMX(url)
df <- as.data.frame(medianage)
A connector has been added in rsdmx to facilitate data query on the LSD (Lithuanian Statistics Department) SDMX endpoint. See below an example on how to use it.
sdmx <- readSDMX(providerId = "LSD", resource = "data",
flowRef = "S3R629_M3010217", dsd = TRUE)
df <- as.data.frame(sdmx, labels = TRUE)
The above example shows how to enrich the data.frame with code labels extracted from the SDMX Data Structure Definition (DSD). For this, specify dsd = TRUE in readSDMX; this then allows you to use labels = TRUE when converting to a data.frame. For filtering data with readSDMX (e.g. startPeriod, endPeriod, code filters), check this page: https://github.com/opensdmx/rsdmx/wiki#readsdmx-as-helper-function
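A small sketch of how the time filter might be combined with the LSD connector: the wrapper `fetch_lsd()` is hypothetical, and it is an assumption that the LSD endpoint honours the `start` and `end` arguments of readSDMX(), so treat this as a starting point rather than a tested recipe.

```r
library(rsdmx)

# Hypothetical wrapper around the LSD connector shown above.
# `start` and `end` are passed straight through to readSDMX(); whether the
# LSD endpoint applies them is an assumption.
fetch_lsd <- function(flow_ref, start = NULL, end = NULL) {
  sdmx <- readSDMX(providerId = "LSD", resource = "data",
                   flowRef = flow_ref, start = start, end = end,
                   dsd = TRUE)
  # Convert to a data.frame enriched with the DSD code labels
  as.data.frame(sdmx, labels = TRUE)
}
```

It could then be called as `fetch_lsd("S3R629_M3010217", start = 2010, end = 2015)`, where the two years are placeholders rather than values taken from the dataset.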

How do I pass pandoc_options as output_options to rmarkdown::render()

I have an Rmd file that renders into html correctly almost all of the time. However, it does not render correctly when pandoc (used in the rendering process) finds 4 spaces in the html and at that point, interprets that I want to render a markdown code snippet instead of html.
I have been told that I can turn off the markdown_in_html_blocks feature by doing something like this:
pandoc -f markdown-markdown_in_html_blocks
I have tried calling pandoc directly rather than it being called implicitly by
rmarkdown::render()
but couldn't get that syntax to work, and being able to specify this option (-markdown_in_html_blocks) directly when I call render() is preferred. Here is the latest of what I have tried without success:
Base case: works but HTML output file is malformed / has a code block instead of the data that I want to display in the table.
render("reports/Pacing.Rmd")
Attempted fix: not working
rmdFmt <- rmarkdown_format("-markdown_in_html_blocks")
pandocOpts <- pandoc_options(to = "html", from = rmdFmt)
render("reports/Pacing.Rmd", output_format = "html_document", output_file = NULL, output_dir = NULL, output_options = pandocOpts)
Error message: Error in (function (toc = FALSE, toc_depth = 3, toc_float = FALSE, number_sections = FALSE, :
argument 1 matches multiple formal arguments
I have tried other syntax to express that I want to turn off markdown_in_html_blocks but no luck.
Given the following document test.Rmd...
---
title: Test
output: html_document
---
<table>
<tr>
<td>*one*</td>
<td>[a link](https://google.com)</td>
</tr>
</table>
...you can disable the markdown_in_html_blocks extension via
rmarkdown::render("test.Rmd",
output_options = list(md_extensions = "-markdown_in_html_blocks"))
md_extensions is one of the arguments that can be passed to rmarkdown::html_document (see ?rmarkdown::html_document for other arguments).
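The error message in the question is probably explained by R's partial argument matching: output_options is handed to the format function as a list of arguments, and a pandoc_options() object is a list whose first element is named `to`, which is an ambiguous abbreviation of `toc`, `toc_depth` and `toc_float`. The stand-in function `fake_format()` below is hypothetical and only mimics the first arguments of html_document(), so the behaviour can be reproduced in base R.

```r
# Stand-in with the same leading arguments as rmarkdown::html_document()
fake_format <- function(toc = FALSE, toc_depth = 3, toc_float = FALSE,
                        md_extensions = NULL) {
  md_extensions
}

# An element named "to" partially matches toc, toc_depth and toc_float,
# so R cannot decide which formal argument is meant and raises an error.
bad <- tryCatch(
  do.call(fake_format, list(to = "html")),
  error = function(e) conditionMessage(e)
)

# A list that uses a real argument name is matched without trouble.
good <- do.call(fake_format, list(md_extensions = "-markdown_in_html_blocks"))
```

Here `bad` holds the captured error message instead of a result, while `good` is simply the extension string that was passed in.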
That seems to be an open issue, but a simpler way to turn off/on such a feature is to directly update the YAML in Rmd file. This should work in your case:
output:
  html_document:
    pandoc_args: [
      "-f", "markdown-markdown_in_html_blocks"
    ]

Passing params to shiny rmarkdown

My first question here, though I've been reading the site for a long time.
I've been struggling (to put it mildly) for a few days to import the data into a shiny rmarkdown file.
I'm open to other methods, but, ultimately, it needs to be linked to in a website.
The file that the data comes from would be on the user's computer.
My latest attempt is below.
It gets the error:
"Error: cannot open the connection" when I click on the link.
html:
<input type="file" id="files" name="files[]" multiple />
<output id="list">
<a href="https://<myName>.shinyapps.io/<myShinyRmarkdownFile>/?"id>linkText</a>
</output>
Bits from Rmarkdown file:
---
title: "x"
author: "x"
date: "x"
output: html_document
runtime: shiny
params:
  filename: "<link on desktop which works when running in desktop rstudio>"
---
```{r echo=FALSE, comment=NA}
print(paste("x", params$filename, sep = " "))
A <- read.delim(params$filename, header = TRUE, sep = "\t")
.
.
.
```
I found out how to do what I wanted, though the title of the question was probably misleading. I wanted to get data into shiny, that I could then analyse and make a pdf with.
The following helped me get the data into shiny: youtube video and corresponding server.R file.
https://www.youtube.com/watch?v=HPZSunrSo5M
https://github.com/aagarw30/R-Shinyapp-Tutorial/blob/master/fileinput/server.R
thanks,
John
Edit:
The server.R file is below:
library(shiny)
# Use the options code below if you wish to increase the file input limit; in this example the limit is increased from 5MB to 9MB
# options(shiny.maxRequestSize = 9*1024^2)
shinyServer(function(input, output) {
  # This reactive function takes the inputs from ui.R and uses them for read.table() to read the data from the file. It returns the dataset as a data frame.
  # file1$datapath -> gives the path of the uploaded file
  data <- reactive({
    file1 <- input$file
    if (is.null(file1)) { return() }
    read.table(file = file1$datapath, sep = input$sep, header = input$header, stringsAsFactors = input$stringAsFactors)
  })
  # This reactive output contains the details of the uploaded file and displays them in table format
  output$filedf <- renderTable({
    if (is.null(data())) { return() }
    input$file
  })
  # This reactive output contains the summary of the dataset and displays the summary in table format
  output$sum <- renderTable({
    if (is.null(data())) { return() }
    summary(data())
  })
  # This reactive output contains the dataset and displays the dataset in table format
  output$table <- renderTable({
    if (is.null(data())) { return() }
    data()
  })
  # The following renderUI is used to dynamically generate the tabsets when the file is loaded. Until the file is loaded, the app will not show the tabset.
  output$tb <- renderUI({
    if (is.null(data()))
      h5("Powered by", tags$img(src = 'RStudio-Ball.png', height = 200, width = 200))
    else
      tabsetPanel(tabPanel("About file", tableOutput("filedf")), tabPanel("Data", tableOutput("table")), tabPanel("Summary", tableOutput("sum")))
  })
})
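The server code above expects a matching UI that is not shown here. A minimal sketch of what it might look like follows: the input ids (`file`, `sep`, `header`, `stringAsFactors`) and the output id `tb` are taken from server.R, while the labels, default values and layout are guesses for illustration only.

```r
library(shiny)

# Assumed UI for the server code above. Ids come from server.R;
# labels, defaults and layout are illustrative guesses.
ui <- fluidPage(
  titlePanel("File input"),
  sidebarLayout(
    sidebarPanel(
      # File chooser read by input$file in the server
      fileInput("file", "Upload a delimited text file"),
      checkboxInput("header", "Header", value = TRUE),
      checkboxInput("stringAsFactors", "Strings as factors", value = FALSE),
      radioButtons("sep", "Separator",
                   choices = c(Comma = ",", Tab = "\t", Space = " "),
                   selected = ",")
    ),
    # Placeholder that the server fills via output$tb
    mainPanel(uiOutput("tb"))
  )
)
```

Saved as ui.R next to server.R (wrapped in `shinyUI(ui)` if the older two-file layout is used), this should give the server everything it reads from `input`.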
The "important bits":
data <- reactive({
  file1 <- input$file
  if (is.null(file1)) { return() }
  read.table(file = file1$datapath, sep = input$sep, header = input$header, stringsAsFactors = input$stringAsFactors)
  # [ john's edit: more code can be put here to alter the data before it is passed to data(), e.g.:
  # o = read.delim(file = file1$datapath)
  # convertData(o)
  # ]
})
Reactive means that it waits for input$file (i.e. the file chooser) and then reads the file you've chosen. Personally, I use read.delim(...) rather than read.table(...). The data (a data frame) is now available by calling data() and is referred to as such. Hence:
output$table <- renderTable({
  if (is.null(data())) { return() }
  data()
})
Note that the above checks that there is data in data() before displaying it.
data() can be passed into functions in the same way as a data frame:
new.dataframe = do.something.with.the.data(data())
Another issue that had me stumped for a long time is how to export/render a pdf.
output$download_pdf <- downloadHandler(
  filename = function() {
    paste('pdf_name', 'pdf', sep = '.')
  },
  content = function(file) {
    src <- normalizePath('rmarkdownFile.Rmd')
    print(paste("src", src, sep = ": "))
    # Work in a temporary directory and restore the old one on exit
    owd <- setwd(tempdir())
    on.exit(setwd(owd))
    file.copy(src, 'rmarkdownFile.Rmd', overwrite = TRUE)
    # Objects created here are visible to the Rmd file when it is rendered
    df.A <- data.frame(functionA(data()))
    df.B <- data.frame(functionB(data()))
    out <- rmarkdown::render('rmarkdownFile.Rmd', rmarkdown::pdf_document())
    file.rename(out, file)
  }
)
I've not got it all worked out. For example, you're supposed to copy the file first in order to rename it, but that seems to cause problems, so I've left it for now.
df.A and df.B are data frames passed to the R Markdown file. Nothing else needs to be done, because render() evaluates the document in the calling environment by default. Within the R Markdown file just use them as normal; no params etc. are needed.
When running on your home computer (with the settings above), the PDF is saved within the project directory. When running as an app uploaded to shinyapps.io, a file save location window pops up.
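The way df.A and df.B reach the document without params can be checked in isolation, without Shiny or pandoc. The sketch below uses knitr::knit() on a one-line document instead of rmarkdown::render(); both default to evaluating in the calling environment (`envir = parent.frame()`). The function name `knit_in_local_scope()` and the toy data frame are made up for the example.

```r
library(knitr)

# Hypothetical stand-in for the download handler's content function.
knit_in_local_scope <- function() {
  # Local data frame, analogous to df.A in the handler above
  df.A <- data.frame(x = 1:3)
  # A one-line document that refers to df.A through inline R code
  doc <- "Rows in df.A: `r nrow(df.A)`"
  # No envir argument is given, so the document is evaluated in this function's frame
  knit(text = doc, quiet = TRUE)
}

result <- knit_in_local_scope()
```

If isolation from the handler's other local variables is wanted, an explicit environment can be passed through the `envir` argument instead.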
I hope the above helps.

Download hidden json array in HTML using R

I'm trying to scrape data from Transfermarkt, mainly using the XML and httr packages.
page.doc <- content(GET("http://www.transfermarkt.es/george-corral/marktwertverlauf/spieler/103889"))
After downloading, there is a hidden array named 'series':
'series':[{'type':'line','name':'Valor de mercado','data':[{'y':600000,'verein':'CF América','age':21,'mw':'600 miles €','datum_mw':'02/12/2011','x':1322780400000,'marker':{'symbol':'url(http://akacdn.transfermarkt.de/images/wappen/verysmall/3631.png?lm=1403472558)'}},{'y':850000,'verein':'Jaguares de Chiapas','age':21,'mw':'850 miles €','datum_mw':'02/06/2012','x':1338588000000,'marker':{'symbol':'url(http://akacdn.transfermarkt.de/images/wappen/verysmall/4774_1441956822.png?lm=1441956822)'}},{'y':1000000,'verein':'Jaguares de Chiapas','age':22,'mw':'1,00 mill. €','datum_mw':'03/12/2012','x':1354489200000,'marker':{'symbol':'url(http://akacdn.transfermarkt.de/images/wappen/verysmall/4774_1441956822.png?lm=1441956822)'}},{'y':1000000,'verein':'Jaguares de Chiapas','age':22,'mw':'1,00 mill. €','datum_mw':'29/05/2013','x':1369778400000,'marker':{'symbol':'url(http://akacdn.transfermarkt.de/images/wappen/verysmall/4774_1441956822.png?lm=1441956822)'}},{'y':1250000,'verein':'Querétaro FC','age':23,'mw':'1,25 mill. €','datum_mw':'27/12/2013','x':1388098800000,'marker':{'symbol':'url(http://akacdn.transfermarkt.de/images/wappen/verysmall/4961.png?lm=1409989898)'}},{'y':1500000,'verein':'Querétaro FC','age':24,'mw':'1,50 mill. €','datum_mw':'01/09/2014','x':1409522400000,'marker':{'symbol':'url(http://akacdn.transfermarkt.de/images/wappen/verysmall/4961.png?lm=1409989898)'}},{'y':1800000,'verein':'Querétaro FC','age':25,'mw':'1,80 mill. €','datum_mw':'01/10/2015','x':1443650400000,'marker':{'symbol':'url(http://akacdn.transfermarkt.de/images/wappen/verysmall/4961.png?lm=1409989898)'}}]}]
Is there a way to download directly? I want to scrape 600+ pages.
Until now, I have tried
page.doc.2 <- xpathSApply(page.doc, "//*/div[@class='eight columns']")
page.doc.2 <- xpathSApply(page.doc, "//*/div[@class='eight columns']", xmlAttrs)
No, there is no way to download just the JSON data: the JSON array you’re interested in is embedded inside the page’s source code, as part of a script.
You have to download the whole page; you can then use conventional XPath or CSS selectors to find the script elements. However, finding and extracting just the JSON part is harder without a library that evaluates the JavaScript code. A better option would definitely be to use an official API, should one exist.
library(rvest) # Better suited for web scraping than httr & XML.
library(rjson)
doc = read_html('http://www.transfermarkt.es/george-corral/marktwertverlauf/spieler/103889')
script = doc %>%
    html_nodes('script') %>%
    html_text() %>%
    grep(pattern = "'series':", value = TRUE)
# Replace JavaScript quotes with JSON quotes
json_content = gsub("'", '"', gsub("^.*'series':", '', script))
# Truncate characters from the end until the result is parseable as valid JSON …
while (nchar(json_content) > 0) {
    json = try(fromJSON(json_content), silent = TRUE)
    if (! inherits(json, 'try-error'))
        break
    json_content = substr(json_content, 1, nchar(json_content) - 1)
}
However, there’s no guarantee that the above will always work: it is JavaScript after all, not JSON; the two are similar but not every valid JavaScript array is valid JSON.
It could be possible to evaluate the JavaScript fragment instead but that gets much more complicated. As a start, take a look at the V8 interface for R.
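As a starting point for the V8 route, here is a small sketch that lets a JavaScript engine parse the literal instead of rewriting quotes by hand. It assumes the V8 package is installed and that the array literal has already been isolated from the script text (which remains the hard part); `js_literal` is a shortened copy of the 'series' value from the question.

```r
library(V8)

# Shortened copy of the 'series' literal from the question: valid JavaScript,
# but not valid JSON because of the single quotes.
js_literal <- "[{'type':'line','name':'Valor de mercado','data':[{'y':600000,'age':21},{'y':850000,'age':21}]}]"

ctx <- v8()                                         # new JavaScript context
ctx$eval(paste0("var series = ", js_literal, ";"))  # let V8 parse the literal
series <- ctx$get("series")                         # converted to R via JSON

# The nested 'data' array arrives as a data frame inside a list column
market_values <- series$data[[1]]
```

Because the engine does the parsing, this sidesteps the quote replacement and the truncate-until-valid loop, at the cost of an extra dependency.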