I have a FlexTable produced with the ReporteRs package which I would like to export as .html.
When I print the table to the viewer in RStudio I can do this by clicking on 'Export' and selecting 'Save as webpage'.
How would I replicate this action in my script?
I don't want to knit to an HTML document or produce a report just yet. For now I just want a separate file for each of my draft tables that I can share with collaborators, nicely formatted so they are easy to read.
I have tried the as.html function, and that does produce an .html file, but all the formatting is missing (it is just plain text).
Here is an MWE:
# load libraries:
library(data.table)
library(ReporteRs)
library(rtable)
# Create dummy table:
mydt <- data.table(id = c(1,2,3), name = c("a", "b", "c"), fruit = c("apple", "orange", "banana"))
# Convert to FlexTable:
myflex <- vanilla.table(mydt)
# Attempt to export to html in script:
sink('MyFlexTable.html')
print(as.html(myflex))
sink()
# Alternately:
sink('MyFlexTable.html')
knit_print(myflex)
sink()
The problem with both methods demonstrated above is that they output the table without any formatting (no borders etc).
However, manually selecting 'Export' and 'Save as webpage' in RStudio renders the FlexTable to an HTML file with full formatting. Why is this?
This works for me:
writeLines(as.html(myflex), "MyFlexTable.html")
I wrote an HTML vignette for my R package hosted on GitHub. When I open it with browseVignettes, it opens in the browser without problems, showing this content:
Vignettes found by browseVignettes("package_name")
Vignettes in package package_name
package_name file_name - HTML source R code
Clicking on HTML, source, or R code opens the same vignette in three different versions.
However, I don't need the source and R code links to show.
Is there a way to output only the HTML link, as in the following output?
Vignettes found by browseVignettes("package_name")
Vignettes in package package_name
package_name file_name - HTML
You can't easily drop the source, but you can drop the R code by setting that component to blank. For example,
allfields <- browseVignettes()
noR <- lapply(allfields, function(pkg) {pkg[,"R"] <- ""; pkg})
class(noR) <- class(allfields)
noR
If you really want to drop the source, then you'll need to get the print method and modify it:
print.browseVignettes <- utils:::print.browseVignettes
# Modify it as you like.
I created a plot along the lines of this:
http://www.buildingwidgets.com/blog/2015/7/2/week-26-sunburstr
# devtools::install_github("timelyportfolio/sunburstR")
library(sunburstR)
# read in sample visit-sequences.csv data provided in source
# https://gist.github.com/kerryrodden/7090426#file-visit-sequences-csv
sequence_data <- read.csv(
  paste0(
    "https://gist.githubusercontent.com/kerryrodden/7090426/",
    "raw/ad00fcf422541f19b70af5a8a4c5e1460254e6be/visit-sequences.csv"
  ),
  header = FALSE,
  stringsAsFactors = FALSE
)
In RStudio I can click "Export > Save as Web Page ..." in the Viewer,
which saves the plot as an interactive HTML document. I would like to do this as part of my code. How do I save a plot to HTML using R code? There are plenty of examples for PDF, JPG, etc., but not for HTML.
Store the sunburst in a variable and use htmltools::save_html to save it.
plot <- sunburst(sequence_data)
htmltools::save_html(plot, file = "C:/Users/User/Desktop/sunburst.html")
The qa() function of the ShortRead bioconductor library generates quality statistics from fastq files. The report() function then prepares a report of the various measures in an html format. A few other questions on this site have recommended using the display_html() function of IRdisplay to show html in jupyter notebooks using R (irkernel). However it only throws errors for me when trying to display an html report generated by the report() function of ShortRead.
library("ShortRead")
sample_dir <- system.file(package="ShortRead", "extdata", "E-MTAB-1147") # A sample fastq file
qa_object <- qa(sample_dir, "*fastq.gz$")
qa_report <- report(qa_object, dest="test") # Makes a "test" directory containing 'image/', 'index.html' and 'QA.css'
library("IRdisplay")
display_html(file = "test/index.html")
Gives me:
Error in read(file, size): unused argument (size)
Traceback:
1. display_html(file = "test/index.html")
2. display_raw("text/html", FALSE, data, file, isolate_full_html(list(`text/html` = data)))
3. prepare_content(isbinary, data, file)
4. read_all(file, isbinary)
Is there another way to display this report in jupyter with R?
It looks like there's a bug in the code. A quick fix is to clone the GitHub repo and edit ./IRdisplay/R/utils.r, changing line 38 from:
read(file,size)
to
read(size)
Save the file, switch to the parent directory, and create a new tarball, e.g.
tar -zcf IRdisplay.tgz IRdisplay/
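and then reinstall your new version, e.g. after restarting R, type:
# type = "source" is given explicitly because a .tgz extension can otherwise be read as a macOS binary package
install.packages("IRdisplay.tgz", repos = NULL, type = "source")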
Is it possible to parse text data from PDF files in R? There does not appear to be a relevant package for such extraction, but has anyone attempted or seen this done in R?
In Python there is PDFMiner, but I would like to keep this analysis all in R if possible.
Any suggestions?
Linux systems have pdftotext, which I have had reasonable success with. By default, it creates foo.txt from a given foo.pdf.
That said, the text mining packages may have converters. A quick rseek.org search seems to concur with your crantastic search.
This is a very old thread, but for future reference: the pdftools R package extracts text from PDFs.
A colleague turned me on to this handy open-source tool: http://tabula.nerdpower.org/. Install, upload the PDF, and select the table in the PDF that requires data-ization. Not a direct solution in R, but certainly better than manual labor.
A purely R solution could be:
library('tm')
file <- 'namefile.pdf'
Rpdf <- readPDF(control = list(text = "-layout"))
corpus <- VCorpus(URISource(file),
                  readerControl = list(reader = Rpdf))
corpus.array <- content(content(corpus)[[1]])
then you'll have pdf lines in an array.
install.packages("pdftools")
library(pdftools)
download.file("http://www.nfl.com/liveupdate/gamecenter/56901/DEN_Gamebook.pdf",
              "56901.DEN.Gamebook", mode = "wb")
txt <- pdf_text("56901.DEN.Gamebook")
cat(txt[1])
The tabula PDF table extractor app is based around a command line application based on a Java JAR package, tabula-extractor.
The R tabulizer package provides an R wrapper that makes it easy to pass in the path to a PDF file and get data extracted from data tables out.
Tabula will have a good go at guessing where the tables are, but you can also tell it which part of a page to look at by specifying a target area of the page.
Data can be extracted from multiple pages, and a different area can be specified for each page, if required.
For an example use case, see: When Documents Become Databases – Tabulizer R Wrapper for Tabula PDF Table Extractor.
I used an external utility to do the conversion and called it from R. All files had a leading table with the desired information.
# Set the path to pdftotext.exe and convert each PDF to text.
# Assumes pdfFracList (the PDF file names) and reportDir (their folder) are already defined.
library(stringr)
exeFile <- "C:/Projects/xpdfbin-win-3.04/bin64/pdftotext.exe"
for (i in seq_along(pdfFracList)) {
  # Drop the ".pdf" extension to get the base file name
  fileNumber <- str_sub(pdfFracList[i], start = 1, end = -5)
  pdfSource <- paste0(reportDir, "/", fileNumber, ".pdf")
  txtDestination <- paste0(reportDir, "/", fileNumber, ".txt")
  print(paste0("File number ", i, ", Processing file ", pdfSource))
  system(paste(exeFile, "-table", pdfSource, txtDestination), wait = TRUE)
}
In Sublime Text, is there a way I can extract a selected piece of text into a separate file?
I do this often in LaTeX. Consider the following file:
main.tex
\section{Introduction}
...
...
\section{Conclusion}
I want to be able to select the text starting from Introduction until one line before the Conclusion, right-click and then say "Extract to file" (somewhat similar to how "Extract method" works in Visual Studio). Is there a way to achieve this using any shortcuts?
Bonus: Once the extraction is complete, substitute the extracted text with custom text such as \input{introduction} where introduction is the name of the file that the text was extracted into.
Nothing built in, but it's easily doable with a plugin. Note that the following is minimally tested and won't handle everything in ST well. That being said, it should be a good base for you to start with. Just to be safe, I'd throw everything into a local git repo before using this too much; I'd hate for this to lead to loss of work. I copy the content being replaced to the clipboard just to be safe, but if you feel confident with it, you can remove sublime.set_clipboard(content).
import os
import re

import sublime
import sublime_plugin


class ExtractAndInput(sublime_plugin.TextCommand):
    def run(self, edit):
        view = self.view
        region = view.sel()[0]
        content = view.substr(region)
        # Keep a copy of the selection on the clipboard as a safety net.
        sublime.set_clipboard(content)
        match = re.search(r"\\section\{(.+?)\}", content)
        current = view.file_name()
        # Do nothing without a \section{...} heading or for an unsaved buffer.
        if match and current:
            # Replace the selection with \input{<section name>}.
            view.replace(edit, region, "\\input{%s}" % match.group(1))
            new_file = "%s.tex" % match.group(1)
            path = os.path.join(os.path.dirname(current), new_file)
            # Append, so an existing file with the same name is not overwritten.
            with open(path, "a") as file_obj:
                file_obj.write("% Generated using ExtractAndInput Plugin\n")
                file_obj.write(content)
After saving the plugin, you can create a key binding to extract_and_input. You can also add a context menu by creating a Context.sublime-menu in Packages/User with the following content.
[
{ "caption": "Extract to File", "command": "extract_and_input"}
]
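The name handling in the plugin can be tried outside Sublime Text. The following is a minimal sketch and not part of the original answer: plan_extraction is a hypothetical helper that takes the selected text and the path of the current file, and returns the \input{...} replacement together with the path of the file that would be written. It assumes the selection contains a \section{...} heading. Unlike the plugin, it also turns the section title into a lowercase file name with runs of non-alphanumeric characters replaced by hyphens, which is an added assumption about how titles with spaces should be handled.

```python
import os
import re

# Hypothetical helper: mirrors the plugin's regex and path logic without the Sublime API.
SECTION_RE = re.compile(r"\\section\{(.+?)\}")


def plan_extraction(content, current_file):
    """Return (replacement_text, new_file_path), or None if no section heading is found."""
    match = SECTION_RE.search(content)
    if match is None:
        return None
    # Assumption: derive a file-system-friendly name from the section title.
    name = re.sub(r"[^A-Za-z0-9]+", "-", match.group(1)).strip("-").lower()
    if not name:
        return None
    # The new file is placed next to the current file.
    new_path = os.path.join(os.path.dirname(current_file), name + ".tex")
    return "\\input{%s}" % name, new_path


if __name__ == "__main__":
    selected = "\\section{Introduction}\nSome text.\n"
    print(plan_extraction(selected, os.path.join("project", "main.tex")))
```

If this behaviour is wanted in the plugin itself, the same name could be used for both the \input{...} text and the new file, so the two always agree.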