Bookdown: Single html output file - html

If I add a line below the first in _output.yml:
bookdown::gitbook:
split_by: none
css: ...
in the bookdown-demo the output becomes a single .html file which looks kind of plain ugly. Is it somehow possible to retain the nice style which is produced by the default settings but in a single file? If I want to send the book to someone else sending a stack of files is not great, especially if the person who receives it is not familiar with HTML as a document format.

This turns out to be a bug of bookdown, and I just fixed it on Github. You can install and test the development version (>= 0.3.3):
devtools::install_github('rstudio/bookdown')

Related

How to change format of Warning admonition or add Caution in Sphinx HTML output

This seems like it should be straightforward but I've been prowling the documentation and web and haven't found the answer.
I want to output HTML doc from Sphinx. Ideally I'd like to have three levels of "note" type highlighted text boxes. ReST defines several "admonitions": (http://docutils.sourceforge.net/docs/ref/rst/directives.html#admonitions) but most of the Sphinx HTML themes include special formatting only for Note and Warning. (I am using one of the preinstalled themes, Classic.)
I have two questions:
1) How can I customize the color behind Warning in my documents?
2) How can I add a formatting style for Caution?
I see that these all end up with tags like <div class="admonition warning"> ... in the HTML output. But I can't find where the formatting for that class is defined. Is it in a stylesheet? Is it in a layout.html file or some other file?
Is there anything that explains how the various files in themes actually interact with each other? I haven't found a good primer. (I am no expert on css-based HTML either, so maybe that's part of the problem.)
Okay, I figured out more and have a working workaround. (I'm still not sure how I'm supposed to handle this.)
Looks like my HTML code is reading directly from a few cascading stylesheets stored along with the output in a directory called _static. There's classic.css, which inherits from basic.css.
I don't understand how these relate to the files named like basic.css_t that live in the Python Sphinx install.
To change things, should I (A) try altering the _t files? or (B) create an altered local copy of classic.css that lives in my source directory?
If I go with B, more questions.
Will it be overwritten by the values in the css_t template at build time? (I guess this is easy enough to test)
Is it good practice to use the same filename for a modified version of that stylesheet?
Here's a workaround that avoids those questions and seems to be doing what I want - from this: https://github.com/snide/sphinx_rtd_theme/issues/117
I created an override stylesheet that includes just the formatting I want to change.
I stored it in the _static of my source directory.
I defined it in my conf.py as follows:
html_context = {
'css_files': [
'_static/theme_overrides.css',
],
}
Now, that github discussion said that this wasn't a solution for all kinds of themes (including the RTD theme mentioned in the question) but I think I'm safe for now.
What more should I know?

convert docx with (ordered) list to html

I'm trying to convert a large docx document with several layers' ordered list to an html. (see an example of the document here: http://docdro.id/X1oyfBv You should download it)
I tried the following things, including:
online converters such as html-cleaner and index.html (which only recognize one layer of the list)
save as html - which creates an horrendous file but still doesn't recognize the ol structure.
saved the file as zip and then opened the xml file, but I dont see an easy way to get the ol structure out of the w:... tags
saving it to google docs and running Omar Alzabir's script
http://omaralzabir.com/wp-content/uploads/2014/05/GoogleDocsEmail.jpg
btw. If I create a word file with an ordered list with multiple layers and i convert it, it does recognize it as ol's. But the existing file is not recognized as ol's even if I 'un-list' and list it again. So possibly there is something wrong with how the original document was created (?)
Any suggestions much appreciated:) Or indications as to why this problem occurs
Are you asking how to save a Word-doc in HTML format, with multi-level ordered-lists?
Word-HTML has bugs in its multi-level ordered lists. For the list-items, the indentation tends to be incorrect and inconsistent. There's an example here.
Word-HTML has similar bugs in its multi-level unordered lists. An example is here.
I recently wrote a Python program that fixes these bugs, in Word's HTML. The program is part of WordWebNav (WWN), which is free and open-source.
WWN is an app that converts a Microsoft-Word document to a usable web-page. It adds some missing features in the Word-HTML web-page (e.g., a navigation pane), and it fixes bugs in the Word-HTML.
You can use pandoc : https://github.com/jgm/pandoc
This is an open source universal command line tool to convert markup source based document files.
You can use it as something like that:
pandoc -o output.html input.docx

convert pdf into small chunks of data(many chunks per page)?

I have a pdf file and I need to get get small pieces of data from it.
It is structured like this :
Page1:
Question 1
......................................
......................................
Question 2
......................................
......................................
Page End
I want to get Question 1 and Question 2 as separate html files, which contain text and image.
I've tried
pdftohtml -c pdffile.pdf output.html
And I got files with png images, but how to do I cut the Image into smaller chunks to fit the size of each Question (I want to separate each question into individual files)?
P.S. I have alot of pdf files, so a command-line tool would be nice.
I'll try to give you an approach on how I would go about it. You mention, that every page in your PDF document might have multiple questions and you basically want have one HTML file for every question.
It's great if pdftohtml works for you, but I also found another decent command line utility that you might want to try out.
Ok, so assuming you have an HTML file converted from the PDF you initially had, you might want to use csplit or awk to split your file into multiple files based on the delimiter 'Question' in your case. (Side note- csplit and awk are linux specific utilites, but I'm sure there are alternatives if you are on Windows or a MAC. I haven't specifically tried the following code)
From a relevant SO Post :
csplit input.txt'/^Question$/' '{*}'
awk '/Question/{filename=NR".txt"}; {print >filename}' input.txt
So, assuming this works, you will have a couple of broken html files. Broken because they'll be unsanitized due to dangling < or > or some other stray HTML elements after the splitting.
So you could start by saving the initial .html as .txt, removing the html, head and body elements specifically and going through the general structure of how the program converts the pdf into html. I'm sure you'll see a pattern around how the string 'Quetion' is wrapped in an element and is something you can take care of. That is why I mention .txt files in the code snippets.
You will basically have a bunch of text files with just the content html and not the usual starting tags for an html file because we removed that initially. Then it's only a matter of reading each file, just taking care of the element that surrounds the string 'Question' and adding the html, head and body elements around the content and saving them as .html files. You could do this in any programming language of your choice that supports file reading and writing (would be a fun exercise)
I hope this gets you started in the right direction.

Fixing malformed html that html tidy doesn't fix

Okay, so I've been utilizing HTML tidy to convert regular HTML webpages into XHTML suitable for parsing. The problem is the test page I saved in firefox had its html apparently somewhat precleaned by firefox during saving, call this File F. Html tidy works fine on file F, but fails on the raw data written to a file via .NET (file N). Html tidy is complaining about form tags being intermixed with table tags. The Html isn't mine so I can't just fix the source.
How do I clean up file N enough so that it can be run through Html tidy? Is there a standard way of hooking into Firefox (completely programmically without having to use mouse or keyboard) or another tool that will apply extra fixes to the html?
I had been using HTML tidy for some time, but then found that I was getting better results from TagSoup.
It can be used as a JAXP parser, converting non-wellformed HTML on the fly. I usually let it parse the input for Saxon XQuery transformations.
But it can also be used as a stand-alone utility, as an executable jar.
I wound up using SendKeys in C# and importing functions from user32.dll to set Firefox as the active window after launching it to the website I wanted (file:///myfilepathhere/).
SendKeys seemed to require running a windowed program, so I also added another executable which performs actions in its form_load() method.
By using alt+f, down six times, enter, wait for a bit, type full path file name, enter (twice) and then killing firefox, I was able to automate firefox's ability to clean some html up.

Something like Include for HTML in VS2010 at build time

Is there a way to split a single HTML page (purely static, HTML + JS) in VS2010 (I use VS2010 + ReSharper for my HTML /Js coding) into parts, but get / build a single page at build time.
There was such a feature with Dreamweaver (I have used this years back, think it called libraries). If I was using PHP I'd use something like Include at runtime.
My page contains several div sections serving as tabs, only one visible at time. I want to place the code between these tabs in a single file, to make it easier to maintain. But in the end I do need one single, static HTML file. Again, I want to do this a build time, not at server side.
<DIV>
many lines of HTML
</DIV>
should be replaced by something like
<DIV>
#include tab1.html
</DIV>
I could write a script building the static page and hook it into VS2010, but is there some extension or function already existing?
-- Follow ups on using T4 ---
VS2010 - Assign html code formatting to T4 (.tt) file
VS2010 - disable validation for particular html file (not all files)
I ran across this question while searching for something else. You've probably moved on, but what the heck, maybe the answer will help someone else:
You could get so-called T4 templates to do this:
http://msdn.microsoft.com/en-us/library/bb126445.aspx
Alternately, Microsoft has similar capabilities in Razor, and it's more specific to HTML.
Here is a comparison of the two:
http://blogs.msdn.com/b/garethj/archive/2011/03/11/t4-vs-razor-what-s-the-skinny.aspx