Pandoc Markdown - insert rendered HTML

What is an easy solution to insert rendered HTML (no source code) into a Pandoc Markdown or LaTeX file?
I want to visualize an architecture diagram. I tried TikZ but didn't have much success after a day's worth of trying, and I figured HTML can essentially do the same, and I am already familiar with it.
The only problem is that I haven't found a good way to import it into Markdown.
What I have figured out so far:
PDF seems problematic, as you can only insert entire pages and you don't get labels.
Images would work, I guess, but I haven't found any native solution.

HTML and PDF are so different that images are the easiest way to bring one into the other. The best choice to embed a vector image in PDF is via another, cropped PDF "image", with a high resolution PNG being the second best option. Open source tools like ImageMagick or GIMP can help you with these transformations.
My slightly more general advice would be to use Mermaid diagrams in combination with Quarto. Mermaid is a very 'Markdown-esque' way of drawing diagrams and is supported by GitHub and the like, so it can even be embedded in README files. Quarto is based on pandoc but is more opinionated and has many addons and improvements built on top (including support for diagrams).
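For the image route, a cropped PDF or PNG can be pulled into Pandoc Markdown with the regular image syntax, and an attribute block gives it an identifier and a width. Below is a minimal sketch that builds such a line. The file name, caption and identifier are hypothetical, and cross-referencing the identifier typically needs an extra filter or Quarto's own conventions.

```python
def pandoc_figure(path, caption, label, width="80%"):
    """Return a Pandoc Markdown image line with an identifier and a width attribute."""
    # Doubled braces produce the literal { } of Pandoc's attribute block.
    return f"![{caption}]({path}){{#{label} width={width}}}"


# Hypothetical file name, caption and identifier, for illustration only.
print(pandoc_figure("architecture.pdf", "System architecture", "fig:architecture"))
```

The printed line can be pasted into the Markdown source wherever the diagram should appear.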

Related

Typesetting Math using MathJax for IPython Notebook Web page

I chose to learn to write mathematical expressions in IPython Notebook and sat down to explore learning resources. I found this official link.
I reproduced the whole tutorial in my IPython Notebook locally and then published it on my website. I also explained the whole struggle I went through while trying to achieve this in my blog post.
Unfortunately, whatever I do, I am not able to get the web page to render the math expressions at all. I am not sure why the rest of the notebook failed to display the equations when the last couple of cells gave the right output.
I also checked the HTML source of both the original web page and my web page and found that the <script> code is where it has to be in mine.
I do not know JavaScript or LaTeX at all! Can someone help, please?
Again, this is the Source link.
And this is my webpage where I copied the source tutorial.
Also, is ipython nbconvert the best and the right way to convert .ipynb files into HTML?
The problem you face is that pandoc (the document converter) currently strips raw LaTeX when converting the Markdown cells to HTML; see the documentation. Math in pandoc needs to be inside $..$ or $$..$$ delimiters.
The last couple of cells have explicit $ delimiters and thus convert fine. The other cells don't have this markup, so their LaTeX gets stripped during the conversion. You can try wrapping the equations that fail to render in $$ delimiters to prevent their removal (not tested).
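The wrapping suggested above could be automated along these lines. This is an untested-in-practice sketch: it assumes the failing cells contain bare \begin{equation} or \begin{align} environments (the actual notebook content is not shown here), and the function name is made up for illustration.

```python
import re

# Assumption: the stripped math consists of equation/align environments.
# The optional leading "$$" lets us detect blocks that are already delimited.
ENV_BLOCK = re.compile(
    r"(\$\$\s*)?\\begin\{(equation\*?|align\*?)\}.*?\\end\{\2\}",
    re.DOTALL,
)


def wrap_bare_environments(markdown):
    """Wrap bare LaTeX environments in $$...$$ and leave delimited ones alone."""
    def wrap(match):
        if match.group(1):  # already preceded by $$, keep as is
            return match.group(0)
        return "$$\n" + match.group(0) + "\n$$"

    return ENV_BLOCK.sub(wrap, markdown)


cell = "Text\n\\begin{align}\na &= b\n\\end{align}\nMore"
print(wrap_bare_environments(cell))
```

Running the function a second time on its own output leaves the text unchanged, so it should be safe to apply to a whole notebook's Markdown cells.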
There is a very recent PR to pandoc, to not strip raw latex when converting to html if the --mathjax option is supplied. Nevertheless, it will take some time to get this feature available unless you build pandoc from source.
And yes, nbconvert is the right (and best) tool to convert .ipynb files to html!

PDF to Structured Format

I have tons of PDFs that I need to convert to some structured format that I can interpret (HTML/XML/etc.).
PDFs are in this format:
http://img840.imageshack.us/img840/5407/pdfv.png
So far I have tried a lot of software that converts to HTML, but none of it can separate the images: it just takes something like a screenshot of the page without the text, uses that image as a background in the HTML, and positions the text with CSS.
Like this: http://img37.imageshack.us/img37/5015/examplelp.jpg
I have a bunch of PDFs, so processing each one's images manually is not an option. Does anyone know of a solution for this (even paid software)?
I had a similar problem a while back and ended up writing my own solution. It's called PDFX and it's free to use. It converts PDF to a structured-format XML and also renders any bitmap images (not vector graphics) found in the PDF separately.
Example input/output can be found here. You might want to give it a try.

Converting a web page image layout to HTML

A graphic designer created a web page design and I have it in PSD now.
What are the tools or techniques used to easily convert this image into HTML?
To get the best result you need to code up the HTML yourself, integrating the relevant graphics where needed. If you are unable to do this yourself, there are quite a number of companies that will take a PSD and code it into HTML for you. One example is www.psd2html.com; do a search on Google for more examples.
Check http://www.bolducpress.com/tutorials/from-photoshop-to-html/ for a great tutorial about "slicing", which is one technique to "convert" a PSD file to a web page.
Use the Slice Tool to slice up the PSD file into chunks of graphics that can be laid out on a web page. Then choose Save for web... to save these chunks into individual JPEG, GIF or PNG files.
Have it sliced if you must, but better build carefully planned HTML by hand, or have it done for you.
There are slicing tools that others will be able to tell more about. I personally think there is no better way really than creating the basic HTML and CSS by hand. Because what you build now is the foundation for your entire web site, and any future extensions to it, it is really worth the effort.
If you go this route, you would pick a normal HTML editing program or platform and sketch out the basic structure according to the layout you have.
If you have little experience with HTML and need to get the job done, try out a slicer. If you have time and/or money, work it out by hand, use a high quality template as a basis, or have it done professionally.
While I would agree with all of the comments above, if you want to do this yourself or don't have the knowledge or funds, you can do it with the likes of Dreamweaver/Fireworks. As everyone has said, though, you won't get good HTML, and unless you use it properly you'll have problems if you ever change your page, because changing sizes will break your layout.

Math equations on the web

How can I render Math equations on the web? I am already familiar with LaTeX's Math mode.
The other answers are out-of-date. As of 2012, beautiful math is easy to write and render. The technology is called MathJax. You can see it in quiet action on MathOverflow and hundreds of math blogs.
MathJax is an open source JavaScript display engine for mathematics that works in all modern browsers. No more setup for readers. No more browser plugins. No more font installations… It just works.
MathJax is reliable and unobtrusive, so you just need to write the math. You do so in TeX (LaTeX), a concise syntax with which most scientists and mathematicians are familiar (and have shared decades of good tutorials). For MathJax, you simply write TeX code in-line in your HTML between double dollar signs, e.g.:
When $$a \ne 0$$, there are two solutions to $$ax^2 + bx + c = 0$$ and they are $$x = {-b \pm \sqrt{b^2-4ac} \over 2a}.$$
To use MathJax to render your math, put a JavaScript line in your HTML header:
<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js"></script>
If you publish on a platform such as WordPress, Tumblr or Blogger, there are plug-ins in their galleries to do this (WordPress).
How does MathJax render math? With JavaScript it renders your math to beautiful HTML and CSS (remarkably resembling LaTeX) in a fraction of a second. If a browser supports MathML, it can render math through that too, but that's not important. It's a popular success because the end-user workflow is easy, not because of the technology behind it.
You can choose to use MathJax (over PNG images) on Wikipedia if you have an account. Look for Special:Preferences / Appearance.
MathML is ridiculous. It's neither human-readable nor human-writable (the quadratic equation takes 800 characters; it's 50 in TeX). It's just another pointless XML language. Thankfully, it's obsolete before most browsers support it. It doesn't even look as good as TeX or MathJax's HTML-CSS!
It turns out this is a bit of a pain.
You can use MathML, but browser support is still iffy. If you are starting with LaTeX you've got a few options for converting to HTML, but they'll all typically end up rendering the actual equations to images and inlining those.
Nothing is all that pretty (unless you resort to PDF or something). What's best will depend a bit on what sort of content you have, how many equations there are, and how complicated they are.
Here is a decent summary.
My two favorite approaches:
Client-side: MathJax. See some examples here. It is very easy to use and install, and its development is backed by the AMS and SIAM, among other scientific institutions. I expect this to become the de facto standard for displaying math on the Web.
Server-side: LaTeXML. This is used for producing the NIST Digital Library of Mathematical Functions. It tends to hiccup if you have custom macros in your TeX sources but in general it does give very good results.
The jsMath package is another option that uses LaTeX markup and native fonts. Quoting from their webpage http://www.math.union.edu/~dpvc/jsMath/:
The jsMath package provides a method of including mathematics in HTML pages that works across multiple browsers under Windows, Macintosh OS X, Linux and other flavors of unix. It overcomes a number of the shortcomings of the traditional method of using images to represent mathematics: jsMath uses native fonts, so they resize when you change the size of the text in your browser, they print at the full resolution of your printer, and you don't have to wait for dozens of images to be downloaded in order to see the mathematics in a web page. There are also advantages for web-page authors, as there is no need to preprocess your web pages to generate any images, and the mathematics is entered in TeX form, so it is easy to create and maintain your web pages.
See for example this page or that one.
KaTeX
A couple of developers from Khan Academy released a blazing-fast library based on TeX called KaTeX:
Fast
High-quality
Self-contained
Can be rendered on the server
Looks like a great modern option.
You can do more math directly in HTML than most people realize. See these notes.
The only safe way to render LaTeX is to save the output as an image. Some sites try to use tools to do this on the fly, and they never work reliably. For example, on some blogs, this works if you visit the web page directly but not if you go through Feedburner/Google Reader.
I've had terrible experience with MathML browser support, both in Firefox and IE. Don't even try it. Not yet. Maybe in a few years.
Here's the site I use to compile LaTeX to gifs.
If you're willing to use PDF instead of HTML, things get much easier. Just create your LaTeX document and use pdflatex to compile it to PDF. If you do go the PDF route, you may be interested in how to include PDF properties such as author, keywords, etc. in your LaTeX file. Also, this page explains how to mark up the LaTeX to make links in your PDF.
texvc can convert LaTeX math equations to png or HTML.
LaTeX and MathML are the only "right" ways to do this. However, each has severe limitations. The other options are images (not really optimal if you need to edit the equations later) or complex HTML (requires some training but can be done).
I do render LaTeX formulas "on demand" in my wiki. Basically, I extract the latex code from each wiki section and put it into a .tex file (whose filename is an md5sum of the latex, so if the same code is used again, the same tex and therefore the same image will be used).
The tex file is then latex compiled by a cron task every minute, to produce first a .ps, then with the convert program a .png (named again with the original md5). The wiki entry replaces the latex text with an img tag referring to this png (with the original latex code as an alt, for text readers).
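The naming scheme described above can be sketched as follows. This only illustrates the idea of deriving the file name from an md5 of the LaTeX source; the function names and the .png layout are assumptions, not the wiki's actual code.

```python
import hashlib
import html


def formula_basename(latex):
    """Derive a stable file name from the LaTeX source, so equal formulas share one image."""
    return hashlib.md5(latex.encode("utf-8")).hexdigest()


def img_tag(latex):
    """Build the <img> replacement, keeping the original LaTeX as alt text."""
    name = formula_basename(latex)
    alt = html.escape(latex, quote=True)  # the LaTeX may contain <, > or quotes
    return f'<img src="{name}.png" alt="{alt}">'


print(img_tag(r"a < b"))
```

Because the name depends only on the formula text, a formula that appears in several wiki sections is compiled once and its image reused.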
If you want to go this way, be very careful to sanitize your LaTeX as much as you can. There are commands in LaTeX, like \input, that you definitely do not want to let through, as anybody able to use them could pull any readable file on your server's disk into the resulting LaTeX output.
To solve this issue, MediaWiki (of Wikipedia fame) has a special plugin which sanitizes the LaTeX input, but I didn't want to use it for two reasons: first, I did not use MediaWiki; second, it's written in OCaml and I didn't want to mess with a language I don't know.
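A very rough sketch of such a check is shown here. Only \input comes from the text above; the other entries in the deny-list are my own assumptions, the list is certainly incomplete, and an allow-list of known-safe math commands is generally the stronger design.

```python
import re

# Assumption: an illustrative deny-list. Only "input" is named in the answer above.
FORBIDDEN = {
    "input", "include", "write", "immediate", "openin", "openout",
    "read", "catcode", "def", "csname",
}

# A control word is a backslash followed by letters (digits are not part of the name).
COMMAND = re.compile(r"\\([A-Za-z@]+)")


def forbidden_commands(latex):
    """Return the sorted list of deny-listed commands found in the LaTeX source."""
    return sorted({name for name in COMMAND.findall(latex) if name in FORBIDDEN})


def is_safe(latex):
    """True if no deny-listed command appears; this is a heuristic, not a guarantee."""
    return not forbidden_commands(latex)


print(forbidden_commands(r"\input{/etc/passwd}"))
print(is_safe(r"\frac{a}{b}"))
```

Rejecting the whole formula when the list is non-empty is simpler and safer than trying to strip the offending commands out.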
I've used ASCIIMathML for this in the past. It's essentially a JavaScript library and can use a plugin in IE to optimize performance, but also works without it in IE & Firefox/Mozilla (although a bit slower). The syntax supports a subset of LaTeX, but the differences cause some confusion, so it may confuse your users, depending on where they are coming from.
Here are some links so you can check it out yourself:
ASCIIMathML
ASCIIMath Tutorial
Not perfect and doesn't work in all browsers (Safari, etc) but it's something that works today at least, albeit in a somewhat selective subset of the web.
I've written an open source javascript module to do this, named jqmath. See http://mathscribe.com/author/jqmath.html. You type equations in a simplified TeX-like syntax, and jqmath converts them to MathML or simple HTML and CSS, depending on the browser. This is more efficient and accessible than using images.
By the way, some of the summaries and notes mentioned in the other answers here are pretty outdated now. Also, Firefox supports MathML now, and WebKit (Chrome and Safari) has it in its nightly builds, though it hasn't been released yet. Internet Explorer renders MathML if you have the MathPlayer plugin. Opera fakes MathML with a stylesheet. MathML is part of the HTML 5 standard, so presumably all these browsers will natively support it sooner rather than later. It's true that until then, jqmath's output will not look as good as TeX's, but it's certainly readable, and is definitely a better solution for web pages going forward.
If you do use images, will a reader for a blind user be able to read the equation? Some may want to.
There is a little Mac App called LatexIt that makes it very easy to convert LaTeX equations to PDF, PNGs etc.
(I use it to create equations for my slides in Keynote or PowerPoint. It's very nice, with drag 'n drop support, so you can just 'drag' the equations anywhere to insert them.)
I have the impression that MediaWiki will allow you to enter LaTeX markup (or something similar) and dynamically decide the best way to display it. Currently I think that uses HTML where possible for small expressions and images for more complicated expressions that cannot be represented otherwise; I suspect that one day it may take advantage of whatever other methods become state of the art, i.e., MathML if browsers start supporting it. So I think you might find that if you use MediaWiki as if it were your website engine you'll be forward-compatible with whatever comes in the future.
You can generate equation images on the fly via a LaTeX server.
http://www.forkosh.com/mimetex.html
If you are using WordPress, you can use LaTeX for WordPress (http://wordpress.org/extend/plugins/latex/) plugin.
Currently, client-side MathML rendering isn't ready for broad adoption. This means you really need to render the MathML as an image. How you do this will depend on your environment.
Do you have root access to your own server? Are you comfortable installing software on it? In this case, you can render your own images. If you're running blogging software or a wiki, you can generally find a plugin which will take advantage of your platform's capabilities. This is usually the ideal scenario if you plan to write a lot of math expressions.
If you host your own images, you can either pre-render them or use an extension like mimetex.cgi. If you allow arbitrary MathML expressions to be rendered, you run the risk of other websites hotlinking to your image renderer. If you put a filter on your web server to prevent hotlinking, then people viewing your site through a feed reader will also be blocked.
If you can't render your own images, or if you only have a few expressions you want to render, then you can usually have another service generate the image and hotlink it on your site. The downside, of course, is that you're dependent on another site, which gets nothing in return for serving up images for you.
Examples of other services (as mentioned in other comments) include:
* http://www.artofproblemsolving.com/LaTeX/AoPS_L_TeXer.php (example output: http://alt2.artofproblemsolving.com/Forum/latexrender/pictures/a/f/c/afc183343d84d030898f589bac12a8d9cf04558a.gif)
* http://www.forkosh.com/mimetex.html (mimetex.cgi example: http://www.forkosh.dreamhost.com/mimetex.cgi?c=%5Csqrt%7Ba%5E2+b%5E2%7D)
The advantage of using mimetex is one can easily change the formula and have it re-rendered.
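Building such an image URL from a formula might look like this. The sketch assumes, as the example URL above suggests, that the whole expression is passed as the query string; the characters left unescaped are chosen only to reproduce that example, and nothing here checks whether the server is still reachable.

```python
from urllib.parse import quote

# Endpoint taken from the example URL listed above.
MIMETEX_ENDPOINT = "http://www.forkosh.dreamhost.com/mimetex.cgi"


def mimetex_url(latex, endpoint=MIMETEX_ENDPOINT):
    """Percent-encode the formula and append it as the query string."""
    # "=" and "+" stay unescaped to match the example URL; this is an assumption
    # about what the server accepts, not documented behaviour.
    return endpoint + "?" + quote(latex, safe="=+")


print(mimetex_url(r"c=\sqrt{a^2+b^2}"))
```

Changing the formula only means changing the string passed in, which is the re-rendering convenience mentioned above.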

Anyone know of a good algorithm for rendering an HTML table to an image?

There is a standard two-pass algorithm mentioned in RFC 1942: http://www.ietf.org/rfc/rfc1942.txt however I haven't seen any good real-world implementations. Anyone know of any? I haven't been able to find anything useful in the Mozilla or WebKit code bases, but I am not entirely sure where to look.
I guess this might actually be a deeper problem with having to actually render HTML (the contents of table cells) but just to keep it simple - plaintext HTML table as an image. Even an HTML table rendering algorithm ignoring the "as an image" part...
If a commercial tool is an option, look at:
HtmlCapture ActiveX Control V2.0 (originally named HtmlSnap)
Some features they claim:
By calling SnapHtmlString(), you can take a snapshot for a html string.
Get snapshot images rendered by either Microsoft IE or Mozilla Firefox.
Just by calling SnapUrl() and SaveImage(), you can take a snapshot of a webpage into various images, such as BMP, JPG, JPEG, GIF, PNG, TIF, TGA and PCX.
Convert html to vector image format like EMF and WMF.
Self contained ActiveX control with no third party dependencies.
Support custom gdi output of the resulting image.
Support saving resulting image both to file and in memory.
Support saving both full-size web page and thumbnail one.
Take a snapshot of a whole webpage into one image without scrollbars.
Make grayscale or B&W images with efficient algorithms to keep the quality.
Support JPEG compression level, compression method selection of TIFF and GIF.
Support setting color depth in images while keeping the quality of the image as much as possible.
Selectively save activeX, image, java applets, scripts and videos on a web page as you want.
Send custom cookies, http headers, credentials in snapshot requests.
Take snapshots of webpages via a Proxy server.
More than 30 samples written in VC, C#, Delphi, VB, C++ Builder, Java, JScript, Perl, VBScript, ASP, ASP.net and PHP are provided.
HTML table rendering is non-trivial due to the various ways that the sizes of the cells may be specified, tables nested within tables, etc.
If all you want is the image, a simple solution would be the .NET browser control (which is basically the COM component for IE) and a screen-capture function.
If you want to get some source to manipulate, the Mozilla source should still be available.
I'm not sure if this will meet your constraints or not, but you can try using IE or an IE control with MSHTML and the IHTMLElementRender interface to render the table to a device context.
If you have XHTML, not plain HTML, you should be able to retrieve the content of those cells along with information about the table's structure: colspan, rowspan, etc. Using this information, you can render the table using your own border, padding and margin values.
Things get complex when you also want to render the user defined dimensions. But for retrieving the table data and drawing it, you could use an XML parser. PHP's parser is here: http://ca3.php.net/xml
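The same idea, using Python's standard XML parser instead of PHP's, could look like this. It is a sketch under simplifying assumptions: the XHTML is well-formed, carries no XML namespace, and contains no nested tables.

```python
import xml.etree.ElementTree as ET


def table_cells(xhtml):
    """Return rows of (text, colspan, rowspan) tuples from a well-formed XHTML table."""
    root = ET.fromstring(xhtml)
    rows = []
    # Assumption: no namespace prefix and no nested tables, so every <tr> belongs to one table.
    for tr in root.iter("tr"):
        row = []
        for cell in tr:
            if cell.tag not in ("td", "th"):
                continue
            text = "".join(cell.itertext()).strip()
            row.append((text, int(cell.get("colspan", "1")), int(cell.get("rowspan", "1"))))
        rows.append(row)
    return rows


sample = "<table><tr><th colspan='2'>Name</th></tr><tr><td>a</td><td rowspan='2'>b</td></tr></table>"
print(table_cells(sample))
```

The resulting structure holds the text and span information needed to lay the table out with your own border, padding and margin values.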
One tool that comes close is: http://www.terrainformatica.com/htmlayout/main.whtm
This library offers a way to capture rendered HTML to an image, however it is not open source (but free!). Hope it is useful to some!
Unfortunately my app is cross platform, C/C++ with no MFC or platform dependencies (nightmare!). I'm hopefully looking to find a general purpose algorithm for table rendering. I think the 2-pass option from the RFC comes pretty close so I'm probably going to just dig in and work against that. I'll be sure to blog about it and post my eventual solution here if I can!
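For plain-text cells, the two-pass idea can be sketched as below, as I read the autolayout description in the RFC. The first pass records each column's minimum width (longest word) and maximum width (unwrapped text), and the second pass assigns widths from those. Widths are counted in characters, and borders, padding, spans and user-specified sizes are all ignored, so this is a simplification rather than a full implementation.

```python
def column_widths(rows, available):
    """Assign column widths (in characters) for rows of plain-text cells.

    Assumes every row has the same number of cells and no cell spans columns.
    """
    n = len(rows[0])
    min_w = [0] * n  # widest single word per column
    max_w = [0] * n  # widest unwrapped cell per column

    # Pass 1: measure every cell with wrapping disabled.
    for row in rows:
        for i, text in enumerate(row):
            longest_word = max((len(w) for w in text.split()), default=0)
            min_w[i] = max(min_w[i], longest_word)
            max_w[i] = max(max_w[i], len(text))

    # Pass 2: choose widths depending on how the table fits the available space.
    total_min, total_max = sum(min_w), sum(max_w)
    if total_min >= available:
        return min_w  # cannot shrink further; the table overflows
    if total_max <= available:
        return max_w  # everything fits without wrapping
    spare = available - total_min  # space left after minimum widths
    slack = total_max - total_min  # total amount the columns would like to grow
    # Integer division: the sum may fall slightly short of the available width.
    return [lo + (hi - lo) * spare // slack for lo, hi in zip(min_w, max_w)]


rows = [["id", "a long description here"], ["42", "short"]]
print(column_widths(rows, 19))
```

Once the widths are fixed, each cell's text can be wrapped to its column width and drawn row by row, which keeps the layout step independent of the image backend.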
Take a look at Prince XML, a commercial tool that renders CSS-styled XML (including XHTML) documents to PDF. It conforms to major W3C standards such as XHTML and CSS 2.1. You can try the free demo version from their homepage!
Since you want an image: it shouldn't be a big problem to convert the generated PDFs to images programmatically.