Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
Latex-to-html converters I've seen in the past have been pretty awful. Editing raw html is no fun and doesn't seem to translate well to the printed page. How do others solve this problem? Links to examples (both pdf and html) would be great.
Added: Another similar question was just asked:
What formatting language should I use for project documentation
For documenting code, I also recommend Sphinx. reStructuredText is nice because it is readable as plain text even with its markup, and it converts nicely to both HTML and PDF. I still like LaTeX for certain things. My wife and I use LaTeX to write our Christmas letter, which we send out via snail mail. The PDF version is pretty fancy, with two columns, and headers and footers. The HTML version is simpler. I convert with plastex. Examples here:
http://fedibblety.com/annualReports
I don't think any binary format is a good choice (Word) for any sort of document that you might like to read 10 years from now. That is one of the nice things about LaTeX.
Yes, LaTeX-to-HTML converters used to suck (you've probably tried LaTeX2HTML), but of late they've got better. Tex4ht is highly configurable, and produces nice XHTML+CSS. See also other converters.
You can also use Docbook, if you can bear to write in it. There are converters from DocBook to both HTML and LaTeX (or to PDF directly); an example of the latter is dblatex.
See this post: LaTeX vs Docbook.
After many years of anguish and several false starts, I'm about to revisit this, and I'm going to give Sphinx a try. It can generate HTML or LaTeX from ReStructured Text.
I'm hoping it will be a much "lighter" option than full DocBook, but with many of the advantages.
You could take a step back and use something like DocBook and render to PDF via LaTeX and HTML straight from the DocBook files. Alternatively, Adobe Technical Communication Suite (Framemaker) will let you single-source a document to PDF and HTML. See this posting for a rundown on various technical documentation systems.
This is a personal choice, but while LaTeX is perfect in theory, in practice it's a pain in the arse. I'm using the VS.NET HTML editor, plus raw HTML editing when I need it.
So I think using a WYSIWYG HTML editor is the best choice. You can always use a simple tool to convert the result to PDF, and you can always edit the HTML when you need something advanced. It's also easier to put online when you need to.
That's how I manage my software documentation, and it works fine for me.
PlasTeX looks like a nice latex-to-html converter, though I haven't tried it myself.
My friend Rob Felty wrote a blog post extolling its virtues:
http://blog.robfelty.com/2008/03/19/finally-a-better-latex-to-html-converter/
AsciiDoc looks like an interesting possibility.
Read about the EPUB format. It's an e-book format. http://en.wikipedia.org/wiki/EPUB
Since the answer mentioning AsciiDoc was somewhat short on examples, here are some of the things you are looking for:
A pdf generated with Asciidoc
A cheatsheet with a side by side of the Asciidoc markup and the html result.
A list of publications done using Asciidoc, including O'Reilly books and the git documentation (to see both ends of the user scale).
I'm not sure that LaTeX is really the best tool for this. The trouble you're having with the usual LaTeX-to-HTML converters is indicative of the problem: HTML is simply not as expressive as LaTeX.
If you insist on latex to html, take care to use a limited subset that can convert reasonably.
I've used Texinfo in the past and it does a good job. Here's an example: http://yootles.com/api. I'd prefer to stick with LaTeX, though, instead of using another language.
If everything else fails, you could grab a LaTeX-to-XML converter and write a simple XSLT stylesheet to convert its output to HTML, or create a CSS style sheet and attach it to the XML file directly.
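To give a feel for the XML-to-HTML step, here is a minimal sketch. It uses Python's standard `xml.etree.ElementTree` in place of XSLT (the standard library has no XSLT processor), and the input element names (`document`, `section`, `title`, `para`, `emph`) are assumptions made up for illustration; a real converter's output vocabulary will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a converter's XML tags to HTML tags.
# These input tag names are invented for the example.
TAG_MAP = {
    "document": "article",
    "section": "section",
    "title": "h2",
    "para": "p",
    "emph": "em",
}


def to_html(node):
    """Recursively copy an XML element into an HTML element tree."""
    out = ET.Element(TAG_MAP.get(node.tag, "div"))  # unknown tags become <div>
    out.text = node.text
    out.tail = node.tail
    for child in node:
        out.append(to_html(child))
    return out


def convert(xml_text):
    """Parse the XML string and serialize the mapped tree as HTML."""
    root = ET.fromstring(xml_text)
    return ET.tostring(to_html(root), encoding="unicode", method="html")


SAMPLE = (
    "<document><section><title>Intro</title>"
    "<para>Hello <emph>world</emph>.</para></section></document>"
)
html_out = convert(SAMPLE)
print(html_out)
```

A real XSLT stylesheet does the same kind of tag-by-tag mapping declaratively, which tends to scale better once the rules get more involved.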
We've been using WebWorks ePublisher (www.webworks.com), which supports multiple single-source input formats (we are using Word) and can produce many output formats (we output to Adobe PDF and Online Help (.CHM)).
We were facing this problem in an academic project that involved Eclipse software, and we used plastex to convert Latex to HTML and Eclipse Help. Getting it to work was quite difficult, but the end result looks really nice. You can see all three versions here:
http://handbook.event-b.org/
Further, as this is an open project, the code (build scripts) is available. We have a continuous build system (Jenkins) that rebuilds everything when new LaTeX is checked in. This is particularly nice, as contributors don't need to install the toolchain on their systems. They just check in the new LaTeX and check on the server whether the HTML was produced correctly. Sources:
http://sourceforge.net/p/rodin-b-sharp/svn/HEAD/tree/trunk/Handbook/org.rodinp.handbook.feature/
Best, Michael
I don't have enough points to comment, but to bolster the plastex answer, here is the updated plastex example link:
http://robfelty.com/2008/03/19/finally-a-better-latex-to-html-converter
LaTeX? Seriously? I wasn't aware anyone outside academia still used it. I'd go with HTML, which you can save as PDF from the web browser. If you really must have some advanced typographic stuff, go with Word instead - it has a way to save to HTML (probably not as clean as one would like), and you can save as PDF with a free plug-in (downloadable separately).
Oh, and I wouldn't bother using things like InDesign; they are overkill. Also, don't bother paying for Acrobat Professional, as there are a zillion free solutions available.
I'm trying to implement my own little reader view app (an app that would do the same thing as reader mode in Safari), and there are a few things I find myself asking:
Is there a technical term for this feature (reader-view doesn't really cut it)?
Is there a standard that websites are supposed to follow in order to indicate the content they would like to have in their reader views?
Is there an open-source set of HTML parsing rules to pull the "readable" content from a website?
Is the effort to implement such a thing simply too big for a single person in a few weeks and if so should I opt for services such as Instaparser?
I believe the original to be implemented by arc90, and they called it readability. You can check out their page here.
It's been ported to many different languages over time, so you could take a look at the different implementations to learn more about it, how it's done etc.
Python readability
JReadability
JavaScript
Ruby
This is just a small sample; there are many more implementations if you would like to look further.
Edit: Oops, after some more Googling I found this question with an answer that explains it very well.
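To make the idea concrete, here is a very rough sketch of the kind of heuristic these libraries build on. It is not the arc90 algorithm, just a simplified stand-in: it credits each `<p>` to its nearest enclosing container and keeps the container with the most paragraph text. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser

CONTAINERS = {"div", "article", "section", "main", "body"}


class ParagraphScorer(HTMLParser):
    """Credit the text of each <p> to its nearest enclosing container."""

    def __init__(self):
        super().__init__()
        self.stack = []    # (tag, id) of currently open containers
        self.scores = {}   # container id -> characters of paragraph text
        self.texts = {}    # container id -> list of paragraph strings
        self.in_p = False
        self.buf = []
        self.counter = 0

    def handle_starttag(self, tag, attrs):
        if tag in CONTAINERS:
            self.counter += 1
            self.stack.append((tag, self.counter))
        elif tag == "p":
            self.in_p = True
            self.buf = []

    def handle_data(self, data):
        if self.in_p:
            self.buf.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            text = "".join(self.buf).strip()
            if text and self.stack:
                key = self.stack[-1][1]
                self.scores[key] = self.scores.get(key, 0) + len(text)
                self.texts.setdefault(key, []).append(text)
        elif self.stack and self.stack[-1][0] == tag:
            self.stack.pop()


def extract_readable(html):
    """Return the paragraphs of the container holding the most <p> text."""
    parser = ParagraphScorer()
    parser.feed(html)
    parser.close()
    if not parser.scores:
        return ""
    best = max(parser.scores, key=parser.scores.get)
    return "\n\n".join(parser.texts[best])


SAMPLE = """
<body>
  <div id="nav"><p>Home</p><p>About</p></div>
  <article>
    <p>Reader views keep the long paragraphs of an article.</p>
    <p>They drop navigation, ads, and other page chrome.</p>
  </article>
</body>
"""
readable = extract_readable(SAMPLE)
print(readable)
```

The real ports go much further (class-name hints, link density, sibling merging), so treat this only as a starting point for reading their source.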
I am pretty new to developing software and am intrigued by the huge world out there! I have working knowledge of C/C++ and Java. I was thinking of making an application that would convert a webpage to a PDF document. I know there are many solutions available, both online and offline, but I want to develop my own. I googled but couldn't find anything that would help me get started.
I want to know how to go about the conversion process. How do I get started? What languages and technologies are prerequisites for making a converter like this?
Thank You
At the very least, you need to get to the bottom of the following specifications:
HTML specification
CSS specification
JavaScript specification
PDF specification
Moreover, there is a lot of minor stuff, such as fonts, decryption/encryption algorithms, and many other small but still necessary things.
I think you can imagine that this is quite a long way to get all this working. In fact, the complexity of such software is the reason why so many companies make money in this field.
Anyway, I'd suggest starting with simple things and growing your software gradually. Start with converting HTML to an image, because it is a bit simpler. Take and parse the HTML, its CSS, and its JavaScript. Clean the HTML. Build a DOM of the HTML document. Apply styles. Go through the DOM and draw the elements to the image.
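As a toy illustration of that pipeline (parse, build a DOM, apply styles, walk the tree and draw), here is a sketch in Python using only the standard library. Everything in it is a simplifying assumption: the `STYLES` table stands in for a real CSS engine, the "drawing" step just collects tuples instead of painting pixels, and JavaScript is ignored entirely.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}

# Hypothetical stylesheet keyed by tag name; real CSS matching is far richer.
STYLES = {"h1": {"font-size": 32}, "p": {"font-size": 16}}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""   # simplification: text is stored per element
        self.style = {}


class DomBuilder(HTMLParser):
    """Build a minimal element tree from an HTML string."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest matching open element; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text += data.strip()


def apply_styles(node, inherited):
    """Inherit the parent's style, then overlay the rule for this tag."""
    node.style = {**inherited, **STYLES.get(node.tag, {})}
    for child in node.children:
        apply_styles(child, node.style)


def paint(node, commands):
    """Depth-first walk that emits one 'draw text' command per text node."""
    if node.text:
        commands.append((node.tag, node.style["font-size"], node.text))
    for child in node.children:
        paint(child, commands)


def render(html):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    apply_styles(builder.root, {"font-size": 16})
    commands = []
    paint(builder.root, commands)
    return commands


commands = render("<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>")
print(commands)
```

In a real converter the `paint` step would compute layout boxes and hand them to an image or PDF backend, which is where most of the effort goes.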
Good luck!
Closed 9 years ago.
I've googled (without any luck) for open source software that can convert doc, ppt, and pdf to HTML5. (Exactly what Scribd does) Are there open source equivalents to the type of conversion Scribd does?
If anyone knows of a paid service, that would also work. Scribd has an API, but that's for use with the flash viewer. Also, I would like to host my own content as I need further control over converted html document.
You're unlikely to find a single offering that does all this, especially in the open source world. It's more likely that you'll end up relying on a mishmash of tools, and you may even need to chain some converters in order to get to HTML (e.g. PDF -> PS -> HTML).
OpenOffice supports conversion to HTML, and can be called from the command line.
http://pdftohtml.sourceforge.net/ looks reasonably good at converting pdf to html.
For Doc that is Word ML or OpenXML format it's conceivable that you could use XSLT transforms since both input and output formats are XML. I've seen some stylesheets floating around the net that do this, but YMMV.
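As a rough idea of what such a transform involves, here is a sketch that pulls paragraph text out of an OpenXML `.docx` and wraps it in `<p>` tags. It uses Python's `zipfile` and `ElementTree` rather than XSLT, ignores all formatting, and the sample document it builds in memory contains only `word/document.xml` (a real `.docx` has more parts), so treat it as an illustration only.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# WordprocessingML namespace used by word/document.xml inside a .docx.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_html(docx_bytes):
    """Turn each w:p paragraph into a <p>, concatenating its w:t text runs."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        if text:
            paragraphs.append("<p>" + escape(text) + "</p>")
    return "\n".join(paragraphs)


def make_sample_docx():
    """Build a stripped-down stand-in for a .docx, for demonstration only."""
    body = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r>"
        '<w:r><w:t xml:space="preserve"> &amp; goodbye</w:t></w:r></w:p>'
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", body)
    return buf.getvalue()


html_out = docx_to_html(make_sample_docx())
print(html_out)
```

Headings, lists, tables, and images all need their own handling, which is where the "YMMV" comes in.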
Incidentally, why is there a specific requirement for open source? MS Powerpoint already supports save-as-HTML for example.
Open Office will convert pdf to html but you'll take a hit to design quality.
I suggest either Crocodoc, a paid service (it comes in different flavours for platforms such as Python, Ruby, Java, and PHP, and developers can work with its APIs), or waiting for an official Adobe tool (it's in the works).
For PDF to HTML conversion, pdf2htmlEX seems like a pretty good tool (looking at all the examples/samples):
https://github.com/coolwanglu/pdf2htmlEX
For pdf there is an open source project started by mozilla and it's very good: https://github.com/mozilla/pdf.js/
You can see a hello world example : https://github.com/mozilla/pdf.js/tree/master/examples/helloworld
For the rest of the document types, I think LibreOffice said that they are planning to build something in HTML5, but so far nothing has been done.
http://wvware.sourceforge.net/
wvHtml: convert your Word document into HTML4.0.
Possibly:
http://www.abisource.com/
but in this case it looks like a manual "open doc" > "export html" process; maybe plugins help. I'm not sure what you mean by "source software that can convert".
Or this:
http://www.zope.org/Members/sf/NuxDocument
pdftohtml will also give you an HTML page as output, but you will have to work on its graphical interface, since it doesn't seem to be very interactive.
I know the question is a bit old, but I have found a new open source tool called FlexPaper: http://flexpaper.devaldi.com/
Does anyone have any suggestions on how to generate accessible PDFs (including images) from HTML?
The PDFs need to look like the original HTML, including positions of images etc.
Any special HTML structure required to help make the final PDF accessible?
I've seen questions about creating PDFs, but none of them specifically address the important issue of accessibility.
My poison of choice is Perl but references to any program, language or library will help.
I have a more in-depth question at Doctype if anyone has more general information to offer.
http://doctype.com/TiB
Also,
I, and others, would find it useful if users with accessibility problems could comment if they find the "usability experience" of using PDFs better or worse than reading from Plain Old Semantic HTML (POSH).
Thanks
Mike
Look into PrinceXML. Through CSS you can control margins, page breaking and orientation. While not open source, you can try it for free, but it places a small water mark in the upper right corner.
The Adobe ColdFusion server product does a really fine job of this, not surprisingly. But it's not free, and the open source implementations of the language (Smith and BlueDragon) don't support the pdf stuff.
Developer licenses to Adobe ColdFusion are free, and you can download it.
I've done this on a small scale by scripting Safari to print to PDFs. I don't recommend it for large-scale projects though.
By far the most capable PDF publishing tool I've ever come across is reportlab. There is an open source library written with Python and a proprietary system that allows you to construct a document using RML, a custom xml spec. The latter is easier for more complex docs. They tend to be very flexible (and reasonable) with pricing.
Not strictly an answer to your question as it doesn't handle html-to-pdf conversions, but perhaps of use to you.
I'm looking for a HTML editor that kinda supports templated editing or live snippets or something like that.
Background: I'm working on a website for a friend. As there are no specifications what the webspace/webserver can or can't do, I decided to make it a pure HTML/CSS page, or rather 10 of them. I wrote a template, copied it 10 times and edited the content. And guess what, the template has to be changed.
Therefore I'm looking for an (HTML) editor that has some kind of live template system, where I can edit the content as if it were plain text and then save the project into the 10 pure HTML/CSS files.
I thought about using PHP (the only script language I've some knowledge in), but writing the underlying template script would cost me enough time that I could change all files by hand. I'm not that familiar with AJAX to know if there's a way to load content from another file. If so, this would be an option if there already is a script. With Webdeveloper (firefox extension) I could save the generated source code as HTML/CSS.
Thanks in advance
Edit: any hints how to do this without an editor are welcome
Edit2: In my mind the tool looks like a plain old text editor like SciTe, but capable of editing multiple files simultaneously in the same text area, so it looks like editing one ordinary file, but actually it's a whole bunch of files.
Dreamweaver will do this for you; it has had HTML templating of the type you describe built in from very early versions (from how you phrase the question, I don't think you're thinking along the lines of a PHP templating engine such as Smarty, but rather some sort of HTML layout formatting).
Although I regularly look around for Dreamweaver replacements, and I've certainly been impressed by Aptana, I still tend to use Dreamweaver in my development stack simply because whereas I can compensate for some of the more coding-orientated features it misses, I find the WYSIWYG nature of the editor invaluable.
I would have used a template engine.
I wrote a post about a dead simple script using the Dwoo template engine and mod_rewrite, where I take the URI and load the correct data and template based on that. You should be able to get it running in a few minutes.
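If the host can't run any server-side code, the same template idea also works as an offline build step: keep one template, keep the per-page content separately, and regenerate the static files whenever the template changes. Below is a minimal sketch of that approach in Python with `string.Template` (not Dwoo); the page names, fields, and `style.css` reference are assumptions for illustration.

```python
import tempfile
from pathlib import Path
from string import Template

# One shared layout; edit it here and rebuild all pages.
PAGE = Template("""<!DOCTYPE html>
<html>
<head><title>$title</title><link rel="stylesheet" href="style.css"></head>
<body>
<h1>$title</h1>
$body
</body>
</html>
""")


def build_site(pages, out_dir):
    """Write one static HTML file per entry in `pages` and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, fields in pages.items():
        target = out / f"{name}.html"
        target.write_text(PAGE.substitute(fields), encoding="utf-8")
        written.append(target)
    return written


# Hypothetical content for two of the pages.
PAGES = {
    "index": {"title": "Home", "body": "<p>Welcome.</p>"},
    "contact": {"title": "Contact", "body": "<p>Write to us.</p>"},
}

output_dir = tempfile.mkdtemp()
written = build_site(PAGES, output_dir)
print([p.name for p in written])
```

The generated files are plain HTML/CSS, so they can be uploaded to any webspace regardless of what the server supports.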
Maybe I am way off on this, but why don't you look into an Open Source Content Management System (PHP/MYSQL)? There are MANY light systems that are not like Drupal, Joomla (if you do not want the big bulk of those CMS's).
There are even a few good ones for light web design that are flat file driven.
That would be my suggestion, at least if not for this project, look into it for future projects.
Here is an example of a great micro CMS that would seem to fit the bill for what you are doing:
http://www.mini-print.com/