Convert pdf, doc, ppt to html5 [closed] - html

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
I've googled (without any luck) for open source software that can convert doc, ppt, and pdf to HTML5. (Exactly what Scribd does) Are there open source equivalents to the type of conversion Scribd does?
If anyone knows of a paid service, that would also work. Scribd has an API, but that's for use with the Flash viewer. Also, I would like to host my own content, as I need further control over the converted HTML document.

You're unlikely to find a single offering that does all this, especially in the open source world. It's more likely that you'll end up relying on a mishmash of things, and may even need to chain some converters in order to get to HTML (e.g. PDF -> PS -> HTML).
OpenOffice supports conversion to HTML, and can be called from the command line.
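As a rough sketch of the command-line route, the snippet below builds and runs a headless conversion from Python. It assumes the `soffice` binary (from LibreOffice or a recent OpenOffice) is on the PATH and accepts the `--headless`, `--convert-to` and `--outdir` options; the function names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, target="html", binary="soffice"):
    """Return the argument list for a headless office-suite conversion."""
    return [
        binary,
        "--headless",             # run without opening any window
        "--convert-to", target,   # output format, e.g. "html" or "pdf"
        "--outdir", str(out_dir),
        str(source),
    ]


def convert(source, out_dir, target="html"):
    """Run the conversion and return the path where the output is expected.

    Assumes `target` is a plain extension such as "html" (no filter suffix).
    """
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(source, out_dir, target), check=True)
    return Path(out_dir) / (Path(source).stem + "." + target)
```

Calling `convert("slides.ppt", "out")` should leave `out/slides.html` behind if the suite can open the file; layout fidelity will vary by document.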
http://pdftohtml.sourceforge.net/ looks reasonably good at converting pdf to html.
For Word documents in WordML or OpenXML format, it's conceivable that you could use XSLT transforms, since both the input and output formats are XML. I've seen some stylesheets floating around the net that do this, but YMMV.
Incidentally, why is there a specific requirement for open source? MS PowerPoint already supports save-as-HTML, for example.
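To illustrate the "it's all XML" point, here is a minimal sketch that walks the main XML part of an OpenXML (.docx) file and emits very plain HTML. Note that it swaps XSLT for a hand-written ElementTree walk, because the Python standard library has no XSLT processor. It only handles paragraphs, runs and a bare bold flag; tables, images, lists and styles are ignored, and `docx_to_html` is a hypothetical name.

```python
import html
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements inside word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_html(docx_file):
    """Convert the paragraphs of a .docx (path or file object) to simple HTML."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    parts = []
    for para in root.iter(W + "p"):
        runs = []
        for run in para.iter(W + "r"):
            text = "".join(t.text or "" for t in run.iter(W + "t"))
            text = html.escape(text)
            props = run.find(W + "rPr")
            # Simplification: any <w:b> element counts as bold,
            # even one that explicitly switches bold off.
            if props is not None and props.find(W + "b") is not None:
                text = "<b>" + text + "</b>"
            runs.append(text)
        parts.append("<p>" + "".join(runs) + "</p>")
    return "\n".join(parts)
```

A real stylesheet-based conversion would cover far more of the format; this only shows how little is needed to get readable text out.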

Open Office will convert pdf to html but you'll take a hit to design quality.
I suggest either Crocodoc as a paid service (it offers API libraries for several platforms, such as Python, Ruby, Java and PHP, that developers can build on) or waiting for an official Adobe tool (it's in the works).

For PDF to HTML conversion, pdf2htmlEX seems like a pretty good tool (looking at all the examples/samples):
https://github.com/coolwanglu/pdf2htmlEX

For PDF there is an open source project started by Mozilla, and it's very good: https://github.com/mozilla/pdf.js/
You can see a hello world example: https://github.com/mozilla/pdf.js/tree/master/examples/helloworld
For the other document types, I think LibreOffice said they are planning to build something in HTML5, but so far nothing has been done.

http://wvware.sourceforge.net/
wvHtml: convert your Word document into HTML4.0.
Possibly:
http://www.abisource.com/
but in this case it looks like you have to do "open doc" > "export html" manually; maybe plugins help. I'm not sure what you mean by "source software that can convert".
Or this:
http://www.zope.org/Members/sf/NuxDocument
Also, pdftohtml will give you an HTML page as output, but you will have to work on its graphical interface, since it doesn't seem to be very interactive.

I know the question is a bit old, but I have found a new open source tool called FlexPaper: http://flexpaper.devaldi.com/

Related

Converting HTML to doc(x) and / or PDF [closed]

Closed 9 years ago.
I have to convert HTML to doc(x) and PDF format.
I found Aspose, but this tool can do a lot more than I need, and that's why it isn't really cheap.
Are there similar tools that can do just this conversion?
I need this in a desktop application where no Word / Office is installed.
*Just for info: I finally bought Aspose.Words. None of the other options were as good as this tool.
Assuming that these are essentially “documents” and not fancy graphical web pages (i.e. you'd like them to be legible, but aren't deeply concerned with the minutiæ of web layout formatting), you can use LibreOffice to convert them; either manually (open, export as…) or using the "headless" mode, e.g.:
soffice --headless --convert-to pdf --outdir pdfs/ *.html
soffice --headless --convert-to doc --outdir docs/ *.html
Free, cross-platform, but a bit of a hefty install. (I think it's nearing the half-gigabyte mark for the full suite with all the plug-ins installed, but you should only need the Writer component)
Maybe this http://kitpdf.com might help. I tried it, it's free and really easy to use.
You can use ABCPdf:
http://www.websupergoo.com/products.htm
I can't speak for docx format, but you might look into DocRaptor to convert HTML to PDF format. It definitely handles CSS styling better than comparable programs, and doesn't just give you an image like creating a PDF with Photoshop.
If the webpage is or can be hosted, you can install a Google Chrome extension called Screen Capture, which lets you take a full screen grab of the webpage. You can then paste it into Photoshop and Save As a .pdf (assuming you have Photoshop, that is).

Is there a library for converting Flash / Flex AS3 TextLayoutFormat data to HTML and CSS? [closed]

Closed 9 years ago.
I have the job of recreating a Flex app in HTML and CSS. The existing app makes considerable use of TextFlow to lay out content. For several reasons I need to be quite accurate (within a few pixels) with positioning.
The current application is loading data which looks like this:
<p paragraphstartindent="0"
textalign="center"><span alignmentbaseline="useDominantBaseline"
backgroundalpha="1"
backgroundcolor="transparent"
baselineshift="0"
breakopportunity="auto"
cffhinting="horizontalStem"
color="0x0"
digitcase="default"
digitwidth="default"
dominantbaseline="auto"
fontfamily="ArialCFF"
fontlookup="embeddedCFF"
fontsize="22"
fontstyle="normal"
fontweight="bold"
kerning="auto"
ligaturelevel="common"
lineheight="120%"
linethrough="false"
locale="en"
renderingmode="cff"
textalpha="1"
textdecoration="none"
textrotation="auto"
trackingleft="0"
trackingright="0"
typographiccase="default">Here is some content which needs to be accurately positioned</span></p>
Ideally I'm looking for a library I can use to translate these many attributes into "proper" HTML and CSS. The current technology stack is PHP at the back end and JavaScript at the front end, but there would be little problem in using any other language to do the translation.
Failing that, I guess I'll try to write my own, using the API reference as a guide.
I don't think there's a lib available for that, but from having a quick look at the docs, it shouldn't be too hard to translate over. Most of the options you can ignore, as they're impossible to do in CSS (without going into CSS3; I'm assuming you want maximum compatibility here), and the rest are pretty basic (colour, font, padding, line-height...).
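A hand-rolled translation of the "basic" attributes could look roughly like the sketch below. The attribute names come from the sample markup in the question; the mapping itself is an assumption based on the usual CSS equivalents, not an official conversion table, and the function names are hypothetical. It treats `fontsize` and bare-number `lineheight` values as pixels and ignores everything it does not recognise.

```python
def tlf_color_to_css(value):
    """Turn a colour such as '0x0' or '0xFF0000' into '#rrggbb'."""
    return "#{:06x}".format(int(value, 16))


def span_to_css(attrs):
    """Map a subset of text-layout attributes to CSS declarations (a dict)."""
    css = {}
    if "fontfamily" in attrs:
        # Embedded font names (e.g. "ArialCFF") still need a real web font.
        css["font-family"] = attrs["fontfamily"]
    if "fontsize" in attrs:
        css["font-size"] = attrs["fontsize"] + "px"
    for name, prop in (("fontweight", "font-weight"),
                       ("fontstyle", "font-style"),
                       ("textalign", "text-align")):
        if name in attrs:
            css[prop] = attrs[name]
    if "lineheight" in attrs:
        value = attrs["lineheight"]
        # Percentages pass through; bare numbers are assumed to be pixels.
        css["line-height"] = value if value.endswith("%") else value + "px"
    if "color" in attrs:
        css["color"] = tlf_color_to_css(attrs["color"])
    if attrs.get("linethrough") == "true":
        css["text-decoration"] = "line-through"
    elif attrs.get("textdecoration") == "underline":
        css["text-decoration"] = "underline"
    return css


def style_attr(css):
    """Serialise the declarations for use in a style="" attribute."""
    return "; ".join(f"{k}: {v}" for k, v in sorted(css.items()))
```

Feeding it the attributes of the sample span gives declarations such as `font-size: 22px` and `font-weight: bold`; whether the result lands within a few pixels of the Flash rendering still has to be checked in each browser.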
Maybe Wallaby, the Adobe App to convert FLA files to HTML5/CSS can be helpful if you manage to make it work with your Flex Files... http://labs.adobe.com/technologies/wallaby/
This, of course, would just be a starting point :) but hope it helps.
Unfortunately, you will never get pixel accuracy in HTML text, by design. Font rendering strategies differ between browsers, and even between browser modes (e.g. IE9, Safari for Windows), so the same text can be laid out differently.
You may be able to export your content to HTML with the TextConverter class.
I would go with quick and easy, since you only need formatted DOM elements (HTML tags). This is [part of Flash Player 9] somewhat reliable; you might want to give it a try...
source : flashx.textLayout.elements.TextFlow
format : String (e.g. TextConverter.TEXT_FIELD_HTML_FORMAT)
conversionType : String (e.g. ConversionType.STRING_TYPE)
var returnHTML:Object = flashx.textLayout.conversion.TextConverter.export(source, format, conversionType);

Dynamic HTML to PDF [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 1 year ago.
I need to be able to convert dynamic HTML (HTML that is rendered on page load by JavaScript) to a PDF. I know there are plenty of HTML to PDF converters, but none of the ones I have found thus far cope with dynamic HTML.
The given tool should be able to successfully convert the following page - http://www.simile-widgets.org/timeline/
Cheers
Anthony
UPDATE:
I don't need the JavaScript functionality here, i.e. I don't need to be able to interact with the screen. I just want the final rendering of the screen to be captured in the PDF, like taking a photo after the page is loaded. And in the example I provided, the JavaScript is only rendering divs to the screen, so it's nothing that it shouldn't be able to handle, as long as it "lets" the "page" render first.
There is no way it can be done. The interfaces available for scripts in PDF are extremely limited compared to the full DOM and BOM access you enjoy in a web browser. Such interaction as you can achieve in PDF is not readily translatable from how it works in a browser and would almost certainly need hand authoring.
Your example page has many effects that PDF, as an essentially static document layout format, simply cannot reproduce at all.
Edit:
I just want the final rendering of the screen to be captured in the PDF
Ah, OK, that's a far easier and more common problem then.
In that case you'll have to use and automate a real web browser (like Firefox), or a toolkit that provides all the logic of a web browser (like WebKit), then either:
export to PDF, either using built-in tools like ‘Print to file’ in Firefox (with background images/colours turned on) or one of the PDF export add-ons, or
take an image snapshot of the browser (and include the image in a PDF if you have to)
See these questions for some discussion of browser snapshotting.
The fact that it uses any JavaScript at all means a lot of converters won't work. The JavaScript may be simple, but you still need an interpreter to handle it.
I haven't used it myself, but you might try wkhtmltopdf. It uses the WebKit rendering engine, and I believe it includes full JavaScript support. You would need to be able to install the software and run the executable, but otherwise it should be fairly straightforward.
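For what it's worth, driving wkhtmltopdf from a script might look like the sketch below. It assumes the `wkhtmltopdf` executable is installed and on the PATH, and uses its `--javascript-delay` option to give the page's scripts time to finish before the snapshot is taken; the 2000 ms default and the function names are arbitrary choices for illustration.

```python
import shutil
import subprocess


def build_wkhtmltopdf_command(url, output_pdf, js_delay_ms=2000):
    """Return the argument list for rendering `url` to `output_pdf`."""
    return [
        "wkhtmltopdf",
        "--javascript-delay", str(js_delay_ms),  # wait for scripts to render
        url,
        output_pdf,
    ]


def render_pdf(url, output_pdf, js_delay_ms=2000):
    """Run wkhtmltopdf; raises if the tool is missing or exits with an error."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf not found on PATH")
    subprocess.run(build_wkhtmltopdf_command(url, output_pdf, js_delay_ms), check=True)
```

Whether a page as script-heavy as the timeline example renders correctly depends on the WebKit version bundled with the tool, so the delay may need tuning.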
You could use a javascript URI to alert the current DOM. eg:
javascript:alert("<html>" + document.documentElement.innerHTML + "</html>")
Copy the HTML and save to a file.
Then run it through the HTML2PDF converter.
dynamic-html-pdf
This is the best library for Node.js to convert dynamic HTML to PDF.
https://www.npmjs.com/package/dynamic-html-pdf
You can probably use PhantomJS or headless chrome.
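A minimal sketch of the headless Chrome route, under some assumptions: the binary name `google-chrome` varies by platform (it may be `chromium`, `chrome.exe`, and so on), and the `--virtual-time-budget` value is a guess at how long the page's scripts need. The function name is hypothetical.

```python
import subprocess


def build_chrome_pdf_command(url, output_pdf, budget_ms=5000, binary="google-chrome"):
    """Return the argument list for printing `url` to PDF with headless Chrome."""
    return [
        binary,                                 # assumed binary name
        "--headless",
        "--disable-gpu",
        f"--virtual-time-budget={budget_ms}",   # let page scripts run first
        f"--print-to-pdf={output_pdf}",
        url,
    ]


if __name__ == "__main__":
    # Example run; requires Chrome or Chromium to be installed.
    subprocess.run(
        build_chrome_pdf_command("http://www.simile-widgets.org/timeline/", "timeline.pdf"),
        check=True,
    )
```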
Try xhtml2pdf. Here's the project page at python.org.

Is there a good website with lessons to learn HTML? [closed]

Closed 9 years ago.
I am looking for recommendations for a starting website to learn how to write HTML code
This question seems a bit weird... what do you mean by "sandbox"?
Usually you simply practice writing HTML by using a text editor and opening the local file from the browser.
Start here at w3schools.com. They provide a nifty little sandbox with sample code for all your web design element questions.
Notepad + any browser: this works well for me. Just save your file as .htm.
Or if you want, get Firefox or Opera, go to any site (say, stackoverflow.com or w3schools.com), view the source, edit away and then apply the changes. Don't worry, the changes only affect a single tab and don't change anything on the web.
Sandbox for HTML? You must be kidding. There is no chance of getting hurt even if your HTML goes wrong, so you don't need a sandbox.
Use any decent editor which gives a two-tab view for Source-code and Quick-view, and you are done. You can use MS Frontpage or EditPlus, both offer these features. You don't need to save to see the effect.
Please don't clog the bandwidth for just testing and debugging HTML. It ain't worth it.
Some things don't work with JavaScript when served from file:// due to security restrictions, and sometimes it can be too much of a pain trying to get a webhost up and running for experimenting with stuff.
I have found http://www.webdevout.net/test to be a convenient playground tool, with the added benefit that when you mangle something and want help working out what you did wrong, you can post the link to somebody and they can see what you've done without you needing to worry about security, hosting, or firewalls.
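If the only obstacle is the file:// restriction, another option is a throwaway local server. The sketch below uses only the Python standard library (3.7 or later) to serve a folder on localhost; `serve_directory` is a made-up helper name, and binding to 127.0.0.1 keeps it off the network.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000):
    """Serve `directory` over HTTP on localhost in a background thread.

    Pass port=0 to let the OS pick a free port (see server.server_address).
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After `server = serve_directory("my-site")`, the pages are reachable at http://127.0.0.1:8000/ until `server.shutdown()` is called or the script exits.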
I'd say check out these video tutorials from Nettuts. They start off with the very basics and then move on to more in-depth stuff. The tutorials are organized as a 30-day course, where they'll mail you a link to a video tutorial each day, the idea being that you'll have learnt HTML/CSS within 30 days. But you really don't need to sign up for the mailing service; just take it at your own pace.
http://learncss.tutsplus.com/

How best to write documentation targeting both HTML and PDF? [closed]

Closed 8 years ago.
LaTeX-to-HTML converters I've seen in the past have been pretty awful. Editing raw HTML is no fun and doesn't seem to translate well to the printed page. How do others solve this problem? Links to examples (both PDF and HTML) would be great.
Added: Another similar question was just asked:
What formatting language should I use for project documentation
For documenting code, I also recommend Sphinx. reStructuredText is nice because it is readable as plain text while still carrying some markup, and it does a nice job converting to HTML and to PDF. I still like LaTeX for certain things. My wife and I use LaTeX to write our Christmas letter, which we mail out via snail mail. The PDF version is pretty fancy, with two columns, and headers and footers. The HTML version is simpler. I convert with plasTeX. Examples here:
http://fedibblety.com/annualReports
I don't think any binary format is a good choice (Word) for any sort of document that you might like to read 10 years from now. That is one of the nice things about LaTeX.
Yes, LaTeX-to-HTML converters used to suck (you've probably tried LaTeX2HTML), but of late they've got better. TeX4ht is highly configurable, and produces nice XHTML+CSS. See also other converters.
You can also use DocBook, if you can bear to write in it. There are converters from DocBook to both HTML and LaTeX (or to PDF directly); an example of the latter is dblatex.
See this post: LaTeX vs Docbook.
After many years of anguish and several false starts, I'm about to revisit this, and I'm going to give Sphinx a try. It can generate HTML or LaTeX from reStructuredText.
I'm hoping it will be a much "lighter" option than full DocBook, but with many of the advantages.
You could take a step back and use something like DocBook, rendering to PDF via LaTeX and to HTML straight from the DocBook files. Alternatively, Adobe Technical Communication Suite (FrameMaker) will let you single-source a document to PDF and HTML. See this posting for a rundown on various technical documentation systems.
This is a personal choice, but LaTeX is perfect in theory and a pain in the arse in practice. I'm using the VS.NET HTML editor, plus raw HTML editing when I need it.
So I think using a WYSIWYG HTML editor is the best choice. You can always use a simple tool to convert it to PDF, and you can always edit the HTML when you need something advanced. Also, it's easier to put online when you need to.
That's how I'm managing my software documentation, and it works fine for me.
plasTeX looks like a nice LaTeX-to-HTML converter, though I haven't tried it myself.
My friend Rob Felty wrote a blog post extolling its virtues:
http://blog.robfelty.com/2008/03/19/finally-a-better-latex-to-html-converter/
AsciiDoc looks like an interesting possibility.
Read about the EPUB format. It's an e-book format. http://en.wikipedia.org/wiki/EPUB
Since the answer mentioning AsciiDoc was somewhat short on examples, here are some of the things you are looking for:
A pdf generated with Asciidoc
A cheatsheet with a side by side of the Asciidoc markup and the html result.
A list of publications done using Asciidoc, including O'Reilly books and the git documentation (to see both ends of the user scale).
I'm not sure that LaTeX is really the best tool for this. The trouble you're having with the usual LaTeX-to-HTML converters is indicative of the problem: HTML is simply not as expressive as LaTeX.
If you insist on LaTeX to HTML, take care to use a limited subset that can convert reasonably.
I've used Texinfo in the past and it does a good job. Here's an example: http://yootles.com/api. I'd prefer to stick with LaTeX, though, instead of using another language.
If everything else fails, you could grab a LaTeX-to-XML converter and write a simple XSLT stylesheet to convert it to HTML, or create a CSS style sheet and attach it to the XML file directly.
We've been using WebWorks ePublisher (www.webworks.com), which offers both multiple single-source formats (we are using Word) and the ability to output to many output formats (we output to Adobe PDF and Online Help (.CHM)).
We were facing this problem in an academic project that involved Eclipse software, and we used plasTeX to convert LaTeX to HTML and Eclipse Help. Getting it to work was quite difficult, but the end result looks really nice. You can see all three versions here:
http://handbook.event-b.org/
Further, as this is an open project, the code (build scripts) is available. We have a continuous build system (Jenkins) that rebuilds everything when new LaTeX is checked in. This is particularly nice, as contributors don't need to install the toolchain on their systems. They just check in the new LaTeX and check on the server whether the HTML was produced correctly. Sources:
http://sourceforge.net/p/rodin-b-sharp/svn/HEAD/tree/trunk/Handbook/org.rodinp.handbook.feature/
Best, Michael
I don't have enough points to comment, but to bolster the plastex answer, here is the updated plastex example link:
http://robfelty.com/2008/03/19/finally-a-better-latex-to-html-converter
LaTeX? Seriously? I wasn't aware anyone outside academia still used it. I'd go with HTML, which you can save as PDF from the web browser. If you really must have some advanced typographic stuff, go with Word instead - it has a way to save to HTML (probably not as clean as one would like), and you can save as PDF with a free plug-in (downloadable separately).
Oh, and I wouldn't bother using things like InDesign: they are overkill. Also, don't bother paying for Acrobat Professional; there are a zillion free solutions available.