Generate a PDF with XSL-FO from XSLT (HTML)

I have an XML file and an XSLT stylesheet that generates HTML output.
I would like to generate a PDF with XSL-FO that renders like the HTML output, just paginated.
As I understand it, I need an XSL-FO stylesheet that generates a .fo file, and then Apache FOP to generate a PDF from that .fo file.
My question is: how do I get the XSL-FO stylesheet that generates the .fo file from my existing XSLT stylesheet that generates HTML?
Is there an easy way to transform one stylesheet (HTML output) into the other (XSL-FO output)?
Thank you

Multiple people have produced XHTML to FO stylesheets at various times. See, e.g., 'xhtml2fo.xsl' on the 'XHTML to XSL-FO' tab of https://www.antennahouse.com/antenna1/xml-to-xsl-fo-stylesheets/.
Using your inline CSS would require parsing the CSS and transforming the properties into the nearest (often identical) XSL-FO properties.
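
As a rough illustration of the CSS step, here is a minimal Python sketch that parses an inline style string and keeps only the declarations whose names and values carry over to XSL-FO unchanged. The function name css_to_fo_attributes and the property list are assumptions made for this example; a real converter would need a proper CSS parser and would also have to translate the properties that differ.

```python
# Properties that exist under the same name in CSS and XSL-FO.
# This list is illustrative, not exhaustive.
SHARED_PROPERTIES = {
    "color", "background-color", "font-family", "font-size", "font-style",
    "font-weight", "line-height", "text-align", "text-indent",
    "margin-top", "margin-bottom", "margin-left", "margin-right",
    "padding-top", "padding-bottom", "padding-left", "padding-right",
}


def css_to_fo_attributes(style):
    """Map an inline CSS declaration string to a dict of XSL-FO attributes.

    Naive on purpose: splitting on ';' breaks on values that contain
    semicolons (for example data URLs), and unknown properties are dropped.
    """
    attributes = {}
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue
        name, value = declaration.split(":", 1)
        name = name.strip().lower()
        value = value.strip()
        if name in SHARED_PROPERTIES and value:
            attributes[name] = value
    return attributes
```

The resulting dict can be copied onto an fo:block or fo:inline as attributes.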

You can use the XML or the HTML as input for the XSL-FO transformation.
You may be able to use your XSL as a starting point for the XSL-FO stylesheet. Replace the HTML output blocks with XSL-FO commands.
As an alternative, you could look into CSS Paged Media. Then you can use your existing HTML, plus a CSS stylesheet that's derived from your existing CSS.
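
To make the idea of replacing HTML output with XSL-FO more concrete, the sketch below builds a minimal FO document from a tiny XHTML tree using Python's standard library. It is not an XSLT stylesheet; it only shows which FO elements stand in for which HTML elements. The BLOCKS mapping, the helper names, and the A4 page settings are assumptions for illustration, and the input is assumed to be well-formed XHTML without a namespace.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name of an FO element."""
    return "{%s}%s" % (FO_NS, tag)


# Illustrative mapping: each HTML element becomes an fo:block with these attributes.
BLOCKS = {
    "h1": {"font-size": "18pt", "font-weight": "bold", "space-after": "12pt"},
    "p": {"space-after": "6pt"},
}


def html_to_fo(html_root):
    """Build an fo:root tree with one A4 page master and one block per h1/p."""
    root = ET.Element(fo("root"))
    layout = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(layout, fo("simple-page-master"), {
        "master-name": "A4",
        "page-width": "21cm", "page-height": "29.7cm",
        "margin-top": "2cm", "margin-bottom": "2cm",
        "margin-left": "2cm", "margin-right": "2cm",
    })
    ET.SubElement(page, fo("region-body"))
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    for element in html_root.iter():
        if element.tag in BLOCKS:
            block = ET.SubElement(flow, fo("block"), BLOCKS[element.tag])
            # Inline markup such as <b> is flattened to plain text in this sketch.
            block.text = "".join(element.itertext())
    return root
```

Serializing the result with ET.tostring(root, encoding="unicode") gives a .fo document; Apache FOP can then render it with, for example, fop -fo out.fo -pdf out.pdf.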

Related

The general process of creating a printable pdf from xml

This is a very, very basic question. I'm self-taught in HTML, XML and CSS, so please forgive my absolute ignorance. My situation is as follows: I know how to write XML files, I can create the HTML output I want, and I can use CSS to style the page the way I need to. Now I would like to print a book from this result. I need to split the content of my HTML page into A4 pages and add page numbers and line numbers. What techniques do I have to learn to do this? I have read online that XSL-FO is used to transform XML to PDF. Is there any way I could use the HTML/CSS output with this, or do I need to write an entirely new stylesheet using XSL-FO? Do I need to learn JavaScript? I'm willing to do any of this, I just don't know where to start.
I had a look at importing my XML file into InDesign, and that would work, but then I'd have to do all the work of styling the text again. There has to be a better way.
If you want to use CSS to style your print output, the proprietary Prince XML seems to be the only tool that generates decent typography.
Turning to open source tooling, you could use XSLT to transform your custom XML to XSL-FO and then use Apache FOP to generate the PDF. However, the output is not as clean as with TeX, and you'd need to specify all your layout in XSL-FO instead of CSS.
What I'd recommend is transforming your XML to HTML or DocBook XML and then using Pandoc to turn that into a PDF. Pandoc uses pdflatex, xetex or luatex to generate the PDF. If you're not familiar with the LaTeX macro package, I recommend using the ConTeXt macro package instead, which has more consistent layout commands and doesn't rely on packages for basic functionality. To change the layout, use a custom Pandoc template file to generate the desired ConTeXt file. That would work as follows:
$ saxon -o docbook-file.xml custom.xml stylesheet.xslt  # generate DocBook
$ pandoc -f docbook docbook-file.xml -t context --standalone --template template.tex -o out.tex  # generate ConTeXt
$ context out.tex  # generate PDF
Or look at http://www.cloudformatter.com/CSS2Pdf, which uses XSL-FO behind the scenes, hidden from the user. You style with CSS. There are many samples showing book features like headers/footers, page numbering, and multiple page sequences.
You can try http://pdfcrowd.com/, which is very simple and easy. I'm using their Java API and it's smooth. It's also quite cheap.

make xml file using xslt and html

I have an HTML page that I created using XSLT and an XML file. This HTML page offers the option to change some values, and after those changes I want to generate XML content that includes them. How can I do that? Is there an easy way?
It seems you are looking for something like XMLForm: http://www.datamech.com/XMLForm/
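
If a dedicated tool is more than you need, the server-side half of the task can be sketched in a few lines of Python: take the values submitted from the HTML page and write them back into the XML. The function name apply_changes and the convention that each key is an ElementTree path relative to the root element are assumptions made for this sketch, not part of XMLForm.

```python
import xml.etree.ElementTree as ET


def apply_changes(xml_text, changes):
    """Return xml_text with the text of each element named in `changes` replaced.

    `changes` maps an ElementTree path (relative to the root element)
    to the new text content, e.g. {"qty": "3"}.
    """
    root = ET.fromstring(xml_text)
    for path, value in changes.items():
        element = root.find(path)
        if element is None:
            raise KeyError("no element matches %r" % path)
        element.text = value
    return ET.tostring(root, encoding="unicode")
```

For example, apply_changes("<order><name>Ann</name><qty>1</qty></order>", {"qty": "3"}) returns the same document with the qty element set to 3.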

Embedding an external XML file in HTML and using an XSL file to parse it?

I have an XML file ("exchange.xml") and an XSL file ("exchange.xsl") for parsing it. exchange.xml has the following line:
<?xml-stylesheet type="text/xsl" href="exchange.xsl"?>
This indicates that exchange.xsl should be used to parse the file.
It parses as expected if I load exchange.xml in a browser. However, I want to have exchange.xml embedded in an HTML file which otherwise does not have any XML in it. How do I do this? As far as I understand it, when you embed XML in HTML with the XML tag's SRC attribute, you then have to parse it yourself for it to be displayed. I want it to be displayed automatically using the XSL file I've already created for it.
It has also come to my attention that some browsers (e.g. Android default browser) won't parse the XML with the XSL file, and require a server-side transform. Is it possible to embed my XML into an HTML file but use a server-side transform?
As far as I am aware, HTML 4 as specified by the W3C does not provide any way to embed XML markup in an HTML 4 document. Of the browsers I know, only IE has an extension to HTML 4 that allows it, the so-called XML data islands (http://msdn.microsoft.com/en-us/library/ms766512%28v=VS.85%29.aspx), where IE's HTML parser recognizes an xml element that can contain XML markup as its content or link to it with the src attribute. So unless you want to use something IE-specific like XML data islands, the cross-browser way to load XML as data inside an HTML document is to use XMLHttpRequest with JavaScript.
You can use PHP's XSL extension (the XSLTProcessor class) to transform the XML on the server and include the result in your page via PHP.
E.g. showxml.php:
<?php
// Transform $file with the stylesheet $styles and return the result as a string.
function processXML($file, $styles) {
    ... // example here
}
?>
<html>
...
<div id="xml">
<?php echo processXML('foo.xml', 'bar.xsl'); ?>
</div>
...
</html>

how to convert the webpage to xml document using java?

Assume the web page is coded with correct tags. How can I convert it to an XML file? I think most web pages can be viewed as a DOM tree, so how can I write that tree out as XML?
JTidy reads HTML and presents it as a DOM. Once you have your HTML as a DOM you should be able to process it and write it out as XML.
To output a DOM, see the example code here and the XMLSerializer in particular.
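
The same two steps (parse the HTML into a tree, then serialize the tree as XML) can be sketched with Python's standard library. Note that this swaps JTidy for Python's html.parser purely for illustration; it is not JTidy's API, and it does far less cleanup than JTidy does. The class name HtmlTreeBuilder and the function html_to_xml are hypothetical names for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never have a closing tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class HtmlTreeBuilder(HTMLParser):
    """Build an ElementTree from HTML that is assumed to be reasonably well nested."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        attributes = {name: value or "" for name, value in attrs}
        if self.stack:
            element = ET.SubElement(self.stack[-1], tag, attributes)
        elif self.root is None:
            element = self.root = ET.Element(tag, attributes)
        else:
            return  # ignore markup after the root element has closed
        if tag not in VOID_ELEMENTS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, if there is one.
        for index in range(len(self.stack) - 1, -1, -1):
            if self.stack[index].tag == tag:
                del self.stack[index:]
                break

    def handle_data(self, data):
        if not self.stack:
            return
        parent = self.stack[-1]
        if len(parent):  # text after a child element goes into that child's tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html):
    """Parse an HTML string and return it serialized as XML."""
    builder = HtmlTreeBuilder()
    builder.feed(html)
    builder.close()
    if builder.root is None:
        raise ValueError("no elements found in input")
    return ET.tostring(builder.root, encoding="unicode")
```

Unclosed void elements such as br come out as empty XML elements, and character references such as &amp; are decoded on input and re-escaped on output.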

Extracting HTML tag properties and values

Is it possible to extract the properties and values in an HTML file and export them to an XML file?
Sure.
An HTML file is (more or less) an XML file.
You can use XSLT (http://www.w3.org/TR/xslt) for transforming HTML/XML to another HTML/XML.
Or you can use any other language like PHP with xmldom.
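
As a small illustration of the "any other language" route, this Python sketch walks an HTML document and writes every attribute name and value it finds into a new XML document. The output vocabulary (elements, element, attribute) and the names AttributeCollector and extract_attributes are made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class AttributeCollector(HTMLParser):
    """Record the attributes of every start tag in a small XML report."""

    def __init__(self):
        super().__init__()
        self.report = ET.Element("elements")

    def handle_starttag(self, tag, attrs):
        if not attrs:
            return  # skip tags without attributes
        entry = ET.SubElement(self.report, "element", {"name": tag})
        for name, value in attrs:
            # Attributes without a value (e.g. "disabled") are stored as empty text.
            ET.SubElement(entry, "attribute", {"name": name}).text = value or ""


def extract_attributes(html):
    """Return an XML string listing the attributes found in the HTML string."""
    collector = AttributeCollector()
    collector.feed(html)
    collector.close()
    return ET.tostring(collector.report, encoding="unicode")
```

Because html.parser tolerates markup that is not well-formed XML, this works even on pages an XSLT processor would reject.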