MSXSL Error and barely any output - html

I am trying to transform some HTML files to my own XML-format via XSL.
For this purpose I use HTML Tidy to clean up the input files, then convert them to XHTML with html2xhtml, and then use an XSL script with msxsl to transform the XHTML files to my own format.
However, the last step fails without any error message (it is a semantic failure, not a technical one ;-)): my output file just contains empty tags.
I had a similar problem before and removed the xmlns attribute from the html tag, which made nearly all of the online transformers process my files correctly. MSXSL, however, now reports the following error: "Use of default namespace declaration attribute in DTD not supported".
Find the files I use here: http://pastie.org/5483087
Thank you in advance!

Well, that is the most frequently asked question about XSLT and XPath 1.0: the elements in your input XHTML document are in a namespace, and your XSLT does not take that into account. You need to change it to, e.g.:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="xhtml">
<xsl:template match="/">
<stellenausschreibung>
<hochschule><xsl:value-of select="//xhtml:div[@id='contentText']/xhtml:img/@alt" /></hochschule>
<anbieter><xsl:value-of select="//xhtml:p[@id='ad_employer']" /></anbieter>
<typ><xsl:value-of select="//xhtml:h1" /></typ>
<bewerbungsschluss><xsl:value-of select="//xhtml:span[@id='ad_bewerbungsschluss']" /></bewerbungsschluss>
<erscheinungsdatum><xsl:value-of select="//xhtml:span[@class='job_published_at']" /></erscheinungsdatum>
<inhalt><xsl:value-of select="//xhtml:p[@id='ad_job']" /></inhalt>
</stellenausschreibung>
</xsl:template>
</xsl:stylesheet>
The prefix for the XHTML namespace used in the stylesheet (xhtml in my example) can of course be freely chosen, but it is necessary to use one, because with XSLT/XPath 1.0 a path such as //p always selects p elements in no namespace.
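To make the difference visible, here is a minimal sketch that counts p elements once without and once with the prefix. The input shown in the comment and the result element names (counts, unprefixed, prefixed) are made up for illustration and are not part of the original files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumed input document:
     <html xmlns="http://www.w3.org/1999/xhtml"><body><p id="ad_job">Text</p></body></html>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xhtml">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <counts>
      <!-- Unprefixed name test: only matches p elements in no namespace -->
      <unprefixed><xsl:value-of select="count(//p)"/></unprefixed>
      <!-- Prefixed name test: matches p elements in the XHTML namespace -->
      <prefixed><xsl:value-of select="count(//xhtml:p)"/></prefixed>
    </counts>
  </xsl:template>
</xsl:stylesheet>
```

For the assumed input, an XSLT 1.0 processor should report 0 for the unprefixed path and 1 for the prefixed one, which is the same effect that produced the empty tags in the question.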

Related

XSL stylesheet keeps Firefox from recognising DTD-defined ids

I want a client-side XSL-transformed document whose elements can be targeted (jumped to) by #foo URL fragments. The problem is that as soon as I attach even the simplest XSL stylesheet, Firefox stops scrolling to the elements. Here is some simple code:
test.xml:
<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type='text/xsl' href='test.xsl'?>
<!DOCTYPE foo [<!ATTLIST bar id ID #REQUIRED>]>
<foo xmlns:html='http://www.w3.org/1999/xhtml' xml:lang='en-GB'>
<html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/>
<bar id='baz'>Baf.</bar>
</foo>
test.xsl:
<xsl:stylesheet version='1.0' xmlns:html='http://www.w3.org/1999/xhtml' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match='/'>
<xsl:copy-of select='.'/>
</xsl:template>
</xsl:stylesheet>
As soon as I uncomment the stylesheet line, /test.xml#baz does nothing. As though the transformation somehow loses some data about elements' identification.
Any ideas? Thanks.
Well, the XSLT/XPath data model does not include the DTD, so the result tree that XSLT creates is a copy of the input without it. The result tree therefore contains no definition of any ID attributes, and Firefox has no way of establishing which element #some-id refers to.
Usually if you use client-side XSLT in the browser the target format is (X)HTML or SVG or a mix of both where id attributes are known by the browser implementation without needing a DTD. If you want to transform to a result format unknown to the browser then I don't think there is a way to use DTDs for the result tree in Firefox/Mozilla. And I am not sure whether they ever implemented xml:id support so that you could use that instead of defining your own ID attributes.
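As a rough sketch of the "transform to (X)HTML" route mentioned above, the stylesheet below maps the sample vocabulary to XHTML elements and keeps the id attribute. Mapping bar to html:p and the title text are assumptions made for illustration, and whether a given browser then scrolls to the fragment has not been verified here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: maps the sample vocabulary (foo, bar) to XHTML.
     The choice of html:p for bar is an assumption made for illustration. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:html="http://www.w3.org/1999/xhtml">

  <!-- Wrap the whole result in an XHTML document -->
  <xsl:template match="/foo">
    <html:html>
      <html:head><html:title>test</html:title></html:head>
      <html:body>
        <xsl:apply-templates/>
      </html:body>
    </html:html>
  </xsl:template>

  <!-- Elements already in the XHTML namespace are copied unchanged -->
  <xsl:template match="html:*">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- bar becomes an XHTML paragraph that keeps its id attribute -->
  <xsl:template match="bar">
    <html:p id="{@id}">
      <xsl:apply-templates/>
    </html:p>
  </xsl:template>
</xsl:stylesheet>
```

Because the id now sits on an element in the XHTML namespace, the browser does not need a DTD to treat it as an ID.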
Martin Honnen's mention of XHTML led me to experiment, and I found out that putting the target element in the XHTML namespace, xmlns='http://www.w3.org/1999/xhtml', does the trick. It doesn't seem very clean, but it doesn't seem as grave as, for instance, setting the whole doctype to XHTML's. So test.xml is now:
<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type='text/xsl' href='test.xsl'?>
<foo xmlns:html='http://www.w3.org/1999/xhtml' xml:lang='en-GB'>
<html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/>
<html:bar id='baz'>Baf.</html:bar>
</foo>
Also relevant might be http://xmlplease.com/xhtmlxhtml I found.
Thanks, all.

Styling inline XML tags with XSLT

This is a similar question to Style inline text along with nested tags with XSLT, but I can't comment to get clarification, so I will elaborate my specific scenario here. I basically have an XML document with the following structure:
<book>
<chapter>
<para>This is some text about <place>New York</place></para>
</chapter>
</book>
I am using XSLT to output XHTML from my XML file, and I want to be able to put span tags or something around the content in the place tag in the example above. The purpose is so that I can style these segments of text with CSS. Following the example I referenced above, I added this:
<xsl:template match="book/chapter/para/place">
<span class="place">
<xsl:apply-templates/>
</span>
</xsl:template>
When I load the XML document in the browser I get the error: "Error loading stylesheet: Parsing an XSLT stylesheet failed." (the stylesheet was loading properly before I added this part)
I'm assuming I lack some basic understanding of how xsl:apply-templates should be used. I would appreciate it if someone could point me in the direction of figuring this out.
Thanks!
The match:
<xsl:template match="book/chapter/para/">
applies templates to all children of the place element, rather than place itself.
Use select within apply-templates instead:
<xsl:template match="/">
<xsl:apply-templates select="book/chapter/para/place"/>
</xsl:template>
In the absence of a select attribute, the xsl:apply-templates instruction processes all of the children of the current node, including text nodes.
A select attribute can be used to process nodes selected by an expression instead of processing all children. The value of the select attribute is an expression. The expression must evaluate to a node-set.
References
XSLT 1.0 Specification
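The default behaviour quoted above (xsl:apply-templates without select processes all children, text nodes included) can be sketched with the book example from the question. This is a minimal illustration, not the asker's actual stylesheet; the wrapping div and the strip-space declaration are assumptions added to keep the output tidy.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no"/>
  <!-- Drop whitespace-only text nodes between the structural elements -->
  <xsl:strip-space elements="book chapter"/>

  <xsl:template match="/book">
    <div class="book">
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <!-- No select attribute: every child of para is processed,
       here the text node and the place element -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- The inline element is wrapped in a span that CSS can style -->
  <xsl:template match="place">
    <span class="place"><xsl:apply-templates/></span>
  </xsl:template>
</xsl:stylesheet>
```

With the sample document, the para template copies the text through the built-in text rule and hands the place child to its own template, so the result should contain a p holding the sentence with New York wrapped in span class="place".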

How to include HTML entities into an XML file

In firefox :
<?xml version="1.0" encoding="utf-8"?>
<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
<mi>&rho;</mi>
</math>
results in "undefined entity" error.
I know there is something missing there. I just don't know what I should write to correct the problem. I would like to avoid rewriting every single unicode character into the document.
EDIT I tried the following, still not working, same error :
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE math [
<!ENTITY % HTMLlat1 PUBLIC
"-//W3C//ENTITIES Latin 1 for XHTML//EN"
"xhtml-lat1.ent">
%HTMLlat1;
<!ENTITY % HTMLsymbol PUBLIC
"-//W3C//ENTITIES Symbols for XHTML//EN"
"xhtml-symbol.ent">
%HTMLsymbol;
<!ENTITY % HTMLspecial PUBLIC
"-//W3C//ENTITIES Special for XHTML//EN"
"xhtml-special.ent">
%HTMLspecial;
]>
<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
<mi>&rho;</mi>
</math>
EDIT In chrome, this results in the following message :
error on line 6 at column 13: PEReference: %HTMLlat1; not found
warning on line 10 at column 15: PEReference: %HTMLsymbol; not found
warning on line 14 at column 16: PEReference: %HTMLspecial; not found
EDIT Tried to download the .ent files and change the reference to either a local http:// path or file:/// path with no success.
A similar post about the subject : XML catalog in PHP
EDIT Quick workaround for browsers :
<!DOCTYPE html>
<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
<mi>&rho;</mi>
</math>
You need to suppress the XML header, so it is understood as HTML.
Nevertheless, this doesn't answer the question, as the question was to import entities, while the document is declared as XML.
ANSWER
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN" "http://www.w3.org/Math/DTD/mathml2/mathml2.dtd">
<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
<mi>&rho;</mi>
</math>
Note the definitions in XHTML1 and MathML2 are now obsolete and not aligned with the definitions that are built in to HTML parsers in current browsers. The current definitions as used in MathML3 and HTML5 are defined here
http://www.w3.org/2003/entities/2007doc/Overview.html
which is the editors (my:-) draft, with a link at the top to the REC version.
A single file set of DTD declarations for the entities is
http://www.w3.org/2003/entities/2007/htmlmathml-f.ent
Generally speaking, it is better to use numeric references rather than named entities in an XML context, as browsers will not fetch the externally referenced DTD.
Browsers following the HTML(5) spec will use a built-in set of definitions derived from the above spec if you refer to the XHTML or MathML2 DTD via the public identifiers (i.e. they do not use the entity definitions that you specify).
See the related bug against the HTML spec:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=13409
Add the MathML 2.0 doctype, after the XML declaration:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE math
PUBLIC "-//W3C//DTD MathML 2.0//EN"
"http://www.w3.org/Math/DTD/mathml2/mathml2.dtd"
>
The reason is that handling of entity references is very kludgy in web browsers. They do not actually read DTDs. Instead, they have built-in tables of predefined entities, which can be turned on by using specific doctype strings. This is string magic, and e.g. using MathML 3.0 doctype will not work. Cf. to XML to XHTML using XSLT: using entities such as &Sum; (which is a MATHML entity) (especially Martin Honnen’s comment on an answer).
Alternatively, use the characters themselves or, if your authoring system cannot produce them conveniently, character references like &#x3C1;.
If you can modify the XML to include an inline DTD, you can define the entities there:
<!DOCTYPE yourRootElement [
<!ENTITY bull "•">
.... ]>
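Applied to the MathML document from the question, the inline DTD approach might look like the sketch below. Only the one entity that the sample uses is declared; the character reference &#x3C1; is the code point of the Greek small letter rho.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch only: declares just the entity used in the sample document -->
<!DOCTYPE math [
  <!ENTITY rho "&#x3C1;">
]>
<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
  <mi>&rho;</mi>
</math>
```

Since the declaration lives in the internal subset, the parser does not have to fetch anything external to resolve the reference.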

Docbook - suppressing TOC

I am converting a DocBook document to HTML using the 1.77 XSL transformation, but the result automatically includes a generated table of contents. How do I change this behavior?
I have found this: Disable table of contents for documents
So I am guessing that html xsl transform would be the presentation system?
Elaborate DocBook formatting is meant to be customized using an xsl stylesheet.
See Also
Writing a DocBook Customization Layer for Formatting
Customizing Table Of Contents Using XSL
customize_formatting.xsl: Example DocBook XSL Customization Layer
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format"
version="1.0"> <!-- change this to 2.0 if your tools support it -->
<xsl:import href="http://docbook.sourceforge.net/release/xsl/current/fo/docbook.xsl"/>
<!--uncomment this to suppress toc when using XSL 1.0 or 2.0
<xsl:param name="generate.toc">article</xsl:param>
<xsl:param name="generate.toc">book</xsl:param>
-->
<!--uncomment this to suppress toc when using XSL 2.0
<xsl:param name="generate.toc">
article nop
book nop
</xsl:param>
-->
</xsl:stylesheet>
How to Use customize_formatting.xsl
Point your tools to use customize_formatting.xsl instead of the off-the-shelf docbook.xsl. Then, put all your formatting customizations in the body of the <xsl:stylesheet> section.
For TOC suppression, you can just uncomment the appropriate line.
There is a quirk with some (or maybe all) XSL 1.0 tools that seem to prevent them from handling the whitespace-separated pairs used in the body of <xsl:param name="generate.toc">. I have had success suppressing TOC by just using the single word article or book instead of the proper whitespace separated pairs.
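Since the question targets HTML output, a customization layer along the same lines might import the HTML stylesheet instead of the FO one. This is a sketch under assumptions: the html/docbook.xsl path is assumed to sit next to the fo/docbook.xsl path used above, and given the quirk just described, the paired form may need to be replaced by the single-word form with some XSL 1.0 tools.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumed location of the stock HTML stylesheet; a local path works as well -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"/>
  <!-- Each line pairs an element name with "nop", meaning no TOC for that element -->
  <xsl:param name="generate.toc">
article nop
book    nop
  </xsl:param>
</xsl:stylesheet>
```

Because the parameter is set in the importing stylesheet, it takes precedence over the default defined in the imported DocBook stylesheet.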
When transforming, you can use the Transformer#setParameter(String, Object) method to specify no TOC generation like this:
transformer.setParameter("generate.toc", "nop");

<xsl:value-of select="document(content)//title"/> returns empty node

I'm trying to get the title of a simple HTML document to build a sitemap, but I always get an empty value. I debugged this and found out that document(content) returns document nodes. It looks like this (screenshot): http://www.freeimagehosting.net/uploads/f7caf412dc.png But I could not access document(content)/html or anything like it. Please help!
Some more code would help, but in such situations the first one to blame is namespace. I can see that your nodes are in the XHTML namespace, but you do not use any namespace prefix in your XPath.
You have to declare namespace prefix in your stylesheet like this:
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:h="http://www.w3.org/1999/xhtml"
>
And then use this prefix in your XPath like this:
document(content)/h:html
If your xml elements are in a namespace, even if it is the default namespace for the document, you must use namespace prefixes in any XPath expressions and template match rules. It is the namespace uri and not the prefix that matters. Note that attributes will not be in the default namespace, they only have a namespace if their name has a prefix.
Additionally, an XPath expression containing // is usually less efficient than one that does not.
<xsl:stylesheet version="1.0"
xmlns:h="http://www.w3.org/1999/xhtml"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- and elsewhere in your stylesheet -->
<xsl:value-of select="document(content)/h:html/h:head/h:title"/>
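The remark about attributes can be sketched as follows: element names need the prefix, while an unprefixed attribute such as href is in no namespace and is addressed without one. The input in the comment is hypothetical and only serves the illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumed input: an XHTML page such as
     <html xmlns="http://www.w3.org/1999/xhtml"><body><a href="a.html">A</a></body></html>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="/h:html/h:body//h:a">
      <!-- Elements carry the h: prefix; the href attribute is in no
           namespace, so it is selected without a prefix -->
      <xsl:value-of select="@href"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Writing @h:href instead would select nothing for this input, because the attribute is not in the XHTML namespace.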