How to keep XSLT from introducing whitespaces in HTML output - html

I am generating HTML from XML sources using XSLT. The HTML shows a lot of whitespace that was not in the original XML files. Normally this is not a problem as the browser will ignore the extra whitespace characters. But I am developing an application that relies on correct positioning of the text cursor inside the HTML page. The added whitespaces do mess up the offsets, making it impossible to reliably position the cursor inside an element.
My question: how can I get my XSLT to not introduce any additional whitespaces in text nodes? I am using <xsl:strip-space elements="*"/> but that does not keep the processor from introducing lots of whitespace. It looks like some pretty-printing processing is applied to the HTML and I have no idea where this comes from. I am currently using Saxon PE 9.9.1.7
[Edit]
I created a simple example that shows the same strange behaviour. First the XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<p>This is a long sentence. Trying to reproduce a whitespace handling problem with XSLT. This manual describes the spacecraft, safety aspects, usage and maintenance procedures. Make sure the manual is available to anyone who will be using the product.</p>
</root>
Here is the simplified XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="1.0">
<xsl:output method="html" encoding="UTF-8"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="root">
<xsl:text disable-output-escaping="yes"><!DOCTYPE html>
</xsl:text>
<html>
<head>
<title>Test</title>
</head>
<body>
<xsl:apply-templates select="*"/>
<script src="cursor.js"></script>
</body>
</html>
</xsl:template>
<xsl:template match="p">
<p contenteditable="true" id="p1" onclick="show_position()">
<xsl:value-of select="."/>
</p>
</xsl:template>
</xsl:stylesheet>
The JavaScript file to show the current cursor position:
function show_position( )
{
alert('position: ' + document.getSelection().anchorOffset );
}
The HTML that is generated by the XSLT looks like this (shown in oXygen):
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Test</title>
</head>
<body>
<p contenteditable="true" id="p1" onclick="show_position()">This is a long sentence. Trying to reproduce a whitespace handling problem with XSLT.
This manual describes the spacecraft, safety aspects, usage and maintenance procedures.
Make sure the manual is available to anyone who will be using the product.</p><script src="cursor.js"></script></body>
</html>
Viewing the HTML in a browser makes all the extra whitespaces collapse into a single space, as expected. Clicking inside the paragraph shows the current offset from the start of the paragraph. Clicking immediately before 'This manual' shows position 86. Clicking one character to the right shows position 96. The same extra whitespace is introduced in the sentence starting with 'Make sure'.
I tried with Chrome and Safari - both show identical results. It does not seem to be a browser problem, but an issue with HTML generation by the XSLT processor. I have tried other Saxon versions but the resulting HTML is always the same.
Any further info on how to prevent these extra whitespace characters in my HTML output would be highly appreciated.

The default for output method="html" is indent="yes", I think, so you could certainly explicitly set indent="no" on your xsl:output declaration.
Additionally, as you say you use Saxon PE 9.9, you have access to XSLT 3 features like suppress-indentation="p" and/or Saxon PE/EE specific settings to use a very high setting for the normal line length, check the documentation for e.g. saxon:line-length or similar.

Related

Using XSLT contains function does not work but xpath expression is good

Can somebody please tell me what is wrong with my xslt syntax ?
I already confirmed that my Xpath expression bellow is good and returning the right result :
/*[local-name()='animal']/*[local-name()='birth']/*[local-name()='date']
Now, I am trying to re-use this expression in XSLT with the "contains" function in order to obtain a true or false but it doesn't work. I must me doing something wrong.
I tried this :
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output
method="html"
encoding="UTF-8"
doctype-public="-//W3C//DTD HTML 4.01//EN"
doctype-system="http://www.w3.org/TR/html4/strict.dtd"
indent="yes" ></xsl:output>
<xsl:template match="/">
<html>
<head>
<title>Test </title>
</head>
<body>
<p> Birth date 1998-08-20 (true/false) : </p>
<xsl:apply-templates select="/*[local-name()='animal']/*[local-name()='birth']"/>
</body>
</html>
</xsl:template>
<xsl:template match="/*[local-name()='birth']">
<xsl:value-of select="contains(/*[local-name()='date'], '1998-08-20')"> </xsl:value-of>
</xsl:template>
</xsl:stylesheet>
Can somebody please tell me what I am doing wrong ?
Thanks !!! :-)))
Without seeing your XML, we can only guess. I would say that if this:
/*[local-name()='animal']/*[local-name()='birth']/*[local-name()='date']
works, then your root element's name must be animal. Therefore this:
/*[local-name()='birth']
cannot work, because the root element's name is not birth (a well-formed XML has one root element only).
--
P.S. You should never have to use a hack like *[local-name()='xyz']. Learn how to handle namespaces by using prefixes.
You have too many slashes. In XPath when you start an expression with / it "jumps up" to the top of the tree, to the root node.
Your template match expression should just match the birth element (without a leading slash). Inside of that template, the context node is the birth element, so your contains() expression should not have a leading slash before the date element. An XPath statement without leading slashes will be selecting children relative from the context node.
<xsl:template match="*[local-name()='birth']">
<xsl:value-of select="contains(*[local-name()='date'], '1998-08-20')"/>
</xsl:template>

XML to XSL transformation

I am new to XML. I am trying to convert the following XMl file
<?xml version="1.0"?>
<parent original_id="OI123" id="I123">
<custompanel panel="cp"></custompanel>
</parent>
into the following HTML
<html>
<body><div xmlAttribute="{"original-id":"OI123","id":"I123"}">
<p xmlAttribute={"panel":"cp"}/>
</div>
</body>
</html>
XML tag <parent> should be converted to <div> and <custompanel> should be converted to <p> tag.
I have read the XSTL documentation from W3CSchool but still I am not exactly sure how to approach the problem.Can anyone help me with it?
The custom attribute needs to be stored in xmlAttribute as JSONObject.
After a quick research of the correct syntax I came up with this.
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output
method="xml"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
indent="yes"
encoding="utf-8" />
<xsl:template match="parent">
<html>
<body>
<div xmlAttribute="{{'original-id':'{#original_id}','id':'{#id}'}}">
<xsl:apply-templates />
</div>
</body>
</html>
</xsl:template>
<xsl:template match="custompanel">
<p xmlAttribute="{{'panel':'{#panel}'}}" />
</xsl:template>
</xsl:stylesheet>
The tricky part is espacing the {} for the JSON, which we build ourselves. You need two curly braces {{ to have a literal one. Also you need to use single quotes ' inside the attributes as double quotes would be escaped to ". You can access attributes with the #foo selector, but now you need to use actual {} to make the processor recognize it should do something.
I guess that your actual file has more than one <parent>. In that case you need to have a root element around it, and you need to adjust the XSLT. Add another <xsl:template match="/"> and move the HTML frame there.

Display CDATA with XSLT

I'm working with a bowling website and a software program written for management of our bowling league scores. The software also allows us to write an article on the last bowling league day. Now I'm copying the article into HTML module on the website but since it's also implemented in the same XML output file as the scores I'd like to take it out as well with XSLT. I only don't know what are the commands for this... So I have no XSLT file I'm working with. If I look it up here or on google , there seem to be tons of ways to do it but every time there's a bunch of other code as well and I'm still a beginner...
Can anybody help me?
<Infos>
<Info>
<Title><![CDATA[THIS IS THE TITLE OF THE ARTICLE]]></Title>
<Text><![CDATA[THIS THE ARTICLE]]></Text>
</Info>
</Infos>
The CDATA piece contains text, I want it to display after applying
XSLT and determine the text outcome (bold, Italic and underlined) with
XSLT
The following stylesheet:
XSLT 1.0
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" encoding="UTF-8" indent="yes"/>
<xsl:template match="/Infos">
<html>
<body>
<xsl:apply-templates select="Info"/>
</body>
</html>
</xsl:template>
<xsl:template match="Info">
<h2><xsl:value-of select="Title" /></h2>
<i><xsl:value-of select="Text" /></i>
</xsl:template>
</xsl:stylesheet>
when applied to your example input, will produce the following result:
<html>
<body>
<h2>THIS IS THE TITLE OF THE ARTICLE</h2>
<i>THIS THE ARTICLE</i>
</body>
</html>
rendered as:

XSLT transform removes HTML elements from mixed-content

Is it possible for XSLT preserve anchors and other embedded HTML tags within XML?
Background: I am trying to convert an HTML document into XML with an XSL stylesheet using XSLT. The original HTML document had content interspersed with anchor tags (e.g. Some hyperlinks here and there). I've copied that content into my XML, but the XSLT output lacks anchor tags.
Example XML:
<?xml version="1.0" ?>
<observations>
<observation>Hyperlinks disappear.</observation>
</observations>
Example XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/html">
<xsl:output method="html" indent="yes" encoding="UTF-8"/>
<xsl:template match="/observations">
<html>
<body>
<xsl:value-of select="observation"/>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Output:
<html xmlns="http://www.w3.org/1999/html">
<body>Hyperlinks disappear.</body>
</html>
I've read a few similar articles on stackoverflow and checked out the Identity transform page on wikipedia; I started to get some interesting results using xsl:copy-of, but I don't understand enough about XSLT to get all of the words and tags embedded within each XML element to appear in the resulting HTML. Any help would be appreciated.
Write a separate template to match a elements, copy their attributes and content.
What is wrong with your approach? In your code,
<xsl:value-of select="observation"/>
simply sends to the output the string value of the observation element. Its string value is the concatenation of all text nodes it contains. But you need not only the text nodes in it, but also the a elements themselves.
The default behaviour of an XSLT processor is to "skip" element nodes, because of a built-in template. So, if you do not mention a in a template match, it is simply ignored and only its text content is output.
Stylesheet
Note: This stylesheet still relies on the default behaviour of the XSLT processor to some extent. The order of events will resemble the following:
The template where match="/observations" is matched. It adds html
and body to the output. Then, a template rule must be found for the
content of observations. A built-in template matches observation,
does nothing with it, and looks for a template to process its content.
For the a element, the corresponding template is matched, with
copies the element and attributes. Finally, a built-in template copies
the text nodes inside observation and a.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes" encoding="UTF-8"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/observations">
<html>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<xsl:template match="a">
<xsl:copy>
<xsl:copy-of select="#*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
XML Output
<html>
<body>Hyperlinks disappear.
</body>
</html>

How to get XSL to ignore xmlns short name

I don't know too much about XSL but have managed to format XML coming from 3rd party web services using XSL without too much trouble. But the other day, a site that used to work stopped working. I discovered that they made a tiny change to the XML returned by the web service. This is what used to work (greatly simplified):
Update: I see the problem now, but I don't have a solution. The problem is with xsl:if test="#xsi:type='r0:CreditTx'". Change every "r0" to "s0" in the XSL, and it does not work.
I have replaced my original code with a working example:
XML:
<?xml version="1.0" encoding="unicode"?>
<MyResp xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:r0="http://www.foo.com/2.1/schema"
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<r0:creditVendReceipt receiptNo="1234567890">
<r0:transactions>
<r0:tx xsi:type="r0:CreditTx">
<r0:amt value="100" />
</r0:tx>
</r0:transactions>
</r0:creditVendReceipt>
</MyResp>
XSL:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:r0="http://www.foo.com/2.1/schema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsl:template match="/">
<html>
<head>
</head>
<body >
<xsl:for-each select="MyResp/r0:creditVendReceipt/r0:transactions/r0:tx">
<xsl:if test="#xsi:type='r0:CreditTx'">
<xsl:value-of select="r0:amt/#value"/>
</xsl:if>
</xsl:for-each>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Desired HTML:
<html xmlns:r0="http://www.foo.com/2.1/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<head>
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
100
</body>
</html>
The problem came when the web service changed the xmlns' short name "a" to "a0" (it now sends xmlns:a0="http://mysite.com/webservice/1.0/schema"); the namespace and everything else is the same. I have to change "a" to "a0" in the XSL for it to work (i.e. "GetInfoResp/a0:userName"). The problem is that the short name sent by the service changes from time to time. (In the real app there are a lot of name spaces, and the short names are even changing between the various requests.)
I thought the short name was just to make the XML shorter and easier to read, and that the actual name isn't significant (betwen the XML and the XSL; within the XSL obviously it has to match).
Can I get the XSL to ignore the short name in the XML, and just use its own short name?
Sorry if this was answered before; I looked thru the other questions and didn't see this specific issue.
The "short name" is called a namespace prefix -- and you don't have to change the namespace prefix in the transformation -- in fact it may be completely different from any prefix that could be used in the XML document.
This transformation:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xYz="http://mysite.com/webservice/1.0/schema"
exclude-result-prefixes="xYz">
<xsl:template match="/">
<html>
<body >
<xsl:value-of select="GetInfoResp/xYz:userName"/>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
produces exactly the same result as this transformation:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:a0="http://mysite.com/webservice/1.0/schema"
exclude-result-prefixes="a0">
<xsl:template match="/">
<html>
<body >
<xsl:value-of select="GetInfoResp/a0:userName"/>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Both transformations, when applied on this XML document (what is provided in the question is severely malformed and had to be corrected):
<GetInfoResp xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:a0="http://mysite.com/webservice/1.0/schema">
<a0:userName>Joe</a0:userName>
</GetInfoResp>
produce the same result:
<html>
<body>Joe</body>
</html>
Lesson to learn:
What matters is the namespace, not the prefix used to shorthand it.