How to store superscript in XML attribute and read using XSL? - html

I have a requirement where I need to create an XML document dynamically. Some of the attributes of the nodes in this XML contain superscripts such as the Reg mark (®). My question is how I should store such superscript characters in XML and then read them using XSL to render them as HTML. A sample XML is shown below:
<?xml version="1.0" encoding="utf-8"?>
<node name="Some text <sup>®</sup>"/>
I know this cannot be stored under a sup tag inside an attribute, as it breaks the XML. I also tried using &lt;sup&gt; and &lt;/sup&gt; in place of the opening and closing tags. But then they are rendered as the literal text <sup> in the HTML instead of actually making the character superscript.
Please let me know the solution for this problem. I have control over the generation of the XML. I can write it the correct way, if I know what the right way to store superscripts is.

Since you're using XSL to transform the input into HTML, I would suggest using a different method to encode the fact that some things need to be superscripts. Make up your own simple markup, for example
<node name="Some text [[®]]"/>
The markup can be anything that you can uniquely identify later and doesn't occur naturally in your data. Then in your XSL process the attribute values that can contain this markup with a custom template that converts the special markup to <sup> and </sup>. This allows you to keep the document structure (i.e. not move these string values to text nodes) and still achieve your goal.
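A minimal sketch of such a custom template in XSLT 1.0, assuming the [[ and ]] delimiters from the example above. The template name render-sup and the surrounding <p> element are made up for illustration and are not part of the original answer.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Hypothetical helper: writes $text and turns every [[...]] pair into <sup>...</sup> -->
  <xsl:template name="render-sup">
    <xsl:param name="text"/>
    <xsl:choose>
      <!-- Only act when both an opening and a matching closing delimiter are present -->
      <xsl:when test="contains($text, '[[') and contains(substring-after($text, '[['), ']]')">
        <xsl:variable name="rest" select="substring-after($text, '[[')"/>
        <!-- Plain text before the markup -->
        <xsl:value-of select="substring-before($text, '[[')"/>
        <!-- The marked-up part becomes a real sup element -->
        <sup><xsl:value-of select="substring-before($rest, ']]')"/></sup>
        <!-- Recurse on whatever follows the closing delimiter -->
        <xsl:call-template name="render-sup">
          <xsl:with-param name="text" select="substring-after($rest, ']]')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Assumed usage: render each node's name attribute inside a paragraph -->
  <xsl:template match="node">
    <p>
      <xsl:call-template name="render-sup">
        <xsl:with-param name="text" select="@name"/>
      </xsl:call-template>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

An unmatched [[ is written out unchanged, so malformed input degrades to plain text rather than producing an unclosed element.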

Please let me know the solution for this problem. I have control over
generation of XML. I can write it the correct way, If I know what is
the right way to store superscripts.
Because attributes can only contain values (no nodes), the solution is to store markup (nodes) inside elements:
<node>
<name>Some text <sup>®</sup></name>
</node>
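With the markup stored in an element, the stylesheet no longer has to parse anything. A minimal sketch, assuming the <node>/<name> structure shown above and a <p> wrapper chosen only for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="node">
    <p>
      <!-- Copies the text and the sup element inside name to the output as they are -->
      <xsl:copy-of select="name/node()"/>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

xsl:copy-of copies every child of name verbatim, so this is only appropriate if the elements allowed inside name are ones you are happy to see in the HTML.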

If it's only single characters like ® that need to be made superscript, then you can leave the XML free of workarounds like <sup>, i.e. like
<node name="Some text ®"/>
and look for the to-be-superscripted characters during processing. A template like this might help:
<xsl:template match="node/@name">
<xsl:param name="nameString" select="string()"/>
<!-- We're stepping through the string character by character -->
<xsl:variable name="firstChar" select="substring($nameString,1,1)"/>
<!-- Stop once the string is used up. Without this guard, contains()
would also succeed for the empty string and emit an empty sup element. -->
<xsl:if test="$firstChar!=''">
<xsl:choose>
<!-- '®' can be extended to be a longer string of single characters
that are meant to be turned into superscript -->
<xsl:when test="contains('®',$firstChar)">
<sup><xsl:value-of select="$firstChar"/></sup>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$firstChar"/>
</xsl:otherwise>
</xsl:choose>
<!-- Chop off the first character and recurse on the rest of the string. -->
<xsl:apply-templates select=".">
<xsl:with-param name="nameString" select="substring($nameString,2)"/>
</xsl:apply-templates>
</xsl:if>
</xsl:template>
However, this approach is not very efficient, especially if you have many name attributes or very long ones. If your application is performance-critical, test whether the impact on processing time is acceptable.

Related

Value in CDATA tag not being displayed in XSL file?

I want to replace the & symbol inside of a piece of text that is generated dynamically with its encoded value %26 to prevent it from breaking the URL string. I am storing the text inside hrefvalue variable. My goal is to replace & with %26 to be output in the final HTML code in the browser.
For example:
"listening & comprehension" should become "listening %26 comprehension"
I am using <![CDATA[ to preserve %26 but this seems not to be working. I still end up with "listening & comprehension" in the browser. Why?
<xsl:variable name="hrefvalue" select="./node()" />
<xsl:choose>
<xsl:when test="contains($hrefvalue, '&amp;')">
<xsl:variable name="string-before" select="substring-before($hrefvalue, '&amp;')" />
<xsl:variable name="string-after" select="substring-after($hrefvalue, '&amp;')" />
<xsl:variable name="ampersand"><![CDATA[%26]]></xsl:variable>
<xsl:value-of select="concat($string-before, $ampersand, $string-after)" disable-output-escaping="yes" />
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$hrefvalue" disable-output-escaping="no" />
</xsl:otherwise>
</xsl:choose>
I am working on a system that uses XSL 1.0
Firstly, the CDATA is completely unnecessary and irrelevant. The only purpose of CDATA is to allow & and < to be written without escaping, and if those special characters are not present, the CDATA tag makes no difference.
Also, if you're generating an attribute (which seems likely if it's a URL) then disable-output-escaping has no effect. But again, it's probably not needed.
Your code only deals with one ampersand in a string.
But your code looks fine. If there's a problem, it's in the part of the code that you haven't shown us. Try to construct a complete reproducible example: a complete stylesheet and source document that we can actually run to see if we can reproduce the problem.
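To handle more than one ampersand in XSLT 1.0, the usual approach is a recursive named template. The following is a sketch under assumptions: the template name encode-amp, the link element, and the search?q= URL are hypothetical and only there to make the example complete.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Hypothetical helper: replaces every & in $text with %26 -->
  <xsl:template name="encode-amp">
    <xsl:param name="text"/>
    <xsl:choose>
      <xsl:when test="contains($text, '&amp;')">
        <xsl:value-of select="substring-before($text, '&amp;')"/>
        <xsl:text>%26</xsl:text>
        <!-- Recurse on the remainder so later ampersands are replaced as well -->
        <xsl:call-template name="encode-amp">
          <xsl:with-param name="text" select="substring-after($text, '&amp;')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Assumed usage: a link element whose text becomes a query parameter -->
  <xsl:template match="link">
    <xsl:variable name="encoded">
      <xsl:call-template name="encode-amp">
        <xsl:with-param name="text" select="string(.)"/>
      </xsl:call-template>
    </xsl:variable>
    <a href="search?q={$encoded}"><xsl:value-of select="."/></a>
  </xsl:template>
</xsl:stylesheet>
```

Neither CDATA nor disable-output-escaping is involved, because %26 contains no character that XML would need to escape.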

XSLT How to select only the value of an element with children

I have this code and can't edit it:
<elenco-treni>
<treno id='1'> Moderno
<percorso>Terni - Spoleto</percorso>
<tipo genere='locale'> aaa
<fermata>Narni</fermata>
<fermata informale='s'>Giano</fermata>
</tipo>
</treno>
<treno id='5' codice='G140'> Jazz
<percorso>Orte - Terontola</percorso>
<tipo genere='regionale'>
<fermata>Terni</fermata>
<fermata>Spoleto</fermata>
<fermata>Foligno</fermata>
<fermata>Assisi</fermata>
<fermata>Perugia</fermata>
</tipo>
</treno>
</elenco-treni>
and I got some problems:
When I select "elenco-treni", nothing works
<xsl:for-each select="elenco-treni">
<xsl:value-of select="treno"/>
gives me blank result.
I can't get the value of tipo which is "aaa"
<xsl:for-each select="treno">
<xsl:value-of select="tipo"/>
gives me the text of all of tipo's children as well as its own value.
This is badly designed XML, in that it is using mixed content (elements that have both text nodes and other elements as children) in a way that mixed content wasn't designed to be used. Constructs like xsl:value-of work well if mixed content is used properly, but they don't work well on this kind of input.
When you're dealing with badly designed XML, the best thing is to start by transforming it to something cleaner. You could do this here with a transformation that wraps the text nodes in an element:
<xsl:template match="treno/text()[normalize-space(.)]">
<veicolo><xsl:value-of select="normalize-space(.)"/></veicolo>
</xsl:template>
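This wraps only the text nodes that contain non-whitespace characters. It is meant to be combined with an identity template that copies everything else unchanged.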
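If a separate clean-up pass is not an option, the same idea of addressing the text() children directly can also be used to read the values in place. This is a sketch and not part of the original answer; it assumes the wanted text is always the first text node of treno and of tipo, as in the sample document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/elenco-treni">
    <xsl:for-each select="treno">
      <!-- The element's own text, without the text of its child elements -->
      <xsl:value-of select="normalize-space(text()[1])"/>
      <xsl:text>: </xsl:text>
      <!-- Same for tipo; empty when tipo has no text of its own -->
      <xsl:value-of select="normalize-space(tipo/text()[1])"/>
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the first treno this yields Moderno and aaa. For the second, the tipo part is empty because its first text node is only whitespace.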

XSL - Whitespace issue when setting dynamically placeholder

I have an XSLT template in which I am loading translation content from XML files.
I want to set the placeholder of an input field dynamically, but I keep getting whitespace (the placeholder is moved to the right).
Here is my code.
<xsl:attribute name="placeholder">
<xsl:value-of select="/paygate/language/computop.creditcard.number.message"/>
</xsl:attribute>
I tried removing the whitespace between the lines, also setting
<xsl:strip-space elements="*"/>
in the beginning of the file. Nothing worked :(
An XSLT processor ought to strip whitespace-only text nodes that are direct children of an <xsl:attribute> by default. If the transform you present is producing placeholder attributes with unwanted leading or trailing whitespace in their values, then I conclude that the whitespace comes from the result of the <xsl:value-of> element, which is not subject to whitespace stripping.
In that case, you could consider applying the standard normalize-space() XPath function to the attribute value:
<xsl:attribute name="placeholder">
<xsl:value-of select="normalize-space(string(/paygate/language/computop.creditcard.number.message))"/>
</xsl:attribute>
normalize-space() will delete both leading and trailing whitespace from its (string) argument, but will also replace each internal run of whitespace characters with a single space character.

Is there a way to detect numeric string in xslt?

I am doing an HTML to XML transformation with XSLT, which is pretty straightforward. But I have one slight problem left unsolved.
For example, in my source html, a node looks like:
<p class="Arrow"><span class="char-style-override-12">4</span><span class="char-style-override-13"> </span>Sore, rash, growth, discharge, or swelling.</p>
As you can see, the first child node <span> has a value of 4, but it is actually rendered as an arrow point in the browser (maybe an encoding issue; it is treated as a numeric value in my XML editor).
So I wrote a template to match the tag and then pass its text content to another template:
<xsl:template match="text()">
<xsl:variable name="noNum">
<xsl:value-of select="normalize-space(translate(.,'4',''))"/>
</xsl:variable>
<xsl:copy-of select="$noNum"/>
</xsl:template>
As you can see, this is definitely not a good solution, it will replace all the numbers appearing in the string, not only the first character. So I wonder if there is a way to remove only the first character IF it is a number, maybe using regular expression? Or, I am actually going the wrong way, should there be a better way to think of solving this problem(e.g, changing the encoding)?
Any idea is welcomed! Thanks in advance!
Just use this :
<xsl:variable name="test">4y4145</xsl:variable>
<xsl:if test= "not(string(number(substring($test,1,1)))='NaN')">
<xsl:message terminate="no">
<xsl:value-of select="substring($test,2)"/>
</xsl:message>
</xsl:if>
This is an XSLT 1.0 solution. I think regex is overkill for this.
Output :
[xslt] y4145
Use this single XPath expression:
concat(translate(substring(.,1,1), '0123456789', ''),
substring(.,2)
)

Is it possible to have HTML text or CDATA inside an XML attribute?

I keep getting "XML parser failure: Unterminated attribute" with my parser when I attempt to put HTML text or CDATA inside my XML attribute. Is there a way to do this or is this not allowed by the standard?
No. The markup denoting a CDATA Section is not permitted as the value of an attribute.
According to the specification, this prohibition is indirect rather than direct. The spec says that the Attribute value must not have an open angle bracket. Open angle brackets and ampersand must be escaped. Therefore you cannot insert a CDATA section. womp womp.
A CData Section is interpreted only when it is in a text node of an element.
Attributes can only have plain text inside, no tags, comments, or other structured data. You need to escape any special characters by using character entities. For example:
<code text="&lt;a href=&quot;/&quot;&gt;">
That would give the text attribute the value <a href="/">. Note that this is just plain text so if you wanted to treat it as HTML you'd have to run that string through an HTML parser yourself. The XML DOM wouldn't parse the text attribute for you.
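If the consumer is an XSLT stylesheet that writes HTML, one way to get such an escaped string into the output as markup is disable-output-escaping. This is a sketch, assuming a code element with a text attribute like the one above. The <div> wrapper is made up, and processors are allowed to ignore disable-output-escaping, so it only works where the processor serializes the result itself and supports the feature.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="code">
    <div>
      <!-- Writes the attribute value without re-escaping < and &,
           so the stored string appears as markup in the serialized output -->
      <xsl:value-of select="@text" disable-output-escaping="yes"/>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

The string is never parsed by the processor, so nothing checks that it is well-formed HTML.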
CDATA is unfortunately an ambiguous thing to say here. There are "CDATA Sections", and "CDATA Attribute Type".
Your attribute value can be of type CDATA with the "CDATA Attribute Type".
Here is an xml that contains a "CDATA Section" (aka. CDSect):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<elemke>
<![CDATA[
foo
]]>
</elemke>
Here is an xml that contains a "CDATA Attribute Type" (as AttType):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE elemke [
<!ATTLIST brush wood CDATA #REQUIRED>
]>
<elemke>
<brush wood="guy&#xA;threep"/>
</elemke>
You cannot use a "CDATA Section" for an Attribute Value: wrong:<brush wood=<![CDATA[foo]]>/>
You can use a "CDATA Attribute Type" for your attribute's type. I think this is actually what happens in the usual case, and your attribute value is actually CDATA: for an element like <brush wood="guy&#xA;threep"/>, the raw bytes of the .xml file contain guy&#xA;threep, but when the file is processed, the attribute value in memory will be
guy
threep
Your problem may lie in 1) producing a right xml file and 2) configuring a "xml processor" to produce an output you want.
For example, if you write the raw xml file by hand, you need to put these escapes inside the attribute value in the raw file, as I wrote <brush wood="guy&#xA;threep"/> here, instead of <brush wood="guy (newline) threep"/>
Then the parse would actually give you a newline, I've tried this with a processor.
You can try it with a processor like saxon or for poor-man's experiment one like a browser, opening the xml in firefox and copying the value to a text editor - firefox displayed the newline as a space, but copying the string to a text editor showed the newline. (Probably with a better suited processor you could save the direct output right away.)
Now the "only" thing you need to do is make sure you handle this CDATA appropriately. For example, if you have an XSL stylesheet, that would produce you a html, you can use something like this .xsl for such an xml:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template name="split">
<xsl:param name="list" select="''" />
<xsl:param name="separator" select="'&#xA;'" />
<xsl:if test="not($list = '' or $separator = '')">
<xsl:variable name="head" select="substring-before(concat($list, $separator), $separator)" />
<xsl:variable name="tail" select="substring-after($list, $separator)" />
<xsl:value-of select="$head"/>
<br/><xsl:text>
</xsl:text>
<xsl:call-template name="split">
<xsl:with-param name="list" select="$tail" />
<xsl:with-param name="separator" select="$separator" />
</xsl:call-template>
</xsl:if>
</xsl:template>
<xsl:template match="brush">
<html>
<xsl:call-template name="split">
<xsl:with-param name="list" select="@wood"/>
</xsl:call-template>
</html>
</xsl:template>
</xsl:stylesheet>
In a browser, or with a processor such as Saxon (Home Edition 9.5, run as java -jar saxon9he.jar -s:eg2.xml -xsl:eg2.xsl -o:eg2.html), this produces the following HTML-like output:
<html>guy<br>
threep<br>
</html>
which will look like this in a browser:
guy
threep
Here I am using a recursive template 'split' from Tomalak, thanks to Mads Hansen, because my target processor supports neither string-join nor tokenize, which are available only in version 2.0.
If an attribute is not a tokenized or enumerated type, it is processed as CDATA. The details for how the attribute is processed can be found in the Extensible Markup Language (XML) 1.0 (Fifth Edition).
3.3.1 Attribute Types
XML attribute types are of three kinds: a string type, a set of tokenized types, and enumerated types. The string type may take any literal string as a value; the tokenized types are more constrained. The validity constraints noted in the grammar are applied after the attribute value has been normalized as described in 3.3.3 Attribute-Value Normalization.
[54] AttType ::= StringType | TokenizedType | EnumeratedType
[55] StringType ::= 'CDATA'
[56] TokenizedType ::= 'ID' [VC: ID]
[VC: One ID per Element Type]
[VC: ID Attribute Default]
| 'IDREF' [VC: IDREF]
| 'IDREFS' [VC: IDREF]
| 'ENTITY' [VC: Entity Name]
| 'ENTITIES' [VC: Entity Name]
| 'NMTOKEN' [VC: Name Token]
| 'NMTOKENS' [VC: Name Token]
...
3.3.3 Attribute-Value Normalization
Before the value of an attribute is passed to the application or checked for validity, the XML processor MUST normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.
1. All line breaks MUST have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way.
2. Begin with a normalized value consisting of the empty string.
3. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:
   - For a character reference, append the referenced character to the normalized value.
   - For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.
   - For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.
   - For another character, append the character to the normalized value.
If the attribute type is not CDATA, then the XML processor MUST further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.
Note that if the unnormalized attribute value contains a character reference to a white space character other than space (#x20), the normalized value contains the referenced character itself (#xD, #xA or #x9). This contrasts with the case where the unnormalized value contains a white space character (not a reference), which is replaced with a space character (#x20) in the normalized value and also contrasts with the case where the unnormalized value contains an entity reference whose replacement text contains a white space character; being recursively processed, the white space character is replaced with a space character (#x20) in the normalized value.
All attributes for which no declaration has been read SHOULD be treated by a non-validating processor as if declared CDATA.
It is an error if an attribute value contains a reference to an entity for which no declaration has been read.
We can't use a CDATA section as an attribute value, but we can embed HTML by escaping it with character entities.
Here is one example:
to achieve this: <span class="abc"></span>
use XML code like this:
<xmlNode attibuteName="&lt;span class=&quot;abc&quot;&gt;Your Text&lt;/span&gt;"></xmlNode>
Yes you can when you encode the content within the XML tags.
I.e. use &amp;amp; &amp;lt; &amp;gt; &amp;quot; &amp;apos;, that way it will not be seen as markup inside your markup.