HTML entity numbers in XSLT

I'm attempting to transform HTML to XML. My input HTML is obtained dynamically and contains HTML entity numbers (numeric character references), as shown below.
HTML Input:
<root>
<h1>Hello stack Over flow</h1>
<H1 align="left">The list will be managed with a  <SUB>of &#169; &#174;</H1>
</root>
My transform looks like this:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
exclude-result-prefixes="msxsl">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="root">
<xsl:copy >
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
The output from the transform writes all the HTML entity numbers as the actual special characters.
The desired output should keep the HTML entity numbers instead of the characters. How can I achieve this?

You could try putting encoding="US-ASCII" on your xsl:output directive; that way, any characters outside that encoding should be output as character references.
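As a minimal sketch of that suggestion, the stylesheet below pairs encoding="US-ASCII" with an identity template so that child elements and attributes are copied as well. The identity template is an addition for illustration and is not part of the asker's stylesheet. It assumes the processor itself serializes the result; if the output is handed to a writer that sets its own encoding, the xsl:output setting may be ignored.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- US-ASCII cannot represent characters such as the copyright sign
       directly, so the serializer has to write them as numeric
       character references instead. -->
  <xsl:output method="xml" encoding="US-ASCII" indent="yes"/>

  <!-- Identity template (an assumption, added for illustration):
       copies every node and attribute unchanged. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Whether the references come out in decimal or hexadecimal form is up to the processor.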

Related

Wrapping XML document in one-element JSON using XSLT 1.0

I'm trying to transform an XML document to be in single-line and wrap it in a one-element JSON. Using XSLT 1.0
The problem is, XSL generates double quotes in the xmlns definitions so the resulting JSON is invalid.
This is my input:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<otm:Transmission xmlns:otm='http://xmlns.oracle.com/apps/otm/transmission/v6.4' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
<otm:TransmissionHeader/>
<otm:TransmissionBody>
<otm:GLogXMLElement>
<otm:Invoice>
<otm:Payment>
<otm:PaymentHeader>
<otm:DomainName>CompanyX</otm:DomainName>
<otm:TransactionCode>EX</otm:TransactionCode>
<otm:InvoiceDate>
<otm:GLogDate>20220414000000</otm:GLogDate>
</otm:InvoiceDate>
</otm:PaymentHeader>
</otm:Payment>
</otm:Invoice>
</otm:GLogXMLElement>
</otm:TransmissionBody>
</otm:Transmission>
This is what I'm getting:
{"jsonElement":"<otm:Transmission xmlns:otm="http://xmlns.oracle.com/apps/otm/transmission/v6.4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><otm:TransmissionHeader/><otm:TransmissionBody><otm:GLogXMLElement><otm:Invoice><otm:Payment><otm:PaymentHeader><otm:DomainName>CompanyX</otm:DomainName><otm:TransactionCode>EX</otm:TransactionCode><otm:InvoiceDate><otm:GLogDate>20220414000000</otm:GLogDate></otm:InvoiceDate></otm:PaymentHeader></otm:Payment></otm:Invoice></otm:GLogXMLElement></otm:TransmissionBody></otm:Transmission>"}
The XSL that I use:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:otm='http://xmlns.oracle.com/apps/otm/transmission/v6.4'>
<xsl:output method="text" indent="no" suppress-indentation="otm:Transmission"/>
<xsl:strip-space elements="*" />
<xsl:template match="/">
{"jsonElement":"<xsl:apply-templates select="*"/>"}
</xsl:template>
<xsl:template match="@* | *">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
As you can see, the JSON is invalid due to the double quotes in the xmlns definitions.
I tried several approaches and I am not able to get rid of the double quotes. In the input, they are single quotes but XSL is generating them differently.
What would be the best approach to have a valid JSON result?
The XML in the JSON has to be a 1:1 copy of the input, but transformed into a single line, and I can only use XSLT 1.0.
An XSLT 1.0 processor is going to reject the suppress-indentation attribute, and it's going to output the text of the source document without markup. Like @MartinHonnen, I don't see how any XSLT processor can give you the output you claim to be getting.
In XSLT 3.0 you can do
<xsl:output method="json"/>
<xsl:template match="/">
<xsl:map>
<xsl:map-entry key="'jsonElement'"
select="serialize(., map{'method':'xml'})"/>
</xsl:map>
</xsl:template>

Why is there xmlns in my html output

In the html output file from an XSLT process (using saxon9he), there have been 155 occurrences of xmlns:fn="http://www.w3.org/2005/xpath-functions" inserted into a variety of tr elements
The part of xsl that uses xpath-functions is
<xsl:if test="(string(@hideIfHardwareIs)='') or (not(fn:matches(string($input_doc//inf[@id='5']), string(@hideIfHardwareIs), 'i')))">
Unless I am reading it wrong, matches takes 3 arguments: a string, another string, and then a flag, which in this case makes the match case-insensitive.
What I don't understand is that the tr elements that show up with the xmlns aren't close to the portion of the XSL where the matches() function is called.
The XSL file I am working with is 2100 lines and the XML file it parses is 12800 lines. So I don't think I can share it easily. I've inherited this and need to (at this time) maintain it.
What are some things I can look for within the XSL that would insert the xmlns into the HTML output?
Those functions do not need to be prefixed.
Remove the xmlns:fn="http://www.w3.org/2005/xpath-functions" from your xsl:stylesheet and remove the fn: prefix from the xpath functions.
Examples:
XML Input
<foo>test</foo>
XSLT 2.0 #1
<xsl:stylesheet version="2.0" xmlns:fn="http://www.w3.org/2005/xpath-functions"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/*">
<xsl:if test="fn:matches(.,'^t')">
<bar><xsl:value-of select="."/></bar>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
Output
<bar xmlns:fn="http://www.w3.org/2005/xpath-functions">test</bar>
XSLT 2.0 #2
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/*">
<xsl:if test="matches(.,'^t')">
<bar><xsl:value-of select="."/></bar>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
Output
<bar>test</bar>
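If the fn: prefix has to stay (for example, because removing it from a 2100-line stylesheet is impractical), another option is to keep the declaration and list the prefix in exclude-result-prefixes. The sketch below is a variant of example #1 above with only that attribute added; with it, the literal result element bar should be written without the xmlns:fn declaration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fn="http://www.w3.org/2005/xpath-functions"
    exclude-result-prefixes="fn">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/*">
    <!-- fn: is still usable in XPath expressions -->
    <xsl:if test="fn:matches(., '^t')">
      <!-- the fn namespace is no longer copied onto this literal result element -->
      <bar><xsl:value-of select="."/></bar>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

More generally, when an unexpected xmlns shows up in the output, it is worth checking which namespaces are declared on xsl:stylesheet (or on literal result elements) and are not listed in exclude-result-prefixes.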

How to get a value from a JavaScript in XSLT?

I have the following XML and XSLT to transform to HTML.
XML
<?xml version="1.0" encoding="UTF-8"?>
<root>
<te>t1</te>
</root>
XSLT
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html" indent="yes" />
<xsl:template match="root">
<html>
<div>
<xsl:variable name="name1" select="te" />
**
<xsl:value-of select="CtrlList['$name1']" />
**
</div>
<script language="javascript">var List={
"t1":"test"
}</script>
</html>
</xsl:template>
</xsl:stylesheet>
So my objective is to get the value of "te" from the XML, look it up in the JavaScript object "List", and return the value test while transforming with the XSLT. So I should get the value test as output.
Can anyone figure out what is wrong in the XSLT?
When you look at your XSLT, it may seem like there is JavaScript there, but all XSLT sees is that it is outputting an element named "script", with an attribute "language", which contains some text. It is also worth noting that xsl:value-of is used to get the value from the input document, but your script element is actually part of the result tree, and so is not accessible to xsl:value-of.
However, it is possible to extend XSLT so it can use JavaScript functions, but this is very much processor-dependent, and you should not think of it the same way as embedding JavaScript in HTML. Have a look at this question as an example:
How to include javaScript file in xslt
So, in your case, your XSLT would be something like this (note that this particular example will only work in Microsoft's MSXML processor):
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:user="http://mycompany.com/mynamespace"
exclude-result-prefixes="msxsl user">
<xsl:output method="xml" indent="yes" />
<msxsl:script language="JScript" implements-prefix="user">
var List={
"t1":"test"
}
function lookup(key) {
return List[key];
}
</msxsl:script>
<xsl:template match="root">
<html>
<div>
<xsl:variable name="name1" select="te"/>
<xsl:value-of select="user:lookup(string($name1))"/>
</div>
</html>
</xsl:template>
</xsl:stylesheet>
Of course, it might be worth asking why you want to use JavaScript in your XSLT. It may be possible to achieve the same result using pure XSLT, which would certainly make your XSLT more portable.
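One pure-XSLT way to do this, sketched here under some assumptions, is to embed the lookup table in the stylesheet itself and read it back with document(''). The lk namespace URI and the lk:list / lk:entry element names are hypothetical, chosen only for this illustration. It also assumes the processor can resolve the stylesheet's own URI, which may not hold when the stylesheet is loaded from a string.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:lk="urn:example:lookup"
    exclude-result-prefixes="lk">
  <xsl:output method="html" indent="yes"/>

  <!-- Hypothetical lookup table, stored as top-level data in the stylesheet -->
  <lk:list>
    <lk:entry key="t1">test</lk:entry>
  </lk:list>

  <xsl:template match="root">
    <html>
      <div>
        <xsl:variable name="name1" select="te"/>
        <!-- document('') loads the stylesheet document itself;
             pick the entry whose key equals the value of te -->
        <xsl:value-of select="document('')/*/lk:list/lk:entry[@key = $name1]"/>
      </div>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input, where te is t1, the div should contain the text test.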

xslt xml to html : how to match an element that its name starts with xxx?

This is a problem I came across. To prevent h1 elements from being duplicated, every h1 tag in the XML has a random suffix after h1. The source XML and the desired HTML are shown below:
Source XML:
<h1_JW1XRT>Hello1</h1_JW1XRT>
<h1_JXZRIE>Hello2</h1_JXZRIE>
Converted to HTML:
<h1 id="h1_JW1XRT">Hello1</h1>
<h1 id="h1_JXZRIE">Hello2</h1>
How can I write this template?
This transformation:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="*[starts-with(name(), 'h1')]">
<h1 id="{name()}"><xsl:apply-templates/></h1>
</xsl:template>
</xsl:stylesheet>
when applied on the following XML document (the provided XML fragment, wrapped in a single top element -- to become a well-formed XML document):
<t>
<h1_JW1XRT>Hello1</h1_JW1XRT>
<h1_JXZRIE>Hello2</h1_JXZRIE>
</t>
produces the wanted, correct result:
<h1 id="h1_JW1XRT">Hello1</h1>
<h1 id="h1_JXZRIE">Hello2</h1>
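Explanation: Proper use of the standard XPath function starts-with(), together with an attribute value template ({name()}) to copy the original element name into the id attribute.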
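A slightly stricter variant, offered as a sketch rather than as part of the original answer: testing for the prefix 'h1_' instead of 'h1' avoids also matching element names that merely begin with the same characters (a hypothetical h10_ABC123, for instance). It uses local-name() on the assumption that the elements carry no namespace prefix that should appear in the id.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Match only element names that begin with "h1_" -->
  <xsl:template match="*[starts-with(local-name(), 'h1_')]">
    <h1 id="{local-name()}"><xsl:apply-templates/></h1>
  </xsl:template>
</xsl:stylesheet>
```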

Issue with xslt transforming xml to output html

I am trying to do a simple transform of XML using XSLT to generate HTML, but I'm having difficulties and I can't seem to figure out what the problem is. Here is a sample of the XML I am working with:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="C:\Users\cgubata\Documents\Digital Measures\jcamp_fac_ex_xslt.xsl"?>
<Data xmlns="http://www.digitalmeasures.com/schema/data" xmlns:dmd="http://www.digitalmeasures.com/schema/data-metadata" dmd:date="2012-02-27">
<Record userId="310106" username="jcamp" termId="453" dmd:surveyId="1154523">
<dmd:IndexEntry indexKey="COLLEGE" entryKey="School of Business" text="School of Business"/>
<dmd:IndexEntry indexKey="DEPARTMENT" entryKey="Accountancy" text="Accountancy"/>
<dmd:IndexEntry indexKey="DEPARTMENT" entryKey="MBA" text="MBA"/>
<PCI id="11454808064" dmd:lastModified="2012-02-08T13:17:39">
<PREFIX>Dr.</PREFIX>
<FNAME>Julia</FNAME>
<PFNAME/>
<MNAME>M.</MNAME>
<LNAME>Camp</LNAME>
<SUFFIX/>
<ALT_NAME>Julia M. Brennan</ALT_NAME>
<ENDPOS/>
All I want to do is display the values of some of the nodes in HTML. For example, I might want the PREFIX, FNAME, and LNAME nodes to display as "Dr. Julia Camp" (no quotes - I'll do styling later). Here's the XSL that I am using:
<?xml version="1.0" encoding="utf-8"?><!-- DWXMLSource="jcamp_fac_ex.xml" -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:dmd="http://www.digitalmeasures.com/schema/data-metadata">
<xsl:output method="html" encoding="utf-8"/>
<xsl:template match="/">
<xsl:value-of select="/Data/Record/PCI/PREFIX"/>
</xsl:template>
</xsl:stylesheet>
From what I have been researching, that should show the value of the PREFIX field. Instead, it outputs all of the values from all of the nodes (so if there are 4000 nodes with a text value, I get 4000 values returned in the HTML). My goal is to pull out the values from certain nodes, and I will probably arrange them in a table.
How do I pull out the values from a specific node? Thanks in advance.
Well, I can't reproduce your symptoms. When I test what you've posted, it doesn't produce any output at all, which looks correct because your XPath is testing the wrong namespace. You need to add a namespace-prefix mapping for the http://www.digitalmeasures.com/schema/data namespace to your XSLT, and then use that prefix in the value-of XPath. Like this:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:dmd="http://www.digitalmeasures.com/schema/data-metadata"
xmlns:dm="http://www.digitalmeasures.com/schema/data">
<xsl:output method="html" encoding="utf-8"/>
<xsl:template match="/">
<xsl:value-of select="/dm:Data/dm:Record/dm:PCI/dm:PREFIX"/>
</xsl:template>
</xsl:stylesheet>
I'm afraid you've fallen into the number one XSLT trap for beginners: we see this question at least once a day on this forum. Your elements are in a namespace and your stylesheet is trying to match nodes in no namespace.
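To get from a single value to the "Dr. Julia Camp" line mentioned in the question, the same prefix has to be used on every step of every path. The following is a minimal sketch, assuming the dm prefix from the answer above; the html/body/p wrapper is an illustrative choice, not something the question specifies.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dm="http://www.digitalmeasures.com/schema/data"
    exclude-result-prefixes="dm">
  <xsl:output method="html" encoding="utf-8"/>

  <xsl:template match="/">
    <html>
      <body>
        <!-- One paragraph per PCI element; every step carries the dm: prefix -->
        <xsl:for-each select="/dm:Data/dm:Record/dm:PCI">
          <p>
            <xsl:value-of select="concat(dm:PREFIX, ' ', dm:FNAME, ' ', dm:LNAME)"/>
          </p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

The exclude-result-prefixes attribute keeps the dm namespace declaration from being copied onto the generated HTML elements.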