Wrapping XML document in one-element JSON using XSLT 1.0

I'm trying to transform an XML document into a single line and wrap it in a one-element JSON object, using XSLT 1.0.
The problem is that the XSLT output uses double quotes in the xmlns declarations, so the resulting JSON is invalid.
This is my input:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<otm:Transmission xmlns:otm='http://xmlns.oracle.com/apps/otm/transmission/v6.4' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
<otm:TransmissionHeader/>
<otm:TransmissionBody>
<otm:GLogXMLElement>
<otm:Invoice>
<otm:Payment>
<otm:PaymentHeader>
<otm:DomainName>CompanyX</otm:DomainName>
<otm:TransactionCode>EX</otm:TransactionCode>
<otm:InvoiceDate>
<otm:GLogDate>20220414000000</otm:GLogDate>
</otm:InvoiceDate>
</otm:PaymentHeader>
</otm:Payment>
</otm:Invoice>
</otm:GLogXMLElement>
</otm:TransmissionBody>
</otm:Transmission>
This is what I'm getting:
{"jsonElement":"<otm:Transmission xmlns:otm="http://xmlns.oracle.com/apps/otm/transmission/v6.4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><otm:TransmissionHeader/><otm:TransmissionBody><otm:GLogXMLElement><otm:Invoice><otm:Payment><otm:PaymentHeader><otm:DomainName>CompanyX</otm:DomainName><otm:TransactionCode>EX</otm:TransactionCode><otm:InvoiceDate><otm:GLogDate>20220414000000</otm:GLogDate></otm:InvoiceDate></otm:PaymentHeader></otm:Payment></otm:Invoice></otm:GLogXMLElement></otm:TransmissionBody></otm:Transmission>"}
The XSL that I use:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:otm='http://xmlns.oracle.com/apps/otm/transmission/v6.4'>
<xsl:output method="text" indent="no" suppress-indentation="otm:Transmission"/>
<xsl:strip-space elements="*" />
<xsl:template match="/">
{"jsonElement":"<xsl:apply-templates select="*"/>"}
</xsl:template>
<xsl:template match="@* | *">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
As you can see, the JSON is invalid due to the double quotes in the xmlns definitions.
I tried several approaches but could not get rid of the double quotes. In the input they are single quotes, but the XSLT output writes them as double quotes.
What would be the best approach to get a valid JSON result?
The XML in the JSON has to be a 1:1 copy of the input, collapsed into a single line, and I can only use XSLT 1.0.

An XSLT 1.0 processor is going to reject the suppress-indentation attribute, and it's going to output the text of the source document without markup. Like @MartinHonnen, I don't see how any XSLT processor can give you the output you claim to be getting.
In XSLT 3.0 you can do
<xsl:output method="json"/>
<xsl:template match="/">
<xsl:map>
<xsl:map-entry key="'jsonElement'"
select="serialize(., map{'method':'xml'})"/>
</xsl:map>
</xsl:template>
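To make the target concrete, here is a minimal sketch outside XSLT (Python standard library only) of what a valid result has to look like. The shortened XML string is a hypothetical stand-in for the question's document, not the full input. It suggests two ways out: keep apostrophes as attribute delimiters, or escape every double quote as \".

```python
import json

# Shortened single-line stand-in for the question's document (hypothetical sample).
xml_one_line = (
    "<otm:Transmission xmlns:otm='http://xmlns.oracle.com/apps/otm/transmission/v6.4'>"
    "<otm:TransmissionHeader/>"
    "</otm:Transmission>"
)

# Apostrophe-delimited attributes need no JSON escaping at all.
wrapped = json.dumps({"jsonElement": xml_one_line})

# Double-quoted attributes are valid too, as long as each quote is escaped as \".
xml_double = xml_one_line.replace("'", '"')
wrapped_double = json.dumps({"jsonElement": xml_double})

print(wrapped)
print(wrapped_double)
```

Either string round-trips through a JSON parser; the output shown in the question does not, because its inner double quotes are unescaped.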

Can I get value from content dictionary with xpath?

This is an example of a meta tag from which I want to get the pub_date:
<meta name="parsely-page" content='{"title":"Article title","link":"https:\/\/site.com\/category\/article","type":"post","section":"category","image_url":"","author":null,"pub_date":"2009-03-01T14:17:14+00:00","post_id":"article_6463676334","tags":[]}' />
The xpath to get the entire content would be:
//meta[@name="parsely-page"]/@content
Is it possible to get the values of dict keys using xpath?
With XPath 3.1 you can do
//meta[@name="parsely-page"]/parse-json(@content)?pub_date
Sadly, it's very likely that you are using an XPath processor that only supports XPath 1.0 in which case you won't be able to use this unless you find a different processor.
With XSLT 1.0:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vQ">"</xsl:variable>
<xsl:template match="/">
<xsl:value-of select=
'substring-before(substring-after(//meta[@name="parsely-page"]/@content,
concat($vQ, "pub_date", $vQ, ":", $vQ)), $vQ)'/>
</xsl:template>
</xsl:stylesheet>
When this transformation is performed on this XML document (your meta tag):
<meta name="parsely-page"
content='{"title":"Article title","link":"https:\/\/site.com\/category\/article","type":"post","section":"category","image_url":"","author":null,"pub_date":"2009-03-01T14:17:14+00:00","post_id":"article_6463676334","tags":[]}' />
the wanted result is produced:
2009-03-01T14:17:14+00:00
We can write a single XPath 1.0 expression that evaluates to the wanted string; however, we have to escape the nested quotes and apostrophes, which would otherwise cause errors:
substring-before(substring-after(//meta[@name="parsely-page"]/@content,
&apos;"pub_date":"&apos;),
&apos;"&apos;)
Verification using XSLT 1.0:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vQ">"</xsl:variable>
<xsl:template match="/">
<xsl:value-of select=
'substring-before(substring-after(//meta[@name="parsely-page"]/@content,
&apos;"pub_date":"&apos;),
&apos;"&apos;)'/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the same XML document (above), it evaluates the single XPath 1.0 expression and outputs the wanted, correct result:
2009-03-01T14:17:14+00:00
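If the processing is not tied to XPath, the same extraction can be sketched in a general-purpose language. The following is a minimal illustration in Python (standard library only), assuming a shortened copy of the meta tag. It shows both the substring approach used above and parsing the attribute as JSON, which is roughly what parse-json() does in XPath 3.1.

```python
import json
import xml.etree.ElementTree as ET

# Shortened copy of the meta tag from the question (some keys omitted for brevity).
meta_xml = """<meta name="parsely-page" content='{"title":"Article title","author":null,"pub_date":"2009-03-01T14:17:14+00:00","tags":[]}' />"""

meta = ET.fromstring(meta_xml)
content = meta.get("content")

# Approach 1: mirror substring-before(substring-after(...)) from the XPath 1.0 answer.
marker = '"pub_date":"'
after_marker = content.split(marker, 1)[1]
pub_date_by_substring = after_marker.split('"', 1)[0]

# Approach 2: parse the attribute value as JSON and look up the key.
pub_date_by_json = json.loads(content)["pub_date"]

print(pub_date_by_substring)
print(pub_date_by_json)
```

The substring approach depends on the exact key/value formatting (no whitespace after the colon, no escaped quotes in the value); the JSON approach does not.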

is it possible to disable-output-escaping twice in XSLT

I have XML that has encoded HTML data. I am trying to render the data but can't seem to figure out how. Best I can tell is I need to disable-output-escaping="yes" twice but not sure how to do that.
For example, this is a snippet of my XML:
<root>
<node value="&amp;lt;b&amp;gt;hi&amp;lt;/b&amp;gt;" />
</root>
My XSLT is outputting HTML. Here is the rendered output (the HTML source) with various options:
<xsl:value-of select="@value" /> outputs &amp;lt;b&amp;gt;hi&amp;lt;/b&amp;gt;
<xsl:value-of select="@value" disable-output-escaping="yes" /> outputs &lt;b&gt;hi&lt;/b&gt;
I would like it to output <b>hi</b> to the HTML source so it's actually rendered as a bolded hi. Does that make sense? Is that possible?
Escaping is the process of turning < into &lt;. If you disable escaping, it will leave < as <. What you want to achieve is to turn &lt; into <, which would normally be called "unescaping".
In the normal course of events, a parser performs unescaping, while a serializer performs escaping. So if you want to unescape characters, you need to put them through a parsing process, which means you need to take the content of the @value attribute and put it through an operation like fn:parse-xml-fragment() in XPath 3.0, or an equivalent extension function in your chosen processor.
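The "parsing performs unescaping" idea can be illustrated outside XSLT. The sketch below uses Python's standard-library XML parser and assumes the attribute is double-escaped, as the question's title suggests. The wrapper element name is an arbitrary choice for the illustration.

```python
import xml.etree.ElementTree as ET

# Assumed input: the markup inside the attribute is escaped twice.
source = '<root><node value="&amp;lt;b&amp;gt;hi&amp;lt;/b&amp;gt;" /></root>'

# First parse: reading the document unescapes the attribute value once.
value = ET.fromstring(source).find("node").get("value")

# Second parse: treat the value as the content of a dummy element,
# which unescapes it a second time and yields literal markup as a string.
markup = ET.fromstring("<wrapper>" + value + "</wrapper>").text

# Third parse: turn the markup string into an actual element node.
element = ET.fromstring(markup)

print(value)
print(markup)
print(element.tag, element.text)
```

Each unescaping step corresponds to one pass through a parser; no serializer option is involved.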
Assuming Sharepoint as a Microsoft .NET product uses XslCompiledTransform you could try to implement the unescaping and parsing with extension "script" (C# or VB or JScript.NET code embedded in XSLT) as follows:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:mf="http://example.com/mf"
exclude-result-prefixes="msxsl mf">
<msxsl:script language="C#" implements-prefix="mf">
<msxsl:using namespace="System.IO"/>
public string Unescape(string input)
{
XmlDocument doc = new XmlDocument();
XmlDocumentFragment frag = doc.CreateDocumentFragment();
frag.InnerXml = input;
return frag.InnerText;
}
public XPathNavigator ParseXml(string xmlInput)
{
using (StringReader sr = new StringReader(xmlInput))
{
return new XPathDocument(sr).CreateNavigator();
}
}
</msxsl:script>
<xsl:output method="html" doctype-public="XSLT-compat" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />
<xsl:template match="/">
<html>
<head>
<title>Test</title>
</head>
<xsl:apply-templates/>
</html>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="node">
<div>
<xsl:copy-of select="mf:ParseXml(mf:Unescape(@value))" />
</div>
</xsl:template>
</xsl:stylesheet>
If you have access to an XSLT processor (like any version of Saxon 9.7 or Exselt or the latest Altova or XmlPrime) supporting the XPath 3 functions parse-xml and parse-xml-fragment you can write that template without extension functions (in a version="3.0" stylesheet) as
<xsl:template match="node">
<div>
<xsl:copy-of select="parse-xml(string(parse-xml-fragment(@value)))"/>
</div>
</xsl:template>
Output your result with disable-output-escaping, then treat it again in another XSL with disable-output-escaping.

xslt completely remove duplicates from string

I have a variable containing non-numerical values, and I need to completely remove duplicate entries from this string using XSLT:
$string = a,b,c,c,d,d,e,f,g
needs to become: $newstring = a,b,e,f,g
An alternative option would be to compare the two variables and ignore/remove the overlapping entries.
$stringA = a,c
$stringB = a,b,c,d,e,f
needs to become:
$newstring = b,d,e,f
Concatenating the variables is straightforward but I need the opposite of that!
Please help,
XSLT is designed to process XML, not strings. XSLT 1.0 in particular is a poor tool for manipulating text.
IMHO, the best way to proceed here is to convert the problem to XML first. If you're using libxslt (as xsltproc does), this is quite easy to do using an extension function:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:str="http://exslt.org/strings"
extension-element-prefixes="str">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:param name="stringA">a,c,g</xsl:param>
<xsl:param name="stringB">a,b,c,d,e,f</xsl:param>
<xsl:variable name="setA" select="str:tokenize($stringA, ',')" />
<xsl:variable name="setB" select="str:tokenize($stringB, ',')" />
<xsl:template match="/">
<test>
<xsl:for-each select="$setA[not(.=$setB)] | $setB[not(.=$setA)]">
<xsl:value-of select="."/>
<xsl:if test="position()!=last()">,</xsl:if>
</xsl:for-each>
</test>
</xsl:template>
</xsl:stylesheet>
Result:
<?xml version="1.0" encoding="UTF-8"?>
<test>g,b,d,e,f</test>
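For readers who want to check the expected results of both variants from the question, here is a minimal sketch of the logic in Python (standard library only). The function names are hypothetical; this illustrates the set logic and is not a replacement for the XSLT above.

```python
from collections import Counter


def drop_repeated(csv):
    """Keep only the items that occur exactly once, in their original order."""
    items = csv.split(",")
    counts = Counter(items)
    return ",".join(item for item in items if counts[item] == 1)


def symmetric_difference(csv_a, csv_b):
    """Keep the items that appear in exactly one of the two lists."""
    a = csv_a.split(",")
    b = csv_b.split(",")
    only_a = [item for item in a if item not in b]
    only_b = [item for item in b if item not in a]
    return ",".join(only_a + only_b)


print(drop_repeated("a,b,c,c,d,d,e,f,g"))
print(symmetric_difference("a,c", "a,b,c,d,e,f"))
print(symmetric_difference("a,c,g", "a,b,c,d,e,f"))
```

The second function follows the same idea as the for-each in the stylesheet: items of A not in B, followed by items of B not in A.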

HTML entity numbers in xslt

I'm attempting to transform HTML to XML. My Input HTML is obtained dynamically, and the input HTML has html entity numbers as below.
HTML Input:
<root>
<h1>Hello stack Over flow</h1>
<H1 align="left">The list will be managed with a  <SUB>of &#169; &#174;</H1>
</root>
My transform looks as below :
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
exclude-result-prefixes="msxsl">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="root">
<xsl:copy >
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
The output from the transform writes all HTML entity numbers as the literal characters.
The desired output should keep the entity numbers instead of the characters. How can I achieve this?
You could try putting encoding="US-ASCII" on your xsl:output declaration; that way, any character outside that encoding should be output as a numeric character reference.
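The effect of serializing to an ASCII encoding can be seen in isolation with a small sketch. It uses Python's built-in xmlcharrefreplace error handler, which is not part of XSLT but behaves like an XML serializer that has to write a character the target encoding cannot represent. The sample text is taken from the question's heading.

```python
# Characters from the question's input: copyright and registered signs.
text = "of \u00a9 \u00ae"

# Encoding to ASCII cannot represent these characters directly, so each one
# is replaced by a numeric character reference.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")

print(ascii_bytes.decode("ascii"))  # of &#169; &#174;
```

Characters that ASCII can represent are written unchanged; only the ones outside the encoding become references.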

Why is there xmlns in my html output

In the html output file from an XSLT process (using saxon9he), there have been 155 occurrences of xmlns:fn="http://www.w3.org/2005/xpath-functions" inserted into a variety of tr elements
The part of xsl that uses xpath-functions is
<xsl:if test="(string(@hideIfHardwareIs)='') or (not(fn:matches(string($input_doc//inf[@id='5']), string(@hideIfHardwareIs), 'i')))">
Unless I am reading it wrong, matches takes 3 arguments: a string, another string, and then a flag, which in this case makes the match case-insensitive.
What I don't understand is that the tr elements that show up with the xmlns aren't close to the portion of the XSL where the matches() function is used.
The XSL file I am working with is 2100 lines and the XML file it parses is 12800 lines, so I don't think I can share it easily. I've inherited this and need to (at this time) maintain it.
What are some things I can look for within the XSL that would insert the xmlns into the HTML output?
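Those functions do not need to be prefixed. The declaration shows up in the output because literal result elements copy the namespaces that are in scope in the stylesheet.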
Remove the xmlns:fn="http://www.w3.org/2005/xpath-functions" from your xsl:stylesheet and remove the fn: prefix from the xpath functions.
Examples:
XML Input
<foo>test</foo>
XSLT 2.0 #1
<xsl:stylesheet version="2.0" xmlns:fn="http://www.w3.org/2005/xpath-functions"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/*">
<xsl:if test="fn:matches(.,'^t')">
<bar><xsl:value-of select="."/></bar>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
Output
<bar xmlns:fn="http://www.w3.org/2005/xpath-functions">test</bar>
XSLT 2.0 #2
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/*">
<xsl:if test="matches(.,'^t')">
<bar><xsl:value-of select="."/></bar>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
Output
<bar>test</bar>