xslt completely remove duplicates from string - html

I have a variable containing non-numerical values, and I need to completely remove duplicate entries from this string using XSLT:
$string = a,b,c,c,d,d,e,f,g
needs to become: $newstring = a,b,e,f,g
An alternative option would be to compare the two variables and ignore/remove the overlapping entries.
$stringA = a,c
$stringB = a,b,c,d,e,f
needs to become:
$newstring = b,d,e,f
Concatenating the variables is straightforward but I need the opposite of that!
Please help,

XSLT is designed to process XML, not strings. XSLT 1.0 in particular is a poor tool for manipulating text.
IMHO, the best way to proceed here is to convert the problem to XML first. If you're using libxslt (as xsltproc does), this is quite easy to do using an extension function:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:str="http://exslt.org/strings"
extension-element-prefixes="str">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:param name="stringA">a,c,g</xsl:param>
<xsl:param name="stringB">a,b,c,d,e,f</xsl:param>
<xsl:variable name="setA" select="str:tokenize($stringA, ',')" />
<xsl:variable name="setB" select="str:tokenize($stringB, ',')" />
<xsl:template match="/">
<test>
<xsl:for-each select="$setA[not(.=$setB)] | $setB[not(.=$setA)]">
<xsl:value-of select="."/>
<xsl:if test="position()!=last()">,</xsl:if>
</xsl:for-each>
</test>
</xsl:template>
</xsl:stylesheet>
Result:
<?xml version="1.0" encoding="UTF-8"?>
<test>g,b,d,e,f</test>
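The stylesheet above covers the two-variable alternative. For the single-string case from the question, a minimal sketch along the same lines might look like this. It assumes libxslt's str:tokenize, which returns the values as sibling token elements; the parameter name string and the variable name tokens are chosen here for illustration only. The intended result for the sample string is a,b,e,f,g.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://exslt.org/strings"
    extension-element-prefixes="str">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <!-- Sample input from the question (parameter name is an assumption) -->
  <xsl:param name="string">a,b,c,c,d,d,e,f,g</xsl:param>

  <!-- One token element per comma-separated value -->
  <xsl:variable name="tokens" select="str:tokenize($string, ',')"/>

  <xsl:template match="/">
    <test>
      <!-- Keep a token only if no sibling token has the same value;
           this relies on the tokens being siblings, as in libxslt -->
      <xsl:for-each select="$tokens[not(. = preceding-sibling::token or . = following-sibling::token)]">
        <xsl:value-of select="."/>
        <xsl:if test="position() != last()">,</xsl:if>
      </xsl:for-each>
    </test>
  </xsl:template>
</xsl:stylesheet>
```

Because the filter is applied in the select expression, position() and last() refer to the surviving tokens only, so the comma logic stays the same as in the answer above.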


Can I get value from content dictionary with xpath?

This is an example of a meta tag from which I want to get the pub_date:
<meta name="parsely-page" content='{"title":"Article title","link":"https:\/\/site.com\/category\/article","type":"post","section":"category","image_url":"","author":null,"pub_date":"2009-03-01T14:17:14+00:00","post_id":"article_6463676334","tags":[]}' />
The xpath to get the entire content would be:
//meta[@name="parsely-page"]/@content
Is it possible to get the values of dict keys using xpath?
With XPath 3.1 you can do
//meta[@name="parsely-page"]/parse-json(@content)?pub_date
Sadly, it's very likely that you are using an XPath processor that only supports XPath 1.0 in which case you won't be able to use this unless you find a different processor.
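If an XSLT 3.0 processor is available (Saxon 9.8 or later is assumed here), the same lookup can be wrapped in a small stylesheet. This is only a sketch: it assumes the document contains exactly one matching meta element and uses the key name pub_date as it appears in the sample content.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- parse-json() turns the attribute value into a map;
         ?pub_date then looks up that key in the map -->
    <xsl:value-of
        select="parse-json(//meta[@name = 'parsely-page']/@content)?pub_date"/>
  </xsl:template>
</xsl:stylesheet>
```

Unlike string slicing, this approach does not depend on the order of the keys or on how the JSON happens to be formatted.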
With XSLT 1.0:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vQ">"</xsl:variable>
<xsl:template match="/">
<xsl:value-of select=
'substring-before(substring-after(//meta[@name="parsely-page"]/@content,
concat($vQ, "pub_date", $vQ, ":", $vQ)), $vQ)'/>
</xsl:template>
</xsl:stylesheet>
When this transformation is performed on this XML document (your meta tag):
<meta name="parsely-page"
content='{"title":"Article title","link":"https:\/\/site.com\/category\/article","type":"post","section":"category","image_url":"","author":null,"pub_date":"2009-03-01T14:17:14+00:00","post_id":"article_6463676334","tags":[]}' />
the wanted result is produced:
2009-03-01T14:17:14+00:00
We can also write a single XPath 1.0 expression that evaluates to the wanted string. When it is embedded in an XML attribute, however, the nested apostrophes must be escaped as &apos; so that they do not end the attribute value:
substring-before(substring-after(//meta[@name="parsely-page"]/@content,
&apos;"pub_date":"&apos;),
&apos;"&apos;)
Verification using XSLT 1.0:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vQ">"</xsl:variable>
<xsl:template match="/">
<xsl:value-of select=
'substring-before(substring-after(//meta[@name="parsely-page"]/@content,
&apos;"pub_date":"&apos;),
&apos;"&apos;)'/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the same XML document (above), it evaluates the single XPath 1.0 expression and outputs the wanted, correct result:
2009-03-01T14:17:14+00:00

Transform data in Text file to JSON or XML

I'm very new to JSON and have little knowledge of XML. I have a question about transforming data from text to XML or JSON. I've already worked with XSL transformations, where I transform an XML document to either XML or text. Now I want to go the other way around, i.e., text -> JSON/XML.
For example, I have the following text:
One(A,B) One(A',B)
Two(C,D) Two(C',D)
Three(E,F) Three(E',F)
Four(G,H)
Five(I,J)
Hence the corresponding XML output may look like:
<B> A,A' </B>
<D> C,C' </D>
<F> E,E' </F>
<H> G </H>
<J> I </J>
I hope the question is clear, if not please let me know.
Thanks in advance.
Here is an XSLT 3.0 stylesheet you can use with Saxon 9.8 or Altova XMLSpy/Raptor:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
exclude-result-prefixes="xs math fn"
version="3.0">
<xsl:param name="input-uri" as="xs:string" select="'input1.txt'"/>
<xsl:output indent="yes"/>
<xsl:template name="xsl:initial-template">
<xsl:apply-templates select="unparsed-text-lines($input-uri)"/>
</xsl:template>
<xsl:template match=".[. instance of xs:string]">
<xsl:for-each-group select="analyze-string(., '\w+\(([^,]*),(\w+)\)')//fn:match" group-by="fn:group[@nr = 2]">
<xsl:element name="{current-grouping-key()}">
<xsl:value-of select="current-group()/fn:group[@nr = 1]" separator=","/>
</xsl:element>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>

Why is there xmlns in my html output

In the html output file from an XSLT process (using saxon9he), there have been 155 occurrences of xmlns:fn="http://www.w3.org/2005/xpath-functions" inserted into a variety of tr elements
The part of xsl that uses xpath-functions is
<xsl:if test="(string(@hideIfHardwareIs)='') or (not(fn:matches(string($input_doc//inf[@id='5']), string(@hideIfHardwareIs), 'i')))">
Unless I am reading it wrong, matches() takes three arguments: a string, another string (the pattern), and then a flag, which in this case makes the match case-insensitive.
What I don't understand is that the tr elements showing up with the xmlns aren't close to the portion of the XSL where the matches() function is called.
The XSL file I am working with is 2100 lines and the XML file it parses is 12800 lines, so I don't think I can share it easily. I've inherited this and need to (at this time) maintain it.
What are some things I can look for within the XSL that would insert the xmlns into the HTML output?
Those functions do not need to be prefixed.
Remove the xmlns:fn="http://www.w3.org/2005/xpath-functions" from your xsl:stylesheet and remove the fn: prefix from the xpath functions.
Examples:
XML Input
<foo>test</foo>
XSLT 2.0 #1
<xsl:stylesheet version="2.0" xmlns:fn="http://www.w3.org/2005/xpath-functions"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/*">
<xsl:if test="fn:matches(.,'^t')">
<bar><xsl:value-of select="."/></bar>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
Output
<bar xmlns:fn="http://www.w3.org/2005/xpath-functions">test</bar>
XSLT 2.0 #2
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/*">
<xsl:if test="matches(.,'^t')">
<bar><xsl:value-of select="."/></bar>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
Output
<bar>test</bar>
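If removing the prefix throughout a large stylesheet is not practical, an alternative is to keep the fn: prefix and list it in exclude-result-prefixes on xsl:stylesheet. The sketch below is the first example with that one attribute added; for the same input it should give the same output as example #2.

```xml
<xsl:stylesheet version="2.0"
    xmlns:fn="http://www.w3.org/2005/xpath-functions"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="fn">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/*">
    <xsl:if test="fn:matches(., '^t')">
      <!-- Without exclude-result-prefixes, this literal result element
           would be given the in-scope fn namespace declaration -->
      <bar><xsl:value-of select="."/></bar>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

The declaration appears on literal result elements because they copy the namespaces that are in scope in the stylesheet, which is why elements far away from the matches() call can be affected.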

xml tags passed as arguments to xsl

How can I modify the XSL below to process parameters whose values are tags? Instead of hard-coding w:p and w:pPr/w:pStyle/@w:val, I will be passing them as arguments.
Actual XSl:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
>
<xsl:param name="styleName"/>
<xsl:output method="text"/>
<xsl:template match="w:p"/>
<xsl:template match="w:p[w:pPr/w:pStyle/@w:val[matches(., concat('^(',$styleName,')$'),'i')]]">
<xsl:value-of select="."/><xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
Required XSL:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml">
<xsl:param name="styleName" select="'articletitle'"/>
<xsl:param name="para" select="'//w:p[w:pPr/w:pStyle/@w:val[matches(.,concat('^(',$styleName,')$')]]'"/>
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:for-each select="$para">
<xsl:value-of select="."/><xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
NOTE: Please get into the habit of stating which version of XSLT and which processor you are using. In this case, the answer is valid only if you are using XSLT 2.0.
This is not a complete answer, but you can start by looking at this approach:
Try using local-name() and in-scope-prefixes() (the latter is only available in XSLT 2.0) to match a node dynamically.
Here you have an example template to replace your empty template:
<xsl:template match="*[local-name()=substring-after($para,':') and in-scope-prefixes(.)[.=substring-before($para,':')]]"/>
For the second part of the expression ($parastyle), I can only think of writing your own function to evaluate it dynamically.
I'll try to post an example of such a function later on.
I figured out the problem in my code. It is in the second xsl:param declaration:
<xsl:param name="para" select="'//w:p[w:pPr/w:pStyle/@w:val[matches(.,concat('^(',$styleName,')$')]]'"/>
Since I put quotes around the value of the select attribute, it was treated as a string instead of an XPath expression.
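A minimal sketch of the corrected stylesheet, with the outer quotes removed and the parentheses of matches() balanced, might look like this. Only the w namespace is declared here for brevity, and the 'i' flag is carried over from the original template; both are choices made for this sketch.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <xsl:param name="styleName" select="'articletitle'"/>

  <!-- No outer quotes: the select value is evaluated as an XPath
       expression against the source document, not kept as a string -->
  <xsl:param name="para"
      select="//w:p[w:pPr/w:pStyle/@w:val[matches(., concat('^(', $styleName, ')$'), 'i')]]"/>

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- $para is now a sequence of w:p elements -->
    <xsl:for-each select="$para">
      <xsl:value-of select="."/><xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Note that this only fixes the default value. A path supplied from outside as a string is still a string in XSLT 2.0; evaluating it dynamically needs something like the XSLT 3.0 xsl:evaluate instruction or a processor-specific extension.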

How have xsl:function return a string value including html tags

I am attempting to transpose a java function to an xsl:function spec.
The function basically places html tags around substrings.
I now bump into difficulties: using the java inline code this works perfectly, but I am unable to figure out how to prevent output escaping when using the xsl:function.
How can I achieve the output to contain the wanted html tags?
A simplified example of what I am trying to achieve is the following:
input parameter value "AB" should lead to a string A<b>B</b>, shown in html browser as AB of course.
The example function I tried is below, but the resulting string is A&lt ;b&gt ;B&lt ;/b&gt ; (note that I had to add blanks to prevent the entities from being interpreted in this editor), which of course shows up in browsers as A<b>B</b>.
Note that xsl:element cannot be used in the xsl:function code, because that has no effect; I want the string result of the function call to contain < and > characters, and then add the string result to the output result file.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:custom="http://localhost:8080/customFunctions">
<xsl:output method="html" version="4.0" encoding="UTF-8" indent="yes"/>
<xsl:function name="custom:test">
<xsl:param name="str"/>
<xsl:value-of select="substring($str,1,1)"/>
<xsl:text disable-output-escaping="yes"><![CDATA[<b>]]></xsl:text>
<xsl:value-of select="substring($str,2)"/>
<xsl:text disable-output-escaping="yes"><![CDATA[</b>]]></xsl:text>
</xsl:function>
<xsl:template match="/">
<xsl:element name="html">
<xsl:element name="body">
<xsl:value-of select="custom:test('AB')"/>
</xsl:element>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
Here is an example. Use xsl:sequence instead of xsl:value-of, and make sure your function returns nodes (which is usually done simply by writing literal result elements):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:custom="http://localhost:8080/customFunctions"
exclude-result-prefixes="custom">
<xsl:output method="html" version="4.0" encoding="UTF-8" indent="yes"/>
<xsl:function name="custom:test">
<xsl:param name="str"/>
<xsl:value-of select="substring($str,1,1)"/>
<b>
<xsl:value-of select="substring($str,2)"/>
</b>
</xsl:function>
<xsl:template match="/">
<xsl:element name="html">
<xsl:element name="body">
<xsl:sequence select="custom:test('AB')"/>
</xsl:element>
</xsl:element>
</xsl:template>
</xsl:stylesheet>