Extracting information from a JSON file using XSLT version 1.0

I'm a newbie to Stack Overflow and XSLT, so I hope I don't sound unintelligent!
So I am working with SDI for a GIS company, and I have a task that requires me to convert points that are in one spatial reference system (SRS) coordinate plane, such as EPSG:4035, to the world SRS, aka EPSG:4326. This really isn't a problem for me, since I have access to an online service that will just give me what I want. However, it outputs either JSON or HTML. I have browsed for a while to find a way to extract information from a JSON file, but most of the techniques I have seen use xsl:stylesheet version 2.0, and I have to use version 1.0. One method I thought about was using the XSLT document($urlWithJsonFormat) function; however, it only accepts XML files.
Here is an example of the JSON formatted file that I would retrieve after asking for the conversion:
{
"geometries" :
[{
"xmin" : -4,
"ymin" : -60,
"xmax" : 25,
"ymax" : -41
}
]
}
All I simply want are the xmin, ymin, xmax, and ymax values, that's all! It just seems so simple yet nothing works for me...

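You could use an external entity to include the JSON data as part of an XML file that you then transform. This works as long as the JSON text contains no < or & characters (the entity's content is parsed as XML) and your XML parser resolves external entities.
For instance, assuming the example JSON is saved as a file called "geometries.json", you could create an XML file like this: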
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE wrapper [
<!ENTITY otherFile SYSTEM "geometries.json">
]>
<wrapper>&otherFile;</wrapper>
And then transform it with the following XSLT 1.0 stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="wrapper">
<geometries>
<xsl:call-template name="parse-json-member-value">
<xsl:with-param name="member" select="'xmin'"/>
</xsl:call-template>
<xsl:call-template name="parse-json-member-value">
<xsl:with-param name="member" select="'ymin'"/>
</xsl:call-template>
<xsl:call-template name="parse-json-member-value">
<xsl:with-param name="member" select="'xmax'"/>
</xsl:call-template>
<xsl:call-template name="parse-json-member-value">
<xsl:with-param name="member" select="'ymax'"/>
</xsl:call-template>
</geometries>
</xsl:template>
<xsl:template name="parse-json-member-value">
<xsl:param name="member"/>
<xsl:element name="{$member}">
<xsl:value-of select="normalize-space(
translate(
substring-before(
substring-after(
substring-after(.,
concat('&quot;',
$member,
'&quot;'))
, ':')
,'&#xA;')
, ',', '')
)"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
To produce the following output:
<geometries>
<xmin>-4</xmin>
<ymin>-60</ymin>
<xmax>25</xmax>
<ymax>-41</ymax>
</geometries>
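To see what the nested string functions in parse-json-member-value do, here is a Python mirror of the same steps. It is a sketch for illustration only; the helper names are hypothetical stand-ins for the XPath 1.0 functions. Like the stylesheet, it assumes each "member" : value pair sits on its own line, so it returns an empty string for minified JSON.

```python
# Python mirror of the XSLT 1.0 string logic in parse-json-member-value.
# Illustration only: like the stylesheet, it assumes each "member" : value
# pair is on its own line of the JSON text.


def substring_after(s: str, sep: str) -> str:
    """Like XPath 1.0 substring-after(): '' when sep does not occur in s."""
    _, found, tail = s.partition(sep)
    return tail if found else ""


def substring_before(s: str, sep: str) -> str:
    """Like XPath 1.0 substring-before(): '' when sep does not occur in s."""
    head, found, _ = s.partition(sep)
    return head if found else ""


def normalize_space(s: str) -> str:
    """Approximates XPath normalize-space(): trim and collapse whitespace."""
    return " ".join(s.split())


def parse_json_member_value(text: str, member: str) -> str:
    after_name = substring_after(text, '"' + member + '"')  # after "xmin"
    after_colon = substring_after(after_name, ":")          # after the colon
    line_rest = substring_before(after_colon, "\n")         # up to end of line
    no_comma = line_rest.replace(",", "")                    # translate(..., ',', '')
    return normalize_space(no_comma)


json_text = """{
"geometries" :
[{
"xmin" : -4,
"ymin" : -60,
"xmax" : 25,
"ymax" : -41
}
]
}"""

for name in ("xmin", "ymin", "xmax", "ymax"):
    print(name, parse_json_member_value(json_text, name))
```

The last step matters for the final member: ymax has no trailing comma, so translate() simply has nothing to remove there.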

The two main choices here seem to be:
write (or use) a JSON parser in XSLT 1.0, or
use some other language than XSLT.
Since XSLT 1.0 engines generally can't process JSON directly, I'd recommend using some other language to convert it to XML.
https://github.com/WelcomWeb/JXS may help you too, if this is XSLT in a Web browser.
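As a sketch of the "other language" route, assuming Python 3 is available in the pipeline: the standard json and xml.etree.ElementTree modules can turn the service's response into XML that an XSLT 1.0 stylesheet can then read (for example with document() once the result is saved or served). The geometry wrapper element is a hypothetical naming choice, not something the service defines.

```python
import json
import xml.etree.ElementTree as ET


def geometries_to_xml(json_text: str) -> str:
    """Turn the service's JSON response into a small XML string.

    The <geometry> wrapper element is a hypothetical naming choice.
    """
    data = json.loads(json_text)
    root = ET.Element("geometries")
    for geom in data["geometries"]:
        item = ET.SubElement(root, "geometry")
        for key in ("xmin", "ymin", "xmax", "ymax"):
            # json gives ints/floats; XML element text must be a string
            ET.SubElement(item, key).text = str(geom[key])
    return ET.tostring(root, encoding="unicode")


response = '{"geometries": [{"xmin": -4, "ymin": -60, "xmax": 25, "ymax": -41}]}'
print(geometries_to_xml(response))
```

Unlike the string-matching stylesheet, a real JSON parser does not care about line breaks or key order.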

Related

Getting HTML elements via XPath in bash

I was trying to parse a page (Kaggle Competitions) with xpath on MacOS as described in another SO question:
curl "https://www.kaggle.com/competitions/search?SearchVisibility=AllCompetitions&ShowActive=true&ShowCompleted=true&ShowProspect=true&ShowOpenToAll=true&ShowPrivate=true&ShowLimited=true&DeadlineColumnSort=Descending" -o competitions.html
cat competitions.html | xpath '//*[@id="competitions-table"]/tbody/tr[205]/td[1]/div/a/@href'
That just gets the href of a link in a table.
But instead of returning the value, xpath starts validating the .html file and reports errors like undefined entity at line 89, column 13, byte 2964.
Since man xpath doesn't exist and xpath --help prints nothing, I'm stuck. Also, many similar solutions refer to the xpath found on GNU/Linux distributions, not the one on macOS.
Is there a correct way of getting HTML elements via XPath in bash?
For an HTML file that is not valid XML, one possibility is to use xsltproc (I hope it is available on macOS). xsltproc has an --html option to accept HTML as input, but you need an XSLT stylesheet to go with it:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" />
<xsl:template match="/*">
<xsl:value-of select="//*[@id='competitions-table']/tr[205]/td[1]/div/a/@href" />
</xsl:template>
</xsl:stylesheet>
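Notice that the XPath has changed: there is no tbody in the input file. (Browsers insert a tbody into the DOM when a table has none, which is why an XPath copied from the browser's developer tools often contains it; xsltproc's HTML parser does not add one.)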
Call xsltproc:
xsltproc --html test.xsl competitions.html 2> /dev/null
The errors that xsltproc reports about the HTML are discarded (sent to /dev/null).
The output is: /c/R
To use a different XPath expression from the command line, you can keep an XSLT template with a placeholder, __xpath__, and substitute the expression into it.
For example, save this template as test.xslt.tmpl:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" />
<xsl:template match="/*">
<xsl:value-of select="__xpath__" />
</xsl:template>
</xsl:stylesheet>
And use (for example) sed for the replacement. The comma is the sed delimiter here, so the expression itself must not contain commas:
sed -e "s,__xpath__,//*[@id='competitions-table']/tr[205]/td[1]/div/a/@href," test.xslt.tmpl > test.xsl
xsltproc --html test.xsl competitions.html 2> /dev/null
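If xsltproc isn't an option, the same lookup can be approximated with Python's standard html.parser module, which, like xsltproc --html, tolerates markup that isn't well-formed XML. This is not an XPath engine: the class below is a hypothetical helper that hand-codes roughly //*[@id='competitions-table']/tr[205]/td[1]//a/@href by counting rows and cells itself. It takes the first a anywhere inside the cell rather than requiring div/a, and it counts rows whether or not they sit inside a tbody.

```python
from html.parser import HTMLParser


class TableLinkFinder(HTMLParser):
    """Hypothetical helper: href of the first <a> inside cell `col` of row
    `row` (1-based, like XPath positions) of the table whose id is `table_id`.
    Rows are counted whether or not they sit inside a <tbody>."""

    def __init__(self, table_id: str, row: int, col: int) -> None:
        super().__init__()
        self.table_id, self.row, self.col = table_id, row, col
        self.table_depth = 0  # > 0 while inside the target table
        self.tr_count = 0     # rows seen so far in the target table
        self.td_count = 0     # cells seen so far in the current row
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.table_depth == 0:
            if tag == "table" and attrs.get("id") == self.table_id:
                self.table_depth = 1
            return
        if tag == "table":  # a table nested inside the target table
            self.table_depth += 1
        elif tag == "tr" and self.table_depth == 1:
            self.tr_count += 1
            self.td_count = 0
        elif tag == "td" and self.table_depth == 1:
            self.td_count += 1
        elif (tag == "a" and self.href is None
              and self.tr_count == self.row and self.td_count == self.col):
            self.href = attrs.get("href")

    def handle_endtag(self, tag):
        if self.table_depth and tag == "table":
            self.table_depth -= 1


page = """<html><body>
<table id="competitions-table">
<tr><td><div><a href="/c/first">First</a></div></td></tr>
<tr><td><div><a href="/c/second">Second</a></div></td><td><a href="/x">x</a></td></tr>
</table>
</body></html>"""

finder = TableLinkFinder("competitions-table", row=2, col=1)
finder.feed(page)
print(finder.href)
```

For the real page, read competitions.html (the UTF-8 encoding is an assumption), pass its text to feed() on a finder created with row=205, col=1, and finder.href then plays the role of the XPath result.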

XSLT creates an empty space after converting to HTML

I don't get it.
My xml input:
<?xml version="1.0" encoding="UTF-8"?>
<results>
<error file="mixed.cpp" line="11" id="unreadVariable" severity="style" msg="Variable 'wert' is assigned a value that is never used."/>
<error file="mixed.cpp" line="13" id="unassignedVariable" severity="style" msg="Variable 'b' is not assigned a value."/>
<error file="mixed.cpp" line="11" id="arrayIndexOutOfBounds" severity="error" msg="Array 'wert[2]' accessed at index 3, which is out of bounds."/>
<error file="mixed.cpp" line="15" id="uninitvar" severity="error" msg="Uninitialized variable: b"/>
<error file="mixed.cpp" line="5" id="unusedFunction" severity="style" msg="The function 'func' is never used."/>
<error file="*" line="0" id="unmatchedSuppression" severity="style" msg="Unmatched suppression: missingIncludeSystem"/>
</results>
using this xsl file:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" omit-xml-declaration="yes"/>
<xsl:template match="error">
<tr>
<td><xsl:value-of select="@file"/></td>
<td><xsl:value-of select="@line"/></td>
<td><xsl:value-of select="@test"/></td>
<td><xsl:value-of select="@severity"/></td>
<td><xsl:value-of select="@msg"/></td>
</tr>
</xsl:template>
</xsl:stylesheet>
But the first line I get is empty:
empty line
<tr><td>mixed.cpp</td><td>11</td><td/><td>style</td><td>Variable 'wert' is assigned a value that is never used.</td></tr>
Where is the empty line coming from?
The built-in (default) template rules kick in for nodes that your error template does not match, and for a text node the built-in rule just outputs the text. Since you have whitespace-only text nodes, and you are not matching results, the whitespace inside results (before and after each error) becomes part of the output.
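The mechanism can be modelled in a few lines of Python with xml.etree.ElementTree. This is a deliberately simplified, hypothetical model of the XSLT 1.0 built-in rules, not how a real processor works: elements without a template just process their children, text nodes are copied, and only error has its own template. ElementTree stores the whitespace before the first child in .text and after each child in .tail, which is exactly the whitespace that leaks into the output.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical model of the XSLT 1.0 built-in template rules:
# an element with no template of its own just processes its children, and
# a text node is copied to the output. Only <error> has a template here.


def apply_templates(elem, out, drop_text=False):
    if elem.tag == "error":
        out.append("<tr>...</tr>")  # stands in for the question's error template
        return
    # drop_text=True models adding <xsl:template match="text()"/>
    if elem.text and not drop_text:
        out.append(elem.text)  # text before the first child element
    for child in elem:
        apply_templates(child, out, drop_text)
        if child.tail and not drop_text:
            out.append(child.tail)  # text after each child element


source = """<results>
<error line="11"/>
<error line="13"/>
</results>"""
root = ET.fromstring(source)

with_builtin_text_rule = []
apply_templates(root, with_builtin_text_rule)
print(repr("".join(with_builtin_text_rule)))  # '\n<tr>...</tr>\n<tr>...</tr>\n'

with_text_template = []
apply_templates(root, with_text_template, drop_text=True)
print(repr("".join(with_text_template)))  # '<tr>...</tr><tr>...</tr>'
```

The leading '\n' in the first result is the empty first line from the question.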
There are multiple ways to fix this. A typical method is to add a low-priority template that matches the text nodes you do not want in the output and produces nothing. I.e., if you add the following, your whitespace will disappear:
<xsl:template match="text()" />
Another approach would be to positively match your structure. I.e., if you add the following, the whitespace also disappears, because now you match the document element and then apply templates only to the elements you are interested in (and not to the text nodes under results).
<xsl:template match="results">
<xsl:apply-templates select="error" />
</xsl:template>
A third approach would be to add a whitespace-stripping declaration. This changes how the input tree is built, which may matter if your actual stylesheet is larger and depends on whitespace elsewhere. The following strips whitespace-only text nodes from the results element only:
<xsl:strip-space elements="results"/>
All three solutions work; which one is most suitable depends on your project as a whole.
Remember that in XSLT 1.0 and XSLT 2.0, nodes not matched by any of your templates are handled by the invisible built-in template rules: for an element they process its children, and for a text node they output its value, so unmatched content ends up as plain text in the output. In XSLT 3.0 you have more control over this process:
<!-- XSLT 3.0 only -->
<xsl:mode on-no-match="shallow-skip" />

XSL:if test not working as expected

<xsl:for-each select="class/student">
ID: <xsl:value-of select="id"/><br/>
Name: <xsl:value-of select="lastName"/>,<xsl:value-of select="firstName"/><br/>
Date: <xsl:value-of select="date"/><br/>
Major: <xsl:if test="major[@Year > 2008]">
<xsl:value-of select="major"/> ,
declared in: <xsl:value-of select="major[@Year]"/>
</xsl:if><br/><br/>
</xsl:for-each>
XML code:
<student>
<id>1000001</id>
<lastName>john</lastName>
<firstName>Doe</firstName>
<date format="d">08/25/2006</date>
<major Year="2006">CS:BS</major>
</student>
output:
ID: 1000001
Name: Doe,John
Date-enrolled: August 25, 2006
Major: CS:BS , declared in: CS:BS
The XML code above is just a sample of the actual XML; there are more 'Year' values/elements.
I'm trying to get only the majors whose Year is greater than 2008, but for some reason I'm getting the wrong output.
You say "there are more 'Year' values/elements" than shown, and that may be the key to the problem. If your input contains two elements
<major Year="2006">CS:BS</major>
<major Year="2009">CS:BS</major>
then test="major/@Year > 2008" will return true, because there is one such element, and in XSLT 1.0, <xsl:value-of select="major"/> will output the string value of the first selected element, whatever its Year (in 2.0 it outputs all of them, separated by spaces).
In future, please try to supply a complete sample stylesheet and source document that allow others to reproduce the problem. If you try to cut it down without testing that the cut-down version exhibits the problem, it's easy to eliminate the feature that is the actual source of the trouble.
The problem is that you are going about this backwards. You need to select the stuff that you want, and then use it. You are checking whether the stuff that you want exists, and then using something less specific.
This should fix your issue:
Major:
<xsl:for-each select="major[@Year > 2008]">
<br/>
<xsl:value-of select="."/>, declared in: <xsl:value-of select="@Year"/>
</xsl:for-each>
<br/>
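The difference between "test, then take the first" and "select, then use" can be checked in Python with xml.etree.ElementTree, using the two major elements from the example above. The function names are hypothetical, and int() stands in for XPath's conversion of @Year to a number.

```python
import xml.etree.ElementTree as ET

# The two <major> elements from the example above.
student = ET.fromstring(
    '<student>'
    '<major Year="2006">CS:BS</major>'
    '<major Year="2009">CS:BS</major>'
    '</student>'
)


def year(major):
    # XPath 1.0 converts @Year to a number for ">"; int() is enough for years
    return int(major.get("Year"))


def first_if_any(student):
    """The question's logic: xsl:if asks whether ANY major is newer than
    2008, then xsl:value-of (XSLT 1.0) uses the FIRST major."""
    majors = student.findall("major")
    if any(year(m) > 2008 for m in majors):
        return [(majors[0].text, majors[0].get("Year"))]
    return []


def filtered_majors(student):
    """The answer's logic: for-each over major[@Year > 2008] only."""
    return [(m.text, m.get("Year")) for m in student.findall("major")
            if year(m) > 2008]


print(first_if_any(student))     # [('CS:BS', '2006')]
print(filtered_majors(student))  # [('CS:BS', '2009')]
```

The first function picks the 2006 major even though only the 2009 one passed the test, which is the symptom described in the question.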
Your use of
<xsl:value-of select="major[@Year]"/>
was also incorrect. The following would have actually output a year value:
<xsl:value-of select="major/@Year"/>
I strongly suggest using either <xsl:text> or <xsl:value-of> for your static text. Sprinkling literal text throughout the XSLT, as you do now, makes the code very messy. Here is the whole loop rewritten that way:
<xsl:for-each select="class/student">
<xsl:value-of select="concat('ID: ', id)"/>
<br/>
<xsl:value-of select="concat('Name: ', lastName, ',', firstName)"/>
<br/>
<xsl:value-of select="concat('Date: ', date)"/>
<br/>
<xsl:text>Major:</xsl:text>
<xsl:for-each select="major[@Year > 2008]">
<br/>
<xsl:value-of select="concat(., ', declared in: ', @Year)"/>
</xsl:for-each>
<br/>
</xsl:for-each>

XSLT- how to display the first word of the paragraph as BOLD

I am unable to do this. I am using an XML file, which will be converted to HTML using XSLT.
The sample XML file would be like this...
<name>ABC</name>
<dob>09-Jan-1973</dob>
..
..
<info>My name is ABC. I am working with XYZ Ltd.
I have an experience of 5 years.
I have been working on Java platform since 5 years.
</info>
The info element contains information in the form of paragraphs. I want the first word of each paragraph in bold.
The HTML output for the info element should be:
<b>My</b> name is ABC. I am working with XYZ Ltd.
<b>I</b> have an...
<b>I</b> have been working....
This transformation (XSLT 2.0):
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="info">
<xsl:variable name="vParas" select="tokenize(., '&#xA;')[normalize-space()]"/>
<xsl:for-each select="$vParas">
<xsl:variable name="vHead" select=
"tokenize(., '[\s.?!,;:\-]+')[.][1]"/>
<b>
<xsl:sequence select="$vHead"/>
</b>
<xsl:sequence select="substring-after(., $vHead)"/>
<xsl:sequence select="'&#xA;'"/>
</xsl:for-each>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
when applied on the provided input (massaged into a well-formed XML document):
<t>
<name>ABC</name>
<dob>09-Jan-1973</dob> .. ..
<info>My name is ABC.
I am working with XYZ Ltd.
I have an experience of 5 years.
I have been working on Java platform since 5 years. </info>
</t>
produces the wanted, correct result:
<b>My</b> name is ABC.
<b>I</b> am working with XYZ Ltd.
<b>I</b> have an experience of 5 years.
<b>I</b> have been working on Java platform since 5 years.
And this displays in the browser as expected:
My name is ABC.
I am working with XYZ Ltd.
I have an experience of 5 years.
I have been working on Java platform since 5 years.
An XSLT 1.0 solution is also possible and is a little bit more complicated.
Explanation: Use of the standard XPath 2.0 function tokenize() and of the standard XPath functions normalize-space() and substring-after().
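A Python sketch of the same steps may help in following the XSLT 2.0 logic: split into lines, skip blank ones, take the first token delimited by whitespace or punctuation, and wrap it in b. Here re.split plays the role of tokenize(), and the function name is hypothetical. Unlike the stylesheet, it does not add a newline after the last paragraph.

```python
import re

# Delimiters used by the XSLT 2.0 answer's tokenize() call.
DELIMS = re.compile(r"[\s.?!,;:\-]+")


def bold_first_words(info_text: str) -> str:
    out = []
    for para in info_text.split("\n"):            # tokenize(., '&#xA;')
        if not para.strip():                       # [normalize-space()]
            continue
        words = [w for w in DELIMS.split(para) if w]   # [.] drops empty tokens
        if not words:                              # e.g. a line of punctuation only
            out.append(para)
            continue
        head = words[0]
        rest = para.partition(head)[2]             # substring-after(., $vHead)
        out.append("<b>" + head + "</b>" + rest)
    return "\n".join(out)


info = """My name is ABC.
I am working with XYZ Ltd.
I have an experience of 5 years.
I have been working on Java platform since 5 years. """

print(bold_first_words(info))
```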
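Try using the CSS :first-letter pseudo-element (note that it bolds only the first letter, not the first word; CSS has no first-word pseudo-element):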
p:first-letter
{
font-weight: bold;
}
I would work with the substring functions, something like this:
<xsl:template match="info" mode="paragraph">
<b><xsl:value-of select="substring-before(text(), ' ')" /></b><xsl:text> </xsl:text><xsl:value-of select="substring-after(text(), ' ')" />
</xsl:template>
Oops, this only handles the first occurrence. You'll need to repeat it for every 'paragraph'; in XSLT 1.0 that means a recursive named template, since xsl:for-each cannot iterate over substrings of a string.

Is there an elegant way to add multiple HTML classes with XSLT?

Let's say I'm transforming a multiple-choice quiz from an arbitrary XML format to HTML. Each choice will be represented as an HTML <li> tag in the result document. For each choice, I want to add an HTML class of correct to the <li> if that choice was the correct answer. Additionally, if that choice was the one selected by the user, I want to add a class of submitted to the <li>. Consequently, if the choice is the correct one as well as the submitted one, the <li> should have a class of correct submitted.
As far as I know, white-space separated attribute values aren't a part of the XML data model and thus cannot directly be created via XSLT. However, I have a feeling there's a better way of doing this than littering the code with one conditional for every possible combination of classes (which would be acceptable in this example, but unwieldy in more complex scenarios).
How can I solve this in an elegant way?
Example of Desired Result:
<p>Who trained Obi-Wan Kenobi?</p>
<ul>
<li>Mace Windu</li>
<li class="correct submitted">Qui-Gon Jinn</li>
<li>Ki-Adi-Mundi</li>
<li>Yaddle</li>
</ul>
Firstly, there is nothing wrong with whitespace in attribute values in XML. Roughly speaking, attribute-value normalization converts whitespace characters to spaces when a document is parsed (and, for attributes declared with a non-CDATA type such as NMTOKENS, also trims and collapses runs of spaces), but whitespace is definitely allowed. EDIT: See below for more on this.
Matthew Wilson's approach fails to include whitespace between the possible values, as you mention in your comment thereto. However, his approach is fundamentally sound. The final piece of the jigsaw is your dislike of redundant spaces: these can be eliminated by use of the normalize-space XPath function.
The following stylesheet puts all the bits together - note that it doesn't do anything with its input document, so for testing purposes you can run it against any XML document, or even against itself, to verify that the output meets your requirements.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:variable name="foo0" select="false()"/>
<xsl:variable name="bar0" select="true()"/>
<xsl:variable name="foo1" select="true()"/>
<xsl:variable name="bar1" select="false()"/>
<xsl:variable name="foo2" select="true()"/>
<xsl:variable name="bar2" select="true()"/>
<xsl:template match="/">
<xsl:variable name="foobar0">
<xsl:if test="$foo0"> foo</xsl:if>
<xsl:if test="$bar0"> bar</xsl:if>
</xsl:variable>
<xsl:variable name="foobar1">
<xsl:if test="$foo1"> foo</xsl:if>
<xsl:if test="$bar1"> bar</xsl:if>
</xsl:variable>
<xsl:variable name="foobar2">
<xsl:if test="$foo2"> foo</xsl:if>
<xsl:if test="$bar2"> bar</xsl:if>
</xsl:variable>
<li>
<xsl:attribute name="class">
<xsl:value-of select="normalize-space($foobar0)"/>
</xsl:attribute>
</li>
<li>
<xsl:attribute name="class">
<xsl:value-of select="normalize-space($foobar1)"/>
</xsl:attribute>
</li>
<li>
<xsl:attribute name="class">
<xsl:value-of select="normalize-space($foobar2)"/>
</xsl:attribute>
</li>
</xsl:template>
</xsl:stylesheet>
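The same idea (emit each class with a leading space, then apply normalize-space()) is sketched below in Python, with hypothetical helper names; " ".join(s.split()) stands in for normalize-space(). One difference: this sketch omits the class attribute entirely when no class applies, matching the desired result in the question, whereas the stylesheet above always emits a class attribute, even an empty one.

```python
import html


def class_value(correct: bool, submitted: bool) -> str:
    """Emit each class with a leading space, then normalize the result,
    mirroring <xsl:if> + normalize-space() in the stylesheet above."""
    raw = (" correct" if correct else "") + (" submitted" if submitted else "")
    return " ".join(raw.split())  # stand-in for normalize-space()


def choice_li(text: str, correct: bool, submitted: bool) -> str:
    cls = class_value(correct, submitted)
    body = html.escape(text)
    # Omit the attribute entirely when no class applies.
    return f'<li class="{cls}">{body}</li>' if cls else f"<li>{body}</li>"


print(choice_li("Mace Windu", correct=False, submitted=False))
print(choice_li("Qui-Gon Jinn", correct=True, submitted=True))
```

Adding a third class only means adding one more conditional term to raw, rather than one branch per combination.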
EDIT: Further to the question of spaces separating discrete components within the value of an attribute: the XML spec defines a number of attribute types, including IDREFS and NMTOKENS. Values of the first type match the Names production, and values of the second match the Nmtokens production; both productions are defined as multiple values of the appropriate type, separated by spaces. So space-delimited lists of values in a single attribute are an inherent component of the XML information set.
Off the top of my head, you can build up a space-separated list with something like:
<li>
<xsl:attribute name="class">
<xsl:if test="...">correct</xsl:if>
<xsl:if test="...">submitted</xsl:if>
</xsl:attribute>
</li>
"As far as I know, white-space separated attribute values aren't a part of the XML data model and thus cannot directly be created via XSLT"
Unless you are converting to an XML language (which HTML is not; XHTML is), you shouldn't worry about the XML validity of the XSLT output. It can be anything and doesn't need to conform to XML!