I would like to apply an XSLT transformation to an HTML page using YQL. The following query is used to get the HTML:
select * from html where url="http://example.com/somepage" and
xpath='//div[@class="article-text"]'
How can I apply select * from xslt where ... to the previous result?
I'm not sure, as I haven't used YQL before, but I guess you have to go the other way round: use XSLT to extract the result from the HTML, and then apply the YQL query to get the XML as the result:
XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<xsl:apply-templates select="//div[@class='article-text']" />
</xsl:template>
<xsl:template match="div[@class='article-text']">
<articletext>
<xsl:value-of select="."/>
</articletext>
</xsl:template>
</xsl:stylesheet>
YQL query:
select * from xslt where stylesheet="url/name-of.xsl" and
url="http://example.com/somepage"
This should result in
<results>
<articletext>Text of article</articletext>
</results>
As I don't know YQL but am used to working with XSLT/XPath, I googled it and found this useful SO example: YQL column projection using XPATH. Instead of just pasting the link, I adjusted the XSLT part of the example provided there to match your query.
Note that HTML is not an XML-based language (though XHTML is). If you want to operate on HTML using XML tools, you will need to either find an HTML parser (such as nekohtml, which is based on Apache Xerces) or preconvert the HTML to XHTML using something like the W3C's tidy tool.
I need to specify the output order when converting an HTML file to a text file, so I use the xsl:apply-templates select approach.
It works, but to fine-tune the output of the different nodes I need a corresponding template for each, not just a general one. This also works, but I have to repeat the select pattern in the match pattern of the template.
I would like to define a variable that holds the pattern so that it only needs to be defined once.
Below are my simplified stylesheet and simplified HTML, which do not work but give an idea of what I want to accomplish.
Is it possible to use variables like this? I can use either XSLT 1.0 or 2.0 if needed.
<xsl:stylesheet ...>
...
<xsl:variable name="first">div[@class='one']</xsl:variable>
<xsl:variable name="second">div[@class='two']</xsl:variable>
<xsl:template match="/*">
<xsl:apply-templates select="//$first"/>
<xsl:apply-templates select="//$second"/>
...
</xsl:template>
<xsl:template match="//$first">
<xsl:text>Custom text for class one:</xsl:text><xsl:value-of select="text()"/>
</xsl:template>
<xsl:template match="//$second">
<xsl:text>Custom text for class two:</xsl:text><xsl:value-of select="text()"/>
</xsl:template>
</xsl:stylesheet>
The html:
...
<div class="two">text from two</div>
<div class="one">text from one </div>
...
Desired output:
Custom text for class one: text from one
Custom text for class two: text from two
There is no way to use variables like that in XSLT 1 or 2. The only way would be to write a stylesheet producing a second stylesheet and execute that separately.
In XSLT 3 there are new features called static variables/parameters and shadow attributes that could help. Alternatively, XSLT 3 lets you use the transform function to execute a newly generated stylesheet directly from XSLT, instead of in a separate step with a host language.
But using XSLT 2 you can shorten the
<xsl:apply-templates select="//div[@class='one']"/>
<xsl:apply-templates select="//div[@class='two']"/>
to
<xsl:apply-templates select="//div[@class='one'], //div[@class='two']"/>
For completeness here is the XSLT 3 approach with two static parameters used in shadow attributes:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math"
exclude-result-prefixes="xs math"
version="3.0">
<xsl:param name="first" static="yes" as="xs:string" select="&quot;div[@class='one']&quot;"/>
<xsl:param name="second" static="yes" as="xs:string" select="&quot;div[@class='two']&quot;"/>
<xsl:template match="/*">
<xsl:apply-templates _select="//{$first}, //{$second}"/>
</xsl:template>
<xsl:template _match="{$first}">
<xsl:text>Custom text for class one:</xsl:text><xsl:value-of select="text()"/>
</xsl:template>
<xsl:template _match="{$second}">
<xsl:text>Custom text for class two:</xsl:text><xsl:value-of select="text()"/>
</xsl:template>
</xsl:stylesheet>
Variables in XSLT hold values, not fragments of expressions. (In other words, XSLT is not a macro language).
As an alternative to Martin's solution, which requires XSLT 3.0, you could consider using what are sometimes called "meta-stylesheets" - do a transformation as a pre-processing step on the stylesheet itself. You could even write the generic stylesheet to use the XSLT 3.0 syntax with shadow attributes like _match, and do an XSLT preprocessing phase to convert this to regular XSLT 1.0 or 2.0 syntax for execution.
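To illustrate the point that variables hold values: a variable can carry the class name itself, and a predicate can then compare against it. The following is only a minimal sketch, assuming an XSLT 2.0 processor (XSLT 1.0 does not allow variable references in match patterns). The variable names firstClass and secondClass are made up for illustration. Note that the div[@class = ...] structure is still repeated; only the value is shared.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>

  <!-- Hypothetical names: each variable holds a string value, not an expression fragment -->
  <xsl:variable name="firstClass" select="'one'"/>
  <xsl:variable name="secondClass" select="'two'"/>

  <xsl:template match="/">
    <!-- The XPath 2.0 comma operator fixes the processing order: class one first, then class two -->
    <xsl:apply-templates select="//div[@class = $firstClass], //div[@class = $secondClass]"/>
  </xsl:template>

  <!-- A global variable may be referenced inside a predicate of a match pattern in XSLT 2.0 -->
  <xsl:template match="div[@class = $firstClass]">
    <xsl:text>Custom text for class one: </xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="div[@class = $secondClass]">
    <xsl:text>Custom text for class two: </xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

This assumes the input is well-formed (XHTML or pre-tidied HTML) and that the div elements are in no namespace.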
I tried to transform an XML document to HTML within a web browser via two XSL transformations.
Long story short: XML => XML => HTML
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<?xml-stylesheet type="text/xsl" href="enrich.xsl" ?>
<?xml-stylesheet type="text/xsl" href="overview.xsl" ?>
<project></project>
The first XSL should add some elements to the XML,
the second XSL should transform the result from the first step to HTML.
My target is to get HTML displayed at the end.
Both XSL are transformed separately.
It seems to me that Safari, Firefox and Chrome do not execute more than one processing instruction. Is this true, or am I missing something?
I have never tried to execute two separate transformations in a web browser, but you could try this kind of pattern to do "2 transforms in 1" (this only works with XSLT 2.0, because of the variable structure):
<xsl:template match="/">
<!-- Use a variable to store the result of the first transformation. -->
<xsl:variable name="result1">
<!-- Use a mode called transform1 (or whatever) to distinguish the templates of
transform1 from those of transform2. -->
<xsl:apply-templates select="*" mode="transform1"/>
</xsl:variable>
<!-- Run the second transform on the result variable (you could use a
mode to formally distinguish the templates of transform2, or you could use the
default mode for them). -->
<xsl:apply-templates select="$result1"/>
</xsl:template>
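Since browsers generally ship XSLT 1.0 processors, the same two-pass idea can also be sketched in XSLT 1.0. There the variable holds a result tree fragment, which has to be converted with an extension function before templates can be applied to it. The sketch below assumes the processor supports the EXSLT exsl:node-set() function, which is not guaranteed everywhere. The mode name enrich and the added enriched attribute are placeholders, not taken from the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">
  <xsl:output method="html"/>

  <xsl:template match="/">
    <!-- Pass 1: build the enriched XML as a result tree fragment -->
    <xsl:variable name="pass1">
      <xsl:apply-templates select="*" mode="enrich"/>
    </xsl:variable>
    <!-- Pass 2: convert the fragment to a node-set and render HTML from it -->
    <xsl:apply-templates select="exsl:node-set($pass1)/*"/>
  </xsl:template>

  <!-- Pass 1 rule: identity copy in mode "enrich" -->
  <xsl:template match="@*|node()" mode="enrich">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="enrich"/>
    </xsl:copy>
  </xsl:template>

  <!-- Pass 1 rule: placeholder enrichment that marks the project element -->
  <xsl:template match="project" mode="enrich">
    <xsl:copy>
      <xsl:attribute name="enriched">yes</xsl:attribute>
      <xsl:apply-templates select="@*|node()" mode="enrich"/>
    </xsl:copy>
  </xsl:template>

  <!-- Pass 2 rule: default mode, produces the HTML page -->
  <xsl:template match="project">
    <html>
      <body>
        <p>Enriched: <xsl:value-of select="@enriched"/></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

With this layout the XML document needs only a single xml-stylesheet processing instruction, pointing at the combined stylesheet.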
I am attempting to place the current date/time into an XSLT document.
The XSLT:
<xsl:template name="global" match="/">
<body>
<h1>Feed</h1>
<div class="date"><xsl:value-of select="current-dateTime()"/></div>
<xsl:apply-templates select="products[@productCode='REMSA']"/>
</body>
</xsl:template>
When the current-dateTime() function is in place, it and anything below it will not render on the page. Anything above it shows up just fine. I get no errors, just blank space. This is day 4 of me looking at XSLT, so I am very new to this language. Any help, tips, or recommendations will go a long way.
Thank You!
Maybe you forgot to declare the XPath functions namespace, which gives access to the current-dateTime() function.
Add the fn declaration to your xsl root element:
xmlns:fn="http://www.w3.org/2005/xpath-functions"
...and change your statement to:
<xsl:value-of select="fn:current-dateTime()"/>
see also:
http://www.w3schools.com/xpath/xpath_functions.asp#datetime
Can an XSLT insert the current date?
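Note that current-dateTime() is an XPath 2.0 function, so it is only available when the stylesheet runs on an XSLT 2.0 (or later) processor. If the processor only implements XSLT 1.0, one common workaround is to let the calling code pass the date in as a stylesheet parameter. A minimal sketch, assuming a parameter named now (a made-up name) that the host application sets before running the transformation:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Hypothetical parameter: the caller supplies the timestamp as a string.
       The default value is only a visible fallback for when nothing is passed. -->
  <xsl:param name="now" select="'(date not supplied)'"/>

  <xsl:template match="/">
    <body>
      <h1>Feed</h1>
      <!-- Works in XSLT 1.0 because no date function is called here -->
      <div class="date"><xsl:value-of select="$now"/></div>
      <xsl:apply-templates select="products[@productCode='REMSA']"/>
    </body>
  </xsl:template>
</xsl:stylesheet>
```

With xsltproc, for example, the value could be supplied as xsltproc --stringparam now "2024-01-01T12:00:00" feed.xsl feed.xml, where the file names and the timestamp are placeholders.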
I am converting a DocBook document to HTML using the 1.77 XSL transformation. When it is transformed, a Table of Contents is generated automatically. How do you change this behavior?
I have found this: Disable table of contents for documents
So I am guessing that html xsl transform would be the presentation system?
Elaborate DocBook formatting is meant to be customized using an xsl stylesheet.
See Also
Writing a DocBook Customization Layer for Formatting
Customizing Table Of Contents Using XSL
customize_formatting.xsl: Example DocBook XSL Customization Layer
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format"
version="1.0"> <!-- change this to 2.0 if your tools support it -->
<xsl:import href="http://docbook.sourceforge.net/release/xsl/current/fo/docbook.xsl"/>
<!--uncomment this to suppress toc when using XSL 1.0 or 2.0
<xsl:param name="generate.toc">article</xsl:param>
<xsl:param name="generate.toc">book</xsl:param>
-->
<!--uncomment this to suppress toc when using XSL 2.0
<xsl:param name="generate.toc">
article nop
book nop
</xsl:param>
-->
</xsl:stylesheet>
How to Use customize_formatting.xsl
Point your tools to use customize_formatting.xsl instead of the off-the-shelf docbook.xsl. Then, put all your formatting customizations in the body of the <xsl:stylesheet> section.
For TOC suppression, you can just uncomment the appropriate line.
There is a quirk with some (or maybe all) XSL 1.0 tools that seems to prevent them from handling the whitespace-separated pairs used in the body of <xsl:param name="generate.toc">. I have had success suppressing the TOC by just using the single word article or book instead of the proper whitespace-separated pairs.
When transforming, you can use the Transformer#setParameter(String, Object) method to specify no TOC generation like this:
transformer.setParameter("generate.toc", "nop");
I am trying to create a new XML file from an existing one using XSL. When writing the new file, I want to mask the data appearing in the AccountName field.
This is what my XML looks like:
<?xml version="1.0" encoding="UTF-8"?>
<Sumit>
<AccountName>Sumit</AccountName>
<CCT_datasetT id="Table">
<row>
<CCTTitle2>Title</CCTTitle2>
</row>
</CCT_datasetT>
</Sumit>
Here is my XSL Code:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8" indent="yes" omit-xml-declaration="no" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="@*">
<xsl:attribute namespace="{namespace-uri()}" name="{name()}"/>
</xsl:template>
<xsl:template match="AccountName">
<AccountName>acc_no</AccountName>
</xsl:template>
</xsl:stylesheet>
When I apply the XSL code to my XML, I get the following output:
<?xml version="1.0" encoding="UTF-16"?>
<Sumit>
<AccountName>acc_no</AccountName>
<CCT_datasetT id="">
<row>
<CCTTitle2>Title</CCTTitle2>
</row>
</CCT_datasetT>
</Sumit>
with the following issues:
1) It creates the output using UTF-16 encoding
2) The output of the second line is:
<CCT_datasetT id="">
The attribute value (Table) is missing.
Can anyone please tell me how to get rid of these two issues? Many thanks.
@Evan Lenz:
Here is the javascript code:
var oArgs = WScript.Arguments;
if (oArgs.length == 0)
{
WScript.Echo ("Usage : cscript xslt.js xml xsl");
WScript.Quit();
}
xmlFile = oArgs(0) + ".xml";
xslFile = oArgs(1) + ".xsl";
var xml = new ActiveXObject("Microsoft.XMLDOM")
xml.async = false
xml.load(xmlFile)
// Load the XSL
var xsl = new ActiveXObject("Microsoft.XMLDOM")
xsl.async = false
xsl.load(xslFile)
// Transform
var msg = xml.transformNode(xsl)
var fso = new ActiveXObject("Scripting.FileSystemObject");
// Open the text file at the specified location with write mode
var txtFile = fso.OpenTextFile("Output.xml", 2, false, 0);
txtFile.Write(msg);
txtFile.close();
It creates the output in a new file "Output.xml", but I don't know why the encoding is getting changed. I am more concerned about it, because of the following reason:
My input XML contains the following code:
<Status></Status>
And in the output it appears as
<Status>
</Status>
A carriage return is introduced for all empty tags. I am not sure whether it has something to do with the encoding. Please suggest.
Many Thanks.
Remove your second template rule. The first template rule (the identity rule) will already copy attributes for you. By including the second one (which has the explicit <xsl:attribute> instruction), you're creating a conflict--an error condition, and the XSLT processor is recovering by picking the one that comes later in your stylesheet. The reason the "id" attribute is empty is that your second rule is creating a new attribute with the same name but with no value. But again, that second rule is unnecessary anyway, so you should just delete it. That will solve the missing attribute value issue.
As for the output encoding, it sounds like your XSLT processor is not honoring the <xsl:output> directive you've given it, or it's being invoked in a context (such as a server-side framework?) where the encoding is determined by the framework, rather than the XSLT code. What XSLT processor are you using and how are you invoking it?
UPDATE (re: character encoding):
The save Method (DOMDocument) documentation says this:
Character encoding is based on the encoding attribute in the XML declaration, such as <?xml version="1.0" encoding="windows-1252"?>. When no encoding attribute is specified, the default setting is UTF-8.
I would try using transformNodeToObject() and save() instead of outputting to a string.
I haven't tested this, but you probably want something like this:
var result = new ActiveXObject("Microsoft.XMLDOM")
// Transform
xml.transformNodeToObject(xsl, result);
result.save("Output.xml");
UPDATE (re: unwanted whitespace):
If you want to have ultimate control over what whitespace appears in the result, you should not specify indent="yes" on the <xsl:output> element. Try removing that.
Try this:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8" indent="yes" omit-xml-declaration="no" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- You don't actually need this template -->
<!-- but I think this was what you were trying to do -->
<xsl:template match="@*" priority="2">
<xsl:attribute namespace="{namespace-uri()}" name="{name()}"><xsl:value-of select="."/></xsl:attribute>
</xsl:template>
<xsl:template match="AccountName" priority="2">
<AccountName>acc_no</AccountName>
</xsl:template>
</xsl:stylesheet>
As for the UTF issue, you are doing the right thing.
From www.w3.org/TR/xslt:
The encoding attribute specifies the preferred encoding to use for outputting the result tree. XSLT processors are required to respect values of UTF-8 and UTF-16.