XPath choose one if both exist - html

I have something like
<li class="ProductPrice">
<span class="Regular Price">80.00</span>
<span class="Sale Price">50.00</span>
</li>
<li class="ProductPrice">
<span class="Regular Price">100.00</span>
</li>
where some items might not have the Sale Price span.
I would like to extract the current retail price: the Sale Price when it exists (even if a Regular Price is also present), and otherwise the Regular Price.
I'm new to XPath, so I'm not sure how this if-else could be translated.

If you know that the Sale price will always come after the Regular price, use the XPath expression
span[@class = 'Regular Price' or @class = 'Sale Price'][last()]
In XPath 2.0 you can use this approach even if you don't know the order:
(span[@class = 'Sale Price'], span[@class = 'Regular Price'])[1]
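The "first item of a preference-ordered sequence" idea behind the XPath 2.0 expression can be sketched in plain Python. This is only an illustration using the standard-library ElementTree module, not an XPath 2.0 engine; the `<ul>` wrapper and the helper name `current_price` are assumptions added so the sample parses as XML.

```python
import xml.etree.ElementTree as ET
from itertools import chain

# Sample markup from the question, wrapped in a <ul> root (an assumption)
# so that it is a well-formed XML document.
DATA = """<ul>
<li class="ProductPrice">
<span class="Regular Price">80.00</span>
<span class="Sale Price">50.00</span>
</li>
<li class="ProductPrice">
<span class="Regular Price">100.00</span>
</li>
</ul>"""


def current_price(li):
    """Return the Sale Price text if present, otherwise the Regular Price text."""
    # Mirrors (sale, regular)[1]: concatenate in preference order, take the first.
    candidates = chain(
        li.findall("span[@class='Sale Price']"),
        li.findall("span[@class='Regular Price']"),
    )
    first = next(candidates, None)
    return first.text if first is not None else None


prices = [current_price(li) for li in ET.fromstring(DATA).iter("li")]
print(prices)  # ['50.00', '100.00']
```

Because the candidates are concatenated in preference order rather than document order, the result does not depend on which span comes first inside the li.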

An XSLT-1.0 solution is a bit more complicated:
<xsl:template match="li">
  <xsl:choose>
    <xsl:when test="span/@class='Sale Price'">
      <xsl:value-of select="span[@class='Sale Price']" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="span[@class='Regular Price']" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
So if you have XPath-2.0 or above available you could use the following:
<xsl:template match="li">
  <xsl:value-of select="if (span/@class='Sale Price') then (span[@class='Sale Price']) else (span[@class='Regular Price'])" />
</xsl:template>
The output of both solutions is the same:
<?xml version="1.0" encoding="UTF-8"?>
50.00
100.00
The logic is not exactly what you described, but it is close.

You can use the lxml module in Python. It is by far the easiest module I have used for this.
from lxml import html
data = '''<li class="ProductPrice">
<span class="Regular Price">80.00</span>
<span class="Sale Price">50.00</span>
</li>
<li class="ProductPrice">
<span class="Regular Price">100.00</span>
</li>
'''
# build the HTML tree
tree = html.fromstring(data)
li = tree.xpath('//li')  # get all the li elements
for i in li:
    # the leading dot keeps each query relative to the current li
    sp = i.xpath('.//span[contains(@class,"Sale Price")]/text()')
    rp = i.xpath('.//span[contains(@class,"Regular Price")]/text()')
    if sp:
        print('Price is :', sp[0])
    else:
        print('Price is :', rp[0])
What I did was extract the sale price and check whether it is there. If it is present, the program prints it; otherwise it prints the regular price.
Note
Remember to start the XPath with a dot (.) when querying relative to an individual element; otherwise // searches the whole document.
For expressions that select nodes or text, xpath() returns a list.
If you have any questions, leave a comment.

Related

XSL: How to concat additional string when setting a HTML table class

Is there a way to concat some name to a class with a variable?
<table style="display:none;border-style:solid;">
<xsl:attribute name="class">
<xsl:value-of select="BookName"/>
</xsl:attribute>
This piece of my code sets the class to the value of "BookName" from the XML, but I need to concatenate it with the static text "booktable". BookName varies, while booktable is always the same, so the result would be, for example, class="NewEncouters2009 booktable".
Simply use
<table style="display:none;border-style:solid;" class="{BookName} booktable">
As for xsl:attribute, you can of course put in any static text there e.g.
<xsl:attribute name="class"><xsl:value-of select="BookName"/> booktable</xsl:attribute>

how to separate a value from its node and group the node with others of its type

The end result I need is for all the text nodes to have the same indent. The @name field is not a constant size. Parentnode has a varying number of children that must be parsed in the order received. possibleothernodes are not explicitly ordered in all cases.
XML:
<parentnode>
<possibleothernodes1...n/>
<node name="SomeBoldText">
<text>Text1</text>
</node>
<node>
<text>Text2</text>
</node>
<node>
<text>Text3</text>
</node>
<node>
<text>Text4</text>
</node>
<possibleothernodes2...n/>
</parentnode>
I need the resulting HTML to look like
possibleothernodes1
SomeBoldText: Text1
Text2
Text3
Text4
possibleothernodes2
My real question right now is: how do I group Text1, Text2, Text3 and Text4 into one div tag, and the @name into a different div tag? With two divs I can just float them to where they need to be.
How about something like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes"/>
<xsl:template match="parentnode">
<div>
<h1>
<xsl:value-of select="node/@name"/>
</h1>
<div>
<xsl:apply-templates select="node/text" />
</div>
</div>
</xsl:template>
<xsl:template match="node/text">
<div>
<xsl:value-of select ="."/>
</div>
</xsl:template>
</xsl:stylesheet>
When run on your sample input, the result is:
<div>
<h1>SomeBoldText</h1>
<div>
<div>Text1</div>
<div>Text2</div>
<div>Text3</div>
<div>Text4</div>
</div>
</div>
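For comparison, the same grouping (names in one div, texts in another) can be sketched outside XSLT with Python's standard-library ElementTree. This is a minimal sketch under assumptions: the possibleothernodes placeholders are left out of the sample, and the class names "label" and "texts" are hypothetical hooks for the floats mentioned in the question.

```python
import xml.etree.ElementTree as ET

# Sample input from the question, without the possibleothernodes placeholders.
SAMPLE = """<parentnode>
  <node name="SomeBoldText"><text>Text1</text></node>
  <node><text>Text2</text></node>
  <node><text>Text3</text></node>
  <node><text>Text4</text></node>
</parentnode>"""


def group(parent):
    """Collect the @name values and the <text> contents separately, in document order."""
    names = [n.get("name") for n in parent.findall("node") if n.get("name")]
    texts = [t.text for t in parent.findall("node/text")]
    return names, texts


names, texts = group(ET.fromstring(SAMPLE))

# Build one div for the names and one for the texts (class names are hypothetical).
wrapper = ET.Element("div")
label = ET.SubElement(wrapper, "div", {"class": "label"})
label.text = " ".join(names)
body = ET.SubElement(wrapper, "div", {"class": "texts"})
for value in texts:
    ET.SubElement(body, "div").text = value

html_out = ET.tostring(wrapper, encoding="unicode")
print(html_out)
```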
OK, this is totally untested and written before the coffee hit, but hopefully it will be a good start, using some jQuery:
$(function () {
    $.ajax({
        url: "../folder/nameoffile.xml",
        dataType: "xml",
        success: function (xml) {
            $(xml).find("node").each(function () {
                // only some nodes carry a name attribute
                var sideName = $(this).attr("name");
                if (sideName) {
                    $("#idofSideDiv").append(sideName);
                }
                // read the <text> child of the current <node>
                var myNode = $(this).find("text").text();
                // ok, this is assuming that you have an ul list
                // to contain the text defined in the node
                $("#idofULList").append("<li>" + myNode + "</li>");
            });
        }
    });
});
Like I said - totally untested, and probably some syntax errors - but hopefully it can be a good start for a google search.

Extract data from html/xml

I'm using Webharvest to retrieve data from websites. It converts the HTML pages to XML documents before extracting the wanted data based on the XPath provided.
Now I'm working on a page like this: pastebin, where I have marked the blocks I'd like to get. Each block should be returned as a single unit.
The XPath of the first element of the block is: //div[@id="layer22"]/b/span[@style="background-color: #FFFF99"]
I tested it and it gives all "bloc start" elements.
The XPath of the last element of the block is: //div[@id="layer22"]/a[contains(.,"Join")]
I tested it and it gives all the "bloc end" elements.
The xPath should return a set of blocks as:
(xPath)[1] = block 1
(xPath)[2] = block 2
....
Thank you in advance
Use (for the first wanted result):
($first)[1] | ($last)[1]
|
($first)[1]/following::node()
[count(.|($last)[1]/preceding::node()) = count(($last)[1]/preceding::node())]
where you need to substitute $first with:
//div[@id="layer22"]/b/span[@style="background-color: #FFFF99"]
and substitute $last with:
//div[@id="layer22"]/a[contains(.,"Join")]
To get the k-th result, substitute in the final expression ($first)[1] with ($first)[{k}] and ($last)[1] with ($last)[{k}], where {k} should be replaced by the number k.
This technique follows directly from the well-known Kayessian formula for set intersection in XPath 1.0:
$ns1[count(.|$ns2) = count($ns2)]
which selects the intersection of the two node-sets $ns1 and $ns2.
Here is XSLT verification with a simple example:
<nums>
<num>01</num>
<num>02</num>
<num>03</num>
<num>04</num>
<num>05</num>
<num>06</num>
<num>07</num>
<num>03</num>
<num>07</num>
<num>10</num>
</nums>
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="v1" select=
"(//num[. = 3])[1]/following-sibling::*"/>
<xsl:variable name="v2" select=
"(//num[. = 7])[1]/preceding-sibling::*"/>
<xsl:template match="/">
<xsl:copy-of select=
"$v1[count(.|$v2) = count($v2)]"/>
</xsl:template>
</xsl:stylesheet>
applies the XPath expression and the selected nodes are copied to the output:
<num>04</num>
<num>05</num>
<num>06</num>
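The counting test behind the Kayessian formula can also be sketched in Python, to show why it works: a node of $ns1 belongs to $ns2 exactly when adding it to $ns2 does not change the size of $ns2. This is an illustration with the standard-library ElementTree module on the same nums sample; the helper name `intersect` is an assumption, and list slicing stands in for the following-sibling and preceding-sibling axes.

```python
import xml.etree.ElementTree as ET

VALUES = ["01", "02", "03", "04", "05", "06", "07", "03", "07", "10"]
NUMS = "<nums>" + "".join(f"<num>{v}</num>" for v in VALUES) + "</nums>"


def intersect(ns1, ns2):
    """Keep each node n of ns1 for which count(n | ns2) = count(ns2)."""
    ids2 = {id(n) for n in ns2}  # node identity, not string value
    return [n for n in ns1 if len(ids2 | {id(n)}) == len(ids2)]


children = list(ET.fromstring(NUMS))
first_3 = next(i for i, n in enumerate(children) if int(n.text) == 3)
first_7 = next(i for i, n in enumerate(children) if int(n.text) == 7)

v1 = children[first_3 + 1:]  # following siblings of the first num equal to 3
v2 = children[:first_7]      # preceding siblings of the first num equal to 7

result = [n.text for n in intersect(v1, v2)]
print(result)  # ['04', '05', '06']
```

The comparison uses node identity, so the second `<num>03</num>` and the second `<num>07</num>` are treated as distinct nodes, just as they are in an XPath node-set.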

xsl generate-id() function returns same id twice for different nodes

I have an input XML for a transformation like this:
<?xml version="1.0" encoding="UTF-8" ?>
<AssetcustomerCollection xmlns="http://xmlns.oracle.com/pcbpel/adapter/db/top/somens">
<Assetcustomer xmlns="">
....
</Assetcustomer>
<Assetcustomer xmlns="">
<accountklantid>000000123456789</accountklantid>
<accountrowid>1-W8HQ1J</accountrowid>
<adrestypeaccnt/>
<adrestypecon/>
<assetbankcode>1173</assetbankcode>
<assetnumber>0000001234</assetnumber>
<assetprodcode>1200</assetprodcode>
<assetproduct>Overeenkomst Rekening-courant</assetproduct>
<assetproductlocatie>00</assetproductlocatie>
<assetstatus>Actief</assetstatus>
<assetsubstatus>Lopende rekening</assetsubstatus>
<assettypecode>0010</assettypecode>
<contactklantid/>
<contactrowid/>
<primairaccount>Y</primairaccount>
<primaircontact>N</primaircontact>
<reltypeaccnt>Hoofdcontractant</reltypeaccnt>
<reltypecon/>
<rowidasset>1-X3XBMO</rowidasset>
<rowidassetaccnt>1-X3XBMQ</rowidassetaccnt>
<rowidassetcon/>
<tnsidaccnt/>
<tnsidcon/>
</Assetcustomer>
<Assetcustomer xmlns="">
....
</Assetcustomer>
<Assetcustomer xmlns="">
<accountklantid/>
<accountrowid/>
<adrestypeaccnt/>
<adrestypecon/>
<assetbankcode>1173</assetbankcode>
<assetnumber>0000004321</assetnumber>
<assetprodcode>1201</assetprodcode>
<assetproduct>WereldPas (Zakelijk)</assetproduct>
<assetproductlocatie>00</assetproductlocatie>
<assetstatus>Actief</assetstatus>
<assetsubstatus>Lopende rekening</assetsubstatus>
<assettypecode>0003</assettypecode>
<contactklantid>000000987654321</contactklantid>
<contactrowid>1-X17PLM</contactrowid>
<primairaccount>N</primairaccount>
<primaircontact>Y</primaircontact>
<reltypeaccnt/>
<reltypecon>Pasverantwoordelijke</reltypecon>
<rowidasset>1-X3XBN0</rowidasset>
<rowidassetaccnt/>
<rowidassetcon>1-X3XBNE</rowidassetcon>
<tnsidaccnt/>
<tnsidcon/>
</Assetcustomer>
<Assetcustomer xmlns="">
....
</Assetcustomer>
</AssetcustomerCollection>
When transforming this input XML I got unexpected output: only 15 of the 16 input Assetcustomer nodes were transformed. I have now found the cause, but cannot explain why it occurs.
The following transformation returns the same id twice:
<xsl:element name="A">
<xsl:value-of select="generate-id(key('AssetRowIDs',/ns0:AssetcustomerCollection/Assetcustomer[rowidasset = '1-X3XBMO']/*)[1])"/>
</xsl:element>
<xsl:element name="B">
<xsl:value-of select="generate-id(key('AssetRowIDs',/ns0:AssetcustomerCollection/Assetcustomer[rowidasset = '1-X3XBN0']/*)[1])"/>
</xsl:element>
<A>N10211</A>
<B>N10211</B>
The generated id for any other node with a different rowidasset is different.
Any ideas before I start pulling my hair out?
Peter
I do not know exactly why, but changing
<xsl:key name="AssetRowIDs" match="Assetcustomer" use="rowidasset"/>
into
<xsl:key name="AssetRowIDs" match="Assetcustomer" use="concat('-',rowidasset,'-')"/>
and
<xsl:for-each select="/ns0:AssetcustomerCollection/Assetcustomer[generate-id() = generate-id(key('AssetRowIDs',rowidasset)[1])]">
into
<xsl:for-each select="/ns0:AssetcustomerCollection/Assetcustomer[generate-id() = generate-id(key('AssetRowIDs',concat('-',rowidasset,'-'))[1])]">
seems to generate a unique id for each node. It still bugs me that I do not understand the cause.
Check the namespace? If the ns0 prefix is bound to a wrong namespace URI, your query will in both cases yield an empty result set. Together with the same first argument for key, that, I imagine, will yield the same call to key() and thus the same ID.
Also I don't think the key() function does what you think it does: http://www.w3schools.com/xsl/func_key.asp
In any case you can apply generate-id() directly on the node set for which you wish to calculate the ID.
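One thing worth checking: in XSLT 1.0, when the second argument of key() is a node-set (here, all child elements selected by `/*`), the result is the union of the lookups for the string value of every node in that set, and `[1]` then takes the first match in document order. The sketch below emulates that rule in Python with hypothetical, cut-down data in which one record has an empty rowidasset. Whether the real (elided) input contains such a record is unknown, so this shows only one way two different records could produce the same generate-id(), not a confirmed diagnosis.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the first record has an empty rowidasset (an assumption).
DOC = """<AssetcustomerCollection>
  <Assetcustomer><rowidasset/><assetnumber>0000000000</assetnumber></Assetcustomer>
  <Assetcustomer><rowidasset>1-X3XBMO</rowidasset><tnsidcon/></Assetcustomer>
  <Assetcustomer><rowidasset>1-X3XBN0</rowidasset><tnsidcon/></Assetcustomer>
</AssetcustomerCollection>"""

records = list(ET.fromstring(DOC))


def string_value(element):
    """XPath string value of an element: all descendant text, concatenated."""
    return "".join(element.itertext())


def key_lookup(values):
    """Emulate key('AssetRowIDs', node-set): union of matches, in document order."""
    wanted = set(values)
    return [r for r in records if string_value(r.find("rowidasset")) in wanted]


def first_match_index(rowid):
    """Index of key('AssetRowIDs', Assetcustomer[rowidasset = rowid]/*)[1]."""
    record = next(r for r in records if string_value(r.find("rowidasset")) == rowid)
    child_values = [string_value(child) for child in record]
    return records.index(key_lookup(child_values)[0])


a = first_match_index("1-X3XBMO")
b = first_match_index("1-X3XBN0")
print(a, b)  # both lookups land on the record with the empty key
```

In this hypothetical setup the empty `<tnsidcon/>` child contributes the key value '' to both lookups, so both return the first record. Passing only the rowidasset value to key(), rather than every child element, avoids that effect.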

xsl any symbol code for value comparison using <xsl:when test

I was wondering what the code for "any symbol" would be.
<xsl:when test="path/path1 = '(ANYSYMBOL)1' ">
This code should let us check whether a value equals X1, #1, %1, 91, and so on.
So what is the any-symbol / any-char code, something like #xxx?
There is no such wildcard in an XPath string comparison.
You have two options:
<xsl:when test="substring(path/path1, 2) = '1'">
and
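<xsl:when test="matches(path/path1, '^.1$')">
The latter, which uses a regular expression, requires XSLT 2.0. The pattern is anchored with ^ and $ because matches() otherwise succeeds when any substring matches.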
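The difference between the two tests, and the effect of anchoring the regular expression, can be checked with a short Python sketch. Python's `re.search` is used here as a stand-in for an unanchored matches() call and `re.fullmatch` for the anchored form; the sample values beyond those in the question ("1", "AB1", "X12") are assumptions added to show the edge cases.

```python
import re

values = ["X1", "#1", "%1", "91", "1", "AB1", "X12"]

# Unanchored: true whenever "any character followed by 1" occurs somewhere.
unanchored = [v for v in values if re.search(r".1", v)]

# Anchored: the whole value is exactly one arbitrary character followed by 1.
anchored = [v for v in values if re.fullmatch(r".1", v)]

# Counterpart of substring(value, 2) = '1' (XPath positions are 1-based).
by_substring = [v for v in values if v[1:] == "1"]

print(unanchored)    # ['X1', '#1', '%1', '91', 'AB1', 'X12']
print(anchored)      # ['X1', '#1', '%1', '91']
print(by_substring)  # ['X1', '#1', '%1', '91']
```

On these values the anchored pattern and the substring test agree, while the unanchored pattern also accepts longer strings such as AB1 and X12.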