Select distinct-values according to child node in XQuery - duplicates

Let's say I have the following XML:
<info>
<channel>
<A>
<X>
<title>title1</title>
</X>
<Y value="20"/>
</A>
</channel>
<channel>
<A>
<X>
<title>title1</title>
</X>
<Y value="20"/>
</A>
<A>
<X>
<title>title2</title>
</X>
<Y value="20"/>
</A>
</channel>
</info>
and the following XQuery
{
for $A in doc('test.xml')//A
let $TITLE := $A/X/title
where string($A/Y/value) > 20
return
string($TITLE)
}
this, of course, outputs:
title1
title1
title2
How can I use distinct-values in order to remove duplicates? I wonder because for essentially only gives me one item per iteration and I can't call distinct-values on $A. Or is there any other way to remove duplicate output?
The problem is that I need to refer to another node, so basically calling distinct-values(doc...) doesn't work, as it doesn't return nodes.

UPDATE
To filter duplicate nodes, use a variation of the XPath from this answer:
//A[index-of(//A/X/title, X/title)[1]]
This gives you all the A elements with different titles.
You can extend this XPath expression to also filter on Y; there is no need for an XQuery FLWOR expression.
UPDATE END
Apply distinct-values to the XPath expression over which you want to iterate:
for $title in distinct-values(doc('test.xml')//A/X/title)
return string($title)
or just
distinct-values(doc('test.xml')//A/X/title)
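The same "first node per distinct title" idea can be checked outside an XQuery processor. The following is only a minimal Python sketch using the standard library's ElementTree; the helper name distinct_by_title and the min_value parameter are made up for illustration, and the filter uses >= 20 as an assumption, because a strict > 20 would exclude every A in the sample data.

```python
import xml.etree.ElementTree as ET

# Sample data from the question.
XML = """<info>
  <channel>
    <A><X><title>title1</title></X><Y value="20"/></A>
  </channel>
  <channel>
    <A><X><title>title1</title></X><Y value="20"/></A>
    <A><X><title>title2</title></X><Y value="20"/></A>
  </channel>
</info>"""


def distinct_by_title(root, min_value):
    """Map each distinct X/title to the first <A> whose Y/@value >= min_value.

    Hypothetical helper: keeping the element itself (not just the title string)
    means other child nodes of A stay reachable after deduplication.
    """
    first_by_title = {}
    for a in root.iter("A"):
        title = a.findtext("X/title")
        value = int(a.find("Y").get("value"))
        if value >= min_value and title not in first_by_title:
            first_by_title[title] = a
    return first_by_title


root = ET.fromstring(XML)
result = distinct_by_title(root, 20)
print(list(result))
```

Because the dictionary keeps the A element for each title, you can still read Y or any other child from the surviving node, which is what a plain distinct-values on strings cannot give you.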


Why is contains(text(), "string" ) not working in XPath?

I have written this expression //*[contains(text(), "Brand:" )] for the below HTML code.
<div class="info-product mt-3">
<h3>Informazioni prodotto</h3>
Brand: <span class="brand_title font-weight-bold text-uppercase">Ava</span><br> SKU: 8002910009960<br> Peso Lordo: 0.471 kg <br> Dimensioni: 44.00 × 145.00 × 153.00 mm<br>
<p class="mt-2">
AVA BUCATO A MANO E2 GR.380</p>
</div>
The XPath that I have written is not working. I want to select the node that contains the text Brand:. Can someone tell me my mistake?
Your XPath,
//*[contains(text(), "Brand:")]
in XPath 1.0 will select all elements whose first text node child contains a "Brand:" substring. In XPath 2.0 it is an error to call contains() with a sequence of more than one item as the first argument.
This XPath,
//*[text()[contains(., "Brand:")]]
will select all elements with a text node child whose string value contains a "Brand:" substring.
See also
XPath 1.0 vs 2.0+ different contains() behavior explanation
Testing text() nodes vs string values in XPath
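To see why testing only the first text node fails for this markup, here is a small Python sketch that imitates the two expressions with the standard library's ElementTree. It is an illustration under assumptions, not an XPath engine: the helper names are made up, the HTML is trimmed and made well-formed (<br/> instead of <br>), and ElementTree's text/tail attributes stand in for XPath text nodes.

```python
import xml.etree.ElementTree as ET

# Well-formed, shortened version of the markup from the question.
HTML = """<div class="info-product mt-3">
<h3>Informazioni prodotto</h3>
Brand: <span class="brand_title">Ava</span><br/> SKU: 8002910009960<br/>
<p class="mt-2">AVA BUCATO A MANO E2 GR.380</p>
</div>"""


def text_node_children(elem):
    """Yield the text that sits directly inside elem, in document order."""
    if elem.text is not None:
        yield elem.text          # text before the first child element
    for child in elem:
        if child.tail is not None:
            yield child.tail     # text after each child element


def first_text_contains(elem, needle):
    """Roughly contains(text(), needle) in XPath 1.0: only the first text node counts."""
    nodes = list(text_node_children(elem))
    return bool(nodes) and needle in nodes[0]


def any_text_contains(elem, needle):
    """Roughly text()[contains(., needle)]: any text node child may match."""
    return any(needle in t for t in text_node_children(elem))


div = ET.fromstring(HTML)
print(first_text_contains(div, "Brand:"))  # first text node is just whitespace
print(any_text_contains(div, "Brand:"))    # "Brand:" follows the <h3>
```

The first text node of the div is the whitespace before the h3, so the first check is false, while the second check finds "Brand:" in the text that follows the h3.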

(Reverse) Traverse XPath Query for Accessing a DIV with a particular Text Value

Working with a DOM that has the same HTML loop 100+ times that looks like this
<div class="intro">
<div class="header">
<h1 class="product-code"> <span class="code">ZY001</span> <span class="intro">ZY001 Title/Intro</span> </h1>
</div>
<div>
<table>
<tbody>
<tr>
<td>Available</td>
<td> S </td>
<td> M </td>
<td> XL </td>
</tr>
I was previously using this XPath query to get ALL the node values back (all 100+ instances of the block, together with the variable number of td nodes that may follow Available):
//div[@class='intro']/div/table/tbody/tr/td[contains(text(),'Available')]/following-sibling::td
object(DOMNodeList)[595]
public 'length' => int 591
Now I am needing to target the product-code / code specifically to retrieve all the td attributes for a particular code
Because the div that contains the unique identifier (in the example above, ZY001) is not a direct ancestor, my thinking is I have to do a Reverse XPath Query
Here's one of my attempts:
//h1[@class='product-code']/span[contains(@class, 'code') and text() = 'ZY001']/../../div[@class='intro']/div/table/tbody/tr/td[contains(text(),'Available')]/following-sibling::td
As I am defining /span[contains(@class, 'code') and text() = 'ZY001'] and then attempting to traverse the DOM backwards twice using /../../, I was hoping/expecting to get back the div[@class='intro'] with the text ZY001 immediately above it, or rather a public 'length' => int 1
But all my attempts thus far have resulted in 0 results. Not false, indicating an improper XPath, but 0.
How can I modify my XPath Query to get back the single instance in the one-of-many <div class="intro">'s that contain the <h1 class="product-code">/<span class="code"> text value ZY001?
Use
//h1[@class='product-code']/span[contains(@class, 'code') and text() = 'ZY001']/../../../div/table/tbody
instead of
//h1[@class='product-code']/span[contains(@class, 'code') and text() = 'ZY001']/../../div[@class='intro']/div/table/tbody
You can use any of the XPath expressions below. Note the leading dot in the predicate (.//): without it, the inner path is evaluated from the document root and every div with class intro matches.
//div[@class='intro' and .//h1[@class='product-code']/span[@class='code' and text()='ZY001']]//tbody/tr[td[text()='Available']]/td[2]
//div[@class='intro' and .//span[@class='code' and text()='ZY001']]//tbody/tr[td[text()='Available']]/td[2]
//div[@class='intro' and .//span[@class='code' and text()='ZY001']]//tr[td[text()='Available']]/td[2]
Change td[2] to td[3] and td[4] to get the 3rd and 4th td respectively.
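The scoping idea (pick the one intro block that contains the wanted code, then read the cells after Available inside that block only) can be sketched in Python with the standard library's ElementTree. This is an assumption-laden illustration: the helper name sizes_for is made up, the markup is wrapped in a body element, and the second block with code ZY002 is hypothetical data added so that there is something to exclude.

```python
import xml.etree.ElementTree as ET

# First block follows the question; the ZY002 block is hypothetical test data.
HTML = """<body>
<div class="intro">
  <div class="header">
    <h1 class="product-code"><span class="code">ZY001</span> <span class="intro">ZY001 Title/Intro</span></h1>
  </div>
  <div><table><tbody><tr><td>Available</td><td> S </td><td> M </td><td> XL </td></tr></tbody></table></div>
</div>
<div class="intro">
  <div class="header">
    <h1 class="product-code"><span class="code">ZY002</span> <span class="intro">ZY002 Title/Intro</span></h1>
  </div>
  <div><table><tbody><tr><td>Available</td><td> L </td></tr></tbody></table></div>
</div>
</body>"""


def sizes_for(root, code):
    """Return the cell texts that follow 'Available' in the block for the given code."""
    for block in root.findall(".//div[@class='intro']"):
        # Search relative to this block only, like .// inside an XPath predicate.
        codes = [span.text for span in block.findall(".//span[@class='code']")]
        if code not in codes:
            continue
        for row in block.iter("tr"):
            texts = [(cell.text or "").strip() for cell in row.findall("td")]
            if "Available" in texts:
                # Equivalent of following-sibling::td after the Available cell.
                return texts[texts.index("Available") + 1:]
    return []


root = ET.fromstring(HTML)
print(sizes_for(root, "ZY001"))
```

The important part is that the search for the code is relative to each block; searching from the document root instead would make every block match as soon as the code exists anywhere on the page.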

How can I get the first item in a list using free marker?

I have the following code which contains about 12 items but I only need to retrieve the first item. How can I display the first item in my list?
My code is:
<#list analysttest.rss.channel.item as item>
<div>
<h3 class="bstitle">${item.title}</h3>
<span class="bsauthor">${item.author}</span>
<span>${item.pubDate}</span>
<p>${item.description}</p>
</div>
</#list>
analysttest.rss.channel.item[0] gives the first item, which you can #assign to a shorter name for convenience. Note that at least one item must exist, or else you get an error. (Alternatively, you can write something like <#assign item = analysttest.rss.channel.item[0]!someDefault>, where someDefault is a value such as '', [], or {}, depending on what you need. There is an even shorter form, <#assign item = analysttest.rss.channel.item[0]!>, which uses a multi-typed "generic nothing" value as the default; see the Manual.)
Listing is also possible, though odd for only one item: <#list analysttest.rss.channel.item[0..*1] as item>, where *1 means at most length of 1 (requires FreeMarker 2.3.21 or later). This works (and outputs nothing) even if you have 0 items.
<#assign item = analysttest.rss.channel.item[0]>
<div>
<h3 class="bstitle">${item.title}</h3>
<span class="bsauthor">${item.author}</span>
<span>${item.pubDate}</span>
<p>${item.description}</p>
</div>

How to get the position of a node based on the value of a child element

In my xml I want specific menuitems/menuitem nodes that will be at different arbitrary positions under its parent (I don't want hardcoded position selector).
Is it possible to get the position of a menuitem node that has the right value in the name element under it, meaning menuitems/menuitem/name. In short: selecting the menuitem that has the right name value under it.
<one>
<menuitems>
<menuitem> <!-- I dont want this one -->
<name>
...
</name>
</menuitem>
<menuitem> <!-- I want this one at position 2 under <one> -->
<name>
... <!-- Based one correct name value here -->
</name>
</menuitem>
</menuitems>
</one>
<two>
<menuitems>
<menuitem> <!-- I want this one at position 1 under <two> -->
<name>
...
</name>
</menuitem>
</menuitems>
</two>
I can easily find out if one menuitem under menuitems has the correct name value. Like so:
<xsl:value-of select="current()/menuitems/menuitem/name = 'OhYes'"></xsl:value-of>
Which will return true. But at which position is this menuitem among the other menuitem elements that returned true? I am selecting under the same parent and at the same level.
I want to avoid this:
<xsl:if test="current()/menuitems/menuitem[1]/name = 'OhYes'"> .. </xsl:if>
<xsl:if test="current()/menuitems/menuitem[2]/name = 'OhYes'"> .. </xsl:if>
Use this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:for-each select="/*/*/menuitems/menuitem[name='OhYes']">
position: <xsl:text/>
<xsl:value-of select="count(preceding-sibling::menuitem) +1"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the XML document from the question (wrapped in a single root element, with OhYes as the name value of the wanted menuitem elements), the wanted, correct result is produced:
position: 2
position: 1
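The expression count(preceding-sibling::menuitem) + 1 is simply the 1-based index of the matching menuitem among its menuitem siblings. As a cross-check, here is a minimal Python sketch of the same computation with the standard library's ElementTree. The helper name positions, the wrapping root element, and the name value Nope for the unwanted item are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Question's structure, wrapped in a root element; "Nope" is a placeholder name.
XML = """<root>
  <one><menuitems>
    <menuitem><name>Nope</name></menuitem>
    <menuitem><name>OhYes</name></menuitem>
  </menuitems></one>
  <two><menuitems>
    <menuitem><name>OhYes</name></menuitem>
  </menuitems></two>
</root>"""


def positions(root, wanted):
    """Return (container tag, 1-based position) for each menuitem named `wanted`."""
    found = []
    for container in root:
        for menuitems in container.findall("menuitems"):
            # enumerate(..., start=1) plays the role of
            # count(preceding-sibling::menuitem) + 1.
            for pos, item in enumerate(menuitems.findall("menuitem"), start=1):
                if (item.findtext("name") or "").strip() == wanted:
                    found.append((container.tag, pos))
    return found


root = ET.fromstring(XML)
print(positions(root, "OhYes"))
```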
You can first separate out the one and two containers using getElementsByTagName. Then, for each section (one, two), iterate through all the names by building a list with getElementsByTagName and check each element to see whether the name is correct. Since each menuitem position corresponds to the name position under each heading (one, two), you can use the .parentNode accessor on each matching name node to get the menuitem node, and the index of the iterator to get its position.
For example, for the XML Code
<one>
<menuitems>
<menuitem> <!-- I dont want this one -->
<name>
the incorrect name
</name>
</menuitem>
</menuitems>
<menuitems>
<menuitem> <!-- I want this one at position 2 -->
<name>
the correct name <!-- Based one correct name value here -->
</name>
</menuitem>
</menuitems>
</one>
<two>
<menuitems>
<menuitem> <!-- I want this one at position 1 -->
<name>
the correct name
</name>
</menuitem>
</menuitems>
</two>
The following code alerts the index of the name nodes (and their parent menuitem) under the current container (one, two) that contain the text "the correct name":
names = new Array();
// Find the correct position for the menuitem under one
one = document.getElementsByTagName("one")[0];
names = one.getElementsByTagName("name");
for (var i=0; i<names.length; i++){
if (names[i].innerHTML.search("the correct name") >= 0)
alert("For one: Correct name found at name node index " + (i+1) + " and its parent menuitem is " + names[i].parentNode);
// names[i].parentNode is the reference to the menuitem in one that contains the correct name
}
// Find the correct position for the menuitem under two
two = document.getElementsByTagName("two")[0];
names = two.getElementsByTagName("name");
for (var i=0; i<names.length; i++){
if (names[i].innerHTML.search("the correct name") >= 0)
alert("For two: Correct name found at name node index " + (i+1) + " and its parent menuitem is " + names[i].parentNode);
// names[i].parentNode is the reference to the menuitem in two that contains the correct name
}
I have also created a fiddle at http://jsfiddle.net/nMN3j/1/ so you can try it out, and see how it works.
This code will alert that under container "one" a menuitem tag was found at position 2 that has a correct name sub-tag, but not at position 1, which has an incorrect name sub-tag.
For container "two", the code will alert that a menuitem was found at position 1 that has a correct name.
Hope this helps!

Extract data from html/xml

I'm using Webharvest to retrieve data from websites. It converts the html pages to xml documents before getting for me the wanted data based on the xPath provided.
Now I'm working on a page like this: pastebin Where I showed the blocks I'd like to get. Each block should be returned as a single unit.
The XPath of the first element of the block is: //div[@id="layer22"]/b/span[@style="background-color: #FFFF99"]
I tested it and it returns all the "block start" elements.
The XPath of the last element of the block is: //div[@id="layer22"]/a[contains(.,"Join")]
I tested it and it returns all the "block end" elements.
The xPath should return a set of blocks as:
(xPath)[1] = block 1
(xPath)[2] = block 2
....
Thank you in advance
Use (for the first wanted result):
($first)[1] | ($last)[1]
|
($first)[1]/following::node()
[count(.|($last)[1]/preceding::node()) = count(($last)[1]/preceding::node())]
where you need to substitute $first with:
//div[@id="layer22"]/b/span[@style="background-color: #FFFF99"]
and substitute $last with:
//div[@id="layer22"]/a[contains(.,"Join")]
To get the k-th result, substitute in the final expression ($first)[1] with ($first)[{k}] and ($last)[1] with ($last)[{k}], where {k} should be replaced by the number k.
This technique follows directly from the well-known Kayessian formula for set intersection in XPath 1.0:
$ns1[count(.|$ns2) = count($ns2)]
which selects the intersection of the two node-sets $ns1 and $ns2 .
Here is XSLT verification with a simple example:
<nums>
<num>01</num>
<num>02</num>
<num>03</num>
<num>04</num>
<num>05</num>
<num>06</num>
<num>07</num>
<num>03</num>
<num>07</num>
<num>10</num>
</nums>
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="v1" select=
"(//num[. = 3])[1]/following-sibling::*"/>
<xsl:variable name="v2" select=
"(//num[. = 7])[1]/preceding-sibling::*"/>
<xsl:template match="/">
<xsl:copy-of select=
"$v1[count(.|$v2) = count($v2)]"/>
</xsl:template>
</xsl:stylesheet>
applies the XPath expression and the selected nodes are copied to the output:
<num>04</num>
<num>05</num>
<num>06</num>
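The intersection idea behind this result (nodes that follow the start marker and also precede the end marker) can be mirrored in a few lines of Python with the standard library's ElementTree. This is only a sketch of the set logic, not of XPath itself; the helper name between is made up, and the numeric comparison imitates . = 3 and . = 7 in the stylesheet above.

```python
import xml.etree.ElementTree as ET

# Same sample document as in the XSLT verification.
VALUES = ["01", "02", "03", "04", "05", "06", "07", "03", "07", "10"]
XML = "<nums>" + "".join(f"<num>{v}</num>" for v in VALUES) + "</nums>"


def between(root, first_value, last_value):
    """Return the siblings after the first `first_value` and before the first `last_value`."""
    nums = list(root)
    start = next(i for i, n in enumerate(nums) if int(n.text) == first_value)
    end = next(i for i, n in enumerate(nums) if int(n.text) == last_value)
    following = nums[start + 1:]                 # like following-sibling::*
    preceding_ids = {id(n) for n in nums[:end]}  # like preceding-sibling::*
    # Keep a node from `following` only if it is also in `preceding`:
    # the same test as count(.|$ns2) = count($ns2).
    return [n for n in following if id(n) in preceding_ids]


root = ET.fromstring(XML)
print([n.text for n in between(root, 3, 7)])
```

Membership is tested by node identity rather than by text value, which matters here because the values 03 and 07 occur more than once in the document.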