Troubleshooting XPath Expression to select nodes based on child node - html

NB - This question is very similar to the other one I asked - Xpath Expression to select nodes based on presence of child node? - however, I'm trying to extend it, and failing.
I have an HTML page listing products.
I'm trying to use Xpath to distinguish between available and sold-out products.
Available products look like this:
<div class="product-widget-container">
<article itemscope="" itemtype="http://schema.org/Product" class="product grid_4 full space omega large " data-productid="1996364" data-name="Daily Wrinkle Defence Essential Skin Reviver Cream Cleanser - 100ml" data-actual-price="5.99" data-is-available="true" data-low-stock="" data-popularity="6" data-smallimgsrc="https://staging.foo.com.au/site_media/uploads/product_image/2014/1/16/pd1996364_94d4a520-7e4a-11e3-930f-000c29c9a057_image_310x434.JPG" data-largeimgsrc="https://staging.foo.com.au/site_media/uploads/product_image/2014/1/16/pd1996364_94d4a520-7e4a-11e3-930f-000c29c9a057_image_310x434.JPG" data-sizes="[]" data-available-sizes="[]" data-categories="[119977]" data-brand="That Natural Source" data-discount="83" data-default-order="9">
<figure>
<div class="product-img-container ">
<img itemprop="image" class="lazy product-img" src="https://staging.foo.com.au/site_media/uploads/product_image/2014/1/16/pd1996364_94d4a520-7e4a-11e3-930f-000c29c9a057_image_310x434.JPG" data-original="https://staging.foo.com.au/site_media/uploads/product_image/2014/1/16/pd1996364_94d4a520-7e4a-11e3-930f-000c29c9a057_image_310x434.JPG" alt="Up to 85% off Summer Looks Daily Wrinkle Defence Essential Skin Reviver Cream Cleanser - 100ml " style="display: inline;">
<span class="arrow arrow-up"></span>
<div class="quick-buy" style="display: none;">
<span class="arrow-down-trans"></span>
<div class="select-size">
<form class="express-buy" action="/basket/add/1996364/" method="post">
<input type="hidden" id="id_quantity_1996364" class="purchase-quantity" name="quantity" value="1">
<input type="hidden" value="" name="addbasket.x">
<span>
<input class="add-to-basket btn btn-primary btn-large " type="submit" value="ADD TO BASKET">
</span>
</form>
</div>
</div>
</div>
<a itemprop="url" class="overlay-link" href="/event/outlet/up-to-off-summer-looks/1996364-daily-wrinkle-defence-essential-skin-reviver-cream-cleanser-100ml/" title="Daily Wrinkle Defence Essential Skin Reviver Cream Cleanser - 100ml"></a>
<figcaption>
<h2 itemprop="name" class="mason name">
That Natural Source: Daily Wrinkle Defence Essential Skin Reviver Cream Cleanser - 100ml
</h2>
<small itemprop="brand" class="bed"> Up to 85% off Summer Looks</small>
<small class="bed shoes-price">
$5.99
<del>$34.95 RRP</del>
<span class="discount">(83% discount)</span>
</small>
</figcaption>
</figure>
</article>
</div>
Sold-out products look like this:
<div class="product-widget-container">
<article itemscope="" itemtype="http://schema.org/Product" class="product grid_4 full space omega large " data-productid="1996526" data-name="#T58 When Monkeys Fly! - Oz The Great And Powerful Collection By OPI" data-actual-price="10.99" data-is-available="" data-low-stock="true" data-popularity="1" data-smallimgsrc="https://staging.foo.com.au/site_media/uploads/product_image/2014/1/16/pd1996526_d0402efe-7e4a-11e3-930f-000c29c9a057_image_310x434.jpg" data-largeimgsrc="https://staging.foo.com.au/site_media/uploads/product_image/2014/1/16/pd1996526_d0402efe-7e4a-11e3-930f-000c29c9a057_image_310x434.jpg" data-sizes="[]" data-available-sizes="[]" data-categories="[119968]" data-brand="OPI" data-discount="0" data-default-order="39">
<div class="stock-status be_sprites sold-out">Sold Out</div>
<figure>
<div class="product-img-container ">
<img itemprop="image" class="lazy product-img" src="https://staging.foo.com.au/site_media/uploads/product_image/2014/1/16/pd1996526_d0402efe-7e4a-11e3-930f-000c29c9a057_image_310x434.jpg" data-original="https://staging.foo.com.au/site_media/uploads/product_image/2014/1/16/pd1996526_d0402efe-7e4a-11e3-930f-000c29c9a057_image_310x434.jpg" alt="Up to 85% off Summer Looks #T58 When Monkeys Fly! - Oz The Great And Powerful Collection By OPI " style="display: inline;">
<span class="arrow arrow-up"></span>
</div>
<a itemprop="url" class="overlay-link" href="/event/outlet/up-to-off-summer-looks/1996526-t58-when-monkeys-fly-oz-the-great-and-powerful-collection-by-opi/" title="#T58 When Monkeys Fly! - Oz The Great And Powerful Collection By OPI"></a>
<figcaption>
<h2 itemprop="name" class="mason name">
Opi: #T58 When Monkeys Fly! - Oz The Great And Powerful Collection By OPI
</h2>
<small itemprop="brand" class="bed"> Up to 85% off Summer Looks</small>
<small class="bed shoes-price">
$10.99
</small>
</figcaption>
</figure>
</article>
</div>
I was thinking I could key on either the "sold-out" class on the <div>, or the "Sold Out" text within it.
I've tried all of the following, and none of them seem to work - they all give me the full set of products:
//div[@class="product-widget-container" and not(div[@class="stock-status be_sprites sold-out"])]
//div[@class="product-widget-container" and not(div[contains(@class, "sold-out")])]
//div[@class="product-widget-container" and not(div[contains(., "Sold Out")])]
Any thoughts on what I'm doing wrong in my XPath expression?
Cheers,
Victor

Your expressions have the right idea, but you don't need to nest [ ] brackets. Once you open them you are inside a predicate, and everything you write there is part of the condition. So when you want to check an attribute of a child node, you just need to select it: node[child/@attribute].
You also need to look for the div at any depth, since it is not a direct child of the container (it sits inside the article). If you write div[div/@class="foo"], you are checking for <div><div class="foo"></div></div>. If you write div[.//div/@class="foo"], you are checking for <div><anything><bar><div class="foo"></div></bar></anything></div>.
Something like
//div[@class="product-widget-container" and not(.//div/@class="stock-status be_sprites sold-out")]
should work!
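As a rough way to check the idea outside a browser, here is a minimal Python sketch using only the standard library. It assumes the markup has been reduced to well-formed XML (the SAMPLE string is a trimmed, hypothetical stand-in for the page). The standard library's xml.etree.ElementTree supports only a subset of XPath and has no not(), so the negation is done in Python while the "at any depth" search (.//div[...]) stays in XPath. With a full XPath 1.0 engine, a single expression with not(.//div/@class="...") should be enough.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for the product page (well-formed XML).
SAMPLE = """
<body>
  <div class="product-widget-container">
    <article data-productid="1996364">
      <figure><h2>Cream Cleanser</h2></figure>
    </article>
  </div>
  <div class="product-widget-container">
    <article data-productid="1996526">
      <div class="stock-status be_sprites sold-out">Sold Out</div>
      <figure><h2>When Monkeys Fly!</h2></figure>
    </article>
  </div>
</body>
"""

# Descendant search: matches the sold-out div at any depth below the container.
SOLD_OUT = ".//div[@class='stock-status be_sprites sold-out']"


def available_product_ids(xml_text):
    """Return data-productid for containers with no sold-out div at any depth."""
    root = ET.fromstring(xml_text)
    ids = []
    for container in root.iterfind(".//div[@class='product-widget-container']"):
        # Python-side equivalent of not(.//div/@class="...").
        if container.find(SOLD_OUT) is None:
            ids.append(container.find("article").get("data-productid"))
    return ids


print(available_product_ids(SAMPLE))
```

Swapping SOLD_OUT for a direct-child path such as "div[@class='...']" makes both containers come back, which is the behaviour described in the question.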

Try
//div[@class='product-widget-container' and not(.//@class='stock-status be_sprites sold-out')]
You should remove div[ and ] from the predicate.

Related

Selenium - What strategies for get link?

I have many web elements like this
<a data-control-name="browsemap_profile" href="/in/quyen-nguyen-63098b123/" id="ember278" class="pv-browsemap-section__member ember-view"> <img width="56" src="https://media-exp1.licdn.com/dms/image/C5603AQFHZ41UPexTLQ/profile-displayphoto-shrink_100_100/0?e=1599091200&v=beta&t=lkoiKVK58W1tciUEc5UUohvEsa99lLTv66a1PJ4hp5k" loading="lazy" height="56" alt="member_name" id="ember279" class="lazy-image pv-browsemap-section__member-image EntityPhoto-circle-4 ember-view">
<div class="pv-browsemap-section__member-detail">
<h3 id="ember280" class="pv-browsemap-section__member-detail--has-hover actor-name-with-distance ember-view"> <span class="name-and-icon"><span class="name">Quyen Nguyen</span>
<span class="distance-and-badge">
<span data-test-distance-badge="" id="ember281" class="distance-badge separator ember-view"><span class="visually-hidden">
2nd degree connection
</span>
<span class="dist-value">2nd</span>
</span><!----> </span>
</span>
</h3>
<p class="pv-browsemap-section__member-headline t-14 t-black t-normal">
<div style="line-height:2rem;max-height:4rem;-webkit-line-clamp:2;" id="ember282" class="inline-show-more-text inline-show-more-text--is-collapsed inline-show-more-text--is-collapsed-with-line-clamp ember-view">I'm looking for IT Director/Admissions Director/Training Head/Marketing Director
<!---->
<!----></div>
</p>
</div>
</a>
I want to get a list of values like /in/quyen-nguyen-63098b123/ (the href of each link). What are the ways to select these elements and get this data?
I also want to get a list of ids matching the pattern ember278, ember279, ember238, etc.
Use
IEnumerable<IWebElement> connectionBlocks = driver.FindElements(By.XPath("//a[@id[starts-with(., 'ember') and string-length() > 5]]"));
Then use a regular expression for further parsing.
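For illustration, the same selection can be sketched without a browser. This is Python rather than C#, it works on the raw page source instead of a live Selenium session, and the names MemberLinkParser, member_links and SAMPLE are made up for the example. The pattern ember\d+ is an assumption about what the ids look like, based on the ones quoted in the question.

```python
import re
from html.parser import HTMLParser

# Assumed id shape: "ember" followed by digits, e.g. ember278.
EMBER_ID = re.compile(r"ember\d+")


class MemberLinkParser(HTMLParser):
    """Collect (id, href) for every <a> whose id looks like ember123."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        element_id = attributes.get("id") or ""
        if EMBER_ID.fullmatch(element_id):
            self.links.append((element_id, attributes.get("href")))


def member_links(html_text):
    """Return the (id, href) pairs found in the given HTML source."""
    parser = MemberLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Shortened, hypothetical copy of the markup from the question.
SAMPLE = (
    '<a data-control-name="browsemap_profile" href="/in/quyen-nguyen-63098b123/" '
    'id="ember278" class="pv-browsemap-section__member ember-view">'
    '<img width="56" alt="member_name" id="ember279"></a>'
    '<a href="/help/" id="footer-help">Help</a>'
)

print(member_links(SAMPLE))
```

Only the anchor is reported here; the img with id ember279 is skipped because the check is limited to a tags, just as the XPath above starts from //a.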

CSS display:inline is not working

I saw that there are many questions looking like this one but can't find a solution yet.
My HTML code:
<p class="ref" style="display:inline">
<p class="mini-caps">albums</p>:
Scum
(1987) ;
Bootlegged in Japan
(1998)
</p>
<p class="ref">
<p class="mini-caps">compilation </p>:
Noise for Music’s Sake
(2 CD, 2003)
</p>
<p class="ref">
<p class="mini-caps">album</p>:
Illmatic
(1994)
</p>
I tried to style p.ref with display:inline with no success.
The output I would like to have:
albums : Scum (1987) ; Bootlegged in Japan (1998)
compilation : Noise for Music’s Sake (2 CD, 2003)
album : Illmatic (1994)
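Why nest p tags? A p element cannot contain another p, so the browser closes the outer paragraph as soon as the inner one starts, and the p.ref you styled ends up empty. You can do this properly with divs and spans.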
<div class="ref">
<span class="mini-caps">albums</span>:
<span>Scum (1987) ; Bootlegged in Japan (1998)</span>
</div>
<div class="ref">
<span class="mini-caps">compilation </span>:
<span>Noise for Music’s Sake (2 CD, 2003)</span>
</div>
<div class="ref">
<span class="mini-caps">album</span>:
<span>Illmatic (1994)</span>
</div>
See https://jsfiddle.net/1r9Lu6y3/6/
Hope this helps
BTW: Illmatic album is sooo good!

Microdata for organization

I'm trying to add correct microdata markup to my site. Do I understand the process correctly? For example, I go to https://schema.org/Organization and work through the property list to the bottom, choosing each property (name, address, streetAddress and so on). If possible, I should fill in all the properties, am I right? So far I have something like this:
<div itemscope itemtype="http://schema.org/Organization">
<div class="historyb" id="historyb">
<span class="historyh2" itemprop="name">MyName</span>
<div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">
Street & number of house
</span>
<span itemprop="addressLocality">City</span>,
<span itemprop="postalCode">111111</span>
</div>
<a href="mailto:MyEmail@gmail.com" itemprop="email">
MyEmail@gmail.com
</a>
<span itemprop="foundingDate">Date</span>
<span itemprop="foundingLocation">City</span>
<span itemprop="legalName">Full Name</span>
<a itemprop="url" href="#"><img itemprop="logo" src="http://www.mysite.ru/images/logo.png" /></a>
</div>
</div>
This is valid according to Google's test tool but you might want to include addressCountry and addressRegion for https://schema.org/Place

How to get plain text with Xpath

Hi, I have this piece of HTML and I want to get the text nodes from it:
<span id="product_description" itemprop="description" class="">
<h1>Toltec Lighting 216-BRZ-508 Leaf Collection Traditional Potrack With Italian Marble Glass In Bronze</h1>
<br class="">
<span style="font-weight: bold;" class="">MANUFACTURE: </span>
Toltec Lighting
<br class=" xh-highlight">
<span style="font-weight: bold;" class="">COLLECTION: </span>
Leaf
<br class=" xh-highlight">
</span>
I want to get a list of values. In this case it would be "Toltec Lighting" and "Leaf".
You can try this:
//span[@id='product_description']/text()
or, if you also need to make sure no empty text nodes are selected:
//span[@id='product_description']/text()[normalize-space()]
You may try using this:
//*[text()='Toltec Lighting']
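To see what text()[normalize-space()] picks out, here is a small Python sketch using only the standard library. It assumes a well-formed version of the snippet (the br tags are written as <br/>), and direct_text_values is a hypothetical helper name. ElementTree has no text() node test, so the direct text nodes are read from .text and .tail instead, which should give the same values for this markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed version of the snippet (<br> written as <br/>).
SAMPLE = """
<span id="product_description" itemprop="description">
  <h1>Toltec Lighting 216-BRZ-508 Leaf Collection</h1>
  <br/>
  <span style="font-weight: bold;">MANUFACTURE: </span>
  Toltec Lighting
  <br/>
  <span style="font-weight: bold;">COLLECTION: </span>
  Leaf
  <br/>
</span>
"""


def direct_text_values(xml_text, element_id):
    """Return the non-blank text nodes that are direct children of the element."""
    root = ET.fromstring(xml_text)
    if root.get("id") == element_id:
        target = root
    else:
        target = root.find(f".//*[@id='{element_id}']")
    # ElementTree stores text nodes as .text (before the first child)
    # and .tail (after each child), so gather both.
    pieces = [target.text] + [child.tail for child in target]
    # Dropping whitespace-only pieces mirrors the [normalize-space()] filter.
    return [piece.strip() for piece in pieces if piece and piece.strip()]


print(direct_text_values(SAMPLE, "product_description"))
```

The text inside the h1 and the bold spans is not returned, because those are text nodes of the child elements rather than of the outer span.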

Multiline regular expression in Autoit3

I am trying to match multi-line HTML source code with a regular expression (using AutoIt). HTML source code to match:
<li class="mission">
<div>
<div class="missionTitle">
<h3>Eat a quarter-pounder with cheese</h3>
<div class="missionProgress">
<span>100%</span>
<div class="missionProgressBar" style="width: 100%;"></div>
</div>
</div>
<div class="missionDetails">
<ul class="missionRewards">
<li class="rewardCash">5,000–8,000</li>
<li class="rewardXP">XP +5</li>
</ul>
<div class="fightItems clearfix">
<h5><span>Prerequisites:</span></h5>
<div class="fightItemsWrap">
<div class="fightItem tooltip" title="Sunglasses" data-attack="Attack: 2" data-defence="Defence: 2">
<img src="/img/enhancement/3.jpg" alt="">
<span>× 1</span>
</div>
<div class="fightItem tooltip" title="Broad Shoulders" data-attack="Attack: 0" data-defence="Defence: 3">
<img src="/img/enhancement/1003.jpg" alt="">
<span>× 1</span>
</div>
<div class="fightItem tooltip" title="Irish Fond Anglia" data-attack="Attack: 4" data-defence="Defence: 8">
<img src="/img/enhancement/2004.jpg" alt="">
<span>× 1</span>
</div>
</div>
</div>
<form action="/quest/index/i/kdKJBrgjdGWKqtfDrHEkRM2duXVn1ntH/h/c0b2d58642cd862bfad47abf7110042e/t/1336917311" method="post">
<input type="hidden" id="id" name="id" value="17"/>
<button class="button buttonIcon btnEnergy"><em>5</em></button>
</form>
</div>
</div>
</li>
It is present multiple times on a single page (but items within <div class="fightItems clearfix">...</div> vary).
I need to match
<h3>Eat a quarter-pounder with cheese</h3>,
the first span <span>100%</span> and
<input type="hidden" id="id" name="id" value="17"/>.
Expected result (for every occurrence on a page):
$a[0] = "Eat a quarter-pounder with cheese"
$a[1] = "100%"
$a[2] = "17"
What I came up with:
(?U)(?:<div class="missionTitle">\s+<h3>(.*)</h3>\s+<div class="missionProgress">\s+<span>(.*)</span>)|(?:<form .*\s+.*<input\stype="hidden"\sid="id"\sname="id"\svalue="(\d+)"/>\s+.*\s+</form>)
But that leaves some array items empty. I also tried the (?s) flag, but then it only captures the first occurrence (and stops matching after that).
Because of the (?s) flag, I had to stop using . to match the words and integers. The correct regex is:
(?U)(?s)<div class="missionTitle">\s+<h3>([\w\s-]+)</h3>(?:.*)<div class="missionProgress">\s+<span>(\d+)%</span>(?:.*)<input.* value="(\d+)"/>
Regular expression to match multi-line HTML source code:
As per the documentation:
\R matches newline characters (?>\r\n|\n|\r),
dot . does not (unless (?s) is set).
\s matches white space characters.
Generally some combination is required (like \R\s*?).
Non-capturing groups are redundant (match without capturing instead).
If uniquely enclosed, single characters may be excluded instead (like attribute="([^"]*?)" for text between double-quotes).
Example (contains double-quotes; treat as per Documentation - FAQ - double quotes):
(?s)<div class="missionTitle">.*?<h3>(.*?)</h3>.*?<div class="missionProgress">.*?<span>([^<]*?)</span>.*?<input type="hidden" id="id" name="id" value="([^"]*?)"/>
Whether regular expressions should be used on HTML at all (beyond simple listings like this) is a different question (been, done, T-shirt).
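As a quick check of the lazy-quantifier approach, here is a minimal sketch in Python rather than AutoIt. The pattern has the same shape as the (?s) example, with re.DOTALL standing in for (?s). The two mission blocks in SAMPLE are shortened and partly invented for the example, as is the helper name missions.

```python
import re

# Same shape as the (?s) pattern above; re.DOTALL lets . match newlines.
MISSION = re.compile(
    r'<div class="missionTitle">.*?<h3>(.*?)</h3>'
    r'.*?<div class="missionProgress">.*?<span>([^<]*?)</span>'
    r'.*?<input type="hidden" id="id" name="id" value="([^"]*?)"/>',
    re.DOTALL,
)

# Two shortened, hypothetical mission blocks.
SAMPLE = """
<li class="mission">
  <div class="missionTitle">
    <h3>Eat a quarter-pounder with cheese</h3>
    <div class="missionProgress">
      <span>100%</span>
    </div>
  </div>
  <form method="post">
    <input type="hidden" id="id" name="id" value="17"/>
  </form>
</li>
<li class="mission">
  <div class="missionTitle">
    <h3>Walk the dog</h3>
    <div class="missionProgress">
      <span>40%</span>
    </div>
  </div>
  <form method="post">
    <input type="hidden" id="id" name="id" value="18"/>
  </form>
</li>
"""


def missions(html_text):
    """Return one (title, progress, id) tuple per mission block."""
    return MISSION.findall(html_text)


for title, progress, mission_id in missions(SAMPLE):
    print(title, progress, mission_id)
```

Because every .*? is lazy, each match stops at the first input that follows its own title, so the second block is found as a separate match instead of being swallowed by the first.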