How to exclude one child in XPath/CSS locator - html

Is there any way to get div's inner text except 'by '?
<div class="name">
"by "
<em>Some Author</em>
", "
<em>Another Author</em>
</div>

What's pretty strange about your HTML is the quotes and the whitespace between the "Some Author" <em> and the comma. I assume you mean this instead:
<div class="name">
by <em>Some Author</em>, <em>Another Author</em>
</div>
With XPath, you could fetch the nodes with a query like this:
//*[@class='name']/node()[position()>1]
If we're talking about a browser environment and you want a single string, not just a collection of nodes, you could do:
document.querySelector('div.name').textContent.replace(/^\s*by\s*/, '')
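Outside a browser, the same idea (keep everything after the leading "by " text node) can be sketched with Python's standard library. This is only an illustration, assuming the cleaned-up markup above is well-formed enough to parse as XML; the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# The cleaned-up markup from above (assumed to be well-formed XML).
markup = '<div class="name">by <em>Some Author</em>, <em>Another Author</em></div>'
div = ET.fromstring(markup)

# In ElementTree, div.text holds the leading text node ("by ").
# Skipping it and joining each child's text plus the text that follows it
# mirrors node()[position()>1] in the XPath above.
parts = []
for child in div:
    parts.append("".join(child.itertext()))  # text inside the <em>
    parts.append(child.tail or "")           # text after the <em>, e.g. ", "

authors = "".join(parts).strip()
print(authors)
```

Unlike the regex approach, this does not depend on the wording of the leading text, only on its position.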

You can do it with jQuery like this, but it will remove the ", " too.
Check it out:
$(document).ready(function(){
var name = $('div').children('em');
$('div').html(name);
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div class="name">
"by "
<em>Some Author</em>
", "
<em>Another Author</em>
</div>

You can try the expression:
//em/text()
Output:
'Some Author'
'Another Author'

You can try the XPath below to get all descendant text nodes except the first one:
(//div//text())[position() > 1]

Related

xpath text within a span within text within a div

I'm trying desperately to extract text from within a span which is within text which is within a div (underlined in the image)
This is the relevant part of the code ...
<div id="groupBlock3">
<div class="groupBlockTitle">
::before
"
ALL TEACHERES ("
<span class="activeTeachers">12</span>
" ACTIVE, "
<span class="archivedTeachers">1</span>
" ARCHIVED)
"
<div>...</div>
<div>+ enroll a teacher</div>
</div>
<div>...</div>
</div>
I can retrieve the text from within the first div with this ...
"normalize-space(//div[@id='groupBlock3']/div[1])"
... which gives me ...
'ALL TEACHERES ( ACTIVE, ARCHIVED) + enroll a teacher'
... but, try as I might I cannot get the text from within the first or second span - it just returns a null string. Please help me!!
Try one of these XPath-1.0 expressions:
normalize-space(//div[@id='groupBlock3']/div[1]/span[1]/text())
which results in 12, or, for the second span
normalize-space(//div[@id='groupBlock3']/div[1]/span[2]/text())
which results in 1.
But if you want all text of the first div, use this expression
normalize-space(string(//div[@id='groupBlock3']/div[1]))
which gives you the result
::before " ALL TEACHERES (" 12 " ACTIVE, " 1 " ARCHIVED) " ...+ enroll a teacher
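For a quick check outside the browser, the span lookups can be sketched with Python's standard library, whose ElementTree module supports a limited XPath subset (attribute and position predicates, but not normalize-space()). The markup below is a simplified, hypothetical version of the question's HTML, wrapped in a made-up root element so the groupBlock3 div can be found as a descendant.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the question's markup; <root> is an assumption
# added so that ".//div[@id=...]" can match a descendant.
markup = """
<root>
  <div id="groupBlock3">
    <div class="groupBlockTitle">
      ALL TEACHERES (<span class="activeTeachers">12</span> ACTIVE,
      <span class="archivedTeachers">1</span> ARCHIVED)
      <div>+ enroll a teacher</div>
    </div>
  </div>
</root>
"""
root = ET.fromstring(markup)

# Same path as the XPath above: first child div of the groupBlock3 div.
title = root.find(".//div[@id='groupBlock3']/div[1]")

# span[1] and span[2] select the first and second span children.
active = title.find("span[1]").text.strip()
archived = title.find("span[2]").text.strip()

# Rough equivalent of normalize-space(string(...)): join all text, collapse whitespace.
all_text = " ".join("".join(title.itertext()).split())

print(active, archived)
print(all_text)
```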

HTML::ELEMENT not finding all elements

I have this snippet of html:
<li class="result-row" data="2">
<p class="result-info">
<span class="icon icon-star" role="button">
<span class="screen-reader-text">favorite this post</span>
</span>
<time class="result-date" datetime="2018-12-04 09:21" title="Tue 04 Dec 09:21:50 AM">Dec 4</time>
Link Text
and this perl code (not production, so no quality comments are necessary)
my $root = $tree->elementify();
my @rows = $root->look_down('class', 'result-row');
my $item = $rows[0];
say $item->dump;
my $date = $item->look_down('class', 'result-date');
say $date;
my $title = $item->look_down('class', 'result-title hdrlnk');
All outputs are as I expected except $date isn't defined.
When I look at the $item->dump, it looks like the time element doesn't show up in the output. Here's a snippet of the output from $item->dump where I would expect to see a <time...> element. All it shows is the text from the time element.
<li class="result-row" data="2"> @0.1.9.3.2.0
<a class="result-image gallery empty" href="https://localhost/1.html"> @0.1.9.3.2.0.0
<p class="result-info"> @0.1.9.3.2.0.1
<span class="icon icon-star" role="button"> @0.1.9.3.2.0.1.0
" "
<span class="screen-reader-text"> @0.1.9.3.2.0.1.0.1
"favorite this post"
" "
" Dec 4 "
<a class="result-title hdrlnk" data="2" href="https://localhost/1.html"> @0.1.9.3.2.0.1.2
"Link Text..."
" "
...
I've not used HTML::Element before. I rtfmed and didn't see any tag exclusions and I did a search of the package code for tags white/black lists (which wouldn't make sense, but neither does leaving out the time tag).
Does anyone know why the time element is not showing up in the dump and any search for it turns up nothing?
As an fyi, the rest of the code searches and finds elements without issue, it just appears to be the time tag that's missing.
HTML::TreeBuilder does not support HTML5 tags. Consider Mojo::DOM as an alternative that keeps up with the living HTML standard. I can't show how your whole code would look with Mojo::DOM since you've only shown a piece, but the Mojo::DOM equivalent of look_down is find (returns a Mojo::Collection arrayref) or at (returns the first element found or undef), both taking a CSS selector.

angular - create a new line on HTML

I have a simple question (I hope). I have a service that returns a string as its result. The format is something like this:
"Test1: the association has been accepted.\nTest2: the association has been accepted.\n"
On the client side (I'm using Angular 1.5.x) I put that string into an object (say the variable $scope.alert.message). After that I want to print that string in a modal. My html is:
<script type="text/ng-template" id="infoTemplate.html">
<div class="modal-header left" ng-class="['div-' + alert.type, closeable ? 'alert-dismissible' : null]">
<h3 class="modal-title" id="modal-title">Info</h3>
</div>
<div class="modal-body" id="modal-body">
<img class="imm-info" ng-class="['imm-' + alert.type, closeable ? 'alert-dismissible' : null]" />
<p class="col-sm-10 col-sm-offset-2">{{alert.message}}</p><button class="col-sm-3 col-sm-offset-5 btn " ng-class="['button-' + alert.type, closeable ? 'alert-dismissible' : null]" ng-click="cancel()">OK</button>
</div>
</script>
You can see the '{{alert.message}}'. My problem is that the '\n' characters in my message are not displayed as line breaks, so the text never spans more than one line.
I use the white-space: pre-wrap CSS style, e.g. :
<p style="white-space: pre-wrap">{{alert.message}}</p>
Try this in HTML:
<pre>{{ alert.message }}</pre>
This has already been answered: the <pre> wrapper renders the \n characters in the text as line breaks.
\n is not interpreted in html. You need to replace these instances with <br/> elements. You could for example replace them with a regex if you do not want to change the original string.
You can write a function that takes the alert message and splits it by "\n",
then iterate through the result via *ngFor (Angular 2+ syntax; in AngularJS 1.x the equivalent directive is ng-repeat).
For example:
<p *ngFor="let msg of getMessageSplitted(alert.message)">{{msg}}</p>

Get div class title content text using xpath

I need to get the text of the content div ("ELECTRONIC ARTS" here, though it changes with the data) by locating it through the title div whose text is "Offered By" (the title is the same in every case), using XPath. I tried various XPath expressions but couldn't get the result I want. I'd really appreciate some help with this.
<div class="meta-info">
<div class="title"> Offered By</div>
<div class="content">ELECTRONIC ARTS</div> </div>
This is one possible XPath expression to start with, which you can then simplify or extend with more criteria as needed (formatted for readability):
//div[
@class='meta-info'
and
div[@class='title' and normalize-space()='Offered By']
]/div[@class='content']
explanation :
//div[@class='meta-info' and ... : find div element where class attribute value equals "meta-info" and ...
div[@class='title' and normalize-space()='Offered By']] : ... has child element div where class attribute value equals "title" and content equals "Offered By"
/div[@class='content'] : from such div (the <div class="meta-info"> to be clear), return child element div where class attribute value equals "content"
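The same three steps can be sketched in Python's standard library. ElementTree does not implement normalize-space(), so the whitespace normalization is done in plain Python here. The helper name content_for, the root wrapper, and the second meta-info block are hypothetical and exist only to show that the title filter matters.

```python
import xml.etree.ElementTree as ET

markup = """
<root>
  <div class="meta-info">
    <div class="title"> Offered By</div>
    <div class="content">ELECTRONIC ARTS</div>
  </div>
  <!-- hypothetical second block, not part of the original markup -->
  <div class="meta-info">
    <div class="title"> Some Other Field</div>
    <div class="content">placeholder</div>
  </div>
</root>
"""

def content_for(root, title):
    """Return the 'content' text of the meta-info block whose title matches."""
    for block in root.iter("div"):
        if block.get("class") != "meta-info":
            continue  # step 1: only div elements with class "meta-info"
        heading = block.find("div[@class='title']")
        # step 2: the title child's text, whitespace-normalized, must match
        if heading is not None and " ".join((heading.text or "").split()) == title:
            # step 3: return the sibling content div's text
            return block.findtext("div[@class='content']")
    return None

root = ET.fromstring(markup)
offered_by = content_for(root, "Offered By")
print(offered_by)
```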
Using the examples on Mozilla:
var xpath = document.evaluate("//div[@class='content']", document, null, XPathResult.STRING_TYPE, null);
document.write('The text found is: "' + xpath.stringValue + '".');
console.log(xpath);
<div class="meta-info">
<div class="title"> Offered By</div>
<div class="content">ELECTRONIC ARTS</div>
</div>
By the way, I think document.querySelector or document.querySelectorAll are much more convenient in this situation:
var content = document.querySelector('.meta-info .content').innerText;
document.write('The text found is: "' + content + '".');
console.log(content);
<div class="meta-info">
<div class="title"> Offered By</div>
<div class="content">ELECTRONIC ARTS</div>
</div>

Getting first level using TFHpple

I have some trouble using TFHpple, so here it is :
I would like to parse the following lines :
<div class=\"head\" style=\"height: 69.89px; line-height: 69.89px;\">
<div class=\"cell editable\" style=\"width: 135px;\"contenteditable=\"true\">
<p> 1</p>
</div>
<div class=\"cell\" style=\"width: 135px;\" contenteditable=\"false\">
<p>2</p>
</div>
</div>
<div style=\"height: 69.89px; line-height: 69.89px;\" class=\"head\">
<div class=\"cell\" style=\"width: 135px; text-align: left;\"contenteditable=\"false\">
<p>3 </p>
</div>
<div class=\"cell\" style=\"width: 135px;\" contenteditable=\"false\">
<p>4</p>
</div>
</div>
<div style=\"height: 69.89px; line-height: 69.89px;\" class=\"\">
<div class=\"cell\" style=\"width: 135px;\" contenteditable=\"false\">
<p>5</p>
</div>
<div class=\"cell\" style=\"width: 135px;\" contenteditable=\"false\">
<p>6</p>
</div>
</div>
For now I would like to put the first level of div "element" (sorry I don't know the proper terminology) in an array.
So I have tried to do it by simply giving /div as the xPath to the searchWithXPathQuery methods but it simply doesn't find anything.
My second solution was to try using a path of this kind : //div[@class=\"head\"] but also allowing [@class=\"\"] but I don't even know if it is possible.
(I would like to do so because I need the elements to be in the same order in the array as they are in the data)
So here is my question, is there a particular reason why TFHpple wouldn't work with /div?
And if there is no way to just take the first level of div, then is it possible to write a predicate on the value of an attribute with XPath (here the attribute class)? (And how? I have looked quite a lot now and couldn't find anything.)
Thanks for your help.
PS : If it helps, here is the code I use to try and parse the data, it is first contained in the string self.material.Text :
NSData * data = [self.material.Text dataUsingEncoding:NSUnicodeStringEncoding];
TFHpple * tableParser = [TFHpple hppleWithHTMLData:data];
NSString * firstXPath = @"/div";
NSArray<TFHppleElement *> * tableHeader = [tableParser searchWithXPathQuery:firstXPath];
NSLog(@"We found : %lu", (unsigned long)tableHeader.count);
You wrote:
Getting first level using TFHpple
I assume you mean: without also getting all descendants?
Taking your other requirements into account, you can do so as follows:
//div[not(ancestor::div)][@class='head' or @class='']
Dissecting this:
Select all div elements (yes, correct term ;) at any level in the whole document: //div
Filter with a predicate (the thing between brackets) for elements that are not themselves contained in another div, by checking that there is no div ancestor (parent, parent of a parent, and so on): [not(ancestor::div)]
Filter by your other requirements: [@class='head' or @class='']
Note 1: your given XML is not well-formed, because it contains multiple root elements. An XML document must have exactly one root element.
Note 2: if your requirements are to first get all divs with @class 'head' or an empty @class, and then only those that are "first level", reverse the predicates:
//div[@class='head' or @class=''][not(ancestor::div)]
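As a rough cross-check of what this selection should return, here is a sketch in Python's standard library rather than TFHpple. It assumes the fragment is wrapped in a made-up root element (which also addresses Note 1), and the style and contenteditable attributes are trimmed for brevity. Taking only the direct div children of that wrapper plays the role of [not(ancestor::div)].

```python
import xml.etree.ElementTree as ET

# The question's three rows, with style/contenteditable attributes omitted.
fragment = """
<div class="head"><div class="cell editable"><p> 1</p></div><div class="cell"><p>2</p></div></div>
<div class="head"><div class="cell"><p>3 </p></div><div class="cell"><p>4</p></div></div>
<div class=""><div class="cell"><p>5</p></div><div class="cell"><p>6</p></div></div>
"""

# Wrap in a single root so the fragment is well-formed (assumption).
root = ET.fromstring("<root>" + fragment + "</root>")

# Direct div children only, i.e. divs with no div ancestor, whose class
# attribute is 'head' or the empty string. Document order is preserved.
rows = [d for d in root.findall("div") if d.get("class") in ("head", "")]

# Collect the cell texts per row to confirm the order matches the data.
cells = [[p.text.strip() for p in row.iter("p")] for row in rows]
print(len(rows), cells)
```

A div with no class attribute at all would be skipped here, which matches the XPath predicate: @class='' only holds when the attribute is present and empty.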
You can use the following XPath expression to get the div elements (that is indeed the correct term) whose class attribute value equals "head" or is empty:
//div[@class='head' or @class='']