I'm trying desperately to extract text from within a span which is within text which is within a div (underlined in the image)
This is the relevant part of the code ...
<div id="groupBlock3">
<div class="groupBlockTitle">
::before
"
ALL TEACHERES ("
<span class="activeTeachers">12</span>
" ACTIVE, "
<span class="archivedTeachers">1</span>
" ARCHIVED)
"
<div>...</div>
<div>+ enroll a teacher</div>
</div>
<div>...</div>
</div>
I can retrieve the text from within the first div with this ...
"normalize-space(//div[@id='groupBlock3']/div[1])"
... which gives me ...
'ALL TEACHERES ( ACTIVE, ARCHIVED) + enroll a teacher'
... but, try as I might, I cannot get the text from within the first or second span; it just returns an empty string. Please help me!!
Try one of these XPath-1.0 expressions:
normalize-space(//div[@id='groupBlock3']/div[1]/span[1]/text())
which results in 12, or, for the second span
normalize-space(//div[@id='groupBlock3']/div[1]/span[2]/text())
which results in 1.
But if you want all text of the first div, use this expression
normalize-space(string(//div[@id='groupBlock3']/div[1]))
which gives you the result
::before " ALL TEACHERES (" 12 " ACTIVE, " 1 " ARCHIVED) " ...+ enroll a teacher
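To check these expressions outside a browser, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. It rests on several assumptions: the markup is a cleaned-up, well-formed copy of the snippet in the question (the inspector's quote marks, the ::before pseudo-element and the elided <div>...</div> are dropped), the span values are really present in the source, and the <body> wrapper is only there to give the parser a single root. ElementTree supports only a subset of XPath, so normalize-space() is emulated with a small helper.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified copy of the markup from the question.
markup = """
<body>
  <div id="groupBlock3">
    <div class="groupBlockTitle">
      ALL TEACHERES (<span class="activeTeachers">12</span> ACTIVE,
      <span class="archivedTeachers">1</span> ARCHIVED)
      <div>+ enroll a teacher</div>
    </div>
  </div>
</body>
"""


def normalize_space(text):
    """Rough stand-in for XPath normalize-space(): trim and collapse whitespace."""
    return " ".join(text.split())


root = ET.fromstring(markup)

# Same path as in the answer: first child div of the div with id 'groupBlock3'.
title = root.find(".//div[@id='groupBlock3']/div[1]")

# Text of the first and second span children.
active = title.find("span[1]").text
archived = title.find("span[2]").text

# All descendant text of the title div, whitespace-normalized.
all_text = normalize_space("".join(title.itertext()))

print(active, archived)
print(all_text)
```

If the span lookups come back empty against the live page but work on a static copy like this one, the numbers are probably filled in by a script after the page loads, which is a separate problem from the XPath itself.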
Is there any way to get div's inner text except 'by '?
<div class="name">
"by "
<em>Some Author</em>
", "
<em>Another Author</em>
</div>
What's pretty strange about your HTML is the quotes and the whitespace between the "Some Author" <em> and the comma. I assume you mean this instead:
<div class="name">
by <em>Some Author</em>, <em>Another Author</em>
</div>
With XPath, you could fetch the nodes with a query like this:
//*[@class='name']/node()[position()>1]
If we're talking about a browser environment and you want a single string, not just a collection of nodes, you could do:
document.querySelector('div.name').textContent.replace(/^\s*by\s*/, '')
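Outside a browser, the same idea (take the whole text content, then strip the leading "by") can be sketched in Python with only the standard library. This assumes the cleaned-up markup shown above, not the version with literal quote marks, and the variable names are only illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Assumed markup: the cleaned-up version of the div from the question.
markup = """<div class="name">
  by <em>Some Author</em>, <em>Another Author</em>
</div>"""

div = ET.fromstring(markup)

# Rough equivalent of textContent: concatenate all descendant text nodes.
text = "".join(div.itertext())

# Drop the leading "by" plus surrounding whitespace, then trim the end.
authors = re.sub(r"^\s*by\s*", "", text).strip()

print(authors)
```

Unlike the node-based XPath query, this returns a single string, so the comma between the authors is kept.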
You can do it with jQuery like this, but it will remove the "," too.
Check it out:
$(document).ready(function(){
var name = $('div').children('em');
$('div').html(name);
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div class="name">
"by "
<em>Some Author</em>
", "
<em>Another Author</em>
</div>
You can try the expression:
//em/text()
Output:
'Some Author'
'Another Author'
You can try below XPath to get all descendant text nodes, except the first one:
(//div//text())[position() > 1]
I have this snippet of html:
<li class="result-row" data="2">
<p class="result-info">
<span class="icon icon-star" role="button">
<span class="screen-reader-text">favorite this post</span>
</span>
<time class="result-date" datetime="2018-12-04 09:21" title="Tue 04 Dec 09:21:50 AM">Dec 4</time>
Link Text
and this Perl code (not production, so no quality comments are necessary)
my $root = $tree->elementify();
my @rows = $root->look_down('class', 'result-row');
my $item = $rows[0];
say $item->dump;
my $date = $item->look_down('class', 'result-date');
say $date;
my $title = $item->look_down('class', 'result-title hdrlnk');
All outputs are as I expected except $date isn't defined.
When I look at the $item->dump, it looks like the time element doesn't show up in the output. Here's a snippet of the output from $item->dump where I would expect to see a <time...> element. All it shows is the text from the time element.
<li class="result-row" data="2"> @0.1.9.3.2.0
<a class="result-image gallery empty" href="https://localhost/1.html"> @0.1.9.3.2.0.0
<p class="result-info"> @0.1.9.3.2.0.1
<span class="icon icon-star" role="button"> @0.1.9.3.2.0.1.0
" "
<span class="screen-reader-text"> @0.1.9.3.2.0.1.0.1
"favorite this post"
" "
" Dec 4 "
<a class="result-title hdrlnk" data="2" href="https://localhost/1.html"> @0.1.9.3.2.0.1.2
"Link Text..."
" "
...
I've not used HTML::Element before. I rtfmed and didn't see any tag exclusions and I did a search of the package code for tags white/black lists (which wouldn't make sense, but neither does leaving out the time tag).
Does anyone know why the time element is not showing up in the dump and any search for it turns up nothing?
As an fyi, the rest of the code searches and finds elements without issue, it just appears to be the time tag that's missing.
HTML::TreeBuilder does not support HTML5 tags. Consider Mojo::DOM as an alternative that keeps up with the living HTML standard. I can't show how your whole code would look with Mojo::DOM since you've only shown a piece, but the Mojo::DOM equivalent of look_down is find (returns a Mojo::Collection arrayref) or at (returns the first element found or undef), both taking a CSS selector.
I need to get the text "ELECTRONIC ARTS" (this value changes with the data) that sits below the div whose title text is "Offered By" (this title is the same for every record), using XPath. I tried various XPath expressions but couldn't get the result I want. Any help would be appreciated.
<div class="meta-info">
<div class="title"> Offered By</div>
<div class="content">ELECTRONIC ARTS</div> </div>
This is one possible XPath expression to start with, which you can then simplify or extend with more criteria as needed (formatted here for readability):
//div[
@class='meta-info'
and
div[@class='title' and normalize-space()='Offered By']
]/div[@class='content']
Explanation:
//div[@class='meta-info' and ... : find a div element whose class attribute equals "meta-info" and ...
div[@class='title' and normalize-space()='Offered By']] : ... that has a child div whose class attribute equals "title" and whose content equals "Offered By"
/div[@class='content'] : from that div (the <div class="meta-info">, to be clear), return the child div whose class attribute equals "content"
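The same three steps can be sketched in plain Python with the standard-library xml.etree.ElementTree. This is not the XPath expression itself: ElementTree has no normalize-space(), so the title comparison is done in Python instead. The second meta-info block, the <body> wrapper and the helper names are hypothetical, added only to show that the title check matters.

```python
import xml.etree.ElementTree as ET

# First block is from the question; the second block is a made-up
# example so there is something for the title check to filter out.
markup = """
<body>
  <div class="meta-info">
    <div class="title"> Offered By</div>
    <div class="content">ELECTRONIC ARTS</div> </div>
  <div class="meta-info">
    <div class="title"> Some Other Title</div>
    <div class="content">SOMETHING ELSE</div> </div>
</body>
"""


def normalize_space(text):
    """Rough stand-in for XPath normalize-space()."""
    return " ".join(text.split())


def content_for_title(root, wanted):
    """Return the 'content' text of the meta-info block whose title matches."""
    for block in root.iter("div"):
        # Step 1: only consider div elements with class 'meta-info'.
        if block.get("class") != "meta-info":
            continue
        # Step 2: the block must have a child title div with the wanted text.
        title = block.find("div[@class='title']")
        if title is None or normalize_space("".join(title.itertext())) != wanted:
            continue
        # Step 3: from that block, return the child content div's text.
        content = block.find("div[@class='content']")
        if content is not None:
            return normalize_space("".join(content.itertext()))
    return None


root = ET.fromstring(markup)
offered_by = content_for_title(root, "Offered By")
print(offered_by)
```

Normalizing the title text before comparing matters here because the title in the question has a leading space (" Offered By").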
Using the examples on Mozilla:
var xpath = document.evaluate("//div[@class='content']", document, null, XPathResult.STRING_TYPE, null);
document.write('The text found is: "' + xpath.stringValue + '".');
console.log(xpath);
<div class="meta-info">
<div class="title"> Offered By</div>
<div class="content">ELECTRONIC ARTS</div>
</div>
By the way, I think document.querySelector or document.querySelectorAll are much more convenient in this situation:
var content = document.querySelector('.meta-info .content').innerText;
document.write('The text found is: "' + content + '".');
console.log(content);
<div class="meta-info">
<div class="title"> Offered By</div>
<div class="content">ELECTRONIC ARTS</div>
</div>
"<td style=max-width:0px;>" +
"<span> Issuer: <a href=/app?"+JSON.stringify(issuer)+" target='_blank'> <span> " + celldata[rowIndex].hiddenprops.issuer +" </span></a> </span>"+
"</td>"
The result should be: "issuer: rojioijoijrieroirjg"
but I get
: "issuer:
rojioioijoij"
on two different lines. I tried display:inline on every tag and tried inserting extra tags, but nothing works.
Any idea how I should proceed? All I want is a label on the same line as my link inside a <td>.
<div id='imgCaption' style='background-color:grey;padding:5px 20px;color:white;'>
</div>
<script language="javascript">
// lots of codes
$('#imgCaption').html(imgCaption + \"<div style='width:87px; float:right; text-align:right;'>\" + nextImgNum + ' of ' + totalNo + \"</div>\" );
</script>
The results are not stable.
First result
Some caption text 1 of 11
Second result
Some caption text continuing.....................................................
.. 1 of 11
Third result
Some caption text continuing ...............................................
1 of 11
The first and second results are OK, as the caption text and the index text are on the same line.
The third result is not OK, as they are on different lines and the caption div's background color does not cover the index div.
As a result, the index cannot be seen in the third result.
Any workaround?
Thanks.
This looks like a float issue. Try adding overflow:hidden to <div id="imgCaption">.