How to select all text excluding a single node using XPath? - html

Given the following XML:
<div class="a">
some text i want to see
<div class="b">
other text i want to see
</div>
<div class="c">
some text i DON'T WANT to see
</div>
some more text i wish to see..
</div>
I would like an XPath expression that selects all the text that is not under class c.
Expected output:
some text i want to see
other text i want to see
some more text i wish to see..

This XPath,
//div[@class="a"]//text()[not(parent::div[@class="c"])]
will select all text nodes whose parent is not a div with @class="c":
some text i want to see
other text i want to see
some more text i wish to see..
If you want to exclude white-space-only text nodes, then this XPath,
//div[@class="a"]//text()[not(parent::div[@class="c"]) and normalize-space()]
will select these text nodes,
some text i want to see
other text i want to see
some more text i wish to see..
as requested.
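To check the rule outside an XPath engine, here is a minimal Python sketch using only the standard library. The XPath subset in xml.etree.ElementTree has no text() node test and no not(), so the filtering is done by hand in Python rather than by the expression above; a library with full XPath 1.0 support, such as lxml, could evaluate the expression directly. The helper names is_excluded and wanted_text are made up for this illustration, and the sample input assumes the second div in the question was meant to be a closing tag.

```python
import xml.etree.ElementTree as ET

XML = """<div class="a">
some text i want to see
<div class="b">
other text i want to see
</div>
<div class="c">
some text i DON'T WANT to see
</div>
some more text i wish to see..
</div>"""


def is_excluded(elem):
    # Mirrors the predicate parent::div[@class="c"] (hypothetical helper).
    return elem.tag == "div" and elem.get("class") == "c"


def wanted_text(elem, out):
    # In ElementTree, elem.text and the tail of each child are the
    # text nodes whose parent is elem.
    keep = not is_excluded(elem)
    if keep and elem.text and elem.text.strip():
        out.append(elem.text.strip())
    for child in elem:
        wanted_text(child, out)
        if keep and child.tail and child.tail.strip():
            out.append(child.tail.strip())
    return out


root = ET.fromstring(XML)
result = wanted_text(root, [])
print(result)
```

Unlike normalize-space() in the predicate, which only filters out white-space-only nodes, this sketch also trims the surrounding whitespace of each kept node for readability.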

Related

Hide specific text from html using css

In the code below, I need to hide the 2nd b tag and its related content. How can I do that in CSS?
<div id="content-list">
<b>Title:</b> some random text <br/>
<b>Title2:</b> some random text 2 <br/>
</div>
With the CSS below I can hide only the 2nd b tag, but not the text.
div > b:nth-of-type(2) {
display: none;
}
Note: The HTML markup can't be modified for various reasons.
There is no way to reference a text node in CSS. However there are probably some hacky ways to accomplish this.
One way you could do this, if the layout supports it, would be to hide the title and anything adjacent to it using a large, negative number for margin-left.
.content-list > b:nth-of-type(2) {
margin-left: -1000000px;
}
<div class="content-list">
<b>Title:</b> some random text <br />
<b>Title 2:</b> some large random text some large random text some large random text some large random text some large random text some large random text some large random text some large random text some large random text some large random text some large random text some large random text some large random text some large random text some large random text <br />
<b>Title 3:</b> some random text <br />
</div>
As you can see if you run the snippet, there are some issues. Mainly, there will be a blank line in the place where the text was. Plus, anyone using a screen reader will still have access to it.
The only real solution is either to fix your HTML or to use JavaScript.
"I can only hide the 2nd b tag, but not able to hide the text"
That's because the text "some random text 2" is outside of the tags.
Since you can't actually select text nodes directly, one work-around would be to set the font-size of the parent element to 0. Then reset the font-size for those desired b elements. In doing so, only the b elements should appear, and the adjacent text nodes should effectively be hidden.
div {
font-size: 0px
}
div > b:nth-child(1) {
font-size: 16px
}
<div>
<b>Title:</b> some random text <br/>
</div>
An alternative solution is to change the original HTML to something more like this, which is highly recommended in terms of accessibility:
#content-list div:nth-child(2) {
display: none;
}
<div id="content-list">
<div>
<h2>Title 1</h2>
<span>some random text</span>
</div>
<div>
<h2>Title 2</h2>
<span>some random text</span>
</div>
<div>
<h2>Title 3</h2>
<span>some random text</span>
</div>
</div>

Indent Any Wrapped Text Line Not Beginning with Large Bullet Character?

I have a series of lines of text that each begin with a large bullet and a space and are separated by br tags.
br tags are being used because Blogger replaces the p tags with those.
I want to indent any wrapped line of text that does not start with a large bullet character so as to emulate a bulleted list. The wrapped text should line up with the red lines.
Blogger's home page post snippets strip out list tags.
● Some Text. More Text. Even More Text. Additional Text. <br>
● Some Text. More Text. Even More Text. Additional Text. <br>
● Some Text. More Text. Even More Text. Additional Text. <br>
● Some Text. More Text. Even More Text. Additional Text. <br>
The home page used for Blogger accounts shows a small text snippet of each post, plus the post title and a thumbnail of the 1st image used in the post.
The snippet widget only recognizes some styling tags, the a tag and the br tag. List tags are stripped out. Blogger replaces p tags with br tags.
I use lists to display my content within posts and want to emulate a list in the snippets. Here is how two vertically stacked lists appear in the home page snippet. This sucks!
I've added a span / section at the beginning of each li with a large bullet inside. This is hidden on the actual post page via code inside the head. The content of this span / section, the large bullet, is shown within the snippet on the home page although the actual span / section tags will be stripped by the snippet. A br tag is added at the end of each li tag to force a new line within the snippet - this br tag does not change the appearance of the actual post.
This emulates the list found on the actual post, but I've not been able to emulate the indentation for any wrapped text.
<b>List Title 1</b>
<ul><div class="outer">
<li><section class="pointer">● </section><i>This is some sample text in the first li tag of the list, with more text to follow..<br>
</li>
</div>
<div class="outer">
<li><section class="pointer">● </section>This is some sample text in the second li tag of the list, with more text to follow. Some more text here as promised.</i>.<br>
</li>
</div>
<div class="outer">
<li><section class="pointer">● </section>This is some sample text in the third li tag of the list, with more text to follow. Some more text here as promised. More text here also and her also.<br>
</li>
</div>
<div class="outer">
<li><section class="pointer">● </section>A similar <i>Seiler</i> This is some sample text in the fourth li tag of the list, with more text to follow. Some more text here as promised. More text here also and her also.<br>
</li>
</div>
<div class="outer">
<li><section class="pointer">● </section>This is some sample text in the fifth li tag of the list, with more text to follow. Some more text here as promised. More text here also and her also. Cemetery.<br>
</li>
</div>
</ul>
I have added 427+ lines of custom code to an existing Blogger template.

Use AND and NOT in xPath

I have the following issue: I need to get part of the text without including a tag. To be more clear, I have the following code:
<div class="field-item even">
<p> text text text</p>
<p> text text text <a class="people-articles">text text</a> text text</p>
<p> text text text</p>
</div>
So I'm trying to get the text inside the p tags but not the a class="people-articles". Here is what I've done so far, but it's not working:
//div[@class="field-item even"] and [not a(@class='people-articles')]
Can someone tell me what I am doing wrong, and how to obtain p without a?
This should be straightforward:
//div[@class="field-item even"]/p[not(a[@class="people-articles"])]/text()
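One thing worth noting is that the predicate p[not(a[...])] drops every paragraph that contains such a link, rather than only the link's text. The following Python sketch, using only the standard library, contrasts the two readings. ElementTree's XPath subset has no not() or text(), so both filters are written by hand; own_text is a hypothetical helper introduced for this illustration.

```python
import xml.etree.ElementTree as ET

HTML = """<div class="field-item even">
<p> text text text</p>
<p> text text text <a class="people-articles">text text</a> text text</p>
<p> text text text</p>
</div>"""

root = ET.fromstring(HTML)

# Same effect as p[not(a[@class="people-articles"])]/text():
# only paragraphs with no such link contribute any text.
without_link = [
    p.text.strip()
    for p in root.findall("p")
    if p.find("a[@class='people-articles']") is None
]


def own_text(p):
    # Text directly inside p: its leading text plus the tail after each child.
    parts = [p.text or ""] + [child.tail or "" for child in p]
    return " ".join("".join(parts).split())


# Closer to p/text(): every paragraph, but never the text inside <a>.
all_own = [own_text(p) for p in root.findall("p")]

print(without_link)
print(all_own)
```

If the goal is the second reading (all paragraph text except what is inside the link), the plain expression //div[@class="field-item even"]/p/text() may be closer to what was asked, since text() only selects text nodes that are direct children of p.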

XPath for an element that follows some specific paragraph text nested in a div?

I'm trying to select the text "Part Sun, Sun" and "Herb", "Houseplant" from the html below.
The <div class="specifics"> has more of these "row" divs and the text I'm interested in always comes after certain paragraph tags containing specific text like "Light:", and "Type:" below.
Edit: To clarify out of all the "value" divs I'm only interested in ones that have specific "names". So I want to check the text of paragraphs nested inside <div class="name"> elements and if it's what I'm interested in then select the text inside the subsequent <div class="value"> element.
<div class="specifics">
<div class="row">
<div class="name">
<p>Light:</p>
</div>
<div class="value">
<p>Part Sun, Sun</p>
</div>
</div>
<div class="row">
<div class="name">
<p>Type:</p>
</div>
<div class="value">
<p>
Herb, Houseplant
</p>
</div>
</div>
...more rows...
</div>
I've tried this (using Scrapy):
trait = response.xpath("//div[@class='specifics']")
trait.xpath(".//div[@class='row']/div[@class='name']/p[text()='Light:']/../../div[@class='value']/p/text()[normalize-space()]")
The first line is OK, but the second one returns \n \n
Apologies for poor editing originally, below is what the paragraph element actually looks like.
Second Edit: There are a bunch of empty lines and when I select just /p without text() I still get back just a bunch of \n without any of the text? Tried normalize-space as above.
<p>
Part Sun,
Sun
</p>
To select the elements you need, you can do something like this:
/div[@class='specifics']/div[@class='row']/div[@class='value']/p
Adding /text() on the end will grab the Part Sun, Sun in your first row, but because your second row has additional nested elements in it, that text won't be picked up.
Instead you can use /string(), which will also extract text from children. Note that using a function call as a path step like this requires XPath 2.0 or later.
/div[@class='specifics']/div[@class='row']/div[@class='value']/p/string()
If you also need to strip out whitespace, you can use either normalize-space() or translate(input, charsToReplace, replacement).
/div[@class='specifics']/div[@class='row']/div[@class='value']/p/normalize-space(string())
Using this tool I get output of String='Part Sun, Sun' and String='Herb, Houseplant'.
/div[@class='specifics']/div[@class='row']/div[@class='value']/p/translate(string(), '&#10;', '')
where &#10; is the newline character, but you could also add other characters you need removed.
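The question also asks for only the values whose name matches, which can be done outside XPath as well. Below is a minimal Python sketch using only the standard library; it is not Scrapy code, and the helpers normalized and values_by_name are hypothetical names chosen for this example. normalized behaves roughly like normalize-space(string(.)), although Python's split() treats a somewhat wider set of characters as whitespace than XPath does.

```python
import xml.etree.ElementTree as ET

HTML = """<div class="specifics">
  <div class="row">
    <div class="name"><p>Light:</p></div>
    <div class="value"><p>
        Part Sun,
        Sun
    </p></div>
  </div>
  <div class="row">
    <div class="name"><p>Type:</p></div>
    <div class="value"><p>
        Herb, Houseplant
    </p></div>
  </div>
</div>"""


def normalized(elem):
    # Join all descendant text, trim the ends and collapse
    # whitespace runs to single spaces.
    return " ".join("".join(elem.itertext()).split())


def values_by_name(specifics, wanted):
    # Map each wanted name (e.g. "Light:") to the text of its value div.
    found = {}
    for row in specifics.findall("div[@class='row']"):
        name = row.find("div[@class='name']/p")
        value = row.find("div[@class='value']/p")
        if name is None or value is None:
            continue
        label = normalized(name)
        if label in wanted:
            found[label] = normalized(value)
    return found


specifics = ET.fromstring(HTML)
traits = values_by_name(specifics, {"Light:", "Type:"})
print(traits)
```

Comparing the normalized name instead of the raw text avoids the mismatch that p[text()='Light:'] would hit if the paragraph contained stray line breaks.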

Fetching text with xpath in dynamic html structure

I have a lot of html and want to process it via xpath. There are two possible ways text can occur:
<div>
The Text
</div>
<!-- OR -->
<div>
<span>The Text</span>
</div>
<!-- BUT NOT -->
<div> other text
<span>The Text</span>
</div> other text
Is there a way I can fetch "The Text" with a single xpath expression?
edit:
concrete structure:
<div id="content">
<h1>...</h1>
<div>
...
</div>
<div>
<span>The Text</span>
</div>
I'm getting the content node via //div[@id='content'][1] and reuse it for other purposes. On this context node, I tried to execute ./div[2]/span/text() | ./div[not(span)][2]/text(). It works if there is no span, but returns blank/null if there is a span. I'm using the Java XPath implementation. The div is always the second one of the content node.
div/span/text() | div[not(span)]/text()
should do the trick. This selects text nodes that are children of the <span> (if there is a <span>), as well as text nodes that are children of the <div> if there is no <span>.
You'll have to modify the div parts to reflect the context from which you're evaluating the XPath expression. If you want to do this with all <div> elements in the document, then change div to //div.
Update:
Based on the new context information you posted, the above XPath should be modified to:
./div[2]/span/text() | ./div[2][not(span)]/text()
However, I don't see why your version is returning no text when there is a <span> element. Can you give more context: the Java code that's evaluating the XPath, or maybe a more detailed snippet of your input HTML? Is the sample input HTML really exactly representative of your actual input? Could there be another </div> in there that's going unnoticed?
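For comparison, the intent of the union expression can be written out step by step. This is a rough Python sketch with the standard library, not the Java code from the question; second_div_text is a hypothetical helper, and the h1 and first div contents are placeholders standing in for the elided parts of the sample.

```python
import xml.etree.ElementTree as ET


def second_div_text(content):
    # Roughly mirrors ./div[2]/span/text() | ./div[2][not(span)]/text():
    # take the span's text if the second div has a span, else the div's own text.
    divs = content.findall("div")
    if len(divs) < 2:
        return None
    target = divs[1]
    span = target.find("span")
    text = span.text if span is not None else target.text
    return text.strip() if text else None


# Placeholder content ("t", "x") stands in for the parts elided in the question.
with_span = ET.fromstring(
    '<div id="content"><h1>t</h1><div>x</div><div><span>The Text</span></div></div>'
)
without_span = ET.fromstring(
    '<div id="content"><h1>t</h1><div>x</div><div>\nThe Text\n</div></div>'
)

print(second_div_text(with_span))
print(second_div_text(without_span))
```

Both inputs yield the same string here, which is the behaviour the single XPath expression is meant to provide.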