XPath - get text from whole document except text from specified elements - html

I'm trying to figure out how to get text using XPath and exclude some tags.
Let's say (for illustration) I want to get all text from this page's body tag (so all visible text), but I don't want my text to contain text from tags with class="comment-copy" i.e. I don't want text to include comments.
I tried this but it doesn't work. It returns text including comments.
//body//text()[not(*[contains(@class,"comment-copy")])]
Do you have any idea?
EDIT:
Probably figured it out but maybe there are better or faster approaches so I won't delete the question.
//body//text()[not(ancestor-or-self::*[contains(@class,"comment-copy")])]

You were very close.
Just change
//body//text()[not(*[contains(@class,"comment-copy")])]
to
//body//text()[not(contains(../@class,"comment-copy"))]
Note that this will only exclude text() nodes that are immediate children of comment-copy marked elements. Your follow-up XPath will exclude all descendant text() nodes beneath comment-copy marked elements.
Note: You might want to beef up the robustness of the @class test; see Xpath: Find element with class that contains spaces.
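
To make the difference concrete, here is a minimal sketch using lxml (an assumption; the question does not say which XPath engine is in use). The markup is hypothetical, and the third expression is one common way to match a whole class token rather than a substring. Since a text node is never matched by `*`, `ancestor::*` behaves the same as the `ancestor-or-self::*` from the question's edit.

```python
from lxml import html

# Hypothetical markup: a post, a lookalike class, and a comment with nested text.
doc = html.fromstring(
    '<html><body>'
    '<p>Post text</p>'
    '<p class="comment-copy-note">Editor note</p>'
    '<span class="comment-copy">A comment with <a href="#">a link</a> inside</span>'
    '</body></html>'
)

def texts(xpath):
    """Run an XPath that selects text nodes and drop whitespace-only results."""
    return [t.strip() for t in doc.xpath(xpath) if t.strip()]

# Tests only the parent element, so text nested deeper inside a comment leaks through.
parent_only = texts('//body//text()[not(contains(../@class, "comment-copy"))]')

# Tests every ancestor element, so the whole comment subtree is excluded.
ancestors = texts('//body//text()[not(ancestor::*[contains(@class, "comment-copy")])]')

# Matches "comment-copy" as a whole class token instead of a substring.
token_safe = texts(
    '//body//text()[not(ancestor::*[contains('
    'concat(" ", normalize-space(@class), " "), " comment-copy ")])]'
)

print(parent_only)
print(ancestors)
print(token_safe)
```

In this sample the parent-only test still returns the link text from inside the comment, while the substring test also drops the paragraph whose class merely contains "comment-copy".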

Related

Finding xpath of element

In the following snippet
I want to get xpath of the element containing the text 'This is what I should get'. I use the xpath expression html/body/div[5]/div[3]/div/div/div/div[2]/div/table/tbody/tr[2]/td/span, but I am getting the element with text 'This is what I am getting'. Please help me to modify element locator to get desired text
There must be a better XPath expression than that verbose one, but without more information I can only make suggestions based on the existing XPath. The desired text node can be identified either as the text node that follows the previously selected span element:
..../table/tbody/tr[2]/td/span/following-sibling::text()[1]
or as a direct child text node of the parent td element:
..../table/tbody/tr[2]/td/text()[normalize-space()]
If you want to get the text node, the xpath would be:
html/body/div[5]/div[3]/div/div/div/div[2]/div/table/tbody/tr[2]/td/text()[2]
Although the XPath expression should probably be less verbose.
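
The snippet from the question is not reproduced here, so the following lxml sketch runs the three suggestions against a hypothetical td that holds a span followed by loose text. Note that text()[2] only works in this sample because a whitespace text node precedes the span; that is an observation about this made-up markup, not about the original page.

```python
from lxml import etree

# Hypothetical cell: whitespace, a <span>, then the text we actually want.
cell = etree.fromstring(
    "<td>\n  <span>This is what I am getting</span>\n  This is what I should get\n</td>"
)

# First text node after the span.
after_span = [t.strip() for t in cell.xpath("span/following-sibling::text()[1]")]

# Direct child text nodes that are not whitespace-only.
non_blank = [t.strip() for t in cell.xpath("text()[normalize-space()]")]

# Second direct child text node; relies on the whitespace node before the span.
second = [t.strip() for t in cell.xpath("text()[2]")]

print(after_span, non_blank, second)
```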

How to dynamically display a multiline text in D3.js?

I need to display a multiline text in a SVG:Text using D3.js.
The sample data looks as follows and I want to display "all" the "titles" under a single node for every author and not as an individual node in a force directional layout.
Sample data
[
{"author":"Author1", "group":"fiction", "books" : [
{"title":"Book Title1", "rating":3},
{"title":"Book Title2", "rating":4}
]},
{"author":"Author2", "group":"non-fiction", "books" : [
{"title":"Book Title3", "rating":3}
]}
]
SVG:text takes only one text entry and displays it on a single line, so do I have to add more text elements and adjust the "dy"? Or retroactively collect node information and replace?
Thanks for the tips.
You have the following options.
You can, as you've mentioned, add more than one text element with the appropriate spacing.
You can also use multiple tspan elements within a text element to the same effect. Again, you would have to set the spacing yourself.
You can use foreignObject to embed a suitable HTML element (e.g. a div) that will take care of the line breaking, spacing etc. for you. For an example of that, see e.g. here.
I would go with the HTML embedding option unless you have a specific reason not to. It makes the actual text formatting so much easier than the other options.
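
For the tspan option, the target markup looks roughly like what this sketch produces. It is written in Python with the standard library only to show the structure; in D3 you would append the same tspan elements yourself. The helper name `multiline_text` and the 1.2em line height are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def multiline_text(lines, x=0, line_height="1.2em"):
    """Build an SVG <text> element holding one <tspan> per line (hypothetical helper)."""
    text = ET.Element("text", {"x": str(x)})
    for i, line in enumerate(lines):
        # Resetting x returns to the left edge; dy moves every line after the first down.
        tspan = ET.SubElement(
            text, "tspan", {"x": str(x), "dy": "0" if i == 0 else line_height}
        )
        tspan.text = line
    return text

# One author record shaped like the sample data in the question.
author = {"author": "Author1", "books": [{"title": "Book Title1"}, {"title": "Book Title2"}]}
svg_text = multiline_text([book["title"] for book in author["books"]])
print(ET.tostring(svg_text, encoding="unicode"))
```

Each tspan carries its own x and dy, which is the manual spacing the answer refers to.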

Using neutral <div> as word boundary?

I have a .html file containing text content like:
<div> The study concludes that 1+1 = 2. (Author in Journal..., Page ...) Another study finds...</div>
Now when viewing this in Firefox, I want to be able to conveniently copy the text in the () brackets. But 2 left mouseclicks only mark one word like "Journal", and 3 clicks mark the content of the whole div.
So my idea was to put the brackets in another div like:
<div> The study concludes that 1+1 = 2. <div>(Author in Journal..., Page ...)</div> Another study finds...</div>
But this leads to the () text being pushed into a new line, but the text flow shouldn't be altered at all, I just want to achieve the copy+paste behavior. Is there a way to achieve this? I thought about applying a div class to the () and canceling the attributes in the .css file, but somehow it did not work.
Essentially, a triple click marks a paragraph. So even if you made your inner div inline (which is very simple: you can use style="display:inline"), the browser's text-analyzing engine would still read it as one paragraph (or one block) and apply the standard behaviour: mark the paragraph.
So basically: no, not if you use only CSS. You have to use JavaScript to identify a triple click on the element and mark it.

Get (text) in XPath

I have the following DOM structure / HTML, I want to get (just practicing...) the marked data.
The one under the h2 element. The div[@class="coordsAgence"] element has some more div children below and some more h2's, so doing:
div[@class="coordsAgence"]
Will get that value, but with additional unneeded text.
UPDATE: The value (From this example) that I basically want is that: "GALLIER Dennis" text.
It seems you want the first text node in that div:
div[@class="coordsAgence"]/text()[1]
should do it.
Note that this assumes that there is actually no whitespace between those comments inside <div class="coordsAgence">; otherwise that whitespace will constitute additional text nodes that you'll have to account for.
Get the first text node following the first h2 in the div with class "coordsAgence":
div[@class='coordsAgence']/h2[1]/following-sibling::text()[1]
Note that this first expression returns the first text node after the first h2 even when some other node appears between the two. If you want to return the text only when it's the node that immediately follows the first h2, then try something like this:
div[@class='coordsAgence']/h2[1][following-sibling::node()[1][self::text()]]/following-sibling::text()[1]
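
The original DOM is not shown, so this lxml sketch compares the three expressions on two hypothetical variants: one where the text directly follows the first h2, and one with leading whitespace and a br element in between.

```python
from lxml import etree

FIRST_TEXT = "div[@class='coordsAgence']/text()[1]"
AFTER_H2 = "div[@class='coordsAgence']/h2[1]/following-sibling::text()[1]"
STRICT = (
    "div[@class='coordsAgence']/h2[1]"
    "[following-sibling::node()[1][self::text()]]"
    "/following-sibling::text()[1]"
)

# Hypothetical markup: the text sits right after the first h2.
adjacent = etree.fromstring(
    "<body><div class='coordsAgence'>"
    "<h2>Agence</h2>GALLIER Dennis<h2>Autre</h2>More"
    "</div></body>"
)

# Hypothetical markup: leading whitespace, and a <br/> between the h2 and the text.
separated = etree.fromstring(
    "<body><div class='coordsAgence'>\n<h2>Agence</h2><br/>GALLIER Dennis</div></body>"
)

def run(tree, xpath):
    """Evaluate the XPath relative to the root element and return plain strings."""
    return [str(t) for t in tree.xpath(xpath)]

for name, tree in (("adjacent", adjacent), ("separated", separated)):
    print(name, run(tree, FIRST_TEXT), run(tree, AFTER_H2), run(tree, STRICT))
```

In the second variant, text()[1] picks up the whitespace node, and the stricter expression returns nothing because the node immediately after the h2 is an element.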
Using Python/Scrapy to get the text from an h1 tag (for example):
response.xpath(
"//div[contains(@class, 'class_name')]//h1[contains(@class, 'class_name')]/text()"
).get()

xpath help can't get the text?

I am unable to get the text from this website: http://mp3bear.com...so now I just want to get the title of the song that is displayed on it.. here is what i wrote as the code:
//table/tr[2]/td[2]
so now I want to get second row from second column... it doesn't display anything.... is there any thing special when
I can't find any table element on this site, the tables are constructed with divs.
Therefore the expression for the second column of the second row of the table is:
//div[@id='listwrap']/div[3]/div[2]
There are some xpath implementations that don't allow indexing of child elements in this manner. In this case you could use
//div[@id='listwrap']/div[position()='3']/div[position()='2']
Edit:
In that case you need this expression:
//div[@id='listwrap']/div[3]/div[2]/a/text()
as the title is contained in an 'a' element, and the text() node test selects the text nodes of that 'a' element.
Tested in FirePath.
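
Since the site may have changed, here is a small lxml sketch against hypothetical markup that mimics a div-based table with a header row (so the second data row is div[3]). The element contents and hrefs are made up for illustration.

```python
from lxml import etree

# Hypothetical div-based "table": one header row followed by two data rows.
page = etree.fromstring(
    "<html><body><div id='listwrap'>"
    "<div><div>No.</div><div>Title</div></div>"
    "<div><div>1</div><div><a href='/song-a'>Song A</a></div></div>"
    "<div><div>2</div><div><a href='/song-b'>Song B</a></div></div>"
    "</div></body></html>"
)

# Selects the cell element itself, not its text.
cell = page.xpath("//div[@id='listwrap']/div[3]/div[2]")

# Selects the text node inside the link in that cell.
title = page.xpath("//div[@id='listwrap']/div[3]/div[2]/a/text()")

# Same selection with explicit position() tests.
explicit = page.xpath(
    "//div[@id='listwrap']/div[position()=3]/div[position()=2]/a/text()"
)

print(len(cell), title, explicit)
```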