Finding a span with a specific word in its text using XPath - html

I have many spans with different text, and I need to find the span whose long text contains a certain word.
<div class="commentMessage">
<span>Any endgame spoilers</span>
</div>
In this case, I want to select the span whose text contains the word 'endgame'.
Here's what I tried:
//div[@class='commentMessage']/span[text()='endgame'] - but it shows no results (very strange).
Then I tried to find the span using the whole phrase:
//div[@class='commentMessage']/span[text()='Any endgame spoilers'] - and it works.
But as I said, I need to find the span using a certain word in it.
I also tried this construction:
//div[@class='commentMessage']/span[contains(text(), 'endgame')] - but the Chrome XPath extension says: Type is not appropriate for the context in which the expression occurs

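Try the following XPath:
//div[@class='commentMessage']/span[contains(.,'endgame')]
text()='endgame' is an equality test, so it only matches a span whose text is exactly 'endgame'. contains(., 'endgame') instead looks for the substring in the string value of the span itself (.), i.e. all of its text.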
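Python's standard library has no full XPath engine, so the following is only a sketch of the two predicates using xml.etree.ElementTree on a hypothetical copy of the markup (the second span is invented): s.text plays the role of text(), and joining s.itertext() gives the string value that . stands for.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup based on the question; the second span is invented.
root = ET.fromstring(
    '<body><div class="commentMessage">'
    '<span>Any endgame spoilers</span><span>No spoilers here</span>'
    '</div></body>'
)

# ElementTree's limited XPath supports this attribute predicate natively.
spans = root.findall(".//div[@class='commentMessage']/span")

# Like text()='endgame': equality on the whole text, so nothing matches.
exact = [s.text for s in spans if s.text == "endgame"]

# Like contains(., 'endgame'): a substring test on the span's full string value.
matching = ["".join(s.itertext()) for s in spans
            if "endgame" in "".join(s.itertext())]

print(exact)     # []
print(matching)  # ['Any endgame spoilers']
```

Note that contains() is a plain substring test, so a word such as "endgames" would match too.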

Related

How to match text and skip HTML tags using a regular expression?

I have a bunch of records in a QuickBase table that contain a rich text field. In other words, they each contain some paragraphs of text intermingled with HTML tags like <p>, <strong>, etc.
I need to migrate the records to a new table where the corresponding field is a plain text field. For this, I would like to strip out all HTML tags and leave only the text in the field values.
For example, from the input below, I would expect to extract just the text "just a small example link to a webpage":
<p>just a small <a href="#">
example</a> link</p><p>to a webpage</p>
As I am trying to get this done quickly and without coding or using an external tool, I am constrained to using Quickbase Pipelines' Text channel tool. The way it works is that I define a regex pattern and it outputs only the bits that match the pattern.
So far I've been able to come up with this regular expression (Python-flavored, as QB's backend is written in Python), which correctly does the exact opposite of what I need, i.e. it matches only the HTML tags:
/(<[^>]*>)/
In a sense, I need the negative image of this expression, but I have not been able to build it myself.
Your help in "negating" the above expression is most appreciated.
Assuming there are no < or > characters anywhere else (or that any in the text are entity-encoded), here is an idea using a lookbehind:
(?:(?<=>)|^)[^<]+
See this demo at regex101
(?:(?<=>)|^) is an alternation between ^ (the start of the string) and a lookbehind for a preceding >. From there, [^<]+ matches one or more characters that are not < (a negated character class).
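Since the pattern is Python-flavored, it can be sanity-checked locally with Python's re module on the example input from the question (the line break kept as \n). This only shows which pieces match; it does not model how the Pipelines Text channel combines the matches.

```python
import re

# The answer's pattern: text that starts right after a '>' (or at the very
# start of the string) and runs up to the next '<'.
TEXT_OUTSIDE_TAGS = re.compile(r"(?:(?<=>)|^)[^<]+")

html = '<p>just a small <a href="#">\nexample</a> link</p><p>to a webpage</p>'
parts = TEXT_OUTSIDE_TAGS.findall(html)
print(parts)  # ['just a small ', '\nexample', ' link', 'to a webpage']
```

Text from adjacent elements, such as the two <p>s, comes back as separate matches, so the pieces may need a separator when they are joined.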

RegExp to search text inside HTML tags

I'm having some difficulty using a RegExp to search for text between HTML tags. This is for a search function that searches text on an HTML page without matching characters inside the tags or attributes of the HTML. When a match is found, I surround it with a div and assign it a highlight class to highlight the search words on the page. If the RegExp also matches inside tags or attributes, the HTML becomes corrupted.
Here is the HTML code:
<html>
<span>assigned</span>
<span>Assigned > to</span>
<span>assigned > to</span>
<div>ticket assigned to</div>
<div id="assigned" class="assignedClass">Ticket being assigned to</div>
</html>
and the current RegExp I've come up with is:
/(?<=(>))assigned(?!\<)(?!>)/gi
which matches if assigned or Assigned is at the start of the text inside a tag, but not in the other cases. It does a good job of ignoring attributes and tags, but it does not work when the text does not start with the search string.
Can anyone help me out here? I've been working on this for an hour now but can't find a solution (RegExp noob here...).
UPDATE 2
https://regex101.com/r/ZwXr4Y/1 shows the remaining problem regarding HTML entities and HTML comments.
The remaining problem when searching is that HTML entities (such as &nbsp;) are not ignored: all text inside HTML entities and comments should be ignored. So when searching for "b", it should not match inside the entity, even though the entity sits correctly between HTML tags.
Update #2
Regex:
(<)(script[^>]*>[^<]*(?:<(?!\/script>)[^<]*)*<\/script>|\/?\b[^<>]+>|!(?:--\s*(?:(?:\[if\s*!IE]>\s*-->)?[^-]*(?:-(?!->)-*[^-]*)*)--|\[CDATA[^\]]*(?:](?!]>)[^\]]*)*]])>)|(e)
Usage:
html.replace(/.../g, function(match, p1, p2, p3) {
  return p3 ? "<div class=\"highlight\">" + p3 + "</div>" : match;
});
Live demo
Explanation:
As you went through more situations, I had to modify the regex to cover more cases. I've now come up with this one, which covers almost all of them. How it works:
Captures all <script> tags and their contents
Captures all CDATA blocks
Captures all HTML tags (opening / closing)
Captures all HTML comments (as well as IE conditional comments)
Captures all occurrences of the target string, defined in the last group, inside the remaining text (here it is (e))
Doing so lets us easily manipulate the target, e.g. wrap it in tags as shown in the usage section. Performance-wise, I tried to write it so that it performs well.
This regex doesn't guarantee matching the correct positions in every case, but it should give the expected results most of the time and can easily be modified later.
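For readers outside JavaScript, the same approach can be sketched with Python's re.sub and a replacement function. The answer's regex is kept verbatim except that the hard-coded (e) group is replaced by an escaped search term, and re.IGNORECASE mirrors the question's /gi. The highlight() helper is hypothetical, and like the original regex it does not skip HTML entities.

```python
import re

# The answer's pattern minus the final "|(e)". Group 2 swallows <script>
# blocks, tags, comments and CDATA sections, so the search term (group 3)
# can only match in the remaining text.
SKIP = (
    r'(<)(script[^>]*>[^<]*(?:<(?!\/script>)[^<]*)*<\/script>'
    r'|\/?\b[^<>]+>'
    r'|!(?:--\s*(?:(?:\[if\s*!IE]>\s*-->)?[^-]*(?:-(?!->)-*[^-]*)*)--'
    r'|\[CDATA[^\]]*(?:](?!]>)[^\]]*)*]])>)'
)


def highlight(html: str, term: str) -> str:
    """Hypothetical helper: wrap each occurrence of term outside markup."""
    pattern = re.compile(SKIP + '|(' + re.escape(term) + ')', re.IGNORECASE)

    def wrap(m: re.Match) -> str:
        # Group 3 is set only when the term was found outside markup;
        # anything else (tags, comments, scripts) is returned unchanged.
        if m.group(3) is not None:
            return '<div class="highlight">' + m.group(3) + '</div>'
        return m.group(0)

    return pattern.sub(wrap, html)


print(highlight('<div id="assigned" class="assignedClass">'
                'Ticket being assigned to</div>', 'assigned'))
```

The id and class attributes are left alone because the whole opening tag is consumed by group 2 before the search term gets a chance to match inside it.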
Try this:
Live Demo
string.match(/<.{1,15}>(.*?)<\/.{1,15}>/g)
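This pattern, <.{1,15}>(.*?)</.{1,15}>, captures anything between an opening and a closing HTML tag:
<any> Content </any>
Whatever is between the tags is the target, or result. For example, in
<div> this is the content </div>
the result is "this is the content". Note that with the g flag, String.prototype.match() returns the whole matches (tags included) rather than the captured group; use matchAll() and read group 1 of each result to get only the content.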
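In Python, re.findall returns just the captured group when a pattern has exactly one group, which makes it easy to see what this pattern extracts. The sketch below is a direct translation of the answer's regex; the second input is a made-up case showing that with nested tags the 15-character opening-tag part can swallow markup and text.

```python
import re

# The answer's pattern: an "opening tag" of 1-15 characters, a lazily
# captured body, then a "closing tag" of 1-15 characters.
BETWEEN_TAGS = re.compile(r"<.{1,15}>(.*?)</.{1,15}>")

# With one capturing group, findall returns only the group contents.
simple = BETWEEN_TAGS.findall("<div> this is the content </div>")

# Nested markup: the opening-tag part matches "<p>a <b>b</b>", so only
# the trailing text is captured.
nested = BETWEEN_TAGS.findall("<p>a <b>b</b> c</p>")

print(simple)  # [' this is the content ']
print(nested)  # [' c']
```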

XPath that finds a specific text and only this specific text

I'm using an XPath to locate an element that contains a certain text. My problem is that it also locates another element that contains the text I'm looking for plus some other text. Here is the XPath I'm using:
//a[contains(text(), 'Workflow')]
I want to locate a link that contains the text Workflow and Workflow only, but the XPath also locates a link with Workflow.MAINMENU, which I don't want.
Is this possible with XPath?
Yes, this is possible. Instead of using the contains function, compare the text directly:
//a[text() = 'Workflow']
If there is whitespace surrounding the text, you could use:
//a[normalize-space(text()) = 'Workflow']
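The difference between the three tests can be sketched with Python's xml.etree.ElementTree on hypothetical links. ElementTree's limited XPath has no contains() or normalize-space(), so those two are emulated in plain Python; its native [.='...'] predicate compares an element's full text content, which matches text() = 'Workflow' for links that contain only text.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: an exact link, a link that only contains the word,
# and an exact link padded with whitespace.
root = ET.fromstring(
    "<div><a href='#1'>Workflow</a><a href='#2'>Workflow.MAINMENU</a>"
    "<a href='#3'>  Workflow \n</a></div>"
)

# Emulates contains(text(), 'Workflow'): a substring test, matches all three.
contains = [a.get("href") for a in root.iter("a") if "Workflow" in (a.text or "")]

# Native ElementTree predicate: full text content must equal the string.
exact = [a.get("href") for a in root.findall(".//a[.='Workflow']")]

# Emulates normalize-space(text()) = 'Workflow': trim and collapse whitespace.
normalized = [a.get("href") for a in root.iter("a")
              if " ".join((a.text or "").split()) == "Workflow"]

print(contains)    # ['#1', '#2', '#3']
print(exact)       # ['#1']
print(normalized)  # ['#1', '#3']
```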

Using XPath to find an li with contains() doesn't work on text after a <b> tag

I've got some li elements on a page and I'm trying to identify one with XPath. The only trouble is that I need to make sure all of the text matches, so I need to match on all the text, and there is a <b> in there that is giving me hassle (I'm using a Chrome add-in to validate the XPath and it keeps telling me it's null when I try). Any suggestions welcome!
Here is the HTML on the page:
<li>
Some pre text, <b>bold</b> nothing here is identified.
</li>
Here is what I've tried that doesn't work:
//ul/li[contains(text(),'') and contains(text(),'bold') and contains(text(),'nothing here is identified')]
I also tried this, just to see if it works (bear in mind my XPath needs to check all the text within that li), but it won't identify the li at all using any text after the bold tag:
//ul/li[contains(text(),'nothing here is identified')]
What obvious XPath trickery am I missing?
Cheers
You can use the following:
//ul/li[contains(.,'') and contains(.,'bold') and contains(.,'nothing here is identified')]
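Using text() gives you the li's own text-node children. Here there are two of them ("bold" is a text node of the <b> element, not of the li):
Some pre text,
nothing here is identified.
In XPath 1.0, contains() converts that node-set to the string value of its first node only, so text after the <b> is never tested; an XPath 2.0 engine raises an error instead, because contains() accepts only a single string. Using . (the li itself) gives you one string: the concatenation of all its descendant text nodes, "bold" included. (current() is an XSLT function, so it is not a substitute for . in a plain XPath evaluator.)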
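The difference between the li's own text nodes and its string value can be sketched with Python's xml.etree.ElementTree, which stores text around child elements as .text and .tail. This is an emulation of the XPath behaviour, not an XPath engine: li.text stands in for the first text node that contains(text(), ...) sees in XPath 1.0, and joining itertext() gives the string value that contains(., ...) uses.

```python
import xml.etree.ElementTree as ET

li = ET.fromstring("<li>\nSome pre text, <b>bold</b> nothing here is identified.\n</li>")

# The li's own text-node children: its leading text plus each child's tail.
own_text_nodes = [li.text] + [child.tail for child in li if child.tail]

# What contains(text(), ...) looks at in XPath 1.0: the first text node only.
first_text_node = own_text_nodes[0]

# What contains(., ...) looks at: all descendant text nodes, concatenated.
string_value = "".join(li.itertext())

phrases = ["Some pre text,", "bold", "nothing here is identified"]
print(all(p in first_text_node for p in phrases))  # False
print(all(p in string_value for p in phrases))     # True
```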

Using Ruby and watir-webdriver, how does one get the XPath of an element if the text is known?

Using watir-webdriver, I am trying to click the "down" button next to some known text on the screen (in this case, "An Example"). When I click the down button, the text itself will move down a list of arbitrary length. I don't always know where the text will appear in the list, and also there is no ID on the text or the down button to uniquely identify the location to click. The button has a class attached ("rowDown down"), but as there can be multiple rows of text, the button class is not unique.
In situations where I can't get a unique ID, I always turn to XPath.
In this case, I know the XPath of the text I care about will end with /div[2], and the button I want to click is in div[1]; more specifically, the button I want will have an XPath ending in div[1]/button[2].
The question is, how do I get the XPath for the text using watir-webdriver?
(An example of the full XPath I could be dealing with is "//*[@id='sortOrder0']/tbody/tr[2]/td[1]/div[2]".)
Alternatively, and equally acceptable: is there some other reliable way of getting to the button I care about?
Here is the relevant portion of my HTML, which produces an up arrow and a down arrow next to the words "An Example":
<div>
<button class="rowUp up"></button>
<br />
<button class="rowDown down"></button>
</div>
<div>
An Example
</div>
You can check the text of a node in an XPath using text(). Therefore you could write an XPath that finds the div with the text "An Example", goes to the preceding div, and then selects the down button in it to click:
browser.button(xpath: '//div[normalize-space(text())="An Example"]
/preceding-sibling::div[1]
/button[contains(@class, "rowDown")]').click
Personally I find this hard to read and error prone. I prefer to use basic locators and leave the XPath generation to Watir-Webdriver's internals. A similar effect to the XPath can be achieved with:
browser.div(text: 'An Example').parent.button(class: 'rowDown').click
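Watir itself is Ruby, but the navigation that the XPath above performs can be sketched in Python with xml.etree.ElementTree, which lacks the preceding-sibling axis: find the div whose normalized text is "An Example", step to the nearest earlier div sibling, and pick the button whose class contains rowDown. The wrapping <td> is an assumption, and down_button_for is a hypothetical helper, not part of Watir.

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in a hypothetical <td> container.
td = ET.fromstring("""<td>
<div>
<button class="rowUp up"></button>
<br />
<button class="rowDown down"></button>
</div>
<div>
An Example
</div>
</td>""")


def down_button_for(container, label):
    """Hypothetical helper: the rowDown button next to the div labelled `label`."""
    divs = [child for child in container if child.tag == "div"]
    for i, div in enumerate(divs):
        # normalize-space(text()) = label
        if i > 0 and " ".join((div.text or "").split()) == label:
            # preceding-sibling::div[1]: the nearest earlier div sibling
            previous = divs[i - 1]
            # button[contains(@class, "rowDown")]
            for button in previous.findall("button"):
                if "rowDown" in button.get("class", ""):
                    return button
    return None


button = down_button_for(td, "An Example")
print(button.get("class"))  # rowDown down
```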