Regex match any li of ul that contains text - html

I have a string
<ul><li>Option to add embroidered text personalization below design<br/>for only $1.00 per shirt and free setup</li><li>Men's Sizes: XS-6XL</li><li>Individually folded and bagged with size sticker for easy distribution</li><li>Ready to ship in 7 business days after art approval</li></ul>
Trying to match
<li>Men's Sizes: XS-6XL</li>
I am looking to take only the last <li></li> set that contains words
So for li that contains sizes I am looking to run something like:
(<li>).*?\b[sS]izes[ :]{1}.*?<\/li>
but that selects the first <li> instance instead of the closest.
EDIT: I can't use a html parser here like HTMLAgilityPack.

I'd use the pattern:
<li>[^<]*[Ss]izes[^<]*<\/li>
Which works like:
Element | Matches
<li> | The opening tag
[^<]* | Zero or more characters that are not the start of a new tag (<)
[Ss]izes | The keyword we are looking for
[^<]* | Zero or more characters that are not the start of a new tag (<)
<\/li> | The closing tag
And I'd take the last such matching element.
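The question's pattern fails because a lazy `.*?` still starts at the leftmost `<li>` and only stops expanding as early as it can. `[^<]*` avoids this because it cannot cross into another tag. Below is a minimal Python sketch of "take the last matching element", assuming the markup from the question; the variable names are only illustrative.

```python
import re

html = (
    "<ul><li>Option to add embroidered text personalization below design<br/>"
    "for only $1.00 per shirt and free setup</li>"
    "<li>Men's Sizes: XS-6XL</li>"
    "<li>Individually folded and bagged with size sticker for easy distribution</li>"
    "<li>Ready to ship in 7 business days after art approval</li></ul>"
)

# [^<]* cannot run past a "<", so each match stays inside a single <li>...</li>.
item_pattern = re.compile(r"<li>[^<]*[Ss]izes[^<]*</li>")

matches = item_pattern.findall(html)
# Take the last matching element, or None when nothing matched.
last_match = matches[-1] if matches else None
print(last_match)
```

One limitation to keep in mind: an `<li>` that contains a nested tag such as `<br/>` can never match this pattern, even if it also contains the keyword.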

You can use the innerHTML and innerText properties like this:
const str = "<ul><li>Option to add embroidered text personalization below design<br/>for only $1.00 per shirt and free setup</li><li>Men's Sizes: XS-6XL</li><li>Individually folded and bagged with size sticker for easy distribution</li><li>Ready to ship in 7 business days after art approval</li></ul>"
const el1 = document.createElement('div')
el1.innerHTML = str;
let liArr = el1.getElementsByTagName('li')
let resultsText = []
let resultsHTML = []
for (const listElement of liArr) {
if(listElement.innerText.indexOf('Size') >-1){
resultsText.push(listElement.innerText)
resultsHTML.push(listElement)
}
}
console.log('resultsText:::::::::::::')
console.log(resultsText)
console.log('resultsHTML::::::::::::')
console.log(resultsHTML)


How to write regex expression for this type of text?

I'm trying to extract the price from the following HTML.
<td>$75.00/<span class='small font-weight-bold text-
danger'>Piece</span></small> *some more text here* </td>
What is the regex expression to get the number 75.00?
Is it something like:
<td>$*/<span class='small font-weight-bold text-danger'>
The dollar sign is a special character in regex, so you need to escape it with a backslash. Also, you only want to capture digits, so you should use character classes.
<td>\$(\d+[.]\d\d)/<span
As the other respondent mentioned, regex changes a bit with each implementing language, so you may have to make some adjustments, but this should get you started.
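As a rough illustration in Python, assuming the HTML from the question is joined onto one line: the sample markup has a `/` between the price and `<span`, so the pattern in this sketch includes it. The names `html` and `price_pattern` are just for the example.

```python
import re

html = (
    "<td>$75.00/<span class='small font-weight-bold text-danger'>"
    "Piece</span></small> *some more text here* </td>"
)

# \$ matches a literal dollar sign; the group captures digits, a dot, two digits.
# The "/" is the slash that sits right before <span in the sample markup.
price_pattern = re.compile(r"<td>\$(\d+[.]\d\d)/<span")

match = price_pattern.search(html)
price = match.group(1) if match else None
print(price)
```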
I think you can go with /[0-9]+\.[0-9]+/.
[0-9] matches a single digit. In this example you would get the digit 7.
The + afterwards says that it should match one or more digits. So [0-9]+ will match 75. It stops there because the character after 5 is a period.
Next we add a period to the regex and make sure it's escaped. An unescaped period means "any character". Escaped, it only matches a literal period. So we have /[0-9]+\./ so far.
Then we just need to add [0-9]+ again so it will find the digits after the period too.
It's important that you don't give it the global flag like this: /[0-9]+\.[0-9]+/g. Unless you want it to find more than just the first number/period combination.
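In Python there is no global flag; the closest equivalent of that choice is `re.search` (first match only) versus `re.findall` (every match). A small sketch, where the sample string and its second price are made up for illustration:

```python
import re

# Hypothetical sample text with two decimal numbers in it.
text = "<td>$75.00/<span>Piece</span> was $80.50</td>"

number_pattern = re.compile(r"[0-9]+\.[0-9]+")

# search() stops at the first match, like a regex without the g flag.
first = number_pattern.search(text).group(0)

# findall() returns every non-overlapping match, like a regex with the g flag.
every = number_pattern.findall(text)

print(first)
print(every)
```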
There is another regex you can use. It uses the parentheses to group the part you're looking for like this: /<td>\$(.+)<span/
It will match everything from <td>$ up to <span. From there you can filter out the group/part you're looking for. See the examples below.
// JavaScript
const text = "<td>$something<span class='small font-weight..."
const regex = /<td>\$(.+)<span/g
const match = regex.exec(text) // this will return an Array
console.log( match[1] ) // prints out "something"
# python
import re
text = "<td>$something<span class='small font-weight..."
regex = re.compile(r"<td>\$(.+)<span")
print( regex.search(text).group(1) ) # prints out "something"
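One thing to watch with `(.+)`: it is greedy, so if the cell contains more than one `<span`, the group runs up to the last one. A lazy `(.+?)` stops at the first. A short sketch with made-up markup that has two spans:

```python
import re

# Hypothetical cell with two <span> elements.
text = "<td>$75.00/<span class='a'>Piece</span> <span>extra</span></td>"

# Greedy: captures everything up to the LAST "<span".
greedy = re.search(r"<td>\$(.+)<span", text).group(1)

# Lazy: captures as little as possible, up to the FIRST "<span".
lazy = re.search(r"<td>\$(.+?)<span", text).group(1)

print(repr(greedy))
print(repr(lazy))
```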
As an alternative you could use a DOMParser.
Wrap your <td> inside a table, use for example querySelector to get your element and get the first node from the childNodes.
That would give you $75.00/.
To remove the $ and the trailing forward slash you could use slice or use a regex like \$(\d+\.\d+) and get the value from capture group 1.
let html = `<table><tr><td>$75.00/<span class='small font-weight-bold text-
danger'>Piece</span></small> *some more text here* </td></tr></table>`;
let parser = new DOMParser();
let doc = parser.parseFromString(html, "text/html");
let result = doc.querySelector("td");
let textContent = result.childNodes.item(0).nodeValue;
console.log(textContent.slice(1, -1));
console.log(textContent.match(/\$(\d+\.\d+)/)[1]);

Find first word in html and replace

I have following construct:
<h1>
<span>
</span>
</h1>
....
and
<div id="tourname">Riding trip arround the volcano</div>
Now the div with the id is filled by a php function and I would like that
my h1 looks like this:
<h1>
<span>Riding</span>
trip arround the volcano
</h1>
I managed to fill the h1 using this function:
$("h1").html($("#tourname").html());
but I have no idea how to split my string into 2 parts and fill one part
into that span and the rest behind the /span
Can you give a hint ?
Many thanks
You can do that in 2 ways.
1 - When you write the markup in PHP, you can create the span for the first word there.
2 - Using jQuery. See the code below:
$(document).ready(function(){
var text = $("#tourname").html();
var firstword = text.substr(0,text.indexOf(' ')); // to get the first word
var remaining = text.substr(text.indexOf(' ')+1); // remaining words
var final = '<span>'+firstword+'</span> '+ remaining // combining by adding <span> around the first word
$("h1").html(final);
});
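Note that with a one-word title `indexOf(' ')` returns -1, so `substr(0, -1)` yields an empty first word and the span stays empty. The splitting logic on its own can be sketched in Python, for example if the string is prepared on the server; `wrap_first_word` is a hypothetical helper name, and the sketch assumes the title is plain text without markup.

```python
def wrap_first_word(text: str) -> str:
    """Wrap the first word in <span>...</span> and keep the rest after it."""
    # partition() splits at the first space only; when there is no space,
    # `rest` is empty and the whole text counts as the first word.
    first, _, rest = text.strip().partition(" ")
    wrapped = f"<span>{first}</span>"
    return f"{wrapped} {rest}" if rest else wrapped


print(wrap_first_word("Riding trip arround the volcano"))
print(wrap_first_word("Riding"))
```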
use this:
var strArr = $("#tourname").html().split(' ');
$("h1 span").html(strArr.shift());
$("h1").append(document.createTextNode(strArr.join(" ")));
This should be of some help.
This will be specific to your use only.
For more flexibility with number of text elements use strArr.splice(index, number_of_elements_to_include_in_span);
instead of strArr.shift();

How to use the text between HTML tags to access an element - Selenium WebDriver

I have following HTML code.
<span class="ng-binding" ng-bind="::result.display">All Sector ETFs</span>
<span class="ng-binding" ng-bind="::result.display">China Macro Assets</span>
<span class="ng-binding" ng-bind="::result.display">Consumer Discretionary (XLY)</span>
<span class="ng-binding" ng-bind="::result.display">Consumer Staples (XLP)</span>
As can be seen, the tags are the same on every line; only the text between the tags differs.
How can I access each of the above lines separately, based on the text between the tags?
use the below as xpath
//span[text()='All Sector ETFs']
You can use x-path function text() for that.
For example
//span[text()="All Sector ETFs"]
to find first span
You can use following xPath to find desired element based on text
String text = "Your text";
//text may be ==>All Sector ETFs, China Macro Assets, Consumer Discretionary (XLY), Consumer Staples (XLP)
String xPath = "//*[contains(text(),'"+text+"')]";
This way you can find each element.
Hope it helps :)
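When the text is concatenated into the XPath like this, a quote character inside the text breaks the expression, because XPath 1.0 string literals have no escape syntax. A common workaround is to pick the other quote style, or to fall back to concat() when both kinds of quote occur. Here is a sketch of that idea in Python; `xpath_literal` and `span_by_text` are hypothetical helper names, not part of Selenium.

```python
def xpath_literal(text: str) -> str:
    """Return `text` as an XPath 1.0 string literal."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote kinds present: single-quote each piece and glue the
    # pieces together with a double-quoted apostrophe via concat().
    pieces = [f"'{piece}'" for piece in text.split("'")]
    return "concat(" + ", \"'\", ".join(pieces) + ")"


def span_by_text(text: str) -> str:
    """Build an XPath that selects a span by its exact text."""
    return f"//span[text()={xpath_literal(text)}]"


print(span_by_text("All Sector ETFs"))
print(span_by_text("Men's Sizes"))
print(span_by_text("a'b\"c"))
```

The resulting string could then be passed to a locator such as By.xpath.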
Hi please do it like below
Way One
public static void main(String[] args) {
WebDriver driver = new FirefoxDriver();
List<WebElement> mySpanTags = driver.findElements(By.xpath("ur xpath"));
System.out.println("Count the number of total tags : " + mySpanTags.size());
// print the value of the tags one by one
// or do whatever you want to do with a specific tag
for(int i=0;i<mySpanTags.size();i++){
System.out.println("Value in the tag is : " + mySpanTags.get(i).getText());
// either perform next operation inside this for loop
if(mySpanTags.get(i).getText().equals("Consumer Staples (XLP)")){
// perform your operation here
mySpanTags.get(i).click(); // clicks on the span tag
}
}
// or perform next operations on span tag here outside the for loop
// in this case use index for a specific tag (e.g below)
mySpanTags.get(3).click(); // clicks on the 4 th span tag
}
Way Two
find the tag directly //span[text()='Consumer Staples (XLP)']

what does an * before a class reference mean?

I have code like below:
<div class="new-modal *max-height--xxl flex--0-1">
I want to know what *max-height--xxl means here.
This is a different question from the one about the wildcard * in CSS selectors for classes.
I am asking about the * prefix in an HTML class attribute.
Nothing in a global sense. In HTML5 there are no restrictions on what characters a class attribute can contain (with the exception of the space character, which is used to separate multiple classes).
For instance, the following HTML is valid:
<figure class="foo bar baz %foo *bar _baz 'foo (bar )baz"></figure>
Here foo is one class, %foo is a second unrelated class and 'foo is a third unrelated class.
The following is also valid:
<figure class="%*_'()"></figure>
The HTML5 specification states that the class attribute must be a set of space-separated tokens. It goes on to define these as:
A set of space-separated tokens is a string containing zero or more words (known as tokens) separated by one or more space characters, where words consist of any string of one or more characters, none of which are space characters.
It's worth noting that these symbols may need escaping (by prefixing them with a backslash (\) character) in order to be targeted by a CSS selector.
.\%\*\_\'\(\) {
color: red;
}
<figure class="%*_'()">
Hello, world!
</figure>
It doesn't mean anything special. The class name just begins with a * character.
It might be a hack designed to change the class name and, effectively, comment it out so it no longer matches a CSS selector.
var c = document.querySelector('div').classList;
var list = [];
for (var i = 0; i < c.length; i++) {
list.push(c[i]);
};
document.body.innerHTML = list.join(" / ");
<div class="new-modal *max-height--xxl flex--0-1"></div>
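The same point can be sketched in Python: splitting the attribute value on the HTML space characters gives the list of class tokens, and `*max-height--xxl` is simply one of them, while `max-height--xxl` is not present at all. `class_tokens` is a hypothetical helper written for this example.

```python
import re

# HTML space characters: space, tab, line feed, form feed, carriage return.
HTML_SPACES = " \t\n\f\r"


def class_tokens(value: str) -> list:
    """Split a class attribute value into its space-separated tokens."""
    return [token for token in re.split(f"[{HTML_SPACES}]+", value) if token]


tokens = class_tokens("new-modal *max-height--xxl flex--0-1")
print(tokens)
# The starred name is a class of its own; the unstarred name does not apply.
print("*max-height--xxl" in tokens, "max-height--xxl" in tokens)
```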

How to exclude a nested SPAN class in XPath

From the following Html code, I want to select only the first span class Text..
<span class="item_amount order_minibasket_amount order_full_minibasket">10
<span class="article">Article
<i class="icon"> >
</i>
</span>
</span>
This is my current XPATH :
//span[contains(@class,'order_minibasket_amount')]
when I use this in my Selenium Test, I got the whole SPAN TEXT.. Like :
10 Article >
I just want to get the "10" article amount..
AMOUNT(new PageElement(By.xpath("//span[contains(@class,'order_minibasket_amount')]/text()[1]"), "not such Element...."))
public String getAmount() {
return amount = PageObjectUtil.findAndInitElementInside(webElement, PageElements.AMOUNT.pe, amount, String.class);
}
Many thanks in advance,
Cheers,
koko
What you want cannot be done directly, you will have to resort to String manipulation. Something like:
String completeString = driver.findElement(By.className("item_amount")).getText()
String endString = driver.findElement(By.className("article")).getText()
String beginString = completeString.replace(endString, "")
You could add /text() to the end:
//span[contains(@class,'order_minibasket_amount')]/text()
Instead of selecting the span element node, this XPath will select the set of text nodes that are direct children of the span element. This should be a set of two nodes: one containing "10", a newline and four spaces (the text between the opening tag of the target span and the opening tag of the nested span), and the other containing just a newline (between the closing tags of the two spans). If you only want the first text node child (10, nl, spaces) then use
//span[contains(@class,'order_minibasket_amount')]/text()[1]
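To see what "first text node child" means outside Selenium, here is a small sketch using Python's standard xml.etree.ElementTree, where an element's `.text` attribute holds the text before its first child element. The markup is the snippet from the question, with the bare `>` written as `&gt;` as an assumption so that it parses cleanly as XML.

```python
import xml.etree.ElementTree as ET

markup = """<span class="item_amount order_minibasket_amount order_full_minibasket">10
    <span class="article">Article
        <i class="icon"> &gt;
        </i>
    </span>
</span>"""

outer = ET.fromstring(markup)

# .text is the text before the first child element, i.e. the first text node.
first_text = (outer.text or "").strip()

# itertext() walks all descendant text, similar to what getText() returned.
full_text = " ".join("".join(outer.itertext()).split())

print(first_text)
print(full_text)
```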
Now I am using work around solution...but i am not happy with it. :-(
public String getAmount() {
String tempAmount = PageObjectUtil.waitFindAndInitElement(PageElements.AMOUNT.pe).getText();
String output = tempAmount.replaceAll("[a-zA-Z->]", "");
return amount = output.trim();
}
cheers,
KoKo