Getting two different results from the same XPath - html

HTML Code:
<div class="deviceName truncate"><a ng-href="#" href="#" style="">Hello  World</a></div>
In each <a> element, the link text contains a double space: "Hello  World".
I retrieve the link texts into a list:
List<WebElement> findAllUserName = driver.findElements(
    By.xpath("//div[@class='deviceName truncate']//a[text()]"));
for (WebElement webElement : findAllUserName) {
    String findUSerText = webElement.getText();
    System.out.println(findUSerText);
}
This prints every result with a single space: "Hello World".
How can I work around this when comparing text?
The reason I ask is that I want to compare the list elements against a given string:
driver.findElement(By.xpath("//div[@class='deviceName truncate']//a[contains(text(),'" + name + "')]"))
and that XPath matches against the double space.

If you check how your browser renders the link, you will see that it also shows a single space. See this discussion: Do browsers remove whitespace in between text of any html tag
So you should either add a style to your page like this
a {
white-space: pre;
}
That will make all your links keep their whitespace exactly as written in the source.
Or try to inject the style into the particular element on the fly, as shown here: How can i set new style of element using selenium web-driver
Here is an example with two identical links that have different styles set.
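A different option, not part of the answer above, is to leave the page alone and normalize whitespace on the comparison side instead. The following is a minimal JavaScript sketch of that idea: collapseWhitespace and buildLinkXPath are hypothetical helper names, and the sketch assumes that name contains no single quotes. normalize-space() is a standard XPath 1.0 function that collapses runs of whitespace in the same spirit.

```javascript
// Collapse every run of whitespace to one space and trim the ends,
// which approximates what getText() returns for normally styled text.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Build an XPath that compares whitespace-normalized link text.
// Assumption: `name` contains no single quotes.
function buildLinkXPath(name) {
  const wanted = collapseWhitespace(name);
  return "//div[@class='deviceName truncate']//a[normalize-space(.)='" + wanted + "']";
}
```

With this, the expected string and the text returned by the list loop can both be passed through the same normalization before they are compared.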

Related

Split Web page source string based on span id using componentsSeperatedByString not working objective c

I need to get the value of a span element from an HTML web page source. I can get the entire HTML page source, but I cannot retrieve the value inside the following span element:
<span id="MySpanID"> This is my span</span>
I used componentsSeparatedByString to split the HTML page source string into array elements based on the span element, as shown in the line below.
NSArray *arrayComponents = [htmlPageSourceString componentsSeparatedByString:@"<span id="MySpanID">"];
The above line always returns nil, so I cannot split the string into an array and retrieve the value inside the span element. Can you suggest a way to retrieve the span element's value from the HTML page source?
I'm not sure how you're able to run this line of code
NSArray *arrayComponents = [htmlPageSourceString componentsSeparatedByString:@"<span id="MySpanID">"];
because quotes inside an NSString literal must be escaped with \.
Try
NSArray *arrayComponents = [htmlPageSourceString componentsSeparatedByString:@"<span id=\"MySpanID\">"];
It should work.

Replace a text keyword with a "Page Break" element in Apps Script

I want to replace a specific text keyword with a page break. Here's what I've tried:
body.findText("%PAGE_BREAK%").getElement().appendPageBreak()
and
body.replaceText("%PAGE_BREAK%", "").asBody().appendPageBreak()
I'm trying to edit existing documents which have %page_break% somewhere and replace it with an actual page break element.
The following will suffice under the assumption that "%page_break%" appears either at the end of a paragraph, or in a paragraph of its own. The script searches the text for this pattern, and appends a PageBreak element to the end of the element (paragraph) containing the found text. Then it removes the pattern.
function pageBreaks() {
var searchPattern = "%page_break%";
var body = DocumentApp.getActiveDocument().getBody();
var found = body.findText(searchPattern);
while (found) {
found.getElement().getParent().appendPageBreak();
found = body.findText(searchPattern, found);
}
body.replaceText(searchPattern, "");
}
It does not make much sense to place page breaks inside paragraphs, but Google Docs supports this, and there is a Paragraph.insertPageBreak method for it. However, to use it you would first need to split the Text element containing the search pattern so that the page break can go between the two resulting Text elements. (A Text element cannot contain any other elements.)

RegExp to search text inside HTML tags

I'm having some difficulty using a RegExp to search for text between HTML tags. This is for a search function that searches the text of an HTML page without matching characters inside the tags or attributes of the HTML. When a match is found, I surround it with a div and give it a highlight class to highlight the search words in the page. If the RegExp also matches inside tags or attributes, the HTML code becomes corrupt.
Here is the HTML code:
<html>
<span>assigned</span>
<span>Assigned > to</span>
<span>assigned > to</span>
<div>ticket assigned to</div>
<div id="assigned" class="assignedClass">Ticket being assigned to</div>
</html>
and the current RegExp I've come up with is:
(?<=(>))assigned(?!\<)(?!>)/gi
which matches if assigned or Assigned is the start of text in a tag, but not on the others. It does a good job of ignoring the attributes and tags but it is not working well if the text does not start with the search string.
Can anyone help me out here? I've been working on this for an hour now but can't find a solution (RegExp noob here..)
UPDATE 2
https://regex101.com/r/ZwXr4Y/1 shows the remaining problem regarding HTML entities and HTML comments.
The remaining problem when searching is that HTML entities are not ignored; all text inside HTML entities and comments should be ignored. So a search for "b" should not match inside an HTML entity, even when that entity sits correctly between HTML tags.
Update #2
Regex:
(<)(script[^>]*>[^<]*(?:<(?!\/script>)[^<]*)*<\/script>|\/?\b[^<>]+>|!(?:--\s*(?:(?:\[if\s*!IE]>\s*-->)?[^-]*(?:-(?!->)-*[^-]*)*)--|\[CDATA[^\]]*(?:](?!]>)[^\]]*)*]])>)|(e)
Usage:
html.replace(/.../g, function(match, p1, p2, p3) {
return p3 ? "<div class=\"highlight\">" + p3 + "</div>" : match;
})
Live demo
Explanation:
As you went through more situations, I had to modify the RegEx to cover more cases. This version covers almost all of them. How it works:
Captures all <script> tags and their contents
Captures all CDATA blocks
Captures all HTML tags (opening / closing)
Captures all HTML comments (as well as IE conditional statements)
Captures all targeted strings defined in the last group inside the remaining text (here it is (e))
Doing so lets us quickly manipulate our target, e.g. wrap it in tags as shown in the usage section. Performance-wise, I tried to write it so that it performs well.
This RegEx doesn't give a 100% guarantee of matching the correct positions (more like 99%), but it should give the expected results most of the time and can easily be modified later.
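The core idea of the answer (consume what must be skipped in one group, capture the target in another) can be sketched with a much simpler pattern. This is only an illustration under stated assumptions: highlight is a hypothetical helper name, and the pattern skips plain tags only, so unlike the full RegEx above it does not handle <script> contents, comments, CDATA, or a > character inside an attribute value.

```javascript
// Simplified pattern: group 1 matches a whole tag, group 2 the search term.
// Tags are consumed first, so the term can never match inside them.
function highlight(html, term) {
  // Escape regex metacharacters in the user-supplied term.
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp('(<[^>]+>)|(' + escaped + ')', 'gi');
  return html.replace(pattern, function (match, tag, hit) {
    // Only wrap real hits; tags are returned unchanged.
    return hit ? '<div class="highlight">' + hit + '</div>' : match;
  });
}
```

Calling highlight(html, 'assigned') on the sample HTML from the question wraps the word in the text while leaving id="assigned" and class="assignedClass" untouched.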
try this
Live Demo
string.match(/<.{1,15}>(.*?)<\/.{1,15}>/g)
Here <.{1,15}>(.*?)</.{1,15}> matches anything that sits between an opening and a closing HTML tag:
<any> Content </any>
For example, given
<div> this is the content </div>
the targeted content is "this is the content".

Extract whitespace-collapsed text from html as it would be rendered

I use an HTML parser (Neko) to extract the free text of an HTML document.
Since I'm interested in the semantics of the text, I must pay special attention to the distance between words as it appears in the browser.
For example:
<H1>My
title</H1>
<P>Hello
World</P>
Is rendered as:
My title
Hello World
whereas wrapping the paragraph in <pre> tags, or styling it with:
<style>
p { white-space:pre; }
</style>
would result in:
My title
Hello
World
which I would like to treat differently, since in that case "Hello" is not semantically tied to the word "World". As said in other posts, there's a difference between what parsing does and what rendering does. I'm interested in the connection between words as it appears after rendering, since parsing obviously doesn't collapse whitespace the way a browser does when displaying the page.
Is there any way to extract whitespace-collapsed text from HTML as it reads in a browser?
I have not used Neko before, but you will need to access the styles of the elements and check whether the white-space property is set to pre, pre-wrap, or pre-line.
If it is either pre or pre-wrap, do not modify the text.
Else if it is pre-line, only replace groups of spaces/tabs with a single space.
Else, replace any whitespace group in the text with a single space.
Here's an example using JQuery: JSFiddle
JQuery
function getRenderedText(obj) {
    var text = obj.text();
    var renderedText;
    switch (obj.css('white-space')) {
        case 'pre':
        case 'pre-wrap':
            // Whitespace is preserved as written.
            renderedText = text;
            break;
        case 'pre-line':
            // Line breaks are kept; runs of spaces/tabs collapse.
            renderedText = text.replace(/[ \t]+/g, ' ');
            break;
        default:
            // Every whitespace run collapses to a single space.
            renderedText = text.replace(/\s+/g, ' ');
    }
    return renderedText;
}
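Where no DOM or jQuery is available (for instance when the text and its computed style come from a server-side parser), the same rule can be written as a plain function. This is a rough sketch, not a full model of CSS whitespace processing: collapseRenderedText is a hypothetical name, the white-space value is assumed to be supplied by the caller, and details such as trimming at line starts and ends are ignored.

```javascript
// Approximate the text a browser would render for a given
// CSS white-space value. `whiteSpace` is supplied by the caller.
function collapseRenderedText(text, whiteSpace) {
  switch (whiteSpace) {
    case 'pre':
    case 'pre-wrap':
      // Whitespace is preserved as written.
      return text;
    case 'pre-line':
      // Line breaks are kept; runs of spaces/tabs collapse.
      return text.replace(/[ \t]+/g, ' ');
    default:
      // 'normal', 'nowrap', or unknown: collapse every whitespace run.
      return text.replace(/\s+/g, ' ');
  }
}
```

For the paragraph from the question, passing 'normal' joins "Hello" and "World" with a single space, while passing 'pre' keeps the line break between them.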
Just look at this basic info on w3schools
http://www.w3schools.com/cssref/pr_text_white-space.asp
and a bit better explained with examples:
http://css-tricks.com/almanac/properties/w/whitespace/
I also think that you have to put "Hello" in one <p> and "World" in another for the effect to work.
Otherwise they both go to the right.

How to modify how TinyMCE format text

TinyMCE applies color formatting by wrapping the text in a span tag.
Now, whenever the user changes the color of some text, I need to add
one extra character
(for those who wonder why I need this, read: Inserting HTML tag in the middle of Arabic word breaks word connection (cursive)).
This is how TinyMCE normally formats text:
<p><span style="color: #ff6600;">forma</span>tings</p>
This is how I need it to be:
<p>X<span style="color: #ff6600;">forma</span>tings</p>
So before any span I need to add one extra character.
I searched through the TinyMCE source but couldn't find where it assembles this.
I totally understand your need for a word joiner.
Depending on the browser, you might be able to insert this character using a CSS pseudo-element, in this case :before: http://www.w3schools.com/cssref/sel_before.asp
Your TinyMCE content CSS (set through the tinymce init setting content_css) should contain the following:
body span:before {
content:'\2060'; /* use '\00b6' to get something visible for testing */
}
UPDATE: Approach 2:
You can run this check to insert your word joiners:
var ed = tinymce.get('content') || tinymce.editors[0];
var span = $(ed.getBody()).find('span:not(.has_word_joiner)').each(function(index) {
    ed.selection.select(this);
    // You might want to copy the former span's attributes too, but that is a minor issue.
    ed.execCommand('mceInsertContent', false, '\u2060<span class="has_word_joiner">'+this.innerHTML+'</span>');
});
You might need to call this from your own plugin on specific events.
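If the content is available as an HTML string (for example just before it is saved), the same effect can be approximated with a plain string replacement. This is a sketch that does not use any TinyMCE API: addWordJoiners is a hypothetical helper name, and the regular expression assumes well-formed lowercase <span> tags.

```javascript
const WORD_JOINER = '\u2060';

// Make sure every opening <span> tag is preceded by exactly one
// word joiner. An existing joiner is matched and re-emitted, so
// running the function twice does not add a second one.
function addWordJoiners(html) {
  return html.replace(/(\u2060)?<span\b/g, WORD_JOINER + '<span');
}
```

Applied to the markup from the question, this yields the <p>X<span ...> shape asked for, with U+2060 in place of X.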