Replacing innerHTML closes tag unnecessarily in Angular [duplicate]

This question already has answers here:
paragraph tag not closed?
(2 answers)
Closed 3 years ago.
I have a function in Angular that takes DOM content and does search and replace to annotate specific text. The problem is that the replaced text (using innerHTML) closes tags prematurely. Simplistically, it is reading:
}--><p _ngcontent-atr-c1="" class="paragraph-body ng-star-inserted"><div>Blah blah</div></p><!--bindings={
and thinks the <p> is not closed and the </p> is not opened, so the innerHTML is inappropriately closing and opening tags automatically like so:
}--><p _ngcontent-atr-c1="" class="paragraph-body ng-star-inserted"></p><div>Blah blah</div><p></p><!--bindings={
How do I resolve this?
My function (which looks for case variants of searchTerm to replace):
startSearch(searchTerm: string) {
const content = document.getElementById('chapter').children;
const regexLower = new RegExp(`${searchTerm.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')}`, 'g');
const regexUpper = new RegExp(`${searchTerm.toUpperCase().replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')}`, 'g');
const regexCapitalized = new RegExp(
`${searchTerm.replace(/^\w/,
c => c.toUpperCase()).replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')}`, 'g'
);
for (let i = 0; i < content.length; i++) {
let block = content[i].innerHTML;
// String.prototype.replace returns a new string, so each result is assigned back
block = block.replace(regexLower, `<span class="highlight">${searchTerm.toLowerCase()}</span>`);
block = block.replace(regexUpper, `<span class="highlight">${searchTerm.toUpperCase()}</span>`);
block = block.replace(regexCapitalized, `<span class="highlight">${searchTerm.replace(/^\w/, c => c.toUpperCase())}</span>`);
content[i].innerHTML = block;
}
}

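Your HTML is malformed as far as the browser's parser is concerned.
A <p> element may only contain phrasing (inline) content, so a <div> is not allowed inside it. When the parser meets the <div> start tag it implicitly closes the open <p>, and the now-unmatched </p> produces an extra empty <p>. Use a container that allows block content (for example a <div>) instead of the <p>, or keep only inline elements inside the paragraph.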
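Separately from the nesting problem, the three case-variant patterns in the question could be collapsed into a single case-insensitive pattern. The following is only a minimal sketch, not the asker's code: escapeRegExp and highlightTerm are hypothetical helper names, and the function works on a plain string, so it is assumed to be given text content rather than markup containing tags.

```javascript
// Hypothetical helper: escape characters that are special in a regular expression.
function escapeRegExp(term) {
  return term.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical helper: wrap every case-insensitive occurrence of `term` in a span.
// `$&` in the replacement string inserts the matched text, so the original casing is kept.
function highlightTerm(text, term) {
  if (!term) {
    return text; // nothing to search for
  }
  const pattern = new RegExp(escapeRegExp(term), 'gi');
  return text.replace(pattern, '<span class="highlight">$&</span>');
}
```

Because the matched text is re-inserted as-is, there is no need to rebuild the lower-case, upper-case and capitalized variants by hand.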

Related

Is there a way to fix quotes that are inside of each other without them clashing? [duplicate]

This question already has answers here:
How to escape double quotes in a title attribute
(7 answers)
How do I properly escape quotes inside HTML attributes?
(6 answers)
Closed 2 days ago.
I'm making a list of links that have bookmarklets inside. The problem is that there are quotes in the bookmarklet that clash with the quotes. Is there a way to fix this, or otherwise is there a different way to do it?
Code:
<a href='javascript:(function() { var l = document.querySelector("link[rel*='icon']") || document.createElement('link'); l.type = 'image/x-icon'; l.rel = 'shortcut icon'; l.href = 'https://google.com/favicon.ico'; document.getElementsByTagName('head')[0].appendChild(l); document.title = 'Google';})();'>Code</a>
I tried changing the quote type, but that doesn't work. I want the javascript to be inside the link.
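One commonly used way out of this kind of clash is to write the quotes inside the attribute value as HTML character references, which the browser decodes before it runs the javascript: URL. A minimal sketch, assuming the markup is generated from a script; escapeAttribute and bookmarkletLink are hypothetical helper names, and the label is assumed to be trusted plain text:

```javascript
// Hypothetical helper: make a string safe to place inside a double-quoted HTML attribute.
function escapeAttribute(value) {
  return value
    .replace(/&/g, '&amp;')   // must come first so later entities are not double-escaped
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical helper: build the anchor markup for a bookmarklet.
function bookmarkletLink(source, label) {
  return '<a href="javascript:' + escapeAttribute(source) + '">' + label + '</a>';
}
```

When writing the HTML by hand, the same idea applies: keep one quote type for the attribute delimiter and replace every occurrence of that quote inside the value with its character reference (&quot; or &#39;).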

Chrome extension: Add style to element if it contains particular text

I want to add styling to an element only if it contains a particular string. i.e. if(el contains str) {el:style}.
If I wanted just the links containing w3.org to be pink, how would I find <a href="http://www.w3.org/1999/xhtml">article</a> inside the innerHTML and then style the word "article" on the page?
So far, I can turn ALL the links pink but I can't selectively target the ones containing "www.w3.org".
var links = [...document.body.getElementsByTagName("a")];
for (var i = 0; i < links.length; i++) {
links[i].style["color"] = "#FF00FF";
}
How would I apply this ONLY to the elements containing the string "w3.org"?
I thought this would be so simple at first! Any and all help is appreciated.
getElementsByTagName can't filter by partial href values or by contained text when it builds the initial list, but you can filter the list after the fact using plain JavaScript:
var links = [...document.body.getElementsByTagName("a")];
for (var i = 0; i < links.length; i++) {
    // skip links whose href does not contain the string
    if (links[i].href.indexOf('www.w3.org') === -1) { continue; }
    links[i].style.color = "#FF00FF";
}
That assumes you want to filter by the href. If you mean the link's visible text, use links[i].text instead.
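The same filter can be wrapped in a small reusable function. This is a sketch under assumptions: colorLinksContaining is a hypothetical name, and it accepts any array-like collection of objects that have href and style properties, so it can be tried outside a browser as well.

```javascript
// Hypothetical helper: color every link whose href contains `needle`.
// Returns the number of links that were styled.
function colorLinksContaining(links, needle, color) {
  let count = 0;
  for (const link of Array.from(links)) {
    if (link.href.includes(needle)) {
      link.style.color = color;
      count++;
    }
  }
  return count;
}

// In a page this might be called as:
// colorLinksContaining(document.getElementsByTagName('a'), 'w3.org', '#FF00FF');
```

If only the href matters, a substring attribute selector such as document.querySelectorAll('a[href*="w3.org"]') should select the same links without a manual loop.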

Regex to strip line comments from html [duplicate]

This question already has answers here:
RegEx for match/replacing JavaScript comments (both multiline and inline)
(17 answers)
Closed 5 years ago.
I am trying to remove unnecessary comments from HTML and CSS. I've found a regex to remove comments like these:
/* Write your comments here */
but what I'm looking for is a regex to remove multi-line comments like these:
<!-- Write your comments here
Second line
third Line
-->
I'm currently using this regex to remove the single-line comments:
<!--[^\[].*-->
Any assistance would be greatly appreciated.
It is better to create a temporary DOM element and walk its nodes recursively, removing the comment nodes as you go.
var str = `
<div>
test
<!-- test comment -->
<!-- test comment -->
test
<!--
test comment
multiline
-->
</div>`;
// generate div element
var temp = document.createElement('div');
// set HTML content
temp.innerHTML = str;
function removeEle(e) {
// convert child nodes into array
Array.from(e.childNodes)
// iterate over nodes
.forEach(function(ele) {
// if node type is comment then remove it
if (ele.nodeType === 8) e.removeChild(ele);
// if node is an element then call it recursively
else if (ele.nodeType === 1) removeEle(ele);
})
}
removeEle(temp);
console.log(temp.innerHTML);
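If a regex is still wanted for simple input, the usual trick is to let the match span newlines with [\s\S] and keep it non-greedy. The following is a sketch, not a robust parser: stripHtmlComments is a hypothetical name, the (?!\[) part mirrors the [^\[] in the question's pattern (it leaves comments starting with "<!--[" alone), and the pattern will misfire on "<!--" sequences inside scripts or attribute values, which is why the DOM approach above is safer.

```javascript
// Hypothetical helper: remove single-line and multi-line HTML comments from a string.
// [\s\S]*? matches any characters including newlines, as few as possible.
function stripHtmlComments(html) {
  return html.replace(/<!--(?!\[)[\s\S]*?-->/g, '');
}
```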

Counting inner text letters of HTML element

Is there a way to count the letters of inner text of an HTML element, without counting the letters of inner element's texts?
I tried the getText() method of WebElement from the Selenium library, but it also counts the text of nested elements (e.g. "<body><div>test</div></body>" results in 4 letters for both the "div" and the "body" element, instead of 0 for the "body" element).
Do I have to use an additional HTML parsing library, and if so, which one would you recommend?
I'm using Java 7...
Based on this answer for a similar question, I cooked you a solution:
The piece of JavaScript takes an element, iterates over all its child nodes and if they're text nodes, it reads them and returns them concatenated:
var element = arguments[0];
var text = '';
for (var i = 0; i < element.childNodes.length; i++)
if (element.childNodes[i].nodeType === Node.TEXT_NODE) {
text += element.childNodes[i].textContent;
}
return text;
I saved this script into a script.js file and loaded it into a single String via FileUtils.readFileToString(). You can use Guava's Files.toString(), too. Or just embed it into your Java code.
final String script = FileUtils.readFileToString(new File("script.js"), "UTF-8");
JavascriptExecutor js = (JavascriptExecutor)driver;
...
WebElement element = driver.findElement(By.id("myElement")); // any locator strategy works here
String text = (String)js.executeScript(script, element);
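Since the question asks for a count rather than the text itself, the same loop can return a length directly. A minimal sketch under assumptions: ownTextLength is a hypothetical name, surrounding whitespace is trimmed before counting (drop the trim() if whitespace should count), and the literal 3 stands in for Node.TEXT_NODE so the function also runs outside a browser.

```javascript
// Hypothetical helper: count the characters of an element's own text,
// ignoring text that belongs to nested elements.
function ownTextLength(element) {
  const TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers
  let length = 0;
  for (const child of Array.from(element.childNodes)) {
    if (child.nodeType === TEXT_NODE) {
      length += child.textContent.trim().length;
    }
  }
  return length;
}
```

For the example in the question, the body element has no direct text node, so the count is 0, while the div counts the 4 letters of "test".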

highlight words in html using regex & javascript - almost there

I am writing a jQuery plugin that will do a browser-style find-on-page search. I need to improve the search, but don't want to get into parsing the HTML quite yet.
At the moment my approach is to take an entire DOM element and all nested elements and simply run a regex find/replace for a given term. In the replace I will simply wrap a span around the matched term and use that span as my anchor to do highlighting, scrolling, etc. It is vital that no characters inside any html tags are matched.
This is as close as I have gotten:
(?<=^|>)([^><].*?)(?=<|$)
It does a very good job of capturing all characters that are not in an html tag, but I'm having trouble figuring out how to insert my search term.
Input: Any html element (this could be quite large, eg <body>)
Search Term: 1 or more characters
Replace Txt: <span class='highlight'>$1</span>
UPDATE
The following regex does what I want when I'm testing with http://gskinner.com/RegExr/...
Regex: (?<=^|>)(.*?)(SEARCH_STRING)(?=.*?<|$)
Replacement: $1<span class='highlight'>$2</span>
However, I am having some trouble using it in my JavaScript. With the following code Chrome is giving me the error "Invalid regular expression: /(?<=^|>)(.*?)(Mary)(?=.*?<|$)/: Invalid group".
var origText = $('#'+opt.targetElements).data('origText');
var regx = new RegExp("(?<=^|>)(.*?)(" + $this.val() + ")(?=.*?<|$)", 'gi');
$('#'+opt.targetElements).each(function() {
var text = origText.replace(regx, '$1<span class="' + opt.resultClass + '">$2</span>');
$(this).html(text);
});
It's breaking on the group (?<=^|>) - is this something clumsy or a difference in the Regex engines?
UPDATE
The regex breaks on that group because JavaScript did not support regex lookbehind at the time (lookbehind assertions were only added to the language in ES2018). For reference & possible solutions: http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript.
Just use jQuery's built-in text() method. It returns the combined text of the selected DOM element and its descendants, without any tags.
For the DOM approach (docs for the Node interface): walk over all child nodes of an element. If the child is an element node, recurse into it. If it's a text node, search its text (node.data); to highlight a match, shorten the node's text to the part before the match, then insert a highlight span containing the matched text, followed by another text node holding the rest of the text.
Example code (adjusted, origin is here):
(function iterate_node(node) {
    if (node.nodeType === 3) { // Node.TEXT_NODE
        var text = node.data,
            match = /any regular expression/.exec(text); // indexOf also applicable
        if (match && match[0].length > 0) {
            var pos = match.index,
                length = match[0].length;
            node.data = text.slice(0, pos); // split into a part before...
            var rest = document.createTextNode(text.slice(pos + length)); // ...a part after...
            var highlight = document.createElement("span"); // ...and a part between
            highlight.className = "highlight";
            highlight.appendChild(document.createTextNode(match[0]));
            node.parentNode.insertBefore(rest, node.nextSibling); // insert after
            node.parentNode.insertBefore(highlight, node.nextSibling);
            iterate_node(rest); // maybe there are more matches
        }
    } else if (node.nodeType === 1) { // Node.ELEMENT_NODE
        // iterate over a snapshot, so the highlight spans inserted above
        // are not visited (and wrapped) again by this loop
        var children = Array.prototype.slice.call(node.childNodes);
        for (var i = 0; i < children.length; i++) {
            iterate_node(children[i]); // run recursive on DOM
        }
    }
})(content); // any dom node
There's also highlight.js, which might be exactly what you want.
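The splitting step of the DOM approach (text before the match, the match itself, the rest) can be tried on a plain string first. A minimal sketch, assuming a hypothetical helper named splitByMatches; each returned segment would correspond to either a plain text node or a highlight span:

```javascript
// Hypothetical helper: split `text` into segments, marking the parts that match `pattern`.
function splitByMatches(text, pattern) {
  // Copy the pattern with the global flag so exec() advances through the text.
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
  const regex = new RegExp(pattern.source, flags);
  const segments = [];
  let last = 0;
  let match;
  while ((match = regex.exec(text)) !== null) {
    if (match[0].length === 0) { // skip empty matches to avoid an endless loop
      regex.lastIndex++;
      continue;
    }
    if (match.index > last) {
      segments.push({ text: text.slice(last, match.index), highlight: false });
    }
    segments.push({ text: match[0], highlight: true });
    last = match.index + match[0].length;
  }
  if (last < text.length) {
    segments.push({ text: text.slice(last), highlight: false });
  }
  return segments;
}
```

Because only text is ever searched, characters inside tags and attributes can never be matched, which is the requirement stated in the question.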