Chrome extension: Add style to element if it contains particular text - html

I want to add styling to an element only if it contains a particular string. i.e. if(el contains str) {el:style}.
If I wanted just the links containing w3.org to be pink, how would I find <a href="http://www.w3.org/1999/xhtml">article</a> inside the innerHTML and then style the word "article" on the page?
So far, I can turn ALL the links pink but I can't selectively target the ones containing "www.w3.org".
var links = [...document.body.getElementsByTagName("a")];
for (var i = 0; i < links.length; i++) {
links[i].style["color"] = "#FF00FF";
}
How would I apply this ONLY to the elements containing the string "w3.org"?
I thought this would be so simple at first! Any and all help is appreciated.

getElementsByTagName can't filter by partial href values or by contained text when it builds the initial list, but you can filter the list after the fact using plain JavaScript:
var links = [...document.body.getElementsByTagName("a")];
for (var i = 0; i < links.length; i++) {
if (links[i]['href'].indexOf('www.w3.org') == -1) { continue };
links[i].style["color"] = "#FF00FF";
}
Assuming you want to filter by the href, that is. If you mean the literal text, you would use links[i]['text'] instead.
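If you are not tied to getElementsByTagName, the CSS attribute selector a[href*="w3.org"] lets querySelectorAll do the substring match on href for you. Below is a minimal sketch of both ideas together; colorLinksContaining is a made-up helper name, and it accepts any iterable of objects with href and style properties so that it can also be tried outside a browser:

```javascript
// Hypothetical helper (not part of any API): colors every link whose href
// contains `needle` and returns the links it changed.
// `links` can be any iterable of objects with `href` and `style` properties,
// such as the NodeList returned by document.querySelectorAll("a").
function colorLinksContaining(links, needle, color) {
  const matched = [];
  for (const link of links) {
    if (String(link.href).includes(needle)) {
      link.style.color = color;
      matched.push(link);
    }
  }
  return matched;
}

// In a content script, the selector already narrows the list to hrefs
// containing "w3.org"; the guard keeps this line from running outside a page.
if (typeof document !== 'undefined') {
  colorLinksContaining(document.querySelectorAll('a[href*="w3.org"]'), 'w3.org', '#FF00FF');
}
```

The check inside the helper is redundant when the selector is used, but it keeps the function safe to call with an unfiltered list as well.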

Related

How to extract the href value from links in HTML data based on the element's text?

I have been tasked with coding a web crawler that goes through several URLs (around 400, but the list could grow), each with a completely different HTML structure, and extracts the links containing certain information. The only thing the program knows beforehand is which keywords it should search for; the HTML structure and any semantic cues as to where to look for those keywords are unknown.
So far, I have used the request-promise module for Node.js to send a request to the URL where the search for keywords will take place:
const htmlResult = await request.get(url);
htmlResult stores the response as a string, and I can save it as either a .txt or an .html file if needed.
The problem I have is that I don't know how to instruct the program how to extract a URL based on words that aren't necessarily present in the url string. An example might help clarify:
<a href="site/with/no/keywords-just-a-random-string" title="Keywords might be here, but title attribute might be absent"><span class="img"><img data-cfsrc="/thumbpdf/618a8nb4.jpg" alt="" style="display:none;visibility:hidden;"><noscript><img src="/thumbpdf/8bfa84.jpg" alt=""></noscript></span>
<h2>KEYWORDS ARE IN THIS TAG, WHICH IN TURN IS INSIDE THE <a> TAG</h2>
<span class="date--type">2 Nov 2021 </span>
<span class="tag">
other stuff with no keywords in it</span>
</a>
As you can see, this tag has a complex structure. The keywords I need to parse are inside an h2 tag which, in turn, is inside the a tag. But the a tag could also be like this:
<a href="...">KEYWORDS TO PARSE</a>
Here the keywords are simply within the a tag.
My question, thus, is how do I parse htmlResult (either as a string or saved as a .txt/.html file) and, once I get a match, instruct the program to extract the URL of the a tag in which I got the keyword match?
As I am using Node.js, I am open to using any tool available.
Could someone offer some advice on how to tackle this challenge?
Thanks so much in advance.
This is very quick and dirty, and I'm sure it can be further streamlined, but it should get you at least closer to where you need to be.
This assumes a bunch of <div> elements, each containing one of your <a> elements, all in one document (see link below). It uses XPath to locate the data:
function xpathEval(xpath, context) {
  return document.evaluate(xpath, context, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
}
const desiredHrefs = [];
const targets = xpathEval("//div[@class='container']", document);
for (let i = 0; i < targets.snapshotLength; i++) {
  const attribs = xpathEval('.//*/@*', targets.snapshotItem(i)),
    texts = xpathEval('.//*/text()', targets.snapshotItem(i));
  // Check every attribute value inside this container.
  for (let k = 0; k < attribs.snapshotLength; k++) {
    const attribData = attribs.snapshotItem(k).textContent;
    if (attribData.includes("trainer") && attribData.includes("dog")) {
      //either
      //console.log(targets.snapshotItem(i).querySelector('a').getAttribute('href'))
      //or
      const href = xpathEval('.//a/@href', targets.snapshotItem(i));
      desiredHrefs.push(href.snapshotItem(0).textContent);
    }
  }
  // Check every text node inside this container (case-insensitively).
  for (let j = 0; j < texts.snapshotLength; j++) {
    const data = texts.snapshotItem(j).nodeValue.trim().toLowerCase();
    if (data.includes("trainer") && data.includes("dog")) {
      //either
      //console.log(targets.snapshotItem(i).querySelector('a').getAttribute('href'))
      //or
      const href = xpathEval('.//a/@href', targets.snapshotItem(i));
      desiredHrefs.push(href.snapshotItem(0).textContent);
    }
  }
}
// A container can match more than once, so print each href only once.
for (const href of [...new Set(desiredHrefs)]) {
  console.log(href);
}
You can see it in action here.
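The snippet above relies on the browser's document object, which plain Node.js does not provide. As a rough sketch of the same idea working directly on the htmlResult string, the function below pulls out each <a>...</a> block with a regular expression, strips nested tags, and checks the remaining text plus the anchor's own attributes for every keyword. The name extractHrefsByKeywords is made up for illustration. Regex-based HTML handling is fragile (it assumes anchors are not nested and that attribute values contain no ">"), so for 400 differently structured sites a real HTML parser is probably the safer choice; treat this as a starting point only:

```javascript
// Hypothetical helper: returns the hrefs of <a> elements whose attributes or
// inner text contain every keyword (case-insensitive), without duplicates.
function extractHrefsByKeywords(html, keywords) {
  const wanted = keywords.map((keyword) => keyword.toLowerCase());
  // Group 1: the anchor's attribute string. Group 2: everything up to the first </a>.
  const anchorPattern = /<a\b([^>]*)>([\s\S]*?)<\/a>/gi;
  const hrefs = new Set();
  for (const match of html.matchAll(anchorPattern)) {
    const attributes = match[1];
    const inner = match[2];
    const href = /\bhref\s*=\s*(["'])(.*?)\1/i.exec(attributes);
    if (!href) continue; // anchor without a quoted href
    // Replace nested tags with spaces so only their text content is searched.
    const innerText = inner.replace(/<[^>]*>/g, ' ');
    const haystack = (attributes + ' ' + innerText).toLowerCase();
    if (wanted.every((keyword) => haystack.includes(keyword))) {
      hrefs.add(href[2]);
    }
  }
  return [...hrefs];
}
```

With the string from the question this would be called as extractHrefsByKeywords(htmlResult, ["dog", "trainer"]), where the two keywords are only placeholders.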

Changing bullet to dash (-) in Google Document

We have a function to set a glyphType to DocumentApp.GlyphType.BULLET.
listItem.setGlyphType(DocumentApp.GlyphType.BULLET)
However, is there any way to set the glyphType to dash (-)?
For example, our list is below.
- Item 1
- Item 2
- Item 3
Ref: https://developers.google.com/apps-script/reference/document/list-item#setGlyphType(GlyphType)
The dash is not listed as a glyph type. But here is a workaround: you could make your own pre-filled list with placeholder items in a master document, copy the list, and replace the items in the target document. Perhaps this is a lot of effort for styling bullets, but it could work.
Yes, @Jason Allshorn is correct. I was able to set a custom bullet using Apps Script. I have a template doc that I copy to make a new doc. In this template I created a list item, with the text "list item", using my custom bullet glyph. Google, what is up with those giant dots? Ugly! I find that list item in the doc, copy it, and remove it. Code below:
function getListItem(ss, doc) {
var body = doc.getBody();
for (var i = 0; i < body.getNumChildren(); i++) {
var child = body.getChild(i);
var childType = child.getType();
if (childType == DocumentApp.ElementType.LIST_ITEM && child.getText() == 'list item') {
var customBulletListItem = child.copy();
body.removeChild(child);
break;
}
}
return customBulletListItem;
}
... then when I add a list item (li), I do the following:
body.insertListItem(i, li.copy());
body.getChild(i).replaceText("list item", "My new list item text");
body.getChild(i).setIndentFirstLine(0).setIndentStart(15);
body.getChild(i).editAsText().setBold(true);
This gets me my custom bullet glyph. The last two lines fix the huge indent on list items and bold the line. Google, what is up with the huge indents? Ugly!

IE8 select population performance issues - solutions needed

I have an application that has performance issues when populating selects with over 100 items. This problem only occurs in IE8. I am using AngularJS to do the population, but my research shows that this is a general problem with IE8. What solutions have others used to deal with this problem? We have over 40,000 users tied to IE8 for the foreseeable future (Fortune 200 company), so moving to another browser is not an option.
Some thoughts I had.
Create the series of option tags as one long string in memory and replace the innerHTML of the <select>. But running some samples, this does not appear to solve the issue.
Populate the select with a few options initially and then add the rest as the user scrolls down. I am not sure if this is possible, or how to implement it.
I am sure others have run into this issue. Does anyone have some ideas?
Thanks,
Jerry
Another solution that preserves the original <select> is to set the <option> values after adding the options to the <select>.
Theory
Add the <option> elements to a document fragment.
Add the document fragment to the <select>.
Set the value for each <option>.
Practice
In practice we end up with a couple issues we have to work around to get this to work:
IE11 is very slow when setting the value for each individual <option>.
IE8 has selection bugs because it isn't properly doing a re-flow/layout on the <select>.
Result
To handle these what we really do is something like the following:
Add the <option> tags to a document fragment. Make sure to set the values so that step 3 is a no-op in IE11.
Add the document fragment to the <select>.
Set the value for each <option>. In IE8 this will set the values, in IE11 this is a no-op.
In a setTimeout add and remove a dummy <option>. This forces a re-flow.
Code
function setSelectOptions(select, options)
{
select.innerHTML = ''; // Blank the list.
// 1. Add the options to a document fragment.
var docFrag = document.createDocumentFragment();
for (var i = 0; i < options.length; i++)
{
var opt = document.createElement('option');
opt.text = options[i];
docFrag.appendChild(opt);
}
// 2. Add the document fragment to the select.
select.appendChild(docFrag);
// 3. Set the option values for IE8. This is a no-op in IE11.
for (i = 0; i < options.length; i++)
select.options[i].text = options[i];
// 4. Force a re-flow/layout to fix IE8 selection bugs.
window.setTimeout(function()
{
select.add(document.createElement('option'));
select.remove(select.options.length - 1);
}, 0);
}
The best solution seems to be to create the select and its options as a text string and add that string as the innerHTML of the containing tag, such as a DIV. Below is some code.
<div id="selectHome" ></div>
In JS (from angular controller)
function insertSelect(divForSelect) {
var str = "<select id='myselect'>";
for (var i = 0; i < data.length; i++) {
str += '<option>' + data[i] + '</option>';
}
str += '</select>';
divForSelect.innerHTML = str;
}
Note that inserting options into an existing select is very slow (8,000 ms for 2,000 items). But if the select and the options are inserted as a single string, it is very fast (12 ms for 2,000 items).
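One thing to watch with the string approach is that item text containing characters such as < or & will corrupt the markup. A small sketch of a string builder that escapes each item first is shown below; escapeHtml and buildSelectHtml are made-up names, and the code sticks to var and plain loops so that it stays within what IE8 can parse:

```javascript
// Hypothetical helpers, written in ES3-style syntax for IE8.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Builds the whole <select> as one string so it can be assigned to the
// innerHTML of a container element in a single step.
function buildSelectHtml(id, items) {
  var parts = ['<select id="' + escapeHtml(id) + '">'];
  for (var i = 0; i < items.length; i++) {
    parts.push('<option>' + escapeHtml(items[i]) + '</option>');
  }
  parts.push('</select>');
  return parts.join('');
}
```

It would be used as divForSelect.innerHTML = buildSelectHtml('myselect', data); with the same div and data as in the code above.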

Counting inner text letters of HTML element

Is there a way to count the letters of inner text of an HTML element, without counting the letters of inner element's texts?
I tried out the getText() method of WebElement using the Selenium library, but it also counts the text of inner web elements (e.g. "<body><div>test</div></body>" results in 4 letters for both the "div" and the "body" element, instead of 0 for the "body" element).
Do I have to use an additional HTML parsing library, and if so, which one would you recommend?
I'm using Java 7...
Based on this answer for a similar question, I cooked you a solution:
The piece of JavaScript takes an element, iterates over all its child nodes and if they're text nodes, it reads them and returns them concatenated:
var element = arguments[0];
var text = '';
for (var i = 0; i < element.childNodes.length; i++) {
  // Only direct text nodes count; text inside child elements is skipped.
  if (element.childNodes[i].nodeType === Node.TEXT_NODE) {
    text += element.childNodes[i].textContent;
  }
}
return text;
I saved this script into a script.js file and loaded it into a single String via FileUtils.readFileToString(). You can use Guava's Files.toString(), too. Or just embed it into your Java code.
final String script = FileUtils.readFileToString(new File("script.js"), "UTF-8");
JavascriptExecutor js = (JavascriptExecutor)driver;
...
WebElement element = driver.findElement(By.id("myElement")); // or any other locator
String text = (String)js.executeScript(script, element);
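Since the question asks for a count rather than the text itself, the script could also do the counting. Here is a sketch under two assumptions of mine: that "letters" means all non-whitespace characters, and that the names ownText and countOwnLetters are free to use (they are not Selenium or DOM APIs). The functions only read childNodes, nodeType and textContent, so they work on real DOM elements as well as on plain objects of the same shape:

```javascript
var TEXT_NODE = 3; // same numeric value as Node.TEXT_NODE in the DOM

// Concatenates only the element's direct text nodes.
function ownText(element) {
  var text = '';
  for (var i = 0; i < element.childNodes.length; i++) {
    var child = element.childNodes[i];
    if (child.nodeType === TEXT_NODE) {
      text += child.textContent;
    }
  }
  return text;
}

// Counts the non-whitespace characters of the element's own text
// (assumption: whitespace should not count as letters).
function countOwnLetters(element) {
  return ownText(element).replace(/\s+/g, '').length;
}
```

For the example from the question, the body element has no direct text nodes and yields 0, while the div yields 4.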

highlight words in html using regex & javascript - almost there

I am writing a jquery plugin that will do a browser-style find-on-page search. I need to improve the search, but don't want to get into parsing the html quite yet.
At the moment my approach is to take an entire DOM element and all nested elements and simply run a regex find/replace for a given term. In the replace I will simply wrap a span around the matched term and use that span as my anchor to do highlighting, scrolling, etc. It is vital that no characters inside any html tags are matched.
This is as close as I have gotten:
(?<=^|>)([^><].*?)(?=<|$)
It does a very good job of capturing all characters that are not in an html tag, but I'm having trouble figuring out how to insert my search term.
Input: Any html element (this could be quite large, eg <body>)
Search Term: 1 or more characters
Replace Txt: <span class='highlight'>$1</span>
UPDATE
The following regex does what I want when I'm testing with http://gskinner.com/RegExr/...
Regex: (?<=^|>)(.*?)(SEARCH_STRING)(?=.*?<|$)
Replacement: $1<span class='highlight'>$2</span>
However, I am having some trouble using it in my JavaScript. With the following code Chrome is giving me the error "Invalid regular expression: /(?<=^|>)(.*?)(Mary)(?=.*?<|$)/: Invalid group".
var origText = $('#'+opt.targetElements).data('origText');
var regx = new RegExp("(?<=^|>)(.*?)(" + $this.val() + ")(?=.*?<|$)", 'gi');
$('#'+opt.targetElements).each(function() {
var text = origText.replace(regx, '$1<span class="' + opt.resultClass + '">$2</span>');
$(this).html(text);
});
It's breaking on the group (?<=^|>) - is this something clumsy or a difference in the Regex engines?
UPDATE
This regex breaks on that group because JavaScript did not support regex lookbehinds at the time (they were added to the language in ES2018, so older engines still reject them). For reference & possible solutions: http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript.
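One way to avoid lookbehinds altogether, sketched here as a possibility and not as what the plugin ended up using: let a single regex alternate between "a whole tag" and "a run of text between tags", and only run the search-term replacement on the text runs. The helper names are made up, the term is escaped so user input cannot break the pattern, and the usual caveat applies that regex-based HTML handling gets confused by things like ">" inside attribute values or script blocks:

```javascript
// Hypothetical helpers; the names are illustrative only.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function highlightOutsideTags(html, term, className) {
  if (!term) return html; // an empty term would match everywhere
  const termPattern = new RegExp(escapeRegExp(term), 'gi');
  // Group 1 matches a whole tag, group 2 a run of text between tags.
  return html.replace(/(<[^>]*>)|([^<]+)/g, (whole, tag, text) => {
    if (tag !== undefined) return tag; // leave tags untouched
    return text.replace(termPattern, (hit) => `<span class="${className}">${hit}</span>`);
  });
}
```

In the plugin code above this would replace the origText.replace(regx, ...) call, for example highlightOutsideTags(origText, $this.val(), opt.resultClass).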
Just use jQuery's built-in text() method. It will return all the characters in a selected DOM element.
For the DOM approach (docs for the Node interface): Run over all child nodes of an element. If the child is an element node, run recursively. If it's a text node, search in the text (node.data), and if you want to highlight/change something, shorten the text of the node to the found position, then insert a highlight span with the matched text and another text node for the rest of the text.
Example code (adjusted, origin is here):
(function iterate_node(node) {
if (node.nodeType === 3) { // Node.TEXT_NODE
var text = node.data,
pos = text.search(/any regular expression/g), //indexOf also applicable
length = 5; // or whatever you found
if (pos > -1) {
node.data = text.substr(0, pos); // split into a part before...
var rest = document.createTextNode(text.substr(pos+length)); // a part after
var highlight = document.createElement("span"); // and a part between
highlight.className = "highlight";
highlight.appendChild(document.createTextNode(text.substr(pos, length)));
node.parentNode.insertBefore(rest, node.nextSibling); // insert after
node.parentNode.insertBefore(highlight, node.nextSibling);
iterate_node(rest); // maybe there are more matches
}
} else if (node.nodeType === 1) { // Node.ELEMENT_NODE
for (var i = 0; i < node.childNodes.length; i++) {
iterate_node(node.childNodes[i]); // run recursive on DOM
}
}
})(content); // any dom node
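The hard-coded length = 5 in the example is only a placeholder. A small sketch of how pos and length could both come from the regular expression itself is given below; findFirstMatch is a made-up helper name. It uses exec on a non-global copy of the pattern and treats an empty match as "not found", which keeps the recursive call on the rest node from looping forever:

```javascript
// Hypothetical helper: returns { pos, length } for the first non-empty match
// of `pattern` in `text`, or null when there is none.
function findFirstMatch(text, pattern) {
  // Work on a copy without the g/y flags so lastIndex never carries over.
  const flags = pattern.flags.replace(/[gy]/g, '');
  const hit = new RegExp(pattern.source, flags).exec(text);
  if (!hit || hit[0].length === 0) return null;
  return { pos: hit.index, length: hit[0].length };
}
```

Inside iterate_node, the result would stand in for the pos and length variables, with the if (pos > -1) check becoming a null check on the returned object.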
There's also highlight.js, which might be exactly what you want.