How to replace words in Google doc while maintaining text style? - google-apps-script

I want to scan the words in a Google Doc from left to right and replace the first occurrences of some keywords with a URL or with a BBCode-like tag wrapped around them.
I cannot use the findText API because this is not simple regex matching; it is complex pattern matching with lots of if/else conditions driven by business logic.
Here is how I want to solve this:
let document = DocumentApp.getActiveDocument().getBody();
let paragraph = document.getParagraphs()[0];
let contents = paragraph.getText();
// makeAllTheNecessaryReplacemens has all the business logic to identify which keywords need to be changed
let newContents = makeAllTheNecessaryReplacemens(contents);
paragraph.setText(newContents);
The problem here is that setText wipes out the text style, and makeAllTheNecessaryReplacemens cannot add hyperlinks because it only returns a plain string.
Please suggest a way to do this.

Proposed function
/**
* This is a wrapper around the attribute setter functions.
* It allows setting one attribute at a time,
* based on a complete attributes object obtained
* from another element. This makes it far more
* reliable.
*/
const attributeKey = {
FONT_SIZE : (o,s,e,a) => o.setFontSize(s,e,a),
STRIKETHROUGH : (o,s,e,a) => o.setStrikethrough(s,e,a),
FOREGROUND_COLOR : (o,s,e,a) => o.setForegroundColor(s,e,a),
LINK_URL : (o,s,e,a) => o.setLinkUrl(s,e,a),
UNDERLINE : (o,s,e,a) => o.setUnderline(s,e,a),
BOLD : (o,s,e,a) => o.setBold(s,e,a),
ITALIC : (o,s,e,a) => o.setItalic(s,e,a),
BACKGROUND_COLOR : (o,s,e,a) => o.setBackgroundColor(s,e,a),
FONT_FAMILY : (o,s,e,a) => o.setFontFamily(s,e,a)
}
/**
* Replace textToReplace with replacementText
* Will retain formatting and hyperlinks
*/
function replaceTextPlus(textToReplace, replacementText) {
// Initializing
let body = DocumentApp.getActiveDocument().getBody();
let searchResult = body.findText(textToReplace);
while (searchResult != null) {
// Getting info about result
let foundElement = searchResult.getElement();
let start = searchResult.getStartOffset();
let end = searchResult.getEndOffsetInclusive();
// This returns a complete attributes object
// Many attributes have null as a value
let attributes = foundElement.getAttributes(start);
// Replacing text
foundElement.deleteText(start, end);
foundElement.insertText(start, replacementText);
// Setting new end index
let newEnd = start + replacementText.length - 1
// Set attributes for new text skipping over null values
// This requires the constant defined at the top.
for (let a in attributes) {
if (attributes[a] != null) {
attributeKey[a](foundElement, start, newEnd, attributes[a]);
}
}
// Modifies the actual searchResult so that the next findText
// starts at the NEW end index.
try {
let rangeBuilder = DocumentApp.getActiveDocument().newRange();
rangeBuilder.addElement(foundElement, start, newEnd);
searchResult = rangeBuilder.getRangeElements()[0];
} catch (e){
Logger.log("End of Document")
return null
}
// searches for next result
searchResult = body.findText(textToReplace, searchResult);
}
}
Extending the findText API
This function relies on the findText API, but adds a few more steps:
1) Find the text.
2) Get the element containing the text.
3) Get the start and end indices of the text.
4) Get the attributes of the text (font, color, hyperlink, etc.).
5) Replace the text.
6) Update the end index.
7) Use the old attributes to update the new text.
You call it like this:
replaceTextPlus("Bing", "Google")
replaceTextPlus("occurrences", "happenings")
replaceTextPlus("text", "prefixedtext")
How to set the formatting and link attributes.
This relies on the attributes object returned by getAttributes, which looks something like this:
{
FOREGROUND_COLOR=#ff0000,
LINK_URL=null,
FONT_SIZE=null,
ITALIC=true,
STRIKETHROUGH=null,
FONT_FAMILY=null,
BOLD=null,
UNDERLINE=true,
BACKGROUND_COLOR=null
}
I tried to use setAttributes, but it was very unreliable: it almost always resulted in some formatting loss.
To fix this, I made an object, attributeKey, that wraps the individual attribute setters so that they can be called from this loop:
for (let a in attributes) {
if (attributes[a] != null) {
attributeKey[a](foundElement, start, newEnd, attributes[a]);
}
}
This allows null values to be skipped, which seems to have solved the unreliability problem. Perhaps the update buffer gets confused when it receives many values at once.
Limitations
This function uses the formatting of the first character of the found word. If the same word has mixed formatting within itself, for example "Hello" with normal, bold, and italic letters, the replacement word will take the formatting of the first letter only. This could potentially be fixed by identifying the word and iterating over every single letter.
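A possible starting point for handling mixed formatting, offered as a rough sketch rather than a tested extension of replaceTextPlus: Text.getTextAttributeIndices() returns the offsets at which a new formatting run starts, so the matched range can be split into uniformly formatted runs instead of being walked letter by letter. The helper name formattingRuns and the sample offsets below are made up for illustration, and the helper assumes the offsets arrive in ascending order.

```javascript
// Hypothetical helper (not part of the original answer).
// runStarts: offsets where a formatting run begins, e.g. the array returned
//            by Text.getTextAttributeIndices(); assumed sorted ascending.
// start/end: inclusive offsets of the matched word.
// Returns an array of [runStart, runEnd] pairs that cover start..end.
function formattingRuns(runStarts, start, end) {
  const runs = [];
  let runStart = start;
  for (const idx of runStarts) {
    // A run boundary strictly inside the match closes the current run.
    if (idx > start && idx <= end) {
      runs.push([runStart, idx - 1]);
      runStart = idx;
    }
  }
  runs.push([runStart, end]);
  return runs;
}

// Sample data: runs begin at offsets 0, 2, 4 and 10; the match spans 1..6.
const runs = formattingRuns([0, 2, 4, 10], 1, 6);
// runs is [[1, 1], [2, 3], [4, 6]]
```

Inside Apps Script, the attributes of each run could then be read with foundElement.getAttributes(runStart). How those runs should map onto a replacement of a different length is still a decision for your own logic.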

Related

Centering in a gDoc after a replace

I am looking to replace a string within a Google Doc via an app script. The string will exist on a line, but after the replace, I want it to have a specific font, size and justification.
I've created a style to address all these attributes (I included both horizontal and vertical alignment) and most of it works fine. When the string is replaced, the replacement has the right font, size and bold attributes. For some reason, I cannot get the justification to change.
// Define the style for the replacement string.
var hdrStyle = {};
hdrStyle[DocumentApp.Attribute.HORIZONTAL_ALIGNMENT] =
DocumentApp.HorizontalAlignment.CENTER;
hdrStyle[DocumentApp.Attribute.VERTICAL_ALIGNMENT] =
DocumentApp.VerticalAlignment.CENTER;
hdrStyle[DocumentApp.Attribute.FONT_FAMILY] = 'Calibri';
hdrStyle[DocumentApp.Attribute.FONT_SIZE] = 24;
hdrStyle[DocumentApp.Attribute.BOLD] = true;
// ... then later ...
documentBody = DocumentApp.openById(fileId).getBody();
hdrElem = documentBody.findText("old string").getElement();
hdrText = hdrElem.asText().setText("new string");
// Force our 'header style':
hdrElem.setAttributes(hdrStyle);
I've tried setting the style right after the findText and (as here) after the setText, but there is no change in centering.
I see there is a paragraph centering, but I am not clear how to 'get' the paragraph associated with the element that is returned by the find.
I'm hoping this is some simple set of calls, but I have run out of ideas (and patience)!
Any help would be appreciated!
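Horizontal alignment is a paragraph-level attribute, so setting it on the Text element returned by findText has no effect. You can use getParent() on hdrElem to get the parent paragraph and apply the styling to that instead.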
https://developers.google.com/apps-script/reference/document/text#getParent()
documentBody = DocumentApp.openById(fileId).getBody();
hdrElem = documentBody.findText("old string").getElement();
hdrText = hdrElem.asText().setText("new string");
var hdrParent = hdrElem.getParent()
// Force our 'header style':
hdrParent.setAttributes(hdrStyle);

Search viewer model by attribute names

I followed this Search demo, and am trying to expand it to only search on specified attribute names.
It works without an attribute name, and returns an array of matching ids. But if I supply anything for the attribute name, then search returns an empty array. I am guessing I need some magic formatting for the attribute name.
So currently I have:
function search() {
var txtArea = document.getElementById("TextAreaResult");
var searchStr = document.getElementById("SearchString").value;
var searchProperties = document.getElementById("SearchProperties").value;
if (searchStr.length == 0) {
txtArea.value = "no search string.";
return;
}
var viewer = viewerApp.getCurrentViewer();
viewer.clearSelection();
if (searchProperties.length == 0)
viewer.search(searchStr, searchCallback, searchErrorCallback);
else {
var searchPropList = searchProperties.split(',');
viewer.search(searchStr, searchCallback, searchErrorCallback, searchPropList);
}
}
where searchProperties is a user input, e.g. "Name", and searchPropList becomes a single-element array.
The same example also covers getProperties(), which returns displayName and displayCategory for each property, but I don't see a separate internal name.
Am I missing something obvious here, or do I need to transform "Name" in some way?
Or does someone have an example that lists the true name rather than displayName?
The Autodesk.Viewing.Viewer3D.search() method is NOT case sensitive on the text parameter, but it IS case sensitive on the attributeNames parameter, and you need to use the full name of the attribute.
We're now (Aug 25, 2016) updating the documentation.

Is there a simple way to have a local webpage display a variable passed in the URL?

I am experimenting with a Firefox extension that will load an arbitrary URL (only via HTTP or HTTPS) when certain conditions are met.
With certain conditions, I just want to display a message instead of requesting a URL from the internet.
I was thinking about simply hosting a local webpage that would display the message. The catch is that the message needs to include a variable.
Is there a simple way to craft a local web page so that it can display a variable passed to it in the URL? I would prefer to just use HTML and CSS, but adding a little inline javascript would be okay if absolutely needed.
As a simple example, when the extension calls something like:
folder/messageoutput.html?t=Text%20to%20display
I would like to see:
Message: Text to display
shown in the browser's viewport.
You can use the "search" property of the Location object to extract the variables from the end of your URL:
var a = window.location.search;
In your example, a will equal "?t=Text%20to%20display".
Next, strip the leading question mark from the string. The if statement is there just in case the browser doesn't include it in the search property:
var s = a;
if (s.substr(0, 1) == "?") { s = s.substr(1); }
Just in case you get a URL with more than one variable, you may want to split the query string at ampersands to produce an array of name-value pair strings:
var R = s.split("&");
Next, split each name-value pair string at the equal sign to separate the name from the value. Store the name as the key of an object and the decoded value as the corresponding value (decodeURIComponent turns %20 back into a space):
var L = R.length;
var NVP = {};
var temp;
for(var i = 0; i < L; i++){
temp = R[i].split("=");
NVP[temp[0]] = decodeURIComponent(temp[1] || "");
}
Almost done. Get the value with the name "t":
var t = NVP['t'];
Last, insert the variable text into the document. A simple example (that will need to be tweaked to match your document structure) is:
var containingDiv = document.getElementById("divToShowMessage");
var tn = document.createTextNode(t);
containingDiv.appendChild(tn);
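For comparison, here is a shorter sketch of the same steps using the standard URLSearchParams API, which does the splitting and the percent-decoding itself. It assumes a browser (or other runtime) recent enough to provide URLSearchParams; the helper name getQueryValue is made up for this example.

```javascript
// Hypothetical helper: returns the decoded value of one query parameter,
// or null if the parameter is not present.
function getQueryValue(search, name) {
  // URLSearchParams accepts the string with or without the leading "?".
  const params = new URLSearchParams(search);
  return params.get(name);
}

// Using the query string from the question:
const message = getQueryValue("?t=Text%20to%20display", "t");
// message is "Text to display"

// In the page itself you would pass window.location.search and write the
// result with textContent, for example:
//   document.getElementById("divToShowMessage").textContent =
//       "Message: " + getQueryValue(window.location.search, "t");
```

Writing the value with textContent (or createTextNode, as above) rather than innerHTML keeps markup in the URL from being interpreted as HTML.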
getArg('t');
function getArg(param) {
var vars = {};
window.location.href.replace( location.hash, '' ).replace(
/[?&]+([^=&]+)=?([^&]*)?/gi, // regexp
function( m, key, value ) { // callback
vars[key] = value !== undefined ? value : '';
}
);
if ( param ) {
return vars[param] ? vars[param] : null;
}
return vars;
}

highlight words in html using regex & javascript - almost there

I am writing a jquery plugin that will do a browser-style find-on-page search. I need to improve the search, but don't want to get into parsing the html quite yet.
At the moment my approach is to take an entire DOM element and all nested elements and simply run a regex find/replace for a given term. In the replace I will simply wrap a span around the matched term and use that span as my anchor to do highlighting, scrolling, etc. It is vital that no characters inside any html tags are matched.
This is as close as I have gotten:
(?<=^|>)([^><].*?)(?=<|$)
It does a very good job of capturing all characters that are not in an html tag, but I'm having trouble figuring out how to insert my search term.
Input: Any html element (this could be quite large, eg <body>)
Search Term: 1 or more characters
Replace Txt: <span class='highlight'>$1</span>
UPDATE
The following regex does what I want when I'm testing with http://gskinner.com/RegExr/...
Regex: (?<=^|>)(.*?)(SEARCH_STRING)(?=.*?<|$)
Replacement: $1<span class='highlight'>$2</span>
However I am having some trouble using it in my JavaScript. With the following code Chrome is giving me the error "Invalid regular expression: /(?<=^|>)(.*?)(Mary)(?=.*?<|$)/: Invalid group".
var origText = $('#'+opt.targetElements).data('origText');
var regx = new RegExp("(?<=^|>)(.*?)(" + $this.val() + ")(?=.*?<|$)", 'gi');
$('#'+opt.targetElements).each(function() {
var text = origText.replace(regx, '$1<span class="' + opt.resultClass + '">$2</span>');
$(this).html(text);
});
It's breaking on the group (?<=^|>) - is this something clumsy or a difference in the Regex engines?
UPDATE
The reason this regex is breaking on that group is that JavaScript did not support regex lookbehinds at the time (they were added to the language later, in ES2018). For reference & possible solutions: http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript.
Just use jQuery's built-in text() method. It will return all the characters in a selected DOM element.
For the DOM approach (docs for the Node interface): Run over all child nodes of an element. If the child is an element node, run recursively. If it's a text node, search in the text (node.data), and if you want to highlight/change something, shorten the text of the node up to the found position, then insert a highlight span with the matched text and another text node for the rest of the text.
Example code (adapted):
(function iterate_node(node) {
if (node.nodeType === 3) { // Node.TEXT_NODE
var text = node.data,
pos = text.search(/any regular expression/g), //indexOf also applicable
length = 5; // or whatever you found
if (pos > -1) {
node.data = text.substr(0, pos); // split into a part before...
var rest = document.createTextNode(text.substr(pos+length)); // a part after
var highlight = document.createElement("span"); // and a part between
highlight.className = "highlight";
highlight.appendChild(document.createTextNode(text.substr(pos, length)));
node.parentNode.insertBefore(rest, node.nextSibling); // insert after
node.parentNode.insertBefore(highlight, node.nextSibling);
iterate_node(rest); // maybe there are more matches
}
} else if (node.nodeType === 1) { // Node.ELEMENT_NODE
for (var i = 0; i < node.childNodes.length; i++) {
iterate_node(node.childNodes[i]); // run recursive on DOM
}
}
})(content); // any dom node
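The splitting step inside that function can be tried out on plain strings, without a DOM. The sketch below is only an illustration of the before / match / after idea for a literal, case-insensitive search term; the name splitByTerm and the segment shape are assumptions of this example, and it assumes lower-casing does not change the length of the text (true for ordinary ASCII input).

```javascript
// Hypothetical helper: split text into alternating unmatched and matched
// segments. Each matched segment would become one highlight span, each
// unmatched segment one plain text node.
function splitByTerm(text, term) {
  if (term === "") {
    return [{ text: text, match: false }];
  }
  const segments = [];
  const haystack = text.toLowerCase();
  const needle = term.toLowerCase();
  let from = 0;
  let pos = haystack.indexOf(needle, from);
  while (pos !== -1) {
    if (pos > from) {
      segments.push({ text: text.slice(from, pos), match: false }); // part before
    }
    segments.push({ text: text.slice(pos, pos + term.length), match: true }); // the match
    from = pos + term.length;
    pos = haystack.indexOf(needle, from);
  }
  if (from < text.length) {
    segments.push({ text: text.slice(from), match: false }); // part after
  }
  return segments;
}

const segments = splitByTerm("Mary had a little lamb, mary!", "mary");
```

Because the search uses indexOf on the text of a single node, characters inside HTML tags are never touched and the search term needs no regex escaping.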
There's also highlight.js, which might be exactly what you want.

What's the fastest way to search a very long list of words for a match in actionscript 3?

So I have a list of words (the entire English dictionary).
For a word matching game, when a player moves a piece I need to check the entire dictionary to see if the word that the player made exists in the dictionary. I need to do this as quickly as possible. Simply iterating through the dictionary is way too slow.
What is the quickest algorithm in AS3 to search a long list like this for a match, and what datatype should I use? (i.e. array, object, Dictionary etc.)
I would first go with an Object, which is a hash table (at least, storage-wise).
So, for every word in your list, make an entry in your dictionary Object and store true as its value.
Then, you just have to check if a given word is a key in your dictionary to know whether the word the user has chosen is valid or not.
This works really fast in this simple test (with 10,000,000 entries):
var dict:Object = {};
for(var i:int = 0; i < 10000000; i++) {
dict[i] = true;
}
var btn:Sprite = new Sprite();
btn.graphics.beginFill(0xff0000);
btn.graphics.drawRect(0,0,50,50);
btn.graphics.endFill();
addChild(btn);
btn.addEventListener(MouseEvent.CLICK,checkWord);
var findIt:Boolean = true;
function checkWord(e:MouseEvent):void {
var word:String;
if(findIt) {
word = "3752132";
} else {
word = "9123012456";
}
if(dict[word]) {
trace(word + " found");
} else {
trace(word + " not found");
}
findIt = !findIt;
}
It takes a little longer to build the dictionary, but lookup is almost instantaneous.
The only caveat is that certain keys will pass the check without necessarily being part of your word list: names of built-in properties such as toString, prototype, etc. There are just a few of them, but keep that in mind.
I would try something like this with your real data set. If it works fine, then you have a really easy solution. Go have a beer (or whatever you prefer).
Now, if the above doesn't really work after testing it with real data (notice I've built the list with numbers cast as strings for simplicity), then here are a couple of options, off the top of my head:
1) Partition the first dict into a set of dictionaries. So, instead of having all the words in dict, have a dictionary for words that begin with 'a', another for 'b', etc. Then, before looking up a word, check the first char to know where to look it up.
Something like:
var word:String = "hello";
var dictKey:String = word.charAt(0);
// actual check
if(dict[dictKey][word]) {
trace("found");
} else {
trace("not found");
}
You can eventually repartition if necessary. I.e, make dict['a'] point to another set of dictionaries indexed by the first two characters. So, you'll have dict['a']['b'][wordToSearch]. There are a number of possible variations on this idea (you'd also have to come up with some strategy to cope with words of two letters, such as "be", for instance).
2) Try a binary search. The problem with it is that you'll first have to sort the list, upfront. You have to do it just once, as it doesn't make sense to remove words from your dict. But with millions of words, it might be rather intensive.
3) Try some fancy data structures from open source libraries such as:
http://sibirjak.com/blog/index.php/collections/as3commons-collections/
http://lab.polygonal.de/ds/
But again, as I said above, I'd first try the easiest and simplest solution and check if it works against the real data set.
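To make option 2 concrete, here is a minimal binary search sketch. It is written in JavaScript rather than AS3 (the two are close enough that it should port with little more than type annotations), and the function name containsWord and the sample word list are assumptions of this example. The list is sorted once up front with the default string ordering, which is the same ordering the < comparison uses.

```javascript
// Hypothetical helper: binary search over an array of words that has
// already been sorted with the default string ordering.
function containsWord(sortedWords, word) {
  let lo = 0;
  let hi = sortedWords.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1; // middle index, rounded down
    if (sortedWords[mid] === word) {
      return true;
    }
    if (sortedWords[mid] < word) {
      lo = mid + 1; // the word, if present, is in the upper half
    } else {
      hi = mid - 1; // the word, if present, is in the lower half
    }
  }
  return false;
}

// Sort once, then look up as often as needed.
const words = ["world", "hello", "foo", "bar", "constructor"].sort();
const foundHello = containsWord(words, "hello");
const foundToString = containsWord(words, "toString");
```

Each lookup takes on the order of log2(n) comparisons, and because the words live in an array rather than as object keys, built-in property names such as constructor or toString need no special handling.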
Added
A simple way to deal with keywords used for Object's built-in properties:
var dict:Object = {};
var keywordsInDict:Array = [];
function buildDictionary():void {
// let's assume this is your original list, retrieved
// from XML or other external means
// it contains "constructor", which should be dealt with
// separately, as it's a built-in prop of Object
var sourceList:Array = ["hello","world","foo","bar","constructor"];
var len:int = sourceList.length;
var word:String;
// just a dummy vanilla object, to test if a word in the list
// is already in use internally by Object
var dummy:Object = {};
for(var i:int = 0; i < len; i++) {
// also, lower-casing is a good idea
// do that when you check words as well
word = sourceList[i].toLowerCase();
if(!dummy[word]) {
dict[word] = true;
} else {
// it's a keyword, so store it separately
keywordsInDict.push(word);
}
}
}
Now, just add an extra check for built-in props in the checkWord function:
function checkWord(e:MouseEvent):void {
var word:String;
if(findIt) {
word = "Constructor";
} else {
word = "asdfds";
}
word = word.toLowerCase();
var dummy:Object = {};
// check first if the word is a built-in prop
if(dummy[word]) {
// if it is, check if that word was in the original list
// if it was present, we've stored it in keywordsInDict
if(keywordsInDict.indexOf(word) != -1) {
trace(word + " found");
} else {
trace(word + " not found");
}
// not a built-in prop, so just check if it's present in dict
} else {
if(dict[word]) {
trace(word + " found");
} else {
trace(word + " not found");
}
}
findIt = !findIt;
}
This isn't specific to ActionScript, but a Trie is a suitable data structure for storing words.
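For readers who have not met the structure, here is a minimal trie sketch. It is in JavaScript rather than AS3, and the class and method names (TrieNode, Trie, insert, has, hasPrefix) are choices made for this example, not an existing library. Each node holds one map entry per following letter plus a flag that marks the end of a word.

```javascript
class TrieNode {
  constructor() {
    this.children = new Map(); // next letter -> child node
    this.isWord = false;       // true if a stored word ends at this node
  }
}

class Trie {
  constructor() {
    this.root = new TrieNode();
  }

  // Adds a word, creating nodes for letters that are not present yet.
  insert(word) {
    let node = this.root;
    for (const ch of word) {
      if (!node.children.has(ch)) {
        node.children.set(ch, new TrieNode());
      }
      node = node.children.get(ch);
    }
    node.isWord = true;
  }

  // Follows the letters of `prefix`; returns the node reached, or null.
  find(prefix) {
    let node = this.root;
    for (const ch of prefix) {
      node = node.children.get(ch);
      if (!node) {
        return null;
      }
    }
    return node;
  }

  // True only if the exact word was inserted.
  has(word) {
    const node = this.find(word);
    return node !== null && node.isWord;
  }

  // True if at least one stored word starts with `prefix`.
  hasPrefix(prefix) {
    return this.find(prefix) !== null;
  }
}

const trie = new Trie();
for (const w of ["hello", "help", "constructor"]) {
  trie.insert(w);
}
```

A lookup costs one step per letter of the word, regardless of how many words are stored. The prefix check is what a plain hash lookup cannot offer, which can be handy in a word game for rejecting letter sequences that cannot lead to any word.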