I want to scan the words in a Google Doc from left to right and replace the first occurrence of certain keywords with a URL, or wrap them in a BBCode-like tag.
I cannot use the findText API because this is not simple regex matching; it is complex pattern matching with many if/else conditions driven by business logic.
Here is how I want to solve this:
let document = DocumentApp.getActiveDocument().getBody();
let paragraph = document.getParagraphs()[0];
let contents = paragraph.getText();
// makeAllTheNecessaryReplacemens has all the business logic to identify which keywords need to be changed
let newContents = makeAllTheNecessaryReplacemens(contents);
paragraph.setText(newContents);
The problem is that setText wipes out the text styling, and makeAllTheNecessaryReplacemens cannot add hyperlinks because it only returns a plain string.
Please suggest a way to do this.
Proposed function
/**
* This is a wrapper around the attribute setter functions.
* It allows setting one attribute at a time,
* based on a complete attributes object obtained
* from another element. This makes it far more
* reliable.
*/
const attributeKey = {
FONT_SIZE : (o,s,e,a) => o.setFontSize(s,e,a),
STRIKETHROUGH : (o,s,e,a) => o.setStrikethrough(s,e,a),
FOREGROUND_COLOR : (o,s,e,a) => o.setForegroundColor(s,e,a),
LINK_URL : (o,s,e,a) => o.setLinkUrl(s,e,a),
UNDERLINE : (o,s,e,a) => o.setUnderline(s,e,a),
BOLD : (o,s,e,a) => o.setBold(s,e,a),
ITALIC : (o,s,e,a) => o.setItalic(s,e,a),
BACKGROUND_COLOR : (o,s,e,a) => o.setBackgroundColor(s,e,a),
FONT_FAMILY : (o,s,e,a) => o.setFontFamily(s,e,a)
}
/**
* Replace textToReplace with replacementText
* Will retain formatting and hyperlinks
*/
function replaceTextPlus(textToReplace, replacementText) {
// Initializing
let body = DocumentApp.getActiveDocument().getBody();
let searchResult = body.findText(textToReplace);
while (searchResult != null) {
// Getting info about result
let foundElement = searchResult.getElement();
let start = searchResult.getStartOffset();
let end = searchResult.getEndOffsetInclusive();
// This returns a complete attributes object
// Many attributes have null as a value
let attributes = foundElement.getAttributes(start);
// Replacing text
foundElement.deleteText(start, end);
foundElement.insertText(start, replacementText);
// Setting new end index
let newEnd = start + replacementText.length - 1
// Set attributes for new text skipping over null values
// This requires the constant defined at the top.
for (let a in attributes) {
if (attributes[a] != null) {
attributeKey[a](foundElement, start, newEnd, attributes[a]);
}
}
// Modifies the actual searchResult so that the next findText
// starts at the NEW end index.
try {
let rangeBuilder = DocumentApp.getActiveDocument().newRange();
rangeBuilder.addElement(foundElement, start, newEnd);
searchResult = rangeBuilder.getRangeElements()[0];
} catch (e){
Logger.log("End of Document")
return null
}
// searches for next result
searchResult = body.findText(textToReplace, searchResult);
}
}
Extending the findText API
This function relies on the findText API but adds a few more steps:
1. Find the text.
2. Get the element containing the text.
3. Get the start and end indices of the text.
4. Get the attributes of the text (font, color, hyperlink, etc.).
5. Replace the text.
6. Update the end index.
7. Use the old attributes to update the new text.
You call it like this:
replaceTextPlus("Bing", "Google")
replaceTextPlus("occurrences", "happenings")
replaceTextPlus("text", "prefixedtext")
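Note that findText treats its first argument as a regular expression pattern, so a keyword containing characters such as "." or "(" may match more than intended. A minimal sketch of an escaping helper follows; the name escapeForFindText is hypothetical and not part of the Apps Script API.

```javascript
// Hypothetical helper (not part of Apps Script): escapes regex
// metacharacters so that a keyword is matched literally.
function escapeForFindText(keyword) {
  return keyword.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Assumed usage inside Apps Script:
// replaceTextPlus(escapeForFindText("a.b (c)"), "replacement");
```

Plain words such as "Bing" pass through unchanged, so the calls above would behave the same with or without the helper.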
How to set the formatting and link attributes
This relies on the attributes object returned by getAttributes, which looks something like this:
{
FOREGROUND_COLOR=#ff0000,
LINK_URL=null,
FONT_SIZE=null,
ITALIC=true,
STRIKETHROUGH=null,
FONT_FAMILY=null,
BOLD=null,
UNDERLINE=true,
BACKGROUND_COLOR=null
}
I tried to use setAttributes, but it was very unreliable: it almost always resulted in some formatting loss.
To fix this, I made an object, attributeKey, that wraps the individual setter functions so that they can be called from this loop:
for (let a in attributes) {
if (attributes[a] != null) {
attributeKey[a](foundElement, start, newEnd, attributes[a]);
}
}
This allows null values to be skipped, which seems to have solved the unreliability problem. Perhaps the update buffer gets confused when given many values at once.
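The loop assumes that every key returned by getAttributes has an entry in attributeKey. A hedged variant that also skips unknown keys is sketched below. The name applyAttributes is hypothetical, and the mock element and two-entry setter map exist only so the sketch can run outside Apps Script.

```javascript
// Hypothetical helper: applies each non-null attribute through a setter map,
// skipping nulls and any key the map does not know about.
function applyAttributes(setters, element, start, end, attributes) {
  const applied = [];
  for (const name in attributes) {
    const value = attributes[name];
    if (value == null) continue;                 // null means "not set"; skip it
    const setter = setters[name];
    if (typeof setter !== 'function') continue;  // unknown key; skip instead of throwing
    setter(element, start, end, value);
    applied.push(name);
  }
  return applied;
}

// Stand-ins for a Text element and a setter map (assumptions for the demo).
const calls = [];
const mockText = {
  setBold: (s, e, v) => calls.push(['BOLD', s, e, v]),
  setLinkUrl: (s, e, v) => calls.push(['LINK_URL', s, e, v]),
};
const demoSetters = {
  BOLD: (o, s, e, a) => o.setBold(s, e, a),
  LINK_URL: (o, s, e, a) => o.setLinkUrl(s, e, a),
};
const appliedNames = applyAttributes(demoSetters, mockText, 0, 5,
  { BOLD: true, LINK_URL: null, FONT_SIZE: 12 });
```

Returning the list of applied names is a design choice for easier debugging; the original loop returns nothing.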
Limitations
This function uses the formatting of the first character of the found word. If the word has mixed formatting, for example "Hello" with normal, bold, and italic letters, the replacement will take the formatting of the first letter only. This could potentially be fixed by iterating over every letter of the found word.
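One way to approach the per-letter idea is to first collapse the word into runs of identical formatting. The sketch below is an assumption rather than part of the function above: getAttrs stands in for a call such as foundElement.getAttributes(offset), and formattingRuns is a hypothetical name.

```javascript
// Hypothetical helper: groups offsets start..end (inclusive) into runs whose
// attribute objects are identical, so each run can be restyled separately.
function formattingRuns(getAttrs, start, end) {
  const runs = [];
  let current = null;
  for (let offset = start; offset <= end; offset++) {
    const attrs = getAttrs(offset);
    // Serialize with sorted keys so key order does not affect the comparison.
    const key = JSON.stringify(attrs, Object.keys(attrs).sort());
    if (current && current.key === key) {
      current.end = offset;  // same formatting: extend the current run
    } else {
      current = { key, start: offset, end: offset, attrs };
      runs.push(current);
    }
  }
  return runs.map(({ start, end, attrs }) => ({ start, end, attrs }));
}

// Demo with a stand-in for per-character attributes of a five-letter word.
const demoAttrs = (i) =>
  i < 2 ? { BOLD: null, ITALIC: null }
        : i < 4 ? { BOLD: true, ITALIC: null }
                : { BOLD: null, ITALIC: true };
const demoRuns = formattingRuns(demoAttrs, 0, 4);
```

How the runs are mapped onto a replacement of a different length (proportionally, or by letting the last run cover the remainder) is left open here.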
I encountered the following problem. I have a Google Document, that contains a bunch of table objects and some of those tables contain inline images themselves.
With the Body.getImages() function I should be able to get the images of the whole document (right?). But is there any way to get the images from a specific table or is there a way to determine in which tables the images retrieved by the Body.getImages() method are located?
In case you are wondering what this is used for: My Google Doc is used to store several multiple-choice exam questions, where each question is represented by a table. I am trying to write a script to export these questions to a specific format and I encountered the problem that some of those questions contain images.
Correct: body.getImages() returns an array of all the images in the body.
We can use this array to find the table that contains each image. Starting from an image, a recursive function calls getParent() to walk up the document tree until it reaches the enclosing Table, then reports that table's child index within the body. If the table contains a "Question #" header, we can search for it and report the question number of that table.
function myFunction() {
var doc = DocumentApp.getActiveDocument();
var body = doc.getBody();
var tables = body.getTables();
var images = doc.getBody().getImages();
Logger.log("Found " + images.length + " images");
Logger.log("Found " + tables.length + " tables");
//list body element #'s for each tables
let tableList = []
tables.forEach(table => tableList.push(String(table.getParent().getChildIndex(table))))
Logger.log("Tables at body element #s: ", tableList);
function findQuestionNumber (element, index) {
const parent = element.getParent()
//IF found the parent Table
if (parent.getType() == DocumentApp.ElementType.TABLE) {
//Find the question # from the Table
let range = parent.findText("Question")
//Output where this image was found. (The childindex)
//Logger.log uses its first argument as a format string, so build a single message
Logger.log("found Image " + (index + 1) + " in " + range.getElement().getParent().getText() + " at body element #" + parent.getParent().getChildIndex(parent));
return
//use recursion to continue up the tree until the parent Table is found
} else {
findQuestionNumber(parent, index)
}
}
//Run Function for each image in getImages() Array
images.forEach((element, index) => findQuestionNumber(element, index));
}
Unfortunately, this throws an error. Here is the execution log:
11:11:16 PM Notice Execution started
11:11:16 PM Info Found 26 images
11:11:16 PM Info Found 1 tables
11:11:16 PM Info Tables at body element #s:
11:11:17 PM Error
TypeError: Cannot read property 'getElement' of null
findQuestionNumber # Code.gs:22
findQuestionNumber # Code.gs:26
findQuestionNumber # Code.gs:26
findQuestionNumber # Code.gs:26
(anonymous) # Code.gs:31
myFunction # Code.gs:31
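The stack trace suggests that findText("Question") returned null, which happens when the enclosing table contains no text matching that pattern. The recursion would also fail if an image had no table ancestor at all. The empty "Tables at body element #s:" line is consistent with Logger.log treating its first argument as a format string and dropping extra arguments when it contains no %s placeholders. A minimal null-safe sketch of the upward walk follows. The name findAncestorOfType is hypothetical, and the mock objects only imitate getParent() and getType() so the sketch can run outside Apps Script; in a real script the second argument would be DocumentApp.ElementType.TABLE.

```javascript
// Hypothetical helper: walks up from an element until an ancestor of the
// wanted type is found. Returns null instead of throwing when there is none.
function findAncestorOfType(element, wantedType) {
  let node = element.getParent();
  while (node) {
    // Loose equality mirrors the comparison used in the script above.
    if (node.getType() == wantedType) return node;
    node = node.getParent();
  }
  return null;
}

// Minimal stand-ins for document elements (assumptions for the demo only).
function mockElement(type, parent) {
  return { getType: () => type, getParent: () => parent || null };
}
const mockBody = mockElement('BODY_SECTION', null);
const mockTable = mockElement('TABLE', mockBody);
const mockCell = mockElement('TABLE_CELL', mockElement('TABLE_ROW', mockTable));
const imageInTable = mockElement('INLINE_IMAGE', mockElement('PARAGRAPH', mockCell));
const imageInBody = mockElement('INLINE_IMAGE', mockElement('PARAGRAPH', mockBody));
```

With a helper like this, the caller can check for a null table before calling findText, and check the findText result for null before calling getElement().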
I'm writing a script that picks the paragraph containing the cursor, sets its text to uppercase, and changes the paragraph heading to HEADING1.
However, the paragraph is set to the 'global' HEADING1, not to HEADING1 as it is defined in the current document. Here is the code:
function SetSceneHeading() {
var cursor = DocumentApp.getActiveDocument().getCursor();
var element = cursor.getElement();
var paragraph = [];
if (element.getType() != 'PARAGRAPH') {
paragraph = element.getParent().asParagraph();
}
else paragraph = element.asParagraph();
var txt = paragraph.getText();
var TXT = txt.toUpperCase();
paragraph.setText(TXT);
paragraph.setHeading(DocumentApp.ParagraphHeading.HEADING1);
}
Is there a way to set a paragraph to the 'current' HEADING1? Thanks.
I found a workaround to set a paragraph to a user-defined heading. Basically, you first set the heading using setHeading(), then you set to null the attributes that the previous operation overrode. This way the paragraph follows the user-defined heading style.
function MyFunction() {
var paragraph = ....
paragraph.setHeading(DocumentApp.ParagraphHeading.HEADING1);
// Null values reset the attributes that setHeading() overrode
paragraph.setAttributes(ResetAttributes());
}
function ResetAttributes() {
var style = {};
style[DocumentApp.Attribute.FONT_SIZE] = null;
style[DocumentApp.Attribute.BOLD] = null;
style[DocumentApp.Attribute.SPACING_BEFORE] = null;
style[DocumentApp.Attribute.SPACING_AFTER] = null;
return style;
}
I ran a few tests; FONT_SIZE, BOLD, SPACING_BEFORE, and SPACING_AFTER seem to be the attributes that need to be reset. There may be more, depending on the case.
Unfortunately, it seems that this isn't possible for now. There is an open issue that I think is relevant: issue 2373 (status: acknowledged). You could star it to be notified of any progress.
I want to create a new document based on a template, and I need to know when an insertion or append results in a new page in the final printed output. Is there any property or attribute, e.g. the number of pages, that can be used for this?
I've searched for this a lot in the past, and I don't think there is any property or other way to get page information.
The solution I use is to insert page breaks in my template or via the script, using my own knowledge of how my template works, i.e. how much space it takes as I iterate, etc.
I then know which page I am on by counting the page breaks.
Anyway, you could file an enhancement request on the issue tracker.
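The page-break counting described in this answer could look roughly like the sketch below. It rests on several assumptions: countPageBreaks is a hypothetical name, it only sees explicit page breaks (not pages that overflow naturally), and the mock body exists so the logic can run outside Apps Script. In a real script the arguments would be the document body and DocumentApp.ElementType.PAGE_BREAK.

```javascript
// Hypothetical helper: counts elements of the given type by repeatedly
// calling findElement(type, from), which returns null when nothing is left.
function countPageBreaks(body, pageBreakType) {
  let count = 0;
  let found = body.findElement(pageBreakType);
  while (found) {
    count++;
    found = body.findElement(pageBreakType, found); // continue after the last hit
  }
  return count;
}

// Stand-in for a Body containing n page breaks (assumption for the demo).
function mockBodyWithBreaks(n) {
  const hits = Array.from({ length: n }, (_, i) => ({ index: i }));
  return {
    findElement: (type, from) => {
      const next = from ? from.index + 1 : 0;
      return next < hits.length ? hits[next] : null;
    },
  };
}

// The last page is one more than the number of explicit breaks.
const lastPage = countPageBreaks(mockBodyWithBreaks(3), 'PAGE_BREAK') + 1;
```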
One way to get total number of pages:
function countPages() {
var blob = DocumentApp.getActiveDocument().getAs("application/pdf");
var data = blob.getDataAsString();
var re = /Pages\/Count (\d+)/g;
var match;
var pages = 0;
while(match = re.exec(data)) {
Logger.log("MATCH = " + match[1]);
var value = parseInt(match[1], 10);
if (value > pages) {
pages = value;
}
}
Logger.log("pages = " + pages);
return pages;
}
I have the following function that is supposed to get the HTML for the area the user has selected on the web page. The function does not seem to work properly.
Sometimes it also returns HTML that was not selected.
Can anyone please look into this function? Thanks a lot.
//----------------------------Get Selected HTML------------------------
function getSelectionHTML(){
if (window.getSelection)
{
var focusedWindow = document.commandDispatcher.focusedWindow;
var sel = focusedWindow.getSelection();
var html = "";
var r = sel.getRangeAt(0);
var parent_element = r.commonAncestorContainer;
var prev_html = parent_element.innerHTML;
if(prev_html != undefined)
{
return prev_html;
}
return sel;
}
return null;
}
It looks to me like you're getting the contents of the parent element rather than the selection itself. If the parent element contains anything other than what you have selected, then you'll get that too.
var sel = focusedWindow.getSelection();
This line returns a selection object. It contains the exact text selected by the user. You then get the range from the selection and get the commonAncestorContainer. So if you have code like this:
<div id="ancestor">
<p>First sentence.</p>
<p>Another sentence.</p>
</div>
If your user selects from the 's' of the first sentence to the 's' of the second sentence, then the commonAncestorContainer is the div element, so you'll also get the rest of the text.
Returning the ancestor's HTML would make sense if you wanted to guarantee a valid HTML fragment (your function name implies this may be the case), but if you just want the selected text, call the toString method on the range directly:
var focusedWindow = document.commandDispatcher.focusedWindow;
var sel = focusedWindow.getSelection();
var r = sel.getRangeAt(0);
return r.toString();
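If the goal is the HTML markup of the selection rather than its plain text, one common approach is to clone the range's contents into a detached container and read its innerHTML. This is a sketch, not a drop-in fix: selectionRangeToHtml is a hypothetical name, and doc and range are passed in so the function can be exercised with stand-in objects. In the browser they would be the window's document and sel.getRangeAt(0).

```javascript
// Hypothetical helper: serializes only the selected part of the DOM.
function selectionRangeToHtml(doc, range) {
  const container = doc.createElement('div');
  // cloneContents() returns a DocumentFragment copy of the selected nodes.
  container.appendChild(range.cloneContents());
  return container.innerHTML;
}

// Stand-ins for document and Range (assumptions for the demo only).
const fakeDoc = {
  createElement: () => {
    const parts = [];
    return {
      appendChild: (fragment) => parts.push(fragment.html),
      get innerHTML() { return parts.join(''); },
    };
  },
};
const fakeRange = {
  cloneContents: () => ({ html: '<p>sentence.</p><p>Another s</p>' }),
};
const selectedHtml = selectionRangeToHtml(fakeDoc, fakeRange);
```

Because the fragment is a copy, the page itself is not modified by this approach.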