Can Google Apps Script access the location of footnote superscripts programmatically? - google-apps-script

Is it possible to use DocumentApp to find the location of footnote references in the body?
Searching the body or an element using editAsText() or findText() does not show the superscript footnote markers.
For example, in the following document:
This is a riveting story with statistics!1 You can see other stuff here too.
body.getText() returns 'This is a riveting story with statistics! You can see other stuff here too.' No reference, no 1
If I want to replace, edit, or manipulate text around the footnote reference (e.g. 1 ), how can I find its location?

It turns out that the footnote reference is indexed as a child in the Doc. So you can get the index of the footnote reference, insert some text at that index, and then remove the footnote from its parent.
function performConversion (docu) {
  var footnotes = docu.getFootnotes() // get all footnotes in the document
  var noteText = footnotes.map(function (note) {
    // getFootnoteContents() returns a FootnoteSection, so read its text with getText()
    return '((' + note.getFootnoteContents().getText() + ' ))' // reformat text with parens and save in array
  })
  footnotes.forEach(function (note, index) {
    var paragraph = note.getParent() // get the paragraph
    var noteIndex = paragraph.getChildIndex(note) // get the footnote's "child index"
    paragraph.insertText(noteIndex, noteText[index]) // insert formatted text before footnote child index in paragraph
    note.removeFromParent() // delete the original footnote
  })
}

You can use getFootnotes() to edit footnotes. getFootnotes() returns an array of Footnote objects, so you need to iterate over them.
You can log the locations (i.e. the parent paragraphs) of the footnotes with Logger.log(), as follows:
function getFootnotes(){
  var doc = DocumentApp.openById('...');
  var footnotes = doc.getFootnotes();
  for (var i = 0; i < footnotes.length; i++) {
    // Text of the paragraph that contains the footnote reference
    var textLocation = footnotes[i].getParent().getText();
    Logger.log(textLocation);
  }
}
To get the paragraph text truncated right up to the footnote superscript, you can use:
textLocation = footnotes[i].getPreviousSibling().getText();
In your case it should return only this portion: This is a riveting story with statistics! That is because the reference [1] comes just after the word statistics!
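Building on the two answers above, the character offset of a footnote marker inside its paragraph can be estimated by summing the lengths of the text children that precede it. This is a minimal sketch, not a definitive implementation: the helper name footnoteOffset is hypothetical, and it assumes that only Text children contribute characters to the paragraph's text.

```javascript
// Hypothetical helper: character offset of a footnote reference within
// its parent paragraph, counted over the Text children that precede it.
function footnoteOffset(note) {
  var paragraph = note.getParent();
  var noteIndex = paragraph.getChildIndex(note);
  var offset = 0;
  for (var i = 0; i < noteIndex; i++) {
    var child = paragraph.getChild(i);
    // Assumption: only Text children add characters; other inline
    // elements (images, other footnotes) are skipped.
    if (child.getType() === DocumentApp.ElementType.TEXT) {
      offset += child.asText().getText().length;
    }
  }
  return offset;
}
```

For the example sentence in the question, the offset would be the length of 'This is a riveting story with statistics!', i.e. the position right after the exclamation mark.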

Related

How to replace words in Google doc while maintaining text style?

I want to scan words in a Google doc from left to right and replace the first occurrences of some keywords with a URL or a bbcode like tag wrapper around them.
I cannot use findText API because it's not simple regex finding but complex pattern matching involving lots of if else conditions involving business logic.
Here is how I want to solve this
let document = DocumentApp.getActiveDocument().getBody();
let paragraph = document.getParagraphs()[0];
let contents = paragraph.getText();
// makeAllTheNecessaryReplacemens has all the business logic to identify which keywords need to be changed
let newContents = makeAllTheNecessaryReplacemens(contents);
paragraph.setText(newContents);
The problem here is that text style gets wiped out and also makeAllTheNecessaryReplacemens cannot add hyperlinks to string text.
Please suggest a way to do this.
Proposed function
/**
* This is a wrapper around the attribute functions
* this allows setting one attribute at a time
* based on a complete attribute object obtained
* from another element. This makes it far more
* reliable.
*/
const attributeKey = {
FONT_SIZE : (o,s,e,a) => o.setFontSize(s,e,a),
STRIKETHROUGH : (o,s,e,a) => o.setStrikethrough(s,e,a),
FOREGROUND_COLOR : (o,s,e,a) => o.setForegroundColor(s,e,a),
LINK_URL : (o,s,e,a) => o.setLinkUrl(s,e,a),
UNDERLINE : (o,s,e,a) => o.setUnderline(s,e,a),
BOLD : (o,s,e,a) => o.setBold(s,e,a),
ITALIC : (o,s,e,a) => o.setItalic(s,e,a),
BACKGROUND_COLOR : (o,s,e,a) => o.setBackgroundColor(s,e,a),
FONT_FAMILY : (o,s,e,a) => o.setFontFamily(s,e,a)
}
/**
* Replace textToReplace with replacementText
* Will retain formatting and hyperlinks
*/
function replaceTextPlus(textToReplace, replacementText) {
// Initializing
let body = DocumentApp.getActiveDocument().getBody();
let searchResult = body.findText(textToReplace);
while (searchResult != null) {
// Getting info about result
let foundElement = searchResult.getElement();
let start = searchResult.getStartOffset();
let end = searchResult.getEndOffsetInclusive();
// This returns a complete attributes object
// Many attributes have null as a value
let attributes = foundElement.getAttributes(start);
// Replacing text
foundElement.deleteText(start, end);
foundElement.insertText(start, replacementText);
// Setting new end index
let newEnd = start + replacementText.length - 1
// Set attributes for new text skipping over null values
// This requires the constant defined at the top.
for (let a in attributes) {
if (attributes[a] != null) {
attributeKey[a](foundElement, start, newEnd, attributes[a]);
}
}
// Modifies the actual searchResult so that the next findText
// starts at the NEW end index.
try {
let rangeBuilder = DocumentApp.getActiveDocument().newRange();
rangeBuilder.addElement(foundElement, start, newEnd);
searchResult = rangeBuilder.getRangeElements()[0];
} catch (e){
Logger.log("End of Document")
return null
}
// searches for next result
searchResult = body.findText(textToReplace, searchResult);
}
}
Extending the findText API
This function relies on the findText API, but it adds in a few more steps.
1. Find the text.
2. Get the element containing the text.
3. Get the start and end indices of the text.
4. Get the attributes of the text (font, color, hyperlink, etc.).
5. Replace the text.
6. Update the end index.
7. Use the old attributes to update the new text.
You call it like this:
replaceTextPlus("Bing", "Google")
replaceTextPlus("occurrences", "happenings")
replaceTextPlus("text", "prefixedtext")
How to set the formatting and link attributes.
This relies on the attributes object returned by getAttributes, which looks something like this:
{
FOREGROUND_COLOR=#ff0000,
LINK_URL=null,
FONT_SIZE=null,
ITALIC=true,
STRIKETHROUGH=null,
FONT_FAMILY=null,
BOLD=null,
UNDERLINE=true,
BACKGROUND_COLOR=null
}
I tried to use setAttributes but it was very unreliable. Using this method almost always resulted in some formatting loss.
To fix this, I made an object, attributeKey, that wraps all the different functions for setting individual attributes, so that they can be called from this loop:
for (let a in attributes) {
if (attributes[a] != null) {
attributeKey[a](foundElement, start, newEnd, attributes[a]);
}
}
This allows null values to be skipped which seems to have solved the unreliability problem. Perhaps the update buffer gets confused with many values.
Limitations
This function gets the formatting of the first character of the found word. If the word has mixed formatting within itself, for example "Hello" with some letters normal and others bold or italic, the replacement word will take the formatting of the first letter only. This could potentially be fixed by identifying the word and iterating over every single letter.
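The per-letter fix suggested above may not need to visit every character: the Text class exposes getTextAttributeIndices(), which returns the offsets at which a new formatting run starts. The sketch below is a pure helper (the name formattingRuns is hypothetical) that splits a found range into uniformly formatted runs. How those runs are mapped onto a replacement of a different length is left open here.

```javascript
/**
 * Splits the inclusive range [start, end] into runs of uniform formatting.
 * `indices` is assumed to be the sorted array of run-start offsets,
 * e.g. the result of foundElement.getTextAttributeIndices().
 * Returns an array of [runStart, runEndInclusive] pairs.
 */
function formattingRuns(indices, start, end) {
  // Keep only the run boundaries that fall strictly inside the range.
  var cuts = indices.filter(function (i) { return i > start && i <= end; });
  var runs = [];
  var runStart = start;
  cuts.forEach(function (cut) {
    runs.push([runStart, cut - 1]);
    runStart = cut;
  });
  runs.push([runStart, end]);
  return runs;
}
```

Inside replaceTextPlus, each pair could then be used to read the attributes at the start of its run with foundElement.getAttributes(run[0]) instead of reading only the attributes at start.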

How to set a certain number of spaces or indents before a Paragraph in Google Docs using Google Apps Script

I have a 20 line script, and I want to make sure that each paragraph is indented exactly once.
function myFunction() {
/*
This function turns the document's format into standard MLA.
*/
var body = DocumentApp.getActiveDocument().getBody();
body.setFontSize(12); // Set the font size of the contents of the document to 12
body.setForegroundColor('#000000');
body.setFontFamily("Times New Roman");
// Loops through paragraphs in body and sets each to double spaced
var paragraphs = body.getParagraphs();
for (var i = 3; i < paragraphs.length; i++) { // Starts at index 3 to skip the first 3 developer-made paragraphs
var paragraph = paragraphs[i];
paragraph.setLineSpacing(2);
// Left-align the paragraph.
paragraph.setAlignment(DocumentApp.HorizontalAlignment.LEFT);
// One indent
paragraph.editAsText().insertText(0, "\t"); // Adds one tab every time
}
var bodyText = body.editAsText();
bodyText.insertText(0, 'February 3, 1976\nMrs. Smith\nYour Name Here\nSocial Studies\n');
bodyText.setBold(false);
}
The code I have tried doesn't work. But my expected results are that for every paragraph in the for loop in myFunction(), there are exactly 4 spaces before the first word in each paragraph.
Here is a sample: https://docs.google.com/document/d/1sMztzhOehzheRdqumC6PLnvk4qJgUCSE0irjTZ0FjTQ/edit?usp=sharing
If the user uses Autoformat, but already has the paragraphs indented...
Update
I have investigated use of the Paragraph.setIndentFirstLine() method. When I set it to four, it sets it to 1 space. Now I realize this is because points and spaces are not the same thing. What number do I need to multiply by to get four spaces in points?
Let us consider a few basic indenting operations: manual and by script.
The following image shows how to indent current paragraph (cursor stays inside this one).
Please note, the units are centimetres. Also note, that the paragraph does not include leading spaces or tabs, we have no need of them.
Suppose we would like to get the indent values in the script and apply them to the next paragraph. Look at the code below:
function myFunction() {
var ps = DocumentApp.getActiveDocument().getBody().getParagraphs();
// We work with the paragraphs at indices 5 and 6 (indices are zero-based)
var iFirst = ps[5].getIndentFirstLine();
var iStart = ps[5].getIndentStart();
var iEnd = ps[5].getIndentEnd();
Logger.log([iFirst, iStart, iEnd]);
ps[6].setIndentFirstLine(iFirst);
ps[6].setIndentStart(iStart);
ps[6].setIndentEnd(iEnd);
}
If you run and look at the log, you will see something like this: [92.69291338582678, 64.34645669291339, 14.173228346456694]. No surprise, we have typographic points instead of centimetres. (1cm=28.3465pt) So we can measure and modify any paragraph indent values precisely.
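Since the indent methods work in points, a small conversion helper makes the numbers easier to reason about. This is a minimal sketch with hypothetical function names. It relies only on the definitions 1 inch = 72 points and 1 inch = 2.54 cm, which give the 28.3465 factor quoted above.

```javascript
var POINTS_PER_INCH = 72;
var CM_PER_INCH = 2.54;

// Convert centimetres (as shown on the ruler) to typographic points.
function cmToPoints(cm) {
  return cm * POINTS_PER_INCH / CM_PER_INCH;
}

// Convert inches to typographic points.
function inchesToPoints(inches) {
  return inches * POINTS_PER_INCH;
}

// Convert typographic points (as returned by getIndentFirstLine()) to centimetres.
function pointsToCm(points) {
  return points * CM_PER_INCH / POINTS_PER_INCH;
}
```

For instance, a half-inch first-line indent, which is the usual MLA convention, would be inchesToPoints(0.5), i.e. 36 points, and could be passed to setIndentFirstLine(). The width of "four spaces" depends on the font and size, so it has no fixed value in points.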
Addition
For some reason you might want to control the number of spaces at the beginning of the paragraph. This is also possible by scripting, but it has no effect on the paragraph's "left" or "right" indents.
The sample code below handles a similar task: it counts the leading spaces of the paragraph at index 5 and makes the next paragraph start with the same number of spaces.
function mySpaces() {
var ps = DocumentApp.getActiveDocument().getBody().getParagraphs();
// We work with the paragraphs at indices 5 and 6 (indices are zero-based)
var spacesCount = getLeadingSpacesCount(ps[5]);
Logger.log(spacesCount);
var diff = getLeadingSpacesCount(ps[6]) - spacesCount;
if (diff > 0) {
ps[6].editAsText().deleteText(0, diff - 1);
} else if (diff < 0) {
var s = Array(1 - diff).join(' ');
ps[6].editAsText().insertText(0, s);
}
}
function getLeadingSpacesCount(p) {
var found = p.findText("^ +");
return found ? found.getEndOffsetInclusive() + 1 : 0;
}
We have used methods deleteText() and insertText() of the class Text for proper corrections and findText() to locate the spaces if any. Note, the last method argument is a string, representing a regular expression. It matches "all leading spaces", if they exist. See more details about regular expression syntax.
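The same leading-whitespace pattern can also address the original problem, where every run of the script adds another tab. The sketch below works on plain strings and uses hypothetical helper names. It shows the idea of stripping any existing leading spaces or tabs before adding exactly one tab, so that repeated runs give the same result.

```javascript
// Count leading spaces in a string (mirrors the "^ +" pattern above).
function countLeadingSpaces(text) {
  var match = /^ +/.exec(text);
  return match ? match[0].length : 0;
}

// Return the text with exactly one leading tab, whatever mix of
// spaces and tabs it started with. Applying it twice changes nothing more.
function withSingleIndent(text) {
  return '\t' + text.replace(/^[ \t]+/, '');
}
```

In the script, the same effect could be applied per paragraph by removing the matched leading whitespace with deleteText() and then calling insertText(0, "\t").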

Changing bullet to dash ( -) in Google Document

We have a function to set a glyphType to DocumentApp.GlyphType.BULLET.
listItem.setGlyphType(DocumentApp.GlyphType.BULLET)
However, is there any way to set the glyphType to dash (-)?
For example, our list is below.
- Item 1
- Item 2
- Item 3
Ref: https://developers.google.com/apps-script/reference/document/list-item#setGlyphType(GlyphType)
The dash is not listed as a glyph type, but here is a workaround. You could make your own pre-filled list with placeholder items in a master document, then copy the list into the target document and replace the items. Perhaps this is a lot of effort for styling bullets, but it could work.
Yes, @Jason Allshorn is correct. I was able to set a custom bullet using Apps Script. I have a template doc that I copy to make a new doc. In this template I created a list item, with the text "list item", that uses my custom bullet glyph. Google, what is up with those giant dots? Ugly! I find that list item in the doc, copy it, and remove it. Code below:
function getListItem(ss, doc) {
var body = doc.getBody();
for (var i = 0; i < body.getNumChildren(); i++) {
var child = body.getChild(i);
var childType = child.getType();
if (childType == DocumentApp.ElementType.LIST_ITEM && child.getText() == 'list item') {
var customBulletListItem = child.copy();
body.removeChild(child);
break;
}
}
return customBulletListItem;
}
... then when I add a list item (li), I do the following:
body.insertListItem(i, li.copy());
body.getChild(i).replaceText("list item", "My new list item text");
body.getChild(i).setIndentFirstLine(0).setIndentStart(15);
body.getChild(i).editAsText().setBold(true);
This gets me my custom bullet glyph. The last two lines fix the huge indent on list items and bold the line. Google, what is up with the huge indents? Ugly!

find common words of two webpages on the fly

I have a list of species here:
http://megasun.bch.umontreal.ca/ogmp/projects/other/compare.html
And a list of species here:
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3524
I would like to find all species that are mentioned on BOTH pages. How can I do this quickly? (I don't mind if words not referring to species are found too. I want to do a comparison of words in general :)
Thanks for suggestions.
On each page in a console, do:
var html = document.body.innerHTML;
results = [];
html.match(/>([^<]+?)</g) // grab all values like ">...<"
.map(function(match) { // look for a long words..words..words
return match.match(/\w.*\w/);
})
.filter(function(match) { // ignore empty matches
return match!==null
})
.forEach(function(match) {
var text = match[0];
if (!text.match(/[0-9]/) && // ignore matches with numbers
results.indexOf(text)==-1) // add to results if not duplicate
results.push(text);
});
JSON.stringify(results);
Then do:
var page1 = JSON.parse(' /*COPY-PASTE THE RESULT OF PAGE 1*/ ');
var page2 = JSON.parse(' /*COPY-PASTE THE RESULT OF PAGE 2*/ ');
page1.filter(function(s){return page2.indexOf(s)!=-1});
The copy-paste step is necessary to circumvent the browser's same-origin restrictions, which stop a script running on one page from reading the other.
Demo:
> JSON.stringify( page1.filter(function(s){return page2.indexOf(s)!=-1}) )
'["Beta vulgaris","Spinacia oleracea"]'
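For longer lists, the indexOf lookup inside the filter becomes slow, because every item of one page is compared against every item of the other. A minimal sketch of the same intersection using a Set is shown below. The function name commonItems is hypothetical, and it assumes both inputs are arrays of strings like the ones produced above.

```javascript
// Return the items of listA that also appear in listB,
// in listA's order and without duplicates.
function commonItems(listA, listB) {
  var inB = new Set(listB);   // constant-time membership checks
  var seen = new Set();       // guards against duplicates in listA
  return listA.filter(function (item) {
    if (!inB.has(item) || seen.has(item)) return false;
    seen.add(item);
    return true;
  });
}
```

It would be called as commonItems(page1, page2) in place of the filter line.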

highlight words in html using regex & javascript - almost there

I am writing a jquery plugin that will do a browser-style find-on-page search. I need to improve the search, but don't want to get into parsing the html quite yet.
At the moment my approach is to take an entire DOM element and all nested elements and simply run a regex find/replace for a given term. In the replace I will simply wrap a span around the matched term and use that span as my anchor to do highlighting, scrolling, etc. It is vital that no characters inside any html tags are matched.
This is as close as I have gotten:
(?<=^|>)([^><].*?)(?=<|$)
It does a very good job of capturing all characters that are not in an html tag, but I'm having trouble figuring out how to insert my search term.
Input: Any html element (this could be quite large, eg <body>)
Search Term: 1 or more characters
Replace Txt: <span class='highlight'>$1</span>
UPDATE
The following regex does what I want when I'm testing with http://gskinner.com/RegExr/...
Regex: (?<=^|>)(.*?)(SEARCH_STRING)(?=.*?<|$)
Replacement: $1<span class='highlight'>$2</span>
However, I am having some trouble using it in my JavaScript. With the following code Chrome is giving me the error "Invalid regular expression: /(?<=^|>)(.*?)(Mary)(?=.*?<|$)/: Invalid group".
var origText = $('#'+opt.targetElements).data('origText');
var regx = new RegExp("(?<=^|>)(.*?)(" + $this.val() + ")(?=.*?<|$)", 'gi');
$('#'+opt.targetElements).each(function() {
var text = origText.replace(regx, '$1<span class="' + opt.resultClass + '">$2</span>');
$(this).html(text);
});
It's breaking on the group (?<=^|>) - is this something clumsy or a difference in the Regex engines?
UPDATE
The reason this regex breaks on that group is that JavaScript did not support regex lookbehinds at the time (lookbehind assertions were only added in ECMAScript 2018). For reference & possible solutions: http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript.
Just use jQuery's built-in text() method. It will return all the characters in a selected DOM element.
For the DOM approach (docs for the Node interface): Run over all child nodes of an element. If the child is an element node, recurse into it. If it's a text node, search in the text (node.data), and if you want to highlight or change something, shorten the text of the node up to the found position, then insert a highlight span with the matched text and another text node for the rest of the text.
Example code (adjusted, origin is here):
(function iterate_node(node) {
if (node.nodeType === 3) { // Node.TEXT_NODE
var text = node.data,
pos = text.search(/any regular expression/g), //indexOf also applicable
length = 5; // or whatever you found
if (pos > -1) {
node.data = text.substr(0, pos); // split into a part before...
var rest = document.createTextNode(text.substr(pos+length)); // a part after
var highlight = document.createElement("span"); // and a part between
highlight.className = "highlight";
highlight.appendChild(document.createTextNode(text.substr(pos, length)));
node.parentNode.insertBefore(rest, node.nextSibling); // insert after
node.parentNode.insertBefore(highlight, node.nextSibling);
iterate_node(rest); // maybe there are more matches
}
} else if (node.nodeType === 1) { // Node.ELEMENT_NODE
for (var i = 0; i < node.childNodes.length; i++) {
iterate_node(node.childNodes[i]); // run recursive on DOM
}
}
})(content); // any dom node
There's also highlight.js, which might be exactly what you want.
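If a string-based approach is still preferred, the lookbehind can be avoided altogether: match either a whole tag or the search term, and leave the tags untouched in the replace callback. The following is only a sketch under simplifying assumptions. The function names are hypothetical, and the tag pattern does not handle a ">" inside attribute values, comments, or script content, which is why walking the DOM as above is more robust.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wrap every occurrence of `term` that lies outside HTML tags in a span.
function highlightOutsideTags(html, term, className) {
  if (!term) return html; // an empty term would match everywhere
  // Alternative 1 consumes a whole tag, alternative 2 the search term.
  var pattern = new RegExp('(<[^>]*>)|(' + escapeRegExp(term) + ')', 'gi');
  return html.replace(pattern, function (match, tag, hit) {
    // Tags are returned unchanged; only text matches are wrapped.
    return tag ? tag : '<span class="' + className + '">' + hit + '</span>';
  });
}
```

Because each tag is consumed as a single match, characters inside it (such as a class name equal to the search term) are never wrapped.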