How to sort sections of a GDocs document alphabetically? - google-apps-script

Starting point: I have a doc with sections separated by page breaks.
Goal: I want a Google Apps Script that scans through a doc and sorts the sections alphabetically (aka everything between a heading1 and a page break). The manual equivalent would be cut-pasting sections around until sorted.
To illustrate, here's an example document: initial state and desired end state.
So far, this is what I have:
A script that searches through a doc and identifies the headings with a given format (heading1). I could sort the headings alphabetically, but I don't know how to move whole sections around in the document.
function findHeading1s() {
let doc = DocumentApp.getActiveDocument();//.openById('<doc-id>');
var body = doc.getBody();
// Define the search parameters.
let searchType = DocumentApp.ElementType.PARAGRAPH;
let searchHeading = DocumentApp.ParagraphHeading.HEADING1;
let searchResult = null;
let headers = [];
// Search until the paragraph is found.
while (searchResult = body.findElement(searchType, searchResult)) {
let par = searchResult.getElement().asParagraph();
let parText = par.getText();
if (par.getHeading() == searchHeading && parText != "") {
headers.push(parText);
Logger.log(headers); //Headers in current order: [Zone, Template, Example]
}
}
Logger.log(headers.sort()) //[Example, Template, Zone]
}
Any ideas how to move everything between a heading and the following pagebreak? I don't mind if the sorted end result is in a new document.
Is this doable with existing GAS capabilities? Tried to debug the body/paragraph elements to get some idea on how to solve this, but I only get the functions of the object, not the actual content. Not super intuitive.
Here are the steps I think are needed:
Identify heading1 headings ✅
Sort headings alphabetically ✅
Find each heading in the doc
Cut-paste section in the appropriate position of the doc (or append to a new doc in the correct order)
Thanks!

Issue:
You want to sort a series of document sections according to the section's heading text.
Solution:
Assuming that each section starts with a heading, you can do the following:
Iterate through all the body elements and record the index at which each section starts and ends (based on the location of the headings). The methods getNumChildren and getChild are useful here. You should end up with an array of section objects, each containing the start and end indexes as well as the heading text.
Sort the section objects according to the heading text.
Iterate through the sections array, and append the children corresponding to each section indexes to the end of the document. Useful methods here are copy and the different append methods (e.g. appendParagraph). Here's a related question.
Note that only PARAGRAPH, TABLE and LIST_ITEM elements are appended in the sample below; if your original content contains other element types, you should add the corresponding condition to the code.
When appending the elements, it's important to apply the original element's attributes; otherwise, things like list bullets might not get appended (see this related question).
After the copied elements are appended, remove the original ones (here you can use removeFromParent).
Code sample:
function findHeading1s() {
let doc = DocumentApp.getActiveDocument();
var body = doc.getBody();
const parType = DocumentApp.ElementType.PARAGRAPH;
const headingType = DocumentApp.ParagraphHeading.HEADING1;
const numChildren = body.getNumChildren();
let sections = [];
for (let i = 0; i < numChildren; i++) {
const child = body.getChild(i);
let section;
if (child.getType() === parType) {
const par = child.asParagraph();
const parText = par.getText();
if (par.getHeading() === headingType && parText != "") {
section = {
startIndex: i,
headingText: parText
}
if (sections.length) sections[sections.length - 1]["endIndex"] = i - 1;
sections.push(section);
}
}
}
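// The last section always runs to the end of the body
// (checking for a falsy endIndex would misfire when an endIndex is 0)
const lastSection = sections[sections.length - 1];
if (lastSection) lastSection["endIndex"] = numChildren - 1;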
sections.sort((a,b) => a.headingText.localeCompare(b.headingText));
sections.forEach(section => {
const { startIndex, endIndex } = section;
for (let i = startIndex; i <= endIndex; i++) {
const element = body.getChild(i);
const copy = element.copy();
const type = element.getType();
const attributes = element.getAttributes();
if (type == DocumentApp.ElementType.PARAGRAPH) {
body.appendParagraph(copy).setAttributes(attributes);
} else if (type == DocumentApp.ElementType.TABLE) {
body.appendTable(copy).setAttributes(attributes);
} else if (type == DocumentApp.ElementType.LIST_ITEM) {
body.appendListItem(copy).setAttributes(attributes);
}
}
});
for (let i = numChildren - 1; i >= 0; i--) {
body.getChild(i).removeFromParent();
}
}
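The index bookkeeping from the steps above can be tried without a document. This is only a sketch: computeSortedSections and the { isHeading1, text } item shape are hypothetical names made up for illustration, standing in for what you would read from body.getChild(i) in a real script.

```javascript
// Hypothetical helper: `items` is a plain array describing the body's children
// in order. In Apps Script you would build it from body.getChild(i).
function computeSortedSections(items) {
  const sections = [];
  items.forEach((item, i) => {
    if (item.isHeading1 && item.text !== "") {
      // A new heading closes the previous section just before it.
      if (sections.length) sections[sections.length - 1].endIndex = i - 1;
      sections.push({ startIndex: i, endIndex: null, headingText: item.text });
    }
  });
  // The last section runs to the end of the body.
  if (sections.length) sections[sections.length - 1].endIndex = items.length - 1;
  return sections.sort((a, b) => a.headingText.localeCompare(b.headingText));
}

const demo = computeSortedSections([
  { isHeading1: true, text: "Zone" },
  { isHeading1: false, text: "zone body" },
  { isHeading1: true, text: "Template" },
  { isHeading1: false, text: "template body" },
  { isHeading1: false, text: "" },
  { isHeading1: true, text: "Example" },
]);
console.log(demo.map(s => `${s.headingText}: ${s.startIndex}-${s.endIndex}`));
// [ 'Example: 5-5', 'Template: 2-4', 'Zone: 0-1' ]
```

Anything that comes before the first heading belongs to no section here, so in the full script it would be removed without being copied; handle that case separately if your document has such content.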

Related

finding Text with specific format and delete it

I have a big google doc file with over 100 pages(with tables etc) and there is some reference text in that document in multiple locations reference texts are highlighted with the color "grey", I want to have a function that can find those colors/style in the table or paragraph and delete it. So Step 1 is finding it, and then deleting(removing those texts from the document) it in one go.
How we did it in MS Word is, we created custom styles and assign those styles to those "Remarks Text"(in grey) and in VBA we look for text matching the style name, and if it returns true than we delete those texts. As much i know about doc, there is no option to create custom styles.
Here is the code I am trying:-
function removeText()
{
var doc = DocumentApp.getActiveDocument()
var body = doc.getBody()
body.getParagraphs().map(r=> {
if(r.getAttributes().BACKGROUND_COLOR === "#cccccc")
{
//Don't know what to do next, body.removeChild(r.getChild()) not working
}
})
}
Can you guide me on how I can achieve this effectively please.
Thanks
Try this
body.getParagraphs().forEach( r => {
if( r.getAttributes().BACKGROUND_COLOR === "#cccccc" ) {
r.removeFromParent();
}
});
Reference
Paragraph.removeFromParent()
Google Apps Script has no method for finding text by its style attributes; instead, we have to get each part of the document and read its attributes. In the following example, if the formatting is applied to the whole paragraph, the paragraph is deleted; if not, it uses the regular expression . (which matches any single character) to check the paragraph's characters one by one.
function removeHighlightedText() {
// Style to apply in case we want to remove the highlighting instead of deleting the content
const style = {};
style[DocumentApp.Attribute.BACKGROUND_COLOR] = null;
const backgroundColor = '#cccccc';
const doc = DocumentApp.getActiveDocument();
const searchPattern = '.';
let rangeElement = null;
const rangeElements = [];
// getParagraphs() is a Body method, so go through getBody()
doc.getBody().getParagraphs().forEach(paragraph => {
if (paragraph.getAttributes().BACKGROUND_COLOR === backgroundColor) {
paragraph.removeFromParent();
// Remove highlighting
// paragraph.setAttributes(style);
} else {
// Collect the rangeElements to be processed
while (rangeElement = paragraph.findText(searchPattern, rangeElement)) {
if (rangeElement != null && rangeElement.getStartOffset() != -1) {
const element = rangeElement.getElement();
if (element.getAttributes(rangeElement.getStartOffset()).BACKGROUND_COLOR === backgroundColor) {
rangeElements.push(rangeElement)
}
}
}
}
});
// Process the collected rangeElements in reverse order (makes things easier when deleting content)
rangeElements.reverse().forEach(r => {
if (r != null && r.getStartOffset() != -1) {
const element = r.getElement();
// Remove text
element.asText().deleteText(r.getStartOffset(), r.getEndOffsetInclusive())
// Remove highlighting
// element.asText().setAttributes(r.getStartOffset(), r.getEndOffsetInclusive(), style);
}
});
}
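Since the function above collects one match per character, it ends up calling deleteText once per character. One possible refinement, sketched below under the assumption that you first gather the matching character offsets of a single Text element into a plain array, is to merge consecutive offsets into ranges and delete each range with one call. mergeOffsetsIntoRanges is a hypothetical helper, not part of the Apps Script API.

```javascript
// Hypothetical helper: merges character offsets of ONE Text element into
// inclusive ranges, returned last-first so that deleting a range never
// shifts the offsets of the ranges still to be processed.
function mergeOffsetsIntoRanges(offsets) {
  const sorted = [...offsets].sort((a, b) => a - b);
  const ranges = [];
  for (const offset of sorted) {
    const last = ranges[ranges.length - 1];
    if (last && offset <= last.end + 1) {
      last.end = Math.max(last.end, offset); // extend the current run
    } else {
      ranges.push({ start: offset, end: offset }); // start a new run
    }
  }
  return ranges.reverse();
}

console.log(mergeOffsetsIntoRanges([2, 3, 4, 9, 10, 15]));
// [ { start: 15, end: 15 }, { start: 9, end: 10 }, { start: 2, end: 4 } ]
```

In the script, each resulting range would then be passed to element.asText().deleteText(range.start, range.end).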

How to select all underlined text in a paragraph

I'm trying to create a google apps script that will format certain parts of a paragraph. For example, text that is underlined will become bolded/italicized as well.
One docs add-on I have tried has a similar feature: https://imgur.com/a/5Cw6Irn (this is exactly what I'm trying to achieve)
How can I write a function that will select a certain type of text and format it?
I managed to write a script that iterates through every single letter in a paragraph and checks if it's underlined, but it becomes extremely slow as the paragraph gets longer, so I'm looking for a faster solution.
function textUnderline() {
var selectedText = DocumentApp.getActiveDocument().getSelection();
if(selectedText) {
var elements = selectedText.getRangeElements();
for (var index = 0; index < elements.length; index++) {
var element = elements[index];
if(element.getElement().editAsText) {
var text = element.getElement().editAsText();
var textLength = text.getText().length;
//For every single character, check if it's underlined and then format it
for (var i = 0; i < textLength; i++) {
if(text.isUnderline(i)) {
text.setBold(i, i, true);
text.setBackgroundColor(i,i,'#ffff00');
} else {
text.setFontSize(i, i, 8);
}
}
}
}
}
}
Use getTextAttributeIndices:
There is no need to check each character in the selection. You can use getTextAttributeIndices() to get the indices in which the text formatting changes. This method:
Retrieves the set of text indices that correspond to the start of distinct text formatting runs.
You just need to iterate through these indices (that is, check the indices in which text formatting changes), which are a small fraction of all character indices. This will greatly increase efficiency.
Code sample:
function textUnderline() {
var selectedText = DocumentApp.getActiveDocument().getSelection();
if(selectedText) {
var elements = selectedText.getRangeElements();
for (var index = 0; index < elements.length; index++) {
var element = elements[index];
if(element.getElement().editAsText) {
var text = element.getElement().editAsText();
var textRunIndices = text.getTextAttributeIndices();
var textLength = text.getText().length;
for (let i = 0; i < textRunIndices.length; i++) {
const startOffset = textRunIndices[i];
const endOffset = i + 1 < textRunIndices.length ? textRunIndices[i + 1] - 1 : textLength - 1;
if (text.isUnderline(textRunIndices[i])) {
text.setBold(startOffset, endOffset, true);
text.setBackgroundColor(startOffset, endOffset,'#ffff00');
} else {
text.setFontSize(startOffset, endOffset, 8);
}
}
}
}
}
}
Reference:
getTextAttributeIndices()
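The offset arithmetic can be checked on its own. The sketch below is an assumption-laden illustration, not part of the answer: runsWithinSelection is a hypothetical helper that turns the array returned by getTextAttributeIndices() into inclusive runs, and additionally clips them to selStart/selEnd, which you would take from getStartOffset() and getEndOffsetInclusive() when a range element is only partially selected.

```javascript
// Hypothetical helper: `indices` are the starts of the formatting runs,
// `textLength` is the length of the element's text, and selStart/selEnd are
// the inclusive bounds of the selected part of the element.
function runsWithinSelection(indices, textLength, selStart, selEnd) {
  const runs = [];
  indices.forEach((runStart, i) => {
    // A run ends right before the next run starts, or at the end of the text.
    const runEnd = i + 1 < indices.length ? indices[i + 1] - 1 : textLength - 1;
    const start = Math.max(runStart, selStart);
    const end = Math.min(runEnd, selEnd);
    if (start <= end) runs.push({ start, end }); // keep only overlapping runs
  });
  return runs;
}

// Whole element selected:
console.log(runsWithinSelection([0, 5, 12], 20, 0, 19));
// [ { start: 0, end: 4 }, { start: 5, end: 11 }, { start: 12, end: 19 } ]

// Only characters 3..14 selected:
console.log(runsWithinSelection([0, 5, 12], 20, 3, 14));
// [ { start: 3, end: 4 }, { start: 5, end: 11 }, { start: 12, end: 14 } ]
```

Every character inside a run shares the same formatting, so calling text.isUnderline(run.start) once per run is enough before applying the formatting to run.start through run.end.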
Based on the example shown in the animated gif, it seems your procedure needs to
handle a selection
set properties if the selected region is of some format (e.g. underlined)
set properties if the selected region is NOT of some format (e.g. not underlined)
finish as fast as possible
and your example code achieves all these goals except the last one.
The problem is that you are calling the text.set...() functions at each index position. Each call is synchronous and blocks the code until the document is updated, thus your run time grows linearly with each character in the selection.
My suggestion is to build up a collection of subranges from the selection range and then for each subrange use text.set...(subrange.start, subrange.end) to apply the formatting. Now the run time will be dependent on chunks of characters, rather than single characters. i.e., you will only update when the formatting switches back and forth from, in your example, underlined to not underlined.
Here is some example code that implements this subrange idea. I separated the specific predicate function (text.isUnderline) and specific formatting effects into their own functions so as to separate the general idea from the specific implementation.
// run this function with selection
function transformUnderlinedToBoldAndYellow() {
transformSelection("isUnderline", boldYellowOrSmall);
}
function transformSelection(stylePredicateKey, stylingFunction) {
const selectedText = DocumentApp.getActiveDocument().getSelection();
if (!selectedText) return;
const getStyledSubRanges = makeStyledSubRangeReducer(stylePredicateKey);
selectedText.getRangeElements()
.reduce(getStyledSubRanges, [])
.forEach(stylingFunction);
}
function makeStyledSubRangeReducer(stylePredicateKey) {
return function(ranges, rangeElement) {
const {text, start, end} = unwrapRangeElement(rangeElement);
if (start > end) return ranges; // filter out empty selections (start == end is a valid one-character selection, since end is inclusive)
const range = {
text, start, end,
styled: [], notStyled: [] // we will extend our range with subranges
};
const getKey = (isStyled) => isStyled ? "styled" : "notStyled";
let currentKey = getKey(text[stylePredicateKey](start));
range[currentKey].unshift({start: start});
for (let index = start + 1; index <= end; ++index) {
const isStyled = text[stylePredicateKey](index);
if (getKey(isStyled) !== currentKey) { // we are switching styles
range[currentKey][0].end = index - 1; // note end of this style
currentKey = getKey(isStyled);
range[currentKey].unshift({start: index}); // start new style range
}
}
ranges.push(range);
return ranges;
}
}
// a helper function to unwrap a range selection, deals with isPartial,
// maps RangeElement => {text, start, end}
function unwrapRangeElement(rangeElement) {
const isPartial = rangeElement.isPartial();
const text = rangeElement.getElement().asText();
return {
text: text,
start: isPartial
? rangeElement.getStartOffset()
: 0,
end: isPartial
? rangeElement.getEndOffsetInclusive()
: text.getText().length - 1
};
}
// apply specific formatting to satisfy the example
function boldYellowOrSmall(range) {
const {text, start, end, styled, notStyled} = range;
styled.forEach(function setTextBoldAndYellow(range) {
text.setBold(range.start, range.end || end, true);
text.setBackgroundColor(range.start, range.end || end, '#ffff00');
});
notStyled.forEach(function setTextSmall(range) {
text.setFontSize(range.start, range.end || end, 8);
});
}

How to search and replace a horizontal rule and linebreaks

I need to automatically delete all horizontal rules which are surrounded by 6 linebreaks (3 before and 3 after) on a google doc.
This piece of code seems to put in the logs the correct linebreaks I want to delete (that's a first step) :
function myFunction() {
var doc = DocumentApp.getActiveDocument();
var body = doc.getBody().getText();
var pattern = /\s\s\s\s/g;
while (m=pattern.exec(body)) { Logger.log(m[0]); }
}
I have two questions :
What tool can I use to delete these linebreaks (I don't yet understand the subtleties of using replace or replaceText, all my attempts with these have failed) ?
How can I add to my var pattern (the pattern to be deleted) a horizontal rule ? I tried /\s\s\s\sHorizontalRule\s\s\s\s/g but of course it did not work.
Horizontal rule is an element inside a paragraph (or sometimes inside a list item). Since it is not text, it can not be found or replaced by means of a regular expression. We should search for objects which are specially arranged in the document body, and delete them if found.
Consider the following code example:
function deleteHR() {
var body = DocumentApp.getActiveDocument().getBody();
var hr = null, hrArray = [], countDeleted = 0;
// Collect all horizontal rules in the Document
while (true) {
hr = body.findElement(DocumentApp.ElementType.HORIZONTAL_RULE, hr);
if (hr == null) break;
hrArray.push(hr);
}
hrArray.forEach(function(hr) {
var p = hr.getElement().getParent();
// Get empty paragraphs as siblings (if any)
var prevSiblings = getSiblings(p, 3, true),
nextSiblings = getSiblings(p, 3, false);
// Define a short function for batch deleting items
function remove(e) {
e.removeFromParent();
}
// If empty paragraphs exist (3 before and 3 after)
if (prevSiblings.length == 3 && nextSiblings.length == 3) {
// then delete them as well as the rule itself
hr.getElement().removeFromParent();
prevSiblings.forEach(remove);
nextSiblings.forEach(remove);
countDeleted++;
}
});
// Optional report
Logger.log(countDeleted + ' rules deleted');
}
// Recursive search for empty paragraphs as siblings
function getSiblings(p, n, isPrevious) {
if (n == 0) return [];
if (isPrevious) {
p = p.getPreviousSibling();
} else {
p = p.getNextSibling();
}
if (p == null) return [];
if (p.getType() != DocumentApp.ElementType.PARAGRAPH) return [];
if (p.asParagraph().getText().length > 0) return [];
var siblings = getSiblings(p, n - 1, isPrevious);
siblings.push(p);
return siblings;
}
The main function deleteHR() does all the work. It is helpful, however, to use a separate function getSiblings() for the recursive search for empty paragraphs. This may not be the only way to do it, but it works.
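If you prefer a loop over recursion, the sibling search could look roughly like the sketch below. collectEmptySiblings and makeMockSiblings are hypothetical names; the mock objects only imitate getPreviousSibling() and getNextSibling() so the logic can be run outside of a document.

```javascript
// Hypothetical iterative variant of getSiblings(): walks away from `start`
// and collects up to n consecutive siblings accepted by the predicate.
// In Apps Script the predicate could be:
//   e => e.getType() === DocumentApp.ElementType.PARAGRAPH &&
//        e.asParagraph().getText().length === 0
function collectEmptySiblings(start, n, isPrevious, isEmptyParagraph) {
  const found = [];
  let current = start;
  while (found.length < n) {
    current = isPrevious ? current.getPreviousSibling() : current.getNextSibling();
    if (current == null || !isEmptyParagraph(current)) break;
    found.push(current);
  }
  return found;
}

// Mock elements for a dry run (not real DocumentApp objects).
function makeMockSiblings(texts) {
  const nodes = texts.map(text => ({ text }));
  nodes.forEach((node, i) => {
    node.getPreviousSibling = () => (i > 0 ? nodes[i - 1] : null);
    node.getNextSibling = () => (i + 1 < nodes.length ? nodes[i + 1] : null);
  });
  return nodes;
}

const nodes = makeMockSiblings(["a", "", "", "", "HR", "", "", "x"]);
const isEmpty = node => node.text === "";
console.log(collectEmptySiblings(nodes[4], 3, true, isEmpty).length);  // 3
console.log(collectEmptySiblings(nodes[4], 3, false, isEmpty).length); // 2
```

With this mock the rule at index 4 would not be deleted, because only two empty paragraphs follow it.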

How to insert a paragraph object in listItem preserving the formating of each word of the paragraph?

Following the documentation sample, I'm trying to create a function that searches for a numbered list in a Google document and, if it finds one, adds a new item to the list. My code works well with strings (thanks to @Serge insas for previous help), but not with paragraph objects. I know I could get the paragraph text and add it to the listItem, but then I lose the formatting. Is there a way to insert a paragraph while preserving all its formatting? (I know I could use var newElement = child.getParent().insertListItem(childIndex, elementContent.getText()) to insert the text without per-word formatting.)
Here the code:
function test() {
var targetDocId = "1A02VhxOWLUIdl8LTV1tt2S1yASDbOq77VbsUpxPa6vk";
var targetDoc = DocumentApp.openById(targetDocId);
var body = targetDoc.getBody();
var elementContent = targetDoc.getChild(2); // a paragraph with its formating
var childIndex = 0;
for (var p= 0; p< targetDoc.getNumChildren(); p++) {
var child = targetDoc.getChild(p);
if (child.getType() == DocumentApp.ElementType.LIST_ITEM){
while(child.getType() == DocumentApp.ElementType.LIST_ITEM){
child = targetDoc.getChild(p)
Logger.log("child = " + child.getText())
childIndex = body.getChildIndex(child);
Logger.log(childIndex)
p++
}
child = targetDoc.getChild(p-2);
var listId = child.getListId();
if (child.getText() == '') {
childIndex = childIndex -1;
}
Logger.log(childIndex)
var newElement = child.getParent().insertListItem(childIndex, elementContent);
newElement.setListId(child);
var lastEmptyItem = targetDoc.getChild(childIndex +1).removeFromParent();
break;
}
}
}
Here a screen shot of my targetDoc (note the second item, Paragraph):
I know this question is old, but I've come up with a solution and will leave here for anyone that may need it. It is not complete, as I have yet to find a way to copy any Inline Drawing and Equation to a new element...
Anyways, here is my code, it will work well if the paragraph you want to convert to a list item only has text and Inline Images.
function parToList() {
var doc = DocumentApp.getActiveDocument();
var body = doc.getBody();
//gets the paragraph at index 1 on body -> can be changed to what you want
var par = body.getChild(1);
var childs = [];
// Copy the children in document order. copy() returns detached elements,
// so the paragraph is not modified while we iterate over it.
for (var i = 0; i < par.getNumChildren(); i++) {
childs.push(par.getChild(i).copy());
}
par.removeFromParent();
//puts the list item on index 1 of body -> can be changed to the wanted position
var li = body.insertListItem(1, "");
for (var j in childs) {
var liChild = childs[j];
var childType = liChild.getType();
if (childType == DocumentApp.ElementType.EQUATION) {
//still need to find a way to append an equation
} else if (childType == DocumentApp.ElementType.INLINE_DRAWING) {
//still need to find a way to append an inlineDrawing
} else if (childType == DocumentApp.ElementType.INLINE_IMAGE) {
li.appendInlineImage(liChild);
} else if (childType == DocumentApp.ElementType.TEXT) {
li.appendText(liChild);
};
};
};
Cheers

How to get a link to a part of document (header, paragraph, section...)

I'm creating a document dynamically with some heading structure
doc = DocumentApp.create("My Document");
doc.appendParagraph("Main").setHeading(DocumentApp.ParagraphHeading.HEADING1);
var section = doc.appendParagraph("Section 1");
section.setHeading(DocumentApp.ParagraphHeading.HEADING2);
I can open it online, insert Table of contents and can access directly to "Section 1" by url like:
https://docs.google.com/document/d/1aA...FQ/edit#heading=h.41bpnx2ug57j
The question is: How I can get similar url/id to the "Section 1" in the code at run time and use it later as a link?
If I can't - is there any way to set something like anchor/bookmark and get it's url?
Thanks!
When I started testing Google Apps in depth, I had issues with the limited features for managing tables of contents. I came across the code you proposed and used it as a starting point to write my own function to format a table of contents:
- applying proper heading styles,
- numbering the different parts.
I hope this helps some of you improve your Google Docs templates:
/**
* Used to properly format the Table of Content object
*/
function formatToc() {
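// Assumption: the function runs on the active document (doc was not defined in the original snippet)
var doc = DocumentApp.getActiveDocument();
//Define variables
var level1 = 0;
var level2 = 0;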
// Define custom paragraph styles.
var style1 = {};
style1[DocumentApp.Attribute.FONT_FAMILY] = DocumentApp.FontFamily.ARIAL;
style1[DocumentApp.Attribute.FONT_SIZE] = 18;
style1[DocumentApp.Attribute.BOLD] = true;
style1[DocumentApp.Attribute.FOREGROUND_COLOR] = '#ff0000';
var style2 = {};
style2[DocumentApp.Attribute.FONT_FAMILY] = DocumentApp.FontFamily.ARIAL;
style2[DocumentApp.Attribute.FONT_SIZE] = 14;
style2[DocumentApp.Attribute.BOLD] = true;
style2[DocumentApp.Attribute.FOREGROUND_COLOR] = '#007cb0';
// Search document's body for the table of contents (assuming there is one and only one).
var toc = doc.getBody().findElement(DocumentApp.ElementType.TABLE_OF_CONTENTS).getElement().asTableOfContents();
//Loop all the table of contents to apply new formating
for (var i = 0; i < toc.getNumChildren(); i++) {
//Search document's body for corresponding paragraph & retrieve heading
var searchText = toc.getChild(i).getText();
for (var j=0; j<doc.getBody().getNumChildren(); j++) {
var par = doc.getBody().getChild(j);
if (par.getType() == DocumentApp.ElementType.LIST_ITEM) {
var searchcomp = par.getText();
if (par.getText() == searchText) {
// Found the corresponding paragraph; retrieve its heading type and nesting level.
var heading = par.getHeading();
var level = par.getNestingLevel();
}
}
}
//Insert Paragraph number before text
if (level==0) {
level1++;
level2=0;
toc.getChild(i).editAsText().insertText(0,level1+". ");
}
if (level==1) {
level2++;
toc.getChild(i).editAsText().insertText(0,level1+"."+level2+". ");
}
//Apply style corresponding to heading
if (heading == DocumentApp.ParagraphHeading.HEADING1) {
toc.getChild(i).setAttributes(style1);
}
if (heading == DocumentApp.ParagraphHeading.HEADING2) {
toc.getChild(i).setAttributes(style2);
}
}
}
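The numbering part of the function can be isolated and tried on plain data. This is just a sketch: tocNumberPrefixes is a hypothetical helper, and levels stands for the nesting levels (0 or 1) that the function above reads with getNestingLevel().

```javascript
// Hypothetical helper: given the nesting level of each TOC entry in order,
// returns the prefix that would be inserted in front of each entry.
function tocNumberPrefixes(levels) {
  let level1 = 0;
  let level2 = 0;
  return levels.map(level => {
    if (level === 0) {
      level1++;
      level2 = 0; // a new top-level part restarts the sub-numbering
      return `${level1}. `;
    }
    if (level === 1) {
      level2++;
      return `${level1}.${level2}. `;
    }
    return ""; // deeper levels are left unnumbered, as in the function above
  });
}

console.log(tocNumberPrefixes([0, 1, 1, 0, 1]));
// [ '1. ', '1.1. ', '1.2. ', '2. ', '2.1. ' ]
```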
At the time of writing, it is impossible to get a link to a document part (section, paragraph, etc.) without having a TOC. There is also no way to manage bookmarks from GAS. There is an issue on the issue tracker; you can star it to promote it.
There is a workaround that uses a TOC. The following code shows how to get the URLs from a TOC. It works only while the TOC exists; if you delete it, the links no longer work.
function testTOC() {
// The document's children live in its Body, so iterate over getBody()
var body = DocumentApp.openById('here is doc id').getBody();
for (var i = 0; i < body.getNumChildren(); i++) {
var p = body.getChild(i);
if (p.getType() == DocumentApp.ElementType.TABLE_OF_CONTENTS) {
var toc = p.asTableOfContents();
for (var ti = 0; ti < toc.getNumChildren(); ti++) {
var itemToc = toc.getChild(ti).asParagraph().getChild(0).asText();
var itemText = itemToc.getText();
var itemUrl = itemToc.getLinkUrl();
}
break;
}
}
}
The function iterates all document parts, finds the 1st TOC, iterates it and the variables itemText and itemUrl contain a TOC item text and URL. The URLs have #heading=h.uuj3ymgjhlie format.
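To turn such a TOC link into a full URL like the one in the question, something along these lines might work. It is a sketch only: buildHeadingUrl is a hypothetical helper, 'DOC_ID' is a placeholder, and the URL layout is simply the one shown in the question (.../document/d/<id>/edit#heading=h.<id>).

```javascript
// Hypothetical helper: extracts the heading id from a TOC item link and
// appends it to the document's edit URL. Returns null if the link does not
// contain a #heading=h.<id> fragment.
function buildHeadingUrl(docId, tocLinkUrl) {
  const match = /#heading=(h\.[A-Za-z0-9]+)/.exec(tocLinkUrl || "");
  if (!match) return null;
  return `https://docs.google.com/document/d/${docId}/edit#heading=${match[1]}`;
}

console.log(buildHeadingUrl("DOC_ID", "#heading=h.uuj3ymgjhlie"));
// https://docs.google.com/document/d/DOC_ID/edit#heading=h.uuj3ymgjhlie
```

In testTOC() you would pass itemUrl as the second argument and the document's id as the first.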
Since the time the accepted answer was written, the ability to manage bookmarks inside Google Apps Script code was introduced. So it is possible to get a similar URL, though not the same exact URL as in example. You can manually insert a bookmark at the section heading, and use that bookmark to link to the section heading. It seems that for the purposes of the question, it will suffice. Here is some sample code (including slight modifications of code from question):
var doc = DocumentApp.getActiveDocument();
var body = doc.getBody();
body.appendParagraph("Main").setHeading(DocumentApp.ParagraphHeading.HEADING1);
var section = body.appendParagraph("Section 1");
section.setHeading(DocumentApp.ParagraphHeading.HEADING2);
// create and position bookmark
var sectionPos = doc.newPosition(section, 0);
var sectionBookmark = doc.addBookmark(sectionPos);
// add a link to the section heading
var paragraph = body.appendParagraph("");
paragraph.appendText("Now we add a ");
paragraph.appendText("link to the section heading").setLinkUrl('#bookmark=' + sectionBookmark.getId());
paragraph.appendText(".");
Is it imperative that the document is a native Google docs type (ie. application/vnd.google-apps.document)?
If you stored the document as text/html you would have much greater control over how you assemble the document and how you expose it, eg with anchors.