text formatting according to indentation - html

Hi, can someone help me understand how the code area in a Stack Overflow question works, technically?
I mean the way it formats text according to its indentation.
Example: without indentation
Example: with indentation (the text's background color and font have changed)
Can someone explain the technology behind this? I am new to programming; is this hard to implement? How can we implement this kind of formatting based on the indentation of the text?

One approach could be to loop through each line of text in a string, and group them by indentation level into sections:
var leadingSpaces = /^\s*/;
blockOfText = blockOfText.replace(/\t/g, '    '); // replace tabs with 4 spaces
var lines = blockOfText.split('\n');
var sections = [];
var currentIndentLevel = null;
var currentSection = null;
lines.forEach(function(line) {
var indentLevel = leadingSpaces.exec(line)[0].length;
if (indentLevel !== currentIndentLevel) {
currentIndentLevel = indentLevel;
currentSection = { indentLevel: currentIndentLevel, lines: [] };
sections.push(currentSection);
}
currentSection.lines.push(line);
});
Then, once you have those sections, you can loop through them:
sections.forEach(function(section) {
switch (section.indentLevel) {
case 4:
// format as code
break;
// etc.
default:
// format as markdown
break;
}
});
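As a rough end-to-end illustration of the idea above, here is a minimal sketch that turns text into HTML, treating any run of lines indented by at least four spaces as a code block. The function names (escapeHtml, renderIndentedText) are hypothetical, and this is not Stack Overflow's actual renderer; it only shows that the technique needs little more than a line loop.

```javascript
// Hypothetical helper: escape characters that are special in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Sketch: lines indented by 4+ spaces become <pre><code>, other lines become <p>.
function renderIndentedText(blockOfText) {
  var lines = blockOfText.replace(/\t/g, '    ').split('\n');
  var html = [];
  var codeLines = [];

  // Emit the pending code lines (if any) as one block.
  function flushCode() {
    if (codeLines.length > 0) {
      html.push('<pre><code>' + escapeHtml(codeLines.join('\n')) + '</code></pre>');
      codeLines = [];
    }
  }

  lines.forEach(function (line) {
    if (/^ {4}/.test(line)) {
      codeLines.push(line.slice(4)); // drop the 4-space marker
    } else {
      flushCode();
      if (line.trim() !== '') {
        html.push('<p>' + escapeHtml(line) + '</p>');
      }
    }
  });
  flushCode();
  return html.join('\n');
}
```

The background color and monospace font seen in the question would then come from CSS rules applied to the pre and code elements.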

finding Text with specific format and delete it

I have a big Google Doc of over 100 pages (with tables etc.). It contains reference text in multiple locations, highlighted in grey. I want a function that can find text with that color/style in a table or paragraph and delete it. So step 1 is finding the text, and step 2 is deleting it (removing it from the document) in one go.
In MS Word, we created custom styles and assigned them to the "Remarks Text" (in grey); in VBA we looked for text matching the style name and, if it matched, deleted that text. As far as I know, Google Docs has no option to create custom styles.
Here is the code I am trying:-
function removeText()
{
var doc = DocumentApp.getActiveDocument()
var body = doc.getBody()
body.getParagraphs().map(r=> {
if(r.getAttributes().BACKGROUND_COLOR === "#cccccc")
{
//Don't know what to do next, body.removeChild(r.getChild()) not working
}
})
}
Can you guide me on how I can achieve this effectively please.
Thanks
Try this
body.getParagraphs().forEach( r => {
if( r.getAttributes().BACKGROUND_COLOR === "#cccccc" ) {
r.removeFromParent();
}
});
Reference
Paragraph.removeFromParent()
Google Apps Script has no method for finding text by its style attributes; instead, we need to get each part of the document and read its attributes. In the following example, if the formatting is applied to the whole paragraph, the paragraph is deleted; if not, the code uses a regular expression that matches any single character to check the text character by character.
function removeHighlightedText() {
// In case we want to remove the highlighting instead of deleting the content
const style = {};
style[DocumentApp.Attribute.BACKGROUND_COLOR] = null;
const backgroundColor = '#cccccc';
const doc = DocumentApp.getActiveDocument();
const searchPattern = '.';
let rangeElement = null;
const rangeElements = [];
doc.getBody().getParagraphs().forEach(paragraph => {
if (paragraph.getAttributes().BACKGROUND_COLOR === backgroundColor) {
paragraph.removeFromParent();
// Remove highlighting
// paragraph.setAttributes(style);
} else {
// Collect the rangeElements to be processed
while (rangeElement = paragraph.findText(searchPattern, rangeElement)) {
if (rangeElement != null && rangeElement.getStartOffset() != -1) {
const element = rangeElement.getElement();
if (element.getAttributes(rangeElement.getStartOffset()).BACKGROUND_COLOR === backgroundColor) {
rangeElements.push(rangeElement)
}
}
}
}
});
// Process the collected rangeElements in reverse order (makes things easier when deleting content)
rangeElements.reverse().forEach(r => {
if (r != null && r.getStartOffset() != -1) {
const element = r.getElement();
// Remove text
element.asText().deleteText(r.getStartOffset(), r.getEndOffsetInclusive())
// Remove highlighting
// element.asText().setAttributes(r.getStartOffset(), r.getEndOffsetInclusive(), style);
}
});
}
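To see why the collected ranges are processed in reverse order, here is a small sketch on a plain string rather than a Google Doc. The helper name deleteRanges and the { start, end } shape are hypothetical; the ranges use inclusive end offsets, like getEndOffsetInclusive(), and are assumed not to overlap.

```javascript
// Hypothetical helper: delete inclusive [start, end] ranges from a string.
// Assumes the ranges do not overlap.
function deleteRanges(text, ranges) {
  // Work from the last range to the first, so that removing one range
  // never shifts the offsets of the ranges still to be processed.
  var ordered = ranges.slice().sort(function (a, b) {
    return b.start - a.start;
  });
  ordered.forEach(function (range) {
    text = text.slice(0, range.start) + text.slice(range.end + 1);
  });
  return text;
}

var cleaned = deleteRanges('keep GREY keep GREY end', [
  { start: 5, end: 9 },
  { start: 15, end: 19 }
]);
```

Processing the same ranges front to back would require adjusting every later offset after each deletion.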

xpath in apps script?

I made a formula to extract some Wikipedia data in Google Sheets, which works fine. Here is the formula:
=regexreplace(join("",flatten(IMPORTXML(D2,".//p[preceding-sibling::h2[1][contains(., 'Geography')]]"))),"\[[^\]]+\]","")&char(10)&char(10)&iferror(regexreplace(join("",flatten(IMPORTXML(D2,".//p[preceding-sibling::h2[1][contains(., 'Education')]]"))),"\[[^\]]+\]",""))
Where D2 is a URL like https://en.wikipedia.org/wiki/Abbeville,_Alabama
This extracts some Geography and Education data from the Wikipedia page. Trouble is that importxml only runs a few times before it dies due to quota.
So I thought maybe better to use Apps Script where there are much higher limits on fetching and parsing. I could not see a good way however of using Xpath in Apps Script. Older posts on the web discuss using a deprecated service called Xml but it seems to no longer work. There is a Service called XmlService which looks like it may do the job but you can't just plug in an Xpath. It looks like a lot of sweating to get to the result. Any solutions out there where you can just plug in Xpath?
Here is an alternative solution that I actually use in cases like this.
I have used XmlService, but only for parsing the content, not for XPath. The approach relies on the element tags and has been pretty consistent in my tests so far. It might need tweaks when certain tags appear in the result, though; you may have to add them to the exclusion condition.
I tested the code below with both links:
https://en.wikipedia.org/wiki/Abbeville,_Alabama#Geography
https://en.wikipedia.org/wiki/Montgomery,_Alabama#Education
In my tests, the formula above did not return the proper output for the 2nd link, while the code did (maybe because the content was too long).
Code:
function getGeoAndEdu(path) {
var data = UrlFetchApp.fetch(path).getContentText();
// wikipedia is divided into sections, if output is cut, increase the number
var regex = /.{1,100000}/g;
var results = [];
// flag to determine if matches should be added
var foundFlag = false;
do {
var m = regex.exec(data);
if (m && foundFlag) {
// if another header is found during generation of data, stop appending the matches
if (matchTag(m[0], "<h2>"))
foundFlag = false;
// exclude tables, sub-headers and divs containing image description
else if(matchTag(m[0], "<div") || matchTag(m[0], "<h3") ||
matchTag(m[0], "<td") || matchTag(m[0], "<th"))
continue;
else
results.push(m[0]);
}
// start capturing if either IDs are found
if (m != null && (matchTag(m[0], "id=\"Geography\"") ||
matchTag(m[0], "id=\"Education\""))) {
foundFlag = true;
}
} while (m);
var output = results.map(function (str) {
// clean tags for XmlService
str = str.replace(/<[^>]*>/g, '').trim();
var decode = XmlService.parse('<d>' + str + '</d>');
// convert html entity codes (e.g. a non-breaking space) to text
return decode.getRootElement().getText();
// filter blank results due to cleaning and empty sections
// separate data and remove citations before returning output
}).filter(result => result.trim().length > 1).join("\n").replace(/\[\d+\]/g, '');
return output;
}
// check if tag is found in string
function matchTag(string, tag) {
var regex = RegExp(tag);
return string.match(regex) && string.match(regex)[0] == tag;
}
Output and difference (screenshots omitted): the formula's ending output, the script's ending output, and the ending of the Education section on Wikipedia.
Note:
You still have quota when using UrlFetchApp but should be better than IMPORTXML's limit depending on the type of your account.
Reference:
Apps Script Quotas
Sorry, I got very busy this week, so I didn't reply. I took a look at your answer, which seems to work fine, but it is quite code heavy. I wanted something I would understand, so I coded my own solution. Not that mine is any simpler; it's just my own code, so it's easier for me to follow:
function getTextBetweenTags(html, paramatersInFirstTag, paramatersInLastTag) { //finds text values between 2 tags and removes internal tags to leave plain text.
//eg getTextBetweenTags(html,[['class="mw-headline"'],['id="Geography"']],[['class="wikitable mw-collapsible mw-made-collapsible"']])
// **Note: you may want to replace &#number; entities with the corresponding characters
var openingTagPos = null;
var closingTagPos = null;
var previousChar = '';
var readingTag = false;
var newTag = '';
var tagEnd = false;
var regexFirstTagParams = [];
var regexLastTagParams = [];
//prepare regexes to test for parameters in opening and closing tags. put regexes in arrays so each condition can be tested separately
for (var i in paramatersInFirstTag) {
regexFirstTagParams.push(new RegExp(escapeRegex(paramatersInFirstTag[i][0])))
}
for (var i in paramatersInLastTag) {
regexLastTagParams.push(new RegExp(escapeRegex(paramatersInLastTag[i][0])))
}
var startTagIndex = null;
var endTagIndex = null;
var matches = 0;
for (var i = 0; i < html.length - 1; i++) {
var nextChar = html.substr(i, 1);
if (nextChar == '<' && previousChar != '\\') {
readingTag = true;
}
if (nextChar == '>' && previousChar != '\\') { //if end of tag found, check tag matches start or end tag
readingTag = false;
newTag += nextChar;
//test for firstTag
if (startTagIndex == null) {
var alltestsPass = true;
for (var j in regexFirstTagParams) {
if (!regexFirstTagParams[j].test(newTag)) alltestsPass = false;
}
if (alltestsPass) {
startTagIndex = i + 1;
//console.log('Start Tag',startTagIndex)
matches++;
}
}
//test for lastTag
else if (startTagIndex != null) {
var alltestsPass = true;
for (var j in regexLastTagParams) {
if (!regexLastTagParams[j].test(newTag)) alltestsPass = false;
}
if (alltestsPass) {
endTagIndex = i + 1;
matches++;
}
}
if(startTagIndex && endTagIndex) break;
newTag = '';
}
if (readingTag) newTag += nextChar;
previousChar = nextChar;
}
if (matches < 2) return 'No matches';
else return html.substring(startTagIndex, endTagIndex).replace(/<[^>]+>/g, '');
}
function escapeRegex(string) {
if (string == null) return string;
return string.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}
My function requires an array of attributes for the start tag and an array of attributes for the end tag. It gets the text in between and removes any tags found in between. One issue I also noticed was that there were often special characters (e.g. HTML entities such as non-breaking spaces), so they need to be replaced. I did that outside the scope of the function above.
The function could easily be improved to check the tag type (e.g. h2), but that wasn't necessary for the Wikipedia case.
Here is a function where I call the function above. The html variable is just the result of UrlFetchApp.fetch('some wikipedia city url').getContentText();
function getWikiTexts(html) {
var geography = getTextBetweenTags(html, [['class="mw-headline"'], ['id="Geography']], [['class="mw-headline"']]);
var economy = getTextBetweenTags(html, [['class="mw-headline"'], ['id="Economy']], [['class="mw-headline"']]);
var education = getTextBetweenTags(html, [['class="mw-headline"'], ['id="Education']], [['class="mw-headline"']]);
var returnString = '';
if (geography != 'No matches' && !/Wikipedia/.test(geography)) returnString += geography + '\n';
if (economy != 'No matches' && !/Wikipedia/.test(economy)) returnString += economy + '\n';
if (education != 'No matches' && !/Wikipedia/.test(education)) returnString += education + '\n';
return returnString
}
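The answer above mentions replacing special characters outside the function but does not show how. One possible sketch of that step follows; the helper name decodeHtmlEntities and the tiny named-entity table are assumptions, not part of the original code.

```javascript
// Hypothetical lookup table; deliberately small, extend as needed.
var NAMED_ENTITIES = { nbsp: '\u00a0', amp: '&', lt: '<', gt: '>', quot: '"', apos: "'" };

// Sketch: decode &#160;, &#xA0; and a few named entities in one pass.
function decodeHtmlEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, function (match, body) {
    if (body.charAt(0) === '#') {
      var isHex = body.charAt(1).toLowerCase() === 'x';
      var codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range numeric entities untouched instead of throwing.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    var name = body.toLowerCase();
    // Leave unknown named entities untouched.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name)
      ? NAMED_ENTITIES[name]
      : match;
  });
}
```

It could be applied to the string returned by getWikiTexts before writing the text to a sheet.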
Thanks for posting your answer.

Copied Image from Google Document Paragraph inserted twice

I'm trying to combine several Google Documents into one, but images in the original documents are inserted twice: once at the right location, and once at the end of the newly created doc.
From what I saw, the script detects these images as Paragraph elements.
As you can see in my code below, I was inspired by similar topics found here.
One of them suggested searching for a child Element inside the Paragraph Element, but debugging showed that there is none. The affected part of the doc is always inserted with the appendParagraph method, as the script is not able to properly detect the image.
This is why the other relevant topic I found cannot work here: it suggested inserting the image before the paragraph itself, but the script cannot detect it.
Logging with both the default Logger and Stackdriver's console.log displays an object typed as Paragraph.
Step-by-step execution did not show any loop calling the appendParagraph method twice.
/* chosenParts contains list of Google Documents name */
function concatChosenFiles(chosenParts) {
var folders = DriveApp.getFoldersByName(folderName);
var folder = folders.hasNext() ? folders.next() : false;
var parentFolders = folder.getParents();
var parentFolder = parentFolders.next();
var file = null;
var gdocFile = null;
var fileContent = null;
var offerTitle = "New offer";
var gdocOffer = DocumentApp.create(offerTitle);
var gfileOffer = DriveApp.getFileById(gdocOffer.getId()); // transform Doc into File in order to choose its path with DriveApp
var offerHeader = gdocOffer.addHeader();
var offerContent = gdocOffer.getBody();
var header = null;
var headerSubPart = null;
var partBody= null;
var style = {};
parentFolder.addFile(gfileOffer); // place current offer inside generator folder
DriveApp.getRootFolder().removeFile(gfileOffer); // remove from home folder to avoid copy
for (var i = 0; i < chosenParts.length; i++) {
// First retrieve Document to combine
file = folder.getFilesByName(chosenParts[i]);
file = file.hasNext() ? file.next() : null;
gdocFile = DocumentApp.openById(file.getId());
header = gdocFile.getHeader();
// set Header from first doc
if ((0 === i) && (null !== header)) {
for (var j = 0; j < header.getNumChildren(); j++) {
headerSubPart = header.getChild(j).copy();
offerHeader.appendParagraph(headerSubPart); // Assume header content is always a paragraph
}
}
fileContent = gdocFile.getBody();
// Analyse file content and insert each part inside the offer with the right method
for (var j = 0; j < fileContent.getNumChildren(); j++) {
// There is a limit somewhere between 50-100 unsaved changes where the script
// won't continue until a batch is committed.
if (j % 50 == 0) {
gdocOffer.saveAndClose();
gdocOffer = DocumentApp.openById(gdocOffer.getId());
offerContent = gdocOffer.getBody();
}
partBody = fileContent.getChild(j).copy();
switch (partBody.getType()) {
case DocumentApp.ElementType.HORIZONTAL_RULE:
offerContent.appendHorizontalRule();
break;
case DocumentApp.ElementType.INLINE_IMAGE:
offerContent.appendImage(partBody);
break;
case DocumentApp.ElementType.LIST_ITEM:
offerContent.appendListItem(partBody);
break;
case DocumentApp.ElementType.PAGE_BREAK:
offerContent.appendPageBreak(partBody);
break;
case DocumentApp.ElementType.PARAGRAPH:
// Search for image inside paragraph type
if (partBody.asParagraph().getNumChildren() != 0 && partBody.asParagraph().getChild(0).getType() == DocumentApp.ElementType.INLINE_IMAGE)
{
offerContent.appendImage(partBody.asParagraph().getChild(0).asInlineImage().getBlob());
} else {
offerContent.appendParagraph(partBody.asParagraph());
}
break;
case DocumentApp.ElementType.TABLE:
offerContent.appendTable(partBody);
break;
default:
style[DocumentApp.Attribute.BOLD] = true;
offerContent.appendParagraph("Element type '" + partBody.getType() + "' from '" + file.getName() + "' could not be merged.").setAttributes(style);
console.log("Element type '" + partBody.getType() + "' from '" + file.getName() + "' could not be merged.");
Logger.log("Element type '" + partBody.getType() + "' from '" + file.getName() + "' could not be merged.");
}
}
// page break at the end of each part.
offerContent.appendPageBreak();
}
}
The problem occurs no matter how many files are combined; using one is enough to reproduce it.
If there's only one image in the file (with no spaces or line feeds around it) and appendPageBreak is not used afterward, it does not occur. When some text sits next to the image, the image is duplicated.
One last thing: someone suggested that it is "due to natural inheritance of formatting", but I did not find out how to prevent that.
Many thanks to everyone who'll be able to take a look at this :)
Edit: I adapted the paragraph section after @ziganotschka's suggestions.
It is very similar to this subject, except that its solution does not work here.
Here is the new piece of code:
case DocumentApp.ElementType.PARAGRAPH:
// Search for image inside paragraph type
if(partBody.asParagraph().getPositionedImages().length) {
// Assume only one image per paragraph (#TODO : to improve)
tmpImage = partBody.asParagraph().getPositionedImages()[0].getBlob().copyBlob();
// remove image from paragraph in order to add only the paragraph
partBody.asParagraph().removePositionedImage(partBody.asParagraph().getPositionedImages()[0].getId());
tmpParagraph = offerContent.appendParagraph(partBody.asParagraph());
// Then add the image afterward, without text
tmpParagraph.addPositionedImage(tmpImage);
} else if (partBody.asParagraph().getNumChildren() != 0 && partBody.asParagraph().getChild(0).getType() == DocumentApp.ElementType.INLINE_IMAGE) {
offerContent.appendImage(partBody.asParagraph().getChild(0).asInlineImage().getBlob());
} else {
offerContent.appendParagraph(partBody.asParagraph());
}
break;
Unfortunately, it still duplicates the image. And if I comment out the line inserting the image (tmpParagraph.addPositionedImage(tmpImage);), then no image is inserted at all.
Edit 2: it is a known bug in Google Apps Script
https://issuetracker.google.com/issues/36763970
See the comments for some workarounds.
Your image is embedded as 'Wrap text' rather than as an inline image.
This is why you cannot retrieve it with getBody().getImages();
Instead, you can retrieve it with getBody().getParagraphs()[paragraphIndex].getPositionedImages()
I am not sure why exactly your image is copied twice, but as a workaround you can take the image's blob and insert it as an inline image with
getBody().insertImage(childIndex, getBody().getParagraphs()[paragraphIndex].getPositionedImages()[imageIndex].getBlob());
And subsequently remove the positioned image from its paragraph by id:
getBody().getParagraphs()[paragraphIndex].removePositionedImage(getBody().getParagraphs()[paragraphIndex].getPositionedImages()[imageIndex].getId());
Obviously, you will need to loop through all the paragraphs and check whether each one has embedded positioned images, in order to retrieve them with the right indexes and proceed.
Add your PositionedImages at the end of your script, after you add all your other elements. In my experience, if other elements are added to the document after the paragraph that anchors the image, extra images are added.
You can accomplish this by storing a reference to the paragraph element that will be used as the image holder, along with the image's blob and any other information (height, width, etc.). Then, at the end of your script, just iterate over the stored references and add the images.
var imageParagraphs = [];
...
case DocumentApp.ElementType.PARAGRAPH:
var positionedImages = element.getPositionedImages();
if (positionedImages.length > 0){
var imageData = [];
positionedImages.forEach(function (image) {
imageData.push({
height: image.getHeight(),
width: image.getWidth(),
leftOffset: image.getLeftOffset(),
topOffset: image.getTopOffset(),
layout: image.getLayout(),
blob: image.getBlob()
});
element.removePositionedImage(image.getId());
});
var p = merged_doc_body.appendParagraph(element.asParagraph());
imageParagraphs.push({element: p, imageData: imageData});
}
else
merged_doc_body.appendParagraph(element);
break;
...
imageParagraphs.forEach(function (p) {
var imageData = p.imageData;
var imageParagraph = p.element;
imageData.forEach(function (image) {
imageParagraph.addPositionedImage(image.blob)
.setHeight(image.height)
.setWidth(image.width)
.setLeftOffset(image.leftOffset)
.setTopOffset(image.topOffset)
.setLayout(image.layout);
});
});

Update/Replace inline image on Google Document

I'm trying to build a feature that updates images in a Google Document, the same way the Lucidchart add-on does with its "Updated inserted diagram" feature. For this, I'm currently doing the following:
- Creating a Named Range and storing its id in the document properties, together with the data to generate the image, for later retrieval.
- On update, calling body.getNamedRangeById() and replacing the element with the newly generated image.
This works, but I have the following problems, which do not happen with Lucidchart:
- Every update adds a blank line after the image.
- If the user drags and drops the image inside the document to reposition it, the Named Range disappears and I'm not able to retrieve it later.
- If the user centers the image, it goes back to the left position after an update, even when copying its attributes.
Does anybody know a good strategy to replace/update a referenced image in Google Docs, the same way the Lucidchart add-on's update feature works?
Thanks
NamedRanges indeed get lost when the range is moved, so they're not a good fit for your scenario. But there's no other way of identifying elements (which is a great misfeature of Google Docs).
In the case of an image, you could use its LINK_URL to identify it, which seems to be what Lucidchart does. It does not get in the user's way, so it may be a good solution.
As for getting a blank line and losing attributes when inserting an image, I imagine (since you haven't shared any code) that you're inserting the image directly into the document body instead of into a paragraph. A paragraph then gets created automatically to wrap your image, resulting in the blank line and the loss of attributes.
Here's a code example:
function initialInsert() {
var data = Charts.newDataTable().addColumn(
Charts.ColumnType.STRING, 'Fruits').addColumn(
Charts.ColumnType.NUMBER, 'Amount').addRow(
['Apple',15]).addRow(
['Orange',6]).addRow(
['Banana',14]).build();
var chart = Charts.newPieChart().setDataTable(data).build();
var body = DocumentApp.getActiveDocument().getBody()
body.appendImage(chart).setLinkUrl('http://mychart');
//here we're inserting directly in the body, a wrapping paragraph element will be created for us
}
function updateImage() {
var data = Charts.newDataTable().addColumn(
Charts.ColumnType.STRING, 'Fruits').addColumn(
Charts.ColumnType.NUMBER, 'Amount').addRow(
['Apple',Math.floor(Math.random()*31)]).addRow( //random int between 0 and 30
['Orange',Math.floor(Math.random()*31)]).addRow(
['Banana',Math.floor(Math.random()*31)]).build();
var chart = Charts.newPieChart().setDataTable(data).build();
var img = getMyImg(DocumentApp.getActiveDocument().getBody(), 'http://mychart');
//let's insert on the current parent instead of the body
var parent = img.getParent(); //probably a paragraph, but does not really matter
parent.insertInlineImage(parent.getChildIndex(img)+1, chart).setLinkUrl('http://mychart');
img.removeFromParent();
}
function getMyImg(docBody, linkUrl) {
var imgs = docBody.getImages();
for( var i = 0; i < imgs.length; ++i )
if( imgs[i].getLinkUrl() === linkUrl )
return imgs[i];
return null;
}
About the link URL: you could of course do as Lucidchart does and link back to your site, so the link is not just broken for the user.
Take a look at my add-on called PlantUML Gizmo.
Here's the code to the insert image function, which deals with replacing images if there's already one selected:
function insertImage(imageDataUrl, imageUrl) {
/*
* For debugging cursor info
*/
// var cursor = DocumentApp.getActiveDocument().getCursor();
// Logger.log(cursor.getElement().getParent().getType());
// throw "cursor info: " + cursor.getElement().getType() + " offset = " + cursor.getOffset() + " surrounding text = '" + cursor.getSurroundingText().getText() + "' parent's type = " +
// cursor.getElement().getParent().getType();
/*
* end debug
*/
var doc = DocumentApp.getActiveDocument();
var selection = doc.getSelection();
var replaced = false;
if (selection) {
var elements = selection.getSelectedElements();
// delete the selected image (to be replaced)
if (elements.length == 1 &&
elements[0].getElement().getType() ==
DocumentApp.ElementType.INLINE_IMAGE) {
var parentElement = elements[0].getElement().getParent(); // so we can re-insert cursor
elements[0].getElement().removeFromParent();
replaced = true;
// move cursor to just before deleted image
doc.setCursor(DocumentApp.getActiveDocument().newPosition(parentElement, 0));
} else {
throw "Please select only one image (image replacement) or nothing (image insertion)"
}
}
var cursor = doc.getCursor();
var blob;
if (imageDataUrl != "") {
blob = getBlobFromBase64(imageDataUrl);
} else {
blob = getBlobViaFetch(imageUrl);
}
var image = cursor.insertInlineImage(blob);
image.setLinkUrl(imageUrl);
// move the cursor to after the image
var position = doc.newPosition(cursor.getElement(), cursor.getOffset()+1);
doc.setCursor(position);
if (cursor.getElement().getType() == DocumentApp.ElementType.PARAGRAPH) {
Logger.log("Resizing");
// resize if wider than current page
var currentParagraph = DocumentApp.getActiveDocument().getCursor().getElement().asParagraph();
var originalImageWidth = image.getWidth(); // pixels
var documentWidthPoints = DocumentApp.getActiveDocument().getBody().getPageWidth() - DocumentApp.getActiveDocument().getBody().getMarginLeft() - DocumentApp.getActiveDocument().getBody().getMarginRight();
var documentWidth = documentWidthPoints * 96 / 72; // convert to pixels (a guess)
var paragraphWidthPoints = documentWidthPoints - currentParagraph.getIndentStart() - currentParagraph.getIndentEnd();
var paragraphWidth = paragraphWidthPoints * 96 / 72; // convert to pixels (a guess)
if (originalImageWidth > paragraphWidth) {
image.setWidth(paragraphWidth);
// scale proportionally
image.setHeight(image.getHeight() * image.getWidth() / originalImageWidth);
}
}
}
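The resizing step at the end of that function can be isolated as plain arithmetic. The sketch below uses a hypothetical helper, fitToWidth, and keeps the same 96 / 72 points-to-pixels factor that the code above itself labels as a guess.

```javascript
// Hypothetical helper: shrink an image proportionally if it is wider than the
// available width. Image sizes are in pixels, the available width in points.
function fitToWidth(imageWidthPx, imageHeightPx, availableWidthPoints) {
  // Same conversion as in the answer above (an assumption, not a documented constant).
  var availableWidthPx = availableWidthPoints * 96 / 72;
  if (imageWidthPx <= availableWidthPx) {
    return { width: imageWidthPx, height: imageHeightPx }; // already fits
  }
  var scale = availableWidthPx / imageWidthPx;
  return { width: availableWidthPx, height: imageHeightPx * scale };
}

var resized = fitToWidth(1200, 600, 450);
```

In the add-on code, the two resulting values would be passed to image.setWidth() and image.setHeight().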

How to get a link to a part of document (header, paragraph, section...)

I'm creating a document dynamically with some heading structure
doc = DocumentApp.create("My Document");
doc.appendParagraph("Main").setHeading(DocumentApp.ParagraphHeading.HEADING1);
var section = doc.appendParagraph("Section 1");
section.setHeading(DocumentApp.ParagraphHeading.HEADING2);
I can open it online, insert a table of contents, and go directly to "Section 1" with a URL like:
https://docs.google.com/document/d/1aA...FQ/edit#heading=h.41bpnx2ug57j
The question is: how can I get a similar URL/id for "Section 1" in code at run time and use it later as a link?
If I can't, is there any way to set something like an anchor/bookmark and get its URL?
Thanks!
When I started testing Google Apps in depth, I had issues with the limited features for managing tables of contents. I came across the code you proposed and used it as a starting point to write my own function to format a table of contents:
- applying proper heading styles,
- numbering the different parts.
I hope this helps some of you improve your Google Docs templates:
/**
* Used to properly format the Table of Content object
*/
function formatToc() {
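//Define variables
var doc = DocumentApp.getActiveDocument(); // assumption: the original relied on a doc variable defined elsewhere
var level1 = 0;
var level2 = 0;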
// Define custom paragraph styles.
var style1 = {};
style1[DocumentApp.Attribute.FONT_FAMILY] = DocumentApp.FontFamily.ARIAL;
style1[DocumentApp.Attribute.FONT_SIZE] = 18;
style1[DocumentApp.Attribute.BOLD] = true;
style1[DocumentApp.Attribute.FOREGROUND_COLOR] = '#ff0000';
var style2 = {};
style2[DocumentApp.Attribute.FONT_FAMILY] = DocumentApp.FontFamily.ARIAL;
style2[DocumentApp.Attribute.FONT_SIZE] = 14;
style2[DocumentApp.Attribute.BOLD] = true;
style2[DocumentApp.Attribute.FOREGROUND_COLOR] = '#007cb0';
// Search document's body for the table of contents (assuming there is one and only one).
var toc = doc.getBody().findElement(DocumentApp.ElementType.TABLE_OF_CONTENTS).getElement().asTableOfContents();
//Loop all the table of contents to apply new formating
for (var i = 0; i < toc.getNumChildren(); i++) {
//Search document's body for corresponding paragraph & retrieve heading
var searchText = toc.getChild(i).getText();
for (var j=0; j<doc.getBody().getNumChildren(); j++) {
var par = doc.getBody().getChild(j);
if (par.getType() == DocumentApp.ElementType.LIST_ITEM) {
var searchcomp = par.getText();
if (par.getText() == searchText) {
// Found the corresponding paragraph; read its heading type and nesting level.
var heading = par.getHeading();
var level = par.getNestingLevel();
}
}
}
//Insert Paragraph number before text
if (level==0) {
level1++;
level2=0;
toc.getChild(i).editAsText().insertText(0,level1+". ");
}
if (level==1) {
level2++;
toc.getChild(i).editAsText().insertText(0,level1+"."+level2+". ");
}
//Apply style corresponding to heading
if (heading == DocumentApp.ParagraphHeading.HEADING1) {
toc.getChild(i).setAttributes(style1);
}
if (heading == DocumentApp.ParagraphHeading.HEADING2) {
toc.getChild(i).setAttributes(style2);
}
}
}
Currently it is impossible to get a link to a document part (section, paragraph, etc.) without having a TOC. Also, there is no way to manage bookmarks from GAS. There is an issue on the issue tracker; you can star it to promote it.
There is a workaround that uses a TOC. The following code shows how to get URLs from a TOC. It works only while the TOC exists; if you delete it, the links no longer work.
function testTOC() {
var body = DocumentApp.openById('here is doc id').getBody();
for (var i = 0; i < body.getNumChildren(); i++) {
var p = body.getChild(i);
if (p.getType() == DocumentApp.ElementType.TABLE_OF_CONTENTS) {
var toc = p.asTableOfContents();
for (var ti = 0; ti < toc.getNumChildren(); ti++) {
var itemToc = toc.getChild(ti).asParagraph().getChild(0).asText();
var itemText = itemToc.getText();
var itemUrl = itemToc.getLinkUrl();
}
break;
}
}
}
The function iterates over all document parts, finds the first TOC, and iterates over it; the variables itemText and itemUrl hold each TOC item's text and URL. The URLs have the format #heading=h.uuj3ymgjhlie.
Since the accepted answer was written, the ability to manage bookmarks from Google Apps Script code has been introduced. So it is possible to get a similar URL, though not exactly the same URL as in the example. You can insert a bookmark at the section heading and use that bookmark to link to the section heading. For the purposes of the question, that seems to suffice. Here is some sample code (including slight modifications of the code from the question):
var doc = DocumentApp.getActiveDocument();
var body = doc.getBody();
body.appendParagraph("Main").setHeading(DocumentApp.ParagraphHeading.HEADING1);
var section = body.appendParagraph("Section 1");
section.setHeading(DocumentApp.ParagraphHeading.HEADING2);
// create and position bookmark
var sectionPos = doc.newPosition(section, 0);
var sectionBookmark = doc.addBookmark(sectionPos);
// add a link to the section heading
var paragraph = body.appendParagraph("");
paragraph.appendText("Now we add a ");
paragraph.appendText("link to the section heading").setLinkUrl('#bookmark=' + sectionBookmark.getId());
paragraph.appendText(".");
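For links that are used outside the document itself, the fragments discussed in this thread can be appended to the document's edit URL. The sketch below is an assumption based only on the URL shapes quoted here (/edit#heading=h... and '#bookmark=' + id); the helper names are hypothetical.

```javascript
// Hypothetical helper: build a full link from a document id and a fragment.
function buildDocUrl(docId, fragment) {
  return 'https://docs.google.com/document/d/' + encodeURIComponent(docId) + '/edit' + fragment;
}

// Extract the "#heading=h.xxxx" part from a TOC item's link URL, or null if absent.
function headingFragment(tocLinkUrl) {
  var match = /#heading=h\.[A-Za-z0-9]+/.exec(tocLinkUrl);
  return match ? match[0] : null;
}

// Fragment for a bookmark id, as used with setLinkUrl in the sample above.
function bookmarkFragment(bookmarkId) {
  return '#bookmark=' + bookmarkId;
}

var headingUrl = buildDocUrl('DOC_ID', headingFragment('#heading=h.uuj3ymgjhlie'));
```

In a script, DOC_ID would come from doc.getId() and the bookmark id from sectionBookmark.getId().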
Is it imperative that the document be a native Google Docs type (i.e. application/vnd.google-apps.document)?
If you stored the document as text/html, you would have much greater control over how you assemble the document and how you expose it, e.g. with anchors.