Parsing inlineImages from Gmail raw content - google-apps-script

The Gmail message getAttachments function does not return inline images - see issue 2810 https://code.google.com/p/google-apps-script-issues/issues/detail?id=2810
I need them, so I wrote the code below to parse the inline image out of the message's raw content as a blob, knowing the image's cid within the message in advance.
However, I am afraid this parsing is quite fragile in the way it finds the first and last characters of the base64 image content, isn't it?
Is there a better way of doing this?
Regards, Fausto
var rawc = message.getRawContent();
var b64c1 = rawc.lastIndexOf(cid) + cid.length + 3; // first character in image base64
var b64cn = rawc.substr(b64c1).indexOf("--") - 3; // last character in image base64
var imgb64 = rawc.substring(b64c1, b64c1 + b64cn + 1); // is this fragile or safe enough?
var imgblob = Utilities.newBlob(Utilities.base64Decode(imgb64), "image/jpeg", cid); // decode and blob

I've had this problem a number of times, and I think I have a pretty general case solution. Getting non-embedded images has also been a problem.
I'm not sure my parsing is any less fragile than yours. In the end, I'm pulling out one part of the multipart message by grabbing the surrounding lines that start with '--'. Everything else is just making sure I can reuse this without modifying the code too much when I need it next. I have had some emails which don't seem to follow the \r\n line-ending convention and cause problems: something to look out for.
The getInlineImages function will take the raw content of the message and return an array of objects. Each object will have the src of the img tag and the blob that goes with the image. If you just want inline images, you can choose to ignore anything that doesn't start with 'cid'.
The getBlobFromMessage function will take the raw content of the message and the src of the img tag (including 'cid') and return the associated blob.
You can see the code commented here.
function getInlineImages(rawContent) {
var url = /^https?:\/\//, cid = /^cid:/;
var imgtags = rawContent.match(/<img.*?>(.*?<\/img>)?/gi);
return imgtags ? imgtags.map(function(imgTag) {
var img = {src: Xml.parse(imgTag,true).html.body.img.src};
img.blob = url.test(img.src) ? UrlFetchApp.fetch(img.src).getBlob()
: cid.test(img.src) ? getBlobFromMessage(rawContent,img.src)
: null;
return img;
}) : [];
}
function getBlobFromMessage(rawContent,src) {
var cidIndex = src.search(/cid:/i);
if(cidIndex === -1) throw Utilities.formatString("Did not find cid: prefix for inline reference: %s", src);
var itemId = src.substr(cidIndex + 4);
var contentIdIndex = rawContent.search("Content-ID:.*?" + itemId);
if(contentIdIndex === -1) throw Utilities.formatString("Item with ID %s not found.",src);
var previousBoundaryIndex = rawContent.lastIndexOf("\r\n--",contentIdIndex);
var nextBoundaryIndex = rawContent.indexOf("\r\n--",previousBoundaryIndex+1);
var part = rawContent.substring(previousBoundaryIndex,nextBoundaryIndex);
var contentTransferEncodingLine = part.match(/Content-Transfer-Encoding:.*?\r\n/i)[0];
var encoding = contentTransferEncodingLine.split(":")[1].trim();
if(encoding != "base64") throw Utilities.formatString("Unhandled encoding type: %s",encoding);
var contentTypeLine = part.match(/Content-Type:.*?\r\n/i)[0];
var contentType = contentTypeLine.split(":")[1].split(";")[0].trim();
var startOfBlob = part.indexOf("\r\n\r\n");
var blobText = part.substring(startOfBlob).replace(/\r\n/g,""); // strip every line break, not just the first one
return Utilities.newBlob(Utilities.base64Decode(blobText),contentType,itemId);
}
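One way to make the part lookup a little less dependent on bare '--' lines is to split only on the boundaries that the message itself declares. The sketch below is an illustration under assumptions, not part of the answer above: findInlinePart is a hypothetical helper name, it assumes the boundary is declared as boundary="..." in a Content-Type header, and it normalises \r\n to \n first so that messages with inconsistent line endings still parse.

```javascript
// Hypothetical helper: returns { contentType, base64 } for the MIME part
// whose Content-ID equals contentId, or null when no such part exists.
function findInlinePart(rawContent, contentId) {
  // Normalise line endings so messages that use bare \n still parse.
  var text = rawContent.replace(/\r\n/g, "\n");

  // Collect every declared boundary (nested multiparts declare their own).
  var boundaries = [];
  var boundaryRe = /boundary="?([^"\n;]+)"?/gi;
  var m;
  while ((m = boundaryRe.exec(text)) !== null) boundaries.push(m[1]);

  // Cut the message into parts at lines that are exactly a declared boundary.
  var parts = [];
  var current = [];
  text.split("\n").forEach(function (line) {
    var isBoundary = boundaries.some(function (b) {
      return line === "--" + b || line === "--" + b + "--";
    });
    if (isBoundary) {
      parts.push(current);
      current = [];
    } else {
      current.push(line);
    }
  });
  parts.push(current);

  for (var i = 0; i < parts.length; i++) {
    var blank = parts[i].indexOf(""); // first empty line ends the part headers
    if (blank === -1) continue;
    var headers = parts[i].slice(0, blank).join("\n");
    var idMatch = headers.match(/^Content-ID:\s*<?([^>\s]+)>?/im);
    if (idMatch && idMatch[1] === contentId) {
      var typeMatch = headers.match(/^Content-Type:\s*([^;\s]+)/im);
      return {
        contentType: typeMatch ? typeMatch[1] : "application/octet-stream",
        base64: parts[i].slice(blank + 1).join("") // body lines joined without breaks
      };
    }
  }
  return null;
}
```

Inside Apps Script the result could then be turned into a blob the same way as above, with Utilities.newBlob(Utilities.base64Decode(part.base64), part.contentType, contentId). The sketch does not check Content-Transfer-Encoding, so that check from getBlobFromMessage would still be needed.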

A more recent approach for this issue.
The issue
For example, here's an email body retrieved with .getBody()
<div dir="ltr"><div><img src="?view=att&th=1401f70d4881e07f&attid=0.3&disp=emb&realattid=ii_1401f6fc7824ebe1&zw&atsh=1" alt="Inline image 4" width="200" height="180"><br></div><div><br></div><img src="?view=att&th=1401f70d4881e07f&attid=0.2&disp=emb&realattid=ii_1401f6e6c1d46c4b&zw&atsh=1" alt="Inline image 2" width="200" height="65"><div><br></div><div>
jtykuykyu</div><div><br></div><div><img src="?view=att&th=1401f70d4881e07f&attid=0.1&disp=emb&realattid=ii_1401f6e9df3a4b1c&zw&atsh=1" alt="Inline image 3" width="200" height="82"><br><div><br></div><div><br></div></div></div>
And here is the list of attachments for the email (among which are our inline images):
[13-07-30 08:28:08:378 CEST] Screen Shot 2013-07-12 at 1.54.31 PM.png
[13-07-30 08:28:08:379 CEST] Screen Shot 2013-07-23 at 5.38.51 PM.png
[13-07-30 08:28:08:380 CEST] Screen Shot 2013-07-25 at 9.05.15 AM.png
[13-07-30 08:28:08:381 CEST] test2.png
As you can see, there's no link between the names of those images and the information available in the img tags, so there's no safe way to rebuild a correct email with only that information.
The solution
How can we solve that? We can use the .getRawContent() method to get the actual email and parse it for the information we need. Specifically, the raw content gives us the relationship between the name of an attachment and the 'realattid' available in the email body:
Content-Type: image/png; name="Screen Shot 2013-07-25 at 9.05.15 AM.png"
Content-Transfer-Encoding: base64
Content-ID:
X-Attachment-Id: ii_1401f6e9df3a4b1c
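As a rough illustration of that relationship, the headers can be scanned once to build a lookup table from X-Attachment-Id to file name. This is only a sketch: mapAttachmentIdsToNames is a hypothetical helper, and it assumes each attachment's headers form one block ended by a blank line and carry both a name="..." parameter and an X-Attachment-Id header, as in the excerpt above.

```javascript
// Hypothetical helper: maps each X-Attachment-Id found in the raw message
// to the file name declared in the same header block.
function mapAttachmentIdsToNames(rawContent) {
  var map = {};
  // A blank line separates a part's headers from its body, so splitting on
  // blank lines leaves at most one set of headers per block.
  var blocks = rawContent.replace(/\r\n/g, "\n").split(/\n\n+/);
  blocks.forEach(function (block) {
    var id = block.match(/^X-Attachment-Id:\s*(\S+)/im);
    var name = block.match(/^Content-Type:[\s\S]*?\bname="([^"]+)"/im);
    if (id && name) map[id[1]] = name[1];
  });
  return map;
}
```

With such a table, the 'realattid' value taken from an img tag can be looked up directly, instead of splitting the raw content once per image.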
Code snippet
Here's a code snippet to:
- Retrieve the body & attachments of an email
- Get all the img tags inside the body and see which ones are linked to attachments in the email
- Get the 'realattid' of each image and use .getRawContent() to link this 'realattid' to the right attachment
- Replace the img tag to correctly link it to the right attachment
- Indicate that this attachment is no longer a simple attachment but an inline image
Once all that is done, you have all the data you need to send a copy of this email with the correct inline images displayed.
//////////////////////////////////////////////////////////////////////////////
// Get inline images and make sure they stay as inline images
//////////////////////////////////////////////////////////////////////////////
var emailTemplate = selectedTemplate.getBody();
var rawContent = selectedTemplate.getRawContent();
var attachments = selectedTemplate.getAttachments();
var regMessageId = new RegExp(selectedTemplate.getId(), "g");
if (emailTemplate.match(regMessageId) != null) {
var inlineImages = {};
var nbrOfImg = emailTemplate.match(regMessageId).length;
var imgVars = emailTemplate.match(/<img[^>]+>/g);
var imgToReplace = [];
if(imgVars != null){
for (var i = 0; i < imgVars.length; i++) {
if (imgVars[i].search(regMessageId) != -1) {
var id = imgVars[i].match(/realattid=([^&]+)&/);
if (id != null) {
var temp = rawContent.split(id[1])[1];
temp = temp.substr(temp.lastIndexOf('Content-Type'));
var imgTitle = temp.match(/name="([^"]+)"/);
if (imgTitle != null) imgToReplace.push([imgTitle[1], imgVars[i], id[1]]);
}
}
}
}
for (var i = 0; i < imgToReplace.length; i++) {
for (var j = 0; j < attachments.length; j++) {
if(attachments[j].getName() == imgToReplace[i][0]) {
inlineImages[imgToReplace[i][2]] = attachments[j].copyBlob();
attachments.splice(j, 1);
var newImg = imgToReplace[i][1].replace(/src="[^\"]+\"/, "src=\"cid:" + imgToReplace[i][2] + "\"");
emailTemplate = emailTemplate.replace(imgToReplace[i][1], newImg);
}
}
}
}
//////////////////////////////////////////////////////////////////////////////
var message = {
htmlBody: emailTemplate,
subject: selectedTemplate.getSubject(),
attachments: attachments,
inlineImages: inlineImages
}

Related

Is there any way to see the names of a website's network requests with Google apps scripts?

So here's what I'm trying to do:
There's a website called Torah Anytime (https://www.torahanytime.com/) which publishes audio files on a daily basis (I guess you can call them podcasts; the website refers to them as shiurim, shiur being Hebrew for lesson, or in this case, an audio lecture). I would like to create a script that downloads the audio of specific speakers and then emails those files to me. I'm doing this with Google Apps Script. Torah Anytime allows you to follow specific speakers and to get email notifications when a speaker you're following puts out a new podcast. Here is the code that I have so far:
function main() {
var emails = getemails();
for (var i = 0; i < emails.length; i++) {
var email = emails[i].getMessages();
if (email[0].getFrom() == "TorahAnytime Following <following@torahanytime.com>"){
var title = getTitle(email);
var shiurID = getShiurID(email);
var downloadLink = "https://dl.torahanytime.com/audio/" + shiurID;
var shiur = downloadShiur(downloadLink);
shiur.setName(title);
var emailSent = emailShiur(shiur);
if (emailSent) {email[0].moveToTrash();
Logger.log("Email moved to Trash");}
}
}
}
function getemails() {
var label = GmailApp.getUserLabelByName("TA Speeches");
return label.getThreads();
}
function getTitle(email) {
var body = email[0].getPlainBody(); // declare locally instead of leaking a global
var begIndex = body.indexOf("from") + 4;
var endIndex = body.indexOf("on ");
var title = body.substring(begIndex, endIndex).toLowerCase().replaceAll(" ", "-").replace(/(\r\n|\n|\r)/gm, "-");
begIndex = body.indexOf("called ") + 7;
endIndex = body.indexOf(" [");
title += "-" + body.substring(begIndex, endIndex).toLowerCase().replaceAll(" ", "-").replace(/(\r\n|\n|\r)/gm, "-") + ".mp3";
return title;
}
function getShiurID(email) {
var body = email[0].getPlainBody();
var begIndex = body.indexOf("[") + 1;
var endIndex = body.indexOf("]");
var link = body.substring(begIndex, endIndex).replaceAll("?v", "?a");
console.log(link);
var mainLink = UrlFetchApp.fetch(link);
//here I somehow need to get the link being used to stream that particular audio file
}
function getIDName(email) {
var body = email[0].getPlainBody();
var begIndex = body.indexOf("ID ") + 3;
var endIndex = body.indexOf(" and");
return body.substring(begIndex, endIndex);
}
function downloadShiur(downloadLink) {
var audio = UrlFetchApp.fetch(downloadLink);
return audio.getBlob().getAs('audio/mp3');
}
function emailShiur(shiur) {
const maxFileSize = 26214400;
if (shiur.getBytes().length <= maxFileSize) {
MailApp.sendEmail("[Email addressed removed]", "TA Shiur (File)", "Enjoy!", {
attachments: [shiur],
name: 'Automatic Emailer Script'
});
return true;
} else {
MailApp.sendEmail("[Email addressed removed]", "TA Shiur (File)", "Error, File too large to email", {
name: 'Automatic Emailer Script'
});
return false;
}
}
My issue is that the URL to download the file is not in the HTML, so I don't know how to get to it using GAS. If you use Chrome's dev tools, you can see the URL right there in the Network tab. Does anyone know of any way to get the information shown in the Network tab (the names of the URLs being requested) using GAS? Thank you!

Microsoft.Graph: How to set ContentId of large embedded inline attachment/image

To send email using Microsoft.Graph, I use code like the following (simplified):
var recipientList = new List<Recipient>
{
new Recipient { EmailAddress = new EmailAddress {Address = "recipient@example.com"}}
};
var email = new Message
{
Body = new ItemBody
{
Content = "<html> ... <img src='cid:CID12345@example.com'> ... </html>",
ContentType = BodyType.Html,
},
Subject = "Message containing inline image",
ToRecipients = recipientList,
};
Message draft = await graphClient.Me
.MailFolders
.Drafts
.Messages
.Request()
.AddAsync(email);
byte[] contentBytes = ...;
if (contentBytes.Length < 3 * 1024 * 1024)
{
// Small Attachments
var fileAttachment = new FileAttachment
{
Name = "Image.png",
ContentBytes = contentBytes,
ContentId = "CID12345@example.com",
IsInline = true,
Size = contentBytes.Length
};
Attachment uploadedFileAttachment = await graphClient.Me.Messages[draft.Id].Attachments
.Request()
.AddAsync(fileAttachment);
}
else
{
// Large Attachments
var contentStream = new MemoryStream(contentBytes);
var attachmentItem = new AttachmentItem
{
#warning TODO: How to set ContentId?
AttachmentType = AttachmentType.File,
Name = "Image.png",
Size = contentStream.Length,
IsInline = true,
};
UploadSession uploadSession = await graphClient.Me.Messages[draft.Id].Attachments
.CreateUploadSession(attachmentItem)
.Request()
.PostAsync();
var maxSliceSize = 320 * 1024; // Must be a multiple of 320KiB.
var largeFileUploadTask = new LargeFileUploadTask<FileAttachment>(uploadSession, contentStream, maxSliceSize);
UploadResult<FileAttachment> uploadResult = await largeFileUploadTask.UploadAsync();
await graphClient.Me.Messages[draft.Id].Send().Request().PostAsync();
}
The email contains an inline image. The image file is added as an attachment. To link this attachment to an HTML img element, I set FileAttachment.ContentId to a value which I also set in the HTML image element's src attribute.
This works as long as the image is smaller than 3 MB. Larger attachments have to be added differently, as also shown in the code above. Instead of a FileAttachment, an AttachmentItem is used, which has an IsInline property like FileAttachment. Unfortunately, unlike FileAttachment, AttachmentItem does not have a ContentId property.
https://learn.microsoft.com/en-us/graph/api/resources/fileattachment?view=graph-rest-1.0
https://learn.microsoft.com/en-us/graph/api/resources/attachmentitem?view=graph-rest-1.0
How can I set a ContentId on large attachments?
I've noticed that with large attachments, even if you set IsInline to true on the AttachmentItem, once all the bytes are uploaded it is still set to false on the FileAttachment attached to the message, and the content ID is null.
You also cannot patch the attachment by its ID to set the content ID and IsInline properties, because you get a "method is not allowed" error.
I've tried everything I could think of to get large image attachments to work as inline images, but nothing has worked.
I don't know why inline use would be limited to attachments under 3-4 MB, but it seems to be hard-capped there, with no sign that this will change. If anyone can prove me wrong, though, I would love to hear more!

Copied Image from Google Document Paragraph inserted twice

I'm trying to combine several Google Documents into one, but images from the original documents are inserted twice: once at the right location, and once at the end of the newly created doc.
From what I saw, the script detects these images as Paragraph elements.
As you might see in my code below, I've been inspired by similar topics found here.
One of them suggested searching for a child Element inside the Paragraph Element, but debugging showed that there is none. The affected part of the doc is always inserted with the appendParagraph method, because the script is not able to properly detect the image.
This is why the other relevant topic I found cannot work here: it suggested inserting the image before the paragraph itself, but the script cannot detect it.
Logging with both the default Logger and console.log from Stackdriver displays an object typed as Paragraph.
Stepping through the execution did not show any loop calling the appendParagraph method twice.
/* chosenParts contains list of Google Documents name */
function concatChosenFiles(chosenParts) {
var folders = DriveApp.getFoldersByName(folderName);
var folder = folders.hasNext() ? folders.next() : false;
var parentFolders = folder.getParents();
var parentFolder = parentFolders.next();
var file = null;
var gdocFile = null;
var fileContent = null;
var offerTitle = "New offer";
var gdocOffer = DocumentApp.create(offerTitle);
var gfileOffer = DriveApp.getFileById(gdocOffer.getId()); // transform Doc into File in order to choose its path with DriveApp
var offerHeader = gdocOffer.addHeader();
var offerContent = gdocOffer.getBody();
var header = null;
var headerSubPart = null;
var partBody= null;
var style = {};
parentFolder.addFile(gfileOffer); // place current offer inside generator folder
DriveApp.getRootFolder().removeFile(gfileOffer); // remove from home folder to avoid copy
for (var i = 0; i < chosenParts.length; i++) {
// First retrieve Document to combine
file = folder.getFilesByName(chosenParts[i]);
file = file.hasNext() ? file.next() : null;
gdocFile = DocumentApp.openById(file.getId());
header = gdocFile.getHeader();
// set Header from first doc
if ((0 === i) && (null !== header)) {
for (var j = 0; j < header.getNumChildren(); j++) {
headerSubPart = header.getChild(j).copy();
offerHeader.appendParagraph(headerSubPart); // Assume header content is always a paragraph
}
}
fileContent = gdocFile.getBody();
// Analyse file content and insert each part inside the offer with the right method
for (var j = 0; j < fileContent.getNumChildren(); j++) {
// There is a limit somewhere between 50-100 unsaved changed where the script
// wont continue until a batch is commited.
if (j % 50 == 0) {
gdocOffer.saveAndClose();
gdocOffer = DocumentApp.openById(gdocOffer.getId());
offerContent = gdocOffer.getBody();
}
partBody = fileContent.getChild(j).copy();
switch (partBody.getType()) {
case DocumentApp.ElementType.HORIZONTAL_RULE:
offerContent.appendHorizontalRule();
break;
case DocumentApp.ElementType.INLINE_IMAGE:
offerContent.appendImage(partBody);
break;
case DocumentApp.ElementType.LIST_ITEM:
offerContent.appendListItem(partBody);
break;
case DocumentApp.ElementType.PAGE_BREAK:
offerContent.appendPageBreak(partBody);
break;
case DocumentApp.ElementType.PARAGRAPH:
// Search for image inside parapraph type
if (partBody.asParagraph().getNumChildren() != 0 && partBody.asParagraph().getChild(0).getType() == DocumentApp.ElementType.INLINE_IMAGE)
{
offerContent.appendImage(partBody.asParagraph().getChild(0).asInlineImage().getBlob());
} else {
offerContent.appendParagraph(partBody.asParagraph());
}
break;
case DocumentApp.ElementType.TABLE:
offerContent.appendTable(partBody);
break;
default:
style[DocumentApp.Attribute.BOLD] = true;
offerContent.appendParagraph("Element type '" + partBody.getType() + "' from '" + file.getName() + "' could not be merged.").setAttributes(style);
console.log("Element type '" + partBody.getType() + "' from '" + file.getName() + "' could not be merged.");
Logger.log("Element type '" + partBody.getType() + "' from '" + file.getName() + "' could not be merged.");
}
}
// page break at the end of each part.
offerContent.appendPageBreak();
}
}
The problem occurs no matter how many files are combined; using one is enough to reproduce it.
If there's only one image in the file (no spaces or line feeds around it) and "appendPageBreak" is not used afterward, it does not occur. When some text sits next to the image, the image is duplicated.
One last thing: someone suggested that it is "due to natural inheritance of formatting", but I did not find out how to prevent that.
Many thanks to everyone who is able to take a look at this :)
Edit: I adapted the paragraph section after @ziganotschka's suggestions.
It is very similar to this subject, except that its solution does not work here.
Here is the new piece of code :
case DocumentApp.ElementType.PARAGRAPH:
// Search for image inside parapraph type
if(partBody.asParagraph().getPositionedImages().length) {
// Assume only one image per paragraph (#TODO : to improve)
tmpImage = partBody.asParagraph().getPositionedImages()[0].getBlob().copyBlob();
// remove image from paragraph in order to add only the paragraph
partBody.asParagraph().removePositionedImage(partBody.asParagraph().getPositionedImages()[0].getId());
tmpParagraph = offerContent.appendParagraph(partBody.asParagraph());
// Then add the image afterward, without text
tmpParagraph.addPositionedImage(tmpImage);
} else if (partBody.asParagraph().getNumChildren() != 0 && partBody.asParagraph().getChild(0).getType() == DocumentApp.ElementType.INLINE_IMAGE) {
offerContent.appendImage(partBody.asParagraph().getChild(0).asInlineImage().getBlob());
} else {
offerContent.appendParagraph(partBody.asParagraph());
}
break;
Unfortunately, it still duplicates the image. And if I comment out the line inserting the image (tmpParagraph.addPositionedImage(tmpImage);) then no image is inserted at all.
Edit 2: it is a known bug in Google Apps Script
https://issuetracker.google.com/issues/36763970
See comments for some workaround.
Your image is embedded with 'Wrap text' layout, rather than as an inline image.
This is why you cannot retrieve it with getBody().getImages();
Instead, you can retrieve it with getBody().getParagraphs()[index].getPositionedImages()
I am not sure exactly why your image is copied twice, but as a workaround you can make a copy of the image and insert it as an inline image with
getBody().insertImage(childIndex, getBody().getParagraphs()[index].getPositionedImages()[index].copy());
And subsequently
getBody().getParagraphs()[index].getPositionedImages()[index].removeFromParent();
Obviously, you will need to loop through all the paragraphs and check whether each one has positioned images, in order to retrieve them with the right index and proceed.
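That loop could look roughly like the sketch below. collectPositionedImages is a hypothetical helper name; it only relies on Body.getParagraphs() and Paragraph.getPositionedImages(), and it records the paragraph index next to each image so the image can be handled at the right place afterwards.

```javascript
// Hypothetical helper: lists every positioned image in a document body
// together with the paragraph (and its index) that anchors it.
function collectPositionedImages(body) {
  var found = [];
  body.getParagraphs().forEach(function (paragraph, paragraphIndex) {
    paragraph.getPositionedImages().forEach(function (image) {
      found.push({
        paragraphIndex: paragraphIndex,
        paragraph: paragraph,
        image: image
      });
    });
  });
  return found;
}
```

In a script it would be called as collectPositionedImages(DocumentApp.getActiveDocument().getBody()); paragraphs without positioned images simply contribute nothing to the result.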
Add your PositionedImages at the end of your script, after you have added all your other elements. In my experience, if other elements get added to the document after the paragraph that anchors the image, extra images will be added.
You can accomplish this by storing a reference to the paragraph element that will be used as the image holder, along with the blob from the image and any other information (height, width, etc.). Then, at the end of your script, just iterate over the stored references and add the images.
var imageParagraphs = [];
...
case DocumentApp.ElementType.PARAGRAPH:
var positionedImages = element.getPositionedImages();
if (positionedImages.length > 0){
var imageData = [];
// "for each...in" only runs on the legacy Rhino runtime; forEach also works on V8
positionedImages.forEach(function(image){
imageData.push({
height: image.getHeight(),
width: image.getWidth(),
leftOffset: image.getLeftOffset(),
topOffset: image.getTopOffset(),
layout: image.getLayout(),
blob: image.getBlob()
});
element.removePositionedImage(image.getId());
});
var p = merged_doc_body.appendParagraph(element.asParagraph());
imageParagraphs.push({element: p, imageData: imageData});
}
else
merged_doc_body.appendParagraph(element);
break;
...
imageParagraphs.forEach(function(p){
var imageData = p.imageData;
var imageParagraph = p.element;
imageData.forEach(function(image){
imageParagraph.addPositionedImage(image.blob)
.setHeight(image.height)
.setWidth(image.width)
.setLeftOffset(image.leftOffset)
.setTopOffset(image.topOffset)
.setLayout(image.layout);
});
});

preload jQuery building of JSON results

So I am currently building an activity feed/news feed of sorts using JSON and jQuery (and of course PHP). Everything works really well, especially fetching new results. The only issue is the first load, and I'm wondering if there is some way to preload the results to make it slicker?
jQuery code below:
for (var j = 0; j < jsonData.items.length; j++) {
var entryData = jsonData.items[j];
var entry = template.clone();
entry.removeClass("template");
entry.find(".message").text(entryData.statusid);
entry.find(".actName").text(entryData.name);
entry.find(".actContent").text(entryData.content);
//get the users ProfilePic
var profileImg = $("<img />");
profileImg.attr("src", "./img/" +entryData.profilePic);
profileImg.addClass("feed-user-img");
entry.find(".actProfilePic").append(profileImg);
//Get user-uploaded images.
entry.find(".actImage").text(entryData.imageKey);
if (entryData.imageKey != "")
{
var img = $("<img />"); // Create the image element
img.attr("src", "http://spartadev.s3.amazonaws.com/" + entryData.imageKey); // Set src to the s3 url plus the imageKey
entry.find(".actImage").append(img); // Append it to the element where it's supposed to be
}
spot.prepend(entry);
spot.find(".entry").first().hide().slideDown();
}

How to get a link to a part of document (header, paragraph, section...)

I'm creating a document dynamically with some heading structure
doc = DocumentApp.create("My Document");
doc.appendParagraph("Main").setHeading(DocumentApp.ParagraphHeading.HEADING1);
var section = doc.appendParagraph("Section 1");
section.setHeading(DocumentApp.ParagraphHeading.HEADING2);
I can open it online, insert a table of contents, and go directly to "Section 1" with a URL like:
https://docs.google.com/document/d/1aA...FQ/edit#heading=h.41bpnx2ug57j
The question is: how can I get a similar URL/id for "Section 1" in the code at run time and use it later as a link?
If I can't, is there any way to set something like an anchor/bookmark and get its URL?
Thanks!
When I started testing Google Apps in depth, I ran into the limited features for managing tables of contents. I came across the code you proposed and used it as a starting point to write my own function to format a table of contents:
- applying proper heading styles,
- numbering the different parts.
I hope this helps some of you improve your Google Docs templates:
/**
* Used to properly format the Table of Content object
*/
function formatToc() {
//Define variables
var level1 = 0;
var level2 = 0;
// Define custom paragraph styles.
var style1 = {};
style1[DocumentApp.Attribute.FONT_FAMILY] = DocumentApp.FontFamily.ARIAL;
style1[DocumentApp.Attribute.FONT_SIZE] = 18;
style1[DocumentApp.Attribute.BOLD] = true;
style1[DocumentApp.Attribute.FOREGROUND_COLOR] = '#ff0000';
var style2 = {};
style2[DocumentApp.Attribute.FONT_FAMILY] = DocumentApp.FontFamily.ARIAL;
style2[DocumentApp.Attribute.FONT_SIZE] = 14;
style2[DocumentApp.Attribute.BOLD] = true;
style2[DocumentApp.Attribute.FOREGROUND_COLOR] = '#007cb0';
// Search document's body for the table of contents (assuming there is one and only one).
var toc = doc.getBody().findElement(DocumentApp.ElementType.TABLE_OF_CONTENTS).getElement().asTableOfContents();
//Loop all the table of contents to apply new formating
for (var i = 0; i < toc.getNumChildren(); i++) {
//Search document's body for corresponding paragraph & retrieve heading
var searchText = toc.getChild(i).getText();
for (var j=0; j<doc.getBody().getNumChildren(); j++) {
var par = doc.getBody().getChild(j);
if (par.getType() == DocumentApp.ElementType.LIST_ITEM) {
var searchcomp = par.getText();
if (par.getText() == searchText) {
// Found the corresponding paragraph; read its heading type and nesting level.
var heading = par.getHeading();
var level = par.getNestingLevel();
}
}
}
//Insert Paragraph number before text
if (level==0) {
level1++;
level2=0;
toc.getChild(i).editAsText().insertText(0,level1+". ");
}
if (level==1) {
level2++;
toc.getChild(i).editAsText().insertText(0,level1+"."+level2+". ");
}
//Apply style corresponding to heading
if (heading == DocumentApp.ParagraphHeading.HEADING1) {
toc.getChild(i).setAttributes(style1);
}
if (heading == DocumentApp.ParagraphHeading.HEADING2) {
toc.getChild(i).setAttributes(style2);
}
}
}
Currently it is not possible to get a link to a document part (section, paragraph, etc.) without having a TOC. There is also no way to manage bookmarks from GAS. There is an issue on the issue tracker; you can star it to promote it.
There is a workaround using a TOC. The following code shows how to get the URLs from a TOC. It works only while the TOC exists; if you delete it, the links no longer work.
function testTOC() {
var doc = DocumentApp.openById('here is doc id');
var body = doc.getBody(); // child elements live on the Body, not on the Document
for (var i = 0; i < body.getNumChildren(); i++) {
var p = body.getChild(i);
if (p.getType() == DocumentApp.ElementType.TABLE_OF_CONTENTS) {
var toc = p.asTableOfContents();
for (var ti = 0; ti < toc.getNumChildren(); ti++) {
var itemToc = toc.getChild(ti).asParagraph().getChild(0).asText();
var itemText = itemToc.getText();
var itemUrl = itemToc.getLinkUrl();
}
break;
}
}
}
The function iterates over all document parts, finds the first TOC, and iterates over its items; on each pass the variables itemText and itemUrl hold a TOC item's text and URL. The URLs have the format #heading=h.uuj3ymgjhlie.
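Since the value is only a fragment, it has to be appended to the document's own address before it can be used as a link from outside the document. A minimal sketch, assuming the document URL follows the https://docs.google.com/document/d/<id>/edit pattern shown in the question (buildDocLink is a hypothetical helper name):

```javascript
// Hypothetical helper: joins a document id and a "#heading=..." (or
// "#bookmark=...") fragment into a full link to that spot in the document.
function buildDocLink(docId, fragment) {
  // Accept the fragment with or without its leading "#".
  var hash = fragment.charAt(0) === "#" ? fragment : "#" + fragment;
  return "https://docs.google.com/document/d/" + encodeURIComponent(docId) + "/edit" + hash;
}
```

For example, buildDocLink(doc.getId(), itemUrl) could be called inside the inner loop to turn each TOC entry into a shareable URL.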
Since the accepted answer was written, the ability to manage bookmarks from Google Apps Script code has been introduced. So it is possible to get a similar URL, though not exactly the same URL as in the example. You can insert a bookmark at the section heading and use that bookmark to link to it. For the purposes of the question, that seems to suffice. Here is some sample code (including slight modifications of the code from the question):
var doc = DocumentApp.getActiveDocument();
var body = doc.getBody();
body.appendParagraph("Main").setHeading(DocumentApp.ParagraphHeading.HEADING1);
var section = body.appendParagraph("Section 1");
section.setHeading(DocumentApp.ParagraphHeading.HEADING2);
// create and position bookmark
var sectionPos = doc.newPosition(section, 0);
var sectionBookmark = doc.addBookmark(sectionPos);
// add a link to the section heading
var paragraph = body.appendParagraph("");
paragraph.appendText("Now we add a ");
paragraph.appendText("link to the section heading").setLinkUrl('#bookmark=' + sectionBookmark.getId());
paragraph.appendText(".");
Is it imperative that the document is a native Google docs type (ie. application/vnd.google-apps.document)?
If you stored the document as text/html you would have much greater control over how you assemble the document and how you expose it, eg with anchors.