Copied Image from Google Document Paragraph inserted twice - google-apps-script

I'm trying to combine several Google Document inside one, but images inside the originals documents are inserted twice. One is at the right location, the other one is at the end of the newly created doc.
From what I saw, these images are detected as Paragraph by the script.
As you might see in my code below, I've been inspired by similar topics found here.
One of them suggested searching for child Element inside the Paragraph Element, but debugging showed that there is none. The concerned part of the doc will always be inserted with appendParagraph method as the script is not able to properly detect the image.
This is why the other relevant topic I found cannot work here : it suggested inserting the image before the paragraph itself but it cannot detects it.
Logging with both default Logger and console.log from Stackdriver will display an object typed as Paragraph.
The execution step by step did not show displayed any loop calling the appendParagraph method twice.
/* chosenParts contains list of Google Documents name */
function concatChosenFiles(chosenParts) {
var folders = DriveApp.getFoldersByName(folderName);
var folder = folders.hasNext() ? folders.next() : false;
var parentFolders = folder.getParents();
var parentFolder = parentFolders.next();
var file = null;
var gdocFile = null;
var fileContent = null;
var offerTitle = "New offer";
var gdocOffer = DocumentApp.create(offerTitle);
var gfileOffer = DriveApp.getFileById(gdocOffer.getId()); // transform Doc into File in order to choose its path with DriveApp
var offerHeader = gdocOffer.addHeader();
var offerContent = gdocOffer.getBody();
var header = null;
var headerSubPart = null;
var partBody= null;
var style = {};
parentFolder.addFile(gfileOffer); // place current offer inside generator folder
DriveApp.getRootFolder().removeFile(gfileOffer); // remove from home folder to avoid copy
for (var i = 0; i < chosenParts.length; i++) {
// First retrieve Document to combine
file = folder.getFilesByName(chosenParts[i]);
file = file.hasNext() ? file.next() : null;
gdocFile = DocumentApp.openById(file.getId());
header = gdocFile.getHeader();
// set Header from first doc
if ((0 === i) && (null !== header)) {
for (var j = 0; j < header.getNumChildren(); j++) {
headerSubPart = header.getChild(j).copy();
offerHeader.appendParagraph(headerSubPart); // Assume header content is always a paragraph
}
}
fileContent = gdocFile.getBody();
// Analyse file content and insert each part inside the offer with the right method
for (var j = 0; j < fileContent.getNumChildren(); j++) {
// There is a limit somewhere between 50-100 unsaved changed where the script
// wont continue until a batch is commited.
if (j % 50 == 0) {
gdocOffer.saveAndClose();
gdocOffer = DocumentApp.openById(gdocOffer.getId());
offerContent = gdocOffer.getBody();
}
partBody = fileContent.getChild(j).copy();
switch (partBody.getType()) {
case DocumentApp.ElementType.HORIZONTAL_RULE:
offerContent.appendHorizontalRule();
break;
case DocumentApp.ElementType.INLINE_IMAGE:
offerContent.appendImage(partBody);
break;
case DocumentApp.ElementType.LIST_ITEM:
offerContent.appendListItem(partBody);
break;
case DocumentApp.ElementType.PAGE_BREAK:
offerContent.appendPageBreak(partBody);
break;
case DocumentApp.ElementType.PARAGRAPH:
// Search for image inside parapraph type
if (partBody.asParagraph().getNumChildren() != 0 && partBody.asParagraph().getChild(0).getType() == DocumentApp.ElementType.INLINE_IMAGE)
{
offerContent.appendImage(partBody.asParagraph().getChild(0).asInlineImage().getBlob());
} else {
offerContent.appendParagraph(partBody.asParagraph());
}
break;
case DocumentApp.ElementType.TABLE:
offerContent.appendTable(partBody);
break;
default:
style[DocumentApp.Attribute.BOLD] = true;
offerContent.appendParagraph("Element type '" + partBody.getType() + "' from '" + file.getName() + "' could not be merged.").setAttributes(style);
console.log("Element type '" + partBody.getType() + "' from '" + file.getName() + "' could not be merged.");
Logger.log("Element type '" + partBody.getType() + "' from '" + file.getName() + "' could not be merged.");
}
}
// page break at the end of each part.
offerContent.appendPageBreak();
}
}
The problem occurs no matter how much files are combined, using one is enough to reproduce.
If there's only one image in the file (no spaces nor line feed around) and if the "appendPageBreak" is not used afterward, it will not occur. When some text resides next to the image, then the image is duplicated.
One last thing : Someone suggested that it is "due to natural inheritance of formatting", but I did not find how to prevent that.
Many thanks to everyone who'll be able to take a look at this :)
Edit : I adapted the paragraph section after #ziganotschka suggestions
It is very similar to this subject except its solution does not work here.
Here is the new piece of code :
case DocumentApp.ElementType.PARAGRAPH:
// Search for image inside parapraph type
if(partBody.asParagraph().getPositionedImages().length) {
// Assume only one image per paragraph (#TODO : to improve)
tmpImage = partBody.asParagraph().getPositionedImages()[0].getBlob().copyBlob();
// remove image from paragraph in order to add only the paragraph
partBody.asParagraph().removePositionedImage(partBody.asParagraph().getPositionedImages()[0].getId());
tmpParagraph = offerContent.appendParagraph(partBody.asParagraph());
// Then add the image afterward, without text
tmpParagraph.addPositionedImage(tmpImage);
} else if (partBody.asParagraph().getNumChildren() != 0 && partBody.asParagraph().getChild(0).getType() == DocumentApp.ElementType.INLINE_IMAGE) {
offerContent.appendImage(partBody.asParagraph().getChild(0).asInlineImage().getBlob());
} else {
offerContent.appendParagraph(partBody.asParagraph());
}
break;
Unfortunately, it stills duplicate the image. And if I comment the line inserting the image (tmpParagraph.addPositionedImage(tmpImage);) then no image is inserted at all.
Edit 2 : it is a known bug in Google App Script
https://issuetracker.google.com/issues/36763970
See comments for some workaround.

Your image is embedded as a 'Wrap text', rather than an Inline image
This is why you cannot retrieve it with getBody().getImages();
Instead, you can retrieve it with getBody().getParagraphs();[index].getPositionedImages()
I am not sure why exactly your image is copied twice, but as a workaround you can make a copy of the image and insert it as an inline image with
getBody().insertImage(childIndex, getBody().getParagraphs()[index].getPositionedImages()[index].copy());
And subsequently
getBody().getParagraphs()[index].getPositionedImages()[index].removeFromParent();
Obviously, you will need to loop through all the paragraphs and check for each one either it has embedded positioned images in order to retrieve them with the right index and proceed.

Add your PositionedImages at the end of your script after you add all your other elements. From my experience if other elements get added to the document after the the image positioning paragraph, extra images will be added.
You can accomplish this my storing a reference to the paragraph element that will be used as the image holder, and any information (height, width, etc) along with the blob from the image. And then at the end of your script just iterate over the stored references and add the images.
var imageParagraphs = [];
...
case DocumentApp.ElementType.PARAGRAPH:
var positionedImages = element.getPositionedImages();
if (positionedImages.length > 0){
var imageData = [];
for each(var image in positionedImages){
imageData.push({
height: image.getHeight(),
width: image.getWidth(),
leftOffset: image.getLeftOffset(),
topOffset: image.getTopOffset(),
layout: image.getLayout(),
blob: image.getBlob()
});
element.removePositionedImage(image.getId());
}
var p = merged_doc_body.appendParagraph(element.asParagraph());
imageParagraphs.push({element: p, imageData: imageData});
}
else
merged_doc_body.appendParagraph(element);
break;
...
for each(var p in imageParagraphs){
var imageData = p.imageData
var imageParagraph = p.element
for each(var image in imageData){
imageParagraph.addPositionedImage(image.blob)
.setHeight(image.height)
.setWidth(image.width)
.setLeftOffset(image.leftOffset)
.setTopOffset(image.topOffset)
.setLayout(image.layout);
}
}

Related

xpath in apps script?

I made a formula to extract some Wikipedia data in Google Seets which works fine. Here is the formula:
=regexreplace(join("",flatten(IMPORTXML(D2,".//p[preceding-sibling::h2[1][contains(., 'Geography')]]"))),"\[[^\]]+\]","")&char(10)&char(10)&iferror(regexreplace(join("",flatten(IMPORTXML(D2,".//p[preceding-sibling::h2[1][contains(., 'Education')]]"))),"\[[^\]]+\]",""))
Where D2 is a URL like https://en.wikipedia.org/wiki/Abbeville,_Alabama
This extracts some Geography and Education data from the Wikipedia page. Trouble is that importxml only runs a few times before it dies due to quota.
So I thought maybe better to use Apps Script where there are much higher limits on fetching and parsing. I could not see a good way however of using Xpath in Apps Script. Older posts on the web discuss using a deprecated service called Xml but it seems to no longer work. There is a Service called XmlService which looks like it may do the job but you can't just plug in an Xpath. It looks like a lot of sweating to get to the result. Any solutions out there where you can just plug in Xpath?
Here is an alternative solution I actually do in a case like this.
I have used XmlService but only for parsing the content, not for using Xpath. This makes use of the element tags and so far pretty consistent on my tests. Although, it might need tweaks when certain tags are in the result and you might have to include them into the exclusion condition.
Tested the code below in both links:
https://en.wikipedia.org/wiki/Abbeville,_Alabama#Geography
https://en.wikipedia.org/wiki/Montgomery,_Alabama#Education
My test shows that the formula above used did not return the proper output from the 2nd link while the code does. (Maybe because it was too long)
Code:
function getGeoAndEdu(path) {
var data = UrlFetchApp.fetch(path).getContentText();
// wikipedia is divided into sections, if output is cut, increase the number
var regex = /.{1,100000}/g;
var results = [];
// flag to determine if matches should be added
var foundFlag = false;
do {
m = regex.exec(data);
if (foundFlag) {
// if another header is found during generation of data, stop appending the matches
if (matchTag(m[0], "<h2>"))
foundFlag = false;
// exclude tables, sub-headers and divs containing image description
else if(matchTag(m[0], "<div") || matchTag(m[0], "<h3") ||
matchTag(m[0], "<td") || matchTag(m[0], "<th"))
continue;
else
results.push(m[0]);
}
// start capturing if either IDs are found
if (m != null && (matchTag(m[0], "id=\"Geography\"") ||
matchTag(m[0], "id=\"Education\""))) {
foundFlag = true;
}
} while (m);
var output = results.map(function (str) {
// clean tags for XmlService
str = str.replace(/<[^>]*>/g, '').trim();
decode = XmlService.parse('<d>' + str + '</d>')
// convert html entity codes (e.g.  ) to text
return decode.getRootElement().getText();
// filter blank results due to cleaning and empty sections
// separate data and remove citations before returning output
}).filter(result => result.trim().length > 1).join("\n").replace(/\[\d+\]/g, '');
return output;
}
// check if tag is found in string
function matchTag(string, tag) {
var regex = RegExp(tag);
return string.match(regex) && string.match(regex)[0] == tag;
}
Output:
Difference:
Formula ending output
Script ending output
Education ending in wikipedia
Note:
You still have quota when using UrlFetchApp but should be better than IMPORTXML's limit depending on the type of your account.
Reference:
Apps Script Quotas
Sorry I got very busy this week so I didn't reply. I took a look at your answer which seems to work fine, but it was quite code heavy. I wanted something I would understand so I coded my own solution. not that mine is any simpler. It's just my own code so it's easier for me to follow:
function getTextBetweenTags(html, paramatersInFirstTag, paramatersInLastTag) { //finds text values between 2 tags and removes internal tags to leave plain text.
//eg getTextBetweenTags(html,[['class="mw-headline"'],['id="Geography"']],[['class="wikitable mw-collapsible mw-made-collapsible"']])
// **Note: you may want to replace &#number; with ascII number
var openingTagPos = null;
var closingTagPos = null;
var previousChar = '';
var readingTag = false;
var newTag = '';
var tagEnd = false;
var regexFirstTagParams = [];
var regexLastTagParams = [];
//prepare regexes to test for parameters in opening and closing tags. put regexes in arrays so each condition can be tested separately
for (var i in paramatersInFirstTag) {
regexFirstTagParams.push(new RegExp(escapeRegex(paramatersInFirstTag[i][0])))
}
for (var i in paramatersInLastTag) {
regexLastTagParams.push(new RegExp(escapeRegex(paramatersInLastTag[i][0])))
}
var startTagIndex = null;
var endTagIndex = null;
var matches = 0;
for (var i = 0; i < html.length - 1; i++) {
var nextChar = html.substr(i, 1);
if (nextChar == '<' && previousChar != '\\') {
readingTag = true;
}
if (nextChar == '>' && previousChar != '\\') { //if end of tag found, check tag matches start or end tag
readingTag = false;
newTag += nextChar;
//test for firstTag
if (startTagIndex == null) {
var alltestsPass = true;
for (var j in regexFirstTagParams) {
if (!regexFirstTagParams[j].test(newTag)) alltestsPass = false;
}
if (alltestsPass) {
startTagIndex = i + 1;
//console.log('Start Tag',startTagIndex)
matches++;
}
}
//test for lastTag
else if (startTagIndex != null) {
var alltestsPass = true;
for (var j in regexLastTagParams) {
if (!regexLastTagParams[j].test(newTag)) alltestsPass = false;
}
if (alltestsPass) {
endTagIndex = i + 1;
matches++;
}
}
if(startTagIndex && endTagIndex) break;
newTag = '';
}
if (readingTag) newTag += nextChar;
previousChar = nextChar;
}
if (matches < 2) return 'No matches';
else return html.substring(startTagIndex, endTagIndex).replace(/<[^>]+>/g, '');
}
function escapeRegex(string) {
if (string == null) return string;
return string.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}
My function requires an array of attributes for the start tag and an array of attributes for the end tag. It gets any text in between and removes any tags found inbetween. One issue I also noticed was there were often special characters (eg  ) so they need to be replaced. I did that outside the scope of the function above.
The function could be easily improved to check the tag type (eg h2), but it wasn't necessary for the wikipedia case.
Here is a function where I called the above function. the html variable is just the result of UrlFetchApp.fetch('some wikipedia city url').getContextText();
function getWikiTexts(html) {
var geography = getTextBetweenTags(html, [['class="mw-headline"'], ['id="Geography']], [['class="mw-headline"']]);
var economy = getTextBetweenTags(html, 'span', [['class="mw-headline"'], ['id="Economy']], 'span', [['class="mw-headline"']])
var education = getTextBetweenTags(html, 'span', [['class="mw-headline"'], ['id="Education']], 'span', [['class="mw-headline"']])
var returnString = '';
if (geography != 'No matches' && !/Wikipedia/.test(geography)) returnString += geography + '\n';
if (economy != 'No matches' && !/Wikipedia/.test(economy)) returnString += economy + '\n';
if (education != 'No matches' && !/Wikipedia/.test(education)) returnString += education + '\n';
return returnString
}
Thanks for posting your answer.

Get an HTML element's line number

Was wondering if there is a better way to find an elements line number in the sources code.
This is what i've go so far:
// Get the item that the user is clicking on:
var focused = doc.getSelection().anchorNode;
if(focused.nodeType == 3){ // text node
focused = focused.parentNode;
}
// Get the entire page as a string
// NOTE: <!doctype> is not included in this!
var pageStr = doc.documentElement.outerHTML;
// Get the focused node's parent and
// find where it begins in the page.
var parentNodeStr = focused.outerHTML;
var parentNodeIndex = pageStr.indexOf(parentNodeStr);
// Find where the focused node begins in it's parent.
var focusedStr = focused.outerHTML;
var focusedIndex = parentNodeStr.indexOf(focusedStr);
// Now find where the focused node begins in the overall page.
var actualIndex = parentNodeIndex - focusedIndex;
// Grab the text above the focused node
// and count the number of lines.
var contentAbove = pageStr.substr(0, actualIndex);
var lineNumbers = contentAbove.split("\n").length;
console.log("lineCount", lineNumbers);
Here's a better solution I've come up with, hopefully this will help someone down the road that's using Ace or CodeMirror in conjunction with contenteditable:
Setup (for newbies)
We can obtain where the user is selecting using:
var sel = document.getSelection();
The beginning of the selection is called the "anchor"
and the end is called "focus". For example,
when you select a few words of text, there is
a beginning and an end of the selection.
var anchorPoint = elementPointInCode(sel.anchorNode, sel.anchorOffset);
var focusPoint = elementPointInCode(sel.focusNode, sel.focusOffset);
Since HTML contains tags and readable text, there is an
offset. For example:
<p>abcdefgh</p>
// ...^
The offset is the index within the text node string.
In our example the letter "d" is offset by 4 characters
from the entry point of the &gtp&lt tag.
But the offset is zero based, so the offset is actually 3.
We get the offset using:
var offset = sel.anchorOffset;
// -- or --
var offset = sel.focusOffset;
... depending on what we want, the beginning of end.
Function
function elementPointInCode(element, offset) {
// There may or may not be an offset.
offset = offset || 0;
var node = element;
// Process first node because it'll more-than-likely be a text node.
// And we don't want to go matching text against any of the node HTML.
// e.g. page 1
// where the text "page" sould match the "page" within the <a> attributes.
var strIndex;
var str;
// Bump text nodes up to parent
if(node.nodeType == 3) {
node = node.parentNode;
str = node.outerHTML;
strIndex = str.indexOf(">") + offset + 1;
} else {
strIndex = ;
}
// This will ultimately contain the HTML string of the root node.
var parentNodeStr = "";
while(node){
// Get the current node's HTML.
var str = node.outerHTML;
// Preemptively snag the parent
var parent = node.parentNode;
if(parent && str){
// The <html> root, we won't have a parent.
var outer = parent.outerHTML;
if(outer){
// Stash the node's HTML for post processing.
parentNodeStr = outer;
// Cumulatively count the offset's within each node
strIndex += parentNodeStr.indexOf( str );
}
}
// Work our way up to the root
node = parent;
}
// Chop the root HTML by our cumulative string index
var str = parentNodeStr.substr(0, strIndex);
var Astr = str.split("\n" );
return {
row : Astr.length,
col : Astr.pop().length
}
};

Update/Replace inline image on Google Document

I'm trying to set a feature to update images on a Google Document, the same way Lucidchart Add-on does on its "Updated inserted diagram" feature. For this, I'm current doing the following:
Creating a Named Range and storing its id on document properties, together with the data to generate the image, for later retrieve.
On update, call body.getNamedRangeById() and replace the element with the new generated image.
This works, but I have the following problems that does not happen with Lucidchart:
Every update, a blank line is added after the image.
If the user drag and drop the image inside document for reposition it, the Named Range disappears and I'm not able to retrieve it later.
If the user centralize the image, after update the image comes back to left position, even copying its attributes
Does anybody knows a good strategy to replace/update a referenced image on Google Docs, the same way Lucidchart add-on update feature works?
Thanks
NamedRanges indeed get lost when the range is moved, so they're not very good for your scenario. But there's no other way of identifying elements (which is a great misfeature of Google Docs).
In the case of an image you could use its LINK_URL to identify it, which seems to be what Lucidchart uses. It does not get in the way of the user, so it may be a good solution.
About getting a blank line and losing attributes when inserting an image, I imagine (since you haven't shared any code) you're inserting the image directly in the document body instead of a paragraph. Then a paragraph gets created automatically to wrap your image resulting in the blank line and lost of attributes.
Here's some code example:
function initialInsert() {
var data = Charts.newDataTable().addColumn(
Charts.ColumnType.STRING, 'Fruits').addColumn(
Charts.ColumnType.NUMBER, 'Amount').addRow(
['Apple',15]).addRow(
['Orange',6]).addRow(
['Banana',14]).build();
var chart = Charts.newPieChart().setDataTable(data).build();
var body = DocumentApp.getActiveDocument().getBody()
body.appendImage(chart).setLinkUrl('http://mychart');
//here we're inserting directly in the body, a wrapping paragraph element will be created for us
}
function updateImage() {
var data = Charts.newDataTable().addColumn(
Charts.ColumnType.STRING, 'Fruits').addColumn(
Charts.ColumnType.NUMBER, 'Amount').addRow(
['Apple',Math.floor(Math.random()*31)]).addRow( //random int between 0 and 30
['Orange',Math.floor(Math.random()*31)]).addRow(
['Banana',Math.floor(Math.random()*31)]).build();
var chart = Charts.newPieChart().setDataTable(data).build();
var img = getMyImg(DocumentApp.getActiveDocument().getBody(), 'http://mychart');
//let's insert on the current parent instead of the body
var parent = img.getParent(); //probably a paragraph, but does not really matter
parent.insertInlineImage(parent.getChildIndex(img)+1, chart).setLinkUrl('http://mychart');
img.removeFromParent();
}
function getMyImg(docBody, linkUrl) {
var imgs = docBody.getImages();
for( var i = 0; i < imgs.length; ++i )
if( imgs[i].getLinkUrl() === linkUrl )
return imgs[i];
return null;
}
About the link_url, you could of course do like Lucidchart does and link back to your site. So it's not just broken for the user.
Take a look at my add-on called PlantUML Gizmo.
Here's the code to the insert image function, which deals with replacing images if there's already one selected:
function insertImage(imageDataUrl, imageUrl) {
/*
* For debugging cursor info
*/
// var cursor = DocumentApp.getActiveDocument().getCursor();
// Logger.log(cursor.getElement().getParent().getType());
// throw "cursor info: " + cursor.getElement().getType() + " offset = " + cursor.getOffset() + " surrounding text = '" + cursor.getSurroundingText().getText() + "' parent's type = " +
// cursor.getElement().getParent().getType();
/*
* end debug
*/
var doc = DocumentApp.getActiveDocument();
var selection = doc.getSelection();
var replaced = false;
if (selection) {
var elements = selection.getSelectedElements();
// delete the selected image (to be replaced)
if (elements.length == 1 &&
elements[0].getElement().getType() ==
DocumentApp.ElementType.INLINE_IMAGE) {
var parentElement = elements[0].getElement().getParent(); // so we can re-insert cursor
elements[0].getElement().removeFromParent();
replaced = true;
// move cursor to just before deleted image
doc.setCursor(DocumentApp.getActiveDocument().newPosition(parentElement, 0));
} else {
throw "Please select only one image (image replacement) or nothing (image insertion)"
}
}
var cursor = doc.getCursor();
var blob;
if (imageDataUrl != "") {
blob = getBlobFromBase64(imageDataUrl);
} else {
blob = getBlobViaFetch(imageUrl);
}
var image = cursor.insertInlineImage(blob);
image.setLinkUrl(imageUrl);
// move the cursor to after the image
var position = doc.newPosition(cursor.getElement(), cursor.getOffset()+1);
doc.setCursor(position);
if (cursor.getElement().getType() == DocumentApp.ElementType.PARAGRAPH) {
Logger.log("Resizing");
// resize if wider than current page
var currentParagraph = DocumentApp.getActiveDocument().getCursor().getElement().asParagraph();
var originalImageWidth = image.getWidth(); // pixels
var documentWidthPoints = DocumentApp.getActiveDocument().getBody().getPageWidth() - DocumentApp.getActiveDocument().getBody().getMarginLeft() - DocumentApp.getActiveDocument().getBody().getMarginRight();
var documentWidth = documentWidthPoints * 96 / 72; // convert to pixels (a guess)
var paragraphWidthPoints = documentWidthPoints - currentParagraph.getIndentStart() - currentParagraph.getIndentEnd();
var paragraphWidth = paragraphWidthPoints * 96 / 72; // convert to pixels (a guess)
if (originalImageWidth > paragraphWidth) {
image.setWidth(paragraphWidth);
// scale proportionally
image.setHeight(image.getHeight() * image.getWidth() / originalImageWidth);
}
}
}

How to get a link to a part of document (header, paragraph, section...)

I'm creating a document dynamically with some heading structure
doc = DocumentApp.create("My Document");
doc.appendParagraph("Main").setHeading(DocumentApp.ParagraphHeading.HEADING1);
var section = doc.appendParagraph("Section 1");
section.setHeading(DocumentApp.ParagraphHeading.HEADING2);
I can open it online, insert Table of contents and can access directly to "Section 1" by url like:
https://docs.google.com/document/d/1aA...FQ/edit#heading=h.41bpnx2ug57j
The question is: How I can get similar url/id to the "Section 1" in the code at run time and use it later as a link?
If I can't - is there any way to set something like anchor/bookmark and get it's url?
Thanks!
Starting to test Google Apps in depth, I had issues with the limited features related to the management of table of contents. I bumped into the code you proposed and used it as a starting point to write my own function to format a table of content:
- applying proper headings styles,
- numeroting the different parts.
I hope this would help some of you improving Google Docs templates:
/**
* Used to properly format the Table of Content object
*/
function formatToc() {
//Define variables
var level1 = 0;
var level2 = 0;
// Define custom paragraph styles.
var style1 = {};
style1[DocumentApp.Attribute.FONT_FAMILY] = DocumentApp.FontFamily.ARIAL;
style1[DocumentApp.Attribute.FONT_SIZE] = 18;
style1[DocumentApp.Attribute.BOLD] = true;
style1[DocumentApp.Attribute.FOREGROUND_COLOR] = '#ff0000';
var style2 = {};
style2[DocumentApp.Attribute.FONT_FAMILY] = DocumentApp.FontFamily.ARIAL;
style2[DocumentApp.Attribute.FONT_SIZE] = 14;
style2[DocumentApp.Attribute.BOLD] = true;
style2[DocumentApp.Attribute.FOREGROUND_COLOR] = '#007cb0';
// Search document's body for the table of contents (assuming there is one and only one).
var toc = doc.getBody().findElement(DocumentApp.ElementType.TABLE_OF_CONTENTS).getElement().asTableOfContents();
//Loop all the table of contents to apply new formating
for (var i = 0; i < toc.getNumChildren(); i++) {
//Search document's body for corresponding paragraph & retrieve heading
var searchText = toc.getChild(i).getText();
for (var j=0; j<doc.getBody().getNumChildren(); j++) {
var par = doc.getBody().getChild(j);
if (par.getType() == DocumentApp.ElementType.LIST_ITEM) {
var searchcomp = par.getText();
if (par.getText() == searchText) {
// Found corresponding paragrapg and update headingtype.
var heading = par.getHeading();
var level = par.getNestingLevel();
}
}
}
//Insert Paragraph number before text
if (level==0) {
level1++;
level2=0;
toc.getChild(i).editAsText().insertText(0,level1+". ");
}
if (level==1) {
level2++;
toc.getChild(i).editAsText().insertText(0,level1+"."+level2+". ");
}
//Apply style corresponding to heading
if (heading == DocumentApp.ParagraphHeading.HEADING1) {
toc.getChild(i).setAttributes(style1);
}
if (heading == DocumentApp.ParagraphHeading.HEADING2) {
toc.getChild(i).setAttributes(style2);
}
}
}
Now it is impossible to get a document part (section, paragraph, etc) link without having a TOC. Also there is no way to manage bookmarks from a GAS. There is an issue on the issue tracker. You can star the issue to promote it.
There is a workaround by using a TOC. The following code shows how to get URL from a TOC. It works only if the TOC exists, if to delete it, the links do not work anymore.
function testTOC() {
var doc = DocumentApp.openById('here is doc id');
for (var i = 0; i < doc.getNumChildren(); i++) {
var p = doc.getChild(i);
if (p.getType() == DocumentApp.ElementType.TABLE_OF_CONTENTS) {
var toc = p.asTableOfContents();
for (var ti = 0; ti < toc.getNumChildren(); ti++) {
var itemToc = toc.getChild(ti).asParagraph().getChild(0).asText();
var itemText = itemToc.getText();
var itemUrl = itemToc.getLinkUrl();
}
break;
}
}
}
The function iterates all document parts, finds the 1st TOC, iterates it and the variables itemText and itemUrl contain a TOC item text and URL. The URLs have #heading=h.uuj3ymgjhlie format.
Since the time the accepted answer was written, the ability to manage bookmarks inside Google Apps Script code was introduced. So it is possible to get a similar URL, though not the same exact URL as in example. You can manually insert a bookmark at the section heading, and use that bookmark to link to the section heading. It seems that for the purposes of the question, it will suffice. Here is some sample code (including slight modifications of code from question):
var doc = DocumentApp.getActiveDocument();
var body = doc.getBody();
body.appendParagraph("Main").setHeading(DocumentApp.ParagraphHeading.HEADING1);
var section = body.appendParagraph("Section 1");
section.setHeading(DocumentApp.ParagraphHeading.HEADING2);
// create and position bookmark
var sectionPos = doc.newPosition(section, 0);
var sectionBookmark = doc.addBookmark(sectionPos);
// add a link to the section heading
var paragraph = body.appendParagraph("");
paragraph.appendText("Now we add a ");
paragraph.appendText("link to the section heading").setLinkUrl('#bookmark=' + sectionBookmark.getId());
paragraph.appendText(".");
Is it imperative that the document is a native Google docs type (ie. application/vnd.google-apps.document)?
If you stored the document as text/html you would have much greater control over how you assemble the document and how you expose it, eg with anchors.

Insert a Link Using CSS

I'm hand-maintaining an HTML document, and I'm looking for a way to automatically insert a link around text in a table. Let me illustrate:
<table><tr><td class="case">123456</td></tr></table>
I would like to automatically make every text in a TD with class "case" a link to that case in our bug tracking system (which, incidentally, is FogBugz).
So I'd like that "123456" to be changed to a link of this form:
123456
Is that possible? I've played with the :before and :after pseudo-elements, but there doesn't seem to be a way to repeat the case number.
Not in a manner that will work across browsers. You could, however, do that with some relatively trivial Javascript..
function makeCasesClickable(){
var cells = document.getElementsByTagName('td')
for (var i = 0, cell; cell = cells[i]; i++){
if (cell.className != 'case') continue
var caseId = cell.innerHTML
cell.innerHTML = ''
var link = document.createElement('a')
link.href = 'http://bugs.example.com/fogbugz/default.php?' + caseId
link.appendChild(document.createTextNode(caseId))
cell.appendChild(link)
}
}
You can apply it with something like onload = makeCasesClickable, or simply include it right at the end of the page.
here is a jQuery solution specific to your HTML posted:
$('.case').each(function() {
var link = $(this).html();
$(this).contents().wrap('');
});
in essence, over each .case element, will grab the contents of the element, and throw them into a link wrapped around it.
Not possible with CSS, plus that's not what CSS is for any way. Client-side Javascript or Server-side (insert language of choice) is the way to go.
I don't think it's possible with CSS. CSS is only supposed to affect the looks and layout of your content.
This seems like a job for a PHP script (or some other language). You didn't give enough information for me to know the best way to do it, but maybe something like this:
function case_link($id) {
return '' . $id . '';
}
Then later in your document:
<table><tr><td class="case"><?php echo case_link('123456'); ?></td></tr></table>
And if you want an .html file, just run the script from the command line and redirect the output to an .html file.
You could have something like this (using Javascript). Inside <head>, have
<script type="text/javascript" language="javascript">
function getElementsByClass (className) {
var all = document.all ? document.all :
document.getElementsByTagName('*');
var elements = new Array();
for (var i = 0; i < all.length; i++)
if (all[i].className == className)
elements[elements.length] = all[i];
return elements;
}
function makeLinks(className, url) {
nodes = getElementsByClass(className);
for(var i = 0; i < nodes.length; i++) {
node = nodes[i];
text = node.innerHTML
node.innerHTML = '' + text + '';
}
}
</script>
And then at the end of <body>
<script type="text/javascript" language="javascript">
makeLinks("case", "http://bugs.example.com/fogbugz/default.php?");
</script>
I've tested it, and it works fine.
I know this is an old question, but I stumbled upon this post looking for a solution for creating hyperlinks using CSS and ended up making my own, could be of interest for someone stumbling across this question like I did:
Here's a php function called 'linker();'that enables a fake CSS attribute
connect: 'url.com';
for an #id defined item.
just let the php call this on every item of HTML you deem link worthy.
the inputs are the .css file as a string, using:
$style_cont = file_get_contents($style_path);
and the #id of the corresponding item. Heres the whole thing:
function linker($style_cont, $id_html){
if (strpos($style_cont,'connect:') !== false) {
$url;
$id_final;
$id_outer = '#'.$id_html;
$id_loc = strpos($style_cont,$id_outer);
$connect_loc = strpos($style_cont,'connect:', $id_loc);
$next_single_quote = stripos($style_cont,"'", $connect_loc);
$next_double_quote = stripos($style_cont,'"', $connect_loc);
if($connect_loc < $next_single_quote)
{
$link_start = $next_single_quote +1;
$last_single_quote = stripos($style_cont, "'", $link_start);
$link_end = $last_single_quote;
$link_size = $link_end - $link_start;
$url = substr($style_cont, $link_start, $link_size);
}
else
{
$link_start = $next_double_quote +1;
$last_double_quote = stripos($style_cont, '"', $link_start);
$link_end = $last_double_quote;
$link_size = $link_end - $link_start;
$url = substr($style_cont, $link_start, $link_size); //link!
}
$connect_loc_rev = (strlen($style_cont) - $connect_loc) * -1;
$id_start = strrpos($style_cont, '#', $connect_loc_rev);
$id_end = strpos($style_cont,'{', $id_start);
$id_size = $id_end - $id_start;
$id_raw = substr($style_cont, $id_start, $id_size);
$id_clean = rtrim($id_raw); //id!
if (strpos($url,'http://') !== false)
{
$url_clean = $url;
}
else
{
$url_clean = 'http://'.$url;
};
if($id_clean[0] == '#')
{
$id_final = $id_clean;
if($id_outer == $id_final)
{
echo '<a href="';
echo $url_clean;
echo '" target="_blank">';
};
};
};
};
this could probably be improved/shortened using commands like .wrap() or getelementbyID()
because it only generates the <a href='blah'> portion, but seeing as </a> disappears anyway without a opening clause it still works if you just add them everywhere :D