How to read a text file using JavaScript and display it in columns in HTML? - html

This is the code that reads the text file using JavaScript and displays it in HTML. But I need it to display the contents in a table.
How can I display it in a table?
(This is my first question, so I hope I get a solution.)
`
<html>
<head>
<title>Read File (via User Input selection)</title>
<script type="text/javascript">
var reader; //GLOBAL File Reader object for demo purpose only
/**
* Check for the various File API support.
*/
function checkFileAPI() {
if (window.File && window.FileReader && window.FileList && window.Blob) {
reader = new FileReader();
return true;
} else {
alert('The File APIs are not fully supported by your browser. Fallback required.');
return false;
}
}
/**
* read text input
*/
function readText(filePath) {
var output = ""; //placeholder for text output
if(filePath.files && filePath.files[0]) {
reader.onload = function (e) {
output = e.target.result;
displayContents(output);
};//end onload()
reader.readAsText(filePath.files[0]);
}//end if html5 filelist support
else if(ActiveXObject && filePath) { //fallback to IE 6-8 support via ActiveX
try {
reader = new ActiveXObject("Scripting.FileSystemObject");
var file = reader.OpenTextFile(filePath, 1); //ActiveX File Object
output = file.ReadAll(); //text contents of file
file.Close(); //close file "input stream"
displayContents(output);
} catch (e) {
if (e.number == -2146827859) {
alert('Unable to access local files due to browser security settings. ' +
'To overcome this, go to Tools->Internet Options->Security->Custom Level. ' +
'Find the setting for "Initialize and script ActiveX controls not marked as safe" and change it to "Enable" or "Prompt"');
}
}
}
else { //this is where you could fallback to Java Applet, Flash or similar
return false;
}
return true;
}
/**
* display content using a basic HTML replacement
*/
function displayContents(txt) {
var el = document.getElementById('main');
el.innerHTML = txt; //display output in DOM
}
</script>
</head>
<body onload="checkFileAPI();">
<div id="container">
<input type="file" onchange='readText(this)' />
<br/>
<hr/>
<h3>Contents of the Text file:</h3>
<div id="main">
...
</div>
</div>
</body>
</html>`
Please help me in this

You could format the text file as SSV (semicolon-separated values; TSV or CSV would work as well). Then, instead of ReadAll(), I'd do something like this:
var file = reader.OpenTextFile(filePath, 1),
data = [], n;
while (!file.AtEndOfStream) {
data.push(file.ReadLine().split(';')); // or use some other "cell-separator"
}
Then the rest is a lot simpler and faster, if you've an empty table element in your HTML:
<table id="table"></table>
Now just create rows and cells dynamically based on the data array:
var table = document.getElementById('table'),
len = data.length,
r, row, c, cell;
for (r = 0; r < len; r++) {
row = table.insertRow(-1);
for (c = 0; c < data[r].length; c++) {
cell = row.insertCell(-1);
cell.innerHTML = data[r][c];
}
}
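The snippet above covers the ActiveX branch. In browsers that take the FileReader branch, the same idea can be applied to the string that reader.onload delivers. Here is a minimal sketch, assuming a semicolon-separated file and a current browser; parseDelimited, escapeHtml and rowsToTableHtml are hypothetical helper names, not part of the original code.

```javascript
// Hypothetical helpers, not part of the original question or answer.

// Split raw file text into rows of cells. Blank lines are skipped.
function parseDelimited(text, separator) {
  return text
    .split(/\r?\n/)
    .filter(function (line) { return line.trim() !== ''; })
    .map(function (line) { return line.split(separator); });
}

// Escape characters that would otherwise be interpreted as markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build the row markup for a <table> from the parsed rows.
function rowsToTableHtml(rows) {
  return rows.map(function (cells) {
    return '<tr>' + cells.map(function (cell) {
      return '<td>' + escapeHtml(cell) + '</td>';
    }).join('') + '</tr>';
  }).join('');
}

// In the page, displayContents(txt) could then become something like:
//   document.getElementById('table').innerHTML =
//       rowsToTableHtml(parseDelimited(txt, ';'));
```

Escaping each cell keeps a stray < or & in the file from being treated as HTML, which plain innerHTML assignment would not do.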

Related

Get the first hyperlink and its text value

I hope everyone is in good health.
Recently, I have been working with Google Docs hyperlinks using Apps Script and learning along the way. I was trying to get all the hyperlinks and edit them, and for that I found some amazing code in this post. I have read the code multiple times and now have a good understanding of how it works.
My confusion
My confusion is about the recursion in this code. I am familiar with the concept of recursive functions, but when I try to modify the code to get only the first hyperlink in the document, I cannot work out how to do that without breaking the recursive function.
Here is the code that I am trying:
/**
* Get an array of all LinkUrls in the document. The function is
* recursive, and if no element is provided, it will default to
* the active document's Body element.
*
* @param {Element} element The document element to operate on.
* .
* @returns {Array} Array of objects, vis
* {element,
* startOffset,
* endOffsetInclusive,
* url}
*/
function getAllLinks(element) {
var links = [];
element = element || DocumentApp.getActiveDocument().getBody();
if (element.getType() === DocumentApp.ElementType.TEXT) {
var textObj = element.editAsText();
var text = element.getText();
var inUrl = false;
for (var ch=0; ch < text.length; ch++) {
var url = textObj.getLinkUrl(ch);
if (url != null) {
if (!inUrl) {
// We are now!
inUrl = true;
var curUrl = {};
curUrl.element = element;
curUrl.url = String( url ); // grab a copy
curUrl.startOffset = ch;
}
else {
curUrl.endOffsetInclusive = ch;
}
}
else {
if (inUrl) {
// Not any more, we're not.
inUrl = false;
links.push(curUrl); // add to links
curUrl = {};
}
}
}
if (inUrl) {
// in case the link ends on the same char that the element does
links.push(curUrl);
}
}
else {
var numChildren = element.getNumChildren();
for (var i=0; i<numChildren; i++) {
links = links.concat(getAllLinks(element.getChild(i)));
}
}
return links;
}
I tried adding
if (links.length > 0){
return links;
}
but it does not stop the function: because the function is recursive, control returns to the previous calls and they continue running.
Here is the test document along with its script that I am working on.
https://docs.google.com/document/d/1eRvnR2NCdsO94C5nqly4nRXCttNziGhwgR99jElcJ_I/edit?usp=sharing
I hope you will understand what I am trying to convey, Thanks for giving a look at my post. Stay happy :D
I believe your goal is as follows.
You want to retrieve the first link and its text from the shared Document using Google Apps Script.
You want to stop the recursive loop once the first link is retrieved.
Modification points:
I tried adding
if (links.length > 0){
return links;
}
but it does not stop the function: because the function is recursive, control returns to the previous calls and they continue running.
Unfortunately, I couldn't tell where in your script you put this snippet. I think the loop needs to stop once links has a value, and the link text also needs to be retrieved. So how about the following modification? I modified 3 parts of your script.
Modified script:
function getAllLinks(element) {
var links = [];
element = element || DocumentApp.getActiveDocument().getBody();
if (element.getType() === DocumentApp.ElementType.TEXT) {
var textObj = element.editAsText();
var text = element.getText();
var inUrl = false;
for (var ch=0; ch < text.length; ch++) {
if (links.length > 0) break; // <--- Added
var url = textObj.getLinkUrl(ch);
if (url != null) {
if (!inUrl) {
// We are now!
inUrl = true;
var curUrl = {};
curUrl.element = element;
curUrl.url = String( url ); // grab a copy
curUrl.startOffset = ch;
}
else {
curUrl.endOffsetInclusive = ch;
}
}
else {
if (inUrl) {
// Not any more, we're not.
inUrl = false;
curUrl.text = text.slice(curUrl.startOffset, curUrl.endOffsetInclusive + 1); // <--- Added
links.push(curUrl); // add to links
curUrl = {};
}
}
}
if (inUrl) {
// in case the link ends on the same char that the element does
links.push(curUrl);
}
}
else {
var numChildren = element.getNumChildren();
for (var i=0; i<numChildren; i++) {
if (links.length > 0) { // <--- Added or if (links.length > 0) break;
return links;
}
links = links.concat(getAllLinks(element.getChild(i)));
}
}
return links;
}
In this case, I think that if (links.length > 0) {return links;} can be modified to if (links.length > 0) break;.
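The general pattern behind this fix can be shown without the Docs API. The sketch below uses a hypothetical plain-object tree (nodes with a link or with children) as a stand-in for the document elements: each call returns its hit to the caller, and the caller stops looping as soon as any child reports one.

```javascript
// Hypothetical stand-in for a document tree: each node has either
// a `link` (a leaf that carries a URL) or `children`.
function findFirstLink(node) {
  if (node.link) {
    return node.link;            // found: this value travels up the call chain
  }
  var children = node.children || [];
  for (var i = 0; i < children.length; i++) {
    var found = findFirstLink(children[i]);
    if (found) {
      return found;              // stop looping as soon as a child reports a hit
    }
  }
  return null;                   // nothing in this subtree
}

var tree = {
  children: [
    { children: [{ text: 'plain' }] },
    { children: [{ link: 'https://example.com/a' }, { link: 'https://example.com/b' }] },
    { link: 'https://example.com/c' }
  ]
};
console.log(findFirstLink(tree)); // prints "https://example.com/a"
```

A return inside one recursive call only ends that call, so the check has to be repeated in the loop that makes the recursive calls. That is why the added check in the Text branch alone is not enough.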
Note:
By the way, when the Google Docs API is used, both the links and their text can also be retrieved with a simple script, as follows. To use it, please enable the Google Docs API under Advanced Google services.
function myFunction() {
const doc = DocumentApp.getActiveDocument();
const res = Docs.Documents.get(doc.getId()).body.content.reduce((ar, {paragraph}) => {
if (paragraph && paragraph.elements) {
paragraph.elements.forEach(({textRun}) => {
if (textRun && textRun.textStyle && textRun.textStyle.link) {
ar.push({text: textRun.content, url: textRun.textStyle.link.url});
}
});
}
return ar;
}, []);
console.log(res) // You can retrieve the 1st link and its text with console.log(res[0]).
}
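If only the first link is wanted, the reduce over the whole body can be replaced by loops that return early. This is a sketch operating on a plain object with the same shape the script above reads (body.content, paragraph.elements, textRun.textStyle.link.url); firstLinkFromContent and the sample data are hypothetical, and in Apps Script the input would be Docs.Documents.get(doc.getId()).body.content.

```javascript
// `content` is assumed to have the shape of `body.content` as read by the
// script above. Returns the first linked text run, or null if there is none.
function firstLinkFromContent(content) {
  for (var i = 0; i < content.length; i++) {
    var paragraph = content[i].paragraph;
    if (!paragraph || !paragraph.elements) continue;   // e.g. tables, section breaks
    for (var j = 0; j < paragraph.elements.length; j++) {
      var textRun = paragraph.elements[j].textRun;
      if (textRun && textRun.textStyle && textRun.textStyle.link) {
        return { text: textRun.content, url: textRun.textStyle.link.url };
      }
    }
  }
  return null;
}

// Made-up sample data for illustration only.
var sampleContent = [
  { sectionBreak: {} },
  { paragraph: { elements: [{ textRun: { content: 'No link here\n', textStyle: {} } }] } },
  { paragraph: { elements: [
    { textRun: { content: 'See ', textStyle: {} } },
    { textRun: { content: 'the docs', textStyle: { link: { url: 'https://example.com/docs' } } } }
  ] } }
];
console.log(firstLinkFromContent(sampleContent));
```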

Google app script - getting all hyperlinks from document [duplicate]

Given a "normal document" in Google Docs/Drive (e.g. paragraphs, lists, tables) which contains external links scattered throughout the content, how do you compile a list of links present using Google Apps Script?
Specifically, I want to update all broken links in the document by searching for oldText in each url and replace it with newText in each url, but not the text.
I don't think the replacing text section of the Dev Documentation is what I need -- do I need to scan every element of the doc? Can I just editAsText and use an html regex? Examples would be appreciated.
This is only mostly painful! Code is available as part of a gist.
Yeah, I can't spell.
getAllLinks
Here's a utility function that scans the document for all LinkUrls, returning them in an array.
/**
* Get an array of all LinkUrls in the document. The function is
* recursive, and if no element is provided, it will default to
* the active document's Body element.
*
* @param {Element} element The document element to operate on.
* .
* @returns {Array} Array of objects, vis
* {element,
* startOffset,
* endOffsetInclusive,
* url}
*/
function getAllLinks(element) {
var links = [];
element = element || DocumentApp.getActiveDocument().getBody();
if (element.getType() === DocumentApp.ElementType.TEXT) {
var textObj = element.editAsText();
var text = element.getText();
var inUrl = false;
for (var ch=0; ch < text.length; ch++) {
var url = textObj.getLinkUrl(ch);
if (url != null) {
if (!inUrl) {
// We are now!
inUrl = true;
var curUrl = {};
curUrl.element = element;
curUrl.url = String( url ); // grab a copy
curUrl.startOffset = ch;
}
else {
curUrl.endOffsetInclusive = ch;
}
}
else {
if (inUrl) {
// Not any more, we're not.
inUrl = false;
links.push(curUrl); // add to links
curUrl = {};
}
}
}
if (inUrl) {
// in case the link ends on the same char that the element does
links.push(curUrl);
}
}
else {
var numChildren = element.getNumChildren();
for (var i=0; i<numChildren; i++) {
links = links.concat(getAllLinks(element.getChild(i)));
}
}
return links;
}
findAndReplaceLinks
This utility builds on getAllLinks to do a find & replace function.
/**
* Replace all or part of UrlLinks in the document.
*
* @param {String} searchPattern the regex pattern to search for
* @param {String} replacement the text to use as replacement
*
* @returns {Number} number of Urls changed
*/
function findAndReplaceLinks(searchPattern,replacement) {
var links = getAllLinks();
var numChanged = 0;
for (var l=0; l<links.length; l++) {
var link = links[l];
if (link.url.match(searchPattern)) {
// This link needs to be changed
var newUrl = link.url.replace(searchPattern,replacement);
link.element.setLinkUrl(link.startOffset, link.endOffsetInclusive, newUrl);
numChanged++;
}
}
return numChanged;
}
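One detail worth checking when the pattern arrives as a string (as it does from the form below): String.prototype.match converts a string argument to a regular expression, while String.prototype.replace treats a string argument literally and replaces only the first occurrence. The sketch below builds one RegExp up front so that matching and replacing agree. planUrlChanges is a hypothetical helper that works on plain URL strings, so it can be tried outside Apps Script.

```javascript
// Hypothetical helper: decide which URLs change and what they become.
// `searchPattern` may be a RegExp or a string holding a regex source.
function planUrlChanges(urls, searchPattern, replacement) {
  var regex = (searchPattern instanceof RegExp)
    ? searchPattern
    : new RegExp(searchPattern, 'g');
  var changes = [];
  urls.forEach(function (url, index) {
    // search() ignores lastIndex, so a global regex is safe to reuse here
    if (url.search(regex) !== -1) {
      changes.push({
        index: index,
        oldUrl: url,
        newUrl: url.replace(regex, replacement)
      });
    }
  });
  return changes;
}

var plannedChanges = planUrlChanges(
  ['http://old.example.com/a', 'https://other.example.org/', 'http://old.example.com/b'],
  'old\\.example\\.com',
  'new.example.com'
);
console.log(plannedChanges);
```

Each entry's index could then be used to look up the matching object from getAllLinks() and call setLinkUrl with newUrl.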
Demo UI
To demonstrate the use of these utilities, here are a couple of UI extensions:
function onOpen() {
// Add a menu with some items, some separators, and a sub-menu.
DocumentApp.getUi().createMenu('Utils')
.addItem('List Links', 'sidebarLinks')
.addItem('Replace Link Text', 'searchReplaceLinks')
.addToUi();
}
function searchReplaceLinks() {
var ui = DocumentApp.getUi();
var app = UiApp.createApplication()
.setWidth(250)
.setHeight(100)
.setTitle('Change Url text');
var form = app.createFormPanel();
var flow = app.createFlowPanel();
flow.add(app.createLabel("Find: "));
flow.add(app.createTextBox().setName("searchPattern"));
flow.add(app.createLabel("Replace: "));
flow.add(app.createTextBox().setName("replacement"));
var handler = app.createServerHandler('myClickHandler');
flow.add(app.createSubmitButton("Submit").addClickHandler(handler));
form.add(flow);
app.add(form);
ui.showDialog(app);
}
// ClickHandler to close dialog
function myClickHandler(e) {
var app = UiApp.getActiveApplication();
app.close();
return app;
}
function doPost(e) {
var numChanged = findAndReplaceLinks(e.parameter.searchPattern,e.parameter.replacement);
var ui = DocumentApp.getUi();
var app = UiApp.createApplication();
sidebarLinks(); // Update list
var result = DocumentApp.getUi().alert(
'Results',
"Changed "+numChanged+" urls.",
DocumentApp.getUi().ButtonSet.OK);
}
/**
* Shows a custom HTML user interface in a sidebar in the Google Docs editor.
*/
function sidebarLinks() {
var links = getAllLinks();
var sidebar = HtmlService
.createHtmlOutput()
.setTitle('URL Links')
.setWidth(350 /* pixels */);
// Display list of links, url only.
for (var l=0; l<links.length; l++) {
var link = links[l];
sidebar.append('<p>' + link.url + '</p>');
}
DocumentApp.getUi().showSidebar(sidebar);
}
I offer another, shorter answer to your first question, about iterating through all links in a document's body. This instructive code returns a flat array of links in the current document's body, where each link is represented by an object with entries pointing to the text element (textEl), the paragraph or list item element in which it's contained (paragraph), the offset index in the text where the link appears (startOffset) and the URL itself (url). Hopefully, you'll find it easy to adapt to your own needs.
It uses the getTextAttributeIndices() method rather than iterating over every character of the text, and is thus expected to perform much more quickly than previously written answers.
EDIT: Since originally posting this answer, I have modified the function a couple of times. It now also (1) includes the endOffsetInclusive property for each link (note that it can be null for links that extend to the end of the text element; in this case one can use link.textEl.getText().length-1 instead); (2) finds links in all sections of the document, not only the body; (3) includes the section and isFirstPageSection properties to indicate where the link is located; and (4) accepts the argument mergeAdjacent, which, when set to true, returns only a single link entry for a continuous stretch of text linked to the same URL (which would otherwise be considered separate if, for instance, part of the text is styled differently from another part).
For the purpose of including links under all sections, a new utility function, iterateSections(), was introduced.
/**
* Returns a flat array of links which appear in the active document's body.
* Each link is represented by a simple Javascript object with the following
* keys:
* - "section": {ContainerElement} the document section in which the link is
* found.
* - "isFirstPageSection": {Boolean} whether the given section is a first-page
* header/footer section.
* - "paragraph": {ContainerElement} contains a reference to the Paragraph
* or ListItem element in which the link is found.
* - "textEl": the Text element in which the link is found.
* - "startOffset": {Number} the position (offset) at which the link text begins.
* - "endOffsetInclusive": the position of the last character of the link
* text, or null if the link extends to the end of the text element.
* - "url": the URL of the link.
*
* @param {boolean} mergeAdjacent Whether consecutive links which carry
* different attributes (for any reason) should be returned as a single
* entry.
*
* @returns {Array} the aforementioned flat array of links.
*/
function getAllLinks(mergeAdjacent) {
var links = [];
var doc = DocumentApp.getActiveDocument();
iterateSections(doc, function(section, sectionIndex, isFirstPageSection) {
if (!("getParagraphs" in section)) {
// as we're using some undocumented API, adding this to avoid cryptic
// messages upon possible API changes.
throw new Error("An API change has caused this script to stop " +
"working.\n" +
"Section #" + sectionIndex + " of type " +
section.getType() + " has no .getParagraphs() method. " +
"Stopping script.");
}
section.getParagraphs().forEach(function(par) {
// skip empty paragraphs
if (par.getNumChildren() == 0) {
return;
}
// go over all text elements in paragraph / list-item
for (var el=par.getChild(0); el!=null; el=el.getNextSibling()) {
if (el.getType() != DocumentApp.ElementType.TEXT) {
continue;
}
// go over all styling segments in text element
var attributeIndices = el.getTextAttributeIndices();
var lastLink = null;
attributeIndices.forEach(function(startOffset, i, attributeIndices) {
var url = el.getLinkUrl(startOffset);
if (url != null) {
// we hit a link
var endOffsetInclusive = (i+1 < attributeIndices.length?
attributeIndices[i+1]-1 : null);
// check if this and the last found link are continuous
if (mergeAdjacent && lastLink != null && lastLink.url == url &&
lastLink.endOffsetInclusive == startOffset - 1) {
// this and the previous style segment are continuous
lastLink.endOffsetInclusive = endOffsetInclusive;
return;
}
lastLink = {
"section": section,
"isFirstPageSection": isFirstPageSection,
"paragraph": par,
"textEl": el,
"startOffset": startOffset,
"endOffsetInclusive": endOffsetInclusive,
"url": url
};
links.push(lastLink);
}
});
}
});
});
return links;
}
/**
* Calls the given function for each section of the document (body, header,
* etc.). Sections are children of the DocumentElement object.
*
* @param {Document} doc The Document object (such as the one obtained via
* a call to DocumentApp.getActiveDocument()) with the sections to iterate
* over.
* @param {Function} func A callback function which will be called, for each
* section, with the following arguments (in order):
* - {ContainerElement} section - the section element
* - {Number} sectionIndex - the child index of the section, such that
* doc.getBody().getParent().getChild(sectionIndex) == section.
* - {Boolean} isFirstPageSection - whether the section is a first-page
* header/footer section.
*/
function iterateSections(doc, func) {
// get the DocumentElement interface to iterate over all sections
// this bit is undocumented API
var docEl = doc.getBody().getParent();
var regularHeaderSectionIndex = (doc.getHeader() == null? -1 :
docEl.getChildIndex(doc.getHeader()));
var regularFooterSectionIndex = (doc.getFooter() == null? -1 :
docEl.getChildIndex(doc.getFooter()));
for (var i=0; i<docEl.getNumChildren(); ++i) {
var section = docEl.getChild(i);
var sectionType = section.getType();
var uniqueSectionName;
var isFirstPageSection = (
i != regularHeaderSectionIndex &&
i != regularFooterSectionIndex &&
(sectionType == DocumentApp.ElementType.HEADER_SECTION ||
sectionType == DocumentApp.ElementType.FOOTER_SECTION));
func(section, i, isFirstPageSection);
}
}
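The mergeAdjacent behaviour can be easier to follow in isolation. The sketch below applies the same rule (same URL, and the previous segment ends exactly one character before the next one starts) to hypothetical plain objects instead of Text elements. mergeAdjacentLinks is an illustrative name, not part of the answer's code.

```javascript
// Each segment is a hypothetical plain object
//   { startOffset, endOffsetInclusive, url }
// and the list is assumed to be sorted by startOffset within one text element.
function mergeAdjacentLinks(segments) {
  var merged = [];
  segments.forEach(function (segment) {
    var last = merged[merged.length - 1];
    if (last && last.url === segment.url &&
        last.endOffsetInclusive === segment.startOffset - 1) {
      // continuous stretch with the same URL: extend the previous entry
      last.endOffsetInclusive = segment.endOffsetInclusive;
    } else {
      // copy so the caller's objects are left untouched
      merged.push({
        startOffset: segment.startOffset,
        endOffsetInclusive: segment.endOffsetInclusive,
        url: segment.url
      });
    }
  });
  return merged;
}

var mergedLinks = mergeAdjacentLinks([
  { startOffset: 0,  endOffsetInclusive: 4,  url: 'https://example.com/a' },
  { startOffset: 5,  endOffsetInclusive: 9,  url: 'https://example.com/a' },
  { startOffset: 10, endOffsetInclusive: 12, url: 'https://example.com/b' },
  { startOffset: 20, endOffsetInclusive: 25, url: 'https://example.com/b' }
]);
console.log(mergedLinks.length); // prints 3
```

The first two segments merge because they touch and share a URL; the last two share a URL but are separated by unlinked text, so they stay separate.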
I was playing around and incorporated @Mogsdad's answer -- here's the really complicated version:
var _ = Underscorejs.load(); // loaded via http://googleappsdeveloper.blogspot.com/2012/11/using-open-source-libraries-in-apps.html, rolled my own
var ui = DocumentApp.getUi();
// #region --------------------- Utilities -----------------------------
var gDocsHelper = (function(P, un) {
// heavily based on answer https://stackoverflow.com/a/18731628/1037948
var updatedLinkText = function(link, offset) {
return function() { return 'Text: ' + link.getText().substring(offset, offset + 100) + ((link.getText().length-offset) > 100 ? '...' : ''); }
}
P.updateLink = function updateLink(link, oldText, newText, start, end) {
var oldLink = link.getLinkUrl(start);
if(0 > oldLink.indexOf(oldText)) return false;
var newLink = oldLink.replace(new RegExp(oldText, 'g'), newText);
link.setLinkUrl(start || 0, (end || oldLink.length), newLink);
log(true, "Updating Link: ", oldLink, newLink, start, end, updatedLinkText(link, start) );
return { old: oldLink, "new": newLink, getText: updatedLinkText(link, start) };
};
// moving this reused block out to 'private' fn
var updateLinkResult = function(text, oldText, newText, link, urls, sidebar, updateResult) {
// and may as well update the link while we're here
if(false !== (updateResult = P.updateLink(text, oldText, newText, link.start, link.end))) {
sidebar.append('<li>' + updateResult['old'] + ' → ' + updateResult['new'] + ' at ' + updateResult['getText']() + '</li>');
}
urls.push(link.url); // so multiple links get added to list
};
P.updateLinksMenu = function() {
// https://developers.google.com/apps-script/reference/base/prompt-response
var oldText = ui.prompt('Old link text to replace').getResponseText();
var newText = ui.prompt('New link text to replace with').getResponseText();
log('Replacing: ' + oldText + ', ' + newText);
var sidebar = gDocUiHelper.createSidebar('Update All Links', '<h3>Replacing</h3><p><code>' + oldText + '</code> → <code>' + newText + '</code></p><hr /><ol>');
// current doc available to script
var doc = DocumentApp.getActiveDocument().getBody();//.getActiveSection();
// Search until a link is found
var links = P.findAllElementsFor(doc, function(text) {
var i = -1, n = text.getText().length, link = false, url, urls = [], updateResult;
// note: the following only gets the FIRST link in the text -- while(i < n && !(url = text.getLinkUrl(i++)));
// scan the text element for links
while(++i < n) {
// getLinkUrl will continue to get a link while INSIDE the stupid link, so only do this once
if(url = text.getLinkUrl(i)) {
if(false === link) {
link = { start: i, end: -1, url: url };
// log(true, 'Type: ' + text.getType(), 'Link: ' + url, function() { return 'Text: ' + text.getText().substring(i,100) + ((n-i) > 100 ? '...' : '')});
}
else {
link.end = i; // keep updating the end position until we leave
}
}
// just left the link -- reset link tracking
else if(false !== link) {
// and may as well update the link while we're here
updateLinkResult(text, oldText, newText, link, urls, sidebar);
link = false; // reset "counter"
}
}
// once we've reached the end of the text, must also check to see if the last thing we found was a link
if(false !== link) updateLinkResult(text, oldText, newText, link, urls, sidebar);
return urls;
});
sidebar.append('</ol><p><strong>' + links.length + ' links reviewed</strong></p>');
gDocUiHelper.attachSidebar(sidebar);
log(links);
};
P.findAllElementsFor = function(el, test) {
// generic utility function to recursively find all elements; heavily based on https://stackoverflow.com/a/18731628/1037948
var results = [], searchResult = null, i, result;
// https://developers.google.com/apps-script/reference/document/body#findElement(ElementType)
while (searchResult = el.findElement(DocumentApp.ElementType.TEXT, searchResult)) {
var t = searchResult.getElement().editAsText(); // .asParagraph()
// check to add to list
if(test && (result = test(t))) {
if( _.isArray(result) ) results = results.concat(result); // could be big? http://jsperf.com/self-concatenation/
else results.push(result);
}
}
// recurse children if not plain text item
if(el.getType() !== DocumentApp.ElementType.TEXT) {
i = el.getNumChildren();
var result;
while(--i > 0) {
result = P.findAllElementsFor(el.getChild(i));
if(result && result.length > 0) results = results.concat(result);
}
}
return results;
};
return P;
})({});
// really? it can't handle object properties?
function gDocsUpdateLinksMenu() {
gDocsHelper.updateLinksMenu();
}
gDocUiHelper.addMenu('Zaus', [ ['Update links', 'gDocsUpdateLinksMenu'] ]);
// #endregion --------------------- Utilities -----------------------------
And I'm including the "extra" utility classes for creating menus, sidebars, etc below for completeness:
var log = function() {
// return false;
var args = Array.prototype.slice.call(arguments);
// allowing functions delegates execution so we can save some non-debug cycles if code left in?
if(args[0] === true) Logger.log(_.map(args, function(v) { return _.isFunction(v) ? v() : v; }).join('; '));
else
_.each(args, function(v) {
Logger.log(_.isFunction(v) ? v() : v);
});
}
// #region --------------------- Menu -----------------------------
var gDocUiHelper = (function(P, un) {
P.addMenuToSheet = function addMenu(spreadsheet, title, items) {
var menu = ui.createMenu(title);
// make sure menu items are correct format
_.each(items, function(v,k) {
var err = [];
// provided in format [ [name, fn],... ] instead
if( _.isArray(v) ) {
if ( v.length === 2 ) {
menu.addItem(v[0], v[1]);
}
else {
err.push('Menu item ' + k + ' missing name or function: ' + v.join(';'))
}
}
else {
if( !v.name ) err.push('Menu item ' + k + ' lacks name');
if( !v.functionName ) err.push('Menu item ' + k + ' lacks function');
if(!err.length) menu.addItem(v.name, v.functionName);
}
if(err.length) {
log(err);
ui.alert(err.join('; '));
}
});
menu.addToUi();
};
// list of things to hook into
var initializers = {};
P.addMenu = function(menuTitle, menuItems) {
if(initializers[menuTitle] === un) {
initializers[menuTitle] = [];
}
initializers[menuTitle] = initializers[menuTitle].concat(menuItems);
};
P.createSidebar = function(title, content, options) {
var sidebar = HtmlService
.createHtmlOutput()
.setTitle(title)
.setWidth( (options && options.width) ? options.width : 350 /* pixels */);
sidebar.append(content);
if(options && options.on) DocumentApp.getUi().showSidebar(sidebar);
// else { sidebar.attach = function() { DocumentApp.getUi().showSidebar(this); }; } // should really attach to prototype...
return sidebar;
};
P.attachSidebar = function(sidebar) {
DocumentApp.getUi().showSidebar(sidebar);
};
P.onOpen = function() {
var spreadsheet = SpreadsheetApp.getActive();
log(initializers);
_.each(initializers, function(v,k) {
P.addMenuToSheet(spreadsheet, k, v);
});
};
return P;
})({});
// #endregion --------------------- Menu -----------------------------
/**
* A special function that runs when the spreadsheet is open, used to add a
* custom menu to the spreadsheet.
*/
function onOpen() {
gDocUiHelper.onOpen();
}
I had some trouble getting Mogsdad's solution to work. Specifically, it misses links that run to the end of their parent element, where there is no trailing non-link character to terminate them. I've implemented something that addresses this and returns a standard range element. Sharing it here in case someone finds it useful.
function getAllLinks(element) {
var rangeBuilder = DocumentApp.getActiveDocument().newRange();
// Parse the text iteratively to find the start and end indices for each link
if (element.getType() === DocumentApp.ElementType.TEXT) {
var links = [];
var string = element.getText();
var previousUrl = null; // The URL of the previous character
var currentLink = null; // The latest link being built
for (var charIndex = 0; charIndex < string.length; charIndex++) {
var currentUrl = element.getLinkUrl(charIndex);
// New URL means create a new link
if (currentUrl !== null && previousUrl !== currentUrl) {
if (currentLink !== null) links.push(currentLink);
currentLink = {};
currentLink.url = String(currentUrl);
currentLink.startOffset = charIndex;
}
// In a URL means extend the end of the current link
if (currentUrl !== null) {
currentLink.endOffsetInclusive = charIndex;
}
// Not in a URL means close and push the link if ready
if (currentUrl === null) {
if (currentLink !== null) links.push(currentLink);
currentLink = null;
}
// End the loop and go again
previousUrl = currentUrl;
}
// Handle the end case when final character is a link
if (currentLink !== null) links.push(currentLink);
// Convert the links into a range before returning
links.forEach(function(link) {
rangeBuilder.addElement(element, link.startOffset, link.endOffsetInclusive);
});
}
// If not a text element then recursively get links from child elements
else if (element.getNumChildren) {
for (var i = 0; i < element.getNumChildren(); i++) {
rangeBuilder.addRange(getAllLinks(element.getChild(i)));
}
}
return rangeBuilder.build();
}
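The per-character scan in this answer can be tested on its own if the calls to getLinkUrl(charIndex) are replaced by an array. A sketch under that assumption: urlAt is a hypothetical array holding the URL at each character position (or null), and findLinkRuns is an illustrative name.

```javascript
// `urlAt[i]` stands in for what element.getLinkUrl(i) would return:
// the link URL at character i, or null when the character is not linked.
function findLinkRuns(urlAt) {
  var runs = [];
  var current = null;
  for (var i = 0; i < urlAt.length; i++) {
    var url = urlAt[i];
    if (url !== null && current !== null && current.url === url) {
      current.endOffsetInclusive = i;          // still inside the same link
      continue;
    }
    if (current !== null) {
      runs.push(current);                      // link ended or URL changed
      current = null;
    }
    if (url !== null) {
      current = { url: url, startOffset: i, endOffsetInclusive: i };
    }
  }
  if (current !== null) runs.push(current);    // link that reaches the last character
  return runs;
}

var linkRuns = findLinkRuns([null, 'a', 'a', 'b', null, 'c']);
console.log(linkRuns.length); // prints 3
```

The final push after the loop is the part that catches a link ending on the last character of the element.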
You are right... search and replace is not applicable here.
Use setLinkUrl(): https://developers.google.com/apps-script/reference/document/container-element#setLinkUrl(String)
Basically, you have to iterate through the elements recursively (elements can contain elements) and, for each one:
use getLinkUrl() to get the oldText;
if it is not null, call setLinkUrl(newText), which leaves the displayed text unchanged.
This Excel macro lists the links from a Word doc. You'd need to copy your data into a Word doc first.
Sub getLinks()
Dim wApp As Word.Application, wDoc As Word.Document
Dim i As Integer, r As Range
Const filePath = "C:\test\test.docx"
Set wApp = CreateObject("Word.Application")
'wApp.Visible = True
Set wDoc = wApp.Documents.Open(filePath)
Set r = Range("A1")
For i = 1 To wDoc.Hyperlinks.Count
r = wDoc.Hyperlinks(i).Address
Set r = r.Offset(1, 0)
Next i
wApp.Quit
Set wDoc = Nothing
Set wApp = Nothing
End Sub
Here's a quick and dirty way to accomplish the same goal with no scripting:
From Google Docs, save the document in RTF format.
In your editor of choice, edit the links in the RTF file (in my case, I wanted to modify all the hyperlinks, so I used Emacs and regexp-replace). Save the file when you're done.
Create a fresh, new Google Doc, and from the menu, select File>Open and open the RTF file. Docs will convert your edited RTF file back into a proper Google Doc, restoring all formatting.
Google Docs' RTF format is pretty complete--I haven't noticed any loss of fidelity in making the round trip, and it has the advantage of fully exposing all the hyperlinks, formatting, and everything else about the document in a form that's easy to edit and to apply regex tools to.
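For anyone who would rather script the edit step than use an editor, here is a rough sketch. It assumes the exported file stores each link target in a field instruction of the form HYPERLINK "url", which is the usual RTF convention; I have not confirmed the exact markup Google Docs writes, so check a real export first. replaceRtfLinkTargets is a hypothetical helper name.

```javascript
// Rewrites only the quoted target of each HYPERLINK field in an RTF string.
// The visible link text (the field result) is left untouched.
// `search` is treated as a literal substring, not a regular expression.
function replaceRtfLinkTargets(rtf, search, replacement) {
  return rtf.replace(/(HYPERLINK\s+")([^"]*)(")/g, function (match, before, url, after) {
    return before + url.split(search).join(replacement) + after;
  });
}

// Made-up RTF fragment for illustration only.
var rtfSample = '{\\field{\\*\\fldinst{HYPERLINK "http://old.example.com/page"}}' +
                '{\\fldrslt{old.example.com docs}}}';
console.log(replaceRtfLinkTargets(rtfSample, 'old.example.com', 'new.example.com'));
```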

Google Apps Script; Docs; convert selected element to HTML

I am just starting with Google Apps Script and following the Add-on quickstart
https://developers.google.com/apps-script/quickstart/docs
In the quickstart you can create a simple add-on to get a selection from a document and translate it with the LanguageApp service. The example gets the underlying text using this:
function getSelectedText() {
var selection = DocumentApp.getActiveDocument().getSelection();
if (selection) {
var text = [];
var elements = selection.getSelectedElements();
for (var i = 0; i < elements.length; i++) {
if (elements[i].isPartial()) {
var element = elements[i].getElement().asText();
var startIndex = elements[i].getStartOffset();
var endIndex = elements[i].getEndOffsetInclusive();
text.push(element.getText().substring(startIndex, endIndex + 1));
} else {
var element = elements[i].getElement();
// Only translate elements that can be edited as text; skip images and
// other non-text elements.
if (element.editAsText) {
var elementText = element.asText().getText();
// This check is necessary to exclude images, which return a blank
// text element.
if (elementText != '') {
text.push(elementText);
}
}
}
}
if (text.length == 0) {
throw 'Please select some text.';
}
return text;
} else {
throw 'Please select some text.';
}
}
It gets the text only: element.getText(), without any formatting.
I know the underlying object is not html, but is there a way to get the selection converted into a HTML string? For example, if the selection has a mix of formatting, like bold:
this is a sample with bold text
Then is there any method, extension, library, etc, -- like element.getHTML() -- that could return this?
this is a sample with <b>bold</b> text
instead of this?
this is a sample with bold text
There is a script GoogleDoc2HTML by Omar AL Zabir. Its purpose is to convert the entire document into HTML. Since you only want to convert rich text within the selected element, the function relevant to your task is processText from the script, shown below.
The method getTextAttributeIndices gives the starting offsets for each change of text attribute, like from normal to bold or back. If there is only one change, that's the attribute for the entire element (typically paragraph), and this is dealt with in the first part of if-statement.
The second part deals with the general case, looping over the indices and inserting HTML markup corresponding to the attributes.
The script isn't maintained, so consider it as a starting point for your own code, rather than a ready-to-use library. There are some unmerged PRs that improve the conversion process, in particular for inline links.
function processText(item, output) {
var text = item.getText();
var indices = item.getTextAttributeIndices();
if (indices.length <= 1) {
// Assuming that a whole para fully italic is a quote
if(item.isBold()) {
output.push('<b>' + text + '</b>');
}
else if(item.isItalic()) {
output.push('<blockquote>' + text + '</blockquote>');
}
else if (text.trim().indexOf('http://') == 0) {
output.push('<a href="' + text + '">' + text + '</a>');
}
else {
output.push(text);
}
}
else {
for (var i=0; i < indices.length; i ++) {
var partAtts = item.getAttributes(indices[i]);
var startPos = indices[i];
var endPos = i+1 < indices.length ? indices[i+1]: text.length;
var partText = text.substring(startPos, endPos);
Logger.log(partText);
if (partAtts.ITALIC) {
output.push('<i>');
}
if (partAtts.BOLD) {
output.push('<b>');
}
if (partAtts.UNDERLINE) {
output.push('<u>');
}
// If someone has written [xxx] and made this whole text some special font, like superscript
// then treat it as a reference and make it superscript.
// Unfortunately in Google Docs, there's no way to detect superscript
if (partText.indexOf('[')==0 && partText[partText.length-1] == ']') {
output.push('<sup>' + partText + '</sup>');
}
else if (partText.trim().indexOf('http://') == 0) {
output.push('<a href="' + partText + '">' + partText + '</a>');
}
else {
output.push(partText);
}
// Close tags in reverse order of opening so the markup nests correctly.
if (partAtts.UNDERLINE) {
output.push('</u>');
}
if (partAtts.BOLD) {
output.push('</b>');
}
if (partAtts.ITALIC) {
output.push('</i>');
}
}
}
}
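The splitting step in processText can be tried outside Apps Script. The following is a minimal sketch, not part of GoogleDoc2HTML: splitIntoRuns and runsToHtml are hypothetical helper names, the indices array stands in for what getTextAttributeIndices() would return, and isBoldAt stands in for a per-offset formatting check.

```javascript
// Hypothetical helper: turn a string plus the offsets where formatting
// changes into an array of {start, end, text} runs.
function splitIntoRuns(text, indices) {
  var runs = [];
  for (var i = 0; i < indices.length; i++) {
    var start = indices[i];
    // A run ends where the next one starts, or at the end of the text.
    var end = i + 1 < indices.length ? indices[i + 1] : text.length;
    runs.push({ start: start, end: end, text: text.substring(start, end) });
  }
  return runs;
}

// Hypothetical helper: wrap each bold run in <b>...</b>.
// isBoldAt(offset) is an assumed callback, not an Apps Script method.
function runsToHtml(runs, isBoldAt) {
  return runs.map(function (run) {
    return isBoldAt(run.start) ? '<b>' + run.text + '</b>' : run.text;
  }).join('');
}

// "bold" occupies offsets 22-25, so formatting changes at 0, 22 and 26.
var sample = 'this is a sample with bold text';
var runs = splitIntoRuns(sample, [0, 22, 26]);
var html = runsToHtml(runs, function (offset) {
  return offset >= 22 && offset < 26;
});
// html is 'this is a sample with <b>bold</b> text'
```

Because each run carries a single set of attributes, checking the attributes at the run's first offset is enough.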
Ended up making a script to support my use-case of bold+links+italics:
function getHtmlOfElement(element) {
var text = element.editAsText();
var string = text.getText();
var indices = text.getTextAttributeIndices();
var output = [];
for (var i = 0; i < indices.length; i++) {
var offset = indices[i];
var startPos = offset;
var endPos = i+1 < indices.length ? indices[i+1]: string.length;
var partText = string.substring(startPos, endPos);
var isBold = text.isBold(offset);
var isItalic = text.isItalic(offset);
var linkUrl = text.getLinkUrl(offset);
if (isBold) {
output.push('<b>');
}
if (isItalic) {
output.push('<i>');
}
if (linkUrl) {
output.push('<a href="' + linkUrl + '">');
}
output.push(partText);
// Close tags in reverse order of opening so the markup nests correctly.
if (linkUrl) {
output.push('</a>');
}
if (isItalic) {
output.push('</i>');
}
if (isBold) {
output.push('</b>');
}
}
return output.join("");
}
You can simply call it using something like:
getHtmlOfElement(myTableCell); // returns something like "<b>Bold</b> test."
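One thing the function above does not do is escape the text, so a literal < or & in the document would end up as markup in the result. A minimal sketch of one way to handle that, assuming nothing beyond plain JavaScript; escapeHtml and wrapRun are hypothetical helper names, not Apps Script methods:

```javascript
// Hypothetical helper: escape the characters that are special in HTML.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: wrap one run of text, innermost tag first, so the
// resulting tags are always properly nested.
function wrapRun(partText, isBold, isItalic, linkUrl) {
  var html = escapeHtml(partText);
  if (linkUrl) {
    html = '<a href="' + escapeHtml(linkUrl) + '">' + html + '</a>';
  }
  if (isItalic) {
    html = '<i>' + html + '</i>';
  }
  if (isBold) {
    html = '<b>' + html + '</b>';
  }
  return html;
}

var plain = wrapRun('a < b', true, false, null);
// plain is '<b>a &lt; b</b>'
var linked = wrapRun('x', true, true, 'http://e.com/?a=1&b=2');
// linked is '<b><i><a href="http://e.com/?a=1&amp;b=2">x</a></i></b>'
```

Inside the loop, output.push(wrapRun(partText, isBold, isItalic, linkUrl)) could replace the separate open/close pushes.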
This is obviously a workaround, but you can copy/paste a Google Doc into a draft in Gmail and then that draft can be turned into HTML using
GmailApp.getDraft(draftId).getMessage().getBody().toString();
I found this thread trying to skip that step by going straight from a Doc to HTML, but I thought I'd share.

Export multiple html tables to Excel

I've scoured the internet for answers, and though I found some, they were mostly incomplete or not working.
What I'm trying to do is: I have an info page which displays information about a customer or server (or something else). This information is displayed in a table, sometimes multiple tables (I sometimes create my own table for some of the data and use Html.Grid(Model.list) to create tables for the rest of the data stored in lists, all on 1 page).
I found this website, which is awesome: http://www.excelmashup.com/. It does exactly what I want for 1 table, but I need this for multiple tables (they must all be in the same Excel file). I know I can create multiple files (1 for each table), but this is not the desired output.
So I kept on searching and I found a post on stackoverflow: Export multiple HTML tables to Excel with JavaScript function
This seemed promising so I tried using it but the code had some minor errors which I tried to fix:
var tableToExcel = (function () {
var uri = 'data:application/vnd.ms-excel;base64,'
, template = '<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40"><head><!--[if gte mso 9]><xml><x:ExcelWorkbook><x:ExcelWorksheets><x:ExcelWorksheet><x:Name>{worksheet}</x:Name><x:WorksheetOptions><x:DisplayGridlines/></x:WorksheetOptions></x:ExcelWorksheet></x:ExcelWorksheets></x:ExcelWorkbook></xml><![endif]--></head><body><table>{table}</table></body></html>'
, base64 = function (s) { return window.btoa(unescape(encodeURIComponent(s))) }
, format = function (s, c) { return s.replace(/{(\w+)}/g, function (m, p) { return c[p]; }) }
return function (table, name) {
if (!table.nodeType) table = document.getElementById(table)
var ctx = { worksheet: name || 'Worksheet', table: table.innerHTML }
window.location.href = uri + base64(format(template, ctx))
}
})()
The button I use to trigger it:
<input type="button" onclick="tableToExcel('InformatieTable', 'W3C Example Table')" value="Export to Excel">
but alas, to no avail (I did not know what to do with the if (!table.nodeType) table = table line, so I just commented it out since it seemed to do nothing special).
Now I get a message (not really an error) when I try to run this code:
Resource interpreted as Document but transferred with MIME type application/vnd.ms-excel: "data:application/vnd.ms-excel;base64,PGh0bWwgeG1sbnM6bz0idXJuOnNjaGVtYXMtbW…JzZXQ9VVRGLTgiLz48L2hlYWQ+PGJvZHk+PHRhYmxlPjwvdGFibGU+PC9ib2R5PjwvaHRtbD4=".
I also get an Excel file as a download in my browser, but when I try to open it I get a warning that the content and the file extension do not match, asking whether I still want to open it. If I click OK, it opens an empty Excel sheet and that's it.
I am currently trying to fix that error, though I don't think it will make any difference to the content of the Excel file.
Is there anyone who can help me fix this, or provide another way of doing this?
I prefer it to run client side (so jQuery/JavaScript) instead of server side, to minimize server load.
EDIT
I've found a better jQuery example (one that does work) on http://www.codeproject.com/Tips/755203/Export-HTML-table-to-Excel-With-CSS
This converts 1 table into an Excel file, which is obviously not good enough. But now that I have the code to do this, I should be able to adapt it to loop through all tables on the web page.
I also updated the code in this example to the correct version I'm using now.
I still get the same warning, but when I click OK while opening the Excel file it does show me the content of the table, so I'm just ignoring that for now. If anyone has a solution for this, please share.
Thanks to @Axel Richter I got my answer; he referred me to the following question.
I have adapted the code a bit so that it takes all the tables on the web page. It now looks like this:
<script type="text/javascript">
var tablesToExcel = (function () {
var uri = 'data:application/vnd.ms-excel;base64,'
, tmplWorkbookXML = '<?xml version="1.0"?><?mso-application progid="Excel.Sheet"?><Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">'
+ '<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office"><Author>Axel Richter</Author><Created>{created}</Created></DocumentProperties>'
+ '<Styles>'
+ '<Style ss:ID="Currency"><NumberFormat ss:Format="Currency"></NumberFormat></Style>'
+ '<Style ss:ID="Date"><NumberFormat ss:Format="Medium Date"></NumberFormat></Style>'
+ '</Styles>'
+ '{worksheets}</Workbook>'
, tmplWorksheetXML = '<Worksheet ss:Name="{nameWS}"><Table>{rows}</Table></Worksheet>'
, tmplCellXML = '<Cell{attributeStyleID}{attributeFormula}><Data ss:Type="{nameType}">{data}</Data></Cell>'
, base64 = function (s) { return window.btoa(unescape(encodeURIComponent(s))) }
, format = function (s, c) { return s.replace(/{(\w+)}/g, function (m, p) { return c[p]; }) }
return function (wsnames, wbname, appname) {
var ctx = "";
var workbookXML = "";
var worksheetsXML = "";
var rowsXML = "";
var tables = $('table');
for (var i = 0; i < tables.length; i++) {
for (var j = 0; j < tables[i].rows.length; j++) {
rowsXML += '<Row>'
for (var k = 0; k < tables[i].rows[j].cells.length; k++) {
var dataType = tables[i].rows[j].cells[k].getAttribute("data-type");
var dataStyle = tables[i].rows[j].cells[k].getAttribute("data-style");
var dataValue = tables[i].rows[j].cells[k].getAttribute("data-value");
dataValue = (dataValue) ? dataValue : tables[i].rows[j].cells[k].innerHTML;
var dataFormula = tables[i].rows[j].cells[k].getAttribute("data-formula");
dataFormula = (dataFormula) ? dataFormula : (appname == 'Calc' && dataType == 'DateTime') ? dataValue : null;
ctx = {
attributeStyleID: (dataStyle == 'Currency' || dataStyle == 'Date') ? ' ss:StyleID="' + dataStyle + '"' : ''
, nameType: (dataType == 'Number' || dataType == 'DateTime' || dataType == 'Boolean' || dataType == 'Error') ? dataType : 'String'
, data: (dataFormula) ? '' : dataValue.replace('<br>', '')
, attributeFormula: (dataFormula) ? ' ss:Formula="' + dataFormula + '"' : ''
};
rowsXML += format(tmplCellXML, ctx);
}
rowsXML += '</Row>'
}
ctx = { rows: rowsXML, nameWS: wsnames[i] || 'Sheet' + i };
worksheetsXML += format(tmplWorksheetXML, ctx);
rowsXML = "";
}
ctx = { created: (new Date()).getTime(), worksheets: worksheetsXML };
workbookXML = format(tmplWorkbookXML, ctx);
console.log(workbookXML);
var link = document.createElement("A");
link.href = uri + base64(workbookXML);
link.download = wbname || 'Workbook.xls';
link.target = '_blank';
document.body.appendChild(link);
link.click();
document.body.removeChild(link);
}
})();
</script>
So now, whenever I want a page to have an option to be exported to Excel, I add a reference to that script and add the following button to my page:
<button onclick="tablesToExcel(['ServerInformatie', 'Relaties'], 'VirtueleMachineInfo.xls', 'Excel')">Export to Excel</button>
so the method:
tablesToExcel(WorksheetNames, fileName, 'Excel')
Where worksheetNames is an array that needs to contain at least as many names as there are tables on the page. You could of course choose to create the worksheet names in a different way.
And fileName is of course the name of the file you'll be downloading.
Not having it all in 1 worksheet is a shame but at least this will do for now.
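The worksheet names can also be generated rather than typed by hand. Here is a minimal sketch of one way to do that; makeSheetNames is a hypothetical helper, not part of the script above. It relies on Excel's naming rules: at most 31 characters, and none of the characters \ / ? * [ ] : are allowed. It also makes duplicate names unique.

```javascript
// Hypothetical helper: build one valid, unique worksheet name per table.
// labels may be shorter than tableCount; missing entries become "SheetN".
function makeSheetNames(labels, tableCount) {
  var names = [];
  var used = {};
  for (var i = 0; i < tableCount; i++) {
    var base = (labels[i] || 'Sheet' + (i + 1))
      .replace(/[\\\/?*\[\]:]/g, ' ') // characters Excel rejects
      .substring(0, 31);              // Excel's length limit
    var name = base;
    var n = 2;
    // Compare case-insensitively and append " (2)", " (3)", ... on a clash.
    while (used[name.toLowerCase()]) {
      var suffix = ' (' + n + ')';
      n++;
      name = base.substring(0, 31 - suffix.length) + suffix;
    }
    used[name.toLowerCase()] = true;
    names.push(name);
  }
  return names;
}

var sheetNames = makeSheetNames(['Servers', 'Servers', 'A/B'], 4);
// sheetNames is ['Servers', 'Servers (2)', 'A B', 'Sheet4']
```

The result could then be passed as the first argument, for example tablesToExcel(makeSheetNames([], $('table').length), 'VirtueleMachineInfo.xls', 'Excel').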
Here is the code that I used to put multiple HTML tables in the same Excel sheet:
import TableExport from 'tableexport';
const tbOptions = {
formats: ["xlsx"], // (String[]), filetype(s) for the export, (default: ['xlsx', 'csv', 'txt'])
bootstrap: true, // (Boolean), style buttons using bootstrap, (default: true)
exportButtons: false, // (Boolean), automatically generate the built-in export buttons for each of the specified formats (default: true)
position: "bottom", // (top, bottom), position of the caption element relative to table, (default: 'bottom')
}
const DowlandExcel = (key) => {
const table = TableExport(document.getElementById(key), tbOptions);
var exportData = table.getExportData();
var xlsxData = exportData[key].xlsx;
console.log(xlsxData); // Replace with the kind of file you want from the exportData
table.export2file(xlsxData.data, xlsxData.mimeType, xlsxData.filename, xlsxData.fileExtension, xlsxData.merges, xlsxData.RTL, xlsxData.sheetname)
}
const DowlandExcelMultiTable = (keys) => {
const tables = []
const xlsxDatas = []
keys.forEach(key => {
const selector = document.getElementById(key);
if (selector) {
const table = TableExport(selector, tbOptions);
tables.push(table);
xlsxDatas.push(table.getExportData()[key].xlsx)
}
});
const mergeXlsxData = {
RTL: false,
data: [],
fileExtension: ".xlsx",
filename: 'rapor',
merges: [],
mimeType: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
sheetname: "Rapor"
}
for (let i = 0; i < xlsxDatas.length; i++) {
const xlsxData = xlsxDatas[i];
mergeXlsxData.data.push(...xlsxData.data)
xlsxData.merges = xlsxData.merges.map(merge => {
const diff = mergeXlsxData.data.length - xlsxData.data.length;
merge.e.r += diff;
merge.s.r += diff;
return merge
});
mergeXlsxData.merges.push(...xlsxData.merges)
mergeXlsxData.data.push([null]);
}
console.log(mergeXlsxData);
tables[0].export2file(mergeXlsxData.data, mergeXlsxData.mimeType, mergeXlsxData.filename, mergeXlsxData.fileExtension, mergeXlsxData.merges, mergeXlsxData.RTL, mergeXlsxData.sheetname)
}
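The row-offset bookkeeping in DowlandExcelMultiTable can be looked at separately from the library. This is a minimal sketch, assuming each table's export data has the shape used above (data as an array of rows, merges as {s: {r, c}, e: {r, c}} objects); stackSheets is a hypothetical name and it does not call TableExport.

```javascript
// Hypothetical helper: stack several sheets' rows into one sheet and shift
// each merge range down by the number of rows already placed above it.
function stackSheets(sheets) {
  var merged = { data: [], merges: [] };
  sheets.forEach(function (sheet) {
    // Rows already written, i.e. where this sheet's row 0 will land.
    var rowOffset = merged.data.length;
    sheet.data.forEach(function (row) {
      merged.data.push(row);
    });
    sheet.merges.forEach(function (m) {
      // Copy instead of mutating the input, so the function can be re-run.
      merged.merges.push({
        s: { r: m.s.r + rowOffset, c: m.s.c },
        e: { r: m.e.r + rowOffset, c: m.e.c }
      });
    });
    merged.data.push([null]); // blank spacer row between tables
  });
  return merged;
}

var stacked = stackSheets([
  { data: [['a', 'b'], ['c', 'd']],
    merges: [{ s: { r: 0, c: 0 }, e: { r: 0, c: 1 } }] },
  { data: [['e', 'f']],
    merges: [{ s: { r: 0, c: 0 }, e: { r: 0, c: 1 } }] }
]);
// The second table starts at row 3 (two rows plus one spacer above it),
// so its merge now spans row 3.
```

Computing the offset before the rows are appended avoids the subtraction used in the original loop; the result is the same.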

Get All Links in a Document

Given a "normal document" in Google Docs/Drive (e.g. paragraphs, lists, tables) which contains external links scattered throughout the content, how do you compile a list of links present using Google Apps Script?
Specifically, I want to update all broken links in the document by searching for oldText in each url and replace it with newText in each url, but not the text.
I don't think the replacing text section of the Dev Documentation is what I need -- do I need to scan every element of the doc? Can I just editAsText and use an html regex? Examples would be appreciated.
This is only mostly painful! Code is available as part of a gist.
Yeah, I can't spell.
getAllLinks
Here's a utility function that scans the document for all LinkUrls, returning them in an array.
/**
* Get an array of all LinkUrls in the document. The function is
* recursive, and if no element is provided, it will default to
* the active document's Body element.
*
* @param {Element} element The document element to operate on.
*
* @returns {Array} Array of objects, viz.
* {element,
* startOffset,
* endOffsetInclusive,
* url}
*/
function getAllLinks(element) {
var links = [];
element = element || DocumentApp.getActiveDocument().getBody();
if (element.getType() === DocumentApp.ElementType.TEXT) {
var textObj = element.editAsText();
var text = element.getText();
var inUrl = false;
for (var ch=0; ch < text.length; ch++) {
var url = textObj.getLinkUrl(ch);
if (url != null) {
if (!inUrl) {
// We are now!
inUrl = true;
var curUrl = {};
curUrl.element = element;
curUrl.url = String( url ); // grab a copy
curUrl.startOffset = ch;
curUrl.endOffsetInclusive = ch; // so that one-character links get an end offset too
}
else {
curUrl.endOffsetInclusive = ch;
}
}
else {
if (inUrl) {
// Not any more, we're not.
inUrl = false;
links.push(curUrl); // add to links
curUrl = {};
}
}
}
if (inUrl) {
// in case the link ends on the same char that the element does
links.push(curUrl);
}
}
else {
var numChildren = element.getNumChildren();
for (var i=0; i<numChildren; i++) {
links = links.concat(getAllLinks(element.getChild(i)));
}
}
return links;
}
findAndReplaceLinks
This utility builds on getAllLinks to do a find & replace function.
/**
* Replace all or part of UrlLinks in the document.
*
* @param {String} searchPattern the regex pattern to search for
* @param {String} replacement the text to use as replacement
*
* @returns {Number} number of Urls changed
*/
function findAndReplaceLinks(searchPattern,replacement) {
var links = getAllLinks();
var numChanged = 0;
for (var l=0; l<links.length; l++) {
var link = links[l];
if (link.url.match(searchPattern)) {
// This link needs to be changed
var newUrl = link.url.replace(searchPattern,replacement);
link.element.setLinkUrl(link.startOffset, link.endOffsetInclusive, newUrl);
numChanged++
}
}
return numChanged;
}
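One detail worth knowing when searchPattern arrives as a plain string (as it does from the dialog): String.match converts a string argument to a regular expression, whereas String.replace treats a string pattern literally and only replaces the first occurrence. So a pattern like old.example is matched as a regex but replaced as literal text. A minimal sketch of a consistent literal replacement follows; escapeRegExp and replaceInUrl are hypothetical helpers, not part of the code above.

```javascript
// Hypothetical helper: escape regex metacharacters so the text is matched
// literally ('.' and '?' are common in URLs).
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: replace every literal occurrence of oldText in url.
function replaceInUrl(url, oldText, newText) {
  var pattern = new RegExp(escapeRegExp(oldText), 'g');
  // A function replacement keeps '$' sequences in newText from being
  // interpreted as special replacement patterns.
  return url.replace(pattern, function () {
    return newText;
  });
}

var updated = replaceInUrl(
  'http://old.example.com/a?x=old.example', 'old.example', 'new.example');
// updated is 'http://new.example.com/a?x=new.example'
```

If regex matching is what you actually want, build the RegExp once and pass the same object to both match and replace.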
Demo UI
To demonstrate the use of these utilities, here are a couple of UI extensions:
function onOpen() {
// Add a menu with some items, some separators, and a sub-menu.
DocumentApp.getUi().createMenu('Utils')
.addItem('List Links', 'sidebarLinks')
.addItem('Replace Link Text', 'searchReplaceLinks')
.addToUi();
}
function searchReplaceLinks() {
var ui = DocumentApp.getUi();
var app = UiApp.createApplication()
.setWidth(250)
.setHeight(100)
.setTitle('Change Url text');
var form = app.createFormPanel();
var flow = app.createFlowPanel();
flow.add(app.createLabel("Find: "));
flow.add(app.createTextBox().setName("searchPattern"));
flow.add(app.createLabel("Replace: "));
flow.add(app.createTextBox().setName("replacement"));
var handler = app.createServerHandler('myClickHandler');
flow.add(app.createSubmitButton("Submit").addClickHandler(handler));
form.add(flow);
app.add(form);
ui.showDialog(app);
}
// ClickHandler to close dialog
function myClickHandler(e) {
var app = UiApp.getActiveApplication();
app.close();
return app;
}
function doPost(e) {
var numChanged = findAndReplaceLinks(e.parameter.searchPattern,e.parameter.replacement);
var ui = DocumentApp.getUi();
var app = UiApp.createApplication();
sidebarLinks(); // Update list
var result = DocumentApp.getUi().alert(
'Results',
"Changed "+numChanged+" urls.",
DocumentApp.getUi().ButtonSet.OK);
}
/**
* Shows a custom HTML user interface in a sidebar in the Google Docs editor.
*/
function sidebarLinks() {
var links = getAllLinks();
var sidebar = HtmlService
.createHtmlOutput()
.setTitle('URL Links')
.setWidth(350 /* pixels */);
// Display list of links, url only.
for (var l=0; l<links.length; l++) {
var link = links[l];
sidebar.append('<p>'+link.url);
}
DocumentApp.getUi().showSidebar(sidebar);
}
I offer another, shorter answer for your first question, concerning iterating through all links in a document's body. This instructive code returns a flat array of links in the current document's body, where each link is represented by an object with entries pointing to the text element (textEl), the paragraph or list item element in which it's contained (paragraph), the offset index in the text where the link starts (startOffset) and the URL itself (url). Hopefully, you'll find it easy to adapt to your own needs.
It uses the getTextAttributeIndices() method rather than iterating over every character of the text, and is thus expected to run much faster than the per-character scans in the earlier answers.
EDIT: Since originally posting this answer, I have modified the function a few times. It now (1) includes the endOffsetInclusive property for each link (note that it can be null for links that extend to the end of the text element; in that case use link.textEl.getText().length - 1 instead); (2) finds links in all sections of the document, not only the body; (3) includes the section and isFirstPageSection properties to indicate where the link is located; and (4) accepts the argument mergeAdjacent, which, when set to true, returns a single link entry for a continuous stretch of text linked to the same URL (which would otherwise be split into separate entries if, for instance, part of the text is styled differently from another part).
For the purpose of including links under all sections, a new utility function, iterateSections(), was introduced.
/**
* Returns a flat array of links which appear in the active document's body.
* Each link is represented by a simple Javascript object with the following
* keys:
* - "section": {ContainerElement} the document section in which the link is
* found.
* - "isFirstPageSection": {Boolean} whether the given section is a first-page
* header/footer section.
* - "paragraph": {ContainerElement} contains a reference to the Paragraph
* or ListItem element in which the link is found.
* - "textEl": the Text element in which the link is found.
* - "startOffset": {Number} the position (offset) at which the link text begins.
* - "endOffsetInclusive": the position of the last character of the link
* text, or null if the link extends to the end of the text element.
* - "url": the URL of the link.
*
* @param {boolean} mergeAdjacent Whether consecutive links which carry
* different attributes (for any reason) should be returned as a single
* entry.
*
* @returns {Array} the aforementioned flat array of links.
*/
function getAllLinks(mergeAdjacent) {
var links = [];
var doc = DocumentApp.getActiveDocument();
iterateSections(doc, function(section, sectionIndex, isFirstPageSection) {
if (!("getParagraphs" in section)) {
// as we're using some undocumented API, adding this to avoid cryptic
// messages upon possible API changes.
throw new Error("An API change has caused this script to stop " +
"working.\n" +
"Section #" + sectionIndex + " of type " +
section.getType() + " has no .getParagraphs() method. " +
"Stopping script.");
}
section.getParagraphs().forEach(function(par) {
// skip empty paragraphs
if (par.getNumChildren() == 0) {
return;
}
// go over all text elements in paragraph / list-item
for (var el=par.getChild(0); el!=null; el=el.getNextSibling()) {
if (el.getType() != DocumentApp.ElementType.TEXT) {
continue;
}
// go over all styling segments in text element
var attributeIndices = el.getTextAttributeIndices();
var lastLink = null;
attributeIndices.forEach(function(startOffset, i, attributeIndices) {
var url = el.getLinkUrl(startOffset);
if (url != null) {
// we hit a link
var endOffsetInclusive = (i+1 < attributeIndices.length?
attributeIndices[i+1]-1 : null);
// check if this and the last found link are continuous
if (mergeAdjacent && lastLink != null && lastLink.url == url &&
lastLink.endOffsetInclusive == startOffset - 1) {
// this and the previous style segment are continuous
lastLink.endOffsetInclusive = endOffsetInclusive;
return;
}
lastLink = {
"section": section,
"isFirstPageSection": isFirstPageSection,
"paragraph": par,
"textEl": el,
"startOffset": startOffset,
"endOffsetInclusive": endOffsetInclusive,
"url": url
};
links.push(lastLink);
}
});
}
});
});
return links;
}
/**
* Calls the given function for each section of the document (body, header,
* etc.). Sections are children of the DocumentElement object.
*
* @param {Document} doc The Document object (such as the one obtained via
* a call to DocumentApp.getActiveDocument()) with the sections to iterate
* over.
* @param {Function} func A callback function which will be called, for each
* section, with the following arguments (in order):
* - {ContainerElement} section - the section element
* - {Number} sectionIndex - the child index of the section, such that
* doc.getBody().getParent().getChild(sectionIndex) == section.
* - {Boolean} isFirstPageSection - whether the section is a first-page
* header/footer section.
*/
function iterateSections(doc, func) {
// get the DocumentElement interface to iterate over all sections
// this bit is undocumented API
var docEl = doc.getBody().getParent();
var regularHeaderSectionIndex = (doc.getHeader() == null? -1 :
docEl.getChildIndex(doc.getHeader()));
var regularFooterSectionIndex = (doc.getFooter() == null? -1 :
docEl.getChildIndex(doc.getFooter()));
for (var i=0; i<docEl.getNumChildren(); ++i) {
var section = docEl.getChild(i);
var sectionType = section.getType();
var uniqueSectionName;
var isFirstPageSection = (
i != regularHeaderSectionIndex &&
i != regularFooterSectionIndex &&
(sectionType == DocumentApp.ElementType.HEADER_SECTION ||
sectionType == DocumentApp.ElementType.FOOTER_SECTION));
func(section, i, isFirstPageSection);
}
}
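The core of the attribute-index approach, including the mergeAdjacent behaviour, can be exercised without a document. This is a minimal sketch under stated assumptions: linkRanges is a hypothetical name, indices stands in for the result of getTextAttributeIndices(), and urlAt stands in for getLinkUrl(offset). Unlike the function above, it fills in the final end offset from textLength instead of returning null.

```javascript
// Hypothetical helper: derive {start, end, url} ranges from the offsets
// where text attributes change, visiting one offset per styling run.
function linkRanges(indices, urlAt, textLength, mergeAdjacent) {
  var ranges = [];
  var last = null;
  indices.forEach(function (start, i) {
    var url = urlAt(start);
    if (url == null) {
      return; // this run is not a link
    }
    // The run ends just before the next index, or at the end of the text.
    var end = (i + 1 < indices.length ? indices[i + 1] : textLength) - 1;
    if (mergeAdjacent && last !== null && last.url === url &&
        last.end === start - 1) {
      last.end = end; // same URL and contiguous: extend the previous entry
      return;
    }
    last = { start: start, end: end, url: url };
    ranges.push(last);
  });
  return ranges;
}

// 17 characters; offsets 4-12 link to one URL, but the styling changes at
// offset 6 (say, bold ends), so the link spans two runs.
var indices = [0, 4, 6, 13];
var urlAt = function (offset) {
  return offset >= 4 && offset < 13 ? 'https://example.com' : null;
};
var separate = linkRanges(indices, urlAt, 17, false);
// separate has two entries: offsets 4-5 and 6-12
var mergedRanges = linkRanges(indices, urlAt, 17, true);
// mergedRanges has one entry: offsets 4-12
```

The loop runs once per styling run rather than once per character, which is where the expected speed-up comes from.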
I was playing around and incorporated @Mogsdad's answer -- here's the really complicated version:
var _ = Underscorejs.load(); // loaded via http://googleappsdeveloper.blogspot.com/2012/11/using-open-source-libraries-in-apps.html, rolled my own
var ui = DocumentApp.getUi();
// #region --------------------- Utilities -----------------------------
var gDocsHelper = (function(P, un) {
// heavily based on answer https://stackoverflow.com/a/18731628/1037948
var updatedLinkText = function(link, offset) {
return function() { return 'Text: ' + link.getText().substring(offset,100) + ((link.getText().length-offset) > 100 ? '...' : ''); }
}
P.updateLink = function updateLink(link, oldText, newText, start, end) {
var oldLink = link.getLinkUrl(start);
if(0 > oldLink.indexOf(oldText)) return false;
var newLink = oldLink.replace(new RegExp(oldText, 'g'), newText);
link.setLinkUrl(start || 0, (end || oldLink.length), newLink);
log(true, "Updating Link: ", oldLink, newLink, start, end, updatedLinkText(link, start) );
return { old: oldLink, "new": newLink, getText: updatedLinkText(link, start) };
};
// moving this reused block out to 'private' fn
var updateLinkResult = function(text, oldText, newText, link, urls, sidebar, updateResult) {
// and may as well update the link while we're here
if(false !== (updateResult = P.updateLink(text, oldText, newText, link.start, link.end))) {
sidebar.append('<li>' + updateResult['old'] + ' → ' + updateResult['new'] + ' at ' + updateResult['getText']() + '</li>');
}
urls.push(link.url); // so multiple links get added to list
};
P.updateLinksMenu = function() {
// https://developers.google.com/apps-script/reference/base/prompt-response
var oldText = ui.prompt('Old link text to replace').getResponseText();
var newText = ui.prompt('New link text to replace with').getResponseText();
log('Replacing: ' + oldText + ', ' + newText);
var sidebar = gDocUiHelper.createSidebar('Update All Links', '<h3>Replacing</h3><p><code>' + oldText + '</code> → <code>' + newText + '</code></p><hr /><ol>');
// current doc available to script
var doc = DocumentApp.getActiveDocument().getBody();//.getActiveSection();
// Search until a link is found
var links = P.findAllElementsFor(doc, function(text) {
var i = -1, n = text.getText().length, link = false, url, urls = [], updateResult;
// note: the following only gets the FIRST link in the text -- while(i < n && !(url = text.getLinkUrl(i++)));
// scan the text element for links
while(++i < n) {
// getLinkUrl will continue to get a link while INSIDE the stupid link, so only do this once
if(url = text.getLinkUrl(i)) {
if(false === link) {
link = { start: i, end: -1, url: url };
// log(true, 'Type: ' + text.getType(), 'Link: ' + url, function() { return 'Text: ' + text.getText().substring(i,100) + ((n-i) > 100 ? '...' : '')});
}
else {
link.end = i; // keep updating the end position until we leave
}
}
// just left the link -- reset link tracking
else if(false !== link) {
// and may as well update the link while we're here
updateLinkResult(text, oldText, newText, link, urls, sidebar);
link = false; // reset "counter"
}
}
// once we've reached the end of the text, must also check to see if the last thing we found was a link
if(false !== link) updateLinkResult(text, oldText, newText, link, urls, sidebar);
return urls;
});
sidebar.append('</ol><p><strong>' + links.length + ' links reviewed</strong></p>');
gDocUiHelper.attachSidebar(sidebar);
log(links);
};
P.findAllElementsFor = function(el, test) {
// generic utility function to recursively find all elements; heavily based on https://stackoverflow.com/a/18731628/1037948
var results = [], searchResult = null, i, result;
// https://developers.google.com/apps-script/reference/document/body#findElement(ElementType)
while (searchResult = el.findElement(DocumentApp.ElementType.TEXT, searchResult)) {
var t = searchResult.getElement().editAsText(); // .asParagraph()
// check to add to list
if(test && (result = test(t))) {
if( _.isArray(result) ) results = results.concat(result); // could be big? http://jsperf.com/self-concatenation/
else results.push(result);
}
}
// recurse children if not plain text item
if(el.getType() !== DocumentApp.ElementType.TEXT) {
i = el.getNumChildren();
var result;
while(--i > 0) {
result = P.findAllElementsFor(el.getChild(i));
if(result && result.length > 0) results = results.concat(result);
}
}
return results;
};
return P;
})({});
// really? it can't handle object properties?
function gDocsUpdateLinksMenu() {
gDocsHelper.updateLinksMenu();
}
gDocUiHelper.addMenu('Zaus', [ ['Update links', 'gDocsUpdateLinksMenu'] ]);
// #endregion --------------------- Utilities -----------------------------
And I'm including the "extra" utility classes for creating menus, sidebars, etc below for completeness:
var log = function() {
// return false;
var args = Array.prototype.slice.call(arguments);
// allowing functions delegates execution so we can save some non-debug cycles if code left in?
if(args[0] === true) Logger.log(_.map(args, function(v) { return _.isFunction(v) ? v() : v; }).join('; '));
else
_.each(args, function(v) {
Logger.log(_.isFunction(v) ? v() : v);
});
}
// #region --------------------- Menu -----------------------------
var gDocUiHelper = (function(P, un) {
P.addMenuToSheet = function addMenu(spreadsheet, title, items) {
var menu = ui.createMenu(title);
// make sure menu items are correct format
_.each(items, function(v,k) {
var err = [];
// provided in format [ [name, fn],... ] instead
if( _.isArray(v) ) {
if ( v.length === 2 ) {
menu.addItem(v[0], v[1]);
}
else {
err.push('Menu item ' + k + ' missing name or function: ' + v.join(';'))
}
}
else {
if( !v.name ) err.push('Menu item ' + k + ' lacks name');
if( !v.functionName ) err.push('Menu item ' + k + ' lacks function');
if(!err.length) menu.addItem(v.name, v.functionName);
}
if(err.length) {
log(err);
ui.alert(err.join('; '));
}
});
menu.addToUi();
};
// list of things to hook into
var initializers = {};
P.addMenu = function(menuTitle, menuItems) {
if(initializers[menuTitle] === un) {
initializers[menuTitle] = [];
}
initializers[menuTitle] = initializers[menuTitle].concat(menuItems);
};
P.createSidebar = function(title, content, options) {
var sidebar = HtmlService
.createHtmlOutput()
.setTitle(title)
.setWidth( (options && options.width) ? options.width : 350 /* pixels */);
sidebar.append(content);
if(options && options.on) DocumentApp.getUi().showSidebar(sidebar);
// else { sidebar.attach = function() { DocumentApp.getUi().showSidebar(this); }; } // should really attach to prototype...
return sidebar;
};
P.attachSidebar = function(sidebar) {
DocumentApp.getUi().showSidebar(sidebar);
};
P.onOpen = function() {
var spreadsheet = SpreadsheetApp.getActive();
log(initializers);
_.each(initializers, function(v,k) {
P.addMenuToSheet(spreadsheet, k, v);
});
};
return P;
})({});
// #endregion --------------------- Menu -----------------------------
/**
* A special function that runs when the spreadsheet is open, used to add a
* custom menu to the spreadsheet.
*/
function onOpen() {
gDocUiHelper.onOpen();
}
Had some trouble getting Mogsdad's solution to work. Specifically, it misses links that run to the end of their parent element, where there is no trailing non-link character to terminate them. I've implemented something which addresses this and returns a standard Range. Sharing here in case someone finds it useful.
function getAllLinks(element) {
var rangeBuilder = DocumentApp.getActiveDocument().newRange();
// Parse the text iteratively to find the start and end indices for each link
if (element.getType() === DocumentApp.ElementType.TEXT) {
var links = [];
var string = element.getText();
var previousUrl = null; // The URL of the previous character
var currentLink = null; // The latest link being built
for (var charIndex = 0; charIndex < string.length; charIndex++) {
var currentUrl = element.getLinkUrl(charIndex);
// New URL means create a new link
if (currentUrl !== null && previousUrl !== currentUrl) {
if (currentLink !== null) links.push(currentLink);
currentLink = {};
currentLink.url = String(currentUrl);
currentLink.startOffset = charIndex;
}
// In a URL means extend the end of the current link
if (currentUrl !== null) {
currentLink.endOffsetInclusive = charIndex;
}
// Not in a URL means close and push the link if ready
if (currentUrl === null) {
if (currentLink !== null) links.push(currentLink);
currentLink = null;
}
// End the loop and go again
previousUrl = currentUrl;
}
// Handle the end case when final character is a link
if (currentLink !== null) links.push(currentLink);
// Convert the links into a range before returning
links.forEach(function(link) {
rangeBuilder.addElement(element, link.startOffset, link.endOffsetInclusive);
});
}
// If not a text element then recursively get links from child elements
else if (element.getNumChildren) {
for (var i = 0; i < element.getNumChildren(); i++) {
rangeBuilder.addRange(getAllLinks(element.getChild(i)));
}
}
return rangeBuilder.build();
}
You are right: search and replace is not applicable here.
Use setLinkUrl(): https://developers.google.com/apps-script/reference/document/container-element#setLinkUrl(String)
Basically, you have to iterate through the elements recursively (elements can contain elements), and for each one:
use getLinkUrl() to get the oldText
if it is not null, call setLinkUrl(newText); this leaves the displayed text unchanged
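The steps above can be sketched as a short recursive function. This is an illustration under assumptions, not a tested add-on: replaceLinkText is a hypothetical name, and it only relies on the element methods getLinkUrl(), setLinkUrl(url), getNumChildren() and getChild(i). Note that the whole-element getLinkUrl() returns null when an element contains several different link values, so partially linked text would still need a per-offset approach like the ones in the other answers.

```javascript
// Hypothetical helper: walk the element tree and rewrite link URLs that
// contain oldText. Returns the number of elements whose URL was changed.
function replaceLinkText(element, oldText, newText) {
  var changed = 0;
  if (element.getLinkUrl) {
    var url = element.getLinkUrl();
    if (url && url.indexOf(oldText) !== -1) {
      // split/join replaces every literal occurrence of oldText.
      element.setLinkUrl(url.split(oldText).join(newText));
      changed++;
    }
  }
  // Containers (body, paragraphs, tables, ...) expose their children.
  if (element.getNumChildren) {
    for (var i = 0; i < element.getNumChildren(); i++) {
      changed += replaceLinkText(element.getChild(i), oldText, newText);
    }
  }
  return changed;
}
```

In a bound script it might be called as replaceLinkText(DocumentApp.getActiveDocument().getBody(), 'old.example', 'new.example'); the displayed text is untouched because only the link attribute is set.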
This Excel macro lists the links from a Word doc. You'd need to copy your data into a Word doc first.
Sub getLinks()
Dim wApp As Word.Application, wDoc As Word.Document
Dim i As Integer, r As Range
Const filePath = "C:\test\test.docx"
Set wApp = CreateObject("Word.Application")
'wApp.Visible = True
Set wDoc = wApp.Documents.Open(filePath)
Set r = Range("A1")
For i = 1 To wDoc.Hyperlinks.Count
r = wDoc.Hyperlinks(i).Address
Set r = r.Offset(1, 0)
Next i
wApp.Quit
Set wDoc = Nothing
Set wApp = Nothing
End Sub
Here's a quick and dirty way to accomplish the same goal with no scripting:
From Google Docs, save the document in RTF format.
In your editor of choice, edit the links in the RTF file (in my case, I wanted to modify all the hyperlinks, so I used Emacs and regexp-replace). Save the file when you're done.
Create a fresh, new Google Doc, and from the menu, select File>Open and open the RTF file. Docs will convert your edited RTF file back into a proper Google Doc, restoring all formatting.
Google Docs' RTF format is pretty complete--I haven't noticed any loss of fidelity in making the round trip, and it has the advantage of fully exposing all the hyperlinks, formatting, and everything else about the document in a form that's easy to edit and to apply regex tools to.
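If you would rather script the edit than use an editor, the same replacement can be done with a regular expression. This is a minimal sketch that assumes the links appear in the RTF as standard field instructions of the form HYPERLINK "url" (check your exported file, as the exact output was not verified here); replaceInRtfLinks is a hypothetical helper name.

```javascript
// Hypothetical helper: rewrite only the quoted URL that follows HYPERLINK,
// leaving the displayed link text and all other RTF content untouched.
function replaceInRtfLinks(rtf, oldText, newText) {
  return rtf.replace(/(HYPERLINK\s+")([^"]*)(")/g,
    function (match, open, url, close) {
      // split/join replaces every literal occurrence inside the URL.
      return open + url.split(oldText).join(newText) + close;
    });
}

var rtfIn = '{\\field{\\*\\fldinst{HYPERLINK "http://old.example.com/a"}}' +
  '{\\fldrslt old.example.com}}';
var rtfOut = replaceInRtfLinks(rtfIn, 'old.example.com', 'new.example.com');
// Only the URL inside the quotes changes; the visible text
// "old.example.com" after \fldrslt stays as it was.
```

Restricting the match to the quoted URL is what keeps the displayed text unchanged.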