Get InlineImage in Specific Tables from Google Document - google-apps-script

I encountered the following problem. I have a Google Document that contains a number of tables, and some of those tables contain inline images.
With the Body.getImages() function I should be able to get the images of the whole document (right?). But is there any way to get the images from a specific table or is there a way to determine in which tables the images retrieved by the Body.getImages() method are located?
In case you are wondering what this is used for: My Google Doc is used to store several multiple-choice exam questions, where each question is represented by a table. I am trying to write a script to export these questions to a specific format and I encountered the problem that some of those questions contain images.

Correct - body.getImages() will return an array of images.
We can use this array to find the table that contains each image. For each image, a recursive function calls getParent() to walk up the document tree until it reaches the enclosing Table, then logs that Table's child index within the body. If the table contains a "Question #" header, we can search for it with findText() and report which question the image belongs to.
function myFunction() {
  var doc = DocumentApp.getActiveDocument();
  var body = doc.getBody();
  var tables = body.getTables();
  var images = doc.getBody().getImages();
  Logger.log("Found " + images.length + " images");
  Logger.log("Found " + tables.length + " tables");
  // List the body child index of each table
  let tableList = [];
  tables.forEach(table => tableList.push(String(table.getParent().getChildIndex(table))));
  Logger.log("Tables at body element #s: ", tableList);
  function findQuestionNumber(element, index) {
    // Declared locally so the recursion does not leak a global variable
    let parent = element.getParent();
    // If the parent is the enclosing Table
    if (parent.getType() == DocumentApp.ElementType.TABLE) {
      // Find the question # from the Table
      // (findText() returns null when the table has no "Question" text)
      let range = parent.findText("Question");
      // Output where this image was found (the child index)
      Logger.log("found Image", String(index + 1), "in ", range.getElement().getParent().getText(), " at body element #", String(parent.getParent().getChildIndex(parent)));
      return;
    // Otherwise use recursion to continue up the tree until the parent Table is found
    } else {
      findQuestionNumber(parent, index);
    }
  }
  // Run the function for each image in the getImages() array
  images.forEach((element, index) => findQuestionNumber(element, index));
}

Unfortunately, running this on my document produces an error. Here is the execution log:
11:11:16 PM Notice Execution started
11:11:16 PM Info Found 26 images
11:11:16 PM Info Found 1 tables
11:11:16 PM Info Tables at body element #s:
11:11:17 PM Error
TypeError: Cannot read property 'getElement' of null
findQuestionNumber @ Code.gs:22
findQuestionNumber @ Code.gs:26
findQuestionNumber @ Code.gs:26
findQuestionNumber @ Code.gs:26
(anonymous) @ Code.gs:31
myFunction @ Code.gs:31
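Reading the trace: the recursion climbs three levels (image, paragraph, cell, row) and does reach a Table, but findText("Question") then returns null, which is what it does when nothing in that table matches, so range.getElement() throws. The empty "Tables at body element #s:" line is probably a separate issue: Logger.log treats its first argument as a format string, so extra arguments are dropped unless it contains placeholders, and concatenating the values avoids that. Below is a minimal sketch of a guarded lookup in plain JavaScript. The makeElement helper and the type strings are hypothetical stand-ins so the logic can run outside Apps Script; in a real script you would use the actual elements and DocumentApp.ElementType.TABLE.

```javascript
// Hypothetical stand-in for a Docs element: a type, a parent, and some text.
function makeElement(type, parent, text) {
  return {
    getType: () => type,
    getParent: () => parent || null,
    // Mimics findText(): a result object on a match, null otherwise.
    findText: (pattern) => {
      const match = (text || "").match(new RegExp(pattern));
      return match ? { getText: () => text } : null;
    },
  };
}

// Walk up the tree iteratively; return the enclosing table, or null if none.
function findEnclosingTable(element) {
  let current = element.getParent();
  while (current !== null && current.getType() !== "TABLE") {
    current = current.getParent();
  }
  return current;
}

// Describe where an image lives without assuming a "Question" header exists.
function describeImageLocation(image) {
  const table = findEnclosingTable(image);
  if (table === null) return "not inside a table";
  const hit = table.findText("Question");
  return hit === null ? "in a table without a Question header" : "in " + hit.getText();
}

// Build the chain body > table > row > cell > paragraph > image.
const body = makeElement("BODY_SECTION", null);
const table = makeElement("TABLE", body, "Question 3");
const row = makeElement("TABLE_ROW", table);
const cell = makeElement("TABLE_CELL", row);
const paragraph = makeElement("PARAGRAPH", cell);
const image = makeElement("INLINE_IMAGE", paragraph);
console.log(describeImageLocation(image));
```

If your tables are labelled with something other than the literal text "Question", the pattern passed to findText would need to change accordingly.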

footer.replaceText() does not replace my placeholder with new text

Can anyone explain why footer.replaceText() does not replace my placeholder with new text? It works for both header and body.
//opens the master report document which is now saved as the student's name and gets the contents of the header
let header = DocumentApp.openById(documentId).getHeader()
//opens the master report document which is now saved as the student's name and gets the contents of the body
let body = DocumentApp.openById(documentId).getBody()
//opens the master report document which is now saved as the student's name and gets the contents of the footer
let footer = DocumentApp.openById(documentId).getFooter()
header.replaceText('{{First Name}}', row[studentDetails.firstName])
body.replaceText('{{Maths Attainment}}', row[studentDetails.mathsAttainment])
footer.replaceText('{{Class Teacher}}', row[studentDetails.classTeacher])
I can't seem to find an answer on Stack Overflow that works.
If you're using the option 'Different first page' you can't get the footer and header of the first page in the common (documented) way.
Based on this answer try to get them this 'secret' way:
let first_header = header.getParent().getChild(3);
let first_footer = header.getParent().getChild(4);
And then you can change them as usual:
first_header.replaceText('{{First Name}}', row[studentDetails.firstName]);
first_footer.replaceText('{{Class Teacher}}', row[studentDetails.classTeacher]);
I would like to give credit to Mr. Amit Agarwal, who has provided a working solution to the above.
The code is below:
const replaceHeaderFooter = () => {
  // Returns the document with the specified ID
  const doc = DocumentApp.openById('DOCUMENT ID');
  // Retrieves the header's container element, which is DOCUMENT
  const parent = doc.getHeader().getParent();
  for (let i = 0; i < parent.getNumChildren(); i += 1) {
    // Retrieves the child element at the specified child index
    const child = parent.getChild(i);
    // Determine the exact type of a given child element
    const childType = child.getType();
    if (childType === DocumentApp.ElementType.HEADER_SECTION) {
      // Replaces all occurrences of a given text in regex pattern
      child.asHeaderSection().replaceText('{{Company}}', 'Digital Inspiration');
    } else if (childType === DocumentApp.ElementType.FOOTER_SECTION) {
      // Replaces all occurrences of a given text in regex pattern
      child.asFooterSection().replaceText('{{Copyright}}', '© Amit Agarwal');
    }
  }
  // Saves the current Document.
  // Causes pending updates to be flushed and applied.
  doc.saveAndClose();
};
Thank you!

How to replace words in Google doc while maintaining text style?

I want to scan words in a Google doc from left to right and replace the first occurrences of some keywords with a URL or a bbcode like tag wrapper around them.
I cannot use findText API because it's not simple regex finding but complex pattern matching involving lots of if else conditions involving business logic.
Here is how I want to solve this
let document = DocumentApp.getActiveDocument().getBody();
let paragraph = document.getParagraphs()[0];
let contents = paragraph.getText();
// makeAllTheNecessaryReplacemens has all the business logic to identify which keywords need to be changed
let newContents = makeAllTheNecessaryReplacemens(contents);
paragraph.setText(newContents);
The problem here is that text style gets wiped out and also makeAllTheNecessaryReplacemens cannot add hyperlinks to string text.
Please suggest a way to do this.
Proposed function
/**
 * This is a wrapper around the attribute functions.
 * It allows setting one attribute at a time,
 * based off a complete attribute object obtained
 * from another element. This makes it far more
 * reliable.
 */
const attributeKey = {
  FONT_SIZE : (o,s,e,a) => o.setFontSize(s,e,a),
  STRIKETHROUGH : (o,s,e,a) => o.setStrikethrough(s,e,a),
  FOREGROUND_COLOR : (o,s,e,a) => o.setForegroundColor(s,e,a),
  LINK_URL : (o,s,e,a) => o.setLinkUrl(s,e,a),
  UNDERLINE : (o,s,e,a) => o.setUnderline(s,e,a),
  BOLD : (o,s,e,a) => o.setBold(s,e,a),
  ITALIC : (o,s,e,a) => o.setItalic(s,e,a),
  BACKGROUND_COLOR : (o,s,e,a) => o.setBackgroundColor(s,e,a),
  FONT_FAMILY : (o,s,e,a) => o.setFontFamily(s,e,a)
}
/**
 * Replace textToReplace with replacementText.
 * Will retain formatting and hyperlinks.
 */
function replaceTextPlus(textToReplace, replacementText) {
  // Initializing
  let body = DocumentApp.getActiveDocument().getBody();
  let searchResult = body.findText(textToReplace);
  while (searchResult != null) {
    // Getting info about result
    let foundElement = searchResult.getElement();
    let start = searchResult.getStartOffset();
    let end = searchResult.getEndOffsetInclusive();
    // This returns a complete attributes object
    // Many attributes have null as a value
    let attributes = foundElement.getAttributes(start);
    // Replacing text
    foundElement.deleteText(start, end);
    foundElement.insertText(start, replacementText);
    // Setting new end index
    let newEnd = start + replacementText.length - 1;
    // Set attributes for new text, skipping over null values
    // This requires the constant defined at the top.
    for (let a in attributes) {
      if (attributes[a] != null) {
        attributeKey[a](foundElement, start, newEnd, attributes[a]);
      }
    }
    // Modifies the actual searchResult so that the next findText
    // starts at the NEW end index.
    try {
      let rangeBuilder = DocumentApp.getActiveDocument().newRange();
      rangeBuilder.addElement(foundElement, start, newEnd);
      searchResult = rangeBuilder.getRangeElements()[0];
    } catch (e) {
      Logger.log("End of Document");
      return null;
    }
    // Searches for the next result
    searchResult = body.findText(textToReplace, searchResult);
  }
}
Extending the findText API
This function relies on the findText API, but it adds in a few more steps.
1. Find the text.
2. Get the element containing the text.
3. Get the start and end indices of the text.
4. Get the attributes of the text (font, color, hyperlink etc).
5. Replace the text.
6. Update the end index.
7. Use the old attributes to update the new text.
You call it like this:
replaceTextPlus("Bing", "Google")
replaceTextPlus("occurrences", "happenings")
replaceTextPlus("text", "prefixedtext")
How to set the formatting and link attributes.
This relies on the attributes object returned by getAttributes, which looks something like this:
{
FOREGROUND_COLOR=#ff0000,
LINK_URL=null,
FONT_SIZE=null,
ITALIC=true,
STRIKETHROUGH=null,
FONT_FAMILY=null,
BOLD=null,
UNDERLINE=true,
BACKGROUND_COLOR=null
}
I tried to use setAttributes, but it was very unreliable: using it almost always resulted in some formatting loss.
To work around this I made an object, attributeKey, that wraps all the different functions for setting individual attributes, so that they can be called from this loop:
for (let a in attributes) {
  if (attributes[a] != null) {
    attributeKey[a](foundElement, start, newEnd, attributes[a]);
  }
}
This allows null values to be skipped which seems to have solved the unreliability problem. Perhaps the update buffer gets confused with many values.
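To make the dispatch idea easier to try outside of a document, here is a rough sketch in plain JavaScript. The makeRecorder helper is a hypothetical stand-in for a Text element that only records which setters were called, and the setters table is a trimmed copy of attributeKey. The extra check for unknown attribute names is my own assumption, added in case getAttributes ever returns a key the table does not cover.

```javascript
// Hypothetical stand-in for a Text element: it only records setter calls.
function makeRecorder() {
  const calls = [];
  const record = (name) => (start, end, value) => calls.push([name, start, end, value]);
  return {
    calls,
    setBold: record("setBold"),
    setItalic: record("setItalic"),
    setLinkUrl: record("setLinkUrl"),
  };
}

// Same shape as attributeKey, trimmed to three attributes for brevity.
const setters = {
  BOLD: (o, s, e, a) => o.setBold(s, e, a),
  ITALIC: (o, s, e, a) => o.setItalic(s, e, a),
  LINK_URL: (o, s, e, a) => o.setLinkUrl(s, e, a),
};

// Apply only attributes that have a value and a known setter.
// Returns the names that were skipped because no setter exists for them.
function applyAttributes(target, start, end, attributes) {
  const skipped = [];
  for (const name of Object.keys(attributes)) {
    const value = attributes[name];
    if (value === null || value === undefined) continue;
    if (typeof setters[name] !== "function") {
      skipped.push(name);
      continue;
    }
    setters[name](target, start, end, value);
  }
  return skipped;
}

const text = makeRecorder();
const skipped = applyAttributes(text, 0, 5, {
  BOLD: null,
  ITALIC: true,
  LINK_URL: "https://example.com",
  FONT_SIZE: 12,
});
console.log(text.calls, skipped);
```

Only the italic and link setters run here: BOLD is null and is skipped silently, while FONT_SIZE is reported back because this trimmed table has no setter for it.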
Limitations
This function uses the formatting of the first character of the found word. If the same word has different formatting within itself, for example "Hello" with a mix of normal, bold and italic letters, the replacement word will take the formatting of the first letter only. This could potentially be fixed by identifying the word and iterating over every single letter.

Output from an array in html page

I am trying to output the contents of an array within an array to a small area on an HTML page. I can only get one dimensional arrays to output.
Simplified, the intended array has a number of properties, but I am struggling to find the correct code to output an array nested inside an array.
Properties are:
ID(integer)
Location(string)
Postcode(String)
other properties may be added down the line.
To output the information I am using the following code (which I can only get to work on a single array - even if I change to using [i][x] )
document.write("<tr><td>ID " + i + " is:</td>");
document.write("<td>" + LocationArray[i] + "</td></tr>");
How do I correctly create an array capable of storing the information and then output a specific part of it? eg display the contents of LocationArray[2][3]
Is document.write an efficient method, or is there something better?
I put something together that could help you. To answer your question at the end about creating an array 'the right way': there are two possibilities:
Create an array of objects with unquoted property names: var locationsArray = [{ID:123,Location:'blabla',Postalcode:'1234'}];
Create an array of objects with string keys: var locationsArray = [{'ID':123,'Location':'blabla','Postalcode':'1234'}];
Both produce the same objects; quoting is only required when a key is not a valid identifier. In my example I used the first approach.
To your second question: document.write just writes at the end of the document. If you want to write to a specific area of the page, create a container (for example a <div>) and give it an id. Then set the innerHTML property of that container, as I did in my example.
HTML:
<div id="locations"></div>
<button onclick="printLocations()">Print Locations</button>
Javascript:
function printLocations() {
  var locationsArray = [{
      ID : 123,
      Location : 'Candyland',
      Postalcode : '1234'
    }, {
      ID : 456,
      Location : 'Middle-Earth',
      Postalcode : '4567'
    }
  ];
  var locationsHtml = '';
  // Use an index-based loop; for...in is meant for object keys, not array elements
  for (var index = 0; index < locationsArray.length; index++) {
    locationsHtml += 'ID: ' + locationsArray[index].ID + ', ' +
      'Location: ' + locationsArray[index].Location + ', ' +
      'Postalcode: ' + locationsArray[index].Postalcode + '<br />';
  }
  console.log(locationsHtml);
  document.getElementById('locations').innerHTML = locationsHtml;
}
If you just want to write a specific part of the array (in your example just one specific location) just use the index you want and access it the same way as in the for loop in my example:
var locationsHtml = locationsArray[1].ID + locationsArray[1].Location + etc...;
/*with string-keys: var locationsHtml = locationsArray[1]['ID'] + etc...;*/
document.getElementById('locations').innerHTML = locationsHtml;
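If you would rather keep a true array of arrays, as in the LocationArray[2][3] example from the question, a sketch along these lines may help. The sample rows and the buildRows helper are made up for illustration. Note that indices are zero-based, so with three fields per row the valid second index runs from 0 to 2.

```javascript
// Each inner array is one location: [ID, Location, Postcode] (sample data, assumed).
var LocationArray = [
  [101, "Candyland", "1234"],
  [102, "Middle-Earth", "4567"],
  [103, "Neverland", "8910"]
];

// Build all table rows as one string instead of calling document.write per cell.
function buildRows(rows) {
  var html = "";
  for (var i = 0; i < rows.length; i++) {
    html += "<tr>";
    for (var x = 0; x < rows[i].length; x++) {
      html += "<td>" + rows[i][x] + "</td>";
    }
    html += "</tr>";
  }
  return html;
}

// A single cell: third row, second field.
console.log(LocationArray[2][1]);
console.log(buildRows(LocationArray));
// In a page you could then write:
// document.getElementById("locations").innerHTML = "<table>" + buildRows(LocationArray) + "</table>";
```

The array-of-objects form from the answer above is usually easier to read, because LocationArray[2].Location says more than LocationArray[2][1], but both work.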

How do I search sub-folders and sub-sub-folders in Google Drive?

This is a commonly asked question.
The scenario is:-
folderA____folderA1____folderA1a
       \____folderA2____folderA2a
                    \___folderA2b
... and the question is how do I list all the files in all of the folders under the root folderA.
EDIT: April 2020. Google has announced that multi-parent files are being disabled from September 2020. This alters the narrative below and means Alternative 2 is no longer an option. It might be possible to implement Alternative 2 using shortcuts. I will update this answer further as I test the new restrictions/features.
We are all used to the idea of folders (aka directories) in Windows/*nix etc. In the real world, a folder is a container, into which documents are placed. It is also possible to place smaller folders inside bigger folders. Thus the big folder can be thought of as containing all of the documents inside its smaller children folders.
However, in Google Drive, a Folder is NOT a container, so much so that in the first release of Google Drive, they weren't even called Folders, they were called Collections. A Folder is simply a File with (a) no contents, and (b) a special mime-type (application/vnd.google-apps.folder). The way Folders are used is exactly the same way that tags (aka labels) are used. The best way to understand this is to consider GMail. If you look at the top of an open mail item, you see two icons. A folder with the tooltip "Move to" and a label with the tooltip "Labels". Click on either of these and the same dialogue box appears and is all about labels. Your labels are listed down the left hand side, in a tree display that looks a lot like folders. Importantly, a mail item can have multiple labels, or you could say, a mail item can be in multiple folders. Google Drive's Folders work in exactly the same way that GMail labels work.
Having established that a Folder is simply a label, there is nothing stopping you from organising your labels in a hierarchy that resembles a folder tree, in fact this is the most common way of doing so.
It should now be clear that a file (let's call it MyFile) in folderA2b is NOT a child or grandchild of folderA. It is simply a file with a label (confusingly called a Parent) of "folderA2b".
OK, so how DO I get all the files "under" folderA?
Alternative 1. Recursion
The temptation would be to list the children of folderA and, for any children that are folders, recursively list their children, rinse, repeat. In a very small number of cases this might be the best approach, but for most it has the following problem:
It is woefully time consuming to do a server round trip for each sub folder. This does of course depend on the size of your tree, so if you can guarantee that your tree size is small, it could be OK.
Alternative 2. The common parent
This works best if all of the files are being created by your app (i.e. you are using drive.file scope). As well as the folder hierarchy above, create a dummy parent folder called, say, "MyAppCommonParent". As you create each file as a child of its particular Folder, you also make it a child of MyAppCommonParent. This becomes a lot more intuitive if you remember to think of Folders as labels. You can now easily retrieve all descendants by simply querying 'MyAppCommonParent' in parents.
Alternative 3. Folders first
Start by getting all folders. Yep, all of them. Once you have them all in memory, you can crawl through their parents properties and build your tree structure and list of Folder IDs. You can then do a single files.list?q='folderA' in parents or 'folderA1' in parents or 'folderA1a' in parents.... Using this technique you can get everything in two http calls.
The pseudo code for option 3 is a bit like...
// get all folders from Drive files.list?q=mimetype=application/vnd.google-apps.folder and trashed=false&fields=parents,name
// store in a Map, keyed by ID
// find the entry for folderA and note the ID
// find any entries where the ID is in the parents, note their IDs
// for each such entry, repeat recursively
// use all of the IDs noted above to construct a ...
// files.list?q='folderA-ID' in parents or 'folderA1-ID' in parents or 'folderA1a-ID' in parents...
Alternative 2 is the most efficient, but only works if you have control of file creation. Alternative 3 is generally more efficient than Alternative 1, but there may be certain small tree sizes where 1 is best.
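As a rough illustration of the in-memory part of Alternative 3, here is a sketch in plain JavaScript. It assumes the first call has already returned every folder with its id and parents; the sample data and the helper names (indexByParent, collectFolderIds, buildParentsQuery) are made up for the example, and no Drive call is made.

```javascript
// Sample folder metadata as the first files.list call might return it (IDs are made up).
const folders = [
  { id: "A", name: "folderA", parents: ["root"] },
  { id: "A1", name: "folderA1", parents: ["A"] },
  { id: "A1a", name: "folderA1a", parents: ["A1"] },
  { id: "A2", name: "folderA2", parents: ["A"] },
  { id: "B", name: "unrelated", parents: ["root"] },
];

// Index folder IDs by parent ID so each lookup is a single Map access.
function indexByParent(allFolders) {
  const childrenOf = new Map();
  for (const folder of allFolders) {
    for (const parentId of folder.parents || []) {
      if (!childrenOf.has(parentId)) childrenOf.set(parentId, []);
      childrenOf.get(parentId).push(folder.id);
    }
  }
  return childrenOf;
}

// Collect the start folder and every folder below it, entirely in memory.
function collectFolderIds(startId, allFolders) {
  const childrenOf = indexByParent(allFolders);
  const found = [];
  const pending = [startId];
  const seen = new Set();
  while (pending.length > 0) {
    const id = pending.pop();
    if (seen.has(id)) continue; // guards against cycles or repeated parents
    seen.add(id);
    found.push(id);
    pending.push(...(childrenOf.get(id) || []));
  }
  return found;
}

// Turn the IDs into the q parameter for the second files.list call.
function buildParentsQuery(ids) {
  return ids.map((id) => `'${id}' in parents`).join(" or ");
}

const ids = collectFolderIds("A", folders);
console.log(buildParentsQuery(ids));
```

The resulting string is what would go into the q parameter of the second request. For very large trees the list of IDs would probably need to be split across several queries.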
Sharing a Python solution to the excellent Alternative 3 by @pinoyyid, above, in case it's useful to anyone. I'm not a developer so it's probably hopelessly un-pythonic... but it works, only makes 2 API calls, and is pretty quick.
Get a master list of all the folders in a drive.
Test whether the folder-to-search is a parent (ie. it has subfolders).
Iterate through subfolders of the folder-to-search testing whether they too are parents.
Build a Google Drive file query with one '<folder-id>' in parents segment per subfolder found.
Interestingly, Google Drive seems to have a hard limit of 599 '<folder-id>' in parents segments per query, so if your folder-to-search has more subfolders than this, you need to chunk the list.
# drive_api_ref is assumed to be an authorised Drive v3 service object created elsewhere,
# e.g. with googleapiclient.discovery.build('drive', 'v3', credentials=creds).
FOLDER_TO_SEARCH = '123456789'  # ID of folder to search
DRIVE_ID = '654321'  # ID of shared drive in which it lives
MAX_PARENTS = 500  # Limit set safely below Google max of 599 parents per query.


def get_all_folders_in_drive():
    """
    Return a dictionary of all the folder IDs in a drive mapped to their parent folder IDs (or to the
    drive itself if a top-level folder). That is, flatten the entire folder structure.
    """
    folders_in_drive_dict = {}
    page_token = None
    max_allowed_page_size = 1000
    just_folders = "trashed = false and mimeType = 'application/vnd.google-apps.folder'"
    while True:
        results = drive_api_ref.files().list(
            pageSize=max_allowed_page_size,
            fields="nextPageToken, files(id, name, mimeType, parents)",
            includeItemsFromAllDrives=True, supportsAllDrives=True,
            corpora='drive',
            driveId=DRIVE_ID,
            pageToken=page_token,
            q=just_folders).execute()
        folders = results.get('files', [])
        page_token = results.get('nextPageToken', None)
        for folder in folders:
            folders_in_drive_dict[folder['id']] = folder['parents'][0]
        if page_token is None:
            break
    return folders_in_drive_dict


def get_subfolders_of_folder(folder_to_search, all_folders):
    """
    Yield subfolders of the folder-to-search, and then subsubfolders etc. Must be called by an iterator.
    :param all_folders: The dictionary returned by :func:`get_all_folders_in_drive`.
    """
    temp_list = [k for k, v in all_folders.items() if v == folder_to_search]  # Get all subfolders
    for sub_folder in temp_list:  # For each subfolder...
        yield sub_folder  # Return it
        yield from get_subfolders_of_folder(sub_folder, all_folders)  # Get subsubfolders etc


def get_relevant_files(relevant_folders):
    """
    Get files under the folder-to-search and all its subfolders.
    """
    relevant_files = {}
    # Split the folder list into chunks that stay below the per-query limit
    chunked_relevant_folders_list = [relevant_folders[i:i + MAX_PARENTS] for i in
                                     range(0, len(relevant_folders), MAX_PARENTS)]
    for folder_list in chunked_relevant_folders_list:
        query_term = ' in parents or '.join('"{0}"'.format(f) for f in folder_list) + ' in parents'
        relevant_files.update(get_all_files_in_folders(query_term))
    return relevant_files


def get_all_files_in_folders(parent_folders):
    """
    Return a dictionary of file IDs mapped to file names for the specified parent folders.
    """
    files_under_folder_dict = {}
    page_token = None
    max_allowed_page_size = 1000
    just_files = f"mimeType != 'application/vnd.google-apps.folder' and trashed = false and ({parent_folders})"
    while True:
        results = drive_api_ref.files().list(
            pageSize=max_allowed_page_size,
            fields="nextPageToken, files(id, name, mimeType, parents)",
            includeItemsFromAllDrives=True, supportsAllDrives=True,
            corpora='drive',
            driveId=DRIVE_ID,
            pageToken=page_token,
            q=just_files).execute()
        files = results.get('files', [])
        page_token = results.get('nextPageToken', None)
        for file in files:
            files_under_folder_dict[file['id']] = file['name']
        if page_token is None:
            break
    return files_under_folder_dict


if __name__ == "__main__":
    all_folders_dict = get_all_folders_in_drive()  # Flatten folder structure
    relevant_folders_list = [FOLDER_TO_SEARCH]  # Start with the folder-to-archive
    for folder in get_subfolders_of_folder(FOLDER_TO_SEARCH, all_folders_dict):
        relevant_folders_list.append(folder)  # Recursively search for subfolders
    relevant_files_dict = get_relevant_files(relevant_folders_list)  # Get the files
Sharing a JavaScript solution that uses recursion to build an array of folder names for each file, starting with the first-level folder and moving down the hierarchy. The array is built by recursively following the parent IDs of the file in question.
The extract below makes 3 separate queries to the gapi:
get the root folder id
get a list of folders
get a list of files
The code then iterates through the list of files, creating an array of folder names for each one.
const { google } = require('googleapis')
const gOAuth = require('./googleOAuth')

// resolve the promises for getting G files and folders
const getGFilePaths = async () => {
  // getGfiles() resolves to [files, folders, rootFolders]; call it once and destructure
  const [gFiles, gFolders, gRootFolders] = await getGfiles()
  // the parent of any first-level folder is the root folder
  const gRootFolder = gRootFolders[0]['parents'][0]
  // add a new `path` key to each file holding its array of folder names
  return gFiles
    .filter((file) => file.hasOwnProperty('parents'))
    .map((file) => ({...file, path: makePathArray(gFolders, file['parents'][0], gRootFolder)}))
}

// recursive function to build an array of the file paths top -> bottom
const makePathArray = (folders, fileParent, rootFolder) => {
  if (fileParent === rootFolder) { return [] }
  const filteredFolders = folders.filter((f) => f.id === fileParent)
  if (filteredFolders.length >= 1 && filteredFolders[0].hasOwnProperty('parents')) {
    // pass rootFolder along so the recursion stops at the root
    const path = makePathArray(folders, filteredFolders[0]['parents'][0], rootFolder)
    path.push(filteredFolders[0]['name'])
    return path
  }
  return []
}

// get meta-data list of files from gDrive, with query parameters
const getGfiles = async () => {
  try {
    const getRootFolder = getGdriveList({corpora: 'user', includeItemsFromAllDrives: false,
      fields: 'files(name, parents)',
      q: "'root' in parents and trashed = false and mimeType = 'application/vnd.google-apps.folder'"})
    const getFolders = getGdriveList({corpora: 'user', includeItemsFromAllDrives: false,
      fields: 'files(id,name,parents), nextPageToken',
      q: "trashed = false and mimeType = 'application/vnd.google-apps.folder'"})
    const getFiles = getGdriveList({corpora: 'user', includeItemsFromAllDrives: false,
      fields: 'files(id,name,parents, mimeType, fullFileExtension, webContentLink, exportLinks, modifiedTime), nextPageToken',
      q: "trashed = false and mimeType != 'application/vnd.google-apps.folder'"})
    // await here so that a rejected request is actually caught below
    return await Promise.all([getFiles, getFolders, getRootFolder])
  } catch (error) {
    throw new Error(`Error in retrieving a file response from Google Drive: ${error}`)
  }
}

// call gDrive to get file meta-data. Results come back in pages; all pages are collected into a single array
const getGdriveList = async (params) => {
  const gKeys = await gOAuth.get()
  const drive = google.drive({version: 'v3', auth: gKeys})
  let list = []
  let nextPgToken
  do {
    let res = await drive.files.list(params)
    list.push(...res.data.files)
    nextPgToken = res.data.nextPageToken
    params.pageToken = nextPgToken
  } while (nextPgToken)
  return list
}
The following works very well but requires an additional call to the API.
It shares the root folder, does a search for files shared with that user, then removes the share. This works great in our production environments.
var userPermission = new Permission()
{
    Type = "user",
    Role = "reader",
    EmailAddress = "AnyEmailAddress"
};
var request = service.Permissions.Create(userPermission, rootFolderID);
var result = request.ExecuteAsync().ContinueWith(t =>
{
    if (t.Exception == null)
    {
        // Read the result only after checking that the share succeeded
        Permission permission = t.Result;
        //Do your search here
        // make sure you add 'AnyEmailAddress' in readers
        service.Files.List......
        // then remove the share
        var requestDeletePermission = service.Permissions.Delete(rootFolderID, permission.Id);
        requestDeletePermission.Execute();
    }
});
For Google Apps Script, I've written this function:
function getSubFolderIdsByFolderId(folderId, result = []) {
  let folder = DriveApp.getFolderById(folderId);
  let folders = folder.getFolders();
  if (folders && folders.hasNext()) {
    while (folders.hasNext()) {
      let f = folders.next();
      let childFolderId = f.getId();
      result.push(childFolderId);
      result = getSubFolderIdsByFolderId(childFolderId, result);
    }
  }
  return result.filter(onlyUnique);
}
function onlyUnique(value, index, self) {
  return self.indexOf(value) === index;
}
With this call:
const subFolderIds = getSubFolderIdsByFolderId('1-id-of-the-root-folder-to-check')
And this for loop:
let q = [];
for (let i in subFolderIds) {
  let subFolderId = subFolderIds[i];
  q.push('"' + subFolderId + '" in parents');
}
if (q.length > 0) {
  q = '(' + q.join(' or ') + ') and';
} else {
  q = '';
}
I get the required query part, for the DriveApp.searchFiles call.
A major disadvantage of this approach is the number of requests it makes and the time you have to wait for the complete list, which depends on the size of the root directory. I would not call this an ideal solution!
Caching might improve performance for subsequent calls, for example by taking the modification date into account in the Drive API query.
I'm curious because the browser version of Google Drive can search recursively within folders, and it does not take nearly as long as my approach.

How do I find the page number/number of pages in a document?

I want to create a new document based on a template and need to know when an insertion or append results in a new page in the final printed output. Is there any property or attribute, e.g. the number of pages, that can be used for this?
I've searched for this a lot in the past and I don't think there's any property or other way to get page information.
The solution I use is to insert page breaks in my template or via the script, using my own knowledge of how my template works, i.e. how much space it takes as I iterate, etc.
I then know which page I am on by counting the page breaks.
Anyway, you could file an enhancement request on the issue tracker.
One way to get total number of pages:
function countPages() {
  // Export the document as a PDF and read it as text
  var blob = DocumentApp.getActiveDocument().getAs("application/pdf");
  var data = blob.getDataAsString();
  var re = /Pages\/Count (\d+)/g;
  var match;
  var pages = 0;
  // Keep the largest count found in the PDF data
  while ((match = re.exec(data)) !== null) {
    Logger.log("MATCH = " + match[1]);
    var value = parseInt(match[1], 10);
    if (value > pages) {
      pages = value;
    }
  }
  Logger.log("pages = " + pages);
  return pages;
}
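The scanning step can be tried on its own without a document. The sketch below is plain JavaScript with a made-up text fragment standing in for the exported PDF data; the maxPageCount name and the sample string are assumptions for illustration. It keeps the largest value because a PDF may contain several page-tree nodes, each with its own count, and the top-level node carries the total.

```javascript
// Scan PDF-like text for "Pages/Count N" and keep the largest N found.
function maxPageCount(pdfText) {
  const re = /Pages\/Count (\d+)/g;
  let pages = 0;
  let match;
  while ((match = re.exec(pdfText)) !== null) {
    pages = Math.max(pages, parseInt(match[1], 10));
  }
  return pages;
}

// Made-up fragment; real PDF data is far longer and may be laid out differently.
const sample = "<</Type/Pages/Count 2/Kids[3 0 R]>> <</Type/Pages/Count 7/Kids[1 0 R 5 0 R]>>";
console.log(maxPageCount(sample));
```

This depends on how the exporter happens to write the page tree, so if the pattern never matches for your documents the function simply returns 0.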