Is there a method in Google Apps Script that returns the word count of a Google Document?
Let's say I'm writing a report that has a strict word-count limit. It's quite precise: exactly 1.8k - 2k words (and it's not just a single case, but many...).
In Microsoft Office Word there was a handy status bar at the bottom of the page that automatically updated the word count for me, so I tried to make one using Google Apps Script.
Writing a function that pulls the whole text out of the current document and then recounts the words several times a minute feels like nonsense to me. It's completely inefficient and makes the CPU run for nothing, but I couldn't find a word-count function in the Docs reference.
Ctrl+Shift+C opens a pop-up that contains it, which means that a function that returns the total word count of a Google Document definitely exists...
But I can't find it!
Sigh... I spent a few hours digging through Google, but I simply cannot find it. Please help!
Wrote a little snippet that might help.
function myFunction() {
  var space = " ";
  var text = DocumentApp.getActiveDocument().getBody().getText();
  var words = text.replace(/\s+/g, space).split(space);
  Logger.log(words.length);
}
I understand that the request is for a built-in function, which I looked for as well but couldn't find anywhere in the documentation. I had to use polling.
I started with a script like Amit's, but found that I was never matching Google's word count. This is what I had to do to get it to work. I know this can't be efficient, but it now matches the Google Docs count most of the time. What I had to do was clean/rebuild the string first, then count it.
function countWords() {
  var s = DocumentApp.getActiveDocument().getBody().getText();
  // This function kept returning "1" when the doc was blank,
  // so this is how I stopped having it return 1.
  if (s.length === 0)
    return 0;
  // A simple \n replacement didn't work, neither did \s; not sure why
  s = s.replace(/\r\n|\r|\n/g, " ");
  // In cases where you have "...last word.First word..."
  // it doesn't count the two words around the period,
  // so I replace all punctuation with a space
  var punctuationless = s.replace(/[.,\/#!$%\^&\*;:{}=\-_`~()"?“”]/g," ");
  // Finally, trim it down to single spaces (not sure this even matters)
  var finalString = punctuationless.replace(/\s{2,}/g," ");
  // Actually count it
  var count = finalString.trim().split(/\s+/).length;
  return count;
}
I think this function probably covers most cases for word count with English characters. If I overlooked something, please comment.
function testTheFunction(){
  var myDoc = DocumentApp.openByUrl('https://docs.google.com/document/d/?????/edit');
  Logger.log(countWordsInDocument(myDoc));
}
function countWordsInDocument(theDoc){
  var theText = theDoc.getBody().getText();
  var theRegex = new RegExp("[A-Za-z]"); // or include other ranges for other languages or numbers
  var wordStarted = false;
  var theCount = 0;
  for(var i=0;i<theText.length;i++){
    var theLetter = theText.slice(i,i+1);
    if(theRegex.test(theLetter)){
      if(!wordStarted){
        wordStarted=true;
        theCount++;
      }
    }else if(wordStarted){
      wordStarted=false;
    }
  }
  return theCount;
}
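For what it's worth, the letter-by-letter loop above counts maximal runs of A-Z/a-z, and the same idea can probably be written as a single regex match. This is only a sketch: countLetterRuns is a made-up name, and it takes the text as a plain string, so in Apps Script you would pass it something like theDoc.getBody().getText().

```javascript
// Hypothetical helper (not part of any Google API): counts "words" as
// maximal runs of ASCII letters, like the character loop above.
function countLetterRuns(text) {
  // match() returns null when nothing matches, so guard for empty documents.
  var matches = text.match(/[A-Za-z]+/g);
  return matches ? matches.length : 0;
}
```

Like the loop version, this treats anything that is not a letter as a separator, so a contraction such as "don't" counts as two words and numbers are ignored unless you widen the character class.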
I'm trying to get a collection of files where a user (let's use billyTheUser@gmail.com) is an editor.
I know this can be done almost instantly in the Google Drive front end by searching for to:billyTheUser@gmail.com in the Drive search bar.
I presume this can be done in Google Apps Script, but maybe I'm wrong. I figured DriveApp.searchFiles would work, but I'm having trouble structuring the query string. I've looked at the Google SDK documentation and am guessing I'm doing something wrong with how the in operator is matched to the user string. Below are the approaches I've taken; however, if there's a different way to collect files by user, I'd be happy to change my approach.
var files = DriveApp.searchFiles(
  // I would expect this to work, but it doesn't return values
  'writers in "billyTheUser@gmail.com"');
// Tried these just experimenting. None return values:
//   'writers in "to:billyTheUser@gmail.com"'
//   'to:billyTheUser@gmail.com'
// This is just a test to confirm that some string searches do work:
//   'modifiedDate > "2013-02-28" and title contains "untitled"'
Try flipping the operands within the in clause to read as:
var files = DriveApp.searchFiles('"billyTheUser@gmail.com" in writers');
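If the address comes from a variable, a small helper can build the query in the same "value in writers" order. This is a rough sketch: buildWritersQuery is a hypothetical name, and refusing addresses that contain a double quote or backslash is my own cautious assumption rather than a documented Drive rule.

```javascript
// Hypothetical helper: returns a Drive search query that matches files
// on which the given address is listed as a writer (editor).
function buildWritersQuery(email) {
  var address = String(email).trim();
  // Assumption: reject characters that could break out of the quoted value.
  if (address === '' || /["\\]/.test(address)) {
    throw new Error('Unsupported e-mail address: ' + email);
  }
  // The value goes on the left of "in", the collection (writers) on the right.
  return '"' + address + '" in writers';
}
```

In a script you would then call DriveApp.searchFiles(buildWritersQuery(someAddress)) and walk the returned iterator with hasNext() and next().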
Thanks @theAddonDepot! To illustrate specifically how the accepted answer is useful, I used it to help build a spreadsheet for controlling files shared with various users. The source code for the full procedure is at the bottom of this post. It can be used directly within this Google Sheet if you copy it.
The final result works rather nicely for listing files as rows and their properties as columns (i.e. last modified, security, descriptions, etc.).
The ultimate purpose is to be able to update a large number of files without impacting other users (a use case for a sudden need to immediately revoke access: layoffs, acquisition, divorce, etc.).
//code for looking up files by security
//Posted on Stack Overflow here: https://stackoverflow.com/questions/62940196/return-collection-of-google-drive-files-shared-with-specific-user
//sample google File here: https://docs.google.com/spreadsheets/d/1jSl_ZxRVAIh9ULQLy-2e1FdnQpT6207JjFoDq60kj6Q/edit?usp=sharing
const ss = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("FileList");
const clearRange = true;
//const clearRange = SpreadsheetApp.getActiveSpreadsheet().getRangeByName("ClearRange").getValue();
//if you have the named range setup.
function runReport() {
//var theEmail= SpreadsheetApp.getActiveSpreadsheet().getRangeByName("emailFromExcel").getValue();
//or
var theEmail = 'billyTheUser@gmail.com';
findFilesByUser(theEmail);
}
function findFilesByUser(theUserEmail) {
if(clearRange){
ss.getDataRange().offset(1,0).deleteCells(SpreadsheetApp.Dimension.ROWS)
}
var someFiles = DriveApp.searchFiles('"' + theUserEmail + '" in writers');
var aListOfFiles = []
while(someFiles.hasNext()){
var aFile = someFiles.next();
aListOfFiles.push([aFile.getId()
,aFile.getName()
,aFile.getDescription()
,aFile.getSharingAccess()
,aFile.getSharingPermission()
,listEmails(aFile.getEditors())
,listEmails(aFile.getViewers())
,aFile.getMimeType().replace('application/','').replace('vnd.google-apps.','')
,aFile.getDateCreated()
,aFile.getLastUpdated()
,aFile.getSize()
,aFile.getUrl()
,aFile.getDownloadUrl()
])
}
if(aListOfFiles.length==0){
// setValues() needs a 2D array, so wrap the message in a one-cell row
aListOfFiles.push(["no files for " + theUserEmail]);
}
ss.getRange(ss.getDataRange().getLastRow()+1,1, aListOfFiles.length, aListOfFiles[0].length).setValues(aListOfFiles);
}
function listEmails(thePeople){
var aList = thePeople;
for (var i = 0; i < aList.length;i++){
aList[i] = aList[i].getEmail();
}
return aList.toString();
}
I'm a vanilla C guy, and I'm just having trouble wrapping my head around Google Apps Script in a Google Sheets project. I want to copy/paste a blob of text into cell (1, 1), press a button, then pull data out of that string and turn it into a pretty table. I'm just having trouble parsing the string...
My thought is to search for the word "DUG", then give me the word after it.
The format of the blob is:
garbage garbage DUG Username garbage garbage DUG Usernumber garbage garbage DUG Username garbage garbage DUG Usernumber (etc etc).
The problem: I can locate the first instance of the word DUG, but I can't seem to chop the string at that location. I've heard to use left(), right(), or mid(), but it's saying those functions don't exist. It seems like I'm missing something easy. Any tips for a noob?
Thank you so much!
// function runs when button is clicked
function SortCustomerList() {
var spreadsheet = SpreadsheetApp.getActiveSheet();
var sheetName = spreadsheet.getName();
var pastedDataRange = spreadsheet.getRange(1, 1); // location of the pasted data
var pastedData = pastedDataRange.getValue(); // get the data from that cell
var pastedText = pastedData.toString(); // convert that data into a string
// returns an index location of where "DUG" starts in the string
var foundLoc = pastedText.indexOf("DUG");
// show me that location for testing (number. int?) ***works!!***
spreadsheet.getRange(4, 1).activate();
spreadsheet.getCurrentCell().setValue(foundLoc);
// i want to chop the string at "DUG", show me everything to the right, starting where you found "DUG"
// ****doesnt work. returns the entire paste, not chopped
spreadsheet.getRange(10, 1).activate();
spreadsheet.getCurrentCell().setValue(pastedText.split(foundLoc));
// future: rather than the rest of the string, just give me the word after "DUG"
// set index (foundLoc) to AFTER the "DUG Username", to find the next instance of "DUG"
// loop it, until there are no more instances of the word "DUG"
// put it into pretty rows and columns
spreadsheet.getRange(5, 1).activate();
spreadsheet.getCurrentCell().setValue('Test Ran!');
};
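It looks like the tool I needed is substring(). split() treats its argument as a separator to split on, not as a position, so split(foundLoc) never chopped the string at that index. Just replace split() with substring() and it "just works": pastedText.substring(foundLoc) returns everything from "DUG" onward. Thank you to the friend who helped, and thank you all for the help I've looked up over the years. Cheers!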
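For the "future" part of the question (give me just the word after each "DUG", then loop), something along these lines might work. It is a minimal sketch under assumptions: wordsAfterMarker is a made-up name, words are taken to be separated by whitespace, and the marker is matched as a plain substring, so it would also match "DUG" inside a longer word.

```javascript
// Hypothetical helper: collects the word that follows each occurrence
// of `marker` in `text`.
function wordsAfterMarker(text, marker) {
  var results = [];
  var pos = text.indexOf(marker);
  while (pos !== -1) {
    // substring() chops the string just past this occurrence of the marker.
    var rest = text.substring(pos + marker.length).trim();
    // The first whitespace-separated token is the word we want.
    var next = rest.split(/\s+/)[0];
    if (next) {
      results.push(next);
    }
    // Continue searching after the marker we just handled.
    pos = text.indexOf(marker, pos + marker.length);
  }
  return results;
}
```

Each returned word could then be written to its own row, for example by mapping the array to one-element rows and passing that to setValues() on a range of matching size.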
This question is an extension from another.
Apply basic filter to multiple values in a spreadsheet column
I am experiencing an error, specifically Service error: Spreadsheets (line 8, file "Filter") with the following code:
function testFilter() {
var ss = SpreadsheetApp.getActive();
var monthlyDetailSht = ss.getSheetByName("Monthly_Detail");
var filterRange = monthlyDetailSht.getRange(2,12,359,1).getValues(); //Get L column values
var hidden = getHiddenValueArray2(filterRange,["Apple"]); //get values except Apple
var filterCriteria = SpreadsheetApp.newFilterCriteria().setHiddenValues(hidden).build();
var rang = monthlyDetailSht.getDataRange();
var filter = rang.getFilter() || rang.createFilter();// getFilter already available or create a new one
//remove filter and flush
if(monthlyDetailSht.getFilter() != null){monthlyDetailSht.getFilter().remove();}
SpreadsheetApp.flush();
filter.setColumnFilterCriteria(12, filterCriteria);
};
//flattens and strips column L values of all the values in the visible value array
function getHiddenValueArray2(colValueArr,visibleValueArr){
var flatArr = colValueArr.map(function(e){return e[0];}); //Flatten column L
visibleValueArr.forEach(function(e){ //For each value in visible array
var i = flatArr.indexOf(e.toString());
while (i != -1){ //if flatArray has the visible value
flatArr.splice(i,1); //splice(delete) it
i = flatArr.indexOf(e.toString());
}
});
return flatArr;
}
I have used a Logger.log(hidden) to capture the values returned by the function and it is a list of all of the other "fruits" repeated as many times as they are available in column L. I am using fruits as a substitute for the sensitive data.
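Since the logged list repeats each value once per row, a variant that collects every hidden value only once keeps the array passed to setHiddenValues() much shorter. This is only a sketch: getUniqueHiddenValues is a hypothetical name, it compares everything as strings (an assumption about the column data), and nothing in this thread shows that the duplicates were what triggered the service error.

```javascript
// Hypothetical variant of getHiddenValueArray2: returns each value of the
// column that is NOT in the visible list, once, in first-seen order.
function getUniqueHiddenValues(colValueArr, visibleValueArr) {
  var visible = visibleValueArr.map(String);
  var seen = Object.create(null); // plain lookup table without inherited keys
  var hidden = [];
  colValueArr.forEach(function (row) {
    var value = String(row[0]); // getValues() returns one-element rows for a single column
    if (visible.indexOf(value) === -1 && !seen[value]) {
      seen[value] = true;
      hidden.push(value);
    }
  });
  return hidden;
}
```

It takes the same two arguments as the original function, so it could be swapped in at the call site in testFilter() to try it out.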
So here goes my question. Why am I getting that error now when it was working perfectly fine for a couple of days? How can I correct this?
Attempted fixes:
I've tried to add rows to the end of my data. Did not fix.
I tried removing filter, flushing, setting filter. Did not fix. (updated code above with what I did to flush in case anyone else is interested.)
It's working now. A couple of things I want to note for people who stumble upon this through their Google searches. First, the issue was in fact an error on Google's side. The same code I have above now works. I did not change it.
Second, I was able to record the filtering through the macro recorder, and that code worked when my original code did not. This may help people who are on a time crunch and can't wait for Google to get their stuff together. I'm still not sure what specifically in my original code caused the error, but my guess is that it does not matter. I've dedicated a full day to researching this error, and it seems sporadic, with no single culprit. My issue may not be the same as yours if it happens in the future.
Hope that helps!
First, thanks to Serge insas for enough help to even write this script!
The script runs, but doesn't give me anything like the desired result.
What I want to do is
(1) Create a new document in a given folder(Let's call it 'myfolder') and write a title for it.
That works, sort of. I get a document, but its icon is a tiny image of a doc file with its corner turned over. It only opens in the viewer. It does have the title, but nothing that I tried to write to it subsequently.
(2) Get an array of all the files in 'myfolder'. That's where Serge's help came in. They need to be converted into readable documents. I THINK that worked (more on this later). At least the debugger did not throw an error. I ended up with a 'contents' array.
(3) For each of those documents, get the file name and extract the Table of Contents. Append each of those to the doc created in (1). To do this, I used a for loop which iterated from one to contents.length. Now the FIRST problem arose. Whatever contents.length brought back wasn't right, because the next loop, where the processing occurred, errored out at approximately the number of documents, not counting the little half-docs the script generated. I got around this with a try-catch construction that stopped when it hit an undefined file. That and the execution transcript suggested that the script did go through that loop.
So here's the BIG problem. Whatever those little half-docs were, nothing other than the first introductory line was written to them.
The debugger is useless.
(a) It doesn't show me the log file. Just a date.
(b) If I place a breakpoint, it MAY stop at it, but I see mostly a list of objects. If I click on the + sign, I get a list of methods. A few of the non-object variables are shown. It's impossible to step through the code because it can take up to three minutes to go from one line to the next, so it's been rather tough to debug this.
The only info I could garner was that the execution transcript did suggest that I did get doc objects from the files.
I think the problem is with adoc = DocumentApp.openById(docObject[jj]); which is in bold in the code below.
Sorry for the strange error handling; I was trying to get some insight into what was wrong. Alas, the debugger tells me that err is a string and gives me no value.
I can't tell whether the rest of the code works, because it may well error out the first time it hits the bolded line.
Thanks for your patience.
function listDocTOCsInFolder()
{
// Thanks to crucial help from Serge insas via Stack Overflow.
var afolder; // Folder you want to work on
var contents; // Files in the folder
var TOCListDoc; // Document you will create to hold your TOC List
var docObject = []; // Holds a list of documents created from contents
var aname;
var adoc;
var err = "";
var isErr = false;
TOCListDoc = DocsList.createFile("TOCList", "Document TOC List");
afolder = DocsList.getFolderById("0B-UcimyrHLl2bm1OanExaHotc2M")
//Can't figure out what exactly constitutes the path of a folder.
TOCListDoc.addToFolder(afolder);
// Get all the document files in your named folder. Unfortunately,they are not document objects
var contents = afolder.getFilesByType(DocsList.FileType.DOCUMENT);
// This loop gives you an array of DocumentApp objects.
for (var ii = 0; ii < contents.length; ii++)
{
docObject.push(DocumentApp.openById(contents[ii].getId()));
}
// Now you can do a for loop to gather up the contents into one document.
Logger.log(contents.length);
// It isn't clear what this actually gets, because unless I set a trap of
// undefined documents, the loop keeps right on going.
var len = docObject.length;
var jj = 0;
for(jj = 0; jj < len; II++ )
{
try
{
**adoc = DocumentApp.openById(docObject[jj]);**
}
catch(err)
{
isErr = true;
}
if(!isErr)
{
// Get the information you want to write to your list doc
var TOC = adoc.getAs(DocumentApp.ElementType.TABLE_OF_CONTENTS);
logger.log(TOC);
aname = adoc.getName();
Logger.log(aname);
body.appendParagraph(counter, name);
body.appendParagraph(TOC);
//.setHeading(DocumentApp.ParagraphHeading.HEADING1);
}
else
{
Logger.log("Errored out");
}
}
}
When you say
"For each of those documents, get the file name and extract the Table of Contents. Append each of those to the doc created in (1)."
that's indeed what you should do... You used this code:
// Get the information you want to write to your list doc
var TOC = adoc.getAs(DocumentApp.ElementType.TABLE_OF_CONTENTS);
logger.log(TOC);
but in doing this you assume that the first element in the document is necessarily the TOC, and I'm not sure you can do that!
What I would try is to iterate through all the document's elements and check the type of each one, then copy the one that is a TOC to your new document.
See this post for how you could iterate over the document's elements and check their types. The purpose of that script was different, but I guess the approach should help you.
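To make the iteration idea a bit more concrete, here is a rough sketch. findFirstChildOfType is a hypothetical helper, not part of DocumentApp; it only relies on getNumChildren(), getChild() and getType(), and it assumes the TOC is a direct child of the body.

```javascript
// Hypothetical helper: walks the direct children of a document body and
// returns the first one whose type equals wantedType, or null if none does.
function findFirstChildOfType(body, wantedType) {
  for (var i = 0; i < body.getNumChildren(); i++) {
    var child = body.getChild(i);
    if (child.getType() === wantedType) {
      return child;
    }
  }
  return null;
}
```

In the script from the question you would pass it adoc.getBody() and DocumentApp.ElementType.TABLE_OF_CONTENTS, and check for null before trying to copy anything, since a document may have no TOC at all.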
Good luck
Instead of
for(jj = 0; jj < len; II++ ) try for(jj = 0; jj < len; jj++ )
And read the troubleshooting guide again...
getAuthors() in Google Apps Script returns an array, so I'm assuming it is intended to capture all the people who have edited the page. However, it only returns a single value, which seems to be the person who created the page. If another person edits the page, it still returns an array containing only the first value.
Is getAuthors() intended to be limited to just the creator?
Is there any way to return an array with all of the people who have worked on a page (those who have saved edits)?
If not, is it possible to change the author?
function testAuthor(){
var site = SitesApp.getSite(DOMAIN, NAME);
var decendents = site.getAllDescendants();
for (var i=0;i<decendents.length; i++){
Logger.log(decendents[i].getAuthors().join(', '));
};
}