How to replace text within a pdf file? - google-apps-script

Is it possible to replace text within a pdf file using Google Apps Script?
I am trying the following code, but the replace has no effect; it seems the string is encoded in a way I cannot understand.
var pdfFile = DocsList.getFileById("pdf-doc-id");
var asBlob = pdfFile.getBlob();
var asString = asBlob.getDataAsString();
var s2s = "old string";
var s2r = "new string";
var repString = asString.replace(s2s, s2r);
var repBlob = Utilities.newBlob(repString).setContentType("application/pdf").setName("Testing");
DocsList.createFile(repBlob);
EDIT1: I got an empty pdf back
Any ideas?
Thanks

The function getDataAsString() doesn't return the textual content of a PDF file, but instead a textual representation of the binary content of the file. That function works on any file, even those that don't have text (like images).
Unfortunately I don't think you can fully accomplish your goal with Apps Script. If you are able to import your PDF as a Google Document using the Drive UI, then you can use Apps Script's DocumentApp to modify the document and export it as a PDF.
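A minimal sketch of that workaround, assuming the PDF has already been imported as a Google Document and its ID is known. The names replaceTextAndExportPdf and escapeForReplaceText are illustrative and not part of Apps Script. Body.replaceText() treats its first argument as a regular-expression pattern, so the helper escapes the search string to match it literally.

```javascript
// Escape regex metacharacters so replaceText() matches the string literally.
function escapeForReplaceText(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Assumption: docId refers to a Google Document (for example the imported PDF).
function replaceTextAndExportPdf(docId, oldText, newText) {
  var doc = DocumentApp.openById(docId);
  doc.getBody().replaceText(escapeForReplaceText(oldText), newText);
  doc.saveAndClose(); // flush pending edits before exporting
  // Convert the edited document to PDF and store it as a new Drive file.
  var pdfBlob = DriveApp.getFileById(docId).getAs('application/pdf');
  return DriveApp.createFile(pdfBlob);
}
```

Only the escaping helper runs outside Apps Script; the second function needs the DocumentApp and DriveApp services.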

Related

Trying to export to pdf using google app scripts

I am currently trying to export a sheet as a PDF. This works fine using the getBlob() method, but I want to export the file with a custom name. When I use the following code, the MimeType throws an error, and if I remove it the file is exported as plain text, which I don't want. Any help would be great, either with exporting the file as a PDF or with renaming it later.
var sqc = ss.getSheetByName('SQC')
var folder = DriveApp.getFolderById('FOLDER ID')
//var pdf = folder.createFile(ss.getBlob());
var pdf = folder.createFile('New Text File', ss.getBlob, MimeType.exportPDF)
Based on my understanding you are trying to export your Google Sheet to a PDF file.
Here are some of my observations:
You are using an invalid MimeType; it should be MimeType.PDF.
createFile(name, content, mimeType) expects a string as its content (not a Blob object). If you pass a Blob object as the content, the output PDF file will contain the text Blob.
The easiest way to export your sheet to PDF is to set your blob's name with the file extension.
Sample Replication:
var blob = ss.getBlob();
// Test 1, set the file name and mimetype using createFile()
folder.createFile("Test1",blob,MimeType.PDF);
// Test 2, set the file name with extension in the blob object
blob.setName("Test2.pdf");
folder.createFile(blob);
Output:
Test1
Test2
I was able to sort it out. Just chuck in
pdf.setName("Name")
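For the custom-name case, here is a small sketch along the lines of Test 2: name the blob before creating the file. toPdfName and exportSpreadsheetAsPdf are hypothetical helper names, and the folder ID is a placeholder you would supply. Note that this exports the whole spreadsheet, not only the SQC sheet.

```javascript
// Append ".pdf" unless the name already ends with it (case-insensitive).
function toPdfName(name) {
  return /\.pdf$/i.test(name) ? name : name + '.pdf';
}

// Apps Script part: ss is a Spreadsheet object, folderId is a placeholder.
function exportSpreadsheetAsPdf(ss, folderId, name) {
  var blob = ss.getAs('application/pdf').setName(toPdfName(name));
  return DriveApp.getFolderById(folderId).createFile(blob);
}
```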

PDF Realtime down load and conversion

I'm looking for a way to use Google Apps Script to download a PDF file and convert it into Google Sheets.
The reason is that the website only provides its data in PDF form, and I can't use an import function to get the data for real-time updates.
It depends on how you will download your PDF files.
Here is a simple example of how you can convert a PDF file from your Google Drive into a Google Document:
function pdfToDoc() {
  var fileBlob = DriveApp.getFilesByName('test.pdf').next().getBlob();
  var resource = {
    title: fileBlob.getName(),
    mimeType: fileBlob.getContentType()
  };
  var options = {
    ocr: true
  };
  var docFile = Drive.Files.insert(resource, fileBlob, options);
  Logger.log(docFile.alternateLink);
}
To make it work, you need to enable the Drive API (the advanced Drive service).
Based on the answer: https://webapps.stackexchange.com/a/136948
As far as I can see, a Google Doc is the only available output. You can probably extract the data from the Doc and put it into a Spreadsheet with a script, but that depends on exactly what your data looks like.
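As a rough sketch of that last step, the code below turns the text of the converted Doc into rows for a sheet. It assumes (this is a guess about your data, not something from the question) that columns are separated by tabs or by runs of two or more spaces. textToRows and docTextToSheet are hypothetical names.

```javascript
// Split text into rows of cells. Rows are padded to the same width because
// Range.setValues() requires a rectangular two-dimensional array.
function textToRows(text) {
  var rows = text.split(/\r?\n/)
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return line !== ''; })
    .map(function (line) { return line.split(/(?:\t| {2,})+/); });
  var width = rows.reduce(function (w, r) { return Math.max(w, r.length); }, 0);
  return rows.map(function (r) {
    while (r.length < width) r.push('');
    return r;
  });
}

// Apps Script part: docId is the converted document, sheet is a Sheet object.
function docTextToSheet(docId, sheet) {
  var rows = textToRows(DocumentApp.openById(docId).getBody().getText());
  if (rows.length > 0) {
    sheet.getRange(1, 1, rows.length, rows[0].length).setValues(rows);
  }
}
```

OCR output from a real PDF is rarely this regular, so the splitting rule will likely need adjusting for your file.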

How to parse a XML file stored in my google drive but which stands out as a html type?

How can I parse an XML file that is stored on my Google Drive but comes back as HTML?
I saved on my Google Drive a copy of the XML from this source: http://api.allocine.fr/rest/v3/movie?media=mp4-lc&partner=YW5kcm9pZC12Mg&profile=large&version=2&code=265621
I can parse the source, but I can't XML-parse the copy, which looks like HTML.
I get parsing errors like: The element type "meta" must be terminated by the matching end-tag "</meta>"
or: Element type "a.length" must be followed by either attribute specifications, ">" or "/>"
I shared it at https://drive.google.com/file/d/16kJ5Nko-waVb8s2T12LaTEKaFY01603n/view?usp=sharing so that you can access it and test my script.
I know that I could use CacheService, and it works, but to have more control over the buffering I would like to try it this way.
function xmlParsingXmlStoreOnGoogleDrive(){
// So, this is the original XML, which parses correctly
var fetched=UrlFetchApp.fetch("http://api.allocine.fr/rest/v3/movie?media=mp4-lc&partner=YW5kcm9pZC12Mg&profile=large&version=2&code=265621")
var blob=fetched.getBlob();
var getAs=blob.getAs("text/xml")
var data=getAs.getDataAsString("UTF-8")
Logger.log(data.substring(1,350)); // substring keeps the debug display short; this shows the expected XML:
/*
?xml version="1.0" encoding="utf-8"?>
<!-- Copyright © 2019 AlloCiné -->
<movie code="265621" xmlns="http://www.allocine.net/v6/ns/">
<movieType code="4002">Long-métrage</movieType>
<originalTitle>Mise à jour sur Google play</originalTitle>
<title>Mise à jour sur Google play</title>
<keywords>Portrait of a Lady on Fire </keywords>
*/
var xmlDocument=XmlService.parse(data);
var root=xmlDocument.getRootElement();
var keywords=root.getChild("keywords",root.getNamespace()).getText();
Logger.log(keywords); // Display the expected result :"Portrait of a Lady on Fire "
// And this is my copy of the original XML, which I can't parse
var fetched=UrlFetchApp.fetch("https://drive.google.com/file/d/1K3-9dHy-h0UoOOY5jYfiSoYPezSi55h1/view?usp=sharing")
var blob=fetched.getBlob();
var getAs=blob.getAs("text/xml")
var data=getAs.getDataAsString("UTF-8")
Logger.log(data.substring(1,350)); // substring keeps the debug display short; this shows unexpected HTML instead:
/*
!DOCTYPE html><html><head><meta name="google" content="notranslate"><meta http-equiv="X-UA-Compatible" content="IE=edge;">
<style>@font-face{font-family:'Roboto';font-style:italic;font-weight:400;src:local('Roboto Italic'),local('Roboto-Italic'),
url(//fonts.gstatic.com/s/roboto/v18/KFOkCnqEu92Fr1Mu51xIIzc.ttf)format('truetype');}@font-face{font-fam......
*/
var xmlDocument=XmlService.parse(data); // ABORT WITH THE ERROR: Element type "a.length" must be followed by either attribute specifications, ">" or "/>"
var root=xmlDocument.getRootElement();
var keywords=root.getChild("keywords",root.getNamespace()).getText();
Logger.log(keywords);
}
I read in this similar question: Parse XML file (which is stored on GoogleDrive) with Google app script
that "Unfortunately we can't directly get xml files in the google drive" !!
Is that right, and does it simply mean that my script cannot work?
You want to retrieve the data from the file on Google Drive and parse as XML data using XmlService.
You want to achieve this using Google Apps Script.
If my understanding is correct, how about this answer?
Modification points:
About var fetched=UrlFetchApp.fetch("https://drive.google.com/file/d/16kJ5Nko-waVb8s2T12LaTEKaFY01603n/view?usp=sharing"), in this case, the file content cannot be retrieved from this endpoint. If you want to retrieve the file content with UrlFetchApp, please use the endpoint of https://drive.google.com/uc?id=16kJ5Nko-waVb8s2T12LaTEKaFY01603n&export=download. This is webContentLink.
When the file is in your Google Drive and/or shared publicly, you can retrieve the data with the script of DriveApp.getFileById(fileId).getBlob().getDataAsString().
Modified script:
For example, when your shared sample file of https://drive.google.com/file/d/16kJ5Nko-waVb8s2T12LaTEKaFY01603n/view?usp=sharing is used, the script becomes as follows.
Sample script 1:
In this pattern, the file content is retrieved from your shared file with UrlFetchApp.fetch().
var data = UrlFetchApp.fetch("https://drive.google.com/uc?id=16kJ5Nko-waVb8s2T12LaTEKaFY01603n&export=download").getContentText(); // Modified
var xmlDocument=XmlService.parse(data);
var root=xmlDocument.getRootElement();
var keywords=root.getChild("keywords",root.getNamespace()).getText();
Logger.log(keywords); // <--- You can see "Portrait of a Lady on Fire" at log.
In this case, the file is required to be shared publicly. If you want to retrieve the file content without sharing it, please use an access token for the request.
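A sketch of the access-token variant, under some assumptions: it uses the Drive API v3 download endpoint (files/{fileId}?alt=media) rather than the uc?export=download link, and it assumes the script is already authorized with a Drive scope so that ScriptApp.getOAuthToken() returns a usable token. buildDriveDownloadRequest and fetchPrivateXml are hypothetical names.

```javascript
// Build the URL and UrlFetchApp options for a Drive API v3 media download.
function buildDriveDownloadRequest(fileId, accessToken) {
  return {
    url: 'https://www.googleapis.com/drive/v3/files/' +
      encodeURIComponent(fileId) + '?alt=media',
    options: {
      method: 'get',
      headers: { Authorization: 'Bearer ' + accessToken },
      muteHttpExceptions: true // inspect the status code ourselves
    }
  };
}

// Apps Script part: download a non-public file and parse it as XML.
function fetchPrivateXml(fileId) {
  var req = buildDriveDownloadRequest(fileId, ScriptApp.getOAuthToken());
  var response = UrlFetchApp.fetch(req.url, req.options);
  if (response.getResponseCode() !== 200) {
    throw new Error('Download failed with HTTP ' + response.getResponseCode());
  }
  return XmlService.parse(response.getContentText());
}
```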
Sample script 2:
In this pattern, the file content is retrieved from your shared file with DriveApp.getFileById().
var fileId = "16kJ5Nko-waVb8s2T12LaTEKaFY01603n"; // Added
var data = DriveApp.getFileById(fileId).getBlob().getDataAsString(); // Added
var xmlDocument=XmlService.parse(data);
var root=xmlDocument.getRootElement();
var keywords=root.getChild("keywords",root.getNamespace()).getText();
Logger.log(keywords); // <--- You can see "Portrait of a Lady on Fire" at log.
16kJ5Nko-waVb8s2T12LaTEKaFY01603n of https://drive.google.com/file/d/16kJ5Nko-waVb8s2T12LaTEKaFY01603n/view?usp=sharing is the file ID.
In this case, the file is not required to be shared. But the file is required to be in your Google Drive.
References:
Files of Drive API
webContentLink: A link for downloading the content of the file in a browser using cookie based authentication. In cases where the content is shared publicly, the content can be downloaded without any credentials.
getFileById(id)
If I misunderstood your question and this was not the direction you want, I apologize.
Wonderful! You are right. Both of your suggestions work.
I just made a mistake elsewhere in my code, so solution 1 no longer works.
That is why I am posting a new script to test it. This is for my own learning only, because my project is safe thanks to you :)
function storeXmlOnGoogleDriveThenParsIt(url){
url=url||"http://api.allocine.fr/rest/v3/movie?media=mp4-lc&partner=YW5kcm9pZC12Mg&profile=large&version=2&code=265621"; // to test
// On my Google Drive I make a copy of the content at the URL. (This spares the server from too many requests.)
var bufferedXml=DriveApp.getRootFolder().searchFolders('title = "BufferFiles"').next().createFile("xmlBuffered.xml", UrlFetchApp.fetch(url).getContentText(),MimeType.PLAIN_TEXT);
var urlBufferedXml=bufferedXml.getUrl() // The new url ,of the buffered file
var fileId=urlBufferedXml.match(/https:\/\/drive.google.com\/file\/d\/(.*)\/view.*/)[1];
// Now I want to parse the buffered XML file
//[ Your second way to get the data works perfectly! THANK YOU A LOT!!!
var data = DriveApp.getFileById(fileId).getBlob().getDataAsString();
var xmlDocument=XmlService.parse(data);
var root=xmlDocument.getRootElement();
var mynamespace=root.getNamespace();
var keywords=root.getChild("keywords",root.getNamespace()).getText();
Logger.log("keywords:"+keywords) // and parsing success ]
//[ The first way to get the data was OK, but now it aborts, since I modified the line of code that creates the XML file, and I can't get back to the working code
var downloadUrlBufferedXml="https://drive.google.com/uc?id="+fileId+"&export=download";
var data = UrlFetchApp.fetch(downloadUrlBufferedXml).getContentText(); // was good but now data is here again like a html text ! :(
Logger.log("data"+data.substring(1,350)); // this show that data is HTML type and not XML type ! :(
var xmlDocument=XmlService.parse(data); // So i have Error like: The element type "meta" must be terminated by the matching end-tag "</meta>" ]
var root=xmlDocument.getRootElement();
var mynamespace=root.getNamespace();
var keywords=root.getChild("keywords",root.getNamespace()).getText();
Logger.log("keywords:"+keywords)
}
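Two small helpers may make this kind of script easier to debug. Both names are hypothetical. The first pulls the file ID out of a Drive URL, although bufferedXml.getId() returns the ID directly and avoids parsing the URL at all. The second checks whether the fetched text looks like an HTML page before it reaches XmlService.parse(). One plausible reason for the HTML (an assumption, not confirmed in this thread) is that the newly created file is not shared publicly, which sample script 1 requires.

```javascript
// Extract the file ID from a ".../file/d/<id>/..." Drive URL, or return null.
function extractDriveFileId(url) {
  var match = /\/file\/d\/([^\/?#]+)/.exec(url);
  return match ? match[1] : null;
}

// Heuristic check: does the text start like an HTML page rather than XML?
function looksLikeHtml(text) {
  return /^\s*(<!DOCTYPE html|<html)/i.test(text);
}

// Apps Script part: fail with a readable message instead of a parser error.
function parseXmlOrExplain(data) {
  if (looksLikeHtml(data)) {
    throw new Error('Got an HTML page instead of XML; check how the file is shared.');
  }
  return XmlService.parse(data);
}
```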

PDF Template Archiving

I have created a form that generates a response sheet. I have also created a Doc that serves as a template, which my responses fill in. The result is turned into a PDF and e-mailed to specific recipients. I now need to archive these PDFs into specific folders based on a column's answer. First, I would simply like to be able to move or copy them into a specific folder. How is this possible? I have tried multiple scripts but just can't see where the disconnect is. Any help would be greatly appreciated. Thank you.
You could try using some code like this:
function moveFileToFolder() {
  var theFolder = DriveApp.getFolderById('your Folder ID');
  var theFile = DriveApp.getFileById('Your File ID').makeCopy(theFolder);
  var oldFileName = theFile.getName();
  var archivedName = oldFileName.slice(5);
  Logger.log('archivedName: ' + archivedName);
  archivedName = "archive" + archivedName;
  theFile.setName(archivedName);
}
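To choose the folder from a column's answer, one possible sketch is a lookup from answer to folder ID. The folder IDs, the answer values (sales, support) and the function names below are all placeholders, not taken from the question.

```javascript
// Map a response-sheet answer to a folder ID, falling back to a default.
function pickFolderId(answer, folderIdsByAnswer, defaultFolderId) {
  var key = String(answer).trim().toLowerCase();
  return Object.prototype.hasOwnProperty.call(folderIdsByAnswer, key)
    ? folderIdsByAnswer[key]
    : defaultFolderId;
}

// Apps Script part: copy the file into the folder chosen for this answer.
function archiveByAnswer(fileId, answer) {
  var folderIds = { sales: 'SALES_FOLDER_ID', support: 'SUPPORT_FOLDER_ID' };
  var folderId = pickFolderId(answer, folderIds, 'DEFAULT_FOLDER_ID');
  var file = DriveApp.getFileById(fileId);
  return file.makeCopy('archive ' + file.getName(), DriveApp.getFolderById(folderId));
}
```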
To delete the old file without having to send it to the trash:
//This requires the Drive API To be turned on in the Advanced Google Services.
//RESOURCES menu, ADVANCED GOOGLE SERVICES
function deleteFile(idToDLET) {
  //idToDLET = 'the File ID';
  //This deletes a file without needing to move it to the trash
  var rtrnFromDLET = Drive.Files.remove(idToDLET);
}

Is it possible to search with in an uploaded pdf or google doc for keywords?

Here is the situation:
I have a pdf and google doc in my drive list of documents. I would like to build an interface that could search through these documents for keywords and return the document name and possible preview of the text that matched the search parameters. Is this or some variation of this possible?
Regards,
Shawn
Getting text from a Google Doc is simple:
// Get text from GDOC
var gdocDoc = DocumentApp.openById(gdocFile.id);
var text = gdocDoc.getBody().getText();
The pdfToText() utility from Get pdf-attachments from Gmail as text uses the advanced Drive service and DocumentApp to convert PDF to Google-Doc to text. You can get the OCR'd text this way, or save it directly to a txt file in any folder on your Drive.
// Start with a Blob object
var blob = DriveApp.getFilesByName("my.pdf").next().getBlob();
// filetext will contain text from pdf file, no residual files are saved:
var filetext = pdfToText( blob, {keepTextfile: false} );
Once you have the text, a search for keywords becomes dead easy!
if (filetext.indexOf( keyword ) !== -1) {
// Found keyword...
}
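For the "document name and preview" part of the question, a small sketch that returns the text around the first match. findSnippet and its radius parameter are hypothetical; the search is case-insensitive, which is a design choice rather than something the question asked for.

```javascript
// Return the document name and a preview around the first match, or null.
function findSnippet(name, text, keyword, radius) {
  var index = text.toLowerCase().indexOf(keyword.toLowerCase());
  if (index === -1) return null;
  var start = Math.max(0, index - radius);
  var end = Math.min(text.length, index + keyword.length + radius);
  return { name: name, preview: text.substring(start, end) };
}
```

You would call it once per document with the text obtained above, for example findSnippet("my.pdf", filetext, keyword, 40), and collect the non-null results.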