Google Apps Script - how can I LOAD pdf? - google-apps-script

Just a simple question, as in the title: I have a PDF file that contains only text, and I want to load it and extract the text.

I believe your goal is as follows.
You want to retrieve the text from a PDF file that contains only text, using Google Apps Script.
In this case, how about the following flow?
1. Convert the PDF to a Google Document with the Drive API, as a temporary file.
2. Export the text from the created Google Document.
When this flow is reflected in a script, it becomes as follows.
Sample script:
This sample uses the Drive API. So, before you test this script, please enable the Drive API at Advanced Google services.
function myFunction() {
  const fileId = "###"; // Please set the file ID of the PDF file on Google Drive.
  // Convert the PDF to a Google Document.
  const docId = Drive.Files.copy({title: "temp", mimeType: MimeType.GOOGLE_DOCS}, fileId).id;
  // Retrieve the text from the Google Document.
  const text = DocumentApp.openById(docId).getBody().getText();
  // If you want to remove the temporary Google Document, please uncomment the following line.
  // Drive.Files.remove(docId);
  console.log(text); // You can see the retrieved text in the log.
  // DriveApp.createFile("sample.txt", text); // If you want to save the text as a file, please use this line.
}
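If the temporary Google Document should never be left behind, the conversion and the cleanup can be wrapped in one helper. The following is a minimal sketch, assuming Drive API v2 is enabled at Advanced Google services as in the sample above. `extractTextFromPdf` is a hypothetical name, and the `drive` and `documentApp` parameters exist only so that the services are passed in explicitly.

```javascript
/**
 * Hypothetical helper: converts a PDF to a temporary Google Document,
 * reads its text, and always removes the temporary document afterwards.
 * `drive` is expected to be the Drive (v2) advanced service and
 * `documentApp` is expected to be DocumentApp.
 */
function extractTextFromPdf(fileId, drive, documentApp) {
  // Same conversion call as in the sample script.
  // "application/vnd.google-apps.document" is the value of MimeType.GOOGLE_DOCS.
  const docId = drive.Files.copy(
    { title: "temp", mimeType: "application/vnd.google-apps.document" },
    fileId
  ).id;
  try {
    return documentApp.openById(docId).getBody().getText();
  } finally {
    // Runs even when reading the text fails, so no temporary file is left.
    drive.Files.remove(docId);
  }
}
```

In the script editor, this could be called as `const text = extractTextFromPdf("###", Drive, DocumentApp);`. Please note that `Drive.Files.remove` deletes the file permanently instead of moving it to the trash.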
References:
Files: copy
getText()

Related

Download Google Sheet to .xlsx Format using Link - multiple tabs

I need to use a link to download a Google Sheet in .xlsx format.
The following link works for me, but only for a single tab. I have to download three or more tabs, and I believe the format of "gid" would be different.
https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=1eVcHMWyH1YIDN_i0iMcv468c4_jnPk9Tw5gea-2FCyk&gid=626501804&exportFormat=xlsx
I believe your goal is as follows.
You want to export a Google Spreadsheet in XLSX format.
You want to include several specific Sheets in the Google Spreadsheet.
In this case, how about the following workaround? In this workaround, a temporary Spreadsheet that includes only the sheets you want to export is created. As a simple method, Google Apps Script is used for retrieving the URL.
Sample script:
function sample() {
  const exportSheetNames = ["Sheet1", "Sheet2", "Sheet3"]; // Please set the sheet names you want to export.
  const spreadsheetId = "###"; // Please set your Spreadsheet ID.
  const source = SpreadsheetApp.openById(spreadsheetId);
  const temp = source.copy("temp_" + source.getName());
  temp.getSheets().forEach(s => {
    if (!exportSheetNames.includes(s.getSheetName())) temp.deleteSheet(s);
  });
  const url = `https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=${temp.getId()}&exportFormat=xlsx`;
  console.log(url);
}
When this script is run, you can see in the log the URL for exporting the Spreadsheet in XLSX format, including only the specific sheets you want. From your question, I thought that you might want the URL for exporting. Please note that the temporary Spreadsheet remains in your Google Drive until you delete it.
This is a simple sample script for achieving your goal. For example, if you want to automatically export the XLSX file using a script, you can see the sample script at this thread.
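The sheet selection and the export URL can also be prepared by small plain functions, which makes it easier to guard against deleting every sheet (a Spreadsheet has to keep at least one sheet). This is a sketch under assumptions: `sheetsToDelete` and `buildXlsxExportUrl` are hypothetical names, and the URL format is the one from the question.

```javascript
// Hypothetical helper: returns the names of the sheets to delete from the
// temporary copy. It throws when none of the requested sheets exists,
// because deleting all sheets of a Spreadsheet is not possible.
function sheetsToDelete(allSheetNames, exportSheetNames) {
  const keep = allSheetNames.filter(name => exportSheetNames.includes(name));
  if (keep.length === 0) {
    throw new Error("None of the requested sheets exist: " + exportSheetNames.join(", "));
  }
  return allSheetNames.filter(name => !exportSheetNames.includes(name));
}

// Hypothetical helper: builds the export URL in the format used in the question.
// When gid is given, it is added in the same way as in the single-tab URL.
function buildXlsxExportUrl(spreadsheetId, gid) {
  const base = "https://spreadsheets.google.com/feeds/download/spreadsheets/Export";
  let url = `${base}?key=${encodeURIComponent(spreadsheetId)}`;
  if (gid !== undefined && gid !== null) {
    url += `&gid=${encodeURIComponent(gid)}`;
  }
  return url + "&exportFormat=xlsx";
}
```

In the sample script, the names returned by `sheetsToDelete(temp.getSheets().map(s => s.getSheetName()), exportSheetNames)` could be passed to `temp.deleteSheet(temp.getSheetByName(name))`, and the URL could be created with `buildXlsxExportUrl(temp.getId())`.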

Convert HTML to RichText in Google App Script

I have to convert some HTML code to rich text in order to assign it to a cell of a Google Spreadsheet, so that it is rendered as formatted text rather than as raw HTML.
I am new to this, so I would appreciate any suggestions.
Thank you so much!
I believe your goal is as follows.
You want to convert HTML to rich text and put it into a cell of a Google Spreadsheet.
You want to achieve this using Google Apps Script.
In this case, how about the following sample script? In this sample script, RichTextApp, a Google Apps Script library, is used. I created this library for managing rich text between Google Spreadsheet, Google Document, and HTML.
Usage:
1. Install the Google Apps Script library.
You can see the details of this here.
2. Sample script.
This sample script is from here. This script uses the Drive API. So please enable the Drive API at Advanced Google services.
function convertHTMLToRichText() {
  var html = '###'; // Please set your HTML data. Of course, you can retrieve this from an HTML file on Google Drive.
  var sheet = SpreadsheetApp.openById("###").getSheetByName("Sheet1"); // Please set the Spreadsheet ID and sheet name.
  // Create a temporary Google Document by converting the HTML.
  var blob = Utilities.newBlob(html, MimeType.HTML, "sample.html");
  var tempDocId = Drive.Files.insert(
    { title: "temp", mimeType: MimeType.GOOGLE_DOCS },
    blob
  ).id;
  // Put the value to a cell as rich text using the "DocumentToSpreadsheet" method.
  var res = RichTextApp.DocumentToSpreadsheet({
    range: sheet.getRange("A1"),
    document: DocumentApp.openById(tempDocId),
  });
  console.log(res);
  // Remove the temporary file.
  DriveApp.getFileById(tempDocId).setTrashed(true);
}
In this sample script, the HTML data in the variable html is converted to rich text and put into a cell ("A1" of "Sheet1") of the Google Spreadsheet.
Note:
This is a simple sample script. So please modify this for your actual situation.
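For very simple HTML, the conversion can also be sketched without the library. The following is only an illustration under strong assumptions, and it is not how RichTextApp works: it handles nothing but non-nested `<b>...</b>` tags and does not decode HTML entities. `parseBoldRuns` and `putBoldHtmlToCell` are hypothetical names. The rich text is built with `SpreadsheetApp.newRichTextValue()` and `SpreadsheetApp.newTextStyle()`.

```javascript
// Hypothetical helper: strips <b> tags and records which character
// ranges of the remaining plain text were inside them.
function parseBoldRuns(html) {
  let text = "";
  let boldStart = -1;
  const runs = [];
  // Splitting with a capture group keeps the tags as separate tokens.
  for (const token of html.split(/(<\/?b>)/i)) {
    if (/^<b>$/i.test(token)) {
      boldStart = text.length;
    } else if (/^<\/b>$/i.test(token)) {
      if (boldStart >= 0 && text.length > boldStart) {
        runs.push({ start: boldStart, end: text.length }); // end is exclusive
      }
      boldStart = -1;
    } else {
      text += token;
    }
  }
  return { text, runs };
}

// Hypothetical helper: writes the parsed result to a cell as rich text.
// This part runs only in Google Apps Script.
function putBoldHtmlToCell(range, html) {
  const { text, runs } = parseBoldRuns(html);
  const bold = SpreadsheetApp.newTextStyle().setBold(true).build();
  const builder = SpreadsheetApp.newRichTextValue().setText(text);
  runs.forEach(r => builder.setTextStyle(r.start, r.end, bold));
  range.setRichTextValue(builder.build());
}
```

For example, `putBoldHtmlToCell(sheet.getRange("A1"), "Hello <b>World</b>!")` would put "Hello World!" into the cell with only "World" in bold. For any other tags, the library-based script is the more practical choice.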
Reference:
RichTextApp

How to change the thumbnail of a google drive .zip file?

I have come across this guide to change the thumbnail of a .zip file. However, I am not strong in Python and would prefer a Google Apps Script alternative. Is this possible?
Thank you in advance.
I believe your goal is as follows.
You want to change the thumbnail of a ZIP file using Google Apps Script.
In this case, how about the following sample script?
Sample script:
Before you use this script, please enable the Drive API at Advanced Google services. Also, please set the file IDs of the ZIP file and of the image file you want to use as the new thumbnail image.
function myFunction() {
  const zipFileId = "###"; // Please set the file ID of the ZIP file.
  const imageFileId = "###"; // Please set the file ID of the image file.
  const metadata = {
    thumbnail: {
      image: Utilities.base64EncodeWebSafe(DriveApp.getFileById(imageFileId).getBlob().getBytes()),
      mimeType: "image/png",
    }
  };
  Drive.Files.update(metadata, zipFileId); // or const res = Drive.Files.patch(metadata, zipFileId);
}
Note:
About the limitations of the thumbnail, please check this thread.
When you want to use an image downloaded from a URL as the thumbnail, please modify image: Utilities.base64EncodeWebSafe(DriveApp.getFileById(imageFileId).getBlob().getBytes()), as follows.
image: Utilities.base64EncodeWebSafe(UrlFetchApp.fetch("URL").getContent()),
At the current stage, the Drive API of Advanced Google services is version 2. Because of this, the request body is different from that of Drive API v3. Please be careful about this.
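The difference between the two request bodies can be sketched as follows. This assumes the field names of the Drive API documentation: `thumbnail` at the top level for v2, as in the script above, and `contentHints.thumbnail` for v3. `buildThumbnailMetadata` is a hypothetical helper name.

```javascript
// Hypothetical helper: wraps a URL-safe base64 image in the request body
// shape expected by the given Drive API version (2 or 3).
function buildThumbnailMetadata(base64Image, mimeType, apiVersion) {
  const thumbnail = { image: base64Image, mimeType: mimeType };
  if (apiVersion === 2) return { thumbnail: thumbnail };
  if (apiVersion === 3) return { contentHints: { thumbnail: thumbnail } };
  throw new Error("Unsupported Drive API version: " + apiVersion);
}
```

With `const blob = DriveApp.getFileById(imageFileId).getBlob();`, the v2 body of the script above could be created by `buildThumbnailMetadata(Utilities.base64EncodeWebSafe(blob.getBytes()), blob.getContentType(), 2)`. Using `blob.getContentType()` also avoids fixing the mimeType to "image/png" when the image is in another format.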
Reference:
Files: update

Opening CSV file into Google Sheets has certain lines being merged into a single cell

I have a webscraper creating an index of links and attributes that is outputting to csv, with the intent of opening it up in Google Sheets and using Filter Views to sort and search through it.
However, when I open the CSV in Sheets, groups of lines are being stuck into a single cell.
I have shared a test file that demonstrates my problem and gives me the result in the image here
I also tried opening it with LibreOffice and got the same result.
Any help would be great.
You simply have an error in the CSV. If you add the missing closing double quote (as you can see here), you will have no problem in any spreadsheet.
When I saw your CSV data, I noticed that in the 6th row, which contains A Single Phone Call, a double quote (") is not closed. I thought that this might be the reason for your issue. When the CSV data is retrieved using IMPORTDATA, I confirmed the same issue as in your situation.
However, I confirmed that when Utilities.parseCsv() of Google Apps Script is used, your CSV data can be parsed correctly. So how about the following sample scripts?
Sample script 1:
In this sample, Google Apps Script is used as a custom function. In this case, please use the file ID of the CSV file. Please copy and paste the following script to the script editor of Google Spreadsheet, and put the custom function of =SAMPLE("1WB2PZ1vyuarBSzjETkbWmn744ZC41Ps8") to a cell. 1WB2PZ1vyuarBSzjETkbWmn744ZC41Ps8 is the file ID of your sample CSV file.
const SAMPLE = fileId => Utilities
  .parseCsv(
    UrlFetchApp.fetch(`https://drive.google.com/uc?export=download&id=${fileId}`).getContentText()
  );
If you want to directly retrieve the CSV data from a URL, you can also use the following script.
const SAMPLE = url => Utilities.parseCsv(UrlFetchApp.fetch(url).getContentText());
Result:
When this script is used, the following result is obtained.
Sample script 2:
In this sample, the script is run from the script editor. Please copy and paste the following script to the script editor of Google Spreadsheet, set the variables, and then run myFunction. By this, the CSV data is parsed and put to the Spreadsheet.
function myFunction() {
  const sheetName = "Sheet1"; // Please set the sheet name.
  const id = "1WB2PZ1vyuarBSzjETkbWmn744ZC41Ps8"; // This is the file ID of your sample CSV file.
  // 1. Retrieve CSV data.
  const csv = DriveApp.getFileById(id).getBlob().getDataAsString();
  // 2. Parse CSV data to an array.
  const ar = Utilities.parseCsv(csv);
  // 3. Put the array to the Spreadsheet.
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(sheetName);
  sheet.getRange(1, 1, ar.length, ar[0].length).setValues(ar);
}
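Step 3 relies on every row having the same number of columns as the first row, because setValues() needs a rectangular array. If a CSV file may contain rows of different lengths, the rows can be padded first. This is a minimal sketch, and `padRows` is a hypothetical helper name.

```javascript
// Hypothetical helper: pads every row with empty strings so that all rows
// have the length of the longest row.
function padRows(rows) {
  const width = rows.reduce((max, row) => Math.max(max, row.length), 0);
  return rows.map(row => row.concat(Array(width - row.length).fill("")));
}
```

In the script, this would be used as `const ar = padRows(Utilities.parseCsv(csv));` before the range is retrieved.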
Sample script 2 gives the same result as sample script 1.
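To locate a row like the 6th one, where a double quote is not closed, the quotes can be counted line by line. This is a rough sketch with a hypothetical function name. A line with an odd number of double quotes is only a hint, because a properly quoted field can legitimately span several lines.

```javascript
// Hypothetical helper: returns the 1-based numbers of the lines that
// contain an odd number of double quotes.
function findOddQuoteLines(csv) {
  const result = [];
  csv.split(/\r?\n/).forEach((line, i) => {
    const quotes = (line.match(/"/g) || []).length;
    if (quotes % 2 === 1) result.push(i + 1);
  });
  return result;
}
```

For example, `console.log(findOddQuoteLines(DriveApp.getFileById(id).getBlob().getDataAsString()))` shows the candidate lines in the log.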
References:
Custom Functions in Google Sheets
parseCsv(csv)

.xlsx File is Corrupt when Converted to Blob in Google Apps Script

I have a series of Google Apps Scripts that seem to work well for converting .xlsx attachments from my Gmail and saving them as Google Sheets in my Drive.
Unfortunately, I'm not having luck duplicating that process for an .xlsx file that I'm trying to download from a specific URL.
Here's the code:
// Fetch the URL and convert it to a blob
var url = ** URL with the file **;
var response = UrlFetchApp.fetch(url);
var uniqueFeed = Utilities.newBlob(response, "application/vnd.ms-excel", "Unique.xlsx");
var fileToImport = DriveApp.createFile(uniqueFeed);
That DOES create the file in my Drive. However, it's corrupt. When I console.log the blob as a string, I get a bunch of random junk (for example):
[20-06-29 15:35:49:776 CDT] PK�����!�v��������[Content_Types].xml �(������������������������������������������������������������������������������������������������������������������������U�N�0�#����q�!Ԕ�$�L�M�:���B���MA�C��$�cg�c{2�^t����ƻJ��CQ���6�������RH�ie��J,����d���W;�DK��ĺ�Na�8���)�fldP�L5 G�ᅬ
My code to convert the .xlsx file then fails, I'm ASSuming because the newly created file is no good. I also don't get why it mentions .xml. The URL and the file are definitely .xlsx.
Just to clarify one thing: If I manually download the file from the URL to my computer, and then drop it into my Drive, I CAN convert it to a Google Sheet without issue. I'm trying to automate that manual step out... hence the problem.
Any insights would be much appreciated. I can PM the URL upon request -- just don't want it floating around the interwebs.
I believe your goal and situation are as follows.
You want to download an XLSX file from the URL and save it to your Google Drive using Google Apps Script.
Your URL is the direct link to the XLSX file.
For this, how about this answer?
Modification point:
UrlFetchApp.fetch(url) returns an HTTPResponse object, not the file content itself, so passing it to Utilities.newBlob() does not produce the binary data of the file. Instead, you can directly retrieve the file blob from the HTTPResponse object using getBlob().
When this point is reflected in your script, it becomes as follows.
Modified script:
From:
var response = UrlFetchApp.fetch(url);
var uniqueFeed = Utilities.newBlob(response, "application/vnd.ms-excel", "Unique.xlsx");
To:
var uniqueFeed = UrlFetchApp.fetch(url).getBlob().setName("Unique.xlsx");
Note:
If your URL doesn't directly return the XLSX file, this modification cannot be used. So please be careful about this.
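Whether the URL returned an XLSX file can be checked before the file is created. An XLSX file is a ZIP archive, so its content starts with the bytes `PK\x03\x04`. This is also why the log in the question begins with PK and mentions [Content_Types].xml. The following is a minimal sketch, and `looksLikeZip` and `saveXlsxFromUrl` are hypothetical names.

```javascript
// Hypothetical helper: returns true when the bytes start with the ZIP
// local file header signature "PK\x03\x04" (0x50 0x4B 0x03 0x04).
function looksLikeZip(bytes) {
  const signature = [0x50, 0x4b, 0x03, 0x04];
  return bytes.length >= signature.length &&
    signature.every((value, i) => bytes[i] === value);
}

// Sketch of the modified script with the check added.
// This part runs only in Google Apps Script.
function saveXlsxFromUrl(url) {
  const blob = UrlFetchApp.fetch(url).getBlob().setName("Unique.xlsx");
  if (!looksLikeZip(blob.getBytes())) {
    throw new Error("The URL did not return an XLSX (ZIP) file.");
  }
  return DriveApp.createFile(blob);
}
```

When the error is thrown, the URL probably returns something else, for example an HTML page such as a login or confirmation page.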
References:
fetch(url)
getBlob()