Convert a TAB-delimited TXT file to Google Sheets - google-apps-script

I'm looking for a way to convert my TXT file into a Google Sheets spreadsheet:
function convert_txt_gsheets() {
  var file = DriveApp.getFilesByName('file.txt').next();
  var body = file.getBlob().getDataAsString().split(/\n/);
  var result = body.map(r => r.split(/\t/));
  SpreadsheetApp.getActive().getSheets()[0].getRange(1, 1, result.length, result[0].length).setValues(result);
}
An error occurred: "The number of columns in the data does not match the number of columns in the range. The data has 1 but the range has 18."
Does anyone have an idea?
If I import the TXT file manually it works, but I need to do it through a Google Apps Script.

I only see typos/wrong method names for getBlob and getFilesByName (you used getBlobl and getFileByName), but aside from that, the only thing that could cause this error is that something in the file is written unexpectedly.
Update:
Upon checking, your TXT file has a blank line at the bottom. Delete it and the write should succeed. That's why the error occurs: the range expects 18 columns, but that last row only has 1, since it contains no data.
You could also filter the data before writing. Removing rows that don't have 18 columns will fix the issue. See the code below:
Modification:
var result = body.map( r => r.split(/\t/)).filter( r => r.length == 18);
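In isolation, the split-and-filter step behaves like this (a plain-JavaScript sketch; 3 columns stand in for the 18 in the question):

```javascript
// A file that ends with a newline yields a blank trailing row after split(),
// which the length filter drops.
const text = "a\tb\tc\nd\te\tf\n";
const result = text.split(/\n/).map(r => r.split(/\t/)).filter(r => r.length === 3);
console.log(result); // [["a","b","c"],["d","e","f"]]
```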
Successful run:

Related

Parsing a CSV from a Mail (Gmail) with double quotes and ? characters with Google Apps Script

I did set up routines with the following code to parse CSVs into specific spreadsheets:
function updateGmvAndNmv() {
  const threads = GmailApp.search("from:(sender#xxx.de) subject:(uniqueHeader)");
  const messages = threads[0].getMessages();
  const lastMessage = messages[messages.length - 1];
  const attachment = lastMessage.getAttachments()[0];
  const csvData = Utilities.parseCsv(attachment.getDataAsString(), ",");
  const ss = SpreadsheetApp.openById("spreadsheetID").getSheetByName("sheetName");
  const ssOriginalRange = ss.getRange("A:E");
  const ssToPaste = ss.getRange(1, 1, csvData.length, csvData[0].length);
  ssOriginalRange.clear();
  ssToPaste.setValues(csvData);
}
With the latest CSV that I want to parse, I encountered an issue where I am stuck. I tried to play around with the settings in the app that sends me the report, but I cannot change the way the CSV is constructed. When I look at the CSV in a text editor, I see something like this:
GMV and NMV per partner
"Merchant",,"NMV","GMV bef Cancellation","GMV bef Return"
When I let the above code run, it gets the file and outputs the following in my spreadsheet:
Spreadsheet Example
Which brings up the following questions:
Why do I have "" (double quotes) in row 5? I assumed the parseCsv() function removes those.
With my other CSVs I did not have any issues, but there I did not have any double quotes. Can someone explain the difference in CSVs, once with double quotes and once without?
How can I treat this data correctly, in order to get the data without the "" into the spreadsheet?
Why do I see some ? symbols (please look at the fx input field, rows 1 and 7), and how do I get rid of them? The export should be plain CSV, and in a text editor I see all values normally, without any ?.
The issue was the encoding: the correct encoding of the file is UTF-16, while the default encoding of .getDataAsString() is UTF-8.
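Blob.getDataAsString() accepts a charset argument, so attachment.getDataAsString("UTF-16") reads the file correctly. The effect of picking the wrong charset can be reproduced outside Apps Script with TextDecoder (a minimal sketch; the byte values are a made-up two-cell example):

```javascript
// "A,B" encoded as UTF-16LE with a byte-order mark:
const bytes = new Uint8Array([0xff, 0xfe, 0x41, 0x00, 0x2c, 0x00, 0x42, 0x00]);

// Decoded with the wrong charset, the BOM and NUL bytes survive as garbage
// characters (often rendered as ? or boxes):
const wrong = new TextDecoder("utf-8").decode(bytes);

// Decoded with the right charset, the text comes out clean:
const right = new TextDecoder("utf-16le").decode(bytes);
console.log(JSON.stringify(right)); // "A,B"
```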

Prevent Auto-Format DriveApi 3 Google Apps script

Using the Drive API v3, I'm looking for a way to make a copy of a CSV file in Google Sheets format without converting the text to numbers, dates, and formulas, as can be done in the Google Sheets menu:
File>Import>(Select your CSV file)> Untick "Convert text to number, dates and formula".
At the moment, I've got something such as:
function convert() {
  var file = DriveApp.getFileById('1234');
  var resource = {
    title: "Title",
    mimeType: MimeType.GOOGLE_SHEETS,
    parents: [{id: file.getParents().next().getId()}]
  };
  Drive.Files.copy(resource, file.getId());
}
To illustrate my example: I've got the text "2021-25-03" in my CSV file; if I run my macro, the new spreadsheet will automatically format my text as a date, and that's not my goal.
TFR.
There doesn't seem to be a setting in the API or in Apps Script to prevent the automatic conversion of numbers and dates, but we can build a script to work around this. Two tools are useful:
Apps Script's Utilities.parseCsv() method, which builds a 2D array of the values in the CSV file (as pure text; it does not interpret numbers and dates).
The fact that Google Sheets interprets any value starting with a single quote ' as text. This is true whether the value is entered in the UI or programmatically.
So the overall strategy is:
Copy the file as you are doing (or just create a new blank file, as we will write the values to it).
Parse the CSV values and prepend a ' to each one.
Write these modified values to the sheet.
Something like this:
function convert() {
  var file = DriveApp.getFileById(CSV_FILE_ID);
  // Create the copy:
  var resource = {
    title: "Title",
    mimeType: MimeType.GOOGLE_SHEETS,
    parents: [{id: file.getParents().next().getId()}]
  };
  var sheetsFile = Drive.Files.copy(resource, file.getId());
  // Parse the original CSV file:
  var csv = Utilities.parseCsv(file.getBlob().getDataAsString());
  // csv is a 2D array; prepend each value with a single quote:
  csv.forEach(function(row) {
    row.forEach(function(value, i) {
      row[i] = "'" + value;
    });
  });
  // Open the first (and only) sheet in the copy and overwrite the values with the modified ones:
  var sheet = SpreadsheetApp.openById(sheetsFile.id).getSheets()[0];
  sheet.getRange(1, 1, csv.length, csv[0].length).setValues(csv);
}
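The quote-prepending step on its own can be run outside Apps Script as plain JavaScript (quoteAll is my own name for the helper, not an Apps Script API):

```javascript
// Prepend a single quote to every value so Sheets stores it as literal text
// instead of interpreting it as a date, number, or formula.
function quoteAll(csv) {
  return csv.map(row => row.map(value => "'" + value));
}

const parsed = [["2021-25-03", "42"], ["=SUM(A1)", "plain"]];
console.log(quoteAll(parsed));
// [["'2021-25-03", "'42"], ["'=SUM(A1)", "'plain"]]
```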

XLSX Sheetnames unexpectedly being truncated by using Openpyxl load_workbook

I wrote a simple script to read a thousand XLSX files, each with 400-500 sheets whose names have more than 50 characters. After obtaining the sheet names, the script saves those names into CSV files that will eventually be uploaded to a DB. Here is the script:
import glob
import openpyxl as op

extension = 'XLSX'
xlsxfiles = [i for i in glob.glob('*.{}'.format(extension))]
for xlsxfile in xlsxfiles:
    fins = op.load_workbook(xlsxfile, read_only=True)
    sheetnames = fins.sheetnames
    with open('test_xlsx-' + xlsxfile + '.csv', 'w', newline='') as fout:
        fout.write(str(xlsxfile))
I have two issues that need help:
openpyxl's load_workbook only returned 31 characters of the sheet names. If a name has more than 31 characters, it is truncated to "Sheetname something something_4", but it should be
"Sheetname something something Real"
I tried pandas.ExcelFile.sheet_names but got the same issue.
The CSV file saved the sheet names side by side on a single line:
['Cover Page' 'Sheetname something something_4' 'Sheetname other']
But I need the data row by row, dropping all "[" and "'" characters:
Cover Page
Sheetname something something Real
Sheetname other
I am a novice in Python. All ideas and comments are welcome.
I am still unable to figure out how to fix the 31-character issue.
For the second issue, I added a for loop that goes through each sheet name and writes it as a one-item list. Here is the code:
import csv
import glob
import openpyxl as op

extension = 'XLSX'
xlsxfiles = [i for i in glob.glob('*.{}'.format(extension))]
for xlsxfile in xlsxfiles:
    fins = op.load_workbook(xlsxfile, read_only=True)
    sheetnames = fins.sheetnames
    with open('test_xlsx-' + xlsxfile + '.csv', 'w', newline='') as fout:
        sheetnameout = csv.writer(fout)
        for name in sheetnames:
            sheetnameout.writerow([name])  # That "[]" took me 8 hours.
Again, I am a novice in Python. All ideas and comments are welcome.

persistently sporadic out of memory error in Google Apps Scripts when copying data validations from a template

I have hundreds of Spreadsheets that were made from a template sheet. They all have the same number/name of sheets, rows, columns, etc...
I have added some data validations to the template. I want to copy the data validations from the template to each of the Spreadsheets. I have the code, and it works, but it throws a memory error.
It always throws the error; the only thing that changes is how many destination Spreadsheets it has processed first. Sometimes it processes 4 Spreadsheets before throwing the error, sometimes 50, sometimes more or fewer. I cannot figure out why.
I trimmed my code down to a working sample. I can't share the source files but they are just normal Spreadsheets with 5 sheets/tabs and various data validations. If it matters, the data validations do use named ranges. For example: =REGEXMATCH(LOWER(google_drive_url) , "^https:\/\/drive\.google\.com\/drive\/folders\/[a-z0-9_-]{33}$").
I have commented the below code but here is a recap:
Get the template Spreadsheet and cache all of the data validations in it
Go through each destination Spreadsheet:
Clear all of the data validations
Apply the data validations from the template
In my real code I have an array of destination file IDs. For testing purposes I am just using one destination file and applying the data validations from the template multiple times.
function myFunction() {
  var sourceFileID = "1rB7Z0C615Kn9ncLykVhVAcjmwkYb5GpYWpzcJRjfcD8";
  var destinationFileID = "1SMrwTuknVa1Xky9NKgqwg16_JNSoHcFTZA6QxzDh7q4";
  // get the source file
  var sourceSpreadsheet = SpreadsheetApp.openById(sourceFileID);
  var sourceDataValidationCache = {};
  // go through each sheet and get a copy of the data validations;
  // cache them for quick access later
  sourceSpreadsheet.getSheets().forEach(function(sourceSheet) {
    var sheetName = sourceSheet.getName();
    // save all the data validations for this sheet
    var thisSheetDataValidationCache = [];
    // get the full sheet range:
    // start at the first row, first column, and end at max rows and max columns;
    // get all the data validations in it, then go through each data validation row
    sourceSheet.getRange(1, 1, sourceSheet.getMaxRows(), sourceSheet.getMaxColumns()).getDataValidations().forEach(function(row, rowIndex) {
      // go through each column
      row.forEach(function(cell, columnIndex) {
        // we only need to save if there is a data validation
        if (cell) {
          // save it
          thisSheetDataValidationCache.push({
            "row": rowIndex + 1,
            "column": columnIndex + 1,
            "dataValidation": cell
          });
        }
      });
    });
    // save to cache for this sheet
    sourceDataValidationCache[sheetName] = thisSheetDataValidationCache;
  });
  // this is just an example,
  // so just update the data validations in the same destination numerous times to show the memory leak
  for (var i = 0; i < 100; ++i) {
    // so we can see from the log how many were processed before it threw a memory error
    Logger.log(i);
    // get the destination
    var destinationSpreadsheet = SpreadsheetApp.openById(destinationFileID);
    // go through each sheet
    destinationSpreadsheet.getSheets().forEach(function(destinationSheet) {
      var sheetName = destinationSheet.getName();
      // get the full range and clear existing data validations
      destinationSheet.getRange(1, 1, destinationSheet.getMaxRows(), destinationSheet.getMaxColumns()).clearDataValidations();
      // go through the cached data validations for this sheet
      sourceDataValidationCache[sheetName].forEach(function(dataValidationDetails) {
        // get the cell/column this data validation is for;
        // copy it, build it, and set it
        destinationSheet.getRange(dataValidationDetails.row, dataValidationDetails.column).setDataValidation(dataValidationDetails.dataValidation.copy().build());
      });
    });
  }
}
Is there something wrong with the code? Why would it throw an out-of-memory error? Is there any way to catch or prevent it?
In order to get a better idea of what's failing, I suggest you keep a counter of the iterations, to know how many go through.
I also just noticed the line
sourceSheet.getRange(1, 1, sourceSheet.getMaxRows(), sourceSheet.getMaxColumns()).getDataValidations().forEach(function(row, rowIndex){
This is not a good idea, because getMaxRows() and getMaxColumns() return the total number of rows and columns in the sheet, not just the ones with data. If your sheet is 100x100 and you only have data in the first 20x20 cells, you'll get a range that covers all 10,000 cells, and calling forEach means you go through every one of them.
A better approach would be getDataRange(), which returns a range that covers the entirety of your data (see the documentation). With that you get a much smaller range and considerably fewer cells to process.
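As a rough plain-JavaScript analog of that advice: instead of iterating the full grid, shrink it to the occupied rectangle first (occupiedRect is a hypothetical helper written for this sketch, not an Apps Script API):

```javascript
// Shrink a 2D array to the smallest rectangle containing all non-empty cells,
// analogous to using getDataRange() instead of getMaxRows()/getMaxColumns().
function occupiedRect(grid) {
  let lastRow = -1, lastCol = -1;
  grid.forEach((row, r) => {
    row.forEach((cell, c) => {
      if (cell !== "" && cell != null) {
        if (r > lastRow) lastRow = r;
        if (c > lastCol) lastCol = c;
      }
    });
  });
  return grid.slice(0, lastRow + 1).map(row => row.slice(0, lastCol + 1));
}

// A 4x4 grid with data only in the top-left 2x2 corner:
const grid = [
  ["a", "b", "", ""],
  ["c", "d", "", ""],
  ["", "", "", ""],
  ["", "", "", ""],
];
console.log(occupiedRect(grid)); // [["a","b"],["c","d"]]
```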

Google Script - arrayofarrays - incorrect range width

I have a Google Script that imports a TSV file and copies the contents into a Google Spreadsheet (the code isn't mine; I copied it from someone else's project after googling a bit).
After some tinkering, the script works great, but I keep getting an "Incorrect range width, was 1 but should be 6 (line 46, file "Test")" error.
Now, the TSV file being imported is basically a list consisting of 6 columns, i.e.:
fruits vegetables cars countries colors names
pear carrot ford nicaragua yellow frank
After reading up on how an array of arrays works and googling the error, I've concluded that somewhere in the importing/outputting process a final empty row gets added, i.e. instead of having 2 rows of 6 columns, I have 2 rows of 6 columns plus a 3rd row with 1 empty column.
Apparently the array of arrays needs all rows to be exactly the same width.
As far as I can tell, this extra row is not present in the source file. I also cannot modify the source file, so any fix needs to be done by the script itself.
Like I said, the script works, but I'd like to a) stop getting emails about errors in my script, and b) understand how to fix this.
Any suggestions?
Thanks in advance.
Here's the code (there are some commented lines where I tried some fixes which clearly didn't work, but I'm leaving them in for the sake of completeness).
function getFileIDFromName(fileName) {
  // Return the file ID for the first matching file name.
  // NB: DocsList has been deprecated as of 2014-12-11!
  // Using DriveApp instead.
  var files = DriveApp.getFiles();
  while (files.hasNext()) {
    var file = files.next();
    if (file.getName() === fileName) {
      return file.getId();
    }
  }
  return;
}

function getTsvFileAsArrayOfArrays(tsvFileID) {
  // Read the file into a single string.
  // Split on the newline char.
  // Loop over lines, yielding arrays by splitting on tabs.
  // Return an array of arrays.
  var txtFile = DriveApp.getFileById(tsvFileID),
      fileTextObj = txtFile.getAs('text/plain'),
      fileText = fileTextObj.getDataAsString(),
      lines = fileText.split('\n'),
      lines2DArray = [];
  lines.forEach(function (line) {
    lines2DArray.push(line.split('\t'));
  });
  return lines2DArray;
}

function writeArrayOfArraysToSheet(tsvFileID) {
  // Target range dimensions are determined
  // from those of the nested array.
  var sheet = SpreadsheetApp.getActiveSheet(),
      arrayOfArrays = getTsvFileAsArrayOfArrays(tsvFileID),
      dimensions = {rowCount: Math.floor(arrayOfArrays.length),
                    colCount: Math.floor(arrayOfArrays[0].length)},
      targetRng;
  sheet.clearContents();
  targetRng = sheet.getRange(1, 1,
                             dimensions.rowCount,
                             dimensions.colCount);
  // .setValues(arrayOfArrays);
  //targetRng = sheet.getRange(1, 1, arrayOfArrays.length, 6).setValues(arrayOfArrays);
  //targetRng.setValues(arrayOfArrays);
  targetRng.setValues(arrayOfArrays);
}

function runTsv2Spreadsheet() {
  // Call this function from the Script Editor.
  var fileName,
      fileID,
      file;
  //fileName = Browser.inputBox('Enter a TSV file name:',
  //                            Browser.Buttons.OK_CANCEL);
  //fileID = getFileIDFromName(fileName);
  //Logger.log(fileID)
  fileID = "0B-9wMGgdNc6CdWxSNGllNW5FZWM";
  if (fileID) {
    writeArrayOfArraysToSheet(fileID);
  }
}
I don't know if this is helpful, but have you tried the appendRow() method for Google Sheets? You can shape your array into the format you want, then loop through it, appending each item to the next empty row in the sheet.
In the past I looped through arrays trying to figure out the positions, but then I started using the appendRow() method, and it made things easier for me.
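Another option, since setValues() requires every row to have the same width, is to normalize ragged rows before writing them in one batch. A plain-JavaScript sketch (normalizeRows is my own name for the helper):

```javascript
// Pad short rows with empty strings and drop fully empty rows (such as the
// blank trailing row produced by splitting a file that ends with a newline),
// so every remaining row has the same width.
function normalizeRows(rows) {
  const width = Math.max(...rows.map(r => r.length));
  return rows
    .filter(r => r.some(cell => cell !== ""))
    .map(r => r.concat(Array(width - r.length).fill("")));
}

const ragged = [["pear", "carrot", "ford"], ["plum", "pea"], [""]];
console.log(normalizeRows(ragged));
// [["pear","carrot","ford"],["plum","pea",""]]
```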