Google Apps Script Utilities.parseCsv() change decimal and thousand separator - csv

I am new to GAS and I am struggling badly with this problem. (I haven't found a similar question on the site that would have solved my problem, therefore I am asking a new one.)
Goal: Import CSV from Google Drive into Google Sheets
Problem:
Currencies in the csv file are "1,000.57" --> US format
Currency format that I need "1.000,57" --> European format
Currently, with Utilities.parseCsv(), the formats just get messed up and the currencies are plain wrong.
Question: Is there a way to change "," to "." and "." to "," during the parse? If so, will there be further problems, since the delimiter for the CSV is "," as well?
I already found some code snippets on the web (not my code: props to spreadsheet.dev) and tried to change the following, but it does not seem to work:
//Imports a CSV file in Google Drive into the Google Sheet
function importCSVFromDrive() {
  var fileName = promptUserForInput("Please enter the name of the CSV file to import from Google Drive:");
  var files = findFilesInDrive(fileName);
  if(files.length === 0) {
    displayToastAlert("No files with name \"" + fileName + "\" were found in Google Drive.");
    return;
  } else if(files.length > 1) {
    displayToastAlert("Multiple files with name " + fileName + " were found. This program does not support picking the right file yet.");
    return;
  }
  var file = files[0];
  var csvString = file.getBlob().getDataAsString();
  var escapedString = csvString.replace(",", ".")
                               .replace(".", ",");
  var contents = Utilities.parseCsv(escapedString);
  var sheetName = writeDataToSheet(contents);
  displayToastAlert("The CSV file was successfully imported into " + sheetName + ".");
}
//Prompts the user for input and returns their response
function promptUserForInput(promptText) {
  var ui = SpreadsheetApp.getUi();
  var prompt = ui.prompt(promptText);
  var response = prompt.getResponseText();
  return response;
}
//Returns files in Google Drive that have a certain name.
function findFilesInDrive(filename) {
  var files = DriveApp.getFilesByName(filename);
  var result = [];
  while(files.hasNext())
    result.push(files.next());
  return result;
}
//Inserts a new sheet and writes a 2D array of data in it
function writeDataToSheet(data) {
  var ss = SpreadsheetApp.getActive();
  var sheet = ss.insertSheet();
  sheet.getRange(1, 1, data.length, data[0].length).setValues(data);
  return sheet.getName();
}
What am I doing wrong?

Utilities.parseCsv() is a hot mess. I recommend not using it. Instead, try the Drive v2 Advanced Google Service.
You will need to add Drive under Services.
Here is the code snippet you will need:
function insertFromCsv(fileName) {
  var blob = DriveApp.getFilesByName(fileName).next().getBlob();
  // Let Drive convert the CSV blob into a temporary Google Sheet
  var tempFile = Drive.Files.insert({title: "tempSheet"}, blob, {
    convert: true
  });
  var tempSsId = tempFile.getId();
  var tempSheet = SpreadsheetApp.openById(tempSsId).getSheets()[0];
  var newSheet = tempSheet.copyTo(SpreadsheetApp.getActive());
  DriveApp.getFileById(tempSsId).setTrashed(true); // clean up the temp file
  return newSheet.getName();
}
and change importCSVFromDrive as follows:
function importCSVFromDrive() {
  var fileName = promptUserForInput("Please enter the name of the CSV file to import from Google Drive:");
  var files = findFilesInDrive(fileName);
  if(files.length === 0) {
    displayToastAlert("No files with name \"" + fileName + "\" were found in Google Drive.");
    return;
  } else if(files.length > 1) {
    displayToastAlert("Multiple files with name " + fileName + " were found. This program does not support picking the right file yet.");
    return;
  }
  var file = files[0];
  // var csvString = file.getBlob().getDataAsString()
  // var escapedString = csvString.replace(",",".")
  //                              .replace(".",",");
  // var contents = Utilities.parseCsv(escapedString);
  // var sheetName = writeDataToSheet(contents);
  var fileName = file.getName();
  var sheetName = insertFromCsv(fileName);
  displayToastAlert("The CSV file was successfully imported into " + sheetName + ".");
}
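A caveat worth noting: once Drive has converted the file, the cells hold real numeric values, and whether they display as "1,000.57" or "1.000,57" follows the destination spreadsheet's locale. A minimal sketch, assuming a German locale suits you ('de_DE' is just one example of a European locale code):

// Sketch: make numeric cells render with European separators by
// switching the spreadsheet locale. Assumption: 'de_DE' fits your case;
// any European locale code works the same way.
function useEuropeanLocale() {
  SpreadsheetApp.getActive().setSpreadsheetLocale('de_DE');
}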

So a sample of your CSV data is here: https://drive.google.com/file/d/1ASevYOWtu8YL6YA4w-UqDuNXAS0RfaJF/view?usp=sharing
It looks to me like just plain CSV data:
Trades,Header,DataDiscriminator,Asset Category,Currency,Symbol,Date/Time,Quantity,T.Price
Trades,Data,Order,Stocks,USD,ALGN,"2021-06-28,10:50:27",3,627.17,621.52,-1881.51,-1,1882.51,0,-16.95,O
Trades,Data,Order,Stocks,USD,AMAT,"2021-06-29,09:38:53",14,142.15,141.92,-1990.1,-1,1991.1,0,-3.22,O
Trades,Data,Order,Stocks,USD,APH,"2021-07-02,09:30:01",30,69.438,69.95,-2083.14,-1,2084.14,0,15.36,O
I see no European-formatted numbers out there.
I believe it can be parsed correctly into these rows (fields separated with | for readability):
Trades | Header | DataDiscriminator | Asset Category | Currency | Symbol | Date/Time | Quantity | T.Price
Trades | Data | Order | Stocks | USD | ALGN | "2021-06-28,10:50:27" | 3 | 627.17 | 621.52 | -1881.51 | -1 | 1882.51 | 0 | -16.95 | O
Trades | Data | Order | Stocks | USD | AMAT | "2021-06-29,09:38:53" | 14 | 142.15 | 141.92 | -1990.1 | -1 | 1991.1 | 0 | -3.22 | O
Trades | Data | Order | Stocks | USD | APH | "2021-07-02,09:30:01" | 30 | 69.438 | 69.95 | -2083.14 | -1 | 2084.14 | 0 | 15.36 | O
I haven't tried to do it with Utilities.parseCsv(); instead I wrote my own little CSV parser, just to be sure that the task is doable and my assumptions are correct:
var s = `Trades,Header,DataDiscriminator,Asset Category,Currency,Symbol,Date/Time,Quantity,T.Price
Trades,Data,Order,Stocks,USD,ALGN,"2021-06-28,10:50:27",3,627.17,621.52,-1881.51,-1,1882.51,0,-16.95,O
Trades,Data,Order,Stocks,USD,AMAT,"2021-06-29,09:38:53",14,142.15,141.92,-1990.1,-1,1991.1,0,-3.22,O
Trades,Data,Order,Stocks,USD,APH,"2021-07-02,09:30:01",30,69.438,69.95,-2083.14,-1,2084.14,0,15.36,O`;
// replace ',' with '_' inside quotes
s.match(/("[^,]+),(.+")/g).forEach(t=>s=s.split(t).join(t.replace(/,/g,'_')));
// replace ',' with '\t', replace '_' with ',' and split string into 2-d array
var array = s.replace(/,/g,"\t").replace(/_/g,',').split('\n').map(x => x.split('\t'));
console.table(array);
Output (one parsed row per line):
0: ['Trades', 'Header', 'DataDiscriminator', 'Asset Category', 'Currency', 'Symbol', 'Date/Time', 'Quantity', 'T.Price']
1: ['Trades', 'Data', 'Order', 'Stocks', 'USD', 'ALGN', '"2021-06-28,10:50:27"', '3', '627.17', '621.52', '-1881.51', '-1', '1882.51', '0', '-16.95', 'O']
2: ['Trades', 'Data', 'Order', 'Stocks', 'USD', 'AMAT', '"2021-06-29,09:38:53"', '14', '142.15', '141.92', '-1990.1', '-1', '1991.1', '0', '-3.22', 'O']
3: ['Trades', 'Data', 'Order', 'Stocks', 'USD', 'APH', '"2021-07-02,09:30:01"', '30', '69.438', '69.95', '-2083.14', '-1', '2084.14', '0', '15.36', 'O']
If you use range.setValues(array) instead of console.table(array) you will probably get a proper table in your sheet.
Update
To replace 123.45 --> 123,45 in the array you need to add one line at the end:
array = array.map(row => row.map(cell => cell.replace(/(\d)\.(\d)/g, '$1,$2')));

I created a CSV file and tested it here: https://docs.google.com/spreadsheets/d/148muW2MhTBwXfppFhO85gQv6GRduvyb5OMHeS68VWH4/edit?usp=sharing
function importCsvFromId() {
  var id = '1yc04iLf20k6oVlwSMpWm-xythMPKJoBO';
  var csv = DriveApp.getFileById(id).getBlob().getDataAsString();
  var csvData = Utilities.parseCsv(csv);
  var f = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  f.getRange(1, 1, csvData.length, csvData[0].length).setValues(csvData);
}
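If you do stay with Utilities.parseCsv() and still need the European separators, a minimal sketch: parse first, then swap the separators cell by cell through a placeholder, so the two replacements don't undo each other and the field-delimiter commas are never touched (toEuropeanFormat is a hypothetical helper, not part of the answer above):

// Sketch: convert US-formatted numbers ("1,000.57") to European format
// ("1.000,57") after parsing. Note it touches every cell; restrict it to
// numeric columns if your data mixes in dates or text containing dots.
function toEuropeanFormat(csvData) {
  return csvData.map(function(row) {
    return row.map(function(cell) {
      return cell
        .replace(/,/g, '\u0000')   // thousands ',' -> placeholder
        .replace(/\./g, ',')       // decimal '.' -> ','
        .replace(/\u0000/g, '.');  // placeholder -> thousands '.'
    });
  });
}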

Related

Polars List Type to Comma Separated String

I have a df that I'd like to groupby and write to csv format. However, one of the columns has a list type that prevents writing the df to csv.
df = pl.DataFrame({"Column A": ["Variable 1", "Variable 2", "Variable 2", "Variable 3", "Variable 3", "Variable 4"],
                   "Column B": ["AB", "AB", "CD", "AB", "CD", "CD"]})
Which I want to group by as below:
df.groupby(by="Column A").agg(pl.col("Column B").unique())
Output:
shape: (4, 2)
┌────────────┬──────────────┐
│ Column A   ┆ Column B     │
│ ---        ┆ ---          │
│ str        ┆ list[str]    │
╞════════════╪══════════════╡
│ Variable 3 ┆ ["AB", "CD"] │
│ Variable 1 ┆ ["AB"]       │
│ Variable 4 ┆ ["CD"]       │
│ Variable 2 ┆ ["CD", "AB"] │
└────────────┴──────────────┘
When trying to write the above dataframe to csv it comes up with an error: "ComputeError: CSV format does not support nested data. Consider using a different data format. Got: 'list[str]'"
If trying to convert the list type to pl.Utf8 it leads to an error
(df
 .groupby(by="Column A").agg(pl.col("Column B").unique())
 .with_columns(pl.col("Column B").cast(pl.Utf8))
)
Output: "ComputeError: Cannot cast list type"
If I try to explode the list in the groupby context:
df.groupby(by="Column A").agg(pl.col("Column B").unique().explode())
The output is not desired:
shape: (4, 2)
┌────────────┬─────────────────────┐
│ Column A   ┆ Column B            │
│ ---        ┆ ---                 │
│ str        ┆ list[str]           │
╞════════════╪═════════════════════╡
│ Variable 1 ┆ ["A", "B"]          │
│ Variable 3 ┆ ["A", "B", ... "D"] │
│ Variable 2 ┆ ["A", "B", ... "B"] │
│ Variable 4 ┆ ["A", "B", ... "D"] │
└────────────┴─────────────────────┘
What would be the most convenient way for me to groupby and then write to csv?
Desired output written in csv:
shape: (4, 2)
┌────────────┬──────────────┐
│ Column A   ┆ Column B     │
│ ---        ┆ ---          │
│ str        ┆ list[str]    │
╞════════════╪══════════════╡
│ Variable 3 ┆ ["AB", "CD"] │
│ Variable 1 ┆ ["AB"]       │
│ Variable 4 ┆ ["CD"]       │
│ Variable 2 ┆ ["CD", "AB"] │
└────────────┴──────────────┘
There was a recent discussion about why this is the case.
It is possible to use ._s.get_fmt() to "stringify" the lists:
print(
    df
    .groupby(by="Column A").agg(pl.col("Column B").unique())
    .with_columns(
        pl.col("Column B").map(lambda row:
            [row._s.get_fmt(n, 0) for n in range(row.len())]
        ).flatten())
    .write_csv(),
    end=""
)
Column A,Column B
Variable 3,"[""AB"", ""CD""]"
Variable 1,"[""AB""]"
Variable 4,"[""CD""]"
Variable 2,"[""AB"", ""CD""]"
Another way is using str(), as @FObersteiner has suggested.
print(
    df.groupby("Column A").agg(
        pl.col("Column B")
        .unique()
        .apply(lambda col: str(col.to_list()))
    ).write_csv(),
    end=""
)
Column A,Column B
Variable 2,"['CD', 'AB']"
Variable 1,['AB']
Variable 3,"['CD', 'AB']"
Variable 4,['CD']
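On newer Polars versions you can avoid the Python lambdas entirely and join the list elements with an expression. A sketch, assuming a recent release (the namespace is .list in current versions and .arr in older ones, and group_by was spelled groupby before 1.0):

import polars as pl

df = pl.DataFrame({
    "Column A": ["Variable 1", "Variable 2", "Variable 2",
                 "Variable 3", "Variable 3", "Variable 4"],
    "Column B": ["AB", "AB", "CD", "AB", "CD", "CD"],
})

out = (
    df.group_by("Column A")                              # groupby(...) on older versions
      .agg(pl.col("Column B").unique())
      .with_columns(pl.col("Column B").list.join(", "))  # list[str] -> str
)
print(out.write_csv(), end="")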
The main problem with "stringifying" lists is that when you read the CSV data back in, you no longer have a list[] type.
import io
pl.read_csv(io.StringIO(
    'Column A,Column B\nVariable 4,"[""CD""]"\n'
    'Variable 1,"[""AB""]"\nVariable 2,"[""AB"", ""CD""]"\n'
    'Variable 3,"[""CD"", ""AB""]"\n'
))
shape: (4, 2)
┌────────────┬──────────────┐
│ Column A   ┆ Column B     │
│ ---        ┆ ---          │
│ str        ┆ str          │
╞════════════╪══════════════╡
│ Variable 4 ┆ ["CD"]       │
│ Variable 1 ┆ ["AB"]       │
│ Variable 2 ┆ ["AB", "CD"] │
│ Variable 3 ┆ ["CD", "AB"] │
└────────────┴──────────────┘
This is the reason for the recommendation of using an alternative format.
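For completeness, a sketch of that alternative-format route: Parquet supports nested dtypes, so a list[str] column survives a write/read round trip without any stringifying (the file name here is arbitrary):

import polars as pl

df = pl.DataFrame({
    "Column A": ["Variable 1", "Variable 2", "Variable 3"],
    "Column B": [["AB"], ["CD", "AB"], ["AB", "CD"]],
})

df.write_parquet("grouped.parquet")   # nested dtypes round-trip fine
back = pl.read_parquet("grouped.parquet")
print(back.schema)                    # Column B is still a list of strings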

How can I use Julia CSV package rowWriter?

I'm using Julia. I would like to write a single row again and again to an existing CSV file.
I think CSV.RowWriter can make it happen, but I don't know how to use it. Can anyone show me an example?
CSV.RowWriter is an iterator that produces consecutive rows of a table as strings. Here is an example:
julia> df = DataFrame(a=1:5, b=11:15)
5×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1     11
   2 │     2     12
   3 │     3     13
   4 │     4     14
   5 │     5     15

julia> for row in CSV.RowWriter(df)
           @show row
       end
row = "a,b\n"
row = "1,11\n"
row = "2,12\n"
row = "3,13\n"
row = "4,14\n"
row = "5,15\n"
You would now just need to write these strings to a file in append mode.
Most likely, since you want to append, you want to drop the header. You can do it e.g. like this:
julia> for row in CSV.RowWriter(df, writeheader=false)
           @show row
       end
row = "1,11\n"
row = "2,12\n"
row = "3,13\n"
row = "4,14\n"
row = "5,15\n"
If you want me to show how to write to a file please comment.
The reason why I do not show it is that you do not need to use CSV.RowWriter to achieve what you want. Just do the following:
CSV.write(file, table, append=true)
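For the "write a single row again and again" part, a minimal sketch of that append pattern (log.csv is an arbitrary file name; with append=true, CSV.write does not rewrite the column names):

using CSV, DataFrames

# Sketch: append one row at a time. The first call creates the file with
# a header; every later call appends data rows only.
for i in 1:3
    row = DataFrame(a=[i], b=[i + 10])
    CSV.write("log.csv", row; append=isfile("log.csv"))
end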
EDIT: example of writing with CSV.RowWriter:
julia> using DataFrames, CSV

julia> df = DataFrame(a=[1, 2], b=[3, 4])
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      3
   2 │     2      4

julia> isfile("test.txt") # make sure the file does not exist yet
false

julia> open("test.txt", "w") do io # create the file and write with header, as the file does not exist
           foreach(row -> print(io, row), CSV.RowWriter(df))
       end

julia> readlines("test.txt") # check all is as expected
3-element Vector{String}:
 "a,b"
 "1,3"
 "2,4"

julia> open("test.txt", "a") do io # append to the file and write without header
           foreach(row -> print(io, row), CSV.RowWriter(df, writeheader=false))
       end

julia> readlines("test.txt") # check that all is as expected
5-element Vector{String}:
 "a,b"
 "1,3"
 "2,4"
 "1,3"
 "2,4"

How to plot TimeArray in julia with zoom in for hourly, zoom out for daily/monthly?

There is some sample CSV data like this (the real data is in millisecond precision):
using TimeSeries, Plots
s="DateTime,Open,High,Low,Close,Volume
2020/01/05 16:14:01,20,23,19,20,30
2020/01/05 16:14:11,23,27,19,22,20
2020/01/05 17:14:01,24,28,19,23,10
2020/01/05 18:14:01,25,29,20,24,40
2020/01/06 08:02:01,26,30,22,25,50"
ta=readtimearray(IOBuffer(s),format="yyyy/mm/dd HH:MM:SS")
plot(ta.Volume)
I found that the packages TimeSeries and Temporal are based on daily plots. Is there any easy way to aggregate the data into minutes/hourly/daily/weekly... and plot it?
For the Open value, it should keep the first value during the period.
For the High value, it should be the maximum value during the period.
For the Low value, it should be the minimum value during the period.
For the Close value, it should be the last value during the period.
For the Volume value, it should be the sum value during the period.
I expect it could aggregate the volume like tb below:
s="DateTime,Volume
2020/01/05 16:00:00,50
2020/01/05 17:00:00,10
2020/01/05 18:00:00,40
2020/01/06 08:00:00,50"
tb=readtimearray(IOBuffer(s),format="yyyy/mm/dd HH:MM:SS")
plot(tb.Volume)
Method 1: I found a workable but not perfect method. For example, plotting hourly, by Volume:
using DataFrames,Statistics,Dates
df = DataFrame(ta)
df.ms = Dates.value.(df.timestamp)
df.hour = df.ms .÷ (60*60*1000)
df2 = aggregate(df[:, [:hour, :Volume]], :hour, sum)
df2.timestamp = convert.(DateTime, Dates.Millisecond.(df2.hour.*(60*60*1000)))
tb=TimeArray(df2[:,[:timestamp,:Volume_sum]], timestamp=:timestamp)
plot(tb)
The content of tb:
4×1 TimeArray{Float64,1,DateTime,Array{Float64,1}} 2020-01-05T16:00:00 to 2020-01-06T08:00:00
│                     │ Volume_sum │
├─────────────────────┼────────────┤
│ 2020-01-05T16:00:00 │ 50.0       │
│ 2020-01-05T17:00:00 │ 10.0       │
│ 2020-01-05T18:00:00 │ 40.0       │
│ 2020-01-06T08:00:00 │ 50.0       │
Method 2: There seems to be an easier way using the floor function:
df.hour2 = floor.(df.timestamp, Dates.Hour(1))
df2 = aggregate(df[:, [:hour2, :Volume]], :hour2, sum)
tb=TimeArray(df2[:,[:hour2,:Volume_sum]], timestamp=:hour2)
Method 3: Just use the second form of the collapse syntax:
using Statistics
tb1 = collapse(ta[:, :Open], hour, first, first)
tb2 = collapse(ta[:, :High], hour, first, maximum)
tb3 = collapse(ta[:, :Low], hour, first, minimum)
tb4 = collapse(ta[:, :Close], hour, first, last)
tb5 = collapse(ta[:, :Volume], hour, first, sum)
tb = merge(tb1, tb2, tb3, tb4, tb5)
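To plot the collapsed result, the recipe from the question applies unchanged; a sketch, assuming the ta from the question and the tb built above:

using Plots

# Sketch: hourly-aggregated volume as bars; swap :Volume for :Close etc.
# to inspect the other collapsed columns.
plot(tb.Volume, seriestype=:bar, label="hourly volume")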

readtable() when a string ends with \

When I read in a csv file containing
"number","text"
1,"row1text\"
2,"row2text"
with the commands
using DataFrames
readtable("filename.csv")
I get a dataframe with only one row. Apparently, the backslash at the end of the text in the first row is a problem. Is this expected behavior? Is there an alternative way where this problem is avoided?
As a side note: The following works fine (i.e. I get two rows) but is obviously impractical for reading in big files
df = csv"""
"number","text"
1,"row1text\"
2,"row2text"
"""
Since the backslash is the escape character by default, it escapes the quote mark and messes everything up. One workaround would be to use the CSV.jl package and specify a different escape character:
julia> using CSV

julia> CSV.read("filename.csv", escapechar = '~')
2×2 DataFrames.DataFrame
│ Row │ number │ text        │
├─────┼────────┼─────────────┤
│ 1   │ 1      │ "row1text\" │
│ 2   │ 2      │ "row2text"  │
But then you have to make sure the ~ chars are not escaping something else. There might be a better way of doing this, but this would be one hack to get around the problem.
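On current CSV.jl versions CSV.read takes an explicit sink, so the equivalent call would look roughly like this (a sketch; escapechar is still a supported keyword argument):

using CSV, DataFrames

# Sketch for newer CSV.jl: the sink (DataFrame) is passed explicitly;
# '~' is again just an arbitrary character assumed absent from the data.
df = CSV.read("filename.csv", DataFrame; escapechar='~')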
Another way would be to process the data row by row. Here is a way over-complicated example of doing so:
julia> open("filename.csv", "r") do f
           for (i, line) in enumerate(eachline(f))
               if i == 1
                   colnames = map(Symbol, split(line, ','))
                   global df = DataFrame(String, 0, length(colnames))
                   rename!(df,
                           Dict([(old_name, new_name) for (old_name, new_name) in zip(names(df), colnames)]))
               else
                   new_row = map(String, split(replace(line, "\\\"", "\""), ','))
                   # strip the quotes around values
                   new_row = map(x -> replace(x, "\"", ""), new_row)
                   push!(df, new_row)
               end
           end
       end
julia> df
2×2 DataFrames.DataFrame
│ Row │ "number" │ "text"     │
├─────┼──────────┼────────────┤
│ 1   │ "1"      │ "row1text" │
│ 2   │ "2"      │ "row2text" │

Readtable() with differing number of columns - Julia

I'm trying to read a CSV file into a DataFrame using readtable(). There is an unfortunate issue with the CSV file in that if the last x columns of a given row are blank, instead of generating that number of commas, it just ends the line. For example, I can have:
Col1,Col2,Col3,Col4
item1,item2,,item4
item5
Notice how in the third line, there is only one entry. Ideally, I would like readtable to fill the values for Col2, Col3, and Col4 with NA, NA, and NA; however, because of the lack of commas and therefore lack of empty strings, readtable() simply sees this as a row that doesn't match the number of columns. If I run readtable() in Julia with the sample CSV above, I get the error "Saw 2 Rows, 2 columns, and 5 fields, * Line 1 has 6 columns". If I add in 3 commas after item5, then it works.
Is there any way around this, or do I have to fix the CSV file?
If the CSV parsing doesn't need too much quote logic, it is easy to write a special purpose parser to handle the case of missing columns. Like so:
using DataFrames, DataStructures  # OrderedDict comes from DataStructures

function bespokeread(s)
    headers = split(strip(readline(s)), ',')
    ncols = length(headers)
    data = [String[] for i = 1:ncols]
    while !eof(s)
        newline = split(strip(readline(s)), ',')
        length(newline) < ncols && append!(newline, ["" for i = 1:ncols-length(newline)])
        for i = 1:ncols
            push!(data[i], newline[i])
        end
    end
    return DataFrame(; OrderedDict(Symbol(headers[i]) => data[i] for i = 1:ncols)...)
end
Then the file:
Col1,Col2,Col3,Col4
item1,item2,,item4
item5
Would give:
julia> df = bespokeread(f)
2×4 DataFrames.DataFrame
│ Row │ Col1 │ Col2 │ Col3 │ Col4 │
├─────┼─────────┼─────────┼──────┼─────────┤
│ 1 │ "item1" │ "item2" │ "" │ "item4" │
│ 2 │ "item5" │ "" │ "" │ "" │
The answer of Dan Getz is nice, but it converts everything to strings.
The following solution instead fills the gaps and writes a new file (in a memory-efficient way) that can then be imported normally using readtable():
function fillAll(iF, oF, d=",")
    open(iF, "r") do i
        open(oF, "w") do o # "w" for writing
            headerRow = strip(readline(i))
            headers = split(headerRow, d)
            nCols = length(headers)
            write(o, headerRow*"\n")
            for ln in eachline(i)
                nFields = length(split(strip(ln), d))
                write(o, strip(ln))
                [write(o, d) for y in 1:nCols-nFields] # write delimiters to match headers
                write(o, "\n")
            end
        end
    end
end
fillAll("data.csv","data_out.csv",";")
Even better: just use CSV.jl.
julia> f = IOBuffer("Col1,Col2,Col3,Col4\nitem1,item2,,item4\nitem5"); # or the filename

julia> CSV.read(f)
2×4 DataFrames.DataFrame
│ Row │ Col1    │ Col2    │ Col3  │ Col4    │
├─────┼─────────┼─────────┼───────┼─────────┤
│ 1   │ "item1" │ "item2" │ #NULL │ "item4" │
│ 2   │ "item5" │ #NULL   │ #NULL │ #NULL   │