Lua - How to analyse a .csv export to show the highest, lowest and average values etc - csv

Using Lua, I’m downloading a .csv file and then taking the first and last data lines to visually validate the time period from the start and end date/times they contain.
I’d also like to scan through the values and create a variety of variables, e.g. the highest, lowest and average value reported during that period.
The .csv is formatted in the following way:
created_at,entry_id,field1,field2,field3,field4,field5,field6,field7,field8
2021-04-16 20:18:11 UTC,6097,17.5,21.1,20,20,19.5,16.1,6.7,15.10
2021-04-16 20:48:11 UTC,6098,17.5,21.1,20,20,19.5,16.3,6.1,14.30
2021-04-16 21:18:11 UTC,6099,17.5,21.1,20,20,19.6,17.2,5.5,14.30
2021-04-16 21:48:11 UTC,6100,17.5,21,20,20,19.4,17.9,4.9,13.40
2021-04-16 22:18:11 UTC,6101,17.5,20.8,20,20,19.1,18.5,4.4,13.40
2021-04-16 22:48:11 UTC,6102,17.5,20.6,20,20,18.7,18.9,3.9,12.40
2021-04-16 23:18:11 UTC,6103,17.5,20.4,19.5,20,18.4,19.2,3.5,12.40
And my code to get the first and last data lines is as follows:
print("Part 1")
print("Start : check 2nd and last row of csv")
local ctr = 0
local i = 0
local csvfilename = "/home/pi/shared/feed12hr.csv"
local secondline, lastline
-- first pass: count the lines (io.lines opens and closes the file itself)
for _ in io.lines(csvfilename) do ctr = ctr + 1 end
print("...... Count : Number of lines downloaded = " .. ctr)
local linenumbera = 2
local linenumberb = ctr
-- second pass: pick out the 2nd and the last line
for line in io.lines(csvfilename) do
  i = i + 1
  if i == linenumbera then
    secondline = line
    print("...... 2nd Line is = " .. secondline)
  end
  if i == linenumberb then
    lastline = line
    print("...... Last line is = " .. lastline)
  end
end
print("End : Extracted 2nd and last row of csv")
But I now plan to pick a column, ideally by name (as I’d like to be able to use this against other .csv exports of a similar structure), and get the .csv into a table/array.
I’ve found an option for that here: "Csv file to a Lua table and access the lines as new table or function()".
See below:
#!/usr/bin/lua
print("Part 2")
print("Start : Convert .csv to table")
local csvfilename = "/home/pi/shared/feed12hr.csv"
local csv = io.open(csvfilename, "r")
local items = {} -- Store our values here
local headers = {} --
local first = true
for line in csv:gmatch("[^\n]+") do
  if first then -- this is to handle the first line and capture our headers.
    local count = 1
    for header in line:gmatch("[^,]+") do
      headers[count] = header
      count = count + 1
    end
    first = false -- set first to false to switch off the header block
  else
    local name
    local i = 2 -- We start at 2 because we wont be increment for the header
    for field in line:gmatch("[^,]+") do
      name = name or field -- check if we know the name of our row
      if items[name] then -- if the name is already in the items table then this is a field
        items[name][headers[i]] = field -- assign our value at the header in the table with the given name.
        i = i + 1
      else -- if the name is not in the table we create a new index for it
        items[name] = {}
      end
    end
  end
end
print("End : .csv now in table/array structure")
But I’m getting the following error:
pi@raspberrypi:/ $ lua home/pi/Documents/csv_to_table.lua
Part 2
Start : Convert .csv to table
lua: home/pi/Documents/csv_to_table.lua:12: attempt to call method 'gmatch' (a nil value)
stack traceback:
home/pi/Documents/csv_to_table.lua:12: in main chunk
[C]: ?
pi@raspberrypi:/ $
Any ideas on that?
I can confirm that the .csv file is there.
Once everything is (hopefully) in a table, I want to generate a list of variables based on the information in a chosen column, which I can then send in a push notification or email (I already have the code for that).
The following is what I’ve been able to create so far, but I would appreciate any help with further analysis of the values in the chosen column, such as getting the highest, lowest and average.
print("Part 3")
print("Start : Create .csv analysis values/variables")
local total = 0
local count = 0
for name, item in pairs(items) do
  for field, value in pairs(item) do
    if field == "cabin" then
      print(field .. " = " .. value)
      total = total + value
      count = count + 1
    end
  end
end
local average = tonumber(total/count)
local roundupdown = math.floor(average * 100)/100
print(count)
print(total)
print(total/count)
print(roundupdown)
print("End : analysis values/variables created")

io.open returns a file handle on success, not a string, and a file handle has no gmatch method.
Hence
local csv = io.open(csvfilename, "r")
--...
for line in csv:gmatch("[^\n]+") do
--...
will raise an error.
You need to read the file into a string first.
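As a minimal sketch of the "read into a string first" route: the helper name read_all and the throwaway demo file are assumptions made for illustration, not part of the original code. Once the contents are a string, gmatch works as the question's loop expected.

```lua
-- Hypothetical helper: read a whole file into one string.
local function read_all(path)
  local file, err = io.open(path, "r")
  if not file then return nil, err end
  local text = file:read("*a")  -- "*a" reads everything up to end of file
  file:close()
  return text
end

-- Demo on a throwaway file so the sketch runs on its own;
-- in the question this would be /home/pi/shared/feed12hr.csv.
local path = os.tmpname()
local out = assert(io.open(path, "w"))
out:write("created_at,entry_id,field1\n2021-04-16 20:18:11 UTC,6097,17.5\n")
out:close()

local text = assert(read_all(path))
local lines = {}
for line in text:gmatch("[^\n]+") do  -- gmatch now runs on a string
  lines[#lines + 1] = line
end
os.remove(path)
print(#lines, lines[1])
```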
Alternatively, you can iterate over the lines of a file using file:lines(...) or io.lines, as you already do in your code.
local csv = io.open(csvfilename, "r")
if csv then
  for line in csv:lines() do
    -- ...
  end
  csv:close()
end
Your Part 1 code also iterates over the file more often than it needs to: a single pass is enough to count the lines and remember the second and the last one.
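A single pass could look roughly like this. The function name scan_lines is hypothetical, and the demo feeds it an in-memory sample through gmatch so it runs without a file; with a real export you would pass io.lines(csvfilename) instead.

```lua
-- Hypothetical helper: one pass that keeps the header, the first data
-- line, the last line and the line count.
local function scan_lines(next_line)
  local header, first, last
  local count = 0
  for line in next_line do
    count = count + 1
    if count == 1 then
      header = line
    elseif count == 2 then
      first = line
    end
    last = line  -- overwritten every time, so it ends up as the final line
  end
  return header, first, last, count
end

local sample = "created_at,entry_id,field1\n" ..
  "2021-04-16 20:18:11 UTC,6097,17.5\n" ..
  "2021-04-16 20:48:11 UTC,6098,17.5\n" ..
  "2021-04-16 21:18:11 UTC,6099,17.5\n"

local header, first, last, count = scan_lines(sample:gmatch("[^\n]+"))
print("...... Count : Number of lines downloaded = " .. count)
print("...... 2nd Line is = " .. first)
print("...... Last line is = " .. last)
```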
Edit:
This is how you could fill a data table while calculating the minima and maxima for each column on the fly. This assumes you always have valid lines! A proper solution should verify the data.
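-- prepare a table to store the minima and maxima in
local colExtrema = {min = {}, max = {}}
local rows = {}
local csvFile = assert(io.open(csvfilename, "r"))
-- go over the file linewise
for line in csvFile:lines() do
  -- split the line into 3 parts
  local timeStamp, id, dataStr = line:match("([^,]+),(%d+),(.*)")
  -- the header line has no numeric id, so it does not match and is skipped
  if timeStamp then
    -- create a row container
    local row = {timeStamp = timeStamp, id = id, data = {}}
    -- fill the row data
    for val in dataStr:gmatch("[%d%.]+") do
      local num = tonumber(val)
      table.insert(row.data, num)
      local col = #row.data
      -- find the biggest value so far
      -- our initial value is the smallest number possible
      local oldMax = colExtrema.max[col] or -math.huge
      -- store the bigger value as the new maximum
      colExtrema.max[col] = math.max(num, oldMax)
      -- same idea for the minimum, starting from the largest number possible
      local oldMin = colExtrema.min[col] or math.huge
      colExtrema.min[col] = math.min(num, oldMin)
    end
    -- insert row data
    table.insert(rows, row)
  end
end
csvFile:close()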
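To pick the column by header name, as the question asks, something along these lines might work. It is only a sketch: column_stats is a made-up helper, it assumes there are no empty or quoted cells (true for the sample export), and the demo reads three sample rows from a string rather than from the file.

```lua
-- Hypothetical helper (not from the original post): min, max and average
-- of one column chosen by its header name.
local function column_stats(next_line, column)
  local index                          -- position of the chosen column
  local min, max = math.huge, -math.huge
  local total, count = 0, 0
  for line in next_line do
    local fields = {}
    for field in line:gmatch("[^,\r]+") do
      fields[#fields + 1] = field
    end
    if not index then
      -- first line: find where the requested header sits
      for i, name in ipairs(fields) do
        if name == column then index = i end
      end
      if not index then return nil, "no column named " .. column end
    else
      local value = tonumber(fields[index])
      if value then                    -- skip non-numeric cells
        min = math.min(min, value)
        max = math.max(max, value)
        total = total + value
        count = count + 1
      end
    end
  end
  if count == 0 then return nil, "no numeric values in " .. column end
  return { min = min, max = max, average = total / count, count = count }
end

local sample = table.concat({
  "created_at,entry_id,field1,field2,field3,field4,field5,field6,field7,field8",
  "2021-04-16 20:18:11 UTC,6097,17.5,21.1,20,20,19.5,16.1,6.7,15.10",
  "2021-04-16 20:48:11 UTC,6098,17.5,21.1,20,20,19.5,16.3,6.1,14.30",
  "2021-04-16 21:18:11 UTC,6099,17.5,21.1,20,20,19.6,17.2,5.5,14.30",
}, "\n")

local stats = assert(column_stats(sample:gmatch("[^\n]+"), "field7"))
-- prints: highest 6.70, lowest 5.50, average 6.10 over 3 rows
print(string.format("highest %.2f, lowest %.2f, average %.2f over %d rows",
  stats.max, stats.min, stats.average, stats.count))
```

With the real file, io.lines(csvfilename) can be passed in place of the gmatch iterator, and the returned table gives you the values to drop into a notification.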

Related

Lua - Match pattern for CSV import to array, that factors in empty values (two commas next to each other)

I have been using the following Lua code for a while to do simple csv-to-array conversions. Everything previously had a value in every column, but this csv-formatted bank statement has empty values, which the code does not handle.
Here’s an example csv, with debits and credits.
Transaction Date,Transaction Type,Sort Code,Account Number,Transaction Description,Debit Amount,Credit Amount,Balance
05/04/2022,DD,'11-70-79,6033606,Refund,,10.00,159.57
05/04/2022,DEB,'11-70-79,6033606,Henry Ltd,30.00,,149.57
05/04/2022,SO,'11-70-79,6033606,NEIL PARKS,20.00,,179.57
01/04/2022,FPO,'11-70-79,6033606,MORTON GREEN,336.00,,199.57
01/04/2022,DD,'11-70-79,6033606,WORK SALARY,,100.00,435.57
01/04/2022,DD,'11-70-79,6033606,MERE BC,183.63,,535.57
01/04/2022,DD,'11-70-79,6033606,ABC LIFE,54.39,,719.20
I’ve tried different patterns (https://www.lua.org/pil/20.2.html), but none seem to work. I’m beginning to think I can’t fix this via the pattern alone, as changing it would break how it works for the other columns. I’d appreciate it if anyone could share how they would approach this.
local csvfilename = "/mnt/nas/Fireflyiii.csv"
local MATCH_PATTERN = "[^,]+"
local function create_array_from_file(csvfilename)
  local file = assert(io.open(csvfilename, "r"))
  local arr = {}
  for line in file:lines() do
    local row = {}
    for match in string.gmatch(line, MATCH_PATTERN) do
      table.insert(row, match)
    end
    table.insert(arr, row)
  end
  file:close()
  return arr
end
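One possible approach, offered as a sketch rather than a definitive fix: "[^,]+" needs at least one character, so two adjacent commas yield nothing. Appending a comma to the line and matching "([^,]*)," instead produces one capture per field, empty ones included. The function name split_keep_empty is hypothetical, and quoted fields containing commas are not handled.

```lua
-- Hypothetical replacement for the inner gmatch loop: every field ends
-- with a comma once one is appended, so empty fields become "".
local function split_keep_empty(line)
  local row = {}
  for field in (line .. ","):gmatch("([^,]*),") do
    row[#row + 1] = field
  end
  return row
end

local credit = split_keep_empty("05/04/2022,DD,'11-70-79,6033606,Refund,,10.00,159.57")
local debit = split_keep_empty("05/04/2022,DEB,'11-70-79,6033606,Henry Ltd,30.00,,149.57")

print(#credit, #debit)           -- 8 columns in both rows
print(credit[6] == "", credit[7]) -- empty debit, credit of 10.00
print(debit[6], debit[7] == "")   -- debit of 30.00, empty credit
```

Because every row now has the same number of entries, column positions line up with the header row, and an empty amount can be turned into 0 with tonumber(field) or 0.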

insert Knn csv into table LUA

I'm trying to load a csv containing kNN data (3 columns, no names),
e.g.
4 3 a
1 3 a
3 3 a
4 5 b
I have been able to load the file into a string.
When I try to move that into a table I get no errors, however when I print the table to screen I get values of nil.
I tried changing the contents of the file, which gives the same result, and if I change (contents) to (knn_data) I get the path of the csv in all keys.
I'm trying to get the csv data to appear within the indexed table and in its 3 columns.
Here is the code:
--load kNN file.
local knn_data = system.pathForFile("knn.csv", system.ResourceDirectory)
local file, errorString = io.open(knn_data, "r")
if not file then
print("File Error: File Unavailable")
else
local contents = file:read("*a")
print(contents)
io.close(file)
end
file = nil
-- load data into table
dataset = {}
for line in io.lines(knn_data) do
dataset[#dataset+1] = (contents)
Previously attached screenshot of code
...
else
local contents= file:read("*a")
print(contents)
--io.close(file)
end
contents is a local variable declared inside your else block.
Outside of that block, contents is nil.
dataset = {}
for line in io.lines(knn_data) do
dataset[#dataset+1] = (contents)
So dataset[#dataset+1] = (contents) is equivalent to dataset[#dataset+1] = nil, which stores nothing.
Within that generic for loop, line contains the line read from the file, so that is what you should store.
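Working with line could look something like the sketch below. The helper parse_knn_line is made up for illustration, and it assumes the three columns are separated by spaces or commas (the sample shows spaces, the question says csv). The demo iterates over an in-memory sample; in the app, the loop would be for line in io.lines(knn_data) do.

```lua
-- Hypothetical helper: split one line into its three columns.
local function parse_knn_line(line)
  local x, y, label =
    line:match("^%s*([%d%.%-]+)[,%s]+([%d%.%-]+)[,%s]+(%S+)")
  if not x then return nil end  -- blank or malformed line
  return { x = tonumber(x), y = tonumber(y), label = label }
end

local dataset = {}
local sample = "4 3 a\n1 3 a\n3 3 a\n4 5 b\n"
for line in sample:gmatch("[^\r\n]+") do
  local row = parse_knn_line(line)
  if row then
    dataset[#dataset + 1] = row  -- store the parsed line, not contents
  end
end

for i, row in ipairs(dataset) do
  print(i, row.x, row.y, row.label)
end
```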

Upload contents of CSV as new maximum stock position in Exact Online

I want to upload the contents of a CSV file as new values in Exact Online data set using for instance the following SQL statement:
update exactonlinerest..ItemWarehouses
set maximumstock=0
where id='06071a98-7c74-4c26-9dbe-1d422f533246'
and maximumstock != 0
I can retrieve the contents of the file using:
select *
from files('C:\path\SQLScripts', '*.csv', true)@os fle
join read_file_text(fle.file_path)@os
But I seem unable to split the multi-line text in the file_contents field into separate lines or records.
How can I split the file_contents field into multiple lines (for instance using 'update ...' || VALUE and then running it through @@mydump.sql, or directly using an insert into / update statement)?
For now I've been able to solve it using regular expressions and then loading the generated SQL statements into the SQL engine as follows:
select regexp_replace(rft.file_contents, '^([^,]*),([^,]*)(|,.*)$', 'update exactonlinerest..ItemWarehouses set maximumstock = $1 where code = $2 and maximumstock != $1;' || chr(13), 1, 0, 'm') stmts
, 'dump2.sql' filename
from files('C:\path\SQLScripts', '*.csv', true)@os fle
join read_file_text(fle.file_path)@os rft
local export documents in stmts to "c:\path\sqlscripts" filename column filename
@c:\hantex\path\dump2.sql
However, it is error-prone when the article code contains a single quote.

Dealing with currency values in PIG - pigstorage

I have 2 column CSV file loaded in HDFS. Column 1 is a Model name, column 2 is a price in $. Example - Model: IE33, Price: $52678.00
When I run the following script, the price values all come back as a two-digit result, for example $52.
ultraPrice = LOAD '/user/maria_dev/UltrasoundPrice.csv' USING PigStorage(',') AS (
Model, Price);
dump ultraPrice;
All my values are between $20000 and $60000. I don't know why it is being cut off.
If I change the CSV file and remove the $ from the price values everything works fine, but I know there has to be a better way.
Note that in your load statement you are not specifying the datatype. By default, Model and Price will be of type bytearray, hence the discrepancy.
You can either remove the $ from the csv file, or load the data as chararray, strip the $ sign, and cast the result to float.
A = LOAD '/user/maria_dev/UltrasoundPrice.csv' USING TextLoader() as (line:chararray);
A1 = FOREACH A GENERATE REPLACE(line,'([^a-zA-Z0-9.,\\s]+)','');
B = FOREACH A1 GENERATE FLATTEN(STRSPLIT($0,','));
B1 = FOREACH B GENERATE $0 as Model,(float)$1 as Price;
DUMP B1;

How to import comma delimited text file into datawindow (powerbuilder 11.5)

Hi, good day. I'm very new to PowerBuilder and I'm using PB 11.5.
Does anyone know how to import a comma-delimited text file into a datawindow?
Example text file:
"1234","20141011","Juan, Delacruz","Usa","001992345456"...
"12345","20141011","Arc, Ino","Newyork","005765753256"...
How can I import the third column, which is the full name, and the last column, which is the account number? I want to transfer the name and account number into my external datawindow. I've tried to use ImportString, but all the rows are transferred into one column only. I have three fields in my external datawindow, among them the name and account number.
Here's the code:
ls_File = dw_2.Object.file_name[1]
li_FileHandle = FileOpen(ls_File)
li_FileRead = FileRead(li_FileHandle, ls_Text)
DO WHILE li_FileRead > 0
li_Count ++
li_FileRead = FileRead(li_FileHandle, ls_Text)
ll_row = dw_1.ImportString(ls_Text,1)
Loop.
Please help me with the code! Thank You
It seems that PB expects a tab-separated file by default (even though the 'c' in 'csv' stands for 'comma').
Add the CSV! enumerated value to the arguments of ImportString() and it should fix the problem (it does on my test box).
Also, the columns defined in your dataobject must match the columns in the csv file (at least the first columns you are interested in). If there are more columns in the csv file, they will be ignored. But if you want to get the 1st (or 2nd) and 3rd columns, you need to define the first 3 columns. You can always hide #1 or #2 if you do not need it.
BTW, your code has some issues:
- You should always test the return values of function calls like FileOpen() so you can stop processing if the file is non-existent or unreadable.
- You are reading the text file twice for the first row: once before the while and again inside the loop. Or maybe that is intended, to skip a first line with column headers?
FWIW, here is working code based on yours:
string ls_file = "c:\dev\powerbuilder\experiment\data.csv"
string ls_text
int li_FileHandle, li_fileread, li_count
long ll_row
li_FileHandle = FileOpen(ls_File)
if li_FileHandle < 1 then
return
end if
li_FileRead = FileRead(li_FileHandle, ls_Text)
DO WHILE li_FileRead > 0
li_Count ++
ll_row = dw_1.ImportString(csv!,ls_Text,1)
li_FileRead = FileRead(li_FileHandle, ls_Text)//read next line
Loop
fileclose(li_fileHandle)
Use the datawindow_name.ImportFile(CSV!, file_path) method.