Unable to extract simple CSV file using U-SQL

I have a CSV file.
Almost all the records are getting processed fine; however, there are two cases in which I am experiencing an issue.
Case 1:
A record containing quotes within quotes:
"some data "some data" some data"
Case 2:
A record containing comma within quotes:
"some data, some data some data"
I have looked into this issue and found my way to the quoting parameter of the extractor, but I have observed that setting (quoting: false) solves case 1 and fails for case 2, while setting (quoting: true) solves case 2 but fails for case 1.
Constraints: there is no room for changing the data file. The future data will be tailored accordingly, but for this existing data I have to resolve the issue as-is.

Try this: import the records as single rows, then fix the row text by doubling the inner quotes (the commas are then handled by standard CSV quoting):
DECLARE @input string = @"/Samples/Data/Sample1.csv";
DECLARE @output string = @"/Output/Sample1.txt";

// Import each record as a single text column
@data =
    EXTRACT rowastext string
    FROM @input
    USING Extractors.Text(delimiter : '\n', quoting : false);

// Double the quotes that are not at a field boundary
@query =
    SELECT Regex.Replace(rowastext, "([^,])\"([^,])", "$1\"\"$2") AS rowascsv
    FROM @data;

OUTPUT @query
TO @output
USING Outputters.Csv(quoting : false);
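To see what that Regex.Replace is doing, here is the same transformation sketched in Python (the sample row is made up for illustration):

```python
import csv
import io
import re

# A row exhibiting both cases: inner quotes and a comma inside quotes
row = '"some data "some data" some data","some data, some data"'

# Double every quote that is not adjacent to a comma, i.e. not a field boundary
fixed = re.sub(r'([^,])"([^,])', r'\1""\2', row)

# The repaired row now parses cleanly under standard CSV quoting rules
fields = next(csv.reader(io.StringIO(fixed)))
assert fields[0] == 'some data "some data" some data'
assert fields[1] == 'some data, some data'
```

Note that the pattern assumes an inner quote is never directly next to a comma; a stray quote immediately before or after a comma inside a field would still break the parse.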


Lua - Match pattern for CSV import to array that factors in empty values (two commas next to each other)

I have been using the following Lua code for a while to do simple CSV-to-array conversions. Previously everything had a value in every column, but this time, on a CSV-formatted bank statement, there are empty values, which this code does not handle.
Here’s an example CSV, with debits and credits.
Transaction Date,Transaction Type,Sort Code,Account Number,Transaction Description,Debit Amount,Credit Amount,Balance
05/04/2022,DD,'11-70-79,6033606,Refund,,10.00,159.57
05/04/2022,DEB,'11-70-79,6033606,Henry Ltd,30.00,,149.57
05/04/2022,SO,'11-70-79,6033606,NEIL PARKS,20.00,,179.57
01/04/2022,FPO,'11-70-79,6033606,MORTON GREEN,336.00,,199.57
01/04/2022,DD,'11-70-79,6033606,WORK SALARY,,100.00,435.57
01/04/2022,DD,'11-70-79,6033606,MERE BC,183.63,,535.57
01/04/2022,DD,'11-70-79,6033606,ABC LIFE,54.39,,719.20
I’ve tried different patterns (https://www.lua.org/pil/20.2.html), but none seem to work. I’m beginning to think I can’t fix this via the pattern alone, as it would break how it works for the rest. I’d appreciate it if anyone can share how they would approach this…
local csvfilename = "/mnt/nas/Fireflyiii.csv"
local MATCH_PATTERN = "[^,]+"

local function create_array_from_file(csvfilename)
    local file = assert(io.open(csvfilename, "r"))
    local arr = {}
    for line in file:lines() do
        local row = {}
        for match in string.gmatch(line, MATCH_PATTERN) do
            table.insert(row, match)
        end
        table.insert(arr, row)
    end
    file:close()
    return arr
end
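The root cause is that [^,]+ requires at least one character, so two adjacent commas produce no match and the empty field vanishes. A common Lua fix is to append a trailing comma and capture with "([^,]*)," instead, i.e. for match in string.gmatch(line .. ",", "([^,]*),"). The same idea, sketched in Python purely for illustration:

```python
import re

line = "05/04/2022,DD,'11-70-79,6033606,Refund,,10.00,159.57"

# "[^,]+"-style matching silently drops the empty Debit Amount column...
greedy = re.findall(r"[^,]+", line)
assert len(greedy) == 7  # one field lost

# ...while appending a trailing comma and capturing "([^,]*)," keeps it
fields = re.findall(r"([^,]*),", line + ",")
assert len(fields) == 8
assert fields[5] == ""      # empty Debit Amount preserved
assert fields[6] == "10.00"
```

This only works because the bank statement has no quoted fields containing commas; for fully general CSV you would want a real CSV parser rather than a pattern.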

Regex/Gsub removing all nested double quotes instead of just the outer ones

I want to import the following CSV into a Rails database:
"[{""id"":""actions"",""name"":""app"",""description""}]"
After importing, I always get:
"{\"id\":\"actions\",\"name\":\"app\",\"description\"]}"
I want the import to look like this:
[{"id":,"actions":,"name":,"description",...}]"
I tried using .gsub!(/"/,'') but this returns:
"[{id:actions,name:actions,description:}]"
The issue is that all the quote marks have been removed, so it's just id instead of "id".
My code is:
def import_app_version
  path = Rails.root.join('db', 'csv_export', 'app_versions.csv')
  counter = 0
  puts "Inserts on table app_versions started..."
  CSV.foreach(path, headers: true) do |row|
    next if row.to_hash['modules'] == nil
    row.to_hash['modules'].gsub!(/"/,'')
    next if row.to_hash['deleted_at'] != nil
    counter += 1
    AppVersion.skip_callbacks = true
    AppVersion.create!(row.to_hash)
  end
  AppVersion.skip_callbacks = false
  puts "#{counter} inserts on table app_versions complete"
end
What is the right way to do this so that the import works correctly and the data is imported as it's meant to be?
I have been searching for half a day, and found so many answers but they all end up removing all of the double quotes as displayed above.
Or, even better, if someone knows a way to import CSV with JSON content the right way.
Do not use replace, and do not try to remove quotes manually.
"[{""id"":""actions"",""name"":""app"",""description""}]"
If you read the above data with a CSV parser/reader (possibly with a 'quoted fields' parameter), all should be fine.
The surrounding quotes will be removed, and the inner doubled quotes will be unescaped to a single double quote.
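That behaviour is easy to verify. Ruby's CSV.foreach unescapes doubled quotes by default, and Python's csv module (used here just as an illustration) does the same:

```python
import csv
import io

# A CSV field quoted the standard way: inner quotes are doubled
raw = '"[{""id"":""actions"",""name"":""app""}]"\n'

# The parser strips the enclosing quotes and unescapes "" to "
field = next(csv.reader(io.StringIO(raw)))[0]
assert field == '[{"id":"actions","name":"app"}]'
```

In the Rails import above, dropping the gsub! and handing row['modules'] to a JSON parser should therefore be enough, assuming the file itself is well-formed CSV.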

Error parsing JSON: more than one document in the input (Redshift to Snowflake SQL)

I'm trying to convert a query from Redshift to Snowflake SQL.
The Redshift query looks like this:
SELECT
cr.creatives as creatives
, JSON_ARRAY_LENGTH(cr.creatives) as creatives_length
, JSON_EXTRACT_PATH_TEXT(JSON_EXTRACT_ARRAY_ELEMENT_TEXT (cr.creatives,0),'previewUrl') as preview_url
FROM campaign_revisions cr
The Snowflake query looks like this:
SELECT
cr.creatives as creatives
, ARRAY_SIZE(TO_ARRAY(ARRAY_CONSTRUCT(cr.creatives))) as creatives_length
, PARSE_JSON(PARSE_JSON(cr.creatives)[0]):previewUrl as preview_url
FROM campaign_revisions cr
It seems like JSON_EXTRACT_PATH_TEXT isn't converted correctly, as the Snowflake query results in error:
Error parsing JSON: more than one document in the input
cr.creatives is formatted like this:
"[{""previewUrl"":""https://someurl.com/preview1.png"",""device"":""desktop"",""splitId"":null,""splitType"":null},{""previewUrl"":""https://someurl.com/preview2.png"",""device"":""mobile"",""splitId"":null,""splitType"":null}]"
It seems to me that you are not working with valid JSON data inside Snowflake.
Please review the file format used for the COPY INTO command.
If you open the "JSON" text provided in a text editor, note that it is not parsed or formatted as JSON because of the quoting you have. Once your issue with the double / escaped quotes is handled, you should be able to make good progress.
(Screenshot: proper JSON on the left, original data on the right.)
If you are not inclined to reload your data, see if you can create a JavaScript user-defined function to remove the quotes from your string; then you can use Snowflake to process the variant column.
The following code is a working proof of concept that removes the double quotes for you.
var textOriginal = '[{""previewUrl"":""https://someurl.com/preview1.png"",""device"":""desktop"",""splitId"":null,""splitType"":null},{""previewUrl"":""https://someurl.com/preview2.png"",""device"":""mobile"",""splitId"":null,""splitType"":null}]';

function parseText(input) {
    // collapse the doubled quotes back to single ones, then parse
    var a = input.replaceAll('""', '"');
    return JSON.parse(a);
}

x = parseText(textOriginal);
console.log(x);
For anyone else seeing this double double quote issue in JSON fields coming from CSV files in a Snowflake external stage (slightly different issue than the original question posted):
The issue is likely that you need to use the FIELD_OPTIONALLY_ENCLOSED_BY setting, specifically FIELD_OPTIONALLY_ENCLOSED_BY = '"', when setting up your file format.
Example of creating such a file format:
create or replace file format mydb.myschema.my_tsv_file_format
type = CSV
field_delimiter = '\t'
FIELD_OPTIONALLY_ENCLOSED_BY = '"';
And example of querying from a stage using this file format:
select
  $1 field_one,
  $2 field_two
  -- ...and so on
from '@my_s3_stage/path/to/file/my_tab_separated_file.csv' (file_format => 'my_tsv_file_format')
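The effect of FIELD_OPTIONALLY_ENCLOSED_BY can be mimicked locally: with no enclosing-quote handling, the doubled quotes survive into the loaded value, while with a quote character configured they are unescaped. Python's csv module is used here purely as an illustration of the two behaviours:

```python
import csv
import io

raw = '"[{""previewUrl"":""https://someurl.com/preview1.png""}]"\n'

# No enclosing-quote handling: the raw, doubled quotes leak through
no_quoting = next(csv.reader(io.StringIO(raw), quoting=csv.QUOTE_NONE))[0]
assert '""previewUrl""' in no_quoting

# With a quote character (analogous to FIELD_OPTIONALLY_ENCLOSED_BY = '"'),
# the field arrives as valid JSON text
quoted = next(csv.reader(io.StringIO(raw)))[0]
assert quoted == '[{"previewUrl":"https://someurl.com/preview1.png"}]'
```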

How to import a comma-delimited text file into a DataWindow (PowerBuilder 11.5)

Hi, good day. I'm very new to PowerBuilder and I'm using PB 11.5.
Does someone know how to import a comma-delimited text file into a DataWindow?
Example text file:
"1234","20141011","Juan, Delacruz","Usa","001992345456"...
"12345","20141011","Arc, Ino","Newyork","005765753256"...
How can I import the third column, which is the full name, and the last column, which is the account number? I want to transfer the name and account number into my external DataWindow. I've tried to use ImportString, but all the rows end up in one column only. I have three fields in my external DataWindow: the name and the account number.
Here's the code
ls_File = dw_2.Object.file_name[1]
li_FileHandle = FileOpen(ls_File)
li_FileRead = FileRead(li_FileHandle, ls_Text)
DO WHILE li_FileRead > 0
    li_Count ++
    li_FileRead = FileRead(li_FileHandle, ls_Text)
    ll_row = dw_1.ImportString(ls_Text, 1)
LOOP
Please help me with the code! Thank you.
It seems that PB expects a tab-separated file by default (even though the 'c' in 'csv' stands for 'comma'...).
Add the csv! enumerated value to the arguments of ImportString() and it should fix the issue (it does in my test box).
Also, the columns defined in your dataobject must match the columns in the csv file (at least for the first columns you are interested in). If there are more columns in the csv file, they will be ignored. But if you want to get the 1st (or 2nd) and 3rd columns, you need to define the first 3 columns. You can always hide #1 or #2 if you do not need it.
BTW, your code has some issues:
you should always test the return values of function calls like FileOpen() so you can stop processing in case of a non-existent / non-readable file
you are reading the text file twice for the first row: once before the loop and again inside it. Or maybe that is intended, to skip a first line with column headers?
FWIW, here is a working version based on yours:
string ls_file = "c:\dev\powerbuilder\experiment\data.csv"
string ls_text
int li_FileHandle, li_fileread, li_count
long ll_row

li_FileHandle = FileOpen(ls_File)
if li_FileHandle < 1 then
    return
end if

li_FileRead = FileRead(li_FileHandle, ls_Text)
DO WHILE li_FileRead > 0
    li_Count ++
    ll_row = dw_1.ImportString(csv!, ls_Text, 1)
    li_FileRead = FileRead(li_FileHandle, ls_Text) // read next line
LOOP
FileClose(li_FileHandle)
Alternatively, use the datawindow_name.ImportFile(CSV!, file_path) method.

Incrementing numerical value and changing string in SQL

I have a database that stores values in a complicated, serialized array where one component is a string and another is the length of that string in characters, in this format:
s:8:"test.com"
where the number after the "s" holds the character length of the string in the quotation marks.
I would like to change the string from "test.com" to "testt.com", and I'm using the following statement in SQL:
UPDATE table SET row=(REPLACE (row, 'test.com','testt.com'))
However, this breaks the script in question, because it doesn't update the character length stored before the string where "test.com" appears.
I was wondering if there is a query I can use that would replace the string and also increment the length value where the replacement occurs, something like this:
UPDATE table SET row=(REPLACE (row, 's:' number 'test.com','s:' number+1 'testt.com'))
Does anyone know if this kind of query is even possible?
UPDATE table set row = concat('s:',length('testt.com'),':"testt.com"');
If you need to change exact string, then use exact query -
UPDATE table SET row = 's:9:"testt.com"' WHERE row = 's:8:"test.com"';
The string is a "serialized string".
If there are multiple strings to be replaced, it might be easier to create a script to handle this.
In PHP, it goes something like this:
$searchfor = serialize('test.com');
$replaceby = serialize('testt.com');
// strip last semicolon from serialized string
$searchfor = trim($searchfor,';');
$replaceby = trim($replaceby,';');
$query = "UPDATE table SET field = '$replaceby' WHERE field = '$searchfor';";
This way, you can create an exact query string with what you need.
Do fill in the proper code for db connection if necessary.
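The key point is that the length prefix must track the new string's length. A quick sketch of the arithmetic, in Python, mirroring what PHP's serialize() produces for strings:

```python
def php_serialized_str(s: str) -> str:
    # PHP's serialize() records the byte length of the string
    return f's:{len(s.encode("utf-8"))}:"{s}";'

old = php_serialized_str("test.com")
new = php_serialized_str("testt.com")
assert old == 's:8:"test.com";'
assert new == 's:9:"testt.com";'

# stripping the trailing semicolon gives the fragments used in the UPDATE
assert old.rstrip(";") == 's:8:"test.com"'
```

Computing both serialized fragments this way, rather than editing the length by hand, keeps the prefix and the string from drifting apart.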