Escape "\" backslash from dataweave for csv output Mule - csv

CSV output is generated from a Java Map in DataWeave, and the output response adds "\" before every "," present within the values.
All the map values are added inside double quotes, e.g. map.put("key", "key-Value");
Response Received:
Header1, Header2
1234,ABC \,text
7890,XYZ \,text
Expected Response:
Header1, Header2
1234,ABC ,text
7890,XYZ ,text
Header2 should contain "ABC,text" as the value, without the surrounding quotes "".
I tried using %output application/csv escape=" ", but this adds an extra space for each blank space in the values, i.e. if the value is "ABC XYZ" then the output is "ABC  XYZ" (2 spaces in between).
Any suggestions would be helpful.

Embedded commas in the data of a comma-separated value file must be escaped, or there is no way to tell those values apart from field separators. If you want some way to have commas in your CSV file without escaping them, then you need to use a separator other than a comma.
Your Expected Response as shown would not be valid: you have a two-field header, but data lines that would be interpreted as having three fields, not two. What the data actually contains is one field with an embedded comma, which is exactly what the Response Received table shows.

I've got the same scenario, where we have a comma in the data itself, as in your Header2.
To solve the issue, I just added the line below:
%output application/csv quoteValues=true
This solved my problem, and we got the expected output.
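To see the same quoting behavior outside Mule, here is a minimal Python sketch using the standard csv module (Python rather than DataWeave, purely as an illustration); csv.QUOTE_ALL plays the same role that quoteValues=true plays in the DataWeave output directive:
import csv
import io

# Rows matching the question's data: Header2 contains an embedded comma.
rows = [
    ["Header1", "Header2"],
    ["1234", "ABC ,text"],
    ["7890", "XYZ ,text"],
]

buf = io.StringIO()
# QUOTE_ALL wraps every field in double quotes, so embedded commas
# survive without any backslash escaping.
csv.writer(buf, quoting=csv.QUOTE_ALL).writerows(rows)
print(buf.getvalue())
# "Header1","Header2"
# "1234","ABC ,text"
# "7890","XYZ ,text"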

Related

Import RFC 4180 files (CSV spec) into Snowflake? (Unable to create a file format that matches the CSV RFC spec)

Summary:
Original question from a year ago: How to escape double quotes within data when it is already enclosed by double quotes
I have the same need as the original poster: I have a CSV file that matches the CSV RFC spec (my data has double quotes that are properly qualified, my data has commas in it, and my data also has line feeds in it). Excel is able to read it just fine because the file matches the spec, and Excel properly implements the spec.
Unfortunately I can't figure out how to import files that match the CSV RFC 4180 spec into Snowflake. Any ideas?
Details:
We've been creating CSV files that match the RFC 4180 spec for years in order to maximize compatibility across applications and OSes.
Here is a sample of what my data looks like:
KEY,NAME,DESCRIPTION
1,AFRICA,This is a simple description
2,NORTH AMERICA,"This description has a comma, so I have to wrap the whole field in double quotes"
3,ASIA,"This description has ""double quotes"" in it, so I have to qualify the double quotes and wrap the field in double quotes"
4,EUROPE,"This field has a carriage
return so it is wrapped in double quotes"
5,MIDDLE EAST,Simple description with single ' quote
When opening this file in Excel, Excel properly reads the rows and columns, because Excel follows the RFC spec.
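As a cross-check of those quoting rules, here is a small Python sketch (an illustration of mine, not part of the original question) that writes the same sample with the standard csv module, which follows RFC 4180-style quoting:
import csv

rows = [
    ["KEY", "NAME", "DESCRIPTION"],
    ["1", "AFRICA", "This is a simple description"],
    ["2", "NORTH AMERICA",
     "This description has a comma, so I have to wrap the whole field in double quotes"],
    ["3", "ASIA",
     'This description has "double quotes" in it, so I have to qualify the double quotes and wrap the field in double quotes'],
    ["4", "EUROPE", "This field has a carriage\nreturn so it is wrapped in double quotes"],
    ["5", "MIDDLE EAST", "Simple description with single ' quote"],
]

with open("sample.csv", "w", newline="") as f:
    # QUOTE_MINIMAL (the default) quotes only fields containing the
    # delimiter, the quote character, or a line break, and escapes an
    # embedded quote by doubling it -- exactly the RFC 4180 rules.
    csv.writer(f, quoting=csv.QUOTE_MINIMAL).writerows(rows)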
In order to import this file into Snowflake, I first try to create a file format and I set the following:
Name                         | Value
-----------------------------|-------------
Column Separator             | Comma
Row Separator                | New Line
Header lines to skip         | 1
Field optionally enclosed by | Double Quote
Escape Character             | "
Escape Unenclosed Field      | None
But when I go to save the file format, I get this error:
Unable to create file format "CSV_SPEC".
SQL compilation error: value ["] for parameter 'FIELD_OPTIONALLY_ENCLOSED_BY' conflict with parameter 'ESCAPE'
It would appear that I'm missing something? I would think that I must be getting the Snowflake configuration wrong.
While writing up this question and testing all the scenarios I could think of, I found a file format that seems to work:
Name                         | Value
-----------------------------|-------------
Column Separator             | Comma
Row Separator                | New Line
Header lines to skip         | 1
Field optionally enclosed by | Double Quote
Escape Character             | None
Escape Unenclosed Field      | None
The same information, in SQL form:
ALTER FILE FORMAT "DB_NAME"."SCHEMA_NAME"."CSV_SPEC3" SET
  COMPRESSION = 'NONE'
  FIELD_DELIMITER = ','
  RECORD_DELIMITER = '\n'
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '\042'
  TRIM_SPACE = FALSE
  ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE
  ESCAPE = 'NONE'
  ESCAPE_UNENCLOSED_FIELD = 'NONE'
  DATE_FORMAT = 'AUTO'
  TIMESTAMP_FORMAT = 'AUTO'
  NULL_IF = ('\\N');
I don't know why this works, but it does, so there you go. (Presumably it works because, once FIELD_OPTIONALLY_ENCLOSED_BY is set, Snowflake already treats a doubled quote inside an enclosed field as an escaped quote, so no separate ESCAPE character is needed.)
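One way to convince yourself that no separate escape character is needed for such files (an illustrative Python check of mine, not something from the original post) is to parse the sample above with a reader that implements RFC 4180 doubling, such as Python's csv module:
import csv

with open("sample.csv", newline="") as f:
    # No escapechar is configured: a doubled quote inside a quoted field
    # is already read as a literal quote, which is the same reason
    # Snowflake's ESCAPE = 'NONE' works for RFC 4180 files.
    for row in csv.reader(f):
        print(row)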

Exporting data from SSRS to a .csv file adds lots of quotation marks; how do I get just one set?

I have a report which is just a simple SELECT statement that generates a list of columns full of data. I want this data to be exported as a CSV file with each datum enclosed in double quotation marks ("). I have created a table and used this as my expression:
=""""+Fields!Activity_Code.Value+""""
When I run the report inside Report Builder 3.0, I get exactly what I'm looking for: no headers, and each datum has quotation marks. Perfect.
But when I hit export to CSV and then open the file in Notepad, the headers are in there where they shouldn't be and each datum has 3 quotation marks on each side. What am I doing wrong?
This is perfectly normal.
When csv fields contain a separator or double quotes, the fields are enclosed in double quotes and the quotes inside the fields are escaped with another quote.
Example - the fields:
123
"27" monitor"
456
become:
123,"""27"" monitor""",456
or:
"123","""27"" monitor""","456"
A csv reader/parser should handle this correctly when reading the data (or you could provide a parameter telling the parser that the fields are quoted).
On the other hand, if you just want your fields to be quoted inside the csv (and not visible after opening the file), you can tell the csv generator to quote the fields (or in this case do nothing since the generator seems to be adding quotes already).
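To make the tripling concrete, here is a short Python illustration (mine, not the answerer's) of why a field that itself begins and ends with a quote comes out with three quotes on each side:
import csv
import io

buf = io.StringIO()
# The field value already starts and ends with a double quote:
csv.writer(buf).writerow(["123", '"27" monitor"', "456"])
print(buf.getvalue())
# 123,"""27"" monitor""",456
# The outer quotes wrap the field and every quote inside the data is
# doubled, so a quote sitting at the field boundary shows up as three.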

Unable to load csv file into Snowflake

I am getting the below error when I try to load a CSV from my system into a Snowflake table:
Unable to copy files into table.
Numeric value '"4' is not recognized File '#EMPP/ui1591621834308/snow.csv', line 2, character 25 Row 1, column "EMPP"["SALARY":5] If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
You appear to be loading your CSV with the file format option of FIELD_OPTIONALLY_ENCLOSED_BY='"' specified.
This option will allow reading any fields properly quoted with the " character, and even support such fields carrying the delimiter character as well as the " character if properly escaped. Some examples that could be considered valid:
CSV FORM | ACTUAL DATA
------------------------
abc | abc
"abc" | abc
"a,bc" | a,bc
"a,""bc""" | a,"bc"
In particular, notice that the final example follows the specified rule:
When a field contains this character, escape it using the same character. For example, if the value is the double quote character and a field contains the string A "B" C, escape the double quotes as follows:
A ""B"" C
If your CSV file carries quote marks within the data but is not necessarily quoting the fields (and delimiters and newlines do not appear within data fields), you can remove the FIELD_OPTIONALLY_ENCLOSED_BY option from your file format definition and just read the file as plain comma-delimited fields.
If your CSV does use quoting, ensure that whatever is producing the CSV files is using a valid CSV format writer and not simple string munging, and recreate it with the quotes properly escaped. If the above data example is to be considered valid in quoted form, it must instead appear within the file as "4" or 4.
The error message is saying that you have a value in your file that contains "4, and it is being loaded into a table column that expects a number. Since that isn't a number, it fails. This appears to be happening in the very first data row of your file, so you could open it up and take a look at the value. If it's just one record, you can add ON_ERROR = 'CONTINUE' to your command so that it skips it and moves on.
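For reference, the table of quoted forms above can be reproduced with any RFC 4180 parser; a quick Python sketch (illustrative only) shows what each form decodes to:
import csv

forms = ['abc', '"abc"', '"a,bc"', '"a,""bc"""']
for form in forms:
    # Each string parses as a one-record, one-field CSV here.
    [parsed] = next(csv.reader([form]))
    print(form, "->", parsed)
# abc -> abc
# "abc" -> abc
# "a,bc" -> a,bc
# "a,""bc""" -> a,"bc"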

Spark read csv with comma inside string

536381,22411,JUMBO SHOPPER VINTAGE RED PAISLEY,10,12/1/2010 9:41,1.95,15311,United Kingdom
"536381,82567,""AIRLINE LOUNGE,METAL SIGN"",2,12/1/2010 9:41,2.1,15311,United Kingdom"
536381,21672,WHITE SPOT RED CERAMIC DRAWER KNOB,6,12/1/2010 9:41,1.25,15311,United Kingdom
These lines are examples of rows in a CSV file.
I'm trying to read it in Databricks, using:
df = spark.read.csv ('file.csv', sep=',', inferSchema = 'true', quote = '"')
but the line in the middle and other similar lines are not getting into the right columns because of the comma within the string. How can I work around it?
Set the quote to:
'""'
df = spark.read.csv('file.csv', sep=',', inferSchema = 'true', quote = '""')
It looks like your data has double quotes - so when it's being read it sees the double quotes as being the start and end of the string.
Edit: I'm also assuming the problem comes in with this part:
""AIRLINE LOUNGE,METAL SIGN""
This is not only related to Excel; I have the same issue when retrieving data from a source into Azure Synapse. The comma within one column causes the process to enclose the entire column's data in double quotes, and the included double quotes get doubled as shown above in the second line (see Retrieve CSV format over https).

How to pass a field that contains both commas and double quotes in a CSV file

My CSV file contains 2 fields: the first is Common Set and the second is ["028-51YYSH89","029-5201KSAL97"]
How can I pass the second value in the CSV file so that it is considered as only one field?
Currently I am getting the output as Common Set, ["028-51YYSH89"
Please suggest.
["028-51YYSH89","029-5201KSAL97"]
You need to quote this field, and escape the quotes inside the field with an extra quote:
"[""028-51YYSH89"",""029-5201KSAL97""]"