remove special characters " and ' from a column using SSIS - csv

I have a csv which has a column called "Value" and in it there are a combination of strings & special characters.
Eg:
Value
abc
xyz
"
pqr
'
I want to replace the " and ' special characters with an empty string "" in the output file. I used the derived column "Replace" function as
Replace(Replace(Value,"'",""),"\"","")
but it doesn't seem to work.
I expect the output to be in a flat file with value column as
Value
"abc"
"xyz"
""
"pqr"
""

I want to replace the " and ' special characters with empty string ""
If you are looking to remove all quotations from all values then the derived column expression you are using is fine:
Replace(Replace(Value,"'",""),"\"","")
Since you mentioned that it doesn't seem to work, there are several possible causes to check:
Check that the quote character in the data is ' and not a backtick `; try the following expression:
Replace(Replace(Replace(Value,"'",""),"\"",""),"`","")
If your goal is that all values in the destination flat file are enclosed within two double quotes: "abc", "" then you should edit the Flat File connection manager of the destination and set " as text qualifier and make sure that columns qualified property is set to true.
Text Qualifier in SSIS Example
Meaning of TextQualified attribute in flat file connections
As @Srikarmogaliraju mentioned, make sure that the derived column output is mapped in the destination.
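For comparison, the nested Replace calls behave like chained string replacements. A small Python sketch of the same logic (purely illustrative, since the real expression runs inside the SSIS derived column):

```python
def strip_quotes(value: str) -> str:
    """Remove single quotes, double quotes, and backticks,
    mirroring the nested SSIS Replace expression."""
    for ch in ("'", '"', "`"):
        value = value.replace(ch, "")
    return value

print(strip_quotes("'"))    # -> "" (empty string)
print(strip_quotes('a"b'))  # -> ab
```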

Related

Import RFC 4180 files (CSV spec) into Snowflake? (Unable to create file format that matches CSV RFC spec)

Summary:
Original question from a year ago: How to escape double quotes within a data when it is already enclosed by double quotes
I have the same need as the original poster: I have a CSV file that matches the RFC 4180 spec (my data has properly qualified double quotes, commas, and line feeds in it). Excel is able to read it just fine because the file matches the spec and Excel follows the spec.
Unfortunately I can't figure out how to import files that match the CSV RFC 4180 spec into Snowflake. Any ideas?
Details:
We've been creating CSV files that match the RFC 4180 spec for years in order to maximize compatibility across applications and OSes.
Here is a sample of what my data looks like:
KEY,NAME,DESCRIPTION
1,AFRICA,This is a simple description
2,NORTH AMERICA,"This description has a comma, so I have to wrap the whole field in double quotes"
3,ASIA,"This description has ""double quotes"" in it, so I have to qualify the double quotes and wrap the field in double quotes"
4,EUROPE,"This field has a carriage
return so it is wrapped in double quotes"
5,MIDDLE EAST,Simple description with single ' quote
When opening this file in Excel, Excel properly reads the rows/columns, because Excel follows the RFC spec.
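The same behaviour can be confirmed with any RFC 4180-compliant parser. For instance, a quick Python sketch using the standard csv module (illustrative only; it has no bearing on the Snowflake side) parses the sample above into exactly 5 records:

```python
import csv
import io

sample = '''KEY,NAME,DESCRIPTION
1,AFRICA,This is a simple description
2,NORTH AMERICA,"This description has a comma, so I have to wrap the whole field in double quotes"
3,ASIA,"This description has ""double quotes"" in it, so I have to qualify the double quotes and wrap the field in double quotes"
4,EUROPE,"This field has a carriage
return so it is wrapped in double quotes"
5,MIDDLE EAST,Simple description with single ' quote
'''

rows = list(csv.reader(io.StringIO(sample)))
print(len(rows))   # 6: header + 5 records (the embedded newline is handled)
print(rows[3][2])  # the doubled "" is decoded back to a single "
```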
In order to import this file into Snowflake, I first try to create a file format and I set the following:
Name                           Value
----                           -----
Column Separator               Comma
Row Separator                  New Line
Header lines to skip           1
Field optionally enclosed by   Double Quote
Escape Character               "
Escape Unenclosed Field        None
But when I go to save the file format, I get this error:
Unable to create file format "CSV_SPEC".
SQL compilation error: value ["] for parameter 'FIELD_OPTIONALLY_ENCLOSED_BY' conflict with parameter 'ESCAPE'
It would appear that I'm missing something? I would think that I must be getting the Snowflake configuration wrong.
While writing up this question and testing all the scenarios I could think of, I found a file format that seems to work:
Name                           Value
----                           -----
Column Separator               Comma
Row Separator                  New Line
Header lines to skip           1
Field optionally enclosed by   Double Quote
Escape Character               None
Escape Unenclosed Field        None
Same information again, but in SQL form:
ALTER FILE FORMAT "DB_NAME"."SCHEMA_NAME"."CSV_SPEC3" SET
  COMPRESSION = 'NONE'
  FIELD_DELIMITER = ','
  RECORD_DELIMITER = '\n'
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '\042'
  TRIM_SPACE = FALSE
  ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE
  ESCAPE = 'NONE'
  ESCAPE_UNENCLOSED_FIELD = 'NONE'
  DATE_FORMAT = 'AUTO'
  TIMESTAMP_FORMAT = 'AUTO'
  NULL_IF = ('\\N');
I don't know why this works, but it does, so there you go.
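A likely explanation (my reading, not confirmed by Snowflake): RFC 4180 escapes quotes by doubling them rather than with a separate escape character, and FIELD_OPTIONALLY_ENCLOSED_BY already implies that doubled-quote handling, so declaring a separate ESCAPE character conflicts with it. Python's default csv dialect shows the doubling behaviour an RFC 4180 writer produces:

```python
import csv
import io

buf = io.StringIO()
# The default csv dialect doubles embedded quotes instead of using a
# separate escape character -- the same convention that
# FIELD_OPTIONALLY_ENCLOSED_BY='"' with ESCAPE = 'NONE' expects.
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
writer.writerow([3, "ASIA", 'Has "double quotes" and a, comma'])
print(buf.getvalue())
# 3,ASIA,"Has ""double quotes"" and a, comma"
```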

SSIS - How to insert all values inside ""

There is a requirement for all the values integrating from SQL Server into a flat file (.csv) to be inserted between double quotation marks, e.g. 123 should be inserted as "123".
I am having such difficulty with this. I tried a derived column with the expression "\"\"" + [columnName] + "\"\"" but it does not work at all.
Please be advised i need the column headers to have the same "" as well.
Many thanks!
If you mean you want to export data from SQL Server into a csv file using SSIS, and that you want the values to be double quoted, you just need to set the Text Qualifier property of your Destination connection to a double quote " character.
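The text-qualifier behaviour can be illustrated outside SSIS. A quick Python sketch with the standard csv module (illustrative only, with made-up column names) shows the equivalent output: every value, headers included, is wrapped in double quotes:

```python
import csv
import io

buf = io.StringIO()
# QUOTE_ALL wraps every field in the quote character, which is what a
# flat file destination with a double-quote text qualifier produces.
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
writer.writerow(["ID", "Name"])  # header row is quoted too
writer.writerow([123, "abc"])
print(buf.getvalue())
# "ID","Name"
# "123","abc"
```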

Unable to load csv file into Snowflake

I am getting the below error when I try to load a CSV from my system into a Snowflake table:
Unable to copy files into table.
Numeric value '"4' is not recognized
File '#EMPP/ui1591621834308/snow.csv', line 2, character 25
Row 1, column "EMPP"["SALARY":5]
If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
You appear to be loading your CSV with the file format option of FIELD_OPTIONALLY_ENCLOSED_BY='"' specified.
This option will allow reading any fields properly quoted with the " character, and even support such fields carrying the delimiter character as well as the " character if properly escaped. Some examples that could be considered valid:
CSV FORM | ACTUAL DATA
------------------------
abc | abc
"abc" | abc
"a,bc" | a,bc
"a,""bc""" | a,"bc"
In particular, notice that the final example follows the specified rule:
When a field contains this character, escape it using the same character. For example, if the value is the double quote character and a field contains the string A "B" C, escape the double quotes as follows:
A ""B"" C
If your CSV file carries quote marks within the data but is not necessarily quoting the fields (and delimiters and newlines do not appear within data fields), you can remove the FIELD_OPTIONALLY_ENCLOSED_BY option from your file format definition and just read the file at the delimited (,) fields.
If your CSV does use quoting, ensure that whatever is producing the CSV files uses a valid CSV format writer rather than simple string munging, and recreate the file with the quotes properly escaped. For the failing value above to be valid in quoted form, it would have to appear within the file as "4" or 4.
The error message is saying that you have a value in your file that contains "4, which is being loaded into a numeric column. Since that isn't a number, it fails. This appears to be happening in the very first data row of your file, so you could open it up and take a look at the value. If it's just one record, you can add ON_ERROR = 'CONTINUE' to your command so that it skips the record and moves on.

How to deal with multiple quotes when loading CSV in power query?

I have several CSV files to combine into one table (the files have the same structure), but the files' structure is mangled enough to be problematic.
The first row is ordinary, just headers split by a comma:
Account,Description,Entity,Risk,...
but then the rows with actual data start and end with a double quote ", columns are separated by a comma, and the person's full name has two double quotes at the beginning and end. I understand that the doubling is an escape to keep the name in one column, but one quote would be enough.
"1625110,To be Invoiced,587,Normal,""Doe, John"",..."
So what I need to do (and don't know how) is remove the " from the beginning and end of every data row, and replace "" with " in every line with data.
I need to do it in Power Query because there will be more of similar CSV files over time and I don't want to clean them manually.
Any ideas?
I was trying with simple:
= Table.AddColumn(#"Removed Other Columns", "Custom", each Csv.Document(
    [Content],
    [
        Delimiter = ",",
        QuoteStyle = QuoteStyle.Csv
    ]
))
Try loading to a single column first, replace values to remove extra quotes, and then split by ",".
Here's what that looks like for loading a single file:
let
Source = Csv.Document(File.Contents("filepath\file.csv"),[Delimiter="#(tab)"]),
ReplaceQuotes = Table.ReplaceValue(Source,"""""","""",Replacer.ReplaceText,{"Column1"}),
SplitIntoColumns = Table.SplitColumn(ReplaceQuotes, "Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv)),
#"Promoted Headers" = Table.PromoteHeaders(SplitIntoColumns, [PromoteAllScalars=true])
in
#"Promoted Headers"
I used the tab delimiter to keep it from splitting in the first step.
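The same three-step transformation (strip the outer quotes, collapse the doubled quotes, then split as CSV) can be prototyped outside Power Query. A Python sketch using the sample row from the question, just to confirm the logic holds:

```python
import csv
import io

raw = '"1625110,To be Invoiced,587,Normal,""Doe, John"",..."'

# Step 1: drop the stray outer quotes wrapping the whole line
line = raw[1:-1] if raw.startswith('"') and raw.endswith('"') else raw
# Step 2: collapse the doubled quotes back to single ones
line = line.replace('""', '"')
# Step 3: parse as normal CSV -- the comma in "Doe, John" stays in one field
fields = next(csv.reader(io.StringIO(line)))
print(fields)
# ['1625110', 'To be Invoiced', '587', 'Normal', 'Doe, John', '...']
```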

Changing field separator

I can see that using maatkit I can export the data as comma- or tab-separated values. But is there any way to change the field separator?
http://www.maatkit.org/doc/mk-parallel-dump.html
I have a table that has commas as well as tabs in the data. Besides, I need to process the data using awk, which does not seem to work with some fields' data. I want to change the separator while dumping data using maatkit. Is it possible?
Commas present in data should not cause any problem: the text fields (and maybe all fields) are within quotes:
"field 1 value", "field 2 value", 3.38823, "field 4 value"