ssis text qualifier ignore LF CR in the field - csv

Is there a setting in SSIS to ignore CR and LF characters inside a text-qualified segment? I have a CSV file with a comments field, and within the text-qualified portion of that field there are some line breaks. Is there a way to ignore them?

CSV is a simple format, and a simple line-based parser can't handle the record separator appearing inside a field itself.
It's far easier to export the data using a record separator that is guaranteed NOT to appear inside any field.
Most export tools let you specify the field and record separators. I've used |, §, and even ¶ and ¤ as field and record separators many times to extract data from mainframes that had newlines and quotes in some fields. It was far easier and faster than trying to parse the text and work out whether a newline appeared inside quotes or not.
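There is no flat file setting for this that I know of, so another common workaround is to pre-process the file so that line breaks inside qualified segments never reach the SSIS parser at all. A minimal C# sketch, assuming a double-quote qualifier, a file small enough to read into memory, and made-up input/output paths passed as command-line arguments (run it from an Execute Process Task ahead of the data flow and point the flat file connection at the cleaned copy):

using System;
using System.IO;
using System.Text;

class CleanEmbeddedNewlines
{
    static void Main(string[] args)
    {
        // args[0] = input CSV, args[1] = cleaned output CSV (hypothetical paths)
        string text = File.ReadAllText(args[0], Encoding.UTF8);
        var sb = new StringBuilder(text.Length);
        bool insideQuotes = false;

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '"')
            {
                // A doubled quote inside a qualified segment is an escaped quote, not a closing one.
                if (insideQuotes && i + 1 < text.Length && text[i + 1] == '"')
                {
                    sb.Append("\"\"");
                    i++;
                    continue;
                }
                insideQuotes = !insideQuotes;
                sb.Append(c);
            }
            else if (insideQuotes && (c == '\r' || c == '\n'))
            {
                // Replace embedded line breaks with a space so each record stays on one physical line.
                if (c == '\r' && i + 1 < text.Length && text[i + 1] == '\n') i++;
                sb.Append(' ');
            }
            else
            {
                sb.Append(c);
            }
        }
        File.WriteAllText(args[1], sb.ToString(), Encoding.UTF8);
    }
}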

Related

Exporting data from SSRS to a .csv file adds lots of quotation marks how do I get just one set?

I have a report which is just a simple SELECT statement that generates a list of columns full of data. I want this data to be exported as a CSV file with each datum enclosed in " quotation marks. I have created a table and used this as my expression:
=""""+Fields!Activity_Code.Value+""""
When I run the report inside Report Builder 3.0 I get exactly what I'm looking for:
No headers, and each datum has quotation marks. Perfect.
But when I export to CSV and open the result in Notepad, I see this:
The headers are in there where they shouldn't be and each datum has 3 quotation marks on each side. What am I doing wrong?
This is perfectly normal.
When csv fields contain a separator or double quotes, the fields are enclosed in double quotes and the quotes inside the fields are escaped with another quote.
Example - the fields:
123
"27" monitor"
456
become:
123,"""27"" monitor""",456
or:
"123","""27"" monitor""","456"
A csv reader/parser should handle this correctly when reading the data (or you could provide a parameter telling the parser that the fields are quoted).
On the other hand, if you just want your fields to be quoted inside the csv (and not visible after opening the file), you can tell the csv generator to quote the fields (or in this case do nothing since the generator seems to be adding quotes already).
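For reference, the quoting convention in the example above is easy to state in code. This is a generic sketch of the CSV rule, not the code SSRS itself uses:

using System;

class CsvQuoting
{
    // Quote a field only when it contains a separator, a quote, or a line break,
    // and escape embedded quotes by doubling them.
    static string Encode(string field)
    {
        bool needsQuotes = field.Contains(",") || field.Contains("\"") ||
                           field.Contains("\r") || field.Contains("\n");
        if (!needsQuotes) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    static void Main()
    {
        // Reproduces the example above: 123,"""27"" monitor""",456
        Console.WriteLine(string.Join(",", Encode("123"), Encode("\"27\" monitor\""), Encode("456")));
    }
}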

SSIS Text Qualifier not working on last column

My flat file's fields are tab-delimited (\t) with a quotation mark (") text qualifier, and each row is line feed (LF) separated.
My SSIS package works great when
no fields are text qualified
any field EXCEPT the last column is text qualified
When the last column is text qualified my package errors out saying it couldn't find the delimiter for my last column ... any ideas?
In a programmer's life, 3 problems (that often take hours to track down) are certain: permissions, case sensitivity, and line endings.
In my case, it is line endings. When a CRLF is pressed against the text qualifier ("), SSIS apparently doesn't interpret the text qualifier correctly, but does see the line break.
Here's what my setup looked like when I was having issues:
Here's what my setup looked like after changing the row delimiter:
The official answer, then, is to change the line endings. The unfortunate side effect is changing a package that works on all the other files, which leads either to converting files with CRLF to LF before they hit this package, or to unsightly workarounds as seen here.
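If you do go the convert-before-import route, the conversion itself can be a tiny console step run from an Execute Process Task. This is only a sketch, with the source and destination paths as placeholder arguments:

using System.IO;

class CrlfToLf
{
    static void Main(string[] args)
    {
        // args[0] = source file, args[1] = normalized copy the package should read
        string text = File.ReadAllText(args[0]);
        // Collapse CRLF to LF so the row delimiter in the connection manager matches every file.
        File.WriteAllText(args[1], text.Replace("\r\n", "\n"));
    }
}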
Use this application from an SSIS Execute Process Task:
http://www.softsea.com/review/U2WIN.html
and put it in your flat file folder.
Try setting TextQualified = False on your last column to see if it helps.

How to tell what type of newline is being used in a txt file?

I have a txt file which contains quoted, comma-delimited text, and I am trying to figure out what type of newline is being used.
The reason is that I am trying to import it into MySQL using LOAD DATA LOCAL INFILE, and obviously I need to tell it the correct LINES TERMINATED BY value.
When I use either \n or \r\n it imports exactly half of the records, skipping a line each time.
But when I use \r it imports exactly double, giving me as many rows with all values NULL as there are records.
When I open the file in Notepad there is no space between the lines; however, if I open it in a browser there is a blank line between each line, as though there were a paragraph break somewhere. Likewise, if I choose "Open with > Excel" it does not split the data into columns and leaves a blank line between each row. The only way to open it properly in Excel is to use "Get External Data > From Text" and choose a comma delimiter.
I have copied and pasted a couple of lines below exactly as they appear, and obviously it would be great if someone could let me know the correct settings to use for importing. But it would be even better if there were a way to quickly tell what type of newline any particular file is using (there is also a blank line at the very end of the file, as with the other rows).
"Item No.","Description","Description 2","Customers Price","Home stock","Brand Name","Expected date for delivery","Item Group No.","Item Group Name","Item Product Link (Web)","Item Picture Link (Web)","EAN/UPC","Weight","UNSPSC code","Product type code","Warranty"
"/5PL0006","Drum Unit","DK-23","127.00","32","Kyocera","04/11/2013","800002","Drums","http://uk.product.com/product.aspx?id=%2f5PL0006","http://s.product.eu/products/2_PICTURE-TAKEN.JPG","5711045360824","0.30","44103109","","3M"
"/DK24","DK-24 Drum Unit FS-3750","","119.00","8","Dell","08/11/2013","800002","Drums","http://uk.product.com/product.aspx?id=%2fDK24","http://s.product.eu/products/2_PICTURE-TAKEN.JPG","5711045360718","0.20","44103109","","3M"

Using Excel to create a CSV file with special characters and then Importing it into a db using SSIS

Take this XLS file
I then save this XLS file as CSV and then open it up with a text editor. This is what I see:
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB""C","D,E",F,03,"3,2"
I see that the double quote character in column C was stored as AB""C: the column value was enclosed in quotation marks and the double quote character in the data was replaced with two double quote characters, to indicate that the quote occurs within the data and does not terminate the column value. I also see that the value for column G, 3,2, is enclosed in quotes so that it is clear that the comma occurs within the data rather than indicating a new column. So far, so good.
I am a little surprised that not all of the column values are enclosed in quotes, but even this seems reasonable if I assume that Excel only adds text qualifiers when special characters like a comma or a double quote exist in the data.
Now I try to use SQL Server to import the CSV file. Note that I specify a double quote character as the Text Qualifier character.
And a comma character as the Column delimiter character. However, note that SSIS imports column 3 incorrectly, i.e., it does not translate the two consecutive double quote characters into a single occurrence of a double quote character.
What do I have to do to get Excel and SSIS to get along?
Generally people avoid the issue by using column delimiter characters that are LESS LIKELY to occur in the data, but this is not a real solution.
I find that if I modify the file from this
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB""C","D,E",F,03,"3,2"
...to this:
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB"C","D,E",F,03,"3,2"
i.e., removing one of the two consecutive quotes in column C's value, the data loads properly. However, this is a little confusing to me. First of all, how does SSIS determine that the double quote between the B and the C does not terminate that column value? Is it because the character that follows is neither a comma column delimiter nor a row delimiter (CRLF)? And why does Excel export it this way?
According to Wikipedia, here are a couple of traits of a CSV file:
Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes. For example:
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
However, it looks like SSIS doesn't like it that way when importing. What can be done to get Excel to create a CSV file that could contain ANY special characters used as column delimiters, text delimiters or row delimiters in the data? There's no reason that it can't work using the approach specified in Wikipedia, which is what I thought the old MS DTS packages used to do...
Update:
If I use Notepad to change the input file to
Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8
"1","ABC","AB""C","D,E","F","03","3,2","AB""C"
Excel reads it just fine
but SSIS returns
The preview sample contains embedded text qualifiers ("). The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.
Conclusion:
Just like the error message says in your update...
The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.
This is a confirmed bug on Microsoft Connect. I encourage everyone reading this to click the aforementioned link and place your vote to have them fix this stinker. It is in the top 10 of the most egregious bugs I have encountered.
Do you need to use a comma delimiter?
I used a pipe delimiter with no text qualifier and it worked fine. Here is my output from the text file.
1|ABC|AB"C|D,E|F|03|3,2
You have 3 options in my opinion.
Read the data into a stage table.
Run any update queries you need on the columns
Now select your data from the stage table and output it to a flat file.
OR
Use pipes as your delimiters.
OR
Do all of this in a C# application and build it in code.
You could send the row to a script in SSIS and parse and build the file you want there as well.
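If you take the Script Component route, the parsing that the flat file source refuses to do is only a few lines. This is a rough, illustrative sketch of splitting one record on a delimiter while honouring a text qualifier and doubled (escaped) qualifiers; it makes no claim about how SSIS parses internally:

using System;
using System.Collections.Generic;

class QualifiedSplit
{
    // Split one record on 'delimiter', treating 'qualifier' as a text qualifier
    // and a doubled qualifier inside a qualified field as an escaped literal quote.
    static List<string> Parse(string line, char delimiter, char qualifier)
    {
        var fields = new List<string>();
        var current = new System.Text.StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == qualifier)
                {
                    if (i + 1 < line.Length && line[i + 1] == qualifier) { current.Append(qualifier); i++; }
                    else inQuotes = false;
                }
                else current.Append(c);
            }
            else if (c == qualifier && current.Length == 0)
            {
                inQuotes = true;
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else current.Append(c);
        }
        fields.Add(current.ToString());
        return fields;
    }

    static void Main()
    {
        // The problem row from the question: column 3 should come out as AB"C.
        foreach (string f in Parse("1,ABC,\"AB\"\"C\",\"D,E\",F,03,\"3,2\"", ',', '"'))
            Console.WriteLine(f);
    }
}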
Using text qualifiers and "character" delimited fields is problematic for sure.
Have Fun!

How to remove double quotes surrounding the text while importing a CSV file?

I have data which resembles the following:
"D.STEIN","DS","01","ALTRES","TTTTTTFFTT"
"D.STEIN","DS","01","APCASH","TTTTTTFFTT"
"D.STEIN","DS","01","APINH","TTTTTTFFTT"
"D.STEIN","DS","01","APINV","TTTTTTFFTT"
"D.STEIN","DS","01","APMISC","TTTTTTFFTT"
"D.STEIN","DS","01","APPCHK","TTTTTTFFTT"
"D.STEIN","DS","01","APWLNK","TTTTTTFFTT"
"D.STEIN","DS","01","ARCOM","TTTTTTFFTT"
"D.STEIN","DS","01","ARINV","TTTTTTFFTT"
I've used a Flat File Source Editor to load the data. What is the easiest way to remove all of the double quotes?
Further searching revealed that I should use the Text Qualifier on the General Tab of the Flat File Source.
Flat file content when viewed in Notepad++. CRLF denotes that the lines end with Carriage Return and Line Feed.
On the flat file connection manager, enter the double quote character (") in the Text qualifier text box.
Once the text qualifier is set, the data would be parsed correctly and displayed as shown below:
While loading a CSV with double quotes and commas there is one limitation: extra double quotes get added and the data stays enclosed in double quotes, as you can check in the preview of the source file.
So add a Derived Column task and use the expression below:
REPLACE(REPLACE(REPLACE(RIGHT(SUBSTRING(TRIM(COL2),1,LEN(COL2) - 1),LEN(COL2) - 2)," ","#"),"\"\"","\""),"#"," ")
The RIGHT(SUBSTRING(TRIM(COL2),1,LEN(COL2) - 1),LEN(COL2) - 2) part removes the double quotes that enclose the data.
Try this and do let me know if this is helpful
substring([column 5], 2,(len([column 5])-2) )
I would rather use the following statement....
REPLACE(REPLACE(REPLACE(ColumnName, '""', '[YourOwnuniqueString]'), '"', ''), '[YourOwnuniqueString]', '"')
Note: please make sure that [YourOwnuniqueString] is unique and is not used anywhere in the columns as data, e.g. SQL#RT2#myCode (it is case sensitive).
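The same three-step swap (protect the escaped quotes with a placeholder, strip the qualifier quotes, restore the escaped quotes) can also be done outside T-SQL, for example in a Script Component. A small C# restatement of the technique, using the same placeholder idea; the class and method names are made up for illustration:

using System;

class QuoteCleanup
{
    // Same idea as the T-SQL above: protect escaped quotes with a unique placeholder,
    // strip the remaining qualifier quotes, then restore the protected ones.
    static string StripQualifiers(string value)
    {
        const string placeholder = "SQL#RT2#myCode";   // must not occur anywhere in the real data
        return value.Replace("\"\"", placeholder)
                    .Replace("\"", "")
                    .Replace(placeholder, "\"");
    }

    static void Main()
    {
        Console.WriteLine(StripQualifiers("\"AB\"\"C\""));   // prints: AB"C
    }
}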