SSIS Text Qualifier not working on last column - csv

My flat file's fields are tab delimited (\t) with a quotation mark (") text qualifier, and each row is linefeed (LF) separated.
My SSIS package works great when:
- no fields are text qualified
- any field EXCEPT the last column is text qualified
When the last column is text qualified, my package errors out saying it couldn't find the delimiter for my last column. Any ideas?

In a programmer's life, 3 problems (that often take hours to track down) are certain: permissions, case sensitivity, and line endings.
In my case, it is line endings. When a CRLF is pressed against the text qualifier ("), SSIS apparently doesn't interpret the text qualifier correctly, but does see the line break.
Here's what my setup looked like when I was having issues:
Here's what my setup looked like after changing the column delimiter:
The official answer here then is to change the line endings. The unfortunate side effect of that is to change a package that works on all the other files - leading to a need to convert files with CRLF to LF before hitting this package, or ending up with unsightly workarounds as seen here.
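As a rough illustration of the "convert files with CRLF to LF before hitting this package" step, here is a minimal Python sketch. The function names and file paths are hypothetical, and it assumes the files are small enough to read into memory in one go.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CRLF pair with a single LF; lone LF bytes are left alone."""
    return data.replace(b"\r\n", b"\n")


def convert_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path with CRLF line endings rewritten as LF."""
    # Binary mode, so Python does not translate newlines on its own.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(crlf_to_lf(data))
```

Something like convert_file("incoming.txt", "incoming_lf.txt") could run as a step before the data flow, so the package itself keeps its LF row delimiter.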

Run this application from an Execute Process task in your SSIS package:
http://www.softsea.com/review/U2WIN.html
and put it in the flat file folder.

Try setting TextQualified = False on your last column to see if it helps.

Related

SSIS CRLF in SourceData Field

I've got several flat files that I'm working on loading into SSIS and have hit a major snag I can't seem to fix. The flat files are all column delimited by pipe (|) and each row is delimited by CRLF. However, one of the columns is SupplierEmail. This is user input from multiple ERPs, and sometimes there is a CRLF in that field. As you can imagine, that throws that line and the next one off, thus breaking the package.
How do you deal with CRLF in the source data?
The usual way to deal with this is to enclose each column's value in quotation marks. Any CRLF inside a pair of quotes is then part of the data, and any CRLF outside of quotes is a row delimiter.
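To make the quoting rule concrete, here is a small Python sketch using the standard csv module, which follows the same convention. This is not SSIS itself; the sample rows and all column names other than SupplierEmail are made up for illustration.

```python
import csv
import io


def read_pipe_rows(text: str) -> list[list[str]]:
    """Parse pipe-delimited text where quoted fields may contain CRLF."""
    # newline="" keeps embedded CR/LF characters intact for the csv module.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="|", quotechar='"')
    return list(reader)


# Hypothetical sample: the second row has a CRLF inside the quoted email field.
sample = (
    "SupplierId|SupplierEmail|City\r\n"
    '1|"a@example.com\r\nb@example.com"|Oslo\r\n'
    "2|c@example.com|Bergen\r\n"
)
rows = read_pipe_rows(sample)
```

Here the CRLF between the two addresses stays inside the SupplierEmail value, and only the CRLFs outside the quotes end a row.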

CSV using windows line break as a delimiter

I'm trying to build a .CSV file out of information in a database. Unfortunately I don't have access to do this right from a SQL query and the rest of the site uses Classic ASP.
The information in the database often contains line breaks. Users enter this information with a html <textarea> field on a website. Line breaks show up in the resulting web page when the information is pulled from the database. The long term goal is to use the .CSV created here to change information then upload the file to update the database. Because of that I'm looking for a way to preserve the line breaks, or maybe replace them with <br/> tags.
The problem I'm running into is that Excel is using whatever line break character as a row delimiter. Is it better to replace the break character with <br/>? If that's the case how do I find the character I'm replacing for my Replace()? Is there a way to escape these characters and avoid the replace()? I've already got each cell surrounded by double quotes to escape other quotes.
I was able to replace the line break character with a <br/> that seems to work just as well when information is displayed on the site. I ended up using
writethis = Replace(myString, vbCrLf, "<br/>")
Which replaced the line break with a <br/> and stopped the row from ending prematurely.
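The same idea as the Replace() call above, sketched in Python for comparison. It assumes the stored text may contain any of the three common line break forms (CRLF, lone CR, lone LF), which the vbCrLf-only replace would not all catch. The function names are hypothetical.

```python
import re

# CRLF is listed first so a Windows line break becomes one <br/>, not two.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def breaks_to_br(value: str) -> str:
    """Replace every line break in value with a <br/> tag."""
    return _LINE_BREAK.sub("<br/>", value)


def br_to_breaks(value: str) -> str:
    """Reverse step for the upload: turn <br/> tags back into CRLF."""
    return value.replace("<br/>", "\r\n")
```

The reverse function is only needed if the database should end up with real line breaks again after the edited file is uploaded.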

How to tell what type of newline is being used in a txt file?

I have a txt file which contains quoted, comma delimited text, and I am trying to figure out what type of newline is being used.
The reason is that I am trying to import it into MySQL server using LOCAL INFILE, and obviously I need to tell it the correct LINES TERMINATED BY.
When I use either \n or \r\n it imports exactly half of the records, skipping a line each time.
But when I use \r it imports exactly double the number of records, with all values null in the extra rows.
When I open the file in Notepad, there is no space in between lines. However, if I open it in a browser, there is a blank line in between each line, as though there is a paragraph there somewhere. Likewise, if I choose "open with > Excel" it does not split into columns, and has a blank line between each row. The only way to open it properly in Excel is to use "get external data > From text" and choose the comma delimiter.
I provide a couple of lines below, copied and pasted exactly, and obviously it would be great if someone could let me know the correct settings to use for importing. But it would be even better if there was a way for me to quickly know what type of newline any particular file is using (there is also a blank line at the very end of the file, as with the other rows).
"Item No.","Description","Description 2","Customers Price","Home stock","Brand Name","Expected date for delivery","Item Group No.","Item Group Name","Item Product Link (Web)","Item Picture Link (Web)","EAN/UPC","Weight","UNSPSC code","Product type code","Warranty"
"/5PL0006","Drum Unit","DK-23","127.00","32","Kyocera","04/11/2013","800002","Drums","http://uk.product.com/product.aspx?id=%2f5PL0006","http://s.product.eu/products/2_PICTURE-TAKEN.JPG","5711045360824","0.30","44103109","","3M"
"/DK24","DK-24 Drum Unit FS-3750","","119.00","8","Dell","08/11/2013","800002","Drums","http://uk.product.com/product.aspx?id=%2fDK24","http://s.product.eu/products/2_PICTURE-TAKEN.JPG","5711045360718","0.20","44103109","","3M"
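One quick way to see which newline bytes a file actually contains is to read it in binary mode and count them. A minimal Python sketch follows; the function names and the sample size are arbitrary choices, not anything taken from the question.

```python
def count_line_endings(data: bytes) -> dict[str, int]:
    """Count CRLF pairs, lone CR bytes and lone LF bytes in data."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf  # CR bytes that are not part of a CRLF
    lf = data.count(b"\n") - crlf  # LF bytes that are not part of a CRLF
    return {"CRLF": crlf, "CR": cr, "LF": lf}


def count_in_file(path: str, sample_size: int = 65536) -> dict[str, int]:
    """Count line endings in the first sample_size bytes of a file."""
    # Binary mode is essential: text mode would translate the newlines.
    with open(path, "rb") as handle:
        return count_line_endings(handle.read(sample_size))
```

If more than one counter is non-zero, the file mixes line endings, and that mix is worth looking at before choosing a LINES TERMINATED BY value.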

ssis text qualifier ignore LF CR in the field

Is there a setting on SSIS to ignore CR and LF in the text qualified segment. I have a CSV file with a comments field where within the text qualified segment of this field there are some line breaks. Is there a way to ignore them within the text qualified segment?
CSV is a simple format that can't handle the line separator appearing inside the line itself.
It's far easier to export the data using a line separator that is guaranteed NOT to appear inside the line itself.
Most export tools allow you to specify the field and record separator. I've used |, § and even ¶ and ¤ as field and line separators many times to extract data from mainframes that did have newlines and quotes in some fields. It was far easier and faster than trying to parse the text, identifying whether a newline appeared inside multiple quotes or not.
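A minimal Python sketch of that approach, assuming | as the field separator and ¤ as the record separator. The function names are hypothetical, and the check that neither separator occurs in the data is an added safeguard, not something the answer spells out.

```python
FIELD_SEP = "|"
RECORD_SEP = "\u00a4"  # the ¤ character


def dump_records(records: list[list[str]]) -> str:
    """Serialise records with separators that must not occur in the data."""
    for record in records:
        for field in record:
            if FIELD_SEP in field or RECORD_SEP in field:
                raise ValueError(f"separator found inside field: {field!r}")
    return "".join(FIELD_SEP.join(record) + RECORD_SEP for record in records)


def load_records(text: str) -> list[list[str]]:
    """Split text back into records; newlines and quotes need no special care."""
    chunks = text.split(RECORD_SEP)
    if chunks and chunks[-1] == "":
        chunks.pop()  # drop the empty piece after the final record separator
    return [chunk.split(FIELD_SEP) for chunk in chunks]
```

Because the record separator is not a newline, a comment field with embedded line breaks or quotes passes through untouched.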

Using Excel to create a CSV file with special characters and then Importing it into a db using SSIS

Take this XLS file
I then save this XLS file as CSV and then open it up with a text editor. This is what I see:
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB""C","D,E",F,03,"3,2"
I see that the double quote character in column C was stored as AB""C, the column value was enclosed with quotations and the double quote character in the data was replaced with 2 double quote characters to indicate that the quote is occurring within the data and not terminating the column value. I also see that the value for column G, 3,2, is enclosed in quotes so that it is clear that the comma occurs within the data rather than indicating a new column. So far, so good.
I am a little surprised that not all of the column values are enclosed in quotes, but even this seems reasonable if I assume that Excel only adds text qualifiers when special characters like a comma or a double quote character exist in the data.
Now I try to use SQL Server to import the csv file. Note that I specify a double quote character as the Text Qualifier character.
And a comma character as the Column delimiter character. However, note that SSIS imports column 3 incorrectly, i.e., it does not translate the two consecutive double quote characters into a single occurrence of a double quote character.
What do I have to do to get Excel and SSIS to get along?
Generally people avoid the issue by using column delimiter characters that are LESS LIKELY to occur in the data, but this is not a real solution.
I find that if I modify the file from this
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB""C","D,E",F,03,"3,2"
...to this:
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB"C","D,E",F,03,"3,2"
i.e., collapsing the two consecutive quotes in column C's value into one, the data is loaded properly. However, this is a little confusing to me. First of all, how does SSIS determine that the double quote between the B and the C is not terminating that column value? Is it because the following characters are not a comma column delimiter or a row delimiter (CRLF)? And why does Excel export it this way?
According to Wikipedia, here are a couple of traits of a CSV file:
Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes. For example:
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
However, it looks like SSIS doesn't like it that way when importing. What can be done to get Excel to create a CSV file that could contain ANY special characters used as column delimiters, text delimiters or row delimiters in the data? There's no reason that it can't work using the approach specified in Wikipedia, which is what I thought the old MS DTS packages used to do...
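For comparison, here is how a parser that follows the doubled-quote convention reads the Excel output. This is a sketch using Python's standard csv module, not SSIS, and the helper name is hypothetical.

```python
import csv
import io


def parse_csv(text: str) -> list[list[str]]:
    """Parse comma-delimited text with " as qualifier and "" as an escaped quote."""
    # newline="" leaves any line breaks inside quoted fields untouched.
    return list(csv.reader(io.StringIO(text, newline="")))


excel_output = (
    "Col1,Col2,Col3,Col4,Col5,Col6,Col7\r\n"
    '1,ABC,"AB""C","D,E",F,03,"3,2"\r\n'
)
rows = parse_csv(excel_output)
# rows[1] is ['1', 'ABC', 'AB"C', 'D,E', 'F', '03', '3,2']
```

The doubled quote in Col3 comes back as a single quote, and the commas in Col4 and Col7 stay inside their values.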
Update:
If I use Notepad to change the input file to
Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8
"1","ABC","AB""C","D,E","F","03","3,2","AB""C"
Excel reads it just fine
but SSIS returns
The preview sample contains embedded text qualifiers ("). The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.
Conclusion:
Just like the error message says in your update...
The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.
Confirmed bug in Microsoft Connect. I encourage everyone reading this to click on this aforementioned link and place your vote to have them fix this stinker. This is in the top 10 of the most egregious bugs I have encountered.
Do you need to use a comma delimiter?
I used a pipe delimiter with no text qualifier and it worked fine. Here is my output from the text file.
1|ABC|AB"C|D,E|F|03|3,2
You have 3 options in my opinion.
Read the data into a stage table.
Run any update queries you need on the columns.
Now select your data from the stage table and output it to a flat file.
OR
Use pipes as your delimiters.
OR
Do all of this in a C# application and build it in code.
You could also send the row to a script in SSIS and parse and build the file you want there.
Using text qualifiers and "character" delimited fields is problematic for sure.
Have Fun!
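The "parse and build the file you want" idea can be sketched outside SSIS as a pre-processing step. The Python below is an illustration only (the answer suggests C# or an SSIS script). It rewrites a quoted, comma-delimited file as pipe-delimited text with no text qualifier, and it assumes no field contains a pipe or a line break. The function name is hypothetical.

```python
import csv
import io


def csv_to_pipe(text: str) -> str:
    """Rewrite quoted, comma-delimited text as pipe-delimited text without qualifiers."""
    out_lines = []
    for row in csv.reader(io.StringIO(text, newline="")):
        for field in row:
            # Without a qualifier, these characters would corrupt the output.
            if "|" in field or "\r" in field or "\n" in field:
                raise ValueError(f"field cannot be written unquoted: {field!r}")
        out_lines.append("|".join(row))
    return "".join(line + "\r\n" for line in out_lines)
```

Applied to the Excel sample from the question, the data row comes out as 1|ABC|AB"C|D,E|F|03|3,2, which matches the pipe-delimited output shown in the answer.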