SSIS CRLF in SourceData Field

I've got several flat files that I'm loading into SSIS and have hit a major snag I can't seem to fix. The files are column-delimited by pipe (|) and each row is delimited by CRLF. However, one of the columns is SupplierEmail, which is user input from multiple ERPs, and sometimes there is a CRLF inside that field. As you can imagine, that throws that row and the next one off, breaking the package...
How do you deal with CRLF in the source data?

The usual way to deal with this is to enclose each column's value in quotation marks: any CRLF inside a pair of quotes is part of the data, and any CRLF outside of quotes is a row delimiter.
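For illustration, here is a minimal sketch of how a quote-aware parser treats such a file, using Python's csv module; the file name and column layout are made up for the example:

import csv

# Sketch: parse a pipe-delimited file whose fields are wrapped in quotes.
# A CRLF inside a quoted SupplierEmail value stays part of that field
# instead of starting a new record. "suppliers.txt" is a made-up name.
with open("suppliers.txt", newline="", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="|", quotechar='"'):
        print(row)  # each row is a list of column values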

Related

Exporting data from SSRS to a .csv file adds lots of quotation marks - how do I get just one set?

I have a report which is just a simple SELECT statement that generates a list of columns full of data. I want this data to be exported as a CSV file with each datum enclosed in " quotation marks. I have created a table and used this as my expression:
=""""+Fields!Activity_Code.Value+""""
When I run the report inside Report Builder 3.0, I get exactly what I'm looking for:
no headers, and each datum has quotation marks. Perfect.
But when I export to CSV and then open the file in Notepad, I see this:
The headers are in there where they shouldn't be and each datum has 3 quotation marks on each side. What am I doing wrong?
This is perfectly normal.
When csv fields contain a separator or double quotes, the fields are enclosed in double quotes and the quotes inside the fields are escaped with another quote.
Example - the fields:
123
"27" monitor"
456
become:
123,"""27"" monitor""",456
or:
"123","""27"" monitor""","456"
A csv reader/parser should handle this correctly when reading the data (or you could provide a parameter telling the parser that the fields are quoted).
On the other hand, if you just want your fields to be quoted inside the csv (and not visible after opening the file), you can tell the csv generator to quote the fields (or in this case do nothing since the generator seems to be adding quotes already).
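As a quick illustration of that escaping rule, here is a small sketch (Python's csv module) writing the example fields above; the first call mirrors the minimally quoted form, the second the fully quoted form:

import csv, io

# Sketch: show how a standard CSV writer escapes a field that already
# contains double quotes - the field gets wrapped in quotes and each
# embedded quote is doubled.
fields = ['123', '"27" monitor"', '456']

minimal = io.StringIO()
csv.writer(minimal).writerow(fields)
print(minimal.getvalue().strip())        # 123,"""27"" monitor""",456

quoted_all = io.StringIO()
csv.writer(quoted_all, quoting=csv.QUOTE_ALL).writerow(fields)
print(quoted_all.getvalue().strip())     # "123","""27"" monitor""","456"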

Formatting MySQL output to valid CSV or XLSX

I have a query whose output I format and dump onto a CSV file.
This is the code I'm using,
(query ...)
INTO OUTFILE "/tmp/dump.csv"
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';
However, when I open the CSV in Google Sheets or Excel, the columns are broken up into hundreds of smaller ones.
When I open the CSV in a plain text editor, I see that the column values themselves contain quotes (single and double), commas, and line breaks.
Only the double-quotes are escaped.
Even though the double-quotes are escaped, they are omitted when interpreted by Google Sheets and Excel.
I tried manually editing the CSV entries, escaping the commas and such, but no luck: the commas still break the columns. However, in a couple of instances they didn't break the column, and I am not able to figure out why.
So my question is: how do I correctly format the output to accommodate these characters and dump it to a CSV, or even an XLSX (in case a CSV is not capable of handling situations like these)?
For context, I'm operating in a WordPress environment. If there is a solution in PHP, that can work too.
EDIT:
Here is a sample line from the CSV:
"1369","Blaze Pannier Mounts for KTM Duke 200 & 390","HTA.04.740.80200/B","<strong>Product Description</strong><span data-sheets-value=\"[null,2,"SW Motech brings you the Blaze Pannier Brackets for the Duke 200 & 390. "]\" data-sheets-userformat=\"[null,null,15293,[null,0],11]\">SW Motech brings you the Blaze Pannier Brackets for the Duke 200 & 390.</span>"," <strong>What's in the box? </strong><span data-sheets-value=\"[null,2,"2 Quick Lock SupportsnMounting materialnMounting Instructions"]\" data-sheets-userformat=\"[null,null,15293,[null,0],null,[null,[[null,2,0,null,null,[null,2,13421772]],[null,0,0,3],[null,1,0,null,1]]],[null,[[null,2,0,null,null,[null,2,13421772]],[null,0,0,3],[null,1,0,null,1]]],[null,[[null,2,0,null,null,[null,2,13421772]],[null,0,0,3],[null,1,0,null,1]]],[null,[[null,2,0,null,null,[null,2,13421772]],[null,0,0,3],[null,1,0,null,1]]],null,0,1,0,null,[null,2,0],"calibri,arial,sans,sans-serif",11]\">2 Quick Lock SupportsMounting materialMounting Instructions</span> ","Installation Instructions"
From RFC 4180
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
Any double quotes inside fields enclosed with double quotes need to be escaped with another double quote. So, given the three field values abc, ab"c, and ", the expected formatting would be abc,"ab""c","""".
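To see that rule in action, here is a tiny sketch (Python's csv module, which follows the same convention) producing exactly the formatting described above:

import csv, io

# Sketch: RFC 4180 escaping - a field containing a double quote is wrapped
# in quotes and the embedded quote is doubled.
buf = io.StringIO()
csv.writer(buf).writerow(['abc', 'ab"c', '"'])
print(buf.getvalue().rstrip())     # abc,"ab""c",""""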

SSIS Text Qualifier not working on last column

My flat file's fields are tab-delimited (\t) with a quotation mark (") text qualifier, and each row is line feed (LF) separated.
My SSIS package works great when
no fields are text qualified
any field EXCEPT the last column is text qualified
When the last column is text qualified my package errors out saying it couldn't find the delimiter for my last column ... any ideas?
In a programmer's life, 3 problems (that often take hours to track down) are certain: permissions, case sensitivity, and line endings.
In my case, it is line endings. When a CRLF is pressed against the text qualifier ("), SSIS apparently doesn't interpret the text qualifier correctly, but does see the line break.
Here's what my setup looked like when I was having issues:
Here's what my setup looked like after changing the column delimiter:
The official answer here, then, is to change the line endings. The unfortunate side effect is having to change a package that works on all the other files, which means either converting files with CRLF to LF before they hit this package or ending up with unsightly workarounds as seen here.
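If you do end up converting, the pre-processing step can be as small as a line-ending normalization run before the package touches the file (a rough sketch in Python; the paths are placeholders, and it blindly rewrites every CRLF, including any that happen to sit inside field values):

# Sketch: normalize CRLF to LF before SSIS reads the file.
# "input.txt" and "output.txt" are placeholder paths.
with open("input.txt", "rb") as src, open("output.txt", "wb") as dst:
    dst.write(src.read().replace(b"\r\n", b"\n"))

Something like this could be run from an Execute Process Task ahead of the data flow.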
Use this application in your SSIS Execute Process Task:
http://www.softsea.com/review/U2WIN.html
and point it at the flat file folder.
Try setting TextQualified = False on your last column to see if it helps.

Using Excel to create a CSV file with special characters and then Importing it into a db using SSIS

Take this XLS file
I then save this XLS file as CSV and then open it up with a text editor. This is what I see:
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB""C","D,E",F,03,"3,2"
I see that the double quote character in column C was stored as AB""C, the column value was enclosed with quotations and the double quote character in the data was replaced with 2 double quote characters to indicate that the quote is occurring within the data and not terminating the column value. I also see that the value for column G, 3,2, is enclosed in quotes so that it is clear that the comma occurs within the data rather than indicating a new column. So far, so good.
I am a little surprised that all of the column values are not enclosed in quotes, but even this seems reasonable once I assume that Excel only quotes column values when special characters like a comma or a double-quote character exist in the data.
Now I try to use SQL Server to import the CSV file. Note that I specify a double-quote character as the Text Qualifier character, and a comma character as the Column delimiter character.
However, note that SSIS imports column 3 incorrectly, i.e., it does not translate the two consecutive double-quote characters into a single occurrence of a double-quote character.
What do I have to do to get Excel and SSIS to get along?
Generally, people avoid the issue by using column delimiter characters that are LESS LIKELY to occur in the data, but this is not a real solution.
I find that if I modify the file from this
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB""C","D,E",F,03,"3,2"
...to this:
Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB"C","D,E",F,03,"3,2"
i.e., replacing the two consecutive quotes in column C's value with a single quote, the data is loaded properly. However, this is a little confusing to me. First of all, how does SSIS determine that the double quote between the B and the C does not terminate that column value? Is it because the following character is not a comma (the column delimiter) or a row delimiter (CRLF)? And why does Excel export it this way?
According to Wikipedia, here are a couple of traits of a CSV file:
Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes. For example:
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
However, it looks like SSIS doesn't like it that way when importing. What can be done to get Excel to create a CSV file that could contain ANY special characters used as column delimiters, text delimiters, or row delimiters in the data? There's no reason it can't work using the approach specified in Wikipedia, which is what I thought the old MS DTS packages used to do...
Update:
If I use Notepad to change the input file to
Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8
"1","ABC","AB""C","D,E","F","03","3,2","AB""C"
Excel reads it just fine
but SSIS returns
The preview sample contains embedded text qualifiers ("). The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.
Conclusion:
Just like the error message says in your update...
The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.
This is a confirmed bug on Microsoft Connect. I encourage everyone reading this to follow the aforementioned link and place your vote to have them fix this stinker. This is in the top 10 of the most egregious bugs I have encountered.
Do you need to use a comma delimiter?
I used a pipe delimiter with no text qualifier and it worked fine. Here is my output from the text file:
1|ABC|AB"C|D,E|F|03|3,2
You have 3 options in my opinion:
1. Read the data into a stage table, run any update queries you need on the columns, then select your data from the stage table and output it to a flat file.
2. Use pipes as your delimiters.
3. Do all of this in a C# application and build it in code. You could also send the row to a script in SSIS and parse and build the file you want there (see the sketch below).
Using text qualifiers and "character" delimited fields is problematic for sure.
Have Fun!
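If the script route appeals, the pre-processing can be a small standalone step that parses the quoted, comma-delimited export properly and rewrites it pipe-delimited with no text qualifier, i.e., the shape shown in the pipe example above. A sketch in Python rather than C#; the file names are placeholders, and it assumes the data itself never contains a pipe:

import csv

# Sketch: re-read the comma-delimited, quote-qualified file with a parser
# that understands doubled quotes, then write it back out pipe-delimited
# with no text qualifier so SSIS never sees embedded qualifiers.
# File names are placeholders; assumes "|" never occurs in the data.
with open("excel_export.csv", newline="", encoding="utf-8") as src, \
     open("for_ssis.txt", "w", encoding="utf-8", newline="") as dst:
    for row in csv.reader(src):
        dst.write("|".join(row) + "\n")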

How to handle single and double quotes in an XLSX spreadsheet when converted to CSV for phpMyAdmin import

My client is providing me with an XLSX spreadsheet that, in some columns, can have single and/or double quotes. I'm opening it up in LibreOffice and saving it as a CSV. Then I try to import it in phpMyAdmin, but every time the import gets tripped up on a line with either single or double quotes, depending on which I indicate to use for escaping.
When saving the XLSX as a CSV I select UTF-8 for encoding (it's defaulting to Windows-1252), comma for column delimiter, leave "Save cell content as shown" checked. For "Text delimiter" and "Quote all text cells", I've tried both options each (single and double quotes for delimiter and checked/unchecked for Quote).
Then in phpMyAdmin, for the import I leave UTF-8 selected, set "columns enclosed with" to a double quote (or single quote, matching what I selected in LibreOffice), and for "columns escaped with" I've tried backslash, double quote, and single quote.
In ALL cases I keep getting the error "Invalid column count in CSV input on line n." The line number depends on what I selected for column escape/delimiter (single or double quote). If I selected double quote as delimiter, I get the error on the first line that has a column with an unescaped single quote in it, and vice versa for single quote delimiters.
How can I get this spreadsheet imported with both single and double quotes in the cells?
Okay, after some more research and "fiddling around" I figured it out.
In my situation, for the import I selected "CSV using LOAD DATA", used double quotes for "columns enclosed with", and cleared the "columns escaped with" field.
LOAD DATA apparently tells phpMyAdmin to let MySQL handle the file directly. I'm not sure why this would affect my issue if I can specify delimiters, etc. for the "regular" CSV import selection, but it seems to have worked for me!
Hope this helps someone else out.
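For anyone still fighting the "Invalid column count" error, it can also help to sanity-check the exported CSV before importing it, using a parser configured the same way as the import (double-quote qualifier, doubled quotes rather than an escape character). A quick sketch in Python; the file name is a placeholder:

import csv

# Sketch: flag records whose parsed column count differs from the header's,
# which is what trips phpMyAdmin's "Invalid column count" check.
# "export.csv" is a placeholder; doublequote=True matches the
# "enclosed with double quotes, no escape character" settings.
with open("export.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f, quotechar='"', doublequote=True)
    header = next(reader)
    for record_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            print(f"Record {record_no}: expected {len(header)} columns, got {len(row)}")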