SSIS Exporting to Flat File Destination (CSV) - Custom Property EscapeQualifier Not Working (Undocumented?)

Many questions have been asked on this topic, but I can't find anything specifically addressing what I see in Visual Studio 2017 (SSDT). A Custom Property named "EscapeQualifier" exists for a flat-file destination component in an SSIS project. Unfortunately, setting this to true doesn't seem to do anything.
Searching the official Microsoft documentation doesn't even show that the property exists.
On the surface, using this option seems to be a very elegant solution to the common issue of creating a "real" CSV file when the data being exported contains the double-quote character. If it worked as it seems it should, then it would double any double-quotes (or similarly escape whatever character you defined as your text-qualifier) for all quotable fields in the destination.
The solutions for "the CSV problem" that I've been able to find suggest modifying the specific data via transforms or at the data-retrieval level, but that's very impractical to do on each and every text-qualified data column.
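To make the goal concrete, the escaping I'm after is the standard qualifier-doubling shown in the sketch below; without a built-in option I'd have to wire something like it into a script component for every destination (the helper and its name are just mine for illustration):

using System;

static class CsvEscaping
{
    // Hypothetical helper showing what EscapeQualifier should do automatically:
    // wrap the field in the text qualifier and double any qualifier characters
    // embedded in the data.
    public static string QualifyField(string value, char qualifier = '"')
    {
        string doubled = value.Replace(
            qualifier.ToString(), new string(qualifier, 2));
        return qualifier + doubled + qualifier;
    }
}

// Example: QualifyField("6\" pipe") writes the field as "6"" pipe" in the output file.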
To add insult to injury, I found a KB article from MS that suggests "exporting to CSV" is an official thing in SSDT.
KB4135137 - SSMS and SSDT do not escape double quotation marks when you export data as CSV
For example, you export a table into CSV format in a SQL Server Integration Services (SSIS) project.
This article suggests that the double-quotes not being escaped are a bug that has been fixed. Maybe it has, but only for the "Save results as..." option within SSMS. I still don't see any possible way to specify a true CSV export in an SSIS package, and this "EscapeQualifier" option gave me false hope.
Does this "EscapeQualifier" option ever do anything? If so, how do I get it to work? If not, is there another universal solution to the SSIS export to CSV issue?

Note: I created a pull request to add information about this property to Microsoft Docs.
As mentioned in the Flat File Destination properties, the EscapeQualifier property is used to:
When text qualifier is enabled, specifies whether the text qualifier in the data written to the destination file will be escaped or not
To test this property, I created a package that transfers data from one flat file to another.
In the source flat file connection manager, the Text Qualifier is set to <none>, while in the destination flat file connection manager the text qualifier is set to ". The source flat file only contains the following value: my name is "hadi".
I set the EscapeQualifier property to True in the flat file destination and executed the package. The destination file contained the following value: "my name is ""hadi""", which means that this property worked as expected.
Make sure that you have set a text qualifier in the flat file connection manager to ensure that this property will work as expected.
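If you build or edit packages programmatically rather than in the designer, the property can presumably also be set through the component's design-time wrapper. A minimal sketch, assuming flatFileDestination is the metadata of an existing Flat File Destination component and that the property takes a Boolean value:

using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

static class EscapeQualifierSetup
{
    // Turns on the EscapeQualifier custom property of an existing
    // Flat File Destination component via its design-time interface.
    public static void EnableEscapeQualifier(IDTSComponentMetaData100 flatFileDestination)
    {
        CManagedComponentWrapper designTime = flatFileDestination.Instantiate();
        designTime.SetComponentProperty("EscapeQualifier", true);
    }
}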

Related

ADF Merge-Copying JSON files in Copy Data Activity creates error for Mapping Data Flow

I am trying to do some optimization in ADF. The setup: a third-party tool copies one JSON file per object to a blob storage container, and these files feed a Mapping Data Flow. The individual files written by the third-party tool work great. If I copy these files to a different blob folder using an Azure Copy Data activity, the MDF can no longer parse the files and gives an error: "JSON parsing error, unsupported encoding or multiline." I started this with Merge Files, but the outcome is the same regardless of which copy behavior I choose.
2ND EDIT: After another day's work, I have found that the Copy Activity's merge from JSON to JSON definitely adds an EOL character to each single JSON object as it gets imported into the merge file. I have also found that the MDF definitely fails with those EOL characters in the merge file. If I remove all EOL characters from the merge file, the same MDF works. For me, this is a bug: the copy activity is adding a character that breaks the MDF. There also seems to be a second issue, where some data that doesn't fail as an individual file does break the MDF when I concatenate all the files together, but I have tested the basic behavior on 1-5000 files and have been able to repeat the fail/success results.
I took the original file and the copied file and ran them through all sorts of tests. Here is what I eventually found when I dumped them into Notepad++:
Copied file:
{"CustomerMasterData":{"Customer":[{"ID":"123456","name":"Customer Name",}]}}\r\n
Original file:
{"CustomerMasterData":{"Customer":[{"ID":"123456","name":"Customer Name",}]}}\n
If I change the copied file from ending with \r\n to \n, the MDF can read the file again. What is going on here? And how do I change the file write behavior or the MDF settings so that I can concatenate or copy files without the CRLF?
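For now, the manual cleanup that makes the merged file readable again is simply stripping the EOL characters between the objects, roughly like the sketch below (a throwaway workaround; the path is a placeholder):

using System.IO;

static class MergeFileCleanup
{
    // Removes every EOL character that the Copy Activity inserted between the
    // merged JSON objects, which is what currently makes the file readable by
    // the Mapping Data Flow again.
    public static void RemoveLineBreaks(string mergedFilePath)
    {
        string json = File.ReadAllText(mergedFilePath);
        File.WriteAllText(mergedFilePath, json.Replace("\r", "").Replace("\n", ""));
    }
}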
EDIT: NEW INFORMATION -- It seems on further review like maybe the minification/whitespace removal is the culprit. If I download the file created by the ADF copy and format it using a JSON formatter, it works. Maybe the CRLF -> LF masked something else. I'm not sure what to do at this point, but it's super frustrating.
Other possibly relevant information:
Both the source and sink JSON datasets are set to use UTF-8 (not default(UTF-8), although I tried that). Would a different encoding fix this?
I have tried remapping schemas, creating new data sets, creating new Mapping Data Flows, still get the same error.
EDITED for clarity based on comments:
In the case of a single JSON element in a file, I can get this to work -- data preview returns same success or failure as pipeline when run
In the case of multiple documents merged by ADF, I get the parsing error instead.
Repro: Create any valid JSON as a single file, put it in blob storage, use it as a source in a mapping data flow, and perform any sink operation. Create a second file with the same schema and get them both to run in the same flow using wildcard paths. Use a Copy Activity with Merge Files as the sink copy behavior and Array of Objects as the file pattern. Try to make your MDF use this new file. If it fails, download the file created by ADF, run it through a formatter (I have used both VS Code -> "Format Document" from the standard VS Code JSON extension, and the VS 2019 "Unminify" command) and reupload... It should work now.
I don't know if you have already solved the problem: I came across the exact same issue 3 days ago, and after several tries I found a solution:
In the Copy Data activity, under sink settings, use "Set of objects" (instead of "Array of objects") for File Pattern, so that the merged JSON has the value of each original small JSON file written on its own line.
In the MDF, after setting up the wildcard paths with the *.json pattern, select "Document per line" as the Document form under JSON Settings.
After that you should be good to go; at least it solved my problem. The CRLF automatically written by the "Array of objects" setting in the Copy Data activity appears to be the default, and MSFT should provide the option to omit it in the settings in the future.
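For merge files that were already produced with "Array of objects", a one-off conversion to the document-per-line form is also an option. This is only a sketch using System.Text.Json; the file paths are placeholders:

using System.Collections.Generic;
using System.IO;
using System.Text.Json;

static class MergeFileConverter
{
    // Reads a merged "array of objects" JSON file and rewrites it as one JSON
    // document per line (the form the Mapping Data Flow reads with
    // "Document per line"), using LF line endings only.
    public static void ToDocumentPerLine(string arrayFilePath, string linesFilePath)
    {
        using JsonDocument doc = JsonDocument.Parse(File.ReadAllText(arrayFilePath));

        var lines = new List<string>();
        foreach (JsonElement element in doc.RootElement.EnumerateArray())
        {
            // Serialize each object back out as a single compact line.
            lines.Add(JsonSerializer.Serialize(element));
        }

        File.WriteAllText(linesFilePath, string.Join("\n", lines) + "\n");
    }
}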
According to my test:
1. The Copy Data activity can't change Unix (LF) line endings to Windows (CRLF).
2. MDF can parse both Unix (LF) and Windows (CRLF) files.
Maybe there is something else wrong.
By the way, I see there is a comma after "name":"Customer Name" in your original file; I deleted it before my test.

SSIS Handling a Flat File Missing a Text Qualifier

I'm currently designing an SSIS package to import some CSV files and need to account for various error types. One of the errors is an incorrect or missing text qualifier.
For example: "col1","col2","col3/,"col4"
The package is currently throwing the error "[ProductMaster CSV [66]] Error: The column delimiter for column "Column 2" was not found.".
Which is what I would expect to see in this situation.
Apparently getting the file initially sent in the correct format isn't an option at the moment.
I've tried changing the file to have no text qualifier, but this then falls over if there is a comma in a field, so it is not a viable solution.
Is there any way of handling this?
I use a third party tool to read csv files and it handles this type of situation. If you must do something on your own I would import the entire line to one column and then parse it with either a stored procedure or a script component.
There are plenty of solutions out there, some free and some with a minimal cost.
I have never found a way to handle this with SSIS connection managers 'out of the box'.
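To illustrate the "whole line into one column, then parse it yourself" route mentioned above, here is a rough sketch of a lenient parser that could live in a script component. The recovery rule (a qualifier that isn't followed by a delimiter is kept as literal text) is my own assumption, not anything SSIS gives you:

using System.Collections.Generic;
using System.Text;

static class LenientCsv
{
    // Splits one raw CSV line into fields. Fields are comma separated and
    // optionally wrapped in double quotes; a quote that is not followed by a
    // comma or the end of the line is kept as literal data instead of being
    // treated as a (missing/misplaced) closing qualifier.
    public static List<string> ParseLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];

            if (inQuotes)
            {
                bool closes = c == '"' &&
                              (i == line.Length - 1 || line[i + 1] == ',');
                if (closes)
                    inQuotes = false;       // closing qualifier
                else
                    current.Append(c);      // stray qualifiers stay in the data
            }
            else if (c == '"' && current.Length == 0)
            {
                inQuotes = true;            // opening qualifier
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());
        return fields;
    }
}

For the sample line above this yields col1, col2 and a third field containing the malformed remainder, which at least keeps the row importable for later cleanup.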
To solve this issue, look at your file format using a text editor like Notepad++. If your file's row delimiter is CR, make sure you don't use (") as the text qualifier; instead choose <none> for the text qualifier and choose CR as the header row delimiter. This should work 100%.

SSIS Package not reading the last row in flat file

I have an SSIS package which loads a .EXT file into my database table.
The package's Flat File Connection Manager Editor properties are:
Format: Ragged Right
Code Page: 1252 ANSI (Latin-I)
Text Qualifier: <None>
Header Row Delimiter: <LF>
While previewing the file before loading, I am able to see all the rows in the Columns and Preview tabs of the Flat File Connection Manager Editor.
But when actually loading the file, the last record alone is not imported into the table.
It was loading fine and it still processes the file on a daily basis.
Only for two days' files was the last record not imported. I am trying to find the root cause.
I suspected something was wrong with the file, but I cannot find any differences between the working and non-working versions of the files.
Please suggest how to resolve this. Kindly let me know if any more information is required.
I ran into the same issue and did some research to find a solution that worked for me. Apparently the SSIS package had gone through a conversion from an earlier version at one point. When the conversion was done, the text qualifier property on the flat file connection was mangled. It had originally been <none>, but the conversion changed it to _x003C_none_x003E_. I opened the flat file connection manager and changed the text qualifier property on the general tab back to the proper value of <none>.
Credit goes to this thread for providing the answer.
I had a similar issue. My flat file didn't have any text qualifiers. When I added a text qualifier, the package ran successfully. My guess is that the file is read as text and the CRLF is not recognized on the last line.
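If that guess is right, a small pre-processing step that guarantees the file ends with a row delimiter is an easy thing to test. A sketch only; the CRLF delimiter and code page 1252 match the connection manager settings above, and the path is a placeholder:

using System.IO;
using System.Text;

static class TrailingDelimiterFix
{
    // Appends a final CRLF if the file does not already end with one, so the
    // last record is terminated like every other record.
    public static void EnsureTrailingNewline(string path)
    {
        Encoding ansi = Encoding.GetEncoding(1252);   // matches code page 1252
        string text = File.ReadAllText(path, ansi);

        if (text.Length > 0 && !text.EndsWith("\r\n"))
            File.WriteAllText(path, text + "\r\n", ansi);
    }
}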
If you can provide a sample of the data from the file, that would help.

CodedUI test does not read data from CSV input file

I am having difficulty mapping a CSV file with the Coded UI test method. This is most likely a stupid question but I cannot seem to find a solution for my problem, at least not one that works. I have made sure to set the property of the CSV file to Copy always.
I have also imported the CSV file by writing the following line above the test method.
[DataSource("Microsoft.VisualStudio.TestTools.DataSource.CSV", "|DataDirectory|\\Data\\login.csv", "login#csv", DataAccessMethod.Sequential), DeploymentItem("login.csv"), TestMethod]
The file name is login.csv and it resides in the Data folder.
The test will compile without any problem but once the test executes the fields that should receive input from the CSV file are left empty and the execution is interrupted. I've tried replacing the data from the CSV file by using Strings and it works perfectly fine. The piece of code I am using to import each parameter is:
TestContext.DataRow["Username"].ToString()
Also, the CSV file contains something along the following lines:
Username,Password,Fullname
admin#mail.com,password,Admin
Can anyone point out what I am forgetting?
Update: I pinpointed the issue; it seems to revolve only around the first column in the CSV file. When I try to import any of the other values, it works perfectly fine.
Some text files start with a Byte Order Mark (BOM). The CSV reader within Coded UI does not handle the BOM and treats it as part of the first field name. Inspecting the debug trace of a CSV file with a BOM (and the same file in Notepad++) shows that the DataRow.ItemArray[...] values are as expected, while DataRow.Table.Columns.ResultsView[...] shows the field names, but the first field name includes the BOM.
This CSV file with a BOM was created in Visual Studio using Solution Explorer => Add => New item => C# => General => Text file. Previously I have created a spreadsheet with Microsoft Excel and saved it as a CSV file; that file did not have a BOM. I have also created files with Notepad++ and saved them as CSV, and they did not have a BOM. It appears that Visual Studio creates files with a BOM, but when editing CSV files it does not add a BOM.
Visual Studio can create files with the correct encoding. Within "Step 2 - Create a data set" of this Microsoft page it states the text below. (Thanks also to Holistic Developer for providing very similar details in a comment.):
It is important to save the .csv file using the correct encoding. On the FILE menu, choose Advanced Save Options and choose Unicode
(UTF-8 without signature) – Codepage 65001 as the encoding.
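If re-saving every data file by hand isn't practical, the BOM can also be stripped in a pre-processing step before the test runs. A sketch; the file path is a placeholder:

using System.IO;

static class BomStripper
{
    // Rewrites a CSV file without the three-byte UTF-8 byte order mark so the
    // first column name ("Username" here) is read cleanly by the data source.
    public static void RemoveUtf8Bom(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        bool hasBom = bytes.Length >= 3 &&
                      bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;

        if (hasBom)
        {
            byte[] withoutBom = new byte[bytes.Length - 3];
            System.Array.Copy(bytes, 3, withoutBom, 0, withoutBom.Length);
            File.WriteAllBytes(path, withoutBom);
        }
    }
}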
For Visual Studio 2010, I could solve the issue by selecting the "Western European (Windows) - Codepage 1252" encoding for CSV files.
Summary of steps:
In Visual Studio 2010: open the CSV file > go to the File menu > select "Advanced Save Options" > select "Western European (Windows) - Codepage 1252" > Save.
This should help.
This is not the best solution, but it's kind of a workaround. I simply set the first element to something random, and since I don't need access to the first element, it doesn't matter that I can't read it.
If anyone finds a correct way to solve this problem I'd be grateful for your solution.

Issues with Access parsing double quotes in a CSV file

I have a large CSV file that I am trying to import into Microsoft Access, but I am running into issues. Assume pipes represent different cells in the database.
Assume my content is as below. With the default settings (comma delimiter and " text qualifier), the second entry will only parse the word my and will not import the word content into the database, even though the import wizard implies that it will.
|my content is good|
|my|
Now if I change the text qualifier to NONE, it parses the entire second entry and my content is imported into the database; however, the first entry winds up split across multiple cells in the database and shows up as
my|content|is|good.
|my content
I used pipes to imply different cells.
This seems like a limitation in Microsoft Access. Is anyone familiar with a workaround for this?
Original content:
,"my,content,is,good","",
,my"content","",
I am using the import wizard
Yes, this is a limitation of the CSV import capabilities in Access. For whatever reason, Access has always been more restrictive than Excel in its abilities to parse CSV files.
So, one workaround would be to open the CSV file in Excel, save the file as an actual Excel sheet, and then import the Excel sheet into Access. For example, the CSV file
this,is,a "test",CSV file,"Ugly, yes, but still parsable."
is "non-standard" (if one is willing to concede that there is such a thing as a CSV "standard"), and Access cannot import it directly. (It either complains of an "Unparsable Record" or it splits the last field on the commas, depending on the "Text Qualifier" setting.)
However, we can open it in Excel, save the file as "foo.xlsx", and then import the .xlsx file into Access.
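If this has to happen regularly, the same Excel round trip can be scripted. A sketch using the Excel interop assembly; the paths are placeholders and Excel must be installed on the machine:

using Microsoft.Office.Interop.Excel;

static class CsvToXlsx
{
    // Lets Excel's more forgiving CSV parser read the file, then saves it as a
    // real workbook that Access can import.
    public static void Convert(string csvPath, string xlsxPath)
    {
        var excel = new Application();
        try
        {
            Workbook workbook = excel.Workbooks.Open(csvPath);
            workbook.SaveAs(xlsxPath, XlFileFormat.xlOpenXMLWorkbook);
            workbook.Close(SaveChanges: false);
        }
        finally
        {
            excel.Quit();
        }
    }
}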