Remove NEWLINE character in an SSIS Flat File

Is there a way to remove the newline characters from a flat file in SSIS, without a Script Task?
CONTEXT: I need to send all the content of a flat file as a single string value.

If you have access to some SQL database, you could try exporting the data of the flat file into a SQL table in that database. The data would be loaded into multiple rows. Then all you need to do is concatenate the data from all the rows.
TABLE STRUCTURE
DataRow
----
This
is a
really
good way.
Code to go into the SQL statement:
DECLARE @Data VARCHAR(MAX)
SELECT @Data = COALESCE(@Data + '', '') + DataRow   -- use ' ' instead of '' if you want a space between rows
FROM YourTable
WHERE ISNULL(DataRow, '') <> ''
SELECT @Data
You can then place the above SQL statements inside an Execute SQL Task, capture the output as a single-row result set, and map it onto a string variable, which can be used as you wish.

I opted to write a simple method to read the entire file and write it to a single String variable, which was actually a DTS variable:
public String getFileData(FileInfo file)
{
    // Read the entire file, newline characters included, into one string
    string text = System.IO.File.ReadAllText(file.FullName);
    return text;
}
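For completeness, the returned string can then be handed back to the package from inside the Script Task. This is only a sketch: the variable name User::FileContent and the file path below are assumptions, not part of the original answer.
// Hypothetical call site inside the Script Task's Main() method (requires using System.IO;).
// "User::FileContent" must be declared as a read/write string variable on the task.
FileInfo sourceFile = new FileInfo(@"C:\temp\input.txt");
Dts.Variables["User::FileContent"].Value = getFileData(sourceFile);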

Use this expression in the SQL query for the column that contains the newline characters:
SELECT RTRIM(LTRIM(REPLACE(REPLACE(REPLACE(ColumnWithNewLineChar, CHAR(9), ''), CHAR(10), ''), CHAR(13), ''))) AS ColumnWithOutNewLineChar
FROM table
CHAR(9) is the tab character, CHAR(10) is the line feed, and CHAR(13) is the carriage return.
The expression above removes all of them from the column value.

Related

Importing a series of .CSV files that contain one field while adding additional 'known' data in other fields

I've got a process that creates a csv file that contains ONE set of values that I need to import into a field in a MySQL database table. This process creates a specific file name that identifies the values of the other fields in that table. For instance, the file name T001U020C075.csv would be broken down as follows:
T001 = Test 001
U020 = User 020
C075 = Channel 075
The file contains a single row of data separated by commas for all of the test results for that user on a specific channel and it might look something like:
12.555, 15.275, 18.333, 25.000 ... (there are hundreds, maybe thousands, of results per user, per channel).
What I'm looking to do is to import directly from the CSV file adding the field information from the file name so that it looks something like:
insert into results (test_no, user_id, channel_id, result) values (1, 20, 75, 12.555)
I've tried to use "Bulk Insert" but that seems to want to import all of the fields where each ROW is a record. Sure, I could go into each file and convert the row to a column and add the data from the file name into the columns preceding the results but that would be a very time consuming task as there are hundreds of files that have been created and need to be imported.
I've found several "import CSV" solutions but they all assume all of the data is in the file. Obviously, it's not...
The process that generated these files is unable to be modified (yes, I asked). Even if it could be modified, it would only provide the proper format going forward and what is needed is analysis of the historical data. And, the new format would take significantly more space.
I'm limited to using either MATLAB or MySQL Workbench to import the data.
Any help is appreciated.
Bob
A possible SQL approach to getting the data loaded into the table would be to run a statement like this:
LOAD DATA LOCAL INFILE '/dir/T001U020C075.csv'
INTO TABLE results
FIELDS TERMINATED BY '|'
LINES TERMINATED BY ','
( result )
SET test_no = '001'
, user_id = '020'
, channel_id = '075'
;
We need the comma to be the line separator, and we specify some character that we are guaranteed will not appear in the data as the field separator. That way LOAD DATA sees a single "field" on each "line".
(If there isn't a trailing comma at the end of the file, after the last value, we need to test to make sure we are actually picking up that last value, i.e. the last "line" as we're telling LOAD DATA to read the file.)
We could use user-defined variables in place of the literals, but that leaves the part about parsing the filename. That's really ugly in SQL, but it could be done, assuming a consistent filename format...
-- parse filename components into user-defined variables
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(f.n,'T',-1),'U',1) AS t
     , SUBSTRING_INDEX(SUBSTRING_INDEX(f.n,'U',-1),'C',1) AS u
     , SUBSTRING_INDEX(f.n,'C',-1) AS c
     , f.n AS n
  FROM ( SELECT SUBSTRING_INDEX(SUBSTRING_INDEX( i.filename ,'/',-1),'.csv',1) AS n
           FROM ( SELECT '/tmp/T001U020C075.csv' AS filename ) i
       ) f
  INTO @ls_t
     , @ls_u
     , @ls_c
     , @ls_n
;
While we're testing, we probably want to see the result of the parsing.
-- for debugging/testing
SELECT @ls_t
     , @ls_u
     , @ls_c
     , @ls_n
;
And then there's the part about running the actual LOAD DATA statement. We've got to specify the filename again, and we need to make sure we're using the same filename...
LOAD DATA LOCAL INFILE '/tmp/T001U020C075.csv'
INTO TABLE results
FIELDS TERMINATED BY '|'
LINES TERMINATED BY ','
( result )
SET test_no = @ls_t
, user_id = @ls_u
, channel_id = @ls_c
;
(The client will need read permission on the .csv file.)
Unfortunately, we can't wrap this in a procedure, because running a LOAD DATA statement is not allowed from a stored program.
Some would correctly point out that as a workaround, we could compile/build a user-defined function (UDF) to execute an external program, and a procedure could call that. Personally, I wouldn't do it. But it is an alternative we should mention, given the constraints.

How to transfer JSON data to SQL and put the appropriate data into a specific table

I am trying to transfer a JSON database to SQL: https://data.cityofnewyork.us/api/views/6bzx-emuu/rows.json?accessType=DOWNLOAD
But the problem I have is: how can I insert the data into the appropriate table when the only thing that separates each value is a comma?
Thanks
This will break the string into a list at the commas and trim whitespace.
my_string = "blah, lots , of , spaces, here "
[x.strip() for x in my_string.split(',')]
Then you loop through the list elements in Python and insert each in turn using a parameterized SQL statement. The idea is to do all the string-manipulation work in Python until you have a record ready to insert, then use Python to build and execute your SQL statement.

Base64 string in SSIS

I have a database table with a field "FileName" and a second field which is a base64 string (nvarchar(MAX)). It's an archive from my financial system. I want to convert this string back into a file using a Byte[] in an SSIS Script Task, but I can't get the string value out of this object variable.
First I get the value from the SQL database into an SSIS variable (Base64Data). This variable is of type Object since the SQL type is nvarchar(MAX). I use the SQL statement SELECT Base64Data FROM SubjectConnector WHERE FileName = '16-VMA-37041.pdf', which returns only one row. I then connect Base64Data to the variable [User::Base64Data] in the Result Set window of the Execute SQL Task Editor. No problems here (at least so it seems).
But when I check the value of this object variable with:
MessageBox.Show(Dts.Variables["User::Base64Data"].Value.ToString());
it states:
System.__ComObject
What is going on? Is the result from the SQL query empty? How can I check this, or what else is wrong?
Here's my SQL data
Please help.
An SSIS variable of type Object filled from a full result set is in fact an ADO recordset, so you cannot read it directly with a Dts.Variables[...] statement. I extracted the NVARCHAR(MAX) value with a Foreach Loop (ADO enumerator) over the recordset and fetched the value from the first column into a string variable.
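Alternatively, if you want to do everything inside a single Script Task, the Object variable (an ADO recordset) can be copied into a DataTable with an OleDbDataAdapter and the base64 string decoded from there. This is only a sketch under the question's setup (User::Base64Data filled with a single-row result set); the output path is made up.
// Inside the Script Task's Main() method; add using System.Data;, using System.Data.OleDb;
// and using System.IO; at the top of the script. Variable name and path are assumptions.
DataTable table = new DataTable();
new OleDbDataAdapter().Fill(table, Dts.Variables["User::Base64Data"].Value);
if (table.Rows.Count > 0)
{
    // The first column of the first row holds the base64 string returned by the query.
    string base64 = table.Rows[0][0].ToString();
    byte[] bytes = Convert.FromBase64String(base64);
    File.WriteAllBytes(@"C:\Temp\16-VMA-37041.pdf", bytes);
}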

Mysql dump character escaping and CSV read

I am trying to dump out the contents of my MySQL query into a CSV file and read it using a Java-based open source CSV reader. Here are the problems that I face with that:
My data set has around 50 fields, and a few of those fields contain text with line breaks. To prevent breaking my CSV reader, I specified FIELDS OPTIONALLY ENCLOSED BY '"' so that line breaks are wrapped inside double quotes. In this case, even fields that have no line breaks get wrapped in double quotes.
It looks like the default escape character for a mysql dump is \ (backslash). This causes line breaks to appear with a \ at the end, which breaks the CSV parser. To remove this trailing \, if I specify FIELDS ESCAPED BY '' (empty string), the double quotes in my text are no longer escaped, which still breaks the CSV read.
It would be great if I could skip the line-break escaping but still keep the escaping of double quotes so that the CSV reader does not break.
Any suggestions on what I can do here?
Thanks,
Sriram
Try dumping your data into CSV using uniVocity-parsers. You can then read the result using the same library:
Try this for dumping the data out:
ResultSet resultSet = executeYourQuery();
// To dump the data of our ResultSet, we configure the output format:
CsvWriterSettings writerSettings = new CsvWriterSettings();
writerSettings.getFormat().setLineSeparator("\n");
writerSettings.setHeaderWritingEnabled(true); // if you want the column names to be printed out.
// Then create a routines object:
CsvRoutines routines = new CsvRoutines(writerSettings);
// The write() method takes care of everything. Both resultSet and output are closed by the routine.
routines.write(resultSet, new File("/path/to/your.csv"), "UTF-8");
And this to read your file:
// creates a CSV parser
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.getFormat().setLineSeparator("\n");
parserSettings.setHeaderExtractionEnabled(true); //extract headers from file
CsvParser parser = new CsvParser(parserSettings);
// call beginParsing to read records one by one, iterator-style. Note that there are many ways to read your file, check the documentation.
parser.beginParsing(new File("/path/to/your.csv"), "UTF-8");
String[] row;
while ((row = parser.parseNext()) != null) {
System.out.println(Arrays.toString(row));
}
Hope this helps.
Disclaimer: I'm the author of this library, it's open source and free (Apache V2.0 license)

BCP : Retaining null values as '\N'

I have to move a table from MS SQL Server to MySQL (~8M rows with 8 columns). One of the columns (DECIMAL type) is exported as an empty string by the "bcp" export to a CSV file. When I use this CSV file to load the data into the MySQL table, it fails saying "Incorrect decimal value".
Looking for possible workarounds or suggestions.
I would create a view in MS SQL which converts the decimal column to a varchar column:
CREATE VIEW MySQLExport AS
SELECT [...]
COALESCE(CAST(DecimalColumn AS VARCHAR(50)),'') AS DecimalColumn
FROM SourceTable;
Then, import into a staging table in MySQL, and use a CASE statement for the final INSERT:
INSERT INTO DestinationTable ([...])
SELECT [...]
CASE DecimalColumn
WHEN '' THEN NULL
ELSE CAST(DecimalColumn AS DECIMAL(10,5))
END AS DecimalColumn,
[...]
FROM ImportMSSQLStagingTable;
This is safe because the only way the value can be an empty string in the export file is if it's NULL.
Note that I doubt you can cheat by exporting it with COALESCE(CAST(DecimalColumn AS VARCHAR(50)),'\N'), because LOAD DATA INFILE would see that as the literal string '\N', which is not the same as \N.