Viewing Results which are not Output to a Destination - ssis

When developing a Data Flow I don't always want to output the results to a destination - but I would like to see the data.
Is there a way to attach a Data viewer to an output without having to have a destination?
The file and raw destination have limitations on the data type they accept - and I don't want to attach conversions just to test/build code.
Is there some kind of output to null ? i could then get a data view on the result set

There is a (free) custom "trash" destination available from a third party:
http://www.sqlis.com/post/Trash-Destination-Adapter.aspx

I usually use Export column transform - if left with default configuration, it does nothing, so it is equivalent to the custom "Trash" destination mentioned by Ed, but you don't have to install anything.

When debugging and wanting to view the data in the buffer, I usually throw in a Union All and connect it below the component who's output I want to see. Then add a Data Viewer on the connector and voila, there it is!

There isn't really. You can use a RecordSet Destination, or a Row Count transform instead.

Related

Advanced mapping of JSON in Azure Data Factory - some guidance requested

I'm trying to map a JSON document (sensor data) into a more meaningful representation using Mapping Dataflows. However, hard time getting this to work and would really appreciate some insight/recommendations on how to solve the following:
The input is
What I would like to end up with is the following:
Any pointers as to how this can be implemented are more than welcome.
This can be accomplished using the Copy activity and then split function in Derived Column transformation in Azure Data Factory.
Use the copy activity to read the JSON file as source and in sink, use SQL database to store the data as table. In Mapping tab, Import the schema and map the JSON records to the corresponding column names. Refer this third-part tutorial for guidance - https://sqlkover.com/dynamically-map-json-to-sql-in-azure-data-factory/
Finally, use the Data Flow activity and choose the SQL table as source now which you have used as sink above.
Select the Derived Column transformation.
Use split function.
Add the column which will take the split values which you want to split as shown below.
Use split(<column_name_to_split>, '_') function to split the column on with _ delimiter. Change <column_name_to_split> to the name of column you cant to split. Refer image below.
Preview the data to check the result.

processing multiple files in business objects data services

I am new to the Business Objects Data services.
I have to run a dataflow reading from a file. Filename should be read based on wild chars like Platform. And I want to run the dataflow only if the file exists, if file is not present , it should not error out or should not do anything but it should just move on to the next dataflow or workflow in the job.
I tried below code to check if the file exists as built_in function File_Exists cannot check the file based on wild chars.
*$FILEEXISTSFLAG= exec('/bin/ksh',' "ls xxxxxx/Platform.csv',8);*
My intention is based on the value assigned to $FILEEXISTSFLAG from above code, I will decide whether to execute the data flow or not (if $FILEEXISTSFLAG is null do nothing otherwise execute the data flow ) but its retrieving below output.
*ls: cannot access /xxxxxx/Platform.csv: No such file*
Is there any other way to achieve this?
I was able to solve the above problem by using the index function.
$FILEEXISTSFLAG is containing a value like "ls: cannot access Platform: No such file or directory ". So, I have used the index function as below. So if the output is not null for below index function, it will execute the dataflow, otherwise it will do nothing.
index( $FILEEXISTSFLAG , 'No such file',1)
Thanks,
Phani.

Importing flat file which has changing column order using SSIS [duplicate]

Problem.
I regularly receive a feed files from different suppliers. Although the column names are consistent the problem comes when some suppliers send text files with more or less columns in there feed file.
Furthermore the arrangement of these files are inconsistent.
Other than the Dynamic data flow task provided by Cozy Roc is there another way I could import these files. I am not a C# guru but i am driven torwards using a "Script Task" control flow or "Script Component" Data flow task.
Any suggestion, samples or direction will greatly be appreciated.
http://www.cozyroc.com/ssis/data-flow-task
Some forums
http://www.sqlservercentral.com/Forums/Topic525799-148-1.aspx#bm526400
http://www.bidn.com/forums/microsoft-business-intelligence/integration-services/26/dynamic-data-flow
Off the top of my head, I have a 50% solution for you.
The problem
SSIS really cares about meta data so variations in it tend to result in exceptions. DTS was far more forgiving in this sense. That strong need for consistent meta data makes use of the Flat File Source troublesome.
Query based solution
If the problem is the component, let's not use it. What I like about this approach is that conceptually, it's the same as querying a table-the order of columns does not matter nor does the presence of extra columns matter.
Variables
I created 3 variables, all of type string: CurrentFileName, InputFolder and Query.
InputFolder is hard wired to the source folder. In my example, it's C:\ssisdata\Kipreal
CurrentFileName is the name of a file. During design time, it was input5columns.csv but that will change at run time.
Query is an expression "SELECT col1, col2, col3, col4, col5 FROM " + #[User::CurrentFilename]
Connection manager
Set up a connection to the input file using the JET OLEDB driver. After creating it as described in the linked article, I renamed it to FileOLEDB and set an expression on the ConnectionManager of "Data Source=" + #[User::InputFolder] + ";Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=\"text;HDR=Yes;FMT=CSVDelimited;\";"
Control Flow
My Control Flow looks like a Data flow task nested in a Foreach file enumerator
Foreach File Enumerator
My Foreach File enumerator is configured to operate on files. I put an expression on the Directory for #[User::InputFolder] Notice that at this point, if the value of that folder needs to change, it'll correctly be updated in both the Connection Manager and the file enumerator. In "Retrieve file name", instead of the default "Fully Qualified", choose "Name and Extension"
In the Variable Mappings tab, assign the value to our #[User::CurrentFileName] variable
At this point, each iteration of the loop will change the value of the #[User::Query to reflect the current file name.
Data Flow
This is actually the easiest piece. Use an OLE DB source and wire it as indicated.
Use the FileOLEDB connection manager and change the Data Access mode to "SQL Command from variable." Use the #[User::Query] variable in there, click OK and you're ready to work.
Sample data
I created two sample files input5columns.csv and input7columns.csv All of the columns of 5 are in 7 but 7 has them in a different order (col2 is ordinal position 2 and 6). I negated all the values in 7 to make it readily apparent which file is being operated on.
col1,col3,col2,col5,col4
1,3,2,5,4
1111,3333,2222,5555,4444
11,33,22,55,44
111,333,222,555,444
and
col1,col3,col7,col5,col4,col6,col2
-1111,-3333,-7777,-5555,-4444,-6666,-2222
-111,-333,-777,-555,-444,-666,-222
-1,-3,-7,-5,-4,-6,-2
-11,-33,-77,-55,-44,-666,-222
Running the package results in these two screen shots
What's missing
I don't know of a way to tell the query based approach that it's OK if a column doesn't exist. If there's a unique key, I suppose you could define your query to have only the columns that must be there and then perform lookups against the file to try and obtain the columns that ought to be there and not fail the lookup if the column doesn't exist. Pretty kludgey though.
Our solution. We use parent child packages. In the parent pacakge we take the individual client files and transform them to our standard format files then call the child package to process the standard import using the file we created. This only works if the client is consistent in what they send though, if they try to change their format from what they agreed to send us, we return the file.

Is there any way to reorder fields in an SSIS flat file source?

I have an SSIS package using a tab delimited flat file source with a TON of fields. Recently the provider of the tab delimited flat file has decided to change the format of the flat file by sprinkling a couple dozen new fields at random into the file. Needless to say, this hosed the package.
Rather than rebuild another flat file source and redefine all the fields, types, and lengths all over again, is there a way to reorder the fields in the flat file source? Sure would have been nice if Microsoft allowed you to move the fields around in the Advanced Columns pane, but noooooo.
Any help is appreciated.
If you only need to add columns to your file, you can do that in the Flat File connection editor. In the advanced window, you can select the field next to the new one and click the chevron next to the New button. It will give you the choice insert before or insert after.
If you truly have to move things around, you'll need to edit the XML source. If you use the existing file definition as a guide, you can build the new one in Excel or T-SQL relatively easily. Easier than typing everything in all over again at least.
I had a similar issue: I needed to change the order of columns in my flat file destination. The time-saving approach I settled on:
Delete the FF destination and FF connection manager (note down file name/location!),
Clear the check boxes that enable output columns in the source component
Re-enable the columns in the order you want
Add a new FF destination and FF connection right from the FF destination's connection manager drop-down.
Review/sanity check column sizes in FF connection, as usual
Not a direct answer to the question, but I came here looking for advice on "how to rearrange flat file destination columns", perhaps this will help someone.
I haven't seen an solution for that problem. SSIS isn't very strong in changing metadata. You could try to do it in notepad, but that is very tricky and very buggy. I would not recommand that to you.
In the connection managers below of your IDE you can double click your file name and edit everything you want.
This is still a "feature" of SSIS. To work around this I create a flat file connection called "NULL" with a single column named "NULL". Use the "New" button to add the column. I change the default column name from "Column 0" to "NULL". This column name must not match any column name in the list to be re-populated. If you have a real column named "NULL", pick something else for the column name that's not in use. You can keep the "NULL" flat file connection in the project for later use. (I expect to need it a few more times in this project.)
For this example, I use a flat file destination. Change the Flat File Destination to use the NULL connection.
Check the mapping to see there are no columns mapped. Saving this resets the metadata stored for the mapping.
Finally, change the Flat File Destination back to the correct connection to get a new mapping without metadata interference.
My example is a flat file destination. It should work for a flat file source for resetting the metadata. It is similar to the trick of changing a query to "select 1 as [NULL]" and back to purge metadata when using a ODBC source or such.
you could probably try something, but i havent tested.. use expressions to set everything for your flat file source? turn design time validation off

SSIS - Is there a Data Flow Source component that will handle CSV files where the column order may change?

We have written a number of SSIS packages that import data from CSV files using the Flat File Source.
It now seems that after these packages are deployed into production, the providers of these files may deliver files where the column order of the files changes (Don't ask!). Currently if this happens, our packages will fail.
For example, an additional column is inserted at the beginning of each row. In this case, the flat file source continues to use the existing column order, which obviously has a detrimental effect on the transformation!
Eg. Using a trivial example, the original file has the following content :
OurReference,Client,Amount
235,MFI,20000.00
236,MS,30000.00
The output from the flat file source is :
OurReference Client Amount
235 ClientA 20000.00
236 ClientB 30000.00
Subsequently, the file delivered changes to :
OurReference,ClientReference,Client,Amount
235,A244,ClientA,20000.00
236,B222,ClientB,30000.00
When the existing unchanged package is run against this file, the output from the flat file source is :
OurReference Client Amount
235 A244 ClientA,20000.00
236 B222 ClientB,30000.00
Ideally, we would like to use a data source that will cope with this problem - ie which produces output based on the column names, instead of the column order.
Any suggestions would be welcomed!
Not that I know of.
A possibility to check for the problem in advance is to set up two different connection managers, one with a single flat row. This one can read the first row and tell if it's OK or not and abort.
If you want to do the work, you can take it a step further and make that flat one-field row the only connection manager, and use a script component in your flow to parse the row and assign to the columns you need later in the flow.
As far as I know, there is no way to dynamically add columns to the flow at runtime - so all the columns you need will need to be added to the script task output. Whether they can be found and get parsed from the each line is up to you. Any "new" (i.e. unanticipated) columns cannot be used. Columns which are missing you could default or throw an exception.
A final possibility is to use the SSIS object model to modify the package before running to alter the connection manager - or even to write the entire package dynamically using the object model based on an inspection of the input file. I have done quite a bit of package generation in C# using templates and then adding information based on metadata I obtained from master files describing the mainframe files.
Best approach would be to run a check before the SSIS package imports the CSV data. This may have to be an external script/application, because I don't think you can manipulate data in the MS Business Intelligence Studio.
Here is a rough approach. I will write down the limitations at the end.
Create a flat file source. Put the entire row in one column.
Do not check Column names in first data row.
Create a Script Component
Code:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
string sRow = Row.Column0;
string sManipulated = string.Empty;
string temp = string.Empty;
string[] columns = sRow.Split(',');
foreach (string column in columns)
{
sManipulated = string.Format("{0}{1}", sManipulated, column.PadRight(15, ' '));
}
/* Note: For sake of demonstration I am padding to 15 chars.*/
Row.Column0 = sManipulated;
}
Create a flat file destination
Map Column0 to Column0
Limitation: I have arbitrarily padded each field to 15 characters. Points to consider:
1. Do we need to have each field of same size?
2. If yes, what is that size?
A generic way to handle that would be to create a table to store the file name, fields, and field sizes.
Use the file name to dynamically create the source and destination connection manager.
Use the field name and corresponding field size to decide the padding. Not sure, if you need this much flexibility. If you have any question, please respond.