How to pass Multiple Input for SSIS Script Component - ssis

I have a Custom Source DataFlow Component whose O/P will differ every time, I need to Insert those records in to a Destination table.
Problem:-
Can't Specify the Input columns at Design time for Destination Component.. as in actual for every call to the DataFlow task, The source component gonna return Different O/P Columns based on the Table Schema.
Solution Needed For:-
How to accept whatever inputs are available without any mapping in Destination DataFlow component(either by any Existing Component or by Custom Component)?

The data flow's fixed structure is there for data validation and to optimization purposes. All of it's components are going to have fixed input and output columns. I would suggest the following possibilities:
Write a data flow for every possible schema. There are probably a finite number of possibilities. You could reduce the effort of this task by using BIML which could generate the package structure for you. This may also introduce the possibility of parallel loading.
Use a script task instead of a data flow. In the script task, write the rows for each input into a table.
m

If you need to pass multiple inputs to a single script component, the only way I know to do this is by passing the multiple inputs to a UNION ALL component, and then passing the single output from the UNION ALL to the Script.
You'll have to account for any differences between the two columns structures in the UNION ALL, and maybe use derived columns if you need an easy way to identify which original input a row came from.

I know this is way late but I keep seeing this UNION ALL approach and don't like it.
How about this approach.
Run both data flows into their own recordset destination and save into a variable of type ADO object
Create a new dataflow and use a script source and bring in both ADO objects
Fill datatables using adapter and then do what ever you want with them.

Related

Alternative approach to parameters in Invantive Control to control query outcome

I would like to use parameters in Invantive control.
For example, I would like to retrieve only the hours, of Exact Online Project management which, are in the given data parameters.
There are three often used approaches:
Use parameters in the model editor.
Use Excel values.
Use data from the databases involved.
Model Parameters
To use parameters in the model editor, you define them in the model editor:
Select a unique code.
Possibly provide a default value (always string, use to_... or casting to change it in the queries).
Define list-of-values providing a quick pick.
The unique code is in general all in uppercase and in the format P_..., but as long it is a legal identifier anything should work.
To use them in one or more block queries or triggers:
Use the building blocks button.
Or type $P{CODE} in the SQL or trigger for Excel.
The values can be entered using the parameters button (the green funnel in the Control ribbon):
Please note that parameters are always bound as parameters, and not lexicographically substituted so you can NOT say: select * from $P{TABLE_NAME}.
Please note that parameters can be dependent on each other, so in the query for the parameter you can use another parameter. Such as first choosing the country of a project and then showing a list of projects in that country. But be wise, avoid recursion and other overly complex scenarios; the user will not easily understand it.
Excel values
To use Excel values, you can define them as follows:
Enter a value somewhere in Excel.
Optionally define a named range to make it easier to change the location.
You can of course assign lists as normal in Excel as a pop-up or other validations. Also cell locking works fine.
To use the actual value in a query or trigger of Invantive Control you can use the building blocks in the query editor or use something like select * from table where code = $X{projectcode} or select * from table where code = $X{B2&C2&D2}. The last one shows that you can also use other type of Excel expressions.
Note that Excel parameters are also bound as parameters to the query, but that they are also typed, so the following query can be different depending on the general format of the Excel cell:
select *
from table
where code like $X{CELL}
When cell is a text, the database or Exact Online in this case will retrieve:
select *
from table
where code like :ex0
With ex0 being a text such as '8%'. But when cell is a percentage, the contents might still display in Excel as '8%', but the actual query will be with identical to the outcome of:
select *
from table
where code like 0.08
Caused me some headaches, but typing is in general a useful feature, especially with dates.
Parameter using database data
Option 3 is practically not feasible with Exact Online, since they are little possibilities to create your own tables and/or fields.
On other platforms such as Oracle you might want to enter new rows in Invantive Control in Excel and them upload them on sync to provide parameters. Especially handy in case of complex risk models.

Sorting files with the same header names using SSIS

I have a folder with a lot of data files in. I want to be able to loop through the files, look at the headers and sort them into folders if they have the same headers. Is that possible to do in SSIS? If so would anyone be able point me the direction of how to do this?
I am going to try and explain this as best I can without writing a book as this a multi stepped process that isn't too complex but, might be hard to explain with just test. My apologies but I do not have access to ssdt at the moment so I can not provide images to aid here.
I would use the TextFieldParser class in the VisualBasics.dll. in a script task. This will allow you to read the header from file into a string array. You can then build the string array into a delimited column and load an object variable with a datatable that has been populated with two columns. The first column being the filename and the second being the delimiter headers.
Once you have this variable you can load a sql table with this information. (optional to skip if you want to load the columns directly into sql as you read them. your call)
Once you have your sql table you can create an enumerator for that dataset based on the unique headers column.
Then use a foreach loop task with script task to enumerate thru the unique header sets. Use a sql task to assign the file names that belong to the unique header set.
Within the script loop thru the returned file names and apply the necessary logic to move the files to there respective folders.
This is sort of a high level overview as I am assuming you are familiar enough with SSIS to understand the steps necessary to complete each step. If not then I would be able to elaborate later in the day when I am able to get to my SSIS rig.

Dynamic SSIS transforms building tables lists at run time

I'm looking for some pointers in creating an SSIS based workflow that reads a list of tables at run time from a database and then uses each of these as ADO inputs, selects specific columns from each table and then adds these to a staging area. I've had a quick play with the union task but was looking for some pointers in terms of direction to take ?
I can't seem to find anything on the net that does what I need and am not sure if SSIS can bend to suit my needs.
Many thanks in advance.
You can do this but the only method I can think of is a little convoluted.
You would need to use a "for each loop container" to loop through your list of tables & read each table name into an SSIS variable.
Within the "foreach":
add a script task to build your actual query into another SSIS variable.
add a data flow
within the Data Flow use a source of "SQL Command from variable".
do data flow "stuff"
I hope this makes some kind of sense? :-)

SSIS 2008 determine source column from destination column (programatically)

Not sure if there's any way to do this, but we're trying to programmatically determine dependencies in our ETL process, specifically whether modifying a column in our source data set will impact our ETLs and if so, which ones, ie. with a package 'myPackage' containing a data flow task that draws from 'sourceTable' and includes various columns including 'column1' and ultimately loads 'destinationTable' with 'column1New' is there any way to query the SSIS package itself to determine that column1New is based on column1 (does lineage provide anything of use here?)
Each column you use in a transformation of your package is associated an ID. The next component to which the column is passed down to will refer to that column using the lineage ID property, but is given a new id.
You could query the XML of your package to trace the path a column takes by creating a map of these IDs. However, this might be difficult to implement in a stable way.
This might help you on your way:
http://blogs.msdn.com/b/helloworld/archive/2008/08/01/how-to-find-out-which-column-caused-ssis-to-fail.aspx

How do I re-use reports on different datasets?

What is the best way to re-use reports on different tables / datasets?
I have a number of reports built in BIRT, which get their data from a flat (un-normalized) MySQL table, the data which in turn has been imported from an excel sheet.
In BIRT, I've constructed my query like this, such that I can change the field names and re-use the report:
SELECT * FROM
(SELECT index as "Index", name as "Name", param1 as "First Parameter" FROM mytable) t
However, then when I switch to a new client's data, I need to change the query to the new data source and this doesn't seem sustainable or anywhere near a good practice.
So... what is a good practice?
Is this a reporting issue, or a database-design issue?
Do I create a standard view that the report connects to?
If I have a standard view, do I create a different view with the same structure for each data table, or keep replacing the view with a reference to the correct data table each time I run the report?
What's annoying is the excel sheets keep changing - new columns are added, and different clients name their data differently. Even if I can standardize this, I'd store different client data in different tables... so would I need to create a different report for each client, or pass in the table name to the report?
There are two ways and the path you choose is really dictated by how much flexibility you have architecturally.
First, you are on the right track by renaming your selected columns to a common name since that name is what is used to bind the data to the control on the report. Have you considered a stored procedure to access the data? This removes the query from the report and allows you to set up the stored proc on any database to return the necessary columns. If you cannot off-load to a stored proc, you can always rely on altering the query text at run-time. Because BIRT reports are not compiled (they are XML) you can change the query based on parameters and have it executed for each run of the design. Look at the onCreate event for the Data Set and you can access this.queryText and do any dynamic string substitution you need via JavaScript. Hidden parameters are a good way to help alter/tune the query. If you build the Data Set correctly, the changing of the underlying data could be as easy as changing the Data Source and then re-associating the Data Set to the new Data Source (in the edit data set window). I have done this MANY times and it works well. If you are going down this route, I would add the Data Source(s), Data Set(s) and any controls that they provide data to a report library. With the library you can use the controls in many reports and maintain them in one spot. If you update the library, all the reports using the library get updated as well.
Alternatively, if you want to really commit to a fully re-usable strategy that allows you to build a library of reusable components you could check out the free Reusable Component Library at BIRT Exchange (Reusable Component Library). In my opinion this strategy would give you the re-use you are looking for but at the expense of maintainability. It is abstraction to the point of obfuscation. It requires totally generic names for columns and controls that make debugging very difficult. While it would not be my first choice (the option above would be) others have used it successfully so I thought I would include it here since it directly speaks to your question.