Passing a variable value from Control Flow to Data Flow in SSIS - sql-server-2008

I have a fairly straightforward SSIS package where I can't sucessfully pass the value of a package-scoped variable from the Control Flow to a Data Flow task. Consider the below diagram:
The Execute SQL task gets values from a list of "machines". This is used to control a ForEach Loop Container, which works very well. Next a script task performs some math and assigns a single number to a package scoped variable (integer type). I have added message boxes that pop up during the loop so that I can verify that the value of this variable is being set properly.
The last icon is a data flow where I want to use the variable value. I have a simple script task that contains just a message box showing me the current value of this same variable. Every time, the variable is the value that I initially set in the designer (BIDS). Therefore, the value is not being "passed" to the data flow. I have verified multiple times that the names of the variables are correct (including case sensitive values).
This should be pretty simple, and I am getting frustrated with this issue. I would greatly appreciate and suggestions or comments. Thank you!

How are you setting the package variable from your script task? It should look like this (c#):
DTS.Variables["testVariable"].Value = "some value";
Then to test it from the script component in your dataflow task:
public override void PostExecute()
{
base.PostExecute();
MessageBox.Show(Variables.testVariable, "test");
}
I did this in a test package and it worked fine.
EDIT
Also make sure that you added the variable to the ReadWriteVariables section of the properties for the script tasks.

Related

How to set up SSIS parent package such that 4 child packages can run at the same time with different parameter values passed in?

I have created a child SSIS package that executes according to the "ProcessName" variable value that is specified initially. Now, I wish to create a parent package such that I can execute 4 child package tasks with different ProcessName values passed in to be executed in parallel. How can I maintain my child package and pass in different values to each of the 4 execute packages task such that the ProcessNames variable values are different for each of them? I am new to SSIS and would deeply appreciate if someone could advice or give a direction on how I could go about doing so.
I would see this as a pattern like the following
The "trick" here is that within each Sequence Container, SEQC, I need to define my variable that holds my parameter value. That variable needs to be scoped to the container - otherwise, there is only one SSIS variable and the 4 processes that attempt to initialize that value will be in conflict.
In the SSIS Variables menu, there is a Move Variable icon (second one listed)
Here you can see that I have ParameterValue defined in both "SEQC Opt 1a" and "SEQC Opt 1b" and they're initialized with different values.
The first step within the Sequence container is an Execute SQL Task where I pull back the intended parameter value. Maybe that is not needed in your case but it can be helpful to have a repository of run-time values. In the case of 1b, this is much more what my execution pattern looks like. I have a query that pulls back any packages to be run within the scope of this container and the starting value. e.g.
ContainerName|PackageName|StartingValue
SEQC Opt 1a |Child0.dtsx|100
SEQC Opt 1a |Child1.dtsx|200
SEQC Opt 1a |Child2.dtsx|300
SEQC Opt 1b |Child5.dtsx|600
SEQC Opt 1b |Child6.dtsx|700
SEQC Opt 1b |Child7.dtsx|800
This table pattern allows me to dynamically run packages in both parallel and in serial. Assuming Child7 and Child2 in the above set are very slow but the other 4 packages are relatively fast. The fast ones would start up, do their work and complete and the next runs. There are limits to how many parallel operations can fire at once so you can't scale infinitely across processes so a balance of serial and parallel operations makes sense.
Once you have your pattern working for one sequence container: copy, paste, rename and assuming you lookup in a table based on the task name as I show above, it's ready to go.
NOTE for everyone reading this answer: This answer is not full/complete with examples/full steps. From comment above I am posting this now so requestor can see it and get started.
This was from notes I wrote myself a long while back on how to do this for myself. I am posting it as answer as it is helpful and too large to post as comment. Plus I have not rewritten anything in what wrote for myself and what I am posting.
Currently I can not find my full code yet to post full details/steps. If/when I do I will post here, but this should be good details on what/how to do it. Plus this gives info on how to handle child package error trapping as well.
-- my notes I saved for myself posting as answer:
Steps for creating child packages:
Create any variables needed in the child package
Create the coorisponding variable name in the parent package (the name doesnot have to be the same, and maybe want to name it something to identify it as a child package variable)
Child Package:
Need to set up: Package Configurations
a. Right click on package and click Package Paramaters
b. Click the checkox to Enable Package configurations
Click Add and set the paramaters:
a. Configuration Type: Pareknt Package variable
b. Specify the configuration setting directly: Put the parent variable name in here that the child package is going to access
c. Click "Next"
d. In "Objects" window scroll down to the variable you are setting from the parent variable name you selected above and click the "Value" option under Properties for that variable name
e. Click "Next"
f. Under Configuration Name: Set a detailed name for what this variable is/does.
Error Handling (NOTE: This is not required but you wont capture the child error messages if you dont do this):
a. Go to Event Handlers Tab
b. Under drop down (on top right) select OnError
c. Add a Scrit Task
d. Pass as read only variables:
System::ErrorDescription
System::SourceName
System::PackageName
e. Copy/paste the code below into the script task uin the Main() function.
----- this is for the error handling
public void Main()
{
// build our the error message
string ErrorMessageToPassToParent = "Package Name: " + Dts.Variables["System::PackageName"].Value.ToString() + Environment.NewLine + Environment.NewLine +
"Step Failed On: " + Dts.Variables["System::SourceName"].Value.ToString() + Environment.NewLine + Environment.NewLine +
"Error Description: " + Dts.Variables["System::ErrorDescription"].Value.ToString();
// have to do this FIRST so you can access variable without passing it into the script task from SSIS tool box
// Populate collection of variables. This will include parent package variables.
Variables vars = null;
Dts.VariableDispenser.GetVariables(ref vars);
// checks if this variable exists in parent first, and if so then will set it to the value of the child variable
// (do this so if parent package does not have the variable it will not error out when trying to set a non-existant varaible)
if (Dts.VariableDispenser.Contains("OnError_ErrorDescription_FromChild") == true)
{
// Lock the to and from variables.
// parent variable
Dts.VariableDispenser.LockForWrite("User::OnError_ErrorDescription_FromChild");
// Need to call GetVariables again after locking them. Not sure why - perhaps to get a clean post-lock set of values.
Dts.VariableDispenser.GetVariables(ref vars);
// Set parentvar = childvar
vars["User::OnError_ErrorDescription_FromChild"].Value = ErrorMessageToPassToParent;
vars.Unlock();
}
Dts.TaskResult = (int)ScriptResults.Success;
}
Parent Package:
Add this variable to propertly capture the child error messages (not required but you wont capture chidl error messages if you dont):
variable: OnError_ErrorDescription_FromChild
Error Handling(NOTE: This is not required but you wont capture the child error messages if you dont do this):
a. Go to Event Handlers Tab
b. Under drop down (on top right) select OnError
c. Add a Scrit Task
d. Pass as read only variables:
User::OnError_ErrorDescription_FromChild
e. Copy/paste the code below into the script task uin the Main() function.
----- this is for the error handling
public void Main()
{
// get the varaible from the parent package for the error
string ErrorFromChildPackage = Dts.Variables["User::OnError_ErrorDescription_FromChild"].Value.ToString();
// do a check if the value is empty or not (so we knwo if the error came from the child package or the occurent in the parent package itself
if (ErrorFromChildPackage.Length > 0)
{
// Then raise the error that was created in the child package
Dts.Events.FireError(0, "Capture Error From Child Package Failure",
ErrorFromChildPackage
, String.Empty, 0);
//Dts.TaskResult = (int)ScriptResults.Failure;
} // end if the error length of variable is > 0
Dts.TaskResult = (int)ScriptResults.Success;
}
NOTES:
For error handling:
a. The child package error handling is written so it wont fail if the variables or error handling does not exist in parent package.
b. If you include the error handling (and variable) in the parent package it MUST exist in the child package though.

SSIS Compilation problems - DirectRowToOutput

Input buffer does not contain a definition for 'DirectRowToOutput0' or likewise for the the other properties below.
Row.DirectRowToOutput0();
Row.ErrorMessage = ex.Message;
Row.DirectRowToFailedValidation();
I had some packages on SSIS package store, and attempted to import them using the Package Import Wizard project. But it had some issues and compilation failed, and completely broke all previous script components so I fished the code out of some backups, and pasted it back into some new script tasks.
'ErrorMessage' I did add to a new output flow and column, but it looks like things don't work that way anymore.
New Script tasks appear to be C# 2012.
What have I missed? Am struggling to find which documentation I should be using, and these version conflicts are really hard to deal with.
Using SSDT 2017.
"DirectRowToOutputX()" means "Provide support for filtering outputs to named output groups". In other words, if you ADD SUPPORT for output filtering, then you get the functionality that you list above. Here's how:
When you configure your Script Component, you need to click Inputs and Outputs, then in the inputs and outputs pane, select the output that you'd like to filter. Then in Common Properties, select ExclusionGroup and set it to some value other than zero. Now go back and edit your script and the Row.DirectRowToOutput0() statement will work. Code example below.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
bool keepThisRow = ValidateMyRow(Row);
if (keepThisRow)
{
// Get/set Row values, do something useful
Row.DirectRowToOutput0(); // for Output0
}
/* Else do nothing - the row will be filtered OUT of the output if not explicitly
* included
*/
}

Reading project parameters in Script Task

This is what I'm trying to do in a script task:
long lngMaxRowsToPull = Convert.ToInt64(Dts.Variables["Project::MaxRowsPerPull"].Value);
I get an error message that the variable does not exist.
Yet Its defined as a ReadOnlyVariable to the script and it does exist as a project parameter.
So close. ;)
Your code is trying to access a variable/parameter named Project::MaxRowsPerPull
In fact, the $ is significant so you need to reference $Project::MaxRowsPerPull
Also note that you have the data type for the parameter as Int32 but are then pushing it into Int64. You can always put a smaller type into a larger container but if you tried to fill the parameter with too large a value your package will asplode.
You need to add $ to your parameter fetch name as per syntax.
long lngMaxRowsToPull = Convert.ToInt64(Dts.Variables["$Project::MaxRowsPerPull"].Value);

How can I set an expression to the FileSpec property on Foreach File enumerator?

I'm trying to create an SSIS package to process files from a directory that contains many years worth of files. The files are all named numerically, so to save processing everything, I want to pass SSIS a minimum number, and only enumerate files whose name (converted to a number) is higher than my minimum.
I've tried letting the ForEach File loop enumerate everything and then exclude files in a Script Task, but when dealing with hundreds of thousands of files, this is way too slow to be suitable.
The FileSpec property lets you specify a file mask to dictate which files you want in the collection, but I can't quite see how to specify an expression to make that work, as it's essentially a string match.
If there's an expression within the component somewhere which basically says Should I Enumerate? - Yes / No, that would be perfect. I've been experimenting with the below expression, but can't find a property to which to apply it.
(DT_I4)REPLACE( SUBSTRING(#[User::ActiveFilePath],FINDSTRING( #[User::ActiveFilePath], "\", 7 ) + 1 ,100),".txt","") > #[User::MinIndexId] ? "True" : "False"
Here is one way you can achieve this. You could use Expression Task combined with Foreach Loop Container to match the numerical values of the file names. Here is an example that illustrates how to do this. The sample uses SSIS 2012.
This may not be very efficient but it is one way of doing this.
Let's assume there is a folder with bunch of files named in the format YYYYMMDD. The folder contains files for the first day of every month since 1921 like 19210101, 19210201, 19210301 .... all the upto current month 20121101. That adds upto 1,103 files.
Let's say the requirement is only to loop through the files that were created since June 1948. That would mean the SSIS package has to loop through only the files greater than 19480601.
On the SSIS package, create the following three parameters. It is better to configure parameters for these because these values are configurable across environment.
ExtensionToMatch - This parameter of String data type will contain the extension that the package has to loop through. This will supplement the value to FileSpec variable that will be used on the Foreach Loop container.
FolderToEnumerate - This parameter of String data type will store the folder path that contains the files to loop through.
MinIndexId - this parameter of Int32 data type will contain the minimum numerical value above which the files should match the pattern.
Create the following four parameters that will help us loop through the files.
ActiveFilePath - This variable of String data type will hold the file name as the Foreach Loop container loops through each file in the folder. This variable is used in the expression of another variable. To avoid error, set it to a non-empty value, say 1.
FileCount - This is a dummy variable of Int32 data type will be used for this sample to illustrate the number of files that the Foreach Loop container will loop through.
FileSpec - This variable of String data type will hold the file pattern to loop through. Set the expression of this variable to below mentioned value. This expression will use the extension specified on the parameters. If there are no extensions, it will *.* to loop through all files.
"*" + (#[$Package::ExtensionToMatch] == "" ? ".*" : #[$Package::ExtensionToMatch])
ProcessThisFile - This variable of Boolean data type will evaluate whether a particular file matches the criteria or not.
Configure the package as shown below. Foreach loop container will loop through all the files matching the pattern specified on the FileSpec variable. An expression specified on the Expression Task will evaluate during runtime and will populate the variable ProcessThisFile. The variable will then be used on the Precedence constraint to determine whether to process the file or not.
The script task within the Foreach loop container will increment the counter of variable FileCount by 1 for each file that successfully matches the expression.
The script task outside the Foreach loop will simply display how many files were looped through by the Foreach loop container.
Configure the Foreach loop container to loop through the folder using the parameter and the files using the variable.
Store the file name in variable ActiveFilePath as the loop passes through each file.
On the Expression task, set the expression to the following value. The expression will convert the file name without the extension to a number and then will check if it evaluates to greater than the given number in the parameter MinIndexId
#[User::ProcessThisFile] = (DT_BOOL)((DT_I4)(REPLACE(#[User::ActiveFilePath], #[User::FileSpec] ,"")) > #[$Package::MinIndexId] ? 1: 0)
Right-click on the Precedence constraint and configure it to use the variable ProcessThisFile on the expression. This tells the package to process the file only if it matches the condition set on the expression task.
#[User::ProcessThisFile]
On the first script task, I have the variable User::FileCount set to the ReadWriteVariables and the following C# code within the script task. This increments the counter for file that successfully matches the condition.
public void Main()
{
Dts.Variables["User::FileCount"].Value = Convert.ToInt32(Dts.Variables["User::FileCount"].Value) + 1;
Dts.TaskResult = (int)ScriptResults.Success;
}
On the second script task, I have the variable User::FileCount set to the ReadOnlyVariables and the following C# code within the script task. This simply outputs the total number of files that were processed.
public void Main()
{
MessageBox.Show(String.Format("Total files looped through: {0}", Dts.Variables["User::FileCount"].Value));
Dts.TaskResult = (int)ScriptResults.Success;
}
When the package is executed with MinIndexId set to 1948061 (excluding this), it outputs the value 773.
When the package is executed with MinIndexId set to 20111201 (excluding this), it outputs the value 11.
Hope that helps.
From investigating how the ForEach loop works in SSIS (with a view to creating my own to solve the issue) it seems that the way it works (as far as I could see anyway) is to enumerate the file collection first, before any mask is specified. It's hard to tell exactly what's going on without seeing the underlying code for the ForEach loop but it seems to be doing it this way, resulting in slow performance when dealing with over 100k files.
While #Siva's solution is fantastically detailed and definitely an improvement over my initial approach, it is essentially just the same process, except using an Expression Task to test the filename, rather than a Script Task (this does seem to offer some improvement).
So, I decided to take a totally different approach and rather than use a file-based ForEach loop, enumerate the collection myself in a Script Task, apply my filtering logic, and then iterate over the remaining results. This is what I did:
In my Script Task, I use the asynchronous DirectoryInfo.EnumerateFiles method, which is the recommended approach for large file collections, as it allows streaming, rather than having to wait for the entire collection to be created before applying any logic.
Here's the code:
public void Main()
{
string sourceDir = Dts.Variables["SourceDirectory"].Value.ToString();
int minJobId = (int)Dts.Variables["MinIndexId"].Value;
//Enumerate file collection (using Enumerate Files to allow us to start processing immediately
List<string> activeFiles = new List<string>();
System.Threading.Tasks.Task listTask = System.Threading.Tasks.Task.Factory.StartNew(() =>
{
DirectoryInfo dir = new DirectoryInfo(sourceDir);
foreach (FileInfo f in dir.EnumerateFiles("*.txt"))
{
FileInfo file = f;
string filePath = file.FullName;
string fileName = filePath.Substring(filePath.LastIndexOf("\\") + 1);
int jobId = Convert.ToInt32(fileName.Substring(0, fileName.IndexOf(".txt")));
if (jobId > minJobId)
activeFiles.Add(filePath);
}
});
//Wait here for completion
System.Threading.Tasks.Task.WaitAll(new System.Threading.Tasks.Task[] { listTask });
Dts.Variables["ActiveFilenames"].Value = activeFiles;
Dts.TaskResult = (int)ScriptResults.Success;
}
So, I enumerate the collection, applying my logic as files are discovered and immediately adding the file path to my list for output. Once complete, I then assign this to an SSIS Object variable named ActiveFilenames which I'll use as the collection for my ForEach loop.
I configured the ForEach loop as a ForEach From Variable Enumerator, which now iterates over a much smaller collection (Post-filtered List<string> compared to what I can only assume was an unfiltered List<FileInfo> or something similar in SSIS' built-in ForEach File Enumerator.
So the tasks inside my loop can just be dedicated to processing the data, since it has already been filtered before hitting the loop. Although it doesn't seem to be doing much different to either my initial package or Siva's example, in production (for this particular case, anyway) it seems like filtering the collection and enumerating asynchronously provides a massive boost over using the built in ForEach File Enumerator.
I'm going to continue investigating the ForEach loop container and see if I can replicate this logic in a custom component. If I get this working I'll post a link in the comments.
The best you can do is use FileSpec to specify a mask, as you said. You could include at least some specs in it, like files starting with "201" for 2010, 2011 and 2012. Then, in some other task, you could filter out those you don't want to process (for instance, 2010).

What is the simple way to find the column name from Lineageid in SSIS

What is the simple way to find the column name from Lineageid in SSIS. Is there any system variable avilable?
I remember saying this can't be that hard, I can write some script in the error redirect to lookup the column name from the input collection.
string badColumn = this.ComponentMetaData.InputCollection[Row.ErrorColumn].Name;
What I learned was the failing column isn't in that collection. Well, it is but the ErrorColumn reported is not quite what I needed. I couldn't find that package but here's an example of why I couldn't get what I needed. Hopefully you will have better luck.
This is a simple data flow that will generate an error once it hits the derived column due to division by zero. The Derived column generates a new output column (LookAtMe) as the result of the division. The data viewer on the Error Output tells me the failing column is 73. Using the above script logic, if I attempted to access column 73 in the input collection, it's going to fail because that is not in the collection. LineageID 73 is LookAtMe and LookAtMe is not in my error branch, it's only in the non-error branch.
This is a copy of my XML and you can see, yes, the outputColumn id 73 is LookAtme.
<outputColumn id="73" name="LookAtMe" description="" lineageId="73" precision="0" scale="0" length="0" dataType="i4" codePage="0" sortKeyPosition="0" comparisonFlags="0" specialFlags="0" errorOrTruncationOperation="Computation" errorRowDisposition="RedirectRow" truncationRowDisposition="RedirectRow" externalMetadataColumnId="0" mappedColumnId="0"><properties>
I really wanted that data though and I'm clever so I can union all my results back together and then conditional split it back out to get that. The problem is, Union All is an asynchronous transformation. Async transformations result in the data being copied from one set of butters to another resulting in...new lineage ids being assigned so even with a union all bringing the two streams back together, you wouldn't be able to call up the data flow chain to find that original lineage id because it's in a different buffer.
Around this point, I conceded defeat and decided I could live without intelligent/helpful error reporting in my packages.
I know this is a long dead thread but I tripped across a manual solution to this problem and thought I would share for anyone who happens upon this same problem. Granted this doesn't provide a programmatic solution to the problem but for simple debugging it should do the trick. The solution uses a Derived Column as an example but this seems to work for any Data Flow component.
Answer provided by Todd McDermid and taken from AskSQLServerCentral:
"[...] Unfortunately, the lineage ID of your columns is pretty well hidden inside SSIS. It's the "key" that SSIS uses to identify columns. So, in order to figure out which column it was, you need to open the Advanced Editor of the Derived Column component or Data Conversion. Do that by right clicking and selecting "Advanced Editor". Go to the "Input and Output Properties" tab. Open the first node - "Derived Column Input" or "Data Conversion Input". Open the "Input Columns" tab. Click through the columns, noting the "LineageID" property of each. You may have to do the same with the "Derived Column Output" node, and "Output Columns" inside there. The column that matches your recorded lineage ID is the offending column."
For anyone using SQL Server versions before SS2016, here are a couple of reference links for a way to get the Column name:
http://www.andrewleesmith.co.uk/2017/02/24/finding-the-column-name-of-an-ssis-error-output-error-column-id/
which is based on:
http://toddmcdermid.blogspot.com/2016/04/finding-column-name-for-errorcolumn.html
I appreciate we aren't supposed to just post links, but this solution is quite convoluted, and I've tried to summarise by pulling info from both Todd and Andrew's blog posts and recreating them here. (thank you to both if you ever read this!)
From Todd's page:
Go to the "Inputs and Outputs" page, and select the "Output 0" node.
Change the "SynchronousInputID" property to "None". (This changes
the script from synchronous to asynchronous.)
On the same page, open the "Output 0" node and select the "Output
Columns" folder. Press the "Add Column" button. Change the "Name"
property of this new column to "LineageID".
Press the "Add Column" button again, and change the "DataType"
property to "Unicode string [DT_WSTR]", and change the "Name"
property to "ColumnName".
Go to the "Script" page, and press the "Edit Script" button. Copy
and paste this code into the ScriptMain class (you can delete all
other method stubs):
public override void CreateNewOutputRows() {
IDTSInput100 input = this.ComponentMetaData.InputCollection[0];
if (input != null)
{
IDTSVirtualInput100 vInput = input.GetVirtualInput();
if (vInput != null)
{
foreach (IDTSVirtualInputColumn100 vInputColumn in vInput.VirtualInputColumnCollection)
{
Output0Buffer.AddRow();
Output0Buffer.LineageID = vInputColumn.LineageID;
Output0Buffer.ColumnName = vInputColumn.Name;
}
}
} }
Feel free to attach a dummy output to that script, with a data viewer,
and see what you get. From here, it's "standard engineering" for you
ETL gurus. Simply merge join the error output of the failing
component with this metadata, and you'll be able to transform the
ErrorColumn number into a meaningful column name.
But for those of you that do want to understand what the above script
is doing:
It's getting the "first" (and only) input attached to the script
component.
It's getting the virtual input related to the input. The "input" is
what the script can actually "see" on the input - and since we
didn't mark any columns as being "ReadOnly" or "ReadWrite"... that
means the input has NO columns. However, the "virtual input" has
the complete list of every column that exists, whether or not we've
said we're "using" it.
We then loop over all of the "virtual columns" on this virtual
input, and for each one...
Get the LineageID and column name, and push them out as a new row on
our asynchronous script.
The image and text from Andrew's page helps explain it in a bit more detail:
This map is then merge-joined with the ErrorColumn lineage ID(s)
coming down the error path, so that the error information can be
appended with the column name(s) from the map. I included a second
script component that looks up the error description from the error
code, so the error table rows that we see above contain both column
names and error descriptions.
The remaining component that needs explaining is the conditional split
– this exists just to provide metadata to the script component that
creates the map. I created an expression (1 == 0) that always
evaluates to false for the “No Rows – Metadata Only” path, so no rows
ever travel down it.
Whilst this solution does require the insertion of some additional
plumbing within the data flow, we get extremely valuable information
logged when errors do occur. So especially when the data flow is
running unattended in Production – when we don’t have the tools &
techniques available at design time to figure out what’s going wrong –
the logging that results gives us much more precise information about
what went wrong and why, compared to simply giving us the failed data
and leaving us to figure out why it was rejected.
There is no simple way to find out column name by lineage id.
If you want to know this using BIDS You have to inspect all components inside dataflow using Advanced properties, Input and Output columns tab and see LineageID for each column and input/output path.
But You can:
inspect XML - this is very difficult
write .NET application and use FindColumnByLineageId
However, second solution includes a lot of coding and understanding of pipeline because You have to programmaticaly: open the package, iterate over tasks, iterate inside containers, iterate over transformations inside data flows to find particular component to use proposed method.
Here is a solution that:
Works at package runtime (not pre-populating)
Is automated through a Script Task and Component
Doesn't involve installing new assemblies or custom components
Is nicely BIML compatible
Check out the full solution here.
EDIT
Here is the short version.
Create 2 Object variables, execsObj and lineageIds
Create Script Task in Control flow, give it ReadWrite access to both variables
Add an assembly reference to your script task for Microsoft.SqlServer.DTSPipelineWrap.dll (may required SQL Client SDK be installed; required for the MainPipe object below)
Insert the following code into your Script Task
Dictionary<int, string> lineageIds = null;
public void Main()
{
// Grab the executables so we have to something to iterate over, and initialize our lineageIDs list
// Why the executables? Well, SSIS won't let us store a reference to the Package itself...
Dts.Variables["User::execsObj"].Value = ((Package)Dts.Variables["User::execsObj"].Parent).Executables;
Dts.Variables["User::lineageIds"].Value = new Dictionary<int, string>();
lineageIds = (Dictionary<int, string>)Dts.Variables["User::lineageIds"].Value;
Executables execs = (Executables)Dts.Variables["User::execsObj"].Value;
ReadExecutables(execs);
Dts.TaskResult = (int)ScriptResults.Success;
}
private void ReadExecutables(Executables executables)
{
foreach (Executable pkgExecutable in executables)
{
if (object.ReferenceEquals(pkgExecutable.GetType(), typeof(Microsoft.SqlServer.Dts.Runtime.TaskHost)))
{
TaskHost pkgExecTaskHost = (TaskHost)pkgExecutable;
if (pkgExecTaskHost.CreationName.StartsWith("SSIS.Pipeline"))
{
ProcessDataFlowTask(pkgExecTaskHost);
}
}
else if (object.ReferenceEquals(pkgExecutable.GetType(), typeof(Microsoft.SqlServer.Dts.Runtime.ForEachLoop)))
{
// Recurse into FELCs
ReadExecutables(((ForEachLoop)pkgExecutable).Executables);
}
}
}
private void ProcessDataFlowTask(TaskHost currentDataFlowTask)
{
MainPipe currentDataFlow = (MainPipe)currentDataFlowTask.InnerObject;
foreach (IDTSComponentMetaData100 currentComponent in currentDataFlow.ComponentMetaDataCollection)
{
// Get the inputs in the component.
foreach (IDTSInput100 currentInput in currentComponent.InputCollection)
foreach (IDTSInputColumn100 currentInputColumn in currentInput.InputColumnCollection)
lineageIds.Add(currentInputColumn.ID, currentInputColumn.Name);
// Get the outputs in the component.
foreach (IDTSOutput100 currentOutput in currentComponent.OutputCollection)
foreach (IDTSOutputColumn100 currentoutputColumn in currentOutput.OutputColumnCollection)
lineageIds.Add(currentoutputColumn.ID, currentoutputColumn.Name);
}
}
Create Script Component in Dataflow with ReadOnly access to lineageIds and the following code.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
Dictionary<int, string> lineageIds = (Dictionary<int, string>)Variables.lineageIds;
int? colNum = Row.ErrorColumn;
if (colNum.HasValue && (lineageIds != null))
{
if (lineageIds.ContainsKey(colNum.Value))
Row.ErrorColumnName = lineageIds[colNum.Value];
else
Row.ErrorColumnName = "Row error";
}
Row.ErrorDescription = this.ComponentMetaData.GetErrorDescription(Row.ErrorCode);
}