How to read value from proj.params in the Biml file - ssis

I'm developing an SSIS project in which I use global project parameters. Recently I linked the values of these parameters to VS configurations.
Now I would like to assign the values of these parameters to variables in my Biml code, depending on which configuration is currently active in VS. However, I do not know how to access these parameters through the Biml class hierarchy:
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<#
var s = Dts.Variables[$Project::strFolderPath];
#>
Unfortunately, the expression I tried above is not valid.
Any idea which expression I should use to access the value of the parameter that is currently active?

When indexing into Dts.Variables, you need to pass the parameter reference as a string:
var s = Dts.Variables["$Project::strFolderPath"];
                      ^                       ^

Related

Is it possible to have a module function that modifies parameter values of the caller script?

Motivation
Reduce the maintenance of an Azure DevOps task that invokes a Powershell script with a lot of parameters ("a lot" could be 5).
The idea relies on the fact that Azure DevOps generates environment variables to reflect the build variables. So, I devised the following scheme:
Prefix all non secret Azure DevOps variables with MyBuild.
The task powershell script would call a function to check the script parameters against the MyBuild_ environment variables and would automatically assign the value of the MyBuild_xyz environment variable to the script parameter xyz if the latter has no value.
This way the task command line would only contain secret parameters (which are not reflected in the environment). Often, there are no secret parameters and so the command line remains empty. We find this scheme to reduce the maintenance of the tasks driven by a powershell script.
Example
param(
$DBUser,
[ValidateNotNullOrEmpty()]$DBPassword,
$DBServer,
$Configuration,
$Solutions,
$ClientDB = $env:Build_DefinitionName,
$RawBuildVersion = $env:Build_BuildNumber,
$BuildDefinition = $env:Build_DefinitionName,
$Changeset = $env:Build_SourceVersion,
$OutDir = $env:Build_BinariesDirectory,
$TempDir,
[Switch]$EnforceNoMetadataStoreChanges
)
$ErrorActionPreference = "Stop"
. $PSScriptRoot\AutomationBootstrap.ps1
$AutomationScripts = GetToolPackage DevOpsAutomation
. "$AutomationScripts\vNext\DefaultParameterValueBinding.ps1" $PSCommandPath -Required 'ClientDB' -Props @{
OutDir = @{ DefaultValue = [io.path]::GetFullPath("$PSScriptRoot\..\..\bin") }
TempDir = @{ DefaultValue = 'D:\_gctemp' }
DBUser = @{ DefaultValue = 'SomeUser' }
}
The described parameter binding logic is implemented in the script DefaultParameterValueBinding.ps1 which is published in a NuGet package. The code installs the package and thus gets access to the script.
In the example above, some parameters default to predefined Azure DevOps variables, like $RawBuildVersion = $env:Build_BuildNumber. Others are left uninitialized, like $DBServer, which means they default to the matching MyBuild_ variable, here $env:MyBuild_DBServer.
We can get away without the special function to do the binding, but then the script author would have to write something like this:
$DBServer = $env:MyBuild_DBServer,
$Configuration = $env:MyBuild_Configuration,
$Solutions = $env:MyBuild_Solutions,
I wanted to avoid this, because of the possibility of an accidental name mismatch.
The Problem
The approach does not work when I package the logic of DefaultParameterValueBinding.ps1 into a module function. This is because of the module scope isolation - I just cannot modify the parameters of the caller script.
Is it still possible to do? Is it possible to achieve my goal in a more elegant way? Remember, I want to reduce the cost associated with maintaining the task command line in Azure DevOps.
Right now I am inclined to retreat back to this scheme:
$xyz = $(Resolve-ParameterValue 'xyz' x y z ...)
Where Resolve-ParameterValue would first check $env:MyBuild_xyz and if not found select the first not null value out of x,y,z,...
But if the Resolve-ParameterValue method comes from a module, then the script must assume the module has already been installed, because it has no way to install it before the parameters are evaluated. Or has it?
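The lookup order behind the Resolve-ParameterValue scheme described above can be sketched independently of PowerShell. The following is a minimal Python illustration of that precedence rule only; the function name resolve_parameter_value and its prefix and environ arguments are hypothetical, and it does not address the module-loading question.

```python
import os


def resolve_parameter_value(name, *candidates, prefix="MyBuild_", environ=None):
    """Return the <prefix><name> environment value if set, else the first non-None candidate."""
    env = os.environ if environ is None else environ
    value = env.get(prefix + name)
    if value:  # an unset or empty variable counts as "not found"
        return value
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return None  # nothing supplied anywhere
```

For example, resolve_parameter_value("DBServer", None, "fallback") returns the value of MyBuild_DBServer when that environment variable is set, and "fallback" otherwise.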
EDIT 1
Notice the command line used to invoke the DefaultParameterValueBinding.ps1 script does not contain the caller script parameters! It does include $PSCommandPath, which is used to obtain the PSBoundParameters collection.
Yes, but it will require modifications to both the calling script and the function: pass the parameters by reference. Adam B. has a nice piece on passing parameters by reference here:
https://mcpmag.com/articles/2015/06/04/reference-variables-in-powershell.aspx
Net-net, the following is an example:
$age = 12;
function birthday {
param([ref]$age)
$age.value += 1
}
birthday -age ([ref]$age)
Write-Output $age
I've got an age of 12. I pass it into a function as a parameter. The function increments the value of $age by 1. You can do the same thing with a function in a module. You get my drift.

How do you make the executable path dynamic in DTS Execute ProcessTask?

I have a DTS package that calls an executable via an Execute Process Task object. The path of executable can change based on where the product that this is contained in is installed. Is there some way to make the executable path dynamic?
I tried using an expression for the Executable property. I set it to a value that came out of a stored procedure, but it seems to calculate the value only when you save the package. I tried setting DelayValidation = true, but it never seems to update the value at runtime.
I believe something is amiss with your package. Update your question with concrete details, or compare your package against my sample.
Setup
I create 7 subfolders from my base location and inside each, I placed a batch file
@echo off
REM N replaced with value 0-6
ECHO C:\ssisdata\EXEC\N\RunMe.bat
This led to a structure like
C:\ssisdata\EXEC\0\RunMe.bat
C:\ssisdata\EXEC\1\RunMe.bat
C:\ssisdata\EXEC\2\RunMe.bat
C:\ssisdata\EXEC\3\RunMe.bat
C:\ssisdata\EXEC\4\RunMe.bat
C:\ssisdata\EXEC\5\RunMe.bat
C:\ssisdata\EXEC\6\RunMe.bat
When I run them, each simply reports back its hard-coded location message.
SSIS
I created an SSIS package that had a For Loop Container and inside was an Execute Process Task coupled to a Script Task
Variables
FolderBase: string - C:\ssisdata\EXEC Abstracts away the common path
FolderChoice: int - 0 Monotonically increasing value from 0 to 6. Used by the loop to force a change in the location of the executable
Output: string - `` Captures the output from the executable to prove it works as expected
CurrentExecutable: string - C:\ssisdata\EXEC\0\RunMe.bat This is an Expression based on the above variables. Expression is @[User::FolderBase] + "\\" + (DT_WSTR, 1) @[User::FolderChoice] + "\\RunMe.bat"
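To make the effect of the CurrentExecutable expression concrete, here is a small Python sketch (not SSIS code) that mirrors the same concatenation for each loop value. The function name current_executable is hypothetical.

```python
def current_executable(folder_base, folder_choice):
    # Mirrors: @[User::FolderBase] + "\\" + (DT_WSTR, 1) @[User::FolderChoice] + "\\RunMe.bat"
    return folder_base + "\\" + str(folder_choice) + "\\RunMe.bat"


# One path per value of FolderChoice (0 through 6), as the For Loop would produce.
paths = [current_executable(r"C:\ssisdata\EXEC", n) for n in range(7)]
for path in paths:
    print(path)
```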
Execute Process Task
I did nothing of interest here. I route standard out to an SSIS Variable and I actually used C:\ssisdata\Exec\RunMe.bat as my source but the next step updated this screenshot.
On the Expressions tab, I used my Variable @[User::CurrentExecutable] and assigned it to the Executable property.
Script Task
I passed in my @[User::Output] variable and called Dts.Events.FireInformation to make the output show up.

Execute different SSIS packages based on Flat File name

I have several SSIS packages that parse different files into their respective tables. Using SSIS File Watcher task, I want to create a master package that will check a folder for the files and then select the proper sub package based on the OutputVariableName using a precedence constraint. I'm not sure how to write the expression to set the variable correctly. I want to set it based on a FINDSTRING() in the file name.
Your question is about how to use the File Watcher Task from Konesans in an Integration Services package.
OutputVariableName (String): The name of the variable into which the full file path found will be written on completion of the task. The variable specified should be of type string.
Via http://www.sqlis.com/post/file-watcher-task.aspx
Create an SSIS Variable at the Package level, called CurrentFileName. Configure the File Watcher Task such that the property OutputVariableName is User::CurrentFileName. When you drop a file into the folder that the Task is watching, it will assign the full path to that variable CurrentFileName.
Your desire is to do something with that Variable with FindString function to help determine the package to fire. Since you don't specify the something I'm going to assume it's based on file name.
It's a pity you're forcing a person to use FindString to perform this task. I say that because the .NET library offers an excellent static method to determine base file names and I hate seeing people re-invent (and debug) the wheel.
I would create 3 Variables to support this endeavor.
BaseFileName - String - Evaluate as Expression = True
FileExtensionPosition - Int32 - Evaluate as Expression = True
LastSlashPosition - Int32 - Evaluate as Expression = True
LastSlashPosition uses FindString to determine the last occurrence of \ in a string. FileExtensionPosition determines the last occurrence of . in a string. BaseFileName uses these numbers to calculate where to slice the @[User::CurrentFileName] string.
The lazy trick I use for finding the last X in a string is to reverse it. It's then the first element in the string and I can pass 1 as the final parameter to FindString.
The expression assigned to @[User::LastSlashPosition] is
FINDSTRING(REVERSE(@[User::CurrentFileName]), "\\", 1)
The expression assigned to @[User::FileExtensionPosition] is
FINDSTRING(REVERSE(@[User::CurrentFileName]), ".", 1)
The expression assigned to @[User::BaseFileName] then becomes
SUBSTRING((RIGHT(@[User::CurrentFileName], @[User::LastSlashPosition] -1 )), 1, LEN(RIGHT( @[User::CurrentFileName], @[User::LastSlashPosition] -1)) - @[User::FileExtensionPosition] )
Breaking that down, (RIGHT(@[User::CurrentFileName], @[User::LastSlashPosition] -1 )) yields the file name with its extension. If CurrentFileName equals J:\ssisdata\so\FileWatcher\CleverFile.txt, then that evaluates to "CleverFile.txt". I then substring that to strip off the extension. You could reduce this to a single substring operation, but my brain hurts this late.
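The three expressions can be traced step by step outside SSIS. The following Python sketch re-implements the same reverse-and-find arithmetic; the helper names findstring and base_file_name are hypothetical, and findstring only approximates SSIS FINDSTRING (1-based position, 0 when not found). The last line cross-checks against the standard library's ntpath helpers, the Python counterpart of the library method alluded to earlier.

```python
import ntpath


def findstring(text, search, occurrence=1):
    """1-based position of the nth occurrence of search in text, or 0 if absent."""
    pos = -1
    for _ in range(occurrence):
        pos = text.find(search, pos + 1)
        if pos == -1:
            return 0
    return pos + 1


def base_file_name(current_file_name):
    reversed_name = current_file_name[::-1]
    last_slash_position = findstring(reversed_name, "\\", 1)
    file_extension_position = findstring(reversed_name, ".", 1)
    # RIGHT(name, LastSlashPosition - 1): the file name including its extension
    file_name = current_file_name[len(current_file_name) - (last_slash_position - 1):]
    # SUBSTRING(file_name, 1, LEN(file_name) - FileExtensionPosition)
    return file_name[: len(file_name) - file_extension_position]


path = r"J:\ssisdata\so\FileWatcher\CleverFile.txt"
print(base_file_name(path))                         # CleverFile
print(ntpath.splitext(ntpath.basename(path))[0])    # CleverFile
```

For the sample path, LastSlashPosition is 15 and FileExtensionPosition is 4, so the substring keeps 14 - 4 = 10 characters.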
Now you are trying to use, I assume, a series of Execute Package Tasks, each with a Precedence Constraint of Success and @[User::BaseFileName] == "SubPackage1". You can do this and it'll work fine; it's just that you'll have to set each one up, and every time you add a new child package you'll have to repeat the work.
As an alternative to this approach, I'd use a ForEach enumerator and define all my key value pairs in there.
Column 0 is the value that BaseFileName could evaluate to.
Column 1 is the SSIS package I want to fire off in response.
I created two variables to support this and configured them as such
My resulting package looks like
Not shown is the File Watcher Task because I have no desire to install that on my machine. Assume that connects to the ForEach Loop.
The ForEach loop spins through the key value pairs shown above. The Script Task inside the container is there just to provide a base for the Precedence Constraint to work. I configure the precedence constraint as Success and @[User::KeyName] == @[User::BaseFileName]
I then simulate the Execute Package Task firing based on that. The actual package name would be driven by the value of @[User::KeyPackage]
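Conceptually, the loop plus precedence constraint is a lookup from base file name to package name. A rough Python sketch of that dispatch follows; the key/value pairs and package names are made up for illustration and are not taken from the original package.

```python
# Hypothetical rows standing in for the ForEach enumerator's key/value pairs:
# column 0 is a possible BaseFileName, column 1 is the package to fire.
KEY_VALUE_PAIRS = [
    ("CleverFile", "LoadCleverFile.dtsx"),
    ("SubPackage1", "SubPackage1.dtsx"),
]


def packages_to_fire(base_file_name):
    # Mirrors looping over every row and applying the constraint
    # @[User::KeyName] == @[User::BaseFileName] before running KeyPackage.
    return [
        key_package
        for key_name, key_package in KEY_VALUE_PAIRS
        if key_name == base_file_name
    ]


print(packages_to_fire("CleverFile"))
```

Adding a new child package then means adding one row, rather than wiring up another task and constraint.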
And that's your over-engineered solution for the day ;)

How can I set an expression to the FileSpec property on Foreach File enumerator?

I'm trying to create an SSIS package to process files from a directory that contains many years worth of files. The files are all named numerically, so to save processing everything, I want to pass SSIS a minimum number, and only enumerate files whose name (converted to a number) is higher than my minimum.
I've tried letting the ForEach File loop enumerate everything and then exclude files in a Script Task, but when dealing with hundreds of thousands of files, this is way too slow to be suitable.
The FileSpec property lets you specify a file mask to dictate which files you want in the collection, but I can't quite see how to specify an expression to make that work, as it's essentially a string match.
If there's an expression within the component somewhere which basically says Should I Enumerate? - Yes / No, that would be perfect. I've been experimenting with the below expression, but can't find a property to which to apply it.
(DT_I4)REPLACE( SUBSTRING(@[User::ActiveFilePath],FINDSTRING( @[User::ActiveFilePath], "\\", 7 ) + 1 ,100),".txt","") > @[User::MinIndexId] ? "True" : "False"
Here is one way you can achieve this. You could use Expression Task combined with Foreach Loop Container to match the numerical values of the file names. Here is an example that illustrates how to do this. The sample uses SSIS 2012.
This may not be very efficient but it is one way of doing this.
Let's assume there is a folder with a bunch of files named in the format YYYYMMDD. The folder contains files for the first day of every month since 1921, like 19210101, 19210201, 19210301, and so on, all the way up to the current month, 20121101. That adds up to 1,103 files.
Let's say the requirement is only to loop through the files that were created since June 1948. That would mean the SSIS package has to loop through only the files greater than 19480601.
On the SSIS package, create the following three parameters. It is better to use parameters for these because the values are configurable across environments.
ExtensionToMatch - This parameter of String data type will contain the extension that the package has to loop through. It supplies the value for the FileSpec variable that will be used on the Foreach Loop container.
FolderToEnumerate - This parameter of String data type will store the folder path that contains the files to loop through.
MinIndexId - This parameter of Int32 data type will contain the minimum numerical value above which the files should match the pattern.
Create the following four variables that will help us loop through the files.
ActiveFilePath - This variable of String data type will hold the file name as the Foreach Loop container loops through each file in the folder. This variable is used in the expression of another variable. To avoid error, set it to a non-empty value, say 1.
FileCount - This is a dummy variable of Int32 data type that this sample uses to illustrate the number of files that the Foreach Loop container will loop through.
FileSpec - This variable of String data type will hold the file pattern to loop through. Set the expression of this variable to the value below. This expression will use the extension specified in the parameters. If no extension is specified, it will use *.* to loop through all files.
"*" + (@[$Package::ExtensionToMatch] == "" ? ".*" : @[$Package::ExtensionToMatch])
ProcessThisFile - This variable of Boolean data type will evaluate whether a particular file matches the criteria or not.
Configure the package as shown below. Foreach loop container will loop through all the files matching the pattern specified on the FileSpec variable. An expression specified on the Expression Task will evaluate during runtime and will populate the variable ProcessThisFile. The variable will then be used on the Precedence constraint to determine whether to process the file or not.
The script task within the Foreach loop container will increment the counter of variable FileCount by 1 for each file that successfully matches the expression.
The script task outside the Foreach loop will simply display how many files were looped through by the Foreach loop container.
Configure the Foreach loop container to loop through the folder using the parameter and the files using the variable.
Store the file name in variable ActiveFilePath as the loop passes through each file.
On the Expression task, set the expression to the following value. The expression will convert the file name without the extension to a number and then check whether it is greater than the number given in the parameter MinIndexId.
@[User::ProcessThisFile] = (DT_BOOL)((DT_I4)(REPLACE(@[User::ActiveFilePath], @[User::FileSpec] ,"")) > @[$Package::MinIndexId] ? 1: 0)
Right-click on the Precedence constraint and configure it to use the variable ProcessThisFile on the expression. This tells the package to process the file only if it matches the condition set on the expression task.
@[User::ProcessThisFile]
On the first script task, I have the variable User::FileCount set in ReadWriteVariables and the following C# code within the script task. This increments the counter for each file that successfully matches the condition.
public void Main()
{
Dts.Variables["User::FileCount"].Value = Convert.ToInt32(Dts.Variables["User::FileCount"].Value) + 1;
Dts.TaskResult = (int)ScriptResults.Success;
}
On the second script task, I have the variable User::FileCount set to the ReadOnlyVariables and the following C# code within the script task. This simply outputs the total number of files that were processed.
public void Main()
{
MessageBox.Show(String.Format("Total files looped through: {0}", Dts.Variables["User::FileCount"].Value));
Dts.TaskResult = (int)ScriptResults.Success;
}
When the package is executed with MinIndexId set to 19480601 (excluding this), it outputs the value 773.
When the package is executed with MinIndexId set to 20111201 (excluding this), it outputs the value 11.
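These counts can be checked with a little arithmetic outside SSIS. The Python sketch below generates the 1,103 monthly file names described earlier as integers and counts those strictly greater than each MinIndexId; the function names are hypothetical and the script only models the comparison, not the package itself.

```python
def monthly_file_ids(first_year=1921, last_year=2012, last_month=11):
    """Return YYYYMM01 integers for the first day of every month in the range."""
    ids = []
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            if year == last_year and month > last_month:
                break
            ids.append(year * 10000 + month * 100 + 1)
    return ids


def count_above(ids, min_index_id):
    # Same strict comparison as the Expression Task: file number > MinIndexId
    return sum(1 for file_id in ids if file_id > min_index_id)


ids = monthly_file_ids()
print(len(ids))                    # 1103
print(count_above(ids, 19480601))  # 773
print(count_above(ids, 20111201))  # 11
```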
Hope that helps.
From investigating how the ForEach loop works in SSIS (with a view to creating my own to solve the issue) it seems that the way it works (as far as I could see anyway) is to enumerate the file collection first, before any mask is specified. It's hard to tell exactly what's going on without seeing the underlying code for the ForEach loop but it seems to be doing it this way, resulting in slow performance when dealing with over 100k files.
While @Siva's solution is fantastically detailed and definitely an improvement over my initial approach, it is essentially just the same process, except using an Expression Task to test the filename, rather than a Script Task (this does seem to offer some improvement).
So, I decided to take a totally different approach and rather than use a file-based ForEach loop, enumerate the collection myself in a Script Task, apply my filtering logic, and then iterate over the remaining results. This is what I did:
In my Script Task, I use the DirectoryInfo.EnumerateFiles method, which is the recommended approach for large file collections because it streams results lazily, rather than having to wait for the entire collection to be created before applying any logic.
Here's the code:
public void Main()
{
string sourceDir = Dts.Variables["SourceDirectory"].Value.ToString();
int minJobId = (int)Dts.Variables["MinIndexId"].Value;
//Enumerate file collection (using EnumerateFiles to allow us to start processing immediately)
List<string> activeFiles = new List<string>();
System.Threading.Tasks.Task listTask = System.Threading.Tasks.Task.Factory.StartNew(() =>
{
DirectoryInfo dir = new DirectoryInfo(sourceDir);
foreach (FileInfo f in dir.EnumerateFiles("*.txt"))
{
FileInfo file = f;
string filePath = file.FullName;
string fileName = filePath.Substring(filePath.LastIndexOf("\\") + 1);
int jobId = Convert.ToInt32(fileName.Substring(0, fileName.IndexOf(".txt")));
if (jobId > minJobId)
activeFiles.Add(filePath);
}
});
//Wait here for completion
System.Threading.Tasks.Task.WaitAll(new System.Threading.Tasks.Task[] { listTask });
Dts.Variables["ActiveFilenames"].Value = activeFiles;
Dts.TaskResult = (int)ScriptResults.Success;
}
So, I enumerate the collection, applying my logic as files are discovered and immediately adding the file path to my list for output. Once complete, I then assign this to an SSIS Object variable named ActiveFilenames which I'll use as the collection for my ForEach loop.
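The same filter-while-enumerating idea can be sketched in Python for comparison. This is not a translation of the Script Task above: the function name active_files is hypothetical, and unlike the C# code it skips files whose names are not purely numeric instead of raising an error.

```python
import os


def active_files(source_dir, min_index_id, extension=".txt"):
    """Lazily yield full paths of <number><extension> files whose number exceeds min_index_id."""
    # os.scandir streams directory entries instead of building the full list first.
    with os.scandir(source_dir) as entries:
        for entry in entries:
            stem, ext = os.path.splitext(entry.name)
            if not entry.is_file() or ext.lower() != extension:
                continue
            if stem.isascii() and stem.isdigit() and int(stem) > min_index_id:
                yield entry.path
```

Calling list(active_files(folder, 19480601)) would then give the pre-filtered collection to iterate over, playing the role of the ActiveFilenames variable.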
I configured the ForEach loop as a ForEach From Variable Enumerator, which now iterates over a much smaller collection (a post-filtered List<string>, compared to what I can only assume was an unfiltered List<FileInfo> or something similar in SSIS' built-in ForEach File Enumerator).
So the tasks inside my loop can just be dedicated to processing the data, since it has already been filtered before hitting the loop. Although it doesn't seem to be doing much different to either my initial package or Siva's example, in production (for this particular case, anyway) it seems like filtering the collection and enumerating asynchronously provides a massive boost over using the built in ForEach File Enumerator.
I'm going to continue investigating the ForEach loop container and see if I can replicate this logic in a custom component. If I get this working I'll post a link in the comments.
The best you can do is use FileSpec to specify a mask, as you said. You could include at least some specs in it, like files starting with "201" for 2010, 2011 and 2012. Then, in some other task, you could filter out those you don't want to process (for instance, 2010).
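As a rough illustration of this two-stage idea, the Python sketch below applies a coarse wildcard mask first and then a finer filter. The file names are made up, and fnmatch only approximates how FileSpec masks behave.

```python
import fnmatch

# Hypothetical file names for illustration.
names = ["20091201.txt", "20100101.txt", "20110101.txt", "20120101.txt"]

# Stage 1: coarse mask, analogous to setting FileSpec to 201*
masked = fnmatch.filter(names, "201*")

# Stage 2: a later task drops what the mask could not exclude (here, 2010)
wanted = [name for name in masked if not name.startswith("2010")]

print(wanted)
```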

Accessing the Body of a Function with Lua

I'm going back to the basics here but in Lua, you can define a table like so:
myTable = {}
myTable [1] = 12
Printing the table reference itself brings back a pointer to it. To access its elements you need to specify an index (i.e. exactly like you would an array)
print(myTable ) --prints pointer
print(myTable[1]) --prints 12
Now functions are a different story. You can define and print a function like so:
myFunc = function() local x = 14 end --Defined function
print(myFunc) --Printed pointer to function
Is there a way to access the body of a defined function? I am trying to put together a small code visualizer and would like to 'seed' a given function with special functions/variables so that a visualizer can 'hook' itself into the code. To do that, I would need to be able to redefine the function either from a variable or from a string.
There is no way to get the source code of a function's body in plain Lua. The source code is thrown away after compilation to bytecode.
Note, by the way, that a function may be defined at run time with a loadstring-like facility.
Partial solutions are possible — depending on what you actually want to achieve.
You may get the source code position from the debug library, if the debug library is enabled and debug symbols are not stripped from the bytecode. After that you may load the actual source file and extract the code from there.
You may manually decorate the functions you're interested in with the required metadata. Note that functions in Lua are valid table keys, so you may create a function-to-metadata table. You would want to make this table weak-keyed, so that it does not prevent functions from being collected by the GC.
If you need a solution for analyzing Lua code, take a look at Metalua.
Check out Lua Introspective Facilities in the debugging library.
The main introspective function in the debug library is the debug.getinfo function. Its first parameter may be a function or a stack level. When you call debug.getinfo(foo) for some function foo, you get a table with some data about that function. The table may have the following fields:
The field you would want is func I think.
Using the debug library is your only bet. Using that, you can get either the string (if the function is defined in a chunk that was loaded with 'loadstring') or the name of the file in which the function was defined; together with the line-numbers at which the function definition starts and ends. See the documentation.
Here at my current job we have patched Lua so that it even gives you the column numbers for the start and end of the function, so you can get the function source using that. The patch is not very difficult to reproduce, but I don't think I'll be allowed to post it here :-(
You could accomplish this by creating an environment for each function (see setfenv, which is available in Lua 5.1) and using global (versus local) variables. Variables created in the function would then appear in the environment table after the function is executed.
env = {}
myFunc = function() x = 14 end
setfenv(myFunc, env)
myFunc()
print(myFunc) -- prints pointer
print(env.x) -- prints 14
Alternatively, you could make use of the Debug Library:
> myFunc = function() local x = 14 ; debug.debug() end
> myFunc()
> lua_debug> _, x = debug.getlocal(3, 1)
> lua_debug> print(x) -- prints 14
It would probably be more useful to you to retrieve the local variables with a hook function instead of explicitly entering debug mode (i.e. adding the debug.debug() call).
There is also a Debug Interface in the Lua C API.