What is the SSIS order of data transformation component method calls?

I am working on a custom data transformation component. I'm using NUnit and NMock2 to test as I code. Testing and getting the custom UI and other features right is a huge pain, in part because I can't find any documentation about the order in which SSIS invokes methods on the component at design time as well as runtime.
I can correct the issues readily enough, but it's tedious and time-consuming to unregister the old version, register the new version, fire up the test SSIS package, try to display the UI, get an obscure error message, backtrace it, modify the component and continue.
One of the big issues involves the UI component needing access to the ComponentMetaData and BufferManager properties of the component at design time, and what I need to provide to support properties that won't be initialized until after the user enters them in the UI.
I can work through it; but if someone knows of some docs or tips that would speed me up, I'd greatly appreciate it. The samples I've found haven't been much use; they seem to be directed to showing off cool stuff (Twitter, weather.com) rather than actual work.
Thanks in advance.

Here's a timeline of the run-time execution sequence: Run-time Methods of a Data Flow Component
The design-time sequence isn't laid out in MSDN that nicely because there just isn't such a sequence, but here's what I think/know:
1. ProvideComponentProperties - called ONCE EVER when the component is dropped on the design surface.
2. PerformUpgrade - called ONLY if the metadata version is different than the version attribute on the class - called on package load.
3. Validate - called FREQUENTLY... during package load, input attachment, entry into editor, etc...
4. ReinitializeMetaData - called infrequently, and only because a VS_NEEDSNEWMETADATA value is returned from Validate.
Every other override (OnInputAttached, etc.) is fairly straightforward as to when it gets called. Here's the not-so-descriptive article: Design-time Methods of a Data Flow Component.
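To make the ordering above easier to reason about (and to unit test against), here is a toy model in Python. It is not the SSIS API: a real component would be C# deriving from PipelineComponent. The class `ToyComponent`, the `designer_*` functions and the `metadata_stale` flag are hypothetical stand-ins, and the re-validation after ReinitializeMetaData is an assumption rather than documented behaviour.

```python
# Toy model of the design-time call order described above.
# Nothing here is the real SSIS API; the names only mirror the method names.

VS_ISVALID = "VS_ISVALID"
VS_NEEDSNEWMETADATA = "VS_NEEDSNEWMETADATA"


class ToyComponent:
    CURRENT_VERSION = 2  # stands in for the version attribute on the class

    def __init__(self):
        self.calls = []              # records the order of invocations
        self.metadata_version = None
        self.metadata_stale = False  # hypothetical flag: metadata out of sync

    def provide_component_properties(self):
        self.calls.append("ProvideComponentProperties")
        self.metadata_version = self.CURRENT_VERSION

    def perform_upgrade(self):
        self.calls.append("PerformUpgrade")
        self.metadata_version = self.CURRENT_VERSION

    def validate(self):
        self.calls.append("Validate")
        return VS_NEEDSNEWMETADATA if self.metadata_stale else VS_ISVALID

    def reinitialize_meta_data(self):
        self.calls.append("ReinitializeMetaData")
        self.metadata_stale = False


def designer_drop(component):
    # Step 1: happens once, when the component is dropped on the surface.
    component.provide_component_properties()


def designer_validate(component):
    # Steps 3 and 4: Validate runs often; ReinitializeMetaData only runs
    # when Validate asks for new metadata.
    if component.validate() == VS_NEEDSNEWMETADATA:
        component.reinitialize_meta_data()
        component.validate()  # assumption: the designer validates again


def designer_load(component, saved_version):
    # Step 2: PerformUpgrade only runs when the saved version differs.
    component.metadata_version = saved_version
    if saved_version != component.CURRENT_VERSION:
        component.perform_upgrade()
    designer_validate(component)
```

Recording the calls in a list like this is also a cheap way to assert on ordering from NUnit-style tests without registering the component each time.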

Related

SSIS Package error--Column data type DT_R8 is not supported by the PipelineBuffer class

"Column data type DT_R8 is not supported by the PipelineBuffer class."
This error was logged when I ran the whole container.
Debugging the Script Component works fine, and running a single task also works fine, but when I run the whole package I get this error. Thanks in advance to anyone who can help :)
I ran into this issue today. While my situation seems to be slightly different, the solution may still be applicable. Let me describe my situation first.
I have a rather complex package which has been in use for a long time. It stopped working after I made some changes to it today. The package still worked inside Visual Studio (2017), but gave a few different types of errors when run on the server as an agent job. The errors were:
Column data type DT_R8 is not supported by PipelineBuffer class.
System.IndexOutOfRangeException: Index was outside the bounds of the array.
The solution, which worked on two of my packages, was to recreate the Script Component. By recreating, I mean creating a brand new Script Component, as opposed to simply copying and pasting the entire old Script Component. Then manually transfer the code, starting from "public class ScriptMain : UserComponent" all the way to the end.

Restructuring to avoid accessing components in models

Continuing to work on my port of a CakePHP 1.3 app to 3.0, and have run into another issue. I have a number of areas where functionality varies based on certain settings, and I have previously used a modular component approach. For example, Leagues can have round-robin, ladder or tournament scheduling. This impacts on the scheduling algorithm itself, such that there are different settings required to configure each type, but also dictates the way standings are rendered, ties are broken, etc. (This is just one of 10 areas where I have something similar, though not all of these suffer from the problem below.)
My solution to this in the past was to create a LeagueComponent with a base implementation, and then extend that class as LeagueRoundRobinComponent, LeagueLadderComponent and LeagueTournamentComponent. When controllers need to do anything algorithm-specific, they check the schedule_type field in the leagues table, create the appropriate component, and call functions in it. This still works just fine.
I mentioned that this also affects views. The old solution for this was to pass the league component object from the controller to the view via $this->set. The view can then query it for various functionality. This is admittedly a bit kludgy, but the obvious alternative seems to be extracting all the info the view might require and setting it all individually, which doesn't seem to me to be a lot better. If there's a better option, I'm open to it, but I'm not overly concerned about this at the moment.
The problem I've encountered is when tables need to get some of that component info. The issue at hand is when I am saving my add/edit form and need to deal with the custom settings. In order to be as flexible as possible for the future, I don't have all of these possible setting fields represented in the database, but rather serialize them into a single "custom" column. (Reading this all works quite nicely with a custom constructor and getters.) I had previously done this by loading the component from the beforeSave function in the League model, calling the function that returns the list of schedule-specific settings, extracting those values and serializing them. But with the changes to component access in 3.0, it seems I can no longer create the component in my new beforeMarshal function.
I suppose the controller could "pass" the component to the table by setting it as a property, but that feels like a major kludge, and there must be a better way. It doesn't seem like extending the table class is a good solution, because that would horribly complicate associations. I don't think that custom types are the solution, as I don't see how they'd access a component either. I'm leaning towards passing just the list of fields from the controller to the model, which is more of a "configuration" method. Speaking of configuration, I suppose it could all just go into the central Configure data store, but that's always felt to me like somewhere that you only put "small" data. I'm wondering if there's a better design pattern I could follow that would let the table continue to take care of these implementation details on its own without the controller needing to get involved. If at some point I decide to change from the serialized method to adding all of the possible columns, it would be nice to have those changes restricted to the table class.
Oh, and keep in mind that this list of custom settings is needed in both a view and the table, so whatever solution is proposed will ideally provide a way for both of them to access it, rather than requiring duplication of code.

Word-VBA Functions "Method Saveas2 of object failed"

I have an Access VBA application that also makes use of Word VBA. While running the application on my local machine, it functions well. Once it is moved to other machines (same versions of Access and Word), it crashes when it reaches the Word VBA portion. Commands such as document.open or .saveas2 fail, for example: Method 'SaveAs2' of object failed.
I've also noticed that the libraries I've referenced in the application are required on every end user's machine. I'm used to just compiling with the libraries, after which they are always included in the .jar/.exe/etc., but it seems that when you move the application to others' computers it always tries to recompile?
I'm not well versed in VBA, so I'm speculating that my failing Word VBA functions are caused by a referencing error. Any other ideas?
The "libraries" that VBA can reference are actually COM objects, usually packaged as DLL files. They are objects which are dynamically instantiated at runtime (if they aren't already) when requested. They are loaded by Windows into memory and your program uses the COM standard to interact with them, calling methods and getting or setting properties (interprocess communication). There are generally two ways of interacting with them: early binding and late binding.
With early binding, you add a reference to the library while you are still writing code, which allows the VBA IDE to provide autocompletion and some compile-time error checking. You instantiate objects with the "New" keyword and by directly typing the object name. However, early binding requires that you select a specific DLL and possibly a specific version of the interface. This can lead to issues if you reference a specific interface version which one of your users doesn't have.
With late binding, you instantiate objects using CreateObject or GetObject, requesting them by name from Windows. Windows will look the name up and return a reference to the object. The variables in your code are simply objects and calling methods is a bit dangerous because the compiler allows you to type in whatever method name you want and provides no compile-time warnings. This has the advantage that as long as you are calling well established methods and nothing new or deprecated, the code will work regardless of the user's version.
As for the error you are getting, you may want to check the version of Office on the user machines - SaveAs2 was added in Office 2010.
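As a rough illustration of the late-binding idea and the version caveat above, here is a small Python analogy. It is not VBA and does not touch Word: in VBA the late-bound object would come from something like CreateObject("Word.Application"). The two document classes and `save_late_bound` are hypothetical stand-ins that only mimic an older object model without SaveAs2 and a newer one with it.

```python
class OldWordDocument:
    """Hypothetical stand-in for an object model that predates SaveAs2."""

    def SaveAs(self, path):
        return "SaveAs:" + path


class NewWordDocument(OldWordDocument):
    """Hypothetical stand-in for an object model that also has SaveAs2."""

    def SaveAs2(self, path):
        return "SaveAs2:" + path


def save_late_bound(document, path):
    # Late binding: the method is looked up by name at run time, so nothing
    # checks at "compile time" that the object really has SaveAs2.
    method = getattr(document, "SaveAs2", None)
    if method is None:
        # Fall back to the long-established method on older versions.
        method = document.SaveAs
    return method(path)


print(save_late_bound(NewWordDocument(), "report.docx"))
print(save_late_bound(OldWordDocument(), "report.docx"))
```

The point of the sketch is only that a late-bound call succeeds or fails depending on what the object actually supports on that machine, which is why sticking to well-established methods is the safer choice.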

How can I check for downstream components in an SSIS custom transform?

I am working on a custom SSIS component that has 4 asynchronous outputs. It works just fine, but now I have a user request for an enhancement and I am not sure how to handle it. They want to use the component in another context where only 2 of the 4 outputs will be well defined. I foolishly said that this would be trivial for me to support: I planned to just check whether the two "undefined" streams were even connected, and if not, skip that part of the processing.
My problem is that I cannot figure out whether an output is connected at run time. I had hoped that the output pipeline or output buffer would be missing, but that doesn't seem to be the case; even when they are not hooked up, the output and buffer are present.
Does anyone know where I should be looking to see if an output has a downstream consumer or not?
Thanks!
Edit: I was never able to figure out how to do this reliably, so I ended up making this behaviour configurable by the user. It is not automatic like I would have hoped but the difference I found between the BIDS environment and the DTExec environment pushed me to the conclusion that a component probably should not be making assumptions about the component graph it is embedded in.

Deterministic initialization and dependency injection (constructor based)

My demo application I'm working on has a very long startup routine. The application I'm trying to replace with the new ideas logs a lot to the console during that (imagine: "now loading data... reticulating splines... login to third party service...").
After spending the whole day learning DI basically from scratch, I create the whole (!) object graph now with a single call to the container. Thank you, everybody here, btw, for providing so many ideas and amazing answers. This community rocks.
But now, what I want to do is to make initialization deterministic again, so I can log in my workflow (I'm using Workflow Foundation 4.0, because I like the declarative style and the fact that I can show people in graphics what happens) when I load data, reticulate splines and all that.
Do you think it would be an acceptable practice to have a "StartupManager" - class (the only singleton in my architecture now, I killed every other "instance getter"!) that will call secondary initialization methods on the objects it got injected (I used buildUp() and property based DI here)?
Reason is that I want to explicitly call the long initialization methods in my workflow activities. Looks amazing in the editor, my boss will be very happy when I present that (he didn't ask for it, it was my idea to spend the weekend doing something, also I think it is a lot of fun).
I assume you're creating your own DI framework for fun and to learn, right? Otherwise just use an existing one.
No :) You shouldn't have anything static. Your algorithm could look like this:
1. Create an instance of your DI builder.
2. Feed that instance with the dependency definitions (from a file or programmatically).
3. Call your buildUp on that configured builder. This method should return an instance of a context.
4. On the context you call give_me_object_x and you should get an object x filled with all its dependencies.
Or just look at how Spring is built; it's a very good example of a well-written DI framework.
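The builder/context steps above could be sketched roughly as follows. This is a minimal Python illustration, not any particular framework's API: `Builder`, `Context`, `register`, `build_up`, `get` and the `DataLoader` example class are all hypothetical names, and the sketch does not handle circular dependencies.

```python
class Context:
    """Hands out fully wired objects; holds no static state."""

    def __init__(self, definitions):
        self._definitions = definitions
        self._instances = {}

    def get(self, name):
        # Build each object once, on first request, resolving its
        # dependencies through the same context.
        if name not in self._instances:
            factory = self._definitions[name]
            self._instances[name] = factory(self)
        return self._instances[name]


class Builder:
    """Collects dependency definitions, then produces a Context."""

    def __init__(self):
        self._definitions = {}

    def register(self, name, factory):
        # factory is a callable taking the context and returning the object
        self._definitions[name] = factory
        return self

    def build_up(self):
        return Context(dict(self._definitions))


class DataLoader:
    """Example dependency with an explicit, loggable initialization step."""

    def __init__(self, log):
        self.log = log

    def initialize(self):
        self.log.append("now loading data...")


builder = Builder()
builder.register("log", lambda ctx: [])
builder.register("loader", lambda ctx: DataLoader(ctx.get("log")))

context = builder.build_up()
loader = context.get("loader")
loader.initialize()  # called explicitly, e.g. from a workflow activity
```

Keeping construction (the context) separate from the slow `initialize` call is one way to get the deterministic, loggable startup the question asks for without a static singleton.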