I'd like to pass a Data Flow from one package to another for the following reasons:
It would help in refactoring common logic in SSIS packages.
It would enable concurrent development of larger SSIS packages.
At first glance, the Execute Package Task sounded promising, but it looks like I can only pass fairly simple variables in and out of the package.
Is there a way to do this using SSIS?
CozyRoc (cozyroc.com) is a third-party tool that can do this, I believe.
A bit of clarity, Paul: are you talking about 1) code reusability, or 2) allowing the results from one DFT to be used in another DFT?
The first you can't do in "native" SSIS, I believe - there is no set of DFT modules that you can call from other packages. But I would approach it by building a set of packages that are quite simple:
initialisation routines
DFT
cleanup
Then have variables passed to the child package for (e.g.) the table to be processed and the column(s) to be selected from the source table.
It would require a very clever schema and some clever thinking about what the common DFT would do, but I think it would be possible.
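For the plumbing side, here is a rough sketch of driving such a reusable child package from code via the SSIS runtime API. The package path and the variable names (User::TableName, User::ColumnList) are hypothetical placeholders; in the designer you would wire up the same thing with an Execute Package Task plus Parent Package Variable configurations or, in 2012+, package parameters.

```csharp
// Rough sketch: driving a reusable child package from code via the SSIS
// runtime API. Requires a reference to Microsoft.SqlServer.ManagedDTS.
// The package path and variable names are hypothetical placeholders.
using Microsoft.SqlServer.Dts.Runtime;

class RunChildPackage
{
    static void Main()
    {
        var app = new Application();
        Package child = app.LoadPackage(@"C:\etl\CommonDft.dtsx", null);

        // The child package reads these variables to decide what to process.
        child.Variables["User::TableName"].Value = "dbo.Customer";
        child.Variables["User::ColumnList"].Value = "CustomerId, Name, ModifiedDate";

        DTSExecResult result = child.Execute();
        System.Console.WriteLine("Child package result: " + result);
    }
}
```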
The second is not possible without jumping through a few hoops - like saving result sets to temporary tables, then re-reading the tables into later DFTs - but then you would lose the data actually flowing through the task.
I am working on a migration project to Azure. I have tried searching the internet to no avail, besides accessing the table names via the XML.
Your biggest challenge will be how clever previous developers were. A table name may be specified as a hard-coded value (easy) or an SSIS variable (hard).
Why is the variable approach hard? I could have a variable CurrentTableName whose value in the XML specifies table20180317. But there could be an expression on that variable that really makes the table name tableYYYYMMDD, so when the package runs, it evaluates to table20191212. If no expression is set, then you still have to worry about Expression Tasks (2012+), Script Tasks and command-line property overrides.
Since you haven't specified what you or the team are going to be good at using tool-wise, nor how complex the packages are, it's hard to propose a best approach for resolving this. Personally, I would look at adding the free BimlExpress plugin to your Visual Studio/SSDT instance. I'd then reverse engineer the SSIS packages into Biml. Biml is an XML dialect that simplifies package creation and inspection. Instead of all the XML cruft that a .dtsx package contains, the attributes in the Biml representation of a package are going to be much simpler.
From there, I'd hand-inspect the Biml if the number of packages is small. If it's something larger, I'd use LINQ and the Biml object model to enumerate the packages and build out a list of all the user-defined variables, connection strings, the SQL from any Execute SQL Task, and the sources and sinks of each data flow, along with which connection manager it uses and the target table/variable.
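If you would rather skip the Biml step, a similar inventory can be scraped straight from the raw .dtsx XML with LINQ to XML. This is only a rough sketch, assuming the SSIS 2012+ package format; the folder path is a hypothetical placeholder, and anything resolved through variables or expressions at runtime won't be caught, for the reasons above.

```csharp
// Rough sketch of the inventory idea against raw .dtsx XML rather than the
// Biml object model; assumes the SSIS 2012+ package format. Folder path is
// a hypothetical placeholder.
using System;
using System.IO;
using System.Xml.Linq;

class PackageInventory
{
    static void Main()
    {
        XNamespace dts = "www.microsoft.com/SqlServer/Dts";
        XNamespace sqlTask = "www.microsoft.com/sqlserver/dts/tasks/sqltask";

        foreach (var file in Directory.GetFiles(@"C:\etl", "*.dtsx"))
        {
            var doc = XDocument.Load(file);
            Console.WriteLine(file);

            // User-defined variables persisted in the package
            foreach (var v in doc.Descendants(dts + "Variable"))
                Console.WriteLine("  Variable: " + (string)v.Attribute(dts + "ObjectName"));

            // Connection manager connection strings
            foreach (var cs in doc.Descendants().Attributes(dts + "ConnectionString"))
                Console.WriteLine("  Connection: " + cs.Value);

            // Static SQL from Execute SQL Tasks
            foreach (var stmt in doc.Descendants().Attributes(sqlTask + "SqlStatementSource"))
                Console.WriteLine("  SQL: " + stmt.Value);
        }
    }
}
```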
SSIS newbie here.
I have an SSIS package I created based on the wizard. I added a SQL task to run the script I was running previously separately, in order to reduce the process to one step. The script uses lots of temp tables, and one global ##temp at the end to make the result accessible outside the process.
When I try to execute the package, I get a complex "Package Validation Error" (error code 0x80040E14). I think the operative part of the error message is "Invalid object name '##roster5'."
I just realized it was the Data Flow task that was throwing the error, so I tried to put another SQL Task before everything else to create the table so that the Data Flow task would see that the table is there; but it is still giving me the error: "Invalid object name '##ROSTER_MEMBER_NEW5'."
What am I missing/doing wrong? I don't know what I don't know. It seems like this shouldn't be that complicated (As a newbie, I know that this is probably a duplicate of...something, but I don't know how else to ask the question.)
Based on your responses, another option would be to add a T-SQL step in a SQL Agent job that executes stand-alone T-SQL. You would need to rethink the flow control of your original SSIS package and split that into 2 separate packages. The first SSIS package would execute all that is needed before the T-SQL step, the next step would execute the actual T-SQL needed to aggregate, then the last step would call the second package, which would complete the process.
I'm offering this advice with the caveat that it isn't advisable. What would work best is to communicate with your DBA, who will be able to offer you a service account to execute your SSIS package with the elevated privileges needed to truncate the staging table that your process will need.
I actually want to post a non-answer. I tried to follow the advice above as well as I could, but nothing worked. My script was supposed to run, and then the data pump was supposed to, essentially, copy the content of a global temp table to another server/table. I was doing this as two steps, and tried to use SSIS to do it all in one step. There wasn't really a need to pass values within SSIS from component to component. It doesn't seem like this should be that hard.
In any event, as I said, nothing worked. OK, let me tell you what I think happened. After making a lot of mistakes, a lot of undos, and a lot of unsuccessful attempts, something started working. One of the things I think contributed is that I had set the ResultSetType to ResultSetType_None, since I wouldn't be using any results from that step. If anyone thinks that's not what happened, I'm happy to hear the actuality, since I want to learn.
I consider this a non-answer, because I have little confidence that I'm right, or that I got it by anything other than an accident.
I am using SSIS for data warehousing to import data from different sources like flat files, .xls files and other SQL Server instances.
In my scenario I have 50 Data Flow Tasks which execute in parallel within one package (control flow). These data flows are independent, meaning they fetch data from different tables and files into my warehouse DB.
In my case, sometimes the structure of a source table or file changes, and then my package fails with a validation error.
I need a solution by which I can skip only the broken Data Flow Task so that the other Data Flow Tasks can complete their work. I would prefer not to make a separate package for each Data Flow Task.
Please advise what to do in such a situation.
Regards
Shakti
I highly advise putting each of these into a separate package, and then using a scheduling tool or master package to call each one individually. It will make the maintainability of this solution much better.
If you insist on having them all in one package, you can use the "FailParentOnFailure", "FailPackageOnFailure", and "MaximumErrorCount" properties to have your data flow fail but the container ignore errors, allowing other things to run. You really probably shouldn't do that, though - failures could be for any number of reasons, and having separate packages that run in parallel makes finding the error during a scheduled run much easier...
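If you do keep all 50 data flows in one package, setting those properties by hand on every task gets tedious. Here is a rough sketch of bulk-setting them through the SSIS runtime API; the package path is a hypothetical placeholder, and whether the rest of the package actually keeps running depends on where the failure surfaces, as noted above.

```csharp
// Rough sketch: bulk-setting the error-tolerance properties mentioned above
// on every task in a package via the SSIS runtime API, instead of clicking
// through 50 data flows in the designer. Package path is hypothetical.
using Microsoft.SqlServer.Dts.Runtime;

class RelaxTaskFailures
{
    static void Main()
    {
        var app = new Application();
        Package pkg = app.LoadPackage(@"C:\etl\LoadWarehouse.dtsx", null);

        foreach (Executable exe in pkg.Executables)
        {
            if (exe is TaskHost task)   // each Data Flow Task sits in a TaskHost
            {
                task.FailPackageOnFailure = false;  // a failing DFT should not fail the package
                task.FailParentOnFailure = false;
                task.MaximumErrorCount = 1000;      // raised from the default of 1
            }
        }

        // Let the package itself tolerate the individual failures too.
        pkg.MaximumErrorCount = 1000;

        app.SaveToXml(@"C:\etl\LoadWarehouse.dtsx", pkg, null);
    }
}
```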
I need to find the list of all the tables used in an SSIS package. One way that I know of is to open the package in a text editor and search for the tables. But is there any reliable and fast way to do this using C# or some other technology?
I'm not sure there is a reliable way to do this. For example, tables can be referenced by variables that can change value at runtime or be amended via package configurations or values in a database. So while you could look through the XML code for mentions of tables, you could not guarantee it would be 100% correct for any given package. You could possibly look at running something like Profiler while the package is executing and then go through the output to see what tables were accessed and/or amended - this would be a very manual task, however.
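That said, if you want to automate the text-search approach from C#, here is a rough sketch that scrapes table references out of a .dtsx file. It assumes the SSIS 2012+ format, where OLE DB source/destination settings are stored in pipeline <property> elements named "OpenRowset" (direct table access) and "SqlCommand" (query text); anything driven by variables or expressions at runtime will be missed, for the reasons above.

```csharp
// Rough sketch of scraping table references from a .dtsx file, assuming the
// SSIS 2012+ format. The file path comes from the command line.
using System;
using System.Linq;
using System.Xml.Linq;

class FindTablesInPackage
{
    static void Main(string[] args)
    {
        var doc = XDocument.Load(args[0]);

        var references = doc.Descendants("property")
            .Where(p => (string)p.Attribute("name") == "OpenRowset"
                     || (string)p.Attribute("name") == "SqlCommand")
            .Select(p => p.Value.Trim())
            .Where(v => v.Length > 0)
            .Distinct();

        foreach (var r in references)
            Console.WriteLine(r);
    }
}
```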
What is the best way to design a SSIS package? I'm loading multiple dimensions and facts as part of a project. Would it be better to:
1) Have 1 package and 1 data flow with all data extract and load logic in 1 data flow?
2) Have 1 package and multiple data flows with each data flow taking on the logic for 1 dimension?
3) Have 1 package per dimension and then a master package that calls them all?
After doing some research, options 2 and 3 appear to be the more viable. Any experts out there who want to share their experience and/or propose an alternative?
Microsoft's Project Real is an excellent example of many, many best practices:
Package Design and Config for Dimensional Modeling
Package logging
Partitioning
It's based on SQL 2005 but is very applicable to 2008. It supports your option #3.
You could also consider having multiple packages called by a SQL Server Agent job.
I would often go for option 3. This is the method used in the Kimball Microsoft Data Warehouse Toolkit book, which is worth a read.
http://www.amazon.co.uk/Microsoft-Data-Warehouse-Toolkit-Intelligence/dp/0471267155/ref=sr_1_1?ie=UTF8&s=books&qid=1245347732&sr=8-1
I think the answer is not quite as clear-cut ... In the same way that there is often no "best" design for a DWH, I think there is no one "best" package method.
It is quite dependent on the number of dimensions, the number of related dimensions, and the structure of the data in your staging area.
I quite like the Project Real (mentioned above) approaches; I especially thought the package logging was quite well done. I think I have read somewhere that Denali (which became SQL Server 2012) will have SSIS logging/tracking built in, but I'm not sure of the details.
From a calling perspective, I would go for one SQL Agent job that calls a master package, which then calls all the child packages and manages the error handling/logic/emailing etc. between them, utilising log/error tables to track and manage the package flow. SSIS allows much more complex sets of logic than SQL Agent (e.g. call this child package only if tasks A, B and C have all finished and task D has not).
Further, I would go for one package per snowflaked dimension, as usually one source table in the staging data will generate a number of snowflaked dimensions (e.g. DimProduct, DimProductCategory, DimProductSubCategory). It would make sense to have the data read in once in one data flow task (DFT) and written out to multiple tables. I would use one container per dimension for separation of logic.