I have a Data Flow with an OLE DB Source, a Script Component (Transformation), and a Flat File Destination.
The OLE DB Source has 100+ columns. The script component is going to clean up the data in each column and then output it to the Flat File Destination.
Adding the output columns by hand in the Script Component is unthinkable to me.
What options do I have to mirror the output columns with the input columns in the Script Component? While the output column names will be the same, I plan to change the datatype from DT_STR to DT_WSTR.
Thank you.
You are out of luck here. Possible scenarios:
Either you use the Script Component and have to key in all the columns and their properties manually; in your case, you also have to set the proper datatype.
Or you can create your own custom component which can be programmed to create output columns based on the input columns (see the sketch after this answer). It is not easy and I cannot point you to a simple guideline, but it can be done.
This might make sense if you have to repeat similar operations in many places, so that it is not a one-time task.
Or you can create a BIML script that generates the package based on metadata. However, the metadata (the list of columns and their datatypes) has to be prepared before running the BIML script, or you have to do some tricks to fetch it during script execution. Again, some proficiency with BIML is essential.
So, for a one-time job and with little BIML experience, I would go for the purely manual approach.
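For illustration only, here is a minimal sketch of the custom-component route, using the SSIS pipeline wrapper API to mirror every attached input column as a DT_WSTR output column of the same name. The class name, display name, and fallback length are assumptions, not a finished component.

```csharp
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[DtsPipelineComponent(DisplayName = "Mirror Columns As WSTR",   // assumed display name
                      ComponentType = ComponentType.Transform)]
public class MirrorColumnsComponent : PipelineComponent
{
    public override void ProvideComponentProperties()
    {
        RemoveAllInputsOutputsAndCustomProperties();

        IDTSInput100 input = ComponentMetaData.InputCollection.New();
        input.Name = "Input";

        IDTSOutput100 output = ComponentMetaData.OutputCollection.New();
        output.Name = "Output";
        output.SynchronousInputID = input.ID;   // synchronous output
    }

    // When an upstream path is attached, mirror every available column
    // as an output column with the same name but typed DT_WSTR.
    public override void OnInputPathAttached(int inputID)
    {
        base.OnInputPathAttached(inputID);

        IDTSInput100 input = ComponentMetaData.InputCollection.GetObjectByID(inputID);
        IDTSOutput100 output = ComponentMetaData.OutputCollection[0];
        IDTSVirtualInput100 vInput = input.GetVirtualInput();

        foreach (IDTSVirtualInputColumn100 vCol in vInput.VirtualInputColumnCollection)
        {
            IDTSOutputColumn100 outCol = output.OutputColumnCollection.New();
            outCol.Name = vCol.Name;

            // DT_WSTR needs a length; fall back to 50 if the source length is unknown.
            int length = vCol.Length > 0 ? vCol.Length : 50;   // assumed fallback
            outCol.SetDataTypeProperties(DataType.DT_WSTR, length, 0, 0, 0);
        }
    }
}
```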
Related
I have the following problem:
We have a case where we need to import several dozen, if not hundreds, of uniquely structured source files with SSIS. A perfect opportunity to use BIML for this. We do not have metadata information for the files as is, so I need to get it, though not manually, file by file.
So I thought: easy. Build a table with file paths, use BIML to create a package with the source connection, let SSIS identify the metadata (maybe not 100% correctly), and persist that metadata (column name, datatype, length, etc.) in a metadata table for further use.
But there seems to be no way to achieve this. While I can view the metadata and even copy it to the clipboard in SSDT, I cannot get it into the pipeline. I tried a script component as well (not my best skill); I can get everything BUT the column names in ProcessInput, and I can't create an output (a sketch of this angle follows below).
So: is there any known way of achieving this? I have googled for several hours already, to no avail.
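For reference, a minimal sketch of that script-component angle, assuming the component has an asynchronous output named Output0 with string columns ColumnName and SsisDataType and an integer column ColumnLength, and that the columns of interest are selected as input columns. This is an illustration under those assumptions, not a verified solution.

```csharp
// Members of the Script Component's generated ScriptMain class.
// Requires: using Microsoft.SqlServer.Dts.Pipeline.Wrapper; at the top of the file.

private bool metadataWritten = false;

// Emit one metadata row per selected input column to the asynchronous
// output (assumed columns: ColumnName, SsisDataType, ColumnLength).
public override void Input0_ProcessInput(Input0Buffer Buffer)
{
    if (!metadataWritten)
    {
        IDTSInput100 input = ComponentMetaData.InputCollection[0];
        foreach (IDTSInputColumn100 col in input.InputColumnCollection)
        {
            Output0Buffer.AddRow();
            Output0Buffer.ColumnName = col.Name;
            Output0Buffer.SsisDataType = col.DataType.ToString();
            Output0Buffer.ColumnLength = col.Length;
        }
        metadataWritten = true;
    }

    base.Input0_ProcessInput(Buffer);
}
```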
I have an SSIS data flow task that reads a CSV file with certain fields, tweaks it a little and inserts results into a table. The source file name is a package parameter. All is good and fine there.
Now I need to process a slightly different kind of CSV file with an extra field. This extra field can be safely ignored, so the processing is essentially the same. The only difference is in the column mapping of the data source.
I could, of course, create a copy of the whole package and tweak the data source to match the second file format. However, this "solution" seems like terrible duplication: if there are any changes in the course of processing, I will have to do them twice. I'd rather pass another parameter to the package that would tell it what kind of file to process.
The trouble is, I don't know how to make SSIS read from one data source or another depending on a parameter, hence the question.
I would duplicate the Connection Manager (CSV definition) and Data Flow in the SSIS package and tweak them for the new file format. Then I would use the parameter you described to enable or disable the appropriate Data Flow, for example via an expression on each Data Flow task's Disable property.
In essence, SSIS doesn't work with variable metadata. If this is going to be a recurring pattern, I would deal with it upstream of SSIS, building a VB / C# command-line app to shred the files into SQL tables.
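Purely as an illustration of that upstream approach (not the answerer's actual code), a small C# console app could shred a delimited file into a staging table with SqlBulkCopy. The connection string, delimiter, and table-naming convention below are placeholders, and the parsing is deliberately naive (no quoted-field handling):

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

class ShredFile
{
    static void Main(string[] args)
    {
        string filePath = args[0];  // path to the delimited file to shred
        const string connStr = "Server=.;Database=Staging;Integrated Security=true"; // placeholder

        // Build a DataTable whose columns match the file's header row.
        var table = new DataTable();
        using (var reader = new StreamReader(filePath))
        {
            foreach (string header in reader.ReadLine().Split(','))
                table.Columns.Add(header, typeof(string));

            string line;
            while ((line = reader.ReadLine()) != null)
                table.Rows.Add(line.Split(','));   // naive split: no quoted fields
        }

        // Bulk-load into a pre-existing staging table named after the file (assumed convention).
        using (var conn = new SqlConnection(connStr))
        using (var bulk = new SqlBulkCopy(conn))
        {
            conn.Open();
            bulk.DestinationTableName = "stg_" + Path.GetFileNameWithoutExtension(filePath);
            bulk.WriteToServer(table);
        }
    }
}
```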
You could make the connection manager push all the data into one column, then use a script transformation component to parse the data out to the output depending on the number of fields in the row.
You can split each row on the delimiter into, say, a string array (I googled for help when I needed to do this). From the array's length you can tell which type of file you are dealing with, as sketched below.
Then your mapping to the destination can remain the same. No need to duplicate any components either.
I had to do something similar myself once: although the files I was using were meant to always be in the same format, the format could change depending on the version of the system sending the file, and handling it in a script transformation this way let me cope with the minor variations. If the files are 99% the same, that is fine; if they were radically different, you would be better off using a separate flat file connection manager.
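A minimal sketch of that script transformation, assuming the single input column is named Line, the delimiter is a comma, and the synchronous output has columns Field1, Field2, Field3 plus an optional ExtraField. All of these names are assumptions to be adjusted to your own layout.

```csharp
// Inside the Script Component's generated ScriptMain class.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // The whole record arrives in one column (assumed name: Line).
    string[] fields = Row.Line.Split(',');

    // Fields common to both file formats (assumed output columns).
    Row.Field1 = fields[0];
    Row.Field2 = fields[1];
    Row.Field3 = fields[2];

    // The array length tells us which format this file uses;
    // the longer format carries one extra, ignorable field.
    if (fields.Length > 3)
        Row.ExtraField = fields[3];
    else
        Row.ExtraField_IsNull = true;
}
```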
This is going to be a purely organizational question about SSIS project best practices for medium-sized imports.
I have a source database which is continuously being enriched with new data. Then I have a staging database into which I sometimes load the data from the source database, so I can work on a copy of the source database and migrate the current system. I am currently using an SSIS Visual Studio project to import this data.
My issue is that I realised the design of my project is not really optimal, and now I would like to move this project to SQL Server so I can schedule the import instead of running the Visual Studio project manually. That means the current project needs to be cleaned up and optimized.
So basically, for each table, the process is simple: truncate table, extract from source and load into destination. And I have about 200 tables. Extractions cannot be parallelized as the source database only accepts one connection at a time. So how would you design such a project?
I read in the Microsoft documentation that they recommend using one Data Flow per package, but managing 200 different packages seems quite impossible, especially since I will have to chain them for the scheduled import. On the other hand, a single package with 200 Data Flows seems unmanageable too...
Edit 21/11:
The first approach I wanted to use when starting this project was to extract my tables automatically by iterating over a list of table names. This could have worked out well if my source and destination tables all had the same schema object names, but since the source and destination databases are from different vendors (BTrieve and Oracle), they also have different naming restrictions. For example, BTrieve does not reserve names and allows names longer than 30 characters, which Oracle does not. So that is how I ended up manually creating 200 data flows with semi-automatic column mapping (most of it was automatic).
When generating the CREATE TABLE queries for the destination database, I created a reusable C# library containing the methods to generate the new schema object names, just in case the methodology could be automated. If there were a custom package-generation tool that could use an external .NET library, that might do the trick.
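Purely to illustrate the kind of helper such a library might contain (this is not the asker's actual code), here is a hedged sketch of a method that maps a source object name to an Oracle-safe one by cleaning illegal characters, truncating to 30 characters, and appending a short hash to keep truncated names distinct:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SchemaNameMapper
{
    private const int OracleMaxLength = 30; // classic Oracle identifier limit

    // Maps an arbitrary source object name to a legal Oracle identifier.
    // Truncation plus a short hash suffix keeps long names distinct.
    public static string ToOracleName(string sourceName)
    {
        // Replace characters Oracle does not allow in unquoted identifiers.
        var cleaned = new StringBuilder();
        foreach (char c in sourceName.ToUpperInvariant())
            cleaned.Append(char.IsLetterOrDigit(c) ? c : '_');

        string name = cleaned.ToString();
        if (name.Length <= OracleMaxLength)
            return name;

        // Keep the leading characters and append a 6-character hash of the full name.
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(name));
            string suffix = BitConverter.ToString(hash).Replace("-", "").Substring(0, 6);
            return name.Substring(0, OracleMaxLength - 6) + suffix;
        }
    }
}
```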
Have you looked into BIDS Helper's BIML (Business Intelligence Markup Language) as a package generation tool? I've used it to create multiple packages that all follow the same basic truncate-extract-load pattern. If you need slightly more cleverness than what's built into BIML, there's BimlScript, which adds the ability to embed C# code into the processing.
From your problem description, I believe you'd be able to write one BIML file and have that generate two hundred individual packages. You could probably use it to generate one package with two hundred data flow tasks, but I've never tried pushing SSIS that hard.
You can basically create 10 child packages, each having 20 data flow tasks, and create a master package which triggers these child packages. Using parent-to-child configuration, create a single XML configuration file. Define the precedence constraints in the master package so the child packages execute serially. This way, maintainability will be better than with 200 packages or a single package with 200 data flow tasks.
The following link may be useful to you.
Single SSIS Package for Staging Process
Hope this helps!
I have an Excel file for pulling data into a SQL Server DB. In the Excel Source, I want to create an additional column like ROW_NUMBER() in T-SQL. Is this possible in the Excel Source query? How?
You can do it using a script component (a sketch follows below): Generating Surrogate Keys
Or you can download a custom component that does this for you. There are a few, but the downside of this option is that you have to deal with deploying the component (which is simple, just copying the DLL, but it is one more thing to worry about).
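A minimal sketch of the script-component approach, assuming a synchronous output column named RowNumber has been added in the component's editor:

```csharp
// Inside the Script Component's generated ScriptMain class.
private int rowNumber = 0;

// Assigns an incrementing surrogate key to each row, similar to
// ROW_NUMBER() in T-SQL (assumed output column name: RowNumber).
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    rowNumber++;
    Row.RowNumber = rowNumber;
}
```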
Being an SSIS newbie, I am trying to figure out the best possible way to transfer multiple tables. I am trying to import multiple tables from one database to another. I could write multiple parallel data flows, one for each table; however, I want to be smart about it.
For each of the tables, if I were to generalize:
I need to transfer rows from one table to a table in another database
I need to count the number of rows transferred
I have to record the start and finish time of the data transfer for each table
I have to record any errors
I am trying not to use stored procedures, since I do not want people to have to dig deep into the DB to find the rules for this transformation. I would ideally like to have this done at the SSIS level, using components that can be seen and understood visually.
Any best practices that people have used before?
I would ideally want to do something like
foreach (table in list of tables to transfer)
transfer (table name)
To make a generic table handler you would have to construct the data flow programmatically. AFAIK SSIS has no auto-introspection facility. A script task will allow you to do this, and you can get the table metadata from the source; however, constructing the data flow programmatically means fiddling with the API.
I have worked on a product where this was done, although I didn't develop that component, so I can't offer words of wisdom off the top of my head as to how to do it. However, you can find resources on the web that explain how to do it.
You can find the table structure and types of the columns by querying the system data dictionary. See this posting for some links to resources describing how to do this, including a link to a code sample.
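As a hedged illustration of what such a data-dictionary query might look like from .NET code against SQL Server (the connection string and table name are placeholders), using the INFORMATION_SCHEMA views:

```csharp
using System;
using System.Data.SqlClient;

class DescribeTable
{
    static void Main()
    {
        // Placeholder connection string and table name.
        const string connStr = "Server=.;Database=SourceDb;Integrated Security=true";
        const string table   = "dbo.SomeTable";

        const string sql = @"
            SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
            FROM   INFORMATION_SCHEMA.COLUMNS
            WHERE  TABLE_SCHEMA + '.' + TABLE_NAME = @table
            ORDER  BY ORDINAL_POSITION;";

        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@table", table);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // CHARACTER_MAXIMUM_LENGTH is NULL for non-string types.
                    object len = reader.IsDBNull(2) ? (object)"n/a" : reader.GetInt32(2);
                    Console.WriteLine($"{reader.GetString(0)}  {reader.GetString(1)}  {len}");
                }
            }
        }
    }
}
```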
What is your destination database doing with this info? Is it simply reading it?
Perhaps you would be best served by replicating the tables.
You could create a config table that has a list of the tables you want to move and then use a For Loop to do something repeatedly... but what to do?
http://blogs.conchango.com/jamiethomson/archive/2005/02/28/SSIS_3A00_-Dynamic-modification-of-SSIS-packages.aspx
Below the bullet points, he states that SSIS packages cannot be modified to change metadata at run time. And as for making it easy to maintain... you're going in the wrong direction.
I'd keep it simple and use the wizard and then customize with logging/notifications etc.
Maybe you can call the stored procedure inside your SSIS scripts. Here is an example of how you might be able to use the SP:
http://blog.sqlauthority.com/2012/10/31/sql-server-copy-data-from-one-table-to-another-table-sql-in-sixty-seconds-031-video/
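For completeness, a minimal sketch of calling such a stored procedure from a Script Task via ADO.NET; the procedure name, parameter, and connection string below are placeholders, not anything from the linked example:

```csharp
using System.Data;
using System.Data.SqlClient;

// Callable from a Script Task's Main(), or any .NET code SSIS can run.
// Assumed placeholders: connection string, procedure name, parameter name.
public static void RunCopyProc(string connStr, string tableName)
{
    using (var conn = new SqlConnection(connStr))
    using (var cmd = new SqlCommand("dbo.CopyTableData", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@TableName", tableName);

        conn.Open();
        cmd.ExecuteNonQuery();
    }
}
```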