I have an SSIS data flow task that reads a CSV file with certain fields, tweaks it a little and inserts results into a table. The source file name is a package parameter. All is good and fine there.
Now I need to process a slightly different kind of CSV file with an extra field. This extra field can be safely ignored, so the processing is essentially the same. The only difference is in the column mapping of the data source.
I could, of course, create a copy of the whole package and tweak the data source to match the second file format. However, this "solution" seems like terrible duplication: if the processing logic ever changes, I will have to change it in two places. I'd rather pass another parameter to the package that tells it what kind of file to process.
The trouble is, I don't know how to make SSIS read from one data source or another depending on a parameter, hence the question.
I would duplicate the Connection Manager (CSV definition) and Data Flow in the SSIS package and tweak them for the new file format. Then I would use the parameter you described to enable or disable the appropriate Data Flow (for example, via an expression on each Data Flow Task's Disable property).
In essence, SSIS doesn't work with variable metadata. If this is going to be a recurring pattern, I would deal with it upstream from SSIS by building a VB / C# command-line app to shred the files into SQL tables.
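If you go that route, a minimal sketch of such a console app could look like the following. The column names, expected column count, connection string, and target table here are assumptions, and the naive Split would need replacing with a proper CSV parser if fields can contain embedded commas or quotes.

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.IO;

    class CsvShredder
    {
        static void Main(string[] args)
        {
            // Only the first three columns are loaded; an extra trailing
            // field in the wider file format is simply ignored.
            string[] columnNames = { "CustomerId", "LastName", "ZipCode" };

            var table = new DataTable();
            foreach (string name in columnNames)
                table.Columns.Add(name, typeof(string));

            using (var reader = new StreamReader(args[0]))
            {
                reader.ReadLine();                      // skip the header row
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    string[] fields = line.Split(',');  // naive split; no embedded commas
                    var row = table.NewRow();
                    for (int i = 0; i < columnNames.Length; i++)
                        row[i] = i < fields.Length ? fields[i] : (object)DBNull.Value;
                    table.Rows.Add(row);
                }
            }

            using (var bulk = new SqlBulkCopy("Server=.;Database=Staging;Integrated Security=SSPI"))
            {
                bulk.DestinationTableName = "dbo.StagingCustomers";
                bulk.WriteToServer(table);
            }
        }
    }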
You could make the connection manager push all the data into a single column. Then use a script transformation component to parse the data out to the output columns, depending on the number of fields in the row.
You can split each row on the delimiter into a string array (I googled for help when I needed to do this). The length of the array then tells you which type of file you have been given.
Then, your mapping to the destination can remain the same. No need to duplicate any components either.
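A minimal sketch of that script transformation, assuming the flat file source delivers the whole row in a single column named Line and that three output columns (Field1 to Field3, hypothetical names) have been added to the synchronous output:

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Split the raw row on the delimiter; the array length tells you
        // which file layout arrived.
        string[] parts = (Row.Line_IsNull ? string.Empty : Row.Line).Split(',');

        // Map by position; the extra trailing field in the wider format
        // is simply never read, so the destination mapping stays the same.
        Row.Field1 = parts.Length > 0 ? parts[0] : string.Empty;
        Row.Field2 = parts.Length > 1 ? parts[1] : string.Empty;
        Row.Field3 = parts.Length > 2 ? parts[2] : string.Empty;
    }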
I had to do something similar myself once: although the files I was using were meant to always be in the same format, the format could change depending on the version of the system sending the file, and handling it in a script transformation this way let me cope with the minor variations. If the files are 99% the same, that is fine; if they were radically different, you would be better off with a separate file connection manager.
I am using SSIS 2017, and part of what I am doing involves running several (30ish) SQL scripts whose output is written to flat files in the same folder. My question is: to do this, do I have to create 30 new file connections, or is there a way to define the folder I want all the outputs to go to and have them saved there?
I am mainly thinking of keeping the Connection Managers tab tidy. If there's a more efficient way to do it than thirty-something file connections, that would be great.
A data flow is tightly bound to the columns and types defined within for performance reasons.
If your use case is "I need to generate an extract of sales by year for the past 30ish years," then yes, you can make do with a single Flat File Connection Manager, because the columns and data types will not change; you're simply segmenting the data.
However, if your use case is "I need to extract Sales, Employees, Addresses, etc" then you will need a Flat File Connection Manager (and preferably a data flow) per entity/data shape.
It's my experience that you would be nicely served by designing this as 30ish packages (SQL Source -> Flat File Destination) with an overall orchestrator package that uses the Execute Package Task to run the dependent processes. Top benefits:
You can have a team of developers work on the individual packages
Packages can be re-run individually in the event of failure
Better performance
Being me, I'd also look at Biml and see whether you can't just script all that out.
Addressing comments
To future-proof the location info, I'd define a project parameter named something like BaseFilePath (assuming the most probable change is that in dev I use paths like C:\ssisdata\input\file1.txt and C:\ssisdata\input\file3.csv, while production would be \\server\share\input\file1.txt or E:\someplace\for\data\file1.txt). I would populate it with the dev value C:\ssisdata\input and then assign the production value \\server\share\input to the project via a configuration.
The crucial piece is to ensure that an Expression exists on the Flat File Connection Manager's ConnectionString property so that it is driven, in part, by the parameter's value. Again, being a programmatically lazy person, I have a Variable named CurrentFilePath with an expression like @[$Project::BaseFilePath] + "\\file1.csv"
The FFCM then uses @[User::CurrentFilePath] to ensure I write the file to the correct location. And since I create one package per extract, I don't have to worry about creating a Variable per flat file connection manager, as it's all the same pattern.
I have a Data Flow with an OLE DB Source, a Script Component (Transformation), and a Flat File Destination.
The OLE DB Source has 100+ columns. The Script Component is going to clean up the data in each column and then send it to the Flat File Destination.
Adding output columns by hand in the Script Component is unthinkable to me.
What options do I have to mirror the input columns as output columns in the Script Component? The output column names will stay the same, but I plan to change the data type from DT_STR to DT_WSTR.
Thank you.
You are out of luck here. Possible scenarios:
Either you use the Script Component and key in all the columns and their properties manually; in your case, you also have to set the proper data type on each one.
Or you can create your own Custom Component, which can be programmed to create output columns based on the input columns. It is not easy and I cannot recommend a simple guideline, but it can be done; a rough sketch of the idea follows at the end of this answer.
This might make sense if you have to repeat similar operations in many places, so that it is not a one-time task.
You can also create a BIML script that generates the package based on metadata. However, the metadata (the list of columns and their data types) has to be prepared before running the BIML script, or you have to do some tricks to fetch it during script execution. Again, some proficiency with BIML is essential.
So, for a one-time job and with little BIML experience, I would go for the pure manual approach.
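For reference, the custom component route boils down to mirroring the input's virtual columns as DT_WSTR output columns. A rough, incomplete sketch, with the ProvideComponentProperties setup and the ProcessInput data-copying omitted:

    using Microsoft.SqlServer.Dts.Pipeline;
    using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
    using Microsoft.SqlServer.Dts.Runtime.Wrapper;

    [DtsPipelineComponent(DisplayName = "Mirror Columns As DT_WSTR",
                          ComponentType = ComponentType.Transform)]
    public class MirrorColumnsComponent : PipelineComponent
    {
        public override void OnInputPathAttached(int inputID)
        {
            base.OnInputPathAttached(inputID);

            IDTSInput100 input = ComponentMetaData.InputCollection.GetObjectByID(inputID);
            IDTSOutput100 output = ComponentMetaData.OutputCollection[0];
            output.OutputColumnCollection.RemoveAll();

            // Walk every column available on the attached path and create a
            // matching Unicode string output column.
            IDTSVirtualInput100 virtualInput = input.GetVirtualInput();
            foreach (IDTSVirtualInputColumn100 vcol in virtualInput.VirtualInputColumnCollection)
            {
                IDTSOutputColumn100 outCol = output.OutputColumnCollection.New();
                outCol.Name = vcol.Name;
                int length = vcol.Length > 0 ? vcol.Length : 50;   // fallback length is arbitrary
                outCol.SetDataTypeProperties(DataType.DT_WSTR, length, 0, 0, 0);
            }
        }
    }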
I have the following problem:
We have a case where we need to import several dozen, if not hundreds, of uniquely structured source files with SSIS. A perfect opportunity to use BIML for this. We do not have the metadata information for the files as things stand, so I need to get it, though not manually, file by file.
So I thought: easy. Build a table with the file paths, use BIML to create a package with the source connections, let SSIS identify the metadata (maybe not 100% correctly), and persist that metadata (column name, data type, length, etc.) in a metadata table for further use.
But there seems to be no way to achieve this. While I can view the metadata and even paste it to the clipboard in SSDT, I cannot get it into the pipeline. I tried a script component as well (not my best skill); I can get everything BUT the column names in ProcessInput, and I can't create an output from it.
So: is there any known way of achieving this? I have googled for several hours already, to no avail.
I need to do something like this:
The client puts data in an FTP folder (the data can be in one of these 3 formats: .txt, .csv, or .xls). The SSIS package needs to pull the data from FTP and check the data file for correctness: last name not empty, phone is 10 digits, zip code is 5 digits, address is not more than 20 characters long, and so on.
After checking the data file: if everything is okay, it should load the file into the dev database; if not, I need to run some cleaning queries (like taking the first 5 digits of the zip, etc.) and then load the data; and if some column is missing, it needs to send an email to the client asking for a different data file.
Until now, I have done this task by manually importing the file and running a lot of SQL queries, which is time consuming. My manager asked me to write an SSIS package to automate this process.
I am fairly new to SSIS. Can someone give me an SSIS package design idea (I mean which tasks to use, in what sequence, etc.) so I can try and learn?
Thanks for the help
Here are a couple of suggestions:
Configure tasks to send errors caused by bad data to a separate file. This will identify problem rows while letting the good stuff continue. You can also use a Conditional Split to redirect rows with bad data, such as blank rows.
The Derived Column Transformation is handy for trimming, formatting, slicing, and dicing data (for example, SUBSTRING(ZipCode, 1, 5) keeps just the first five digits of the zip code).
Use the Event Handler to send emails if a given condition is true.
Use the logging features. Very helpful in sorting out something that went sideways while you were sleeping.
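If some of the checks become awkward to express as Conditional Split or Derived Column expressions, a Script Component (transformation) can apply them in one place instead. A minimal sketch, where the column names (LastName, Phone, ZipCode, Address) and the added Boolean output column IsValid are assumptions, and ZipCode is set to ReadWrite so it can be fixed in place:

    // Inside the ScriptMain class of the Script Component.
    using System.Text.RegularExpressions;

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        string phoneDigits = Regex.Replace(Row.Phone_IsNull ? "" : Row.Phone, @"\D", "");

        bool lastNameOk = !Row.LastName_IsNull && Row.LastName.Trim().Length > 0;
        bool phoneOk    = phoneDigits.Length == 10;
        bool zipOk      = !Row.ZipCode_IsNull && Regex.IsMatch(Row.ZipCode.Trim(), @"^\d{5}");
        bool addressOk  = !Row.Address_IsNull && Row.Address.Trim().Length <= 20;

        // Light cleanup, mirroring the "take the first 5 digits of the zip" rule.
        if (zipOk && Row.ZipCode.Trim().Length > 5)
            Row.ZipCode = Row.ZipCode.Trim().Substring(0, 5);

        // A downstream Conditional Split can route rows where IsValid is false
        // to the error file / email path.
        Row.IsValid = lastNameOk && phoneOk && zipOk && addressOk;
    }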
I have a 3rd-party system into which a user manually imports new data whenever they choose. I have a view in MS SQL Server that has the fields in the exact order that is wanted.
This 3rd-party system needs the export file in a comma-quote format: I want every single field surrounded with quotes, not just the ones that contain the field delimiter (a comma).
I have worked with the configuration files to try to customize how the CSV is exported. It seems the available options for the CSV renderer do not allow me to get to this format. I think? Am I making this more difficult than it needs to be? What do I need to do to get a format like this?
Seeing as this report could be run without any parameters every time, I am contemplating setting something up with Python, as I could accomplish exactly what I want in a very small number of lines of code. However, it would be nice if I could use SSRS, as it takes away my need to figure out delivery of the export file, and it is also a simple enough interface that any user should be able to figure out how to use it.
Thanks.
MSSQL is a data source to get data out of. Since you are simply looking for a way to extract data from the database, a Python script that creates the file exactly as you wish would be the simplest solution. K.I.S.S. :)