Remove Column from CSV file using SSIS

I have a CSV file that I am using as a source in SSIS. The file contains additional blank columns.
There are extra blank columns at positions S, U, and V; is there a way to remove them with an SSIS Script Task before the file is used as a source?

Perhaps I misunderstand the problem, but you have a CSV with blank columns that you'd like to import into SQL Server but you do not want to import the blank columns into your table.
So, don't create the target columns in your table. That's it. There's no problem with having "extra" data in your data pipeline, just don't land it.
The Flat File Connection Manager must* have a definition for all the columns. As it appears you do not have headers for the blank columns, and a column name is required, you will need to set up the flat file connection manager as though the file has no header row, and then skip 1 row so the header line is not read as data (it might be the "data starts on line" / "header rows to skip" setting; I am doing this from memory). By specifying no header row, you need to provide your column names manually. I favor naming them something obvious like IgnoreBlankColumn_S, IgnoreBlankColumn_U, IgnoreBlankColumn_V so that future maintainers will know this is an intentional design decision, since those source columns contain no data.
*You can write a query against a text file, which would allow you to pull in only specific columns, but this is not going to be worth the effort.
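For completeness, if you do decide to physically strip the blank columns with a Script Task first (as originally asked), a minimal C# sketch might look like the following. The variable names User::SourceFile and User::CleanFile and the zero-based indexes 18, 20 and 21 (columns S, U and V) are assumptions, and the naive Split(',') will not cope with quoted fields that contain commas.

// Sketch only: this replaces Main() inside the Script Task's generated ScriptMain class.
using System;
using System.IO;
using System.Linq;
using Microsoft.SqlServer.Dts.Runtime;

public void Main()
{
    // Assumed package variables holding the original and rewritten file paths.
    string sourcePath = Dts.Variables["User::SourceFile"].Value.ToString();
    string cleanPath = Dts.Variables["User::CleanFile"].Value.ToString();

    // Zero-based positions of spreadsheet columns S, U and V.
    int[] dropIndexes = { 18, 20, 21 };

    // Note: a plain Split(',') does not handle quoted fields that contain commas.
    var cleaned = File.ReadAllLines(sourcePath)
        .Select(line => string.Join(",",
            line.Split(',')
                .Where((field, i) => !dropIndexes.Contains(i))
                .ToArray()));

    File.WriteAllLines(cleanPath, cleaned.ToArray());
    Dts.TaskResult = (int)ScriptResults.Success;
}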

Related

Flat File Destination has multiple Inputs from custom data flow task

I have an SSIS package set up with the following data flow components, in order:
Flat File Source
Derived Column
Custom Task
Flat File Destination
The Flat File source contains fixed-width rows of data (282 characters per row).
The Derived Column splits each row into columns using the SUBSTRING() method.
The Custom Task performs some Regular Expression validation and creates two new output columns: RowIsValid (a DT_BOOL) and InvalidReason (a DT_WSTR of 200). There is no Custom UI for this Task.
The Flat File Destination receives the validated data in delimited-column format. Eventually, this will be a database destination.
I know that this can be done using a Script Task. In fact, I am currently doing so in my solution. However, what I am trying to accomplish is building a Custom Task so that code changes are made in a single spot instead of having to change multiple Script Tasks.
I have a couple of issues I'm trying to overcome and am hoping for some help/guidance:
(Major) Currently, when I review the mappings of the Flat File Destination, the Available Input columns are coming from the Flat File Source, the Derived Column Task, and the Custom Task. Only one column is coming from the Flat File Source (because there is only one column), while the Derived Column and Custom Task each have all of the columns created in the Derived Column.
My expectation is that the Available Input Columns should display only the Custom Validator.[column name] columns coming from the Custom Validator. Debugging, I don't see where I can manipulate or suppress the Derived Column.[column name] columns.
(Minor) Getting the input columns from the Derived Column Task to automatically be selected or used when the Input is attached.
Currently, after hooking up the input and output of the Custom Validator, I have to go to the Inputs tab on the Advanced Edit and select the columns I want. I'm selecting all, because I want all columns to go through the task, even though only some will be validated by the task.
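For context, here is a hedged sketch of how a custom transform like this might register its two validation columns on a synchronous output; the class, component and output names are assumptions (nothing here is taken from the asker's actual code), and the ...100 interfaces are the SSIS 2008+ versions. As far as I can tell, because the output is synchronous, every upstream column stays in the same buffer, which is why the Flat File Source and Derived Column columns still appear as Available Input Columns at the destination.

using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[DtsPipelineComponent(DisplayName = "Custom Validator", ComponentType = ComponentType.Transform)]
public class CustomValidator : PipelineComponent
{
    public override void ProvideComponentProperties()
    {
        base.RemoveAllInputsOutputsAndCustomProperties();

        IDTSInput100 input = ComponentMetaData.InputCollection.New();
        input.Name = "Validator Input";

        IDTSOutput100 output = ComponentMetaData.OutputCollection.New();
        output.Name = "Validator Output";
        output.SynchronousInputID = input.ID;   // reuse the upstream buffer

        IDTSOutputColumn100 isValid = output.OutputColumnCollection.New();
        isValid.Name = "RowIsValid";
        isValid.SetDataTypeProperties(DataType.DT_BOOL, 0, 0, 0, 0);

        IDTSOutputColumn100 reason = output.OutputColumnCollection.New();
        reason.Name = "InvalidReason";
        reason.SetDataTypeProperties(DataType.DT_WSTR, 200, 0, 0, 0);
    }
}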

How to skip irregular header information of a Flat File in SSIS?

I have a file like the one below (just an example):
kwqif h;wehf uhfeqi f ef
fekjfnkenfekfh ijferihfq eiuh qfe iwhuq fbweq
fjqlbflkjqfh iufhquwhfe hued liuwfe
jewbkfb flkeb l jdqj jvfqjwv yjwfvjyvdfe
enjkfne khef kurehf2 kuh fkuwh lwefglu
gjghjgyuhhh jhkvv vytvgyvyv vygvyvv
gldw nbb ouyyu buyuy bjbuy
ID Name Address
1 Andrew UK
2 John US
3 Kate AUS
I want to dynamically skip the header information and load the flat file to the DB
Like below:
ID Name Address
1 Andrew UK
2 John US
3 Kate AUS
The header information may vary (it is not a fixed number of rows) from file to file.
Any help is appreciated. Thanks in advance.
The generic SSIS components cannot meet this requirement. You need to code for this, e.g. in an SSIS Script Task.
I would code that script to read through the file looking for that header row ID Name Address, and then write that line and the rest of the file out to a new file.
Then I would load that new file using the SSIS Flat File Source component.
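A minimal Script Task sketch of that approach, assuming variable names User::RawFile and User::TrimmedFile and the literal header text ID Name Address:

// Sketch only: this replaces Main() inside the Script Task's generated ScriptMain class.
using System;
using System.IO;
using Microsoft.SqlServer.Dts.Runtime;

public void Main()
{
    // Assumed package variables for the incoming file and the trimmed copy.
    string rawPath = Dts.Variables["User::RawFile"].Value.ToString();
    string trimmedPath = Dts.Variables["User::TrimmedFile"].Value.ToString();

    bool headerFound = false;
    using (StreamReader reader = new StreamReader(rawPath))
    using (StreamWriter writer = new StreamWriter(trimmedPath))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Everything before the real header row is thrown away.
            if (!headerFound && line.TrimStart().StartsWith("ID Name Address"))
                headerFound = true;

            if (headerFound)
                writer.WriteLine(line);   // header row plus all data rows
        }
    }

    Dts.TaskResult = headerFound ? (int)ScriptResults.Success
                                 : (int)ScriptResults.Failure;
}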
You might be able to avoid a script task if you'd prefer not to use one. I'll offer a few ideas here as it's not entirely clear which will be best from your example data. To some extent it's down to personal preference anyway, and also the different ideas might help other people in future:
Convert ID and ignore failures: Set the file source so that it expects however many columns you're forced into having by the header rows, and simply pull everything in as string data. In the data flow - immediately after the source component - add a data conversion component or conditional split component. Try to convert the first column (with the ID) into a number. Add a row count component and set the error output of the data conversion or conditional split to be redirected to that row count rather than causing a failure. Send the rest of the data on its way through the rest of your data flow.
This should mean you only get the rows which have a numeric value in the ID column - but if there's any chance you might get real failures (i.e. the file comes in with invalid ID values on rows you otherwise would want to load), then this might be a bad idea. You could drop your failed rows into a table where you can check for anything unexpected going on.
Check for known header values/header value attributes: If your header rows have other identifying features then you could avoid relying on the error output by simply setting up the conditional split to check for various different things: exact string matches if the header rows always start with certain values, strings over a certain length if you know they're always much longer than the ID column can ever be, etc.
Check for configurable header values: You could also put a list of unacceptable ID values into a table, and then do a lookup onto this table, throwing out the rows which match the lookup - then if you need to update the list of header values, you just have to update the table and not your whole SSIS package.
Check for acceptable ID values: You could set up a table like the above, but populate it with numbers - not great if you have no idea how many rows might be coming in or whether the IDs are actually unique each time, but if you're only loading in a few rows each time and they always start at 1, you could chuck the numbers 1 - 100 into a table and throw away any rows you load which don't match when doing a lookup onto this table.
Staging table: This is probably the way I'd deal with it if I didn't want to use a script component, but in part that's because I tend to implement initial staging tables like this anyway, and I'm comfortable working in SQL - so your mileage may vary.
Pick up the file in a data flow and drop it into a staging table as-is. Set your staging table data types to all be large strings which you know will hold the file data - you can always add a derived column which truncates things or set the destination to ignore truncation if you think there's a risk of sometimes getting abnormally large values. In a separate data flow which runs after that, use SQL to pick up the rows where ID is numeric, and carry on with the rest of your processing.
This has the added bonus that you can pick up only the columns which you know will contain the data you care about (i.e. columns 1 through 3), you can do any conversions you need in SQL rather than in SSIS, and you can make sure your columns have sensible names to be used in SSIS.

Sorting files with the same header names using SSIS

I have a folder with a lot of data files in it. I want to be able to loop through the files, look at the headers, and sort them into folders if they have the same headers. Is that possible to do in SSIS? If so, would anyone be able to point me in the direction of how to do this?
I am going to try and explain this as best I can without writing a book, as this is a multi-step process that isn't too complex but might be hard to explain with just text. My apologies, but I do not have access to SSDT at the moment, so I cannot provide images to aid here.
I would use the TextFieldParser class in Microsoft.VisualBasic.dll in a Script Task. This will allow you to read the header from each file into a string array. You can then join the string array into a single delimited value and load an object variable with a DataTable populated with two columns: the first being the file name and the second being the delimited headers.
Once you have this variable, you can load a SQL table with this information. (This is optional; skip it if you want to load the columns directly into SQL as you read them - your call.)
Once you have your SQL table, you can create an enumerator for that dataset based on the unique headers column.
Then use a Foreach Loop container with a Script Task to enumerate through the unique header sets, and a SQL task to fetch the file names that belong to each unique header set.
Within the script, loop through the returned file names and apply the necessary logic to move the files to their respective folders.
This is sort of a high-level overview, as I am assuming you are familiar enough with SSIS to understand the steps necessary to complete each one. If not, I can elaborate later in the day when I am able to get to my SSIS rig.
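As a starting point, here is a minimal Script Task sketch of the header-collection step described above. It assumes comma-delimited files, a folder path in User::SourceFolder and an Object variable User::HeaderTable; all of those names are assumptions, so adjust to taste.

// Sketch only: this replaces Main() inside the Script Task's generated ScriptMain class.
// Add a reference to Microsoft.VisualBasic.dll for TextFieldParser.
using System;
using System.Data;
using System.IO;
using Microsoft.SqlServer.Dts.Runtime;
using Microsoft.VisualBasic.FileIO;

public void Main()
{
    // Assumed variables: the folder to scan and an Object variable for the results.
    string folder = Dts.Variables["User::SourceFolder"].Value.ToString();

    DataTable headers = new DataTable();
    headers.Columns.Add("FileName", typeof(string));
    headers.Columns.Add("HeaderColumns", typeof(string));

    foreach (string file in Directory.GetFiles(folder, "*.csv"))
    {
        using (TextFieldParser parser = new TextFieldParser(file))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");

            // Read only the first row: the header.
            string[] fields = parser.ReadFields();
            headers.Rows.Add(file, string.Join("|", fields));
        }
    }

    Dts.Variables["User::HeaderTable"].Value = headers;
    Dts.TaskResult = (int)ScriptResults.Success;
}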

process csv File with multiple tables in SSIS

I am trying to figure out if it is possible to pre-process a CSV file in SSIS before importing the data into SQL.
I currently receive a file that contains 8 tables with different structures in one flat file.
The tables are identified by a row with the table name in it, encapsulated by square brackets, i.e. [DOL_PROD].
The data is underneath in standard CSV format: headers first and then the data.
The tables are split by a blank line, and the pattern repeats for the next 7 tables.
[DOL_CONSUME]
TP Ref,Item Code,Description,Qty,Serial,Consume_Ref
12345,abc,xxxxxxxxx,4,123456789,abc
[DOL_ENGPD]
TP Ref,EquipLoc,BackClyLoc,EngineerCom,Changed,NewName
Is it possible to split it out into separate CSV files, or process it in a loop?
I would really like to be able to perform all of this with SSIS automatically.
Kind Regards,
Adam
You can't do that with the Flat File Source and connection manager alone.
There are two ways to achieve your goal:
You can use a Script Component as the source of the rows and process the file there, doing whatever you want with it programmatically.
The other way is to read your flat file treating every row as a single column (i.e. without specifying a delimiter), and then split rows, recognize table names, and split the flow using Data Flow transformations.
I'd strongly advise you to use a Script Component, even if you have to learn .NET first, because the second option will be a nightmare :). I'd use a Flat File Source to extract lines from the file as a single column, and then work on them in a Script Component, rather than reading the "raw" file directly.
Here's a resource that should get you started: http://furrukhbaig.wordpress.com/2012/02/28/processing-large-poorly-formatted-text-file-with-ssis-9/
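To make the split-into-separate-files idea concrete, here is a minimal Script Task sketch. It assumes variables named User::InputFile and User::OutputFolder, that every table marker sits alone on a line wrapped in square brackets, and that blank lines only ever separate tables; all of those are assumptions about your file.

// Sketch only: this replaces Main() inside the Script Task's generated ScriptMain class.
using System;
using System.IO;
using Microsoft.SqlServer.Dts.Runtime;

public void Main()
{
    // Assumed variables: the combined file and the folder for the split-out files.
    string inputPath = Dts.Variables["User::InputFile"].Value.ToString();
    string outputFolder = Dts.Variables["User::OutputFolder"].Value.ToString();

    StreamWriter writer = null;
    try
    {
        foreach (string line in File.ReadAllLines(inputPath))
        {
            string trimmed = line.Trim();
            if (trimmed.Length == 0)
                continue;                                   // blank separator rows

            if (trimmed.StartsWith("[") && trimmed.EndsWith("]"))
            {
                // Table marker such as [DOL_CONSUME]: start a new output file.
                if (writer != null) writer.Dispose();
                string tableName = trimmed.Trim('[', ']');
                writer = new StreamWriter(Path.Combine(outputFolder, tableName + ".csv"));
            }
            else if (writer != null)
            {
                writer.WriteLine(line);                     // header or data row
            }
        }
    }
    finally
    {
        if (writer != null) writer.Dispose();
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}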

Reading Data from Header in a flat file in SSIS

I have a pipe-delimited flat file that SSIS is reading in. This flat file has 7 header rows. There is an option to skip (n) header rows, but the problem is that I need the ability to retrieve data from these rows as well.
What is the best way of retrieving this information to be used later in the data flow?
A couple of things to try.
If there is a field that denotes the header, you can read in all the data and then use a Conditional Split to separate the header records from the data records.
Or you could use something like this.
When all else fails, you could always use a Script Component of type Source.
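As a rough illustration of that last option, here is a minimal Script Component (Source) sketch. It assumes a single DT_WSTR output column named LineText, a read-only variable User::FilePath, a read-write variable User::HeaderInfo, and exactly 7 header rows; all of those names and counts are assumptions. Note that read-write variables can only be assigned in PostExecute.

// Sketch of the Script Component's ScriptMain class (the UserComponent base class is auto-generated).
using System;
using System.Collections.Generic;
using System.IO;

public class ScriptMain : UserComponent
{
    private List<string> headerLines = new List<string>();

    public override void CreateNewOutputRows()
    {
        using (StreamReader reader = new StreamReader(Variables.FilePath))
        {
            string line;
            int lineNumber = 0;
            while ((line = reader.ReadLine()) != null)
            {
                lineNumber++;
                if (lineNumber <= 7)
                {
                    headerLines.Add(line);        // keep the 7 header rows
                    continue;
                }

                Output0Buffer.AddRow();           // emit the data rows downstream
                Output0Buffer.LineText = line;
            }
        }
    }

    public override void PostExecute()
    {
        base.PostExecute();
        // Read-write variables can only be assigned in PostExecute.
        Variables.HeaderInfo = string.Join(Environment.NewLine, headerLines.ToArray());
    }
}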