SSIS - read a single header record from a flat file or an excel file prior to processing - ssis

Is there a method by which one can read just the first record of a file, i.e., to read header information so that a decision can be made whther or not to process the remainder of the file?
I know that with the split transformation component one can write an expression that will ignore all of the rows besides the header based on a key word in the header. I would rather not go that route as that is inefficiently reading every record in the file.
Specifically, is there script component logic that I can implement to close the file
and end the dataflow after the first record has been read?

See this post from Todd McDermid:
Basically, you would set up a Foreach
Container to loop over the files in
your directory. Inside the Foreach,
you would determine the "file type" -
perhaps by creating a variable with a
long-winded expression on it that
pulls apart your file name and assumes
the a "file type" value - then passes
control on to one of five Data Flows
via conditional connectors.
(Double-click on the standard green
connector, change it's Evaluation
Operation to Expression and
Constraint, and set the expression to
be "file_type_variable =
".) Then each Data Flow
picks apart one "file type".

Related

Jitterbit: target CSV-file created with only header although "do not create emtpy files" is checked

In Jitterbit Dataloader 10.37 I want to create CSV-files from Salesforce data but only if the query returns data.
I checked "do not create empty files" on the target type local file but it is still creating a csv just with the header but with no data. I do not want files created with no data in it. It is not an option to not have the header at all in the files - I will need it when there is data from the query.
Any suggestions? What am I missing?
I've seen this happen in situations where the write operation is after a couple of other operations. In that instance a header is written in the first operation, then another header is written in a second operation. The first row is read as the header, the second row (another header) is read as data, and written out.
I always add in a condition where I check if one of the fields equals its name. Something like this, to just skip those rows.
<trans>
if(Id=="Id",
false;,
true;
);
</trans>
The best way to do this is to send your output to a variable array. Then check the variable to see if data is present. So set your target to a global variable. Then add a script after that target and do your validation. To test your script use DEBUGBREAK(); to test and look at your variable content. That way you can see what is going into it.
Then make your condition statement.
if( Length($varailbe)>1,RunOperation("operation:myexport"),"novalue"):

Using the file name to determine if a SSIS package should be executed

I've got 4 different companies whose spreadsheets I have to process. Currently, if I receive a set of files, the SSIS solution is set to execute all of the packages in the solution whether I received the files or not. I want to change this behavior by looking the the Source directory that stores the .xlsx files.
For instance, I have the files from two companies: A & B. The file names will always be in this format with the companies' name at the beginning of the file name.
From there, I would like to use the file names to possibly set variables (? - not sure if this would be the best way ) to set precedence constraints to only execute the packages that relate to the companies' files we received.
Based on the screen shots, I want the packages of Company A & B to execute while C & D are skipped and then to move on to the next package.
Suggestions on how to achieve this goal?
One way would be to create a boolean variable for each company.
#CompanyA_Recd, #CompanyB_Recd, #CompanyC_Recd, #CompanyD_Recd
Your foreach loop would loop through the files, analyze each file name, and mark the appropriate variable as true if you received that company's files.
Then your precedence constraints could just check the variable for that company.
Note that, depending on the logic you want for executing the Final Report Package, this means you might need two precedence constraints for each company. One that goes to the package if the variable is true, and one that goes straight to the final package if the variable is false.

Ignore last/corrupted record from flat file source in SSIS

I have following csv file:
col1, col2, col3
"r1", "r2", "r3"
"r11", "r22", "r33"
"totals","","",
followed by 2 blank lines. The import is failing as there is extra comma at the end of the last data row and most probably will fail because of the extra blank lines at the end.
Can I skip the last row somehow or even better stop import when I get into that row? It always has "totals" string in the "col1".
UPDATE:
As far as I understood from the answers that it is not possible to do that with Flat File. Currently I did that with the "Script Component" as a source
You can do it by reading the row as a single string.
Conditionally split out Null and left(col0)=="total"
in script component you then use split function
finally trim("\"")
I know of nothing built-in to SSIS that lets you ignore the LAST line of a CSV.
One way to handle this is to precede your dataflow with a script task that uses the FileSystemObject to edit the CSV and remove the last line.
You will need to create a custom script where you read all lines but the last within SSIS.
This is old but it came up for me when searching this topic. My solution was to redirect rows on the destination. The last row is redirected instead of failing and the job completes. Of course you will potentially redirect rows you don't want to. It all depends on how much you can trust the data.

Importing flat file which has changing column order using SSIS [duplicate]

Problem.
I regularly receive a feed files from different suppliers. Although the column names are consistent the problem comes when some suppliers send text files with more or less columns in there feed file.
Furthermore the arrangement of these files are inconsistent.
Other than the Dynamic data flow task provided by Cozy Roc is there another way I could import these files. I am not a C# guru but i am driven torwards using a "Script Task" control flow or "Script Component" Data flow task.
Any suggestion, samples or direction will greatly be appreciated.
http://www.cozyroc.com/ssis/data-flow-task
Some forums
http://www.sqlservercentral.com/Forums/Topic525799-148-1.aspx#bm526400
http://www.bidn.com/forums/microsoft-business-intelligence/integration-services/26/dynamic-data-flow
Off the top of my head, I have a 50% solution for you.
The problem
SSIS really cares about meta data so variations in it tend to result in exceptions. DTS was far more forgiving in this sense. That strong need for consistent meta data makes use of the Flat File Source troublesome.
Query based solution
If the problem is the component, let's not use it. What I like about this approach is that conceptually, it's the same as querying a table-the order of columns does not matter nor does the presence of extra columns matter.
Variables
I created 3 variables, all of type string: CurrentFileName, InputFolder and Query.
InputFolder is hard wired to the source folder. In my example, it's C:\ssisdata\Kipreal
CurrentFileName is the name of a file. During design time, it was input5columns.csv but that will change at run time.
Query is an expression "SELECT col1, col2, col3, col4, col5 FROM " + #[User::CurrentFilename]
Connection manager
Set up a connection to the input file using the JET OLEDB driver. After creating it as described in the linked article, I renamed it to FileOLEDB and set an expression on the ConnectionManager of "Data Source=" + #[User::InputFolder] + ";Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=\"text;HDR=Yes;FMT=CSVDelimited;\";"
Control Flow
My Control Flow looks like a Data flow task nested in a Foreach file enumerator
Foreach File Enumerator
My Foreach File enumerator is configured to operate on files. I put an expression on the Directory for #[User::InputFolder] Notice that at this point, if the value of that folder needs to change, it'll correctly be updated in both the Connection Manager and the file enumerator. In "Retrieve file name", instead of the default "Fully Qualified", choose "Name and Extension"
In the Variable Mappings tab, assign the value to our #[User::CurrentFileName] variable
At this point, each iteration of the loop will change the value of the #[User::Query to reflect the current file name.
Data Flow
This is actually the easiest piece. Use an OLE DB source and wire it as indicated.
Use the FileOLEDB connection manager and change the Data Access mode to "SQL Command from variable." Use the #[User::Query] variable in there, click OK and you're ready to work.
Sample data
I created two sample files input5columns.csv and input7columns.csv All of the columns of 5 are in 7 but 7 has them in a different order (col2 is ordinal position 2 and 6). I negated all the values in 7 to make it readily apparent which file is being operated on.
col1,col3,col2,col5,col4
1,3,2,5,4
1111,3333,2222,5555,4444
11,33,22,55,44
111,333,222,555,444
and
col1,col3,col7,col5,col4,col6,col2
-1111,-3333,-7777,-5555,-4444,-6666,-2222
-111,-333,-777,-555,-444,-666,-222
-1,-3,-7,-5,-4,-6,-2
-11,-33,-77,-55,-44,-666,-222
Running the package results in these two screen shots
What's missing
I don't know of a way to tell the query based approach that it's OK if a column doesn't exist. If there's a unique key, I suppose you could define your query to have only the columns that must be there and then perform lookups against the file to try and obtain the columns that ought to be there and not fail the lookup if the column doesn't exist. Pretty kludgey though.
Our solution. We use parent child packages. In the parent pacakge we take the individual client files and transform them to our standard format files then call the child package to process the standard import using the file we created. This only works if the client is consistent in what they send though, if they try to change their format from what they agreed to send us, we return the file.

SSIS - Load flat files, save file names to SQL Table

I have a complex task that I need to complete. It worked well before since there was only one file but this is now changing. Each file has one long row that is first bulk inserted into a staging table. From here I'm supposed to save the file name into another table and then insert the the broken up parts of the staging table data. This is not the problem. We might have just one file or even multiple files to load at once. What needs to happen is this:
The first SSIS task is a script task that does some checks. The second task prepares the file list.
The staging table is truncated.
The third task is currently a Foreach loop container task that uses the files from the file list and processes it:
File is loaded into table using Bulk Insert task.
The file name needs to be passed as a variable to the next process. This was done with a C# task before but it is now a bit more complex since there could be more than one file and each file name needs to be saved separately.
The last task is a SQL task that executes a stored procedure with the file name as input variable.
My problem is that before it was only one file. This was easy enough. What would the best way be to go about it now?
In Data Flow Task which imports your file create a derrived column. Populate it with system variable value of filename. Load filename into the same table.
Use a Execute SQL task to retrieve distinc list of filenames into a recordset (Object type variable).
Use For Each Loop container to loop through the recordset. Place your code inside the container. Code will recieve filename from the loop as a value of a variable and process the file.
Use Execute SQL task in For Each Loop container to call SP. Pass filename as a parameter like:
Exec sp_MyCode param1, param2, ?
Where ? will pass filename INPUT as a string
EDIT
To make Flat File Connection to pick up the file specified by a variable - use Connection String property of the Flat File Connection
Select FF Connection, right click and select Properties
Click on empty field for Expressions and then click ellipsis that appears. With Expressions you can define every property of the object listed there using variables. Many objects in SSIS can have Expressions specified.
Add an Expression, select Connection String Property and define an expression with absolute path to the file (just to be on a safe side, it can be a UNC path too).
All the above can be accomplished using C# code in the script task itself. You can loop through all the files one by one and for each file :
1. Bulk Copy the data to the staging
2. Insert the filename to the other table
You can modify the logic as per your requirement and desired execution flow.
Add a colunm to your staging table - FileName
Capture the filename in a SSIS Variable (using expressions) then run something like this each loop:
UPDATE StagingTable SET FileName=? WHERE FileName IS NULL
Why are you messing about with C#? From your description it's totally unnecessary.