Populate Derived Column with File's Date Modified - ssis

I'm new to .NET and SQL and am working on an SSIS package that pulls data from flat files and loads it into a SQL table. The part that I need assistance with is getting the Date Modified of each file and populating a derived column I created in that table with it. I have created the following variables: FileDate of type DateTime, FilePath of type String, and SourceFolder of type String for the path of the files. I was thinking that the date modified could be populated in the derived column within the Data Flow, using a Script Component? Can someone please advise whether I'm on the right track? I appreciate any help. Thanks.

A Derived Column Transformation can only work with Integration Services expressions. A Script Task would allow you to access the .NET libraries, and you would want to use the method that @wil kindly posted or go with the static methods in System.IO.File.
However, I don't believe you would want to do this in a Data Flow Task: SSIS would have to evaluate that code for every row that flows through from the file. On a semi-related note, you cannot write to a variable until the ... event is fired to signal the data flow has completed (I think it's OnPostExecute, but don't quote me), so you wouldn't be able to use said variable in a downstream Derived Column at any rate. You would, of course, just modify the data pipeline to inject the file modified date at that point.
What would be preferable, and perhaps your intent, is to use a Script Task prior to the Data Flow Task to assign the value to your FileDate variable. Inside your Data Flow, then use a Derived Column to add the @[User::FileDate] variable into the pipeline.
// This code is approximate. It should work but it's only been parsed by my brain
//
// Assumption:
// SourceFolder looks like a path x:\foo\bar
// FilePath looks like a file name blee.txt
// SourceFolder [\] FilePath is a file that the account running the package can access
//
// Assign the last mod date to FileDate variable based on file system datetime
// Original code, minor flaws
// Dts.Variables["FileDate"].Value = File.GetLastWriteTime(System.IO.Path.Combine(Dts.Variables["SourceFolder"].Value,Dts.Variables["FilePath"].Value));
Dts.Variables["FileDate"].Value = System.IO.File.GetLastWriteTime(System.IO.Path.Combine(Dts.Variables["SourceFolder"].Value.ToString(), Dts.Variables["FilePath"].Value.ToString()));
Edit
I believe something is amiss with either your code or your variables. Do your values approximately line up with mine for FilePath and SourceFolder? Variables are case sensitive but I don't believe that to be your issue given the error you report.
This is the full Script Task. In my screenshot, the design-time value for FileDate is 2011-10-05 09:06. The run-time value (locals) is 2011-09-23 09:26:59, which is the last modified date of the c:\tmp\witadmin.txt file.
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
namespace ST_f74347eb0ac14a048e9ba69c1b1e7513.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        public void Main()
        {
            Dts.Variables["FileDate"].Value = System.IO.File.GetLastWriteTime(System.IO.Path.Combine(Dts.Variables["SourceFolder"].Value.ToString(), Dts.Variables["FilePath"].Value.ToString()));
            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}
C:\tmp>dir \tmp\witadmin.txt
Volume in drive C is Local Disk
Volume Serial Number is 3F21-8G22
Directory of C:\tmp
09/23/2011 09:26 AM 670,303 witadmin.txt

Related

How to get file name dynamically with time stamp in SSIS

I have a flat file source which has to be loaded daily to a table. I receive the file in the following format "filename_20190509040235.txt"
I used expression to get file name with date, how can I get the time stamp?
The time stamp is different in each date. The file get generated in the afternoon and the package is planning to run every night.
Assuming you want to load files based on a certain time defined by the timestamp on the file name, an overview of this process is below. As noted, files with a timestamp within the 12 hours prior to the package execution are returned, and you may need to adjust this to your specific needs. This also uses the same file name/timestamp format as indicated in your question, i.e. filename_20190509040235.txt.
Create an object and string variable in SSIS. On the Flat File connection manager, add the string variable as the expression for the connection string. This can be done from the Properties window (press F4) on the connection manager, going to the Expressions field, pressing the ellipsis next to it, choosing the ConnectionString property on the next window and selecting the recently created string variable as the expression for this.
Add a Script Task on the Control Flow. Add the object variable in the ReadWriteVariables field. If the directory holding the files is stored in an SSIS variable, add this variable in the ReadOnlyVariables field.
Example code for this is below. Your post stated the files are generated in the afternoon with the package running nightly. Not being sure of the exact requirements, this just returns files with a timestamp within 12 hours of the current time. You can change this by adjusting the parameter of DateTime.Now.AddHours, which currently subtracts 12 hours from the current time (i.e. adds -12). This will go in the Main method of the Script Task. Be sure to add the references noted below too.
Add a Foreach Loop after the Script Task and for the enumerator type select Foreach From Variable Enumerator. On the Variable field of the Collection tab, choose the object variable that was populated in the Script Task. Next on the Variable Mappings pane select the string variable created earlier (set as the connection string for the Flat File connection manager) at index 0.
Inside the Foreach Loop add a Data Flow Task. Within the Data Flow Task, create a Flat File Source component using the Flat File connection manager and add the appropriate destination component. Connect these two and ensure that the columns are mapped correctly on the destination.
Script Task:
using System.IO;
using System.Collections.Generic;
//get source folder from SSIS string variable (if held there)
string sourceDirectory = Dts.Variables["User::SourceDirectory"].Value.ToString();
DirectoryInfo di = new DirectoryInfo(sourceDirectory);
List<string> recentFiles = new List<string>();
foreach (FileInfo fi in di.EnumerateFiles())
{
    //use Name to only get file name, not path
    string fileName = fi.Name;
    //skip names too short to hold filename_yyyyMMddHHmmss (23 characters),
    //otherwise the Substring calls below would throw
    if (fileName.Length < 23)
    {
        continue;
    }
    string hour = fileName.Substring(17, 2);
    string minute = fileName.Substring(19, 2);
    string second = fileName.Substring(21, 2);
    string year = fileName.Substring(9, 4);
    string month = fileName.Substring(13, 2);
    string day = fileName.Substring(15, 2);
    string dateOnFile = month + "/" + day + "/" + year + " "
        + hour + ":" + minute + ":" + second;
    DateTime fileDate;
    //prevent errors in case of bad dates
    if (DateTime.TryParse(dateOnFile, out fileDate))
    {
        //files from last 12 hours
        if (fileDate >= DateTime.Now.AddHours(-12))
        {
            //FullName for file path
            recentFiles.Add(fi.FullName);
        }
    }
}
//populate SSIS object variable with file list
Dts.Variables["User::ObjectVariable"].Value = recentFiles;
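As a side note, the month/day/year string built above is parsed with the current culture, so it may behave differently on a server with non-US regional settings. A minimal sketch of an alternative, assuming the stamp is always the 14 digits after the last underscore: parse it with an exact format and the invariant culture. The class and method names here (FileStampParser, TryGetStamp) are hypothetical and not part of the original answer.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class FileStampParser
{
    // Hypothetical helper: reads the yyyyMMddHHmmss stamp that follows the
    // last underscore in the file name, e.g. filename_20190509040235.txt.
    public static bool TryGetStamp(string fileName, out DateTime stamp)
    {
        stamp = default(DateTime);
        string baseName = Path.GetFileNameWithoutExtension(fileName);
        int underscore = baseName.LastIndexOf('_');
        if (underscore < 0)
        {
            return false; // no underscore, so no stamp to read
        }
        string digits = baseName.Substring(underscore + 1);
        // Exact format + invariant culture: no dependence on regional settings.
        return DateTime.TryParseExact(digits, "yyyyMMddHHmmss",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out stamp);
    }
}
```

Inside the loop you could then call FileStampParser.TryGetStamp(fi.Name, out fileDate) in place of the Substring calls and keep the 12-hour comparison as it is.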

Reading object variable values in SSIS script component source

Is it possible to read object variable values in SSIS script component source?
I have a variable, of type Object, which contains records from table populated by using a SQL Script Task.
I have used this Script Task and it's working perfectly by using below code
oleDA.Fill(dt, Dts.Variables("vTableRowsObj").Value)
where vTableRowsObj is the object variable.
I want to read this object in an SSIS script component so that I can directly give the output from script component to the destination table.
The end goal is that I am planning to create more object variables and simply by reading these objects, give the output to destination tables from script component.
I had a similar issue.
Here's some links to reference and my code is below for simple output for ID and Name.
http://agilebi.com/jwelch/2007/03/22/writing-a-resultset-to-a-flat-file/
http://www.ssistalk.com/2007/04/04/ssis-using-a-script-component-as-a-source/
http://consultingblogs.emc.com/jamiethomson/archive/2006/01/04/SSIS_3A00_-Recordsets-instead-of-raw-files.aspx
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Xml;
using System.Data.OleDb;
public override void CreateNewOutputRows()
{
    DataTable dt = new DataTable();
    OleDbDataAdapter oleda = new OleDbDataAdapter();
    oleda.Fill(dt, this.Variables.ObjectVariable);
    foreach (DataRow row in dt.Rows)
    {
        Output0Buffer.AddRow();
        Output0Buffer.ID = Convert.ToInt32(row.ItemArray[0]);
        Output0Buffer.Name = row.ItemArray[1].ToString();
    }
}
Given that you have a table with records populated by a SQL Script task, why is it necessary to load that data into a variable of type Object? Why not just use that table as a data source in a data flow? The basic steps are...
1) Run your SQL Script task and load your results to a table (sounds like you are already doing this)
2) Skip loading the records to the Object variable
3) Instead add a Data Flow Task as a downstream connection to your SQL Script Task
4) Add a Source component to your Data Flow: use the table you populated with the SQL Script Task as your data source
5) Add a Destination component to your Data Flow: use your destination table as your data destination
However in the spirit of answering the question you asked directly (if I have in fact understood your question correctly), then the simple answer is yes you can use an SSIS script component as a data source in a data flow. This article walks you through the steps.
Since I've stumbled on this problem today let me give you my solution:
First (something that you've done but placed here for clarity):
Create the ExecuteSQL task with "ResultSet" set to "Full result set" and assign it to the object type variable:
Then link it to the "Script task" and then add the variable either to "ReadOnly" or "ReadWriteVariables"
Now you just need to access this variable - as you suggested by filling it to a datatable, and then assign it to a string variable:
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Data.OleDb;
public void Main()
{
    // TODO: Add your code here
    DataTable dt = new DataTable();
    var oleDa = new OleDbDataAdapter();
    oleDa.Fill(dt, Dts.Variables["Destination"].Value);
    string yourValueAsString = dt.Rows[0][0].ToString();
    Dts.Variables["MyStringVariable"].Value = yourValueAsString;
    [...]
    Dts.TaskResult = (int)ScriptResults.Success;
}

Prevent Duplicate headers in flat file destination - SSIS

I need some help.
I am loading some data into a .csv file from an OLE DB source. I don't want the headers to appear twice in the destination. If I uncheck the "Column names in first data row" property, the headers don't get written on the first execution either.
Output as of now.
Col1,Col2
A,B
Col1,Col2
C,D
How can I make the package run so that if the file is empty, the headers get inserted, and when the execution happens again, the headers are not included, just the data?
There was a similar thread, but I wasn't able to apply its solution, as it wasn't clear how to use expressions to get the number of rows in the destination itself. That thread is old, so I created a new one.
Your help is deeply appreciated.
-Akshay
Perhaps I'm missing something but this works for me. I am not having the read only trouble with ColumnNamesInFirstDataRow
I created a package-level variable named AddHeader, type Boolean, and set it to True. I added a Flat File Connection Manager named FFCM and configured it to use a CSV output of 2 columns: HeadCount (int) and AddHeader (boolean). In the properties for the Connection Manager, I added an Expression for the property 'ColumnNamesInFirstDataRow' and assigned it a value of @[User::AddHeader]
I added a script task to test the size of the file. It has read/write access to the Variable AddHeader. I then used this script to determine whether the file was empty. If your definition of "empty" is that it has a header row, then I'd adjust the logic in the if check to match that length.
public void Main()
{
    string path = Dts.Connections["FFCM"].ConnectionString;
    System.IO.FileInfo stats = null;
    try
    {
        stats = new System.IO.FileInfo(path);
        // checking length isn't bulletproof based on how the disk is configured
        // but should be good enough
        // http://stackoverflow.com/questions/3750590/get-size-of-file-on-disk
        if (stats != null && stats.Length != 0)
        {
            this.Dts.Variables["AddHeader"].Value = false;
        }
    }
    catch
    {
        // no harm, no foul
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}
I looped through twice to ensure I'd generate the append scenario
I deleted my file and ran the package and only had a header once.
The property that controls whether the column names will be included in the output file or not is ColumnNamesInFirstDataRow. This is a readonly property.
One way to achieve what you are trying to do would be to have two Data Flow Tasks on the control flow surface, preceded by a Script Task. These two Data Flow Tasks will be identical except that they refer to two different Flat File connection managers. Again, the only difference between these two would be the value of ColumnNamesInFirstDataRow: one true, the other false.
Use this Script task to decide whether this is the first run or subsequent runs. Persist this information and check it within the script. Either you can have a separate table for this information, or use some log table to infer it.
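For what it's worth, if persisting the run state in a table is more than you need, a minimal sketch of the "first run or not" decision could simply look at the target file itself. This swaps in a file-size check for the table or log lookup described above, and the names HeaderCheck and NeedsHeader are hypothetical.

```csharp
using System.IO;

public static class HeaderCheck
{
    // Treats a missing or zero-length target file as "first run", i.e. the
    // run that should write the header row.
    public static bool NeedsHeader(string path)
    {
        FileInfo info = new FileInfo(path);
        // Test Exists first: Length throws FileNotFoundException otherwise.
        return !info.Exists || info.Length == 0;
    }
}
```

In the Script Task you would assign the result to a Boolean SSIS variable of your own and use it in the precedence constraints that pick one of the two Data Flow Tasks.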
The following solution worked for me; you can try it too.
Create three variables:
IsHeaderRequired
RowCount
TargetFilePath
Get the source row count using an Execute SQL Task and save it in the RowCount variable.
Add a Script Task. Add TargetFilePath and RowCount as read-only variables and IsHeaderRequired as a read/write variable.
Edit the script and add the following code.
string targetFilePath = Dts.Variables["TargetFilePath"].Value.ToString();
int rowCount = (int)Dts.Variables["RowCount"].Value;
System.IO.FileInfo targetFileInfo = new System.IO.FileInfo(targetFilePath);
if (rowCount > 0)
{
    // A header is required when the target file is missing or empty.
    // Checking Exists first avoids the FileNotFoundException that Length
    // throws for a file that has not been created yet.
    Dts.Variables["IsHeaderRequired"].Value =
        !targetFileInfo.Exists || targetFileInfo.Length == 0;
}
Dts.TaskResult = (int)ScriptResults.Success;
Connect your script component to your database
Click the connection manager of the flat file (i.e. your target file) and go to Properties. In Expressions, set the following, as shown in the screenshot:
Map ConnectionString to the variable "TargetFilePath".
Map ColumnNamesInFirstDataRow to "IsHeaderRequired".
Expression for Flat file connection Manager.
Final package[screenshot]:
Hope this helps
A solution ....
First, add an SSIS integer variable in the scope of the Foreach Loop or higher - I'll call this RowCount - and make its default value negative (this is important!). Next, add a Row Count to your Data Flow, and assign the result to the RowCount SSIS variable we just made. Third, select your Connection Manager (don't double-click) and open the Properties window (F4). Find the Expressions property, select it, and hit the ellipsis (...) button. Select the ColumnNamesInFirstDataRow property, and use an expression like this:
@[User::RowCount] < 0
Now, when your package starts, RowCount has the static value of -1 or another negative number. When the data flow starts for the first time in your loop, the ColumnNamesInFirstDataRow property will have a value of TRUE. When the first data flow completes, the row count (even if it's zero) is written to the RowCount variable. On the second iteration of the loop, the Connection Manager is then reconfigured to NOT write column names...

convert Image column is very slow

I want to convert data from an old database to a new database with a new structure.
The old database has an attachment table that must be converted to the attachment table in the new database.
The old attachment table structure is:
Attachment (ID int, Image Image, ...)
and the new attachment table structure is:
Attachment (ID int, Image Image, OldID Int, ...)
Each time I run the conversion package, it should copy only the data that does not yet exist (new data) from the old database to the new one.
To do this I use:
a lookup between the old table and the new table (ID --> OldID) to check whether the record exists.
When I run the package, SSIS first caches all lookup and source component data in memory and then executes the package. The source data in this package is huge, so the package runs very slowly. I want to get the Image column data from the old database only for each new record, after the lookup that checks for existence. If I use another Lookup component to get the Image column data from the old database, SSIS caches this new lookup's data too and the execution time does not change. What should I do?
thanks in advance.
Are you sure you're thinking this through correctly? SSIS should not be slow even if the amount of data you are loading is huge.
Your LOOKUP component needs to make sure it's not doing anything it doesn't need to. If you are pointing it to the table in the new database, change it to a SQL query right away. In this query you only need SELECT OldId FROM tbl, and you match the incoming ID from the old database against it. Your data flow should contain ID and Image from the old database, mapped ID -> OldId and Image -> Image in your OLE DB Destination. Nothing more is needed for an "insert new rows only" operation like the one you are doing here.
For this job, there is no need for any custom code or dynamic SQL. You -do- want to get the ID and Image from your source system in the data flow (unless you have major network bottlenecks to sort out) - doing a RBAR lookup to get the image data from the old system is a very backwards way of thinking your ETL.
Select only ID from source table
Do lookup in destination db with no change
For its no match output do lookup in source table, with Cache Mode set to No cache, which will append Image to the flow.
In this case each image will be fetched separately, which may affect performance.
You may also do it in two Data Flows.
In first:
Select only ID from source table
Do lookup in destination db with no change
Store the new IDs in a string variable IdListToBeFetched as a comma-separated list, using a Script Component as destination with code similar to:
using System.Text;
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    StringBuilder sb;
    public override void PreExecute()
    {
        base.PreExecute();
        sb = new StringBuilder();
    }
    public override void PostExecute()
    {
        base.PostExecute();
        Variables.IdListToBeFetched = sb.ToString().TrimEnd(',');
    }
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (!Row.ID_IsNull)
        {
            sb.AppendFormat("{0},", Row.ID);
        }
    }
}
In the second Data Flow, set the SQL command of the source to a dynamically generated query from an expression similar to "select ID, Image from Attachment where ID in (" + @[User::IdListToBeFetched] + ")" and set DelayValidation = True. It will take all images in a single select, which should be faster.
To set dynamic generated query as SqlCommand in sources like ADO NET Source or ODBC Source:
select Expression property of Data Flow Task containing your source
find property [your source name].[SqlCommand] and set expression here
To set dynamic generated query as sql command in OLE DB Source (taken from Jamie Thomson blog):
Create a new variable called SourceSQL
Open up the properties pane for SourceSQL variable (by pressing F4)
Set EvaluateAsExpression=TRUE
Set Expression to "select ID, Image from Attachment where ID in (" + @[User::IdListToBeFetched] + ")"
For your OLE DB Source component, open up the editor
Set Data Access Mode="SQL Command from variable"
Set VariableName = "SourceSQL"
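One thing to watch with the generated query: when a run finds no new IDs, the list is empty and "where ID in ()" is not valid T-SQL. Here is a minimal sketch of the same string building with a guard for that case, written as plain C# so the logic is easy to follow. IdQueryBuilder and Build are hypothetical names, and the "1 = 0" fallback is my own assumption about how you might want an empty run to behave.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class IdQueryBuilder
{
    // Builds the source query for the second Data Flow from the new IDs.
    public static string Build(IEnumerable<int> ids)
    {
        List<string> parts = new List<string>();
        foreach (int id in ids)
        {
            parts.Add(id.ToString(CultureInfo.InvariantCulture));
        }
        if (parts.Count == 0)
        {
            // Empty list: return a query that is valid but selects no rows.
            return "select ID, Image from Attachment where 1 = 0";
        }
        return "select ID, Image from Attachment where ID in ("
            + string.Join(",", parts.ToArray()) + ")";
    }
}
```

In the package itself the equivalent guard could live in the variable's expression (a conditional on the length of IdListToBeFetched) or in a precedence constraint that skips the second Data Flow when the list is empty.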

SSIS transactional data (different record types, one file)

An interesting one, we're evaluating ETL tools for pre-processing statement data (e.g. utility bills, bank statements) for printing.
Some of the data comes through in a single flat file, with different record types.
e.g. a record type with "01" as the first field will be address data. This will have name and address fields. A record type with "02" will be summary data, with balances and totals. Record type "03" will be a line item on the statement.
Each statement will have one 01 record, one 02 record, and multiple 03 records. I could pre-parse the file and split it into 3 files for loading into a table, but this is less than ideal.
We take the file and do a few manipulations on it (e.g. add in a couple more fields to the address record, and maybe do some totalling / validation), and then send the file in pretty much the same format (But with the extra fields added) to our print composition program.
How would you do this in SSIS?
The big problem with variant records in SSIS is that you don't get any of the benefits of the connection manager helping with the layout, since the connection manager can only handle a single layout.
So typically, you end up with a CRLF terminated flat file with two columns: recordtype and recorddata. Then you put the conditional split in and parse each type of row on different paths. The parsing will have to split up the remaining record data and put it in columns and convert as normal, either with a derived column transform or a script transform and potentially conversion transforms.
If you had a lot of packages to do, I would seriously consider writing a custom component which produced 3 outputs already converted to your destination types.
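To make the two-column idea concrete, here is a minimal sketch of the split and the routing decision in plain C#. It assumes a comma-delimited line whose first field is the record type; the delimiter, the sample record and the names VariantRecordRouter, SplitTypeAndData and Route are all hypothetical. In a real package the routing would be done by the Conditional Split and the field parsing by a Derived Column or Script Component on each path.

```csharp
using System;

public static class VariantRecordRouter
{
    // Splits a raw line into its record type (first field) and the
    // unparsed remainder, mirroring the recordtype/recorddata layout.
    public static string[] SplitTypeAndData(string line, char delimiter)
    {
        int pos = line.IndexOf(delimiter);
        if (pos < 0)
        {
            return new string[] { line, string.Empty };
        }
        return new string[] { line.Substring(0, pos), line.Substring(pos + 1) };
    }

    // Names the path a Conditional Split would send the row down.
    public static string Route(string recordType)
    {
        switch (recordType)
        {
            case "01": return "address";
            case "02": return "summary";
            case "03": return "line item";
            default: return "unknown";
        }
    }
}
```

For example, SplitTypeAndData("01,Jane Doe,1 Main St", ',') gives "01" and "Jane Doe,1 Main St", and Route("01") sends that row down the address path, where the remainder is then split into its own columns.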
Answered my own question; see the script below. AcctNum comes in from a derived column on the flat file source and is correctly populated for 02 record types. The script saves it in a local static variable and puts it back on the row for the other record types that do not contain the account number.
/* Microsoft SQL Server Integration Services Script Component
 * Write scripts using Microsoft Visual C# 2008.
 * ScriptMain is the entry point class of the script.*/
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    static String AccountNumber = null;
    public override void PreExecute()
    {
        base.PreExecute();
        /*
        Add your code here for preprocessing or remove if not needed
        */
    }
    public override void PostExecute()
    {
        base.PostExecute();
        /*
        Add your code here for postprocessing or remove if not needed
        You can set read/write variables here, for example:
        Variables.MyIntVar = 100
        */
    }
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (Row.RecordType == "02")
            AccountNumber = Row.AcctNum; // Store incoming account number in the static script variable
        else if (Row.RecordType == "06" || Row.RecordType == "07" || Row.RecordType == "08" ||
                 Row.RecordType == "09" || Row.RecordType == "10")
            Row.AcctNum = AccountNumber; // Put the stored account number on this row
    }
}
This is possible, but you will have to write custom logic. I did this once with DTS.
If the file is delimited, SSIS will import the fields correctly. You can write a script that examines the record type field, then branches into different inserts depending on the record type. If the file has records that are not delimited, but each type has its own fixed widths, this becomes a lot more complicated, since you'd have to parse and split each imported line, with the record types and their width hardcoded in the script.
There are a few ways to do it, but I think the easiest one to understand would be to add a conditional split after the source task, and then push it through a bunch of data conversion tasks to get the right format of data.
Make sure that your source is set up with the correct data types, so nothing falls through (e.g.-all strings). Then just check the "Record Type" field in that conditional split to send it to the right branch.