I have a flat file source which has to be loaded daily to a table. I receive the file in the following format "filename_20190509040235.txt"
I used expression to get file name with date, how can I get the time stamp?
The time stamp is different on each date. The file gets generated in the afternoon and the package is planned to run every night.
Assuming you want to load files based on a certain time defined by the timestamp on the file name, an overview of this process is below. As noted, files with a timestamp within the 12 hours prior to the package execution are returned, and you may need to adjust this to your specific needs. This also uses the same file name/timestamp format as indicated in your question, i.e. filename_20190509040235.txt.
Create an object variable and a string variable in SSIS. On the Flat File connection manager, add the string variable as the expression for the connection string. This can be done from the Properties window (press F4) on the connection manager: go to the Expressions field, press the ellipsis next to it, choose the ConnectionString property in the next window, and select the recently created string variable as its expression.
Add a Script Task on the Control Flow. Add the object variable in the ReadWriteVariables field. If the directory holding the files is stored in an SSIS variable, add that variable in the ReadOnlyVariables field.
Example code for this is below. Your post stated the files are generated in the afternoon with the package running nightly. Not being sure of the exact requirements, this just returns files with a timestamp within 12 hours of the current time. You can change this by adjusting the parameter of DateTime.Now.AddHours, which currently subtracts 12 hours from the current time (i.e. adds -12). This will go in the Main method of the Script Task. Be sure to add the references noted below too.
Add a Foreach Loop after the Script Task and for the enumerator type select Foreach From Variable Enumerator. On the Variable field of the Collection tab, choose the object variable that was populated in the Script Task. Next on the Variable Mappings pane select the string variable created earlier (set as the connection string for the Flat File connection manager) at index 0.
Inside the Foreach Loop add a Data Flow Task. Within the Data Flow Task, create a Flat File Source component using the Flat File connection manager and add the appropriate destination component. Connect these two and ensure that the columns are mapped correctly on the destination.
Script Task:
using System;
using System.IO;
using System.Collections.Generic;

//get source folder from SSIS string variable (if held there)
string sourceDirectory = Dts.Variables["User::SourceDirectory"].Value.ToString();
DirectoryInfo di = new DirectoryInfo(sourceDirectory);
List<string> recentFiles = new List<string>();

foreach (FileInfo fi in di.EnumerateFiles())
{
    //use Name to only get the file name, not the path
    string fileName = fi.Name;
    string year = fileName.Substring(9, 4);
    string month = fileName.Substring(13, 2);
    string day = fileName.Substring(15, 2);
    string hour = fileName.Substring(17, 2);
    string minute = fileName.Substring(19, 2);
    string second = fileName.Substring(21, 2);
    string dateOnFile = month + "/" + day + "/" + year + " "
        + hour + ":" + minute + ":" + second;

    DateTime fileDate;
    //prevent errors in case of bad dates
    if (DateTime.TryParse(dateOnFile, out fileDate))
    {
        //files from the last 12 hours
        if (fileDate >= DateTime.Now.AddHours(-12))
        {
            //FullName for the full file path
            recentFiles.Add(fi.FullName);
        }
    }
}

//populate the SSIS object variable with the file list
Dts.Variables["User::ObjectVariable"].Value = recentFiles;
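As a side note, the six Substring calls can be collapsed into a single ParseExact, which also validates the timestamp in one step. A standalone sketch of that alternative, run against the sample file name from the question (assuming the fixed "filename_" prefix):

```csharp
using System;
using System.Globalization;

class TimestampDemo
{
    static void Main()
    {
        string fileName = "filename_20190509040235.txt"; // sample from the question
        // zero-based offset 9 skips the "filename_" prefix; 14 chars = yyyyMMddHHmmss
        string stamp = fileName.Substring(9, 14);
        DateTime fileDate = DateTime.ParseExact(stamp, "yyyyMMddHHmmss",
            CultureInfo.InvariantCulture);
        Console.WriteLine(fileDate.ToString("MM/dd/yyyy HH:mm:ss",
            CultureInfo.InvariantCulture)); // 05/09/2019 04:02:35
    }
}
```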
Related
I have an SSIS requirement:
I have three Excel files with different dates in their file names, stored in a folder.
Folder path: D:\SourceFolder\
File names: Asia_Sale_07May2018.xlsx, Asia_Sale_20Jun2018.xlsx, Asia_Sale_15Aug2018.xlsx
I have a package parameter date of 08/15/2018.
Requirement: Process files where the file name date = parameter date.
If I set parameter date to 08/15/2018 the package should pick & load Asia_Sale_15Aug2018.xlsx
If I set parameter date to 06/20/2018 the package should pick & load Asia_Sale_20Jun2018.xlsx
If I set parameter date to 05/07/2018 the package should pick & load Asia_Sale_07May2018.xlsx
Thanks,
Ayman
1. Loop through the files using a Foreach Loop and get the FileName; use SUBSTRING to get only the date part (07May2018/20Jun2018/15Aug2018 in your case). Convert this to the format you want using the CONVERT function:
select convert(varchar,convert(date,'15Aug2018'),101)
2. Use a precedence constraint in the control flow that compares both values, and load the file if they match.
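For reference, the substring step can be done directly in an SSIS expression. Assuming the file name sits in a hypothetical variable @[User::FileName] and the prefix is always "Asia_Sale_" (10 characters, so the 1-based start position is 11), a sketch:

```
SUBSTRING(@[User::FileName], 11, 9)
```

This returns the nine-character date part (e.g. "15Aug2018"), which can then be converted as shown in the SQL above.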
I would build the name of the file you are looking for and use a Foreach Loop looking for that specific file.
The C# logic for this is:
DateTime dt = DateTime.Parse("1/1/2018"); //Just set from your parameter
string str_dt = dt.ToString("ddMMMyyyy");
string fname = "Asia_Sale_" + str_dt + ".xlsx";
Once you've got that, use your variable to check your foreach loop for the file.
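As a quick sanity check outside SSIS, the same formatting logic can be run standalone (06/20/2018 is just one of the sample parameter values from the question):

```csharp
using System;
using System.Globalization;

class FileNameDemo
{
    static void Main()
    {
        // Parameter date in MM/dd/yyyy form, matching the question's examples
        DateTime dt = DateTime.Parse("06/20/2018", CultureInfo.InvariantCulture);
        // Format it to match the date embedded in the file names (ddMMMyyyy)
        string str_dt = dt.ToString("ddMMMyyyy", CultureInfo.InvariantCulture);
        string fname = "Asia_Sale_" + str_dt + ".xlsx";
        Console.WriteLine(fname); // Asia_Sale_20Jun2018.xlsx
    }
}
```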
Can you guys please help?
I have a problem loading data from Excel into a database with a dynamic file name in my source files.
For example, for this month, my filename is ABC 31122017.xlsx. I successfully loaded data from each tab in this excel file into database.
But how do I make it dynamic? For example next month I have excel file
ABC 31012018.xlsx. How to make the job dynamic to pick up the new file?
I was able to put the date in a variable, but I don't know how to proceed with the file path in SSIS.
@[User::InputPath] + "ABC " + @[User::Report_DT_DDMMYYYY] + ".xlsx"
I already used this in the Expressions of the connection, set up as ExcelFilePath, but it didn't work.
In the Excel Source connector in SSIS, I had already chosen 31122017.xlsx and the first tab. But after I put in the expression, it couldn't find the first tab I had chosen.
Please help me guys. Thank you.
Maybe the explanation below will help you overcome this issue (I have SSIS 2012):
The first SSIS variable will hold the date value, i.e. "20180218". Variable name: TodayDate. This variable's value will change according to today's date.
The second SSIS variable will hold the file name, i.e. "D:\SSIS\StackOverFlowTest1\InputFiles\AB " + @[User::TodayDate] + ".xlsx". Variable name: FileNameExcel.
Create a connection manager for Excel and, under its Properties window, change Expressions and set ExcelFilePath to "FileNameExcel".
Change "Delay Validation" to True under the "Data Flow Task" properties.
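One caveat: if FileNameExcel is built with an SSIS expression (rather than entered as a plain variable value), backslashes in the path must be escaped, so it would look something like this (the path is the sample one from above):

```
"D:\\SSIS\\StackOverFlowTest1\\InputFiles\\AB " + @[User::TodayDate] + ".xlsx"
```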
Using a foreach:
Set up a string variable fileName (put in an actual file path / file name so you can develop your package).
Add a Foreach Loop (File Enumerator which is default)
Set an expression for Directory = @[User::InputPath]
Set Files to the proper mask for your Excel files (e.g. "ABC *.xlsx")
Go to variable mappings and link to fileName
Add an Excel connection and connect to your actual file.
Set an expression on its properties: ExcelFilePath = @[User::fileName]
Set Delay Validation to True
Develop your data flow(s) as normal.
I have an SSIS package set up that imports downloaded data files to the database (one file at a time by date)
Current Setup (for a file):
Downloaded file is at location (same file exists between the date range 1st Feb to Today)
C:\DataFiles\GeneralSale_20170201.txt
In SSIS, the variables: for each file there are 4 variables. The first is the location of where the file is, called @Location.
The second simply gives the name of the file, named @GeneralSales, returning the value
GeneralSale_
The third is the date (@ExportDateFormatted), for which the code is (DT_WSTR,8)(DATEPART("yyyy", @[User::ExportDate]) * 10000 + DATEPART("mm", @[User::ExportDate]) * 100 + DATEPART("dd", @[User::ExportDate])), and @[User::ExportDate] is set as DATEADD("DD", 0, GETDATE()).
@[User::ExportDate] allows me to set the file date (of a file already downloaded) that I want to import into my table dbo.GeneralSale, i.e. if I want to import the file for 20170205, I adjust the export date and then run the package.
The final variable is @ExportFileExtension, returning the value
txt
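For reference, the DATEPART arithmetic in @ExportDateFormatted can be sanity-checked outside SSIS; an equivalent C# sketch (2017-02-01 is an assumed sample ExportDate) is:

```csharp
using System;

class DateFormatDemo
{
    static void Main()
    {
        DateTime exportDate = new DateTime(2017, 2, 1); // sample ExportDate
        // Same arithmetic as the SSIS expression: yyyy * 10000 + mm * 100 + dd
        int formatted = exportDate.Year * 10000
                      + exportDate.Month * 100
                      + exportDate.Day;
        Console.WriteLine(formatted); // 20170201
    }
}
```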
Then in the Data Flow, which looks like the below:
The flat file source connects to the connection manager below. The Property > Expressions > ConnectionString of the connection manager combines the variables to build the file name. This is where I use the variables from before:
@[User::Location] + @[User::GeneralSales] + @[User::ExportDateFormatted] + "." + @[User::ExportFileExtension]
Returning the value:
C:\DataFiles\GeneralSale_20170201.txt
This then populates the table with the data of that file. But to insert the data for another day, I have to amend the date and run the package again.
What I am trying to do is pass a start and end date to let the package insert all data from the files between those dates.
Hope the above information is clear of what goes on and what I am trying to achieve.
You need to iterate between two dates. In SSIS it's pretty straightforward; here are the main steps:
Define two package parameters, StartDate and EndDate of type Date, and on the package start - validate that StartDate <= EndDate.
Define a Date variable ExtrDate, and add a For Loop with these settings: initial expression @ExtrDate = @StartDate, evaluation @ExtrDate <= @EndDate, and assign @ExtrDate = DATEADD("dd", 1, @ExtrDate). The purpose of this loop is quite clear.
Put your extraction tasks inside For Loop container.
The ExtrDate variable will be incremented on each iteration of the loop.
Package parameters allow building a more flexible package.
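The For Loop settings above translate to an ordinary for loop; a C# sketch of the iteration (the dates are assumed samples, not part of the question) is:

```csharp
using System;

class DateLoopDemo
{
    static void Main()
    {
        // Stand-ins for the StartDate / EndDate package parameters
        DateTime startDate = new DateTime(2017, 2, 1);
        DateTime endDate = new DateTime(2017, 2, 3);

        // InitExpression / EvalExpression / AssignExpression of the For Loop
        for (DateTime extrDate = startDate; extrDate <= endDate; extrDate = extrDate.AddDays(1))
        {
            // Each iteration would drive one extraction, e.g. one file per date
            Console.WriteLine("GeneralSale_" + extrDate.ToString("yyyyMMdd") + ".txt");
        }
    }
}
```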
I need some help.
I am exporting some data to a .csv file from an OLE DB source. I don't want the headers to appear twice in the destination. If I uncheck the "Column names in the first data row" property, the headers don't get populated on the first execution either.
Output as of now.
Col1,Col2
A,B
Col1,Col2
C,D
How can I make the package run in such a way that if the file is empty, the headers get inserted, and if the execution happens again, the headers are not included, just the data?
There was a similar thread, but I wasn't able to apply the solution, as I didn't know how to use expressions to get the number of rows of the destination itself. It was long back, so I created a new one.
Your help is deeply appreciated.
-Akshay
Perhaps I'm missing something, but this works for me. I am not having the read-only trouble with ColumnNamesInFirstDataRow.
I created a package-level variable named AddHeader, type Boolean, and set it to True. I added a Flat File Connection Manager, named FFCM, and configured it to use a CSV output of 2 columns: HeadCount (int), AddHeader (boolean). In the properties for the Connection Manager, I added an Expression for the property ColumnNamesInFirstDataRow and assigned it a value of @[User::AddHeader].
I added a script task to test the size of the file. It has read/write access to the Variable AddHeader. I then used this script to determine whether the file was empty. If your definition of "empty" is that it has a header row, then I'd adjust the logic in the if check to match that length.
public void Main()
{
    string path = Dts.Connections["FFCM"].ConnectionString;
    System.IO.FileInfo stats = null;
    try
    {
        stats = new System.IO.FileInfo(path);
        // checking length isn't bulletproof based on how the disk is configured
        // but should be good enough
        // http://stackoverflow.com/questions/3750590/get-size-of-file-on-disk
        if (stats != null && stats.Length != 0)
        {
            this.Dts.Variables["AddHeader"].Value = false;
        }
    }
    catch
    {
        // no harm, no foul
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}
I looped through twice to ensure I'd generate the append scenario
I deleted my file and ran the package and only had a header once.
The property that controls whether the column names will be included in the output file or not is ColumnNamesInFirstDataRow. This is a read-only property.
One way to achieve what you are trying to do would be to have two Data Flow Tasks on the control flow surface, preceded by a Script Task. These two Data Flow Tasks would be identical, except that they would refer to two different flat file connection managers. Again, the only difference between those two would be the values of ColumnNamesInFirstDataRow: one true, the other false.
Use the Script Task to decide whether this is the first run or a subsequent run. Persist this information and check it within the script. You can either have a separate table for this information, or use some log table to infer it.
The following solution worked for me; you can also try it.
Create three variables.
IsHeaderRequired
RowCount
TargetFilePath
Get the source row count using an Execute SQL Task and save it in the RowCount variable.
Add a Script Task. Add TargetFilePath and RowCount as read-only variables, and IsHeaderRequired as a read/write variable.
Edit the script and add the following lines of code:
string targetFilePath = Dts.Variables["TargetFilePath"].Value.ToString();
int rowCount = (int)Dts.Variables["RowCount"].Value;
System.IO.FileInfo targetFileInfo = new System.IO.FileInfo(targetFilePath);

if (rowCount > 0)
{
    if (targetFileInfo.Length == 0)
    {
        Dts.Variables["IsHeaderRequired"].Value = true;
    }
    else
    {
        Dts.Variables["IsHeaderRequired"].Value = false;
    }
}

Dts.TaskResult = (int)ScriptResults.Success;
Connect your Script Task to your Data Flow Task.
Click the flat file connection manager (i.e. your target file) and go to Properties. In the Expressions, enter the following, as shown in the screenshot:
Map the connectionString to variable "TargetFilePath".
Map the ColumnNamesInFirstDataRow to "IsHeaderRequired".
Expression for the Flat File connection manager.
Final package [screenshot]:
Hope this helps
A solution ....
First, add an SSIS integer variable in the scope of the Foreach Loop or higher - I'll call this RowCount - and make its default value negative (this is important!). Next, add a Row Count to your Data Flow, and assign the result to the RowCount SSIS variable we just made. Third, select your Connection Manager (don't double-click) and open the Properties window (F4). Find the Expressions property, select it, and hit the ellipsis (...) button. Select the ColumnNamesInFirstDataRow property, and use an expression like this:
@[User::RowCount] < 0
Now, when your package starts, RowCount has the static value of -1 or another negative number. When the data flow starts for the first time in your loop, the ColumnNamesInFirstDataRow property will have a value of TRUE. When the first data flow completes, the row count (even if it's zero) is written to the RowCount variable. On the second iteration of the loop, the Connection Manager is then reconfigured to NOT write column names...
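To see why the negative default matters, here is a small C# simulation of that loop (the per-file row counts are made up for illustration):

```csharp
using System;

class HeaderOnceDemo
{
    static void Main()
    {
        int rowCount = -1; // static default: negative on purpose
        int[] simulatedRowCounts = { 0, 5, 3 }; // assumed Row Count results per file

        foreach (int rows in simulatedRowCounts)
        {
            // The connection manager expression: @[User::RowCount] < 0
            bool columnNamesInFirstDataRow = rowCount < 0;
            Console.WriteLine(columnNamesInFirstDataRow);

            // The Row Count transformation only writes to the variable
            // after the data flow completes
            rowCount = rows;
        }
    }
}
```

Only the first iteration evaluates to True, so the header is written exactly once, even when an intermediate file contributes zero rows.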
I'm a wannabe to .NET and SQL and am working on an SSIS package that is pulling data from flat files and inputting it into a SQL table. The part that I need assistance on is getting the Date Modified of the files and populating a derived column I created in that table with it. I have created the following variables: FileDate of type DateTime, FilePath of type String, and SourceFolder of type String for the path of the files. I was thinking that the DateModified could be populated in the derived column within the Data Flow, using a Script Component? Can someone please advise if I'm on the right track? I appreciate any help. Thanks.
A Derived Column Transformation can only work with Integration Services Expressions. A Script Task would allow you to access the .NET libraries, and you would want to use the method that @wil kindly posted or go with the static methods in System.IO.File.
However, I don't believe you would want to do this in a Data Flow Task. SSIS would have to evaluate that code for every row that flows through from the file. On a semi-related note, you cannot write to a variable until the ... event is fired to signal the data flow has completed (I think it's OnPostExecute but don't quote me) so you wouldn't be able to use said variable in a downstream derived column at any rate. You would of course, just modify the data pipeline to inject the file modified date at that point.
What would be preferable, and perhaps your intent, is to use a Script Task prior to the Data Flow Task to assign the value to your FileDate variable. Inside your Data Flow, then use a Derived Column to add the @FileDate variable into the pipeline.
// This code is approximate. It should work but it's only been parsed by my brain
//
// Assumption:
// SourceFolder looks like a path x:\foo\bar
// FilePath looks like a file name blee.txt
// SourceFolder [\] FilePath is a file that the account running the package can access
//
// Assign the last mod date to FileDate variable based on file system datetime
// Original code, minor flaws
// Dts.Variables["FileDate"].Value = File.GetLastWriteTime(System.IO.Path.Combine(Dts.Variables["SourceFolder"].Value,Dts.Variables["FilePath"].Value));
Dts.Variables["FileDate"].Value = System.IO.File.GetLastWriteTime(System.IO.Path.Combine(Dts.Variables["SourceFolder"].Value.ToString(), Dts.Variables["FilePath"].Value.ToString()));
Edit
I believe something is amiss with either your code or your variables. Do your values approximately line up with mine for FilePath and SourceFolder? Variables are case sensitive but I don't believe that to be your issue given the error you report.
This is the full script task and you can see by the screenshot below, the design-time value for FileDate is 2011-10-05 09:06 The run-time value (locals) is 2011-09-23 09:26:59 which is the last mod date for the c:\tmp\witadmin.txt file
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_f74347eb0ac14a048e9ba69c1b1e7513.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };

        public void Main()
        {
            // Combine folder + file name, then read the file system's last modified time
            Dts.Variables["FileDate"].Value = System.IO.File.GetLastWriteTime(
                System.IO.Path.Combine(
                    Dts.Variables["SourceFolder"].Value.ToString(),
                    Dts.Variables["FilePath"].Value.ToString()));
            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}
C:\tmp>dir \tmp\witadmin.txt
Volume in drive C is Local Disk
Volume Serial Number is 3F21-8G22
Directory of C:\tmp
09/23/2011 09:26 AM 670,303 witadmin.txt