Load excel files based on file date selection - ssis

I have an SSIS requirement:
I have three Excel files with different dates in their file names, stored in a folder.
Folder path: D:\SourceFolder\
File names: Asia_Sale_07May2018.xlsx, Asia_Sale_20Jun2018.xlsx, Asia_Sale_15Aug2018.xlsx
I have a package parameter date of 08/15/2018.
Requirement: Process files where the file name date = parameter date.
If I set parameter date to 08/15/2018 the package should pick & load Asia_Sale_15Aug2018.xlsx
If I set parameter date to 06/20/2018 the package should pick & load Asia_Sale_20Jun2018.xlsx
If I set parameter date to 05/07/2018 the package should pick & load Asia_Sale_07May2018.xlsx
Thanks,
Ayman

1. Loop through the files using a Foreach Loop, get the file name, and use SUBSTRING to extract only the date part (07May2018, 20Jun2018 or 15Aug2018 in your case). Convert this to the format you want using the CONVERT function:
SELECT CONVERT(varchar(10), CONVERT(date, '15Aug2018'), 101)
2. Use a precedence constraint in the control flow that compares the two values and loads the file only if they match.
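For reference, the same comparison could also be done in a Script Task inside the loop rather than in T-SQL. This is only a sketch: the names FileDateMatcher and MatchesParameterDate are made up for illustration, and it assumes the date always follows the last underscore in the file name.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class FileDateMatcher
{
    // Returns true when the date embedded in a name such as
    // "Asia_Sale_15Aug2018.xlsx" equals the given parameter date.
    public static bool MatchesParameterDate(string fileName, DateTime parameterDate)
    {
        // "Asia_Sale_15Aug2018.xlsx" -> "Asia_Sale_15Aug2018"
        string baseName = Path.GetFileNameWithoutExtension(fileName);

        int lastUnderscore = baseName.LastIndexOf('_');
        if (lastUnderscore < 0)
        {
            return false; // no date part to compare
        }

        // Assumption: everything after the last underscore is the date, e.g. "15Aug2018"
        string datePart = baseName.Substring(lastUnderscore + 1);

        DateTime fileDate;
        bool parsed = DateTime.TryParseExact(
            datePart, "ddMMMyyyy", CultureInfo.InvariantCulture,
            DateTimeStyles.None, out fileDate);

        return parsed && fileDate.Date == parameterDate.Date;
    }
}
```

The boolean result could be written to an SSIS variable and tested in the precedence constraint expression.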

I would build the name of the file you are looking for and use a Foreach Loop that looks for that specific file.
The C# logic for this is:
DateTime dt = DateTime.Parse("1/1/2018"); //Just set from your parameter
string str_dt = dt.ToString("ddMMMyyyy");
string fname = "Asia_Sale_" + str_dt + ".xlsx";
Once you have that, use the variable as the file specification of the Foreach Loop so that it only picks up that file.
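A slightly more defensive version of the same idea, as a sketch only. It assumes the parameter arrives as text in MM/dd/yyyy form (as in the question), and the class name ExpectedFileName is hypothetical. The invariant culture is used because ToString("ddMMMyyyy") otherwise produces month abbreviations in the server's language.

```csharp
using System;
using System.Globalization;

public static class ExpectedFileName
{
    // parameterText is assumed to be MM/dd/yyyy, e.g. "06/20/2018".
    public static string Build(string parameterText)
    {
        DateTime dt = DateTime.ParseExact(
            parameterText, "MM/dd/yyyy", CultureInfo.InvariantCulture);

        // InvariantCulture keeps the month abbreviation in English ("Jun").
        string datePart = dt.ToString("ddMMMyyyy", CultureInfo.InvariantCulture);

        return "Asia_Sale_" + datePart + ".xlsx";
    }
}
```

With the three files from the question, "06/20/2018" gives Asia_Sale_20Jun2018.xlsx and "05/07/2018" gives Asia_Sale_07May2018.xlsx.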

Related

How to get file name dynamically with time stamp in SSIS

I have a flat file source which has to be loaded daily to a table. I receive the file in the following format: "filename_20190509040235.txt"
I used an expression to get the file name with the date, but how can I get the time stamp?
The time stamp is different each day. The file gets generated in the afternoon and the package is scheduled to run every night.
Assuming you want to load files based on a certain time defined by the timestamp in the file name, an overview of this process is below. In this example, files with a timestamp within the 12 hours prior to the package execution are returned, and you may need to adjust this to your specific needs. This also uses the same file name/timestamp format as indicated in your question, i.e. filename_20190509040235.txt.
Create an object and string variable in SSIS. On the Flat File connection manager, add the string variable as the expression for the connection string. This can be done from the Properties window (press F4) on the connection manager, going to the Expressions field, pressing the ellipsis next to it, choosing the ConnectionString property on the next window and selecting the recently created string variable as the expression for this.
Add a Script Task on the Control Flow. Add the object variable in the ReadWriteVariables field. If the directory holding the files is stored in an SSIS variable, add this variable in the ReadOnlyVariables field.
Example code for this is below. Your post stated the files are generated in the afternoon with the package running nightly. Not being sure of the exact requirements, this just returns files with a timestamp within 12 hours of the current time. You can change this by adjusting the parameter of DateTime.Now.AddHours, which currently subtracts 12 hours from the current time (i.e. adds -12). This will go in the Main method of the Script Task. Be sure to add the references noted below too.
Add a Foreach Loop after the Script Task and for the enumerator type select Foreach From Variable Enumerator. On the Variable field of the Collection tab, choose the object variable that was populated in the Script Task. Next on the Variable Mappings pane select the string variable created earlier (set as the connection string for the Flat File connection manager) at index 0.
Inside the Foreach Loop add a Data Flow Task. Within the Data Flow Task, create a Flat File Source component using the Flat File connection manager and add the appropriate destination component. Connect these two and ensure that the columns are mapped correctly on the destination.
Script Task:
using System.IO;
using System.Collections.Generic;
//get source folder from SSIS string variable (if held there)
string sourceDirectory = Dts.Variables["User::SourceDirectory"].Value.ToString();
DirectoryInfo di = new DirectoryInfo(sourceDirectory);
List<string> recentFiles = new List<string>();
foreach (FileInfo fi in di.EnumerateFiles())
{
    //use Name to only get file name, not path
    string fileName = fi.Name;
    //skip names too short to hold the 14-digit timestamp starting at position 9
    if (fileName.Length < 23)
    {
        continue;
    }
    string hour = fileName.Substring(17, 2);
    string minute = fileName.Substring(19, 2);
    string second = fileName.Substring(21, 2);
    string year = fileName.Substring(9, 4);
    string month = fileName.Substring(13, 2);
    string day = fileName.Substring(15, 2);
    string dateOnFile = month + "/" + day + "/" + year + " "
        + hour + ":" + minute + ":" + second;
    DateTime fileDate;
    //prevent errors in case of bad dates
    if (DateTime.TryParse(dateOnFile, out fileDate))
    {
        //files from last 12 hours
        if (fileDate >= DateTime.Now.AddHours(-12))
        {
            //FullName for file path
            recentFiles.Add(fi.FullName);
        }
    }
}
//populate SSIS object variable with file list
Dts.Variables["User::ObjectVariable"].Value = recentFiles;
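As an alternative to the fixed Substring positions, the timestamp could be parsed in one step with DateTime.TryParseExact. The following is a sketch under the assumption that the 14-digit timestamp always follows the last underscore; the class and method names (TimestampedFileName, TryGetTimestamp) are hypothetical.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class TimestampedFileName
{
    // Tries to read the yyyyMMddHHmmss timestamp from a name such as
    // "filename_20190509040235.txt". Returns false for any other shape.
    public static bool TryGetTimestamp(string fileName, out DateTime timestamp)
    {
        timestamp = DateTime.MinValue;

        string baseName = Path.GetFileNameWithoutExtension(fileName);
        int lastUnderscore = baseName.LastIndexOf('_');
        if (lastUnderscore < 0)
        {
            return false;
        }

        // Assumption: the text after the last underscore is the timestamp
        string stamp = baseName.Substring(lastUnderscore + 1);

        return DateTime.TryParseExact(
            stamp, "yyyyMMddHHmmss", CultureInfo.InvariantCulture,
            DateTimeStyles.None, out timestamp);
    }
}
```

Inside the loop this would replace the six Substring calls: call TryGetTimestamp(fi.Name, out fileDate) and only compare fileDate against DateTime.Now.AddHours(-12) when it returns true. It also does not depend on the machine's regional date settings.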

How to extract data from excel file into database with dynamic file name SSIS

Can you guys please help?
I have a problem loading data from Excel into a database when the source file name is dynamic.
For example, for this month, my file name is ABC 31122017.xlsx. I successfully loaded data from each tab in this Excel file into the database.
But how do I make it dynamic? For example, next month I will have the Excel file
ABC 31012018.xlsx. How do I make the job dynamic so that it picks up the new file?
I am able to put the date in a variable, but I don't know how to proceed with the file path in SSIS.
@[User::InputPath] + "ABC " + @[User::Report_DT_DDMMYYYY] + ".xlsx"
I already used this in the Expressions of the connection, setting ExcelFilePath, but it didn't work.
In the Excel Source in SSIS, I had already chosen the 31122017.xlsx file and the first tab. But after I put in the expression, it couldn't find the first tab I had chosen.
Please help me, guys. Thank you.
Maybe the explanation below will help you overcome this issue (I have SSIS 2012):
The first SSIS variable holds the date value, e.g. "20180218". Variable name: TodayDate. Its value changes according to today's date.
The second SSIS variable holds the file name, i.e. "D:\SSIS\StackOverFlowTest1\InputFiles\AB " + @[User::TodayDate] + ".xlsx" (in the expression itself, each backslash must be escaped as \\). Variable name: FileNameExcel.
Create a connection manager for Excel and, in its Properties window, open Expressions and set ExcelFilePath to @[User::FileNameExcel].
Set DelayValidation to True in the properties of the Data Flow Task.
Using a Foreach Loop:
Set up a string variable fileName (put in the actual file path and file name so you can develop your package).
Add a Foreach Loop (File Enumerator, which is the default).
Set an expression for Directory = @[User::InputPath].
Set Files to the proper mask for your Excel file (i.e. "ABC *.xlsx").
Go to Variable Mappings and link to fileName.
Add an Excel connection and connect to your actual file.
Set an expression on its properties: ExcelFilePath = @[User::fileName].
Set DelayValidation to True.
Develop your data flow(s) as normal.
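To see which files a given mask would hand to the loop, the enumeration can be approximated outside SSIS. This is a rough sketch, not what the Foreach File Enumerator runs internally; ExcelFileFinder and FindFiles are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ExcelFileFinder
{
    // Returns the full path of every file in inputPath that matches the mask,
    // sorted so the order is predictable.
    public static List<string> FindFiles(string inputPath, string mask)
    {
        var files = new List<string>(Directory.EnumerateFiles(inputPath, mask));
        files.Sort(StringComparer.OrdinalIgnoreCase);
        return files;
    }
}
```

Each path returned by FindFiles(inputPath, "ABC *.xlsx") corresponds to one iteration of the loop, with the path landing in fileName and from there in ExcelFilePath.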

Inserting Data from Flat file source to database between 2 dates

I have an SSIS package set up that imports downloaded data files to the database (one file at a time by date).
Current setup (for a file):
The downloaded file is at the location below (a file like this exists for every date from 1st Feb to today):
C:\DataFiles\GeneralSale_20170201.txt
In SSIS there are 4 variables for each file. The first is the location of the file, called @Location.
The second simply gives the name of the file; it is named @GeneralSales and returns the value
GeneralSale_
The third is the date (@ExportDateFormatted), for which the expression is (DT_WSTR,8)(DATEPART("yyyy", @[User::ExportDate]) * 10000 + DATEPART("mm", @[User::ExportDate]) * 100 + DATEPART("dd", @[User::ExportDate])), and [ExportDate] is set as DATEADD("DD", 0, GETDATE()).
[ExportDate] allows me to set the date of the (already downloaded) file that I want to import into my table dbo.GeneralSale, i.e. if I want to import the file for 20170205, I adjust the export date and then run the package.
The final variable is @ExportFileExtension, returning the value
txt
Then in the Data Flow, which looks like the below:
The flat file source uses the connection manager below. Its Property > Expressions > ConnectionString combines the variables from before to make a file name:
@[User::Location] + @[User::GeneralSales] + @[User::ExportDateFormatted] + "." + @[User::ExportFileExtension]
Returning the value:
C:\DataFiles\GeneralSale_20170201.txt
This then populates the table with the data of that file. But to insert the data for another day I have to amend the date and run the package.
What I am trying to do is pass a start and end date to let the package insert all data from the files between those dates.
I hope the above makes clear what is going on and what I am trying to achieve.
You need to iterate between two dates. In SSIS it's pretty straightforward; I'll describe the main steps:
Define two package parameters, StartDate and EndDate, of type Date, and at package start validate that StartDate <= EndDate.
Define a Date variable ExtrDate, and add a For Loop with these settings: initial expression @ExtrDate = @StartDate, evaluation @ExtrDate <= @EndDate, and assign @ExtrDate = DateAdd("dd", 1, @ExtrDate). The loop steps through every date in the range.
Put your extraction tasks inside the For Loop container.
The ExtrDate variable is increased by one day on each iteration of the loop.
Package parameters make the package more flexible.
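The loop logic, written out in C# purely to illustrate which files a given date range would touch. This is a sketch, not part of the package: DateRangeFiles and BuildFilePaths are hypothetical names, and the "GeneralSale_" prefix and ".txt" extension are taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class DateRangeFiles
{
    // Builds one file path per day from startDate to endDate, both inclusive.
    public static List<string> BuildFilePaths(string location, DateTime startDate, DateTime endDate)
    {
        if (startDate > endDate)
        {
            throw new ArgumentException("StartDate must not be later than EndDate.");
        }

        var paths = new List<string>();

        // Same shape as the For Loop: init, evaluation, assign (+1 day)
        for (DateTime extrDate = startDate.Date; extrDate <= endDate.Date; extrDate = extrDate.AddDays(1))
        {
            string formatted = extrDate.ToString("yyyyMMdd", CultureInfo.InvariantCulture);
            paths.Add(location + "GeneralSale_" + formatted + ".txt");
        }

        return paths;
    }
}
```

In the package itself, the equivalent is to base the ExportDateFormatted expression on ExtrDate instead of ExportDate, so the connection string changes on every iteration.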

SSIS assign value to variable to be used as Filter in copying file

How can I achieve this, filterFileName = "MyFilter_" + "date" + "*.csv"?
The date variable stores its value in this format: 12202015.
Is it even possible? How can I assign a value to an SSIS variable that will be used as a filter when copying or moving a file to another folder?
Or is my approach correct?
You can create a variable that combines other variables defined in your solution. I normally name my output files using the following:
varFileName: a string variable containing the base file name
varFilePath: a string variable containing the base file path
These can be updated via a config file or by obtaining file names from a Foreach Loop container.
varFilePath + varFileName + varDate + ".csv"
varDate =
(dt_wstr,4)(datepart("yy",getdate())) + (dt_wstr,2)(datepart("mm",getdate()))
+ (dt_wstr,2)(datepart("dd",getdate())) + (dt_wstr,2)(datepart("hh",getdate()))
+ (dt_wstr,2)(datepart("n",getdate())) + (dt_wstr,2)(datepart("ss",getdate()))
You can use the File System Task to move, rename, or delete this file.
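If the move ends up in a Script Task instead, the filter from the question can be built and applied roughly like this. It is a sketch with hypothetical names (FileFilter, Build, MoveMatching), and it assumes the 12202015 value in the question means MMddyyyy.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class FileFilter
{
    // Builds "MyFilter_" + MMddyyyy + "*.csv", e.g. "MyFilter_12202015*.csv".
    public static string Build(DateTime date)
    {
        return "MyFilter_" + date.ToString("MMddyyyy", CultureInfo.InvariantCulture) + "*.csv";
    }

    // Moves every file in sourceDir that matches the filter into targetDir
    // and returns how many files were moved.
    public static int MoveMatching(string sourceDir, string targetDir, string filter)
    {
        int moved = 0;
        foreach (string sourcePath in Directory.GetFiles(sourceDir, filter))
        {
            string targetPath = Path.Combine(targetDir, Path.GetFileName(sourcePath));
            File.Move(sourcePath, targetPath); // throws if the target already exists
            moved++;
        }
        return moved;
    }
}
```

Note that "MM" and "dd" are zero-padded, whereas (dt_wstr,2)(datepart(...)) in an SSIS expression yields a single character for values below 10, so the two approaches can produce different strings for dates such as 5 January.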

Loading data from multiple flatfiles with a derived column to a database

How can I load data from several source files into one destination table that has a derived column File_Name showing, for each row, the name of the file it was loaded from?
For example:
file1.txt contains
emp_id emp_name
1 abc
file2.txt contains
emp_id emp_name
2 adc
output table should contain
emp_id emp_name file_name
1 abc file1
2 adc file2
I have written several SSIS packages doing exactly what you seek. Assuming you're applying a foreach loop with a "Foreach File Enumerator" (reading from each file with a specific extension in a folder) you can set up an object variable under Collection that will function as an array storing all of your file names (including the complete path) and then under Variable Mappings set up a second Variable of type string (with Index = 0) that will temporarily store the name of the file (again includes the complete path). Call it FileWithFullPath.
There are steps that now need to be applied because you need to send the file name to a database field and the file name must not include the full path or extension (c:\documents\file1.txt becomes file1). To do this you'll need to add to your foreach loop the following in the order defined.
Add a Script Task. Under Script for ReadOnlyVariables, add the FileWithFullPath variable. Under ReadWriteVariables, create a new variable called FileNameOnly. Here we will be reading in the filename that currently includes the full path and then returning the file name excluding the path and extension. Select Edit Script and here you will apply the following C# code to make this happen.
Add "using System.IO;" to region Namespaces above.
Within the braces under public void Main(), add the following:
string PathandFileName = Dts.Variables["User::FileWithFullPath"].Value.ToString();
string FileNameWithExtension = Path.GetFileName(PathandFileName); //trim path
string FileNameOnly = Path.GetFileNameWithoutExtension(FileNameWithExtension); //trim extension
Assign final string value to new variable.
Dts.Variables["User::FileNameOnly"].Value = FileNameOnly;
Save and build the code and then exit out.
Finally, under the Control Flow tab add a Data Flow Task into the foreach loop. Select the Data Flow tab, add your source and destination, but in between add a Derived Column task. Here you will create a new column called FileName and drag into its expression the new variable you just populated in the C# code called User::FileNameOnly. This you will then map to the Filename column in your destination database table along with the other columns being read from that file.
Give this a try and let me know if you have any questions.
Thanks.