I have an SSIS package that imports downloaded data files into the database (one file at a time, by date).
Current setup (for a file):
The downloaded file is at the location below (a file with the same naming pattern exists for every date from 1 Feb to today):
C:\DataFiles\GeneralSale_20170201.txt
In SSIS there are four variables for each file. The first, @Location, holds the folder where the file is stored.
The second, @GeneralSales, simply gives the name of the file, returning the value
GeneralSale_
The third is the date (@ExportDateFormatted), for which the expression is (DT_WSTR,8)(DATEPART("yyyy", @[User::ExportDate]) * 10000 + DATEPART("mm", @[User::ExportDate]) * 100 + DATEPART("dd", @[User::ExportDate])), and [ExportDate] is set as DATEADD("DD", 0, GETDATE()).
[ExportDate] lets me set the date of the (already downloaded) file that I want to import into my table dbo.GeneralSale, i.e. if I want to import the file for 20170205, I adjust the export date and then run the package.
The final variable is @ExportFileExtension, returning the value
txt
Then, in the Data Flow:
The flat file source uses the flat file connection manager. The Properties > Expressions > ConnectionString of that connection manager combines the variables to make a file name. This is where I use the variables from before:
@[User::Location] + @[User::GeneralSales] + @[User::ExportDateFormatted] + "." + @[User::ExportFileExtension]
Returning the value:
C:\DataFiles\GeneralSale_20170201.txt
This then populates the table with the data from that file. But to insert the data for another day, I have to amend the date and run the package again.
What I am trying to do is pass a start and end date so that the package inserts the data from all files between those dates.
I hope the above makes clear what happens now and what I am trying to achieve.
You need to iterate between two dates. In SSIS it's pretty straightforward; the main steps are:
Define two package parameters, StartDate and EndDate, of type Date, and on package start validate that StartDate <= EndDate.
Define a Date variable ExtrDate, and add a For Loop with these settings: InitExpression @ExtrDate = @StartDate, EvalExpression @ExtrDate <= @EndDate, and AssignExpression @ExtrDate = DATEADD("dd", 1, @ExtrDate). The loop runs once for each date in the range.
Put your extraction tasks inside the For Loop container.
The ExtrDate variable is increased by one day on each iteration of the loop, so use it in place of ExportDate when building the file name.
Package parameters make the package more flexible.
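The loop logic can be illustrated outside SSIS. The following is a minimal Python sketch, not part of the package itself: the function name `files_for_range` and its default arguments are hypothetical, chosen to mirror the variables in the question. It validates the range and builds one file path per day, inclusive of both ends.

```python
from datetime import date, timedelta

def files_for_range(start, end, location="C:\\DataFiles\\",
                    prefix="GeneralSale_", ext="txt"):
    """Return one file path per day from start to end, inclusive."""
    # Same check as validating StartDate <= EndDate on package start.
    if start > end:
        raise ValueError("StartDate must not be after EndDate")
    paths = []
    current = start                      # InitExpression
    while current <= end:                # EvalExpression
        # yyyymmdd, like ExportDateFormatted in the question
        paths.append(f"{location}{prefix}{current:%Y%m%d}.{ext}")
        current += timedelta(days=1)     # AssignExpression
    return paths

for path in files_for_range(date(2017, 2, 1), date(2017, 2, 3)):
    print(path)
```

Each path produced here corresponds to the connection string the flat file connection manager would evaluate on one iteration of the For Loop.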
I have a flat file source which has to be loaded daily into a table. I receive the file in the following format: "filename_20190509040235.txt"
I used an expression to get the file name with the date, but how can I get the time stamp?
The time stamp is different each day. The file gets generated in the afternoon and the package is scheduled to run every night.
Assuming you want to load files based on a certain time defined by the timestamp on the file name, an overview of this process is below. As noted, files with a timestamp within the 12 hours prior to the package execution are returned, and you may need to adjust this to your specific needs. This also uses the same file name/timestamp format as indicated in your question, i.e. filename_20190509040235.txt.
Create an object and string variable in SSIS. On the Flat File connection manager, add the string variable as the expression for the connection string. This can be done from the Properties window (press F4) on the connection manager, going to the Expressions field, pressing the ellipsis next to it, choosing the ConnectionString property on the next window and selecting the recently created string variable as the expression for this.
Add a Script Task on the Control Flow. Add the object variable in the ReadWriteVariables field. If the directory holding the files is stored in an SSIS variable, add this variable in the ReadOnlyVariables field.
Example code for this is below. Your post stated the files are generated in the afternoon with the package running nightly. Not being sure of the exact requirements, this just returns files with a timestamp within 12 hours of the current time. You can change this by adjusting the parameter of DateTime.Now.AddHours, which currently subtracts 12 hours from the current time (i.e. adds -12). This will go in the Main method of the Script Task. Be sure to add the references noted below too.
Add a Foreach Loop after the Script Task and for the enumerator type select Foreach From Variable Enumerator. On the Variable field of the Collection tab, choose the object variable that was populated in the Script Task. Next on the Variable Mappings pane select the string variable created earlier (set as the connection string for the Flat File connection manager) at index 0.
Inside the Foreach Loop add a Data Flow Task. Within the Data Flow Task, create a Flat File Source component using the Flat File connection manager and add the appropriate destination component. Connect these two and ensure that the columns are mapped correctly on the destination.
Script Task:
using System.IO;
using System.Collections.Generic;
using System.Globalization;

//get source folder from SSIS string variable (if held there)
string sourceDirectory = Dts.Variables["User::SourceDirectory"].Value.ToString();
DirectoryInfo di = new DirectoryInfo(sourceDirectory);
List<string> recentFiles = new List<string>();

foreach (FileInfo fi in di.EnumerateFiles())
{
    //use Name to only get file name, not path
    string fileName = fi.Name;

    //skip names too short to hold "filename_" plus a 14-digit timestamp
    if (fileName.Length < 23)
    {
        continue;
    }

    //characters 9-22 hold the timestamp as yyyyMMddHHmmss
    string dateOnFile = fileName.Substring(9, 14);
    DateTime fileDate;

    //prevent errors in case of bad dates; parsing does not depend on the machine's culture
    if (DateTime.TryParseExact(dateOnFile, "yyyyMMddHHmmss", CultureInfo.InvariantCulture,
        DateTimeStyles.None, out fileDate))
    {
        //files from last 12 hours
        if (fileDate >= DateTime.Now.AddHours(-12))
        {
            //FullName for file path
            recentFiles.Add(fi.FullName);
        }
    }
}

//populate SSIS object variable with file list
Dts.Variables["User::ObjectVariable"].Value = recentFiles;
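The filtering rule in the Script Task can be checked in isolation. Below is a small Python sketch of the same idea, not a replacement for the C# above: the helper names `timestamp_from_name` and `recent_files` are hypothetical, and the 9-character prefix assumes the `filename_` pattern from the question. Passing the current time in as a parameter makes the 12-hour window easy to test.

```python
from datetime import datetime, timedelta

def timestamp_from_name(file_name, prefix_len=9):
    """Return the yyyyMMddHHmmss stamp after the prefix as a datetime, or None."""
    stamp = file_name[prefix_len:prefix_len + 14]
    # Require exactly 14 digits so short or malformed names are rejected.
    if len(stamp) != 14 or not stamp.isdigit():
        return None
    try:
        return datetime.strptime(stamp, "%Y%m%d%H%M%S")
    except ValueError:  # e.g. month 13
        return None

def recent_files(file_names, now, hours=12):
    """Keep the names whose embedded timestamp is no older than `hours` before `now`."""
    cutoff = now - timedelta(hours=hours)
    kept = []
    for name in file_names:
        stamp = timestamp_from_name(name)
        if stamp is not None and stamp >= cutoff:
            kept.append(name)
    return kept

names = ["filename_20190509040235.txt", "filename_20190508110000.txt", "notes.txt"]
print(recent_files(names, now=datetime(2019, 5, 9, 12, 0, 0)))
```

With the sample time of noon on 9 May 2019, only the file stamped 04:02:35 that day falls inside the window.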
I have an SSIS requirement:
I have three Excel files with different dates in their file names, stored in a folder.
Folder path: D:\SourceFolder\
File names: Asia_Sale_07May2018.xlsx, Asia_Sale_20Jun2018.xlsx, Asia_Sale_15Aug2018.xlsx
I have a package parameter date, for example 08/15/2018.
Requirement: Process files where the file name date = parameter date.
If I set parameter date to 08/15/2018 the package should pick & load Asia_Sale_15Aug2018.xlsx
If I set parameter date to 06/20/2018 the package should pick & load Asia_Sale_20Jun2018.xlsx
If I set parameter date to 05/07/2018 the package should pick & load Asia_Sale_07May2018.xlsx
Thanks,
Ayman
1. Loop through the files using a Foreach Loop, get the file name, and use Substring to extract only the date part (07May2018/20Jun2018/15Aug2018 in your case). Convert this to the format you want using the CONVERT function.
select convert(varchar,convert(date,'15Aug2018'),101)
2. Use a precedence constraint in the control flow that compares the two values and loads the file only if they match.
I would build the name of the file you are looking for and use a foreach loop looking for that specific file.
The C# logic for this is:
DateTime dt = DateTime.Parse("1/1/2018"); //Just set from your parameter
string str_dt = dt.ToString("ddMMMyyyy");
string fname = "Asia_Sale_" + str_dt + ".xlsx";
Once you've got that, store it in a variable and use that variable as the file mask of your Foreach Loop, so the loop only picks up that specific file.
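The name-building step can be sketched independently of SSIS. This Python illustration uses a hypothetical function `expected_file_name` and a hard-coded English month table so the result does not depend on the machine's locale; the prefix and extension are taken from the file names in the question.

```python
from datetime import date

# English abbreviations, fixed here so the result is locale-independent.
MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def expected_file_name(param_date, prefix="Asia_Sale_", ext=".xlsx"):
    """Build the ddMMMyyyy-style file name that matches the parameter date."""
    month = MONTH_ABBR[param_date.month - 1]
    return f"{prefix}{param_date.day:02d}{month}{param_date.year}{ext}"

print(expected_file_name(date(2018, 6, 20)))
print(expected_file_name(date(2018, 5, 7)))
```

A parameter date with no corresponding file simply yields a name that matches nothing, so the loop body never runs.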
I have 3 SSIS variables, namely name, age and gender, with initial values set. I want to write these values into an Excel sheet in one row. Later I will extend this to an array of records.
To do this I have created an Excel connection pointing to the Excel sheet I want to write to.
I added a Data Flow Task, double-clicked it, and then added a Derived Column component to create a derived column for each of the 3 variables above. Inside the Derived Column editor I selected the variables as new derived columns.
I then connected an Excel Destination component and mapped the sheet columns to the derived columns. I executed the SSIS package and it was successful, but the variables are not written into the Excel sheet.
What am I doing wrong?
Again, you need a source. I gave you an "easy" solution. This is probably the best solution to your problem:
This time the source will be a script component (select Source).
Steps after you add Script Component:
Select Source
Go to Inputs and Outputs
Add your Output Columns (Don't forget about data types)
Go back to Script
Add your variables (Gender, Name and Age)
Go into Script
Add the following code
public override void CreateNewOutputRows()
{
    Output0Buffer.AddRow();
    Output0Buffer.Age = Variables.Age;
    Output0Buffer.Gender = Variables.Gender;
    Output0Buffer.Name = Variables.Name;
}
You need a source. The easiest would be to use a SQL connection.
Use a variable of type string named SQL.
Set SQL = "Select '" + name + "' as name, " + age + " as age, '" + gender + "' as Gender"
Set your source to SQL variable.
Connect this Source to Destination and you should have 1 row with 3 columns
Listing the steps clearly, as suggested by @KeithL:
Create an SSIS variable selectQueryVariables with the string data type.
Assign the variable expression as
"SELECT '"+@[User::name]+"' as Name,'"+@[User::gender]+"' as Gender,"+(DT_WSTR,4 )@[User::age]+" as Age"
Add OLE DB Source component and set data access mode as SQL command from variable and select the variable selectQueryVariables in dropdown. Now the source is ready with 3 columns Name, Age and Gender.
Pipeline this with Excel Destination and map columns source and destination.
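To see what the variable expression evaluates to, here is a small Python sketch that assembles the same single-row SELECT. The function `build_select` is hypothetical, and the doubling of single quotes is an added precaution that is not in the answers above; without it, a name such as O'Brien would produce invalid SQL.

```python
def build_select(name, gender, age):
    """Assemble a one-row SELECT from three values, as the SSIS expression does."""
    def quote(text):
        # Assumption beyond the original answers: escape embedded single quotes.
        return "'" + text.replace("'", "''") + "'"
    # int() mirrors casting the numeric age variable to text in the expression.
    return f"SELECT {quote(name)} AS Name, {quote(gender)} AS Gender, {int(age)} AS Age"

print(build_select("Ann", "F", 30))
print(build_select("O'Brien", "M", 41))
```

The resulting string is what the OLE DB Source would run when its data access mode is set to SQL command from variable.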
Can you guys please help?
I have a problem loading data from Excel into a database when the source file name is dynamic.
For example, for this month, my file name is ABC 31122017.xlsx. I successfully loaded the data from each tab in this Excel file into the database.
But how do I make it dynamic? For example, next month I will have the Excel file
ABC 31012018.xlsx. How do I make the job dynamic so it picks up the new file?
I am able to put the date in a variable, but I don't know how to proceed with the file path in SSIS.
@[User::InputPath] + "ABC " + @[User::Report_DT_DDMMYYYY] + ".xlsx"
I have already used this in the Expressions of the connection, set on ExcelFilePath, but it did not work.
In the Excel Source in SSIS, I had already chosen ABC 31122017.xlsx and selected the first tab. But after I added the expression, it could no longer find the tab I had chosen.
Please help me, guys. Thank you.
Maybe the explanation below will help you overcome this issue (I have SSIS 2012):
The first SSIS variable, named TodayDate, holds the date value, e.g. "20180218". Its value changes according to today's date.
The second SSIS variable, named FileNameExcel, holds the file name and is set by the expression "D:\\SSIS\\StackOverFlowTest1\\InputFiles\\AB " + @[User::TodayDate] + ".xlsx" (backslashes must be escaped in SSIS expression string literals).
Create a connection manager for Excel and, in its Properties window, open Expressions and set ExcelFilePath to @[User::FileNameExcel].
Change "Delay Validation" to True under "Data Flow Task" property.
Using a foreach:
Set up a string variable fileName (put in the actual file path / file name so you can develop your package).
Add a Foreach Loop (File Enumerator, which is the default)
Set Expression for Directory = @InputPath
Set Files to the proper mask for your Excel file (i.e. "ABC *.xlsx")
Go to Variable Mappings and link to fileName
Add an Excel connection and connect to your actual file.
Set an expression on its properties: ExcelFilePath = @fileName
Set Delay Validation to True
Develop your data flow(s) as normal.
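To get a feel for which files a mask such as "ABC *.xlsx" selects, the following Python sketch applies the same wildcard to a list of names. The function `matching_files` is hypothetical, and `fnmatchcase` is only an approximation of the Foreach File Enumerator: it is case-sensitive, whereas Windows file masks are not.

```python
from fnmatch import fnmatchcase

def matching_files(file_names, mask="ABC *.xlsx"):
    """Return the names that match the wildcard mask."""
    return [name for name in file_names if fnmatchcase(name, mask)]

candidates = [
    "ABC 31122017.xlsx",
    "ABC 31012018.xlsx",
    "ABD 31012018.xlsx",   # different prefix, not matched
    "ABC 31012018.csv",    # different extension, not matched
]
print(matching_files(candidates))
```

If more than one month's file sits in the folder, the loop processes each match in turn, so old files may need to be archived or the mask narrowed.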
I'm trying to solve this problem for a long time, but now I have to ask for your help.
I have one QVD file on my local PC named e.g. server001_CPU.qvd and on remote servers I have shared folder with many files of many types. There are also files named server001_CPU_YYYYMMDD.csv (e.g. server001_CPU_20140806.csv) that are generated every day and that have same structure as local qvd file. They have column DATE. What I need is (in loading script) to check last DATE in local file and load remote files starting from that day to today and then concatenate it together. Something like this:
CPU:
LOAD * FROM server001_CPU.qvd
LET vMAX = Max(DATE) FROM CPU
DO WHILE vMAX <= Today()
CPU:
LOAD * FROM serverpath/server001_CPU_$(vMAX).csv
LOOP
I'm really trying but I'm new to QV and it has strange logic for me. Thanks in advance for any help.
You can try the below script snippet which should do what you need.
What this does is first open your existing data set (in the QVD), and then finds the maximum date and stores it in table MaxCPUDate. This maximum value is then read into a variable and the table is dropped.
This "Max Date" value is then subtracted from today's date to determine the number of loops to execute to load the individual files. The loop variable is added on to the "Max Date" value to create the filename to load.
CPU:
LOAD
    *
FROM server001_CPU.qvd (qvd);

MaxCPUDate:
LOAD DISTINCT
    max(DATE) as MaxDate
RESIDENT CPU;

LET vMaxCPUDate = peek('MaxDate',-1,'MaxCPUDate');
DROP TABLE MaxCPUDate;

FOR vFileNum = 0 TO (num(Today()) - $(vMaxCPUDate))
    LET Filename ='serverpath/server001_CPU_' & date($(vMaxCPUDate) + $(vFileNum),'YYYYMMDD') & '.csv';
    CONCATENATE (CPU)
    LOAD
        *
    FROM $(Filename) (txt, codepage is 1252, embedded labels, delimiter is ',', msq);
NEXT
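The file names that the FOR loop generates can be previewed with a short Python sketch. This is an illustration only: `files_to_load` is a hypothetical helper, and it relies on Qlik storing dates as day numbers counted from 30 December 1899, which is what makes the subtraction in the loop bound work.

```python
from datetime import date, timedelta

QLIK_EPOCH = date(1899, 12, 30)  # day 0 of Qlik's numeric date representation

def files_to_load(max_date_num, today, base="serverpath/server001_CPU_"):
    """List the CSV names from the max date in the QVD up to today, inclusive."""
    today_num = (today - QLIK_EPOCH).days
    names = []
    # FOR 0 TO n in Qlik includes both ends, hence the + 1 here.
    for file_num in range(0, today_num - max_date_num + 1):
        day = QLIK_EPOCH + timedelta(days=max_date_num + file_num)
        names.append(f"{base}{day:%Y%m%d}.csv")
    return names

max_num = (date(2014, 8, 6) - QLIK_EPOCH).days
for name in files_to_load(max_num, today=date(2014, 8, 8)):
    print(name)
```

Note that the first name corresponds to the maximum date already present in the QVD, so that day's file is loaded again; depending on the data, that may add duplicate rows for that date.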