How can I load data from several source files into a SQL table that has a derived column, File_Name, so that each row shows the name of the file it was loaded from?
For example:
file1.txt contains
emp_id emp_name
1 abc
file2.txt contains
emp_id emp_name
2 adc
output table should contain
emp_id emp_name file_name
1 abc file1
2 adc file2
I have written several SSIS packages doing exactly what you seek. Assuming you're applying a foreach loop with a "Foreach File Enumerator" (reading from each file with a specific extension in a folder) you can set up an object variable under Collection that will function as an array storing all of your file names (including the complete path) and then under Variable Mappings set up a second Variable of type string (with Index = 0) that will temporarily store the name of the file (again includes the complete path). Call it FileWithFullPath.
There are steps that now need to be applied because you need to send the file name to a database field and the file name must not include the full path or extension (c:\documents\file1.txt becomes file1). To do this you'll need to add to your foreach loop the following in the order defined.
Add a Script Task. Under Script for ReadOnlyVariables, add the FileWithFullPath variable. Under ReadWriteVariables, create a new variable called FileNameOnly. Here we will be reading in the filename that currently includes the full path and then returning the file name excluding the path and extension. Select Edit Script and here you will apply the following C# code to make this happen.
Add "using System.IO;" to region Namespaces above.
Within the braces under public void Main(), add the following:
// Read the full path supplied by the Foreach Loop.
string PathandFileName = Dts.Variables["User::FileWithFullPath"].Value.ToString();
string FileNameWithExtension = Path.GetFileName(PathandFileName); // trim path
string FileNameOnly = Path.GetFileNameWithoutExtension(FileNameWithExtension); // trim extension
// Assign the final string value to the new variable.
Dts.Variables["User::FileNameOnly"].Value = FileNameOnly;
Save and build the code and then exit out.
Finally, under the Control Flow tab, add a Data Flow Task inside the foreach loop. Select the Data Flow tab and add your source and destination, with a Derived Column transformation in between. There you create a new column called FileName and drag into its expression the variable you just populated in the C# code, User::FileNameOnly. You then map this column to the FileName column in your destination database table, along with the other columns being read from that file.
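If you want to check the name-trimming logic outside of SSIS first, here is a minimal sketch. The class and method names (FileNameHelper, GetNameOnly) are made up for illustration and are not part of the package. It relies on Path.GetFileNameWithoutExtension, which already drops the directory part, so a single call should be enough.

```csharp
using System.IO;

// Hypothetical helper, not part of the SSIS package itself.
public static class FileNameHelper
{
    // Returns the file name with both the directory and the extension removed.
    // Note: which characters count as directory separators depends on the OS
    // the code runs on (on Windows both '\' and '/' are recognised).
    public static string GetNameOnly(string fileWithFullPath)
    {
        return Path.GetFileNameWithoutExtension(fileWithFullPath);
    }
}
```

Inside the Script Task you would pass it the value of User::FileWithFullPath and store the result in User::FileNameOnly.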
Give this a try and let me know if you have any questions.
Thanks.
Related
I have an issue regarding a deletion of files based on a selection.
So, I have an SQL table in which there is a column with the file name.
What I have to do is take the file name from there and delete the physical file with that name from a specified path.
What I have tried is to execute an sql task with the selection of full file paths (including file names) which need to be deleted and store them in an object variable and then try to do a ForEach Loop Container with a 'from variable enumerator' to take those results and delete the files using a file system task, but I don't think that's the right approach.
The execution of the package ends in success, but it ends at the ForEach Loop Container without reaching (executing) the file system task, so the files are not being deleted.
Select statement (also added the full path of the file as a column in order for the file system task to know where the physical file is stored) :
select file_full_path from t1
where flag = 'YES'
Do you guys know a solution to this?
What you can do is set the loop to Foreach ADO Enumerator, so your process would be:
1: SQL Task - with ResultSet set to Full result set, map file_full_path to an object variable, say 'filesdel'
2: Foreach Loop set to Foreach ADO Enumerator, using 'filesdel' as the source variable
3: Map the loop items to a string variable, say 'filename'
4: Use the string variable in the File System Task
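For comparison, this is roughly what the loop plus File System Task does, written as plain C#. It is only a sketch under my own assumptions: FileCleanup and DeleteFiles are hypothetical names, and the paths are assumed to be the file_full_path values returned by the query.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper that mirrors the Foreach Loop + File System Task.
public static class FileCleanup
{
    // Deletes every path that exists and returns how many files were removed.
    public static int DeleteFiles(IEnumerable<string> fullPaths)
    {
        int deleted = 0;
        foreach (string path in fullPaths)
        {
            // Skip paths that are already gone instead of counting them.
            if (File.Exists(path))
            {
                File.Delete(path);
                deleted++;
            }
        }
        return deleted;
    }
}
```

If the returned count is 0, the list itself was probably empty, which would match a loop that finishes successfully without ever reaching the File System Task.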
I have a Foreach File Enumerator that reads PDF file names from a folder and inserts each file name into a database. However, I want it to skip file names that have fewer than 3 underscores.
AAA_BBB_000004554_060420161906_S1234567H_M.pdf
AAA_BBB_000003345_060420161906_S9876543H_S.pdf
AAA_BBB_000008546_060420161906_S1234123H_V.pdf
AAA_BBB_201604.pdf
etc
AAA_BBB_201604.pdf should be excluded from the loop because the file name has only 2 underscores.
How can I achieve that? I did some searching and it seems that using an expression is the key, but I have no idea how to do it. Kindly help, thank you.
This can be done using TOKENCOUNT function in an Expression.
Create 2 variables
FileName of String type
TokenCount of Int32 type
Foreach Loop Container
Use a Foreach Loop Container and set the Collection to Foreach File Enumerator
Specify the folder location where your .pdf files exist
Set "*.pdf" under Files
Select the radio button Retrieve file name - Name only
Under Variable Mappings, map the retrieved file name to User::FileName
Next, put an Expression Task inside the Foreach Loop Container and use the following expression
@[User::TokenCount] = TOKENCOUNT( @[User::FileName] ,"_")
This uses the TOKENCOUNT function, which returns the number of tokens in a string (FileName in your case) that are separated by the specified delimiters ('_' in your case), and assigns the token count to an int variable, @[User::TokenCount]
Next, drop an Execute SQL Task and connect it from the Expression Task
In the Precedence Constraint Editor, provide the following constraint options: evaluate an expression so that the flow continues only when the token count is greater than 3, i.e. @[User::TokenCount] > 3
Configure the Execute SQL Task
Finally, it should look like this
I put a Script Task between the Expression Task and the Execute SQL Task for debugging purposes; if you want, you may use this
Running the package - let's say you want to load these file names from this folder
Since we gave the condition in the expression (token count > 3), after running the package only the file names that satisfy it will be loaded into the database
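To sanity-check the threshold against your own file names, the same rule can be sketched in C#. TokenFilter, TokenCount and ShouldLoad are names I made up; the split below ignores empty tokens, which I am assuming is close enough to TOKENCOUNT for names like yours that have no doubled underscores.

```csharp
using System;

// Hypothetical stand-in for the Expression Task + precedence constraint.
public static class TokenFilter
{
    // Counts the non-empty pieces of text separated by the delimiter.
    public static int TokenCount(string text, char delimiter)
    {
        return text.Split(new[] { delimiter }, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    // Same condition as the precedence constraint: token count > 3.
    public static bool ShouldLoad(string fileName)
    {
        return TokenCount(fileName, '_') > 3;
    }
}
```

With this rule AAA_BBB_201604.pdf gives 3 tokens and is skipped, while AAA_BBB_000004554_060420161906_S1234567H_M.pdf gives 6 tokens and is loaded.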
Hope this helps.
I have a folder having multiple files with the name as
P04_20140326_1234.zip
P04_20130324_58714.zip
P04_20130808_jurhet.zip
P04_20130815_85893.zip
etc
The name is in the format P04_systemdate_*.zip.
I want to pick the file containing the current date in its name, unzip it first, and load the data from each extracted file into the matching table, e.g. a file named A.txt goes into table A, a file named B goes into table B, and so on.
I guess you have already done the following:
Add a Data Flow
Inside the data flow, add a flat file source, and Ole_DB destination
Configure the flat file source to point to one of your files and connect all the appropriate columns so that data flows from file to database.
If all of this is already working, then let's do the For-Each loop
Create a variable (default to package root level) and call it CsvFileName of type string
Add a ForEach loop (not a For loop)
Change loop type to be a Foreach File Enumerator
Set your folder path and look for *.csv
Under Variable Mappings, add the variable "User::CsvFileName" and set the index to 0 - this means that each file name returned by the Foreach loop will be placed in the variable in turn.
In the Connection Managers (bottom) right click on the FlatFileSource, and choose properties
Set the DelayValidation to "True"
Click on Expressions, and then click on the ellipsis
Set the ConnectionString property to use the "CsvFileName" variable
Run it. This should load all files. Now, if you just want to restrict it to a date, here's what you do:
Create a variable called "FilterDate" of type String
Set the value to whichever date you want (20140322, for example)
In the ForEach loop, go to Collection, click on Expressions, then click the ellipsis
Set the FileSpec property to be "*" + @[User::FilterDate] + "*.csv"
Now it will only pick up the files that you want.
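If you would rather test the file mask outside the package, here is a small sketch of the same idea. DateFileSpec and its methods are hypothetical names of mine; the mask is built exactly like the FileSpec expression above, and Today() assumes the date in your file names is formatted as yyyyMMdd.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical helper that mimics the FileSpec expression on the Foreach loop.
public static class DateFileSpec
{
    // Builds the same mask as "*" + FilterDate + "*.csv".
    public static string Build(string filterDate)
    {
        return "*" + filterDate + "*.csv";
    }

    // Lists the files in the folder whose names match the date mask.
    public static string[] FindFiles(string folder, string filterDate)
    {
        return Directory.GetFiles(folder, Build(filterDate));
    }

    // Current date as yyyyMMdd (assumed format), e.g. to feed FilterDate.
    public static string Today()
    {
        return DateTime.Now.ToString("yyyyMMdd", CultureInfo.InvariantCulture);
    }
}
```

For the .zip files in the question you would change the extension in the mask accordingly.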
I have a complex task that I need to complete. It worked well before since there was only one file, but this is now changing. Each file has one long row that is first bulk inserted into a staging table. From here I'm supposed to save the file name into another table and then insert the broken-up parts of the staging table data. This is not the problem. We might have just one file or multiple files to load at once. What needs to happen is this:
The first SSIS task is a script task that does some checks. The second task prepares the file list.
The staging table is truncated.
The third task is currently a Foreach loop container task that uses the files from the file list and processes it:
File is loaded into table using Bulk Insert task.
The file name needs to be passed as a variable to the next process. This was done with a C# task before but it is now a bit more complex since there could be more than one file and each file name needs to be saved separately.
The last task is a SQL task that executes a stored procedure with the file name as input variable.
My problem is that before it was only one file. This was easy enough. What would the best way be to go about it now?
In the Data Flow Task which imports your file, create a derived column. Populate it with the variable that holds the file name. Load the file name into the same table.
Use an Execute SQL Task to retrieve a distinct list of file names into a recordset (Object type variable).
Use a For Each Loop container to loop through the recordset. Place your code inside the container. The code will receive the file name from the loop as the value of a variable and process the file.
Use an Execute SQL Task in the For Each Loop container to call the SP. Pass the file name as a parameter like:
Exec sp_MyCode param1, param2, ?
Where ? will pass filename INPUT as a string
EDIT
To make Flat File Connection to pick up the file specified by a variable - use Connection String property of the Flat File Connection
Select FF Connection, right click and select Properties
Click on empty field for Expressions and then click ellipsis that appears. With Expressions you can define every property of the object listed there using variables. Many objects in SSIS can have Expressions specified.
Add an Expression, select Connection String Property and define an expression with absolute path to the file (just to be on a safe side, it can be a UNC path too).
All the above can be accomplished using C# code in the script task itself. You can loop through all the files one by one and for each file :
1. Bulk Copy the data to the staging
2. Insert the filename to the other table
You can modify the logic as per your requirement and desired execution flow.
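A rough outline of that loop, with the two database steps left as callbacks because I don't know your tables or procedures. FileBatch, ProcessAll and both delegate parameters are names I invented for the sketch; inside a real Script Task the callbacks would wrap whatever bulk copy and insert code you use.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical outline of the per-file loop; the database work is injected.
public static class FileBatch
{
    // For each file: 1. bulk load it to staging, 2. record its name.
    // Returns the number of files processed.
    public static int ProcessAll(
        IEnumerable<string> files,
        Action<string> bulkLoadToStaging,
        Action<string> saveFileName)
    {
        int processed = 0;
        foreach (string file in files)
        {
            bulkLoadToStaging(file);              // step 1: full path goes to the loader
            saveFileName(Path.GetFileName(file)); // step 2: only the name is stored
            processed++;
        }
        return processed;
    }
}
```

Keeping the loop separate from the database calls makes it easy to try the ordering with dummy callbacks before wiring in the real ones.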
Add a column to your staging table - FileName
Capture the filename in a SSIS Variable (using expressions) then run something like this each loop:
UPDATE StagingTable SET FileName=? WHERE FileName IS NULL
Why are you messing about with C#? From your description it's totally unnecessary.
I have 3 CSV files in a folder which contain eid, ename and country fields, and the file names are test1_20120116_034512, test1_20120116_035512, test1_20120116_035812, etc. My requirement is to take the latest file based on time stamp and modified date, which I have done. Now I want to import the extracted file name into the destination table.
My destination table contains fields like
filepath, filename, eid, ename, country
I posted about this before on the same site and got an answer for extracting the file name; now I want to load the extracted FileName into the destination table
Import most recent csv file to sql server in ssis
My destination table should have output like
C:/source test1_20120116_035812 1234 tester USA
In your DataFlow task, add a Derived Column Transformation. The value of CurrentFile will be the fully qualified path to the file. As you only want the file name, I would look to use a replace function on that with the base folder and then strip the remaining slash. This does not strip the file extension but you can add yet another call to REPLACE and substitute an empty string
Derived Column Name: filename
Derived Column:
Expression: REPLACE(REPLACE(@[User::CurrentFile], @[User::RootFolder], ""), "\\", "")
The above expects it to look like
CurrentFile = "C:\source\test1_20120116_035812.csv"
RootFolder = "C:\source"
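To see what the nested REPLACE leaves behind for those two values, the same two substitutions can be written in C#. DerivedFileName and FromPath are made-up names for this sketch, and C# string.Replace is case-sensitive, so I am assuming the casing of RootFolder matches the path exactly.

```csharp
// Hypothetical check of the nested REPLACE logic, outside of SSIS.
public static class DerivedFileName
{
    // Removes the root folder from the full path, then strips remaining backslashes.
    public static string FromPath(string currentFile, string rootFolder)
    {
        return currentFile.Replace(rootFolder, "").Replace("\\", "");
    }
}
```

With the values above this yields test1_20120116_035812.csv, so the extension is still there; one more replace of ".csv" with an empty string would remove it.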
Edit
I believe you've done something in your approach that I did not do. You should see a warning about possible truncation but given the values discussed in this and the preceding question, I don't believe the 4k limit on expressions will be of concern.
Displaying the derived column
Demonstrating the derived column does work
I will give you a +1 for providing an approach I wasn't aware of, but you'll still need to add a derived column to match your provided format (base path name)
The full path is provided by the custom property. Use the REPLACE expression above to remove the path info, except use the column [FileName] instead of @[User::CurrentFile]
I tried to get the file name through the procedure that Billinkc gave, but it throws an error stating that the filename column failed because of a truncation error.
Anyhow, I tried a different approach to load the file name into the table.
Steps I used:
1. Right-click the Flat File Source and click Show Advanced Editor
2. Select the Component Properties tab
3. Inside the Custom Properties section there is a property FileNameColumnName
I assigned FileName to that property, like
FileNameColumnName ----> FileName. That's it; I am able to get the file name into my destination table.