How to use the loop and bulk load tasks to insert the name of the CSV files being looped? - sql-server-2008

Description
I have created an SSIS package that imports data from hundreds of CSV files on a daily basis.
I have used the bulk load (Bulk Insert) task and a Foreach Loop container.
Problem
I have created a column on the database table and want to know if it is possible to add the source file name to each row of data.

If you have the file name in a variable (which you can capture in the Foreach Loop), then you just use the variable as the data source for the column. Or there may be a system variable that contains the file name; poke around a bit in the system variables available to you and see.
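The bulk load task itself has no place to attach an extra column, so one possible workaround is to bulk load each file into a staging table that mirrors the CSV layout and then copy the rows into the real table together with the file name via an Execute SQL Task whose parameter is mapped to the loop's file-name variable. A minimal T-SQL sketch; dbo.Staging_Csv, dbo.Import, the column names and @FileName are illustrative assumptions, not names from the original package:

```sql
-- Hedged sketch: dbo.Staging_Csv mirrors the CSV layout; dbo.Import has the extra
-- SourceFileName column. @FileName is mapped to the Foreach Loop's file-name variable.
INSERT INTO dbo.Import (Col1, Col2, Col3, SourceFileName)
SELECT Col1, Col2, Col3, @FileName
FROM   dbo.Staging_Csv;

TRUNCATE TABLE dbo.Staging_Csv;   -- leave staging empty for the next file in the loop
```

If you use a Data Flow instead of the bulk load task, a Derived Column transformation with the expression @[User::FileName] achieves the same thing without the staging hop.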

Related

How to load multiple files into multiple tables using SSIS for each loop

I am a beginner with SSIS. I have to load multiple files into multiple destinations using a Foreach Loop in SSIS. The problem is that the Foreach Loop picks up the file names dynamically, but the destination does not change and keeps pointing to the very first table. In other words, the job loads the first file correctly, but for the second file it still uses the first table's columns, which I guess is what causes the error.
I have used the variables below:
FileName
FolderPath
TargetTable
(Screenshots in the original question show the first and second tables' source data in the flat files, the Control Flow, the Data Flow, and the error message.)

SSIS package design, where 3rd party data is replacing existing data

I have created many SSIS packages in the past, though the need for this one is a bit different from the others I have written.
Here's the quick description of the business need:
We have a small database on our end sourced from a 3rd party vendor, and this needs to be overwritten nightly.
The source of this data is a bunch of flat files (CSV) from the 3rd party vendor.
Current setup: we truncate the tables of this database, and we then insert the new data from the files, all via SSIS.
Problem: There are times when the files fail to arrive, and what happens is that we truncate the old data but don't have the fresh data set. This leaves us with an empty database, when we would prefer to have yesterday's data over no data at all.
Desired Solution: I would like some sort of mechanism to see if the new data truly exists (these files) prior to truncating our current data.
What I have tried: I tried to capture the data from the files, add it to an ADO recordset, and only proceed if that part was successful. This doesn't seem to work for me, as I have all the data-capture activities in one data flow and I don't see a way to reuse that data. It would also seem wasteful of resources to do that and let the in-memory tables just sit there.
What have you done in a similar situation?
If the files are not present, set flags such as IsFile1Found to false and pass these flags to a stored procedure that truncates on a conditional basis (a sketch follows).
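A minimal T-SQL sketch of such a conditional-truncate procedure, assuming a hypothetical dbo.VendorData table and one bit parameter per file flag; none of these names come from the original package:

```sql
-- Hedged sketch: object and parameter names are placeholders.
CREATE PROCEDURE dbo.usp_PrepareVendorData
    @IsFile1Found bit     -- set by the SSIS package before this procedure is called
AS
BEGIN
    SET NOCOUNT ON;

    -- Only throw away yesterday's data when the new file actually arrived.
    IF @IsFile1Found = 1
        TRUNCATE TABLE dbo.VendorData;
END;
```

The package would then run the import only when the flag is true, so a missing file leaves yesterday's rows in place.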
If a file might be empty, you can use PowerShell through an Execute Process Task to extract the first two rows; if there are two rows (a header plus a data row), the data file is not empty. Then you can truncate the table and import the data.
Another approach could be:
Load the data into staging tables, insert the data from those staging tables into the destination tables using a SQL stored procedure, and truncate the staging tables after the data has been moved to all the destination tables. This way, before truncating a destination table, you can check whether the staging tables are empty.
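A hedged T-SQL sketch of that check, with dbo.Staging_Orders and dbo.Orders standing in for whatever the real staging and destination tables are:

```sql
-- Hedged sketch: refresh the destination only if the staging load produced rows.
IF EXISTS (SELECT 1 FROM dbo.Staging_Orders)
BEGIN
    TRUNCATE TABLE dbo.Orders;

    INSERT INTO dbo.Orders
    SELECT * FROM dbo.Staging_Orders;    -- assumes matching column order and types

    TRUNCATE TABLE dbo.Staging_Orders;   -- leave staging empty for the next run
END;
-- Otherwise yesterday's data in dbo.Orders is left untouched.
```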
I looked around and found that some others were struggling with the same issue, though none of them had a very elegant solution, nor do I.
What I ended up doing was to create a flat file connection to each file of interest and have a task count the records and save the count to a variable. If a file isn't there, the package fails and you can stop execution at that point. For some of these files the actual count is interesting to me, though for the most part I don't care. If you don't care what the counts are, you can keep recycling the same variable, which reduces the number of variables you have to create (I needed 31). To conserve resources (read: reduce package execution time), I excluded all but one of the columns from each data source; it made a tremendous difference.

dynamically adding derived column in SSIS

I have a scenario where my source can be on different versions of our database; as a result, the source file could have a different number of columns, while my destination has a fixed number of columns.
What we are trying to do is: load data from the source to flat files, move them to a central server, and then load that data into the central database. But if any column is missing in a flat file, I need to add a derived column.
What is the best way to do this? How can I dynamically add derived columns?
You can either do this with BiMLScript as others have suggested in the comments, or you can write a Script Task that reads the file, analyzes the contents, and imports it. Yet another option would be to bulk import the file as-is to a staging table (which would have to be dropped and re-created every time) and write a stored procedure that analyzes the DDL and contents and imports the data to the destination table.
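A hedged T-SQL sketch of that last option, using assumed table names dbo.StagingFile and dbo.Destination: it builds an INSERT over only the columns the staging table actually has, so any column missing from the source file simply falls back to the destination's default or NULL instead of needing a derived column in the package:

```sql
-- Hedged sketch: dynamic INSERT over the columns common to staging and destination.
DECLARE @cols nvarchar(max), @sql nvarchar(max);

SELECT @cols = STUFF((
    SELECT ',' + QUOTENAME(d.COLUMN_NAME)
    FROM   INFORMATION_SCHEMA.COLUMNS AS d
    JOIN   INFORMATION_SCHEMA.COLUMNS AS s
           ON  s.TABLE_NAME  = 'StagingFile'      -- staging table re-created per file
           AND s.COLUMN_NAME = d.COLUMN_NAME
    WHERE  d.TABLE_NAME = 'Destination'
    FOR XML PATH('')), 1, 1, '');

SET @sql = N'INSERT INTO dbo.Destination (' + @cols + N') '
         + N'SELECT ' + @cols + N' FROM dbo.StagingFile;';

EXEC sp_executesql @sql;   -- columns absent from the file keep their defaults / NULL
```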

Load and replace file path string with the content from that file in a MySQL database

I have a database of entries consisting of a 'name', 'id' and a 'description', but currently the 'description' field is set to the file path of a .txt file that actually contains the description content. Each .txt file's name is the corresponding row's 'id' plus the .txt extension, and they all reside in the same directory.
Can I load and replace each 'description' field with the content from the relevant text file (using MySQL)?
You can't write a MySQL query directly that will read the description values from your file system. That would require the MySQL server to be able to read raw text from files in your file system. You Can't Do That™.
You certainly can write a program in your favorite host language (PHP, Java, Perl, you name it) to read the rows from your database and update your description rows.
You could maybe contrive to issue a LOAD DATA INFILE command to read each text file, but the text files would have to be very carefully formatted to resemble CSV or TSV files.
Purely using MySQL this would be a difficult, if not impossible, exercise because MySQL does not really offer any means to open files.
The only way to open an external text file from MySQL is the LOAD DATA INFILE command, which imports the text file into a MySQL table. What you can do is write a stored procedure in MySQL that:
1. Creates a temporary table with a description field large enough to accommodate all descriptions.
2. Reads all id and description field contents from your base table into a cursor.
3. Loops through the cursor and uses LOAD DATA INFILE to load the given text file's data into your temporary table. This is where things can go wrong. The account under which the MySQL daemon / service runs needs access to the directories and files where the description files are stored. You also need to parametrise the LOAD DATA INFILE command to read the full contents of the text file into a single field, so the FIELDS TERMINATED BY and LINES TERMINATED BY parameters must be set to values that cannot be found in any of the description files. Even for this, you need a native UDF (user-defined function) that can execute command-line programs, because running LOAD DATA INFILE directly from stored procedures is not allowed. See Using LOAD DATA INFILE with Stored Procedure Workaround-MySQL for a full description of how to do this.
4. Issues an UPDATE statement using the id from the cursor to update the description field in your base table from the temporary table (a sketch of steps 3-4 for a single file follows this list).
5. Deletes the record from your temp table.
6. Goes to 3.
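For a single file, the core of steps 3 and 4 looks roughly like the sketch below. The table name entries, the directory and the terminator strings are assumptions, the MySQL server account must be allowed to read that directory (see secure_file_priv), and this only illustrates the idea rather than the full stored-procedure/UDF workaround:

```sql
-- Hedged sketch for a single row (id = 42); names and paths are placeholders.
CREATE TEMPORARY TABLE tmp_description (content MEDIUMTEXT);

LOAD DATA INFILE '/path/to/descriptions/42.txt'
INTO TABLE tmp_description
FIELDS TERMINATED BY '|~|' ESCAPED BY ''   -- strings assumed never to appear in the files,
LINES  TERMINATED BY '#~#';                -- so the whole file lands in one row and one field

UPDATE entries
SET    description = (SELECT content FROM tmp_description LIMIT 1)
WHERE  id = 42;

DROP TEMPORARY TABLE tmp_description;
```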
It may be a lot easier to achieve this from an external programming language that has better file-manipulation functions and can update each record in your base table accordingly.

SSIS - dynamic column mappings

I am using SSIS to do data transformation from Excel to OLE DB SQL. I have a set of sheets in a folder that I'll have to loop through and insert the data from each of these sheets into a table. The Excel sheets have different column structures. I can loop through each sheet with a Foreach Loop enumerator, find the file name, and pass it on to the Excel source.
I want to know if there is a way to avoid these column mappings in the destination component, which in my case will be an OLE DB SQL table, because the mappings are different for each file. Is there a way to do this dynamically?
While you can add a script task inside the loop to modify the mappings, it is not the easiest thing to do, as you have to explicitly create each mapping in code. A simpler solution would be to replace the Excel sheets with delimited text files and import them inside the loop using the Bulk Insert Task (see the sketch below). You will not have to provide any mapping info as long as the column order is the same in both the files and the tables.
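For reference, the Bulk Insert Task is essentially a wrapper around the T-SQL BULK INSERT statement, which matches file columns to table columns by position, so no mapping is needed. A hedged sketch with an assumed table name, file path and delimiters:

```sql
-- Hedged sketch: table name, path and delimiters are assumptions.
BULK INSERT dbo.TargetTable
FROM 'C:\import\current_file.txt'
WITH (
    FIELDTERMINATOR = ',',    -- delimiter used when the sheets are saved as text
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2       -- skip the header row
);
```

Inside the loop, the task's source connection can be pointed at the current file by driving the flat file connection string from the Foreach Loop's file-name variable with an expression.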
If you can't do that, you will have to store the mapping metadata somewhere, e.g. a table in your database, and use it to create the dynamic mappings in a Script task.
A slightly easier way is to use just one script task that uses the metadata to configure a SqlBulkCopy object. You lose the flexibility of a full Data Flow task, but if all you want is to load some files into temp tables, it is good enough. Plus, it's a whole lot easier to configure a SqlBulkCopy object than a Data Flow task.