how to handle middle of data failure case in ssis - ssis

Hi I have one doubt in ssis
I want load source excel file data into sql server database table.
Source excel file have billions of data(huge data).
whiel loading time halfoffrecords are loaded into destination table after that its failed due some data comes incorrect format .
in this sistuvation how will handle package for loading all data into destination using ssis.
source: excel(Emp information data)
destination : Table : emp
I tried using check point configuration for rerun at the point of failure..
but its not usefully to handled data row level and duplicate data is loading.
and I tried another way truncate data in the destination table.
after that I used redirect row for error handling.buts its not good implementation due to
trunating destination table.
please tell me how many way to achive this task(complete load) in ssis package level.

Load your data from Excel into a staging table, which you truncate before every load.
Make all of the columns of the staging table nvarchar(max) type, so they can handle any format of incoming character data.
Then run a stored procedure that de-dupes, formats and transfers the data to the final destination table.

Related

SSIS package design, where 3rd party data is replacing existing data

I have created many SSIS packages in the past, though the need for this one is a bit different than the others which I have written.
Here's the quick description of the business need:
We have a small database on our end sourced from a 3rd party vendor, and this needs to be overwritten nightly.
The source of this data is a bunch of flat files (CSV) from the 3rd party vendor.
Current setup: we truncate the tables of this database, and we then insert the new data from the files, all via SSIS.
Problem: There are times when the files fail to come, and what happens is that we truncate the old data, though we don't have the fresh data set. This leaves us without a database where we would prefer to have yesterday's data over no data at all.
Desired Solution: I would like some sort of mechanism to see if the new data truly exists (these files) prior to truncating our current data.
What I have tried: I tried to capture the data from the files and add them to an ADO recordset and only proceeding if this part was successful. This doesn't seem to work for me, as I have all the data capture activities in one data flow and I don't see a way for me to reuse that data. It would seem wasteful of resources for me to do that and let the in-memory tables just sit there.
What have you done in a similar situation?
If files are not present update some flags like IsFile1Found to false and pass these flags to stored procedure which truncates on conditional basis.
If file is empty then Using powershell through Execute Process Task you can extract first two rows if there are two rows (header + data row) then it means data file is not empty. Then you can truncate the table and import the data.
other approach could be
you can load data into some staging table and from these staging table insert data to the destination table using SQL stored procedure and truncate these staging tables after data is moved to all the destination table. In this way before truncating destination table you can check if staging tables are empty or not.
I looked around and found that some others were struggling with the same issue, though none of them had a very elegant solution, nor do I.
What I ended up doing was to create a flat file connection to each file of interest and have a task count records and save to a variable. If a file isn't there, the package fails and you can stop execution at that point. There are some of these files whose actual count is interesting to me, though for the most part, I don't care. If you don't care what the counts are, you can keep recycling the same variable; this will reduce the creation of variables on your end (I needed 31). In order to preserve resources (read: reduce package execution time), I excluded all but one of the columns in each data source; it made a tremendous difference.

dynamically adding derived column in SSIS

I have a scenario where my source can be on different versions of our database as a result the in source file I could have different number of columns while my destination have defined number of columns.
now
what we are trying to do is:
load data from source to flat files. move them to central server and
then load that data into central database. but if any column is
missing in flat file i need to add derived column.
what is the best way to do this?? how can i dynamically add derived columns?
You can either do this with BiMLScript as other have suggested in comments, or you can write a script task that reads the file, analyzes the contents, and imports it. Yet another option would be to bulk import the file as is to a staging table (that would have to be dropped and re-created everytime) and write a stored procedure that analyzes the DDL and contents, and imports data to the destination table.

add parameter value to table at import SSIS

I am importing an Excel table from an ftp site to SQL using SSIS. The destination table is going to be used to calculate good and bad records based on another SQL database. Here is my problem. The Excel file is name RTW_032613_ABC_123.xls. This file name is a concatenation of a number of fields. I cannot recreate it based on the fields in the table, so I need to retain it and pass it to the new table in SQL. I have a parameter #FileName that I am using to loop through the files in the ftp folder. What I would like to do is either combine the import of data from the Excel file with the file name or insert the file name in each record after the import. I am calling the SSIS procedure from another stored procedure in SQL. I tried adding a SQL data flow task but I am not seeing where I add the insert statement on either the SQL Server Compact Destination or SQL Server Destination.
I am over my head with SSIS on this one. The key is that the parameter that I need is available in SSIS but I really need to get it passed on to my SQL table.
TIA
If I'm reading your question right, you have an SSIS package with a variable containing the filename and you want to save the filename with each row that you are sending to your SQL table? If so:
Add a derived column to the data flow, making a new column and referencing the variable in the expression
Include that new column in the mapping for the destination of your data flow, sending the filename to whichever column you would like to save that data in.
No need for a seperate SQL task.

Reading excel from DB (varbinary) in SSIS

First I'd like to say that I'm brand new to SSIS so bear with me if this is a very basic question. I've searched and cannot find an answer.
I need to read data from SQL Server that is stored in a varbinary column that contains an excel document. I then need to store this data into another table with the appropriate columns (pre-defined format).
My question is essentially... How do I read this varbinary data into something I can work with and then insert into another table?
You could use Export Column Transformation available within the Data Flow Task to read the varbinary data and then save it as a file on local disk where the SSIS package is running.
MSDN documentation about Export Column transformation.
Sample: The Export Column Transformation on BI Monkey
Using another data flow task, you can read the saved file and import the data into the table of your choice.

ssis want to update a sql table based on a flat file

Rather new to SSIS so not sure how to handle this.
I have a flat file which i managed to successfully read from. So right now my data flow consists of just a flat file source.
What i want to do is something like this:
Update SqlTable S
set s.columnA = f.columna
from FlatFile f
where s.columnID = f.columnID
Right now the only way i can see of doing this would be to insert the contents of the flat file into a sql table, then doing my update. This seems wasteful considering i don't need to save the data of the flat file. I just need to update an existing sql table based on the data in the flat file. So is there some way to run the query directly in the SSIS package instead of having to insert a bunch of data into a sql table that i will just wind up dropping?
thanks
Update SqlTable S set s.columnA = f.columna from FlatFile f where s.columnID = f.columnID
That statement above is a SQL statement. You cannot connect a sql table to a flat file. You need to work in SQL to do an update, since that is where the table lives
You have 2 choices:
Use an OLEDB Command component within the data flow. The downside is this calls the statement for each record, so if you have 1,000s of records it is very inefficient.
Push the records to a table using an OLE DB Destination and then you can call your update using an Execute SQL Task. You can then truncate the table if you like
A possible 3rd option is to roll your own OLE DB destination to do an update on record sets vs records.
While this might sound wasteful, to create a table in the database to store update records, it is done very often. You just drop the worktable or truncate when complete.
You could add an OLE DB Command component to the Data Flow that retrieves data from the flat file. The OLE DB Command would do a single row update for each record retrieved from the flat file. This might be okay if there are few rows in the flat file; but, you can imagine how bad performance will be if there are many rows in the flat file.
I think you'll find that sending the flat file rows to a database table and running a single UPDATE is going to be the best performer for lots of data.
I haven't tried this but have you tried sending to a recordset destination and then running the update using that?
The bulk load into a temporary table is the way to go and then do your updates from the temp table. As a previous poster says it is quite a common aproach to stuff data into a staging area prior to doing some more work with the data and then dropping or truncating the table