I am running an SSIS package via a SQL Server 2008 Agent job. The package crashes at some point while running. I have created my own mechanism to catch the error and record it in a table, so I can see that a specific task failed, but I cannot find out what the error actually is.
When I run the same package from BIDS, it works perfectly - no error.
What I want to do is write the error string that is shown in the "Execution Results" tab to my own table.
So the question is: which system variable holds the error string in SSIS?
The error is stored in the ErrorDescription system variable. See Handling Errors in the Data Flow for an example of how to get the error description.
Also, if you want to capture error information into a table, SSIS supports logging to a table using the SQL Server Log Provider. You can also customize the logging.
Too easy.
Left-click (highlight) the object whose error event you want to capture (a Script Task, Data Flow, etc.).
Click on 'Event Handlers' - the screen should open with Executable = the object you clicked and Event Handler = OnError.
Click the 'click here to create...' link to create the handler.
Drag an Execute SQL Task from the SSIS Toolbox onto the event handler.
Configure it against the database/table where you want to house the error message.
Write INSERT INTO DB.Schema.Table (DBName, SchemaName, TableName, ErrorMessage, DateAdded)
Write VALUES (?, ?, ?, 'I am smart', GETDATE())
Click Parameter Mapping and select the User:: variables for the ?'s (plus, in my case, the hard-coded comment).
Since this is run at the database server, the ?'s are passed in as parameters. My comment is already hard-coded as a value, but you will have selected System::ErrorDescription as parameter 3. Remember, the parameter list is 0-based - DO NOT TRY TO NAME THE PARAMETERS; number them 0 to n. The data types are based on what you have going in; mine are all VARCHAR so... :)
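Pulled together, here is a minimal sketch of what the Execute SQL Task statement and its parameter mapping end up looking like (the database, table, and variable names are assumptions for illustration):

    INSERT INTO MyDb.dbo.PackageErrorLog (DBName, SchemaName, TableName, ErrorMessage, DateAdded)
    VALUES (?, ?, ?, ?, GETDATE())

    -- Parameter Mapping (0-based; use the ordinals 0..3 as the parameter names):
    --   0 = User::DBName
    --   1 = User::SchemaName
    --   2 = User::TableName
    --   3 = System::ErrorDescription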
This is a much better solution than just logging whatever the server allows you to.
I can also add a counter variable and adjust it wherever I like, then pass it to the OnError event. This lets me pinpoint exactly where the last successful object completed; it works best in scripting objects but is also available in other areas.
I'm using this so I can process thousands of cycles without actually failing the package. If a table or column doesn't exist, I simply log it for further review later. Oh yeah - I'm cycling through hundreds of databases capturing their architecture and the maximum column size actually used, not to be confused with the declared maximum column size.
Example: TelephoneNumber comes from a source column of char(500) (definitely bad programming, but... you can't change everything, so...). I capture the max LEN of that column and adjust the destination column to accommodate that size +/- a certain percentage.
If a table or column doesn't exist anymore, I log the error and keep churning. At the end, I can evaluate those entries and see whether I can actually remove them from my warehouse. This happens more in the TEST and STAGE environments than in PROD; however, when a change does go through to PROD, I will most definitely identify it as it comes into the warehouse.
Everything is configuration-driven - this includes dynamic MERGE/JOINs, INSERT, SELECT, ELEMENTS, SIZES, USAGESIZE, IDENTITY, SOURCEORDER, etc., with conversion of the data to the destination data types.
All that because the built-in logging will not give you the granularity you might need for this type of operation. An OnError event handler can, if set up properly.
Check this out! It explains, with a step-by-step process, how to configure SSIS logging so that it includes the error message.
I am implementing an SSIS package and am currently trying to do the following:
Truncate the destination table
Fetch the data by executing the stored procedure and insert it into the destination table.
I have created an Execute SQL Task to address step 1, and a data flow with an OLE DB source and OLE DB destination to address step 2. It has been working successfully so far, but it isn't working for one of my stored procedures that uses temp tables.
When I edit the OLE DB source and click the preview button, I get a "no columns returned" error.
I know that SSIS has an issue generating columns when executing stored procedures that depend on temp tables. I have converted the stored proc to use table variables, and it is now able to return columns in SSIS when I do a preview. The only downside is that the stored procedure is taking much longer to execute: 1 hour 15 minutes, compared to 15 minutes when using temp tables.
I did see a suggestion to use SET FMTONLY before executing the stored procedure as an alternative to changing to table variables, but that didn't seem to work - I get a syntax or permission-denied error.
Could somebody tell me a solution to my problem that does not compromise on performance?
Sounds like you've already read all the approaches to using Temp tables in SSIS, including the IF 1=0... trick? If you haven't seen that one yet, google it.
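For anyone who hasn't seen it: the trick is to prepend a branch that never executes but still hands the driver a result shape to read. A rough sketch (the column list and proc name are placeholders for your own):

    SET FMTONLY OFF
    SET NOCOUNT ON
    IF 1 = 0
    BEGIN
        -- never runs, but gives SSIS column names and types to latch onto
        SELECT CAST(NULL AS INT)          AS CustomerID,
               CAST(NULL AS NVARCHAR(50)) AS CustomerName
    END
    EXEC dbo.usp_MyProc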
You say that using table variables causes your stored procedure to take about 5 times longer than using temp tables. The most likely reason is that you are indexing your temp tables but not your table variables. If you didn't know that table variables can be indexed - they can (in SQL Server 2008, via PRIMARY KEY or UNIQUE constraints declared inline). You might try that.
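A sketch of what that looks like, assuming a simple key (the names and types are made up):

    DECLARE @Orders TABLE
    (
        OrderID  INT   NOT NULL PRIMARY KEY,   -- clustered index via the PK constraint
        Amount   MONEY NOT NULL,
        UNIQUE (Amount, OrderID)               -- a second index via a UNIQUE constraint
    )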
Finally, a solution that you haven't mentioned is that you can replace your temporary table with a real table that gets truncated when you're done using it.
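That last option can be as simple as swapping the temp table for a permanent one and truncating it around the load; the names below are made up for illustration. Bear in mind a permanent table is shared, so if the proc can run concurrently you would need to key rows by session or serialize the runs.

    -- one-time setup: a permanent work table instead of #Work
    CREATE TABLE dbo.WorkStaging (CustomerID INT, Amount MONEY)

    -- inside the proc: reference dbo.WorkStaging wherever #Work was used,
    -- then clear it when you're done so the next run starts empty
    TRUNCATE TABLE dbo.WorkStaging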
Short comment:
Try EXEC WITH RESULT SETS and specify the metadata yourself for a proc with temp tables; or use the Script Component as a source and specify the Output columns yourself.
Long comment:
Technically speaking, it is the driver/database you are using in SSIS that would decide the behavior when working with temp tables.
Metadata is an important factor when using SSIS's pipeline components. By metadata, I mean the names of the columns, their data types etc that a pipeline component uses. When designing a data flow, someone/something should provide this metadata to the components that require it.
In most cases, SSIS automatically retrieves the metadata. Components that do not connect to an external data source, like Conditional Split, get their metadata from the other components they are connected to. For the pipeline components that connect to an external data source (like OLE DB Source, OLE DB Destination, Lookup, etc.), SSIS provides a mechanism to get this metadata without human involvement. This mechanism involves the driver connecting to the database and retrieving the metadata of the output. If the driver/database is capable of returning the metadata, then that metadata is used. If the driver/database is incapable, then you get the errors you are seeing. The rest of my comments are based on the assumption that you are using a SQL Server database.
When working with a SQL Server database in SSIS, we typically use the native client drivers provided by Microsoft. When trying to get the metadata, these drivers try to get it without actually executing the SQL statement (actual execution can have side effects, and might take more than a few seconds/minutes/hours - you don't want side effects and long wait times at package design time). So to get the metadata, the driver relies on the metadata of the actual objects used in the SQL command.
If the command uses a physical table or view, SQL Server already has the metadata available and can supply it to the driver. If it is a temp table, SQL Server does not have the metadata until it can create the temp table. With the FMTONLY option, you can arrange for the temp tables to be created while avoiding any heavy processing or side effects, and thus retrieve the metadata without penalties.
Post-2012, the native client drivers rely on newer functionality to retrieve metadata than the earlier drivers did: in 2012 and after, the driver uses the sp_describe_first_result_set proc. So whether you can get metadata or not is determined by the ability of the sp_describe_first_result_set proc.
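You can see exactly what a post-2012 driver sees by calling that proc yourself (the proc name inside the batch is a placeholder):

    EXEC sp_describe_first_result_set @tsql = N'EXEC dbo.usp_MyProc'
    -- if the proc builds its output from a temp table, this call raises the same
    -- metadata-discovery error that the OLE DB Source surfaces at design time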
So while SSIS can automatically get the metadata in most cases (because of the driver/database), in some cases it cannot (again because of the driver/database). In the second scenario, some other process (typically a human) can help the driver infer the metadata, or provide the metadata to the component directly.
To help the driver, in the case of SQL Server 2012 and after, you can use the WITH RESULT SETS clause to specify the output metadata. When this clause is present, the driver uses it and doesn't try to query the metadata from system objects, thus avoiding the error you would otherwise get. If you are using the drivers that came with SQL Server 2008, you can use FMTONLY. This option is at the driver/database level.
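As a sketch of the first option (the names and types are placeholders you would replace with your proc's actual output), the SQL command in the OLE DB Source would look like:

    EXEC dbo.usp_MyProc
    WITH RESULT SETS
    ((
        CustomerID   INT,
        CustomerName NVARCHAR(100),
        CreatedDate  DATETIME
    ))
    -- the driver reads the metadata from this clause instead of asking
    -- SQL Server to work it out from the proc body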
Another option is to use a Script Component as the source and specify the columns/metadata yourself in its Output Columns. SSIS does not try to retrieve metadata from the data source in this case; it relies on the definitions you provide in the Output section of the Script Component.
As you can see, both options involve a human (or some other process) specifying the metadata instead of SSIS retrieving it in an automated fashion. I would prefer the first option when working with SQL Server and the second when working with databases like MySQL.
I'm really new to SSIS and I would like to automate some of my workflow. However, most of my tasks require that other jobs/packages in SQL Server Agent have already been run for the day/week by other team members. My question is: how can I direct my workflow based on the status of those other packages? Control Flow doesn't have a Conditional Split tool, and I can't kick off a separate job/package from a Data Flow. I can easily write a SQL statement to determine whether any or all of the related packages have been run, and even have it return a boolean value if needed, but I'm at a loss for how to make this direct the workflow to either proceed to the next step or fail the entire package if any of the others haven't been run (the desired result).
I really appreciate any help in this. Thank you!
In SSIS:
You can set conditions in the Control Flow by double-clicking the arrow (precedence constraint) that links one task to another. If you set up an Execute SQL Task to run a query that checks whether a package has run, returning the result into a variable, you can then set up a condition on that variable and only continue to the next task if it is (or isn't) a certain value.
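As a sketch, the Execute SQL Task could run something like the query below (the job names are placeholders), with the single-row result mapped to a package variable such as User::PrereqsRun; the precedence constraint would then use 'Expression and Constraint' with an expression like @[User::PrereqsRun] == 1:

    SELECT CASE WHEN COUNT(DISTINCT j.name) = 2 THEN 1 ELSE 0 END AS PrereqsRun
    FROM msdb.dbo.sysjobs j
    JOIN msdb.dbo.sysjobhistory h ON h.job_id = j.job_id
    WHERE j.name IN ('LoadSales', 'LoadInventory')  -- the jobs that must have run
      AND h.step_id = 0                             -- the job outcome row
      AND h.run_status = 1                          -- 1 = succeeded
      AND h.run_date = CONVERT(INT, CONVERT(CHAR(8), GETDATE(), 112))  -- today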
For example, you might set up a precedence constraint so that the next step only happens if the previous one was successful and a particular variable does not have a value of C.
In SQL Server Agent:
You could also - if you want to avoid running an entire package - set something up in the SQL Server Agent job instead. You could add an initial step which checks on the status of a package, then quits the job reporting success. This does mean that if the step fails for another reason (say the database it's trying to run the select statement on is down), it will still quit and report success, so be careful - you might want to set up some other specific steps first which check for such situations.
Here's an article I used a while back when I wanted to understand how to set up a SQL Server Agent job in this way:
http://sqlactions.com/2012/08/05/how-to-create-custom-schedule-for-sql-server-agent-job/
I have an SSIS project with 3 packages: a parent package which calls the other 2 based on some condition. In the parent package I have a Foreach Loop container which reads multiple .csv files from a location; based on the file name, one of the two child packages is executed and the data is uploaded into tables in MS SQL Server 2008. Since multiple files are read, if any file generates an error in the child packages, I have to log the error details (file name, error message, row number, etc.) in a custom database table, delete all the records that were uploaded into the table, and read the next file. The package should not stop for the files which are valid and don't generate any error when they are read.
Say a file has 100 rows and there is a problem at row 50: we need to log the error details in a table, delete rows 1 to 49 which were uploaded into the database table, and have the package start executing the next file.
How can I achieve this in SSIS?
You will have to set TransactionOption=*Required* on your Foreach Loop container and TransactionOption=*Supported* on the control flow items within it. This allows your transactions to be rolled back if any complications happen in your child packages. More information on the 'TransactionOption' property can be found at http://msdn.microsoft.com/en-us/library/ms137690.aspx
Custom logging can be performed within the child packages by redirecting the error output of your destination to your preferred error destination. However, this redirection logging only occurs on insertion errors. So if you wish to catch errors that occur anywhere in your child package, you will have to set up an 'OnError' event handler or utilize the built-in error logging for SSIS (SSIS -> Logging..)
I suggest you try creating two data flows in your loop container. The main idea here is to use a set of three tables to handle the error situations more easily. In the same flow you do the following:
1st dataflow:
Should read the .csv file and load the data into a temp table. If the file is processed with errors, you simply truncate the temp table. In addition, you should also configure the flat file source output to redirect error rows to an error log table.
2nd dataflow:
On the other hand, if the file is processed error-free, you need to transfer the rows from the temp table into the destination table. So here the OLE DB source is the temp table and the OLE DB destination is the final table.
Don't forget to truncate the temp table in both cases, as the next file will need an empty table.
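In T-SQL terms, the pattern looks roughly like this (the table names are made up; in SSIS the two data flows do the load and the promotion, and Execute SQL Tasks do the truncates):

    -- before each file: start with an empty staging table
    TRUNCATE TABLE dbo.FileStaging

    -- 1st data flow: .csv -> dbo.FileStaging, with the flat file source's
    -- error rows redirected to dbo.FileErrorLog

    -- 2nd data flow (runs only if the load succeeded): promote the rows
    INSERT INTO dbo.FinalTable (Col1, Col2)
    SELECT Col1, Col2
    FROM dbo.FileStaging

    -- on failure: just truncate dbo.FileStaging and move on to the next file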
Let's break this down a bit.
I assume that you have a data flow that processes an individual file at a time. The data flow would read the input file via a source connection, transform it and then load the data into the destination. You would basically need to implement the Error Handler flow in your transformations by choosing "Redirect Row". Details on the Error Flow are available here: https://learn.microsoft.com/en-us/sql/integration-services/data-flow/error-handling-in-data.
If you need to skip an entire file due to a bad format, you will need to implement a Precedence Constraint for failure on the file system task.
My suggestion would be to get a copy of the exam preparation book for exam 70-463 - it has great practice examples on exactly the kind of scenarios that you have run into.
We do something similar with Excel files.
We have an ErrorsFound variable which is reset each time a new file is read within the for each loop.
A script component validates each row of the data and sets the ErrorsFound variable to true if an error is found, and builds up a string containing any error details.
Then - based on the ErrorsFound variable - either the data is imported or the error is recorded in a log table.
It gets a bit more tricky when the Excel files are filled in badly enough that the process can't read them at all - for example, when text is entered in a date, number or currency field. In this case we use the OnError event handler of the Data Flow Task to record an error in the log, but we won't know which row(s) caused the problem.
I have 2 questions about SSIS buffers.
1) In my SSIS package I am using 1 Data Flow Task containing 2 OLE DB sources and 2 OLE DB destinations. Both are independent, and all settings are left at their defaults. We know the DefaultBufferMaxRows is 10000 rows. So if I run this package, is the number of records per buffer 10000 for each, or 5000 for each?
2) I have tried to use SSIS logging (XML and database), but it is not showing anything. It creates the XML file, but there is no useful information in it (other than some XML tags), and it is not creating any table.
The logging event window is also not displaying anything. Can you please help me?
Addressing the second question as I'd need to do research on the first.
I find logging to SQL Server to be my preferred destination. The other options (file, event viewer, trace file, etc.) are fine and dandy, but to sift through that data, a query is my tool.
You state it isn't showing anything useful, what did you expect it to show and what have you selected?
I generally select the following events: OnInformation, OnError, OnWarning, OnPreExecute, OnPostExecute. The first three provide information about what's wrong, potentially wrong or could be improved with my packages. The last two I use to establish duration of various tasks.
Check them only at the top level. I had a coworker who had checked the above events at the package level, but on each sub-task they checked one event. They had anticipated it would inherit the logging established at the root and add in the single event selected at that level. The reverse was true: only the innermost checked items were logged.
Where does everything get logged? In 2005, it'll be found in dbo.sysdtslog90. For 2008 forward, it can be found in dbo.sysssislog. The master copy of this table lives in msdb, but if you point your OLE DB connection at a different catalog (SALES, AdventureWorks, etc.), the first invocation of the package will result in that table being copied down into the target catalog.
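Once events are flowing into that table, a simple query gets you further than any of the other providers; something like this (filter to taste):

    SELECT source, event, starttime, endtime, message
    FROM dbo.sysssislog
    WHERE event IN ('OnError', 'OnWarning', 'OnInformation')
    ORDER BY starttime DESC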
I'm quite new to SSIS - using the 2008 version.
I have a job that uses a few data flow tasks. On the third one I'm getting a primary key violation on the last row that it needs to insert, but only sometimes!
I'd like to ignore this problem for now and let the job continue. I have set the MaximumErrorCount property to 10 for the Data Flow Task, the Sequence Container and the package, but the task still fails and this causes the package to stop.
Could anyone please advise how I can get the package to ignore the error?
Thanks
Rob.
That error count refers to the number of tasks that SSIS will allow to fail before it stops the package. You want to allow a set number of rows to fail - and that's not what it's counting.
Instead, you should go into your Destination and configure the Error Output on that destination to either ignore errors, or redirect errors (better). You can then pull a red arrow off the bottom of the destination component to a Derived Column (or any other type of component that doesn't need to attach its output to anything), and put a Data Viewer on that red link. Now all the rows that fail will go to the Derived Column, and show up in a Data Viewer for you to see (while in BIDS).
The other thing you'll have to do is change the batch size on the OLE DB Destination (if that's what you're using) to 1 so that it only inserts one row at a time. Otherwise, it will fail the whole batch that contains the error.