SSIS PACKAGE, Only want Derived Columns

I have an SSIS package with a For Each Loop that imports multiple txt files into a SQL Server table. That runs fine.
What I am trying to accomplish is to store each distinct filename and the date it was imported in a separate table. I created a separate For Each Loop for this, and then archive the txt file with a File System Task once it completes.
The issue is that I added an event handler that invokes a SQL Task and a Send Email task if there is a warning (I was hoping for a warning only when there were no files in the directory the package imports from).
However, I got a warning that a column in the Data Flow task was not being used and should be removed if not needed. But the Data Flow task requires at least one field for me to add a Derived Column transformation:
Derived Column Field1 pulls @[User::CurrentFile] from the ForEach Loop Container.
Field2 pulls the current date.
Is there a way to perform this without the warning?

It sounds like you're over-complicating things.
You have a ForEach loop, so you're already assigning the file name to a variable, @[User::CurrentFile]. You can get the load date either through a call to GETDATE() or by referencing the system-scoped variable @[System::StartTime].
The most straightforward option would be to add an Execute SQL Task wired to your Data Flow Task with an OnSuccess precedence constraint. The Execute SQL Task will then have a statement like INSERT INTO dbo.MyLog(FileName, InsertDate) SELECT ?, ?, assuming an OLE DB Connection Manager, and then you map in your two variables.
Easy, clean, no warnings fired about unused columns in your data flow.
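As a rough sketch of what that Execute SQL Task could look like: the answer names only dbo.MyLog and its two columns, so the column types and the sample values below are assumptions. With an OLE DB connection, the ? placeholders are mapped by ordinal (0, 1) on the Parameter Mapping page.

```sql
-- Hypothetical log table; column types are an assumption.
CREATE TABLE dbo.MyLog (
    FileName   nvarchar(260) NOT NULL,
    InsertDate datetime      NOT NULL
);

-- Statement to paste into the Execute SQL Task (OLE DB connection):
--   INSERT INTO dbo.MyLog (FileName, InsertDate) SELECT ?, ?;
-- Parameter Mapping: name 0 -> User::CurrentFile, name 1 -> System::StartTime

-- The same insert with T-SQL variables standing in for the two SSIS variables,
-- so it can be tried in a query window:
DECLARE @FileName   nvarchar(260) = N'C:\Import\sample.txt';  -- sample value
DECLARE @InsertDate datetime      = GETDATE();

INSERT INTO dbo.MyLog (FileName, InsertDate)
SELECT @FileName, @InsertDate;
```

Because the log row is written from the Control Flow, the second For Each Loop and its Data Flow are no longer needed for logging.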
What I think you have is something like this, based on
I created a separate For Each Loop for this

Related

SSIS: How to get the number of updated and deleted rows in an audit?

Imagine that you want to save in a variable the number of rows that were updated or deleted in a table.
These are the steps I took:
First, in the Control Flow I created a Data Flow Task.
Then, in the Data Flow, I created a source (in my case an Excel file), created two variables to count those rows (countDeleted and countUpdated), connected the variables to two Row Count transformations, and then connected my destination (OLE DB).
Now, in the Control Flow, what do I do?
Create an Execute SQL Task or a Script Task? What is the best way to do it, and what code should I use?
Thanks for your help.
PS: I only have 4 weeks of SSIS experience, sorry for my noobieness :)

An OLE DB destination only inserts. It can't UPDATE or DELETE.
What's your logic for updating or deleting?
If you're just starting out and reading about doing things in SSIS, you will eventually find advice to use the OLE DB Command to perform row-by-row deletes and inserts.
In my opinion this is to be avoided. It does not scale (it works fine for small recordsets, then fails for large ones), and it is difficult to maintain parameter mappings in the OLE DB Command. That said, you should try it anyway to familiarise yourself with it.
My advice is to load the Excel data into a staging table, perform batch DELETE and UPDATE statements to load the data, and use @@ROWCOUNT to capture the number of records affected.
For example:
Your existing dataflow can be used to load into a table called StagingTable.
Before your dataflow you should run an Execute SQL Task (this is in the Control Flow pane, not the Data Flow pane) that clears the staging table:
TRUNCATE TABLE StagingTable;
So first get that working: repeatedly running your package should clear the staging table and then load Excel into it without creating duplicates.
This in itself is a challenge, as Excel is a terrible data interchange format.
Once you have that working, add an Execute SQL Task at the end that runs some SQL to delete the records you want and capture the count. For example:
DELETE FROM MyFinalTable WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable);
SELECT @@ROWCOUNT;
Then you follow the instructions here to load that back to your SSIS variable
http://microsoft-ssis.blogspot.com/2011/03/rowcount-for-execute-sql-statement.html
What are you doing with this row count? Are you writing it to a logging table? Save yourself the bother of pulling it back into an SSIS variable and just write it directly:
DELETE FROM MyFinalTable WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable);
INSERT INTO LogTable([Table],Operation,Type)
SELECT 'MyFinalTable','Delete', @@ROWCOUNT;
In my experience it is not a good idea to build convoluted logic into SSIS packages if you can do it in the database instead, although it does depend on the person who eventually has to maintain it. Hopefully you can appreciate that this T-SQL approach is more straightforward and code-based, as opposed to digging around in property pages, events and other places inside SSIS packages.
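To make the pattern concrete, here is a minimal sketch that runs end to end in a query window. It uses temp tables as stand-ins for StagingTable, MyFinalTable and LogTable; the Amount column, the log column names and the sample rows are assumptions for illustration only. It follows the delete-then-reload logic shown above and logs both counts.

```sql
-- Stand-in tables (temp tables keep the script self-contained; names are hypothetical).
CREATE TABLE #MyFinalTable (PrimaryKey int PRIMARY KEY, Amount int);
CREATE TABLE #StagingTable (PrimaryKey int PRIMARY KEY, Amount int);
CREATE TABLE #LogTable (
    TableName    sysname     NOT NULL,
    Operation    varchar(10) NOT NULL,
    RowsAffected int         NOT NULL,
    LoggedAt     datetime    NOT NULL DEFAULT (GETDATE())
);

-- Sample data: key 2 already exists in the final table, key 4 is new.
INSERT INTO #MyFinalTable (PrimaryKey, Amount) VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO #StagingTable (PrimaryKey, Amount) VALUES (2, 25), (4, 40);

DECLARE @rows int;

-- Remove the rows that are about to be replaced.
DELETE FROM #MyFinalTable
WHERE PrimaryKey IN (SELECT PrimaryKey FROM #StagingTable);
SET @rows = @@ROWCOUNT;  -- capture immediately: the next statement resets @@ROWCOUNT
INSERT INTO #LogTable (TableName, Operation, RowsAffected)
VALUES (N'MyFinalTable', 'Delete', @rows);

-- Reload everything from staging.
INSERT INTO #MyFinalTable (PrimaryKey, Amount)
SELECT PrimaryKey, Amount FROM #StagingTable;
SET @rows = @@ROWCOUNT;
INSERT INTO #LogTable (TableName, Operation, RowsAffected)
VALUES (N'MyFinalTable', 'Insert', @rows);

SELECT TableName, Operation, RowsAffected, LoggedAt FROM #LogTable;
```

Copying @@ROWCOUNT into a local variable straight after each statement is the design choice worth keeping: any intervening statement, even a SET, replaces its value.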

I assume that you're using an Execute SQL Task for the updates and deletes? As @Nick.McDermaid mentioned, using an OLE DB Command within a Data Flow presents various issues when performing DML. You can find the number of rows updated, inserted, or deleted in a table through an Execute SQL Task by using the ExecValueVariable property of this task. Set this property to the variable that will hold the row count, and it will return the number of affected rows. Note that it will only return the number of rows affected by the last statement in the Execute SQL Task, regardless of whether there are multiple batches (i.e. GO separators) in the task.

RecordSetDestination SSIS

What is the major use of the Recordset Destination in SSIS? I heard that it is in-memory, so is the variable that holds the data in raw format? Can someone explain a real-world project use of the Recordset Destination?

A recordset destination can be used for just about anything you can think of. One common use I hear about is iterating over the recordset in a foreach loop. Say you want to export several "categories" from a transaction table. You might get a recordset of the categories that exist and then call a new dataflow to export each category as its own file. The same works for date ranges, months, etc.
One way I use it is in a script task to perform an action on the data that SSIS cannot do natively. I was using a script component but this particular task ran into a concurrency issue. So by dumping to a recordset I was able to use the recordset in a script task to do the logic in a manner to avoid that issue.
Another script task use is to build and send HTML emails.
I suppose a use for it might be when you have 1 data flow to get 1 record set then do a bunch of non dataflow tasks and then use that as a source in another data flow task, but that is not something I have ever done.

Logging errors in SSIS

I have an SSIS project with 3 packages: a parent package that calls one of the other 2 packages based on a condition. In the parent package I have a foreach loop container that reads multiple .csv files from a location, and based on the file name one of the two child packages is executed and the data is uploaded into tables in MS SQL Server 2008. Since multiple files are read, if any file generates an error in a child package, I have to log the details of the error (filename, error message, row number, etc.) in a custom database table, delete all the records from that file that were already uploaded, and move on to the next file. The package should not stop; valid files that generate no errors should still be processed.
Say a file has 100 rows and there is a problem at row number 50: we need to log the error details in a table, delete rows 1 to 49 that were already uploaded to the database table, and have the package start on the next file.
How can I achieve this in SSIS?

You will have to set TransactionOption=*Required* on your foreach loop container and TransactionOption=*Supported* on the control flow items within it. This will allow your transactions to be rolled back if anything goes wrong in your child packages. More information on the 'TransactionOption' property can be found at http://msdn.microsoft.com/en-us/library/ms137690.aspx
Custom logging can be performed within the child packages by redirecting the error output of your destination to your preferred error destination. However, this redirection only logs insertion errors. So if you wish to catch errors that occur anywhere in your child package, you will have to set up an 'OnError' event handler or use the built-in SSIS logging (SSIS -> Logging...).

I suggest you try the creation of two dataflows in your loop container. The main idea here is to have a set of three tables to better and more easily handle the error situations. In the same flow you do the following:
1st dataflow:
Should read .csv file and load data to a temp table. If the file is processed with errors you simply truncate the temp table. In addition, you should also configure the flat file source output to redirect the errors to an error log table.
2nd dataflow:
If, on the other hand, the file is processed without errors, you need to transfer the rows from the temp table into the destination table. So here the OLE DB source is the "temp table" and the OLE DB destination is the "final table".
Don't forget to truncate the temp table in both cases, as the next file will need an empty table.
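The second step could also be done in an Execute SQL Task rather than a data flow. The sketch below is a T-SQL alternative to the second dataflow, not the answer's exact method: it moves the rows inside a transaction, logs the error message and rolls back if the transfer fails, and truncates the temp table either way. All table names, columns and sample rows are assumptions.

```sql
-- Stand-in tables; in a real package these would be permanent tables.
CREATE TABLE #TempLoad   (Id int, Amount varchar(20));
CREATE TABLE #FinalTable (Id int PRIMARY KEY, Amount decimal(10, 2));
CREATE TABLE #ErrorLog (
    FileName     nvarchar(260)  NOT NULL,
    ErrorMessage nvarchar(4000) NOT NULL,
    LoggedAt     datetime       NOT NULL DEFAULT (GETDATE())
);

-- Would be mapped from the foreach loop's file name variable.
DECLARE @FileName nvarchar(260) = N'sample.csv';

-- Sample rows as loaded by the first dataflow; the second one is deliberately bad.
INSERT INTO #TempLoad (Id, Amount) VALUES (1, '10.50'), (2, 'not a number');

BEGIN TRY
    BEGIN TRANSACTION;

    INSERT INTO #FinalTable (Id, Amount)
    SELECT Id, CAST(Amount AS decimal(10, 2))
    FROM #TempLoad;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo any rows from this file, then record what went wrong.
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;

    INSERT INTO #ErrorLog (FileName, ErrorMessage)
    VALUES (@FileName, ERROR_MESSAGE());
END CATCH;

-- The next file needs an empty temp table in both cases.
TRUNCATE TABLE #TempLoad;

SELECT FileName, ErrorMessage, LoggedAt FROM #ErrorLog;
```

Because the error is handled inside the batch, the task itself succeeds and the loop carries on with the next file. This approach reports the error message but not the offending row number; row-level detail still needs the error output redirection described above.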

Let's break this down a bit.
I assume that you have a data flow that processes an individual file at a time. The data flow would read the input file via a source connection, transform it and then load the data into the destination. You would basically need to implement the Error Handler flow in your transformations by choosing "Redirect Row". Details on the Error Flow are available here: https://learn.microsoft.com/en-us/sql/integration-services/data-flow/error-handling-in-data.
If you need to skip an entire file due to a bad format, you will need to implement a Precedence Constraint for failure on the file system task.
My suggestion would be to get a copy of the exam preparation book for exam 70-463 - it has great practice examples on exactly the kind of scenarios that you have run into.

We do something similar with Excel files.
We have an ErrorsFound variable which is reset each time a new file is read within the for each loop.
A script component validates each row of the data and sets the ErrorsFound variable to true if an error is found, and builds up a string containing any error details.
Then - based on the ErrorsFound variable - either the data is imported or the error is recorded in a log table.
It gets a bit more tricky when the Excel files are filled in badly enough that the process cannot read them at all, for example when text is entered in a date, number or currency field. In this case we use the OnError event handler of the Data Flow task to record an error in the log, but we won't know which row(s) caused the problem.

Using dynamically named table in SSIS data flow task

I'm new to SSIS and am writing a package that includes moving data to a table that is created in a previous Execute SQL Task object.
The issue that I'm encountering is that I am unable to create a data flow destination task that uses a dynamic destination table name.
The intended process is:
Execute SQL Task object creates a new table based on today's date (e.g. Table1_20111014).
Data Flow task moves data from table "Table1" to "Table1_20111014".
The column metadata for Table1 and Table1_20111014 is the same and does not change. However, the name of the table the data needs to be moved to will change depending on the date at the time of execution.
Is it possible to dynamically specify the destination table in a destination data flow object?
If not, are there known workarounds or is using SSIS for this task a bad idea?

As long as the metadata remains the same, there is no drawback to using a dynamic destination table name.
To accomplish this, on the OLE DB destination, instead of using "table name" or "table name fast load", use the equivalent "from variable" table load option. This obviously assumes you have a variable defined that contains the name of the table created in the Execute SQL Task.
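One way the preceding Execute SQL Task might build the dated table and hand its name back to the package is sketched below. It assumes dbo.Table1 exists, as described in the question; returning the name as a single-row result set and mapping it to a package variable (for instance a hypothetical User::DestTable) is an assumption about how the package is wired.

```sql
-- Build today's table name, e.g. Table1_20111014 (style 112 = yyyymmdd).
DECLARE @TableName sysname = N'Table1_' + CONVERT(char(8), GETDATE(), 112);

-- SELECT ... INTO with a false predicate copies the column definitions of
-- dbo.Table1 without copying rows (indexes and constraints are not copied).
DECLARE @sql nvarchar(max) =
    N'SELECT * INTO dbo.' + QUOTENAME(@TableName) +
    N' FROM dbo.Table1 WHERE 1 = 0;';

EXEC sys.sp_executesql @sql;

-- Single-row result set: map it to the variable the OLE DB destination reads.
SELECT N'dbo.' + QUOTENAME(@TableName) AS DestinationTable;
```

QUOTENAME keeps the generated identifier safely bracketed, which matters as soon as the name is built from anything other than a fixed prefix and a date.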

Executing an SSIS DataFlow task with different variable value

I need to query three different databases and dump the results into csv files. It's the same procedure for all three databases. The only differences are the database and the name of the csv file. Can I do this without cutting and pasting? Is there a way to pass parameters to the data flow task?
Thanks!

Your flat file and db connection managers could have the connection string based on a package scoped variable.
Then use a foreach looping container to call your dataflow task. Configure the looping container with a foreach item enumerator and add the appropriate names to the collection.

santiiiii's explanation covers the use case of downloading the data in one package execution. If you need to get the data at different times, you can use a conditional expression in a variable that gives you different file names and database connections based on the value supplied for the variable. You can then set the value of the variable in the SQL Server Agent job on the Set Values tab. This can give you more flexibility, but santiiiii's solution is definitely best if you want to process all three files at the same time.
