Integrating a scalar value function into an SSIS package

I have a scalar value function that I created to calculate aging, and I want to make it a part of an SSIS package that I run monthly. How do I go about making this work?

If you mean you have a T-SQL function that you would like to call in a data flow somehow, then that is not something you can actually do. You have the following options:
1. Write the function as an SSIS expression and make the calculation in a Derived Column. This can be less than ideal if the function is complex.
2. Write the function in a Script Component and use it as a script transformation. This works well if the function is relatively simple, but it isn't easily reusable.
3. Create an assembly that you then reference in a script transformation. You could also use this underlying code to create a CLR function on the server. This makes it easier to manage the code, but it requires more overhead to implement.
4. Load the data into a staging table in a database and then use the function when you merge the data into your final destination table. The benefit of this approach is that it is the easiest to implement. The downside is that you have to write the data to disk twice, so you are bound to get worse performance than with any of the other solutions.
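The staging-table option could look roughly like the T-SQL below, run from an Execute SQL Task after the data flow has loaded the staging table. This is only a sketch: dbo.AgingStaging, dbo.AgingFinal, dbo.fn_CalcAging and all column names are hypothetical stand-ins, not taken from the question.

```sql
-- Hypothetical object and column names throughout; adjust to your schema.
-- Assumes the SSIS data flow has already loaded dbo.AgingStaging.

-- Apply the existing scalar function while moving rows to the final table.
INSERT INTO dbo.AgingFinal (InvoiceId, InvoiceDate, AgingDays)
SELECT s.InvoiceId,
       s.InvoiceDate,
       dbo.fn_CalcAging(s.InvoiceDate)   -- the scalar function from the question
FROM dbo.AgingStaging AS s;

-- Clear the staging table so the next monthly run starts empty.
TRUNCATE TABLE dbo.AgingStaging;
```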

Use the OLE DB Command transformation, as described here:
SSIS return value of Stored Procedure within an OLE DB Command
Modify as required.

I'm guessing it's a scalar function in a Microsoft SQL Server database.
Use it in a SELECT statement in the SQL command of a data flow source:
SELECT dbo.myFunction(myParam)
FROM myTable

Related

No columns returned SSIS

I am implementing an SSIS package and am currently trying to do the following:
1. Truncate the destination table.
2. Fetch the data by executing a stored procedure and insert it into the destination table.
I have created an Execute SQL Task to address step 1 and a data flow with an OLE DB source and an OLE DB destination to address step 2. It has been working successfully so far, but it isn't working for one of my stored procedures that uses temp tables.
When I edit the OLE DB source and click the Preview button, I get the error "no column returned".
I know that SSIS has an issue with generating columns when executing stored procedures that depend on temp tables. I have converted the stored procedure to use table variables, and it is now able to return columns in SSIS when I do a preview. The only downside is that the stored procedure takes longer to execute: 1 hour 15 minutes, compared to 15 minutes with temp tables.
I did see a suggestion to use SET FMTONLY before executing the stored procedure as an alternative to switching to table variables, but that didn't seem to work, as I am getting a "syntax or permission denied" error.
Could somebody tell me a solution to my problem that does not compromise on performance?

Sounds like you've already read all the approaches to using Temp tables in SSIS, including the IF 1=0... trick? If you haven't seen that one yet, google it.
You say that using table variables causes your stored procedure to take about 5 times longer than using temp tables. The most likely reason is that you are indexing your temp tables but not your table variables. In case you didn't know, table variables can be indexed too. You might try that.
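As a rough illustration of indexing a table variable, the sketch below declares one with a clustered primary key and an additional nonclustered index. The table and column names are hypothetical, and the inline INDEX clause assumes SQL Server 2014 or later; on earlier versions only the indexes created by PRIMARY KEY and UNIQUE constraints are available.

```sql
-- Hypothetical table variable; names are illustrative only.
DECLARE @Orders TABLE
(
    OrderId    INT      NOT NULL PRIMARY KEY CLUSTERED,  -- index created by the constraint
    CustomerId INT      NOT NULL,
    OrderDate  DATETIME NOT NULL,
    -- Inline index syntax: assumes SQL Server 2014 or later.
    INDEX IX_Orders_CustomerId NONCLUSTERED (CustomerId)
);
```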
Finally, a solution that you haven't mentioned is that you can replace your temporary table with a real table that gets truncated when you're done using it.

Short comment:
Try EXEC WITH RESULT SETS and specify the metadata yourself for a proc with temp tables; or use the Script Component as a source and specify the Output columns yourself.
Long comment:
Technically speaking, it is the driver/database you are using in SSIS that would decide the behavior when working with temp tables.
Metadata is an important factor when using SSIS's pipeline components. By metadata, I mean the names of the columns, their data types etc that a pipeline component uses. When designing a data flow, someone/something should provide this metadata to the components that require it.
In most cases, SSIS automatically retrieves the metadata. Components that do not connect to an external data source, like Conditional Split, get their metadata from the other components they are connected to. For the pipeline components that connect to an external data source (like OLE DB Source, OLE DB Destination, Lookup, etc.), SSIS provides a mechanism to get this metadata without human involvement. This mechanism involves the driver connecting to the database and retrieving the metadata of the output. If the driver/database is capable of returning the metadata, then that metadata is used. If the driver/database is incapable, then you get the errors you are seeing. The rest of my comments are based on the assumption that you are using a SQL Server database in your question.
When working with a SQL Server database in SSIS, we typically use the native client drivers provided by Microsoft. These drivers try to get the metadata without actually executing the SQL statement (actual execution can have side effects and might take more than a few seconds, minutes, or even hours, and you don't want side effects and long wait times during package design time). So to get the metadata, the driver relies on the metadata of the actual objects used in the SQL command. If the command uses a physical table or view, SQL Server already has the metadata available and can supply it to the driver. If it is a temp table, SQL Server does not have the metadata until it can create the temp table. With the SET FMTONLY option, you can arrange things so that the temp tables are created but any heavy processing and side effects are avoided, and so retrieve the metadata without penalty. From 2012 onward, these native client drivers rely on newer functionality to retrieve metadata than the drivers before 2012: the driver uses the sp_describe_first_result_set proc. So whether you can get metadata or not is determined by what the sp_describe_first_result_set proc is able to do.
So while SSIS can automatically get the metadata (because of the driver/database), it does not automatically get the metadata in some cases (again because of the driver/database). In cases involving the second scenario, some other process (typically a human) can help the driver infer metadata or provide the metadata to the component directly.
To help the driver, in the case of SQL Server 2012 and after, you can use the WITH RESULT SETS clause to specify the output metadata. When this clause is present, the driver uses it and doesn't try to query the metadata from system objects, which avoids the error you would otherwise get. If you are using the drivers that came with SQL Server 2008, you can use SET FMTONLY. This option is at the driver/database level.
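A minimal sketch of the WITH RESULT SETS approach, as it might be typed into the SQL command of the OLE DB source. The procedure name dbo.usp_GetReportData and the three columns are hypothetical; the declared list has to have the same number of columns as the procedure's actual result set, with compatible types.

```sql
-- dbo.usp_GetReportData and its columns are hypothetical examples.
-- The column list declares the shape of the first (and only) result set,
-- so the metadata does not have to be inferred from the temp tables.
EXEC dbo.usp_GetReportData
WITH RESULT SETS
(
    (
        CustomerId   INT           NOT NULL,
        CustomerName NVARCHAR(100) NULL,
        Balance      DECIMAL(18,2) NULL
    )
);
```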
Another option could be to use a Script Component as the Source and in the Output columns, you can specify the columns/metadata. SSIS would not try to retrieve metadata from the datasource in this case, but would rely on the definitions you provided in the Output section of the Script Component.
As you can see, both options involve a human (or some other process) specifying the metadata instead of SSIS trying to retrieve the metadata in an automated fashion. I would prefer the first option if working with SQL Server and the second option if working with databases like MySQL.

RecordSetDestination SSIS

What is the major use of the Recordset Destination in SSIS? I heard that it is in-memory, so is the variable that holds the data in raw format? Can someone explain a real-world project use of the Recordset Destination?

A recordset destination can be used for just about anything you can think of. A common use I hear about is consuming the recordset in a Foreach Loop. Say you want to export several "categories" from a transaction table. Perhaps you get a recordset of the categories that exist and then call a new data flow to export each category as its own file. Or perhaps date ranges, months, etc.
One way I use it is in a script task to perform an action on the data that SSIS cannot do natively. I was using a script component but this particular task ran into a concurrency issue. So by dumping to a recordset I was able to use the recordset in a script task to do the logic in a manner to avoid that issue.
Another script task use is to build and send HTML emails.
I suppose a use for it might be when you have 1 data flow to get 1 record set then do a bunch of non dataflow tasks and then use that as a source in another data flow task, but that is not something I have ever done.

Create Function Teradata

I want to create a small function (or something like it) that takes parameters and returns the result of executing a Teradata SQL statement. The purpose is to turn repeatedly used SQL into a function that can be used in a SELECT statement.
Please point me in the right direction. CREATE FUNCTION in Teradata requires C/C++ compilation, which is too much effort given how the function would be used.

Passing parameters to SQL can be done using a CREATE MACRO, but those can't be used in a SELECT.
SQL-UDFs are limited to simple scalar functions in Teradata, i.e. no SELECTs, etc.
If you need a more complex function (table or [window] aggregate) you must write it in C or Java.

SSIS Copy table if not empty

I would like to make a package that copies data from a table only if the table is not empty. I know how to do a count and how to make a package for copying data, but the problem is that a source can't have any inputs, so I don't know how to do it. Any suggestions?

I don't understand your comment about dragging a "green line from a package to a source" but instead of trying to determine in advance if the table is empty, just do your copy anyway and then see how many rows were copied:
1. Create a package variable for the row count.
2. Populate the variable using the Row Count transformation.
3. Use an expression in the precedence constraint to check the variable: if it's greater than zero, then continue executing the rest of your package.

@Pondlife I don't think you can use a precedence constraint inside the data flow task, can you?
I believe you can use it only in the control flow.
I would add an "Execute SQL Task" with the count, sending the result to a variable, and from this task I would drag the green arrow to the Data Flow task that makes the copy, and on this arrow I would add the expression to the precedence constraint.
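The query for such an Execute SQL Task might look like the sketch below. It swaps the plain count for an EXISTS check, which returns the same yes/no answer without counting every row. dbo.SourceTable and the HasRows alias are hypothetical. With the task's result set set to "Single row" and the HasRows column mapped to a package variable (say User::HasRows, also hypothetical), the precedence constraint expression would be something like @[User::HasRows] == 1.

```sql
-- dbo.SourceTable is a hypothetical name for the table to be copied.
-- Returns a single row: 1 when the table has at least one row, otherwise 0.
SELECT CASE WHEN EXISTS (SELECT 1 FROM dbo.SourceTable)
            THEN 1
            ELSE 0
       END AS HasRows;
```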

As you have correctly noted, a data flow source does not accept input so one cannot perform logic in the dataflow to determine whether this task should run.
Cannot create connector.
The destination component does not have any available inputs for use in creating a path.
However, there's nothing stopping you from setting up this logic in your control flow. I would use a query that hits the DMVs for a fast rowcount on the destination system, filtered to only the tables I wished to replicate.
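One way such a DMV query could be written is sketched below, using sys.dm_db_partition_stats. The table names in the filter are hypothetical, the counts come from metadata rather than a scan of the tables, and the query needs VIEW DATABASE STATE permission.

```sql
-- Lists the tables of interest that currently hold no rows.
-- TableA and TableB are hypothetical names; replace with your own list.
SELECT s.name            AS SchemaName,
       t.name            AS TableName,
       SUM(ps.row_count) AS ApproxRowCount
FROM sys.dm_db_partition_stats AS ps
JOIN sys.tables  AS t ON t.object_id = ps.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE ps.index_id IN (0, 1)              -- heap or clustered index only
  AND t.name IN (N'TableA', N'TableB')
GROUP BY s.name, t.name
HAVING SUM(ps.row_count) = 0;
```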
Armed with the list of empty tables, how I'd handle it would probably depend on how many there are. For a small number of tables, I'd define N data flows, all with a do-nothing script task as a precedent, and then use an expression on the table name to enable a path, much like I did on this question.
If there are many tables, I'd define a package per table and then invoke execute package task with the package name built dynamically based on the empty table name.

Dynamic Linq - query a schema that is only known at run time?

I know with dynamic linq you can construct expressions dynamically in the same way that you might build and execute a dynamic SQL statement - e.g. a dynamic where clause or a dynamic select list. Is it possible to do this in cases where the schema is not known at compile time?
In a database I'm working with users can define their own entities which causes new tables/columns to be created in the back-end database. At run time I'll know the table & column names I need to work with but I won't know the schema at compile time hence I can't build a DBML to work with up front.
Is there any facility for the dynamic discovery of the schema at run time or is this a case where I need to stick with building dynamic SQL statements?

As far as we understand, you know neither the schema name nor the full structure of your schema in advance.
In this case, the strongly-typed ExecuteQuery method overload seems to be an option.
Just write the SQL queries yourself, building identifiers such as table and column names into the query text by string concatenation and passing values as parameters.
