Temporary table in SSIS ADO.NET source - ssis

I can only find a solution for using a temp table in an OLE DB source.
But I can't find one for the ADO.NET source. How can I successfully use a temp table in the ADO.NET source in an SSIS package?

I find working with temporary tables in SSIS more pain than they are usually worth. I hope your experience is better.
Create an ADO.NET connection. In the properties of the Connection Manager, set the value of RetainSameConnection from false to true. This keeps the temporary table in existence for the duration of the package execution, because every task reuses the one open connection instead of taking whichever connection the pool hands out.
My trouble stems from getting the metadata set up correctly. To get around this, I create a variable, QuerySource, that queries a physical table that mirrors what the temporary table will look like. SELECT S.src_id, S.src_value FROM dbo.SRC AS S; This allows the data flow to establish the correct metadata for the downstream components. I manually use this query in the ADO.NET source. Once that's done, I will need to change the query to use the temporary table, ##SRC. Unlike the OLE DB Source component, you cannot set this property inside the Data Flow task.
Once the data flow work is completed, back in the Control Flow, view the Properties of the Data Flow Task. Change DelayValidation from false to true. This prevents design-time validation from firing, which is needed once we remove the non-temporary table "scaffolding." Next, find the Expressions property and click the ellipsis (...). In the drop-down, you should see the name of your ADO.NET source. I had renamed mine, so I saw [ADONET Src].[TableOrViewName] and [ADONET Src].[SqlCommand] in the drop-down list. I selected [ADONET Src].[SqlCommand] and, as my value, I used @[User::QuerySource].
I ran the package to ensure it still worked. It did. I then changed the value of my query to be SELECT S.src_id, S.src_value FROM ##SRC AS S; I reran and this time it correctly pulled data from my temporary table.
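The scaffolding above can be sketched in T-SQL roughly as follows. This is only an illustration: the column types are assumptions (the answer names the columns but not their types), and the sample rows are made up.

```sql
-- Design-time scaffolding: a permanent table with the same shape the
-- temporary table will have. The column types are assumptions.
CREATE TABLE dbo.SRC
(
    src_id    INT         NOT NULL,
    src_value VARCHAR(50) NULL
);

-- Run-time object: created by an Execute SQL Task that uses the same
-- ADO.NET connection manager (RetainSameConnection = True).
CREATE TABLE ##SRC
(
    src_id    INT         NOT NULL,
    src_value VARCHAR(50) NULL
);

-- Made-up sample rows so the data flow has something to read.
INSERT INTO ##SRC (src_id, src_value)
VALUES (1, 'alpha'), (2, 'beta');

-- Value of the query variable while designing the data flow:
--   SELECT S.src_id, S.src_value FROM dbo.SRC AS S;
-- Value of the query variable once the metadata is in place:
SELECT S.src_id, S.src_value FROM ##SRC AS S;
```

Because both tables expose the same column names and types, swapping the variable's value does not change the metadata the downstream components were built against.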
If you are using SQL Server 2012 as your source, you might be able to make it easier on yourself by using the WITH RESULT SETS option of the EXECUTE statement to explicitly describe your temporary table's metadata.

Related

I don't use temp table in SSIS

I have a problem when I use a temp table in an OLE DB Source in SSIS.
I create the temp table in an Execute T-SQL Statement Task and I set DelayValidation to True and RetainSameConnection to True, but the problem is not solved.
Background
What's likely happening here is that the table does not exist at the moment. Temporary tables come in two variants: local and global.
A local temporary table uses a name with a single sharp/pound/hash/octothorpe prepended to it, i.e. #TEMP. The only connection that can use that instance of the temporary table is the one that created it. This is why the advice across the internet says you need to set RetainSameConnection to true to ensure the connection that created the table is reused in the data flow. Otherwise, you're at the mercy of connection pooling: maybe the same connection is used in both places, maybe not, and believe me, that's an unpleasant bit of randomness to try and debug. The reason for DelayValidation on the data flow is that as the package starts, the engine validates that all the data looks as expected before it does any work. As the precursor step is what gets the data flow task into the expected state, we need to tell the engine to validate the task only immediately before it executes. Validation always happens; it's just a matter of when you pay the price.
A global temporary table is defined with a double sharp/etc sign prepended to it, ##TEMP. This is accessible by any process, not just the connection that created it. It will live until the creating connection goes away (or explicitly drops it).
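The difference in scope can be seen with two SSMS query windows, each of which is its own connection. A minimal sketch, with a made-up single-column table:

```sql
-- Session A (first query window)
CREATE TABLE #TEMP  (id INT NOT NULL);   -- local: visible only to session A
CREATE TABLE ##TEMP (id INT NOT NULL);   -- global: visible to every session

-- Session B (second query window), run while session A is still open
SELECT id FROM ##TEMP;   -- succeeds: the global table is shared
SELECT id FROM #TEMP;    -- fails with an "invalid object name" error,
                         -- because #TEMP belongs to session A only
```

An SSIS package without RetainSameConnection can end up in the position of session B: the data flow may run on a different connection from the one that created #TEMP.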
Resolution
Once the package is designed (the metadata is established in the data flow), using a local temporary table is going to work just fine. While developing it, though, it's impossible to use a local temporary table as a source in the data flow. If you execute the precursor step, that connection will open, create the temporary table, and then go away, as does the temporary table.
I would resolve this with the following steps:
1. Copy the query in your Execute SQL Task into a window in SSMS and create your local temporary table as a global temporary table, thus ##TEMP.
2. Create an SSIS variable called SourceQuery of type String with a value of SELECT * FROM ##TEMP;
3. Modify the "Data access mode" of the OLE DB Source from "SQL Command" to "SQL Command from Variable" and use the variable User::SourceQuery.
4. Complete the design of the Data Flow.
5. Save the package to ensure the metadata is persisted.
6. Change the query in the variable from referencing ##TEMP to #TEMP.
7. Save again.
8. Drop the ##TEMP table or close the connection.
9. Run the package to ensure everything is working as expected.
Steps 2, 3, and 6 above allow you to emulate the magician pulling the tablecloth out from underneath all the dishes.
If you were to manually edit the query in the data flow itself from ##TEMP to #TEMP, the validation fires and since there is no #TEMP table available, it'll report VS_NEEDSNEWMETADATA and likely doesn't let you save the package. Using a variable as the query source provides a level of indirection that gets us around the "validate-on-change"/reinitialize metadata step.
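The SSMS side of those steps might look like the sketch below. The source table dbo.Customers and its columns are hypothetical stand-ins for whatever the real Execute SQL Task selects.

```sql
-- Step 1 (design time, in SSMS): build the table as a GLOBAL temp table
-- so the SSIS designer, which uses its own connection, can see it.
-- dbo.Customers and its columns are hypothetical.
SELECT C.CustomerID, C.CustomerName
INTO   ##TEMP
FROM   dbo.Customers AS C;

-- Step 2: design-time value of User::SourceQuery
--   SELECT * FROM ##TEMP;
-- Step 6: run-time value of User::SourceQuery
--   SELECT * FROM #TEMP;

-- Step 8: remove the design-time table once the package is saved.
IF OBJECT_ID('tempdb..##TEMP') IS NOT NULL
    DROP TABLE ##TEMP;
```

Keep the SSMS window open between steps 1 and 8; closing it ends the session and drops ##TEMP early.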

SSIS: How to get the number of updated and deleted rows in an audit?

Imagine that you want to save in a variable the number of rows that were updated or deleted in a table.
These are the steps that I did:
First, in the Control Flow I created a Data Flow Task.
Then, in the Data Flow, I created a source (in my case an Excel file), then I created two variables to count those rows, countDeleted and countUpdated, connected the variables to two Row Count transformations, and then connected my destination (OLE DB).
Now, in the Control Flow, what do I do?
Create an Execute SQL Task? Or a Script Task? What is the best way to do it? What is the piece of code to use?
Thanks for your help.
PS: I only have 4 weeks of SSIS, sorry for my noobieness :)
An OLE DB destination only inserts. It can't UPDATE or DELETE.
What's your logic for updating or deleting?
If you're just starting out and reading about doing things in SSIS, you will eventually find advice to use the OLE DB Command to perform row-by-row deletes and inserts.
In my opinion this is to be avoided. It does not scale (it works fine for small recordsets, then fails for large ones), and it is difficult to maintain parameter mappings in the OLE DB Command. You should still try it anyway to familiarise yourself with it.
My advice is to load the Excel data into a staging table, perform batch DELETE and UPDATE statements to load the data, and use @@ROWCOUNT to capture the number of records affected.
For example;
Your existing described dataflow can be used to load into a table called StagingTable
Before your dataflow you should run an Execute SQL Task (This is in the Control Flow pane, not the Data Flow pane) that clears the staging table:
TRUNCATE TABLE StagingTable;
So first get that working - repeatedly running your package clears the staging table then loads Excel into it without creating duplicates
This in itself is a challenge as Excel is a terrible data interchange format.
Once you have that working, you add an execute SQL task to the end that runs some SQL that deletes the records you want and captures the count. For example:
DELETE FROM MyFinalTable WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable);
SELECT @@ROWCOUNT;
Then you follow the instructions here to load that back to your SSIS variable
http://microsoft-ssis.blogspot.com/2011/03/rowcount-for-execute-sql-statement.html
What are you doing with this row count? Are you writing it to a logging table? Save yourself the bother of pulling it back into an SSIS variable and just write it directly:
DELETE FROM MyFinalTable WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable);
INSERT INTO LogTable ([Table], Operation, Type)
SELECT 'MyFinalTable', 'Delete', @@ROWCOUNT;
In my experience it is not a good idea to build convoluted logic into SSIS packages if you can instead do in a database. Although it does depend on the person who has to eventually maintain it. Hopefully you can appreciate that this T-SQL approach is a more straightforward code based approach as opposed to having to dig around in property pages and events and other places inside SSIS packages.
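Since the question asks for both an update count and a delete count, here is one way the Execute SQL Task's statement could capture the two numbers. It is a sketch under assumptions: the Operation and SomeValue columns are hypothetical, invented only to give the UPDATE and DELETE something to act on. The key point is that @@ROWCOUNT is overwritten by every statement, so it has to be copied into a variable immediately.

```sql
DECLARE @Updated INT, @Deleted INT;

-- Hypothetical rule: staging rows flagged 'U' update the final table.
UPDATE F
SET    F.SomeValue = S.SomeValue
FROM   MyFinalTable AS F
JOIN   StagingTable AS S ON S.PrimaryKey = F.PrimaryKey
WHERE  S.Operation = 'U';
SET @Updated = @@ROWCOUNT;   -- copy it before the next statement resets it

-- Hypothetical rule: staging rows flagged 'D' delete from the final table.
DELETE F
FROM   MyFinalTable AS F
JOIN   StagingTable AS S ON S.PrimaryKey = F.PrimaryKey
WHERE  S.Operation = 'D';
SET @Deleted = @@ROWCOUNT;

-- One row, two columns: map these to countUpdated and countDeleted
-- with the task's ResultSet property set to "Single row".
SELECT @Updated AS UpdatedRows, @Deleted AS DeletedRows;
```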
I assume that you're using an Execute SQL Task for the updates and deletes? As @Nick.McDermaid mentioned, using an OLE DB Command within a Data Flow presents various issues when performing DML. You can find the number of rows updated, inserted, or deleted in a table through an Execute SQL Task by using the ExecValueVariable property of this task. Assign the variable that will hold the row count to this property and it will receive the number of affected rows. Note that it will only return the number of rows impacted by the last statement in the Execute SQL Task, regardless of how many batches (i.e. GO separators) are in the task.

No columns returned SSIS

I am implementing an SSIS package and am currently trying to do the following:
1. Truncate the destination table.
2. Fetch the data by executing the stored procedure and insert it into the destination table.
I have created an Execute SQL Task to address step 1 and a data flow with an OLE DB source and OLE DB destination to address step 2. It has been working successfully so far, but it isn't working for one of my stored procedures that uses temp tables.
When I edit the OLE DB source and click the Preview button, I get the error "no column returned".
I know that SSIS has an issue with generating columns when executing stored procedures that depend on temp tables. I have converted the stored proc to use table variables and it is now able to return columns in SSIS when I do a preview. The only downside is that the stored procedure takes longer to execute: 1 hour 15 minutes, compared to 15 minutes with temp tables.
I did see a suggestion to use SET FMTONLY before executing the stored procedure as an alternative to changing to table variables, but that didn't seem to work, as I am getting a syntax or permission denied error.
Could somebody tell me a solution to my problem which does not compromise on performance?
Sounds like you've already read all the approaches to using Temp tables in SSIS, including the IF 1=0... trick? If you haven't seen that one yet, google it.
You say that using Table Variables causes your stored procedure to take about 5 times longer than using Temp Tables. The most likely reason for that is that you are indexing your temp tables but not your table variables. If you didn't know that table variables can be indexed, they can. You might try that.
Finally, a solution that you haven't mentioned is that you can replace your temporary table with a real table that gets truncated when you're done using it.
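For the table-variable suggestion, indexes are declared as constraints inside the DECLARE. A small sketch with made-up column names and rows:

```sql
-- PRIMARY KEY and UNIQUE constraints each create an index on the
-- table variable. Column names and sample rows are made up.
DECLARE @Work TABLE
(
    src_id    INT         NOT NULL PRIMARY KEY CLUSTERED,
    src_value VARCHAR(50) NOT NULL,
    UNIQUE NONCLUSTERED (src_value, src_id)
);

INSERT INTO @Work (src_id, src_value)
VALUES (1, 'alpha'), (2, 'beta');

-- A lookup by src_value can now use the nonclustered index.
SELECT W.src_id
FROM   @Work AS W
WHERE  W.src_value = 'beta';
```

SQL Server 2014 and later also accept inline INDEX clauses in the declaration. Note that table variables carry no column statistics, so indexing may not close the whole gap with temp tables.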
Short comment:
Try EXEC WITH RESULT SETS and specify the metadata yourself for a proc with temp tables; or use the Script Component as a source and specify the Output columns yourself.
Long comment:
Technically speaking, it is the driver/database you are using in SSIS that would decide the behavior when working with temp tables.
Metadata is an important factor when using SSIS's pipeline components. By metadata, I mean the names of the columns, their data types etc that a pipeline component uses. When designing a data flow, someone/something should provide this metadata to the components that require it.
In most cases, SSIS automatically retrieves the metadata. Components that do not connect to an external data source, like Conditional Split, get their metadata from the other components they are connected to. For the pipeline components that connect to an external data source (like OLE DB Source, OLE DB Destination, Lookup, etc.), SSIS provides a mechanism to get this metadata without human involvement. This mechanism involves the driver connecting to the database and retrieving the metadata of the output. If the driver/database is capable of returning the metadata, then that metadata is used. If the driver/database is incapable, then you get the errors you are seeing. The rest of my comments are based on the assumption that you are using a SQL Server database in your question.
When working with a SQL Server database in SSIS, we typically use the native client drivers provided by Microsoft. These drivers try to get the metadata without actually executing the SQL statement (actual execution can have side effects and might take seconds, minutes, or hours, and you don't want side effects or long waits at package design time). So to get the metadata, the driver relies on the metadata of the actual objects used in the SQL command. If the command uses a physical table or view, SQL Server already has the metadata available and can supply it to the driver. If it is a temp table, SQL Server does not have the metadata until it can create the temp table. With the SET FMTONLY option, you can arrange for the temp tables to be created while avoiding any heavy processing or side effects, and thus retrieve metadata without penalties. From 2012 onward, these native client drivers rely on newer functionality to retrieve metadata: the driver uses the sp_describe_first_result_set procedure. So whether you can get metadata or not is determined by what sp_describe_first_result_set is able to describe.
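You can ask SQL Server the same question the driver asks by calling sp_describe_first_result_set yourself. The sketch below uses a hypothetical procedure, dbo.MyProc, that stages rows in a local temp table; the names and columns are invented for illustration.

```sql
-- Hypothetical procedure that builds its output in a local temp table.
CREATE PROCEDURE dbo.MyProc
AS
BEGIN
    SET NOCOUNT ON;
    CREATE TABLE #Work (src_id INT NOT NULL, src_value VARCHAR(50) NULL);
    INSERT INTO #Work (src_id, src_value) VALUES (1, 'alpha');
    SELECT W.src_id, W.src_value FROM #Work AS W;
END;
GO  -- SSMS batch separator; CREATE PROCEDURE must be alone in its batch

-- SQL Server 2012 or later: describe the first result set
-- without executing the procedure.
EXEC sys.sp_describe_first_result_set
     @tsql = N'EXEC dbo.MyProc;',
     @params = NULL,
     @browse_information_mode = 0;
```

For a procedure shaped like this one, the call is expected to fail with a message that the metadata could not be determined because a statement uses a temp table, which is the same limitation the SSIS designer runs into.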
So while SSIS can automatically get the metadata (because of the driver/database), it does not automatically get the metadata in some cases (again because of the driver/database). In cases involving the second scenario, some other process (typically a human) can help the driver infer metadata or provide the metadata to the component directly.
To help the driver, in the case of SQL Server 2012 and later, you can use the WITH RESULT SETS clause to specify the output metadata. When this clause is present, the driver uses it and doesn't try to query the metadata from system objects, which avoids the error you would otherwise get. If you are using the drivers that came with SQL Server 2008, you can use SET FMTONLY. This option is at the driver/database level.
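As an illustration, the SQL command of the OLE DB source could look like this. The procedure name dbo.MyProc and the two columns are hypothetical; the declared names and types must match what the procedure really returns.

```sql
-- SQL Server 2012 or later. dbo.MyProc and its columns are hypothetical.
EXEC dbo.MyProc
WITH RESULT SETS
(
    (
        src_id    INT         NOT NULL,
        src_value VARCHAR(50) NULL
    )
);
```

If the procedure returns a result set whose column count or types cannot be converted to the declared shape, the EXECUTE fails at run time, so the declaration has to be kept in step with the procedure.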
Another option could be to use a Script Component as the Source and in the Output columns, you can specify the columns/metadata. SSIS would not try to retrieve metadata from the datasource in this case, but would rely on the definitions you provided in the Output section of the Script Component.
As you can see, both options involve a human (or some other process) specifying the metadata instead of SSIS trying to retrieve it in an automated fashion. I would prefer the first option when working with SQL Server and the second option when working with databases like MySQL.

Use temp tables in SSIS packages

I am writing a basic file dump from one database to another. I am using SSIS 2008 and creating several packages to transform the data I have from a MSSQL 2010 database to a MYSQL 5.1 database.
All the connections are set up and records can be transferred between the two databases, but I would like to use temp tables in the transform processes and use the temp table as the MSSQL source in a data flow task to dump the table into an awaiting MYSQL table.
I have been having problems setting this up. I am using an OLE DB connection and have set the RetainSameConnection property as well as the DelayValidation property to true. When configuring the data flow source against the MSSQL database, I cannot find the temp table I created in an earlier task in the control flow. I am using the same connection manager for these two tasks.
Anyone have any ideas or experience with this?
As a simple example one task does..
SELECT *
INTO #TMP
FROM CUSTOMERS
(This is a simplified example and I realize in this case I could just use the Customers table, so bear with me.)
Is it possible to use this temp table in a dataflow operation as the source table?
As I mentioned in my comment, this is not so much a solution as a workaround. SSIS uses the shape of result sets to bind properties in tasks. As temp tables are not always available in the database, this can cause errors in SSIS even if you set DelayValidation to true.
My solution is to create an SSIS schema in whichever database you're connecting to. The reasons for doing so are security and clear separation of objects that are only used within SSIS packages - primarily staging tables.
Instead of throwing tables in your dbo schema (you shouldn't be anyway, shame on you) you'd create them in the SSIS schema. A typical data flow would truncate the table when it begins, load values and perform whatever operations are required, optionally truncating it when complete. As long as the table is always available SSIS can examine the shape of result sets.
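A sketch of that setup, assuming a staging table for the Customers example from the question. The column list and the table name SSIS.CustomersStage are hypothetical.

```sql
-- One-time setup: a dedicated schema for objects used only by SSIS.
IF SCHEMA_ID(N'SSIS') IS NULL
    EXEC (N'CREATE SCHEMA SSIS AUTHORIZATION dbo;');
GO  -- SSMS batch separator

-- Permanent staging table, so SSIS can always read its shape.
-- Table name and columns are hypothetical.
IF OBJECT_ID(N'SSIS.CustomersStage', N'U') IS NULL
    CREATE TABLE SSIS.CustomersStage
    (
        CustomerID   INT          NOT NULL,
        CustomerName VARCHAR(100) NULL
    );
GO

-- Per run (Execute SQL Task ahead of the data flow): empty and reload.
TRUNCATE TABLE SSIS.CustomersStage;

INSERT INTO SSIS.CustomersStage (CustomerID, CustomerName)
SELECT C.CustomerID, C.CustomerName
FROM   dbo.CUSTOMERS AS C;
```

The data flow source then selects from SSIS.CustomersStage exactly where the question wanted to select from #TMP.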
You should not use temp tables as the source, because SSIS will not recognize the columns for the select. Use table variables or CTEs instead.

SSIS two staging tables

I would like to bring in an XML source, do a data conversion, and load the result into a table. Data from this table will then be used to update another table. How can I accomplish this in SSIS?
I understand the first two steps, but I'm lost after that.
1. XML Source (under the data flow task)
2. Data Conversion
3. OLE DB Destination? (If I use an OLE DB Destination, then I cannot use it as a source again to update another table.) What component should I be using to accomplish this?
TIA
Within a data flow you can split the records to go to multiple tables using either a Conditional Split (if you want some records to go one way and some to go another way) or a Multicast if you want all records to go to both destinations. We use a Multicast to create two staging tables, one where the raw data from the file will stay and one where the data will be cleaned and transformed before going into our prod tables. This enables us to easily research whether some problem data was due to our transformation process (a bug) or to bad data being sent (a problem at the client end, but one which might require more steps to handle if they can't fix it).
You can also have multiple data flows that all have the same source. Or you can insert to one staging table and then have a second data flow or exec SQL task to move that data to where you want it.
Use the OLE DB Destination to inject your XML source data into your staging table. Then, in your control flow use an Execute SQL task after your data flow task to execute a stored procedure or T-SQL script to move your data from the staging table into the production table(s) and truncate the staging table if required.
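The stored procedure called by that Execute SQL Task might look something like the sketch below. All object and column names are hypothetical, and the update-then-insert logic is just one plausible reading of "move your data"; adjust it to the real tables.

```sql
-- Hypothetical procedure: applies staged rows to production, then
-- empties the staging table for the next run.
CREATE PROCEDURE dbo.LoadFromStaging
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;   -- roll the whole load back if any statement fails

    BEGIN TRANSACTION;

    -- Set-based update of rows that already exist in production.
    UPDATE P
    SET    P.ItemValue = S.ItemValue
    FROM   dbo.ProdTable    AS P
    JOIN   dbo.StagingTable AS S ON S.ItemID = P.ItemID;

    -- Set-based insert of rows that are new.
    INSERT INTO dbo.ProdTable (ItemID, ItemValue)
    SELECT S.ItemID, S.ItemValue
    FROM   dbo.StagingTable AS S
    WHERE  NOT EXISTS (SELECT 1
                       FROM   dbo.ProdTable AS P
                       WHERE  P.ItemID = S.ItemID);

    TRUNCATE TABLE dbo.StagingTable;

    COMMIT TRANSACTION;
END;
```

The Execute SQL Task after the data flow then only needs the statement EXEC dbo.LoadFromStaging;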
I've found that SSIS is great for ETL work, but moving data around inside a DB, or aggregation work, is best carried out using T-SQL in stored procs. It is easier to write and control, and you know you're not going to hit any of the RBAR shenanigans you can happen upon in a DFT.
YMMV