SSIS SELECT VALUE from a table without a lookup - ssis

I'm fairly new to SSIS,
I'm importing from an XLS spreadsheet into a database table. Along the way I want to select a record from a table, but it is NOT a lookup, ie: a straight SELECT with no join from input source. Then I want to merge this along with the other rows from the XLS.
What is the best way to do this? Variables? OLE DB commands?
Thanks

You could use an OLE DB command but the important thing to remember about this is that it is fired on a per-row basis and could potentially be slow. You can still use a lookup for this purpose, but make sure that you use set the error output to ignore lookup errors for the cases when the lookup transformation does not contain an value for the match you are looking for.
You could also use a merge transformation with an outer join condition rather than an inner join.

If the record that you are retrieving from the database table is not dependent on the data within the row from the spreadsheet then it will probably be the same for each row - is that what you are hoping for?
In this case, I would consider using an Execute SQL Task in the Control Flow to retrieve the record and save it to a variable. You can use a Script Component in the Data Flow to copy the values in the record from the variable to the appropriate fields in each row. This will mean that the lookup data is retrieved only once and not once per row which is slow as jn29098 said above.
If the target for your Data Flow is the same database as the one from which you are extracting the 'lookup' record then you could also consider using an Execute SQL Task (in the Control Flow) to add the lookup values once the spreadsheet data has arrived in the database (once the Data Flow has completed). This would be much more efficient.

Related

SSIS: How to get the number of updated and deleted rows in an audit?

Imagine that you want to save in a variable the number of rows the were updated or deleted in a table.
‌
This is the steps that i did:
First, in the Control flow i created a Data Flow Task.
Them, in the Data Flow, i created a source(in my case is a excel file), then i proceeded to create two variables to count those rows- countDeleted and countUpdated, then connected the variables to two row count transformations, and them connected my destination (OLE DB).
Now in the control flow, what do i do??
Create a SQL execute task?? or a Script task?? What is the best way to do it?? What is the piece of code to use??
Thanks for youy help.
P‌S: i only have 4 weeks off SSIS, sorry for my noobieness :)
An OLD DB destination only inserts. It can't UPDATE or DELETE
What's your logic for updating or deleting?
If you're just starting out and reading about doing things in SSIS you will eventually find advice to use the OLE DB Command to perform row by row delete and inserts.
In my opinion this is to be avoided. It does not scale (works fine for small recorsets then fails for large recordsets), and it is difficult to maintain parameter mappings in the OLE DB Command. Although you should try it anyway to familiarise yourself with it.
My advice is to load the Excel data into a staging table, perform batch DELETE and UPDATE statements to load the data and use ##ROWCOUNT to capture the records updated.
For example;
Your existing described dataflow can be used to load into a table called StagingTable
Before your dataflow you should run an Execute SQL Task (This is in the Control Flow pane, not the Data Flow pane) that clears the staging table:
TRUNCATE TABLE StagingTable;
So first get that working - repeatedly running your package clears the staging table then loads Excel into it without creating duplicates
This in itself is a challenge as Excel is a terrible data interchange format.
Once you have that working, you add an execute SQL task to the end that runs some SQL that deletes the records you want and captures the count. For example:
DELETE FROM MyFinalTable WHERE PriamryKey IN (SELECT PrimaryKey FROM StagingTable);
SELECT ##ROWCOUNT;
Then you follow the instructions here to load that back to your SSIS variable
http://microsoft-ssis.blogspot.com/2011/03/rowcount-for-execute-sql-statement.html
What are you doing with this row count? Are you writing it to a logging table? Save
yourself the bother of pulling it back into an SSIS variable and just write it directly:
DELETE FROM MyFinalTable WHERE PriamryKey IN (SELECT PrimaryKey FROM StagingTable);
INSERT INTO LogTable(Table,Operation,Type)
SELECT 'MyFinalTable','Delete', ##ROWCOUNT;
In my experience it is not a good idea to build convoluted logic into SSIS packages if you can instead do in a database. Although it does depend on the person who has to eventually maintain it. Hopefully you can appreciate that this T-SQL approach is a more straightforward code based approach as opposed to having to dig around in property pages and events and other places inside SSIS packages.
I assume that you're using an Execute SQL Task for the updates and deletes? As #Nick.McDermaid mentioned, using an OLE DB Command within a Data Flow presents various issues when performing DML. You can find the number of rows updated, inserted, or deleted in a table through an Execute SQL Task by using the ExecValueVariable property of this task. Set the variable that will hold the row count to this property and it will return the number of affected rows. Note that is will only return the number of rows impacted by the last statement in the Execute SQL Task, regardless of batches (i.e. GO separators) are in the component.

Package is struck at "Execute phase is beginning" at Lookup task

I have used a Lookup in my data flow task. When I use Full Cache mode, the data flow task runs fine. But when I use Partial Cache or no Cache in my lookup, the records do not go past the lookup task and it keeps running for hours. I have checked for errors but there aren't any errors displayed. Could anyone please help me on this?
A Lookup is not appropriate for your task. Instead:
Add an OLE DB Source to pull in the data
Sort the records from the incoming source and the OLE DB Source
Perform a merge join (Full outer).
Add a Derived Column Transformation to check for ISNULL on the two joining columns. Create a new output column Called Action. For the NULLs in the target then you will tag that as an INSERT record.
Add a conditional split to send the INSERT record to an OLE DB Destination to insert the new records.
You can also check to see if there are matches between the two populations and perform updates, or look for NULLs in the source and DELETE in the destination.

Which SSIS transformation can perform 'NOT IN' constraint used in SQL query?

I have two OLEDB Data Sources that have similar columns:
TMP_CRUZTRANS
-------------
CUENTA_CTE numeric (20,0)
TMP_CTACTE_S_USD
----------------
CON_OPE numeric(20,0)
I need to substract all the similar values between this two tables and keep the rows which are different. Is there a transformation/task within SSIS that can perform NOT IN constraint normally used in SQL query?
Currently, I am performing this operation using Execute SQL Task on Control Flow.
The top Data Flow creates the first table TMP_CRUZTRANS (Merge join between other 2 tables... But I guess that's not important for my question) that i need to keep the different values with the second table.
In the Execute SQL Task, I have the following statement:
INSERT INTO [dbo].[TMP_CYA]
SELECT RUT_CLIE, CUENTA_CTE, MONTO_TRANSAC
FROM [dbo].[TMP_CRUZTRANS]
WHERE CUENTA_CTE NOT IN (SELECT CON_OPE FROM TMP_CTACTE_S_USD)
Finally, with the new table TMP_CYA I can continue with my work.
The problem with this approach is that the TMP_CRUZTRANS got like 5 millions of rows, so it's VERY slow inserting all this data into a table using Execute SQL Task. It takes about like 5 hours to perform this operation. That's why I need to do this inside the Data Flow task.
You can use Lookup transformation available within Data Flow task to achieve your requirement.
Here is a sample that illustrates what you are trying to achieve.
Create a package with data flow task. Inside the data flow task, use OLE DB Source to read data from your source table TMP_CRUZTRANS. Use Lookup transformation to validate the existence of the values against the table dbo.TMP_CTACTE_S_USD between given columns. Then redirect the non-matching output to OLE DB Destination to insert rows into table dbo.TMP_CYA
Here is how data flow task would look like in place of the Execute SQL Task that you are currently using.
Configure the Lookup transformation as shown below:
On the General tab page, select Redirect rows to no match output from Specify how to handle rows with no matching entries because you are interested only in non matching rows.
On the Connection tab page, select the appropriate OLE DB Connection manager and select the table dbo.TMP_CTACTE_S_USD. That is the table against which you would like to validate the data.
On the Columns tab page, drag the column CUENTA_CTE and drop it on CON_OPE to establish the mapping between source and lookup tables. Click OK.
When you connect the Lookup transformation with OLE DB Destination, Input Output Selection dialog will appear. Please make sure to select Lookup No Match Output.
Here is the sample before executing the package.
You can see that only 2 rows non matching rows have been transferred to OLE DB destination.
You can notice that the destination table now contains the two non matching rows after package execution.
Hope that helps.

SSIS OLE DB conditional "insert"

I have no idea whether this can be done or not, but basically, I have the following data flow:
Extracts the data from an XML file (works fine)
Simply splits the records based on an enclosed condition (works fine)
Had to add a derived column object due to some character set issues (might be better methods, but it works)
Now "Step 4" is where I'm running into a scenario where I'd only like to insert the values that have a corresponding match in my database, for instance, the XML has about 6000 records, and from those, I have maybe 10 of them that I need to match back against and insert them instead of inserting all 6000 of them and doing the compare after the fact (which I could also do, but was hoping there'd be another method). I was thinking that I might be able to perform a sql insert command within the OLE DB DESTINATION object where the ID value in the file matches, but that's what I'm not 100% clear on or if it's even possible for that matter. Should I simply go the temp table route and scrub the data after the fact, or can I do this directly in the destination piece? Any suggestions would be greatly appreciated.
EDIT
Thanks to the last comment from billinkc, I managed to get bit closer, where I can identify the matches and use that result set, but somehow it seems to be running the data flow twice, which is strange.... I took the lookup object out to see whether it was causing it and somehow it seems to be the case, any reason why it would run this entire flow twice with the addition of the lookup? I should have a total of 8 matches, which I confirmed with the data viewer output, but then it seems to be running it a second time for the same file.
Is there a reason you can't use a Lookup transformation to find existing records. Configure it so that it routes non-match records to the no match output and then only connect the match found connector to the "Navigator Staging Manager Funds"
I believe that answers what you've asked but I wonder if you're expressing the right desire? My assumption is the lookup would go against the existing destination and so the lookup returns the id 10 for a row. All of the out of the box destinations in SSIS only perform inserts, so that row that found a match would now get doubled. As you are looking for existing rows, that usually implies you'd want to perform an update to an existing row. If that's the case, there is a specially designed transformation, the OLE DB Command. It is the component that allows for updates. There is a performance problem with that component, it issues a single update statement per row flowing through it. For 10 rows, I think it'd be fine. Otherwise, the pattern you'd use is to write all the new rows (inserts) into your destination table and then write all of your changed rows (updates) into a second staging-type table. After the data flow is complete, then use an Execute SQL Task to perform a set based update statement.
There are third party options that handle combined upserts. I know Pragmatic Works has an option and there are probably others on the tasks and components site.

SSIS: recordset or temp table

I have an SSIS application that needs to get data from 2 databases of different servers (not link). I need to get the match names and DOB records between 2 database then use the results to insert/update a table.
My initial approach is to use OLE DB source then Merge Join and put the results to recordset. Then on controlflow, use the results of the recordset to insert/update a table. But I can't see the recordset at the control flow.
Alternative solution is to create temp tables. But the temp tables are not visible since they reside at the tempdb database of each servers.
What is a better approach for this problem?
what do you mean by put the results to recordset?
If you join two sources on the data flow using a join, that "recordset" on the join will only be available during the current dataflow. You cant use it on the control flow after the data flow is finisehd.
why cant you just insert the resultset on the destination DB? You can perform any other transform operation on the same data flow and insert the result on the destination database.
Or, if you really need to do something that can only be done on the control flow before insert the data, you can yes, insert the recordset on a temp table on the destination using a oleDBDestination and access in on another dataflow (not a very good approach, though)
In this case, I would keep a database around for work table or create a schema for those work tables.
Next, add a SQL control flow task that truncates the table that will hold the intermediate result. After this, load the intermediate result set into the table, do the operation and optionally, truncate the table again.
The recordset destination is fine for smaller datasets. But if you plan to use it for larger datasets that dont fit memory it will be very slow.
If you dont have a database/schema that can serve as a workspace, you could use RAW files to hold the intermediate result. Those are very fast too.