How to use parameters inside a Lookup transformation in SSIS

I have an input CSV file with the columns eid, ename, and designation. Next I use a Lookup transformation, and inside the Lookup I am using a query like
select * from employee where ename=?
I need to supply the ? parameter from the CSV file. That is, the ename value from the CSV file has to be passed into the query by the Lookup transformation.
Inside the Lookup I changed the cache mode to Partial cache, and on the Advanced tab I selected Modify the SQL statement, pasted my query, and clicked on the Parameters tab. But I don't know how to pass the parameter.

You can't add parameters to your lookup query. If your goal in adding the parameter is to reduce the amount of data read from the database, you don't have to worry: the "partial cache" mode will do that for you.
Partial cache means that the lookup query is not run up front to load the whole reference table (as the full cache option does). Instead, rows are added to the cache one by one as they are queried from the database. So, if your lookup table has one million rows and your input only references 10 of them, the lookup will issue 10 selects against your database and end up with only 10 rows in the cache.
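To make the partial cache behaviour concrete, here is a rough sketch in Python with an in-memory SQLite table standing in for the employee table. It is only a toy model of the idea (query on a cache miss, reuse on a hit), not how SSIS is implemented; the class name PartialCacheLookup and the sample data are made up for illustration.

```python
import sqlite3


def build_reference_db(n_rows):
    """Create an in-memory employee table standing in for the lookup reference table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (eid INTEGER, ename TEXT, designation TEXT)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [(i, f"emp{i}", "staff") for i in range(n_rows)],
    )
    return conn


class PartialCacheLookup:
    """Hypothetical toy model: query the database on a cache miss, reuse the row on a hit."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}
        self.queries_issued = 0

    def lookup(self, ename):
        if ename not in self.cache:
            # Cache miss: one parameterized select for this key only.
            self.queries_issued += 1
            self.cache[ename] = self.conn.execute(
                "SELECT eid, ename, designation FROM employee WHERE ename = ?",
                (ename,),
            ).fetchone()
        return self.cache[ename]


conn = build_reference_db(1000)
lk = PartialCacheLookup(conn)

# 30 input rows that mention only 10 distinct names.
input_names = [f"emp{i % 10}" for i in range(30)]
results = [lk.lookup(name) for name in input_names]

print(lk.queries_issued, len(lk.cache))
```

The reference table has 1000 rows, but only the 10 names that actually appear in the input are ever queried or cached.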

Related

Running a DFT Based on a condition in SSIS

I've a package where I need to retrieve data from a mysql table and insert it into sql server table.
I have a situation where old data often gets modified, and the client wants to dump all the data every time, which is too large and time-consuming. So I've proposed that we load only yesterday's data on weekdays and do a complete dump on the weekend. Is it possible to enable or disable a DFT based on an expression? I've tried using Expressions->Disable based on DATEPART(WeekDAY,GETDATE()), but it runs a complete load irrespective of the expression's value.
Regards,
Vijay
Create a SQL Task or Script Task that does your expression, and set the result to a variable.
Then create your data flow task.
Then connect the two with an arrow (aka precedence constraint)
Then right-click the arrow and choose Edit, and in the Precedence Constraint Editor choose
Evaluation operation: Expression
Expression: a test on your variable, such as @iRowsUpdated == True
You should put the condition as a WHERE clause on the SELECT statement in your source query. You will need to change the access mode from table to SQL statement first.
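As a rough illustration of the decision both answers rely on (full load on the weekend, yesterday's rows otherwise), here is a small Python sketch. The table name source_table and the column modified_date are placeholders I made up, and the query is returned with ? placeholders rather than being run against a real source.

```python
from datetime import date, timedelta


def load_mode(run_date):
    """Return 'full' on Saturday or Sunday, otherwise 'incremental'."""
    # date.weekday(): Monday is 0, Sunday is 6.
    return "full" if run_date.weekday() >= 5 else "incremental"


def build_source_query(run_date):
    """Build the source SELECT and its parameters (hypothetical table and column names)."""
    base = "SELECT * FROM source_table"
    if load_mode(run_date) == "full":
        return base, ()
    yesterday = run_date - timedelta(days=1)
    where = " WHERE modified_date >= ? AND modified_date < ?"
    return base + where, (yesterday.isoformat(), run_date.isoformat())


print(build_source_query(date(2024, 1, 3)))  # a Wednesday
print(build_source_query(date(2024, 1, 6)))  # a Saturday
```

In the package itself, the same test would either feed the variable checked by the precedence constraint or be folded into the WHERE clause of the source query.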

Which SSIS transformation can perform 'NOT IN' constraint used in SQL query?

I have two OLEDB Data Sources that have similar columns:
TMP_CRUZTRANS
-------------
CUENTA_CTE numeric (20,0)
TMP_CTACTE_S_USD
----------------
CON_OPE numeric(20,0)
I need to subtract all the matching values between these two tables and keep only the rows that are different. Is there a transformation/task within SSIS that can perform the NOT IN constraint normally used in a SQL query?
Currently, I am performing this operation using Execute SQL Task on Control Flow.
The top Data Flow creates the first table, TMP_CRUZTRANS (a merge join between 2 other tables, but I guess that's not important for my question), from which I need to keep the values that do not appear in the second table.
In the Execute SQL Task, I have the following statement:
INSERT INTO [dbo].[TMP_CYA]
SELECT RUT_CLIE, CUENTA_CTE, MONTO_TRANSAC
FROM [dbo].[TMP_CRUZTRANS]
WHERE CUENTA_CTE NOT IN (SELECT CON_OPE FROM TMP_CTACTE_S_USD)
Finally, with the new table TMP_CYA I can continue with my work.
The problem with this approach is that TMP_CRUZTRANS has about 5 million rows, so inserting all this data into a table using an Execute SQL Task is VERY slow. It takes about 5 hours to perform this operation. That's why I need to do this inside the Data Flow task.
You can use Lookup transformation available within Data Flow task to achieve your requirement.
Here is a sample that illustrates what you are trying to achieve.
Create a package with a data flow task. Inside the data flow task, use an OLE DB Source to read data from your source table TMP_CRUZTRANS. Use a Lookup transformation to check whether the values exist in the table dbo.TMP_CTACTE_S_USD, matching on the given columns. Then redirect the non-matching output to an OLE DB Destination to insert the rows into table dbo.TMP_CYA.
Here is how the data flow task would look in place of the Execute SQL Task that you are currently using.
Configure the Lookup transformation as shown below:
On the General tab page, select Redirect rows to no match output from Specify how to handle rows with no matching entries because you are interested only in non matching rows.
On the Connection tab page, select the appropriate OLE DB Connection manager and select the table dbo.TMP_CTACTE_S_USD. That is the table against which you would like to validate the data.
On the Columns tab page, drag the column CUENTA_CTE and drop it on CON_OPE to establish the mapping between source and lookup tables. Click OK.
When you connect the Lookup transformation to the OLE DB Destination, the Input Output Selection dialog will appear. Make sure to select Lookup No Match Output.
Here is the sample before executing the package.
You can see that only the 2 non-matching rows have been transferred to the OLE DB destination.
After package execution, the destination table contains the two non-matching rows.
Hope that helps.
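If it helps to see why the no match output gives the same result as NOT IN, here is a rough Python sketch of the routing. The function name split_by_lookup and the sample values are made up; only the table and column names come from the question.

```python
def split_by_lookup(source_rows, reference_keys, key):
    """Route each source row to a match or no-match list, like a Lookup's two outputs."""
    ref = set(reference_keys)  # reference keys are loaded once, then probed per row
    matched, no_match = [], []
    for row in source_rows:
        (matched if row[key] in ref else no_match).append(row)
    return matched, no_match


# Made-up sample data shaped like the tables in the question.
tmp_cruztrans = [
    {"RUT_CLIE": 1, "CUENTA_CTE": 100, "MONTO_TRANSAC": 10.0},
    {"RUT_CLIE": 2, "CUENTA_CTE": 200, "MONTO_TRANSAC": 20.0},
    {"RUT_CLIE": 3, "CUENTA_CTE": 300, "MONTO_TRANSAC": 30.0},
    {"RUT_CLIE": 4, "CUENTA_CTE": 400, "MONTO_TRANSAC": 40.0},
]
tmp_ctacte_s_usd = [100, 300]  # CON_OPE values

matched, tmp_cya = split_by_lookup(tmp_cruztrans, tmp_ctacte_s_usd, "CUENTA_CTE")
print([row["CUENTA_CTE"] for row in tmp_cya])
```

Only the rows whose CUENTA_CTE is absent from CON_OPE land in the no-match list, which is what the destination table TMP_CYA receives.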

Update SQL Table with SSIS Rowcount

I have a flat file that I am saving to a SQL Table. I want to count the rows that inserted and write the count to another table.
The simple answer is to create an SSIS Variable and drop a RowCount transformation onto your dataflow.
Create a variable
On the Control Flow, click in the background. Do not click on any tasks or your variable will be created at the wrong scope (this caveat does not apply to 2012). Right-click and select Variables. In the Variables window, click the Add button and name the variable RowCount with a data type of Int32 (unless you expect more than about 2 billion rows, in which case use Int64).
Add a row count transformation
Inside your data flow, add a Row Count transformation after your data source. Configure it to use the variable we created above. The resulting data flow might look something like this
It is important to note that the Row Count component does not assign the row count to the User::RowCount variable until after the data flow completes.
Saving the row count value
Once the data flow finishes, you would then need to use an Execute SQL Task in the Control Flow to write the value into your table.
The Execute SQL Task would look something like this, depending on what your table is defined as.
INSERT INTO
    dbo.RowCounts
(
    rowcounts
)
SELECT
    ? AS rowcounts
In the Parameter Mapping tab, it would look like
Variable Name: User::RowCount | Direction: Input | Data Type: Long | Parameter Name: 0 | Parameter Size: -1
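For anyone who wants to try the parameterized insert outside SSIS, here is a minimal sketch using Python's sqlite3 module, which also uses ? as its positional placeholder. The value 1234 is a stand-in for whatever the Row Count transformation stored in the variable, and the table is created in memory (without the dbo schema) just for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RowCounts (rowcounts INTEGER)")

# Stand-in for the value held in User::RowCount after the data flow completes.
row_count = 1234

# Same shape as the statement above; the single ? is bound to parameter 0.
conn.execute("INSERT INTO RowCounts (rowcounts) SELECT ? AS rowcounts", (row_count,))
conn.commit()

saved = conn.execute("SELECT rowcounts FROM RowCounts").fetchall()
print(saved)
```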

SSIS OLE DB conditional "insert"

I have no idea whether this can be done or not, but basically, I have the following data flow:
1. Extracts the data from an XML file (works fine)
2. Simply splits the records based on an enclosed condition (works fine)
3. Had to add a derived column object due to some character set issues (might be better methods, but it works)
Now "Step 4" is where I'm running into a scenario where I'd only like to insert the values that have a corresponding match in my database, for instance, the XML has about 6000 records, and from those, I have maybe 10 of them that I need to match back against and insert them instead of inserting all 6000 of them and doing the compare after the fact (which I could also do, but was hoping there'd be another method). I was thinking that I might be able to perform a sql insert command within the OLE DB DESTINATION object where the ID value in the file matches, but that's what I'm not 100% clear on or if it's even possible for that matter. Should I simply go the temp table route and scrub the data after the fact, or can I do this directly in the destination piece? Any suggestions would be greatly appreciated.
EDIT
Thanks to the last comment from billinkc, I managed to get bit closer, where I can identify the matches and use that result set, but somehow it seems to be running the data flow twice, which is strange.... I took the lookup object out to see whether it was causing it and somehow it seems to be the case, any reason why it would run this entire flow twice with the addition of the lookup? I should have a total of 8 matches, which I confirmed with the data viewer output, but then it seems to be running it a second time for the same file.
Is there a reason you can't use a Lookup transformation to find existing records? Configure it so that it routes non-matching records to the no match output, and then connect only the match output to the "Navigator Staging Manager Funds" destination.
I believe that answers what you've asked, but I wonder if you're expressing the right desire. My assumption is that the lookup would go against the existing destination, so the lookup returns, say, the id 10 for a row. All of the out-of-the-box destinations in SSIS only perform inserts, so a row that found a match would now get doubled. As you are looking for existing rows, that usually implies you'd want to perform an update to an existing row. If that's the case, there is a transformation designed for it, the OLE DB Command. It is the component that allows for updates. That component has a performance problem, though: it issues a single update statement per row flowing through it. For 10 rows, I think it'd be fine. Otherwise, the pattern you'd use is to write all the new rows (inserts) into your destination table and write all of your changed rows (updates) into a second staging-type table. After the data flow is complete, use an Execute SQL Task to perform a set-based update statement.
There are third party options that handle combined upserts. I know Pragmatic Works has an option and there are probably others on the tasks and components site.
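Here is a rough sketch of that insert/stage/update pattern, using Python and an in-memory SQLite database instead of SSIS. The table names funds and funds_updates and the sample rows are invented for the example; the point is only that new rows go straight to the destination, changed rows go to a staging table, and one set-based UPDATE runs at the end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE funds (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE funds_updates (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO funds VALUES (1, 'old A');
    INSERT INTO funds VALUES (2, 'old B');
    """
)

incoming = [(1, "new A"), (2, "new B"), (3, "new C")]

# The "lookup": which incoming ids already exist in the destination?
existing_ids = {row[0] for row in conn.execute("SELECT id FROM funds")}
inserts = [r for r in incoming if r[0] not in existing_ids]
updates = [r for r in incoming if r[0] in existing_ids]

# New rows go straight to the destination; changed rows go to staging.
conn.executemany("INSERT INTO funds VALUES (?, ?)", inserts)
conn.executemany("INSERT INTO funds_updates VALUES (?, ?)", updates)

# One set-based update after the "data flow", instead of one UPDATE per row.
conn.execute(
    """
    UPDATE funds
    SET name = (SELECT u.name FROM funds_updates AS u WHERE u.id = funds.id)
    WHERE id IN (SELECT id FROM funds_updates)
    """
)
conn.commit()

final = conn.execute("SELECT id, name FROM funds ORDER BY id").fetchall()
print(final)
```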

SSIS SELECT VALUE from a table without a lookup

I'm fairly new to SSIS,
I'm importing from an XLS spreadsheet into a database table. Along the way I want to select a record from a table, but it is NOT a lookup, ie: a straight SELECT with no join from input source. Then I want to merge this along with the other rows from the XLS.
What is the best way to do this? Variables? OLE DB commands?
Thanks
You could use an OLE DB Command, but the important thing to remember is that it is fired on a per-row basis and could potentially be slow. You can still use a Lookup for this purpose, but make sure that you set the error output to ignore lookup errors, for the cases when the Lookup transformation does not contain a value for the match you are looking for.
You could also use a Merge Join transformation with an outer join rather than an inner join.
If the record that you are retrieving from the database table is not dependent on the data within the row from the spreadsheet then it will probably be the same for each row - is that what you are hoping for?
In this case, I would consider using an Execute SQL Task in the Control Flow to retrieve the record and save it to a variable. You can use a Script Component in the Data Flow to copy the values in the record from the variable to the appropriate fields in each row. This will mean that the lookup data is retrieved only once and not once per row which is slow as jn29098 said above.
If the target for your Data Flow is the same database as the one from which you are extracting the 'lookup' record then you could also consider using an Execute SQL Task (in the Control Flow) to add the lookup values once the spreadsheet data has arrived in the database (once the Data Flow has completed). This would be much more efficient.
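The "retrieve once, copy onto every row" idea can be sketched in Python as below. The settings table, its columns, and the spreadsheet rows are all made up; in SSIS the single SELECT would be the Execute SQL Task and the copy step would be the Script Component.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (batch_code TEXT, region TEXT)")
conn.execute("INSERT INTO settings VALUES ('B42', 'EMEA')")

# Retrieve the record once, before any rows are processed.
batch_code, region = conn.execute(
    "SELECT batch_code, region FROM settings"
).fetchone()

# Stand-in for rows read from the spreadsheet.
spreadsheet_rows = [
    {"item": "widget", "qty": 3},
    {"item": "gadget", "qty": 5},
]

# Copy the same values onto every row; no per-row query is issued.
enriched = [dict(row, batch_code=batch_code, region=region) for row in spreadsheet_rows]
print(enriched)
```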