Running a DFT based on a condition in SSIS

I have a package where I need to retrieve data from a MySQL table and insert it into a SQL Server table.
I have a situation where old data often gets modified, and the client wants to dump all the data, which is too large and time consuming. So I've come up with a proposal: we'd load only yesterday's data on weekdays and do a complete dump on the weekend. Is there a way to enable/disable a DFT based on an expression? I've tried using Expressions -> Disable based on DATEPART(WEEKDAY, GETDATE()), but it runs a complete load irrespective of the expression's value.
Regards,
Vijay

Create an Execute SQL Task or Script Task that evaluates your condition, and set the result to a variable.
Then create your Data Flow Task.
Then connect the two with an arrow (aka precedence constraint).
Then right-click the arrow, choose Edit, and in the Precedence Constraint Editor choose:
Evaluation operation: Expression
Value: @YourVariable = {an expression, such as @iRowsUpdated == True}
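For example, the Execute SQL Task could run something like this (a minimal sketch; the variable name User::IsFullLoad and the weekend test are assumptions, not from the original answer). Map the single-row result to the variable on the task's Result Set page:
-- With SQL Server's default DATEFIRST setting, 1 = Sunday and 7 = Saturday
SELECT CASE WHEN DATEPART(WEEKDAY, GETDATE()) IN (1, 7) THEN 1 ELSE 0 END AS IsFullLoad;
The precedence constraint expression into the full-load Data Flow Task would then be @[User::IsFullLoad] == 1, and the one into the incremental-load task @[User::IsFullLoad] == 0.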

You should put the condition in a WHERE clause in your source query's SELECT statement. You will need to change the access mode from Table to SQL Command first.
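For this question that might look something like the sketch below, run against the MySQL source (table and column names are placeholders; in MySQL's DAYOFWEEK, 1 is Sunday and 7 is Saturday):
SELECT *
FROM source_table
WHERE DAYOFWEEK(CURDATE()) IN (1, 7)               -- weekend: full load
   OR modified_date >= CURDATE() - INTERVAL 1 DAY; -- weekday: yesterday's changes only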


SSIS: How to get the number of updated and deleted rows in an audit?

Imagine that you want to save in a variable the number of rows that were updated or deleted in a table.
These are the steps that I did:
First, in the Control Flow I created a Data Flow Task.
Then, in the Data Flow, I created a source (in my case an Excel file), created two variables to count those rows - countDeleted and countUpdated - connected them to two Row Count transformations, and then connected my destination (OLE DB).
Now, in the Control Flow, what do I do?
Create an Execute SQL Task? Or a Script Task? What is the best way to do it? What is the piece of code to use?
Thanks for your help.
PS: I only have 4 weeks of SSIS, sorry for my noobieness :)
An OLE DB destination only inserts. It can't UPDATE or DELETE.
What's your logic for updating or deleting?
If you're just starting out and reading about doing things in SSIS, you will eventually find advice to use the OLE DB Command to perform row-by-row deletes and updates.
In my opinion this is to be avoided. It does not scale (it works fine for small recordsets, then fails for large recordsets), and it is difficult to maintain parameter mappings in the OLE DB Command. Although you should try it anyway to familiarise yourself with it.
My advice is to load the Excel data into a staging table, perform batch DELETE and UPDATE statements to load the data, and use @@ROWCOUNT to capture the records updated.
For example:
The data flow you already described can be used to load into a table called StagingTable.
Before your dataflow you should run an Execute SQL Task (This is in the Control Flow pane, not the Data Flow pane) that clears the staging table:
TRUNCATE TABLE StagingTable;
So first get that working: repeatedly running your package should clear the staging table, then load Excel into it without creating duplicates.
This in itself is a challenge, as Excel is a terrible data interchange format.
Once you have that working, you add an execute SQL task to the end that runs some SQL that deletes the records you want and captures the count. For example:
DELETE FROM MyFinalTable WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable);
SELECT @@ROWCOUNT;
Then you follow the instructions here to load that back to your SSIS variable
http://microsoft-ssis.blogspot.com/2011/03/rowcount-for-execute-sql-statement.html
What are you doing with this row count? Are you writing it to a logging table? Save yourself the bother of pulling it back into an SSIS variable and just write it directly:
DELETE FROM MyFinalTable WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable);
INSERT INTO LogTable (TableName, Operation, RowsAffected)
SELECT 'MyFinalTable', 'Delete', @@ROWCOUNT;
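The matching UPDATE can be logged the same way; a sketch only, with the join key and updated column as placeholders:
UPDATE f
SET f.SomeColumn = s.SomeColumn
FROM MyFinalTable f
JOIN StagingTable s ON s.PrimaryKey = f.PrimaryKey;
INSERT INTO LogTable (TableName, Operation, RowsAffected)
SELECT 'MyFinalTable', 'Update', @@ROWCOUNT;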
In my experience it is not a good idea to build convoluted logic into SSIS packages if you can instead do it in the database. Although it does depend on the person who eventually has to maintain it. Hopefully you can appreciate that this T-SQL approach is a more straightforward, code-based approach, as opposed to having to dig around in property pages and events and other places inside SSIS packages.
I assume that you're using an Execute SQL Task for the updates and deletes? As @Nick.McDermaid mentioned, using an OLE DB Command within a Data Flow presents various issues when performing DML. You can find the number of rows updated, inserted, or deleted in a table through an Execute SQL Task by using the ExecValueVariable property of this task. Set the variable that will hold the row count to this property and it will return the number of affected rows. Note that it will only return the number of rows impacted by the last statement in the Execute SQL Task, regardless of whether batches (i.e. GO separators) are in the component.

How to create a new column with an SQL function in a Pentaho PDI workflow?

I have an SQL function, let's say theFunction(item_id). It takes an item id and computes one value as its return. I read one table from the DB and I am supposed to compute a new value to append for each row, using this function with the item_id particular to that row. Which design block would do this for me with the following SQL (if not wrong)?
select theFunction(item_id);
I assume that the block gives me the item_id of each row as a variable.
You can use another table input step, and have it accept fields from previous steps and execute for every row (both config options are at the bottom of the step's window).
Beware that this is a rather slow implementation. Each query is executed separately and as such each row requires a round trip to the database.
Alternatively, you can use the Row SQL Script. I believe it allows you to pass all SQL statements in a single trip to the database.
An SQL function is probably much more efficient to run in the database, for all rows at once, instead of making a separate call into the database from PDI for each row to execute the function. So if performance is at all a relevant concern, I'd suggest a whole different strategy:
Write your rows to a table in the database. End your transformation here.
On the job level, first execute your transformation from above, then execute the function in an "Execute SQL script..." component, giving it an SQL command somewhat like "UPDATE my_temp_table SET target_col = theFunction(item_id)" (see the sketch after this list).
Continue your job with the remaining steps in a new transformation, starting from that table as input.
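A sketch of that job-level SQL, reusing the placeholder names from the steps above (my_temp_table, target_col, and theFunction). Because it runs once, set-based, it avoids the per-row round trips mentioned earlier:
UPDATE my_temp_table
SET target_col = theFunction(item_id);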
This of course presupposes that you don't have too many other threads going on, but if your transformation is simple and linear -- or at least if it can be made single-linear at this particular step -- it may be possible to split it up into two parts before and after this SQL call.

SQL Profiler: who modified a column?

Is it possible to use SQL Server Profiler 2008 to catch which program, run from which computer, is modifying table SALES column POSTED from false to true, but excluding stored procedure 'salesposting'?
I suppose I can use the 'objectname' filter for the table, but how do I filter the column, the previous value, and the current value?
You could trace specific procedures or update statements through the text data option in the Filters list.
It is probably simpler to add a trigger capturing the user and input buffer; that is likely the best approach.
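A minimal sketch of such a trigger (the audit table SalesPostedAudit and the key column SalesId are assumptions for illustration; HOST_NAME() and APP_NAME() capture the computer and program making the change):
CREATE TRIGGER trg_Sales_PostedAudit
ON SALES
AFTER UPDATE
AS
BEGIN
    IF UPDATE(POSTED)
        INSERT INTO SalesPostedAudit (HostName, ProgramName, ChangedAt)
        SELECT HOST_NAME(), APP_NAME(), SYSDATETIME()
        FROM inserted i
        JOIN deleted d ON d.SalesId = i.SalesId  -- SalesId is a placeholder key
        WHERE d.POSTED = 0 AND i.POSTED = 1;     -- only false -> true changes
END;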
There are ApplicationName and HostName columns that you can trace. Just tick the "Show all columns" check box and you'll see them.
If you want to filter by stored procedure, filter on the RPC:Completed event.

What is the best tool to use to transfer data from a reporting database to another?

I have a reporting database and have to transfer data from it to another server, where we run some other reports or functions on the data. What is the best way to transfer data periodically, like monthly or bi-weekly? I can use SSIS, but is there any way I can put a WHERE clause on which rows should be extracted from the source database? Like, I only want to extract data for the current month. Please do let me know.
Thanks,
Vivek
For scheduling periodic extractions, I'd leave that to SQL Agent.
As for restricting the results by some condition, that's an easy thing. Instead of the Table Name access mode, use SQL Command or SQL Command From Variable and write your own query (you should always prefer these over Table Name/Table Name From Variable anyway, as they are faster).
Add a parameter. If you're using an OLE DB connection manager, your indicator for a variable is ?. With ADO.NET it will be @parameterName.
Now wire the filter up by clicking the Parameters... button. With OLE DB, parameters are referenced by ordinal position starting at 0. If you want to use the same parameter twice, you will have to list it each time, or use the ADO.NET connection manager.
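A sketch of what the parameterized source query might look like with an OLE DB source (table and column names are placeholders; the ? would be mapped on the Parameters... page, at ordinal 0, to a date variable such as User::RangeStart holding the first day of the current month):
SELECT *
FROM dbo.SourceTable
WHERE ModifiedDate >= ?;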
The biggest question you will have to answer is: how do I identify which row(s) need to go? Possibilities are endless: query the target database and find the most recent modified date or the highest key value for a table. You could create a local table that tracks what's been sent and query that. You could perform an incremental load / ETL Instrumentation to identify new/updated/unchanged rows, etc.

SSIS SELECT VALUE from a table without a lookup

I'm fairly new to SSIS,
I'm importing from an XLS spreadsheet into a database table. Along the way I want to select a record from a table, but it is NOT a lookup, i.e. a straight SELECT with no join to the input source. Then I want to merge this with the other rows from the XLS.
What is the best way to do this? Variables? OLE DB commands?
Thanks
You could use an OLE DB Command, but the important thing to remember about this is that it is fired on a per-row basis and could potentially be slow. You can still use a Lookup for this purpose, but make sure that you set the error output to ignore lookup errors for the cases where the Lookup transformation does not contain a value for the match you are looking for.
You could also use a Merge Join transformation with an outer join condition rather than an inner join.
If the record that you are retrieving from the database table is not dependent on the data within the row from the spreadsheet then it will probably be the same for each row - is that what you are hoping for?
In this case, I would consider using an Execute SQL Task in the Control Flow to retrieve the record and save it to a variable. You can use a Script Component in the Data Flow to copy the values from the variable into the appropriate fields in each row. This means the lookup data is retrieved only once, and not once per row, which is slow, as jn29098 said above.
If the target for your Data Flow is the same database as the one from which you are extracting the 'lookup' record then you could also consider using an Execute SQL Task (in the Control Flow) to add the lookup values once the spreadsheet data has arrived in the database (once the Data Flow has completed). This would be much more efficient.
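For that last option, the Execute SQL Task might run something like this sketch (ImportedSheet, RefTable, and the single-row filter are placeholders): once the Data Flow has finished, one set-based statement stamps the lookup value onto every imported row.
UPDATE i
SET i.LookupValue = r.LookupValue
FROM ImportedSheet i
CROSS JOIN (SELECT LookupValue FROM RefTable WHERE RefKey = 'SomeKey') r;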