SSIS: Order by Column Full Result Set for Execute SQL Task

I have an Execute SQL Task placed before a Foreach Loop Container so that, for every row returned in the Full Result Set, a Script Task writes that row to an Excel sheet.
However, I would like to sort the Full Result Set in ASC order on one particular column before iterating through each row, so that the rows are processed in a particular order.
I've tried using an ORDER BY clause in the SQL inside the Execute SQL Task, but it doesn't sort as expected.
Is there a way to sort the Result Set object by a single column before passing it on to another process, i.e. the Script Task?

As it seems you've learned, you cannot use ORDER BY in an Execute SQL task. Instead, use a Sort transformation to order the data after it's been loaded from the Execute SQL task.
In the Sort Transformation Editor you can choose which columns to sort on and in which direction (ascending or descending), and decide which other columns are passed through or removed from the Data Flow.
This image uses an OLE DB Source, but the effect of the Sort transformation is the same if you use an Execute SQL task.
This does raise the question: why are you using an Execute SQL Task? An OLE DB Source would be much easier and more flexible to use. Consider rewriting your SSIS package to use OLE DB sources when possible.
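To illustrate what the Sort transformation does with the rows it receives, here is a minimal Python sketch, not SSIS code: rows are modelled as dictionaries, the function name sort_rows and the sample columns are hypothetical, and the optional duplicate removal mirrors the editor's option to remove rows with duplicate sort values.

```python
from operator import itemgetter


def sort_rows(rows, sort_col, passthrough, remove_duplicates=False):
    """Mimic a Sort transformation (hypothetical helper): order rows ascending
    by one column, keep only the listed columns, optionally drop duplicate sort values."""
    ordered = sorted(rows, key=itemgetter(sort_col))  # stable ascending sort
    result, seen = [], set()
    for row in ordered:
        key = row[sort_col]
        if remove_duplicates and key in seen:
            continue  # skip rows whose sort value was already emitted
        seen.add(key)
        result.append({col: row[col] for col in passthrough})
    return result


rows = [
    {"Id": 3, "Name": "Carol", "Dept": "IT"},
    {"Id": 1, "Name": "Alice", "Dept": "HR"},
    {"Id": 2, "Name": "Bob", "Dept": "IT"},
]
for row in sort_rows(rows, "Name", ["Id", "Name"]):
    print(row)
```

Because Python's sort is stable, rows with equal sort values keep their incoming order.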

Related

SSIS: How to get the number of updated and deleted rows in an audit?

Imagine that you want to save in a variable the number of rows that were updated or deleted in a table.
These are the steps I took:
First, in the Control Flow I created a Data Flow Task.
Then, in the Data Flow, I created a source (in my case an Excel file), created two variables to count those rows (countDeleted and countUpdated), connected the variables to two Row Count transformations, and then connected my destination (OLE DB).
Now, in the Control Flow, what do I do?
Create an Execute SQL Task? Or a Script Task? What is the best way to do it, and what code should I use?
Thanks for your help.
PS: I only have 4 weeks of SSIS experience, sorry for my noobiness :)
An OLE DB destination only inserts. It can't UPDATE or DELETE.
What's your logic for updating or deleting?
If you're just starting out and reading about how to do things in SSIS, you will eventually find advice to use the OLE DB Command to perform row-by-row updates and deletes.
In my opinion this is to be avoided. It does not scale (it works fine for small recordsets, then fails for large ones), and the parameter mappings in the OLE DB Command are difficult to maintain. You should try it anyway, though, to familiarise yourself with it.
My advice is to load the Excel data into a staging table, run set-based DELETE and UPDATE statements against the final table, and use @@ROWCOUNT to capture the number of records affected.
For example:
Your existing data flow can be used to load into a table called StagingTable.
Before your data flow, run an Execute SQL Task (this is in the Control Flow pane, not the Data Flow pane) that clears the staging table:
TRUNCATE TABLE StagingTable;
So first get that working: repeatedly running your package clears the staging table, then loads Excel into it without creating duplicates.
This in itself is a challenge as Excel is a terrible data interchange format.
Once you have that working, add an Execute SQL Task at the end that runs SQL to delete the records you want and capture the count. For example:
DELETE FROM MyFinalTable WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable);
SELECT @@ROWCOUNT;
Then follow the instructions here to load that value back into your SSIS variable:
http://microsoft-ssis.blogspot.com/2011/03/rowcount-for-execute-sql-statement.html
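The staging-table pattern above can be sketched end to end in Python, with the standard sqlite3 module standing in for SQL Server (an assumption for illustration only). SQLite has no TRUNCATE TABLE, so DELETE FROM plays that role, and cursor.rowcount plays the role of @@ROWCOUNT; the helper names load_staging and delete_matching are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE MyFinalTable (PrimaryKey INTEGER PRIMARY KEY, Val TEXT)")
cur.execute("CREATE TABLE StagingTable (PrimaryKey INTEGER PRIMARY KEY)")
cur.executemany("INSERT INTO MyFinalTable VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])


def load_staging(keys):
    # Step 1: clear the staging table (SQLite's stand-in for TRUNCATE TABLE)
    cur.execute("DELETE FROM StagingTable")
    # Step 2: load the extracted rows (stand-in for the Excel data flow)
    cur.executemany("INSERT INTO StagingTable VALUES (?)", [(k,) for k in keys])


def delete_matching():
    # Step 3: one set-based DELETE instead of row-by-row commands
    cur.execute(
        "DELETE FROM MyFinalTable "
        "WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable)"
    )
    # cursor.rowcount plays the role of @@ROWCOUNT here
    return cur.rowcount


load_staging([1, 3])
load_staging([1, 3])  # rerunning clears first, so no duplicate-key error
deleted = delete_matching()
print(deleted)
```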
What are you doing with this row count? Are you writing it to a logging table? If so, save yourself the bother of pulling it back into an SSIS variable and just write it directly:
DELETE FROM MyFinalTable WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable);
INSERT INTO LogTable (TableName, Operation, RowsAffected)
SELECT 'MyFinalTable', 'Delete', @@ROWCOUNT;
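The same idea, with the count written straight to a log table, again sketched on SQLite rather than SQL Server: SQLite's changes() function reports the rows affected by the most recently completed INSERT, UPDATE or DELETE, much like @@ROWCOUNT. The LogTable columns (TableName, Operation, RowsAffected) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyFinalTable (PrimaryKey INTEGER PRIMARY KEY);
CREATE TABLE StagingTable (PrimaryKey INTEGER PRIMARY KEY);
CREATE TABLE LogTable (TableName TEXT, Operation TEXT, RowsAffected INTEGER);
INSERT INTO MyFinalTable VALUES (1), (2), (3), (4);
INSERT INTO StagingTable VALUES (2), (4), (99);
""")

conn.execute(
    "DELETE FROM MyFinalTable "
    "WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable)"
)
# changes() still reflects the DELETE while this INSERT runs, so the count is logged in SQL
conn.execute(
    "INSERT INTO LogTable (TableName, Operation, RowsAffected) "
    "SELECT 'MyFinalTable', 'Delete', changes()"
)
logged = conn.execute("SELECT * FROM LogTable").fetchall()
print(logged)
```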
In my experience it is not a good idea to build convoluted logic into SSIS packages if you can do it in the database instead, although it does depend on who eventually has to maintain it. Hopefully you can appreciate that this T-SQL approach is a more straightforward, code-based approach than digging around in property pages, events and other places inside SSIS packages.
I assume that you're using an Execute SQL Task for the updates and deletes? As @Nick.McDermaid mentioned, using an OLE DB Command within a Data Flow presents various issues when performing DML. You can get the number of rows updated, inserted, or deleted by an Execute SQL Task through its ExecValueVariable property. Set this property to the variable that should hold the row count, and the task will populate it with the number of affected rows. Note that it only returns the number of rows affected by the last statement in the Execute SQL Task, regardless of how many batches (i.e. GO separators) the task contains.
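A rough analogy for the "last statement only" behaviour, sketched with Python's sqlite3 module rather than SSIS: after two data-modifying statements, only the count from the second is reported by changes(), so capturing both counts means reading each one separately (in SSIS, separate tasks or variables). Table T and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Id INTEGER PRIMARY KEY, Flag INTEGER)")
conn.executemany("INSERT INTO T VALUES (?, 0)", [(i,) for i in range(1, 6)])

# Read each statement's own count immediately after it runs
updated = conn.execute("UPDATE T SET Flag = 1 WHERE Id <= 4").rowcount
deleted = conn.execute("DELETE FROM T WHERE Id = 5").rowcount

# Asking afterwards only yields the count of the last statement (the DELETE)
last_count = conn.execute("SELECT changes()").fetchone()[0]
print(updated, deleted, last_count)
```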

How to push the filtered data into data flow task in SSIS?

I need to push filtered data into a Data Flow Task.
In the Control Flow I have two Execute SQL Tasks and one Data Flow Task connected one after the other. How can I use the output result set of the Execute SQL Tasks in the data flow?
The two Execute SQL Tasks perform filter operations and run fine while debugging.
Inside the Data Flow Task, should I use an OLE DB Source? What should I use as a source to get the filtered output data from the SQL tasks in the Control Flow?
Adding to this, since you have two ESTs (Execute SQL Tasks) that generate a filtered data set which needs to be passed to a DFT (Data Flow Task), you can use a variable substitution method.
Replace the direct SQL with a variable: build the dynamic SQL in a Script Task and assign the final SQL to an SSIS string variable. Then, in the DFT, use the "SQL command from variable" data access mode in your OLE DB Source. This lets you replace the two ESTs with a single variable holding the T-SQL statement.
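A sketch of the variable substitution idea in Python, with sqlite3 standing in for the OLE DB connection: the filter conditions that lived in the two Execute SQL Tasks are combined into one SELECT string (what the Script Task would assign to the SSIS variable), and that string is then executed as the source query. The Orders table, its columns, build_source_query and the filter text are hypothetical; because the conditions are concatenated into the SQL, they must come from a trusted source.

```python
import sqlite3


def build_source_query(filters):
    """Combine filter conditions into one SELECT (hypothetical helper),
    like a Script Task assigning the final SQL to a string variable."""
    where = " AND ".join(f"({f})" for f in filters)
    return f"SELECT OrderId, Region, Amount FROM Orders WHERE {where} ORDER BY OrderId"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderId INTEGER, Region TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, "East", 50.0), (2, "West", 200.0), (3, "East", 300.0)],
)

sql_variable = build_source_query(["Region = 'East'", "Amount > 100"])
rows = conn.execute(sql_variable).fetchall()  # the source reads the SQL from the variable
print(rows)
```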
The output data of the Execute SQL Task must be written to some storage, or to an Object-type variable that can be used as a source in your data flow task.
You can also filter the data in source of the data flow task.
You can store the output of the Execute SQL Task in a #Temp table (this requires setting properties such as DelayValidation on the Data Flow Task and RetainSameConnection on the connection manager to True) or in a permanent table, and access that from the data flow.

SSIS execute sql task get value from a query statement

I am using SSIS 2008 and have put a simple query (not a proc) in an Execute SQL Task (Control Flow). The query generates one column with a single value; what I am trying to do is use this value to decide whether to run the following tasks. I tried mapping the value to a variable in the Parameter Mapping page, with direction Output, ReturnValue, etc., but all failed. The query takes no parameters. I know I could probably create a proc with an output parameter mapped to a variable, but I'm wondering whether there are other options (e.g. without creating a proc, since it's a very simple query).
As mentioned, you need to set the SQL Task's ResultSet property to "Single row"; you can then map that result set to a variable on the Result Set page.
From there, you can use precedence constraints in the Control Flow, with an expression on that variable, to execute different tasks depending on its value.
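The single-row pattern can be sketched in Python with sqlite3 as the database (an assumption for illustration): the query's one value is stored in a variable, and a small function stands in for precedence constraints whose expressions test that variable, such as @[User::RowCount] > 0. The variable, table and task names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staging (Id INTEGER)")
conn.executemany("INSERT INTO Staging VALUES (?)", [(1,), (2,)])

# ResultSet = "Single row": the one value of the only row goes into a variable
(row_count,) = conn.execute("SELECT COUNT(*) FROM Staging").fetchone()


def next_task(value):
    # Stand-in for two precedence constraints: @[User::RowCount] > 0 and == 0
    return "LoadTask" if value > 0 else "SkipTask"


print(next_task(row_count))
```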

How to create a new column with an SQL function in Pentaho PDI work-flow?

I have an SQL function, let's say theFunction(item_id). It takes an item id and returns a computed value. I read a table from the DB and need to compute a new value for each row with this function, given that row's item_id. Which design block would do this for me with the following SQL (if it's not wrong)?
select theFunction(item_id);
I assume that the block gives me the item_id of each row as a variable.
You can use another Table input step and have it accept fields from previous steps and execute for every row (both config options are at the bottom of the step's window).
Beware that this is a rather slow implementation. Each query is executed separately, so each row requires a round trip to the database.
Alternatively, you can use the Execute row SQL script step. I believe it allows you to pass all SQL statements to the database in a single trip.
An SQL function is probably much more efficient to run in the database for all rows at once, instead of making a separate call into the database from PDI for each row. So if performance is at all a concern, I'd suggest a different strategy:
Write your rows to a table in the database. End your transformation here.
On the job level, first execute your transformation from above, then execute the function in an "Execute SQL script..." component, giving it an SQL command somewhat like "UPDATE my_temp_table set target_col = theFunction(item_id)".
Continue your job with the remaining steps in a new transformation, starting from that table as input.
This of course presupposes that you don't have too many other threads going on, but if your transformation is simple and linear (or at least can be made single-linear at this particular step), it may be possible to split it into two parts before and after this SQL call.
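The set-based strategy can be sketched with Python's sqlite3 module, assuming the database function is callable from SQL: here a Python function registered with create_function stands in for theFunction (its body is hypothetical), and a single UPDATE fills target_col for every row instead of one database call per row.

```python
import sqlite3


def the_function(item_id):
    # Hypothetical stand-in for theFunction(item_id) defined in the database
    return item_id * 10


conn = sqlite3.connect(":memory:")
conn.create_function("theFunction", 1, the_function)
conn.execute("CREATE TABLE my_temp_table (item_id INTEGER, target_col INTEGER)")
conn.executemany("INSERT INTO my_temp_table (item_id) VALUES (?)", [(1,), (2,), (3,)])

# One set-based statement computes the value for every row in a single call
conn.execute("UPDATE my_temp_table SET target_col = theFunction(item_id)")
result = conn.execute(
    "SELECT item_id, target_col FROM my_temp_table ORDER BY item_id"
).fetchall()
print(result)
```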

SSIS SELECT VALUE from a table without a lookup

I'm fairly new to SSIS.
I'm importing from an XLS spreadsheet into a database table. Along the way I want to select a record from a table, but it is NOT a lookup, i.e. a straight SELECT with no join to the input source. Then I want to merge it with the other rows from the XLS.
What is the best way to do this? Variables? OLE DB commands?
Thanks
You could use an OLE DB Command, but the important thing to remember is that it fires on a per-row basis and could be slow. You can still use a Lookup for this purpose, but make sure you set the error output to ignore lookup failures for the cases where the Lookup transformation does not contain a value for the match you are looking for.
You could also use a Merge Join transformation with a left outer join rather than an inner join.
If the record that you are retrieving from the database table is not dependent on the data within the row from the spreadsheet then it will probably be the same for each row - is that what you are hoping for?
In this case, I would consider using an Execute SQL Task in the Control Flow to retrieve the record and save it to a variable. You can then use a Script Component in the Data Flow to copy the values from the variable to the appropriate fields in each row. This means the lookup data is retrieved only once rather than once per row, which, as jn29098 said above, is slow.
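A minimal Python sketch of the fetch-once approach, assuming sqlite3 as the source database and hypothetical Settings, BatchId and SourceName names: the record is read once (the Execute SQL Task's job), and its values are copied onto every spreadsheet row in memory (the Script Component's job) without any per-row query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Settings (BatchId INTEGER, SourceName TEXT)")
conn.execute("INSERT INTO Settings VALUES (42, 'Spreadsheet')")

# "Execute SQL Task": fetch the single record once, before the data flow runs
batch_id, source_name = conn.execute("SELECT BatchId, SourceName FROM Settings").fetchone()

spreadsheet_rows = [{"Product": "Widget", "Qty": 5}, {"Product": "Gadget", "Qty": 2}]

# "Script Component": copy the stored values onto every incoming row, no per-row query
enriched = [dict(row, BatchId=batch_id, SourceName=source_name) for row in spreadsheet_rows]
print(enriched)
```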
If the target for your Data Flow is the same database as the one from which you are extracting the 'lookup' record then you could also consider using an Execute SQL Task (in the Control Flow) to add the lookup values once the spreadsheet data has arrived in the database (once the Data Flow has completed). This would be much more efficient.