I have a flat file that I am saving to a SQL table. I want to count the rows that were inserted and write that count to another table.
The simple answer is to create an SSIS variable and drop a Row Count transformation into your data flow.
Create a variable
On the Control Flow, click in the background. Do not click on any tasks or your variable will be created at the wrong scope (this caveat does not apply to 2012). Right-click and select Variables. In the Variables window, click the Add button and name the variable RowCount with a data type of Int32 (unless you expect more than roughly 2 billion rows, in which case use Int64).
Add a row count transformation
Inside your data flow, add a Row Count transformation after your data source. Configure it to use the variable we created above. The resulting data flow might look something like this
It is important to note that the Row Count component does not assign the row count to the User::RowCount variable until after the data flow completes.
Saving the row count value
Once the data flow finishes, you would then need to use an Execute SQL Task in the Control Flow to write the value into your table.
The Execute SQL Task would look something like this, depending on what your table is defined as.
INSERT INTO dbo.RowCounts
(
    rowcounts
)
SELECT
    ? AS rowcounts;
In the Parameter Mapping tab, it would look like
Variable Name: User::RowCount
Direction: Input
Data Type: LONG
Parameter Name: 0
Parameter Size: -1
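If the target table does not exist yet, a minimal definition matching the INSERT above might look like this (the dbo.RowCounts name and single rowcounts column are simply the ones used in this example):

CREATE TABLE dbo.RowCounts
(
    rowcounts int NOT NULL
);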
Related
I am using SSIS 2008 and put a simple query (not a proc) in an Execute SQL Task (control flow). The query generates one column with a single value, and based on this value I want to decide whether to run the following tasks. I tried mapping the value to a variable in the Parameter Mapping tab, with directions Output, ReturnValue, etc., but all failed. The query takes no parameters. I know I could probably create a proc with an output parameter mapped to a variable, but I am wondering if there are other options (i.e. not creating a proc, since it's a very simple query)?
As mentioned, you need to change the Execute SQL Task to return a Result Set of 'Single row'; you can then send that result set to a variable.
From here you can use the constraints within the Control Flow to execute different tasks based upon what the outcome variable will be; for example:
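A sketch of the pieces involved (the table and variable names here are invented for illustration): the query in the Execute SQL Task could be

SELECT CASE WHEN EXISTS (SELECT 1 FROM dbo.StagingTable) THEN 1 ELSE 0 END AS FlagValue;

with ResultSet set to Single row and the FlagValue column mapped to a variable such as User::FlagValue on the Result Set tab. The precedence constraints leaving the task would then use the Evaluation operation "Expression" with expressions like @[User::FlagValue] == 1 and @[User::FlagValue] == 0 to route to the different tasks.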
I have a SQL function, let's say theFunction(item_id). It takes an item id and computes one value as its return. I read one table from the DB and am supposed to compute a new value to append to each row by calling this function with the item_id particular to that row. Which design block would do this for me with the following SQL (if that is not wrong)?
select theFunction(item_id);
I assume that the block gives me the item_id of each row as a variable.
You can use another Table Input step, and have it accept fields from previous steps and execute for every row (both config options are at the bottom of the step's window).
Beware that this is a rather slow implementation. Each query is executed separately and as such each row requires a round trip to the database.
Alternatively, you can use the Execute Row SQL Script step. I believe it allows you to pass all SQL statements in a single trip to the database.
An SQL function is probably much more efficient to run in the database, for all rows at once, instead of making a separate call into the database from PDI for each row to execute the function. So if performance is at all a relevant concern, I'd suggest a whole different strategy:
Write your rows to a table in the database. End your transformation here.
On the job level, first execute your transformation from above, then execute the function in an "Execute SQL script..." component, giving it an SQL command somewhat like "UPDATE my_temp_table SET target_col = theFunction(item_id)" (see the sketch after this list).
Continue your job with the remaining steps in a new transformation, starting from that table as input.
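A minimal sketch of that middle step, using the hypothetical names from the answer (my_temp_table, target_col, theFunction):

-- Run once between the two transformations, in the "Execute SQL script..." job entry.
-- The database applies the function to every row in a single statement,
-- avoiding a per-row round trip from PDI.
UPDATE my_temp_table
SET target_col = theFunction(item_id);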
This of course presupposes that you don't have too many other threads going on, but if your transformation is simple and linear -- or at least can be made single-linear at this particular step -- it may be possible to split it into two parts, before and after this SQL call.
I have a Data Flow Task which picks up a variable while running. This variable changes its value three times, so the job has to run three times. I want to have a Lookup transformation in the DFT that checks whether the new value to be inserted already exists in the database for the value of the current variable. (I cannot create any unique key constraints in the database.) How do I make the WHERE clause of the Lookup transformation pick up the value from the variable? I cannot use an Execute SQL Task, as that is restricted to the control flow.
A better way than using a lookup is to use a MERGE statement: http://technet.microsoft.com/en-us/library/bb510625.aspx
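For instance, a sketch of the insert-if-not-exists pattern with MERGE (the table and column names are invented for illustration; the ? placeholders would be bound from your variable/columns):

MERGE dbo.TargetTable AS tgt
USING (SELECT ? AS item_id, ? AS item_value) AS src
    ON tgt.item_id = src.item_id
WHEN NOT MATCHED THEN
    INSERT (item_id, item_value)
    VALUES (src.item_id, src.item_value);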
If you still want to use a lookup, you have to disable the cache on your component (or set it to partial); then, in the Advanced tab, you can check "Modify the SQL statement", type your query, and wire up your variables using the "Parameters..." button.
I have an input CSV file with columns eid, ename, designation. Next I use a Lookup transformation, and inside the lookup I am using a query like
select * from employee where ename=?
I need to pass the parameter ? from the CSV file. That is, the ename in the CSV file has to be passed into the query through the Lookup transformation.
Inside the Lookup I changed the cache mode to Partial cache, and in the Advanced tab I selected "Modify the SQL statement", placed my query, and clicked on the Parameters tab. But I don't know how to pass the parameter.
You can't add parameters to your lookup query. If your goal in adding parameters is to reduce the amount of data read from the database, you don't have to worry: the partial cache will do that for you.
Partial cache means that the lookup query is not executed during the validation phase (unlike the full cache option) and that rows are added to the cache as they are queried from the database one by one. So, if your lookup table has one million rows and your input only references 10 of them, your lookup will issue 10 selects against your database and end up with only 10 rows in the cache.
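For reference, with "Modify the SQL statement" checked, the per-row statement SSIS issues in partial cache mode is roughly of this shape (based on the employee query above; the exact wrapper SSIS generates may differ):

select * from (select * from employee) [refTable]
where [refTable].[ename] = ?

The ? is bound automatically, per input row, to the column you mapped on the lookup's Columns tab, so there is no value for you to supply manually.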
I would like to make a package that copies data from a table only if the table is not empty. I know how to do a count and how to make a package for copying data, but the problem is that a source can't have any inputs, so I don't know how to do it. Any suggestions?
I don't understand your comment about dragging a "green line from a package to a source" but instead of trying to determine in advance if the table is empty, just do your copy anyway and then see how many rows were copied:
Create a package variable for the rowcount
Populate the variable using the Row Count transformation
Use an expression in the precedence constraint to check the variable: if it's greater than zero then continue executing the rest of your package
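As a sketch of that last step: set the precedence constraint's Evaluation operation to "Expression" (or "Expression and Constraint") and, assuming the variable from step 1 is named RowCount, use an expression like

@[User::RowCount] > 0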
@Pondlife I don't think you can use a precedence constraint on the data flow task, can you?
I believe you can use it only on the control flow.
I would add an Execute SQL Task with the count, sending the result to a variable; from this task I would drag the green arrow to the Data Flow Task that does the copy, and on this arrow I would add the expression to the precedence constraint.
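As a sketch, the count query in that Execute SQL Task could be as simple as the following (dbo.SourceTable stands in for your actual table), with ResultSet set to Single row and the one column mapped to the variable:

SELECT COUNT(*) AS row_count
FROM dbo.SourceTable;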
As you have correctly noted, a data flow source does not accept input, so one cannot perform logic in the data flow to determine whether this task should run. Attempting to wire a path into a source fails with:
    Cannot create connector.
    The destination component does not have any available inputs for use in creating a path.
However, there's nothing stopping you from setting up this logic in your control flow. I would use a query that hits the DMVs for a fast row count on the destination system, filtered to only the tables I wished to replicate.
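A sketch of such a DMV query (sys.partitions row counts are approximate but very cheap to read; the table list here is hypothetical):

SELECT t.name, SUM(p.rows) AS approx_rows
FROM sys.tables AS t
JOIN sys.partitions AS p
    ON p.object_id = t.object_id
WHERE p.index_id IN (0, 1)            -- heap or clustered index
  AND t.name IN ('Table1', 'Table2')  -- only the tables being replicated
GROUP BY t.name;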
Armed with the list of empty tables, how I'd handle it would probably depend on the number of tables. For a small number, I'd define N data flows, all with a do-nothing Script Task as a precedent, and then use an expression on the table name to enable a path, much like I did on this question.
If there are many tables, I'd define a package per table and then invoke an Execute Package Task with the package name built dynamically from the empty table's name.