SSIS - How can I use the value of a variable to determine the Foreach Loop Container?

I'm working on an SSIS package, the goal of the package is to take a spreadsheet that has several columns (we need PartNum, PartType, and Qty)
and for each row in the spreadsheet, run a query to calculate consumption and dump that into a separate sheet.
I've got a few problems, but my initial one is that I have two part types, Manufactured and Purchased, and I only want to run the query against Manufactured parts. How can I do that in SSIS? I'm trying to set up an expression in the Expression Builder that tests whether the variable equals "M", but it always evaluates to false.
Ideally, I want to filter on both Part Type = M and Qty > 0.
Here is a picture of the SSIS package. Basically, I'm using a Data Flow to bring the spreadsheet into a Recordset, and then, inside a Foreach Loop, an OLE DB Source to pass query parameters (the part and quantity variables) and export the results to a .csv.

In the initial Data Flow Task from the Excel Source into the Recordset Destination, instead of loading the entire Excel file, select only the records that satisfy the given criteria. Unless you need the other records elsewhere in the package, this also avoids adding unused rows to the Recordset Destination and processing them in subsequent components.

You can do this in the Excel Source by changing the Data Access Mode to SQL Command and adding the necessary filters; an Excel sheet can be queried much like a SQL table. The query you want should look something like the following, with the table and column names substituted appropriately. If a column name contains spaces, it needs to be enclosed in square brackets; for example, Part Type would be [Part Type].
SELECT
PartNum,
PartType,
Qty
FROM Excel_Sheet
WHERE PartType = 'M' AND Qty > 0
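If you do want to keep the filter inside the Foreach Loop instead, another option (a sketch, assuming the loop maps the current row into variables named User::PartType and User::Qty, with Qty numeric) is to put the work inside the loop behind a precedence constraint whose Evaluation operation is set to Expression:

```
@[User::PartType] == "M" && @[User::Qty] > 0
```

Note that string comparison with == in SSIS expressions is case-sensitive; if the sheet might contain "m", compare UPPER(@[User::PartType]) == "M" instead.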

Related

SSIS import from excel files with multiple sheets and different amount of columns

I have an SSIS package with two loops: one over the Excel files and one over the sheets. Inside the for-each-sheet loop I have a Data Flow Task that uses a variable for the sheet name, with an Excel source and an ODBC destination.
The table in the database has all the columns I need, such as userid, username, productname, supportname.
However, some sheets have only the columns username and productname, while others have userid, username, productname, supportname.
How can I load these Excel files? Can I add a Derived Column task that checks whether a column exists and, if not, adds it with a default value and then maps it to the destination?
thanks
SSIS is not an any-format-goes-at-run-time data loading engine. There was a conscious design decision to make the fastest possible ETL tool, and one of the requirements was a defined contract between the shape of the data source and the destination. That's why you'll inevitably run into the VS_NEEDSNEWMETADATA error: something has altered the shape, and the package has to be edited in the designer to update the column names and sizes.
If you want to write the C# to make a generic Excel ingest engine, more power to you.
An alternative approach would be to have multiple data flows defined within your file and worksheet looping construct. The trick would be to conditionally enable them based on the available column set.
If only the columns username and productname are detected, enable DFT UserName and ProductName; that DFT will supply default values, or a lookup, for userid, supportname, etc.
If all columns are present, enable DFT All.
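As a sketch of that conditional-enable trick (assuming a Boolean package variable User::HasAllColumns, set earlier by a Script Task that inspects the sheet's column list), you would put a property expression on each Data Flow Task's Disable property:

```
Disable on "DFT All":                       !@[User::HasAllColumns]
Disable on "DFT UserName and ProductName":  @[User::HasAllColumns]
```

The variable name and the Script Task that populates it are assumptions here; the point is that exactly one data flow runs per sheet, chosen at run time.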
Finally, Azure Data Factory can "slurp and burp" whatever source to whatever destination. Perhaps that might be a better fit for your problem.

Calculate Difference between current and previous rows in SSIS

How to calculate the difference between current and previous rows in SSIS, then use that result to add a new column to the existing table?
I'm assuming that by "current and previous rows" you mean the row counts before and after the load.
Create two package variables, let's say NumBefore and NumAfter.
Both are Int32.
Inside the Data Flow Task, use a source component (let's say an OLE DB Source) and choose whether it reads a table or a query; let's say it reads a table T.
Drag a Row Count transformation from the Data Flow Transformations list onto the canvas. Double-click it and, under Variable Names, select User::NumBefore. At run time, the Row Count transformation stores the number of rows that passed through it in that variable (the variable is actually written when the data flow finishes).
Do whatever you want with the data extracted from table T. My guess is that you are going to insert new rows into the same table T, right?
You then need a second Data Flow Task in the Control Flow. Inside it, drag another OLE DB Source reading the same table T, and use another Row Count transformation, this time with the variable User::NumAfter. Because Row Count only writes its variable once its data flow completes, read NumAfter in a later step, e.g. a Script Component or a Derived Column in a subsequent data flow.
If you use a Derived Column, write a name for the column and choose 'Replace xxx' to replace the value of column xxx, or add it as a new output column.
In the Expression field, write @[User::NumAfter] - @[User::NumBefore], and then place your OLE DB Destination.
Hope this is what you were looking for.
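If, on the other hand, the goal is a per-row difference from the previous row, the simplest route is often to compute it in the source query itself rather than in the data flow. A sketch, assuming SQL Server 2012+ and hypothetical table/column names (Id as the ordering key, Amount as the measure):

```sql
SELECT
    Id,
    Amount,
    -- LAG's third argument supplies a default for the first row,
    -- so its difference comes out as 0 instead of NULL
    Amount - LAG(Amount, 1, Amount) OVER (ORDER BY Id) AS DiffFromPrevious
FROM dbo.T;
```

The Derived Column/OLE DB Destination steps then just pass DiffFromPrevious through like any other column.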

SSIS parameters/variables in destination mapping

I am working on a large SSIS data migration project in which all of the output tables have fields for the creation date & user as well as the last update date and user. The values will be the same for all of the records in all of the output tables.
Is there a way to define parameters or variables that will appear in the destination mapping window, and can be used to populate the output table?
If I use a sql statement in the source, I could, of course, include extra fields for this, but then I also have to add a Data Conversion task for translating the string fields from varchar to nvarchar.
You cannot do this in the destination mapping.
As you've already considered, you could include the extra fields in the source, but then you are passing all that uniform data through the entire data flow and perhaps having to convert it as well.
A third option would be to run through your data flow without those columns at all (let them be NULL in the destination), and then follow the data flow with an UPDATE that sets those columns with a package variable value.
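That follow-up UPDATE can be an Execute SQL Task placed after the data flow. A sketch, with hypothetical table and column names; the ? placeholders are bound to your package variables on the task's Parameter Mapping tab:

```sql
UPDATE dbo.OutputTable
SET    CreateDate     = ?,  -- mapped to e.g. User::RunDate
       CreateUser     = ?,  -- mapped to e.g. User::RunUser
       LastUpdateDate = ?,
       LastUpdateUser = ?
WHERE  CreateDate IS NULL;  -- only the rows this run left NULL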

What is the expression in SSIS to get the same dates as in the source into the destination table?

What is the expression in SSIS to get the same dates as in the source into the destination? If I use GETDATE() it gives the current date, but I want the same dates mentioned in the source.
It sounds like you want each row to carry the same date value as it moves from the Source to the Destination. You can create your own variable and add it to the data flow with a Derived Column transformation, or you can use a system variable like ContainerStartTime via an Audit transformation (or a Derived Column, too).
Here's an article on all the available System Variables in SSIS.
Since your wording was "same dates mentioned in source", you could do the following to get a single date from the source and use it in your data flow.
On the Control Flow, create an Execute SQL Task that returns GETDATE() as a single-row result set from the source server, and save the result to a variable.
Within the data flow, add a Derived Column transformation after the source and add the variable's value to the flow as a new column.
Map it to the destination column; every row then gets a single date/time value that was read from the source system right before the operation began.
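The steps above can be sketched as follows. The Execute SQL Task (ResultSet = Single row) runs this against the source server, with the result column mapped to a DateTime variable, say User::SourceDate (a name assumed here for illustration):

```sql
SELECT GETDATE() AS SourceDate;
```

Inside the data flow, the Derived Column expression is then simply @[User::SourceDate], mapped to the destination's date column.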

SSIS row field to be sum of lookup

I have an SSIS data flow in an SSIS 2012 project.
For every row, I need to calculate, as efficiently as possible, a sum from another table based on some criteria.
It would be something like a lookup, but returning an aggregate of the lookup result.
Is there an SSIS way to do this with components, or do I need to turn to a Script Task or a stored procedure?
Example:
One data flow has a field named LOT.
I need to get sum(quantity) from table b where dataflow.LOT = tableb.lot
and write this back to a field in the flow.
You just need to use the Lookup component. Instead of selecting tableb directly, write a query, like this:
SELECT
B.Lot -- for matching
, SUM(B.quantity) AS TotalQuantity -- for data flow injection
FROM
tableb AS B
GROUP BY
B.Lot;
Now when the package begins, it will first run this query against the data source and generate the total quantity for every lot.
This may or may not be a good thing, depending on data volumes and on whether the values in tableb are changing. In the larger-volume case, if it's a problem, I'd look at whether I can trim the query above. Maybe I only need the current year's data. Maybe my list of lots could be pushed to the remote server beforehand so the aggregates are computed only for the lots I need.
If tableb is very active, you might need to change the Lookup's caching from the default of Full to Partial or None. If Lot 10 shows up twice in the data flow, None performs two lookups against the source, while Partial caches the values it has already seen. Which is better depends on data volume, memory pressure, etc.
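With Partial or None caching, you can also supply a parameterized statement on the Lookup's Advanced page ("Modify the SQL statement"), so that each probe aggregates only the lot being looked up instead of the whole table. A sketch, using the same table and column names as above:

```sql
SELECT
    B.lot
  , SUM(B.quantity) AS TotalQuantity
FROM tableb AS B
WHERE B.lot = ?      -- bound to the incoming LOT column
GROUP BY B.lot;
```

This trades the up-front full scan for one small aggregate query per distinct (or, with None, per incoming) lot.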