SSIS Conditional Split - multiple conditions

I am trying to check whether any of the columns has changed. I tried putting everything in one condition:
Taxes != (ISNULL(LookupTaxes) ? 0 : LookupTaxes) || Checksum != (ISNULL(LookupChecksum) ? 0 : LookupChecksum) || FeeIncome != (ISNULL(LookupFeeIncome) ? 0 : LookupFeeIncome) || CommissionReceived != (ISNULL(LookupCommissionReceived) ? 0 : LookupCommissionReceived) || CommissionPaid != (ISNULL(LookupCommissionPaid) ? 0 : LookupCommissionPaid) || Premium != (ISNULL(LookupPremium) ? 0 : LookupPremium)
but this always returns FALSE even though I have manually changed Taxes. If I put each condition separately, then it works, for example:
Taxes != (ISNULL(LookupTaxes) ? 0 : LookupTaxes)
returns TRUE. If I use 6 conditions (one per column, instead of 1) and send the outputs into a Union All, does this method give me what I need? My biggest concern is whether rows will be duplicated. I have checked and it looks like they are not, but I wonder why one condition picks X records and another picks Y when I have changed both columns (related to these two conditions). For example, all Taxes and Premium columns are changed. In the Conditional Split output, the "Taxes have changed" condition picks 1,000,000 rows and "Premium has changed" picks 100 rows. I know that this doesn't make any difference in my case, because what matters to me is that these rows are picked up for update, but I am confused about how this works.

I believe your logic is sound but I would suggest you split all those chained conditionals out into separate derived columns, especially if you'll have 1M rows flowing through it.
The first reason is performance. By splitting out operations into smaller pieces, the data flow engine can better take advantage of parallelism. See the investigation "Can different combinations of components affect Dataflow performance?". Money quote from SQL CAT on the subject:
Our testing has shown that if you want to change more than one column
in a data flow by using a Derived Column task, there is a performance
benefit to splitting that transformation into multiple Derived Column
tasks. In this note we will show that doing many smaller pieces of
work in multiple Derived Column tasks is significantly faster than
doing all the work in a more complex work in a single Derived Column
task.
The second reason is maintainability. Not only is the expression editor unfriendly and unforgiving, it makes it incredibly challenging to inspect intermediate values.
Demo
I put together a reproduction package that uses a script task to send N rows down a data flow, with every column set to the row number. In the first data flow, I modify the values of Checksum and Premium as I load into a cache connection manager (to simulate lookup values differing). Even-numbered rows should have the Checksum nulled out and every third row should have Premium nulled.
In this data flow, I used both your original expression (All in one check) as well as split it out into a check per condition.
As you can maybe see by the data viewer attached to the "bit bucket" task, the columns with the Changed suffix only evaluate to True when there is a difference between the source and lookup value. (The row corresponding to 0 is accurate, as (ISNULL(LookupTaxes) ? 0 : LookupTaxes) forces null values to be zero.)
Were I you, at this point I'd replace the "bit bucket" transformation with a Conditional Split:
Output Name = UpdateRequired
Condition = [TaxesChanged] || [ChecksumChanged] || [FeeIncomeChanged] || [CommissionReceivedChanged] || [CommissionPaidChanged] || [PremiumChanged]
If you continue to have issues, then you can put a data viewer on the pipeline to find the conditions that are not evaluating as expected.
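To make the idea concrete outside of SSIS, here is a minimal Python sketch of the same two steps: one boolean flag per column, then a single OR over the flags. It is only a model of the logic. The function names `changed_flags` and `update_required` are made up for illustration, and Python's `None` stands in for an SSIS NULL.

```python
COLUMNS = ["Taxes", "Checksum", "FeeIncome",
           "CommissionReceived", "CommissionPaid", "Premium"]


def changed_flags(row: dict) -> dict:
    """One flag per column, modelling Source != (ISNULL(Lookup) ? 0 : Lookup)."""
    flags = {}
    for col in COLUMNS:
        lookup = row.get("Lookup" + col)
        lookup = 0 if lookup is None else lookup  # NULL lookup is treated as 0
        flags[col + "Changed"] = row[col] != lookup
    return flags


def update_required(row: dict) -> bool:
    """Models the Conditional Split condition: OR over all the Changed flags."""
    return any(changed_flags(row).values())
```

Because each flag is a separate value, it can be inspected on its own, which is the same benefit a data viewer gives you on the separate derived columns.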

An alternative is to use 2 Derived Columns Data Flow Transformations (DFT) prior to a Conditional Split DFT.
Derived Column DFT 1: Check each attribute, setting a value of 1 if there is a change in the data and 0 if there is no change. For example, compare the value of the inbound Date of Birth column to the database Date of Birth column.
DerivedColumn1 = ((!ISNULL(InDOB) && !ISNULL(DbDOB) && InDOB != DbDOB) || (ISNULL(DbDOB) && !ISNULL(InDOB))) ? 1 : 0
DerivedColumn1 result is a signed integer value = 1 or 0.
Derived Column DFT 2: Sum the derived column values from DFT 1.
IdentifiedChange = DerivedColumn1 + DerivedColumn2 + ....
Conditional Split DFT: Identifies if there is a change in the data as determined by the result of DFT2.
YesChange IdentifiedChange > 0
Hope this helps.
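As a rough model of the flag-and-sum approach, the sketch below mirrors the DerivedColumn1 expression in Python, with `None` standing in for NULL. The function names are hypothetical. Note that, exactly as the expression is written, an inbound NULL over an existing database value is not counted as a change.

```python
def changed(incoming, stored) -> int:
    """Mirror of the DerivedColumn1 expression: returns 1 for a change, else 0."""
    both_present_and_different = (
        incoming is not None and stored is not None and incoming != stored
    )
    newly_populated = stored is None and incoming is not None
    return 1 if (both_present_and_different or newly_populated) else 0


def identified_change(pairs) -> int:
    """Mirror of IdentifiedChange: the sum of the per-column flags."""
    return sum(changed(incoming, stored) for incoming, stored in pairs)


# The Conditional Split step then reduces to: identified_change(pairs) > 0
```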

Actually the answer would be as follows
(Taxes != (ISNULL(LookupTaxes) ? 0 : LookupTaxes)) ||
(Checksum != (ISNULL(LookupChecksum) ? 0 : LookupChecksum)) ||
(FeeIncome != (ISNULL(LookupFeeIncome) ? 0 : LookupFeeIncome)) ||
(CommissionReceived != (ISNULL(LookupCommissionReceived) ? 0 : LookupCommissionReceived)) ||
(CommissionPaid != (ISNULL(LookupCommissionPaid) ? 0 : LookupCommissionPaid)) ||
(Premium != (ISNULL(LookupPremium) ? 0 : LookupPremium))
The extra "(" and ")" around each column check are needed. Without them, the || (OR) operator will treat everything after Taxes != as the OR condition, as in the first part of your condition.
This will work.
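On the row counts asked about in the question: a Conditional Split sends each row to the first output whose condition evaluates to true, so a row that matches several conditions is only counted once, under the earliest one. The Python sketch below models that routing. The `route` function and the sample rows are invented for illustration.

```python
from collections import Counter


def route(row, outputs, default="Default"):
    """Return the name of the first output whose predicate is true for the row."""
    for name, predicate in outputs:
        if predicate(row):
            return name
    return default


# Conditions are evaluated in this order, top to bottom.
outputs = [
    ("Taxes have changed", lambda r: r["TaxesChanged"]),
    ("Premium has changed", lambda r: r["PremiumChanged"]),
]

rows = [
    {"TaxesChanged": True, "PremiumChanged": True},
    {"TaxesChanged": True, "PremiumChanged": False},
    {"TaxesChanged": False, "PremiumChanged": True},
    {"TaxesChanged": False, "PremiumChanged": False},
]

counts = Counter(route(r, outputs) for r in rows)
```

Here two rows have a changed Premium, but only one of them reaches the "Premium has changed" output, because the other was already claimed by "Taxes have changed". This is also why the outputs can be combined with a Union All without producing duplicates.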

Related

Is there a way to sort the whole database table according to a calculated data

I have a very large table called user and it looks like this:
id  | events (array) | ... (extra columns)
1   | []             | ...
2   | []             | ...
... | ...            | ...
When I query the table, I will pass two extra parameters, no_per_page and page so that only the data I want will be retrieved.
And before returning the data, an extra column called 'total_event_hours' will be calculated using the 'events' column from the query. However, because it is also a column and will be presented in a table of a front-end app, I want it also to be sortable.
The naive way I could think of will be to query the whole table and sort it after the calculation, but since it is very large, it will slow down the process.
Is there a way to sort the table using the calculated column and keep the pagination at the same time? Any input will be appreciated.
Edit:
$no_per_page = (int) $param['no_per_page'];
$start = $param['page'] > 1 ? ($param['page'] - 1) * $no_per_page : 0;
$query = "SELECT * FROM user LIMIT :start, :no_per_page";
$get_query = $this->db->prepare($query);
// Bind the LIMIT placeholders as integers
$get_query->bindValue(':start', $start, PDO::PARAM_INT);
$get_query->bindValue(':no_per_page', $no_per_page, PDO::PARAM_INT);
$get_query->execute();
$data = $get_query->fetchAll(PDO::FETCH_ASSOC);
foreach ($data as &$_data) { // by reference, so the new key is kept in $data
    $_data['total_event_hours'] = $this->_getTotalEventHours($_data['events']);
    // I want to sort by this column but it is not presented in the table
}
unset($_data);
return $data;
To answer your question
Is there a way to sort the table using the calculated column and keep the pagination at the same time?
The DBMS cannot sort data by something it does not know. To sort by a calculated value, there are several possibilities, e.g.:
1. Sometimes the result of the calculation is in the same order as a column. Then you can sort by this column, of course.
2. You can rewrite your query to include a calculated column. That is, you must perform the calculation in SQL or in a user-defined function. You can then sort by the calculated column. Note that it's sometimes possible to get the same order with a simpler calculation.
3. Approach #2 most likely prevents the DBMS from using an index for sorting. To speed things up, you could modify your table to contain the calculated column, and then update its value using triggers. If you have this column, you can also index it.
As always, there's a trade-off, so it depends on the specific circumstances/use case which option to choose.
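A minimal sketch of the stored-column option, using Python's built-in sqlite3 module as a stand-in for the real DBMS. The sample data and the `fetch_page` helper are invented, and the sketch assumes `total_event_hours` is kept up to date elsewhere (by a trigger, as suggested above, or by application code).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE user (
           id INTEGER PRIMARY KEY,
           events TEXT,
           total_event_hours REAL NOT NULL DEFAULT 0  -- stored calculated value
       )"""
)
# Indexing the stored column gives the DBMS something to sort by.
conn.execute("CREATE INDEX idx_user_total_event_hours ON user (total_event_hours)")
conn.executemany(
    "INSERT INTO user (id, events, total_event_hours) VALUES (?, ?, ?)",
    [(1, "[]", 0.0), (2, "[2.5, 1]", 3.5), (3, "[8]", 8.0),
     (4, "[1]", 1.0), (5, "[4, 4, 4]", 12.0)],
)


def fetch_page(page: int, no_per_page: int):
    """Sort by the stored column in SQL, then paginate with LIMIT/OFFSET."""
    start = (page - 1) * no_per_page if page > 1 else 0
    cursor = conn.execute(
        "SELECT id, total_event_hours FROM user "
        "ORDER BY total_event_hours DESC, id "  # id breaks ties for stable pages
        "LIMIT ? OFFSET ?",
        (no_per_page, start),
    )
    return cursor.fetchall()
```

Sorting and pagination both happen inside the query, so only one page of rows is ever returned to the application.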

How to use IF search condition as "true" value

I am creating a report generator which allows me to build custom reports by selecting tables and fields which are saved in the database. Each table, field, calculated value, etc. is treated as a separate entity, with no direct connection to any other entity. They are all self-contained.
I have a number of count fields/columns, which I currently have working, using code similar to the following:
sum(case when usasf_teams.division_levelid=7 then 1 else 0 end) as Dance
My problem is that there are quite a few of these columns, and most of the values are zero. This makes it hard to spot the non-zero values, which are what we want to see.
I know I can use the IF statement to return either the number or a space, but I don't want to put the server through the work of doing the same operation twice for every column in every row. I would like to do something like this:
if(sum(case when usasf_teams.division_levelid=7 then 1 else 0 end),
{{use the comparison value}}, ' ') as Dance
This way, the sum/case operation is only done once for each field.
Is this possible?
First, you can simplify your logic to:
sum(usasf_teams.division_levelid = 7) as Dance
Then, I would recommend changing the 0 values to NULL rather than a space:
nullif(sum(usasf_teams.division_levelid = 7), 0) as Dance
If you are going to return a real space, you need to be careful of types. You could do this as:
select (case when sum(usasf_teams.division_levelid = 7) > 0
then cast(sum(usasf_teams.division_levelid = 7) as char)
else ' '
end)
You don't need to worry about doing the sum() twice. The expense in an aggregation is manipulating the data, not (in general) calculating the aggregation functions themselves.
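The NULLIF form can be tried out with Python's built-in sqlite3 module, used here only as a convenient stand-in for the asker's database. SQLite also evaluates a comparison to 0 or 1, so the shortened `sum(...)` works the same way. The sample rows and the second column `Cheer` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usasf_teams (team TEXT, division_levelid INTEGER)")
conn.executemany(
    "INSERT INTO usasf_teams (team, division_levelid) VALUES (?, ?)",
    [("A", 7), ("B", 7), ("C", 3)],
)

# Each comparison yields 1 or 0; NULLIF turns a zero total into NULL.
row = conn.execute(
    "SELECT NULLIF(SUM(division_levelid = 7), 0) AS Dance, "
    "       NULLIF(SUM(division_levelid = 9), 0) AS Cheer "
    "FROM usasf_teams"
).fetchone()
```

The zero count comes back as NULL (`None` in Python), which a report can render as an empty cell without any type conversion.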

Using condition stored in a database table with Conditional Split in SSIS

I have a list of criteria in a database table that are entered by users. These criteria are in the format X > 5 for Segment A, X > 7 for Segment B, and so on.
The data is collected using OLE DB Source where I specified a stored procedure to retrieve data. The record set has three columns: IdNumber, SegmentId and Total.
My conditional split should look like this:
SegmentId == 1 && Total > 5 (I would like to replace X with the actual value stored in the Total column.)
SegmentId == 1 && !(Total > 5)
... and so on.
So my question is, how can I use a condition which is string-based and stored in database in the Conditional Split Transformation Editor?
Regards,
Huske
You can't do this from the editor; you will need to add code (e.g. in a Script Task) to query the database, get the condition, build a complete expression and then set it in the Conditional Split programmatically.
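This is not the programmatic SSIS approach described above, but the Python sketch below illustrates the underlying idea of driving a split from stored criteria: parse each criterion once and apply it per row. It assumes every criterion has the exact form "X <operator> <number>". The names `parse_criterion` and `split_row` and the output labels are hypothetical.

```python
import operator

# Whitelist of supported comparison operators (avoids evaluating raw strings).
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def parse_criterion(text: str):
    """Turn a stored string such as 'X > 5' into (comparison function, threshold)."""
    _, symbol, value = text.split()
    return OPERATORS[symbol], float(value)


def split_row(row: dict, criteria: dict) -> str:
    """Route a row according to the criterion stored for its segment."""
    compare, threshold = parse_criterion(criteria[row["SegmentId"]])
    return "Match" if compare(row["Total"], threshold) else "NoMatch"


# Criteria as they might be read from the database table, keyed by SegmentId.
criteria = {1: "X > 5", 2: "X > 7"}
```

Mapping operator symbols to functions keeps user-entered text from ever being executed as code, which matters when the criteria come from a table that users can edit.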

Setting a default value in a blank field in an Access query

This may have been answered elsewhere, but I can't find it!
I'm combining 2 queries from different sources onto a single query for calculation and reporting purposes. Fields which are not common to both sources show as blank if there is no data, and can't then be used in arithmetic.
I want to set a default of zero where blank, instead of having to knife and fork the query into another table and run an update query over all the blank fields.
There's got to be something simpler! HELP!
Give this a try:
SELECT
nz(value1, 0), nz(value2,0), nz(value3,0)
FROM table1 left outer join table2 on table1.column = table2.column
If your query resembles what John answered with (a cartesian join), then his answer is more appropriate. Cartesian joins are dangerous if not used correctly: depending on how big the individual tables are allowed to become, you can kill an Access application by using them.
You'll either want to use Nz() or iif(), depending on whether that "blank" really is just a blank (empty string) or Null.
So your options are:
SELECT Nz(source1.a + source2.b,0) FROM source1, source2
or:
SELECT iif(source1.a + source2.b <> "", source1.a + source2.b, 0) FROM source1, source2
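One detail worth checking is where the default is applied. Arithmetic on a Null yields Null, so wrapping the whole sum replaces the entire result with 0, whereas wrapping each operand keeps the value that is present. The Python sketch below models this with `None` as Null. The `nz` and `null_add` functions are stand-ins written for this illustration, not the Access functions themselves.

```python
def nz(value, default=0):
    """Stand-in for Nz(): return the default when the value is Null (None)."""
    return default if value is None else value


def null_add(a, b):
    """Null-propagating addition: the result is Null if either operand is Null."""
    return None if a is None or b is None else a + b


whole_sum_wrapped = nz(null_add(5, None), 0)   # models Nz(a + b, 0)
each_operand_wrapped = nz(5, 0) + nz(None, 0)  # models Nz(a, 0) + Nz(b, 0)
```

Which form is right depends on whether a missing value should zero out the whole calculation or just contribute nothing to it.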

SSIS Expressions - EvaluateAsExpression Problem

In a Data Flow, I have a Derived Column task. For one of the columns, I have the following expression:
[siteid] == "100" ? "1101" : [siteid] == "110" ? "1001" : [siteid] == "120" ? "2101" : [siteid] == "140" ? "1102" : [siteid] == "210" ? "2001" : [siteid] == "310" ? "3001" : [siteid]
This works just fine. However, I intend to reuse this in at least a dozen other places, so I want to store it in a variable and use the variable in the Derived Column instead of the hard-coded expression. When I attempt to create a variable using the expression above, I get a syntax error saying 'siteid' is not defined. I guess this makes sense, because it isn't. But how can I get this expression to work from a variable? It seems like I need some way to tell it that 'siteid' will be the column containing the data I want to apply the expression to.
You can't use column names in variable expressions. I'm afraid there is no way to easily "clone" this logic in SSIS.
I would suggest, however, that you don't attempt to clone hard-coded logic like this. Construct a reference table and use a Lookup component to fetch the value. That way, if your "case" statement ever changes, you only need to do one modification - you don't need to hunt down everywhere this logic gets used. And configuration of the Lookup is pretty drag and drop easy.
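The reference-table idea can be sketched in Python as a plain dictionary lookup with a fallback. The mapping values are taken from the expression in the question. The names `SITE_MAP` and `map_site` are made up, and in a real package the mapping would live in a database table read by the Lookup component rather than in code.

```python
# Contents of the hypothetical reference table: old siteid -> new code.
SITE_MAP = {
    "100": "1101",
    "110": "1001",
    "120": "2101",
    "140": "1102",
    "210": "2001",
    "310": "3001",
}


def map_site(siteid: str) -> str:
    """Return the mapped code, or the original siteid when no mapping exists."""
    return SITE_MAP.get(siteid, siteid)
```

The fallback to the original value corresponds to the final `: [siteid]` branch of the nested conditional. In SSIS, unmatched rows need the same handling after the Lookup.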