SSIS with Visual Studio Derived Column - ssis

I'm trying to take a list from a text file and export it to Excel.
Rank,Millions,Name,Country,Source of wealth
1,12700,Lakshmi Mittal and family, India,Steel
2,12315,Alisher Usmanov, Russia,Mining and Investment
3,9500,Roman Abramovich, Russia,Oil and industry
4,8600,Sri and Gopi Hinduja, India,Industry and Finance
5,7580,Leonard Blavatnik, Russia,Industry
6,7400,Ernesto and Kirsty Bertarelli, Switzerland &  United Kingdom,Pharmaceuticals
7,7350,Gerald Grosvenor, 6th Duke of Westminster, United Kingdom,Property
8,7083,David and Simon Reuben, United Kingdom,Property and Internet
9,6600,John Fredriksen and family, Norway,Shipping and oil services
10,5900,Galen Weston, George Weston and family, Canada,Retailing
11,5490,Charlene de Carvalho-Heineken and Michel Carvalho, Netherlands,Inheritance, banking, brewing (Heineken)
12,4300,Hans Rausing and family, Sweden,Packaging
13,4100,Joseph Lau, China,Property
14,,Paul Chuckle, UK, IT Training
15,4300,Nicky Oppenheimer, South Africa,Mining, Diamonds
16,3900,Kirsten & Jorn Rausing, Sweden,Inheritance, Investment
17,3400,Sir Richard Branson, United Kingdom,Internet, Transport, Finance
18,3300,Sir Philip and Lady Green, United Kingdom,Retailing
19,,Barry Chuckle, UK, Comedy
20,,Roger Chuckle,UK,SSIS consultancy
For ranks 14, 19, and 20, the Millions column is an empty string. I've already used a Conditional Split to separate the rows that have Millions specified from those that don't. What I want to do now is take the 3 rows that aren't specified and write in "Not Specified" using a Derived Column Transformation. I figured I would be able to do something like this (in the expression tab):
Millions=="Not Specified"
However, when I run a Data Viewer on the output, the three rows show "False" in the Millions column.
I did a weird workaround, but I'm sure there's a better way:
Millions == "" || Millions != "" ? "Not Specified" : "Not Specified"
Can someone tell me why I can't use my first solution? I'm assuming it has something to do with Booleans, but the data type is a string.

First of all, is there a special reason why you are separating the records with an empty "Millions" value using a Conditional Split? You can just use a Derived Column transformation and, under the Derived Column tab, select: Replace 'Millions'. In the expression bar you can enter:
`Millions=="" ? "Not specified" : Millions`
This means that if the Millions field is empty, it is replaced with "Not specified" (true condition); otherwise the existing value is kept (false condition). You are right that the data type is string, but == is the equality operator, not an assignment. On its own, Millions=="Not Specified" only compares the column with that literal and returns a Boolean, and the Derived Column writes the result of the expression into the column, which is why you saw False. The conditional operator needs a condition plus the value to use when it is true and the value to use when it is false; you can think of it as a simplified version of an if/else statement. Your workaround works only because both branches return "Not Specified"; on a path that already contains only the empty rows, the literal "Not Specified" by itself as the expression would do the same. Hope this helps.
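To make the difference between a comparison and a conditional concrete, here is a minimal Python sketch of the two expressions. It is only an analogy for the SSIS expression language; the function name `derived_millions` and the sample values are made up for illustration.

```python
def derived_millions(millions: str) -> str:
    """Mimics the SSIS expression: Millions == "" ? "Not specified" : Millions"""
    return "Not specified" if millions == "" else millions


# A bare comparison only yields a Boolean, which is what the Data Viewer showed.
comparison_result = ("" == "Not Specified")

# Hypothetical sample of Millions values, one of them empty.
rows = ["12700", "", "4300"]

print(comparison_result)
print([derived_millions(m) for m in rows])
```

The comparison answers "is the column equal to this text?", while the conditional answers "which value should the column get?".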

Related

SSIS Derived Column - REPLACENULL doubling column width

I'm using a derived column to concatenate two WSTR columns, either of which could be NULL.
The columns are Alert (width 3000) and customAlert (width 125).
The problem is that when I use REPLACENULL so that the concatenation still works if one of the columns is NULL, the calculated width of the destination column is doubled for each input column, which pushes the derived column past the limit for a WSTR column. For the expression as written, it wants a column width of 6254, which is not possible without moving to NTEXT, and that causes its own problems when I have to convert it back. The column size requirements don't change if I don't use REPLACENULL, but that comes with other problems.
REPLACENULL(Alert,"") + REPLACENULL(customAlert,"")
I could work around this using a C# script in the data flow to do the work, but I'd rather understand why this is happening.
Congratulations, it is not often that I get to learn something new about SSIS.
It sure feels like there's a bug in the REPLACENULL expression, but I was also unaware of its existence.
Set up
I defined a data flow with an OLE DB Source using the following query
SELECT
REPLICATE(N'A', 3000) AS Alert
, REPLICATE(N'C', 125) AS customAlert;
That results in pipeline metadata of
Alert - DT_WSTR 3000
customAlert - DT_WSTR 125
I can concur that as provided the expression fails out with an error about the invalid length of the resulting string. What's really weird to my brain is that the computed string length, is as you say, 6254. That's not a perfect doubling so ... I have no idea what is going on there.
I advocate for breaking things down in SSIS to smaller pieces as they make debugging easier.
I started by splitting your 3 operations into 2 separate ones with an eye towards the concatenation in a later step.
Right off the bat, this alone errors out.
REPLACENULL(Alert, "")
Error approximately
the length is not valid. Must be between 0 and 4000. Computed length 6002
6002? I really have no idea what's going on behind the scenes. We'll pin that and come back to it.
I added
REPLACENULL(customAlert, "")
and that worked by itself, but once again the length is doubled + 2, for a width of 252.
So, we can't use REPLACENULL with Alert, and using it with customAlert doubles our space.
What else can we do?
Invent a time machine and cut your teeth on SSIS 2005. ;) A different way of solving the problem is to use the ternary operator: test condition ? true clause : false clause
ISNULL(Alert) ? "" : Alert
and
ISNULL(customAlert) ? "" : customAlert
That results in columns of length 3000 and 125 respectively. Personally, I'd add a second derived column after the one that makes these two new columns and do the concatenation there, because the next error you run into is going to make the all-in-one expression more difficult to diagnose.
(ISNULL(Alert) ? "" : Alert ) + (ISNULL(customAlert) ? "" : customAlert )
This has a correct length of 3125
Nowhere in the documentation for REPLACENULL does it identify this unexpected behaviour. Tested using SQL SERVER 2017
SQL Server Integration Services
Microsoft SQL Server Integration Services Designer
Version 14.0.3002.92
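The widths reported above are at least consistent with each other. The Python sketch below checks the arithmetic; the formula 2n + 2 is only a pattern fitted to the two observed values (6002 and 252), not documented REPLACENULL behaviour, and the function name is hypothetical.

```python
def observed_replacenull_width(declared_width: int) -> int:
    """Pattern fitted to the two observed widths only; an assumption, not documented."""
    return 2 * declared_width + 2


ALERT_WIDTH = 3000        # DT_WSTR 3000
CUSTOM_ALERT_WIDTH = 125  # DT_WSTR 125

alert_computed = observed_replacenull_width(ALERT_WIDTH)
custom_computed = observed_replacenull_width(CUSTOM_ALERT_WIDTH)

# Concatenating the two REPLACENULL results adds the computed widths.
replacenull_total = alert_computed + custom_computed

# The ISNULL(...) ? "" : ... form keeps the declared widths.
ternary_total = ALERT_WIDTH + CUSTOM_ALERT_WIDTH

print(alert_computed, custom_computed, replacenull_total, ternary_total)
```

Under that fitted pattern, the 6254 in the question is simply 6002 + 252, and only the ternary form stays under the 4000-character limit.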

Access 2016 Table Calculated Field

I have a table containing the following: Five Y/N fields and a calculated field [Priority Results] that totals the number of 'Yeses' from those five y/n fields. I'm trying to create another calculated field that will return a value of Low, Medium or High dependent on the number of boxes that have been checked. [Priority Results] currently returns the values 0 through -5. Low = 0 & -1, Medium = -2, High = -3 or lower. I've tried SEVERAL different versions of If/Then, If/Else, Iif statements and always receive a syntax error. I've read a lot of different sites and the following expression seems to be the most commonly used, but I'm still getting the error. Anyone have any ideas? I've even tried this statement on a non-calculated field and can't get it to work.
IIf([Priority results]<="-1","Low",IIf([Priority results]="-2","Medium",IIf([Priority results]>="-3","High")))
Here are the calculated field [Priority results] properties.
Expression:
[Class Non-Attendance]+[Instructor Referral]+[Late Registration]+[Low Starting GPA]+[Talon Log-in]
Result Type: Long Integer
The part of the table this question relates to has the following fields:
Class Non-Attendance: Yes/No
Instructor Referral: Yes/No
Late Registration: Yes/No
Low Starting GPA: Yes/No
Talon Log-In: Yes/No
Priority Results: Calculated field counting the Yes/No fields above
Priority Outcome: Calculated field (that isn't working) prioritizing based on Priority Results
Don't put parameters for number fields in quotes.
Consider:
IIf(Abs([Priority Results])<=1, "Low", IIf(Abs([Priority Results])=2, "Medium", "High"))
In a query or textbox, expression could be:
Switch(Abs([Priority Results])<=1, "Low", Abs([Priority Results])=2, "Medium", True, "High")
Parts of the question still confuse me, which is why this answer will be brief. You have a calculated field PriorityOutcome based on another calculated field PriorityResults, and that is the problem. Access doesn't calculate PriorityResults before calculating PriorityOutcome. Instead, Access says PriorityResults doesn't exist yet and passes Null to PriorityOutcome, resulting in either an error or a silent failure.
There are several fixes you can mix and match. You can repeat the calculation for PriorityResults inside PriorityOutcome: wasteful, but often the fastest solution. You can also add a code module with public functions to do part or all of the calculations, then refer to those public functions in your calculated fields; Access IntelliSense can find public functions.
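As a rough illustration of the logic both answers describe, here is a Python sketch that repeats the count inside the outcome calculation and compares numbers rather than quoted strings. It assumes, as the question states, that each Yes contributes -1 and each No contributes 0; the function names are hypothetical.

```python
def priority_results(*flags: bool) -> int:
    """Sum of the Yes/No fields, where Yes counts as -1 and No as 0."""
    return sum(-1 if flag else 0 for flag in flags)


def priority_outcome(*flags: bool) -> str:
    """Repeats the count instead of referring to another calculated field."""
    checked = abs(priority_results(*flags))  # numeric comparison, no quotes
    if checked <= 1:
        return "Low"
    if checked == 2:
        return "Medium"
    return "High"


print(priority_outcome(False, False, False, False, False))
print(priority_outcome(True, True, False, False, False))
print(priority_outcome(True, True, True, False, True))
```

Taking the absolute value first avoids having to reason about "lower than -3" with negative numbers.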

SSIS Substring Error Using Case When with Right Function

I am using the code below in an SSIS Derived Column to remove titles such as Mr, Mrs, Ms and Dr from the Name column.
Ex:-
Mr ABC
MS XYZX
Mrs qwrer
DR ADCS
So I am removing the title from the name.
SUBSTRING( [Name] , 1, 3)=="Mr" && LEN( [Name] ) >2 ? RIGHT([Name],LEN([Name])-2)
But I am getting an error: incomplete token or invalid statement.
Please help.
Any other suggestions for removing the prefixes are also welcome, but I need to use a transformation.
A different way to think about the problem is that you want to look at the first "word" in the column Name where "word" is the collection of characters from the start of the string to the first space.
String matching in SSIS is case-sensitive, so you'll want to force the first word to lower or upper case, whichever matches how your master list of titles is cased (and then ensure the title list is all lower or all upper case).
I am an advocate of making life easier on yourself so I'll add a Derived Column, actually lots of derived columns, that will identify the position of the first space in Name and I'll call this FirstSpace
DER GetFirstSpace
Add a new column, called FirstSpace. The expression we want to use is FINDSTRING
FINDSTRING([Name], " ", 1)
That will return the position of the first instance of a space (or zero if no space was found).
DER GetFirstWord
Add another derived column after DER GetFirstSpace. We need to do this so we can inspect the values we're passing to get the first word. Do it all in a single Derived Column and when you get something wrong, you won't be able to debug it, and the real cost of development is maintenance. The new column, FirstWord, will be type DT_WSTR 4000 because that's what happens when you use the string manipulation expressions. I am going to force this to upper case as I'll need it later on.
UPPER(TRIM(SUBSTRING([Name], 1, [FirstSpace])))
The third argument of SUBSTRING is a length, so taking [FirstSpace] characters includes the space itself ("DR " rather than "DR"); the TRIM removes it so that the comparisons later on can match.
TODO: What happens if FirstSpace is 0 - we might need to make use of ternary operator ?:
At this point in the data flow, we have a single word in a column named FirstWord. What we need to do is compare that to our list of known titles and, if it exists, strip it from the original. And that's an interesting problem.
DER GetIsTitleMatched
Add yet another Derived column, this time to solve whether we've matched our list of titles. This will be a boolean type column named IsTitleMatched
[FirstWord] == "DR" || [FirstWord] == "MRS" || [FirstWord] == "MR" || [FirstWord] == "MS"
Following that pattern "FirstWord is exactly equal literal text OR..." when this derived column evaluates, we'll know whether the first word is something to be removed (finally)
DER SetFinalName
Here we're going to add yet another column, NameFinal. The magic of stripping out the bad word is that we use the RIGHT expression to keep everything after that opening space. RIGHT takes a count of characters measured from the end of the string, so the count is the length of the name minus the position of the space. You might need to add a TRIM in there if the names can have more than one space after the title. Again, free handing at the moment so good but no guarantee of perfection.
(IsTitleMatched) ? RIGHT([Name], LEN([Name]) - [FirstSpace]) : [Name]
I do violate my own rule here as I have a quasi complex expression there in the positive case. Feel free to insert a derived column task that computes the stripped version of the name.
At this point, you've got 4 to 5 derived columns in your data flow but you can add a data viewer between each to ensure you're getting the expected result. You're welcome to solve this in your own manner but that's the easiest approach I can think of.
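The four steps can be dry-run outside SSIS. The Python sketch below uses small helper functions that imitate the 1-based FINDSTRING, SUBSTRING and RIGHT semantics; the helpers and `strip_title` are hypothetical stand-ins rather than SSIS itself, and the sketch trims the first word and passes LEN minus FirstSpace to RIGHT.

```python
def findstring(text: str, search: str, occurrence: int) -> int:
    """1-based position of the nth occurrence of search, or 0 if not found."""
    pos = -1
    for _ in range(occurrence):
        pos = text.find(search, pos + 1)
        if pos == -1:
            return 0
    return pos + 1


def substring(text: str, start: int, length: int) -> str:
    """1-based start position plus a length, like SUBSTRING."""
    return text[start - 1:start - 1 + length]


def right(text: str, count: int) -> str:
    """The rightmost count characters, like RIGHT."""
    return text[-count:] if count > 0 else ""


TITLES = {"DR", "MRS", "MR", "MS"}


def strip_title(name: str) -> str:
    first_space = findstring(name, " ", 1)                       # DER GetFirstSpace
    first_word = substring(name, 1, first_space).strip().upper() # DER GetFirstWord
    is_title_matched = first_word in TITLES                      # DER GetIsTitleMatched
    if is_title_matched:                                         # DER SetFinalName
        return right(name, len(name) - first_space)
    return name


for sample in ["Mr ABC", "MS XYZX", "Mrs qwrer", "DR ADCS", "Madonna"]:
    print(repr(strip_title(sample)))
```

In this emulation a name without a space yields FirstSpace = 0 and an empty FirstWord, so it falls through to the unchanged name; whether SSIS treats a zero length the same way is worth confirming with a Data Viewer.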

Extracting String in Parenthesis - SSIS

I have the following strings in a column called Qualifications:
D.Ther (ZIMB) 2017
D.Ther (ZIMB) 2016
Cert in Dental Therapy (ZIMB) 2017
D Tech (ZIMB) 1971
D Tech (The South African Dental Technicians Council -S.A) 1984
D Tech (ZIMB) 1987
Dental Technology (Natal-S.A) 1991
How do i retain the string in parentheses only using SSIS? for example ZIMB, Natal-S.A
I would approach this by identifying the location of the opening and closing parenthesis using FINDSTRING. Knowing those two values, you can then use the SUBSTRING operation to extract the middle text.
Debugging is challenging with SSIS expressions. I would take the following approach. Add 3, yes 3, Derived Column Tasks to your data flow.
Derived Column 1
Add a new column called OpeningParenthesis of type DT_I4 (32 bit integer). The following expression will populate the value
FINDSTRING([Qualifications], "(", 1)
Find the first occurrence of an opening parenthesis in the column Qualifications (case sensitive). The final argument, 1, is the occurrence to look for.
Derived Column 2
Add a new column called ClosingParenthesis of type DT_I4. The following expression will be used
FINDSTRING([Qualifications], ")", 1)
Find the first closing parenthesis in the column Qualifications (case sensitive). The third argument of FINDSTRING is the occurrence number rather than a starting position, so it is 1 here as well.
Derived Column 3
Here we'll use a substring operation to slice out the middle text. Add a new column named Country and the type will be DT_WSTR. Use an expression like
SUBSTRING([Qualifications], [OpeningParenthesis] + 1, [ClosingParenthesis] - [OpeningParenthesis] - 1)
The + 1 skips the opening parenthesis itself, and the - 1 in the length stops just before the closing one.
I don't have SSIS open at the moment so that's the approximate answer. Any of the positions from FINDSTRING might be off by one but using this approach allows you to attach the Data Viewer between components to determine where things have gone wrong.
Also, depending on your data there might be issues if the column doesn't contain parenthesis or if it's null.
This is untested but should get you close.
SUBSTRING parses a string given a start point and how many characters to take.
FINDSTRING will locate the position of the first occurrence of the string searched for.
substring([column], findstring([column],"(")+1,findstring([column],")") - findstring([column],"(")+1)
I finally got it working with the expression below:
SUBSTRING([Primary Qualifications],FINDSTRING([Primary Qualifications],"(",1) + 1,FINDSTRING([Primary Qualifications],")",1) - FINDSTRING([Primary Qualifications],"(",1) - 1)
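The offsets in that final expression can be checked against the sample rows with a small Python emulation. The helpers imitate the 1-based FINDSTRING and SUBSTRING semantics and are hypothetical stand-ins; the guard for rows without parentheses is an added assumption, not part of the original expression.

```python
def findstring(text: str, search: str, occurrence: int) -> int:
    """1-based position of the nth occurrence of search, or 0 if not found."""
    pos = -1
    for _ in range(occurrence):
        pos = text.find(search, pos + 1)
        if pos == -1:
            return 0
    return pos + 1


def substring(text: str, start: int, length: int) -> str:
    """1-based start position plus a length, like SUBSTRING."""
    return text[start - 1:start - 1 + length]


def extract_parenthesised(qualification: str) -> str:
    opening = findstring(qualification, "(", 1)
    closing = findstring(qualification, ")", 1)
    # Assumed guard: return an empty string when there is no usable pair.
    if opening == 0 or closing <= opening:
        return ""
    # Start one past "(" and stop one short of ")".
    return substring(qualification, opening + 1, closing - opening - 1)


samples = [
    "D.Ther (ZIMB) 2017",
    "D Tech (The South African Dental Technicians Council -S.A) 1984",
    "Dental Technology (Natal-S.A) 1991",
]
for sample in samples:
    print(extract_parenthesised(sample))
```

Without the guard, a row with no parentheses would produce a negative length, which is the kind of case worth handling with a conditional in the real expression.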

Conditional request from SSRS 2005 DataSet in an expression

I'm having difficulty with what seems to be a limitation in SSRS (BIDS) 2005, and upgrading to a newer version is not yet an option for me.
I have a DataSet that returns a bunch of payroll withholding data by employee, and I want to get the value of field "B" based on what I find in field "A." Specifically, I want to get a dollar amount based on a field "Code" being "OptLife."
So, if Field A = "OptLife", give me the dollar value of Field B. Pretty simple, right?
The closest I can come is:
=IIF(First(Fields!Code.Value, "Withholdings") = "OptLife", First(Fields!AmtPct.Value, "Withholdings"), " ")
What's killing me is that "First" indicator. I don't want "first" or "last" or "max" or "sum." I want whatever row of data has the value OptLife. If I remove that indicator, I get a syntax error. How do I get around this?
What I really want is something like "select AmtPct from WithholdingsDataSet where Code = 'OptLife'", but it needs to be an SSRS expression.
EDIT PER IAN'S QUESTION:
So, I have a result set from the DataSet that looks like this.
Employee  Code       Method  AmtPct
1121      401K       A       400
1121      Roth       null    null
1121      FSAChild   A       96.15
1121      FSAHealth  A       192.31
1121      OptLife    A       28.84
In my report, I have a textfield formatted for Currency, and need an expression that will pull the 28.84 into that field. Because I have multiple DataSets tied to this report, I need to specify which DataSet the value is coming from, hence the (Fields!Code.Value, "Withholdings") but that parenthetical statement has to be prepended with something. Nothing I put there gives me the value I need.
Once I nail this down, the same methodology will be used for distinguishing Roth and Traditional 401K, and FSA Childcare from FSA Healthcare.
You should be able to use something like:
=Sum(IIF(Fields!Code.Value = "OptLife", Fields!AmtPct.Value, Nothing), "Withholdings")
As you've seen, since you're referencing the data in a Textbox you need to use an aggregate expression.
The expression above will apply a Sum to the dataset, but will only consider OptLife rows, which seems to be what you're after. Assuming you only have one OptLife code in the Dataset, you'll only be considering that one row in the aggregate, which I think is what is required.
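For readers more used to general-purpose code, the conditional aggregate behaves roughly like the Python sketch below, run over the sample rows from the question. It is an analogy rather than SSRS itself; `conditional_sum` is a hypothetical name, and rows whose amount is missing are skipped in the way Sum skips Nothing.

```python
from typing import Optional

# Sample rows from the Withholdings result set in the question.
withholdings = [
    {"Employee": 1121, "Code": "401K", "Method": "A", "AmtPct": 400.0},
    {"Employee": 1121, "Code": "Roth", "Method": None, "AmtPct": None},
    {"Employee": 1121, "Code": "FSAChild", "Method": "A", "AmtPct": 96.15},
    {"Employee": 1121, "Code": "FSAHealth", "Method": "A", "AmtPct": 192.31},
    {"Employee": 1121, "Code": "OptLife", "Method": "A", "AmtPct": 28.84},
]


def conditional_sum(rows: list, code: str) -> Optional[float]:
    """Sum AmtPct over rows whose Code matches; None if nothing contributed."""
    total = None
    for row in rows:
        if row["Code"] == code and row["AmtPct"] is not None:
            total = (total or 0.0) + row["AmtPct"]
    return total


print(conditional_sum(withholdings, "OptLife"))
print(conditional_sum(withholdings, "FSAChild"))
print(conditional_sum(withholdings, "Roth"))
```

The same call with a different code covers the Roth, 401K and FSA cases mentioned in the question, as long as each code appears at most once per employee in the dataset.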