I am using the code below in an SSIS Derived Column to remove titles such as Mr, Mrs, Ms and Dr from the Name column.
Ex:-
Mr ABC
MS XYZX
Mrs qwrer
DR ADCS
So I am removing the title from the name.
SUBSTRING( [Name] , 1, 3)=="Mr" && LEN( [Name] ) >2 ? RIGHT([Name],LEN([Name])-2)
But I am getting an error: incomplete token or invalid statement.
Please help.
Any other suggestions for removing the prefixes are also welcome, but I need to use a transformation.
A different way to think about the problem is that you want to look at the first "word" in the column Name where "word" is the collection of characters from the start of the string to the first space.
String matching in SSIS is case-sensitive, so you'll want to force the first word to lower or upper case, whichever way your master list of titles is cased (and then ensure the title list is all in that same case).
I am an advocate of making life easier on yourself so I'll add a Derived Column, actually lots of derived columns, that will identify the position of the first space in Name and I'll call this FirstSpace
DER GetFirstSpace
Add a new column, called FirstSpace. The expression we want to use is FINDSTRING
FINDSTRING([Name], " ", 1)
That will return the position of the first instance of a space (or zero if no space was found).
DER GetFirstWord
Add another derived column after DER GetFirstSpace. We do this so we can inspect the values we're passing to get the first word. If you do it all in a single Derived Column and get something wrong, you won't be able to debug it, and the real cost of development is maintenance. The new column, FirstWord, will be type DT_WSTR 4000 because that's what happens when you use the string manipulation expressions. I am going to force this to upper case as I'll need it later on.
TRIM(UPPER(SUBSTRING([Name], 1, [FirstSpace])))
SUBSTRING takes a one-based start and a length, so the first FirstSpace characters include the space itself: "DR " rather than "DR". The TRIM removes that trailing space so the comparison later on can match. I'm coding this from memory, so check it with a data viewer.
TODO: What happens if FirstSpace is 0 (no space in Name)? We might need to make use of the ternary operator ?: to guard that case.
At this point in the data flow, we have a single word in a column named FirstWord. What we need to do is compare that to our list of known titles and, if it exists, strip it from the original. And that's an interesting problem.
DER GetIsTitleMatched
Add yet another Derived column, this time to solve whether we've matched our list of titles. This will be a boolean type column named IsTitleMatched
[FirstWord] == "DR" || [FirstWord] == "MRS" || [FirstWord] == "MR" || [FirstWord] == "MS"
Following that pattern "FirstWord is exactly equal literal text OR..." when this derived column evaluates, we'll know whether the first word is something to be removed (finally)
DER SetFinalName
Here we're going to add yet another column, NameFinal. RIGHT takes a count of characters from the end of the string, not a starting position, so we ask for everything after the opening space: the length of Name minus FirstSpace. You might need to wrap it in LTRIM if names can have more than one space after the title. Again, I'm free handing at the moment, so this should be good but comes with no guarantee of perfection.
(IsTitleMatched) ? RIGHT([Name], LEN([Name]) - [FirstSpace]) : [Name]
I do violate my own rule here as I have a quasi complex expression there in the positive case. Feel free to insert a derived column task that computes the stripped version of the name.
At this point, you've got 4 to 5 derived columns in your data flow but you can add a data viewer between each to ensure you're getting the expected result. You're welcome to solve this in your own manner but that's the easiest approach I can think of.
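The steps above can be sketched outside SSIS to check the position arithmetic. This is a minimal sketch in Python, not SSIS: findstring and substring are hypothetical helpers that imitate the one-based behaviour of FINDSTRING and SUBSTRING, and strip_title is an illustrative name for the whole chain of derived columns.

```python
def findstring(text: str, search: str, occurrence: int = 1) -> int:
    """Imitate SSIS FINDSTRING: one-based position of the nth occurrence, 0 if absent."""
    pos = -1
    for _ in range(occurrence):
        pos = text.find(search, pos + 1)
        if pos == -1:
            return 0
    return pos + 1


def substring(text: str, start: int, length: int) -> str:
    """Imitate SSIS SUBSTRING: one-based start, then a number of characters."""
    return text[start - 1:start - 1 + length]


TITLES = {"DR", "MRS", "MR", "MS"}


def strip_title(name: str) -> str:
    first_space = findstring(name, " ", 1)                     # DER GetFirstSpace
    first_word = substring(name, 1, first_space).upper().strip()  # DER GetFirstWord
    is_title_matched = first_word in TITLES                    # DER GetIsTitleMatched
    if is_title_matched:
        # Same idea as RIGHT(Name, LEN(Name) - FirstSpace), plus a left trim.
        return name[first_space:].lstrip()
    return name                                                # DER SetFinalName


for sample in ["Mr ABC", "MS XYZX", "Mrs qwrer", "DR ADCS", "Madonna"]:
    print(repr(sample), "->", repr(strip_title(sample)))
```

In this sketch a name without any space gives first_space of 0 and an empty first word, so it passes through unchanged. Whether SSIS itself behaves the same way for a zero length is worth confirming with a data viewer.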
I'm using a derived column to concatenate two WSTR columns, either of which could be NULL.
Alert (3000 width column) and customAlert (125 wide column)
The problem I'm having is that when I run this with REPLACENULL so that the concatenation happens properly if one of the columns is NULL, it's doubling the calculated destination column width for each column, which is pushing the derived column outside the limit for a WSTR column (so for the expression as written, it wants 6254 column width, which is not possible without moving to NTEXT, which causes its own problems when I have to convert it back.) The column size requirements don't change if I don't use REPLACENULL, but that comes with other problems.
REPLACENULL(Alert,"") + REPLACENULL(customAlert,"")
I could work around this using a C# script in the data flow to do the work, but I'd rather understand why this is happening.
Congratulations, it is not often that I get to learn something new about SSIS.
It sure feels like there's a bug in the REPLACENULL expression but I was also unaware of its existence
Set up
I defined a data flow with an OLE DB Source using the following query
SELECT
REPLICATE(N'A', 3000) AS Alert
, REPLICATE(N'C', 125) AS customAlert;
That results in pipeline metadata of
Alert - DT_WSTR 3000
customAlert - DT_WSTR 125
I can confirm that, as provided, the expression fails with an error about the invalid length of the resulting string. What's really weird to my brain is that the computed string length is, as you say, 6254. That's not a perfect doubling so ... I have no idea what is going on there.
I advocate for breaking things down in SSIS to smaller pieces as they make debugging easier.
I started by splitting your 3 operations into 2 separate ones with an eye towards the concatenation in a later step.
Right off the bat, this alone errors out.
REPLACENULL(Alert, "")
Error approximately
the length is not valid. Must be between 0 and 4000. Computed length 6002
6002? I really have no idea what's going on behind the scenes. We'll pin that and come back to it.
I added
REPLACENULL(customAlert, "")
and that worked by itself, but once again the length is doubled plus 2, for a computed width of 252. Note that 6002 + 252 = 6254, which is exactly the length reported for the full expression.
So, we can't use REPLACENULL with Alert, and using it with customAlert doubles our space.
What else can we do?
Invent a time machine and cut your teeth on SSIS 2005. ;) A different way of solving the problem is to use the ternary operator test condition ? true clause : false clause
ISNULL(Alert) ? "" : Alert
and
ISNULL(customAlert) ? "" : customAlert
That results in columns of length 3000 and 125 respectively. Personally, I'd create these two new columns in one derived column and then do the concatenation in a second derived column after it, because the next error you run into is going to be more difficult to diagnose in the all-in-one expression.
(ISNULL(Alert) ? "" : Alert ) + (ISNULL(customAlert) ? "" : customAlert )
This has a correct length of 3125
Nowhere does the documentation for REPLACENULL mention this unexpected behaviour. Tested using SQL Server 2017.
SQL Server Integration Services
Microsoft SQL Server Integration Services Designer
Version 14.0.3002.92
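The widths reported above can be checked with a little arithmetic. This is a sketch in Python using only the numbers from this test; the "2 * width + 2" rule is a pattern inferred from those two observations, not documented behaviour, and replacenull_width is a hypothetical name.

```python
# Input DT_WSTR width -> width the designer computed for REPLACENULL(col, "")
observed = {3000: 6002, 125: 252}


def replacenull_width(input_width: int) -> int:
    """Pattern inferred from the two observations only (assumption, not a documented rule)."""
    return 2 * input_width + 2


for width, reported in observed.items():
    assert replacenull_width(width) == reported

combined = sum(observed.values())        # REPLACENULL(Alert,"") + REPLACENULL(customAlert,"")
ternary_combined = sum(observed.keys())  # the ISNULL(...) ? "" : ... version

print(combined)           # 6254, over the 4000 limit for DT_WSTR
print(ternary_combined)   # 3125, within the limit
```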
I am working on an SSIS (2017) solution to read and load data from these 3 excel file names:
message_EDF_100420202.csv
message_UltaBIO_10042020.csv
message_SEIDV_10042020.csv
What I need to do is get only EDF, UltraBIO or SEIDV as a new column (Derived Column task),
so I need some help setting up the substring function correctly inside the Derived Column task.
Any suggestions?
It appears your pattern is message_ Stuff-I-Want _junk (spaces not present in actual pattern). It's delimited by underscores and since the starting text is constant, that makes life easier.
Create a new column called MessageLessName
Remove the message_ portion with an expression
REPLACE([SourceFile], "message_", "")
Now, we want to take the left N most characters where N corresponds to the location of the underscore in our new column MessageLessName. For ease of debugging, I propose you add a second Derived Column Task to the output of the first one (where we defined MessageLessName). Here, we're going to create FirstUnderscore column
FINDSTRING([MessageLessName], "_", 1)
Finally, we'll add a third Derived Column Task and here-in is where we'll get to the final file name.
LEFT([MessageLessName], [FirstUnderscore] - 1)
The - 1 is there because FINDSTRING returns the position of the underscore itself, and we only want the characters before it. I'm being lazy and not running this, but because you can check each step along the way, you can verify MessageLessName is exactly what you think it should be and that FirstUnderscore is N characters in from the start of our MessageLessName column.
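The off-by-one question can be settled with a quick simulation. This is a minimal sketch in Python rather than SSIS; findstring is a hypothetical helper imitating the one-based FINDSTRING, and extract_type is an illustrative name for the three derived columns taken together.

```python
def findstring(text: str, search: str, occurrence: int = 1) -> int:
    """Imitate SSIS FINDSTRING: one-based position of the nth occurrence, 0 if absent."""
    pos = -1
    for _ in range(occurrence):
        pos = text.find(search, pos + 1)
        if pos == -1:
            return 0
    return pos + 1


def extract_type(source_file: str) -> str:
    message_less_name = source_file.replace("message_", "")   # REPLACE(...)
    first_underscore = findstring(message_less_name, "_", 1)  # FINDSTRING(...)
    if first_underscore == 0:
        return message_less_name       # no underscore left: nothing to cut
    # LEFT(MessageLessName, FirstUnderscore) would keep the underscore,
    # so take one character fewer.
    return message_less_name[:first_underscore - 1]


for name in ["message_EDF_100420202.csv",
             "message_UltaBIO_10042020.csv",
             "message_SEIDV_10042020.csv"]:
    print(name, "->", extract_type(name))
```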
Another option is a Script Component using Split.
Row.ColumnName.ToString().Split('_')[1];
You are taking the column value and casting to string. (current value is the whole string)
Next is splitting based on '_' (current value is an array of three strings)
Finally you are taking the second value (0 based) (current value is the string you want)
Here's a little bonus. Getting the date as well:
string[] breakdown = Row.fileNames.Split('_');
Row.Type = breakdown[1];
string dateToFix = breakdown[2].Replace(".csv", "");
Row.Date = DateTime.Parse(dateToFix.Substring(0,2) +"/"
+ dateToFix.Substring(2,2) + "/" + dateToFix.Substring(4,4));
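For comparison, the same split-and-parse idea can be sketched in Python. Note that the C# above leaves the day/month order to the current culture of DateTime.Parse; the sketch makes that choice explicit through a date_format parameter. The default of day first ("%d%m%Y") is an assumption, as the question does not say which order the file names use, and parse_file_name is a hypothetical name.

```python
from datetime import date, datetime


def parse_file_name(file_name: str, date_format: str = "%d%m%Y"):
    """Split 'message_<type>_<date>.csv' into (type, date).

    date_format is an assumption: pass "%m%d%Y" if the names are month first.
    """
    breakdown = file_name.split("_")
    file_type = breakdown[1]
    date_text = breakdown[2].replace(".csv", "")
    return file_type, datetime.strptime(date_text, date_format).date()


print(parse_file_name("message_SEIDV_10042020.csv"))
print(parse_file_name("message_SEIDV_10042020.csv", "%m%d%Y"))
```

A date part that is not exactly eight digits, such as the nine digits in message_EDF_100420202.csv from the question, raises ValueError in this sketch rather than being silently truncated.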
I have the following strings in a column called Qualifications:
D.Ther (ZIMB) 2017
D.Ther (ZIMB) 2016
Cert in Dental Therapy (ZIMB) 2017
D Tech (ZIMB) 1971
D Tech (The South African Dental Technicians Council -S.A) 1984
D Tech (ZIMB) 1987
Dental Technology (Natal-S.A) 1991
How do I retain only the string in parentheses using SSIS? For example ZIMB, Natal-S.A
I would approach this by identifying the location of the opening and closing parentheses with FINDSTRING. Knowing those two values, you can then use the SUBSTRING operation to extract the middle text.
Debugging is challenging with SSIS expressions. I would take the following approach. Add 3, yes 3, Derived Column Tasks to your data flow.
Derived Column 1
Add a new column called OpeningParenthesis of type DT_I4 (32 bit integer). The following expression will populate the value
FINDSTRING([Qualifications], "(", 1)
Find the first occurrence of an opening parenthesis in the column (case sensitive) Qualifications. The final argument, 1, is the occurrence to report.
Derived Column 2
Add a new column called ClosingParenthesis of type DT_I4. The following expression will be used
FINDSTRING([Qualifications], ")", 1)
Find the first occurrence of a closing parenthesis in the column (case sensitive) Qualifications. The third argument of FINDSTRING is the occurrence to report, not a starting position, so it stays 1 here.
Derived Column 3
Here we'll use a substring operation to slice out the middle text. Add a new column named Country and the type will be DT_WSTR. Use an expression like
SUBSTRING([Qualifications], [OpeningParenthesis] + 1, [ClosingParenthesis] - [OpeningParenthesis] - 1)
The + 1 starts the substring just after the opening parenthesis, and the - 1 in the length calculation [ClosingParenthesis] - [OpeningParenthesis] - 1 stops it just before the closing one.
I don't have SSIS open at the moment so that's the approximate answer. Any of the positions from FINDSTRING might be off by one but using this approach allows you to attach the Data Viewer between components to determine where things have gone wrong.
Also, depending on your data there might be issues if the column doesn't contain parenthesis or if it's null.
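The position arithmetic, and the missing-parenthesis and null cases, can be checked with a small simulation. This is a sketch in Python rather than SSIS; findstring and substring are hypothetical helpers imitating the one-based FINDSTRING and SUBSTRING, and returning None for rows without a usable pair of parentheses is a choice made for this sketch only.

```python
from typing import Optional


def findstring(text: str, search: str, occurrence: int = 1) -> int:
    """Imitate SSIS FINDSTRING: one-based position of the nth occurrence, 0 if absent."""
    pos = -1
    for _ in range(occurrence):
        pos = text.find(search, pos + 1)
        if pos == -1:
            return 0
    return pos + 1


def substring(text: str, start: int, length: int) -> str:
    """Imitate SSIS SUBSTRING: one-based start, then a number of characters."""
    return text[start - 1:start - 1 + length]


def text_in_parentheses(qualification: Optional[str]) -> Optional[str]:
    if qualification is None:
        return None
    opening = findstring(qualification, "(", 1)   # Derived Column 1
    closing = findstring(qualification, ")", 1)   # Derived Column 2
    if opening == 0 or closing <= opening:
        return None                               # no usable pair of parentheses
    # Derived Column 3: start after "(", stop before ")".
    return substring(qualification, opening + 1, closing - opening - 1)


for row in ["D.Ther (ZIMB) 2017",
            "Dental Technology (Natal-S.A) 1991",
            "No parentheses here"]:
    print(repr(row), "->", repr(text_in_parentheses(row)))
```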
This is untested but should get you close.
SUBSTRING parses a string given a start point and a number of characters.
FINDSTRING locates the position of the given occurrence of the string searched for.
SUBSTRING([column], FINDSTRING([column],"(",1) + 1, FINDSTRING([column],")",1) - FINDSTRING([column],"(",1) - 1)
I finally got the right way as below:
SUBSTRING([Primary Qualifications],FINDSTRING([Primary Qualifications],"(",1) + 1,FINDSTRING([Primary Qualifications],")",1) - FINDSTRING([Primary Qualifications],"(",1) - 1)
I have a column in my Access database table. I ran a query to make it proper case by using StrConv([MyColumn],3), but the last two letters are state codes, and this query turns SOmeThing, soMethINg, NY into Something, Something, Ny.
I want the result to be Something, Something, NY
Is there another query I can run afterwards to capitalize the last letter?
You can use:
UcaseLast: Left([YourColumn], Len([YourColumn]) - 1) & UCase(Right([YourColumn], 1))
Well, most people would tell you to store your 'address', 'city', and 'state' as separate fields. Then you Proper Case each separately and concatenate them together. If you can do that... that is your best approach.
If this is a database or file that's been tossed at you and you can't make the field/table changes... it's still possible to get your desired results. However, you had better make sure all strings end with your state code. Also make sure you don't have foreign addresses, since some other countries use more than two letters for the province or region code at the end.
But if you are sure all records contain two letter state abbreviations, you can continue with the following:
MyColumnAdj: StrConv(Mid([MyColumn],1,len([MyColumn])-2),3) + StrConv(right([MyColumn],2),1)
This takes the midstring of your [MyColumn] from position 1 to the length of your [MyColumn] minus 2 (leaving off the state code) and it Proper Case's it all.
It then concatenates (using the plus sign) to a rightstring of [MyColumn] for a length of 2 and Upper Case's it.
Once again, this is dangerous if the field doesn't have the State Code consistently at the end of the string.
Best of luck. Hope this helps. :)
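The same split, proper case the body and upper case the last two characters, can be sketched in Python to test it against sample values. This is only an illustration: str.title() is used as a stand-in for StrConv(..., 3) and the two may differ on some inputs (for example around apostrophes), and proper_case_keep_state is a hypothetical name.

```python
def proper_case_keep_state(value: str) -> str:
    """Proper-case everything except the last two characters, which are upper-cased.

    Assumes the value always ends with a two-letter state code.
    """
    if len(value) < 2:
        return value.upper()
    body, state = value[:-2], value[-2:]
    return body.title() + state.upper()


print(proper_case_keep_state("SOmeThing, soMethINg, NY"))
print(proper_case_keep_state("new york, ny"))
```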
This might seem like a strange one, but can you store a conditional variable inside a text field (or something that can be made to work like one)?
I'm creating a notification system that groups notifications by date and type, so if 2 or more of the same type are created on the same day it will group the notifications together and provide a count (e.g. Person1 and 3 others have done something).
The notification messages are stored in a table as a template e.g.
[[value]] has just done something
[[value]] is then replaced with Person1's name (and count OTHERS) by using
REPLACE(t1.message,
'[[value]]',
IF(t1.value_as_user=1,
CONCAT(t1.forename, ' ', t1.surname, IF((count((t1.id + DATE(t1.date)))-1)<>0,
CONCAT(' and ', (count((t1.id + DATE(t1.date)))-1), ' other'),
'')),
t1.value)) as 'message'
The only problem with this is it would mean that grammatically the message would no longer make sense i.e. "Person1 and 2 others has just done something" should now be "Person1 and 2 others have just done something"
Is there a way to select a string within a text field (like I have done with [[value]]), except that it would be something like [[has/have]], and depending on the count the correct one could be chosen?
Sometimes you can change the sentence to avoid the grammatical variability:
Something has been done by Person1 and 2 others
(If you read The Elements of Style by Strunk & White, you may be conditioned to recoil at the passive voice. The above trick may seem like cheating. But don't let them bully you.)
Otherwise, no, there's no function in MySQL that can analyze a string and find out its grammatical usage in English and automatically conjugate verbs.
You could make [[has/have]] be another meta-field in your string, and then replace it with another expression based on the count.
But frankly, I wouldn't do the string formatting in an SQL expression. Not only for this reason, but because string-manipulation in SQL is clumsy compared to literally any other language (except Java of course).
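As one possible shape for doing this in application code, here is a minimal Python sketch. It assumes the query returns the raw template, the name and the number of other people separately; render and the generic [[singular/plural]] placeholder handling are hypothetical, not something from the question's schema.

```python
import re

# Matches placeholders of the form [[singular/plural]], e.g. [[has/have]].
CHOICE = re.compile(r"\[\[([^\[\]/]+)/([^\[\]/]+)\]\]")


def render(template: str, name: str, others: int) -> str:
    """Fill [[value]] and any [[singular/plural]] placeholders in a template."""
    if others > 0:
        noun = "other" if others == 1 else "others"
        subject = f"{name} and {others} {noun}"
    else:
        subject = name
    plural = others > 0

    def choose(match):
        # Group 1 is the singular form, group 2 the plural form.
        return match.group(2) if plural else match.group(1)

    message = CHOICE.sub(choose, template)
    return message.replace("[[value]]", subject)


template = "[[value]] [[has/have]] just done something"
print(render(template, "Person1", 0))
print(render(template, "Person1", 2))
```

Keeping the placeholder generic means a template such as "[[value]] [[is/are]] online" would work without any change to the code.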