Extracting String in Parentheses - SSIS

I have the following strings in a column called Qualifications:
D.Ther (ZIMB) 2017
D.Ther (ZIMB) 2016
Cert in Dental Therapy (ZIMB) 2017
D Tech (ZIMB) 1971
D Tech (The South African Dental Technicians Council -S.A) 1984
D Tech (ZIMB) 1987
Dental Technology (Natal-S.A) 1991
How do I retain only the string in parentheses using SSIS? For example, ZIMB or Natal-S.A.

I would approach this by identifying the locations of the opening and closing parentheses with FINDSTRING. Knowing those two values, you can then use the SUBSTRING operation to extract the text between them.
Debugging is challenging with SSIS expressions, so I would take the following approach: add 3, yes 3, Derived Column transformations to your data flow.
Derived Column 1
Add a new column called OpeningParenthesis of type DT_I4 (32-bit integer). The following expression will populate the value:
FINDSTRING([Qualifications], "(", 1)
This finds the first occurrence of an opening parenthesis in the Qualifications column. Note that the third argument of FINDSTRING is the occurrence to look for, not a starting position.
Derived Column 2
Add a new column called ClosingParenthesis of type DT_I4. The following expression will be used:
FINDSTRING([Qualifications], ")", 1)
This finds the first closing parenthesis in the Qualifications column. Because the third argument is an occurrence count rather than a start position, passing [OpeningParenthesis] here would look for the Nth closing parenthesis and return 0 for data like this.
Derived Column 3
Here we'll use a substring operation to slice out the middle text. Add a new column named Country and the type will be DT_WSTR. Use an expression like
SUBSTRING([Qualifications], [OpeningParenthesis] + 1, [ClosingParenthesis] - [OpeningParenthesis] - 1)
The + 1 on the start position skips the opening parenthesis itself, and the - 1 on the length drops the closing parenthesis.
I don't have SSIS open at the moment, so treat this as an approximate answer. If any of the positions from FINDSTRING turn out to be off by one, this approach lets you attach a Data Viewer between components to determine where things have gone wrong.
Also, depending on your data, there might be issues if the column doesn't contain parentheses or if it's null.
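To check the off-by-one arithmetic outside SSIS, here is a minimal Python sketch that mimics the three derived columns. The helpers `findstring` and `substring` are hypothetical stand-ins that assume SSIS's 1-based positions and FINDSTRING's occurrence argument; returning None for null values or values without parentheses is just one possible way to handle those cases.

```python
from typing import Optional


def findstring(text: str, search: str, occurrence: int) -> int:
    """Mimic SSIS FINDSTRING: 1-based position of the given occurrence, 0 if absent."""
    pos = -1
    for _ in range(occurrence):
        pos = text.find(search, pos + 1)
        if pos == -1:
            return 0
    return pos + 1


def substring(text: str, position: int, length: int) -> str:
    """Mimic SSIS SUBSTRING: `length` characters starting at 1-based `position`."""
    start = position - 1
    return text[start:start + length]


def extract_country(qualifications: Optional[str]) -> Optional[str]:
    if qualifications is None:
        return None  # null input: nothing to extract
    opening = findstring(qualifications, "(", 1)  # Derived Column 1
    closing = findstring(qualifications, ")", 1)  # Derived Column 2
    if opening == 0 or closing < opening:
        return None  # no complete pair of parentheses
    # Derived Column 3: +1 skips "(", -1 keeps ")" out of the length
    return substring(qualifications, opening + 1, closing - opening - 1)


rows = [
    "D.Ther (ZIMB) 2017",
    "D Tech (The South African Dental Technicians Council -S.A) 1984",
    "Dental Technology (Natal-S.A) 1991",
    "No parentheses here",
    None,
]
for row in rows:
    print(repr(extract_country(row)))
```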

This is untested but should get you close.
SUBSTRING parses a string given a start point and a number of characters.
FINDSTRING locates the position of the given occurrence of the string searched for (its third argument is that occurrence).
SUBSTRING([column], FINDSTRING([column], "(", 1) + 1, FINDSTRING([column], ")", 1) - FINDSTRING([column], "(", 1) - 1)

I finally got it working as follows:
SUBSTRING([Primary Qualifications],FINDSTRING([Primary Qualifications],"(",1) + 1,FINDSTRING([Primary Qualifications],")",1) - FINDSTRING([Primary Qualifications],"(",1) - 1)

Related

How to split values and find the longest text string in Google Sheets

I have a column in Google Sheets in which each cell contains this type of text:
manager, finance manager
accountant
accountant, chief accountant
manager, auditor, other, finance manager
accountant
I want to find the longest item in each cell, like below, and show that text in a new cell:
finance manager
accountant
chief accountant
finance manager
accountant
I used the SPLIT function to split the text, and the FIND function, but while finding a character or counting characters works, I can't work out how to get the whole longest item.
Kindly help me.
An alternative would be to use
=FILTER(TRIM(SPLIT(A1, ",")), LEN(SPLIT(A1, ",")) = MAX(LEN(SPLIT(A1, ","))))
and fill down as far as needed.
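For checking expected results outside Sheets, the split / trim / longest-length idea these formulas share can be sketched in Python. `longest_item` is a hypothetical helper: on ties it keeps the first item, whereas FILTER returns every tied item, and Python's `strip()` only removes leading and trailing spaces (Sheets' TRIM also collapses internal runs of spaces).

```python
def longest_item(cell: str) -> str:
    """Split a comma-separated cell, strip each item and return the longest.

    On ties, max() keeps the first item it sees.
    """
    items = [part.strip() for part in cell.split(",")]
    return max(items, key=len)


column_a = [
    "manager, finance manager",
    "accountant",
    "accountant, chief accountant",
    "manager, auditor, other, finance manager",
    "accountant",
]
for cell in column_a:
    print(longest_item(cell))
```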
Suppose your comma-separated lists reside in A2:A. Place the following in, say, B2 of an otherwise empty range B2:B ...
=ArrayFormula(IF(A2:A="",,TRIM(REGEXEXTRACT(A2:A,"[^,]{"&REGEXEXTRACT(TRIM(TRANSPOSE(QUERY(TRANSPOSE(IF(REGEXMATCH(A2:A,"[^,]{"&SEQUENCE(1,30,30,-1)&"}")=FALSE,,SEQUENCE(1,30,30,-1))),,30))),"\S+")*1&"}"))))
This is a complex formula, one that would be difficult to explain. So I will leave it to you (and others who may be interested) to dissect, analyze and understand the inner workings. However, if there is a specific question I can answer should you (or others) get stuck, feel free to ask.
In short, the formula checks to see if there are any REGEX matches for non-comma groupings of 30 characters in length, then 29, 28 and so on to 1. If so, that number is returned in a space-separated list. The first number (which will be the highest) is returned and used to extract a non-comma REGEX expression of that exact length (which then has any leading or trailing spaces removed from it).
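The descending-length search described above can be mirrored with Python's `re` module as a rough sketch. `longest_by_regex` is a hypothetical helper; like the SEQUENCE(1,30,30,-1) in the formula, it only tries lengths 30 down to 1, and the space after each comma counts toward an item's length, just as it does with the `[^,]` pattern.

```python
import re
from typing import Optional


def longest_by_regex(cell: str, max_len: int = 30) -> Optional[str]:
    """Try runs of n non-comma characters for n = max_len down to 1; the first
    n that matches is the longest run, so extract that run and strip it."""
    for n in range(max_len, 0, -1):
        match = re.search("[^,]{%d}" % n, cell)
        if match:
            return match.group(0).strip()
    return None  # empty cell, or a cell of only commas


print(longest_by_regex("manager, auditor, other, finance manager"))
```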
If your text is stored in column A, you can use this formula:
=hlookup(max(arrayformula(len(trim(split(A1,","))))), {arrayformula(len(trim(split(A1,","))));arrayformula(trim(split(A1,",")))},2,False)
A second option is to write a custom function.

SSIS Substring Error Using Case When with Right Function

I am using the code below in an SSIS Derived Column to remove titles such as Mr, Mrs, Ms and Dr from the Name column.
Example:
Mr ABC
MS XYZX
Mrs qwrer
DR ADCS
So I am removing the title from the name.
SUBSTRING( [Name] , 1, 3)=="Mr" && LEN( [Name] ) >2 ? RIGHT([Name],LEN([Name])-2)
But I get an error: incomplete token or invalid statement.
Please help.
Any other suggestions for removing the prefixes are also welcome, but I need to use a transformation.
A different way to think about the problem is that you want to look at the first "word" in the column Name where "word" is the collection of characters from the start of the string to the first space.
String matching in SSIS is case-sensitive, so you'll want to force the first word to upper (or lower) case and make sure your master list of titles uses the same case.
I am an advocate of making life easier on yourself, so I'll add a Derived Column (actually lots of derived columns). The first one identifies the position of the first space in Name, and I'll call this FirstSpace.
DER GetFirstSpace
Add a new column called FirstSpace. The expression we want to use is FINDSTRING:
FINDSTRING([Name], " ", 1)
That will return the position of the first instance of a space (or zero if no space was found).
DER GetFirstWord
Add another derived column after DER GetFirstSpace. We do this in a separate step so we can inspect the values we're passing in to get the first word. If you do it all in a single Derived Column and something goes wrong, you won't be able to debug it, and the real cost of development is maintenance. The new column, FirstWord, will be of type DT_WSTR 4000 because that's what happens when you use the string manipulation expressions. I am going to force this to upper case as I'll need it later on.
[FirstSpace] > 0 ? UPPER(SUBSTRING([Name], 1, [FirstSpace] - 1)) : ""
Subtracting 1 from FirstSpace keeps the space out of the result, so "DR ADCS" yields "DR" rather than "DR ". When there is no space, FINDSTRING returns 0; the conditional operator then returns an empty string instead of handing SUBSTRING a length of -1, so a name without a space is treated as having no title.
At this point in the data flow, we have a single word in a column named FirstWord. What we need to do is compare that to our list of known titles and, if it matches, strip it from the original. And that's an interesting problem.
DER GetIsTitleMatched
Add yet another Derived column, this time to solve whether we've matched our list of titles. This will be a boolean type column named IsTitleMatched
[FirstWord] == "DR" || [FirstWord] == "MRS" || [FirstWord] == "MR" || [FirstWord] == "MS"
Following that pattern ("FirstWord is exactly equal to this literal text OR ..."), when this derived column evaluates we'll know whether the first word is something to be removed (finally).
DER SetFinalName
Here we're going to add yet another column, NameFinal. The magic of stripping out the title is the RIGHT function: keep everything after that first space through to the end of the string. RIGHT returns the rightmost N characters, so N is the length of Name minus the position of the first space. If a title can be followed by more than one space, wrap the result in LTRIM. Again, I'm writing this freehand, so no guarantee of perfection.
[IsTitleMatched] ? RIGHT([Name], LEN([Name]) - [FirstSpace]) : [Name]
I do violate my own rule here, as I have a quasi-complex expression in the positive case. Feel free to insert a Derived Column transformation that computes the stripped version of the name.
At this point, you've got 4 to 5 derived columns in your data flow but you can add a data viewer between each to ensure you're getting the expected result. You're welcome to solve this in your own manner but that's the easiest approach I can think of.
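As a rough way to sanity-check expected results before building the package, here is a minimal Python sketch of the four derived columns. It assumes SSIS's 1-based positions; `right` and `strip_title` are hypothetical helpers. It takes the first word with length FirstSpace - 1 so the space is excluded, and keeps LEN(Name) - FirstSpace characters from the end, since RIGHT counts from the end of the string.

```python
TITLES = {"DR", "MRS", "MR", "MS"}  # upper case, to match the UPPER() applied to FirstWord


def right(text: str, count: int) -> str:
    """Mimic SSIS RIGHT: the rightmost `count` characters of `text`."""
    return text[max(len(text) - count, 0):]


def strip_title(name: str) -> str:
    # DER GetFirstSpace: 1-based position of the first space, 0 if none (like FINDSTRING)
    first_space = name.find(" ") + 1

    # DER GetFirstWord: text before the space, upper-cased; empty when there is no space
    first_word = name[:first_space - 1].upper() if first_space > 0 else ""

    # DER GetIsTitleMatched
    is_title_matched = first_word in TITLES

    # DER SetFinalName: keep the LEN(Name) - FirstSpace characters after the space
    return right(name, len(name) - first_space) if is_title_matched else name


for sample in ["Mr ABC", "MS XYZX", "Mrs qwrer", "DR ADCS", "Madonna"]:
    print(strip_title(sample))
```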

Access 2013 Count

I am working on a report in Access 2013. I need to separate the first 20 records in a column that contain a value and assign a name to them: records 1-20 should be labeled Lot 1, records 21-40 Lot 2, and so on. The report needs to be separated into lots of 20. I can also just insert a line at each set of 20, without a name, if that makes it easier. I just need something to show a break at each set of 20.
Example: as you can see, the report is separated by welder stencil. When the count in the VT column reaches 20, I need to enter a line or some other divider to separate the data. What our client is asking for is that we separate the VT in sets of 20. I don't know what the easiest way to accomplish this is. I have researched it but haven't found anything.
Example Report with Divisions
Update the report's RecordSource query by adding "Lot" values for each row. There are multiple ways of doing this, but the easiest will be if your records already have a sequential, continuous numerical key. If they do not have such a key, you can research generating such sequential numbers for your query, but it is beyond the scope of this question and no details about the actual data schema were supplied in the question.
Let's imagine that you have such a key column [Seq]. You use the modulo (Mod) and/or integer division (\, backslash) operators to determine values that are exactly divisible by 20, e.g. ([Seq] - 1) Mod 20 = 0.
Generate a lot value for each row. An example SQL snippet, where adding 1 makes the first lot Lot 1: SELECT ("Lot " & ((([Seq] - 1) \ 20) + 1)) AS LotNumber ...
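A quick Python check of the lot arithmetic, assuming a 1-based, gap-free [Seq]. The helper names are hypothetical; Python's `//` floors while Access's `\` truncates, but the two agree for the positive values used here. Adding 1 makes the first lot Lot 1, as the question asks.

```python
def lot_number(seq: int, lot_size: int = 20) -> str:
    """Label for a 1-based Seq: records 1-20 -> Lot 1, 21-40 -> Lot 2, ..."""
    # (seq - 1) // lot_size plays the role of Access's ([Seq] - 1) \ 20
    return "Lot " + str((seq - 1) // lot_size + 1)


def starts_new_lot(seq: int, lot_size: int = 20) -> bool:
    """True for the first record of each lot, like ([Seq] - 1) Mod 20 = 0."""
    return (seq - 1) % lot_size == 0


for seq in [1, 20, 21, 40, 41]:
    print(seq, lot_number(seq), starts_new_lot(seq))
```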
Use the Access report sorting and grouping features (grouping on the new LotNumber field) to print a line and/or label at the start of each group. You can also have the report start a new page at the beginning or end of such a group.
The details about grouping can be found elsewhere in tutorials and Access documentation and are beyond the scope of this question.

SSIS with Visual Studio Derived Column

I'm trying to take a list from a text file and export it to Excel.
Rank,Millions,Name,Country,Source of wealth
1,12700,Lakshmi Mittal and family, India,Steel
2,12315,Alisher Usmanov, Russia,Mining and Investment
3,9500,Roman Abramovich, Russia,Oil and industry
4,8600,Sri and Gopi Hinduja, India,Industry and Finance
5,7580,Leonard Blavatnik, Russia,Industry
6,7400,Ernesto and Kirsty Bertarelli, Switzerland &  United Kingdom,Pharmaceuticals
7,7350,Gerald Grosvenor, 6th Duke of Westminster, United Kingdom,Property
8,7083,David and Simon Reuben, United Kingdom,Property and Internet
9,6600,John Fredriksen and family, Norway,Shipping and oil services
10,5900,Galen Weston, George Weston and family, Canada,Retailing
11,5490,Charlene de Carvalho-Heineken and Michel Carvalho, Netherlands,Inheritance, banking, brewing (Heineken)
12,4300,Hans Rausing and family, Sweden,Packaging
13,4100,Joseph Lau, China,Property
14,,Paul Chuckle, UK, IT Training
15,4300,Nicky Oppenheimer, South Africa,Mining, Diamonds
16,3900,Kirsten & Jorn Rausing, Sweden,Inheritance, Investment
17,3400,Sir Richard Branson, United Kingdom,Internet, Transport, Finance
18,3300,Sir Philip and Lady Green, United Kingdom,Retailing
19,,Barry Chuckle, UK, Comedy
20,,Roger Chuckle,UK,SSIS consultancy
For rank 14, 19, and 20, the Millions column is an empty string. I've already separated the data from rows that have Millions specified and those that have not by using a conditional split. What I want to do now is to take the 3 rows that aren't specified, and write in "Not Specified" using a Derived Column Transformation. I figure that I would be able to do something like (in the expression tab):
Millions=="Not Specified"
However, when I run a Data Viewer on the output, the three rows show "False" in the Millions column.
I came up with a weird workaround, but I'm sure there's a better way:
Millions == "" || Millions != "" ? "Not Specified" : "Not Specified"
Can someone tell me why I can't use my first solution? I'm assuming it has something to do with boolean, but the data type is a string.
First of all, is there a special reason why you are separating records with an empty value for the "Millions" field with a conditional split? You can just use a Derived Column transformation and, under the Derived Column tab, select Replace 'Millions'. In the expression bar you can enter:
`Millions=="" ? "Not specified" : Millions`
This means that if the Millions field has an empty value, it is replaced with "Not specified" (true condition); otherwise the existing value is kept (false condition). You are right that the data type is string, but == is the equality operator, not assignment, so Millions=="Not specified" is a comparison that evaluates to a Boolean. That is why the Data Viewer shows False: the values are empty strings, not "Not specified". A Derived Column expression computes the new value, so to output a string conditionally you use the conditional operator, which you can think of as a simplified if/else statement: you must specify what happens when the condition is true and when it is false. So the problem with your first solution is that it only compared the values instead of saying what to output when Millions=="". Hope that this helps.
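Python draws the same line between `==` and a conditional expression, so a small sketch using a few Millions values from the sample data (ranks 1, 14, 12 and 19) shows why a bare comparison produces Booleans while the conditional produces the replacement text. This is only an analogy, not SSIS itself.

```python
millions = ["12700", "", "4300", ""]

# Millions == "Not Specified" is a comparison: one Boolean per row
as_comparison = [value == "Not Specified" for value in millions]

# Millions == "" ? "Not Specified" : Millions is a conditional: one string per row
as_conditional = ["Not Specified" if value == "" else value for value in millions]

print(as_comparison)
print(as_conditional)
```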

Reporting Services: Join all values of a field in a dataset

In a report, I have a dataset with a filter (based on a multi-value parameter).
This dataset contains two fields: Id and Name.
I need to display somewhere the concatenation of all names:
Name1 / Name2 / Name3
The problem is that the Join function only works on arrays, so I cannot pass a dataset as its value.
I looked into custom code too, but I didn't find anything that works.
How should I do this?
I may be a bit late for this but for anyone that's interested in this, there is a rather easy way of doing this in SSRS:
=Join(LookupSet(1,1,Fields!Name.Value, "DatasetName")," / ")
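LookupSet(source, destination, result, dataset) returns the result expression for every row of the dataset whose destination expression equals the source expression, so with the constant 1 on both sides every row matches. A rough Python analogue, with a hypothetical `lookup_set` helper and a made-up in-memory dataset, shows why the trick yields all names:

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def lookup_set(source: Any, destination: Callable[[Row], Any],
               result: Callable[[Row], Any], dataset: List[Row]) -> List[Any]:
    """Rough analogue of SSRS LookupSet: result values of all rows whose
    destination expression equals the source value."""
    return [result(row) for row in dataset if destination(row) == source]


dataset = [
    {"Id": 1, "Name": "Name1"},
    {"Id": 2, "Name": "Name2"},
    {"Id": 3, "Name": "Name3"},
]

# Analogue of =Join(LookupSet(1, 1, Fields!Name.Value, "DatasetName"), " / ")
joined = " / ".join(lookup_set(1, lambda row: 1, lambda row: row["Name"], dataset))
print(joined)
```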
SSRS-2008 R2 and higher...
1. Using LookupSet
If you're beyond the 2008 version OP has, there exists a good solution:
=Join(LookupSet(1, 1, Fields!Name.Value, "DatasetName"), " / ")
Credit for this answer using the LookupSet solution goes entirely to #urbanhusky's answer.
SSRS-2008 and lower...
I'm keeping this answer though because it aggregates #urbanhusky's solution with the solutions available to poor souls stuck with OP's version of SSRS and below.
In SSRS 2008 there are only three "options" as far as I can see, each with its own downside. The first one's probably the least hackish.
2. Extra parameter
Create an internal parameter (e.g. "NameParameter", see this SO answer or MSDN) with Allow Multiple Values. Set the default value of the parameter to the Name field from your dataset. Then use the function =Join(Parameters!NameParameter.Value, " / ") to show the joined names in a textbox.
This may be your best bet, but if there are a lot of values the parameter may not work very well.
3. Use a List
Create a List and drag/drop the Name field to it. If necessary, group on the Name as well.
The disadvantage here is that (AFAIK) the list can't be made to show horizontally.
4. Use a Matrix
Oh boy, this one's real ugly. Nonetheless, here goes: create a matrix, drag the Name field to the column header, and hide the first column as well as the second row (for displaying the data).
The main disadvantage is that it's a hack (and quite a bit of overkill), plus you'll have to trim the last separator character manually with an expression.