SSIS Substring Extract based on qualifier - ssis

I've looked through a few different posts trying to find a solution for this. I have a column that contains descriptions in the following format:
String<Numeric>
However, the column isn't limited to a single instance of that format; it could be something like
UNI<01> JPG<84>
JPG<84> UNI<01>
JPG<84>
UNI<01>
And other variations without any controlled pattern.
What I need to do is extract the number between <> into a separate column in another table, based on the string before the <>. So UNI would qualify the following numeric to go to a certain table.column, while JPG would qualify it for another table, etc. I have seen functions that extract the numeric, but none that qualify it and pull the numeric only if it is prefaced with a given qualifier string.

Based on the scope limitation mentioned in the question's comments that only one type of token (Foo, Bar, Blat, etc.) needs to be found at a time: you could use an expression in a Derived Column to find the token of interest and then extract the value between the arrows.
For example:
(FINDSTRING([InputColumn], @[User::SearchToken] + "<", 1) == 0) ?
    NULL(DT_WSTR, 1) :
    SUBSTRING([InputColumn],
        FINDSTRING([InputColumn], @[User::SearchToken] + "<", 1)
            + LEN(@[User::SearchToken]) + 1,
        FINDSTRING(
            SUBSTRING([InputColumn],
                FINDSTRING([InputColumn], @[User::SearchToken] + "<", 1)
                    + LEN(@[User::SearchToken]) + 1,
                LEN([InputColumn])
            ), ">", 1) - 1
    )
First, the expression checks whether the token specified in @[User::SearchToken] is used in the current row. If it is, SUBSTRING is used to output the value between the arrows. If not, NULL is returned.
The assumption is made that no token's name will end with text matching the name of another token. Searching for token Bar will match Bar<123> and FooBar<123>. Accommodating Bar and FooBar as distinct tokens is possible but the requisite expression will be much more complex.
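To make the logic of that expression easier to follow, here is a minimal Python sketch of the same find-then-slice steps. It is an illustration only, not SSIS code; the function name extract_token_value is made up for this example.

```python
from typing import Optional


def extract_token_value(text: str, token: str) -> Optional[str]:
    """Return the text between '<' and '>' that follows `token`, or None."""
    marker = token + "<"
    start = text.find(marker)  # like FINDSTRING, but 0-based and -1 when absent
    if start == -1:
        return None  # token not present in this row
    value_start = start + len(marker)
    end = text.find(">", value_start)  # first '>' after the token
    if end == -1:
        return None  # malformed input: no closing bracket
    return text[value_start:end]


print(extract_token_value("UNI<01> JPG<84>", "JPG"))
print(extract_token_value("UNI<01>", "JPG"))
# Same caveat as the expression: searching for Bar also matches FooBar.
print(extract_token_value("FooBar<123>", "Bar"))
```

As with the SSIS expression, a token whose name ends with the search token is matched as well, so the last call returns a value even though only FooBar is present.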

You could use an asynchronous Script Component that outputs a row with type and value columns for each type<value> token contained in the input string. Pass the output of this component through a Conditional Split to direct each type to the correct destination (e.g. table).
Pro: This approach gives you the option of using one data flow to process all tag types simultaneously vs. requiring one data flow per tag type.
Con: A Script Component is involved, which it sounds like you'd prefer to avoid.
Sample Script Component Code
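// Requires: using System.Text.RegularExpressions;
// Matches tokens such as UNI<01>: a word (type) followed by digits in angle brackets (value).
private readonly string pattern = @"(?<type>\w+)<(?<value>\d+)>";

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Emit one output row per type<value> token found in the input column.
    foreach (Match match in Regex.Matches(Row.Data, pattern, RegexOptions.ExplicitCapture))
    {
        Output0Buffer.AddRow();
        Output0Buffer.Type = match.Groups["type"].Value;
        Output0Buffer.Value = match.Groups["value"].Value;
    }
}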
Note: The Script Component will need an output created with two columns (perhaps named Type and Value), and the output's SynchronousInputID property must be set to None.
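For readers who want to try the pattern outside SSIS, the following Python sketch applies an equivalent regular expression and then groups the values by type, which is roughly what the Conditional Split does downstream. The helper names split_tokens and route_by_type are hypothetical and exist only for this illustration.

```python
import re
from collections import defaultdict

# Python spelling of the same pattern: named groups use (?P<name>...).
TOKEN_PATTERN = re.compile(r"(?P<type>\w+)<(?P<value>\d+)>")


def split_tokens(description):
    """Return one (type, value) pair per type<value> token in the string."""
    return [(m.group("type"), m.group("value"))
            for m in TOKEN_PATTERN.finditer(description)]


def route_by_type(descriptions):
    """Group extracted values by token type, mimicking a Conditional Split."""
    routed = defaultdict(list)
    for description in descriptions:
        for token_type, value in split_tokens(description):
            routed[token_type].append(value)
    return dict(routed)


print(split_tokens("UNI<01> JPG<84>"))
print(route_by_type(["UNI<01> JPG<84>", "JPG<84>"]))
```

Each key of the resulting dictionary corresponds to one output branch, so adding a new token type does not require a new pass over the data.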

I ended up writing a CTE for a view to handle the data manipulation and then handled the joins and other data pieces in the SSIS package.
;WITH RCTE (Status_Code, lft, rgt, idx)
AS ( SELECT a.Status_code
,LEFT(a.Description, CASE WHEN CHARINDEX(' ', a.Description)=0 THEN LEN(a.Description) ELSE CHARINDEX(' ', a.Description)-1 END)
,SUBSTRING(a.Description, CASE WHEN CHARINDEX(' ', a.Description)=0 THEN LEN(a.Description) ELSE CHARINDEX(' ', a.Description)-1 END + 1, DATALENGTH(a.Description))
,0
FROM [disp] a WHERE NOT( Description IS NULL OR Description ='')
UNION ALL
SELECT r.Status_Code
,CASE WHEN CHARINDEX(' ', r.rgt) = 0 THEN r.rgt ELSE LEFT(r.rgt, CHARINDEX(' ', r.rgt) - 1) END
,CASE WHEN CHARINDEX(' ', r.rgt) > 0 THEN SUBSTRING(r.rgt, CHARINDEX(' ', r.rgt) + 1, DATALENGTH(r.rgt)) ELSE '' END
,idx + 1
FROM RCTE r
WHERE DATALENGTH(r.rgt) > 0
)
SELECT Status_Code
-- ,lft,rgt -- Uncomment to see what's going on
,SUBSTRING(lft,0, CHARINDEX('<',lft)) AS [Description]
,CASE WHEN ISNUMERIC(SUBSTRING(lft, CHARINDEX('<',lft)+1, LEN(lft)-CHARINDEX('<',lft)-1)) >0
THEN CAST (SUBSTRING(lft, CHARINDEX('<',lft)+1, LEN(lft)-CHARINDEX('<',lft)-1) AS INT) ELSE NULL END as Value
FROM RCTE
where lft <> ''

Related

MYSQL: Validate Input Values in Specific Format within a FUNCTION

I have a question about MySQL. I want to create a function that checks whether an input is given in a specific format.
The output should be in the following shape:
x x (a) n (n) (n) (n)
with:
x = letters and numbers
n = numbers
a = letters
brackets = optional values
My code is below.
CREATE FUNCTION validate_of_number(testnumber VARCHAR(7))
RETURNS INT
DETERMINISTIC
RETURN
CASE
WHEN
(SELECT * FROM flightexecution WHERE FlightNo REGEXP
'^[[:alnum:]+[:alnum:]] + [[:alpha:]|''] + [:digit:] +
[[:digit:]|''] + [[:digit:]|''] + [[:digit:]|'']') > 0
Then 1
Else 0
END;
However, it does not work and I don't know why. The output is just an 'OK' without any further information.
I'm assuming based on your description of valid values and the context of your question that the values you are trying to validate look something like CX727 or QF1566 or BA1 etc (i.e. IATA flight designator codes). In that case, this function will give you the results you want:
CREATE FUNCTION validate_of_number(testnumber VARCHAR(7))
RETURNS INT
DETERMINISTIC
RETURN testnumber REGEXP '^[[:alnum:]]{2}[[:alpha:]]?[[:digit:]]{1,4}$';
Examples:
SELECT validate_of_number('A1B4352')
, validate_of_number('QF12')
, validate_of_number('CX727')
, validate_of_number('AB14352')
, validate_of_number('BA1')
, validate_of_number('1C42')
Output
1, 1, 1, 0, 1, 1
In terms of using it with your table you might use a query like
SELECT * FROM flightexecution WHERE validate_of_number(FlightNo)
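The pattern can also be checked outside MySQL. The sketch below is a Python translation of the regular expression (POSIX classes such as [[:alnum:]] are written out as explicit ASCII ranges, which is an assumption about the intended character set), run against the same six sample values.

```python
import re

# Two alphanumerics, an optional letter, then one to four digits.
FLIGHT_NO = re.compile(r"[0-9A-Za-z]{2}[A-Za-z]?[0-9]{1,4}")


def validate_of_number(testnumber: str) -> int:
    """Return 1 when the whole string matches the flight-number shape, else 0."""
    return 1 if FLIGHT_NO.fullmatch(testnumber) else 0


samples = ["A1B4352", "QF12", "CX727", "AB14352", "BA1", "1C42"]
print([validate_of_number(s) for s in samples])
```

Using fullmatch plays the role of the ^ and $ anchors in the MySQL version.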

Teradata Masking - Retain all characters at position 1,4,8,12,16 .... in a string and mask remaining characters with 'X'

I have a requirement to mask every character of a variable-length string with 'X', except the characters in positions 1, 4, 8, 12, 16, and so on.
For example:
Input string - 'John Doe'
Output String - 'JXXn XXe'
The space between the two words must be retained.
Kindly help or reach out for more details if required.
I think maybe an external function would be best here, but if that's too much to bite off, you can get crafty with strtok_split_to_table, xml_agg and regexp_replace to rip the string apart, replace out characters using your criteria, and stitch it back together:
WITH cte AS (SELECT REGEXP_REPLACE('this is a test of this functionality', '(.)', '\1,') AS fullname FROM Sys_Calendar.calendar WHERE calendar_date = CURRENT_DATE)
SELECT
REGEXP_REPLACE(REGEXP_REPLACE((XMLAGG(tokenout ORDER BY tokennum) (VARCHAR(200))), '(.) (.)', '\1\2') , '(.) (.)', '\1\2')
FROM
(
SELECT
tokennum,
outkey,
CASE WHEN tokennum = 1 OR tokennum mod 4 = 0 OR token = ' ' THEN token ELSE 'X' END AS tokenout
FROM TABLE (strtok_split_to_table(cte.fullname, cte.fullname, ',')
RETURNS (outkey VARCHAR(200), tokennum integer, token VARCHAR(200) CHARACTER SET UNICODE)) AS d
) stringshred
GROUP BY outkey
This won't be fast on a large data set, but it might suffice depending on how much data you have to process.
Breaking this down:
WITH cte AS (SELECT REGEXP_REPLACE('this is a test of this functionality', '(.)', '\1,') AS fullname FROM Sys_Calendar.calendar WHERE calendar_date = CURRENT_DATE)
This CTE is just adding a comma between every character of our incoming string using that regexp_replace function. Your name will come out like J,o,h,n, ,D,o,e. You can ignore the sys_calendar part, I just put that in so it would spit out exactly 1 record for testing.
SELECT
tokennum,
outkey,
CASE WHEN tokennum = 1 OR tokennum mod 4 = 0 OR token = ' ' THEN token ELSE 'X' END AS tokenout
FROM TABLE (strtok_split_to_table(cte.fullname, cte.fullname, ',')
RETURNS (outkey VARCHAR(200), tokennum integer, token VARCHAR(200) CHARACTER SET UNICODE)) AS d
This subquery is the important bit. Here we create a record for every character in your incoming name. strtok_split_to_table is doing the work here, splitting that incoming name by comma (which we added in the CTE).
The CASE expression applies your criteria: it keeps the character when it is in position 1, in a position that is a multiple of 4, or a space, and swaps in 'X' everywhere else.
SELECT
REGEXP_REPLACE(REGEXP_REPLACE((XMLAGG(tokenout ORDER BY tokennum) (VARCHAR(200))), '(.) (.)', '\1\2') , '(.) (.)', '\1\2')
Finally we use XMLAGG to combine the many records back into one string in a single record. Because XMLAGG adds a space in between each character we have to hit it a couple of times with regexp_replace to flip those spaces back to nothing.
So... it's ugly, but it does the job.
The code above spits out:
tXXs XX X XeXX oX XhXX fXXXtXXXaXXXy
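The keep-or-mask rule used in the CASE expression can be stated compactly in a few lines of Python. This is only a sketch of the rule for checking expected results, not a Teradata implementation; the function name mask_string is made up for this example.

```python
def mask_string(text: str, mask_char: str = "X") -> str:
    """Keep position 1, every 4th position, and spaces; mask everything else."""
    out = []
    for pos, ch in enumerate(text, start=1):  # 1-based, like tokennum
        keep = pos == 1 or pos % 4 == 0 or ch == " "
        out.append(ch if keep else mask_char)
    return "".join(out)


print(mask_string("this is a test of this functionality"))
print(mask_string("John Doe"))
```

The first call reproduces the output shown above for the test string.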
I couldn't think of a solution, but then @JNevill inspired me with his idea to add a comma to each character :-)
SELECT
RegExp_Replace(
RegExp_Replace(
RegExp_Replace(inputString, '(.)(.)?(.)?(.)?', '(\1(\2[\3(\4', 2)
,'(\([^ ])', 'X')
,'(\(|\[)')
,'this is a test of this functionality' AS inputString
tXXs XX X XeXX oX XhXX fXXXtXXXaXXXy
The 1st RegExp_Replace starts at the 2nd character (keep the 1st character as-is) and processes groups of (up to) 4 characters adding either a ( (characters #1,#2,#4, to be replaced by X unless it's a space) or [ (character #3, no replacement), which results in :
t(h(i[s( (i(s[ (a( (t[e(s(t( [o(f( (t[h(i(s( [f(u(n(c[t(i(o(n[a(l(i(t[y(
Of course this assumes that neither of these two characters exists in your input data; otherwise you have to choose different ones.
The 2nd RegExp_Replace replaces the ( and the following character with X unless it's a space, which results in:
tXX[s( XX[ X( X[eXX( [oX( X[hXX( [fXXX[tXXX[aXXX[y(
Now there are some ( and [ left, which are removed by the 3rd RegExp_Replace.
As I still consider myself a beginner in regular expressions, there are probably better solutions :-)
Edit:
In older Teradata versions not all parameters were optional, so you might have to add values for those:
RegExp_Replace(
RegExp_Replace(
RegExp_Replace(inputString, '(.)(.)?(.)?(.)?', '(\1(\2[\3(\4', 2, 0, 'c')
,'(\([^ ])', 'X', 1, 0, 'c')
,'(\(|\[)', '', 1, 0, 'c')

SSRS 2008 R2 - all parameters multi value AND optional

I've got a report that has 6 parameters. All parameters need to be optional and 3 have to be multi-value. One of the optional parameters is a dropdown, the rest are manually keyed in text boxes.
The Where clause below works when there are multiple @VendorNum values and one @FullJA value, but fails with multiple @FullJA values regardless of the @VendorNum count.
Parameters:
@VendorNum - keyed manually by user (space delimited) - optional, can be multivalue
@FullJA - keyed manually by user (space delimited) - optional, can be multivalue
@BU - optional, can be multivalue - when @JA is populated, this will auto-populate, if @JA isn't populated it's a dropdown with all selected.
@JA3 - keyed by user - optional, single value
@StartDate and @EndDate - optional single values
select * from some_table
WHERE
/*FULL JA*/
(
SUBSTRING(VendorNum, PATINDEX('%[^0]%', VendorNum + '.'), LEN(VendorNum)
) IN (@VendorNum)
AND LEFT(JA, 7) IN (@FullJA)
AND BU IN(@BU)
AND @JA3 IS NULL
)
OR
/*DATE RANGE*/
(
SUBSTRING(VendorNum, PATINDEX('%[^0]%', VendorNum + '.'), LEN(VendorNum)
) IN (@VendorNum)
AND LEN(ISNULL(CONVERT(VARCHAR(20), Cleared_When), '0')) >= @ClearedOnly
AND ad.Audit_Publish_Date >= ISNULL(@StartDate, '2015-01-01')
AND ad.Audit_Publish_Date <= ISNULL(@EndDate, '2025-12-31')
AND BU IN (@BU)
AND @FullJA IS NULL
AND @JA3 IS NULL
)
/*BUS UNIT AND JA3*/
OR (
SUBSTRING(VendorNum, PATINDEX('%[^0]%', VendorNum + '.'), LEN(VendorNum)
) IN (@VendorNum)
AND BU IN (@BU)
AND ad.Audit_Publish_Date >= ISNULL(@StartDate, '2015-01-01')
AND ad.Audit_Publish_Date <= ISNULL(@EndDate, '2025-12-31')
AND LEFT(JA, 3) = (@JA3)
AND @FullJA IS NULL
)
/*BUS UNIT ONLY*/
OR (
SUBSTRING(VendorNum, PATINDEX('%[^0]%', VendorNum + '.'), LEN(VendorNum)
) IN (@VendorNum)
AND BU IN (@BU)
AND ad.Audit_Publish_Date >= ISNULL(@StartDate, '2015-01-01')
AND ad.Audit_Publish_Date <= ISNULL(@EndDate, '2025-12-31')
AND @JA3 IS NULL
AND @FullJA IS NULL
)
The dataset parameter values for @FullJA and @VendorNum are both
=IIF(InStr(Parameters!FullJA.Value," ")>0,SPLIT(Parameters!FullJA.Value," "),Parameters!FullJA.Value) and all params are set as NOT multivalue, with nulls allowed.
Any help would be greatly appreciated. I've written over 200 reports for this project and this is the only one that is really grinding my gears!
Thanks!
I would approach this by building up some temp tables / table variables, to hold the potentially multi-valued variables, and then joining to those tables. This has the advantage of you being able to insert all possible values, in the case they have omitted the variable. So, you'd split your strings and put them into those tables (something along the lines of this example) if given the variable, and otherwise just do an insert into to populate your temp table / table variable.
For a split function, I prefer something like this:
create FUNCTION [dbo].[Split] (@sep VARCHAR(32), @s VARCHAR(MAX))
RETURNS TABLE
AS
RETURN
(
SELECT r.value('.','VARCHAR(MAX)') as Item
FROM (SELECT CONVERT(XML, N'<root><r>' + REPLACE(REPLACE(REPLACE(@s,'& ','&amp; '),'<','&lt;'), @sep, '</r><r>') + '</r></root>') as valxml) x
CROSS APPLY x.valxml.nodes('//root/r') AS RECORDS(r)
)
GO
GRANT SELECT
ON OBJECT::[dbo].[Split] TO PUBLIC
AS [dbo];
I would then put those variables into a table using something like this (my separator is a ", "):
select ltrim(rtrim(ppl.Item)) as PersonName
into #gppl
from dbo.Split(', ', @PersonListForCompare) as ppl
You would do something more like:
select ltrim(rtrim(vnd.Item)) as VendorNum
into #vendorNums
from dbo.Split(', ', @VendorNum) as vnd
You would then join to that temp table just like any other table & use it to limit your results that way. In your case, you want to put in all vendors (possibly) if they didn't give you any input. So, you'd do something like:
create table #vendorNums (VendorNum varchar(64)) --I have no idea, here, what this data looks like
if @VendorNum is not null and datalength(@VendorNum) > 0
    insert into #vendorNums (VendorNum)
    select ltrim(rtrim(vnd.Item))
    from dbo.Split(', ', @VendorNum) as vnd
else
    insert into #vendorNums (VendorNum)
    select VendorNum
    from dbo.Vendors
That said, I think that you could use your select from dbo.Split directly as a table in a join, rather than putting it into the temp table. Only problem would be you'd have to be sure you had data in there to split, or else you're going to have a bunch of combinations to get the right match-up of null parameters vs. filled ones.
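The idea of "split the parameter if it was supplied, otherwise allow every value" can be illustrated independently of SQL. The Python sketch below is only a model of that logic; the row layout, the field names VendorNum and JA, and the helper names are assumptions made for the example.

```python
def parse_multi_value(raw, all_values, sep=" "):
    """Split a delimited parameter; fall back to every known value when it is empty."""
    if raw is None or not raw.strip():
        return set(all_values)  # parameter omitted: do not restrict on it
    return {item.strip() for item in raw.split(sep) if item.strip()}


def filter_rows(rows, vendor_param, ja_param):
    """Keep rows whose vendor and 7-character JA prefix are both allowed."""
    vendors = parse_multi_value(vendor_param, {r["VendorNum"] for r in rows})
    jas = parse_multi_value(ja_param, {r["JA"][:7] for r in rows})
    return [r for r in rows if r["VendorNum"] in vendors and r["JA"][:7] in jas]


rows = [
    {"VendorNum": "101", "JA": "ABC1234X"},
    {"VendorNum": "202", "JA": "DEF5678Y"},
    {"VendorNum": "303", "JA": "ABC1234Z"},
]
print(filter_rows(rows, "101 303", None))
print(filter_rows(rows, None, "ABC1234 DEF5678"))
```

Because an omitted parameter expands to the full set of values, a single filter covers every combination of supplied and missing parameters, which is what the temp-table joins achieve in the SQL version.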

mySQL extract number or string depending on value

Please take a look at a table below:
I need "the last part" of "what I have" to be number if it's all number and string if it contains character.
In Excel I've achieved this with the following function (as shown above):
=IFERROR(VALUE(TEXT(D2;"0"));TEXT(D2;"0"))
However, I want to do this in MySQL in order to compute more efficiently.
I've "floated" somewhere around CASE with CAST or CONVERT and also the TRIM functions, but I haven't been able to put together anything sensible.
A nice "bonus" would be to extract "the last part" by looking for the last "_" character (so the first "_" from the right of the string), but I have no idea at all how to achieve that.
Use SUBSTRING_INDEX.
Query
SELECT
CASE
WHEN SUBSTRING_INDEX(records.data, '_', -1) > 1 # is int check '00004949' returns 1
THEN SUBSTRING_INDEX(records.data, '_', -1) + 0 # converts '00004949' to 4949
ELSE SUBSTRING_INDEX(records.data, '_', -1)
END
AS word
FROM (
SELECT "TRA_PL_NWL_EMA_NWLY_DAI_000_20170610_IN1_01P002bc" AS DATA
UNION
SELECT "TRA_PL_NWL_EMA_NWLY_DAI_000_2017_0909_JET_00004949" AS DATA
) records
Result
word
----------
01P002bc
4949
The following query will more or less achieve the task:
SELECT
case
when SUBSTRING_INDEX(value, "_", -1) REGEXP('(^[0-9]+$)')
then Trim(Leading 0 from SUBSTRING_INDEX(value, "_", -1))
else SUBSTRING_INDEX(value, "_", -1)
end as Value
From yourtable;
Hope it helps!
I hope this is what you want.
At least it does what you have asked for.
SELECT
CASE
WHEN (
CONVERT(
substring(
txt,
LENGTH(txt) - LOCATE('_', REVERSE(txt))+2,
length(txt)
)
, signed integer
)
) = 0
THEN substring(
txt,
LENGTH(txt) - LOCATE('_', REVERSE(txt))+2,
length(txt)
)
ELSE CONVERT(
substring(
txt,
LENGTH(txt) - LOCATE('_', REVERSE(txt))+2,
length(txt)
)
, signed integer
)
END as NUMBER
from test.test
This is my test Table and result of SQL:
txt NUMBER
DA_DA_ADAD_ADAD_ADAD_asd123 asd123
DA_DA_ADAD_ADAD_ADAD_000123 123
DA_DA_ADAD_ADAD_ADAD_444 444
DA_DA_ADAD_ADAD_ADAD_bsd123 bsd123
DA_DA_ADAD_ADAD_ADAD_csd123 csd123
DA_DA_ADAD_ADAD_ADAD_dsd123 dsd123
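As a cross-check for the queries above, the rule "take everything after the last underscore, and turn it into a number only if it is all digits" can be sketched in Python. This is an illustration of the rule rather than a MySQL solution, and the function name last_part is made up.

```python
import re
from typing import Union


def last_part(value: str, sep: str = "_") -> Union[int, str]:
    """Return the segment after the last separator, as int when it is all digits."""
    tail = value.rsplit(sep, 1)[-1]  # whole string when no separator is present
    if re.fullmatch(r"[0-9]+", tail):
        return int(tail)  # drops leading zeros, e.g. '00004949' becomes 4949
    return tail


print(last_part("TRA_PL_NWL_EMA_NWLY_DAI_000_2017_0909_JET_00004949"))
print(last_part("TRA_PL_NWL_EMA_NWLY_DAI_000_20170610_IN1_01P002bc"))
```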

MySQL Correct title/proper case for more than one string in a field

I have a database which has a field titled 'address1'. If there is only 1 string in this field for a record, I am able to correct the case from eg 'PAULSTOWN' to 'Paulstown', or 'bishopslough' to 'Bishopslough'.
I have done this by creating a function:
CREATE FUNCTION init_cap (s VARCHAR(255))
RETURNS VARCHAR(255) DETERMINISTIC
RETURN CONCAT( UPPER( SUBSTRING( s, 1, 1 ) ) , LOWER( SUBSTRING( s FROM 2 ) ) );
Then using:
UPDATE customer SET address1 = init_cap(address1);
To correct records.
However, this does not fully correct records that contain more than one string, eg 'dalesfort road' will only be corrected to 'Dalesfort road' and not 'Dalesfort Road'. There are also some entries with more than 2 strings.
How could I change the above function to cater for 2 or more strings? Also is that function declared correctly, or should I be using begin and end sections?
It's OK, I found the answer at artfulsoftware.com.
Now I just need to analyse the code and learn how it works!
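For anyone reading along, the general idea of capitalising every word can be sketched as follows. This is not the code from artfulsoftware.com and it is Python rather than MySQL; it only illustrates applying the single-word init_cap rule to each space-separated word. The name init_cap_words is made up for this example.

```python
def init_cap_words(s: str) -> str:
    """Upper-case the first letter of each space-separated word, lower-case the rest."""
    # Splitting on a single space keeps repeated spaces intact after the join.
    return " ".join(word[:1].upper() + word[1:].lower() for word in s.split(" "))


print(init_cap_words("dalesfort road"))
print(init_cap_words("PAULSTOWN"))
```

A MySQL version would need a loop over the string inside a BEGIN ... END block, since the one-line RETURN form only handles the first word.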