Why is the integer output quoted in MySQL custom variable


SELECT score, TRIM(BOTH "" FROM ink) AS `rank` FROM (
    SELECT
        score,
        @r := IF(@p = score, @r, @r + 1) AS ink,
        @p := score
    FROM
        scores,
        (SELECT @r := 0, @p := NULL) init
    ORDER BY
        score DESC
) dep
print{"headers": ["score", "rank"], "values": [[4.00, "1"], [4.00, "1"], [3.85, "2"], [3.65, "3"], [3.65, "3"], [3.50, "4"]]}
I defined an integer user variable in MySQL. In the output, the rank values are wrapped in quotes, unlike the scores. I then used a function to remove the quotes, but they are still there in the result. This confuses me; I hope someone can explain it. Thank you.

TRIM() returns a string, so the double quotes are not part of the value. PHP adds them when dumping the value to mark that its datatype is string.
To get a numeric output column (in which case PHP will not add the quotes), use ink + 0 or CAST(ink AS UNSIGNED).
DEMO fiddle. Pay attention to how the values are aligned in the column: strings are left-justified, numbers are right-justified.
PS. If you change the initial value of @p from NULL to some value that cannot occur in score (for example -1 or ''), then @r := IF(@p = score, @r, @r + 1) can be simplified to @r := @r + (@p <> score).
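To make the ranking logic easier to follow, here is a minimal Python sketch of the same row-by-row update. The function names and the -1 sentinel are illustrative assumptions, not part of the original query. The first function mirrors the IF() form with a NULL start value; the second mirrors the shortened form, which has to add 1 whenever the score differs from the previous one.

```python
def dense_rank_if(scores):
    """Mirror @r := IF(@p = score, @r, @r + 1), with @p starting as NULL."""
    rank, prev, out = 0, None, []
    for score in scores:
        # None never equals a score, so the first row always gets rank 1.
        rank = rank if prev == score else rank + 1
        prev = score
        out.append(rank)
    return out


def dense_rank_add(scores, sentinel=-1):
    """Mirror the shortened form: add 1 when the score differs from the previous one.

    The sentinel must be a value that cannot occur in the data (assumption).
    """
    rank, prev, out = 0, sentinel, []
    for score in scores:
        rank += int(prev != score)
        prev = score
        out.append(rank)
    return out


scores = [4.00, 4.00, 3.85, 3.65, 3.65, 3.50]  # already sorted descending
print(dense_rank_if(scores))
print(dense_rank_add(scores))
```

Both functions give the same ranks for the sample scores from the question.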
--
I was wondering: I initialized @r := 0, so shouldn't it be an integer? Yet the output value is in quotes. – johnson
This is a feature of how user-defined variables are processed.
The server cannot predict what datatype will be assigned to the variable later (in your query, while the next row is being processed). So it uses the most general datatype for a user-defined variable, one that can accept a value of any datatype. That most general datatype is a binary string (more precisely, LONGBLOB).
Additionally, all user-defined variable values are stored in the PERFORMANCE_SCHEMA.user_variables_by_thread table, and the corresponding column's datatype is LONGBLOB.
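The effect on the client side can be illustrated with a small Python sketch. It uses Python's json module as a stand-in for the PHP dump in the question, which is an assumption made purely for illustration: a value that arrives as a string is serialized with quotes, and the same value converted to a number is not.

```python
import json

# One result row as the client might receive it: the rank arrives as a string.
row_as_string = [4.00, "1"]

# The same row after a numeric conversion (the analogue of ink + 0 or CAST).
row_as_number = [row_as_string[0], int(row_as_string[1])]

print(json.dumps(row_as_string))
print(json.dumps(row_as_number))
```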

Related

How secure is format() for dynamic queries inside a function?

After reading the Postgres manual and many posts here, I wrote this function, taking into account everything I found regarding security. It works great and does everything I was looking for.
takes a json, each key has an array [visible, filter, arg1, optional arg2]
SELECT public.goods__list_json2('{"name": [true, 7, "Ad%"], "category": [true], "stock": [false, 4, 0]}', 20, 0);
returns a json array with requested data.
[{"name": "Adventures of TRON", "category": "Atari 2600"}, {"name": "Adventure", "category": "Atari 2600"}]
My question is: how can I really be sure that, when I build the query from user-supplied arguments, passing them as %L to format() is injection safe?
By my db design, everything is done through functions, most of them running as SECURITY DEFINER, with only certain roles allowed to execute them.
If this is secure, my intention is to convert old functions to this dynamic logic and save myself from writing a lot of code to create new or specific queries.
I would really appreciate it if an experienced Postgres developer could give me some advice on this.
I'm using Postgres 13.
CREATE FUNCTION public.goods__list_json (IN option__j jsonb, IN limit__i integer, IN offset__i integer)
RETURNS jsonb
LANGUAGE plpgsql
VOLATILE
STRICT
SECURITY DEFINER
COST 1
AS $$
DECLARE
    table__v varchar := 'public.goods_full';
    column__v varchar[] := ARRAY['id', 'id__category', 'category', 'name', 'barcode', 'price', 'stock', 'sale', 'purchase'];
    filter__v varchar[] := ARRAY['<', '>', '<=', '>=', '=', '<>', 'LIKE', 'NOT LIKE', 'ILIKE', 'NOT ILIKE', 'BETWEEN', 'NOT BETWEEN'];
    select__v varchar[];
    where__v varchar[];
    sql__v varchar;
    key__v varchar;
    format__v varchar;
    temp__v varchar;
    temp__i integer;
    betw__v varchar;
    result__j jsonb;
BEGIN
    FOR key__v IN SELECT jsonb_object_keys(option__j) LOOP
        IF key__v = ANY(column__v) THEN
            IF (option__j->key__v->0)::bool THEN
                select__v := array_append(select__v, key__v);
            END IF;
            temp__i := (option__j->key__v->1)::int;
            IF temp__i > 0 AND temp__i <= array_length(filter__v, 1) THEN
                temp__v := (option__j->key__v->>2)::varchar;
                IF temp__i >= 11 THEN
                    betw__v := (option__j->key__v->>3)::varchar;
                    format__v := format('%I %s %L AND %L', key__v, filter__v[temp__i], temp__v, betw__v);
                ELSE
                    format__v := format('%I %s %L', key__v, filter__v[temp__i], temp__v);
                END IF;
                where__v := array_append(where__v, format__v);
            END IF;
        END IF;
    END LOOP;
    sql__v := 'SELECT jsonb_agg(t) FROM (SELECT '
        || array_to_string(select__v, ', ')
        || format(' FROM %s WHERE ', table__v)
        || array_to_string(where__v, ' AND ')
        || format(' OFFSET %L LIMIT %L', offset__i, limit__i)
        || ') t';
    RAISE NOTICE 'SQL: %', sql__v;
    EXECUTE sql__v INTO result__j;
    RETURN result__j;
$$;

A word of warning: this style with dynamic SQL in SECURITY DEFINER functions can be elegant and convenient. But don't overuse it. Do not nest multiple levels of functions this way:
The style is much more error prone than plain SQL.
The context switch with SECURITY DEFINER has a price tag.
Dynamic SQL with EXECUTE cannot save and reuse query plans.
No "function inlining".
And I'd rather not use it for big queries on big tables at all. The added sophistication can be a performance barrier. Like: parallelism is disabled for query plans this way.
That said, your function looks good, I see no way for SQL injection. format() is proven good to concatenate and quote values and identifiers for dynamic SQL. On the contrary, you might remove some redundancy to make it cheaper.
Function parameters offset__i and limit__i are integer. SQL injection is impossible through integer numbers, there is really no need to quote them (even though SQL allows quoted string constants for LIMIT and OFFSET). So just:
format(' OFFSET %s LIMIT %s', offset__i, limit__i)
Also, after verifying that each key__v is among your legal column names - and while those are all legal, unquoted column names - there is no need to run it through %I. Can just be %s
I'd rather use text instead of varchar. Not a big deal, but text is the "preferred" string type.
Related:
Format specifier for integer variables in format() for EXECUTE?
Function to return dynamic set of columns for given table
COST 1 seems too low. The manual:
COST execution_cost
A positive number giving the estimated execution cost for the
function, in units of cpu_operator_cost. If the function
returns a set, this is the cost per returned row. If the cost is not
specified, 1 unit is assumed for C-language and internal functions,
and 100 units for functions in all other languages. Larger values
cause the planner to try to avoid evaluating the function more often
than necessary.
Unless you know better, leave COST at its default 100.
Single set-based operation instead of all the looping
The whole looping can be replaced with a single SELECT statement. Should be noticeably faster. Assignments are comparatively expensive in PL/pgSQL. Like this:
CREATE OR REPLACE FUNCTION goods__list_json (_options json, _limit int = NULL, _offset int = NULL, OUT _result jsonb)
RETURNS jsonb
LANGUAGE plpgsql SECURITY DEFINER AS
$func$
DECLARE
    _tbl  CONSTANT text   := 'public.goods_full';
    _cols CONSTANT text[] := '{id, id__category, category, name, barcode, price, stock, sale, purchase}';
    _oper CONSTANT text[] := '{<, >, <=, >=, =, <>, LIKE, "NOT LIKE", ILIKE, "NOT ILIKE", BETWEEN, "NOT BETWEEN"}';
    _sql  text;
BEGIN
    SELECT concat('SELECT jsonb_agg(t) FROM ('
                , 'SELECT ' || string_agg(t.col, ', ' ORDER BY ord) FILTER (WHERE t.arr->>0 = 'true')
                  -- ORDER BY to preserve order of objects in input
                , ' FROM ' || _tbl
                , ' WHERE ' || string_agg (
                      CASE WHEN (t.arr->>1)::int BETWEEN 1 AND 10 THEN
                              format('%s %s %L' , t.col, _oper[(arr->>1)::int], t.arr->>2)
                           WHEN (t.arr->>1)::int BETWEEN 11 AND 12 THEN
                              format('%s %s %L AND %L', t.col, _oper[(arr->>1)::int], t.arr->>2, t.arr->>3)
                           -- ELSE NULL -- = default - or raise exception for illegal operator index?
                      END
                    , ' AND ' ORDER BY ord) -- ORDER BY only cosmetic
                , ' OFFSET ' || _offset -- SQLi-safe, no quotes required
                , ' LIMIT ' || _limit -- SQLi-safe, no quotes required
                , ') t'
                 )
    FROM   json_each(_options) WITH ORDINALITY t(col, arr, ord)
    WHERE  t.col = ANY(_cols) -- only allowed column names - or raise exception for illegal column?
    INTO   _sql;

    IF _sql IS NULL THEN
        RAISE EXCEPTION 'Invalid input resulted in empty SQL string! Input: %', _options;
    END IF;

    RAISE NOTICE 'SQL: %', _sql;
    EXECUTE _sql INTO _result;
END
$func$;
db<>fiddle here
Shorter, faster and still safe against SQLi.
Quotes are only added where necessary for syntax or to defend against SQL injection. That boils down to filter values only. Column names and operators are verified against the hard-wired list of allowed options.
Input is json instead of jsonb. Order of objects is preserved in json, so you can determine the sequence of columns in the SELECT list (which is meaningful) and WHERE conditions (which is purely cosmetic). The function observes both now.
Output _result is still jsonb. Using an OUT parameter instead of the variable. That's totally optional, just for convenience. (No explicit RETURN statement required.)
Note the strategic use of concat() to silently ignore NULL and the concatenation operator || so that NULL makes the concatenated string NULL. This way, FROM, WHERE, LIMIT, and OFFSET are only inserted where needed. A SELECT statement works without either of those. An empty SELECT list (also legal, but I suppose unwanted) results in a syntax error. All intended.
Using format() only for WHERE filters, for convenience and to quote values. See:
String concatenation using operator "||" or format() function
The function isn't STRICT anymore. _limit and _offset have default value NULL, so only the first parameter _options is required. _limit and _offset can be NULL or omitted, then each is stripped from the statement.
Using text instead of varchar.
Made constant variables actually CONSTANT (mostly for documentation).
Other than that the function does what your original does.
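The two ideas in these notes (whitelisting column names and operators while treating only the filter values as untrusted, and letting a NULL operand drop an optional clause) can be sketched outside Postgres. The Python below is an illustration under stated assumptions, not the function above: it uses the standard-library sqlite3 module as a stand-in database, a hypothetical goods table with a reduced column list, and bound `?` parameters in place of format() with %L. The helpers cat() and concat() are hypothetical names that imitate the NULL behavior of || and concat().

```python
import sqlite3

ALLOWED_COLS = ("id", "category", "name", "stock")  # hypothetical whitelist
ALLOWED_OPS = ("<", ">", "<=", ">=", "=", "<>", "LIKE", "NOT LIKE")


def cat(prefix, value):
    """Imitate SQL ||: a missing (None) operand makes the whole piece None."""
    return None if value is None else f"{prefix}{value}"


def concat(*parts):
    """Imitate SQL concat(): None pieces are silently skipped."""
    return "".join(p for p in parts if p is not None)


def build_query(options, limit=None):
    select, where, params = [], [], []
    for col, spec in options.items():
        if col not in ALLOWED_COLS:
            continue  # unknown keys never reach the SQL text
        if spec and spec[0] is True:
            select.append(col)
        if len(spec) >= 3 and type(spec[1]) is int and 1 <= spec[1] <= len(ALLOWED_OPS):
            # Operator comes from the whitelist (1-based index, as in the question).
            where.append(f"{col} {ALLOWED_OPS[spec[1] - 1]} ?")
            params.append(spec[2])  # the value travels as a bound parameter
    sql = concat(
        "SELECT ", ", ".join(select),
        " FROM goods",
        cat(" WHERE ", " AND ".join(where) or None),  # dropped when no filters
        cat(" LIMIT ", None if limit is None else int(limit)),  # dropped when None
    )
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE goods (id INTEGER, category TEXT, name TEXT, stock INTEGER)")
conn.executemany(
    "INSERT INTO goods VALUES (?, ?, ?, ?)",
    [(1, "Atari 2600", "Adventure", 3),
     (2, "Atari 2600", "Adventures of TRON", 0),
     (3, "NES", "Zelda", 5)],
)

options = {
    "name": [True, 7, "Ad%"],
    "category": [True],
    "stock": [False, 4, 0],
    "name; DROP TABLE goods": [True],  # ignored: not in the whitelist
}
sql, params = build_query(options, limit=20)
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

As in the PL/pgSQL version, an empty SELECT list is not special-cased here and would surface as a syntax error from the database.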

I tried to apply everything I learned here and came up with the function below, plus some new questions =D.
Is there any advantage to declaring _oper as '{LIKE, "NOT LIKE"}' instead of ARRAY['LIKE', 'NOT LIKE']?
Since _limit and _offset are cast to int, I'm assuming there is no SQLi risk, right?
Is this an elegant way to handle the 'IN' and 'NOT IN' cases? I wonder why string_agg() is allowed nested inside concat(), but not in the CASE branch, where I needed to use a subquery.
This is a naive private function.
Edit: Removed "SECURITY DEFINER" since it was identified as dangerous.
CREATE FUNCTION public.crud__select (IN _tbl text, IN _cols text[], IN _opts json, OUT _data jsonb)
LANGUAGE plpgsql STRICT AS
$$
DECLARE
    _oper CONSTANT text[] := '{<, >, <=, >=, =, <>, LIKE, "NOT LIKE", ILIKE, "NOT ILIKE", BETWEEN, "NOT BETWEEN", IN, "NOT IN"}';
BEGIN
    EXECUTE (
        SELECT concat('SELECT jsonb_agg(t) FROM ('
                    , 'SELECT ' || string_agg(e.col, ', ' ORDER BY ord) FILTER (WHERE e.arr->>0 = 'true')
                    , ' FROM ', _tbl
                    , ' WHERE ' || string_agg(
                          CASE
                              WHEN (e.arr->>1)::int BETWEEN 1 AND 10 THEN
                                  format('%s %s %L', e.col, _oper[(e.arr->>1)::int], e.arr->>2)
                              WHEN (e.arr->>1)::int BETWEEN 11 AND 12 THEN
                                  format('%s %s %L AND %L', e.col, _oper[(e.arr->>1)::int], e.arr->>2, e.arr->>3)
                              WHEN (e.arr->>1)::int BETWEEN 13 AND 14 THEN
                                  format('%s %s (%s)', e.col, _oper[(e.arr->>1)::int], (
                                      SELECT string_agg(format('%L', ee), ',') FROM json_array_elements_text(e.arr->2) ee)
                                  )
                          END, ' AND ')
                    , ' OFFSET ' || (_opts->>'_offset')::int
                    , ' LIMIT ' || (_opts->>'_limit')::int
                    , ') t'
                     )
        FROM json_each(_opts) WITH ORDINALITY e(col, arr, ord)
        WHERE e.col = ANY(_cols)
    ) INTO _data;
END;
$$;
Then, for each table or view, I create a wrapper function that only certain roles can execute.
CREATE FUNCTION public.goods__select (IN _opts json, OUT _data jsonb)
LANGUAGE sql STRICT SECURITY DEFINER AS
$$
    SELECT public.crud__select(
        'public.goods_full',
        ARRAY['id', 'id__category', 'category', 'name', 'barcode', 'price', 'stock', 'sale', 'purchase'],
        _opts
    );
$$;
SELECT public.goods__select('{"_limit": 10, "name": [true, 9, "a%"], "id__category": [true, 13, [1, 2]], "category": [true]}'::json);
[{"name": "Atlantis II", "category": "Atari 2600", "id__category": 1}, .. , {"name": "Amidar", "category": "Atari 2600", "id__category": 1}]

Teradata Masking - Retain all characters at position 1,4,8,12,16 .... in a string and mask remaining characters with 'X'

I have a requirement to mask, with 'X', all characters of a variable-length string except those at positions 1, 4, 8, 12, 16, ...
For example:
Input string - 'John Doe'
Output String - 'JXXn XXe'
The space between the two words must be retained.
Kindly help, or reach out for more details if required.

I think maybe an external function would be best here, but if that's too much to bite off, you can get crafty with strtok_split_to_table, xml_agg and regexp_replace to rip the string apart, replace out characters using your criteria, and stitch it back together:
WITH cte AS (SELECT REGEXP_REPLACE('this is a test of this functionality', '(.)', '\1,') AS fullname FROM Sys_Calendar.calendar WHERE calendar_date = CURRENT_DATE)
SELECT
    REGEXP_REPLACE(REGEXP_REPLACE((XMLAGG(tokenout ORDER BY tokennum) (VARCHAR(200))), '(.) (.)', '\1\2') , '(.) (.)', '\1\2')
FROM
(
    SELECT
        tokennum,
        outkey,
        CASE WHEN tokennum = 1 OR tokennum mod 4 = 0 OR token = ' ' THEN token ELSE 'X' END AS tokenout
    FROM TABLE (strtok_split_to_table(cte.fullname, cte.fullname, ',')
        RETURNS (outkey VARCHAR(200), tokennum integer, token VARCHAR(200) CHARACTER SET UNICODE)) AS d
) stringshred
GROUP BY outkey
This won't be fast on a large data set, but it might suffice depending on how much data you have to process.
Breaking this down:
WITH cte AS (SELECT REGEXP_REPLACE('this is a test of this functionality', '(.)', '\1,') AS fullname FROM Sys_Calendar.calendar WHERE calendar_date = CURRENT_DATE)
This CTE is just adding a comma between every character of our incoming string using that regexp_replace function. Your name will come out like J,o,h,n, ,D,o,e. You can ignore the sys_calendar part, I just put that in so it would spit out exactly 1 record for testing.
SELECT
tokennum,
outkey,
CASE WHEN tokennum = 1 OR tokennum mod 4 = 0 OR token = ' ' THEN token ELSE 'X' END AS tokenout
FROM TABLE (strtok_split_to_table(cte.fullname, cte.fullname, ',')
RETURNS (outkey VARCHAR(200), tokennum integer, token VARCHAR(200) CHARACTER SET UNICODE)) AS d
This subquery is the important bit. Here we create a record for every character in your incoming name. strtok_split_to_table is doing the work here splitting that incoming name by comma (which we added in the CTE)
The CASE expression applies your criteria: it keeps the character in record 1, in every record whose number is a multiple of 4, and every space, and swaps in 'X' everywhere else.
SELECT
REGEXP_REPLACE(REGEXP_REPLACE((XMLAGG(tokenout ORDER BY tokennum) (VARCHAR(200))), '(.) (.)', '\1\2') , '(.) (.)', '\1\2')
Finally we use XMLAGG to combine the many records back into one string in a single record. Because XMLAGG adds a space in between each character we have to hit it a couple of times with regexp_replace to flip those spaces back to nothing.
So... it's ugly, but it does the job.
The code above spits out:
tXXs XX X XeXX oX XhXX fXXXtXXXaXXXy
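For comparison, the masking rule itself (keep position 1, every position that is a multiple of 4, and every space) fits in a few lines of Python. This is only a sketch of the rule, with a hypothetical function name; it could serve as a reference when checking the SQL output or as the basis for an external function.

```python
def mask(text, mask_char="X"):
    """Keep position 1, multiples of 4, and spaces; mask everything else."""
    out = []
    for pos, ch in enumerate(text, start=1):  # 1-based positions, as in the question
        keep = pos == 1 or pos % 4 == 0 or ch == " "
        out.append(ch if keep else mask_char)
    return "".join(out)


print(mask("this is a test of this functionality"))
print(mask("John Doe"))
```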

I couldn't think of a solution, but then @JNevill inspired me with his idea to add a comma to each character :-)
SELECT
RegExp_Replace(
RegExp_Replace(
RegExp_Replace(inputString, '(.)(.)?(.)?(.)?', '(\1(\2[\3(\4', 2)
,'(\([^ ])', 'X')
,'(\(|\[)')
,'this is a test of this functionality' AS inputString
tXXs XX X XeXX oX XhXX fXXXtXXXaXXXy
The 1st RegExp_Replace starts at the 2nd character (keep the 1st character as-is) and processes groups of (up to) 4 characters adding either a ( (characters #1,#2,#4, to be replaced by X unless it's a space) or [ (character #3, no replacement), which results in :
t(h(i[s( (i(s[ (a( (t[e(s(t( [o(f( (t[h(i(s( [f(u(n(c[t(i(o(n[a(l(i(t[y(
Of course this assumes that neither of these characters exists in your input data; otherwise you have to choose different ones.
The 2nd RegExp_Replace replaces the ( and the following character with X unless it's a space, which results in:
tXX[s( XX[ X( X[eXX( [oX( X[hXX( [fXXX[tXXX[aXXX[y(
Now there are some (& [ left which are removed by the 3rd RegExp_Replace.
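The three replacement steps can be traced in Python with the re module. This is a sketch under assumptions: the function name is hypothetical, Python's handling of unmatched optional groups (they expand to an empty string) is assumed to correspond to Teradata's, and it has only been checked against the sample string from this answer, not against inputs of other lengths.

```python
import re


def mask_regex(text):
    # Keep the 1st character as-is and process the rest (start position 2).
    head, tail = text[:1], text[1:]
    # Step 1: tag each group of up to 4 characters; '[' marks the one to keep.
    marked = re.sub(r"(.)(.)?(.)?(.)?", r"(\1(\2[\3(\4", tail)
    # Step 2: '(' plus a following non-space character becomes 'X'.
    masked = re.sub(r"\([^ ]", "X", marked)
    # Step 3: drop the leftover '(' and '[' markers.
    return head + re.sub(r"[(\[]", "", masked)


print(mask_regex("this is a test of this functionality"))
```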
As I still consider myself a beginner in Regular Expressions, there are probably better solutions :-)
Edit:
In older Teradata versions not all parameters were optional, then you might have to add values for those:
RegExp_Replace(
RegExp_Replace(
RegExp_Replace(inputString, '(.)(.)?(.)?(.)?', '(\1(\2[\3(\4', 2, 0, 'c')
,'(\([^ ])', 'X', 1, 0, 'c')
,'(\(|\[)', '', 1, 0, 'c')

Generate, then order random numbers in SQL

So I'm using SQL to build a database of math questions.
My typical query might look something like this:
SELECT @num1 := FLOOR(Rand()*8 + 2), @num2 := FLOOR(Rand()*8 + 2);
INSERT INTO `QuestionDB`(TopicID, TopicName, SubtopicID, SubtopicName, Question, Answer, Difficulty, Author, Projectable) VALUES (3,"Multiplying",1,"Multiplying single digit numbers",CONCAT(@num1, " × ", @num2 ,"= "),@num1*@num2,1,"Me","Yes");
I'm using PHPMyAdmin to do this and build up my database.
This works fine. However, what I would like to do is this, so that I can order numbers ie:
SELECT @num1 := FLOOR(Rand()), @num2 := FLOOR(Rand()), @num3 := FLOOR(Rand()), @num4 := FLOOR(Rand());
INSERT INTO `QuestionDB`(TopicID, TopicName, SubtopicID, SubtopicName, Question, Answer, Difficulty, Author, Projectable) VALUES (7,"Ordering",1,"Ordering whole numbers",CONCAT("<span class='smaller'>Order, from highest to lowest</span><br>", @num1, " , ", @num2, etc),ORDER_ASS(@num1,@num2,@num3,@num4),1,"Me","Yes");
However, obviously ORDER_ASS isn't a valid SQL function. Is there an SQL function that will do this?

Ordering an existing database by numbers in a string

This one is a bit of a nightmare. I'm working on a frontend for an existing database, and I'm having to jump through hoops to make sure that data is displayed in the correct order. It'd make my life a whole lot simpler if I could just order by Id, but the Ids have little or no correlation to the data.
Here's what I mean
ID DATA
357 "7-1-5: Sensitive Information I can't share"
2521 "30-2-8-17: Yet more sensitive Information"
6002 "9-30: There's a 10 behind the colon, because I hate you"
8999 "2-2-4: This was populated in no particular order"
9001 "30-3: More Info."
I'm trying to get it ordered like this
ID DATA
0001 "2-2-4: This was populated in no particular order"
0002 "7-1-5: Sensitive Information I can't share"
0003 "9-30: There's a 10 behind the colon, because I hate you"
0004 "30-2-8-17: Yet more sensitive Information"
0005 "30-3: More Info."
Basically, I need it to sort by each 1 to 2 digit number that's separated by dashes, again and again, so that 1-3 comes after 1-2-1, which comes after 1-1-50.
Like I said in the beginning, I'm a frontend guy, so executing stuff in MySql is more than I can do alone. Any help would be immensely appreciated.
Edit: I just realized there's foreign keys in a separate table pointing to this one, making things just that much worse.

Try this query:
SELECT col
FROM yourTable
ORDER BY SUBSTRING(col, INSTR(col, '"') + 1, INSTR(col, ':') - INSTR(col, '"') - 1)
The SUBSTRING(...) term in the ORDER BY clause extracts just the number prefix from the text. Note that the extracted prefix is a varchar, so the sort is a string comparison from left to right rather than a numeric one: in the output below, "30-2-8-17" sorts before "7-1-5".
For your sample data, this produced the following output:
ID 8999 DATA "2-2-4: This was populated in no particular order"
ID 2521 DATA "30-2-8-17: Yet more sensitive Information"
ID 357 DATA "7-1-5: Sensitive Information I can't share"
ID 6002 DATA "9-30: There's a 10 behind the colon, because I hate you"
Fiddle is down as of the writing of this answer, but I tested the query in MySQL Workbench and it seems to work well.
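To show the ordering the question asks for, here is a small Python sketch that compares the prefixes part by part as integers instead of as strings. The function name and the in-memory rows are assumptions for illustration; the rows are the sample data from the question.

```python
def sort_key(data):
    """Turn '30-2-8-17: text' into (30, 2, 8, 17) for numeric comparison."""
    prefix = data.lstrip('"').split(":", 1)[0]  # text before the first colon
    return tuple(int(part) for part in prefix.split("-"))


rows = [
    (357, "7-1-5: Sensitive Information I can't share"),
    (2521, "30-2-8-17: Yet more sensitive Information"),
    (6002, "9-30: There's a 10 behind the colon, because I hate you"),
    (8999, "2-2-4: This was populated in no particular order"),
    (9001, "30-3: More Info."),
]

ordered = sorted(rows, key=lambda row: sort_key(row[1]))
for row_id, data in ordered:
    print(row_id, data)
```

Tuples compare element by element, so 1-1-50 sorts before 1-2-1, which sorts before 1-3, as the question requires.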
Edit:
If you want to assign a new ID to each record, you create a new table (newTable) with an ID column which is auto increment. Then you can use INSERT INTO ... SELECT along with the above ORDER BY logic to populate the table. The ID field should be incremented automatically by MySQL.
INSERT INTO newTable (`id`, `col`)
SELECT NULL, col
FROM yourTable
ORDER BY SUBSTRING(col, INSTR(col, '"') + 1, INSTR(col, ':') - INSTR(col, '"') - 1)

Something like this should work, but it is very delicate; all the fields calculated in the outer SELECT (after the *) must be performed in that exact order. Note that the calculations aliased nl#, p#, and r# (except r0) repeat exactly... so the query is not as complicated as it initially appears.
SELECT *
    , @r := dataOrd AS r0 -- @r is "remaining string"
    , @nextSep := INSTR(@r, '-') AS nl1
    , CAST(CASE @nextSep WHEN 0 THEN @r ELSE SUBSTR(@r, 1, @nextSep-1) END AS UNSIGNED) AS p1
    , @r := CASE @nextSep WHEN 0 THEN '' ELSE SUBSTRING(@r, @nextSep+1) END AS r1
    , @nextSep := INSTR(@r, '-') AS nl2
    , CAST(CASE @nextSep WHEN 0 THEN @r ELSE SUBSTR(@r, 1, @nextSep-1) END AS UNSIGNED) AS p2
    , @r := CASE @nextSep WHEN 0 THEN '' ELSE SUBSTRING(@r, @nextSep+1) END AS r2
    , @nextSep := INSTR(@r, '-') AS nl3
    , CAST(CASE @nextSep WHEN 0 THEN @r ELSE SUBSTR(@r, 1, @nextSep-1) END AS UNSIGNED) AS p3
    , @r := CASE @nextSep WHEN 0 THEN '' ELSE SUBSTRING(@r, @nextSep+1) END AS r3
    , @nextSep := INSTR(@r, '-') AS nl4
    , CAST(CASE @nextSep WHEN 0 THEN @r ELSE SUBSTR(@r, 1, @nextSep-1) END AS UNSIGNED) AS p4
    , @r := CASE @nextSep WHEN 0 THEN '' ELSE SUBSTRING(@r, @nextSep+1) END AS r4
FROM
(
    SELECT *, SUBSTR(`DATA`, 1, INSTR(`DATA`, ':') - 1) AS dataOrd
    FROM yourTable
) AS sepSubQ
ORDER BY p1, p2, p3, p4
;
Technically, the last @r assignment (aliased r4) is unnecessary, but it completes the pattern that will need to be repeated if you have to handle more than 4 ordering "parts"; in that case you just repeat the last three field calculations (with incremented aliases).
If you want to be rid of the "working" fields, you can wrap this in another outer query that selects only the fields you wanted from the original table plus the pX fields from the above query; technically, you don't even need to select the pX fields, as the ordering will already be performed by this query, or can be done in the wrapper without selecting them.
SELECT `ID`, `DATA`
FROM ([the query above]) AS subQ
ORDER BY p1, p2, p3, p4
;

SSIS Substring Extract based on qualifier

I've looked through a few different posts trying to find a solution for this. I have a column that contains descriptions in the following format:
String<Numeric>
However, the column isn't limited to a single occurrence of that format; it could be something like
UNI<01> JPG<84>
JPG<84> UNI<01>
JPG<84>
UNI<01>
And other variations without any controlled pattern.
What I need to do is extract the number between <> into a separate column in another table, based on the string before the <>. So UNI would qualify the following number to go to a certain table.column, while JPG would qualify it for another table, etc. I have seen functions that extract the number, but none that check the qualifier and pull the number only if it is prefaced with a given qualifier string.

Based on the scope limitation mentioned in the question's comments that only one type of token (Foo, Bar, Blat, etc.) needs to be found at a time: you could use an expression in a Derived Column to find the token of interest and then extract the value between the arrows.
For example:
(FINDSTRING([InputColumn], @[User::SearchToken] + "<", 1) == 0) ?
    NULL(DT_WSTR, 1) :
    SUBSTRING([InputColumn],
        FINDSTRING([InputColumn], @[User::SearchToken] + "<", 1)
            + LEN(@[User::SearchToken]) + 1,
        FINDSTRING(
            SUBSTRING([InputColumn],
                FINDSTRING([InputColumn], @[User::SearchToken] + "<", 1)
                    + LEN(@[User::SearchToken]) + 1,
                LEN([InputColumn])
            ), ">", 1) - 1
    )
First, the expression checks whether the token specified in @[User::SearchToken] is used in the current row. If it is, SUBSTRING is used to output the value between the arrows. If not, NULL is returned.
The assumption is made that no token's name will end with text matching the name of another token. Searching for token Bar will match Bar<123> and FooBar<123>. Accommodating Bar and FooBar as distinct tokens is possible but the requisite expression will be much more complex.

You could use an asynchronous Script Component that outputs a row with type and value columns for each type<value> token contained in the input string. Pass the output of this component through a Conditional Split to direct each type to the correct destination (e.g. table).
Pro: This approach gives you the option of using one data flow to process all tag types simultaneously vs. requiring one data flow per tag type.
Con: A Script Component is involved, which it sounds like you'd prefer to avoid.
Sample Script Component Code
// Requires: using System.Text.RegularExpressions;
private readonly string pattern = @"(?<type>\w+)<(?<value>\d+)>";

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Emit one output row per type<value> token found in the input string.
    foreach (Match match in Regex.Matches(Row.Data, pattern, RegexOptions.ExplicitCapture))
    {
        Output0Buffer.AddRow();
        Output0Buffer.Type = match.Groups["type"].Value;
        Output0Buffer.Value = match.Groups["value"].Value;
    }
}
Note: The Script Component will need an output created with two columns (perhaps named Type and Value), and the output's SynchronousInputID property must be set to None.
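The same named-group pattern can be tried outside SSIS. The Python sketch below is only an illustration of what the regular expression extracts; the function name is hypothetical and the sample strings come from the question.

```python
import re

# Same idea as the C# pattern: a word-character type followed by <digits>.
TOKEN = re.compile(r"(?P<type>\w+)<(?P<value>\d+)>")


def extract_tokens(description):
    """Return one (type, value) pair per type<value> token in the string."""
    return [(m.group("type"), m.group("value")) for m in TOKEN.finditer(description)]


print(extract_tokens("UNI<01> JPG<84>"))
print(extract_tokens("JPG<84>"))
```

Each pair corresponds to one output row, which a Conditional Split could then route by type.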

I ended up writing a CTE for a view to handle the data manipulation and then handled the joins and other data pieces in the SSIS package.
;WITH RCTE (Status_Code, lft, rgt, idx)
AS ( SELECT a.Status_code
,LEFT(a.Description, CASE WHEN CHARINDEX(' ', a.Description)=0 THEN LEN(a.Description) ELSE CHARINDEX(' ', a.Description)-1 END)
,SUBSTRING(a.Description, CASE WHEN CHARINDEX(' ', a.Description)=0 THEN LEN(a.Description) ELSE CHARINDEX(' ', a.Description)-1 END + 1, DATALENGTH(a.Description))
,0
FROM [disp] a WHERE NOT( Description IS NULL OR Description ='')
UNION ALL
SELECT r.Status_Code
,CASE WHEN CHARINDEX(' ', r.rgt) = 0 THEN r.rgt ELSE LEFT(r.rgt, CHARINDEX(' ', r.rgt) - 1) END
,CASE WHEN CHARINDEX(' ', r.rgt) > 0 THEN SUBSTRING(r.rgt, CHARINDEX(' ', r.rgt) + 1, DATALENGTH(r.rgt)) ELSE '' END
,idx + 1
FROM RCTE r
WHERE DATALENGTH(r.rgt) > 0
)
SELECT Status_Code
-- ,lft,rgt -- Uncomment to see whats going on
,SUBSTRING(lft,0, CHARINDEX('<',lft)) AS [Description]
,CASE WHEN ISNUMERIC(SUBSTRING(lft, CHARINDEX('<',lft)+1, LEN(lft)-CHARINDEX('<',lft)-1)) >0
THEN CAST (SUBSTRING(lft, CHARINDEX('<',lft)+1, LEN(lft)-CHARINDEX('<',lft)-1) AS INT) ELSE NULL END as Value
FROM RCTE
where lft <> ''
