NetSuite Saved Search: REGEXP_SUBSTR Pattern troubles - mysql

I am trying to break down a string that looks like this:
|5~13~3.750~159.75~66.563~P20~~~~Bundle A~~|
Here is a second example for reference:
|106~10~0~120~1060.000~~~~~~~|
Here is a third example of a static sized item:
|3~~~~~~~~~~~5:12|
Example 4:
|3~23~5~281~70.250~upper r~~~~~~|
|8~22~6~270~180.000~center~~~~~~|
|16~22~1~265~353.333~center~~~~~~|
Sometimes there are multiple lines in the same string.
I am not super familiar with setting up patterns for regexp_substr and would love some assistance with this!
The string will always have '|' at the beginning and end, and 11 '~'s separating the numeric/text values I am hoping to obtain. Some of the numeric values have decimals while others do not. If it helps, the values are laid out like so:
|Quantity~ Feet~ Inch~ Unit inches~ Total feet~ Piece mark~ Punch Pattern~ Notch~ Punch~ Bundling~ Radius~ Pitch|
As you can see, if a value isn't specified it shows as blank, though another string may have it; it's rare for all of the values to have data.
For this specific case I believe regexp_substr will be my best option but if someone has another suggestion I'd be happy to give it a shot!
This is the formula(Text) I was able to come up with so far:
REGEXP_SUBSTR({custbody_msm_cut_list},'[[:alnum:]. ]+|$',1,1)
This allows me to pull all the matches held in the strings, but if some fields are excluded it makes presenting the correct data difficult.

TRIM(REGEXP_SUBSTR({custbody_msm_cut_list}, '^\|(([^~]*)~){1}',1,1,'i',2))
From the start of the string, match the pipe character |, then match anything except a tilde ~ (this is capture group 2), then match the tilde. Repeat this N times, here {1}. Only the last of these repeats is kept in group 2, so asking for subexpression 2 returns the Nth value.
You can control which value is returned by changing the integer in the braces {1}.
E.g.:
TRIM(REGEXP_SUBSTR('|Quantity~ Feet~ Inch~ Unit inches~ Total feet~ Piece mark~ Punch Pattern~ Notch~ Punch~ Bundling~ Radius~ Pitch|', '^\|(([^~]*)~){1}',1,1,'i',2))
returns "Quantity"
TRIM(REGEXP_SUBSTR('|Quantity~ Feet~ Inch~~~ Piece mark~ Punch Pattern~ Notch~ Punch~ Bundling~ Radius~ Pitch|', '^\|(([^~]*)~){7}',1,1,'i',2))
returns "Punch Pattern"
The final value Pitch is a slightly special case as it is not followed by a tilde:
TRIM(REGEXP_SUBSTR('|~~~~~~~~~~ Radius~ Pitch|', '^\|(([^~]*)~){11}([^\|]*)',1,1,'i',3))
Adapted and improved from https://stackoverflow.com/a/70264782/7885772
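If you want to try the pattern outside NetSuite, here is a minimal sketch of the same idea in Python. It relies on Python's re module also keeping only the last iteration of a repeated capture group. The field() helper is a made-up name for illustration, not something NetSuite or Oracle provides.

```python
import re

def field(cut_list: str, n: int) -> str:
    """Return the n-th (1-based) of the 12 tilde-separated values in one |...| record."""
    if n <= 11:
        # Same pattern as the formula: group 2 keeps only the n-th repetition.
        m = re.match(r'^\|(([^~]*)~){%d}' % n, cut_list)
        return m.group(2).strip() if m else ''
    # The 12th value ends at the closing pipe instead of a tilde.
    m = re.match(r'^\|(([^~]*)~){11}([^|]*)', cut_list)
    return m.group(3).strip() if m else ''

record = '|5~13~3.750~159.75~66.563~P20~~~~Bundle A~~|'
quantity = field(record, 1)        # Quantity column
bundling = field(record, 10)       # Bundling column
punch_pattern = field(record, 7)   # Punch Pattern column, blank in this record
```

When the field holds several records (as in Example 4), split it on line breaks first and call the helper once per record.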

Related

Capture a value from a repeating group on every iteration (as opposed to just last occurrence)

How does one capture a value recursively with regex, where value is a part of a group that repeats?
I have a serialized array in mysql database
These are 3 examples of a serialized array
a:2:{i:0;s:2:"OR";i:1;s:2:"WA";}
a:1:{i:0;s:2:"CA";}
a:4:{i:0;s:2:"CA";i:1;s:2:"ID";i:2;s:2:"OR";i:3;s:2:"WA";}
a:1 stands for array:{number of elements}
then in between {} i:0 means element 0, i:1 means element 1 etc.
then the actual value s:2:"CA" means string with length of 2
so I have 2 elements in first array, 1 element in the second and 4 elements in the last
I have this data in mysql database and I DO NOT HAVE an option to parse this with back-end code - this has to be done in mysql (10.0.23-MariaDB-log)
the repeating pattern is inside of the curly braces
the number of repeats is variable (as in 3 examples each has a different number of repeating patterns),
the number of repeating patterns is defined by the number at 3rd position (if that helps)
for the first example it's a:2:
and so there are 2 repeating blocks:
i:0;s:2:"OR";
i:1;s:2:"WA";
I only care to extract the string values (here OR and WA)
So I came up with this regex
^a:(?:\d+):\{(?:i:(?:\d+);s:(?:\d+):\"(\w\w)\";)+}$
it captures the values I want all right but problem is it only captures the last one in each repeating group
so going back to the example what would be captured is
WA
CA
WA
What I would want is
OR|WA
CA
CA|ID|OR|WA
these are the language specific regex functions available to me:
https://mariadb.com/kb/en/library/regular-expressions-functions/
I don't care which one is used to solve the problem
Ultimately I need this in a sensible form that can be presented to the client, e.g. CA,ID,OR or CA|ID|OR
Current thoughts are perhaps this isn't possible in a one liner, and I have to write a multi-step function where
extract the repeating portion between the curly braces
then somehow iterate over each repeating portion
then use the regex on each
then return the results as one string with separated elements
I doubt if such a capture is possible. However, this would probably do the job for your specific purpose.
REGEXP_REPLACE(
  REGEXP_REPLACE(
    REGEXP_REPLACE(str1, '^a:\\d+:\{', ''),
    'i:\\d+;s:\\d+:\"(\\w\\w)\";',
    '\\1,'
  ),
  '\,?\}$',
  ''
)
Basically, this processes the input string (or column) str1 as follows:
remove the first part (a:N:{)
replace every element with the value you want, followed by a comma
remove the last 2 characters, ,}
and voila! You get a string CA,ID,OR.
Afternote
It may or may not work well when the original array, before being serialised, is empty (it depends on how it is serialised).
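To check the three steps outside the database, here is a rough Python equivalent using re.sub in place of REGEXP_REPLACE. The function name is made up for illustration. It assumes two-character values, as in the question.

```python
import re

def serialized_to_csv(str1: str) -> str:
    """Mirror the three nested REGEXP_REPLACE calls, innermost first."""
    # Step 1: remove the leading a:N:{ part.
    s = re.sub(r'^a:\d+:\{', '', str1)
    # Step 2: replace every i:N;s:N:"XX"; element with XX followed by a comma.
    s = re.sub(r'i:\d+;s:\d+:"(\w\w)";', r'\1,', s)
    # Step 3: remove the trailing comma (if any) and the closing brace.
    return re.sub(r',?\}$', '', s)

for row in ('a:2:{i:0;s:2:"OR";i:1;s:2:"WA";}',
            'a:1:{i:0;s:2:"CA";}',
            'a:4:{i:0;s:2:"CA";i:1;s:2:"ID";i:2;s:2:"OR";i:3;s:2:"WA";}'):
    print(serialized_to_csv(row))
```

An empty array serialised as a:0:{} comes out as an empty string with this sketch; other serialisations of an empty array are not covered.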

Get Unique String from a longer string when the unique string is at 2 different locations

In a web application that I am creating tests for, there are 2 sets of strings from which I wish to get a substring (which is unique) to use for identifying that element on the Web Page:
Parent Form:
InputText-eLeType-AQAAAAAAAAAAAAAAAAAAAVWZ-bMs-bms_9999999_3512-bMs-obj-bMsDot-com-bMsDot-bmssolutions-bMsDot-COMPONENT-bMsDot-bms_9999999_109-bMs-textField-bMs-ABNylGGXXu8IPwjI4jMM5y1K
SubForm:
InputText-eLeType-AQAAAAAAAAAAAAAAAAAAAVXJ-bMs-bms_FK_9999999_406_ID-bMs-obj-bMsDot-com-bMsDot-bmssolutions-bMsDot-COMPONENT-bMsDot-bms_9999999_177-bMs-searchLookupField-bMs-ABNylGGXXu8IPwjI4jMM5y1K-bMs-AQAAAAAAAAAAAAAAAAAAAVWZ-bMs-PRIMARY9999999_480-bMs-obj-bMsDot-com-bMsDot-bmssolutions-bMsDot-COMPONENT-bMsDot-bms_9999999_109
I wish to get the substring from both of these using a single function, so that I don't have to create a different function for each type I encounter.
Substring in the above 2 provided strings is:
ABNylGGXXu8IPwjI4jMM5y1K
This substring can change for each element on the web page, but is unique for each element of the page and so useful to identify.
I cannot use the full string, as it changes for each environment or if I generate a new environment to host the web pages (the complete string depends on the Meta Data).
We tried doing it for the Parent Form, by using the "-" as the delimiter and identifying the last -bMs- and then taking the string, but that does not work for the SubForm.
So, my main question is: is there some RegEx that can extract only that string (composed of upper and lower case letters and numbers) from the full string? Or is there some other, simpler way to identify that string?
You could try a combination of positive Lookbehind, [A-Z] and [a-z]. Try this code:
(?<=-bMs-)[A-Z]{3}[a-z]\w+
Demo: https://regex101.com/r/YUZiFa/1
It seems to work even without the positive lookbehind:
[A-Z]{3}[a-z]\w+
Demo: https://regex101.com/r/YUZiFa/2
If you're happy to base the selection of the elements on the previous one, then this might work for you:
(?<=searchLookupField-bMs-|textField-bMs-)\w+
And if you wanted to be extra certain, you could append a lookahead to the end:
(?<=searchLookupField-bMs-|textField-bMs-)\w+(?=-bMs-|$)
If these don't work, or if the whole string varies greatly, then some more examples would help us narrow it down and come up with a great answer!
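If the tests happen to be written in Python, note that its re module only accepts fixed-width lookbehinds, so the two-alternative lookbehind above will not compile there. A sketch that keeps the same idea but uses a capture group instead (the function name is hypothetical, and it assumes the ID always follows textField-bMs- or searchLookupField-bMs-):

```python
import re
from typing import Optional

# Assumption: the wanted ID always directly follows one of these field-type markers.
ID_PATTERN = re.compile(r'(?:searchLookupField|textField)-bMs-(\w+)(?=-bMs-|$)')

def element_id(full_id: str) -> Optional[str]:
    """Return the unique part of a generated element ID, or None if not found."""
    m = ID_PATTERN.search(full_id)
    return m.group(1) if m else None

parent = ('InputText-eLeType-AQAAAAAAAAAAAAAAAAAAAVWZ-bMs-bms_9999999_3512-bMs-obj'
          '-bMsDot-com-bMsDot-bmssolutions-bMsDot-COMPONENT-bMsDot-bms_9999999_109'
          '-bMs-textField-bMs-ABNylGGXXu8IPwjI4jMM5y1K')
sub_form = ('InputText-eLeType-AQAAAAAAAAAAAAAAAAAAAVXJ-bMs-bms_FK_9999999_406_ID-bMs-obj'
            '-bMsDot-com-bMsDot-bmssolutions-bMsDot-COMPONENT-bMsDot-bms_9999999_177'
            '-bMs-searchLookupField-bMs-ABNylGGXXu8IPwjI4jMM5y1K-bMs-AQAAAAAAAAAAAAAAAAAAAVWZ'
            '-bMs-PRIMARY9999999_480-bMs-obj-bMsDot-com-bMsDot-bmssolutions-bMsDot-COMPONENT'
            '-bMsDot-bms_9999999_109')

print(element_id(parent))
print(element_id(sub_form))
```

Other field types would need to be added to the alternation.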

i10n json regex different cases

So my company needs to send our i10n json file to a translator who can translate it into other languages.
Our system also uses this file, which means we can write some "funky" statements that our system understands but our translators do not when they extract the file.
For instance we have a case like this:
"CHOOSE": "{VALUE, select, 1{Vælg bruger} other{Fejl}}",
In the above example our system either takes Vælg bruger or Fejl
We also have something like this:
"HAS_MATERIAL": "Indeholder {{COUNT}} {{COUNT > 1 ? 'filer' : 'fil'}}",
Basically, the result is Indeholder followed by the count, and then filer if the count is greater than 1, otherwise fil.
The last case we have is something like this:
"YOU_HAVE_NOTIFICATION": "You have { LENGTH } {LENGTH, select, 1{new notification} other{new notifications}}",
Again Length is a temp value and that then decides which translation to take.
So now it's my job to write a regex for this file so we can get a list of all the text that needs to be translated, and I am rather lost. The three cases above each need a different approach to get at the wanted value.
I attempted something like this:
{(.*?)}
with the global flag.
However, this doesn't work for all the cases.
Since there is some kind of "command language" (or two) involved, this will probably fail at some point, but it handles your given examples:
{\w+,\s*select,\s*\w+\s*{([^}]*)}\s\w+\s*{([^}]*)}|{[^?{}]+\?\s*'([^']*)'\s*:\s*'([^']*)'\s*}]*}|^([^{}]+)|([^{}]+)$
It treats individual case one by one:
The SELECT statement
Inside braces, expect some expression followed by a ,, a command (in this case select), a ,, a case value and here we grab the text inside braces. Then expect some other case value and again grab the text inside braces. I expect there will be cases with more than two case values, where this fails (it can be expanded to handle more, though).
Then the ternary operator
Inside braces, expect some expression followed by a ?, then grab the text inside single quotes. Then expect a : and again grab the text inside single quotes.
At the start of a line
grab all text up to {.
And end of line
grab anything after last }.
I guess this is far from complete. E.g. it won't handle text between "selects", and feels very fragile, but it might help you get started.
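If a script is an option instead of one big regex, the same cases can be handled in separate passes. This is a different technique from the single pattern above: a rough Python sketch with a made-up function name, tested only against the three example strings from the question.

```python
import re

MARK = '\x00'  # placeholder marker; assumed never to occur in the messages

def translatable_texts(message: str) -> list:
    """Collect the human-readable pieces of one message string."""
    texts = []
    # Pass 1: plain text outside any {...}. Peel braces from the inside out,
    # leaving a marker so text on either side of a placeholder stays separate.
    plain = message
    while True:
        peeled = re.sub(r'\{[^{}]*\}', MARK, plain)
        if peeled == plain:
            break
        plain = peeled
    texts += [part.strip() for part in plain.split(MARK) if part.strip()]
    # Pass 2: select cases such as 1{...} or other{...} (label directly before the brace).
    texts += re.findall(r'\w\{([^{}]*)\}', message)
    # Pass 3: quoted branches of a ternary inside {{ ... }}.
    for expr in re.findall(r'\{\{([^{}]*)\}\}', message):
        if '?' in expr:
            texts += re.findall(r"'([^']*)'", expr)
    return texts

print(translatable_texts("{VALUE, select, 1{Vælg bruger} other{Fejl}}"))
print(translatable_texts("Indeholder {{COUNT}} {{COUNT > 1 ? 'filer' : 'fil'}}"))
```

Unlike the single regex, the select pass picks up any number of cases, but it shares the same basic fragility: any syntax not shown in the question is not covered.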

How can I go about querying a database for non-similar, but almost matching items

How can I go about querying a database for items that are not only exactly similar to a sample, but also those that are almost similar? Almost as search engines work, but only for a small project, preferably in Java. For example:
String sample = "Sample";
I would like to retrieve all the following whenever I query sample:
String exactMatch = "Sample";
String nonExactMatch = "S amp le";
String nonExactMatch_2 = "ampls";
You need to define what similar means in terms that your database can understand.
Some possibilities include Levenshtein distance, for example.
In your example, sample matches...
..."Sample", if you search without case sensitivity.
..."S amp le", if you remove a set of ignored characters (here space only) from both the query string and the target string. You can store the new value in the database:
ActualValue      SearchFor
John Q. Smith    johnqsmith%
When someone searches for "John Q. Smith, Esq." you can boil it down to johnqsmithesq and run
WHERE 'johnqsmithesq' LIKE SearchFor
"ampls" is more tricky. Why is it that 'ampls' is matched by 'sample'? A common substring? A number of shared letters? Does their order count (i.e. are anagrams valid)? Many approaches are possible, but it is you who must decide. You might use Levenshtein distance, or maybe store a string such as "100020010003..." where every digit encodes the number of letters you have, up to 9 (so 3 C's and 2 B's but no A's would give "023...") and then run the Levenshtein distance between this syndrome and the one from each term in the DB:
ActualValue      Search1        Rhymes    abcdefghij_Contains    anagramOf
John Q. Smith    johnqsmith%    ith       0000000211001...       hhijmnoqst
...and so on.
One approach is to ask oneself, how must I transform both searched value and value searched for, so that they match?, and then proceed and implement that in code.
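To make the transformations concrete, here is a small Python sketch of three of them: the normalised search key, the letter-count string, and a plain Levenshtein distance. All function names are made up for illustration, and the set of ignored characters (anything that is not a letter or digit) is an assumption.

```python
import re
from collections import Counter

def normalize(text: str) -> str:
    """Lower-case and drop everything except letters and digits."""
    return re.sub(r'[^a-z0-9]', '', text.lower())

def letter_signature(text: str) -> str:
    """One digit per letter a..z: how often it occurs, capped at 9."""
    counts = Counter(normalize(text))
    return ''.join(str(min(counts[c], 9)) for c in 'abcdefghijklmnopqrstuvwxyz')

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

print(normalize("S amp le"))                     # same key as "Sample"
print(levenshtein(normalize("Sample"), "ampls")) # small distance: near match
print(letter_signature("John Q. Smith")[:13])    # digits for letters a..m
```

Which distance counts as "close enough" is still your decision. The sketch only shows how both sides can be reduced to something comparable.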
You can use MATCH ... AGAINST on columns that have a MyISAM full-text index.

How to compile a complete list of MySQL "Words"

Really getting into MySQL and one thought I've had on mastering one aspect of it is to gather a complete listing of MySQL words. One example of this might be the Reserved Words list, though it appears that's not a complete list; example: CONCAT, CRC32, etc.
Bizarre as it may seem, I was thinking that such a list might exist, or that there might even be a query that would yield it, and/or a way to extract it from the source code of MySQL.
It is a non-scientific method, but what I would do is:
extract all strings from Native_func_registry func_array. Look for it in sql/item_create.cc, e.g. in
http://bazaar.launchpad.net/~mysql/mysql-server/mysql-trunk/view/head:/sql/item_create.cc
Those should cover builtin functions.
extract strings from 'symbols' and 'functions' in the lexer:
http://bazaar.launchpad.net/~mysql/mysql-server/mysql-trunk/view/head:/sql/lex.h
extract symbols from bison input http://bazaar.launchpad.net/~mysql/mysql-server/mysql-trunk/view/head:/sql/sql_yacc.yy from lines
%token SOMETOKEN
except when tokens have _SYM suffix (they are covered by sql/lex.h)
Combine all of those, and the resulting set might come near :)
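The bison step could be scripted roughly like this. It is a sketch in Python with a made-up function name, and the sample input is invented to show the shape of the %token lines, not copied from sql_yacc.yy. It also assumes one token per %token line, optionally preceded by a <type> tag.

```python
import re

TOKEN_LINE = re.compile(r'^\s*%token\s*(?:<[^>]*>\s*)?(\w+)')

def bison_tokens(yacc_source: str) -> set:
    """Collect %token names, skipping those with a _SYM suffix."""
    tokens = set()
    for line in yacc_source.splitlines():
        m = TOKEN_LINE.match(line)
        if m and not m.group(1).endswith('_SYM'):
            tokens.add(m.group(1))
    return tokens

# Invented sample input for illustration only.
sample = """\
%token  SOMETOKEN
%token  OTHER_SYM
%token  <num> TYPEDTOKEN
%left   NOT_A_TOKEN_LINE
"""
print(sorted(bison_tokens(sample)))
```

The resulting set can then be unioned with the names pulled from item_create.cc and lex.h.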