i10n json regex different cases - json

So my company needs to send our i10n JSON file to a translator who can translate it into other languages.
Our system also uses this file. Because of this we are able to make some "funky" statements that can be understood by our system but not by our translators when they extract the file.
For instance we have a case like this:
"CHOOSE": "{VALUE, select, 1{Vælg bruger} other{Fejl}}",
In the above example our system either takes Vælg bruger or Fejl
We also have something like this:
"HAS_MATERIAL": "Indeholder {{COUNT}} {{COUNT > 1 ? 'filer' : 'fil'}}",
Basically the result of this would be Indeholder, then the count, and then filer if the count is bigger than 1, else fil.
The last case we have is something like this:
"YOU_HAVE_NOTIFICATION": "You have { LENGTH } {LENGTH, select, 1{new notification} other{new notifications}}",
Again LENGTH is a temp value, and it decides which translation to take.
So now it's my job to make a regex for this file so we can get a list of all the words that need to be translated, and I am rather lost. The above 3 cases each have a different way of getting at the wanted value.
I attempted something like this:
{(.*?)}
with the global flag.
However, this doesn't work in all of the cases.

Since there is some kind of "command language" (or two) involved, this will probably fail at some point, but it handles your given examples:
{\w+,\s*select,\s*\w+\s*{([^}]*)}\s\w+\s*{([^}]*)}|{[^?{}]+\?\s*'([^']*)'\s*:\s*'([^']*)'\s*}]*}|^([^{}]+)|([^{}]+)$
It treats the individual cases one by one:
The SELECT statement
Inside braces, expect some expression followed by a comma, a command (in this case select), another comma and a case value, and then grab the text inside the braces. Then expect another case value and again grab the text inside the braces. I expect there will be messages with more than two cases, and then this fails (it can be expanded to handle more, though).
Then the ternary operator
Inside braces, expect some expression followed by a ?, then grab the text inside single quotes. Then expect a : and again grab the text inside single quotes.
At the start of a line
grab all text up to {.
And end of line
grab anything after last }.
I guess this is far from complete. E.g. it won't handle text between "selects", and feels very fragile, but it might help you get started.
Check it out here at regex101.
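If one big pattern gets too fragile, the same case-by-case idea can be split into a small tokenizer. The following is only a rough sketch in Python (the question doesn't name a language, so that is an assumption, and names like extract_phrases are made up for illustration). It assumes the messages use only the three constructs shown in the question, and unlike the pattern above it accepts any number of select cases.

```python
import re

# One alternative per construct, most specific first.
TOKEN = re.compile(
    r"""
    \{\s*\w+\s*,\s*select\s*,(?P<select>(?:\s*\w+\s*\{[^{}]*\})+)\s*\}  # select with cases
  | \{\{(?P<expr>[^{}]*)\}\}                                           # double-brace expression
  | \{\s*\w+\s*\}                                                      # plain placeholder
  | (?P<text>[^{}]+)                                                   # literal text
    """,
    re.VERBOSE,
)
CASE_TEXT = re.compile(r"\w+\s*\{([^{}]*)\}")  # text of one select case
QUOTED = re.compile(r"'([^']*)'")              # single-quoted literal in an expression


def extract_phrases(message):
    """Collect the translatable fragments of one message value."""
    phrases = []
    for m in TOKEN.finditer(message):
        if m.group("select") is not None:
            phrases.extend(CASE_TEXT.findall(m.group("select")))
        elif m.group("expr") is not None:
            phrases.extend(QUOTED.findall(m.group("expr")))
        elif m.group("text") is not None and m.group("text").strip():
            phrases.append(m.group("text").strip())
    return phrases


print(extract_phrases("{VALUE, select, 1{Vælg bruger} other{Fejl}}"))
print(extract_phrases("Indeholder {{COUNT}} {{COUNT > 1 ? 'filer' : 'fil'}}"))
```

Placeholders such as { LENGTH } and {{COUNT}} are recognised and skipped, so only text a translator would need ends up in the list. Nested braces inside a select case are still not handled.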

Related

NetSuite Saved Search: REGEXP_SUBSTR Pattern troubles

I am trying to break down a string that looks like this:
|5~13~3.750~159.75~66.563~P20~~~~Bundle A~~|
Here is a second example for reference:
|106~10~0~120~1060.000~~~~~~~|
Here is a third example of a static sized item:
|3~~~~~~~~~~~5:12|
Example 4:
|3~23~5~281~70.250~upper r~~~~~~|
|8~22~6~270~180.000~center~~~~~~|
|16~22~1~265~353.333~center~~~~~~|
Sometimes there are multiple lines in the same string.
I am not super familiar with setting up patterns for regexp_substr and would love some assistance with this!
The string will always have '|' at the beginning and end and 11 '~'s used to separate the numeric/text values which I am hoping to obtain. Also some of the numeric characters have decimals while others do not. If it helps the values are separated like so:
|Quantity~ Feet~ Inch~ Unit inches~ Total feet~ Piece mark~ Punch Pattern~ Notch~ Punch~ Bundling~ Radius~ Pitch|
As you can see, if there isn't something specified it shows as blank, but it may be present in another string; it's rare for all of the values to have data.
For this specific case I believe regexp_substr will be my best option but if someone has another suggestion I'd be happy to give it a shot!
This is the formula(Text) I was able to come up with so far:
REGEXP_SUBSTR({custbody_msm_cut_list},'[[:alnum:]. ]+|$',1,1)
This allows me to pull all the matches held in the strings, but if some fields are excluded it makes presenting the correct data difficult.
TRIM(REGEXP_SUBSTR({custbody_msm_cut_list}, '^\|(([^~]*)~){1}',1,1,'i',2))
From the start of the string, match the pipe character |, then match any run of characters other than a tilde ~, then match the tilde. Repeat this N times (the {1}). Asking for subexpression 2 returns what the inner group captured on the last of these repeats.
You can control how many tildes are processed by changing the integer in the braces {1}.
E.g.:
TRIM(REGEXP_SUBSTR('|Quantity~ Feet~ Inch~ Unit inches~ Total feet~ Piece mark~ Punch Pattern~ Notch~ Punch~ Bundling~ Radius~ Pitch|', '^\|(([^~]*)~){1}',1,1,'i',2))
returns "Quantity"
TRIM(REGEXP_SUBSTR('|Quantity~ Feet~ Inch~~~ Piece mark~ Punch Pattern~ Notch~ Punch~ Bundling~ Radius~ Pitch|', '^\|(([^~]*)~){7}',1,1,'i',2))
returns "Punch Pattern"
The final value Pitch is a slightly special case as it is not followed by a tilde:
TRIM(REGEXP_SUBSTR('|~~~~~~~~~~ Radius~ Pitch|', '^\|(([^~]*)~){11}([^\|]*)',1,1,'i',3))
Adapted and improved from https://stackoverflow.com/a/70264782/7885772
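To see why the repeated group gives you the N-th field, here is a small sketch of the same two patterns in Python rather than NetSuite (an assumption made purely for illustration; cut_list_field is a hypothetical helper). Python's re module, like REGEXP_SUBSTR here, keeps only the last iteration of a repeated capturing group.

```python
import re


def cut_list_field(record, n, total=12):
    """Return the n-th (1-based) '~'-separated field of one |...| record."""
    if n < total:
        # Same idea as '^\|(([^~]*)~){N}' with subexpression 2.
        m = re.match(r"^\|(([^~]*)~){%d}" % n, record)
        return m.group(2).strip() if m else None
    # Last field: not followed by a tilde, so capture up to the closing pipe.
    m = re.match(r"^\|(([^~]*)~){%d}([^|]*)" % (total - 1), record)
    return m.group(3).strip() if m else None


header = ("|Quantity~ Feet~ Inch~ Unit inches~ Total feet~ Piece mark~ "
          "Punch Pattern~ Notch~ Punch~ Bundling~ Radius~ Pitch|")
print(cut_list_field(header, 1))
print(cut_list_field(header, 7))
print(cut_list_field(header, 12))
```

Because of the ^ anchor, both patterns only look at the first record when several records are stored in the same string.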

How to properly set a conditional statement in sequence?

I'm trying to set a conditional statement in webMethods. I think I'm on the right path and just missing a couple of small steps to get it done.
So my branch is going through a path, /barcode.
Then the sequence's job is to keep the length at 18 characters or fewer. My label in the sequence looks like this: (%barcode%)<=18.
This flow is getting skipped over.
If you are trying to find the length of the value in barcode, use the service pub.string:length in a map step to find the length, and then use that length for the comparison.

Capture a value from a repeating group on every iteration (as opposed to just last occurrence)

How does one capture a value recursively with regex, where value is a part of a group that repeats?
I have a serialized array in mysql database
These are 3 examples of a serialized array
a:2:{i:0;s:2:"OR";i:1;s:2:"WA";}
a:1:{i:0;s:2:"CA";}
a:4:{i:0;s:2:"CA";i:1;s:2:"ID";i:2;s:2:"OR";i:3;s:2:"WA";}
a:1 stands for array:{number of elements}
then in between {} i:0 means element 0, i:1 means element 1 etc.
then the actual value s:2:"CA" means string with length of 2
so I have 2 elements in first array, 1 element in the second and 4 elements in the last
I have this data in mysql database and I DO NOT HAVE an option to parse this with back-end code - this has to be done in mysql (10.0.23-MariaDB-log)
the repeating pattern is inside of the curly braces
the number of repeats is variable (as in 3 examples each has a different number of repeating patterns),
the number of repeating patterns is defined by the number at 3rd position (if that helps)
for the first example it's a:2:
and so there are 2 repeating blocks:
i:0;s:2:"OR";
i:1;s:2:"WA";
I only care to extract the two-letter values, i.e. OR and WA here
So I came up with this regex
^a:(?:\d+):\{(?:i:(?:\d+);s:(?:\d+):\"(\w\w)\";)+}$
it captures the values I want all right but problem is it only captures the last one in each repeating group
so going back to the example what would be captured is
WA
CA
WA
What I would want is
OR|WA
CA
CA|ID|OR|WA
these are the language specific regex functions available to me:
https://mariadb.com/kb/en/library/regular-expressions-functions/
I don't care which one is used to solve the problem
Ultimately I need this in a sensible form that can be presented to the client, e.g. CA,ID,OR or CA|ID|OR
Current thoughts are perhaps this isn't possible in a one liner, and I have to write a multi-step function where
extract the repeating portion between the curly braces
then somehow iterate over each repeating portion
then use the regex on each
then return the results as one string with separated elements
I doubt if such a capture is possible. However, this would probably do the job for your specific purpose.
REGEXP_REPLACE(
  REGEXP_REPLACE(
    REGEXP_REPLACE(str1, '^a:\\d+:\{', ''),
    'i:\\d+;s:\\d+:\"(\\w\\w)\";',
    '\\1,'
  ),
  '\,?\}$',
  ''
)
Basically, this works on the input string (or column) str1 as follows:
remove the first part
replace every element with the string you want
remove the last 2 characters, ,}
and voila! You get a string like CA,ID,OR.
Afternote
It may or may not work well when the original array is empty before being serialised (it depends on how it is serialised).
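For checking the three steps outside the database, here is the same chain of replacements as a rough Python sketch (Python is only used for illustration, since the real thing has to run in MariaDB; serialized_values is a made-up name).

```python
import re


def serialized_values(serialized):
    """Mirror the three nested REGEXP_REPLACE calls, innermost first."""
    # 1. remove the leading 'a:<count>:{'
    body = re.sub(r"^a:\d+:\{", "", serialized)
    # 2. turn every 'i:<n>;s:<len>:"XX";' element into 'XX,'
    body = re.sub(r'i:\d+;s:\d+:"(\w\w)";', r"\1,", body)
    # 3. drop the trailing ',}' (or just '}' when there were no elements)
    return re.sub(r",?\}$", "", body)


for row in ('a:2:{i:0;s:2:"OR";i:1;s:2:"WA";}',
            'a:1:{i:0;s:2:"CA";}',
            'a:4:{i:0;s:2:"CA";i:1;s:2:"ID";i:2;s:2:"OR";i:3;s:2:"WA";}'):
    print(serialized_values(row))
```

In this sketch an empty array written as a:0:{} comes out as an empty string, but that says nothing certain about how your data serialises the empty case.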

How to compile a complete list of MySQL "Words"

Really getting into MySQL and one thought I've had on mastering one aspect of it is to gather a complete listing of MySQL words. One example of this might be the Reserved Words list, though it appears that's not a complete list; example: CONCAT, CRC32, etc.
Bizarre as it may seem, I was thinking that such a list might exist, or that there might even be a query that would yield it, and/or a way to extract it from the source code of MySQL.
It is a non-scientific method, but what I would do is:
extract all strings from Native_func_registry func_array. Look for it in sql/item_create.cc, e.g. in
http://bazaar.launchpad.net/~mysql/mysql-server/mysql-trunk/view/head:/sql/item_create.cc
Those should cover builtin functions.
extract strings from 'symbols' and 'functions' in the lexer:
http://bazaar.launchpad.net/~mysql/mysql-server/mysql-trunk/view/head:/sql/lex.h
extract symbols from bison input http://bazaar.launchpad.net/~mysql/mysql-server/mysql-trunk/view/head:/sql/sql_yacc.yy from lines
%token SOMETOKEN
except when tokens have _SYM suffix (they are covered by sql/lex.h)
Combine all of those, and the resulting set might come near :)
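The bison step could be scripted along these lines. This is only a sketch in Python under a few assumptions: you have the text of sql_yacc.yy available locally, the declarations look like the simple "%token NAME" form mentioned above (typed or multi-name declarations are not handled), and grammar_tokens is a hypothetical helper name.

```python
import re

# A '%token' declaration at the start of a line, followed by one identifier.
TOKEN_LINE = re.compile(r"^%token\s+([A-Za-z_][A-Za-z0-9_]*)", re.MULTILINE)


def grammar_tokens(yacc_source):
    """Return token names from '%token NAME' lines, skipping *_SYM tokens."""
    names = TOKEN_LINE.findall(yacc_source)
    # _SYM tokens are already covered by the lexer tables in sql/lex.h
    return sorted({name for name in names if not name.endswith("_SYM")})


sample = """\
%token  ABORT_SYM
%token  AND_AND_SYM
%token  BIN_NUM
%token  ULONGLONG_NUM /* some comment */
"""
print(grammar_tokens(sample))
```

The sample text is made up to show the shape of the input; run it against the real file to get the actual list.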

Query in MUMPS statement

I $P(GIH,,24)= S $P(GIH,,24)="C" S
What is the meaning of two S's in above MUMPS statement?
Let me start out by saying the original statement is NOT either Standard MUMPS or InterSystems Cache, or GT.M code. Even broadly guessing what was originally meant, the final S on the line isn't something you would do in MUMPS. A single S could be a SET command, but you still don't have any arguments telling what variable could be assigned, or what value should be assigned to it.
The rest of my reply is trying to figure out what it could have meant.
Your question seems to have been broken by some software, either on Stack Overflow itself or in the cut-and-paste process used to put it here.
I saw:
I $P(GIH,,24)= S $P(GIH,,24)="C" S
What is the meaning of two S's in above MUMPS statement?
It is hard to figure out what you meant, since it would require hypothesizing where quotes might be and which ones could have been deleted by the transmission of the question.
First of all, let's do something we can guess is reasonable. $P is usually an abbreviation for the built-in (intrinsic) function $PIECE. An I standing alone is probably an IF command,
and an S standing alone is probably a SET command. This runs into a problem with your example, because the format of a line of MUMPS code is COMMAND COMMAND-ARGUMENT.
Aside Note: I also just tried to put the text COMMAND-ARGUMENT in "angle brackets" ie: with a less-than character at the beginning of the word and a greater-than character at the end. The text COMMAND-ARGUMENT just disappeared. Which means that stackoverflow sees it as HTML markup. I notice there is a Code marker on the top of this edit window which may or may not help.
If we do the expansions to the code above, we get:
IF $PIECE(GIH,,24)= SET $PIECE(GIH,,24)="C" SET
When we expand the final S, it looks like a SET command, but without any set-argument.
Note, if this was in a Cache system, we might have an example of extra spaces allowed by Cache, which are not allowed in Standard MUMPS, ie the S may have been the right hand side of an equality operator in the IF command. This would only make sense if Cache also allowed the argument of the SET command to be in code without an actual SET command.
i.e.:
IF $PIECE(GIH,,24)=S $PIECE(GIH,,24)="C" SET
We still would have to deal with the two commas in a row for the $PIECE intrinsic function. Currently using two commas in a row to indicate a missing argument is only allowed in Programmer-written code, not when using built-in functions. So this might be a place where we can guess what you meant, or originally pasted in.
If we put in double-quotes we run into the problem that the $PIECE function (which separates a string based on a delimiter) would have a quoted string of zero length given as its second argument, which is just as erroneous as having an empty argument.
So if we hypothesize a quoted string that has angle brackets, we would get something like this for your original line:
IF $PIECE(GIH,"<something>",24)="<something>" SET $PIECE(GIH,"<something>",24)="C" SET
Note: I just saw the Code marker allows use of grave accents to keep from assuming a line is HTML - which is good since grave accent is not a character used in MUMPS coding.
As has been mentioned on another reply, the SET-$PIECE-ARGUMENT form is used to change the data stored in a database at a particular delimited substring location.
So this code might be fine for guessing, but it has gone far afield of what you may or may not have done. So I'm stopping now until we get feedback that this is even close to what you wanted. As I said at the first, this is still not quite valid code.
This is pretty bizarre, but what I think is going on is:
I $P(GIH,<null>,24)=<null>
Calling $PIECE with the second argument null will replace the entire string with the value you're assigning, which, in this case, is also null. It looks like a convoluted way of clearing the value of GIH and permitting control to flow into the following SET statement. I seriously doubt that $PIECE sets the $T flag, though, which means that calling this as the condition for the IF operator probably isn't working the way you want it to.
S $P(GIH,,24)="C"
The next statement looks a lot like the first -- replace the entirety of GIH with "C".
S
I don't think the last SET is valid MUMPS.
Why this isn't written as follows is beyond me:
s GIH="C"
Hope that helps!
Maybe Intersystems Caché handles this syntax differently, but that code results in a syntax error when I try it in Caché. There may be other versions of MUMPS for which that is valid, but I don't think it is.
As others have pointed out, this statement is not valid; it appears pieces are missing.
But S is the SET command in MUMPS.
Here is what a statement like this might look like:
I $P(GIH,"^",24)="P" S $P(GIH,"^",24)="C" S UPDATEFLG=1
in this case GIH might look something like:
GIH=256^^^42^^^^Mike^^^^^^^^^^^^^^^^P^^^
which would make this evaluate to TRUE:
I $P(GIH,"^",24)="P"
so after:
S $P(GIH,"^",24)="C"
GIH will be:
GIH=256^^^42^^^^Mike^^^^^^^^^^^^^^^^C^^^
then it would set the variable UPDATEFLG=1
Hope this helps :-)
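For readers who don't know MUMPS, the read and write forms of $PIECE used in that example behave roughly like the following Python sketch. This is an analogy only, not MUMPS code: piece and set_piece are made-up helper names, a non-empty delimiter is assumed, and the record is built from a list so that "P" is guaranteed to sit in the 24th piece.

```python
def piece(value, delim, n):
    """Rough analogue of $PIECE(value, delim, n): the n-th piece, 1-based."""
    parts = value.split(delim)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


def set_piece(value, delim, n, new):
    """Rough analogue of SET $PIECE(value, delim, n)=new, returning the result."""
    parts = value.split(delim)
    parts.extend([""] * (n - len(parts)))  # pad with empty pieces if too short
    parts[n - 1] = new
    return delim.join(parts)


# Build a record like the GIH example: 27 pieces with "P" in piece 24.
fields = [""] * 27
fields[0], fields[3], fields[7], fields[23] = "256", "42", "Mike", "P"
gih = "^".join(fields)

update_flag = 0
if piece(gih, "^", 24) == "P":         # I $P(GIH,"^",24)="P"
    gih = set_piece(gih, "^", 24, "C")  # S $P(GIH,"^",24)="C"
    update_flag = 1                     # S UPDATEFLG=1
print(piece(gih, "^", 24), update_flag)
```

The three statements after the IF only run when the condition holds, which matches how the rest of a MUMPS line is skipped when an IF fails.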