problems using replaceText for special characters: [ ] - google-apps-script

I want to replace "\cite{foo123a}" with "[1]" and backwards. So far I was able to replace text with the following command
body.replaceText('.cite{foo}', '[1]');
but I did not manage to use
body.replaceText('\cite{foo}', '[1]');
body.replaceText('\\cite{foo}', '[1]');
Why?
The back conversion I cannot get to work at all
body.replaceText('[1]', '\\cite{foo}');
This replaces only the "1", not the [ ], which means the [] are interpreted as a regex character set. Escaping them does not help:
body.replaceText('\[1\]', '\\cite{foo}');//no effect, still a char set
body.replaceText('/\[1\]/', '\\cite{foo}');//no matches
The documentation states
A subset of the JavaScript regular expression features are not fully supported, such as capture groups and mode modifiers.
Can I find a full description of what is supported and what not somewhere?

I'm not familiar with Google Apps Script, but this looks like ordinary regular expression troubles.
Your second conversion is not working because the string literal '\[1\]' is just the same as '[1]'. You want to quote the text \[1\] as a string literal, which means '\\[1\\]'. In your third attempt, the slashes inside the string literal have no special meaning, so you have written a pattern that matches the text /1/.
Your first conversion is not working because {...} denotes a quantifier, not literal braces, so you need \\\\cite\\{foo\\}. (There are four backslashes because a regular expression matches a literal \ with \\, and writing that as a string literal doubles each backslash again, giving \\\\.)
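The two layers of escaping can be checked outside Apps Script. Here is a minimal sketch in Python, whose re module is a different engine from the one behind replaceText, so it only illustrates the layering. The sample sentence and variable names are made up.

```python
import re

text = r"See \cite{foo} for details."

# Regex layer: the backslash and the braces are escaped so they match literally.
# The raw string keeps the string-literal layer from consuming the backslashes.
cite_pattern = r"\\cite\{foo\}"

# Without a raw string, every backslash has to be doubled once more.
assert cite_pattern == "\\\\cite\\{foo\\}"

numbered = re.sub(cite_pattern, "[1]", text)
print(numbered)  # See [1] for details.

# Back conversion: the brackets are escaped so they are not a character set.
# A function replacement avoids a second round of escaping in the replacement.
restored = re.sub(r"\[1\]", lambda m: r"\cite{foo}", numbered)
print(restored)  # See \cite{foo} for details.
```

The function replacement is a Python-specific detail. In Apps Script the replacement string has its own rules, so that part does not carry over directly.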

Related

MySQL 8.0.30 Regular Expression Word Matching with Special Characters

While there's a ton of "old" examples on the internet using the now unsupported '[[:<:]]word[[:>:]]' technique, I'm trying to find out how, in MySQL 8.0.30, to do exact word matching from our table with words that have special characters in them.
For example, we have a paragraph of text like:
"Senior software engineer and C++ developer with Unit Test and JavaScript experience. I also have .NET experience!"
We have a table of keywords to match against this and have been using the basic system of:
SELECT
sk.ID
FROM
sit_keyword sk
WHERE
var_text REGEXP CONCAT('\\b',sk.keyword,'\\b')
It works fine 90% of the time, but it completely fails on:
C#, C++, .NET, A+ or "A +" etc. So it's failing to match keywords with special characters in them.
I can't seem to find any recent documentation on how to address this since, as mentioned, nearly all of the examples I can find use the old unsupported techniques. Note I need to match these words (with special characters) anywhere in the source text, so it can be the first or last word, or somewhere in the middle.
Any advice on the best way to do this using REGEXP would be appreciated.
You need to escape special chars in the search phrase and use the construct that I call "adaptive dynamic word boundaries" instead of word boundaries:
var_text REGEXP CONCAT('(?!\\B\\w)',REGEXP_REPLACE(sk.keyword, '([-.^$*+?()\\[\\]{}\\\\|])', '\\$1'),'(?<!\\w\\B)')
The REGEXP_REPLACE(sk.keyword, '([-.^$*+?()\\[\\]{}\\\\|])', '\\$1') part matches the . ^ $ * + - ? ( ) [ ] { } \ | chars and adds a \ before each of them, while (?!\\B\\w) / (?<!\\w\\B) require a word boundary only when the search phrase starts/ends with a word char.
More details on adaptive dynamic word boundaries and demo in my YT video.
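To see the effect of the two lookarounds without a database, here is a rough Python sketch of the same idea. It assumes Python's re module behaves like MySQL's ICU engine for these constructs, uses re.escape in place of the REGEXP_REPLACE call, and the helper name keyword_regex is made up for illustration.

```python
import re

def keyword_regex(keyword: str) -> re.Pattern:
    """Compile a pattern that matches `keyword` as a whole word.

    The lookarounds demand a word boundary only on a side where the
    keyword itself starts or ends with a word character.
    """
    return re.compile(r"(?!\B\w)" + re.escape(keyword) + r"(?<!\w\B)")

text = ("Senior software engineer and C++ developer with Unit Test and "
        "JavaScript experience. I also have .NET experience!")

keywords = ["C++", ".NET", "JavaScript", "Java", "C#"]
found = [kw for kw in keywords if keyword_regex(kw).search(text)]
print(found)  # ['C++', '.NET', 'JavaScript']
```

"Java" is rejected because it is followed by another word character inside "JavaScript", while "C++" and ".NET" match even though a plain \b would fail next to the + and the dot. Python's matching is case-sensitive here, which may differ from your MySQL collation.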
Regular expressions treat several characters as metacharacters. These are documented in the manual on regular expression syntax: https://dev.mysql.com/doc/refman/8.0/en/regexp.html#regexp-syntax
If you need a metacharacter to be treated as the literal character, you need to escape it with a backslash.
This gets very complex. If you just want to search for substrings, perhaps you should just use LOCATE():
WHERE LOCATE(sk.keyword, var_text) > 0
This avoids all the trickery with metacharacters. It treats the string of sk.keyword as containing only literal characters.
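For comparison, a plain substring test looks like this in Python (the `in` operator standing in for LOCATE, with a shortened sample text). Treat it as a sketch of the trade-off rather than of MySQL's exact behavior.

```python
text = "C++ developer with JavaScript and .NET experience"

# Plain substring search: no metacharacters, so nothing needs escaping.
hits = {kw: kw in text for kw in ["C++", ".NET", "C#", "Java"]}
print(hits)  # {'C++': True, '.NET': True, 'C#': False, 'Java': True}
```

Note that a substring test has no notion of word boundaries, so "Java" is reported as found inside "JavaScript".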

HTML input pattern messing not working at all after matching whitespace [duplicate]

$.validator.addMethod('AZ09_', function (value) {
return /^[a-zA-Z0-9.-_]+$/.test(value);
}, 'Only letters, numbers, and _-. are allowed');
When I use something like test-123 it still triggers as if the hyphen is invalid. I tried \- and --
Escaping using \- should be fine, but you can also try putting it at the beginning or the end of the character class. This should work for you:
/^[a-zA-Z0-9._-]+$/
Escaping the hyphen using \- is the correct way.
I have verified that the expression /^[a-zA-Z0-9.\-_]+$/ does allow hyphens. You can also use the \w class to shorten it to /^[\w.\-]+$/.
(Putting the hyphen last in the expression actually causes it to not require escaping, as it then can't be part of a range, however you might still want to get into the habit of always escaping it.)
The \- maybe wasn't working because you passed the whole pattern from the server as a string. If that's the case, you should first escape the \ so that the server-side program handles it too.
In a server side string: \\-
On the client side: \-
In regex (covers): -
Or you can simply put it at the end of the [] brackets.
With the hyphen (-) in a regex character class, it's important to note the difference between escaping it (\-) and not escaping it (-): apart from being a character itself, the hyphen is also parsed as a range operator.
In the first case, with an escaped hyphen (\-), the hyphen matches only itself, as in /^[+\-.]+$/.
In the second case, without escaping, as in /^[+-.]+$/, the hyphen sits between the plus and the dot, so the class matches every character with an ASCII value from 43 (plus) to 46 (dot). That includes the comma (ASCII value 44) as a side effect.
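The range behavior is easy to verify. A small sketch in Python, whose character classes treat the hyphen the same way as JavaScript's for these patterns (the variable names are just for illustration):

```python
import re

broken = re.compile(r"^[a-zA-Z0-9.-_]+$")    # ".-_" is a range from "." to "_"
fixed = re.compile(r"^[a-zA-Z0-9._-]+$")     # trailing "-" is a literal hyphen
escaped = re.compile(r"^[a-zA-Z0-9.\-_]+$")  # escaped "-" is a literal hyphen

print(bool(broken.match("test-123")))   # False
print(bool(fixed.match("test-123")))    # True
print(bool(escaped.match("test-123")))  # True

# "+-." spans code points 43 to 46, so the comma (44) slips in.
print(bool(re.match(r"^[+-.]+$", ",")))   # True
print(bool(re.match(r"^[+\-.]+$", ",")))  # False
```

The hyphen itself is code point 45, which lies just below the "." to "_" range, so the original pattern rejects it.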
\- should work to escape the - in the character range. Can you quote what you tested when it didn't seem to? Because it seems to work: http://jsbin.com/odita3
A more generic way of matching hyphens is by using the character class for hyphens and dashes ("\p{Pd}" without quotes). If you are dealing with text from various cultures and sources, you might find that there are more types of hyphens out there, not just one character. You can add that inside the [] expression
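As a quick way to see which characters fall into that class, here is a sketch using Python's unicodedata module. Python's built-in re module does not support \p{Pd}, so this checks the Unicode category directly. The helper name is_dash and the sample set are made up.

```python
import unicodedata

def is_dash(ch: str) -> bool:
    """True if `ch` is in Unicode category Pd (Punctuation, dash)."""
    return unicodedata.category(ch) == "Pd"

samples = {
    "hyphen-minus": "\u002d",
    "hyphen": "\u2010",
    "en dash": "\u2013",
    "em dash": "\u2014",
    "underscore": "\u005f",
}
results = {name: is_dash(ch) for name, ch in samples.items()}
print(results)
```

Everything except the underscore is reported as a dash. The underscore belongs to the connector punctuation category (Pc) instead.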

Remove backslash from nested json [duplicate]

When I create a string containing backslashes, they get duplicated:
>>> my_string = "why\does\it\happen?"
>>> my_string
'why\\does\\it\\happen?'
Why?
What you are seeing is the representation of my_string created by its __repr__() method. If you print it, you can see that you've actually got single backslashes, just as you intended:
>>> print(my_string)
why\does\it\happen?
The string below has three characters in it, not four:
>>> 'a\\b'
'a\\b'
>>> len('a\\b')
3
You can get the standard representation of a string (or any other object) with the repr() built-in function:
>>> print(repr(my_string))
'why\\does\\it\\happen?'
Python represents backslashes in strings as \\ because the backslash is an escape character - for instance, \n represents a newline, and \t represents a tab.
This can sometimes get you into trouble:
>>> print("this\text\is\not\what\it\seems")
this ext\is
ot\what\it\seems
Because of this, there needs to be a way to tell Python you really want the two characters \n rather than a newline, and you do that by escaping the backslash itself, with another one:
>>> print("this\\text\is\what\you\\need")
this\text\is\what\you\need
When Python returns the representation of a string, it plays safe, escaping all backslashes (even if they wouldn't otherwise be part of an escape sequence), and that's what you're seeing. However, the string itself contains only single backslashes.
More information about Python's string literals can be found at: String and Bytes literals in the Python documentation.
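A short sketch that puts the points above side by side, using a raw string literal so that no backslash is interpreted as an escape (the variable names are only for illustration):

```python
escaped = "why\\does\\it\\happen?"  # each \\ in the literal is one backslash
raw = r"why\does\it\happen?"        # raw literal: backslashes taken as written

print(escaped == raw)   # True
print(len(raw))         # 19
print(raw.count("\\"))  # 3
print(repr(raw))        # 'why\\does\\it\\happen?'
print(raw)              # why\does\it\happen?
```

Both literals produce the same 19-character string with three single backslashes. Only the repr() form shows them doubled.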
As Zero Piraeus's answer explains, using single backslashes like this (outside of raw string literals) is a bad idea.
But there's an additional problem: in the future, it will be an error to use an undefined escape sequence like \d, instead of meaning a literal backslash followed by a d. So, instead of just getting lucky that your string happened to use \d instead of \t so it did what you probably wanted, it will definitely not do what you want.
As of 3.6, it already raises a DeprecationWarning, although most people don't see those. Python 3.12 upgraded this to a SyntaxWarning, which is shown by default. It will become a SyntaxError in some future version.
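If you want to see what your own interpreter does with such a literal, something like the following should work. It compiles the source text of a literal at run time and records whatever warning is issued. Which category you get depends on the Python version, so the sketch does not assume a particular one.

```python
import warnings

source = r'"\d"'  # the source text of a literal with an unrecognized escape

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    try:
        value = eval(compile(source, "<demo>", "eval"))
    except SyntaxError:
        value = None  # a future version may reject the literal outright

categories = [w.category.__name__ for w in caught]
print(repr(value), categories)
```

On versions that still accept the literal, value is a two-character string: a backslash followed by d.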
In many other languages, including C, using a backslash that doesn't start an escape sequence means the backslash is ignored.
In a few languages, including Python, a backslash that doesn't start an escape sequence is a literal backslash.
In some languages, to avoid confusion about whether the language is C-like or Python-like, and to avoid the problem with \Foo working but \foo not working, a backslash that doesn't start an escape sequence is illegal.

Escape square bracket in Tcl_StringCaseMatch

I am using Tcl_StringCaseMatch function in C++ code for string pattern matching. Everything works fine until input pattern or string has [] bracket. For example, like:
str1 = pq[0]
pattern = pq[*]
Tcl_StringCaseMatch is not working, i.e. it returns false for the above inputs.
How to avoid [] in pattern matching?
The problem is that [] are special characters in the pattern matching. You need to escape them using a backslash to have them treated like plain characters:
pattern= "pq\\[*\\]"
I don't think the string itself (str1) needs any escaping. The reason for the double backslashes is that the C++ string literal consumes one level of escaping, and you want a single backslash to reach the TCL engine.
For the casual reader:
[] have a special meaning in TCL in general, beyond the pattern matching role they take here - "run command" (like `` or $() in shells), but [number] will have no effect, and the brackets are treated normally - thus the string str1 does not need escaping here.
For extra confusion:
TCL will interpret ] with no preceding [ as a normal character by default. I feel that's getting too confusing, and would rather that TCL complained about unbalanced brackets. As OP mentions though, this allows you to forgo the final two backslashes and use "pq\\[*]". I dislike this, and would rather make it obvious that both are treated as plain characters and not in the usual TCL way, but to each his or her own.
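The same trap exists in other glob-style matchers. As a loose analogy only, here is how it looks with Python's fnmatch module, which is a different implementation from Tcl's: it has no backslash escape, so a literal [ is written as [[] instead.

```python
from fnmatch import fnmatchcase

name = "pq[0]"

# "[*]" is read as a character set holding "*", so only "pq*" would match.
plain = fnmatchcase(name, "pq[*]")
print(plain)  # False

# fnmatch has no backslash escape; "[[]" is a set containing just "[".
# A "]" with no open set before it is an ordinary character.
bracketed = fnmatchcase(name, "pq[[]*]")
print(bracketed)  # True

# The literal "pq\\[*\\]" follows the same rule in Python as in C++:
# the matcher receives single backslashes, 7 characters in total.
print(len("pq\\[*\\]"))  # 7
```

The last line is the part that carries over directly: the doubled backslashes exist only in the source code, not in the pattern the matcher sees.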

How to trigger a node using email regex in Watson Conversation?

I am trying to extract an email address from user input text in Watson Conversation. First thing first, I need to trigger a particular node using an if condition like this:
input.text.contains('\^(([^<>()[].,;:s@\"]+(.[^<>()[].,;:s@\"]+)*)|(\".+\"))@(([[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}])|(([a-zA-Z-0-9]+.)+[a-zA-Z]{2,}))$\')
But it doesn't work, I tried a lot of regexes that I found on the internet but none of them work. Does anyone know how to write a proper regex?
I suggest using a much simpler, approximate, regex to match emails that you need to use with String.matches(string regexp) method that accepts a regex:
input.text.matches('^\\S+@\\S+\\.\\S+$')
Do not forget to double escape backslashes so as to define literal backslashes in the pattern.
Pattern details:
^ - start of string
\\S+ - one or more non-whitespace chars
@ - a @ symbol
\\S+ - one or more non-whitespace chars
\\. - a literal dot
\\S+ - one or more non-whitespace chars
$ - end of string.
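The pattern can be tried out away from Watson. A minimal sketch in Python, assuming the separator is the @ of a normal email address and that Python's \S behaves the same as in the expression language used by Watson. The helper name looks_like_email and the sample inputs are made up.

```python
import re

# Approximate email shape: non-space run, "@", non-space run, ".", non-space run.
EMAIL_LIKE = re.compile(r"\S+@\S+\.\S+")

def looks_like_email(text: str) -> bool:
    # fullmatch anchors both ends, like ^...$ in the pattern above
    return EMAIL_LIKE.fullmatch(text) is not None

print(looks_like_email("jane.doe@example.com"))         # True
print(looks_like_email("jane.doe@example"))             # False
print(looks_like_email("my mail is jane@example.com"))  # False
```

The last case fails because the whole input must match. To pick an address out of a longer sentence you would search with the unanchored pattern instead, for example EMAIL_LIKE.search(text).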