I am using Tcl_StringCaseMatch function in C++ code for string pattern matching. Everything works fine until input pattern or string has [] bracket. For example, like:
str1 = pq[0]
pattern = pq[*]
Tcl_StringCaseMatch is not working i.e returning false for above inputs.
How to avoid [] in pattern matching?
The problem is [] are special characters in the pattern matching. You need to escape them using a backslash to have them treated like plain characters
pattern= "pq\\[*\\]"
I don't think this should affect the string as well. The reason for double slashing is you want to pass the backslash itself to the TCL engine.
For the casual reader:
[] have a special meaning in TCL in general, beyond the pattern matching role they take here - "run command" (like `` or $() in shells), but [number] will have no effect, and the brackets are treated normally - thus the string str1 does not need escaping here.
For extra confusion:
TCL will interpret ] with no preceding [ as a normal character by default. I feel that's getting too confusing, and would rather that TCL complains on unbalanced brackets. As OP mentions though, this allows you to forgo the final two backslashes and use "pq\\[*]". I dislike this, and rather make it obvious both are treated normally and not the usual TCL way, but to each her/is own.
Related
I am new to TCL language. I am having difficulties to catch data in myFile.txt.
MyFile.txt
set obj "{Hello}"
set obj "{Bye}"
set obj "{Nice}"
set obj "{Yoh}"
I want to catch words inside the curly bracket as shown below.
Hello, Bye, Nice, Yoh
How to do it with regexp in TCL.
The first thing to try would be this simple thing:
regexp {\{(\w+)\}} $obj -> word
The key points:
{ and } are metacharacters inside Tcl's RE language variant, so they need to be escaped.
The bit we want to extract (“non-empty word character sequence”, so \w+) needs to be captured and matched with a capture variable that comes afterwards: the -> is just a dummy variable for a capture that we want to ignore.
Always put REs in Tcl inside curly braces unless you know exactly what you're doing. (When you know what you're doing, you'll know to put them in braces virtually always anyway.) This allows us to write REs without an inflammation of backslashes.
When I create a string containing backslashes, they get duplicated:
>>> my_string = "why\does\it\happen?"
>>> my_string
'why\\does\\it\\happen?'
Why?
What you are seeing is the representation of my_string created by its __repr__() method. If you print it, you can see that you've actually got single backslashes, just as you intended:
>>> print(my_string)
why\does\it\happen?
The string below has three characters in it, not four:
>>> 'a\\b'
'a\\b'
>>> len('a\\b')
3
You can get the standard representation of a string (or any other object) with the repr() built-in function:
>>> print(repr(my_string))
'why\\does\\it\\happen?'
Python represents backslashes in strings as \\ because the backslash is an escape character - for instance, \n represents a newline, and \t represents a tab.
This can sometimes get you into trouble:
>>> print("this\text\is\not\what\it\seems")
this ext\is
ot\what\it\seems
Because of this, there needs to be a way to tell Python you really want the two characters \n rather than a newline, and you do that by escaping the backslash itself, with another one:
>>> print("this\\text\is\what\you\\need")
this\text\is\what\you\need
When Python returns the representation of a string, it plays safe, escaping all backslashes (even if they wouldn't otherwise be part of an escape sequence), and that's what you're seeing. However, the string itself contains only single backslashes.
More information about Python's string literals can be found at: String and Bytes literals in the Python documentation.
As Zero Piraeus's answer explains, using single backslashes like this (outside of raw string literals) is a bad idea.
But there's an additional problem: in the future, it will be an error to use an undefined escape sequence like \d, instead of meaning a literal backslash followed by a d. So, instead of just getting lucky that your string happened to use \d instead of \t so it did what you probably wanted, it will definitely not do what you want.
As of 3.6, it already raises a DeprecationWarning, although most people don't see those. It will become a SyntaxError in some future version.
In many other languages, including C, using a backslash that doesn't start an escape sequence means the backslash is ignored.
In a few languages, including Python, a backslash that doesn't start an escape sequence is a literal backslash.
In some languages, to avoid confusion about whether the language is C-like or Python-like, and to avoid the problem with \Foo working but \foo not working, a backslash that doesn't start an escape sequence is illegal.
I want to pass a dict value to another shell (in my application it passes through a few 'shell' levels), and the dict contains characters (space, double quotes, etc) that cause issues.
I can use something like ::base64::encode -wrapchar $dict and the corresponding ::base64::decode $str and it works as expected but the result is, of course, pretty much unreadable.
However, for debugging & presentation reasons I would prefer an encoded/sanitised string that resembled the original dict value inasmuch as reasonable and used a character set that avoids spaces, quotes, etc.
So, I am looking for something like ::base64 mapping procs but with a lighter
touch.
Any suggestions would be appreciated.
You can make lighter-touch quoting schemes using either string map or regsub to do the main work.
Here's an example of string map:
set input "O'Donnell's Bait Shop"
set quoted '[string map {' {'\''}} $input]' ; #'# This comment just because of stupid Stack Overflow syntax highlighter
puts $quoted
# ==> 'O'\''Donnell'\''s Bait Shop'
Here's an example of regsub:
set input "This uses a hypothetical quoting of some letters"
set quoted <[regsub -all {[pqr]} $input {«&»}]>
puts $quoted
# ==> <This uses a hy«p»othetical «q»uoting of some lette«r»s>
You'll need to decide what sort of quoting you really want to use. For myself, if I was going through several shells, I'd be wanting to avoid quoting at all (because it is difficult to get right) and instead find ways to send the data in some other way, perhaps over a pipeline or in a temporary file. At a pinch, I'd use an environment variable, as shells tend to not mess around with those nearly as much as arguments.
Consider the pattern is:
PPP(GJ) {
__hj_o:
}
What is the regular expression match the above pattern?
Tcl's regular expressions can contain newlines just fine, but for anything complicated it can help to put it in its own variable instead of having it as an inline literal:
set RE {PPP(GJ) {
__hj_o:
}}
if {[regexp $RE $someString]} {
# We got a match!
}
Indeed, regexp would also match the above with this:
set RE {PPP(GJ)\s+{\s+__hj_o:\s+}}
because newlines are just ordinary whitespace characters (i.e., are matched by \s and .) by default. (The above REs are probably not exactly what you want; they likely need suitable patterns for the non-whitespace portions as well.)
However, you need to ensure that the string you are matching against has the whole thing that you want to match. If you're just feeding through one line at a time, that multiline pattern will consistently fail. This sounds obvious, but it is the easiest mistake to make.
I want to replace "\cite{foo123a}" with "[1]" and backwards. So far I was able to replace text with the following command
body.replaceText('.cite{foo}', '[1]');
but I did not manage to use
body.replaceText('\cite{foo}', '[1]');
body.replaceText('\\cite{foo}', '[1]');
Why?
The back conversion I cannot get to work at all
body.replaceText('[1]', '\\cite{foo}');
this will replace only the "1" not the [ ], this means the [] are interpreted as regex character set, escaping them will not help
body.replaceText('\[1\]', '\\cite{foo}');//no effect, still a char set
body.replaceText('/\[1\]/', '\\cite{foo}');//no matches
The documentation states
A subset of the JavaScript regular expression features are not fully supported, such as capture groups and mode modifiers.
Can I find a full description of what is supported and what not somewhere?
I'm not familiar with Google Apps Script, but this looks like ordinary regular expression troubles.
Your second conversion is not working because the string literal '\[1\]' is just the same as '[1]'. You want to quote the text \[1\] as a string literal, which means '\\[1\\]'. Slashes inside of a string literal have no relevant meaning; in that case you have written a pattern which matches the text /1/.
Your first conversion is not working because {...} denotes a quantifier, not literal braces, so you need \\\\cite\\{foo\\}. (The four backslashes are because to match a literal \ in a regular expression is \\, and to make that a string literal it is \\\\ — two escaped backslashes.)