Tcl quoting proc to sanitise string to pass to other shells - tcl

I want to pass a dict value to another shell (in my application it passes through a few 'shell' levels), and the dict contains characters (space, double quotes, etc) that cause issues.
I can use something like ::base64::encode -wrapchar $dict and the corresponding ::base64::decode $str and it works as expected but the result is, of course, pretty much unreadable.
However, for debugging & presentation reasons I would prefer an encoded/sanitised string that resembled the original dict value inasmuch as reasonable and used a character set that avoids spaces, quotes, etc.
So, I am looking for something like ::base64 mapping procs but with a lighter touch.
Any suggestions would be appreciated.

You can make lighter-touch quoting schemes using either string map or regsub to do the main work.
Here's an example of string map:
set input "O'Donnell's Bait Shop"
set quoted '[string map {' {'\''}} $input]'
puts $quoted
# ==> 'O'\''Donnell'\''s Bait Shop'
Here's an example of regsub:
set input "This uses a hypothetical quoting of some letters"
set quoted <[regsub -all {[pqr]} $input {«&»}]>
puts $quoted
# ==> <This uses a hy«p»othetical «q»uoting of some lette«r»s>
You'll need to decide what sort of quoting you really want to use. For myself, if I was going through several shells, I'd be wanting to avoid quoting at all (because it is difficult to get right) and instead find ways to send the data in some other way, perhaps over a pipeline or in a temporary file. At a pinch, I'd use an environment variable, as shells tend to not mess around with those nearly as much as arguments.
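If a reversible and still mostly readable encoding is what you are after, one possible scheme is a percent-style escape that leaves letters, digits and a little punctuation alone. What follows is only a minimal sketch under stated assumptions: the proc names lightEncode and lightDecode are hypothetical, and the set of "safe" characters is a guess that you would need to adjust for the shells involved.

```tcl
# Hypothetical helpers (not part of Tcl or tcllib).
# Percent-escape every byte outside a conservative safe set.
proc lightEncode {str} {
    set out ""
    # Work on UTF-8 bytes so that non-ASCII characters survive the round trip.
    foreach byte [split [encoding convertto utf-8 $str] ""] {
        if {[regexp {^[-A-Za-z0-9_.:,=+/]$} $byte]} {
            append out $byte
        } else {
            append out [format %%%02X [scan $byte %c]]
        }
    }
    return $out
}

# Reverse the mapping: turn each %XX back into a byte, then decode UTF-8.
proc lightDecode {str} {
    set bytes ""
    set len [string length $str]
    for {set i 0} {$i < $len} {incr i} {
        set ch [string index $str $i]
        if {$ch eq "%"} {
            set hex [string range $str [expr {$i + 1}] [expr {$i + 2}]]
            append bytes [binary format H2 $hex]
            incr i 2
        } else {
            append bytes $ch
        }
    }
    return [encoding convertfrom utf-8 $bytes]
}

set d [dict create name "O'Donnell's Bait Shop" note {say "hi"}]
set encoded [lightEncode $d]
puts $encoded
puts [dict get [lightDecode $encoded] name]
```

The encoded form contains no spaces, quotes, braces or backslashes, while plain words stay legible, which is the trade-off against base64 that the question asks about.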

Related

Use of brackets and space character in Tcl

I am still confused about the use of the brackets (), [] and {} in Tcl. I always get caught out using the wrong bracket, missing brackets where they are required, or using too many of them. Besides this, I am also confused by Tcl giving me different results depending on the presence or absence of a space character (in a math expression), and on whether I have used more than one space character in succession.
Can someone please give me the basic rules that I must keep in mind to get out of this mess? Brackets have always been simple to use in C and some other languages but here they are totally different.
At the level you're looking at, Tcl is very different to any other language you've ever worked with. The heart of Tcl is defined by the Tcl(n) manual page, which states that (among other things):
Whitespace separates words. Every command takes its arguments as a sequence of words. Newlines and semicolons separate command calls; they're totally equivalent, but good style is to use a newline instead of a semicolon.
{braces} are used mainly for quoting text so that it is passed to commands with no substitutions or word separation performed on it. They nest properly. Braces are also used after $ to do variable substitution in a few cases: that's a rare use.
"double quotes" are used for quoting text so that it is passed to commands with substitutions applied, but no word separation.
[brackets] are a command substitution. They are replaced with the result of running the script inside the bracket. The script is usually a single command.
(parentheses) only have one base language use: for (associative) array elements. Thus, $a(b) is a variable substitution that will use the value of the b element in the a array.
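As a small illustration of the rules just listed (the variable names are made up for the example):

```tcl
set name "world"
set arr(key) 42

set braced {hello $name}            ;# braces: text kept literally, no substitution
set quoted "hello $name"            ;# double quotes: $name is substituted
set length [string length $quoted]  ;# brackets: replaced by the command's result
set element $arr(key)               ;# parentheses: array element lookup

puts $braced    ;# ==> hello $name
puts $quoted    ;# ==> hello world
puts $length    ;# ==> 11
puts $element   ;# ==> 42
```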
The rest of what people call Tcl is really just a standard library, a set of commands to get you started. Some are fundamental. For example:
if is a conditional command, evaluating a branch (a script) if a condition is true. In order for this to be meaningful, the branch has to be not evaluated until the condition has been evaluated and tested; that pretty much requires putting it in braces.
while is a looping command, and not only do you want to brace its body (that's probably going to be evaluated over and over) but you also want to put the condition expression in braces as well as you definitely want that to be reevaluated each time round the loop.
proc is a command that makes your own custom commands. The body of the procedure definitely is something you want to evaluate later; it goes in braces.
expr is a general expression evaluation command. Under all normal circumstances, you'll want to put its expression in braces so that the code can be compiled and won't have double substitution problems. Note that expressions often make heavy use of parentheses: they have additional meanings in expression syntax. In particular, apart from being array element lookups, they're also used for function calls and grouping.
Note that if and while also use that same expression evaluation engine. They just use the result of the expression to decide what to do.
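The "double substitution" problem mentioned for expr can be made visible with a value that happens to look like a command substitution. This is just a sketch; sideEffect and the variable names are invented for the demonstration.

```tcl
set calls 0
proc sideEffect {} {
    incr ::calls
    return 1
}
set data {[sideEffect]}

# Braced: expr itself reads $data and treats it purely as a string value.
set safe [expr {$data eq "x"}]
puts "$safe $calls"      ;# ==> 0 0

# Unbraced: Tcl substitutes $data first, then expr parses the resulting
# text and runs the command hidden inside it.
set unsafe [expr $data + 1]
puts "$unsafe $calls"    ;# ==> 2 1
```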
Scoping is a matter for commands to decide. The usual commands for dealing with introducing a scope are proc and namespace eval. This is nothing like C, C++, Java, C#, or Javascript; they have different rules. Variables are local to their procedure unless you explicitly say otherwise.
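A short sketch of that scoping rule, with hypothetical proc names:

```tcl
set counter 10

proc bumpLocal {} {
    set counter 0     ;# a new variable, local to this call
    incr counter
    return $counter
}

proc bumpGlobal {} {
    global counter    ;# explicitly link to the global variable
    incr counter
}

puts [bumpLocal]    ;# ==> 1
puts $counter       ;# ==> 10
puts [bumpGlobal]   ;# ==> 11
puts $counter       ;# ==> 11
```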
The community practice is to do calls like this:
if { $foo(bar) > (17 + $grill) * 7 } {
    # This is a comment; it lasts to the end of the line
    puts "the foobar $foo(bar) is too large"
    set foo(bar) [ComputeSmallerValue $grill]
}
That is, barewords (if and puts) are unquoted, expressions and inner scripts are brace-quoted, parentheses are used where meaningful but mostly for arrays and expressions, whitespace separates all words, inner scripts are indented (usually by 4) for clarity (it doesn't have semantic meaning, but it sure helps with reading), and “blocks” use Egyptian braces so that you don't have to add backslashes all over the place.
You don't have to follow these rules (they're guidelines, not the law) but they make your life easier if you do. Sometimes you do need to break the rules, but then you should know to be careful.
You cannot compare Tcl to C. In C, {} defines scope. In Tcl, {} is a grouping operator.
In Tcl, {} may group a string:
{hello world}
Or a list:
{a b c d e f g h}
Or a script:
{
    puts -nonewline {hello }
    puts world\n
}
Every command is simply a series of groups (which may be a word, a list,
an expression or a script):
{if} {true} { puts "hello\n" }
Of course, you don't need to put braces around every word,
but you do need braces to enclose a script:
if true { puts hello\n }
Generally, for the if statement, not bracing the expression is a bad idea,
so this is better:
if { true } { puts hello\n }
This simple rule creates Tcl's remarkably simple syntax. Every command is simply
a series of groups, whether a word, an expression, a list or script:
if expr script
while expr script
proc name argument-list script
puts string
for initialization condition nextloop script
The one important thing to remember is whenever an expression is wanted, it
should be enclosed within braces in order to prevent early substitution. e.g.:
set i 0
while { $i < 10 } {
    incr i
}
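To see what "early substitution" means without writing an endless loop, compare a condition that was substituted once, up front, with one that is re-read every time. If the while condition were written in double quotes, it would behave like the first case and never change. The variable names here are only for illustration.

```tcl
set i 0
set early "$i < 3"      ;# substituted right now: the text is "0 < 3"
set i 5

puts $early              ;# ==> 0 < 3
puts [expr $early]       ;# ==> 1
puts [expr {$i < 3}]     ;# ==> 0
```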
The square brackets, [], are replaced with the result of the command enclosed
by the square brackets:
set output [expr {2**5}]
Parentheses are used within expressions as usual:
set output [expr {(2**5)+2}]
And for arrays:
set i 0
while { $i < 5 } {
    set output($i) [expr {2**$i}]
    incr i
}
parray output

Split camelcase value with TCL

I have this TCL expression:
[string toupper [join [lrange [file split [value [topnode].file]] 1 1]]]
This retrieves companyName value from c:/companyName... and I need to split that value before the first capital letter into Company Name. Any ideas?
Thanks in advance.
That's rather more in one word than I would consider a good idea. It makes the whole thing quite opaque! Let's split it up.
Firstly, I would expect the base company name to be better retrieved with lindex from the split filename.
set companyName [lindex [file split [value [topnode].file]] 1]
Now, we need to process that to get the human-readable version out of it. Alas, that's going to be a bit difficult without knowing what's been done to it, but if we use as our example fooBarBoo_grill then we can see what we can do. First, we get the pieces with some regular expressions (this part might need tweaking if there are non-ASCII characters involved, or if certain critical characters need special treatment):
# set companyName "fooBarBoo_grill"
set pieces [regexp -all -inline {[a-z]+|[A-Z][a-z]*} $companyName]
# pieces = foo Bar Boo grill
Next, we need to capitalise. I'll assume you're using Tcl 8.6 and so have lmap as it is perfect for this task. The string totitle command has been around for a very long time.
set pieces [lmap word $pieces {string totitle $word}]
# pieces = Foo Bar Boo Grill
That list might need a bit more tweaking, or it might be OK as it is. An example of tweaking that might be necessary is if you've got an Irish name like O'Hanrahan, or if you need to insert a comma before and period after Inc.
Finally, we properly ought to set companyName [join $pieces] to get back a true string, but that doesn't have a noticeable effect with a list of words made purely out of letters. Also, more complex joins with regular expressions might be needed if you've done insertion of prefixing punctuation (the , Inc. case).
If I was doing this for real, I'd try to have the proper company name expressed directly elsewhere rather than relying on the filename. Much simpler to get right!
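Putting the steps above together gives something like the following sketch. The proc name humaniseName is hypothetical, and it assumes Tcl 8.6 (for lmap) and plain ASCII letters.

```tcl
# Hypothetical helper combining the split, capitalise and join steps.
proc humaniseName {name} {
    set pieces [regexp -all -inline {[a-z]+|[A-Z][a-z]*} $name]
    set pieces [lmap word $pieces {string totitle $word}]
    return [join $pieces]
}

puts [humaniseName companyName]       ;# ==> Company Name
puts [humaniseName fooBarBoo_grill]   ;# ==> Foo Bar Boo Grill
```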
To begin with, try using
lindex [file split [value [topnode].file]] 1
The lrange command will return a list, which might cause problems with some directory names. The join command should be pointless if you don't use lrange, and string toupper removes the information you need to do the operation you want to do.
To split before uppercase letters, you can use repetitive matches of either (?:[a-z]+|[A-Z][a-z]+) (ASCII / English alphabet letters only) or (?:[[:lower:]]+|[[:upper:]][[:lower:]]+) (any Unicode letters).
% regexp -all -inline {(?:[a-z]+|[A-Z][a-z]+)} camelCaseWord
camel Case Word
Use string totitle to change the first letter of the first word to upper case.
Documentation: file, lindex, regexp, string, Syntax of Tcl regular expressions

Escape square bracket in Tcl_StringCaseMatch

I am using the Tcl_StringCaseMatch function in C++ code for string pattern matching. Everything works fine until the input pattern or string contains [] brackets. For example:
str1 = pq[0]
pattern = pq[*]
Tcl_StringCaseMatch does not work here, i.e. it returns false for the above inputs.
How can I avoid the special treatment of [] in pattern matching?
The problem is that [] are special characters in the pattern matching. You need to escape them with a backslash to have them treated like plain characters:
pattern = "pq\\[*\\]"
The string itself should not need the same treatment. The backslashes are doubled because, in a C++ string literal, \\ yields the single backslash that you want to pass to the Tcl engine.
For the casual reader:
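[] also have a special meaning in Tcl scripts in general, beyond the pattern-matching role they take here: "run command" (like `` or $() in shells). That substitution only happens when Tcl parses a script, though, not when a string is handed to Tcl_StringCaseMatch from C++, and only the pattern argument is interpreted as a glob pattern. The brackets in str1 are therefore treated as plain characters, and the string does not need escaping.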
For extra confusion:
Tcl will interpret ] with no preceding [ as a normal character by default. I feel that's getting too confusing, and would rather Tcl complained about unbalanced brackets. As the OP mentions though, this allows you to forgo the final two backslashes and use "pq\\[*]". I dislike this, and would rather make it obvious that both are treated as plain characters and not in the usual Tcl way, but to each their own.
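The script-level string match command follows the same glob rules, so a pattern can be tried out in tclsh before it goes into the C++ code. In this sketch the patterns are written in braces, which is why the backslashes are not doubled as they are in a C++ string literal.

```tcl
set str {pq[0]}

# [*] is a character class that matches only a literal asterisk.
puts [string match {pq[*]} $str]      ;# ==> 0

# Escaped brackets are plain characters; * then matches the 0.
puts [string match {pq\[*\]} $str]    ;# ==> 1

# A ] with no opening [ is already a plain character.
puts [string match {pq\[*]} $str]     ;# ==> 1
```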

Escape single and double quote in TCL

I am using the following script, but it is throwing an error message:
tcl;
eval {
add command "Audit Param"\
setting "Error : Part's and Spec's desc contains \"OBS\" or \"REPLACE\"" "(Reference No)"\
user all;
}
The error shown is: Expected word got 'and'.
I tried Part\'s, but it still does not work. How do I escape both single and double quotes when a string contains both?
Single quote and Tcl
In Tcl itself, the single quote character (') has no special meaning at all. It's just an ordinary character like comma (,) or period (.). (Well, except commas have special meaning in expressions and periods are used in floating point values and Tk widget names. Single quote has no meaning at all by comparison.)
With what you have written, any special meaning (and hence any need to quote) is limited to the add command.
Complex quoting situations are often resolved in Tcl by using a different quoting strategy. In particular, putting things in braces disables all substitutions (except backslash-newline-whitespace collapsing). This lets me write the equivalent to what you've written as:
add command "Audit Param" \
setting {Error : Part's and Spec's desc contains "OBS" or "REPLACE"} \
"(Reference No)" user all
Any complaint here is coming from inside that code and is not in the code as written per se. (The eval { ... } adds nothing. Nor does it incur a penalty other than making your code slightly harder to read.)
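A quick check in plain tclsh (no add command needed) suggests that the two quoting styles really do hand over the same string, which supports the point that the complaint comes from further inside. The variable names are only for the example.

```tcl
set viaBackslashes "Error : Part's and Spec's desc contains \"OBS\" or \"REPLACE\""
set viaBraces {Error : Part's and Spec's desc contains "OBS" or "REPLACE"}

puts [expr {$viaBackslashes eq $viaBraces}]   ;# ==> 1
```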
The real problem
At a very loose guess, that problem string is being used inside an SQL statement with direct string substitution instead of prepared parameters; that could produce that sort of error message. Check the contents of the global errorInfo variable after the failure happens to get a stack trace that can help pin down what went wrong; that might help you see where inside things the code is failing. If it is a piece of naughty SQL, there is code to fix because you've got something that is vulnerable to SQL injection problems (which might or might not be a security problem, depending on the exposure of that command). And if that's the case, doubling up each single quote (changing ' to '') ought to work around the problem in the short run.
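If the guess about SQL is right, the short-run workaround of doubling each single quote could be sketched with string map. The proc name sqlDoubleQuotes is hypothetical, and this is a stopgap rather than a substitute for prepared parameters.

```tcl
# Hypothetical helper: double each single quote before the text is
# embedded in an SQL string literal.
proc sqlDoubleQuotes {text} {
    return [string map {' ''} $text]
}

puts [sqlDoubleQuotes "Part's and Spec's"]   ;# ==> Part''s and Spec''s
```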

Is JSON safe to use as a command line argument or does it need to be sanitized first?

Is the following dangerous?
$ myscript '<somejsoncreatedfromuserdata>'
If so, what can I do to make it not dangerous?
I realize that this can depend on the shell, OS, utility used for making system calls (if being done inside a programming language), etc. However, I'd just like to know what kind of things I should watch out for.
Yes. That is dangerous.
JSON can include single quotes in string values (they do not need to be escaped). See "the tracks" at json.org.
Imagine the data is:
{"pwned": "you' & kill world;"}
Happy coding.
I would consider piping the data in to the program in question (e.g. use "popen" or even a version of "exec" that passes arguments directly) -- this can avoid issues that result from passing through the shell, for instance. Just as with SQL: using placeholders eliminates the need to trifle with "escaping".
If passing through a shell is the only way, then this may be an option (it is not tested, but something similar holds for a "<script>" context):
For every character in the JSON, which is either outside the range of "space" to "~" in ASCII, or has a special meaning in the '' context of the shell such as \ and ' (but excluding " or any other character -- such as digits -- that can appear outside of "string" data, which is a limitation of this trivial approach), then encode the character using the \uXXXX JSON form. (Per the limitations defined above this should only encode potentially harmful characters appearing within the "strings" in the JSON and there should be no \\ pairs, no trailing \, and no 's, etc.)
It's ok. Just escape the character you use to wrap the string:
' should become '\''
So the JSON string
{"pwned": "you' & kill world;"}
becomes
{"pwned": "you'\'' & kill world;"}
and your final command, as the shell sees it, will be:
$ myscript '{"pwned": "you'\'' & kill world;"}'
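Since the rest of this page is about Tcl, here is how the same single-quote rule might be written there, using the string map approach from the first answer. The proc name shQuote is hypothetical, and the sketch assumes a POSIX-style shell on the receiving end.

```tcl
# Hypothetical helper: wrap text in single quotes for a POSIX shell,
# replacing each embedded ' with '\''.
proc shQuote {text} {
    return '[string map {' {'\''}} $text]'
}

set json {{"pwned": "you' & kill world;"}}
puts "myscript [shQuote $json]"
# ==> myscript '{"pwned": "you'\'' & kill world;"}'
```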