Converting character to hex via Tcl - tcl

I am trying to convert some characters into hex using Tcl.
I would usually do something like this: [binary format a* 'o'], which returns 111, the integer representation of 'o', which can then be converted.
However, the way that I retrieve the character, [value string_split], returns "o" instead of 'o', causing the function to throw an error. It is essentially like doing [binary format a* "o"], which returns "ERROR: Nothing is named "o"".
So, what is the difference between "o" and 'o' in a Tcl context, and how can I get my [binary format a* [value string_split]] call to return 111 like [binary format a* 'o'] would?
It should be noted that I am using TheFoundry's Nuke to do this and I don't know exactly which version of Tcl they are using, but it is a rather old one.

You can use scan with a format of %c to get the Unicode codepoint value of a character, and then format to print it as hex:
#!/usr/bin/env tclsh
set o_str o
scan $o_str %c o_value
puts $o_value ;# 111
puts [format 0x%x $o_value] ;# 0x6f
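If the value holds more than one character, the same scan/format pair can be applied per character. This is only a minimal sketch: chars_to_hex is a hypothetical helper name, and it assumes the input is an ordinary Tcl string.

```tcl
# Hypothetical helper: return the hex code of every character in a string.
proc chars_to_hex {str} {
    set result {}
    foreach ch [split $str ""] {
        # scan with %c yields the character's code point as an integer
        scan $ch %c code
        lappend result [format 0x%02x $code]
    }
    return $result
}

puts [chars_to_hex "foo"] ;# 0x66 0x6f 0x6f
```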

Related

How to insert ASCII control characters into a TCL string?

I need to create a TCL script that contains ASCII control characters. This is the full list of these characters from the ASCII table but I am only interested in putting in the "start of text" value 2 and "end of text" value 3.
You can enter a hex code in a string by writing \xnn where nn is the code, e.g.
set start_of_text "\x02"
set end_of_text "\x03"
See the documentation at https://www.tcl-lang.org/man/tcl8.6/TclCmd/Tcl.htm#M27
You can also use format with the %c code (which might be more useful if you don't know the relevant number until run-time because it's in a variable or whatever):
set ascii(STX) [format %c 2]
set ascii(ETX) [format %c 3]
If I'm going to be wrapping text in a control sequence (often for things like applying a colouring) then I'll make a procedure to do the job:
proc wrapped {string} {
# These use Unicode escapes
return "\u0002$string\u0003"
}
puts [wrapped "this is some test text"]
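As a quick sanity check, the three spellings used above (\x02, format %c and \u0002) should all produce the same one-character string. The variable names in this sketch are only illustrative.

```tcl
# Three ways of writing the "start of text" control character (code 2)
set from_hex_escape "\x02"
set from_format     [format %c 2]
set from_unicode    "\u0002"

# All three are the same one-character string
puts [expr {$from_hex_escape eq $from_format && $from_format eq $from_unicode}] ;# 1
puts [string length $from_hex_escape] ;# 1

# scan with %c recovers the numeric code
scan $from_hex_escape %c code
puts $code ;# 2
```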

Length in bytes of text with (CR) (LF)

I’ve got a value from sqlite3 that could be written in hex as "0x0D 0x0A". Yes, it’s (CR) and (LF). I want to know the length of the data I’ve got, but the command "string length" returns 1, not 2. "string bytelength" returns 1 too. How can I get the correct length of the data in bytes?
It’s a simple example. In the real program I get different text data from sqlite with unknown encoding. All I need is the length of the data in bytes, but every (CR)(LF) in the text is counted as 1 byte.
Examples of getting data from sqlite and file:
sqlite dbcmd messages.db
set t [dbcmd message from messages limit 1,1]
string length $t
set f [open test.txt r]
set t [read $f]
string length $t
(Windows 7, ActiveTcl 8.6.4, tclkit 8.6.6)
By default, Tcl transforms CR-LF sequences in files being read into simple LF characters. This is usually useful, as it simplifies ordinary text processing in scripts greatly. However, if you want the exact values then you can use fconfigure to put the channel into an alternate processing mode. For example, changing the channel's -translation setting to lf (from auto) will make all carriage-returns be preserved (and line-feeds too).
set f [open test.txt r]
fconfigure $f -translation lf
set t [read $f]
string length $t
There are other settings that could — in general — affect what you get, particularly the -eofchar and -encoding options. The -eofchar is usually EOF (i.e., the character associated with Ctrl+Z) and the -encoding is a system-specific value that depends on things like what your platform is and what your locale is. If you want to really work with binary data, i.e., get just the bytes, you can set the -translation option to binary, which sets everything up right for handling binary data. There's a shorthand for that common option in the open command:
set f [open test.txt rb]; # ««« “b” flag in open mode
set t [read $f]
string length $t
If you do get the bytes and want to get characters from them at some point, the encoding convertfrom command is the tool you'll need. Remember, characters and bytes are not the same thing. That had to be given up in order to allow people to use more characters than there are values expressible in a byte.
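Going the other way, if you already hold a string of characters and want to know how many bytes it would occupy, one option is to encode it explicitly with encoding convertto and measure the result. The sketch below uses a made-up sample string, and the byte count depends entirely on which encoding you pick (utf-8 and iso8859-1 are just examples here).

```tcl
# Sample text: "caf" + e-acute + CR + LF, i.e. 6 characters
set text "caf\u00e9\r\n"
puts [string length $text] ;# 6

# Encode to UTF-8: the accented character takes two bytes
set utf8 [encoding convertto utf-8 $text]
puts [string length $utf8] ;# 7

# Encode to ISO 8859-1: every character here fits in one byte
set latin1 [encoding convertto iso8859-1 $text]
puts [string length $latin1] ;# 6
```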

TCL special character \x1

I have a problem with TCL string
set WORD 128
set CELL_NAME "MCELL_$WORD\x1"
# real: MCELL_128.. (.. is 2 special characters that I can't paste here)
# expected: "MCELL_128x1"
How can I format the string as expected?
set CELL_NAME "MCELL_${WORD}x1"
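gives you the expected output: the braces delimit the variable name, so the following x1 is taken literally, whereas \x1 in the original is read as a hex escape.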
Other possibilities:
set CELL_NAME "MCELL_[set WORD]x1"
set CELL_NAME [format "MCELL_%dx1" $WORD]
Documentation:
format,
set,
Summary of Tcl language syntax, particularly item [8].
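To see the difference between the two spellings side by side, a small sketch like this can help (the variable names with_escape and with_braces are only illustrative):

```tcl
set WORD 128

# Backslash form: \x1 is parsed as a hex escape (a control character)
set with_escape "MCELL_$WORD\x1"

# Brace form: ${WORD} ends the variable name, x1 stays literal text
set with_braces "MCELL_${WORD}x1"

puts $with_braces                               ;# MCELL_128x1
puts [string length $with_braces]               ;# 11
puts [string equal $with_escape $with_braces]   ;# 0
```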

How to treat tcl string as a hex number and convert it into binary?

I have a Tcl string set in a variable. I want to treat it as hex and convert it into binary. Can anybody help me achieve this?
Here is what I am doing:
$ /usr/bin/tclsh8.5
% set a a1a2a3a4a5a6
a1a2a3a4a5a6
% set b [ string range $a 0 3 ]
a1a2
Now I want the a1a2 value of variable "b" to be treated as 0xa1a2, so that I can convert it into binary. Please help me to solve this.
If you are using Tcl 8.6, then binary decode hex is the best choice:
binary decode hex $b
If you are using an older version of Tcl, then you have to use the binary format with the H format specifier:
binary format H* $b
You can write the resulting byte array to a file or send it through a socket etc, but if you want to display it as text, I suggest converting it to a string first:
encoding convertfrom utf-8 [binary format H* $b]
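If "binary" here means a string of 0 and 1 digits, or a plain integer, rather than raw bytes, the following sketch shows one way to get each. It assumes the hex string has an even number of digits, and it only uses commands that are available in Tcl 8.5.

```tcl
set b a1a2

# Raw bytes: two hex digits per byte
set bytes [binary format H* $b]
puts [string length $bytes] ;# 2

# Bit string: binary scan with B* prints the bits, high bit first
binary scan $bytes B* bits
puts $bits ;# 1010000110100010

# Integer value: scan with %x parses the text as a hexadecimal number
scan $b %x number
puts $number ;# 41378
```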

Help needed in writing regular expression -- TCL

Just seeking a favour to write a regular expression to match the following set of strings. I want to write an expression in Tcl which matches all the following strings:
i) ( XYZ XZZ XVZ XWZ )
Clue : The starting string X and the ending string Z are the same for all of them. Only the middle string differs: Y Z V W.
My trial: [regexp {^X([Y|Z|V|W]*)Z$}]
I want to write another regexp which catches/matches only the following string wherever it appears:
ii) (XYZ)
My trial: [regexp {^X([Y]*)Z$}] or simply regexp {^XYZ$}
Just want to make sure it's a correct approach. Is there any other way available to optimize the regexp :)
i) 1st Question Tested
set to_Match_Str "XYZ XZZ XVZ XWZ"
foreach {wholeStr to_Match_Str} [regexp -all -inline {X[YZVW]Z} $to_Match_Str] {
puts "MATCH $to_Match_Str in the list"
}
It prints only XZZ and XWZ from the list. It leaves out XYZ & XVZ.
When I include the parentheses, [regexp -all -inline {X([YZVW])Z} $to_Match_Str], it prints all the middle characters correctly: Y Z V W
i) (XYZ XZZ XVZ XWZ)
Clue : The starting string X and the ending string Z are the same for all of them. Only the middle string differs: Y Z V W.
My trial: [regexp {^X([Y|Z|V|W]*)Z$}]
Assuming you're not after literal parentheses around the whole lot, you match that using this:
regexp {X([YZVW])Z} $string -> matchedSubstr
That's because the interior strings are all single characters. (It also stores the matched substring in the variable matchedSubstr; choose any variable name there that you want.) You should not use | inside a [] in a regular expression, as it has no special meaning there. (You might need to add ^$ anchors round the outside.)
On the other hand, if you want to match multiple character sequences (which the Y etc. are just stand-ins for) then you use this:
regexp {X(Y|Z|V|W)Z} $string -> matchedSubstr
Notice that | is being used here, but [] is not.
If your real string has many of these strings (whichever pattern you're using to match them) then the easiest way to extract them all is with the -all -inline options to regexp, typically used in a foreach like this:
foreach {wholeStr matchedSubstr} [regexp -all -inline {X([YZVW])Z} $string] {
puts "Hey! I found a $matchedSubstr in there!"
}
Mix and match to taste.
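This also explains why the loop in the question skipped XYZ and XVZ: without a capturing group, -inline returns one element per match, so a foreach with two loop variables consumes two matches per iteration. A small sketch of both cases (input and middles are just illustrative variable names):

```tcl
set input "XYZ XZZ XVZ XWZ"

# No capturing group: one list element per match
puts [regexp -all -inline {X[YZVW]Z} $input]
# XYZ XZZ XVZ XWZ

# One capturing group: two list elements per match (whole match, then group)
puts [regexp -all -inline {X([YZVW])Z} $input]
# XYZ Y XZZ Z XVZ V XWZ W

# So the two-variable foreach only lines up when the pattern has one group
set middles {}
foreach {whole middle} [regexp -all -inline {X([YZVW])Z} $input] {
    lappend middles $middle
}
puts $middles ;# Y Z V W
```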
My trial: [regexp {^X([Y]*)Z$}] or simply regexp {^XYZ$}
Just want to make sure it's a correct approach. Is there any other way available to optimize the regexp :)
That's optimal for an exact comparison. And in fact Tcl will optimize that internally to a straight string equality test if that's literal.
My trial: [regexp {^X([Y|Z|V|W]*)Z$}]
That would match the strings given, but as you are using the * multiplier it would also match strings like "XZ", "XYYYYYYYYYYYYYYYYZ" and "XYZYVWZWWWZVYYWZ". To match the middle character only once, don't use a multiplier:
^X([YZVW])Z$
My trial: [regexp {^X([Y]*)Z$}]
The same there, it will also match strings like "XZ", "XYYZ" and "XYYYYYYYYYYYYYYYYZ". Don't put a multiplier after the set:
^X([Y])Z$
or simply regexp {^XYZ$}
That matches, but it won't capture anything. To make it do the same as the other (capture the Y character), you need the parentheses:
^X(Y)Z$
You can use the Visual Regexp tool to help, it provides feedback as you construct your regular expression.
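The points above about the * multiplier and about | inside brackets are easy to check directly in tclsh. In this sketch, matches is a hypothetical wrapper that just returns 1 or 0 for a pattern and a string.

```tcl
# Hypothetical helper: 1 if the pattern matches the string, else 0
proc matches {pattern str} {
    return [regexp -- $pattern $str]
}

# With the * multiplier, zero or many middle characters are accepted
puts [matches {^X([YZVW]*)Z$} XZ]     ;# 1
puts [matches {^X([YZVW]*)Z$} XYYZ]   ;# 1

# Without the multiplier, exactly one middle character is required
puts [matches {^X([YZVW])Z$} XZ]      ;# 0
puts [matches {^X([YZVW])Z$} XVZ]     ;# 1

# Inside brackets, | is an ordinary character, so X|Z matches too
puts [matches {^X([Y|Z|V|W])Z$} X|Z]  ;# 1
```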