Trying to read 32-bit unsigned integer in Tcl using binary scan - tcl

I'm sure I'm doing something very stupid but can't seem to get this simple item figured out. I'm using the native-messaging API in a browser extension to pass a JSON string to a Tcl script. I have this working in C using uint32_t but can't get the 32 bit unsigned integer that is prefixed to the message read in Tcl. I know that it is reading something because I wrote the $len to file and I'm pretty sure it sits and waits for the procedure coread to read more bytes than there are in the message. I've tried Iu and iu in addition to nu in the binary scan command. Would you please tell me what I'm doing wrong?
Should it matter if the messages include multi-byte characters such as Hebrew and Greek?
Added later: I'm pretty sure it's the multi-byte characters because the code works until they are included. But why would that be? Do not the browser and Tcl both count the length of the message in bytes? Whatever is happening, the length determined in Tcl exceeds the length that Tcl reads.
Thank you.
Doc. reads:
Each message is serialized using JSON, UTF-8 encoded and is preceded
with an unsigned 32-bit value containing the message length in native
byte order.
In C the form for listening for incoming message on stdin is:
int listen( void ) {
uint32_t msg_len;
while ( fread( &msg_len, sizeof msg_len, 1, stdin ) == 1 ) {
char *buf = malloc( msg_len );
if ( !buf ) {
}
else if ( fread( buf, sizeof *buf, msg_len, stdin ) != msg_len ) {
}
else {
// Read the full messsage.
}
fflush( stdout );
free( buf );
}
return 0;
}
In Tcl I'm been trying:
proc coread {reqBytes} {
set ::extMsg {}
set remBytes $reqBytes
while {![chan eof stdin]} {
yield
append ::extMsg [set data [read stdin $remBytes]]
set remBytes [expr {$remBytes - [string length $data]}]
if {$remBytes == 0} {
return $reqBytes
}
}
throw {COREAD EOF} "Unexpected EOF"
}
proc ExtRead {} {
chan event stdin readable coro_stdin
while {1} {
if { [coread 4] != 4 || [binary scan $::extMsg nu len] != 1 } {
exit
}
set ::extMsg {}
if { [coread $len] != $len } {
exit
}
# Read full message.
}
}
proc Listen {} {
# Listening on stdin
set ::forever 1
coroutine coro_stdin ExtRead
vwait forever
}
set extMsg {}
Listen

You nearly figured it out. The problem is almost certainly due to multi-byte characters. With the default stdin encoding setting, Tcl counts each character as 1, while the length you got is probably indicating the bytes, not the characters. If the JSON string includes crlf line endings, that would be another cause for discrepancies.
The solution is to configure stdin to binary and then apply the utf-8 decoding later:
fconfigure stdin -translation binary
coread 4
binary scan $::extMsg nu len
coread $len
puts [encoding convertfrom utf-8 $::extMsg]
Not using binary translation could also mess up your length. For example: If the length of the message is 269, it would come out as 266. This is because the \r gets turned into a \n, changing 0x10d into 0x10a.

Related

Reading Hebrew in JSON via native-messaging API and writing to file but Hebrew looks wrong in the file?

I'm using this code to read messages received from a browser extension via the native-messaging API. Fortunately, it works and reads all the data. Unfortunately, I'm doing something wrong or failing to do something right (if there is a difference there) and when the data is written to the file, it appears that the multi-byte characters (Hebrew, at the moment) are being written as individual bytes.
Would you please tell me what needs to be done before writing [dict get $msg doc] to the file? $msg is a big string of HTML. Am I configuring stdin incorrectly or does the data being written to file need to be encoded?
Instead of seeing Hebrew, the file shows items like this ×”Ö¸×ָֽרֶץ×.
According to the MDN document the incoming message is:
On the application side, you use standard input to receive messages
and standard output to send them.
Each message is serialized using JSON, UTF-8 encoded and is preceded
with an unsigned 32-bit value containing the message length in native
byte order.
Thank you.
#!/usr/bin/env tclsh
package require json
proc coread {reqBytes} {
set ::extMsg {}
set remBytes $reqBytes
while {![chan eof stdin]} {
yield
append ::extMsg [set data [read stdin $remBytes]]
set remBytes [expr {$remBytes - [string length $data]}]
if {$remBytes == 0} {
return $reqBytes
}
}
throw {COREAD EOF} "Unexpected EOF"
}
proc ExtRead {} {
chan event stdin readable coro_stdin
while {1} {
set ::extMsg {}
chan puts $::fp_log "starting coread 4"
if { [coread 4] != 4 || [binary scan $::extMsg iu len] != 1 } {
exit
}
if { [coread $len] != $len } {
exit
}
set msg [::json::json2dict $::extMsg]
set book [dict get $msg book]
set path "../${book}.html"
set fp [open $path w]
chan puts $fp [dict get $msg doc]
close $fp
}
}
proc CloseConnect {} {
set ::forever 0
}
proc Listen {} {
# Listening on stdin
set ::forever 1
chan configure stdin -buffering full -blocking 0 -encoding iso8859-1 -translation crlf
coroutine coro_stdin ExtRead
vwait forever
}
set extMsg {}
Listen
I don't really want to answer my own question but this is too much for a comment; and I'll be happy to delete it if someone provides a better one.
As #Abra pointed out, I had the encoding configured wrong on stdin. I thought all that needed done then was to use [string bytelength $data] instead of [string length $data] but that only worked for a few iterations. I do not understand why [coread 4] was ending up reading more than 4 bytes such that $remBytes was negative. It resulted in misreading the data and the coroutine waited for far more data than there was. So, I moved the configuring of stdin into proc extRead and changed it from binary to utf-8 for each part of the read. Binary for the 4 bytes indicating the length of the message, and utf-8 for the message. And, I had to use [string length $data] for the binary read of 4 bytes and [string bytelength $data] for reading the message; and this resulted in $remBytes always equaling zero upon completion of reading both the length and the message, such that if {$remBytes == 0} works rather than needing the earlier change mentioned in my comment of if {$remBytes <= 0}.
It works and the Hebrew now appears correctly. But it bothers me that I don't understand why [string bytelength $data] would not be 4 when reading 4 bytes. I write the values out to a log/debugging file at various steps to verify this.
I should add, here, that I had a similar question in the past which #Schelte Bron answered but I don't think I followed his advice correctly or unknowingly reverted to an old version of my code. He recommended:
The solution is to configure stdin to binary and then apply the utf-8
decoding later:
I don't know why I didn't start with that but he probably provided the answer three months ago.
#!/usr/bin/env tclsh
package require json
proc coread {reqBytes readType} {
set ::extMsg {}
set remBytes $reqBytes
while {![chan eof stdin]} {
yield
append ::extMsg [set data [read stdin $remBytes]]
if { $readType } {
set remBytes [expr {$remBytes - [string bytelength $data]}]
} else {
set remBytes [expr {$remBytes - [string length $data]}]
}
if {$remBytes == 0} {
return $reqBytes
}
}
throw {COREAD EOF} "Unexpected EOF"
}
proc ExtRead {} {
chan event stdin readable coro_stdin
while {1} {
chan configure stdin -buffering full -blocking 0 -encoding binary
set ::extMsg {}
if { [coread 4 0] != 4 || [binary scan $::extMsg iu len] != 1 } {
exit
}
chan configure stdin -buffering full -blocking 0 -encoding utf-8 -translation crlf
if { [coread $len 1] != $len } {
exit
}
set msg [::json::json2dict $::extMsg]
set book [dict get $msg book]
set path "../${book}.html"
set fp [open $path w]
chan puts $fp [dict get $msg doc]
close $fp
}
}
proc CloseConnect {} {
set ::forever 0
}
proc Listen {} {
# Listening on stdin
set ::forever 1
coroutine coro_stdin ExtRead
vwait forever
}
set extMsg {}
Listen

How to limit standard output at tclsh command line?

When running commands interactively at the tclsh command line, is there a way to truncate how much of a return value gets printed to stdout?
For example, this will take a very long time because the return value will print to stdout.
tclsh> set a [lrepeat 500000000 x]
I know I can add a dummy command in the same line, but this is an ad hoc solution. Is there some I could set in my ~/.tclshrc to truncate stdout to a finite length?
tclsh> set a [lrepeat 500000000 x] ; puts ""
Maybe this is an XY-problem (as turning off or swallowing prints to stdout seems to satisfy the OP), but the actual question was:
Is there some I could set in my ~/.tclshrc to truncate stdout to a
finite length?
You can use an interceptor on stdout (and/ or, stderr) to cap strings to a default limit:
oo::class create capped {
variable max
constructor {m} {
set max $m
}
method initialize {handle mode} {
if {$mode ne "write"} {error "can't handle reading"}
return {finalize initialize write}
}
method finalize {handle} {
# NOOP
}
method write {handle bytes} {
if {[string length $bytes] > $max} {
set enc [encoding system]
set str [encoding convertfrom $enc $bytes]
set newStr [string range $str 0 $max-1]
if {[string index $str end] eq "\n"} {
append newStr "\n"
}
set bytes [encoding convertto $enc $newStr]
}
return $bytes
}
}
Using chan push and chan pop you may turn on/off capping to, e.g., 30 characters:
% chan push stdout [capped new 30]
serial1
% puts [string repeat € 35]
€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€
% chan pop stdout
% puts [string repeat € 35]
€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€
Some remarks:
You can use an object, a namespace, or a proc offering the required interface of channel interceptors (initialize, write, ...); I prefer objects.
Ad write: You want to cap based on a character-based limit, not a byte-level one. However, write receives a string of bytes, not a string of characters. So, you need to be careful when enforcing the limit (back-transform the byte string into a char string, and vice versa, using encoding convertfrom and encoding convertto).
Similar, whether certain values of max might not be a good choice or the value range should be restricted. E.g., a max of 1 or 0 will turn off the basic REPL (the prompt % ), effectively.
As for tclshrc: You may want place the interceptor definition and chan push call therein, to enable capping per default?
tclsh is a REPL, and the "P" there is what you're seeing. Without digging into the source, I don't know that there's a simple way to accomplish exactly what you're asking.
If I remember to do it, the list command is useful to provide no output
set a [lrepeat 500000000 x]; list
or perhaps something informative
set a [lrepeat 500000000 x]; llength $a
If you want to get programmy:
proc i {val} {set ::tcl_interactive $val}
Then do i off or i 0 or i false to turn off interactivity and then execute the commands with large results. Going non-interactive silences the printing of command results, but it also turns off the prompt which could be confusing. Restore interactivity with i on or i 1 or i true

In an expect script, how do I remove a set of special characters from a string variable?

Say I have a variable that is set to some user input. I have no control over what the user will enter.
How would I go about removing all characters that are not in [A-Za-z0-9], spaces, periods, or commas?
proc getUserInput {} {
set timeout 60
send_user "\nEnter user input: "
expect_user {
-re "(.*)\n" {
set userInput $expect_out(1,string)
}
timeout {
exitTimeout "Timed out waiting for user input!"
}
}
return $userInput
}
set rawValue [ getUserInput ]
// massage variable goes here?
set massagedValue "$rawValue"
Not sure if it matters, but I'm using expect 5.45.
$ expect -v
expect version 5.45
Expect is a Tcl extension so you can use all Tcl commands when writing Expect scripts. You can try this in tclsh:
% set v1 "###the string###"
###the string###
% set v2 [regsub -all {[^ .,[:alnum:]]} $v1 ""]
the string
%

How to suppress a proc's return value in tcl prompt

I'm relatively new in TCL, in TCL prompt, when we invoke a proc with some return value, the proc's return value is echoed back by tcl. Is there a way to stop it (without affecting puts or similar functionality) as an example
bash$ tclsh
% proc a {} { puts "hello"; return 34; }
% a
hello
34
%
Now how do i suppress the 34 coming to the screen? Any help is appreciated.
Update:
Actually the proc is a part of another tool, earlier it did not have any return value, but now conditionally it can return a value.
it can be called from a script and there won't be any problem (as Bryan pointed out). and it can be called from interactive prompt, then after all the necessary outputs, the return value is getting printed unnecessarily.
So 1) I don't have the facility of changing a user's tclshrc 2) existing scripts should continue to work.
And it seems strange that every time the proc is called, after all the necessary outputs, a number gets printed. To a user, this is a needless information unless he has caught the value and wants to do something. So i wanted the value to be delivered to user, but without getting printed to prompt/UI (hope i'm clear )
The interactive shell code in tclsh and wish will print any non-empty result. To get nothing printed, you have to have the last command on the “line” produce an empty result. But which command to use?
Many commands will produce an empty result:
if 1 {}
subst ""
format ""
However, the shortest is probably:
list
Thus, you could write your code like:
a;list
Of course, this only really becomes useful when your command actually produces a large result that you don't want to see. In those cases, I often find that it is most useful to use something that measures the size of the result, such as:
set tmp [something_which_produces a_gigantic result]; string length $tmp
The most useful commands I find for that are string length, llength and dict size.
If you absolutely must not print the result of the command, you have to write your own interactive loop. There are two ways to do this, depending on whether you are running inside the event loop or not:
Without the event loop
This simplistic version just checks to see if the command name is in what the user typed. It's probably not a good idea to arbitrarily throw away results otherwise!
set accum ""
while {[gets stdin line] >= 0} {
append accum $line "\n"
if {[info complete $accum]} {
if {[catch $accum msg]} {
puts stderr $msg
} elseif {$msg ne "" && ![string match *TheSpecialCommand* $accum]} {
puts $msg
}
set accum ""
}
}
With the event loop
This is just handling the blocking IO case; that's the correct thing when input is from a cooked terminal (i.e., the default)
fileevent stdin readable handleInput
set accum ""
proc handleInput {} {
global accum
if {[gets stdin line] < 0} {
exit; # Or whatever
}
append accum $line "\n"
if {[info complete $accum]} {
if {[catch {uplevel "#0" $accum} msg]} {
puts stderr $msg
} elseif {$msg ne "" && ![string match *TheSpecialCommand* $accum]} {
puts $msg
}
set accum ""
}
}
vwait forever; # Assuming you're not in wish or have some other event loop...
How to detect the command is being executed
The code above uses ![string match *TheSpecialCommand* $accum] to decide whether to throw away the command results, but this is very ugly. A more elegant approach that leverages Tcl's own built-in hooks is to use an execution trace to detect whether the command has been called (I'll just show the non-event-loop version here, for brevity). The other advantage of this is that it is simple to extend to suppressing the output from multiple commands: just add the trace to each of them.
trace add execution TheSpecialCommand enter SuppressOutput
proc SuppressOutput args {
# Important; do not suppress when it is called inside another command
if {[info level] == 1} {
set ::SuppressTheOutput 1
}
}
# Mostly very similar from here on
set accum ""
while {[gets stdin line] >= 0} {
append accum $line "\n"
if {[info complete $accum]} {
set SuppressTheOutput 0; # <<<<<< Note this!
if {[catch $accum msg]} {
puts stderr $msg
} elseif {$msg ne "" && !$SuppressTheOutput} { # <<<<<< Note this!
puts $msg
}
set accum ""
}
}
To be clear, I wouldn't ever do this in my own code! I'd just suppress the output manually if it mattered.
You could make an empty procedure in .tclshrc...
proc void {} {}
...and when you don't need a return value, end the line with ;void.
Use tcl_interactive variable to enable the return of of the value, although I'm not sure where this would be useful...
proc a {} {
puts "hello"
if { [info exist tcl_interactive] } {
return {};
} else {
return 34;
}
}

TCL Flush Ignore Backspace

Is there a way to ignore backspaces when performing a flush in tcl to capture user input?
I am performing a function where I capture the user input in a variable to be used in another command at a later time. So I perform the following function.
puts -nonewline "What is the username? "
flush stdout
set usrnm [gets stdin]
So let's say using that command as long as I don't use a backspace everything works the way I expect it however if I do use a backspace a "\x7F" is added as a character. Is there a way for the backspace to not be treated as a character?
That seems to depend on your terminal; when I try that code with these key sequences:
BackspaceabcReturn
abcBackspacedReturn
Then I get a length 3 string (measured via string length) in the usrnm variable in both cases. This is what I'd expect when the terminal is properly in cooked mode (the usual default). Since a \x7f is probably not a valid character in a user name anyway, I'd guess that you could filter it out:
set usrnm [string map {\x7f ""} $usrnm]
The only way to be absolutely sure that the character isn't there is to put the terminal in to raw mode (and probably no-echo too) and do all the character input processing yourself. That's a huge amount of work relative to the size of problem; a post-filter seems more sensible to me (and I still wonder what's up with your terminal).
[EDIT]: To put your terminal back into cooked mode, do:
exec stty -raw <#stdin
I just ran into this recently and I wrote a procedure to handle the char 127 character (backspace). If any other input cleansing needs to happen you can do it here too, such as removing special characters. I have a feeling this can be more elegant but it does work.
proc cleanInput {str} {
set return ""
for {set i 0} {$i < [string length $str]} {incr i} {
set char [string index $str $i]
set asc [scan $char %c]
if {$asc == 127} { #backspace
if {[string length $return] > 0} {
set return [string range $return 0 [expr "[string length $return] - 2"]]
}
} else {
append return $char
}
}
return $return
}