Reading Hebrew in JSON via native-messaging API and writing to file but Hebrew looks wrong in the file? - tcl

I'm using this code to read messages received from a browser extension via the native-messaging API. Fortunately, it works and reads all the data. Unfortunately, I'm doing something wrong or failing to do something right (if there is a difference there) and when the data is written to the file, it appears that the multi-byte characters (Hebrew, at the moment) are being written as individual bytes.
Would you please tell me what needs to be done before writing [dict get $msg doc] to the file? $msg is a big string of HTML. Am I configuring stdin incorrectly or does the data being written to file need to be encoded?
Instead of seeing Hebrew, the file shows items like this ×”Ö¸×ָֽרֶץ×.
According to the MDN document the incoming message is:
On the application side, you use standard input to receive messages
and standard output to send them.
Each message is serialized using JSON, UTF-8 encoded and is preceded
with an unsigned 32-bit value containing the message length in native
byte order.
Thank you.
#!/usr/bin/env tclsh
package require json
proc coread {reqBytes} {
set ::extMsg {}
set remBytes $reqBytes
while {![chan eof stdin]} {
yield
append ::extMsg [set data [read stdin $remBytes]]
set remBytes [expr {$remBytes - [string length $data]}]
if {$remBytes == 0} {
return $reqBytes
}
}
throw {COREAD EOF} "Unexpected EOF"
}
proc ExtRead {} {
chan event stdin readable coro_stdin
while {1} {
set ::extMsg {}
chan puts $::fp_log "starting coread 4"
if { [coread 4] != 4 || [binary scan $::extMsg iu len] != 1 } {
exit
}
if { [coread $len] != $len } {
exit
}
set msg [::json::json2dict $::extMsg]
set book [dict get $msg book]
set path "../${book}.html"
set fp [open $path w]
chan puts $fp [dict get $msg doc]
close $fp
}
}
proc CloseConnect {} {
set ::forever 0
}
proc Listen {} {
# Listening on stdin
set ::forever 1
chan configure stdin -buffering full -blocking 0 -encoding iso8859-1 -translation crlf
coroutine coro_stdin ExtRead
vwait forever
}
set extMsg {}
Listen

I don't really want to answer my own question but this is too much for a comment; and I'll be happy to delete it if someone provides a better one.
As #Abra pointed out, I had the encoding configured wrong on stdin. I thought all that needed done then was to use [string bytelength $data] instead of [string length $data] but that only worked for a few iterations. I do not understand why [coread 4] was ending up reading more than 4 bytes such that $remBytes was negative. It resulted in misreading the data and the coroutine waited for far more data than there was. So, I moved the configuring of stdin into proc extRead and changed it from binary to utf-8 for each part of the read. Binary for the 4 bytes indicating the length of the message, and utf-8 for the message. And, I had to use [string length $data] for the binary read of 4 bytes and [string bytelength $data] for reading the message; and this resulted in $remBytes always equaling zero upon completion of reading both the length and the message, such that if {$remBytes == 0} works rather than needing the earlier change mentioned in my comment of if {$remBytes <= 0}.
It works and the Hebrew now appears correctly. But it bothers me that I don't understand why [string bytelength $data] would not be 4 when reading 4 bytes. I write the values out to a log/debugging file at various steps to verify this.
I should add, here, that I had a similar question in the past which #Schelte Bron answered but I don't think I followed his advice correctly or unknowingly reverted to an old version of my code. He recommended:
The solution is to configure stdin to binary and then apply the utf-8
decoding later:
I don't know why I didn't start with that but he probably provided the answer three months ago.
#!/usr/bin/env tclsh
package require json
proc coread {reqBytes readType} {
set ::extMsg {}
set remBytes $reqBytes
while {![chan eof stdin]} {
yield
append ::extMsg [set data [read stdin $remBytes]]
if { $readType } {
set remBytes [expr {$remBytes - [string bytelength $data]}]
} else {
set remBytes [expr {$remBytes - [string length $data]}]
}
if {$remBytes == 0} {
return $reqBytes
}
}
throw {COREAD EOF} "Unexpected EOF"
}
proc ExtRead {} {
chan event stdin readable coro_stdin
while {1} {
chan configure stdin -buffering full -blocking 0 -encoding binary
set ::extMsg {}
if { [coread 4 0] != 4 || [binary scan $::extMsg iu len] != 1 } {
exit
}
chan configure stdin -buffering full -blocking 0 -encoding utf-8 -translation crlf
if { [coread $len 1] != $len } {
exit
}
set msg [::json::json2dict $::extMsg]
set book [dict get $msg book]
set path "../${book}.html"
set fp [open $path w]
chan puts $fp [dict get $msg doc]
close $fp
}
}
proc CloseConnect {} {
set ::forever 0
}
proc Listen {} {
# Listening on stdin
set ::forever 1
coroutine coro_stdin ExtRead
vwait forever
}
set extMsg {}
Listen

Related

Why does the channel event readable keep firing in this code?

Could you tell me what in this code would cause the chan event $sock readable [list ReadLine $sock] to repeatedly be fired after a socket has been reset?
I'm trying to keep the socket open on an extremely simple local server by resetting the socket to read again at the first line.
I'm not sure if it is the browser or the Tcl code. I flush the socket before sending each response using chan flush $sock; so, I don't think there should be any data left in the input buffer to fire the readable event.
Everything works fine until I stop using the application for a few minutes, and then procedure ReadLine is repeatedly invoked with a state of 1 but no data.
I included procedure GetLexi at the end because its the one I've been testing when this started occurring and, perhaps, I'm doing something wrong there and the browser doesn't know the response is complete. I should add also that it is called from with procedure GetHandler and should return back to the switch block and be reset. I did test that the socket is reset after an invocation of GetLexi.
Thank you for any guidance you may be able to provide.
proc ResetSock {sock} {
global state
set state($sock) {1}
chan configure $sock -encoding iso8859-1 -translation crlf
}; #close ResetSock
proc ClientConnect {sock client_ip client_port} {
global state
if {![info exists state($sock)]} {
set state($sock) {1}; # 1 for first line; 2 for any other header line.
chan configure $sock -buffering line -blocking 0 -encoding iso8859-1 -translation crlf
}
chan event $sock readable [list ReadLine $sock]
}; #close ClientConnect
proc ReadLine {sock} {
global state
set sptr state($sock)
set row [lindex [set $sptr] 0]
if {[catch {gets $sock line} len]} {
# Handle this error.
return
}
if {$len == 0} {
#According to Tclhttpd, in between requests, a row 1 and len 0
#combination can occur. There, it is ignored.
if {$row == 2 } {
switch [lindex [set $sptr] 1] {
"POST" {
set post [PostHandler $sock [lindex [set $sptr] 3]]
puts stdout "Posted: $post"
ResetSock $sock
}
"GET" {
GetHandler $sock [lindex [set $sptr] 2]
ResetSock $sock
}
default { CloseSock $sock }
}
}
} elseif {$len > 0} {
switch $row {
1 {
# First row of request.
lassign [split [string trim $line]] op arg rest
lappend $sptr $op $arg
lset $sptr 0 0 2
}
2 {
# Read headers.
}
default { }
}
} else {
# Only other option for $len is a negative value;
# thus, an error to be handled.
}
}; #close ReadLine
proc GetLexi { nbr sock } {
chan flush $sock
set sql { select img_base64 from lexi_raw where strongs_no = $nbr }
dbws eval $sql {
set lexi [binary format a* "{\"lexi\":\"$img_base64\"}"]
}
set headers ""
append headers "HTTP/1.1 200 OK\n"
append headers "Content-Type: application/json; charset: utf-8\n"
append headers "Content-length: [string length $lexi]\n"
append headers "Connection: Keep-Alive\n"
puts $sock $headers
chan configure $sock -translation binary
puts $sock $lexi
}; #close GetLexi
set state(x) {}
if [catch {set listener [socket -server ClientConnect -myaddr 127.0.0.1 8000]}] {
error "couldn't open listening socket"
}
vwait forever
catch {close $listener}
It's possible for the amount of data received in a packet to be not enough to complete a line. TCP hides most of the details, of course, but it remains entirely possible for the readable event to fire when a complete line is not available. As you've put the socket in non-blocking mode, that means that gets will do a zero length read (empty string written to line, len becomes 0); if the socket was blocking, the gets would block the thread until a complete line was available. You can also get zero-length reads if the channel is closed; detection of closure isn't very reliable with TCP (because networking is like that) but may happen. When the socket is closed, all reading from it in non-blocking mode results in zero-length results.
How do you distinguish these cases?
First, we check for end-of-stream:
if {[chan eof $sock]} {
# It's gone; there's not much else you can do at this point except log it I guess
close $sock
return
}
Then we need to see if things are blocked, and if so, how much is buffered:
if {[chan blocked $sock]} {
set bufferSize [chan pending input $sock]
if {$bufferSize > 4096} { # 4k is enough for most things
# Line is too long; client not well-behaved…
# You *might* send an error here.
close $sock
return
}
}
If it's neither of those cases, we've actually read a line.
if {$len == 0} {
# Empty line; end of HTTP header
} else {
# etc.
}

Breaking Condition in If construct

Hi I am using this piece of code for inserting pipe in TCL. Can anybody please let me understand when this condition [gets $pipe line] >= 0 fails.
For eg: only when [gets $pipe line] is a negative number this will fail.
In my case it is never returning a negative number and the TestEngine hangs forever
set pipeline [open "|Certify.exe filename" "r+"]
fileevent $pipeline readable [list handlePipeReadable $pipeline]
fconfigure $pipeline -blocking 0
proc handlePipeReadable {pipe} {
if {[gets $pipe line] >= 0} {
# Managed to actually read a line; stored in $line now
} elseif {[eof $pipe]} {
# Pipeline was closed; get exit code, etc.
if {[catch {close $pipe} msg opt]} {
set exitinfo [dict get $opt -errorcode]
} else {
# Successful termination
set exitinfo ""
}
# Stop the waiting in [vwait], below
set ::donepipe $pipe
} else {
puts ""
# Partial read; things will be properly buffered up for now...
}
}
vwait ::donepipe
The gets command (when given a variable to receive the line) returns a negative number when it is in a minor error condition. There are two such conditions:
When the channel has reached end-of-file. After the gets the eof command (applied to the channel) will report a true value in this case.
When the channel is blocked, i.e., when it has some bytes but not a complete line (Tcl has internal buffering to handle this; you can get the number of pending bytes with chan pending). You only see this when the channel is in non-blocking mode (because otherwise the gets will wait indefinitely). In this case, the fblocked command (applied to the channel) will return true.
Major error conditions (such as the channel being closed) result in Tcl errors.
If the other command only produces partial output or does something weird with buffering, you can get an eternally blocked pipeline. It's more likely with a bidirectional pipe, such as you're using, as the Certify command is probably waiting for you to close the other end. Can you use it read-only? There are many complexities to interacting correctly with a process bidirectionally! (For example, you probably want to make the pipe's output buffering mode be unbuffered, fconfigure $pipeline -buffering none.)
Please find the way the certify process is being triggered from the command prompt and the print statements are given just for the understanding. At the end the process hangs and the control is not transferred back to the TCL
From the documentation for gets:
If varName is specified and an empty string is returned in varName because of end-of-file or because of insufficient data in nonblocking mode, then the return count is -1.
Your script is working completely fine. checked with set pipeline [open "|du /usr" "r+"]
instead of your pipe and included puts "Line: $line" to check the result. So its clear that there is some problem in Certify command. Can you share your command, how do you use on terminal and how did you use with exec?
################### edited by Drektz
set pipeline [open "|du /usr" "r+"]
fileevent $pipeline readable [list handlePipeReadable $pipeline]
fconfigure $pipeline -blocking 0
proc handlePipeReadable {pipe} {
if {[gets $pipe line] >= 0} {
# Managed to actually read a line; stored in $line now
################### included by Drektz
puts "Line: $line"
} elseif {[eof $pipe]} {
# Pipeline was closed; get exit code, etc.
if {[catch {close $pipe} msg opt]} {
set exitinfo [dict get $opt -errorcode]
} else {
# Successful termination
set exitinfo ""
}
# Stop the waiting in [vwait], below
set ::donepipe $pipe
} else {
puts ""
# Partial read; things will be properly buffered up for now...
}
}
vwait ::donepipe
you can see it in the CMDwith& screenshot mentioned.Is there any workaround that I have to overcome the issue
Please see the issue when run with the help of exec executable args &

Strange line buffering behaviour in Tcl 8.6?

EDIT: Original example and alternative solution framework modified for clarity.
The line buffering behaviour might behave differently than expected in Tcl 8.6. The following code blocks without any output, unless the "chan close" line is uncommented:
set data {one two four}
set stream [open |[list cat -n] r+]
chan configure $stream -buffering line
chan puts $stream "$data\n"
chan puts $stream "\n"
chan flush $stream
#chan close $stream write
set out [chan read $stream]
puts "output: $out"
chan close $stream
So this simplistic solution does not work for interactive I/O, and this might be related to synchronization problems at both ends of the pipe.
Using a channel event structure (e.g., based on http://www.beedub.com/book/2nd/event.doc.html), seems to be preferable:
proc chanReader { pipe } {
global extState
while 1 {
set len [chan gets $pipe line]
if { $len > 0 } {
puts "<< $line."
continue
} else {
if { [chan blocked $pipe] } {
set extState 1
return
} elseif { [chan eof $pipe] } {
set extState 2
return
}
}
}
}
set data {one two foure}
set timeout 5000
#set stream [open [list | cat -n] r+]
#set stream [open [list | ispell -a] r+]
set stream [open [list | tr a-z A-Z] r+]
#set stream [open [list | fmt -] r+]
chan configure $stream -blocking 0 -buffering line
set extState 0
chan event $stream readable [list chanReader $stream]
foreach word $data {
puts "> $word\n"
chan puts $stream "$word\n"
chan flush $stream
#chan close $stream write
set aID [after $timeout {set extState 3}]
vwait extState
if { $extState == 1 } {
# Got regular output.
after cancel $aID
puts "Cancel $aID."
continue
} elseif { $extState == 2 } {
puts "External program closed."
chan close $stream
exit 2
} elseif { $extState == 3 } {
puts "Timeout."
chan close $stream
exit 3
}
}
puts "End of task."
chan close $stream
exit 0
This code fragment works with the "cat -n" and "ispell -a" external programs (commented lines), but still fails with other external programs. For instance it does not work with the "tr a-z A-Z" and "fmt" examples above.
If the line "chan close $stream write" above is uncommented, we receive output from the external program, but this terminates the interaction with it. How to reliably connect (interactively) to these external programs?
I'm guessing that the core issue here is that there's two sources of buffering going on, and Tcl only has control over one of them. But both stem from the fact that virtually all output, when not going to an “interactive” destination (i.e., a terminal), is buffered. There's basically a call in the C standard library that determines this and enables the buffering feature, and Tcl follows that rule too (despite using its entirely independent I/O library). Doing this massively speeds up non-interactive pipeline processing, but means that if you're expecting to see every byte output exactly at the point when the program thinks it is writing it, you're going to be disappointed.
Of course, programs can switch this buffering off if they want. In Tcl, this is done by fconfigure $channel -buffering none (or line for line-oriented buffering). In cat, the -n option makes it do the equivalent (calling setvbuf() in C) and ispell is probably doing the same. But most programs don't. Some instead call fflush() from time to time; that works too, but is also a minority practice. So with a bidirectional pipeline such as you're using, you can easily force the side where you feed into it from Tcl not buffer, but you can't usually get the other side to do the same.
There is a workaround: run the subprocess with Expect. That puts a fake terminal between Tcl and the subprocess (instead of a pipe) and tricks it into thinking it is talking direct to the user. But the consequence of this is that you have to substantially rewrite your Tcl program and you gain a dependency on a (very fine!) external package.

In a tcl script how can i use puts to write a string to the console and to a file at the same time?

# Prints the string in a file
puts $chan stderr "$timestamp - Running test: $test"
# Prints the string on a console
puts "$timestamp - Running test: $test"
Is there a way I can send the output of puts to the screen and to a log file at the same time? Currently I have both the above two lines one after the other in my script to achieve this.
Or is there any other solution in tcl ?
Use the following proc instead of puts:
proc multiputs {args} {
if { [llength $args] == 0 } {
error "Usage: multiputs ?channel ...? string"
} elseif { [llength $args] == 1 } {
set channels stdout
} else {
set channels [lrange $args 0 end-1]
}
set str [lindex $args end]
foreach ch $channels {
puts $ch $str
}
}
Examples:
# print on stdout only
multiputs "1"
# print on stderr only
multiputs stderr "2"
set brieflog [open brief.log w]
set fulllog [open detailed.log w]
# print on stdout and in the log files
multiputs stdout $brieflog $fulllog "3"
This isn't something I've used extensively, but it seems to work (Tcl 8.6+ only):
You need the channel transform tcl::transform::observe package:
package require tcl::transform::observe
Open a log file for writing and set buffering to none:
set f [open log.txt w]
chan configure $f -buffering none
Register stdout as a receiver:
set c [::tcl::transform::observe $f stdout {}]
Anything written to the channel $c will now go to both the log file and stdout.
puts $c foobar
Note that it would seem to make more sense to have the channel transformation on top of stdout, with the channel to the log file as receiver, but I haven't been able to make that work.
Documentation:
chan,
open,
package,
puts,
set,
tcl::transform::observe (package)

How to received proc return data in fileevent

I am calling a proc through fileevent. that proc returns a line od data.
how to receive this data?
the following code I have written to receive data from pipe when ever data is available. I dont want to block by using direct gets.
proc GetData1 { chan } {
if {[gets $chan line] >= 0} {
return $line
}
}
proc ReadIO {chan {timeout 2000} } {
set x 0
after $timeout {set x 1}
fconfigure $chan -blocking 0
fileevent $chan readable [ list GetData1 $chan ]
after cancel set x 3
vwait x
# Do something based on how the vwait finished...
switch $x {
1 { puts "Time out" }
2 { puts "Got Data" }
3 { puts "App Cancel" }
default { puts "Time out2 x=$x" }
}
# How to return data from here which is returned from GetData1
}
ReadIO $io 5000
# ** How to get data here which is returned from GetData1 ? **
There are probably as many ways of doing this as there are Tcl programmers. Essentially, you shouldn't use return to pass the data back from your fileevent handler as it isn't called in the usual way so you can get at what it returns.
Here are a few possible approaches.
Disclaimer None of these is tested, and I'm prone to typing mistakes, so treat with a little care!
1) Get your fileevent handler to write to a global veriable:
proc GetData1 {chan} {
if {[gets $chan line]} >= 0} {
append ::globalLine $line \n
}
}
.
.
.
ReadIO $io 5000
# ** The input line is in globalLine in the global namespace **
2) Pass the name of a global variable to your fileevent handler, and save the data there
proc GetData2 {chan varByName} {
if {[gets $chan line]} >= 0} {
upvar #0 $varByName var
append var $line \n
}
}
fileevent $chan readable [list GetData1 $chan inputFromChan]
.
.
.
ReadIO $chan 5000
# ** The input line is in ::inputFromChan **
A good choice for the variable here might be an array indexed by $chan, e.g. fileevent $chan readable [list GetDetail input($chan)]
3) Define some kind of class to look after your channels that stashes the data away internally and has a member function to return it
oo::class create fileeventHandler {
variable m_buffer m_chan
constructor {chan} {
set m_chan $chan
fileevent $m_chan readable [list [self object] handle]
set m_buffer {}
}
method data {} {
return $m_buffer
}
method handle {} {
if {[gets $m_chan line]} >= 0 {
append m_buffer $line \n
}
}
}
.
.
.
set handlers($chan) [fileeventHandler new $chan]; # Save object address for later use
ReadIO $io 5000
# Access the input via [handlers($chan) data]