How to control reading of bits in XOR data frames?

I'm trying to learn to read the XOR data frames used in web sockets in Tcl.
I was reading the HTTP requests using:
chan configure $sock -buffering line -blocking 0 -encoding iso8859-1 -translation crlf
chan event $sock readable [list ReadLine $sock]
[catch {chan gets $sock line} len]
Now, after the socket is opened, I use chan configure $sock -translation binary to read the component bits of the XOR frame, but I'm confused about -buffering and -buffersize.
I also changed the chan event handler to use chan read numChars instead of reading a full line, but the readable event seems to fire for every character, or again after each character is read.
Should the various segments of bits be read directly from the channel or should larger pieces be read from the channel into variables and then the bits separated from those pieces?
What is the proper channel configuration in order to read the bits in a controlled manner?
Also, it says here https://www.tcl.tk/man/tcl/TclCmd/chan.html#M35 that in non-blocking mode chan read may not read all the requested characters. What should be done? Count them and read again until I get them all?
Thank you.

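The -buffering option manages the output side of the channel, i.e., it decides when data you write to the socket with puts (or chan puts; it's an alternate name for the same thing) is actually sent. It has no effect on input. The -buffersize option sets the size of the buffers that Tcl allocates for the channel; you rarely need to change it just to read data.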
When you have the channel in binary mode, the characters you read and write correspond one-to-one with the bytes. You probably shouldn't use gets (chan gets) on binary data; read (chan read) is more likely to be appropriate. (For writing, the -nonewline option to puts is virtually mandatory.)
When you read a non-blocking channel with a number of characters/bytes requested, you can get up to that amount of data. If the request can be satisfied with what is in the read buffer, that is used and no request to the underlying file descriptor is done. If the request can be partially satisfied with buffered data, that's used first and only then is a request done for more data; if that request produces more data than needed, the excess is stored in the buffer (you can see how much with chan pending input, but that's not normally important for binary channels).
However, if that one non-blocking request does not deliver enough data to give you what you asked for, read returns anyway: you have a short read. A short read doesn't necessarily mean that you're at the end of the channel; use chan eof and chan blocked to find out more (especially if you get the special case of a zero-length read).
Being blocked also need not mean that you've reached the end of a message within a higher-level protocol; more data may be coming that hasn't reached the OS yet (which is why you need a framing protocol on top of TCP; websockets are one such framing protocol).
Counting the data is easy: string length.
tl;dr: In non-blocking mode, the maximum amount that read of a binary channel can return is whatever is currently in the input buffers plus whatever is obtained from one non-blocking read of the file descriptor. In blocking mode, read will wait until the requested amount of data is available or definitely not available (end-of-file), performing multiple reads of the file descriptor if necessary.
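To answer the "read pieces into variables, then separate the bits" part concretely, here is a minimal sketch, not a complete websocket implementation. It assumes the socket is configured with -translation binary -blocking 0, appends whatever chan read delivers to a per-socket buffer, and only consumes a frame once all of its bytes have arrived. The names parseFrame, onReadable, handleFrame and the pending array are hypothetical; fragmentation, control frames and size limits are ignored.

```tcl
# Hypothetical helper: try to extract one frame from the bytes in $buf.
# Returns {} if the frame is still incomplete, else {opcode payload rest}.
proc parseFrame {buf} {
    if {[string length $buf] < 2} { return {} }
    binary scan $buf cucu b0 b1
    set opcode [expr {$b0 & 0x0F}]
    set masked [expr {($b1 & 0x80) != 0}]
    set len    [expr {$b1 & 0x7F}]
    set pos 2
    if {$len == 126} {
        # 16-bit big-endian extended length
        if {[string length $buf] < 4} { return {} }
        binary scan $buf @2Su len
        set pos 4
    } elseif {$len == 127} {
        # 64-bit big-endian extended length
        if {[string length $buf] < 10} { return {} }
        binary scan $buf @2Wu len
        set pos 10
    }
    set need [expr {$pos + ($masked ? 4 : 0) + $len}]
    if {[string length $buf] < $need} { return {} }
    set payload ""
    if {$masked} {
        binary scan $buf @${pos}cu4 key
        incr pos 4
        set bytes {}
        binary scan $buf @${pos}cu${len} bytes
        set i 0
        foreach byte $bytes {
            # XOR each payload byte with the matching byte of the 4-byte key
            set k [lindex $key [expr {$i % 4}]]
            append payload [binary format c [expr {$byte ^ $k}]]
            incr i
        }
    } else {
        set payload [string range $buf $pos [expr {$pos + $len - 1}]]
    }
    return [list $opcode $payload [string range $buf $need end]]
}

# Hypothetical stub; a real program would dispatch on the opcode here.
proc handleFrame {sock opcode payload} {
    puts "frame opcode=$opcode, [string length $payload] payload bytes"
}

# Hypothetical readable handler: accumulate, then drain complete frames.
proc onReadable {sock} {
    global pending
    append pending($sock) [chan read $sock]
    if {[chan eof $sock]} {
        chan close $sock
        unset pending($sock)
        return
    }
    while {[llength [set f [parseFrame $pending($sock)]]]} {
        lassign $f opcode payload pending($sock)
        handleFrame $sock $opcode $payload
    }
}

# The masked "Hello" text frame used as an example in RFC 6455
set frame [binary format H* 818537fa213d7f9f4d5158]
puts [lrange [parseFrame $frame] 0 1]   ;# prints: 1 Hello
```

With this shape a short read is harmless: the handler simply returns and is called again when more bytes arrive, so there is no need to loop on chan read yourself.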

Related

How is a channel's output-buffer content deleted without writing it to the channel?

I don't know much about PHP or Tcl; but I am trying to learn both concurrently.
In PHP, I read that every script should start with ob_start and, therefore, have been using the following.
ob_start(NULL, 0, PHP_OUTPUT_HANDLER_STDFLAGS);
echo header('Content-Length: '.ob_get_length());
ob_end_flush();
ob_end_clean();
In Tcl channels, I see that the options of -buffering full and -buffersize take care of ob_start() and chan flush is analogous to ob_end_flush() and chan pending output returns the number of bytes written to the output buffer but not yet written out.
I've been looking at my two texts on Tcl and the Tcl manual web page for channels and I can't find a method of just clearing the channel output buffer without writing it.
If data is being written to a channel set to -buffering full and an error is caught/trapped is it possible to empty the buffer and not write it to the channel?
I thought perhaps I could use chan seek to set the position back to the start, similar to setting a pointer back to the beginning of a segment of RAM, but the pipe example doesn't appear to create a channel that supports seeking.
lassign [chan pipe] rchan wchan
chan configure $rchan -buffering line -blocking 0 -translation crlf
chan configure $wchan -buffering full -blocking 0 -translation crlf
chan puts $wchan "This is the full message which shall attempt to truncate."
chan puts stdout "wchan pending: [chan pending output $wchan]"
chan puts stdout "wchan tell: [chan tell $wchan]"
# => -1 Thus, channel does not support seeking.
#chan seek $wchan 5 start
# => Errors invalid seek
chan flush $wchan
chan puts stdout [chan gets $rchan]
Thank you.
Sounds like you want to only output text written to a channel if no error happens in the middle of writing?
One way is to use a variable channel from tcllib; everything written to the channel is stored in a variable, which can then be written out to the real target on successful completion of whatever you're trying to do.
Example:
#!/usr/bin/env tclsh
package require tcl::chan::variable
proc main {} {
    variable output
    set output ""
    set outputchan [::tcl::chan::variable output]
    try {
        puts $outputchan "Some text"
        error "This is an error"
        # Won't get written if an error is raised
        chan flush $outputchan
        puts -nonewline $output
    } on error {errMsg errOptions} {
        # Report error if you want
    } finally {
        chan close $outputchan
    }
}
main
I don't think Tcl provides the functionality you are looking for. It's assumed that if you send something to a channel then it should always be written out.
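Since a channel's output buffer cannot be discarded, one alternative is to never put the data in the buffer until you know it is complete. The following is a minimal sketch of that idea without tcllib; writeAllOrNothing is a hypothetical name, and the producer argument is assumed to be a command prefix that returns the full text or raises an error.

```tcl
# Hypothetical helper: build the whole message first, write it only on success.
# Returns 1 if the message was written, 0 if the producer failed.
proc writeAllOrNothing {chan producer} {
    try {
        set body [{*}$producer]
    } on error {msg} {
        # Nothing has touched the channel yet, so there is nothing to undo
        return 0
    }
    chan puts -nonewline $chan $body
    chan flush $chan
    return 1
}

# A producer that fails part-way: nothing is written to stdout
writeAllOrNothing stdout {apply {{} {error "boom"}}}
# A producer that succeeds: the text is written in one go
writeAllOrNothing stdout {apply {{} {return "All good\n"}}}
```

The design choice is the same as in the variable-channel answer: the "buffer" is an ordinary Tcl value, so abandoning it is just a matter of not writing it.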

Tcl - Reading characters from stdin without having to press enter in Tcl

I would like to read from stdin on a per character basis without the stdin being flushed. I could not find how to do that after tweaking for hours. Tcl always seems to wait for the channel to be flushed even in fconfigure stdin -blocking 0 -buffering none. Is that true? How would I otherwise approach this?
More explanation:
Imagine a Tcl program that makes its own prompt with some threads running code in the background. I would like this prompt to react to single keystrokes, for example: when you press 'p' (without pressing enter) the prompt reads that character and pauses the threads, when you press 'q' the prompt kills the threads and stops the program. The cleanest and closest solution is shortly demonstrated in the following code snippet.
proc readPrompt { } {
    set in [ read stdin 1 ]
    if { $in eq "q" } {
        puts "Quitting..."
        set ::x 1
    } {
        puts "Given unknown command $in"
    }
}
fconfigure stdin -blocking 0 -buffering none
fileevent stdin readable { readPrompt }
vwait x
The result from running this is:
a
Given unknown command a
Given unknown command
After pressing the 'a', nothing happens. My guess is that the stdin is not flushed or something. Pressing enter or CTRL-d triggers the fileevent and the prompt then reads both the characters 'a' and 'enter'.
Ideally, I want the enter-press not to be needed. How could I accomplish this?
EDIT: I found this question and solution about a related use in Python: Determine the terminal cursor position with an ANSI sequence in Python 3 This is approximately the behaviour I'm looking for, but in Tcl.
If you have 8.7 (currently in alpha) then this is “trivial”:
fconfigure stdin -inputmode raw
That delivers all characters to you, without echoing them. (There's also modes normal and password, both of which preprocess the data before delivery and only one of which echoes.) You'll have to look after giving visual feedback to the user yourself, and be aware that all includes all characters usually only used for line editing purposes.
Otherwise, on Unixes (Linux, macOS) you do:
exec stty raw -echo <@stdin >@stdout
to switch the mode to the same config, and:
exec stty -raw echo <@stdin >@stdout
to switch back. (Not all Unixes need the input and output redirects, but some definitely do.)
Windows consoles have something similar in 8.7, but not in previous versions; a workaround might be possible using the TWAPI console support but that's a very low level API (and I don't know the details).
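Because a terminal left in raw mode is unpleasant for the user, it may be worth making sure the mode is always restored. The following is a small sketch, assuming a Unix-like system with stty and Tcl 8.6 for try/finally; withRawTerminal is a hypothetical name.

```tcl
# Hypothetical wrapper: run a script with the terminal in raw mode and
# always restore the normal mode afterwards, even if the script fails.
proc withRawTerminal {script} {
    exec stty raw -echo <@stdin >@stdout
    try {
        uplevel 1 $script
    } finally {
        exec stty -raw echo <@stdin >@stdout
    }
}

# Possible usage (needs a real terminal, so it is left commented out):
# withRawTerminal {
#     fconfigure stdin -blocking 0 -buffering none
#     fileevent stdin readable { readPrompt }
#     vwait x
# }
```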

Youtube script with API on eggdrop not showing correct charset

I've got some issues with a YouTube API script: it doesn't show åäö in titles. The Urltitle script does, after an eggdrop recompile. Any suggestions on how to make the second script use UTF-8?
There are two key problems here:
Ensuring that the data from the data source is understood correctly by Tcl.
Ensuring that the correct data known by Tcl is reported to the channel in a way that it can digest.
Hopefully the first is already working; the http package and the various database access packages mostly get that right (except when the data source tells lies, as can happen occasionally with real-world data). You can test this by doing:
set msg ""
foreach char [split $data ""] { # For each UNICODE character...
    append msg [format %04x [scan $char %c]]; # The hex code for the char
}
putserv "PRIVMSG $chan :$msg"
For example, this would generate a message like this for data of åäö€:
00e500e400f620ac
If that's working (it's the hard part to solve if it isn't), all you've got to do is ensure that the data actually goes correctly to the channel. This might be as simple as doing this (if the Tcl side of the channel is in binary mode):
putserv "PRIVMSG $chan :[encoding convertto utf-8 $data]"
In theory, the Tcl channel that putserv writes on could do the conversion for you, but getting that part right is tricky when there's unknown code between your code and the actual channel.
It's also possible that the IRC server expects the data in a different encoding such as iso8859-15. There isn't a universal rule there, alas.
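To see what encoding convertto actually does to the data, here is a small sketch in plain Tcl (no eggdrop commands); the variable names are only for illustration. It shows that the same three characters become six bytes in UTF-8 but three bytes in iso8859-15, which is why sending the wrong one produces garbage on the other end.

```tcl
# Three characters: a-ring, a-umlaut, o-umlaut
set title "\u00e5\u00e4\u00f6"

set utf8  [encoding convertto utf-8 $title]
set latin [encoding convertto iso8859-15 $title]

puts [string length $title]   ;# 3 characters
puts [string length $utf8]    ;# 6 bytes
puts [string length $latin]   ;# 3 bytes

# Show the raw bytes as hex
binary scan $utf8 H* hex
puts $hex                     ;# c3a5c3a4c3b6
binary scan $latin H* latinHex
puts $latinHex                ;# e5e4f6
```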

Reading the wrong number of bytes from a binary file

I have the following code:
set myfile "the path to my file"
set fsize [file size $myfile]
set fp [open $myfile r]
fconfigure $fp -translation binary
set data [read $fp $fsize]
close $fp
puts $fsize
puts [string bytelength $data]
And it shows that the bytes read are different from the bytes requested. The bytes requested match what the filesystem shows; the actual bytes read are 22% more (requested 29300, got 35832). I tested this on Windows, with Tcl 8.6.
Use string length. Don't use string bytelength. It gives the “wrong” answers, or rather it answers a question you probably don't want to ask.
More Depth
The string bytelength command returns the length in bytes of the data in Tcl's internal almost-UTF-8 encoding. If you're not working with Tcl's C API directly, you really have no sensible use for that value, and C code can get the value easily without that command. For ASCII text, the length and the byte-length are the same, but for binary data or text with NULs or characters greater than U+00007F (the Unicode character that is equivalent to ASCII DEL), the values will differ. By contrast, the string length command knows how to handle binary data correctly, and will report the number of bytes in the byte-string that you read in. We plan to deprecate the string bytelength command, as it turns out to be a bug in someone's code almost every time they use it.
(I'm guessing that your input data actually has 6532 bytes outside the range 1–127 in it; each of those bytes uses a two-byte representation internally in almost-UTF-8. Fortunately, Tcl doesn't actually convert into that format until it needs to, and instead uses a compact array of bytes in this case; you're forcing it by asking for the string bytelength.)
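The arithmetic behind that guess (29300 + 6532 = 35832) can be checked on a small sample. The sketch below uses only string length and binary scan; countTwoByte is a hypothetical helper that counts the bytes outside the range 1–127, i.e. the ones that would need two bytes in the internal encoding.

```tcl
# Hypothetical helper: count bytes that are 0 or greater than 127
proc countTwoByte {data} {
    binary scan $data cu* bytes
    set n 0
    foreach b $bytes {
        if {$b == 0 || $b > 127} { incr n }
    }
    return $n
}

# Five bytes: 0x00 0x01 0x7f 0x80 0xff
set data [binary format H* 00017f80ff]

puts [string length $data]   ;# 5, the real number of bytes
puts [countTwoByte $data]    ;# 3, the bytes 0x00, 0x80 and 0xff
```

For this sample the internal representation would take 5 + 3 = 8 bytes, which is the kind of inflated figure the question ran into.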
Background Information
The question of “how much memory is actually being used by Tcl to read this data” is quite hard to answer, because Tcl will internally mutate data to hold it in the form that is most efficient for the operations you are applying to it. Because Tcl's internal types are all precisely transparent (i.e., conversions to and from them don't lose information) we deliberately don't talk about them much except from an optimisation perspective; as a programmer, you're supposed to pretend that Tcl has no types other than string of Unicode characters.
You can peel the veil back a bit with the tcl::unsupported::representation command (introduced in 8.6). Don't use the types for decisions on what to do in your code, as that is really not something guaranteed by the language, but it does let you see a lot more about what is really going on under the covers. Just remember, the values that you see are not the same as the values that Tcl's implementation thinks about. Thinking about the values that you see (without that magic command) will keep you thinking about things that it is correct to write.

Tcl flush command returns error

I've gotten this error from Tcl flush command:
error flushing "stdout": I/O error
Any ideas why it can happen? Also, the man page doesn't say anything about flush returning errors. Can the same error come from puts?
And ultimately: what to do about it?
Thank you.
By default Tcl uses line buffering on stdout (more on this later). This means whatever you puts there gets buffered and is only output when the newline is seen in the buffer or when you flush the channel. So, yes, you can get the same error from puts directly, if a call to it is done on an unbuffered channel or if it managed to hit that "buffer full" condition so that the underlying medium is really accessed.
As to "I/O error", we do really need more details. Supposedly the stdout of your program has been redirected (or reopened), and the underlying media is inaccessible for some reason.
You could try to infer more details about the cause by inspecting the POSIX errno of the actual failing syscall; it's accessible via the global errorCode variable after a Tcl command involving such a syscall signals an error condition. So you could go like this:
set rc [catch {flush stdout} err]
if {$rc != 0} {
    global errorCode
    set fd [open error.log w]
    puts $fd $err\n\n$errorCode
    close $fd
}
(since Tcl 8.5 you can directly ask catch to return all the relevant info instead of inspecting magic variables, see the manual).
Providing more details on how you run your program or whether you reopen stdout is strongly suggested.
A note on buffering. Output buffering can be controlled using fconfigure (or chan configure from 8.5 onwards) on a channel by manipulating its -buffering option. Usually it's set to line on stdout but it can be set to full or none. When set to full, the -buffersize option can be used to control the buffer size explicitly.
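The Tcl 8.5+ form of catch mentioned above can be sketched like this. The helper name tryFlush is hypothetical; the third argument to catch receives an options dictionary, and its -errorcode entry carries the same information as the global errorCode variable (for a failing syscall it typically starts with POSIX followed by the errno name).

```tcl
# Hypothetical helper: flush a channel and report what went wrong, if anything.
# Returns {1 {} {}} on success or {0 message errorcode} on failure.
proc tryFlush {chan} {
    if {[catch {chan flush $chan} msg opts]} {
        return [list 0 $msg [dict get $opts -errorcode]]
    }
    return [list 1 "" ""]
}

lassign [tryFlush stdout] ok msg code
if {!$ok} {
    # stdout itself is broken, so report on stderr instead
    puts stderr "flush failed: $msg (errorCode: $code)"
}
```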