Writing to output without buffering in Julia

How can I print data to standard output without buffering?
println buffers the data and writes it all at once.
Alternatively, a command to flush the print buffer would be useful.

flush is the command to empty the print buffer for a given stream, e.g. flush(stdout) right after a println call: https://docs.julialang.org/en/v1/base/io-network/#Base.flush

Related

How to control reading of bits in XOR data frames?

I'm trying to learn to read the XOR data frames used in web sockets in Tcl.
I was reading the HTTP requests using:
chan configure $sock -buffering line -blocking 0 -encoding iso8859-1 -translation crlf
chan event $sock readable [list ReadLine $sock]
[catch {chan gets $sock line} len]
Now, after the socket is opened, I use chan configure $sock -translation binary to read the component bits of the XOR frame, but I'm confused about -buffering and -buffersize.
I also changed the chan event handler so that it no longer reads a full line but calls chan read numChars instead; however, the readable event seems to fire for every character, or again after each character is read.
Should the various segments of bits be read directly from the channel or should larger pieces be read from the channel into variables and then the bits separated from those pieces?
What is the proper channel configuration in order to read the bits in a controlled manner?
Also, the documentation at https://www.tcl.tk/man/tcl/TclCmd/chan.html#M35 says that in non-blocking mode chan read may not read all the requested characters. What should be done then? Count them and read again until I have them all?
Thank you.
The -buffering and -buffersize are options used to manage the output side of the channel, i.e., when you write data to the socket with puts (or chan puts; it's an alternate name for the same thing). They're not used for input.
When you have the channel in binary mode, the characters you read and write correspond one-to-one with the bytes. You probably shouldn't use gets (chan gets) on binary data; read (chan read) is more likely to be appropriate. (For writing, the -nonewline option to puts is virtually mandatory.)
When you read a non-blocking channel with a number of characters/bytes requested, you can get up to that amount of data. If the request can be satisfied with what is in the read buffer, that is used and no request to the underlying file descriptor is done. If the request can be partially satisfied with buffered data, that's used first and only then is a request done for more data; if that request produces more data than needed, the excess is stored in the buffer (you can see how much with chan pending, but that's not normally important for binary channels). However, if that one non-blocking request does not deliver enough data to give you what you asked for, read returns anyway: you have a short read.
Short reads don't necessarily mean that you're at the end of the channel; use chan eof and chan blocked to find out more (especially if you get the special case of a zero-length read). Being blocked might also not mean that you're at the end of a message within a higher-level protocol; more data may be coming, but it hasn't reached the OS yet (which is why you need a framing protocol on top of TCP; websockets are one such framing protocol).
Counting the data is easy: string length.
tl;dr: In non-blocking mode, the maximum amount that read of a binary channel can return is whatever is currently in the input buffers plus whatever is obtained from one non-blocking read of the file descriptor. In blocking mode, read will wait until the requested amount of data is available or definitely not available (end-of-file), performing multiple reads of the file descriptor if necessary.
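The "read larger pieces, then separate the bits" approach can be sketched as follows. This is an illustration in Python rather than Tcl, and it assumes the frame layout from RFC 6455 (two header bytes, an optional extended length, an optional 4-byte mask key, then the payload). The function name try_parse_frame is hypothetical. The idea is to append every short read to a buffer and call the parser again; it returns None until a complete frame has accumulated.

```python
import struct
from typing import Optional, Tuple


def try_parse_frame(buf: bytes) -> Optional[Tuple[bool, int, bytes, int]]:
    """Return (fin, opcode, payload, bytes_consumed), or None if buf is incomplete.

    Assumes the RFC 6455 frame layout; extensions and control-frame rules
    are not handled in this sketch.
    """
    if len(buf) < 2:
        return None
    fin = bool(buf[0] & 0x80)       # top bit of the first byte
    opcode = buf[0] & 0x0F          # low four bits of the first byte
    masked = bool(buf[1] & 0x80)    # top bit of the second byte
    length = buf[1] & 0x7F          # low seven bits of the second byte
    pos = 2
    if length == 126:               # 16-bit extended length follows
        if len(buf) < pos + 2:
            return None
        (length,) = struct.unpack("!H", buf[pos:pos + 2])
        pos += 2
    elif length == 127:             # 64-bit extended length follows
        if len(buf) < pos + 8:
            return None
        (length,) = struct.unpack("!Q", buf[pos:pos + 8])
        pos += 8
    key = b""
    if masked:
        if len(buf) < pos + 4:
            return None
        key = buf[pos:pos + 4]
        pos += 4
    if len(buf) < pos + length:
        return None                 # short read so far; wait for more data
    payload = buf[pos:pos + length]
    if masked:
        # XOR each payload byte with the mask key, cycling through its 4 bytes.
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return fin, opcode, payload, pos + length


# The masked "Hello" text frame used as an example in RFC 6455.
frame = bytes.fromhex("818537fa213d7f9f4d5158")
partial = try_parse_frame(frame[:5])   # not enough bytes yet
complete = try_parse_frame(frame)
```

After a successful parse, the caller would drop bytes_consumed bytes from the front of its buffer and try again, since one read may deliver more than one frame.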

expect: how to send an EOF to a spawned process

I have a program that reads from stdin and processes the input (like "tee /some/file").
This program waits for stdin to end before it exits.
If I spawn it from Expect, how do I send an "EOF" to the program after I have sent a lot of content?
There is a close command in Expect, but it also sends a SIGHUP, and I can no longer expect program output afterwards.
Expect works (on non-Windows) by using a virtual terminal which the spawned program runs within. This means that you can do things by sending character sequences to simulate keys. In particular, the EOF control sequence is done with Ctrl+D, which becomes the character U+000004. The terminal processes this to turn it into a true EOF.
There are a few ways to write it, depending on which escape sequence you prefer, but any one of these will work:
# Hexadecimal-encoded escape
send \x04
# Octal-encoded escape
send \004
# UNICODE escape (also hexadecimal)
send \u0004
# Generate by a command
send [format "%c" 4]
When Expect is using Tcl 8.6, these all generate the same bytecode so use whichever you prefer.
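The same terminal mechanism can be observed outside Expect. The following is a rough sketch using Python's pty module instead of Expect; it assumes a Unix-like system with cat on the PATH, and the helper name run_with_eof is hypothetical. The child reads from a pseudo-terminal, so writing the byte 0x04 at the start of a line makes the terminal report end-of-file while the child's output stays readable.

```python
import os
import pty
import subprocess


def run_with_eof(data: bytes) -> bytes:
    """Feed data to `cat` through a pseudo-terminal, then end its input with Ctrl+D."""
    master_fd, slave_fd = pty.openpty()
    # Only stdin is attached to the terminal; stdout is an ordinary pipe.
    proc = subprocess.Popen(["cat"], stdin=slave_fd, stdout=subprocess.PIPE)
    os.close(slave_fd)                    # the child keeps its own copy
    try:
        os.write(master_fd, data)         # data is expected to end with a newline
        os.write(master_fd, b"\x04")      # Ctrl+D at the start of a line means EOF
        output, _ = proc.communicate(timeout=10)
    finally:
        os.close(master_fd)
    return output


result = run_with_eof(b"hello\n")
```

Note that the 0x04 byte only acts as EOF when it arrives at the start of a line, which is why the data here ends with a newline.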

Read and parse a >400MB .json file in Julia without crashing kernel

The following is crashing my Julia kernel. Is there a better way to read and parse a large (>400 MB) JSON file?
using JSON
data = JSON.parsefile("file.json")
Unless some effort is invested into making a smarter JSON parser, the following might work: there is a good chance file.json has many lines. In that case, reading the file and parsing the big repetitive JSON section line-by-line or chunk-by-chunk (for the right chunk length) could do the trick. One possible way to code this would be:
using JSON
thedata = []              # parsed data items are collected here
f = open("file.json", "r")
discard_lines = 12        # lines up to repetitive part
important_chunks = 1000   # number of data items
chunk_length = 2          # each data item has a 2-line JSON chunk
for i = 1:discard_lines
    readline(f)           # skip the lines before the repetitive part
end
for i = 1:important_chunks
    chunk = join([readline(f) for j = 1:chunk_length])
    push!(thedata, JSON.parse(chunk))
end
close(f)
# use thedata
There is a good chance this could be a temporary stopgap solution for your problem. Inspect file.json to find out.
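For comparison, the same chunking idea can be sketched in Python. This is only an illustration, not a Julia solution: the function name parse_chunks, the sample input, and the assumption that items are separated by a trailing comma at the end of each chunk are all hypothetical and would need to be adapted after inspecting the real file.

```python
import io
import json
from itertools import islice


def parse_chunks(lines, discard_lines, chunk_length):
    """Yield one parsed JSON value per group of chunk_length lines."""
    it = iter(lines)
    # Skip the lines that precede the repetitive part.
    for _ in islice(it, discard_lines):
        pass
    while True:
        chunk = list(islice(it, chunk_length))
        if len(chunk) < chunk_length:
            break  # ran out of complete chunks
        # Assumption: a chunk may end with the comma that separates array items.
        text = "".join(chunk).strip().rstrip(",")
        yield json.loads(text)


# A tiny stand-in for the large file: one header line, then 2-line items.
sample = io.StringIO(
    '{"items": [\n'
    '{"id": 1,\n "v": "a"},\n'
    '{"id": 2,\n "v": "b"}\n'
    ']}\n'
)
items = list(parse_chunks(sample, discard_lines=1, chunk_length=2))
```

Because the function is a generator, only one chunk is held in memory at a time; collecting the results into a list, as done here for the small sample, gives that advantage up.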

How do I use a shell-script as Chrome Native Messaging host application

How do you process a Chrome Native Messaging API-call with a bash script?
I succeeded in doing it with python with this example
Sure I can call bash from the python code with subprocess, but is it possible to skip python and process the message in bash directly?
The problematic part is reading the JSON serialized message into a variable. The message is serialized using JSON, UTF-8 encoded and is preceded with 32-bit message length in native byte order through stdin.
echo $* only outputs:
chrome-extension://knldjmfmopnpolahpmmgbagdohdnhkik/
Also something like
read
echo $REPLY
doesn't output anything. No sign of the JSON message. Python uses struct.unpack for this. Can that be done in bash?
I suggest not using (bash) shell scripts as a native messaging host, because bash is too limited to be useful.
read without any parameters reads a whole line before terminating, while the native messaging protocol specifies that the first four bytes specify the length of the following message (in native byte order).
Bash is a terrible tool for processing binary data. An improved version of your read command would specify the -n N parameter to stop reading after N characters (note: not bytes) and -r to disable backslash-escape processing. E.g. the following would store the first four characters in a variable called var_prefix:
IFS= read -rn 4 var_prefix
Even if you assume that this stores the first four bytes in the variable (it does not!), you then have to convert the bytes to an integer. Did I already mention that bash automatically drops all NUL bytes? This characteristic makes Bash utterly worthless as a fully capable native messaging host.
You could cope with this shortcoming by ignoring the first few bytes, and start parsing the result when you spot a { character, the beginning of the JSON-formatted request. After this, you have to read all input until the end of the input is found. You need a JSON parser that stops reading input when it encounters the end of the JSON string. Good luck with writing that.
Generating output is easier: just use echo -n or printf.
Here is a minimal example that assumes that the input ends with a }, reads it (without processing) and replies with a result. Although this demo works, I strongly recommend not using bash, but a richer (scripting) language such as Python or C++.
#!/bin/bash
# Loop forever, to deal with chrome.runtime.connectNative
while IFS= read -r -n1 c; do
    # Read the first message
    # Assuming that the message ALWAYS ends with a },
    # with no }s in the string. Adapt this piece of code if needed.
    if [ "$c" != '}' ] ; then
        continue
    fi
    message='{"message": "Hello world!"}'
    # Calculate the byte size of the string.
    # NOTE: This assumes that byte length is identical to the string length!
    # Do not use multibyte (unicode) characters, escape them instead, e.g.
    # message='"Some unicode character:\u1234"'
    messagelen=${#message}
    # Convert to an integer in native byte order.
    # If you see an error message in Chrome's stdout with
    # "Native Messaging host tried sending a message that is ... bytes long.",
    # then just swap the order, i.e. messagelen1 <-> messagelen4 and
    # messagelen2 <-> messagelen3
    messagelen1=$(( ($messagelen ) & 0xFF ))
    messagelen2=$(( ($messagelen >> 8) & 0xFF ))
    messagelen3=$(( ($messagelen >> 16) & 0xFF ))
    messagelen4=$(( ($messagelen >> 24) & 0xFF ))
    # Print the message byte length followed by the actual message.
    printf "$(printf '\\x%x\\x%x\\x%x\\x%x' \
        $messagelen1 $messagelen2 $messagelen3 $messagelen4)%s" "$message"
done
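For contrast, the length-prefixed framing described in the question is straightforward in Python with struct. The sketch below is not the example the question refers to; the function names read_message and write_message are hypothetical, and an in-memory stream stands in for stdin/stdout so the round trip can be seen. It assumes the format stated above: a 32-bit length in native byte order followed by UTF-8 encoded JSON.

```python
import io
import json
import struct


def read_message(stream):
    """Read one length-prefixed JSON message from a binary stream.

    Returns None when the stream ends before a full length prefix arrives.
    """
    raw_length = stream.read(4)
    if len(raw_length) < 4:
        return None
    # "=I" is an unsigned 32-bit integer in native byte order.
    (length,) = struct.unpack("=I", raw_length)
    return json.loads(stream.read(length).decode("utf-8"))


def write_message(stream, obj):
    """Write obj as UTF-8 JSON, preceded by its byte length."""
    body = json.dumps(obj).encode("utf-8")
    stream.write(struct.pack("=I", len(body)))
    stream.write(body)
    stream.flush()


# Round trip through an in-memory stream standing in for stdin/stdout.
pipe = io.BytesIO()
write_message(pipe, {"message": "Hello world!"})
pipe.seek(0)
first = read_message(pipe)
second = read_message(pipe)   # nothing left to read
```

In an actual host, the streams would be the binary ones, sys.stdin.buffer and sys.stdout.buffer, so that no text decoding interferes with the length prefix.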

Convert io.BytesIO to io.StringIO to parse HTML page

I'm trying to parse an HTML page I retrieved through pyCurl, but the pyCurl WRITEFUNCTION returns the page as bytes rather than a string, so I'm unable to parse it using BeautifulSoup.
Is there any way to convert io.BytesIO to io.StringIO?
Or is there any other way to parse the HTML page?
I'm using Python 3.3.2.
The code in the accepted answer reads the entire stream before decoding it. A better way is to wrap one stream in another, so that the data can be read and decoded chunk by chunk.
import io
# Initialize a read buffer
byte_stream = io.BytesIO(
    b'Initial value for read buffer with unicode characters ' +
    'ÁÇÊ'.encode('utf-8')
)
# Wrap the byte stream so that reads return decoded text
wrapper = io.TextIOWrapper(byte_stream, encoding='utf-8')
# Read from the buffer
print(wrapper.read())
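To make the chunk-by-chunk point concrete, here is a small sketch that reads through the wrapper a few characters at a time. The helper name iter_text_chunks and the chunk size are hypothetical. The point is that a size passed to read on the wrapper counts characters, not bytes, so multi-byte characters are never split.

```python
import io


def iter_text_chunks(byte_stream, chunk_chars=8, encoding="utf-8"):
    """Yield decoded text from a binary stream, at most chunk_chars characters at a time."""
    wrapper = io.TextIOWrapper(byte_stream, encoding=encoding)
    while True:
        chunk = wrapper.read(chunk_chars)   # size is in characters, not bytes
        if not chunk:
            break                           # empty string means end of stream
        yield chunk


text = "ÁÇÊ and more text"
source = io.BytesIO(text.encode("utf-8"))
chunks = list(iter_text_chunks(source, chunk_chars=4))
```

One thing to keep in mind is that the wrapper takes ownership of the underlying stream and closes it when the wrapper itself is closed or garbage collected.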
A naive approach:
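# assume bytes_io is a `BytesIO` object
# getvalue() returns the whole buffer regardless of the current position;
# read() would return b'' if the position is already at the end after writing
byte_str = bytes_io.getvalue()
# Convert to a "unicode" object
text_obj = byte_str.decode('UTF-8') # Or use the encoding you expect
# Use text_obj how you see fit!
# io.StringIO(text_obj) will get you to a StringIO object if that's what you need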