Why is the stream blocking in this iex session? - csv

I'm trying to parse some CSVs using elixir:
iex -S mix
Erlang/OTP 18 [erts-7.2] [source-e6dd627] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Interactive Elixir (1.3.0-dev) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> a = File.stream!("test/lib/fit_notes/fit_notes_export.csv") |> CSV.decode
#Function<49.97003610/2 in Stream.transform/3>
iex(2)> Stream.take(a, 1)
#Stream<[enum: #Function<49.97003610/2 in Stream.transform/3>,
funs: [#Function<38.97003610/1 in Stream.take/2>]]>
iex(3)> Enum.take(a, 1)
[["Date", "Exercise", "Category", "Weight (kgs)", "Reps", "Distance",
"Distance Unit", "Time"]]
iex(4)> Enum.take(a, 2)
^ this just blocks
The first Enum.take that I issue works, the second blocks forever. Can you tell me what am I doing wrong? I'm using this library for CSV parsing.

If you follow the example from http://hexdocs.pm/csv/CSV.html
your code would end up something like this:
a = File.stream!("test/lib/fit_notes/fit_notes_export.csv") |> CSV.decode |>
Stream.take(1) |>
Stream.take(1) |>
Enum.take(2)
Note I changed your first Enum.take(1) in to a Stream.take(1) so that the Stream doesn't get prematurely terminated. Also note doing two Stream.take(1) will be better converted into a single Stream.take(2). Also note how the stream piping works by adding a |> to the end of the each line until you reach an Enum call - which then fires the whole operation.
Added:
For Streams with side-effects (like logging) see
https://github.com/elixir-lang/elixir/issues/1831 where they recommend Stream.each

Related

cythonize under py3.6.4 Cannot convert 'basestring' object to bytes implicitly. This is not portable

This code snippet works just fine under python 3.6.4 but is triggering a portability issue when present in .pyx files. I could use some help figuring out how to best format python 3.5.1+ bytes in Cython.
EDIT: Changing this in light of DavidW's comment.
Following works in python 3.6.4 under ipython
def py_foo():
bytes_1 = b'bytes 1'
bytes_2 = b'bytes 2'
return b'%(bytes_1)b %(bytes_2)b' % {
b'bytes_1': bytes_1,
b'bytes_2': bytes_2}
As hoped this results in:
print(py_foo())
b'bytes 1 bytes 2'
Using cython with the only changes to the code being the name of the function, a return type declared, and declaring the two variables.
%load_ext Cython
# Cython==0.28
followed by:
%%cython
cpdef bytes cy_foo():
cdef:
bytes bytes_1, bytes_2
bytes_1 = b'bytes 1'
bytes_2 = b'bytes 2'
return b'%(bytes_1)b %(bytes_2)b' % {
b'bytes_1': bytes_1,
b'bytes_2': bytes_2}
Results in:
Error compiling Cython file:
....
return b'%(bytes_1)b %(bytes_2)b' % {
^
..._cython_magic_b0aa5be86bdfdf75b98df1af1a2394af.pyx:7:38: Cannot convert 'basestring' object to bytes implicitly. This is not portable.
-djv
I'm not sure if this is a useful answer or just a more detailed diagnosis, but: the issue is with the return type. If you do:
cpdef cy_foo1(): # no return type specified
# everything else exactly the same
then it's happy. If you do
cpdef bytes cy_foo2():
# everything else the same
return bytes(b'%(bytes_1)b %(bytes_2)b' % {
b'bytes_1': bytes_1,
b'bytes_2': bytes_2})
then it's happy. If you do
def mystery_function_that_returns_not_bytes():
return 1
cpdef bytes cy_foo3():
return mystery_function_that_returns_not_bytes()
then it compiles happily but gives a runtime exception (as you would expect)
The issue seems to be that it knows bytes % something returns a basestring but it isn't confident that it returns bytes and isn't prepared to leave it until runtime to try (unlike the cases where it's totally sure, or completely unsure, when it will leave it until runtime).
The above examples show a couple of ways of working round it. Personally, I'd just remove the return type - you don't get a lot of benefit from typing Python objects such as bytes anyway. You should probably also report this as a bug to https://github.com/cython/cython/issues

How to stream a large CSV response from a compojure API so that the whole response is not held in memory at once?

I'm new to using compojure, but have been enjoying using it so far. I'm
currently encountering a problem in one of my API endpoints that is generating
a large CSV file from the database and then passing this as the response body.
The problem I seem to be encountering is that the whole CSV file is being kept
in memory which is then causing an out of memory error in the API. What is the
best way to handle and generate this, ideally as a gzipped file? Is it possible
to stream the response so that a few thousand rows are returned at a time? When
I return a JSON response body for the same data, there is no problem returning
this.
Here is the current code I'm using to return this:
(defn complete
"Returns metrics for each completed benchmark instance"
[db-client response-format]
(let [benchmarks (completed-benchmark-metrics {} db-client)]
(case response-format
:json (json-grouped-output field-mappings benchmarks)
:csv (csv-output benchmarks))))
(defn csv-output [data-seq]
(let [header (map name (keys (first data-seq)))
out (java.io.StringWriter.)
write #(csv/write-csv out (list %))]
(write header)
(dorun (map (comp write vals) data-seq))
(.toString out)))
The data-seq is the results returned from the database, which I think is a
lazy sequence. I'm using yesql to perform the database call.
Here is my compojure resource for this API endpoint:
(defresource results-complete [db]
:available-media-types ["application/json" "text/csv"]
:allowed-methods [:get]
:handle-ok (fn [request]
(let [response-format (keyword (get-in request [:request :params :format] :json))
disposition (str "attachment; filename=\"nucleotides_benchmark_metrics." (name response-format) "\"")
response {:headers {"Content-Type" (content-types response-format)
"Content-Disposition" disposition}
:body (results/complete db response-format)}]
(ring-response response))))
Thanks to all the suggestion that were provided in this thread, I was able to create a solution using piped-input-stream:
(defn csv-output [data-seq]
(let [headers (map name (keys (first data-seq)))
rows (map vals data-seq)
stream-csv (fn [out] (csv/write-csv out (cons headers rows))
(.flush out))]
(piped-input-stream #(stream-csv (io/make-writer % {})))))
This differs from my solution because it does not realise the sequence using dorun and does not create a large String object either. This instead writes to a PipedInputStream connection asynchronously as described by the documentation:
Create an input stream from a function that takes an output stream as its
argument. The function will be executed in a separate thread. The stream
will be automatically closed after the function finishes.
Your csv-output function completely realises the dataset and turns it into a string. To lazily stream the data, you'll need to return something other than a concrete data type like a String. This suggests ring supports returning a stream, that can be lazily realised by Jetty. The answer to this question might prove useful.
I was also struggling with the streaming of large csv file. My solution was to use httpkit-channel to stream every single line of the data-seq to the client and then close the channel. My solution looks like that:
[org.httpkit.server :refer :all]
(fn handler [req]
(with-channel req channel (let [header "your$header"
data-seq ["your$seq-data"]]
(doseq [line (cons header data-seq)]
(send! channel
{:status 200
:headers {"Content-Type" "text/csv"}
:body (str line "\n")}
false))
(close channel))))

vimscript: commands that work in mappings, but not in functions

How can I rewrite these 2 commands, which work fine in a mapping, so that they will work in a function?
:if has_key(glos,#g)==1<cr>:let #j=eval('glos.'.#g)<cr>
The function concerned is executed by vim without comment, but #j remains unchanged, as if they had failed, but no message/error is generated.
Here is the complete code involved, the command that loads the dictionary, the function that does not work, and the mapping from that function that does.
" read the glossary into the dictionary, glos
let glos=eval(join(readfile("glossary.dict")))
" 2click item of interest and this will
" send image filepath to xv
" if item all-caps find same at start of its line
" If capitalized at eol find same at start of its line
" if all lower-case at eol find next occurrence of same
" look lower-case or capitalized word up in glossary.txt
" find _\d\+ (page no.) alone on its line in text
com! F call F()
function! F()
normal "ayiw"cyE"by$
let #c=substitute(#c,"[,.?':;!]\+","","g")
if #c=~'images\/ss\d\d\d*'
let #i='!display -geometry +0+0 '.#c.' &'
pkill display
#i
elseif #c==toupper(#c)
let #n=search('^'.#c,'sw')
elseif #c!=#b
let #f=3
let #g=tolower(#c)
while #f>0
try
let #j=eval('glos.'.#g)
catch
let #f=#f-1
let #g=strpart(#g,0,strlen(#g)-1)
continue
endtry
break
endwh
if #f>0
let #h=substitute(#j," glosimgs.*",'','')
if #h!=#j
let #i='!xv -geometry +0+380 '.substitute(#j,'^.\{-}\( glosimgs.*\)$','\1','').' &'
!pkill xv
#i
endif
echo #h
else
echo 'No matching entry for '.#c
endif
elseif #c=~'\u\l\+$'
let #n=search('^'.#c,'sw')
elseif #c=~'\l\+$'
norm *
elseif #c=~'^_\w\+$'
let #/='^'.#c.'$'
norm nzz
endif
endfunction
map <silent> <2-LeftMouse> "ayiw"cyE"by$:let #c=substitute(#c,"[,.?':;!]\+","","g")<cr>:if #c=~'images\/ss\d\d\d*'<cr>:let #i='!display -geometry +0+0 '.#c.' &'<cr>:pkill display<cr>:#i<cr>:elseif #c==toupper(#c)<cr>:let #n=search('^'.#c,'sw')<cr>:elseif #c!=#b<cr>:let #f=3<cr>:let #g=tolower(#c)<cr>:while #f>0<cr>:try<cr>:let #j=eval('glos["'.#g.'"]')<cr>:catch<cr>:let #f=#f-1<cr>:let #g=strpart(#g,0,strlen(#g)-1)<cr>:continue<cr>:endtry<cr>:break<cr>:endwh<cr>:if #f>0<cr>:let #h=substitute(#j," glosimgs.*",'','')<cr>:if #h!=#j<cr>:let #i='!xv -geometry +0+380 '.substitute(#j,'^.\{-}\( glosimgs.*\)$','\1','').' &'<cr>:!pkill xv<cr>:#i<cr>:endif<cr><cr<cr>>:echo #h<cr>:else<cr>:echo 'No matching entry for '.#c<cr>:endif<cr>:elseif #c=~'\u\l\+$'<cr>:let #n=search('^'.#c,'sw')<cr>:elseif #c=~'\l\+$'<cr>:norm *<cr>:elseif #c=~'^_\w\+$'<cr>:let #/='^'.#c.'$'<cr>:norm nzz<cr>:endif<cr>
Specifically, I should have written:
:if has_key(**g:**glos,#g)==1:let #j=eval('**g:**glos.'.#g)
:h g: goes straight to the heart of the matter; in a function all references are local to that function, so references to any variable outside the function must be global, by prepending 'g:' to the variable name. As I created the dictionary independent of the function, the function must reference it as a global item.
Of course, if you are not aware of 'g:', it is rather difficult to find that help reference, but that's a frequent problem using help.
And, of course, the ** surrounding g: aren't required, that's what this site gives you in lieu of bolded text, apparently.

Run R silently from command line, export results to JSON

How might I call an R script from the shell (e.g. from Node.js exec) and export results as JSON (e.g. back to Node.js)?
The R code below basically works. It reads data, fits a model, converts the parameter estimates to JSON, and prints them to stdout:
#!/usr/bin/Rscript --quiet --slave
install.packages("cut", repos="http://cran.rstudio.com/");
install.packages("Hmisc", repos="http://cran.rstudio.com/");
install.packages("rjson", repos="http://cran.rstudio.com/");
library(rjson)
library(reshape2);
data = read.csv("/data/records.csv", header = TRUE, sep=",");
mylogit <- glm( y ~ x1 + x2 + x3, data=data, family="binomial");
params <- melt(mylogit$coefficients);
json <- toJSON(params);
json
Here's how I'd like to call it from Node...
var exec = require('child_process').exec;
exec('./model.R', function(err, stdout, stderr) {
var params = JSON.parse(stdout); // FAIL! Too much junk in stdout
});
Except the R process won't stop printing to stdout. I've tried --quiet --slave --silent which all help a little but not enough. Here's what's sent to stdout:
The downloaded binary packages are in
/var/folders/tq/frvmq0kx4m1gydw26pcxgm7w0000gn/T//Rtmpyk7GmN/downloaded_packages
The downloaded binary packages are in
/var/folders/tq/frvmq0kx4m1gydw26pcxgm7w0000gn/T//Rtmpyk7GmN/downloaded_packages
[1] "{\"value\":[4.04458733165933,0.253895751245782,-0.1142272181932,0.153106007464742,-0.00289013062471735,-0.00282580664375527,0.0970325223603164,-0.0906967639834928,0.117150317941983,0.046131890754108,6.48538603593323e-06,6.70646151749708e-06,-0.221173770066275,-0.232262366060079,0.163331098409235]}"
What's the best way to use R scripts on the command line?
Running R --silent --slave CMD BATCH model.R per the post below still results in a lot of extraneous text printed to model.Rout:
Run R script from command line
Those options only stop R's own system messages from printing, they won't stop another R function doing some printing. Otherwise you'll stop your last line from printing and you won't get your json to stdout!
Those messages are coming from install.packages, so try:
install.packages(-whatever-, quiet=TRUE)
which claims to reduce the amount of output. If it reduces it to zero, job done.
If not, then you can redirect stdout with sink, or run things inside capture.output.

How do I get an unhandled exception to be reported in SML/NJ?

I have the following SML program in a file named testexc.sml:
structure TestExc : sig
val main : (string * string list -> int)
end =
struct
exception OhNoes;
fun main(prog_name, args) = (
raise OhNoes
)
end
I build it with smlnj-110.74 like this:
ml-build sources.cm TestExc.main testimg
Where sources.cm contains:
Group is
csx.sml
I invoke the program like so (on Mac OS 10.8):
sml #SMLload testimg.x86-darwin
I expect to see something when I invoke the program, but the only thing I get is a return code of 1:
$ sml #SMLload testimg.x86-darwin
$ echo $?
1
What gives? Why would SML fail silently on this unhandled exception? Is this behavior normal? Is there some generic handler I can put on main that will print the error that occurred? I realize I can match exception OhNoes, but what about larger programs with exceptions I might not know about?
The answer is to handle the exception, call it e, and print the data using a couple functions available in the system:
$ sml
Standard ML of New Jersey v110.74 [built: Tue Jan 31 16:23:10 2012]
- exnName;
val it = fn : exn -> string
- exnMessage;
val it = fn : exn -> string
-
Now, we have our modified program, were we have the generic handler tacked on to main():
structure TestExc : sig
val main : (string * string list -> int)
end =
struct
exception OhNoes;
open List;
fun exnToString(e) =
List.foldr (op ^) "" ["[",
exnName e,
" ",
exnMessage e,
"]"]
fun main(prog_name, args) = (
raise OhNoes
)
handle e => (
print("Grasshopper disassemble: " ^ exnToString(e));
42)
end
I used lists for generating the message, so to make this program build, you'll need a reference to the basis library in sources.cm:
Group is
$/basis.cm
sources.cm
And here's what it looks like when we run it:
$ sml #SMLload testimg.x86-darwin
Grasshopper disassemble: [OhNoes OhNoes (more info unavailable: ExnInfoHook not initialized)]
$ echo $?
42
I don't know what ExnInfoHook is, but I see OhNoes, at least. It's too bad the SML compiler didn't add a basic handler for us, so as to print something when there was an unhandled exception in the compiled program. I suspect ml-build would be responsible for that task.