Parallel MySQL access with Scheme/Ypsilon engines - mysql

I'm trying to achieve parallell MySQL access with Scheme engines (light threads, as I've come to understand it).
My problem is, I don't know how to interact with C properly, thus can't take care of the results from the MySQL C API. This could also be a question about integrating C and Scheme.
My choice of Ypsilon is because it's fast (faster than Spark at least) and R6RS. Choosing another Scheme implementation is only relevant if it's R6RS and has speed comparable to Ypsilon.
This file bind the C API:
http://code.google.com/p/ypsilon/source/browse/trunk/sitelib/ypsilon/mysql.scm?spec=svn444&r=444
Top make it compile, I had to change all int8_t to int, unsigned-int to int, etc. There's a lot of C types I miss in my Ypsilon library, that is (it's the latest complete distribution, still).
My book about Scheme, The Scheme Programming Language by Dybvig (4:th edition) sais nothing about C integration.
The function mysql_fetch_row returns a MYSQL_ROW struct. This page present a way to handle this:
(google for) fixedpoint mysql-client-in-ypsilon.html
I cannot compile this. Have to check why.
This is used:
make-bytevector-mapping = Provides transparent access to an arbitrary memory block
In its FFI, Ypsilon has a function called define-c-struct-type:
http://www.littlewing.co.jp/ypsilon/doc-draft/libref.ypsilon.c-types.html#define-c-struct-methods
I haven't tried that yet. I'll be back with more information. If anyone has succeeded with something like this, please let me know.
Regards
Olle
My code so far:
(import (core)
(ypsilon ffi)
(ypsilon c-types)
(ypsilon mysql))
(define NULL 0)
(define host "localhost")
(define user "root")
(define passwd "bla")
(define database "bla")
(define mysql)
(define query "select bla from bla")
(begin
(set! mysql (mysql_init #f))
(mysql_real_connect mysql host user passwd database 0 #f 0)
(display (mysql_stat mysql))
(newline)
(display (string-append "mysql_query:\t" (number->string (mysql_query mysql query))))
(newline)
(let ([result (mysql_store_result mysql)])
(let rec ([n (mysql_num_fields result)] [row (mysql_fetch_row result)] [lengths (mysql_fetch_lengths result)])
(display (string-append "num_fields:\t" (number->string n)))
(newline)
(display (string-append "lengths:\t" (number->string lengths)))
(newline)
(display (string-append "row:\t\t" (number->string row)))
(newline)
(if (= row (- 0 1))
""
(let ([row (make-bytevector-mapping row (bytevector-c-int-ref (make-bytevector-mapping lengths sizeof:int)) (* 10 sizeof:int))])
(display (utf8->string row))
(newline)
(newline)
(newline)
(rec n (- lengths 1) (mysql_fetch_row result))
)))
(newline)))
This will output:
Uptime: 1203 Threads: 1 Questions: 204 Slow queries: 0 Opens: 569 Flush tables: 1 Open tables: 64 Queries per second avg: 0.169
mysql_query: 0
num_fields: 1
lengths: 8525640
row: 8541880
error in bytevector-c-int-ref: expected 2, but 1 argument given
irritants:
(#<bytevector-mapping 0x821748 4>)
backtrace:
0 (bytevector-c-int-ref (make-bytevector-mapping lengths sizeof:int))
..."/home/olle/scheme/div.scm" line 54
1 (rec n (- lengths 1) (mysql_fetch_row result))
..."/home/olle/scheme/div.scm" line 63
2 (let rec ((n (mysql_num_fields result)) (row (mysql_fetch_row result)) (lengths ...) ...) ...)
..."/home/olle/scheme/div.scm" line 37

I looked at the documentation for the for bytevector-c-int-ref the first thing that jumped out to me was that it requires two arguments (like the error says) the second being a non-negative byte offset. Does it run with that change?

Related

LISP - Fast output of byte array

I am making a compiler for a language in LISP, and the overall aim is for the compiler to produce LISP code from the original language. Trying to measure the performance of the generated code, I found that it is severely lacking in printing strings.
In the original language, characters are byte -arithmetic- values, thus strings are arrays of bytes, and the value of the byte corresponds to the character whose value is the ascii-code of the byte. A "printable" byte array must be null-terminated. Thus, to print a byte array as a character string, I have to map the original array's elements into characters before printing it. The function that handles this is as follows:
(defun writeString (X &AUX (NPOS 0) (i 0))
(declare (type (simple-VECTOR fixnum *) x))
(declare (type fixnum NPOS i))
(SETF NPOS (POSITION 0 X))
(IF (NOT NPOS)
(SETF NPOS (LENGTH X)))
(princ (MAKE-ARRAY NPOS
:INITIAL-CONTENTS (map 'vector
#'code-char
(SUBSEQ X 0 NPOS))
:ELEMENT-TYPE 'base-char)))
and it is injected into the generated code.
Running a sample code with time, I found out that the princ part results in a lot of consing during execution, which slows things down. When in the place of make-array... a static string is put, there is no slowdown and no consing, so I guess that's the part the damage is done.
While compiling, I have set flags full on speed, the byte values are declared as fixnum for now in the generated code.
Can anyone point me to a better way to print my byte array as character string while avoiding excessive consing?
I could store bytes as characters from the get-go, but that would result in the parts of the language that treat them as numbers being slower due to the need to convert.
Problems in your code
Your code:
(defun writeString (X &AUX (NPOS 0) (i 0))
(declare (type (simple-VECTOR fixnum *) x))
(declare (type fixnum NPOS i))
(SETF NPOS (POSITION 0 X))
(IF (NOT NPOS)
(SETF NPOS (LENGTH X)))
(princ (MAKE-ARRAY NPOS
:INITIAL-CONTENTS (map 'vector
#'code-char
(SUBSEQ X 0 NPOS))
:ELEMENT-TYPE 'base-char)))
There are a couple of mistakes in the code:
i is not used
the first type declaration is syntactical not valid
the declaration for NPOS is wrong. You define it as FIXNUM, but it can be NIL.
There are a bunch of programming mistakes:
there is no need to allocate any array if all you want is to output characters.
even if you want to generate an array, one can do it once
X is not a good name for a string
A simple solution:
(defun writestring (bytestring)
(loop for byte across bytestring
while (plusp byte)
do (write-char (code-char byte))))
A type declared version could be:
(defun writestring (bytestring)
(declare (vector bytestring))
(loop for byte of-type (integer 0 255) across bytestring
while (plusp byte)
do (write-char (code-char byte))))
Instead (integer 0 255) one can also use (unsigned-byte 8).
About generating vectors:
Let's also look how you try to create the array:
You create an array with make-array, using the contents from another array.
Why not tell MAP to generate the correct array?
CL-USER 46 > (map '(vector base-char) #'code-char #(102 111 111 98 97 114))
"foobar"
Now if you want to allocate arrays for some reason:
do it once
map the content into the generated array. Use map-into for that. It will stop with the shorter sequence.
Example:
CL-USER 48 > (let ((bytestring #(102 111 111 98 97 114 0 100 100 100)))
(map-into (make-array (or (position 0 bytestring)
(length bytestring))
:element-type 'base-char)
#'code-char
bytestring))
"foobar"
You could rely on write-sequence, which hopefully is optimized to write a sequence of characters or bytes. It also accepts an :end argument which is useful for delimiting the end of the written string.
I doubt you really need to use literal vectors (which are always simple-vector), but if so you maybe want to change them. You can do it at read-time:
(let ((input #.(coerce #(102 111 111 98 97 114 0 100 100 100)
'(vector (mod 256)))))
(write-sequence (map '(vector base-char)
#'code-char
input)
*standard-output*
:end (position 0 input)))
I never used something like the following, but you could also open the same file in both character and byte mode, and switch whenever necessary:
(with-open-file (out-c #P"/tmp/test"
:if-exists :supersede
:direction :output)
(with-open-file (out-8 #P"/tmp/test"
:element-type '(unsigned-byte 8)
:direction :output
:if-exists :append)
(format out-c "Hello [")
(file-position out-8 (file-position out-c))
(write-sequence #(102 111 111 98 97 114) out-8)
(file-position out-c (file-position out-8))
(format out-c "]")))
It prints "Hello [foobar]" in /tmp/test, and it seems to work with multibyte characters, but you probably need to test that more.

Recursion in Lisp and making my own length function

I am trying to make my own length/2 function(which allows you to find the length of a list) in lisp and am having an issue.
If I were to program this in java I would create a global index variable
that = 0
Then in the method I would do
if(list.equal(null))
{
return index;
}
else
{
index++;
return functioname(tail of list) ;
}.
Obviously this is not actual java syntax but I am just trying to relay the logic I want to apply in lisp.
My main issue is that ifs in lisp only allow you to do
if test expression
then do something
else
do something else
while I am trying to do
if test expression
then do something
else
do 2x something
Is there a way I can accomplish this in lisp or is there a better way to go about this problem?
Do it recursively:
len(list):
if list is null
return 0
else
remove first_item from list
return 1 + len(list)
(define (length items)
(if (null? items)
0
(+ 1
(length (cdr items)))))
(length (list 1 2 3))
3
Or use set!:
(define count 0)
(define (length items)
(when (not (null? items))
(set! count (+ count 1))
(length (cdr items))))
(length (list 1 2 3 4 5))
count
5
Scheme has the special operator begin (the Common Lisp equivalent is progn) that lets you group expressions which are executed in sequence, and returns the last one.
(if (null? items)
0
(begin (set! index (+ 1 index))
(my-length (cdr items))
index))
If the if expression returns nil under the false condition, then you can also use when (as in Rahn's answer) which is a bit more compact.
That said, having a global variable, or any variable at all, whose value gets changed (which is the purpose of a variable) is not the Lisp/recursive way of doing things. (Global variables are not a good thing in any language.) Rahn's first example is the only way any experienced Lisp programmer would do it.
In my example above, the (set! index (+ 1 index)) and index)) lines, and the global variable index are completely unnecessary if you add the 1 to the (my-length (cdr items)). If the list is empty, the length is zero; otherwise, the length is 1 + the length of the tail of the list.

Fast CSV reading in Common Lisp

What is the fastest way to read a csv file in CL in a way such that:
1) all fields in the first line go into one array called column-names
2) the first field of each of all following lines goes into another
array called row-names
3) all other fields go into another array called values
?
My file has the following form, just with a lot more columns and rows:
"";"ES1 Index";"VG1 Index";"TY1 Comdty";"RX1 Comdty";"GC1 Comdty"
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441.7
"1999-01-07";1432.32;3106.08;66.25;86.22;447.67
And the result I would like is:
#("1999-01-04" "1999-01-05" "1999-01-06" "1999-01-07" )
#("" "ES1 Index" "VG1 Index" "TY1 Comdty" "RX1 Comdty" "GC1 Comdty")
#(1391.12 3034.53 66.515625 86.2 441.39 1404.86 3072.41 66.3125 86.17 440.63
1435.12 3156.59 66.4375 86.32 441.7 1432.32 3106.08 66.25 86.22 447.67)
Are you aware of some CL library that does so already?
Are there any general issues regarding I/O performance, maybe compiler-specific, that I should be aware of?
Here is the way I am doing it now:
(with-open-file (stream "my-file.csv" :direction :input)
(let* ((header (read-line stream nil))
(columns-list (mapcar #'read-from-string
(cl-ppcre:split ";" header)))
(number-of-columns (length columns-list))
(column-names (make-array number-of-columns
:initial-contents columns-list))
(rownames (make-array 1 :adjustable t :fill-pointer 0))
(values (make-array 1 :adjustable t :fill-pointer 0)))
(set-syntax-from-char #\; #\ )
(loop
:for reader = (read stream nil stream)
:until (eq reader stream)
:do (progn (vector-push-extend reader row-names)
(loop
:for count :from 2 :upto number-of-columns
:do (vector-push-extend (read stream nil)
values)))
:finally (return (values row-names
column-names
values)))))
Note: I wouldn't use set-syntax-from-char in real code, I am using it just for the sake of this example.
I suspect that the I/O is the slowest part here. You can probably get faster I/O if you use READ-SEQUENCE rather than calling READ-LINE repeatedly. So your code might look something like this:
(with-open-file (s "my-file.csv")
(let* ((len (file-length s))
(data (make-array len)))
(read-sequence data s)
data))
Then split data by newlines and add your logic.
Whether that helps or not, it'd helpful for you to profile your code, e.g. with :sb-sprof, to see where most of the time is being spent.
To read csv files, I find very useful and fast the cl-csv package (https://github.com/AccelerationNet/cl-csv). For instance, to solve your problem, the following code could be used:
(let ((data (cl-csv:read-csv #P"my-file.csv" :separator #\;)))
(values (apply #'vector (first data))
(apply #'vector (rest (mapcar #'first data)))
(apply #'vector
(mapcar #'read-from-string (loop :for row :in (rest data)
:append (rest row))))))
cl-csv:read-csv returns a list contaning, for each row, a list of strings that are the contents of the cells.

recursive definition of map in common lisp [duplicate]

This question already has answers here:
Why do we need funcall in Lisp?
(3 answers)
Closed 8 years ago.
I was expecting the Scheme approach:
(defun mmap (f xs)
(if (equal xs NIL)
'()
(cons (f (car xs)) (mmap f (cdr xs)))))
but i get a compiler warning
;Compiler warnings for ".../test.lisp" :
; In MMAP: Undefined function F
I can see that I need to make the compiler aware that f is a function. Does this have anything to do with #'?
Common Lisp is a LISP2. It means variables that are in the first element of the form is evaluated in a function namespace while everything else is evaluated from a variable namespace. Basically it means you can use the same name as a function in your arguments:
(defun test (list)
(list list list))
Here the first list is looked up in the function namespace and the other two arguments are looked up in the variable namespace. First and the rest become two different values because they are fetched from different places. If you'd do the same in Scheme which is a LISP1:
(define (test list)
(list list list))
And pass a value that is not a procedure that takes two arguments, then you'll get an error saying list is not a procedure.
A function with name x in the function namespace can be fetched from other locations by using a special form (function x). Like quote function has syntax sugar equivalent so you can write #'x. You can store functions in the variable namespace (setq fun #'+) and you can pass it to higher order functions like mapcar (CL version of Scheme map). Since first element in a form is special there is a primitive function funcall that you can use when the function is bound in the variable namespace: (funcall fun 2 3) ; ==> 5
(defun my-mapcar (function list)
(if list
(cons (funcall function (car list))
(my-mapcar function (cdr list)))
nil))
(my-mapcar #'list '(1 2 3 4)) ; ==> ((1) (2) (3) (4))
A version using loop which is the only way to be sure not to blow the stack in CL:
(defun my-mapcar (function list)
(loop :for x :in list
:collect (funcall function x)))
(my-mapcar #'list '(1 2 3 4)) ; ==> ((1) (2) (3) (4))
Of course, mapcar can take multiple list arguments and the function map is like mapcar but it works with all sequences (lists, arrays, strings)
;; zip
(mapcar #'list '(1 2 3 4) '(a b c d)) ; ==> ((1 a) (2 b) (3 c) (4 d))
;; slow way to uppercase a string
(map 'string #'char-upcase "banana") ; ==> "BANANA"
functions in CL that accepts functions as arguments also accept symbols. Eg. #'+ is a function while '+ is a symbol. What happens is that it retrieves #'+ by fetching the function represented by + in the global namespace. Thus:
(flet ((- (x) (* x x)))
(list (- 5) (funcall #'- 5) (funcall '- 5))) ; ==> (25 25 -5)
Trivia: I think the number in LISP1 and LISP2 refer to the number of namespaces rather than versions. The very first LISP interpereter had only one namespace, but dual namespace was a feature of LISP 1.5. It was he last LISP version before commercial versions took over. A LISP2 was planned but was never finished. Scheme was originally written as a interpreter in MacLisp (Which is based on LISP 1.5 and is a LISP2). Kent Pittman compared the two approaches in 1988.

Standard function for replacing subsequences

Quite often I need to replace subsequence of certain elements with another sequence of the same type, but, probably with different length. Implementation of such function is no challenge, this is what I use now:
(defun substitute* (new old where &key key (test #'eql))
(funcall (alambda (rest)
(aif (search old rest :key key :test test)
(concatenate (etypecase rest
(string 'string)
(vector 'vector)
(list 'list))
(subseq rest 0 it)
new
(self (subseq rest (+ it (length old)))))
rest))
where))
Works like this:
CL-USER> (substitute* '(x y) '(z) '(1 z 5 8 y z))
(1 X Y 5 8 Y X Y)
CL-USER> (substitute* "green button" "red button"
"here are red indicator, red button and red wire")
"here are red indicator, green button and red wire"
CL-USER> (substitute* #(4) #(2 2) #(2 2 2 2 2))
#(4 4 2)
You see, it's very handy and useful, so I've feeling that I'm reinventing wheel and it must be in the standard library, I just don't know its name (sometimes names are not obvious, you may search for filter while what you need is set-difference).
As a result of compromise between clarity and efficiency:
(defun substitute* (new old where &key key (test #'eql))
(let ((type (etypecase where
(string 'string)
(vector 'vector)
(list 'list)))
(new (coerce new 'list))
(old (coerce old 'list))
(where (coerce where 'list)))
(coerce (funcall (alambda (rest)
(aif (search old rest :key key :test test)
(append (remove-if (constantly t) rest :start it)
new
(self (nthcdr (+ it (length old)) rest)))
rest))
where)
type)))
I don't think that there's any standard function for this. It's more complicated than the standard replace family of functions. Those can operate destructively because you know in advance that you can replace element by element. Even in that case, it's still somewhat difficult to do this efficiently, because the access time for lists and vectors is very different, so general-purpose functions like subseq can be problematic. As Rainer Joswig pointed out in a comment:
It's kind of unfortunate that for many algorithms over sequences there
is no single efficient implementation. I see often that there are two
versions, one for lists and one for vectors, which then get hidden
behind a dispatching function. For a hack a simple common version is
fine, but for a library function, often there are different
implementations - like shown here.
(In fact, in doing a bit of research on whether some library contains a function for this, one of the first Google results I got was a question on Code Review, Generic sequence splitter in Common Lisp, in which Rainer and I both had some comment similar to those here.)
A version for lists
However, your implementation is rather inefficient because it makes multiple copies of the the remainders of sequences. E.g., when you replace (z) in (1 z 2 z 3 z), with (x y), you'll first make (3 x y), then copy it in making (2 x y 3 z y), and then you'll copy that in making (1 x y 2 x y 3 x y). You might be better off in doing one pass over the sequence, determining the indices of the subsequences to replace, or collecting the bits that need to don't need to be replaced, etc. You'll probably want separate implementations for lists and for other sequences. E.g., with a list, you might do:
(defun splice-replace-list (old new list)
(do ((new (coerce new 'list))
(old-len (length old))
(parts '()))
((endp list)
(reduce 'append (nreverse parts) :from-end t))
(let ((pos (search old list)))
(push (subseq list 0 pos) parts)
(cond
((null pos)
(setf list nil))
(t
(push new parts)
(setf list (nthcdr (+ old-len pos) list)))))))
There are some optimizations you could make here, if you wanted. For instance, you could implement a search-list that, rather than returning the position of the first instance of the sought sequence, could return a copy of the head up until that point and the tail beginning with the sequence as multiple values, or even the copied head, and the tail after the sequence, since that's what you're really interested in, in this case. Additionally, you could do something a bit more efficient than (reduce 'append (nreverse parts) :from-end t) by not reversing parts, but using a reversed append. E.g.,
(flet ((xappend (l2 l1)
(append l1 l2)))
(reduce #'xappend '((5 6) (x y) (3 4) (x y))))
;=> (x y 3 4 x y 5 6)
I wrote this in a somewhat imperative style, but there's no reason that you can't use a functional style if you want. Be warned that not all Lisp implementation support tail call optimization, so it might be better to use do, but you certainly don't have to. Here's a more functional version:
(defun splice-replace-list (old new list)
(let ((new-list (coerce new 'list))
(old-len (length old)))
(labels ((keep-going (list parts)
(if (endp list)
(reduce 'append (nreverse parts) :from-end t)
(let* ((pos (search old list))
(parts (list* (subseq list 0 pos) parts)))
(if (null pos)
(keep-going '() parts)
(keep-going (nthcdr (+ old-len pos) list)
(list* new-list parts)))))))
(keep-going list '()))))
A version for vectors
For non lists, this is more difficult, because you don't have the specific sequence type that you're supposed to be using for the result. This is why functions like concatenate require a result-type argument. You can use array-element-type to get an element type for the input sequence, and then use make-array to get a sequence big enough to hold the result. That's trickier code, and will be more complicated. E.g., here's a first attempt. It's more complicated, but you'll get a result that's pretty close to the original vector type:
(defun splice-replace-vector (old new vector &aux (new-len (length new)))
(flet ((assemble-result (length parts)
(let ((result (make-array length :element-type (array-element-type vector)))
(start 0))
(dolist (part parts result)
(cond
((consp part)
(destructuring-bind (begin . end) part
(replace result vector :start1 start :start2 begin :end2 end)
(incf start (- end begin))))
(t
(replace result new :start1 start)
(incf start new-len)))))))
(do ((old-len (length old))
(total-len 0)
(start 0)
(indices '()))
((null start) (assemble-result total-len (nreverse indices)))
(let ((pos (search old vector :start2 start)))
(cond
((null pos)
(let ((vlength (length vector)))
(push (cons start vlength) indices)
(incf total-len (- vlength start))
(setf start nil)))
(t
(push (cons start pos) indices)
(push t indices)
(incf total-len (- pos start))
(incf total-len new-len)
(setf start (+ pos old-len))))))))
CL-USER> (splice-replace-vector '(#\z) '(#\x #\y) "12z")
"12xy"
CL-USER> (splice-replace-vector '(z) '(x y) #(x y))
#(X Y)
CL-USER> (splice-replace-vector '(z) '(x y) #(1 z 2 z 3 4 z))
#(1 X Y 2 X Y 3 4 X Y)
CL-USER> (splice-replace-vector '(#\z) #(#\x #\y) "1z2z34z")
"1xy2xy34xy"
If you only want to make one pass through the input vector, then you could use an adjustable array as the output, and append to it. An adjustable array will have a bit more overhead than a fixed size array, but it does make the code a bit simpler.
(defun splice-replace-vector (old new vector)
(do ((vlength (length vector))
(vnew (coerce new 'vector))
(nlength (length new))
(result (make-array 0
:element-type (array-element-type vector)
:adjustable t
:fill-pointer 0))
(start 0))
((eql start vlength) result)
(let ((pos (search old vector :start2 start)))
(cond
;; add the remaining elements in vector to result
((null pos)
(do () ((eql start vlength))
(vector-push-extend (aref vector start) result)
(incf start)))
;; add the elements between start and pos to the result,
;; add a copy of new to result, and increment start
;; accordingly
(t
;; the copying here could be improved with adjust-array,
;; and replace, instead of repeated calls to vector-push-extend
(do () ((eql start pos))
(vector-push-extend (aref vector start) result)
(incf start))
(loop for x across vnew
do (vector-push-extend x result))
(incf start (1- nlength)))))))
A “generic” version
Using these two functions, you could define a general splice-replace that checks the type of the original input sequence and calls the appropriate function:
(defun splice-replace (old new sequence)
(etypecase sequence
(list (splice-replace-list old new sequence))
(vector (splice-replace-vector old new sequence))))
CL-USER> (splice-replace #(z) '(x y) #(1 z 2 z 3 4 z))
#(1 X Y 2 X Y 3 4 X Y)
CL-USER> (splice-replace '(z) #(x y) '(1 z 2 z 3 4 z))
(1 X Y 2 X Y 3 4 X Y)