I asked a related question here: Clojure: How do I turn Clojure code into a string that is evaluatable? The answer mostly works, but lists are translated to raw parens, which fails.
The answer was great, but I realized it is not exactly what I need. I simplified the example for Stack Overflow, but I am not just writing out data; I am trying to write out function definitions and other things which contain structures that contain lists. So here is a simple example (co-opted from the last question).
I want to generate a file which contains the function:
(defn aaa []
  (fff :update {:bbb "bbb" :xxx [1 2 3] :yyy (3 5 7)}))
Everything after the :update is a structure I have access to when writing the file, so I call str on it and it is written out in that state. This is fine, but when I call load-file on the generated code, the list is evaluated and 3 is treated as a function (since it is the first element of the list).
So I want a file containing my function definitions that I can load with load-file and then call. How can I write out this function with its associated data so that I can load it back in without Clojure treating what used to be lists as function calls?
You need to quote the structure prior to obtaining the string representation:
(list 'quote foo)
where foo is the structure.
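For example, a quick sketch at the REPL (the var name foo is just illustrative, holding the structure from the question):
(def foo {:bbb "bbb" :xxx [1 2 3] :yyy '(3 5 7)})
(list 'quote foo)
; => (quote {:bbb "bbb", :xxx [1 2 3], :yyy (3 5 7)})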
Three additional remarks:
traversing the code to quote all lists / seqs would not do at all, since the top-level (defn ...) form would also get quoted;
lists are not the only potentially problematic type -- symbols are another one (+ vs. #<core$_PLUS_ clojure.core$_PLUS_#451ef443>; see the short example after these remarks);
rather than using (str foo) (even with foo already quoted), you'll probably want to print out the quoted foo -- or rather the entire code block with the quoted foo inside -- using pr / prn.
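To make the second remark concrete, a small REPL sketch (the exact hash in the function's printed form will differ):
(str +)      ; => "clojure.core$_PLUS_@451ef443" -- the function object, not readable
(pr-str '+)  ; => "+" -- the symbol, which the reader can consume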
The last point warrants a short discussion. pr explicitly promises to produce a readable representation if *print-readably* is true, whereas str only produces such a representation for Clojure's compound data structures "by accident" (of the implementation) and still only if *print-readably* is true:
(str ["asdf"])
; => "[\"asdf\"]"
(binding [*print-readably* false]
(str ["asdf"]))
; => "[asdf]"
The above behaviour is due to clojure.lang.RT/printString (the method Clojure's data structures ultimately delegate their toString needs to) using clojure.lang.RT/print, which in turn chooses the output format depending on the value of *print-readably*.
Even with *print-readably* bound to true, str may produce output inappropriate for clojure.lang.Reader's consumption: e.g. (str "asdf") is just "asdf", while the readable representation is "\"asdf\"". Use (with-out-str (pr foo)) to obtain a string object containing the representation of foo, guaranteed readable if *print-readably* is true.
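Putting this together, a minimal sketch (foo, code, and generated.clj are placeholder names, and the printed output is approximate):
(def foo {:bbb "bbb" :xxx [1 2 3] :yyy '(3 5 7)})
; build the whole definition as data, with the structure quoted
(def code (list 'defn 'aaa [] (list 'fff :update (list 'quote foo))))
(with-out-str (pr code))
; => "(defn aaa [] (fff :update (quote {:bbb \"bbb\", :xxx [1 2 3], :yyy (3 5 7)})))"
; write it out; load-file can later read the definition back with the inner list intact
(spit "generated.clj" (with-out-str (prn code)))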
Try this instead...
(defn aaa []
  (fff :update {:bbb "bbb" :xxx [1 2 3] :yyy (list 3 5 7)}))
Wrap it in a call to quote to read it without evaluating it.
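For instance, in the generated file the quote can sit directly on the literal (reusing the question's aaa and fff names); the inner list is then read as data rather than called:
(defn aaa []
  (fff :update '{:bbb "bbb" :xxx [1 2 3] :yyy (3 5 7)}))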
Related
I'm studying Clojure, and I've read that in Clojure a function definition is just data, i.e. the parameter vector is just an ordinary vector. If that's the case, why can I do this
(def add (fn [a b]
           (+ a b)))
but not this
(def vector-of-symbols [a b])
?
I know I normally would have to escape symbols like this:
(def vector-of-symbols [`a `b])
but why don't I have to do it in fn/defn? I assume this is due to fn/defn being macros. I tried examining their source, but they are too advanced for me so far. My attempts to recreate defn also fail, and I'm not sure why (I took the example from a tutorial):
(defmacro defn2 [name param & body]
  `(def ~name (fn ~param ~@body)))
(defn2 add [a b] (+ a b)) ;;I get "Use of undeclared Var app.core/defn2"
Can someone please explain, how exactly does Clojure turn data structures, especially symbols, into code? And what am I missing about the macro example?
Update: Apparently, the macro does not work because my project is actually in ClojureScript (in Clojure it does work). I did not think it mattered, but as I progress I discover more and more things that somehow don't work for me with ClojureScript.
Update 2: This helps: https://www.clojurescript.org/about/differences
A function is a first-class citizen in Clojure, just like any other data.
To define a vector you use (vector ...), or the reader's syntactic sugar [...]; for a list it's (list ...) or '(...), the quote preventing the list from being evaluated as a function call; for a set, (set ...) or #{...}.
So the factory for a function is fn (in fact fn*, which comes from the Java core of Clojure; fn is a series of macros on top of it that handle destructuring and the like).
(fn args body)
is a form that returns a function, where args is a (possibly empty) vector of arguments and body is a series of Clojure expressions to be evaluated with the args bound in the environment. If there is nothing to evaluate, the function returns nil. There is also the syntactic sugar #(...), with %x standing for argument x and % for argument 1.
(fn ...) returns a value that is a function. So
(def my-super-function (fn [a b c d] (println "coucou") (+ a b c d)))
binds the symbol my-super-function to the anonymous function returned by (fn [a b c d] (println "coucou") (+ a b c d)).
(def my_vector [1 2 3])
binds the symbol my_vector to the vector [1 2 3].
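To illustrate the #(...) sugar mentioned above, a small sketch (the gensym'd parameter names in the expansion will differ):
(#(+ % %2) 1 2)
; => 3
(read-string "#(+ % %2)")
; => (fn* [p1__123# p2__124#] (+ p1__123# p2__124#))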
List of learning resources: https://github.com/io-tupelo/clj-template#documentation
As #jas said, your defn2 macro looks fine.
The main point is that macros are an advanced feature that one almost never needs. A macro is equivalent to a compiler extension, and that is almost never the best solution to a problem. Also keep in mind that functions can do some things macros can't.
Another point: the syntax-quote (aka backquote) ` is very different from a single quote '. In your example you want the single quote for ['a 'b]. Even better would be to quote the entire vector form '[a b].
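A short REPL sketch of the difference (evaluated in the user namespace, so the qualifying prefix may differ):
['a 'b]   ; => [a b]            -- single quote gives plain, unqualified symbols
'[a b]    ; => [a b]            -- same result, quoting the whole vector
[`a `b]   ; => [user/a user/b]  -- syntax-quote resolves symbols in the current namespace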
As to your primary question, it is poorly explained how source-file text is converted into code. This is a two-step process. The Clojure Reader consumes text (from a file or a literal string) and produces data structures: lists, vectors, strings, numbers, symbols, and so on. The Clojure compiler takes these data structures as input and produces Java bytecode that can be executed.
It is confusing because, when printed, one can't tell the difference between the text representation of a vector [1 2 3] and the text string that is input to the reader [1 2 3]. Ideally it would be colour-coded or something. This problem doesn't exist in Java and the like, since they don't have macros and hence there is no confusion between the source code (text) and the data structures used by a macro (not text).
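A tiny sketch of the two steps at the REPL:
; step 1: the Reader turns text into a data structure
(def form (read-string "(+ 1 2 3)"))
(class form)  ; => clojure.lang.PersistentList
(first form)  ; => + (a symbol, not yet a function)
; step 2: the compiler/evaluator turns that data structure into running code
(eval form)   ; => 6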
For a more detailed answer on creating macros in Clojure, please see this answer.
org-element-parse-buffer returns a huge tree even for a small Org file. I want to transform this tree into JSON. Apparently json.el uses recursive functions to traverse cons cells, and as Elisp doesn't do tail-call optimization, the invocation of json-encode quickly runs out of stack. If I increase max-lisp-eval-depth and max-specpdl-size, Emacs crashes.
How do I work around that and transform a huge tree structure into JSON? In general, what do I do when I have a huge data structure and a recursive function that may run out of stack?
Yes, json.el functions are recursive, but recursive functions called on the org-element tree cause a stack overflow not because org-element-parse-buffer returns a huge AST, but because it returns a circular list. A tree-recursive function on a circular list is like a squirrel in a cage.
I guess the idea behind the self-references in the returned AST is that as you traverse it, at any point you can go back to the parent simply by calling plist-get with the keyword :parent. I imagine this usage for traversing the AST up and down:
(let ((xs '#1=(:text "foo" :child (:text "bar" :parent #1#))))
  (plist-get
   (plist-get
    xs
    :child)   ; (:text "bar" :parent (:text "foo" :child #0))
   :parent))  ; (:text "foo" :child (:text "bar" :parent #0))
But JSON doesn't support circular lists, so you need to remove these self-references from the AST before trying to convert it to any data serialization format. I haven't found a way to elegantly remove the circular references, so I resorted to a dirty hack:
Convert the AST to a string
Remove references with regular expressions
Convert the string back to an Elisp data structure
Suppose I have an Org file called test.org with the following content:
* Heading
** Subheading
Text
Then variable tree contains the parsed Org data from this buffer: (setq tree (with-current-buffer "test.org" (org-element-parse-buffer))). Then to prepare this data for JSON export, I just run:
(car (read-from-string (replace-regexp-in-string ":parent #[0-9]+?" "" (prin1-to-string tree))))
Even with all mentions of :parent removed, the new AST is still valid, so if the new AST is in variable tree2, then the following 3 expressions are equivalent:
(org-element-interpret-data tree2)
(with-current-buffer "test.org" (buffer-substring-no-properties 1 (buffer-end 1)))
"* Heading\n** Subheading\nText\n"
Note that for some reason org-element-interpret-data strips leading whitespace, so the above equivalence is not exactly true when your Org file contains indented text lines.
Now all you need to do is to encode the new non-circular AST into JSON and write it into a file:
(f-write (json-encode tree2) 'utf-8 "test.json")
Notes
Elisp's cons cells are pairs with two slots: car and cdr. If the cdr of each cell contains a link to another cons cell, we get a linked list. If the car and cdr point at two plain values, we get a dotted pair. Therefore (1 . (2 . (3 . nil))) is equivalent to (1 2 3). But a cdr (or a car, for that matter) might point at any other cons cell, including one that appeared earlier in the list, giving rise to a circular linked list.
Exercise: create a complex tree data structure with several self-references to different subtrees. Then try traversing this tree and jumping by the self-references to get the idea.
With the ->> threading macro from the dash list-manipulation library, the expression is equivalent to:
(->> tree prin1-to-string (replace-regexp-in-string ":parent #[0-9]+?" "") read-from-string car)
(buffer-substring-no-properties 1 (buffer-end 1)) is like (buffer-string), but without annoying text properties attached.
f-write is a function from f, a third-party file-manipulation library, that writes text to a file.
tl;dr
Here's how to remove the references to parent and structure in an org tree before encoding it to JSON:
(let* ((tree (org-element-parse-buffer 'object nil)))
  (org-element-map tree (append org-element-all-elements
                                org-element-all-objects '(plain-text))
    (lambda (x)
      (if (org-element-property :parent x)
          (org-element-put-property x :parent "none"))
      (if (org-element-property :structure x)
          (org-element-put-property x :structure "none"))))
  (json-encode tree))
I feel like I understand MAKE as being a constructor for a datatype. It takes two arguments... the first the target datatype, and the second a "spec".
In the case of objects it's fairly obvious that a block of Rebol data can be used as the "spec" to get back a value of type object!
>> foo: make object! [x: 10 y: 20 z: func [value] [print x + y + value] ]
== make object! [
    x: 10
    y: 20
]
>> print foo/x
10
>> foo/z 1
31
I know that if you pass an integer when you create a block, it will preallocate enough underlying memory to hold a block of that length, despite being empty:
>> foo: make block! 10
== []
That makes some sense. If you pass a string in, then you get the string parsed into Rebol tokens...
>> foo: make block! "some-set-word: {String in braces} some-word 12-Dec-2012"
== [some-set-word: "String in braces" some-word 12-Dec-2012]
Not all types are accepted, and again I'll say so far... so good.
>> foo: make block! 12-Dec-2012
** Script error: invalid argument: 12-Dec-2012
** Where: make
** Near: make block! 12-Dec-2012
By contrast, the TO operation is defined very similarly, except it is for "conversion" instead of "construction". It also takes a target type as its first parameter, and then a "spec". But it acts differently on values:
>> foo: to block! 10
== [10]
>> foo: to block! 12-Dec-2012
== [12-Dec-2012]
That seems reasonable. If it received a non-series value, it wrapped it in a block. If you try an any-block! value with it, I'd imagine it would give you a block! series with the same values inside:
>> foo: to block! quote (a + b)
== [a + b]
So I'd expect a string to be wrapped in a block, but it just does the same thing MAKE does:
>> foo: to block! "some-set-word: {String in braces} some-word 12-Dec-2012"
== [some-set-word: "String in braces" some-word 12-Dec-2012]
Why is TO so redundant with MAKE, and what is the logic behind their distinction? Passing an integer to to block! puts the number inside a block (instead of triggering the special preallocation behaviour), and a date passed to to block! ends up as the date inside a block instead of an error as with MAKE. So why wouldn't one want to block! of a string to put that string inside a block?
Also: beyond reading the C sources for the interpreter, where is the comprehensive list of specs accepted by MAKE and TO for each target type?
MAKE is a constructor, TO is a converter. The reason that we have both is that for many types that operation is different. If they weren't different, we could get by with one operation.
MAKE takes a spec that is supposed to be a description of the value you're constructing. This is why you can pass MAKE a block and get values like objects or functions that aren't block-like at all. You can even pass an integer to MAKE and have it be treated like an allocation directive.
TO takes a value that is intended to be more directly converted to the target type (this value being called "spec" is just an unfortunate naming mishap). This is why the values in the input more directly correspond to the values in the output. Whenever there is a sensible default conversion, TO does it. That is why many types don't have TO conversions defined between them, the types are too different conceptually. We have fairly comprehensive conversions for some types where this is appropriate, such as to strings and blocks, but have carefully restricted some other conversions that are more useful to prohibit, such as from none to most types.
In some cases of simple types, there really isn't a complex way to describe the type. For them, it doesn't hurt to have the constructors just take self-describing values as their specs. Coincidentally, this ends up being the same behavior as TO for the same type and values. This doesn't hurt, so it's not useful to trigger an error in this case.
There are no comprehensive docs for the behavior of MAKE and TO because in Rebol 3 their behavior isn't completely finalized. There is still some debate in some cases about what the proper behavior should be. We're trying to make things more balanced, without losing any valuable functionality. We've already done a lot of work improving none and binary conversions, for instance. Once they are more finalized, and once we have a place to put them, we'll have more docs. In the meanwhile most of the Rebol 2 behavior is documented, and most of the changes so far for Rebol 3 are in CureCode.
Also: beyond reading the C sources for the interpreter, where is the comprehensive list of specs accepted by MAKE and TO for each target type?
This may not be that useful, since it's Red-specific:
comparison-matrix
conversion-matrix
But it does at least mention whether the behaviour is similar to or different from Rebol.
I am trying to determine whether a given argument within a macro is a function, something like
(defmacro call-special? [a b]
  (if (ifn? a)
    `(~a ~b)
    `(-> ~b ~a)))
So that the following two calls would both generate "Hello World"
(call-special #(println % " World") "Hello")
(call-special (println " World") "Hello")
However, I can't figure out how to convert "a" into something that ifn? can understand. Any help is appreciated.
You might want to ask yourself why you want to define call-special? in this way. It doesn't seem particularly useful and doesn't even save you any typing - do you really need a macro to do this?
Having said that, if you are determined to make it work then one option would be to look inside a and see if it is a function definition:
(defmacro call-special? [a b]
  (if (#{'fn 'fn*} (first a))
    `(~a ~b)
    `(-> ~b ~a)))
This works because #() function literals are expanded into a form as follows:
(macroexpand `#(println % " World"))
=> (fn* [p1__2609__2610__auto__]
        (clojure.core/println p1__2609__2610__auto__ " World"))
I still think this solution is rather ugly and prone to failure once you start doing more complicated things (e.g. using nested macros to generate your functions).
First, a couple of points:
Macros are simply functions that receive literals, symbols, or collections of literals and symbols as input, and produce literals, symbols, or collections of literals and symbols as output. The arguments are never functions, so you can never directly check the function a symbol maps to.
(call-special #(println % " World") "Hello") contains reader macro code. Since reader macros are executed before regular macros, you should expand this before doing any more analysis. Do this by applying (read-string "(call-special #(println % \" World\") \"Hello\")") which becomes (call-special (fn* [p1__417#] (println p1__417# "world")) "Hello").
While, generally speaking, it's not obvious when you would want to do this rather than use an alternative approach, here's how I would go about it.
You'll need to call macroexpand-all on a. If the code eventually becomes a (fn*) form, then it is guaranteed to be a function, and you can safely emit (~a ~b). If it macroexpands to a symbol, you can also emit (~a ~b); if the symbol doesn't name a function, an error will be thrown at runtime. Lastly, if it macroexpands into a list (a function call or special-form call) like (println ...), then you can emit code that uses the threading macro ->.
You can also cover cases such as when the form macroexpands into a data structure, but you haven't specified the desired behavior there.
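Here is a rough sketch of that approach, assuming the question's call-special name (used consistently) and handling only the fn*/symbol/list cases described above:
(require '[clojure.walk :as walk])

(defmacro call-special [a b]
  (let [expanded (walk/macroexpand-all a)]
    (cond
      ;; #(...) literals and (fn ...) forms eventually expand to (fn* ...): call directly
      (and (seq? expanded) (= 'fn* (first expanded))) `(~a ~b)
      ;; a bare symbol: emit a direct call; a non-function symbol will error at runtime
      (symbol? expanded) `(~a ~b)
      ;; anything else, e.g. a call like (println ...): thread b into it
      :else `(-> ~b ~a))))

(call-special #(println % " World") "Hello")  ; direct call
(call-special (println " World") "Hello")     ; expands to (-> "Hello" (println " World"))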
a in your macro is just a Clojure list data structure (it is not a function yet). So basically you need to check whether the data structure a will evaluate to a function or not, which can be done as shown below:
(defmacro call-special? [a b]
  (if (or (= (first a) 'fn) (= (first a) 'fn*))
    `(~a ~b)
    `(-> ~b ~a)))
We check whether the first element of a is the symbol fn* or fn, which are the forms used to create functions.
This macro will only work for two cases: either you pass it an anonymous function or an expression.
Inspired by this article, I was playing with translating functions from list-comprehension style to combinatory style, and I found something interesting.
-- Example 1: List Comprehension
*Main> [x|(x:_)<-["hi","hello",""]]
"hh"
-- Example 2: Combinatory
*Main> map head ["hi","hello",""]
"hh*** Exception: Prelude.head: empty list
-- Example 3: List Comprehension (translated from Example 2)
*Main> [head xs|xs<-["hi","hello",""]]
"hh*** Exception: Prelude.head: empty list
It seems strange that example 1 does not throw an exception, because (x:_) pattern matches one of the definitions of head. Is there an implied filter (not . null) when using list comprehensions?
See the section on list comprehensions in the Haskell report. So basically
[x|(x:_)<-["hi","hello",""]]
is translated as
let ok (x:_) = [ x ]
    ok _ = [ ]
in concatMap ok ["hi","hello",""]
P.S. Since list comprehensions can be translated into do expressions, a similar thing happens with do expressions, as detailed in the section on do expressions. So the following will also produce the same result:
do (x:_)<-["hi","hello",""]
   return x
Pattern match failures are handled specially in list comprehensions. In case the pattern fails to match, the element is dropped. Hence you just get "hh" but nothing for the third list element, since the element doesn't match the pattern.
This is due to the definition of the fail function which is called by the list comprehension in case a pattern fails to match some element:
fail _ = []
The correct parts of this answer are courtesy of kmc of #haskell fame. All the errors are mine, don't blame him.
Yes. When you qualify a list comprehension by pattern matching, the values which don't match are filtered out, getting rid of the empty list in your Example 1. In Example 3, the empty list matches the pattern xs so is not filtered, then head xs fails. The point of pattern matching is the safe combination of constructor discrimination with component selection!
You can achieve the same dubious effect with an irrefutable pattern, lazily performing the selection without the discrimination.
Prelude> [x|(~(x:_))<-["hi","hello",""]]
"hh*** Exception: <interactive>:1:0-30: Irrefutable pattern failed for pattern (x : _)
List comprehensions neatly package uses of map, concat, and thus filter.