Clojure: Iterate through JSON and update individual keys in map

I'm new to Clojure, and after trying multiple methods I'm completely stuck. I know how to achieve this in an imperative language, but not in Clojure.
I have a JSON file https://data.nasa.gov/resource/y77d-th95.json containing meteor fall data, each fall includes a mass and year.
I'm trying to find which year had the greatest collective total mass of falls.
Here's what I have so far:
(def jsondata
(json/read-str (slurp "https://data.nasa.gov/resource/y77d-th95.json") :key-fn keyword))
;Get the unique years
(def years (distinct (map :year jsondata)))
;Create map of unique years with a number to hold the total mass
(def yearcount (zipmap years (repeat (count years) 0)))
My idea was to use for to iterate through jsondata and, for each fall, add its mass to the entry in the yearcount map whose key is the fall's year (like += in C).
I tried this although I knew it probably wouldn't work:
(for [x jsondata]
(update yearcount (get x :year) (+ (get yearcount (get x :year)) (Integer/parseInt (get x :mass)))))
The idea of course being that the yearcount map would hold the totals for each year, on which I could then use frequencies, sort-by, and last to get the year with the highest mass.
I also defined this function to update values in a map with a function, although I'm not sure if I could actually use it:
(defn map-kv [m f]
(reduce-kv #(assoc %1 %2 (f %3)) {} m))
I've tried a few different methods, had lots of issues and just can't get anywhere.

Here's an alternate version just to show an approach with a slightly different style. Especially if you're new to clojure it may be easier to see the stepwise thinking that led to the solution.
The tricky part might be the for statement, which is another nice way to build up a new collection by (in this case) applying functions to each key and value in an existing map.
(defn max-meteor-year [f]
(let [rdr (io/reader f)
all-data (json/read rdr :key-fn keyword)
clean-data (filter #(and (:year %) (:mass %)) all-data)
grouped-data (group-by #(:year %) clean-data)
reduced-data
(for [[k v] grouped-data]
[(subs k 0 4) (reduce + (map #(Double/parseDouble (:mass %)) v))])]
(apply max-key second reduced-data)))
clj.meteor> (max-meteor-year "meteor.json")
["1947" 2.303023E7]

Here is my solution. I think you'll like it because its parts are decoupled and are not joined into a single threading macro, so you can change and test any part of it when something goes wrong.
Fetch the data:
(def jsondata
(json/parse-string
(slurp "https://data.nasa.gov/resource/y77d-th95.json")
true))
Note that passing true as the second argument makes the keys keywords rather than strings.
Declare a helper function that takes into account a case when the first argument is missing (is nil):
(defn add [a b]
(+ (or a 0) b))
Declare a reducing function that takes a result and an item from the collection of meteor data. It updates the result map using the add function we created before. Please note that some items lack a mass or a year key, so we should check that both exist before operating on them:
(defn process [acc {:keys [year mass]}]
(if (and year mass)
(update acc year add (Double/parseDouble mass))
acc))
The final step is to run the reduction:
(reduce process {} jsondata)
The result is:
{"1963-01-01T00:00:00.000" 58946.1,
"1871-01-01T00:00:00.000" 21133.0,
"1877-01-01T00:00:00.000" 89810.0,
"1926-01-01T00:00:00.000" 16437.0,
"1866-01-01T00:00:00.000" 559772.0,
"1863-01-01T00:00:00.000" 33710.0,
"1882-01-01T00:00:00.000" 314462.0,
"1949-01-01T00:00:00.000" 215078.0,
I think that such a step-by-step solution is much clearer and more maintainable than a single huge ->> thread.
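To get from that map of totals to the single heaviest year, one possible last step is max-key. This is only a sketch on invented sample records (sample, totals and heaviest are illustrative names, and the values are not from the NASA file); it also uses (fnil + 0) as a stand-in for the add helper:

```clojure
;; Hypothetical records shaped like the parsed JSON (values invented).
(def sample
  [{:year "1947-01-01T00:00:00.000" :mass "10.5"}
   {:year "1947-01-01T00:00:00.000" :mass "2"}
   {:year "1950-01-01T00:00:00.000" :mass "3"}
   {:year "1950-01-01T00:00:00.000"}   ; no mass
   {:mass "7"}])                        ; no year

;; Same idea as process; (fnil + 0) treats a missing running total as 0.
(defn process [acc {:keys [year mass]}]
  (if (and year mass)
    (update acc year (fnil + 0) (Double/parseDouble mass))
    acc))

(def totals (reduce process {} sample))

;; max-key returns the map entry whose value is largest.
(def heaviest (apply max-key val totals))
;; (key heaviest) => "1947-01-01T00:00:00.000"
;; (val heaviest) => 12.5
```

Because totals is an ordinary map, each intermediate value can still be inspected at the REPL before the final step.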

Update: sorry, I misunderstood the question. I think this will work for you:
(->> (group-by :year jsondata)
(reduce-kv (fn [acc year recs]
(let [sum-mass (->> (keep :mass recs)
(map #(Double/parseDouble %))
(reduce +))]
(assoc acc year sum-mass)))
{})
(sort-by second)
(last))
=> ["1947-01-01T00:00:00.000" 2.303023E7]
The reduce function here is starting out with an initial empty map, and its input will be the output of group-by which is a map from years to their corresponding records.
For each step of reduce, the reducing function is receiving the acc map we're building up, the current year key, and the corresponding collection of recs for that year. Then we get all the :mass values from recs (using keep instead of map because not all recs have a mass value apparently). Then we map over that with Double/parseDouble to parse the mass strings into numbers. Then we reduce over that to sum all the masses for all the recs. Finally we assoc the year key to acc with the sum-mass. This outputs a map from years to their mass sums.
Then we can sort those map key/value pairs by their value (second returns the value), then we take the last item with the highest value.
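One detail that may be worth checking: group-by also creates an entry under the nil key for records with no :year, and that group takes part in the sort like any other. A small sketch with invented records (sample, sum-masses, by-year and best are illustrative names) that drops the nil group before sorting:

```clojure
;; Hypothetical records; the last one has no :year.
(def sample
  [{:year "1947" :mass "10.5"}
   {:year "1947" :mass "2"}
   {:year "1950" :mass "3"}
   {:mass "99"}])

;; Sum the parseable :mass strings of a group, skipping records without one.
(defn sum-masses [recs]
  (->> (keep :mass recs)
       (map #(Double/parseDouble %))
       (reduce + 0.0)))

(def by-year (group-by :year sample))
;; by-year has the keys "1947", "1950" and nil

(def best
  (->> (dissoc by-year nil)                               ; ignore the year-less group
       (map (fn [[year recs]] [year (sum-masses recs)]))
       (sort-by second)
       last))
;; best => ["1947" 12.5]
```

Without the dissoc, the nil group (99.0 here) would come out on top.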

Related

Choosing/evaluating macro argument forms

The Common Lisp case macro always defaults to eql for testing whether its keyform matches one of the keys in its clauses. I'm aiming with the following macro to generalize case to use any supplied comparison function (although with evaluated keys):
(defmacro case-test (form test &rest clauses)
(once-only (form test)
`(cond ,@(mapcar #'(lambda (clause)
`((funcall ,test ,form ,(car clause))
,@(cdr clause)))
`,clauses))))
using
(defmacro once-only ((&rest names) &body body)
"Ensures macro arguments only evaluate once and in order.
Wrap around a backquoted macro expansion."
(let ((gensyms (loop for nil in names collect (gensym))))
`(let (,@(loop for g in gensyms collect `(,g (gensym))))
`(let (,,@(loop for g in gensyms for n in names collect ``(,,g ,,n)))
,(let (,@(loop for n in names for g in gensyms collect `(,n ,g)))
,@body)))))
For example:
(macroexpand '(case-test (list 3 4) #'equal
('(1 2) 'a 'b)
('(3 4) 'c 'd)))
gives
(LET ((#:G527 (LIST 3 4)) (#:G528 #'EQUAL))
(COND ((FUNCALL #:G528 #:G527 '(1 2)) 'A 'B)
((FUNCALL #:G528 #:G527 '(3 4)) 'C 'D)))
Is it necessary to worry about macro variable capture for a functional argument (like #'equal)? Can such arguments be left off the once-only list, or could there still be a potential conflict if #'equal were part of the keyform as well. Paul Graham in his book On Lisp, p.118, says some variable capture conflicts lead to "extremely subtle bugs", leading one to believe it might be better to (gensym) everything.
Is it more flexible to pass in a test name (like equal) instead of a function object (like #'equal)? It looks like you could then put the name directly in function call position (instead of using funcall), and allow macros and special forms as well as functions?
Could case-test instead be a function, instead of a macro?
Variable capture
Yes, you need to put the function into the once-only because it can be created dynamically.
The extreme case would be:
(defun random-test ()
(aref (vector #'eq #'eql #'equal #'equalp) (random 4)))
(case-test foo (random-test)
...)
You want to make sure that the test is the same in the whole case-test form.
Name vs. object
Evaluating the test argument allows for very flexible forms like
(case-test foo (object-test foo)
...)
which allows "object-oriented" case-test.
Function vs. macro
Making case-test into a function is akin to making any other conditional (if and cond) into a function - how would you handle the proverbial
(case-test "a" #'string-equal
("A" (print "safe"))
("b" (launch missiles)))

How to count occurrences of an item in a list?

I have been messing around with Haskell for two weeks now and have written some functions in it. I heard that Erlang is quite similar (since both are predominantly functional), so I thought I would translate some of these functions to see if I could get them working in Erlang. However, I have been having trouble with the syntax for this function. Its purpose is simply to take a character or int and a list, go through the list, and count the number of times that item occurs. For example, it should return the following:
count (3, [3, 3, 2, 3, 2, 5]) ----> 3
count (c, [ a, b, c, d]) ----> 1
Whenever I run my code it just keeps spitting out syntax issues and it is really a pain debugging in Erlang. Here is the code I have written:
count(X,L) ->
X (L:ls) ->
X == L = 1+(count X ls);
count X ls.
Any ideas on how to fix this?
It's not clear what you're going for, as your syntax is pretty far off. However, you could accomplish your call with something like this:
count(Needle, Haystack) -> count(Needle, Haystack, 0).
count(_, [], Count) -> Count;
count(X, [X|Rest], Count) -> count(X, Rest, Count+1);
count(X, [_|Rest], Count) -> count(X, Rest, Count).
To elaborate, you're creating a recursive function called count to find instances of Needle in Haystack. With each call, there are 3 cases to consider: the base case, where you have searched the entire list; the case in which the value you're searching for matches the first item in the list; and the case in which the value you're searching for does not match the first item in the list. Each case is a separate function definition:
count(_, [], Count) -> Count;
Matches the case in which the Haystack (i.e., the list you are scanning) is empty. This means you do not have to search anymore, and can return the number of times you found the value you're searching for in the list.
count(X, [X|Rest], Count) -> count(X, Rest, Count+1);
Matches the case in which the value you're searching for, X, matches the first item in the list. You want to keep searching the list for more matches, but you increment the counter before calling count again.
count(X, [_|Rest], Count) -> count(X, Rest, Count).
Matches the case in which the value you're searching for does not match the head of the list. In this case, you keep scanning the rest of the list, but you don't increment the counter.
Finally,
count(Needle, Haystack) -> count(Needle, Haystack, 0).
Is a helper that calls the three-argument version of the function with an initial count of 0.
Use list comprehension in Erlang:
Elem = 3,
L = [3, 3, 2, 3, 2, 5],
length([X || X <- L, X =:= Elem]) %% returns 3

How do I perform a function with side-effects over a vector?

The say-hello-to-first-person works fine, why doesn't say-hello-to-everyone?
(defpartial greeting [name]
[:p.alert "Hello " name ", how are you?"])
(defn say-hello [name]
(append $content (greeting name)))
(defn say-hello-to-first-person [names]
(say-hello (first names)))
(defn say-hello-to-everyone [names]
(map say-hello names))
(say-hello-to-first-person ["Chris" "Tom" "Jim" "Rocky"])
(say-hello-to-everyone ["Chris" "Tom" "Jim" "Rocky"])
You want doseq, Clojure's "foreach":
(doseq [name names]
(say-hello name))
map doesn't work because it produces a lazy sequence: Until you actually access the items in the sequence (which you don't here), none of them will be evaluated, and side effects won't occur.
Another option is to force evaluation by wrapping dorun around the map. Thinking about why this works may help your understanding (although doseq is the idiomatic approach here).
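A rough way to see the laziness at work, assuming a stand-in say-hello that records names in an atom instead of appending to the page (greeted, pending and before-dorun are made-up names for this sketch):

```clojure
(def greeted (atom []))

;; Stand-in for say-hello: records the name rather than touching the page.
(defn say-hello [name]
  (swap! greeted conj name))

;; Building the lazy seq does not call say-hello.
(def pending (map say-hello ["Chris" "Tom"]))
(def before-dorun @greeted)   ; => []

;; dorun walks the seq, forcing each call, and returns nil.
(dorun pending)
;; @greeted => ["Chris" "Tom"]
```

At a REPL, printing the result of map would also realize it, which is why the problem can be easy to miss while experimenting.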
As @edbond says in his comment, map doesn't evaluate anything until it is necessary, because it returns a lazy seq.
This is the docstring of map:
Returns a lazy sequence consisting of the result of applying f to the
set of first items of each coll, followed by applying f to the set
of second items in each coll, until any one of the colls is
exhausted. Any remaining items in other colls are ignored. Function
f should accept number-of-colls arguments.
If you need to be sure that the values are evaluated at a specific point in your code, you can use doall or dorun.
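The two differ in what they hand back: doall returns the realized sequence, while dorun discards the results and returns nil. A minimal sketch (shout, log, kept and discarded are illustrative names):

```clojure
(require '[clojure.string :as str])

(def log (atom []))

;; Has a side effect (appends to log) and a return value (upper-cased string).
(defn shout [s]
  (swap! log conj s)
  (str/upper-case s))

(def kept      (doall (map shout ["a" "b"])))  ; => ("A" "B")
(def discarded (dorun (map shout ["c" "d"])))  ; => nil
;; @log => ["a" "b" "c" "d"]
```

Either call forces the side effects; doall is the one to reach for when the results are needed afterwards.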

How can I get the name of a function from a symbol in clojure?

Suppose I define x to refer to the function foo:
(defn foo [x] x)
(def x foo)
Can the name "foo" be discovered if only given x?
Is there a way within foo to look up the name of the function x - "foo" in this case?
(foo x)
Is there or is it possible to create a function such as:
(get-fn-name x)
foo
A similar question was asked recently on this site; see here
When you do (def x foo), you are defining x to be "the value at foo", not "foo itself". Once foo has resolved to its value, that value no longer has any relationship whatsoever to foo.
So maybe you see one possible answer to your question now: don't resolve foo when you go to do define x. Instead of doing...
(def x foo)
...do...
(def x 'foo)
Now if you try to get the value of x, you will get foo (literally), not the value that foo resolves to.
user> x
=> foo
However, that is likely problematic, because you will probably also sometimes want to be able to get at the value that foo resolves to using x. However however, you would be able to do this by doing:
user> @(resolve x)
=> #<user$foo user$foo@157b46f>
If I were to describe what this does it would be: "get the value x resolves to, use that as a symbol, then resolve that symbol to its var (not its value), and dereference that var to obtain a value".
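Spelled out step by step, assuming everything is evaluated at the top level of a single namespace (the-var and the-fn are names made up for this sketch):

```clojure
(defn foo [a] a)
(def x 'foo)                 ; x holds the symbol foo, not the function

(def the-var (resolve x))    ; symbol -> var, looked up in the current namespace
(def the-fn  @the-var)       ; var -> the function it currently holds

;; (the-fn 5) => 5
```

resolve looks the symbol up in the namespace that is current when it runs, so calling it from another namespace may return nil for an unqualified symbol.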
...Now let's do something hacky. I'm not sure I would advise doing either of these things I'm about to suggest, but you did ask Can the name "foo" be discovered if only given x?, and I can think of two ways you could do that.
Method #1: regex the fn var name
Notice what foo and x both resolve to below:
(defn foo [a] (println a))
(def x foo)
user> foo
=> #<user$foo user$foo@1e2afb2>
user> x
=> #<user$foo user$foo@1e2afb2>
Now, check this out:
user> (str foo)
=> "user$foo@1e2afb2"
user> (str x)
=> "user$foo@1e2afb2"
Cool. This only works because foo resolves to a function, which happens to have a var-like name, a name which will be the same for x because it refers to the same function. Note that "foo" is contained within the string produced by (str x) (and also by (foo x)). This is because the function's var name is apparently created with some backwards reference to the symbol that was used to initially define it. We're going to use this fact to find that very symbol from any function.
So, I wrote a regular expression to find "foo" inside that string of the function var name. It isn't that it looks for "foo", but rather that it looks for any sub-string--in regex terms, ".*"--that is preceded by a $ character--in regex terms "(?<=\$)"--and followed by the @ character--in regex terms "(?=@)"...
user> (re-find #"(?<=\$).*(?=@)"
(str x))
=> "foo"
We can further convert this to a symbol by simply wrapping (symbol ...) around it:
user> (symbol (re-find #"(?<=\$).*(?=@)"
(str x)))
=> foo
Furthermore, this whole process could be generalized to a function that will take a function and return the symbol associated with that function's var name--which is the symbol that was given when the function was initially defined (this process will not work nicely at all for anonymous functions).
(defn get-fn-init-sym [f]
(symbol (re-find #"(?<=\$).*(?=@)" (str f))))
...or this which I find nicer to read...
(defn get-fn-init-sym [f]
(->> (str f)
(re-find #"(?<=\$).*(?=@)")
symbol))
Now we can do...
user> (get-fn-init-sym x)
=> foo
Method #2: reverse lookup all ns mappings based on identity
This is going to be fun.
So, we're going to take all the namespace mappings, then dissoc 'x from it, then filter what remains based on whether the val at each mapping refers to the exact same thing as what x resolves to. We'll take the first thing in that filtered sequence, and then we'll take the key at that first thing in order to get the symbol.
user> (->> (dissoc (ns-map *ns*) 'x)
(filter #(identical? (let [v (val %)]
(if (var? v) @v v))
x))
first
key)
=> foo
Notice that if you replaced x with foo above, you would get x. Really all this is doing is returning the first name it finds that maps to the exact same value as x. As before, this could be generalized to a function:
(defn find-equiv-sym [sym]
(->> (dissoc (ns-map *ns*) sym)
(filter #(identical? (let [v (val %)]
(if (var? v) @v v))
@(resolve sym)))
first
key))
The main difference here is that the argument will have to be a quoted symbol.
user> (find-equiv-sym 'x)
=> foo
This find-equiv-sym function is really not very good. Problems will happen when you have multiple things in the namespace resolving to identical values. You could return this list of symbols that resolve to identical things (instead of just returning the first one), and then process it further from there. It would be simple to change the current function to make this work: delete the last two lines (first and key), and replace them with (map key).
Anyways, I hope this was as fun and interesting for you as it was for me, but I doubt whether either of these hacks would be a good way of going about things. I advocate my first solution.
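As a variation on Method #1 (a swapped-in technique, not what the answer above does), the class name of the function can be read directly and passed through clojure.repl/demunge instead of regex-matching the printed string. This sketch assumes the function was created with a top-level defn; fn-name-from-class, sample-fn and alias-of-sample are hypothetical names:

```clojure
(require '[clojure.repl :refer [demunge]]
         '[clojure.string :as str])

;; A top-level defn compiles to a class named like my.ns$my_fn;
;; demunge turns that into "my.ns/my-fn", and we keep the part after the "/".
(defn fn-name-from-class [f]
  (-> f class .getName demunge (str/split #"/") last symbol))

(defn sample-fn [a] a)
(def alias-of-sample sample-fn)

;; (fn-name-from-class alias-of-sample) => sample-fn
```

Like the regex version, this does not give a meaningful answer for anonymous functions, whose class names are generated by the compiler.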
It's not clear why you would want to do this - when you do (def x foo) you are effectively giving the name x to a new var in your namespace. It happens to have the same value as foo (i.e. it contains the same function) but is otherwise completely independent from foo. It's like having two references to the same object, to use a Java analogy.
Why should you continue to want to obtain the name foo?
If you really want to do something similar to this, this might be a case where you could use some custom metadata on the function which contains the original symbol:
(def foo
(with-meta
(fn [x] x)
{:original-function `foo}))
(def bar foo)
(defn original-function
"Returns the :original-function symbol from the metadata map"
[v]
(:original-function (meta v)))
(original-function bar)
=> user/foo

Clojure: Assigning defrecord fields from Map

Following up on How to make a record from a sequence of values, how can you write a defrecord constructor call and assign the fields from a Map, leaving un-named fields nil?
(defrecord MyRecord [f1 f2 f3])
(assign-from-map MyRecord {:f1 "Huey" :f2 "Dewey"}) ; returns a new MyRecord
I imagine a macro could be written to do this.
You can simply merge the map into a record initialised with nils:
(merge (MyRecord. nil nil nil) {:f1 "Huey" :f2 "Dewey"})
Note that records are capable of holding values stored under extra keys in a map-like fashion.
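A short sketch of both points, with a made-up extra key :f4 (r is just an illustrative name):

```clojure
(defrecord MyRecord [f1 f2 f3])

;; Start from an all-nil record and merge the map into it.
(def r (merge (MyRecord. nil nil nil)
              {:f1 "Huey" :f2 "Dewey" :f4 "Louie"}))

;; (instance? MyRecord r) => true
;; (:f3 r)                => nil      ; declared field left unset
;; (:f4 r)                => "Louie"  ; extra key kept alongside the fields
```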
The list of a record's fields can be obtained using reflection:
(defn static? [field]
(java.lang.reflect.Modifier/isStatic
(.getModifiers field)))
(defn get-record-field-names [record]
(->> record
.getDeclaredFields
(remove static?)
(map #(.getName %))
(remove #{"__meta" "__extmap"})))
The latter function returns a seq of strings:
user> (get-record-field-names MyRecord)
("f1" "f2" "f3")
__meta and __extmap are the fields used by Clojure records to hold metadata and to support the map functionality, respectively.
You could write something like
(defmacro empty-record [record]
(let [klass (Class/forName (name record))
field-count (count (get-record-field-names klass))]
`(new ~klass ~@(repeat field-count nil))))
and use it to create empty instances of record classes like so:
user> (empty-record user.MyRecord)
#:user.MyRecord{:f1 nil, :f2 nil, :f3 nil}
The fully qualified name is essential here. It's going to work as long as the record class has been declared by the time any empty-record forms referring to it are compiled.
If empty-record was written as a function instead, one could have it expect an actual class as an argument (avoiding the "fully qualified" problem -- you could name your class in whichever way is convenient in a given context), though at the cost of doing the reflection at runtime.
Clojure now (since 1.3) generates a map->RecordType function when a record is defined.
(defrecord Person [first-name last-name])
(def p1 (map->Person {:first-name "Rich" :last-name "Hickey"}))
The map is not required to define all fields in the record definition, in which case missing keys have a nil value in the result. The map is also allowed to contain extra fields that aren't part of the record definition.
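For instance, with one field missing and one made-up extra key (:nickname and p2 are illustrative only):

```clojure
(defrecord Person [first-name last-name])

(def p2 (map->Person {:first-name "Rich" :nickname "rhickey"}))

;; (:last-name p2)        => nil        ; missing field defaults to nil
;; (:nickname p2)         => "rhickey"  ; extra key is retained
;; (instance? Person p2)  => true
```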
As mentioned in the linked question responses, the code here shows how to create a defrecord2 macro to generate a constructor function that takes a map, as demonstrated here. Specifically of interest is the make-record-constructor macro.