Clojure read CSV and split the columns into several vectors - csv

Currently I have functions like this:
(require '[clojure.java.io :as io]
         '[clojure.data.csv :as csv])
(def csv-file (.getFile (io/resource "datasources.csv")))
(defn process-csv [file]
  (with-open [in-file (io/reader file)]
    (doall (csv/read-csv in-file))))
What I need to do now is produce vectors grouped by column from the CSV. For example, my process-csv output looks like this:
(["atom" "neutron" "photon"]
 [10 22 3]
 [23 23 67])
My goal is to generate 3 vectors from the columns atom, neutron and photon:
atom: [10 23]
neutron: [22 23]
photon: [3 67]
FYI, I define 3 empty vectors before reading the CSV file:
(def atom [])
(def neutron [])
(def photon [])

First of all, you can't modify the vectors you've defined; that is the nature of immutable data structures. If you really need mutable state, wrap the vector in an atom.
You can solve your task this way:
user> (def items (rest '(["atom" "neutron" "photon"]
                         [10 22 3]
                         [23 23 67]
                         [1 2 3]
                         [5 6 7])))
user> (let [[atom neutron photon] (apply map vector items)]
        {:atom atom :neutron neutron :photon photon})
{:atom [10 23 1 5], :neutron [22 23 2 6], :photon [3 67 3 7]}
This is how it works:
(apply map vector items) is equivalent to the following:
(map vector [10 22 3] [23 23 67] [1 2 3] [5 6 7])
It takes the first item of each collection and makes a vector of them, then the second items, and so on.
You can also make it more robust by taking the column names directly from the header row of your CSV data:
user> (def items '(["atom" "neutron" "photon"]
                   [10 22 3]
                   [23 23 67]
                   [1 2 3]
                   [5 6 7]))
#'user/items
user> (zipmap (map keyword (first items))
              (apply map vector (rest items)))
{:atom [10 23 1 5], :neutron [22 23 2 6], :photon [3 67 3 7]}
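One caveat when applying this to the real file: csv/read-csv returns every cell as a string, so the columns come back as ["10" "23"] rather than [10 23]. Here is a minimal sketch of the same zipmap approach with a parsing step. It assumes that every data cell holds an integer and that there is at least one data row; rows stands in for the output of process-csv, and columns->map is a hypothetical helper name.

```clojure
;; Hypothetical sample of what csv/read-csv returns: every cell is a string.
(def rows [["atom" "neutron" "photon"]
           ["10" "22" "3"]
           ["23" "23" "67"]])

(defn columns->map
  "Turns a header row plus data rows into {:column-name [values ...]}.
  Assumes integer cells and at least one data row."
  [[header & data]]
  (zipmap (map keyword header)
          ;; transpose the data rows and parse each cell on the way
          (apply map (fn [& cells] (mapv #(Long/parseLong %) cells)) data)))

(columns->map rows)
;;=> {:atom [10 23], :neutron [22 23], :photon [3 67]}
```

If the file mixes numeric and text columns, the parsing function would have to be chosen per column instead.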

I'll illustrate some other methods you could use, which can be combined with methods that leetwinski illustrates. Like leetwinski, I'll suggest using a hash map as your final structure, rather than three symbols containing vectors. That's up to you.
If you want, you can use core.matrix's transpose to do what leetwinski does with (apply map vector ...):
(require '[clojure.core.matrix :as mx])
(mx/transpose '(["atom" "neutron" "photon"] [10 22 3] [23 23 67]))
which produces:
[["atom" 10 23] ["neutron" 22 23] ["photon" 3 67]]
transpose is designed to work on any kind of matrix that implements the core.matrix protocols, and normal Clojure sequences of sequences are treated as matrices by core.matrix.
To generate a map, here's one approach:
(into {} (map #(vector (keyword (first %)) (rest %))
              (mx/transpose '(["atom" "neutron" "photon"] [10 22 3] [23 23 67]))))
which produces:
{:atom (10 23), :neutron (22 23), :photon (3 67)}
keyword makes strings into keywords. #(vector ...) makes a pair, and (into {} ...) takes the sequence of pairs and makes a hash map from them.
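Note that the map above holds sequences such as (10 23), because rest returns a seq. If vectors are wanted, a possible variant in plain Clojure (no core.matrix) is sketched below; table and by-column are hypothetical names, and the transducer arity of into requires Clojure 1.7 or later.

```clojure
;; Same sample data as above.
(def table '(["atom" "neutron" "photon"] [10 22 3] [23 23 67]))

(def by-column
  (into {}
        ;; each transposed row looks like [header value value ...]
        (map (fn [[header & values]] [(keyword header) (vec values)]))
        (apply map vector table)))

by-column
;;=> {:atom [10 23], :neutron [22 23], :photon [3 67]}
```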
Or if you want the vectors in vars, as you specified, then you can use a variant of leetwinski's let method. I suggest not defing the symbol atom, because that's the name of a standard function in Clojure.
(let [[adam neutron proton] (mx/transpose
                              (rest '(["atom" "neutron" "photon"]
                                      [10 22 3]
                                      [23 23 67])))]
  (def adam adam)
  (def neutron neutron)
  (def proton proton))
It's not exactly good form to use def inside a let, but you can do it. Also, I don't recommend giving the locals bound by let the same names as the top-level vars. As you can see, it makes the defs confusing. I did this on purpose here just to show how the scoping rule works: in (def adam adam), the first "adam" names the top-level var that gets defined, whereas the second "adam" refers to the local bound by let, which holds [10 23]. The result is:
adam ;=> [10 23]
neutron ;=> [22 23]
proton ;=> [3 67]
(I think there are probably some subtleties that I'm expressing incorrectly. If so, someone will no doubt comment about it.)

Related

Function to generate two random numbers between 0 and 1 as a pair of [x y]

I haven't used Clojure in a while and would appreciate some help.
I tried doing
(defn num [] (rand-int 2) (rand-int 2))
Return both values in a vector:
(defn randints [] [(rand-int 2) (rand-int 2)])
if you meant random integers that are each 0 or 1, or
(defn randfloats [] [(rand) (rand)])
if you meant random floating-point numbers between 0 (inclusive) and 1 (exclusive).
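To illustrate why the first attempt yields a single number: a function body evaluates all of its expressions but returns only the last one. The sketch below uses the hypothetical names last-only and rand-pair (the original name num would also shadow clojure.core/num).

```clojure
;; The body evaluates both calls but returns only the last value.
(defn last-only [] (rand-int 2) (rand-int 2))

;; Wrapping both calls in a vector literal returns them as a pair.
(defn rand-pair [] [(rand) (rand)])

(number? (last-only))
;;=> true

(let [[x y] (rand-pair)]
  (and (<= 0 x) (< x 1) (<= 0 y) (< y 1)))
;;=> true
```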

How to reset a counter in Re-frame (ClojureScript)

This must be one of those silly/complex things that everybody runs into when learning a new framework. So I have this function:
(defn display-questions-list
  []
  (let [counter (atom 1)]
    [:div
     (doall (for [question @(rf/subscribe [:questions])]
              ^{:key (swap! counter inc)} [question-item (assoc question :counter @counter)]))]))
The counter atom doesn't hold any important data; it's just a "visual" counter to display the number in the list. When the page is loaded for the first time, everything works fine: if there are five questions, the list displays (1..5). The issue is that when a question is created/edited/deleted, the subscription:
@(rf/subscribe [:questions])
is dereferenced again, and then of course the list is displayed again, but now numbered from 6 to 11. So I need a way to reset the counter.
You should not be using an atom for this purpose. Your code should look more like this:
(ns tst.demo.core
  (:use tupelo.test)
  (:require [re-frame.core :as rf]
            [tupelo.core :as t]))
(defn display-questions-list
  []
  [:div
   (let [questions @(rf/subscribe [:questions])]
     (doall (for [[idx question] (t/indexed questions)]
              ^{:key idx}
              [question-item (assoc question :counter idx)])))])
The tupelo.core/indexed function from the Tupelo library simply prepends a zero-based index value to each item in the collection:
(t/indexed [:a :b :c :d :e]) =>
([0 :a]
 [1 :b]
 [2 :c]
 [3 :d]
 [4 :e])
The source code is pretty simple:
(defn zip-lazy
  "Usage: (zip-lazy coll1 coll2 ...)
      (zip-lazy xs ys zs) => [ [x0 y0 z0]
                               [x1 y1 z1]
                               [x2 y2 z2]
                               ... ]
  Returns a lazy result. Will truncate to the length of the shortest collection.
  A convenience wrapper for `(map vector coll1 coll2 ...)`. "
  [& colls]
  (assert #(every? sequential? colls))
  (apply map vector colls))
(defn indexed
  "Given one or more collections, returns a sequence of indexed tuples from the collections:
      (indexed xs ys zs) -> [ [0 x0 y0 z0]
                              [1 x1 y1 z1]
                              [2 x2 y2 z2]
                              ... ] "
  [& colls]
  (apply zip-lazy (range) colls))
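If pulling in a library just for this feels heavy, the single-collection case can be approximated with clojure.core/map-indexed. This is only a sketch; indexed* is a hypothetical stand-in and not part of Tupelo.

```clojure
(defn indexed*
  "Hypothetical stand-in for t/indexed, limited to a single collection."
  [coll]
  (map-indexed vector coll))

(indexed* [:a :b :c])
;;=> ([0 :a] [1 :b] [2 :c])
```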
Update
Actually, the main goal of the :key metadata is to provide a stable ID value for each item in the list. Since the items may appear in a different order from one render to the next, using the list index as the key is a React antipattern. Using a unique ID from within the data element (e.g. a user ID) or just the hash code provides a stable reference value. So, in practice your code would be better written like this, keeping the index only for the displayed counter:
(defn display-questions-list
  []
  [:div
   (doall (for [[idx question] (t/indexed @(rf/subscribe [:questions]))]
            ^{:key (hash question)}
            [question-item (assoc question :counter idx)]))])
Some hashcode samples:
(hash 1) => 1392991556
(hash :a) => -2123407586
(hash {:a 1, :b [2 3 4]}) => 383153859
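As a rough illustration of keying by an ID taken from the data itself, the sketch below builds plain hiccup vectors outside of any Reagent component. It assumes each question map carries a unique :id; sample-questions, question-rows and the :text field are hypothetical.

```clojure
;; Hypothetical data: each question is assumed to carry a unique :id.
(def sample-questions
  [{:id 17 :text "What is an atom?"}
   {:id 42 :text "What is a subscription?"}])

(defn question-rows
  "Returns hiccup vectors whose :key metadata comes from :id,
  while the displayed number comes from the position in the list."
  [questions]
  (map-indexed
    (fn [idx question]
      (with-meta [:li (str (inc idx) ". " (:text question))]
                 {:key (:id question)}))
    questions))

(map (comp :key meta) (question-rows sample-questions))
;;=> (17 42)

(first (question-rows sample-questions))
;;=> [:li "1. What is an atom?"]
```

Here with-meta does the same job as the ^{:key ...} reader syntax; the key stays attached to the question even if the list is reordered.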

Wrong arity of simple function in clojure

I've started to learn clojure. In my book there is following exercise:
Write a function, mapset, that works like map except the return value is a set:
(mapset inc [1 1 2 2])
; => #{2 3}
I've started with something like this:
(defn mapset
[vect]
(set vect))
The result is an error:
"Wrong number of args (2) passed to: core/mapset"
I tried [& args] as well.
So, the question is: how can I solve this problem?
Take a closer look at your call to mapset:
(mapset inc [1 1 2 2])
Since code is data, this "call" is just a list of three elements:
The symbol mapset
The symbol inc
The vector [1 1 2 2]
When you evaluate this code, Clojure will see that it is a list and proceed to evaluate each of the items in that list (once it determines that it isn't a special form or macro), so it will then have a new list of three elements:
The function to which the symbol core/mapset was bound
The function to which the symbol clojure.core/inc was bound
The vector [1 1 2 2]
Finally, Clojure will call the first element of the list with the rest of the elements as arguments. In this case, there are two arguments in the rest of the list, but in your function definition, you only accounted for one:
(defn mapset
[vect]
(set vect))
To remedy this, you could implement mapset as follows:
(defn mapset
[f vect]
(set (map f vect)))
Now, when you call (mapset inc [1 1 2 2]), the argument f will be bound to the function clojure.core/inc, and the argument vect will be bound to the vector [1 1 2 2].
Your definition of mapset takes a single argument, vect.
At a minimum you need to take 2 arguments, a function and a sequence:
(defn mapset [f xs] (set (map f xs)))
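The "code is data" point can be checked at the REPL by quoting the call so that it is not evaluated. A small sketch, with call as a hypothetical name:

```clojure
;; Quoting keeps the form as plain data: a list of three elements.
(def call '(mapset inc [1 1 2 2]))

(count call)             ;=> 3
(first call)             ;=> mapset
(symbol? (second call))  ;=> true
(count (rest call))      ;=> 2, the number of arguments that will be passed
```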
But it is interesting to think about this as the composition of 2 functions also:
(def mapset (comp set map))
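Another possible spelling pours the mapped values straight into a set with a transducer, which avoids building the intermediate lazy sequence. This is a sketch rather than the book's intended answer; mapset-xf is a hypothetical name, and the transducer arity of into requires Clojure 1.7 or later.

```clojure
(defn mapset-xf
  "Like map over a single collection, but returns a set."
  [f coll]
  (into #{} (map f) coll))

(mapset-xf inc [1 1 2 2])
;;=> #{2 3}
```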

Retrieve Clojure function metadata dynamically

Environment: Clojure 1.4
I'm trying to pull function metadata dynamically from a vector of functions.
(defn #^{:tau-or-pi: :pi} funca "doc for func a" {:ans 42} [x] (* x x))
(defn #^{:tau-or-pi: :tau} funcb "doc for func b" {:ans 43} [x] (* x x x))
(def funcs [funca funcb])
Now, retrieving the metadata in the REPL is (somewhat) straight-forward:
user=>(:tau-or-pi (meta #'funca))
:pi
user=>(:ans (meta #'funca))
42
user=>(:tau-or-pi (meta #'funcb))
:tau
user=>(:ans (meta #'funcb))
43
However, when I try to do a map to get the :ans, :tau-or-pi, or basic :name from the metadata, I get the exception:
user=>(map #(meta #'%) funcs)
CompilerException java.lang.RuntimeException: Unable to resolve var: p1__1637# in this context, compiling:(NO_SOURCE_PATH:1)
After doing some more searching, I got the following idea from a posting in 2009 (https://groups.google.com/forum/?fromgroups=#!topic/clojure/VyDM0YAzF4o):
user=>(map #(meta (resolve %)) funcs)
ClassCastException user$funca cannot be cast to clojure.lang.Symbol clojure.core/ns-resolve (core.clj:3883)
I know that the defn macro (in Clojure 1.4) is putting the metadata on the Var in the def portion of the defn macro so that's why the simple (meta #'funca) is working, but is there a way to get the function metadata dynamically (like in the map example above)?
Maybe I'm missing something syntactically, but if anyone could point me in the right direction or the right approach, that would be great.
Thanks.
The expression #(meta #'%) is reader shorthand that expands to an anonymous function (a call to fn, actually fn*) with a parameter named p1__1637#, which was produced with gensym. Inside it, #'% expands to (var p1__1637#), so the compiler tries to resolve that local parameter name as a var. Since no var exists with that name, you get this error.
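The expansion can be inspected by reading the expression without evaluating it. In this sketch, expanded is a hypothetical name, and the generated parameter name will differ from run to run.

```clojure
;; read-string runs only the reader, so nothing is compiled or resolved here.
(def expanded (read-string "#(meta #'%)"))

(first expanded)
;;=> fn*

(last expanded)
;; a form shaped like (meta (var p1__N#)), where p1__N# is the generated parameter

(read-string "#'funca")
;;=> (var funca)
```

Because var is a special form that takes a symbol naming a var, it never looks at the value bound to the parameter.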
If you start with a vector of vars instead of a vector of functions then you can just map meta onto them. You can use a var (very nearly) anywhere you would use a function with a very very minor runtime cost of looking up the contents of the var each time it is called.
user> (def vector-of-functions [+ - *])
#'user/vector-of-functions
user> (def vector-of-symbols [#'+ #'- #'*])
#'user/vector-of-symbols
user> (map #(% 1 2) vector-of-functions)
(3 -1 2)
user> (map #(% 1 2) vector-of-symbols)
(3 -1 2)
user> (map #(:name (meta %)) vector-of-symbols)
(+ - *)
user>
so adding a couple #'s to your original code and removing an extra trailing : should do the trick:
user> (defn #^{:tau-or-pi :pi} funca "doc for func a" {:ans 42} [x] (* x x))
#'user/funca
user> (defn #^{:tau-or-pi :tau} funcb "doc for func b" {:ans 43} [x] (* x x x))
#'user/funcb
user> (def funcs [#'funca #'funcb])
#'user/funcs
user> (map #(meta %) funcs)
({:arglists ([x]), :ns #<Namespace user>, :name funca, :ans 42, :tau-or-pi :pi, :doc "doc for func a", :line 1, :file "NO_SOURCE_PATH"} {:arglists ([x]), :ns #<Namespace user>, :name funcb, :ans 43, :tau-or-pi :tau, :doc "doc for func b", :line 1, :file "NO_SOURCE_PATH"})
user> (map #(:tau-or-pi (meta %)) funcs)
(:pi :tau)
user>
Recently, I found it useful to attach metadata to the functions themselves rather than the vars as defn does.
You can do this with good ol' def:
(def funca ^{:tau-or-pi :pi} (fn [x] (* x x)))
(def funcb ^{:tau-or-pi :tau} (fn [x] (* x x x)))
Here, the metadata has been attached to the functions and then those metadata-laden functions are bound to the vars.
The nice thing about this is that you no longer need to worry about vars when considering the metadata. Since the functions contain metadata instead, you can pull it from them directly.
(def funcs [funca funcb])
(map (comp :tau-or-pi meta) funcs) ;=> (:pi :tau)
Obviously the syntax of def isn't quite as refined as defn for functions, so depending on your usage, you might be interested in re-implementing defn to attach metadata to the functions.
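Short of re-implementing defn, a small helper built on with-meta may already be enough. The following is a sketch under that assumption; tagged is a hypothetical helper name.

```clojure
(defn tagged
  "Hypothetical helper: returns the function f carrying the metadata map m."
  [m f]
  (with-meta f m))

(def funca (tagged {:tau-or-pi :pi} (fn [x] (* x x))))
(def funcb (tagged {:tau-or-pi :tau} (fn [x] (* x x x))))

(map (comp :tau-or-pi meta) [funca funcb])
;;=> (:pi :tau)

(funca 3)
;;=> 9
```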
I'd like to elaborate on Beyamor's answer. For some code I'm writing, I am using this:
(def ^{:doc "put the-func docstring here" :arglists '([x])}
the-func
^{:some-key :some-value}
(fn [x] (* x x)))
Yes, it is a bit unwieldy to have two metadata maps. Here is why I do it:
The first metadata attaches to the the-func var. So you can use (doc the-func) which returns:
my-ns.core/the-func
([x])
put the-func docstring here
The second metadata attaches to the function itself. This lets you use (meta the-func) to return:
{:some-key :some-value}
In summary, this approach comes in handy when you want both docstrings in the REPL as well as dynamic access to the function's metadata.

clojure join of two CSV files in vector of vectors format

I am new to Clojure and want to do this correctly. I have two sources of date-stamped data from two CSV files. I have pulled them in and put them in vector-of-vectors format. I would like to combine the data with a join (an outer join).
;--- this is how I am loading the data for each file.... works great ---
(def csvfile (slurp "table.csv"))
(def csvdat (clojure.string/split-lines csvfile))
(def final (vec (rest (map (fn [x] (clojure.string/split x #",")) csvdat))))
CSV File 1:
date value1 value2 value3
CSV File 2:
date valueA valueB valueC
Resulting vector of vectors format:
date value1 value2 value3 valueA valueB valueC
I have several ugly ideas; I just want to pick the best ugly idea. :)
Option 1: get a unique set of times in sequence and map all the data from the two vectors of vectors into a new vector of vectors.
Option 2: is there a clever way to map from two vectors of vectors to a new vector of vectors (more advanced mapping than I can speak to with my experience)?
What is the most idiomatic Clojure method of doing "joins"? Should I be using maps? I like vectors because I will be doing a lot of range calculations after the CSVs are joined, like moving a window (groups of rows) down the rows of the joined data.
Your data:
(def csv1 [["01/01/2012" 1 2 3 4]["06/15/2012" 38 24 101]])
(def csv2 [["01/01/2012" 99 98 97 96]["06/15/2012" 28 101 43]])
Convert each CSV's vector-of-vectors representation to a map keyed by date (vec keeps the values in a vector, so that into appends them in order):
(defn to-map [v] (into {} (map (fn [[date & data]] [date (vec data)]) v)))
Merge the maps:
(merge-with into (to-map csv1) (to-map csv2))
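The merge above yields a map and does not pad dates that appear in only one file. A possible full outer join that returns a vector of vectors again is sketched below. The names index-by-date, outer-join, file1 and file2 are hypothetical, the column counts are passed in explicitly, and dates are ordered by a plain string sort, which is an assumption that only holds while that order is acceptable for your date format.

```clojure
(defn index-by-date
  "Maps each row's first field (the date) to a vector of its remaining fields."
  [rows]
  (into {} (map (fn [[date & data]] [date (vec data)])) rows))

(defn outer-join
  "Full outer join of two row collections on the date field.
  left-width and right-width are the number of value columns on each side;
  a side with no row for a date is padded with nils."
  [left left-width right right-width]
  (let [l     (index-by-date left)
        r     (index-by-date right)
        ;; plain string sort of all dates seen on either side
        dates (sort (distinct (concat (keys l) (keys r))))]
    (mapv (fn [date]
            (-> [date]
                (into (get l date (repeat left-width nil)))
                (into (get r date (repeat right-width nil)))))
          dates)))

;; Hypothetical sample rows with three value columns each.
(def file1 [["01/01/2012" 1 2 3] ["06/15/2012" 4 5 6]])
(def file2 [["01/01/2012" 7 8 9] ["07/04/2012" 10 11 12]])

(outer-join file1 3 file2 3)
;;=> [["01/01/2012" 1 2 3 7 8 9]
;;    ["06/15/2012" 4 5 6 nil nil nil]
;;    ["07/04/2012" nil nil nil 10 11 12]]
```

Since the result is a vector of row vectors, windowed calculations such as (partition 2 1 joined-rows) remain straightforward afterwards.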
As I understand it, you have data that sort of looks like this:
(def csv1 [["01/01/2012" 1 2 3 4]["06/15/2012" 38 24 101]])
(def csv2 [["01/01/2012" 99 98 97 96]["06/15/2012" 28 101 43]])
Well, you can make a map out of that.
repl-test.core=> (map #(hash-map (keyword (first %1)) (vec (rest %1))) csv1)
({:01/01/2012 [1 2 3 4]} {:06/15/2012 [38 24 101]})
Now, you have another CSV file that may or may not be in the same order (csv2 above).
Suppose I take one line of csv1:
(def l1 (first csv1))
["01/01/2012" 1 2 3 4]
and append the values for the same date from csv2 (the literal [44 43 42] stands in for them here):
(concat (hash-map (keyword (first l1)) (vec (concat (rest l1) [44 43 42]))))
([:01/01/2012 [1 2 3 4 44 43 42]])
I'm going to leave the writing of the functions to you as an exercise.
Is this what you wanted to do?
Here are some components after using lein new bene-csv:
project.clj
(defproject bene-csv "1.0.4-SNAPSHOT"
  :description "A csv parsing library"
  :dependencies [[org.clojure/clojure "1.4.0"]
                 [clojure-csv/clojure-csv "1.3.2"]
                 [util "1.0.2-SNAPSHOT"]]
  :aot [bene-csv.core]
  :omit-source true)
core.clj (just the header)
(ns bene-csv.core
  ^{:author "Charles M. Norton",
    :doc "bene-csv is a small library to parse a .csv file.
Created on March 8, 2012"}
  (:require [clojure.string :as cstr])
  (:require [util.core :as utl])
  (:use clojure-csv.core))
routine in core.clj to parse csv file
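(defn ret-csv-data
  "Returns a lazy sequence generated by parse-csv.
  Uses utl/open-file which will return a nil, if
  there is an exception in opening fnam.
  parse-csv called on non-nil file, and that
  data is returned."
  [fnam]
  (let [csv-file       (utl/open-file fnam)
        inter-csv-data (if-not (nil? csv-file)
                         (parse-csv csv-file)
                         nil)
        ;; NOTE: as written, this predicate is always truthy: pos? is passed
        ;; as a bare function value, (count %) is a number, and rest never
        ;; returns nil, so no rows are actually filtered out.
        csv-data       (vec
                         (filter
                           #(and pos? (count %) (not (nil? (rest %))))
                           inter-csv-data))]
    ;; pop drops the last row of the vector
    (pop csv-data)))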
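If the intent of the filter is to discard empty or blank rows, a predicate along the following lines might be closer to it. This is a sketch under that assumption, not the library's code; data-row? is a hypothetical name, and a row counts as blank here when every field is an empty or whitespace-only string.

```clojure
(require '[clojure.string :as str])

(defn data-row?
  "True when the row has at least one field that is not blank."
  [row]
  (boolean (and (seq row) (not-every? str/blank? row))))

(filterv data-row? [["a" "b"] [""] [] ["c" ""]])
;;=> [["a" "b"] ["c" ""]]
```

With blank rows removed this way, the trailing pop would have to be reconsidered, since it would then drop a real data row.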