Serializing recursive refs in Datomic to JSON

I have a user entity type in my Datomic database which can follow other user types. My issue comes when one user follows another user who is already following them:
User A follows user B and also User B follows user A
When I try to serialize (using Cheshire) I get a StackOverflowError because of (I'm guessing) infinite recursion on the :user/follows-users attribute.
How would I go about serializing (to JSON for an API) two Datomic entities that reference each other in this way?
Here's a basic schema:
; schema
[{:db/id #db/id[:db.part/db]
  :db/ident :user/username
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one
  :db/unique :db.unique/identity
  :db.install/_attribute :db.part/db}
 {:db/id #db/id[:db.part/db]
  :db/ident :user/follows-users
  :db/valueType :db.type/ref
  :db/cardinality :db.cardinality/many
  :db.install/_attribute :db.part/db}

 ; create users
 {:db/id #db/id[:db.part/user -100000]
  :user/username "Cheech"}
 {:db/id #db/id[:db.part/user -200000]
  :user/username "Chong"}

 ; create follow relationships
 {:db/id #db/id[:db.part/user -100000]
  :user/follows-users #db/id[:db.part/user -200000]}
 {:db/id #db/id[:db.part/user -200000]
  :user/follows-users #db/id[:db.part/user -100000]}]
And once the database is set up etc., at the REPL:
user=> (use '[cheshire.core :refer :all])
nil
user=> (generate-string (d/touch (d/entity (d/db conn) [:user/username "Cheech"])))
StackOverflowError clojure.lang.RestFn.invoke (RestFn.java:433)

Eager expansion of linked data structures is only safe, in any language, if the structures are cycle-free. An API that promises to "eagerly expand data only until a cycle is found, then switch to linking (by user id)" may be harder to consume reliably than one that never expands and instead always returns enough users in the response to follow all the links. For instance, the request above could return the JSON:
[{"id": -100000,
"username": "Cheech",
"follows-users": [-200000]}
{"id": -200000,
"username": "Chong",
"follows-users": [-100000]}]
Here the set of users to include is found by walking the user graph and reducing it into a set.
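A minimal sketch of that idea, assuming a Datomic db value from the connection above (walk-follows, entity->json-map and user-graph->json are illustrative names, not library functions):
(require '[datomic.api :as d]
         '[cheshire.core :as json])

(defn walk-follows
  "Collect the entity id of a user and of everyone reachable
  through :user/follows-users, stopping once an id has been seen."
  [db eid]
  (loop [seen #{} frontier [eid]]
    (if-let [e (first frontier)]
      (if (seen e)
        (recur seen (rest frontier))
        (recur (conj seen e)
               (into (rest frontier)
                     (map :db/id (:user/follows-users (d/entity db e))))))
      seen)))

(defn entity->json-map
  "Flatten one user, replacing followed users with their ids."
  [db eid]
  (let [e (d/entity db eid)]
    {:id            eid
     :username      (:user/username e)
     :follows-users (mapv :db/id (:user/follows-users e))}))

(defn user-graph->json
  "Serialize the whole reachable set of users as one flat JSON array."
  [db eid]
  (json/generate-string (mapv #(entity->json-map db %) (walk-follows db eid))))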

I'm a bit of a n00b to Datomic and am certain there must be a more idiomatic way of doing what @arthur-ulfeldt suggests above, but in case anyone else is looking for a quick pointer on how to serialize Datomic EntityMaps into JSON when a self-referencing ref exists, here's the code that solved my problem:
(defn should-pack?
  "Returns true if the attribute is type
  ref with a cardinality of many"
  [attr]
  (->> (d/q '[:find ?attr
              :in $ ?attr
              :where
              [?attr :db/valueType ?type]
              [?type :db/ident :db.type/ref]
              [?attr :db/cardinality ?card]
              [?card :db/ident :db.cardinality/many]]
            (d/db CONN) attr)
       first
       empty?
       not))
(defn make-serializable
  "Stop infinite loops on recursive refs by replacing
  cardinality-many ref values with their entity ids."
  [entity]
  (reduce (fn [m [attr value]]
            (if (should-pack? attr)
              ;; get-entity-id is a small helper defined elsewhere in my project
              (assoc m attr (map get-entity-id value))
              (assoc m attr value)))
          {}
          (into {} entity)))
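With that in place, the REPL call from the question can go through the packed map first (a sketch using the same conn/CONN and referred generate-string as above; the exact key names in the output will follow your schema):
(generate-string
  (make-serializable
    (d/touch (d/entity (d/db CONN) [:user/username "Cheech"]))))
;; => a JSON object whose user/follows-users value is a vector of entity ids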

Related

Decode a JSON in Clojure (clojure.data.json & cheshire.core): can't custom decode with cheshire

My project parses JSON with a read/write library called cheshire.core.
I was having problems trying to get the decode function to work, so I started messing around with data.json.
My JSON data has a field named "zone" that should hold a vector of keywords, like {:zone [:hand :table]}, but in the JSON the vector's contents are stored as strings, like {"zone" : ["hand", "table"]}.
So I figured out how to convert the sample data using:
(mapv keyword ["hand"])
which was great. I then needed to figure out how to implement a decoder for cheshire, but I couldn't get it working with my logic (I only spent about an hour on it). With data.json the decoder function is relatively easy, I think.
I got my project to work, here is some sample code:
(ns clojure-noob.core
  (:require [cheshire.core :refer [decode]]
            [clojure.data.json :as j-data])
  (:gen-class))

(defn -main
  "I don't do a whole lot ... yet."
  [& args])
this is using cheshire:
(let [init (decode "{\"zone\" : [\"hand\"]}" true
                   (fn [field-name]
                     (if (= field-name "zone")
                       (mapv keyword [])
                       [])))]
  (println (str init)))
this is using data.json:
(defn my-value-reader [key value]
  (if (= key :zone)
    (mapv keyword value)
    value))

(let [init (j-data/read-str "{\"zone\" : [\"hand\"]}"
                            :value-fn my-value-reader
                            :key-fn keyword)]
  (println (str init)))
I want the bottom result of these two from the console:
{:zone ["hand"]}
{:zone [:hand]}
The problem is I would like to do this using cheshire 😎
P.S. I am reading the factory section of the cheshire docs; maybe that is easier?
I would agree with @TaylorWood. Don't mess with the decoder; just take one bite at a time. First, parse the JSON. Second, transform the result.
(def data "{\"zone\" : [\"hand\"]}")

(-> data
    (cheshire.core/decode true)
    (update-in [:zone] (partial mapv keyword)))
;; => {:zone [:hand]}
(Note that decode with true keywordizes the keys, so the path is [:zone], not ["zone"].)
I recommend you use a tool like schema-tools to coerce the input. You can add a second pass that attempts to coerce JSON strings into richer Clojure types.
Here's some sample code!
;; require all the dependencies. See links below for libraries you need to add
(require '[cheshire.core :as json])
(require '[schema.core :as s])
(require '[schema.coerce :as sc])
(require '[schema-tools.core :as st])
;; your data (as before)
(def data "{\"zone\" : [\"hand\"]}")
;; a schema that wants an array of keywords
(s/defschema MyData {:zone [s/Keyword]})
;; use `select-schema` along with a JSON coercion matcher
(-> data
(json/decode true)
(st/select-schema MyData sc/json-coercion-matcher))
;; output: {:zone [:hand]}
Using defschema to define the shape of data you want gives you a general solution for deserializing JSON while getting the full benefit of Clojure's value types. Instead of explicitly "doing" the work of transforming, your schema describes the expected outcome, and hopefully the coercions can do the right thing!
Links to libraries:
- https://github.com/plumatic/schema
- https://github.com/metosin/schema-tools#coercion
Note: you can do a similar thing with clojure.spec using metosin/spec-tools. Check out their readme for some help.
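For the clojure.spec route, a rough, untested sketch might look like the following (it assumes spec-tools' st/coerce and json-transformer behave as described in the readme linked above):
(require '[clojure.spec.alpha :as s]
         '[spec-tools.core :as st]
         '[cheshire.core :as json])

(s/def ::zone (s/coll-of keyword? :kind vector?))
(s/def ::my-data (s/keys :req-un [::zone]))

;; decode to keywordized maps, then let the JSON transformer coerce
;; the "hand" string into the :hand keyword the spec asks for
(let [parsed (json/decode "{\"zone\" : [\"hand\"]}" true)]
  (st/coerce ::my-data parsed st/json-transformer))
;; expected: {:zone [:hand]}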

How to stream a large CSV response from a compojure API so that the whole response is not held in memory at once?

I'm new to using compojure, but have been enjoying using it so far. I'm
currently encountering a problem in one of my API endpoints that is generating
a large CSV file from the database and then passing this as the response body.
The problem I seem to be encountering is that the whole CSV file is being kept
in memory which is then causing an out of memory error in the API. What is the
best way to handle and generate this, ideally as a gzipped file? Is it possible
to stream the response so that a few thousand rows are returned at a time? When
I return a JSON response body for the same data, there is no problem returning
this.
Here is the current code I'm using to return this:
(defn complete
  "Returns metrics for each completed benchmark instance"
  [db-client response-format]
  (let [benchmarks (completed-benchmark-metrics {} db-client)]
    (case response-format
      :json (json-grouped-output field-mappings benchmarks)
      :csv  (csv-output benchmarks))))

(defn csv-output [data-seq]
  (let [header (map name (keys (first data-seq)))
        out    (java.io.StringWriter.)
        write  #(csv/write-csv out (list %))]
    (write header)
    (dorun (map (comp write vals) data-seq))
    (.toString out)))
The data-seq is the results returned from the database, which I think is a
lazy sequence. I'm using yesql to perform the database call.
Here is my compojure resource for this API endpoint:
(defresource results-complete [db]
  :available-media-types ["application/json" "text/csv"]
  :allowed-methods [:get]
  :handle-ok (fn [request]
               (let [response-format (keyword (get-in request [:request :params :format] :json))
                     disposition     (str "attachment; filename=\"nucleotides_benchmark_metrics."
                                          (name response-format) "\"")
                     response        {:headers {"Content-Type"        (content-types response-format)
                                                "Content-Disposition" disposition}
                                      :body    (results/complete db response-format)}]
                 (ring-response response))))
Thanks to all the suggestions that were provided in this thread, I was able to create a solution using piped-input-stream:
(defn csv-output [data-seq]
  (let [headers    (map name (keys (first data-seq)))
        rows       (map vals data-seq)
        stream-csv (fn [out]
                     (csv/write-csv out (cons headers rows))
                     (.flush out))]
    (piped-input-stream #(stream-csv (io/make-writer % {})))))
This differs from my solution because it does not realise the sequence using dorun and does not create a large String object either. This instead writes to a PipedInputStream connection asynchronously as described by the documentation:
Create an input stream from a function that takes an output stream as its
argument. The function will be executed in a separate thread. The stream
will be automatically closed after the function finishes.
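The question also asked about gzipping the download; an untested variation of the same idea (reusing the csv, io and piped-input-stream names from above) could wrap the piped stream in a GZIPOutputStream:
(defn gzipped-csv-output [data-seq]
  (let [headers (map name (keys (first data-seq)))
        rows    (map vals data-seq)]
    (piped-input-stream
      (fn [out]
        ;; compress rows as they are written instead of holding them all in memory
        (with-open [gzip   (java.util.zip.GZIPOutputStream. out)
                    writer (io/make-writer gzip {})]
          (csv/write-csv writer (cons headers rows)))))))
You would also need to send a Content-Encoding: gzip header (or serve the file as a .csv.gz attachment) so clients know to decompress it.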
Your csv-output function completely realises the dataset and turns it into a String. To stream the data lazily, you'll need to return something other than a fully realised value like a String; Ring also accepts a lazy seq or an InputStream as the response body, which Jetty can consume incrementally.
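A minimal sketch of the lazy-seq approach (assuming clojure.data.csv is required as csv and that data-seq is itself lazy):
(defn csv-line [row]
  ;; render a single row as a CSV-formatted string (write-csv adds the newline)
  (let [sw (java.io.StringWriter.)]
    (csv/write-csv sw [row])
    (.toString sw)))

(defn csv-body [data-seq]
  ;; a lazy seq of strings; Jetty writes each element as it is realised
  (map csv-line (cons (map name (keys (first data-seq)))
                      (map vals data-seq))))

;; in the resource, return {:body (csv-body benchmarks), ...} instead of (csv-output benchmarks)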
I was also struggling with streaming a large CSV file. My solution was to use an http-kit channel to stream every single line of the data-seq to the client and then close the channel. My solution looks like this:
(require '[org.httpkit.server :refer :all])

(fn handler [req]
  (with-channel req channel
    (let [header   "your$header"
          data-seq ["your$seq-data"]]
      (doseq [line (cons header data-seq)]
        (send! channel
               {:status  200
                :headers {"Content-Type" "text/csv"}
                :body    (str line "\n")}
               false))
      (close channel))))

What is the simplest way to do upsert with Ecto (MySQL)

Doing an upsert is common in my app, and I want the cleanest and simplest way to implement it.
Should I use fragments to implement native sql upsert?
Any idiomatic ecto way to do upsert?
You can use Ecto.Repo.insert_or_update/2. Please note that for this to work, you will have to load existing models from the database.
model = %Post{id: "existing_id", ...}
MyRepo.insert_or_update Post.changeset(model, changes)
# => {:error, "id already exists"}
Example:
result =
  case MyRepo.get(Post, id) do
    nil  -> %Post{id: id} # Post not found, we build one
    post -> post          # Post exists, using it
  end
  |> Post.changeset(changes)
  |> MyRepo.insert_or_update

case result do
  {:ok, model}        -> # Inserted or updated with success
  {:error, changeset} -> # Something went wrong
end
In my case insert_or_update raised an error due to the unique index constraint 🤔
What did work for me was the Postgres 9.5+ upsert through the on_conflict option (considering the unique column is called user_id):
changeset
|> MyRepo.insert(
  on_conflict: :replace_all,
  conflict_target: :user_id
)
If you're looking to upsert by something other than id, you can swap in get_by for get like this:
model = %User{email: "existing_or_new_email@heisenberg.net", name: "Cat", ...}
model |> User.upsert_by(:email)
# => {:found, %User{...}} || {:ok, %User{...}}
defmodule App.User do
  alias App.{Repo, User}

  def upsert_by(%User{} = record_struct, selector) do
    case User |> Repo.get_by([{selector, record_struct |> Map.get(selector)}]) do
      nil  -> %User{} # build new user struct
      user -> user    # pass through existing user struct
    end
    |> User.changeset(record_struct |> Map.from_struct)
    |> Repo.insert_or_update
  end
end
On the off chance you're looking for a flexible approach that works across models and for multiple selectors (ie country + passport number), check out my hex package EctoConditionals!

Why does this query return no results?

Given these definitions of a datascript db,
(def schema
  {:tag/name    {:db/unique :db.unique/identity}
   :item/tag    {:db/valueType :db.type/ref
                 :db/cardinality :db.cardinality/many}
   :outfit/item {:db/valueType :db.type/ref
                 :db/cardinality :db.cardinality/many}})

(defonce conn (d/create-conn schema))

(defn new-entity! [conn attrs]
  (let [entity     (merge attrs {:db/id -1})
        txn-result (d/transact! conn [entity])
        temp-ids   (:tempids txn-result)]
    (temp-ids -1)))

(defonce init
  (let [tag1    (new-entity! conn {:tag/name "tag1"})
        item1   (new-entity! conn {:item/tag tag1})
        outfit1 (new-entity! conn {:outfit/item item1})]
    :ok))
If I run this devcard, I don't get any results:
(defcard find-by-tag-param
  "find items by tag"
  (d/q '[:find ?item
         :in ? ?tagname
         :where
         [?tag :tag/name ?tagname]
         [?item :item/tag ?tag]]
       @conn "tag1"))
Why does this query return no results?
For starters, your :in clause should be :in $ ?tagname; the binding you have there leaves you with no default database, meaning that nothing will match your query clauses.
The $ symbol is a special symbol that gets used as the default database in the :where forms. You can use non-default databases by naming the data source with a $-prefixed symbol (e.g. :in $alt-db ?tagname :where [$alt-db ?tag :tag/name ?tagname] ...).
I haven't worked with dev cards, so it's possible there is something else needed to get this working, but fixing your query is the first step.
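Concretely, the corrected devcard only changes the :in clause:
(defcard find-by-tag-param
  "find items by tag"
  (d/q '[:find ?item
         :in $ ?tagname
         :where
         [?tag :tag/name ?tagname]
         [?item :item/tag ?tag]]
       @conn "tag1"))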

How to read json file into Clojure defrecord (to be searched later)

I have created a defrecord in a Clojure REPL:
user=> (defrecord Data [column1 column2 column3])
user.Data
How do I automate adding data to this record by reading in a .json file? Each of the columns in the defrecord corresponds exactly to a key in the json data. If the file contained a single record it would look similar to this:
[
  {
    "column1" : "value1",
    "column2" : "value2",
    "column3" : "value3"
  }
]
But there are many thousands of such records in the file.
I can slurp the contents of the file like this:
(json/read-json (slurp "path/to/file.json"))
The dependencies for the read-json function are added to the project.clj file found in the directory where I am running lein repl from the command line: :dependencies [org.clojure/data.json "0.2.1"].
I would just like to be able to search the values of the records using a Clojure function, such that the value I am passing to the search function is between the values of a single record's column1 and column2 values (i.e., nth-record.column1.value <= query <= nth-record.column2.value). Once I've found a matching record, I want to return the value of another column in that same record (nth-record.column3.value). The values of columns 1 and 2 will be unique, representing a non-overlapping range of values. The value of column3 is not unique.
This seems like a fairly trivial task, but I can't figure out how to do it using the Clojure documentation or the examples I've found online. It doesn't matter to me how the records are stored internally in Clojure, as long as I can search them and return the value of a related field in the same record.
Using the data.json package:
(require '[clojure.data.json :as json])
Read values into memory:
(def all-records (json/read-str (slurp "path/to/file.json")
                                :key-fn keyword))
;; => [{:column1 "value1", :column2 "value2", :column3 "value3"}, ...]
Find matching records (this assumes :column1 and :column2 hold comparable numbers; if they are strings, use compare instead of <):
(def query "some-value")
(def matching (filter #(and (< (:column1 %) query)
                            (< query (:column2 %)))
                      all-records))
Get column3:
(map :column3 matching)
Collecting it all together (and making it more flexible):
(defn find-matching [select-fn result-fn records]
  (map result-fn (filter select-fn records)))

(defn select-within [rec query]
  (and (< (:column1 rec) query) (< query (:column2 rec))))

(find-matching #(select-within % "some-value") :column3 all-records)
You should probably use cheshire for speed.
If your queries get sufficiently complex, consider Lucene; Clojure has a nice wrapper for it.
I think you're assuming records are somehow more suitable for this than maps; as far as I can tell, you're not using any features that make records special, like polymorphism. There might be a way to make cheshire spit out records, but I wouldn't bother.
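That said, if you do want the rows in your Data record, a small sketch (using cheshire; map->Data is the constructor defrecord generates, and compare is used so the range check works for strings as well as numbers):
(require '[cheshire.core :as json])

(defrecord Data [column1 column2 column3])

(def all-records
  ;; parse-string with true keywordizes the keys, so each map matches the record fields
  (->> (json/parse-string (slurp "path/to/file.json") true)
       (mapv map->Data)))

(defn lookup
  "Return :column3 of the record whose [column1, column2] range contains query."
  [records query]
  (some (fn [{:keys [column1 column2 column3]}]
          (when (and (<= (compare column1 query) 0)
                     (<= (compare query column2) 0))
            column3))
        records))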