Error Accumulation with Scalaz Validation - json

I have a complex JSON that it's persisted in a database. It's complexity is "segregated" in "blocks", as follows:
Entire JSON:
{
"block1" : {
"param1" : "val1",
"param2" : "val2"
},
"block2" : {
"param3" : "val3",
"param4" : "val4"
},
...
}
In the database, every block is stored and treated individually:
Persisted Blocks
"block1" : {
"param1" : "val1",
"param2" : "val2"
}
"block2" : {
"param3" : "val3",
"param4" : "val4"
}
Every block has a business meaning, so, every one it's mapped to a case-class.
I'm building a Play API that stores, updates and retrieves this JSON Structure and i want to validate if someone altered it's data for the sake of integrity.
I'm doing the retrieval (parsing and validation) of every block as follows:
val block1 = Json.parse(block1).validate[Block1].get
val block2 = Json.parse(block2).validate[Block2].get
...
The case-classes:
trait Block
sealed case class Block1 (param1: String, param2: String, ...) extends Block
sealed case class Block2 (param3: String, param4: String, ...) extends Block
sealed case class Request (block1: Block1, block2: Block2, ...)
With the current structure, if some field is altered and doesn't match with the defined type for it Play throws this exception:
[NoSuchElementException: JsError.get]
So, i want to build an accumulative error structure with Scalaz and Validation that catch all possible parsing and validation errors. I have seen this and this so i've coded the validation this way:
def build(block1: String, block2: String, ...): Validation[NonEmptyList[String], Request] = {
val block1 = Option(Json.parse(block1).validate[Block1].get).toSuccess("Error").toValidationNel
val block2 = Option(Json.parse(block2).validate[Block2].get).toSuccess("Error").toValidationNel
...
val request = (Request.apply _).curried
blockn <*> (... <*> (... <*> (...<*> (block2 <*> (block1 map request)))))
}
Note that i'm using the applicative functor <*> because Request has 20 fields (building that is a parentheses-mess with that syntax), |#| applicative functor only applies for case classes up to 12 parameters.
That code works for the happy path, but, when i modify some field Play throws the Execution Exception later described.
Question: I want to accumulate all possible structure faults that Play can detect while parsing every block. How can i do that?
Note: If in some way Shapeless has something to do with this i'm open to use it (i'm already using it).

If multi-validation is the only reason for adding Scalaz to your project then why not consider an alternative which will not require you adapt your Play project to the monadic way in which Scalaz forces you to solve problems (which might be a good thing, but not necessarily if the only reason for that is multi-validation).
Instead consider using a standard Scala approach with scala.util.Try:
val block1 = Try(Json.parse(block1).validate[Block1].get)
val block2 = Try(Json.parse(block2).validate[Block2].get)
...
val allBlocks : List[Try[Block]] = List(block1, block2, ...)
val failures : List[Failure[Block]] = allBlocks.collect { case f : Failure[Block] => f }
This way you can still operate on standard scala collections to retrieve the list of failures to be processed further. Also this approach does not constrain you in terms of number of blocks.

Related

How to handle different JSON schemas and dispatch hem to be handled by the right parser?

I am currently building a very simple JSON parser in Scala that has to deal with two (slightly) different schemas. My objective is to parse one value in the json, and based on that value, I would like to dispatch it to the relevant decoder. I have used circe for my implementation, but other implementations and/or suggestions are also welcome.
I have formulated a simplified version for my example to help clarifying the question.
There are two types of JSONs that I can receive, either a stock:
"data": {
"name": "XYZ"
},
"type": "STOCK"
}
Or a quote (which is similar to the stock but includes a price).
"data": {
"name": "ABC",
"price": 1151.6214,
},
"type": "QUOTE"
}
On my side, I have developed a simple decoder that looks like this (for the stock):
implicit private val dataDecoder: Decoder[Stock] = (hCursor: HCursor) => {
for {
isin <- hCursor.downField("data").downField("name").as[String]
typ <- hCursor.downField("type").as[StockType]
} yield Instrument(name, typ, LocalDateTime.now())
}
I could also develop a parser that only parses the "type" part of the JSON, and then send the data to be handled by the relevant parser (quote or stock). However, I am wondering what is the:
Efficient way to do this
Idiomatic/clean way to do this
To help rephrase my question if needed, what is the right & efficient way to handle slightly different JSON schemas, and forward them to be handled by the right parser.
I usually encounter this situation when need to serialize ADTs. As somebody mentioned in the comments to your question, circe has support for ADT codec auto generation, however I usually prefer to manually write the codecs.
In any case in a situation like your I would do something along these lines:
sealed trait Data
case class StockData(name: String) extends Data
case class QuoteData(name: String, quote: Double) extends Data
implicit val stockDataEncoder: Encoder[StockData] = ???
implicit val stockDataDecoder: Decoder[StockData] = ???
implicit val quoteDataEncoder: Encoder[QuoteData] = ???
implicit val quoteDataDecoder: Decoder[QuoteData] = ???
implicit val dataEncoder: Encoder[Data] = Encoder.instance {
case s: StockData => stockDataEncoder(s).withObject(_.add("type", "stock))
case q: QuoteData => quoteDataEncoder(q).withObject(_.add("type", "quote"))
}
implicit val dataDecoder: Decoder[Data] = Decoder.instance { c =>
for {
stype <- c.get[String]("type)
res <- stype match {
case "stock" => stockDataDecoder(c)
case "quote" => quoteDataDecoder(c)
case unk => Left(DecodingFailure(s"Unsupported data type: ${unk}", c.history))
}
} yield res
}

Decoding structured JSON arrays with circe in Scala

Suppose I need to decode JSON arrays that look like the following, where there are a couple of fields at the beginning, some arbitrary number of homogeneous elements, and then some other field:
[ "Foo", "McBar", true, false, false, false, true, 137 ]
I don't know why anyone would choose to encode their data like this, but people do weird things, and suppose in this case I just have to deal with it.
I want to decode this JSON into a case class like this:
case class Foo(firstName: String, lastName: String, age: Int, stuff: List[Boolean])
We can write something like this:
import cats.syntax.either._
import io.circe.{ Decoder, DecodingFailure, Json }
implicit val fooDecoder: Decoder[Foo] = Decoder.instance { c =>
c.focus.flatMap(_.asArray) match {
case Some(fnJ +: lnJ +: rest) =>
rest.reverse match {
case ageJ +: stuffJ =>
for {
fn <- fnJ.as[String]
ln <- lnJ.as[String]
age <- ageJ.as[Int]
stuff <- Json.fromValues(stuffJ.reverse).as[List[Boolean]]
} yield Foo(fn, ln, age, stuff)
case _ => Left(DecodingFailure("Foo", c.history))
}
case None => Left(DecodingFailure("Foo", c.history))
}
}
…which works:
scala> fooDecoder.decodeJson(json"""[ "Foo", "McBar", true, false, 137 ]""")
res3: io.circe.Decoder.Result[Foo] = Right(Foo(Foo,McBar,137,List(true, false)))
But ugh, that's horrible. Also the error messages are completely useless:
scala> fooDecoder.decodeJson(json"""[ "Foo", "McBar", true, false ]""")
res4: io.circe.Decoder.Result[Foo] = Left(DecodingFailure(Int, List()))
Surely there's a way to do this that doesn't involve switching back and forth between cursors and Json values, throwing away history in our error messages, and just generally being an eyesore?
Some context: questions about writing custom JSON array decoders like this in circe come up fairly often (e.g. this morning). The specific details of how to do this are likely to change in an upcoming version of circe (although the API will be similar; see this experimental project for some details), so I don't really want to spend a lot of time adding an example like this to the documentation, but it comes up enough that I think it does deserve a Stack Overflow Q&A.
Working with cursors
There is a better way! You can write this much more concisely while also maintaining useful error messages by working directly with cursors all the way through:
case class Foo(firstName: String, lastName: String, age: Int, stuff: List[Boolean])
import cats.syntax.either._
import io.circe.Decoder
implicit val fooDecoder: Decoder[Foo] = Decoder.instance { c =>
val fnC = c.downArray
for {
fn <- fnC.as[String]
lnC = fnC.deleteGoRight
ln <- lnC.as[String]
ageC = lnC.deleteGoLast
age <- ageC.as[Int]
stuffC = ageC.delete
stuff <- stuffC.as[List[Boolean]]
} yield Foo(fn, ln, age, stuff)
}
This also works:
scala> fooDecoder.decodeJson(json"""[ "Foo", "McBar", true, false, 137 ]""")
res0: io.circe.Decoder.Result[Foo] = Right(Foo(Foo,McBar,137,List(true, false)))
But it also gives us an indication of where errors happened:
scala> fooDecoder.decodeJson(json"""[ "Foo", "McBar", true, false ]""")
res1: io.circe.Decoder.Result[Foo] = Left(DecodingFailure(Int, List(DeleteGoLast, DeleteGoRight, DownArray)))
Also it's shorter, more declarative, and doesn't require that unreadable nesting.
How it works
The key idea is that we interleave "reading" operations (the .as[X] calls on the cursor) with navigation / modification operations (downArray and the three delete method calls).
When we start, c is an HCursor that we hope points at an array. c.downArray moves the cursor to the first element in the array. If the input isn't an array at all, or is an empty array, this operation will fail, and we'll get a useful error message. If it succeeds, the first line of the for-comprehension will try to decode that first element into a string, and leaves our cursor pointing at that first element.
The second line in the for-comprehension says "okay, we're done with the first element, so let's forget about it and move to the second". The delete part of the method name doesn't mean it's actually mutating anything—nothing in circe ever mutates anything in any way that users can observe—it just means that that element won't be available to any future operations on the resulting cursor.
The third line tries to decode the second element in the original JSON array (now the first element in our new cursor) as a string. When that's done, the fourth line "deletes" that element and moves to the end of the array, and then the fifth line tries to decode that final element as an Int.
The next line is probably the most interesting:
stuffC = ageC.delete
This says, okay, we're at the last element in our modified view of the JSON array (where earlier we deleted the first two elements). Now we delete the last element and move the cursor up so that it points at the entire (modified) array, which we can then decode as a list of booleans, and we're done.
More error accumulation
There's actually an even more concise way you can write this:
import cats.syntax.all._
import io.circe.Decoder
implicit val fooDecoder: Decoder[Foo] = (
Decoder[String].prepare(_.downArray),
Decoder[String].prepare(_.downArray.deleteGoRight),
Decoder[Int].prepare(_.downArray.deleteGoLast),
Decoder[List[Boolean]].prepare(_.downArray.deleteGoRight.deleteGoLast.delete)
).map4(Foo)
This will also work, and it has the added benefit that if decoding would fail for more than one of the members, you can get error messages for all of the failures at the same time. For example, if we have something like this, we should expect three errors (for the non-string first name, the non-integral age, and the non-boolean stuff value):
val bad = """[["Foo"], "McBar", true, "true", false, 13.7 ]"""
val badResult = io.circe.jawn.decodeAccumulating[Foo](bad)
And that's what we see (together with the specific location information for each failure):
scala> badResult.leftMap(_.map(println))
DecodingFailure(String, List(DownArray))
DecodingFailure(Int, List(DeleteGoLast, DownArray))
DecodingFailure([A]List[A], List(MoveRight, DownArray, DeleteGoParent, DeleteGoLast, DeleteGoRight, DownArray))
Which of these two approaches you should prefer is a matter of taste and whether or not you care about error accumulating—I personally find the first a little more readable.

Converting "recursive" object to JSON (Play Framework 2.4 with Scala)

I have reached a point where my code compiles successfully, yet I have doubts about my solution and am posting this question for that reason.
I have a Node class defined as:
case class Node(id: Long, label: String, parent_id: Option[Long])
The reason I quote/unquote recursive is because technically, I do not store a Node within a Node. Rather, each node has a pointer to its parent and I can say: give me all children of Node id=X.
Here is an example tree, for the sake of visualization. I would like to give the root_node's ID, and obtain the tree's conversion to a Json String:
root_node
|_ node_1
| |_ node_11
| |_ node_111
|_ node_2
|_ node_3
The Json would look like:
{"title": "root_node", "children": [...]}
with the children array containing node_1, 2 and 3 etc... recursively
Here is the Writes Converter for Node:
/** json converter of Node to JSON */
implicit val NodeWrites = new Writes[Node] {
def writes(node: Node) = Json.obj(
"title" -> node.label,
"children" -> Node.getChildrenOf(node.id)
)
}
Quoting Play docs:
The Play JSON API provides implicit Writes for most basic types, such
as Int, Double, String, and Boolean. It also supports Writes for
collections of any type T that a Writes[T] exists.
I need to point out that Node.getChildrenOf(node.id) returns a List of Nodes from the DB. So according to Play's docs, I should be able to convert a List[Node] to Json. It seems that doing this within the Writes converter itself is a bit more troublesome.
Here is the resulting error from running this code:
type mismatch;
found : List[models.Node]
required: play.api.libs.json.Json.JsValueWrapper
Note: implicit value NodeWrites is not applicable here because it comes after the application point and it lacks an explicit result type
I added the "explicit result type" to my Writes converter, here is the result:
/** json converter of Node to JSON */
implicit val NodeWrites: Writes[Node] = new Writes[Node] {
def writes(node: Node) = Json.obj(
"title" -> node.label,
"children" -> Node.getChildrenOf(node.id)
)
}
The code now executes properly, I can visualize the tree on the browser.
Even though this looks to me like the cleanest working solution, IntelliJ still complains about the line:
"children" -> Node.getChildrenOf(node.id)
saying:
Type mismatch: found(String, List[Node]), required (String, Json.JsValueWrapper)
Could it be that IntelliJ's error reporting is not based off of the Scala compiler?
Finally, is the overall approach of the JSON converter terrible?
Thanks and sorry for the long post.
The problem lies in "children" -> Node.getChildrenOf(node.id). Node.getChildrenOf(node.id) returns a List[Node]. Whereas any attribute in the Json.obj expects JsValueWrappers. In this case a JsArray.
Something like this should work:
implicit val writes = new Writes[Node] {
def writes(node: Node) = Json.obj(
"title" -> node.label,
// Note that we have to pass this since the writes hasn't been defined just yet.
"children" -> JsArray(Node.getChildrenOf(node).map(child => Json.toJson(child)(this)))
)
}
This at least compiles, I haven't tested it with any data though.

Abstraction to extract data from JSON in Scala

I am looking for a good abstraction to extract data form JSON (I am using json4s now).
Suppose I have a case class A and data in JSON format.
case class A(a1: String, a2: String, a3: String)
{"a1":"xxx", "a2": "yyy", "a3": "zzz"}
I need a function to extract the JSON data and return A with these data as follows:
val a: JValue => A = ...
I do not want to write the function a from scratch. I would rather compose it from primitive functions.
For example, I can write a primitive function to extract string by field name:
val str: (String, JValue) => String = {(fieldName, jval) => ... }
Now I would like to compose the function a: JValue => A from str. Does it make sense ?
Consider use of Play-JSON, which has a composable "Reads" object. If you've ever used ReactiveMongo, it can be used in much the same way. Contrary to some older posts here, it can be used stand-alone, without most of the rest of Play.
It uses the common "implicit translator" (my term) idiom. I found that my favorite deserializing pattern for using it is not highlighted in the docs, though - the pattern they espouse is a lot harder to get right, IMHO. I make heavy use of .as and .asOpt, which are documented on the first linked page above, in the small section "Using JsValue.as/asOpt". When deserializing a JSON object, you can say something like
val person:Person = (someParsedJsonObject \ "aPerson").as[Person]
and as long as you have an implicit Reads[Person] in scope, all just works. There are built-in Reads for all primitive types and many collection types. In many cases, it makes sense to put the Reads and Writes implicit objects in the companion object for, e.g., Person.
I thought json4s had a similar feature, but I could be wrong.
Argonaut is fully functional Scala library.
It allows to encode/decode case classes (JSON codecs).
import argonaut._, Argonaut._
case class Person(name: String, age: Int)
implicit def PersonDecodeJson: DecodeJson[Person]
jdecode2L(Person.apply)("name", "age")
// Codec for Person case class from JSON of form
// { "name": "string", "age": 1 }
It also provides JSON cursor (lenses/monocle) for custom parsing.
implicit def PersonDecodeJson: DecodeJson[Person] =
DecodeJson(c => for {
name <- (c --\ "_name").as[String]
age <- (c --\ "_age").as[String].map(_.toInt)
} yield Person(name, age))
// Decode Person from a JSON with property names different
// from those of the case class, and age passed as string:
// { "_name": "string", "age": "10" }
Parsing result is represented by DecodeResult type that can be composed (.map, .flatMap) and handle error cases.

How to report parsing errors when using JSON.parseFull with Scala

When my app is fed syntactically incorrect JSON I want to be able to report the error to the user with some useful detail that will allow the problem area to be located.
So in this example j will be None because of the trailing comma after "File1". Is there a way to obtain details of last parse error?
val badSyntax = """
{
"menu1": {
"id": "file1",
"value": "File1",
},
"menu2": {
"id": "file2",
"value": "File2",
}
}"""
val j = JSON.parseFull(badSyntax)
When you get a parse error, use JSON.lastNoSuccess to get the last error. It is of type JSON.NoSuccess of which thare are two subclasses, JSON.Error and JSON.Failure, both containing a msg: String member detailing the error.
Note that JSON.lastNoSuccess is not thread safe (it is a mere global variable) and is now deprecated (bound to disappear in scala 2.11)
UPDATE: Apparently, I was wrong about it not being thread-safe: it was indeed not thread-safe before scala 2.10, but now lastNoSuccess is backed by a thread-local variable (and is thus safe to use in a multi-threaded context).
After seing this, you'd be forgiven to think that as long as you read right after a parsing failure in the same thread as the one that was used to do the parsing (the thread where you called parseFull), then everything will work as expected. Unfortunately, during this refactor they also changed how they use lastNoSuccess internally inside Parsers.phrase (which is called by JSON.parseFull.
See https://github.com/scala/scala/commit/1a4faa92faaf495f75a6dd2c58376f6bb3fbf44c
Since this refactor, lastNoSuccess is reset to None at the end of Parsers.phrase. This is no problem in parsers in general, as lastNoSuccess is used as a temporary value that is returned as the result of Parsers.phrase anyway.
The problem here is that we don't call Parsers.phrase, but JSON.parseFull, which drops any error info (see case None => None inside method JSON.parseRaw at https://github.com/scala/scala/blob/v2.10.0/src/library/scala/util/parsing/json/JSON.scala).
The fact that JSON.parseFull drops any error info could easily be circumvented prior to scala 2.10 by directly reading JSON.lastNoSuccess as I advised before, but now that this value is reset at the end of Parsers.phrase, there is not much you can do to get the error information out of JSON.
Any solution? Yes. What you can do is to create your very own version of JSON that will not drop the error information:
import util.parsing.json._
object MyJSON extends Parser {
def parseRaw(input : String) : Either[NoSuccess, JSONType] = {
phrase(root)(new lexical.Scanner(input)) match {
case Success(result, _) => Right(result)
case ns: NoSuccess => Left(ns)
}
}
def parseFull(input: String): Either[NoSuccess, Any] = {
parseRaw(input).right.map(resolveType)
}
def resolveType(input: Any): Any = input match {
case JSONObject(data) => data.transform {
case (k,v) => resolveType(v)
}
case JSONArray(data) => data.map(resolveType)
case x => x
}
}
I just changed Option to Either as the result type, so that I can return parsing errors as an Left. Some test in the REPL:
scala> MyJSON.parseFull("[1,2,3]")
res11: Either[MyJSON.NoSuccess,Any] = Right(List(1.0, 2.0, 3.0))
scala> MyJSON.parseFull("[1,2,3")
res12: Either[MyJSON.NoSuccess,Any] =
Left([1.7] failure: end of input
[1,2,3
^)