Decoding structured JSON arrays with circe in Scala

Suppose I need to decode JSON arrays that look like the following, where there are a couple of fields at the beginning, some arbitrary number of homogeneous elements, and then some other field:
[ "Foo", "McBar", true, false, false, false, true, 137 ]
I don't know why anyone would choose to encode their data like this, but people do weird things, and suppose in this case I just have to deal with it.
I want to decode this JSON into a case class like this:
case class Foo(firstName: String, lastName: String, age: Int, stuff: List[Boolean])
We can write something like this:
import cats.syntax.either._
import io.circe.{ Decoder, DecodingFailure, Json }
implicit val fooDecoder: Decoder[Foo] = Decoder.instance { c =>
  c.focus.flatMap(_.asArray) match {
    case Some(fnJ +: lnJ +: rest) =>
      rest.reverse match {
        case ageJ +: stuffJ =>
          for {
            fn <- fnJ.as[String]
            ln <- lnJ.as[String]
            age <- ageJ.as[Int]
            stuff <- Json.fromValues(stuffJ.reverse).as[List[Boolean]]
          } yield Foo(fn, ln, age, stuff)
        case _ => Left(DecodingFailure("Foo", c.history))
      }
    case None => Left(DecodingFailure("Foo", c.history))
  }
}
…which works:
scala> fooDecoder.decodeJson(json"""[ "Foo", "McBar", true, false, 137 ]""")
res3: io.circe.Decoder.Result[Foo] = Right(Foo(Foo,McBar,137,List(true, false)))
But ugh, that's horrible. Also the error messages are completely useless:
scala> fooDecoder.decodeJson(json"""[ "Foo", "McBar", true, false ]""")
res4: io.circe.Decoder.Result[Foo] = Left(DecodingFailure(Int, List()))
Surely there's a way to do this that doesn't involve switching back and forth between cursors and Json values, throwing away history in our error messages, and just generally being an eyesore?
Some context: questions about writing custom JSON array decoders like this in circe come up fairly often (e.g. this morning). The specific details of how to do this are likely to change in an upcoming version of circe (although the API will be similar; see this experimental project for some details), so I don't really want to spend a lot of time adding an example like this to the documentation, but it comes up enough that I think it does deserve a Stack Overflow Q&A.

Working with cursors
There is a better way! You can write this much more concisely while also maintaining useful error messages by working directly with cursors all the way through:
case class Foo(firstName: String, lastName: String, age: Int, stuff: List[Boolean])
import cats.syntax.either._
import io.circe.Decoder
implicit val fooDecoder: Decoder[Foo] = Decoder.instance { c =>
  val fnC = c.downArray
  for {
    fn <- fnC.as[String]
    lnC = fnC.deleteGoRight
    ln <- lnC.as[String]
    ageC = lnC.deleteGoLast
    age <- ageC.as[Int]
    stuffC = ageC.delete
    stuff <- stuffC.as[List[Boolean]]
  } yield Foo(fn, ln, age, stuff)
}
This also works:
scala> fooDecoder.decodeJson(json"""[ "Foo", "McBar", true, false, 137 ]""")
res0: io.circe.Decoder.Result[Foo] = Right(Foo(Foo,McBar,137,List(true, false)))
But it also gives us an indication of where errors happened:
scala> fooDecoder.decodeJson(json"""[ "Foo", "McBar", true, false ]""")
res1: io.circe.Decoder.Result[Foo] = Left(DecodingFailure(Int, List(DeleteGoLast, DeleteGoRight, DownArray)))
Also it's shorter, more declarative, and doesn't require that unreadable nesting.
How it works
The key idea is that we interleave "reading" operations (the .as[X] calls on the cursor) with navigation / modification operations (downArray and the three delete method calls).
When we start, c is an HCursor that we hope points at an array. c.downArray moves the cursor to the first element in the array. If the input isn't an array at all, or is an empty array, this operation will fail, and we'll get a useful error message. If it succeeds, the first line of the for-comprehension tries to decode that first element into a string, leaving our cursor pointing at that first element.
The second line in the for-comprehension says "okay, we're done with the first element, so let's forget about it and move to the second". The delete part of the method name doesn't mean it's actually mutating anything—nothing in circe ever mutates anything in any way that users can observe—it just means that that element won't be available to any future operations on the resulting cursor.
The third line tries to decode the second element in the original JSON array (now the first element in our new cursor) as a string. When that's done, the fourth line "deletes" that element and moves to the end of the array, and then the fifth line tries to decode that final element as an Int.
The next line is probably the most interesting:
stuffC = ageC.delete
This says, okay, we're at the last element in our modified view of the JSON array (where earlier we deleted the first two elements). Now we delete the last element and move the cursor up so that it points at the entire (modified) array, which we can then decode as a list of booleans, and we're done.
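For comparison, here is a minimal sketch of the same positional layout decoded with index-based navigation instead of the delete operations described above. This is a swapped-in technique rather than the approach from this answer: it assumes a circe version where ACursor.downN and ACursor.values are available, and it assumes Scala 2.12 or later for right-biased Either. The Foo companion object and the failure message text are illustrative choices.

```scala
import io.circe.{Decoder, DecodingFailure}

final case class Foo(firstName: String, lastName: String, age: Int, stuff: List[Boolean])

object Foo {
  // Sketch only: navigate by index rather than "deleting" elements from the cursor's view.
  implicit val fooDecoder: Decoder[Foo] = Decoder.instance { c =>
    for {
      fn <- c.downN(0).as[String] // first element
      ln <- c.downN(1).as[String] // second element
      // Number of elements in the array; the message text is an arbitrary choice.
      n <- c.values.map(_.size).toRight(DecodingFailure("Foo: expected a JSON array", c.history))
      age <- c.downN(n - 1).as[Int] // last element
      // Everything between index 2 and the last element must be a boolean.
      stuff <- (2 until n - 1).foldRight[Decoder.Result[List[Boolean]]](Right(Nil)) { (i, acc) =>
        for { b <- c.downN(i).as[Boolean]; bs <- acc } yield b :: bs
      }
    } yield Foo(fn, ln, age, stuff)
  }
}
```

With circe-parser on the classpath, io.circe.parser.decode[Foo] can be used to try it out. Failures carry the history of the index navigation that led to the offending element, at the cost of being less declarative than the delete-based version.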
More error accumulation
There's actually an even more concise way you can write this:
import cats.syntax.all._
import io.circe.Decoder
implicit val fooDecoder: Decoder[Foo] = (
  Decoder[String].prepare(_.downArray),
  Decoder[String].prepare(_.downArray.deleteGoRight),
  Decoder[Int].prepare(_.downArray.deleteGoLast),
  Decoder[List[Boolean]].prepare(_.downArray.deleteGoRight.deleteGoLast.delete)
).map4(Foo)
This will also work, and it has the added benefit that if decoding would fail for more than one of the members, you can get error messages for all of the failures at the same time. For example, if we have something like this, we should expect three errors (for the non-string first name, the non-integral age, and the non-boolean stuff value):
val bad = """[["Foo"], "McBar", true, "true", false, 13.7 ]"""
val badResult = io.circe.jawn.decodeAccumulating[Foo](bad)
And that's what we see (together with the specific location information for each failure):
scala> badResult.leftMap(_.map(println))
DecodingFailure(String, List(DownArray))
DecodingFailure(Int, List(DeleteGoLast, DownArray))
DecodingFailure([A]List[A], List(MoveRight, DownArray, DeleteGoParent, DeleteGoLast, DeleteGoRight, DownArray))
Which of these two approaches you should prefer is a matter of taste and of whether or not you care about error accumulation. I personally find the first a little more readable.

Related

Apply key to map obtained via pattern matching in Scala (type erased)

I am trying to query an API which returns a JSON array (e.g. [{"name":"obj1", "value":5}, {"name":"obj2", "value":2}]) and process the result, which gets parsed as an Option[List[Map[String,Any]]]. However, I am not sure how to properly extract each Map, since the types are erased at runtime.
import scala.util.parsing.json._
import scalaj.http._
val url = "https://api.github.com/users/torvalds/repos"
val req = Http(url).asString
val parsed = JSON.parseFull(req.body) match {
  case Some(data) => data match {
    case list: List[_] => list
    case _ => sys.error("Result is not a list.")
  }
  case None => sys.error("Invalid JSON received.")
}
parsed.foreach{
  case x: Map[_,_] => x.get("full_name") // error here
}
The error occurs because I cannot apply the function with a String key type. However, because of type erasure, the key and value type are unknown, and specifying that it's a String map throws compiler warnings.
Am I going about things the wrong way? Or maybe I'd have better luck with a different HTTP/JSON library?
You can replace your last line with:
parsed.collect{ case x: Map[_,_] => x.asInstanceOf[Map[String,Any]].get("full_name") }
We sort of "cheat" here since we know the keys in a JSON are always Strings.
As for your last question, if you need something lightweight, I think what you have here is as simple as it gets.
Take a look at this SO post if you want to do something more powerful with your pattern matching.
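As a minimal sketch of the cast-based idea above, the following uses only the standard library. The parsed list is a hypothetical stand-in for what a JSON parser might return, and ErasureSketch and stringField are names made up for illustration. It goes one step further than the one-liner by also checking that the value is a String before returning it.

```scala
object ErasureSketch {
  // Hypothetical stand-in for parsed JSON: a list of untyped values.
  val parsed: List[Any] = List(
    Map("full_name" -> "torvalds/linux", "fork" -> false),
    "not an object",
    Map("name" -> "no full_name here")
  )

  // Key and value types are erased at runtime, so we match on Map[_, _],
  // cast to Map[Any, Any], and then check the type of the value we find.
  def stringField(obj: Any, key: String): Option[String] = obj match {
    case m: Map[_, _] =>
      m.asInstanceOf[Map[Any, Any]].get(key).collect { case s: String => s }
    case _ => None
  }

  // Values that are not maps, or that lack a string under the key, are skipped.
  val names: List[String] = parsed.flatMap(stringField(_, "full_name"))
}
```

Here names contains only the one string found under full_name; the non-map element and the map without that key are dropped rather than causing a MatchError.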

Play Json Writes: Scala Seq to Json Object

In Scala, I have the following data structure (Item names are always unique within the same Container):
case class Container(content: Seq[Item])
case class Item(name: String, elements: Seq[String])
Example instance:
val container = Container(Seq(
Item("A", Seq("A1", "A2")),
Item("B", Seq("B1", "B2"))
))
What I want to do is to define a Writes[Container] that produces the following JSON:
{
"A": ["A1", "A2"],
"B": ["B1", "B2"]
}
I guess a possible solution could be to transform the Container(Seq[Item]) into a Map[String, Seq[String]] where each key corresponds to an item's name and the value to an item's elements and let the API do the rest (there's probably an implicit write for maps, at least this is the case when reading JSON).
But: this approach creates a new Map for every Container with no other purpose than producing JSON. There are a lot of Container instances that need to be transformed to JSON, so I assume this approach is rather expensive. How else could I do this?
I don't think you should necessarily worry about the speed here (or at least verify that it is a problem before worrying about it), and converting to a map is probably the easiest option. An alternative, which may well not perform any better, is:
val customWrites: Writes[Container] = new Writes[Container] {
  override def writes(container: Container): JsValue = {
    val elems = container.content.map(
      elem => elem.name -> Json.toJsFieldJsValueWrapper(elem.elements))
    Json.obj(elems: _*)
  }
}
(The explicit conversion to a JsValueWrapper - which is normally implicit - seems to be necessary in this context for reasons I don't entirely understand or have time to delve into. This answer has some details.)
One advantage of this method is that it will handle Item objects with duplicate names (which of course is legal JSON but would cause collisions with a map.)
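For reference, the map-based conversion discussed above can be sketched with the standard library alone. ContainerSketch and toFieldMap are hypothetical names, and the JSON serialization step itself is left out; the point is only to show the conversion and what happens to duplicate names.

```scala
object ContainerSketch {
  final case class Item(name: String, elements: Seq[String])
  final case class Container(content: Seq[Item])

  // One pass over the items; if two items share a name, the later one wins.
  def toFieldMap(c: Container): Map[String, Seq[String]] =
    c.content.map(item => item.name -> item.elements).toMap

  val container = Container(Seq(
    Item("A", Seq("A1", "A2")),
    Item("B", Seq("B1", "B2"))
  ))

  val fields: Map[String, Seq[String]] = toFieldMap(container)
}
```

The conversion is linear in the number of items, which is why measuring before optimizing is reasonable advice here.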

Abstraction to extract data from JSON in Scala

I am looking for a good abstraction to extract data from JSON (I am using json4s now).
Suppose I have a case class A and data in JSON format.
case class A(a1: String, a2: String, a3: String)
{"a1":"xxx", "a2": "yyy", "a3": "zzz"}
I need a function to extract the JSON data and return A with these data as follows:
val a: JValue => A = ...
I do not want to write the function a from scratch. I would rather compose it from primitive functions.
For example, I can write a primitive function to extract string by field name:
val str: (String, JValue) => String = {(fieldName, jval) => ... }
Now I would like to compose the function a: JValue => A from str. Does that make sense?
Consider use of Play-JSON, which has a composable "Reads" object. If you've ever used ReactiveMongo, it can be used in much the same way. Contrary to some older posts here, it can be used stand-alone, without most of the rest of Play.
It uses the common "implicit translator" (my term) idiom. I found that my favorite deserializing pattern for using it is not highlighted in the docs, though - the pattern they espouse is a lot harder to get right, IMHO. I make heavy use of .as and .asOpt, which are documented on the first linked page above, in the small section "Using JsValue.as/asOpt". When deserializing a JSON object, you can say something like
val person:Person = (someParsedJsonObject \ "aPerson").as[Person]
and as long as you have an implicit Reads[Person] in scope, all just works. There are built-in Reads for all primitive types and many collection types. In many cases, it makes sense to put the Reads and Writes implicit objects in the companion object for, e.g., Person.
I thought json4s had a similar feature, but I could be wrong.
Argonaut is a purely functional JSON library for Scala.
It lets you encode and decode case classes (via JSON codecs).
import argonaut._, Argonaut._
case class Person(name: String, age: Int)
implicit def PersonDecodeJson: DecodeJson[Person] =
  jdecode2L(Person.apply)("name", "age")
// Codec for Person case class from JSON of form
// { "name": "string", "age": 1 }
It also provides a JSON cursor (plus lenses/Monocle) for custom parsing.
implicit def PersonDecodeJson: DecodeJson[Person] =
  DecodeJson(c => for {
    name <- (c --\ "_name").as[String]
    age <- (c --\ "_age").as[String].map(_.toInt)
  } yield Person(name, age))
// Decode Person from a JSON with property names different
// from those of the case class, and age passed as string:
// { "_name": "string", "_age": "10" }
The parsing result is represented by the DecodeResult type, which can be composed (.map, .flatMap) and handles error cases.
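The composition the question asks about can be sketched without committing to any particular library. The following is a minimal illustration, not json4s, Play JSON, or Argonaut code: it assumes a JSON object is modeled as a plain Map[String, Any], and Obj, Extract, and ExtractSketch are hypothetical names. Each primitive extractor returns an Either, so a for-comprehension composes them and stops at the first error.

```scala
object ExtractSketch {
  // Assumption: a JSON object is modeled as Map[String, Any] for illustration.
  type Obj = Map[String, Any]
  type Extract[A] = Obj => Either[String, A]

  // Primitive extractor: read a string field by name, or explain why not.
  def str(field: String): Extract[String] = obj =>
    obj.get(field) match {
      case Some(s: String) => Right(s)
      case Some(other)     => Left(s"field '$field' is not a string: $other")
      case None            => Left(s"missing field '$field'")
    }

  final case class A(a1: String, a2: String, a3: String)

  // Composite extractor built from the primitive one.
  val a: Extract[A] = obj =>
    for {
      a1 <- str("a1")(obj)
      a2 <- str("a2")(obj)
      a3 <- str("a3")(obj)
    } yield A(a1, a2, a3)
}
```

The libraries mentioned above follow the same shape, with Reads, DecodeJson, or Decoder playing the role of Extract and carrying richer error information.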

Handle multidimensional JSON with scala Play framework

I am trying to send data from the client to the server using a JSON request. The body of the JSON request looks like this:
[
[
{"x":"0","y":"0","player":0},
{"x":"0","y":"1","player":0},
{"x":"0","y":"2","player":1}
],
[
{"x":"1","y":"0","player":0},
{"x":"1","y":"1","player":2},
{"x":"1","y":"2","player":0}
],
[
{"x":"2","y":"0","player":0},
{"x":"2","y":"1","player":1},
{"x":"2","y":"2","player":2}
]
]
On the server side, I would like to use the Play 2 framework to transform this data into a Scala 2D list like this:
List(
List(0,0,1),
List(0,2,0),
List(0,1,2)
)
This example is 3x3, but the size can vary (50x50 or so).
Thanks for any help.
It might be incomplete (I don't know whether you want to model the square-matrix constraint as well), but something like this could be a good start.
First, here is what the controller (and model) part can define:
import play.api.libs.functional.syntax._ // provides the `and` combinator
import play.api.libs.json.Json._
import play.api.libs.json._
type PlayerT = (String, String, Int)
implicit val playerTripleReads: Reads[PlayerT] = (
  (__ \ "x").read[String] and
  (__ \ "y").read[String] and
  (__ \ "player").read[Int]
).tupled
def getJson = Action(parse.json) { request =>
  request.body.validate[List[List[PlayerT]]].map{
    case xs => Ok(xs.mkString("\n"))
  }.recoverTotal{
    e => BadRequest("Detected error:"+ JsError.toFlatJson(e))
  }
}
In this version, you'll get a list of list holding validated tuples of the form (String, String, Int) which has been aliased with the PlayerT type to save some typing.
As you may have seen, the reader has been created "by hand" by composing three basic blocks (using the and combinator), and the result is flattened using the tupled operator.
With this solution you're now on track to play with those tuples, but IMO the code will suffer from bad readability, because of the usage of _1, _2 and _3 along the way.
So here is a different approach (which is in fact even easier) that tackles this readability problem: simply define a case class that models your atomic data.
case class Player(x:String, y:String, player:Int)
implicit val playerReads = Json.reads[Player]
def getJson = Action(parse.json) { request =>
  request.body.validate[List[List[Player]]].map{
    case xs => Ok(xs.mkString("\n"))
  }.recoverTotal{
    e => BadRequest("Detected error:"+ JsError.toFlatJson(e))
  }
}
Note that the reader will automatically follow future changes to your data representation (that is, to the case class's fields), because it is generated implicitly at compile time.
Now, you'll be able to use x, y and player fields rather than _1, _2 and _3.
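Once the body has been validated as a List[List[Player]], getting the 2D list of player numbers the question asks for is a nested map. The sketch below uses only the standard library and hard-codes rows matching the example request; GridSketch and toPlayerGrid are hypothetical names, and in a real controller the rows would come from the validate call instead.

```scala
object GridSketch {
  final case class Player(x: String, y: String, player: Int)

  // Keep the row structure and replace each cell by its player number.
  def toPlayerGrid(rows: List[List[Player]]): List[List[Int]] =
    rows.map(_.map(_.player))

  // Hard-coded rows corresponding to the example request body.
  val rows: List[List[Player]] = List(
    List(Player("0", "0", 0), Player("0", "1", 0), Player("0", "2", 1)),
    List(Player("1", "0", 0), Player("1", "1", 2), Player("1", "2", 0)),
    List(Player("2", "0", 0), Player("2", "1", 1), Player("2", "2", 2))
  )

  val grid: List[List[Int]] = toPlayerGrid(rows)
}
```

This works for any grid size, since nothing in it depends on the 3x3 shape.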

How to report parsing errors when using JSON.parseFull with Scala

When my app is fed syntactically incorrect JSON I want to be able to report the error to the user with some useful detail that will allow the problem area to be located.
So in this example j will be None because of the trailing comma after "File1". Is there a way to obtain details of last parse error?
val badSyntax = """
{
"menu1": {
"id": "file1",
"value": "File1",
},
"menu2": {
"id": "file2",
"value": "File2",
}
}"""
val j = JSON.parseFull(badSyntax)
When you get a parse error, use JSON.lastNoSuccess to get the last error. It is of type JSON.NoSuccess, of which there are two subclasses, JSON.Error and JSON.Failure, both containing a msg: String member detailing the error.
Note that JSON.lastNoSuccess is not thread safe (it is a mere global variable) and is now deprecated (bound to disappear in scala 2.11)
UPDATE: Apparently, I was wrong about it not being thread-safe: it was indeed not thread-safe before scala 2.10, but now lastNoSuccess is backed by a thread-local variable (and is thus safe to use in a multi-threaded context).
After seeing this, you'd be forgiven for thinking that as long as you read it right after a parsing failure, in the same thread that did the parsing (the thread where you called parseFull), everything will work as expected. Unfortunately, during this refactor they also changed how lastNoSuccess is used internally inside Parsers.phrase (which is called by JSON.parseFull).
See https://github.com/scala/scala/commit/1a4faa92faaf495f75a6dd2c58376f6bb3fbf44c
Since this refactor, lastNoSuccess is reset to None at the end of Parsers.phrase. This is no problem in parsers in general, as lastNoSuccess is used as a temporary value that is returned as the result of Parsers.phrase anyway.
The problem here is that we don't call Parsers.phrase, but JSON.parseFull, which drops any error info (see case None => None inside method JSON.parseRaw at https://github.com/scala/scala/blob/v2.10.0/src/library/scala/util/parsing/json/JSON.scala).
The fact that JSON.parseFull drops any error info could easily be circumvented prior to scala 2.10 by directly reading JSON.lastNoSuccess as I advised before, but now that this value is reset at the end of Parsers.phrase, there is not much you can do to get the error information out of JSON.
Any solution? Yes. What you can do is to create your very own version of JSON that will not drop the error information:
import util.parsing.json._
object MyJSON extends Parser {
  def parseRaw(input : String) : Either[NoSuccess, JSONType] = {
    phrase(root)(new lexical.Scanner(input)) match {
      case Success(result, _) => Right(result)
      case ns: NoSuccess => Left(ns)
    }
  }
  def parseFull(input: String): Either[NoSuccess, Any] = {
    parseRaw(input).right.map(resolveType)
  }
  def resolveType(input: Any): Any = input match {
    case JSONObject(data) => data.transform {
      case (k,v) => resolveType(v)
    }
    case JSONArray(data) => data.map(resolveType)
    case x => x
  }
}
I just changed Option to Either as the result type, so that I can return parsing errors as a Left. Some tests in the REPL:
scala> MyJSON.parseFull("[1,2,3]")
res11: Either[MyJSON.NoSuccess,Any] = Right(List(1.0, 2.0, 3.0))
scala> MyJSON.parseFull("[1,2,3")
res12: Either[MyJSON.NoSuccess,Any] =
Left([1.7] failure: end of input
[1,2,3
^)