In Scala, I have the following data structure (Item names are always unique within the same Container):
case class Container(content: Seq[Item])
case class Item(name: String, elements: Seq[String])
Example instance:
val container = Container(Seq(
Item("A", Seq("A1", "A2")),
Item("B", Seq("B1", "B2"))
))
What I want to do is to define a Writes[Container] that produces the following JSON:
{
"A": ["A1", "A2"],
"B": ["B1", "B2"]
}
I guess a possible solution could be to transform the Container's Seq[Item] into a Map[String, Seq[String]], where each key is an item's name and each value is that item's elements, and let the API do the rest (there's probably an implicit Writes for maps; at least that's the case when reading JSON).
But: this approach creates a new Map for every Container with no purpose other than producing JSON. There are a lot of Container instances that need to be transformed to JSON, so I assume this approach is rather expensive. How else could I do this?
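The approach I have in mind would look something like this (a sketch, restating the case classes for completeness and assuming Play JSON's default Writes for Map[String, Seq[String]]):

```scala
import play.api.libs.json._

case class Item(name: String, elements: Seq[String])
case class Container(content: Seq[Item])

// Sketch of the map-based idea: build an intermediate Map and let
// Play JSON's built-in Writes for Map[String, Seq[String]] do the rest.
implicit val containerWrites: Writes[Container] = Writes { container =>
  Json.toJson(container.content.map(item => item.name -> item.elements).toMap)
}
```

This works, but it allocates the intermediate Map on every serialization, which is exactly the cost I'm worried about.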
I don't think you should necessarily worry about the speed here (or at least verify that it is a problem before worrying about it), and converting to a map is probably the easiest option. An alternative, which may well not perform any better, is:
val customWrites: Writes[Container] = new Writes[Container] {
override def writes(container: Container): JsValue = {
val elems = container.content.map(
elem => elem.name -> Json.toJsFieldJsValueWrapper(elem.elements))
Json.obj(elems: _*)
}
}
(The explicit conversion to a JsValueWrapper - which is normally implicit - seems to be necessary in this context for reasons I don't entirely understand or have time to delve into. This answer has some details.)
One advantage of this method is that it will handle Item objects with duplicate names (which of course is legal JSON but would cause collisions with a map.)
Suppose I need to decode JSON arrays that look like the following, where there are a couple of fields at the beginning, some arbitrary number of homogeneous elements, and then some other field:
[ "Foo", "McBar", true, false, false, false, true, 137 ]
I don't know why anyone would choose to encode their data like this, but people do weird things, and suppose in this case I just have to deal with it.
I want to decode this JSON into a case class like this:
case class Foo(firstName: String, lastName: String, age: Int, stuff: List[Boolean])
We can write something like this:
import cats.syntax.either._
import io.circe.{ Decoder, DecodingFailure, Json }
implicit val fooDecoder: Decoder[Foo] = Decoder.instance { c =>
c.focus.flatMap(_.asArray) match {
case Some(fnJ +: lnJ +: rest) =>
rest.reverse match {
case ageJ +: stuffJ =>
for {
fn <- fnJ.as[String]
ln <- lnJ.as[String]
age <- ageJ.as[Int]
stuff <- Json.fromValues(stuffJ.reverse).as[List[Boolean]]
} yield Foo(fn, ln, age, stuff)
case _ => Left(DecodingFailure("Foo", c.history))
}
case _ => Left(DecodingFailure("Foo", c.history))
}
}
…which works:
scala> fooDecoder.decodeJson(json"""[ "Foo", "McBar", true, false, 137 ]""")
res3: io.circe.Decoder.Result[Foo] = Right(Foo(Foo,McBar,137,List(true, false)))
But ugh, that's horrible. Also the error messages are completely useless:
scala> fooDecoder.decodeJson(json"""[ "Foo", "McBar", true, false ]""")
res4: io.circe.Decoder.Result[Foo] = Left(DecodingFailure(Int, List()))
Surely there's a way to do this that doesn't involve switching back and forth between cursors and Json values, throwing away history in our error messages, and just generally being an eyesore?
Some context: questions about writing custom JSON array decoders like this in circe come up fairly often (e.g. this morning). The specific details of how to do this are likely to change in an upcoming version of circe (although the API will be similar; see this experimental project for some details), so I don't really want to spend a lot of time adding an example like this to the documentation, but it comes up enough that I think it does deserve a Stack Overflow Q&A.
Working with cursors
There is a better way! You can write this much more concisely while also maintaining useful error messages by working directly with cursors all the way through:
case class Foo(firstName: String, lastName: String, age: Int, stuff: List[Boolean])
import cats.syntax.either._
import io.circe.Decoder
implicit val fooDecoder: Decoder[Foo] = Decoder.instance { c =>
val fnC = c.downArray
for {
fn <- fnC.as[String]
lnC = fnC.deleteGoRight
ln <- lnC.as[String]
ageC = lnC.deleteGoLast
age <- ageC.as[Int]
stuffC = ageC.delete
stuff <- stuffC.as[List[Boolean]]
} yield Foo(fn, ln, age, stuff)
}
This also works:
scala> fooDecoder.decodeJson(json"""[ "Foo", "McBar", true, false, 137 ]""")
res0: io.circe.Decoder.Result[Foo] = Right(Foo(Foo,McBar,137,List(true, false)))
But it also gives us an indication of where errors happened:
scala> fooDecoder.decodeJson(json"""[ "Foo", "McBar", true, false ]""")
res1: io.circe.Decoder.Result[Foo] = Left(DecodingFailure(Int, List(DeleteGoLast, DeleteGoRight, DownArray)))
Also it's shorter, more declarative, and doesn't require that unreadable nesting.
How it works
The key idea is that we interleave "reading" operations (the .as[X] calls on the cursor) with navigation / modification operations (downArray and the three delete method calls).
When we start, c is an HCursor that we hope points at an array. c.downArray moves the cursor to the first element of the array. If the input isn't an array at all, or is an empty array, this operation fails and we get a useful error message. If it succeeds, the first line of the for-comprehension tries to decode that first element as a string and leaves our cursor pointing at it.
The second line in the for-comprehension says "okay, we're done with the first element, so let's forget about it and move to the second". The delete part of the method name doesn't mean it's actually mutating anything—nothing in circe ever mutates anything in any way that users can observe—it just means that that element won't be available to any future operations on the resulting cursor.
The third line tries to decode the second element in the original JSON array (now the first element in our new cursor) as a string. When that's done, the fourth line "deletes" that element and moves to the end of the array, and then the fifth line tries to decode that final element as an Int.
The next line is probably the most interesting:
stuffC = ageC.delete
This says, okay, we're at the last element in our modified view of the JSON array (where earlier we deleted the first two elements). Now we delete the last element and move the cursor up so that it points at the entire (modified) array, which we can then decode as a list of booleans, and we're done.
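To make the navigation concrete, here is how the cursor's view of the question's example array evolves at each step (an illustrative trace, not runnable code):

```scala
// Input: ["Foo", "McBar", true, false, false, false, true, 137]
//
// c.downArray        -> at "Foo"   in ["Foo", "McBar", true, false, false, false, true, 137]
// fnC.deleteGoRight  -> at "McBar" in ["McBar", true, false, false, false, true, 137]
// lnC.deleteGoLast   -> at 137     in [true, false, false, false, true, 137]
// ageC.delete        -> at the whole remaining array [true, false, false, false, true],
//                       which is then decoded as List[Boolean]
```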
More error accumulation
There's actually an even more concise way you can write this:
import cats.syntax.all._
import io.circe.Decoder
implicit val fooDecoder: Decoder[Foo] = (
Decoder[String].prepare(_.downArray),
Decoder[String].prepare(_.downArray.deleteGoRight),
Decoder[Int].prepare(_.downArray.deleteGoLast),
Decoder[List[Boolean]].prepare(_.downArray.deleteGoRight.deleteGoLast.delete)
).map4(Foo)
This will also work, and it has the added benefit that if decoding would fail for more than one of the members, you can get error messages for all of the failures at the same time. For example, if we have something like this, we should expect three errors (for the non-string first name, the non-integral age, and the non-boolean stuff value):
val bad = """[["Foo"], "McBar", true, "true", false, 13.7 ]"""
val badResult = io.circe.jawn.decodeAccumulating[Foo](bad)
And that's what we see (together with the specific location information for each failure):
scala> badResult.leftMap(_.map(println))
DecodingFailure(String, List(DownArray))
DecodingFailure(Int, List(DeleteGoLast, DownArray))
DecodingFailure([A]List[A], List(MoveRight, DownArray, DeleteGoParent, DeleteGoLast, DeleteGoRight, DownArray))
Which of these two approaches you should prefer is a matter of taste and whether or not you care about error accumulating—I personally find the first a little more readable.
I have some JSON coming from an external API which I have no control over. Part of the JSON is formatted like this:
{
"room_0": {
"area_sq_ft": 151.2
},
"room_1": {
"area_sq_ft": 200.0
}
}
Instead of using an array like they should have, they've used room_n as the key for the nth element. Instead of creating a case class with room_0, room_1, room_2, etc., I want to convert this to a Seq[Room] where this is my Room case class:
case class Room(area: Double)
I am using Reads from play.api.libs.json for converting other parts of the JSON to case classes and would prefer to use Reads for this conversion. How could I accomplish that?
Here's what I've tried.
val sqFtReads = (__ \ "size_sq_ft").read[Double]
val roomReads = (__ \ "size_sq_ft").read[Seq[Room]](sqFtReads).map(Room)
cmd19.sc:1: overloaded method value read with alternatives:
(t: Seq[$sess.cmd17.Room])play.api.libs.json.Reads[Seq[$sess.cmd17.Room]] <and>
(implicit r: play.api.libs.json.Reads[Seq[$sess.cmd17.Room]])play.api.libs.json.Reads[Seq[$sess.cmd17.Room]]
cannot be applied to (play.api.libs.json.Reads[Double])
val roomReads = (__ \ "size_sq_ft").read[Seq[Room]](sqFtReads).map(Room)
A tricky little challenge but completely achievable with Reads.
First, Reads[Room] - i.e. the converter for a single Room instance:
val roomReads = new Reads[Room] {
override def reads(json: JsValue): JsResult[Room] = {
(json \ "area_sq_ft").validate[Double].map(Room(_))
}
}
Pretty straightforward; we peek into the JSON and try to find a top-level field called area_sq_ft which validates as a Double. If it's all good, we return the populated Room instance as needed.
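As a side note, the same converter can be written more compactly with the JSON path combinators; this sketch is equivalent to the explicit version above:

```scala
import play.api.libs.json._

case class Room(area: Double) // from the question

// Equivalent, more compact Reads[Room]: read "area_sq_ft" as a Double, map into Room.
val roomReads: Reads[Room] = (__ \ "area_sq_ft").read[Double].map(Room(_))
```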
Next up, the converter for your upstream object which, in good Postel's Law fashion, you are cleaning up for your own consumers.
val strangeObjectReads = new Reads[Seq[Room]] {
override def reads(json: JsValue): JsResult[Seq[Room]] = {
json.validate[JsObject].map { jso =>
val roomsSortedNumerically = jso.fields.sortBy { case (name, contents) =>
val numericPartOfRoomName = name.dropWhile(!_.isDigit)
numericPartOfRoomName.toInt
}
roomsSortedNumerically.map { case (name, contents) =>
contents.as[Room](roomReads)
}
}
}
}
The key thing here is the json.validate[JsObject] around the whole lot. Mapping over it gives us the JsResult we need to wrap the result, and it gives us access to the fields inside the JSON object, defined as a Seq[(String, JsValue)].
To ensure we put the fields in the correct order in the output sequence, we do a little bit of string manipulation, getting the numeric part of the room_1 string, and using that as the sortBy criteria. I'm being a bit naive here and assuming your upstream server won't do anything nasty like skip room numbers!
Once you've got the rooms sorted numerically, we can just map over them, converting each one with our roomReads converter.
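Putting the two converters together, usage against the sample JSON from the question might look like this (a sketch, assuming the definitions above are in scope):

```scala
import play.api.libs.json._

val json = Json.parse("""
  {
    "room_0": { "area_sq_ft": 151.2 },
    "room_1": { "area_sq_ft": 200.0 }
  }
""")

json.validate[Seq[Room]](strangeObjectReads) match {
  case JsSuccess(rooms, _) => println(rooms) // contains Room(151.2) and Room(200.0), in order
  case JsError(errors)     => println(errors)
}
```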
You've probably noticed that my custom Reads implementations are most definitely not one-liners. This comes from bitter experience dealing with oddball upstream JSON formats. Being a bit verbose, using a few more variables and breaking things up a bit pays off big time when that upstream server changes its JSON format suddenly!
I am trying to query an API which returns a JSON array (e.g. [{"name":"obj1", "value":5}, {"name":"obj2", "value":2}]) and process the result, which gets parsed as an Option[List[Map[String,Any]]]. However, I am not sure how to properly extract each Map, since the types are erased at runtime.
import scala.util.parsing.json._
import scalaj.http._
val url = "https://api.github.com/users/torvalds/repos"
val req = Http(url).asString
val parsed = JSON.parseFull(req.body) match {
case Some(data) => data match {
case list: List[_] => list
case _ => sys.error("Result is not a list.")
}
case None => sys.error("Invalid JSON received.")
}
parsed.foreach{
case x: Map[_,_] => x.get("full_name") // error here
}
The error occurs because I can't call get with a String key when the map's key type is unknown. Because of type erasure, the key and value types are unknown at runtime, and asserting that it's a String-keyed map triggers compiler warnings.
Am I going about things the wrong way? Or maybe I'd have better luck with a different HTTP/JSON library?
You can replace your last line with:
parsed.collect{ case x: Map[_,_] => x.asInstanceOf[Map[String,Any]].get("full_name") }
We sort of "cheat" here since we know the keys in JSON objects are always Strings.
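If you'd rather avoid the explicit cast, you can acknowledge the erasure in the pattern itself with @unchecked, which silences the warning in the same way (a matter of taste; the runtime behaviour is identical):

```scala
// @unchecked tells the compiler we accept that the type arguments
// cannot be verified at runtime, so no warning is emitted.
parsed.collect { case x: Map[String, Any] @unchecked => x.get("full_name") }
```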
As for your last question, if you need something lightweight, I think what you have here is as simple as it gets.
Take a look at this SO post if you want to do something more powerful with your pattern matching.
I am trying out scala.js, and would like to use simple extracted row data in json format from my Postgres database.
Here is a contrived example of the kind of JSON I would like to parse into a strongly typed Scala collection. Note that there are n rows and various column types, including an array, to cover likely real-life scenarios. Don't worry about the SQL, which creates an inline table to extract the JSON from; I've included it for completeness. It's the parsing of the JSON in Scala.js that is causing me problems.
with inline_t as (
select * from (values('One',1,True,ARRAY[1],1.0),
('Six',6,False,ARRAY[1,2,3,6],2.4494),
('Eight',8,False,ARRAY[1,2,4,8],2.8284)) as v (as_str,as_int,is_odd,factors,sroot))
select json_agg(t) from inline_t t;
[{"as_str":"One","as_int":1,"is_odd":true,"factors":[1],"sroot":1.0},
{"as_str":"Six","as_int":6,"is_odd":false,"factors":[1,2,3,6],"sroot":2.4494},
{"as_str":"Eight","as_int":8,"is_odd":false,"factors":[1,2,4,8],"sroot":2.8284}]
I think this should be fairly easy using something like uPickle or Prickle as hinted at here: How to parse a json string to a case class in scala.js and vice versa. But I haven't been able to find a code example, and I'm not up to speed enough with Scala or Scala.js to work it out myself. I'd be very grateful if someone could post some working code to show how to achieve the above.
EDIT
This is the sort of thing I've tried, but I'm not getting very far
val jsparsed = scala.scalajs.js.JSON.parse(jsonStr3)
val jsp1 = jsparsed.selectDynamic("1")
val items = jsp1.map{ (item: js.Dynamic) =>
(item.as_str, item.as_int, item.is_odd, item.factors, item.sroot)
.asInstanceOf[js.Array[(String, Int, Boolean, Array[Int], Double)]].toSeq
}
println(items._1)
So you are in a situation where you actually want to manipulate JSON values. Since you're not serializing/deserializing Scala values from end-to-end, serialization libraries like uPickle or Prickle will not be very helpful to you.
You could have a look at a cross-platform JSON manipulation library, such as circe. That would give you the advantage that you wouldn't have to "deal with JavaScript data structures" at all. Instead, the library would parse your JSON and expose it as a Scala data structure. This is probably the best option if you want your code to also cross-compile.
If you're only writing Scala.js code, and you want a more lightweight version (no dependency), I recommend declaring types for your JSON "schema", then use those types to perform the conversion in a safer way:
import scala.scalajs.js
import scala.scalajs.js.annotation._
// type of {"as_str":"Six","as_int":6,"is_odd":false,"factors":[1,2,3,6],"sroot":2.4494}
@ScalaJSDefined
trait Row extends js.Object {
val as_str: String
val as_int: Int
val is_odd: Boolean
val factors: js.Array[Int]
val sroot: Double
}
type Rows = js.Array[Row]
val rows = js.JSON.parse(jsonStr3).asInstanceOf[Rows]
val items = (for (row <- rows) yield {
import row._
(as_str, as_int, is_odd, factors.toArray, sroot)
}).toSeq
I have a simple key-value Map[K, V] called myDictionary that is populated by my program, and at the end I want to write it out as a JSON-formatted string in a text file, as I will need to parse it later.
I was using this code earlier,
Some(new PrintWriter(outputDir+"/myDictionary.json")).foreach{p => p.write(compact(render(decompose(myDictionary)))); p.close}
I found it to be slower as the input size increased. Later, I used this:
var out = new PrintWriter(outputDir+"/myDictionary.json");
out.println(scala.util.parsing.json.JSONObject(myDictionary.toMap).toString())
This is proving to be bit faster.
I have run this on sample input and found it faster than my earlier approach. I'm assuming my input map will reach at least a million (K, V) entries (a >1 GB text file), so I want to make sure I follow the faster and more memory-efficient approach for the Map serialization process. What other approaches would you recommend that I look into to optimize this?
The JSON support in the standard Scala library is probably not the best choice. Unfortunately, the situation with JSON libraries for Scala is a bit confusing; there are many alternatives (Lift JSON, Play JSON, Spray JSON, Twitter JSON, Argonaut, ...), basically one library for each day of the week. I suggest you at least have a look at these to see whether any of them is easier to use and more performant.
Here is an example using Play JSON which I have chosen for particular reasons (being able to generate formats with macros):
object JsonTest extends App {
import play.api.libs.json._
type MyDict = Map[String, Int]
implicit object MyDictFormat extends Format[MyDict] {
def reads(json: JsValue): JsResult[MyDict] = json match {
case JsObject(fields) =>
val b = Map.newBuilder[String, Int]
fields.foreach {
case (k, JsNumber(v)) => b += k -> v.toInt
case other => return JsError(s"Not a (string, number) pair: $other")
}
JsSuccess(b.result())
case _ => JsError(s"Not an object: $json")
}
def writes(m: MyDict): JsValue = {
val fields: Seq[(String, JsValue)] = m.map {
case (k, v) => k -> JsNumber(v)
} (collection.breakOut)
JsObject(fields)
}
}
val m = Map("hallo" -> 12, "gallo" -> 34)
val serial = Json.toJson(m)
val text = Json.stringify(serial)
println(text)
val back = Json.fromJson[MyDict](serial)
assert(back == JsSuccess(m), s"Failed: $back")
}
While you can construct and deconstruct JsValues directly, the main idea is to use a Format[A] where A is the type of your data structure. This puts more emphasis on type safety than the standard Scala library's JSON support. It looks more verbose, but in the end I think it's the better approach.
There are utility methods Json.toJson and Json.fromJson which look for an implicit format of the type you want.
On the other hand, it does construct everything in-memory and it does duplicate your data structure (because for each entry in your map you will have another tuple (String, JsValue)), so this isn't necessarily the most memory efficient solution, given that you are operating in the GB magnitude...
Jerkson is a Scala wrapper for the Java JSON library Jackson. The latter apparently has the feature to stream data. I found this project which says it adds streaming support. Play JSON in turn is based on Jerkson, so perhaps you can even figure out how to stream your object with that. See also this question.
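For a sense of what streaming looks like, here is a rough sketch against Jackson's core streaming API (jackson-core) directly, assuming your map is a Map[String, Int] and reusing the question's outputDir; adjust the write call for other value types:

```scala
import java.io.File
import com.fasterxml.jackson.core.{ JsonEncoding, JsonFactory }

val myDictionary = Map("hallo" -> 12, "gallo" -> 34) // stand-in for your real map
val outputDir = "."                                  // stand-in for the question's outputDir

// Write the map entry by entry; only the current entry is held in memory,
// rather than a full in-memory JSON tree of the whole structure.
val gen = new JsonFactory().createGenerator(
  new File(outputDir + "/myDictionary.json"), JsonEncoding.UTF8)
gen.writeStartObject()
myDictionary.foreach { case (k, v) => gen.writeNumberField(k, v) }
gen.writeEndObject()
gen.close()
```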