Suggestions for Writing Map as JSON file in Scala - json

I have a simple single key-valued Map(K,V) myDictionary that is populated by my program and at the end I want to write it as JSON format string in a text file - as I would need parse them later.
I was using this code earlier,
Some(new PrintWriter(outputDir+"/myDictionary.json")).foreach{p => p.write(compact(render(decompose(myDictionary)))); p.close}
I found it to be slower as the input size increased. Later, I used this var out = new
var out = new PrintWriter(outputDir+"/myDictionary.json");
out.println(scala.util.parsing.json.JSONObject(myDictionary.toMap).toString())
This is proving to be bit faster.
I have run this for sample input and found that this is faster than my earlier approach. I assuming my input map size would reach at least a million values( >1GB text file) (K,V) hence I want to make sure that I follow the faster and memory efficient approach for Map serialization process.What are other approaches that you would recommend,that I can look into to optimize this.

The JSON support in the standard Scala library is probably not the best choice. Unfortunately the situation with JSON libraries for Scala is a bit confusing, there are many alternatives (Lift JSON, Play JSON, Spray JSON, Twitter JSON, Argonaut, ...), basically one library for each day of the week... I suggest you have a look at these at least to see if any of them is easier to use and more performative.
Here is an example using Play JSON which I have chosen for particular reasons (being able to generate formats with macros):
object JsonTest extends App {
import play.api.libs.json._
type MyDict = Map[String, Int]
implicit object MyDictFormat extends Format[MyDict] {
def reads(json: JsValue): JsResult[MyDict] = json match {
case JsObject(fields) =>
val b = Map.newBuilder[String, Int]
fields.foreach {
case (k, JsNumber(v)) => b += k -> v.toInt
case other => return JsError(s"Not a (string, number) pair: $other")
}
JsSuccess(b.result())
case _ => JsError(s"Not an object: $json")
}
def writes(m: MyDict): JsValue = {
val fields: Seq[(String, JsValue)] = m.map {
case (k, v) => k -> JsNumber(v)
} (collection.breakOut)
JsObject(fields)
}
}
val m = Map("hallo" -> 12, "gallo" -> 34)
val serial = Json.toJson(m)
val text = Json.stringify(serial)
println(text)
val back = Json.fromJson[MyDict](serial)
assert(back == JsSuccess(m), s"Failed: $back")
}
While you can construct and deconstruct JsValues directly, the main idea is to use a Format[A] where A is the type of your data structure. This puts more emphasis on type safety than the standard Scala-Library JSON. It looks more verbose, but in end I think it's the better approach.
There are utility methods Json.toJson and Json.fromJson which look for an implicit format of the type you want.
On the other hand, it does construct everything in-memory and it does duplicate your data structure (because for each entry in your map you will have another tuple (String, JsValue)), so this isn't necessarily the most memory efficient solution, given that you are operating in the GB magnitude...
Jerkson is a Scala wrapper for the Java JSON library Jackson. The latter apparently has the feature to stream data. I found this project which says it adds streaming support. Play JSON in turn is based on Jerkson, so perhaps you can even figure out how to stream your object with that. See also this question.

Related

How do I transform a JSON fields into a Seq in Scala Play framework 2?

I have some JSON coming from an external API which I have no control over. Part of the JSON is formatted like this:
{
"room_0": {
"area_sq_ft": 151.2
},
"room_1": {
"area_sq_ft": 200.0
}
}
Instead of using an array like they should have, they've used room_n for a key to n number of elements. Instead of creating a case class with room_0, room_1, room_2, etc., I want to convert this to a Seq[Room] where this is my Room case class:
case class Room(area: Double)
I am using Reads from play.api.libs.json for converting other parts of the JSON to case classes and would prefer to use Reads for this conversion. How could I accomplish that?
Here's what I've tried.
val sqFtReads = (__ \ "size_sq_ft").read[Double]
val roomReads = (__ \ "size_sq_ft").read[Seq[Room]](sqFtReads).map(Room)
cmd19.sc:1: overloaded method value read with alternatives:
(t: Seq[$sess.cmd17.Room])play.api.libs.json.Reads[Seq[$sess.cmd17.Room]] <and>
(implicit r: play.api.libs.json.Reads[Seq[$sess.cmd17.Room]])play.api.libs.json.Reads[Seq[$sess.cmd17.Room]]
cannot be applied to (play.api.libs.json.Reads[Double])
val roomReads = (__ \ "size_sq_ft").read[Seq[Room]](sqFtReads).map(Room)
A tricky little challenge but completely achievable with Reads.
First, Reads[Room] - i.e. the converter for a single Room instance:
val roomReads = new Reads[Room] {
override def reads(json: JsValue): JsResult[Room] = {
(json \ "area_sq_ft").validate[Double].map(Room(_))
}
}
Pretty straightforward; we peek into the JSON and try to find a top-level field called area_sq_ft which validates as a Double. If it's all good, we return the populated Room instance as needed.
Next up, the converter for your upstream object that in good Postel's Law fashion, you are cleaning up for your own consumers.
val strangeObjectReads = new Reads[Seq[Room]] {
override def reads(json: JsValue): JsResult[Seq[Room]] = {
json.validate[JsObject].map { jso =>
val roomsSortedNumerically = jso.fields.sortBy { case (name, contents) =>
val numericPartOfRoomName = name.dropWhile(!_.isDigit)
numericPartOfRoomName.toInt
}
roomsSortedNumerically.map { case (name, contents) =>
contents.as[Room](roomReads)
}
}
}
}
The key thing here is the json.validate[JsObject] around the whole lot. By mapping over this we get the JsResult that we need to wrap the whole thing, plus, we can get access to the fields inside the JSON object, which is defined as a Seq[(String, JsValue)].
To ensure we put the fields in the correct order in the output sequence, we do a little bit of string manipulation, getting the numeric part of the room_1 string, and using that as the sortBy criteria. I'm being a bit naive here and assuming your upstream server won't do anything nasty like skip room numbers!
Once you've got the rooms sorted numerically, we can just map over them, converting each one with our roomReads converter.
You've probably noticed that my custom Reads implementations are most definitely not one-liners. This comes from bitter experience dealing with oddball upstream JSON formats. Being a bit verbose, using a few more variables and breaking things up a bit pays off big time when that upstream server changes its JSON format suddenly!

Apply key to map obtained via pattern matching in Scala (type erased)

I am trying to query an API which returns a JSON array (e.g. [{"name":"obj1", "value":5}, {"name":"obj2", "value":2}]) and process the result, which gets parsed as an Option[List[Map[String,Any]]]. However, I am not sure how to properly extract each Map, since the types are erased at runtime.
import scala.util.parsing.json._
import scalaj.http._
val url = "https://api.github.com/users/torvalds/repos"
val req = Http(url).asString
val parsed = JSON.parseFull(req.body) match {
case Some(data) => data match {
case list: List[_] => list
case _ => sys.error("Result is not a list.")
}
case None => sys.error("Invalid JSON received.")
}
parsed.foreach{
case x: Map[_,_] => x.get("full_name") // error here
}
The error occurs because I cannot apply the function with a String key type. However, because of type erasure, the key and value type are unknown, and specifying that it's a String map throws compiler warnings.
Am I going about things the wrong way? Or maybe I'd have better luck with a different HTTP/JSON library?
You can replace your last line with:
parsed.collect{ case x: Map[_,_] => x.asInstanceOf[Map[String,Any]].get("full_name") }
We sort of "cheat" here since we know the keys in a JSON are always Strings.
As for your last question, if you need something lightweight, I think what you have here is as simple as it gets.
Take a look at this SO post if you want to do something more powerful with your pattern matching.

Abstraction to extract data from JSON in Scala

I am looking for a good abstraction to extract data form JSON (I am using json4s now).
Suppose I have a case class A and data in JSON format.
case class A(a1: String, a2: String, a3: String)
{"a1":"xxx", "a2": "yyy", "a3": "zzz"}
I need a function to extract the JSON data and return A with these data as follows:
val a: JValue => A = ...
I do not want to write the function a from scratch. I would rather compose it from primitive functions.
For example, I can write a primitive function to extract string by field name:
val str: (String, JValue) => String = {(fieldName, jval) => ... }
Now I would like to compose the function a: JValue => A from str. Does it make sense ?
Consider use of Play-JSON, which has a composable "Reads" object. If you've ever used ReactiveMongo, it can be used in much the same way. Contrary to some older posts here, it can be used stand-alone, without most of the rest of Play.
It uses the common "implicit translator" (my term) idiom. I found that my favorite deserializing pattern for using it is not highlighted in the docs, though - the pattern they espouse is a lot harder to get right, IMHO. I make heavy use of .as and .asOpt, which are documented on the first linked page above, in the small section "Using JsValue.as/asOpt". When deserializing a JSON object, you can say something like
val person:Person = (someParsedJsonObject \ "aPerson").as[Person]
and as long as you have an implicit Reads[Person] in scope, all just works. There are built-in Reads for all primitive types and many collection types. In many cases, it makes sense to put the Reads and Writes implicit objects in the companion object for, e.g., Person.
I thought json4s had a similar feature, but I could be wrong.
Argonaut is fully functional Scala library.
It allows to encode/decode case classes (JSON codecs).
import argonaut._, Argonaut._
case class Person(name: String, age: Int)
implicit def PersonDecodeJson: DecodeJson[Person]
jdecode2L(Person.apply)("name", "age")
// Codec for Person case class from JSON of form
// { "name": "string", "age": 1 }
It also provides JSON cursor (lenses/monocle) for custom parsing.
implicit def PersonDecodeJson: DecodeJson[Person] =
DecodeJson(c => for {
name <- (c --\ "_name").as[String]
age <- (c --\ "_age").as[String].map(_.toInt)
} yield Person(name, age))
// Decode Person from a JSON with property names different
// from those of the case class, and age passed as string:
// { "_name": "string", "age": "10" }
Parsing result is represented by DecodeResult type that can be composed (.map, .flatMap) and handle error cases.

Optimal way to read out JSON from MongoDB into a Scalatra API

I have a pre-formatted JSON blob stored as a string in MongoDB as a field in one of collections. Currently in my Scalatra based API, I have a before filter that renders all of my responses with a JSON content type. An example of how I return the content looks like the following:
get ("/boxscore", operation(getBoxscore)) {
val game_id:Int = params.getOrElse("game_id", "3145").toInt
val mongoColl = mongoDb.apply("boxscores")
val q: DBObject = MongoDBObject("game_id" -> game_id)
val res = mongoColl.findOne(q)
res match {
case Some(j) => JSON.parseFull(j("json_body").toString)
case None => NotFound("Requested document could not be found.")
}
}
Now this certainly does work. It doesn't seem the "Scala" way of doing things and I feel like this can be optimized. The worrisome part to me is when I add a caching layer and a cache does not hit that I am spending additional CPU time on re-parsing a String I already formatted as JSON in MongoDB:
JSON.parseFull(j("json_body").toString)
I have to take the result from findOne(), run .toString on it, then re-parse it into JSON afterwards. Is there a more optimal route? Since the JSON is already stored as a String in MongoDB, I'm guessing a serializer / case class isn't the right solution here. Of course I can just leave what's here - but I'd like to learn if there's a way that would be more Scala-like and CPU friendly going forward.
There is the option to extend Scalatra's render pipeline with handling for MongoDB classes. The following two routes act as an example. They return a MongoCursor and a DBObject as result. We are going to convert those to a string.
get("/") {
mongoColl.find
}
get("/:key/:value") {
val q = MongoDBObject(params("key") -> params("value"))
mongoColl.findOne(q) match {
case Some(x) => x
case None => halt(404)
}
}
In order to handle the types we need to define a partial function which takes care of the conversion and sets the appropriate content type.
There are two cases, the first one handles a DBObject. The content type is set to "application/json" and the object is converted to a string by calling the toString method. The second case handles a MongoCursor. Since it implements TraversableOnce the map function can be used.
def renderMongo = {
case dbo: DBObject =>
contentType = "application/json"
dbo.toString
case xs: TraversableOnce[_] => // handles a MongoCursor, be aware of type erasure here
contentType = "application/json"
val ls = xs map (x => x.toString) mkString(",")
"[" + ls + "]"
}: RenderPipeline
(Note the following type definition: type RenderPipeline = PartialFunction[Any, Any])
Now the method needs to get hooked in. After a HTTP call has been handled the result is forwarded to the render pipeline for further conversion. Custom handling can be added by overriding the renderPipeline method from ScalatraBase. With the following definition the renderMongo function is called first:
override protected def renderPipeline = renderMongo orElse super.renderPipeline
This is a basic approach to handle MongoDB types. There are other options as well, for example by making use of json4s-mongo.
Here is the previous code in a working sample project.

List to Json in Playframework Scala

I am new to scala and to the playframework, but so far it is great. I am having trouble figuring out how to turn a list of data into json (or really any complex structure). This isn't a real-world example, but here is what I am trying to do. Get some data from a database.
scala> val result:List[(Long,String)] = DB.withConnection { implicit c =>
SQL("select * from users").as(
long("id")~str("uid") map(flatten)*)
}
result: List[(Long, String)] = List((3,397a73ee5150429786863db144341bb3), (4,2850760dc9024c16bea6c8c65f409821), (5,636ee2bf758e4f699f27890ac55d7db2))
I would like to be able to then turn that into json and return it. based on this doc, it looks like i need to iterate through and call toJson on the result
http://www.playframework.org/documentation/2.0/ScalaJson
But, In practice i am having trouble doing it. Is this even the correct approach? Is there some scala concept that would make this simple? I see some examples using case classes, but I haven't quite wrapped my head around that concept yet.
I don't really expect this to work but, i guess I am conceptually trying to do something like this
scala> toJson(Map("response" -> result))
<console>:27: error: No Json deserializer found for type scala.collection.immutable.Map[java.lang.String,List[(Long, String)]]. Try to implement an implicit Writes or Format for this type.
toJson(Map("response" -> result))
Thanks
As said, you can write you own implicit Writes to do that but you can also rely on the existing Writes and just retrieve your data as a List[Map[String, Any]] and apply toJson on it:
val simple = {
get[Pk[Long]]("user.id") ~
get[Long]("user.uid") map {
case id~uid => Map("id" -> id.get.toString, "uid" -> uid.toString)
}
}
val result:List[Map(String,String)] = DB.withConnection { implicit c =>
SQL("select * from users").as(User.simple *)
}