Abstraction to extract data from JSON in Scala

I am looking for a good abstraction to extract data from JSON (I am using json4s now).
Suppose I have a case class A and data in JSON format.
case class A(a1: String, a2: String, a3: String)
{"a1":"xxx", "a2": "yyy", "a3": "zzz"}
I need a function that extracts the JSON data and returns an A, as follows:
val a: JValue => A = ...
I do not want to write the function a from scratch. I would rather compose it from primitive functions.
For example, I can write a primitive function to extract a string by field name:
val str: (String, JValue) => String = {(fieldName, jval) => ... }
Now I would like to compose the function a: JValue => A from str. Does it make sense?
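For illustration, here is one way such a composition could look with json4s (a sketch; the naive error handling is just for the example):
import org.json4s._

val str: (String, JValue) => String = (fieldName, jval) =>
  jval \ fieldName match {
    case JString(s) => s
    case other      => sys.error(s"expected a string at '$fieldName', got $other")
  }

// compose the primitive into the extractor for A
val a: JValue => A = jval =>
  A(str("a1", jval), str("a2", jval), str("a3", jval))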

Consider using Play JSON, which has a composable Reads type. If you've ever used ReactiveMongo, it can be used in much the same way. Contrary to some older posts here, it can be used stand-alone, without most of the rest of Play.
It uses the common "implicit translator" (my term) idiom. I found that my favorite deserializing pattern for using it is not highlighted in the docs, though - the pattern they espouse is a lot harder to get right, IMHO. I make heavy use of .as and .asOpt, which are documented in the Play JSON docs in the small section "Using JsValue.as/asOpt". When deserializing a JSON object, you can say something like
val person: Person = (someParsedJsonObject \ "aPerson").as[Person]
and as long as you have an implicit Reads[Person] in scope, all just works. There are built-in Reads for all primitive types and many collection types. In many cases, it makes sense to put the Reads and Writes implicit objects in the companion object for, e.g., Person.
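For instance, a minimal sketch of that pattern (the macro-generated Reads assumes the field names match the JSON keys):
import play.api.libs.json._

case class Person(name: String, age: Int)

object Person {
  // placed in the companion so the Reads is always in implicit scope
  implicit val personReads: Reads[Person] = Json.reads[Person]
}

val parsed = Json.parse("""{ "aPerson": { "name": "Ann", "age": 30 } }""")
val person: Person = (parsed \ "aPerson").as[Person]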
I thought json4s had a similar feature, but I could be wrong.

Argonaut is a purely functional JSON library for Scala.
It lets you encode and decode case classes (JSON codecs).
import argonaut._, Argonaut._
case class Person(name: String, age: Int)
implicit def PersonDecodeJson: DecodeJson[Person] =
  jdecode2L(Person.apply)("name", "age")
// Codec for Person case class from JSON of form
// { "name": "string", "age": 1 }
It also provides JSON cursors (lens-style, cf. Monocle) for custom parsing.
implicit def PersonDecodeJson: DecodeJson[Person] =
  DecodeJson(c => for {
    name <- (c --\ "_name").as[String]
    age <- (c --\ "_age").as[String].map(_.toInt)
  } yield Person(name, age))
// Decode Person from JSON with property names different
// from those of the case class, and age passed as a string:
// { "_name": "string", "_age": "10" }
The parsing result is represented by the DecodeResult type, which can be composed (.map, .flatMap) and handles error cases.
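For example, a small sketch of consuming a DecodeResult (using the cursor-based Person decoder above; the parse-failure case is elided with .get for brevity):
import argonaut._, Argonaut._

val json: Json = Parse.parseOption("""{ "_name": "Ann", "_age": "30" }""").get
val result: DecodeResult[Person] = json.jdecode[Person]

// compose like any other container...
val ageNextYear: DecodeResult[Int] = result.map(_.age + 1)

// ...and fold out the error case at the end
val summary: String = result.fold(
  (msg, history) => s"decode failed: $msg at $history",
  person => s"decoded ${person.name}, age ${person.age}"
)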

Related

PlayJSON in Scala

I am trying to familiarize myself with the Play JSON library. I have a JSON-formatted file like this:
{
  "Person": [
    {
      "name": "Jonathon",
      "age": 24,
      "job": "Accountant"
    }
  ]
}
However, I'm having difficulty parsing it properly because the file mixes value types (name is a String but age is an Int). I could technically make age a String and call .toInt on it later, but for my purposes it is an integer by default.
I know how to parse some of it:
import play.api.libs.json.{JsValue, Json}
val parsed: JsValue = Json.parse(jsonFile) //assuming jsonFile is that entire JSON String as shown above
val person: List[Map[String, String]] = (parsed \ "Person").as[List[Map[String, String]]]
Creating that person value throws an error. I know Scala is a strongly-typed language but I'm sure there is something I am missing here. I feel like this is an obvious fix too but I'm not quite sure.
The error produced is:
JsResultException(errors:List(((0)/age,List(JsonValidationError(List(error.expected.jsstring),WrappedArray())))
The error, as the exception message explains, comes from casting to a map of string to string. The data you provided does not align with that type, because age is a number, not a string. If you want to keep this approach, you need to parse the values into a type that can represent both strings and numbers, such as JsValue:
(parsed \ "Person").validate[List[Map[String, JsValue]]]
Having said that, as @Luis wrote in a comment, you can just use case classes to parse it. Let's declare two case classes:
case class JsonParsingExample(Person: Seq[Person])
case class Person(name: String, age: Int, job: String)
Now we will create a formatter for each of them in their corresponding companion objects:
import play.api.libs.json.OFormat

object Person {
  implicit val format: OFormat[Person] = Json.format[Person]
}
object JsonParsingExample {
  implicit val format: OFormat[JsonParsingExample] = Json.format[JsonParsingExample]
}
Now we can just do:
Json.parse(jsonFile).validate[JsonParsingExample]
Code run at Scastie.
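For completeness, a sketch of consuming the JsResult that validate returns (jsonFile and the case classes as above, assuming play.api.libs.json._ is imported):
Json.parse(jsonFile).validate[JsonParsingExample] match {
  case JsSuccess(value, _) => value.Person.foreach(p => println(s"${p.name}, ${p.age}, ${p.job}"))
  case JsError(errors)     => println(s"Failed to parse: $errors")
}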

How to handle different JSON schemas and dispatch them to be handled by the right parser?

I am currently building a very simple JSON parser in Scala that has to deal with two (slightly) different schemas. My objective is to parse one value in the JSON and, based on that value, dispatch it to the relevant decoder. I have used circe for my implementation, but other implementations and/or suggestions are also welcome.
I have formulated a simplified version for my example to help clarify the question.
There are two types of JSONs that I can receive, either a stock:
"data": {
"name": "XYZ"
},
"type": "STOCK"
}
Or a quote (which is similar to the stock but includes a price).
"data": {
"name": "ABC",
"price": 1151.6214,
},
"type": "QUOTE"
}
On my side, I have developed a simple decoder that looks like this (for the stock):
implicit private val dataDecoder: Decoder[Stock] = (hCursor: HCursor) => {
  for {
    name <- hCursor.downField("data").downField("name").as[String]
    typ <- hCursor.downField("type").as[StockType]
  } yield Stock(name, typ, LocalDateTime.now())
}
I could also develop a parser that only parses the "type" part of the JSON, and then send the data to be handled by the relevant parser (quote or stock). However, I am wondering what is the:
Efficient way to do this
Idiomatic/clean way to do this
To rephrase my question if needed: what is the right and efficient way to handle slightly different JSON schemas and forward them to be handled by the right parser?
I usually encounter this situation when I need to serialize ADTs. As somebody mentioned in the comments to your question, circe has support for automatic ADT codec generation; however, I usually prefer to write the codecs manually.
In any case, in a situation like yours I would do something along these lines:
import io.circe.{Decoder, DecodingFailure, Encoder, Json}

sealed trait Data
case class StockData(name: String) extends Data
case class QuoteData(name: String, quote: Double) extends Data

implicit val stockDataEncoder: Encoder[StockData] = ???
implicit val stockDataDecoder: Decoder[StockData] = ???
implicit val quoteDataEncoder: Encoder[QuoteData] = ???
implicit val quoteDataDecoder: Decoder[QuoteData] = ???

implicit val dataEncoder: Encoder[Data] = Encoder.instance {
  case s: StockData => stockDataEncoder(s).mapObject(_.add("type", Json.fromString("stock")))
  case q: QuoteData => quoteDataEncoder(q).mapObject(_.add("type", Json.fromString("quote")))
}

implicit val dataDecoder: Decoder[Data] = Decoder.instance { c =>
  for {
    stype <- c.get[String]("type")
    res <- stype match {
      case "stock" => stockDataDecoder(c)
      case "quote" => quoteDataDecoder(c)
      case unk => Left(DecodingFailure(s"Unsupported data type: $unk", c.history))
    }
  } yield res
}
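For completeness, a sketch of what one of the ??? instances could look like, assuming the name sits under the nested "data" object as in the question's payloads:
implicit val stockDataDecoder: Decoder[StockData] = Decoder.instance { c =>
  c.downField("data").get[String]("name").map(StockData.apply)
}
With that in place, io.circe.parser.decode[Data]("""{ "type": "stock", "data": { "name": "XYZ" } }""") dispatches end to end and yields Right(StockData("XYZ")).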

How to properly use JSON.parse in kotlinjs with enums?

During my fresh adventures with kotlin-react I hit a hard stop when trying to parse some data from my backend which contains enum values.
Spring-Boot sends the object in JSON form like this:
{
  "id": 1,
  "username": "Johnny",
  "role": "CLIENT"
}
role in this case is the enum value and can have the two values CLIENT and LECTURER. If I were to parse this with a java library or let this be handled by Spring-Boot, role would be parsed to the corresponding enum value.
With kotlin-js' JSON.parse, that wouldn't work and I would have a simple string value in there.
After some testing, I came up with this snippet
val json = """{
"id": 1,
"username": "Johnny",
"role": "CLIENT",
}"""
val member: Member = JSON.parse(json) { key: String, value: Any? ->
if (key == "role") Member.Role.valueOf(value.toString())
else value
}
in which I manually have to define the conversion from the string value to the enum.
Is there something I am missing that would simplify this behaviour?
(I am not referring to using ids for the JSON and the looking those up, etc. I am curious about some method in Kotlin-JS)
I have the assumption there is not because the "original" JSON.parse in JS doesn't do this and Kotlin does not add any additional stuff in there but I still have hope!
As far as I know, no.
The problem
Kotlin/JS produces an incredibly weird type situation when deserializing using the embedded JSON class, which is actually a mirror of JavaScript's JSON object. While I haven't done much JavaScript, its type handling is near non-existent; only manual throws can enforce it, so JSON.parse doesn't care whether it returns a SomeCustomObject or a newly created object with the exact same fields.
As an example: if you have two different classes with the same field names (no inheritance) and a function that accepts one of them, it doesn't care which of the two (or a third, for that matter) it receives, as long as the fields it tries to access exist.
These type issues carry over into Kotlin. Wrapping it back to Kotlin, consider this code:
val json = """{
"x": 1, "y": "yes", "z": {
"x": 42, "y": 314159, "z": 444
}
}""".trimIndent()
data class SomeClass(val x: Int, val y: String, val z: Struct)
data class Struct(val x: Int, val y: Int, val z: Int)
fun main(args: Array<String>) {
val someInstance = JSON.parse<SomeClass>(json)
if(someInstance.z::class != Struct::class) {
println("Incompatible types: Required ${Struct::class}, found ${someInstance.z::class}");
}
}
What would you expect this to print? The natural expectation would be a Struct; the type is even declared explicitly.
Unfortunately, that is not the case. Instead, it prints:
Incompatible types: Required class Struct, found class Any
The point
The embedded JSON de/serializer isn't good with types. You might be able to fix this by using a different serializing library, but I'll avoid turning this into a "use [this] library" answer.
Essentially, JSON.parse fails to parse objects as expected. If you remove the reviver argument entirely and try a raw JSON.parse(json) on the JSON in your question, you'll get a role that is a String, not the Role you might expect. And with JSON.parse doing no type conversion whatsoever, you have two options: use a library, or use your approach.
Your approach will unfortunately get complicated if you have nested objects, but with the types being changed, the only option you appear to have left is explicitly parsing the objects manually.
TL;DR: your approach is fine.

Argonaut: Generic method to encode/decode array of objects

I am trying to implement a generic pattern with which to generate marshallers and unmarshallers for an Akka HTTP REST service using Argonaut, handling both entity- and collection-level requests and responses. I have no issues implementing the entity level as such:
case class Foo(foo: String)
object Foo {
  implicit val FooJsonCodec = CodecJson.derive[Foo]
  implicit val EntityEncodeJson = FooJsonCodec.Encoder
  implicit val EntityDecodeJson = FooJsonCodec.Decoder
}
I am running into issues attempting to provide encoders and decoders for the following:
[
{ "foo": "1" },
{ "foo": "2" }
]
I have attempted adding the following to my companion:
object Foo {
  implicit val FooCollectionJsonCodec = CodecJson.derive[HashSet[Foo]]
}
However, I am receiving the following error:
Error:(33, 90) value jencode0L is not a member of object argonaut.EncodeJson
I see this method truly does not exist, but is there any other generic way to achieve my expected result? I'm strongly avoiding an additional case class to describe the collection, since I am using reflection heavily in my use case.
At this point, I'd even be fine with manually constructed Encoder and Decoder instances; however, I've found no documentation on how to construct them with the expected structure.
Argonaut has predefined encoders and decoders for Scala's immutable lists, sets, streams and vectors. If your type is not supported out of the box, as in the case of java.util.HashSet, you can easily add EncodeJson and DecodeJson instances for it:
import argonaut._, Argonaut._
import scala.collection.JavaConverters._

implicit def hashSetEncode[A](
  implicit element: EncodeJson[A]
): EncodeJson[java.util.HashSet[A]] =
  EncodeJson(set => EncodeJson.SetEncodeJson[A].apply(set.asScala.toSet))

implicit def hashSetDecode[A](
  implicit element: DecodeJson[A]
): DecodeJson[java.util.HashSet[A]] =
  DecodeJson(cursor => DecodeJson.SetDecodeJson[A]
    .apply(cursor)
    .map(set => new java.util.HashSet(set.asJava)))
// Usage:
val set = new java.util.HashSet[Int]
set.add(1)
set.add(3)
val jsonSet = set.asJson // [1, 3]
jsonSet.jdecode[java.util.HashSet[Int]] // DecodeResult(Right([1, 3]))
case class A(set: java.util.HashSet[Int])
implicit val codec = CodecJson.derive[A]
val a = A(set)
val jsonA = a.asJson // { "set": [1, 3] }
jsonA.jdecode[A] // DecodeResult(Right(A([1, 3])))
The sample was checked with Scala 2.12.1 and Argonaut 6.2-RC2, but as far as I know it shouldn't depend on any recent changes.
An approach like this works with any linear or unordered homogeneous data structure that you want to represent as a JSON array. It is also preferable to creating a CodecJson: the latter can be inferred automatically from EncodeJson and DecodeJson, but not vice versa. This way, your set will serialize and deserialize both when used independently and when nested in another data type, as shown in the example.
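If some API really does want a CodecJson, it can still be recovered from the encode/decode pair; a one-line sketch, assuming the instances above are in scope:
val hashSetCodec: CodecJson[java.util.HashSet[Int]] =
  CodecJson.derived[java.util.HashSet[Int]]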
I don't use Argonaut (I use spray-json), but I suspect the solution can be similar.
Have you tried something like this?
implicit def HashSetJsonCodec[T : CodecJson] = CodecJson.derive[Set[T]]
If it doesn't work, I'd probably try creating a more verbose implicit function like:
implicit def SetJsonCodec[T](implicit codec: CodecJson[T]): CodecJson[Set[T]] =
  CodecJson(
    set => jArray(set.map(codec.encode).toList),
    c => c.as[List[Json]].flatMap { arr =>
      val items = arr.map(codec.Decoder.decodeJson)
      items.find(_.isError) match {
        case Some(error) => DecodeResult.fail[Set[T]](error.toString, c.history)
        case None => DecodeResult.ok(items.flatMap(_.toOption).toSet)
      }
    }
  )
PS: I didn't test this, but hopefully it leads you in the right direction :)

Suggestions for Writing Map as JSON file in Scala

I have a simple single key-valued Map[K, V] myDictionary that is populated by my program, and at the end I want to write it as a JSON-format string to a text file, as I will need to parse it later.
I was using this code earlier:
Some(new PrintWriter(outputDir + "/myDictionary.json")).foreach { p => p.write(compact(render(decompose(myDictionary)))); p.close() }
I found it to be slower as the input size increased. Later, I used this:
var out = new PrintWriter(outputDir + "/myDictionary.json")
out.println(scala.util.parsing.json.JSONObject(myDictionary.toMap).toString())
This is proving to be a bit faster.
I have run this on sample input and found it faster than my earlier approach. I assume my input map will reach at least a million (K, V) entries (a >1 GB text file), hence I want to make sure I follow the faster and more memory-efficient approach for the map serialization process. What other approaches would you recommend that I could look into to optimize this?
The JSON support in the standard Scala library is probably not the best choice. Unfortunately the situation with JSON libraries for Scala is a bit confusing; there are many alternatives (Lift JSON, Play JSON, Spray JSON, Twitter JSON, Argonaut, ...), basically one library for each day of the week... I suggest you have a look at these, at least to see if any of them is easier to use and more performant.
Here is an example using Play JSON, which I have chosen for a particular reason (being able to generate formats with macros):
object JsonTest extends App {
  import play.api.libs.json._

  type MyDict = Map[String, Int]

  implicit object MyDictFormat extends Format[MyDict] {
    def reads(json: JsValue): JsResult[MyDict] = json match {
      case JsObject(fields) =>
        val b = Map.newBuilder[String, Int]
        fields.foreach {
          case (k, JsNumber(v)) => b += k -> v.toInt
          case other => return JsError(s"Not a (string, number) pair: $other")
        }
        JsSuccess(b.result())
      case _ => JsError(s"Not an object: $json")
    }

    def writes(m: MyDict): JsValue = {
      val fields: Seq[(String, JsValue)] = m.map {
        case (k, v) => k -> JsNumber(v)
      } (collection.breakOut)
      JsObject(fields)
    }
  }

  val m = Map("hallo" -> 12, "gallo" -> 34)
  val serial = Json.toJson(m)
  val text = Json.stringify(serial)
  println(text)

  val back = Json.fromJson[MyDict](serial)
  assert(back == JsSuccess(m), s"Failed: $back")
}
While you can construct and deconstruct JsValues directly, the main idea is to use a Format[A], where A is the type of your data structure. This puts more emphasis on type safety than the standard Scala library's JSON support. It looks more verbose, but in the end I think it's the better approach.
There are utility methods Json.toJson and Json.fromJson which look for an implicit format of the type you want.
On the other hand, it constructs everything in memory and duplicates your data structure (for each entry in your map you will have another tuple (String, JsValue)), so this isn't necessarily the most memory-efficient solution, given that you are operating at GB magnitude...
Jerkson is a Scala wrapper for the Java JSON library Jackson. The latter apparently has the feature to stream data. I found this project which says it adds streaming support. Play JSON in turn is based on Jerkson, so perhaps you can even figure out how to stream your object with that. See also this question.
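To give an idea of what streaming could look like, here is a sketch that uses Jackson's streaming API directly (jackson-core; writeMapStreaming is just an illustrative name). It writes the map entry by entry instead of building the whole JSON tree in memory first:
import java.io.File
import com.fasterxml.jackson.core.{JsonEncoding, JsonFactory}

def writeMapStreaming(m: Map[String, Int], file: File): Unit = {
  val gen = new JsonFactory().createGenerator(file, JsonEncoding.UTF8)
  try {
    gen.writeStartObject()                                   // {
    m.foreach { case (k, v) => gen.writeNumberField(k, v) }  //   "key": 12, ...
    gen.writeEndObject()                                     // }
  } finally gen.close()
}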