Parsing JSON with different schemas with Play JSON

I've got to parse a list of JSON messages. I am using Play JSON.
All messages have a similar structure and, at a high level, may be represented as:
case class JMessage(
  event: String,
  messageType: String,
  data: JsValue // variable data structure
)
data may hold entries of different types (Double, String, Int), so I can't go with a Map.
Currently there are at least three different structures of data. The structure of data can be identified by messageType.
So far I've created three case classes, each representing one structure of data, as well as implicit Reads for them, plus a fourth case class for the result with some Option-al fields. So basically I need to map various JSON messages to some output format.
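(For reference, such implicit Reads are typically derived with the Json.reads macro; EventData below is a hypothetical stand-in for one of the three data structures:)

import play.api.libs.json._

case class EventData(id: String, value: Double) // hypothetical variant of data

implicit val eventDataReads: Reads[EventData] = Json.reads[EventData]
implicit val jMessageReads: Reads[JMessage]   = Json.reads[JMessage]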
The approach I'm currently using is:
messages.map(Json.parse(_)).map(_.as[JMessage]).map { elem =>
  if (elem.messageType == "event") {
    Some(parseMessageOfTypeEvent(elem.data))
  }
  // ... analogous branches returning Some(..) for the other message types ...
  else {
    None
  }
}.filter(_.nonEmpty)
The parseMessageOfType%type% functions are basically (v: type) => JsValue.
So after all I have 4 case classes and 3 functions for parsing. It works, but it is ugly.
Is there a more beautiful Scala way to do it?
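For illustration, one way to flatten the dispatch is to key the parse functions by messageType and flatMap over the lookup; this is only a sketch, and parseMessageOfTypeMetric/parseMessageOfTypeStatus are hypothetical stand-ins for the other two parse functions:

import play.api.libs.json._

// Each parser takes the raw data JsValue and produces the output JsValue.
val parsers: Map[String, JsValue => JsValue] = Map(
  "event"  -> parseMessageOfTypeEvent,
  "metric" -> parseMessageOfTypeMetric, // hypothetical
  "status" -> parseMessageOfTypeStatus  // hypothetical
)

val results: Seq[JsValue] =
  messages
    .map(Json.parse(_))
    .map(_.as[JMessage])
    .flatMap(m => parsers.get(m.messageType).map(_(m.data))) // unknown types drop out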

To handle dynamic fields at the Scala end

The scenario is: we are pulling data from Mongo in JSON format and processing it through Spark. At times we do not get the desired field inside a complex datatype, e.g. a nested array of strings or a struct within an array.
1. Is there any workaround, while loading the JSON file, to put null values into the absent fields (validator checks)?
2. If we want to handle the dynamic nature at the Scala end, how should that be done?
def checkAvailableColumns(df: DataFrame, expectedColumnsInput: List[String]): DataFrame = {
  expectedColumnsInput.foldLeft(df) { (df, column) =>
    if (!df.columns.contains(column))
      df.withColumn(column, lit("null")) // note: this adds the string "null", not a SQL NULL
    else
      df
  }
}
I am using the above code to verify whether the required columns are present on the source side, and to add them with a null value when they are not. The question here is how to get a complex data type, like an array of structs, into a general column name so that I can compare it. (I can use the dot operator to pull a column out of a struct, but if that column doesn't exist my script will fail.)
Take a look at the Scala Option class. Let's assume you have:
case class JsonTemplate(optionalArray: Option[Seq[String]])
Now let's assume you get the valid JSON {}: the parser will put None as the value, and you will get the instance JsonTemplate(None).
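On the other point (checking for a nested column before selecting it with the dot operator), here is a minimal sketch of a schema-based guard, assuming Spark's DataFrame API; hasNestedColumn is a hypothetical helper name:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{ArrayType, StructType}

// Returns true if a dot-separated path such as "payload.tags" resolves
// through structs (or arrays of structs) in the DataFrame's schema.
def hasNestedColumn(df: DataFrame, path: String): Boolean = {
  def loop(struct: StructType, parts: List[String]): Boolean = parts match {
    case Nil => true
    case head :: tail =>
      struct.fields.find(_.name == head) match {
        case Some(field) => field.dataType match {
          case s: StructType               => loop(s, tail)
          case ArrayType(s: StructType, _) => loop(s, tail)
          case _                           => tail.isEmpty // leaf type reached
        }
        case None => false
      }
  }
  loop(df.schema, path.split('.').toList)
}

// e.g. guard the selection so the script doesn't fail on a missing field:
// if (hasNestedColumn(df, "complex.field")) df.select("complex.field") else df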

Json4s: keep unknown fields in a map when deserialising

I am trying to parse the response given by an HTTP endpoint using json4s in Scala. The JSON returned could have a large number of fields (they are all documented and defined, but there are a lot of them, and they are subject to change). I don't need to reference many of these fields, only pass them on to some other service to deal with.
I want to take the fields I need and deserialise the rest into a map. The class needs to be serialised correctly as well.
e.g. JSON response from endpoint:
{
  "name": "value",
  "unknown_field": "unknown",
  "unknown_array": ["one", "two", "three"],
  ...
}
e.g. case class used in code:
case class TestResponse(name: String, otherFields: Map[String, Any])
Is there a simple solution for this?
I have made an attempt to implement a custom Serialiser for this, but have not had much luck as yet. Seems like this would be a common enough requirement. Is there a way to do this OOTB with json4s?
Cheers
Current attempt at a custom serialiser:
private object TestResponseDeserializer extends CustomSerializer[TestResponse](_ => ({
  case JObject(JField("name_one", JString(name)) :: rest) => TestType1Response(name, rest.toMap)
  case JObject(JField("name_two", JString(name)) :: rest) => TestType2Response(name, rest.toMap)
}, {
  case testType1: TestType1Response => JObject(JField("name_one", JString(testType1.name)))
  case testType2: TestType2Response => JObject(JField("name_two", JString(testType2.name)))
}))
I was able to solve this using the custom serialiser in the original question. It originally didn't work for me due to an unrelated issue.
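For reference, here is a sketch of what a round-tripping serialiser for the TestResponse shape above could look like (an illustration under json4s 3.x, not the exact code that solved it):

import org.json4s._

object TestResponseSerializer extends CustomSerializer[TestResponse](_ => (
  {
    // Pull out "name"; keep every other field as a raw JValue in the map.
    case JObject(fields) =>
      val name = fields.collectFirst { case ("name", JString(v)) => v }
        .getOrElse(throw new MappingException("missing field: name"))
      TestResponse(name, fields.collect { case (k, v) if k != "name" => k -> v }.toMap)
  },
  {
    // Write "name" first, then decompose whatever is held in otherFields.
    case r: TestResponse =>
      JObject(("name" -> JString(r.name)) :: r.otherFields.toList.map {
        case (k, v: JValue) => (k, v)
        case (k, v)         => (k, Extraction.decompose(v)(DefaultFormats))
      })
  }
))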

How do I transform JSON fields into a Seq in Scala Play Framework 2?

I have some JSON coming from an external API which I have no control over. Part of the JSON is formatted like this:
{
  "room_0": {
    "area_sq_ft": 151.2
  },
  "room_1": {
    "area_sq_ft": 200.0
  }
}
Instead of using an array like they should have, they've used room_n as the key for each of the n elements. Instead of creating a case class with room_0, room_1, room_2, etc., I want to convert this to a Seq[Room], where this is my Room case class:
case class Room(area: Double)
I am using Reads from play.api.libs.json for converting other parts of the JSON to case classes and would prefer to use Reads for this conversion. How could I accomplish that?
Here's what I've tried.
val sqFtReads = (__ \ "size_sq_ft").read[Double]
val roomReads = (__ \ "size_sq_ft").read[Seq[Room]](sqFtReads).map(Room)
cmd19.sc:1: overloaded method value read with alternatives:
(t: Seq[$sess.cmd17.Room])play.api.libs.json.Reads[Seq[$sess.cmd17.Room]] <and>
(implicit r: play.api.libs.json.Reads[Seq[$sess.cmd17.Room]])play.api.libs.json.Reads[Seq[$sess.cmd17.Room]]
cannot be applied to (play.api.libs.json.Reads[Double])
val roomReads = (__ \ "size_sq_ft").read[Seq[Room]](sqFtReads).map(Room)
A tricky little challenge but completely achievable with Reads.
First, Reads[Room] - i.e. the converter for a single Room instance:
val roomReads = new Reads[Room] {
  override def reads(json: JsValue): JsResult[Room] = {
    (json \ "area_sq_ft").validate[Double].map(Room(_))
  }
}
Pretty straightforward; we peek into the JSON and try to find a top-level field called area_sq_ft which validates as a Double. If it's all good, we return the populated Room instance as needed.
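A one-line sanity check of this converter, for illustration:

Json.parse("""{"area_sq_ft": 151.2}""").validate[Room](roomReads)
// yields a JsSuccess wrapping Room(151.2)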
Next up, the converter for your upstream object that in good Postel's Law fashion, you are cleaning up for your own consumers.
val strangeObjectReads = new Reads[Seq[Room]] {
  override def reads(json: JsValue): JsResult[Seq[Room]] = {
    json.validate[JsObject].map { jso =>
      val roomsSortedNumerically = jso.fields.sortBy { case (name, contents) =>
        val numericPartOfRoomName = name.dropWhile(!_.isDigit)
        numericPartOfRoomName.toInt
      }
      roomsSortedNumerically.map { case (name, contents) =>
        contents.as[Room](roomReads)
      }
    }
  }
}
The key thing here is the json.validate[JsObject] around the whole lot. By mapping over this we get the JsResult that we need to wrap the whole thing, plus, we can get access to the fields inside the JSON object, which is defined as a Seq[(String, JsValue)].
To ensure we put the fields in the correct order in the output sequence, we do a little bit of string manipulation, getting the numeric part of the room_1 string, and using that as the sortBy criteria. I'm being a bit naive here and assuming your upstream server won't do anything nasty like skip room numbers!
Once you've got the rooms sorted numerically, we can just map over them, converting each one with our roomReads converter.
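A quick usage sketch, assuming both Reads above are in scope:

val json = Json.parse("""{"room_0": {"area_sq_ft": 151.2}, "room_1": {"area_sq_ft": 200.0}}""")
json.validate[Seq[Room]](strangeObjectReads)
// yields a JsSuccess wrapping Seq(Room(151.2), Room(200.0))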
You've probably noticed that my custom Reads implementations are most definitely not one-liners. This comes from bitter experience dealing with oddball upstream JSON formats. Being a bit verbose, using a few more variables and breaking things up a bit pays off big time when that upstream server changes its JSON format suddenly!

Play Json Writes: Scala Seq to Json Object

In Scala, I have the following data structure (Item names are always unique within the same Container):
case class Container(content: Seq[Item])
case class Item(name: String, elements: Seq[String])
Example instance:
val container = Container(Seq(
  Item("A", Seq("A1", "A2")),
  Item("B", Seq("B1", "B2"))
))
What I want to do is to define a Writes[Container] that produces the following JSON:
{
  "A": ["A1", "A2"],
  "B": ["B1", "B2"]
}
I guess a possible solution could be to transform the Container(Seq[Item]) into a Map[String, Seq[String]], where each key corresponds to an item's name and the value to an item's elements, and let the API do the rest (there's probably an implicit Writes for maps; at least that is the case when reading JSON).
But: this approach creates a new Map for every Container with no other purpose than producing JSON. There are a lot of Container instances that need to be transformed to JSON, so I assume this approach is rather expensive. How else could I do this?
I don't think you should necessarily worry about the speed here (or at least verify that it is a problem before worrying about it), and converting to a map is probably the easiest option. An alternative, which may well not perform any better, is:
val customWrites: Writes[Container] = new Writes[Container] {
  override def writes(container: Container): JsValue = {
    val elems = container.content.map(elem =>
      elem.name -> Json.toJsFieldJsValueWrapper(elem.elements))
    Json.obj(elems: _*)
  }
}
(The explicit conversion to a JsValueWrapper - which is normally implicit - seems to be necessary in this context for reasons I don't entirely understand or have time to delve into. This answer has some details.)
One advantage of this method is that it will handle Item objects with duplicate names (which of course is legal JSON but would cause collisions with a map.)
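A quick usage sketch with the container instance from the question:

val json: JsValue = Json.toJson(container)(customWrites)
// renders as {"A":["A1","A2"],"B":["B1","B2"]}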

Parsing the JSON representation of database rows in Scala.js

I am trying out Scala.js and would like to use simple extracted row data in JSON format from my Postgres database.
Here is a contrived example of the kind of JSON I would like to parse into some strongly typed Scala collection. Features to note: there are n rows, and there are various column types, including an array, to cover likely real-life scenarios. Don't worry about the SQL, which creates an inline table to extract the JSON from; I've included it for completeness. It's the parsing of the JSON in Scala.js that is causing me problems.
with inline_t as (
  select * from (values ('One', 1, True, ARRAY[1], 1.0),
                        ('Six', 6, False, ARRAY[1,2,3,6], 2.4494),
                        ('Eight', 8, False, ARRAY[1,2,4,8], 2.8284))
    as v (as_str, as_int, is_odd, factors, sroot))
select json_agg(t) from inline_t t;
[{"as_str":"One","as_int":1,"is_odd":true,"factors":[1],"sroot":1.0},
{"as_str":"Six","as_int":6,"is_odd":false,"factors":[1,2,3,6],"sroot":2.4494},
{"as_str":"Eight","as_int":8,"is_odd":false,"factors":[1,2,4,8],"sroot":2.8284}]
I think this should be fairly easy using something like upickle or prickle, as hinted at here: How to parse a json string to a case class in scaja.js and vice versa. But I haven't been able to find a code example, and I'm not up to speed enough with Scala or Scala.js to work it out myself. I'd be very grateful if someone could post some working code showing how to achieve the above.
EDIT
This is the sort of thing I've tried, but I'm not getting very far
val jsparsed = scala.scalajs.js.JSON.parse(jsonStr3)
val jsp1 = jsparsed.selectDynamic("1")
val items = jsp1.map { (item: js.Dynamic) =>
  (item.as_str, item.as_int, item.is_odd, item.factors, item.sroot)
    .asInstanceOf[js.Array[(String, Int, Boolean, Array[Int], Double)]].toSeq
}
println(items._1)
So you are in a situation where you actually want to manipulate JSON values. Since you're not serializing/deserializing Scala values from end-to-end, serialization libraries like uPickle or Prickle will not be very helpful to you.
You could have a look at a cross-platform JSON manipulation library, such as circe. That would give you the advantage that you wouldn't have to "deal with JavaScript data structures" at all. Instead, the library would parse your JSON and expose it as a Scala data structure. This is probably the best option if you want your code to also cross-compile.
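For illustration, a minimal circe sketch, assuming circe-generic is on the classpath and reusing jsonStr3 from the question (DbRow is a hypothetical name; its fields match the keys emitted by json_agg above):

import io.circe.generic.auto._
import io.circe.parser.decode

case class DbRow(as_str: String, as_int: Int, is_odd: Boolean,
                 factors: List[Int], sroot: Double)

// Right(List(DbRow("One", 1, true, List(1), 1.0), ...)) on success
val parsed: Either[io.circe.Error, List[DbRow]] = decode[List[DbRow]](jsonStr3)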
If you're only writing Scala.js code, and you want a more lightweight version (no dependency), I recommend declaring types for your JSON "schema", then use those types to perform the conversion in a safer way:
import scala.scalajs.js
import scala.scalajs.js.annotation._

// type of {"as_str":"Six","as_int":6,"is_odd":false,"factors":[1,2,3,6],"sroot":2.4494}
@ScalaJSDefined
trait Row extends js.Object {
  val as_str: String
  val as_int: Int
  val is_odd: Boolean
  val factors: js.Array[Int]
  val sroot: Double
}

type Rows = js.Array[Row]
val rows = js.JSON.parse(jsonStr3).asInstanceOf[Rows]

val items = (for (row <- rows) yield {
  import row._
  (as_str, as_int, is_odd, factors.toArray, sroot)
}).toSeq
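A quick check of the result, for illustration:

items.foreach { case (s, n, odd, fs, r) =>
  println(s"$s: n=$n, odd=$odd, factors=${fs.mkString(",")}, sroot=$r")
}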