Extracting a string from JSON using json4s in Scala

I have a JSON body in the following form:
val body = """
{
  "a": "hello",
  "b": "goodbye"
}
"""
I want to extract the value of "a" (so I want "hello") and store it in a val.
I know I should use "parse" and "extract" (e.g. val parsedjson = parse(body).extract[String]), but I don't know how to use them to extract specifically the value of "a".

To use extract you need to create a class that matches the shape of the JSON that you are parsing. Here is an example using your input data:
val body = """
{
  "a": "hello",
  "b": "goodbye"
}
"""
import org.json4s._
import org.json4s.jackson.JsonMethods._

case class Body(a: String, b: String)

implicit val formats: Formats = DefaultFormats

val b = Extraction.extract[Body](parse(body))
println(b.a) // hello

You'd have to use pattern matching/extractors:
val aValues: List[String] = for {
  JObject(fields) <- parse(body)
  JField("a", JString(value)) <- fields
} yield value
Alternatively, use the querying DSL:
parse(body) \ "a" match {
  case JString(value) => Some(value)
  case _ => None
}
Both results are optional (a possibly empty List in the first case, an Option in the second) because you have no guarantee that arbitrary JSON will contain the field "a".
See the documentation.
extract would make sense if you were extracting a whole JObject into a case class.
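If you do want extract-style convenience for a single field, json4s also has extractOpt; a minimal sketch combining it with the querying DSL (same imports and formats as above):

import org.json4s._
import org.json4s.jackson.JsonMethods._

implicit val formats: Formats = DefaultFormats

// extractOpt returns None instead of throwing when "a" is missing
// or is not a string.
val a: Option[String] = (parse(body) \ "a").extractOpt[String]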

Related

Scala: Transform and replace values of Spark DataFrame with nested json structure

I have a nested JSON file that I am reading as a Spark DataFrame, and in which I want to replace certain values using a transformation of my own.
For now let's assume it looks as follows (which follows this):
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Convenience function for turning JSON strings into DataFrames.
def jsonToDataFrame(json: String, schema: StructType = null): DataFrame = {
  // SparkSessions are available with Spark 2.0+
  val reader = spark.read
  Option(schema).foreach(reader.schema)
  reader.json(sc.parallelize(Array(json)))
}
val df = jsonToDataFrame("""
{
  "A": {
    "B": "b",
    "C": "c",
    "D": { "E": "e" }
  }
}
""")

display(df)
df.printSchema()
Suppose the following transformation (turning lower case into upper case) shall be applied to certain values in the above Spark DataFrame:
import org.apache.spark.sql.functions.udf
val upper: String => String = _.toUpperCase
val upperUDF = udf(upper)
While this doesn't work at all:
df.withColumn("A.B", upperUDF('A.B)).show()
the following works:
val df1 = df.select("A.B")
df1.withColumn("B", upperUDF('B)).show()
But in the end I want to stick to my nested structure and just replace certain values according to my transformation.
How can one achieve that? How can one preserve the schema when using withColumn?
Finally I found this thread, which gives the answer to my question. The trick is to dynamically preserve the schema while transforming the columns. Using the mutate() function defined therein, the following works well for me:
val df2 = mutate(df, c => if (c.toString == "A.B") upperUDF(c) else c)
val df3 = mutate(df, c => if (c.toString == "A.D.E") upperUDF(c) else c)
display(df2)
df2.printSchema
display(df3)
df3.printSchema
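Note that mutate is not a Spark builtin; it is defined in the linked thread. A minimal sketch of such a schema-preserving helper, reconstructed from the idea there (the implementation in the thread may differ in details), recursively rebuilds every struct column and applies the given function to each leaf:

import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, struct}
import org.apache.spark.sql.types.StructType

// Rebuild nested structs field by field, applying fn to every leaf column,
// so the schema keeps its original shape.
def mutate(df: DataFrame, fn: Column => Column): DataFrame = {
  def traverse(schema: StructType, path: String): Seq[Column] =
    schema.fields.toSeq.map { f =>
      f.dataType match {
        case st: StructType => struct(traverse(st, path + f.name + "."): _*).as(f.name)
        case _              => fn(col(path + f.name)).as(f.name) // leaf: c.toString is e.g. "A.B"
      }
    }
  df.select(traverse(df.schema, ""): _*)
}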

Capturing unused fields while decoding a JSON object with circe

Suppose I have a case class like the following, and I want to decode a JSON object into it, with all of the fields that haven't been used ending up in a special member for the leftovers:
import io.circe.Json
case class Foo(a: Int, b: String, leftovers: Json)
What's the best way to do this in Scala with circe?
(Note: I've seen questions like this a few times, so I'm Q-and-A-ing it for posterity.)
There are a couple of ways you could go about this. One fairly straightforward way would be to filter out the keys you've used after decoding:
import io.circe.{ Decoder, Json, JsonObject }

implicit val decodeFoo: Decoder[Foo] =
  Decoder.forProduct2[Int, String, (Int, String)]("a", "b")((_, _)).product(
    Decoder[JsonObject]
  ).map {
    case ((a, b), all) =>
      Foo(a, b, Json.fromJsonObject(all.remove("a").remove("b")))
  }
Which works as you'd expect:
scala> val doc = """{ "something": false, "a": 1, "b": "abc", "0": 0 }"""
doc: String = { "something": false, "a": 1, "b": "abc", "0": 0 }
scala> io.circe.jawn.decode[Foo](doc)
res0: Either[io.circe.Error,Foo] =
Right(Foo(1,abc,{
"something" : false,
"0" : 0
}))
The disadvantage of this approach is that you have to maintain code to remove the keys you've used separately from their use, which can be error-prone. Another approach is to use circe's state-monad-powered decoding tools:
import cats.data.StateT
import cats.instances.either._
import io.circe.{ ACursor, Decoder, Json }

implicit val decodeFoo: Decoder[Foo] = Decoder.fromState(
  for {
    a <- Decoder.state.decodeField[Int]("a")
    b <- Decoder.state.decodeField[String]("b")
    rest <- StateT.inspectF((_: ACursor).as[Json])
  } yield Foo(a, b, rest)
)
Which works the same way as the previous decoder (apart from some small differences in the errors you'll get if decoding fails):
scala> io.circe.jawn.decode[Foo](doc)
res1: Either[io.circe.Error,Foo] =
Right(Foo(1,abc,{
"something" : false,
"0" : 0
}))
This latter approach doesn't require you to change the used fields in multiple places, and it also has the advantage of looking a little more like any other decoder you'd write manually in circe.
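Going back the other way is not shown above, but a sketch of an Encoder that folds the leftovers back in, so that Foo round-trips to the original object (assuming leftovers never collides with "a" or "b"), could look like this:

import io.circe.{ Encoder, Json }
import io.circe.syntax._

implicit val encodeFoo: Encoder[Foo] = Encoder.instance { foo =>
  // deepMerge adds the leftover fields alongside "a" and "b".
  Json.obj("a" -> foo.a.asJson, "b" -> foo.b.asJson).deepMerge(foo.leftovers)
}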

Extract a JSON from an array inside a JSON in Spark

I have a complicated JSON column whose structure is:

story: {
  cards: [ { story-elements: [ {...}, {...}, {...} ] } ]
}
The length of the story-elements is variable. I need to extract a particular JSON block from the story-elements array. For this, I first need to extract the story-elements.
Here is the code I have tried, but it gives an error:
import org.json4s.{DefaultFormats, MappingException}
import org.json4s.jackson.JsonMethods._
import org.apache.spark.sql.functions._

def getJsonContent(jsonstring: String): String = {
  implicit val formats = DefaultFormats
  val parsedJson = parse(jsonstring)
  val value1 = (parsedJson \ "cards" \ "story-elements").extract[String]
  value1
}

val getJsonContentUDF = udf((jsonstring: String) => getJsonContent(jsonstring))

input.withColumn("cards", getJsonContentUDF(input("storyDataFrame")))
According to the JSON you provided, story-elements is an array of JSON objects, but you are trying to extract the array as a string ((parsedJson \ "cards" \ "story-elements").extract[String]).
You can create a case class representing one story (like case class Story(description: String, pageUrl: String, ...)) and then, instead of extract[String], try extract[List[Story]] or extract[Array[Story]].
If you need just one piece of data from each story (e.g. the description), you can use the XPath-like syntax to get it and then extract a List[String].
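As a sketch of that advice (the StoryElement field is hypothetical; the real fields must match the objects inside story-elements):

import org.json4s._
import org.json4s.jackson.JsonMethods._

// Hypothetical shape of one story element.
case class StoryElement(description: Option[String])

implicit val formats: Formats = DefaultFormats

val parsedJson = parse(jsonstring)

// Each card carries its own story-elements array, so collect them card by card:
val elements: List[StoryElement] = for {
  JObject(card) <- parsedJson \ "story" \ "cards"
  JField("story-elements", JArray(items)) <- card
  item <- items
} yield item.extract[StoryElement]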

Rename JSON fields with circe

I want to have different names for the fields in my case classes and in my JSON, so I need a convenient way of renaming for both encoding and decoding.
Does anyone have a good solution?
You can use Custom key mappings via annotations. The most generic way is the JsonKey annotation from io.circe.generic.extras._. Example from the docs:
import io.circe.generic.extras._, io.circe.syntax._

implicit val config: Configuration = Configuration.default

@ConfiguredJsonCodec case class Bar(@JsonKey("my-int") i: Int, s: String)

Bar(13, "Qux").asJson
// res5: io.circe.Json = JObject(object[my-int -> 13,s -> "Qux"])
This requires the package circe-generic-extras.
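If the renames follow a convention rather than being one-off, the same Configuration mechanism can transform every member name at once; a sketch with a hypothetical User class:

import io.circe.generic.extras._, io.circe.syntax._

// Map camelCase Scala fields to snake_case JSON keys for the whole class:
implicit val config: Configuration = Configuration.default.withSnakeCaseMemberNames

@ConfiguredJsonCodec case class User(firstName: String, lastName: String)

User("Jane", "Doe").asJson
// {"first_name" : "Jane", "last_name" : "Doe"}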
Here's a code sample for a Decoder (a bit verbose, since it doesn't remove the old field):
val pimpedDecoder = deriveDecoder[PimpClass].prepare {
  _.withFocus {
    _.mapObject { x =>
      val value = x("old-field")
      value.map(x.add("new-field", _)).getOrElse(x)
    }
  }
}

implicit val decodeFieldType: Decoder[FieldType] =
  Decoder.forProduct5("nth", "isVLEncoded", "isSerialized", "isSigningField", "type")(FieldType.apply)
This is a simple way if you have lots of different field names.
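The matching Encoder can be declared the same way; a sketch, assuming FieldType's fields mirror the five keys:

implicit val encodeFieldType: Encoder[FieldType] =
  Encoder.forProduct5("nth", "isVLEncoded", "isSerialized", "isSigningField", "type") {
    (ft: FieldType) => (ft.nth, ft.isVLEncoded, ft.isSerialized, ft.isSigningField, ft.`type`)
  }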
https://circe.github.io/circe/codecs/custom-codecs.html
You can use the mapJson function on Encoder to derive an encoder from the generic one and remap your field name.
And you can use the prepare function on Decoder to transform the JSON passed to a generic Decoder.
You could also write both from scratch, but that may be a ton of boilerplate; these solutions should each be a handful of lines at most.
The following function can be used to rename a JSON field with circe:
import io.circe._

object CirceUtil {

  def renameField(json: Json, fieldToRename: String, newName: String): Json =
    (for {
      value   <- json.hcursor.downField(fieldToRename).focus
      newJson <- json.mapObject(_.add(newName, value)).hcursor.downField(fieldToRename).delete.top
    } yield newJson).getOrElse(json)
}
You can use it in an Encoder like so:
implicit val circeEncoder: Encoder[YourCaseClass] = deriveEncoder[YourCaseClass].mapJson(
  CirceUtil.renameField(_, "old_field_name", "new_field_name")
)
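The same helper also works for decoding through prepare, by renaming the incoming field back before the derived Decoder sees it (a sketch):

implicit val circeDecoder: Decoder[YourCaseClass] = deriveDecoder[YourCaseClass].prepare {
  // Rename the wire-format field to the case class field name first.
  _.withFocus(CirceUtil.renameField(_, "new_field_name", "old_field_name"))
}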
Extra
Unit tests
import io.circe.parser._
import org.specs2.mutable.Specification
class CirceUtilSpec extends Specification {

  "CirceUtil" should {
    "renameField" should {
      "correctly rename field" in {
        val json = parse("""{ "oldFieldName": 1 }""").toOption.get
        val resultJson = CirceUtil.renameField(json, "oldFieldName", "newFieldName")
        resultJson.hcursor.downField("oldFieldName").focus must beNone
        resultJson.hcursor.downField("newFieldName").focus must beSome
      }

      "return unchanged json if field is not found" in {
        val json = parse("""{ "oldFieldName": 1 }""").toOption.get
        val resultJson = CirceUtil.renameField(json, "nonExistentField", "newFieldName")
        resultJson must be equalTo json
      }
    }
  }
}

immutable Map (de)serialization to/from Play JSON

I have following (simplified) structure:
case class MyKey(key: String)
case class MyValue(value: String)
Let's assume that I have Play JSON formatters for both case classes.
As an example I have:
val myNewMessage = collection.immutable.Map(MyKey("key1") -> MyValue("value1"), MyKey("key2") -> MyValue("value2"))
As a result of the following transformation
play.api.libs.json.Json.toJson(myNewMessage)
I'm expecting something like:
{ "key1": "value1", "key2": "value2" }
I have tried writing the formatter, but somehow I cannot get it right:
implicit lazy val mapMyKeyMyValueFormat: Format[collection.immutable.Map[MyKey, MyValue]] =
  new Format[collection.immutable.Map[MyKey, MyValue]] {
    override def writes(obj: collection.immutable.Map[MyKey, MyValue]): JsValue =
      Json.toJson(obj.map {
        case (key, value) => Json.toJson(key) -> Json.toJson(value)
      })

    override def reads(json: JsValue): JsResult[collection.immutable.Map[MyKey, MyValue]] = ???
  }
I have no idea how to write a proper reads function. Is there a simpler way of doing this? I'm also not satisfied with my writes function.
Thanks!
The reason the writes method is not working is because you're transforming the Map[MyKey, MyValue] into a Map[JsValue, JsValue], but you can't serialize that to JSON. The JSON keys need to be strings, so you need some way of transforming MyKey to some unique String value. Otherwise you'd be trying to serialize something like this:
{"key": "keyName"} : {"value": "myValue"}
Which is not valid JSON.
If MyKey is as simple as stated in your question, this can work:
def writes(obj: Map[MyKey, MyValue]): JsValue = Json.toJson(obj.map {
  case (key, value) => key.key -> Json.toJson(value) // the key must be a String
})
Play will then know how to serialize a Map[String, MyValue], given the appropriate Writes[MyValue].
But I'm not certain that's what you want. Because it produces this:
scala> Json.toJson(myNewMessage)
res0: play.api.libs.json.JsValue = {"key1":{"value":"value1"},"key2":{"value":"value2"}}
If this is the output you want:
{ "key1": "value1", "key2": "value2" }
Then your Writes should look more like this:
def writes(obj: Map[MyKey, MyValue]): JsValue = {
  obj.foldLeft(JsObject(Nil)) { case (js, (key, value)) =>
    js ++ Json.obj(key.key -> value.value)
  }
}
Which produces this:
scala> writes(myNewMessage)
res5: play.api.libs.json.JsValue = {"key1":"value1","key2":"value2"}
Reads are easy so long as the structures of MyKey and MyValue stay this simple; otherwise I have no idea what you'd want it to do, since it depends heavily on the actual structure you want. As is, I would suggest leveraging the existing Reads[Map[String, String]] and transforming it to the type you want.
def reads(js: JsValue): JsResult[Map[MyKey, MyValue]] = {
  js.validate[Map[String, String]].map { kvMap =>
    kvMap.map { case (key, value) => MyKey(key) -> MyValue(value) }
  }
}
It's hard to say much more without knowing the actual structure of the data. In general I stay away from having to serialize and deserialize Maps.
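Putting the two halves together as a single Format, a sketch assuming MyKey and MyValue exactly as defined in the question:

import play.api.libs.json._

implicit val myMapFormat: Format[Map[MyKey, MyValue]] = new Format[Map[MyKey, MyValue]] {
  def writes(obj: Map[MyKey, MyValue]): JsValue =
    obj.foldLeft(JsObject(Nil)) { case (js, (key, value)) =>
      js ++ Json.obj(key.key -> value.value)
    }

  // Reuse Play's built-in Reads[Map[String, String]] and wrap the keys and values.
  def reads(js: JsValue): JsResult[Map[MyKey, MyValue]] =
    js.validate[Map[String, String]].map {
      _.map { case (key, value) => MyKey(key) -> MyValue(value) }
    }
}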