Suppose I have some JSON data like this:
{
"data": {
"title": "example input",
"someBoolean": false,
"innerData": {
"innerString": "input inner string",
"innerBoolean": true,
"innerCollection": [1,2,3,4,5]
},
"collection": [6,7,8,9,0]
}
}
And I want to flatten it a bit and transform or remove some fields, to get the following result:
{
"data": {
"ttl": "example input",
"bool": false,
"collection": [6,7,8,9,0],
"innerCollection": [1,2,3,4,5]
}
}
How can I do this with Circe?
(Note that I'm asking this as a FAQ since similar questions often come up in the Circe Gitter channel. This specific example is from a question asked there yesterday.)
I've sometimes said that Circe is primarily a library for encoding and decoding JSON, not for transforming JSON values, and in general I'd recommend mapping to Scala types and then defining relationships between those (as Andriy Plokhotnyuk suggests here), but for many cases writing transformations with cursors works just fine, and in my view this kind of thing is one of them.
Here's how I'd implement this transformation:
import io.circe.{DecodingFailure, Json, JsonObject}
import io.circe.syntax._
def transform(in: Json): Either[DecodingFailure, Json] = {
val someBoolean = in.hcursor.downField("data").downField("someBoolean")
val innerData = someBoolean.delete.downField("innerData")
for {
boolean <- someBoolean.as[Json]
collection <- innerData.get[Json]("innerCollection")
obj <- innerData.delete.up.as[JsonObject]
} yield Json.fromJsonObject(
obj.add("boolean", boolean).add("collection", collection)
)
}
And then:
val Right(json) = io.circe.jawn.parse(
"""{
"data": {
"title": "example input",
"someBoolean": false,
"innerData": {
"innerString": "input inner string",
"innerBoolean": true,
"innerCollection": [1,2,3]
},
"collection": [6,7,8]
}
}"""
)
And:
scala> transform(json)
res1: Either[io.circe.DecodingFailure,io.circe.Json] =
Right({
"data" : {
"title" : "example input",
"collection" : [
6,
7,
8
]
},
"boolean" : false,
"collection" : [
1,
2,
3
]
})
If you look at it the right way, our transform method kind of resembles a decoder, and we can actually write it as one (although I'd definitely recommend not making it implicit):
import cats.implicits._ // provides mapN on tuples of Either values
import io.circe.{Decoder, Json, JsonObject}
import io.circe.syntax._
val transformData: Decoder[Json] = { c =>
val someBoolean = c.downField("data").downField("someBoolean")
val innerData = someBoolean.delete.downField("innerData")
(
innerData.delete.up.as[JsonObject],
someBoolean.as[Json],
innerData.get[Json]("innerCollection")
).mapN(_.add("boolean", _).add("collection", _)).map(Json.fromJsonObject)
}
This can be convenient in some situations where you want to perform the transformation as part of a pipeline that expects a decoder:
scala> io.circe.jawn.decode(myJsonString)(transformData)
res2: Either[io.circe.Error,io.circe.Json] =
Right({
"data" : {
"title" : "example input",
"collection" : [ ...
This is also potentially confusing, though, and I've thought about adding some kind of Transformation type to Circe that would encapsulate transformations like this without questionably repurposing the Decoder type class.
One nice thing about both the transform method and this decoder is that if the input data doesn't have the expected shape, the resulting error will include a history that points to the problem.
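For completeness, the exact shape asked for in the question (with title renamed to ttl and someBoolean renamed to bool) can also be built by reading each field through a cursor and assembling a new object. This is only a sketch: the name reshape is made up for illustration, the sample inputs are shortened, and it assumes the circe-parser module is on the classpath.

```scala
import io.circe.{DecodingFailure, Error, Json}
import io.circe.parser.parse

// Hypothetical helper: reads the four fields and builds the flattened object.
def reshape(in: Json): Either[DecodingFailure, Json] = {
  val data = in.hcursor.downField("data")
  for {
    title      <- data.get[Json]("title")
    bool       <- data.get[Json]("someBoolean")
    collection <- data.get[Json]("collection")
    inner      <- data.downField("innerData").get[Json]("innerCollection")
  } yield Json.obj(
    "data" -> Json.obj(
      "ttl"             -> title,
      "bool"            -> bool,
      "collection"      -> collection,
      "innerCollection" -> inner
    )
  )
}

val reshaped: Either[Error, Json] = parse(
  """{"data": {"title": "example input", "someBoolean": false, "innerData": {"innerCollection": [1, 2, 3]}, "collection": [6, 7, 8]}}"""
).flatMap(json => reshape(json))

// Input without "innerData": the failed cursor yields a Left rather than throwing.
val failed: Either[Error, Json] = parse(
  """{"data": {"title": "t", "someBoolean": true, "collection": []}}"""
).flatMap(json => reshape(json))
```

Building the result with Json.obj trades the delete-and-move cursor steps for explicit field reads, which may be easier to follow when most keys are renamed anyway.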
Related
I have been given a json string that looks like the following one:
{
"dataflows": [
{
"name": "test",
"sources": [
{
"name": "person_inputs",
"path": "/data/input/events/person/*",
"format": "JSON"
}
],
"transformations": [
{
"name": "validation",
"type": "validate_fields",
"params": {
"input": "person_inputs",
"validations": [
{
"field": "office",
"validations": [
"notEmpty"
]
},
{
"field": "age",
"validations": [
"notNull"
]
}
]
}
},
{
"name": "ok_with_date",
"type": "add_fields",
"params": {
"input": "validation_ok",
"addFields": [
{
"name": "dt",
"function": "current_timestamp"
}
]
}
}
],
"sinks": [
{
"input": "ok_with_date",
"name": "raw-ok",
"paths": [
"/data/output/events/person"
],
"format": "JSON",
"saveMode": "OVERWRITE"
},
{
"input": "validation_ko",
"name": "raw-ko",
"paths": [
"/data/output/discards/person"
],
"format": "JSON",
"saveMode": "OVERWRITE"
}
]
}
]
}
And I have been asked to use it as some kind of recipe for an ETL pipeline, i.e., the data must be extracted from the "path" specified in the "sources" key, the transformations to be carried out are specified within the "transformations" key and, finally, the transformed data must be saved to one of the two sinks specified under the "sinks" key.
I have decided to convert the JSON string into a Scala map, as follows:
def jsonStrToMap(jsonStr: String): Map[String, Any] = {
  implicit val formats = org.json4s.DefaultFormats
  parse(jsonStr).extract[Map[String, Any]]
}
val json = Source.fromFile("path/to/json")
// parse
val parsedJson = jsonStrToMap(json.mkString)
so, with that, I get a structure like this one:
which is a map whose first value is a list of maps. I can evaluate parsedJson("dataflows") to get:
which is a list, as expected, but then I cannot traverse that list, even though I need to in order to get to the sources, transformations and sinks. I have tried using the index of the list to, for example, get its first element, like this: parsedJson("dataflows")(0), but to no avail.
Can anyone please help me traverse this structure? Any help would be much appreciated.
Cheers,
When you evaluate parsedJson("dataflows"), a Tuple2 is returned, i.e. a tuple with two elements that are accessed with ._1 and ._2.
So for dataflows(1)._1 the value returned is "sources", and dataflows(1)._2 is a list of maps (List[Map[K, V]]), which can be traversed like any other List whose elements are Maps.
Let's deconstruct this for example:
val dataFlowsZero = ("sources", List(Map(42 -> "foo"), Map(42 -> "bar")))
The first element in the Tuple
scala> dataFlowsZero._1
String = sources
The second element in the Tuple
scala> dataFlowsZero._2
List[Map[Int, String]] = List(Map(42 -> foo), Map(42 -> bar))
Map the keys in each Map in List to a new List
scala> dataFlowsZero._2.map(m => m.keys)
List[Iterable[Int]] = List(Set(42), Set(42))
Map the values in each Map in the List to a new List
scala> dataFlowsZero._2.map(m => m.values)
List[Iterable[String]] = List(Iterable(foo), Iterable(bar))
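Because extract[Map[String, Any]] gives every value the static type Any, an expression such as parsedJson("dataflows")(0) does not compile until the value has been narrowed to a List. The following is a minimal sketch of doing that with pattern matching. It uses a hand-built map as a stand-in for the parsed result, and the helpers asList and asMap are hypothetical names, not part of json4s.

```scala
// Stand-in for the result of parse(jsonStr).extract[Map[String, Any]].
val parsedJson: Map[String, Any] = Map(
  "dataflows" -> List(
    Map(
      "name" -> "test",
      "sources" -> List(
        Map("name" -> "person_inputs", "path" -> "/data/input/events/person/*", "format" -> "JSON")
      )
    )
  )
)

// Hypothetical helpers that narrow an Any to the collection we expect,
// falling back to an empty collection when the shape is different.
def asList(value: Any): List[Any] = value match {
  case xs: List[_] => xs
  case _           => Nil
}

def asMap(value: Any): Map[String, Any] = value match {
  case m: Map[_, _] => m.asInstanceOf[Map[String, Any]]
  case _            => Map.empty
}

// Walk dataflows -> sources -> path, keeping only string paths.
val sourcePaths: List[String] = for {
  flow   <- asList(parsedJson.getOrElse("dataflows", Nil))
  source <- asList(asMap(flow).getOrElse("sources", Nil))
  path   <- asMap(source).get("path").collect { case s: String => s }.toList
} yield path
```

The same pattern extends to "transformations" and "sinks", although every extra level adds another round of casting.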
The best solution is to convert the JSON to the full data structure that you have been provided rather than just Map[String, Any]. This makes it trivial to pick out the data that you want. For example,
case class DataFlows(dataflows: List[DataFlow])
case class DataFlow(name: String, sources: List[Source], transformations: List[Transformation], sinks: List[Sink])
case class Source(name: String, path: String, format: String)
// "params" is a single object in the JSON, and its contents vary by transformation type
case class Transformation(name: String, `type`: String, params: Params)
case class Params(input: String, validations: Option[List[Validation]], addFields: Option[List[AddField]])
case class Validation(field: String, validations: List[String])
case class AddField(name: String, function: String)
case class Sink(input: String, name: String, paths: List[String], format: String, saveMode: String)
implicit val formats: org.json4s.Formats = org.json4s.DefaultFormats
val dataFlows = parse(jsonStr).extract[DataFlows]
The idea is to make the JSON handler do most of the work to create a type-safe version of the original data.
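Once the data is in case classes, traversal is ordinary field access. The sketch below illustrates this with a hand-built value instead of a real extract call. The class names SourceSpec, SinkSpec and Flow are made up here (partly to avoid clashing with scala.io.Source), and only the fields needed for the example are modelled.

```scala
// Trimmed-down, hypothetical versions of the case classes, for illustration only.
final case class SourceSpec(name: String, path: String, format: String)
final case class SinkSpec(input: String, name: String, paths: List[String], format: String, saveMode: String)
final case class Flow(name: String, sources: List[SourceSpec], sinks: List[SinkSpec])

// Stand-in for what the extraction would produce for the JSON in the question.
val flows: List[Flow] = List(
  Flow(
    name = "test",
    sources = List(SourceSpec("person_inputs", "/data/input/events/person/*", "JSON")),
    sinks = List(
      SinkSpec("ok_with_date", "raw-ok", List("/data/output/events/person"), "JSON", "OVERWRITE"),
      SinkSpec("validation_ko", "raw-ko", List("/data/output/discards/person"), "JSON", "OVERWRITE")
    )
  )
)

// Every input path across all dataflows.
val inputPaths: List[String] = flows.flatMap(_.sources.map(_.path))

// Output paths keyed by sink name.
val outputPathsBySink: Map[String, List[String]] =
  flows.flatMap(_.sinks).map(sink => sink.name -> sink.paths).toMap
```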
I'm getting a JSON object over the network, as a String. I'm then using Circe to parse it. I want to add a handful of fields to it, and then pass it on downstream.
Almost all of that works.
The problem is that my "adding" is really "overwriting". That's actually ok, as long as I add an empty object first. How can I add such an empty object?
So looking at the code below, I am overwriting "sometimes_empty:{}" and it works. But because sometimes_empty is not always empty, it results in some data loss. I'd like to add a field like "custom:{}" and then overwrite the value of custom with my existing code.
Two StackOverflow posts were helpful. One worked, but wasn't quite what I was looking for. The other I couldn't get to work.
1: Modifying a JSON array in Scala with circe
2: Adding field to a json using Circe
val js: String = """
{
"id": "19",
"type": "Party",
"field": {
"id": 1482,
"name": "Anne Party",
"url": "https"
},
"sometimes_empty": {
},
"bool": true,
"timestamp": "2018-12-18T11:39:18Z"
}
"""
val newJson = parse(js).toOption
.flatMap { doc =>
doc.hcursor
.downField("sometimes_empty")
.withFocus(_ =>
Json.fromFields(
Seq(
("myUrl", Json.fromString(myUrl)),
("valueZ", Json.fromString(valueZ)),
("valueQ", Json.fromString(valueQ)),
("balloons", Json.fromString(balloons))
)
)
)
.top
}
newJson match {
case Some(v) => return v.toString
case None => println("Failure!")
}
We need to do a couple of things. First, we zoom in on the specific property we want to update; if it doesn't exist, we create a new empty one. Then we turn the zoomed-in property from a Json into a JsonObject so that we can modify it using the +: method. Once we've done that, we take the updated property and put it back into the original parsed JSON to get the complete result:
import io.circe.{Json, JsonObject, parser}
import io.circe.syntax._
object JsonTest {
def main(args: Array[String]): Unit = {
val js: String =
"""
|{
| "id": "19",
| "type": "Party",
| "field": {
| "id": 1482,
| "name": "Anne Party",
| "url": "https"
| },
| "bool": true,
| "timestamp": "2018-12-18T11:39:18Z"
|}
""".stripMargin
val maybeAppendedJson =
for {
json <- parser.parse(js).toOption
sometimesEmpty <- json.hcursor
.downField("sometimes_empty")
.focus
.orElse(Option(Json.fromJsonObject(JsonObject.empty)))
jsonObject <- json.asObject
emptyFieldJson <- sometimesEmpty.asObject
appendedField = emptyFieldJson.+:("added", Json.fromBoolean(true))
res = jsonObject.+:("sometimes_empty", appendedField.asJson)
} yield res
maybeAppendedJson.foreach(obj => println(obj.asJson.spaces2))
}
}
For an input whose sometimes_empty object already contains "someProperty": true, this yields:
{
"id" : "19",
"type" : "Party",
"field" : {
"id" : 1482,
"name" : "Anne Party",
"url" : "https"
},
"sometimes_empty" : {
"added" : true,
"someProperty" : true
},
"bool" : true,
"timestamp" : "2018-12-18T11:39:18Z"
}
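If the goal is simply to add a separate "custom" object and leave sometimes_empty untouched, as the question suggests, a shorter route may be Json#mapObject together with JsonObject#add. This is a sketch under assumptions: the helper name addCustom is hypothetical, the field names myUrl and balloons come from the question, and their values here are placeholders.

```scala
import io.circe.{Json, ParsingFailure}
import io.circe.parser.parse

// Placeholder values standing in for the question's myUrl and balloons variables.
val custom: Json = Json.obj(
  "myUrl"    -> Json.fromString("https://example.invalid"),
  "balloons" -> Json.fromString("99")
)

// Hypothetical helper: adds (or replaces) the top-level "custom" field.
// mapObject leaves the value unchanged if it is not a JSON object.
def addCustom(doc: Json): Json = doc.mapObject(_.add("custom", custom))

val updated: Either[ParsingFailure, Json] =
  parse("""{"id": "19", "sometimes_empty": {"someProperty": true}}""").map(doc => addCustom(doc))
```

Since nothing is written into sometimes_empty, whatever it already holds is passed downstream unchanged.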
Suppose I want to decode some values from a JSON array into a case class with circe. The following works just fine:
scala> import io.circe.generic.auto._, io.circe.jawn.decode
import io.circe.generic.auto._
import io.circe.jawn.decode
scala> case class Foo(name: String)
defined class Foo
scala> val goodDoc = """[{ "name": "abc" }, { "name": "xyz" }]"""
goodDoc: String = [{ "name": "abc" }, { "name": "xyz" }]
scala> decode[List[Foo]](goodDoc)
res0: Either[io.circe.Error,List[Foo]] = Right(List(Foo(abc), Foo(xyz)))
It's sometimes the case that the JSON array I'm decoding contains other, non-Foo-shaped stuff, though, which results in a decoding error:
scala> val badDoc =
| """[{ "name": "abc" }, { "id": 1 }, true, "garbage", { "name": "xyz" }]"""
badDoc: String = [{ "name": "abc" }, { "id": 1 }, true, "garbage", { "name": "xyz" }]
scala> decode[List[Foo]](badDoc)
res1: Either[io.circe.Error,List[Foo]] = Left(DecodingFailure(Attempt to decode value on failed cursor, List(DownField(name), MoveRight, DownArray)))
How can I write a decoder that ignores anything in the array that can't be decoded into my case class?
The most straightforward way to solve this problem is to use a decoder that first tries to decode each value as a Foo, and then falls back to the identity decoder if the Foo decoder fails. The new either method in circe 0.9 makes the generic version of this practically a one-liner:
import io.circe.{ Decoder, Json }
def decodeListTolerantly[A: Decoder]: Decoder[List[A]] =
Decoder.decodeList(Decoder[A].either(Decoder[Json])).map(
_.flatMap(_.left.toOption)
)
It works like this:
scala> val myTolerantFooDecoder = decodeListTolerantly[Foo]
myTolerantFooDecoder: io.circe.Decoder[List[Foo]] = io.circe.Decoder$$anon$21@2b48626b
scala> decode(badDoc)(myTolerantFooDecoder)
res2: Either[io.circe.Error,List[Foo]] = Right(List(Foo(abc), Foo(xyz)))
To break down the steps:
Decoder.decodeList says "define a list decoder that tries to use the given decoder to decode each JSON array value".
Decoder[A].either(Decoder[Json]) says "first try to decode the value as an A, and if that fails decode it as a Json value (which will always succeed), and return the result (if any) as an Either[A, Json]".
.map(_.flatMap(_.left.toOption)) says "take the resulting list of Either[A, Json] values and remove all the Rights".
…which does what we want in a fairly concise, compositional way. At some point we might want to bundle this up into a utility method in circe itself, but for now writing out this explicit version isn't too bad.
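A variation on the same idea, offered only as a sketch rather than as the recommended approach: decode the array into raw Json values first, then keep the elements that decode as A. The name decodeListSkippingFailures is made up for this example, and the Foo decoder is written by hand with Decoder.forProduct1 so that the snippet does not depend on circe-generic.

```scala
import io.circe.{Decoder, Error, Json}
import io.circe.parser.decode

final case class Foo(name: String)

// Hand-written decoder so the example does not need generic derivation.
implicit val fooDecoder: Decoder[Foo] = Decoder.forProduct1("name")(Foo.apply)

// Hypothetical helper: decode every element as Json (which always succeeds for
// array elements), then drop the ones that cannot be decoded as an A.
def decodeListSkippingFailures[A: Decoder]: Decoder[List[A]] =
  Decoder[List[Json]].map(_.flatMap(_.as[A].toOption))

val badDoc = """[{ "name": "abc" }, { "id": 1 }, true, "garbage", { "name": "xyz" }]"""

val tolerant: Either[Error, List[Foo]] = decode(badDoc)(decodeListSkippingFailures[Foo])
```

Both versions silently discard elements that fail to decode, so neither is a good fit when malformed entries need to be reported.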
I have the following JSON file to be parsed into a case class:
{
"root": {
"nodes": [{
"id": "1",
"attributes": {
"name": "Node 1",
"size": "3"
}
},
{
"id": "2",
"attributes": {
"value": "4",
"name": "Node 2"
}
}
]
}
}
The problem is that the attributes could have any value inside it: name, size, value, anything ...
At this moment I have defined my case classes:
case class Attributes(
name: String,
size: String,
value: String
)
case class Nodes(
id: String,
attributes: Attributes
)
case class Root(
nodes: List[Nodes]
)
case class R00tJsonObject(
root: Root
)
What is the best way to deal with this scenario when I can receive any attribute?
Currently I am using Json4s to handle JSON files.
Thanks!
Your attributes are arbitrarily many and differently named, but it seems you can store them in a Map[String, String] (at least, if those examples are anything to go by). In this case, using circe-parser (https://circe.github.io/circe/parsing.html), you could simply use code along these lines in order to convert your JSON directly into a simple case-class:
import io.circe._, io.circe.parser._
import io.circe.generic.semiauto._
case class Node(id: String, attributes: Map[String,String])
case class Root(nodes: List[Node])
implicit val nodeDecoder: Decoder[Node] = deriveDecoder[Node]
implicit val nodeEncoder: Encoder[Node] = deriveEncoder[Node]
implicit val rootDecoder: Decoder[Root] = deriveDecoder[Root]
implicit val rootEncoder: Encoder[Root] = deriveEncoder[Root]
def myParse(jsonString: String) = {
val res = parse(jsonString) match {
case Right(json) => {
val cursor = json.hcursor
cursor.get[Root]("root")
}
case _ => Left("Wrong JSON!")
}
println(res)
}
This snippet will print
Right(Root(List(Node(1,Map(name -> Node 1, size -> 3)), Node(2,Map(value -> 4, name -> Node 2)))))
on the console for the JSON you've given (assuming the solution doesn't have to be in Json4s).
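If some attribute values turn out not to be strings (numbers, booleans, nested objects), a Map[String, String] field will fail to decode. One possible adjustment, sketched here under that assumption, is to keep the values as raw Json. The decoder is written by hand with Decoder.forProduct2, and the sample document is a shortened, slightly altered version of the one in the question, with size as a number.

```scala
import io.circe.{Decoder, Error, Json}
import io.circe.parser.parse

// Attribute values stay as Json, so any JSON type is accepted.
final case class Node(id: String, attributes: Map[String, Json])

implicit val nodeDecoder: Decoder[Node] =
  Decoder.forProduct2("id", "attributes")(Node.apply)

val doc =
  """{"root": {"nodes": [{"id": "1", "attributes": {"name": "Node 1", "size": 3}}]}}"""

val nodes: Either[Error, List[Node]] =
  parse(doc).flatMap(_.hcursor.downField("root").get[List[Node]]("nodes"))

// Individual attributes can then be read with the type each one is expected to have.
val firstSize: Option[Int] =
  nodes.toOption.flatMap(_.headOption).flatMap(_.attributes.get("size")).flatMap(_.as[Int].toOption)
```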
I have an RDD of type RDD[(String, List[String])].
Example:
(FRUIT, List(Apple,Banana,Mango))
(VEGETABLE, List(Potato,Tomato))
I want to convert the above output to json object like below.
{
"categories": [
{
"name": "FRUIT",
"nodes": [
{
"name": "Apple",
"isInTopList": false
},
{
"name": "Banana",
"isInTopList": false
},
{
"name": "Mango",
"isInTopList": false
}
]
},
{
"name": "VEGETABLE",
"nodes": [
{
"name": "POTATO",
"isInTopList": false
},
{
"name": "TOMATO",
"isInTopList": false
}
]
}
]
}
Please suggest the best possible way to do it.
NOTE: "isInTopList": false is always constant and has to be there with every item in the jsonobject.
First I used the following code to reproduce the scenario that you mentioned:
val sampleArray = Array(
("FRUIT", List("Apple", "Banana", "Mango")),
("VEGETABLE", List("Potato", "Tomato")))
val sampleRdd = sc.parallelize(sampleArray)
sampleRdd.foreach(println) // Printing the result
Now, I am using the json4s Scala library to convert this RDD into the JSON structure that you requested:
import org.json4s.native.JsonMethods._
import org.json4s.JsonDSL.WithDouble._
val json = "categories" -> sampleRdd.collect().toList.map{
case (name, nodes) =>
("name", name) ~
("nodes", nodes.map{
name => ("name", name) ~ ("isInTopList", false) // constant flag required on every node
})
}
println(compact(render(json))) // Printing the rendered JSON
The result is:
{"categories":[{"name":"FRUIT","nodes":[{"name":"Apple","isInTopList":false},{"name":"Banana","isInTopList":false},{"name":"Mango","isInTopList":false}]},{"name":"VEGETABLE","nodes":[{"name":"Potato","isInTopList":false},{"name":"Tomato","isInTopList":false}]}]}
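For comparison, here is roughly how the requested structure, including the constant "isInTopList": false on every node, could be assembled with circe instead of json4s. This is a sketch: the list named collected stands in for sampleRdd.collect().toList, so no Spark context is needed to run it, and the helper node is a made-up name.

```scala
import io.circe.Json

// Stand-in for sampleRdd.collect().toList.
val collected: List[(String, List[String])] = List(
  "FRUIT"     -> List("Apple", "Banana", "Mango"),
  "VEGETABLE" -> List("Potato", "Tomato")
)

// Hypothetical helper: one node object with the constant flag attached.
def node(name: String): Json =
  Json.obj("name" -> Json.fromString(name), "isInTopList" -> Json.False)

val categories: Json = Json.obj(
  "categories" -> Json.fromValues(
    collected.map { case (name, items) =>
      Json.obj(
        "name"  -> Json.fromString(name),
        "nodes" -> Json.fromValues(items.map(node))
      )
    }
  )
)

println(categories.spaces2)
```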
Since you want a single JSON document for your entire RDD, I would start by calling collect on the RDD. Be careful that your data set fits in memory, as this will move the data back to the driver.
To get the json, just use a library to traverse your objects. I like Json4s due to its simple internal structure and practical, clean operators. Here is a sample from their website that shows how to traverse nested structures (in particular, lists):
object JsonExample extends App {
import org.json4s._
import org.json4s.JsonDSL._
import org.json4s.jackson.JsonMethods._
case class Winner(id: Long, numbers: List[Int])
case class Lotto(id: Long, winningNumbers: List[Int], winners: List[Winner], drawDate: Option[java.util.Date])
val winners = List(Winner(23, List(2, 45, 34, 23, 3, 5)), Winner(54, List(52, 3, 12, 11, 18, 22)))
val lotto = Lotto(5, List(2, 45, 34, 23, 7, 5, 3), winners, None)
val json =
("lotto" ->
("lotto-id" -> lotto.id) ~
("winning-numbers" -> lotto.winningNumbers) ~
("draw-date" -> lotto.drawDate.map(_.toString)) ~
("winners" ->
lotto.winners.map { w =>
(("winner-id" -> w.id) ~
("numbers" -> w.numbers))}))
println(compact(render(json)))
}