I have an RDD of type RDD[(String, List[String])].
Example:
(FRUIT, List(Apple,Banana,Mango))
(VEGETABLE, List(Potato,Tomato))
I want to convert the above output to json object like below.
{
"categories": [
{
"name": "FRUIT",
"nodes": [
{
"name": "Apple",
"isInTopList": false
},
{
"name": "Banana",
"isInTopList": false
},
{
"name": "Mango",
"isInTopList": false
}
]
},
{
"name": "VEGETABLE",
"nodes": [
{
"name": "POTATO",
"isInTopList": false
},
{
"name": "TOMATO",
"isInTopList": false
},
]
}
]
}
Please suggest the best possible way to do it.
NOTE: "isInTopList": false is always constant and has to be there with every item in the jsonobject.
First I used the following code to reproduce the scenario that you mentioned:
val sampleArray = Array(
("FRUIT", List("Apple", "Banana", "Mango")),
("VEGETABLE", List("Potato", "Tomato")))
val sampleRdd = sc.parallelize(sampleArray)
sampleRdd.foreach(println) // Printing the result
Now, I am using json4s Scala library to convert this RDD into the JSON structure that you requested:
import org.json4s.native.JsonMethods._
import org.json4s.JsonDSL.WithDouble._
val json = "categories" -> sampleRdd.collect().toList.map{
case (name, nodes) =>
("name", name) ~
("nodes", nodes.map{
name => ("name", name)
})
}
println(compact(render(json))) // Printing the rendered JSON
The result is:
{"categories":[{"name":"FRUIT","nodes":[{"name":"Apple"},{"name":"Banana"},{"name":"Mango"}]},{"name":"VEGETABLE","nodes":[{"name":"Potato"},{"name":"Tomato"}]}]}
Since you want a single JSON for you entire RDD, I would start by doing Rdd.collect. Be careful that your set fits in memory, as this will move the data back to the driver.
To get the json, just use a library to traverse your objects. I like Json4s due to its simple internal structure and practical, clean operators. Here is a sample from their website that shows how to traverse nested structures (in particular, lists):
object JsonExample extends App {
import org.json4s._
import org.json4s.JsonDSL._
import org.json4s.jackson.JsonMethods._
case class Winner(id: Long, numbers: List[Int])
case class Lotto(id: Long, winningNumbers: List[Int], winners: List[Winner], drawDate: Option[java.util.Date])
val winners = List(Winner(23, List(2, 45, 34, 23, 3, 5)), Winner(54, List(52, 3, 12, 11, 18, 22)))
val lotto = Lotto(5, List(2, 45, 34, 23, 7, 5, 3), winners, None)
val json =
("lotto" ->
("lotto-id" -> lotto.id) ~
("winning-numbers" -> lotto.winningNumbers) ~
("draw-date" -> lotto.drawDate.map(_.toString)) ~
("winners" ->
lotto.winners.map { w =>
(("winner-id" -> w.id) ~
("numbers" -> w.numbers))}))
println(compact(render(json)))
}
Related
Suppose I have some JSON data like this:
{
"data": {
"title": "example input",
"someBoolean": false,
"innerData": {
"innerString": "input inner string",
"innerBoolean": true,
"innerCollection": [1,2,3,4,5]
},
"collection": [6,7,8,9,0]
}
}
And I want to flatten it a bit and transform or remove some fields, to get the following result:
{
"data": {
"ttl": "example input",
"bool": false,
"collection": [6,7,8,9,0],
"innerCollection": [1,2,3,4,5]
}
}
How can I do this with Circe?
(Note that I'm asking this as a FAQ since similar questions often come up in the Circe Gitter channel. This specific example is from a question asked there yesterday.)
I've sometimes said that Circe is primarily a library for encoding and decoding JSON, not for transforming JSON values, and in general I'd recommend mapping to Scala types and then defining relationships between those (as Andriy Plokhotnyuk suggests here), but for many cases writing transformations with cursors works just fine, and in my view this kind of thing is one of them.
Here's how I'd implement this transformation:
import io.circe.{DecodingFailure, Json, JsonObject}
import io.circe.syntax._
def transform(in: Json): Either[DecodingFailure, Json] = {
val someBoolean = in.hcursor.downField("data").downField("someBoolean")
val innerData = someBoolean.delete.downField("innerData")
for {
boolean <- someBoolean.as[Json]
collection <- innerData.get[Json]("innerCollection")
obj <- innerData.delete.up.as[JsonObject]
} yield Json.fromJsonObject(
obj.add("boolean", boolean).add("collection", collection)
)
}
And then:
val Right(json) = io.circe.jawn.parse(
"""{
"data": {
"title": "example input",
"someBoolean": false,
"innerData": {
"innerString": "input inner string",
"innerBoolean": true,
"innerCollection": [1,2,3]
},
"collection": [6,7,8]
}
}"""
)
And:
scala> transform(json)
res1: Either[io.circe.DecodingFailure,io.circe.Json] =
Right({
"data" : {
"title" : "example input",
"collection" : [
6,
7,
8
]
},
"boolean" : false,
"collection" : [
1,
2,
3
]
})
If you look at it the right way, our transform method kind of resembles a decoder, and we can actually write it as one (although I'd definitely recommend not making it implicit):
import io.circe.{Decoder, Json, JsonObject}
import io.circe.syntax._
val transformData: Decoder[Json] = { c =>
val someBoolean = c.downField("data").downField("someBoolean")
val innerData = someBoolean.delete.downField("innerData")
(
innerData.delete.up.as[JsonObject],
someBoolean.as[Json],
innerData.get[Json]("innerCollection")
).mapN(_.add("boolean", _).add("collection", _)).map(Json.fromJsonObject)
}
This can be convenient in some situations where you want to perform the transformation as part of a pipeline that expects a decoder:
scala> io.circe.jawn.decode(myJsonString)(transformData)
res2: Either[io.circe.Error,io.circe.Json] =
Right({
"data" : {
"title" : "example input",
"collection" : [ ...
This is also potentially confusing, though, and I've thought about adding some kind of Transformation type to Circe that would encapsulate transformations like this without questionably repurposing the Decoder type class.
One nice thing about both the transform method and this decoder is that if the input data doesn't have the expected shape, the resulting error will include a history that points to the problem.
I'm getting a JSON object over the network, as a String. I'm then using Circe to parse it. I want to add a handful of fields to it, and then pass it on downstream.
Almost all of that works.
The problem is that my "adding" is really "overwriting". That's actually ok, as long as I add an empty object first. How can I add such an empty object?
So looking at the code below, I am overwriting "sometimes_empty:{}" and it works. But because sometimes_empty is not always empty, it results in some data loss. I'd like to add a field like: "custom:{}" and then ovewrite the value of custom with my existing code.
Two StackOverflow posts were helpful. One worked, but wasn't quite what I was looking for. The other I couldn't get to work.
1: Modifying a JSON array in Scala with circe
2: Adding field to a json using Circe
val js: String = """
{
"id": "19",
"type": "Party",
"field": {
"id": 1482,
"name": "Anne Party",
"url": "https"
},
"sometimes_empty": {
},
"bool": true,
"timestamp": "2018-12-18T11:39:18Z"
}
"""
val newJson = parse(js).toOption
.flatMap { doc =>
doc.hcursor
.downField("sometimes_empty")
.withFocus(_ =>
Json.fromFields(
Seq(
("myUrl", Json.fromString(myUrl)),
("valueZ", Json.fromString(valueZ)),
("valueQ", Json.fromString(valueQ)),
("balloons", Json.fromString(balloons))
)
)
)
.top
}
newJson match {
case Some(v) => return v.toString
case None => println("Failure!")
}
We need to do a couple of things. First, we need to zoom in on the specific property we want to update, if it doesn't exist, we'll create a new empty one. Then, we turn the zoomed in property in the form of a Json into JsonObject in order to be able to modify it using the +: method. Once we've done that, we need to take the updated property and re-introduce it in the original parsed JSON to get the complete result:
import io.circe.{Json, JsonObject, parser}
import io.circe.syntax._
object JsonTest {
def main(args: Array[String]): Unit = {
val js: String =
"""
|{
| "id": "19",
| "type": "Party",
| "field": {
| "id": 1482,
| "name": "Anne Party",
| "url": "https"
| },
| "bool": true,
| "timestamp": "2018-12-18T11:39:18Z"
|}
""".stripMargin
val maybeAppendedJson =
for {
json <- parser.parse(js).toOption
sometimesEmpty <- json.hcursor
.downField("sometimes_empty")
.focus
.orElse(Option(Json.fromJsonObject(JsonObject.empty)))
jsonObject <- json.asObject
emptyFieldJson <- sometimesEmpty.asObject
appendedField = emptyFieldJson.+:("added", Json.fromBoolean(true))
res = jsonObject.+:("sometimes_empty", appendedField.asJson)
} yield res
maybeAppendedJson.foreach(obj => println(obj.asJson.spaces2))
}
}
Yields:
{
"id" : "19",
"type" : "Party",
"field" : {
"id" : 1482,
"name" : "Anne Party",
"url" : "https"
},
"sometimes_empty" : {
"added" : true,
"someProperty" : true
},
"bool" : true,
"timestamp" : "2018-12-18T11:39:18Z"
}
I'm trying to parse some JSON to kotlin objects. The JSON looks like:
{
data: [
{ "name": "aaa", "age": 11 },
{ "name": "bbb", "age": 22 },
],
otherdata : "don't need"
}
I just need to data part of the entire JSON, and parse each item to a User object:
data class User(name:String, age:Int)
But I can't find an easy way to do it.
Here's one way you can achieve this
import com.beust.klaxon.Klaxon
import java.io.StringReader
val json = """
{
"data": [
{ "name": "aaa", "age": 11 },
{ "name": "bbb", "age": 22 },
],
"otherdata" : "not needed"
}
""".trimIndent()
data class User(val name: String, val age: Int)
fun main(args: Array<String>) {
val klaxon = Klaxon()
val parsed = klaxon.parseJsonObject(StringReader(json))
val dataArray = parsed.array<Any>("data")
val users = dataArray?.let { klaxon.parseFromJsonArray<User>(it) }
println(users)
}
This will work as long as you can fit the whole json string in memory. Otherwise you may want to look into the streaming API: https://github.com/cbeust/klaxon#streaming-api
I have the following JSON file to be parsed into a case class:
{
"root": {
"nodes": [{
"id": "1",
"attributes": {
"name": "Node 1",
"size": "3"
}
},
{
"id": "2",
"attributes": {
"value": "4",
"name": "Node 2"
}
}
]
}
}
The problem is that the attributes could have any value inside it: name, size, value, anything ...
At this moment I have defined my case classes:
case class Attributes(
name: String,
size: String,
value: Sting
)
case class Nodes(
id: String,
attributes: Attributes
)
case class Root(
nodes: List[Nodes]
)
case class R00tJsonObject(
root: Root
)
Whats is the best way to deal with this scenario when I can receive any attribute ?
Currently I am using Json4s to handle son files.
Thanks!
Your attributes are arbitrarily many and differently named, but it seems you can store them in a Map[String, String] (at least, if those examples are anything to go by). In this case, using circe-parser (https://circe.github.io/circe/parsing.html), you could simply use code along these lines in order to convert your JSON directly into a simple case-class:
import io.circe._, io.circe.parser._
import io.circe.generic.semiauto._
case class Node(id: String, attributes: Map[String,String])
case class Root(nodes: List[Node])
implicit val nodeDecoder: Decoder[Node] = deriveDecoder[Node]
implicit val nodeEncoder: Encoder[Node] = deriveEncoder[Node]
implicit val rootDecoder: Decoder[Root] = deriveDecoder[Root]
implicit val rootEncoder: Encoder[Root] = deriveEncoder[Root]
def myParse(jsonString: String) = {
val res = parse(jsonString) match {
case Right(json) => {
val cursor = json.hcursor
cursor.get[Root]("root")
}
case _ => Left("Wrong JSON!")
}
println(res)
}
This snippet will print
Right(Root(List(Node(1,Map(name -> Node 1, size -> 3)), Node(2,Map(value -> 4, name -> Node 2)))))
on the console, for the JSON, you've given. (Assuming, the solution doesn't have to be in Json4s.)
im trying to extract my data from json into a case class without success.
the Json file:
[
{
"name": "bb",
"loc": "sss",
"elements": [
{
"name": "name1",
"loc": "firstHere",
"elements": []
}
]
},
{
"name": "ca",
"loc": "sss",
"elements": []
}
]
my code :
case class ElementContainer(name : String, location : String,elements : Seq[ElementContainer])
object elementsFormatter {
implicit val elementFormatter = Json.format[ElementContainer]
}
object Applicationss extends App {
val el = new ElementContainer("name1", "firstHere", Seq.empty)
val el1Cont = new ElementContainer("bb","sss", Seq(el))
val source:String=Source.fromFile("src/bin/elementsTree.json").getLines.mkString
val jsonFormat = Json.parse(source)
val r1= Json.fromJson[ElementContainer](jsonFormat)
}
after running this im getting inside r1:
JsError(List((/elements,List(ValidationError(List(error.path.missing),WrappedArray()))), (/name,List(ValidationError(List(error.path.missing),WrappedArray()))), (/location,List(ValidationError(List(error.path.missing),WrappedArray())))))
been trying to extract this data forever, please advise
You have location instead loc and, you'll need to parse file into a Seq[ElementContainer], since it's an array, not a single ElementContainer:
Json.fromJson[Seq[ElementContainer]](jsonFormat)
Also, you have the validate method that will return you either errors or parsed json object..