How to convert this JSON to Dataframe in scala
json_string = """{
"module": {
"col1": "a",
"col2": {
"5": 1,
"3": 4,
"4": {
"numeric reasoning": 2,
"verbal": 4
},
"7": {
"landline": 2,
"landLine": 4
}
}
}
}"""
Function I use -
val jsonRDD = spark.parallelize(json_string::Nil)
val jsonDF = sqlContext.read.json(jsonRDD)
val df = flattenRecursive(jsonDF)
df.show()
def flattenRecursive(df: DataFrame): DataFrame = {
val fields = df.schema.fields
val fieldNames = fields.map(x => x.name)
val length = fields.length
for(i <- 0 to fields.length-1){
val field = fields(i)
val fieldtype = field.dataType
val fieldName = field.name
fieldtype match {
case arrayType: ArrayType =>
println("flatten array")
val newfieldNames = fieldNames.filter(_!=fieldName) ++ Array("explode_outer(".concat(fieldName).concat(") as ").concat(fieldName))
val explodedDf = df.selectExpr(newfieldNames:_*)
return flattenRecursive(explodedDf)
case structType: StructType =>
println("flatten struct")
val newfieldNames = fieldNames.filter(_!= fieldName) ++ structType.fieldNames.map(childname => fieldName.concat(".").concat(childname) .concat(" as ").concat(fieldName).concat("_").concat(childname))
val explodedf = df.selectExpr(newfieldNames:_*)
return flattenRecursive(explodedf)
case _ =>
println("other type")
}
}
df
}
Error I face is -
Ambiguous reference to fields StructField(landLine,LongType,true), StructField(landline,LongType,true);
Required output - if we can edit 1 landline column to landline_1 before explode
**Note - please provide the generic code because
I don't know on which level I face this ambiguity and also
I don't know the schema while running the code**
Here is my json:
{
"stringField" : "whatever",
"nestedObject": { "someProperty": "someValue"}
}
I want to map it to
case class MyClass(stringField: String, nestedObject:String)
nestedObject should not be deserialized, I want json4s to leave it as string.
resulting instance shouldBe:
val instance = MyClass(stringField="whatever", nestedObject= """ { "someProperty": "someValue"} """)
Don't understand how to do it in json4s.
You can define a custom serializer:
case object MyClassSerializer extends CustomSerializer[MyClass](f => ( {
case jsonObj =>
implicit val format = org.json4s.DefaultFormats
val stringField = (jsonObj \ "stringField").extract[String]
val nestedObject = compact(render(jsonObj \ "nestedObject"))
MyClass(stringField, nestedObject)
}, {
case myClass: MyClass =>
("stringField" -> myClass.stringField) ~
("nestedObject" -> myClass.nestedObject)
}
))
Then add it to the default formatter:
implicit val format = org.json4s.DefaultFormats + MyClassSerializer
println(parse(jsonString).extract[MyClass])
will output:
MyClass(whatever,{"someProperty":"someValue"})
Code run at Scastie
I am using akka with spray json support for which I need to edit value in the recieved json.
import akka.http.scaladsl.server.Directives
import akka.http.scaladsl.marshallers.sprayjson.SprayJsonSupport
import spray.json._
final case class Item(name: String, id: Long)
final case class Order(items: List[Item],orderTag:String)
trait JsonSupport extends SprayJsonSupport with DefaultJsonProtocol {
implicit val itemFormat = jsonFormat2(Item)
implicit val orderFormat = jsonFormat2(Order)
}
In my use case I recieve the json with orderTag value as null, all I need to do is edit the orderTag value with and then use it as entity value.Is it possible to write/edit jsonObject and How to do that ?
class MyJsonService extends Directives with JsonSupport {
// format: OFF
val route =
get {
pathSingleSlash {
complete(Item("thing", 42)) // will render as JSON
}
} ~
post {
entity(as[Order]) { order => // will unmarshal JSON to Order
val itemsCount = order.items.size
val itemNames = order.items.map(_.name).mkString(", ")
complete(s"Ordered $itemsCount items: $itemNames")
}
}
}
You can just edit the json AST like ..
val json = """{"orderTag":null}"""
val jsVal = json.parseJson
val updatedJs = if (jsObj.fields.get("orderTag") == Some(JsNull)) {
JsObject(jsObj.fields + ("orderTag" -> JsString("new tag")))
} else {
jsObj
}
updatedJs.compactPrint
res26: String = """
{"orderTag":"new tag"}
"""
Origin
{
"first_name" : "foo",
"last_name" : "bar",
"parent" : {
"first_name" : "baz",
"last_name" : "bazz",
}
}
Expected
{
"firstName" : "foo",
"lastName" : "bar",
"parent" : {
"firstName" : "baz",
"lastName" : "bazz",
}
}
How can I transform all keys of json objects ?
Here's how I'd write this. It's not as concise as I'd like, but it's not terrible:
import cats.free.Trampoline
import cats.std.list._
import cats.syntax.traverse._
import io.circe.{ Json, JsonObject }
/**
* Helper method that transforms a single layer.
*/
def transformObjectKeys(obj: JsonObject, f: String => String): JsonObject =
JsonObject.fromIterable(
obj.toList.map {
case (k, v) => f(k) -> v
}
)
def transformKeys(json: Json, f: String => String): Trampoline[Json] =
json.arrayOrObject(
Trampoline.done(json),
_.traverse(j => Trampoline.suspend(transformKeys(j, f))).map(Json.fromValues),
transformObjectKeys(_, f).traverse(obj => Trampoline.suspend(transformKeys(obj, f))).map(Json.fromJsonObject)
)
And then:
import io.circe.literal._
val doc = json"""
{
"first_name" : "foo",
"last_name" : "bar",
"parent" : {
"first_name" : "baz",
"last_name" : "bazz"
}
}
"""
def sc2cc(in: String) = "_([a-z\\d])".r.replaceAllIn(in, _.group(1).toUpperCase)
And finally:
scala> import cats.std.function._
import cats.std.function._
scala> transformKeys(doc, sc2cc).run
res0: io.circe.Json =
{
"firstName" : "foo",
"lastName" : "bar",
"parent" : {
"firstName" : "baz",
"lastName" : "bazz"
}
}
We probably should have some way of recursively applying a Json => F[Json] transformation like this more conveniently.
Depending on your full use-case, with the latest Circe you might prefer just leveraging the existing decoder/encoder for converting between camel/snake according to these references:
https://dzone.com/articles/5-useful-circe-feature-you-may-have-overlooked
https://github.com/circe/circe/issues/663
For instance, in my particular use-case this makes sense because I'm doing other operations that benefit from the type-safety of first deserializing into case classes. So if you're willing to decode the JSON into a case class, and then encode it back into JSON, all you would need is for your (de)serializing code to extend a trait that configures this, like:
import io.circe.derivation._
import io.circe.{Decoder, Encoder, ObjectEncoder, derivation}
import io.circe.generic.auto._
import io.circe.parser.decode
import io.circe.syntax._
trait JsonSnakeParsing {
implicit val myCustomDecoder: Decoder[MyCaseClass] = deriveDecoder[MyCaseClass](io.circe.derivation.renaming.snakeCase)
// only needed if you want to serialize back to snake case json:
// implicit val myCustomEncoder: ObjectEncoder[MyCaseClass] = deriveEncoder[MyCaseClass](io.circe.derivation.renaming.snakeCase)
}
For example, I then extend that when I actually parse or output the JSON:
trait Parsing extends JsonSnakeParsing {
val result: MyCaseClass = decode[MyCaseClass](scala.io.Source.fromResource("my.json").mkString) match {
case Left(jsonError) => throw new Exception(jsonError)
case Right(source) => source
}
val theJson = result.asJson
}
For this example, your case class might look like:
case class MyCaseClass(firstName: String, lastName: String, parent: MyCaseClass)
Here's my full list of circe dependencies for this example:
val circeVersion = "0.10.0-M1"
"io.circe" %% "circe-generic" % circeVersion,
"io.circe" %% "circe-parser" % circeVersion,
"io.circe" %% "circe-generic-extras" % circeVersion,
"io.circe" %% "circe-derivation" % "0.9.0-M5",
def transformKeys(json: Json, f: String => String): TailRec[Json] = {
if(json.isObject) {
val obj = json.asObject.get
val fields = obj.toList.foldLeft(done(List.empty[(String, Json)])) { (r, kv) =>
val (k, v) = kv
for {
fs <- r
fv <- tailcall(transformKeys(v, f))
} yield fs :+ (f(k) -> fv)
}
fields.map(fs => Json.obj(fs: _*))
} else if(json.isArray) {
val arr = json.asArray.get
val vsRec = arr.foldLeft(done(List.empty[Json])) { (vs, v) =>
for {
s <- vs
e <- tailcall(transformKeys(v, f))
} yield s :+ e
}
vsRec.map(vs => Json.arr(vs: _*))
} else {
done(json)
}
}
Currently I do transform like this, but is rather complicated, hope there is a simple way.
I took #Travis answer and modernized it a bit, I took his code and I had several error and warnings, so the updated version for Scala 2.12 with Cats 1.0.0-MF:
import io.circe.literal._
import cats.free.Trampoline, cats.instances.list._, cats.instances.function._, cats.syntax.traverse._, cats.instances.option._
def transformKeys(json: Json, f: String => String): Trampoline[Json] = {
def transformObjectKeys(obj: JsonObject, f: String => String): JsonObject =
JsonObject.fromIterable(
obj.toList.map {
case (k, v) => f(k) -> v
}
)
json.arrayOrObject(
Trampoline.done(json),
_.toList.traverse(j => Trampoline.defer(transformKeys(j, f))).map(Json.fromValues(_)),
transformObjectKeys(_, f).traverse(obj => Trampoline.defer(transformKeys(obj, f))).map(Json.fromJsonObject)
)
}
def sc2cc(in: String) = "_([a-z\\d])".r.replaceAllIn(in, _.group(1).toUpperCase)
def camelizeKeys(json: io.circe.Json) = transformKeys(json, sc2cc).run
how do one update/create field in JSON object with arbitrary schema and write it back as JSON in Scala?
I tried with spray-json with something like that:
import spray.json._
import DefaultJsonProtocol._
val jsonAst = """{"anyfield":"1234", "sought_optional_field":5.0}""".parse
val newValue = jsonAst.asJsObject.fields.getOrElse("sought_optional_field", 1)
val newMap = jsonAst.asJsObject.fields + ("sought_optional_field" -> newValue)
JSONObject(newMap).toJson
but it gives weird result: "{"anyfield"[ : "1234", "sought_optional_field" : ]1}
You were almost there :
import spray.json._
import DefaultJsonProtocol._
def changeField(json: String) = {
val jsonAst = JsonParser(json)
val map = jsonAst.asJsObject.fields
val sought = map.getOrElse("sought_optional_field", 1.toJson)
map.updated("sought_optional_field", sought).toJson
}
val jsonA = """{"anyfield":"1234", "sought_optional_field":5.0}"""
val jsonB = """{"anyfield":"1234"}"""
changeField(jsonA)
// spray.json.JsValue = {"anyfield":"1234","sought_optional_field":5.0}
changeField(jsonB)
// spray.json.JsValue = {"anyfield":"1234","sought_optional_field":1}
Using Argonaut:
import argonaut._, Argonaut._
def changeField2(json: String) =
json.parseOption.map( parsed =>
parsed.withObject(o =>
o + ("sought_optional_field", o("sought_optional_field").getOrElse(jNumber(1)))
)
)
changeField2(jsonA).map(_.nospaces)
// Option[String] = Some({"anyfield":"1234","sought_optional_field":5})
changeField2(jsonB).map(_.nospaces)
// Option[String] = Some({"anyfield":"1234","sought_optional_field":1})