I want to create an empty JValue so that I can combine several parsed JSON objects.
For now I create a JValue containing {}, append the other objects to it, and at the end remove the first row using an RDD, but I would like to create
var JValue: JValue = JValue.empty
from the beginning to be able to skip the removing part.
Is it possible to create an empty JValue?
import org.json4s._
import org.json4s.jackson.JsonMethods._
var JValue: JValue = parse("{}")
val a = parse(""" {"name":"Scott", "age":33} """)
val b = parse(""" {"name":"Scott", "location":"London"} """)
JValue = JValue.++(a)
JValue = JValue.++(b)
import spark.implicits._ // provides toDS()
val df = spark.read.json(Seq(compact(render(JValue))).toDS())
val rdd = df.rdd.first()
val removeFirstRow = df.rdd.filter(row => row != rdd)
val newDataFrame = spark.createDataFrame(removeFirstRow,df.schema)
If I understand correctly what you are trying to achieve, you can start from an empty array like so:
var JValue: JValue = JArray(List.empty)
Calling the ++ method on the empty array will result in the items being added to that array, as defined here.
The final result is the following object:
[ {
"name" : "Scott",
"age" : 33
}, {
"name" : "Scott",
"location" : "London"
} ]
If you want to play around with the resulting code, you can have a look at this worksheet on Scastie (please bear in mind that I did not pull in the Spark dependency there and I'm not 100% sure that would work anyway in Scastie).
As you can see in the code I linked above, you can also just do a ++ b to obtain the same result, so you don't necessarily have to start from the empty array.
As a further note, you may want to rename JValue to something different to avoid weird errors in which you cannot tell apart the variable and the JValue type. Usually in Scala types are capitalized and variables are not. But of course, try to work towards the existing practices of your codebase.
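As a minimal sketch of the approach above, assuming json4s with the Jackson backend is on the classpath, the accumulation can also be written as a fold so that no mutable variable is needed. The names docs and merged are illustrative only and do not come from the question.

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._

// The two sample documents from the question.
val a = parse(""" {"name":"Scott", "age":33} """)
val b = parse(""" {"name":"Scott", "location":"London"} """)

// Hypothetical names: `docs` and `merged` are for illustration only.
val docs: List[JValue] = List(a, b)

// Start from an empty JArray; `++` appends each value to that array.
val merged: JValue = docs.foldLeft(JArray(Nil): JValue)(_ ++ _)

println(compact(render(merged)))
```

Starting from the empty array means the result is always an array, even when docs contains a single element.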
I have a JSON body in the following form:
val body = """
{
"a": "hello",
"b": "goodbye"
}
"""
I want to extract the VALUE of "a" (so I want "hello") and store that in a val.
I know I should use "parse" and "extract" (e.g. val parsedjson = parse(body).extract[String]), but I don't know how to use them to extract the value of "a" specifically.
To use extract you need to create a class that matches the shape of the JSON that you are parsing. Here is an example using your input data:
val body ="""
{
"a": "hello",
"b": "goodbye"
}
"""
case class Body(a: String, b: String)
import org.json4s._
import org.json4s.jackson.JsonMethods._
implicit val formats = DefaultFormats
val b = Extraction.extract[Body](parse(body))
println(b.a) // hello
You'd have to use pattern matching/extractors:
val aOpt: List[String] = for {
JObject(map) <- parse(body)
JField("a", JString(value)) <- map
} yield value
Alternatively, use the querying DSL:
parse(body) \ "a" match {
case JString(value) => Some(value)
case _ => None
}
These are options as you have no guarantee that arbitrary JSON would contain field "a".
See documentation
extract would make sense if you were extracting a whole JObject into a case class.
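The two ideas can also be combined. The following is a sketch, assuming json4s with the Jackson backend: it selects the field with \ and then calls extractOpt, so a missing field yields None instead of an exception. The value names a and missing are illustrative.

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._

// extractOpt needs an implicit Formats in scope.
implicit val formats: Formats = DefaultFormats

val body = """{ "a": "hello", "b": "goodbye" }"""
val json = parse(body)

// Select the field first, then extract it as an optional String.
val a: Option[String] = (json \ "a").extractOpt[String]

// "c" does not exist in the body, so the selection is JNothing.
val missing: Option[String] = (json \ "c").extractOpt[String]

println(a)
println(missing)
```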
I have many very large json-objects that I return from Play Framework with Scala.
In most cases the user doesn't need all the data in the objects, only a few fields. So I want to pass in the paths I need (as query parameters), and return a subset of the json object.
I have looked at using JSON Transformers for this task.
Filter code
def filterByPaths(paths: List[JsPath], inputObject: JsObject) : JsObject = {
paths
.map(_.json.pick)
.map(inputObject.transform)
.filter(_.isSuccess)
.map { case JsSuccess(value, path) => (value, path) }
.foldLeft(Json.obj()) { (obj, jsValueAndPath) =>
val(jsValue, path) = jsValueAndPath
val transformer = __.json.update(path.json.put(jsValue))
obj.transform(transformer).get
}
}
Usage:
val input = Json.obj(
"field1" -> Json.obj(
"field2" -> "right result"
),
"field4" -> Json.obj(
"field5" -> "not included"
),
)
val result = filterByPaths(List(JsPath \ "field1" \ "field2"), input)
// {"field1":{"field2":"right result"}}
Problem
This code works fine for JsObjects. But I can't make it work if there are JsArrays in the structure. I had hoped that my JsPath could contain an index to look up the field, but that's not the case. (I don't know why I assumed that; maybe my head was too far into the JavaScript world.)
So this would fail to return the first entry in the Array:
val input: JsObject = Json.parse("""
{
"arr1" : [{
"field1" : "value1"
}]
}
""").as[JsObject]
val result = filterByPaths(List(JsPath \ "arr1" \ "0"), input)
// {}
Question
My question is: How can I return a subset of a json structure that contains arrays?
Alternative solution
I have the data as a case class first, serialize it to JSON, and then run filterByPaths on it. Having a Reader that only creates the JSON I need in the first place might be a better solution, but creating a Reader on the fly, configured from query parameters, seemed a more difficult task than just stripping down the JSON afterwards.
The example of the returning array element:
val input: JsValue = Json.parse("""
{
"arr1" : [{
"field1" : "value1"
}]
}
""")
val firstElement = (input \ "arr1" \ 0).get
val firstElementAnotherWay = input("arr1")(0)
More about this in the Play Framework documentation: https://www.playframework.com/documentation/2.6.x/ScalaJson
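The same distinction applies to JsPath. As a small sketch, assuming play-json is on the classpath: a path built with \ "0" looks for a field literally named "0", while applying an Int to the path adds an index node. The value names byKey and byIndex are illustrative.

```scala
import play.api.libs.json._

val input: JsValue = Json.parse("""{ "arr1": [ { "field1": "value1" } ] }""")

// `\ "0"` adds a key node, which looks for a field literally named "0".
val byKey: JsResult[JsValue] = (JsPath \ "arr1" \ "0").asSingleJsResult(input)

// `(0)` adds an index node, which selects the first array element.
val byIndex: JsResult[JsValue] = (JsPath \ "arr1")(0).asSingleJsResult(input)

println(byKey.isError)
println(byIndex.asOpt)
```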
Update
It looks like you have hit the old issue "RuntimeException: expected KeyPathNode": JsPath.json.put and JsPath.json.update can't put an object into a nested array.
https://github.com/playframework/playframework/issues/943
https://github.com/playframework/play-json/issues/82
What you can do:
1. Use the JSZipper: https://github.com/mandubian/play-json-zipper
2. Create a script to update arrays "manually"
3. If you can afford it, strip array in a resulting object
Example of stripping array (point 3):
def filterByPaths(paths: List[JsPath], inputObject: JsObject) : JsObject = {
paths
.map(_.json.pick)
.map(inputObject.transform)
.filter(_.isSuccess)
.map { case JsSuccess(value, path) => (value, path)}
.foldLeft(Json.obj()) { (obj, jsValueAndPath) =>
val (jsValue, path) = jsValueAndPath
val arrayStrippedPath = JsPath(path.path.filter(n => !(n.toJsonString matches """\[\d+\]""")))
val transformer = __.json.update(arrayStrippedPath.json.put(jsValue))
obj.transform(transformer).get
}
}
// Use an index node, (0), rather than the key "0", so that the pick succeeds
val result = filterByPaths(List((JsPath \ "arr1")(0)), input)
// {"arr1":{"field1":"value1"}}
The example
The best way to handle JSON objects is to use case classes and create implicit Reads and Writes; that way you can handle errors for each field directly. Don't make it complicated.
Avoid .get; it is much better to use .getOrElse, because Scala is a type-safe programming language.
Don't use a library unless you understand how it works; it can be better to write your own simplified parsing method to save memory.
I hope this helps.
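A minimal sketch of that advice, assuming play-json 2.6 or later. The Person case class and the fallback value are hypothetical and serve only as an illustration: validate reports errors per field, and getOrElse supplies a default instead of throwing.

```scala
import play.api.libs.json._

// Hypothetical model type, used only for this illustration.
case class Person(name: String, age: Int)
implicit val personFormat: OFormat[Person] = Json.format[Person]

val good = Json.parse("""{"name":"Scott","age":33}""").validate[Person]
val bad  = Json.parse("""{"name":"Scott","age":"unknown"}""").validate[Person]

// getOrElse returns the fallback when validation failed, instead of throwing.
val fallback = Person("unknown", 0)
val p1 = good.getOrElse(fallback)
val p2 = bad.getOrElse(fallback)

// Each error is reported together with the path of the offending field.
bad match {
  case JsError(errors) =>
    errors.foreach { case (path, errs) =>
      println(s"$path: ${errs.map(_.message).mkString(", ")}")
    }
  case JsSuccess(_, _) => ()
}
```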
Is there a simple way to convert a given Row object to JSON?
Found this about converting a whole Dataframe to json output:
Spark Row to JSON
But I just want to convert a single Row to JSON.
Here is pseudo code for what I am trying to do.
More precisely I am reading json as input in a Dataframe.
I am producing a new output that is mainly based on columns, but with one json field for all the info that does not fit into the columns.
My question: what is the easiest way to write this function, convertRowToJson()?
def convertRowToJson(row: Row): String = ???
def transformVenueTry(row: Row): Try[Venue] = {
Try({
val name = row.getString(row.fieldIndex("name"))
val metadataRow = row.getStruct(row.fieldIndex("meta"))
val score: Double = calcScore(row)
val combinedRow: Row = metadataRow ++ ("score" -> score)
val jsonString: String = convertRowToJson(combinedRow)
Venue(name = name, json = jsonString)
})
}
Psidom's solution:
def convertRowToJSON(row: Row): String = {
val m = row.getValuesMap(row.schema.fieldNames)
JSONObject(m).toString()
}
only works if the Row has a single level, not with a nested Row. This is the schema:
StructType(
StructField(indicator,StringType,true),
StructField(range,
StructType(
StructField(currency_code,StringType,true),
StructField(maxrate,LongType,true),
StructField(minrate,LongType,true)),true))
I also tried Artem's suggestion, but that did not compile:
def row2DataFrame(row: Row, sqlContext: SQLContext): DataFrame = {
val sparkContext = sqlContext.sparkContext
import sparkContext._
import sqlContext.implicits._
import sqlContext._
val rowRDD: RDD[Row] = sqlContext.sparkContext.makeRDD(row :: Nil)
val dataFrame = rowRDD.toDF() //XXX does not compile
dataFrame
}
You can use getValuesMap to convert the row object to a Map and then convert it to JSON:
import scala.util.parsing.json.JSONObject
import org.apache.spark.sql._
val df = Seq((1,2,3),(2,3,4)).toDF("A", "B", "C")
val row = df.first() // this is an example row object
def convertRowToJSON(row: Row): String = {
val m = row.getValuesMap(row.schema.fieldNames)
JSONObject(m).toString()
}
convertRowToJSON(row)
// res46: String = {"A" : 1, "B" : 2, "C" : 3}
I need to read json input and produce json output.
Most fields are handled individually, but a few json sub objects need to just be preserved.
When Spark reads a DataFrame, it turns each record into a Row. A Row is a JSON-like structure that can be transformed and written out as JSON.
But I need to take some sub json structures out to a string to use as a new field.
This can be done like this:
dataFrameWithJsonField = dataFrame.withColumn("address_json", to_json($"location.address"))
location.address is the path to the sub-object in the DataFrame read from the incoming JSON. address_json is the name of the column holding that object converted to a JSON string.
to_json is available since Spark 2.1.
If you generate the output JSON using json4s, address_json should be parsed into an AST representation; otherwise the address_json part of the output will be escaped.
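As a self-contained sketch of the to_json call above, assuming Spark 2.1 or later running in local mode: the sample data and the name, street and city columns are hypothetical stand-ins for the real input.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{struct, to_json}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("to-json-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical data standing in for the JSON input: location.address is a nested struct.
val dataFrame = Seq(("Cafe", "Main St", "London"))
  .toDF("name", "street", "city")
  .select($"name", struct(struct($"street", $"city").as("address")).as("location"))

// Serialize only the nested struct into a string column.
val dataFrameWithJsonField =
  dataFrame.withColumn("address_json", to_json($"location.address"))

val addressJson: String =
  dataFrameWithJsonField.select("address_json").as[String].first()

println(addressJson)
```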
Note that the Scala class scala.util.parsing.json.JSONObject is deprecated and does not support null values.
@deprecated("This class will be removed.", "2.11.0")
"JSONFormat.defaultFormat doesn't handle null values"
https://issues.scala-lang.org/browse/SI-5092
JSON has a schema but a Row doesn't, so you need to apply a schema to the Row and convert it to JSON. Here is how you can do it.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
def convertRowToJson(row: Row): String = {
val schema = StructType(
StructField("name", StringType, true) ::
StructField("meta", StringType, false) :: Nil)
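// A schema can only be applied to an RDD of rows, so wrap the single row first
sqlContext.createDataFrame(sqlContext.sparkContext.makeRDD(row :: Nil), schema).toJSON.first()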
}
Essentially, you can have a dataframe which contains just one row. Thus, you can try to filter your initial dataframe and then parse it to json.
I had the same issue, I had parquet files with canonical schema (no arrays), and I only want to get json events. I did as follows, and it seems to work just fine (Spark 2.1):
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.{DataFrame, Dataset, Row}
import scala.util.parsing.json.JSONFormat.ValueFormatter
import scala.util.parsing.json.{JSONArray, JSONFormat, JSONObject}
def getValuesMap[T](row: Row, schema: StructType): Map[String,Any] = {
schema.fields.map {
field =>
try{
if (field.dataType.typeName.equals("struct")){
field.name -> getValuesMap(row.getAs[Row](field.name), field.dataType.asInstanceOf[StructType])
}else{
field.name -> row.getAs[T](field.name)
}
}catch {case e : Exception =>{field.name -> null.asInstanceOf[T]}}
}.filter(xy => xy._2 != null).toMap
}
def convertRowToJSON(row: Row, schema: StructType): JSONObject = {
val m: Map[String, Any] = getValuesMap(row, schema)
JSONObject(m)
}
//I guess since I am using Any and not nothing the regular ValueFormatter is not working, and I had to add case jmap : Map[String,Any] => JSONObject(jmap).toString(defaultFormatter)
val defaultFormatter : ValueFormatter = (x : Any) => x match {
case s : String => "\"" + JSONFormat.quoteString(s) + "\""
case jo : JSONObject => jo.toString(defaultFormatter)
case jmap : Map[String,Any] => JSONObject(jmap).toString(defaultFormatter)
case ja : JSONArray => ja.toString(defaultFormatter)
case other => other.toString
}
val someFile = "s3a://bucket/file"
val df: DataFrame = sqlContext.read.load(someFile)
val schema: StructType = df.schema
val jsons: Dataset[JSONObject] = df.map(row => convertRowToJSON(row, schema))
If you are iterating through a DataFrame, you can directly convert it to a new one holding JSON strings and iterate over that:
val df_json = df.toJSON
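A short sketch of toJSON on a nested structure, assuming Spark 2.x or later in local mode: the sample row is hypothetical and mirrors the indicator/range schema shown in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("tojson-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical row shaped like the schema in the question: a string plus a nested struct.
val df = Seq(("rate", "USD", 10L, 1L))
  .toDF("indicator", "currency_code", "maxrate", "minrate")
  .select($"indicator", struct($"currency_code", $"maxrate", $"minrate").as("range"))

// toJSON yields one JSON string per row; nested structs become nested objects.
val df_json = df.toJSON
val firstJson: String = df_json.first()

println(firstJson)
```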
I combined the suggestions from Artem, KiranM and Psidom. After a lot of trial and error I came up with this solution, which I tested on nested structures:
def row2Json(row: Row, sqlContext: SQLContext): String = {
import sqlContext.implicits
val rowRDD: RDD[Row] = sqlContext.sparkContext.makeRDD(row :: Nil)
val dataframe = sqlContext.createDataFrame(rowRDD, row.schema)
dataframe.toJSON.first
}
This solution worked, but only while running in driver mode.
I want to count the language tags in GitHub repositories. I am using scalaj-http for that.
import scalaj.http._
val response: HttpResponse[String] = Http("https://api.github.com/search/repositories?q=size:>=0").asString
val b = response.body
val c = response.code
val h = response.headers
I get back the following:
b: String
c: Int
h: Map[String,String]
The body is returned as a string. I now want to iterate over this result to extract a few nested URLs and call them (you might get a better idea of this if you look at the GET result of the URL mentioned above).
Basically I want to call one of the URLs. How can I do this?
When I had to work with a JSON response like this, I used json4s and its path operators to extract the required field. Example code would be something like this:
import org.json4s._
import org.json4s.native.JsonMethods._
val body= """ { "a" : { "b" : { "url" : "http://required.com" }}} """
val requiredUrl = (parse(body) \ "a" \ "b" \ "url").values
If the path matches more than one field in a list, I think you will get the results as a list.
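To illustrate the list case, here is a minimal sketch assuming json4s with the native backend. The body below is a hypothetical, trimmed-down response with an "items" array; the real response contains many more fields.

```scala
import org.json4s._
import org.json4s.native.JsonMethods._

// Hypothetical body shaped like a search response: an "items" array of objects.
val body =
  """{ "items": [ { "url": "http://example.com/a" }, { "url": "http://example.com/b" } ] }"""

// Applying \ "url" to the array collects the "url" field of every element.
val urls: List[String] =
  (parse(body) \ "items" \ "url").children.collect { case JString(u) => u }

urls.foreach(println)
```

Each entry in urls can then be requested in turn with the same Http(...).asString call used in the question.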
Would anyone please explain why the following happens?
scala> import play.api.libs.json._
scala> Json.toJson("""{"basic":"test"}""") // WORKS CORRECTLY
res134: play.api.libs.json.JsValue = "{\"basic\":\"test\"}"
scala> Json.toJson(""" {"basic":"test"} """) \ "basic" // ??? HOW COME?
res131: play.api.libs.json.JsValue = JsUndefined('basic' is undefined on object: " {\"basic\":\"test\"} ")
Many thanks
Json.toJson renders its argument as a JSON value using an implicitly provided Writes instance. If you give it a string, you'll get a JsString (typed as a JsValue). You want Json.parse, which parses its argument:
scala> Json.parse("""{"basic":"test"}""") \ "basic"
res0: play.api.libs.json.JsValue = "test"
As expected.
And to address your answer (which should be a comment or a new question, by the way), if you give toJson a value of some type A, it will convert it into a JSON value, assuming that there's an instance of the Writes type class in scope for that A. For example, the library provides Writes[String], Writes[Int], etc., so you can do the following:
scala> Json.prettyPrint(Json.toJson(1))
res11: String = 1
scala> Json.prettyPrint(Json.toJson("a"))
res12: String = "a"
scala> Json.prettyPrint(Json.toJson(List("a", "b")))
res13: String = [ "a", "b" ]
You can also create Writes instances for your own types (here I'm using Play's "JSON inception"):
case class Foo(i: Int, s: String)
implicit val fooWrites: Writes[Foo] = Json.writes[Foo]
And then:
scala> Json.prettyPrint(Json.toJson(Foo(123, "foo")))
res14: String =
{
"i" : 123,
"s" : "foo"
}
Using type classes to manage encoding and decoding is an alternative to reflection-based approaches, and it has a lot of advantages (but that's out of the scope of this question).
Turning my comment into an answer:
Json.toJson() does not create an object. It turns an object into a JSON string. What I think you're wanting is Json.parse(). Once you've parsed a JSON string, it's an object, and you can get to the properties.
Thanks a lot to both of you. So the following works as expected.
scala> Json.parse("""{"basic":"test"}""") \ "basic"
res137: play.api.libs.json.JsValue = "test"
I'd still like to understand what Json.toJson does. The docs state "Transform a stream of A to a stream of JsValue". Can anyone point out in what context this can be used?