spark streaming JSON value in dataframe column scala - json

I have a text file with json value. and this gets read into a DF
{"name":"Michael"}
{"name":"Andy", "age":30}
I want to infer the schema dynamically for each line while Streaming and store it in separate locations(tables) depending on its schema.
unfortunately while I try to read the value.schema it still shows as String. Please help on how to do it on Streaming as RDD is not allowed in streaming.
I wanted to use the following code which doesnt work as the value is still read as String format.
val jsonSchema = newdf1.select("value").as[String].schema
val df1 = newdf1.select(from_json($"value", jsonSchema).alias("value_new"))
val df2 = df1.select("value_new.*")
I even tried to use,
schema_of_json("json_schema"))
val jsonSchema: String = newdf.select(schema_of_json(col("value".toString))).as[String].first()
still no hope.. Please help..

You can load the data as textFile, create case class for person and parse every json string to Person instance using json4s or gson, then creating the Dataframe as follows:
case class Person(name: String, age: Int)
val jsons = spark.read.textFile("/my/input")
val persons = jsons.map{json => toPerson(json) //instead of 'toPerson' actually parse with json4s or gson to return Person instance}
val df = sqlContext.createDataFrame(persons)
Deserialize json to case class using json4s:
https://commitlogs.com/2017/01/14/serialize-deserialize-json-with-json4s-in-scala/
Deserialize json to case class using gson:
https://alvinalexander.com/source-code/scala/scala-case-class-gson-json-object-deserialization-and-scalatra

Related

Is there a way to remove a sub-attribute from a json column through spark sql

I have a table in which there is column with nested json data. Need to remove a attribute from that json through spark sql
Checked on basic spark json function but not getting a way to do it
Assuming you read in a JSON file and print the schema you are showing us like this:
val df = sqlContext.read().json("/path/to/file").toDF();
df.registerTempTable("df");
df.printSchema();
Then you can select nested objects inside a struct type like so...
val app = df.select("app");
app.registerTempTable("app");
app.printSchema();
app.show();
val appName = app.select("element.appName");
appName.registerTempTable("appName");
appName.printSchema();
appName.show();
val trimmedDF = appName.drop("firstname")

Scala get JSON value in a specific data type on map object

Using jackson library I read json data from a file (each row of file is a JSON object) an parse it to a map object of String and Any. My goal is to save specified keys (id and text) to a collection.
val input = scala.io.Source.fromFile("data.json").getLines()
val mapper = new ObjectMapper() with DefaultScalaModule
val data_collection = mutable.HashMap.empty[Int, String]
for (i <- input){
val parsedJson = mapper.readValue[Map[String, Any]](i)
data_collection.put(
parsedJson.get("id"),
parsedJson.get("text")
)
But as the values in the parsedJson map have the Any type, getting some keys like id and text, it returns Some(value) not just the value with the appropriate type. I expect the values for the id key to be Integer and values for the text to be String.
Running the code I got the error:
Error:(31, 23) type mismatch;
found : Option[Any]
required: Int
parsedJson.get("id"),
Here is a sample of JSON data in the file:
{"text": "Hello How are you", "id": 1}
Is it possible in Scala to parse id values to Int and text values to String, or at least convert Some(value) to value with type Int or String?
If you want to get a plain value from a Map instead of a Option you can use the () (apply) method - However it will throw an exception if the key is not found.
Second, Scala type system is static not dynamic, if you have an Any that's it, it won't change to Int or String at runtime, and the compiler will fail - Nevertheless, you can cast them using the asInstanceOf[T] method, but again if type can't be casted to the target type it will throw an exception.
Please note that even if you can make your code work with the above tricks, that code wouldn't be what you would expect in Scala. There are ways to make the code more typesafe (like pattern matching), but parsing a Json to a typesafe object is an old problem, I'm sure jackson provides a way to parse a json into case class that represent your data. If not take a look to circe it does.
Try the below code :
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import
com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper
val input = scala.io.Source.fromFile("data.json").getLines()
val mapper = new ObjectMapper() with ScalaObjectMapper
mapper.registerModule(DefaultScalaModule)
val obj = mapper.readValue[Map[String, Any]](input)
val data_collection = mutable.HashMap.empty[Int, String]
for (i <- c) {
data_collection.put(
obj.get("id").fold(0)(_.toString.toInt),
obj.get("text").fold("")(_.toString)
)
}
println(data_collection) // Map(1 -> Hello How are you)

Extract a Json from an array inside a json in spark

I have a complicated JSON column whose structure is :
story{
cards: [{story-elements: [{...}{...}{...}}]}
The length of the story-elements is variable. I need to extract a particular JSON block from the story-elements array. For this, I first need to extract the story-elements.
Here is the code which I have tried, but it is giving error:
import org.json4s.{DefaultFormats, MappingException}
import org.json4s.jackson.JsonMethods._
import org.apache.spark.sql.functions._
def getJsonContent(jsonstring: String): (String) = {
implicit val formats = DefaultFormats
val parsedJson = parse(jsonstring)
val value1 = (parsedJson\"cards"\"story-elements").extract[String]
value1
}
val getJsonContentUDF = udf((jsonstring: String) =>
getJsonContent(jsonstring))
input.withColumn("cards",getJsonContentUDF(input("storyDataFrame")))
According to json you provided, story-elements is a an array of json objects, but you trying to extract array as a string ((parsedJson\"cards"\"story-elements").extract[String]).
You can create case class representing on story (like case class Story(description: String, pageUrl: String, ...)) and then instead of extract[String], try extract[List[Story]] or extract[Array[Story]]
If you need just one piece of data from story (e.g. descrition), then you can use xpath-like syntax to get that and then extract List[String]

Not able to convert Scala object to JSON String

I am using Play Framework and I am trying to convert a Scala object to a JSON string.
Here is my code where I get my object:
val profile: Future[List[Profile]] = profiledao.getprofile(profileId);
The object is now in the profile value.
Now I want to convert that profile object which is a Future[List[Profile]] to JSON data and then convert that data into a JSON string then write into a file.
Here is the code that I wrote so far:
val jsondata = Json.toJson(profile)
Jackson.toJsonString(jsondata)
This is how I am trying to convert into JSON data but it is giving me the following output:
{"empty":false,"traversableAgain":true}
I am using the Jackson library to do the conversion.
Can someone help me with this ?
Why bother with Jackson? If you're using Play, you have play-json available to you, which uses Jackson under the hood FWIW:
First, you need an implicit Reads to let play-json know how to serialize Profile. If Profile is a case class, you can do this:
import play.api.libs.json._
implicit val profileFormat = Json.format[Profile]
If not, define your own Reads like this.
Then since getprofile (which should follow convention and be getProfile) returns Future[List[Profile]], you can do this to get a JsValue:
val profilesJson = profiledao.getprofile(profileId).map(toJson)
(profiledao should also be profileDao.)
In the end, you can wrap this in a Result like Ok and return that from your controller.

Convert scala json string to an object

I need to convert a string json in an object using scala.
I tried this, but I cannot access the prop Name, for example
import scala.util.parsing.json._
val parsed = JSON.parseFull("""{"Name":"abc", "age":10}""")
How can I do to get an string json and convert to an object? Thanks
val parsed = JSON.parseFull("""{"Name":"abc", "age":10}""")
val parsed = JSON.parseFull(parsed)
try it