I wrote a method to concatenate JSON values.
import org.json4s._
import org.json4s.jackson.JsonMethods._

// rawJson (defined elsewhere) holds the sales array as a JSON string
def mergeSales(storeJValue: JValue): String = {
  val salesJValue: JValue = parse(rawJson)
  val store = compact(render(storeJValue))
  val sales = compact(render(salesJValue))
  val mergedSales: String = s"""{"store":$store,"sales":$sales}"""
  mergedSales
}
As a result I'm getting strings like this, a store with an array of corresponding sales:
{"store":{"store_id":"01","name":"Store_1"}, "sales":[{"saleId": 10, "name": "New name1", "saleType": "New Type1"}, {"saleId": 20, "name": "Some name1", "saleType": "SomeType5"}, {"saleId": 30, "name": "Some name3", "saleType": "SomeType3"}]}
How should I parse it to get a list of records where the same store is mapped to each sale from the array? I want it to look like this:
{"store":{"store_id":"01","name":"Store_1"}, "sale":{"saleId": 10, "name": "New name1", "saleType": "New Type1"}}
{"store":{"store_id":"01","name":"Store_1"}, "sale":{"saleId": 20, "name": "New name2", "saleType": "New Type2"}}
{"store":{"store_id":"01","name":"Store_1"}, "sale":{"saleId": 30, "name": "Some name3", "saleType": "SomeType3"}}
Sales have a huge number of fields in reality, so creating a case class for them would be rather complex.
I think the best way is to use the json4s API, which can extract your JSON into case classes that you can then traverse easily.
First, define the case classes:
case class Store(store_id: String, name: String)
case class Sale(saleId: String, name: String, saleType: String)
case class Result(store: Store, sale: Sale)
case class SaleStore(store: Store, sales: List[Sale])
Then it is very straightforward to get the solution using json4s:
val str =
"""{
| "store": {
| "store_id": "01",
| "name": "Store_1"
| },
| "sales": [
| {
| "saleId": 10,
| "name": "New name1",
| "saleType": "New Type1"
| },
| {
| "saleId": 20,
| "name": "Some name1",
| "saleType": "SomeType5"
| },
| {
| "saleId": 30,
| "name": "Some name3",
| "saleType": "SomeType3"
| }
| ]
|}""".stripMargin
import org.json4s._
import org.json4s.jackson.JsonMethods._
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

implicit val formats: Formats = org.json4s.DefaultFormats

val saleStore = parse(str).extract[SaleStore]
// pair the store with each sale from the array
val result = saleStore.sales.map(sale => saleStore.store -> sale)

// serialise each Result back to JSON with Jackson and its Scala module
val mapper: ObjectMapper = new ObjectMapper()
mapper.registerModule(DefaultScalaModule)
result.map(r => mapper.writeValueAsString(Result(r._1, r._2))).foreach(println)
output:
{"store":{"store_id":"01","name":"Store_1"},"sale":{"saleId":"10","name":"New name1","saleType":"New Type1"}}
{"store":{"store_id":"01","name":"Store_1"},"sale":{"saleId":"20","name":"Some name1","saleType":"SomeType5"}}
{"store":{"store_id":"01","name":"Store_1"},"sale":{"saleId":"30","name":"Some name3","saleType":"SomeType3"}}
I'm relatively new to Scala. I would like to map part of my JSON to an object. The code looks like this:
def seasons = (json \\ "season")
case class:
case class Season(startDate: LocalDate, endDate: LocalDate)
json-structure:
[
{
"id": "",
"name": "",
"season": {
"start": "0",
"end": "0"
}
}
]
I would somehow like to end up with a List[Season], so I can loop through it.
Question #2
json-structure:
[
{
"id": "",
"name": "",
"season": {
"start": "0",
"end": "0"
}
},
{
"id": "",
"name": "",
"season": {
"start": "0",
"end": "0"
}
}...
]
The JSON (which is a JsValue, by the way) contains multiple regions, as can be seen above. Case classes are provided (Region holds a Season); the naming is the same as in the JSON.
Formats look like this:
implicit val seasonFormat: Format[Season] = Json.format[Season]
implicit val regionFormat: Format[Region] = Json.format[Region]
So what would I need to call in order to get a List[Region]? I thought of something like regionsJson.as[List[Region]], since I defined the Format, which provides the Reads/Writes. But unfortunately, it's not working.
What is the best way of doing this? I've tried it with a JsArray, but I have difficulties mapping it.
Any input would be much appreciated!
I've made some changes to your original case class and renamed its fields to match the JSON fields.
The following code parses the JSON into a Seq[Season]:
import java.time.LocalDate
import play.api.libs.json._
case class Season(start: LocalDate, end: LocalDate)
implicit val seasonFormat: Format[Season] = Json.format[Season]
val json =
"""
|[
| {
| "id": "",
| "name": "",
| "season": {
| "start": "2020-10-20",
| "end": "2020-10-22"
| }
| }
|]
|""".stripMargin
val seasonsJson: collection.Seq[JsValue] = Json.parse(json) \\ "season"
val seasons: collection.Seq[Season] = seasonsJson.map(_.as[Season])
seasons.foreach(println)
Please note that I changed your JSON data: instead of 0, which is not a valid date, I provided dates in ISO format (yyyy-MM-dd).
The above code works with play-json version 2.9.0.
---UPDATE---
Following up on the comment by @cchantep:
The as method will throw an exception if the JSON cannot be mapped to the case class; a non-throwing alternative is asOpt, which returns None when mapping is not possible.
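For Question #2, assuming a Region case class shaped like the JSON (the field names and the regionsJson value below are assumptions based on the snippets in the question), a minimal sketch using the same imports and the seasonFormat defined above would be:

case class Region(id: String, name: String, season: Season)

implicit val regionFormat: Format[Region] = Json.format[Region]

// regionsJson is the JsValue holding the whole array from Question #2
val regions: List[Region] = regionsJson.as[List[Region]]
// or regionsJson.asOpt[List[Region]] to get None instead of an exception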
Update: I looked closer into the rest of my code and found I had an issue elsewhere, which is why it was not working. Thanks.
I wanted to know if one can use the json4s serializer to deserialize an object that uses generics.
My JSON records share the same overall structure, but one part carries different information.
For example, I have a Superhero, and each superhero has different skills.
JSON data:
{
"type": "Blue",
"name": "Aquaman",
"age": "4",
"skills": {
"Cooking": 9,
"Swimming": 4
}
}
{
"type": "Red",
"name": "Flash",
"age": "8",
"skills": {
"Speed": 9,
"Punctual": 10
}
}
So what I wanted to do was
case class Superhero[T](
`type`: String,
name: String,
age: Int,
skills: T
)
and the respective skill case classes:
case class BlueSkill(
Cooking: Int,
Swimming: Int
)
case class RedSkill(
Speed: Int,
Punctual: Int
)
but when I read it and try to map it to another object, I get null in my DataFrame.
val bluePerson = read[Superhero[BlueSkill]](jsonBody)
So I wanted to know if reading a generic object is possible with json4s.
Sure it can be done, why would it work any differently from non-generic types?
import org.json4s.native.{Serialization => S}
import org.json4s.DefaultFormats
implicit val fmts = DefaultFormats
S.read[Superhero[RedSkill]]("""|{
| "type": "Red",
| "name": "Flash",
| "age": 8,
| "skills": {
| "Speed": 9,
| "Punctual": 10
| }
|}""".stripMargin)
But frankly, I'd stay away from json4s or any other introspection nonsense and use a typeclass-based library such as circe instead.
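For completeness, here is a minimal circe sketch of the same generic case, reusing the Superhero and RedSkill case classes defined above and assuming circe-generic is on the classpath; auto derivation builds the Decoder[Superhero[RedSkill]] for you:

import io.circe.generic.auto._
import io.circe.parser.decode

val flash = decode[Superhero[RedSkill]](
  """{"type": "Red", "name": "Flash", "age": 8, "skills": {"Speed": 9, "Punctual": 10}}"""
)
// flash: Either[io.circe.Error, Superhero[RedSkill]]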
Visitors of an eCommerce site browse multiple products during their visit. All visit data for a visitor is consolidated into a JSON document containing the visitor Id and a list of product Ids, along with an interest attribute containing the level of interest the visitor expressed in a product. Here are two example records, rec1 and rec2, containing the visit data of two visitors, v1 and v2:
val rec1: String = """{
"visitorId": "v1",
"products": [{
"id": "i1",
"interest": 0.68
}, {
"id": "i2",
"interest": 0.42
}]
}"""
val rec2: String = """{
"visitorId": "v2",
"products": [{
"id": "i1",
"interest": 0.78
}, {
"id": "i3",
"interest": 0.11
}]
}"""
val visitsData: Seq[String] = Seq(rec1, rec2)
val productIdToNameMap = Map("i1" -> "Nike Shoes", "i2" -> "Umbrella", "i3" -> "Jeans")
Given the collection of records (visitsData) and a map (productIdToNameMap) of product Ids and their names:
Write the code to enrich every record contained in visitsData with the name of the product. The output should be another sequence with all the original JSON documents enriched with the product name. Here is the example output:
val output: Seq[String] = Seq(enrichedRec1, enrichedRec2)
where enrichedRec1 has value -
"""{
"visitorId": "v1",
"products": [{
"id": "i1",
"name": "Nike Shoes",
"interest": 0.68
}, {
"id": "i2",
"name": "Umbrella",
"interest": 0.42
}]
}"""
And enrichedRec2 has value -
"""{
"visitorId": "v2",
"products": [{
"id": "i1",
"name": "Nike Shoes",
"interest": 0.78
}, {
"id": "i3",
"name": "Jeans",
"interest": 0.11
}]
}"""
This is one way to do the enrichment of the JSON using Spark:
package com.examples
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.{DataFrame, SparkSession}
object EnrichJson extends App {
private[this] implicit val spark = SparkSession.builder().master("local[*]").getOrCreate()
Logger.getLogger("org").setLevel(Level.WARN)
spark.sparkContext.setLogLevel("ERROR")
import spark.implicits._
val rec1: String =
"""{
"visitorId": "v1",
"products": [{
"id": "i1",
"interest": 0.68
}, {
"id": "i2",
"interest": 0.42
}]
}"""
val rec2: String =
"""{
"visitorId": "v2",
"products": [{
"id": "i1",
"interest": 0.78
}, {
"id": "i3",
"interest": 0.11
}]
}"""
val visitsData: Seq[String] = Seq(rec1, rec2)
val productIdToNameMap = Map("i1" -> "Nike Shoes", "i2" -> "Umbrella", "i3" -> "Jeans")
val dictionary = productIdToNameMap.toSeq.toDF("id", "name")
val rddData = spark.sparkContext.parallelize(visitsData)
dictionary.printSchema()
println("for spark version >2.2.0")
var resultDF = spark.read.json(visitsData.toDS)
.withColumn("products", explode(col("products")))
.selectExpr("products.*", "visitorId")
.join(dictionary, Seq("id"))
resultDF.show
resultDF.printSchema()
convertJson(resultDF)
println("for spark version <2.2.0")
resultDF = spark.read.json(rddData)
.withColumn("products", explode(col("products")))
.selectExpr("products.*", "visitorId")
.join(dictionary, Seq("id"))
// .withColumn("products", explode(col("products")))
resultDF.show
resultDF.printSchema()
convertJson(resultDF)
/**
 * convertJson: converts the data frame to a JSON string
 * @param resultDF
 */
private def convertJson(resultDF: DataFrame) = {
import org.apache.spark.sql.functions.{collect_list, _}
val x: DataFrame = resultDF
.groupBy("visitorId")
.agg(collect_list(struct("id", "interest", "name")).as("products"))
x.show
println(x.toJSON.collect.mkString)
}
}
Result :
root
|-- id: string (nullable = true)
|-- name: string (nullable = true)
for spark version >2.2.0
+---+--------+---------+----------+
| id|interest|visitorId| name|
+---+--------+---------+----------+
| i1| 0.68| v1|Nike Shoes|
| i2| 0.42| v1| Umbrella|
| i1| 0.78| v2|Nike Shoes|
| i3| 0.11| v2| Jeans|
+---+--------+---------+----------+
root
|-- id: string (nullable = true)
|-- interest: double (nullable = true)
|-- visitorId: string (nullable = true)
|-- name: string (nullable = true)
+---------+--------------------+
|visitorId| products|
+---------+--------------------+
| v2|[[i1, 0.78, Nike ...|
| v1|[[i1, 0.68, Nike ...|
+---------+--------------------+
{"visitorId":"v2","products":[{"id":"i1","interest":0.78,"name":"Nike Shoes"},{"id":"i3","interest":0.11,"name":"Jeans"}]}{"visitorId":"v1","products":[{"id":"i1","interest":0.68,"name":"Nike Shoes"},{"id":"i2","interest":0.42,"name":"Umbrella"}]}
for spark version <2.2.0
+---+--------+---------+----------+
| id|interest|visitorId| name|
+---+--------+---------+----------+
| i1| 0.68| v1|Nike Shoes|
| i2| 0.42| v1| Umbrella|
| i1| 0.78| v2|Nike Shoes|
| i3| 0.11| v2| Jeans|
+---+--------+---------+----------+
root
|-- id: string (nullable = true)
|-- interest: double (nullable = true)
|-- visitorId: string (nullable = true)
|-- name: string (nullable = true)
+---------+--------------------+
|visitorId| products|
+---------+--------------------+
| v2|[[i1, 0.78, Nike ...|
| v1|[[i1, 0.68, Nike ...|
+---------+--------------------+
{"visitorId":"v2","products":[{"id":"i1","interest":0.78,"name":"Nike Shoes"},{"id":"i3","interest":0.11,"name":"Jeans"}]}{"visitorId":"v1","products":[{"id":"i1","interest":0.68,"name":"Nike Shoes"},{"id":"i2","interest":0.42,"name":"Umbrella"}]}
Example of a method that parses JSON with Scala's built-in parser and returns the result as case classes:
/** ---------------------------------------
*
{
"fields": [
{
"field1": "value",
"field2": [
{
"field21": "value",
"field22": "value"
},
{
"field21": "value",
"field22": "value"
}
]
}
]
}*/
import scala.io.Source
import scala.util.parsing.json.JSON // deprecated in newer Scala versions

case class elementClass(element1: String, element2: String)
case class outputDataClass(field1: String, exampleClassData: List[elementClass])

def multipleMapJsonParser(jsonDataFile: String): List[outputDataClass] = {
// read the whole file into a single string
val JsonData: String = Source.fromFile(jsonDataFile).getLines.mkString
val jsonFormatData = JSON.parseFull(JsonData)
.map{
case json : Map[String, List[Map[String,Any]]] => json("fields").map(
jsonElem =>
outputDataClass(jsonElem("field1").toString,
jsonElem("field2").asInstanceOf[List[Map[String,String]]].map{
case element : Map[String,String] => elementClass(element("field21"),element("field22"))
})
)
}.get
jsonFormatData
}
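Hypothetical usage, assuming a file data.json contains the structure shown in the comment above:

val parsed: List[outputDataClass] = multipleMapJsonParser("data.json")
parsed.foreach(println)
// outputDataClass(value, List(elementClass(value, value), elementClass(value, value)))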
I want to convert a JSON array (given as a String) to a Seq[Message].
Case class:
case class Message(name: String, sex: String)
Source:
[
{ "name": "Bean",
"sex": "F"
},
{
"name": "John",
"sex": "M"
}
]
Destination:
Seq[Message]
How do I convert this?
You need to use some sort of decoder/deserialiser to decode the string into case classes. There are plenty of JSON decoders in Scala. One of my favourites is circe, as it is functional and also works well with Scala.js.
import io.circe._, io.circe.generic.auto._, io.circe.parser._, io.circe.syntax._
case class Message(name: String, sex: String)
val encoded =
"""
|[
| { "name": "Bean",
| "sex": "F"
| },
| {
| "name": "John",
| "sex": "M"
| }
|]
""".stripMargin
val decoded: Either[Error, List[Message]] = decode[List[Message]](encoded)
decoded match {
case Right(e) => println("success: " + e)
case Left(l) => println("failure: "+ l)
}
output:
success: List(Message(Bean,F), Message(John,M))
If you are looking for something plain and simple that is compatible with Java, take a look at https://github.com/FasterXML/jackson-module-scala
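For reference, a minimal jackson-module-scala sketch for the same conversion, reusing the Message case class and the encoded string from above and assuming the module is on the classpath:

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

val mapper = new ObjectMapper()
mapper.registerModule(DefaultScalaModule)

// deserialise the JSON array into Array[Message], then widen to Seq
val messages: Seq[Message] = mapper.readValue(encoded, classOf[Array[Message]]).toSeq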
Also see: Scala deserialize JSON to Collection
I have read other Scala JSON parsing questions but they all seem to assume a very basic document that is not deeply nested or of mixed types. Or they assume you know all of the members of the document or that some will never be missing.
I am currently using Jackson's Streaming API but the code required is difficult to understand and maintain. Instead, I'd like to use Jerkson to return an object representing the parsed JSON if possible.
Basically: I'd like to replicate the JSON parsing functionality that's so familiar to me in dynamic languages. I realize that's probably very wrong, and so I'm here to be educated.
Say we have a Tweet:
{
"id": 100,
"text": "Hello, world."
"user": {
"name": "Brett",
"id": 200
},
"geo": {
"lat": 10.5,
"lng": 20.7
}
}
Now, the Jerkson Case Class examples make a lot of sense when you only want to parse out, say, the ID:
val tweet = """{...}"""
case class Tweet(id: Long)
val parsed = parse[Tweet](tweet)
But how do I deal with something like the Tweet above?
Some gotchas:
Some fields can be null or missing; for example, "geo" above may be null, or one day a field may be dropped, and I don't want my parsing code to fail.
Some fields will be extra, or new fields will be added, and I want to be able to ignore them.
Lift json-scalaz is the best way to read and write JSON that I have come across. The docs explain its usage pretty well here:
https://github.com/lift/framework/tree/master/core/json-scalaz
Of course right after I post this I find some help elsewhere. :)
I believe the "richer JSON example" towards the bottom of this post is a step in the right direction: http://bcomposes.wordpress.com/2012/05/12/processing-json-in-scala-with-jerkson/
Another alternative using Lift-json could be:
package code.json
import org.specs2.mutable.Specification
import net.liftweb.json._
class JsonSpecs extends Specification {
implicit val format = DefaultFormats
val a = parse("""{
| "id": 100,
| "text": "Hello, world."
| "user": {
| "name": "Brett",
| "id": 200
| },
| "geo": {
| "lat": 10.5,
| "lng": 20.7
| }
|}""".stripMargin)
val b = parse("""{
| "id": 100,
| "text": "Hello, world."
| "user": {
| "name": "Brett",
| "id": 200
| }
|}""".stripMargin)
"Lift Json" should{
"find the id" in {
val res= (a \ "id").extract[String]
res must_== "100"
}
"find the name" in{
val res= (a \ "user" \ "name").extract[String]
res must_== "Brett"
}
"find an optional geo data" in {
val res= (a \ "geo" \ "lat").extract[Option[Double]]
res must_== Some(10.5)
}
"ignore missing geo data" in {
val res= (b \ "geo" \ "lat").extract[Option[Double]]
res must_== None
}
}
}
Note how, when the geo data is missing in val b, the parsing still works fine and yields None.
Or do you want to get case classes as the result?
For a case class example, see:
package code.json
import org.specs2.mutable.Specification
import net.liftweb.json._
class JsonSpecs extends Specification {
implicit val format = DefaultFormats
case class Root(id: Int, text: Option[String], user: Option[User], geo: Option[Geo])
case class User(name: String, id: Int)
case class Geo(lat: Double, lng: Double)
val c = parse("""{
| "id": 100
| "user": {
| "name": "Brett",
| "id": 200
| },
| "geo": {
| "lng": 20.7
| }
|}""".stripMargin)
"Lift Json" should{
"return none for geo lat data" in {
val res= c.extract[Root].geo.map(_.lat)
res must_== None
}
}
}
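As for the extra-fields gotcha: by default, lift-json's extract simply ignores JSON fields that do not appear in the case class, so new or unexpected fields should not break parsing. A minimal sketch, reusing the Root case class and formats from the spec above:

val d = parse("""{"id": 100, "brand_new_field": true}""")
d.extract[Root] // Root(100,None,None,None) - the unknown field is ignored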