Parsing a file having content as Json format in Scala - json

I want to parse a file having content as json format.
From the file I want to extract few properties (name, DataType, Nullable) to create some column names dynamically.
I have gone through some examples but most of them are using case class but my problem is every time I will receive a file may have different content.
I tried to use the ujson library to parse the file but I am unable to understand how to use it properly.
object JsonTest {
def main(args: Array[String]): Unit = {
val source = scala.io.Source.fromFile("C:\\Users\\ktngme\\Desktop\\ass\\file.txt")
println(source)
val input = try source.mkString finally source.close()
println(input)
val data = ujson.read(input)
data("name") = data("name").str.reverse
val updated = data.render()
}
}
Content of the file example:
{
"Organization": {
"project": {
"name": "POC 4PL",
"description": "Implementation of orderbook"
},
"Entities": [
{
"name": "Shipments",
"Type": "Fact",
"Attributes": [
{
"name": "Shipment_Details",
"DataType": "StringType",
"Nullable": "true"
},
{
"name": "Shipment_ID",
"DataType": "StringType",
"Nullable": "true"
},
{
"name": "View_Cost",
"DataType": "StringType",
"Nullable": "true"
}
],
"ADLS_Location": "/mnt/mns/adls/raw/poc/orderbook/"
}
]
}
}
Expected output:
StructType(
Array(StructField("Shipment_Details",StringType,true),
StructField("Shipment_ID",DateType,true),
StructField("View_Cost",DateType,true)))
StructType needs to be added to the expected output programatically.

Try Using Playframework's Json utils - https://www.playframework.com/documentation/2.7.x/ScalaJson
Here's the solution to your issue-
\ Placed your json in text file
val fil_path = "C:\\TestData\\Config\\Conf.txt"
val conf_source = scala.io.Source.fromFile(fil_path)
lazy val json_str = try conf_source.mkString finally conf_source.close()
val conf_json: JsValue = Json.parse(json_str)
val all_entities: JsArray = (conf_json \ "Organization" \ "Entities").get.asInstanceOf[JsArray]
val shipments: JsValue = all_entities.value.filter(e => e.\("name").as[String] == "Shipments").head
val shipments_attributes: IndexedSeq[JsValue] = shipments.\("Attributes").get.asInstanceOf[JsArray].value
val shipments_schema: StructType = StructType(shipments_attributes.map(a => Tuple3(a.\("name").as[String], a.\("DataType").as[String], a.\("Nullable").as[String]))
.map(x => StructField(x._1, StrtoDatatype(x._2), x._3.toBoolean)))
shipments_schema.fields.foreach(println)
Output is -
StructField(Shipment_Details,StringType,true)
StructField(Shipment_ID,StringType,true)
StructField(View_Cost,StringType,true)

It depends if you want it to be completely dynamic or not, here are some options:
If you just want to read one field you can do:
import upickle.default._
val source = scala.io.Source.fromFile("C:\\Users\\ktngme\\Desktop\\ass\\file.txt")
val input = try source.mkString finally source.close()
val json = ujson.read(input)
println(json("Organization")("project")("name"))
the output will be: "POC 4PL"
If you just want just the Attributes to be with types, you can do:
import upickle.default.{macroRW, ReadWriter => RW}
import upickle.default._
val source = scala.io.Source.fromFile("C:\\Users\\ktngme\\Desktop\\ass\\file.txt")
val input = try source.mkString finally source.close()
val json = ujson.read(input)
val entitiesArray = json("Organization")("Entities")(0)("Attributes")
println(read[Seq[StructField]](entitiesArray))
case class StructField(name: String, DataType: String, Nullable: String)
object StructField{
implicit val rw: RW[StructField] = macroRW
}
the output will be: List(StructField(Shipment_Details,StringType,true), StructField(Shipment_ID,StringType,true), StructField(View_Cost,StringType,true))
another option, is to use a different library to do the class mapping. If you use Google Protobuf Struct and JsonFormat it can be 2-liner:
import com.google.protobuf.Struct
import com.google.protobuf.util.JsonFormat
val source = scala.io.Source.fromFile("C:\\Users\\ktngme\\Desktop\\ass\\file.txt")
val input = try source.mkString finally source.close()
JsonFormat.parser().merge(input, builder)
println(builder.build())
the output will be: fields { key: "Organization" value { struct_value { fields { key: "project" value { struct_value { fields { key: "name" value { string_value: "POC 4PL" } } fields { key: "description" value { string_value: "Implementation of orderbook" } } } } } fields { key: "Entities" value { list_value { values { struct_value { fields { key: "name" value { string_value: "Shipments" } }...

Related

How to serialize fields with varying type?

I have the following data classes to parse JSON. I can parse it easily with the decodeFromString method. However, the Info classes could contain the List<Int> type from time to time along with the Int type so that both are included in a single JSON. How can I handle this variation in serialization?
#Serializable
data class Node (#SerialName("nodeContent") val nodeContent: List<Info>)
#Serializable
data class Info (#SerialName("info") val info: Int)
p.s. The closest question to mine is this one: Kotlinx Serialization, avoid crashes on other datatype. I wonder if there are other ways?
EDIT:
An example is given below.
"nodeContent": [
{
"info": {
"name": "1",
},
},
{
"info": [
{
"name": "1"
},
{
"name": "2"
},
],
},
{
"info": {
"name": "2",
},
}
]
Here is an approach with a custom serializer similar to the link you provided. The idea is to return a list with just a single element.
// Can delete these two lines, they are only for Kotlin scripts
#file:DependsOn("org.jetbrains.kotlinx:kotlinx-serialization-json:1.2.0")
#file:CompilerOptions("-Xplugin=/snap/kotlin/current/lib/kotlinx-serialization-compiler-plugin.jar")
import kotlinx.serialization.*
import kotlinx.serialization.json.*
import kotlinx.serialization.encoding.Decoder
#Serializable
data class Node (val nodeContent: List<Info>)
#Serializable(with = InfoSerializer::class)
data class Info (val info: List<Name>)
#Serializable
data class Name (val name: Int)
#Serializer(forClass = Info::class)
object InfoSerializer : KSerializer<Info> {
override fun deserialize(decoder: Decoder): Info {
val json = ((decoder as JsonDecoder).decodeJsonElement() as JsonObject)
return Info(parseInfo(json))
}
private fun parseInfo(json: JsonObject): List<Name> {
val info = json["info"] ?: return emptyList()
return try {
listOf(Json.decodeFromString<Name>(info.toString()))
} catch (e: Exception) {
(info as JsonArray).map { Json.decodeFromString<Name>(it.toString()) }
}
}
}
Usage:
val ss2 = """
{
"nodeContent": [
{
"info":
{"name": 1}
},
{
"info": [
{"name": 1},
{"name": 2}
]
},
{
"info":
{"name": 2}
}
]
}
"""
val h = Json.decodeFromString<Node>(ss2)
println(h)
Result:
Node(nodeContent=[Info(info=[Name(name=1)]), Info(info=[Name(name=1), Name(name=2)]), Info(info=[Name(name=2)])])

Convert List to Json

How do I create a Json (Circe) looking like this:
{
"items": [{
"field1": "somevalue",
"field2": "somevalue2"
},
{
"field1": "abc",
"field2": "123abc"
}]
}
val result = Json.fromFields(List("items" -> ???))
You can do so using Circe's built in list typeclasses for encoding JSON. This code will work:
import io.circe.{Encoder, Json}
import io.circe.syntax._
case class Field(field1: String, field2: String)
object Field {
implicit val encodeFoo: Encoder[Field] = new Encoder[Field] {
final def apply(a: Field): Json = Json.obj(
("field1", Json.fromString(a.field1)),
("field2", Json.fromString(a.field2))
)
}
}
class Encoding(items: List[Field]) {
def getJson: Json = {
Json.obj(
(
"items",
items.asJson
)
)
}
}
If we instantiate an "Encoding" class and call getJson it will give you back the desired JSON. It works because with circe all you need to do to encode a list is provide an encoder for whatever is inside the list. Thus, if we provide an encoder for Field it will encode it inside a list when we call asJson on it.
If we run this:
val items = new Encoding(List(Field("jf", "fj"), Field("jfl", "fjl")))
println(items.getJson)
we get:
{
"items" : [
{
"field1" : "jf",
"field2" : "fj"
},
{
"field1" : "jfl",
"field2" : "fjl"
}
]
}

Alternative to parsing json with option [ either [A,B ] ] in scala

For example, here payload is optional and it has 3 variants:
How can I parse the json with types like option[either[A,B,C]] but to use abstract data type using things sealed trait or sum type?
Below is a minimal example with some boiler plate:
https://scalafiddle.io/sf/K6RUWqk/1
// Start writing your ScalaFiddle code here
val json =
"""[
{
"id": 1,
"payload" : "data"
},
{
"id": 2.1,
"payload" : {
"field1" : "field1",
"field2" : 5,
"field3" : true
}
},
{
"id": 2.2,
"payload" : {
"field1" : "field1",
}
},
{
"id": 3,
payload" : 4
},
{
"id":4,
"
}
]"""
final case class Data(field1: String, field2: Option[Int])
type Payload = Either[String, Data]
final case class Record(id: Int, payload: Option[Payload])
import io.circe.Decoder
import io.circe.generic.semiauto.deriveDecoder
implicit final val dataDecoder: Decoder[Data] = deriveDecoder
implicit final val payloadDecoder: Decoder[Payload] = Decoder[String] either Decoder[Data]
implicit final val recordDecoder: Decoder[Record] = deriveDecoder
val result = io.circe.parser.decode[List[Record]](json)
println(result)
Your code is almost fine, you have just syntax issues in your json and Record.id should be Double instead of Int - because it is how this field present in your json ("id": 2.1). Please, find fixed version below:
val json =
s"""[
{
"id": 1,
"payload" : "data"
},
{
"id": 2.1,
"payload" : {
"field1" : "field1",
"field2" : 5,
"field3" : true
}
},
{
"id": 2.2,
"payload" : {
"field1" : "field1"
}
},
{
"id": 3,
"payload" : 4
},
{
"id": 4
}
]"""
type Payload = Either[String, Data]
final case class Data(field1: String, field2: Option[Int])
final case class Record(id: Double, payload: Option[Payload]) // id is a Double in your json in some cases
import io.circe.Decoder
import io.circe.generic.semiauto.deriveDecoder
implicit val dataDecoder: Decoder[Data] = deriveDecoder
implicit val payloadDecoder: Decoder[Payload] = Decoder[String] either Decoder[Data]
implicit val recordDecoder: Decoder[Record] = deriveDecoder
val result = io.circe.parser.decode[List[Record]](json)
println(result)
Which produced in my case:
Right(List(Record(1.0,Some(Left(data))), Record(2.1,Some(Right(Data(field1,Some(5))))), Record(2.2,Some(Right(Data(field1,None)))), Record(3.0,Some(Left(4))), Record(4.0,None)))
UPDATE:
The more general approach would be to use so-called Sum Types or in simple words - general sealed trait with several different implementations. Please, see for more details next Circe doc page : https://circe.github.io/circe/codecs/adt.html
In your case it can be achieved something like this:
import cats.syntax.functor._
import io.circe.Decoder
import io.circe.generic.semiauto.deriveDecoder
sealed trait Payload
object Payload {
implicit val decoder: Decoder[Payload] = {
List[Decoder[Payload]](
Decoder[StringPayload].widen,
Decoder[IntPayload].widen,
Decoder[ObjectPayload].widen
).reduce(_ or _)
}
}
case class StringPayload(value: String) extends Payload
object StringPayload {
implicit val decoder: Decoder[StringPayload] = Decoder[String].map(StringPayload.apply)
}
case class IntPayload(value: Int) extends Payload
object IntPayload {
implicit val decoder: Decoder[IntPayload] = Decoder[Int].map(IntPayload.apply)
}
case class ObjectPayload(field1: String, field2: Option[Int]) extends Payload
object ObjectPayload {
implicit val decoder: Decoder[ObjectPayload] = deriveDecoder
}
final case class Record(id: Double, payload: Option[Payload])
object Record {
implicit val decoder: Decoder[Record] = deriveDecoder
}
def main(args: Array[String]): Unit = {
val json =
s"""[
{
"id": 1,
"payload" : "data"
},
{
"id": 2.1,
"payload" : {
"field1" : "field1",
"field2" : 5,
"field3" : true
}
},
{
"id": 2.2,
"payload" : {
"field1" : "field1"
}
},
{
"id": 3,
"payload" : "4"
},
{
"id": 4
}
]"""
val result = io.circe.parser.decode[List[Record]](json)
println(result)
}
which produced in my case next output:
Right(List(Record(1.0,Some(StringPayload(data))), Record(2.1,Some(ObjectPayload(field1,Some(5)))), Record(2.2,Some(ObjectPayload(field1,None))), Record(3.0,Some(StringPayload(4))), Record(4.0,None)))
Hope this helps!

Select a particular value from a JSON array

I have a JSON array containing the following details, I would like to extract the text Alignment value of Right and assign it to a val.
"data":[
{
"formatType": "text",
"value": "bgyufcie huis huids hufhsduhfsl hd"
},
{
"formatType": "text size",
"value": 12
},
{
"formatType": "text alignment",
"value" : "right"
}
]
Any thoughts?
Using the Gson library, you can map the json in a Java object.
So, you have to create a class like this:
public class MyObject{
private String formatType;
private String value;
//Constuctors, Getter and Setter...
//.....
//.....
}
After, using the method fromJson you can create an array of MyObject.
Gson gson = new Gson();
MyObject[] array = gson.fromJson(new FileReader("file.json"), MyObject[].class);
You can also use json4s library as shown next:
import org.json4s._
import org.json4s.jackson.JsonMethods._
val json = """{
"data":[
{
"formatType": "text",
"value": "bgyufcie huis huids hufhsduhfsl hd"
},
{
"formatType": "text size",
"value": 12
},
{
"formatType": "text alignment",
"value" : "right"
}
]
}"""
val parsed = parse(json)
val value = (parsed \ "data" \\ classOf[JObject]).filter(m => m("formatType") == "text alignment")(0)("value")
// value: Any = right
The filter (parsed \ "data" \\ classOf[JObject]) extracts all the items into a List of Map i.e:
List(
Map(formatType -> text, value -> bgyufcie huis huids hufhsduhfsl hd),
Map(formatType -> text size, value -> 12), Map(formatType -> text alignment, value -> right)
).
From those we apply the filter filter(m => m("formatType") == "text alignment") to retrieve the record that we really need.
Use Dijon FTW!
Here is a test that demonstrates how easily the "right" value can be found in samples like yours:
import com.github.pathikrit.dijon._
val json = parse(
"""{
|"data":[
| {
| "formatType": "text",
| "value": "bgyufcie huis huids hufhsduhfsl hd"
| },
| {
| "formatType": "text size",
| "value": 12
| },
| {
| "formatType": "text alignment",
| "value" : "right"
| }
|]
|}""".stripMargin)
assert(json.data.toSeq.collect {
case obj if obj.formatType == "text alignment" => obj.value
}.head == "right")
I would use the Jackson library, it is very helpful for parsing JSON. You can read the JSON using an ObjectMapper.
Here is a full tutorial to get you started: https://www.mkyong.com/java/jackson-how-to-parse-json/
create a multiline JSON string, then parse that string directly into a Scala object, use the net.liftweb package to solve this.
import net.liftweb.json._
object SarahEmailPluginConfigTest {
implicit val formats = DefaultFormats
case class Mailserver(url: String, username: String, password: String)
val json = parse(
"""
{
"url": "imap.yahoo.com",
"username": "myusername",
"password": "mypassword"
}
"""
)
def main(args: Array[String]) {
val m = json.extract[Mailserver]
println(m.url)
println(m.username)
println(m.password)
}
}
https://alvinalexander.com/scala/simple-scala-lift-json-example-lift-framework
Reference link

Parse JSON with unknown attributes names into a Case Class

I have the following JSON file to be parsed into a case class:
{
"root": {
"nodes": [{
"id": "1",
"attributes": {
"name": "Node 1",
"size": "3"
}
},
{
"id": "2",
"attributes": {
"value": "4",
"name": "Node 2"
}
}
]
}
}
The problem is that the attributes could have any value inside it: name, size, value, anything ...
At this moment I have defined my case classes:
case class Attributes(
name: String,
size: String,
value: Sting
)
case class Nodes(
id: String,
attributes: Attributes
)
case class Root(
nodes: List[Nodes]
)
case class R00tJsonObject(
root: Root
)
Whats is the best way to deal with this scenario when I can receive any attribute ?
Currently I am using Json4s to handle son files.
Thanks!
Your attributes are arbitrarily many and differently named, but it seems you can store them in a Map[String, String] (at least, if those examples are anything to go by). In this case, using circe-parser (https://circe.github.io/circe/parsing.html), you could simply use code along these lines in order to convert your JSON directly into a simple case-class:
import io.circe._, io.circe.parser._
import io.circe.generic.semiauto._
case class Node(id: String, attributes: Map[String,String])
case class Root(nodes: List[Node])
implicit val nodeDecoder: Decoder[Node] = deriveDecoder[Node]
implicit val nodeEncoder: Encoder[Node] = deriveEncoder[Node]
implicit val rootDecoder: Decoder[Root] = deriveDecoder[Root]
implicit val rootEncoder: Encoder[Root] = deriveEncoder[Root]
def myParse(jsonString: String) = {
val res = parse(jsonString) match {
case Right(json) => {
val cursor = json.hcursor
cursor.get[Root]("root")
}
case _ => Left("Wrong JSON!")
}
println(res)
}
This snippet will print
Right(Root(List(Node(1,Map(name -> Node 1, size -> 3)), Node(2,Map(value -> 4, name -> Node 2)))))
on the console, for the JSON, you've given. (Assuming, the solution doesn't have to be in Json4s.)