I would like to create a json string from several fields in scala.
the different fields are retrieved from a text file:
import scala.io.Source
val source = Source.fromFile("D:/Web/Data/QueueFile/FromCarrier/00001709.status.201808010837422.txt")
val lines = source.getLines()
while (lines.hasNext){
val newLine = lines.next()
println(newLine)
val arrayLine = newLine.split(";").toArray
val MatchingField = arrayLine(1)
val TStatus_Code = arrayLine(0)
val Trace_Date = arrayLine(2)
println("MatchingField: " + MatchingField + " - TStatus_Code: " + TStatus_Code + " - Trace_Date: " + Trace_Date)
if(TStatus_Code.nonEmpty && Trace_Date.nonEmpty){
println("TStatus_Code and Trace_Date exist, we should build the json structure")
**val MQS.Trace.Trace_Date = Trace_Date
val MQS.Trace.TStatus_Code = TStatus_Code
val MQS.MatchingField = MatchingField
val MQS.Provider_ID = 1
val MQS.Customer_ID = 989
val QMessage = SerializeJson(MQS)**
}
else
println("TStatus_Code and/or Trace_Date does not exist, don't generate the json structure")
}
The code between ** works in another langage (coldfusion). I would like to do the same thing in scala.
Any help would be much appreciated. Thanks
There are two parts to this question:
1. How to represent the data
2. How to serialize it to json
For the first part, you can create a case class and fill it with your data, e.g.:
case class MQS(traceDate: String, tStatusCode: String, matchingField: String, providerId: Int, customerId: Int)
val mqs = MQS(Trace_Date, TStatus_Code, MatchingField, 1, 989)
For the second part you can use any of many json serializers. A simple example would be with json4s (http://json4s.org/):
import org.json4s.DefaultFormats
import org.json4s.native.Serialization.write
write(mqs)(DefaultFormats)
EDIT
The case class needs to be defined in an object. Following is a full example:
import org.json4s.DefaultFormats
import org.json4s.native.Serialization.write
object App {
case class MQS(traceDate: String, tStatusCode: String, matchingField: String, providerId: Int, customerId: Int)
def main(args: Array[String]): Unit = {
println("Hello, world!")
val mqs = MQS("Trace_Date", "TStatus_Code", "MatchingField", 1, 989)
println(write(mqs)(DefaultFormats))
}
}
The case class definition cannot be inside the function main (or any method/class) but instead must be in an object.
Related
Scala - Couldn't remove double quotes for "{}" braces while building Json
import scala.util.Random
import math.Ordered.orderingToOrdered
import math.Ordering.Implicits.infixOrderingOps
import play.api.libs.json._
import play.api.libs.json.Writes
import play.api.libs.json.Json.JsValueWrapper
val data1 = (1 to 2)
.map {r => Json.toJson(Map(
"name" -> Json.toJson(s"Perftest${Random.alphanumeric.take(6).mkString}"),
"domainId"->Json.toJson("343RDFDGF4RGGFG"),
"value" ->Json.toJson("{}")))}
val data2 = Json.toJson(data1)
println(data2)
Result :
[{"name":"PerftestpXI1ID","domainId":"343RDFDGF4RGGFG","value":"{}"},{"name":"PerftestHoZSQR","domainId":"343RDFDGF4RGGFG","value":"{}"}]
Expected :
"value":{}
[{"name":"PerftestpXI1ID","domainId":"343RDFDGF4RGGFG","value":{}},{"name":"PerftestHoZSQR","domainId":"343RDFDGF4RGGFG","value":{}}]
Please suggest a solution
You are giving it a String so it is creating a string in JSON. What you actually want is an empty dictionary, which is a Map in Scala:
val data1 = (1 to 2)
.map {r => Json.toJson(Map(
"name" -> Json.toJson(s"Perftest${Random.alphanumeric.take(6).mkString}"),
"domainId"->Json.toJson("343RDFDGF4RGGFG"),
"value" ->Json.toJson(Map.empty[String, String])))}
More generally you should create a case class for the data and create a custom Writes implementation for that class so that you don't have to call Json.toJson on every value.
Here is how to do the conversion using only a single Json.toJson call:
import play.api.libs.json.Json
case class MyData(name: String, domainId: String, value: Map[String,String])
implicit val fmt = Json.format[MyData]
val data1 = (1 to 2)
.map { r => new MyData(
s"Perftest${Random.alphanumeric.take(6).mkString}",
"343RDFDGF4RGGFG",
Map.empty
)
}
val data2 = Json.toJson(data1)
println(data2)
The value field can be a standard type such as Boolean or Double. It could also be another case class to create nested JSON as long as there is a similar Json.format line for the new type.
More complex JSON can be generated by using a custom Writes (and Reads) implementation as described in the documentation.
How can I write circe decoder for class
case class KeyValueRow(count: Int, key: String)
where json contains field "count" (Int) and some extra string-field (name of this field may be various, like "url", "city", whatever)?
{"count":974989,"url":"http://google.com"}
{"count":1234,"city":"Rome"}
You can do what you need like this:
import io.circe.syntax._
import io.circe.parser._
import io.circe.generic.semiauto._
import io.circe.{ Decoder, Encoder, HCursor, Json, DecodingFailure}
object stuff{
case class KeyValueRow(count: Int, key: String)
implicit def jsonEncoder : Encoder[KeyValueRow] = deriveEncoder
implicit def jsonDecoder : Decoder[KeyValueRow] = Decoder.instance{ h =>
(for{
keys <- h.keys
key <- keys.dropWhile(_ == "count").headOption
} yield {
for{
count <- h.get[Int]("count")
keyValue <- h.get[String](key)
} yield KeyValueRow(count.toInt, keyValue)
}).getOrElse(Left(DecodingFailure("Not a valid KeyValueRow", List())))
}
}
import stuff._
val a = KeyValueRow(974989, "www.google.com")
println(a.asJson.spaces2)
val test1 = """{"count":974989,"url":"http://google.com"}"""
val test2 = """{"count":1234,"city":"Rome", "will be dropped": "who cares"}"""
val parsedTest1 = parse(test1).flatMap(_.as[KeyValueRow])
val parsedTest2 = parse(test2).flatMap(_.as[KeyValueRow])
println(parsedTest1)
println(parsedTest2)
println(parsedTest1.map(_.asJson.spaces2))
println(parsedTest2.map(_.asJson.spaces2))
Scalafiddle: link here
As I mentioned in the comment above, keep in mind that the that if you decode some json, and then re-encode it, the result will be different from the initial input. To fix that, you would need to keep track of the original name of the key field.
Is there a simple way to converting a given Row object to json?
Found this about converting a whole Dataframe to json output:
Spark Row to JSON
But I just want to convert a one Row to json.
Here is pseudo code for what I am trying to do.
More precisely I am reading json as input in a Dataframe.
I am producing a new output that is mainly based on columns, but with one json field for all the info that does not fit into the columns.
My question what is the easiest way to write this function: convertRowToJson()
def convertRowToJson(row: Row): String = ???
def transformVenueTry(row: Row): Try[Venue] = {
Try({
val name = row.getString(row.fieldIndex("name"))
val metadataRow = row.getStruct(row.fieldIndex("meta"))
val score: Double = calcScore(row)
val combinedRow: Row = metadataRow ++ ("score" -> score)
val jsonString: String = convertRowToJson(combinedRow)
Venue(name = name, json = jsonString)
})
}
Psidom's Solutions:
def convertRowToJSON(row: Row): String = {
val m = row.getValuesMap(row.schema.fieldNames)
JSONObject(m).toString()
}
only works if the Row only has one level not with nested Row. This is the schema:
StructType(
StructField(indicator,StringType,true),
StructField(range,
StructType(
StructField(currency_code,StringType,true),
StructField(maxrate,LongType,true),
StructField(minrate,LongType,true)),true))
Also tried Artem suggestion, but that did not compile:
def row2DataFrame(row: Row, sqlContext: SQLContext): DataFrame = {
val sparkContext = sqlContext.sparkContext
import sparkContext._
import sqlContext.implicits._
import sqlContext._
val rowRDD: RDD[Row] = sqlContext.sparkContext.makeRDD(row :: Nil)
val dataFrame = rowRDD.toDF() //XXX does not compile
dataFrame
}
You can use getValuesMap to convert the row object to a Map and then convert it JSON:
import scala.util.parsing.json.JSONObject
import org.apache.spark.sql._
val df = Seq((1,2,3),(2,3,4)).toDF("A", "B", "C")
val row = df.first() // this is an example row object
def convertRowToJSON(row: Row): String = {
val m = row.getValuesMap(row.schema.fieldNames)
JSONObject(m).toString()
}
convertRowToJSON(row)
// res46: String = {"A" : 1, "B" : 2, "C" : 3}
I need to read json input and produce json output.
Most fields are handled individually, but a few json sub objects need to just be preserved.
When Spark reads a dataframe it turns a record into a Row. The Row is a json like structure. That can be transformed and written out to json.
But I need to take some sub json structures out to a string to use as a new field.
This can be done like this:
dataFrameWithJsonField = dataFrame.withColumn("address_json", to_json($"location.address"))
location.address is the path to the sub json object of the incoming json based dataframe. address_json is the column name of that object converted to a string version of the json.
to_json is implemented in Spark 2.1.
If generating it output json using json4s address_json should be parsed to an AST representation otherwise the output json will have the address_json part escaped.
Pay attention scala class scala.util.parsing.json.JSONObject is deprecated and not support null values.
#deprecated("This class will be removed.", "2.11.0")
"JSONFormat.defaultFormat doesn't handle null values"
https://issues.scala-lang.org/browse/SI-5092
JSon has schema but Row doesn't have a schema, so you need to apply schema on Row & convert to JSon. Here is how you can do it.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
def convertRowToJson(row: Row): String = {
val schema = StructType(
StructField("name", StringType, true) ::
StructField("meta", StringType, false) :: Nil)
return sqlContext.applySchema(row, schema).toJSON
}
Essentially, you can have a dataframe which contains just one row. Thus, you can try to filter your initial dataframe and then parse it to json.
I had the same issue, I had parquet files with canonical schema (no arrays), and I only want to get json events. I did as follows, and it seems to work just fine (Spark 2.1):
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.{DataFrame, Dataset, Row}
import scala.util.parsing.json.JSONFormat.ValueFormatter
import scala.util.parsing.json.{JSONArray, JSONFormat, JSONObject}
def getValuesMap[T](row: Row, schema: StructType): Map[String,Any] = {
schema.fields.map {
field =>
try{
if (field.dataType.typeName.equals("struct")){
field.name -> getValuesMap(row.getAs[Row](field.name), field.dataType.asInstanceOf[StructType])
}else{
field.name -> row.getAs[T](field.name)
}
}catch {case e : Exception =>{field.name -> null.asInstanceOf[T]}}
}.filter(xy => xy._2 != null).toMap
}
def convertRowToJSON(row: Row, schema: StructType): JSONObject = {
val m: Map[String, Any] = getValuesMap(row, schema)
JSONObject(m)
}
//I guess since I am using Any and not nothing the regular ValueFormatter is not working, and I had to add case jmap : Map[String,Any] => JSONObject(jmap).toString(defaultFormatter)
val defaultFormatter : ValueFormatter = (x : Any) => x match {
case s : String => "\"" + JSONFormat.quoteString(s) + "\""
case jo : JSONObject => jo.toString(defaultFormatter)
case jmap : Map[String,Any] => JSONObject(jmap).toString(defaultFormatter)
case ja : JSONArray => ja.toString(defaultFormatter)
case other => other.toString
}
val someFile = "s3a://bucket/file"
val df: DataFrame = sqlContext.read.load(someFile)
val schema: StructType = df.schema
val jsons: Dataset[JSONObject] = df.map(row => convertRowToJSON(row, schema))
if you are iterating through an data frame , you can directly convert the data frame to a new dataframe with json object inside and iterate that
val df_json = df.toJSON
I combining the suggestion from: Artem, KiranM and Psidom. Did a lot of trails and error and came up with this solutions that I tested for nested structures:
def row2Json(row: Row, sqlContext: SQLContext): String = {
import sqlContext.implicits
val rowRDD: RDD[Row] = sqlContext.sparkContext.makeRDD(row :: Nil)
val dataframe = sqlContext.createDataFrame(rowRDD, row.schema)
dataframe.toJSON.first
}
This solution worked, but only while running in driver mode.
I want to have different names of fields in my case classes and in my JSON, therefore I need a comfortable way of renaming in both, encoding and decoding.
Does someone have a good solution ?
You can use Custom key mappings via annotations. The most generic way is the JsonKey annotation from io.circe.generic.extras._. Example from the docs:
import io.circe.generic.extras._, io.circe.syntax._
implicit val config: Configuration = Configuration.default
#ConfiguredJsonCodec case class Bar(#JsonKey("my-int") i: Int, s: String)
Bar(13, "Qux").asJson
// res5: io.circe.Json = JObject(object[my-int -> 13,s -> "Qux"])
This requires the package circe-generic-extras.
Here's a code sample for Decoder (bit verbose since it won't remove the old field):
val pimpedDecoder = deriveDecoder[PimpClass].prepare {
_.withFocus {
_.mapObject { x =>
val value = x("old-field")
value.map(x.add("new-field", _)).getOrElse(x)
}
}
}
implicit val decodeFieldType: Decoder[FieldType] =
Decoder.forProduct5("nth", "isVLEncoded", "isSerialized", "isSigningField", "type")
(FieldType.apply)
This is a simple way if you have lots of different field names.
https://circe.github.io/circe/codecs/custom-codecs.html
You can use the mapJson function on Encoder to derive an encoder from the generic one and remap your field name.
And you can use the prepare function on Decoder to transform the JSON passed to a generic Decoder.
You could also write both from scratch, but it may be a ton of boilerplate, those solutions should both be a handful of lines max each.
The following function can be used to rename a circe's JSON field:
import io.circe._
object CirceUtil {
def renameField(json: Json, fieldToRename: String, newName: String): Json =
(for {
value <- json.hcursor.downField(fieldToRename).focus
newJson <- json.mapObject(_.add(newName, value)).hcursor.downField(fieldToRename).delete.top
} yield newJson).getOrElse(json)
}
You can use it in an Encoder like so:
implicit val circeEncoder: Encoder[YourCaseClass] = deriveEncoder[YourCaseClass].mapJson(
CirceUtil.renameField(_, "old_field_name", "new_field_name")
)
Extra
Unit tests
import io.circe.parser._
import org.specs2.mutable.Specification
class CirceUtilSpec extends Specification {
"CirceUtil" should {
"renameField" should {
"correctly rename field" in {
val json = parse("""{ "oldFieldName": 1 }""").toOption.get
val resultJson = CirceUtil.renameField(json, "oldFieldName", "newFieldName")
resultJson.hcursor.downField("oldFieldName").focus must beNone
resultJson.hcursor.downField("newFieldName").focus must beSome
}
"return unchanged json if field is not found" in {
val json = parse("""{ "oldFieldName": 1 }""").toOption.get
val resultJson = CirceUtil.renameField(json, "nonExistentField", "newFieldName")
resultJson must be equalTo json
}
}
}
}
Suppose I have the following structure I want to serialize in Json:
case class A(name:String)
case class B(age:Int)
case class C(id:String, a:A,b:B)
I'm using lift-json "write(...)" , but I want to flatten the structure so instead of:
{ id:xx , a:{ name:"xxxx" }, b:{ age:xxxx } }
I want to get:
{ id:xx , name:"xxxx" , age:xxxx }
Use transform method on JValue:
import net.liftweb.json._
import net.liftweb.json.JsonAST._
implicit val formats = net.liftweb.json.DefaultFormats
val c1 = C("c1", A("some-name"), B(42))
val c1flat = Extraction decompose c1 transform { case JField(x, JObject(List(jf))) if x == "a" || x == "b" => jf }
val c1str = Printer pretty (JsonAST render c1flat)
Result:
c1str: String =
{
"id":"c1",
"name":"some-name",
"age":42
}
If A and B have multiple fields you will want a slightly different approach:
import net.liftweb.json._
import net.liftweb.json.JsonAST._
import net.liftweb.json.JsonDSL._
implicit val formats = net.liftweb.json.DefaultFormats
implicit def cToJson(c: C): JValue = (("id" -> c.id):JValue) merge (Extraction decompose c.a) merge (Extraction decompose c.b)
val c1 = C("c1", A("a name", "a nick", "an alias"), B(11, 111, 1111))
Printer pretty (JsonAST render c1)
res0: String =
{
"id":"c1",
"name":"a name",
"nick":"a nick",
"alias":"an alias",
"age":11,
"weight":111,
"height":1111
}
You can declare a new object D with fields (id, name, age) and load the values you want in the constructor then serialize that class to json. There may be another way but this way will work.