Put data into multiple branches of an array: JSON Transformers, Scala Play JSON

I want to add values to specific fields inside every element of a JSON array.
For example, given the value array [4, 2.5, 2.5, 1.5] and the document
json =
{
  "items": [
    {
      "id": 1,
      "name": "one",
      "price": {}
    },
    {
      "id": 2,
      "name": "two"
    },
    {
      "id": 3,
      "name": "three",
      "price": {}
    },
    {
      "id": 4,
      "name": "four",
      "price": {
        "value": 1.5
      }
    }
  ]
}
I want to transform the above JSON into:
{
  "items": [
    {
      "id": 1,
      "name": "one",
      "price": {
        "value": 4
      }
    },
    {
      "id": 2,
      "name": "two",
      "price": {
        "value": 2.5
      }
    },
    {
      "id": 3,
      "name": "three",
      "price": {
        "value": 2.5
      }
    },
    {
      "id": 4,
      "name": "four",
      "price": {
        "value": 1.5
      }
    }
  ]
}
Any suggestions on how I can achieve this? My goal is to put values inside specific fields of a JSON array. I am using the Play JSON library throughout my application. What other options do I have besides JSON transformers?

You may use a simple transformation like this:
import play.api.libs.json._
import play.api.libs.json.Reads._

val prices = List[Double](4, 2.5, 2.5, 1.5).map(price => Json.obj("price" -> Json.obj("value" -> price)))
val t = (__ \ "items").json.update(
  of[List[JsObject]]
    .map(_.zip(prices).map { case (item, price) => item ++ price })
    .map(JsArray)
)
json.transform(t) // json is the document from the question
res5: play.api.libs.json.JsResult[play.api.libs.json.JsObject] = JsSuccess({"items":[{"id":1,"name":"one","price":{"value":4}},{"id":2,"name":"two","price":{"value":2.5}},{"id":3,"name":"three","price":{"value":2.5}},{"id":4,"name":"four","price":{"value":1.5}}]},/items)

I suggest using classes, but I'm not sure this fits your project since it's hard to guess what the rest of your code looks like.
I construct the Items manually here for simplicity; you could build them from the JSON with the Play JSON library instead. :)
class Price(val value: Double) {
  override def toString = s"{value:${value}}"
}

class Item(val id: Int, val name: String, val price: Price) {
  def this(id: Int, name: String) {
    this(id, name, null)
  }
  override def toString = s"{id:${id}, name:${name}, price:${price}}"
}

val price = Array(4, 2.5, 2.5, 1.5)

/** You might convert the JSON data to List[Item] using the JSON library instead. */
val items: List[Item] = List(
  new Item(1, "one"),
  new Item(2, "two"),
  new Item(3, "three"),
  new Item(4, "four", new Price(1.5))
)

val valueMappedItems = items.zipWithIndex.map { case (item, index) =>
  if (item.price == null) {
    new Item(item.id, item.name, new Price(price(index)))
  } else {
    item
  }
}
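
If you would rather avoid transformers altogether, one option is to rebuild the array with plain JsObject manipulation. A minimal sketch, assuming Play JSON; fillPrices and values are names made up for illustration:

import play.api.libs.json._

val values = List(4.0, 2.5, 2.5, 1.5)

// Rebuild the "items" array, pairing each element with its replacement price.
def fillPrices(json: JsValue): JsValue = {
  val items = (json \ "items").as[Seq[JsObject]]
  val filled = items.zip(values).map { case (item, v) =>
    (item \ "price" \ "value").asOpt[Double] match {
      case Some(_) => item // keep an already-populated price
      case None    => item + ("price" -> Json.obj("value" -> v))
    }
  }
  Json.obj("items" -> filled)
}

fillPrices(json) then yields the target document without any transformer machinery.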

Related

How to update nested JSON array in scala

I'm a newbie to Scala/Play and I've been trying to update (add a new id to) the query array in sections[1] of the JSON below, but I've had no success traversing the JSON, as I have little knowledge of transformers and how to use them.
"definitions": [
{
"sections": [
{
"priority": 1,
"content": {
"title": "Driver",
"links": [
{
"url": "https://blabla.com",
"text": "See all"
}
]
},
"SearchQuery": {
"options": {
"aggregate": true,
"size": 20,
},
"query": "{\"id\":{\"include\":[\"0wxZ4Nr2\", \"0wxZbNr2\", \"6WZOPMw1\"}}"
}
},
{
"priority": 2,
"content": {
"title": "Deliver",
"links": [
{
"url": "https://blabla.com",
"text": "See all"
}
]
},
"SearchQuery": {
"options": {
"aggregate": true,
"size": 20,
},
"query": "{\"id\":{\"include\":[\"2W12Q2wq\", \"Nwq09lW3\", \"QweNN2d9\"]}}"
}
}
]
}
Any suggestions on how I can achieve this? My goal is to put values inside specific fields of the JSON array. I am using the Play JSON library throughout my application.
As you noticed, since you use Play JSON you can use JSON Transformers.
Updating a single field would work like this (array elements are selected with an index, e.g. (1) for the second element):
val queryUpdater = (((__ \ "definitions")(0) \ "sections")(1) \ "SearchQuery" \ "query").json.update(
  of[JsString].map {
    case JsString(value) =>
      val newValue: String = ... // calculate new value
      JsString(newValue)
  }
)
json.transform(queryUpdater)
If you needed to update all of the queries, it would look more like this:
val updateQuery = (__ \ "SearchQuery" \ "query").json.update(
  of[JsString].map {
    case JsString(value) =>
      val newValue: String = ... // calculate new value
      JsString(newValue)
  }
)
val updateQueries = ((__ \ "definitions")(0) \ "sections").json.update(
  of[JsArray].map {
    case JsArray(arr) =>
      // transform returns a JsResult, so unwrap it (falling back to the original element)
      JsArray(arr.map(section => section.transform(updateQuery).getOrElse(section)))
  }
)
json.transform(updateQueries)
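
To fill in the "calculate new value" placeholder for the question's actual goal (appending a new id to the include list): the query field is itself a JSON string, so one possible sketch parses it, extends the array, and serializes it back. The helper name addIdToQuery and the appended id are made up for illustration.

import play.api.libs.json._

// Hypothetical helper: append one id to the "include" array inside the query string.
def addIdToQuery(query: String, newId: String): String = {
  val parsed  = Json.parse(query)
  val include = (parsed \ "id" \ "include").asOpt[Seq[String]].getOrElse(Seq.empty)
  Json.stringify(Json.obj("id" -> Json.obj("include" -> (include :+ newId))))
}

Inside the updaters above, the placeholder would then become: val newValue = addIdToQuery(value, "someNewId").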

Spark: Splitting JSON strings into separate dataframe columns

I'm loading the JSON string below into a DataFrame column.
{
  "title": {
    "titleid": "222",
    "titlename": "ABCD"
  },
  "customer": {
    "customerDetail": {
      "customerid": 878378743,
      "customerstatus": "ACTIVE",
      "customersystems": {
        "customersystem1": "SYS01",
        "customersystem2": null
      },
      "sysid": null
    },
    "persons": [{
      "personid": "123",
      "personname": "IIISKDJKJSD"
    },
    {
      "personid": "456",
      "personname": "IUDFIDIKJK"
    }]
  }
}
// js supplies the schema; df is an existing DataFrame with a string column "value"
val js = spark.read.json("./src/main/resources/json/customer.txt")
println(js.schema)
val newDF = df.select(from_json($"value", js.schema).as("parsed_value"))
newDF.selectExpr("parsed_value.customer.*").show(false)
//Schema:
StructType(StructField(customer,StructType(StructField(customerDetail,StructType(StructField(customerid,LongType,true), StructField(customerstatus,StringType,true), StructField(customersystems,StructType(StructField(customersystem1,StringType,true), StructField(customersystem2,StringType,true)),true), StructField(sysid,StringType,true)),true), StructField(persons,ArrayType(StructType(StructField(personid,StringType,true), StructField(personname,StringType,true)),true),true)),true), StructField(title,StructType(StructField(titleid,StringType,true), StructField(titlename,StringType,true)),true))
//Output:
+------------------------------+---------------------------------------+
|customerDetail |persons |
+------------------------------+---------------------------------------+
|[878378743, ACTIVE, [SYS01,],]|[[123, IIISKDJKJSD], [456, IUDFIDIKJK]]|
+------------------------------+---------------------------------------+
My question: is there a way to split the nested keys out as separate DataFrame columns, as shown below,
while keeping the array columns as-is, since I need exactly one record per JSON string?
Example for the customer column:
customer.customerDetail.customerid,customer.customerDetail.customerstatus,customer.customerDetail.customersystems.customersystem1,customer.customerDetail.customersystems.customersystem2,customer.customerDetail.sysid,customer.persons
878378743,ACTIVE,SYS01,null,null,{"persons": [ { "personid": "123", "personname": "IIISKDJKJSD" }, { "personid": "456", "personname": "IUDFIDIKJK" } ] }
Edited answer:
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, StructType}

val df = spark.read.json("your/path/data.json")

// Recursively walk the schema, collecting the full dotted path of every leaf field.
// Arrays are treated as leaves, so persons stays a single column.
def collectFields(field: String, sc: DataType): Seq[String] = {
  sc match {
    case sf: StructType => sf.fields.flatMap(f => collectFields(field + "." + f.name, f.dataType))
    case _ => Seq(field)
  }
}

// drop the leading "." produced by the empty root prefix
val fields = collectFields("", df.schema).map(_.tail)
df.select(fields.map(col): _*).show(false)
Output :
+----------+--------------+---------------+---------------+-----+-------------------------------------+-------+---------+
|customerid|customerstatus|customersystem1|customersystem2|sysid|persons |titleid|titlename|
+----------+--------------+---------------+---------------+-----+-------------------------------------+-------+---------+
|878378743 |ACTIVE |SYS01 |null |null |[[123,IIISKDJKJSD], [456,IUDFIDIKJK]]|222 |ABCD |
+----------+--------------+---------------+---------------+-----+-------------------------------------+-------+---------+
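
Since the goal was to keep persons as a single column while flattening the rest, one possible refinement (a sketch, not part of the answer above; it assumes Spark 2.3+, where to_json accepts array columns) serializes the array back to a JSON string explicitly:

import org.apache.spark.sql.functions.{col, to_json}

// Flatten the scalar fields, but serialize the persons array to a JSON string
// so the result stays one row per input record.
val flattened = df.select(
  col("customer.customerDetail.customerid"),
  col("customer.customerDetail.customerstatus"),
  col("customer.customerDetail.customersystems.customersystem1"),
  col("customer.customerDetail.customersystems.customersystem2"),
  col("customer.customerDetail.sysid"),
  to_json(col("customer.persons")).as("persons")
)
flattened.show(false)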
You can also try this with the help of RDDs: define the column names for an empty RDD, read the JSON and convert it to a DataFrame with .toDF(), then feed the rows into that RDD.

How to stream insert JSON array to BigQuery table in Apache Beam

My Apache Beam application receives a message as a JSON array, but I need to insert each element as a row into a BigQuery table. How can I support this use case in Apache Beam? Can I split the array and insert the elements into the table one by one?
JSON message example:
[
  {"id": 1, "name": "post1", "price": 10},
  {"id": 2, "name": "post2", "price": 20},
  {"id": 3, "name": "post3", "price": 30}
]
BigQuery table schema:
[
  {
    "mode": "REQUIRED",
    "name": "id",
    "type": "INT64"
  },
  {
    "mode": "REQUIRED",
    "name": "name",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "price",
    "type": "INT64"
  }
]
Here is my solution. I convert the JSON string to a List once, then c.output each element one by one. My code is in Scala, but you can do the same thing in Java.
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import org.apache.beam.sdk.io.kafka.KafkaRecord
import org.apache.beam.sdk.transforms.DoFn
import org.apache.beam.sdk.transforms.DoFn.ProcessElement
import org.slf4j.{Logger, LoggerFactory}

case class MyTranscationRecord(id: String, name: String, price: Int)
case class MyTranscation(recordList: List[MyTranscationRecord])

// Emits one MyTranscationRecord per element of the incoming JSON array.
class ConvertJSONTextToMyRecord extends DoFn[KafkaRecord[java.lang.Long, String], MyTranscationRecord]() {
  private val logger: Logger = LoggerFactory.getLogger(classOf[ConvertJSONTextToMyRecord])

  @ProcessElement
  def processElement(c: ProcessContext): Unit = {
    try {
      val mapper: ObjectMapper = new ObjectMapper()
        .registerModule(DefaultScalaModule)
      val messageText = c.element.getKV.getValue
      val transaction: MyTranscation = mapper.readValue(messageText, classOf[MyTranscation])
      logger.info(s"successfully converted to an EPC transaction = $transaction")
      for (record <- transaction.recordList) {
        c.output(record)
      }
    } catch {
      case e: Exception =>
        val message = e.getLocalizedMessage + e.getStackTrace
        logger.error(message)
    }
  }
}
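
To finish the pipeline, the usual pattern is to map each record to a TableRow and hand the resulting PCollection to BigQueryIO. A rough sketch of that last step, assuming the Beam Java SDK's BigQueryIO used from Scala and a made-up table name (the table is expected to already have the schema above):

import com.google.api.services.bigquery.model.TableRow
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO
import org.apache.beam.sdk.transforms.{DoFn, ParDo}
import org.apache.beam.sdk.transforms.DoFn.ProcessElement
import org.apache.beam.sdk.values.PCollection

// Convert each parsed record into a BigQuery TableRow.
class ToTableRow extends DoFn[MyTranscationRecord, TableRow]() {
  @ProcessElement
  def processElement(c: ProcessContext): Unit = {
    val r = c.element
    c.output(new TableRow().set("id", r.id).set("name", r.name).set("price", r.price))
  }
}

// records is the PCollection[MyTranscationRecord] emitted by ConvertJSONTextToMyRecord.
def writeToBigQuery(records: PCollection[MyTranscationRecord]): Unit = {
  records
    .apply(ParDo.of(new ToTableRow))
    .apply(BigQueryIO.writeTableRows().to("my-project:my_dataset.my_table")) // hypothetical table
}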

Get _id from MongoDB using Scala

For a given JSON document, how do I get the _id so I can use it as an id when inserting into another JSON document?
I tried to get the ID as shown below, but it does not return the correct results.
private def getModelRunId(): List[String] = {
  val resultsCursor: List[DBObject] =
    modelRunResultsCollection.find(MongoDBObject.empty, MongoDBObject(FIELD_ID -> 1)).toList
  println("resultsCursor >>>>>>>>>>>>>>>>>> " + resultsCursor)
  resultsCursor.map(x => (Json.parse(x.toString()) \ FIELD_ID).asOpt[String]).flatten
}
{
  "_id": ObjectId("5269723bd516ec3a69f3639e"),
  "modelRunId": ObjectId("5269723ad516ec3a69f3639d"),
  "results": [
    {
      "ClaimId": "526971f5b5b8b9148404623a",
      "pricingResult": {
        "TxId": 0,
        "ClaimId": "Large_Batch_1",
        "Errors": [],
        "Disposition": [
          {
            "GroupId": 1,
            "PriceAmt": 20,
            "Status": "Priced Successfully",
            "ReasonCode": 0,
            "Reason": "RmbModel(PAM_DC_1):ProgramNode(Validation CPG):ServiceGroupNode(Medical Services):RmbTerm(RT)",
            "PricingMethodologyId": 2,
            "Lines": [
              {
                "Id": 1
              }
            ]
          }
        ]
      }
    },
If you want to find ObjectIds:
import com.mongodb.casbah.Imports._
collection.find(MongoDBObject(/*query*/)).map(_._id)
If you want to query by id:
collection.findOneByID(/*id*/)
I suppose you are using Casbah, the official MongoDB driver for Scala.
You just need to modify the map function:
resultsCursor.map { x => x.as[org.bson.types.ObjectId](FIELD_ID) }
Casbah does the deserialization from BSON to Scala objects, so you don't have to do it yourself!
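
Putting that together, the original method becomes something like this sketch (untested; it assumes FIELD_ID is "_id" and uses ObjectId.toString to get the hex string, since the method returns List[String]):

import org.bson.types.ObjectId

private def getModelRunId(): List[String] = {
  modelRunResultsCollection
    .find(MongoDBObject.empty, MongoDBObject(FIELD_ID -> 1))
    .toList
    .map(x => x.as[ObjectId](FIELD_ID).toString) // hex string form of the ObjectId
}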

Groovy JSONBuilder issues

I'm trying to use JsonBuilder with Groovy to dynamically generate JSON. I want to create a JSON block like:
{
  "type": {
    "__type": "urn",
    "value": "myCustomValue1"
  },
  "urn": {
    "__type": "urn",
    "value": "myCustomValue2"
  },
  "date": {
    "epoch": 1265662800000,
    "str": "2010-02-08T21:00:00Z"
  },
  "metadata": [{
    "ratings": [{
      "rating": "NR",
      "scheme": "eirin",
      "_type": {
        "__type": "urn",
        "value": "myCustomValue3"
      }
    }],
    "creators": ["Jim", "Bob", "Joe"]
  }]
}
I've written:
def addUrn(parent, type, urnVal) {
  parent."$type" {
    __type "urn"
    "value" urnVal
  }
}

String getEpisode(String myCustomVal1, String myCustomVal2, String myCustomVal3) {
  def builder = new groovy.json.JsonBuilder()
  def root = builder {
    addUrn(builder, "type", myCustomVal1)
    addUrn(builder, "urn", "some:urn:$myCustomVal2")
    "date" {
      epoch 1265662800000
      str "2010-02-08T21:00:00Z"
    }
    "metadata" ({
      ratings ({
        rating "G"
        scheme "eirin"
        addUrn(builder, "_type", "$myCustomVal3")
      })
      creators "Jim", "Bob", "Joe"
    })
  }
  return root.toString();
}
But I've run into the following issues:

1. Whenever I call addUrn, nothing is returned in the string. Am I misunderstanding how to use methods in Groovy?
2. None of the values are encapsulated in double (or single) quotes in the returned string.
3. Any time I use a {, I get a '_getEpisode_closure2_closure2#(insert hex)' in the returned value.

Is there something wrong with my syntax? Or can someone point me to an example/tutorial that uses methods and/or goes beyond simple values (e.g. nested values within arrays)?
NOTE: This is a watered-down example, but I tried to maintain the complexity around the areas that were giving me issues.
1. You have to use delegate in the addUrn method instead of passing in the builder you are working on.
2. It is because you are calling toString() or toPrettyString() on root instead of builder.
3. Solved if #2 is followed.
Sample:
def builder = new groovy.json.JsonBuilder()
def root = builder {
  name "Devin"
  data {
    type "Test"
    note "Dummy"
  }
  addUrn(delegate, "gender", "male")
  addUrn(delegate, "zip", "43230")
}

def addUrn(parent, type, urnVal) {
  parent."$type" {
    __type "urn"
    "value" urnVal
  }
}

println builder.toPrettyString()
Output:
{
  "name": "Devin",
  "data": {
    "type": "Test",
    "note": "Dummy"
  },
  "gender": {
    "__type": "urn",
    "value": "male"
  },
  "zip": {
    "__type": "urn",
    "value": "43230"
  }
}