How to stream insert JSON array to BigQuery table in Apache Beam - json

My apache beam application receives a message in JSON array but insert each row to a BigQuery table. How can I support this usecase in ApacheBeam? Can I split each row and insert it to table one by one?
JSON message example:
[
{"id": 1, "name": "post1", "price": 10},
{"id": 2, "name": "post2", "price": 20},
{"id": 3, "name": "post3", "price": 30}
]
BigQuery table schema:
[
{
"mode": "REQUIRED",
"name": "id",
"type": "INT64"
},
{
"mode": "REQUIRED",
"name": "name",
"type": "STRING"
},
{
"mode": "REQUIRED",
"name": "price",
"type": "INT64"
}
]

Here is my solution. I converted JSON string to List once then c.output one by one. My code in in Scala but you can do the same thing in Java.
case class MyTranscationRecord(id: String, name: String, price: Int)
case class MyTranscation(recordList: List[MyTranscationRecord])
class ConvertJSONTextToMyRecord extends DoFn[KafkaRecord[java.lang.Long, String], MyTranscation]() {
private val logger: Logger = LoggerFactory.getLogger(classOf[ConvertJSONTextToMyRecord])
#ProcessElement
def processElement(c: ProcessContext): Unit = {
try {
val mapper: ObjectMapper = new ObjectMapper()
.registerModule(DefaultScalaModule)
val messageText = c.element.getKV.getValue
val transaction: MyRecord = mapper.readValue(messageText, classOf[MyTranscation])
logger.info(s"successfully converted to an EPC transaction = $transaction")
for (record <- transaction.recordList) {
c.output(record)
}
} catch {
case e: Exception =>
val message = e.getLocalizedMessage + e.getStackTrace
logger.error(message)
}
}
}

Related

Scala Json parsing

This is the input json which I am getting which is nested json structure and I don't want to map directly to class, need custom parsing of some the objects as I have made the case classes
{
"uuid": "b547e13e-b32d-11ec-b909-0242ac120002",
"log": {
"Response": {
"info": {
"receivedTime": "2022-02-09T00:30:00Z",
"isSecure": "Yes",
"Data": [{
"id": "75641",
"type": "vendor",
"sourceId": "3",
"size": 53
}],
"Group": [{
"isActive": "yes",
"metadata": {
"owner": "owner1",
"compressionType": "gz",
"comments": "someComment",
"createdDate": "2022-01-11T11:00:00Z",
"updatedDate": "2022-01-12T14:17:55Z"
},
"setId": "1"
},
{
"isActive": "yes",
"metadata": {
"owner": "owner12",
"compressionType": "snappy",
"comments": "someComment",
"createdDate": "2022-01-11T11:00:00Z",
"updatedDate": "2022-01-12T14:17:55Z"
},
"setId": "2"
},
{
"isActive": "yes",
"metadata": {
"owner": "owner123",
"compressionType": "snappy",
"comments": "someComment",
"createdDate": "2022-01-11T11:00:00Z",
"updatedDate": "2022-01-12T14:17:55Z"
},
"setId": "4"
},
{
"isActive": "yes",
"metadata": {
"owner": "owner124",
"compressionType": "snappy",
"comments": "someComments",
"createdDate": "2022-01-11T11:00:00Z",
"updatedDate": "2022-01-12T14:17:55Z"
},
"setId": "4"
}
]
}
}
}
}
Code that I am trying play json also tried circe . Please help ..New to scala world
below is object and case class
case class DataCatalog(uuid: String, data: Seq[Data], metadata: Seq[Metadata])
object DataCatalog {
case class Data(id: String,
type: String,
sourceId: Option[Int],
size: Int)
case class Metadata(isActive: String,
owner: String,
compressionType: String,
comments: String,
createdDate: String,
updatedDate: String
)
def convertJson(inputjsonLine: String): Option[DataCatalog] = {
val result = Try {
//val doc: Json = parse(line).getOrElse(Json.Null)
//val cursor: HCursor = doc.hcursor
//val uuid: Decoder.Result[String] = cursor.downField("uuid").as[String]
val lat = (inputjsonLine \ "uuid").get
DataCatalog(uuid, data, group)
}
//do pattern matching
result match {
case Success(dataCatalog) => Some(dataCatalog)
case Failure(exception) =>
}
}
}
Any parsing api is fine.
If you use Scala Play, for each case class you should have an companion object which will help you a lot with read/write object in/from json:
object Data {
import play.api.libs.json._
implicit val read = Json.reads[Data ]
implicit val write = Json.writes[Data ]
def tupled = (Data.apply _).tupled
}
object Metadata {
import play.api.libs.json._
implicit val read = Json.reads[Metadata ]
implicit val write = Json.writes[Metadata ]
def tupled = (Metadata.apply _).tupled
}
Is required as each companion object to be in same file as the case class. For your json example, you need more case classes because you have a lot of nested objects there (log, Response, info, each of it)
or, you can read the field which you're interested in as:
(jsonData \ "fieldName").as[CaseClassName]
You can try to access the Data value as:
(jsonData \ "log" \ "Response" \ "info" \ "Data").as[Data]
same for Metadata

How to deserialize dynamic Json with pure Kotlin Serialization

I am investigating the use of Kotlin Serialization in my latest project
The Json I have to contend with is as follows:-
{
"countryCodeMapping": {
"ssd": [
{
"name": "South Sudan",
"alpha2": "ss",
"alpha3": "ssd"
}
],
"nic": [
{
"name": "Nicaragua",
"alpha2": "ni",
"alpha3": "nic"
}
],
"fin": [
{
"name": "Finland",
"alpha2": "fi",
"alpha3": "fin"
}
],
"cck": [
{
"name": "Cocos (Keeling) Islands",
"alpha2": "cc",
"alpha3": "cck"
}
],
"tun": [
{
"name": "Tunisia",
"alpha2": "tn",
"alpha3": "tun"
}
],
"uga": [
{
"name": "Uganda",
"alpha2": "ug",
"alpha3": "uga"
}
],
....
"myDescriptions": {
"1": {
"groupId": 1,
"description": "cvbxcvbxcvbxbxb",
"myTypes": [
"cxcvbxb",
"xcvxcbxxx",
"cvbxcvbxcxvcb"
]
},
"2": {
"groupId": 2,
"description": "vxvbxbxbxbx",
"myTypes": [
"xcvb",
"xcvbxcvb"
]
},
"3": {
"groupId": 3,
"description": "xcvbxcvbxcvbxcvb",
"myTypes": [
"xcvbxcvbxcvbx"
]
},
"4": {
"groupId": 4,
"description": "xcvbxcbxcvbxcb",
"myTypes": [
"xcvbxcxcvb"
]
},
"5": {
"groupId": 5,
"description": "xcvbxcvbxcvbxcvbx",
"myTypes": [
"xcvbxcvbxc",
"xcvbxcvbxcvbxb"
]
},
"6": {
"groupId": 6,
"description": "dfgsdfgsdgsdfgsdgsg",
"myTypes": [
I must decode the entire Json string, however I started with the countryCodeMapping
section as follows:-
#Serializable
data class Meta(
#SerialName("countryCodeMapping")
val mapping: CountryCodeMapping
)
#Serializable
data class CountryCodeMapping(val ssd: List<Country>)
#Serializable
data class Country(
#SerialName("name")
val name: String,
#SerialName("alpha2")
val alpha2: String,
#SerialName("alpha3")
val alpha3: String
)
val countryMap: Meta = Json {
isLenient = true
ignoreUnknownKeys = true
allowStructuredMapKeys = true
}.decodeFromString<Meta>(metaJson)
This works, however I obviously only decode Sudan
countryMap = Meta(mapping=CountryCodeMapping(ssd=[Country(name=South Sudan, alpha2=ss, alpha3=ssd)]))
What data class models do I require to enable me to achieve my goal?
UPDATE
The changes I had to make (Based on answer comment) were
#Serializable
data class Meta(
#SerialName("countryCodeMapping")
val mapping: Map<String, List<Country>>
)
#Serializable
data class Country(
#SerialName("name")
val name: String? = null,
#SerialName("alpha2")
val alpha2: String? = null,
#SerialName("alpha3")
val alpha3: String? = null
)
You can just use a regular map here:
val mapping: Map<String, List<Country>>
When you have JSON objects like { "someKey": ..., "otherKey": ... }, you have mainly 2 options to represent them:
use a regular class with properties
use a generic Map<String, ...>
You current approach uses option 1 (with CountryCodeMapping class, and hardcoded ssd property). This approach is best when the keys are fixed and well known.
Approach 2 is better when the keys are dynamic and all the values have the same type, which is your case here.
To implement approach 2, use what #broot suggested: a Map<String, List<Country>> instead of the CountryCodeMapping class:
#Serializable
data class Meta(
#SerialName("countryCodeMapping")
val mapping: Map<String, List<Country>>
)
#Serializable
data class Country(
#SerialName("name")
val name: String,
#SerialName("alpha2")
val alpha2: String,
#SerialName("alpha3")
val alpha3: String
)

How to parse just part of JSON with Klaxon?

I'm trying to parse some JSON to kotlin objects. The JSON looks like:
{
data: [
{ "name": "aaa", "age": 11 },
{ "name": "bbb", "age": 22 },
],
otherdata : "don't need"
}
I just need to data part of the entire JSON, and parse each item to a User object:
data class User(name:String, age:Int)
But I can't find an easy way to do it.
Here's one way you can achieve this
import com.beust.klaxon.Klaxon
import java.io.StringReader
val json = """
{
"data": [
{ "name": "aaa", "age": 11 },
{ "name": "bbb", "age": 22 },
],
"otherdata" : "not needed"
}
""".trimIndent()
data class User(val name: String, val age: Int)
fun main(args: Array<String>) {
val klaxon = Klaxon()
val parsed = klaxon.parseJsonObject(StringReader(json))
val dataArray = parsed.array<Any>("data")
val users = dataArray?.let { klaxon.parseFromJsonArray<User>(it) }
println(users)
}
This will work as long as you can fit the whole json string in memory. Otherwise you may want to look into the streaming API: https://github.com/cbeust/klaxon#streaming-api

How to get value from JsValue?

For Example:
My Db stores following Json. Form Following json I need to extract the value of particular field.
"student": [
{
"name": "Xyz",
"college": "abc",
"student_id":{
"$oid": "59a9314f6d0000920962e247"
}},
{
"name": "DDD",
"college": "opop",
"student_id":{
"$oid": "59a9314f6d0000920962e257"
}}
]
How can I pick only the value of "$oid" and save json in following way:
"student": [
{
"name": "Xyz",
"college": "abc",
"student_id":
"59a9314f6d0000920962e247"
},
{
"name": "DDD",
"college": "opop",
"student_id":
"59a9314f6d0000920962e257"
}
]
In my scenario, I'm reading it from client side as -
String Json = null;
JsonNode body = request().body().asJson();
Json = body.toString();
Logger.info(Json);
String role = body.get("role").get("role").asText();
users.firstName = body.get("firstName").asText();
users.lastName = body.get("lastName").asText();
You need to change the definition of JsonNode body = request().body().asJson(); and the other code as per your scenario to get it from db.
You will need to replace "student_id" jsValue with string value as below:
val original: JsValue = Json.parse(
""" {"student": [
{
"name": "Xyz",
"college": "abc",
"student_id":{
"$oid": "59a9314f6d0000920962e247"
}},
{
"name": "DDD",
"college": "opop",
"student_id":{
"$oid": "59a9314f6d0000920962e257"
}}
]}""")
val changed = original.as[JsObject] ++ Json.obj(
"student" -> Json.arr {
original.transform((__ \ 'student)
.json.pick[JsArray])
.getOrElse(Json.arr())
.value.map(e => {
val value = e.transform((__ \ 'student_id \ '$oid).json.pick[JsString]).get
e.as[JsObject] ++ Json.obj("student_id" -> value)
})
})
println(Json.stringify(changed))
//Result:
{"student":[[{"name":"Xyz","college":"abc","student_id":"59a9314f6d0000920962e247"},{"name":"DDD","college":"opop","student_id":"59a9314f6d0000920962e257"}]]}

Put Data in mutlple branch of Array : Json Transformer ,Scala Play

i want to add values to all the arrays in json object.
For eg:
value array [4,2.5,2.5,1.5]
json =
{
"items": [
{
"id": 1,
"name": "one",
"price": {}
},
{
"id": 2,
"name": "two"
},
{
"id": 3,
"name": "three",
"price": {}
},
{
"id": 4,
"name": "four",
"price": {
"value": 1.5
}
}
]
}
i want to transform the above json in
{
"items": [
{
"id": 1,
"name": "one",
"price": {
"value": 4
}
},
{
"id": 2,
"name": "two",
"price": {
"value": 2.5
}
},
{
"id": 3,
"name": "three",
"price": {
"value": 2.5
}
},
{
"id": 4,
"name": "four",
"price": {
"value": 1.5
}
}
]
}
Any suggestions on how do i achieve this. My goal is to put values inside the specific fields of json array. I am using play json library throughout my application. What other options do i have instead of using json transformers.
You may use simple transformation like
val prices = List[Double](4,2.5,2.5,1.5).map(price => Json.obj("price" -> Json.obj("value" -> price)))
val t = (__ \ "items").json.update(
of[List[JsObject]]
.map(_.zip(prices).map(o => _._1 ++ _._2))
.map(JsArray)
)
res5: play.api.libs.json.JsResult[play.api.libs.json.JsObject] = JsSuccess({"items":[{"id":1,"name":"one","price":{"value":4}},{"id":2,"name":"two","price":{"value":2.5}},{"id":3,"name":"three","price":{"value":2.5}},{"id":4,"name":"four","price":{"value":1.5}}]},/items)
I suggest using classes, but not sure this fits to your project because it's hard to guess how your whole codes look like.
I put new Item manually for simplicity. You can create items using Json library :)
class Price(val value:Double) {
override def toString = s"{value:${value}}"
}
class Item(val id: Int, val name: String, val price: Price) {
def this(id: Int, name: String) {
this(id, name, null)
}
override def toString = s"{id:${id}, name:${name}, price:${price}}"
}
val price = Array(4, 2.5, 2.5, 1.5)
/** You might convert Json data to List[Item] using Json library instead. */
val items: List[Item] = List(
new Item(1, "one"),
new Item(2, "two"),
new Item(3, "three"),
new Item(4, "four", new Price(1.5))
)
val valueMappedItems = items.zipWithIndex.map{case (item, index) =>
if (item.price == null) {
new Item(item.id, item.name, new Price(price(index)))
} else {
item
}
}