Is it possible to dynamically deserialize an external ByteString stream of unknown length from Akka HTTP into domain objects?
Context
I call an infinitely long HTTP endpoint that outputs a JSON Array that keeps growing:
[
  { "prop": true, "prop2": false, "prop3": 97, "prop4": "sample" },
  { "prop": true, "prop2": false, "prop3": 97, "prop4": "sample" },
  { "prop": true, "prop2": false, "prop3": 97, "prop4": "sample" },
  { "prop": true, "prop2": false, "prop3": 97, "prop4": "sample" },
  { "prop": true, "prop2": false, "prop3": 97, "prop4": "sample" },
  ...
] <- never sees the light of day
I guess that JsonFraming.objectScanner(Int.MaxValue) should be used in this case. As the docs state:
Returns a Flow that implements a "brace counting" based framing operator for emitting valid JSON chunks. It scans the incoming data stream for valid JSON objects and returns chunks of ByteStrings containing only those valid chunks. Typical examples of data that one may want to frame using this operator include: Very large arrays
So you can end up with something like this:
val response: Future[HttpResponse] = Http().singleRequest(HttpRequest(uri = serviceUrl))
response.onComplete {
  case Success(value) =>
    value.entity.dataBytes
      .via(JsonFraming.objectScanner(Int.MaxValue))
      .map(_.utf8String)          // convert each framed ByteString to a String
      .map(decode[MyEntity](_))   // use any unmarshaller here (circe's decode in this example)
      .grouped(20)
      .runWith(Sink.ignore)       // do whatever you need here
  case Failure(exception) => log.error(exception, "Api call failed")
}
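If you want to handle the decoded values instead of discarding them, a minimal variation of the tail of the snippet above (assuming circe's decode, which yields an Either) could be:
// inside the Success(value) branch above, instead of Sink.ignore:
value.entity.dataBytes
  .via(JsonFraming.objectScanner(Int.MaxValue))
  .map(_.utf8String)
  .map(decode[MyEntity](_))                    // Either[io.circe.Error, MyEntity]
  .collect { case Right(entity) => entity }    // keep only successfully decoded objects
  .runWith(Sink.foreach(entity => log.info("Got {}", entity)))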
I had a very similar problem trying to parse the Twitter Stream (an infinite string) into a domain object.
I solved it using Json4s, like this:
import org.json4s._
import org.json4s.native.JsonMethods._ // or org.json4s.jackson.JsonMethods._

case class Tweet(username: String, geolocation: Option[Geo])
case class Geo(latitude: Float, longitude: Float)

object Tweet {
  // extract needs an implicit Formats in scope
  implicit val formats: Formats = DefaultFormats

  def apply(s: String): Tweet =
    parse(StringInput(s), useBigDecimalForDouble = false, useBigIntForLong = false).extract[Tweet]
}
Then I just buffered the stream and mapped each line to a Tweet:
import java.io.{BufferedReader, InputStreamReader}
import java.util.zip.GZIPInputStream

val reader = new BufferedReader(new InputStreamReader(new GZIPInputStream(inputStream), "UTF-8"))
var line = reader.readLine()
while (line != null) {
  store(Tweet.apply(line))
  line = reader.readLine()
}
Json4s has full support for Option (and for nested custom objects, like Geo in the example). So you can declare a field as an Option, as I did, and if the field doesn't come in the JSON it will be set to None.
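For instance, with the Tweet and Geo definitions above (a tiny illustrative check):
// "geolocation" is absent from the JSON, so it is extracted as None
val t = Tweet("""{"username": "alice"}""")
// t == Tweet("alice", None)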
Hope it helps!
I think that play-iteratees-extras should help you. This library lets you parse JSON via the Enumerator/Iteratee pattern and, of course, does not wait for all the data to be received.
For example, let's build an 'infinite' stream of bytes that represents an 'infinite' JSON array.
import play.api.libs.iteratee.{Enumeratee, Enumerator, Iteratee}
import play.api.libs.json.Json
import play.api.mvc.Codec
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import scala.util.Random

var i = 0
var isFirstWas = false
val max = 10000

val stream = Enumerator("[".getBytes) andThen Enumerator.generateM {
  Future {
    i += 1
    if (i < max) {
      val json = Json.stringify(Json.obj(
        "prop" -> Random.nextBoolean(),
        "prop2" -> Random.nextBoolean(),
        "prop3" -> Random.nextInt(),
        "prop4" -> Random.alphanumeric.take(5).mkString("")
      ))
      val string = if (isFirstWas) {
        "," + json
      } else {
        isFirstWas = true
        json
      }
      Some(Codec.utf_8.encode(string))
    } else if (i == max) Some("]".getBytes) // <------ the closing bracket of the JSON array
    else None
  }
}
OK, this value produces a JSON array of roughly 10000 objects. Let's define a case class that will hold the data of each object in our array.
case class Props(prop: Boolean, prop2: Boolean, prop3: Int, prop4: String)
Now let's write a parser that will parse each item:
import play.extras.iteratees._
import JsonBodyParser._
import JsonIteratees._
import JsonEnumeratees._

val parser = jsArray(jsValues(jsSimpleObject)) ><> Enumeratee.map { json =>
  for {
    prop  <- json.\("prop").asOpt[Boolean]
    prop2 <- json.\("prop2").asOpt[Boolean]
    prop3 <- json.\("prop3").asOpt[Int]
    prop4 <- json.\("prop4").asOpt[String]
  } yield Props(prop, prop2, prop3, prop4)
}
Please see the docs for jsArray, jsValues and jsSimpleObject. To build the result producer:
val result = stream &> Encoding.decode() ><> parser
Encoding.decode() from the JsonIteratees package will decode the bytes as a CharString. The result value has type Enumerator[Option[Props]], and you can apply some iteratee to this enumerator to start the parsing process.
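For example, a minimal way to drive the parsing with an iteratee (assuming the stream, parser and result values defined above):
// run the enumerator against a simple consuming iteratee
val done: Future[Unit] = result |>>> Iteratee.foreach {
  case Some(props) => println(props) // each successfully extracted Props
  case None        => ()             // an object that did not have all the expected fields
}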
All in all, I don't know how you receive the bytes (the solution depends heavily on this), but I think this shows one possible solution to your problem.
Related
I have a JSON structure that I need to (sort of) flatten when deserializing it into an object. Some of the elements are at the top level and some are in a sub-field. In addition, one of the fields is an array of space-delimited strings that I need to parse and represent as myString.split(" ")[0].
So, short of a when expression to do the job, can I use something like a JSONPath query to bind to certain fields? I have even thought of doing some kind of two-pass binding and then merging both instances.
{
  "key": "FA-207542",
  "fields": {
    "customfield_10443": {
      "value": "TBD"
    },
    "customfield_13600": 45,
    "customfield_10900": {
      "value": "Monitoring/Alerting"
    },
    "customfield_10471": [
      "3-30536161871 (SM-2046076)"
    ],
    "issuetype": {
      "name": "Problem Mgmt - Corrective Action"
    },
    "created": "2022-08-11T04:46:44.000+0000",
    "updated": "2022-11-08T22:11:23.000+0000",
    "summary": "FA | EJWL-DEV3| ORA-00020: maximum number of processes (10000) exceeded",
    "assignee": null
  }
}
And here's the data object I'd like to bind to. I have annotated each field with the jq expression for where its value should come from.
@Serializable
data class MajorIncident constructor(
    @SerialName("key")
    val id: String,                      // .key
    val created: Instant,                // .fields.created
    val pillar: String,                  // .fields.customfield_10443.value
    val impactBreadth: String?,
    val duration: Duration,              // .fields.customfield_13600 as minutes
    val detectionSource: String,         // .fields.customfield_10900.value
    val updated: Instant,                // .fields.updated
    val assignee: String,                // .fields.assignee
    // "customfield_10471": [
    //   "3-30536161871 (SM-2046076)"
    // ],
    val serviceRequests: List<String>?,  // .fields.customfield_10471 | map(split(" ")[0])
    @SerialName("summary")
    val title: String,                   // .summary
    val type: String,                    // .fields.issuetype.name // what are the options?
)
If you're using Kotlinx Serialization, I'm not sure there is any built-in support for jsonpath.
One simple option is to declare your Kotlin model in a way that matches the JSON. If you really want a flattened object, you can then convert from the structured model into the flat model in plain Kotlin, as sketched below.
Another option is to write a custom serializer for your type.
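A minimal sketch of the first option (ValueWrapper, Fields, Issue, MajorIncidentFlat and parseIncident are illustrative names, not an existing API; it assumes the kotlinx-serialization-json dependency and leaves dates/durations as plain values to keep it short):
import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable
import kotlinx.serialization.decodeFromString
import kotlinx.serialization.json.Json

// Structured model that mirrors the JSON shape from the question
@Serializable
data class ValueWrapper(val value: String)

@Serializable
data class Fields(
    @SerialName("customfield_10443") val pillar: ValueWrapper,
    @SerialName("customfield_13600") val durationMinutes: Int? = null,
    @SerialName("customfield_10900") val detectionSource: ValueWrapper,
    @SerialName("customfield_10471") val serviceRequests: List<String>? = null,
    val created: String,
    val updated: String,
    val summary: String,
    val assignee: String? = null,
)

@Serializable
data class Issue(val key: String, val fields: Fields)

// Flat model, built from the structured one in plain Kotlin
data class MajorIncidentFlat(
    val id: String,
    val pillar: String,
    val detectionSource: String,
    val serviceRequests: List<String>?,
    val title: String,
)

fun parseIncident(jsonText: String): MajorIncidentFlat {
    val json = Json { ignoreUnknownKeys = true } // the payload has fields we don't model
    val issue = json.decodeFromString<Issue>(jsonText)
    return MajorIncidentFlat(
        id = issue.key,
        pillar = issue.fields.pillar.value,
        detectionSource = issue.fields.detectionSource.value,
        // keep only the first space-delimited token, like split(" ")[0] in the question
        serviceRequests = issue.fields.serviceRequests?.map { it.substringBefore(" ") },
        title = issue.fields.summary,
    )
}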
I am writing some small Scala practice code where my input is going to be in the following format:
{
  "code": "",
  "unique ID": "",
  "count": "",
  "names": [
    {
      "Matt": {
        "name": "Matt",
        "properties": [
          "a",
          "b",
          "c"
        ],
        "fav-colour": "red"
      },
      "jack": {
        "name": "jack",
        "properties": [
          "a",
          "b"
        ],
        "fav-colour": "blue"
      }
    }
  ]
}
I'll be passing this file as a command-line argument.
I want to know how to accept the input file, parse the JSON and use the JSON keys in my code.
You may use a json library such as play-json to parse the json content.
You could either operate on the json AST or you could write case classes that have the same structure as your json file and let them be parsed.
You can find the documentation of the library here.
You'll first have to add play-json as a dependency to your project. If you're using sbt, just add this to your build.sbt file:
libraryDependencies += "com.typesafe.play" %% "play-json" % "2.6.13"
Play JSON using the AST
Let's read the input file:
import java.io.FileInputStream
import java.util.UUID
import play.api.libs.json._

object Main extends App {
  // first we'll need an input stream of your json file
  // this should be familiar if you know Java
  val in = new FileInputStream(args(0))

  // now we'll let play-json parse it
  val json = Json.parse(in)
}
Let's extract some fields from the AST:
val code = (json \ "code").as[String]
val uniqueID = (json \ "unique ID").as[UUID]

for {
  JsObject(nameMap) ← (json \ "names").as[Seq[JsObject]]
  (name, userMeta)  ← nameMap // nameMap is a Map[String, JsValue]
} println(s"User $name has the favorite color ${(userMeta \ "fav-colour").as[String]}")
Using Deserialization
As I've just described, we may create case classes that represent your structure:
case class InputFile(code: String, `unique ID`: UUID, count: String, names: Seq[Map[String, UserData]])
case class UserData(name: String, properties: Seq[String], `fav-colour`: String)
In addition, you'll need to define an implicit Format, e.g. in the companion object of each case class. Instead of writing it by hand, you can use the Json.format macro that derives it for you:
object UserData {
  implicit val format: OFormat[UserData] = Json.format[UserData]
}

object InputFile {
  implicit val format: OFormat[InputFile] = Json.format[InputFile]
}
You can now deserialize your json object:
val argumentData = json.as[InputFile]
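From there you can work with plain Scala values, for example (a small illustrative snippet, not part of the original answer):
// list every user name together with their favourite colour
for {
  userMap      <- argumentData.names
  (name, user) <- userMap
} println(s"$name's favourite colour is ${user.`fav-colour`}")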
I generally prefer this approach, but in your case the JSON structure does not fit it really well. One improvement could be to add an additional getter to your InputFile class that makes accessing the fields with spaces and similar characters in their names easier:
case class InputFile(code: String, `unique ID`: UUID, count: String, names: Seq[Map[String, UserData]]) {
  // this method is nicer to use
  def uniqueId = `unique ID`
}
I know I can customize the JSON response by registering JSON marshallers for domain entities, and I can even create named profiles for different responses.
This is done by filling a map that will later be marshalled, like:
JSON.registerObjectMarshaller(myDomain) {
  def returnArray = [:]
  returnArray['id'] = it.id
  returnArray['name'] = it.name
  returnArray['price'] = it.price
  return returnArray
}
What I want is to alter the way it gets marshalled so the response has two sections, like:
{
  "paging": {
    "total": 100
  },
  "data": [
    {
      "id": 1,
      "description": "description 1"
    },
    ...
  ]
}
I assume I have to implement a custom JSON marshaller, but I don't know how to use it for a specific response instead of application-wide.
EDIT: I assume I'll need a custom RENDERER apart from the marshaller. It's the renderer that I don't know how to use for a specific response.
What about a simple:
def json = new JSON([ paging: [ total: myArray.totalCount ], data: myArray ])
Your domain objects will be converted with the marshaller you have set up, while your paging data will simply be transformed into JSON.
I need to be able to process large JSON files, instantiating objects from deserializable sub-strings as we iterate over/stream in the file.
For example:
Let's say I can only deserialize into instances of the following:
case class Data(val a: Int, val b: Int, val c: Int)
and the expected JSON format is:
{ "foo": [ {"a": 0, "b": 0, "c": 0 }, {"a": 0, "b": 0, "c": 1 } ],
"bar": [ {"a": 1, "b": 0, "c": 0 }, {"a": 1, "b": 0, "c": 1 } ],
.... MANY ITEMS .... ,
"qux": [ {"a": 0, "b": 0, "c": 0 } }
What I would like to do is:
import com.codahale.jerkson.Json
val dataSeq : Seq[Data] = Json.advanceToValue("foo").stream[Data](fileStream)
// NOTE: this will not compile since I pulled the "advanceToValue" out of thin air.
As a final note, I would prefer to find a solution that involves Jerkson or any other library that comes with the Play framework, but if another Scala library handles this scenario with greater ease and decent performance, I'm not opposed to trying it. If there is a clean way of manually seeking through the file and then using a JSON library to continue parsing from there, I'm fine with that.
What I do not want to do is ingest the entire file without streaming or using an iterator, as keeping the entire file in memory would be prohibitively expensive.
I have not done it with JSON (and I hope someone will come up with a turnkey solution for you), but I have done it with XML, and here is a way of handling it.
It is basically a simple map/reduce process with the help of a streaming parser.
Map (your advanceTo)
Use a streaming parser such as JSON Simple (not tested). When your callback matches the "path" you are interested in, collect anything below it by writing it to a stream (file-backed or in-memory, depending on your data). That will be your foo array in your example. If your mapper is sophisticated enough, you may want to collect multiple paths during the map step.
Reduce (your stream[Data])
Since the streams you collected above look pretty small, you probably do not need to map/split them again, and you can parse them directly in memory as JSON objects/arrays and manipulate them (transform, recombine, etc.).
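For what it's worth, here is a minimal sketch of the map step using Jackson's streaming API (the parser Jerkson is built on); the streamArray helper, the naive field scan and the manual Data construction are all illustrative, not a ready-made API:
import com.fasterxml.jackson.core.JsonToken
import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
import java.io.File

case class Data(a: Int, b: Int, c: Int)

def streamArray(file: File, targetField: String): Iterator[Data] = {
  val mapper = new ObjectMapper()
  val parser = mapper.getFactory.createParser(file)
  // naive scan: stop at the first field with this name (a nested occurrence would also match)
  while (parser.nextToken() != null &&
         !(parser.currentToken() == JsonToken.FIELD_NAME && parser.getCurrentName == targetField)) {}
  parser.nextToken() // should now be START_ARRAY
  Iterator
    .continually(parser.nextToken())
    .takeWhile(_ == JsonToken.START_OBJECT)   // stops at END_ARRAY
    .map { _ =>
      // read one array element at a time; only that element is held in memory
      val node = mapper.readValue(parser, classOf[JsonNode])
      Data(node.get("a").asInt, node.get("b").asInt, node.get("c").asInt)
    }
}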
Here is the current way I am solving the problem:
import collection.immutable.PagedSeq
import util.parsing.input.PagedSeqReader
import com.codahale.jerkson.Json
import collection.mutable
private def fileContent = new PagedSeqReader(PagedSeq.fromFile("/home/me/data.json"))

private val clearAndStop = ']'

private def takeUntil(readerInitial: PagedSeqReader, text: String): Taken = {
  val str = new StringBuilder()
  var readerFinal = readerInitial
  while (!readerFinal.atEnd && !str.endsWith(text)) {
    str += readerFinal.first
    readerFinal = readerFinal.rest
  }
  if (!str.endsWith(text) || str.contains(clearAndStop))
    Taken(readerFinal, None)
  else
    Taken(readerFinal, Some(str.toString))
}

private def takeUntil(readerInitial: PagedSeqReader, chars: Char*): Taken = {
  var taken = Taken(readerInitial, None)
  chars.foreach(ch => taken = takeUntil(taken.reader, ch.toString))
  taken
}
def getJsonData(): Seq[Data] = {
  val data = mutable.ListBuffer[Data]()
  var taken = takeUntil(fileContent, "\"foo\"")
  taken = takeUntil(taken.reader, ':', '[')
  var doneFirst = false
  while (taken.text != None) {
    if (!doneFirst)
      doneFirst = true
    else
      taken = takeUntil(taken.reader, ',')
    taken = takeUntil(taken.reader, '}')
    if (taken.text != None) {
      print(taken.text.get)
      data += Json.parse[Data](taken.text.get)
    }
  }
  data
}
case class Taken(reader: PagedSeqReader, text: Option[String])
case class Data(val a: Int, val b: Int, val c: Int)
Granted, this code doesn't handle malformed JSON very cleanly, and using it for multiple top-level keys ("foo", "bar" and "qux") will require looking ahead (or matching against a list of possible top-level keys), but in general I believe it does the job. It's not quite as functional as I'd like and isn't super robust, but PagedSeqReader definitely keeps this from getting too messy.
I'm trying to parse some problematic JSON in Scala using Play JSON and an implicit Reads, but I'm not sure how to proceed...
The JSON looks like this:
"rules": {
"Some_random_text": {
"item_1": "Some_random_text",
"item_2": "text",
"item_n": "MoreText",
"disabled": false,
"Other_Item": "thing",
"score": 1
},
"Some_other_text": {
"item_1": "Some_random_text",
"item_2": "text",
"item_n": "MoreText",
"disabled": false,
"Other_Item": "thing",
"score": 1
},
"Some_more_text": {
"item_1": "Some_random_text",
"item_2": "text",
"item_n": "MoreText",
"disabled": false,
"Other_Item": "thing",
"score": 1
}
}
I'm using an implicit reader, but because each top-level item in rules is effectively a different thing, I don't know how to address that...
I'm trying to build a case class, and I don't actually need the random text heading for each item, but I do need each item.
To make my life even harder, after these items there are lots of things in other formats which I really don't need. They are unnamed items which just start:
{
random legal Json...
},
{
more Json...
}
I need to end up with the Json I'm parsing in a seq of case classes.
Thanks for your thoughts.
I'm using an implicit reader, but because each top-level item in rules is effectively a different thing, I don't know how to address that...
Play JSON readers depend on knowing the names of fields in advance. That goes for manually constructed readers as well as for macro-generated readers. You cannot use a single implicit reader for the whole structure in this case. You need to do some traversing first and extract the pieces of JSON that do have a regular structure with known field names and types, e.g. like this:
import play.api.libs.json._

case class Item(item_1: String, item_2: String, item_n: String, disabled: Boolean, Other_Item: String, score: Int)

implicit val itemReader: Reads[Item] = Json.reads[Item]

def main(args: Array[String]): Unit = {
  // parse the JSON text and assume that there is a JSON object under the "rules" field
  val rules: JsObject = (Json.parse(jsonText) \ "rules").as[JsObject]

  // traverse all fields, filter by field name, collect the values
  val itemResults = rules.fields.collect {
    case (heading, jsValue) if heading.startsWith("Some_") => Json.fromJson[Item](jsValue) // use the implicit reader here
  }

  // silently ignore read errors and just collect the successfully read items
  val items = itemResults.flatMap(_.asOpt)
  items.foreach(println)
}
Prints:
Item(Some_random_text,text,MoreText,false,thing,1)
Item(Some_random_text,text,MoreText,false,thing,1)
Item(Some_random_text,text,MoreText,false,thing,1)