Optimal way to read out JSON from MongoDB into a Scalatra API - json

I have a pre-formatted JSON blob stored as a string in MongoDB as a field in one of my collections. Currently in my Scalatra-based API, I have a before filter that renders all of my responses with a JSON content type. An example of how I return the content looks like the following:
get("/boxscore", operation(getBoxscore)) {
  val game_id: Int = params.getOrElse("game_id", "3145").toInt
  val mongoColl = mongoDb.apply("boxscores")
  val q: DBObject = MongoDBObject("game_id" -> game_id)
  val res = mongoColl.findOne(q)
  res match {
    case Some(j) => JSON.parseFull(j("json_body").toString)
    case None => NotFound("Requested document could not be found.")
  }
}
Now this certainly does work. It doesn't seem the "Scala" way of doing things, though, and I feel like it can be optimized. The worrisome part to me is that once I add a caching layer, every cache miss means spending additional CPU time re-parsing a String I already formatted as JSON in MongoDB:
JSON.parseFull(j("json_body").toString)
I have to take the result from findOne(), run .toString on it, then re-parse it into JSON afterwards. Is there a more optimal route? Since the JSON is already stored as a String in MongoDB, I'm guessing a serializer / case class isn't the right solution here. Of course I can just leave what's here - but I'd like to learn if there's a way that would be more Scala-like and CPU friendly going forward.

There is the option to extend Scalatra's render pipeline with handling for MongoDB classes. The following two routes act as an example. They return a MongoCursor and a DBObject as result. We are going to convert those to a string.
get("/") {
  mongoColl.find
}
get("/:key/:value") {
  val q = MongoDBObject(params("key") -> params("value"))
  mongoColl.findOne(q) match {
    case Some(x) => x
    case None => halt(404)
  }
}
In order to handle the types we need to define a partial function which takes care of the conversion and sets the appropriate content type.
There are two cases, the first one handles a DBObject. The content type is set to "application/json" and the object is converted to a string by calling the toString method. The second case handles a MongoCursor. Since it implements TraversableOnce the map function can be used.
def renderMongo = {
  case dbo: DBObject =>
    contentType = "application/json"
    dbo.toString
  case xs: TraversableOnce[_] => // handles a MongoCursor, be aware of type erasure here
    contentType = "application/json"
    val ls = xs map (x => x.toString) mkString(",")
    "[" + ls + "]"
}: RenderPipeline
(Note the following type definition: type RenderPipeline = PartialFunction[Any, Any])
Now the method needs to get hooked in. After an HTTP call has been handled, the result is forwarded to the render pipeline for further conversion. Custom handling can be added by overriding the renderPipeline method from ScalatraBase. With the following definition, the renderMongo function is called first:
override protected def renderPipeline = renderMongo orElse super.renderPipeline
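The fall-through behaviour of orElse can be seen in isolation with plain partial functions. The following is only a minimal sketch using the standard library: RenderFallbackSketch, renderSeq and defaultRender are illustrative names, with defaultRender standing in for super.renderPipeline, and none of it is Scalatra code.

```scala
object RenderFallbackSketch {
  // Same shape as Scalatra's RenderPipeline type alias
  type RenderPipeline = PartialFunction[Any, Any]

  // Stand-in for a custom handler: only defined for sequences
  val renderSeq: RenderPipeline = {
    case xs: Seq[_] => xs.map(_.toString).mkString("[", ",", "]")
  }

  // Stand-in for super.renderPipeline: handles everything else
  val defaultRender: RenderPipeline = {
    case other => other.toString
  }

  // The custom handler is tried first; values it does not match fall through
  val pipeline: RenderPipeline = renderSeq orElse defaultRender
}
```

Because the custom function comes first in the chain, it only needs to cover the types it cares about; everything else is left to the existing pipeline.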
This is a basic approach to handle MongoDB types. There are other options as well, for example by making use of json4s-mongo.
Here is the previous code in a working sample project.

Related

How can I use HTTP request headers for content negotiation in a Marshaller?

My app supports protobuf and JSON serialization. For JSON serialization I use com.trueaccord.scalapb.json.JsonFormat; my DTOs are generated from proto definitions.
The com.trueaccord serializer wraps option types in JSON objects, which is causing issues for some clients, so I want to be able to support org.json4s without breaking the existing clients.
I would like to be able to pick a serializer based on a custom HTTP header called JFORMAT. The idea is that if this header is sent I will use json4s; otherwise I will use the trueaccord serializer.
I managed to create an Unmarshaller which can pick a request serializer based on a header value:
Unmarshaller.withMaterializer[HttpRequest, T](_ => implicit mat => {
  case request: HttpRequest =>
    val entity = request.entity
    entity.dataBytes.runFold(ByteString.empty)(_ ++ _).map(data => {
      entity.contentType match {
        case `applicationJsonContentType` =>
          // Fall back to format "1.0" when the custom header is absent
          val jsFormat = request.headers
            .find(h => h.name() == jsonFormatHeaderName)
            .map(_.value())
            .getOrElse("1.0")
          val charBuffer = Unmarshaller.bestUnmarshallingCharsetFor(entity)
          val jsonText = data.decodeString(charBuffer.nioCharset().name())
          if (jsFormat == "2.0") {
            // New formatter (json4s read; assumes an implicit Formats and Manifest[T] in scope)
            read[T](jsonText)
          } else {
            JsonFormat.fromJsonString[T](jsonText) // Old formatter
          }
        case `protobufContentType` =>
          companion.parseFrom(CodedInputStream.newInstance(data.asByteBuffer)) // Proto formatter
        case _ =>
          throw UnsupportedContentTypeException(applicationJsonContentType, protobufContentType)
      }
    })
})
I want to do the same with my Marshaller which I use with Marshaller.oneOf and the JSON handling one looks like:
Marshaller.withFixedContentType(contentType) { value =>
  val jsonText = JsonSerializer.toJsonString[T](value)
  HttpEntity(contentType, jsonText)
}
Is there a way to construct a Marshaller which is aware of the request HTTP headers? The Akka HTTP docs don't have any examples and I cannot make sense of the PredefinedToRequestMarshallers.
Do I need to combine multiple marshallers somehow, or can I append some metadata to a context during the request serialization that I can use later in the Marshaller? I want to avoid appending metadata to my DTO if possible, or using a custom content type like application/vnd.api+json.
There is a lot of other useful information in the request that I could use when formatting the response, such as Accept-Encoding or custom headers like a unique request id to create a correlation id; I could also add JSONP support by reading the callback query parameter, etc.
To clarify: I need a solution that uses the Marshaller, a subclass of it, a custom version created by a factory method, or maybe multiple Marshallers chained together. Marshaller.withFixedContentType is already using the Accept header, so there must be a way. I added a bounty to reward a solution to this specific challenge. I am aware of hacks and workarounds; I asked the question because I need a clean solution for this specific scenario.
The Custom Marshallers section of the docs mentions the overloaded Marshaller.oneOf methods, which seem to be what you want:
Helper for creating a "super-marshaller" from a number of
"sub-marshallers". Content-negotiation determines, which
"sub-marshaller" eventually gets to do the job.
The Marshaller companion object has many methods that receive a Seq[HttpHeader]. You can look into their implementations as well.
I don't have the time to look into the source code myself, but if this is not enough to put you on the right path, let me know.
Edit:
How about?
get {
  optionalHeaderValueByName("JFORMAT") { format =>
    complete {
      format match {
        case Some(f) => "Complete with json4s"
        case _ => "Complete with trueaccord"
      }
    }
  }
}

Apply key to map obtained via pattern matching in Scala (type erased)

I am trying to query an API which returns a JSON array (e.g. [{"name":"obj1", "value":5}, {"name":"obj2", "value":2}]) and process the result, which gets parsed as an Option[List[Map[String,Any]]]. However, I am not sure how to properly extract each Map, since the types are erased at runtime.
import scala.util.parsing.json._
import scalaj.http._
val url = "https://api.github.com/users/torvalds/repos"
val req = Http(url).asString
val parsed = JSON.parseFull(req.body) match {
  case Some(data) => data match {
    case list: List[_] => list
    case _ => sys.error("Result is not a list.")
  }
  case None => sys.error("Invalid JSON received.")
}
parsed.foreach {
  case x: Map[_,_] => x.get("full_name") // error here
}
The error occurs because I cannot apply the function with a String key type. However, because of type erasure, the key and value type are unknown, and specifying that it's a String map throws compiler warnings.
Am I going about things the wrong way? Or maybe I'd have better luck with a different HTTP/JSON library?
You can replace your last line with:
parsed.collect{ case x: Map[_,_] => x.asInstanceOf[Map[String,Any]].get("full_name") }
We sort of "cheat" here since we know the keys in a JSON are always Strings.
As for your last question, if you need something lightweight, I think what you have here is as simple as it gets.
Take a look at this SO post if you want to do something more powerful with your pattern matching.
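A variant that avoids the cast on the Map itself is to check each key individually, since a type test on a single element is not affected by erasure. This is only a minimal sketch using the standard library; StringKeyedMaps, stringKeyed and fullNames are illustrative names, and the input is assumed to be the List[Any] produced by the parsing step above.

```scala
object StringKeyedMaps {
  // Keep only entries whose key really is a String; each key is tested
  // individually, so no unchecked cast of the whole Map is needed.
  def stringKeyed(value: Any): Option[Map[String, Any]] = value match {
    case m: Map[_, _] =>
      Some(m.iterator.collect { case (k: String, v) => (k, v: Any) }.toMap)
    case _ => None
  }

  // Collect the "full_name" value of every element that is a map and has that key
  def fullNames(parsed: List[Any]): List[Any] =
    for {
      item <- parsed
      m    <- stringKeyed(item).toList
      name <- m.get("full_name").toList
    } yield name
}
```

Elements that are not maps, and maps without a "full_name" key, are skipped silently rather than raising a MatchError.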

How to get all data I have Inserted?

I have made a small app using JSON and ReactiveMongo which inserts student information.
object Applications extends Controller {
  val studentDao = StudentDaoAndEntity
  val studentqueryReader: Reads[JsObject] = implicitly[Reads[JsObject]]
  def saveStudent = Action.async(parse.json) { request =>
    request.body.validate[StudentInfo].map { k =>
      studentDao.insertStudent(k).map { l =>
        Ok("Successfully inserted")
      }
    }.getOrElse(Future.successful(BadRequest("Invalid Json")))
  }
}
In the database layer:
object StudentDaoAndEntity {
  val sreader: Reads[StudentInfo] = Json.reads[StudentInfo]
  val swriter: Writes[StudentInfo] = Json.writes[StudentInfo]
  val studentqueryReader: Reads[JsObject] = implicitly[Reads[JsObject]]
  def db = ReactiveMongoPlugin.db
  def collection: JSONCollection = db[JSONCollection]("student")
  def insertStudent(student: StudentInfo): Future[JsObject] = {
    val modelToJsObj = swriter.writes(student).as[JsObject]
    collection.insert(modelToJsObj) map (_ => modelToJsObj)
  }
}
This works fine. Now I need to get all data I have inserted. How can I
do that? I am not asking for code but for Idea.
First of all: it seems that you are using Play-ReactiveMongo (as far as I know, JSONCollection is not part of ReactiveMongo itself). If this is the case, then your code is unnecessarily complex. Instead of doing the JSON conversions manually, you can just pass your StudentInfo objects directly to insert. Minimal example:
val studentInfo: StudentInfo = ...
def collection: JSONCollection = db[JSONCollection]("student")
collection.insert(studentInfo)
That's the elegant part of the Play plugin. Yes, MongoDB persists data as JSON (or BSON, to be more precise), but you don't have to deal with it. Just make sure the implicit Writes (or Reads, in case of querying) is in scope, as well as other necessary imports (e.g. play.modules.reactivemongo.json._).
Now I need to get all data I have inserted. How can I do that? I am
not asking for code but for Idea.
Well, you want to have a look at the documentation (scroll down for examples), it's quite simple and there is not more to it. In your case, it could look like this:
// perform query via cursor
val cursor: Cursor[StudentInfo] =
  collection.find(Json.obj("lastName" -> "Regmi")).cursor[StudentInfo]
// gather results as list
val futureStudents: Future[List[StudentInfo]] = cursor.collect[List]()
In this case, you get all students with the last name Regmi. If you really want to retrieve all students, then you probably need to pass an empty JsObject as your query. Again, it's not necessary to deal with JSON conversions, as long as the implicit Reads is in scope.
Here is the complete answer to my own question:
package controllers
def findAll = Action.async {
  val cursor = Json.obj()
  StudentDaoAndEntity.findAllStudent(cursor) map {
    case Nil => Ok("Student Not Found")
    case l: Seq[JsObject] => Ok(Json.toJson(l))
  }
}
// in StudentDaoAndEntity
def findAllStudent(allStd: JsObject): Future[Seq[JsObject]] = {
  // gather all the JsObjects in a list
  collection.find(allStd).cursor[JsObject].collect[List]()
}

Spray Client I want to return String from Json response

Sorry, but I am new to Scala. I have read about Futures and Akka, however I still have an issue returning a string from my method.
I have a method getAuthString which should return an authentication string (or token).
I have used Spray JSON support and I can print the result:
def getToken(url: String, username: String, password: String) = Future[String] {
  import MyJsonProtocol._
  import spray.httpx.SprayJsonSupport._
  val pipeline: HttpRequest => Future[AuthTokenResult[Entry]] = (
    addCredentials(BasicHttpCredentials(username, password))
      ~> sendReceive
      ~> unmarshal[AuthTokenResult[Entry]]
  )
  val myfutureResponse: Future[AuthTokenResult[Entry]] = pipeline(Get(url))
  myfutureResponse onComplete {
    case Success(AuthTokenResult(Entry(Content(authString)) :: _)) => println(authString)
    case Failure(error) => println("An error has occurred: " + error.getMessage)
  }
}
This unmarshals the JSON and prints the desired authString. However, printing is no good to me. I know onComplete returns Unit. I want to return authString so that I can use it somewhere else with another request. I think I will have to use flatMap or map, but I am not sure how. I need my method to return authString or an error.
You don't want to return a String, you want to return a Future[String] - once something is async the only way to make it not async is to block, and that's (usually) a waste, making the whole async-ness pointless.
I'm not sure why you're wrapping the whole thing in a Future either - the trivial bits of computation can happen on their own, there's little value in forcing them onto a separate thread. So you want something like:
def getToken(url: String, ...): Future[String] = {
  ...
  val myFutureResponse: Future[AuthTokenResult[Entry]] = ...
  myFutureResponse map {
    case AuthTokenResult(Entry(Content(authString)) :: _) => authString
  }
}
So you use map to transform a Future into another Future with a computation. This will "pass through" errors, but you can use something like recover or recoverWith if you want to handle them in a particular way.
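The map and recover behaviour can be tried out with the standard library alone. The following is a minimal sketch, not the Spray code from the question: FutureMapSketch, TokenResponse and fetch are hypothetical stand-ins for the unmarshalled response and the HTTP pipeline.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FutureMapSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical stand-in for the unmarshalled response
  final case class TokenResponse(tokens: List[String])

  // Hypothetical stand-in for pipeline(Get(url)): succeeds or fails on demand
  def fetch(ok: Boolean): Future[TokenResponse] =
    if (ok) Future.successful(TokenResponse(List("abc123")))
    else Future.failed(new RuntimeException("request failed"))

  // map transforms the successful value; a failed Future passes through untouched
  def getToken(ok: Boolean): Future[String] =
    fetch(ok).map {
      case TokenResponse(first :: _) => first
      case TokenResponse(Nil)        => throw new NoSuchElementException("empty token list")
    }

  // recover turns selected failures back into a successful value
  def getTokenOrDefault(ok: Boolean): Future[String] =
    getToken(ok).recover { case _: RuntimeException => "no-token" }
}
```

An exception thrown inside map ends up as a failed Future as well, so the same recover handler covers both the failed request and the empty token list.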
Then when you want to use your Future[String] in a Spray route, you can use the onSuccess or onComplete directives:
val myRoute = (path("/somewhere") & parameter("authData")) {
  authData =>
    onSuccess(getToken(authData)) {
      authToken =>
        complete("Authed as " + authToken)
    }
}
This will use the Future in a proper async, reactive way, without blocking.

Suggestions for Writing Map as JSON file in Scala

I have a simple single key-valued Map(K,V) myDictionary that is populated by my program, and at the end I want to write it as a JSON-formatted string to a text file, as I will need to parse it later.
I was using this code earlier,
Some(new PrintWriter(outputDir+"/myDictionary.json")).foreach{p => p.write(compact(render(decompose(myDictionary)))); p.close}
I found it became slower as the input size increased. Later, I used this:
var out = new PrintWriter(outputDir+"/myDictionary.json");
out.println(scala.util.parsing.json.JSONObject(myDictionary.toMap).toString())
This is proving to be a bit faster.
I have run this on sample input and found that it is faster than my earlier approach. I am assuming my input map would reach at least a million (K,V) entries (a >1GB text file), hence I want to make sure that I follow the fastest and most memory-efficient approach for serializing the Map. What other approaches would you recommend that I look into to optimize this?
The JSON support in the standard Scala library is probably not the best choice. Unfortunately the situation with JSON libraries for Scala is a bit confusing, there are many alternatives (Lift JSON, Play JSON, Spray JSON, Twitter JSON, Argonaut, ...), basically one library for each day of the week... I suggest you have a look at these at least to see if any of them is easier to use and more performant.
Here is an example using Play JSON which I have chosen for particular reasons (being able to generate formats with macros):
object JsonTest extends App {
  import play.api.libs.json._
  type MyDict = Map[String, Int]
  implicit object MyDictFormat extends Format[MyDict] {
    def reads(json: JsValue): JsResult[MyDict] = json match {
      case JsObject(fields) =>
        val b = Map.newBuilder[String, Int]
        fields.foreach {
          case (k, JsNumber(v)) => b += k -> v.toInt
          case other => return JsError(s"Not a (string, number) pair: $other")
        }
        JsSuccess(b.result())
      case _ => JsError(s"Not an object: $json")
    }
    def writes(m: MyDict): JsValue = {
      val fields: Seq[(String, JsValue)] = m.map {
        case (k, v) => k -> JsNumber(v)
      } (collection.breakOut)
      JsObject(fields)
    }
  }
  val m = Map("hallo" -> 12, "gallo" -> 34)
  val serial = Json.toJson(m)
  val text = Json.stringify(serial)
  println(text)
  val back = Json.fromJson[MyDict](serial)
  assert(back == JsSuccess(m), s"Failed: $back")
}
While you can construct and deconstruct JsValues directly, the main idea is to use a Format[A] where A is the type of your data structure. This puts more emphasis on type safety than the standard Scala library JSON. It looks more verbose, but in the end I think it's the better approach.
There are utility methods Json.toJson and Json.fromJson which look for an implicit format of the type you want.
On the other hand, it does construct everything in-memory and it does duplicate your data structure (because for each entry in your map you will have another tuple (String, JsValue)), so this isn't necessarily the most memory efficient solution, given that you are operating in the GB magnitude...
Jerkson is a Scala wrapper for the Java JSON library Jackson. The latter apparently has the feature to stream data. I found this project which says it adds streaming support. Play JSON in turn is based on Jerkson, so perhaps you can even figure out how to stream your object with that. See also this question.
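To illustrate what streaming means here, the following minimal sketch writes a Map[String, Int] entry by entry to a java.io.Writer using only the standard library, so neither an intermediate JSON tree nor one large string is built. It is not based on Jackson or Jerkson; StreamingMapWriter, escape and writeJson are illustrative names, and the hand-written escaping only covers what JSON requires inside strings.

```scala
import java.io.Writer

object StreamingMapWriter {
  // Escapes quotes, backslashes and control characters inside a JSON string
  def escape(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '"'          => sb.append('\\').append('"')
      case '\\'         => sb.append('\\').append('\\')
      case '\n'         => sb.append('\\').append('n')
      case '\r'         => sb.append('\\').append('r')
      case '\t'         => sb.append('\\').append('t')
      case c if c < ' ' => sb.append('\\').append('u').append("%04x".format(c.toInt))
      case c            => sb.append(c)
    }
    sb.toString
  }

  // Writes one entry at a time; memory use does not grow with the number of entries
  def writeJson(entries: Iterator[(String, Int)], out: Writer): Unit = {
    out.write("{")
    var first = true
    entries.foreach { case (k, v) =>
      if (!first) out.write(",")
      first = false
      out.write("\"" + escape(k) + "\":" + v)
    }
    out.write("}")
  }
}
```

In practice the Writer would typically be a BufferedWriter over the output file, with myDictionary.iterator passed as the entries; whether this beats a library serializer for a given input is something to measure rather than assume.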