restart iterator on exceptions in Scala - exception

I have an iterator (actually a Source.getLines) that's reading an infinite stream of data from a URL. Occasionally the iterator throws a java.io.IOException when there is a connection problem. In such situations, I need to re-connect and re-start the iterator. I want this to be seamless so that the iterator just looks like a normal iterator to the consumer, but underneath is restarting itself as necessary.
For example, I'd like to see the following behavior:
scala> val iter = restartingIterator(() => new Iterator[Int]{
var i = -1
def hasNext = {
if (this.i < 3) {
true
} else {
throw new IOException
}
}
def next = {
this.i += 1
i
}
})
res0: ...
scala> iter.take(6).toList
res1: List[Int] = List(0, 1, 2, 3, 0, 1)
I have a partial solution to this problem, but it will fail on some corner cases (e.g. an IOException on the first item after a restart) and it's pretty ugly:
def restartingIterator[T](getIter: () => Iterator[T]) = new Iterator[T] {
var iter = getIter()
def hasNext = {
try {
iter.hasNext
} catch {
case e: IOException => {
this.iter = getIter()
iter.hasNext
}
}
}
def next = {
try {
iter.next
} catch {
case e: IOException => {
this.iter = getIter()
iter.next
}
}
}
}
I keep feeling like there's a better solution to this, maybe some combination of Iterator.continually and util.control.Exception or something like that, but I couldn't figure one out. Any ideas?

This is fairly close to your version and using scala.util.control.Exception:
def restartingIterator[T](getIter: () => Iterator[T]) = new Iterator[T] {
import util.control.Exception.allCatch
private[this] var i = getIter()
private[this] def replace() = i = getIter()
def hasNext: Boolean = allCatch.opt(i.hasNext).getOrElse{replace(); hasNext}
def next(): T = allCatch.opt(i.next).getOrElse{replace(); next}
}
For some reason this is not tail recursive but it that can be fixed by using a slightly more verbose version:
def restartingIterator2[T](getIter: () => Iterator[T]) = new Iterator[T] {
import util.control.Exception.allCatch
private[this] var i = getIter()
private[this] def replace() = i = getIter()
#annotation.tailrec def hasNext: Boolean = {
val v = allCatch.opt(i.hasNext)
if (v.isDefined) v.get else {replace(); hasNext}
}
#annotation.tailrec def next(): T = {
val v = allCatch.opt(i.next)
if (v.isDefined) v.get else {replace(); next}
}
}
Edit: There is a solution with util.control.Exception and Iterator.continually:
def restartingIterator[T](getIter: () => Iterator[T]) = {
import util.control.Exception.allCatch
var iter = getIter()
def f: T = allCatch.opt(iter.next).getOrElse{iter = getIter(); f}
Iterator.continually { f }
}

There is a better solution, the Iteratee:
http://apocalisp.wordpress.com/2010/10/17/scalaz-tutorial-enumeration-based-io-with-iteratees/
Here is for example an enumerator that restarts on encountering an exception.
def enumReader[A](r: => BufferedReader, it: IterV[String, A]): IO[IterV[String, A]] = {
val tmpReader = r
def loop: IterV[String, A] => IO[IterV[String, A]] = {
case i#Done(_, _) => IO { i }
case Cont(k) => for {
s <- IO { try { val x = tmpReader.readLine; IO(x) }
catch { case e => enumReader(r, it) }}.join
a <- if (s == null) k(EOF) else loop(k(El(s)))
} yield a
}
loop(it)
}
The inner loop advances the Iteratee, but the outer function still holds on to the original. Since Iteratee is a persistent data structure, to restart you just have to call the function again.
I'm passing the Reader by name here so that r is essentially a function that gives you a fresh (restarted) reader. In practise you will want to bracket this more effectively (close the existing reader on exception).

Here's an answer that doesn't work, but feels like it should:
def restartingIterator[T](getIter: () => Iterator[T]): Iterator[T] = {
new Traversable[T] {
def foreach[U](f: T => U): Unit = {
try {
for (item <- getIter()) {
f(item)
}
} catch {
case e: IOException => this.foreach(f)
}
}
}.toIterator
}
I think this very clearly describes the control flow, which is great.
This code will throw a StackOverflowError in Scala 2.8.0 because of a bug in Traversable.toStream, but even after the fix for that bug, this code still won't work for my use case because toIterator calls toStream, which means that it will store all items in memory.
I'd love to be able to define an Iterator by just writing a foreach method, but there doesn't seem to be any easy way to do that.

Related

Handle exception for Scala Async and Future variable, error accessing variable name outside try block

#scala.throws[scala.Exception]
def processQuery(searchQuery : scala.Predef.String) : scala.concurrent.Future[io.circe.Json] = { /* compiled code */ }
How do I declare the searchResult variable at line 3 so that it can be initailized inside the try block and can be processed if it's successful after and outside the try block. Or, is there any other way to handle the exception? The file containing processQuery function is not editable to me, it's read-only.
def index = Action.async{ implicit request =>
val query = request.body.asText.get
var searchResult : scala.concurrent.Future[io.circe.Json] = Future[io.circe.Json] //line 3
var jsonVal = ""
try {
searchResult = search.processQuery(query)
} catch {
case e :Throwable => jsonVal = e.getMessage
}
searchResult onSuccess ({
case result => jsonVal = result.toString()
})
searchResult.map{ result =>
Ok(Json.parse(jsonVal))
}
}
if declared in the way it has been declared it's showing compilation error
Would using the recover method help you? I also suggest to avoid var and use a more functional approach if possible. In my world (and play Json library), I would hope to get to something like:
def index = Action.async { implicit request =>
processQuery(request.body.asText.get).map { json =>
Ok(Json.obj("success" -> true, "result" -> json))
}.recover {
case e: Throwable => Ok(Json.obj("success" -> false, "message" -> e.getMessage))
}
}
I guess it may be necessary to put the code in another whole try catch:
try {
processQuery....
...
} catch {
...
}
I have here a way to validate on the incoming JSON and fold on the result of the validation:
def returnToNormalPowerPlant(id: Int) = Action.async(parse.tolerantJson) { request =>
request.body.validate[ReturnToNormalCommand].fold(
errors => {
Future.successful{
BadRequest(
Json.obj("status" -> "error", "message" -> JsError.toJson(errors))
)
}
},
returnToNormalCommand => {
actorFor(id) flatMap {
case None =>
Future.successful {
NotFound(s"HTTP 404 :: PowerPlant with ID $id not found")
}
case Some(actorRef) =>
sendCommand(actorRef, id, returnToNormalCommand)
}
}
)
}

Do one webservice request only from play framework

I'm new to the play framework generally and how to use it with Scala. I want to build a proxy for big Json objects. I achieved so far that the json is stored in a cache and if it is not there, requested from a webservice.
However when two requests are coming in, targeting the same end point (webservice and path are identicall) only one call should be performed and the other request should wait for the result of the first call. At the moment it is performing a call to the service with every request.
This is my controller:
#Singleton
class CmsProxyController #Inject()(val cmsService: CmsProxyService) extends Controller {
implicit def ec : ExecutionContext = play.api.libs.concurrent.Execution.defaultContext
def header(path: String) = Action.async { context =>
cmsService.head(path) map { title =>
Ok(Json.obj("title" -> title))
}
}
def teaser(path: String) = Action.async { context =>
cmsService.teaser(path) map { res =>
Ok(res).as(ContentTypes.JSON)
}
}
}
This is the service:
trait CmsProxyService {
def head(path: String): Future[String]
def teaser(path: String): Future[String]
}
#Singleton
class DefaultCmsProxyService #Inject()(cache: CacheApi, cmsCaller: CmsCaller) extends CmsProxyService {
private val BASE = "http://foo.com"
private val CMS = "bar/rest/"
private val log = Logger("application")
override def head(path: String) = {
query(url(path), "$.payload[0].title")
}
override def teaser(path: String) = {
query(url(path), "$.payload[0].content.teaserText")
}
private def url(path: String) = s"${BASE}/${CMS}/${path}"
private def query(url: String, jsonPath: String): Future[String] = {
val key = s"${url}?${jsonPath}"
val payload = findInCache(key)
if (payload.isDefined) {
log.debug("found payload in cache")
Future.successful(payload.get)
} else {
val queried = parse(fetch(url)) map { json =>
JSONPath.query(jsonPath, json).as[String]
}
queried.onComplete(value => saveInCache(key, value.get))
queried
}
}
private def parse(fetched: Future[String]): Future[JsValue] = {
fetched map { jsonString =>
Json.parse(jsonString)
}
}
//retrieve the requested value from the cache or from ws
private def fetch(url: String): Future[String] = {
val body = findInCache(url)
if (body.isDefined) {
log.debug("found body in cache")
Future.successful(body.get)
} else {
cmsCaller.call(url)
}
}
private def findInCache(key: String): Option[String] = cache.get(key)
private def saveInCache(key: String, value: String, duration: FiniteDuration = 5.minutes) = cache.set(key, value, 5.minutes)
}
And finally the call to the webservice:
trait CmsCaller {
def call(url: String): Future[String]
}
#Singleton
class DefaultCmsCaller #Inject()(wsClient: WSClient) extends CmsCaller {
import scala.concurrent.ExecutionContext.Implicits.global
//keep those futures which are currently requested
private val calls: Map[String, Future[String]] = TrieMap()
private val log = Logger("application")
override def call(url: String): Future[String] = {
if(calls.contains(url)) {
Future.successful("ok")
}else {
val f = doCall(url)
calls put(url, f)
f
}
}
//do the final call
private def doCall(url: String): Future[String] = {
val request = ws(url)
val response = request.get()
val mapped = mapResponse(response)
mapped.onComplete(_ => cmsCalls.remove(url))
mapped
}
private def ws(url: String): WSRequest = wsClient.url(url)
//currently executed with every request
private def mapResponse(f: Future[WSResponse]): Future[String] = {
f.onComplete(_ => log.debug("call completed"))
f map {res =>
val status = res.status
log.debug(s"ws called, response status: ${status}")
if (status == 200) {
res.body
} else {
""
}
}
}
}
My question is: How can only one call to the webservice beeing executed? Even if there are several requests to the same target. I don't want to block it, the other request (not sure if I use the right word here) shall just be informed that there is already a webservice call on the way.
The request to head and teaser, see controller, shall perform only one call to the webservice.
Simple answer using Scala lazy keyword
def requestPayload(): String = ??? //do something
#Singleton
class SimpleCache #Inject() () {
lazy val result: Future[String] = requestPayload()
}
//Usage
#Singleton
class SomeController #Inject() (simpleCache: SimpleCache) {
def action = Action { req =>
simpleCache.result.map { result =>
Ok("success")
}
}
}
First request will trigger the rest call and all the other requests will use the cached result. Use map and flatMap to chain the requests.
Complicated answer using Actors
Use Actor to queue requests and Cache the result of the first successful request json result. All the other requests will read the result of the first request.
case class Request(value: String)
class RequestManager extends Actor {
var mayBeResult: Option[String] = None
var reqs = List.empty[(ActorRef, Request)]
def receive = {
case req: Request =>
context become firstReq
self ! req
}
def firstReq = {
case req: Request =>
process(req).onSuccess { value =>
mayBeResult = Some(value)
context become done
self ! "clear_pending_reqs"
}
context become processing
}
def processing = {
case req: Request =>
//queue requests
reqs = reqs ++ List(sender -> req)
}
def done = {
case "clear_pending_reqs" =>
reqs.foreach { case (sender, _) =>
//send value to the sender
sender ! value.
}
}
}
handle the case where the first request fails. In the above code block if the first request fails then actor will never go to the done state.
I solved my problem with a synchronization of the cache in the service. I'm not sure if this an elegant solution, but it works for me.
trait SyncCmsProxyService {
def head(path: String): String
def teaser(path: String): String
}
#Singleton
class DefaultSyncCmsProxyService #Inject()(implicit cache: CacheApi, wsClient: WSClient) extends SyncCmsProxyService with UrlBuilder with CacheAccessor{
private val log = Logger("application")
override def head(path: String) = {
log.debug("looking for head ...")
query(url(path), "$.payload[0].title")
}
override def teaser(path: String) = {
log.debug("looking for teaser ...")
query(url(path), "$.payload[0].content.teaserText")
}
private def query(url: String, jsonPath: String) = {
val key = s"${url}?${jsonPath}"
val payload = findInCache(key)
if (payload.isDefined) {
payload.get
}else{
val json = Json.parse(body(url))
val queried = JSONPath.query(jsonPath, json).as[String]
saveInCache(key, queried)
}
}
private def body(url: String) = {
cache.synchronized {
val body = findInCache(url)
if (body.isDefined) {
log.debug("found body in cache")
body.get
} else {
saveInCache(url, doCall(url))
}
}
}
private def doCall(url : String): String = {
import scala.concurrent.ExecutionContext.Implicits.global
log.debug("calling...")
val req = wsClient.url(url).get()
val f = req map { res =>
val status = res.status
log.debug(s"endpoint called! response status: ${status}")
if (status == 200) {
res.body
} else {
""
}
}
Await.result(f, 15.seconds)
}
}
Note that I omitted the traits UrlBuilder and CacheAccessor here because they are trivial.

Scala-Slick: only part of sequence of actions executed

Have a piece of code that adds/updates a Product and also associates one or more tags to it. Tags are actually added to a TagGroup and that is associated with the Product.
Issue I am facing is that only "part" of addOrUpdateProductWithTags() executes. Product is updated or created but Tags are not added. If I comment the last query (see comment) then everything works. Have turned on "" to confirm this.
lazy val pRetId = prods returning prods.map(_.id)
def addTags(keywords: Seq[String]) = {
for {
k <- keywords
} yield {
tags.filter(_.keyword === k).take(1).result.headOption.flatMap {
case Some(tag) => {
Logger.debug("Using existing tag: " + k)
DBIO.successful(tag.id)
}
case None => {
Logger.debug("Adding new tag: " + k)
tags.returning(tags.map(_.id)) += Tag(k, Some("DUMMY"))
}
}
}
}
def addOrUpdateProductWithTags(prod: Product, tagSet: Seq[String]): Future[Option[Long]] = {
// handle add or update product
val prodObject = prod.id match {
case 0L => pRetId += prod
case _ => prods.withFilter(_.id === prod.id).update(prod)
}
val action = for {
pid <- prodObject
tids <- DBIO.sequence(addTags(tagSet))
} yield (tids, pid)
val finalAction = action.flatMap {
case (tids, pid) => {
val prodId = if (prod.id > 0L) prod.id else pid.asInstanceOf[Number].longValue
val delAction = tagGroups.filter(_.prodId === prodId).delete
val tgAction = for {
tid <- tids
} yield {
tagGroups += TagGroup("Ignored-XX", prodId, tid)
}
delAction.flatMap { x => DBIO.sequence(tgAction) }
// IF LINE BELOW IS COMMENTED THEN TagGroup is created else even delete above doesn't happen
prods.filter(_.id === prodId).map(_.id).result.headOption
}
}
db.run(finalAction.transactionally)
}
This is the snippet in the controller where this method is called from. My suspicion is that caller doesn't wait long enough but not sure...
val prod = Prod(...)
val tagSet = generateTags(prod.tags)
val add = prodsService.addOrUpdateProductWithTags(prod, tagSet)
add.map { value =>
Redirect(controllers.www.routes.Dashboard.dashboard)
}.recover {
case th =>
InternalServerError("bad things happen in life: " + th)
}
Any clue what's wrong with the query ?
Stack: Scala 2.11.7, play version 2.5.4, play-slick 2.0.0 (slick 3.1)
Finally figured out a solution:
In place of the following 2 lines:
delAction.flatMap { x => DBIO.sequence(tgAction) }
prods.filter(_.id === prodId).map(_.id).result.headOption
I combined the actions with andThen operators as follows:
delAction >> DBIO.sequence(tgAction) >> prods.filter(_.id === prodId).map(_.id).result.headOption
Now the entire sequence gets executed. I still don't know what's wrong with the original solution but this works.

Groovy JsonBuilder strange behavior when toString()

I need to create a json to use as body in an http.request. I'm able to build dynamically up the json, but I noticed a strange behavior when calling builder.toString() twice. The resulting json was totally different. I'm likely to think this is something related to a kind of buffer or so. I've been reading the documentation but I can't find a good answer. Here is a code to test.
import groovy.json.JsonBuilder
def builder = new JsonBuilder()
def map = [
catA: ["catA-element1", "catA-element2"],
catB:[],
catC:["catC-element1"]
]
def a = map.inject([:]) { res, k, v ->
def b = v.inject([:]) {resp, i ->
resp[i] = k
resp
}
res += b
}
println a
def root = builder.query {
bool {
must a.collect{ k, v ->
builder.match {
"$v" k
}
}
}
should([
builder.simple_query_string {
query "text"
}
])
}
println builder.toString()
println builder.toString()
This will print the following lines. Pay attention to the last two lines
[catA-element1:catA, catA-element2:catA, catC-element1:catC]
{"query":{"bool":{"must":[{"match":{"catA":"catA-element1"}},{"match":{"catA":"catA-element2"}},{"match":{"catC":"catC-element1"}}]},"should":[{"simple_query_string":{"query":"text"}}]}}
{"match":{"catC":"catC-element1"}}
In my code I can easily send the first toString() result to a variable and use it when needed. But, why does it change when invoking more than one time?
I think this is happening because you are using builder inside the closure bool. If we make print builder.content before printing the result (buider.toString() is calling JsonOutput.toJson(builder.content)) we get:
[query:[bool:ConsoleScript54$_run_closure3$_closure6#294b5a70, should:[[simple_query_string:[query:text]]]]]
Adding println builder.content to the bool closure we can see that the builder.content is modified when the closure is evaluated:
def root = builder.query {
bool {
must a.collect{ k, v ->
builder.match {
"$v" k
println builder.content
}
}
}
should([
builder.simple_query_string {
query "text"
}
])
}
println JsonOutput.toJson(builder.content)
println builder.content
The above yields:
[query:[bool:ConsoleScript55$_run_closure3$_closure6#39b6156d, should:[[simple_query_string:[query:text]]]]]
[match:[catA:catA-element1]]
[match:[catA:catA-element2]]
{"query":{"bool":{"must":[{"match":{"catA":"catA-element1"}},{"match":{"catA":"catA-element2"}},{"match":{"catC":"catC-element1"}}]},"should":[{"simple_query_string":{"query":"text"}}]}}
[match:[catC:catC-element1]]
You can easily avoid that with a different builder for the closure inside:
def builder2 = new JsonBuilder()
def root = builder.query {
bool {
must a.collect{ k, v ->
builder2.match {
"$v" k
}
}
}
should([
builder.simple_query_string {
query "text"
}
])
}
Or even better:
def root = builder.query {
bool {
must a.collect({ k, v -> ["$v": k] }).collect({[match: it]})
}
should([
simple_query_string {
query "text"
}
])
}

Comma separated list with Enumerator

I've just started working with Scala in my new project (Scala 2.10.3, Play2 2.2.1, Reactivemongo 0.10.0), and encountered a pretty standard use case, which is - stream all the users in MongoDB to the external client. After navigating Enumerator, Enumeratee API I have not found a solid solution for that, and so I solved this in following way:
val users = collection.find(Json.obj()).cursor[User].enumerate(Integer.MAX_VALUE, false)
var first:Boolean = true
val indexedUsers = (users.map(u => {
if(first) {
first = false;
Json.stringify(Json.toJson(u))
} else {
"," + Json.stringify(Json.toJson(u))
}
}))
Which, from my point of view, is a little bit tricky - mainly because I needed to add Json Start Array, Json End Array and comma separators in element list, and I was not able to provide it as a pure Json stream, so I converted it to String steam.
What is a standard solution for that, using reactivemongo in play?
I wrote a helper function which does what you want to achieve:
def intersperse[E](e: E, enum: Enumerator[E]): Enumerator[E] = new Enumerator[E] {
val element = Input.El(e)
override def apply[A](i1: Iteratee[E, A]): Future[Iteratee[E, A]] = {
var iter = i1
val loop: Iteratee[E, Unit] = {
lazy val contStep = Cont(step)
def step(in: Input[E]): Iteratee[E, Unit] = in match {
case Input.Empty ⇒ contStep
case Input.EOF ⇒ Done((), Input.Empty)
case e # Input.El(_) ⇒
iter = Iteratee.flatten(iter.feed(element).flatMap(_.feed(e)))
contStep
}
lazy val contFirst = Cont(firstStep)
def firstStep(in: Input[E]): Iteratee[E, Unit] = in match {
case Input.EOF ⇒ Done((), Input.Empty)
case Input.Empty ⇒
iter = Iteratee.flatten(iter.feed(in))
contFirst
case Input.El(x) ⇒
iter = Iteratee.flatten(iter.feed(in))
contStep
}
contFirst
}
enum(loop).map { _ ⇒ iter }
}
}
Usage:
val prefix = Enumerator("[")
val suffix = Enumerator("]")
val asStrings = Enumeratee.map[User] { u => Json.stringify(Json.toJson(u)) }
val result = prefix >>> intersperse(",", users &> asStrings) >>> suffix
Ok.chunked(result)