Reading and initializing json data with scalatest withFixture - json

I am trying to use the withFixture method to initialize my var ip2GeoTestJson and use it throughout my tests. I was able to achieve the desired logic with the var year. I believe the error I am getting (parsing JNothing) is caused because withFixture is not initializing ip2GeoTestJson with the JSON before the tests run.
I am currently getting this error:
*** RUN ABORTED ***
An exception or error caused a run to abort: java.lang.ClassCastException was thrown scenario("event.client_ip_address and event_header.client_ip_address both have values") -, construction cannot continue: "org.json4s.JsonAST$JNothing$ cannot be cast to org.json4s.JsonAST$JObject" (IP2GeoTestSuite.scala:51)
Code:
class IP2GeoTestSuite extends FeatureSpec with SparkContextFixture {
  var ip2GeoTestJson: JValue = null
  var year: String = null

  feature("feature") {
    scenario("scenario") {
      println(ip2GeoTestJson)
      assert(year != null)
      assert(ip2GeoTestJson != null)
    }
  }

  override def withFixture(test: NoArgTest): org.scalatest.Outcome = {
    year = test.configMap("year").asInstanceOf[String]
    val ip2GeoConfigFile = test.configMap("config").asInstanceOf[String]
    val ip2GeoUrl = getClass.getResourceAsStream(s"/$ip2GeoConfigFile")
    val ip2GeoJsonString = Source.fromInputStream(ip2GeoUrl).getLines.mkString("")
    System.out.println(ip2GeoJsonString)
    ip2GeoTestJson = parse(ip2GeoJsonString)
    try {
      test()
    }
  }
}
The code works fine when the lines regarding ip2GeoData are moved to the top of the class, like so, but then I have to hardcode the file name:
class IP2GeoTestSuite extends FeatureSpec with SparkContextFixture {
  val ip2GeoConfigFile = "ip2geofile.json"
  val ip2GeoUrl = getClass.getResourceAsStream(s"/$ip2GeoConfigFile")
  val ip2GeoJsonString = Source.fromInputStream(ip2GeoUrl).getLines.mkString("")
  System.out.println(ip2GeoJsonString)
  val ip2GeoTestJson = parse(ip2GeoJsonString)
  var year: String = null

  feature("feature") {
    scenario("scenario") {
      println(ip2GeoTestJson)
      assert(year != null)
      assert(ip2GeoTestJson != null)
    }
  }

  override def withFixture(test: NoArgTest): org.scalatest.Outcome = {
    year = test.configMap("year").asInstanceOf[String]
    try {
      test()
    }
  }
}

Set params before every test (see http://www.scalatest.org/user_guide/sharing_fixtures#withFixtureOneArgTest):
case class FixtureParams(year: String, ip2GeoTestJson: JValue)

class IP2GeoTestSuite extends fixture.FeatureSpec with SparkContextFixture {
  type FixtureParam = FixtureParams

  feature("feature") {
    scenario("scenario") { fixture =>
      println(fixture.ip2GeoTestJson)
      assert(fixture.year != null)
      assert(fixture.ip2GeoTestJson != null)
    }
  }

  override def withFixture(test: OneArgTest): org.scalatest.Outcome = {
    val year = test.configMap("year").asInstanceOf[String]
    val ip2GeoConfigFile = test.configMap("config").asInstanceOf[String]
    val ip2GeoUrl = getClass.getResourceAsStream(s"/$ip2GeoConfigFile")
    val ip2GeoJsonString = Source.fromInputStream(ip2GeoUrl).getLines.mkString("")
    val fixtureParam = FixtureParams(year, parse(ip2GeoJsonString))
    try {
      withFixture(test.toNoArgTest(fixtureParam))
    } finally {
      // Close resources to avoid leaks and unpredictable behaviour
      ip2GeoUrl.close()
    }
  }
}
Set the params with BeforeAndAfter (http://www.scalatest.org/user_guide/sharing_fixtures#beforeAndAfter); note that a before block runs before each test:
class IP2GeoTestSuite extends FeatureSpec with BeforeAndAfter {
  var ip2GeoTestJson: JValue = null
  var year: String = null

  before {
    // Load config manually because configMap isn't available here.
    val config = ConfigFactory.load()
    year = config.getString("year")
    val ip2GeoConfigFile = "ip2geofile.json"
    val ip2GeoUrl = getClass.getResourceAsStream(s"/$ip2GeoConfigFile")
    val ip2GeoJsonString = Source.fromInputStream(ip2GeoUrl).getLines.mkString("")
    ip2GeoUrl.close()
    System.out.println(ip2GeoJsonString)
    ip2GeoTestJson = parse(ip2GeoJsonString)
  }

  feature("feature") {
    scenario("scenario") {
      println(ip2GeoTestJson)
      assert(year != null)
      assert(ip2GeoTestJson != null)
    }
  }
}
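As an aside (my suggestion, not part of the original answer): if the setup really should run only once and still needs the configMap, ScalaTest's BeforeAndAfterAllConfigMap trait passes the ConfigMap to beforeAll. A minimal sketch, assuming the same json4s parse and resource file as above:
class IP2GeoTestSuite extends FeatureSpec with BeforeAndAfterAllConfigMap {
  var ip2GeoTestJson: JValue = null
  var year: String = null

  override def beforeAll(configMap: ConfigMap): Unit = {
    year = configMap("year").asInstanceOf[String]
    val ip2GeoConfigFile = configMap("config").asInstanceOf[String]
    val ip2GeoUrl = getClass.getResourceAsStream(s"/$ip2GeoConfigFile")
    try {
      // parse once for the whole suite
      ip2GeoTestJson = parse(Source.fromInputStream(ip2GeoUrl).getLines.mkString(""))
    } finally {
      ip2GeoUrl.close()
    }
  }

  feature("feature") {
    scenario("scenario") {
      assert(year != null)
      assert(ip2GeoTestJson != null)
    }
  }
}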

Related

How can I connect Scala.js with a MySQL database?

How can I connect a Scala.js controller method to a MySQL database?
This is what I have so far:
case class data(user_d: String, udid: String, image: String)

object mysqlDAO {
  def findById(udid: Long): Option[data] = {
    val connection = DriverManager.getConnection("jdbc:mysql://localhost:3307/umo", "root", "")
    try {
      val statement = connection.prepareStatement("""select * from messages where udid = ?""")
      statement.setLong(1, udid)
      val rs = statement.executeQuery()
      if (rs.next())
        Option(new data(rs.getString(1), rs.getString(2), rs.getString(3)))
      else
        Option.empty
    } finally {
      connection.close()
    }
  }
}
First of all, create the case class in the model package, then create a companion object to serialize your case class Data:
----------
import anorm._
import anorm.SqlParser.get
import play.api.Play.current
import play.api.db.DB
import play.api.libs.json._

case class Data(user_id: String, id: String, image: String)

object Data {
  /* serialization of the Data case class */
  implicit val DataWrites = new Writes[Data] {
    def writes(json_write: Data): JsValue = {
      Json.obj(
        "user_id" -> json_write.user_id,
        "id" -> json_write.id,
        "image" -> json_write.image
      )
    }
  }

  // Anorm row parser
  val simple = {
    get[String]("user.user_id") ~
      get[String]("user.id") ~
      get[String]("user.image") map {
        case user_id ~ id ~ image =>
          Data(user_id, id, image)
      }
  }

  /* to get the data */
  def find(udid: String): Option[Data] = {
    DB.withConnection { implicit connection =>
      SQL(ConstantSQL.GET_DETAILS).on('uid -> udid).as(Data.simple.singleOpt)
    }
  }
}
----------
// query to get the data based on the uid (Anorm uses {uid} rather than ? for named parameters)
val GET_DETAILS = """select * from user_image where uid = {uid}"""
Then you can create your own controller method:
----------
val uDetails = Data.find(uid)
----------
Use the above line to get the data in your class.
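To round this out, a controller action could serialize the lookup result with the Writes defined above (a sketch only; the controller and action names are made up):
import play.api.mvc._
import play.api.libs.json.Json

object UserImageController extends Controller {
  def details(uid: String) = Action {
    Data.find(uid) match {
      case Some(d) => Ok(Json.toJson(d)) // uses the implicit DataWrites
      case None    => NotFound(Json.obj("error" -> s"no row for uid $uid"))
    }
  }
}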

Do one webservice request only from play framework

I'm new to the Play Framework generally and to using it with Scala. I want to build a proxy for big JSON objects. So far I have it storing the JSON in a cache and, if it is not there, requesting it from a web service.
However, when two requests come in targeting the same endpoint (web service and path are identical), only one call should be performed and the other request should wait for the result of the first call. At the moment a call to the service is performed for every request.
This is my controller:
@Singleton
class CmsProxyController @Inject()(val cmsService: CmsProxyService) extends Controller {
  implicit def ec: ExecutionContext = play.api.libs.concurrent.Execution.defaultContext

  def header(path: String) = Action.async { context =>
    cmsService.head(path) map { title =>
      Ok(Json.obj("title" -> title))
    }
  }

  def teaser(path: String) = Action.async { context =>
    cmsService.teaser(path) map { res =>
      Ok(res).as(ContentTypes.JSON)
    }
  }
}
This is the service:
trait CmsProxyService {
  def head(path: String): Future[String]
  def teaser(path: String): Future[String]
}

@Singleton
class DefaultCmsProxyService @Inject()(cache: CacheApi, cmsCaller: CmsCaller) extends CmsProxyService {
  private val BASE = "http://foo.com"
  private val CMS = "bar/rest/"
  private val log = Logger("application")

  override def head(path: String) = {
    query(url(path), "$.payload[0].title")
  }

  override def teaser(path: String) = {
    query(url(path), "$.payload[0].content.teaserText")
  }

  private def url(path: String) = s"${BASE}/${CMS}/${path}"

  private def query(url: String, jsonPath: String): Future[String] = {
    val key = s"${url}?${jsonPath}"
    val payload = findInCache(key)
    if (payload.isDefined) {
      log.debug("found payload in cache")
      Future.successful(payload.get)
    } else {
      val queried = parse(fetch(url)) map { json =>
        JSONPath.query(jsonPath, json).as[String]
      }
      queried.onComplete(value => saveInCache(key, value.get))
      queried
    }
  }

  private def parse(fetched: Future[String]): Future[JsValue] = {
    fetched map { jsonString =>
      Json.parse(jsonString)
    }
  }

  // retrieve the requested value from the cache or from the web service
  private def fetch(url: String): Future[String] = {
    val body = findInCache(url)
    if (body.isDefined) {
      log.debug("found body in cache")
      Future.successful(body.get)
    } else {
      cmsCaller.call(url)
    }
  }

  private def findInCache(key: String): Option[String] = cache.get(key)

  private def saveInCache(key: String, value: String, duration: FiniteDuration = 5.minutes) = cache.set(key, value, duration)
}
And finally the call to the webservice:
trait CmsCaller {
  def call(url: String): Future[String]
}

@Singleton
class DefaultCmsCaller @Inject()(wsClient: WSClient) extends CmsCaller {
  import scala.concurrent.ExecutionContext.Implicits.global

  // keep those futures which are currently requested
  private val calls: Map[String, Future[String]] = TrieMap()
  private val log = Logger("application")

  override def call(url: String): Future[String] = {
    if (calls.contains(url)) {
      Future.successful("ok")
    } else {
      val f = doCall(url)
      calls put (url, f)
      f
    }
  }

  // do the final call
  private def doCall(url: String): Future[String] = {
    val request = ws(url)
    val response = request.get()
    val mapped = mapResponse(response)
    mapped.onComplete(_ => calls.remove(url))
    mapped
  }

  private def ws(url: String): WSRequest = wsClient.url(url)

  // currently executed with every request
  private def mapResponse(f: Future[WSResponse]): Future[String] = {
    f.onComplete(_ => log.debug("call completed"))
    f map { res =>
      val status = res.status
      log.debug(s"ws called, response status: ${status}")
      if (status == 200) {
        res.body
      } else {
        ""
      }
    }
  }
}
My question is: how can I ensure that only one call to the web service is executed, even if there are several requests to the same target? I don't want to block; the other request (not sure if I use the right word here) shall just be informed that there is already a web service call on the way.
The requests to head and teaser (see the controller) shall perform only one call to the web service.
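For reference, one minimal way to get that behaviour (a sketch, not from the answers below) is to cache the in-flight Future itself, so concurrent callers share a single request; this assumes Play's WSClient as above:
import scala.collection.concurrent.TrieMap
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import play.api.libs.ws.WSClient

class DedupingCaller(wsClient: WSClient) {
  // url -> Future of the call that is already in flight
  private val inFlight = TrieMap.empty[String, Future[String]]

  def call(url: String): Future[String] =
    // getOrElseUpdate is atomic on TrieMap (Scala 2.11.6+), so only the
    // first caller fires the request; later callers get the same Future.
    inFlight.getOrElseUpdate(url, {
      val f = wsClient.url(url).get().map(_.body)
      // drop the entry once done so a later request triggers a fresh call
      f.onComplete(_ => inFlight.remove(url))
      f
    })
}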
Simple answer using Scala lazy keyword
// stub for the actual web service request
def requestPayload(): Future[String] = ??? // do something

@Singleton
class SimpleCache @Inject()() {
  lazy val result: Future[String] = requestPayload()
}

// Usage
@Singleton
class SomeController @Inject()(simpleCache: SimpleCache) {
  def action = Action.async { req =>
    simpleCache.result.map { result =>
      Ok("success")
    }
  }
}
The first request will trigger the REST call, and all other requests will use the cached result. Use map and flatMap to chain the requests.
Complicated answer using Actors
Use an Actor to queue requests and cache the JSON result of the first successful request. All the other requests will read the result of the first request.
import akka.actor.{Actor, ActorRef}
import scala.concurrent.Future

case class Request(value: String)

class RequestManager extends Actor {
  import context.dispatcher

  var mayBeResult: Option[String] = None
  var reqs = List.empty[(ActorRef, Request)]

  def receive: Receive = {
    case req: Request =>
      context become firstReq
      self ! req
  }

  def firstReq: Receive = {
    case req: Request =>
      process(req).onSuccess { case value =>
        mayBeResult = Some(value)
        context become done
        self ! "clear_pending_reqs"
      }
      context become processing
  }

  def processing: Receive = {
    case req: Request =>
      // queue requests until the first one completes
      reqs = reqs ++ List(sender() -> req)
  }

  def done: Receive = {
    case "clear_pending_reqs" =>
      reqs.foreach { case (requester, _) =>
        // send the cached value to each queued sender
        requester ! mayBeResult.get
      }
  }

  // placeholder for the actual web service call (not defined in the original answer)
  def process(req: Request): Future[String] = ???
}
You also need to handle the case where the first request fails: in the above code block, if the first request fails, the actor will never reach the done state.
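One way to plug that hole (my sketch, reusing the hypothetical process stub and dispatcher import from the actor above): report the failure to everyone who is queued and fall back to the initial state so a later request retries:
import scala.util.{Failure, Success}

def firstReq: Receive = {
  case req: Request =>
    process(req).onComplete {
      case Success(value) =>
        mayBeResult = Some(value)
        context become done
        self ! "clear_pending_reqs"
      case Failure(cause) =>
        // tell the queued senders the call failed, then reset so a
        // later request becomes the new "first" request
        reqs.foreach { case (requester, _) => requester ! akka.actor.Status.Failure(cause) }
        reqs = List.empty
        context become receive
    }
    context become processing
}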
I solved my problem with synchronization on the cache in the service. I'm not sure if this is an elegant solution, but it works for me.
trait SyncCmsProxyService {
  def head(path: String): String
  def teaser(path: String): String
}

@Singleton
class DefaultSyncCmsProxyService @Inject()(implicit cache: CacheApi, wsClient: WSClient) extends SyncCmsProxyService with UrlBuilder with CacheAccessor {
  private val log = Logger("application")

  override def head(path: String) = {
    log.debug("looking for head ...")
    query(url(path), "$.payload[0].title")
  }

  override def teaser(path: String) = {
    log.debug("looking for teaser ...")
    query(url(path), "$.payload[0].content.teaserText")
  }

  private def query(url: String, jsonPath: String) = {
    val key = s"${url}?${jsonPath}"
    val payload = findInCache(key)
    if (payload.isDefined) {
      payload.get
    } else {
      val json = Json.parse(body(url))
      val queried = JSONPath.query(jsonPath, json).as[String]
      saveInCache(key, queried)
    }
  }

  private def body(url: String) = {
    cache.synchronized {
      val body = findInCache(url)
      if (body.isDefined) {
        log.debug("found body in cache")
        body.get
      } else {
        saveInCache(url, doCall(url))
      }
    }
  }

  private def doCall(url: String): String = {
    import scala.concurrent.ExecutionContext.Implicits.global
    log.debug("calling...")
    val req = wsClient.url(url).get()
    val f = req map { res =>
      val status = res.status
      log.debug(s"endpoint called! response status: ${status}")
      if (status == 200) {
        res.body
      } else {
        ""
      }
    }
    Await.result(f, 15.seconds)
  }
}
Note that I omitted the traits UrlBuilder and CacheAccessor here because they are trivial.

Serialization error while writing JSON to file

I am reading text files and creating JSON objects (JsValues) in every iteration. I want to save them to a file at each iteration. I am using the Play Framework to create the JSON objects.
class Cleaner {
  def getDocumentData() = {
    for (i <- no_of_files) {
      // .... do something ...
      some_json = Json.obj("text" -> LARGE_TEXT)
      final_json = Json.stringify(some_json)
      // save final_json here to a file
    }
  }
}
I tried using PrintWriter to save that JSON, but I am getting Exception in thread "main" org.apache.spark.SparkException: Task not serializable as the error.
How should I correct this? Or is there another way I can save the JsValue?
UPDATE:
I read that the Serializable trait has to be used in this case. I have the following function:
class Cleaner() extends Serializable {
  def readDocumentData() {
    val conf = new SparkConf()
      .setAppName("linkin_spark")
      .setMaster("local[2]")
      .set("spark.executor.memory", "1g")
      .set("spark.rdd.compress", "true")
      .set("spark.storage.memoryFraction", "1")
    val sc = new SparkContext(conf)
    val temp = sc.wholeTextFiles("text_doc.dat")
    val docStartRegex = """<DOC>""".r
    val docEndRegex = """</DOC>""".r
    val docTextStartRegex = """<TEXT>""".r
    val docTextEndRegex = """</TEXT>""".r
    val docnoRegex = """<DOCNO>(.*?)</DOCNO>""".r
    val writer = new PrintWriter(new File("test.json"))
    for (fileData <- temp) {
      val filename = fileData._1
      val content: String = fileData._2
      println(s"For $filename, the data is:")
      var startDoc = false // These track state for the
      var endDoc = false   // whole file
      var startText = false
      var endText = false
      var textChunk = new ListBuffer[String]()
      var docID: String = ""
      var es_json: JsValue = Json.obj()
      for (current_line <- content.lines) {
        current_line match {
          case docStartRegex(_*) => {
            startDoc = true
            endText = false
            endDoc = false
          }
          case docnoRegex(group) => {
            docID = group.trim
          }
          case docTextStartRegex(_*) => {
            startText = true
          }
          case docTextEndRegex(_*) => {
            endText = true
            startText = false
          }
          case docEndRegex(_*) => {
            endDoc = true
            startDoc = false
            es_json = Json.obj(
              "_id" -> docID,
              "_source" -> Json.obj(
                "text" -> textChunk.mkString(" ")
              )
            )
            writer.write(es_json.toString())
            println(es_json.toString())
            textChunk.clear()
          }
          case _ => {
            if (startDoc && !endDoc && startText) {
              textChunk += current_line.trim
            }
          }
        }
      }
    }
    writer.close()
  }
}
This is the function to which I added the trait, but I am still getting the same exception.
I rewrote a smaller version of it:
def foo() {
  val conf = new SparkConf()
    .setAppName("linkin_spark")
    .setMaster("local[2]")
    .set("spark.executor.memory", "1g")
    .set("spark.rdd.compress", "true")
    .set("spark.storage.memoryFraction", "1")
  val sc = new SparkContext(conf)
  var es_json: JsValue = Json.obj()
  val writer = new PrintWriter(new File("test.json"))
  for (i <- 1 to 10) {
    es_json = Json.obj(
      "_id" -> i,
      "_source" -> Json.obj(
        "text" -> "Eureka!"
      )
    )
    println(es_json)
    writer.write(es_json.toString() + "\n")
  }
  writer.close()
}
This function works fine both with and without Serializable. I cannot understand what's happening.
EDIT: First answer made on phone.
It's not your main class that needs to be serializable but the class you use in the RDD processing loop, in this case the closure inside for (fileData <- temp).
It needs to be serializable because the Spark data is on multiple partitions that may be on multiple computers. So the functions you apply to this data need to be serializable so you can send them to the other computers, where they will be executed in parallel.
PrintWriter cannot be serializable since it refers to a file that is only available from the original computer. Hence the serialization error.
To write your data on the computer initializing the Spark process, you need to take the data that is spread over the cluster, bring it to your machine, then write it.
To do that you can either collect the result: rdd.collect() will take all the data from the cluster and put it in your driver thread's memory. Then you can write it to a file using the PrintWriter,
like this:
temp.flatMap { fileData =>
  val filename = fileData._1
  val content: String = fileData._2
  println(s"For $filename, the data is:")
  var startDoc = false // These track state for the
  var endDoc = false   // whole file
  var startText = false
  var endText = false
  var textChunk = new ListBuffer[String]()
  var docID: String = ""
  var es_json: JsValue = Json.obj()
  var results = ArrayBuffer[String]()
  for (current_line <- content.lines) {
    current_line match {
      case docStartRegex(_*) => {
        startDoc = true
        endText = false
        endDoc = false
      }
      case docnoRegex(group) => {
        docID = group.trim
      }
      case docTextStartRegex(_*) => {
        startText = true
      }
      case docTextEndRegex(_*) => {
        endText = true
        startText = false
      }
      case docEndRegex(_*) => {
        endDoc = true
        startDoc = false
        es_json = Json.obj(
          "_id" -> docID,
          "_source" -> Json.obj(
            "text" -> textChunk.mkString(" ")
          )
        )
        results.append(es_json.toString())
        println(es_json.toString())
        textChunk.clear()
      }
      case _ => {
        if (startDoc && !endDoc && startText) {
          textChunk += current_line.trim
        }
      }
    }
  }
  results
}
  .collect()
  .foreach(es_json => writer.write(es_json))
If the result is too large for the driver thread's memory, you can use the saveAsTextFile function, which streams each partition to disk. In this second case the path you give as an argument will be made into a folder, and each partition of your RDD will be written to a numbered file in it,
like this:
temp.flatMap { fileData =>
  val filename = fileData._1
  val content: String = fileData._2
  println(s"For $filename, the data is:")
  var startDoc = false // These track state for the
  var endDoc = false   // whole file
  var startText = false
  var endText = false
  var textChunk = new ListBuffer[String]()
  var docID: String = ""
  var es_json: JsValue = Json.obj()
  var results = ArrayBuffer[String]()
  for (current_line <- content.lines) {
    current_line match {
      case docStartRegex(_*) => {
        startDoc = true
        endText = false
        endDoc = false
      }
      case docnoRegex(group) => {
        docID = group.trim
      }
      case docTextStartRegex(_*) => {
        startText = true
      }
      case docTextEndRegex(_*) => {
        endText = true
        startText = false
      }
      case docEndRegex(_*) => {
        endDoc = true
        startDoc = false
        es_json = Json.obj(
          "_id" -> docID,
          "_source" -> Json.obj(
            "text" -> textChunk.mkString(" ")
          )
        )
        results.append(es_json.toString())
        println(es_json.toString())
        textChunk.clear()
      }
      case _ => {
        if (startDoc && !endDoc && startText) {
          textChunk += current_line.trim
        }
      }
    }
  }
  results
}
  .saveAsTextFile("test.json")

How to insert a Path in between an existing Path

I am working on an implementation to generate alternate paths using the via-node method.
While checking for local optimality I do the following:
forwardEdge = bestWeightMapFrom.get(viaNode);
reverseEdge = bestWeightMapTo.get(viaNode);
double unpackedUntilDistance = 0;
while (forwardEdge.edge != -1) {
    double parentDist = forwardEdge.parent != null ? forwardEdge.parent.distance : 0;
    double dist = forwardEdge.distance - parentDist;
    if (unpackedUntilDistance + dist >= T_THRESHOLD) {
        EdgeSkipIterState edgeState = (EdgeSkipIterState) graph.getEdgeProps(forwardEdge.edge, forwardEdge.adjNode);
        unpackStack.add(new EdgePair(edgeState, false));
        sV = forwardEdge.adjNode;
        forwardEdge = forwardEdge.parent;
        break;
    } else {
        unpackedUntilDistance += dist;
        forwardEdge = forwardEdge.parent;
        sV = forwardEdge.adjNode;
    }
}
int oldSV = forwardEdge.adjNode;
EdgeEntry oldForwardEdge = forwardEdge;
I unpack the edge in the stack to further narrow down sV.
I get vT and oldVT in a similar fashion by traversing reverseEdge.
If I determine that the path from sV to vT is <= the length of the unpacked edges, I accept this via node and construct the alternate path as follows:
PathBidirRef p = (PathBidirRef) algo.calcPath(oldSV, oldVT);
Path4CHAlt p1 = new Path4CHAlt(graph, flagEncoder);
p1.setSwitchToFrom(false);
p1.setEdgeEntry(oldForwardEdge);
p1.segmentEdgeEntry = p.edgeEntry;
double weight = oldForwardEdge.weight + oldReverseEdge.weight + p.edgeEntry.weight + p.edgeTo.weight;
p1.setWeight(weight);
p1.edgeTo = oldReverseEdge;
p1.segmentEdgeTo = p.edgeTo;
Path p2 = p1.extract();
Path4CHAlt is
public class Path4CHAlt extends Path4CH {
    private boolean switchWrapper = false;
    public EdgeEntry segmentEdgeTo;
    public EdgeEntry segmentEdgeEntry;

    public Path4CHAlt(Graph g, FlagEncoder encoder) {
        super(g, encoder);
    }

    public Path4CHAlt setSwitchToFrom(boolean b) {
        switchWrapper = b;
        return this;
    }

    @Override
    public Path extract() {
        System.out.println("Path4CHAlt extract");
        if (edgeEntry == null || edgeTo == null || segmentEdgeEntry == null || segmentEdgeTo == null)
            return this;
        if (switchWrapper) {
            EdgeEntry ee = edgeEntry;
            edgeEntry = edgeTo;
            edgeTo = ee;
            ee = segmentEdgeEntry;
            segmentEdgeEntry = segmentEdgeTo;
            segmentEdgeTo = ee;
        }
        EdgeEntry currEdge = segmentEdgeEntry;
        while (EdgeIterator.Edge.isValid(currEdge.edge)) {
            processEdge(currEdge.edge, currEdge.adjNode);
            currEdge = currEdge.parent;
        }
        currEdge.parent = edgeEntry;
        currEdge = edgeEntry;
        while (EdgeIterator.Edge.isValid(currEdge.edge)) {
            processEdge(currEdge.edge, currEdge.adjNode);
            currEdge = currEdge.parent;
        }
        setFromNode(currEdge.adjNode);
        reverseOrder();
        currEdge = segmentEdgeTo;
        int tmpEdge = currEdge.edge;
        while (EdgeIterator.Edge.isValid(tmpEdge)) {
            currEdge = currEdge.parent;
            processEdge(tmpEdge, currEdge.adjNode);
            tmpEdge = currEdge.edge;
        }
        currEdge.parent = edgeTo;
        currEdge = edgeTo;
        tmpEdge = currEdge.edge;
        while (EdgeIterator.Edge.isValid(tmpEdge)) {
            currEdge = currEdge.parent;
            processEdge(tmpEdge, currEdge.adjNode);
            tmpEdge = currEdge.edge;
        }
        setEndNode(currEdge.adjNode);
        return setFound(true);
    }
}
This does not work all the time; I get exceptions in Path4CH:
java.lang.NullPointerException
    at com.graphhopper.routing.ch.Path4CH.expandEdge(Path4CH.java:62)
    at com.graphhopper.routing.ch.Path4CH.processEdge(Path4CH.java:56)
    at com.graphhopper.routing.PathBidirRef.extract(PathBidirRef.java:95)
    at com.graphhopper.routing.DijkstraBidirectionRef.extractPath(DijkstraBidirectionRef.java:99)
    at com.graphhopper.routing.AbstractBidirAlgo.runAlgo(AbstractBidirAlgo.java:74)
    at com.graphhopper.routing.AbstractBidirAlgo.calcPath(AbstractBidirAlgo.java:60)
And in Path:
java.lang.IllegalStateException: Edge 1506012 was empty when requested with node 1289685, array index:0, edges:318
    at com.graphhopper.routing.Path.forEveryEdge(Path.java:253)
    at com.graphhopper.routing.Path.calcInstructions(Path.java:349)
I don't know what I am doing wrong. I could really use some help with this.
Thanks.
I solved this issue.
Inside DijkstraBidirectionRef.calcPath I was trying to calculate the shortest path from an arbitrary node to the source node and the vertex node.
The error used to occur because the original call to calcPath was operating on a QueryGraph, while inside it I was creating a new DijkstraBidirectionRef object using LevelGraphStorage.
This was a problem because QueryGraph may create virtual nodes and edges for the source and target nodes, so a call to calcPath(node, virtualNode) operating on LevelGraphStorage would throw an exception.
The fix was to call algo.setGraph(queryGraph) after creating the DijkstraBidirectionRef.

restart iterator on exceptions in Scala

I have an iterator (actually a Source.getLines) that's reading an infinite stream of data from a URL. Occasionally the iterator throws a java.io.IOException when there is a connection problem. In such situations, I need to re-connect and re-start the iterator. I want this to be seamless so that the iterator just looks like a normal iterator to the consumer, but underneath is restarting itself as necessary.
For example, I'd like to see the following behavior:
scala> val iter = restartingIterator(() => new Iterator[Int] {
  var i = -1
  def hasNext = {
    if (this.i < 3) {
      true
    } else {
      throw new IOException
    }
  }
  def next = {
    this.i += 1
    i
  }
})
res0: ...

scala> iter.take(6).toList
res1: List[Int] = List(0, 1, 2, 3, 0, 1)
I have a partial solution to this problem, but it will fail on some corner cases (e.g. an IOException on the first item after a restart) and it's pretty ugly:
def restartingIterator[T](getIter: () => Iterator[T]) = new Iterator[T] {
  var iter = getIter()

  def hasNext = {
    try {
      iter.hasNext
    } catch {
      case e: IOException =>
        this.iter = getIter()
        iter.hasNext
    }
  }

  def next = {
    try {
      iter.next
    } catch {
      case e: IOException =>
        this.iter = getIter()
        iter.next
    }
  }
}
I keep feeling like there's a better solution to this, maybe some combination of Iterator.continually and util.control.Exception or something like that, but I couldn't figure one out. Any ideas?
This is fairly close to your version, using scala.util.control.Exception:
def restartingIterator[T](getIter: () => Iterator[T]) = new Iterator[T] {
  import util.control.Exception.allCatch

  private[this] var i = getIter()
  private[this] def replace() = i = getIter()

  def hasNext: Boolean = allCatch.opt(i.hasNext).getOrElse { replace(); hasNext }
  def next(): T = allCatch.opt(i.next).getOrElse { replace(); next }
}
For some reason this is not tail recursive, but that can be fixed by using a slightly more verbose version:
def restartingIterator2[T](getIter: () => Iterator[T]) = new Iterator[T] {
  import util.control.Exception.allCatch

  private[this] var i = getIter()
  private[this] def replace() = i = getIter()

  @annotation.tailrec def hasNext: Boolean = {
    val v = allCatch.opt(i.hasNext)
    if (v.isDefined) v.get else { replace(); hasNext }
  }

  @annotation.tailrec def next(): T = {
    val v = allCatch.opt(i.next)
    if (v.isDefined) v.get else { replace(); next() }
  }
}
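Plugging the question's sample source into this version gives the desired behaviour:
val iter = restartingIterator2(() => new Iterator[Int] {
  var i = -1
  def hasNext = if (i < 3) true else throw new java.io.IOException
  def next() = { i += 1; i }
})

iter.take(6).toList // List(0, 1, 2, 3, 0, 1)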
Edit: There is a solution with util.control.Exception and Iterator.continually:
def restartingIterator[T](getIter: () => Iterator[T]) = {
  import util.control.Exception.allCatch

  var iter = getIter()
  def f: T = allCatch.opt(iter.next).getOrElse { iter = getIter(); f }
  Iterator.continually { f }
}
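One caveat (my observation, not from the original answer): allCatch swallows every exception, so if getIter ever yields a finite iterator, the NoSuchElementException thrown at exhaustion would also trigger a restart and the stream would loop forever. For the infinite-stream use case in the question that is moot, but restricting the catch to IOException is safer:
import java.io.IOException
import scala.util.control.Exception.catching

def restartingIterator[T](getIter: () => Iterator[T]): Iterator[T] = {
  var iter = getIter()
  // Only an IOException means "reconnect and retry"; anything else,
  // including normal exhaustion, propagates to the caller.
  @annotation.tailrec
  def nextValue(): T = catching(classOf[IOException]).opt(iter.next()) match {
    case Some(v) => v
    case None    => iter = getIter(); nextValue()
  }
  Iterator.continually(nextValue())
}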
There is a better solution, the Iteratee:
http://apocalisp.wordpress.com/2010/10/17/scalaz-tutorial-enumeration-based-io-with-iteratees/
Here is for example an enumerator that restarts on encountering an exception.
def enumReader[A](r: => BufferedReader, it: IterV[String, A]): IO[IterV[String, A]] = {
  val tmpReader = r
  def loop: IterV[String, A] => IO[IterV[String, A]] = {
    case i @ Done(_, _) => IO { i }
    case Cont(k) => for {
      s <- IO { try { val x = tmpReader.readLine; IO(x) }
                catch { case e => enumReader(r, it) } }.join
      a <- if (s == null) k(EOF) else loop(k(El(s)))
    } yield a
  }
  loop(it)
}
The inner loop advances the Iteratee, but the outer function still holds on to the original. Since Iteratee is a persistent data structure, to restart you just have to call the function again.
I'm passing the Reader by name here so that r is essentially a function that gives you a fresh (restarted) reader. In practice you will want to bracket this more effectively (close the existing reader on exception).
Here's an answer that doesn't work, but feels like it should:
def restartingIterator[T](getIter: () => Iterator[T]): Iterator[T] = {
new Traversable[T] {
def foreach[U](f: T => U): Unit = {
try {
for (item <- getIter()) {
f(item)
}
} catch {
case e: IOException => this.foreach(f)
}
}
}.toIterator
}
I think this very clearly describes the control flow, which is great.
This code will throw a StackOverflowError in Scala 2.8.0 because of a bug in Traversable.toStream, but even after the fix for that bug, this code still won't work for my use case because toIterator calls toStream, which means that it will store all items in memory.
I'd love to be able to define an Iterator by just writing a foreach method, but there doesn't seem to be any easy way to do that.