How can I compare if two json structures are the same in scala?
For example, if I have:
{
resultCount: 1,
results: [
{
artistId: 331764459,
collectionId: 780609005
}
]
}
and
{
results: [
{
collectionId: 780609005,
artistId: 331764459
}
],
resultCount: 1
}
They should be considered equal
You should be able to simply do json1 == json2, if the json libraries are written correctly. Is that not working for you?
This is with spray-json, although I would expect the same from every json library:
import spray.json._
import DefaultJsonProtocol._
Welcome to Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_51).
Type in expressions to have them evaluated.
Type :help for more information.
scala> val json1 = """{ "a": 1, "b": [ { "c":2, "d":3 } ] }""".parseJson
json1: spray.json.JsValue = {"a":1,"b":[{"c":2,"d":3}]}
scala> val json2 = """{ "b": [ { "d":3, "c":2 } ], "a": 1 }""".parseJson
json2: spray.json.JsValue = {"b":[{"d":3,"c":2}],"a":1}
scala> json1 == json2
res1: Boolean = true
Spray-json uses an immutable scala Map to represent a JSON object in the abstract syntax tree resulting from a parse, so it is just Map's equality semantics that make this work.
You can also use scalatest-json
Example:
it("should fail on slightly different json explaining why") {
val input = """{"someField": "valid json"}""".stripMargin
val expected = """{"someField": "different json"}""".stripMargin
input should matchJson(expected)
}
When the 2 jsons doesn't match, a nice diff will be display which is quite useful when working with big jsons.
Can confirm that it also works just fine with the Jackson library using == operator:
val simpleJson =
"""
|{"field1":"value1","field2":"value2"}
""".stripMargin
val simpleJsonNode = objectMapper.readTree(simpleJson)
val simpleJsonNodeFromString = objectMapper.readTree(simpleJsonNode.toString)
assert(simpleJsonNode == simpleJsonNodeFromString)
spray-json is definitely great, but I use Gson since I already had dependency on Gson library on my project. I am using these in my unit tests, works well for simple json.
import com.google.gson.{JsonParser}
import org.apache.flume.event.JSONEvent
import org.scalatest.FunSuite
class LogEnricherSpec extends FunSuite {
test("compares json to json") {
val parser = new JsonParser()
assert(parser.parse("""
{
"eventType" : "TransferItems",
"timeMillis" : "1234567890",
"messageXml":{
"TransferId" : 123456
}
} """.stripMargin)
==
parser.parse("""
{
"timeMillis" : "1234567890",
"eventType" : "TransferItems",
"messageXml":{
"TransferId" : 123456
}
}
""".stripMargin))
}
Calling the method compare_2Json(str1,str2) will return a boolean value.
Please make sure that the two string parameters are json.
Welcome to use and test.
def compare_2Json(js1:String,js2:String): Boolean = {
var js_str1 = js1
var js_str2 = js2
js_str1=js_str1.replaceAll(" ","")
js_str2=js_str2.replaceAll(" ","")
var issame = false
val arrbuff1 = ArrayBuffer[String]()
val arrbuff2 = ArrayBuffer[String]()
if(js_str1.substring(0,1)=="{" && js_str2.substring(0,1)=="{" || js_str1.substring(0,1)=="["&&js_str2.substring(0,1)=="["){
for(small_js1 <- split_JsonintoSmall(js_str1);small_js2 <- split_JsonintoSmall((js_str2))) {
issame = compare_2Json(small_js1,small_js2)
if(issame == true){
js_str1 = js_str1.substring(0,js_str1.indexOf(small_js1))+js_str1.substring(js_str1.indexOf(small_js1)+small_js1.length)
js_str2 = js_str2.substring(0,js_str2.indexOf(small_js2))+js_str2.substring(js_str2.indexOf(small_js2)+small_js2.length)
}
}
js_str1 = js_str1.substring(1,js_str1.length-1)
js_str2 = js_str2.substring(1,js_str2.length-1)
for(str_js1 <- js_str1.split(","); str_js2 <- js_str2.split(",")){
if(str_js1!="" && str_js2!="")
if(str_js1 == str_js2){
js_str1 = js_str1.substring(0,js_str1.indexOf(str_js1))+js_str1.substring(js_str1.indexOf(str_js1)+str_js1.length)
js_str2 = js_str2.substring(0,js_str2.indexOf(str_js2))+js_str2.substring(js_str2.indexOf(str_js2)+str_js2.length)
}
}
js_str1=js_str1.replace(",","")
js_str2=js_str2.replace(",","")
if(js_str1==""&&js_str2=="")return true
else return false
}
else return false
}
def split_JsonintoSmall(js_str: String):ArrayBuffer[String]={
val arrbuff = ArrayBuffer[String]()
var json_str = js_str
while(json_str.indexOf("{",1)>0 || json_str.indexOf("[",1)>0){
if (json_str.indexOf("{", 1) < json_str.indexOf("[", 1) && json_str.indexOf("{",1)>0 || json_str.indexOf("{", 1) > json_str.indexOf("[", 1) && json_str.indexOf("[",1)<0 ) {
val right = findrealm(1, json_str, '{', '}')
arrbuff += json_str.substring(json_str.indexOf("{", 1), right + 1)
json_str = json_str.substring(0,json_str.indexOf("{",1))+json_str.substring(right+1)
}
else {
if(json_str.indexOf("[",1)>0) {
val right = findrealm(1, json_str, '[', ']')
arrbuff += json_str.substring(json_str.indexOf("[", 1), right + 1)
json_str = json_str.substring(0, json_str.indexOf("[", 1)) + json_str.substring(right + 1)
}
}
}
arrbuff
}
def findrealm(begin_loc: Int, str: String, leftch: Char, rightch: Char): Int = {
var left = str.indexOf(leftch, begin_loc)
var right = str.indexOf(rightch, left)
left = str.indexOf(leftch, left + 1)
while (left < right && left > 0) {
right = str.indexOf(rightch, right + 1)
left = str.indexOf(leftch, left + 1)
}
right
}
Related
Not able to append object value. trying to transform json to expected output having all the user infomration.
Groovy Script:
import groovy.json.*;
def data='''{"totalcount":5,"employees":[{"name":"Sam","age":34},{"name":"Richard","age":38},{"name":"Harry","age":36},{"name":"Tom","age":40},{"name":"David","age":84},{"name":"Chris","age":52}],"Salaries":[{"name":"Sam","salary":34000},{"name":"Richard","salary":89889},{"name":"Harry","salary":36898},{"name":"David","salary":489889},{"name":"Chris","salary":84898},{"name":"Toms","salary":5298}]}'''
def test = new JsonSlurper().parseText(data)
def a=[:]
def OU=[:]
def status=[]
a.OUTPUT=OU
OU.STATUS=status
def b = [:]
for(def y=0;y<test.employees.size();y++) {
for (def j = 0; j < test.Salaries.size(); j++) {
if (test.employees[y].name ==test.Salaries[j].name ) {
b.name = test.employees[y].name
b.age = test.employees[y].age
b.salary = test.Salaries[j].salary
status<<b
j=test.Salaries.size()
}
}
}
String finalJson = JsonOutput.toJson a;
println JsonOutput.prettyPrint( finalJson)
Expected Result :
{"OUTPUT":{"STATUS":[{"name":"Sam","age":34,"salary":34000},{"name":"Richard","age":38,"salary":89889},{"name":"Harry","age":36,"salary":36898},{"name":"David","age":84,"salary":489889},{"name":"Chris","age":52,"salary":84898}]}}
You can simplify your code to:
import groovy.json.*
def data='{"totalcount":5,"employees":[{"name":"Sam","age":34},{"name":"Richard","age":38},{"name":"Harry","age":36},{"name":"Tom","age":40},{"name":"David","age":84},{"name":"Chris","age":52}],"Salaries":[{"name":"Sam","salary":34000},{"name":"Richard","salary":89889},{"name":"Harry","salary":36898},{"name":"David","salary":489889},{"name":"Chris","salary":84898},{"name":"Toms","salary":5298}]}'
def test = new JsonSlurper().parseText(data)
def joinedByName = (test.employees + test.Salaries).groupBy { it.name }
def result = [
OUTPUT: [
STATUS: [
joinedByName.collect { it.value.inject { a, b -> a + b } }
.findAll { it.salary && it.age } // Remove people who don't have both
]
]
]
String finalJson = JsonOutput.toJson result;
println JsonOutput.prettyPrint(finalJson)
I'm working on a Scala program that uses the Scala Pickling library to serialize and deserialize a Map object that contains a String and a Point2D.Double object from the java.awt.geom package.
Here's the relevant logic:
contents +=
new Button("Save Config") {
reactions += {
case ButtonClicked(_) => {
var m: Map[String, Point2D.Double] = Map()
nodeFields.foreach(x => {
m += (x._1 -> new Point2D.Double(x._2._1.text.toDouble, x._2._2.text.toDouble))
})
val pkl = m.pickle
fc.showSaveDialog(null)
val outputFile = fc.selectedFile
val writer = new PrintWriter(outputFile)
writer.write(pkl.value)
writer.close()
Dialog.showMessage(null, "Success!")
}
}
}
If you need to see more, here's the commit with the offending logic
As it stands, the JSON formatted string output from pkl.value is a working serialized Map[String, Point2D.Double], except that the values of Point2D.Double are dropped!
Here's a snippet of the output:
{
"$type": "scala.collection.mutable.Map[java.lang.String,java.awt.geom.Point2D.Double]",
"elems": [
{
"$type": "scala.Tuple2[java.lang.String,java.awt.geom.Point2D.Double]",
"_1": "BOTTOMLANE\r",
"_2": {
}
},
{
"$type": "scala.Tuple2[java.lang.String,java.awt.geom.Point2D.Double]",
"_1": "UPPERLANESECOND_0\r",
"_2": {
}
},
{
"$type": "scala.Tuple2[java.lang.String,java.awt.geom.Point2D.Double]",
"_1": "upperSecondTower_1",
"_2": {
}
},
...
]
}
What can I do to fix this?
scala-pickling can not directly pickle/unpickle Point2D.Double because it has no public fields (the x and y values are accessible through the getX and getY getters).
A possible Pickler / Unpickler for Point2D.Double would be :
object Point2DPickler {
import scala.pickling._
import scala.pickling.Defaults._
import java.awt.geom.Point2D
type DoublePoint = java.awt.geom.Point2D.Double
implicit object Point2DDoublePickle extends Pickler[DoublePoint] with Unpickler[DoublePoint] {
private val doubleUnpickler = implicitly[Unpickler[Double]]
override def tag = FastTypeTag[java.awt.geom.Point2D.Double]
override def pickle(point: DoublePoint, builder: PBuilder) = {
builder.beginEntry(point)
builder.putField("x",
b => b.hintTag(FastTypeTag.Double).beginEntry(point.getX).endEntry()
)
builder.putField("y",
b => b.hintTag(FastTypeTag.Double).beginEntry(point.getY).endEntry()
)
builder.endEntry()
}
override def unpickle(tag: String, reader: PReader): DoublePoint = {
val x = doubleUnpickler.unpickleEntry(reader.readField("x")).asInstanceOf[Double]
val y = doubleUnpickler.unpickleEntry(reader.readField("y")).asInstanceOf[Double]
new Point2D.Double(x, y)
}
}
}
Which could be used as :
import scala.pickling.Defaults._
import scala.pickling.json._
import java.awt.geom.Point2D
import Point2DPickler._
val dpoint = new Point2D.Double(1d, 2d)
scala> val json = dpoint.pickle
json: pickling.json.pickleFormat.PickleType =
JSONPickle({
"$type": "java.awt.geom.Point2D.Double",
"x": {
"$type": "scala.Double",
"value": 1.0
},
"y": {
"$type": "scala.Double",
"value": 2.0
}
})
scala> val dpoint2 = json.value.unpickle[java.awt.geom.Point2D.Double]
dpoint2: java.awt.geom.Point2D.Double = Point2D.Double[1.0, 2.0]
for (character <- content) {
if (character == '\n') {
val current_line = line.mkString
line.clear()
current_line match {
case docStartRegex(_*) => {
startDoc = true
endText = false
endDoc = false
}
case docnoRegex(group) => {
docID = group.trim
}
case docTextStartRegex(_*) => {
startText = true
}
case docTextEndRegex(_*) => {
endText = true
startText = false
}
case docEndRegex(_*) => {
endDoc = true
startDoc = false
es_json = Json.obj(
"_index" -> "ES_SPARK_AP",
"_type" -> "document",
"_id" -> docID,
"_source" -> Json.obj(
"text" -> textChunk.mkString(" ")
)
)
// yield es_json
textChunk.clear()
}
case _ => {
if (startDoc && !endDoc && startText) {
textChunk += current_line.trim
}
}
}
} else {
line += character
}
}
The above for-loop parses through a text file and creates a JSON object of each chunk parsed in a loop. This is JSON will be sent to for further processing to Elasticsearch. In python, we can yield the JSON and use generator easily like:
def func():
for i in range(num):
... some computations ...
yield {
JSON ## JSON is yielded
}
for json in func(): ## we parse through the generator here.
process(json)
I cannot understand how I can use yield in similar fashion using scala?
If you want lazy returns, scala does this using Iterator types. Specifically if you want to handle line by line values, I'd split it into lines first with .lines
val content: String = ???
val results: Iterator[Json] =
for {
lines <- content.lines
line <- lines
} yield {
line match {
case docEndRegex(_*) => ...
}
}
You can also use a function directly
def toJson(line: String): Json =
line match {
case "hi" => Json.obj("line" -> "hi")
case "bye" => Json.obj("what" -> "a jerk")
}
val results: Iterator[Json] =
for {
lines <- content.lines
line <- lines
} yield toJson(line)
This is equivalent to doing
content.lines.map(line => toJson(line))
Or somewhat equivalently in python
lines = (line.strip() for line in content.split("\n"))
jsons = (toJson(line) for line in lines)
I am reading text files and creating Json objects JsValues in every iteration. I want to save them to a file at every iteration. I am using Play Framework to create JSON objects.
class Cleaner {
def getDocumentData() = {
for (i <- no_of_files) {
.... do something ...
some_json = Json.obj("text" -> LARGE_TEXT)
final_json = Json.stringify(some_json)
//save final_json here to a file
}
}
}
I tried using PrintWriter to save that json but I am getting Exception in thread "main" org.apache.spark.SparkException: Task not serializable as the error.
How should I correct this? or is there any other way I can save the JsValue?
UPDATE:
I read that the trait serializable has to be used in this case. I have the following function:
class Cleaner() extends Serializable {
def readDocumentData() {
val conf = new SparkConf()
.setAppName("linkin_spark")
.setMaster("local[2]")
.set("spark.executor.memory", "1g")
.set("spark.rdd.compress", "true")
.set("spark.storage.memoryFraction", "1")
val sc = new SparkContext(conf)
val temp = sc.wholeTextFiles("text_doc.dat)
val docStartRegex = """<DOC>""".r
val docEndRegex = """</DOC>""".r
val docTextStartRegex = """<TEXT>""".r
val docTextEndRegex = """</TEXT>""".r
val docnoRegex = """<DOCNO>(.*?)</DOCNO>""".r
val writer = new PrintWriter(new File("test.json"))
for (fileData <- temp) {
val filename = fileData._1
val content: String = fileData._2
println(s"For $filename, the data is:")
var startDoc = false // This is for the
var endDoc = false // whole file
var startText = false //
var endText = false //
var textChunk = new ListBuffer[String]()
var docID: String = ""
var es_json: JsValue = Json.obj()
for (current_line <- content.lines) {
current_line match {
case docStartRegex(_*) => {
startDoc = true
endText = false
endDoc = false
}
case docnoRegex(group) => {
docID = group.trim
}
case docTextStartRegex(_*) => {
startText = true
}
case docTextEndRegex(_*) => {
endText = true
startText = false
}
case docEndRegex(_*) => {
endDoc = true
startDoc = false
es_json = Json.obj(
"_id" -> docID,
"_source" -> Json.obj(
"text" -> textChunk.mkString(" ")
)
)
writer.write(es_json.toString())
println(es_json.toString())
textChunk.clear()
}
case _ => {
if (startDoc && !endDoc && startText) {
textChunk += current_line.trim
}
}
}
}
}
writer.close()
}
}
This is function to which I added the trait but still I am getting the same exception.
I rewrote a smaller version of it:
def foo() {
val conf = new SparkConf()
.setAppName("linkin_spark")
.setMaster("local[2]")
.set("spark.executor.memory", "1g")
.set("spark.rdd.compress", "true")
.set("spark.storage.memoryFraction", "1")
val sc = new SparkContext(conf)
var es_json: JsValue = Json.obj()
val writer = new PrintWriter(new File("test.json"))
for (i <- 1 to 10) {
es_json = Json.obj(
"_id" -> i,
"_source" -> Json.obj(
"text" -> "Eureka!"
)
)
println(es_json)
writer.write(es_json.toString() + "\n")
}
writer.close()
}
This function works fine with and also without serializable. I cannot understand what's happening?
EDIT: First answer made on phone.
It's not your main class that needs to be serializable but the class you use in the rdd processing loop in this case inside for (fileData <- temp)
It needs to be serializable because the spark data is on multiple partitions that may be on multiple computers. So the functions you apply to this data need to be serializable so you can send them to the other computer where they will be executed in parallel.
PrintWriter cannot be serializable since it refers to a file that is only available from the original computer. Hence the serializaion error.
To write your data on the computer initializing the spark process. You need to take the data that is all over the cluster and bring it to your machine then write it.
To do that you can either collect the result. rdd.collect() and that will take all the data from the cluster and put it in your driver thread memory. Then you can write it to a file using the PrintWriter.
like this:
temp.flatMap { fileData =>
val filename = fileData._1
val content: String = fileData._2
println(s"For $filename, the data is:")
var startDoc = false // This is for the
var endDoc = false // whole file
var startText = false //
var endText = false //
var textChunk = new ListBuffer[String]()
var docID: String = ""
var es_json: JsValue = Json.obj()
var results = ArrayBuffer[String]()
for (current_line <- content.lines) {
current_line match {
case docStartRegex(_*) => {
startDoc = true
endText = false
endDoc = false
}
case docnoRegex(group) => {
docID = group.trim
}
case docTextStartRegex(_*) => {
startText = true
}
case docTextEndRegex(_*) => {
endText = true
startText = false
}
case docEndRegex(_*) => {
endDoc = true
startDoc = false
es_json = Json.obj(
"_id" -> docID,
"_source" -> Json.obj(
"text" -> textChunk.mkString(" ")
)
)
results.append(es_json.toString())
println(es_json.toString())
textChunk.clear()
}
case _ => {
if (startDoc && !endDoc && startText) {
textChunk += current_line.trim
}
}
}
}
results
}
.collect()
.foreach(es_json => writer.write(es_json))
If the result is too large for the driver thread memory you can use the saveAsTextFile function that will stream each partition to your drive. In this second case the path you give as argument will be made into a folder and each partition of your rdd will be written to a numbered file in it.
like this:
temp.flatMap { fileData =>
val filename = fileData._1
val content: String = fileData._2
println(s"For $filename, the data is:")
var startDoc = false // This is for the
var endDoc = false // whole file
var startText = false //
var endText = false //
var textChunk = new ListBuffer[String]()
var docID: String = ""
var es_json: JsValue = Json.obj()
var results = ArrayBuffer[String]()
for (current_line <- content.lines) {
current_line match {
case docStartRegex(_*) => {
startDoc = true
endText = false
endDoc = false
}
case docnoRegex(group) => {
docID = group.trim
}
case docTextStartRegex(_*) => {
startText = true
}
case docTextEndRegex(_*) => {
endText = true
startText = false
}
case docEndRegex(_*) => {
endDoc = true
startDoc = false
es_json = Json.obj(
"_id" -> docID,
"_source" -> Json.obj(
"text" -> textChunk.mkString(" ")
)
)
results.append(es_json.toString())
println(es_json.toString())
textChunk.clear()
}
case _ => {
if (startDoc && !endDoc && startText) {
textChunk += current_line.trim
}
}
}
}
results
}
.saveAsTextFile("test.json")
I've just started working with Scala in my new project (Scala 2.10.3, Play2 2.2.1, Reactivemongo 0.10.0), and encountered a pretty standard use case, which is - stream all the users in MongoDB to the external client. After navigating Enumerator, Enumeratee API I have not found a solid solution for that, and so I solved this in following way:
val users = collection.find(Json.obj()).cursor[User].enumerate(Integer.MAX_VALUE, false)
var first:Boolean = true
val indexedUsers = (users.map(u => {
if(first) {
first = false;
Json.stringify(Json.toJson(u))
} else {
"," + Json.stringify(Json.toJson(u))
}
}))
Which, from my point of view, is a little bit tricky - mainly because I needed to add Json Start Array, Json End Array and comma separators in element list, and I was not able to provide it as a pure Json stream, so I converted it to String steam.
What is a standard solution for that, using reactivemongo in play?
I wrote a helper function which does what you want to achieve:
def intersperse[E](e: E, enum: Enumerator[E]): Enumerator[E] = new Enumerator[E] {
val element = Input.El(e)
override def apply[A](i1: Iteratee[E, A]): Future[Iteratee[E, A]] = {
var iter = i1
val loop: Iteratee[E, Unit] = {
lazy val contStep = Cont(step)
def step(in: Input[E]): Iteratee[E, Unit] = in match {
case Input.Empty ⇒ contStep
case Input.EOF ⇒ Done((), Input.Empty)
case e # Input.El(_) ⇒
iter = Iteratee.flatten(iter.feed(element).flatMap(_.feed(e)))
contStep
}
lazy val contFirst = Cont(firstStep)
def firstStep(in: Input[E]): Iteratee[E, Unit] = in match {
case Input.EOF ⇒ Done((), Input.Empty)
case Input.Empty ⇒
iter = Iteratee.flatten(iter.feed(in))
contFirst
case Input.El(x) ⇒
iter = Iteratee.flatten(iter.feed(in))
contStep
}
contFirst
}
enum(loop).map { _ ⇒ iter }
}
}
Usage:
val prefix = Enumerator("[")
val suffix = Enumerator("]")
val asStrings = Enumeratee.map[User] { u => Json.stringify(Json.toJson(u)) }
val result = prefix >>> intersperse(",", users &> asStrings) >>> suffix
Ok.chunked(result)