How to print JSON objects in AWK

I was looking for built-in awk functions that make it easy to generate JSON objects. I came across several answers and decided to write my own.
I'd like to generate JSON from multidimensional arrays in which I store table-style data, using a separate, dynamic definition of the JSON schema to be generated from that data.
Desired output:
{
"Name": JanA
"Surname": NowakA
"ID": 1234A
"Role": PrezesA
}
{
"Name": JanD
"Surname": NowakD
"ID": 12341D
"Role": PrezesD
}
{
"Name": JanC
"Surname": NowakC
"ID": 12342C
"Role": PrezesC
}
Input file:
pierwsza linia
druga linia
trzecia linia
dane wspólników
imie JanA
nazwisko NowakA
pesel 11111111111A
funkcja PrezesA
imie Ja"nD
nazwisko NowakD
pesel 11111111111
funkcja PrezesD
imie JanC
nazwisko NowakC
pesel 12342C
funkcja PrezesC
czwarta linia
reprezentanci
imie Tomek
Based on the input file, I created a multidimensional array:
JanA NowakA 1234A PrezesA
JanD NowakD 12341D PrezesD
JanC NowakC 12342C PrezesC

I'll take a stab at a gawk solution. The indenting isn't perfect and the results aren't ordered (see "Sorting" note below), but it's at least able to walk a true multidimensional array recursively and should produce valid, parsable JSON from any array. Bonus: the data array is the schema. Array keys become JSON keys. There's no need to create a separate schema array in addition to the data array.
Just be sure to use the true multidimensional array[d1][d2][d3]... convention of constructing your data array, rather than the concatenated index array[d1,d2,d3...] convention.
Update:
I've got an updated JSON gawk script posted as a GitHub Gist. Although the script below is tested as working with OP's data, I might've made improvements since this post was last edited. Please see the Gist for the most thoroughly tested, bug-squashed version.
#!/usr/bin/gawk -f
BEGIN { IGNORECASE = 1 }
$1 ~ "imie" { record[++idx]["name"] = $2 }
$1 ~ "nazwisko" { record[idx]["surname"] = $2 }
$1 ~ "pesel" { record[idx]["ID"] = $2 }
$1 ~ "funkcja" { record[idx]["role"] = $2 }
END { print serialize(record, "\t") }
# ==== FUNCTIONS ====
function join(arr, sep, _p, i) {
# syntax: join(array, string separator)
# returns a string
for (i in arr) {
_p["result"] = _p["result"] ~ "[[:print:]]" ? _p["result"] sep arr[i] : arr[i]
}
return _p["result"]
}
function quote(str) {
gsub(/\\/, "\\\\", str)
# escape embedded double quotes so that a value like Ja"nD stays valid JSON
gsub(/"/, "\\\"", str)
gsub(/\r/, "\\r", str)
gsub(/\n/, "\\n", str)
gsub(/\t/, "\\t", str)
return "\"" str "\""
}
function serialize(arr, indent_with, depth, _p, i, idx) {
# syntax: serialize(array of arrays, indent string)
# returns a JSON formatted string
# sort arrays on key, ensures [...] values remain properly ordered
if (!PROCINFO["sorted_in"]) PROCINFO["sorted_in"] = "#ind_num_asc"
# determine whether array is indexed or associative
for (i in arr) {
_p["assoc"] = or(_p["assoc"], !(++_p["idx"] in arr))
}
# if associative, indent
if (_p["assoc"]) {
for (i = ++depth; i--;) {
_p["end"] = _p["indent"]; _p["indent"] = _p["indent"] indent_with
}
}
for (i in arr) {
# If key length is 0, assume its an empty object
if (!length(i)) return "{}"
# quote key if not already quoted
_p["key"] = i !~ /^".*"$/ ? quote(i) : i
if (isarray(arr[i])) {
if (_p["assoc"]) {
_p["json"][++idx] = _p["indent"] _p["key"] ": " \
serialize(arr[i], indent_with, depth)
} else {
# if indexed array, dont print keys
_p["json"][++idx] = serialize(arr[i], indent_with, depth)
}
} else {
# quote if not numeric, boolean, null, already quoted, or too big for match()
if (!((arr[i] ~ /^[0-9]+([\.e][0-9]+)?$/ && arr[i] !~ /^0[0-9]/) ||
arr[i] ~ /^(true|false|null|".*")$/) || length(arr[i]) > 1000)
arr[i] = quote(arr[i])
_p["json"][++idx] = _p["assoc"] ? _p["indent"] _p["key"] ": " arr[i] : arr[i]
}
}
# I trial and errored the hell out of this. Problem is, gawk cant distinguish between
# a value of null and no value. I think this hack is as close as I can get, although
# [""] will become [].
if (!_p["assoc"] && join(_p["json"]) == "\"\"") return "[]"
# surround with curly braces if object, square brackets if array
return _p["assoc"] ? "{\n" join(_p["json"], ",\n") "\n" _p["end"] "}" \
: "[" join(_p["json"], ", ") "]"
}
Output resulting from OP's example data:
[{
"ID": "1234A",
"name": "JanA",
"role": "PrezesA",
"surname": "NowakA"
}, {
"ID": "12341D",
"name": "JanD",
"role": "PrezesD",
"surname": "NowakD"
}, {
"ID": "12342C",
"name": "JanC",
"role": "PrezesC",
"surname": "NowakC"
}, {
"name": "Tomek"
}]
Sorting
Although the results by default are ordered in a manner only gawk understands, it is possible for gawk to sort the results on a field. If you'd like to sort on the ID field for example, add this function:
function cmp_ID(i1, v1, i2, v2) {
if (!isarray(v1) && v1 ~ /"ID"/ ) {
return v1 < v2 ? -1 : (v1 != v2)
}
}
Then insert this line within your END section above print serialize(record):
PROCINFO["sorted_in"] = "cmp_ID"
See Controlling Array Traversal for more information.
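For comparison, the record-building step of the gawk script can be sketched in Python. This is only an illustration, not part of the gawk answer: the names FIELD_MAP and build_records are hypothetical, and the sketch assumes, like the script above, that every imie line starts a new record and that only the second whitespace-separated field carries the value. Serialization is left to json.dumps, which handles quoting and escaping.

```python
import json

# Hypothetical mapping from the input keywords to the JSON keys used above.
FIELD_MAP = {"imie": "name", "nazwisko": "surname", "pesel": "ID", "funkcja": "role"}


def build_records(text):
    """Group keyword lines into a list of dicts, one dict per 'imie' line."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        key, value = fields[0].lower(), fields[1]
        if key not in FIELD_MAP:
            continue  # lines such as "pierwsza linia" are ignored
        if key == "imie":
            records.append({})  # a new person starts here
        if records:
            records[-1][FIELD_MAP[key]] = value
    return records


if __name__ == "__main__":
    sample = 'imie JanA\nnazwisko NowakA\npesel 1234A\nfunkcja PrezesA\nimie Ja"nD\n'
    print(json.dumps(build_records(sample), indent=4))
```

Sorting on a field then becomes sorted(records, key=lambda r: r.get("ID", "")) before serializing.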

My updated awk implementation of a simple array printer, with regex-based validation for each column (run with gawk):
function ltrim(s) { sub(/^[ \t]+/, "", s); return s }
function rtrim(s) { sub(/[ \t]+$/, "", s); return s }
function sTrim(s){
return rtrim(ltrim(s));
}
function jsonEscape(jsValue) {
gsub(/\\/, "\\\\", jsValue)
gsub(/"/, "\\\"", jsValue)
gsub(/\b/, "\\b", jsValue)
gsub(/\f/, "\\f", jsValue)
gsub(/\n/, "\\n", jsValue)
gsub(/\r/, "\\r", jsValue)
gsub(/\t/, "\\t", jsValue)
return jsValue
}
function jsonStringEscapeAndWrap(jsValue) {
return "\42" jsonEscape(jsValue) "\42"
}
function jsonPrint(contentArray, contentRowsCount, schemaArray){
result = ""
schemaLength = length(schemaArray)
for (x = 1; x <= contentRowsCount; x++) {
result = result "{"
for(y = 1; y <= schemaLength; y++){
result = result "\42" sTrim(schemaArray[y]) "\42:" sTrim(contentArray[x, y])
if(y < schemaLength){
result = result ","
}
}
result = result "}"
if(x < contentRowsCount){
result = result ",\n"
}
}
return result
}
function jsonValidateAndPrint(contentArray, contentRowsCount, schemaArray, schemaColumnsCount, errorArray){
result = ""
errorsCount = 1
for (x = 1; x <= contentRowsCount; x++) {
jsonRow = "{"
for(y = 1; y <= schemaColumnsCount; y++){
regexValue = schemaArray[y, 2]
jsonValue = sTrim(contentArray[x, y])
isValid = jsonValue ~ regexValue
if(isValid == 0){
errorArray[errorsCount, 1] = "\42" sTrim(schemaArray[y, 1]) "\42"
errorArray[errorsCount, 2] = "\42Value " jsonValue " not match format: " regexValue " \42"
errorArray[errorsCount, 3] = x
errorsCount++
jsonValue = "null"
}
jsonRow = jsonRow "\42" sTrim(schemaArray[y, 1]) "\42:" jsonValue
if(y < schemaColumnsCount){
jsonRow = jsonRow ","
}
}
jsonRow = jsonRow "}"
result = result jsonRow
if(x < contentRowsCount){
result = result ",\n"
}
}
return result
}
BEGIN{
rowsCount =1
matchCount = 0
errorsCount = 0
shareholdersJsonSchema[1, 1] = "Imie"
shareholdersJsonSchema[2, 1] = "Nazwisko"
shareholdersJsonSchema[3, 1] = "PESEL"
shareholdersJsonSchema[4, 1] = "Funkcja"
shareholdersJsonSchema[1, 2] = "\\.*"
shareholdersJsonSchema[2, 2] = "\\.*"
shareholdersJsonSchema[3, 2] = "^[0-9]{11}$"
shareholdersJsonSchema[4, 2] = "\\.*"
errorsSchema[1] = "PropertyName"
errorsSchema[2] = "Message"
errorsSchema[3] = "PositionIndex"
resultSchema[1]= "ShareHolders"
resultSchema[2]= "Errors"
}
/dane wspólników/,/czwarta linia/{
if(/imie/ || /nazwisko/ || /pesel/ || /funkcja/){
if(/imie/){
shareholdersArray[rowsCount, 1] = jsonStringEscapeAndWrap($2)
matchCount++
}
if(/nazwisko/){
shareholdersArray[rowsCount, 2] = jsonStringEscapeAndWrap($2)
matchCount ++
}
if(/pesel/){
shareholdersArray[rowsCount, 3] = $2
matchCount ++
}
if(/funkcja/){
shareholdersArray[rowsCount, 4] = jsonStringEscapeAndWrap($2)
matchCount ++
}
if(matchCount==4){
rowsCount++
matchCount = 0;
}
}
}
END{
shareHolders = jsonValidateAndPrint(shareholdersArray, rowsCount - 1, shareholdersJsonSchema, 4, errorArray)
shareHoldersErrors = jsonPrint(errorArray, length(errorArray) / length(errorsSchema), errorsSchema)
resultArray[1,1] = "\n[\n" shareHolders "\n]\n"
resultArray[1,2] = "\n[\n" shareHoldersErrors "\n]\n"
resultJson = jsonPrint(resultArray, 1, resultSchema)
print resultJson
}
Produces output:
{"ShareHolders":
[
{"Imie":"JanA","Nazwisko":"NowakA","PESEL":null,"Funkcja":"PrezesA"},
{"Imie":"Ja\"nD","Nazwisko":"NowakD","PESEL":11111111111,"Funkcja":"PrezesD"},
{"Imie":"JanC","Nazwisko":"NowakC","PESEL":null,"Funkcja":"PrezesC"}
]
,"Errors":
[
{"PropertyName":"PESEL","Message":"Value 11111111111A not match format: ^[0-9]{11}$ ","PositionIndex":1},
{"PropertyName":"PESEL","Message":"Value 12342C not match format: ^[0-9]{11}$ ","PositionIndex":3}
]
}

Related

Converting JSON data to CSV in Cloudant using List and View

I tried to convert the JSON data in my Cloudant database to CSV format using a list function. It works for all values except arrays and nested objects; for those, I get [object Object] in the CSV output.
Here is the sample JSON document I am using:
{
"NAME": "Aparna",
"EMAIL": "something#domain.com",
"PUBLIC_OFFICIALS_CONTACTED": [
{ "NAME_PUBLIC_OFFICIAL": [ "ab"],
"TITLE_PUBLIC_OFFICIAL": ["cd"]}
],
"COMMUNICATION_TYPE": [
"Meeting",
"Phone",
"Handout",
"Conference"
],
"NAMES_OF_OTHERS_FROM_IBM": [
{ "NAME_OF_OTHERS": ["ef"],
"TITLE_OF_OTHERS": [ "gh"]}
],
"COMMUNICATION_BENEFIT": "Yes",
"LAST_UPDATE_BY" : "ap"
}
The map and list functions I used are below:
"map" : "function(doc){
if((\"SAVE_TYPE_SUBMIT\" in doc) && (doc.SAVE_TYPE_SUBMIT== \"Submit\")) {
emit (doc. LAST_UPDATE_BY,[doc.NAME,doc.EMAIL,doc.PUBLIC_OFFICIALS_CONTACTED[0].NAME_PUBLIC_OFFICIAL,\n doc.PUBLIC_OFFICIALS_CONTACTED[0].TITLE_PUBLIC_OFFICIAL,doc.COMMUNICATION_TYPE,doc.NAMES_OF_OTHERS_FROM_IBM[0].NAME_OF_OTHERS, doc.NAMES_OF_OTHERS_FROM_IBM[0].TITLE_OF_OTHERS, doc.COMMUNICATION_BENEFIT,doc. LAST_UPDATE_BY,doc.LAST_UPDATE_DATE]) ;
}
}
"list" : "function (head, req) {
var row;
start({\n headers: {'Content-Type': 'text/csv' },
});
var first = true;
while(row = getRow()) {
var doc = row.doc;
if (first) {
send(Object.keys(doc).join(',') + '\\n');
first = false;\n }
var line = '';
for(var i in doc) {
// comma separator
if (line.length > 0) {
line += ',';\n }
// output the value, ensuring values that themselves
// contain commas are enclosed in double quotes
var val = doc[i];
if (typeof val == 'string' && val.indexOf(',') > -1) {
line += '\"' + val.replace(/\"/g,'\"\"') + '\"';
}
else {
line += val;
}
}
line += '\\n';
send(line);
}}"
Note: in the map function, only the first element of each JSON array is fetched for now, on purpose, to keep the function simple.
Please help me understand how to fetch the nested JSON values or arrays and download them in CSV format. Any guidance would be much appreciated!
You can stringify the value you are trying to export, which will show you what it actually contains:
if (typeof val == 'string' && val.indexOf(',') > -1) {
line += '\"' + val.replace(/\"/g,'\"\"') + '\"';
}
else {
line += JSON.stringify(val);
}
Or even better
if (typeof val == 'string' && val.indexOf(',') > -1) {
line += '\"' + val.replace(/\"/g,'\"\"') + '\"';
}
else if(val instanceof Array){
line += val.join(',');
}
else {
line += JSON.stringify(val);
}
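The same idea (plain strings pass through, flat arrays are joined, anything nested is stringified) can be sketched outside Cloudant. The Python below is not a list function and does not run inside Cloudant; it only illustrates the cell-flattening logic on documents that have already been fetched. The names to_cell and docs_to_csv are hypothetical, and the ";" separator inside a cell is an assumption chosen so that joined arrays do not collide with the CSV column separator.

```python
import csv
import io
import json


def to_cell(value):
    """Turn one document field into a single CSV cell."""
    if isinstance(value, str):
        return value
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return ";".join(value)  # flat array of strings
    return json.dumps(value)  # nested objects/arrays, numbers, null


def docs_to_csv(docs, fields):
    """Write the chosen fields of each document as one CSV row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    for doc in docs:
        writer.writerow([to_cell(doc.get(f, "")) for f in fields])
    return buf.getvalue()


if __name__ == "__main__":
    doc = {
        "NAME": "Aparna",
        "COMMUNICATION_TYPE": ["Meeting", "Phone"],
        "PUBLIC_OFFICIALS_CONTACTED": [{"NAME_PUBLIC_OFFICIAL": ["ab"]}],
    }
    print(docs_to_csv([doc], list(doc)))
```

The csv module takes care of quoting cells that contain commas or double quotes, which is the job the val.replace call does by hand in the list function.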
There are a couple of things to change here that might help. The first thing is that you don't need to emit all the values you want to use, because you can access the document itself from the list when dealing with a view.
With this in mind, the map could have an emit like
emit (doc.LAST_UPDATE_BY, null);
With this in place, if you request the list/view with include_docs=true then you can refer to the fields in your document inside the while(row = getRow()) section like this:
send(row.doc.NAME + ',' + row.doc.EMAIL + '\\n');
And for the nested documents, try something like:
row.doc.PUBLIC_OFFICIALS_CONTACTED[0].NAME_PUBLIC_OFFICIAL
In another question you already referred to the article I'd recommend for a full working example: https://developer.ibm.com/clouddataservices/2015/09/22/export-cloudant-json-as-csv-rss-or-ical/. Hopefully this explanation helps as well.

Convert spark decision tree model debug string to nested JSON in scala

Similar to the tree JSON parsing quoted here, I am trying to implement a simple visualization of decision trees in Scala. It should work exactly like the display method available in Databricks notebooks.
I am new to Scala and struggling to get the logic right. I understand that we have to make recursive calls to build the children and stop when the final prediction values are reached. I have attempted this in the code below, using the model debug string shown in treeJson1 as input:
def getStatmentType(x: String): (String, String) = {
val ifPattern = "If+".r
val ifelsePattern = "Else+".r
var t = ifPattern.findFirstIn(x.toString)
if(t != None){
("If", (x.toString).replace("If",""))
}else {
var ts = ifelsePattern.findFirstIn(x.toString)
if(ts != None) ("Else", (x.toString).replace("Else", ""))
else ("None", (x.toString).replace("(", "").replace(")",""))
}
}
def delete[A](test:List[A])(i: Int) = test.take(i) ++ test.drop((i+1))
def BuildJson(tree:List[String]):List[Map[String, Any]] = {
var block:List[Map[String, Any]] = List()
var lines:List[String] = tree
loop.breakable {
while (lines.length > 0) {
println("here")
var (cond, name) = getStatmentType(lines(0))
println("initial" + cond)
if (cond == "If") {
println("if" + cond)
// lines = lines.tail
lines = delete(lines)(0)
block = block :+ Map("if-name" -> name, "children" -> BuildJson(lines))
println("After pop Else State"+lines(0))
val (p_cond, p_name) = getStatmentType(lines(0))
// println(p_cond + " = "+ p_name+ "\n")
cond = p_cond
name = p_name
println(cond + " after="+ name+ "\n")
if (cond == "Else") {
println("else" + cond)
lines = lines.tail
block = block :+ Map("else-name" -> name, "children" -> BuildJson(lines))
}
}else if( cond == "None") {
println(cond + "NONE")
lines = delete(lines)(0)
block = block :+ Map("predict" -> name)
}else {
println("Finaly Break")
println("While loop--" +lines)
loop.break()
}
}
}
block
}
def treeJson1(str: String):JsValue = {
val str = "If (feature 0 in {1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,10.0,11.0,12.0,13.0})\n If (feature 0 in {6.0})\n Predict: 17.0\n Else (feature 0 not in {6.0})\n Predict: 6.0\n Else (feature 0 not in {1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,10.0,11.0,12.0,13.0})\n Predict: 20.0"
val x = str.replace(" ","")
val xs = x.split("\n").toList
var js = BuildJson(xs)
println(MapReader.mapToJson(js))
Json.toJson("")
}
Expected output:
[
{
'name': 'Root',
'children': [
{
'name': 'feature 0 in {1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,10.0,11.0,12.0,13.0}',
'children': [
{
'name': 'feature 0 in {6.0}',
'children': [
{
'name': 'Predict: 17.0'
}
]
},
{
'name': 'feature 0 not in {6.0}',
'children': [
{
'name': 'Predict: 6.0'
}
]
}
]
},
{
'name': 'feature 0 not in {1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,10.0,11.0,12.0,13.0}',
'children': [
{
'name': 'Predict: 20.0'
}
]
}
]
}
]
You don't need to parse the debug string; instead, you can walk the tree starting from the root node of the model.
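If parsing the debug string is still the preferred route, the recursion the question is after can be sketched in Python. This is a sketch under assumptions, not the Databricks implementation: it assumes the string follows the grammar visible in the question (a node is either a Predict line, or an If line followed by a subtree, then an Else line followed by a subtree), it ignores any line that does not start with one of those keywords, and the function names are hypothetical. Because the grammar alone determines the nesting, indentation is not needed.

```python
import json

KEYWORDS = ("If", "Else", "Predict")


def _condition(line, keyword):
    """Drop the keyword and the surrounding parentheses of an If/Else line."""
    rest = line[len(keyword):].strip()
    if rest.startswith("(") and rest.endswith(")"):
        rest = rest[1:-1]
    return rest


def _parse(lines, pos):
    """Parse one subtree at lines[pos]; return (list of nodes, next position)."""
    line = lines[pos]
    if line.startswith("Predict"):
        return [{"name": line}], pos + 1
    if line.startswith("If"):
        if_children, else_pos = _parse(lines, pos + 1)
        else_line = lines[else_pos]
        if not else_line.startswith("Else"):
            raise ValueError("expected an Else line, got: " + else_line)
        else_children, next_pos = _parse(lines, else_pos + 1)
        nodes = [
            {"name": _condition(line, "If"), "children": if_children},
            {"name": _condition(else_line, "Else"), "children": else_children},
        ]
        return nodes, next_pos
    raise ValueError("unexpected line: " + line)


def debug_string_to_tree(debug):
    """Build the nested name/children structure under a single Root node."""
    lines = [l.strip() for l in debug.splitlines()]
    lines = [l for l in lines if l.startswith(KEYWORDS)]
    children, _ = _parse(lines, 0)
    return [{"name": "Root", "children": children}]


if __name__ == "__main__":
    s = ("If (feature 0 in {1.0,2.0})\n If (feature 0 in {6.0})\n  Predict: 17.0\n"
         " Else (feature 0 not in {6.0})\n  Predict: 6.0\n"
         "Else (feature 0 not in {1.0,2.0})\n Predict: 20.0")
    print(json.dumps(debug_string_to_tree(s), indent=2))
```

Each call returns the position of the first unread line, which is what lets the caller find the matching Else after an arbitrarily deep If branch.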

How to yield a JSON object from a for loop in scala?

for (character <- content) {
if (character == '\n') {
val current_line = line.mkString
line.clear()
current_line match {
case docStartRegex(_*) => {
startDoc = true
endText = false
endDoc = false
}
case docnoRegex(group) => {
docID = group.trim
}
case docTextStartRegex(_*) => {
startText = true
}
case docTextEndRegex(_*) => {
endText = true
startText = false
}
case docEndRegex(_*) => {
endDoc = true
startDoc = false
es_json = Json.obj(
"_index" -> "ES_SPARK_AP",
"_type" -> "document",
"_id" -> docID,
"_source" -> Json.obj(
"text" -> textChunk.mkString(" ")
)
)
// yield es_json
textChunk.clear()
}
case _ => {
if (startDoc && !endDoc && startText) {
textChunk += current_line.trim
}
}
}
} else {
line += character
}
}
The for-loop above parses a text file and builds a JSON object for each chunk it parses. This JSON will then be sent to Elasticsearch for further processing. In Python, we can yield the JSON and consume it through a generator, like this:
def func():
    for i in range(num):
        ... some computations ...
        yield {
            JSON ## JSON is yielded
        }
for json in func(): ## we parse through the generator here.
    process(json)
How can I use yield in a similar fashion in Scala?
If you want lazy returns, Scala does this using Iterator types. Specifically, if you want to handle values line by line, I'd split the content into lines first with .lines:
val content: String = ???
val results: Iterator[Json] =
for {
// a single generator: each element of content.lines is already one line
line <- content.lines
} yield {
line match {
case docEndRegex(_*) => ...
}
}
You can also use a function directly
def toJson(line: String): Json =
line match {
case "hi" => Json.obj("line" -> "hi")
case "bye" => Json.obj("what" -> "a jerk")
}
val results: Iterator[Json] =
for {
line <- content.lines
} yield toJson(line)
This is equivalent to doing
content.lines.map(line => toJson(line))
Or somewhat equivalently in python
lines = (line.strip() for line in content.split("\n"))
jsons = (toJson(line) for line in lines)
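The loop in the question keeps state across lines (the current document ID and the collected text), so a plain per-line map is not quite enough on its own. A minimal Python generator that keeps that state and yields one object per document might look like the sketch below. The tag patterns are assumptions, since the question does not show its regexes: a TREC-style layout with DOC, DOCNO and TEXT tags is assumed, and iter_docs is a hypothetical name.

```python
import re

# Assumed patterns; the question's docStartRegex etc. are not shown.
DOC_START = re.compile(r"<DOC>")
DOC_END = re.compile(r"</DOC>")
DOCNO = re.compile(r"<DOCNO>(.*)</DOCNO>")
TEXT_START = re.compile(r"<TEXT>")
TEXT_END = re.compile(r"</TEXT>")


def iter_docs(content):
    """Lazily yield one Elasticsearch-style dict per parsed document."""
    doc_id, in_text, chunk = None, False, []
    for raw in content.splitlines():
        line = raw.strip()
        docno = DOCNO.fullmatch(line)
        if DOC_START.fullmatch(line):
            doc_id, in_text, chunk = None, False, []
        elif docno:
            doc_id = docno.group(1).strip()
        elif TEXT_START.fullmatch(line):
            in_text = True
        elif TEXT_END.fullmatch(line):
            in_text = False
        elif DOC_END.fullmatch(line):
            yield {
                "_index": "ES_SPARK_AP",
                "_type": "document",
                "_id": doc_id,
                "_source": {"text": " ".join(chunk)},
            }
        elif in_text:
            chunk.append(line)


if __name__ == "__main__":
    sample = "<DOC>\n<DOCNO> AP-1 </DOCNO>\n<TEXT>\nhello\nworld\n</TEXT>\n</DOC>\n"
    for doc in iter_docs(sample):
        print(doc)
```

In Scala the same shape is usually obtained by building an Iterator that carries this state, rather than by a yield keyword inside an imperative loop.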

How do I pretty print JSON with multiple levels of minimization?

We have standard pretty printed JSON:
{
"results": {
"groups": {
"alpha": {
"items": {
"apple": {
"attributes": {
"class": "fruit"
}
},
"pear": {
"attributes": {
"class": "fruit"
}
},
"dog": {
"attributes": {
"class": null
}
}
}
},
"beta": {
"items": {
"banana": {
"attributes": {
"class": "fruit"
}
}
}
}
}
}
}
And we have JMin:
{"results":{"groups":{"alpha":{"items":{"apple":{"attributes":{"class":"fruit"}},"pear":{"attributes":{"class":"fruit"}},"dog":{"attributes":{"class":null}}}},"beta":{"items":{"banana":{"attributes":{"class":"fruit"}}}}}}}
But I want to be able to print JSON like this on the fly:
{
"results" : {
"groups" : {
"alpha" : {
"items" : {
"apple":{"attributes":{"class":"fruit"}},
"pear":{"attributes":{"class":"fruit"}},
"dog":{"attributes":{"class":null}}
}
},
"beta" : {
"items" : {
"banana":{"attributes":{"class":"fruit"}}}
}
}
}
}
The above I would describe as "pretty-print JSON, minimized at level 5". Are there any tools that do that?
I wrote my own JSON formatter, based on this script:
#!/usr/bin/env python3
VERSION = "1.0.1"
import sys
import json
from optparse import OptionParser

def to_json(o, level=0):
    # levels shallower than FOLD_LEVEL are pretty-printed; deeper levels are minimized
    if level < FOLD_LEVEL:
        newline = "\n"
        space = " "
    else:
        newline = ""
        space = ""
    ret = ""
    if isinstance(o, str):
        # json.dumps quotes the string and escapes special characters
        ret += json.dumps(o)
    elif isinstance(o, bool):
        # bool must be tested before int, because bool is a subclass of int
        ret += "true" if o else "false"
    elif isinstance(o, float):
        ret += '%.7g' % o
    elif isinstance(o, int):
        ret += str(o)
    elif isinstance(o, list):
        ret += "[" + newline
        comma = ""
        for e in o:
            ret += comma
            comma = "," + newline
            ret += space * INDENT * (level+1)
            ret += to_json(e, level+1)
        ret += newline + space * INDENT * level + "]"
    elif isinstance(o, dict):
        ret += "{" + newline
        comma = ""
        for k, v in o.items():
            ret += comma
            comma = "," + newline
            ret += space * INDENT * (level+1)
            ret += '"' + str(k) + '":' + space
            ret += to_json(v, level+1)
        ret += newline + space * INDENT * level + "}"
    elif o is None:
        ret += "null"
    else:
        #raise TypeError("Unknown type '%s' for json serialization" % str(type(o)))
        ret += str(o)
    return ret

#main():
FOLD_LEVEL = 10000
INDENT = 4
parser = OptionParser(usage='%prog json_file [options]', version=VERSION)
parser.add_option("-f", "--fold-level", action="store", type="int",
                  dest="fold_level", help="int (all json is minimized to this level)")
parser.add_option("-i", "--indent", action="store", type="int",
                  dest="indent", help="int (spaces of indentation, default is 4)")
parser.add_option("-o", "--outfile", action="store", type="string",
                  dest="outfile", metavar="filename", help="write output to a file")
(options, args) = parser.parse_args()
if len(args) == 0:
    infile = sys.stdin
elif len(args) == 1:
    infile = open(args[0], 'r')
else:
    raise SystemExit(sys.argv[0] + " json_file [options]")
if options.outfile is None:
    outfile = sys.stdout
else:
    outfile = open(options.outfile, 'w')
if options.fold_level is not None:
    FOLD_LEVEL = options.fold_level
if options.indent is not None:
    INDENT = options.indent
with infile:
    try:
        obj = json.load(infile)
    except ValueError as e:
        raise SystemExit(e)
with outfile:
    outfile.write(to_json(obj))
    outfile.write('\n')
The script accepts fold level, indent and output file from the command line:
$ jsonfold.py -h
Usage: jsonfold.py json_file [options]
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-f FOLD_LEVEL, --fold-level=FOLD_LEVEL
int (all json is minimized to this level)
-i INDENT, --indent=INDENT
int (spaces of indentation, default is 4)
-o filename, --outfile=filename
write output to a file
To get my example from above, fold at the 5th level:
$ jsonfold.py test2 -f 5
{
"results": {
"groups": {
"alpha": {
"items": {
"pear": {"attributes":{"class":"fruit"}},
"apple": {"attributes":{"class":"fruit"}},
"dog": {"attributes":{"class":null}}
}
},
"beta": {
"items": {
"banana": {"attributes":{"class":"fruit"}}
}
}
}
}
}

Compare json equality in Scala

How can I compare if two json structures are the same in scala?
For example, if I have:
{
resultCount: 1,
results: [
{
artistId: 331764459,
collectionId: 780609005
}
]
}
and
{
results: [
{
collectionId: 780609005,
artistId: 331764459
}
],
resultCount: 1
}
They should be considered equal
You should be able to simply do json1 == json2, if the json libraries are written correctly. Is that not working for you?
This is with spray-json, although I would expect the same from every json library:
import spray.json._
import DefaultJsonProtocol._
Welcome to Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_51).
Type in expressions to have them evaluated.
Type :help for more information.
scala> val json1 = """{ "a": 1, "b": [ { "c":2, "d":3 } ] }""".parseJson
json1: spray.json.JsValue = {"a":1,"b":[{"c":2,"d":3}]}
scala> val json2 = """{ "b": [ { "d":3, "c":2 } ], "a": 1 }""".parseJson
json2: spray.json.JsValue = {"b":[{"d":3,"c":2}],"a":1}
scala> json1 == json2
res1: Boolean = true
Spray-json uses an immutable scala Map to represent a JSON object in the abstract syntax tree resulting from a parse, so it is just Map's equality semantics that make this work.
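The same principle (parse both documents, then compare the parsed values instead of the raw strings) can be sketched in Python, where parsed JSON objects become dicts whose equality ignores key order. This is only an illustration of the idea, not spray-json itself, and json_equal is a hypothetical helper name. Arrays remain order-sensitive, as they are in JSON.

```python
import json


def json_equal(a: str, b: str) -> bool:
    """Compare two JSON documents structurally, ignoring object key order."""
    return json.loads(a) == json.loads(b)


if __name__ == "__main__":
    left = '{ "a": 1, "b": [ { "c": 2, "d": 3 } ] }'
    right = '{ "b": [ { "d": 3, "c": 2 } ], "a": 1 }'
    print(json_equal(left, right))
```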
You can also use scalatest-json
Example:
it("should fail on slightly different json explaining why") {
val input = """{"someField": "valid json"}""".stripMargin
val expected = """{"someField": "different json"}""".stripMargin
input should matchJson(expected)
}
When the two JSON documents don't match, a readable diff is displayed, which is quite useful when working with large JSON documents.
Can confirm that it also works just fine with the Jackson library using == operator:
val simpleJson =
"""
|{"field1":"value1","field2":"value2"}
""".stripMargin
val simpleJsonNode = objectMapper.readTree(simpleJson)
val simpleJsonNodeFromString = objectMapper.readTree(simpleJsonNode.toString)
assert(simpleJsonNode == simpleJsonNodeFromString)
spray-json is definitely great, but I use Gson since my project already depended on the Gson library. I use it in my unit tests, and it works well for simple JSON.
import com.google.gson.{JsonParser}
import org.apache.flume.event.JSONEvent
import org.scalatest.FunSuite
class LogEnricherSpec extends FunSuite {
test("compares json to json") {
val parser = new JsonParser()
assert(parser.parse("""
{
"eventType" : "TransferItems",
"timeMillis" : "1234567890",
"messageXml":{
"TransferId" : 123456
}
} """.stripMargin)
==
parser.parse("""
{
"timeMillis" : "1234567890",
"eventType" : "TransferItems",
"messageXml":{
"TransferId" : 123456
}
}
""".stripMargin))
}
}
Calling compare_2Json(str1, str2) returns a Boolean.
Please make sure that both string parameters are valid JSON.
Feel free to use and test it.
import scala.collection.mutable.ArrayBuffer
def compare_2Json(js1:String,js2:String): Boolean = {
var js_str1 = js1
var js_str2 = js2
js_str1=js_str1.replaceAll(" ","")
js_str2=js_str2.replaceAll(" ","")
var issame = false
val arrbuff1 = ArrayBuffer[String]()
val arrbuff2 = ArrayBuffer[String]()
if(js_str1.substring(0,1)=="{" && js_str2.substring(0,1)=="{" || js_str1.substring(0,1)=="["&&js_str2.substring(0,1)=="["){
for(small_js1 <- split_JsonintoSmall(js_str1);small_js2 <- split_JsonintoSmall((js_str2))) {
issame = compare_2Json(small_js1,small_js2)
if(issame == true){
js_str1 = js_str1.substring(0,js_str1.indexOf(small_js1))+js_str1.substring(js_str1.indexOf(small_js1)+small_js1.length)
js_str2 = js_str2.substring(0,js_str2.indexOf(small_js2))+js_str2.substring(js_str2.indexOf(small_js2)+small_js2.length)
}
}
js_str1 = js_str1.substring(1,js_str1.length-1)
js_str2 = js_str2.substring(1,js_str2.length-1)
for(str_js1 <- js_str1.split(","); str_js2 <- js_str2.split(",")){
if(str_js1!="" && str_js2!="")
if(str_js1 == str_js2){
js_str1 = js_str1.substring(0,js_str1.indexOf(str_js1))+js_str1.substring(js_str1.indexOf(str_js1)+str_js1.length)
js_str2 = js_str2.substring(0,js_str2.indexOf(str_js2))+js_str2.substring(js_str2.indexOf(str_js2)+str_js2.length)
}
}
js_str1=js_str1.replace(",","")
js_str2=js_str2.replace(",","")
if(js_str1==""&&js_str2=="")return true
else return false
}
else return false
}
def split_JsonintoSmall(js_str: String):ArrayBuffer[String]={
val arrbuff = ArrayBuffer[String]()
var json_str = js_str
while(json_str.indexOf("{",1)>0 || json_str.indexOf("[",1)>0){
if (json_str.indexOf("{", 1) < json_str.indexOf("[", 1) && json_str.indexOf("{",1)>0 || json_str.indexOf("{", 1) > json_str.indexOf("[", 1) && json_str.indexOf("[",1)<0 ) {
val right = findrealm(1, json_str, '{', '}')
arrbuff += json_str.substring(json_str.indexOf("{", 1), right + 1)
json_str = json_str.substring(0,json_str.indexOf("{",1))+json_str.substring(right+1)
}
else {
if(json_str.indexOf("[",1)>0) {
val right = findrealm(1, json_str, '[', ']')
arrbuff += json_str.substring(json_str.indexOf("[", 1), right + 1)
json_str = json_str.substring(0, json_str.indexOf("[", 1)) + json_str.substring(right + 1)
}
}
}
arrbuff
}
def findrealm(begin_loc: Int, str: String, leftch: Char, rightch: Char): Int = {
var left = str.indexOf(leftch, begin_loc)
var right = str.indexOf(rightch, left)
left = str.indexOf(leftch, left + 1)
while (left < right && left > 0) {
right = str.indexOf(rightch, right + 1)
left = str.indexOf(leftch, left + 1)
}
right
}