Parsing Complex JSON in Scala

I have the below JSON field in my data.
val myjsonString = """[{"A":[{"myName":"Sheldon""age":30"Qualification":"Btech"}]"B":"UnitedStates"},{"A":[{"myName":"Raj""age":35"Qualification":"BSC"}]"B":"UnitedKIngDom"},{"A":[{"myName":"Howard""age":40"Qualification":"MTECH"}]"B":"Australia"}] """
The parse method gives the following structure:
scala> val json = parse(myjsonString)
json: org.json4s.JValue = JArray(List(JObject(List((A,JArray(List(JObject(List((myName,JString(Sheldon)), (age,JInt(30)), (Qualification,JString(Btech))))))), (B,JString(UnitedStates)))), JObject(List((A,JArray(List(JObject(List((myName,JString(Raj)), (age,JInt(35)), (Qualification,JString(BSC))))))), (B,JString(UnitedKIngDom)))), JObject(List((A,JArray(List(JObject(List((myName,JString(Howard)), (age,JInt(40)), (Qualification,JString(MTECH))))))), (B,JString(Australia))))))
I am trying to parse it using json4s in Scala. I have looked at almost all the previously asked questions related to this, but could not find a proper solution. The output should be something like this:
UnitedStates 30
UnitedKIngDom 35
Australia 40
or only the ages, in 30#35#40 format.

The JSON you posted is invalid: commas are missing between the object fields.
To get the output you want, you need to extract the data from the AST that json4s produces once the data parses successfully. json4s provides several ways to manipulate and extract data from a parsed AST.
You could map over the list of objects inside the JArray and extract the country and age from each object. I don't wish to provide code for this, as you haven't shown what you have tried beyond simply parsing the JSON string.
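For later readers, here is a minimal sketch of that mapping. It assumes the missing commas have been restored, that json4s-native is on the classpath, and that you are on Scala 2.12 or 2.13. The names CountryAges, fixedJson and rows are only illustrative and do not come from the question.

```scala
import org.json4s._
import org.json4s.native.JsonMethods._

object CountryAges {
  // The question's JSON with the missing commas restored (an assumption about the intended data).
  val fixedJson: String =
    """[{"A":[{"myName":"Sheldon","age":30,"Qualification":"Btech"}],"B":"UnitedStates"},
      |{"A":[{"myName":"Raj","age":35,"Qualification":"BSC"}],"B":"UnitedKIngDom"},
      |{"A":[{"myName":"Howard","age":40,"Qualification":"MTECH"}],"B":"Australia"}]""".stripMargin

  // Walk the top-level array, then pattern match on the fields of each object.
  val rows: List[(String, BigInt)] = for {
    JObject(entry)          <- parse(fixedJson).children // one JObject per array element
    ("B", JString(country)) <- entry                     // the country field
    ("A", JArray(people))   <- entry                     // the nested array
    JObject(person)         <- people
    ("age", JInt(age))      <- person
  } yield (country, age)

  def main(args: Array[String]): Unit = {
    rows.foreach { case (country, age) => println(s"$country $age") }
    println(rows.map(_._2).mkString("#")) // 30#35#40
  }
}
```

Pattern matching on the field tuples keeps the sketch independent of how the `\` operator treats arrays, which I believe has varied between json4s versions.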

Related

Nested JSON to dataframe in Scala

I am using Spark/Scala to make an API Request and parse the response into a dataframe. Following is the sample JSON response I am using for testing purpose:
API Request/Response
However, I tried the following answer from Stack Overflow to convert the JSON, but the nested fields are not being processed. Is there any way to convert the JSON string to a dataframe with columns?
I think the problem is that the JSON you attached, when read as a dataframe, yields a single, very wide row, so Spark may be truncating the output.
If that is what you want, you can try setting the Spark property spark.debug.maxToStringFields to a higher value (the default is 25):
spark.conf.set("spark.debug.maxToStringFields", 100)
However, if you want to process the Results from the JSON, it would be better to load them into a dataframe first and then do the processing. Here is how you can do it:
import com.google.gson.JsonParser // Gson's parser, used to pull out the "Results" array
val results = JsonParser.parseString(<json content>).getAsJsonObject().get("Results").getAsJsonArray.toString
import spark.implicits._
val df = spark.read.json(Seq(results).toDS)
df.show(false)

Custom Formatting of JSON output using Spark

I have a dataset with a bunch of BigDecimal values. I would like to output these records to a JSON file, but when I do, the BigDecimal values are often written with trailing zeros (123.4000000000000), and the spec we must conform to does not allow this (for reasons I don't understand).
I am trying to see if there is a way to override how the data is printed to JSON.
Currently, my best idea is to convert each record to a string using Jackson and then write the data using df.write().text(..) rather than as JSON.
I suggest converting the Decimal type to String before writing to JSON.
The code below is in Scala, but it is easy to adapt to Java.
import org.apache.spark.sql.types.StringType
// COLUMN_NAME is your DataFrame column name.
val new_df = df.withColumn("COLUMN_NAME_TMP", df("COLUMN_NAME").cast(StringType)).drop("COLUMN_NAME").withColumnRenamed("COLUMN_NAME_TMP", "COLUMN_NAME")
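A plain cast may keep the column's scale, so the trailing zeros could survive it. Below is a minimal sketch of the normalisation step in plain Scala, using only java.math.BigDecimal; the names TrimDecimal and plainDecimal are hypothetical.

```scala
object TrimDecimal {
  // Hypothetical helper: renders a decimal without trailing zeros
  // and without exponent notation.
  def plainDecimal(value: java.math.BigDecimal): String =
    value.stripTrailingZeros().toPlainString()

  def main(args: Array[String]): Unit = {
    println(plainDecimal(new java.math.BigDecimal("123.4000000000000"))) // 123.4
    println(plainDecimal(new java.math.BigDecimal("100")))               // 100
  }
}
```

toPlainString matters here: stripTrailingZeros alone would turn 100 into 1E+2. In Spark, a function like this could presumably be wrapped with org.apache.spark.sql.functions.udf and applied to the column before writing; I have not checked that against a specific Spark version.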

Parsing large JSON file with Scala and JSON4S

I'm working with Scala in IntelliJ IDEA 15 and trying to parse a large twitter record json file and count the total number of hashtags. I am very new to Scala and the idea of functional programming. Each line in the json file is a json object (representing a tweet). Each line in the file starts like so:
{"in_reply_to_status_id":null,"text":"To my followers sorry..
{"in_reply_to_status_id":null,"text":"#victory","in_reply_to_screen_name"..
{"in_reply_to_status_id":null,"text":"I'm so full I can't move"..
I am most interested in a property called "entities", which contains a property called "hashtags" with a list of hashtags. Here is an example:
"entities":{"hashtags":[{"text":"thewayiseeit","indices":[0,13]}],"user_mentions":[],"urls":[]},
I've browsed the various scala frameworks for parsing json and have decided to use json4s. I have the following code in my Scala script.
import org.json4s.native.JsonMethods._
var json: String = ""
for (line <- io.Source.fromFile("twitter38.json").getLines) json += line
val data = parse(json)
My logic here is that I am trying to read each line from twitter38.json into a string and then parse the entire string with parse(). The parse function is throwing an error claiming:
"Type mismatch, expected: Nothing, found:String."
I have seen examples that use parse() on strings that hold json objects such as
val jsontest =
"""{
|"name" : "bob",
|"age" : "50",
|"gender" : "male"
|}
""".stripMargin
val data = parse(jsontest)
but I have received the same error. I am coming from an object oriented programming background, is there something fundamentally wrong with the way I am approaching this problem?
You have most likely either added the dependencies to your IntelliJ project incorrectly or imported the wrong modules into your file. Make sure you have the following import:
import org.json4s.native.JsonMethods._
Even if you import this module correctly, parse(json: String) will not work for you, because your JSON is malformed. Your JSON string will look like this:
"""{"in_reply_...":"someValue1"}{"in_reply_...":"someValues2"}"""
but should look as follows to be a valid json that can be parsed:
"""{{"in_reply_...":"someValue1"},{"in_reply_...":"someValues2"}}"""
i.e. you need opening and closing brackets around the JSON, and a comma between the tweets on each line. Please read the json4s documentation for more information.
Although this question is almost 6 years old, I think it deserves another try.
The JSON format is often misunderstood, especially how documents are stored and how they are read back.
JSON documents are stored either as a single object containing all the other fields, or as an array of multiple objects, possibly in the same format. The second form is important because arrays, in JSON as in almost every programming language, are delimited by square brackets with values separated by commas (note that I used a person object as the single value here):
[
{"name":"John","surname":"Doe"},
{"name":"Jane","surname":"Doe"}
]
Also note that everything except the structural characters, numbers, booleans and null is enclosed in double quotes when written to a file.
However, there is another convention that is not official but is popular for transferring datasets easily, where every object (or document, in NoSQL/Mongo terms) is stored on its own line, like this:
{"name":"John","surname":"Doe"}
{"name":"Jane","surname":"Doe"}
So, for the question: the OP has a document written in this second form, but uses an algorithm written to read the first form. The following code makes a few simple changes to handle this; whoever reads the file must know which form it is in:
var json: String = "["
for (line <- io.Source.fromFile("twitter38.json").getLines) json += line + ","
json=json.splitAt(json.length()-1)._1
json+= "]"
val data = parse(json)
PS: although @sbrannon has the right idea, the example he/she gave mistakenly uses curly braces instead of square brackets to surround the data.
EDIT: I have added json=json.splitAt(json.length()-1)._1 because the code above otherwise ends with a trailing comma, which causes a parse error per the JSON format definition.
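An alternative sketch, assuming each line really is one complete JSON object as the question describes: parse every line on its own instead of concatenating the file into one large string. This also avoids holding the whole file in memory. The names HashtagCount and hashtagsIn are illustrative, and json4s-native is assumed to be on the classpath.

```scala
import org.json4s._
import org.json4s.native.JsonMethods._

object HashtagCount {
  // Counts the entries of entities.hashtags for one tweet given as a JSON string.
  def hashtagsIn(line: String): Int =
    parse(line) \ "entities" \ "hashtags" match {
      case JArray(tags) => tags.size
      case _            => 0 // this line has no entities/hashtags field
    }

  def main(args: Array[String]): Unit = {
    val source = scala.io.Source.fromFile("twitter38.json")
    try {
      // Skip blank lines, parse each remaining line separately, and add up the counts.
      val total = source.getLines().filter(_.trim.nonEmpty).map(hashtagsIn).sum
      println(s"Total hashtags: $total")
    } finally source.close()
  }
}
```

A line that is not valid JSON would still make parse throw, so real data may need a Try around the call.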

How to represent nested json objects in a cucumber feature file

I need to represent a JSON object in the feature file. I could use a JSON file for the code to pick up, but that would mean I can't pass values from the feature file.
Scenario: Test
Given a condition is met
Then the following json response is sent
| json |
| {"dddd":"dddd","ggggg":"ggggg"}|
The above works for simple JSON. However, if there are nested objects, writing the JSON on a single line like this makes the feature very difficult to read and to fix.
Please let me know how this can be done.
You can use a doc string for that; it makes the JSON much more legible.
Then the following json response is sent
  """
  {
    "dddd": "dddd",
    "ggggg": "ggggg",
    "somethingelse": {
      "thing": "thingvalue",
      "thing2": "thing2value"
    }
  }
  """
In the step definition, you can use it directly:
Then(/^the following json response is sent$/) do |message|
  expect(rest_stub.body).to eq(message)
end
or something like that.

Jackson filter its response based on JSON field

I parse a JSON file of about 300 KB (it might grow), so instead of Android's default JSON parser I've used Jackson. However, I have to filter its response based on the value of a field in each JSON object. Example:
Json File:
{"foobars":[{"foo":1,"bar":1},{"foo":2,"bar":2}]}
and the response will be an ArrayList (in my case) with these two objects. However, I'd like to get only the objects where "foo" < 2. How can I do that?