Parsing nodes on JSON with Scala - - json

I've been asked to parse a JSON file to get all the buses that are over a specified speed inputed by the user.
The JSON file can be downloaded here
It's like this:
{
"COLUMNS": [
"DATAHORA",
"ORDEM",
"LINHA",
"LATITUDE",
"LONGITUDE",
"VELOCIDADE"
],
"DATA": [
[
"04-16-2015 00:00:55",
"B63099",
"",
-22.7931,
-43.2943,
0
],
[
"04-16-2015 00:01:02",
"C44503",
781,
-22.853649,
-43.37616,
25
],
[
"04-16-2015 00:11:40",
"B63067",
"",
-22.7925,
-43.2945,
0
],
]
}
The thing is: I'm really new to scala and I have never worked with json before (shame on me). What I need is to get the "Ordem", "Linha" and "Velocidade" from DATA node.
I created a case class to enclousure all the data so as to later look for those who are over the specified speed.
case class Bus(ordem: String, linha: Int, velocidade: Int)
I did this reading the file as a textFile and spliting. Although this way, I need to foreknow the content of the file in order to go to the lines after DATA node.
I want to know how to do this using a JSON parser. I've tried many solutions, but I couldn't adapt to my problem, because I need to extract all the lines from DATA node instead of nodes inside one node.
Can anyone help me?
PS: Sorry for my english, not a native speaker.

First of all, you need to understand the different JSON data types. The basic types in JSON are numbers, strings, booleans, arrays, and objects. The data returned in your example is an object with two keys: COLUMNS and DATA. The COLUMNS key has a value that is an array of strings and numbers. The DATA key has a value which is an array of arrays of strings.
You can use a library like PlayJSON to work with this type of data:
val js = Json.parse(x).as[JsObject]
val keys = (js \ "COLUMNS").as[List[String]]
val values = (js \ "DATA").as[List[List[JsValue]]]
val busses = values.map(valueList => {
val keyValues = (keys zip valueList).toMap
for {
ordem <- keyValues("ORDEM").asOpt[String]
linha <- keyValues("LINHA").asOpt[Int]
velocidade <- keyValues("VELOCIDADE").asOpt[Int]
} yield Bus(ordem, linha, velocidade)
})
Note the use of asOpt when converting the properties to the expected types. This operator converts the key-values to the provided type if possible (wrapped in Some), and returns None otherwise. So, if you want to provide a default value instead of ignoring other results, you could use keyValues("LINHA").asOpt[Int].getOrElse(0), for example.
You can read more about the Play JSON methods used here, like \ and as, and asOpt in their docs.

You can use Spark SQL to achieve it. Refer section under JSON Datasets here
In essence, Use spark APIs to load a JSON and register it as temp table.
You can run your SQL queries on the table from there.

As seen on #Ben Reich answer, that code works great. Thank you very much.
Although, my Json had some type problems on "Linha". As it can be seen on the JSON example that I put on the Question, there are "" and also numbers, e.g., 781.
When trying to do keyValues("LINHA").asOpt[Int].getOrElse(0), it was producing an error saying that value flatMap is not a member of Int.
So, I had to change some things:
case class BusContainer(ordem: String, linha: String, velocidade: Int)
val jsonString = fromFile("./project/rj_onibus_gps.json").getLines.mkString
val js = Json.parse(jsonString).as[JsObject]
val keys = (js \ "COLUMNS").as[List[String]]
val values = (js \ "DATA").as[List[List[JsValue]]]
val buses = values.map(valueList => {
val keyValues = (keys zip valueList).toMap
println(keyValues("ORDEM"),keyValues("LINHA"),keyValues("VELOCIDADE"))
for {
ordem <- keyValues("ORDEM").asOpt[String]
linha <- keyValues("LINHA").asOpt[Int].orElse(keyValues("LINHA").asOpt[String])
velocidade <- keyValues("VELOCIDADE").asOpt[Int]
} yield BusContainer(ordem, linha.toString, velocidade)
})
Thanks for the help!

Related

convert nested json string column into map type column in spark

overall aim
I have data landing into blob storage from an azure service in form of json files where each line in a file is a nested json object. I want to process this with spark and finally store as a delta table with nested struct/map type columns which can later be queried downstream using the dot notation columnName.key
data nesting visualized
{
key1: value1
nestedType1: {
key1: value1
keyN: valueN
}
nestedType2: {
key1: value1
nestedKey: {
key1: value1
keyN: valueN
}
}
keyN: valueN
}
current approach and problem
I am not using the default spark json reader as it is resulting in some incorrect parsing of the files instead I am loading the files as text files and then parsing using udfs by using python's json module ( eg below ) post which I use explode and pivot to get the first level of keys into columns
#udf('MAP<STRING,STRING>' )
def get_key_val(x):
try:
return json.loads(x)
except:
return None
Post this initial transformation I now need to convert the nestedType columns to valid map types as well. Now since the initial function is returning map<string,string> the values in nestedType columns are not valid jsons so I cannot use json.loads, instead I have regex based string operations
#udf('MAP<STRING,STRING>' )
def convert_map(string):
try:
regex = re.compile(r"""\w+=.*?(?:(?=,(?!"))|(?=}))""")
obj = dict([(a.split('=')[0].strip(),(a.split('=')[1])) for a in regex.findall(s)])
return obj
except Exception as e:
return e
this is fine for second level of nesting but if I want to go further that would require another udf and subsequent complications.
question
How can I use a spark udf or native spark functions to parse the nested json data such that it is queryable in columnName.key format.
also there is no restriction of spark version, hopefully I was able to explain this properly. do let me know if you want me to put some sample data and the code for ease. Any help is appreciated.

Apply key to map obtained via pattern matching in Scala (type erased)

I am trying to query an API which returns a JSON array (e.g. [{"name":"obj1", "value":5}, {"name":"obj2", "value":2}]) and process the result, which gets parsed as an Option[List[Map[String,Any]]]. However, I am not sure how to properly extract each Map, since the types are erased at runtime.
import scala.util.parsing.json._
import scalaj.http._
val url = "https://api.github.com/users/torvalds/repos"
val req = Http(url).asString
val parsed = JSON.parseFull(req.body) match {
case Some(data) => data match {
case list: List[_] => list
case _ => sys.error("Result is not a list.")
}
case None => sys.error("Invalid JSON received.")
}
parsed.foreach{
case x: Map[_,_] => x.get("full_name") // error here
}
The error occurs because I cannot apply the function with a String key type. However, because of type erasure, the key and value type are unknown, and specifying that it's a String map throws compiler warnings.
Am I going about things the wrong way? Or maybe I'd have better luck with a different HTTP/JSON library?
You can replace your last line with:
parsed.collect{ case x: Map[_,_] => x.asInstanceOf[Map[String,Any]].get("full_name") }
We sort of "cheat" here since we know the keys in a JSON are always Strings.
As for your last question, if you need something lightweight, I think what you have here is as simple as it gets.
Take a look at this SO post if you want to do something more powerful with your pattern matching.

Play Framework 2 cannot properly validate Json Arrays (code inside)

I want to upload a JSON array of entities using Play.
My model looks like this:
case class Pet(name: String, age: Int)
object Pet {
implicit val petReads: Reads[Pet] = (
(JsPath \ "name").read[String](minLength[String](2)) and
(JsPath \ "age").read[Int](min(0))
)(Pet.apply _)
)
My JSON input is a JSON array of entries. For example:
'[{"name": "Scooby","age":7},{"name": "Toothless","age": 3}]'
The action for working on the entries is this:
def create = Action(BodyParsers.parse.json) { implicit request =>
val entries = request.body.validate[Seq[Pet]]
entries.fold(errors => {BadRequest(Json.obj("status" -> "Bad Request", "message" -> JsError.toJson(errors)))},
elements => {//do something with it
Ok(Json.obj("status" -> "OK", "message" -> (Json.toJson("Done."))))})
}
I want my validation to be able to detect value problems. For example, if the string.length < 2 or the age number is negative.
However it doesn't work for arrays with .validate[Seq[Pet]]. Entries with a name of length < 2 can get past the validation.
If I try to upload each entry individually as a simple JSON entry (not a json array) and use .validate[Pet] instead, everything works fine. Any tips on how to tweak the validation so that it works for arrays?
Found the solution, just use .validate[Array[Pet]] and it works out of the box.

Play Framework 2.2.1 / How should controller handle a queryString in JSON format

My client side executes a server call encompassing data (queryString) in a JSON object like this:
?q={"title":"Hello"} //non encoded for the sample but using JSON.stringify actually
What is an efficient way to retrieve the title and Hello String?
I tried this:
val params = request.queryString.map {case(k,v) => k->v.headOption}
that returns the Tuple: (q,Some({"title":"hello"}))
I could further extract to retrieve the values (although I would need to manually map the JSON object to a Scala object), but I wonder whether there is an easier and shorter way.
Any idea?
First, if you intend to pluck only the q parameter from a request and don't intend to do so via a route you could simply grab it directly:
val q: Option[String] = request.getQueryString("q")
Next you'd have to parse it as a JSON Object:
import play.api.libs.json._
val jsonObject: Option[JsValue] = q.map(raw: String => json.parse(raw))
with that you should be able to check for the components the jsonObject contains:
val title: Option[String] = jsonObject.flatMap { json: JsValue =>
(json \ "title").asOpt[String]
}
In short, omitting the types you could use a for comprehension for the whole thing like so:
val title = for {
q <- request.getQueryString("q")
json <- Try(Json.parse(q)).toOption
titleValue <- (json \ "title").asOpt[String]
} yield titleValue
Try is defined in scala.util and basically catches Exceptions and wraps it in a processable form.
I must admit that the last version simply ignores Exceptions during the parsing of the raw JSON String and treats them equally to "no title query has been set".
That's not the best way to know what actually went wrong.
In our productive code we're using implicit shortcuts that wraps a None and JsError as a Failure:
val title: Try[String] = for {
q <- request.getQueryString("q") orFailWithMessage "Query not defined"
json <- Try(Json.parse(q))
titleValue <- (json \ "title").validate[String].asTry
} yield titleValue
Staying in the Try monad we gain information about where it went wrong and can provide that to the User.
orFailWithMessage is basically an implicit wrapper for an Option that will transform it into Succcess or Failure with the specified message.
JsResult.asTry is also simply a pimped JsResult that will be Success or Failure as well.

How do I deserialize a JSON array using the Play API

I have a string that is a Json array of two objects.
> val ss = """[ {"key1" :"value1"}, {"key2":"value2"}]"""
I want to use the Play Json libraries to deserialize it and create a map from the key values to the objects.
def deserializeJsonArray(ss:String):Map[String, JsValue] = ???
// Returns Map("value1" -> {"key1" :"value1"}, "value2" -> {"key2" :"value2"})
How do I write the deserializeJsonArray function? This seems like it should be easy, but I can't figure it out from either the Play documentation or the REPL.
I'm a bit rusty, so please forgive the mess. Perhaps another overflower can come in here and clean it up for me.
This solution assumes that the JSON is an array of objects, and each of the objects contains exactly one key-value pair. I would highly recommend spicing it up with some error handling and/or pattern matching to validate the parsed JSON string.
def deserializeJsonArray(ss: String): Map[String, JsValue] = {
val jsObjectSeq: Seq[JsObject] = Json.parse(ss).as[Seq[JsObject]]
val jsValueSeq: Seq[JsValue] = Json.parse(ss).as[Seq[JsValue]]
val keys: Seq[String] = jsObjectSeq.map(json => json.keys.head)
(keys zip jsValueSeq).toMap
}