Extract fields from JSON - json

I have a JSON object of the form:
{"apps":{"app":[{"id":"application_1481567788061_0002","user":"root","name":"wordcount.py","queue":"default","state":"FAILED","finalStatus":"FAILED","progress":0.0,"trackingUI":"History", "diagnostics":"Application application_1481567788061_0002 failed 2 times due to AM Container for appattempt_1481567788061_0002_000002 exited with exitCode: 255\nFor more detailed output, check application tracking page:http://sandbox:8088/proxy/application_1481567788061_0002/Then, click on links to logs of each attempt.\nDiagnostics: Exception from container-launch.\nContainer id: container_1481567788061_0002_02_000001\nExit code: 255\nStack trace: ExitCodeException exitCode=255: \n\tat org.apache.hadoop.util.Shell.runCommand(Shell.java:538)\n\tat org.apache.hadoop.util.Shell.run(Shell.java:455)\n\tat org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)\n\tat org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)\n\tat org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)\n\tat org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:262)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\tat java.lang.Thread.run(Thread.java:744)\n\n\nContainer exited with a non-zero exit code 255\nFailing this attempt. Failing the application.","clusterId":1481567788061,"applicationType":"SPARK","applicationTags":"","startedTime":1481568051052,"finishedTime":1481568079289,"elapsedTime":28237,"amHostHttpAddress":"sandbox:8042","allocatedMB":-1,"allocatedVCores":-1,"runningContainers":-1,"memorySeconds":55598,"vcoreSeconds":27,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0},{"id":"application_1481567788061_0001","user":"root","name":"pi.py","queue":"default","state":"FINISHED","finalStatus":"SUCCEEDED","progress":100.0,"trackingUI":"History","diagnostics":"","clusterId":1481567788061,"applicationType":"SPARK","applicationTags":"","startedTime":1481567853324,"finishedTime":1481567888648,"elapsedTime":35324,"amContainerLogs":"http://sandbox:8042/node/containerlogs/container_1481567788061_0001_01_000001/root","amHostHttpAddress":"sandbox:8042","allocatedMB":-1,"allocatedVCores":-1,"runningContainers":-1,"memorySeconds":138031,"vcoreSeconds":66,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0}]}}
I would like to extract from it a List[Application], where application is:
case class Application(id: String, user: String, name: String)
I imported spray-json.
If message is a string containing the JSON component, I want to do something like:
val json: JsValue = message.parseJson
val jobsJson = json.first.first
val jobs = jobsJson.map(job => Application(job(0), job(1), job(2)))
But this is not correct because I can't use json.first.
So how can I extract fields nested in the JSON object?
Is there another library that makes things easier?

Note: This answer is about play-json and not spray-json library.
You should be able to get data out of the json object with \ or \\
A single slash will look in the next lever down for what ever you are looking for while a double slash will look through the whole object.
Say you had the following json stored in a variable called obj:
{"foo":"bar","num":3, "value":{"num":4}}
using obj\num you would just get 3. But with obj\\num you would get in iterator with both 3 and 4 in it.
try this link for a little more information.

Related

GCP Proto Datastore encode JsonProperty in base64

I store a blob of Json in the datastore using JsonProperty.
I don't know the structure of the json data.
I am using endpoints proto datastore in order to retrieve my data.
The probleme is the json property is encoded in base64 and I want a plain json object.
For the example, the json data will be:
{
first: 1,
second: 2
}
My code looks something like:
import endpoints
from google.appengine.ext import ndb
from protorpc import remote
from endpoints_proto_datastore.ndb import EndpointsModel
class Model(EndpointsModel):
data = ndb.JsonProperty()
#endpoints.api(name='myapi', version='v1', description='My Sample API')
class DataEndpoint(remote.Service):
#Model.method(path='mymodel2', http_method='POST',
name='mymodel.insert')
def MyModelInsert(self, my_model):
my_model.data = {"first": 1, "second": 2}
my_model.put()
return my_model
#Model.method(path='mymodel/{entityKey}',
http_method='GET',
name='mymodel.get')
def getMyModel(self, model):
print(model.data)
return model
API = endpoints.api_server([DataEndpoint])
When I call the api for getting a model, I get:
POST /_ah/api/myapi/v1/mymodel2
{
"data": "eyJzZWNvbmQiOiAyLCAiZmlyc3QiOiAxfQ=="
}
where eyJzZWNvbmQiOiAyLCAiZmlyc3QiOiAxfQ== is the base64 encoded of {"second": 2, "first": 1}
And the print statement give me: {u'second': 2, u'first': 1}
So, in the method, I can explore the json blob data as a python dict.
But, in the api call, the data is encoded in base64.
I expeted the api call to give me:
{
'data': {
'second': 2,
'first': 1
}
}
How can I get this result?
After the discussion in the comments of your question, let me share with you a sample code that you can use in order to store a JSON object in Datastore (it will be stored as a string), and later retrieve it in such a way that:
It will show as plain JSON after the API call.
You will be able to parse it again to a Python dict using eval.
I hope I understood correctly your issue, and this helps you with it.
import endpoints
from google.appengine.ext import ndb
from protorpc import remote
from endpoints_proto_datastore.ndb import EndpointsModel
class Sample(EndpointsModel):
column1 = ndb.StringProperty()
column2 = ndb.IntegerProperty()
column3 = ndb.StringProperty()
#endpoints.api(name='myapi', version='v1', description='My Sample API')
class MyApi(remote.Service):
# URL: .../_ah/api/myapi/v1/mymodel - POSTS A NEW ENTITY
#Sample.method(path='mymodel', http_method='GET', name='Sample.insert')
def MyModelInsert(self, my_model):
dict={'first':1, 'second':2}
dict_str=str(dict)
my_model.column1="Year"
my_model.column2=2018
my_model.column3=dict_str
my_model.put()
return my_model
# URL: .../_ah/api/myapi/v1/mymodel/{ID} - RETRIEVES AN ENTITY BY ITS ID
#Sample.method(request_fields=('id',), path='mymodel/{id}', http_method='GET', name='Sample.get')
def MyModelGet(self, my_model):
if not my_model.from_datastore:
raise endpoints.NotFoundException('MyModel not found.')
dict=eval(my_model.column3)
print("This is the Python dict recovered from a string: {}".format(dict))
return my_model
application = endpoints.api_server([MyApi], restricted=False)
I have tested this code using the development server, but it should work the same in production using App Engine with Endpoints and Datastore.
After querying the first endpoint, it will create a new Entity which you will be able to find in Datastore, and which contains a property column3 with your JSON data in string format:
Then, if you use the ID of that entity to retrieve it, in your browser it will show the string without any strange encoding, just plain JSON:
And in the console, you will be able to see that this string can be converted to a Python dict (or also a JSON, using the json module if you prefer):
I hope I have not missed any point of what you want to achieve, but I think all the most important points are covered with this code: a property being a JSON object, store it in Datastore, retrieve it in a readable format, and being able to use it again as JSON/dict.
Update:
I think you should have a look at the list of available Property Types yourself, in order to find which one fits your requirements better. However, as an additional note, I have done a quick test working with a StructuredProperty (a property inside another property), by adding these modifications to the code:
#Define the nested model (your JSON object)
class Structured(EndpointsModel):
first = ndb.IntegerProperty()
second = ndb.IntegerProperty()
#Here I added a new property for simplicity; remember, StackOverflow does not write code for you :)
class Sample(EndpointsModel):
column1 = ndb.StringProperty()
column2 = ndb.IntegerProperty()
column3 = ndb.StringProperty()
column4 = ndb.StructuredProperty(Structured)
#Modify this endpoint definition to add a new property
#Sample.method(request_fields=('id',), path='mymodel/{id}', http_method='GET', name='Sample.get')
def MyModelGet(self, my_model):
if not my_model.from_datastore:
raise endpoints.NotFoundException('MyModel not found.')
#Add the new nested property here
dict=eval(my_model.column3)
my_model.column4=dict
print(json.dumps(my_model.column3))
print("This is the Python dict recovered from a string: {}".format(dict))
return my_model
With these changes, the response of the call to the endpoint looks like:
Now column4 is a JSON object itself (although it is not printed in-line, I do not think that should be a problem.
I hope this helps too. If this is not the exact behavior you want, maybe should play around with the Property Types available, but I do not think there is one type to which you can print a Python dict (or JSON object) without previously converting it to a String.

In Spark JobServer. How to pass a json formated string on the input.string?

Im trying to execute the following curl command to run a job:
curl -k --basic --user 'user:psw' -d 'input.string= {"user":13}' 'https://localhost:8090/jobs?appName=test&classPath=test.ImportCSVFiles&context=import&sync=true'
But I get the following error:
"com.typesafe.config.ConfigException$WrongType: String: 1: input.string has type OBJECT rather than STRING"
My idea is to pass more than one parameter like an sql query. A json format to easy handling on my submitted jar.
I'm on the right way or there is another way?
Copying it from comment.
Issue was json was trying to read it not as string but as object directly because of string starting with braces "{".
Correct input 'input.string= \"{\"user\":13}\" '
input.string is not a reserved keyword -- in fact, you can name the parameters arbitrarily. Let say, you POST two parameterss foo.string and foo.number. You then read the parameters in your SJS job, like this:
// run is the starting point of a SJS job (also see validate function)
override def runJob(sc: SparkContext, config: Config): Any = {
val cmd = config.getString("input.cmd")
val fooString = config.getString("foo.bar")
val fooNum = config.getInt("foo.number")
Just in case you plan to execute SJS jobs from Scala/Java:
Apache Commons Lang (org.apache.commons.lang3) comes with a very helpful class to escape JSON: StringEscapeUtils
I use it to escape inputs from my Scala application that I need to pass to SparkJobServer jobs, like this:
input.schema=\"" + StringEscapeUtils.escapeJson(referenceSchema)+"\""
referenceSchema is a JSON document (in my case a JSON array)
input.schema is then one of many comma-separated parameters in the body of HTTP post from Scala...

How to test if the case classes I have created for the parser are correct using json4s libraries in scala?

I have a huge json object and I need to parse it and then write some tests to see if everything goes as expected.
case class User(id: Identification, age:Int, name: String ...)
case class Identification(id: Int, hash: String ....)
... a lot more classes
Now I'm trying to write the tests
val json = parse(Source.fromFile(/path).getLines.mkString("\n"))
import org.json4s.DefaultFormats
implicit val formats = DefaultFormats
So my question is how can i test if the case classes are ok?
I thought maybe I should try to extract for ex. the users and then to check parameter by parameter if they are correct, but I don't thing that is a good way because it is not me who created the json so I'm not interested about the content.
Thanks
This is what I found working with JSON and case classes overt the years the minimum to test.
This three things should be tested always
Serialization with deserialiaztion combined
val example = MyCaseClass()
read[MyCaseClass](write(example)) should Equal example
Checks if a class can be transformed to JSON, read back and still has the same values. This one breaks more often than one would think.
Deserialization: JSON String -> CaseClasses
val exampleAsJSON : String
val exampleAsCaseClass : MyCaseClass
read(exampleAsJSON) shouldEqual exampleAsCaseClass
Checks if JSON still can be deserialized.
Serialization: CaseClasses -> JSON String
val exampleAsJSON : String
val exampleAsCaseClass : MyCaseClass
write(exampleAsCaseClass) shouldEqual exampleAsJSON
Checks if String/ JSON Representation stays stable. Here it is hard to keep the data up to date and often some not nice whitespace changes lead to false alarms.
Additional things to test
Are there optional parameters present? If yes all tests should be done with and without the optional parameters.

how to format scala's output from JSON to text file format

I am using Scala with Spark with below version.
Scala - 2.10.4
Spark - 1.2.0
I am mentioning below my situation.
I have a RDD(Say - JoinOp) with nested tuples(having case classes), for example -
(123,(null,employeeDetails(Smith,NY,DW)))
(456,(null,employeeDetails(John,IN,CS)))
This RDD is being created from a Join with two files.
Now, my requirement is to convert this JSON format to text file format without any "Null" and any case class name(here 'employeeDetails').
My desired output is =
123,Smith,NY,DW
456,John,IN,CS
I have tried with String Interpolation for the same but with partial success.
val textOp = JoinOp.map{jm => s"${jm._1},${jm._2._2}"}
if I print textOp then it will give me below output.
123,employeeDetails(Smith,NY,DW)
456,employeeDetails(John,IN,CS)
Now if I try to access nested elements in "employeeDetails" case class with String interpolation, it will throwing error like below.
JoinOp.map{jm => s"${jm._1},${jm._2._2._1}"}.foreach(println)
<console> :23: Error : value _1 is not member of jm
Here I can understand that, with the above syntax, it's unable to access nested element for "employeeDetails" case class.
What might be the solution for this issue. Any help or point forward would be of much help.
Many Thanks,
Pralay
Case classes have field names. So, instead of ._1 you need to use the field name for that position. Assuming the following definition:
case class EmployeeDetails(name: String, state: String)
you would access it
JoinOp.map{jm => s"${jm._1},${jm._2._2.name}"}.foreach(println)
If you just need to print all fields of case class, you may use productIterator to traverse field list.
val textOp = JoinOp.map { jm =>
s"""${jm._1},${jm._2._2.productIterator.mkString(",")}"""
}
You can do it like this:
case class EmployeeDetails(var0: String, var1: String, var2: String)
val data = List((123,(null, EmployeeDetails("Smith", "NY", "DW"))))
data.map {case (num, (sth, EmployeeDetails(var0, var1, var2))) =>
s"$num,$var0,$var1,$var2"}

Play Framework 2.2.1 / How should controller handle a queryString in JSON format

My client side executes a server call encompassing data (queryString) in a JSON object like this:
?q={"title":"Hello"} //non encoded for the sample but using JSON.stringify actually
What is an efficient way to retrieve the title and Hello String?
I tried this:
val params = request.queryString.map {case(k,v) => k->v.headOption}
that returns the Tuple: (q,Some({"title":"hello"}))
I could further extract to retrieve the values (although I would need to manually map the JSON object to a Scala object), but I wonder whether there is an easier and shorter way.
Any idea?
First, if you intend to pluck only the q parameter from a request and don't intend to do so via a route you could simply grab it directly:
val q: Option[String] = request.getQueryString("q")
Next you'd have to parse it as a JSON Object:
import play.api.libs.json._
val jsonObject: Option[JsValue] = q.map(raw: String => json.parse(raw))
with that you should be able to check for the components the jsonObject contains:
val title: Option[String] = jsonObject.flatMap { json: JsValue =>
(json \ "title").asOpt[String]
}
In short, omitting the types you could use a for comprehension for the whole thing like so:
val title = for {
q <- request.getQueryString("q")
json <- Try(Json.parse(q)).toOption
titleValue <- (json \ "title").asOpt[String]
} yield titleValue
Try is defined in scala.util and basically catches Exceptions and wraps it in a processable form.
I must admit that the last version simply ignores Exceptions during the parsing of the raw JSON String and treats them equally to "no title query has been set".
That's not the best way to know what actually went wrong.
In our productive code we're using implicit shortcuts that wraps a None and JsError as a Failure:
val title: Try[String] = for {
q <- request.getQueryString("q") orFailWithMessage "Query not defined"
json <- Try(Json.parse(q))
titleValue <- (json \ "title").validate[String].asTry
} yield titleValue
Staying in the Try monad we gain information about where it went wrong and can provide that to the User.
orFailWithMessage is basically an implicit wrapper for an Option that will transform it into Succcess or Failure with the specified message.
JsResult.asTry is also simply a pimped JsResult that will be Success or Failure as well.