Play JSON Parse and Extract Elements Without a Key Path

I have a JSON document that looks like this (and yes, it is valid JSON):
[
  2,
  "19223201",
  "BootNotification",
  {
    "reason": "PowerUp",
    "chargingStation": {
      "model": "SingleSocketCharger",
      "vendorName": "VendorX"
    }
  }
]
I'm using the Play framework's JSON library, and I would like to understand how I could parse the third element of this array and extract the BootNotification value as a String.
If it had a key, I could use that key to traverse the JSON and get the corresponding value, but that is not the case here. I also cannot load the document line by line and rely on the value sitting on line 3, as in the example above.
Any suggestions on how I could do this?

I think I have found a way after trying this out in Ammonite. Here is what I could do:
# val input: JsValue = Json.parse("""[2,"12345678","BNR",{"reason":"PowerUp"}]""")
input: JsValue = JsArray(ArrayBuffer(JsNumber(2), JsString("12345678"), JsString("BNR"), JsObject(Map("reason" -> JsString("PowerUp")))))
Parsing the JSON gives me an array, and since I know I always expect exactly 4 elements in it, looking up an element by its array index is what I need. To get the text at position 3 (index 2), I can do the following:
# (input \ 2)
res2: JsLookupResult = JsDefined(JsString("BNR"))
# (input \ 2).toOption
res3: Option[JsValue] = Some(JsString("BNR"))
# (input \ 2).toOption.isDefined
res4: Boolean = true
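One refinement: toOption still yields an Option[JsValue] rather than a String. The same lookup API also offers asOpt[String], which returns None when the element is missing or is not a JSON string. Continuing the session above, one would expect:
# (input \ 2).asOpt[String]
res5: Option[String] = Some("BNR")
# (input \ 2).asOpt[String].getOrElse("unknown") // fallback if the element is absent or not a string
res6: String = "BNR"
Unlike (input \ 2).get.as[String], this does not throw on malformed input.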

Related

How to dynamically reference items in a JSON struct using pyspark

I have a pyspark dataframe with a StringType column (edges) that contains a list of dictionaries (see example below). The dictionaries contain a mix of value types, including another dictionary (nodeIDs). I need to explode the top-level dictionaries in the edges field into rows; ideally, I should then be able to convert their component values into separate fields.
Input:
import findspark
findspark.init()

from pyspark.sql import Row, SparkSession

SPARK = SparkSession.builder.enableHiveSupport() \
    .getOrCreate()

data = [
    Row(trace_uuid='aaaa', timestamp='2019-05-20T10:36:33+02:00', edges='[{"distance":4.382441320292239,"duration":1.5,"speed":2.9,"nodeIDs":{"nodeA":954752475,"nodeB":1665827480}},{"distance":14.48582171131768,"duration":2.6,"speed":5.6,"nodeIDs":{"nodeA":1665827480,"nodeB":3559056131}}]', count=156, level=36),
    Row(trace_uuid='bbbb', timestamp='2019-05-20T11:36:10+03:00', edges='[{"distance":0,"duration":0,"speed":0,"nodeIDs":{"nodeA":520686131,"nodeB":520686216}},{"distance":8.654358326561642,"duration":3.1,"speed":2.8,"nodeIDs":{"nodeA":520686216,"nodeB":506361795}}]', count=179, level=258)
]

df = SPARK.createDataFrame(data)
Desired output:
data_reshaped = [
    Row(trace_uuid='aaaa', timestamp='2019-05-20T10:36:33+02:00', distance=4.382441320292239, duration=1.5, speed=2.9, nodeA=954752475, nodeB=1665827480, count=156, level=36),
    Row(trace_uuid='aaaa', timestamp='2019-05-20T10:36:33+02:00', distance=14.48582171131768, duration=2.6, speed=5.6, nodeA=1665827480, nodeB=3559056131, count=156, level=36),
    Row(trace_uuid='bbbb', timestamp='2019-05-20T11:36:10+03:00', distance=0, duration=0, speed=0, nodeA=520686131, nodeB=520686216, count=179, level=258),
    Row(trace_uuid='bbbb', timestamp='2019-05-20T11:36:10+03:00', distance=8.654358326561642, duration=3.1, speed=2.8, nodeA=520686216, nodeB=506361795, count=179, level=258)
]
Is there a way to do that? I've tried using cast to convert the edges field into an array first, but I can't figure out how to get it to work with the mixed data types.
I'm using Spark 2.4.0.
You can use from_json() with schema_of_json() to infer the JSON schema. For example:
from pyspark.sql import functions as F
# a sample json string:
edges_json_sample = data[0].edges
# or edges_json_sample = df.select('edges').first()[0]
>>> edges_json_sample
#'[{"distance":4.382441320292239,"duration":1.5,"speed":2.9,"nodeIDs":{"nodeA":954752475,"nodeB":1665827480}},{"distance":14.48582171131768,"duration":2.6,"speed":5.6,"nodeIDs":{"nodeA":1665827480,"nodeB":3559056131}}]'
# infer schema from the sample string
schema = df.select(F.schema_of_json(edges_json_sample)).first()[0]
>>> schema
#u'array<struct<distance:double,duration:double,nodeIDs:struct<nodeA:bigint,nodeB:bigint>,speed:double>>'
# convert json string to data structure and then retrieve desired items
new_df = df.withColumn('data', F.explode(F.from_json('edges', schema))) \
    .select('*', 'data.*', 'data.nodeIDs.*') \
    .drop('data', 'nodeIDs', 'edges')
>>> new_df.show()
+-----+-----+--------------------+----------+-----------------+--------+-----+----------+----------+
|count|level| timestamp|trace_uuid| distance|duration|speed| nodeA| nodeB|
+-----+-----+--------------------+----------+-----------------+--------+-----+----------+----------+
| 156| 36|2019-05-20T10:36:...| aaaa|4.382441320292239| 1.5| 2.9| 954752475|1665827480|
| 156| 36|2019-05-20T10:36:...| aaaa|14.48582171131768| 2.6| 5.6|1665827480|3559056131|
| 179| 258|2019-05-20T11:36:...| bbbb| 0.0| 0.0| 0.0| 520686131| 520686216|
| 179| 258|2019-05-20T11:36:...| bbbb|8.654358326561642| 3.1| 2.8| 520686216| 506361795|
+-----+-----+--------------------+----------+-----------------+--------+-----+----------+----------+
# expected result
data_reshaped = new_df.rdd.collect()
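If you would rather not rely on schema inference (for example, when the sample row might not contain every field), the inferred schema can also be written out explicitly and passed to from_json() instead. A sketch that mirrors the inferred schema string shown above (field names and types are copied from it, nothing new):
from pyspark.sql import functions as F
from pyspark.sql import types as T

# explicit equivalent of the inferred schema:
# array<struct<distance:double,duration:double,nodeIDs:struct<nodeA:bigint,nodeB:bigint>,speed:double>>
edges_schema = T.ArrayType(T.StructType([
    T.StructField('distance', T.DoubleType()),
    T.StructField('duration', T.DoubleType()),
    T.StructField('nodeIDs', T.StructType([
        T.StructField('nodeA', T.LongType()),
        T.StructField('nodeB', T.LongType())
    ])),
    T.StructField('speed', T.DoubleType())
]))

new_df = df.withColumn('data', F.explode(F.from_json('edges', edges_schema))) \
    .select('*', 'data.*', 'data.nodeIDs.*') \
    .drop('data', 'nodeIDs', 'edges')
An explicit schema also guards against inference picking up a narrower type than the data actually contains.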

How to parse a string to key value pair using regex?

What is the best way to parse the string into key-value pairs using regex?
Sample input:
application="fre" category="MessagingEvent" messagingEventType="MessageReceived"
Expected output:
application "fre"
Category "MessagingEvent"
messagingEventType "MessageReceived"
We have already tried the following regex, and it works:
application=(?<application>(...)*) *category=(?<Category>\S*) *messagingEventType=(?<messagingEventType>\S*)
But we want a generic regex that will parse the sample input into key-value pairs as in the expected output.
Any idea or solution will be helpful.
input = 'application="fre" category="MessagingEvent" messagingEventType="MessageReceived"'
puts input.
  scan(/(\w+)="([^"]+)"/).                      # scan for KV-pairs
  map { |k, v| %Q|#{k.ljust(30, ' ')}"#{v}"| }. # adjust as you requested
  join($/)                                      # join with platform-dependent line delimiters
#⇒ application                   "fre"
#  category                      "MessagingEvent"
#  messagingEventType            "MessageReceived"
Instead of using a regex, it can also be done by splitting the string and storing the pieces in a hash, like below:
input = 'application="fre" category="MessagingEvent" messagingEventType="MessageReceived"'
res = {}
input.split.each { |str| a, b = str.split('='); res[a] = b }
puts res
==> {"application"=>"\"fre\"", "category"=>"\"MessagingEvent\"", "messagingEventType"=>"\"MessageReceived\""}

Python Json dumps not printing values

I am trying out a simple program
import json

class unified_response():
    trinitiversion = "3"
    preprocess = []

if __name__ == '__main__':
    ur = unified_response()
    preprocessValDict = dict()
    preprocessValDict["input"] = "some string"
    preprocessValDict["correct"] = " correct some string"
    ur.preprocess.append(preprocessValDict)
    s = json.dumps(unified_response.__dict__)
    print s
    s = json.dumps(ur.__dict__)
    print s
The first print statement prints:
{"preprocess": [{"input": "some string", "correct": " correct some string"}], "trinitiversion": "3", "__module__": "__main__", "__doc__": null}
The second print statement prints:
{}
Why is the second object not printing any values?
This is not related to the json module at all.
ur.__dict__ is an empty dictionary because only instance attributes are stored in the instance.
The unified_response class only has class attributes, hence ur.__dict__ is an empty dict, which json.dumps renders as an empty JSON object ({}).
Compare the outputs of print unified_response.__dict__ and print ur.__dict__.
As a side note:
ur.preprocess.append(preprocessValDict)
Accessing (and especially modifying) class attributes through an instance is considered bad practice, as it can lead to hard-to-find bugs.
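If the goal is for json.dumps(ur.__dict__) to include those values, the usual fix is to assign the attributes in __init__ so they become instance attributes. A minimal sketch, reusing the names from the question (the print() call is written to work in both Python 2 and 3):
import json

class unified_response(object):
    def __init__(self):
        # assigned on self, so these become *instance* attributes
        # and therefore show up in ur.__dict__
        self.trinitiversion = "3"
        self.preprocess = []

ur = unified_response()
ur.preprocess.append({"input": "some string", "correct": " correct some string"})
print(json.dumps(ur.__dict__))
# {"preprocess": [{"input": "some string", "correct": " correct some string"}], "trinitiversion": "3"}
# (key order may vary)
This also sidesteps the shared-state pitfall noted above: each instance now gets its own preprocess list instead of all instances appending to a single class-level list.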

Play JSON: Reading and validating a JsObject with unknown keys

I'm reading a nested JSON document using several Reads[T] implementations, however, I'm stuck with the following sub-object:
{
  ...,
  "attributes": {
    "keyA": [1.68, 5.47, 3.57],
    "keyB": [true],
    "keyC": ["Lorem", "Ipsum"]
  },
  ...
}
The keys ("keyA", "keyB"...) as well as the amount of keys are not known at compile time and can vary. The values of the keys are always JsArray instances, but of different size and type (however, all elements of a particular array must have the same JsValue type).
The Scala representation of one single attribute:
case class Attribute[A](name: String, values: Seq[A])
// 'A' can only be String, Boolean or Double
The goal is to create a Reads[Seq[Attribute]] that can be used for the "attributes" field when transforming the whole document (remember, "attributes" is just a sub-document).
Then there is a simple map that contains the allowed combinations of keys and array types, which should be used to validate the attributes. Edit: This map is specific to each request (or rather, to each type of JSON document), but you can assume that it is always available in scope.
val required = Map(
  "keyA" -> "Double",
  "keyB" -> "String",
  "keyD" -> "String"
)
So in the case of the JSON shown above, the Reads should create two errors:
"keyB" does exist, but has the wrong type (expected String, was boolean).
"keyD" is missing (whereas keyC is not needed and can be ignored).
I'm having trouble creating the necessary Reads. The first thing I tried, from the perspective of the outer Reads:
...
(__ \ "attributes").reads[Map[String, JsArray]]...
...
I thought this was a nice first step, because if the JSON structure is not an object containing String/JsArray key-value pairs, the Reads fails with proper error messages. It works, but I don't know how to go on from there. Of course I could just write a method that transforms the Map into a Seq[Attribute], but that method would somehow have to return a JsResult, since there are further validations to do.
The second thing I tried:
val attributeSeqReads = new Reads[Seq[Attribute]] {
  def reads(json: JsValue) = json match {
    case JsObject(fields) => processAttributes(fields)
    case _                => JsError("attributes not an object")
  }

  def processAttributes(fields: Map[String, JsValue]): JsResult[Seq[Attribute]] = {
    // ...
  }
}
The idea was to validate each element of the map manually within processAttributes. But I think this is too complicated. Any help is appreciated.
Edit for clarification:
At the beginning of the post I said that the keys (keyA, keyB, ...) are unknown at compile time. Later on I said that those keys are part of the map required, which is used for validation. This sounds like a contradiction, but the thing is: required is specific to each document/request and is likewise not known at compile time. You don't need to worry about that, though; just assume that for every request the correct required is already available in scope.
I think you are overcomplicating the task.
The keys ("keyA", "keyB", ...) as well as the number of keys are not known at compile time and can vary
So the number of keys and their types are, in fact, known in advance and fixed?
So in the case of the JSON shown above, the Reads should create two errors:
"keyB" does exist, but has the wrong type (expected String, was boolean).
"keyD" is missing (whereas keyC is not needed and can be ignored).
So your main task is just to check availability and type compliance?
You may implement a Reads[Attribute] for each key with Reads.list(Reads.of[A]) (this Reads checks both the element type and that the key is present), and handle omitted, non-required keys with an or Reads.pure(...) fallback. Then convert the tuple to a list (_.productIterator.toList) and you get your Seq[Attribute]:
import play.api.libs.json._
import play.api.libs.json.Reads._
import play.api.libs.functional.syntax._

val r = (
  (__ \ "attributes" \ "keyA").read[Attribute[Double]](list(of[Double]).map(Attribute("keyA", _))) and
  (__ \ "attributes" \ "keyB").read[Attribute[Boolean]](list(of[Boolean]).map(Attribute("keyB", _))) and
  ((__ \ "attributes" \ "keyC").read[Attribute[String]](list(of[String]).map(Attribute("keyC", _))) or
    Reads.pure(Attribute[String]("keyC", List()))) and
  (__ \ "attributes" \ "keyD").read[Attribute[String]](list(of[String]).map(Attribute("keyD", _)))
).tupled.map(_.productIterator.toList)
scala> json1: play.api.libs.json.JsValue = {"attributes":{"keyA":[1.68,5.47,3.57],"keyB":[true],"keyD":["Lorem","Ipsum"]}}
scala> res37: play.api.libs.json.JsResult[List[Any]] = JsSuccess(List(Attribute(keyA,List(1.68, 5.47, 3.57)), Attribute(keyB,List(true)), Attribute(keyC,List()), Attribute(keyD,List(Lorem, Ipsum))),)
scala> json2: play.api.libs.json.JsValue = {"attributes":{"keyA":[1.68,5.47,3.57],"keyB":[true],"keyC":["Lorem","Ipsum"]}}
scala> res38: play.api.libs.json.JsResult[List[Any]] = JsError(List((/attributes/keyD,List(ValidationError(List(error.path.missing),WrappedArray())))))
scala> json3: play.api.libs.json.JsValue = {"attributes":{"keyA":[1.68,5.47,3.57],"keyB":["Lorem"],"keyC":["Lorem","Ipsum"]}}
scala> res42: play.api.libs.json.JsResult[List[Any]] = JsError(List((/attributes/keyD,List(ValidationError(List(error.path.missing),WrappedArray()))), (/attributes/keyB(0),List(ValidationError(List(error.expected.jsboolean),WrappedArray())))))
If you have more than 22 attributes, you will run into another problem: tuples are limited to 22 elements.
For dynamic properties at runtime (inspired by Reads.traversableReads[F[_], A]):
def attributesReads(required: Map[String, String]) = Reads { json =>
  type Errors = Seq[(JsPath, Seq[ValidationError])]

  def locate(e: Errors, idx: Int) = e.map { case (p, valerr) => JsPath(idx) ++ p -> valerr }

  required.map {
    case (key, "Double")  => (__ \ key).read[Attribute[Double]](list(of[Double]).map(Attribute(key, _))).reads(json)
    case (key, "String")  => (__ \ key).read[Attribute[String]](list(of[String]).map(Attribute(key, _))).reads(json)
    case (key, "Boolean") => (__ \ key).read[Attribute[Boolean]](list(of[Boolean]).map(Attribute(key, _))).reads(json)
    case _                => JsError("")
  }.iterator.zipWithIndex.foldLeft(Right(Vector.empty): Either[Errors, Vector[Attribute[_ >: Double with String with Boolean]]]) {
    case (Right(vs), (JsSuccess(v, _), _)) => Right(vs :+ v)
    case (Right(_), (JsError(e), idx))     => Left(locate(e, idx))
    case (Left(e), (_: JsSuccess[_], _))   => Left(e)
    case (Left(e1), (JsError(e2), idx))    => Left(e1 ++ locate(e2, idx))
  }.fold(JsError.apply, res => JsSuccess(res.toList))
}
(__ \ "attributes").read(attributesReads(Map("keyA" -> "Double"))).reads(json)
scala> json: play.api.libs.json.JsValue = {"attributes":{"keyA":[1.68,5.47,3.57],"keyB":[true],"keyD":["Lorem","Ipsum"]}}
scala> res0: play.api.libs.json.JsResult[List[Attribute[_ >: Double with String with Boolean]]] = JsSuccess(List(Attribute(keyA,List(1.68, 5.47, 3.57))),/attributes)

create empty object (empty braces) with toJSON

I need to create a JSON string from R using toJSON. My issue is that part of the JSON should contain an empty JSON object {}. I thought list() would do it for me:
> fromJSON("{}")
list()
> toJSON(list())
[1] "[]"
[Scratches head]
Anybody know how to get a {} using toJSON? I am using a lib that does the encoding, so answers that do not use toJSON will not help me.
Thanks!
There are a number of packages that have toJSON and fromJSON functions.
Using rjson::fromJSON, '{}' is read in as an (unnamed) list of length 0, whereas RJSONIO::fromJSON reads it in as a named list of length 0.
In either package, calling toJSON on a named list of length 0 will produce '{}'.
Clearly, RJSONIO is performing as you want out of the box:
RJSONIO::toJSON(RJSONIO::fromJSON('{}'))
## [1] "{}"
rjson::toJSON(rjson::fromJSON('{}'))
## [1] "[]"
If you use rjson, then you will have to manually set the names on the list of length 0:
rjson::toJSON(setNames(rjson::fromJSON('{}'), character(0)))
## [1] "{}"