How to code JSON tree pruning in Python 3?

The structure of the JSON tree is known. However, how do we prune the JSON tree in Python 3?
I have been trying to create a medical file format for patients. Each JSON object is a case or detail about a patient.
I tried linearizing the JSON and counting the levels, but the code quickly becomes untenable. I also looked at binary trees, but this is not a binary tree. I attempted to treat each JSON object as an atom, which would be a form of pointer; however, Python does not have pointers.
Examples:
insert / replace json into 0.1.2
delete json at 0.1.1.3
extract json at 0.1.1.1 // may be sub-tree
{ // 0
  "field1": "value1",
  "field2": { // 0.0
    "field3": "val3",
    "field4": "val4"
  }
}
For example, I want to remove 0.0:
{ // 0
  "field1": "value1",
  // removed
}
To insert 0.1:
{ // 0
  "field1": "value1",
  "field2": { // 0.0
    "field3": "val3",
    "field4": "val4"
  },
  "field2x": { // 0.1
    "field3x": "val3x",
    "field4x": "val4x"
  }
}
0.1 must be given:
"field2x": { // 0.1
"field3x": "val3x",
"field4x": "val4x"
}
Now I want to insert 0.1.0:
"field2xx": { // 0.1.0
"field3xx": "val3xx",
"field4xx": "val4xx"
}
{ // 0
  "field1": "value1",
  "field2": { // 0.0
    "field3": "val3",
    "field4": "val4"
  },
  "field2x": { // 0.1
    "field3x": "val3x",
    "field4x": "val4x",
    "field2xx": { // 0.1.0
      "field3xx": "val3xx",
      "field4xx": "val4xx"
    }
  }
}
Now I want to extract 0.1, which should give me:
"field2x": { // 0.1
"field3x": "val3x",
"field4x": "val4x"
"field2xx": { // 0.1.0
"field3xx": "val3xx",
"field4xx": "val4xx"
}
}
leaving:
{ // 0
  "field1": "value1",
  "field2": { // 0.0
    "field3": "val3",
    "field4": "val4"
  }
  // removed 0.1
}

I would highly recommend not attempting to use indices to find a field in a dictionary. With the way JSON usually maps to dictionaries/maps in a programming language, you generally cannot guarantee that the ordering of the keys is preserved. That said, dictionaries do preserve insertion order as of Python 3.7, and the json module keeps the key order of the document by default, so depending on your specific version it may work; you can check the documentation at https://docs.python.org/3.10/library/json.html#json.dump
If you really need this kind of access and these operations, then given a dictionary d you can find the key at index i using list(d.keys())[i], and its value using list(d.values())[i]. With that, you can parse your input parameters, crawl to the point in the document where you need to make your operation, and perform it.
Again, I highly, highly advise against this approach, as you want to use arrays instead of objects/dictionaries/maps if ordering is important. But if you really have no control over the input format, and you can guarantee that key ordering is preserved, then the above will work.
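For what it's worth, here is a rough sketch of what that index-based access could look like (my own illustration, not tested against your full format; it assumes Python 3.7+ so key order is preserved, and it reads paths relative to the root object, i.e. without the leading 0 from your notation):
import json

def get_at(doc, path):
    # Follow a dotted index path such as "1.0" through nested dicts.
    node = doc
    for i in (int(part) for part in path.split(".")):
        node = node[list(node.keys())[i]]
    return node

def delete_at(doc, path):
    # Remove (and return) the member addressed by the path from its parent dict.
    *parents, last = [int(part) for part in path.split(".")]
    node = doc
    for i in parents:
        node = node[list(node.keys())[i]]
    key = list(node.keys())[last]
    return {key: node.pop(key)}

doc = json.loads('{"field1": "value1", "field2": {"field3": "val3", "field4": "val4"}}')
print(get_at(doc, "1.0"))   # val3
print(delete_at(doc, "1"))  # {'field2': {'field3': 'val3', 'field4': 'val4'}}
print(doc)                  # {'field1': 'value1'}
Insertion at a given position would need the parent dict rebuilt; the other answers below show ways to do that.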

json.load() and json.loads() in the standard library take an object_pairs_hook parameter that lets you create custom objects from the JSON source.
You want a dict that lets you access items by index as well as by key. So the strategy is to provide a mapping class that lets you access the items either way. Then provide that class as the object_pairs_hook argument.
There is probably a library that does this, but my Google-fu is off this morning and I couldn't find one. So I whipped this up. Basically the class keeps an internal list of keys by index as well as a regular dict. The dunder methods keep the list and dict in sync.
import json
from collections.abc import MutableMapping


class IndexableDict(MutableMapping):
    def __init__(self, *args, **kwds):
        self.key_from_index = []
        self.data = {}
        if args:
            for key, value in args:
                self.__setitem__(key, value)
        if kwds:
            for key, value in kwds.items():
                self.__setitem__(key, value)

    def __setitem__(self, key_or_index, value):
        if isinstance(key_or_index, (tuple, list)):
            obj = self
            for item in key_or_index[:-1]:
                obj = obj[item]
            obj[key_or_index[-1]] = value
        elif isinstance(key_or_index, int):
            key = self.key_from_index[key_or_index]
            self.data[key] = value
        elif isinstance(key_or_index, str):
            if key_or_index not in self.data:
                self.key_from_index.append(key_or_index)
            self.data[key_or_index] = value
        else:
            raise ValueError(f"Unknown type of key '{key_or_index}'")

    def __getitem__(self, key_or_index):
        if isinstance(key_or_index, (tuple, list)):
            obj = self
            for item in key_or_index:
                obj = obj[item]
            return obj
        elif isinstance(key_or_index, int):
            key = self.key_from_index[key_or_index]
            return self.data[key]
        elif isinstance(key_or_index, str):
            return self.data[key_or_index]
        else:
            raise ValueError(f"Unknown type of key '{key_or_index}'")

    def __delitem__(self, key_or_index):
        if isinstance(key_or_index, (tuple, list)):
            obj = self
            for item in key_or_index[:-1]:
                obj = obj[item]
            del obj[key_or_index[-1]]
        elif isinstance(key_or_index, int):
            key = self.key_from_index[key_or_index]
            del self.data[key]
            del self.key_from_index[key_or_index]
        elif isinstance(key_or_index, str):
            index = self.key_from_index.index(key_or_index)  # list.index(), lists have no find()
            del self.key_from_index[index]
            del self.data[key_or_index]
        else:
            raise ValueError(f"Unknown type of key '{key_or_index}'")

    def __iter__(self):
        # Yield keys, as the Mapping protocol expects.
        yield from self.data

    def __len__(self):
        return len(self.data)

    def __repr__(self):
        s = ', '.join(f'{k}={repr(v)}' for k, v in self.data.items())
        if len(s) > 50:
            s = s[:47] + '...'
        return f'<IndexableDict({s})>'
It can be used like this:
data = """{"zero":0, "one":{"a":1, "b":2}, "two":[3, 4, 5]}"""
def object_pairs_hook(pairs):
return IndexableDict(*pairs)
dd = json.loads(data, object_pairs_hook=object_pairs_hook)
print(dd[0], dd['zero']) # get values by index or key
print(dd[(1,0)]) # get values by a list or tuple of keys
# equivalent to dd[1][0]
print(dd[(2,1)])
dd[['two', 1]] = 42 # sequence works to set a value too
print(dd[(2,1)])
Prints:
0 0
1
4
42
No time to do an insert(), but it should be similar to __setitem__(). It has not been tested much, so there may be some bugs. It could also use some refactoring.
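If someone wants to try it, an insert() along those lines might look like this (an untested sketch on top of the class above; note that self.data itself keeps plain insertion order, so only index-based access reflects the chosen position):
    def insert(self, index, key, value):
        # Sketch: register the key at a chosen position; refuse keys that already exist.
        if key in self.data:
            raise KeyError(f"key '{key}' already present")
        self.key_from_index.insert(index, key)
        self.data[key] = value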

I second the people saying that indexing a dictionary by position is not the natural way, but it is possible since Python 3.7, where insertion ordering of dicts became a guaranteed language feature.
This is my working example. The indices are different from your schematic, but it made more sense to me to index it that way. It recursively traverses the data by the given indices and then, depending on the operation, removes, inserts or returns the nested data.
The insertion of data makes use of the mentioned insertion ordering in Python:
data.update(dict(**data_insert, **after))
It leaves the data before the insertion as is (so it is older and thus stays in front),
then it adds the inserted data,
and last the data after the inserted data (making it the newest and thus at the back).
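As a standalone illustration of that insertion-order behaviour (assuming Python 3.7+; the names are made up for the demo), rebuilding a dict lets you place a new pair at a chosen position:
d = {"field1": "value1", "field2": {"field3": "val3"}}
items = list(d.items())
# Everything before the insert position, then the new pair, then the rest.
rebuilt = {**dict(items[:1]), "field1b": "inserted", **dict(items[1:])}
print(list(rebuilt))  # ['field1', 'field1b', 'field2']
The full example: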
from copy import deepcopy
import itertools
import json


def traverse(data, index_list):
    # Read the path left to right: the first index addresses the outermost level.
    index = index_list.pop(0)
    if index_list:
        nested_data = list(data.values())[index]
        return traverse(nested_data, index_list)
    return data, index


def insert(data, data_insert, index_list):
    data, index = traverse(data, index_list)
    # Items that currently sit before the insert position (may be empty).
    after = dict(itertools.islice(data.items(), index))
    data.update(dict(**data_insert, **after))


def remove(data, index_list):
    key, data = get(data, index_list)
    return {key: data.pop(key)}


def get(data, index_list):
    data, index = traverse(data, index_list)
    key = list(data.keys())[index]
    return key, data


def run_example(example_name, json_in, index_str, operation, data_insert=None):
    print("-" * 40 + f"\n{example_name}")
    print(f"json before {operation} at {index_str}:")
    print(json.dumps(json_in, indent=2, sort_keys=False))
    index_list = [int(idx_char) for idx_char in index_str.split(".")]
    if operation == "insert":
        json_out = insert(json_in, data_insert, index_list)
    elif operation == "remove":
        json_out = remove(json_in, index_list)
    elif operation == "get":
        key, data = get(json_in, index_list)
        json_out = {key: data[key]}
    else:
        raise NotImplementedError("Not a valid operation")
    print("json after:")
    print(json.dumps(json_in, indent=2, sort_keys=False))
    print("json returned:")
    print(json.dumps(json_out, indent=2, sort_keys=False))


json_data = {
    "field1": "value1",
    "field2": {
        "field3": "val3",
        "field4": "val4"
    }
}

run_example("example 1", deepcopy(json_data), "1", "remove")
run_example("example 2", json_data, "2", "insert", {"field2x": {"field3x": "val3x", "field4x": "val4x"}})
run_example("example 3", json_data, "2", "get")
run_example("example 4", json_data, "2.2", "insert", {"field2xx": {"field3xx": "val3xx", "field4xx": "val4xx"}})
run_example("example 5", json_data, "2", "remove")
This gives the following output:
----------------------------------------
example 1
json before remove at 1:
{
  "field1": "value1",
  "field2": {
    "field3": "val3",
    "field4": "val4"
  }
}
json after:
{
  "field1": "value1"
}
json returned:
{
  "field2": {
    "field3": "val3",
    "field4": "val4"
  }
}
----------------------------------------
example 2
json before insert at 2:
{
  "field1": "value1",
  "field2": {
    "field3": "val3",
    "field4": "val4"
  }
}
json after:
{
  "field1": "value1",
  "field2": {
    "field3": "val3",
    "field4": "val4"
  },
  "field2x": {
    "field3x": "val3x",
    "field4x": "val4x"
  }
}
json returned:
null
----------------------------------------
example 3
json before get at 2:
{
  "field1": "value1",
  "field2": {
    "field3": "val3",
    "field4": "val4"
  },
  "field2x": {
    "field3x": "val3x",
    "field4x": "val4x"
  }
}
json after:
{
  "field1": "value1",
  "field2": {
    "field3": "val3",
    "field4": "val4"
  },
  "field2x": {
    "field3x": "val3x",
    "field4x": "val4x"
  }
}
json returned:
{
  "field2x": {
    "field3x": "val3x",
    "field4x": "val4x"
  }
}
----------------------------------------
example 4
json before insert at 2.2:
{
  "field1": "value1",
  "field2": {
    "field3": "val3",
    "field4": "val4"
  },
  "field2x": {
    "field3x": "val3x",
    "field4x": "val4x"
  }
}
json after:
{
  "field1": "value1",
  "field2": {
    "field3": "val3",
    "field4": "val4"
  },
  "field2x": {
    "field3x": "val3x",
    "field4x": "val4x",
    "field2xx": {
      "field3xx": "val3xx",
      "field4xx": "val4xx"
    }
  }
}
json returned:
null
----------------------------------------
example 5
json before remove at 2:
{
  "field1": "value1",
  "field2": {
    "field3": "val3",
    "field4": "val4"
  },
  "field2x": {
    "field3x": "val3x",
    "field4x": "val4x",
    "field2xx": {
      "field3xx": "val3xx",
      "field4xx": "val4xx"
    }
  }
}
json after:
{
  "field1": "value1",
  "field2": {
    "field3": "val3",
    "field4": "val4"
  }
}
json returned:
{
  "field2x": {
    "field3x": "val3x",
    "field4x": "val4x",
    "field2xx": {
      "field3xx": "val3xx",
      "field4xx": "val4xx"
    }
  }
}

Related

Scala: manipulate a JSON object

I have a dynamic JSON object generated in a certain format, and I would like to manipulate that object to map it to another format in Scala.
The problem is that the names of the fields are dynamic, so "field1" and "field2" can be anything.
Is there any way to do it dynamically in Scala?
The original object:
{
  "field1": {
    "value": "some value 1",
    "description": "some test description",
    ...
  },
  "field2": {
    "value": "some value 2",
    "description": "some test description",
    ...
  }
}
And I'd like to convert it to something like:
{
  "field1": "some value 1",
  "field2": "some value 2"
}
You can collect all the keys and then check whether downField("value") exists:
import io.circe._
import io.circe.literal.JsonStringContext

object CirceFieldsToMap {
  def main(args: Array[String]): Unit = {
    val json: Json =
      json"""{
        "field1": {
          "foo" : "bar1",
          "value" : "foobar1"
        },
        "field2": {
          "foo" : "bar2",
          "value" : "foobar2"
        },
        "field3": {
          "foo" : "bar2"
        }
      }"""

    implicit val decodeFoo = new Decoder[Map[String, Option[String]]] {
      final def apply(c: HCursor): Decoder.Result[Map[String, Option[String]]] = {
        val result = c.keys.get // you should handle the .get properly (if None it breaks)
          .toList
          .foldLeft(List.empty[(String, Option[String])]) { case (acc, key) =>
            acc :+ (key, c.downField(key).downField("value").as[String].toOption)
          }
        Right(result.toMap)
      }
    }

    val myField01 = json.as[Map[String, Option[String]]]
    println(myField01) // Right(Map(field1 -> Some(foobar1), field2 -> Some(foobar2), field3 -> None))
  }
}

Nested JSON: extract a value with an unknown key in the middle

I have a JSON column (colJson) in a DataFrame like this:
{
  "a": "value1",
  "b": "value1",
  "c": true,
  "details": {
    "qgiejfkfk123": { // unknown value
      "model1": {
        "score": 0.531,
        "version": "v1"
      },
      "model2": {
        "score": 0.840,
        "version": "v2"
      },
      "other_details": {
        "decision": false,
        "version": "v1"
      }
    }
  }
}
Here 'qgiejfkfk123' is a dynamic value that changes with each row. However, I need to extract model1.score as well as model2.score.
I tried:
sourceDf.withColumn("model1_score",get_json_object(col("colJson"), "$.details.*.model1.score").cast(DoubleType))
.withColumn("model2_score",get_json_object(col("colJson"), "$.details.*.model2.score").cast(DoubleType))
but it did not work.
I managed to get to your solution by using from_json, parsing the dynamic field as a Map<String, Struct> and exploding the values from it:
val schema = "STRUCT<`details`: MAP<STRING, STRUCT<`model1`: STRUCT<`score`: DOUBLE, `version`: STRING>, `model2`: STRUCT<`score`: DOUBLE, `version`: STRING>, `other_details`: STRUCT<`decision`: BOOLEAN, `version`: STRING>>>>"
val fromJsonDf = sourceDf.withColumn("colJson", from_json(col("colJson"), lit(schema)))
val explodeDf = fromJsonDf.select($"*", explode(col("colJson.details")))
// +----------------------------------------------------------+------------+--------------------------------------+
// |colJson |key |value |
// +----------------------------------------------------------+------------+--------------------------------------+
// |{{qgiejfkfk123 -> {{0.531, v1}, {0.84, v2}, {false, v1}}}}|qgiejfkfk123|{{0.531, v1}, {0.84, v2}, {false, v1}}|
// +----------------------------------------------------------+------------+--------------------------------------+
val finalDf = explodeDf.select(col("value.model1.score").as("model1_score"), col("value.model2.score").as("model2_score"))
// +------------+------------+
// |model1_score|model2_score|
// +------------+------------+
// | 0.531| 0.84|
// +------------+------------+

JSON object null filtering in Scala

I am using play.api.libs.json in Scala (2.12.8) to process some JSON objects. For example, I have a JSON string that looks like:
{
  "field1": null,
  "field2": 23,
  "field3": {
    "subfield1": "a",
    "subfield2": null
  },
  "field4": {
    "subfield1": true,
    "subfield2": {
      "subsubfield1": null,
      "subsubfield2": "45"
    },
    "field5": 3
  }
}
And I want to filter out every null fields or subfields.
As explained here: Play: How to remove the fields without value from JSON and create a new JSON with them
Doing:
import play.api.libs.json.{ JsNull, JsObject, JsValue, Json }

val j = Json.parse(myJsonString).as[JsObject]
JsObject(j.fields.filterNot(t => withoutValue(t._2)))

def withoutValue(v: JsValue) = v match {
  case JsNull => true
  case _ => false
}
helps me remove the top-level null fields: in my case, field1.
But field3.subfield2 and field4.subfield2.subsubfield1 are still present, and I want to remove them too. I should also mention that all the subfields of a field could be null at once; should this happen, I think we could just remove the upper-level field. If field3.subfield1 and field3.subfield2 are both null, we can remove field3.
Any idea on how to do this neatly in Scala?
PS: the desired output is:
{
  "field2": 23,
  "field3": {
    "subfield1": "a"
  },
  "field4": {
    "subfield1": true,
    "subfield2": {
      "subsubfield2": "45"
    },
    "field5": 3
  }
}
You need to do a recursive solution. For example:
def removeNulls(jsObject: JsObject): JsValue = {
  JsObject(jsObject.fields.collect {
    case (s, j: JsObject) =>
      (s, removeNulls(j))
    case other if (other._2 != JsNull) =>
      other
  })
}
Code run at Scastie. Output is as expected.

How do I get "keys" from an unkown .json saved in my computer using groovy

My end goal is to parse through an unknown .json file stored on my laptop and get the key names (not the values, only the key names) using Groovy in SoapUI.
I want to parse an unknown JSON file stored on my computer and get its keys (the names of the keys, not the values). I can do two things separately:
I am able to read the local JSON using the following code I found online:
def JSON_URL = "file:///C:/Users/xxx/example.json"
URL url = new URL(JSON_URL)
InputStream urlStream = null
try {
    urlStream = url.openStream()
    BufferedReader reader = new BufferedReader(new InputStreamReader(urlStream))
    JsonSlurper js1 = new JsonSlurper()
    Object result = js1.parse(reader)
    log.info "==> readJSONfile2 result of read: " + result
} catch (Exception e) {
    log.info e
}
And if I have the URL, I am able to parse it and get keys like so:
// getting the response
def resp1 = context.expand('${testStepName#response}')
// parsing the set context
def js1 = new JsonSlurper().parseText(resp1)
def keys = js1.entrySet() as List
log.info "==> runTreatmentPlan keys list is: "+keys
log.info "==> runTreatmentPlan keys size is: "+keys.size()
But I am unable to get the keys if the JSON is local to my machine, i.e. I am unable to combine both pieces of code. I get an error when I do:
Object result = js1.parseText(reader)
I am new to Groovy, SoapUI and JSON - a total newbie, and this is my first question. I am really scared because I have seen that some people are kinda rough towards newbies if the question is basic. I promise I did google a lot, and I am sure experienced people might find my question stupid as well, but I am really stuck. I am unable to combine both pieces of code. I kinda feel that I will somehow have to use the response from the first piece of code, but I don't know how.
Can someone help me please?
==== Updating with the JSON structure:
{
  "Key0": [
    {
      "Key1": "Value1",
      "Key2": "Value2",
      "Key3": "Value3",
      "Key4": {
        "subKey1": "subValue1"
      },
      "Key5": {
        "subKey1": "subValue1",
        "subKey2": "subValue2",
        "subKey3": "subValue3",
        "subKey4": "subValue4"
      },
      "Key6": "2016-07-11T17:52:59.000Z",
      "Key7": [
        {
          "subKey1": "subValue1",
          "subKey2": "subValue2"
        },
        {
          "subKey1": "subValue1",
          "subKey2": "subValue2"
        },
        {
          "subKey1": "subValue1",
          "subKey2": "subValue2"
        },
        {
          "subKey1": "subValue1",
          "subKey2": "subValue2"
        }
      ]
    },
    {
      "Key1": "Value1",
      "Key2": "Value2",
      "Key3": "Value3",
      "Key4": {
        "subKey1": "subValue1"
      },
      "Key5": {
        "subKey1": "subValue1",
        "subKey2": "subValue2",
        "subKey3": "subValue3",
        "subKey4": "subValue4"
      },
      "Key6": "2016-07-11T17:52:59.000Z",
      "Key7": [
        {
          "subKey1": "subValue1",
          "subKey2": "subValue2"
        },
        {
          "subKey1": "subValue1",
          "subKey2": "subValue2"
        },
        {
          "subKey1": "subValue1",
          "subKey2": "subValue2"
        },
        {
          "subKey1": "subValue1",
          "subKey2": "subValue2"
        }
      ]
    }
  ]
}
Given your input, this code:
import groovy.json.JsonSlurper

def traverse
traverse = { tree, keys = [], prefix = '' ->
    switch (tree) {
        case Map:
            tree.each { k, v ->
                def name = prefix ? "${prefix}.${k}" : k
                keys << name
                traverse(v, keys, name)
            }
            return keys
        case Collection:
            tree.eachWithIndex { e, i -> traverse(e, keys, "${prefix}[$i]") }
            return keys
        default:
            return keys
    }
}

def content = new JsonSlurper().parse(new File('sample.json'))
traverse(content).each { println it }
Yields this output:
Key0
Key0[0].Key1
Key0[0].Key2
Key0[0].Key3
Key0[0].Key4
Key0[0].Key4.subKey1
Key0[0].Key5
Key0[0].Key5.subKey1
Key0[0].Key5.subKey2
Key0[0].Key5.subKey3
Key0[0].Key5.subKey4
Key0[0].Key6
Key0[0].Key7
Key0[0].Key7[0].subKey1
Key0[0].Key7[0].subKey2
Key0[0].Key7[1].subKey1
Key0[0].Key7[1].subKey2
Key0[0].Key7[2].subKey1
Key0[0].Key7[2].subKey2
Key0[0].Key7[3].subKey1
Key0[0].Key7[3].subKey2
Key0[1].Key1
Key0[1].Key2
Key0[1].Key3
Key0[1].Key4
Key0[1].Key4.subKey1
Key0[1].Key5
Key0[1].Key5.subKey1
Key0[1].Key5.subKey2
Key0[1].Key5.subKey3
Key0[1].Key5.subKey4
Key0[1].Key6
Key0[1].Key7
Key0[1].Key7[0].subKey1
Key0[1].Key7[0].subKey2
Key0[1].Key7[1].subKey1
Key0[1].Key7[1].subKey2
Key0[1].Key7[2].subKey1
Key0[1].Key7[2].subKey2
Key0[1].Key7[3].subKey1
Key0[1].Key7[3].subKey2
import groovy.json.JsonSlurper

/**
 * sample.json present in C:/tools/
 * { "foo": "bar", "baz": 123 }
 */
def content = new JsonSlurper().parse(new File('C:/tools/sample.json'))
assert content.keySet() == ['baz', 'foo'] as Set
UPDATE:
After looking at the actual JSON structure and the issues with the version of Groovy in SoapUI, below would be a viable option to get all the keys in a specific format:
import groovy.json.JsonSlurper

def textFromFile = new File('/path/to/example.json').text
def content = new JsonSlurper().parseText(textFromFile)

def getAllKeys(Map item) {
    item.collect { k, v ->
        // Recurse into nested maps, prefixing the sub-keys with the parent key.
        v instanceof Map ? getAllKeys(v).collect { "${k}.$it" } : k
    }.flatten()
}

assert getAllKeys(content.Key0[0]) == [
    'Key1', 'Key2', 'Key3',
    'Key4.subKey1', 'Key5.subKey1', 'Key5.subKey2',
    'Key5.subKey3', 'Key5.subKey4',
    'Key6', 'Key7'
]

Get keys from JSON in Python

I have an API whose response is json like this:
{
  "a": 1,
  "b": 2,
  "c": [
    {
      "d": 4,
      "e": 5,
      "f": {
        "g": 6
      }
    }
  ]
}
How can I write a Python program which will give me the keys ['d', 'e', 'g']? What I tried is:
jsonData = request.json()  # request holds the full response I got from the API
c = jsonData['c']
for i in c.keys():
    key = key + str(i)
print(key)
A function which returns only the keys that don't have a dictionary as their value:
jsonData = {
    "a": 1,
    "b": 2,
    "c": [{
        "d": 4,
        "e": 5,
        "f": {
            "g": 6
        }
    }]
}

def get_simple_keys(data):
    result = []
    for key in data.keys():
        if type(data[key]) != dict:
            result.append(key)
        else:
            result += get_simple_keys(data[key])
    return result

print(get_simple_keys(jsonData['c'][0]))
To avoid using recursion, change the line result += get_simple_keys(data[key]) to result += data[key].keys().
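If you also need to descend into lists (so you can call it on the whole response instead of indexing into jsonData['c'][0] yourself), a variation along these lines should work (my own sketch, not part of the answer above):
def get_leaf_keys(data):
    # Collect keys whose values are neither dicts nor lists, at any depth.
    result = []
    if isinstance(data, dict):
        for key, value in data.items():
            if isinstance(value, (dict, list)):
                result += get_leaf_keys(value)
            else:
                result.append(key)
    elif isinstance(data, list):
        for item in data:
            result += get_leaf_keys(item)
    return result

print(get_leaf_keys(jsonData['c']))  # ['d', 'e', 'g']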
Try this,
dct = {
    "a": 1,
    "b": 2,
    "c": [
        {
            "d": 4,
            "e": 5,
            "f": {
                "g": 6
            }
        }
    ]
}

k = dct["c"][0]
for key in k.keys():
    print(key)